Btcd Blockchain Protocol Message Parsing

2024-07-11 05:16 | Source: compiled from the web

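Having covered how the Bitcoin P2P network is formed, this article turns to the protocol messages exchanged between Peers. A Bitcoin node broadcasts Transactions and Blocks to the whole network by synchronizing them from Peer to Peer, which is exactly what the Bitcoin protocol is designed to do. To establish and maintain Peer relationships, the protocol also defines the ping/pong heartbeat and messages such as getaddr/addr, all of which came up in the earlier articles. The protocol messages are defined in the btcd/wire package. wire does not define protocol interaction: it contains no logic for handling or responding to a received message, only the message formats and the methods for encoding and decoding them. Responses and interactions are implemented in serverPeer, and the block-processing logic they rely on lives in blockmanager and btcd/blockchain. This article focuses on the protocol messages defined in btcd/wire and only briefly touches on how the messages interact; the response and interaction flow will be covered in detail once we have analyzed block processing and consensus in btcd/blockchain.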

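btcd/wire mainly consists of the following files:

- protocol.go: defines constants such as the Bitcoin protocol version, the network identifiers and ServiceFlag;
- common.go: defines the methods for reading and writing basic data types from and to a binary stream, as well as the methods for reading and writing variable-length integers and variable-length strings;
- message.go: defines the Message interface and the "factory methods" for encoding and decoding messages;
- msgXXX.go: define the format of each concrete message and its implementation of the interface methods;
- blockheader.go: defines the BlockHeader type, used by the block and headers messages;
- invvect.go: defines the InvVect type, used by the inv message;
- netaddress.go: defines the NetAddress type;
- msgXXX_test.go: test files for the corresponding messages;
- doc.go: the package documentation of btcd/wire;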

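Readers can consult the "Protocol documentation" page on the bitcoinwiki for a full description of every message format, which will make the code analysis below easier to follow. Let us start with the definitions of the Message interface and the message header: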

    //btcd/wire/message.go

    // Message is an interface that describes a bitcoin message.  A type that
    // implements Message has complete control over the representation of its data
    // and may therefore contain additional or fewer fields than those which
    // are used directly in the protocol encoded message.
    type Message interface {
        BtcDecode(io.Reader, uint32) error
        BtcEncode(io.Writer, uint32) error
        Command() string
        MaxPayloadLength(uint32) uint32
    }

    ......

    // messageHeader defines the header structure for all bitcoin protocol messages.
    type messageHeader struct {
        magic    BitcoinNet // 4 bytes
        command  string     // 12 bytes
        length   uint32     // 4 bytes
        checksum [4]byte    // 4 bytes
    }

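As shown, the header of a Bitcoin message contains four fields:

- magic: the "magic number" that marks a Bitcoin protocol message and also distinguishes the Bitcoin networks (MainNet, TestNet, TestNet3 and SimNet); the network is chosen when the node starts;
- command: a command string such as version or addr that identifies the type of the protocol message;
- length: the length of the message body;
- checksum: the first 4 bytes of the double SHA256 of the message body;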

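The structure of a protocol message is shown in the figure below: a 24-byte header (4-byte magic, 12-byte command, 4-byte length, 4-byte checksum) followed by the message body.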
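To make the header layout concrete, here is a minimal sketch that frames a payload by hand. It does not use btcd/wire; `frameMessage` and `checksum` are hypothetical helpers written for illustration, and the mainnet magic value 0xd9b4bef9 is supplied here as an assumption rather than taken from the article's listings.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"encoding/binary"
	"fmt"
)

// commandSize follows the "12 bytes" comment on messageHeader.command.
const commandSize = 12

// mainNetMagic is the mainnet magic number (assumed value, see lead-in).
const mainNetMagic uint32 = 0xd9b4bef9

// checksum returns the first four bytes of SHA256(SHA256(payload)).
func checksum(payload []byte) [4]byte {
	first := sha256.Sum256(payload)
	second := sha256.Sum256(first[:])
	var sum [4]byte
	copy(sum[:], second[:4])
	return sum
}

// frameMessage (hypothetical helper) prepends the 24-byte header to payload.
func frameMessage(magic uint32, command string, payload []byte) ([]byte, error) {
	if len(command) > commandSize {
		return nil, fmt.Errorf("command %q is longer than %d bytes", command, commandSize)
	}
	var buf bytes.Buffer
	var u32 [4]byte

	// magic: 4 bytes, little endian.
	binary.LittleEndian.PutUint32(u32[:], magic)
	buf.Write(u32[:])

	// command: 12 bytes, padded with zero bytes.
	var cmd [commandSize]byte
	copy(cmd[:], command)
	buf.Write(cmd[:])

	// length of the message body: 4 bytes, little endian.
	binary.LittleEndian.PutUint32(u32[:], uint32(len(payload)))
	buf.Write(u32[:])

	// checksum of the message body: 4 bytes.
	sum := checksum(payload)
	buf.Write(sum[:])

	buf.Write(payload)
	return buf.Bytes(), nil
}

func main() {
	// verack has an empty message body, so the result is just the header.
	msg, err := frameMessage(mainNetMagic, "verack", nil)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%x\n", msg)
}
```

In btcd itself this framing is done by the message-writing code in message.go; the sketch only mirrors the field order and sizes listed above.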

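BtcDecode() and BtcEncode() in the Message interface are the methods that decode and encode the message body. Each concrete message implements them, and their job is to serialize a structured message body into a byte stream or to instantiate a message of some format from a byte stream. They rely on the methods in common.go for reading and writing basic data types, so we look at those first.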

    //btcd/wire/common.go

    // readElement reads the next sequence of bytes from r using little endian
    // depending on the concrete type of element pointed to.
    func readElement(r io.Reader, element interface{}) error {
        // Attempt to read the element based on the concrete type via fast
        // type assertions first.
        switch e := element.(type) {
        case *int32:
            rv, err := binarySerializer.Uint32(r, littleEndian)
            if err != nil {
                return err
            }
            *e = int32(rv)
            return nil

        ......

        case *bool:
            rv, err := binarySerializer.Uint8(r)
            if err != nil {
                return err
            }
            if rv == 0x00 {
                *e = false
            } else {
                *e = true
            }
            return nil

        ......

        // Message header checksum.
        case *[4]byte:
            _, err := io.ReadFull(r, e[:])
            if err != nil {
                return err
            }
            return nil

        // Message header command.
        case *[CommandSize]uint8:
            _, err := io.ReadFull(r, e[:])
            if err != nil {
                return err
            }
            return nil

        ......

        }

        // Fall back to the slower binary.Read if a fast path was not available
        // above.
        return binary.Read(r, littleEndian, element)
    }

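The function uses a type switch (type assertion) to determine which data type the next bytes represent, reads a byte slice of that type's size, and converts it into the typed value. writeElement() is the exact reverse, so we do not repeat the analysis. Note that basic types such as uint8, uint32 and uint64 are read and written through the methods of binarySerializer rather than by calling Read() or Write() on the io.Reader or io.Writer directly, and that these types are serialized in little-endian byte order. binarySerializer is a buffered channel with room for 1024 byte slices, each with a capacity of 8 bytes. Here the channel is not used for communication between goroutines but as a cache queue. To avoid allocating and freeing memory frequently while serializing or deserializing basic data types, binarySerializer provides a fixed-size "buffer pool": a caller "borrows" a byte slice of the required size from the pool and "returns" it after use. Although the pool has a fixed size, a request made when the pool is exhausted does not block; the buffer is allocated directly from memory and left to the gc after use.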
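The borrow/return pattern described above can be sketched in a few lines. This is an illustrative reimplementation, not btcd's code: `freeList`, its methods and `decodeUint32LE` are names chosen for this example, and the sketch shows how a fixed-width integer read can reuse a pooled buffer without ever blocking.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
	"io"
)

// freeList is a channel-backed pool of byte slices with a capacity of 8.
type freeList chan []byte

// Borrow takes a buffer from the pool, or allocates one if the pool is empty.
func (l freeList) Borrow() []byte {
	select {
	case buf := <-l:
		return buf[:8]
	default:
		return make([]byte, 8)
	}
}

// Return puts the buffer back into the pool without blocking.
func (l freeList) Return(buf []byte) {
	select {
	case l <- buf:
	default:
		// Pool is full: drop the buffer and let the GC reclaim it.
	}
}

// Uint32 reads four bytes from r into a borrowed buffer and decodes them.
func (l freeList) Uint32(r io.Reader, order binary.ByteOrder) (uint32, error) {
	buf := l.Borrow()[:4]
	defer l.Return(buf)
	if _, err := io.ReadFull(r, buf); err != nil {
		return 0, err
	}
	return order.Uint32(buf), nil
}

// decodeUint32LE is a small convenience wrapper used by the example below.
func decodeUint32LE(pool freeList, raw []byte) (uint32, error) {
	return pool.Uint32(bytes.NewReader(raw), binary.LittleEndian)
}

func main() {
	pool := make(freeList, 1024)
	v, err := decodeUint32LE(pool, []byte{0x78, 0x56, 0x34, 0x12})
	fmt.Printf("0x%08x %v, buffers in pool: %d\n", v, err, len(pool))
}
```

The `select` with a `default` branch is what keeps both operations non-blocking: an empty pool falls through to an allocation, and a full pool falls through to dropping the buffer.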

    //btcd/wire/common.go

    // binaryFreeList defines a concurrent safe free list of byte slices (up to the
    // maximum number defined by the binaryFreeListMaxItems constant) that have a
    // cap of 8 (thus it supports up to a uint64).  It is used to provide temporary
    // buffers for serializing and deserializing primitive numbers to and from their
    // binary encoding in order to greatly reduce the number of allocations
    // required.
    //
    // For convenience, functions are provided for each of the primitive unsigned
    // integers that automatically obtain a buffer from the free list, perform the
    // necessary binary conversion, read from or write to the given io.Reader or
    // io.Writer, and return the buffer to the free list.
    type binaryFreeList chan []byte

    // Borrow returns a byte slice from the free list with a length of 8.  A new
    // buffer is allocated if there are not any available on the free list.
    func (l binaryFreeList) Borrow() []byte {
        var buf []byte
        select {
        case buf = <-l:
        default:
            buf = make([]byte, 8)
        }
        return buf[:8]
    }

    ......

    //btcd/wire/netaddress.go

    type NetAddress struct {
        // ...... >= NetAddressTimeVersion.
        Timestamp time.Time

        // Bitfield which identifies the services supported by the address.
        Services ServiceFlag

        // IP address of the peer.
        IP net.IP

        // Port the peer is using.  This is encoded in big endian on the wire
        // which differs from most everything else.
        Port uint16
    }

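The fields have the following meanings:

- Timestamp: the most recent time at which the node learned of this address from the "outside". The further in the past it is, the longer the address has gone unconfirmed and the more likely it is to be stale. Note that the sender address (AddrMe) and receiver address (AddrYou) in the version message do not include this field;
- Services: the services the node supports, i.e. the node type; the values include SFNodeNetwork, SFNodeGetUTXO and SFNodeBloom;
- IP: the IP address;
- Port: the port number;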
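As a rough illustration of the layout, the sketch below serializes an address without the Timestamp field, as is done inside a version message. The helper names and the 26-byte layout (8-byte services, 16-byte IP, 2-byte port) are assumptions made for this example; the point to notice is that the port is written big endian while the services bitfield is little endian.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"net"
)

// encodeNetAddress (hypothetical helper) writes services, IP and port.
// The Timestamp field is deliberately left out, as in the version message.
func encodeNetAddress(services uint64, ip net.IP, port uint16) ([]byte, error) {
	ip16 := ip.To16() // IPv4 addresses become IPv4-mapped IPv6 addresses
	if ip16 == nil {
		return nil, fmt.Errorf("invalid IP address")
	}
	buf := make([]byte, 26)
	binary.LittleEndian.PutUint64(buf[0:8], services) // little endian
	copy(buf[8:24], ip16)
	binary.BigEndian.PutUint16(buf[24:26], port) // big endian, unlike the rest
	return buf, nil
}

// encodeAddrString parses host and then calls encodeNetAddress.
func encodeAddrString(services uint64, host string, port uint16) ([]byte, error) {
	return encodeNetAddress(services, net.ParseIP(host), port)
}

func main() {
	// Services value 1 and port 8333 are example inputs.
	raw, err := encodeAddrString(1, "127.0.0.1", 8333)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%x\n", raw)
}
```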

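Once the format of version is familiar, BtcEncode() and BtcDecode() are easy to follow: they call writeElement(), readElement() and related methods to write or read each data type in turn. The BtcEncode() and BtcDecode() of MsgVersion are straightforward, so we do not analyze them separately.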

inv

The inv message is defined as follows:

    //btcd/wire/msginv.go

    // MsgInv implements the Message interface and represents a bitcoin inv message.
    // It is used to advertise a peer's known data such as blocks and transactions
    // through inventory vectors.  It may be sent unsolicited to inform other peers
    // of the data or in response to a getblocks message (MsgGetBlocks).  Each
    // message is limited to a maximum number of inventory vectors, which is
    // currently 50,000.
    //
    // Use the AddInvVect function to build up the list of inventory vectors when
    // sending an inv message to another peer.
    type MsgInv struct {
        InvList []*InvVect
    }

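inv is mainly used to announce blocks or transactions to a Peer. It is the response to a getblocks message, and it can also be sent unsolicited. The body of an inv message contains a list of InvVect entries preceded by a variable-length integer Count giving the number of entries. InvVect is defined as follows: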

    //btcd/wire/invvect.go

    // InvVect defines a bitcoin inventory vector which is used to describe data,
    // as specified by the Type field, that a peer wants, has, or does not have to
    // another peer.
    type InvVect struct {
        Type InvType        // Type of data
        Hash chainhash.Hash // Hash of the data
    }

    ......

    // These constants define the various supported inventory vector types.
    const (
        InvTypeError         InvType = 0
        InvTypeTx            InvType = 1
        InvTypeBlock         InvType = 2
        InvTypeFilteredBlock InvType = 3
    )

InvVect contains two fields:

- Type: the type of the data, such as Tx, Block or FilteredBlock;
- Hash: the hash of the corresponding data, such as the hash of a transaction or of a block header;

getblocks

The getblocks message is defined as follows:

    //btcd/wire/msggetblocks.go

    type MsgGetBlocks struct {
        ProtocolVersion    uint32
        BlockLocatorHashes []*chainhash.Hash
        HashStop           chainhash.Hash
    }

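The fields have the following meanings:

- ProtocolVersion: the protocol version number;
- BlockLocatorHashes: holds a BlockLocator, which locates the block represented by the first element of the list within the block chain;
- HashStop: the end of the block range requested by getblocks;

The blocks requested by getblocks lie between the block the BlockLocator points to and the block HashStop points to, excluding the block the BlockLocator points to. If HashStop is zero, the 500 blocks after the block the BlockLocator points to are returned. This requires an understanding of BlockLocator, so let us look at its definition: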

    //btcd/blockchain/blocklocator.go

    // BlockLocator is used to help locate a specific block.  The algorithm for
    // building the block locator is to add the hashes in reverse order until
    // the genesis block is reached.  In order to keep the list of locator hashes
    // to a reasonable number of entries, first the most recent previous 10 block
    // hashes are added, then the step is doubled each loop iteration to
    // exponentially decrease the number of hashes as a function of the distance
    // from the block being located.
    //
    // For example, assume you have a block chain with a side chain as depicted
    // below:
    //  genesis -> 1 -> 2 -> ... -> 15 -> 16  -> 17  -> 18
    //                                \-> 16a -> 17a
    //
    // The block locator for block 17a would be the hashes of blocks:
    // [17a 16a 15 14 13 12 11 10 9 8 6 2 genesis]
    type BlockLocator []*chainhash.Hash

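As shown, BlockLocator is simply a slice of *chainhash.Hash that records the hashes of a set of blocks, and the first element of the slice is the block the BlockLocator points to. Because the block chain may fork, the BlockLocator identifies the position of that block by recording the path from it back towards the genesis block. The first 10 hashes are those of the specified block and the 9 blocks before it (at lower heights), with a step of 1 between them. From the 11th element on the step grows geometrically: it doubles on every step back, so the walk reaches the genesis block quickly and the BlockLocator never holds too many elements. In short, a BlockLocator records the position of the block represented by the first element of the slice.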
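The selection rule can be sketched on block heights alone. The function below is an illustration written for this article, not btcd's implementation: `locatorHeights` is a hypothetical name, and it ignores the hash lookup and side-chain handling that the real code needs. For a tip at height 17 it reproduces the heights in the comment above.

```go
package main

import "fmt"

// locatorHeights returns the block heights whose hashes would go into a
// block locator for a tip at the given height: ten entries with a step of 1,
// then a step that doubles each iteration, and finally the genesis block.
func locatorHeights(tip int) []int {
	heights := []int{}
	step := 1
	for h := tip; h > 0; h -= step {
		heights = append(heights, h)
		// Once ten entries have been collected, double the step each time.
		if len(heights) >= 10 {
			step *= 2
		}
	}
	// The genesis block (height 0) always terminates the locator.
	return append(heights, 0)
}

func main() {
	fmt.Println(locatorHeights(17))
	// Output: [17 16 15 14 13 12 11 10 9 8 6 2 0]
}
```

Because the step doubles, the number of entries grows only logarithmically with the chain height, which is what keeps the getblocks message small.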

The format of the getblocks message body can be read off the BtcEncode method of MsgGetBlocks:

    //btcd/wire/msggetblocks.go

    // BtcEncode encodes the receiver to w using the bitcoin protocol encoding.
    // This is part of the Message interface implementation.
    func (msg *MsgGetBlocks) BtcEncode(w io.Writer, pver uint32) error {
        count := len(msg.BlockLocatorHashes)
        if count > MaxBlockLocatorsPerMsg {
            str := fmt.Sprintf("too many block locator hashes for message "+
                "[count %v, max %v]", count, MaxBlockLocatorsPerMsg)
            return messageError("MsgGetBlocks.BtcEncode", str)
        }

        err := writeElement(w, msg.ProtocolVersion)
        if err != nil {
            return err
        }

        err = WriteVarInt(w, pver, uint64(count))
        if err != nil {
            return err
        }

        for _, hash := range msg.BlockLocatorHashes {
            err = writeElement(w, hash)
            if err != nil {
                return err
            }
        }

        return writeElement(w, &msg.HashStop)
    }

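As shown, MsgGetBlocks is serialized by writing, in order, the protocol version, the number of hashes in the BlockLocator, the list of hashes in the BlockLocator, and the stop hash. This is the format of the getblocks message body.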

getdata

The getdata message is defined as follows:

    //btcd/wire/msggetdata.go

    // MsgGetData implements the Message interface and represents a bitcoin
    // getdata message.  It is used to request data such as blocks and transactions
    // from another peer.  It should be used in response to the inv (MsgInv) message
    // to request the actual data referenced by each inventory vector the receiving
    // peer doesn't already have.  Each message is limited to a maximum number of
    // inventory vectors, which is currently 50,000.  As a result, multiple messages
    // must be used to request larger amounts of data.
    //
    // Use the AddInvVect function to build up the list of inventory vectors when
    // sending a getdata message to another peer.
    type MsgGetData struct {
        InvList []*InvVect
    }

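After receiving an inv announcement from a Peer, a node that finds newer blocks or transactions in it can send the Peer a getdata request to synchronize them. The getdata message is simple: like inv, its body contains a list of InvVect entries giving the hashes of the blocks or transactions the node wants. On receiving it, the Peer replies with block or tx messages that deliver the blocks or transactions to the node.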
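Since inv and getdata share the same body layout, a single sketch covers both: a variable-length count followed by one 36-byte entry (4-byte type, 32-byte hash) per inventory vector. The code is a hand-written illustration that does not use btcd/wire; `invVect`, `appendVarInt` and `encodeInvPayload` are hypothetical names, and the variable-length integer thresholds follow the Bitcoin protocol documentation.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// Inventory types, matching the InvType constants listed earlier.
const (
	invTypeTx    uint32 = 1
	invTypeBlock uint32 = 2
)

// invVect mirrors the two fields of wire.InvVect for this example.
type invVect struct {
	Type uint32
	Hash [32]byte
}

// appendVarInt appends n in Bitcoin's variable-length integer encoding:
// one byte below 0xfd, otherwise a marker byte plus 2, 4 or 8 bytes.
func appendVarInt(b []byte, n uint64) []byte {
	switch {
	case n < 0xfd:
		return append(b, byte(n))
	case n <= 0xffff:
		return append(b, 0xfd, byte(n), byte(n>>8))
	case n <= 0xffffffff:
		return append(b, 0xfe, byte(n), byte(n>>8), byte(n>>16), byte(n>>24))
	default:
		b = append(b, 0xff)
		for i := uint(0); i < 8; i++ {
			b = append(b, byte(n>>(8*i)))
		}
		return b
	}
}

// encodeInvPayload builds the body shared by inv and getdata messages.
func encodeInvPayload(list []invVect) []byte {
	out := appendVarInt(nil, uint64(len(list)))
	var typ [4]byte
	for _, iv := range list {
		binary.LittleEndian.PutUint32(typ[:], iv.Type) // type, little endian
		out = append(out, typ[:]...)
		out = append(out, iv.Hash[:]...) // 32-byte hash
	}
	return out
}

func main() {
	// Two entries with zeroed hashes: 1 count byte + 2 * 36 bytes.
	payload := encodeInvPayload([]invVect{{Type: invTypeBlock}, {Type: invTypeTx}})
	fmt.Println(len(payload))
}
```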

tx

The tx message is used to synchronize transactions between Peers. It is defined as follows:

    //btcd/wire/msgtx.go

    // MsgTx implements the Message interface and represents a bitcoin tx message.
    // It is used to deliver transaction information in response to a getdata
    // message (MsgGetData) for a given transaction.
    //
    // Use the AddTxIn and AddTxOut functions to build up the list of transaction
    // inputs and outputs.
    type MsgTx struct {
        Version  int32
        TxIn     []*TxIn
        TxOut    []*TxOut
        LockTime uint32
    }

    ......

    // TxIn defines a bitcoin transaction input.
    type TxIn struct {
        PreviousOutPoint OutPoint
        SignatureScript  []byte
        Sequence         uint32
    }

    ......

    // OutPoint defines a bitcoin data type that is used to track previous
    // transaction outputs.
    type OutPoint struct {
        Hash  chainhash.Hash
        Index uint32
    }

    ......

    // TxOut defines a bitcoin transaction output.
    type TxOut struct {
        Value    int64
        PkScript []byte
    }

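The definition of MsgTx shows that a transaction mainly consists of a list of TxIn and a list of TxOut. Each TxIn points to a UTXO of an earlier transaction that is being spent, and each TxOut is a UTXO created by the current transaction. The fields of MsgTx have the following meanings:

- Version: the version number of the Tx, currently 1; higher Tx versions interpret LockTime and the Sequence of TxIn differently;
- TxIn: the UTXO(s) of earlier transactions that this transaction spends. PreviousOutPoint holds the hash of the previous transaction and an Index, which is the position of the output within that transaction (a transaction may have several output UTXOs, numbered from 0). SignatureScript is the unlocking script. Sequence is the sequence number of the input: among versions of the same transaction, "miners" prefer the one with the larger Sequence when building a block, and a value of 0xffffffff indicates that the transaction can be included in any block;
- TxOut: the output UTXO(s) of the current transaction, each containing a locking script (PkScript) and the amount of Bitcoin. Value is expressed in satoshis, i.e. one hundred-millionth of a bitcoin. The Index in PreviousOutPoint is the index into the []*TxOut of the previous transaction;
- LockTime: either a UTC time or a block height. A value below 5 x 10^8 (Tue Nov 5 00:53:20 1985 UTC) is interpreted as a block height. The transaction can only be included in a block above that height or after that point in time. A value of 0 means the transaction can be included in any block. Transactions of version 2 and above introduce relative lock-time (RLT), which combines LockTime with the Sequence field of TxIn to control whether a "miner" node may include the transaction in a given block. See BIP0068 for details; we do not go into it here and will return to it when discussing block processing and consensus;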
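The two numeric conventions above, the LockTime threshold and the satoshi unit, can be illustrated with a short sketch. The helper names and constants are made up for this example and only cover the absolute lock-time rule described here, not the relative lock-time of BIP0068.

```go
package main

import (
	"fmt"
	"time"
)

const (
	// lockTimeThreshold is the 5 x 10^8 boundary between heights and times.
	lockTimeThreshold = 500000000
	// satoshiPerBitcoin: one bitcoin is one hundred million satoshis.
	satoshiPerBitcoin = 100000000
)

// describeLockTime reports how a LockTime value is interpreted.
func describeLockTime(lockTime uint32) string {
	switch {
	case lockTime == 0:
		return "no lock"
	case lockTime < lockTimeThreshold:
		return fmt.Sprintf("block height %d", lockTime)
	default:
		t := time.Unix(int64(lockTime), 0).UTC()
		return "time " + t.Format(time.RFC1123)
	}
}

// formatBTC renders a non-negative TxOut.Value (in satoshis) as bitcoins.
func formatBTC(value int64) string {
	return fmt.Sprintf("%d.%08d", value/satoshiPerBitcoin, value%satoshiPerBitcoin)
}

func main() {
	fmt.Println(describeLockTime(0))
	fmt.Println(describeLockTime(400000))
	fmt.Println(describeLockTime(500000000))
	fmt.Println(formatBTC(2506238530))
}
```

Keeping Value as an int64 number of satoshis avoids floating-point rounding; the conversion to a decimal string is only done for display.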

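The definition of PreviousOutPoint shows that every transaction refers back to earlier ones, forming a "transaction chain" that ends at a coinbase transaction. The structure of UTXO references was introduced in the earlier article 《曲线上的“加密货币”(一)》 ("Cryptocurrency" on the Curve, Part 1). With the definition and structure in mind, we can look at TxHash() of MsgTx to see how the hash of a transaction is computed: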

    //btcd/wire/msgtx.go

    // TxHash generates the Hash for the transaction.
    func (msg *MsgTx) TxHash() chainhash.Hash {
        // Encode the transaction and calculate double sha256 on the result.
        // Ignore the error returns since the only way the encode could fail
        // is being out of memory or due to nil pointers, both of which would
        // cause a run-time panic.
        buf := bytes.NewBuffer(make([]byte, 0, msg.SerializeSize()))
        _ = msg.Serialize(buf)
        return chainhash.DoubleHashH(buf.Bytes())
    }

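As shown, the hash of a transaction is the result of applying SHA256() twice to the byte stream of the entire transaction structure. Serialize() simply calls BtcEncode() to serialize the MsgTx. BtcEncode() and BtcDecode() write or read the elements one by one in the order of the MsgTx definition; the logic is clear, so we do not go through it. One point worth noting is that BtcDecode() reads the locking and unlocking scripts through scriptFreeList, which, like the binaryFreeList described earlier, is a "memory pool" implemented with a channel. Readers can analyze it on their own.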

block

Besides tx, block is the most important concept in btcd/wire. It defines the structure of a block:

    //btcd/wire/msgblock.go

    // MsgBlock implements the Message interface and represents a bitcoin
    // block message.  It is used to deliver block and transaction information in
    // response to a getdata message (MsgGetData) for a given block hash.
    type MsgBlock struct {
        Header       BlockHeader
        Transactions []*MsgTx
    }

A block contains a block header and a collection of transactions. The block header is defined as:

    //btcd/wire/blockheader.go

    // BlockHeader defines information about a block and is used in the bitcoin
    // block (MsgBlock) and headers (MsgHeaders) messages.
    type BlockHeader struct {
        // Version of the block.  This is not the same as the protocol version.
        Version int32

        // Hash of the previous block in the block chain.
        PrevBlock chainhash.Hash

        // Merkle tree reference to hash of all transactions for the block.
        MerkleRoot chainhash.Hash

        // Time the block was created.  This is, unfortunately, encoded as a
        // uint32 on the wire and therefore is limited to 2106.
        Timestamp time.Time

        // Difficulty target for the block.
        Bits uint32

        // Nonce used to generate the block.
        Nonce uint32
    }

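The fields have the following meanings:

- Version: the version of the block, which is not the same as the protocol version;
- PrevBlock: the hash of the previous block on the chain. Every block points to its predecessor through this field, all the way back to the genesis block, which is what forms the chain;
- MerkleRoot: the hash at the root of the Merkle tree built from the hashes of all transactions in the block. It commits to every transaction in the block; Merkle trees are covered in a later article;
- Timestamp: the time at which the block was created;
- Bits: the difficulty target of the block in compact form. "Mining" is the process of finding a Nonce such that the block hash is below the target this field encodes;
- Nonce: the value varied during "mining" and used to verify the block's proof of work;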
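To show how the 32-bit Bits field expands into a full target, here is a sketch of the usual compact-to-target conversion (one exponent byte followed by a 23-bit mantissa and a sign bit). The function names are chosen for this example; btcd provides its own conversion in btcd/blockchain, which this sketch does not call. The value 0x1d00ffff and the hash used in `main` are those of the mainnet genesis block, supplied here as example inputs.

```go
package main

import (
	"fmt"
	"math/big"
)

// compactToBig expands the compact "Bits" encoding into the full target.
// The top byte is a base-256 exponent, the low 23 bits are the mantissa,
// and bit 0x00800000 is a sign bit.
func compactToBig(compact uint32) *big.Int {
	mantissa := compact & 0x007fffff
	negative := compact&0x00800000 != 0
	exponent := uint(compact >> 24)

	var target *big.Int
	if exponent <= 3 {
		mantissa >>= 8 * (3 - exponent)
		target = big.NewInt(int64(mantissa))
	} else {
		target = big.NewInt(int64(mantissa))
		target.Lsh(target, 8*(exponent-3))
	}
	if negative {
		target.Neg(target)
	}
	return target
}

// meetsTarget reports whether a block hash, given as a big-endian hex
// string (the form shown by block explorers), is at or below the target.
func meetsTarget(hashHex string, bits uint32) (bool, error) {
	n, ok := new(big.Int).SetString(hashHex, 16)
	if !ok {
		return false, fmt.Errorf("invalid hash %q", hashHex)
	}
	return n.Cmp(compactToBig(bits)) <= 0, nil
}

func main() {
	fmt.Printf("%064x\n", compactToBig(0x1d00ffff))
	ok, err := meetsTarget(
		"000000000019d6689c085ae165831e934ff763ae46a2a6c172b3f1b60a8ce26f",
		0x1d00ffff)
	fmt.Println(ok, err)
}
```

A smaller target means fewer acceptable hashes, so a lower value encoded in Bits corresponds to a higher difficulty.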

The serialized format of a block can be read off the BtcEncode() method of MsgBlock:

    //btcd/wire/msgblock.go

    // BtcEncode encodes the receiver to w using the bitcoin protocol encoding.
    // This is part of the Message interface implementation.
    // See Serialize for encoding blocks to be stored to disk, such as in a
    // database, as opposed to encoding blocks for the wire.
    func (msg *MsgBlock) BtcEncode(w io.Writer, pver uint32) error {
        err := writeBlockHeader(w, pver, &msg.Header)
        if err != nil {
            return err
        }

        err = WriteVarInt(w, pver, uint64(len(msg.Transactions)))
        if err != nil {
            return err
        }

        for _, tx := range msg.Transactions {
            err = tx.BtcEncode(w, pver)
            if err != nil {
                return err
            }
        }

        return nil
    }

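As shown, a serialized block consists of the block header, an integer giving the number of transactions, and the list of transactions. Its structure is shown in the figure below:

Note that the block header does not include the transaction count. When the hash of a block is computed, MerkleRoot already commits to all of the transactions, so there is no need to hash the entire block: only the block header is hashed, and the transaction count is not part of it.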

    //btcd/wire/msgblock.go

    // BlockHash computes the block identifier hash for this block.
    func (msg *MsgBlock) BlockHash() chainhash.Hash {
        return msg.Header.BlockHash()
    }
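The header-only hash can be reproduced by hand. The sketch below serializes the six header fields into 80 bytes, applies SHA256 twice, and prints the result in the byte-reversed form that block explorers display. It is written for illustration and does not use btcd/wire; the type and function names are hypothetical, and the field values are those of the mainnet genesis block, which are supplied here and do not come from the listings above.

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"encoding/hex"
	"fmt"
)

// header mirrors wire.BlockHeader with the timestamp as a raw uint32.
type header struct {
	Version    int32
	PrevBlock  [32]byte
	MerkleRoot [32]byte
	Timestamp  uint32
	Bits       uint32
	Nonce      uint32
}

// serialize writes the six fields into the 80-byte wire form. There is no
// transaction count in the header.
func (h *header) serialize() []byte {
	buf := make([]byte, 80)
	binary.LittleEndian.PutUint32(buf[0:4], uint32(h.Version))
	copy(buf[4:36], h.PrevBlock[:])
	copy(buf[36:68], h.MerkleRoot[:])
	binary.LittleEndian.PutUint32(buf[68:72], h.Timestamp)
	binary.LittleEndian.PutUint32(buf[72:76], h.Bits)
	binary.LittleEndian.PutUint32(buf[76:80], h.Nonce)
	return buf
}

// doubleSHA256 returns SHA256(SHA256(b)).
func doubleSHA256(b []byte) [32]byte {
	first := sha256.Sum256(b)
	return sha256.Sum256(first[:])
}

// reversedHex renders a hash in the byte-reversed form used for display.
func reversedHex(h [32]byte) string {
	for i, j := 0, len(h)-1; i < j; i, j = i+1, j-1 {
		h[i], h[j] = h[j], h[i]
	}
	return hex.EncodeToString(h[:])
}

// hashFromDisplayHex converts a displayed hash back to wire byte order.
func hashFromDisplayHex(s string) ([32]byte, error) {
	var out [32]byte
	raw, err := hex.DecodeString(s)
	if err != nil {
		return out, err
	}
	if len(raw) != len(out) {
		return out, fmt.Errorf("expected 32 bytes, got %d", len(raw))
	}
	for i := range raw {
		out[len(out)-1-i] = raw[i]
	}
	return out, nil
}

// genesisHeader returns the header of the mainnet genesis block.
func genesisHeader() (header, error) {
	merkle, err := hashFromDisplayHex(
		"4a5e1e4baab89f3a32518a88c31bc87f618f76673e2cc77ab2127b7afdeda33b")
	if err != nil {
		return header{}, err
	}
	return header{
		Version:    1,
		MerkleRoot: merkle, // PrevBlock stays all zeros
		Timestamp:  1231006505,
		Bits:       0x1d00ffff,
		Nonce:      2083236893,
	}, nil
}

func main() {
	h, err := genesisHeader()
	if err != nil {
		panic(err)
	}
	fmt.Println(reversedHex(doubleSHA256(h.serialize())))
}
```

The explicit byte reversal in `reversedHex` is the same conversion needed when a hash taken from a packet capture is looked up on a block explorer.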

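Below is a block message captured with wireshark:

The formats of block and tx match the description above. Note that the first transaction in the capture is a coinbase transaction: the Hash of its input is all zeros, the Index is 0xffffffff, and the Value of its output is 25.06238530 bitcoins. All hashes in the capture are in little-endian order and must be converted to big-endian before they can be looked up on blockchain.info.

This completes our walk through the encoding and decoding of protocol messages in wire, with a focus on the core messages and concepts such as inv, tx and block. But what does a node do with a transaction or block once it has synchronized it from a Peer? And when a node receives a getblocks or getdata request from a Peer, how does it find the requested transactions or blocks in its own transaction pool or block chain? Only with these questions answered can the full Bitcoin protocol interaction be understood, so before continuing with message interaction we will answer them in the next article, 《Btcd区块链的构建》 (Building the Btcd Blockchain).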

