228 changes: 228 additions & 0 deletions bip-XXXX.md
@@ -0,0 +1,228 @@
```
BIP: ?
Layer: Peer Services
Title: P2P UTXO Set Sharing
Authors: Fabian Jahr <fjahr@protonmail.com>
Status: Draft
Type: Specification
Assigned: ?
Discussion: 2026-05-06: https://groups.google.com/g/bitcoindev/c/rThmyI8ZN3Q
Version: 0.2.0
License: BSD-2-Clause
Comment on lines +8 to +11
Member

The Headers have a fixed order, please update to:

Suggested change
Assigned: ?
Discussion: 2026-05-06: https://groups.google.com/g/bitcoindev/c/rThmyI8ZN3Q
Version: 0.2.0
License: BSD-2-Clause
Assigned: ?
License: BSD-2-Clause
Discussion: 2026-05-06: https://groups.google.com/g/bitcoindev/c/rThmyI8ZN3Q
Version: 0.2.0

```

## Abstract

This BIP defines a P2P protocol extension for sharing full UTXO sets between peers. It introduces
a new service bit `NODE_UTXO_SET`, four new P2P messages (`getutxotree`, `utxotree`, `getutxoset`,
`utxoset`), and a chunk-hash list anchored to a Merkle root known to the requesting node, enabling
per-chunk verification. This allows nodes to bootstrap from a recent height by obtaining the
Member

I assume that the nodes are bootstrapping from scratch to a recent height rather than from a recent height? Maybe:

Suggested change
per-chunk verification. This allows nodes to bootstrap from a recent height by obtaining the
per-chunk verification. This allows bootstrapping nodes to leapfrog to a recent height by obtaining the

required UTXO set directly from the P2P network via mechanisms such as assumeutxo.

## Motivation

The assumeutxo feature (implemented in Bitcoin Core) allows nodes to begin operating from a serialized
UTXO set while validating historical blocks in the background. However, there is currently no canonical
source for obtaining this data. Users must either generate one themselves from a fully synced node
(using `dumptxoutset` in Bitcoin Core), or download one from a third party.

By enabling UTXO set sharing over the P2P network, new nodes can obtain the data directly from
peers, removing the dependency on external infrastructure.

## Specification

The key words "MUST", "MUST NOT", "SHOULD", "SHOULD NOT", and "MAY" in this document are to be
interpreted as described in RFC 2119.

### Service Bit

| Name | Bit | Description |
|------|-----|-------------|
| `NODE_UTXO_SET` | 12 (0x1000) | The node can serve complete UTXO set data for at least one height. |

Member

This is in conflict with the Utreexo proposal which allocates

  • Bit 12 to NODE_UTREEXO
  • Bit 13 to NODE_UTREEXO_ARCHIVE

per the BIP183 draft.


A node MUST NOT set this bit unless it has at least one full UTXO set available to serve.
A node signaling `NODE_UTXO_SET` MUST be capable of responding to `getutxotree` and `getutxoset`
requests for every UTXO set it is willing to serve, including the full chunk-hash list and every
Member

Suggested change
requests for every UTXO set it is willing to serve, including the full chunk-hash list and every
requests for every UTXO set that it is willing to serve, including the full chunk-hash list and every

chunk of those sets.
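
As a minimal sketch of how a requesting node might test for this service bit, assuming the bit 12 assignment proposed in this draft (which the review above notes may conflict with another proposal), the check is a single mask against the peer's advertised services bitfield. The function name `advertises_utxo_set` is hypothetical.

```python
# Bit 12 as proposed in this draft; the final assignment is still under review.
NODE_UTXO_SET = 1 << 12  # 0x1000


def advertises_utxo_set(services: int) -> bool:
    """Return True if the peer's services bitfield has NODE_UTXO_SET set."""
    return (services & NODE_UTXO_SET) != 0


# NODE_NETWORK (bit 0) | NODE_WITNESS (bit 3) | NODE_UTXO_SET (bit 12)
print(advertises_utxo_set(0x1 | 0x8 | 0x1000))
# The same peer without the new bit
print(advertises_utxo_set(0x1 | 0x8))
```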

### Data Structures

#### Serialized UTXO Set

The serialized UTXO set uses the format established by the Bitcoin Core RPC `dumptxoutset` (as of Bitcoin Core v31).

**Header (55 bytes):**

| Field | Type | Size | Description |
|-------|------|------|-------------|
| `magic` | `bytes` | 5 | `0x7574786fff` (ASCII `utxo` + `0xff`). |
| `version` | `uint16_t` | 2 | Format version. |
| `network_magic` | `bytes` | 4 | Network message start bytes. |
| `base_height` | `uint32_t` | 4 | Block height of the UTXO set. |
| `base_blockhash` | `uint256` | 32 | Block hash of the UTXO set. |
| `coins_count` | `uint64_t` | 8 | Total number of coins (UTXOs) in the set. |

**Body (coin data):**

Coins are grouped by transaction hash. For each group:

| Field | Type | Size | Description |
|-------|------|------|-------------|
| `txid` | `uint256` | 32 | Transaction hash. |
| `num_coins` | `compact_size` | 1–9 | Number of outputs for this txid. |

For each coin in the group:

| Field | Type | Size | Description |
|-------|------|------|-------------|
| `vout_index` | `compact_size` | 1–9 | Output index. |
| `coin` | `Coin` | variable | Serialized coin (varint-encoded code for height/coinbase, then compressed txout). |

Coins are ordered lexicographically by outpoint (txid, then vout index), matching the LevelDB iteration
order of the coins database.
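
The 55-byte header layout in the table above can be parsed with a fixed-width unpack. The following is a sketch only: it assumes the integer fields are little-endian, as is conventional for Bitcoin serialization, and the names `SnapshotHeader` and `parse_header` are hypothetical.

```python
import struct
from typing import NamedTuple

HEADER_SIZE = 55
MAGIC = bytes.fromhex("7574786fff")  # ASCII "utxo" + 0xff


class SnapshotHeader(NamedTuple):
    version: int
    network_magic: bytes
    base_height: int
    base_blockhash: bytes  # raw 32 bytes, as serialized
    coins_count: int


def parse_header(raw: bytes) -> SnapshotHeader:
    """Parse the fixed-size header of a serialized UTXO set."""
    if len(raw) < HEADER_SIZE:
        raise ValueError("header truncated")
    # "<" selects little-endian with no padding (an assumption):
    # 5s + H + 4s + I + 32s + Q = 5 + 2 + 4 + 4 + 32 + 8 = 55 bytes
    magic, version, net_magic, height, blockhash, coins = struct.unpack(
        "<5sH4sI32sQ", raw[:HEADER_SIZE]
    )
    if magic != MAGIC:
        raise ValueError("bad magic bytes")
    return SnapshotHeader(version, net_magic, height, blockhash, coins)


# Round-trip a made-up header (all field values are illustrative only).
example = struct.pack(
    "<5sH4sI32sQ", MAGIC, 2, bytes.fromhex("f9beb4d9"), 840_000, bytes(32), 1_000
)
print(parse_header(example))
```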

#### Chunk Merkle Tree

The serialized UTXO set (header + body) is split into chunks of exactly 3,900,000 bytes (3.9 MB). The
last chunk contains the remaining bytes and may be smaller.

The leaf hash for each chunk is `SHA256d(chunk_data)`. The tree is built as a balanced binary tree. When

nit: when reading this, I was wondering if I had missed a section. A very brief (non-title) introduction of the tree we're building before talking about its leaves would read more naturally.

the number of nodes at a level is odd, the last node is duplicated before hashing the next level.
Member

I guess this is safe because no left_child and right_child should ever have the same data, but I dimly remember that this construction had some downsides in the Merkle tree for transactions in a block. (Some node implementation accepted a block with the same transaction repeated, or something like that?) Would it perhaps be better to have a dedicated value to add into the hash instead of repeating the left_child, e.g., SHA256d(left_child || SHA256d(odd_number_of_leaves_no_right_child))?

Interior nodes are computed as `SHA256d(left_child || right_child)`.

The leaves are delivered to the node in a single `utxotree` response. A node that knows
the Merkle root for a given UTXO set checks a received list of leaves by recomputing the root and
comparing. The Merkle root is the sole trust input required to verify the integrity of the received UTXO set.

`SHA256d` denotes double-SHA256: `SHA256d(x) = SHA256(SHA256(x))`.
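
The construction described in this section can be sketched as follows. This is an illustration rather than the reference implementation: the helper names are hypothetical, and treating a single leaf as its own root is an assumption, since the text does not cover the one-chunk case explicitly.

```python
import hashlib

CHUNK_SIZE = 3_900_000  # bytes per chunk, as specified above


def sha256d(data: bytes) -> bytes:
    """Double SHA-256."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def split_chunks(serialized: bytes, chunk_size: int = CHUNK_SIZE) -> list[bytes]:
    """Split header + body into fixed-size chunks; the last one may be shorter."""
    return [serialized[i:i + chunk_size] for i in range(0, len(serialized), chunk_size)]


def merkle_root(leaves: list[bytes]) -> bytes:
    """Compute the root, duplicating the last node of any odd-length level."""
    if not leaves:
        raise ValueError("at least one leaf is required")
    level = list(leaves)
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])
        level = [sha256d(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]  # assumption: a single leaf is its own root


# Tiny demonstration with a 4-byte chunk size instead of 3,900,000.
chunks = split_chunks(b"abcdefghij", chunk_size=4)
leaves = [sha256d(c) for c in chunks]
print(len(chunks), merkle_root(leaves).hex())
```

With three leaves, the third is paired with a copy of itself on the first level, so the root equals `SHA256d(SHA256d(l0 || l1) || SHA256d(l2 || l2))`.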

### Messages

#### `getutxotree`

Sent to request the chunk-hash list for a specific UTXO set.

| Field | Type | Size | Description |
|-------|------|------|-------------|
| `block_hash` | `uint256` | 32 | Block hash identifying the requested UTXO set. |

A node that has advertised `NODE_UTXO_SET` and can serve the requested UTXO set MUST respond with

nit

Suggested change
A node that has advertised `NODE_UTXO_SET` and can serve the requested UTXO set MUST respond with
A node that has advertised `NODE_UTXO_SET` and can serve the requested UTXO set SHOULD respond with

`utxotree`. If the serving node cannot fulfill the request, it MUST NOT respond. The requesting
node SHOULD apply a reasonable timeout and try another peer.

#### `utxotree`

Sent in response to `getutxotree`, delivering the full chunk-hash list along with per-snapshot
metadata.

| Field | Type | Size | Description |
|-------|------|------|-------------|
| `block_hash` | `uint256` | 32 | Block hash this data corresponds to. |
| `version` | `uint16_t` | 2 | Format version of the serialized UTXO set. |
| `data_length` | `uint64_t` | 8 | Total size of the serialized UTXO set in bytes (header + body). |
| `num_chunks` | `compact_size` | 1–9 | Number of chunks the serialized UTXO set is split into. |
| `chunk_hashes` | `uint256[]` | 32 × `num_chunks` | The ordered list of chunk hashes. |

Upon receiving a `utxotree` message, the node MUST recompute the Merkle root from
Member

Stumbled over this the first time, because “node” could apply to either side.

Suggested change
Upon receiving a `utxotree` message, the node MUST recompute the Merkle root from
Upon receiving a `utxotree` message, the requesting node MUST recompute the Merkle root from

`chunk_hashes` and compare it against the Merkle root it knows for the corresponding UTXO set. If
the roots do not match, the node MUST discard the response and MUST disconnect the peer.
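
A sketch of how a requesting node might parse and check a `utxotree` payload is shown below. It assumes little-endian integers and the standard Bitcoin `compact_size` encoding. The function names are hypothetical, and the check that `num_chunks` is consistent with `data_length` is an extra sanity check that this draft does not require.

```python
import hashlib
import struct

CHUNK_SIZE = 3_900_000


def sha256d(data: bytes) -> bytes:
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def merkle_root(leaves: list[bytes]) -> bytes:
    level = list(leaves)
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # duplicate the last node of an odd level
        level = [sha256d(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


def read_compact_size(buf: bytes, pos: int) -> tuple[int, int]:
    """Decode a compact_size at pos; return (value, position after it)."""
    first = buf[pos]
    if first < 0xFD:
        return first, pos + 1
    fmt, width = {0xFD: ("<H", 2), 0xFE: ("<I", 4), 0xFF: ("<Q", 8)}[first]
    (value,) = struct.unpack_from(fmt, buf, pos + 1)
    return value, pos + 1 + width


def parse_utxotree(payload: bytes):
    """Return (block_hash, version, data_length, chunk_hashes)."""
    if len(payload) < 43:
        raise ValueError("utxotree payload truncated")
    block_hash = payload[:32]
    version, data_length = struct.unpack_from("<HQ", payload, 32)
    num_chunks, pos = read_compact_size(payload, 42)
    if len(payload) - pos != 32 * num_chunks:
        raise ValueError("chunk_hashes length does not match num_chunks")
    hashes = [payload[pos + 32 * i: pos + 32 * (i + 1)] for i in range(num_chunks)]
    return block_hash, version, data_length, hashes


def accept_utxotree(payload: bytes, requested_hash: bytes, known_root: bytes) -> list[bytes]:
    """Return the chunk hashes if the payload verifies; raise ValueError otherwise."""
    block_hash, _version, data_length, hashes = parse_utxotree(payload)
    if block_hash != requested_hash:
        raise ValueError("utxotree is for a different block hash")
    # Extra sanity check (not mandated by the draft): ceil(data_length / CHUNK_SIZE)
    if not hashes or len(hashes) != -(-data_length // CHUNK_SIZE):
        raise ValueError("num_chunks inconsistent with data_length")
    if merkle_root(hashes) != known_root:
        raise ValueError("Merkle root mismatch: discard response and disconnect peer")
    return hashes
```

A caller would treat any `ValueError` from `accept_utxotree` as grounds to discard the response; only the root mismatch case is one where the text above mandates disconnecting the peer.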

#### `getutxoset`
Member

getutxoset feels a bit odd, when the message actually requests a chunk. Also, Cluster Mempool makes extensive use of the term "chunk", and I was wondering whether this overlap could cause confusion in the future.


Sent to request a single chunk of UTXO set data. The requesting node MUST have received a `utxotree`
for the corresponding UTXO set before sending this message.
Comment on lines +133 to +134
Member

Should there perhaps also be an addition that the serving node must not reply if it has not previously sent a utxotree message to the requesting peer?

Member

I also checked that the Utreexo BIPs don’t use the term "utxotree", but I worry that "utreexo" and "utxotree" are extremely similar and that will cause headaches in the future.


| Field | Type | Size | Description |
|-------|------|------|-------------|
| `block_hash` | `uint256` | 32 | Block hash identifying the requested UTXO set. |
| `chunk_index` | `uint32_t` | 4 | Zero-based index of the requested chunk. |

If the serving node cannot fulfill the request, it MUST NOT respond. The requesting node SHOULD apply
a reasonable timeout and try another peer.
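
For illustration, the two request payloads defined so far are small enough to build by hand. The sketch below assumes `chunk_index` is serialized little-endian and covers only the message payloads, not the surrounding P2P message framing; the function names are hypothetical.

```python
import struct


def build_getutxotree(block_hash: bytes) -> bytes:
    """Payload of `getutxotree`: the 32-byte block hash."""
    if len(block_hash) != 32:
        raise ValueError("block_hash must be 32 bytes")
    return block_hash


def build_getutxoset(block_hash: bytes, chunk_index: int) -> bytes:
    """Payload of `getutxoset`: block hash followed by a uint32 chunk index."""
    if len(block_hash) != 32:
        raise ValueError("block_hash must be 32 bytes")
    # "<I" = little-endian uint32 (an assumption); raises struct.error if out of range
    return block_hash + struct.pack("<I", chunk_index)


payload = build_getutxoset(bytes(32), 7)
print(len(payload), payload[-4:].hex())
```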

#### `utxoset`

Sent in response to `getutxoset`, delivering one chunk.

| Field | Type | Size | Description |
|-------|------|------|-------------|
| `block_hash` | `uint256` | 32 | Block hash this data corresponds to. |
| `chunk_index` | `uint32_t` | 4 | Zero-based index of this chunk. |
| `data` | `bytes` | variable | Chunk payload, exactly 3.9 MB except for the last chunk. |

The transfer is receiver-driven: the requesting node sends one `getutxoset` per chunk. Chunks MAY be
requested in any order and from different peers.

Upon receiving a `utxoset` message, the node MUST compute `SHA256d(data)` and compare it against
`chunk_hashes[chunk_index]` from the `utxotree` it accepted for this UTXO set. If the hashes do not
match, the node MUST discard the chunk and MUST disconnect the peer. A node SHOULD also disconnect
a peer that sends a `utxoset` message with fields (`chunk_index`, `block_hash`) that do not match
the outstanding request.

After all chunks have been received, the node SHOULD parse the reassembled UTXO set against the
serialized UTXO set format to confirm it is well-formed.
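
The per-chunk checks above can be sketched as a single function. The names `ChunkRejected` and `check_chunk` are hypothetical, and the length check is an additional early rejection that the draft does not spell out (a wrong-length chunk would fail the hash comparison anyway).

```python
import hashlib

CHUNK_SIZE = 3_900_000


class ChunkRejected(Exception):
    """The received chunk must be discarded; the caller decides on disconnection."""


def sha256d(data: bytes) -> bytes:
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def check_chunk(requested, received, data, chunk_hashes, data_length,
                chunk_size=CHUNK_SIZE):
    """Validate one `utxoset` response against the accepted `utxotree`.

    `requested` and `received` are (block_hash, chunk_index) tuples.
    """
    if received != requested:
        raise ChunkRejected("response does not match the outstanding request")
    _, index = requested
    if not 0 <= index < len(chunk_hashes):
        raise ChunkRejected("chunk index out of range")
    last = len(chunk_hashes) - 1
    # Every chunk is full-size except the last, which holds the remainder.
    expected_len = chunk_size if index < last else data_length - chunk_size * last
    if len(data) != expected_len:
        raise ChunkRejected("unexpected chunk length")
    if sha256d(data) != chunk_hashes[index]:
        raise ChunkRejected("chunk hash mismatch: discard and disconnect peer")


# Demonstration with a 4-byte chunk size and a 10-byte "UTXO set".
parts = [b"abcd", b"efgh", b"ij"]
hashes = [sha256d(p) for p in parts]
block = bytes(32)
check_chunk((block, 2), (block, 2), b"ij", hashes, 10, chunk_size=4)  # passes silently
```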

### Protocol Flow

1. The requesting node identifies peers advertising `NODE_UTXO_SET`.
2. The requesting node sends `getutxotree` for the desired block hash to one or more of these peers.
3. Each peer responds with `utxotree`. The requesting node verifies the response by recomputing
the Merkle root against a value it knows for the given UTXO set, either from a trusted source
or by selecting a root with agreement among multiple peers.
4. The requesting node downloads chunks via `getutxoset`/`utxoset` exchanges, verifying each chunk
against its entry in the accepted `utxotree` on receipt. On verification failure the peer is
disconnected and download continues from another peer without losing already-verified chunks.
5. After all chunks are received, the node parses the reassembled UTXO set against the serialized
UTXO set format to confirm that it is well-formed.
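
Step 4 of the flow above, the receiver-driven download, can be sketched with peers modeled as plain callables. This is a simplification under stated assumptions: a peer is a hypothetical function that returns the chunk bytes or `None` (standing in for a timeout), requests are sequential rather than parallel, and `download` is not a name from the reference implementation.

```python
import hashlib
from typing import Callable, Optional

# Hypothetical peer model: chunk index in, chunk bytes (or None on timeout) out.
Peer = Callable[[int], Optional[bytes]]


def sha256d(data: bytes) -> bytes:
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def download(chunk_hashes: list[bytes], peers: list[Peer]) -> bytes:
    """Fetch every chunk, verifying each against the accepted chunk-hash list."""
    active = list(peers)
    verified: dict[int, bytes] = {}
    for index, expected in enumerate(chunk_hashes):
        for peer in list(active):
            data = peer(index)
            if data is None:
                continue  # timed out: try the next peer
            if sha256d(data) != expected:
                active.remove(peer)  # verification failure: disconnect this peer
                continue
            verified[index] = data  # already-verified chunks are never discarded
            break
        else:
            raise RuntimeError(f"no peer could serve chunk {index}")
    return b"".join(verified[i] for i in range(len(chunk_hashes)))


chunks = [b"aaa", b"bbb", b"c"]
hashes = [sha256d(c) for c in chunks]
honest: Peer = lambda i: chunks[i]
corrupt: Peer = lambda i: b"xxx"
print(download(hashes, [corrupt, honest]))
```

In the demonstration, the corrupt peer is dropped after its first bad chunk and the remaining chunks come from the honest peer.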

Serving nodes are free to limit the number of concurrent and repeated transfers per peer at their own
discretion to manage resource consumption.

## Rationale

**Usage of service bit 12:** Service bits allow selective peer discovery through
DNS seeds and addr relay. Bit 12 is chosen as the next unassigned bit after `NODE_P2P_V2` (bit 11, BIP 324).
Member

As noted above, your service flag should perhaps be moved to bit 14, unless the Utreexo project is willing to move instead.


**Direct request model:** Peers signal availability of UTXO sets via the `NODE_UTXO_SET`
service bit; the requesting node identifies the desired UTXO set by block hash when sending
`getutxotree`. The serving node responds only if it can serve that specific UTXO set.

**Per-chunk verification:** The chunk-hash list returned in `utxotree` enables each chunk to be verified
by direct lookup against the accepted list as it arrives, allowing immediate detection of corrupt data,
peer switching without data loss, and parallel download from multiple peers. The list itself is small
(~80 KB for a ~10 GB set). The specified serialization is deterministic, so all honest nodes produce
Comment on lines +193 to +194
Member

If a node is expected to source chunks from multiple different peers, is it really necessary to receive the entire tree description of 80 KB from each of the peers?

Contributor

The spec says "The requesting node sends getutxotree for the desired block hash to one or more of these peers." so I think the answer is "no, it is not necessary to receive the entire tree description of 80 KB from each of the peers" -- you only send requests to the number of peers you want to receive response from. Any attempt to give different responses will (should) result in them not hashing back to your known merkle root, so all valid descriptions will be identical, AIUI.

Member

I read lines 133 and 134

Sent to request a single chunk of UTXO set data. The requesting node MUST have received a utxotree
for the corresponding UTXO set before sending this message.

as the serving node not being permitted to respond to getutxoset calls for a specific tree unless it previously sent a utxotree message to the same peer, but maybe I misinterpreted that. It seems to me that both aspects of the question should be clarified:

  • Must a peer send getutxotree before being eligible to receive responses to getutxoset for the same tree?
  • Is it necessary to retrieve the utxotree from multiple peers before requesting chunks?

Contributor

Perhaps it would be better to send getutxotree to one node (repeating until you get a valid response), and then send getutxoset to any nodes that support utxo set sharing, with the response utxoset <hash> <n> <empty> indicating "i don't have that utxoset data" ? So instead of getutxotree / utxotree to establish whether a peer has the data you want, you send getutxoset / utxoset and either get an explicit nope or data you actually want?

byte-identical output, guaranteeing Merkle root agreement.

**3.9 MB chunk size:** The number balances round trips (~2,560 for a ~10 GB set) against memory usage
for buffering and verifying a single chunk. Smaller chunks would increase protocol overhead; larger
chunks would increase memory pressure on constrained devices commonly used to run Bitcoin nodes.
Together with the additional message overhead, the `utxoset` message including the chunk data also
sits just below the theoretical maximum block size, which means any implementation should be able to

It also happens to sit just below the maximum P2P message size MAX_PROTOCOL_MESSAGE_LENGTH, so it may be clearer to refer to that instead of block size

Contributor Author

@fjahr, May 5, 2026

Yeah, but this was a conscious decision actually. MAX_PROTOCOL_MESSAGE_LENGTH is a Bitcoin Core implementation-specific value. A different implementation may have a higher value for this. But every implementation will at least need to be able to receive the biggest possible block. So I think it's better to anchor it to that.

handle messages of this size.
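
The figures quoted in this rationale can be reproduced with a few lines of arithmetic. The sketch assumes a decimal 10 GB set, a `compact_size` length prefix on the `data` field (5 bytes for a 3,900,000-byte chunk), the 24-byte v1 P2P message header, and 4,000,000 bytes as the upper bound on block size implied by the 4,000,000 weight-unit limit. None of these framing details are stated in the text above.

```python
CHUNK_SIZE = 3_900_000
SET_SIZE = 10_000_000_000  # "~10 GB", taken as decimal gigabytes (assumption)

num_chunks = -(-SET_SIZE // CHUNK_SIZE)  # ceiling division
hash_list_bytes = 32 * num_chunks        # one 32-byte hash per chunk

# utxoset payload: block_hash + chunk_index + length prefix (assumed) + data
payload_bytes = 32 + 4 + 5 + CHUNK_SIZE
message_bytes = 24 + payload_bytes       # plus the v1 message header (assumed)

print(num_chunks)                 # 2565
print(hash_list_bytes)            # 82080
print(payload_bytes)              # 3900041
print(message_bytes)              # 3900065
print(message_bytes < 4_000_000)  # True
```

Under these assumptions a full-size `utxoset` message leaves roughly 100 KB of headroom below the 4,000,000-byte bound.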

**Reusing the `dumptxoutset` format:** Avoids introducing a new serialization format and ensures
compatibility with UTXO sets already being generated and shared.

**Relationship to BIP 64:** BIP 64 defined a protocol for querying individual UTXOs by outpoint and is
now closed. This BIP addresses a different use case: bulk transfer of the entire UTXO set for node
bootstrapping.

## Reference Implementation

[Bitcoin Core implementation pull request](https://github.com/bitcoin/bitcoin/pull/35054)

## Copyright

This BIP is made available under the terms of the 2-clause BSD license. See
https://opensource.org/license/BSD-2-Clause for more information.

## Changelog

* __0.2.0__ (2026-05-04):
  * Dropped the discovery-before-download approach; the chunk-hash list is now requested via `getutxotree`/`utxotree`
  * Dropped per-chunk Merkle proofs; chunks are verified directly against the chunk-hash list
  * Dropped `height` from requests (`block_hash` is the sole identifier); added format `version` to `utxotree`
  * Dropped references to the serialized hash; the Merkle root is the sole integrity check
* __0.1.0__ (2026-04-10):
  * Initial draft