Finalizing the UData serialization #326

kcalvinalvin · 2021-10-04T06:55:55Z

This is a first attempt at finalizing the serialization for UData to work towards
a spec for Utreexo. Looking for reviews.

UData Serialization

UData just stands for 'Utreexo Data' and includes all the information
that is needed by a verifying node to verify a Bitcoin block with only
the headers and the Utreexo accumulator roots.

The data that is needed is essentially just three things:

1. The accumulator proof. This proves the existence of all the UTXO data
   that each TxIn in the block is referencing.
2. The UTXO data that each TxIn in the block is referencing. This is needed
   to verify bitcoin spending conditions.
3. Time-to-live values for each TXO. This information is only needed
   for caching.

The serialized format is:

<accumulator proof><leaf datas><txo time-to-live values>

All together, the udata serialization looks like so:

Field                    Type       Size
accumulator proof        []byte     variable
leaf datas               []byte     variable
txo time-to-live values  []int32    variable

Each of these elements follows its own serialization format, which is defined below.

Accumulator Proof Serialization

Accumulator proof is called BatchProof in package accumulator and its
serialization format is:

<target count><targets><hash count><hashes>

The batchproof serialization looks like so:

Field          Type        Size
target count   varint      1-8 bytes
targets        []uint64    variable
hash count     varint      1-8 bytes
hashes         [][32]byte  variable
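
As a rough illustration of the table above, here is a minimal Go sketch that writes the count-prefixed targets followed by the count-prefixed hashes. The struct is a stand-in rather than the real type from package accumulator, and two details are assumptions the post does not specify: counts are written with Go's `binary.PutUvarint` as a placeholder for the varint encoding, and targets are written as 8-byte little-endian values.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

// BatchProof is an illustrative stand-in holding the two fields from the table.
type BatchProof struct {
	Targets []uint64
	Hashes  [][32]byte
}

// putCount writes a count using Go's unsigned varint.
// ASSUMPTION: the actual varint encoding used on the wire may differ.
func putCount(buf *bytes.Buffer, n int) {
	var scratch [binary.MaxVarintLen64]byte
	w := binary.PutUvarint(scratch[:], uint64(n))
	buf.Write(scratch[:w])
}

// Serialize writes the target count, the targets, the hash count, and the hashes.
func (bp BatchProof) Serialize() []byte {
	var buf bytes.Buffer
	putCount(&buf, len(bp.Targets))
	for _, t := range bp.Targets {
		var b [8]byte
		binary.LittleEndian.PutUint64(b[:], t) // ASSUMPTION: little-endian
		buf.Write(b[:])
	}
	putCount(&buf, len(bp.Hashes))
	for _, h := range bp.Hashes {
		buf.Write(h[:])
	}
	return buf.Bytes()
}

func main() {
	bp := BatchProof{Targets: []uint64{3, 7}, Hashes: make([][32]byte, 1)}
	fmt.Println(len(bp.Serialize())) // 1 + 2*8 + 1 + 32 = 50
}
```

With two targets and one hash, the output is 50 bytes: one count byte, sixteen bytes of targets, one count byte, and one 32-byte hash.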

Leaf Data Serialization

Leaf datas are essentially the revXXXXX.dat block data with the exclusion of
same-block spends (saves ~20%). The block hash and outpoint are included on top
of the revXXXXX.dat block data, as those must also be part of the hash
commitment that is added to the accumulator.

The serialization format is:

<block hash><outpoint><header code><amount><pkscript len><pkscript>

The outpoint serialized format is:

<tx hash><index>

The serialized header code format is:
bit 0 - containing transaction is a coinbase
bits 1-x - height of the block that contains the spent txout

It's calculated with:

  header_code = height << 1
  if IsCoinBase {
      header_code |= 1 // only set bit 0 if it's a coinbase.
  }
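
The header code packing described above can be sketched in Go as a pair of encode/decode helpers. The function names are made up for illustration; only the bit layout (bit 0 for coinbase, remaining bits for height) comes from the post.

```go
package main

import "fmt"

// EncodeHeaderCode packs a block height and a coinbase flag into one value:
// bit 0 is the coinbase flag, the remaining bits hold the height.
func EncodeHeaderCode(height int32, isCoinBase bool) int32 {
	headerCode := height << 1
	if isCoinBase {
		headerCode |= 1 // only set bit 0 if it's a coinbase
	}
	return headerCode
}

// DecodeHeaderCode reverses EncodeHeaderCode.
func DecodeHeaderCode(headerCode int32) (height int32, isCoinBase bool) {
	return headerCode >> 1, headerCode&1 == 1
}

func main() {
	code := EncodeHeaderCode(700000, true)
	height, coinbase := DecodeHeaderCode(code)
	fmt.Println(code, height, coinbase) // 1400001 700000 true
}
```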

All together, the serialization looks like so:

Field              Type       Size
block hash         [32]byte   32
outpoint           -          36
  tx hash          [32]byte   32
  vout             [4]byte    4
header code        int32      4
amount             int64      8
pkscript length    VLQ        variable
pkscript           []byte     variable
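
To make the field layout concrete, here is a hedged Go sketch that writes the fields in the order of the table above. The struct and method are hypothetical. Two details are assumptions not stated in the post: fixed-width integers are written little-endian, and `binary.PutUvarint` is used as a placeholder for the VLQ encoding of the pkscript length, which may not match the real encoding byte for byte.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

// LeafData is a hypothetical struct with one field per row of the table.
type LeafData struct {
	BlockHash  [32]byte
	TxHash     [32]byte
	Index      uint32
	HeaderCode int32
	Amount     int64
	PkScript   []byte
}

// Serialize writes block hash, outpoint, header code, amount, pkscript length,
// and pkscript, in that order.
func (l LeafData) Serialize() []byte {
	var buf bytes.Buffer
	var scratch [binary.MaxVarintLen64]byte

	buf.Write(l.BlockHash[:])
	buf.Write(l.TxHash[:])

	// ASSUMPTION: fixed-width integers are little-endian.
	binary.LittleEndian.PutUint32(scratch[:4], l.Index)
	buf.Write(scratch[:4])
	binary.LittleEndian.PutUint32(scratch[:4], uint32(l.HeaderCode))
	buf.Write(scratch[:4])
	binary.LittleEndian.PutUint64(scratch[:8], uint64(l.Amount))
	buf.Write(scratch[:8])

	// ASSUMPTION: Go's uvarint stands in for the VLQ length prefix.
	n := binary.PutUvarint(scratch[:], uint64(len(l.PkScript)))
	buf.Write(scratch[:n])
	buf.Write(l.PkScript)
	return buf.Bytes()
}

func main() {
	leaf := LeafData{Index: 1, HeaderCode: 201, Amount: 5000, PkScript: make([]byte, 25)}
	fmt.Println(len(leaf.Serialize())) // 32 + 36 + 4 + 8 + 1 + 25 = 106
}
```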

TXO Time-To-Live Value Serialization

The txo time-to-live values are how long each txo lasts until it is spent. This
information is needed for caching, and it saves a significant amount of
bandwidth when a peer uses the ttl values for caching.

The serialization format is:

<txo time-to-live count><txo time-to-live values>

The serialization looks like so:

Field                          Type      Size
txo time-to-live values count  varint    1-8 bytes
txo time-to-live values        []int32   variable

Compact UData Serialization

Compact UData serialization includes only the data that is missing for a
utreexo node to verify a block or a tx with only the utreexo roots. The
compact serialization leaves out data that is able to be fetched locally
by a node, saving bandwidth and storage space.

Note that compact UData serialization differs for a block message and for a
transaction message. This is because transaction messages may reference TXOs
that are not yet included in a block. If a transaction is not included in a block,
there is no accumulator proof for it.

Because of this, each serialization differs to optimize bandwidth savings.

Compact UData Serialization for a block message.

The compact UData serialization for a block is the same as a normal udata
serialization except for the fact that it uses the compact leaf data serialization.

The compact leaf data serialization for a block is:

<header code><amount><pkscript len><pkscript>

Serialization looks like so:

Field              Type       Size
header code        int32      4
amount             int64      8
pkscript length    VLQ        variable
pkscript           []byte     variable

Note that this information is essentially the same as what's included in the
revXXXXX.dat block (except for the removal of same block spends).

Compact UData Serialization for a transaction message.

Transaction messages may reference inputs that are not yet included in a block (e.g., CPFP txs).
This results in some inputs not needing any UData. For these, we just replace with a single byte
unconfirmed marker.

The serialized format for a transaction is:

<unconfirmed marker><header code><amount><pkscript len><pkscript>

All other fields, with the exception of 'unconfirmed marker', are the same as
in the serialization for a block. The unconfirmed marker is represented in
the struct as height = -1.

Field               Type       Size
unconfirmed marker  byte       1
header code         int32      4
amount              int64      8
pkscript length     VLQ        variable
pkscript            []byte     variable

We need this unconfirmed marker because without it, the receiver of the UData
won't know which accumulator proof/leaf data is for which TxIn. For example, if
we have a transaction with 3 inputs, with one of those inputs referencing a
transaction not yet confirmed in a block, then we will only have 2 proofs/leaf
datas.

However, a Compact State Node won't know which of the 3 TxIns are unconfirmed.
We don't include the outpoint in the compact leaf data serialization so there's
no way to tell. So the solution is to force there to be an equal number of
accumulator proofs/leaf datas and TxIns, and they must also be sent in the same
order. If a UTXO being referenced is unconfirmed, then it will have an
unconfirmed marker of 0x1. If the UTXO being referenced is confirmed, then it
will have an unconfirmed marker of 0x0 with the actual data following it.
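
The marker scheme above can be sketched in Go as one entry per TxIn, in TxIn order: an unconfirmed input becomes a lone 0x1 byte, and a confirmed input becomes 0x0 followed by its compact leaf data. The type and function names are hypothetical, and the little-endian integers and uvarint length prefix are assumptions standing in for the encodings the post leaves unspecified.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

// CompactLeaf is a hypothetical per-input entry. Unconfirmed entries carry no data.
type CompactLeaf struct {
	Unconfirmed bool
	HeaderCode  int32
	Amount      int64
	PkScript    []byte
}

// SerializeTxLeaves writes exactly one entry per TxIn, preserving input order.
func SerializeTxLeaves(leaves []CompactLeaf) []byte {
	var buf bytes.Buffer
	var scratch [binary.MaxVarintLen64]byte
	for _, l := range leaves {
		if l.Unconfirmed {
			buf.WriteByte(0x1) // marker only, no leaf data follows
			continue
		}
		buf.WriteByte(0x0) // confirmed: leaf data follows
		// ASSUMPTION: little-endian integers, uvarint as a VLQ placeholder.
		binary.LittleEndian.PutUint32(scratch[:4], uint32(l.HeaderCode))
		buf.Write(scratch[:4])
		binary.LittleEndian.PutUint64(scratch[:8], uint64(l.Amount))
		buf.Write(scratch[:8])
		n := binary.PutUvarint(scratch[:], uint64(len(l.PkScript)))
		buf.Write(scratch[:n])
		buf.Write(l.PkScript)
	}
	return buf.Bytes()
}

func main() {
	leaves := []CompactLeaf{
		{HeaderCode: 200, Amount: 1000, PkScript: []byte{0x51, 0x52}},
		{Unconfirmed: true},
		{HeaderCode: 300, Amount: 2000, PkScript: []byte{0x53, 0x54}},
	}
	out := SerializeTxLeaves(leaves)
	fmt.Println(len(out), out[0], out[16], out[17]) // 33 0 1 0
}
```

In the 3-input example, the receiver reads three markers and so knows the second TxIn is the unconfirmed one, even though only two leaf datas are present.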

naumenkogs · 2021-10-04T10:34:11Z

Some initial feedback.

1. Why we need a block hash for every leaf data item?
2. Int/uint seems inconsistent? Sometimes you use the fact it’s non-negative, sometimes you don’t
3. I assume “height of the block that contains the spent txout” is something we’d need for verification, but that’s my guess as I’m not a utreexo expert yet.
4. Could it be useful to drop TTL in some cases? In that case, we should support ttl-free communication.
5. How about the ordering of leafs? Does it matter? Or it doesn’t matter since we omit same block-spends?
6. “For these, we just replace with a single byte unconfirmed marker.” You mean single bit?

1 reply

kcalvinalvin Oct 5, 2021
Author

1. Why we need a block hash for every leaf data item?

Ah that's described in section 5.6 in the utreexo paper. The general idea being that if you include the blockhash in the leaf commitment, then you also force the attacker trying to perform a collision attack to mine a valid bitcoin block.

2. Int/uint seems inconsistent? Sometimes you use the fact it’s non-negative, sometimes you don’t

Ah I was just going straight off what's used in utcd. How does Bitcoin Core handle things? I know some things are mixed but is Core trying to unify Int/uint going forward?

I guess the internal representation of height = -1 doesn't need to be there in a spec.

3. I assume “height of the block that contains the spent txout” is something we’d need for verification, but that’s my guess as I’m not a utreexo expert yet.

This is taken straight from Core here:
https://github.com/bitcoin/bitcoin/blob/9e530c6352c3e3d4f2936bbbb1bcb34ff9ca6378/src/undo.h#L38-L40

In the rev blocks, nHeight and fCoinbase are squashed together. You just bitshift left once and toggle the LSB.

4. Could it be useful to drop TTL in some cases? In that case, we should support ttl-free communication.

By default we support TTL-free communication. I also found out during yesterday's call, that TTL is on the way out for something better so it should be removed anyways.

TTL is totally optional anyhow so may be better to leave it out for a spec.

5. How about the ordering of leafs? Does it matter? Or it doesn’t matter since we omit same block-spends?

Ordering of the leaves do matter. For blocks they matter because the current accumulator design forces the data being verified to be in the same order as they were proven.

utreexo/accumulator/pollardproof.go

Lines 7 to 32 in 37699d1

    
    // VerifyBatchProof verifies the hash and the proof passed in. It does not
    // make any modifications to the pollard.
    //
    // NOTE: The order in which the hashes are given matter (aka permutation matters).
    // The hashes being verified should be in the same order as they were
    // proven.
    func (p *Pollard) VerifyBatchProof(toProve []Hash, bp BatchProof) error {
    	// verify the batch proof.
    	rootHashes := p.rootHashesForward()
    	_, _, err := verifyBatchProof(toProve, bp, rootHashes, p.numLeaves,
    		// pass a closure that checks the pollard for cached nodes.
    		// returns true and the hash value of the node if it exists.
    		// returns false if the node does not exist or the hash value is empty.
    		func(pos uint64) (bool, Hash) {
    			n, _, _, err := p.readPos(pos)
    			if err != nil {
    				return false, empty
    			}
    			if n != nil && n.data != empty {
    				return true, n.data
    			}
    			return false, empty
    		})
    	return err
    }

This could be changed so that the ordering doesn't matter. It was changed to be like this because the accumulator is more efficient when you force the order to be the same.

So for example:

Given the block below containing txs 1-5, with tx 4 being a same-block spend, we'd omit tx 4 when adding to the accumulator.
block[<1><2><3><4*><5>]

We add the below to the accumulator:
[<1><2><3><5>]

When we are generating proofs, we use the above ordering. Once the proof has been generated, the corresponding leaves must be in the same order.

If we proved them in [<1><2><3><5>], then this ordering must be preserved when sending the proof and leaves over to a peer. If we proved them in [<5><2><3><1>], this ordering must be preserved when sending them over to a peer.
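
A tiny Go sketch of the example above, using plain integers as stand-ins for the txs: same-block spends are dropped and the remaining items keep their block order, which is the order that then has to be preserved for the proof and the leaves. The function name and the use of ints are purely illustrative.

```go
package main

import "fmt"

// LeavesToAdd returns the ids in block order with same-block spends omitted.
func LeavesToAdd(blockOrder []int, sameBlockSpend map[int]bool) []int {
	out := make([]int, 0, len(blockOrder))
	for _, id := range blockOrder {
		if sameBlockSpend[id] {
			continue // never added to the accumulator
		}
		out = append(out, id)
	}
	return out
}

func main() {
	fmt.Println(LeavesToAdd([]int{1, 2, 3, 4, 5}, map[int]bool{4: true})) // [1 2 3 5]
}
```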

6. “For these, we just replace with a single byte unconfirmed marker.” You mean single bit?

Basically yeah. Just a single bit of information. It'd be nice if it was just a single bit but not sure how to actually do so.

naumenkogs · 2021-10-05T10:10:30Z

> Ah that's described in section 5.6 in the utreexo paper. The general idea being that if you include the blockhash in the leaf commitment, then you also force the attacker trying to perform a collision attack to mine a valid bitcoin block.

I'm still kind of confused... Okay, so based on the "Compact Udata for a block message", the hash indeed doesn't have to be repeated. What's the point of explaining non-compact udata serialization format anyway? When a block is mined, nodes exchange compact udata for block. When unconfirmed tx is produced, a compact udata for tx could be transmitted.

> Ah I was just going straight off what's used in utcd. How does Bitcoin Core handle things? I know some things are mixed but is Core trying to unify Int/uint going forward?

I mean, we're building a new system, why just don't make it right? For example, amount can't be negative, right? And it's not related to consensus, so might use uint.

> Ordering of the leaves do matter.

That's what I thought, so I was surprised it's not reflected anywhere in this serialization. Perhaps you depend on the regular Bitcoin Core message ordering in that case?

> Basically yeah. Just a single bit of information. It'd be nice if it was just a single bit but not sure how to actually do so.

Yeah I guess I see where you're coming from. This is not critical anyway, I just thought it was a typo, but it's not.

1 reply

kcalvinalvin Oct 6, 2021
Author

>> Ah that's described in section 5.6 in the utreexo paper. The general idea being that if you include the blockhash in the leaf commitment, then you also force the attacker trying to perform a collision attack to mine a valid bitcoin block.
>
> I'm still kind of confused... Okay, so based on the "Compact Udata for a block message", the hash indeed doesn't have to be repeated. What's the point of explaining non-compact udata serialization format anyway? When a block is mined, nodes exchange compact udata for block. When unconfirmed tx is produced, a compact udata for tx could be transmitted.

My intention was for the non-compact udata serialization to be the format for creating the hash that's to be committed in the accumulator (admittedly it's not very good at that since the txo ttls should be left out).

On the other hand, the compact serialization should be the only one used for any sort of i/o.

>> Ah I was just going straight off what's used in utcd. How does Bitcoin Core handle things? I know some things are mixed but is Core trying to unify Int/uint going forward?
>
> I mean, we're building a new system, why just don't make it right? For example, amount can't be negative, right? And it's not related to consensus, so might use uint.

Sounds good. I'll change all to uint.

>> Ordering of the leaves do matter.
>
> That's what I thought, so I was surprised it's not reflected anywhere in this serialization. Perhaps you depend on the regular Bitcoin Core message ordering in that case?

Yes, the ordering of the leaves for a particular block will be the same as the bitcoin message it corresponds to (whether that be a block message or a tx message).

naumenkogs · 2021-10-06T11:03:59Z

> My intention was for the non-compact udata serialization to be the format for creating the hash that's to be committed in the accumulator (admittedly it's not very good at that since the txo ttls should be left out).

I see now yeah.

Overall, I have no more comments, and I hope this discussion was of some help. It would be useful to revisit this once I catch up with the utreexo context better :)

dergoegge · 2022-04-04T09:46:14Z

> Compact UData Serialization for a transaction message.
> <unconfirmed marker><header code><amount><pkscript len><pkscript>

I don't think we need the unconfirmed marker, as nodes can check if something is unconfirmed by looking it up in their mempool. You compute an input skip list for a transaction by checking which TxIns come from the mempool. That way you can tell which leaf data belongs to which input, allowing you to only send the required/confirmed leaves.

Orphans are detected if the size of the skip list plus the number of provided leaves does not match the total number of TxIns.
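
A rough Go sketch of the skip-list idea, under stated assumptions: `OutPoint`, `AssignLeaves`, and the `inMempool` lookup are hypothetical names, and the mempool is modeled as a simple callback. Each TxIn found in the mempool is skipped, the remaining TxIns are matched to the provided leaves in order, and a count mismatch signals an orphan.

```go
package main

import "fmt"

// OutPoint is a simplified stand-in for a transaction outpoint.
type OutPoint struct {
	TxID  string
	Index uint32
}

// AssignLeaves maps each input to the index of its leaf, or -1 if the input is
// skipped because it spends a mempool (unconfirmed) output. It returns false
// when skipped inputs plus provided leaves do not add up to the input count.
func AssignLeaves(inputs []OutPoint, inMempool func(OutPoint) bool, numLeaves int) ([]int, bool) {
	assignment := make([]int, len(inputs))
	next, skipped := 0, 0
	for i, in := range inputs {
		if inMempool(in) {
			assignment[i] = -1
			skipped++
			continue
		}
		assignment[i] = next
		next++
	}
	if skipped+numLeaves != len(inputs) {
		return nil, false // orphan: some input is neither in the mempool nor proven
	}
	return assignment, true
}

func main() {
	mempool := map[OutPoint]bool{{TxID: "b", Index: 0}: true}
	inputs := []OutPoint{{"a", 0}, {"b", 0}, {"c", 1}}
	lookup := func(op OutPoint) bool { return mempool[op] }
	fmt.Println(AssignLeaves(inputs, lookup, 2)) // [0 -1 1] true
}
```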

0 replies

instagibbs · 2022-10-13T20:01:10Z

Just make the length of the accumulator proof 0 for unconfirmed spends. It cannot be non-zero anyways.
