Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Backwards compatibility of bdk::ChangeSet data structure #1103

Closed
Tracked by #76
evanlinjin opened this issue Aug 29, 2023 · 23 comments
Closed
Tracked by #76

Backwards compatibility of bdk::ChangeSet data structure #1103

evanlinjin opened this issue Aug 29, 2023 · 23 comments
Assignees
Labels
api A breaking API change module-database module-wallet new feature New feature or request
Milestone

Comments

@evanlinjin
Copy link
Member

evanlinjin commented Aug 29, 2023

Describe the enhancement

The bdk_wallet::ChangeSet is what the persistence interfaces with. As suggested by @tnull, this structure should be backwards compatible (in case we make changes to this structure).

An easy solution is to make bdk_wallet::ChangeSet a enum:

#[non_exhaustive]
pub enum ChangeSet {
    V0(ChangeSet0),
    V1(ChangeSet1), // in the future this may exist
}

// We always convert to the latest version
impl From<ChangeSet0> for ChangeSet1 { /* TODO */ }
@evanlinjin evanlinjin added new feature New feature or request discussion There's still a discussion ongoing labels Aug 29, 2023
@tnull
Copy link
Contributor

tnull commented Aug 29, 2023

Note that it would also be nice to consider whether even forwards compatibility could be guaranteed. It may make things a bit more complicated, but it's worth exploring, especially since these properties are transitive for any downstream users (e.g., LDK Node can only guarantee what BDK guarantees...).

An easy solution is to make bdk::ChangeSet a enum:

Mh, but then you'd need to define explicit migration functions to allow user migrating from V0 to V1? Another way would be to a) don't break compat for any existing fields and b) introduce any new fields as Options so that users upgrading could deal with 'old' objects in which the fields are still None.

@evanlinjin
Copy link
Member Author

@tnull so you want an older version of BDK to understand changesets that a newer version of BDK outputs. Can you elaborate why this would be useful? Thank you

@tnull
Copy link
Contributor

tnull commented Aug 31, 2023

@tnull so you want an older version of BDK to understand changesets that a newer version of BDK outputs. Can you elaborate why this would be useful? Thank you

Forwards compatibility is really useful to allow downgrades for various reasons: say a user upgrades to a newer BDK version and then a bug is discovered. If the upgrade breaks serialization forwards compatibility they couldn't downgrade to get going again and would be stuck until BDK ships a fix. Similarly, it would be necessary to be a bit more forgiving when restoring from backups or when having multiple apps that operate on the same data set.

In LDK Node's case we for example plan to allow multi-device support synced via a VSS backend. Not having forwards compatibility of the serialized data would require our users to upgrade all apps and devices in lockstep as otherwise an outdated device would refuse to read any data written by the already upgraded devices.

As said above, I'm aware that it may complicate things, but it's at least worth considering, i.e., it should be a dedicated design choice not to provide these guarantees.

@LLFourn
Copy link
Contributor

LLFourn commented Sep 1, 2023

Thanks for bringing up this point. I think we should never change this for structures in bdk_chain once we have a proper release. Adding new functionality should be adding a new type. I think there are two remaining proposed changeset changes there:

  1. LocalChain changes should be monotone #1005
  2. Reintroduce descriptor data to bdk::Wallet persistence #1101

I'm not sure if we can apply the same policy to bdk::Wallet. If we do want to make changes there then we should indeed try and do it via optional fields at the end of the changeset.

I think forwards compatibility is possible but this depends on the application serializing each changeset in a particular way. Note in bdk_file_store we don't serialize changesets with a length so the decoder is expected to understand everything in a changeset. We could improve this by first bincode serializing Vec<u8> of the encoded entry which will prefix it with the length.

@tnull
Copy link
Contributor

tnull commented Sep 1, 2023

Note in bdk_file_store we don't serialize changesets with a length so the decoder is expected to understand everything in a changeset. We could improve this by first bincode serializing Vec<u8> of the encoded entry which will prefix it with the length.

Yeah, I think even independently from (forwards) compatibility guarantees, having some TLV-style encoding or at the very least adding the length sounds like a very good idea. Not having this really limits how you can make necessary changes to the data model in the future.

@notmandatory
Copy link
Member

Do we really need forward and backwards compatibility baked into the data structures for persisted data? My understanding is that the data BDK persists should always able to be regenerated from the wallet/app descriptors and blockchain client. So in the case of a roll forward/back should be able to just do a fresh full scan.

Optionally some persistence modules like SQL based ones should be able to at least rolling forward new schema changes to work with new bdk::ChangeSet versions.

@notmandatory notmandatory added this to BDK Nov 13, 2023
@notmandatory notmandatory added this to the 1.0.0-alpha.4 milestone Nov 13, 2023
@notmandatory
Copy link
Member

I added this to the alpha.4 milestone to decide at least if we're going to do this or not since it's a biggish functional change.

@tnull
Copy link
Contributor

tnull commented Nov 14, 2023

Do we really need forward and backwards compatibility baked into the data structures for persisted data? My understanding is that the data BDK persists should always able to be regenerated from the wallet/app descriptors and blockchain client. So in the case of a roll forward/back should be able to just do a fresh full scan.

I mean generally forwards compat. is a "nice-to-have", not a hard requirement. Still, ... nice to have it. ;)
And I agree that it doesn't have to be explicit in the data structures (in fact, none of the compat guarantees have to be), but if we want it, it should be clearly defined and documented, in the best case even (regularly) checked.

Optionally some persistence modules like SQL based ones should be able to at least rolling forward new schema changes to work with new bdk::ChangeSet versions.

Mh, but this is really about the data model itself, not the persistence layer. Of course the persistence layer could always provide migrations, but the idea compat. guarantees is exactly that the data model does not need an explicit migration step to go forwards/backwards.

@LLFourn
Copy link
Contributor

LLFourn commented Nov 16, 2023

Thanks for bringing up this point. I think we should never change this for structures in bdk_chain once we have a proper release. Adding new functionality should be adding a new type. I think there are two remaining proposed changeset changes there:

1. [LocalChain changes should be monotone #1005](https://github.com/bitcoindevkit/bdk/issues/1005)

2. [Reintroduce descriptor data to `bdk::Wallet` persistence #1101](https://github.com/bitcoindevkit/bdk/issues/1101)

I'm not sure if we can apply the same policy to bdk::Wallet. If we do want to make changes there then we should indeed try and do it via optional fields at the end of the changeset.

I just realized this suggestion doesn't make sense since we're using bincode. You indeed have to do what @evanlinjin suggested in the OP for backwards compat.

@nondiremanuel nondiremanuel modified the milestones: 1.0.0-alpha.4, 1.0.0 Jan 6, 2024
@nondiremanuel nondiremanuel moved this to Todo in BDK Jan 6, 2024
@nondiremanuel
Copy link

Lib team call: It changes the API so we should include it before cutting off for beta release

@nondiremanuel nondiremanuel added the api A breaking API change label Jan 25, 2024
@nondiremanuel nondiremanuel moved this from Todo to Discussion in BDK Jan 25, 2024
@evanlinjin evanlinjin modified the milestones: 1.0.0, 1.0.0-beta, 1.0.0-alpha Feb 5, 2024
@evanlinjin evanlinjin moved this from Discussion to Todo in BDK Apr 12, 2024
@evanlinjin
Copy link
Member Author

I suggest we go ahead with this and do what is described in the ticket description.

@evanlinjin evanlinjin removed the discussion There's still a discussion ongoing label Apr 12, 2024
@evanlinjin evanlinjin self-assigned this Apr 12, 2024
@notmandatory
Copy link
Member

We are now using the CombindedChangeSet type for persisting Wallet data and that type implements Default, so all fields are optional or can be empty. I think this lets us maintain backward compatibility as long as we require that going forward any new fields added to CombindedChangeSet also implement Default.

So for example, if we add some field like wallet_name: Option<String> to a future version of CombinedChangeSet, then when we read an old data file with the old struct we'll get None and have to ensure the rest of the code will handle it in some reasonable way without a panic.

@tnull does this work for you? I believe it's what you suggested in our last call. If so I suggest we only need to document this restriction in the docs for CombinedChangeSet.

@notmandatory notmandatory moved this from Todo to Discussion in BDK Jun 1, 2024
@tnull
Copy link
Contributor

tnull commented Jun 3, 2024

We are now using the CombindedChangeSet type for persisting Wallet data and that type implements Default, so all fields are optional or can be empty. I think this lets us maintain backward compatibility as long as we require that going forward any new fields added to CombindedChangeSet also implement Default.

So for example, if we add some field like wallet_name: Option<String> to a future version of CombinedChangeSet, then when we read an old data file with the old struct we'll get None and have to ensure the rest of the code will handle it in some reasonable way without a panic.

@tnull does this work for you? I believe it's what you suggested in our last call. If so I suggest we only need to document this restriction in the docs for CombinedChangeSet.

Yes, it sounds like this could work, especially if newly added fields are Options and this remains the only type that needs to be persisted. Note that it would still be unclear how you'd handle any types for which a sane default isn't easily available.
Might also be worth to add a test for the serialization upgrade whenever changes are made to CombinedChangeSet or any of the dependent types.

@evanlinjin
Copy link
Member Author

evanlinjin commented Jun 3, 2024

@notmandatory some serialization formats would not work with what's suggested (i.e. bincode) when the entire changeset is persisted. I think we should create a new type (i.e. VersionedChangeSet) which is an enum of different versions of CombinedChangeSet. The changeset will auto-upgrade itself to the latest version (if that makes sense).

We still need to ensure new fields added have sane defaults (as @tnull pointed out). However, I can't imagine a scenario where something added wouldn't have a sane default.


P.s. I'm working on this right now!

@notmandatory
Copy link
Member

notmandatory commented Jun 5, 2024

Can you elaborate on why serialization schemes like bincode wouldn't work with the idea of making all fields, and new ones in particular, optional (or potentially empty)? doesn't any serialization we use have to have the property that serdes change sets from/to regular rust structs and that can be appended as we do now?

@LLFourn
Copy link
Contributor

LLFourn commented Jun 12, 2024

In theory bincode could work like this if you added a length prefix to each changeset but in practice there doesn't seem to be APIs to "decode this thing and leave the remaining fields as Default if you run out of data" which seems really unfortunate. This stops you doing backwards compatibility without version bytes. Version bytes break forwards compatibility.

I think it would be possible to get both if we were willing to do a custom serde derserializer that did versioning but if the version is higher than the current impl, then just try deserializing anyway but ignore trailing bytes (note that bincode can ignore trailing bytes). This would be a little better since it doesn't require making the enum.

I wonder if there's a way we could do this in bdk file persist where we just do the version prefixing there and you provide a callback to the bdk_file_persist telling it how to decode each older version. This seems like the cleanest because it puts the solution where the actual problem is (bincode) and doesn't preclude forwards compatibility for other formats (e.g. cbor, json etc). This would turn it into a TLV kind of thing where each entry is in the form:

struct Entry<V> {
    version: V,
    data: Vec<u8>,
}

Then you'd provide a callback like:

Fn(Option<V>, data: Vec<u8>) -> Result<ChangeSet>

Where you would actually do the bincode decoding, looking at the V to tell which verison of the struct you should try and decode. To get forwards compatibility, if V couldn't be decoded it will pass in None and you can just try and decode with trailing bytes.

@notmandatory notmandatory moved this from Discussion to Needs Review in BDK Jun 14, 2024
@notmandatory notmandatory moved this from Needs Review to Todo in BDK Jun 15, 2024
@notmandatory
Copy link
Member

notmandatory commented Jun 17, 2024

@LLFourn I agree that handling empty fields only needs to be handled right now in the file store. I don't expect versions will change that often, so rather than making a whole general purpose framework for handling this can we just take the simple enum approach in that crate?

I'm hoping we don't add new fields that often, something like this:

pub enum VersionedCombinedChangeSet<K,A> {
    V1 {
        chain: crate::local_chain::ChangeSet,
        indexed_tx_graph: crate::indexed_tx_graph::ChangeSet<A, crate::keychain::ChangeSet<K>>,
        network: Option<bitcoin::Network>,
    },
    V2 {
        chain: crate::local_chain::ChangeSet,
        indexed_tx_graph: crate::indexed_tx_graph::ChangeSet<A, crate::keychain::ChangeSet<K>>,
        network: Option<bitcoin::Network>,
        new_info: Option<u32>,
    }
}

impl<K,A> From<VersionedCombinedChangeSet<K,A>> for CombinedChangeSet<K,A> {
    fn from(versioned: VersionedCombinedChangeSet<K, A>) -> Self {
        match versioned {
            VersionedCombinedChangeSet::V1 { chain, indexed_tx_graph, network } => Self {
                chain,
                indexed_tx_graph,
                network,
                // missing fields get default
                ..Default::default()
            }
            VersionedCombinedChangeSet::V2 { chain, indexed_tx_graph, network, new_info } => Self {
                chain,
                indexed_tx_graph,
                network,
                new_info,
            }
        }
    }
}

impl<K,A> From<CombinedChangeSet<K,A>> for VersionedCombinedChangeSet<K,A> {
    fn from(changeset: CombinedChangeSet<K,A>) -> Self {
        // current latest version
        VersionedCombinedChangeSet::V2 {
            chain: changeset.chain,
            indexed_tx_graph: changeset.indexed_tx_graph,
            network: changeset.network,
            new_info: changeset.new_info,
        }
    }
}

@nondiremanuel nondiremanuel moved this from Todo to In Progress in BDK Jun 18, 2024
@evanlinjin
Copy link
Member Author

To add some context to this conversation, I'll touch on all proposed future changes that will affect ChangeSet structures.

Add another tx timestamp for when wallet created tx

Wallet birthday

  • Where? bdk_wallet::ChangeSet
  • What? New field - birthday: Option<u64>
  • Default? None
  • Why? For block-by-block sync, can know where to start scanning from for recovery
  • Context? Add birthday to the wallet database #1318

Script pubkey birthday

  • Where? bdk_wallet::keychain::ChangeSet
  • What? New field - spk_birthdays: BTreeMap<u32, u64>` - where the key is the derivation index and the value is the birthday timestamp. If a derivation index has no birthday, the birthday of an earlier derivation index is used.
  • Default? BTreeMap::new()
  • Why? Can be used as policy for light-sync. I.e. Only scan for addresses which have birthdays earlier than 3 days. Good information for users.
  • Context? No ticket yet

Monotone LocalChain changesets

  • Where? local_chain::ChangeSet
  • What? Change BTreeMap<u32, Option<BlockHash>> to BTreeMap<BlockId, BTreeSet<BlockId>> where key is the block-id in question, and values are prev-blocks which are in the same chain.
  • Default? BTreeMap::new()
  • Why? This creates an API that is harder to break. It also allows us to update the chain from multiple sources without creating conflicts.
  • Context? LocalChain changes should be monotone #1005

Add block metadata in LocalChain

  • Where?
    • Without monotone LocalChain changesets: BTreeMap<u32, Option<(BlockHash, B)>
    • With monotone LocalChain changesets: BTreeMap<BlockId, (BTreeSet<BlockId>, B)>
  • Default? BTreeMap::new()
  • Why? Can store header data for SPV, filters, utreexo, etc.
  • Context? No ticket yet

@evanlinjin
Copy link
Member Author

Changes which only adds new fields are easily backwards and forwards compatible. Changes which change fields (monotone LocalChain changeset and adding metadata to LocalChain) are more problematic.

For monotone LocalChain changesets, we can easily upgrade older changesets into the newer one. However, it will be problematic if a changeset of an older version is merged onto a changeset of a newer version, therefore this breaks "forward compatibility".

For block metadata, we can have forwards and backwards compatibility if B can be empty and have a default.

@evanlinjin
Copy link
Member Author

I think we should have the following:

  1. Allow breaking backwards-compatibility for bdk_chain changesets. Otherwise for all bdk_chain changesets, we either need to keep the old fields (which would be weird), or version all changesets individually (i.e. an enum). This is difficult to maintain.
  2. Only make bdk_wallet::ChangeSet backwards compatible. This can either be achieved via enum versioning, or flattening the bdk_chain changeset fields in bdk_wallet::ChangeSet.
  3. For persistence that uses serialization formats without delimiters, have special logic to handle it.

I'm a fan of flattening the bdk_wallet::ChangeSet. It makes it easier to maintain. For upgrades that require changing fields, we have an upgrade method that empties the legacy fields and tries to populate the newer fields. However, a non-upgraded changeset cannot be applied on an updated changeset. Wallets would never create non-upgraded changesets after an upgrade anyway - therefore, a panic here is fine? Or we can just ignore the legacy fields (chain sources can figure out what is missing anyway).

@notmandatory
Copy link
Member

notmandatory commented Jul 1, 2024

@evanlinjin per recent discord chat the changes that need to be done are:

  1. Move CombinedChangeSet back to bdk_wallet::ChangeSet
  2. Introduce B generic to LocalChain.
  3. Remove keychain::ChangeSet::keychains_added and add this to bdk_wallet::ChangeSet.
  4. Introduce TLV for bdk_file_store.

Do we need separate issues and/or PRs for these or want to do them all together? Can I help by doing a PR for 1?

@evanlinjin
Copy link
Member Author

evanlinjin commented Jul 2, 2024

@notmandatory one more breaking change here: #1333

I would like more feedback on this.

I'm also currently working on 1 & 2 (should have a PR ready tomorrow).

@notmandatory notmandatory moved this from In Progress to Needs Review in BDK Jul 18, 2024
@notmandatory notmandatory moved this from Needs Review to In Progress in BDK Jul 18, 2024
@notmandatory notmandatory moved this from In Progress to Needs Review in BDK Jul 18, 2024
@notmandatory
Copy link
Member

fixed by #1514

@github-project-automation github-project-automation bot moved this from Needs Review to Done in BDK Jul 22, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api A breaking API change module-database module-wallet new feature New feature or request
Projects
Archived in project
Development

No branches or pull requests

5 participants