Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

assertion failed: self.input_best_block_index.map_or #1901

Open
wirednkod opened this issue Jul 8, 2024 · 11 comments
Open

assertion failed: self.input_best_block_index.map_or #1901

wirednkod opened this issue Jul 8, 2024 · 11 comments

Comments

@wirednkod
Copy link
Contributor

wirednkod commented Jul 8, 2024

Smoldot crashed, while running for a few minutes, in the browser (in fellowship dashboard) with the following message:

panicked at /__w/smoldot/smoldot/lib/src/chain/async_tree.rs:866: 
assertion failed: self.input_best_block_index.map_or(false,
    |current_input_best|
        self.non_finalized_blocks.is_ancestor(node_to_finalize,
            current_input_best))

and

smoldot_worker-bYUeUND9.js:1 Uncaught (in promise) Error
    at panic (smoldot_worker-bYUeUND9.js:1:1249)
    at 01388372:0x188bf6
    at 01388372:0x37fd96
    at 01388372:0x38035a
    at 01388372:0x76cd1
    at 01388372:0x742fb
    at 01388372:0x245b8
    at 01388372:0x2510a
    at 01388372:0x24f33
    at 01388372:0x1872a4

There are specific panic reasons as described here but am not sure if such a case occured.
I have a question concerning the case: Why should smoldot's code panic in case node_to_finalize isn't a valid node or if the current input best block is not a descendant of node_to_finalize? Could it just "skip" the state background update for this iteration?

@tomaka
Copy link
Contributor

tomaka commented Jul 17, 2024

I can't do much without logs unfortunately.
Looking at the code, I've found #1912, but I have no idea if this is the cause here.

Could it just "skip" the state background update for this iteration?

That would simply hide the bug.

@wirednkod
Copy link
Contributor Author

wirednkod commented Jul 17, 2024

Thank you @tomaka
Unfortunately I was not able to reproduce it, even though I tried several time. It looked like a rare case - but will retry to (I know how much you need logs).

Could it just "skip" the state background update for this iteration?

The reasoning here is not to "hide" the bug but just not act/panic when a node_to_finalize is not valid. If node is not valid then smoldot just disregard ( i am sure im missing some context to why this is maybe a bad idea)

@tomaka
Copy link
Contributor

tomaka commented Jul 17, 2024

The reasoning here is not to "hide" the bug but just not act/panic when a node_to_finalize is not valid. If node is not valid then smoldot just disregard ( i am sure im missing some context to why this is maybe a bad idea)

Imagine that (in Javascript) you're doing for example array.push(something), and then later array.pop() in order to retrieve something.
Now imagine that for some reason push has a problem and cannot push the element.
What's the best thing to do? Immediately stop everything and report the problem, or silently continue executing as if the element had been pushed?
If you do the latter, then later you'll pop the wrong element, which will most likely lead to other bugs that are way harder to debug. And if you ignore these other bugs, you'll get even more nasty bugs.

It's the same situation here, except that push is instead the input_finalize function. And in the case of security-sensitive code, these bugs could be exploited by an attacker.

@wirednkod
Copy link
Contributor Author

Thank you for the explanation Pierre. Highly appreciated 🙏🏽

@tien
Copy link
Contributor

tien commented Sep 3, 2024

I have reported this further in #1934. Can confirm that this is still happening on the latest version, I'll try to capture some logs today.

@ryanleecode
Copy link

I have some logs here. They are taken from the browser, so apologies that its not super clean.

Smoldot Version: 2.0.30

Line 9810

https://gist.github.com/ryanleecode/578a14bd6f185a299ce76f63997f0709#file-smoldot_crash-log-L9810

@tomaka
Copy link
Contributor

tomaka commented Sep 20, 2024

Thanks!


  • Line 7371 new Westend block hash=0x8d0c…cdf3, height=22648624, new best
  • Line 7375 starting to fetch parablock of relay_block_hash=0x8d0c…cdf3
  • Line 7401 successfully fetched parahead_hash=0x02d6…93d1, relay_blocks=0x8d0c…cdf3
  • Line 7518 relay block 0xa11a…4b61 is finalized, leading to parablock 0xf004…1dee being finalized
  • Line 7525 relay block 0x3548…7107 is finalized, leading to parablock 0x0f81…aeff being finalized
  • Line 7650 relay block 0x50f2…af9f is finalized, leading to parablock 0x64d4…fdf0 being finalized
  • Line 7673 new Westend block hash=0x5596…9832, height=22648625, new best
  • Line 7676 starting to fetch parablock of relay_block_hash=0x5596…9832
  • Line 7689 new Westend block hash=0x7f8a…e5b8, height=22648625 (same height as previous line), new best
  • Line 7693 starting to fetch parablock of relay_block_hash=0x7f8a…e5b8
  • Line 7734 successfully fetched parahead_hash=0x9979…9ac9, relay_blocks=0x5596…9832
  • Line 7770 successfully fetched parahead_hash=0x9979…9ac9, relay_blocks=0x7f8a…e5b8
  • Line 7955 relay block 0x505f…0d66 is finalized
  • Line 8067 new Westend block hash=0xd5c7…a485, height=22648626, parent_hash=0x7f8a…e5b8, new best
  • Line 8070 starting to fetch parablock of relay_block_hash=0xd5c7…a485
  • Line 8285 successfully fetched parahead_hash=0x97e3…c2da, relay_blocks=0xd5c7…a485
  • Line 8444 relay block 0x8d0c…cdf3 is finalized
  • Line 8525 new Westend block hash=0x7b9b…dcd6, height=22648627, parent_hash=0xd5c7…a485, new best
  • Line 8528 starting to fetch parablock of relay_block_hash=0x7b9b…dcd6

Then until line 9127 the asset hub sync service shuts down and is restarted multiple times because the asset hub chain is removed and re-added multiple times. This is probably not intended but smoldot just follows instructions.

Then the asset hub sync service logs are:

0131dc6e:0x49018 [sync-service-asset-hub-westend] relay-chain-new-subscription; finalized_hash=0x8d0c…cdf3
0131dc6e:0x48177 [sync-service-asset-hub-westend] parahead-fetch-operations-cleared
0131dc6e:0x49018 [sync-service-asset-hub-westend] parahead-fetch-operation-started; relay_block_hash=0x8d0c…cdf3
0131dc6e:0x49018 [sync-service-asset-hub-westend] parahead-fetch-operation-started; relay_block_hash=0x7b9b…dcd6
0131dc6e:0x49018 [sync-service-asset-hub-westend] parahead-fetch-operation-started; relay_block_hash=0x7f8a…e5b8
0131dc6e:0x49018 [sync-service-asset-hub-westend] parahead-fetch-operation-started; relay_block_hash=0xd5c7…a485
0131dc6e:0x49018 [sync-service-asset-hub-westend] relay-chain-block-finalized; hash=0x7f8a…e5b8
0131dc6e:0x48842 [sync-service-asset-hub-westend] parahead-fetch-operation-success; parahead_hash=0x6221…20e4, relay_blocks=0x7b9b…dcd6
0131dc6e:0x49018 [sync-service-asset-hub-westend] parahead-fetch-operation-started; relay_block_hash=0x5596…9832
0131dc6e:0x48842 [sync-service-asset-hub-westend] parahead-fetch-operation-success; parahead_hash=0x02d6…93d1, relay_blocks=0x8d0c…cdf3
0131dc6e:0x49018 [sync-service-asset-hub-westend] subscriptions-notify-parablock-finalized; hash=0x02d6…93d1
0131dc6e:0x48842 [sync-service-asset-hub-westend] parahead-fetch-operation-success; parahead_hash=0x97e3…c2da, relay_blocks=0xd5c7…a485
0131dc6e:0x48842 [sync-service-asset-hub-westend] parahead-fetch-operation-success; parahead_hash=0x9979…9ac9, relay_blocks=0x7f8a…e5b8
0131dc6e:0x49018 [sync-service-asset-hub-westend] subscriptions-notify-new-parablock; hash=0x9979…9ac9
0131dc6e:0x49018 [sync-service-asset-hub-westend] subscriptions-notify-parablock-finalized; hash=0x9979…9ac9

The behavior of the sync service seems correct. It reports 0x02d6…93d1 then finalizes it, then reports 0x9979…9ac9 then finalizes it. The latter block is indeed a child of the former.

The runtime service is reset at line 9125, then starts trying to fetch the runtime of the genesis block, then starts trying to fetch the runtime of 0x02d6…93d1 once it hears about it.
Then the panic happens. I guess that it happens either line 2444 or 2452 in the runtime service, following the report for the new parablock finalization, but it's not clear to me why.

@tomaka
Copy link
Contributor

tomaka commented Sep 20, 2024

It's annoying, but the debug logs look completely correct.
My guess is that there's a subtle bug somewhere, for example is_new_best being false when it's supposed to be true, but nothing in the logs is suspicious.

If you could generate trace logs (log level 5, instead of 4), it would generate way more info and probably show the problem.

Also note that you seem to remove them immediately add back the asset hub chains multiple times. This is probably what indirectly triggers the bug. If you want a work-around, I suppose that not doing that would help.

@ryanleecode
Copy link

ryanleecode commented Sep 20, 2024

ogdfiibbabaebbogenjlolnmngfinoeb-1726846418319.log

Line 58184

EDIT: this is actually #1941

@ryanleecode
Copy link

ogdfiibbabaebbogenjlolnmngfinoeb-1726859651448.log

Line 3232.

Can't get the trace logs my PC just freezes from the sheer amoutn of output there is. this is debug logs + verbose turn on in chrome console.

I'm able to reproduce this easily by just refreshign the dapp im connecting with in rapid sucession. that seems to align with asset hub being started and shutdown mutliple times

@tomaka
Copy link
Contributor

tomaka commented Nov 8, 2024

This is plausibly fixed by #2029, but I'm not sure since I've never identified the cause.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants