Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add fast sync algorithm. #2763
Add fast sync algorithm. #2763
Changes from 6 commits
f2c9704
eca1522
00c3d65
8a5e758
011f4a5
042ae07
f8c4600
ae590ca
1fb2c0e
14408db
90b0ec9
9685fe9
9d384f5
34b5a14
7e71097
c6f0240
08510d2
f9d544f
70de7e5
5938323
72820c2
9a8778d
81b011d
17601c2
f335461
83228fc
73c0d8d
ac4db9e
96c20cf
7e416f1
3771a0a
1ad08d9
2005b4b
fa89440
350dfbc
f247877
3b5ab82
95d17ce
5b39dd1
0edf1f7
ae8a54c
ccea056
c14a7b2
7b39a9d
0dafb27
c175ca8
1940fe4
677c662
2e0a3d7
b216a18
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this is the right API, it doesn't scale.
Substrate CLI had this, which we have removed since we only supported full sync:
I think we should bring
--sync
back and this particular mode can be called "fast" (even though it will be a bit confusing for those farmiliar with Substrate). I'm also not sure it is a good idea to have state pruning defined by a number since IIRC it will prune state regardless of finality, which means node will likely fail to restart in some cases due to state being unavailable at the point of archiver initialization, which is problematic.This is why I mentioned in the past that we should probably call it "farmer sync" and not support defining state pruning mode explicitly, we should manually prune things when we need to, but not sooner (hence paritytech/polkadot-sdk#1570 from September of last year).
So for now I'd vote for
--sync farmer
and do not allow user to specify--blocks-pruning
or--state-pruning
when these are used. Not sure what to do about state pruning yet though. Did you confirm that it doesn't remove non-finalized state?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense. I will add this later either within this PR or in a separate one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are we skipping PoT verification here? PR description "PoT check update to support fast-sync" is not sufficient to understand why this is required and I do not remember discussing this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's yet another special case for fast-sync. Did you mean something else?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can guess from the code that this is some kind of special case, but I don't understand what that special case is, what issues it causes and why you decided to fix it this way out of a set of possibilities. I sometimes can infer those things, but you'll have to help me understand it in this case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AFAIR PoT verification requires runtime parameters check and it will fail without the existing state. This snippet disables the check that will fail for this reason for blocks before the fast-sync target block.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But we always import blocks with state, I'm not sure I'm following what you're saying.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So the reason you had to add it here is the side effect of incorrectly importing two raw blocks instead of one (the first one being without state). This is also the reason a more advanced version of #2767 failed: there existed block without state that should not have been there. The fact that that extra block is present will also cause archiver to fail to restart if interrupted right after insertion of the first block into database.
Basically insertion of two raw blocks is incorrect and should be fixed.
Once fixed, the failure will happen at the beginning of the function when attempting to read parent header and should be handled there accordingly.
I checked the code and I believe we should never ever have invalid votes included in the block and parent hash always comes from runtime's state and not user input, thus it should be safe to return
true
from this function in that special case.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed in #2779
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This feels like a completely random place to clear gap sync. Can it be moved right next to the code that triggers gap sync in the first place?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need this operation before the regular sync starts. What place do you advise?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this suggestion viable?:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer to clear the gap before driving any networking futures. This is why it's placed in the initialization code far away from the sync-code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think I understand the rationate here. If gap sync is triggered by some specific behavior then we should clear gap sync after we trigger that behavior. If the behavior is importing blocks in specific order then you will absolutely drive network prior to gap sync because that is how you end up with block you can physically import in the first place.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you just need import queue service in multiple places? If so you can call
import_queue.service()
more than once and get two service instances with handles to the same import queue. Or is there a different reason for this?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At some point, there were three instances passed across several methods that seemed awkward. I managed to work with two instances but I left shared ref anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're still sharing it this way, what is the benefit of
Arc<Mutex<T>>
comparing to justT
here? Seems like unnecessary complexity to me.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Am I correct that you prefer passing two instances of service through methods instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Absolutely, it is like having a Arc and instead of accessing it from two places directly you wrapped it in additional
Arc<Mutex>>
. I simply do not understand why would you even do that.