replace validator Bloom filter with partial bucket sort #6469
Prior to Electra, the Bloom filter, along with the reversed search order, sufficed to improve deposit processing speeds, and nowhere else in the beacon chain spec was it necessary to look up a validator index from a pubkey. The Bloom filter exploited the fact that finding the actual validator index wasn't even necessary, only whether it existed.
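To illustrate the distinction, here is a minimal, hypothetical sketch (in Python, not the Nimbus Nim code) of a presence-only filter: it can answer "might this pubkey exist?" but has no way to return the validator's index.

```python
import hashlib

class PubkeyBloomFilter:
    """Presence-only filter: answers membership queries probabilistically,
    but cannot recover a validator index (illustrative sketch only)."""

    def __init__(self, size_bits: int) -> None:
        self.size_bits = size_bits
        self.bits = bytearray(size_bits // 8)

    def _positions(self, pubkey: bytes) -> list[int]:
        # Derive two bit positions from a hash of the pubkey.
        digest = hashlib.sha256(pubkey).digest()
        return [int.from_bytes(digest[i:i + 8], "little") % self.size_bits
                for i in (0, 8)]

    def add(self, pubkey: bytes) -> None:
        for pos in self._positions(pubkey):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, pubkey: bytes) -> bool:
        # False means definitely absent; True means probably present.
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(pubkey))
```

Note that `might_contain` returns only a boolean: the index of the matching validator is simply not stored anywhere in the structure.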
However, in Electra, execution layer withdrawal requests and consolidation requests both require ascertaining the specific validator index, not merely its presence or absence, and the latter does so twice per request.
Because there can be up to 16 withdrawal requests and 1 consolidation request per block, this might require up to 18 full validator set scans per block (16 × 1 + 1 × 2 = 18) for those functions alone.
One obvious approach is to use a `Table` or similar construction, but even with some clever encoding, that typically requires hundreds of additional megabytes for this functionality. It's not required to be O(1), just fast enough to run a few dozen times per block, because as of this PR it's now responsible for deposit processing too, which can occur 16 times per block, in addition to the deposit requests.

The Bloom filter had been configured to use a single 8 MB array, and since Nimbus represents `ValidatorIndex` as `distinct int32`, for 2 million validators total (already at ~1.6M, not just active) the bucket sort is equivalent in memory usage (2 M × 4 bytes = 8 MB), while taking the same approximately 12 ms to initialize and still doing a full search in about 0.1 ms. That is fast enough to construct it per block, which avoids long-lived cache weirdness, and allows a couple of hundred searches without serious impact on block processing time.

The data structure gets a little convoluted, with a separate one-time-constructed array and a separate free-form `seq`, an arrangement arrived at after benchmarking: a version which constructed (initially also perfectly allocated and uninitialized) `seq`s lost in initialization time by 3-4 ms, or about 33%, because the nested `array`/`seq` indexing combined with `boundChecks:on` made for a rather hot inner loop. Because it's a short-lived acceleration structure, it can safely assume that these `extraItems` won't be many (only as many validators as a single block can introduce), and it never appears in a tight loop, so simple is fine.