
replace validator Bloom filter with partial bucket sort #6469

Merged 1 commit into unstable on Aug 2, 2024

Conversation

tersec (Contributor) commented Aug 2, 2024

Prior to Electra, the Bloom filter along with the reversed search order sufficed to improve deposit processing speeds, and nowhere else in the beacon chain spec was it required to look up a validator index from a pubkey. The Bloom filter exploited the fact that finding the actual validator index wasn't even necessary, only determining whether it existed:

def apply_deposit(state: BeaconState,
    ...
    validator_pubkeys = [v.pubkey for v in state.validators]
    if pubkey not in validator_pubkeys:
        # Verify the deposit signature (proof of possession) which is not checked by the deposit contract
        deposit_message = DepositMessage(
            pubkey=pubkey,
            withdrawal_credentials=withdrawal_credentials,
            amount=amount,
        )
    ...

However, in Electra, execution layer withdrawal requests and consolidation requests both require ascertaining the specific validator index, not merely its presence or absence, and the latter does so twice per request.
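For illustration, the Electra-era lookup has roughly the following shape (a paraphrased sketch, not verbatim spec code; the function name here is invented). Unlike apply_deposit above, it needs the concrete index, and each call scans the full validator set:

def find_validator_index(state: BeaconState, pubkey: BLSPubkey) -> Optional[ValidatorIndex]:
    # Sketch only: both the membership test and .index() walk the whole validator list.
    validator_pubkeys = [v.pubkey for v in state.validators]
    if pubkey not in validator_pubkeys:
        return None
    return ValidatorIndex(validator_pubkeys.index(pubkey))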

Because there can be up to 16 withdrawal requests and 1 consolidation request per block, and the consolidation request performs the lookup twice, this might require up to 18 (16 × 1 + 1 × 2) full validator set scans per block for those functions alone.

One obvious approach is to use a Table or similar construction, but even with some clever encoding, that typically requires hundreds (plural) of additional megabytes for this functionality. The lookup isn't required to be O(1), just fast enough to run a few dozen times per block, because as of this PR it's also responsible for deposit processing, which can occur 16 times per block in addition to the deposit requests.

The Bloom filter had been configured to use a single 8MB array. Since Nimbus represents ValidatorIndex as a distinct int32, for 2 million validators in total (the set is already at ~1.6M, counting not just active validators), the bucket sort is equivalent in memory usage. It takes the same approximately 12ms to initialize and still does a full search in about 0.1ms, which is fast enough to construct it per block, avoiding long-lived cache weirdness, and to allow a couple of hundred searches without serious impact on block processing time.
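A back-of-the-envelope check of that memory equivalence, using the figures from the paragraph above:

# 2 million int32 validator indices take about the same space as the single 8MB Bloom filter array.
num_validators = 2_000_000   # sized with headroom above the current ~1.6M
int32_bytes = 4
print(num_validators * int32_bytes)  # 8_000_000 bytes, i.e. ~8MB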

The data structure gets a little convoluted, with a separate one-time-constructed array and a separate free-form seq; this was arrived at after benchmarking. A version which constructed seqs (initially also perfectly allocated and uninitialized) lost 3-4ms, or about 33%, in initialization time, because the nested array/seq indexing combined with boundChecks:on made for a rather hot inner loop. Because it's a short-lived acceleration structure, it can safely assume that these extraItems won't be many, only as many validators as a single block can introduce, and it never appears in a tight loop, so simple is fine.
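As a rough illustration of that shape, here is a Python sketch (not the actual Nim code; the bucket count, key derivation, and names are assumptions made for this example): a one-time-constructed bucket table keyed on a pubkey prefix, plus a small free-form list for validators introduced after construction.

from dataclasses import dataclass, field
from typing import Optional

BUCKET_BITS = 12                      # illustrative choice; real bucket sizing is a tuning decision
NUM_BUCKETS = 1 << BUCKET_BITS

def bucket_of(pubkey: bytes) -> int:
    # Partial sort key: the leading bits of the pubkey select the bucket.
    return int.from_bytes(pubkey[:2], "big") >> (16 - BUCKET_BITS)

@dataclass
class PubkeyIndexTable:
    buckets: list = field(default_factory=lambda: [[] for _ in range(NUM_BUCKETS)])
    extra_items: list = field(default_factory=list)   # (pubkey, index) pairs added after construction

    @classmethod
    def init(cls, validator_pubkeys: list) -> "PubkeyIndexTable":
        # One-time construction: distribute every existing validator index into its bucket.
        table = cls()
        for index, pubkey in enumerate(validator_pubkeys):
            table.buckets[bucket_of(pubkey)].append(index)
        return table

    def find(self, validator_pubkeys: list, pubkey: bytes) -> Optional[int]:
        # Only the matching bucket is scanned: roughly len(validator_pubkeys) / NUM_BUCKETS entries.
        for index in self.buckets[bucket_of(pubkey)]:
            if validator_pubkeys[index] == pubkey:
                return index
        # Fall back to the short list of validators added since construction.
        for key, index in self.extra_items:
            if key == pubkey:
                return index
        return None

    def register(self, pubkey: bytes, index: int) -> None:
        # At most a block's worth of new validators lands here, so a linear scan stays cheap.
        self.extra_items.append((pubkey, index))

Constructed once per block and discarded afterwards, something of this shape keeps each lookup to a small bucket scan rather than a pass over the full validator set.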


github-actions bot commented Aug 2, 2024

Unit Test Results

     9 files ±0    1 334 suites ±0    28m 53s ⏱️ -29s
 5 000 tests ±0    4 652 passed ±0    348 skipped ±0    0 failed ±0
20 880 runs  ±0   20 476 passed ±0    404 skipped ±0    0 failed ±0

Results for commit 378d8b4. Comparison against base commit 6a57697.

♻️ This comment has been updated with latest results.

@tersec enabled auto-merge (squash) August 2, 2024 03:55
@tersec merged commit 8333365 into unstable Aug 2, 2024
13 checks passed
@tersec deleted the imB branch August 2, 2024 05:30
tersec (Contributor, Author) commented Aug 4, 2024

Around 25% faster on ordinary deposit processing, too. Comparing blocks on 2024-08-03 on linux-0x.ih-eu-mda1.nimbus.mainnet hosts, for blocks without deposits:

$ for i in $(seq 1 6); do ssh linux-0$i.ih-eu-mda1.nimbus.mainnet "zgrep -h Block.resolved ${NODE_LOGS}/service.20240803_{13,20}00.log.gz | grep ,.deposits_len.:0,  | jq .stateVerifyDur.value"; done | datamash count 1 mean 1 q1 1 median 1 q3 1
3490	56564476.147278	54566626.5	55893542.5	57634373.75

for stable and

$ for i in $(seq 1 6); do ssh linux-0$i.ih-eu-mda1.nimbus.mainnet "zgrep -h Block.resolved ${NODE_LOGS}/service.20240803_{13,20}00.log.gz | grep ,.deposits_len.:0,  | jq .stateVerifyDur.value"; done | datamash count 1 mean 1 q1 1 median 1 q3 1
3488	57014482.569954	54880211.5	56296166.5	58097381.5

for unstable. It's quicker to scan just those two hours, that window is also when the blocks with deposits occurred, and the sample size suffices regardless.

These are effectively the same, no real change. By contrast, with deposits, stable (without these optimizations) has

$ for i in $(seq 1 6); do ssh linux-0$i.ih-eu-mda1.nimbus.mainnet "zgrep -h Block.resolved ${NODE_LOGS}/service.20240803_{13,20}00.log.gz | grep -v ,.deposits_len.:0,  | jq .stateVerifyDur.value"; done | datamash count 1 mean 1 q1 1 median 1 q3 1
96	381539194.44792	342287028.25	371257674.5	415338661.75

whereas unstable brings that down to

$ for i in $(seq 1 6); do ssh linux-0$i.ih-eu-mda1.nimbus.mainnet "zgrep -h Block.resolved ${NODE_LOGS}/service.20240803_{13,20}00.log.gz | grep -v ,.deposits_len.:0,  | jq .stateVerifyDur.value"; done | datamash count 1 mean 1 q1 1 median 1 q3 1
96	272465554.0625	268435887.5	277284808	289672389

for this 25+% improvement.
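For reference, the relative improvement implied by the two means above:

# mean stateVerifyDur for blocks with deposits, taken from the datamash output above
stable_mean, unstable_mean = 381539194.44792, 272465554.0625
print(f"{(stable_mean - unstable_mean) / stable_mean:.1%}")  # ~28.6%, hence the "25+%" above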
