replace validator Bloom filter with partial bucket sort #6469
Prior to Electra, the Bloom filter, along with the reversed search order, sufficed to improve deposit processing speeds, and nowhere else in the beacon chain spec was it necessary to look up a validator index from a pubkey. The Bloom filter exploited the fact that finding the actual validator index wasn't even necessary, only whether it existed.
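To illustrate the distinction, here is a minimal, hypothetical sketch (in Python, not the Nimbus Nim code) of a presence-only filter: it can answer "might this pubkey exist?" but has no way to return the validator's index.

```python
import hashlib

class PubkeyBloomFilter:
    """Presence-only filter: answers membership queries probabilistically,
    but cannot recover a validator index (illustrative sketch only)."""

    def __init__(self, size_bits: int) -> None:
        self.size_bits = size_bits
        self.bits = bytearray(size_bits // 8)

    def _positions(self, pubkey: bytes) -> list[int]:
        # Derive two bit positions from a hash of the pubkey.
        digest = hashlib.sha256(pubkey).digest()
        return [int.from_bytes(digest[i:i + 8], "little") % self.size_bits
                for i in (0, 8)]

    def add(self, pubkey: bytes) -> None:
        for pos in self._positions(pubkey):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, pubkey: bytes) -> bool:
        # False means definitely absent; True means probably present.
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(pubkey))
```

Note that `might_contain` returns only a boolean: the index of the matching validator is simply not stored anywhere in the structure.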
However, in Electra, execution layer withdrawal requests and consolidation requests both require ascertaining the specific validator index, not merely its presence or absence, and the latter does so twice per request.
Because there can be up to 16 withdrawal requests and 1 consolidation request per block, this might require up to 18 full validator set scans per block (16 × 1 + 1 × 2 = 18) for those functions alone.
One obvious approach is to use a `Table` or similar construction, but even with some clever encoding, that typically requires hundreds of additional megabytes for this functionality. It's not required to be O(1), just fast enough to run a few dozen times per block, because as of this PR it's now responsible for deposit processing too, which can occur 16 times per block, in addition to the deposit requests.

The Bloom filter had been configured to use a single 8 MB array, and since Nimbus represents `ValidatorIndex` as `distinct int32`, for 2 million validators total (already at ~1.6M, not just active) the bucket sort is equivalent in memory usage (2 M × 4 bytes = 8 MB), while taking the same approximately 12 ms to initialize and still doing a full search in about 0.1 ms. That is fast enough to construct it per block, which avoids long-lived cache weirdness, and allows a couple of hundred searches without serious impact on block processing time.

The data structure gets a little convoluted, with a separate one-time-constructed array and a separate free-form `seq`, an arrangement arrived at after benchmarking: a version which constructed (initially also perfectly allocated and uninitialized) `seq`s lost in initialization time by 3-4 ms, or about 33%, because the nested `array`/`seq` indexing combined with `boundChecks:on` made for a rather hot inner loop. Because it's a short-lived acceleration structure, it can safely assume that these `extraItems` won't be many (only as many validators as a single block can introduce), and it never appears in a tight loop, so simple is fine.