accounts-db: Sampled LRU eviction #3943

vadorovsky · 2024-12-05T12:00:00Z

Instead of rigid LRU eviction, perform sampled LRU eviction.

Sampled LRU eviction takes K random keys and picks the one with the lowest update timestamp. This way, even though the evicted element is not always the oldest in the whole cache, removes the necessity of maintaining a queue and reduces the contention caused by locking the queue.

The K parameter is configurable, but the best performing value so far was 8 and it's used as the default one.

The new eviction mechanism results in performance improvement in the cache benchmarks:

read_only_accounts_cache_eviction_lo_hi/load/8
                        time:   [1.3057 µs 1.3065 µs 1.3076 µs]
                        change: [-63.644% -60.958% -57.959%] (p = 0.00 < 0.05)
                        Performance has improved.
read_only_accounts_cache_eviction_lo_hi/store/8
                        time:   [2.1751 µs 2.1809 µs 2.1856 µs]
                        change: [-79.874% -79.642% -79.420%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  3 (3.00%) low mild
read_only_accounts_cache_eviction_lo_hi/load/16
                        time:   [2.3593 µs 2.3614 µs 2.3630 µs]
                        change: [-56.035% -52.626% -48.887%] (p = 0.00 < 0.05)
                        Performance has improved.
read_only_accounts_cache_eviction_lo_hi/store/16
                        time:   [3.1320 µs 3.1436 µs 3.1558 µs]
                        change: [-85.141% -84.976% -84.813%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 13 outliers among 100 measurements (13.00%)
  7 (7.00%) low severe
  5 (5.00%) low mild
  1 (1.00%) high severe
read_only_accounts_cache_eviction_lo_hi/load/32
                        time:   [1.5572 µs 1.5614 µs 1.5662 µs]
                        change: [-85.886% -84.973% -83.975%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 9 outliers among 100 measurements (9.00%)
  8 (8.00%) high mild
  1 (1.00%) high severe
read_only_accounts_cache_eviction_lo_hi/store/32
                        time:   [3.3965 µs 3.4077 µs 3.4184 µs]
                        change: [-90.858% -90.724% -90.585%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  5 (5.00%) low severe
  2 (2.00%) low mild
read_only_accounts_cache_eviction_lo_hi/load/64
                        time:   [2.2669 µs 2.5165 µs 2.7736 µs]
                        change: [-91.867% -91.083% -90.189%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild
read_only_accounts_cache_eviction_lo_hi/store/64
                        time:   [6.6097 µs 6.9530 µs 7.2983 µs]
                        change: [-91.090% -90.788% -90.417%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 26 outliers among 100 measurements (26.00%)
  3 (3.00%) low severe
  7 (7.00%) low mild
  6 (6.00%) high mild
  10 (10.00%) high severe

read_only_accounts_cache_eviction_hi/load/8
                        time:   [2.1554 µs 2.1580 µs 2.1600 µs]
                        change: [-44.674% -39.965% -34.699%] (p = 0.00 < 0.05)
                        Performance has improved.
read_only_accounts_cache_eviction_hi/store/8
                        time:   [2.7438 µs 2.7580 µs 2.7689 µs]
                        change: [-80.454% -80.144% -79.805%] (p = 0.00 < 0.05)
                        Performance has improved.
read_only_accounts_cache_eviction_hi/load/16
                        time:   [1.9593 µs 1.9604 µs 1.9619 µs]
                        change: [-64.954% -61.744% -58.367%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild
read_only_accounts_cache_eviction_hi/store/16
                        time:   [2.7436 µs 2.7581 µs 2.7697 µs]
                        change: [-84.959% -84.721% -84.489%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) low mild
read_only_accounts_cache_eviction_hi/load/32
                        time:   [2.5226 µs 2.5260 µs 2.5301 µs]
                        change: [-79.214% -77.822% -76.252%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
  8 (8.00%) low mild
read_only_accounts_cache_eviction_hi/store/32
                        time:   [4.0885 µs 4.1015 µs 4.1148 µs]
                        change: [-89.677% -89.496% -89.310%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 11 outliers among 100 measurements (11.00%)
  7 (7.00%) low severe
  4 (4.00%) low mild
read_only_accounts_cache_eviction_hi/load/64
                        time:   [4.1474 µs 4.3543 µs 4.5721 µs]
                        change: [-86.307% -85.336% -84.364%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) low mild
read_only_accounts_cache_eviction_hi/store/64
                        time:   [5.8232 µs 6.3214 µs 6.7945 µs]
                        change: [-93.624% -93.167% -92.638%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild

The InfluxDB metrics report:

A slight improvement in load times in general and less spikes.
A significant improvement in store times and less spikes.
Longer evict times, but that's expected and we are willing to accept it as a trade-off.

Ref #4272

vadorovsky · 2024-12-05T12:02:35Z

@alessandrod @brooksprumo Marked as a draft,I still want to take some time for more benchmark runs and figuring out how that behaves on the actual MB validator. And also on ensuring about the choice of K parameter. The benchmark is not the best representation of the real workload, but I still expect the overall "numbers go lower" tendency.

Feel free to comment on the code already.

accounts-db/src/read_only_accounts_cache.rs

brooksprumo · 2024-12-05T20:08:15Z

behaves on the actual MB validator

Yeah, this will be the valuable one.

A thing to also check is if we should also remove the code the conditionally doesn't update the last accessed time. Because if we're randomly sampling, we want those last accessed times to be more accurate. We'll have to see if there's perf issue with that though, or if it is fast since we don't have to maintain the lru queue anymore.

vadorovsky · 2024-12-06T08:26:07Z

behaves on the actual MB validator

Yeah, this will be the valuable one.

A thing to also check is if we should also remove the code the conditionally doesn't update the last accessed time. Because if we're randomly sampling, we want those last accessed times to be more accurate. We'll have to see if there's perf issue with that though, or if it is fast since we don't have to maintain the lru queue anymore.

Good point. I will benchmark that, theoretically the unconditional update of the timestamp shouldn't change much.

            if let Some(entry) = self.cache.get(&pubkey) {
                if entry.slot == slot {
                    let update_lru = entry.ms_since_last_update() >= self.ms_to_skip_lru_update;
                    if update_lru {
                        entry
                            .last_update_time
                            .store(ReadOnlyAccountCacheEntry::timestamp(), Ordering::Release);
                    }
                    [...]
            }

self.cache.get returns Ref, which acquires a read lock. Updating the timestamp without write lock works, because it's an atomic field. So since we got rid of the queue, we don't have to write lock anything.

accounts-db/src/read_only_accounts_cache.rs

brooksprumo

I think the idea looks good now. I'll do a deep dive on the code/impl itself next pass.

accounts-db/src/read_only_accounts_cache.rs

brooksprumo

The evict code looks good. I need to do another pass on the tests still.

accounts-db/src/read_only_accounts_cache.rs

accounts-db/src/accounts_db.rs

accounts-db/src/read_only_accounts_cache.rs

alessandrod

almost there! left a few comments

accounts-db/src/accounts_db.rs

accounts-db/src/read_only_accounts_cache.rs

accounts-db/src/accounts_db.rs

accounts-db/src/read_only_accounts_cache.rs

Instead of rigid LRU eviction, perform sampled LRU eviction. Sampled LRU eviction takes K random keys and picks the one with the lowest update timestamp. This way, even though the evicted element is not always the oldest in the whole cache, removes the necessity of maintaining a queue and reduces the contention caused by locking the queue. The K parameter is configurable, but the best performing value so far was 8 and it's used as the default one. The new eviction mechanism results in performance improvement in the cache eviction benchmarks: ``` read_only_accounts_cache_eviction_lo_hi/load/8 time: [1.3057 µs 1.3065 µs 1.3076 µs] change: [-63.644% -60.958% -57.959%] (p = 0.00 < 0.05) Performance has improved. read_only_accounts_cache_eviction_lo_hi/store/8 time: [2.1751 µs 2.1809 µs 2.1856 µs] change: [-79.874% -79.642% -79.420%] (p = 0.00 < 0.05) Performance has improved. Found 3 outliers among 100 measurements (3.00%) 3 (3.00%) low mild read_only_accounts_cache_eviction_lo_hi/load/16 time: [2.3593 µs 2.3614 µs 2.3630 µs] change: [-56.035% -52.626% -48.887%] (p = 0.00 < 0.05) Performance has improved. read_only_accounts_cache_eviction_lo_hi/store/16 time: [3.1320 µs 3.1436 µs 3.1558 µs] change: [-85.141% -84.976% -84.813%] (p = 0.00 < 0.05) Performance has improved. Found 13 outliers among 100 measurements (13.00%) 7 (7.00%) low severe 5 (5.00%) low mild 1 (1.00%) high severe read_only_accounts_cache_eviction_lo_hi/load/32 time: [1.5572 µs 1.5614 µs 1.5662 µs] change: [-85.886% -84.973% -83.975%] (p = 0.00 < 0.05) Performance has improved. Found 9 outliers among 100 measurements (9.00%) 8 (8.00%) high mild 1 (1.00%) high severe read_only_accounts_cache_eviction_lo_hi/store/32 time: [3.3965 µs 3.4077 µs 3.4184 µs] change: [-90.858% -90.724% -90.585%] (p = 0.00 < 0.05) Performance has improved. Found 7 outliers among 100 measurements (7.00%) 5 (5.00%) low severe 2 (2.00%) low mild read_only_accounts_cache_eviction_lo_hi/load/64 time: [2.2669 µs 2.5165 µs 2.7736 µs] change: [-91.867% -91.083% -90.189%] (p = 0.00 < 0.05) Performance has improved. Found 2 outliers among 100 measurements (2.00%) 2 (2.00%) high mild read_only_accounts_cache_eviction_lo_hi/store/64 time: [6.6097 µs 6.9530 µs 7.2983 µs] change: [-91.090% -90.788% -90.417%] (p = 0.00 < 0.05) Performance has improved. Found 26 outliers among 100 measurements (26.00%) 3 (3.00%) low severe 7 (7.00%) low mild 6 (6.00%) high mild 10 (10.00%) high severe read_only_accounts_cache_eviction_hi/load/8 time: [2.1554 µs 2.1580 µs 2.1600 µs] change: [-44.674% -39.965% -34.699%] (p = 0.00 < 0.05) Performance has improved. read_only_accounts_cache_eviction_hi/store/8 time: [2.7438 µs 2.7580 µs 2.7689 µs] change: [-80.454% -80.144% -79.805%] (p = 0.00 < 0.05) Performance has improved. read_only_accounts_cache_eviction_hi/load/16 time: [1.9593 µs 1.9604 µs 1.9619 µs] change: [-64.954% -61.744% -58.367%] (p = 0.00 < 0.05) Performance has improved. Found 1 outliers among 100 measurements (1.00%) 1 (1.00%) high mild read_only_accounts_cache_eviction_hi/store/16 time: [2.7436 µs 2.7581 µs 2.7697 µs] change: [-84.959% -84.721% -84.489%] (p = 0.00 < 0.05) Performance has improved. Found 1 outliers among 100 measurements (1.00%) 1 (1.00%) low mild read_only_accounts_cache_eviction_hi/load/32 time: [2.5226 µs 2.5260 µs 2.5301 µs] change: [-79.214% -77.822% -76.252%] (p = 0.00 < 0.05) Performance has improved. Found 8 outliers among 100 measurements (8.00%) 8 (8.00%) low mild read_only_accounts_cache_eviction_hi/store/32 time: [4.0885 µs 4.1015 µs 4.1148 µs] change: [-89.677% -89.496% -89.310%] (p = 0.00 < 0.05) Performance has improved. Found 11 outliers among 100 measurements (11.00%) 7 (7.00%) low severe 4 (4.00%) low mild read_only_accounts_cache_eviction_hi/load/64 time: [4.1474 µs 4.3543 µs 4.5721 µs] change: [-86.307% -85.336% -84.364%] (p = 0.00 < 0.05) Performance has improved. Found 1 outliers among 100 measurements (1.00%) 1 (1.00%) low mild read_only_accounts_cache_eviction_hi/store/64 time: [5.8232 µs 6.3214 µs 6.7945 µs] change: [-93.624% -93.167% -92.638%] (p = 0.00 < 0.05) Performance has improved. Found 1 outliers among 100 measurements (1.00%) 1 (1.00%) high mild ```

brooksprumo

Thanks for all the work you've done here! I'm excited for this to merge!

vadorovsky requested review from alessandrod and brooksprumo December 5, 2024 12:00

vadorovsky commented Dec 5, 2024

View reviewed changes

accounts-db/src/read_only_accounts_cache.rs Outdated Show resolved Hide resolved

vadorovsky force-pushed the sampled-lru-eviction branch 4 times, most recently from ef1a7d2 to afbf23f Compare December 10, 2024 10:33

vadorovsky force-pushed the sampled-lru-eviction branch 7 times, most recently from 1a5ce85 to e8597e4 Compare December 26, 2024 07:13

vadorovsky force-pushed the sampled-lru-eviction branch from e8597e4 to 84f4a45 Compare December 30, 2024 10:22

brooksprumo reviewed Dec 30, 2024

View reviewed changes

accounts-db/src/read_only_accounts_cache.rs Outdated Show resolved Hide resolved

brooksprumo self-requested a review December 30, 2024 18:37

vadorovsky force-pushed the sampled-lru-eviction branch 6 times, most recently from e13717b to 142cc80 Compare January 2, 2025 09:33

brooksprumo reviewed Jan 2, 2025

View reviewed changes

accounts-db/src/read_only_accounts_cache.rs Outdated Show resolved Hide resolved

accounts-db/src/read_only_accounts_cache.rs Show resolved Hide resolved

brooksprumo self-requested a review January 2, 2025 16:12

vadorovsky force-pushed the sampled-lru-eviction branch from 142cc80 to 92d4309 Compare January 3, 2025 11:53

vadorovsky mentioned this pull request Jan 3, 2025

accounts-db: Improve sharding of the read-only cache #4255

Draft

vadorovsky force-pushed the sampled-lru-eviction branch 2 times, most recently from e61d65b to ed25318 Compare January 13, 2025 14:53

brooksprumo reviewed Jan 13, 2025

View reviewed changes

accounts-db/src/read_only_accounts_cache.rs Outdated Show resolved Hide resolved

accounts-db/src/read_only_accounts_cache.rs Outdated Show resolved Hide resolved

vadorovsky force-pushed the sampled-lru-eviction branch 9 times, most recently from f4d4d7d to 7fb5818 Compare January 15, 2025 09:40

brooksprumo reviewed Jan 15, 2025

View reviewed changes

brooksprumo self-requested a review January 15, 2025 15:35

vadorovsky force-pushed the sampled-lru-eviction branch 3 times, most recently from 62ff8de to 0714d61 Compare January 15, 2025 19:54

alessandrod reviewed Jan 16, 2025

View reviewed changes

vadorovsky force-pushed the sampled-lru-eviction branch 3 times, most recently from 8c940b5 to e3b24e0 Compare January 16, 2025 08:38

alessandrod reviewed Jan 16, 2025

View reviewed changes

accounts-db/src/accounts_db.rs Outdated Show resolved Hide resolved

vadorovsky force-pushed the sampled-lru-eviction branch from e3b24e0 to 582c165 Compare January 16, 2025 08:59

alessandrod reviewed Jan 16, 2025

View reviewed changes

accounts-db/src/read_only_accounts_cache.rs Show resolved Hide resolved

vadorovsky force-pushed the sampled-lru-eviction branch from 582c165 to 565acd7 Compare January 16, 2025 10:30

alessandrod approved these changes Jan 16, 2025

View reviewed changes

brooksprumo approved these changes Jan 16, 2025

View reviewed changes

vadorovsky merged commit 1b99961 into anza-xyz:master Jan 16, 2025
30 checks passed

vadorovsky deleted the sampled-lru-eviction branch January 16, 2025 14:13

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

accounts-db: Sampled LRU eviction #3943

accounts-db: Sampled LRU eviction #3943

vadorovsky commented Dec 5, 2024 •

edited

Loading

vadorovsky commented Dec 5, 2024 •

edited

Loading

brooksprumo commented Dec 5, 2024 •

edited

Loading

vadorovsky commented Dec 6, 2024

brooksprumo left a comment

brooksprumo left a comment

alessandrod left a comment

brooksprumo left a comment

accounts-db: Sampled LRU eviction #3943

accounts-db: Sampled LRU eviction #3943

Conversation

vadorovsky commented Dec 5, 2024 • edited Loading

vadorovsky commented Dec 5, 2024 • edited Loading

brooksprumo commented Dec 5, 2024 • edited Loading

vadorovsky commented Dec 6, 2024

brooksprumo left a comment

Choose a reason for hiding this comment

brooksprumo left a comment

Choose a reason for hiding this comment

alessandrod left a comment

Choose a reason for hiding this comment

brooksprumo left a comment

Choose a reason for hiding this comment

vadorovsky commented Dec 5, 2024 •

edited

Loading

vadorovsky commented Dec 5, 2024 •

edited

Loading

brooksprumo commented Dec 5, 2024 •

edited

Loading