-
Notifications
You must be signed in to change notification settings - Fork 313
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
accounts-db: Sampled LRU eviction #3943
Conversation
@alessandrod @brooksprumo Marked as a draft,I still want to take some time for more benchmark runs and figuring out how that behaves on the actual MB validator. And also on ensuring about the choice of K parameter. The benchmark is not the best representation of the real workload, but I still expect the overall "numbers go lower" tendency. Feel free to comment on the code already. |
Yeah, this will be the valuable one. A thing to also check is if we should also remove the code the conditionally doesn't update the last accessed time. Because if we're randomly sampling, we want those last accessed times to be more accurate. We'll have to see if there's perf issue with that though, or if it is fast since we don't have to maintain the lru queue anymore. |
Good point. I will benchmark that, theoretically the unconditional update of the timestamp shouldn't change much. if let Some(entry) = self.cache.get(&pubkey) {
if entry.slot == slot {
let update_lru = entry.ms_since_last_update() >= self.ms_to_skip_lru_update;
if update_lru {
entry
.last_update_time
.store(ReadOnlyAccountCacheEntry::timestamp(), Ordering::Release);
}
[...]
}
|
ef1a7d2
to
afbf23f
Compare
1a5ce85
to
e8597e4
Compare
e8597e4
to
84f4a45
Compare
e13717b
to
142cc80
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the idea looks good now. I'll do a deep dive on the code/impl itself next pass.
142cc80
to
92d4309
Compare
e61d65b
to
ed25318
Compare
f4d4d7d
to
7fb5818
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The evict code looks good. I need to do another pass on the tests still.
62ff8de
to
0714d61
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
almost there! left a few comments
8c940b5
to
e3b24e0
Compare
e3b24e0
to
582c165
Compare
Instead of rigid LRU eviction, perform sampled LRU eviction. Sampled LRU eviction takes K random keys and picks the one with the lowest update timestamp. This way, even though the evicted element is not always the oldest in the whole cache, removes the necessity of maintaining a queue and reduces the contention caused by locking the queue. The K parameter is configurable, but the best performing value so far was 8 and it's used as the default one. The new eviction mechanism results in performance improvement in the cache eviction benchmarks: ``` read_only_accounts_cache_eviction_lo_hi/load/8 time: [1.3057 µs 1.3065 µs 1.3076 µs] change: [-63.644% -60.958% -57.959%] (p = 0.00 < 0.05) Performance has improved. read_only_accounts_cache_eviction_lo_hi/store/8 time: [2.1751 µs 2.1809 µs 2.1856 µs] change: [-79.874% -79.642% -79.420%] (p = 0.00 < 0.05) Performance has improved. Found 3 outliers among 100 measurements (3.00%) 3 (3.00%) low mild read_only_accounts_cache_eviction_lo_hi/load/16 time: [2.3593 µs 2.3614 µs 2.3630 µs] change: [-56.035% -52.626% -48.887%] (p = 0.00 < 0.05) Performance has improved. read_only_accounts_cache_eviction_lo_hi/store/16 time: [3.1320 µs 3.1436 µs 3.1558 µs] change: [-85.141% -84.976% -84.813%] (p = 0.00 < 0.05) Performance has improved. Found 13 outliers among 100 measurements (13.00%) 7 (7.00%) low severe 5 (5.00%) low mild 1 (1.00%) high severe read_only_accounts_cache_eviction_lo_hi/load/32 time: [1.5572 µs 1.5614 µs 1.5662 µs] change: [-85.886% -84.973% -83.975%] (p = 0.00 < 0.05) Performance has improved. Found 9 outliers among 100 measurements (9.00%) 8 (8.00%) high mild 1 (1.00%) high severe read_only_accounts_cache_eviction_lo_hi/store/32 time: [3.3965 µs 3.4077 µs 3.4184 µs] change: [-90.858% -90.724% -90.585%] (p = 0.00 < 0.05) Performance has improved. Found 7 outliers among 100 measurements (7.00%) 5 (5.00%) low severe 2 (2.00%) low mild read_only_accounts_cache_eviction_lo_hi/load/64 time: [2.2669 µs 2.5165 µs 2.7736 µs] change: [-91.867% -91.083% -90.189%] (p = 0.00 < 0.05) Performance has improved. Found 2 outliers among 100 measurements (2.00%) 2 (2.00%) high mild read_only_accounts_cache_eviction_lo_hi/store/64 time: [6.6097 µs 6.9530 µs 7.2983 µs] change: [-91.090% -90.788% -90.417%] (p = 0.00 < 0.05) Performance has improved. Found 26 outliers among 100 measurements (26.00%) 3 (3.00%) low severe 7 (7.00%) low mild 6 (6.00%) high mild 10 (10.00%) high severe read_only_accounts_cache_eviction_hi/load/8 time: [2.1554 µs 2.1580 µs 2.1600 µs] change: [-44.674% -39.965% -34.699%] (p = 0.00 < 0.05) Performance has improved. read_only_accounts_cache_eviction_hi/store/8 time: [2.7438 µs 2.7580 µs 2.7689 µs] change: [-80.454% -80.144% -79.805%] (p = 0.00 < 0.05) Performance has improved. read_only_accounts_cache_eviction_hi/load/16 time: [1.9593 µs 1.9604 µs 1.9619 µs] change: [-64.954% -61.744% -58.367%] (p = 0.00 < 0.05) Performance has improved. Found 1 outliers among 100 measurements (1.00%) 1 (1.00%) high mild read_only_accounts_cache_eviction_hi/store/16 time: [2.7436 µs 2.7581 µs 2.7697 µs] change: [-84.959% -84.721% -84.489%] (p = 0.00 < 0.05) Performance has improved. Found 1 outliers among 100 measurements (1.00%) 1 (1.00%) low mild read_only_accounts_cache_eviction_hi/load/32 time: [2.5226 µs 2.5260 µs 2.5301 µs] change: [-79.214% -77.822% -76.252%] (p = 0.00 < 0.05) Performance has improved. Found 8 outliers among 100 measurements (8.00%) 8 (8.00%) low mild read_only_accounts_cache_eviction_hi/store/32 time: [4.0885 µs 4.1015 µs 4.1148 µs] change: [-89.677% -89.496% -89.310%] (p = 0.00 < 0.05) Performance has improved. Found 11 outliers among 100 measurements (11.00%) 7 (7.00%) low severe 4 (4.00%) low mild read_only_accounts_cache_eviction_hi/load/64 time: [4.1474 µs 4.3543 µs 4.5721 µs] change: [-86.307% -85.336% -84.364%] (p = 0.00 < 0.05) Performance has improved. Found 1 outliers among 100 measurements (1.00%) 1 (1.00%) low mild read_only_accounts_cache_eviction_hi/store/64 time: [5.8232 µs 6.3214 µs 6.7945 µs] change: [-93.624% -93.167% -92.638%] (p = 0.00 < 0.05) Performance has improved. Found 1 outliers among 100 measurements (1.00%) 1 (1.00%) high mild ```
582c165
to
565acd7
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for all the work you've done here! I'm excited for this to merge!
Instead of rigid LRU eviction, perform sampled LRU eviction.
Sampled LRU eviction takes K random keys and picks the one with the lowest update timestamp. This way, even though the evicted element is not always the oldest in the whole cache, removes the necessity of maintaining a queue and reduces the contention caused by locking the queue.
The K parameter is configurable, but the best performing value so far was 8 and it's used as the default one.
The new eviction mechanism results in performance improvement in the cache benchmarks:
The InfluxDB metrics report:
Ref #4272