Remove mmap isLoaded check before madvise #14156

Open
wants to merge 1 commit into main
Conversation

original-brownbear (Member) commented Jan 21, 2025

The counter guarding the `madvise` callback is not all that effective in preventing `madvise` calls on slices, since it's not shared with the original input, and this keeps showing up hot when profiling ES benchmarks.
An outright `madvise` call should be about as expensive as the `isLoaded` check when things are already in the page cache, and cheaper than that same call plus a call to `isLoaded` that returns `false`, simply by avoiding another syscall roundtrip.

-> remove the `isLoaded` guard to speed up these cases

Excerpt from ES profile motivating this:

[screenshot: profile excerpt]

Checking the page table for loaded pages costs on the order of many thousands of cycles in practice.
Plus, the racy nature of such a check reduces its value in memory-constrained scenarios.
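
To make the description concrete, here is a minimal sketch of the guard pattern being discussed, assuming a per-input counter and a native madvise wrapper; the class name, threshold, and everything other than `isLoaded`, `prefetch`, and `consecutivePrefetchHitCount` are hypothetical, and this is not the actual Lucene implementation:

```java
import java.lang.foreign.MemorySegment;

// Simplified sketch of the pattern under discussion, not the actual Lucene code.
class PrefetchingInput {
  private static final int HIT_THRESHOLD = 64; // hypothetical back-off threshold

  private final MemorySegment segment;
  // Per-instance counter: slice() creates a new input with a fresh counter,
  // so the guard below rarely triggers for short-lived slices.
  private int consecutivePrefetchHitCount;

  PrefetchingInput(MemorySegment segment) {
    this.segment = segment;
  }

  void prefetch(long offset, long length) {
    if (consecutivePrefetchHitCount >= HIT_THRESHOLD) {
      return; // input looks mostly resident: skip both isLoaded and madvise
    }
    MemorySegment range = segment.asSlice(offset, length);
    if (range.isLoaded()) { // page-table walk: the cost showing up in the ES profile
      consecutivePrefetchHitCount++;
      return;
    }
    consecutivePrefetchHitCount = 0;
    nativeMadviseWillNeed(range); // what this PR would call unconditionally instead
  }

  PrefetchingInput slice(long offset, long length) {
    // A new instance starts counting from zero: the guard state is not shared.
    return new PrefetchingInput(segment.asSlice(offset, length));
  }

  private static void nativeMadviseWillNeed(MemorySegment range) {
    // Placeholder for a native posix_madvise(..., POSIX_MADV_WILLNEED) call.
  }
}
```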

ChrisHegarty (Contributor)

The overhead of `MS::isLoaded` is certainly not a good tradeoff here, as can be seen from the profile that you posted, @original-brownbear. Given the new code, `consecutivePrefetchHitCount` no longer seems effective at all.

Additionally, and orthogonally, I'm less sure of the checks anyway, since they don't count against the actual memory range but rather against the whole segment.

jpountz (Contributor) commented Jan 21, 2025

> An outright `madvise` call should be about as expensive as the `isLoaded` check when things are already in the page cache

The PR where `consecutivePrefetchHitCount` was introduced had a benchmark that demonstrated an improvement with the current logic: #13381 (comment). That said, that benchmark had a different access pattern than this one, which I guess consists of terms dictionary lookups performed by `TermQuery`, so I can believe that it's not always better. I wonder if you are able to create a benchmark that shows an improvement with this change.

> The counter guarding the `madvise` callback is not all that effective in preventing `madvise` calls on slices since it's not shared with the original input

Thinking out loud, this `isLoaded()` check only makes sense when `prefetch()` is called multiple times on the same `IndexInput`, which is never the case for terms dictionary lookups (as you pointed out). So another way of improving things here may be to disable the `isLoaded()` check on the first (or first few) calls to `prefetch()`.

> Additionally, and orthogonally, I'm less sure of the checks anyway, since they don't count against the actual memory range but rather against the whole segment.

These checks assume that if many of the byte ranges on which `prefetch()` has been called were already loaded in memory, then there is a good chance that most of the segment is loaded in memory.

original-brownbear (Member, Author)

@jpountz

> was introduced had a benchmark that demonstrated an improvement with the current logic

Huh, those results are quite unexpected, I must admit :) When Chris and I reasoned about this, the idea was that `madvise` ("will need") over a range that is fully cached would cost about the same as the `isLoaded` check. If that assumption holds, this PR would be a clear winner, but it seems that benchmark suggests otherwise?

jpountz (Contributor) commented Jan 22, 2025

Well, you may be right as well that the cost of `MS::isLoaded` is of a similar order of magnitude as `madvise`. What the current logic does is that if `MS::isLoaded` frequently returns `true` (suggesting that a good share of the segment is loaded), then both the call to `MS::isLoaded` and the call to `madvise` are skipped; this is likely where the speedup in the benchmark that I linked is coming from.

What do you think of skipping the `isLoaded()` check for the first per-`IndexInput` call to `prefetch()`? This should help in the case that you identified (a single `prefetch()` call per `IndexInput` slice instance), while not defeating the current logic that skips calls to `MS::isLoaded` / `madvise` for index inputs that appear to be mostly loaded already when doing several `prefetch()` calls on the same slice.
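
A rough sketch of that variant, reusing the hypothetical class from the earlier sketch (the `firstPrefetch` flag and all other names are made up for illustration; this is not a proposed patch):

```java
  private boolean firstPrefetch = true; // hypothetical per-IndexInput flag

  void prefetch(long offset, long length) {
    MemorySegment range = segment.asSlice(offset, length);
    if (firstPrefetch) {
      firstPrefetch = false;
      nativeMadviseWillNeed(range); // first call: madvise directly, no isLoaded check,
      return;                       // so single-prefetch slices skip the page-table walk
    }
    if (consecutivePrefetchHitCount >= HIT_THRESHOLD) {
      return; // input looks mostly resident: skip both isLoaded and madvise
    }
    if (range.isLoaded()) {
      consecutivePrefetchHitCount++;
    } else {
      consecutivePrefetchHitCount = 0;
      nativeMadviseWillNeed(range);
    }
  }
```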

original-brownbear (Member, Author) commented Jan 22, 2025

@jpountz I see. Hmm, I wonder how much that saves us? It seems we just trade an `isLoaded` for a `madvise` on systems with enough memory? That said, maybe the `madvise` is far cheaper because it doesn't necessarily cover the whole segment like the `isLoaded` check?

EDIT: sorry, scratch that question; both run against the same code, I misread things. So I'm not sure how much that buys us on the fast path. Fundamentally, what it comes down to on my machine when benchmarking is that if everything is already in the page cache, then either `madvise` or `isLoaded` incurs significant overhead for no gain. Not sure there's a no-tradeoffs way out of this, probably not :) The faster your disk and the more memory you have, the more pronounced the downsides of this logic become, and vice versa for the gains :)
Maybe the trick here would be to keep track of the loaded state in a way that works across slices, so that we throttle more efficiently?

jpountz (Contributor) commented Jan 22, 2025

> It seems we just trade an `isLoaded` for a `madvise` on systems with enough memory?

This is correct. I made this suggestion because it was similar to your initial proposal: skipping the `MS::isLoaded` check and doing a `madvise` call directly instead.

> Maybe the trick here would be to keep track of the loaded state in a way that works across slices, so that we throttle more efficiently?

I'm happy to explore sharing state across slices too. I optimized for simplicity in the current implementation, but we may be able to figure out something that works well across slices as well.
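
For what it's worth, one possible shape for such shared state, as a hedged sketch with made-up names: a small holder owned by the top-level input and handed to every slice/clone, so back-off decisions made on one slice benefit the others.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch only: a hit counter shared by an input and all of its slices.
final class SharedPrefetchState {
  private static final int HIT_THRESHOLD = 64; // made-up threshold

  // Atomic because slices/clones of the same file may be used from different threads.
  private final AtomicInteger consecutiveHits = new AtomicInteger();

  boolean looksResident() {
    return consecutiveHits.get() >= HIT_THRESHOLD;
  }

  void recordHit() { // isLoaded returned true
    consecutiveHits.incrementAndGet();
  }

  void recordMiss() { // isLoaded returned false and madvise was issued
    consecutiveHits.set(0);
  }
}
// The top-level input would create one SharedPrefetchState and pass the same
// instance to every slice()/clone(), instead of each slice starting from zero.
```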
