Remove mmap isLoaded check before madvise #14156

Open
wants to merge 1 commit into main
Conversation

original-brownbear (Member) commented Jan 21, 2025

The counter guarding the `madvise` callback is not all that effective in preventing `madvise` calls on slices, since it's not shared with the original input, and this keeps showing up hot when profiling ES benchmarks.
An outright `madvise` call should be about as expensive as the `isLoaded` check when things are already in the page cache, and cheaper than that same call plus a call to `isLoaded` that returns `false`, simply by avoiding another syscall roundtrip.

-> remove the `isLoaded` guard to speed up these cases

Excerpt from ES profile motivating this:

[screenshot: profile excerpt]

Checking the page table for loaded pages costs on the order of many thousands of cycles in practice.
Plus, the racy nature of such a check reduces its value in memory-constrained scenarios.
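
To make the description concrete, here is a minimal sketch of the guard pattern being discussed, assuming a per-input counter and a native madvise wrapper; the class name, threshold, and everything other than `isLoaded`, `prefetch`, and `consecutivePrefetchHitCount` are hypothetical, and this is not the actual Lucene implementation:

```java
import java.lang.foreign.MemorySegment;

// Simplified sketch of the pattern under discussion, not the actual Lucene code.
class PrefetchingInput {
  private static final int HIT_THRESHOLD = 64; // hypothetical back-off threshold

  private final MemorySegment segment;
  // Per-instance counter: slice() creates a new input with a fresh counter,
  // so the guard below rarely triggers for short-lived slices.
  private int consecutivePrefetchHitCount;

  PrefetchingInput(MemorySegment segment) {
    this.segment = segment;
  }

  void prefetch(long offset, long length) {
    if (consecutivePrefetchHitCount >= HIT_THRESHOLD) {
      return; // input looks mostly resident: skip both isLoaded and madvise
    }
    MemorySegment range = segment.asSlice(offset, length);
    if (range.isLoaded()) { // page-table walk: the cost showing up in the ES profile
      consecutivePrefetchHitCount++;
      return;
    }
    consecutivePrefetchHitCount = 0;
    nativeMadviseWillNeed(range); // what this PR would call unconditionally instead
  }

  PrefetchingInput slice(long offset, long length) {
    // A new instance starts counting from zero: the guard state is not shared.
    return new PrefetchingInput(segment.asSlice(offset, length));
  }

  private static void nativeMadviseWillNeed(MemorySegment range) {
    // Placeholder for a native posix_madvise(..., POSIX_MADV_WILLNEED) call.
  }
}
```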

ChrisHegarty (Contributor)

The overhead of `MS::isLoaded` is certainly not a good tradeoff here, as can be seen from the profile that you posted, @original-brownbear. Given the new code, `consecutivePrefetchHitCount` no longer seems effective at all.

Additionally, and orthogonally, I'm less sure of the checks anyway, since they don't count against the actual memory range but rather against the whole segment.

jpountz (Contributor) commented Jan 21, 2025

> An outright `madvise` call should be about as expensive as the `isLoaded` check when things are already in the page cache

The PR where `consecutivePrefetchHitCount` was introduced had a benchmark that demonstrated an improvement with the current logic: #13381 (comment). That said, that benchmark had a different access pattern than this one, which I guess consists of terms dictionary lookups performed by `TermQuery`, so I can believe that it's not always better. I wonder if you are able to create a benchmark that shows an improvement with this change.

> The counter guarding the `madvise` callback is not all that effective in preventing `madvise` calls on slices since it's not shared with the original input

Thinking out loud, this `isLoaded()` check only makes sense when `prefetch()` is called multiple times on the same `IndexInput`, which is never the case for terms dictionary lookups (as you pointed out). So another way of improving things here may be to disable the `isLoaded()` check on the first (or first few) calls to `prefetch()`.

> Additionally, and orthogonally, I'm less sure of the checks anyway, since they don't count against the actual memory range but rather against the whole segment.

These checks assume that if many of the byte ranges on which `prefetch()` has been called were already loaded in memory, then there is a good chance that most of the segment is loaded in memory.

original-brownbear (Member, Author)

@jpountz

> was introduced had a benchmark that demonstrated an improvement with the current logic

Huh, those results are quite unexpected, I must admit :) When Chris and I reasoned about this, the idea was that `madvise` ("will need") over a range that is fully cached would cost about the same as the `isLoaded` check. If that assumption holds, this PR would be a clear winner, but it seems that benchmark suggests otherwise?

jpountz (Contributor) commented Jan 22, 2025

Well, you may be right as well that the cost of `MS::isLoaded` is of a similar order of magnitude as `madvise`. What the current logic does is that if `MS::isLoaded` frequently returns `true` (suggesting that a good share of the segment is loaded), then both the call to `MS::isLoaded` and the call to `madvise` are skipped; this is likely where the speedup in the benchmark that I linked is coming from.

What do you think of skipping the `isLoaded()` check for the first per-`IndexInput` call to `prefetch()`? This should help in the case that you identified (a single `prefetch()` call per `IndexInput` slice instance), while not defeating the current logic that skips calls to `MS::isLoaded` / `madvise` for index inputs that appear to be mostly loaded already when doing several `prefetch()` calls on the same slice.
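
A rough sketch of that variant, reusing the hypothetical class from the earlier sketch (the `firstPrefetch` flag and all other names are made up for illustration; this is not a proposed patch):

```java
  private boolean firstPrefetch = true; // hypothetical per-IndexInput flag

  void prefetch(long offset, long length) {
    MemorySegment range = segment.asSlice(offset, length);
    if (firstPrefetch) {
      firstPrefetch = false;
      nativeMadviseWillNeed(range); // first call: madvise directly, no isLoaded check,
      return;                       // so single-prefetch slices skip the page-table walk
    }
    if (consecutivePrefetchHitCount >= HIT_THRESHOLD) {
      return; // input looks mostly resident: skip both isLoaded and madvise
    }
    if (range.isLoaded()) {
      consecutivePrefetchHitCount++;
    } else {
      consecutivePrefetchHitCount = 0;
      nativeMadviseWillNeed(range);
    }
  }
```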

original-brownbear (Member, Author) commented Jan 22, 2025

@jpountz I see. Hmm, I wonder how much that saves us? It seems we just trade an `isLoaded` for a `madvise` on systems with enough memory? That said, maybe the `madvise` is far cheaper because it doesn't necessarily cover the whole segment like the `isLoaded` check?

EDIT: sorry, scratch that question; both run against the same code, I misread things. So I'm not sure how much that buys us on the fast path. Fundamentally, what it comes down to on my machine when benchmarking is that if everything is already in the page cache, then either `madvise` or `isLoaded` incurs significant overhead for no gain. Not sure there's a no-tradeoffs way out of this, probably not :) The faster your disk and the more memory you have, the more pronounced the downsides of this logic become, and vice versa for the gains :)
Maybe the trick here would be to keep track of the loaded state in a way that works across slices, so that we throttle more efficiently?

jpountz (Contributor) commented Jan 22, 2025

> It seems we just trade an `isLoaded` for a `madvise` on systems with enough memory?

This is correct. I made this suggestion because it was similar to your initial proposal: skipping the `MS::isLoaded` check and doing a `madvise` call directly instead.

> Maybe the trick here would be to keep track of the loaded state in a way that works across slices, so that we throttle more efficiently?

I'm happy to explore sharing state across slices too. I optimized for simplicity in the current implementation, but we may be able to figure out something that works well across slices as well.
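
For what it's worth, one possible shape for such shared state, as a hedged sketch with made-up names: a small holder owned by the top-level input and handed to every slice/clone, so back-off decisions made on one slice benefit the others.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch only: a hit counter shared by an input and all of its slices.
final class SharedPrefetchState {
  private static final int HIT_THRESHOLD = 64; // made-up threshold

  // Atomic because slices/clones of the same file may be used from different threads.
  private final AtomicInteger consecutiveHits = new AtomicInteger();

  boolean looksResident() {
    return consecutiveHits.get() >= HIT_THRESHOLD;
  }

  void recordHit() { // isLoaded returned true
    consecutiveHits.incrementAndGet();
  }

  void recordMiss() { // isLoaded returned false and madvise was issued
    consecutiveHits.set(0);
  }
}
// The top-level input would create one SharedPrefetchState and pass the same
// instance to every slice()/clone(), instead of each slice starting from zero.
```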
