-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make sure DocumentsWriterPerThread#getAndLock
never returns null
on a non-empty queue.
#12959
Make sure DocumentsWriterPerThread#getAndLock
never returns null
on a non-empty queue.
#12959
Conversation
…ll` on a non-empty queue. Before this change, `ConcurrentApproximatePriorityQueue#poll` could sometimes return `null` even though the queue was empty at no point in time. The practical implication is that we can end up with more DWPTs in memory than indexing threads, which, while not strictly a bug, may require doing more merging than we'd like later on. I ran luceneutil's `IndexGeonames` with this change, and `ConcurrentApproximatePriorityQueue#poll` was not the main source of contention. I instrumented the code to check how many DWPTs got pulled from the queue using the optimistic path vs. pessimistic path and got 8525598 for the optimistic path vs. 12247 for the pessimistic path. Closes apache#12649 apache#12916
For what it's worth, I've tried reproducing this on Adrien's branch with:
but nope, no fails. |
I get it to fail with both OpenJ9 and Hotspot on this branch, see #12916 |
OK, I think I understand why the test is still failing, this seems to be because we unlock the DWPT after putting it back into the queue. But there is also a good reason to do so. I need to think more about it. |
I have a new iteration of this change, which now also accounts for how |
If there are no concerns, I plan on merging this PR soon. |
Hi, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I beasted this PR and it produced on my Ryzen CPU under load no failures. So it looks like it fixed the issue.
The AtomicInteger
is only incremented so at some point it overflows. But when analyzing the code, the actual number does not matter, it just needs to be different to retry, correct? So when it overflows it does not do any harm.
FYI, I did not try to run the beasting for hours because previously it was failing very fast on a heavy loaded Ryzen CPU with both Hotspot and OpenJ9. Now its stable, so I trust the results also after 2 hours (1 hour with Hotspot, 1 hour with OpenJ9). |
ConcurrentApproximatePriorityQueue#poll
never returns null
on a non-empty queue.DocumentsWriterPerThread#getAndLock
never returns null
on a non-empty queue.
Thanks a lot @uschindler ! |
…on a non-empty queue. (#12959) Before this change, `DocumentsWriterPerThread#getAndLock` could sometimes return `null` even though the queue was empty at no point in time. The practical implication is that we can end up with more DWPTs in memory than indexing threads, which, while not strictly a bug, may require doing more merging than we'd like later on. I ran luceneutil's `IndexGeonames` with this change, and `DocumentsWriterPerThread#getAndLock` was not the main source of contention. Closes #12649 #12916
…on a non-empty queue. (apache#12959) Before this change, `DocumentsWriterPerThread#getAndLock` could sometimes return `null` even though the queue was empty at no point in time. The practical implication is that we can end up with more DWPTs in memory than indexing threads, which, while not strictly a bug, may require doing more merging than we'd like later on. I ran luceneutil's `IndexGeonames` with this change, and `DocumentsWriterPerThread#getAndLock` was not the main source of contention. Closes apache#12649 apache#12916
Before this change,
DocumentsWriterPerThread#getAndLock
could sometimesreturn
null
even though the queue was empty at no point in time. Thepractical implication is that we can end up with more DWPTs in memory than
indexing threads, which, while not strictly a bug, may require doing more
merging than we'd like later on.
I ran luceneutil's
IndexGeonames
with this change, andDocumentsWriterPerThread#getAndLock
was not the main source ofcontention.
Closes #12649 #12916