Support force merge based on specified segments #14163

Open

cheng66551

On Elasticsearch 7.6.0, /_cat/segments showed that docs.deleted kept growing for many segments, yet over time those segments were never merged automatically to reclaim the deleted documents. The segment information is as follows:

segment generation docs.count docs.deleted    size size.memory committed searchable version compound
_1bn4gh   80020817    2434329     85866860   8.9gb     6845726 true      true       8.4.0   false
_1bqg6j   80175979     258975     18754886   1.8gb     1708132 true      true       8.4.0   false
_1brsd1   80238421     340857     17805014   1.8gb     1807134 true      true       8.4.0   false
_1bt573   80301711     444912     17747931   1.8gb     1831663 true      true       8.4.0   false
_1buf8x   80361393     590820     18290815   1.9gb     1762322 true      true       8.4.0   false
_1btbuk   80310332    2666453     12507939   1.9gb     2543630 true      true       8.4.0   false
_1bzdsz   80592803     242465     17280902   1.8gb     1565934 true      true       8.4.0   false
_1c3msi   80791074     330315     17941295   1.8gb     1623871 true      true       8.4.0   false
_1c75vi   80955774     425781     17177269   1.8gb     1645538 true      true       8.4.0   false
_1c9xyl   81085485     542056     18550711   1.8gb     1692414 true      true       8.4.0   false
......

So I triggered a forced merge with _forcemerge?only_expunge_deletes=true, but it had no effect. A similar phenomenon is mentioned in issue #13226.

I suspect that TieredMergePolicy did not select these segments, so no merge was triggered.
Therefore, I wrote this forceMergeBySegmentNames method, which bypasses TieredMergePolicy's selection logic and merges the segments specified by name. When verified in our production environment, it achieved very good results.

@jpountz
Contributor

jpountz commented Jan 23, 2025

I don't think we should merge this change, but it's good that you were able to use it to confirm that merging would reclaim these deleted docs.

Can you add your data about this issue to #13226? This smells like merging not keeping up, or a bad interaction between the merge policy and soft deletes.

@mikemccand
Member

It's terrible that TieredMergePolicy was not merging these segments, naturally or under forceMerge. Let's understand why it's failing to do so. It's like we need an explain API for its merge selection.

TMP does have a setForceMergeDeletesPctAllowed, which defaults to 10%, meaning if a segment has <= 10% deletions, it won't be selected under forceMerge. But if I'm reading it right you have a segment _1btbuk with ~82.4% deleted docs (12507939 / (2666453 + 12507939) = 0.8242794175872088), which should have been selected.
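The arithmetic above can be repeated for other rows of the posted /_cat/segments output. The following is only a minimal sketch of the threshold comparison as described in this comment, not Lucene's actual selection code: the function names are hypothetical, the 10% constant is taken from the default quoted above, and the three rows are copied from the table in the PR description.

```python
# Rows copied from the /_cat/segments output in the PR description:
# (segment name, docs.count, docs.deleted)
SEGMENTS = [
    ("_1bn4gh", 2434329, 85866860),
    ("_1bqg6j", 258975, 18754886),
    ("_1btbuk", 2666453, 12507939),
]

# Assumption: the default of setForceMergeDeletesPctAllowed as quoted above (10%).
FORCE_MERGE_DELETES_PCT_ALLOWED = 10.0


def pct_deleted(live_docs: int, deleted_docs: int) -> float:
    """Percentage of a segment's documents that are marked deleted."""
    total = live_docs + deleted_docs
    return 100.0 * deleted_docs / total if total else 0.0


def exceeds_threshold(live_docs: int, deleted_docs: int,
                      allowed_pct: float = FORCE_MERGE_DELETES_PCT_ALLOWED) -> bool:
    """Hypothetical helper: True when the deleted percentage is strictly above
    the allowed percentage, i.e. a segment at or below it is not selected."""
    return pct_deleted(live_docs, deleted_docs) > allowed_pct


for name, live, deleted in SEGMENTS:
    pct = pct_deleted(live, deleted)
    print(f"{name}: {pct:.1f}% deleted, above threshold: {exceeds_threshold(live, deleted)}")
```

Under this reading, every row listed here is far above the 10% threshold, which is why the lack of any merge is surprising.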

Have you changed setMaxMergedSegmentMB away from its default (5 GB)?

Separately, you have crazy high segment names -- I'm curious if this is a very long lived index?
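For context on why the names look so high: Lucene segment names are an underscore followed by a counter written in base 36, and the generation column in the posted table is that same counter in decimal. A small sketch (the helper name is hypothetical) that decodes the names from the table:

```python
def segment_generation(segment_name: str) -> int:
    """Decode a segment name such as "_1bn4gh" into its numeric counter.

    The leading underscore is dropped and the rest is read as a base-36 number.
    """
    return int(segment_name.lstrip("_"), 36)


# Names copied from the /_cat/segments output in the PR description.
for name in ("_1bn4gh", "_1btbuk", "_1c9xyl"):
    print(name, segment_generation(name))
```

The decoded values match the generation column, so the counter for this shard has passed 80 million.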

This PR reminds me of the Linux "direct IO" struggles. Linus really does not like the existence of "direct IO" (the O_DIRECT flag to the open API), because its existence means users may jump straight to that and take pressure off improving how Linux manages IO caching (the buffer cache). I.e. rather than improving the kernel's IO caching, users can skip it altogether. It's the same thing here: if we expose a merge policy where users can simply pick their own merges, we take pressure off of fixing the problems in our default TieredMergePolicy. That being said, MergePolicy is pluggable for exactly this reason: users (well, direct Lucene users) are free to customize merge selection.

@mikemccand
Member

If you are able to turn on InfoStream for the ES shard that won't merge segments with so many deletions, and post a chunk here, I can have a look and see if there are clues.
