Not maintain docBufferUpTo when only docs needed #14164

gf2121 · 2025-01-23T09:39:15Z

The `docBufferUpto` variable is mainly maintained so that the matching entry in the freq/pos buffers can be looked up. We can skip maintaining it when only docs are needed.
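To make the idea concrete, here is a minimal sketch, assuming a bitset-encoded block like the one mentioned in the review discussion. `BitSetBlockSketch`, its fields, and the rule that `freq()` returns 1 when freqs are not tracked are hypothetical simplifications for illustration, not the actual `Lucene101PostingsReader` code. The doc ID comes from the bit position alone, so the running index (`freqUpto` here) is only needed to address the freq buffer.

```java
/** Hypothetical, simplified model of one bitset-encoded postings block (not Lucene's code). */
final class BitSetBlockSketch {
  static final int NO_MORE_DOCS = Integer.MAX_VALUE;

  private final long[] bits;       // bit i set => doc (base + i) is in the block
  private final int base;          // doc ID represented by bit 0
  private final int[] freqBuffer;  // freqBuffer[k] is the freq of the k-th set bit
  private final boolean needsFreq; // false => caller only wants doc IDs

  private int bitIndex = -1; // position of the current doc inside the bit set
  private int freqUpto = -1; // index into freqBuffer; untouched when needsFreq is false

  BitSetBlockSketch(long[] bits, int base, int[] freqBuffer, boolean needsFreq) {
    this.bits = bits;
    this.base = base;
    this.freqBuffer = freqBuffer;
    this.needsFreq = needsFreq;
  }

  /** Moves to the next doc; the doc ID is derived from the bit position only. */
  int nextDoc() {
    int next = nextSetBit(bitIndex + 1);
    if (next == -1) {
      return NO_MORE_DOCS;
    }
    bitIndex = next;
    if (needsFreq) {
      freqUpto++; // bookkeeping that a docs-only consumer never pays for
    }
    return base + next;
  }

  /** Assumption of this sketch: report 1 when freqs were not requested. */
  int freq() {
    return needsFreq ? freqBuffer[freqUpto] : 1;
  }

  /** Index of the first set bit at or after {@code from}, or -1 if there is none. */
  private int nextSetBit(int from) {
    int word = from >> 6;
    if (word >= bits.length) {
      return -1;
    }
    long w = bits[word] & (-1L << from); // long shifts use the low 6 bits of 'from'
    while (true) {
      if (w != 0) {
        return (word << 6) + Long.numberOfTrailingZeros(w);
      }
      if (++word == bits.length) {
        return -1;
      }
      w = bits[word];
    }
  }

  public static void main(String[] args) {
    long[] bits = {0b1010_0101L}; // bits 0, 2, 5 and 7 are set
    BitSetBlockSketch docsOnly = new BitSetBlockSketch(bits, 100, null, false);
    for (int doc = docsOnly.nextDoc(); doc != NO_MORE_DOCS; doc = docsOnly.nextDoc()) {
      System.out.println("doc=" + doc);
    }
  }
}
```

In the docs-only case the freq buffer can be absent entirely, as in `main` above, because nothing ever indexes into it.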

Result on wikimediumall:

                            Task   QPS baseline      StdDev   QPS my_modified_version      StdDev                Pct diff p-value
                          IntNRQ       55.22      (5.8%)       52.09     (11.2%)   -5.7% ( -21% -   12%) 0.044
               HighTermMonthSort     1425.08      (6.9%)     1374.98      (4.5%)   -3.5% ( -13% -    8%) 0.056
                      HighPhrase       63.98      (4.6%)       61.98      (3.7%)   -3.1% ( -10% -    5%) 0.017
                      OrHighHigh       84.20      (4.9%)       82.20      (5.2%)   -2.4% ( -11% -    8%) 0.138
                        Wildcard      294.72      (6.4%)      287.84      (8.3%)   -2.3% ( -16% -   13%) 0.321
                 MedSloppyPhrase      133.44      (6.8%)      130.76      (5.6%)   -2.0% ( -13% -   11%) 0.306
                      AndHighMed      157.91      (8.3%)      154.85      (8.5%)   -1.9% ( -17% -   16%) 0.466
            HighIntervalsOrdered       35.23      (7.1%)       34.59      (5.9%)   -1.8% ( -13% -   11%) 0.373
                       MedPhrase      332.72      (7.4%)      326.68      (9.9%)   -1.8% ( -17% -   16%) 0.512
                    OrNotHighLow     1335.53      (6.9%)     1313.64      (5.0%)   -1.6% ( -12% -   10%) 0.387
                     AndHighHigh      132.83      (7.4%)      130.66      (7.7%)   -1.6% ( -15% -   14%) 0.494
                 CountOrHighHigh      169.66      (5.3%)      166.92      (5.9%)   -1.6% ( -12% -   10%) 0.362
                   OrHighNotHigh      513.18      (8.8%)      505.55      (8.0%)   -1.5% ( -16% -   16%) 0.577
               HighTermTitleSort      105.13      (5.5%)      103.71      (4.0%)   -1.3% ( -10% -    8%) 0.377
                         Prefix3      525.89      (6.9%)      518.94      (8.0%)   -1.3% ( -15% -   14%) 0.576
                       LowPhrase       45.66      (3.1%)       45.06      (5.2%)   -1.3% (  -9% -    7%) 0.334
          CountFilteredOrHighMed      123.62      (4.4%)      122.75      (4.8%)   -0.7% (  -9% -    8%) 0.627
                          Fuzzy2       74.49      (4.1%)       74.04      (3.6%)   -0.6% (  -7% -    7%) 0.612
             CountFilteredPhrase      193.31      (6.7%)      192.13      (5.5%)   -0.6% ( -12% -   12%) 0.754
                CountAndHighHigh      194.92      (6.3%)      193.83      (6.8%)   -0.6% ( -12% -   13%) 0.787
                       CountTerm    10935.09     (10.2%)    10895.04     (11.7%)   -0.4% ( -20% -   23%) 0.916
             CountFilteredIntNRQ       52.36      (8.7%)       52.18     (13.4%)   -0.3% ( -20% -   23%) 0.924
                         Respell       59.02      (4.1%)       58.82      (4.1%)   -0.3% (  -8% -    8%) 0.795
                      TermDTSort      160.85      (6.4%)      160.33      (8.8%)   -0.3% ( -14% -   15%) 0.894
                      AndHighLow     1510.60      (5.3%)     1505.83      (5.9%)   -0.3% ( -10% -   11%) 0.858
                          Fuzzy1       93.91      (3.6%)       93.64      (5.4%)   -0.3% (  -8% -    8%) 0.843
                        PKLookup      257.87      (2.8%)      257.23      (3.0%)   -0.2% (  -5% -    5%) 0.789
                 LowSloppyPhrase      190.90      (4.8%)      190.79      (8.9%)   -0.1% ( -13% -   14%) 0.980
                       OrHighLow      692.01      (7.5%)      692.36      (4.4%)    0.1% ( -11% -   12%) 0.979
                    OrHighNotLow      632.27      (8.5%)      632.96      (7.3%)    0.1% ( -14% -   17%) 0.965
                     LowSpanNear       33.43      (3.4%)       33.47      (4.2%)    0.1% (  -7% -    7%) 0.917
                    OrNotHighMed      480.27      (9.1%)      481.13     (10.0%)    0.2% ( -17% -   21%) 0.952
                           range    10942.44     (10.2%)    10975.32      (8.9%)    0.3% ( -17% -   21%) 0.921
                       OrHighMed      299.53      (7.4%)      300.57      (7.8%)    0.3% ( -13% -   16%) 0.886
                     MedSpanNear       44.82      (5.6%)       45.06      (5.4%)    0.5% (  -9% -   12%) 0.758
                        HighTerm      741.13     (10.4%)      745.76      (7.3%)    0.6% ( -15% -   20%) 0.826
            HighTermTitleBDVSort       36.24      (3.6%)       36.49      (4.0%)    0.7% (  -6% -    8%) 0.565
                    HighSpanNear       45.38      (4.2%)       45.74      (4.5%)    0.8% (  -7% -    9%) 0.563
           HighTermDayOfYearSort      171.66      (6.4%)      173.24      (7.6%)    0.9% ( -12% -   15%) 0.679
             MedIntervalsOrdered      127.74      (7.7%)      129.04      (6.6%)    1.0% ( -12% -   16%) 0.654
                HighSloppyPhrase       34.31      (2.9%)       34.76      (4.0%)    1.3% (  -5% -    8%) 0.232
                     CountPhrase      157.99      (6.6%)      160.55      (3.9%)    1.6% (  -8% -   12%) 0.345
             LowIntervalsOrdered      126.29      (6.7%)      128.58      (4.2%)    1.8% (  -8% -   13%) 0.305
                 CountAndHighMed      258.83      (7.7%)      263.58      (6.3%)    1.8% ( -11% -   17%) 0.410
                         MedTerm      942.47      (8.2%)      962.22     (11.2%)    2.1% ( -16% -   23%) 0.501
                         LowTerm      616.37      (9.7%)      629.81      (8.0%)    2.2% ( -14% -   22%) 0.438
                   OrNotHighHigh      597.56     (10.8%)      616.64      (9.0%)    3.2% ( -14% -   25%) 0.309
             CountFilteredOrMany       30.09      (6.2%)       31.75      (5.9%)    5.5% (  -6% -   18%) 0.004
                     CountOrMany       32.19      (7.3%)       33.97      (6.5%)    5.5% (  -7% -   20%) 0.012
         CountFilteredOrHighHigh      107.96      (4.8%)      114.57      (5.6%)    6.1% (  -4% -   17%) 0.000
                    OrHighNotMed      622.10      (9.3%)      665.07      (7.3%)    6.9% (  -8% -   25%) 0.009
                  CountOrHighMed      290.83      (9.5%)      313.29      (9.1%)    7.7% (  -9% -   29%) 0.009

jpountz

Thanks, I had meant to try out something like that too. I left minor comments but the change looks good to me.

jpountz · 2025-01-23T13:26:50Z

lucene/core/src/java/org/apache/lucene/codecs/lucene101/Lucene101PostingsReader.java

@@ -388,6 +388,7 @@ private enum DeltaEncoding {
    final boolean needsOffsetsOrPayloads;
    final boolean needsImpacts;
    final boolean needsDocsAndFreqsOnly;
+    final boolean needsDocsOnly;


I believe that we do not need to track a separate variable as it should always be the same as needsFreq == false (needsFreq should always be true whenever positions or impacts are needed).
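The redundancy described in this comment can be checked with a small sketch. The flag names and values below, and the rule used in `needsFreq`, are assumptions made for illustration; they are not copied from Lucene. The point is only that once positions and impacts both imply freqs, a separate docs-only flag is always equal to `needsFreq == false`.

```java
/** Hypothetical model of postings feature flags; names and values are illustrative only. */
final class PostingsFlagsSketch {
  static final int NONE = 0;
  static final int FREQS = 1 << 0;
  // Requesting positions implies requesting freqs, so POSITIONS includes the FREQS bit.
  static final int POSITIONS = FREQS | (1 << 1);

  /** True if every bit of {@code feature} is set in {@code flags}. */
  static boolean requested(int flags, int feature) {
    return (flags & feature) == feature;
  }

  /** Assumed rule: freqs are needed when requested directly, via positions, or for impacts. */
  static boolean needsFreq(int flags, boolean needsImpacts) {
    return needsImpacts || requested(flags, FREQS);
  }

  /** The separate flag, spelled out: no freqs, no positions, no impacts. */
  static boolean needsDocsOnlyExplicit(int flags, boolean needsImpacts) {
    return !requested(flags, POSITIONS) && !needsImpacts && !needsFreq(flags, needsImpacts);
  }

  public static void main(String[] args) {
    for (int flags : new int[] {NONE, FREQS, POSITIONS}) {
      for (boolean impacts : new boolean[] {false, true}) {
        boolean explicit = needsDocsOnlyExplicit(flags, impacts);
        boolean derived = !needsFreq(flags, impacts);
        // Under the assumptions above the two always agree.
        if (explicit != derived) {
          throw new AssertionError("flags=" + flags + " impacts=" + impacts);
        }
        System.out.println("flags=" + flags + " impacts=" + impacts + " docsOnly=" + derived);
      }
    }
  }
}
```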

jpountz · 2025-01-23T13:27:36Z

lucene/core/src/java/org/apache/lucene/codecs/lucene101/Lucene101PostingsReader.java

@@ -345,7 +345,7 @@ private enum DeltaEncoding {
    private int prevDocID; // last doc ID of the previous block

    private int docBufferSize;
-    private int docBufferUpto;
+    private int docBufferUpto; // only makes sense for packed encoding


The comment is a bit misleading: it makes sense for the bitset encoding, but in this case it's only the index into the freq buffer, not into the doc buffer.

gf2121 added 3 commits January 23, 2025 17:10

docs only posting bitset opt

312f734

iter

78f2bb0

iter

2bee634

jpountz approved these changes Jan 23, 2025

gf2121 added 4 commits January 24, 2025 12:18

iter

fc7ce02

typo

0d923f9

blank line back

14757cd

improve comment

af6561a

jpountz approved these changes Jan 24, 2025

gf2121 merged commit 52d3809 into apache:main Jan 25, 2025
5 checks passed

asfgit pushed a commit that referenced this pull request Jan 25, 2025

Not maintain docBufferUpTo when only docs needed (#14164)

059d8fe
