Not maintain docBufferUpTo when only docs needed #14164

Merged (7 commits) on Jan 25, 2025

@gf2121 gf2121 commented Jan 23, 2025

The docBufferUpto variable is maintained mainly so that the corresponding entry of the freq/pos buffers can be looked up. We can skip maintaining it when only docs are needed.
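
To illustrate the idea, here is a minimal sketch, not Lucene's actual postings reader. A hypothetical `BitSetBlockCursor` iterates the doc IDs of one block from a `java.util.BitSet` and updates its `docBufferUpto` counter only when freqs were requested. The class name, every field other than `docBufferUpto` and `needsFreq`, and the choice to throw from `freq()` in docs-only mode are assumptions made for the example.

```java
import java.util.BitSet;

// Hypothetical sketch; names and structure are illustrative, not Lucene's code.
final class BitSetBlockCursor {
  static final int NO_MORE_DOCS = Integer.MAX_VALUE;

  private final BitSet docBits;    // bit i set => doc (base + i) is in the block
  private final int base;          // doc ID represented by bit 0
  private final int[] freqBuffer;  // freqBuffer[k] belongs to the k-th doc of the block
  private final boolean needsFreq; // false when the caller only wants doc IDs
  private int bit = -1;            // bit index of the current doc, -1 before the first
  private boolean done = false;    // true once the block is exhausted
  private int docBufferUpto = 0;   // docs consumed so far; maintained only if needsFreq

  BitSetBlockCursor(BitSet docBits, int base, int[] freqBuffer, boolean needsFreq) {
    this.docBits = docBits;
    this.base = base;
    this.freqBuffer = freqBuffer;
    this.needsFreq = needsFreq;
  }

  int nextDoc() {
    if (done) {
      return NO_MORE_DOCS;
    }
    int next = docBits.nextSetBit(bit + 1);
    if (next < 0) {
      done = true;
      return NO_MORE_DOCS;
    }
    bit = next;
    if (needsFreq) {
      docBufferUpto++; // bookkeeping that exists only to address freqBuffer
    }
    return base + bit;
  }

  int freq() {
    if (!needsFreq) {
      throw new IllegalStateException("freqs were not requested");
    }
    if (docBufferUpto == 0) {
      throw new IllegalStateException("nextDoc() has not returned a doc yet");
    }
    return freqBuffer[docBufferUpto - 1];
  }
}
```

In docs-only mode the doc ID comes entirely from the bit position, so the increment is pure overhead and can be dropped from the hot loop.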

Result on wikimediumall:

                            Task    QPS baseline      StdDev    QPS my_modified_version      StdDev                Pct diff p-value
                          IntNRQ       55.22      (5.8%)       52.09     (11.2%)   -5.7% ( -21% -   12%) 0.044
               HighTermMonthSort     1425.08      (6.9%)     1374.98      (4.5%)   -3.5% ( -13% -    8%) 0.056
                      HighPhrase       63.98      (4.6%)       61.98      (3.7%)   -3.1% ( -10% -    5%) 0.017
                      OrHighHigh       84.20      (4.9%)       82.20      (5.2%)   -2.4% ( -11% -    8%) 0.138
                        Wildcard      294.72      (6.4%)      287.84      (8.3%)   -2.3% ( -16% -   13%) 0.321
                 MedSloppyPhrase      133.44      (6.8%)      130.76      (5.6%)   -2.0% ( -13% -   11%) 0.306
                      AndHighMed      157.91      (8.3%)      154.85      (8.5%)   -1.9% ( -17% -   16%) 0.466
            HighIntervalsOrdered       35.23      (7.1%)       34.59      (5.9%)   -1.8% ( -13% -   11%) 0.373
                       MedPhrase      332.72      (7.4%)      326.68      (9.9%)   -1.8% ( -17% -   16%) 0.512
                    OrNotHighLow     1335.53      (6.9%)     1313.64      (5.0%)   -1.6% ( -12% -   10%) 0.387
                     AndHighHigh      132.83      (7.4%)      130.66      (7.7%)   -1.6% ( -15% -   14%) 0.494
                 CountOrHighHigh      169.66      (5.3%)      166.92      (5.9%)   -1.6% ( -12% -   10%) 0.362
                   OrHighNotHigh      513.18      (8.8%)      505.55      (8.0%)   -1.5% ( -16% -   16%) 0.577
               HighTermTitleSort      105.13      (5.5%)      103.71      (4.0%)   -1.3% ( -10% -    8%) 0.377
                         Prefix3      525.89      (6.9%)      518.94      (8.0%)   -1.3% ( -15% -   14%) 0.576
                       LowPhrase       45.66      (3.1%)       45.06      (5.2%)   -1.3% (  -9% -    7%) 0.334
          CountFilteredOrHighMed      123.62      (4.4%)      122.75      (4.8%)   -0.7% (  -9% -    8%) 0.627
                          Fuzzy2       74.49      (4.1%)       74.04      (3.6%)   -0.6% (  -7% -    7%) 0.612
             CountFilteredPhrase      193.31      (6.7%)      192.13      (5.5%)   -0.6% ( -12% -   12%) 0.754
                CountAndHighHigh      194.92      (6.3%)      193.83      (6.8%)   -0.6% ( -12% -   13%) 0.787
                       CountTerm    10935.09     (10.2%)    10895.04     (11.7%)   -0.4% ( -20% -   23%) 0.916
             CountFilteredIntNRQ       52.36      (8.7%)       52.18     (13.4%)   -0.3% ( -20% -   23%) 0.924
                         Respell       59.02      (4.1%)       58.82      (4.1%)   -0.3% (  -8% -    8%) 0.795
                      TermDTSort      160.85      (6.4%)      160.33      (8.8%)   -0.3% ( -14% -   15%) 0.894
                      AndHighLow     1510.60      (5.3%)     1505.83      (5.9%)   -0.3% ( -10% -   11%) 0.858
                          Fuzzy1       93.91      (3.6%)       93.64      (5.4%)   -0.3% (  -8% -    8%) 0.843
                        PKLookup      257.87      (2.8%)      257.23      (3.0%)   -0.2% (  -5% -    5%) 0.789
                 LowSloppyPhrase      190.90      (4.8%)      190.79      (8.9%)   -0.1% ( -13% -   14%) 0.980
                       OrHighLow      692.01      (7.5%)      692.36      (4.4%)    0.1% ( -11% -   12%) 0.979
                    OrHighNotLow      632.27      (8.5%)      632.96      (7.3%)    0.1% ( -14% -   17%) 0.965
                     LowSpanNear       33.43      (3.4%)       33.47      (4.2%)    0.1% (  -7% -    7%) 0.917
                    OrNotHighMed      480.27      (9.1%)      481.13     (10.0%)    0.2% ( -17% -   21%) 0.952
                           range    10942.44     (10.2%)    10975.32      (8.9%)    0.3% ( -17% -   21%) 0.921
                       OrHighMed      299.53      (7.4%)      300.57      (7.8%)    0.3% ( -13% -   16%) 0.886
                     MedSpanNear       44.82      (5.6%)       45.06      (5.4%)    0.5% (  -9% -   12%) 0.758
                        HighTerm      741.13     (10.4%)      745.76      (7.3%)    0.6% ( -15% -   20%) 0.826
            HighTermTitleBDVSort       36.24      (3.6%)       36.49      (4.0%)    0.7% (  -6% -    8%) 0.565
                    HighSpanNear       45.38      (4.2%)       45.74      (4.5%)    0.8% (  -7% -    9%) 0.563
           HighTermDayOfYearSort      171.66      (6.4%)      173.24      (7.6%)    0.9% ( -12% -   15%) 0.679
             MedIntervalsOrdered      127.74      (7.7%)      129.04      (6.6%)    1.0% ( -12% -   16%) 0.654
                HighSloppyPhrase       34.31      (2.9%)       34.76      (4.0%)    1.3% (  -5% -    8%) 0.232
                     CountPhrase      157.99      (6.6%)      160.55      (3.9%)    1.6% (  -8% -   12%) 0.345
             LowIntervalsOrdered      126.29      (6.7%)      128.58      (4.2%)    1.8% (  -8% -   13%) 0.305
                 CountAndHighMed      258.83      (7.7%)      263.58      (6.3%)    1.8% ( -11% -   17%) 0.410
                         MedTerm      942.47      (8.2%)      962.22     (11.2%)    2.1% ( -16% -   23%) 0.501
                         LowTerm      616.37      (9.7%)      629.81      (8.0%)    2.2% ( -14% -   22%) 0.438
                   OrNotHighHigh      597.56     (10.8%)      616.64      (9.0%)    3.2% ( -14% -   25%) 0.309
             CountFilteredOrMany       30.09      (6.2%)       31.75      (5.9%)    5.5% (  -6% -   18%) 0.004
                     CountOrMany       32.19      (7.3%)       33.97      (6.5%)    5.5% (  -7% -   20%) 0.012
         CountFilteredOrHighHigh      107.96      (4.8%)      114.57      (5.6%)    6.1% (  -4% -   17%) 0.000
                    OrHighNotMed      622.10      (9.3%)      665.07      (7.3%)    6.9% (  -8% -   25%) 0.009
                  CountOrHighMed      290.83      (9.5%)      313.29      (9.1%)    7.7% (  -9% -   29%) 0.009
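
For reading the table: the Pct diff column appears to be the relative change in mean QPS from the baseline to the modified version. The sketch below is a hypothetical helper, not part of the benchmark tooling, that reproduces that arithmetic; the two rows it was checked against (CountOrHighMed and IntNRQ) are consistent with this reading.

```java
// Hypothetical helper; assumes Pct diff is the relative change in mean QPS.
final class PctDiffSketch {
  /** Returns the change from baseline to candidate as a percentage of baseline. */
  static double pctDiff(double baselineQps, double candidateQps) {
    return 100.0 * (candidateQps - baselineQps) / baselineQps;
  }
}
```

For example, `PctDiffSketch.pctDiff(290.83, 313.29)` is about 7.7 (CountOrHighMed) and `PctDiffSketch.pctDiff(55.22, 52.09)` is about -5.7 (IntNRQ), matching the table after rounding.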

@jpountz jpountz left a comment

Thanks, I had meant to try out something like that too. I left minor comments but the change looks good to me.

@@ -388,6 +388,7 @@ private enum DeltaEncoding {
 final boolean needsOffsetsOrPayloads;
 final boolean needsImpacts;
 final boolean needsDocsAndFreqsOnly;
+final boolean needsDocsOnly;

I believe that we do not need to track a separate variable as it should always be the same as needsFreq == false (needsFreq should always be true whenever positions or impacts are needed).
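
A minimal sketch of this suggestion, under the stated invariant that `needsFreq` is true whenever positions or impacts are needed. The `PostingsFlagsSketch` class, its constructor, and the `needsPos` field are hypothetical; only `needsFreq`, `needsImpacts` and `needsDocsOnly` are names taken from the discussion.

```java
// Hypothetical sketch; not the reader's real constructor or field layout.
final class PostingsFlagsSketch {
  final boolean needsFreq;
  final boolean needsPos;     // assumed name for "positions are needed"
  final boolean needsImpacts;

  PostingsFlagsSketch(boolean needsFreq, boolean needsPos, boolean needsImpacts) {
    // Enforce the invariant from the review comment: positions or impacts imply freqs.
    if ((needsPos || needsImpacts) && !needsFreq) {
      throw new IllegalArgumentException("positions or impacts require freqs");
    }
    this.needsFreq = needsFreq;
    this.needsPos = needsPos;
    this.needsImpacts = needsImpacts;
  }

  /** Derived rather than stored: with the invariant above, docs-only means no freqs. */
  boolean needsDocsOnly() {
    return !needsFreq;
  }
}
```

Deriving the value keeps one fewer field in sync; call sites can test `needsFreq == false` directly.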

@@ -345,7 +345,7 @@ private enum DeltaEncoding {
 private int prevDocID; // last doc ID of the previous block

 private int docBufferSize;
-private int docBufferUpto;
+private int docBufferUpto; // only makes sense for packed encoding

The comment is a bit misleading: it makes sense for the bitset encoding, but in this case it's only the index into the freq buffer, not into the doc buffer.
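
One way to picture this: with a bitset-encoded block, doc IDs are read straight from the bits, so the only thing a running index provides is the slot of the current doc's freq. That slot equals the number of set bits below the doc's bit. The sketch below recomputes it on demand with `Long.bitCount`; it is a hypothetical illustration of the relationship, not necessarily how the PR obtains the index, and `FreqIndexSketch` and `rank` are invented names.

```java
// Hypothetical sketch; shows that the freq-buffer index is the rank of the doc's bit.
final class FreqIndexSketch {
  /** Counts the set bits strictly below bitIndex in a little-endian long[] bitset. */
  static int rank(long[] words, int bitIndex) {
    int wordIndex = bitIndex >>> 6;                 // which 64-bit word holds bitIndex
    int fullWords = Math.min(wordIndex, words.length);
    int count = 0;
    for (int i = 0; i < fullWords; i++) {
      count += Long.bitCount(words[i]);             // all bits of earlier words
    }
    if (wordIndex < words.length) {
      long below = (1L << (bitIndex & 63)) - 1;     // mask of bits below bitIndex in its word
      count += Long.bitCount(words[wordIndex] & below);
    }
    return count;
  }
}
```

With bits 0, 3 and 4 set, the doc at bit 4 has rank 2, so its freq would sit at index 2 of the freq buffer. In a packed block, by contrast, the same index also addresses the doc buffer itself.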

@gf2121 gf2121 merged commit 52d3809 into apache:main Jan 25, 2025
5 checks passed