Implement ACORN-1 search for HNSW #14085

Draft · wants to merge 1 commit into main

Conversation

@benchaplin (Contributor) commented Dec 20, 2024

Description

Playing around with some ideas from ACORN-1 to improve filtered HNSW search. The ideas are:

  • Predicate subgraph traversal (only consider/score candidates that pass the filter)
  • Two-hop neighbor expansion (I read up on Weaviate's implementation and used their idea to consider two-hop neighbors only when the first hop doesn't pass the filter)
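The two ideas above can be sketched together. This is an illustrative toy (not Lucene's API, and not the code in this PR): during graph traversal, neighbors that pass the filter are scored directly, while neighbors that fail the filter are used only as bridges to their own neighbors.

```java
import java.util.*;
import java.util.function.IntPredicate;

// Toy sketch of predicate subgraph traversal + two-hop expansion.
// The graph is a simple adjacency map; `filter` marks docs that pass.
public class TwoHopExpansion {

  /** Collect the candidates to score from one node's neighborhood. */
  static List<Integer> candidatesToScore(
      Map<Integer, int[]> graph, int node, IntPredicate filter) {
    List<Integer> toScore = new ArrayList<>();
    for (int neighbor : graph.getOrDefault(node, new int[0])) {
      if (filter.test(neighbor)) {
        // Predicate subgraph traversal: neighbor passes, score it directly.
        toScore.add(neighbor);
      } else {
        // Two-hop expansion: neighbor fails the filter, so instead of
        // scoring it, look one hop further through it.
        for (int twoHop : graph.getOrDefault(neighbor, new int[0])) {
          if (filter.test(twoHop)) {
            toScore.add(twoHop);
          }
        }
      }
    }
    return toScore;
  }

  public static void main(String[] args) {
    Map<Integer, int[]> graph = new HashMap<>();
    graph.put(0, new int[] {1, 2});  // doc 2 will be filtered out
    graph.put(2, new int[] {3, 4});  // reachable only through filtered doc 2
    IntPredicate filter = d -> d != 2;
    System.out.println(candidatesToScore(graph, 0, filter)); // prints [1, 3, 4]
  }
}
```

Docs 3 and 4 are only reachable through the filtered-out doc 2, so without the two-hop step the traversal would dead-end there.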

I benchmarked using Cohere/wikipedia-22-12-en-embeddings with params:

  • nDoc = 200000
  • topK = 100
  • fanout = 50
  • maxConn = 32
  • beamWidth = 100
  • filterSelectivity = [0.05, 0.25, 0.5, 0.75, 0.95]

Here are some results:

Baseline:

| filterSelectivity | recall | latency (ms) |
|---|---|---|
| 0.05 | 0.037 | 17.182 |
| 0.25 | 0.166 | 7.348 |
| 0.5 | 0.332 | 4.376 |
| 0.75 | 0.489 | 3.165 |
| 0.95 | 0.608 | 2.441 |

Candidate (this code):

| filterSelectivity | recall | latency (ms) |
|---|---|---|
| 0.05 | 0.028 | 2.744 |
| 0.25 | 0.157 | 4.614 |
| 0.5 | 0.308 | 4.833 |
| 0.75 | 0.449 | 4.622 |
| 0.95 | 0.563 | 3.244 |

Pros: significantly faster for selective filters.
Cons: slightly worse recall across the board, slightly slower for inclusive filters.

There's a lot to play around with here; this code represents the best results I got in this testing. One thing that must be tested is the correlation between the filter and the query vector (this is discussed and tested in the paper). luceneutil only offers zero correlation at the moment, so I'm working on adding a knob there for future benchmarks.

Code should also be cleaned up, but for me, keeping everything in one method makes it easier to read the changes.

@benwtrent (Member)

Thank you for taking a stab at this @benchaplin! I wonder if we can adjust the algorithm to switch between the approaches more intelligently. Something like:

  • Fan out one layer (only accepting the filtered docs) and add candidates.
  • If we get acceptable "saturation" (e.g. some number <= m*2 that we consider adequate connectedness), we just stick to those candidates and explore.
  • If we do not reach appropriate saturation, fan out a second layer and add candidates.
  • If we fail saturation again (again, some number <= m*2), do we fan out a third layer? Do we "jump up" a layer in the graph to gather a better entry point, as the current one is garbage?

The initial algorithm makes sense; we are trying to recover the graph connectedness for exploration. The bottom layer entry point is the initial "exploration zone". Another idea is to allow multiple "exploration zones" from which we fan out finding the filtered values.
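The saturation-driven fan-out described above could be sketched roughly like this. All names and the `2 * maxConn` threshold are illustrative assumptions, not code from the patch:

```java
import java.util.*;
import java.util.function.IntPredicate;

// Toy sketch of the saturation heuristic: fan out one hop at a time
// from an entry point, and stop as soon as we have gathered "enough"
// filter-passing candidates (here, an assumed threshold of 2 * maxConn).
public class SaturationFanout {

  static Set<Integer> fanOut(
      Map<Integer, int[]> graph, int entry, IntPredicate filter,
      int maxConn, int maxHops) {
    int saturation = 2 * maxConn; // "adequate connectedness" threshold
    Set<Integer> candidates = new LinkedHashSet<>();
    Set<Integer> frontier = Set.of(entry);
    for (int hop = 0; hop < maxHops; hop++) {
      Set<Integer> next = new LinkedHashSet<>();
      for (int node : frontier) {
        for (int neighbor : graph.getOrDefault(node, new int[0])) {
          next.add(neighbor);
          if (filter.test(neighbor)) {
            candidates.add(neighbor);
          }
        }
      }
      if (candidates.size() >= saturation) {
        break; // saturated: stick with these candidates and explore
      }
      frontier = next; // not saturated: fan out one more hop
    }
    return candidates;
  }
}
```

The open question in the comment above is what to do when even `maxHops` hops fail to saturate: fan out further, or jump back up a graph layer for a better entry point.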

These are just 🧠 ⚡ ideas. The initial numbers are promising.

@benwtrent (Member)

Hey @benchaplin, there are a number of things broken with luceneutil right now. Your recall numbers surprised me and I don't think they reflect actual performance.

I am working on getting real numbers with some local patches & providing patches to luceneutil as I can.

see:

@benwtrent (Member)

This is the branch I am using for testing recall/latency for filter cases for right now: https://github.com/mikemccand/luceneutil/compare/main...benwtrent:luceneutil:filter-testing?expand=1

@benwtrent (Member)

Here are some benchmarks (100k float32[1024]).

Baseline:

recall  latency (ms)    nDoc  topK  fanout  maxConn  beamWidth  visited  selectivity
 0.915         0.950  100000   100       0       16        100     2054         0.95
 0.918         0.950  100000   100       0       16        100     2128         0.90
 0.924         1.090  100000   100       0       16        100     2417         0.75
 0.935         1.430  100000   100       0       16        100     3357         0.50
 0.962         2.740  100000   100       0       16        100     5846         0.25
 1.000         9.530  100000   100       0       16        100     9882         0.10
 1.000         4.750  100000   100       0       16        100     4913         0.05
 1.000         2.100  100000   100       0       16        100     2507         0.03
 1.000         0.970  100000   100       0       16        100     1023         0.01
 0.975         1.660  100000   100     100       16        100     3545         0.95
 0.977         1.680  100000   100     100       16        100     3682         0.90
 0.977         2.120  100000   100     100       16        100     4218         0.75
 0.984         2.720  100000   100     100       16        100     5789         0.50
 0.990         5.190  100000   100     100       16        100     9889         0.25
 1.000         9.900  100000   100     100       16        100     9883         0.10
 1.000         4.740  100000   100     100       16        100     4913         0.05
 1.000         2.150  100000   100     100       16        100     2507         0.03
 1.000         0.970  100000   100     100       16        100     1023         0.01

Candidate:

recall  latency (ms)    nDoc  topK  fanout  maxConn  beamWidth  visited  selectivity
 0.852         1.330  100000   100       0       16        100     2723         0.95
 0.821         1.510  100000   100       0       16        100     3011         0.90
 0.793         1.680  100000   100       0       16        100     3329         0.75
 0.795         1.730  100000   100       0       16        100     3357         0.50
 0.880         1.820  100000   100       0       16        100     2912         0.25
 0.891         1.430  100000   100       0       16        100     1692         0.10
 0.823         1.320  100000   100       0       16        100     1070         0.05
 0.765         1.170  100000   100       0       16        100      620         0.03
 0.630         1.190  100000   100       0       16        100      341         0.01
 0.942         2.210  100000   100     100       16        100     4786         0.95
 0.934         2.500  100000   100     100       16        100     5360         0.90
 0.927         2.950  100000   100     100       16        100     5983         0.75
 0.925         3.370  100000   100     100       16        100     6005         0.50
 0.956         3.160  100000   100     100       16        100     4711         0.25
 0.947         2.380  100000   100     100       16        100     2560         0.10
 0.895         2.160  100000   100     100       16        100     1569         0.05
 0.842         1.910  100000   100     100       16        100      888         0.03
 0.744         1.940  100000   100     100       16        100      482         0.01

You can see that until about 50% selectivity, latency & recall are worse in candidate. However, as we select even fewer than 50%, visited gets better, but recall suffers.

This is likely because in the more restrictive cases we are actually dropping to brute-force because of excessive exploration (note the 1.0 recall in baseline).

@benwtrent (Member)

https://github.com/apache/lucene/compare/main...benwtrent:lucene:acorn_search?expand=1

Here are two of my ideas:

  • We only go to two-hop if a percentage of the current candidate's neighbors are filtered out.
  • We oversample the total candidates considered by a percentage.
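Those two knobs could be sketched as follows. The threshold and oversampling values here are assumptions for illustration; the actual values in the linked branch may differ:

```java
// Illustrative sketch of the two knobs above; constants are assumed,
// not taken from the linked branch.
public class AcornKnobs {
  // Go two-hop only if more than half of a candidate's neighbors fail the filter.
  static final double TWO_HOP_THRESHOLD = 0.5;
  // Consider 1.5x the usual number of candidates per node.
  static final double OVERSAMPLE = 1.5;

  /** Expand to two-hop only when enough of this candidate's neighbors are filtered out. */
  static boolean shouldExpandTwoHop(int neighborCount, int filteredOutCount) {
    return neighborCount > 0
        && (double) filteredOutCount / neighborCount > TWO_HOP_THRESHOLD;
  }

  /** Oversampled budget for total candidates considered per node. */
  static int candidateBudget(int maxConn) {
    return (int) (maxConn * OVERSAMPLE);
  }
}
```

The appeal of percentage-based knobs is that they adapt to how restrictive the filter actually is around each node, rather than using one global switch.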

Tweaking the settings might take some iteration.

I also think there are things to do around:

  • Considering jumping up the layer and going to a different entry point if we get "far away" from the current entry point
  • More intelligently choosing the candidates for two-hop
  • Consider three-hop on very restrictive filters (e.g. we don't satiate our expanded set, we should look one layer more out)

@benwtrent (Member)

I updated my branch further. I got some interesting results indicating that our graph exploration is slightly too expensive (vint reading and graph seeks end up dominating the cost), but it still ends up faster, with nicer recall behavior, than baseline:

1M vectors

Candidate: Note the weirdness where visited is low but latency is high; this is due to graph serialization costs (vint decoding and binary search to find NSW offsets). But even with this weirdness, it ends up faster than baseline at comparable recall.

recall  latency (ms)     nDoc  topK  fanout  maxConn  beamWidth  visited  index s  index docs/s  index size (MB)  selectivity  vec disk (MB)  vec RAM (MB)
 0.457         0.394  1000000   100       0       16        100      543     0.00      Infinity          3976.06         0.001      3906.250      3906.250
 0.617         4.128  1000000   100       0       16        100     1169     0.00      Infinity          3976.06         0.005      3906.250      3906.250
 0.704         4.460  1000000   100       0       16        100     1119     0.00      Infinity          3976.06         0.01       3906.250      3906.250
 0.856         4.426  1000000   100       0       16        100     3018     0.00      Infinity          3976.06         0.05       3906.250      3906.250
 0.860         3.279  1000000   100       0       16        100     3190     0.00      Infinity          3976.06         0.10       3906.250      3906.250
 0.856         2.868  1000000   100       0       16        100     3280     0.00      Infinity          3976.06         0.20       3906.250      3906.250
 0.844         2.595  1000000   100       0       16        100     3341     0.00      Infinity          3976.06         0.30       3906.250      3906.250
 0.846         2.375  1000000   100       0       16        100     3277     0.00      Infinity          3976.06         0.40       3906.250      3906.250
 0.848         2.125  1000000   100       0       16        100     3127     0.00      Infinity          3976.06         0.50       3906.250      3906.250
 0.852         1.913  1000000   100       0       16        100     2926     0.00      Infinity          3976.06         0.60       3906.250      3906.250
 0.855         1.674  1000000   100       0       16        100     2669     0.00      Infinity          3976.06         0.70       3906.250      3906.250
 0.858         1.503  1000000   100       0       16        100     2357     0.00      Infinity          3976.06         0.80       3906.250      3906.250
 0.866         1.242  1000000   100       0       16        100     2009     0.00      Infinity          3976.06         0.90       3906.250      3906.250
 0.870         1.198  1000000   100       0       16        100     1931     0.00      Infinity          3976.06         0.95       3906.250      3906.250
 1.000         0.613  1000000   100     250       16        100     1082     0.00      Infinity          3976.06         0.001      3906.250      3906.250
 0.830        13.153  1000000   100     250       16        100     2159     0.00      Infinity          3976.06         0.005      3906.250      3906.250
 0.869        15.463  1000000   100     250       16        100     3083     0.00      Infinity          3976.06         0.01       3906.250      3906.250
 0.943        13.776  1000000   100     250       16        100     8378     0.00      Infinity          3976.06         0.05       3906.250      3906.250
 0.952        10.126  1000000   100     250       16        100     8947     0.00      Infinity          3976.06         0.10       3906.250      3906.250
 0.957         7.990  1000000   100     250       16        100     9164     0.00      Infinity          3976.06         0.20       3906.250      3906.250
 0.958         6.964  1000000   100     250       16        100     9239     0.00      Infinity          3976.06         0.30       3906.250      3906.250
 0.960         6.533  1000000   100     250       16        100     9077     0.00      Infinity          3976.06         0.40       3906.250      3906.250
 0.961         5.836  1000000   100     250       16        100     8666     0.00      Infinity          3976.06         0.50       3906.250      3906.250
 0.963         5.671  1000000   100     250       16        100     8117     0.00      Infinity          3976.06         0.60       3906.250      3906.250
 0.965         4.923  1000000   100     250       16        100     7348     0.00      Infinity          3976.06         0.70       3906.250      3906.250
 0.966         4.151  1000000   100     250       16        100     6374     0.00      Infinity          3976.06         0.80       3906.250      3906.250
 0.969         3.323  1000000   100     250       16        100     5392     0.00      Infinity          3976.06         0.90       3906.250      3906.250
 0.970         3.188  1000000   100     250       16        100     5212     0.00      Infinity          3976.06         0.95       3906.250      3906.250

Baseline:

recall  latency (ms)     nDoc  topK  fanout  maxConn  beamWidth  visited  index s  index docs/s  index size (MB)  selectivity  vec disk (MB)  vec RAM (MB)
 1.000         1.041  1000000   100       0       16        100     1955     0.00      Infinity          3976.06         0.001      3906.250      3906.250
 1.000         5.594  1000000   100       0       16        100     9699     0.00      Infinity          3976.06         0.005      3906.250      3906.250
 1.000        12.979  1000000   100       0       16        100    19847     0.00      Infinity          3976.06         0.01       3906.250      3906.250
 0.956        15.219  1000000   100       0       16        100    22052     0.00      Infinity          3976.06         0.05       3906.250      3906.250
 0.950         8.020  1000000   100       0       16        100    12591     0.00      Infinity          3976.06         0.10       3906.250      3906.250
 0.936         4.343  1000000   100       0       16        100     7148     0.00      Infinity          3976.06         0.20       3906.250      3906.250
 0.926         3.095  1000000   100       0       16        100     5202     0.00      Infinity          3976.06         0.30       3906.250      3906.250
 0.917         2.280  1000000   100       0       16        100     4077     0.00      Infinity          3976.06         0.40       3906.250      3906.250
 0.907         1.944  1000000   100       0       16        100     3401     0.00      Infinity          3976.06         0.50       3906.250      3906.250
 0.899         1.750  1000000   100       0       16        100     2951     0.00      Infinity          3976.06         0.60       3906.250      3906.250
 0.892         1.547  1000000   100       0       16        100     2626     0.00      Infinity          3976.06         0.70       3906.250      3906.250
 0.887         1.414  1000000   100       0       16        100     2376     0.00      Infinity          3976.06         0.80       3906.250      3906.250
 0.883         1.277  1000000   100       0       16        100     2154     0.00      Infinity          3976.06         0.90       3906.250      3906.250
 0.879         1.221  1000000   100       0       16        100     2061     0.00      Infinity          3976.06         0.95       3906.250      3906.250
 1.000         1.091  1000000   100     250       16        100     1955     0.00      Infinity          3976.06         0.001      3906.250      3906.250
 1.000         5.649  1000000   100     250       16        100     9699     0.00      Infinity          3976.06         0.005      3906.250      3906.250
 1.000        13.369  1000000   100     250       16        100    19847     0.00      Infinity          3976.06         0.01       3906.250      3906.250
 0.999        57.741  1000000   100     250       16        100    90785     0.00      Infinity          3976.06         0.05       3906.250      3906.250
 0.991        25.507  1000000   100     250       16        100    35212     0.00      Infinity          3976.06         0.10       3906.250      3906.250
 0.988        13.688  1000000   100     250       16        100    19755     0.00      Infinity          3976.06         0.20       3906.250      3906.250
 0.985         9.582  1000000   100     250       16        100    14228     0.00      Infinity          3976.06         0.30       3906.250      3906.250
 0.984         7.421  1000000   100     250       16        100    11239     0.00      Infinity          3976.06         0.40       3906.250      3906.250
 0.982         5.938  1000000   100     250       16        100     9366     0.00      Infinity          3976.06         0.50       3906.250      3906.250
 0.980         5.067  1000000   100     250       16        100     8080     0.00      Infinity          3976.06         0.60       3906.250      3906.250
 0.978         4.419  1000000   100     250       16        100     7147     0.00      Infinity          3976.06         0.70       3906.250      3906.250
 0.976         3.970  1000000   100     250       16        100     6443     0.00      Infinity          3976.06         0.80       3906.250      3906.250
 0.975         3.587  1000000   100     250       16        100     5843     0.00      Infinity          3976.06         0.90       3906.250      3906.250
 0.974         3.394  1000000   100     250       16        100     5587     0.00      Infinity          3976.06         0.95       3906.250      3906.250

@benchaplin (Contributor, Author)

Awesome stuff @benwtrent - thanks for spearheading the luceneutil recall fix. I'm still trying to wrap my head around how I saw so many "patterns" in those numbers during initial development when I guess they were all... bogus? Anyway, glad you were able to get back the gains.

I like your idea to look past two-hop neighbors to a point (from what I can tell, that's what you're doing now with `filteredNeighborQueue.add(twoHopFriendOrd);`). I'm struggling to understand the reasoning behind the definitions of `expandedNeighborCount` & `maxExpandedNeighbors`, though.

I'll pull your branch and play around with some parameters. Also been working on adding correlated filters to luceneutil so should be able to test with that soon.

@benwtrent (Member)

Thanks @benchaplin

Those constants and numbers are focused on expanding and contracting the graph search as we hit various NSW with more or fewer matching docs.

One dictates how many extra matching candidates we take over the current NSW connections. (e.g. collecting 40 valid candidates instead of 32).

The other controls how far into the graph we will search non-matching candidates. At some point, you just gotta cut your losses. I tried to relate this to how restrictive the filter is relative to the NSW connections.

The numbers and their calculations are definitely not written in stone, so if you find something that works better let me know.

Definitely a POC and poorly documented right now.

Another good idea is to expand our entry points to layer zero on very restricted filters (e.g. when the filter limit is less than the number of docs on layer 1 or something).

@benwtrent (Member)

@benchaplin I updated my branch further and simplified the logic:

  • We consider at most 2x scored candidates.
  • We explore at most 3x filtered candidates.

This seemed to be a sweet spot and simplified the math (which had numerous numerical errors).
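The simplified "2x scored / 3x explored" budgets could be sketched like this, relative to some base candidate count `k` (e.g. the beam width). The field and method names are illustrative, not from the branch:

```java
// Sketch of the simplified budgets described above: cap scored
// candidates at 2x a base budget k, and cap traversal of filtered-out
// nodes at 3x. Names are illustrative only.
public class ExplorationBudget {
  final int k;       // base candidate budget (an assumption: e.g. beamWidth)
  int scored = 0;    // candidates actually compared against the query vector
  int explored = 0;  // filtered-out nodes traversed while hunting for matches

  ExplorationBudget(int k) {
    this.k = k;
  }

  boolean mayScore() {
    return scored < 2 * k;
  }

  boolean mayExplore() {
    return explored < 3 * k;
  }
}
```

Fixed multipliers like these trade a little adaptivity for much simpler math than per-node percentage formulas.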

I think it's getting close to being "done". There are some failing tests, but I think they can be easily fixed.

One thing I am worried about is that this change will pretty substantially change recall at various filter levels. Filtered kNN search is likely the most common kNN search there is. So, we should somehow allow users to keep the old behavior and opt in, as this might be considered a breaking change due to the significant change in behavior. Maybe it's not a breaking change, maybe it's just ok 🤷

Here are some other augmentations to the current change I tried. Neither seems justified IMO.

A. Adding just the first maxConn filtered docs to the result set & entry point list on the bottom layer. This didn't seem to do anything useful and hurt the recall/latency & vector ops measurements.
B. Adding additional 'nearest entry points' from layer 1. So, gather the nearest and then gather N more (up to some limit related to maxConn) and use those when exploring the base layer. This seemed to have better behavior for highly filtered scenarios than option A. But, it still didn't add much. So, I opted for simplicity.

@benwtrent (Member)

I am still getting some pretty abhorrent cases (a 0.005 filter over 1M docs with topK=100 and 100-250 fanout).

I wonder if I have a bug with tracking visited nodes and allowing adequate exploration of valid candidates....

@benwtrent (Member)

I think I have addressed the bug in my implementation. I simplified it greatly and it now more closely resembles your original change, though with some constants changed. The recall & runtime curves look way better now for most filtered results. I will post graphs soon, as the raw data is an eyesore.

Here is my data: https://docs.google.com/spreadsheets/d/1GqD7Jw42IIqimr2nB78fzEfOohrcBlJzOlpt0NuUVDQ/edit?usp=sharing

It looks like at 0.5 and above the improvements start tapering off, but it never gets significantly worse :/

Here is a graph of some of the more restrictive filters. 10% filtered allowed does much much better with this new algorithm.

(image: recall/latency graph for the more restrictive filters)

@benchaplin (Contributor, Author)

One thing I've been unsure about is this "dropping to brute-force" (recall = 1 & visited ≈ selectivity * nDoc). I can't seem to find where this happens in the code.

It's interesting to see where the gains happen (looking at your Google doc) - it seems like 0.1 selectivity is the sweet spot, 3-4x latency speedup with little recall drop (particularly for high fanouts). By 0.5 selectivity those gains are gone. For selectivities <= 0.01, baseline drops to brute-force. But the candidate gets worse and worse recall numbers until dropping to brute force at 0.001.

Pure opt-in is one option, but I wonder if it's possible to force the candidate to drop to brute-force in the same cases the baseline does. I guess the question is: will anyone get value out of 0.5 recall at 2-3x the speed of brute force's 1.0 recall?

@benwtrent (Member)

The brute force drop occurs when the visit limit is breached. You can see this in `AbstractKnnVectorQuery`.

I only count visited on vector comparisons. So filtered search will explore the graph more before dropping to brute force.
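The fallback control flow described above could be sketched as a toy like this. It only models the decision, not Lucene's actual classes; the names are illustrative, and in Lucene the analogous logic lives around the kNN query and its collector:

```java
// Toy sketch of the visit-limit fallback: if the approximate graph
// search breaches its visit budget, the query is re-run as an exact
// (brute-force) scan over the filtered docs. Names are illustrative.
public class VisitLimitFallback {
  record Result(boolean incomplete) {}

  // Toy approximate search: reports incomplete results once the number
  // of vector comparisons needed exceeds the visit limit.
  static Result approximate(int comparisonsNeeded, int visitLimit) {
    return new Result(comparisonsNeeded > visitLimit);
  }

  static String strategy(int comparisonsNeeded, int visitLimit) {
    Result r = approximate(comparisonsNeeded, visitLimit);
    return r.incomplete ? "brute-force" : "approximate";
  }
}
```

Since only vector comparisons count toward the limit, a filtered search can traverse many filtered-out nodes before the fallback triggers, which matches the point above about exploring the graph more before dropping to brute force.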

I do think we will always need a "drop to brute force" to catch edge cases but we need to make the calculation on when to do brute force upfront better. Right now it's pretty naive.

I tried this a little in my change (see changes in hnsw reader), but it can be much better.

As for exposing this to users, I am thinking it needs to be a setting on the kNN query that gets passed to the collector. This is a pretty big change in search behavior.
