Implement ACORN-1 search for HNSW #14085
base: main
Conversation
Thank you for taking a stab at this @benchaplin! I wonder if we can adjust the algorithm to more intelligently switch between the two approaches. Something like:
The initial algorithm makes sense: we are trying to recover the graph connectedness for exploration. The bottom layer entry point is the initial "exploration zone". One idea is that we allow multiple "exploration zones" from which we fan out to find the filtered values. These are just 🧠 ⚡ ideas. The initial numbers are promising.
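To make the "exploration zones" idea concrete, here is a minimal sketch of how seeding could look, assuming a simple helper that collects a handful of filter-matching layer-0 neighbors as extra starting points. The class, method, and parameter names are illustrative, not code from this PR or branch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

/**
 * Illustrative sketch (not this PR's code): seed the layer-0 search from several
 * filter-matching "exploration zones" instead of only the single entry point
 * handed down from the upper layers.
 */
final class ExplorationZones {

  static List<Integer> seedZones(
      int entryPoint, int[] layer0Neighbors, IntPredicate acceptDocs, int maxZones) {
    List<Integer> seeds = new ArrayList<>();
    seeds.add(entryPoint); // always keep the usual entry point
    for (int node : layer0Neighbors) {
      if (seeds.size() >= maxZones) {
        break; // cap the fan-out so seeding stays cheap
      }
      if (acceptDocs.test(node)) {
        seeds.add(node); // a matching neighbor becomes an extra exploration zone
      }
    }
    return seeds;
  }
}
```

The best-first layer-0 search would then start from all of these seeds at once rather than from a single node.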
Hey @benchaplin, there are a number of things broken with luceneutil right now. Your recall numbers surprised me and I think they don't reflect actual performance. I am working on getting real numbers with some local patches and providing patches to luceneutil as I can. See:
This is the branch I am using for testing recall/latency for filter cases right now: https://github.com/mikemccand/luceneutil/compare/main...benwtrent:luceneutil:filter-testing?expand=1
Here are some benchmarks (100k float32[1024]). Baseline:
Candidate:
You can see that until about 50% selectivity, latency & recall are worse in candidate. However, as we select even fewer than 50%, visited gets better, but recall suffers. This is likely because in the more restrictive cases we are actually dropping to brute force because of excessive exploration (note the 1.0 recall in baseline).
https://github.com/apache/lucene/compare/main...benwtrent:lucene:acorn_search?expand=1
Here are two of my ideas:
Tweaking the settings might take some steps. I also think there are things to do around:
I updated my branch further. Got some interesting results which indicate that our graph exploration is slightly too expensive (vint reading and graph seek end up dominating the cost), but it still ends up faster, with nicer recall behavior, than baseline (1M vectors).
Candidate:
Note the weirdness where visited is low but latency is high; this is due to graph serialization costs (vint decoding and the binary search to find NSW offsets). But even with this weirdness, it ends up faster than baseline at comparable recall.
Baseline:
Awesome stuff @benwtrent - thanks for spearheading the luceneutil recall fix. Still trying to wrap my head around how I followed so many "patterns" in those numbers during initial development when I guess they were all... bogus? Anyway, glad you were able to get back the gains. I like your idea to look past 2-hop neighbors up to a point (from what I can tell, that's what you're doing now). I'll pull your branch and play around with some parameters. I've also been working on adding correlated filters to luceneutil, so I should be able to test with that soon.
Thanks @benchaplin. Those constants and numbers are focused on expanding and contracting the graph search as we hit various NSWs with more or fewer matching docs. One dictates how many extra matching candidates we take over the current NSW connections (e.g. collecting 40 valid candidates instead of 32). The other controls how far into the graph we will search non-matching candidates. At some point, you just gotta cut your losses. I tried to relate this to how restricted the filter is relative to the NSW connections. The numbers and their calculations are definitely not written in stone, so if you find something that works better let me know. Definitely a POC and poorly documented right now. Another good idea is to expand our entry points into layer zero on very restricted filters (e.g. when the filter limit is less than the number of docs on layer 1 or something).
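A minimal sketch of how those two knobs could shape the expansion of a single node, assuming a factor for "extra matching candidates over the NSW connection count" and a cap on non-matching exploration. The names, values, and the two-hop lookup here are my own illustration, not the branch's actual constants or code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

/** Illustrative sketch of the two knobs described above; not the branch's code. */
final class FilteredExpansion {

  // Knob 1: take a few more matching candidates than the node's NSW connection
  // count (e.g. roughly 40 valid candidates when a node has 32 connections).
  static final float EXTRA_MATCHING_FACTOR = 1.25f;

  // Knob 2: stop exploring through non-matching candidates after this many,
  // i.e. cut your losses on very restrictive filters.
  static final int MAX_NON_MATCHING_VISITS = 64;

  /**
   * Gathers filter-matching candidates around one node. {@code twoHop[i]} holds
   * the neighbors of {@code nswNeighbors[i]} so blocked neighbors can still be
   * "looked through".
   */
  static List<Integer> expand(int[] nswNeighbors, int[][] twoHop, IntPredicate acceptDocs) {
    int wanted = Math.round(nswNeighbors.length * EXTRA_MATCHING_FACTOR);
    int nonMatchingVisited = 0;
    List<Integer> matching = new ArrayList<>();
    for (int i = 0; i < nswNeighbors.length && matching.size() < wanted; i++) {
      int nb = nswNeighbors[i];
      if (acceptDocs.test(nb)) {
        matching.add(nb); // direct neighbor passes the filter
      } else if (nonMatchingVisited++ < MAX_NON_MATCHING_VISITS) {
        for (int nb2 : twoHop[i]) { // look one hop further through the blocked node
          if (matching.size() >= wanted) {
            break;
          }
          if (acceptDocs.test(nb2)) {
            matching.add(nb2);
          }
        }
      }
    }
    return matching;
  }
}
```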
@benchaplin I updated my branch further and simplified the logic:
This seemed to be a sweet spot and simplified the math (which had numerous numerical errors). I think it's getting close to being "done"; there are some failing tests, but I think they can be easily fixed. One thing I am worried about is that this change will pretty substantially change recall at various filter levels. Filtered kNN search is likely the most common kNN search there is. So, we should somehow allow users to keep the old behavior and opt in, as this might be considered a breaking change due to the significant change in behavior. Maybe it's not a breaking change, maybe it's just OK 🤷 Here are some other augmentations to the current change I tried. Neither seems justified IMO. A. Adding just the first
I am still getting some pretty abhorrent cases (0.005 filter over 1M docs with k=100 and 100-250 fan-out). I wonder if I have a bug with tracking visited nodes and allowing adequate exploration of valid candidates...
I think I have addressed the bug in my implementation. I simplified it greatly and it more closely resembles your original change, though with some constants changed. The recall curve & runtime curve look way better now for most filtered results. I will post graphs soon, as the raw data is an eyesore. Here is my data: https://docs.google.com/spreadsheets/d/1GqD7Jw42IIqimr2nB78fzEfOohrcBlJzOlpt0NuUVDQ/edit?usp=sharing It looks like at 0.5 selectivity and above the improvements start tapering off, but it never gets significantly worse :/ Here is a graph of some of the more restrictive filters. 10% filtered allowed does much, much better with this new algorithm.
One thing I've been unsure about is this "dropping to brute force" (recall = 1 & visited ~= selectivity * nDoc). I can't seem to find where this happens in the code. It's interesting to see where the gains happen (looking at your Google doc) - it seems like 0.1 selectivity is the sweet spot: 3-4x latency speedup with little recall drop (particularly for high fanouts). By 0.5 selectivity those gains are gone. For selectivities <= 0.01, baseline drops to brute force. But the candidate gets worse and worse recall until dropping to brute force at 0.001. Pure opt-in is one option, but I wonder if it's possible to force the candidate to drop to brute force for the same cases the baseline does. I guess the question is: will anyone get value out of 0.5 recall that is 2-3x faster than brute force's 1.0 recall?
The brute force drop occurs when the visit limit is breached. You can see this in the abstract kNN query. I only count visited on vector comparisons, so filtered search will explore the graph more before dropping to brute force. I do think we will always need a "drop to brute force" to catch edge cases, but we need to make the up-front calculation of when to do brute force better; right now it's pretty naive. I tried this a little in my change (see the changes in the HNSW reader), but it can be much better. As for exposing this to users, I am thinking it needs to be a setting on the kNN query that gets passed to the collector. This is a pretty big change in search behavior.
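For readers following along, the fallback pattern being described is roughly: the approximate graph search reports whether it gave up at the visit limit, and if so the query scores the filter's matching docs exactly. Here is a simplified, self-contained sketch of that decision; the types and names are illustrative, not the actual code in Lucene's abstract kNN query.

```java
import java.util.function.Function;
import java.util.function.IntToDoubleFunction;
import java.util.stream.IntStream;

/** Simplified sketch of the "drop to brute force" decision; illustrative only. */
final class BruteForceFallback {

  record ApproxResult(int[] topDocs, boolean hitVisitLimit) {}

  static int[] search(
      int[] filteredDocs,            // doc ids that pass the filter
      int k,
      IntToDoubleFunction scoreDoc,  // exact vector comparison for one doc
      Function<Integer, ApproxResult> approxSearch) {
    // Tie the visit limit to the filter's cardinality: if the graph search would
    // compare more vectors than the filter even has docs, brute force is cheaper.
    int visitLimit = filteredDocs.length;
    ApproxResult approx = approxSearch.apply(visitLimit);
    if (!approx.hitVisitLimit()) {
      return approx.topDocs();
    }
    // Fallback: score every filtered doc exactly (recall == 1 by construction).
    return IntStream.of(filteredDocs)
        .boxed()
        .sorted((a, b) -> Double.compare(scoreDoc.applyAsDouble(b), scoreDoc.applyAsDouble(a)))
        .limit(k)
        .mapToInt(Integer::intValue)
        .toArray();
  }
}
```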
Description
Playing around with some ideas from ACORN-1 to improve filtered HNSW search. The ideas are:
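To give a sense of the general direction (my reading of the ACORN-1 style approach, not necessarily the exact logic in this change): keep traversing through nodes that fail the filter so the graph stays connected, but only spend vector comparisons on nodes that match. The sketch below uses a plain frontier queue instead of HNSW's best-first candidate queue, and the interfaces are illustrative rather than Lucene's actual classes.

```java
import java.util.ArrayDeque;
import java.util.BitSet;
import java.util.Deque;
import java.util.function.IntPredicate;

/** Illustrative sketch of predicate-aware traversal; not the code in this PR. */
final class PredicateAwareTraversal {

  interface Graph { int[] neighbors(int node); }
  interface Scorer { float score(int node); }          // one vector comparison
  interface Collector { void collect(int node, float score); }

  static void search(
      Graph graph, Scorer scorer, Collector results,
      IntPredicate acceptDocs, int entryPoint, int visitLimit) {
    BitSet seen = new BitSet();
    Deque<Integer> frontier = new ArrayDeque<>();
    frontier.add(entryPoint);
    seen.set(entryPoint);
    int compared = 0; // only vector comparisons count toward the visit limit
    while (!frontier.isEmpty() && compared < visitLimit) {
      int node = frontier.poll();
      if (acceptDocs.test(node)) {
        results.collect(node, scorer.score(node)); // score only matching nodes
        compared++;
      }
      for (int nb : graph.neighbors(node)) { // traverse regardless of the filter
        if (!seen.get(nb)) {
          seen.set(nb);
          frontier.add(nb);
        }
      }
    }
  }
}
```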
I benchmarked using Cohere/wikipedia-22-12-en-embeddings with params:
Here are some results:
Baseline:
Candidate (this code):
Pros: significantly faster for selective filters.
Cons: slightly worse recall across the board, slightly slower for inclusive filters.
There's a lot to play around with here; this code represents the best results I got with this testing. One thing that must be tested is correlation between the filter and the query vector (this is discussed and tested in the paper). luceneutil only offers zero correlation at the moment, so I'm working on adding a knob there for future benchmarks.
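As a sketch of what that knob could look like (purely illustrative, not luceneutil code): blend a slice of the query's true nearest neighbors into the filter according to a correlation parameter, and fill the rest of the filter with random docs.

```java
import java.util.Random;

/**
 * Illustrative sketch of a filter/query correlation knob for benchmarking;
 * corr = 0 gives a purely random filter, corr = 1 draws the filter entirely
 * from the query's true nearest neighbors. Not luceneutil code.
 */
final class CorrelatedFilter {

  static boolean[] build(
      int numDocs, double selectivity, double corr, int[] trueNearestDocs, long seed) {
    Random rnd = new Random(seed);
    boolean[] accept = new boolean[numDocs];
    int target = (int) Math.round(numDocs * selectivity);
    int fromNeighbors = (int) Math.round(target * corr);
    int selected = 0;
    // Correlated part: take docs from the query's true nearest neighbors.
    for (int i = 0; i < fromNeighbors && i < trueNearestDocs.length; i++) {
      if (!accept[trueNearestDocs[i]]) {
        accept[trueNearestDocs[i]] = true;
        selected++;
      }
    }
    // Uncorrelated part: fill up to the selectivity target with random docs.
    while (selected < target) {
      int doc = rnd.nextInt(numDocs);
      if (!accept[doc]) {
        accept[doc] = true;
        selected++;
      }
    }
    return accept;
  }
}
```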
Code should also be cleaned up, but for me, keeping everything in one method makes it easier to read the changes.