Implement ACORN-1 search for HNSW #14085
base: main
Conversation
Thank you for taking a stab at this @benchaplin! I wonder if we can adjust the algorithm to more intelligently switch between the two approaches. Something like:
The initial algorithm makes sense: we are trying to recover the graph connectedness for exploration. The bottom layer entry point is the initial "exploration zone". One idea is that we allow multiple "exploration zones" from which we fan out to find the filtered values. These are just 🧠 ⚡ ideas. The initial numbers are promising.
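To make the "exploration zones" idea concrete, here is a minimal sketch of how seeding could look, assuming a simple helper that collects a handful of filter-matching layer-0 neighbors as extra starting points. The class, method, and parameter names are illustrative, not code from this PR or branch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

/**
 * Illustrative sketch (not this PR's code): seed the layer-0 search from several
 * filter-matching "exploration zones" instead of only the single entry point
 * handed down from the upper layers.
 */
final class ExplorationZones {

  static List<Integer> seedZones(
      int entryPoint, int[] layer0Neighbors, IntPredicate acceptDocs, int maxZones) {
    List<Integer> seeds = new ArrayList<>();
    seeds.add(entryPoint); // always keep the usual entry point
    for (int node : layer0Neighbors) {
      if (seeds.size() >= maxZones) {
        break; // cap the fan-out so seeding stays cheap
      }
      if (acceptDocs.test(node)) {
        seeds.add(node); // a matching neighbor becomes an extra exploration zone
      }
    }
    return seeds;
  }
}
```

The best-first layer-0 search would then start from all of these seeds at once rather than from a single node.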
Hey @benchaplin, there are a number of things broken with luceneutil right now. Your recall numbers surprised me and I think they don't reflect actual performance. I am working on getting real numbers with some local patches and providing patches to luceneutil as I can. See:
This is the branch I am using for testing recall/latency for filter cases right now: https://github.com/mikemccand/luceneutil/compare/main...benwtrent:luceneutil:filter-testing?expand=1
Here are some benchmarks (100k float32[1024]). Baseline:
Candidate:
You can see that until about 50% selectivity, latency & recall are worse in candidate. However, as we select even fewer than 50%, visited gets better, but recall suffers. This is likely because in the more restrictive cases we are actually dropping to brute force because of excessive exploration (note the 1.0 recall in baseline).
https://github.com/apache/lucene/compare/main...benwtrent:lucene:acorn_search?expand=1
Here are two of my ideas:
Tweaking the settings might take some steps. I also think there are things to do around:
I updated my branch further. Got some interesting results which indicate that our graph exploration is slightly too expensive (vint reading and graph seek end up dominating the cost), but it still ends up faster, with nicer recall behavior, than baseline (1M vectors).
Candidate:
Note the weirdness where visited is low but latency is high; this is due to graph serialization costs (vint decoding and the binary search to find NSW offsets). But even with this weirdness, it ends up faster than baseline at comparable recall.
Baseline:
Awesome stuff @benwtrent - thanks for spearheading the luceneutil recall fix. Still trying to wrap my head around how I followed so many "patterns" in those numbers during initial development when I guess they were all... bogus? Anyway, glad you were able to get back the gains. I like your idea to look past 2-hop neighbors up to a point (from what I can tell, that's what you're doing now). I'll pull your branch and play around with some parameters. I've also been working on adding correlated filters to luceneutil, so I should be able to test with that soon.
Thanks @benchaplin. Those constants and numbers are focused on expanding and contracting the graph search as we hit various NSWs with more or fewer matching docs. One dictates how many extra matching candidates we take over the current NSW connections (e.g. collecting 40 valid candidates instead of 32). The other controls how far into the graph we will search non-matching candidates. At some point, you just gotta cut your losses. I tried to relate this to how restricted the filter is relative to the NSW connections. The numbers and their calculations are definitely not written in stone, so if you find something that works better let me know. Definitely a POC and poorly documented right now. Another good idea is to expand our entry points into layer zero on very restricted filters (e.g. when the filter limit is less than the number of docs on layer 1 or something).
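A minimal sketch of how those two knobs could shape the expansion of a single node, assuming a factor for "extra matching candidates over the NSW connection count" and a cap on non-matching exploration. The names, values, and the two-hop lookup here are my own illustration, not the branch's actual constants or code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

/** Illustrative sketch of the two knobs described above; not the branch's code. */
final class FilteredExpansion {

  // Knob 1: take a few more matching candidates than the node's NSW connection
  // count (e.g. roughly 40 valid candidates when a node has 32 connections).
  static final float EXTRA_MATCHING_FACTOR = 1.25f;

  // Knob 2: stop exploring through non-matching candidates after this many,
  // i.e. cut your losses on very restrictive filters.
  static final int MAX_NON_MATCHING_VISITS = 64;

  /**
   * Gathers filter-matching candidates around one node. {@code twoHop[i]} holds
   * the neighbors of {@code nswNeighbors[i]} so blocked neighbors can still be
   * "looked through".
   */
  static List<Integer> expand(int[] nswNeighbors, int[][] twoHop, IntPredicate acceptDocs) {
    int wanted = Math.round(nswNeighbors.length * EXTRA_MATCHING_FACTOR);
    int nonMatchingVisited = 0;
    List<Integer> matching = new ArrayList<>();
    for (int i = 0; i < nswNeighbors.length && matching.size() < wanted; i++) {
      int nb = nswNeighbors[i];
      if (acceptDocs.test(nb)) {
        matching.add(nb); // direct neighbor passes the filter
      } else if (nonMatchingVisited++ < MAX_NON_MATCHING_VISITS) {
        for (int nb2 : twoHop[i]) { // look one hop further through the blocked node
          if (matching.size() >= wanted) {
            break;
          }
          if (acceptDocs.test(nb2)) {
            matching.add(nb2);
          }
        }
      }
    }
    return matching;
  }
}
```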
@benchaplin I updated my branch further and simplified the logic:
This seemed to be a sweet spot and simplified the math (which had numerous numerical errors). I think it's getting close to being "done"; there are some failing tests, but I think they can be easily fixed. One thing I am worried about is that this change will pretty substantially change recall at various filter levels. Filtered kNN search is likely the most common kNN search there is. So, we should somehow allow users to keep the old behavior and opt in, as this might be considered a breaking change due to the significant change in behavior. Maybe it's not a breaking change, maybe it's just OK 🤷 Here are some other augmentations to the current change I tried. Neither seems justified IMO. A. Adding just the first
I am still getting some pretty abhorrent cases (0.005 filter over 1M docs with k=100 and 100-250 fan-out). I wonder if I have a bug with tracking visited nodes and allowing adequate exploration of valid candidates...
I think I have addressed the bug in my implementation. I simplified it greatly and it more closely resembles your original change, though with some constants changed. The recall curve & runtime curve look way better now for most filtered results. I will post graphs soon, as the raw data is an eyesore. Here is my data: https://docs.google.com/spreadsheets/d/1GqD7Jw42IIqimr2nB78fzEfOohrcBlJzOlpt0NuUVDQ/edit?usp=sharing It looks like at 0.5 selectivity and above the improvements start tapering off, but it never gets significantly worse :/ Here is a graph of some of the more restrictive filters. 10% filtered allowed does much, much better with this new algorithm.
One thing I've been unsure about is this "dropping to brute force" (recall = 1 & visited ~= selectivity * nDoc). I can't seem to find where this happens in the code. It's interesting to see where the gains happen (looking at your Google doc) - it seems like 0.1 selectivity is the sweet spot: 3-4x latency speedup with little recall drop (particularly for high fanouts). By 0.5 selectivity those gains are gone. For selectivities <= 0.01, baseline drops to brute force. But the candidate gets worse and worse recall until dropping to brute force at 0.001. Pure opt-in is one option, but I wonder if it's possible to force the candidate to drop to brute force for the same cases the baseline does. I guess the question is: will anyone get value out of 0.5 recall that is 2-3x faster than brute force's 1.0 recall?
The brute force drop occurs when the visit limit is breached. You can see this in the abstract kNN query. I only count visited on vector comparisons, so filtered search will explore the graph more before dropping to brute force. I do think we will always need a "drop to brute force" to catch edge cases, but we need to make the up-front calculation of when to do brute force better; right now it's pretty naive. I tried this a little in my change (see the changes in the HNSW reader), but it can be much better. As for exposing this to users, I am thinking it needs to be a setting on the kNN query that gets passed to the collector. This is a pretty big change in search behavior.
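For readers following along, the fallback pattern being described is roughly: the approximate graph search reports whether it gave up at the visit limit, and if so the query scores the filter's matching docs exactly. Here is a simplified, self-contained sketch of that decision; the types and names are illustrative, not the actual code in Lucene's abstract kNN query.

```java
import java.util.function.Function;
import java.util.function.IntToDoubleFunction;
import java.util.stream.IntStream;

/** Simplified sketch of the "drop to brute force" decision; illustrative only. */
final class BruteForceFallback {

  record ApproxResult(int[] topDocs, boolean hitVisitLimit) {}

  static int[] search(
      int[] filteredDocs,            // doc ids that pass the filter
      int k,
      IntToDoubleFunction scoreDoc,  // exact vector comparison for one doc
      Function<Integer, ApproxResult> approxSearch) {
    // Tie the visit limit to the filter's cardinality: if the graph search would
    // compare more vectors than the filter even has docs, brute force is cheaper.
    int visitLimit = filteredDocs.length;
    ApproxResult approx = approxSearch.apply(visitLimit);
    if (!approx.hitVisitLimit()) {
      return approx.topDocs();
    }
    // Fallback: score every filtered doc exactly (recall == 1 by construction).
    return IntStream.of(filteredDocs)
        .boxed()
        .sorted((a, b) -> Double.compare(scoreDoc.applyAsDouble(b), scoreDoc.applyAsDouble(a)))
        .limit(k)
        .mapToInt(Integer::intValue)
        .toArray();
  }
}
```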
Description
Playing around with some ideas from ACORN-1 to improve filtered HNSW search. The ideas are:
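To give a sense of the general direction (my reading of the ACORN-1 style approach, not necessarily the exact logic in this change): keep traversing through nodes that fail the filter so the graph stays connected, but only spend vector comparisons on nodes that match. The sketch below uses a plain frontier queue instead of HNSW's best-first candidate queue, and the interfaces are illustrative rather than Lucene's actual classes.

```java
import java.util.ArrayDeque;
import java.util.BitSet;
import java.util.Deque;
import java.util.function.IntPredicate;

/** Illustrative sketch of predicate-aware traversal; not the code in this PR. */
final class PredicateAwareTraversal {

  interface Graph { int[] neighbors(int node); }
  interface Scorer { float score(int node); }          // one vector comparison
  interface Collector { void collect(int node, float score); }

  static void search(
      Graph graph, Scorer scorer, Collector results,
      IntPredicate acceptDocs, int entryPoint, int visitLimit) {
    BitSet seen = new BitSet();
    Deque<Integer> frontier = new ArrayDeque<>();
    frontier.add(entryPoint);
    seen.set(entryPoint);
    int compared = 0; // only vector comparisons count toward the visit limit
    while (!frontier.isEmpty() && compared < visitLimit) {
      int node = frontier.poll();
      if (acceptDocs.test(node)) {
        results.collect(node, scorer.score(node)); // score only matching nodes
        compared++;
      }
      for (int nb : graph.neighbors(node)) { // traverse regardless of the filter
        if (!seen.get(nb)) {
          seen.set(nb);
          frontier.add(nb);
        }
      }
    }
  }
}
```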
I benchmarked using Cohere/wikipedia-22-12-en-embeddings with params:
Here are some results:
Baseline:
Candidate (this code):
Pros: significantly faster for selective filters.
Cons: slightly worse recall across the board, slightly slower for inclusive filters.
There's a lot to play around with here; this code represents the best results I got with this testing. One thing that must be tested is correlation between the filter and the query vector (this is discussed and tested in the paper). luceneutil only offers zero correlation at the moment, so I'm working on adding a knob there for future benchmarks.
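As a sketch of what that knob could look like (purely illustrative, not luceneutil code): blend a slice of the query's true nearest neighbors into the filter according to a correlation parameter, and fill the rest of the filter with random docs.

```java
import java.util.Random;

/**
 * Illustrative sketch of a filter/query correlation knob for benchmarking;
 * corr = 0 gives a purely random filter, corr = 1 draws the filter entirely
 * from the query's true nearest neighbors. Not luceneutil code.
 */
final class CorrelatedFilter {

  static boolean[] build(
      int numDocs, double selectivity, double corr, int[] trueNearestDocs, long seed) {
    Random rnd = new Random(seed);
    boolean[] accept = new boolean[numDocs];
    int target = (int) Math.round(numDocs * selectivity);
    int fromNeighbors = (int) Math.round(target * corr);
    int selected = 0;
    // Correlated part: take docs from the query's true nearest neighbors.
    for (int i = 0; i < fromNeighbors && i < trueNearestDocs.length; i++) {
      if (!accept[trueNearestDocs[i]]) {
        accept[trueNearestDocs[i]] = true;
        selected++;
      }
    }
    // Uncorrelated part: fill up to the selectivity target with random docs.
    while (selected < target) {
      int doc = rnd.nextInt(numDocs);
      if (!accept[doc]) {
        accept[doc] = true;
        selected++;
      }
    }
    return accept;
  }
}
```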
Code should also be cleaned up, but for me, keeping everything in one method makes it easier to read the changes.