Improve the performance of CAGRA new vector addition with the default params #569
Conversation
/ok to test
Thanks @enp1s0 for the PR! It looks good, but we should discuss whether we need a new parameter, or whether we could utilize the workspace memory resource for controlling the batch size.
The raft resource handle has a workspace memory resource. This is set up as an rmm::mr::limiting_resource_adaptor. The intended usage of the workspace is to query its size and adjust our internal batching parameter accordingly.
Here is an example of how we can query the available workspace size using `get_workspace_free_bytes`.
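For illustration, a minimal sketch of that pattern (the helper `pick_batch_size` and the per-vector cost are hypothetical; `raft::resource::get_workspace_free_bytes` is the call named above):

```cpp
#include <raft/core/device_resources.hpp>
#include <raft/core/resource/device_memory_resource.hpp>

#include <algorithm>
#include <cstddef>

// Hypothetical helper: derive the internal batch size from the free
// workspace memory instead of exposing a new user-facing parameter.
// `bytes_per_vector` is an assumed temporary-buffer cost per new vector.
std::size_t pick_batch_size(raft::resources const& handle,
                            std::size_t n_new_vectors,
                            std::size_t bytes_per_vector)
{
  std::size_t free_bytes = raft::resource::get_workspace_free_bytes(handle);
  std::size_t fit        = std::max<std::size_t>(free_bytes / bytes_per_vector, 1);
  return std::min(n_new_vectors, fit);
}
```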
Afterwards, we need to use the workspace memory resource to allocate the temporary buffers:
cuvs/cpp/src/neighbors/ivf_pq/ivf_pq_search.cuh (lines 678 to 684 in bd603a9):

```cpp
auto mr = raft::resource::get_workspace_resource(handle);
// Maximum number of query vectors to search at the same time.
const auto max_queries = std::min<uint32_t>(std::max<uint32_t>(n_queries, 1), kMaxQueries);
auto max_batch_size = get_max_batch_size(handle, k, n_probes, max_queries, max_samples);
rmm::device_uvector<float> float_queries(max_queries * dim_ext, stream, mr);
```
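The same pattern would carry over to the temporary buffers of the CAGRA extend path; a hedged sketch (the buffer name and the `batch_size * dim` sizing are assumptions, not code from this PR):

```cpp
// Allocate the per-batch staging buffer from the workspace resource,
// mirroring the IVF-PQ snippet above.
auto mr     = raft::resource::get_workspace_resource(handle);
auto stream = raft::resource::get_cuda_stream(handle);
rmm::device_uvector<float> batch_buffer(batch_size * dim, stream, mr);
```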
Users who need fine control over temporary allocations can set the workspace allocator explicitly:
cuvs/examples/cpp/src/cagra_example.cu (line 76 in bd603a9):

```cpp
// raft::resource::set_workspace_to_pool_resource(dev_resources, 2 * 1024 * 1024 * 1024ull);
```
Note that by default the allocation limit is 1/4 of the total GPU memory, which is potentially much larger than the default for `max_working_device_memory_size_in_megabyte` introduced in this PR.
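For example, a caller could cap the workspace explicitly before extending the index; a short sketch based on the commented-out line above:

```cpp
#include <raft/core/device_resources.hpp>
#include <raft/core/resource/device_memory_resource.hpp>

int main()
{
  raft::device_resources dev_resources;
  // Cap the workspace pool at 2 GiB; internal batching would then adapt
  // to this limit instead of the default 1/4 of total GPU memory.
  raft::resource::set_workspace_to_pool_resource(dev_resources, 2 * 1024 * 1024 * 1024ull);
  // ... build / extend the index using dev_resources ...
}
```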
…d use `raft::resource::get_workspace_free_bytes` instead
…to update-cagra-graph-extend
Thank you @tfeher for the review. I changed the code to use `raft::resource::get_workspace_free_bytes`.
Thanks Hiroyuki for the update! I have one more question below.
Thanks @enp1s0 for the updates, the PR looks good to me!
/merge
This PR updates the default chunk size of the CAGRA graph extension and also adds a knob to control the batch size of the CAGRA searches run internally, for better throughput.
The default chunk size was set to 1 in the current implementation because large chunk sizes can potentially cause low recall: no edges are created among nodes within the same chunk. However, my investigation shows that the low-recall problem rarely occurs even with large chunk sizes.
Search performance
The performance was measured after applying bugfix #565.
[Benchmark plot: degree = 32] (I don't know why the performance is unstable on NYTimes.)
[Benchmark plot: degree = 64]
Therefore, this PR increases the default chunk size to the number of new dataset vectors for better throughput. It also exposes a knob in the `extend` function to control the search batch size, balancing throughput against memory consumption.
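For illustration, a hedged sketch of calling `extend` with such a knob (the `extend_params` struct, its `max_chunk_size` field, and the exact overload are assumptions based on this PR's description, not confirmed by this thread):

```cpp
#include <cuvs/neighbors/cagra.hpp>
#include <raft/core/device_mdspan.hpp>
#include <raft/core/device_resources.hpp>

void add_vectors(raft::device_resources const& res,
                 cuvs::neighbors::cagra::index<float, uint32_t>& idx,
                 raft::device_matrix_view<const float, int64_t> new_vectors)
{
  cuvs::neighbors::cagra::extend_params params;
  // Assumed semantics: caps how many new vectors are processed per
  // internal chunk, trading memory use for throughput (0 = library default,
  // i.e. the chunk size is derived automatically).
  params.max_chunk_size = 0;
  cuvs::neighbors::cagra::extend(res, params, new_vectors, idx);
}
```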