throughput of inserted data #221
Comments
Hi @heyufeng666888, Could you report the throughput numbers you are getting? Insert throughput should not be very low. I will investigate, once I have more details! You can get even faster throughput if you insert into the table before creating the vector index, and create the vector index externally, as described here: https://docs.lantern.dev/lantern-cli/lantern-index#run-index-creation |
@Ngalstyan4 Inserting data without an index is indeed fast. But my business does not allow building the index after the data has been inserted, because production handles inserts and searches at the same time. Therefore, I create the index before inserting data. With the index in place, inserts are very slow: inserting 100k rows with 20 threads takes 3336 seconds. |
My vector column is already populated and the features have already been extracted. My current workflow is to create the index in advance and then insert data, which gives low throughput. |
Because my feature vectors have already been generated, I may need asynchronous index construction, so that building the index does not slow down inserts. |
Hi @heyufeng666888, can you check this Jupyter notebook on your machine? I inserted 100k 1536-dimensional vectors in ~700 seconds on my MacBook Pro using a single connection. Do you have a particular example with more details, so we can help better? |
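To put the two timings quoted so far on the same scale, here is a minimal sketch that converts them to items per second. It assumes the "100k data" in the 20-thread report means 100k rows in total, which the thread does not state explicitly; the function name `throughput` is only illustrative.

```python
def throughput(items: int, seconds: float) -> float:
    """Return insert throughput in items per second."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return items / seconds


# Figures quoted in this thread (assumption: 100k rows in total in both runs).
reported = throughput(100_000, 3336)  # 20 threads, index created before inserting
notebook = throughput(100_000, 700)   # single connection, roughly 700 seconds

print(f"reported: {reported:.1f} items/s")
print(f"notebook: {notebook:.1f} items/s")
print(f"ratio:    {notebook / reported:.1f}x")
```

Under that assumption the 20-thread run is several times slower than the single-connection notebook run, which is why the maintainers ask for more details about the setup.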
Hi @heyufeng666888, Did you get a chance to check out the notebook @var77 shared above? Please let us know if you are still having performance issues. Having more details would definitely help us address the issue more quickly, assuming it still exists. |
Hi @Ngalstyan4, |
@Ngalstyan4 @var77 |
Hi @Ngalstyan4 @var77, I tested the throughput using @var77's notebook, and it is still very low. May I know the Postgres configuration and the Lantern version you tested with?
{'platform': 'Darwin', 'platform-release': '23.1.0', 'platform-version': 'Darwin Kernel Version 23.1.0: Mon Oct 9 21:27:27 PDT 2023; root:xnu-10002.41.9~6/RELEASE_X86_64', 'architecture': 'x86_64', 'processor': 'i386', 'ram': '16 GB', 'cores': 8}
Inserted 10000 items - speed 14.33975296961922 item/s |
Hi @heyufeng666888, sorry for the inconvenience. I used the latest version of Lantern (built from source) with Postgres 15 installed via Homebrew (the Postgres settings were the defaults for me). You can also try increasing shared_buffers in the Postgres configuration to 20% of your memory and maybe set
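As a rough illustration of the shared_buffers suggestion, the sketch below turns "20% of your memory" into a postgresql.conf line for the 16 GB machine reported above. The helper name `shared_buffers_setting` is hypothetical, and the 20% figure is simply the one suggested in this thread, not a general recommendation.

```python
def shared_buffers_setting(total_ram_gb: float, fraction: float = 0.20) -> str:
    """Format a postgresql.conf line that gives `fraction` of RAM to shared_buffers."""
    if not 0 < fraction < 1:
        raise ValueError("fraction must be between 0 and 1")
    megabytes = int(total_ram_gb * 1024 * fraction)  # round down to whole MB
    return f"shared_buffers = {megabytes}MB"


# The machine in the report above has 16 GB of RAM.
print(shared_buffers_setting(16))  # shared_buffers = 3276MB
```

Changing shared_buffers only takes effect after the Postgres server is restarted.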
@var77 lantern_v0.0.5? |
Yes @heyufeng666888 |
@var77 Is Postgres installed on your MacBook Pro? |
Yes @heyufeng666888 via homebrew |
Hi @var77, I tested single-threaded throughput on the Mac and it is indeed the same as yours, but multi-threaded inserts run at the same speed as single-threaded ones, with no improvement. |
Thanks for the info @heyufeng666888. We assume that the HNSW index fits in memory and will always be cached by Postgres. This is a typical assumption for all kinds of Postgres indexes. If the index is not in memory, both insert and search queries become extremely slow. In Lantern, inserts from multiple threads are indeed not any faster right now because of limitations in our implementation. This will be improved soon, however! Is the current insertion speed unacceptable for you? |
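To get a feel for the "index fits in memory" assumption, here is a rough lower-bound sketch of how much space the raw vectors alone occupy. It assumes 4-byte float components and 100k vectors in both test cases, and it ignores the HNSW neighbor lists and page overhead, so a real index is larger. The function name `raw_vector_mib` is illustrative only.

```python
def raw_vector_mib(num_vectors: int, dims: int, bytes_per_component: int = 4) -> float:
    """Lower bound on vector storage in MiB (graph links and page overhead excluded)."""
    return num_vectors * dims * bytes_per_component / (1024 * 1024)


DEFAULT_SHARED_BUFFERS_MIB = 128  # PostgreSQL's stock default for shared_buffers

# Assumption: 100k vectors in both the notebook run and the original report.
for label, dims in (("notebook, 1536 dims", 1536), ("original report, 512 dims", 512)):
    size = raw_vector_mib(100_000, dims)
    fits = size <= DEFAULT_SHARED_BUFFERS_MIB
    print(f"{label}: {size:.1f} MiB of raw vectors, fits in default shared_buffers: {fits}")
```

Under these assumptions neither data set fits in the default 128MB of shared_buffers, which is consistent with the earlier suggestion to raise that setting. The operating system's page cache can still hold part of the index, so this is only a rough indicator.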
Hello, I found that throughput during concurrent insertion is very low, and CPU usage is also very low. Setup: 512-dimensional vectors, M=32, ef_construction=128.