
Search optimization #52

Open
kabilov opened this issue Nov 28, 2024 · 2 comments


kabilov commented Nov 28, 2024

Hi!

I may be wrong, but it seems that loading the database into memory every time a new sample is analyzed wastes a lot of time. For example, in my case loading GTDB into memory takes 30 minutes (with --load-whole-db), so with 10 samples 5 hours are lost. Perhaps, following the example of STAR (https://github.com/alexdobin/STAR), separate commands for loading the database into memory and unloading it could be introduced.
That is, the database is loaded once, all samples are analyzed, and then the memory is freed.
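For reference, the STAR pattern looks like the sketch below. The `--genomeLoad` values are real STAR options; the `tool load-db`/`search`/`unload-db` subcommands in the second half are purely hypothetical names invented to illustrate the request, nothing like them exists yet. All commands are shown as comments since they need the respective tools and databases installed:

```shell
# STAR's shared-memory workflow (real STAR flags): the genome index is
# loaded into shared memory once and reused by every subsequent run.
#
#   STAR --genomeLoad LoadAndExit --genomeDir index/          # load once
#   for fq in sample*.fq; do
#       STAR --genomeLoad LoadAndKeep --genomeDir index/ --readFilesIn "$fq"
#   done
#   STAR --genomeLoad Remove --genomeDir index/               # free the memory
#
# A hypothetical equivalent here (subcommand names are invented for
# illustration only):
#
#   tool load-db --db gtdb/                   # pay the 30-minute load once
#   for fq in sample*.fq; do
#       tool search --db gtdb/ --query "$fq"
#   done
#   tool unload-db --db gtdb/                 # release the memory
echo "load once, analyze all samples, unload once"
```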

Best wishes,
Marsel

@shenwei356
Owner

Did you try not using --load-whole-db? Without it there is no extra time spent loading the database, although searching may be slow in cluster environments where the database sits on network-attached storage.

@kabilov

kabilov commented Nov 30, 2024

Analysis times for the same data (204 MB):
3.5 h with --load-whole-db (0.5 h to load the database + 3 h of analysis)
6 h without --load-whole-db
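Plugging these timings into the 10-sample scenario from the opening post shows what a load-once mode would save. This is back-of-the-envelope arithmetic only; the 10-sample count and the per-sample times come from the comments above:

```shell
# Reported timings, in hours, for one 204 MB sample:
#   0.5 h  database load with --load-whole-db
#   3.0 h  analysis once the database is in memory
#   6.0 h  total per sample without --load-whole-db
awk 'BEGIN {
    load = 0.5; analysis = 3.0; no_preload = 6.0; samples = 10
    reload_each = samples * (load + analysis)   # reload the DB per sample
    slow_path   = samples * no_preload          # skip --load-whole-db
    load_once   = load + samples * analysis     # proposed: load DB once
    printf "%.1f %.1f %.1f\n", reload_each, slow_path, load_once
}'
```

With the reported numbers this prints `35.0 60.0 30.5`: loading once would save 4.5 h over the current --load-whole-db path and nearly halve the no-preload path.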
