
Search optimization #52

Open
kabilov opened this issue Nov 28, 2024 · 2 comments


kabilov commented Nov 28, 2024

Hi!

I may be wrong, but it seems that loading the database into memory every time a new sample is analyzed wastes a lot of time. For example, in my case loading GTDB into memory takes 30 minutes (with --load-whole-db), so with 10 samples 5 hours are lost. Perhaps, following the example of STAR (https://github.com/alexdobin/STAR), separate commands for loading the database into memory and unloading it could be introduced.
That is, the database is loaded once, all samples are analyzed, and then the memory is freed.
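For reference, the STAR pattern looks like the sketch below. The `--genomeLoad` values are real STAR options; the `tool load-db`/`search`/`unload-db` subcommands in the second half are purely hypothetical names invented to illustrate the request, nothing like them exists yet. All commands are shown as comments since they need the respective tools and databases installed:

```shell
# STAR's shared-memory workflow (real STAR flags): the genome index is
# loaded into shared memory once and reused by every subsequent run.
#
#   STAR --genomeLoad LoadAndExit --genomeDir index/          # load once
#   for fq in sample*.fq; do
#       STAR --genomeLoad LoadAndKeep --genomeDir index/ --readFilesIn "$fq"
#   done
#   STAR --genomeLoad Remove --genomeDir index/               # free the memory
#
# A hypothetical equivalent here (subcommand names are invented for
# illustration only):
#
#   tool load-db --db gtdb/                   # pay the 30-minute load once
#   for fq in sample*.fq; do
#       tool search --db gtdb/ --query "$fq"
#   done
#   tool unload-db --db gtdb/                 # release the memory
echo "load once, analyze all samples, unload once"
```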

Best wishes,
Marsel

@shenwei356
Owner

Did you try not using --load-whole-db? Without it there is no extra time spent loading the database, although searching may be slow in cluster environments where the database sits on network-attached storage.

@kabilov

kabilov commented Nov 30, 2024

Analysis times for the same data (204 MB):
3.5 h with --load-whole-db (0.5 h to load the database + 3 h of analysis)
6 h without --load-whole-db
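Plugging these timings into the 10-sample scenario from the opening post shows what a load-once mode would save. This is back-of-the-envelope arithmetic only; the 10-sample count and the per-sample times come from the comments above:

```shell
# Reported timings, in hours, for one 204 MB sample:
#   0.5 h  database load with --load-whole-db
#   3.0 h  analysis once the database is in memory
#   6.0 h  total per sample without --load-whole-db
awk 'BEGIN {
    load = 0.5; analysis = 3.0; no_preload = 6.0; samples = 10
    reload_each = samples * (load + analysis)   # reload the DB per sample
    slow_path   = samples * no_preload          # skip --load-whole-db
    load_once   = load + samples * analysis     # proposed: load DB once
    printf "%.1f %.1f %.1f\n", reload_each, slow_path, load_once
}'
```

With the reported numbers this prints `35.0 60.0 30.5`: loading once would save 4.5 h over the current --load-whole-db path and nearly halve the no-preload path.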
