long vectors not supported yet #72
Hi Daisy, I just pushed a change that should resolve the current issue you're facing. So if you re-download and install the treeWAS package (with dependencies=TRUE) from GitHub, it should work without hitting that error. The line causing the error runs before the lines that are adjusted by the memory limit setting (which then subdivides your snps data into chunks to make it more manageable), which is why it wasn't affected by the mem.lim parameter, unfortunately. That's a mighty large dataset you're working with (gotta love unitigs), so you may run into other issues. If you do, please let me know and I'll try to get back to you quicker with a fix. I'm keen to make the package more scalable. Best,
Hi Caitlin, Thanks for looking into this and making the change. This has prevented the previous vector issue but we are now encountering another issue:
On another note, I tried removing the following lines from the previous treeWAS.R code to get around the long vectors issue. With 900GB of memory, this split the unitig matrix into 83 chunks; however, each chunk took around 24 hours to process. Is this to be expected? Portion of treeWAS.R code removed:
Hi Daisy, That sounds like far longer per chunk than I would expect. It sounds like you may be bumping up against some memory constraints still, which could be slowing it down. I would suggest trying to run a larger number of smaller chunks. Typically this doesn't actually take longer than running fewer larger chunks, and it may help if you're still approaching any unseen memory limits within each chunk. Try setting chunk.size=10000.
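For a rough sense of what chunk.size=10000 would mean here, the arithmetic can be sketched in base R. This is only an illustration, assuming the 5,682,556 unitig columns and 2,806 genomes reported in this thread; the variable names are made up for the example, and this is not how treeWAS itself does its chunking internally.

```r
# Illustrative arithmetic only; not treeWAS internals.
n_genomes  <- 2806      # rows reported in this thread
n_unitigs  <- 5682556   # columns reported in this thread
chunk_size <- 10000     # value suggested for chunk.size

# Number of column chunks needed to cover every unitig.
n_chunks <- ceiling(n_unitigs / chunk_size)

# First and last column index of each chunk (the last chunk is shorter).
starts <- seq(1, n_unitigs, by = chunk_size)
ends   <- pmin(starts + chunk_size - 1, n_unitigs)

# Cells held in one full-sized chunk.
cells_per_chunk <- n_genomes * chunk_size

print(n_chunks)
print(c(tail(starts, 1), tail(ends, 1)))
print(cells_per_chunk)
```

Under these assumptions, the matrix would be covered by 569 chunks of roughly 28 million cells each, compared with the 83 much larger chunks mentioned above.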
Hi Caitlin,
Thanks for developing TreeWAS!
I'm trying to use unitigs with TreeWAS, and have been running into the below error:
I've already successfully run TreeWAS on a smaller gene presence/absence dataset, so to me this looks like a memory-based issue. I've therefore added the mem.lim parameter, but I still receive the same error. I'm running the job on the cluster with 925GB of memory. The unitigs file is around 27GB, with 2806 genomes and 5682556 unitigs/columns.
My TreeWAS commands are below:
I've also tried using mem.lim = TRUE but this has given me the same error.
If I reduce the number of columns in the unitig matrix down to 1000, TreeWAS then works.
Do you have any advice please, for dealing with a large unitig matrix?
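As a quick sanity check on the dimensions above, the total number of cells in the unitig matrix can be compared against the largest length a standard (non-long) R vector can have, `.Machine$integer.max`. This is a minimal sketch in base R using only the figures quoted in this post; the variable names are illustrative, and it assumes the error comes from a step that handles the whole matrix as one vector.

```r
# Dimensions quoted in this post.
n_genomes <- 2806
n_unitigs <- 5682556

# Numeric literals are doubles in R, so this product does not overflow.
n_cells <- n_genomes * n_unitigs

# Largest length of a standard (non-long) R vector: 2^31 - 1.
limit <- .Machine$integer.max

exceeds_limit <- n_cells > limit

print(n_cells)
print(limit)
print(exceeds_limit)
```

With these numbers the matrix has about 15.9 billion cells, several times over the 2^31 - 1 threshold, which would be consistent with a "long vectors not supported yet" error from any function that does not accept long vectors, regardless of how much RAM is available.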