
Large data set crashing #4

Open
jpalmer37 opened this issue Jul 10, 2019 · 6 comments
@jpalmer37

Hi there,

I've been using Historian to reconstruct the ancestral sequences within several different sequence data sets. My largest data set, which contains roughly 1000 sequences, has been unable to complete a run with Historian. The output at a high verbosity setting is shown below; it simply states "Killed" when the program crashes.

[Screenshot from 2019-07-09 13-05-30]

I was first wondering whether this is expected when handling a large data set like this one. Is Historian able to handle sequence sets of this size?

If this failure isn't expected, do you know of a way to retrieve more information about the problem, or of adjustments I could try?

Thanks in advance,

John

@ihh
Member

ihh commented Jul 11, 2019

Hi John,

Sorry to hear you've had this issue. If you're willing to share your data file, I'd like to try and replicate it.

A few points:

  • From the info you've given (including the 1000 sequences and the cryptic "Killed" message), this sounds like an out-of-memory error.
  • If you can't share the input file, can you be more specific about its nature? (Are the sequences unaligned, or are you supplying a guide alignment? How long are they?)
  • Historian should be able to deal with large datasets, but you may need to limit its memory usage. The default options make some attempt to do this, but there is a slightly nontrivial interplay between the sequence length, number of sequences, diversity of sequences, and size of the ancestral sequence profiles, which may mean that memory usage needs to be fine-tuned.
  • There are a few options to constrain Historian's memory usage, specifically the amount of memory it allocates to profiles of ancestral sequences; for example, the `-profmaxmem` option may be useful. Running `historian -h` will list all options (see under "Reconstruction algorithm options" in the help text).
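To make that concrete, a memory-capped run might look something like the following (a sketch only: the filenames are placeholders, the "2G" value is arbitrary, and the exact argument format expected by `-profmaxmem` should be checked against `historian -h`):

```shell
# Hypothetical memory-capped run; filenames and the "2G" cap are placeholders.
# Check `historian -h` for the exact -profmaxmem argument format.
historian -vvv -guide seqs.fasta -tree seqs.tree -ancseq \
  -profmaxmem 2G -output fasta > recon.fasta
```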

@jpalmer37
Author

Hi Dr. Holmes,

It's interesting to hear what you mentioned about the memory usage. I will definitely take a look at the -profmaxmem option in the manual.

My data is publicly available online, so I'd certainly be willing to share my data with you. I use both a guide alignment and guide tree as input when running Historian, so I'll send both of those files to you. Would you prefer to receive them over email?

Thanks for the quick response!

@ihh
Member

ihh commented Jul 11, 2019

Hi John, great, thanks! You can attach the file here or send it by email, whichever is easiest. It may take me a few days to get around to debugging, but I will try to prioritize it.

One option, if you are using a guide alignment, is to constrain the reconstruction to be very close to that guide alignment. The `-band` option specifies the width of the band around the guide alignment that Historian will use. By default it is 40, but if you set it to e.g. `-band 5`, it should go much faster and use less memory (though it will obviously be more dependent on the accuracy of the guide alignment).
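For example, starting from a run that already supplies a guide alignment and tree, narrowing the band would look something like this (a sketch; the filenames are placeholders):

```shell
# Constrain the reconstruction to a band of width 5 around the guide
# alignment (the default width is 40); filenames are placeholders.
historian -vvv -guide seqs.fasta -tree seqs.tree -ancseq \
  -band 5 -output fasta > recon.fasta
```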

@jpalmer37
Author

Much appreciated, Dr. Holmes! This issue isn't urgent, so please take the time you need. I've included a second data set which is essentially identical and appears to run into the same problem. Both contain roughly the same number of sequences (~1000), but the one labeled 101034 contains greater sequence diversity.

historian_data.zip

And thank you for the suggestion. I read about the `-band` option previously but haven't experimented with it. Good to know that it might be useful for reducing memory consumption. Thanks again!

@ihh
Member

ihh commented Jul 18, 2019

Running this in the background on my laptop now. I do see some hefty memory usage. Could you supply the exact command line that led to the crash, and also details of your machine (most importantly memory, but also OS, CPU, etc.)?

@jpalmer37
Author

Certainly.

This is the original command I used to run Historian on both machines:

```shell
historian -vvv -guide ~/4MSA/111848.fasta -tree ~/7_MCC/rescaled/111848.tree -ancseq -output fasta > 111848_recon.fasta
```

This was the first machine where I detected the crash (but could not see the error messages):

CPU: 2 × Intel Xeon E5-2690v4 2.6 GHz (14-core/28-thread)
OS: CentOS 7.3
RAM: 56 GB

This is the machine where I performed a single test run to read the output of the crash:

CPU: AMD Ryzen Threadripper 1950X 4.0 GHz (16-core/32-thread)
OS: Ubuntu 18.04.2 LTS (Bionic Beaver)
RAM: 32 GB
