extract stuck re-writing output files #6

Open
cbreenmachine opened this issue Apr 28, 2021 · 3 comments

Comments

@cbreenmachine

Hi Simon,

Thanks again for your help earlier with the non-stranded issue. I am having one (hopefully easy-to-fix) issue with methylation extraction.

Running extract on a few human WGBS datasets never finishes. For context, mapping and calling on five WGBS samples takes 2-3 days on my current system without a problem. Extract, on the other hand, will run for a week or more. The command itself seems to work, but it gets stuck in a loop: it writes an output file, does an asset check, decides the file still needs to be created, and then starts the whole process over. Because of this, checking the size of the extract output folder every five minutes gives something like 22 G --> 28 G --> 34 G --> 41 G --> 20 G --> ...
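For anyone who wants to reproduce the folder-size check described above, here is a minimal sketch. The function names, the five-minute interval, and the number of samples are illustrative assumptions, not anything gemBS provides. It polls the total size of a directory and reports the points where the size drops, which is the signature of the outputs being rewritten from scratch.

```python
import os
import time


def dir_size_bytes(path):
    """Sum the sizes of all regular files under path.

    Files that disappear while we are scanning (for example because a
    writer just deleted them) are skipped instead of raising.
    """
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            try:
                total += os.path.getsize(os.path.join(root, name))
            except OSError:
                pass
    return total


def find_resets(sizes):
    """Return the indices where a sample is smaller than the one before it."""
    return [i for i in range(1, len(sizes)) if sizes[i] < sizes[i - 1]]


def watch(path, interval_s=300, samples=12):
    """Sample the directory size `samples` times, `interval_s` seconds apart.

    Both defaults are arbitrary choices for illustration.
    """
    sizes = []
    for i in range(samples):
        sizes.append(dir_size_bytes(path))
        if i < samples - 1:
            time.sleep(interval_s)
    return sizes, find_resets(sizes)
```

Calling `watch("./extract")` for an hour and getting a non-empty list of resets would match the 41 G --> 20 G drop in the sequence above.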

Here's what I have tried:

  1. Used a different server. The issue seems to be the same on Ubuntu as well as on a Red Hat Linux distro.
  2. Tried one human WGBS sample instead of five "at once". The issue still persists.
  3. Restricted the output to just ENCODE-style outputs, and then to just gemBS-style outputs. It does not seem to make a difference.
  4. Tried increasing the default RAM available to 100G. It does not seem to change anything.

I've interrupted the process before and looked at the files, and they seem reasonable and match my collaborator's outputs. Of course, because I interrupted the process, the EOF and most of the data had not been written.
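One way to confirm that an interrupted output is missing its EOF is sketched below. It assumes the compressed outputs are BGZF (bgzip) files, which the `--tabix` flag suggests but which I have not verified for every output type. A complete BGZF file ends with a fixed 28-byte empty block, so comparing the last 28 bytes against that marker shows whether the writer finished. The function name `has_bgzf_eof` is made up for this example, and plain gzip files do not carry this marker.

```python
import os

# The 28-byte empty BGZF block that terminates a complete bgzip file.
BGZF_EOF = bytes.fromhex(
    "1f8b0804" "00000000" "00ff" "0600" "4243" "0200"
    "1b00" "0300" "00000000" "00000000"
)


def has_bgzf_eof(path):
    """Return True if the file at path ends with the BGZF EOF marker.

    Assumption: the file is meant to be BGZF. For other formats a False
    result says nothing about whether the file is complete.
    """
    with open(path, "rb") as fh:
        fh.seek(0, os.SEEK_END)
        if fh.tell() < len(BGZF_EOF):
            return False
        fh.seek(-len(BGZF_EOF), os.SEEK_END)
        return fh.read() == BGZF_EOF
```

A file that returns False here was cut off before the writer closed it, which is what you would expect after interrupting the run.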

Is there any more information I can provide? Or is this a case of user-error?

Thanks very much!

Best,
Coleman

Here is console output in default mode:
/usr/local/lib/gemBS/bin/mextr --loglevel info --compress --md5 --regions-file ./extract/sample002_108_mextr_ctgs.bed --cpgfile ./extract/sample002_108_cpg.txt.gz --tabix ./calls/sample002_108.bcf
INFO - Launch: /usr/local/lib/gemBS/bin/mextr --loglevel info --compress --md5 --regions-file ./extract/sample003_109_mextr_ctgs.bed --cpgfile ./extract/sample003_109_cpg.txt.gz --tabix ./calls/sample003_109.bcf
INFO - Launch: /usr/local/lib/gemBS/bin/mextr --loglevel info --compress --md5 --regions-file ./extract/sample004_110_mextr_ctgs.bed --cpgfile ./extract/sample004_110_cpg.txt.gz --tabix ./calls/sample004_110.bcf
INFO - Launch: /usr/local/lib/gemBS/bin/mextr --loglevel info --compress --md5 --regions-file ./extract/sample005_111_mextr_ctgs.bed --cpgfile ./extract/sample005_111_cpg.txt.gz --tabix ./calls/sample005_111.bcf
INFO - Launch: /usr/local/lib/gemBS/bin/mextr --loglevel info --compress --md5 --regions-file ./extract/sample002_108_mextr_ctgs.bed --cpgfile ./extract/sample002_108_cpg.txt.gz --tabix ./calls/sample002_108.bcf
INFO - Launch: /usr/local/lib/gemBS/bin/mextr --loglevel info --compress --md5 --regions-file ./extract/sample003_109_mextr_ctgs.bed --cpgfile ./extract/sample003_109_cpg.txt.gz --tabix ./calls/sample003_109.bcf
INFO - Launch: /usr/local/lib/gemBS/bin/mextr --loglevel info --compress --md5 --regions-file ./extract/sample004_110_mextr_ctgs.bed --cpgfile ./extract/sample004_110_cpg.txt.gz --tabix ./calls/sample004_110.bcf
INFO - Launch: /usr/local/lib/gemBS/bin/mextr --loglevel info --compress --md5 --regions-file ./extract/sample005_111_mextr_ctgs.bed --cpgfile ./extract/sample005_111_cpg.txt.gz --tabix ./calls/sample005_111.bcf
INFO - Launch: /usr/local/lib/gemBS/bin/mextr --loglevel info --compress --md5 --regions-file ./extract/sample002_108_mextr_ctgs.bed --cpgfile ./extract/sample002_108_cpg.txt.gz --tabix ./calls/sample002_108.bcf
INFO - Launch: /usr/local/lib/gemBS/bin/mextr --loglevel info --compress --md5 --regions-file ./extract/sample003_109_mextr_ctgs.bed --cpgfile ./extract/sample003_109_cpg.txt.gz --tabix ./calls/sample003_109.bcf
INFO - Launch: /usr/local/lib/gemBS/bin/mextr --loglevel info --compress --md5 --regions-file ./extract/sample004_110_mextr_ctgs.bed --cpgfile ./extract/sample004_110_cpg.txt.gz --tabix ./calls/sample004_110.bcf
INFO - Launch: /usr/local/lib/gemBS/bin/mextr --loglevel info --compress --md5 --regions-file ./extract/sample005_111_mextr_ctgs.bed --cpgfile ./extract/sample005_111_cpg.txt.gz --tabix ./calls/sample005_111.bcf
INFO - Launch: /usr/local/lib/gemBS/bin/mextr --loglevel info --compress --md5 --regions-file ./extract/sample002_108_mextr_ctgs.bed --cpgfile ./extract/sample002_108_cpg.txt.gz --tabix ./calls/sample002_108.bcf
INFO - Launch: /usr/local/lib/gemBS/bin/mextr --loglevel info --compress --md5 --regions-file ./extract/sample003_109_mextr_ctgs.bed --cpgfile ./extract/sample003_109_cpg.txt.gz --tabix ./calls/sample003_109.bcf

This pattern continues over and over until the run is interrupted. Here is the output in debug mode (run on a different batch of five samples):

DEBUG - Asset check: "sample009_115_cpg.bed.gz" "./extract/sample009_115_cpg.bed.gz" Absent
DEBUG - Asset check: "sample009_115_cpg.bed.gz.md5" "./extract/sample009_115_cpg.bed.gz.md5" Absent
DEBUG - Asset check: "sample009_115_cpg.bb" "./extract/sample009_115_cpg.bb" Absent
DEBUG - Asset check: "sample009_115_cpg.bb.md5" "./extract/sample009_115_cpg.bb.md5" Absent
DEBUG - Asset check: "sample009_115_chg.bed.gz" "./extract/sample009_115_chg.bed.gz" Absent
DEBUG - Asset check: "sample009_115_chg.bed.gz.md5" "./extract/sample009_115_chg.bed.gz.md5" Absent
DEBUG - Asset check: "sample009_115_chg.bb" "./extract/sample009_115_chg.bb" Absent
DEBUG - Asset check: "sample009_115_chg.bb.md5" "./extract/sample009_115_chg.bb.md5" Absent
DEBUG - Asset check: "sample009_115_chh.bed.gz" "./extract/sample009_115_chh.bed.gz" Absent
DEBUG - Asset check: "sample009_115_chh.bed.gz.md5" "./extract/sample009_115_chh.bed.gz.md5" Absent
DEBUG - Asset check: "sample009_115_chh.bb" "./extract/sample009_115_chh.bb" Absent
DEBUG - Asset check: "sample009_115_chh.bb.md5" "./extract/sample009_115_chh.bb.md5" Absent
DEBUG - Asset check: "sample009_115_.bw" "./extract/sample009_115_.bw" Absent
DEBUG - Asset check: "sample009_115_.bw.md5" "./extract/sample009_115_.bw.md5" Absent
DEBUG - Asset check: "report.tex" "./report/GemBS_QC_Report.tex" Present
DEBUG - Asset check: "report.html" "./report/GemBS_QC_Report.html" Present
DEBUG - Avail slots: 45.0001, avail memory: 11.6 GB
DEBUG - No execution slots
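To make a loop like this easier to spot in a long log, here is a rough sketch that summarizes the two kinds of lines shown in this thread: it counts how many times mextr was launched for each `--cpgfile` target and collects the asset paths reported as Absent. The regular expressions are derived only from the log excerpts above, so they are assumptions about the format rather than a documented gemBS interface, and the function name `summarize` is hypothetical.

```python
import re
from collections import Counter

# Patterns inferred from the log excerpts in this thread (an assumption,
# not a documented format).
ASSET_RE = re.compile(r'Asset check: "([^"]+)" "([^"]+)" (Absent|Present)')
LAUNCH_RE = re.compile(r'Launch: \S*mextr\b.*?--cpgfile (\S+)')


def summarize(log_text):
    """Summarize launches and absent assets in a captured log.

    Returns a dict with:
      launches - Counter of launches per --cpgfile target
      repeated - sorted targets that were launched more than once
      absent   - asset paths reported as Absent, in log order
    """
    launches = Counter()
    absent = []
    for line in log_text.splitlines():
        match = ASSET_RE.search(line)
        if match:
            if match.group(3) == "Absent":
                absent.append(match.group(2))
            continue
        # findall copes with several launches pasted onto one line.
        for target in LAUNCH_RE.findall(line):
            launches[target] += 1
    repeated = sorted(t for t, n in launches.items() if n > 1)
    return {"launches": launches, "repeated": repeated, "absent": absent}
```

If the same target shows up in `repeated` while its outputs stay in `absent`, the extract step is relaunching work it never registers as finished, which is the behavior described in this issue.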

@atggcagatgagtatgcattaaagtag

@cbreenmachine I'm encountering a similar problem. Did you manage to find a solution or workaround?

@cbreenmachine
Author

Unfortunately no. I went back and forth with the sys admins for weeks and they couldn't figure it out. I ended up abandoning this and using the gem mapper and bs_call separately. While gemBS is fast when it's working, it does not seem to be actively maintained and it is buggy. I ended up losing a lot of time working around these types of issues. I'd recommend using Bismark instead.

@heathsc
Owner

heathsc commented May 3, 2024 via email
