-
Notifications
You must be signed in to change notification settings - Fork 34
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Coverage discrepancy with inStrain profile #99
Comments
Hi Candice, Hmm interesting. A couple of questions-
Thanks, |
Hi Matt,
Thanks so much for the quick response! I visualized the BAM file in Tablet
and IGV. I also used another program that is part of the ARTIC field
bioinformatics pipeline (specifically artic minion) to call the variants
and it gave me ~400x at the position of interest (contrast to ~50x from
instrain profile for this SNS).
Exact values as follows:
Tablet average coverage depth (for the entire BAM)=317.6x
Tablet coverage at position 23,403=400x
IGV coverage at position 23,403=347x
ARTIC minion total reads at position 23,403=395x
The disagreement between IGV and Tablet at this position is because reads
with deletions/insertions are not included in the total reads. When these
are added back in, IGV agrees with Tablet and ARTIC.
So right now I have 3 different methods in agreement for the coverage and
inStrain is the outlier.
Attached is the mapping_info.tsv. I also tried attaching a zipped BAM and
BAI file, but the file was too big to go through email.
Thanks again,
Candice
…On Sat, Dec 18, 2021 at 2:23 PM Matt Olm ***@***.***> wrote:
Hi Candice,
Hmm interesting. A couple of questions-
1.
What program to calculate coverage gave you the count of ~400x?
2.
Could you attached the file mapping_info.tsv from the inStrain output
file?
Thanks,
Matt
—
Reply to this email directly, view it on GitHub
<#99 (comment)>, or
unsubscribe
<https://github.com/notifications/unsubscribe-auth/AK6JF7GHIA5KQIPYDI4CFF3URTNR3ANCNFSM5KKS7V5Q>
.
Triage notifications on the go with GitHub Mobile for iOS
<https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675>
or Android
<https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub>.
You are receiving this because you authored the thread.Message ID:
***@***.***>
|
Hi Candice, Thanks for all of this- it certainly seems like inStrain is struggling with the long-reads. I'll note that I've never run it with long reads before, but I'd really like it to work with long reads, so I appreciate your help troubleshooting this. The mapping_info.tsv file you attached didn't work; I believe you have to attach it on GitHub rather than email it. It would also be great if you could attach the SNVs.tsv file as well. I'll look at the file once I get it, but in thinking through potential problems, do you happen to know if you have multiple reads with the same name? inStrain treats reads with the same name as being the "same", so that could mess with the count? Best, |
Hi Matt, Thanks, [inStrain_rerun_barcode85_mapping_info_tsv.txt](https://github.com/MrOlm/inStrain/files/7755872/in |
OK- I think I got to the root of the problem. Thanks for your help. When calling SNVs, inStrain uses an empirically generated distribution of "expected" variants based on Q30 bases. So when calling SNVs (and calculating coverage, it seems), it only looks at reads with q30. I assume that because you have long reads, many don't reach this q30 threshold, and only the ones that do are "counted" in the SNV table. Adjusting the minimum threshold to be something other than Q30 is trivial (it's line 154 in the file "profile_utilities.py"), but adjusting the SNV-calling assumptions and distributions is complicated. This is something that I will fix, and I'll do it ASAP, but I'm not sure how long it will take. I'll of course ping you as soon as it's finished. But if you need an immediate solution, I'd say your options are to use another program, or to state that only reads with a quality sore of at least Q30 were used when calling SNVs. Sorry again for the confusion and thanks for your help tracking this down and helping me improve inStrain! -Matt |
Hi Matt, Thanks again, |
Hello,
I am using inStrain profile to calculate SNVs for long-read (nanopore) sequencing data. However, I am concerned that inStrain is discounting a large proportion of my input reads. The BAM file that I am using as input to inStrain profile has ~400x coverage. I found that after running inStrain profile, the coverage is between 100 and 200x in most positions. I have checked the ReadFiltering plot (attached to this post below) as well as the mapping_info.tsv and 91% (25,203 out of 27,747) of the reads are passing filter, so I am confused as to why the reported coverage is so much less.
The command I used is as follows:
inStrain profile -c 10 --pairing_filter all_reads --min_read_ani 0.92 /path/sorted.bam /path/reference.fasta -o /path/output/directory
Please advise ASAP as I am currently in the review process for manuscript. If I cannot get this resolved quickly, unfortunately I will have to use another tool for variant calling.
Thank you,
Candice
inStrain_rerun_barcode85_ReadFiltering_plot.pdf
The text was updated successfully, but these errors were encountered: