Discrepancy in gene ID mapping on same positions in different samples #195
Hello, thanks for your question and for using inStrain! The discrepancy you're seeing is likely due to differences in coverage or read alignment at position 331 between the samples. Low coverage in SLG529 might prevent the gene from being mapped at that position, even if it's mapped nearby. Here's what you can try:
If the issue persists, feel free to share more details! Best,
Hey @MrOlm, thanks a lot for taking a look. Unless I'm missing something the Furthermore, it appears that the The same I also want to reiterate that this uneven mapping of genes was present at many other positions. I shared a subset of data but happy to share the full profile results over email, if you think that'll be helpful. Cheers!
Ah, I see the problem now. In the case of SLG529, the variant at position 331 has 3 alleles detected (allele_count > 2). In this case inStrain does not attempt to classify the mutation (S, N, etc.) because it doesn't know which allele to use, and as a result it does not attempt to link the gene either. This isn't great behavior and I'll try to fix it in the next update, but this is what's going on. Best,
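For anyone who wants to check this explanation against their own tables, here is a minimal sketch that lists sites present in two samples where only one sample has a gene assigned, together with each sample's allele count. It assumes the SNV tables are tab-separated with columns named scaffold, position, allele_count and gene, and that a missing gene appears as an empty field or as NA/NaN. The two inline tables are toy data made up for illustration (shortened names, invented rows), not the contents of the attached files.

```python
import csv
import io

# Toy stand-ins for two SNV tables. Values and shortened names are
# illustrative only; the column names are an assumption about the table layout.
SAMPLE_A_TSV = (
    "scaffold\tposition\tallele_count\tgene\n"
    "k141_12596\t301\t2\tk141_12596_2\n"
    "k141_12596\t331\t3\t\n"
    "k141_12596\t332\t2\tk141_12596_2\n"
)
SAMPLE_B_TSV = (
    "scaffold\tposition\tallele_count\tgene\n"
    "k141_12596\t331\t2\tk141_12596_2\n"
)


def is_missing(value):
    """Treat empty strings and NA/NaN markers as 'no gene assigned'."""
    return value.strip().lower() in {"", "na", "nan"}


def load_snvs(handle):
    """Index the rows of a tab-separated SNV table by (scaffold, position)."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {(row["scaffold"], int(row["position"])): row for row in reader}


def gene_discrepancies(table_a, table_b):
    """Return shared sites where exactly one of the two tables has a gene."""
    found = []
    for site in sorted(table_a.keys() & table_b.keys()):
        row_a, row_b = table_a[site], table_b[site]
        if is_missing(row_a["gene"]) != is_missing(row_b["gene"]):
            found.append((
                site,
                row_a["gene"].strip(), int(row_a["allele_count"]),
                row_b["gene"].strip(), int(row_b["allele_count"]),
            ))
    return found


table_a = load_snvs(io.StringIO(SAMPLE_A_TSV))
table_b = load_snvs(io.StringIO(SAMPLE_B_TSV))
discrepancies = gene_discrepancies(table_a, table_b)
for entry in discrepancies:
    print(entry)
```

To run it on real output, replace the io.StringIO objects with open file handles for the two tables. If the discrepant sites all show an allele count above 2 in the sample without a gene, that is consistent with the explanation above; if not, something else is involved.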
Hey @MrOlm, I had a similar intuition. However, after inspecting the data a bit more, I found sites with a gene mapped and an allele count greater than 2. For example, positions 144 and 650 in the SLG1132.txt file attached above have an allele count of 3 but also have genes mapped. Also, these positions seem be identify
P.S. The issue was closed by mistake. I have re-opened it.
Mutation_type = M means that there are multiple genes at that position (https://instrain.readthedocs.io/en/latest/example_output.html). In that sample, 650 has a gene mapped and 144 is M?
Both positions 144 and 650 have genes mapped. The
inStrain thinks 144 has two genes mapped at that location; could that be the case? 650 seems to be working correctly. What is the problem with 650 in that sample?
So there is no problem per se with either of those sites. You mentioned above
I just pointed out that this is not always the case, as there are sites where genes have been mapped and the allele count is greater than 2, e.g. sites 144 and 650. Plus, despite having more than 2 alleles, inStrain has classified the mutation at these sites (under column
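One way to see how consistently the "allele count above 2 means no gene" pattern holds within a single sample is to tally sites by both properties. The sketch below does that with the standard library only. It assumes a tab-separated table with columns named position, allele_count and gene; the rows and gene names are made up for illustration and only loosely mirror the sites discussed above.

```python
import csv
import io
from collections import Counter

# Toy table for one sample. Rows and gene names are illustrative only;
# the column names are an assumption about the table layout.
SAMPLE_TSV = (
    "position\tallele_count\tgene\n"
    "144\t3\tgene_1\n"
    "331\t2\tgene_2\n"
    "650\t3\tgene_3\n"
    "900\t3\t\n"
)


def tally_sites(handle):
    """Count sites by (allele_count > 2, gene assigned)."""
    counts = Counter()
    for row in csv.DictReader(handle, delimiter="\t"):
        multi_allelic = int(row["allele_count"]) > 2
        has_gene = row["gene"].strip().lower() not in {"", "na", "nan"}
        counts[(multi_allelic, has_gene)] += 1
    return counts


counts = tally_sites(io.StringIO(SAMPLE_TSV))
for (multi_allelic, has_gene), n in sorted(counts.items()):
    print(f"allele_count>2={multi_allelic} gene_assigned={has_gene} sites={n}")
```

A non-zero count in both the (True, True) and (True, False) cells would indicate that allele count alone does not decide whether a gene is linked.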
Hi,
First of all, thank you for developing this amazing tool!
I have encountered a discrepancy in the inStrain profile results regarding gene mapping at the same positions between different samples. Specifically:
In sample SLG529, no gene is mapped at position 331 on scaffold SLG1007_DASTool_bins_61.fa_k141_12596.
In sample SLG1132, the gene SLG1007_DASTool_bins_61.fa_k141_12596_2 is mapped at the same position (331) on the same scaffold (SLG1007_DASTool_bins_61.fa_k141_12596).
Additionally, in sample SLG529, the same gene (SLG1007_DASTool_bins_61.fa_k141_12596_2) is mapped at positions immediately before and after position 331 (positions 301 and 332), but not at position 331.
I ran inStrain profile using the same representative MAGs but with reads from different samples.
This is just one example, but these discrepancies were seen at many other sites as well. I'm unsure whether I'm missing something in my analysis or if this is a potential bug. Any clarification would be greatly appreciated.
SLG529: inStrain profile SLG529.bam representative_megamag.fa -o SLG529.IS -p 64 -g megamag_representative.fna -s scaffoldtobin.stb --database_mode
SLG1132: inStrain profile SLG1132.bam representative_megamag.fa -o SLG1132.IS -p 64 -g megamag_representative.fna -s scaffoldtobin.stb --database_mode
I've attached the relevant dataset for your reference.
Thank you!
SLG529.txt
SLG1132.txt