-
Notifications
You must be signed in to change notification settings - Fork 48
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
BamParser.Parse() returns null objects in rare instances #41
Comments
Hi, Thanks for commenting on this! This strikes me as a bit tricky, I think the method you're looking at: When coordinates are not specified these reads should be returned, so that's a bug as you point out. Right now it looks like this case is indicated with a 0/MaxInt for start/end, perhaps with a different sentinel value to indicate all reads regardless of overlap should be returned (
to |
I ran a couple of
I confirmed that if we change the start and end parameters to GetAlignedSequence to -1 and int.MaxValue (instead of 0 and int.MaxValue), that resolves this issue too. (There are several APIs which use those sentinel values; if we change one, we'd want to change them all for consistency) Yet another option: If GetAlignedSequence accepted nullable ints, I think that would make the logic clearer than having special explicit sentinel values like 0 or -1 and int.MaxValue. GetAlignmentMapIterator already has one nullable-int argument (refSeq) so making start and end nullable seems natural. |
Today I used dotnetbio to parse a bam file. In rare instances (for a total of 3 reads out of 1 million), BamParser.Parse() returned a null object rather than a SAMAlignedSequence object.
I stepped through in the debugger and I think these lines, in BamParser.cs, are the cause of my issue:
if (alignedSeq.RefEndPos - 1 < start && alignedSeq.RName!="*")
{
return null;
}
Here I'm reading the whole bam, so start == 0. This is a read which is unmapped, but it still has its Pos set to 1 and RName set because its mate was mapped with Pos==1. For the affected read, alignedSeq.RefEndPos is 0. By subtracting 1 from alignedSeq.RefEndPos we get -1, which is less than start == 0, so we return null.
This one-line change fixes the bug, and I believe it's correct in general - I confirmed the unit tests still pass:
if (alignedSeq.RefEndPos - 1 < start && alignedSeq.CIGAR != "*")
{
return null;
}
The text was updated successfully, but these errors were encountered: