-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
intention of NO_READS_READLEN #74
Comments
For long read data there are likely to be far fewer reads that short read, so its possible that a long read cram may not get to 50,000 reads total. This tool was built for short read data where there is very small variance in the read lengths, Usually there is just a little bit of base clipping off the ends. The majority of reads will have the same length. That is not true with long read data where there is a wide population of possible read lengths. Without significant testing and a full code/logic review I don't know if this is appropriate to run over long read data. |
Yes, agree with @AndyMenzies we might need to add test case for long read data and possibly a parameter to run in long read mode to ignore the read length and other short read specific cutoffs. |
Also agreed, it's on the todo list. Nevertheless still curious as to why 50,000 reads |
If its just pulling from the beginning of the cram file its likely starting at the beginning of chr1. Things map strangely at the ends of chromosomes so 50K reads may be there to move away from the telomere and into better sequence. |
_get_reads_length
returns nothing when the constantNO_READS_READLEN
is not exceeded in the final if condition, propagating an undefined variable down the chain until the program crashes. We've got users trying to run the tool on long read data with low depth - which I appreciate the tool wasn't designed for. The most obvious way to fix this problem would be to lower the value ofNO_READS_READLEN
, currently at 50,000. And notably judging by the commit history this did used to be lower - originally the return conditional checked against a hardcoded value of 1000. However, maybe this is a very bad fix, and I don't want to go changing things without understanding the intention of these checks. @davidrajones git reckons these bits of the codebase were written/modified by you. I know it was quite a while ago, but do you have insight into this? Many thanksThe text was updated successfully, but these errors were encountered: