
intention of NO_READS_READLEN #74

Open
blex-max opened this issue Nov 21, 2024 · 4 comments

Comments

@blex-max

`_get_reads_length` returns nothing when the count threshold in the final `if` condition, the constant `NO_READS_READLEN`, is never reached, propagating an undefined variable down the chain until the program crashes. We've got users trying to run the tool on long-read data with low depth, which I appreciate the tool wasn't designed for. The most obvious fix would be to lower the value of `NO_READS_READLEN`, currently 50,000; judging by the commit history it did used to be lower: originally the return conditional checked against a hardcoded value of 1,000. However, that may be a very bad fix, and I don't want to go changing things without understanding the intention of these checks. @davidrajones, git reckons these bits of the codebase were written/modified by you. I know it was quite a while ago, but do you have any insight into this? Many thanks
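For illustration, here is a minimal sketch of the failure mode as I understand it, plus one possible defensive fix. The function and variable names are my own reconstruction, not the actual source:

```python
# Hypothetical reconstruction of the reported bug (names and structure
# are assumptions based on the issue description, not the real code).
NO_READS_READLEN = 50_000

def get_reads_length(read_lengths):
    """Return a read length once enough reads have been sampled.

    Mirrors the bug: if fewer than NO_READS_READLEN reads are seen,
    the final `if` never fires and the function falls off the end,
    implicitly returning None, which then propagates downstream.
    """
    count = 0
    for length in read_lengths:
        count += 1
        if count >= NO_READS_READLEN:
            return length
    # Bug: no explicit return here -> caller receives None.

def get_reads_length_fixed(read_lengths):
    """Defensive variant: fall back to the last observed length on
    low-depth input, and fail loudly when there are no reads at all."""
    length = None
    count = 0
    for length in read_lengths:
        count += 1
        if count >= NO_READS_READLEN:
            return length
    if length is None:
        raise ValueError("no reads found in input")
    return length  # low-depth input: use the last length we saw
```

With only 10 reads, the first variant silently returns `None` while the second returns a usable length.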

@AndyMenzies
Contributor

For long-read data there are likely to be far fewer reads than for short-read data, so it's possible that a long-read CRAM never reaches 50,000 reads in total.

This tool was built for short-read data, where there is very little variance in read lengths: usually there is just a little base clipping off the ends, and the majority of reads have the same length. That is not true of long-read data, where there is a wide distribution of possible read lengths. Without significant testing and a full code/logic review I don't know whether it is appropriate to run this over long-read data.
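A quick way to quantify that uniformity assumption (my own sketch, not part of the tool) is the fraction of reads sitting at the modal length; it is near 1.0 for short-read data and much lower for long-read data:

```python
from collections import Counter

def modal_length_fraction(read_lengths):
    """Fraction of reads at the most common read length.

    Short-read runs cluster tightly at one length (fraction near 1.0);
    long-read runs spread over many lengths (fraction much lower).
    A check like this could gate the short-read-only logic.
    """
    counts = Counter(read_lengths)
    modal_count = counts.most_common(1)[0][1]
    return modal_count / len(read_lengths)
```

For example, a short-read-like sample of 98 reads at 150 bp plus two clipped reads scores 0.98, while a long-read-like spread of unique lengths scores near zero.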

@sb43
Member

sb43 commented Nov 27, 2024

Yes, I agree with @AndyMenzies. We might need to add a test case for long-read data, and possibly a parameter to run in a long-read mode that ignores the read-length and other short-read-specific cutoffs.
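Something along these lines, perhaps; the flag name and wiring are purely illustrative, not an existing option of the tool:

```python
import argparse

# Hypothetical CLI sketch for the proposed long-read mode.
# "--long-read" is an assumed flag name; nothing here exists yet.
def build_parser():
    parser = argparse.ArgumentParser(description="flagging tool")
    parser.add_argument(
        "--long-read",
        action="store_true",
        help="skip read-length and other short-read-specific cutoffs",
    )
    return parser

def effective_read_threshold(args, default=50_000):
    """Disable the sampled-read threshold entirely in long-read mode."""
    return 0 if args.long_read else default
```

Downstream checks would then consult `effective_read_threshold` instead of the hardcoded constant.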

@blex-max
Author

Also agreed, it's on the to-do list. Nevertheless, I'm still curious as to why 50,000 reads specifically.

@AndyMenzies
Contributor

If it's just pulling from the beginning of the CRAM file, it's likely starting at the beginning of chr1. Things map strangely at the ends of chromosomes, so the 50K-read threshold may be there to move away from the telomere and into better-quality sequence.
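If that is the rationale, the sampling logic would amount to something like this sketch (again my own illustration, with assumed parameter values), which discards an initial window of reads before sampling:

```python
from itertools import islice

def sample_after_skip(reads, skip=50_000, sample=1_000):
    """Skip the first `skip` reads (possibly poorly mapped telomeric
    sequence at the start of chr1) and sample the next `sample` reads
    from better sequence. Parameter values are illustrative."""
    it = iter(reads)
    # Standard itertools recipe: advance the iterator `skip` positions.
    next(islice(it, skip, skip), None)
    return list(islice(it, sample))
```

Equivalently, a count threshold of 50,000 before returning a read length has the effect of ignoring everything in that initial window.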
