Dealing with uneven coverage #15
Comments
Hi @keremozdel, Paraphase is designed to work with shotgun-type data such as WGS and hybrid capture data. Amplicon data is very different and should be analyzed differently. Because amplicon reads all start and end at the same positions, you can simply cluster them into consensus groups instead of phasing reads into haplotypes. Are you capturing the SMN genes in just one amplicon? You can try the HiFi amplicon workflow (https://github.com/PacificBiosciences/hifi-amplicon-workflow). The clustering tool it uses is pbaa (https://github.com/PacificBiosciences/pbAA).
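To illustrate what "clustering reads into consensus groups" means, here is a toy Python sketch. It is not how pbaa works internally; it assumes equal-length, error-light reads from a single amplicon, and the names `cluster_reads`, `consensus`, and `max_mismatches` are made up for this example.

```python
from collections import Counter


def hamming(a, b):
    """Count mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def cluster_reads(reads, max_mismatches=1):
    """Greedy clustering: a read joins the first cluster whose seed
    (its first read) is within max_mismatches, else starts a new one."""
    clusters = []
    for read in reads:
        for cluster in clusters:
            seed = cluster[0]
            if len(seed) == len(read) and hamming(seed, read) <= max_mismatches:
                cluster.append(read)
                break
        else:
            clusters.append([read])
    return clusters


def consensus(cluster):
    """Per-position majority vote across the reads in one cluster."""
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*cluster))


# Hypothetical reads from one amplicon: two underlying sequences plus one error.
reads = ["ACGTACGT", "ACGTACGT", "ACGAACGT", "TTGTACCA", "TTGTACCA"]
clusters = cluster_reads(reads, max_mismatches=1)
consensus_seqs = [consensus(c) for c in clusters]
```

Note that the mismatch tolerance is a trade-off: a tolerance that absorbs sequencing errors can also merge true sequences that differ at only a few positions, so a real tool needs a more careful model than this.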
Hi again @xiao-chen-xc, I'm working with multiplex PCR, which includes several amplicons. Would it be correct to say that amplicon data does not provide sufficient resolution for accurate phasing? I will look into clustering as you suggested. Thank you for your help.
No, you can get accurate haplotypes out of amplicon data through clustering. With multiple amplicons, you can cluster each amplicon first and then piece together the consensus sequences from several amplicons. We have some experience working with SMN amplicon data internally. I'd be happy to take a look at your data if you like. Feel free to email me at [email protected].
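As a rough picture of "piecing together" per-amplicon consensus sequences, the Python sketch below joins consensus sequences by exact suffix/prefix overlap. It assumes adjacent amplicons overlap and that the consensus sequences are given in genomic order, neither of which is stated in this thread; `overlap_length`, `stitch`, and `min_overlap` are hypothetical names.

```python
def overlap_length(left, right, min_overlap=3):
    """Length of the longest suffix of `left` that equals a prefix of `right`,
    or 0 if no overlap of at least min_overlap bases exists."""
    max_len = min(len(left), len(right))
    for k in range(max_len, min_overlap - 1, -1):
        if left.endswith(right[:k]):
            return k
    return 0


def stitch(consensus_seqs, min_overlap=3):
    """Merge ordered consensus sequences, collapsing each exact overlap."""
    merged = consensus_seqs[0]
    for seq in consensus_seqs[1:]:
        k = overlap_length(merged, seq, min_overlap)
        if k == 0:
            raise ValueError("no sufficient overlap between adjacent amplicons")
        merged += seq[k:]
    return merged


# Hypothetical consensus sequences from three overlapping amplicons.
stitched = stitch(["ACGTTGCA", "TGCAGGAT", "GGATCC"], min_overlap=4)
```

With more than one haplotype per amplicon, deciding which consensus from one amplicon belongs with which consensus from the next requires variants inside the overlap that distinguish them; this sketch does not address that.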
Hello,
Thank you for this great tool!
I'm working with amplicon sequencing data capturing the SMN gene. However, my data show uneven coverage along the SMN1 gene, which has resulted in questionable phasing output. This is my first time working on a phasing experiment, and I was wondering if you have any suggestions regarding this issue. For example, would downsampling the regions with higher coverage help? Also, could you please briefly explain why whole-genome alignment is required for the phasing process instead of a targeted reference sequence? I'm quite new to this field, and it would help me understand the subject better.
Thank you very much for your guidance and insight.
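For concreteness, this is roughly what I mean by downsampling the high-coverage regions: cap the number of reads kept per amplicon. It is only a Python sketch of the idea, not something Paraphase provides; `downsample_by_amplicon`, `max_reads`, and the amplicon names are hypothetical, and I don't know whether it would actually improve the result.

```python
import random


def downsample_by_amplicon(reads_by_amplicon, max_reads, seed=0):
    """Keep at most max_reads randomly chosen reads per amplicon.
    Amplicons at or below the cap are left unchanged."""
    rng = random.Random(seed)  # fixed seed so the subsample is reproducible
    capped = {}
    for amplicon, reads in reads_by_amplicon.items():
        if len(reads) > max_reads:
            capped[amplicon] = rng.sample(reads, max_reads)
        else:
            capped[amplicon] = list(reads)
    return capped


# Hypothetical read IDs grouped by amplicon, with very uneven depth.
reads_by_amplicon = {
    "amplicon_1": [f"read_{i}" for i in range(500)],
    "amplicon_2": [f"read_{i}" for i in range(500, 540)],
}
capped = downsample_by_amplicon(reads_by_amplicon, max_reads=100)
```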