Dealing with uneven coverage #15

Open
keremozdel opened this issue Feb 1, 2024 · 3 comments
@keremozdel

Hello,

Thank you for this great tool!
I'm working with amplicon sequencing data capturing the SMN gene. However, my data show uneven coverage along the SMN1 gene, which has resulted in questionable phasing output. This is my first phasing experiment, and I was wondering if you have any suggestions regarding this issue. For example, would downsampling the regions with higher coverage help? Also, could you please briefly explain why the phasing process requires alignment to the entire genome instead of a targeted reference sequence? I'm quite new to this field, and it would help me understand the subject better.

Thank you very much for your guidance and insight.

@xiao-chen-xc
Collaborator

Hi @keremozdel, Paraphase is designed to work with shotgun-type data such as WGS and hybrid capture data. Amplicon data is very different and should be analyzed differently. Instead of phasing reads into haplotypes, with amplicon data you can simply cluster reads into consensus groups, since they all start and end at the same positions. Are you capturing the SMN genes in just one amplicon? You can try the HiFi amplicon workflow (https://github.com/PacificBiosciences/hifi-amplicon-workflow). The clustering tool it uses is pbaa (https://github.com/PacificBiosciences/pbAA).
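
To make the idea of "clustering reads into consensus groups" concrete, here is a minimal toy sketch. It is not how pbaa works; it only illustrates the concept. It assumes every read covers the full amplicon with identical length (no indels), and the function names and the `max_dist` threshold are hypothetical choices for illustration.

```python
from collections import Counter


def hamming(a, b):
    """Number of mismatching positions between two equal-length reads."""
    return sum(x != y for x, y in zip(a, b))


def cluster_reads(reads, max_dist=1):
    """Greedy clustering (illustrative only, not pbaa's algorithm).

    Each read joins the first cluster whose seed (first member) is within
    max_dist mismatches; otherwise it starts a new cluster.
    Assumes all reads have the same length.
    """
    clusters = []
    for read in reads:
        for cluster in clusters:
            if hamming(read, cluster[0]) <= max_dist:
                cluster.append(read)
                break
        else:
            clusters.append([read])
    return clusters


def consensus(cluster):
    """Per-position majority vote across the reads in one cluster."""
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*cluster))


# Toy data: two haplotypes differing at three positions, plus one
# sequencing error in each group.
reads = [
    "ACGTACGTAC", "ACGTACGAAC", "ACGTACGTAC",
    "ATGTTCGTAA", "ATGTTCGTAA", "ATGTTCGTGA",
]
for group in cluster_reads(reads):
    print(len(group), consensus(group))
```

Real amplicon reads contain indels and varying error rates, so a dedicated tool is the better choice for actual data.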

@keremozdel
Author

Hi again @xiao-chen-xc,

I'm working with multiplex PCR, which includes several amplicons. Would it be correct to say that amplicon data does not provide sufficient resolution for accurate phasing? I will look into the clustering as you suggested.

Thank you for your help.

@xiao-chen-xc
Collaborator

xiao-chen-xc commented Feb 2, 2024

> Would it be correct to say that amplicon data does not provide sufficient resolution for accurate phasing?

No, you can get accurate haplotypes out of amplicon data through clustering. With multiple amplicons, you can cluster each amplicon first and then piece together the consensus sequences from several amplicons.
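
As a rough illustration of "piecing together" per-amplicon consensus sequences, the sketch below joins consensus sequences from two neighbouring amplicons. It rests on assumptions not stated in this thread: that the neighbouring amplicons overlap, and that the overlap contains at least one variant that distinguishes the haplotypes. The function names and the `min_overlap` parameter are hypothetical.

```python
def merge_by_overlap(left, right, min_overlap=5):
    """Join two sequences if a suffix of `left` exactly equals a prefix of `right`.

    Tries the longest possible overlap first. Returns the merged sequence,
    or None if no overlap of at least min_overlap bases matches.
    """
    max_len = min(len(left), len(right))
    for k in range(max_len, min_overlap - 1, -1):
        if left[-k:] == right[:k]:
            return left + right[k:]
    return None


def stitch(left_consensi, right_consensi, min_overlap=5):
    """Return every compatible left/right pairing, merged into one sequence."""
    joined = []
    for left in left_consensi:
        for right in right_consensi:
            merged = merge_by_overlap(left, right, min_overlap)
            if merged is not None:
                joined.append(merged)
    return joined


# Toy data: two consensus sequences per amplicon; the 6-base overlap
# carries a G/C difference that links each left half to one right half.
amplicon_1 = ["AAACCGGTTA", "AAACCGCTTA"]
amplicon_2 = ["CGGTTAGGG", "CGCTTATTT"]
for haplotype in stitch(amplicon_1, amplicon_2):
    print(haplotype)
```

If the overlap carries no distinguishing variant, every left consensus is compatible with every right consensus, and `stitch` returns all combinations. In that case the linkage between amplicons cannot be resolved from sequence overlap alone.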

We have some experience working with SMN amplicon data internally. I'd be happy to take a look at your data if you like. Feel free to email me at [email protected].
