Is it possible to check the coding potential with only one species mRNA sequence? #28
No, a single sequence is not sufficient for PhyloCSF. PhyloCSF gets most of its information from substitutions between different species. If you give it a single sequence, it will give you a score based only on codon frequencies, which is better than random but not accurate enough to call a single transcript protein-coding. When doing an ORF search it could easily find something with a positive score even if it is non-coding.
It is a good idea to include the --bls option when running PhyloCSF, so it will report the "branch length score", which is just a number from 0 to 1 that is the fraction of the phylogenetic tree that is present in your local alignment. If it is much less than 1 then you probably don't have enough alignment for PhyloCSF to compute a meaningful score.
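As a rough illustration of reading the two numbers together, here is a minimal Python sketch. The function name `assess_phylocsf_result` and the `min_bls` cutoff of 0.5 are hypothetical choices for this example only; PhyloCSF does not define a threshold, and a positive score with good alignment coverage is still only evidence, not proof, of coding potential.

```python
def assess_phylocsf_result(score_decibans, bls, min_bls=0.5):
    """Classify a PhyloCSF result using its score and branch length score.

    score_decibans: the PhyloCSF score reported for the region or ORF.
    bls: branch length score, the fraction (0 to 1) of the phylogenetic
         tree present in the local alignment.
    min_bls: hypothetical cutoff below which the score is not trusted.
    """
    if not 0.0 <= bls <= 1.0:
        raise ValueError("branch length score must be between 0 and 1")
    if bls < min_bls:
        # Too little of the tree is aligned: the score mostly reflects
        # codon frequencies rather than cross-species substitutions.
        return "insufficient alignment"
    if score_decibans > 0:
        return "coding-like signal"
    return "no coding signal"


if __name__ == "__main__":
    # The bls values below are made up for illustration.
    print(assess_phylocsf_result(760.1431, 0.05))
    print(assess_phylocsf_result(2013.9264, 0.95))
```

With this sketch, a high score paired with a very low branch length score is reported as "insufficient alignment" rather than as a coding call.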
In your particular case, Aldh2 is already a known coding gene, so what are you trying to find out? Are these unannotated alternative transcripts? If so, do they have any portions that are already annotated as coding? Those portions would give a positive PhyloCSF signal even if the rest of your transcript is not coding. I'd suggest you write your transcripts in BED format and load them as custom tracks in the UCSC genome browser (genome.ucsc.edu). Turn on the PhyloCSF track hub, which is listed among the public track hubs, and see if there is a PhyloCSF signal in any of the portions of your transcripts that are not already annotated as coding.
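For the BED step, a minimal Python sketch of turning exon coordinates into a 12-column BED line might look like the following. The helper name `transcript_to_bed12`, the transcript name, the chromosome, and all coordinates are placeholders made up for this example; substitute the real genomic positions of your transcripts. It assumes exon intervals are 0-based and half-open, as BED requires, and sets thickStart and thickEnd to the transcript start to indicate that no coding region is annotated.

```python
def transcript_to_bed12(chrom, strand, name, exons):
    """Build one BED12 line for a transcript.

    exons: list of (start, end) genomic intervals, 0-based, half-open.
    """
    exons = sorted(exons)
    tx_start = exons[0][0]
    tx_end = exons[-1][1]
    # Block sizes are exon lengths; block starts are relative to tx_start.
    block_sizes = ",".join(str(end - start) for start, end in exons)
    block_starts = ",".join(str(start - tx_start) for start, _ in exons)
    fields = [
        chrom, tx_start, tx_end, name,
        0,            # score (unused here)
        strand,
        tx_start,     # thickStart: no annotated CDS, so equal to chromStart
        tx_start,     # thickEnd
        "0,0,0",      # itemRgb
        len(exons), block_sizes, block_starts,
    ]
    return "\t".join(str(f) for f in fields)


if __name__ == "__main__":
    # Placeholder coordinates, not the real Aldh2 exon positions.
    line = transcript_to_bed12(
        "chr5", "+", "novel_tx1",
        [(1000, 1200), (1500, 1650), (2000, 2300)],
    )
    print('track name="novel_transcripts" description="Candidate transcripts"')
    print(line)
```

The printed track line plus BED line can be pasted into the genome browser's custom track form and viewed alongside the PhyloCSF track hub.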
On Apr 17, 2020, at 3:26 AM, nowandnow wrote:
I ran three tests with the following datasets (Aldh2.mRNA.fa, Aldh2.mRNA.test.fa, Aldh2.mRNA.test2.fa).
I got the same amino acid sequence each time. (The max score changed, but it was positive in every case.)
Can I check the coding potential with only the mRNA sequence?
Below are my commands and results.
PhyloCSF 29mammals Aldh2.mRNA.fa --aa --orf=ATGStop --frames=3 --removeRefGaps
-> Aldh2.mRNA.fa max_score(decibans) 2013.9264 343 1899 MLRAAL...
PhyloCSF 29mammals Aldh2.mRNA.test.fa --aa --orf=ATGStop --frames=3 --removeRefGaps
-> Aldh2.mRNA.test.fa max_score(decibans) 760.1431 343 1899 MLRAAL...
PhyloCSF 29mammals Aldh2.mRNA.test2.fa --aa --orf=ATGStop --frames=3
-> Aldh2.mRNA.test.fa max_score(decibans) 760.1431 343 1899 MLRAAL...
fasta.zip <https://github.com/mlin/PhyloCSF/files/4491497/fasta.zip>
The max score was reduced, but it was still positive.
Based on this, can I consider this mRNA a coding sequence?
I am trying to determine whether a novel RNA is a coding or a non-coding sequence.
I want to use PhyloCSF as one of the tools to check this.
Yours sincerely.