
FASTA conversion into A3M should be gapless #98

Open
FilomenoSanchez opened this issue Jan 20, 2022 · 1 comment

FilomenoSanchez commented Jan 20, 2022

The query sequence in the A3M format should be gapless, as discussed in #96. hh-suite provides a script, reformat.pl, that can deal with this. Take a look and try to add this to conkit.


sadiogo commented Jan 31, 2022

I was thinking about this code and it might be simpler than it looks. Here's how to do it:

  1. Parse the sequences in the alignment, convert each one into a list, and store them in a tuple. Create a variable called gap_positions = [].

  2. Find all the gap indexes in the first (query) sequence, store them in gap_positions, and then remove the gaps using the pop() method. Pop from the last index to the first so the remaining indexes stay valid.

  3. In all the other non-query sequences, run a for loop over gap_positions (again from last to first) and check whether the character at that position is a gap. If yes, remove it using pop(); otherwise convert the letter to lowercase.

That's it. Of course, there might be more efficient ways to do it, but those are the basics.
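
The steps above can be sketched roughly as follows. This is only a minimal illustration, not conkit's or reformat.pl's actual code: the function name `fasta_alignment_to_a3m` is made up, and it assumes the input is a list of equal-length aligned strings with the query first and `-` as the gap character. Instead of calling pop(), it builds each new row in one pass, which avoids index shifting.

```python
def fasta_alignment_to_a3m(sequences):
    """Return A3M-style rows for an aligned FASTA block (query first).

    Hypothetical helper for illustration; not part of conkit.
    """
    if not sequences:
        return []
    query = sequences[0]
    if any(len(seq) != len(query) for seq in sequences):
        raise ValueError("all aligned sequences must have the same length")

    # Columns where the query has a gap are insertion columns.
    gap_positions = {i for i, ch in enumerate(query) if ch == "-"}

    rows = []
    for seq in sequences:
        out = []
        for i, ch in enumerate(seq):
            if i in gap_positions:
                # Insertion column: drop gaps, lowercase residues.
                if ch != "-":
                    out.append(ch.lower())
            else:
                # Match column: keep the character (residue or gap) as is.
                out.append(ch)
        rows.append("".join(out))
    return rows


rows = fasta_alignment_to_a3m(["AC-DE", "ACFDE", "A--DE", "-CG-E"])
```

The query needs no special case here, since every character it has in an insertion column is a gap and gets dropped.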

However, many programs don't use the lowercase letters (which indicate insertions relative to the query sequence) and wrongly ask for alignments in the a3m format (which by definition must display the insertions). Thus, conkit_convert could also provide an output without insertions, in which case you just need to remove the letters instead of converting them to lowercase.
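
A rough sketch of that insertion-free variant, under the same assumptions (equal-length aligned strings, query first, `-` as the gap character). The name `drop_query_gap_columns` is hypothetical and this is not an existing conkit_convert option:

```python
def drop_query_gap_columns(sequences):
    """Remove every column in which the query (first sequence) has a gap.

    Hypothetical helper for illustration; not part of conkit.
    """
    if not sequences:
        return []
    query = sequences[0]
    if any(len(seq) != len(query) for seq in sequences):
        raise ValueError("all aligned sequences must have the same length")

    # Keep only the columns where the query has a residue.
    keep = [i for i, ch in enumerate(query) if ch != "-"]
    return ["".join(seq[i] for i in keep) for seq in sequences]


trimmed = drop_query_gap_columns(["AC-DE", "ACFDE", "A--DE", "-CG-E"])
```

Every output row then has the same length as the gapless query, which is what programs that ignore lowercase insertions usually expect.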
