Using Alpha Fold structures to construct the torsion database #241

Open
joaomcteixeira opened this issue Mar 6, 2023 · 2 comments

@joaomcteixeira
Member

When we first conceived IDPConfGen, we used Dunbrack's PISCES culled files as a list of non-redundant PDB files from which to generate IDPCG's database of observed torsion angles. However, PISCES culled lists are updated constantly (which is good) as new PDB structures are released, forcing us to maintain a registry of versions. Moreover, we built all this logic and infrastructure before AlphaFold 😉.

Now we could use the AlphaFold Homo sapiens predicted proteome to build a reliable torsion angle database that has an honest distribution of observed torsion angles and is not biased by non-redundancy criteria or by which experimental structures are available. Besides, such a database will not need constant updates (until AF produces a new dataset). I believe AlphaFold structures are already devoid of structural inconsistencies, which would further improve the reliability of the database.

AlphaFold Homo sapiens database 👉 https://alphafold.ebi.ac.uk/download#proteomes-section

The Homo sapiens dataset is already extensive. But we can consider expanding it later with other model organisms. Considerations on file size are necessary.

We cannot take all parts of the structures that AlphaFold predicted because of the presence of large disordered regions. But I think all residues with prediction scores (pLDDT) above 70 are reliable.
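To make the filtering idea concrete, here is a minimal sketch, not IDPConfGen code. It relies on the fact that AlphaFold DB PDB files store the per-residue pLDDT in the B-factor column. The names `read_plddt`, `confident_segments` and `min_len` are hypothetical, and the minimum segment length of 3 is my assumption: a residue needs a neighbour on each side to have both phi and psi defined.

```python
from itertools import groupby

PLDDT_CUTOFF = 70.0  # threshold proposed in this thread


def read_plddt(pdb_lines):
    """Return {(chain, resnum): pLDDT} read from CA atoms of PDB-format lines."""
    scores = {}
    for line in pdb_lines:
        # Fixed-column PDB format: atom name in columns 13-16,
        # chain ID in 22, residue number in 23-26, B-factor in 61-66.
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            chain = line[21]
            resnum = int(line[22:26])
            scores[(chain, resnum)] = float(line[60:66])
    return scores


def confident_segments(scores, cutoff=PLDDT_CUTOFF, min_len=3):
    """Group residues with pLDDT > cutoff into runs of consecutive numbering.

    Returns a list of (chain, first_resnum, last_resnum) tuples.
    """
    kept = sorted(key for key, plddt in scores.items() if plddt > cutoff)
    segments = []
    # Within one chain, consecutive residues share the same (resnum - index),
    # so that difference (plus the chain ID) identifies each unbroken run.
    for _, run in groupby(
        enumerate(kept), key=lambda pair: (pair[1][0], pair[1][1] - pair[0])
    ):
        residues = [residue for _, residue in run]
        if len(residues) >= min_len:
            first, last = residues[0], residues[-1]
            segments.append((first[0], first[1], last[1]))
    return segments
```

Calling `confident_segments(read_plddt(open(path)))` on a downloaded model (with `path` pointing to a local PDB file) would give the stretches from which torsion angles could then be extracted; residues at exactly 70 are dropped, in line with the "above 70" criterion.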

If we do this, we can reliably distribute the torsion database to users.

The existing clients used to create a database will still be useful (we should maintain them) but would be much less relevant.

What do you think?
Cheers,

joaomcteixeira added the "enhancement" (New feature or request) label Mar 6, 2023
@menoliu
Collaborator

menoliu commented Mar 6, 2023

Great point! I also concur that we should not consider any residues with pLDDT <= 70, as those models do not use the power of deep MSA. However, shouldn't we take conditionally folded conformers into consideration? I.e., some disordered proteins are represented as conditional folders, and those can be captured in AlphaFold... what should we do about them?

@menoliu
Collaborator

menoliu commented Mar 6, 2023

@joaomcteixeira I've just spoken to Julie and she's not very enthusiastic about the idea of us using AF structures, as those predicted structures are not equivalent to experimental structures. However, I agree with Julie that we should at least update the database we're using with the latest PISCES/RCSB structures.

I was also thinking: the size of the database is somewhat related to the speed of building because of the filtering process. If our database is very large (with the human proteome), wouldn't that slow down filtering for longer-ish IDPs? I think users could still use AF structures in their own database if they wanted to :)
