Using Alpha Fold structures to construct the torsion database #241

joaomcteixeira · 2023-03-06T18:01:02Z

When we first conceived IDPConfGen, we used Dunbrack's PISCES culled lists as a source of non-redundant PDB files from which to generate IDPCG's database of observed torsion angles. However, the PISCES culled lists are updated constantly (which is good) as new PDB structures are released, forcing us to maintain a registry of versions. Moreover, we built all this logic and infrastructure before AlphaFold 😉.

Now we could use the AlphaFold Homo sapiens predicted proteome to build a reliable torsion angle database that has an honest distribution of observed torsion angles and is not biased by non-redundancy criteria or by which experimental structures happen to be available. Besides, such a database will not need constant updates (until AF produces a new dataset). I believe AlphaFold structures are already devoid of structural inconsistencies, which would further improve the reliability of the database.

AlphaFold Homo sapiens database 👉 https://alphafold.ebi.ac.uk/download#proteomes-section

The Homo sapiens dataset is already extensive, but we can consider expanding it later with other model organisms. We would need to keep file size in mind.

We cannot take all parts of the structures that AlphaFold predicted because of the presence of large disordered regions. But I think all residues with prediction scores (pLDDT) above 70 are reliable.
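A minimal sketch of that filter, assuming the models are read in PDB format, where AlphaFold stores the per-residue pLDDT in the B-factor column. The function names, the `min_length` parameter, and the choice to keep only contiguous runs of confident residues are illustrative assumptions, not existing IDPConfGen code.

```python
from itertools import groupby

# Cutoff proposed in this thread; an assumption, not a validated value.
PLDDT_CUTOFF = 70.0


def residue_plddt(pdb_lines):
    """Map (chain, residue number) -> pLDDT, read from the B-factor column of CA atoms."""
    scores = {}
    for line in pdb_lines:
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            chain = line[21]
            resnum = int(line[22:26])
            scores[(chain, resnum)] = float(line[60:66])
    return scores


def confident_segments(scores, cutoff=PLDDT_CUTOFF, min_length=3):
    """Return (chain, first, last) for runs of consecutive residues with pLDDT above cutoff.

    min_length is a hypothetical parameter: very short runs are dropped
    here because they carry little backbone context.
    """
    kept = sorted(key for key, plddt in scores.items() if plddt > cutoff)
    segments = []
    # Consecutive residues of one chain share the same (chain, resnum - index) value.
    grouped = groupby(enumerate(kept), key=lambda item: (item[1][0], item[1][1] - item[0]))
    for _, members in grouped:
        residues = [key for _, key in members]
        if len(residues) >= min_length:
            segments.append((residues[0][0], residues[0][1], residues[-1][1]))
    return segments
```

Only the residues inside the returned segments would then be passed on to the torsion angle extraction.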

If we do this, we can reliably distribute the torsion database to users.

The existing clients used to create a database will still be useful (we should maintain them), but they would be much less relevant.

What do you think?
Cheers,

menoliu · 2023-03-06T18:08:27Z

Great point! I agree that we should not use any residues with pLDDT <= 70, as those predictions do not benefit from the power of a deep MSA. However, we also need to take conditionally folded conformers into account. Some disordered proteins are conditional folders, and AlphaFold can capture their folded states... what should we do about them?

menoliu · 2023-03-06T18:26:32Z

@joaomcteixeira I've just spoken to Julie and she's not very enthusiastic about the idea of us using AF structures, as predicted structures are not equivalent to experimental structures. However, I agree with Julie that we should at least update the database we're using with the latest PISCES/RCSB structures.

I was also thinking: the size of the database affects the speed of building because of the filtering process. If our database is very large (with the whole human proteome), wouldn't that slow down filtering for longer-ish IDPs? If users want to, they could still use AF structures in their own database :)

joaomcteixeira added the enhancement New feature or request label Mar 6, 2023

joaomcteixeira assigned joaomcteixeira and menoliu Mar 6, 2023
