understanding the energy threshold (cutoff) values after 0.0.20 #136

Open
SeanR22 opened this issue Jun 21, 2021 · 9 comments
Labels
discussion Discuss new ideas or implementations

Comments

@SeanR22

SeanR22 commented Jun 21, 2021

"IDPconfgen build" is working with new energy flags (-etss 5000 and -etbb 100) but I would estimate that it is running 2 to 3 orders of magnitude slower when keeping all other parameters the same as before.

@joaomcteixeira
Member

Strange. And if you put the thresholds the same, can you reproduce previous speed?

@SeanR22
Author

SeanR22 commented Jun 21, 2021

I can try but I don't know what these values were in the code before.

@joaomcteixeira
Member

Before, there was only -et, which defaulted to 10. It controlled the energy threshold used to accept or reject a conformer or chunk, both during backbone construction and during sidechain packing at the end. Now you can control the threshold for each process separately. Setting -etss 10 -etbb 10 should be the same as -et 10 in versions before 0.0.20.

@joaomcteixeira joaomcteixeira added the discussion Discuss new ideas or implementations label Jun 21, 2021
@SeanR22
Author

SeanR22 commented Jun 21, 2021

I switched both -etss and -etbb to the same value as I had before for -et (50 000) and now it is working fast again but I am still getting a similar amount of low energy structures (under 1000 kcal/mol total).

Interestingly, with my previous flags with lower values, everything took longer - both setting up and making the conformers.

@SeanR22
Author

SeanR22 commented Jun 21, 2021

I guess I don't understand why the energy cutoffs for the final structures would affect the build speed unless these values are linked to how chunks are selected or added?

@SeanR22
Author

SeanR22 commented Jun 21, 2021

OK, we are in business. If I set the backbone cutoff to 1000 kcal/mol but the sidechain cutoff to 50 000 it works reasonably fast and I get a much greater proportion of low energy structures compared to having both set to 50 000.

@joaomcteixeira
Member

That is fantastic and is exactly what I was expecting to hear 😉. With those settings, building the backbone is the more restrictive process, while placing the sidechains is more relaxed, in the sense that more violations are allowed. Note that, to date, no one has published a sidechain packing algorithm for IDPs. We are currently using FASPR as an adaptation, but its authors state in their paper that no quality tests were done for IDPs.

Answering your question: Yes! The energy cutoffs affect the selection of each chunk and the structure as a whole at the end. The new -etbb affects the acceptance of each chunk. Regardless of the size of the chunk, the energy is calculated after each chunk addition, and if it is higher than the -etbb value, the last chunk is discarded (or more than one chunk if too many failures accumulate). The -etss applies once the backbone is already built. If adding the sidechains results in an energy value over the cutoff, the whole conformer is discarded because we assume the "backbone has a strange conformation unable to accept feasible sidechains".
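To make the two-stage logic above concrete, here is a toy sketch of the accept/reject flow. It is not idpconfgen's code: the function name, the `max_failures` and `max_attempts` parameters, the additive energy model, and the choice to compare the total energy against the sidechain cutoff are all assumptions made for illustration.

```python
import random


def build_conformer(n_chunks, etbb, etss, chunk_energy, sidechain_energy,
                    max_failures=5, max_attempts=10_000):
    """Toy model of the two energy cutoffs (hypothetical, not idpconfgen's API).

    chunk_energy and sidechain_energy are callables returning an energy
    contribution; energies are assumed to simply add up.
    Returns (total_energy or None, number_of_chunk_attempts).
    """
    chunks = []    # energies of the chunks accepted so far
    failures = 0   # consecutive rejected chunk additions
    attempts = 0
    while len(chunks) < n_chunks:
        if attempts >= max_attempts:
            return None, attempts  # give up on this conformer
        attempts += 1
        candidate = chunks + [chunk_energy()]
        if sum(candidate) <= etbb:
            # Backbone energy is within the -etbb cutoff: keep the chunk.
            chunks = candidate
            failures = 0
        else:
            # The candidate chunk is discarded.
            failures += 1
            if failures >= max_failures and chunks:
                # Too many failures in a row: also drop an earlier chunk.
                chunks.pop()
                failures = 0
    # Backbone complete: add sidechains and apply the -etss cutoff.
    total = sum(chunks) + sidechain_energy()
    if total > etss:
        return None, attempts  # whole conformer discarded
    return total, attempts


if __name__ == "__main__":
    rng = random.Random(0)
    for etbb in (50_000, 1_000):
        energy, attempts = build_conformer(
            n_chunks=20, etbb=etbb, etss=50_000,
            chunk_energy=lambda: rng.expovariate(1 / 40),
            sidechain_energy=lambda: rng.expovariate(1 / 200),
        )
        print(f"etbb={etbb}: energy={energy}, chunk attempts={attempts}")
```

In this toy model a tighter backbone cutoff can only increase the number of chunk attempts, while the sidechain cutoff acts once, on the finished backbone.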

Thanks @SeanR22 ! 😉

@joaomcteixeira joaomcteixeira changed the title from "version 0.0.20 build function running very slowly" to "understanding the energy threshold (cutoff) values after 0.0.20" Jun 24, 2021
@SeanR22
Author

SeanR22 commented Jun 24, 2021

Thanks @joaomcteixeira !

This makes sense to me now. Being able to decouple the backbone and sidechain energy cutoffs is a great improvement.

I am testing how different energy cutoffs affect the speed and quality of the conformers.

So far it seems that the sidechain packing is working well because even if I set the sidechain energy cutoff to 50000 and the backbone cutoff to 1000, I mostly end up with low energy structures. This finding is shown by the following histogram of a 500 conformer run (x-axis Energy, y-axis number of conformers):

[Image: 2_8_run_7_energies_hist]

Within the day I will upload some code I have been working on for analyzing the size and energy distributions of the conformers generated by idpconfgen.

Here is a histogram of the size distribution (x-axis size - Rg in angstroms, y-axis number of conformers) of the same run shown above:

[Image: 2_8_run_7_Rg_hist]
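For readers who want to reproduce this kind of size distribution, a minimal sketch of the calculation follows. It is not the analysis code mentioned above: the function names are hypothetical, the radius of gyration is unweighted (every atom counts equally, no masses), and coordinates are assumed to be (x, y, z) tuples in angstroms.

```python
import math


def radius_of_gyration(coords):
    """Unweighted Rg: root-mean-square distance of atoms from their centroid."""
    n = len(coords)
    if n == 0:
        raise ValueError("coords must contain at least one atom")
    cx = sum(c[0] for c in coords) / n
    cy = sum(c[1] for c in coords) / n
    cz = sum(c[2] for c in coords) / n
    mean_sq = sum(
        (x - cx) ** 2 + (y - cy) ** 2 + (z - cz) ** 2 for x, y, z in coords
    ) / n
    return math.sqrt(mean_sq)


def histogram(values, bin_width):
    """Count values per bin; keys are the left edges of the non-empty bins."""
    counts = {}
    for v in values:
        left = math.floor(v / bin_width) * bin_width
        counts[left] = counts.get(left, 0) + 1
    return dict(sorted(counts.items()))
```

Calling `radius_of_gyration` once per conformer and passing the results to `histogram` gives the counts per Rg bin; the same `histogram` helper works for the per-conformer energies.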

Cheers!

@SeanR22
Author

SeanR22 commented Jun 25, 2021

To add to this discussion...

Here are two energy histograms of runs using -etbb 1000 and -etss 10000 on sequences of similar length (~120 residues). The only notable difference is that the run shown in the top figure is on a protein sequence of lower complexity than the one in the bottom figure. The bottom sequence has fewer glycine-containing repeats and a higher proportion of bulky hydrophobic residues. The higher-complexity run (bottom figure) also took about 10 times as long to reach 500 structures that met the energy requirements of the input parameters.

[Image: 2_8_run_12_energies_hist]

[Image: 8_14_run_12_energies_hist]

2 participants