Model with 2 states for amino acids ? #12

Open
odannis opened this issue Jul 11, 2021 · 1 comment

Comments

@odannis
Contributor

odannis commented Jul 11, 2021

Hi,

I would like to infer a model on a generated data set where the "proteins" have only 2 states available at each site. For example:

MSA = [[1, 0, 1, ..., 0, 1, 0],
       ...,
       [0, 0, 1, ..., 1, 1, 0]]

Would it be possible to infer a model with only 2 states at each site?
Furthermore, could I use CCMgen to generate data with only 2 states, based on the inferred fields?

Thank you for your help

@croth1

croth1 commented Jul 16, 2021

Hi @odannis,

Sorry for the long wait, I have been busy lately. Generally speaking, the two-state MRF is a special case of the standard 20-state MRF in which all the singleton potentials corresponding to the 18 "forbidden" states are set to -infinity.
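
To illustrate the special case, here is a minimal sketch in plain Python. It looks at a single site in isolation (pair couplings are ignored for simplicity) and turns 20 singleton potentials into probabilities with a softmax. The helper name `site_probabilities`, the choice of states 0 and 1 as the allowed ones, and the potential values are all assumptions made for the example; none of this is CCMpred or CCMgen code.

```python
import math

N_STATES = 20  # size of the standard amino-acid alphabet


def site_probabilities(potentials):
    """Softmax over one site's singleton potentials.

    A potential of -inf gives that state a probability of exactly 0.
    """
    finite = [p for p in potentials if p != -math.inf]
    shift = max(finite)  # subtract the max for numerical stability
    weights = [
        math.exp(p - shift) if p != -math.inf else 0.0
        for p in potentials
    ]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical example: states 0 and 1 are allowed, the other 18 are forbidden.
allowed = {0: 0.3, 1: -0.2}
potentials = [allowed.get(a, -math.inf) for a in range(N_STATES)]
probs = site_probabilities(potentials)
```

All probability mass ends up on the two allowed states, which is what makes the 20-state model behave like a two-state one.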

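By default, training should drive the singleton potentials of states that never occur in the alignment to very small (strongly negative) values, given that there's no prior on the singletons. You need to be careful with pseudocounts, though, because they give unobserved states a nonzero frequency. I would not add any, just to be sure.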
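
The pseudocount caveat can be seen in a small sketch. It uses a generic uniform-mixture pseudocount, which is an assumption for illustration and may differ from the scheme the tool actually applies; `column_frequencies` and `pseudocount_weight` are hypothetical names.

```python
N_STATES = 20  # size of the standard amino-acid alphabet


def column_frequencies(column, pseudocount_weight=0.0):
    """Single-site state frequencies for one alignment column.

    The observed frequencies are mixed with a uniform distribution
    according to pseudocount_weight (0.0 means no pseudocounts).
    """
    counts = [0] * N_STATES
    for state in column:
        counts[state] += 1
    n = len(column)
    return [
        (1.0 - pseudocount_weight) * c / n + pseudocount_weight / N_STATES
        for c in counts
    ]


# One binary column of a toy alignment: only states 0 and 1 occur.
column = [1, 0, 1, 0, 0, 1]
no_pc = column_frequencies(column)
with_pc = column_frequencies(column, pseudocount_weight=0.1)
```

Without pseudocounts the 18 unused states have frequency zero. With any positive weight they all receive some mass, so the fitted model would no longer treat them as forbidden.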

If the "forbidden" amino acids still pop up during simulation, you could also modify the parameters manually and set the singleton potentials corresponding to the unwanted states to a large negative number.
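
A minimal sketch of that manual fix, assuming the singleton potentials are available as an L x 20 nested list (one row per site). How the parameters are actually read from and written back to the tool's output files is not shown here, and `suppress_states` and `LARGE_NEGATIVE` are hypothetical names.

```python
N_STATES = 20            # size of the standard amino-acid alphabet
LARGE_NEGATIVE = -1e6    # hypothetical stand-in for "a large negative number"


def suppress_states(single_potentials, allowed_states, penalty=LARGE_NEGATIVE):
    """Return a copy of the L x 20 singleton potentials in which every
    state outside allowed_states is set to penalty."""
    allowed = set(allowed_states)
    return [
        [v if a in allowed else penalty for a, v in enumerate(row)]
        for row in single_potentials
    ]


# Toy parameters for 3 sites; the values are placeholders, not fitted ones.
v = [[0.1 * a for a in range(N_STATES)] for _ in range(3)]
v_masked = suppress_states(v, allowed_states=[0, 1])
```

The function returns a new list and leaves the input untouched, so the original parameters stay available for comparison.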
