Compute and Visualize Contact Maps
First, we'll use CCMpredPy to learn the evolutionary couplings characteristic of our example protein family by maximizing the pseudo-likelihood of the Markov Random Field (MRF) model.
A contact map can be computed from the coupling coefficients of the MRF using the standard L2 norm score. Typically, corrections are applied to this matrix to remove entropy and phylogenetic bias.
ccmpred data/1atzA.fas --ofn-pll \
--plot-opt-progress data/1atzA.log.html \
-m data/1atzA.raw.mat \
--apc data/1atzA.apc.mat \
--entropy-correction data/1atzA.ec.mat
CCMpredPy will print output similar to the following:
┏━╸┏━╸┏┳┓┏━┓┏━┓┏━╸╺┳┓┏━┓╻ ╻ version 1.0.0
┃ ┃ ┃┃┃┣━┛┣┳┛┣╸ ┃┃┣━┛┗┳┛ Vorberg, Seemayer and Soeding (2018)
┗━╸┗━╸╹ ╹╹ ╹┗╸┗━╸╺┻┛╹ ╹ https://github.com/soedinglab/ccmgen
Using 1 threads for OMP parallelization.
1atzA is of length L=75 and there are 3068 sequences in the alignment.
Alignment has diversity [sqrt(N)/L]=0.739 and Neff(HHsuite-like)=5.492.
Number of effective sequences after simple reweighting (id-threshold=0.8, ignore_gaps=False): 1149.44.
Calculating AA Frequencies with 0.00087 percent pseudocounts (uniform_pseudocounts 1 1)
L₂ regularization (λsingle=10 λpairfactor=0.2 λpair=0.2)
Plot with optimization statistics will be written to data/1atzA.log.html
Will optimize 3781600 float64 variables wrt PLL
and L₂ regularization (λsingle=10 λpairfactor=0.2 λpair=0.2)
Optimizer: LBFGS optimization
convergence criteria: maxit=2000
[ removed the per-iteration statistics for brevity ]
Finished with code 0 -- CONVERGENCE: REL_REDUCTION_OF_F_<=_FACTR*EPSMCH
Compute contact map using frobenius norm of couplings.
Apply Average Product Correction (APC).
Apply entropy correction (using 20 states and log2).
Writing contact matrices to:
data/1atzA.raw.mat
data/1atzA.apc.mat
data/1atzA.ec.mat
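The diversity value reported in the log can be checked by hand from the alignment dimensions that the log itself prints. The following is a minimal sketch, assuming that the bracketed formula sqrt(N)/L is evaluated as written, with N the number of sequences and L the number of alignment columns. The function name is illustrative and is not part of CCMpredPy.

```python
import math


def alignment_diversity(n_sequences: int, length: int) -> float:
    """Diversity as written in the log line: sqrt(N) / L."""
    return math.sqrt(n_sequences) / length


# Values taken from the log above: N=3068 sequences, L=75 columns.
diversity = alignment_diversity(3068, 75)
print(f"{diversity:.3f}")  # prints 0.739, matching the log
```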
We now have several new files in data/:
- a summed contact score matrix 1atzA.raw.mat with raw contact scores
- a summed contact score matrix 1atzA.apc.mat with contact scores that have been corrected with the Average Product Correction (APC)
- a summed contact score matrix 1atzA.ec.mat with contact scores that have been corrected for entropy bias
- an interactive html file 1atzA.log.html visualizing the optimization log
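To make the relationship between the raw and the APC-corrected scores concrete, here is a small sketch of the two steps in plain Python. It is not CCMpredPy's implementation: it assumes the textbook definitions, where the raw score of a residue pair is the L2 (Frobenius) norm of its coupling block and APC subtracts the product of the two row means divided by the overall mean. Details such as how the diagonal or gap states are handled are assumptions here, and both function names are hypothetical.

```python
def frobenius_score(coupling_block):
    """L2 (Frobenius) norm of one coupling block w_ij(a, b)."""
    return sum(w * w for row in coupling_block for w in row) ** 0.5


def apply_apc(raw):
    """Average Product Correction: S_ij - mean_i * mean_j / mean_all.

    Assumption: means are taken over complete rows of the square
    matrix, diagonal included.
    """
    n = len(raw)
    row_means = [sum(row) / n for row in raw]
    overall_mean = sum(row_means) / n
    return [
        [raw[i][j] - row_means[i] * row_means[j] / overall_mean
         for j in range(n)]
        for i in range(n)
    ]


# Toy symmetric score matrix for three positions (not real data).
raw_scores = [
    [0.0, 2.0, 4.0],
    [2.0, 0.0, 6.0],
    [4.0, 6.0, 0.0],
]
for row in apply_apc(raw_scores):
    print(["%.3f" % value for value in row])
```

A useful property of this correction is that a purely multiplicative background (a matrix of the form a_i * a_j) is removed completely, which is why it is commonly used to suppress position-specific bias.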
Contact maps that have been generated with the CCMpredPy -m flag can be visualized as an .html file using the following command:
ccm_plot cmap \
--mat-file data/1atzA.apc.mat \
--alignment-file data/1atzA.fas \
--pdb-file data/1atzA.pdb \
--plot-file data/1atzA.apc.html \
--seq-sep 4 --contact-threshold 8
Specifying the original alignment file with the flag --alignment-file will add a subplot with a per-column entropy line graph.
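As an illustration of what a per-column entropy profile contains, the sketch below computes the Shannon entropy of every alignment column in plain Python. It is not taken from ccm_plot: the use of log base 2, the treatment of gaps as an ordinary symbol, and the absence of sequence weighting are all assumptions, and the function name is hypothetical.

```python
import math
from collections import Counter


def column_entropies(sequences):
    """Shannon entropy (in bits) of each column of an alignment.

    Assumptions: all sequences have equal length, every symbol
    (including '-') counts as a state, and sequences are unweighted.
    """
    length = len(sequences[0])
    entropies = []
    for col in range(length):
        counts = Counter(seq[col] for seq in sequences)
        total = sum(counts.values())
        entropy = sum(
            -(count / total) * math.log2(count / total)
            for count in counts.values()
        )
        entropies.append(entropy)
    return entropies


# Toy alignment: column 0 is fully conserved, column 1 is uniform
# over four residues.
toy_alignment = ["AC", "AD", "AE", "AF"]
print(column_entropies(toy_alignment))  # prints [0.0, 2.0]
```

Conserved columns have entropy close to zero, while highly variable columns approach the maximum for the number of observed states.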
Specifying a reference PDB structure with the flag --pdb-file will show the observed pairwise amino acid distances in the lower triangle of the matrix. Note that the numbering of residues in the PDB file must begin with 1 and match the dimensions of the contact matrix file!
The C_beta distance threshold for defining true positive contacts can be specified with the flag --contact-threshold, and residue pairs along the diagonal can be masked by specifying a sequence separation cutoff with the flag --seq-sep.
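The interplay of the two cutoffs can be sketched in a few lines of Python. This is a hedged illustration and not ccm_plot's code: it assumes that pairs with |i - j| smaller than the sequence separation are ignored and that a pair counts as a true contact when its distance is strictly below the threshold (ccm_plot may treat the boundary cases differently). The function name, the top_n parameter, and the toy matrices are hypothetical.

```python
def contact_precision(scores, distances, seq_sep=4,
                      contact_threshold=8.0, top_n=None):
    """Fraction of the highest-scoring pairs that are true contacts.

    Pairs closer than seq_sep positions in sequence are masked.
    A pair is a true contact if its distance < contact_threshold.
    """
    n = len(scores)
    pairs = [
        (scores[i][j], i, j)
        for i in range(n)
        for j in range(i + seq_sep, n)
    ]
    pairs.sort(reverse=True)  # highest score first
    selected = pairs if top_n is None else pairs[:top_n]
    if not selected:
        return 0.0
    hits = sum(
        1 for _, i, j in selected if distances[i][j] < contact_threshold
    )
    return hits / len(selected)


# Toy data for six residues (not derived from 1atzA).
n = 6
scores = [[0.0] * n for _ in range(n)]
distances = [[20.0] * n for _ in range(n)]
scores[0][4], distances[0][4] = 0.9, 6.0   # predicted and in contact
scores[1][5], distances[1][5] = 0.5, 12.0  # predicted, not in contact
scores[0][5], distances[0][5] = 0.1, 7.0   # low score, in contact
scores[0][1], distances[0][1] = 5.0, 3.8   # neighbors: masked by seq_sep

print(contact_precision(scores, distances, seq_sep=4, top_n=2))
```

Masking short-range pairs matters because residues that are adjacent in sequence are trivially close in space and would otherwise inflate the apparent precision.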
The contact map will look like this: