Merge pull request #26 from chemle/feature_aev

Switch to AEV based models
chemle · Oct 10, 2024 · 9b8d7c3
2 parents 9f9e78d + 1a1e9dc
commit 9b8d7c3

Showing 26 changed files with 3,340 additions and 411 deletions.
diff --git a/.gitattributes b/.gitattributes
@@ -1,2 +1 @@
-mlmm/_version.py export-subst
 emle/_version.py export-subst
diff --git a/.github/workflows/main.yaml b/.github/workflows/main.yaml
@@ -53,11 +53,9 @@ jobs:
           activate-environment: emle
           environment-file: environment.yaml
           miniforge-version: latest
-          miniforge-variant: Mambaforge
-          use-mamba: true
 #
-      - name: Install pytest
-        run: mamba install pytest
+      - name: Install additional test dependencies
+        run: conda install pytest
 #
       - name: Install the package
         run: pip install .

diff --git a/MANIFEST.in b/MANIFEST.in
diff --git a/README.md b/README.md
@@ -3,6 +3,10 @@
 [![GitHub Actions](https://github.com/chemle/emle-engine/actions/workflows/main.yaml/badge.svg)](https://github.com/chemle/emle-engine/actions/workflows/main.yaml)
 [![License: GPL v2](https://img.shields.io/badge/License-GPL_v2-blue.svg)](https://www.gnu.org/licenses/old-licenses/gpl-2.0.en.html)
 
+![Emily Engine](emily_engine.jpg)
+
+(Mascot courtesy [Nictrain123](https://www.deviantart.com/nictrain123/art/Simply-Emily-774815887) ![CC BY 3.0](https://licensebuttons.net/l/by/3.0/80x15.png).)
+
 A simple interface to allow electrostatic embedding of machine learning
 potentials using an [ORCA](https://orcaforum.kofo.mpg.de/index.php)-like interface. Based on [code](https://github.com/emedio/embedding) by Kirill Zinovjev. An example [sander](https://ambermd.org/AmberTools.php) implementation is provided. This
 works by reusing the existing interface between sander and [ORCA](https://orcaforum.kofo.mpg.de/index.php), meaning
@@ -38,13 +42,13 @@ environment that is compatible with your CUDA driver.)
 Finally, install `emle-engine`:
 
 ```sh
-python setup.py install
+pip install .
 ```
 
 If you are developing and want an editable install, use:
 
 ```sh
-python setup.py develop
+pip install -e .
 ```
 
 ## Usage
@@ -156,7 +160,18 @@ regardless of where it is launched.
 
 We also support the use of [Rascal](https://github.com/lab-cosmo/librascal)
 for the calculation of delta-learning corrections to the in vacuo energies and
-gradients. To use, you will need to specify a model file using the `--rascal-model`
+gradients. To use, you will first need to create an environment with the additional
+dependencies:
+
+```sh
+conda env create -f environment_rascal.yaml
+conda activate emle-rascal
+```
+
+(These are not included in the default environment as they limit the supported
+Python versions.)
+
+Then, specify a model file using the `--rascal-model`
 command-line argument, or via the `EMLE_RASCAL_MODEL` environment variable.
 
 Note that the chosen [backend](#backends) _must_ match the one used to train the model. At
@@ -210,6 +225,15 @@ the environment variable) or a path to a file. When using a file, this should
 be formatted as a single column, with one line per QM atom. The units
 are electron charge.
 
+## Alpha mode
+
+We support two methods for the calculation of atomic polarisabilities. The
+default, `species`, uses a single volume scaling factor for each species.
+Alternatively, `reference` calculates the scaling factors with Gaussian
+Process Regression (GPR), using the values learned for each reference environment.
+The alpha mode can be specified using the `--alpha-mode` command-line argument,
+or via the `EMLE_ALPHA_MODE` environment variable.
+
 ## Logging
 
 Energies can be written to a file using the `--energy-file` command-line argument
@@ -327,9 +351,8 @@ energies.
 
 We provide an interface between `emle-engine` and [OpenMM](https://openmm.org) via the
 [Sire](https://sire.openbiosim.org/) molecular simulation framework. This allows QM/MM simulations
-to be run with OpenMM using EMLE for the embedding model. This provides improved
-performance and flexibility in comparison to the `sander` interface, although
-the implementation should currently be treated as being _experimental_.
+to be run with OpenMM using EMLE for the embedding model. This provides greatly
+improved performance and flexibility in comparison to the `sander` interface.
 
 To use, first create an `emle-sire` conda environment:
 
@@ -341,16 +364,24 @@ conda activate emle-sire
 Next install `emle-engine` into the environment:
 
 ```sh
-python setup.py install
+pip install .
 ```
 
 For instructions on how to use the `emle-sire` interface, see the tutorial
-documentation [here](https://github.com/OpenBioSim/sire/blob/feature_emle/doc/source/tutorial/partXX/02_emle.rst).
+documentation [here](https://github.com/OpenBioSim/sire/tree/devel/doc/source/tutorial/part08/02_emle.rst).
 
 When performing end-state correction simulations using the `emle-sire` interface
 there is no need to specify the `lambda_interpolate` keyword when creating an
 `EMLECalculator` instance. Instead, interpolation can be enabled when creating a
-`Sire` dynamics object via the same keyword. (See the [tutorial](https://github.com/OpenBioSim/sire/blob/feature_emle/doc/source/tutorial/partXX/02_emle.rst) for details.)
+`Sire` dynamics object via the same keyword. (See the [tutorial](https://github.com/OpenBioSim/sire/tree/devel/doc/source/tutorial/part08/02_emle.rst) for details.)
+
+## Torch models
+
+The `emle.models` module provides a number of `torch` models. The base `EMLE` model
+can be used to compute the EMLE energy in isolation. The combined `ANI2xEMLE`
+and `MACEEMLE` models allow the computation of in vacuo and embedding energies
+in one go, using the [ANI2x](https://github.com/aiqm/torchani) and [MACE](https://github.com/ACEsuit/mace) models respectively. Creating additional models is straightforward. For details of how to use the `torch` models,
+see the tutorial documentation [here](https://github.com/OpenBioSim/sire/blob/feature_emle/doc/source/tutorial/part08/02_emle.rst#creating-an-emle-torch-module).
 
 ## Issues
 

diff --git a/bin/emle-server b/bin/emle-server
@@ -19,7 +19,7 @@
 # GNU General Public License for more details.
 #
 # You should have received a copy of the GNU General Public License
-# along with EMLE-Engine If not, see <http://www.gnu.org/licenses/>.
+# along with EMLE-Engine. If not, see <http://www.gnu.org/licenses/>.
 #####################################################################
 
 import argparse
@@ -58,7 +58,12 @@ try:
 except:
     port = None
 model = os.getenv("EMLE_MODEL")
+try:
+    species = [int(x) for x in os.getenv("EMLE_SPECIES").split(",")]
+except:
+    species = None
 method = os.getenv("EMLE_METHOD")
+alpha_mode = os.getenv("EMLE_ALPHA_MODE")
 mm_charges = os.getenv("EMLE_MM_CHARGES")
 try:
     num_clients = int(os.getenv("EMLE_NUM_CLIENTS"))
@@ -85,6 +90,10 @@ try:
     qm_xyz_frequency = int(os.getenv("EMLE_QM_XYZ_FREQUENCY"))
 except:
     qm_xyz_frequency = 0
+try:
+    ani2x_model_index = int(os.getenv("EMLE_ANI2X_MODEL_INDEX"))
+except:
+    ani2x_model_index = None
 rascal_model = os.getenv("EMLE_RASCAL_MODEL")
 parm7 = os.getenv("EMLE_PARM7")
 try:
@@ -124,7 +133,9 @@ env = {
     "host": host,
     "port": port,
     "model": model,
+    "species": species,
     "method": method,
+    "alpha_mode": alpha_mode,
     "mm_charges": mm_charges,
     "num_clients": num_clients,
     "backend": backend,
@@ -136,6 +147,7 @@ env = {
     "deepmd_deviation_threshold": deepmd_deviation_threshold,
     "qm_xyz_file": qm_xyz_file,
     "qm_xyz_frequency": qm_xyz_frequency,
+    "ani2x_model_index": ani2x_model_index,
     "rascal_model": rascal_model,
     "lambda_interpolate": lambda_interpolate,
     "interpolate_steps": interpolate_steps,
@@ -177,13 +189,27 @@ parser.add_argument("--port", type=str, help="the port number", required=False)
 parser.add_argument(
     "--model", type=str, help="path to an EMLE model file", required=False
 )
+parser.add_argument(
+    "--species",
+    type=str,
+    nargs="*",
+    help="the species supported by the model",
+    required=False,
+)
 parser.add_argument(
     "--method",
     type=str,
     help="the embedding method to use",
     choices=["electrostatic", "mechanical", "nonpol", "mm"],
     required=False,
 )
+parser.add_argument(
+    "--alpha-mode",
+    type=str,
+    help="the alpha mode to use for the embedding method",
+    choices=["species", "reference"],
+    required=False,
+)
 parser.add_argument(
     "--mm-charges",
     type=str,
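
The hunks above read `EMLE_SPECIES` and `EMLE_ALPHA_MODE` from the environment and add matching `--species` and `--alpha-mode` options. A minimal sketch of that pattern follows. It is not the code from `bin/emle-server`: the helper names `parse_species`, `build_parser` and `resolve` are hypothetical, the rule that command-line values take priority over the environment is an assumption, and `--species` is parsed with `type=int` here (the diff uses `type=str`) so that both sources yield a list of integers. It also catches only `ValueError` rather than using a bare `except:`.

```python
import argparse


def parse_species(value):
    """Parse a comma-separated string of atomic numbers, e.g. "1,6,7,8".

    Returns None when the value is unset or malformed, which matches the
    fallback in the diff, but only the expected error is caught.
    """
    if value is None:
        return None
    try:
        return [int(x) for x in value.split(",")]
    except ValueError:
        return None


def build_parser():
    """Build a parser holding only the two options added in this commit."""
    parser = argparse.ArgumentParser(description="sketch of the new options")
    # Assumption: integers are wanted, so type=int is used instead of type=str.
    parser.add_argument("--species", type=int, nargs="*", required=False)
    parser.add_argument(
        "--alpha-mode",
        type=str,
        choices=["species", "reference"],
        required=False,
    )
    return parser


def resolve(argv, environ):
    """Merge both sources. Assumption: the command line wins over the environment."""
    args = build_parser().parse_args(argv)
    if args.species is not None:
        species = args.species
    else:
        species = parse_species(environ.get("EMLE_SPECIES"))
    alpha_mode = args.alpha_mode or environ.get("EMLE_ALPHA_MODE")
    return {"species": species, "alpha_mode": alpha_mode}
```

Called as `resolve(sys.argv[1:], os.environ)`, this returns a small dictionary comparable to the `species` and `alpha_mode` entries of `env` in the diff. Note that a value taken from `EMLE_ALPHA_MODE` is not checked against the `choices` list in this sketch.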

diff --git a/bin/orca b/bin/orca
@@ -19,7 +19,7 @@
 # GNU General Public License for more details.
 #
 # You should have received a copy of the GNU General Public License
-# along with EMLE-Engine If not, see <http://www.gnu.org/licenses/>.
+# along with EMLE-Engine. If not, see <http://www.gnu.org/licenses/>.
 #####################################################################
 
 import os

diff --git a/emily_engine.jpg b/emily_engine.jpg
diff --git a/emle/_sander_calculator.py b/emle/_sander_calculator.py
@@ -17,7 +17,7 @@
 # GNU General Public License for more details.
 #
 # You should have received a copy of the GNU General Public License
-# along with EMLE-Engine If not, see <http://www.gnu.org/licenses/>.
+# along with EMLE-Engine. If not, see <http://www.gnu.org/licenses/>.
 #####################################################################
 
 """ASE sander calculator implementation."""

diff --git a/emle/_socket.py b/emle/_socket.py
@@ -17,7 +17,7 @@
 # GNU General Public License for more details.
 #
 # You should have received a copy of the GNU General Public License
-# along with EMLE-Engine If not, see <http://www.gnu.org/licenses/>.
+# along with EMLE-Engine. If not, see <http://www.gnu.org/licenses/>.
 #####################################################################
 
 """Simple TCP socket-server implementation."""

diff --git a/emle/_utils.py b/emle/_utils.py
@@ -0,0 +1,128 @@
+#######################################################################
+# EMLE-Engine: https://github.com/chemle/emle-engine
+#
+# Copyright: 2023-2024
+#
+# Authors: Lester Hedges   <[email protected]>
+#          Kirill Zinovjev <[email protected]>
+#
+# EMLE-Engine is free software: you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation, either version 2 of the License, or
+# (at your option) any later version.
+#
+# EMLE-Engine is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with EMLE-Engine. If not, see <http://www.gnu.org/licenses/>.
+#####################################################################
+
+"""EMLE utilities."""
+
+__author__ = "Lester Hedges"
+__email__ = "[email protected]"
+
+
+def _fetch_resources():
+    """Fetch resources required for EMLE."""
+
+    import os as _os
+    import pygit2 as _pygit2
+
+    # Create the name for the expected resources directory.
+    resource_dir = _os.path.join(
+        _os.path.dirname(_os.path.abspath(__file__)), "resources"
+    )
+
+    # Check if the resources directory exists.
+    if not _os.path.exists(resource_dir):
+        # If it doesn't, clone the resources repository.
+        print("Downloading EMLE resources...")
+        _pygit2.clone_repository(
+            "https://github.com/chemle/emle-models.git", resource_dir
+        )
+    else:
+        # If it does, open the repository and pull the latest changes.
+        repo = _pygit2.Repository(resource_dir)
+        _pull(repo)
+
+
+# The MIT License (MIT)
+
+# Copyright (c) 2015 Michael Boselowitz
+
+# Permission is hereby granted, free of charge, to any person obtaining a copy
+# of this software and associated documentation files (the "Software"), to deal
+# in the Software without restriction, including without limitation the rights
+# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+# copies of the Software, and to permit persons to whom the Software is
+# furnished to do so, subject to the following conditions:
+#
+# The above copyright notice and this permission notice shall be included in all
+# copies or substantial portions of the Software.
+#
+# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+# SOFTWARE.
+
+
+def _pull(repo, remote_name="origin", branch="main"):
+    """
+    Pull the latest changes from the remote repository.
+
+    Taken from:
+    https://github.com/MichaelBoselowitz/pygit2-examples/blob/master/examples.py
+    """
+
+    import pygit2 as _pygit2
+
+    for remote in repo.remotes:
+        if remote.name == remote_name:
+            remote.fetch()
+            remote_master_id = repo.lookup_reference(
+                "refs/remotes/origin/%s" % (branch)
+            ).target
+            merge_result, _ = repo.merge_analysis(remote_master_id)
+            # Up to date, do nothing
+            if merge_result & _pygit2.GIT_MERGE_ANALYSIS_UP_TO_DATE:
+                return
+            # We can just fastforward
+            elif merge_result & _pygit2.GIT_MERGE_ANALYSIS_FASTFORWARD:
+                print("Updating EMLE resources...")
+                repo.checkout_tree(repo.get(remote_master_id))
+                try:
+                    master_ref = repo.lookup_reference("refs/heads/%s" % (branch))
+                    master_ref.set_target(remote_master_id)
+                except KeyError:
+                    repo.create_branch(branch, repo.get(remote_master_id))
+                repo.head.set_target(remote_master_id)
+            elif merge_result & _pygit2.GIT_MERGE_ANALYSIS_NORMAL:
+                print("Updating EMLE resources...")
+                repo.merge(remote_master_id)
+
+                if repo.index.conflicts is not None:
+                    for conflict in repo.index.conflicts:
+                        print("Conflicts found in:", conflict[0].path)
+                    raise AssertionError("Conflicts!")
+
+                user = repo.default_signature
+                tree = repo.index.write_tree()
+                commit = repo.create_commit(
+                    "HEAD",
+                    user,
+                    user,
+                    "Merge!",
+                    tree,
+                    [repo.head.target, remote_master_id],
+                )
+                # We need to do this or git CLI will think we are still merging.
+                repo.state_cleanup()
+            else:
+                raise AssertionError("Unknown merge analysis result")
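
The `_pull` helper above branches on the bitmask returned by `repo.merge_analysis`. The sketch below isolates that dispatch so it can be read and tested without a repository. `MergeAnalysis` is a hypothetical stand-in for the `pygit2.GIT_MERGE_ANALYSIS_*` constants, with values assumed to mirror libgit2, and `choose_action` is an illustrative name, not part of `emle/_utils.py`.

```python
from enum import IntFlag


class MergeAnalysis(IntFlag):
    """Stand-in for pygit2's GIT_MERGE_ANALYSIS_* flags (values assumed)."""

    NONE = 0
    NORMAL = 1
    UP_TO_DATE = 2
    FASTFORWARD = 4
    UNBORN = 8


def choose_action(result):
    """Return the branch `_pull` would take for a given analysis bitmask.

    The checks run in the same order as in the diff: up to date first,
    then fast-forward, then a normal merge.
    """
    if result & MergeAnalysis.UP_TO_DATE:
        return "nothing"
    elif result & MergeAnalysis.FASTFORWARD:
        return "fast-forward"
    elif result & MergeAnalysis.NORMAL:
        return "merge"
    raise AssertionError("Unknown merge analysis result")
```

The order of the checks matters because the result is a set of flags rather than a single value: if the fast-forward and normal bits are both set, testing for a normal merge first would create a merge commit where moving the branch pointer is enough.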