Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Testing out what partial charges could look over the CLI #1068

Open
wants to merge 21 commits into
base: main
Choose a base branch
from

Conversation

IAlibay
Copy link
Member

@IAlibay IAlibay commented Jan 7, 2025

Checklist

  • Added a news entry

Developers certificate of origin

@jthorton
Copy link
Collaborator

Questions:

In 283d9e9 I have added the basic functionality discussed we have a new CLI command openfe charge-molecules (open to opinions on the name!) we can optionally also overwrite user charges using the --overwrite-charges flag as I thought it would awkward for a user to have to load the ligands removed the charges manually and then dump the ligands back to SDF. Would it also make sense to expose this flag to the plan RBFE and RHFE CLI commands?

In the RBFE, I also run the cofactors through the charge generation pipeline. Does this make sense? I think that's what happens later at the protocol level, and I wanted to match that.

For the RBFE and RHFE should we expose a flag to skip charge generation, there was mention of this in a meeting I think.

Copy link
Member Author

@IAlibay IAlibay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some additional quick thoughts - we need to be careful about replacing the charge generation from being inplace to being a return.

"""

# If you have non-zero charges and not overwriting, just return
if (offmol.partial_charges is not None and np.any(offmol.partial_charges)):
if not overwrite:
return
return offmol
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is probably going to silently break a bunch of things (particlularly protocols).

Would it be worth doing an inplace style flag where we optionally just overwrite the input molecule and otherwise return a molecule?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that as we still assign the charges to the input molecule and not a copy this should still be fine. I tried to make a small example of this here, which should hopefully mean we don't break anything!

from openff.toolkit import Molecule

def charge_mol(mol) -> Molecule:
    mol.assign_partial_charges("am1bcc")
    return mol

m = Molecule.from_smiles("CCO")
charge_mol(m)

print(m.partial_charges)

[-0.13610011111111114 0.12639988888888887 -0.5998001111111111 0.04236688888888887 0.04236688888888887 0.04236688888888887 0.04319988888888887 0.04319988888888887 0.3959998888888889] elementary_charge

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah sorry - I think I hallucinated that you had removed the offmol_copy reassignment step below.

So just to confirm, this now does both an inplace asssignment and returns the offmol itself?

I guess, do we want to have the option to not do the inplace assignment? (i.e. could there be cases where this would be useful?)

Copy link
Collaborator

@jthorton jthorton Jan 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah it still does in place but I need the return for the multiprocessing to work, we can add an option for not in place but maybe that should be in another PR?

) # type: ignore
# limit the number of threads used by SQM
# <https://github.com/openforcefield/openff-toolkit/issues/1831>
with temp_env({"OMP_NUM_THREADS": "2"}):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's some more pythonic tools for thread limiting like this, let me have a dig in the morning.

i.e. we'll need them going forward, so maybe we should bite the bullet and go for it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is what we use in MDAnalysis-land: https://github.com/joblib/threadpoolctl

It might be particularly helpful here because I'm not 100% sure all BLAS/LAPACK libraries respect OMP_NUM_THREADS (I'm thinking of you MKL library).

There might be better ones too!

Copy link
Member Author

@IAlibay IAlibay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly doing a pass on the "partial charge generation" side of things and the overall concept (I'll do a finer grained review once it's not a draft).

Overall looks good to me!

"""

# If you have non-zero charges and not overwriting, just return
if (offmol.partial_charges is not None and np.any(offmol.partial_charges)):
if not overwrite:
return
return offmol
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah sorry - I think I hallucinated that you had removed the offmol_copy reassignment step below.

So just to confirm, this now does both an inplace asssignment and returns the offmol itself?

I guess, do we want to have the option to not do the inplace assignment? (i.e. could there be cases where this would be useful?)

) # type: ignore
# limit the number of threads used by SQM
# <https://github.com/openforcefield/openff-toolkit/issues/1831>
with temp_env({"OMP_NUM_THREADS": "2"}):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is what we use in MDAnalysis-land: https://github.com/joblib/threadpoolctl

It might be particularly helpful here because I'm not 100% sure all BLAS/LAPACK libraries respect OMP_NUM_THREADS (I'm thinking of you MKL library).

There might be better ones too!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably another issue/PR - do we want to move this module somewhere else within the repo? It feels like it's more than just "openmm_utils".

Long term I think I want to move it to "openfe-utils" or similar, that way upstream protocols can use them without depending on OpenFE.

------
ValueError
If the ``toolkit_backend`` is not supported by the selected ``method``.
If ``generate_n_conformers`` is ``None``, but the input ``offmol``
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might actually be the case where we want to not inplace modify charges - what if things fails half way through a charge assignment loop, do we end up with have half our ligands charged and half our ligands uncharged?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If charging fails for one or more molecules the whole loop will break, so we might want to more gracefully handle failing though I am not sure what the end result would be one file with charged ligands and another with fails.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be worth testing - could be in a follow-up (for the sake of the CLI if this fails the whole thing fails, but the API is a bit different).

return offmol


def bulk_assign_partial_charges(
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great! Have you tested if it scales as intended?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes we get a good speed up in local testing, charging 16 malt1 ligands (~42 atoms) with am1bcc (ambertools) takes:

  • 4:02 2 cores
  • 2:20 4 cores
  • 1: 48 6 cores

Copy link
Contributor

@atravitz atravitz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's my partial review - I'll look at the rest on Tuesday!



WORKERS = Option(
"-w",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it more consistent with other tools to make this -n, or is there precedent elsewhere using -w?

I know I just added -n as the shorthand for --n-protocol-repeats in this PR, but I think maybe that one just shouldn't have a short version, and that -n should be used for multiprocessing instead.

Also, could you move this to a new misc.py file like I did in that PR? I'd like to keep utils.py for only helper functions.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah -n is probably more common I just didn't want to clash with anything so happy to chance to -n and -ncores here.

openfecli/parameters/plan_network_options.py Outdated Show resolved Hide resolved
openfecli/parameters/plan_network_options.py Outdated Show resolved Hide resolved
openfecli/parameters/plan_network_options.py Outdated Show resolved Hide resolved
openfecli/parameters/plan_network_options.py Outdated Show resolved Hide resolved
Copy link

codecov bot commented Jan 21, 2025

Codecov Report

Attention: Patch coverage is 98.14815% with 5 lines in your changes missing coverage. Please review.

Project coverage is 92.87%. Comparing base (915d110) to head (6ab8e9f).

Files with missing lines Patch % Lines
openfe/protocols/openmm_utils/charge_generation.py 75.00% 5 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1068      +/-   ##
==========================================
- Coverage   94.46%   92.87%   -1.60%     
==========================================
  Files         135      138       +3     
  Lines       10090    10343     +253     
==========================================
+ Hits         9532     9606      +74     
- Misses        558      737     +179     
Flag Coverage Δ
fast-tests 92.87% <98.14%> (?)
slow-tests ?

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

) # type: ignore
# limit the number of threads used by SQM
# <https://github.com/openforcefield/openff-toolkit/issues/1831>
with threadpool_limits(limits=2):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the advantage of 2 vs 1 here?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought 1 might have been harsh but it seems fine in testing, maybe slightly faster while using more cores so changed to 1.

@jthorton jthorton marked this pull request as ready for review January 21, 2025 17:08
Copy link
Contributor

@atravitz atravitz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great, I just have a few comments. And please add a news entry!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this test data being changed?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, the ligands now have charges stored in the SDF so it should skip generating the charges and keep the CI running quickly!

openfecli/commands/generate_partial_charges.py Outdated Show resolved Hide resolved
openfecli/commands/generate_partial_charges.py Outdated Show resolved Hide resolved
Comment on lines +230 to +254
Partial Charges
===============
Methods
-------
Supported partial charge method choices are:
- ``am1bcc``
- ``am1bccelf10`` (only possible if `off_toolkit_backend` in settings is set to `openeye`)
- ``nagl``
- ``espaloma`` (must have `espaloma_charge` installed)
Settings
--------
The following settings can also be set:
- ``off_toolkit_backend``: The backend to use for partial charge generation. Choose from ``ambertools`` (default), ``openeye`` or ``rdkit``.
- ``number_of_conformers``: The number of conformers to use for partial charge generation.
If unset (default), the input conformer will be used.
- ``nagl_model``: The NAGL model to use.
If unset (default), the latest available production charge model will be used.
For more information on the different options, please refer to https://docs.openfree.energy/en/stable/reference/api/openmm_protocol_settings.html#openfe.protocols.openmm_utils.omm_settings.OpenFFPartialChargeSettings.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure this is the best place for these docs to live, but that can be a follow-up PR as I do more docs review.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah this makes the help message on the CLI way to long we should just move all of this to the yaml settings docs once we get them setup.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I'm the guilty party on this one - please don't feel like you have to go with this (or really anything else I wrote), this was some late night ramblings.

Comment on lines 47 to 52
'easy_rbfe_lig_ejm_46_solvent_lig_jmc_23_solvent.json',
'easy_rbfe_lig_ejm_46_complex_lig_jmc_23_complex.json',
'easy_rbfe_lig_jmc_23_complex_lig_jmc_27_complex.json',
'easy_rbfe_lig_jmc_23_solvent_lig_jmc_27_solvent.json',
'easy_rbfe_lig_jmc_23_complex_lig_jmc_27_complex.json',
'easy_rbfe_lig_jmc_23_solvent_lig_jmc_28_solvent.json',
'easy_rbfe_lig_jmc_23_complex_lig_jmc_28_complex.json']
'easy_rbfe_lig_jmc_23_complex_lig_jmc_28_complex.json',
'easy_rbfe_lig_ejm_46_solvent_lig_jmc_23_solvent.json',
'easy_rbfe_lig_ejm_46_complex_lig_jmc_23_complex.json']
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just curious, what caused this ordering change?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch this was me playing with the edges in the network I'll undo this change.

@jthorton
Copy link
Collaborator

jthorton commented Jan 23, 2025

Narrowed down the test error to (with openeye installed):

from gufe import SmallMoleculeComponent
gdp = SmallMoleculeComponent.from_sdf_file("eg5_cofactor.sdf")
gdp.to_openff().to_file("gdp_ugly.sdf", "sdf")

openff.toolkit.utils.exceptions.InconsistentStereochemistryError: Programming error: OpenEye atom stereochemistry assumptions failed. The atom in the oemol has stereochemistry None and the atom in the offmol has stereochemistry S.

Copy link

No API break detected ✅

@jthorton jthorton changed the title [DNM] Testing out what partial charges could look over the CLI Testing out what partial charges could look over the CLI Jan 24, 2025
Copy link
Member Author

@IAlibay IAlibay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't do a "request changes" since I'm the og PR author - but "requested changed": looks good, just a couple of things and possibly needs a test with multiprocessing.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you adding charges here just because of the failed openeye test or because of other reasons too?

If it's just NAGL openeye issue, then I would ask that we actually keep the falling test and we can xfail it or mark it for skip. The reason is that this failure is actually a real issue that would impact our users - so keeping a test that reminds us it is a problem is very useful.

Alternatively we add it as a separate test/issue, that's ok too - I just don't want to lose visibility on this failure (this type of problem has bitten us enough times in the past).

------
ValueError
If the ``toolkit_backend`` is not supported by the selected ``method``.
If ``generate_n_conformers`` is ``None``, but the input ``offmol``
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be worth testing - could be in a follow-up (for the sake of the CLI if this fails the whole thing fails, but the API is a bit different).

}
charged_ligands = []

if processors > 1:
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a reminder to add a test for this.

help=YAML_OPTIONS.kwargs["help"],
)
@OUTPUT_FILE_AND_EXT.parameter(
help="The name of the SDF file the charged ligands should be written to."
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(apologies if it's somewhere and I've missed it), We probably need a check that we're not allowing users to overwrite their input SDF? 100% can see someone accidentally passing the same SDF as arguments (or some other existing one), by accident.

@print_duration
def plan_rbfe_network(
molecules: list[str], protein: str, cofactors: tuple[str],
yaml_settings: str,
output_dir: str,
n_cores: int,
overwrite_charges: bool
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know they should know better but.. I know someone's going to make a mistake here. Should we be extra and write out that they chose this option so it's going to happen?

Comment on lines +230 to +254
Partial Charges
===============
Methods
-------
Supported partial charge method choices are:
- ``am1bcc``
- ``am1bccelf10`` (only possible if `off_toolkit_backend` in settings is set to `openeye`)
- ``nagl``
- ``espaloma`` (must have `espaloma_charge` installed)
Settings
--------
The following settings can also be set:
- ``off_toolkit_backend``: The backend to use for partial charge generation. Choose from ``ambertools`` (default), ``openeye`` or ``rdkit``.
- ``number_of_conformers``: The number of conformers to use for partial charge generation.
If unset (default), the input conformer will be used.
- ``nagl_model``: The NAGL model to use.
If unset (default), the latest available production charge model will be used.
For more information on the different options, please refer to https://docs.openfree.energy/en/stable/reference/api/openmm_protocol_settings.html#openfe.protocols.openmm_utils.omm_settings.OpenFFPartialChargeSettings.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I'm the guilty party on this one - please don't feel like you have to go with this (or really anything else I wrote), this was some late night ramblings.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants