Problem with mk_export.py #295
Hello @xavgit, for example, in this DLG file you could replace the remarks:
by...
The latter you can obtain by running ligand preparation (mk_prepare_ligand.py) on the non-deuterated version of the compound. To get the SMILES for the non-deuterated compound, simply delete the patterns [2H] and ([2H]). This is an interesting observation. Thanks for reporting. We will take a look and explore ways to improve the code.
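The SMILES edit described above can be done with plain string replacement. The sketch below is a minimal illustration that assumes deuterium appears only as the literal tokens `([2H])` and `[2H]`, as in the examples in this thread; the function name `strip_deuterium` is hypothetical. The branch form is removed first so that no empty `()` is left behind.

```python
def strip_deuterium(smiles: str) -> str:
    """Return the SMILES with explicit deuterium tokens deleted (hypothetical helper).

    Assumes deuterium is written only as "([2H])" (branch) or "[2H]" (chain atom);
    other spellings, such as an atom-mapped "[2H:1]", are left untouched.
    """
    # Remove the branch form first; otherwise deleting "[2H]" would leave "()".
    without_branches = smiles.replace("([2H])", "")
    # Then remove deuterium written as a plain chain atom.
    return without_branches.replace("[2H]", "")


# The minimal reproducer from this thread becomes ethane.
print(strip_deuterium("[2H]C([2H])C"))  # CC
```

Deleting atoms textually can change the meaning of a stereo annotation on the neighboring atom (for example, a chiral center that carries a deuterium), so it is worth checking the result, e.g. by running it through mk_prepare_ligand.py as suggested above.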
Hi, Saverio
Hey @xavgit, to install the modified version, check it out from my repository:
git clone https://github.com/rwxayheee/Meeko.git
cd Meeko; git checkout update_pos_of_H_isotopes; pip install -e .; cd ..
Starting from the input DLG in your original post, my output is now:
The command is the same. The coordinates of hydrogen isotopes are now created in the same way regular Hs would be added. Hope you like it, and please don't hesitate to reach out if you have any further questions. Note that there are two related issues that might be affecting your inputs/outputs:
Hi, thanks again. Saverio
Hi,
xxxx@xxxx-X399-DESIGNARE-EX:~/sources$ git clone https://github.com/rwxayheee/Meeko.git
× python setup.py egg_info did not run successfully.
note: This error originates from a subprocess, and is likely not a problem with pip.
× Encountered error while generating package metadata.
note: This is an issue with the package mentioned above, not pip.
xxxx@xxxx-X399-DESIGNARE-EX:~/sources/Meeko$ pip3 list | grep setuptools
Any suggestions? Thanks. Saverio
Hi, I can't reproduce this :'( but it looks like an error caused by an unexpected Python or setuptools version. The PR didn't change dependencies, so by itself it shouldn't introduce this error. However, it's based on the develop branch, so it includes some other changes we haven't merged into an official release. In my environment I have:
Could you please try this?
pip install -e . --use-pep517
If it doesn't work, would it be possible to adjust the version of setuptools?
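Since the failure seems to depend on the Python and setuptools versions, it can help to report them exactly. Below is a minimal sketch using only the standard library (`importlib.metadata`, Python 3.8+); the helper name and the default package list are illustrative choices, and a conda-installed RDKit may not be visible under the distribution name `rdkit`.

```python
import platform
from importlib.metadata import PackageNotFoundError, version


def environment_report(packages=("setuptools", "pip", "rdkit", "meeko")):
    """Collect the Python version and installed versions of selected packages."""
    report = {"python": platform.python_version()}
    for name in packages:
        try:
            # version() reads the installed distribution's metadata by name.
            report[name] = version(name)
        except PackageNotFoundError:
            report[name] = "not installed"
    return report


if __name__ == "__main__":
    for name, ver in environment_report().items():
        print(f"{name}: {ver}")
```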
Hi,
xxxx@xxxx-X399-DESIGNARE-EX:~/sources/Meeko$ pip install -e . --use-pep517
WARNING: No metadata found in /home/xxxx/.local/lib/python3.10/site-packages
× python setup.py develop did not run successfully.
note: This error originates from a subprocess, and is likely not a problem with pip.
Hi, thanks.
Thanks for letting us know. I will look at the setup details a little later today. I'm not entirely sure which changes we made to the develop branch could be causing this, but in the past I also ran into problems on Ubuntu, which I think was because we still use the old setup.py-based way of packaging. --use-pep517 was the workaround that worked for me. I can try to reproduce this later in the afternoon. It might be something we will discuss with the team when we are back from the holiday. Thanks again for the info! This is very useful.
Alternatively, you can look at the diff (files changed) of that PR and apply it to Meeko in your current environment if possible:
https://github.com/forlilab/Meeko/pull/296/files
It's not complicated: I just added a code block to handle hydrogen isotopes as if they were regular hydrogens, and I made a very minor change to a function used by the basic ligand preparation. I still need to write the summary, and I will do that later today!
Hi,
xxxx@xxxx-X399-DESIGNARE-EX:~$ pip3 install --upgrade rcsbsearchapi
× python setup.py egg_info did not run successfully.
note: This error originates from a subprocess, and is likely not a problem with pip.
× Encountered error while generating package metadata.
note: This is an issue with the package mentioned above, not pip.
Hope this is useful. Saverio
This is the smallest SMILES I could find that reproduces the problem. It's notable that none of the 18 deuterated test molecules had shown this problem before.
scrub.py "[2H]C([2H])C" -o test.sdf
mk_prepare_ligand.py -i test.sdf -o test.pdbqt
mk_export.py test.pdbqt --write_sdf test_exported.sdf
grep -ci nan test_exported.sdf
Yeah, with the current release version I also didn't get any nan from the DLG: But the geometry from @xavgit, |
Edited because many things I wrote here were incorrect.
@rwxayheee, here's more information that may be helpful. To export a docked pose, the first step is to create an RDKit molecule from the SMILES, and then the positions from the PDBQT are set. Deuterium is a real atom in the SMILES (most hydrogens are not), so it becomes a special case: it exists as an atom in the RDKit molecule, but it does not have a position yet. This code block calculates coordinates for such atoms using RDKit (Meeko/meeko/rdkit_mol_create.py, lines 325 to 334 in c5e38dc).
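The special case described above can be pictured with a toy model: every atom of the SMILES-derived molecule needs a position, but the PDBQT pose only supplies positions for the atoms it contains. The sketch below is hypothetical (plain Python, not Meeko's code; `atoms_missing_coordinates` and the index-to-coordinate mapping are illustrative) and lists the atoms that would be left without a position, such as the two deuteriums of the `[2H]C([2H])C` reproducer.

```python
def atoms_missing_coordinates(atom_labels, pose_coords):
    """Return (index, label) for molecule atoms that got no position from the pose.

    atom_labels: labels of the atoms in the SMILES-derived molecule, in index order
    pose_coords: hypothetical mapping {atom index: (x, y, z)} filled from the PDBQT
    """
    return [(i, label) for i, label in enumerate(atom_labels) if i not in pose_coords]


# "[2H]C([2H])C": atom order is D, C, D, C; here only the carbons get PDBQT positions.
labels = ["[2H]", "C", "[2H]", "C"]
pose = {1: (0.0, 0.0, 0.0), 3: (1.54, 0.0, 0.0)}
print(atoms_missing_coordinates(labels, pose))  # [(0, '[2H]'), (2, '[2H]')]
```

The atoms returned by such a check are the ones whose coordinates have to be computed after the pose is applied, which is the step whose geometry is questioned in the next comments.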
Yes, I see that we're assigning the coordinates with this, but the geometry is not good (and can cause nan in extreme cases). You can compare the two SDF files exported with and without the PR:
With:
Without:
It's a lot better to use AddHs and RemoveHs, but that requires more work. I think it's worth implementing a correction, but maybe not the way I did it.
I agree. Possibly the first call of
I actually can't reproduce the nan. Has something changed in RDKit? (Maybe @xavgit can tell us which version he used.) Still, because SetTerminalAtomCoords is geometry-aware, it isn't really capable of handling the docked poses we have now. On the other hand, AddHs and RemoveHs are not so convenient when the chirality tag has to be preserved. I will think more about this.
It could be the RDKit version. I could reproduce the NaN with 2023.09.6, but with 2024.09.3 I get the following error:
|
Got it. I have 2024.03.4, so I was able to write the SDF. The PR applies a correction after using it. But if it throws a runtime error, could we replace I don't have a very specific plan for how to do this (within
Hi all, thanks. Saverio
Thanks for the info! There's been a recent change to the RDKit function we're discussing here. With this version in my current environment:
+ rdkit 2024.09.3 py310h7378585_0 conda-forge 18MB
and the unmodified develop branch of Meeko, the output SDF is:
I didn't see the "Cannot normalize a zero length vector" error. It might have been fixed in a more recent build.
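Both symptoms discussed here, NaN coordinates with one RDKit version and a "Cannot normalize a zero length vector" error with another, can be illustrated with a simplified, hypothetical model of terminal-atom placement. This is not RDKit's implementation: the sketch places the new atom at an assumed bond length of 1.09 Å along the direction pointing away from the centroid of the center atom's other neighbors. If those neighbor positions are degenerate (for example, two neighbors placed symmetrically, or an atom that still sits at the same point as the center because its position was never set), the direction has zero length; an unguarded division then yields NaN, while a guarded one raises an error.

```python
import math


def normalize(v, eps=1e-8):
    """Scale v to unit length, refusing (near-)zero vectors instead of returning NaN."""
    length = math.sqrt(sum(c * c for c in v))
    if length < eps:
        # Unguarded C++ code would compute 0.0 / 0.0 here, which is NaN under IEEE 754.
        raise ValueError("Cannot normalize a zero length vector")
    return tuple(c / length for c in v)


def place_terminal_atom(center, neighbors, bond_length=1.09):
    """Toy placement (not RDKit's algorithm): put a new atom bond_length away from
    center, pointing away from the centroid of center's existing neighbors."""
    centroid = [sum(n[k] for n in neighbors) / len(neighbors) for k in range(3)]
    direction = normalize([center[k] - centroid[k] for k in range(3)])
    return tuple(center[k] + bond_length * direction[k] for k in range(3))


# Well-defined case: one neighbor on -x, so the new atom goes on +x.
print(place_terminal_atom((0.0, 0.0, 0.0), [(-1.54, 0.0, 0.0)]))  # (1.09, 0.0, 0.0)
```

Calling `place_terminal_atom((0.0, 0.0, 0.0), [(1.54, 0.0, 0.0), (-1.54, 0.0, 0.0)])` hits the degenerate case: the centroid coincides with the center, so `normalize` raises instead of silently producing NaN coordinates.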
Hi, thanks. Saverio
Hi,
Hi @xavgit, can you try the latest version of my PR:
Hi, thanks. Saverio
Hi, on Ubuntu 22.04:
$ pip3 install --upgrade packaging
WARNING: No metadata found in /home/xxxx/.local/lib/python3.10/site-packages
× python setup.py develop did not run successfully.
note: This error originates from a subprocess, and is likely not a problem with pip.
Thanks. Saverio
Hi, now I can try to use mk_export in Ubuntu 20.04 to
Thanks again. Saverio
Thanks for the report! Glad to know #303 helps. |
Hi,
$ git clone https://github.com/rwxayheee/Meeko.git
Also, thanks. Saverio
Hi,
$ git clone https://github.com/rwxayheee/Meeko.git
How can I modify the previous commands to install the right version of mk_export.py and avoid the nan problem? Thanks. Saverio
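Hi,
error: pathspec 'update_pos_of_H_isotopes' did not match any file(s) known to git
This means that branch is no longer available in my fork. Instead, you can git clone this (our official) repository and use the develop branch. The syntax is similar:
git clone https://github.com/forlilab/Meeko.git
cd Meeko; git checkout develop; pip install -e .; cd ..
Let us know if you find any further issues!
Hi,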
Hi,
I'm experiencing a problem using mk_export.py with the command:
mk_export.py DB15122_docking_res_ad4_sf.dlg -s DB15122_docking_res_ad4_sf.sdf --all_dlg_poses
In the produced .sdf file, the first lines are:
DB15122_docking_res_ad4_sf
RDKit 3D
40 41 0 0 0 0 0 0 0 0999 V2000
-nan -nan -nan H 0 0 0 0 0 0 0 0 0 0 0 0
50.4710 64.8700 80.2250 C 0 0 0 0 0 0 0 0 0 0 0 0
-nan -nan -nan H 0 0 0 0 0 0 0 0 0 0 0 0
-nan -nan -nan H 0 0 0 0 0 0 0 0 0 0 0 0
51.6230 65.3190 79.3370 C 0 0 2 0 0 0 0 0 0 0 0 0
52.0950 64.1880 78.6110 O 0 0 0 0 0 0 0 0 0 0 0 0
52.7740 65.9730 80.1110 C 0 0 0 0 0 0 0 0 0 0 0 0
53.3564 66.6855 80.7136 H 0 0 0 0 0 0 0 0 0 0 0 0
The presence of nan could be causing the following error with the prolif package:
Processing DB15122_docking_res_ad4_sf
0%| | 0/32 [00:00<?, ?it/s][20:03:38] ERROR: Cannot process coordinates on line 5
[20:03:38] ERROR: moving to the beginning of the next molecule
0%| | 0/32 [00:00<?, ?it/s]
....................................................................................................................
What can I do?
Thanks.
Saverio
DB15122_docking_res_ad4_sf.dlg.txt
DB15122_docking_res_ad4_sf.sdf.txt
P.S.:
I have used the previous mk_export.py command to convert 9715 molecules.
In this set of drugs, six have this type of problem, as the following shows:
$ grep nan DB*.sdf | uniq
DB12161_docking_res_ad4_sf.sdf: -nan -nan -nan H 0 0 0 0 0 0 0 0 0 0 0 0
DB12628_docking_res_ad4_sf.sdf: -nan -nan -nan H 0 0 0 0 0 0 0 0 0 0 0 0
DB15122_docking_res_ad4_sf.sdf: -nan -nan -nan H 0 0 0 0 0 0 0 0 0 0 0 0
DB15141_docking_res_ad4_sf.sdf: -nan -nan -nan H 0 0 0 0 0 0 0 0 0 0 0 0
DB15414_docking_res_ad4_sf.sdf: -nan -nan -nan H 0 0 0 0 0 0 0 0 0 0 0 0
DB16650_docking_res_ad4_sf.sdf: -nan -nan -nan H 0 0 0 0 0 0 0 0 0 0 0 0
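A `grep nan` over many files works, but it also matches any molecule name or data field that happens to contain "nan". Below is a minimal Python sketch that instead parses the V2000 atom block (fixed-width x, y, z fields of 10 characters each, element symbol starting at column 32) and reports every atom with a non-finite coordinate. It assumes V2000 records separated by `$$$$` lines, and the function name is hypothetical.

```python
import math


def find_nonfinite_atoms(sdf_text):
    """Return (molecule name, 1-based atom index, element) for atoms whose x, y or z
    is NaN or infinite. Assumes V2000 records separated by "$$$$" lines."""
    hits = []
    lines = sdf_text.splitlines()
    start = 0
    while start + 3 < len(lines):
        name = lines[start].strip()
        # Counts line: the first 3 characters hold the number of atoms.
        n_atoms = int(lines[start + 3][0:3])
        for i in range(n_atoms):
            line = lines[start + 4 + i]
            # Fixed-width coordinate fields: columns 1-10, 11-20 and 21-30.
            coords = [float(line[c:c + 10]) for c in (0, 10, 20)]
            if not all(math.isfinite(v) for v in coords):
                hits.append((name, i + 1, line[31:34].strip()))
        # Skip the bond block, properties and data items up to the delimiter.
        end = start + 4 + n_atoms
        while end < len(lines) and lines[end].strip() != "$$$$":
            end += 1
        start = end + 1
    return hits


if __name__ == "__main__":
    import glob

    for path in sorted(glob.glob("DB*.sdf")):
        with open(path) as fh:
            for name, index, element in find_nonfinite_atoms(fh.read()):
                print(f"{path}: {name} atom {index} ({element})")
```

For the DB15122 record above, the first atom line is line 5 of the record, the same line the prolif error complains about, so the report would point at atom 1 (an H).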
If it would be useful, I can provide the corresponding .dlg files.