Formatting of prediction tsv #35

Open
E0287979 opened this issue Feb 27, 2024 · 1 comment
Labels
question Further information is requested

Comments

@E0287979

Is there a specific format requirement for the prediction TSV?
I can run the predict function when I use the TSV included in the repository.

I get an error when I try to predict on a file I generated using surfaceome cayman as the backbone.

Traceback (most recent call last):
  File "/anaconda3/envs/conplex-dti/bin/conplex-dti", line 6, in <module>
    sys.exit(main())
  File "/ConPLex/conplex_dti/main.py", line 41, in main
    args.main_func(args)
  File "/ConPLex/conplex_dti/cli/predict.py", line 104, in main
    drug_featurizer.preload(query_df["moleculeSmiles"].unique())
  File "/ConPLex/conplex_dti/featurizer/base.py", line 162, in preload
    if seq in h5fi:
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "/anaconda3/envs/conplex-dti/lib/python3.9/site-packages/h5py/_hl/group.py", line 514, in __contains__
    return h5g._path_valid(self.id, self._e(name), self._lapl)
  File "/anaconda3/envs/conplex-dti/lib/python3.9/site-packages/h5py/_hl/base.py", line 206, in _e
    raise TypeError(f"A name should be string or bytes, not {type(name)}")
TypeError: A name should be string or bytes, not <class 'float'>

I think features returned NaN

@E0287979 E0287979 added the question Further information is requested label Feb 27, 2024
@samsledje
Owner

Sorry for the late response on this. ConPLex expects files formatted as in https://github.com/samsledje/ConPLex/blob/main/tests/toy_predict.tsv, that is, a tab-separated file with columns for the protein/molecule identifiers, followed by their descriptions as sequences/SMILES strings. One common error with hand-written tab-separated files is that the "tab" is actually four spaces, which isn't parsed properly. Make sure you're using a real tab character (\t) when creating this file.
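
For anyone hitting the same TypeError, here is a minimal sketch of a pre-flight check for the input file. It is not part of ConPLex. The function name `find_bad_rows` and the four-column layout (protein ID, molecule ID, protein sequence, molecule SMILES) are assumptions for illustration; only the `moleculeSmiles` column name appears in the traceback above. The idea is that a row written with spaces instead of tabs, or a row with a missing SMILES cell, plausibly ends up as NaN (a float) once the file is loaded, which would match the error h5py raises.

```python
import csv
import io

# Assumption: four tab-separated columns per row (protein ID, molecule ID,
# protein sequence, molecule SMILES). Adjust if your file is laid out differently.
EXPECTED_COLUMNS = 4


def find_bad_rows(handle, expected_columns=EXPECTED_COLUMNS):
    """Return (line_number, reason) pairs for rows that are not clean TSV."""
    problems = []
    # QUOTE_NONE keeps characters that occur in SMILES strings from being
    # interpreted as CSV quoting.
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for line_number, fields in enumerate(reader, start=1):
        if not fields:
            problems.append((line_number, "blank line"))
        elif len(fields) != expected_columns:
            problems.append(
                (
                    line_number,
                    f"expected {expected_columns} tab-separated fields, "
                    f"found {len(fields)}",
                )
            )
        elif any(not field.strip() for field in fields):
            problems.append((line_number, "empty field"))
    return problems


# Made-up sample data: row 2 uses spaces instead of tabs, row 3 has no SMILES.
sample = (
    "P1\tD1\tMKTAYIAK\tCCO\n"
    "P2    D2    MKTAYIAK    CCN\n"
    "P3\tD3\tMKTAYIAK\t\n"
)

problems = find_bad_rows(io.StringIO(sample))
for line_number, reason in problems:
    print(f"line {line_number}: {reason}")
```

To check a real file, pass an open file handle instead of the `io.StringIO` sample, for example one obtained from `open(path, newline="")`. Any line it reports is worth fixing before running the predict command.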
