Dataset Help #54
Comments
+1 on seeing the dataset and better instructions. I received error messages with everything I tried.
Dear @KeyLKey, thanks for the comment. I will add more information on how to create the data! Unfortunately, due to the license of the UMLS datasets, we may not be able to share them; however, we can provide the details of how to create your own.
Dear @igorcouto, thanks for the comment. Can you share the error message with me, so that I can check what the issue could be and fix it?
Dear author, could you tell me which data file to download? Is it the one named UMLS Metathesaurus Full Subset, or the UMLS Semantic Network files? The former decompresses to 27.1 GB. I am very eager to build a dataset like yours. Thank you very much!
Hi @KeyLKey, for UMLS you need to download the

You will build datasets for MEDCIN, NCI, and SNOMEDCT_US. More details later. For Task A, you need to run:

```python
config = BaseConfig(version=3).get_args(kb_name="umls")
umls_builder = dataset_builder(config=config)
dataset_json, dataset_stats = umls_builder.build()

for kb in list(dataset_json.keys()):
    DataWriter.write_json(data=dataset_json[kb],
                          path=BaseConfig(version=3).get_args(kb_name=kb.lower()).entity_path)
    DataWriter.write_json(data=dataset_stats[kb],
                          path=BaseConfig(version=3).get_args(kb_name=kb.lower()).dataset_stats)
```

You need to look at the

For Task B, you need to run the following scripts (please also consider checking those scripts so they are used only for UMLS).

And for Task C, please only run the following script:

I hope this helps, and good luck!
Dear author, I'm trying to build nci_entities.json following your method, but found that UMLS_entity_types_with_levels.tsv is missing. May I ask what went wrong? Thank you very much!
Could you help check whether I missed something when trying your package for WordNet?

```
$ python build_entity_datasets.py --kb_name wn18rr
Traceback (most recent call last):
  File "/home/vagrant/rbox/LLMs4OL/TaskA/build_entity_datasets.py", line 11, in <module>
  File "/home/vagrant/rbox/LLMs4OL/TaskA/src/entity_dataset_builder.py", line 16, in build
  File "/home/vagrant/rbox/LLMs4OL/TaskA/src/entity_dataset_builder.py", line 42, in load_artifcats
  File "/home/vagrant/rbox/LLMs4OL/TaskA/datahandler/datareader.py", line 51, in load_df
  File "/home/vagrant/rbox/LLMs4OL/llm4ol_py39_env/lib/python3.9/site-packages/pandas/io/parsers/readers.py", line 1026, in read_csv
  File "/home/vagrant/rbox/LLMs4OL/llm4ol_py39_env/lib/python3.9/site-packages/pandas/io/parsers/readers.py", line 620, in _read
  File "/home/vagrant/rbox/LLMs4OL/llm4ol_py39_env/lib/python3.9/site-packages/pandas/io/parsers/readers.py", line 1620, in __init__
  File "/home/vagrant/rbox/LLMs4OL/llm4ol_py39_env/lib/python3.9/site-packages/pandas/io/parsers/readers.py", line 1880, in _make_engine
  File "/home/vagrant/rbox/LLMs4OL/llm4ol_py39_env/lib/python3.9/site-packages/pandas/io/common.py", line 873, in get_handle
FileNotFoundError: [Errno 2] No such file or directory: '../datasets/TaskA/WN18RR/processed-3/entity_train.csv'
```
It's a great honor to see your work, but I'm now facing difficulties. Could you provide the nci_entities.json data file? Thank you very much!