
New Release #288

Merged
merged 200 commits on Dec 3, 2024
Commits
0bbae4d
added coverage report generation on git workflow
alkidbaci Aug 2, 2024
f5e3f7e
added coverage configs
alkidbaci Aug 2, 2024
98a3773
added coverage report
alkidbaci Aug 2, 2024
a7e09af
added static badges
alkidbaci Aug 2, 2024
1d1be16
updated link for coverage badge
alkidbaci Aug 2, 2024
ccb7b26
Merge pull request #253 from dice-group/static_bandages
Demirrr Aug 2, 2024
c53144c
added logo and favicon
alkidbaci Aug 9, 2024
0849d2c
Merge pull request #254 from dice-group/logo
Demirrr Aug 9, 2024
c1d2ef1
Refactoring In Progress: KGE models are updated. Unused Domain and Ran…
Demirrr Aug 14, 2024
7f6d194
DualE added
Demirrr Aug 14, 2024
f8a1aa9
Merge pull request #255 from dice-group/refactoring
Demirrr Aug 15, 2024
0d571d6
.nt file readded for read_with_pandas
Demirrr Aug 20, 2024
5ec6b1b
Tqdm integrated
Demirrr Aug 20, 2024
e2ed3b3
Merge pull request #257 from dice-group/refactoring
Demirrr Aug 21, 2024
2b44c7d
progress bars are aligned between trainers
Demirrr Aug 22, 2024
5db0643
Potential div by zero avoided
Demirrr Aug 22, 2024
4a6fa5e
Merge pull request #258 from dice-group/refactoring
Demirrr Aug 23, 2024
f8e0c56
torchrun and dicee entry points work together. DDB with two gpus only…
Demirrr Sep 20, 2024
263b2cb
compile added in ddp
Demirrr Sep 21, 2024
367fbca
default params changed to minimize the default training runtimes
Demirrr Sep 23, 2024
88a8c65
refactoring and removing unused imports and codes
Demirrr Sep 23, 2024
60f62b3
default lr and optim changed to Adam 0.1
Demirrr Sep 25, 2024
61a9a01
version incremented
Demirrr Sep 25, 2024
10c4343
Refactoring In Progress: a KGE trained in C++ code base can be used i…
Demirrr Oct 4, 2024
2a8bdc5
Refactoring In Progress: Default optim Adam in base model
Demirrr Oct 4, 2024
25794d2
Refactoring In Progress: Assertion added for entity and relation mapp…
Demirrr Oct 4, 2024
f069b0e
Refactoring In Progress: KeyError exception added
Demirrr Oct 4, 2024
2cced25
__iter__, __len__ and inverse index added
Demirrr Oct 4, 2024
1ae25ca
predict_top_k() can be used with a string or a list of strings
Demirrr Oct 4, 2024
4c63437
unused import removed
Demirrr Oct 4, 2024
b240cde
Refactoring DDP for better logging
Demirrr Oct 4, 2024
1083598
Refactoring In Progress: unused imports and docstrings
Demirrr Oct 4, 2024
943a8f8
DualE with the abbreviations of noqa
Demirrr Oct 4, 2024
a43b3fb
WIP:Refactoring
Demirrr Oct 4, 2024
f3ec5d4
Docstring added to explain the script
Demirrr Oct 7, 2024
82197dc
lazy import error catch message
Demirrr Oct 7, 2024
c3f1661
Reading/Indexing/Storing
Demirrr Oct 7, 2024
fd74806
indexing example added
Demirrr Oct 7, 2024
772466c
KvsSample Dataset rewritten for the multi-class classification problem
Demirrr Oct 9, 2024
e743425
KvsSample refactored
Demirrr Oct 10, 2024
dff2629
BaseKGELightning.training_setup() implements kvssample training step
Demirrr Oct 10, 2024
95c0ac4
kvssample for clifford implemented
Demirrr Oct 10, 2024
5fe6c7b
distmult kvssample with einsum
Demirrr Oct 10, 2024
e924134
ComplEx.forward_k_vs_sample() implemented
Demirrr Oct 10, 2024
649b6f1
Keci().forward_k_vs_sample() validated with einsum p=0 q>=0
Demirrr Oct 11, 2024
bb620d0
multinomial() is being used to assign probabilities for entities. Thi…
Demirrr Oct 11, 2024
682eb4b
1vsSample name is being used
Demirrr Oct 11, 2024
5f2d9a6
1vsSample name is being used
Demirrr Oct 11, 2024
bf1bbef
1vsSample added
Demirrr Oct 11, 2024
fcdc52a
renamed from kvssample to onevssample
Demirrr Oct 12, 2024
00bbd72
neg sample stated explicitly
Demirrr Oct 12, 2024
a0bd139
attributes explicitly fixed
Demirrr Oct 12, 2024
256f1da
unused imports are removed
Demirrr Oct 12, 2024
28aeedf
1vsSample working and pickle usage will be deprecated
Demirrr Oct 12, 2024
274a992
new version of polars included
Demirrr Oct 14, 2024
45fcb0c
polars_dataframe_indexer() implemented
Demirrr Oct 14, 2024
d276783
preprocess_with_polars() refactored
Demirrr Oct 14, 2024
b6e43b6
if polars is a backend, indices are stored in csv
Demirrr Oct 14, 2024
de6ab29
Reindexing ignored if polars being used
Demirrr Oct 14, 2024
adf77d4
coefficients function only takes p and q dimensions
Demirrr Oct 14, 2024
9c4ccd4
CMult removed
Demirrr Oct 14, 2024
5aa51a8
Large Scale learning KGE on CPU
Demirrr Oct 15, 2024
09f1953
Refactored
Demirrr Oct 15, 2024
0e0ee7d
refactored
Demirrr Oct 15, 2024
c29db15
KvsSample reincluded
Demirrr Oct 15, 2024
47db2f9
KvsSample reincluded
Demirrr Oct 15, 2024
7884d4e
CMULT removed
Demirrr Oct 15, 2024
f1b809c
CMULT removed
Demirrr Oct 15, 2024
dbe5af8
WIP: Integrating kvssample
Demirrr Oct 18, 2024
b388291
Attempt to solve github coverage error
Demirrr Oct 18, 2024
7b4a6ba
Update github-actions-python-package.yml
Demirrr Oct 18, 2024
28836b5
Update .coveragerc
Demirrr Oct 18, 2024
94ed6b2
Merge pull request #261 from dice-group/larger_than_memory
Demirrr Oct 18, 2024
9867b43
load_term_mapping() is implemented to load indices
Demirrr Oct 21, 2024
c3b95c7
scoring_techniques extended in eval
Demirrr Oct 21, 2024
b8b3e07
read_with_polars() refactored
Demirrr Oct 21, 2024
2ef2dac
raw sets are set to None to decrease the memory
Demirrr Oct 21, 2024
56e7656
Few comments added
Demirrr Oct 21, 2024
a2a2c83
Assertion scoring_technique checking
Demirrr Oct 21, 2024
e8a329c
reading indices as csv files
Demirrr Oct 21, 2024
d331323
Dynamic KvsSample working!
Demirrr Oct 21, 2024
33cdad5
kvssample is default scoring
Demirrr Oct 21, 2024
a1d87bf
todo added
Demirrr Oct 21, 2024
392daeb
Unused imports are removed
Demirrr Oct 21, 2024
4f5f069
ruff check made more restrictive
Demirrr Oct 21, 2024
4a35799
deprecated class method removed
Demirrr Oct 21, 2024
e95180f
raw datasets are not emptied if bpe is being used
Demirrr Oct 22, 2024
b31151a
typo fixed
Demirrr Oct 22, 2024
fe970d7
Merge pull request #262 from dice-group/larger_than_memory
Demirrr Oct 22, 2024
55bd4fc
separator for polars changed from \t to whitespace
Demirrr Oct 22, 2024
0034cec
loggings are aligned across gpus
Demirrr Oct 22, 2024
6e070e4
from numpy to torch conversion moved into __getitem__ in negsample to…
Demirrr Oct 22, 2024
e6ea7bb
Onevsall with memory map
Demirrr Oct 22, 2024
2eb5db4
WIP: replacing numpy arrays with memory maps
Demirrr Oct 22, 2024
37f06f6
WIP: MultiGPU training with memory map
Demirrr Oct 23, 2024
2ffb4c9
WIP: MultiGPUs training
Demirrr Oct 23, 2024
6d84e19
WIP: multi-gpu memory map
Demirrr Oct 23, 2024
f992c34
fixes for separator
Demirrr Oct 23, 2024
8e5b840
formatting fixes
Demirrr Oct 23, 2024
4dc027f
is_continual_training flag is removed in the start function of Execut…
Demirrr Oct 24, 2024
5277b3f
Update Readme & Exception handling
Demirrr Oct 24, 2024
29cf2bd
WIP: Unused attributes
Demirrr Oct 24, 2024
5f8c404
GradScaler included
Demirrr Oct 24, 2024
3764aca
copy memmap array numpy to pytorch tensor
Demirrr Oct 25, 2024
3b1d838
printing selected opt
Demirrr Oct 25, 2024
f1fbb29
WIP: CL with memory map
Demirrr Oct 25, 2024
5612303
WIP: csvs are used instead of pickling dictionaries
Demirrr Oct 25, 2024
64b0487
format errors are fixed
Demirrr Oct 25, 2024
5e82778
Merge pull request #263 from dice-group/refactoring_memorymap
Demirrr Oct 25, 2024
7b83092
WIP: Model Parallelism
Demirrr Oct 25, 2024
14676c7
WIP: MP negsample
Demirrr Oct 25, 2024
758c858
WIP: Model Parallelism Refactoring
Demirrr Oct 27, 2024
ff7eb1c
PL and ML return the same results on a single GPU
Demirrr Oct 27, 2024
9834ad1
WIP: Refactoring ML
Demirrr Oct 27, 2024
628aa4b
Info about batches added
Demirrr Oct 28, 2024
9fcff4d
Fixed the lint errors
Demirrr Oct 28, 2024
e3dff25
Fixes None trainer error
Demirrr Oct 28, 2024
cf44939
typo fixed at torchDDP
Demirrr Oct 28, 2024
910a2ae
typo fixed at torchDDP
Demirrr Oct 28, 2024
e6f0d78
Merge pull request #265 from dice-group/refactoring_memorymap
Demirrr Oct 28, 2024
4e58fb1
No need to inform user about mappings
Demirrr Oct 29, 2024
71ffcc6
ASWA callback global and local ranks must be 0
Demirrr Oct 31, 2024
e331b0c
Merge branch 'refactoring_memorymap' of https://github.com/dice-group…
Demirrr Oct 31, 2024
9f25d7d
Benchmark results added
Demirrr Oct 31, 2024
f4a86a4
fix for ASWA callback for all trainers
Demirrr Oct 31, 2024
157b2fa
Merge pull request #266 from dice-group/refactoring_memorymap
Demirrr Oct 31, 2024
f32d448
Update README.md
Demirrr Oct 31, 2024
2684c46
Update README.md
Demirrr Oct 31, 2024
6389615
UMLS with AllvsAll
Demirrr Oct 31, 2024
19804c3
Update README.md
Demirrr Nov 6, 2024
2367bee
Update README.md
Demirrr Nov 12, 2024
03854cc
compressed kg can be read by polars.read_csv() directly
Demirrr Nov 12, 2024
7728c70
compressed kg with read few only available in polars
Demirrr Nov 12, 2024
437ffd0
Merge pull request #269 from dice-group/compressed_kg
Demirrr Nov 13, 2024
fb4f72d
write_csv_from_model_parallel implemented
Demirrr Nov 14, 2024
a0ac733
Model Parallel Regression Test and write_csv_from_model_parallel
Demirrr Nov 14, 2024
010c576
if no gpu, no errors fix
Demirrr Nov 14, 2024
794abad
k increased to 10 to avoid assertion error. Randomness in answer_multi_hop…
Demirrr Nov 15, 2024
51ea8a9
Merge pull request #271 from dice-group/model_parallel_to_csv
Demirrr Nov 15, 2024
9aa9c28
Saving embeddings into csv implemented and tested
Demirrr Nov 15, 2024
9cc8348
write_csv_from_model_parallel() and from_pretrained_model_write_embed…
Demirrr Nov 15, 2024
f60485a
Merge pull request #272 from dice-group/extracting_embedding_in_csv
Demirrr Nov 16, 2024
d853720
WIP: Tensor Parallelism
Demirrr Nov 16, 2024
cdc615d
WIP: Forward shapes fixing
Demirrr Nov 16, 2024
e701140
Working version of model/pipeline parallelism
Demirrr Nov 16, 2024
c190112
Tensor Parallelism implemented. Yet, torch seemed to have a bug http…
Demirrr Nov 18, 2024
1dae312
WIP: Tensor Parallelism implemented
Demirrr Nov 19, 2024
7654051
WIP: Training a KGE model with Tensor Parallelism
Demirrr Nov 19, 2024
f16bc54
MP changed to TP
Demirrr Nov 19, 2024
a78a548
TensorParallel Trainer returns the model
Demirrr Nov 20, 2024
69ff7da
Tensorparallel with EnsembleKGE written into disk with _partial_x.pt …
Demirrr Nov 20, 2024
598abf0
Initial working version of ensemble kge. Name must be changed
Demirrr Nov 20, 2024
b535fed
todo and comments added
Demirrr Nov 20, 2024
29068db
forward_triples() moved into callable()
Demirrr Nov 22, 2024
85af150
Auto batch finding as default for TP
Demirrr Nov 24, 2024
b0fcbd5
drop_last false and static forward_backward_update_loss used
Demirrr Nov 24, 2024
c1a14e7
Adopt included, pytorch version increased
Demirrr Nov 25, 2024
6e0a51b
Fixing lint errors
Demirrr Nov 25, 2024
d2736c3
Log info changed
Demirrr Nov 25, 2024
244d197
Update README.md
Demirrr Nov 25, 2024
abc5226
Merge pull request #274 from dice-group/tensor_parallel
Demirrr Nov 25, 2024
4e518bf
increment factor is the first batch size
Demirrr Nov 25, 2024
4b1a876
avg of last three batches gpu usage measured
Demirrr Nov 25, 2024
a6e15b7
dynamo import removed
Demirrr Nov 25, 2024
4e89b1d
exponential batch size increment is reduced to linear
Demirrr Nov 26, 2024
0417f19
embeddings can be concatenated horizontally for csv
Demirrr Nov 26, 2024
f38bfa8
Merge pull request #275 from dice-group/tensor_parallel
Demirrr Nov 26, 2024
0871973
Auto batch finding as an argument
Demirrr Nov 26, 2024
9ccee78
Tensor Parallel is working
Demirrr Nov 26, 2024
f53c1e7
Merge pull request #276 from dice-group/tensor_parallel
Demirrr Nov 26, 2024
d3081a1
compile is removed and avg is reduced to single in gpu usage signal
Demirrr Nov 28, 2024
5551241
Improved batch finding in TP
Demirrr Nov 28, 2024
f325b8a
Fix deprecated variable.
sshivam95 Nov 28, 2024
9b17326
Add batched evaluation
sshivam95 Nov 28, 2024
b5de7b5
Merge pull request #279 from dice-group/query-generator-test
Demirrr Nov 28, 2024
94ab305
Update README.md
Demirrr Nov 28, 2024
fbf30ab
WIP: Reducing the runtime of finding a good search & removing redanda…
Demirrr Nov 28, 2024
44b9dbd
fstring usage without any placeholder fixed
Demirrr Nov 28, 2024
58aa98c
Merge pull request #280 from dice-group/tensor_parallel
Demirrr Nov 28, 2024
a5f1648
Comment old code
sshivam95 Nov 29, 2024
f1b263b
TP with auto batch finding can be used to train KGE with >20B
Demirrr Nov 29, 2024
e0ee128
Merge branch 'develop' into tensor_parallel
Demirrr Nov 29, 2024
fb436e6
Merge pull request #277 from dice-group/batched-evaluation
sshivam95 Nov 29, 2024
7580606
Merge pull request #282 from dice-group/tensor_parallel
Demirrr Nov 29, 2024
e8dddb6
Refactoring before new release
Demirrr Nov 29, 2024
6638da6
Fix memory allocation issue
sshivam95 Nov 30, 2024
787b535
Merge pull request #285 from dice-group/batched-evaluation-memory-fix
Demirrr Nov 30, 2024
1dc3000
F841 lint error fixed
Demirrr Nov 30, 2024
4326478
Merge branch 'develop' into refactor
Demirrr Nov 30, 2024
3332aec
Merge pull request #283 from dice-group/refactor
Demirrr Dec 1, 2024
b1b1faf
Regression test added for TP
Demirrr Dec 2, 2024
90e9285
EnsembleKGE moved to init
Demirrr Dec 2, 2024
128bd18
TP can be continually trained
Demirrr Dec 2, 2024
ceffb79
dept func from executor removed and time postfix removed from continu…
Demirrr Dec 2, 2024
01e696b
unused import removed
Demirrr Dec 2, 2024
a766711
Merge pull request #286 from dice-group/refactor
Demirrr Dec 3, 2024
cddfc17
try catches are removed and example added
Demirrr Dec 3, 2024
fc9b352
if optim doesn't exist, it should return None
Demirrr Dec 3, 2024
6f8e4b5
Simple example of training a KGE with pytorch setup
Demirrr Dec 3, 2024
e4a0bd1
Merge pull request #287 from dice-group/literal_example
Demirrr Dec 3, 2024
260 changes: 205 additions & 55 deletions README.md
@@ -65,16 +65,80 @@ python -m pytest -p no:warnings --ff # to run the failures first and then the re
## Knowledge Graph Embedding Models
<details> <summary> To see available Models</summary>

1. Decal, Keci, DualE, ComplEx, QMult, OMult, ConvQ, ConvO, ConEx, TransE, DistMult, and Shallom
2. All embedding models available in https://github.com/pykeen/pykeen#models
* ```--model Decal | Keci | DualE | ComplEx | QMult | OMult | ConvQ | ConvO | ConEx | TransE | DistMult | Shallom```
* ```--model Pykeen_QuatE | Pykeen_Mure ``` all embedding models available in https://github.com/pykeen/pykeen#models can be selected.

Training and scoring techniques
* ```--trainer torchCPUTrainer | PL | MP | torchDDP ```
* ```--scoring_technique 1vsAll | KvsAll | AllvsAll | KvsSample | NegSample ```

> For more, please refer to `examples`.
</details>

## How to Train
<details> <summary> To see a code snippet </summary>

Below, we train a KGE model and evaluate it on the train, validation, and test sets of the UMLS benchmark dataset.
#### Training Techniques

A KGE model can be trained with a state-of-the-art training technique ```--trainer "torchCPUTrainer" | "PL" | "MP" | torchDDP ```
```bash
# CPU training
dicee --dataset_dir "KGs/UMLS" --trainer "torchCPUTrainer" --scoring_technique KvsAll --model "Keci" --eval_model "train_val_test"
# Distributed Data Parallelism
dicee --dataset_dir "KGs/UMLS" --trainer "PL" --scoring_technique KvsAll --model "Keci" --eval_model "train_val_test"
# Model Parallelism
dicee --dataset_dir "KGs/UMLS" --trainer "MP" --scoring_technique KvsAll --model "Keci" --eval_model "train_val_test"
# Distributed Data Parallelism in native torch
OMP_NUM_THREADS=1 torchrun --standalone --nnodes=1 --nproc_per_node=gpu dicee --dataset_dir "KGs/UMLS" --model Keci --eval_model "train_val_test" --trainer "torchDDP" --scoring_technique KvsAll
```
A KGE model can also be trained in a multi-node, multi-GPU DDP setting.
```bash
torchrun --nnodes 2 --nproc_per_node=gpu --node_rank 0 --rdzv_id 455 --rdzv_backend c10d --rdzv_endpoint=nebula dicee --trainer "torchDDP" --dataset_dir "KGs/YAGO3-10"
torchrun --nnodes 2 --nproc_per_node=gpu --node_rank 1 --rdzv_id 455 --rdzv_backend c10d --rdzv_endpoint=nebula dicee --trainer "torchDDP" --dataset_dir "KGs/YAGO3-10"
```
On large knowledge graphs, this configuration should be used.

The data is expected to be in the following form:
```bash
$ head -3 KGs/UMLS/train.txt
acquired_abnormality location_of experimental_model_of_disease
anatomical_abnormality manifestation_of physiologic_function
alga isa entity

$ head -3 KGs/YAGO3-10/valid.txt
Mikheil_Khutsishvili playsFor FC_Merani_Tbilisi
Ebbw_Vale isLocatedIn Blaenau_Gwent
Valenciennes isLocatedIn Nord-Pas-de-Calais
```
By default, ```--backend "pandas" --separator "\s+" ``` is used in ```pandas.read_csv(sep=args.separator)``` to obtain triples.
You can choose a suitable backend for your knowledge graph ```--backend pandas | polars | rdflib ```.
On large knowledge graphs in n-triples format, ```--backend "polars" --separator " " ``` is a good option.
**Apart from n-triples or standard link prediction dataset formats, we support ["owl", "nt", "turtle", "rdf/xml", "n3"]**.
On other RDF knowledge graphs, ```--backend "rdflib" ``` can be used. Note that knowledge graphs must not contain blank nodes or literals.
Moreover, a KGE model can also be trained by providing **an endpoint of a triple store**.
```bash
dicee --sparql_endpoint "http://localhost:3030/mutagenesis/" --model Keci
```
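For intuition, the default ```pandas``` backend boils down to reading the whitespace-separated triples with `read_csv`; a minimal sketch, assuming the UMLS layout shown above (the column names are illustrative, not dicee's internal identifiers):
```python
import pandas as pd

# Minimal sketch of what the default backend does: read whitespace-separated triples.
triples = pd.read_csv("KGs/UMLS/train.txt",
                      sep=r"\s+",   # matches the default --separator "\s+"
                      header=None,
                      names=["subject", "relation", "object"])
print(triples.head(3))
```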

#### Scoring Techniques

We have implemented state-of-the-art scoring techniques to train a KGE model ```--scoring_technique 1vsAll | KvsAll | AllvsAll | KvsSample | NegSample ```.
```bash
dicee --dataset_dir "KGs/YAGO3-10" --model Keci --trainer "torchCPUTrainer" --scoring_technique "NegSample" --neg_ratio 10 --num_epochs 10 --batch_size 10_000 --num_core 0 --eval_model None
# Epoch:10: 100%|███████████| 10/10 [01:31<00:00, 9.11s/it, loss_step=0.09423, loss_epoch=0.07897]
# Training Runtime: 1.520 minutes.
dicee --dataset_dir "KGs/YAGO3-10" --model Keci --trainer "torchCPUTrainer" --scoring_technique "NegSample" --neg_ratio 10 --num_epochs 10 --batch_size 10_000 --num_core 10 --eval_model None
# Epoch:10: 100%|███████████| 10/10 [00:58<00:00, 5.80s/it, loss_step=0.11909, loss_epoch=0.07991]
# Training Runtime: 58.106 seconds.
dicee --dataset_dir "KGs/YAGO3-10" --model Keci --trainer "torchCPUTrainer" --scoring_technique "NegSample" --neg_ratio 10 --num_epochs 10 --batch_size 10_000 --num_core 20 --eval_model None
# Epoch:10: 100%|███████████| 10/10 [01:01<00:00, 6.16s/it, loss_step=0.10751, loss_epoch=0.06962]
# Training Runtime: 1.029 minutes.
dicee --dataset_dir "KGs/YAGO3-10" --model Keci --trainer "torchCPUTrainer" --scoring_technique "NegSample" --neg_ratio 10 --num_epochs 10 --batch_size 10_000 --num_core 50 --eval_model None
# Epoch:10: 100%|███████████| 10/10 [01:08<00:00, 6.83s/it, loss_step=0.05347, loss_epoch=0.07003]
# Training Runtime: 1.140 minutes.
```
Increasing the number of cores often (but not always) helps to decrease the runtimes on large knowledge graphs, e.g., ```--num_core 4 --scoring_technique KvsSample | NegSample --neg_ratio 1```.
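For context, ```NegSample``` pairs every observed triple with ```--neg_ratio``` corrupted triples in which the head or the tail entity is replaced by a randomly drawn entity; a conceptual sketch, not dicee's actual sampler:
```python
import random

def corrupt(triple, num_entities, neg_ratio):
    """Conceptual NegSample-style corruption: replace the head or the tail with a random entity."""
    h, r, t = triple
    negatives = []
    for _ in range(neg_ratio):
        if random.random() < 0.5:
            negatives.append((random.randrange(num_entities), r, t))  # corrupt the head
        else:
            negatives.append((h, r, random.randrange(num_entities)))  # corrupt the tail
    return negatives

# Toy example: 3 negatives for a single index triple.
print(corrupt((0, 1, 2), num_entities=135, neg_ratio=3))
```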

A KGE model can also be trained in a Python script:
```python
from dicee.executer import Execute
from dicee.config import Namespace
@@ -91,53 +155,9 @@ print(reports["Train"]["MRR"]) # => 0.9912
print(reports["Test"]["MRR"]) # => 0.8155
# See the Keci_UMLS folder embeddings and all other files
```
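A minimal, self-contained sketch of this pattern is shown below; the configuration fields mirror the CLI flags used elsewhere in this README and their values are illustrative assumptions:
```python
from dicee.executer import Execute
from dicee.config import Namespace

# Illustrative configuration; field values are assumptions, not prescribed defaults.
args = Namespace()
args.model = "Keci"
args.scoring_technique = "KvsAll"
args.dataset_dir = "KGs/UMLS"
args.num_epochs = 100
args.embedding_dim = 32
args.batch_size = 1024
args.eval_model = "train_val_test"
reports = Execute(args).start()
print(reports["Train"]["MRR"])
print(reports["Test"]["MRR"])
```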
where the data is in the following form
```bash
$ head -3 KGs/UMLS/train.txt
acquired_abnormality location_of experimental_model_of_disease
anatomical_abnormality manifestation_of physiologic_function
alga isa entity
```
A KGE model can be trained with a state-of-the-art training technique from the command line
```bash
# CPU training
dicee --dataset_dir "KGs/UMLS" --model Keci --eval_model "train_val_test" --trainer torchCPUTrainer
# Distributed Data Parallelism
dicee --dataset_dir "KGs/UMLS" --model Keci --eval_model "train_val_test" --trainer PL
# Model Parallelism
dicee --dataset_dir "KGs/UMLS" --model Keci --eval_model "train_val_test" --trainer MP
# Distributed Data Parallelism in native torch
torchrun --standalone --nnodes=1 --nproc_per_node=gpu dicee --dataset_dir "KGs/UMLS" --model Keci --eval_model "train_val_test" --trainer torchDDP --scoring_technique KvsAll

#### Continual Learning

```
dicee automatically detects available GPUs and trains a model with the distributed data parallel technique.
You can choose a suitable backend for your knowledge graph ```--backend pandas | polars | rdflib ```.
```bash
# Train a model by only using the GPU-0
CUDA_VISIBLE_DEVICES=0 dicee --dataset_dir "KGs/UMLS" --model Keci --eval_model "train_val_test"
# Train a model by only using GPU-1
CUDA_VISIBLE_DEVICES=1 dicee --dataset_dir "KGs/UMLS" --model Keci --eval_model "train_val_test"
# Train a model by using all available GPUs
dicee --dataset_dir "KGs/UMLS" --model Keci --eval_model "train_val_test"
```
Under the hood, dicee executes the run.py script and uses [lightning](https://lightning.ai/) as a default trainer.
```bash
# Two equivalent executions
# (1)
dicee --dataset_dir "KGs/UMLS" --model Keci --eval_model "train_val_test"
# (2)
CUDA_VISIBLE_DEVICES=0,1 dicee --trainer PL --dataset_dir "KGs/UMLS" --model Keci --eval_model "train_val_test"
```
Similarly, models can be easily trained with torchrun
```bash
torchrun --standalone --nnodes=1 --nproc_per_node=gpu dicee --trainer torchDDP --dataset_dir "KGs/UMLS" --model Keci --eval_model "train_val_test"
```
You can also train a model in a multi-node multi-GPU setting.
```bash
torchrun --nnodes 2 --nproc_per_node=gpu --node_rank 0 --rdzv_id 455 --rdzv_backend c10d --rdzv_endpoint=nebula dicee --trainer torchDDP --dataset_dir KGs/UMLS
torchrun --nnodes 2 --nproc_per_node=gpu --node_rank 1 --rdzv_id 455 --rdzv_backend c10d --rdzv_endpoint=nebula dicee --trainer torchDDP --dataset_dir KGs/UMLS
```
Train a KGE model by providing the path of a single file and store all parameters under a newly created directory
called `KeciFamilyRun`.
```bash
@@ -150,17 +170,13 @@ _:1 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07
<http://www.benchmark.org/family#hasChild> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#ObjectProperty> .
<http://www.benchmark.org/family#hasParent> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#ObjectProperty> .
```

**Continual Training:** the training phase of a pretrained model can be resumed. The model will be saved in the same directory ```--continual_learning "KeciFamilyRun"```.
```bash
dicee --continual_learning "KeciFamilyRun" --path_single_kg "KGs/Family/family-benchmark_rich_background.owl" --model Keci --backend rdflib --eval_model None
```

**Apart from n-triples or standard link prediction dataset formats, we support ["owl", "nt", "turtle", "rdf/xml", "n3"]**.
Moreover, a KGE model can also be trained by providing **an endpoint of a triple store**.
```bash
dicee --sparql_endpoint "http://localhost:3030/mutagenesis/" --model Keci
```
For more, please refer to `examples`.

</details>

## Creating an Embedding Vector Database
@@ -319,6 +335,140 @@ KGE(path='...').deploy(share=True,top_k=10)
<img src="dicee/lp.png" alt="Italian Trulli">
</details>


## Link Prediction Results

Below, we provide a brief overview of the link prediction results, sorted in descending order of dataset size.
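As a reminder, MRR and Hits@k are computed from the rank assigned to the correct entity of each test triple; a minimal sketch:
```python
def mrr_and_hits(ranks, ks=(1, 3, 10)):
    """ranks: 1-based rank of the correct entity for each test triple."""
    mrr = sum(1.0 / r for r in ranks) / len(ranks)
    hits = {f"Hits@{k}": sum(r <= k for r in ranks) / len(ranks) for k in ks}
    return mrr, hits

# Toy example: four test triples ranked 1, 2, 5, and 12.
print(mrr_and_hits([1, 2, 5, 12]))
```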

#### YAGO3-10 ####
<details> <summary> To see the results </summary>

| | | MRR | Hits@1 | Hits@3 | Hits@10 |
|--------------------|-------|------:|-------:|-------:|--------:|
| ComplEx-KvsAll | train | 1.000 | 1.000 | 1.000 | 1.000 |
| ComplEx-KvsAll | val | 0.374 | 0.308 | 0.402 | 0.501 |
| ComplEx-KvsAll | test | 0.372 | 0.302 | 0.404 | 0.505 |
| ComplEx-KvsAll-SWA | train | 0.998 | 0.997 | 1.000 | 1.000 |
| ComplEx-KvsAll-SWA | val | 0.345 | 0.279 | 0.372 | 0.474 |
| ComplEx-KvsAll-SWA | test | 0.341 | 0.272 | 0.374 | 0.474 |
| ComplEx-KvsAll-SWA | train | x | x | x | x |
| ComplEx-KvsAll-SWA | val | x | x | x | x |
| ComplEx-KvsAll-SWA | test | x | x | x | x |
| Keci-KvsAll | train | 1.000 | 1.000 | 1.000 | 1.000 |
| Keci-KvsAll | val | 0.337 | 0.268 | 0.370 | 0.468 |
| Keci-KvsAll | test | 0.343 | 0.274 | 0.376 | 0.3431 |
| Keci-KvsAll-SWA | train | x | x | x | x |
| Keci-KvsAll-SWA | val | x | x | x | x |
| Keci-KvsAll-SWA | test | x | x | x | x |
| Keci-KvsAll-ASWA | train | 0.978 | 0.969 | 0.985 | 0.991 |
| Keci-KvsAll-ASWA | val | 0.400 | 0.324 | 0.439 | 0.540 |
| Keci-KvsAll-ASWA | test | 0.394 | 0.317 | 0.439 | 0.539 |

```--embedding_dim 256 --num_epochs 300 --batch_size 1024 --optim Adam 0.1``` leading to 31.6M params.
Observations: severe overfitting; ASWA improves generalization more than SWA.


#### FB15k-237 ####

| | | MRR | Hits@1 | Hits@3 | Hits@10 |
|--------------------|-------|------:|-------:|-------:|--------:|
| Keci-KvsAll-SWA | train | x | x | x | x |
| Keci-KvsAll-SWA | val | x | x | x | x |
| Keci-KvsAll-SWA | test | x | x | x | x |

</details>





```bash
dicee --dataset_dir "KGs/UMLS" --model "Keci" --p 0 --q 1 --trainer "PL" --scoring_technique "KvsSample" --embedding_dim 256 --num_epochs 100 --batch_size 32 --num_core 10
# Epoch 99: 100%|███████████| 13/13 [00:00<00:00, 29.56it/s, loss_step=6.46e-6, loss_epoch=8.35e-6]
# *** Save Trained Model ***
# Evaluate Keci on Train set: Evaluate Keci on Train set
# {'H@1': 1.0, 'H@3': 1.0, 'H@10': 1.0, 'MRR': 1.0}
# Evaluate Keci on Validation set: Evaluate Keci on Validation set
# {'H@1': 0.33358895705521474, 'H@3': 0.5253067484662577, 'H@10': 0.7576687116564417, 'MRR': 0.46992150194876076}
# Evaluate Keci on Test set: Evaluate Keci on Test set
# {'H@1': 0.3320726172465961, 'H@3': 0.5098335854765507, 'H@10': 0.7594553706505295, 'MRR': 0.4633434701052234}
```
Increasing the number of cores can increase runtimes if there is a preprocessing step in batch generation.
```bash
dicee --dataset_dir "KGs/UMLS" --model "Keci" --p 0 --q 1 --trainer "PL" --scoring_technique "KvsAll" --embedding_dim 256 --num_epochs 100 --batch_size 32
# Epoch 99: 100%|██████████| 13/13 [00:00<00:00, 101.94it/s, loss_step=8.11e-6, loss_epoch=8.92e-6]
# Evaluate Keci on Train set: Evaluate Keci on Train set
# {'H@1': 1.0, 'H@3': 1.0, 'H@10': 1.0, 'MRR': 1.0}
# Evaluate Keci on Validation set: Evaluate Keci on Validation set
# {'H@1': 0.348159509202454, 'H@3': 0.5659509202453987, 'H@10': 0.7883435582822086, 'MRR': 0.4912162082105331}
# Evaluate Keci on Test set: Evaluate Keci on Test set
# {'H@1': 0.34568835098335854, 'H@3': 0.5544629349470499, 'H@10': 0.7776096822995462, 'MRR': 0.48692617590763265}
```

```bash
dicee --dataset_dir "KGs/UMLS" --model "Keci" --p 0 --q 1 --trainer "PL" --scoring_technique "AllvsAll" --embedding_dim 256 --num_epochs 100 --batch_size 32
# Epoch 99: 100%|██████████████| 98/98 [00:01<00:00, 88.95it/s, loss_step=0.000, loss_epoch=0.0655]
# Evaluate Keci on Train set: Evaluate Keci on Train set
# {'H@1': 0.9976993865030674, 'H@3': 0.9997124233128835, 'H@10': 0.9999041411042945, 'MRR': 0.9987183437408705}
# Evaluate Keci on Validation set: Evaluate Keci on Validation set
# {'H@1': 0.3197852760736196, 'H@3': 0.5398773006134969, 'H@10': 0.7714723926380368, 'MRR': 0.46912531544840963}
# Evaluate Keci on Test set: Evaluate Keci on Test set
# {'H@1': 0.329803328290469, 'H@3': 0.5711043872919819, 'H@10': 0.7934947049924357, 'MRR': 0.4858500337837166}
```
In KvsAll and AllvsAll, a single data point **z=(x,y)** consists of a tuple of input indices **x** and a multi-label output vector **y**.
**x** is a pair of indices denoting a unique (entity, relation) combination.
**y** is a binary vector whose length equals the number of unique entities.
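Below is a minimal sketch of how such a multi-label target can be built from index triples; the helper is illustrative and not dicee's internal dataset class:
```python
from collections import defaultdict
import torch

def kvsall_targets(triples, num_entities):
    """Group index triples by (head, relation) and build one binary multi-label vector per pair."""
    tails = defaultdict(list)
    for h, r, t in triples:
        tails[(h, r)].append(t)
    data = []
    for (h, r), ts in tails.items():
        y = torch.zeros(num_entities)
        y[ts] = 1.0  # mark every known tail entity of this (head, relation) pair
        data.append(((h, r), y))
    return data

# Toy example: 5 entities, two triples sharing the pair (0, 0).
print(kvsall_targets([(0, 0, 1), (0, 0, 3), (2, 1, 4)], num_entities=5))
```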

To mitigate overfitting, several regularization techniques can be applied, e.g.,
Stochastic Weight Averaging (SWA), Adaptive Stochastic Weight Averaging (ASWA), or Dropout.
Use ```--swa``` to apply Stochastic Weight Averaging.
```bash
dicee --dataset_dir "KGs/UMLS" --model "Keci" --p 0 --q 1 --trainer "PL" --scoring_technique "KvsAll" --embedding_dim 256 --num_epochs 100 --batch_size 32 --swa
# Epoch 99: 100%|███████████| 13/13 [00:00<00:00, 85.61it/s, loss_step=8.11e-6, loss_epoch=8.92e-6]
# Evaluate Keci on Train set: Evaluate Keci on Train set
# {'H@1': 1.0, 'H@3': 1.0, 'H@10': 1.0, 'MRR': 1.0}
# Evaluate Keci on Validation set: Evaluate Keci on Validation set
# {'H@1': 0.45858895705521474, 'H@3': 0.6510736196319018, 'H@10': 0.8458588957055214, 'MRR': 0.5845156794070833}
# Evaluate Keci on Test set: Evaluate Keci on Test set
# {'H@1': 0.4636913767019667, 'H@3': 0.651285930408472, 'H@10': 0.8456883509833586, 'MRR': 0.5877221440365971}
# Total Runtime: 25.417 seconds
```
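Conceptually, SWA maintains a running average of the model weights collected during training and evaluates with the averaged weights; a minimal sketch, not dicee's exact implementation:
```python
import torch

def update_swa(swa_state, model, n_averaged):
    """Update a running average of the model's parameters (e.g., once per epoch)."""
    for name, param in model.state_dict().items():
        p = param.detach().float()
        if name not in swa_state:
            swa_state[name] = p.clone()
        else:
            swa_state[name] += (p - swa_state[name]) / (n_averaged + 1)
    return n_averaged + 1
```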
Use ```--adaptive_swa``` to apply Adaptive Stochastic Weight Averaging. Currently, ASWA should not be used with DDP on multiple GPUs; we are working on it.
```bash
CUDA_VISIBLE_DEVICES=0 dicee --dataset_dir "KGs/UMLS" --model "Keci" --p 0 --q 1 --trainer "PL" --scoring_technique "KvsAll" --embedding_dim 256 --num_epochs 100 --batch_size 32 --adaptive_swa
# Epoch 99: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 49/49 [00:00<00:00, 93.86it/s, loss_step=0.0978, loss_epoch=0.143]
# Evaluate Keci on Train set: Evaluate Keci on Train set
# {'H@1': 0.9974118098159509, 'H@3': 0.9992331288343558, 'H@10': 0.9996165644171779, 'MRR': 0.9983922084274367}
# Evaluate Keci on Validation set: Evaluate Keci on Validation set
# {'H@1': 0.7668711656441718, 'H@3': 0.8696319018404908, 'H@10': 0.9440184049079755, 'MRR': 0.828767705987023}
# Evaluate Keci on Test set: Evaluate Keci on Test set
#{'H@1': 0.7844175491679274, 'H@3': 0.8888048411497731, 'H@10': 0.9546142208774584, 'MRR': 0.8460991515345323}
```
```bash
CUDA_VISIBLE_DEVICES=0 dicee --dataset_dir "KGs/UMLS" --model "Keci" --p 0 --q 1 --trainer "PL" --scoring_technique "KvsAll" --embedding_dim 256 --input_dropout_rate 0.1 --num_epochs 100 --batch_size 32 --adaptive_swa
# Epoch 99: 100%|██████████████████████████████████████████████████████████| 49/49 [00:00<00:00, 93.49it/s, loss_step=0.600, loss_epoch=0.553]
# Evaluate Keci on Train set: Evaluate Keci on Train set
# {'H@1': 0.9970283742331288, 'H@3': 0.9992331288343558, 'H@10': 0.999808282208589, 'MRR': 0.9981489117237927}
# Evaluate Keci on Validation set: Evaluate Keci on Validation set
# {'H@1': 0.8473926380368099, 'H@3': 0.9049079754601227, 'H@10': 0.9470858895705522, 'MRR': 0.8839172788777631}
# Evaluate Keci on Test set: Evaluate Keci on Test set
# {'H@1': 0.8381240544629349, 'H@3': 0.9167927382753404, 'H@10': 0.9568835098335855, 'MRR': 0.8829572716873321}

CUDA_VISIBLE_DEVICES=0 dicee --dataset_dir "KGs/UMLS" --model "Keci" --p 0 --q 1 --trainer "PL" --scoring_technique "KvsAll" --embedding_dim 256 --input_dropout_rate 0.2 --num_epochs 100 --batch_size 32 --adaptive_swa
# Epoch 99: 100%|██████████████████████████████████████████████████████████| 49/49 [00:00<00:00, 94.43it/s, loss_step=0.108, loss_epoch=0.111]
# Evaluate Keci on Train set: Evaluate Keci on Train set
# {'H@1': 0.9818826687116564, 'H@3': 0.9942484662576687, 'H@10': 0.9972200920245399, 'MRR': 0.9885307022708297}
# Evaluate Keci on Validation set: Evaluate Keci on Validation set
# {'H@1': 0.8581288343558282, 'H@3': 0.9156441717791411, 'H@10': 0.9447852760736196, 'MRR': 0.8930935122236525}
# Evaluate Keci on Test set: Evaluate Keci on Test set
# {'H@1': 0.8494704992435703, 'H@3': 0.9334341906202723, 'H@10': 0.9667170953101362, 'MRR': 0.8959156201718665}
```



## Docker
<details> <summary> Details</summary>
To build the Docker image:
2 changes: 2 additions & 0 deletions dicee/abstracts.py
@@ -26,6 +26,8 @@ def __init__(self, args, callbacks):
self.attributes = args
self.callbacks = callbacks
self.is_global_zero = True
self.global_rank = 0
self.local_rank = 0
# Set True to use Model summary callback of pl.
torch.manual_seed(self.attributes.random_seed)
torch.cuda.manual_seed_all(self.attributes.random_seed)