This folder contains reproducible scripts for all our experiments. Each experimental script should be self-explanatory and reproducible as-is, without any additional parameters.
We run our experiments from the project root, in the following format:

```shell
CUDA_VISIBLE_DEVICES=XX COMET_PROJECT_NAME={your-project} COMET_API_KEY={key} python experiments/seq_align/adapt_emea_es-en.py
```
The experiments are categorized into folders and files by (1) the training objective and (2) the training domain:
- Main new objectives:
  - `token_align`: contains the scripts producing the reported TokenAlign evaluations.
  - `seq_align`: contains the scripts producing the reported SeqAlign evaluations.
- Ablation objectives:
  - `ablations/SCE`: contains the scripts producing the additional ablation results reported under SCE evaluations.
  - `ablations/seq_align_dec`: contains the scripts producing the additional ablation results reported under SeqAlign-dec evaluations.
  - `ablations/seq_random`: contains the scripts producing the additional ablation results reported under SeqRandom evaluations.
- Baselines:
  - `baselines/adapters`: contains the scripts of experiments with Adapters (Houlsby et al., 2019).
  - `baselines/ensembling`: contains the scripts of experiments using an ensemble of the pre-trained and fine-tuned models (Freitag et al., 2016), adapted to Transformers.
  - `baselines/lora`: contains the scripts of experiments using LoRA (Hu et al., 2022).
  - `baselines/label_smoothing`: contains the scripts of experiments with smoothed MLE, i.e., label smoothing (Szegedy et al., 2016).
All our objectives are evaluated on the following domains. All training and evaluation domains are publicly available at https://opus.nlpl.eu, and we provide a unified interface for accessing them: see `utils/data_utils/OPUSDataset`, its use within the experiments, and the usage sketch after the list below.
- bible: German -> English: 2.9M tokens, 0.06M sentences
- TEDTalks: English -> Chinese (simplified): 5.1M tokens, 0.1M sentences
- opensub: English -> Ukrainian: 9.5M tokens, 0.8M sentences
- wikipedia: English -> Czech: 4.3M tokens, 0.1M sentences
- EMEA v3: Estonian -> English: 5.7M tokens, 0.3M sentences
- DGT: English -> German: 166M tokens, 5.1M sentences
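
Since the exact `OPUSDataset` signature is not documented here, the following is only a minimal sketch of how the unified interface might be used. The constructor arguments (`domain`, `split`, `src_lang`, `tgt_lang`) and the iteration protocol are illustrative assumptions; consult `utils/data_utils` and the experiment scripts for the actual API.

```python
# Minimal sketch of loading one of the listed domains through the unified
# interface. The constructor arguments and the (source, target) iteration
# protocol are assumptions for illustration, not the confirmed API; see
# utils/data_utils and the experiment scripts for the actual signature.
from utils.data_utils import OPUSDataset

# Hypothetical call: the EMEA v3 Estonian -> English training split.
dataset = OPUSDataset("EMEA", split="train", src_lang="et", tgt_lang="en")

# We assume iteration yields parallel (source, target) sentence pairs.
for source, target in dataset:
    print(source, "->", target)
    break
```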