The TaskBench500 dataset and code for generating custom tasks and adapting a model to them.
The TaskBench500 dataset is available for download:
wget dl.fbaipublicfiles.com/task_bench/taskbench500.tar.gz
tar -xzvf taskbench500.tar.gz
The file structure looks as follows:
.
├── atomic # atomic tasks
├── seq_composite # sequential compositional tasks
│ ├── filter # tasks of the form filter{λx.F(x)}(S)
│ ├── map # tasks of the form map{λx.F(x)}(S)
│ └── mapfilter # tasks of the form map{λx.F(x)}(filter{λx.F(x)}(S))
└── word_composite # word-level compositional tasks
├── chaining # tasks of the form F(F')
├── intersection # tasks of the form F∩F'
├── land # tasks of the form F∧F'
├── lor # tasks of the form F∨F'
└── union # tasks of the form F∪F'
where F and F' are arbitrary word-level functions.
If you wish to run any part of the code (dataset creation, model adaptation), set up your environment as follows:
conda create -n taskbench python=3.9
conda activate taskbench
Follow the instructions here to install PyTorch: https://pytorch.org/get-started/locally/
Then install the rest of the packages by running:
pip install -r requirements.txt
To create lexical tasks, you should install the multilingual WordNet from NLTK. Open an interactive Python session and run:
import nltk
nltk.download('omw-1.4')
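Equivalently, the same resource can be downloaded from the shell without opening an interactive session:
# one-off download of the Open Multilingual Wordnet data used for lexical tasks
python -c "import nltk; nltk.download('omw-1.4')"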
Before running any of the commands below, be sure to run
export PYTHONPATH=.
Begin by downloading the necessary resources for the creation of these tasks.
wget dl.fbaipublicfiles.com/task_bench/resources.tar.gz
tar -xzvf resources.tar.gz
This section describes how to set up a Wikidata endpoint for creating factual tasks. If you do not need to create tasks that require factual knowledge, you may skip this section.
These instructions were adapted from the following Git repository.
- Install the open-source version of the Virtuoso database server.
# for Linux
wget https://github.com/openlink/virtuoso-opensource/releases/download/v7.2.7/virtuoso-opensource.x86_64-generic_glibc25-linux-gnu.tar.gz
tar -xzvf virtuoso-opensource.x86_64-generic_glibc25-linux-gnu.tar.gz
# for other OS, locate the OS-specific download links in http://vos.openlinksw.com/owiki/wiki/VOS/VOSDownload
- After untarring the downloaded binary, the main folder is virtuoso-opensource.
- Download the pre-built DB file and extract it into virtuoso-opensource/database. The zipped DB file contains two files: virtuoso.db and virtuoso.ini.
- By default, the server runs on port 8890. You may change this by editing ServerPort and DefaultHost in virtuoso.ini. You should also change L11 in function/wikidata_fns.py in this repository to point to the Virtuoso server.
- Start the server by running:
virtuoso-opensource/bin/virtuoso-t -f -c virtuoso-opensource/database/virtuoso.ini
This may take up to a few hours; startup progress is logged to the terminal.
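Once the server is up, you can sanity-check the SPARQL endpoint before generating factual tasks. The snippet below is a minimal check, assuming the default port 8890 and Virtuoso's standard /sparql path (adjust if you changed ServerPort):
# query the local Virtuoso SPARQL endpoint for a single triple (sanity check only)
curl -G 'http://localhost:8890/sparql' \
  --data-urlencode 'query=SELECT * WHERE { ?s ?p ?o } LIMIT 1' \
  -H 'Accept: application/sparql-results+json'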
To create a dataset of (input, output) pairs corresponding to a particular function specification, run:
python scripts/make_data.py \
--function <function> \
--save_dir <output_dataset_directory> \
--sample_type [word|seq] \
(--num_samples <number_of_samples>)
- function takes a string representation of the task you wish to generate. See below for further guidance on how this is formatted.
- save_dir specifies where the output dataset will be saved. The script will save to the file <save_dir>/<function>.jsonl.
- sample_type specifies how the input will be constructed: whether the inputs will be single words/entities (word) or sequences of words/entities (seq).
- num_samples specifies how many (sequential) inputs will be sampled for the final dataset. This is only applicable if sample_type is seq, as the entire vocabulary (up to 10k words/entities) is used as input for word-wise tasks.
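For example, a hypothetical invocation that generates the English antonyms task over single-word inputs might look like the following (the output directory is an arbitrary placeholder):
# generate the word-level English antonyms task; datasets/atomic is a placeholder directory
python scripts/make_data.py \
    --function "antonyms[eng](0)" \
    --save_dir datasets/atomic \
    --sample_type word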
For bulk processing of functions, you can alternatively pass a filepath to --function_file (instead of specifying a single function with --function), where the function file consists of a list of functions separated by newlines. For example, passing a function file containing:
antonyms[eng](0)
filter{is{POS=noun}[eng](0)}(S)
wiki{mother(creator(0))}
to the make_data.py script is equivalent to running the script 3 times, once for each function, using --function.
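As a sketch, a bulk run over two word-level functions might look like this (functions.txt and the output directory are illustrative names, not files shipped with the repository):
# write one function specification per line (word-level functions only in this example)
printf 'antonyms[eng](0)\nwiki{mother(creator(0))}\n' > functions.txt

python scripts/make_data.py \
    --function_file functions.txt \
    --save_dir datasets/bulk \
    --sample_type word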
The syntax for specifying a function looks like
function_name{X1,X2,...}[Y1,Y2,...](Z1,Z2,...)
- Arguments in braces and brackets (X1,X2,Y1,Y2) represent particular features of the function, such as input language, random seed, the per-token operation in the case of map and filter functions, etc.
- Arguments in parentheses (Z1,Z2) represent inputs to the function. They can be functions themselves in the case of function composition. By default, 0 is used to represent the input for word-wise functions, and S is used to represent the input for sequential functions.
All functions requiring Wikidata are specified within a wiki{...} brace, which tells the function creation script that the entire portion within the braces must be converted to a SPARQL query.
Some examples of functions include:
antonyms[eng](0)
wiki{child(0)}
is{POS=noun}[eng](0)
antonyms[eng](entailments[eng](0))
wiki{mother(creator(0))}
union(antonyms[eng](0),entailments[eng](0))
filter{is{POS=noun}[eng](0)}(S)
map{synonyms[eng](0)}(filter{is{POS=adj}[eng](0)}(S))
See Appendix Tables 4-8 in the paper for further examples of functions.
The code for adapting models to TaskBench tasks can be found in run_models/*.
To fine-tune or prompt-tune a model, run
python run_models/ft_model.py \
--data <path_to_data_directory> \
--batchsize <batchsize> \
--arch t5-base --lr [0.001|1.0] \
--epochs <epochs> --patience <patience> \
--eval_interval <eval_interval> \
--valid_metric [accuracy|token_accuracy] \
--seed <random_seed> \
(--train_size <max_num_training_samples_to_use>)
(--do_prompt_tune --n_prefix_tokens <num_prompt_tokens_to_use>)
The data directory should consist of 3 files: train.jsonl, dev.jsonl, and test.jsonl.
We use accuracy as the valid_metric for word-wise tasks, and token_accuracy for sequential tasks.
By default, the script will perform fine-tuning. To do prompt-tuning, add the --do_prompt_tune flag and specify the number of prefix tokens using --n_prefix_tokens <n>. We found that the optimal learning rate on our tasks is 0.001 for fine-tuning and 1.0 for prompt-tuning.
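For concreteness, hypothetical fine-tuning and prompt-tuning invocations might look like the following; the data path and all hyperparameter values other than the learning rate are illustrative placeholders, not values prescribed by the paper:
# fine-tuning on a word-level task (the data directory must contain train.jsonl, dev.jsonl, test.jsonl)
python run_models/ft_model.py \
    --data datasets/atomic/antonyms_eng \
    --batchsize 32 \
    --arch t5-base --lr 0.001 \
    --epochs 100 --patience 10 \
    --eval_interval 1 \
    --valid_metric accuracy \
    --seed 0

# prompt-tuning variant: higher learning rate plus the prompt-tuning flags
python run_models/ft_model.py \
    --data datasets/atomic/antonyms_eng \
    --batchsize 32 \
    --arch t5-base --lr 1.0 \
    --epochs 100 --patience 10 \
    --eval_interval 1 \
    --valid_metric accuracy \
    --seed 0 \
    --do_prompt_tune --n_prefix_tokens 100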
@misc{https://doi.org/10.48550/arxiv.2112.03204,
doi = {10.48550/ARXIV.2112.03204},
url = {https://arxiv.org/abs/2112.03204},
author = {Li, Belinda Z. and Yu, Jane and Khabsa, Madian and Zettlemoyer, Luke and Halevy, Alon and Andreas, Jacob},
keywords = {Computation and Language (cs.CL), Machine Learning (cs.LG), FOS: Computer and information sciences},
title = {Quantifying Adaptability in Pre-trained Language Models with 500 Tasks},
publisher = {arXiv},
year = {2021},
copyright = {arXiv.org perpetual, non-exclusive license}
}