Fastai community entry to the 2020 Reproducibility Challenge
If you haven't already, it's a good idea to install the package into a virtual environment:
```bash
python3 -m venv my_env
source ./my_env/bin/activate
```
Then you can install the package by running:
```bash
pip install git+https://github.com/arampacha/reformer_fastai.git
```
- Reformer Paper
- Authors ICLR video
- Google Blog
- Authors code (TRAX)
- Reformer enwik8 model and training config
- @lucidrains' Reformer code
- HuggingFace: Reformer source code
- HuggingFace: Reformer notebook example
- HuggingFace: long sequences
- HuggingFace: Pretraining
enwik8
- enwik8.zip, raw data, 100 MB
- Tensor2Tensor enwik8 data generator code, with train/dev/test split. File lengths:
- Train: 89,621,832
- Eval: 5,000,000
- Test: 5,000,000
- enwik8 notebook Tensor2Tensor
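The Tensor2Tensor-style split above amounts to taking consecutive byte slices of the raw file. A minimal sketch (the length constants are the figures listed above; the helper name is ours, not from Tensor2Tensor):

```python
# Consecutive train/eval/test byte slices of raw enwik8, using the
# Tensor2Tensor split lengths quoted above.
TRAIN_LEN = 89_621_832
EVAL_LEN = 5_000_000
TEST_LEN = 5_000_000

def split_enwik8(data, train_len=TRAIN_LEN, eval_len=EVAL_LEN, test_len=TEST_LEN):
    """Return (train, eval, test) slices of the raw enwik8 bytes, in order."""
    train = data[:train_len]
    eval_ = data[train_len:train_len + eval_len]
    test = data[train_len + eval_len:train_len + eval_len + test_len]
    return train, eval_, test

# Usage, assuming the unzipped file is on disk:
# with open("enwik8", "rb") as f:
#     train, eval_, test = split_enwik8(f.read())
```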
WMT14
- WMT on HuggingFace Datasets
- Reformer WMT14 vocab
- Reformer.input_vocab_size = 33300, from WMT14 model config
- Train/test split: (guess) newstest2013 for validation and newstest2014 for test, consistent with Vaswani et al. (2017) - from https://arxiv.org/pdf/2009.02070.pdf
- Tokenizer: Tensor2Tensor SubWordTextEncoder
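Under the split guess above, loading WMT14 via HuggingFace Datasets might look like the sketch below. The split mapping is the guess stated above (not confirmed by the Reformer paper), and the `de-en` language pair is an assumption; only the vocab size comes from the model config:

```python
# Assumed split convention (a guess, per the note above): HuggingFace's
# wmt14 dataset exposes newstest2013 as "validation" and newstest2014 as "test".
WMT14_SPLITS = {
    "validation": "newstest2013",
    "test": "newstest2014",
}

VOCAB_SIZE = 33_300  # Reformer.input_vocab_size from the WMT14 model config

# Downloading the corpus (commented out here; it is several GB):
# from datasets import load_dataset
# wmt = load_dataset("wmt14", "de-en")  # splits: train / validation / test
```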