A collection of tricks to simplify and speed up transformer models:
- Slim attention (cut your context memory in half without loss of accuracy): [podcast], [paper], [notebook]
- Flash normalization: [podcast], [paper], [code]
- Precomputing the first layer: [podcast], [paper]
- Removing weights from skipless transformers: [podcast], [paper], [notebook]
- Approximate attention [work in progress]: [podcast], [paper]
Many of these tricks follow a recent trend of removing parts from neural networks, such as RMSNorm’s removal of mean centering from LayerNorm, T5’s removal of bias parameters, NoPE’s removal of positional encoding, GPT’s removal of the encoder stack, and of course the transformer’s revolutionary removal of recurrent layers. Specifically, our FlashNorm removes the weights from RMSNorm and merges them into the next linear layer, and slim attention removes the entire V-cache from the context memory of MHA transformers.
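Both removals rest on simple algebraic identities. The NumPy sketch below illustrates them under stated assumptions: FlashNorm folds the RMSNorm weights into the following linear layer, and slim attention reconstructs V from K via V = K (W_K⁻¹ W_V), which assumes square, invertible key projections (i.e., MHA rather than MQA/GQA). Variable names are illustrative and are not part of the package’s API.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 16                           # model dimension
x = rng.standard_normal(d)       # one token activation

def rmsnorm(v, eps=1e-6):
    # weightless RMSNorm: normalize by the root-mean-square of v
    return v / np.sqrt(np.mean(v**2) + eps)

# --- FlashNorm: merge the RMSNorm weights into the next linear layer ---
g = rng.standard_normal(d)       # RMSNorm scaling weights
W = rng.standard_normal((d, d))  # next linear layer: y = W @ x_normed

y_ref   = W @ (rmsnorm(x) * g)   # standard weighted RMSNorm, then linear layer
W_fused = W * g                  # scale column j of W by g[j]
y_fused = W_fused @ rmsnorm(x)   # weightless RMSNorm, then fused linear layer
assert np.allclose(y_ref, y_fused)

# --- Slim attention: drop the V-cache and recover V from K ---
n   = 8                              # number of cached tokens
X   = rng.standard_normal((n, d))    # token activations
W_K = rng.standard_normal((d, d))    # key projection (square and invertible for MHA)
W_V = rng.standard_normal((d, d))    # value projection

K     = X @ W_K                          # only K is kept in the context memory
V_ref = X @ W_V                          # what a standard V-cache would hold
V_rec = K @ (np.linalg.inv(W_K) @ W_V)   # V reconstructed from K on the fly
assert np.allclose(V_ref, V_rec)

print("FlashNorm fusion and slim-attention V reconstruction match the baselines")
```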
Install the transformer tricks package with pip:
pip install transformer-tricks
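After installation, converting an existing model to FlashNorm might look roughly like the sketch below. The function name `flashify_repo` and the model ID are assumptions for illustration and may not match the package’s actual API; see the notebooks linked above for the authoritative usage.

```python
# Hypothetical usage sketch: the function name below is an assumption,
# not a confirmed part of the transformer-tricks API.
import transformer_tricks as tt

# Convert a HuggingFace model repo to FlashNorm (assumed function name and model ID)
tt.flashify_repo('HuggingFaceTB/SmolLM-135M')
```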
Please give us a ⭐ if you like this repo, thanks!