Transformer Tricks

A collection of tricks to simplify and speed up transformer models.

Many of these tricks follow a recent trend of removing parts from neural networks, such as RMSNorm's removal of mean centering from LayerNorm, T5's removal of bias parameters, NoPE's removal of positional encoding, GPT's removal of the encoder stack, and of course the transformer's revolutionary removal of recurrent layers. Specifically, our FlashNorm removes the weights from RMSNorm and merges them into the next linear layer, and slim attention removes the entire V-cache from the context memory of MHA transformers.
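
A minimal sketch of the FlashNorm merge, using NumPy and a single activation vector (variable names here are illustrative and not part of the package's API):

```python
import numpy as np

def rmsnorm(x, g, eps=1e-6):
    # standard RMSNorm: normalize x, then scale by the learned gain g
    return g * x / np.sqrt(np.mean(x**2) + eps)

def plain_norm(x, eps=1e-6):
    # weight-free normalization left behind after the FlashNorm merge
    return x / np.sqrt(np.mean(x**2) + eps)

d = 8
rng = np.random.default_rng(0)
x = rng.standard_normal(d)         # activation vector
g = rng.standard_normal(d)         # RMSNorm gain
W = rng.standard_normal((d, d))    # weights of the next linear layer

y_ref  = W @ rmsnorm(x, g)         # original: RMSNorm followed by linear layer
W_fold = W * g                     # fold the gain into W's columns, i.e. W @ diag(g)
y_new  = W_fold @ plain_norm(x)    # FlashNorm: weight-free norm, merged linear layer

assert np.allclose(y_ref, y_new)   # identical outputs
```

Because the merge is exact, it can be applied once at load time and the normalization layers of the deployed model carry no weights at all.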


Getting Started

Install the transformer-tricks package with pip:

pip install transformer-tricks

Older documentation:

Tricks and tools for speeding up LLMs:

  • Slim attention: cut your context memory in half without loss of accuracy [work in progress] (see the sketch after this list):
    • Notebook for paper: Colab
  • Flash normalization:
    • Notebook example for converting an LLM to FlashNorm: Colab
    • Notebook for paper: Colab
    • HuggingFace repo
  • Removing weights from skipless transformers:
    • Notebook: Colab
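
The core idea behind slim attention is that, for MHA, the value projection can be folded into the key projection, so V can be recomputed from the K-cache and the V-cache never needs to be stored. A rough sketch under toy assumptions (square, invertible W_K; illustrative names, not the package's API):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 4, 8                          # sequence length, head dimension
X   = rng.standard_normal((n, d))    # input activations
W_K = rng.standard_normal((d, d))    # key projection (square, assumed invertible)
W_V = rng.standard_normal((d, d))    # value projection

K = X @ W_K                          # K-cache (kept in context memory)
V = X @ W_V                          # V-cache (what slim attention avoids storing)

W_KV    = np.linalg.solve(W_K, W_V)  # precomputed once: W_K^-1 @ W_V
V_recon = K @ W_KV                   # values recomputed from the keys on the fly

assert np.allclose(V, V_recon)       # reconstruction is exact for MHA
```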

Please give us a ⭐ if you like this repo, thanks!