Releases · predibase/lorax
v0.5: CUDA graph compilation
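The headline feature of this release is CUDA graph compilation, which cuts per-step kernel launch overhead during decoding. As a minimal sketch of the underlying technique (illustrative only, not LoRAX's actual integration), PyTorch's CUDA graph API captures a forward pass once and replays it against static buffers:

```python
import torch

# Minimal CUDA graph capture/replay sketch (requires a CUDA GPU).
model = torch.nn.Linear(64, 64).cuda()
static_x = torch.randn(8, 64, device="cuda")

# Warm up on a side stream before capture, per the PyTorch docs.
s = torch.cuda.Stream()
s.wait_stream(torch.cuda.current_stream())
with torch.cuda.stream(s):
    for _ in range(3):
        model(static_x)
torch.cuda.current_stream().wait_stream(s)

# Capture one forward pass into a graph, then replay it: new inputs are
# copied into the same static buffer, avoiding per-op kernel launches.
g = torch.cuda.CUDAGraph()
with torch.cuda.graph(g):
    static_y = model(static_x)

static_x.copy_(torch.randn(8, 64, device="cuda"))
g.replay()  # static_y now holds the output for the new input
```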
🎉 Enhancements
🐛 Bugfixes
- Fixed deadlock in sgmv_shrink kernel caused by imbalanced segments by @tgaddair in #156
- Fixed loading adapter from absolute s3 path by @tgaddair in #161
📝 Docs
- Update client docs with new endpoint source by @abidwael in #126
- Update client docs with new endpoint source by @abidwael in #146
🔧 Maintenance
- Reduce Docker size by removing duplicate torch install by @tgaddair in #144
- remove CACHE_MANAGER in flash_causal_lm.py by @michaelfeil in #157
New Contributors
- @michaelfeil made their first contribution in #157
Full Changelog: v0.4.1...v0.5.0
v0.4.1
v0.4.0
🎉 Enhancements
- Mixtral by @flozi00 in #122
- Added Phi by @tgaddair in #132
- add support for H100s by @thelinuxkid in #111
- upgrade to py 3.10 by @flozi00 in #121
- Add predibase as a source for adapters by @magdyksaleh in #125 (see the client sketch after this list)
- enh: Add SOCI indexing to allow lazy loading of LoRAX images by @gyanesh-mishra in #95
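With Predibase available as an adapter source (#125), requests can target adapters hosted there. A minimal sketch with the Python client; the adapter name is hypothetical and the `adapter_source="pbase"` identifier is an assumption:

```python
from lorax import Client  # pip install lorax-client

client = Client("http://127.0.0.1:8080")
response = client.generate(
    "What is the capital of France?",
    adapter_id="my-org/my-adapter",  # hypothetical Predibase adapter name
    adapter_source="pbase",          # assumed identifier for the Predibase source
)
print(response.generated_text)
```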
🐛 Bugfixes
- fix: Set Mistral sliding window to max position embeddings when None by @tgaddair in #128
- Fix Qwen tensor parallelism by @tgaddair in #120
- fix: Llama AWQ with GQA by @tgaddair in #114
- fix: Mixtral adapter loading wraps lm_head by @tgaddair in #131
📝 Docs
- Add Skypilot example and getting started guide by @tgaddair in #117
- docs: fix broken link by @Fluder-Paradyne in #133
- Added Mixtral and Phi to docs by @tgaddair in #134
🔧 Maintenance
- Increase default client timeout to 60s by @tgaddair in #119 (see the snippet after this list)
- Make transpose contiguous for fan-in-fan-out by @tgaddair in #129
- remove lorax env var by @geoffreyangus in #113
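The timeout bump in #119 helps with long generations; it can also be overridden per client, assuming the Python client constructor exposes a `timeout` parameter:

```python
from lorax import Client

# Request timeout in seconds; the default was raised to 60 in #119.
# `timeout` as a constructor argument is an assumption here.
client = Client("http://127.0.0.1:8080", timeout=120)
```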
New Contributors
- @gyanesh-mishra made their first contribution in #95
- @thelinuxkid made their first contribution in #111
- @Fluder-Paradyne made their first contribution in #133
Full Changelog: v0.3.0...v0.4.0
v0.3.0
What's Changed
Enhancements
- Add AWQ quantization by @flozi00 in #102 (see the sketch after this list)
- Add support for Qwen by @tgaddair in #103
- Add Flash GPT2 by @geoffreyangus in #93
- LoRAX-compatible GPT-2 by @geoffreyangus in #109
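On AWQ (#102): as a rough sketch of what weight-only 4-bit quantization stores, here is the bare skeleton in PyTorch. Real AWQ additionally chooses scales in an activation-aware way and keeps weights packed for fused kernels; everything below is illustrative:

```python
import torch

# Weight-only int4 skeleton: per-group fp scales plus rounded integer
# weights, dequantized on the fly. Not the real AWQ algorithm or kernel.
def quantize_int4(w: torch.Tensor, group_size: int = 128):
    groups = w.reshape(-1, group_size)
    scales = (groups.abs().max(dim=1, keepdim=True).values / 7.0).clamp(min=1e-8)
    q = torch.clamp((groups / scales).round(), -8, 7).to(torch.int8)
    return q, scales

def dequantize_int4(q, scales, shape):
    return (q.float() * scales).reshape(shape)

w = torch.randn(256, 256)
q, scales = quantize_int4(w)
w_hat = dequantize_int4(q, scales, w.shape)
print(f"mean abs reconstruction error: {(w - w_hat).abs().mean():.4f}")
```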
Bugfixes
- decrease the max batch total tokens manually by @flozi00 in #89
- Added --max-active-adapters to launcher by @tgaddair in #96
- fix gptq fp16 inference by @flozi00 in #104
- fix static adapter merge by @geoffreyangus in #106
Maintenance
- Update values.yaml tag to always use the latest image by @arnavgarg1 in #87
- Update chart version by @abidwael in #88
- Warn if there are unused weights in the adapter by @tgaddair in #105
- docs: Added client docs for connecting to Predibase endpoints by @tgaddair in #98
- Generalized layer types and row parallel split logic by @tgaddair in #110
- Mkdocs by @tgaddair in #112
New Contributors
- @arnavgarg1 made their first contribution in #87
Full Changelog: v0.2.1...v0.3.0
lorax-0.3.0
LoRAX is the open-source framework for serving hundreds of fine-tuned LLMs in production for the price of one.
lorax-0.2.1
LoRAX is the open-source framework for serving hundreds of fine-tuned LLMs in production for the price of one.
v0.2.1
v0.2.0
What's Changed
Enhancements
- Implement sparse SGMV by @tgaddair in #64
- Implement tensor parallel SGMV by @tgaddair in #79
- Add adapter support for all linear layers in Llama and Mistral by @tgaddair in #75
- 4 bit support by @flozi00 in #66
- Exllamav2 by @flozi00 in #60
Bugfixes
- Updated to custom SGMV kernel to fix issue with certain ranks by @tgaddair in #70
- fix: Allow using unsupported base models without adapter loading by @tgaddair in #76
Maintenance
- Add DISABLE_SGMV env var to explicitly fall back to the loop by @tgaddair in #69 (the loop's semantics are sketched after this list)
- Upgrade the README discord badge and use an invite link that doesn't expire. by @justinxzhao in #73
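For context on the SGMV work in this release (#64, #79) and the loop fallback behind DISABLE_SGMV (#69): SGMV batches heterogeneous LoRA adapters by grouping tokens into per-adapter segments. A rough reference for what the shrink half computes, written as the naive per-segment loop; names and shapes are illustrative, not the real kernel interface:

```python
import torch

def lora_shrink_loop(x, lora_a_weights, segments, indices):
    """Naive per-segment loop, roughly what the DISABLE_SGMV=1 fallback
    computes; the fused SGMV kernel does this in a single launch.
    x:              (total_tokens, hidden) batched inputs
    lora_a_weights: one (hidden, rank) LoRA A matrix per adapter
    segments:       (num_segments + 1,) token offsets into x
    indices:        (num_segments,) adapter index for each segment
    """
    out = x.new_zeros(x.shape[0], lora_a_weights[0].shape[1])
    for i, adapter in enumerate(indices):
        start, end = segments[i], segments[i + 1]
        out[start:end] = x[start:end] @ lora_a_weights[adapter]
    return out

# Two requests on adapter 0 and one on adapter 1, batched together:
x = torch.randn(6, 32)
weights = [torch.randn(32, 8), torch.randn(32, 8)]
out = lora_shrink_loop(x, weights, segments=[0, 4, 6], indices=[0, 1])
```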
New Contributors
- @justinxzhao made their first contribution in #73
Full Changelog: v0.1.2...v0.2.0
v0.1.2
v0.1.1
What's Changed
- Add Helm charts to deploy models by @abidwael in #27
- change defaults for helm chart by @noyoshi in #38
- add helm release wf by @noyoshi in #39
- Added support for YARN scaling by @tgaddair in #45
- Fixed tensor parallelism splits by @tgaddair in #47
- enh: enable CodeLlama by @geoffreyangus in #48
- Fallback when Punica is not installed by @tgaddair in #49
- add transformers gptq weights by @flozi00 in #52
- Add support for paged attention v2 and update flash attention v2 by @tgaddair in #54 (see the toy cache sketch after this list)
- Fixed adapter loading for GPTQ base models by @tgaddair in #58
- Update gha to be able to automatically push images with release tags by @magdyksaleh in #59
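On paged attention (#54): the core idea is to store the KV cache in fixed-size blocks addressed through per-sequence block tables, so a sequence's cache need not be contiguous. A toy sketch of that layout; block size, shapes, and names are illustrative, not LoRAX's actual cache:

```python
import torch

# Toy paged KV cache: fixed-size physical blocks plus a per-sequence
# block table mapping logical token positions to physical blocks.
BLOCK_SIZE = 4
num_blocks, num_heads, head_dim = 16, 2, 8
k_cache = torch.zeros(num_blocks, BLOCK_SIZE, num_heads, head_dim)

block_table = [3, 7]  # this sequence owns physical blocks 3 and 7

def write_key(pos: int, k: torch.Tensor):
    """Write the key for logical position `pos` into its physical slot."""
    block = block_table[pos // BLOCK_SIZE]
    k_cache[block, pos % BLOCK_SIZE] = k

for pos in range(6):  # fill six token positions across the two blocks
    write_key(pos, torch.randn(num_heads, head_dim))
```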
New Contributors
Full Changelog: v0.1.0...v0.1.1