feat: Serving Gemma 2 with multiple LoRA adapters with Text Generation Inference (TGI) on Vertex AI notebook (#1586)

# Description

This notebook showcases how to deploy Gemma 2 2B from the Hugging Face
Hub with multiple LoRA adapters fine-tuned for different purposes, such
as coding or SQL, using Hugging Face's Text Generation Inference (TGI)
Deep Learning Container (DLC) in combination with a [custom
handler](https://huggingface.co/docs/inference-endpoints/en/guides/custom_handler#create-custom-inference-handler)
on Vertex AI.
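
For orientation, here is a minimal sketch of the general flow the notebook implements: upload the TGI DLC with the base model and a comma-separated list of LoRA adapters, deploy it to a GPU-backed Vertex AI endpoint, and select an adapter per request via the `adapter_id` generation parameter. The container URI, adapter repository IDs, project, and machine settings below are illustrative placeholders, not the notebook's exact values or its custom-handler setup:

```python
import os

from google.cloud import aiplatform

aiplatform.init(project="your-gcp-project", location="us-central1")

# Upload the model pointing at the Hugging Face TGI Deep Learning Container.
# TGI loads the base model from MODEL_ID and every adapter listed
# (comma-separated Hub repos) in LORA_ADAPTERS at startup.
model = aiplatform.Model.upload(
    display_name="gemma-2-2b-multi-lora",
    # Placeholder image URI - use the TGI DLC referenced in the notebook.
    serving_container_image_uri="<TGI_DLC_IMAGE_URI>",
    serving_container_environment_variables={
        "MODEL_ID": "google/gemma-2-2b-it",
        # Placeholder adapter repos fine-tuned for coding and SQL.
        "LORA_ADAPTERS": "your-org/gemma-2-2b-coding-lora,your-org/gemma-2-2b-sql-lora",
        "HF_TOKEN": os.environ["HF_TOKEN"],  # Gemma is a gated model on the Hub
        "NUM_SHARD": "1",
    },
    serving_container_ports=[8080],
)

# Deploy onto a GPU endpoint (machine/accelerator choice is illustrative).
endpoint = model.deploy(
    machine_type="g2-standard-4",
    accelerator_type="NVIDIA_L4",
    accelerator_count=1,
)

# At inference time, TGI routes the request to a specific adapter via the
# `adapter_id` parameter; omit it to query the base model instead.
response = endpoint.predict(
    instances=[
        {
            "inputs": "Write a SQL query that returns the ten most recent orders.",
            "parameters": {
                "adapter_id": "your-org/gemma-2-2b-sql-lora",
                "max_new_tokens": 256,
            },
        }
    ]
)
print(response.predictions[0])
```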

---------

Co-authored-by: Holt Skinner <[email protected]>
Co-authored-by: Holt Skinner <[email protected]>
3 people authored Jan 9, 2025
1 parent bd6f555 commit 924c851
Showing 3 changed files with 1,548 additions and 2 deletions.
8 changes: 6 additions & 2 deletions .github/actions/spelling/allow.txt
@@ -779,6 +779,7 @@ getdata
getexif
getparent
gfile
gguf
gidiyor
github
gitleaks
@@ -893,6 +894,7 @@ lru
lsb
lxml
lycra
magicoder
magika
mahut
makeover
@@ -960,6 +962,7 @@ ngrams
nlp
nmade
nmilitary
nmy
noabe
nobserved
nodularis
@@ -1031,6 +1034,7 @@ projectid
proname
protobuf
pstotext
pth
pubmed
pubspec
putalpha
@@ -1208,16 +1212,16 @@ wdir
weaviate
webcam
webclient
webfonts
webpage
webpages
webfonts
webrtc
websites
weightage
welcom
werden
whatsapp
wght
whatsapp
wiffle
wikipedia
wil
5 changes: 5 additions & 0 deletions open-models/README.md
@@ -9,11 +9,16 @@ This repository contains examples for deploying and fine-tuning open source mode
- [serving/cloud_run_ollama_gemma2_rag_qa.ipynb](./serving/cloud_run_ollama_gemma2_rag_qa.ipynb) - This notebook provides steps and code to deploy an open source RAG pipeline to Cloud Run using Ollama and the Gemma 2 model.
- [serving/vertex_ai_text_generation_inference_gemma.ipynb](./serving/vertex_ai_text_generation_inference_gemma.ipynb) - This notebook provides steps and code to deploy Google Gemma with the Hugging Face DLC for Text Generation Inference (TGI) on Vertex AI.
- [serving/vertex_ai_pytorch_inference_paligemma_with_custom_handler.ipynb](./serving/vertex_ai_pytorch_inference_paligemma_with_custom_handler.ipynb) - This notebook provides steps and code to deploy Google PaliGemma with the Hugging Face Python Inference DLC using a custom handler on Vertex AI.
- [serving/vertex_ai_tgi_gemma_multi_lora_adapters_deployment.ipynb](./serving/vertex_ai_tgi_gemma_multi_lora_adapters_deployment.ipynb) - This notebook showcases how to deploy Gemma 2 from the Hugging Face Hub with multiple LoRA adapters fine-tuned for different purposes, such as coding or SQL, using Hugging Face's Text Generation Inference (TGI) Deep Learning Container (DLC) in combination with a custom handler on Vertex AI.

### Fine-tuning

- [fine-tuning/vertex_ai_trl_fine_tuning_gemma.ipynb](./fine-tuning/vertex_ai_trl_fine_tuning_gemma.ipynb) - This notebook provides steps and code to fine-tune Google Gemma with TRL via the Hugging Face PyTorch DLC for Training on Vertex AI.

### Evaluation

- [evaluation/vertex_ai_tgi_gemma_with_genai_evaluation.ipynb](./evaluation/vertex_ai_tgi_gemma_with_genai_evaluation.ipynb) - This notebook provides steps and code to use the Vertex AI Gen AI Evaluation framework to evaluate Gemma 2 on a summarization task.

### Use cases

- [use-cases/guess_app.ipynb](./use-cases/guess_app.ipynb) - This notebook shows how to build a "Guess Who or What" app using FLUX and Gemini.