[Inference] Integrate chat template in llm-on-ray (#199)
* integrate inference chat template

Signed-off-by: minmingzhu <[email protected]>

* update

Signed-off-by: minmingzhu <[email protected]>

* update

Signed-off-by: minmingzhu <[email protected]>

* update

Signed-off-by: minmingzhu <[email protected]>

* update

Signed-off-by: minmingzhu <[email protected]>

* update

Signed-off-by: minmingzhu <[email protected]>

* Update query_http_requests.py

* update

Signed-off-by: minmingzhu <[email protected]>

* update

Signed-off-by: minmingzhu <[email protected]>

* update

* update

* update

* update yaml file

* update yaml

* format yaml

* update

* Update mpt_deltatuner.yaml

* update

* Update neural-chat-7b-v3-1.yaml

* Update predictor_deployment.py

* 1. add jinja file
2. add chat template unit test
3. fix comments

Signed-off-by: minmingzhu <[email protected]>

* add license header

Signed-off-by: minmingzhu <[email protected]>

* update

Signed-off-by: minmingzhu <[email protected]>

* update

Signed-off-by: minmingzhu <[email protected]>

* update

Signed-off-by: minmingzhu <[email protected]>

* Update bloom-560m-ci.yaml

* debug CI

Signed-off-by: minmingzhu <[email protected]>

* debug CI

Signed-off-by: minmingzhu <[email protected]>

* Update VLLM installation script and documentation (#212)

* Update VLLM installation script and documentation

Signed-off-by: Wu, Xiaochang <[email protected]>

* nit

Signed-off-by: Wu, Xiaochang <[email protected]>

* Update vLLM installation message

Signed-off-by: Wu, Xiaochang <[email protected]>

* Update installation instructions for vLLM CPU

Signed-off-by: Wu, Xiaochang <[email protected]>

* Update Dockerfile.vllm

Signed-off-by: Wu, Xiaochang <[email protected]>

* Update VLLM version to 0.4.1

Signed-off-by: Wu, Xiaochang <[email protected]>

* update doc

Signed-off-by: Wu, Xiaochang <[email protected]>

* nit

Signed-off-by: Wu, Xiaochang <[email protected]>

* nit

Signed-off-by: Wu, Xiaochang <[email protected]>

---------

Signed-off-by: Wu, Xiaochang <[email protected]>

* [Workflow] Unify Docker operations into bash (#123)

* docker2sh test

* codepath

* codepath

* codepath

* add

* add

* add

* add

* add

* add

* df

* docker.sh

* docker bash

* docker bash

* docker bash

* docker bash

* inference docker bash

* merge main0312

* merge main0312

* merge main0312

* test set-e

* fix test

* fix

* fix

* fix

* test error

* test error

* add map

* test install error

* test install error

* test install error

* test install error

* test

* test

* fix

* fix

* fix

* only inference

* fix

* fix

* fix

* target

* target

* target

* fix proxy

* fix proxy

* fix proxy

* fix proxy

* fix proxy

* fix proxy

* fix proxy

* fix func

* fix func

* fix func

* all inference

* add finetune

* fix

* fix

* fix

* fix

* fix finetune

* fix finetune

* fix review

* fix review

* fix review

* add info output

* Update proxy settings and Docker configurations

Signed-off-by: Wu, Xiaochang <[email protected]>

* fix vllm pr212

* fix

* fix

* change name

---------

Signed-off-by: Wu, Xiaochang <[email protected]>
Co-authored-by: Wu, Xiaochang <[email protected]>

* fix comments

Signed-off-by: minmingzhu <[email protected]>

* update code style

Signed-off-by: minmingzhu <[email protected]>

* Fix openai response for vLLM (#213)

* [CI] Add llama2-70b inference workflow (#208)

* add llama-2-70b

* nit

* fix vllm inference ci

* Revert "fix vllm inference ci"

This reverts commit 36062bd.

* Fix StoppingCriteriaSub parameters to be compatible with latest Transformers (#215)

* 1. fix CI
2. fix comments

Signed-off-by: minmingzhu <[email protected]>

* format

Signed-off-by: minmingzhu <[email protected]>

* modify jinja path

Signed-off-by: minmingzhu <[email protected]>

* fix comments

Signed-off-by: minmingzhu <[email protected]>

* update

Signed-off-by: minmingzhu <[email protected]>

* fix comments

Signed-off-by: minmingzhu <[email protected]>

* update

Signed-off-by: minmingzhu <[email protected]>

* update

Signed-off-by: minmingzhu <[email protected]>

* update jinja

Signed-off-by: minmingzhu <[email protected]>

* update jinja file

Signed-off-by: minmingzhu <[email protected]>

---------

Signed-off-by: minmingzhu <[email protected]>
Signed-off-by: Wu, Xiaochang <[email protected]>
Signed-off-by: minmingzhu <[email protected]>
Co-authored-by: Xiaochang Wu <[email protected]>
Co-authored-by: yutianchen <[email protected]>
Co-authored-by: KepingYan <[email protected]>
Co-authored-by: Yizhong Zhang <[email protected]>
Co-authored-by: Zhi Lin <[email protected]>
6 people authored May 16, 2024
1 parent 9142112 commit 620800f
Showing 53 changed files with 523 additions and 580 deletions.
7 changes: 1 addition & 6 deletions .github/workflows/config/bloom-560m-ci.yaml
@@ -13,9 +13,4 @@ ipex:
 model_description:
   model_id_or_path: bigscience/bloom-560m
   tokenizer_name_or_path: bigscience/bloom-560m
-  chat_processor: ChatModelGptJ
-  prompt:
-    intro: ''
-    human_id: ''
-    bot_id: ''
-    stop_words: []
+  chat_template: "llm_on_ray/inference/models/templates/template_gpt2.jinja"
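The diffs in this commit replace the hard-coded `prompt` fields with a single `chat_template` key pointing at a Jinja file. As a rough, pure-Python stand-in (this sketch is not the project's actual template or API, only an illustration of the idea), a chat template turns a list of role/content messages into one prompt string:

```python
# Minimal stand-in for what a chat template does: render role/content
# messages into a single prompt string. The real thing is a Jinja file
# (e.g. template_gpt2.jinja); this sketch only mirrors the concept.
def render_chat(messages):
    lines = []
    for m in messages:
        # Each message contributes "role: content" on its own line.
        lines.append(f"{m['role']}: {m['content']}")
    return "\n".join(lines) + "\n"

prompt = render_chat([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"},
])
print(prompt)
```

Moving this logic into a per-model template file is what lets the per-model `chat_processor`/`prompt` config entries below be deleted.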
7 changes: 1 addition & 6 deletions .github/workflows/config/gpt2-ci.yaml
@@ -14,10 +14,5 @@ ipex:
 model_description:
   model_id_or_path: gpt2
   tokenizer_name_or_path: gpt2
-  chat_processor: ChatModelGptJ
   gpt_base_model: true
-  prompt:
-    intro: ''
-    human_id: ''
-    bot_id: ''
-    stop_words: []
+  chat_template: "llm_on_ray/inference/models/templates/template_gpt2.jinja"
8 changes: 0 additions & 8 deletions .github/workflows/config/llama-2-7b-chat-hf-vllm-fp32.yaml
@@ -16,13 +16,5 @@ ipex:
 model_description:
   model_id_or_path: meta-llama/Llama-2-7b-chat-hf
   tokenizer_name_or_path: meta-llama/Llama-2-7b-chat-hf
-  chat_processor: ChatModelLLama
-  prompt:
-    intro: ''
-    human_id: '[INST] {msg} [/INST]
-
-      '
-    bot_id: ''
-    stop_words: []
 config:
   use_auth_token: ''
13 changes: 0 additions & 13 deletions .github/workflows/config/mpt_deltatuner.yaml
@@ -13,20 +13,7 @@ ipex:
 model_description:
   model_id_or_path: mosaicml/mpt-7b
   tokenizer_name_or_path: EleutherAI/gpt-neox-20b
-  chat_processor: ChatModelGptJ
   peft_model_id_or_path: nathan0/mpt-7b-deltatuner-model
   peft_type: deltatuner
-  prompt:
-    intro: 'Below is an instruction that describes a task, paired with an input that
-      provides further context. Write a response that appropriately completes the request.
-
-      '
-    human_id: '
-
-      ### Instruction'
-    bot_id: '
-
-      ### Response'
-    stop_words: []
 config:
   trust_remote_code: true
13 changes: 0 additions & 13 deletions .github/workflows/config/mpt_deltatuner_deepspeed.yaml
@@ -13,20 +13,7 @@ ipex:
 model_description:
   model_id_or_path: mosaicml/mpt-7b
   tokenizer_name_or_path: EleutherAI/gpt-neox-20b
-  chat_processor: ChatModelGptJ
   peft_model_id_or_path: nathan0/mpt-7b-deltatuner-model
   peft_type: deltatuner
-  prompt:
-    intro: 'Below is an instruction that describes a task, paired with an input that
-      provides further context. Write a response that appropriately completes the request.
-
-      '
-    human_id: '
-
-      ### Instruction'
-    bot_id: '
-
-      ### Response'
-    stop_words: []
 config:
   trust_remote_code: true
7 changes: 1 addition & 6 deletions .github/workflows/config/opt-125m-ci.yaml
@@ -13,9 +13,4 @@ ipex:
 model_description:
   model_id_or_path: facebook/opt-125m
   tokenizer_name_or_path: facebook/opt-125m
-  chat_processor: ChatModelGptJ
-  prompt:
-    intro: ''
-    human_id: ''
-    bot_id: ''
-    stop_words: []
+  chat_template: "llm_on_ray/inference/models/templates/template_gpt2.jinja"
2 changes: 1 addition & 1 deletion README.md
@@ -80,7 +80,7 @@ curl $ENDPOINT_URL/chat/completions \
   -H "Content-Type: application/json" \
   -d '{
     "model": "gpt2",
-    "messages": [{"role": "assistant", "content": "You are a helpful assistant."}, {"role": "user", "content": "Hello!"}],
+    "messages": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "Hello!"}],
     "temperature": 0.7
   }'
2 changes: 1 addition & 1 deletion docs/serve.md
@@ -52,7 +52,7 @@ curl $ENDPOINT_URL/chat/completions \
   -H "Content-Type: application/json" \
   -d '{
     "model": $MODEL_NAME,
-    "messages": [{"role": "assistant", "content": "You are a helpful assistant."}, {"role": "user", "content": "Hello!"}],
+    "messages": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "Hello!"}],
     "temperature": 0.7
  }'
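The README and docs diffs above correct the priming message's role from "assistant" to "system". A small stdlib-only sketch of the corrected request body (the model name is the example's; the endpoint URL is a placeholder and the send step is shown only as a comment):

```python
import json

# Corrected chat payload: the priming message uses the "system" role,
# not "assistant", matching the OpenAI-style messages format.
body = {
    "model": "gpt2",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"},
    ],
    "temperature": 0.7,
}
payload = json.dumps(body)
print(payload)
# Sending it would look roughly like (ENDPOINT_URL is a placeholder,
# not executed here):
# req = urllib.request.Request(ENDPOINT_URL + "/chat/completions",
#     data=payload.encode(),
#     headers={"Content-Type": "application/json"})
```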
@@ -75,11 +75,11 @@
 ]
 messages = [
     [
-        {"role": "user", "content": "You are a helpful assistant"},
+        {"role": "system", "content": "You are a helpful assistant"},
         {"role": "user", "content": "What's the weather like in Boston today?"},
     ],
     [
-        {"role": "user", "content": "You are a helpful assistant"},
+        {"role": "system", "content": "You are a helpful assistant"},
         {"role": "user", "content": "Tell me a short joke?"},
     ],
 ]
@@ -58,7 +58,7 @@
 body = {
     "model": args.model_name,
     "messages": [
-        {"role": "assistant", "content": "You are a helpful assistant."},
+        {"role": "system", "content": "You are a helpful assistant."},
         {"role": "user", "content": args.input_text},
     ],
     "stream": args.streaming_response,
@@ -73,7 +73,7 @@

 messages = [
     [
-        {"role": "user", "content": "You are a helpful assistant"},
+        {"role": "system", "content": "You are a helpful assistant"},
         {"role": "user", "content": "What's the weather like in Boston today?"},
     ],
 ]
222 changes: 0 additions & 222 deletions llm_on_ray/inference/chat_process.py

This file was deleted.
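The deleted chat_process.py held the per-model processor classes (such as ChatModelGptJ and ChatModelLLama, named in the config diffs above) that assembled prompts from the intro/human_id/bot_id fields. An illustrative contrast only (the function below is hypothetical, not the deleted classes' real API):

```python
# Hypothetical sketch of the old approach: stitch per-model config
# fields (intro, human_id, bot_id) around each user message, as the
# removed prompt blocks in the MPT configs did.
def old_style_prompt(intro, human_id, bot_id, user_msg):
    return f"{intro}{human_id}\n{user_msg}{bot_id}\n"

p = old_style_prompt(
    intro="Below is an instruction that describes a task. ",
    human_id="### Instruction",
    bot_id="### Response",
    user_msg="Summarize this file.",
)
print(p)
# After this commit, the same structure lives in a Jinja chat template
# file referenced by each model's chat_template config key instead of
# in Python code.
```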

