[Inference] Integrate chat template in llm-on-ray (#199)
* integrate inference chat template

Signed-off-by: minmingzhu <[email protected]>

* update

Signed-off-by: minmingzhu <[email protected]>

* update

Signed-off-by: minmingzhu <[email protected]>

* update

Signed-off-by: minmingzhu <[email protected]>

* update

Signed-off-by: minmingzhu <[email protected]>

* update

Signed-off-by: minmingzhu <[email protected]>

* Update query_http_requests.py

* update

Signed-off-by: minmingzhu <[email protected]>

* update

Signed-off-by: minmingzhu <[email protected]>

* update

* update

* update

* update yaml file

* update yaml

* format yaml

* update

* Update mpt_deltatuner.yaml

* update

* Update neural-chat-7b-v3-1.yaml

* Update predictor_deployment.py

* 1. add jinja file
2. add chat template unit test
3. fix comments

Signed-off-by: minmingzhu <[email protected]>

* add license header

Signed-off-by: minmingzhu <[email protected]>

* update

Signed-off-by: minmingzhu <[email protected]>

* update

Signed-off-by: minmingzhu <[email protected]>

* update

Signed-off-by: minmingzhu <[email protected]>

* Update bloom-560m-ci.yaml

* debug CI

Signed-off-by: minmingzhu <[email protected]>

* debug CI

Signed-off-by: minmingzhu <[email protected]>

* Update VLLM installation script and documentation (#212)

* Update VLLM installation script and documentation

Signed-off-by: Wu, Xiaochang <[email protected]>

* nit

Signed-off-by: Wu, Xiaochang <[email protected]>

* Update vLLM installation message

Signed-off-by: Wu, Xiaochang <[email protected]>

* Update installation instructions for vLLM CPU

Signed-off-by: Wu, Xiaochang <[email protected]>

* Update Dockerfile.vllm

Signed-off-by: Wu, Xiaochang <[email protected]>

* Update VLLM version to 0.4.1

Signed-off-by: Wu, Xiaochang <[email protected]>

* update doc

Signed-off-by: Wu, Xiaochang <[email protected]>

* nit

Signed-off-by: Wu, Xiaochang <[email protected]>

* nit

Signed-off-by: Wu, Xiaochang <[email protected]>

---------

Signed-off-by: Wu, Xiaochang <[email protected]>

* [Workflow] Unify Docker operations into bash (#123)

* docker2sh test

* codepath

* codepath

* codepath

* add

* add

* add

* add

* add

* add

* df

* docker.sh

* docker bash

* docker bash

* docker bash

* docker bash

* inference docker bash

* merge main0312

* merge main0312

* merge main0312

* test set-e

* fix test

* fix

* fix

* fix

* test error

* test error

* add map

* test install error

* test install error

* test install error

* test install error

* test

* test

* fix

* fix

* fix

* only inference

* fix

* fix

* fix

* target

* target

* target

* fix proxy

* fix proxy

* fix proxy

* fix proxy

* fix proxy

* fix proxy

* fix proxy

* fix func

* fix func

* fix func

* all inference

* add finetune

* fix

* fix

* fix

* fix

* fix finetune

* fix finetune

* fix review

* fix review

* fix review

* add info output

* Update proxy settings and Docker configurations

Signed-off-by: Wu, Xiaochang <[email protected]>

* fix vllm pr212

* fix

* fix

* change name

---------

Signed-off-by: Wu, Xiaochang <[email protected]>
Co-authored-by: Wu, Xiaochang <[email protected]>

* fix comments

Signed-off-by: minmingzhu <[email protected]>

* update code style

Signed-off-by: minmingzhu <[email protected]>

* Fix openai response for vLLM (#213)

* [CI] Add llama2-70b inference workflow (#208)

* add llama-2-70b

* nit

* fix vllm inference ci

* Revert "fix vllm inference ci"

This reverts commit 36062bd.

* Fix StoppingCriteriaSub parameters to be compatible with latest Transformers (#215)

* 1. fix CI
2. fix comments

Signed-off-by: minmingzhu <[email protected]>

* format

Signed-off-by: minmingzhu <[email protected]>

* modify jinja path

Signed-off-by: minmingzhu <[email protected]>

* fix comments

Signed-off-by: minmingzhu <[email protected]>

* update

Signed-off-by: minmingzhu <[email protected]>

* fix comments

Signed-off-by: minmingzhu <[email protected]>

* update

Signed-off-by: minmingzhu <[email protected]>

* update

Signed-off-by: minmingzhu <[email protected]>

* update jinja

Signed-off-by: minmingzhu <[email protected]>

* update jinja file

Signed-off-by: minmingzhu <[email protected]>

---------

Signed-off-by: minmingzhu <[email protected]>
Signed-off-by: Wu, Xiaochang <[email protected]>
Signed-off-by: minmingzhu <[email protected]>
Co-authored-by: Xiaochang Wu <[email protected]>
Co-authored-by: yutianchen <[email protected]>
Co-authored-by: KepingYan <[email protected]>
Co-authored-by: Yizhong Zhang <[email protected]>
Co-authored-by: Zhi Lin <[email protected]>
6 people authored May 16, 2024
1 parent 9142112 commit 620800f
Showing 53 changed files with 523 additions and 580 deletions.
7 changes: 1 addition & 6 deletions .github/workflows/config/bloom-560m-ci.yaml
@@ -13,9 +13,4 @@ ipex:
 model_description:
   model_id_or_path: bigscience/bloom-560m
   tokenizer_name_or_path: bigscience/bloom-560m
-  chat_processor: ChatModelGptJ
-  prompt:
-    intro: ''
-    human_id: ''
-    bot_id: ''
-    stop_words: []
+  chat_template: "llm_on_ray/inference/models/templates/template_gpt2.jinja"
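The diffs in this commit replace the hard-coded `prompt` fields with a single `chat_template` key pointing at a Jinja file. As a rough, pure-Python stand-in (this sketch is not the project's actual template or API, only an illustration of the idea), a chat template turns a list of role/content messages into one prompt string:

```python
# Minimal stand-in for what a chat template does: render role/content
# messages into a single prompt string. The real thing is a Jinja file
# (e.g. template_gpt2.jinja); this sketch only mirrors the concept.
def render_chat(messages):
    lines = []
    for m in messages:
        # Each message contributes "role: content" on its own line.
        lines.append(f"{m['role']}: {m['content']}")
    return "\n".join(lines) + "\n"

prompt = render_chat([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"},
])
print(prompt)
```

Moving this logic into a per-model template file is what lets the per-model `chat_processor`/`prompt` config entries below be deleted.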
7 changes: 1 addition & 6 deletions .github/workflows/config/gpt2-ci.yaml
@@ -14,10 +14,5 @@ ipex:
 model_description:
   model_id_or_path: gpt2
   tokenizer_name_or_path: gpt2
-  chat_processor: ChatModelGptJ
   gpt_base_model: true
-  prompt:
-    intro: ''
-    human_id: ''
-    bot_id: ''
-    stop_words: []
+  chat_template: "llm_on_ray/inference/models/templates/template_gpt2.jinja"
8 changes: 0 additions & 8 deletions .github/workflows/config/llama-2-7b-chat-hf-vllm-fp32.yaml
@@ -16,13 +16,5 @@ ipex:
 model_description:
   model_id_or_path: meta-llama/Llama-2-7b-chat-hf
   tokenizer_name_or_path: meta-llama/Llama-2-7b-chat-hf
-  chat_processor: ChatModelLLama
-  prompt:
-    intro: ''
-    human_id: '[INST] {msg} [/INST]
-
-      '
-    bot_id: ''
-    stop_words: []
 config:
   use_auth_token: ''
13 changes: 0 additions & 13 deletions .github/workflows/config/mpt_deltatuner.yaml
@@ -13,20 +13,7 @@ ipex:
 model_description:
   model_id_or_path: mosaicml/mpt-7b
   tokenizer_name_or_path: EleutherAI/gpt-neox-20b
-  chat_processor: ChatModelGptJ
   peft_model_id_or_path: nathan0/mpt-7b-deltatuner-model
   peft_type: deltatuner
-  prompt:
-    intro: 'Below is an instruction that describes a task, paired with an input that
-      provides further context. Write a response that appropriately completes the request.
-
-      '
-    human_id: '
-
-      ### Instruction'
-    bot_id: '
-
-      ### Response'
-    stop_words: []
 config:
   trust_remote_code: true
13 changes: 0 additions & 13 deletions .github/workflows/config/mpt_deltatuner_deepspeed.yaml
@@ -13,20 +13,7 @@ ipex:
 model_description:
   model_id_or_path: mosaicml/mpt-7b
   tokenizer_name_or_path: EleutherAI/gpt-neox-20b
-  chat_processor: ChatModelGptJ
   peft_model_id_or_path: nathan0/mpt-7b-deltatuner-model
   peft_type: deltatuner
-  prompt:
-    intro: 'Below is an instruction that describes a task, paired with an input that
-      provides further context. Write a response that appropriately completes the request.
-
-      '
-    human_id: '
-
-      ### Instruction'
-    bot_id: '
-
-      ### Response'
-    stop_words: []
 config:
   trust_remote_code: true
7 changes: 1 addition & 6 deletions .github/workflows/config/opt-125m-ci.yaml
@@ -13,9 +13,4 @@ ipex:
 model_description:
   model_id_or_path: facebook/opt-125m
   tokenizer_name_or_path: facebook/opt-125m
-  chat_processor: ChatModelGptJ
-  prompt:
-    intro: ''
-    human_id: ''
-    bot_id: ''
-    stop_words: []
+  chat_template: "llm_on_ray/inference/models/templates/template_gpt2.jinja"
2 changes: 1 addition & 1 deletion README.md
@@ -80,7 +80,7 @@ curl $ENDPOINT_URL/chat/completions \
   -H "Content-Type: application/json" \
   -d '{
     "model": "gpt2",
-    "messages": [{"role": "assistant", "content": "You are a helpful assistant."}, {"role": "user", "content": "Hello!"}],
+    "messages": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "Hello!"}],
     "temperature": 0.7
   }'
2 changes: 1 addition & 1 deletion docs/serve.md
@@ -52,7 +52,7 @@ curl $ENDPOINT_URL/chat/completions \
   -H "Content-Type: application/json" \
   -d '{
     "model": $MODEL_NAME,
-    "messages": [{"role": "assistant", "content": "You are a helpful assistant."}, {"role": "user", "content": "Hello!"}],
+    "messages": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "Hello!"}],
     "temperature": 0.7
  }'
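The README and docs diffs above correct the priming message's role from "assistant" to "system". A small stdlib-only sketch of the corrected request body (the model name is the example's; the endpoint URL is a placeholder and the send step is shown only as a comment):

```python
import json

# Corrected chat payload: the priming message uses the "system" role,
# not "assistant", matching the OpenAI-style messages format.
body = {
    "model": "gpt2",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"},
    ],
    "temperature": 0.7,
}
payload = json.dumps(body)
print(payload)
# Sending it would look roughly like (ENDPOINT_URL is a placeholder,
# not executed here):
# req = urllib.request.Request(ENDPOINT_URL + "/chat/completions",
#     data=payload.encode(),
#     headers={"Content-Type": "application/json"})
```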
@@ -75,11 +75,11 @@
 ]
 messages = [
     [
-        {"role": "user", "content": "You are a helpful assistant"},
+        {"role": "system", "content": "You are a helpful assistant"},
         {"role": "user", "content": "What's the weather like in Boston today?"},
     ],
     [
-        {"role": "user", "content": "You are a helpful assistant"},
+        {"role": "system", "content": "You are a helpful assistant"},
         {"role": "user", "content": "Tell me a short joke?"},
     ],
 ]
@@ -58,7 +58,7 @@
 body = {
     "model": args.model_name,
     "messages": [
-        {"role": "assistant", "content": "You are a helpful assistant."},
+        {"role": "system", "content": "You are a helpful assistant."},
         {"role": "user", "content": args.input_text},
     ],
     "stream": args.streaming_response,
@@ -73,7 +73,7 @@

 messages = [
     [
-        {"role": "user", "content": "You are a helpful assistant"},
+        {"role": "system", "content": "You are a helpful assistant"},
         {"role": "user", "content": "What's the weather like in Boston today?"},
     ],
 ]
222 changes: 0 additions & 222 deletions llm_on_ray/inference/chat_process.py

This file was deleted.
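The deleted chat_process.py held the per-model processor classes (such as ChatModelGptJ and ChatModelLLama, named in the config diffs above) that assembled prompts from the intro/human_id/bot_id fields. An illustrative contrast only (the function below is hypothetical, not the deleted classes' real API):

```python
# Hypothetical sketch of the old approach: stitch per-model config
# fields (intro, human_id, bot_id) around each user message, as the
# removed prompt blocks in the MPT configs did.
def old_style_prompt(intro, human_id, bot_id, user_msg):
    return f"{intro}{human_id}\n{user_msg}{bot_id}\n"

p = old_style_prompt(
    intro="Below is an instruction that describes a task. ",
    human_id="### Instruction",
    bot_id="### Response",
    user_msg="Summarize this file.",
)
print(p)
# After this commit, the same structure lives in a Jinja chat template
# file referenced by each model's chat_template config key instead of
# in Python code.
```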

