Add benchmark run script, figure visualization script (#195)
* add benchmark run script, visualize script

* upd

* update multi replicas

* use --result-dir to parse results

* fix ci proxy

* add test ci

* add license

* fix

* fix

* fix ci

* fix ci

* add package matplotlib

* verify CI test

* verify CI test

* create assets folder to place pictures

* verify CI test

* move benchmark ci test to self-host nodes

* fix ci

* fix ci

* fix ci

* fix ci

* fix ci

* fix ci

* fix ci

* fix ci

* fix ci

* fix ci

* fix ci

* fix ci

* fix ci

* update choice3_tokens_32_64.png
KepingYan authored Jun 4, 2024
1 parent fb8542d commit 4fa68af
Showing 20 changed files with 849 additions and 5 deletions.
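The visible diffs below cover the CI wiring and the benchmarks/benchmark_serving.py changes; the new run and visualization scripts are among the remaining changed files. As a rough illustration of the flow the commit message describes (run benchmarks, then point a visualization script at the results via --result-dir and save figures into an assets folder, using matplotlib), here is a minimal Python sketch. The script interface, result-file layout, and field names are assumptions for illustration, not the actual scripts added in this commit.

# Illustrative sketch only: the real run/visualize scripts in this commit may use
# different file layouts, field names, and CLI flags. Assumed here: each benchmark
# run writes a JSON file into --result-dir with an average-latency field.
import argparse
import json
from pathlib import Path

import matplotlib

matplotlib.use("Agg")  # headless rendering, e.g. in CI
import matplotlib.pyplot as plt


def main() -> None:
    parser = argparse.ArgumentParser(description="Plot benchmark results (sketch).")
    parser.add_argument("--result-dir", type=Path, required=True,
                        help="Directory holding per-run JSON result files.")
    parser.add_argument("--output", type=Path, default=Path("docs/assets/latency.png"))
    args = parser.parse_args()

    labels, latencies = [], []
    for result_file in sorted(args.result_dir.glob("*.json")):
        data = json.loads(result_file.read_text())
        labels.append(result_file.stem)
        # Hypothetical field name; the actual scripts may record different metrics.
        latencies.append(data.get("avg_latency_s", 0.0))

    plt.figure(figsize=(8, 4))
    plt.bar(labels, latencies)
    plt.ylabel("Average latency (s)")
    plt.xticks(rotation=45, ha="right")
    plt.tight_layout()
    args.output.parent.mkdir(parents=True, exist_ok=True)
    plt.savefig(args.output)


if __name__ == "__main__":
    main()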
5 changes: 5 additions & 0 deletions .github/workflows/workflow_orders_nightly.yml
@@ -20,6 +20,11 @@ jobs:
    uses: ./.github/workflows/workflow_finetune.yml
    with:
      ci_type: nightly

  call-benchmark:
    uses: ./.github/workflows/workflow_test_benchmark.yml
    with:
      ci_type: nightly

  # call-finetune-on-intel-gpu:
  #   uses: ./.github/workflows/workflow_finetune_gpu.yml
4 changes: 4 additions & 0 deletions .github/workflows/workflow_orders_on_merge.yml
@@ -27,3 +27,7 @@ jobs:
  Finetune:
    needs: Lint
    uses: ./.github/workflows/workflow_finetune.yml

  Benchmark:
    needs: Lint
    uses: ./.github/workflows/workflow_test_benchmark.yml
4 changes: 4 additions & 0 deletions .github/workflows/workflow_orders_on_pr.yml
@@ -27,3 +27,7 @@ jobs:
  Finetune:
    needs: Lint
    uses: ./.github/workflows/workflow_finetune.yml

  Benchmark:
    needs: Lint
    uses: ./.github/workflows/workflow_test_benchmark.yml
127 changes: 127 additions & 0 deletions .github/workflows/workflow_test_benchmark.yml
@@ -0,0 +1,127 @@
name: Benchmark

on:
  workflow_call:
    inputs:
      ci_type:
        type: string
        default: 'pr'
      runner_container_image:
        type: string
        default: '10.1.2.13:5000/llmray-build'
      http_proxy:
        type: string
        default: 'http://10.24.221.169:911'
      https_proxy:
        type: string
        default: 'http://10.24.221.169:911'
      runner_config_path:
        type: string
        default: '/home/ci/llm-ray-actions-runner'
      code_checkout_path:
        type: string
        default: '/home/ci/actions-runner/_work/llm-on-ray/llm-on-ray'
      model_cache_path:
        type: string
        default: '/mnt/DP_disk1/huggingface/cache'

concurrency:
  group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}-bench
  cancel-in-progress: true

jobs:
  setup-test:

    name: benchmark

    runs-on: self-hosted

    defaults:
      run:
        shell: bash
    container:
      image: ${{ inputs.runner_container_image }}
      env:
        http_proxy: ${{ inputs.http_proxy }}
        https_proxy: ${{ inputs.https_proxy }}
        SHELL: bash -eo pipefail
      volumes:
        - /var/run/docker.sock:/var/run/docker.sock
        - ${{ inputs.runner_config_path }}:/root/actions-runner-config

    steps:
      - name: Checkout
        uses: actions/checkout@v4

      - name: Load environment variables
        run: cat /root/actions-runner-config/.env >> $GITHUB_ENV

      - name: Determine Target
        id: "target"
        run: |
          target="benchmark"
          target="${target}_vllm"
          echo "target is ${target}"
          echo "target=$target" >> $GITHUB_OUTPUT
      - name: Build Docker Image
        run: |
          DF_SUFFIX=".vllm"
          TARGET=${{steps.target.outputs.target}}
          docker build ./ --build-arg CACHEBUST=1 --build-arg http_proxy=${{ inputs.http_proxy }} --build-arg https_proxy=${{ inputs.https_proxy }} -f dev/docker/Dockerfile${DF_SUFFIX} -t ${TARGET}:latest
          docker container prune -f
          docker image prune -f
      - name: Start Docker Container
        run: |
          TARGET=${{steps.target.outputs.target}}
          cid=$(docker ps -q --filter "name=${TARGET}")
          if [[ ! -z "$cid" ]]; then docker stop $cid && docker rm $cid; fi
          # check and remove exited container
          cid=$(docker ps -a -q --filter "name=${TARGET}")
          if [[ ! -z "$cid" ]]; then docker rm $cid; fi
          docker run -tid -v ${{ inputs.model_cache_path }}:/root/.cache/huggingface/hub -v ${{ inputs.code_checkout_path }}:/root/llm-on-ray -e http_proxy=${{ inputs.http_proxy }} -e https_proxy=${{ inputs.https_proxy }} --name="${TARGET}" --hostname="${TARGET}-container" ${TARGET}:latest
      - name: Start Ray Cluster
        run: |
          TARGET=${{steps.target.outputs.target}}
          docker exec "${TARGET}" bash -c "./dev/scripts/start-ray-cluster.sh"
      - name: Run Benchmark Test
        run: |
          TARGET=${{steps.target.outputs.target}}
          # Additional libraries required for pytest
          docker exec "${TARGET}" bash -c "pip install -r tests/requirements.txt"
          CMD=$(cat << EOF
          import yaml
          conf_path = "llm_on_ray/inference/models/llama-2-7b-chat-hf.yaml"
          with open(conf_path, encoding="utf-8") as reader:
              result = yaml.load(reader, Loader=yaml.FullLoader)
              result['model_description']["config"]["use_auth_token"] = "${{ env.HF_ACCESS_TOKEN }}"
          with open(conf_path, 'w') as output:
              yaml.dump(result, output, sort_keys=False)
          conf_path = "llm_on_ray/inference/models/vllm/llama-2-7b-chat-hf-vllm.yaml"
          with open(conf_path, encoding="utf-8") as reader:
              result = yaml.load(reader, Loader=yaml.FullLoader)
              result['model_description']["config"]["use_auth_token"] = "${{ env.HF_ACCESS_TOKEN }}"
          with open(conf_path, 'w') as output:
              yaml.dump(result, output, sort_keys=False)
          EOF
          )
          docker exec "${TARGET}" python -c "$CMD"
          docker exec "${TARGET}" bash -c "huggingface-cli login --token ${{ env.HF_ACCESS_TOKEN }}"
          docker exec "${TARGET}" bash -c "./tests/run-tests-benchmark.sh"
      - name: Stop Ray
        run: |
          TARGET=${{steps.target.outputs.target}}
          cid=$(docker ps -q --filter "name=${TARGET}")
          if [[ ! -z "$cid" ]]; then
            docker exec "${TARGET}" bash -c "ray stop"
          fi
      - name: Stop Container
        if: success() || failure()
        run: |
          TARGET=${{steps.target.outputs.target}}
          cid=$(docker ps -q --filter "name=${TARGET}")
          if [[ ! -z "$cid" ]]; then docker stop $cid && docker rm $cid; fi
2 changes: 1 addition & 1 deletion README.md
@@ -30,7 +30,7 @@ LLM-on-Ray's modular workflow structure is designed to comprehensively cater to
* **Interactive Web UI for Enhanced Usability**: Beyond the command line, LLM-on-Ray introduces a Web UI, allowing users to easily finetune and deploy LLMs through a user-friendly interface. Additionally, the UI includes a chatbot application, enabling users to immediately test and refine the models.


![llm-on-ray](https://github.com/intel/llm-on-ray/assets/9278199/68017c14-c0be-4b91-8d71-4b74ab89bd81)
![llm-on-ray](./docs/assets/solution_technical_overview.png)


## Getting Started
6 changes: 6 additions & 0 deletions benchmarks/benchmark_serving.py
@@ -477,6 +477,7 @@ def main(args: argparse.Namespace):
        config["top_p"] = float(args.top_p)
    if args.top_k:
        config["top_k"] = float(args.top_k)
    config["do_sample"] = args.do_sample
    # In order to align with vllm test parameters
    if args.vllm_engine:
        config["ignore_eos"] = True
@@ -734,6 +735,11 @@ def main(args: argparse.Namespace):
        help="The number of highest probability vocabulary tokens to keep \
            for top-k-filtering.",
    )
    parser.add_argument(
        "--do_sample",
        action="store_true",
        help="Whether or not to use sampling; use greedy decoding otherwise.",
    )
    parser.add_argument(
        "--vllm-engine",
        action="store_true",
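Taken together, the two benchmark_serving.py hunks above define a --do_sample flag and forward it into the request config alongside top_p, top_k, and the vllm-specific ignore_eos setting. A condensed sketch of that flow, with the surrounding parser and config code paraphrased rather than copied from the full script:

# Condensed sketch of how the new --do_sample flag reaches the request config,
# mirroring the two hunks above. Only the flags shown in the diff are included;
# the real benchmarks/benchmark_serving.py defines many more options.
import argparse

parser = argparse.ArgumentParser(description="Benchmark serving (sketch).")
parser.add_argument("--top_p", type=float, default=None)
parser.add_argument("--top_k", type=int, default=None)
parser.add_argument(
    "--do_sample",
    action="store_true",
    help="Whether or not to use sampling; use greedy decoding otherwise.",
)
parser.add_argument("--vllm-engine", action="store_true")
args = parser.parse_args()

config: dict = {}
if args.top_p:
    config["top_p"] = float(args.top_p)
if args.top_k:
    config["top_k"] = float(args.top_k)
config["do_sample"] = args.do_sample  # greedy decoding unless the flag is passed
if args.vllm_engine:
    config["ignore_eos"] = True  # align with vllm test parameters
print(config)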