[Inference] Integrate vllm example (#262)
Commit message:
* integrate vllm example
* use vllm by default, remove ipex & deepspeed related configs
* modify CI so the default is the vllm engine
* fix CI; remove google/gemma2 due to the Hugging Face token requirement
* fix CI & add params; placement_bundles is no longer used when tp=1
* move the vllm engine into the route actor
* revert precision; use fp32 because IPEX bf16 needs AVX512
* address comments
Showing 43 changed files with 364 additions and 248 deletions.
@@ -7,13 +7,18 @@ WORKDIR /root/llm-on-ray
 COPY ./pyproject.toml .
 COPY ./MANIFEST.in .

-# create llm_on_ray package directory to bypass the following 'pip install -e' command
+# Create llm_on_ray package directory to bypass the following 'pip install -e' command
 RUN mkdir ./llm_on_ray

 RUN pip install -e . && \
     pip install --upgrade-strategy eager optimum[habana] && \
     pip install git+https://github.com/HabanaAI/[email protected]

+# Install vllm habana env
+RUN pip install -v git+https://github.com/HabanaAI/vllm-fork.git@cf6952d
+# Reinstall ray because vllm downgrades the ray version
+RUN pip install "ray>=2.10" "ray[serve,tune]>=2.10"
+
 # Optional. Comment out if you are not using UI
 COPY ./dev/scripts/install-ui.sh /tmp

@@ -30,3 +35,4 @@ ENV RAY_EXPERIMENTAL_NOSET_HABANA_VISIBLE_MODULES=1
 ENV PT_HPU_LAZY_ACC_PAR_MODE=0

+ENV PT_HPU_ENABLE_LAZY_COLLECTIVES=true
New file — 47 additions:

@@ -0,0 +1,47 @@
# syntax=docker/dockerfile:1
FROM ubuntu:22.04

ENV LANG C.UTF-8

WORKDIR /root/llm-on-ray

RUN --mount=type=cache,target=/var/cache/apt apt-get update -y \
    && apt-get install -y build-essential cmake wget curl git vim htop ssh net-tools \
    && apt-get clean \
    && rm -rf /var/lib/apt/lists/*

ENV CONDA_DIR /opt/conda
RUN wget --quiet https://github.com/conda-forge/miniforge/releases/download/23.3.1-1/Miniforge3-Linux-x86_64.sh -O ~/miniforge.sh && \
    /bin/bash ~/miniforge.sh -b -p /opt/conda
ENV PATH $CONDA_DIR/bin:$PATH

# Set up the shell environment
SHELL ["/bin/bash", "--login", "-c"]

RUN --mount=type=cache,target=/opt/conda/pkgs conda init bash && \
    unset -f conda && \
    export PATH=$CONDA_DIR/bin/:${PATH} && \
    mamba config --add channels intel && \
    mamba install -y -c conda-forge python==3.9 gxx=12.3 gxx_linux-64=12.3 libxcrypt

COPY ./pyproject.toml .
COPY ./MANIFEST.in .
COPY ./dev/scripts/install-vllm-cpu.sh .

# Create llm_on_ray package directory to bypass the following 'pip install -e' command
RUN mkdir ./llm_on_ray

RUN --mount=type=cache,target=/root/.cache/pip pip install -e .[cpu,deepspeed] --extra-index-url https://download.pytorch.org/whl/cpu \
    --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/cpu/us/

RUN ds_report

# Used to invalidate docker build cache with --build-arg CACHEBUST=$(date +%s)
ARG CACHEBUST=1
COPY ./dev/scripts/install-oneapi.sh /tmp
RUN /tmp/install-oneapi.sh

# Install vllm-cpu
# Activate base first for loading g++ envs ($CONDA_PREFIX/etc/conda/activate.d/*)
RUN --mount=type=cache,target=/root/.cache/pip \
    source /opt/conda/bin/activate base && ./install-vllm-cpu.sh
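The `ARG CACHEBUST=1` line above lets CI force a re-run of the oneAPI and vllm-cpu install steps without rebuilding the earlier (expensive) layers. A minimal invocation might look like the following sketch; the Dockerfile path and image tag are illustrative, not taken from the commit:

```shell
# Compute a fresh CACHEBUST value; passing it via --build-arg invalidates the
# build cache from the `ARG CACHEBUST=1` line onward, so the oneAPI and
# vllm-cpu install steps re-run while all earlier layers stay cached.
CACHEBUST=$(date +%s)

# Echo the build command rather than running it, so the sketch is side-effect free.
echo docker build \
  -f dev/docker/ci/Dockerfile.cpu_vllm_and_deepspeed \
  --build-arg CACHEBUST="${CACHEBUST}" \
  -t llm-on-ray:cpu-vllm .
```

Because `ARG CACHEBUST` sits after the pip install layers, changing its value only busts the cache for the steps that follow it.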
dev/docker/ci/Dockerfile.cpu_vllm_and_deepspeed.pip_non_editable — 43 additions, 0 deletions
@@ -0,0 +1,43 @@
# syntax=docker/dockerfile:1
FROM ubuntu:22.04

ENV LANG C.UTF-8

WORKDIR /root/llm-on-ray

RUN --mount=type=cache,target=/var/cache/apt apt-get update -y \
    && apt-get install -y build-essential cmake wget curl git vim htop ssh net-tools \
    && apt-get clean \
    && rm -rf /var/lib/apt/lists/*

ENV CONDA_DIR /opt/conda
RUN wget --quiet https://github.com/conda-forge/miniforge/releases/download/23.3.1-1/Miniforge3-Linux-x86_64.sh -O ~/miniforge.sh && \
    /bin/bash ~/miniforge.sh -b -p /opt/conda
ENV PATH $CONDA_DIR/bin:$PATH

# Set up the shell environment
SHELL ["/bin/bash", "--login", "-c"]

RUN --mount=type=cache,target=/opt/conda/pkgs conda init bash && \
    unset -f conda && \
    export PATH=$CONDA_DIR/bin/:${PATH} && \
    mamba config --add channels intel && \
    mamba install -y -c conda-forge python==3.9 gxx=12.3 gxx_linux-64=12.3 libxcrypt

# Copy the full checkout for the later non-editable pip install
COPY . .

RUN --mount=type=cache,target=/root/.cache/pip pip install .[cpu,deepspeed] --extra-index-url https://download.pytorch.org/whl/cpu \
    --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/cpu/us/

RUN ds_report

# Used to invalidate docker build cache with --build-arg CACHEBUST=$(date +%s)
ARG CACHEBUST=1
COPY ./dev/scripts/install-oneapi.sh /tmp
RUN /tmp/install-oneapi.sh

# Install vllm-cpu
# Activate base first for loading g++ envs ($CONDA_PREFIX/etc/conda/activate.d/*)
RUN --mount=type=cache,target=/root/.cache/pip \
    source /opt/conda/bin/activate base && ./install-vllm-cpu.sh
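The two CI images differ essentially only in how llm-on-ray itself is installed. A side-by-side sketch of the two variants (paraphrasing the Dockerfiles above, not an exact excerpt):

# Editable variant: only the packaging metadata is copied up front, an empty
# ./llm_on_ray directory satisfies setuptools, and `pip install -e` leaves
# imports resolving to the in-place source tree.
COPY ./pyproject.toml .
RUN mkdir ./llm_on_ray
RUN pip install -e .[cpu,deepspeed]

# Non-editable variant: the whole checkout is copied in and `pip install`
# copies the package files into site-packages, exercising the real wheel path.
COPY . .
RUN pip install .[cpu,deepspeed]

Testing both paths in CI catches packaging bugs (e.g. files missing from MANIFEST.in) that an editable-only install would hide.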
@@ -15,7 +15,7 @@ RUN pip install -e . && \
     pip install git+https://github.com/HabanaAI/[email protected]

 # Install vllm habana env
-RUN pip install -v git+https://github.com/HabanaAI/vllm-fork.git@ae3d6121
+RUN pip install -v git+https://github.com/HabanaAI/vllm-fork.git@cf6952d
 # Reinstall ray because vllm downgrades the ray version
 RUN pip install "ray>=2.10" "ray[serve,tune]>=2.10"
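Because the pinned vllm fork can drag ray below the version Ray Serve needs, the Dockerfiles reinstall `ray>=2.10` immediately afterwards. A hedged sanity-check sketch (a hypothetical helper, not part of the commit) that could run after an image build to confirm the reinstall took effect:

```shell
# Check that the installed ray still satisfies >=2.10 after the vllm pin.
RAY_VERSION=$(pip show ray 2>/dev/null | awk '/^Version:/ {print $2}')

ray_ok() {
  # ray_ok MAJOR.MINOR[.PATCH] -> succeeds when the version is >= 2.10
  local major=${1%%.*} rest=${1#*.} minor
  minor=${rest%%.*}
  [ "$major" -gt 2 ] 2>/dev/null || { [ "$major" -eq 2 ] && [ "$minor" -ge 10 ]; } 2>/dev/null
}

if [ -n "$RAY_VERSION" ] && ray_ok "$RAY_VERSION"; then
  echo "ray ${RAY_VERSION} OK (>= 2.10)"
else
  echo "ray missing or too old; rerun: pip install 'ray>=2.10' 'ray[serve,tune]>=2.10'"
fi
```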