-
Notifications
You must be signed in to change notification settings - Fork 870
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
740 add generic support for different gpu hardware #3371
740 add generic support for different gpu hardware #3371
Conversation
70c500c
to
54ef2bd
Compare
54ef2bd
to
bc96fa7
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the PR. We are still reviewing it
# For reference: | ||
# https://docs.docker.com/develop/develop-images/build_enhancements/ | ||
|
||
ARG BASE_IMAGE=ubuntu:24.04 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The rest of TorchServe images are still on ubuntu 20.04 as we had issues with github runners with versions greater. Haven't tried this in a while.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the feedback @agunapal
--index-url https://download.pytorch.org/whl/rocm6.2 | ||
torch==2.5.1+rocm6.2; sys_platform == 'linux' | ||
torchvision==0.20.1+rocm6.2; sys_platform == 'linux' | ||
torchaudio==2.5.1+rocm6.2; sys_platform == 'linux' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We haven't yet updated the Pytorch version for the rest of the project. But this should be ok. I will update it for other platforms too
@smedegaard Looks like the PR is breaking TorchServe on |
It seems like this test is failing in CI https://github.com/nod-ai/serve/blob/31824434aa2acd3ff8261bd18cf6f1d925b8e22a/frontend/server/src/test/java/org/pytorch/serve/util/ConfigManagerTest.java#L110 |
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🚀
Co-authored-by: Samu Tamminen <[email protected]>
…leUtil GPU env value
844806c
to
cc0809d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Tested with the changes manually on Graviton 3 and there are no issues.
The failures with the runners can be debugged at a later point
Description
This PR decouples the hardware layer from the front- and backend of TorchServe.
Relates to #740
'Add AMD backend support'
Rony Leppänen [email protected]
'Add AMD frontend support'
Anders Smedegaard Pedersen [email protected]
'Add Dockerfile.rocm'
Samu Tamminen [email protected]
Jarkko Lehtiranta [email protected]
'Add AMD documentation'
Anders Smedegaard Pedersen [email protected]
Rony Leppänen [email protected]
Jarkko Lehtiranta [email protected]
Other contributions:
Bipradip Chowdhury [email protected]
Jarkko Vainio [email protected]
Tero Kemppi [email protected]
Requirement Files
Added
requirements/torch_rocm62.txt
,requirements/torch_rocm61.txt
andrequirements/torch_rocm60.txt
for easy install of dependencies needed for AMD support.Backend
The Python backend supports currently NVIDIA GPUs using hardware specific libraries. There were also a number of functions that could be refactored using more generalized interfaces.
Changes Made to Backend
print_env_info
for AMD GPUs and reimplement a number of functionsFrontend
The Java frontend that acts as the workload manager had calls to SMIs hard-coded in a few places. This made it difficult for TorchServe to support multiple hardware vendors in a graceful manner.
Changes Made to Frontend
We've introduced a new package
org.pytorch.serve.device
with the classesSystemInfo
andAccelerator
.SystemInfo
holds an array list ofAccelerator
objects that holds static information about the specific accelerators on a machine, and the relevant metrics.Instead of calling the SMIs directly in multiple places in the frontend code we have abstracted the hardware away by adding an instance of
SystemInfo
to the pre-existingConfigManager
. Now the frontend can get data from the hardware via the methods onSystemInfo
without knowing about the specifics of the hardware and SMIs.To implement the specifics for each of the vendors that was already partially supported we have created a number of utility classes that communicates with the hardware via the relevant SMI.
The following steps are taken in the
SystemInfo
constructor.which {relevant smi}
for each of the supported vendors.This is how vendor detection was done previously. There might be more robust ways.
where
is used on Windows systems.ROCmUtility
for AMD.HIP_VISIBLE_DEVICES
for AMD,CUDA_VISIBLE_DEVICES
for nvidia andXPU_VISIBLE_DEVICES
for Intel. All devices are detected if the relevant environment variable is not set.The following is a class diagram showing how the new classes relate to the existing code
Documentation
serve/docs/hardware_support/
and added them under "Hardware Support" in the TOCType of change
Please delete options that are not relevant.
Feature/Issue validation/testing
We build new docker container for ROCm using Dockerfile.rocm and build argument
USE_ROCM_VERSION
. For other platforms we usedbuild_image.sh
script.Run containers
Tests
Logs:
Logs:
Logs:
Logs:
Logs:
Logs:
Logs:
Logs:
Logs:
OBS! The test
test_handler.py::test_huggingface_bert_model_parallel_inference
fails due to:ValueError: Input length of input_ids is 150, but max_length is set to 50. This can lead to unexpected behavior. You should consider increasing max_length or, better yet, setting max_new_tokens.
This indicates that preprocessing uses a different
max_length
than inference, which can be verified when looking at the handler when the test was originally implemented: model.generate() hasmax_length=50
by default, while tokenizer uses max_length from setup_config (max_length=150
). It seems that the BERT-basedTextgeneration.mar
needs an update.Checklist: