-
Notifications
You must be signed in to change notification settings - Fork 169
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Draft of ORT GPU build #5622
base: master
Are you sure you want to change the base?
Draft of ORT GPU build #5622
Conversation
Ping @davidrohr |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Du solltest ein paar Umgebungsvariable, die wir in o2.sh nutzen, auch mitaufnehmen und analog behandeln: https://github.com/alisw/alidist/blob/1916f6d88d42959097998d9481b517dc1c1ea84d/o2.sh#L191C9-L191C30
- ALIBUILD_O2_FORCE_GPU
- DISABLE_GPU
- ALIBUILD_ENABLE_CUDA
- ALIBUILD_ENABLE_HIP
- ALIBUILD_O2_OVERRIDE_HIP_ARCHS
- ALIBUILD_O2_OVERRIDE_CUDA_ARCHS
Wenn ENABLE_CUDA oder ENABLE_HIP gesetzt ist, sollte der build fehlschlagen, wenn er CUDA/HIP nicht bauen kann.
onnxruntime.sh
Outdated
" | ||
elif command -v nvcc >/dev/null 2>&1; then | ||
CUDA_VERSION=$(nvcc --version | grep "release" | awk '{print $NF}' | cut -d. -f1) | ||
if [[ "$CUDA_VERSION" == "V11" ]]; then |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
glaube CUDA 11 kannst du weglassen, und nur >=12 annehmen
onnxruntime.sh
Outdated
ORT_BUILD_FLAGS="" | ||
case $ARCHITECTURE in | ||
osx_*) | ||
if [[ $ARCHITECTURE == *_x86-64 ]]; then |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Solche printouts würde ich weglassen, das ist ja hauptsächlich für debugging
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ja, aber ich nehme an das es auch einen macOS build gibt der die Mac GPU anspricht. Da muss ich nochmal ein bisschen rumsuchen, dann könnte man den if-Block nämlich nehmen um da die build flags rein zu packen. Aber ja, die print-outs nehm ich am Ende natürlich noch raus
onnxruntime.sh
Outdated
fi | ||
;; | ||
*) | ||
if command -v rocminfo >/dev/null 2>&1; then |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- rocm version check fehlt
- Es ist nicht klar, ob rocminfo im Pfad liegt. Du solltest zumindest
/opt/rocm/bin/rocminfo
testen. Und dann ist migraphx ein separates ROCm paket. Sprich, wenn rocminfo vorhanden ist, heist das noch nicht, das migraphx vorhanden ist. Du solltest explicit auf migraphx testen.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, das check ich nochmal
onnxruntime.sh
Outdated
ORT_BUILD_FLAGS=" -Donnxruntime_USE_CUDA=ON \ | ||
-DCUDA_TOOLKIT_ROOT_DIR=$CUDA_ROOT \ | ||
-Donnxruntime_USE_CUDA_NHWC_OPS=ON \ | ||
-Donnxruntime_CUDA_USE_TENSORRT=ON \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wenn du tensorrt nutzt, musst du dann prüfen, ob das explicit installiert ist? Oder ist das immer beim CUDA SDK dabei?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Scheint nicht automatisch mitzukommen (https://docs.nvidia.com/deeplearning/tensorrt/install-guide/index.html)... Ok da bau ich auch noch einen Check mit ein
onnxruntime.sh
Outdated
-Donnxruntime_USE_CUDA_NHWC_OPS=ON \ | ||
-Donnxruntime_CUDA_USE_TENSORRT=ON \ | ||
" | ||
elif [[ "$CUDA_VERSION" == "V12" ]]; then |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Was ist wenn ROCm und CUDA beides vorhanden ist? Können wir dann nicht beides bauen?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ne, die kann man nicht parallel bauen, es geht immer nur eins von beiden: https://github.com/microsoft/onnxruntime/blob/afd642a194b39138ad891e7bb2c8bca26d37b785/cmake/CMakeLists.txt#L288-L290
Gneau... |
…adding env-variables for GPU enabling during code execution. For al9_gpu container and simultaneous CUDA & ROCm build, this requires ChSonnabend/onnxruntime@6ffc40c
…e build with CUDA and ROCm fails due to a ROCm internal check for THRUST and CUB libraries, which are not in sync (file: /opt/rocm/include/thrust/system/cuda/config.h)
onnxruntime.sh
Outdated
export ORT_MIGRAPHX_BUILD=0 | ||
fi | ||
### TensorRT | ||
if [ "$ORT_CUDA_BUILD" -eq 1 ] && [ $(find /opt/rocm* -name "libnvinfer*" -print -quit | wc -l 2>&1) -eq 1 ]; then |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do you search for tensort in /opt/rocm?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
😅
onnxruntime.sh
Outdated
-DCMAKE_CXX_FLAGS="$CXXFLAGS -Wno-unknown-warning -Wno-unknown-warning-option -Wno-error=unused-but-set-variable -Wno-error=deprecated" \ | ||
-DCMAKE_C_FLAGS="$CFLAGS -Wno-unknown-warning -Wno-unknown-warning-option -Wno-error=unused-but-set-variable -Wno-error=deprecated" | ||
# Check ROCm build conditions | ||
if { [ "$ALIBUILD_O2_FORCE_GPU" -ne 0 ] || [ "$ALIBUILD_ENABLE_HIP" -ne 0 ] || command -v rocminfo >/dev/null 2>&1; } && \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is it guaranteed that rocminfo is in the path? Perhaps, try rocminfo or /opt/rocm/bin/rocminfo?
Also, perhaps it needs HIP, not rocminfo? So perhaps check for /opt/rocm/bin/hipcc?
-DCMAKE_HIP_COMPILER=/opt/rocm/llvm/bin/clang++ \ | ||
-D__HIP_PLATFORM_AMD__=1 \ | ||
-DCMAKE_HIP_ARCHITECTURES=gfx906,gfx908 \ | ||
${ALIBUILD_O2_OVERRIDE_HIP_ARCHS:+-DCMAKE_HIP_ARCHITECTURES=${ALIBUILD_O2_OVERRIDE_HIP_ARCHS}} \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there is also an OVERRIDE_CUDA_ARCH, could you check and use that as well?
onnxruntime.sh
Outdated
-Donnxruntime_CUDA_HOME=/usr/local/cuda \ | ||
-DCMAKE_HIP_COMPILER=/opt/rocm/llvm/bin/clang++ \ | ||
-D__HIP_PLATFORM_AMD__=1 \ | ||
-DCMAKE_HIP_ARCHITECTURES=gfx906,gfx908 \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand this. You first set CMAKE_HIP_ARCHITECTURES, and then you possibly override it in the next line? Why don't you expand ALIBUILD_O2_OVERRIDE_HIP_ARCHS to the defaults if empty? ${...:-default}
?
onnxruntime.sh
Outdated
# Check CUDA build conditions | ||
if { [ "$ALIBUILD_O2_FORCE_GPU" -ne 0 ] || [ "$ALIBUILD_ENABLE_CUDA" -ne 0 ] || command -v nvcc >/dev/null 2>&1; } && \ | ||
{ [ -z "$DISABLE_GPU" ] || [ "$DISABLE_GPU" -eq 0 ]; }; then | ||
export ORT_CUDA_BUILD=1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would also set a default for ALIBUILD_O2_OVERRIDE_CUDA_ARCHS to sm_86
or sm_89
architecture for now
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if [ "$ALIBUILD_O2_FORCE_GPU" -eq 1 ]
will not work if the variable is not defined.
You can either do [ "0$FOO" == "01" ]
or use the bash [[
syntax. Please try with all variables undefined.
Approving to start the CI. |
macOS failure seem unrelated to this PR. It can be merged from my side if there are no objections (@ktf ). |
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/rocm/lib | ||
else | ||
export ORT_ROCM_BUILD=0 | ||
fi |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fi | |
mkdir -p $INSTALLROOT/etc | |
cat << EOF > $INSTALLROOT/etc/ort-init.sh | |
export ORT_ROCM_BUILD=$ORT_ROCM_BUILD | |
EOF |
@@ -45,7 +117,6 @@ mkdir -p "$INSTALLROOT/etc/modulefiles" | |||
MODULEFILE="$INSTALLROOT/etc/modulefiles/$PKGNAME" | |||
alibuild-generate-module --lib > "$MODULEFILE" | |||
cat >> "$MODULEFILE" <<EoF | |||
|
|||
# Our environment | |||
set ${PKGNAME}_ROOT \$::env(BASEDIR)/$PKGNAME/\$version | |||
prepend-path ROOT_INCLUDE_PATH \$${PKGNAME}_ROOT/include/onnxruntime |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
prepend-path ROOT_INCLUDE_PATH \$${PKGNAME}_ROOT/include/onnxruntime | |
prepend-path ROOT_INCLUDE_PATH \$${PKGNAME}_ROOT/include/onnxruntime | |
append-path LD_LIBRARY_PATH /opt/rocm/lib |
PR is ready from my side. GPU build should be disabled on EPN's due to missing ROCm packages and ALMA linux major version mismatch (8 instead of 9). |
@pzhristov just FYI this would leave our onnx version outside the tf2onnx compatibility range https://github.com/onnx/tensorflow-onnx?tab=readme-ov-file#tf2onnx---convert-tensorflow-keras-tensorflowjs-and-tflite-models-to-onnx |
Why actually? If you refer to the opset number:I think the ONNX opset number does not necessarily correlate with the ONNX version (I might be wrong here). Does it actually break something for the converter? |
This is a draft PR to discuss possible changes to onnxruntime.sh for GPU builds on the EPN's and potentially CUDA (to be tested)