Added CK-gemm runner #674

Open
wants to merge 1 commit into
base: main_perf

@ravil-mobile ravil-mobile commented Dec 6, 2024

Added a CK GEMM runner. The user can replace DeviceGemmInstance in kernel.cpp. After compilation, the amdgcn assembly of the kernel can be found in the build directory.
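
For readers who want to locate that assembly quickly, here is a minimal sketch that lists files in a build directory that look like AMDGPU assembly. It assumes the assembly is saved with a `.s` extension and contains the `.amdgcn_target` directive that the LLVM AMDGPU backend writes into its output. The function name `list_amdgcn_asm` and the default directory `build` are hypothetical and not taken from the PR.

```shell
# Sketch: print every *.s file under a directory that contains the
# .amdgcn_target directive (assumption: the kernel assembly is kept as *.s).
list_amdgcn_asm() {
    build_dir="${1:-build}"   # hypothetical default; pass your real build dir
    find "${build_dir}" -type f -name '*.s' 2>/dev/null |
    while IFS= read -r asm_file; do
        # Keep only files that declare an AMDGPU target.
        if grep -q -e '\.amdgcn_target' "${asm_file}"; then
            printf '%s\n' "${asm_file}"
        fi
    done
}

# Usage (not executed here):
#   list_amdgcn_asm ./build
```

If the assembly is written under a different extension, only the `-name` pattern needs to change.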

@ravil-mobile ravil-mobile force-pushed the ravil/ck-gemm branch 3 times, most recently from 51868e8 to fb2115d Compare December 6, 2024 22:53
@ravil-mobile
Author

@zhanglx13 @giuseros @sjw36, could you please review the PR?

@zhanglx13

Let me give it a try; I'll review the details later today.

@zhanglx13

I'm getting the following error:

CMake Error at CMakeLists.txt:47 (find_package):
  By not providing "Findhip.cmake" in CMAKE_MODULE_PATH this project has
  asked CMake to find a package configuration file provided by "hip", but
  CMake did not find one.

  Could not find a package configuration file provided by "hip" with any of
  the following names:

    hipConfig.cmake
    hip-config.cmake

  Add the installation prefix of "hip" to CMAKE_PREFIX_PATH or set "hip_DIR"
  to a directory containing one of the above files.  If "hip" provides a
  separate development package or SDK, be sure it has been installed.

Is there some dependency that we need to install before installing CK?

@ravil-mobile
Author

Hi @zhanglx13, we usually have the ROCm software stack installed on all of our dev machines (under /opt/rocm). Are you working inside a container?
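
One way to check whether the HIP CMake package is visible is to search the ROCm prefix for the two file names listed in the error. The sketch below does that; the function name `find_hip_config` is hypothetical, and the default prefix `/opt/rocm` is the install location mentioned in this thread, which may differ on other machines.

```shell
# Sketch: search a ROCm prefix for the package config files named in the
# CMake error (hipConfig.cmake or hip-config.cmake).
find_hip_config() {
    prefix="${1:-/opt/rocm}"   # assumption: ROCm lives under this prefix
    # -L follows symlinks, in case the prefix is a link to a versioned directory.
    find -L "${prefix}" -type f \
        \( -name 'hipConfig.cmake' -o -name 'hip-config.cmake' \) 2>/dev/null
}

# Usage (not executed here):
#   find_hip_config /opt/rocm
```

If the search prints a path, the two remedies suggested by the error message apply: add the prefix to CMAKE_PREFIX_PATH, or set hip_DIR to the directory that contains the file. If it prints nothing, the HIP development package is probably not installed in that environment.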

@zhanglx13

Are you working inside a container?

Yes, my base Docker image is rocm/pytorch:rocm6.2.3_ubuntu22.04_py3.10_pytorch_release_2.3.0

@ravil-mobile
Author

I tried to reproduce the error with your Docker image but could not. Are you sure that you followed the documentation? HIP is part of ROCm, so I am surprised that your CMake was not able to find it. These are the steps I used:

CK_INSTALL_DIR=$(realpath .)/usr/ck
mkdir -p ${CK_INSTALL_DIR}
 
git clone https://github.com/ROCm/composable_kernel.git
cd composable_kernel
mkdir -p build && cd build
 
# specify your target here
TARGET="gfx942"
 
# Edit `./profiler/src/CMakeLists.txt` to speed up compilation (see below)
 
CC=/opt/rocm/llvm/bin/clang CXX=/opt/rocm/llvm/bin/clang++ cmake .. \
-DCMAKE_LINKER_TYPE=LLD \
-DCMAKE_PREFIX_PATH=/opt/rocm \
-DCMAKE_CXX_COMPILER=/opt/rocm/bin/hipcc \
-DCMAKE_BUILD_TYPE=Release \
-DGPU_TARGETS="${TARGET}" \
-DGPU_ARCHS="${TARGET}" \
-DCMAKE_POSITION_INDEPENDENT_CODE=ON \
-DCMAKE_INSTALL_PREFIX="${CK_INSTALL_DIR}" \
-DBUILD_TESTING=OFF \
-DCK_PARALLEL_COMPILE_JOBS=20 \
-DCK_PARALLEL_LINK_JOBS=20
 
make -j"$(nproc)"
make install
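
The configure command above relies on three tool paths under /opt/rocm. A small preflight check, sketched below, can confirm they exist before a long build starts. The function name `check_rocm_tools` is hypothetical, and the list of tools is simply read off the command above, not from the CK documentation.

```shell
# Sketch: verify that the compilers referenced by the configure command
# exist and are executable under a given ROCm prefix.
check_rocm_tools() {
    prefix="${1:-/opt/rocm}"   # assumption: same prefix as in the commands above
    missing=0
    for tool in llvm/bin/clang llvm/bin/clang++ bin/hipcc; do
        if [ -x "${prefix}/${tool}" ]; then
            echo "ok: ${prefix}/${tool}"
        else
            echo "missing: ${prefix}/${tool}"
            missing=1
        fi
    done
    # Non-zero status signals that at least one tool was not found.
    return "${missing}"
}

# Usage (not executed here):
#   check_rocm_tools /opt/rocm && echo "ready to configure"
```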

@vgokhale
Collaborator

vgokhale commented Dec 20, 2024

What is the purpose of this PR? Is the intent to compare performance with CK GEMMs in our performance regression testing?

@ravil-mobile
Author

CK has a ckProfiler gemm_universal utility that finds the best-performing GEMM for a given problem size. Recently, the utility started printing a so-called Kernel-Object, with all template parameters, to standard output at the end of a run. One can copy and paste the winning Kernel-Object into this sample CK program, compile it, and run it under a profiler (e.g., ATTViewer) to get traces.

@vgokhale
Collaborator

Thanks. I understand what the code is doing.

I want to understand why we need this in the kernels repository. Is this something that we think we will use frequently across multiple users? Do we have resources committed to maintaining this piece of code?

@ravil-mobile
Author

Is this something that we think we will use frequently across multiple users?
It feels like we are going to work closely with CK (and maybe Tensile as well). There are still lots of pipelining and scheduling tricks that we can adapt in Triton.

Do we have resources committed to maintaining this piece of code?
In theory, we need just a single pipeline job that compiles this sample program (without running it). CK is part of the ROCm software stack, so it should be very quick in general.

I remember sharing this code with one or two developers (I cannot remember their names). For me, that is always a sign that such a tool should be added to a remote repository.

P.S. By the way, I remember that we did performance comparisons between rocMLIR and CK/rocBLAS kernels in the rocMLIR project.

@vgokhale
Collaborator

In theory, we need just a single pipeline job that compiles this sample program (without running it). CK is part of the ROCm software stack, so it should be very quick in general.

This is not the maintenance burden. The maintenance burden is the effort needed to ensure that this code stays functional and keeps running indefinitely. That includes keeping up with any changes on the CK side to the APIs that are called here. The kernels team cannot commit to this effort. IMO this is better suited to a Confluence page. However, if the compiler team can commit to maintaining this, I'm fine with checking it in here. CC: @sjw36
