Misc. bug: Very bad performance on Qwen 2 with HIP/ROCm #11153
Comments
I get the following numbers with latest master (Linux): ./build/bin/llama-bench --model /mnt/models/Qwen2.5-14B-Instruct-Q4_K_M.gguf
build: c05e8c9 (4462)
Maybe a Windows issue?
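For anyone trying to reproduce this on Linux, a HIP build is configured roughly like this (a sketch following the repo's HIP build instructions; gfx1100 is the RDNA3 / RX 7900 XTX target, so adjust AMDGPU_TARGETS for your GPU):

```sh
# Sketch of a ROCm/HIP build of llama.cpp on Linux.
# gfx1100 targets RDNA3 cards such as the RX 7900 XTX; adjust for your GPU.
HIPCXX="$(hipconfig -l)/clang" HIP_PATH="$(hipconfig -R)" \
    cmake -S . -B build -DGGML_HIP=ON -DAMDGPU_TARGETS=gfx1100 \
          -DCMAKE_BUILD_TYPE=Release
cmake --build build --config Release -j 16

# Then benchmark:
./build/bin/llama-bench --model /mnt/models/Qwen2.5-14B-Instruct-Q4_K_M.gguf
```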
I don't know if this is a Windows-specific issue, as I don't have a Linux machine to test on yet. However, I compiled 3edfa7d and the result stays the same. My 7900 XTX is at 100% load, but total system power draw is only ~270W, compared to ~530W with Vulkan. There is a page about performance troubleshooting, but nothing about troubleshooting low GPU resource utilization. Here's the result.
Edit: corrected the GPU load observation.
If you can find the exact commit that introduced the issue, that would greatly increase the chances of finding a solution. You can use git bisect.
Upon more detailed examination, the slow performance only occurs with the Qwen2 model, not with Llama2. I'm still bisecting, trying to find a commit that has reasonable performance.
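For reference, the bisect loop looks roughly like this (a sketch; the endpoints are placeholders until a known-good commit is found):

```sh
git bisect start
git bisect bad HEAD              # a build that shows the slowdown
git bisect good <good-commit>    # a build with normal performance, once one is known
# git now checks out a candidate commit; rebuild and benchmark it:
cmake --build build --config Release -j
./build/bin/llama-bench -m Qwen2.5-14B-Instruct-Q4_K_M.gguf
git bisect good                  # or: git bisect bad, depending on the tok/s
git bisect reset                 # restore the original checkout when done
```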
@slaren Bad news. I have gotten down to 9b75cb2, the commit that enabled Qwen2 support, and the performance is still bad. I couldn't find a commit that can be considered good yet. For now, I can only assume the implementation has been flawed from the beginning, while somehow the mystery build behaves normally. Here are the bench results:

HIP 5.7: build-b%LLAMA_TAG%-hip%HIP_VERSION%\bin\llama-bench.exe -m "Qwen2.5-14B-Instruct-Q4_K_M.gguf"
build: 9b75cb2 (1923)

HIP 6.1: build-b%LLAMA_TAG%-hip%HIP_VERSION%\bin\llama-bench.exe -m "Qwen2.5-14B-Instruct-Q4_K_M.gguf"
build: 9b75cb2 (1923)

HIP 6.2: build-b%LLAMA_TAG%-hip%HIP_VERSION%\bin\llama-bench.exe -m "Qwen2.5-14B-Instruct-Q4_K_M.gguf"
build: 9b75cb2 (1923)
I guess it comes down to build settings. I believe koboldcpp has ROCm releases; you could check whether their builds work for you, and if so, try their build settings.
koboldcpp doesn't have a ROCm release, but a fork does. I downloaded v1.80.3 and the result is even worse than llama.cpp, at 2.9 tok/s.
Correction: it was some build issue.
I'm experiencing the same. Full GPU offload of qwen2.5-coder-7b-instruct onto a 7900 XTX; T/S is 3.1. Started with LM Studio 0.3.6 using llama.cpp. I need to check my run times.
I'm getting good performance (63 tok/s on tg128) with the Vulkan backend. So I guess I'll just use that...?
The HIP/ROCm build from LM Studio should be okay. However, I do experience some instability when loading some models (Phi4), so I default to the Vulkan backend.
@FeepingCreature Can you please post your benchmark results? Another user reported that the Linux build is working correctly. I'm especially interested in the commit of your build; it may help accelerate my effort to find where the bug is. Thank you.
Update: I did a clean build of llama.cpp and now it runs fine.
I didn't have the ROCm libs installed correctly when I first built it.
Edit: Stupid, I should have made a copy...
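For anyone hitting the same thing, the fix amounts to verifying the ROCm toolchain and rebuilding from scratch; something along these lines (a sketch, assuming a Linux ROCm install):

```sh
# Check which HIP toolchain the build will pick up
hipconfig --version   # ROCm/HIP version
hipconfig -l          # HIP compiler location

# Throw away the stale CMake cache so the libraries are re-detected,
# then configure and build again (see the HIP build sketch above)
rm -rf build
HIPCXX="$(hipconfig -l)/clang" HIP_PATH="$(hipconfig -R)" \
    cmake -S . -B build -DGGML_HIP=ON -DCMAKE_BUILD_TYPE=Release
cmake --build build --config Release -j
```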
Name and Version
Operating systems
Windows 11 24H2 Build 26100.2605
Which llama.cpp modules do you know to be affected?
llama-bench
Command line
Problem description & steps to reproduce
Description
It is horrendously slow. It shouldn't be this slow. You will get a sense of how slow it is from the results with the Vulkan backend, which is supposed to be the slower one.
Steps to reproduce
Run llama-bench.exe with a model you like, for example:
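(The model file name below is just an illustration; any GGUF model reproduces it.)

```
llama-bench.exe -m "Qwen2.5-14B-Instruct-Q4_K_M.gguf"
```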
First Bad Commit
No response
Relevant log output
No response
Additional Information
Results with other backends and builds
Vulkan Backend
b3808-hip
mystery build from https://github.com/PiDanShouRouZhouXD/Sakura_Launcher_GUI/releases/tag/v0.0.3-alpha
Temporary Workaround
Do not use the HIP build. Use Vulkan instead.
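For reference, the Vulkan backend is selected at build time with its own CMake flag; a minimal sketch, assuming the Vulkan SDK is installed:

```sh
# Sketch: build llama.cpp with the Vulkan backend instead of HIP
cmake -S . -B build-vulkan -DGGML_VULKAN=ON -DCMAKE_BUILD_TYPE=Release
cmake --build build-vulkan --config Release -j
./build-vulkan/bin/llama-bench -m Qwen2.5-14B-Instruct-Q4_K_M.gguf
```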