[Int4-AWQ] Fix AWQ Marlin check for ROCm #206
Can we have that as an upstream PR?
Hmm, maybe this could be added upstream. There is already some code that checks is_hip when AWQ is being used and sets VLLM_USE_TRITON_AWQ if it is not already set. Currently, "awq_marlin" is used as the quantization method when an AWQ model is loaded. It would be nice if an is_hip check were done and the "awq" quantization method were selected instead when is_hip returns true.
a5730c7 to dd53521
Okay, I have moved the check to a higher level to be more in line with the current I would like to put this into ROCm/vllm first, because we need it for QA.
nit: can the breaks on 292/296 be removed and just use a break on 297?
That wouldn't quite be the same logically.
This commit resolves an issue with the AWQ Marlin support recently added upstream. AWQ Marlin is not yet supported on ROCm in vLLM, but vLLM overrides AWQ quantization with AWQ Marlin whenever the quantization parameters are compatible, without comprehensively checking for platform support. This commit fixes the problem for ROCm by adding an is_hip() check.
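
For readers skimming the thread, the behaviour described above can be sketched roughly as follows. This is an illustrative sketch only, not the code in this PR: the function `resolve_quantization`, its parameters, and the value `"1"` for `VLLM_USE_TRITON_AWQ` are assumptions made for the example, and the real check lives in vLLM's own config code with a different shape. The `on_rocm` argument stands in for the result of `is_hip()`.

```python
import os
from typing import MutableMapping, Optional


def resolve_quantization(
    requested: Optional[str],
    marlin_compatible: bool,
    on_rocm: bool,
    env: Optional[MutableMapping[str, str]] = None,
) -> Optional[str]:
    """Hypothetical helper: choose the quantization method for a checkpoint.

    `on_rocm` stands in for the result of is_hip(); `marlin_compatible`
    stands in for the existing parameter-compatibility check.
    """
    env = os.environ if env is None else env

    # Only the AWQ -> AWQ Marlin override is modelled here.
    if requested != "awq":
        return requested

    if on_rocm:
        # AWQ Marlin is not supported on ROCm, so keep plain AWQ and
        # enable the Triton AWQ path unless the user already set the
        # variable (the value "1" is an assumption for this sketch).
        env.setdefault("VLLM_USE_TRITON_AWQ", "1")
        return "awq"

    # On other platforms, keep the override when parameters are compatible.
    return "awq_marlin" if marlin_compatible else "awq"
```

With this shape, a ROCm run that requests `"awq"` stays on `"awq"` even when the parameters would be Marlin-compatible, while any existing user setting of `VLLM_USE_TRITON_AWQ` is left untouched.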