Hey all! I’ve been playing around with the new IBM Granite 3.0 8B Instruct model and its accelerator, and I’m loving the speed boost it provides.
I’m curious if anyone has tried quantizing the accelerator model (ibm-granite/granite-3.0-8b-instruct-accelerator) to further reduce its size and potentially speed things up even more. I'm currently using an AWQ version of ibm-granite/granite-3.1-8b-instruct.
I know quantization works well for the main model, but I’m not sure if it’s possible or advisable for the accelerator. Has anyone given this a shot? If so, what methods did you use, and how did it affect performance? I’d love to hear about your experiences or any tips you might have!
Also, if quantizing the accelerator isn’t recommended, I’d be interested in understanding why. Thanks in advance!