Add quantized qwen2-0.5b #490
Conversation
To support quantized qwen2-0.5b.
Thanks a lot for the contribution! Two minor changes: one on naming consistency, and one on the required VRAM in MB after calculation.
src/config.ts (Outdated)

```ts
    modelVersion +
    "/Qwen2-0.5B-Instruct-q4f16_1-webgpu.wasm",
  low_resource_required: true,
  vram_required_MB: 500, // rough estimate
```
Suggested change:

```diff
-  vram_required_MB: 500, // rough estimate
+  vram_required_MB: 944.62,
```
src/config.ts (Outdated)

```diff
@@ -601,6 +601,19 @@ export const prebuiltAppConfig: AppConfig = {
     },
   },
+  // Qwen-2
+  {
+    model: "https://huggingface.co/mlc-ai/Qwen2-0.5B-Instruct-q4f16_1-MLC",
+    model_id: "Qwen2-0.5B-Instruct-q4f16-MLC",
```
Suggested change:

```diff
-    model_id: "Qwen2-0.5B-Instruct-q4f16-MLC",
+    model_id: "Qwen2-0.5B-Instruct-q4f16_1-MLC",
```
By the way, what is the formula for calculating the VRAM required?
Thanks for making the changes! For VRAM, it is mainly three parts: model size, intermediate buffer size (for various matrix multiplications, etc.), and KV cache size. The sum of the first two is estimated during compile time.

So, VRAM = model + intermediate buffer + KV cache.
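As a minimal sketch of that breakdown (the helper and all names below are illustrative, not web-llm code):

```ts
// Illustrative sketch, not web-llm code: VRAM ≈ weights + intermediate buffers + KV cache.
function estimateVramMB(
  modelMB: number, // quantized weight size
  intermediateBufferMB: number, // temporary buffers for matmuls etc., reported at compile time
  kvCacheMB: number, // grows with the context window
): number {
  return modelMB + intermediateBufferMB + kvCacheMB;
}
```

The compile step reports the sum of the first two terms, which is where the numbers later in this thread come from.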
Thanks for the info |
I just ran compile again and got the following info:
I guess I used the wrong estimate; it should be 1528.12 + 48 MB instead, since the wasm you uploaded uses a 2048 chunk size.
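If those two figures are simply summed, that gives 1528.12 + 48 = 1576.12 MB, so the field would look roughly like this (sketch only; whether the final PR rounds the value is not shown in this thread):

```ts
// Sketch: corrected estimate from the compile log, 1528.12 MB + 48 MB.
vram_required_MB: 1576.12,
```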
Add quantized (q4f16) qwen2-0.5b to the list of supported models. [PR](mlc-ai/binary-mlc-llm-libs#128) must be merged before merging this.