[Feature Request]: Add quantized qwen2-0.5b #41
Comments
cc. @CharlieFRuan
@CharlieFRuan Please do the needful.
@bil-ash We will work on this. In the meantime, feel free to use MLC-LLM, which supports quantized versions of qwen2-0.5b, and connect WebLLM Chat to its serve API as a temporary alternative. Instructions: https://github.com/mlc-ai/web-llm-chat/?tab=readme-ov-file#use-custom-models
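For reference, here is a minimal sketch of what calling that serve API could look like from a web client, assuming MLC-LLM is serving an OpenAI-compatible endpoint at the default http://127.0.0.1:8000/v1 and that a quantized Qwen2-0.5B model has been compiled and loaded locally (the model ID below is a placeholder, not an exact artifact name):

```ts
// Minimal sketch: query a locally running MLC-LLM serve endpoint.
// Assumptions: the server exposes an OpenAI-compatible REST API at
// http://127.0.0.1:8000/v1 and a q4f16-quantized Qwen2-0.5B model is loaded.
// "Qwen2-0.5B-Instruct-q4f16_1-MLC" is an illustrative model ID only.
async function askLocalQwen(prompt: string): Promise<string> {
  const response = await fetch("http://127.0.0.1:8000/v1/chat/completions", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model: "Qwen2-0.5B-Instruct-q4f16_1-MLC", // placeholder model ID
      messages: [{ role: "user", content: prompt }],
    }),
  });
  const data = await response.json();
  // Return the assistant's reply from the first choice.
  return data.choices[0].message.content;
}
```

The custom-model setup in the linked instructions points WebLLM Chat at this same kind of OpenAI-compatible endpoint.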
Created a PR to solve the issue. Please have a look.
The model is now available on WebLLM Chat: https://chat.webllm.ai/#/chat. Thanks for the contribution!
Problem Description
My Android phone has limited RAM, so it can only run the TinyLlama model. However, TinyLlama gives inferior results compared to Qwen2-0.5B-Instruct (tested on desktop). Although Qwen2-0.5B has fewer parameters, I am unable to run it on the phone because web-llm-chat offers only the unquantized version of Qwen2-0.5B, while it does offer a quantized version of TinyLlama.
Solution Description
Please add the quantized versions of Qwen2-0.5B (q4f16 and q4f32) to the list of supported models in web-llm-chat. Both are already available on Hugging Face. A rough sketch of what the corresponding model-list entry could look like is included below.
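The sketch below assumes the `prebuiltAppConfig` / `CreateMLCEngine` API and the ModelRecord field names used in recent @mlc-ai/web-llm releases; the Hugging Face URL, model ID, WASM path, and VRAM figure are placeholders for illustration, not the actual artifacts requested in this issue:

```ts
import { CreateMLCEngine, prebuiltAppConfig } from "@mlc-ai/web-llm";

// Rough sketch: extend the prebuilt model list with a quantized Qwen2-0.5B
// entry. All URLs and IDs below are placeholders for illustration only.
const appConfig = {
  ...prebuiltAppConfig,
  model_list: [
    ...prebuiltAppConfig.model_list,
    {
      model: "https://huggingface.co/mlc-ai/Qwen2-0.5B-Instruct-q4f16_1-MLC", // placeholder weights URL
      model_id: "Qwen2-0.5B-Instruct-q4f16_1-MLC", // placeholder model ID
      model_lib: "https://example.com/Qwen2-0.5B-Instruct-q4f16_1-webgpu.wasm", // placeholder model lib
      low_resource_required: true,
      vram_required_MB: 950, // rough guess, not a measured value
    },
  ],
};

// Load the new entry through an engine created with the extended config.
const engine = await CreateMLCEngine("Qwen2-0.5B-Instruct-q4f16_1-MLC", {
  appConfig,
});
```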
Alternatives Considered
No response
Additional Context
No response