
[Feature Request]: Add quantized qwen2-0.5b #41

Closed
bil-ash opened this issue Jun 20, 2024 · 6 comments
Labels: enhancement (New feature or request)

Comments

bil-ash (Contributor) commented Jun 20, 2024

Problem Description

My Android phone has limited RAM, so it can only run the TinyLlama model. However, TinyLlama produces inferior results compared to Qwen2-0.5B-Instruct (tested on desktop). Although Qwen2-0.5B has fewer parameters, I am unable to run it on my phone because web-llm-chat only offers the unquantized version of Qwen2-0.5B, while it offers a quantized version of TinyLlama.

Solution Description

Please add the quantized Qwen2-0.5B versions (q4f16 and q4f32) to the list of supported models in web-llm-chat. Both are already available on Hugging Face.

Alternatives Considered

No response

Additional Context

No response

bil-ash added the enhancement (New feature or request) label on Jun 20, 2024
Neet-Nestor (Collaborator) commented

cc. @CharlieFRuan


bil-ash (Contributor, Author) commented Jun 25, 2024

@CharlieFRuan Please do the needful

Neet-Nestor (Collaborator) commented Jun 25, 2024

@bil-ash We will work on this. Meanwhile, please feel free to use MLC-LLM, which supports the quantized Qwen2-0.5B versions, and connect WebLLM Chat to its serve API as a temporary workaround.

Instructions: https://github.com/mlc-ai/web-llm-chat/?tab=readme-ov-file#use-custom-models
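
For reference, a minimal sketch of that workaround, assuming MLC-LLM's local REST server exposes its OpenAI-compatible `/v1/chat/completions` endpoint at `127.0.0.1:8000`; the serve command, host/port, and model id below are placeholders that may differ on your setup, and connecting WebLLM Chat itself should follow the linked instructions.

```python
# Hypothetical sketch: start MLC-LLM's REST server locally, e.g. something like
#   mlc_llm serve <your-compiled-or-downloaded-Qwen2-0.5B-quantized-model>
# then send it an OpenAI-compatible chat completion request.
import json
import urllib.request

payload = {
    "model": "Qwen2-0.5B-Instruct-q4f16_1",  # assumed model id; use whatever you served
    "messages": [{"role": "user", "content": "Hello!"}],
}

req = urllib.request.Request(
    "http://127.0.0.1:8000/v1/chat/completions",  # assumed default host/port
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

with urllib.request.urlopen(req) as resp:
    reply = json.load(resp)
    # Print the assistant's reply from the OpenAI-style response shape
    print(reply["choices"][0]["message"]["content"])
```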

bil-ash (Contributor, Author) commented Jun 26, 2024

Created a PR to solve the issue. Please have a look.

Neet-Nestor (Collaborator) commented

The model is available on WebLLM Chat now. https://chat.webllm.ai/#/chat

Thanks for the contribution!
