[Feature Request]: Add quantized qwen2-0.5b #41
Comments
cc. @CharlieFRuan
@CharlieFRuan Please do the needful.
@bil-ash We will work on this. In the meantime, feel free to use MLC-LLM, which supports quantized versions of qwen2-0.5b, and connect WebLLM Chat to its serve API as a temporary alternative. Instructions: https://github.com/mlc-ai/web-llm-chat/?tab=readme-ov-file#use-custom-models
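For reference, here is a minimal sketch of what calling that serve API could look like from a web client, assuming MLC-LLM is serving an OpenAI-compatible endpoint at the default http://127.0.0.1:8000/v1 and that a quantized Qwen2-0.5B model has been compiled and loaded locally (the model ID below is a placeholder, not an exact artifact name):

```ts
// Minimal sketch: query a locally running MLC-LLM serve endpoint.
// Assumptions: the server exposes an OpenAI-compatible REST API at
// http://127.0.0.1:8000/v1 and a q4f16-quantized Qwen2-0.5B model is loaded.
// "Qwen2-0.5B-Instruct-q4f16_1-MLC" is an illustrative model ID only.
async function askLocalQwen(prompt: string): Promise<string> {
  const response = await fetch("http://127.0.0.1:8000/v1/chat/completions", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model: "Qwen2-0.5B-Instruct-q4f16_1-MLC", // placeholder model ID
      messages: [{ role: "user", content: prompt }],
    }),
  });
  const data = await response.json();
  // Return the assistant's reply from the first choice.
  return data.choices[0].message.content;
}
```

The custom-model setup in the linked instructions points WebLLM Chat at this same kind of OpenAI-compatible endpoint.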
Created a PR to solve the issue. Please have a look.
The model is now available on WebLLM Chat: https://chat.webllm.ai/#/chat. Thanks for the contribution!
Problem Description
My Android phone has limited RAM, so it can only run the TinyLlama model. However, TinyLlama gives inferior results compared to Qwen2-0.5B-Instruct (tested on desktop). Although Qwen2-0.5B has fewer parameters, I am unable to run it on the phone because web-llm-chat offers only the unquantized version of Qwen2-0.5B, while it does offer a quantized version of TinyLlama.
Solution Description
Please add the quantized versions of Qwen2-0.5B (q4f16 and q4f32) to the list of supported models in web-llm-chat. Both are already available on Hugging Face. A rough sketch of what the corresponding model-list entry could look like is included below.
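The sketch below assumes the `prebuiltAppConfig` / `CreateMLCEngine` API and the ModelRecord field names used in recent @mlc-ai/web-llm releases; the Hugging Face URL, model ID, WASM path, and VRAM figure are placeholders for illustration, not the actual artifacts requested in this issue:

```ts
import { CreateMLCEngine, prebuiltAppConfig } from "@mlc-ai/web-llm";

// Rough sketch: extend the prebuilt model list with a quantized Qwen2-0.5B
// entry. All URLs and IDs below are placeholders for illustration only.
const appConfig = {
  ...prebuiltAppConfig,
  model_list: [
    ...prebuiltAppConfig.model_list,
    {
      model: "https://huggingface.co/mlc-ai/Qwen2-0.5B-Instruct-q4f16_1-MLC", // placeholder weights URL
      model_id: "Qwen2-0.5B-Instruct-q4f16_1-MLC", // placeholder model ID
      model_lib: "https://example.com/Qwen2-0.5B-Instruct-q4f16_1-webgpu.wasm", // placeholder model lib
      low_resource_required: true,
      vram_required_MB: 950, // rough guess, not a measured value
    },
  ],
};

// Load the new entry through an engine created with the extended config.
const engine = await CreateMLCEngine("Qwen2-0.5B-Instruct-q4f16_1-MLC", {
  appConfig,
});
```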
Alternatives Considered
No response
Additional Context
No response