
medium based on llama 3.1 #226

Open
themrzmaster opened this issue Jul 30, 2024 · 8 comments

Comments

@themrzmaster

It would be really nice to have a Functionary version of Llama 3.1 70B/8B!

@jeffrey-fong
Contributor

Hi, we are working on it actively right now. Looking forward to sharing good news soon!

@xdevfaheem

@jeffrey-fong would it be possible to release the datasets, or at least the LoRA adapters?

@xdevfaheem

> Hi, we are working on it actively right now. Looking forward to sharing good news soon!

Eagerly awaiting llama3.1-8b-functionary-medium-128k 🤩

@khai-meetkai
Collaborator

Hi @xdevfaheem, @themrzmaster, we have just released our new model, meetkai/functionary-small-v3.1, which is based on meta-llama/Meta-Llama-3.1-8B-Instruct.
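
For anyone who wants to try it quickly, a minimal local test could look roughly like this (a sketch, not an official snippet: the get_current_weather tool is purely illustrative, and it assumes the model's chat template accepts the standard tools argument of recent transformers; check the model card for the supported prompt format):

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meetkai/functionary-small-v3.1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto", torch_dtype="auto")

# Hypothetical tool definition, purely for illustration.
tools = [{
    "type": "function",
    "function": {
        "name": "get_current_weather",
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]
messages = [{"role": "user", "content": "What is the weather like in Istanbul right now?"}]

# Assumption: the chat template knows how to render the tool definitions.
input_ids = tokenizer.apply_chat_template(
    messages, tools=tools, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))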

@khai-meetkai
Collaborator

Hi @xdevfaheem, @themrzmaster, we have also released our 70B model: functionary-medium-v3.1.

@xdevfaheem

That's awesome! Great work, @khai-meetkai. Thank you so much!

@SoheylM

SoheylM commented Aug 10, 2024

@khai-meetkai Thanks a lot for the great work! Any timeline for functionary-medium-v3.1/3.2 quantized with AWQ?

@SoheylM

SoheylM commented Sep 24, 2024

I tried to produce an AWQ-quantized version of functionary-medium-v3.2 using AutoAWQ's quantization script (full GPU only; GPU-CPU offload is bugged with the transformers version used for Llama 3.1).

Unfortunately, it seems I may have done something wrong, as the result is not performing very well, or at least well below my expectations compared to the AWQ v3.0 version. I manually edited config.json to replace FunctionaryForCausalLM with LlamaForCausalLM in architectures, then ran AutoAWQ's quantization script; a sketch of that edit and the full script follow below.
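
The config.json edit is roughly the following (a sketch; it would run after the snapshot_download step in the script below):

import json
import os

local_model_path = 'functionary-medium-v3.2-local'
config_path = os.path.join(local_model_path, 'config.json')

# Swap the custom architecture name for the plain Llama one so AutoAWQ can load the checkpoint.
with open(config_path) as f:
    config = json.load(f)
config['architectures'] = ['LlamaForCausalLM']  # was ['FunctionaryForCausalLM']
with open(config_path, 'w') as f:
    json.dump(config, f, indent=2)

The quantization script itself: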

from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer
from huggingface_hub import snapshot_download
import os

model_path = 'meetkai/functionary-medium-v3.2'
local_model_path = 'functionary-medium-v3.2-local'
quant_path = 'functionary-medium-v3.2-awq'
quant_config = { "zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM" }

# Download the model files to a local directory
print("Downloading the model files...")
snapshot_download(repo_id=model_path, local_dir=local_model_path, allow_patterns=["*"])


# Load the model
print("Loading the model...")
model = AutoAWQForCausalLM.from_pretrained(
    local_model_path, device_map="auto", low_cpu_mem_usage=True, use_cache=False, trust_remote_code=True
)
print("Model loaded successfully.")

# Load the tokenizer
print("Loading the tokenizer...")
tokenizer = AutoTokenizer.from_pretrained(local_model_path, trust_remote_code=True)
print("Tokenizer loaded successfully.")

# Quantize the model
print("Starting quantization...")
model.quantize(tokenizer=tokenizer, quant_config=quant_config)
print("Quantization completed.")

# Save the quantized model
print(f"Saving quantized model to '{quant_path}'...")
model.save_quantized(quant_path)
tokenizer.save_pretrained(quant_path)
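
To sanity-check the output folder, the quantized model can be loaded back with AutoAWQ and prompted once, roughly like this (a sketch; the prompt is arbitrary and it assumes the model fits on the available GPUs):

from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

quant_path = 'functionary-medium-v3.2-awq'

# Reload the AWQ weights for a quick smoke test.
model = AutoAWQForCausalLM.from_quantized(quant_path, fuse_layers=True)
tokenizer = AutoTokenizer.from_pretrained(quant_path, trust_remote_code=True)

prompt = "Write a one-line Python function that adds two numbers."
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.cuda()
output = model.generate(input_ids, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))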
