
medium based on llama 3.1 #226

Open
themrzmaster opened this issue Jul 30, 2024 · 8 comments

Comments

@themrzmaster

It would be really nice to have a Functionary version of Llama 3.1 70B/8B!

@jeffrey-fong
Contributor

Hi, we are working on it actively right now. Looking forward to sharing good news soon!

@xdevfaheem

@jeffrey-fong would it be possible to release the datasets, or at least the LoRA adapters?

@xdevfaheem

> Hi, we are working on it actively right now. Looking forward to sharing good news soon!

Eagerly awaiting llama3.1-8b-functionary-medium-128k 🤩

@khai-meetkai
Collaborator

Hi @xdevfaheem, @themrzmaster, we have just released our new model, meetkai/functionary-small-v3.1, which is based on meta-llama/Meta-Llama-3.1-8B-Instruct.
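
For anyone who wants to try it quickly, a minimal local test could look roughly like this (a sketch, not an official snippet: the get_current_weather tool is purely illustrative, and it assumes the model's chat template accepts the standard tools argument of recent transformers; check the model card for the supported prompt format):

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meetkai/functionary-small-v3.1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto", torch_dtype="auto")

# Hypothetical tool definition, purely for illustration.
tools = [{
    "type": "function",
    "function": {
        "name": "get_current_weather",
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]
messages = [{"role": "user", "content": "What is the weather like in Istanbul right now?"}]

# Assumption: the chat template knows how to render the tool definitions.
input_ids = tokenizer.apply_chat_template(
    messages, tools=tools, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))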

@khai-meetkai
Collaborator

Hi @xdevfaheem, @themrzmaster, we have also released our 70B model: functionary-medium-v3.1.

@xdevfaheem

That's awesome! Great work, @khai-meetkai. Thank you so much!

@SoheylM

SoheylM commented Aug 10, 2024

@khai-meetkai Thanks a lot for the great work! Any timeline for functionary-medium-v3.1/3.2 quantized with AWQ?

@SoheylM

SoheylM commented Sep 24, 2024

I tried to produce an AWQ-quantized version of functionary-medium-v3.2 using AutoAWQ's quantization script (full GPU only; GPU-CPU offload is bugged with the transformers version used for Llama 3.1).

Unfortunately, it seems I may have done something wrong, as the result is not performing very well, or at least well below my expectations compared to the AWQ v3.0 version. I manually edited config.json to replace FunctionaryForCausalLM with LlamaForCausalLM in architectures, then ran AutoAWQ's quantization script; a sketch of that edit and the full script follow below.
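
The config.json edit is roughly the following (a sketch; it would run after the snapshot_download step in the script below):

import json
import os

local_model_path = 'functionary-medium-v3.2-local'
config_path = os.path.join(local_model_path, 'config.json')

# Swap the custom architecture name for the plain Llama one so AutoAWQ can load the checkpoint.
with open(config_path) as f:
    config = json.load(f)
config['architectures'] = ['LlamaForCausalLM']  # was ['FunctionaryForCausalLM']
with open(config_path, 'w') as f:
    json.dump(config, f, indent=2)

The quantization script itself: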

from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer
from huggingface_hub import snapshot_download
import os

model_path = 'meetkai/functionary-medium-v3.2'
local_model_path = 'functionary-medium-v3.2-local'
quant_path = 'functionary-medium-v3.2-awq'
quant_config = { "zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM" }

# Download the model files to a local directory
print("Downloading the model files...")
snapshot_download(repo_id=model_path, local_dir=local_model_path, allow_patterns=["*"])


# Load the model
print("Loading the model...")
model = AutoAWQForCausalLM.from_pretrained(
    local_model_path, device_map="auto", low_cpu_mem_usage=True, use_cache=False, trust_remote_code=True
)
print("Model loaded successfully.")

# Load the tokenizer
print("Loading the tokenizer...")
tokenizer = AutoTokenizer.from_pretrained(local_model_path, trust_remote_code=True)
print("Tokenizer loaded successfully.")

# Quantize the model
print("Starting quantization...")
model.quantize(tokenizer=tokenizer, quant_config=quant_config)
print("Quantization completed.")

# Save the quantized model
print(f"Saving quantized model to '{quant_path}'...")
model.save_quantized(quant_path)
tokenizer.save_pretrained(quant_path)
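
To sanity-check the output folder, the quantized model can be loaded back with AutoAWQ and prompted once, roughly like this (a sketch; the prompt is arbitrary and it assumes the model fits on the available GPUs):

from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

quant_path = 'functionary-medium-v3.2-awq'

# Reload the AWQ weights for a quick smoke test.
model = AutoAWQForCausalLM.from_quantized(quant_path, fuse_layers=True)
tokenizer = AutoTokenizer.from_pretrained(quant_path, trust_remote_code=True)

prompt = "Write a one-line Python function that adds two numbers."
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.cuda()
output = model.generate(input_ids, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))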
