Compress TinyLlama model using synthetic data

This example demonstrates how to optimize a Large Language Model (LLM) with the NNCF weight compression API, using synthetic data as the calibration dataset for the advanced algorithms. It applies 4/8-bit mixed-precision quantization and the Scale Estimation algorithm to the weights of the Linear (fully connected) layers of the TinyLlama/TinyLlama-1.1B-Chat-v1.0 model. To evaluate the accuracy of the compressed model, the example uses the WhoWhatBench library to measure the similarity between texts generated by the baseline and compressed models.
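
To make the idea of low-bit weight quantization concrete, here is a toy, pure-Python illustration of symmetric 4-bit quantization of one group of weights that share a single scale. It is not NNCF's implementation: the function names are hypothetical, the scale is taken directly from the largest magnitude, and there is no data-driven refinement of the scale such as Scale Estimation performs.

```python
def quantize_group_int4_sym(weights):
    """Quantize one group of float weights to signed 4-bit integers.

    Hypothetical helper for illustration only, not part of NNCF.
    """
    max_abs = max(abs(w) for w in weights)
    # Signed 4-bit integers cover [-8, 7]; map the largest magnitude to 7.
    scale = max_abs / 7 if max_abs > 0 else 1.0
    quantized = [max(-8, min(7, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize_group(quantized, scale):
    """Map 4-bit integers back to approximate float weights."""
    return [q * scale for q in quantized]


weights = [0.7, -0.3, 0.12, 0.0]
quantized, scale = quantize_group_int4_sym(weights)
restored = dequantize_group(quantized, scale)

# Rounding to the nearest integer keeps the error within half a scale step.
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(quantized, max_error)
```

In a mixed-precision setup, only part of the layers would be stored in 4 bits like this, while the remaining layers would keep 8-bit weights to limit the accuracy loss.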

The example includes the following steps:

1. Prepare the wikitext dataset.
2. Prepare the TinyLlama/TinyLlama-1.1B-Chat-v1.0 text-generation model in OpenVINO representation using Optimum-Intel.
3. Compress the weights of the model with the NNCF weight compression algorithm, using Scale Estimation and the wikitext dataset.
4. Prepare a synthetic dataset using the nncf.data.generate_text_data method.
5. Compress the weights of the model with the NNCF weight compression algorithm, using Scale Estimation and the synthetic dataset.
6. Measure the similarity of the two models optimized with the different datasets.
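
The synthetic-data steps above can be sketched roughly as follows. This is a minimal sketch rather than the code in main.py: the function name, its parameters, the `transform_fn` callback, and the default values for `dataset_size` and `ratio` are assumptions made for illustration. It also assumes that `hf_model` is a Hugging Face causal language model accepted by nncf.data.generate_text_data and that `ov_model` is the OpenVINO model to compress.

```python
def compress_with_synthetic_data(ov_model, hf_model, tokenizer, transform_fn,
                                 dataset_size=100, ratio=0.8):
    """Generate synthetic calibration texts and compress the model weights.

    Hypothetical wrapper for illustration; the default values are
    assumptions, not the settings used by the example.
    """
    # Imported lazily so this module loads even where NNCF is not installed.
    import nncf

    # Let the model generate its own calibration texts.
    synthetic_texts = nncf.data.generate_text_data(
        hf_model, tokenizer, dataset_size=dataset_size
    )

    # transform_fn (assumed to be supplied by the caller) converts one
    # generated text into the dictionary of inputs the OpenVINO model expects.
    calibration_dataset = nncf.Dataset(synthetic_texts, transform_fn)

    # 4-bit weights for a `ratio` share of the layers, 8-bit for the rest,
    # with Scale Estimation driven by the calibration dataset.
    return nncf.compress_weights(
        ov_model,
        mode=nncf.CompressWeightsMode.INT4_SYM,
        ratio=ratio,
        dataset=calibration_dataset,
        scale_estimation=True,
    )
```

The wikitext variant would differ only in where the list of texts comes from, so the same nncf.Dataset and nncf.compress_weights calls could be reused for both runs.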

Install requirements

To use this example:

1. Create a separate Python* environment and activate it: python3 -m venv nncf_env && source nncf_env/bin/activate
2. Install dependencies:

pip install -U pip
pip install -r requirements.txt
pip install ../../../../

Run Example

The example is fully automated. Just run the following command in the prepared Python environment:

python main.py
