Compress TinyLlama model using synthetic data

This example demonstrates how to optimize Large Language Models (LLMs) with the NNCF weight compression API, using synthetic data as the calibration dataset for the advanced algorithms. It applies 4/8-bit mixed-precision quantization and the Scale Estimation algorithm to the weights of the Linear (fully connected) layers of the TinyLlama/TinyLlama-1.1B-Chat-v1.0 model. To evaluate the accuracy of a compressed model, the example uses the WhoWhatBench library to measure the similarity between texts generated by the baseline model and by the compressed model.
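As a rough illustration of what 4-bit weight quantization means, the toy sketch below quantizes one group of weights to signed 4-bit integers with a single shared scale. It is not NNCF's implementation: the function names are hypothetical, the max-abs scale is the simplest possible choice, and the Scale Estimation algorithm exists precisely to pick better scales with the help of a calibration dataset.

```python
def quantize_group_int4_sym(weights):
    """Quantize one group of floats to signed 4-bit integers in [-8, 7].

    Toy illustration only: the scale is taken from the largest absolute
    weight in the group, so that this weight maps to +/-7.
    """
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 7.0 if max_abs > 0 else 1.0
    quantized = [max(-8, min(7, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize_group(quantized, scale):
    """Map the 4-bit integers back to approximate float weights."""
    return [q * scale for q in quantized]


if __name__ == "__main__":
    group = [1.75, -0.75, 0.25, 0.0]
    q, s = quantize_group_int4_sym(group)
    print(q, s)
    print(dequantize_group(q, s))
```

Each weight is stored as a 4-bit integer plus one scale per group, and the rounding error per weight is at most half the scale as long as no clipping occurs.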

The example includes the following steps:

  • Prepare the wikitext dataset.
  • Prepare the TinyLlama/TinyLlama-1.1B-Chat-v1.0 text-generation model in OpenVINO representation using Optimum-Intel.
  • Compress the weights of the model with the NNCF weight compression algorithm, using Scale Estimation and the wikitext dataset.
  • Prepare a synthetic dataset using the nncf.data.generate_text_data method.
  • Compress the weights of the model with the NNCF weight compression algorithm, using Scale Estimation and the synthetic dataset.
  • Measure the similarity of the two models optimized with the different datasets.
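The synthetic-data steps above can be sketched roughly as follows. This is a minimal sketch, not the contents of main.py: the helper names, `dataset_size=100`, `ratio=0.8`, and the choice of `INT4_SYM` are assumptions made for illustration, and `transform_fn` is a hypothetical caller-supplied function that turns one text into the input dictionary expected by the OpenVINO model. Heavy imports are deferred into the functions so the snippet can be read and imported without NNCF installed.

```python
MODEL_ID = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"


def generate_synthetic_texts(model_id=MODEL_ID, dataset_size=100):
    """Let the model generate its own calibration texts.

    dataset_size=100 is an assumed value, not taken from the example.
    """
    import nncf
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    hf_model = AutoModelForCausalLM.from_pretrained(model_id)
    return nncf.data.generate_text_data(hf_model, tokenizer, dataset_size=dataset_size)


def compress_with_texts(ov_model, texts, transform_fn, ratio=0.8):
    """Compress an openvino.Model with Scale Estimation calibrated on `texts`.

    ratio is the share of layers compressed to 4 bits; the remaining layers
    are kept in 8 bits. ratio=0.8 and INT4_SYM are assumed values.
    transform_fn (hypothetical) maps one text to the model's input dict.
    """
    import nncf

    calibration = nncf.Dataset(texts, transform_fn)
    return nncf.compress_weights(
        ov_model,
        mode=nncf.CompressWeightsMode.INT4_SYM,
        ratio=ratio,
        dataset=calibration,
        scale_estimation=True,
    )
```

Running the same `compress_with_texts` call once with wikitext samples and once with the generated texts gives the two compressed models whose similarity to the baseline is then compared.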

Install requirements

To use this example:

  • Create a separate Python* environment and activate it: python3 -m venv nncf_env && source nncf_env/bin/activate
  • Install dependencies:
pip install -U pip
pip install -r requirements.txt
pip install ../../../../

Run Example

The example is fully automated. Just run the following command in the prepared Python environment:

python main.py