This repository has been archived by the owner on Jan 30, 2024. It is now read-only.

Instructions to Replicate Zephyr-7b-β

As described in the Zephyr technical report, training this model proceeds in two steps:

  1. Apply SFT to fine-tune Mistral 7B on a filtered version of the UltraChat dataset. The result is an SFT model like zephyr-7b-sft-full or zephyr-7b-sft-lora.
  2. Align the SFT model to AI feedback via DPO on a preprocessed version of the UltraFeedback dataset. The result is a DPO model like zephyr-7b-dpo-full or zephyr-7b-dpo-lora.

See below for commands to train these models using either DeepSpeed ZeRO-3 or LoRA.

Full training examples

Training the full model requires 8 GPUs with 80GB of VRAM each.
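
Before launching a full run, it can help to confirm that the machine matches the requirement above. The following is a minimal sketch, not part of this repository: `check_gpus` is a hypothetical helper, and the 80000 MiB threshold is an assumption standing in for "80GB" that you may need to adjust to what your cards actually report. It reads one memory value per line, so it can be fed from `nvidia-smi`.

```shell
#!/usr/bin/env bash

# Hypothetical preflight helper (an assumption, not shipped with the recipes).
# Reads one GPU memory size in MiB per line on stdin, prints how many GPUs
# meet the threshold, and succeeds only if enough of them do.
#   $1 = minimum number of GPUs (default 8)
#   $2 = minimum memory per GPU in MiB (default 80000, an assumed threshold)
check_gpus() {
  local min_gpus="${1:-8}" min_mib="${2:-80000}" count=0 mib
  while read -r mib || [ -n "$mib" ]; do
    [ -n "$mib" ] || continue            # skip blank lines
    if [ "$mib" -ge "$min_mib" ]; then
      count=$((count + 1))
    fi
  done
  echo "$count"
  [ "$count" -ge "$min_gpus" ]
}
```

One possible way to use it on a machine with NVIDIA drivers installed:

nvidia-smi --query-gpu=memory.total --format=csv,noheader,nounits | check_gpus 8 80000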

# Step 1 - SFT
ACCELERATE_LOG_LEVEL=info accelerate launch --config_file recipes/accelerate_configs/deepspeed_zero3.yaml scripts/run_sft.py recipes/zephyr-7b-beta/sft/config_full.yaml

# Step 2 - DPO
ACCELERATE_LOG_LEVEL=info accelerate launch --config_file recipes/accelerate_configs/deepspeed_zero3.yaml scripts/run_dpo.py recipes/zephyr-7b-beta/dpo/config_full.yaml

LoRA training examples

# Step 1 - SFT
ACCELERATE_LOG_LEVEL=info accelerate launch --config_file recipes/accelerate_configs/multi_gpu.yaml --num_processes=1 scripts/run_sft.py recipes/zephyr-7b-beta/sft/config_lora.yaml

# Step 2 - DPO
ACCELERATE_LOG_LEVEL=info accelerate launch --config_file recipes/accelerate_configs/multi_gpu.yaml --num_processes=1 scripts/run_dpo.py recipes/zephyr-7b-beta/dpo/config_lora.yaml
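
The two steps can also be chained so that DPO only starts if SFT succeeds. The sketch below is one way to do that under stated assumptions: `run_stage`, `run_recipe`, and the `DRY_RUN` variable are hypothetical names introduced here, while the paths and flags are copied from the commands above. It assumes the DPO config already points at the SFT model you want to align; it does not edit any config file.

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical helper (not part of the repository): run one stage of the recipe.
#   $1 = stage:   sft | dpo
#   $2 = variant: full | lora
run_stage() {
  local stage="$1" variant="$2"
  # Pick the accelerate config used by the commands in this README.
  case "$variant" in
    full) set -- --config_file recipes/accelerate_configs/deepspeed_zero3.yaml ;;
    lora) set -- --config_file recipes/accelerate_configs/multi_gpu.yaml --num_processes=1 ;;
    *) echo "unknown variant: $variant" >&2; return 2 ;;
  esac
  set -- accelerate launch "$@" "scripts/run_${stage}.py" \
    "recipes/zephyr-7b-beta/${stage}/config_${variant}.yaml"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    # Print the command instead of launching it.
    echo "ACCELERATE_LOG_LEVEL=info $*"
  else
    ACCELERATE_LOG_LEVEL=info "$@"
  fi
}

# Run SFT, then DPO; "&&" skips DPO if SFT exits with an error.
run_recipe() {
  local variant="${1:-full}"
  run_stage sft "$variant" && run_stage dpo "$variant"
}
```

Calling `DRY_RUN=1 run_recipe lora` prints the two LoRA commands without launching anything, which is a cheap way to check the paths before committing GPU time. `run_recipe full` launches the DeepSpeed ZeRO-3 variant.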