diff --git a/experiment_effect_of_dpo/README.md b/experiment_effect_of_dpo/README.md
new file mode 100644
index 0000000..951e37d
--- /dev/null
+++ b/experiment_effect_of_dpo/README.md
@@ -0,0 +1,38 @@
+## Train DPO
+
+```bash
+002-001-dpo-temp-0_3-v-all-ref.sh
+```
+
+### Configuration
+
+- BASE_MODEL: Name of the model used for saving.
+- DATA_PATH: Path to the training dataset.
+- EPOCH: Number of training epochs.
+- LR: Learning rate; 2e-5 for full fine-tuning and 2e-4 for LoRA.
+- GRADIENT_ACCUMULATION_STEPS: Number of gradient accumulation steps.
+- MAX_LEN: Maximum training sequence length.
+- MAX_PROMPT_LEN: Maximum prompt length during training.
+- MICRO_BSZ: Batch size per step (micro batch size).
+- VAL_SIZE: Size of the validation split.
+- WANDB_NAME: Wandb project name.
+- WARMUP_STEPS: Number of warmup steps for the learning-rate scheduler.
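+
+The invocation below is a minimal sketch of how these settings might be supplied, assuming the script reads them from environment variables and is launched directly with bash; every value shown is an illustrative placeholder, not a value taken from this repo.
+
+```bash
+# Sketch only: assumes 002-001-dpo-temp-0_3-v-all-ref.sh reads its
+# configuration from environment variables. All values are placeholders.
+BASE_MODEL=my-dpo-model \
+DATA_PATH=data/preference_pairs.json \
+EPOCH=3 \
+LR=2e-5 \
+GRADIENT_ACCUMULATION_STEPS=8 \
+MAX_LEN=2048 \
+MAX_PROMPT_LEN=1024 \
+MICRO_BSZ=2 \
+VAL_SIZE=0.05 \
+WANDB_NAME=effect-of-dpo \
+WARMUP_STEPS=100 \
+bash 002-001-dpo-temp-0_3-v-all-ref.sh
+```
\ No newline at end of file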