This is the implementation for the paper *Fictitious Synthetic Data Can Improve LLM Factuality via Prerequisite Learning*. We propose PREREQ-TUNE, a fine-tuning strategy that reduces LLM hallucinations. PREREQ-TUNE disentangles the learning of skills and knowledge, addressing the knowledge inconsistency between pre-training and fine-tuning. It further leverages fictitious synthetic data to strengthen the grounding of LLM outputs in the model's internal knowledge.
- Dataset on Hugging Face: Link to our synthetic datasets.
- Models on Hugging Face: Link to our fine-tuned models.
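At a high level, PREREQ-TUNE runs in two stages: a prerequisite-learning stage that first teaches the model the knowledge required by the fine-tuning data, followed by a supervised fine-tuning stage that trains the task skill on top of it. The sketch below illustrates our reading of this recipe with two LoRA adapters; the base model and LoRA hyperparameters are placeholder assumptions, and the actual training is driven by `scripts/run_cpt.py` and `scripts/run_sft.py` with the configs in `recipes`.

```python
# Illustrative two-stage sketch of PREREQ-TUNE with two LoRA adapters; the
# base model and LoRA hyperparameters below are placeholder assumptions.
import torch
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf", torch_dtype=torch.bfloat16
)

# Stage 1 (scripts/run_cpt.py): a "knowledge" LoRA absorbs the (possibly
# fictitious) documents, so the knowledge assumed by the downstream SFT data
# is inside the model before the skill is taught.
knowledge_cfg = LoraConfig(r=16, lora_alpha=32,
                           target_modules=["q_proj", "v_proj"],
                           task_type="CAUSAL_LM")
model = get_peft_model(base, knowledge_cfg, adapter_name="knowledge")
# ... continued pre-training on the synthetic documents ...

# Stage 2 (scripts/run_sft.py): a separate "skill" LoRA is fine-tuned on the
# task data while the knowledge LoRA stays active but frozen, so the skill
# LoRA learns to ground answers in existing knowledge rather than absorb facts.
skill_cfg = LoraConfig(r=16, lora_alpha=32,
                       target_modules=["q_proj", "v_proj"],
                       task_type="CAUSAL_LM")
model.add_adapter("skill", skill_cfg)
model.base_model.set_adapter(["knowledge", "skill"])  # both active in forward
for name, param in model.named_parameters():
    # keep only the skill LoRA trainable; everything else stays frozen
    param.requires_grad = "lora_" in name and ".skill." in name
# ... supervised fine-tuning; at test time only the skill LoRA is kept ...
```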
To run the code in this project, first create a Python virtual environment, e.g. with Conda:
conda create -n prereq_tune python=3.10 && conda activate prereq_tune
Next, install PyTorch v2.4.0:
pip install torch==2.4.0 torchvision==0.19.0 torchaudio==2.4.0 --index-url https://download.pytorch.org/whl/cu121
You can then install the remaining package dependencies as follows:
git clone https://github.com/huggingface/alignment-handbook.git
cd ./alignment-handbook/
rm setup.py
cp ../setup.py ./
python -m pip install .
You will also need Flash Attention 2 installed, which can be done by running:
python -m pip install flash-attn==2.6.3 --no-build-isolation
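If the installation succeeded, a quick check from Python should report the pinned versions:

```python
# Quick sanity check of the environment.
import torch
import flash_attn

print(torch.__version__)          # expect 2.4.0 (+cu121)
print(torch.cuda.is_available())  # expect True on a CUDA machine
print(flash_attn.__version__)     # expect 2.6.3
```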
The repository is organized as follows:

├── bash_scripts             <- Bash scripts to run experiments
├── recipes                  <- Recipe configs for all datasets, accelerate configs
├── scripts
│   ├── run_cpt.py           <- First step: prerequisite learning
│   ├── run_sft.py           <- Second step: supervised fine-tuning
│   ├── evaluate_qa.py       <- Evaluate on PopQA and HotpotQA
│   ├── generate_longform.py <- Generate long-form answers
To run our experiments, use the corresponding script in bash_scripts.
For example, for HotpotQA, you can run:
bash bash_scripts/hotpotqa.sh
Please remember to replace SAVE_DIR with the path where you want the models saved.
The evaluation for PopQA and HotpotQA is already included in their bash scripts; please refer to the scripts for details.
For biography generation and medical QA, we use FActScore for evaluation. However, we slightly modify the pipeline of FActScore as described in our paper. We will release the modified code soon.
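In the meantime, vanilla FActScore can provide a rough reference score for the long-form outputs. Below is a minimal sketch, assuming the `factscore` pip package from the original FActScore repo; the OpenAI key file and the example topic/generation are placeholders:

```python
# Vanilla FActScore scoring (NOT our modified pipeline, which is unreleased).
# Assumes `pip install factscore` plus its data/cache setup per the FActScore
# repo; the key file and example inputs below are placeholders.
from factscore.factscorer import FactScorer

fs = FactScorer(openai_key="api.key")
topics = ["John Doe"]                 # the entities the biographies are about
generations = ["John Doe is a ..."]   # the model's long-form answers
out = fs.get_score(topics, generations, gamma=10)
print(out["score"])  # fraction of generated atomic facts judged supported
```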
If you find the content of this repo useful in your work, please cite it as follows:
@misc{liu2024fictitioussyntheticdataimprove,
      title={Fictitious Synthetic Data Can Improve LLM Factuality via Prerequisite Learning},
      author={Yujian Liu and Shiyu Chang and Tommi Jaakkola and Yang Zhang},
      year={2024},
      eprint={2410.19290},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2410.19290},
}
Our implementation is based on the following repos: