
The training of ImageBind-LLM consists of two phases: pre-training and fine-tuning.

## Pre-training

### Data

- We use multiple datasets with image-text pairs for pre-training. The texts are English-only.

- For each dataset, the meta file should be organized in `.csv` format as follows:

  ```csv
  url		caption
  /path/to/image1		caption1
  /path/to/image2		caption2
  ...
  ```

  Alternatively, you may modify the `PretrainDataset` implementation to adapt it to your own meta file format.

- Write a `.yaml` config file to specify the datasets for pre-training (a minimal loading sketch follows this list):

  ```yaml
  META:
    - '/path/to/cc3m.csv'
    - '/path/to/coco.csv'
    ...
  ```
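
The `PretrainDataset` in this repository consumes these files directly. As a rough, hypothetical illustration of the expected layout (not the repository's own loader), the snippet below reads the `.yaml` config and the tab-separated `.csv` meta files into (image path, caption) pairs; the config path, delimiter, and column names simply follow the examples above.

```python
# Hypothetical loader sketch for the meta files described above.
# Assumes the .yaml config and tab-separated .csv layout shown in this document;
# the repository's actual PretrainDataset implementation may differ.
import csv
import yaml  # pip install pyyaml

def load_image_text_pairs(config_path):
    """Collect (image_path, caption) pairs from every meta file listed in the config."""
    with open(config_path) as f:
        config = yaml.safe_load(f)

    pairs = []
    for meta_path in config["META"]:
        with open(meta_path, newline="") as f:
            reader = csv.DictReader(f, delimiter="\t")  # columns: url, caption
            for row in reader:
                pairs.append((row["url"], row["caption"]))
    return pairs

if __name__ == "__main__":
    pairs = load_image_text_pairs("/path/to/pretrain-data-config.yaml")
    print(f"loaded {len(pairs)} image-text pairs")
```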

### Start pre-training

We are now ready to start pre-training (make sure the original LLaMA / Open-Chinese-LLaMA weights are available at `/path/to/llama_model_weights`).

```bash
. exps/pretrain.sh /path/to/llama_model_weights /path/to/pretrain-data-config.yaml /output/path
```

### Pre-trained checkpoint

We release our image-text pre-trained checkpoint for reproducibility; you may use it directly for fine-tuning.

| Name | Link |
|------|------|
| 7B-pretrained.pth | File Link |
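
Before fine-tuning, you may want to sanity-check the downloaded file. The snippet below is a generic inspection sketch that assumes the checkpoint is a standard `torch.save` file; the exact key layout inside it is not documented here and is an assumption.

```python
# Generic sanity check for the downloaded checkpoint.
# Assumes a standard torch.save file; the key layout inside is an assumption.
import torch

ckpt = torch.load("/path/to/7B-pretrained.pth", map_location="cpu")
print(type(ckpt))                   # typically a dict of tensors / sub-dicts
if isinstance(ckpt, dict):
    for key in list(ckpt)[:10]:     # peek at the first few top-level keys
        print(key)
```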

## Fine-tuning

### Data

- We fine-tune ImageBind-LLM on text-only as well as image-text instruction-following datasets.

- The datasets used to train our released weights are listed below:

  | Name | Link |
  |------|------|
  | alpaca_gpt4_data.json | File Link |
  | alpaca_gpt4_data_zh.json | File Link |
  | llava_instruct_150k.json | File Link |
  | alpaca_data_zh_51k.json | File Link |

- Similar to pre-training, write a `.yaml` config file to specify the datasets for fine-tuning (a minimal sketch for reading the Alpaca-style files follows this list):

  ```yaml
  META:
    - '/path/to/alpaca_gpt4_data.json'
    - '/path/to/alpaca_gpt4_data_zh.json'
    ...
  ```
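
As a rough illustration of the instruction-tuning data, the snippet below peeks at one Alpaca-style file. It assumes the common `instruction` / `input` / `output` schema used by `alpaca_gpt4_data.json`; note that `llava_instruct_150k.json` uses a different, conversation-style layout with image references, so this sketch does not apply to it.

```python
# Peek at an Alpaca-style instruction-following file.
# Assumes the instruction / input / output schema of alpaca_gpt4_data.json;
# llava_instruct_150k.json uses a different, conversation-style layout.
import json

with open("/path/to/alpaca_gpt4_data.json") as f:
    samples = json.load(f)  # a list of dicts

print(f"{len(samples)} samples")
example = samples[0]
print("instruction:", example["instruction"])
print("input:      ", example.get("input", ""))
print("output:     ", example["output"])
```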

### Start fine-tuning

```bash
. exps/finetune.sh \
  /path/to/llama_model_weights /path/to/pre-trained/checkpoint.pth \
  /path/to/finetune-data-config.yaml /output/path
```