This project provides functionality for training and configuring Vision-Language Models (VLMs).
- Open vision-language models, e.g. Qwen2-VL, Pixtral, Llama 3.2 Vision
- Training methods for VLMs: Pre-Training and Supervised Fine-Tuning (a minimal sketch follows below)
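To show what supervised fine-tuning of a VLM boils down to, here is a minimal sketch using the Hugging Face transformers API for Qwen2-VL. It is illustrative only and is not LLaVA-Pool's training code; the checkpoint name, dummy image, and single training example are assumptions.

```python
from PIL import Image
from transformers import AutoProcessor, Qwen2VLForConditionalGeneration

model_id = "Qwen/Qwen2-VL-2B-Instruct"  # example checkpoint, not prescribed by this repo
processor = AutoProcessor.from_pretrained(model_id)
model = Qwen2VLForConditionalGeneration.from_pretrained(model_id)

# One (image, conversation) training example; a real dataset would provide many of these.
image = Image.new("RGB", (224, 224), color="gray")
messages = [
    {"role": "user", "content": [
        {"type": "image"},
        {"type": "text", "text": "Describe this image."},
    ]},
    {"role": "assistant", "content": [{"type": "text", "text": "A plain gray square."}]},
]

# Render the conversation with the model's chat template, then pack image + text tensors.
text = processor.apply_chat_template(messages, tokenize=False)
inputs = processor(text=[text], images=[image], return_tensors="pt")

# Supervised fine-tuning: next-token loss on the rendered conversation.
# Real pipelines mask prompt tokens with -100 so only the assistant answer is supervised.
labels = inputs["input_ids"].clone()
outputs = model(**inputs, labels=labels)
outputs.loss.backward()  # an optimizer step would follow in a full training loop
```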
```bash
git clone https://github.com/thisisiron/LLaVA-Pool.git
cd LLaVA-Pool
pip install flash-attn --no-build-isolation
```
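flash-attn provides fused attention kernels that speed up VLM training and inference. As a hedged illustration of why it is installed (not code from this repo), transformers models such as Qwen2-VL can opt into it at load time; the checkpoint name is an example, and a CUDA GPU with fp16/bf16 weights is assumed.

```python
import torch
from transformers import Qwen2VLForConditionalGeneration

# Assumes flash-attn is installed (as above) and a CUDA-capable GPU is available.
model = Qwen2VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2-VL-2B-Instruct",              # example checkpoint
    torch_dtype=torch.bfloat16,               # flash-attn requires fp16 or bf16
    attn_implementation="flash_attention_2",  # use the FlashAttention-2 kernels
    device_map="auto",
)
```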
| Model | Converter |
|---|---|
| Qwen2-VL | qwen2_vl |
| Llama 3.2 Vision | llama3.2_vision |
| Pixtral | pixtral |
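Each converter name in the table selects the prompt/label format for its model family. The registry below is only a sketch of that idea; the function names, the decorator, and the message layout are assumptions for illustration, not LLaVA-Pool's actual converter API.

```python
from typing import Callable, Dict, List

# Hypothetical converter registry keyed by the names in the table above.
CONVERTERS: Dict[str, Callable[[str, str], List[dict]]] = {}

def register(name: str):
    def wrap(fn: Callable[[str, str], List[dict]]):
        CONVERTERS[name] = fn
        return fn
    return wrap

@register("qwen2_vl")
def qwen2_vl_converter(question: str, answer: str) -> List[dict]:
    # Formats one image-question-answer sample as chat messages with an image slot.
    return [
        {"role": "user", "content": [{"type": "image"}, {"type": "text", "text": question}]},
        {"role": "assistant", "content": [{"type": "text", "text": answer}]},
    ]

# Usage: pick the converter that matches the model family being trained.
messages = CONVERTERS["qwen2_vl"]("What is in the image?", "A cat sitting on a sofa.")
```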
This repository is built on top of LLaMA-Factory.
- LLaMA-Factory
- LLaVA-NeXT