
Add figures #32

Merged 6 commits into from Oct 24, 2024

Changes from 2 commits
4 changes: 4 additions & 0 deletions README.md
@@ -15,6 +15,10 @@ For details, see [here](m3/README.md).

### Local Demo

<p align="center">
<img src="m3/docs/images/gradio_app_ct.png" width="95%"/>
</p>

#### Prerequisites

1. **Linux Operating System**
55 changes: 39 additions & 16 deletions m3/README.md
@@ -43,36 +43,59 @@ The resulting expert model output will be fed back to the VLM for generating the
## Performance

### VQA Benchmarks
| | Average |
|-------------------|---------|
| VILA-M3-3B | |
| Llama3-VILA-M3-8B | |
| VILA-M3-13B | |
| Model | Type | VQA-RAD* | SLAKE-VQA | Path-VQA | Average |
|---------------------------|----------------------|-----------|-----------|----------|----------|
| Llava-Med | Task-specific | *84.2* | *86.8* | *91.7* | *87.6* |
| Med-Gemini-1.5T | Generalist | 78.8 | **84.8** | 83.3 | 82.3 |
| Llama3-VILA-M3-3B | Generalist | 78.2 | 79.8 | 87.9 | 82.0 |
| Llama3-VILA-M3-8B | Generalist | **84.5** | 84.5 | 90.0 | **86.3** |
| Llama3-VILA-M3-13B | Generalist | 80.5 | 83.2 | **91.0** | 84.9 |

### Report Generation Benchmarks
| | Average |
|-------------------|---------|
| VILA-M3-3B | |
| Llama3-VILA-M3-8B | |
| VILA-M3-13B | |
| Model                     | Type                 | BLEU-4*  | ROUGE*   |
|---------------------------|----------------------|----------|----------|
| Llava-Med | Task-specific | *1.0* | *13.3* |
| Med-Gemini-1.5T | Generalist | 20.5 | 28.3 |
| Llama3-VILA-M3-3B | Generalist | 20.2 | 31.7 |
| Llama3-VILA-M3-8B | Generalist | 21.5 | **32.3** |
| Llama3-VILA-M3-13B | Generalist | **21.6** | 32.1 |
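For intuition on the BLEU-4 metric reported above, here is a simplified sentence-level sketch. The `bleu4` helper is written for illustration only, not the evaluation code used to produce the table; the benchmark scores are computed at corpus level with smoothing by standard toolkits.

```python
import math
from collections import Counter

def bleu4(candidate: str, reference: str) -> float:
    """Geometric mean of modified 1- to 4-gram precisions times a
    brevity penalty. Simplified: single reference, no smoothing."""
    cand, ref = candidate.split(), reference.split()
    precisions = []
    for n in range(1, 5):
        cand_ngrams = Counter(tuple(cand[i:i + n]) for i in range(len(cand) - n + 1))
        ref_ngrams = Counter(tuple(ref[i:i + n]) for i in range(len(ref) - n + 1))
        # clip each candidate n-gram count by its count in the reference
        overlap = sum(min(c, ref_ngrams[g]) for g, c in cand_ngrams.items())
        precisions.append(overlap / max(1, sum(cand_ngrams.values())))
    if min(precisions) == 0:
        return 0.0  # any empty n-gram overlap zeroes the geometric mean
    # brevity penalty discourages overly short candidates
    bp = 1.0 if len(cand) >= len(ref) else math.exp(1 - len(ref) / len(cand))
    return bp * math.exp(sum(math.log(p) for p in precisions) / 4)
```

A verbatim match scores 1.0; real report-generation scores (e.g. 21.6 above, on a 0-100 scale) are far lower because generated clinical reports rarely match references word for word.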

### Classification Benchmarks
| | Average |
|-------------------|---------|
| VILA-M3-3B | |
| Llama3-VILA-M3-8B | |
| VILA-M3-13B | |
| Model                     | ChestX-ray14 (w/o expert) | CheXpert (w/o expert) | ChestX-ray14 (with expert) | CheXpert (with expert) |
|---------------------------|---------------------------|-----------------------|----------------------------|------------------------|
| Med-Gemini-1.5T           | 46.7                      | 48.3                  | -                          | -                      |
| TorchXRayVision           | -                         | -                     | 50                         | 51.5                   |
| Llama3-VILA-M3-3B         | 48.4                      | 57.4                  | **51.3**                   | 60.8                   |
| Llama3-VILA-M3-8B         | 45.9                      | **61.4**              | 50.7                       | 60.4                   |
| Llama3-VILA-M3-13B        | **49.9**                  | 55.8                  | 51.2                       | **61.5**               |


## Demo
An interactive demo is provided in ...
The code to run the demo locally is described [here](../README.md#local-demo).

## Data preparation
To prepare the datasets for training and evaluation, follow the instructions in [data_prepare](./data_prepare).

## Training
To replicate our fine-tuning procedure, use the provided scripts.

For our released checkpoints, we used a SLURM cluster environment:
- VILA training code with Torch distributed
- 4 nodes with 8xA100 GPUs (80 GB each)
- Cosine learning rate decay with warmup
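The cosine learning rate decay with warmup listed above can be sketched as a step-wise schedule. This is a minimal illustration; the `lr_at_step` helper and its parameters are hypothetical, not the actual VILA training configuration.

```python
import math

def lr_at_step(step: int, total_steps: int, warmup_steps: int,
               base_lr: float, min_lr: float = 0.0) -> float:
    """Learning rate at a given optimizer step: linear warmup to
    base_lr, then half-cosine decay down to min_lr."""
    if step < warmup_steps:
        # linear ramp from ~0 up to base_lr over the warmup phase
        return base_lr * (step + 1) / warmup_steps
    # cosine decay over the remaining steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * progress))
```

During warmup the rate ramps linearly to `base_lr`, then follows a half-cosine down to `min_lr` over the remaining steps, roughly the shape shown in the figure below.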

<p align="left">
<img src="docs/images/training.png" width="50%"/>
</p>

| # Parameters | Training time |
|---------------------|----------------------|
| 3 billion | 5.5 hours |
| 8 billion | 11.0 hours |
| 13 billion | 19.5 hours |

## Evaluation
To evaluate a model on the above benchmarks, follow the instructions in [eval](./eval/README.md).

Binary file added m3/docs/images/gradio_app_ct.png
Binary file added m3/docs/images/training.png