Multiple changes to VQA data prep (#21)
- Fixed the broken links for PathVQA, SLAKE, and RadVQA (the GitHub dataset
links for PathVQA and SLAKE that were originally there have now been
removed).
- The original PathVQA dataset could not be found except through a Hugging
Face link, which required adding processing steps for the data.
A new script to process the parquet files has been added.
- The README was also updated.
- Instruction tuning files have been removed, as providing them would be
a form of data redistribution.

---------

Signed-off-by: Vishwesh Nath <[email protected]>
finalelement authored Oct 18, 2024
1 parent 1b32a28 commit 7795abd
Showing 11 changed files with 194 additions and 652,500 deletions.
6 changes: 3 additions & 3 deletions monai_vila2d/data_prepare/README.md
@@ -28,9 +28,9 @@ experts

| Dataset | QA/Text Pairs | Images | Link |
|-----------|-----------|-----------|------|
-| PathVQA | ~32,000 | ~4,000 | [PathVQA](https://github.com/UCSD-AI4H/PathVQA) |
-| RadVQA | ~25,000 | ~7,000 | [RadVQA](https://github.com/abachaa/VQA-Med-2019) |
-| SLAKE | ~45,000 | ~14,000 | [SLAKE](https://github.com/SLAKE-SLAKE/SLAKE) |
+| PathVQA | ~32,000 | ~4,000 | [PathVQA](https://huggingface.co/datasets/flaviagiammarino/path-vqa) |
+| RadVQA | ~25,000 | ~7,000 | [RadVQA](https://osf.io/89kps/) |
+| SLAKE | ~45,000 | ~14,000 | [SLAKE](https://www.med-vqa.com/slake/) |
| Medical-Diff-VQA | ~429,000 | 129,232 | [MIMIC-VQA](https://physionet.org/content/medical-diff-vqa/1.0.0) |
| MIMIC-CXR-JPG | 270,784 | 270,784 | [MIMIC-CXR-JPG](https://physionet.org/content/mimic-cxr-jpg/2.1.0/) |
| ChestXRay14 | 1,962 | 1,962 | [nih-chest-xray](https://cloud.google.com/healthcare-api/docs/resources/public-datasets/nih-chest#additional_labels) |
10 changes: 5 additions & 5 deletions monai_vila2d/data_prepare/vqa/README.md
@@ -31,11 +31,11 @@ python slake_instruct_data_generate.py \
Example command to generate the instruction training data JSON file for the PathVQA dataset:

```
-python pathvqa_instruct_data_generate.py \
-    --train_pkl /path/to/train_vqa.pkl \
-    --val_pkl /path/to/val_vqa.pkl \
-    --test_pkl /path/to/test_vqa.pkl \
-    --output_json /path/to/output/merged_pathvqa_instruct.json
+python pathvqa_instruction_gen_parquet.py --input_path /path/to/input/parquet/files --output_path /path/to/output/processed/dataset
+```
+Please make sure that the .csv files were successfully generated from the prior command before running the next command.
+```
+python pathvqa_instruction_generate.py --input_dir /path/to/output/processed/dataset --output_dir /path/to/output_directory
```
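
The README asks the reader to confirm that the .csv files exist before running the second command, but does not say how. One way to check is sketched below. It is a minimal sketch that assumes the parquet step writes its .csv files directly into the directory passed as `--output_path`; the actual file names and layout are not shown in this commit. The helper name `find_nonempty_csvs` is hypothetical and not part of the repository.

```python
import csv
from pathlib import Path


def find_nonempty_csvs(output_dir):
    """Return the .csv files in output_dir that have a header and at least one data row.

    Assumption: the CSVs sit at the top level of output_dir (layout not confirmed).
    """
    ready = []
    for csv_path in sorted(Path(output_dir).glob("*.csv")):
        with csv_path.open(newline="") as handle:
            rows = csv.reader(handle)
            header = next(rows, None)     # first line, if the file has one
            first_row = next(rows, None)  # first data row, if the file has one
        if header is not None and first_row is not None:
            ready.append(csv_path)
    return ready
```

Calling `find_nonempty_csvs("/path/to/output/processed/dataset")` and getting an empty list back would suggest that the parquet step should be rerun before `pathvqa_instruction_generate.py` is started.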

### MIMIC-VQA