How do I evaluate the results after stage-1 of training BLIP2? #774

Open
hawkiyc opened this issue Dec 12, 2024 · 6 comments

hawkiyc commented Dec 12, 2024

Hi, developers,

I am adapting your code to build a modified BLIP2 model for time-series input, and I am currently trying to understand the architecture of this framework. I have tested the bash run_scripts/blip2/train/pretrain_stage1.sh command with the COCO dataset (by the way, there are mismatches between images and annotations in the VG dataset, so I removed it), and it seems to work fine. However, I cannot find any script or .yaml file for evaluating the results of stage 1. I have checked the lavis/configs/datasets/coco/defaults_cap.yaml file, and it contains entries for the train, val, and test splits.

defaults_cap.yaml

datasets:
  coco_caption: # name of the dataset builder
    dataset_card: dataset_card/coco_caption.md
    # data_dir: ${env.data_dir}/datasets
    data_type: images # [images|videos|features]

    build_info:
      # Be careful not to append minus sign (-) before split to avoid itemizing
      annotations:
        train:
          url: https://storage.googleapis.com/sfr-vision-language-research/datasets/coco_karpathy_train.json
          md5: aa31ac474cf6250ebb81d18348a07ed8
          storage: coco/annotations/coco_karpathy_train.json
        val:
          url: https://storage.googleapis.com/sfr-vision-language-research/datasets/coco_karpathy_val.json
          md5: b273847456ef5580e33713b1f7de52a0
          storage: coco/annotations/coco_karpathy_val.json
        test:
          url: https://storage.googleapis.com/sfr-vision-language-research/datasets/coco_karpathy_test.json
          md5: 3ff34b0ef2db02d01c37399f6a2a6cd1
          storage: coco/annotations/coco_karpathy_test.json
      images:
        storage: coco/images/

Here is the printed result in the terminal:

Train: data epoch: [4]  [5550/5667]  eta: 0:03:26  lr: 0.000019  loss: 4.0731  loss_itc: 0.9712 (0.9633)  loss_itm: 0.1881 (0.1714)  loss_lm: 2.8563 (2.8436)  time: 1.7917  data: 0.0000  max mem: 27191
Train: data epoch: [4]  [5600/5667]  eta: 0:01:58  lr: 0.000019  loss: 4.1341  loss_itc: 0.9485 (0.9633)  loss_itm: 0.1703 (0.1713)  loss_lm: 2.8336 (2.8436)  time: 1.7898  data: 0.0000  max mem: 27191
Train: data epoch: [4]  [5650/5667]  eta: 0:00:30  lr: 0.000019  loss: 3.8998  loss_itc: 0.9417 (0.9632)  loss_itm: 0.1509 (0.1713)  loss_lm: 2.8545 (2.8438)  time: 1.7882  data: 0.0000  max mem: 27191
Train: data epoch: [4]  [5666/5667]  eta: 0:00:01  lr: 0.000019  loss: 3.9018  loss_itc: 0.9507 (0.9632)  loss_itm: 0.1535 (0.1713)  loss_lm: 2.8405 (2.8438)  time: 1.8221  data: 0.0000  max mem: 27191
Train: data epoch: [4] Total time: 2:47:07 (1.7694 s / it)
INFO - 2024-12-12 03:24:12,536 - base_task - Averaged stats: lr: 0.0000  loss: 3.9783  loss_itc: 0.9632  loss_itm: 0.1713  loss_lm: 2.8438
INFO - 2024-12-12 03:24:12,543 - runner_base - No validation splits found.
INFO - 2024-12-12 03:24:12,598 - runner_base - Saving checkpoint at epoch 4 to /home/revlis_ai/Documents/training_models_temp/LAVIS_with_JoLT/lavis/output/BLIP2/Pretrain_stage1/20241211132/checkpoint_4.pth.
INFO - 2024-12-12 03:24:15,828 - runner_base - Saving checkpoint at epoch 4 to /home/revlis_ai/Documents/training_models_temp/LAVIS_with_JoLT/lavis/output/BLIP2/Pretrain_stage1/20241211132/checkpoint_4.pth.
INFO - 2024-12-12 03:24:23,201 - runner_base - No validation splits found.
INFO - 2024-12-12 03:24:23,203 - runner_base - Training time 13:55:33
[rank0]:[W1212 03:24:24.182641511 ProcessGroupNCCL.cpp:1168] Warning: WARNING: process group has NOT been destroyed before we destruct ProcessGroupNCCL. On normal program exit, the application should call destroy_process_group to ensure that any pending NCCL operations have finished in this process. In rare cases this process can exit before this point and block the progress of another member of the process group. This constraint has always been present,  but this warning has only been added since PyTorch 2.4 (function operator())

Output log file

{
    "run": {
        "task": "image_text_pretrain",
        "lr_sched": "linear_warmup_cosine_lr",
        "init_lr": 0.0001,
        "min_lr": 1e-05,
        "warmup_lr": 1e-06,
        "weight_decay": 0.05,
        "max_epoch": 5,
        "batch_size_train": 100,
        "batch_size_eval": 64,
        "num_workers": 4,
        "warmup_steps": 5000,
        "seed": 42,
        "output_dir": "output/BLIP2/Pretrain_stage1",
        "amp": true,
        "resume_ckpt_path": null,
        "evaluate": false,
        "train_splits": [
            "train"
        ],
        "device": "cuda",
        "world_size": 1,
        "dist_url": "env://",
        "distributed": true,
        "rank": 0,
        "gpu": 0,
        "dist_backend": "nccl"
    },
    "model": {
        "arch": "blip2",
        "load_finetuned": false,
        "pretrained": "https://storage.googleapis.com/sfr-vision-language-research/LAVIS/models/BLIP2/blip2_pretrained.pth",
        "finetuned": "",
        "image_size": 224,
        "drop_path_rate": 0,
        "use_grad_checkpoint": false,
        "vit_precision": "fp16",
        "freeze_vit": true,
        "num_query_token": 32,
        "model_type": "pretrain",
        "load_pretrained": false
    },
    "preprocess": {
        "vis_processor": {
            "train": {
                "name": "blip_image_train",
                "image_size": 224
            },
            "eval": {
                "name": "blip_image_eval",
                "image_size": 224
            }
        },
        "text_processor": {
            "train": {
                "name": "blip_caption"
            },
            "eval": {
                "name": "blip_caption"
            }
        }
    },
    "datasets": {
        "coco_caption": {
            "dataset_card": "dataset_card/coco_caption.md",
            "data_type": "images",
            "build_info": {
                "annotations": {
                    "train": {
                        "url": "https://storage.googleapis.com/sfr-vision-language-research/datasets/coco_karpathy_train.json",
                        "md5": "aa31ac474cf6250ebb81d18348a07ed8",
                        "storage": "coco/annotations/coco_karpathy_train.json"
                    },
                    "val": {
                        "url": "https://storage.googleapis.com/sfr-vision-language-research/datasets/coco_karpathy_val.json",
                        "md5": "b273847456ef5580e33713b1f7de52a0",
                        "storage": "coco/annotations/coco_karpathy_val.json"
                    },
                    "test": {
                        "url": "https://storage.googleapis.com/sfr-vision-language-research/datasets/coco_karpathy_test.json",
                        "md5": "3ff34b0ef2db02d01c37399f6a2a6cd1",
                        "storage": "coco/annotations/coco_karpathy_test.json"
                    }
                },
                "images": {
                    "storage": "coco/images/"
                }
            },
            "vis_processor": {
                "train": {
                    "name": "blip2_image_train",
                    "image_size": 224
                }
            },
            "text_processor": {
                "train": {
                    "name": "blip_caption"
                }
            }
        }
    }
}
{"train_lr": "0.000", "train_loss": "5.582", "train_loss_itc": "1.492", "train_loss_itm": "0.402", "train_loss_lm": "3.688"}
{"train_lr": "0.000", "train_loss": "4.538", "train_loss_itc": "1.097", "train_loss_itm": "0.266", "train_loss_lm": "3.174"}
{"train_lr": "0.000", "train_loss": "4.288", "train_loss_itc": "1.035", "train_loss_itm": "0.222", "train_loss_lm": "3.031"}
{"train_lr": "0.000", "train_loss": "4.110", "train_loss_itc": "0.993", "train_loss_itm": "0.192", "train_loss_lm": "2.925"}
{"train_lr": "0.000", "train_loss": "3.978", "train_loss_itc": "0.963", "train_loss_itm": "0.171", "train_loss_lm": "2.844"}

parth1313 commented

Hey @hawkiyc

I want to train BLIP2; however, I am getting issues like this:
from diffusers import (
  File "/usr/local/lib/python3.11/dist-packages/diffusers/__init__.py", line 3, in <module>
    from .configuration_utils import ConfigMixin
  File "/usr/local/lib/python3.11/dist-packages/diffusers/configuration_utils.py", line 34, in <module>
    from .utils import (
  File "/usr/local/lib/python3.11/dist-packages/diffusers/utils/__init__.py", line 38, in <module>
    from .dynamic_modules_utils import get_class_from_dynamic_module
  File "/usr/local/lib/python3.11/dist-packages/diffusers/utils/dynamic_modules_utils.py", line 29, in <module>
    from huggingface_hub import HfFolder, cached_download, hf_hub_download, model_info
ImportError: cannot import name 'cached_download' from 'huggingface_hub' (/usr/local/lib/python3.11/dist-packages/huggingface_hub/__init__.py)

This happens because some of the packages are not being installed properly due to compatibility issues. For example:

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
albucore 0.0.19 requires opencv-python-headless>=4.9.0.80, but you have opencv-python-headless 4.5.5.64 which is incompatible.
albumentations 1.4.20 requires opencv-python-headless>=4.9.0.80, but you have opencv-python-headless 4.5.5.64 which is incompatible.
sentence-transformers 3.3.1 requires transformers<5.0.0,>=4.41.0, but you have transformers 4.26.1 which is incompatible.
ERROR: Could not find a version that satisfies the requirement open3d==0.13.0 (from salesforce-lavis) (from versions: 0.16.0, 0.17.0, 0.18.0, 0.19.0)
ERROR: No matching distribution found for open3d==0.13.0

I am running it on Colab with an A100.

Can you provide a solution?


hawkiyc commented Jan 16, 2025

Hi @parth1313, cached_download was removed from huggingface_hub in v0.26. Downgrading your huggingface_hub to 0.25.* should solve this problem.
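
For example (assuming a pip-based environment; any 0.25.x release should do):

pip install "huggingface_hub==0.25.*"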


parth1313 commented Jan 16, 2025

Thank you for the reply @hawkiyc

I am still getting the issue. Can you tell me the versions of all the libraries you used during pretrain_stage1?

I am using salesforce-lavis==1.0.2 and all other libraries as given in requirements.txt:

contexttimer
decord
diffusers<=0.16.0
einops>=0.4.1
fairscale==0.4.4
ftfy
iopath
ipython
omegaconf
opencv-python-headless==4.5.5.64
opendatasets
packaging
pandas
plotly
pre-commit
pycocoevalcap
pycocotools
python-magic
scikit-image
sentencepiece
spacy
streamlit
timm==0.4.12
torch>=1.10.0
torchvision
tqdm
transformers==4.33.2
webdataset
wheel
torchaudio
soundfile
moviepy
nltk
peft

easydict==1.9
pyyaml_env_tag==0.1
open3d==0.13.0
h5py

Here is what I get when running !pip install salesforce-lavis:

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
sentence-transformers 3.3.1 requires transformers<5.0.0,>=4.41.0, but you have transformers 4.26.1 which is incompatible.
albumentations 1.4.20 requires opencv-python-headless>=4.9.0.80, but you have opencv-python-headless 4.5.5.64 which is incompatible.
albucore 0.0.19 requires opencv-python-headless>=4.9.0.80, but you have opencv-python-headless 4.5.5.64 which is incompatible.
Successfully installed antlr4-python3-runtime-4.9.3 braceexpand-0.1.7 cfgv-3.4.0 contexttimer-0.3.3 decord-0.6.0 distlib-0.3.9 fairscale-0.4.4 ftfy-6.3.1 identify-2.6.5 iopath-0.1.10 jedi-0.19.2 nodeenv-1.9.1 omegaconf-2.3.0 opencv-python-headless-4.5.5.64 opendatasets-0.1.22 portalocker-3.1.1 pre-commit-4.0.1 pycocoevalcap-1.2 pydeck-0.9.1 python-magic-0.4.27 salesforce-lavis-1.0.2 streamlit-1.41.1 timm-0.4.12 tokenizers-0.13.3 transformers-4.26.1 virtualenv-20.29.0 watchdog-6.0.0 webdataset-0.2.100

And the following error while running !python evaluate.py --cfg-path lavis/projects/blip2/eval/caption_coco_opt2.7b_eval.yaml:

2025-01-16 14:04:01.761870: I tensorflow/core/util/port.cc:153] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2025-01-16 14:04:01.780186: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:485] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2025-01-16 14:04:01.801982: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:8454] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2025-01-16 14:04:01.808581: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1452] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2025-01-16 14:04:01.825630: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F AVX512_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2025-01-16 14:04:02.871386: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
error: XDG_RUNTIME_DIR not set in the environment.
ALSA lib confmisc.c:855:(parse_card) cannot find card '0'
ALSA lib conf.c:5178:(_snd_config_evaluate) function snd_func_card_inum returned error: No such file or directory
ALSA lib confmisc.c:422:(snd_func_concat) error evaluating strings
ALSA lib conf.c:5178:(_snd_config_evaluate) function snd_func_concat returned error: No such file or directory
ALSA lib confmisc.c:1334:(snd_func_refer) error evaluating name
ALSA lib conf.c:5178:(_snd_config_evaluate) function snd_func_refer returned error: No such file or directory
ALSA lib conf.c:5701:(snd_config_expand) Evaluate error: No such file or directory
ALSA lib pcm.c:2664:(snd_pcm_open_noupdate) Unknown PCM default
ALSA lib confmisc.c:855:(parse_card) cannot find card '0'
ALSA lib conf.c:5178:(_snd_config_evaluate) function snd_func_card_inum returned error: No such file or directory
ALSA lib confmisc.c:422:(snd_func_concat) error evaluating strings
ALSA lib conf.c:5178:(_snd_config_evaluate) function snd_func_concat returned error: No such file or directory
ALSA lib confmisc.c:1334:(snd_func_refer) error evaluating name
ALSA lib conf.c:5178:(_snd_config_evaluate) function snd_func_refer returned error: No such file or directory
ALSA lib conf.c:5701:(snd_config_expand) Evaluate error: No such file or directory
ALSA lib pcm.c:2664:(snd_pcm_open_noupdate) Unknown PCM default
Traceback (most recent call last):
  File "/content/LAVIS/evaluate.py", line 15, in <module>
    import lavis.tasks as tasks
  File "/content/LAVIS/lavis/__init__.py", line 15, in <module>
    from lavis.datasets.builders import *
  File "/content/LAVIS/lavis/datasets/builders/__init__.py", line 8, in <module>
    from lavis.datasets.builders.base_dataset_builder import load_dataset_config
  File "/content/LAVIS/lavis/datasets/builders/base_dataset_builder.py", line 18, in <module>
    from lavis.processors.base_processor import BaseProcessor
  File "/content/LAVIS/lavis/processors/__init__.py", line 29, in <module>
    from lavis.processors.audio_processors import BeatsAudioProcessor
  File "/content/LAVIS/lavis/processors/audio_processors.py", line 17, in <module>
    from lavis.models.beats.Tokenizers import TokenizersConfig, Tokenizers
  File "/content/LAVIS/lavis/models/__init__.py", line 42, in <module>
    from lavis.models.blip2_models.blip2_vicuna_xinstruct import Blip2VicunaXInstruct
  File "/content/LAVIS/lavis/models/blip2_models/blip2_vicuna_xinstruct.py", line 22, in <module>
    from peft import (
  File "/usr/local/lib/python3.11/dist-packages/peft/__init__.py", line 22, in <module>
    from .auto import (
  File "/usr/local/lib/python3.11/dist-packages/peft/auto.py", line 32, in <module>
    from .mapping import MODEL_TYPE_TO_PEFT_MODEL_MAPPING
  File "/usr/local/lib/python3.11/dist-packages/peft/mapping.py", line 25, in <module>
    from .mixed_model import PeftMixedModel
  File "/usr/local/lib/python3.11/dist-packages/peft/mixed_model.py", line 29, in <module>
    from .peft_model import PeftModel
  File "/usr/local/lib/python3.11/dist-packages/peft/peft_model.py", line 37, in <module>
    from transformers import Cache, DynamicCache, EncoderDecoderCache, PreTrainedModel
ImportError: cannot import name 'Cache' from 'transformers' (/usr/local/lib/python3.11/dist-packages/transformers/__init__.py)


hawkiyc commented Jan 17, 2025

Hi @parth1313, the LAVIS framework needs specific versions of transformers; please install it with pip install transformers==4.33.2. This should resolve the error message you encountered. As for the pip conflicts, you may need to downgrade some libraries or frameworks. You can grab my env if you want, but please note that I am using Ubuntu 22.04 and Anaconda.
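
For reference, the version pins mentioned in this thread, collected into one command (assuming a clean environment; this is not guaranteed to resolve every conflict pip reports):

pip install "transformers==4.33.2" "huggingface_hub==0.25.*" "opencv-python-headless==4.5.5.64"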


parth1313 commented Jan 17, 2025

Thank you for the help @hawkiyc

But transformers==4.33.2 is not compatible with salesforce-lavis==1.0.2, as salesforce-lavis 1.0.2 requires transformers>=4.25.0,<4.27, and installing it further gives:

albumentations 1.4.20 requires opencv-python-headless>=4.9.0.80, but you have opencv-python-headless 4.5.5.64 which is incompatible.
albucore 0.0.19 requires opencv-python-headless>=4.9.0.80, but you have opencv-python-headless 4.5.5.64 which is incompatible.

And if I try to install opencv-python-headless>=4.9.0.80, it gives:
salesforce-lavis 1.0.2 requires opencv-python-headless==4.5.5.64, but you have opencv-python-headless 4.9.0.80 which is incompatible.

Moreover, no downgraded version of albucore is compatible with opencv-python-headless 4.5.5.64.

Can you clarify further?


hawkiyc commented Jan 17, 2025

Hi @parth1313,
I'm sorry, I overlooked that you had installed the LAVIS library from pip. I only installed the required libraries with pip install -r requirements.txt and git cloned all the files from the LAVIS repository, because I needed a modified BLIP2 model for my project. In my opinion, cloning the whole repository is the better choice if you want to train your own model.
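
Roughly, that workflow looks like this (repository URL from the public project; adjust paths to your own setup):

git clone https://github.com/salesforce/LAVIS.git
cd LAVIS
pip install -r requirements.txt
bash run_scripts/blip2/train/pretrain_stage1.sh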
