Questions about using COCa to generate captions #797
Comments
Hi @ykj467422034, can you share a snippet of the code you are actually using?
This is what I actually use. I want to generate captions, but the problem is that they are repeated.
So it generates the captions you are showing for the "cat.jpg" file?
No. I understand what you mean.
@ykj467422034 sorry, I didn't see your reply. Does it repeat the same caption for different images, or does it generate several captions for one image? Also, did you try generating a caption for a random tensor?
The former: it repeats the same caption for different images.
Hmm, not sure. I asked about the random tensor to see whether the model generates the same caption in that case too. If it does, the fine-tuning may not have gone well. Do you get similar behaviour with the pretrained model?
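The check suggested here (same caption for every input, including noise) can be summarized with a small helper. This is only a sketch: `caption_collapse_report` is a hypothetical name, not part of open_clip, and the captions passed to it are assumed to come from running `model.generate` and `open_clip.decode` on several real images plus a random tensor.

```python
from collections import Counter
from typing import Dict, Sequence, Union


def caption_collapse_report(captions: Sequence[str]) -> Dict[str, Union[int, str, bool]]:
    """Summarize how many distinct captions a set of inputs produced.

    Hypothetical helper: it only compares strings and knows nothing
    about the model that generated them.
    """
    # Normalize case and whitespace so trivial differences are ignored.
    normalized = [" ".join(c.lower().split()) for c in captions]
    counts = Counter(normalized)
    most_common = counts.most_common(1)[0][0] if counts else ""
    return {
        "total": len(normalized),
        "distinct": len(counts),
        "most_common": most_common,
        # "Collapsed" means more than one input but a single distinct caption.
        "collapsed": len(normalized) > 1 and len(counts) == 1,
    }


# Stand-in captions; in practice these would be decoded model outputs.
report = caption_collapse_report(["a cat", "A  cat ", "a cat"])
print(report)
```

If the report shows `collapsed` as true even when one of the inputs was random noise (for example a tensor shaped like a preprocessed image), that would point at the checkpoint rather than at the images.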
Hello @ykj467422034, I haven't checked whether the most recent update has fixed this issue, so be warned that this suggestion might not work and might even break things. Assuming it hasn't been fixed, I will refer you to issue #751. The problem there was that after fine-tuning the CoCa model, all of its predictions were repetitions of the same word. For example, in that issue it was "turnpike turnpike turnpike turnpike parkway parkway parkway parkway parkway parkway parkway parkway parkway parkway parkway parkway parkway parkway parkway parkway parkway parkway parkway parkway parkway parkway parkway parkway". The solution that worked for me, as I described in issue #751, was to git pull the open_clip repository, edit my local copy of open_clip/src/open_clip/coca_model.py exactly as specified, line by line, in Pull Request #710 by gpucce, and then ran
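To flag outputs like the "turnpike ... parkway ..." one automatically, one option is to measure the longest run of a consecutively repeated word. The sketch below is an illustration only; the function names and the threshold of 3 are assumptions, not anything taken from open_clip or from issue #751.

```python
from itertools import groupby
from typing import Tuple


def longest_repeated_run(caption: str) -> Tuple[str, int]:
    """Return the word with the longest consecutive run and that run's length."""
    best_word, best_len = "", 0
    # groupby groups adjacent equal words, so each group is one run.
    for word, group in groupby(caption.lower().split()):
        run = sum(1 for _ in group)
        if run > best_len:
            best_word, best_len = word, run
    return best_word, best_len


def looks_degenerate(caption: str, max_run: int = 3) -> bool:
    """Heuristic: a word repeated more than max_run times in a row is suspicious."""
    return longest_repeated_run(caption)[1] > max_run


sample = "turnpike turnpike parkway parkway parkway parkway"
print(longest_repeated_run(sample), looks_degenerate(sample))
```

Running this over a validation set before and after re-running the fine-tuning gives a rough count of how many captions are still degenerate.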
I edited it as you say,
Are you using the newest branch? I was not, so perhaps that affects whether the edit works.
Do you mean the open_clip repository or the modified src files?
Apologies for my lack of clarity. I mean the open_clip repository. I was using the most up-to-date version at the time, but it looks like multiple new commits have been made since then. I am currently unable to access my copy to check which commit I am using.
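For anyone comparing setups, a quick way to record which version or commit is in use might look like the sketch below. It assumes the package was installed under the distribution name `open_clip_torch` and that the local checkout lives in a directory called `open_clip`; both are assumptions to adjust, and the helper names are hypothetical.

```python
import subprocess
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def git_commit(repo_dir: str) -> Optional[str]:
    """Return the HEAD commit hash of a git checkout, or None if unavailable."""
    try:
        result = subprocess.run(
            ["git", "-C", repo_dir, "rev-parse", "HEAD"],
            capture_output=True, text=True, check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        # git is not installed, or repo_dir is not a git repository.
        return None
    return result.stdout.strip()


# Assumed names: adjust the distribution name and checkout path as needed.
print("open_clip_torch version:", installed_version("open_clip_torch"))
print("local checkout commit:", git_commit("open_clip"))
```

Posting both values alongside a report makes it easier to tell whether the changes from Pull Request #710 are already included.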
Fine. Maybe I can try the latest version once more. Thanks
@ykj467422034 I think that with those changes you would still need to rerun the fine-tuning
Sure, I will. Thank you!
I'm fine-tuning OpenCLIP on my own CSV dataset. I then take the resulting checkpoint file and use the official code to generate captions. However, the same caption is generated every time. Can anyone help me solve this problem?
Finetuning
python -m training.main \
    --dataset-type "csv" \
    --train-data "my-csv/coca_train.csv" \
    --warmup 1000 \
    --batch-size 32 \
    --lr 1e-5 \
    --wd 0.1 \
    --epochs 1 \
    --workers 3 \
    --model "coca_ViT-L-14" \
    --report-to "wandb" \
    --coca-contrastive-loss-weight 0 \
    --coca-caption-loss-weight 1 \
    --log-every-n-steps 100
Test
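`import open_clip
import torch
from PIL import Image

# Build the CoCa model and load the fine-tuned checkpoint
model, _, transform = open_clip.create_model_and_transforms(
    model_name="coca_ViT-L-14",
    pretrained="logs/check_point.pth"
)

# Preprocess one image into a batch of size 1
im = Image.open("cat.jpg").convert("RGB")
im = transform(im).unsqueeze(0)

# Generate caption tokens without tracking gradients
with torch.no_grad(), torch.cuda.amp.autocast():
    generated = model.generate(im)

# Decode the tokens and strip the special start/end markers
print(open_clip.decode(generated[0]).split("<end_of_text>")[0].replace("<start_of_text>", ""))
`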
Result
As you can see, the captions generated for different pictures are the same.