
Dimension mismatch when using Coca for VQA task #516

Open
jemmyshin opened this issue Apr 29, 2023 · 5 comments

Comments

@jemmyshin

I use the generate endpoint to do a VQA task with the CoCa model, but got this error:
[screenshot of the dimension mismatch traceback]

It seems that this issue does not happen in beam_search mode but appears in top_k or top_p mode.

Also, when I change the max_seq_len parameter in generate I get different behavior. For example, max_seq_len = 20 with generation_type = top_p does not raise this error, but max_seq_len = 78 with generation_type = top_p does not work.

[screenshot of the outputs for different max_seq_len settings]

Am I using this in the wrong way?

@gpucce
Contributor

gpucce commented Apr 29, 2023

Hi @jemmyshin, I think there was an issue similar to this one that was fixed some time ago; any chance you are using an older version? Otherwise this is a bug, and I will check what the issue is.

@jemmyshin
Author

I used the code in the CoCa Colab, so it should be 2.18.0.

@gpucce
Contributor

gpucce commented May 4, 2023

Hi @jemmyshin, so there is indeed a small bug in some sense; however, if I understand correctly, you can probably already do what you want without any changes to the codebase. In the meantime I will open a PR.

The reason a longer max_seq_len throws an error is that the model is trained with a context length of 77, one position of which is taken by a special token, so using 76 (the default) or less is the way to go. Note that this parameter only affects the context the model uses to generate, not the length of the generation.

If I understand correctly, you are not getting an answer after your prompt; the reason for that is the tokenizer. If you replace
text = ... with

# keep only the tokens before the padding and drop the end-of-text token
text = open_clip.tokenize(["Question: what is the color of this billboard? Answer:"])
text = text[:, :torch.where(text == 0)[1][0] - 1]

you should get the answer after the prompt. The issue is that the tokenizer adds padding and an end-of-text token by default; I will make a PR to fix this, but you should be able to try it already. Let me know if this actually works!
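
For reference, a minimal end-to-end sketch of this workaround, assuming the CoCa Colab setup (coca_ViT-L-14 with the mscoco-finetuned weights); the exact generate() keyword arguments may differ between open_clip versions:

import torch
import open_clip
from PIL import Image

# Load the CoCa checkpoint used in the Colab (assumed here).
model, _, transform = open_clip.create_model_and_transforms(
    "coca_ViT-L-14", pretrained="mscoco_finetuned_laion2B-s13B-b90k"
)
model.eval()

image = transform(Image.open("billboard.jpg").convert("RGB")).unsqueeze(0)

# Tokenize the prompt, then cut off the padding and the end-of-text token
# so the model continues after "Answer:" instead of stopping at EOT.
text = open_clip.tokenize(["Question: what is the color of this billboard? Answer:"])
text = text[:, :torch.where(text == 0)[1][0] - 1]

with torch.no_grad():
    generated = model.generate(
        image,
        text=text,
        generation_type="top_p",
        max_seq_len=76,  # context length is 77 minus one special token
    )

print(open_clip.decode(generated[0]).split("<end_of_text>")[0].replace("<start_of_text>", ""))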

@jemmyshin
Author

Yes, that works for a single batch, but probably not for batch_size > 1, since each question may have a different length.
Also, the output somehow concatenates the prompt and the answer:
[screenshot of the output containing both the prompt and the answer]

Is there a way to separate them automatically (if the input text is not None)?
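
One rough sketch of a possible workaround, assuming generate() returns the prompt tokens followed by the answer (as observed above), is to tokenize each question separately and drop the prompt tokens before decoding:

import torch
import open_clip

prompts = [
    "Question: what is the color of this billboard? Answer:",
    "Question: how many people are in this image? Answer:",
]

def prepare_prompt(prompt):
    # Tokenize one prompt and drop the padding and end-of-text token.
    ids = open_clip.tokenize([prompt])
    return ids[:, :torch.where(ids == 0)[1][0] - 1]

def decode_answer(generated_ids, prompt_ids):
    # Keep only the tokens produced after the prompt, then decode them.
    answer_ids = generated_ids[prompt_ids.shape[1]:]
    return open_clip.decode(answer_ids).split("<end_of_text>")[0].strip()

# Since the prompts have different lengths, the simplest approach is to run
# generate() once per (image, prompt) pair instead of as one padded batch:
# for img, prompt in zip(images, prompts):
#     text = prepare_prompt(prompt)
#     out = model.generate(img, text=text, generation_type="top_p")
#     print(decode_answer(out[0], text))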

@LixDemon

LixDemon commented Feb 8, 2024

@jemmyshin Hi, can you share the full code for VQA with CoCa? Thanks!
