
Dimension mismatch when using Coca for VQA task #516

Open
jemmyshin opened this issue Apr 29, 2023 · 5 comments

Comments

@jemmyshin

I use the generate endpoint to do a VQA task with the CoCa model, but got this error:
[screenshot of the dimension mismatch traceback]

It seems that this issue does not happen in beam_search mode but appears in top_k or top_p mode.

Also, when I change the max_seq_len parameter in generate I get different behavior. For example, max_seq_len = 20 with generation_type = top_p does not raise this error, but max_seq_len = 78 with generation_type = top_p does not work.

[screenshot of the outputs for different max_seq_len settings]

Am I using this in the wrong way?

@gpucce
Contributor

gpucce commented Apr 29, 2023

Hi @jemmyshin, I think there was an issue similar to this one that was fixed some time ago; any chance you are using an older version? Otherwise this is a bug, and I will check what the issue is.

@jemmyshin
Author

I used the code in the CoCa Colab, so it should be 2.18.0.

@gpucce
Contributor

gpucce commented May 4, 2023

Hi @jemmyshin, so there is indeed a small bug in some sense; however, if I understand correctly, you can probably already do what you want without any changes to the codebase. In the meantime I will open a PR.

The reason a longer max_seq_len throws an error is that the model is trained with a context length of 77, one position of which is taken by a special token, so using 76 (the default) or less is the way to go. Note that this parameter only affects the context the model uses to generate, not the length of the generation.

If I understand correctly, you are not getting an answer after your prompt; the reason for that is the tokenizer. If you replace
text = ... with

# keep only the tokens before the padding and drop the end-of-text token
text = open_clip.tokenize(["Question: what is the color of this billboard? Answer:"])
text = text[:, :torch.where(text == 0)[1][0] - 1]

you should get the answer after the prompt. The issue is that the tokenizer adds padding and an end-of-text token by default; I will make a PR to fix this, but you should be able to try it already. Let me know if this actually works!
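
For reference, a minimal end-to-end sketch of this workaround, assuming the CoCa Colab setup (coca_ViT-L-14 with the mscoco-finetuned weights); the exact generate() keyword arguments may differ between open_clip versions:

import torch
import open_clip
from PIL import Image

# Load the CoCa checkpoint used in the Colab (assumed here).
model, _, transform = open_clip.create_model_and_transforms(
    "coca_ViT-L-14", pretrained="mscoco_finetuned_laion2B-s13B-b90k"
)
model.eval()

image = transform(Image.open("billboard.jpg").convert("RGB")).unsqueeze(0)

# Tokenize the prompt, then cut off the padding and the end-of-text token
# so the model continues after "Answer:" instead of stopping at EOT.
text = open_clip.tokenize(["Question: what is the color of this billboard? Answer:"])
text = text[:, :torch.where(text == 0)[1][0] - 1]

with torch.no_grad():
    generated = model.generate(
        image,
        text=text,
        generation_type="top_p",
        max_seq_len=76,  # context length is 77 minus one special token
    )

print(open_clip.decode(generated[0]).split("<end_of_text>")[0].replace("<start_of_text>", ""))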

@jemmyshin
Author

Yes, that works for a single batch, but probably not for batch_size > 1, since each question may have a different length.
Also, the output somehow concatenates the prompt and the answer:
[screenshot of the output containing both the prompt and the answer]

Is there a way to separate them automatically (if the input text is not None)?
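
One rough sketch of a possible workaround, assuming generate() returns the prompt tokens followed by the answer (as observed above), is to tokenize each question separately and drop the prompt tokens before decoding:

import torch
import open_clip

prompts = [
    "Question: what is the color of this billboard? Answer:",
    "Question: how many people are in this image? Answer:",
]

def prepare_prompt(prompt):
    # Tokenize one prompt and drop the padding and end-of-text token.
    ids = open_clip.tokenize([prompt])
    return ids[:, :torch.where(ids == 0)[1][0] - 1]

def decode_answer(generated_ids, prompt_ids):
    # Keep only the tokens produced after the prompt, then decode them.
    answer_ids = generated_ids[prompt_ids.shape[1]:]
    return open_clip.decode(answer_ids).split("<end_of_text>")[0].strip()

# Since the prompts have different lengths, the simplest approach is to run
# generate() once per (image, prompt) pair instead of as one padded batch:
# for img, prompt in zip(images, prompts):
#     text = prepare_prompt(prompt)
#     out = model.generate(img, text=text, generation_type="top_p")
#     print(decode_answer(out[0], text))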

@LixDemon

LixDemon commented Feb 8, 2024

@jemmyshin Hi, can you share the full code for VQA with CoCa? Thanks!
