Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add VLM support #220

Draft
wants to merge 6 commits into
base: main
Choose a base branch
from
Draft

Add VLM support #220

wants to merge 6 commits into from

Conversation

merveenoyan
Copy link

@merveenoyan merveenoyan commented Jan 16, 2025

This PR adds VLM support (closing other one for the sake of collaboration) @aymeric-roucher

  • This PR at the creation is probably broken (since you wanted to see it) primarily because as of now when you're writing memory you adopt chat templates like following:
messages = [
  {"role": "user", "content": "I'm doing great. How can I help you today?"},
]

whereas with VLMs we do like following so I modified a bit.

messages = [
  {"role": "user", "content": [{"type": "text", "text": "I'm doing great. How can I help you today?"},
{"type":"image"}
]

but you access content and modify it here and there so it is broken: (fixing)

  • Secondly I need to check if I'm adding images necessarily only once because it will break the inference if we pass one image more than once, I add it in multiple steps, so I will see.

Will open to review once I fix these.

@merveenoyan
Copy link
Author

@aymeric-roucher we can keep images in action step with a separate key. normally models do not produce images, so if we put images with "images" key it will break chat template. if we keep images for the sake of keeping images, we can keep it under a different key like "observation_images" (just like how we do in the function).

@merveenoyan merveenoyan mentioned this pull request Jan 16, 2025
@merveenoyan
Copy link
Author

we need to unify the image handling for both OpenAI & transformers I think, I saw you overwrote templates which could break transformers. will handle tomorrow

@merveenoyan
Copy link
Author

@aymeric-roucher can I write the tests now? will there be changes to API?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants