Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for multimodal openai - early version #313

Open
wants to merge 8 commits into
base: main
Choose a base branch
from
Open

Conversation

fm1320
Copy link
Collaborator

@fm1320 fm1320 commented Jan 6, 2025

The PR adds multimodal (text + image) support to the existing OpenAI client while maintaining backward compatibility with text-only operations. I also adds image generation with Dall E 2 and 3. It also adds tests and updates docstring

  1. Unified Input Handling
def convert_inputs_to_api_kwargs(self, input, model_kwargs, model_type):
    # Handles both text-only and image+text inputs in one place
    # Supports both simple text and structured messages
  1. Image Processing
def _prepare_image_content(self, image_source, detail="auto"):
    # Supports multiple image input types:
    # - Local files (converts to base64)
    # - URLs (direct use)
    # - Pre-formatted content

Add DALL-E Image Generation Support

  1. Added DALL-E 2 & 3 support to OpenAI client for image generation, variation, and editing. Users can now:
  • Generate images from text prompts
  • Create variations of existing images
  • Edit images using masks
  • Get results as URLs or base64

Key Changes

  • Added IMAGE_GENERATION model type
  • Enhanced client with DALL-E API integration
  • Added response parsing for image operations
  • Maintained existing error handling pattern

Example use:

Text only:

client = OpenAIClient()
response = client.call(
    api_kwargs={"input": "Hello", "model": "gpt-3.5-turbo"}
)

Multimodal:

client = OpenAIClient()
response = client.call(
    api_kwargs={
        "input": "Describe this",
        "model": "gpt-4o",
        "images": "path/to/image.jpg"
    }
)

Image generation:

class ImageGenerator(Generator):
    """Generator subclass for image generation."""
    model_type = ModelType.IMAGE_GENERATION
    
       dalle_gen = ImageGenerator(
        model_client=client,
        model_kwargs={
            "model": "dall-e-3",
            "size": "1024x1024",
            "quality": "standard",
            "n": 1
        }
    )
    
    # For image generation, input_str becomes the prompt
    response = dalle_gen({"input_str": "A happy siamese cat playing with a red ball of yarn"})
    print("\n=== DALL-E Generation ===")
    print(f"Generated Image URL: {response.data}")

TODO:

  • Everything shout be an Output Generator type - Generator cant raise error but put the error in error field
  • Image generation
  • How to raise and catch the error
  • parsed chat completion has to be a generator output - inside chat completion parser

Fixes #<issue_number>

Before submitting
  • Was this discussed/agreed via a GitHub issue? (not for typos and docs)
  • Did you read the contributor guideline?
  • Did you make sure your PR does only one thing, instead of bundling different changes together?
  • Did you make sure to update the documentation with your changes? (if necessary)
  • Did you write any new necessary tests? (not for typos and docs)
  • Did you verify new and existing tests pass locally with your changes?
  • Did you list all the breaking changes introduced by this pull request?

Copy link

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

@fm1320 fm1320 marked this pull request as ready for review January 8, 2025 09:45
Copy link
Member

@liyin2015 liyin2015 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(1) image output potentially (2) test generator (3) test using real api by yourself

@fm1320 fm1320 requested a review from liyin2015 January 10, 2025 01:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants