Add support for multimodal openai - early version #313

fm1320 · 2025-01-06T20:30:35Z

The PR adds multimodal (text + image) support to the existing OpenAI client while maintaining backward compatibility with text-only operations. I also adds image generation with Dall E 2 and 3. It also adds tests and updates docstring

Unified Input Handling

def convert_inputs_to_api_kwargs(self, input, model_kwargs, model_type):
    # Handles both text-only and image+text inputs in one place
    # Supports both simple text and structured messages

Image Processing

def _prepare_image_content(self, image_source, detail="auto"):
    # Supports multiple image input types:
    # - Local files (converts to base64)
    # - URLs (direct use)
    # - Pre-formatted content

Add DALL-E Image Generation Support

Added DALL-E 2 & 3 support to OpenAI client for image generation, variation, and editing. Users can now:

Generate images from text prompts
Create variations of existing images
Edit images using masks
Get results as URLs or base64

Key Changes

Added IMAGE_GENERATION model type
Enhanced client with DALL-E API integration
Added response parsing for image operations
Maintained existing error handling pattern

Example use:

Text only:

client = OpenAIClient()
response = client.call(
    api_kwargs={"input": "Hello", "model": "gpt-3.5-turbo"}
)

Multimodal:

client = OpenAIClient()
response = client.call(
    api_kwargs={
        "input": "Describe this",
        "model": "gpt-4o",
        "images": "path/to/image.jpg"
    }
)

Image generation:

class ImageGenerator(Generator):
    """Generator subclass for image generation."""
    model_type = ModelType.IMAGE_GENERATION
    
       dalle_gen = ImageGenerator(
        model_client=client,
        model_kwargs={
            "model": "dall-e-3",
            "size": "1024x1024",
            "quality": "standard",
            "n": 1
        }
    )
    
    # For image generation, input_str becomes the prompt
    response = dalle_gen({"input_str": "A happy siamese cat playing with a red ball of yarn"})
    print("\n=== DALL-E Generation ===")
    print(f"Generated Image URL: {response.data}")

TODO:

Everything shout be an Output Generator type - Generator cant raise error but put the error in error field
Image generation
How to raise and catch the error
parsed chat completion has to be a generator output - inside chat completion parser

Fixes #<issue_number>

Before submitting

Was this discussed/agreed via a GitHub issue? (not for typos and docs)
Did you read the contributor guideline?
Did you make sure your PR does only one thing, instead of bundling different changes together?
Did you make sure to update the documentation with your changes? (if necessary)
Did you write any new necessary tests? (not for typos and docs)
Did you verify new and existing tests pass locally with your changes?
Did you list all the breaking changes introduced by this pull request?

review-notebook-app · 2025-01-06T20:30:40Z

Check out this pull request on

See visual diffs & provide feedback on Jupyter Notebooks.

Powered by ReviewNB

adalflow/adalflow/components/model_client/openai_client.py

adalflow/adalflow/utils/lazy_import.py

liyin2015

(1) image output potentially (2) test generator (3) test using real api by yourself

add multi modal support for openai draft

73089ff

fm1320 added 2 commits January 6, 2025 22:46

Change multimodal to one client

c6c4663

remove separate file refs

b0a473b

liyin2015 reviewed Jan 7, 2025

View reviewed changes

adalflow/adalflow/components/model_client/openai_client.py Outdated Show resolved Hide resolved

liyin2015 reviewed Jan 7, 2025

View reviewed changes

adalflow/adalflow/components/model_client/openai_client.py Outdated Show resolved Hide resolved

liyin2015 reviewed Jan 7, 2025

View reviewed changes

adalflow/adalflow/components/model_client/openai_client.py Outdated Show resolved Hide resolved

liyin2015 reviewed Jan 7, 2025

View reviewed changes

adalflow/adalflow/utils/lazy_import.py Outdated Show resolved Hide resolved

Single function openaiclient and test

00ea1d5

fm1320 marked this pull request as ready for review January 8, 2025 09:45

fm1320 added 2 commits January 8, 2025 16:31

add more tests with mock

578a165

add more tests with mock

852c212

liyin2015 reviewed Jan 8, 2025

View reviewed changes

fm1320 added 2 commits January 9, 2025 11:28

add image gen

ff1060a

Update .rst file and colab

5144fc4

fm1320 requested a review from liyin2015 January 10, 2025 01:47

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add support for multimodal openai - early version #313

Add support for multimodal openai - early version #313

fm1320 commented Jan 6, 2025 •

edited

Loading

review-notebook-app bot commented Jan 6, 2025

liyin2015 left a comment

Add support for multimodal openai - early version #313

Are you sure you want to change the base?

Add support for multimodal openai - early version #313

Conversation

fm1320 commented Jan 6, 2025 • edited Loading

The PR adds multimodal (text + image) support to the existing OpenAI client while maintaining backward compatibility with text-only operations. I also adds image generation with Dall E 2 and 3. It also adds tests and updates docstring

Add DALL-E Image Generation Support

Key Changes

review-notebook-app bot commented Jan 6, 2025

liyin2015 left a comment

Choose a reason for hiding this comment

fm1320 commented Jan 6, 2025 •

edited

Loading