
[InferenceClient] Support response_format={"type": "json_object"} for litellm ? #2744

Closed

lhoestq opened this issue Jan 10, 2025 · 6 comments

@lhoestq (Member) commented Jan 10, 2025

In other inference APIs, response_format={"type": "json_object"} restricts the model output to a valid JSON object without enforcing a schema.

Right now this is not supported:

HfHubHTTPError: 422 Client Error: Unprocessable Entity for url: https://api-inference.huggingface.co/models/meta-llama/Llama-3.3-70B-Instruct/v1/chat/completions (Request ID: siYCHW4vYTCQ4Kv4_Mv4R)

Failed to deserialize the JSON body into the target type: response_format: missing field `value` at line 1 column 168

I ran into this error while using lotus-ai, which uses the litellm library with response_format={"type": "json_object"}.

To reproduce:

from huggingface_hub import InferenceClient

c = InferenceClient("meta-llama/Llama-3.3-70B-Instruct")
c.chat_completion([{"role": "user", "content": "Give me a dummy json of a person"}], response_format={"type": "json_object"})
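
For comparison, a schema-constrained request does seem to go through, since the API accepts the TGI-style grammar with both type and value (a rough sketch, the schema below is only an example):

from huggingface_hub import InferenceClient

c = InferenceClient("meta-llama/Llama-3.3-70B-Instruct")
# Works: constrained generation with an explicit JSON schema in `value`
c.chat_completion(
    [{"role": "user", "content": "Give me a dummy json of a person"}],
    response_format={
        "type": "json",
        "value": {
            "type": "object",
            "properties": {"name": {"type": "string"}, "age": {"type": "integer"}},
            "required": ["name", "age"],
        },
    },
)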

Motivation:

I'd like to use tools like lotus-ai that rely on JSON mode (no constrained schema, so the LLM can output whatever is requested in the prompt):

import lotus
import pandas as pd

lm = lotus.models.LM("huggingface/meta-llama/Llama-3.3-70B-Instruct")
lotus.settings.configure(lm=lm)

df = pd.read_json("hf://datasets/GAIR/o1-journey/train.jsonl", lines=True)[:100]

# Extract main topic and subtopic
df = df.sem_extract(["question"], {"topic": "high level topic", "subtopic": "specific subtopic"})

For this to work, the litellm HF client implementation must support the response_format param, which it currently doesn't:

>>> from litellm import get_supported_openai_params
>>> get_supported_openai_params(model="huggingface/meta-llama/Llama-3.3-70B-Instruct")
['stream',
 'temperature',
 'max_tokens',
 'max_completion_tokens',
 'top_p',
 'stop',
 'n',
 'echo']

Additional note

The HF client in litellm doesn't even support structured generation with a given schema, since it assumes we don't support response_format at all. I suspect it's because of this.

This would be useful to integrate into other LLM clients as well.

@hanouticelina (Contributor)

In other inference APIs, response_format={"type": "json_object"} restricts the model output to a valid JSON object without enforcing a schema.

@lhoestq I think this is more a feature request for TGI rather than huggingface_hub.

The HF client in litellm doesn't even support structured generation with a given schema, since it assumes we don't support response_format at all. I suspect it's because of this.
This would be useful to integrate into other LLM clients as well.

Thanks for pointing this out, I think it is worth opening a PR in litellm for that!

@lhoestq (Member, Author) commented Jan 10, 2025

Hmm, according to this it should already be supported in TGI: huggingface/text-generation-inference#2046

Let me try using HTTP requests directly or another client
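
A direct request would look roughly like this (just a sketch to probe the endpoint; HF_TOKEN is a placeholder for a valid token):

import os
import requests

# Same endpoint as in the 422 error above
url = "https://api-inference.huggingface.co/models/meta-llama/Llama-3.3-70B-Instruct/v1/chat/completions"
headers = {"Authorization": f"Bearer {os.environ['HF_TOKEN']}"}
payload = {
    "messages": [{"role": "user", "content": "Give me a dummy json of a person"}],
    # OpenAI-style JSON mode, no schema
    "response_format": {"type": "json_object"},
}
r = requests.post(url, headers=headers, json=payload)
print(r.status_code, r.text)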

and yes, happy to open a PR in litellm once it's figured out

@hanouticelina (Contributor)

I think the PR only adds support for the chat response format, but not the possibility to restrict the output without enforcing a schema. Based on the implementation (text-generation-inference/router/src/lib.rs#L909 and text-generation-inference/router/src/lib.rs#L207), you need to provide both the type and value fields in the response_format field.
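
Concretely, the difference in payload shape looks roughly like this (a sketch, the schema is only an example):

# Shape the TGI router currently deserializes: both `type` and `value` are required
accepted = {
    "type": "json",
    "value": {"type": "object", "properties": {"name": {"type": "string"}}},
}

# Shape sent by OpenAI-style clients such as litellm: no `value`, hence the 422
rejected = {"type": "json_object"}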

@lhoestq (Member, Author) commented Jan 10, 2025

I see! Moving the issue to TGI then: huggingface/text-generation-inference#2899

@julien-c (Member)

BTW you might want to read (and maybe respond to) huggingface/huggingface.js#932

@hanouticelina (Contributor)

Closing this issue as it's not related to huggingface_hub. If you have any related questions, please refer to the text-generation-inference (TGI) issue #2899.

hanouticelina closed this as not planned on Jan 14, 2025