
[InferenceClient] Support response_format={"type": "json_object"} for litellm ? #2744

Closed

lhoestq opened this issue Jan 10, 2025 · 6 comments

@lhoestq (Member) commented Jan 10, 2025

In other inference APIs, response_format={"type": "json_object"} restricts the model output to a valid JSON object without enforcing a schema.

Right now this is not supported:

HfHubHTTPError: 422 Client Error: Unprocessable Entity for url: https://api-inference.huggingface.co/models/meta-llama/Llama-3.3-70B-Instruct/v1/chat/completions (Request ID: siYCHW4vYTCQ4Kv4_Mv4R)

Failed to deserialize the JSON body into the target type: response_format: missing field `value` at line 1 column 168

I ran into this error while using lotus-ai, which uses the litellm library with response_format={"type": "json_object"}.

To reproduce:

from huggingface_hub import InferenceClient

c = InferenceClient("meta-llama/Llama-3.3-70B-Instruct")
c.chat_completion([{"role": "user", "content": "Give me a dummy json of a person"}], response_format={"type": "json_object"})
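
For comparison, a schema-constrained request does seem to go through, since the API accepts the TGI-style grammar with both type and value (a rough sketch, the schema below is only an example):

from huggingface_hub import InferenceClient

c = InferenceClient("meta-llama/Llama-3.3-70B-Instruct")
# Works: constrained generation with an explicit JSON schema in `value`
c.chat_completion(
    [{"role": "user", "content": "Give me a dummy json of a person"}],
    response_format={
        "type": "json",
        "value": {
            "type": "object",
            "properties": {"name": {"type": "string"}, "age": {"type": "integer"}},
            "required": ["name", "age"],
        },
    },
)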

Motivation:

I'd like to use tools like lotus-ai that rely on JSON mode (no constrained schema, so the LLM can output whatever is requested in the prompt):

import lotus
import pandas as pd

lm = lotus.models.LM("huggingface/meta-llama/Llama-3.3-70B-Instruct")
lotus.settings.configure(lm=lm)

df = pd.read_json("hf://datasets/GAIR/o1-journey/train.jsonl", lines=True)[:100]

# Extract main topic and subtopic
df = df.sem_extract(["question"], {"topic": "high level topic", "subtopic": "specific subtopic"})

For this to work, the litellm HF client implementation must support the response_format param, which it currently doesn't:

>>> from litellm import get_supported_openai_params
>>> get_supported_openai_params(model="huggingface/meta-llama/Llama-3.3-70B-Instruct")
['stream',
 'temperature',
 'max_tokens',
 'max_completion_tokens',
 'top_p',
 'stop',
 'n',
 'echo']

Additional note

The HF client in litellm doesn't even support structured generation with a given schema, since it assumes we don't support response_format at all. I suspect it's because of this.

This would be useful to integrate into other LLM clients as well.

@hanouticelina (Contributor)

In other inference APIs, response_format={"type": "json_object"} restricts the model output to a valid JSON object without enforcing a schema.

@lhoestq I think this is more a feature request for TGI rather than huggingface_hub.

The HF client in litellm doesn't even support structured generation with a given schema, since it assumes we don't support response_format at all. I suspect it's because of this.
This would be useful to integrate into other LLM clients as well.

Thanks for pointing this out, I think it is worth opening a PR in litellm for that!

@lhoestq (Member, Author) commented Jan 10, 2025

Hmm, according to this it should already be supported in TGI: huggingface/text-generation-inference#2046

Let me try using HTTP requests directly or another client
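
A direct request would look roughly like this (just a sketch to probe the endpoint; HF_TOKEN is a placeholder for a valid token):

import os
import requests

# Same endpoint as in the 422 error above
url = "https://api-inference.huggingface.co/models/meta-llama/Llama-3.3-70B-Instruct/v1/chat/completions"
headers = {"Authorization": f"Bearer {os.environ['HF_TOKEN']}"}
payload = {
    "messages": [{"role": "user", "content": "Give me a dummy json of a person"}],
    # OpenAI-style JSON mode, no schema
    "response_format": {"type": "json_object"},
}
r = requests.post(url, headers=headers, json=payload)
print(r.status_code, r.text)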

and yes, happy to open a PR in litellm once it's figured out

@hanouticelina (Contributor)

I think the PR only adds support for the chat response format, but not the possibility to restrict the output without enforcing a schema. Based on the implementation (text-generation-inference/router/src/lib.rs#L909 and text-generation-inference/router/src/lib.rs#L207), you need to provide both the type and value fields in the response_format field.
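
Concretely, the difference in payload shape looks roughly like this (a sketch, the schema is only an example):

# Shape the TGI router currently deserializes: both `type` and `value` are required
accepted = {
    "type": "json",
    "value": {"type": "object", "properties": {"name": {"type": "string"}}},
}

# Shape sent by OpenAI-style clients such as litellm: no `value`, hence the 422
rejected = {"type": "json_object"}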

@lhoestq (Member, Author) commented Jan 10, 2025

I see! Moving the issue to TGI then: huggingface/text-generation-inference#2899

@julien-c (Member)

BTW you might want to read (and maybe respond to) huggingface/huggingface.js#932

@hanouticelina (Contributor)

Closing this issue as it's not related to huggingface_hub. If you have any related questions, please refer to the text-generation-inference (TGI) issue #2899.

hanouticelina closed this as not planned on Jan 14, 2025