[InferenceClient] Support `response_format={"type": "json_object"}` for litellm? #2744
In other inference APIs, `response_format={"type": "json_object"}` restricts the model output to be a valid JSON object without enforcing a schema. Right now this is not supported: I ended up with an error while using lotus-ai, which uses the `litellm` library with `response_format={"type": "json_object"}`.

To reproduce: see the sketch after this description.

Motivation: I'd like to use tools like lotus-ai that rely on JSON mode (no constrained schema, so the LLM can output whatever is requested in the prompt). For this to work, the litellm HF client implementation must support the `response_format` param, which is absent right now.

Additional note: the HF client in `litellm` doesn't even support structured generation with a given schema, since it considers that we don't support `response_format` at all, I suspect because of this. This would be useful to integrate in other LLM clients as well.
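For concreteness, a minimal sketch of the kind of call that fails. The `huggingface/` provider prefix, the model name, and the prompt are illustrative assumptions rather than details taken from the original report:

```python
# Sketch of the failing JSON-mode call through litellm (assumptions: litellm's
# "huggingface/" provider prefix, an HF token available in the environment,
# and an illustrative model name).
import litellm

response = litellm.completion(
    model="huggingface/meta-llama/Llama-3.1-8B-Instruct",  # illustrative model
    messages=[{"role": "user", "content": "Return a JSON object with keys 'name' and 'age'."}],
    response_format={"type": "json_object"},  # JSON mode, no schema enforced
)
print(response.choices[0].message.content)
```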
Comments

@lhoestq I think this is more a feature request for TGI rather than `huggingface_hub`.

Thanks for pointing this out, I think it is worth opening a PR in litellm for that!

Hmm, according to this it should be supported in TGI already: huggingface/text-generation-inference#2046. Let me try using HTTP requests directly or another client. And yes, happy to open a PR in litellm once it's figured out.
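A direct request could look roughly like the sketch below, assuming a locally running TGI server that exposes the OpenAI-compatible `/v1/chat/completions` route; whether `{"type": "json_object"}` is accepted there is exactly what such a test would check:

```python
# Sketch of hitting a TGI server directly (assumptions: a local TGI instance at
# BASE_URL with the OpenAI-compatible chat route; model/prompt are illustrative).
import requests

BASE_URL = "http://localhost:8080"  # assumed local TGI endpoint

payload = {
    "model": "tgi",  # placeholder model name commonly used with TGI's chat route
    "messages": [{"role": "user", "content": "Describe a cat as a JSON object."}],
    "response_format": {"type": "json_object"},  # the unconstrained JSON mode in question
}
resp = requests.post(f"{BASE_URL}/v1/chat/completions", json=payload, timeout=60)
print(resp.status_code)
print(resp.text)
```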
I think the PR only adds support for the chat response format, but not the possibility to restrict the output without enforcing a schema. Based on the implementation in text-generation-inference/router/src/lib.rs#L909 and text-generation-inference/router/src/lib.rs#L207, you need to provide both the grammar type and a schema (value).
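In other words, the schema-constrained variant is what the current grammar implementation expects. A minimal sketch of that variant, assuming `huggingface_hub`'s `InferenceClient` and its `{"type": "json", "value": <JSON schema>}` response format (model name and schema are illustrative):

```python
# Sketch of the schema-constrained call that the grammar implementation supports
# (assumptions: huggingface_hub's InferenceClient chat_completion API and an
# illustrative model name / schema).
from huggingface_hub import InferenceClient

client = InferenceClient("meta-llama/Llama-3.1-8B-Instruct")  # illustrative model

schema = {
    "type": "object",
    "properties": {"name": {"type": "string"}, "age": {"type": "integer"}},
    "required": ["name", "age"],
}

output = client.chat_completion(
    messages=[{"role": "user", "content": "Invent a person and return them as JSON."}],
    response_format={"type": "json", "value": schema},  # schema ("value") is required here
    max_tokens=200,
)
print(output.choices[0].message.content)
```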
I see! Moving the issue to TGI then: huggingface/text-generation-inference#2899

BTW you might want to read (and maybe respond to) huggingface/huggingface.js#932

Closing this issue as it's not related to `huggingface_hub`.