Return max_input_length if sequence is too long (NLP tasks) #342
Comments
What model are you using? If you use a model powered by TGI or a TGI endpoint directly, this is the error you should get:

    # for "bigcode/starcoder"
    huggingface_hub.inference._text_generation.ValidationError: Input validation error: `inputs` tokens + `max_new_tokens` must be <= 8192. Given: 37500 `inputs` tokens and 20 `max_new_tokens`

(In any case, I think this should be handled server-side, as the client cannot know this information beforehand.) |
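For illustration, the limit and token counts in a TGI message like the one quoted above can be recovered client-side with a regular expression. This is only a sketch: `parse_tgi_length_error` is a hypothetical helper, not part of `huggingface_hub`, and the pattern is tied to the exact wording of that one message, so it stops matching as soon as the wording changes.

```python
import re
from typing import Dict, Optional

# Pattern tied to the exact wording of the TGI message quoted in this thread.
# Assumption: the server keeps this wording; any change breaks the match.
_TGI_LIMIT_RE = re.compile(
    r"must be <= (?P<limit>\d+)\. "
    r"Given: (?P<inputs>\d+) `inputs` tokens and (?P<new>\d+) `max_new_tokens`"
)


def parse_tgi_length_error(message: str) -> Optional[Dict[str, int]]:
    """Return the limit and token counts from the message, or None if it does not match."""
    match = _TGI_LIMIT_RE.search(message)
    if match is None:
        return None
    return {name: int(value) for name, value in match.groupdict().items()}


message = (
    "Input validation error: `inputs` tokens + `max_new_tokens` must be <= 8192. "
    "Given: 37500 `inputs` tokens and 20 `max_new_tokens`"
)
info = parse_tgi_length_error(message)
```

Here `info` holds the three integers from the message, while an unrelated message yields `None`.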
The model I was using is https://huggingface.co/microsoft/biogpt with InferenceClient.text_generation. I think this request is to add more info to that failure message, so one can better understand next steps. I am not sure where that failure message is being generated; if you could link me to the line in a Hugging Face repo, I just want to use some f-strings in its message. |
I transferred the issue to our I saw that it's implemented here (private repo) but don't know how this value could be easily accessible. |
Thanks @Wauplin for putting this in the right place! |
No, there's not; truncating is the only simple option. The reason is that the API for non-TGI models uses the pipeline, which raises an exception that would need to be parsed (which would break every time the message gets updated, which, while it shouldn't happen that much, is still very much possible). Using truncation should solve most issues for most users, therefore I don't think we should change anything here. @jamesbraza Any reason truncation doesn't work for you? |
Thanks for the response @Narsil ! Appreciate the details provided too.
So there are a few techniques for parsing exceptions, one of which is parsing the error string, which I agree is indeed brittle to change. A more maintainable route involves a custom exception class:

    class InputTooLongError(ValueError):
        def __init__(self, msg: str, actual_length: int, llm_limit: int) -> None:
            super().__init__(msg)
            self.actual_length = actual_length
            self.llm_limit = llm_limit

    try:
        # prompt_tokens and llm stand in for the tokenized prompt and the model
        raise InputTooLongError(
            "Input is too long for this model (removed rest for brevity)",
            actual_length=len(prompt_tokens),
            llm_limit=llm.limit,
        )
    except Exception as exc:
        pass  # BadRequestError raised here can contain the length in metadata

This method is readable and is independent of message strings.
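To make the benefit of the custom exception approach concrete, here is a minimal, self-contained sketch of a caller reading the structured attributes instead of parsing the message. `check_length` is a hypothetical stand-in for whatever server-side check raises the error, and the token list and limit are made-up values.

```python
from typing import Sequence


class InputTooLongError(ValueError):
    """Carries the token counts as attributes, independent of the message text."""

    def __init__(self, msg: str, actual_length: int, llm_limit: int) -> None:
        super().__init__(msg)
        self.actual_length = actual_length
        self.llm_limit = llm_limit


def check_length(prompt_tokens: Sequence[int], llm_limit: int) -> None:
    """Hypothetical validation step: raise if the prompt exceeds the limit."""
    if len(prompt_tokens) > llm_limit:
        raise InputTooLongError(
            "Input is too long for this model",
            actual_length=len(prompt_tokens),
            llm_limit=llm_limit,
        )


overflow = 0
try:
    check_length(prompt_tokens=list(range(10)), llm_limit=8)
except InputTooLongError as exc:
    # The caller can decide what to do (truncate, re-chunk, fail) from the numbers alone.
    overflow = exc.actual_length - exc.llm_limit
```

The message string can change freely without affecting the `except` branch, since only the attributes are read.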
Fwiw, this request is just about giving more information on the error being faced. I think it would be good if the response had the actual tokens used and the LLM's limit on tokens, and then one can decide on using truncation. I see truncation as a downstream choice after seeing a good error message, and thus using truncation or not is tangential to this issue. To answer your question, truncation seems like a bad idea to me, because it can lead to unexpected failures caused by truncating away important parts of the prompt. I use RAG, so prompts can become quite big. The actual prompt comes last, which could be truncated away. |
It's always left-truncated for generative models, therefore only losing the initial part of a prompt. |
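As a rough illustration of what left truncation means for a token sequence (a sketch only, not the server's actual implementation; `left_truncate` is a hypothetical helper), the oldest tokens are dropped and the tail is kept:

```python
from typing import List, Sequence


def left_truncate(token_ids: Sequence[int], max_length: int) -> List[int]:
    """Keep only the last `max_length` tokens, dropping tokens from the start."""
    if max_length <= 0:
        # Guard: a slice of [-0:] would return the whole sequence.
        return []
    return list(token_ids[-max_length:])


# Made-up token ids: retrieved context first, the actual question last.
context_then_question = [101, 102, 103, 104, 900, 901]
kept = left_truncate(context_then_question, max_length=3)
```

With this scheme the end of the prompt survives, and what is lost is the beginning, for example retrieved context placed before the question.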
Hello @Narsil thanks for the response! Good to know truncation is left truncated usually.
Yeah this is a workaround. Though it requires one to:
However, Hugging Face already knows the input size and max allowable size because it states "Input is too long for this model". So I think it's much easier for Hugging Face 'server side' to just include this information in the thrown |
Knowing the length in tokens is not really useful if you don't know how to modify the original prompt in order to modify those tokens, right?
What kind of hardware are we talking here? Tokenizers are extremely tiny. |
When using InferenceClient.text_generation with a really long prompt, you get an error:
Can we have this message contain more info: