Error Occurs When Using Grammar Sampling with Functionary in Batch Requests #223
Comments
It seems fine when not using grammar sampling for multi-threaded requests, but the effectiveness of tool calls has decreased...
Hi @Luffyzm3D2Y, can you provide the inputs (list of messages and tools) for which you found that enabling grammar sampling gave better results?
@khai-meetkai text = '<|start_header_id|>system<|end_header_id|> \n \nYou are capable of executing available function(s) if required. \nOnly execute function(s) when absolutely necessary. \nAsk for the required input to:recipient==all \nUse JSON for function arguments. \nRespond in this format: \n>>>${recipient} \n${content} \nAvailable functions: \n// Supported function definitions that should be called when necessary.\nnamespace functions {\n\n// Execute your provided code and return the terminal output. To get the result, you must explicitly print the important information (intermediate results, final results, etc.) in\n your code using `print` function. \ntype execute_code = (_: {\n// The code to execute. If None, the code from the file specified by filename will be executed. Either code or filename must be provided.\ncode?: string,\n// The file name to save the code or where the code is stored when `code` is None. If None, a file with a randomly generated name will be created. The randomly generated file will\n be deleted after execution. The file name must be a relative path. Relative paths are relative to the working directory. \nfilename?: string,\n// The working directory for the code execution. If None, a default working directory will be used.\nwork_dir?: string,\n// The language of the code. Default is "python".\nlang?: string,\n}) => any;\n\n// Use this function at the end of the task handling process. As the user and other agents do not have access to your intermediate steps and the solution presented in your calling\n of `subtask_solver`, you should write the COMPLETE final answer in the `conclusion` parameter of this function. Include as many information from your exploration as possible.\ntype submit_task = (_: {\n// Use around 400 words to summarize what you have done to handle this task, especially some milestones (such as writing what content to file xxx, getting what information from we\nb xxx). Present the final answer explicitly in details. Only this conclusion will be shown to user, so you must write down enough detailed information that summarize all the thing\ns and information you got. \nconclusion: string,\n}) => any;\n\n// Define subtask and generate its response by yourself if you want to solve subtask rather than generate thought or call other tools.\ntype subtask_solver = (_: {\n// The brief description of the subtask you want to create and solve by yourself.\nsubtask: string,\n// your detailed and self-contained response to the subtask.\nsolution: string,\n}) => any;\n\n} // namespace functions<|eot_id|><|start_header_id|>system<|end_header_id|>\n\nYou are CodeExecutor, and here is your profile:\nCodeExecutor can write and execute codes to solve given questions.\n<|eot_id|><|start_header_id|>user<|end_header_id|>\n\nYou are asked to complete the following TASK:\n```\n\n# Calculate ISBN-10 Check Digit for Tropicos ID of Order Helotiales\n## Task Inputs (including dialogues and takeaways from PREVIOUS collaboration)\n[system]: The team is collaborating to solve this problem:\nCompute the check digit the Tropicos ID for the Order Helotiales would have if it were an ISBN-10 number.\n[WebBrowserAgent]: We have successfully retrieved the Tropicos ID for the Order Helotiales. Now, we need to calculate the check digit using the ISBN-10 formula. The steps are as f\nollows: \n1. Treat the Tropicos ID as a 9-digit number (pad with leading zeros if necessary).\n2. 
Calculate the check digit using the ISBN-10 formula.\n\nLet\'s discuss the best approach to proceed with this calculation. Should we assign this task to the CodeExecutor agent, or does anyone have other suggestions?\n\n## Task Description\nUsing the retrieved Tropicos ID for the Order Helotiales, calculate the check digit as if it were an ISBN-10 number. The steps are as follows:\n1. Treat the Tropicos ID as a 9-digit number (pad with leading zeros if necessary).\n2. Calculate the check digit using the ISBN-10 formula, which involves the following steps:\n a. Multiply each of the first nine digits by its position (i.e., the first digit by 1, the second digit by 2, and so on up to the ninth digit by 9).\n b. Sum the results of these multiplications.\n c. Compute the modulus 11 of the sum.\n d. If the result is 10, the check digit is \'X\'. Otherwise, the check digit is the result itself.\n3. Combine the 9-digit Tropicos ID with the calculated check digit to form the complete ISBN-10 number.\n\n```\n<|eot_id|><|start_header_id|>user<|end_header_id|>\n\nNow you must generate your thought and you must not call the tools in this stage. You should respond in the following json format:\n```json\n{\n "thought": "your thought"\n}\n```<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n>>>\n\n' |
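As an aside, the ISBN-10 check-digit rule quoted in this prompt is easy to express in code. Below is a minimal sketch of the formula described in the task; the example ID is a placeholder, not the actual Tropicos ID from this thread:

```python
def isbn10_check_digit(nine_digits: str) -> str:
    """Compute the ISBN-10 check digit for a 9-digit string."""
    assert len(nine_digits) == 9 and nine_digits.isdigit()
    # Multiply the i-th digit (1-indexed) by its position and sum the results.
    total = sum(i * int(d) for i, d in enumerate(nine_digits, start=1))
    remainder = total % 11
    return "X" if remainder == 10 else str(remainder)

# Placeholder ID, padded to 9 digits with leading zeros if necessary.
tropicos_id = "012345678"
print(tropicos_id + isbn10_check_digit(tropicos_id))
```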
I am sure this prompt can easily reproduce the reported error. It seems to be related to some special tokens.
In our experiments, the focus was not on comparing functionary and GPT* models, and it is not easy to extract specific cases from our logs where grammar sampling outperformed its absence. We have observed, however, that disabling grammar sampling can lower tool-call accuracy, occasionally requiring retries and additional text parsing. For a more detailed comparison, I suggest running experiments on complex benchmarks such as the GAIA benchmark. However, I observed that if the final prompt fed to the model is the prompt above, the output contains incorrect tool calls without grammar sampling:
So grammar sampling matters. I would appreciate it if you could resolve this issue.
@khai-meetkai I have found where the bug occurred in this specific case, in the grammar_sample method.
@khai-meetkai Here are the changes I made.

Step 1: request logprobs whenever grammar sampling is enabled:

```python
if enable_grammar_sampling is False:
    logprobs = None
else:
    logprobs = 10000
```

Step 2: encode the candidate options and make sure their token ids are included among the candidates considered by the sampler:

```python
if grammar_sampled_token_id is None:
    selected_delta_token_ids = delta_token_ids[:200]
    option_token_ids = set()
    for option in options:
        ids = tokenizer.encode(option)[1:]
        for i in ids:
            if i not in option_token_ids and i in delta_token_ids:
                option_token_ids.add(i)
    option_token_ids = list(option_token_ids)
    option_token_ids.sort(key=lambda x: delta_token_ids.index(x))
    for option_token_id in option_token_ids:
        if option_token_id not in selected_delta_token_ids:
            selected_delta_token_ids.append(option_token_id)
    print(f"len(selected_delta_token_ids): {len(selected_delta_token_ids)}")
    for i, sampled_token_ind in enumerate(selected_delta_token_ids):
        sampled_token = tokenizer.decode(
            [sampled_token_ind], add_special_tokens=False
        )
        # ...
```

One significant drawback of this implementation is that it might substantially slow down inference (though I haven't investigated the reason). If you have a better solution, please let me know. Thank you very much. Additionally, while implementing this I found grammar sampling quite interesting; it seems to be based on a state-transition sampling strategy. Is there any systematic documentation that explains this mechanism, especially how the current state is determined, or could you point me to the relevant code? Thank you very much.
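If re-encoding the option strings on every sampling step turns out to be part of that cost, caching the encoded ids would be cheap to try. A sketch under that assumption, reusing the same `tokenizer` and `options` names as the snippet above (this is not a change in the functionary codebase, just an idea):

```python
from functools import lru_cache

@lru_cache(maxsize=4096)
def encode_option(option: str) -> tuple:
    # Drop the leading special token, as the snippet above does with [1:].
    return tuple(tokenizer.encode(option)[1:])

# Inside the sampling loop, replace tokenizer.encode(option)[1:] with:
# ids = encode_option(option)
```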
Hi @Luffyzm3D2Y, thank you so much for helping to debug this problem.

For step 1, it seems to be a problem within vLLM. I once raised this issue, providing the latency information, but no one seems to have figured out why yet. Given how fast vLLM's codebase has been evolving, I also haven't had the chance to find the reason for the significant latency degradation in recent versions when a large number of logprobs is requested.

For step 2, that's a good idea: force the sampler to consider the options too by adding them to the list of selected_delta_token_ids.

Regarding grammar sampling, yes, it is based on a state-transition strategy. We design a finite-state machine for each prompt template. The FSM chart for functionary-small-v2.5 is here. Functionary-medium-v3.0's FSM is similar to functionary-v2.4's, which is presented below:
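For readers unfamiliar with this style of constrained decoding, the sketch below illustrates the general idea of FSM-driven grammar sampling: at each stage of generation, the current state restricts which strings the sampler is allowed to emit next. The state names and transitions are illustrative only and are not taken from the functionary codebase:

```python
from enum import Enum, auto

class State(Enum):
    RECIPIENT = auto()  # choosing which function (or "all") to address
    CONTENT = auto()    # free text or JSON arguments for that recipient
    STOP = auto()

# Illustrative table: in a constrained state, sampling is limited to these options;
# None means unconstrained free-form generation.
ALLOWED = {
    State.RECIPIENT: ["all", "execute_code", "submit_task", "subtask_solver"],
    State.CONTENT: None,
}

def next_state(state: State, emitted: str) -> State:
    # Illustrative transitions; a real prompt-template FSM has more states.
    if state is State.RECIPIENT:
        return State.CONTENT
    if state is State.CONTENT and emitted.endswith("<|eot_id|>"):
        return State.STOP
    return state
```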
@jeffrey-fong As for your second point about how we determine which option should appear: I think, in theory, if we get the whole logprobs, the option tokens would always be among the candidates.

I'm also wondering if we can handle this error with some exception handling, in case the ideal scenario cannot be achieved (i.e., grammar sampling occasionally fails), to prevent the entire server from crashing, so that we can simply retry requests to the server.
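A minimal sketch of the kind of exception handling suggested here. The names `grammar_sample` (whatever callable performs the constrained sampling) and `fallback_token_id` (the token the unconstrained sampler would have chosen) are hypothetical; this is an illustration of the idea, not the functionary implementation:

```python
import logging

logger = logging.getLogger(__name__)

def safe_grammar_sample(grammar_sample, fallback_token_id, *args, **kwargs):
    """Run grammar sampling, but never let a failure crash the server."""
    try:
        return grammar_sample(*args, **kwargs)
    except Exception:
        # Log and fall back to the unconstrained choice so the request
        # fails softly; the client can still retry if the output is bad.
        logger.exception("grammar sampling failed; falling back")
        return fallback_token_id
```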
Issue Description
When executing the command below with batch requests (e.g., using Python multithreading to request the API), an error occurs:
Error Log
The specific error encountered is:
An example input prompt used for testing (I have encountered such errors in several different cases; what they have in common is that the --enable-grammar-sampling flag is used in conjunction with batch requests):
It appears that the error is related to the grammar sampling functionality, as it is triggered when the --enable-grammar-sampling flag is used in conjunction with batch requests.
Steps to Reproduce
Set up the Functionary model using the command provided above.
Send multiple requests simultaneously using a batch request method, such as Python multithreading (a sketch of such a client is shown below).
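As an illustration only (the endpoint URL, model name, and payload below are assumptions, not taken from this issue), a minimal multithreaded client that mimics the batch-request pattern described above:

```python
from concurrent.futures import ThreadPoolExecutor
import requests

URL = "http://localhost:8000/v1/chat/completions"  # assumed server address
PAYLOAD = {
    "model": "meetkai/functionary-medium-v3.0",     # assumed model name
    "messages": [{"role": "user", "content": "What is the weather in Hanoi?"}],
    "tools": [],                                     # tools omitted for brevity
}

def send_request(i: int):
    resp = requests.post(URL, json=PAYLOAD, timeout=120)
    return i, resp.status_code

# Fire several requests at once to mimic the multithreaded workload.
with ThreadPoolExecutor(max_workers=8) as pool:
    for i, status in pool.map(send_request, range(8)):
        print(f"request {i}: HTTP {status}")
```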
Observations
The error seems to be associated with the grammar_sample method in llama3_prompt_template_v3.py.
The issue arises when decoding tokens, resulting in a NoneType object being interpreted incorrectly (a defensive guard against this is sketched below).
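If the failure really is a None token id reaching the decode step, a small guard like the following would at least make the failure explicit instead of crashing deep inside the tokenizer. The variable names mirror the snippet earlier in the thread and are otherwise assumptions:

```python
# Guard sketch: raise a descriptive error for missing token ids
# before calling tokenizer.decode.
for sampled_token_ind in selected_delta_token_ids:
    if sampled_token_ind is None:
        raise ValueError(
            "grammar sampling produced a None token id; "
            "check the logprobs returned for this request"
        )
    sampled_token = tokenizer.decode([sampled_token_ind], add_special_tokens=False)
```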
Request
Could you please investigate this issue? Any guidance or potential fixes would be greatly appreciated.