
fix: TogetherAI Multiple Choice Logprobs #99

Closed · AlekseyKorshuk wants to merge 1 commit

Conversation
Conversation

AlekseyKorshuk
Contributor

Re: #95

This PR fixes the issue of multiple-choice generation.

Example output with debug logs:

  • Scenario Context:
    • Jack Taylor would prefer everyone went to The Crooked Billet

Check Option A:

The last user message was truncated for readability:

[removed for readability]

Question: To which pub would Jack Taylor go to watch the game?
  (a) The Crooked Billet
  (b) The Clapton Hart
Answer: (
-> Response to check: `a`
-> Response generated by the model: `a`
-> ChoiceLogprobs(content=[ChatCompletionTokenLogprob(token='a', bytes=[97], logprob=-0.00019333878299221396, top_logprobs=[])], refusal=None)
-> LogProb of response to check (`a`): -0.00019333878299221396

Check Option B:

The last user message was truncated for readability:

[removed for readability]

Question: To which pub would Jack Taylor go to watch the game?
  (a) The Crooked Billet
  (b) The Clapton Hart
Answer: (
-> Response to check: `b`
-> Response generated by the model: `a`
-> ChoiceLogprobs(content=[ChatCompletionTokenLogprob(token='a', bytes=[97], logprob=-0.00019333878299221396, top_logprobs=[])], refusal=None)
-> LogProb of response to check (`b`): -inf

Final response selection:

-> Index: 0
-> Response: `a`

Observations:

  • Generated results are identical: ✅
  • Generated logprobs are identical: ✅
  • Parsed logprobs are correct: ✅
  • Response selection is correct: ✅
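For reference, here is a minimal sketch of the check the logs above walk through, assuming an OpenAI-compatible client pointed at TogetherAI; the function names are illustrative, not the actual wrapper code in this PR:

```python
import math

# Illustrative sketch (not the actual wrapper code): score one pre-set
# option by greedy-decoding a single token and reading its logprob.
def score_option(client, model, prompt, option):
    response = client.chat.completions.create(
        model=model,
        messages=[{'role': 'user', 'content': prompt}],
        max_tokens=1,
        temperature=0.0,  # greedy decoding
        logprobs=True,    # return the logprob of the sampled token
    )
    sampled = response.choices[0].logprobs.content[0]
    # If the sampled token is the option under check, use its logprob;
    # otherwise the model did not pick this option, so score it -inf.
    return sampled.logprob if sampled.token == option else -math.inf

# Pick the option with the highest logprob, as in the log above.
def choose(client, model, prompt, options):
    scores = [score_option(client, model, prompt, o) for o in options]
    idx = max(range(len(scores)), key=scores.__getitem__)
    return idx, options[idx], scores
```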

cc @jzleibo


google-cla bot commented Oct 19, 2024

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

@vezhnick vezhnick self-assigned this Oct 21, 2024
@vezhnick vezhnick self-requested a review October 21, 2024 09:40
Collaborator

@vezhnick vezhnick left a comment


The proposed change samples a response, evaluates the likelihood of that sample, and returns -inf when the option being checked does not appear in the sample. This is how we do multiple choice with models that do not allow log likelihoods to be extracted; it does, however, produce a lot of -infs.
That said, this change and the preceding conversation rightly point out that there is a problem with the current code, which returns nonsensical log-probs. We have finished working on a fix and will be submitting it shortly.

@AlekseyKorshuk
Contributor Author

Can you clarify a bit? Are you saying there are lots of -inf values across all options at the same time (i.e., with two options A and B, the model returns -inf for both)? If this is about a single option, that is expected: we ask the model for greedy decoding and return only the top-1 token logprob, so each option's completion yields the same selection, and only one option will have a non -inf logprob.
AlpacaEval is implemented the same way, except it samples "m" and "M" instead of ABC and returns more than one top token.
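For comparison, a rough sketch of that AlpacaEval-style variant, under the same assumptions (OpenAI-compatible client, illustrative names, and that the endpoint honors `top_logprobs`, which is not guaranteed for every TogetherAI model):

```python
import math

# Illustrative AlpacaEval-style variant: one greedy call, but request the
# top-k token logprobs so options other than the sampled token can still
# receive a finite score. Assumes the endpoint supports `top_logprobs`.
def score_all_options(client, model, prompt, options, k=5):
    response = client.chat.completions.create(
        model=model,
        messages=[{'role': 'user', 'content': prompt}],
        max_tokens=1,
        temperature=0.0,
        logprobs=True,
        top_logprobs=k,
    )
    top = response.choices[0].logprobs.content[0].top_logprobs
    by_token = {t.token: t.logprob for t in top}
    # Options absent from the top-k list still fall back to -inf.
    return [by_token.get(o, -math.inf) for o in options]
```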

@vezhnick vezhnick marked this pull request as draft October 21, 2024 15:15
@vezhnick
Collaborator

Thanks a lot for this pull request; reviewing it helped us identify the problem with the wrapper. We have submitted a new version that fixes the issue, so this pull request is no longer needed.

Thanks again!

@vezhnick vezhnick closed this Oct 21, 2024