
fix: TogetherAI Multiple Choice Logprobs #99

Closed · AlekseyKorshuk wants to merge 1 commit

Conversation
Conversation

AlekseyKorshuk
Contributor

Re: #95

This PR fixes the issue of multiple-choice generation.

Example output with debug logs:

  • Scenario Context:
    • Jack Taylor would prefer everyone went to The Crooked Billet

Check Option A:

The last user message was truncated for readability:

[removed for readability]

Question: To which pub would Jack Taylor go to watch the game?
  (a) The Crooked Billet
  (b) The Clapton Hart
Answer: (
-> Response to check: `a`
-> Response generated by the model: `a`
-> ChoiceLogprobs(content=[ChatCompletionTokenLogprob(token='a', bytes=[97], logprob=-0.00019333878299221396, top_logprobs=[])], refusal=None)
-> LogProb of response to check (`a`): -0.00019333878299221396

Check Option B:

The last user message was truncated for readability:

[removed for readability]

Question: To which pub would Jack Taylor go to watch the game?
  (a) The Crooked Billet
  (b) The Clapton Hart
Answer: (
-> Response to check: `b`
-> Response generated by the model: `a`
-> ChoiceLogprobs(content=[ChatCompletionTokenLogprob(token='a', bytes=[97], logprob=-0.00019333878299221396, top_logprobs=[])], refusal=None)
-> LogProb of response to check (`b`): -inf

Final response selection:

-> Index: 0
-> Response: `a`

Observations:

  • Generated results are identical: ✅
  • Generated logprobs are identical: ✅
  • Parsed logprobs are correct: ✅
  • Response selection is correct: ✅
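For reference, here is a minimal sketch of the check the logs above walk through, assuming an OpenAI-compatible client pointed at TogetherAI; the function names are illustrative, not the actual wrapper code in this PR:

```python
import math

# Illustrative sketch (not the actual wrapper code): score one pre-set
# option by greedy-decoding a single token and reading its logprob.
def score_option(client, model, prompt, option):
    response = client.chat.completions.create(
        model=model,
        messages=[{'role': 'user', 'content': prompt}],
        max_tokens=1,
        temperature=0.0,  # greedy decoding
        logprobs=True,    # return the logprob of the sampled token
    )
    sampled = response.choices[0].logprobs.content[0]
    # If the sampled token is the option under check, use its logprob;
    # otherwise the model did not pick this option, so score it -inf.
    return sampled.logprob if sampled.token == option else -math.inf

# Pick the option with the highest logprob, as in the log above.
def choose(client, model, prompt, options):
    scores = [score_option(client, model, prompt, o) for o in options]
    idx = max(range(len(scores)), key=scores.__getitem__)
    return idx, options[idx], scores
```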

cc @jzleibo


google-cla bot commented Oct 19, 2024

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

@vezhnick vezhnick self-assigned this Oct 21, 2024
@vezhnick vezhnick self-requested a review October 21, 2024 09:40
Collaborator

@vezhnick vezhnick left a comment


The proposed change samples a response, evaluates the likelihood of that sample, and returns -inf when the option being checked does not appear in the sample. This is how we do multiple choice with models that do not allow log likelihoods to be extracted; it does, however, produce a lot of -infs.
That said, this change and the preceding conversation rightly point out that there is a problem with the current code, which returns nonsensical log-probs. We have finished working on a fix and will be submitting it shortly.

@AlekseyKorshuk
Contributor Author

Can you clarify a bit? Are you saying there are lots of -inf values across all options at the same time (i.e., with two options A and B, the model returns -inf for both)? If this is about a single option, that is expected: we ask the model for greedy decoding and return only the top-1 token logprob, so each option's completion yields the same selection, and only one option will have a non -inf logprob.
AlpacaEval is implemented the same way, except it samples "m" and "M" instead of ABC and returns more than one top token.
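For comparison, a rough sketch of that AlpacaEval-style variant, under the same assumptions (OpenAI-compatible client, illustrative names, and that the endpoint honors `top_logprobs`, which is not guaranteed for every TogetherAI model):

```python
import math

# Illustrative AlpacaEval-style variant: one greedy call, but request the
# top-k token logprobs so options other than the sampled token can still
# receive a finite score. Assumes the endpoint supports `top_logprobs`.
def score_all_options(client, model, prompt, options, k=5):
    response = client.chat.completions.create(
        model=model,
        messages=[{'role': 'user', 'content': prompt}],
        max_tokens=1,
        temperature=0.0,
        logprobs=True,
        top_logprobs=k,
    )
    top = response.choices[0].logprobs.content[0].top_logprobs
    by_token = {t.token: t.logprob for t in top}
    # Options absent from the top-k list still fall back to -inf.
    return [by_token.get(o, -math.inf) for o in options]
```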

@vezhnick vezhnick marked this pull request as draft October 21, 2024 15:15
@vezhnick
Collaborator

Thanks a lot for this pull request; reviewing it helped us identify the problem with the wrapper. We have submitted a new version that fixes the issue, so this pull request is no longer needed.

Thanks again!

@vezhnick vezhnick closed this Oct 21, 2024