[Feature]: Support for logprobs sampling parameter in TT backend #37

Open
milank94 opened this issue Nov 23, 2024 · 1 comment
🚀 The feature, motivation and pitch

I'm working on evaluating Llama3.1-70B on the MMLU and MMLU-Pro datasets from the Language Model Evaluation Harness, in order to compare what Tenstorrent achieves against the benchmarks published by Meta: https://huggingface.co/meta-llama/Llama-3.1-70B-Instruct#instruction-tuned-models.

These evaluations rely on the model's logprobs output: https://cookbook.openai.com/examples/using_logprobs. However, the TT backend currently does not support this sampling parameter (https://github.com/tenstorrent/vllm/blob/dev/vllm/worker/tt_model_runner.py#L430), as can be observed by trying to run the evaluation harness:

ERROR 11-23 14:35:16 engine.py:159] AssertionError('Currently not supporting logprobs')
ERROR 11-23 14:35:16 engine.py:159] Traceback (most recent call last):
ERROR 11-23 14:35:16 engine.py:159]   File "/home/mkordic/vllm_test/vllm/vllm/engine/multiprocessing/engine.py", line 157, in start
ERROR 11-23 14:35:16 engine.py:159]     self.run_engine_loop()
ERROR 11-23 14:35:16 engine.py:159]   File "/home/mkordic/vllm_test/vllm/vllm/engine/multiprocessing/engine.py", line 220, in run_engine_loop
ERROR 11-23 14:35:16 engine.py:159]     request_outputs = self.engine_step()
ERROR 11-23 14:35:16 engine.py:159]   File "/home/mkordic/vllm_test/vllm/vllm/engine/multiprocessing/engine.py", line 238, in engine_step
ERROR 11-23 14:35:16 engine.py:159]     raise e
ERROR 11-23 14:35:16 engine.py:159]   File "/home/mkordic/vllm_test/vllm/vllm/engine/multiprocessing/engine.py", line 229, in engine_step
ERROR 11-23 14:35:16 engine.py:159]     return self.engine.step()
ERROR 11-23 14:35:16 engine.py:159]   File "/home/mkordic/vllm_test/vllm/vllm/engine/llm_engine.py", line 1402, in step
ERROR 11-23 14:35:16 engine.py:159]     outputs = self.model_executor.execute_model(
ERROR 11-23 14:35:16 engine.py:159]   File "/home/mkordic/vllm_test/vllm/vllm/executor/tt_executor.py", line 55, in execute_model
ERROR 11-23 14:35:16 engine.py:159]     output = self.driver_worker.execute_model(execute_model_req)
ERROR 11-23 14:35:16 engine.py:159]   File "/home/mkordic/vllm_test/vllm/vllm/worker/tt_worker.py", line 333, in execute_model
ERROR 11-23 14:35:16 engine.py:159]     inputs = self.prepare_input(execute_model_req)
ERROR 11-23 14:35:16 engine.py:159]   File "/home/mkordic/vllm_test/vllm/vllm/worker/worker_base.py", line 291, in prepare_input
ERROR 11-23 14:35:16 engine.py:159]     return self._get_driver_input_and_broadcast(execute_model_req)
ERROR 11-23 14:35:16 engine.py:159]   File "/home/mkordic/vllm_test/vllm/vllm/worker/worker_base.py", line 253, in _get_driver_input_and_broadcast
ERROR 11-23 14:35:16 engine.py:159]     self.model_runner.prepare_model_input(
ERROR 11-23 14:35:16 engine.py:159]   File "/home/mkordic/vllm_test/vllm/vllm/worker/tt_model_runner.py", line 192, in prepare_model_input
ERROR 11-23 14:35:16 engine.py:159]     self._validate_sampling_params(sampling_params)
ERROR 11-23 14:35:16 engine.py:159]   File "/home/mkordic/vllm_test/vllm/vllm/worker/tt_model_runner.py", line 430, in _validate_sampling_params
ERROR 11-23 14:35:16 engine.py:159]     assert sampling_params.logprobs is None, "Currently not supporting logprobs"
ERROR 11-23 14:35:16 engine.py:159] AssertionError: Currently not supporting logprobs
INFO:     127.0.0.1:48296 - "POST /v1/completions HTTP/1.1" 500 Internal Server Error
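
For reference, supporting this parameter means returning per-token log-probabilities alongside the sampled tokens. Below is a minimal sketch of the usual computation, a top-k log-softmax over the output logits; the function name and signature are illustrative only, not the actual tt_model_runner API:

import torch

def compute_topk_logprobs(logits: torch.Tensor, num_logprobs: int):
    """Illustrative helper (not part of the TT backend): derive the top-k
    log-probabilities from raw logits of shape [batch, vocab_size]."""
    # log_softmax normalizes over the vocabulary, so values are true logprobs
    logprobs = torch.log_softmax(logits.float(), dim=-1)
    # Keep the num_logprobs most likely tokens per sequence, mirroring what
    # SamplingParams.logprobs requests
    topk_vals, topk_ids = logprobs.topk(num_logprobs, dim=-1)
    return topk_vals, topk_ids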

Steps to reproduce are located here: https://github.com/tenstorrent/tt-inference-server/tree/main/evals

Example of a run command:

lm_eval \
--model local-completions \
--model_args model=meta-llama/Meta-Llama-3.1-70B,base_url=http://127.0.0.1:8000/v1/completions,num_concurrent=32,max_retries=4,tokenized_requests=False,add_bos_token=True \
--gen_kwargs model=meta-llama/Meta-Llama-3.1-70B,stream=False \
--tasks mmlu \
--batch_size auto \
--output_path /home/mkordic/lm-evaluation-harness/eval_output  \
--seed 42  \
--log_samples
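
Under the hood, the harness sends /v1/completions requests with the logprobs parameter set, which is what trips the assertion. A minimal reproduction (assuming the server from the command above is running at http://127.0.0.1:8000):

import requests

# Any non-None logprobs value triggers the assertion in
# _validate_sampling_params, so the server responds with a 500
resp = requests.post(
    "http://127.0.0.1:8000/v1/completions",
    json={
        "model": "meta-llama/Meta-Llama-3.1-70B",
        "prompt": "The capital of France is",
        "max_tokens": 1,
        "logprobs": 1,
    },
)
print(resp.status_code)  # 500 with the current TT backend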

Alternatives

No response

Additional context

No response

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.
milank94 (Author) commented:

This can be worked around by using the generative option (https://github.com/EleutherAI/lm-evaluation-harness/tree/main/lm_eval/tasks/mmlu#groups) and modifying the exact_match_hf_evaluate function:

import re
import string

import numpy as np


def exact_match_hf_evaluate(
    predictions,
    references,
    regexes_to_ignore=None,
    ignore_case=False,
    ignore_punctuation=False,
    ignore_numbers=False,
):

    def normalize_text(text):
        """Normalize text by stripping spaces, removing prefixes, and ensuring consistent formatting."""
        # Remove leading/trailing whitespace
        text = text.strip()
        # Keep only the part before the first dot (handles "C. ..." formats)
        text = text.split(".")[0].strip()
        # Single-character choices like "C" pass through unchanged
        return text

    # Normalize predictions and references
    predictions = [normalize_text(pred) for pred in predictions]
    references = [normalize_text(ref) for ref in references]

    if regexes_to_ignore is not None:
        for s in regexes_to_ignore:
            predictions = np.array([re.sub(s, "", x) for x in predictions])
            references = np.array([re.sub(s, "", x) for x in references])
    else:
        predictions = np.asarray(predictions)
        references = np.asarray(references)

    if ignore_case:
        predictions = np.char.lower(predictions)
        references = np.char.lower(references)

    if ignore_punctuation:
        repl_table = string.punctuation.maketrans("", "", string.punctuation)
        predictions = np.char.translate(predictions, table=repl_table)
        references = np.char.translate(references, table=repl_table)

    if ignore_numbers:
        repl_table = string.digits.maketrans("", "", string.digits)
        predictions = np.char.translate(predictions, table=repl_table)
        references = np.char.translate(references, table=repl_table)

    score_list = predictions == references

    return {"exact_match": np.mean(score_list)}
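
With this modification, a verbose generative answer such as "C. Paris" is reduced to the bare choice letter before comparison, e.g.:

# Hypothetical inputs to illustrate the normalization
result = exact_match_hf_evaluate(
    predictions=["C. Paris", "A"],
    references=["C", "B. London"],
)
print(result)  # {'exact_match': 0.5}: "C. Paris" matches "C", "A" does not match "B"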

milank94 added the enhancement (New feature or request) and good first issue (Good for newcomers) labels on Nov 26, 2024
skhorasganiTT self-assigned this on Dec 10, 2024