[Feature]: Support for logprobs sampling parameter in TT backend #37

Open
milank94 opened this issue Nov 23, 2024 · 1 comment
🚀 The feature, motivation and pitch

I'm working on evaluating Llama3.1-70B on the MMLU and MMLU-Pro datasets from the Language Model Evaluation Harness, in order to compare what Tenstorrent achieves against the benchmarks published by Meta: https://huggingface.co/meta-llama/Llama-3.1-70B-Instruct#instruction-tuned-models.

These evaluations rely on the model's logprobs output: https://cookbook.openai.com/examples/using_logprobs. However, the TT backend currently does not support this sampling parameter (https://github.com/tenstorrent/vllm/blob/dev/vllm/worker/tt_model_runner.py#L430), as can be observed by trying to run the evaluation harness:

ERROR 11-23 14:35:16 engine.py:159] AssertionError('Currently not supporting logprobs')
ERROR 11-23 14:35:16 engine.py:159] Traceback (most recent call last):
ERROR 11-23 14:35:16 engine.py:159]   File "/home/mkordic/vllm_test/vllm/vllm/engine/multiprocessing/engine.py", line 157, in start
ERROR 11-23 14:35:16 engine.py:159]     self.run_engine_loop()
ERROR 11-23 14:35:16 engine.py:159]   File "/home/mkordic/vllm_test/vllm/vllm/engine/multiprocessing/engine.py", line 220, in run_engine_loop
ERROR 11-23 14:35:16 engine.py:159]     request_outputs = self.engine_step()
ERROR 11-23 14:35:16 engine.py:159]   File "/home/mkordic/vllm_test/vllm/vllm/engine/multiprocessing/engine.py", line 238, in engine_step
ERROR 11-23 14:35:16 engine.py:159]     raise e
ERROR 11-23 14:35:16 engine.py:159]   File "/home/mkordic/vllm_test/vllm/vllm/engine/multiprocessing/engine.py", line 229, in engine_step
ERROR 11-23 14:35:16 engine.py:159]     return self.engine.step()
ERROR 11-23 14:35:16 engine.py:159]   File "/home/mkordic/vllm_test/vllm/vllm/engine/llm_engine.py", line 1402, in step
ERROR 11-23 14:35:16 engine.py:159]     outputs = self.model_executor.execute_model(
ERROR 11-23 14:35:16 engine.py:159]   File "/home/mkordic/vllm_test/vllm/vllm/executor/tt_executor.py", line 55, in execute_model
ERROR 11-23 14:35:16 engine.py:159]     output = self.driver_worker.execute_model(execute_model_req)
ERROR 11-23 14:35:16 engine.py:159]   File "/home/mkordic/vllm_test/vllm/vllm/worker/tt_worker.py", line 333, in execute_model
ERROR 11-23 14:35:16 engine.py:159]     inputs = self.prepare_input(execute_model_req)
ERROR 11-23 14:35:16 engine.py:159]   File "/home/mkordic/vllm_test/vllm/vllm/worker/worker_base.py", line 291, in prepare_input
ERROR 11-23 14:35:16 engine.py:159]     return self._get_driver_input_and_broadcast(execute_model_req)
ERROR 11-23 14:35:16 engine.py:159]   File "/home/mkordic/vllm_test/vllm/vllm/worker/worker_base.py", line 253, in _get_driver_input_and_broadcast
ERROR 11-23 14:35:16 engine.py:159]     self.model_runner.prepare_model_input(
ERROR 11-23 14:35:16 engine.py:159]   File "/home/mkordic/vllm_test/vllm/vllm/worker/tt_model_runner.py", line 192, in prepare_model_input
ERROR 11-23 14:35:16 engine.py:159]     self._validate_sampling_params(sampling_params)
ERROR 11-23 14:35:16 engine.py:159]   File "/home/mkordic/vllm_test/vllm/vllm/worker/tt_model_runner.py", line 430, in _validate_sampling_params
ERROR 11-23 14:35:16 engine.py:159]     assert sampling_params.logprobs is None, "Currently not supporting logprobs"
ERROR 11-23 14:35:16 engine.py:159] AssertionError: Currently not supporting logprobs
INFO:     127.0.0.1:48296 - "POST /v1/completions HTTP/1.1" 500 Internal Server Error
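
For reference, supporting this parameter means returning per-token log-probabilities alongside the sampled tokens. Below is a minimal sketch of the usual computation, a top-k log-softmax over the output logits; the function name and signature are illustrative only, not the actual tt_model_runner API:

import torch

def compute_topk_logprobs(logits: torch.Tensor, num_logprobs: int):
    """Illustrative helper (not part of the TT backend): derive the top-k
    log-probabilities from raw logits of shape [batch, vocab_size]."""
    # log_softmax normalizes over the vocabulary, so values are true logprobs
    logprobs = torch.log_softmax(logits.float(), dim=-1)
    # Keep the num_logprobs most likely tokens per sequence, mirroring what
    # SamplingParams.logprobs requests
    topk_vals, topk_ids = logprobs.topk(num_logprobs, dim=-1)
    return topk_vals, topk_ids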

Steps to reproduce are located here: https://github.com/tenstorrent/tt-inference-server/tree/main/evals

Example of a run command:

lm_eval \
--model local-completions \
--model_args model=meta-llama/Meta-Llama-3.1-70B,base_url=http://127.0.0.1:8000/v1/completions,num_concurrent=32,max_retries=4,tokenized_requests=False,add_bos_token=True \
--gen_kwargs model=meta-llama/Meta-Llama-3.1-70B,stream=False \
--tasks mmlu \
--batch_size auto \
--output_path /home/mkordic/lm-evaluation-harness/eval_output  \
--seed 42  \
--log_samples
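
Under the hood, the harness sends /v1/completions requests with the logprobs parameter set, which is what trips the assertion. A minimal reproduction (assuming the server from the command above is running at http://127.0.0.1:8000):

import requests

# Any non-None logprobs value triggers the assertion in
# _validate_sampling_params, so the server responds with a 500
resp = requests.post(
    "http://127.0.0.1:8000/v1/completions",
    json={
        "model": "meta-llama/Meta-Llama-3.1-70B",
        "prompt": "The capital of France is",
        "max_tokens": 1,
        "logprobs": 1,
    },
)
print(resp.status_code)  # 500 with the current TT backend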

Alternatives

No response

Additional context

No response

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.
milank94 (Author) commented:

This can be worked around by using the generative option (https://github.com/EleutherAI/lm-evaluation-harness/tree/main/lm_eval/tasks/mmlu#groups) and modifying the exact_match_hf_evaluate function:

import re
import string

import numpy as np


def exact_match_hf_evaluate(
    predictions,
    references,
    regexes_to_ignore=None,
    ignore_case=False,
    ignore_punctuation=False,
    ignore_numbers=False,
):

    def normalize_text(text):
        """Normalize text by stripping spaces, removing prefixes, and ensuring consistent formatting."""
        # Remove leading/trailing whitespace
        text = text.strip()
        # Keep only the part before the first dot (handles "C. ..." formats)
        text = text.split(".")[0].strip()
        # Single-character choices like "C" pass through unchanged
        return text

    # Normalize predictions and references
    predictions = [normalize_text(pred) for pred in predictions]
    references = [normalize_text(ref) for ref in references]

    if regexes_to_ignore is not None:
        for s in regexes_to_ignore:
            predictions = np.array([re.sub(s, "", x) for x in predictions])
            references = np.array([re.sub(s, "", x) for x in references])
    else:
        predictions = np.asarray(predictions)
        references = np.asarray(references)

    if ignore_case:
        predictions = np.char.lower(predictions)
        references = np.char.lower(references)

    if ignore_punctuation:
        repl_table = string.punctuation.maketrans("", "", string.punctuation)
        predictions = np.char.translate(predictions, table=repl_table)
        references = np.char.translate(references, table=repl_table)

    if ignore_numbers:
        repl_table = string.digits.maketrans("", "", string.digits)
        predictions = np.char.translate(predictions, table=repl_table)
        references = np.char.translate(references, table=repl_table)

    score_list = predictions == references

    return {"exact_match": np.mean(score_list)}
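
With this modification, a verbose generative answer such as "C. Paris" is reduced to the bare choice letter before comparison, e.g.:

# Hypothetical inputs to illustrate the normalization
result = exact_match_hf_evaluate(
    predictions=["C. Paris", "A"],
    references=["C", "B. London"],
)
print(result)  # {'exact_match': 0.5}: "C. Paris" matches "C", "A" does not match "B"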

milank94 added the enhancement (New feature or request) and good first issue (Good for newcomers) labels on Nov 26, 2024
skhorasganiTT self-assigned this on Dec 10, 2024