
[Bug Report] Llama 3.2 11B (vision) performance lower than expected #17092

Open
milank94 opened this issue Jan 24, 2025 · 4 comments
Labels: bug (Something isn't working), llama3, LLM_bug, P1

Comments

milank94 (Contributor) commented Jan 24, 2025

Describe the bug
The decode performance of the Llama 3.2 11B (vision) model is lower than expected:

Expected: 14.8 t/s/u (tokens per second per user), 2880 ms TTFT (time to first token)
Actual: 11.4 t/s/u, 3813 ms TTFT

More details: https://docs.google.com/spreadsheets/d/1Mdn3mBIOHYRC0ETsMJdO9dXtSVNJn_tFaR6QipbpEXU/edit?usp=sharing
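
For reference, a quick back-of-the-envelope comparison of the numbers above (values copied from this report):

```python
# Quick arithmetic on the figures reported above (Llama 3.2 11B vision, N300).
target_tps_per_user = 14.8   # expected decode throughput, tokens/s/user
actual_tps_per_user = 11.4   # measured decode throughput, tokens/s/user
target_ttft_ms = 2880        # expected time to first token, ms
actual_ttft_ms = 3813        # measured time to first token, ms

decode_gap = 1 - actual_tps_per_user / target_tps_per_user
ttft_gap = actual_ttft_ms / target_ttft_ms - 1

print(f"decode throughput is {decode_gap:.1%} below target")  # ~23.0% below
print(f"TTFT is {ttft_gap:.1%} above target")                  # ~32.4% above
```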

To Reproduce
Steps to reproduce the behavior:

  1. Go to https://github.com/tenstorrent/tt-inference-server/tree/tstesco/dev/benchmarking

Environment information: not provided.

milank94 added the bug (Something isn't working), llama3, and LLM_bug labels on Jan 24, 2025
tstescoTT (Contributor) commented:

I'll add that this is for N300 only; T3000 performance was as expected.

uaydonat (Contributor) commented:

@skhorasganiTT to re-measure the perf

skhorasganiTT (Contributor) commented Jan 25, 2025

Hey @milank94 @tstescoTT, are you running 3.2-11B on N300 with max_num_seqs=16? (That is the batch size for which we reported the N300 perf.)
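
For context, a minimal sketch of where that batch size lives when the model is served through vLLM's Python API; the tt-inference-server deployment may configure this differently (e.g. via CLI flags or a config file), and the model id below is an assumption:

```python
# Minimal sketch: cap the vLLM scheduler at 16 concurrent sequences, the batch
# size the N300 numbers were reported for. Model id and prompt are assumptions;
# the actual deployment may set this through tt-inference-server instead.
from vllm import LLM, SamplingParams

llm = LLM(
    model="meta-llama/Llama-3.2-11B-Vision-Instruct",  # assumed HF model id
    max_num_seqs=16,  # max sequences batched per scheduler step
)
outputs = llm.generate(["Hello"], SamplingParams(max_tokens=32))
```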

tstescoTT (Contributor) commented:

Yes, this was measured with 16 concurrent requests. When more than 16 requests are sent, the vLLM backend queues the additional requests as expected.
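
To make that setup concrete, here is a minimal client-side sketch of the kind of run described above, assuming an OpenAI-compatible vLLM endpoint at localhost:8000. The endpoint URL, model id, prompt, and chunk-based token counting are all assumptions; the actual benchmarking scripts are in the tt-inference-server branch linked in the report.

```python
# Send N concurrent streaming requests and record per-request TTFT and decode t/s/u.
import asyncio
import time

from openai import AsyncOpenAI

BASE_URL = "http://localhost:8000/v1"  # assumed vLLM OpenAI-compatible server
MODEL = "meta-llama/Llama-3.2-11B-Vision-Instruct"  # assumed model id
NUM_REQUESTS = 32  # more than max_num_seqs=16, so the extras get queued server-side

client = AsyncOpenAI(base_url=BASE_URL, api_key="unused")


async def one_request(i: int) -> None:
    start = time.perf_counter()
    first_token = None
    n_tokens = 0
    stream = await client.chat.completions.create(
        model=MODEL,
        messages=[{"role": "user", "content": "Write a short story about a robot."}],
        max_tokens=128,
        stream=True,
    )
    async for chunk in stream:
        if chunk.choices and chunk.choices[0].delta.content:
            n_tokens += 1  # rough proxy: one streamed chunk ~ one token
            if first_token is None:
                first_token = time.perf_counter()
    end = time.perf_counter()
    ttft_ms = (first_token - start) * 1000 if first_token else float("nan")
    decode_tps = (n_tokens - 1) / (end - first_token) if first_token and n_tokens > 1 else 0.0
    print(f"req {i:02d}: TTFT {ttft_ms:7.1f} ms, decode {decode_tps:5.1f} t/s/u")


async def main() -> None:
    await asyncio.gather(*(one_request(i) for i in range(NUM_REQUESTS)))


asyncio.run(main())
```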
