[Bug]: Cannot get past 50 RPS #6592
Comments
Hi @vutrung96, looking into this. How do you get the % complete log output?
Hi @ishaan-jaff, I was just using tqdm.
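(For illustration only: one way to get a percent-complete bar over a batch of async requests with tqdm, roughly as described above. The model name and request shape are placeholders, not taken from this thread.)

```python
# Sketch: percent-complete progress over a batch of concurrent litellm calls using tqdm.
import asyncio
import litellm
from tqdm.asyncio import tqdm_asyncio

async def one_request(i: int):
    # Placeholder request; assumes OPENAI_API_KEY is set in the environment.
    return await litellm.acompletion(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": f"request {i}"}],
    )

async def main(n: int = 100):
    tasks = [one_request(i) for i in range(n)]
    # tqdm_asyncio.gather works like asyncio.gather but renders a progress bar.
    return await tqdm_asyncio.gather(*tasks)

if __name__ == "__main__":
    asyncio.run(main())
```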
Hi @ishaan-jaff, any updates on this? Also facing this issue!
Hi @vutrung96 @CharlieJCJ, do you see the issue on litellm.router too? https://docs.litellm.ai/docs/routing It would help me if you could test with the litellm router too.
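(Again for illustration, not from the thread: a rough sketch of what the equivalent test through litellm.Router could look like; the model_list entry, model name, and request count are placeholder assumptions.)

```python
# Sketch: the same concurrent load test routed through litellm.Router.
import asyncio
from litellm import Router

# Placeholder deployment; add api_key/api_base to litellm_params as needed.
router = Router(
    model_list=[
        {
            "model_name": "gpt-4o-mini",
            "litellm_params": {"model": "gpt-4o-mini"},
        }
    ]
)

async def one_request(i: int):
    return await router.acompletion(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": f"request {i}"}],
    )

async def main(n: int = 500):
    return await asyncio.gather(*(one_request(i) for i in range(n)))

if __name__ == "__main__":
    asyncio.run(main())
```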
Hi @ishaan-jaff, Litellm uses the official OpenAI Python client.
The official OpenAI client has performance issues with high numbers of concurrent requests due to issues in httpx. The issues in httpx come from a number of factors related to anyio vs asyncio, which are addressed in the open PRs below.
We saw this when implementing litellm as the backend for our synthetic data engine. When using our own OpenAI client (with aiohttp instead of httpx) we saturate the highest rate limits (30,000 requests per minute on gpt-4o-mini tier 5). When using litellm, the performance issues cap us well under the highest rate limit, at around 200 queries per second (12,000 requests per minute).
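(For illustration of the approach being described, not curator's actual client: a direct aiohttp call to the OpenAI chat completions endpoint, with one shared session and a semaphore bounding concurrency. The URL, payload, and limits are assumptions.)

```python
# Sketch: bypassing httpx by posting to the OpenAI REST API directly with aiohttp.
import asyncio
import os
import aiohttp

URL = "https://api.openai.com/v1/chat/completions"
HEADERS = {"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"}

async def one_request(session: aiohttp.ClientSession, sem: asyncio.Semaphore, i: int):
    payload = {
        "model": "gpt-4o-mini",
        "messages": [{"role": "user", "content": f"request {i}"}],
    }
    async with sem:  # keep at most `concurrency` requests in flight
        async with session.post(URL, headers=HEADERS, json=payload) as resp:
            return await resp.json()

async def main(n: int = 1000, concurrency: int = 500):
    sem = asyncio.Semaphore(concurrency)
    connector = aiohttp.TCPConnector(limit=concurrency)
    async with aiohttp.ClientSession(connector=connector) as session:
        return await asyncio.gather(*(one_request(session, sem, i) for i in range(n)))

if __name__ == "__main__":
    asyncio.run(main())
```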
@RyanMarten you are right! Just ran a load test to confirm. The right is with
@RyanMarten started work on this
@RyanMarten, @vutrung96 and @CharlieJCJ, can y'all help us test this change as we start rolling it out? As of now we have just added support for non-streaming; I can let you know once streaming support is added too.
@ishaan-jaff Thanks for creating a PR for this! We can certainly help test the change 😄. I'll run a benchmarking test with it. Our use-case is non-streaming, so that shouldn't be a problem.
Here is our benchmarking using the curator request processor and viewer (with different backends). I see that this was released in https://github.com/BerriAI/litellm/releases/tag/v1.56.8, so I upgraded my litellm version to the latest:
(1) our own aiohttp backend
(2) default litellm backend
(3) litellm backend with
For some reason I'm not seeing an improvement in performance.
Hmm, that's odd. We see RPS going much higher in our testing. Do you see anything off with our implementation (I know you mentioned you also implemented aiohttp)? https://github.com/BerriAI/litellm/blob/main/litellm/llms/custom_httpx/aiohttp_handler.py#L30
Ohh, I think I know the issue: it's still getting routed to the OpenAI SDK when you pass (we route to the OpenAI SDK if the ). In my testing I was using . Will update this thread to ensure
I'll take a look! For reference, here is our aiohttp implementation: And here is how we are using litellm as a backend:
Ah yes, what you said about the routing makes sense! When the fix is in, I'll try my benchmark again and post the results 👍
Fixed here @RyanMarten: #7598. Could you test on our new release, v1.57.2? (Will be out in 12 hrs.)
@ishaan-jaff - yes, absolutely (looking out for the release).
Sorry, CI/CD is causing issues - will update here once the new release is out.
@ishaan-jaff Also curious, what software / visualization are you using for your load tests?
@RyanMarten - can you help test this: https://github.com/BerriAI/litellm/releases/tag/v1.57.3
I was using locust.
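(Purely as an illustration of the locust approach mentioned above; the host, route, payload, and API key below are assumptions, not the actual load test used here.)

```python
# locustfile.py - minimal sketch of a locust load test against an OpenAI-compatible endpoint.
# Run with: locust -f locustfile.py --host http://localhost:4000
from locust import HttpUser, task, between

class ChatCompletionsUser(HttpUser):
    wait_time = between(0.1, 0.5)

    @task
    def chat_completion(self):
        # Assumes an OpenAI-compatible /chat/completions route (e.g. a litellm proxy).
        self.client.post(
            "/chat/completions",
            json={
                "model": "gpt-4o-mini",
                "messages": [{"role": "user", "content": "hello"}],
            },
            headers={"Authorization": "Bearer sk-1234"},
        )
```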
I'm getting this error now which I wasn't getting before. I think this is an issue on our side; let me test.
Ah this is because we do a test call with
fails with an unintuitive error message
What I can do is just switch this call to use acompletion as well.
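(The exact test call isn't preserved in this scrape; as a hypothetical illustration only, swapping a blocking call for its async counterpart could look like this.)

```python
# Hypothetical illustration; the original test call is not shown in this thread.
import litellm

# Before: a blocking call made once during setup/validation.
# resp = litellm.completion(model="gpt-4o-mini", messages=[{"role": "user", "content": "ping"}])

# After: the same check awaited through the async client path.
async def validation_call():
    return await litellm.acompletion(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": "ping"}],
    )
```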
OK now I'm running into an issue in the main loop where
@vutrung96 could you take a look at this since you wrote the custom event loop handling?
What happened?
I have OpenAI tier 5 usage, which should give me 30,000 RPM = 500 RPS with "gpt-4o-mini". However, I struggle to get past 50 RPS.
The minimal replication:
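(The original snippet was not captured in this scrape; below is only a sketch of the pattern being described, namely many concurrent litellm.acompletion calls gathered at once and timed to report items/second. The request count and max_tokens are assumptions.)

```python
# Sketch: fire N concurrent litellm.acompletion calls and report observed throughput.
import asyncio
import time
import litellm

async def one_request(i: int):
    return await litellm.acompletion(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": f"request {i}"}],
        max_tokens=16,
    )

async def main(n: int = 1000):
    start = time.perf_counter()
    await asyncio.gather(*(one_request(i) for i in range(n)))
    elapsed = time.perf_counter() - start
    print(f"{n / elapsed:.1f} items/second")

if __name__ == "__main__":
    asyncio.run(main())
```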
I only get 50 items/second as opposed to ~500 items/second when sending raw HTTP requests.
Relevant log output
Twitter / LinkedIn details
No response