Problems running locally on Windows 11 #10

Open

moebiussurfing opened this issue Sep 27, 2024 · 1 comment

Comments

@moebiussurfing
Hello @lamm-mit,
I am getting this error on the llama server.
It appears after submitting, with the web UI set up
as mentioned in other issues here:
Text Generation Model: openai/custom_model
Custom Base API: http://localhost:8888/v1
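
As a quick sanity check that the custom base API above is actually serving, here is a minimal sketch, assuming llama_cpp.server's standard /v1/models route:

# Minimal smoke test (not from the repo): confirm the local llama_cpp.server
# endpoint configured above is reachable before pointing the web UI at it.
import urllib.request

with urllib.request.urlopen("http://localhost:8888/v1/models") as resp:
    print(resp.status, resp.read()[:200])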

INFO:     127.0.0.1:55176 - "POST /v1/chat/completions HTTP/1.1" 400 Bad Request
Exception: Requested tokens (8031) exceed context window of 2048
Traceback (most recent call last):
  File "C:\Users\moebi\AppData\Local\Programs\Python\Python310\lib\site-packages\llama_cpp\server\errors.py", line 171, in custom_route_handler
    response = await original_route_handler(request)
  File "C:\Users\moebi\AppData\Local\Programs\Python\Python310\lib\site-packages\fastapi\routing.py", line 297, in app
    raw_response = await run_endpoint_function(
  File "C:\Users\moebi\AppData\Local\Programs\Python\Python310\lib\site-packages\fastapi\routing.py", line 210, in run_endpoint_function
    return await dependant.call(**values)
  File "C:\Users\moebi\AppData\Local\Programs\Python\Python310\lib\site-packages\llama_cpp\server\app.py", line 513, in create_chat_completion
    ] = await run_in_threadpool(llama.create_chat_completion, **kwargs)
  File "C:\Users\moebi\AppData\Local\Programs\Python\Python310\lib\site-packages\starlette\concurrency.py", line 42, in run_in_threadpool
    return await anyio.to_thread.run_sync(func, *args)
  File "C:\Users\moebi\AppData\Local\Programs\Python\Python310\lib\site-packages\anyio\to_thread.py", line 56, in run_sync
    return await get_async_backend().run_sync_in_worker_thread(
  File "C:\Users\moebi\AppData\Local\Programs\Python\Python310\lib\site-packages\anyio\_backends\_asyncio.py", line 2177, in run_sync_in_worker_thread
    return await future
  File "C:\Users\moebi\AppData\Local\Programs\Python\Python310\lib\site-packages\anyio\_backends\_asyncio.py", line 859, in run
    result = context.run(func, *args)
  File "C:\Users\moebi\AppData\Local\Programs\Python\Python310\lib\site-packages\llama_cpp\llama.py", line 1997, in create_chat_completion
    return handler(
  File "C:\Users\moebi\AppData\Local\Programs\Python\Python310\lib\site-packages\llama_cpp\llama_chat_format.py", line 637, in chat_completion_handler
    completion_or_chunks = llama.create_completion(
  File "C:\Users\moebi\AppData\Local\Programs\Python\Python310\lib\site-packages\llama_cpp\llama.py", line 1831, in create_completion
    completion: Completion = next(completion_or_chunks)  # type: ignore
  File "C:\Users\moebi\AppData\Local\Programs\Python\Python310\lib\site-packages\llama_cpp\llama.py", line 1267, in _create_completion
    raise ValueError(
ValueError: Requested tokens (8031) exceed context window of 2048
INFO:     127.0.0.1:55189 - "POST /v1/chat/completions HTTP/1.1" 400 Bad Request

Any help?

PS
I noticed that I am using Python 3.10, but the install instructions point to 3.9:
conda create -n pdf2audio python=3.9

@moebiussurfing (Author) commented Sep 30, 2024

It seems that in my case the uploaded PDF required a larger context size:
PS D:\_AI\LLM\PDF2Audio> python -m llama_cpp.server --model ./zephyr-7b-beta.Q6_K.gguf --host 127.0.0.1 --port 8888 --n_ctx 16384
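
For completeness, the same context size can be set through llama-cpp-python's Python API as well; a minimal sketch, assuming the same GGUF file and that the model tolerates a 16384-token window:

# Minimal sketch (not from the repo): load the model directly with a larger
# context window instead of going through the HTTP server.
from llama_cpp import Llama

llm = Llama(
    model_path="./zephyr-7b-beta.Q6_K.gguf",
    n_ctx=16384,  # the request above needed 8031 tokens, so 2048 was too small
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Summarize this PDF section."}]
)
print(out["choices"][0]["message"]["content"])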

Anyway, I now get an API key error (even when using the DUMMY placeholder):
warning Error code: 401 - {'error': {'message': 'Incorrect API key provided: DUMMY. You can find your API key at https://platform.openai.com/account/api-keys.', 'type': 'invalid_request_error', 'param': None, 'code': 'invalid_api_key'}}
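
That 401 is OpenAI's own error (note the platform.openai.com link in the message), which suggests the request went to api.openai.com instead of the local server. A minimal sketch of the intended wiring, assuming PDF2Audio uses the openai v1 Python client and the local server was started without --api_key:

# Minimal sketch (not from the repo): route the openai client to the local
# llama_cpp.server. The key is a placeholder the local server never checks
# unless it was launched with --api_key.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8888/v1",
    api_key="DUMMY",
)

resp = client.chat.completions.create(
    model="custom_model",  # a single-model llama_cpp.server ignores this field
    messages=[{"role": "user", "content": "Say hello."}],
)
print(resp.choices[0].message.content)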
