Hello @lamm-mit,
I am getting this error on the llama server.
This happens after submitting, having set up the web server UI
as mentioned in other issues here:
Text Generation Model: openai/custom_model
Custom Base API: http://localhost:8888/v1
INFO: 127.0.0.1:55176 - "POST /v1/chat/completions HTTP/1.1" 400 Bad Request
Exception: Requested tokens (8031) exceed context window of 2048
Traceback (most recent call last):
File "C:\Users\moebi\AppData\Local\Programs\Python\Python310\lib\site-packages\llama_cpp\server\errors.py", line 171, in custom_route_handler
response = await original_route_handler(request)
File "C:\Users\moebi\AppData\Local\Programs\Python\Python310\lib\site-packages\fastapi\routing.py", line 297, in app
raw_response = await run_endpoint_function(
File "C:\Users\moebi\AppData\Local\Programs\Python\Python310\lib\site-packages\fastapi\routing.py", line 210, in run_endpoint_function
return await dependant.call(**values)
File "C:\Users\moebi\AppData\Local\Programs\Python\Python310\lib\site-packages\llama_cpp\server\app.py", line 513, in create_chat_completion
] = await run_in_threadpool(llama.create_chat_completion, **kwargs)
File "C:\Users\moebi\AppData\Local\Programs\Python\Python310\lib\site-packages\starlette\concurrency.py", line 42, in run_in_threadpool
return await anyio.to_thread.run_sync(func, *args)
File "C:\Users\moebi\AppData\Local\Programs\Python\Python310\lib\site-packages\anyio\to_thread.py", line 56, in run_sync
return await get_async_backend().run_sync_in_worker_thread(
File "C:\Users\moebi\AppData\Local\Programs\Python\Python310\lib\site-packages\anyio\_backends\_asyncio.py", line 2177, in run_sync_in_worker_thread
return await future
File "C:\Users\moebi\AppData\Local\Programs\Python\Python310\lib\site-packages\anyio\_backends\_asyncio.py", line 859, in run
result = context.run(func, *args)
File "C:\Users\moebi\AppData\Local\Programs\Python\Python310\lib\site-packages\llama_cpp\llama.py", line 1997, in create_chat_completion
return handler(
File "C:\Users\moebi\AppData\Local\Programs\Python\Python310\lib\site-packages\llama_cpp\llama_chat_format.py", line 637, in chat_completion_handler
completion_or_chunks = llama.create_completion(
File "C:\Users\moebi\AppData\Local\Programs\Python\Python310\lib\site-packages\llama_cpp\llama.py", line 1831, in create_completion
completion: Completion = next(completion_or_chunks) # type: ignore
File "C:\Users\moebi\AppData\Local\Programs\Python\Python310\lib\site-packages\llama_cpp\llama.py", line 1267, in _create_completion
raise ValueError(
ValueError: Requested tokens (8031) exceed context window of 2048
INFO: 127.0.0.1:55189 - "POST /v1/chat/completions HTTP/1.1" 400 Bad Request
Any help?
PS
I noticed that I am using Python 3.10, but the install instructions point to 3.9:
conda create -n pdf2audio python=3.9
It seems that in my case the uploaded PDF required a larger context size:
PS D:\_AI\LLM\PDF2Audio> python -m llama_cpp.server --model ./zephyr-7b-beta.Q6_K.gguf --host 127.0.0.1 --port 8888 --n_ctx 16384
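For reference, one way to check how many tokens a given prompt actually needs before picking --n_ctx is to tokenize it with llama_cpp directly. A minimal sketch, assuming the same GGUF file as in the command above and a hypothetical prompt.txt holding the text that gets sent to the server:

```python
from llama_cpp import Llama

# Load only the vocabulary so tokenization stays cheap (no full weights needed).
llm = Llama(model_path="./zephyr-7b-beta.Q6_K.gguf", vocab_only=True, verbose=False)

# prompt.txt is a hypothetical dump of the text the UI sends to the server.
with open("prompt.txt", "rb") as f:
    prompt = f.read()

tokens = llm.tokenize(prompt)
print(f"prompt uses {len(tokens)} tokens; --n_ctx must cover this plus max_tokens")
```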
Anyway, I still get an API key error (even when using the DUMMY alias):
warning Error code: 401 - {'error': {'message': 'Incorrect API key provided: DUMMY. You can find your API key at https://platform.openai.com/account/api-keys.', 'type': 'invalid_request_error', 'param': None, 'code': 'invalid_api_key'}}
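That 401 looks like the request is being validated against the real OpenAI endpoint rather than the local llama.cpp server; when the client's base URL points at http://localhost:8888/v1, the llama-cpp-python server does not validate the key by default, so any placeholder string works. A minimal sketch, assuming the server from the command above is running; "openai/custom_model" is just the name the UI passes through, and the local server serves whichever GGUF it was launched with:

```python
from openai import OpenAI

# Point the client at the local llama.cpp server instead of api.openai.com.
# The key only needs to be a non-empty string; the local server does not
# check it unless it was started with an API key of its own.
client = OpenAI(base_url="http://localhost:8888/v1", api_key="DUMMY")

response = client.chat.completions.create(
    model="openai/custom_model",  # effectively ignored by the single-model local server
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
    max_tokens=64,
)
print(response.choices[0].message.content)
```

If the 401 persists even with the base URL set, it usually means some part of the app is still constructing a client without that base URL and falling back to the default OpenAI endpoint.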