Problems running locally on Windows 11 #10

Open

moebiussurfing opened this issue Sep 27, 2024 · 1 comment

Comments

@moebiussurfing
Hello @lamm-mit,
I am getting this error on the llama server.
It appears after submitting, with the web UI set up
as mentioned in other issues here:
Text Generation Model: openai/custom_model
Custom Base API: http://localhost:8888/v1
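
As a quick sanity check that the custom base API above is actually serving, here is a minimal sketch, assuming llama_cpp.server's standard /v1/models route:

# Minimal smoke test (not from the repo): confirm the local llama_cpp.server
# endpoint configured above is reachable before pointing the web UI at it.
import urllib.request

with urllib.request.urlopen("http://localhost:8888/v1/models") as resp:
    print(resp.status, resp.read()[:200])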

INFO:     127.0.0.1:55176 - "POST /v1/chat/completions HTTP/1.1" 400 Bad Request
Exception: Requested tokens (8031) exceed context window of 2048
Traceback (most recent call last):
  File "C:\Users\moebi\AppData\Local\Programs\Python\Python310\lib\site-packages\llama_cpp\server\errors.py", line 171, in custom_route_handler
    response = await original_route_handler(request)
  File "C:\Users\moebi\AppData\Local\Programs\Python\Python310\lib\site-packages\fastapi\routing.py", line 297, in app
    raw_response = await run_endpoint_function(
  File "C:\Users\moebi\AppData\Local\Programs\Python\Python310\lib\site-packages\fastapi\routing.py", line 210, in run_endpoint_function
    return await dependant.call(**values)
  File "C:\Users\moebi\AppData\Local\Programs\Python\Python310\lib\site-packages\llama_cpp\server\app.py", line 513, in create_chat_completion
    ] = await run_in_threadpool(llama.create_chat_completion, **kwargs)
  File "C:\Users\moebi\AppData\Local\Programs\Python\Python310\lib\site-packages\starlette\concurrency.py", line 42, in run_in_threadpool
    return await anyio.to_thread.run_sync(func, *args)
  File "C:\Users\moebi\AppData\Local\Programs\Python\Python310\lib\site-packages\anyio\to_thread.py", line 56, in run_sync
    return await get_async_backend().run_sync_in_worker_thread(
  File "C:\Users\moebi\AppData\Local\Programs\Python\Python310\lib\site-packages\anyio\_backends\_asyncio.py", line 2177, in run_sync_in_worker_thread
    return await future
  File "C:\Users\moebi\AppData\Local\Programs\Python\Python310\lib\site-packages\anyio\_backends\_asyncio.py", line 859, in run
    result = context.run(func, *args)
  File "C:\Users\moebi\AppData\Local\Programs\Python\Python310\lib\site-packages\llama_cpp\llama.py", line 1997, in create_chat_completion
    return handler(
  File "C:\Users\moebi\AppData\Local\Programs\Python\Python310\lib\site-packages\llama_cpp\llama_chat_format.py", line 637, in chat_completion_handler
    completion_or_chunks = llama.create_completion(
  File "C:\Users\moebi\AppData\Local\Programs\Python\Python310\lib\site-packages\llama_cpp\llama.py", line 1831, in create_completion
    completion: Completion = next(completion_or_chunks)  # type: ignore
  File "C:\Users\moebi\AppData\Local\Programs\Python\Python310\lib\site-packages\llama_cpp\llama.py", line 1267, in _create_completion
    raise ValueError(
ValueError: Requested tokens (8031) exceed context window of 2048
INFO:     127.0.0.1:55189 - "POST /v1/chat/completions HTTP/1.1" 400 Bad Request

Any help?

PS
I noticed that I am using Python 3.10, but the install instructions point to 3.9:
conda create -n pdf2audio python=3.9

@moebiussurfing (Author) commented Sep 30, 2024

It seems that in my case the uploaded PDF required a larger context size:
PS D:\_AI\LLM\PDF2Audio> python -m llama_cpp.server --model ./zephyr-7b-beta.Q6_K.gguf --host 127.0.0.1 --port 8888 --n_ctx 16384
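
For completeness, the same context size can be set through llama-cpp-python's Python API as well; a minimal sketch, assuming the same GGUF file and that the model tolerates a 16384-token window:

# Minimal sketch (not from the repo): load the model directly with a larger
# context window instead of going through the HTTP server.
from llama_cpp import Llama

llm = Llama(
    model_path="./zephyr-7b-beta.Q6_K.gguf",
    n_ctx=16384,  # the request above needed 8031 tokens, so 2048 was too small
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Summarize this PDF section."}]
)
print(out["choices"][0]["message"]["content"])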

Anyway, I now get an API key error (even when using the DUMMY placeholder):
warning Error code: 401 - {'error': {'message': 'Incorrect API key provided: DUMMY. You can find your API key at https://platform.openai.com/account/api-keys.', 'type': 'invalid_request_error', 'param': None, 'code': 'invalid_api_key'}}
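
That 401 is OpenAI's own error (note the platform.openai.com link in the message), which suggests the request went to api.openai.com instead of the local server. A minimal sketch of the intended wiring, assuming PDF2Audio uses the openai v1 Python client and the local server was started without --api_key:

# Minimal sketch (not from the repo): route the openai client to the local
# llama_cpp.server. The key is a placeholder the local server never checks
# unless it was launched with --api_key.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8888/v1",
    api_key="DUMMY",
)

resp = client.chat.completions.create(
    model="custom_model",  # a single-model llama_cpp.server ignores this field
    messages=[{"role": "user", "content": "Say hello."}],
)
print(resp.choices[0].message.content)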
