changelog : llama-server REST API #9291

Open
ggerganov opened this issue Sep 3, 2024 · 12 comments
Labels
documentation Improvements or additions to documentation

Comments

@ggerganov
Owner

ggerganov commented Sep 3, 2024

Overview

This is a list of changes to the public HTTP interface of the llama-server example. Collaborators are encouraged to edit this post in order to reflect important changes to the API that end up merged into the master branch.

If you are building a 3rd party project that relies on llama-server, it is recommended to follow this issue and check it carefully before upgrading to new versions.

See also:

Recent API changes (most recent at the top)

version  PR      description
TBD      #10974  /v1/completions is now OAI-compat
TBD      #10783  logprobs is now OAI-compat, default to pre-sampling probs
TBD      #10861  /embeddings supports pooling type none
TBD      #10853  Add optional "tokens" output to /completions endpoint
b4337    #10803  Remove penalize_nl
b4265    #10626  CPU docker images working directory changed to /app
b4285    #10691  (Again) Change /slots and /props responses
b4283    #10704  Change /slots and /props responses
b4027    #10162  /slots endpoint: remove slot[i].state, add slot[i].is_processing
b3912    #9865   Add option to time limit the generation phase
b3911    #9860   Remove self-extend support
b3910    #9857   Remove legacy system prompt support
b3897    #9776   Change default security settings: /slots is now disabled by default; endpoints now check for API key if it's set
b3887    #9510   Add /rerank endpoint
b3754    #9459   Add [DONE]\n\n in OAI stream response to match spec
b3721    #9398   Add seed_cur to completion response
b3683    #9308   Environment variables updated
b3599    #9056   Change /health and /slots

For older changes, use:

git log --oneline -p b3599 -- examples/server/README.md

Upcoming API changes

  • TBD
@ggerganov ggerganov added the documentation Improvements or additions to documentation label Sep 3, 2024
@ggerganov ggerganov pinned this issue Sep 3, 2024
@ngxson
Collaborator

ngxson commented Sep 7, 2024

Not a REST API breaking change, but server-related: some environment variables were changed in #9308.

@slaren
Collaborator

slaren commented Sep 13, 2024

After #9398, in the completion response seed contains the seed requested by the user, while seed_cur contains the seed used to generate the completion. The values can be different if seed is LLAMA_DEFAULT_SEED (or -1), in which case a random seed is generated and returned in seed_cur.
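A client that wants to reproduce a generation should therefore record seed_cur rather than seed. A minimal sketch, assuming the completion response has already been parsed into a Python dict carrying the fields described above; the helper name effective_seed is hypothetical, and the fallback to seed for servers without seed_cur is an assumption:

```python
def effective_seed(response: dict) -> int:
    """Return the seed that was actually used for a completion.

    After #9398, `seed` echoes the seed the user requested, while `seed_cur`
    holds the seed used for generation. They differ when a random seed was
    requested. Falling back to `seed` when `seed_cur` is absent is an
    assumption for older or newer servers that do not report it.
    """
    if "seed_cur" in response:
        return response["seed_cur"]
    return response["seed"]


# Example: a random seed was requested, so seed_cur carries the real one.
used = effective_seed({"seed": 4294967295, "seed_cur": 1234})
```

Passing the returned value as seed in a later request is the natural way to replay the same sampling.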

@ngxson
Collaborator

ngxson commented Oct 8, 2024

Breaking change #9776: better security control for public deployments

  • The /slots endpoint is now disabled by default; start the server with --slots to enable it.
  • If an API key is set, all endpoints (including /slots and /props) require a correct API key.
    Note: only /health and /models are always publicly accessible.
  • Setting "system_prompt" has been removed from the /completions endpoint. It has moved to POST /props (see documentation).

Please note that GET /props is always enabled to avoid breaking the web UI.
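For client code, this mostly means attaching the key to every protected request. A minimal sketch using only the Python standard library, assuming the server reads the key from an "Authorization: Bearer <key>" header (the form OpenAI-style clients send); the host, port, key value and the helper build_request are hypothetical:

```python
import urllib.request
from typing import Optional

BASE_URL = "http://localhost:8080"  # assumed default host and port
API_KEY = "my-secret-key"           # hypothetical; use the value given to --api-key


def build_request(path: str, api_key: Optional[str] = None) -> urllib.request.Request:
    """Build a GET request, attaching the key as a Bearer token when provided."""
    req = urllib.request.Request(BASE_URL + path, method="GET")
    if api_key is not None:
        req.add_header("Authorization", f"Bearer {api_key}")
    return req


# /props (and /slots, if enabled with --slots) need the key once one is set.
props_req = build_request("/props", API_KEY)
# /health stays public, so no key is attached here.
health_req = build_request("/health")

# To send a request against a running server:
# with urllib.request.urlopen(props_req) as resp:
#     print(resp.read().decode("utf-8"))
```

Sending the key to a public endpoint as well should be harmless, so a client can also attach it unconditionally.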

@ngxson
Collaborator

ngxson commented Nov 4, 2024

Breaking change for /slots endpoint #10162

slot[i].state is removed and replaced by slot[i].is_processing

slot[i].is_processing === false means the slot is idle
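Client code that previously inspected slot[i].state can switch to the boolean. A minimal sketch, assuming the GET /slots response has been parsed into a list of dicts and that each slot object carries an id field alongside is_processing; the helper idle_slot_ids is hypothetical:

```python
def idle_slot_ids(slots: list) -> list:
    """Return the ids of idle slots from a parsed GET /slots response.

    Uses the `is_processing` flag introduced in #10162 (replacing `state`):
    a slot is idle exactly when `is_processing` is false.
    """
    return [slot["id"] for slot in slots if not slot["is_processing"]]


sample = [
    {"id": 0, "is_processing": True},
    {"id": 1, "is_processing": False},
    {"id": 2, "is_processing": False},
]
idle = idle_slot_ids(sample)
```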

@isaac-mcfadyen
Contributor

> Breaking change for /slots endpoint #10162
>
> slot[i].state is removed and replaced by slot[i].is_processing
>
> slot[i].is_processing === false means the slot is idle

Was the /slots endpoint also disabled by default, or was this just a documentation change?
https://github.com/ggerganov/llama.cpp/pull/10162/files#diff-42ce5869652f266b01a5b5bc95f4d945db304ce54545e2d0c017886a7f1cee1aR698

@ngxson
Collaborator

ngxson commented Nov 5, 2024

For security reasons, /slots has been disabled by default since #9776, as mentioned in the breaking changes table. I just forgot to update the docs.

@ngxson
Collaborator

ngxson commented Nov 7, 2024

Not an API change, but worth knowing: the default web UI for llama-server changed in #10175.

If you want to use the old completion UI, please follow the instructions in the PR.

@ggerganov
Owner Author

cache_prompt: true is now used by default (#10501)
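Because prompt caching now applies unless a request says otherwise, a client that needs the old behaviour has to opt out per request. A minimal sketch of a native /completion request body, using the server's prompt, n_predict and cache_prompt fields; the helper name completion_body and its reuse_cache parameter are hypothetical:

```python
import json


def completion_body(prompt: str, n_predict: int = 16, reuse_cache: bool = True) -> bytes:
    """Serialize a native /completion request body.

    Since #10501 the server behaves as if cache_prompt were true when the
    field is omitted, so the field only needs to be sent to disable reuse.
    """
    body = {"prompt": prompt, "n_predict": n_predict}
    if not reuse_cache:
        body["cache_prompt"] = False
    return json.dumps(body).encode("utf-8")


default_body = completion_body("Hello")
no_cache_body = completion_body("Hello", reuse_cache=False)
```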

@ngxson
Collaborator

ngxson commented Dec 7, 2024

The /props and /slots endpoints have changed in #10691 and #10704; see server/README.md for details.

@ngxson
Collaborator

ngxson commented Dec 18, 2024

/embeddings will NOT be OAI-compat after #10861

To clarify, we will maintain OAI compatibility for all APIs under the /v1 prefix, including:

  • /v1/embeddings
  • /v1/chat/completions

NOTE: OAI support for /v1/completions will come in the near future
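Clients that expect the OpenAI embeddings format should therefore target /v1/embeddings. A minimal standard-library sketch, assuming the usual OpenAI request field input and response layout data[i].embedding with an index per item; the host, port, placeholder model name and both helper names are hypothetical:

```python
import json
import urllib.request

BASE_URL = "http://localhost:8080"  # assumed host and port


def oai_embeddings_request(texts: list) -> urllib.request.Request:
    """Build an OpenAI-style POST /v1/embeddings request.

    `input` follows the OpenAI embeddings format; the `model` value is a
    placeholder (assumption: a single-model server does not depend on it).
    """
    payload = {"input": texts, "model": "placeholder"}
    return urllib.request.Request(
        BASE_URL + "/v1/embeddings",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def embedding_vectors(response: dict) -> list:
    """Extract vectors from an OAI-format response, ordered by `index`."""
    items = sorted(response["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in items]
```

Dropping the /v1 prefix selects the non-OAI /embeddings route instead, whose response layout may differ.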

@ngxson
Collaborator

ngxson commented Dec 19, 2024

The behavior of n_probs has changed in #10783; we now provide the OAI-compatible logprobs option.
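In the OAI style, a client asks for logprobs and top_logprobs and reads per-token entries from the response. A minimal sketch, assuming the OpenAI chat completions layout choices[0].logprobs.content[i] with a natural-log logprob per token; the helper names are hypothetical, and the sample response is made up for illustration:

```python
import math


def chat_logprobs_request(messages: list, top_n: int = 3) -> dict:
    """OpenAI-style chat body asking for per-token logprobs."""
    return {"messages": messages, "logprobs": True, "top_logprobs": top_n}


def token_probabilities(response: dict) -> list:
    """Return (token, probability) pairs for the chosen tokens.

    Each entry in choices[0].logprobs.content holds a natural-log `logprob`;
    math.exp converts it back to a probability in [0, 1].
    """
    content = response["choices"][0]["logprobs"]["content"]
    return [(entry["token"], math.exp(entry["logprob"])) for entry in content]


sample = {
    "choices": [{
        "logprobs": {"content": [
            {"token": "Hi", "logprob": 0.0, "top_logprobs": []},
            {"token": "!", "logprob": math.log(0.25), "top_logprobs": []},
        ]}
    }]
}
pairs = token_probabilities(sample)
```

Converting with exp rather than reporting raw logprobs is only a presentation choice; comparisons between candidates work equally well in log space.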

@ngxson
Collaborator

ngxson commented Dec 31, 2024

Added OAI-compat support for /v1/completions here: #10974

If you use it with a downstream library, be sure to add the /v1 prefix. For example, with the Python openai library:

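from openai import OpenAI

# Any non-empty api_key works unless the server was started with --api-key.
# The /v1 suffix selects the OAI-compatible routes.
client = OpenAI(api_key="dummy", base_url="http://localhost:8080/v1")
res = client.completions.create(
    model="davinci-002",  # placeholder model name
    prompt="I believe the meaning of life is",
    max_tokens=8,
)
print(res.choices[0].text)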

To use the old non-OAI style, remove /v1 from the endpoint path.
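The two styles differ in path and in a few field names. A minimal standard-library sketch, assuming the OAI route takes the OpenAI field max_tokens and returns choices[0].text, while the native route takes n_predict and returns content; the host, port and helper names are hypothetical:

```python
import json
import urllib.request

BASE_URL = "http://localhost:8080"  # assumed host and port


def completion_request(prompt: str, n_tokens: int, oai: bool) -> urllib.request.Request:
    """Build a completion request in either the OAI or the native style.

    OAI style:    POST /v1/completions with `max_tokens` (OpenAI field name).
    Native style: POST /completions with `n_predict` (llama-server field name).
    """
    if oai:
        path, body = "/v1/completions", {"prompt": prompt, "max_tokens": n_tokens}
    else:
        path, body = "/completions", {"prompt": prompt, "n_predict": n_tokens}
    return urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def completion_text(response: dict, oai: bool) -> str:
    """Read the generated text from a parsed response in either style."""
    return response["choices"][0]["text"] if oai else response["content"]
```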
