Add Jinja template support #11016

Status: Open. Wants to merge 21 commits into base: master.


@ochafik (Collaborator) commented Dec 30, 2024

Subset of #9639 with just the Jinja templating support.

Proper tool support (grammar constraints, lazy grammar triggering, tool call parsing & stop reason) will come in a follow-up PR.

  • Copies minja.hpp & chat-template.hpp from google/minja (created for this 😅) at this commit
  • Adds --jinja flag to llama-server, llama-cli, llama-run
  • Adds --chat-template-file flag to llama-server, llama-cli (related: #11215, "Added chat template support to llama-run")
  • Loads tokenizer.chat_template, or tokenizer.chat_template.tool_use when that is defined and the request has tools.
  • Dual testing in test-chat-template.cpp of the legacy ad hoc templating & the jinja route. Wherever the expected outputs diverge, the jinja expectations should be more correct (note that templates are run w/ trim_blocks = true, lstrip_blocks = true).
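
The template selection rule from the list above can be sketched as follows. This is only an illustration in Python, not the code in this PR: the function name `select_chat_template` and the plain-dict `metadata` argument are hypothetical, and only the two metadata keys come from the description.

```python
# Illustrative sketch only: select_chat_template and the dict-based metadata
# are hypothetical; the two key names are the ones named in the PR description.
DEFAULT_KEY = "tokenizer.chat_template"
TOOL_USE_KEY = "tokenizer.chat_template.tool_use"


def select_chat_template(metadata: dict, request_has_tools: bool) -> str:
    """Return the template source to render for this request.

    The tool_use variant is used only when the request carries tools AND
    the model actually defines that variant; otherwise fall back to the
    default template.
    """
    if request_has_tools and metadata.get(TOOL_USE_KEY):
        return metadata[TOOL_USE_KEY]
    return metadata[DEFAULT_KEY]


# Made-up template strings, just to show which one gets picked.
demo_metadata = {
    DEFAULT_KEY: "DEFAULT TEMPLATE",
    TOOL_USE_KEY: "TOOL USE TEMPLATE",
}
print(select_chat_template(demo_metadata, request_has_tools=True))
print(select_chat_template(demo_metadata, request_has_tools=False))
```

A request without tools always gets the default template, even when the model ships a tool_use variant.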

Example usage:

# Launch in background
./build/bin/llama-server \
  -hfr bartowski/Qwen2.5-7B-Instruct-GGUF \
  -hff Qwen2.5-7B-Instruct-Q4_K_M.gguf \
  --jinja &

curl http://localhost:8080/v1/chat/completions \
  -d '{
    "model": "gpt-3.5-turbo",
    "tools": [
      {
        "type": "function",
        "function": {
          "name": "ipython",
          "description": "Runs code in an ipython interpreter and returns the result of the execution after 60 seconds.",
          "parameters": {
            "type": "object",
            "properties": {
              "code": {
                "type": "string",
                "description": "The code to run in the ipython interpreter."
              }
            },
            "required": ["code"]
          }
        }
      }
    ],
    "messages": [
      {
        "role": "user",
        "content": "Print a hello world message with python (using single quotes '"'"' for strings)."
      }
    ]
  }'
Output:
{
  "choices": [
    {
      "finish_reason": "stop",
      "index": 0,
      "message": {
        "content": "<tool_call>\n{\"name\": \"ipython\", \"arguments\": {\"code\": \"print('Hello world!')\"}}\n</tool_call>",
        "role": "assistant"
      }
    }
  ],
  "created": 1736811609,
  "model": "gpt-3.5-turbo",
  "system_fingerprint": "b4494-a57bb94e",
  "object": "chat.completion",
  "usage": {
    "completion_tokens": 25,
    "prompt_tokens": 205,
    "total_tokens": 230
  },
  "id": "chatcmpl-5YJXFVhvjoMDlLx1asuWNdSO3JVWWsUF",
  "timings": {
    "prompt_n": 1,
    "prompt_ms": 155.151,
    "prompt_per_token_ms": 155.151,
    "prompt_per_second": 6.445333900522716,
    "predicted_n": 25,
    "predicted_ms": 419.714,
    "predicted_per_token_ms": 16.78856,
    "predicted_per_second": 59.56437002339688
  }
}
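
Since tool call parsing is left to a follow-up PR, the call in the response above arrives as plain text inside message.content. Until then, a client could extract it itself. The sketch below is a client-side workaround in Python, assuming the `<tool_call>…</tool_call>` wrapping seen in this Qwen2.5 output (other templates may use a different format); the helper name `extract_tool_calls` is hypothetical.

```python
import json
import re

# Assumption: the model wraps each call in <tool_call>...</tool_call>, as in
# the Qwen2.5 output above. Other chat templates may use other markers.
TOOL_CALL_RE = re.compile(r"<tool_call>\s*(.*?)\s*</tool_call>", re.DOTALL)


def extract_tool_calls(content: str) -> list:
    """Return the JSON payload of every well-formed tool call in content."""
    calls = []
    for match in TOOL_CALL_RE.finditer(content):
        try:
            calls.append(json.loads(match.group(1)))
        except json.JSONDecodeError:
            # Skip payloads that are not valid JSON instead of failing.
            continue
    return calls


# The message content from the response above, as a Python string.
content = (
    '<tool_call>\n{"name": "ipython", "arguments": '
    '{"code": "print(\'Hello world!\')"}}\n</tool_call>'
)
calls = extract_tool_calls(content)
print(calls[0]["name"])
print(calls[0]["arguments"]["code"])
```

Malformed payloads are skipped rather than raised, so a caller can treat an empty list as "no usable tool call" and fall back to showing the content as ordinary text.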

TODO:

  • Add cross-testing in test-chat-template.cpp (note that minja is tested against a lot of templates in its own repo)
  • Add some instructions here
  • Add more server tests to exercise the template overrides.

@github-actions bot added the script, examples, python, and server labels Dec 30, 2024
@ericcurtin (Collaborator)

Feel free to add the option to llama-run for basic testing also @ochafik

@github-actions bot added the testing label Jan 13, 2025
@ochafik ochafik marked this pull request as ready for review January 13, 2025 23:50
@ochafik ochafik requested a review from ngxson as a code owner January 13, 2025 23:50
@ericcurtin (Collaborator) left a comment

I approve the llama-run parts at least, but the more code we can share with llama-server etc., the better; there's probably room for more de-duplication.

@ngxson (Collaborator) commented Jan 14, 2025

IMO we can extend common_chat_apply_template with a bool jinja parameter and reuse this function in other examples.
