Add Jinja template support #11016

ochafik · 2024-12-30T03:48:15Z

Subset of #9639 with just the Jinja templating support.

Proper tool support (grammar constraints, lazy grammar triggering, tool call parsing and stop reason) will come in a follow-up PR.

Copies minja.hpp & chat-template.hpp from google/minja (created for this 😅) at this commit
Adds --jinja flag to llama-server, llama-cli, llama-run
Adds --chat-template-file flag to llama-server, llama-cli (related: Added chat template support to llama-run #11215 )
Loads tokenizer.chat_template, or tokenizer.chat_template.tool_use when that variant is defined and the request includes tools.
Dual testing in test-chat-template.cpp of both the legacy ad hoc templating and the Jinja route. Wherever the expected outputs diverge, the Jinja expectations should be the more correct ones (note that templates are run with trim_blocks = true, lstrip_blocks = true).
- Sent Refactor test-chat-template.cpp #11224 separately
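The two template-side rules in the list above (which GGUF template gets picked, and how trim_blocks / lstrip_blocks reshape whitespace) can be sketched in Python. This is a hypothetical, stdlib-only illustration, not minja's or llama.cpp's code: `select_chat_template` and `apply_block_whitespace_rules` are made-up names, the model metadata is assumed to be a plain dict keyed by the GGUF key names, and the regex pass only approximates Jinja2's documented whitespace options as a source rewrite (it ignores comments, `{% raw %}` and `{%+` / `+%}` overrides).

```python
import re
from typing import Mapping, Optional

# Metadata keys named in the PR description.
CHAT_TEMPLATE_KEY = "tokenizer.chat_template"
TOOL_USE_TEMPLATE_KEY = "tokenizer.chat_template.tool_use"


def select_chat_template(metadata: Mapping[str, str], has_tools: bool) -> Optional[str]:
    """Hypothetical helper: pick the template source for one request.

    The tool_use variant is used only when it is defined AND the request
    carries tools; otherwise fall back to the default template (or None).
    """
    tool_use = metadata.get(TOOL_USE_TEMPLATE_KEY)
    if has_tools and tool_use:
        return tool_use
    return metadata.get(CHAT_TEMPLATE_KEY)


def apply_block_whitespace_rules(src: str, trim_blocks: bool = True,
                                 lstrip_blocks: bool = True) -> str:
    """Hypothetical, simplified approximation of Jinja2's whitespace options."""
    if lstrip_blocks:
        # Remove spaces/tabs between the start of a line and a block tag.
        src = re.sub(r"^[ \t]+(?=\{%)", "", src, flags=re.MULTILINE)
    if trim_blocks:
        # Remove the first newline directly after a block tag (not after {{ }}).
        src = re.sub(r"%\}\n", "%}", src)
    return src


if __name__ == "__main__":
    meta = {CHAT_TEMPLATE_KEY: "default", TOOL_USE_TEMPLATE_KEY: "tools"}
    print(select_chat_template(meta, has_tools=True))   # tools
    print(select_chat_template(meta, has_tools=False))  # default
    src = "{% for m in messages %}\n  {% if m %}\n{{ m }}\n  {% endif %}\n{% endfor %}\n"
    # '{% for m in messages %}{% if m %}{{ m }}\n{% endif %}{% endfor %}'
    print(repr(apply_block_whitespace_rules(src)))
```

Dropping the indentation before block tags and the newline after them is one plausible source of the divergences between the legacy and Jinja expectations: the message text can be identical while the surrounding whitespace differs.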

Example usage:

# Launch in background
./build/bin/llama-server \
  -hfr bartowski/Qwen2.5-7B-Instruct-GGUF \
  -hff Qwen2.5-7B-Instruct-Q4_K_M.gguf \
  --jinja &

curl http://localhost:8080/v1/chat/completions \
  -d '{
    "model": "gpt-3.5-turbo",
    "tools": [
      {
        "type": "function",
        "function": {
          "name": "ipython",
          "description": "Runs code in an ipython interpreter and returns the result of the execution after 60 seconds.",
          "parameters": {
            "type": "object",
            "properties": {
              "code": {
                "type": "string",
                "description": "The code to run in the ipython interpreter."
              }
            },
            "required": ["code"]
          }
        }
      }
    ],
    "messages": [
      {
        "role": "user",
        "content": "Print a hello world message with python (using single quotes '"'"' for strings)."
      }
    ]
  }'

Output:

{
  "choices": [
    {
      "finish_reason": "stop",
      "index": 0,
      "message": {
        "content": "<tool_call>\n{\"name\": \"ipython\", \"arguments\": {\"code\": \"print('Hello world!')\"}}\n</tool_call>",
        "role": "assistant"
      }
    }
  ],
  "created": 1736811609,
  "model": "gpt-3.5-turbo",
  "system_fingerprint": "b4494-a57bb94e",
  "object": "chat.completion",
  "usage": {
    "completion_tokens": 25,
    "prompt_tokens": 205,
    "total_tokens": 230
  },
  "id": "chatcmpl-5YJXFVhvjoMDlLx1asuWNdSO3JVWWsUF",
  "timings": {
    "prompt_n": 1,
    "prompt_ms": 155.151,
    "prompt_per_token_ms": 155.151,
    "prompt_per_second": 6.445333900522716,
    "predicted_n": 25,
    "predicted_ms": 419.714,
    "predicted_per_token_ms": 16.78856,
    "predicted_per_second": 59.56437002339688
  }
}
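Because tool call parsing is deferred to a follow-up PR, the call in the output above comes back as raw text in `message.content` with `finish_reason: "stop"`. In the meantime a client could extract it itself. The sketch below is a hypothetical, stdlib-only helper (`extract_tool_calls` is an illustrative name, not part of llama.cpp); it assumes the `<tool_call>...</tool_call>` wrapping with one JSON object per block, as in this Qwen2.5 example, and other templates emit other formats.

```python
import json
import re
from typing import Any, Dict, List

# Assumes the <tool_call>{json}</tool_call> wrapping seen in the sample output.
TOOL_CALL_RE = re.compile(r"<tool_call>\s*(.*?)\s*</tool_call>", re.DOTALL)


def extract_tool_calls(content: str) -> List[Dict[str, Any]]:
    """Hypothetical client-side helper: return every well-formed tool call."""
    calls = []
    for body in TOOL_CALL_RE.findall(content):
        try:
            call = json.loads(body)
        except json.JSONDecodeError:
            continue  # skip malformed blocks instead of failing the whole reply
        if isinstance(call, dict) and "name" in call:
            calls.append(call)
    return calls


if __name__ == "__main__":
    content = ('<tool_call>\n{"name": "ipython", "arguments": '
               '{"code": "print(\'Hello world!\')"}}\n</tool_call>')
    for call in extract_tool_calls(content):
        print(call["name"], call["arguments"]["code"])  # ipython print('Hello world!')
```

Skipping malformed blocks rather than raising is a choice made for this sketch; a stricter client might surface the error to the caller instead.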

TODO:

Add cross-testing in test-chat-template.cpp (note that minja is tested against a lot of templates in its own repo)
Add some instructions here
Add more server tests to exercise the template overrides.

ericcurtin · 2025-01-13T17:23:33Z

Feel free to add the option to llama-run for basic testing also @ochafik

ericcurtin

I approve the llama-run parts at least, but the more code we can share with llama-server etc., the better; there's probably room for more de-duplication.

ngxson · 2025-01-14T12:51:34Z

IMO we can extend common_chat_apply_template with a bool jinja parameter and reuse this function in other examples.

github-actions bot added the script, examples, python and server labels Dec 30, 2024

ochafik added 2 commits December 30, 2024 03:50

Copy minja from google/minja@58f0ca6

abd274a

Add --jinja and --chat-template-file flags

e5113e8

ochafik force-pushed the jinja branch from 4ec6151 to e5113e8 December 30, 2024 03:50

ochafik added 4 commits December 30, 2024 04:10

Add missing <optional> include

80138d9

Avoid print in get_hf_chat_template.py

06b5159

No designated initializers yet

ce48584

Try and work around msvc++ non-macro max resolution quirk

389d79b

ochafik force-pushed the jinja branch from c3b07a8 to 389d79b December 30, 2024 04:50

Update test_chat_completion.py

238b968

ochafik mentioned this pull request Dec 30, 2024

Tool call support (Llama 3.x, Functionary v3, Hermes 2 Pro, Mistral Nemo, generic) w/ lazy grammars & minimalist Jinja engine #9639 (Draft, 34 tasks)

slaren mentioned this pull request Dec 31, 2024

llama : add support for Cohere2ForCausalLM #10900 (Merged)

ngxson mentioned this pull request Jan 13, 2025

Added chat template support to llama-run #11215 (Draft)

ochafik added 4 commits January 13, 2025 19:56

Merge remote-tracking branch 'origin/master' into jinja

cb72cf1

Wire LLM_KV_TOKENIZER_CHAT_TEMPLATE_N in llama_model_chat_template

78861a3

Refactor test-chat-template

1aac99a

Test templates w/ minja

7c84ebc

github-actions bot added the testing label Jan 13, 2025

ochafik added 8 commits January 13, 2025 21:30

Fix deprecation

18f257b

Add --jinja to llama-run

8dd4f33

Merge remote-tracking branch 'origin/master' into jinja

c04c50e

Update common_chat_format_example to use minja template wrapper

a6afb27

Test chat_template in e2e test

b4083e4

Update utils.py

b7e2171

Update test_chat_completion.py

a57bb94

Update run.cpp

4daae0b

ochafik marked this pull request as ready for review January 13, 2025 23:50

ochafik requested a review from ngxson as a code owner January 13, 2025 23:50

Update arg.cpp

1b3bb7e

ochafik mentioned this pull request Jan 14, 2025

Refactor test-chat-template.cpp #11224 (Merged)

Merge remote-tracking branch 'origin/master' into jinja

3ed670b

ericcurtin approved these changes Jan 14, 2025
