Update llama.cpp docs #1326

Merged 4 commits on Jul 5, 2024
42 changes: 35 additions & 7 deletions in docs/hub/gguf-llamacpp.md
# GGUF usage with llama.cpp

Llama.cpp allows you to download and run inference on a GGUF model simply by providing the Hugging Face repo path and the file name. llama.cpp downloads the model checkpoint and automatically caches it. The location of the cache is defined by the `LLAMA_CACHE` environment variable; read more about it [here](https://github.com/ggerganov/llama.cpp/pull/7826):

```bash
./llama-cli \
  --hf-repo lmstudio-community/Meta-Llama-3-8B-Instruct-GGUF \
  --hf-file Meta-Llama-3-8B-Instruct-Q8_0.gguf \
  -p "You are a helpful assistant" -cnv
```
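
If you want the downloaded checkpoint to live somewhere other than the default cache, you can point `LLAMA_CACHE` at a custom directory before running the command above (a minimal sketch; the path is only an example):

```bash
# Store downloaded GGUF files in a custom cache directory (path is illustrative)
export LLAMA_CACHE=~/models/llama-cache
mkdir -p "$LLAMA_CACHE"
```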

Note: You can remove `-cnv` to run a one-shot text completion instead of an interactive chat.
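
For instance, dropping `-cnv` and supplying a plain prompt runs a single completion (the prompt and the `-n 128` token limit below are only illustrative):

```bash
./llama-cli \
  --hf-repo lmstudio-community/Meta-Llama-3-8B-Instruct-GGUF \
  --hf-file Meta-Llama-3-8B-Instruct-Q8_0.gguf \
  -p "I believe the meaning of life is" -n 128
```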

Additionally, you can invoke an OpenAI-spec chat completions endpoint directly using the llama.cpp server:

```bash
./llama-server \
  --hf-repo lmstudio-community/Meta-Llama-3-8B-Instruct-GGUF \
  --hf-file Meta-Llama-3-8B-Instruct-Q8_0.gguf
```

After starting the server, you can call the endpoint as shown below:

```bash
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer no-key" \
  -d '{
    "messages": [
      {
        "role": "system",
        "content": "You are an AI assistant. Your top priority is achieving user fulfilment via helping them with their requests."
      },
      {
        "role": "user",
        "content": "Write a limerick about Python exceptions"
      }
    ]
  }'
```
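
The server returns a standard OpenAI-style chat completion object. As a quick sketch (assuming `jq` is installed; this is not part of the original docs), you can print only the assistant's reply:

```bash
# Extract just the assistant message from the JSON response
curl -s http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer no-key" \
  -d '{"messages": [{"role": "user", "content": "Write a limerick about Python exceptions"}]}' \
  | jq -r '.choices[0].message.content'
```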

Replace `--hf-repo` with any valid Hugging Face hub repo name and `--hf-file` with the GGUF file name in the hub repo - off you go! 🦙

Note: Remember to build llama.cpp with `LLAMA_CURL=1` :)
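
As a rough sketch of what that build step might look like (assuming the Makefile build; the CMake equivalent is the `-DLLAMA_CURL=ON` option):

```bash
# Build the CLI and server with libcurl support so --hf-repo downloads work
make LLAMA_CURL=1 llama-cli llama-server

# Or, with CMake:
# cmake -B build -DLLAMA_CURL=ON
# cmake --build build --config Release
```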