add ollama docs #1447

Merged
merged 3 commits on Oct 16, 2024
2 changes: 2 additions & 0 deletions docs/hub/_toctree.yml
@@ -144,6 +144,8 @@
title: GGUF usage with llama.cpp
- local: gguf-gpt4all
title: GGUF usage with GPT4All
- local: ollama
title: Use Ollama with any GGUF Model
Comment on lines +147 to +148 (Member):
no big deal but could have been nested under GGUF like llama.cpp

- title: Datasets
local: datasets
isExpanded: true
72 changes: 72 additions & 0 deletions docs/hub/ollama.md
@@ -0,0 +1,72 @@
# Use Ollama with any GGUF Model on Hugging Face Hub

![cover](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/ollama/cover.png)

Ollama is an application based on llama.cpp that lets you interact with LLMs directly on your computer. You can use any GGUF quants created by the community ([bartowski](https://huggingface.co/bartowski), [MaziyarPanahi](https://huggingface.co/MaziyarPanahi) and many more) on Hugging Face directly with Ollama, without creating a new `Modelfile`. At the time of writing, there are 45K public GGUF checkpoints on the Hub; you can run any of them with a single `ollama run` command. We also provide customisations like choosing the quantization type, system prompt, and more to improve your overall experience.

Getting started is as simple as:

```sh
ollama run hf.co/{username}/{repository}
```

Note that you can use either `hf.co` or `huggingface.co` as the domain name.
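
For example, both of the following commands pull the same model:

```sh
ollama run hf.co/bartowski/Llama-3.2-1B-Instruct-GGUF
ollama run huggingface.co/bartowski/Llama-3.2-1B-Instruct-GGUF
```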

Here are some other models that you can try:

```sh
ollama run hf.co/bartowski/Llama-3.2-1B-Instruct-GGUF
ollama run hf.co/mlabonne/Meta-Llama-3.1-8B-Instruct-abliterated-GGUF
ollama run hf.co/arcee-ai/SuperNova-Medius-GGUF
ollama run hf.co/bartowski/Humanish-LLama3-8B-Instruct-GGUF
```

## Custom Quantization

By default, the `Q4_K_M` quantization scheme is used. To select a different scheme, simply add a tag:

```sh
ollama run hf.co/{username}/{repository}:{quantization}
```

![guide](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/ollama/guide.png)

For example:

```sh
ollama run hf.co/bartowski/Llama-3.2-3B-Instruct-GGUF:IQ3_M
ollama run hf.co/bartowski/Llama-3.2-3B-Instruct-GGUF:Q8_0

# the quantization name is case-insensitive; this will also work
ollama run hf.co/bartowski/Llama-3.2-3B-Instruct-GGUF:iq3_m

# you can also select a specific file
ollama run hf.co/bartowski/Llama-3.2-3B-Instruct-GGUF:Llama-3.2-3B-Instruct-IQ3_M.gguf
```

## Custom Chat Template and Parameters

By default, a chat template is selected automatically from a list of commonly used templates, based on the built-in `tokenizer.chat_template` metadata stored inside the GGUF file.

If your GGUF file doesn't have a built-in template or uses a custom chat template, you can create a new file called `template` in the repository. The template must be a Go template, not a Jinja template. Here's an example:

```
{{ if .System }}<|system|>
{{ .System }}<|end|>
{{ end }}{{ if .Prompt }}<|user|>
{{ .Prompt }}<|end|>
{{ end }}<|assistant|>
{{ .Response }}<|end|>
```

To learn more about the Go template format, refer to [this documentation](https://github.com/ollama/ollama/blob/main/docs/template.md).

You can optionally configure a system prompt by putting it into a new file named `system` in the repository.
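
For example, a `system` file could contain a single plain-text prompt (the wording below is just an illustration, not a prescribed format):

```
You are a helpful assistant. Answer as concisely as possible.
```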

To change sampling parameters, create a file named `params` in the repository. The file must be in JSON format. For the list of all available parameters, please refer to [this documentation](https://github.com/ollama/ollama/blob/main/docs/modelfile.md#parameter).
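
For example, a minimal `params` file setting the sampling temperature and context window might look like this (the values are illustrative; the full list of supported keys is in the documentation linked above):

```json
{
  "temperature": 0.6,
  "num_ctx": 4096
}
```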


## References

- https://github.com/ollama/ollama/blob/main/docs/README.md
- https://huggingface.co/docs/hub/en/gguf