New api docs structure #1379

Merged
merged 42 commits into from
Sep 12, 2024
42 commits
88b8af1
Add draft of docs structure
osanseviero Aug 19, 2024
f558bdd
Add index page
osanseviero Aug 20, 2024
8b6230f
Prepare overview and rate limits
osanseviero Aug 21, 2024
6380dfe
Manage redirects
osanseviero Aug 21, 2024
9df929a
Clean up
osanseviero Aug 21, 2024
60ad476
Apply suggestions from code review
osanseviero Aug 21, 2024
a93f0dc
Apply suggestions from review
osanseviero Aug 21, 2024
4069586
Merge branch 'new_api_docs' of github.com:huggingface/hub-docs into n…
osanseviero Aug 21, 2024
f2610b7
Add additional headers
osanseviero Aug 23, 2024
c0bee69
Apply suggestions from code review
osanseviero Aug 26, 2024
6294514
Incorporate reviewer's feedback
osanseviero Aug 26, 2024
12ba289
First draft for text-to-image, image-to-image + generate script (#1384)
Wauplin Aug 27, 2024
eb6171e
Merge branches 'main' and 'new_api_docs' of github.com:huggingface/hu…
osanseviero Aug 27, 2024
9b1e735
Add getting started
osanseviero Aug 27, 2024
fb57a2d
Add draft of docs structure
osanseviero Aug 19, 2024
bad42b0
Add index page
osanseviero Aug 20, 2024
d656272
Prepare overview and rate limits
osanseviero Aug 21, 2024
01983fc
Manage redirects
osanseviero Aug 21, 2024
dfdc02d
Clean up
osanseviero Aug 21, 2024
abe2d4f
Apply suggestions from review
osanseviero Aug 21, 2024
042a0e4
Apply suggestions from code review
osanseviero Aug 21, 2024
d774816
Add additional headers
osanseviero Aug 23, 2024
a097022
Apply suggestions from code review
osanseviero Aug 26, 2024
9bf223e
Incorporate reviewer's feedback
osanseviero Aug 26, 2024
51750bf
First draft for text-to-image, image-to-image + generate script (#1384)
Wauplin Aug 27, 2024
0c34106
Add getting started
osanseviero Aug 27, 2024
cc9b363
Merge branch 'new_api_docs' of github.com:huggingface/hub-docs into n…
Wauplin Aug 27, 2024
ac640c8
Update docs/api-inference/getting_started.md
osanseviero Aug 28, 2024
b785d8b
Draft to add text-generation parameters (#1393)
Wauplin Aug 28, 2024
22c6bae
Filter out frozen models from API docs for tasks (#1396)
Wauplin Aug 29, 2024
4039c7e
New api docs suggestions (#1397)
Wauplin Aug 29, 2024
49e8f67
Add comment header on each task page (#1400)
Wauplin Aug 30, 2024
20c17d0
Add even more tasks: token classification, translation and zero shot …
Wauplin Aug 30, 2024
528ea95
regenerate
Wauplin Aug 30, 2024
f267d86
pull from main
Wauplin Aug 30, 2024
ed5e37b
coding style
Wauplin Sep 4, 2024
2e1e64d
Update _redirects.yml
osanseviero Sep 4, 2024
bf973e0
Rename all tasks '_' to '-' (#1405)
Wauplin Sep 4, 2024
2b6f051
Update docs/api-inference/index.md
Wauplin Sep 5, 2024
92baadc
Apply feedback for "new_api_docs" (#1408)
Wauplin Sep 5, 2024
e9eff75
Fixes new docs (#1413)
osanseviero Sep 12, 2024
c65a120
Merge branch 'main' into new_api_docs
Wauplin Sep 12, 2024
5 changes: 5 additions & 0 deletions docs/api-inference/_redirects.yml
@@ -0,0 +1,5 @@
quicktour: index
detailed_parameters: parameters
parallelism: getting_started
usage: getting_started
faq: index
52 changes: 52 additions & 0 deletions docs/api-inference/_toctree.yml
@@ -0,0 +1,52 @@
- sections:
  - local: index
    title: Serverless Inference API
  - local: getting-started
    title: Getting Started
  - local: supported-models
    title: Supported Models
  - local: rate-limits
    title: Rate Limits
  title: Getting Started
- sections:
  - local: parameters
    title: Parameters
  - sections:
    - local: tasks/audio-classification
      title: Audio Classification
    - local: tasks/automatic-speech-recognition
      title: Automatic Speech Recognition
    - local: tasks/chat-completion
      title: Chat Completion
    - local: tasks/feature-extraction
      title: Feature Extraction
    - local: tasks/fill-mask
      title: Fill Mask
    - local: tasks/image-classification
      title: Image Classification
    - local: tasks/image-segmentation
      title: Image Segmentation
    - local: tasks/image-to-image
      title: Image to Image
    - local: tasks/object-detection
      title: Object Detection
    - local: tasks/question-answering
      title: Question Answering
    - local: tasks/summarization
      title: Summarization
    - local: tasks/table-question-answering
      title: Table Question Answering
    - local: tasks/text-classification
      title: Text Classification
    - local: tasks/text-generation
      title: Text Generation
    - local: tasks/text-to-image
      title: Text to Image
    - local: tasks/token-classification
      title: Token Classification
    - local: tasks/translation
      title: Translation
    - local: tasks/zero-shot-classification
      title: Zero Shot Classification
    title: Detailed Task Parameters
  title: API Reference
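Redirect targets only help if they point at pages that exist. The sketch below is a hypothetical consistency check (not part of this PR or of the doc-builder tooling), with values copied by hand from `_redirects.yml` and `_toctree.yml` above, and it assumes a redirect target must match a toctree `local` path exactly. Under that assumption it flags `parallelism` and `usage`, whose target `getting_started` uses an underscore while the toctree entry is `getting-started`.

```python
# Hypothetical check, not part of this PR or of doc-builder.
# Values are copied by hand from _redirects.yml and _toctree.yml above.

REDIRECTS = {
    "quicktour": "index",
    "detailed_parameters": "parameters",
    "parallelism": "getting_started",
    "usage": "getting_started",
    "faq": "index",
}

TOCTREE_PAGES = {
    "index",
    "getting-started",
    "supported-models",
    "rate-limits",
    "parameters",
    # The tasks/... pages are omitted: no redirect targets them.
}


def find_dangling_redirects(redirects: dict[str, str], pages: set[str]) -> dict[str, str]:
    """Return the redirects whose target is not a page listed in the toctree."""
    return {old: new for old, new in redirects.items() if new not in pages}


if __name__ == "__main__":
    for old, new in find_dangling_redirects(REDIRECTS, TOCTREE_PAGES).items():
        print(f"{old} -> {new}: target not found in _toctree.yml")
```

In a real setup you would load both files with a YAML parser instead of copying them by hand.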
78 changes: 78 additions & 0 deletions docs/api-inference/getting-started.md
@@ -0,0 +1,78 @@
# Getting Started

The Serverless Inference API lets you easily run inference on a wide range of models and tasks. You can make requests with your favorite tools (Python, cURL, etc.). We also provide a Python SDK (`huggingface_hub`) to make it even easier.

We'll walk through a minimal example using a [sentiment classification model](https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest). For task-specific parameters and further documentation, see our [API Reference](./parameters.md).

## Getting a Token

Using the Serverless Inference API requires passing a user token in the request headers. You can get a token by signing up on the Hugging Face website and then going to the [tokens page](https://huggingface.co/settings/tokens). We recommend creating a `Fine-grained` token with the scope to `Make calls to the serverless Inference API`.

TODO: add screenshot
For more details about user tokens, check out [this guide](https://huggingface.co/docs/hub/en/security-tokens).
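To avoid pasting the token into source code, you can keep it in an environment variable. The sketch below assumes a variable named `HF_TOKEN` (the name `huggingface_hub` also reads); `auth_headers` is a hypothetical helper, not part of any library.

```python
import os


def auth_headers(env_var: str = "HF_TOKEN") -> dict[str, str]:
    """Build the Authorization header from a token stored in an environment variable."""
    token = os.environ.get(env_var)
    if not token:
        # Fail early instead of sending an unauthenticated request.
        raise RuntimeError(f"Set {env_var} to your Hugging Face user token")
    return {"Authorization": f"Bearer {token}"}
```

The returned dict can stand in for the hard-coded `headers` in the Python examples, e.g. `requests.post(API_URL, headers=auth_headers(), json=payload)`.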

## cURL

```bash
curl https://api-inference.huggingface.co/models/cardiffnlp/twitter-roberta-base-sentiment-latest \
-X POST \
-d '{"inputs": "Today is a nice day"}' \
-H "Authorization: Bearer hf_***" \
-H "Content-Type: application/json"
```

## Python

You can use the `requests` library to make a request to the Inference API.

```python
import requests

API_URL = "https://api-inference.huggingface.co/models/cardiffnlp/twitter-roberta-base-sentiment-latest"
headers = {"Authorization": "Bearer hf_***"}

payload = {"inputs": "Today is a nice day"}
response = requests.post(API_URL, headers=headers, json=payload)
print(response.json())
```
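The exact JSON returned depends on the task. For text classification, a minimal sketch, assuming the body is a list of `{"label", "score"}` objects (possibly wrapped in one outer list per input) and that errors come back as a JSON object with an `error` key, picks the most likely label. `top_label` is a hypothetical helper and the scores below are made up.

```python
def top_label(response_json) -> tuple[str, float]:
    """Return the (label, score) pair with the highest score.

    Assumes a text-classification response: a list of {"label", "score"} dicts,
    possibly wrapped in an outer list with one entry per input.
    """
    # Assumption: error responses are JSON objects with an "error" key.
    if isinstance(response_json, dict):
        raise RuntimeError(response_json.get("error", response_json))
    predictions = response_json
    # Unwrap the outer per-input list for a single input.
    if predictions and isinstance(predictions[0], list):
        predictions = predictions[0]
    best = max(predictions, key=lambda p: p["score"])
    return best["label"], best["score"]


# Illustrative payload with made-up scores, not real API output.
example = [[
    {"label": "positive", "score": 0.91},
    {"label": "neutral", "score": 0.07},
    {"label": "negative", "score": 0.02},
]]
print(top_label(example))  # ('positive', 0.91)
```

With the `requests` example, you would call `top_label(response.json())`.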

Hugging Face also provides an [`InferenceClient`](https://huggingface.co/docs/huggingface_hub/guides/inference) that handles inference, caching, async, and more. Make sure to install it first with `pip install huggingface_hub`:

```python
from huggingface_hub import InferenceClient

client = InferenceClient(model="cardiffnlp/twitter-roberta-base-sentiment-latest", token="hf_***")
print(client.text_classification("Today is a nice day"))
```

## JavaScript
> Review comment (Member): you're not using the tabs component to switch programming languages?


```js
// On Node.js 18+ the built-in global fetch can be used instead of node-fetch.
import fetch from "node-fetch";

async function query(data) {
  const response = await fetch(
    "https://api-inference.huggingface.co/models/cardiffnlp/twitter-roberta-base-sentiment-latest",
    {
      method: "POST",
      headers: {
        Authorization: `Bearer hf_***`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify(data),
    }
  );
  const result = await response.json();
  return result;
}

query({
  inputs: "Today is a nice day"
}).then((response) => {
  console.log(JSON.stringify(response, null, 2));
});
```

## Next Steps

Now that you know the basics, you can explore the [API Reference](./parameters.md) to learn more about task-specific settings and parameters.
60 changes: 60 additions & 0 deletions docs/api-inference/index.md
@@ -0,0 +1,60 @@
# Serverless Inference API

**Instant Access to 800,000+ ML Models for Fast Prototyping**

Explore the most popular models for text, image, speech, and more — all with a simple API request. Build, test, and experiment without worrying about infrastructure or setup.

---

## Why use the Inference API?

The Serverless Inference API offers a fast and free way to explore thousands of models for a variety of tasks. Whether you're prototyping a new application or experimenting with ML capabilities, this API gives you instant access to high-performing models across multiple domains:

* **Text Generation:** Generate and experiment with high-quality responses from large language models, including tool-calling prompts.
* **Image Generation:** Easily create customized images, including LoRAs for your own styles.
* **Document Embeddings:** Build search and retrieval systems with state-of-the-art embeddings.
* **Classical AI Tasks:** Ready-to-use models for text classification, image classification, speech recognition, and more.

TODO: add some flow chart image

⚡ **Fast and Free to Get Started**: The Inference API is free, with higher rate limits for PRO users. For production needs, explore [Inference Endpoints](https://ui.endpoints.huggingface.co/) for dedicated resources, autoscaling, advanced security features, and more.

---

## Key Benefits

- 🚀 **Instant Prototyping:** Access powerful models without setup.
- 🎯 **Diverse Use Cases:** One API for text, image, and beyond.
- 🔧 **Developer-Friendly:** Simple requests, fast responses.

---

## Main Features

* Leverage 800,000+ models from different open-source libraries (transformers, sentence transformers, adapter transformers, diffusers, timm, etc.).
* Use models for a variety of tasks, including text generation, image generation, document embeddings, NER, summarization, image classification, and more.
* Accelerate your prototyping by using GPU-powered models.
* Run very large models that are challenging to deploy in production.
* Production-grade platform without the hassle: built-in automatic scaling, load balancing and caching.

---

## Contents

The documentation is organized into two sections:

* **Getting Started:** Learn the basics of how to use the Inference API.
* **API Reference:** Dive into task-specific settings and parameters.

---

## Looking for custom support from the Hugging Face team?

<a target="_blank" href="https://huggingface.co/support">
<img alt="HuggingFace Expert Acceleration Program" src="https://cdn-media.huggingface.co/marketing/transformers/new-support-improved.png" style="max-width: 600px; border: 1px solid #eee; border-radius: 4px; box-shadow: 0 1px 2px 0 rgba(0, 0, 0, 0.05);">
</a><br>

## Hugging Face is trusted in production by over 10,000 companies

<img class="block dark:hidden !shadow-none !border-0 !rounded-none" src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/inference-api/companies-light.png" width="600">
<img class="hidden dark:block !shadow-none !border-0 !rounded-none" src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/inference-api/companies-dark.png" width="600">
154 changes: 154 additions & 0 deletions docs/api-inference/parameters.md
@@ -0,0 +1,154 @@
# Parameters

TODO: add a table with the following columns:
- Domain
- Task
- Whether it's supported in Inference API
- Supported libraries (not sure)
- Recommended model
- Link to model specific page



## Additional Options

### Caching

The Inference API has a cache layer that speeds up requests whose inputs are exactly the same as a previous request. Many models, such as classifiers and embedding models, are deterministic (the same input always produces the same output), so cached results can be reused as is. However, if you use a nondeterministic model, you can disable the cache so that each request triggers a real new query.

To do this, add `x-use-cache: false` to the request headers. For example:

<inferencesnippet>

<curl>
```diff
curl https://api-inference.huggingface.co/models/MODEL_ID \
    -X POST \
    -d '{"inputs": "Can you please let us know more details about your "}' \
    -H "Authorization: Bearer hf_***" \
    -H "Content-Type: application/json" \
+   -H "x-use-cache: false"
```
</curl>

<python>
```diff
import requests

API_URL = "https://api-inference.huggingface.co/models/MODEL_ID"
headers = {
    "Authorization": "Bearer hf_***",
    "Content-Type": "application/json",
+   "x-use-cache": "false"
}
data = {
    "inputs": "Can you please let us know more details about your "
}
response = requests.post(API_URL, headers=headers, json=data)
print(response.json())
```

</python>

<js>
```diff
import fetch from "node-fetch";

async function query(data) {
  const response = await fetch(
    "https://api-inference.huggingface.co/models/MODEL_ID",
    {
      method: "POST",
      headers: {
        Authorization: `Bearer hf_***`,
        "Content-Type": "application/json",
+       "x-use-cache": "false"
      },
      body: JSON.stringify(data),
    }
  );
  const result = await response.json();
  return result;
}

query({
  inputs: "Can you please let us know more details about your "
}).then((response) => {
  console.log(JSON.stringify(response, null, 2));
});
```

</js>

</inferencesnippet>
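To make the caching behavior concrete, here is a toy model of a cache keyed on the exact request inputs. It is an illustration under stated assumptions, not how the Inference API implements its cache: `ToyInferenceCache` is hypothetical, and the counter stands in for a nondeterministic model.

```python
import itertools
import json


class ToyInferenceCache:
    """Toy model of a response cache keyed on exact request inputs.

    Illustration only: this is not how the Inference API implements its cache.
    """

    def __init__(self, run_model):
        self.run_model = run_model  # callable: payload dict -> result
        self.store = {}
        self.model_calls = 0

    def query(self, model_id: str, payload: dict, headers: dict) -> object:
        # Identical inputs serialize to the same key; sort_keys ignores dict key order.
        key = (model_id, json.dumps(payload, sort_keys=True))
        use_cache = headers.get("x-use-cache", "true").lower() != "false"
        if use_cache and key in self.store:
            return self.store[key]  # cache hit: the model is not run again
        self.model_calls += 1
        result = self.run_model(payload)
        self.store[key] = result  # in this toy, every fresh result refreshes the cache
        return result


# A stand-in "nondeterministic model" that returns a new value on every call.
counter = itertools.count()
cache = ToyInferenceCache(lambda payload: next(counter))
payload = {"inputs": "Can you please let us know more details about your "}

first = cache.query("MODEL_ID", payload, {})
second = cache.query("MODEL_ID", payload, {})  # same inputs: served from the cache
fresh = cache.query("MODEL_ID", payload, {"x-use-cache": "false"})  # forces a real new query
print(first, second, fresh, cache.model_calls)  # 0 0 1 2
```

For a deterministic model the repeated answer is exactly what you want; for a nondeterministic one, the cached `second` hides the fact that a new query would have produced a different result.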

### Wait for the model

When a model is warm, it is ready to use and you will get a response relatively quickly. However, some models are cold and need to be loaded before they can be used; in that case, you will get a 503 error. Rather than sending repeated requests until the model is loaded, you can wait for it by adding `x-wait-for-model: true` to the request headers. We suggest only using this flag when you are sure that the model is cold: first try the request without the flag, and only if you get a 503 error, retry with it.


<inferencesnippet>

<curl>
```diff
curl https://api-inference.huggingface.co/models/MODEL_ID \
    -X POST \
    -d '{"inputs": "Can you please let us know more details about your "}' \
    -H "Authorization: Bearer hf_***" \
    -H "Content-Type: application/json" \
+   -H "x-wait-for-model: true"
```
</curl>

<python>
```diff
import requests

API_URL = "https://api-inference.huggingface.co/models/MODEL_ID"
headers = {
    "Authorization": "Bearer hf_***",
    "Content-Type": "application/json",
+   "x-wait-for-model": "true"
}
data = {
    "inputs": "Can you please let us know more details about your "
}
response = requests.post(API_URL, headers=headers, json=data)
print(response.json())
```

</python>

<js>
```diff
import fetch from "node-fetch";

async function query(data) {
  const response = await fetch(
    "https://api-inference.huggingface.co/models/MODEL_ID",
    {
      method: "POST",
      headers: {
        Authorization: `Bearer hf_***`,
        "Content-Type": "application/json",
+       "x-wait-for-model": "true"
      },
      body: JSON.stringify(data),
    }
  );
  const result = await response.json();
  return result;
}

query({
  inputs: "Can you please let us know more details about your "
}).then((response) => {
  console.log(JSON.stringify(response, null, 2));
});
```

</js>

</inferencesnippet>
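The recommended pattern (try once normally, and only on a 503 retry with `x-wait-for-model: true`) can be sketched with the standard library alone. `query_with_wait` and `urllib_send` are hypothetical helpers, and `MODEL_ID` is the same placeholder used in the snippets above; the `send` parameter lets you swap in another HTTP client or a fake for testing.

```python
import json
import urllib.error
import urllib.request

# Placeholder URL, as in the snippets above; replace MODEL_ID with a real model.
API_URL = "https://api-inference.huggingface.co/models/MODEL_ID"


def _decode(raw: bytes) -> object:
    """Decode a JSON body, falling back to text if it is not JSON."""
    try:
        return json.loads(raw)
    except ValueError:
        return raw.decode("utf-8", errors="replace")


def urllib_send(headers: dict, payload: dict) -> tuple[int, object]:
    """POST `payload` as JSON using only the standard library; return (status, body)."""
    request = urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", **headers},
        method="POST",
    )
    try:
        with urllib.request.urlopen(request) as response:
            return response.status, _decode(response.read())
    except urllib.error.HTTPError as error:
        # urlopen raises on 4xx/5xx; return the status so the caller can decide.
        return error.code, _decode(error.read())


def query_with_wait(payload: dict, headers: dict, send=urllib_send) -> object:
    """Send once without waiting; only on a 503 (cold model), retry with x-wait-for-model."""
    status, body = send(headers, payload)
    if status == 503:
        status, body = send({**headers, "x-wait-for-model": "true"}, payload)
    if status != 200:
        raise RuntimeError(f"Request failed with status {status}: {body}")
    return body
```

A call would look like `query_with_wait({"inputs": "..."}, {"Authorization": "Bearer hf_***"})`. Retrying only once, and only on 503, keeps warm models on the fast path.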
11 changes: 11 additions & 0 deletions docs/api-inference/rate-limits.md
@@ -0,0 +1,11 @@
# Rate Limits

The Inference API has rate limits based on the number of requests. These limits may change in the future to be compute-based or token-based.

The Serverless API is not meant for heavy production applications. If you need higher rate limits, consider [Inference Endpoints](https://huggingface.co/docs/inference/endpoints) for dedicated resources.

| User Tier                | Rate Limit             |
|--------------------------|------------------------|
| Unregistered Users       | 1 request per hour     |
| Signed-up Users          | 300 requests per hour  |
| PRO and Enterprise Users | 1000 requests per hour |

> Review comment (Member), on the "Unregistered Users" row: this is 0 no? for Unregistered Users
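At the quotas in the table, 300 requests per hour averages one request every 12 seconds, and 1000 per hour one every 3.6 seconds. If a script should stay within its tier, a client-side sliding-window limiter is one option. The sketch below is a hypothetical helper, not part of any Hugging Face library; the quota you pass in is an assumption that should track this page, since the limits may change.

```python
import collections
import time


class HourlyRateLimiter:
    """Client-side sliding-window limiter: at most `max_requests` per `window` seconds.

    Hypothetical helper; the server enforces its own limits independently.
    """

    def __init__(self, max_requests: int, window: float = 3600.0, clock=time.monotonic):
        self.max_requests = max_requests
        self.window = window
        self.clock = clock
        self.sent = collections.deque()  # timestamps of requests inside the window

    def seconds_until_allowed(self) -> float:
        """Return 0.0 if a request may be sent now, else how long to wait."""
        now = self.clock()
        # Drop timestamps that have left the window.
        while self.sent and now - self.sent[0] >= self.window:
            self.sent.popleft()
        if len(self.sent) < self.max_requests:
            return 0.0
        return self.window - (now - self.sent[0])

    def record(self) -> None:
        """Record that a request was just sent."""
        self.sent.append(self.clock())
```

For a signed-up account you might create `HourlyRateLimiter(300)`, then before each request `time.sleep(limiter.seconds_until_allowed())` followed by `limiter.record()`. Injecting `clock` keeps the class testable without real waiting.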
28 changes: 28 additions & 0 deletions docs/api-inference/supported-models.md
@@ -0,0 +1,28 @@
# Supported Models

Given the fast-paced nature of the open ML ecosystem, the Inference API exposes models that attract strong community interest and are in active use (based on recent likes, downloads, and usage). Because of this, deployed models can be swapped without prior notice. The Hugging Face stack aims to keep all the latest popular models warm and ready to use.

You can find:

* **[Warm models](https://huggingface.co/models?inference=warm&sort=trending):** models ready to be used.
* **[Cold models](https://huggingface.co/models?inference=cold&sort=trending):** models that are not loaded yet but can still be used (the first request may fail with a 503 error until the model is loaded).
* **[Frozen models](https://huggingface.co/models?inference=frozen&sort=trending):** models that currently can't be run with the API.

TODO: add screenshot

## What do I get with a PRO subscription?

In addition to thousands of public models available in the Hub, PRO and Enterprise users get higher [rate limits](./rate-limits) and free access to the following models:


| Model | Size | Context Length | Use |
|--------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|----------------|--------------------------------------------------------------|
| Meta Llama 3.1 Instruct | [8B](https://huggingface.co/meta-llama/Meta-Llama-3.1-8B-Instruct), [70B](https://huggingface.co/meta-llama/Meta-Llama-3.1-70B-Instruct) | 128k tokens | High-quality multilingual chat model with large context length |
| Meta Llama 3 Instruct | [8B](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct), [70B](https://huggingface.co/meta-llama/Meta-Llama-3-70B-Instruct) | 8k tokens | One of the best chat models |
| Llama 2 Chat | [7B](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf), [13B](https://huggingface.co/meta-llama/Llama-2-13b-chat-hf), [70B](https://huggingface.co/meta-llama/Llama-2-70b-chat-hf) | 4k tokens | One of the best conversational models |
| Bark | [0.9B](https://huggingface.co/suno/bark) | - | Text to audio generation |


## Running Private Models

The free Serverless API is designed to run popular public models. If you have a private model, you can use [Inference Endpoints](https://huggingface.co/docs/inference/endpoints) to deploy it.