New api docs structure (#1379)
* Add draft of docs structure

* Add index page

* Prepare overview and rate limits

* Manage redirects

* Clean up

* Apply suggestions from code review

Co-authored-by: Pedro Cuenca <[email protected]>

* Apply suggestions from review

* Add additional headers

* Apply suggestions from code review

Co-authored-by: Lucain <[email protected]>

* Incorporate reviewer's feedback

* First draft for text-to-image, image-to-image + generate script (#1384)

* First draft for text-to-image

* add correct code snippets

* Update docs/api-inference/tasks/text-to-image.md

Co-authored-by: Omar Sanseviero <[email protected]>

* better table?

* Generate tasks pages from script (#1386)

* init project

* first script to generate task pages

* commit generated content

* generate payload table as well

* so indecisive

* hey

* better?

* Add image-to-image page

* template for snippets section + few things

* few things

* Update scripts/api-inference/templates/specs_headers.handlebars

Co-authored-by: Omar Sanseviero <[email protected]>

* Update scripts/api-inference/templates/specs_headers.handlebars

Co-authored-by: Omar Sanseviero <[email protected]>

* generate

* fetch inference status

---------

Co-authored-by: Omar Sanseviero <[email protected]>

* Add getting started

* Update docs/api-inference/getting_started.md

Co-authored-by: Lucain <[email protected]>

* Draft to add text-generation parameters (#1393)

* first draft to add text-generation parameters

* headers

* more structure

* add chat-completion

* better handling of arrays

* better handling of parameters

* Add new tasks pages (fill mask, summarization, question answering, sentence similarity) (#1394)

* add fill mask

* add summarization

* add question answering

* Table question answering

* handle array output

* Add sentence similarity

* text classification (almost)

* better with an enum

* Add mask token

* capitalize

* remove sentence-similarity

* Update docs/api-inference/tasks/table_question_answering.md

Co-authored-by: Omar Sanseviero <[email protected]>

---------

Co-authored-by: Omar Sanseviero <[email protected]>

* mention chat completion in text generation docs

* fix chat completion snippets

---------

Co-authored-by: Omar Sanseviero <[email protected]>

* Filter out frozen models from API docs for tasks (#1396)

* Filter out frozen models

* use placeholder

* New api docs suggestions (#1397)

* show as diff

* reorder toctree

* wording update

* diff

* Add comment header on each task page (#1400)

* Add comment header on each task page

* add huggingface.co/api/tasks

* Add even more tasks: token classification, translation and zero shot classification (#1398)

* Add token classification

* add translation task

* add zero shot classification

* more parameters

* More tasks more tasks more tasks! (#1399)

* add ASR

* fix early stopping parameter

* regenerate

* add audio_classification

* Image classification

* Object detection

* image segmentation

* unknown when we don't know

* gen

* feature extraction

* update

* regenerate

* pull from main

* coding style

* Update _redirects.yml

* Rename all tasks '_' to '-' (#1405)

* Rename all tasks '_' to '-'

* also for other urls

* Update docs/api-inference/index.md

Co-authored-by: Victor Muštar <[email protected]>

* Apply feedback for "new_api_docs" (#1408)

* Update getting started examples

* Move snippets above specification

* custom link for finegrained token

* Fixes new docs (#1413)

* Misc changes

* Wrap up

* Apply suggestions from code review

* generate

* Add todos to avoid forgetting about them

---------

Co-authored-by: Lucain <[email protected]>
Co-authored-by: Wauplin <[email protected]>

---------

Co-authored-by: Pedro Cuenca <[email protected]>
Co-authored-by: Lucain <[email protected]>
Co-authored-by: Wauplin <[email protected]>
Co-authored-by: Victor Muštar <[email protected]>
5 people authored Sep 12, 2024
1 parent 8febb83 commit 1159582
Showing 58 changed files with 4,697 additions and 0 deletions.
11 changes: 11 additions & 0 deletions docs/TODOs.md
@@ -0,0 +1,11 @@
## For API-Inference docs:

From https://github.com/huggingface/hub-docs/pull/1413:
* Use `<inference>` for getting started
* Add some screenshots: supported models
* Add flow chart of how API works
* Add table with all tasks
* Add missing tasks: depth estimation and zero shot image classification
* Some tasks have no warm models; should we remove them for now? E.g. https://huggingface.co/models?inference=warm&pipeline_tag=fill-mask&sort=trending. But many cold models are working, so linking to both could make sense - internal issue https://github.com/huggingface-internal/moon-landing/issues/10966
* See also this [google doc](https://docs.google.com/document/d/1xy5Ug4C_qGbqp4x3T3rj_VOyjQzQLlyce-L6I_hYi94/edit?usp=sharing)
* Add CI to auto-generate the docs when handlebars templates are updated
5 changes: 5 additions & 0 deletions docs/api-inference/_redirects.yml
@@ -0,0 +1,5 @@
quicktour: index
detailed_parameters: parameters
parallelism: getting_started
usage: getting_started
faq: index
54 changes: 54 additions & 0 deletions docs/api-inference/_toctree.yml
@@ -0,0 +1,54 @@
- sections:
    - local: index
      title: Serverless Inference API
    - local: getting-started
      title: Getting Started
    - local: supported-models
      title: Supported Models
    - local: rate-limits
      title: Rate Limits
    - local: security
      title: Security
  title: Getting Started
- sections:
    - local: parameters
      title: Parameters
    - sections:
        - local: tasks/audio-classification
          title: Audio Classification
        - local: tasks/automatic-speech-recognition
          title: Automatic Speech Recognition
        - local: tasks/chat-completion
          title: Chat Completion
        - local: tasks/feature-extraction
          title: Feature Extraction
        - local: tasks/fill-mask
          title: Fill Mask
        - local: tasks/image-classification
          title: Image Classification
        - local: tasks/image-segmentation
          title: Image Segmentation
        - local: tasks/image-to-image
          title: Image to Image
        - local: tasks/object-detection
          title: Object Detection
        - local: tasks/question-answering
          title: Question Answering
        - local: tasks/summarization
          title: Summarization
        - local: tasks/table-question-answering
          title: Table Question Answering
        - local: tasks/text-classification
          title: Text Classification
        - local: tasks/text-generation
          title: Text Generation
        - local: tasks/text-to-image
          title: Text to Image
        - local: tasks/token-classification
          title: Token Classification
        - local: tasks/translation
          title: Translation
        - local: tasks/zero-shot-classification
          title: Zero Shot Classification
      title: Detailed Task Parameters
  title: API Reference
95 changes: 95 additions & 0 deletions docs/api-inference/getting-started.md
@@ -0,0 +1,95 @@
# Getting Started

The Serverless Inference API lets you easily run inference on a wide range of models and tasks. You can make requests with your favorite tools (Python, cURL, etc.). We also provide a Python SDK (`huggingface_hub`) to make it even easier.

We'll walk through a minimal example using a [sentiment classification model](https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest). For task-specific parameters and further documentation, see our [API Reference](./parameters).

## Getting a Token

Using the Serverless Inference API requires passing a user token in the request headers. You can get a token by signing up on the Hugging Face website and then going to the [tokens page](https://huggingface.co/settings/tokens/new?globalPermissions=inference.serverless.write&tokenType=fineGrained). We recommend creating a `fine-grained` token with the `Make calls to the serverless Inference API` scope.

For more details about user tokens, check out [this guide](https://huggingface.co/docs/hub/en/security-tokens).
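A common pattern is to keep the token out of your source code by reading it from an environment variable. A minimal sketch (the `HF_TOKEN` variable name here is just a convention, not something the API requires):

```python
import os

# Read the token from the environment rather than hardcoding it.
HF_TOKEN = os.environ["HF_TOKEN"]
headers = {"Authorization": f"Bearer {HF_TOKEN}"}
```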

## cURL

```bash
curl 'https://api-inference.huggingface.co/models/cardiffnlp/twitter-roberta-base-sentiment-latest' \
-H "Authorization: Bearer hf_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx" \
-H 'Content-Type: application/json' \
-d '{"inputs": "Today is a great day"}'
```

## Python

You can use the `requests` library to make a request to the Inference API.

```python
import requests

API_URL = "https://api-inference.huggingface.co/models/cardiffnlp/twitter-roberta-base-sentiment-latest"
headers = {"Authorization": "Bearer hf_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"}
payload = {
    "inputs": "Today is a great day",
}

response = requests.post(API_URL, headers=headers, json=payload)
response.json()
```
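For a text classification model like this one, the response is a list of label/score predictions. To pick the top prediction, you could continue with something like the following sketch (it assumes the nested-list shape that text classification responses use; the exact labels depend on the model):

```python
result = response.json()
# e.g. [[{"label": "positive", "score": 0.99}, ...]] (values illustrative)
best = max(result[0], key=lambda prediction: prediction["score"])
print(best["label"], best["score"])
```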

Hugging Face also provides an [`InferenceClient`](https://huggingface.co/docs/huggingface_hub/guides/inference) that handles inference for you. Make sure to install it with `pip install huggingface_hub` first.

```python
from huggingface_hub import InferenceClient

client = InferenceClient(
    "cardiffnlp/twitter-roberta-base-sentiment-latest",
    token="hf_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx",
)

client.text_classification("Today is a great day")
```

## JavaScript

```js
import fetch from "node-fetch";

async function query(data) {
    const response = await fetch(
        "https://api-inference.huggingface.co/models/cardiffnlp/twitter-roberta-base-sentiment-latest",
        {
            method: "POST",
            headers: {
                Authorization: `Bearer hf_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx`,
                "Content-Type": "application/json",
            },
            body: JSON.stringify(data),
        }
    );
    const result = await response.json();
    return result;
}

query({inputs: "Today is a great day"}).then((response) => {
    console.log(JSON.stringify(response, null, 2));
});
```

Hugging Face also provides an [`HfInference`](https://huggingface.co/docs/huggingface.js/inference/classes/HfInference) client that handles inference. Make sure to install it with `npm install @huggingface/inference` first.

```js
import { HfInference } from "@huggingface/inference";

const inference = new HfInference("hf_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx");

const result = await inference.textClassification({
    model: "cardiffnlp/twitter-roberta-base-sentiment-latest",
    inputs: "Today is a great day",
});

console.log(result);
```

## Next Steps

Now that you know the basics, you can explore the [API Reference](./parameters) to learn more about task-specific settings and parameters.
53 changes: 53 additions & 0 deletions docs/api-inference/index.md
@@ -0,0 +1,53 @@
# Serverless Inference API

**Instant Access to thousands of ML Models for Fast Prototyping**

Explore the most popular models for text, image, speech, and more — all with a simple API request. Build, test, and experiment without worrying about infrastructure or setup.

---

## Why use the Inference API?

The Serverless Inference API offers a fast and free way to explore thousands of models for a variety of tasks. Whether you're prototyping a new application or experimenting with ML capabilities, this API gives you instant access to high-performing models across multiple domains:

* **Text Generation:** Generate and experiment with high-quality responses from large language models, including tool-calling prompts.
* **Image Generation:** Easily create customized images, including with LoRAs for your own styles.
* **Document Embeddings:** Build search and retrieval systems with SOTA embeddings.
* **Classical AI Tasks:** Ready-to-use models for text classification, image classification, speech recognition, and more.

**Fast and Free to Get Started**: The Inference API is free to use, with higher rate limits for PRO users. For production needs, explore [Inference Endpoints](https://ui.endpoints.huggingface.co/) for dedicated resources, autoscaling, advanced security features, and more.

---

## Key Benefits

- 🚀 **Instant Prototyping:** Access powerful models without setup.
- 🎯 **Diverse Use Cases:** One API for text, image, and beyond.
- 🔧 **Developer-Friendly:** Simple requests, fast responses.

---

## Main Features

* Leverage 800,000+ models from different open-source libraries (transformers, sentence transformers, adapter transformers, diffusers, timm, etc.).
* Use models for a variety of tasks, including text generation, image generation, document embeddings, NER, summarization, image classification, and more.
* Accelerate your prototyping by using GPU-powered models.
* Run very large models that are challenging to deploy in production.
* Production-grade platform without the hassle: built-in automatic scaling, load balancing, and caching.

---

## Contents

The documentation is organized into two sections:

* **Getting Started:** Learn the basics of how to use the Inference API.
* **API Reference:** Dive into task-specific settings and parameters.

---

## Looking for custom support from the Hugging Face team?

<a target="_blank" href="https://huggingface.co/support">
<img alt="HuggingFace Expert Acceleration Program" src="https://cdn-media.huggingface.co/marketing/transformers/new-support-improved.png" style="max-width: 600px; border: 1px solid #eee; border-radius: 4px; box-shadow: 0 1px 2px 0 rgba(0, 0, 0, 0.05);">
</a><br>
145 changes: 145 additions & 0 deletions docs/api-inference/parameters.md
@@ -0,0 +1,145 @@
# Parameters


## Additional Options

### Caching

There is a cache layer on the Inference API to speed up requests when the inputs are exactly the same. For deterministic models, such as classifiers and embedding models, cached results can be reused as-is, since repeated calls would return the same output anyway. However, if you use a nondeterministic model, you can disable the cache so that each request triggers a real new query.

To do this, add `x-use-cache: false` to the request headers. For example:

<inferencesnippet>

<curl>
```diff
curl https://api-inference.huggingface.co/models/MODEL_ID \
-X POST \
-d '{"inputs": "Can you please let us know more details about your "}' \
-H "Authorization: Bearer hf_***" \
-H "Content-Type: application/json" \
+ -H "x-use-cache: false"
```
</curl>

<python>
```diff
import requests

API_URL = "https://api-inference.huggingface.co/models/MODEL_ID"
headers = {
    "Authorization": "Bearer hf_***",
    "Content-Type": "application/json",
+   "x-use-cache": "false"
}
data = {
    "inputs": "Can you please let us know more details about your "
}
response = requests.post(API_URL, headers=headers, json=data)
print(response.json())
```

</python>

<js>
```diff
import fetch from "node-fetch";

async function query(data) {
    const response = await fetch(
        "https://api-inference.huggingface.co/models/MODEL_ID",
        {
            method: "POST",
            headers: {
                Authorization: `Bearer hf_***`,
                "Content-Type": "application/json",
+               "x-use-cache": "false"
            },
            body: JSON.stringify(data),
        }
    );
    const result = await response.json();
    return result;
}

query({
    inputs: "Can you please let us know more details about your "
}).then((response) => {
    console.log(JSON.stringify(response, null, 2));
});
```

</js>

</inferencesnippet>

### Wait for the model

When a model is warm, it is ready to be used and you will get a response relatively quickly. However, some models are cold and need to be loaded before they can be used; in that case, you will get a 503 error. Rather than polling with repeated requests until the model is loaded, you can wait for it by adding `x-wait-for-model: true` to the request headers. We suggest using this flag only when you are sure the model is cold: first try the request without it, and only if you get a 503 error, retry with the flag. A sketch of this retry flow follows the snippets below.


<inferencesnippet>

<curl>
```diff
curl https://api-inference.huggingface.co/models/MODEL_ID \
-X POST \
-d '{"inputs": "Can you please let us know more details about your "}' \
-H "Authorization: Bearer hf_***" \
-H "Content-Type: application/json" \
+ -H "x-wait-for-model: true"
```
</curl>

<python>
```diff
import requests

API_URL = "https://api-inference.huggingface.co/models/MODEL_ID"
headers = {
    "Authorization": "Bearer hf_***",
    "Content-Type": "application/json",
+   "x-wait-for-model": "true"
}
data = {
    "inputs": "Can you please let us know more details about your "
}
response = requests.post(API_URL, headers=headers, json=data)
print(response.json())
```

</python>

<js>
```diff
import fetch from "node-fetch";

async function query(data) {
    const response = await fetch(
        "https://api-inference.huggingface.co/models/MODEL_ID",
        {
            method: "POST",
            headers: {
                Authorization: `Bearer hf_***`,
                "Content-Type": "application/json",
+               "x-wait-for-model": "true"
            },
            body: JSON.stringify(data),
        }
    );
    const result = await response.json();
    return result;
}

query({
    inputs: "Can you please let us know more details about your "
}).then((response) => {
    console.log(JSON.stringify(response, null, 2));
});
```

</js>

</inferencesnippet>
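Putting this together, the recommended flow fits in a few lines. A minimal Python sketch (the `query` helper is illustrative, and `MODEL_ID` is a placeholder as above):

```python
import requests

API_URL = "https://api-inference.huggingface.co/models/MODEL_ID"
HEADERS = {"Authorization": "Bearer hf_***", "Content-Type": "application/json"}

def query(payload):
    # First attempt without the flag, so warm models respond quickly.
    response = requests.post(API_URL, headers=HEADERS, json=payload)
    if response.status_code == 503:
        # The model is cold: retry once, this time waiting for it to load.
        response = requests.post(
            API_URL, headers={**HEADERS, "x-wait-for-model": "true"}, json=payload
        )
    response.raise_for_status()
    return response.json()

print(query({"inputs": "Can you please let us know more details about your "}))
```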
11 changes: 11 additions & 0 deletions docs/api-inference/rate-limits.md
@@ -0,0 +1,11 @@
# Rate Limits

The Inference API has rate limits based on the number of requests. These rate limits may change in the future to become compute-based or token-based.

The Serverless API is not meant for heavy production applications. If you need higher rate limits, consider [Inference Endpoints](https://huggingface.co/docs/inference-endpoints) for dedicated resources.

| User Tier | Rate Limit |
|---------------------|---------------------------|
| Unregistered Users | 1 request per hour |
| Signed-up Users | 300 requests per hour |
| PRO and Enterprise Users | 1000 requests per hour |
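If you hit the rate limit, a simple client-side mitigation is to retry with exponential backoff. A minimal sketch, assuming the API signals rate limiting with an HTTP 429 status (the `query_with_backoff` helper is illustrative):

```python
import time

import requests

API_URL = "https://api-inference.huggingface.co/models/MODEL_ID"
HEADERS = {"Authorization": "Bearer hf_***"}

def query_with_backoff(payload, max_retries=5):
    for attempt in range(max_retries):
        response = requests.post(API_URL, headers=HEADERS, json=payload)
        if response.status_code != 429:  # 429 assumed to mean "rate limited"
            response.raise_for_status()
            return response.json()
        time.sleep(2 ** attempt)  # wait 1s, 2s, 4s, ... before retrying
    raise RuntimeError("Still rate limited after retries")
```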