New api docs structure #1379

Merged
merged 42 commits into from
Sep 12, 2024
Changes from 5 commits
88b8af1
Add draft of docs structure
osanseviero Aug 19, 2024
f558bdd
Add index page
osanseviero Aug 20, 2024
8b6230f
Prepare overview and rate limits
osanseviero Aug 21, 2024
6380dfe
Manage redirects
osanseviero Aug 21, 2024
9df929a
Clean up
osanseviero Aug 21, 2024
60ad476
Apply suggestions from code review
osanseviero Aug 21, 2024
a93f0dc
Apply suggestions from review
osanseviero Aug 21, 2024
4069586
Merge branch 'new_api_docs' of github.com:huggingface/hub-docs into n…
osanseviero Aug 21, 2024
f2610b7
Add additional headers
osanseviero Aug 23, 2024
c0bee69
Apply suggestions from code review
osanseviero Aug 26, 2024
6294514
Incorporate reviewer's feedback
osanseviero Aug 26, 2024
12ba289
First draft for text-to-image, image-to-image + generate script (#1384)
Wauplin Aug 27, 2024
eb6171e
Merge branches 'main' and 'new_api_docs' of github.com:huggingface/hu…
osanseviero Aug 27, 2024
9b1e735
Add getting started
osanseviero Aug 27, 2024
fb57a2d
Add draft of docs structure
osanseviero Aug 19, 2024
bad42b0
Add index page
osanseviero Aug 20, 2024
d656272
Prepare overview and rate limits
osanseviero Aug 21, 2024
01983fc
Manage redirects
osanseviero Aug 21, 2024
dfdc02d
Clean up
osanseviero Aug 21, 2024
abe2d4f
Apply suggestions from review
osanseviero Aug 21, 2024
042a0e4
Apply suggestions from code review
osanseviero Aug 21, 2024
d774816
Add additional headers
osanseviero Aug 23, 2024
a097022
Apply suggestions from code review
osanseviero Aug 26, 2024
9bf223e
Incorporate reviewer's feedback
osanseviero Aug 26, 2024
51750bf
First draft for text-to-image, image-to-image + generate script (#1384)
Wauplin Aug 27, 2024
0c34106
Add getting started
osanseviero Aug 27, 2024
cc9b363
Merge branch 'new_api_docs' of github.com:huggingface/hub-docs into n…
Wauplin Aug 27, 2024
ac640c8
Update docs/api-inference/getting_started.md
osanseviero Aug 28, 2024
b785d8b
Draft to add text-generation parameters (#1393)
Wauplin Aug 28, 2024
22c6bae
Filter out frozen models from API docs for tasks (#1396)
Wauplin Aug 29, 2024
4039c7e
New api docs suggestions (#1397)
Wauplin Aug 29, 2024
49e8f67
Add comment header on each task page (#1400)
Wauplin Aug 30, 2024
20c17d0
Add even more tasks: token classification, translation and zero shot …
Wauplin Aug 30, 2024
528ea95
regenerate
Wauplin Aug 30, 2024
f267d86
pull from main
Wauplin Aug 30, 2024
ed5e37b
coding style
Wauplin Sep 4, 2024
2e1e64d
Update _redirects.yml
osanseviero Sep 4, 2024
bf973e0
Rename all tasks '_' to '-' (#1405)
Wauplin Sep 4, 2024
2b6f051
Update docs/api-inference/index.md
Wauplin Sep 5, 2024
92baadc
Apply feedback for "new_api_docs" (#1408)
Wauplin Sep 5, 2024
e9eff75
Fixes new docs (#1413)
osanseviero Sep 12, 2024
c65a120
Merge branch 'main' into new_api_docs
Wauplin Sep 12, 2024
5 changes: 5 additions & 0 deletions docs/api-inference/_redirects.yml
@@ -0,0 +1,5 @@
quicktour: overview
detailed_parameters: parameters
parallelism: TODO
usage: getting_started
faq: overview
17 changes: 17 additions & 0 deletions docs/api-inference/_toctree.yml
@@ -0,0 +1,17 @@
- sections:
- local: index
title: Serverless Inference API
- local: overview
title: Overview
- local: getting_started
title: Getting Started
- local: rate_limits
title: Rate Limits
title: title
- sections:
- local: parameters
title: Parameters
- sections:
- local: tasks/fill_mask
title: Fill Mask
title: Parameters
3 changes: 3 additions & 0 deletions docs/api-inference/getting_started.md
@@ -0,0 +1,3 @@
# Getting Started

TODO:
50 changes: 50 additions & 0 deletions docs/api-inference/index.md
@@ -0,0 +1,50 @@
# Serverless Inference API

**Instant Access to 800,000+ ML Models for Fast Prototyping**

Explore the most popular models for text, image, speech, and more — all with a simple API request. Build, test, and experiment without worrying about infrastructure or setup.

---

## Why use the Inference API?

The Serverless Inference API offers a fast and free way to explore thousands of models for a variety of tasks. Whether you're prototyping a new application or experimenting with ML capabilities, this API gives you instant access to high-performing models across multiple domains:

* **Text Generation:** Generate and experiment with high-quality responses from large language models, including tool-calling prompts.
* **Image Generation:** Easily create customized images, including LoRAs for your own styles.
* **Document Embeddings:** Build search and retrieval systems with SOTA embeddings.
* **Classical AI Tasks:** Ready-to-use models for text classification, image classification, speech recognition, and more.

TODO: add some flow chart image

⚡ **Fast and Free to Get Started**: The Inference API is free with rate limits. For production needs, explore [Inference Endpoints](https://huggingface.co/docs/inference-endpoints/index) for dedicated resources, autoscaling, advanced security features, and more.

---

## Key Benefits

- 🚀 **Instant Prototyping:** Access powerful models without setup.
- 🎯 **Diverse Use Cases:** One API for text, image, and beyond.
- 🔧 **Developer-Friendly:** Simple requests, fast responses.

---

## Contents

The documentation is organized into two sections:

* **Getting Started:** Learn the basics of how to use the Inference API.
* **Parameters:** Dive into task-specific settings and parameters.

---

## Looking for custom support from the Hugging Face team?

<a target="_blank" href="https://huggingface.co/support">
<img alt="HuggingFace Expert Acceleration Program" src="https://cdn-media.huggingface.co/marketing/transformers/new-support-improved.png" style="max-width: 600px; border: 1px solid #eee; border-radius: 4px; box-shadow: 0 1px 2px 0 rgba(0, 0, 0, 0.05);">
</a><br>

## Hugging Face is trusted in production by over 10,000 companies

<img class="block dark:hidden !shadow-none !border-0 !rounded-none" src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/inference-api/companies-light.png" width="600">
<img class="hidden dark:block !shadow-none !border-0 !rounded-none" src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/inference-api/companies-dark.png" width="600">
49 changes: 49 additions & 0 deletions docs/api-inference/overview.md
@@ -0,0 +1,49 @@
# Overview

## Main Features

* Leverage 800,000+ models from different open-source libraries (transformers, sentence transformers, adapter transformers, diffusers, timm, etc.).
* Use models for a variety of tasks, including text generation, image generation, document embeddings, NER, summarization, image classification, and more.
* Accelerate your prototyping by using GPU-powered models.
* Run very large models that are challenging to deploy in production.
* Benefit from built-in automatic scaling, load balancing, and caching.

## Eligibility

Given the fast-paced nature of the open ML ecosystem, the Inference API serves models that have large community interest and are actively being used (based on recent likes, downloads, and usage). Because of this, deployed models can be swapped without prior notice.

You can find:

* **[Warm models](https://huggingface.co/models?inference=warm&sort=trending):** models ready to be used.
* **[Cold models](https://huggingface.co/models?inference=cold&sort=trending):** models that are not loaded but can be used.
* **[Frozen models](https://huggingface.co/models?inference=frozen&sort=trending):** models that currently can't be run with the API.
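
The three links above differ only in the `inference` query parameter. As a minimal sketch, the listing URL for each state can be built as follows; the function name `models_listing_url` is hypothetical and used only for illustration, while the base URL and query parameters are taken from the links above.

```python
from urllib.parse import urlencode

# Base URL and query parameters as they appear in the links above.
MODELS_URL = "https://huggingface.co/models"
INFERENCE_STATES = {"warm", "cold", "frozen"}


def models_listing_url(state: str, sort: str = "trending") -> str:
    """Return the Hub listing URL for models in the given inference state.

    Hypothetical helper for illustration; it only builds a URL string.
    """
    if state not in INFERENCE_STATES:
        raise ValueError(f"unknown inference state: {state!r}")
    query = urlencode({"inference": state, "sort": sort})
    return f"{MODELS_URL}?{query}"


print(models_listing_url("warm"))
```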

TODO: add screenshot

## GPU vs CPU

By default, the Inference API uses GPUs to run large models. For small models that run well on CPU, such as small text classification and text embedding models, the API automatically switches to CPU to save costs.

## Inference for PRO

In addition to the thousands of public models available on the Hub, PRO and Enterprise users get free access to the following models, with higher rate limits:


| Model                   | Size | Context Length | Use |
|-------------------------|------|----------------|-----|
| Meta Llama 3.1 Instruct | [8B](https://huggingface.co/meta-llama/Meta-Llama-3.1-8B-Instruct), [70B](https://huggingface.co/meta-llama/Meta-Llama-3.1-70B-Instruct) | 128k tokens | High quality multilingual chat model with large context length |
| Meta Llama 3 Instruct   | [8B](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct), [70B](https://huggingface.co/meta-llama/Meta-Llama-3-70B-Instruct) | 8k tokens | One of the best chat models |
| Llama 2 Chat            | [7B](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf), [13B](https://huggingface.co/meta-llama/Llama-2-13b-chat-hf), [70B](https://huggingface.co/meta-llama/Llama-2-70b-chat-hf) | 4k tokens | One of the best conversational models |
| Bark                    | [0.9B](https://huggingface.co/suno/bark) | - | Text to audio generation |


## FAQ

### Running Private Models

The free Serverless API is designed to run popular public models. If you have a private model, you can use [Inference Endpoints](https://huggingface.co/docs/inference-endpoints/index) to deploy it.

### Fine-tuning Models

To automatically fine-tune a model on your data, try [AutoTrain](https://huggingface.co/autotrain). It’s a no-code solution for automatically training and deploying a model; all you have to do is upload your data!

16 changes: 16 additions & 0 deletions docs/api-inference/parameters.md
@@ -0,0 +1,16 @@
# Parameters

Table with
- Domain
- Task
- Whether it's supported in Inference API
- Supported libraries (not sure)
- Recommended model
- Link to model specific page



## Additional parameters (different page?)

- Controlling cache
- Modifying the task used by a model (Which task is used by this model?)
11 changes: 11 additions & 0 deletions docs/api-inference/rate_limits.md
@@ -0,0 +1,11 @@
# Rate Limits

The Inference API has temporary rate limits based on the number of requests. These rate limits are subject to change in the future to be compute-based or token-based.

The Serverless API is not meant to be used for heavy production applications. If you need higher rate limits, consider using [Inference Endpoints](https://huggingface.co/docs/inference-endpoints/index) to have dedicated resources.

| User Tier | Rate Limit |
|---------------------|---------------------------|
| Unregistered Users | 1 request per hour |
| Signed-up Users | 300 requests per hour |
| PRO and Enterprise Users | 1000 requests per hour |
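
To make the table concrete, here is a small sketch that converts each hourly limit into the spacing a client would need if it spread its requests evenly over an hour. The tier keys (`unregistered`, `signed_up`, `pro_enterprise`) and the function name are hypothetical labels chosen for this example. The API may not enforce limits this evenly, so treat the result as a rough client-side pacing guide only.

```python
# Hourly request limits copied from the table above.
# The dictionary keys are hypothetical labels, not API identifiers.
RATE_LIMITS_PER_HOUR = {
    "unregistered": 1,
    "signed_up": 300,
    "pro_enterprise": 1000,
}

SECONDS_PER_HOUR = 3600


def min_seconds_between_requests(tier: str) -> float:
    """Spacing that spreads a tier's hourly quota evenly across one hour."""
    return SECONDS_PER_HOUR / RATE_LIMITS_PER_HOUR[tier]


for tier in RATE_LIMITS_PER_HOUR:
    print(f"{tier}: one request every {min_seconds_between_requests(tier)} s")
```

A client could sleep for this interval between calls to avoid exhausting its quota early in the hour.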
6 changes: 6 additions & 0 deletions docs/api-inference/tasks/fill_mask.md
@@ -0,0 +1,6 @@
## Fill Mask

Mask filling is the task of predicting the right word (token to be precise) in the middle of a sequence.
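
As a rough illustration, a fill-mask request could be prepared as sketched below. The endpoint pattern `https://api-inference.huggingface.co/models/<model_id>`, the `{"inputs": ...}` payload shape, the example model `google-bert/bert-base-uncased` with its `[MASK]` token, and the helper name `build_fill_mask_request` are all assumptions not stated on this page. The snippet only builds the request and does not send it.

```python
import json
import urllib.request

# Assumed endpoint pattern; this page does not state it.
API_URL_TEMPLATE = "https://api-inference.huggingface.co/models/{model_id}"


def build_fill_mask_request(model_id: str, text: str, token: str) -> urllib.request.Request:
    """Build (but do not send) a POST request asking a model to fill the mask in `text`."""
    payload = json.dumps({"inputs": text}).encode("utf-8")
    return urllib.request.Request(
        API_URL_TEMPLATE.format(model_id=model_id),
        data=payload,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_fill_mask_request(
    model_id="google-bert/bert-base-uncased",  # example model; its mask token is [MASK]
    text="The answer to the universe is [MASK].",
    token="hf_xxx",  # placeholder; replace with a real access token
)
# To actually call the API: urllib.request.urlopen(request).read()
```

The mask token differs between models, so the input text has to use the token the chosen model expects.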

Automated docs below