* Add draft of docs structure
* Add index page
* Prepare overview and rate limits
* Manage redirects
* Clean up
* Apply suggestions from code review
  Co-authored-by: Pedro Cuenca <[email protected]>
* Apply suggestions from review
* Add additional headers
* Apply suggestions from code review
  Co-authored-by: Lucain <[email protected]>
* Incorporate reviewer's feedback
* First draft for text-to-image, image-to-image + generate script (#1384)
  * First draft for text-to-image
  * add correct code snippets
  * Update docs/api-inference/tasks/text-to-image.md
    Co-authored-by: Omar Sanseviero <[email protected]>
  * better table?
* Generate tasks pages from script (#1386)
  * init project
  * first script to generate task pages
  * commit generated content
  * generate payload table as well
  * so undecisive
  * hey
  * better ?
  * Add image-to-image page
  * template for snippets section + few things
  * few things
  * Update scripts/api-inference/templates/specs_headers.handlebars
    Co-authored-by: Omar Sanseviero <[email protected]>
  * generate
  * fetch inference status
* Add getting started
* Update docs/api-inference/getting_started.md
  Co-authored-by: Lucain <[email protected]>
* Draft to add text-generation parameters (#1393)
  * first draft to add text-generation parameters
  * headers
  * more structure
  * add chat-completion
  * better handling of arrays
  * better handling of parameters
* Add new tasks pages (fill mask, summarization, question answering, sentence similarity) (#1394)
  * add fill mask
  * add summarization
  * add question answering
  * Table question answering
  * handle array output
  * Add sentence similarity
  * text classification (almost)
  * better with an enum
  * Add mask token
  * capitalize
  * remove sentence-similarity
  * Update docs/api-inference/tasks/table_question_answering.md
    Co-authored-by: Omar Sanseviero <[email protected]>
* mention chat completion in text generation docs
* fix chat completion snippets
* Filter out frozen models from API docs for tasks (#1396)
  * Filter out frozen models
  * use placeholder
* New api docs suggestions (#1397)
  * show as diff
  * reorder toctree
  * wording update
  * diff
* Add comment header on each task page (#1400)
  * add huggingface.co/api/tasks
* Add even more tasks: token classification, translation and zero shot classification (#1398)
  * Add token classification
  * add translation task
  * add zero shot classification
  * more parameters
* More tasks more tasks more tasks! (#1399)
  * add ASR
  * fix early stopping parameter
  * regenrate
  * add audio_classification
  * Image classification
  * Object detection
  * image segementation
  * unknown when we don't know
  * gen
  * feature extraction
  * update
  * regenerate
  * pull from main
  * coding style
* Update _redirects.yml
* Rename all tasks '_' to '-' (#1405)
  * also for other urls
* Update docs/api-inference/index.md
  Co-authored-by: Victor Muštar <[email protected]>
* Apply feedback for "new_api_docs" (#1408)
  * Update getting started examples
  * Move snippets above specification
  * custom link for finegrained token
* Fixes new docs (#1413)
  * Misc changes
  * Wrap up
  * Apply suggestions from code review
  * generate
  * Add todos to avoid forgetting about them

Co-authored-by: Pedro Cuenca <[email protected]>
Co-authored-by: Lucain <[email protected]>
Co-authored-by: Wauplin <[email protected]>
Co-authored-by: Victor Muštar <[email protected]>
1 parent 8febb83 · commit 1159582 · Showing 58 changed files with 4,697 additions and 0 deletions.
@@ -0,0 +1,11 @@
## For API-Inference docs:

From https://github.com/huggingface/hub-docs/pull/1413:
* Use `<inference>` for getting started
* Add some screenshots: supported models
* Add flow chart of how API works
* Add table with all tasks
* Add missing tasks: depth estimation and zero shot image classification
* Some tasks have no warm models, should we remove them for now? E.g. https://huggingface.co/models?inference=warm&pipeline_tag=fill-mask&sort=trending BUT many are cold and working, so actually linking to both could make sense - internal issue https://github.com/huggingface-internal/moon-landing/issues/10966
* See also this [google doc](https://docs.google.com/document/d/1xy5Ug4C_qGbqp4x3T3rj_VOyjQzQLlyce-L6I_hYi94/edit?usp=sharing)
* Add CI to auto-generate the docs when handlebars templates are updated
@@ -0,0 +1,5 @@
quicktour: index
detailed_parameters: parameters
parallelism: getting_started
usage: getting_started
faq: index
@@ -0,0 +1,54 @@
- sections:
    - local: index
      title: Serverless Inference API
    - local: getting-started
      title: Getting Started
    - local: supported-models
      title: Supported Models
    - local: rate-limits
      title: Rate Limits
    - local: security
      title: Security
  title: Getting Started
- sections:
    - local: parameters
      title: Parameters
    - sections:
        - local: tasks/audio-classification
          title: Audio Classification
        - local: tasks/automatic-speech-recognition
          title: Automatic Speech Recognition
        - local: tasks/chat-completion
          title: Chat Completion
        - local: tasks/feature-extraction
          title: Feature Extraction
        - local: tasks/fill-mask
          title: Fill Mask
        - local: tasks/image-classification
          title: Image Classification
        - local: tasks/image-segmentation
          title: Image Segmentation
        - local: tasks/image-to-image
          title: Image to Image
        - local: tasks/object-detection
          title: Object Detection
        - local: tasks/question-answering
          title: Question Answering
        - local: tasks/summarization
          title: Summarization
        - local: tasks/table-question-answering
          title: Table Question Answering
        - local: tasks/text-classification
          title: Text Classification
        - local: tasks/text-generation
          title: Text Generation
        - local: tasks/text-to-image
          title: Text to Image
        - local: tasks/token-classification
          title: Token Classification
        - local: tasks/translation
          title: Translation
        - local: tasks/zero-shot-classification
          title: Zero Shot Classification
      title: Detailed Task Parameters
  title: API Reference
@@ -0,0 +1,95 @@
# Getting Started

The Serverless Inference API allows you to easily do inference on a wide range of models and tasks. You can make requests with your favorite tools (Python, cURL, etc.). We also provide a Python SDK (`huggingface_hub`) to make it even easier.

We'll walk through a minimal example using a [sentiment classification model](https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest). See our [API Reference](./parameters) for task-specific parameters and further documentation.

## Getting a Token

Using the Serverless Inference API requires passing a user token in the request headers. You can get a token by signing up on the Hugging Face website and then going to the [tokens page](https://huggingface.co/settings/tokens/new?globalPermissions=inference.serverless.write&tokenType=fineGrained). We recommend creating a `fine-grained` token with the scope to `Make calls to the serverless Inference API`.

For more details about user tokens, check out [this guide](https://huggingface.co/docs/hub/en/security-tokens).
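
Avoid hardcoding your token in scripts you might share. A minimal sketch of loading it from an environment variable instead (the `HF_TOKEN` variable name here is just a convention we assume, not something the API requires):

```python
import os

import requests

# Assumes you exported the token beforehand, e.g. `export HF_TOKEN=hf_xxx`.
token = os.environ["HF_TOKEN"]

API_URL = "https://api-inference.huggingface.co/models/cardiffnlp/twitter-roberta-base-sentiment-latest"
headers = {"Authorization": f"Bearer {token}"}

response = requests.post(API_URL, headers=headers, json={"inputs": "Today is a great day"})
print(response.json())
```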

## cURL

```bash
curl 'https://api-inference.huggingface.co/models/cardiffnlp/twitter-roberta-base-sentiment-latest' \
    -H "Authorization: Bearer hf_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx" \
    -H 'Content-Type: application/json' \
    -d '{"inputs": "Today is a great day"}'
```

## Python

You can use the `requests` library to make a request to the Inference API.

```python
import requests

API_URL = "https://api-inference.huggingface.co/models/cardiffnlp/twitter-roberta-base-sentiment-latest"
headers = {"Authorization": "Bearer hf_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"}
payload = {
    "inputs": "Today is a great day",
}

response = requests.post(API_URL, headers=headers, json=payload)
response.json()
```

Hugging Face also provides an [`InferenceClient`](https://huggingface.co/docs/huggingface_hub/guides/inference) that handles inference for you. Make sure to install it with `pip install huggingface_hub` first.

```python
from huggingface_hub import InferenceClient

client = InferenceClient(
    "cardiffnlp/twitter-roberta-base-sentiment-latest",
    token="hf_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx",
)

client.text_classification("Today is a great day")
```

## JavaScript

```js
import fetch from "node-fetch";

async function query(data) {
    const response = await fetch(
        "https://api-inference.huggingface.co/models/cardiffnlp/twitter-roberta-base-sentiment-latest",
        {
            method: "POST",
            headers: {
                Authorization: `Bearer hf_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx`,
                "Content-Type": "application/json",
            },
            body: JSON.stringify(data),
        }
    );
    const result = await response.json();
    return result;
}

query({inputs: "Today is a great day"}).then((response) => {
    console.log(JSON.stringify(response, null, 2));
});
```

Hugging Face also provides an [`HfInference`](https://huggingface.co/docs/huggingface.js/inference/classes/HfInference) client that handles inference. Make sure to install it with `npm install @huggingface/inference` first.

```js
import { HfInference } from "@huggingface/inference";

const inference = new HfInference("hf_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx");

const result = await inference.textClassification({
    model: "cardiffnlp/twitter-roberta-base-sentiment-latest",
    inputs: "Today is a great day",
});

console.log(result);
```

## Next Steps

Now that you know the basics, you can explore the [API Reference](./parameters) to learn more about task-specific settings and parameters.
@@ -0,0 +1,53 @@
# Serverless Inference API

**Instant Access to thousands of ML Models for Fast Prototyping**

Explore the most popular models for text, image, speech, and more — all with a simple API request. Build, test, and experiment without worrying about infrastructure or setup.

---

## Why use the Inference API?

The Serverless Inference API offers a fast and free way to explore thousands of models for a variety of tasks. Whether you're prototyping a new application or experimenting with ML capabilities, this API gives you instant access to high-performing models across multiple domains:

* **Text Generation:** Generate and experiment with high-quality responses from large language models, including tool-calling prompts.
* **Image Generation:** Easily create customized images, including with LoRAs for your own styles.
* **Document Embeddings:** Build search and retrieval systems with SOTA embeddings.
* **Classical AI Tasks:** Ready-to-use models for text classification, image classification, speech recognition, and more.

⚡ **Fast and Free to Get Started**: The Inference API is free, with higher rate limits for PRO users. For production needs, explore [Inference Endpoints](https://ui.endpoints.huggingface.co/) for dedicated resources, autoscaling, advanced security features, and more.

---

## Key Benefits

- 🚀 **Instant Prototyping:** Access powerful models without setup.
- 🎯 **Diverse Use Cases:** One API for text, image, and beyond.
- 🔧 **Developer-Friendly:** Simple requests, fast responses.

---

## Main Features

* Leverage over 800,000 models from different open-source libraries (transformers, sentence transformers, adapter transformers, diffusers, timm, etc.).
* Use models for a variety of tasks, including text generation, image generation, document embeddings, NER, summarization, image classification, and more.
* Accelerate your prototyping with GPU-powered models.
* Run very large models that are challenging to deploy in production.
* Get a production-grade platform without the hassle: built-in automatic scaling, load balancing, and caching.

---

## Contents

The documentation is organized into two sections:

* **Getting Started:** Learn the basics of how to use the Inference API.
* **API Reference:** Dive into task-specific settings and parameters.

---

## Looking for custom support from the Hugging Face team?

<a target="_blank" href="https://huggingface.co/support">
    <img alt="HuggingFace Expert Acceleration Program" src="https://cdn-media.huggingface.co/marketing/transformers/new-support-improved.png" style="max-width: 600px; border: 1px solid #eee; border-radius: 4px; box-shadow: 0 1px 2px 0 rgba(0, 0, 0, 0.05);">
</a><br>
@@ -0,0 +1,145 @@
# Parameters

## Additional Options

### Caching

There is a cache layer on the Inference API to speed up requests when the inputs are exactly the same. Many models, such as classifiers and embedding models, are deterministic, meaning repeated requests can reuse the cached results as-is. However, if you use a nondeterministic model, you can disable the cache so that each request triggers a genuinely new query.

To do this, add `x-use-cache: false` to the request headers. For example:

<inferencesnippet>

<curl>
```diff
curl https://api-inference.huggingface.co/models/MODEL_ID \
    -X POST \
    -d '{"inputs": "Can you please let us know more details about your "}' \
    -H "Authorization: Bearer hf_***" \
    -H "Content-Type: application/json" \
+   -H "x-use-cache: false"
```
</curl>

<python>
```diff
import requests

API_URL = "https://api-inference.huggingface.co/models/MODEL_ID"
headers = {
    "Authorization": "Bearer hf_***",
    "Content-Type": "application/json",
+   "x-use-cache": "false"
}
data = {
    "inputs": "Can you please let us know more details about your "
}
response = requests.post(API_URL, headers=headers, json=data)
print(response.json())
```
</python>

<js>
```diff
import fetch from "node-fetch";

async function query(data) {
    const response = await fetch(
        "https://api-inference.huggingface.co/models/MODEL_ID",
        {
            method: "POST",
            headers: {
                Authorization: `Bearer hf_***`,
                "Content-Type": "application/json",
+               "x-use-cache": "false"
            },
            body: JSON.stringify(data),
        }
    );
    const result = await response.json();
    return result;
}

query({
    inputs: "Can you please let us know more details about your "
}).then((response) => {
    console.log(JSON.stringify(response, null, 2));
});
```
</js>

</inferencesnippet>
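
To see the effect of the cache, you can time the same request with and without the header. A minimal sketch, assuming `MODEL_ID` and `hf_***` are replaced with a real model and token:

```python
import time

import requests

API_URL = "https://api-inference.huggingface.co/models/MODEL_ID"  # placeholder model
payload = {"inputs": "Can you please let us know more details about your "}


def timed_request(use_cache: bool) -> float:
    """Send one request and return how long it took, in seconds."""
    headers = {
        "Authorization": "Bearer hf_***",  # placeholder token
        "Content-Type": "application/json",
        "x-use-cache": "true" if use_cache else "false",
    }
    start = time.perf_counter()
    requests.post(API_URL, headers=headers, json=payload)
    return time.perf_counter() - start


# Warm the cache first, then compare a cached hit against a forced fresh query.
timed_request(use_cache=True)
print(f"cached:   {timed_request(use_cache=True):.2f}s")
print(f"uncached: {timed_request(use_cache=False):.2f}s")
```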

### Wait for the model

When a model is warm, it is ready to be used and you will get a response relatively quickly. However, some models are cold and need to be loaded before they can be used; in that case, the request returns a 503 error. Rather than repeatedly retrying until the model is loaded, you can ask the API to wait for it by adding `x-wait-for-model: true` to the request headers. We suggest using this flag only when you know the model is cold: first try the request without it, and only if you get a 503 error, retry with the flag.

<inferencesnippet>

<curl>
```diff
curl https://api-inference.huggingface.co/models/MODEL_ID \
    -X POST \
    -d '{"inputs": "Can you please let us know more details about your "}' \
    -H "Authorization: Bearer hf_***" \
    -H "Content-Type: application/json" \
+   -H "x-wait-for-model: true"
```
</curl>

<python>
```diff
import requests

API_URL = "https://api-inference.huggingface.co/models/MODEL_ID"
headers = {
    "Authorization": "Bearer hf_***",
    "Content-Type": "application/json",
+   "x-wait-for-model": "true"
}
data = {
    "inputs": "Can you please let us know more details about your "
}
response = requests.post(API_URL, headers=headers, json=data)
print(response.json())
```
</python>

<js>
```diff
import fetch from "node-fetch";

async function query(data) {
    const response = await fetch(
        "https://api-inference.huggingface.co/models/MODEL_ID",
        {
            method: "POST",
            headers: {
                Authorization: `Bearer hf_***`,
                "Content-Type": "application/json",
+               "x-wait-for-model": "true"
            },
            body: JSON.stringify(data),
        }
    );
    const result = await response.json();
    return result;
}

query({
    inputs: "Can you please let us know more details about your "
}).then((response) => {
    console.log(JSON.stringify(response, null, 2));
});
```
</js>

</inferencesnippet>
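
The suggested pattern can also be written out explicitly: try without the flag first, and only wait when the model turns out to be cold. A minimal sketch with the same placeholders as above:

```python
import requests

API_URL = "https://api-inference.huggingface.co/models/MODEL_ID"  # placeholder model
headers = {"Authorization": "Bearer hf_***", "Content-Type": "application/json"}
payload = {"inputs": "Can you please let us know more details about your "}

# First attempt without the flag: a warm model answers right away.
response = requests.post(API_URL, headers=headers, json=payload)

if response.status_code == 503:
    # The model is cold: retry once, asking the API to wait until it is loaded.
    response = requests.post(
        API_URL,
        headers={**headers, "x-wait-for-model": "true"},
        json=payload,
    )

print(response.json())
```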
@@ -0,0 +1,11 @@
# Rate Limits

The Inference API has rate limits based on the number of requests. These rate limits are subject to change in the future to become compute-based or token-based.

The Serverless Inference API is not meant for heavy production applications. If you need higher rate limits, consider [Inference Endpoints](https://huggingface.co/docs/inference/endpoints) to have dedicated resources.

| User Tier                | Rate Limit             |
|--------------------------|------------------------|
| Unregistered Users       | 1 request per hour     |
| Signed-up Users          | 300 requests per hour  |
| PRO and Enterprise Users | 1000 requests per hour |
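
If you exceed your tier's limit, the API will start rejecting requests, so clients should back off before retrying. A hedged sketch of one way to handle this (it assumes rate-limited requests come back with HTTP 429, and reuses the `MODEL_ID` and token placeholders from the parameters page):

```python
import time

import requests

API_URL = "https://api-inference.huggingface.co/models/MODEL_ID"  # placeholder model
headers = {"Authorization": "Bearer hf_***"}  # placeholder token


def query_with_backoff(payload: dict, max_retries: int = 5) -> dict:
    """POST to the API, doubling the wait after each rate-limited response."""
    delay = 1.0
    for _ in range(max_retries):
        response = requests.post(API_URL, headers=headers, json=payload)
        if response.status_code != 429:  # assumed rate-limit status code
            response.raise_for_status()
            return response.json()
        time.sleep(delay)
        delay *= 2
    raise RuntimeError("still rate-limited after retries")


print(query_with_backoff({"inputs": "Today is a great day"}))
```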