From 115958294434ed0e2e23322f7c1bde209f91dde9 Mon Sep 17 00:00:00 2001 From: Omar Sanseviero Date: Thu, 12 Sep 2024 16:37:42 +0200 Subject: [PATCH] New api docs structure (#1379) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit * Add draft of docs structure * Add index page * Prepare overview and rate limits * Manage redirects * Clean up * Apply suggestions from code review Co-authored-by: Pedro Cuenca * Apply suggestions from review * Add additional headers * Apply suggestions from code review Co-authored-by: Lucain * Incorporate reviewer's feedback * First draft for text-to-image, image-to-image + generate script (#1384) * First draft for text-to-image * add correct code snippets * Update docs/api-inference/tasks/text-to-image.md Co-authored-by: Omar Sanseviero * better table? * Generate tasks pages from script (#1386) * init project * first script to generate task pages * commit generated content * generate payload table as well * so undecisive * hey * better ? * Add image-to-image page * template for snippets section + few things * few things * Update scripts/api-inference/templates/specs_headers.handlebars Co-authored-by: Omar Sanseviero * Update scripts/api-inference/templates/specs_headers.handlebars Co-authored-by: Omar Sanseviero * generate * fetch inference status --------- Co-authored-by: Omar Sanseviero * Add getting started * Add draft of docs structure * Add index page * Prepare overview and rate limits * Manage redirects * Clean up * Apply suggestions from review * Apply suggestions from code review Co-authored-by: Pedro Cuenca * Add additional headers * Apply suggestions from code review Co-authored-by: Lucain * Incorporate reviewer's feedback * First draft for text-to-image, image-to-image + generate script (#1384) * First draft for text-to-image * add correct code snippets * Update docs/api-inference/tasks/text-to-image.md Co-authored-by: Omar Sanseviero * better table? * Generate tasks pages from script (#1386) * init project * first script to generate task pages * commit generated content * generate payload table as well * so undecisive * hey * better ? 
* Add image-to-image page * template for snippets section + few things * few things * Update scripts/api-inference/templates/specs_headers.handlebars Co-authored-by: Omar Sanseviero * Update scripts/api-inference/templates/specs_headers.handlebars Co-authored-by: Omar Sanseviero * generate * fetch inference status --------- Co-authored-by: Omar Sanseviero * Add getting started * Update docs/api-inference/getting_started.md Co-authored-by: Lucain * Draft to add text-generation parameters (#1393) * first draft to add text-generation parameters * headers * more structure * add chat-completion * better handling of arrays * better handling of parameters * Add new tasks pages (fill mask, summarization, question answering, sentence similarity) (#1394) * add fill mask * add summarization * add question answering * Table question answering * handle array output * Add sentence similarity * text classification (almost) * better with an enum * Add mask token * capitalize * remove sentence-similarity * Update docs/api-inference/tasks/table_question_answering.md Co-authored-by: Omar Sanseviero --------- Co-authored-by: Omar Sanseviero * mention chat completion in text generation docs * fix chat completion snippets --------- Co-authored-by: Omar Sanseviero * Filter out frozen models from API docs for tasks (#1396) * Filter out frozen models * use placeholder * New api docs suggestions (#1397) * show as diff * reorder toctree * wording update * diff * Add comment header on each task page (#1400) * Add comment header on each task page * add huggingface.co/api/tasks * Add even more tasks: token classification, translation and zero shot classification (#1398) * Add token classification * add translation task * add zero shot classification * more parameters * More tasks more tasks more tasks! 
(#1399) * add ASR * fix early stopping parameter * regenrate * add audio_classification * Image classification * Object detection * image segementation * unknown when we don't know * gen * feature extraction * update * regenerate * pull from main * coding style * Update _redirects.yml * Rename all tasks '_' to '-' (#1405) * Rename all tasks '_' to '-' * also for other urls * Update docs/api-inference/index.md Co-authored-by: Victor Muštar * Apply feedback for "new_api_docs" (#1408) * Update getting started examples * Move snippets above specification * custom link for finegrained token * Fixes new docs (#1413) * Misc changes * Wrap up * Apply suggestions from code review * generate * Add todos to avoid forgetting about them --------- Co-authored-by: Lucain Co-authored-by: Wauplin --------- Co-authored-by: Pedro Cuenca Co-authored-by: Lucain Co-authored-by: Wauplin Co-authored-by: Victor Muštar --- docs/TODOs.md | 11 + docs/api-inference/_redirects.yml | 5 + docs/api-inference/_toctree.yml | 54 ++ docs/api-inference/getting-started.md | 95 +++ docs/api-inference/index.md | 53 ++ docs/api-inference/parameters.md | 145 +++++ docs/api-inference/rate-limits.md | 11 + docs/api-inference/security.md | 15 + docs/api-inference/supported-models.md | 26 + .../tasks/audio-classification.md | 129 +++++ .../tasks/automatic-speech-recognition.md | 149 +++++ docs/api-inference/tasks/chat-completion.md | 222 +++++++ .../api-inference/tasks/feature-extraction.md | 130 +++++ docs/api-inference/tasks/fill-mask.md | 128 +++++ .../tasks/image-classification.md | 126 ++++ .../api-inference/tasks/image-segmentation.md | 129 +++++ docs/api-inference/tasks/image-to-image.md | 75 +++ docs/api-inference/tasks/object-detection.md | 130 +++++ .../api-inference/tasks/question-answering.md | 141 +++++ docs/api-inference/tasks/summarization.md | 124 ++++ .../tasks/table-question-answering.md | 150 +++++ .../tasks/text-classification.md | 129 +++++ docs/api-inference/tasks/text-generation.md | 216 +++++++ docs/api-inference/tasks/text-to-image.md | 133 +++++ .../tasks/token-classification.md | 146 +++++ docs/api-inference/tasks/translation.md | 126 ++++ .../tasks/zero-shot-classification.md | 129 +++++ scripts/api-inference/.gitignore | 1 + scripts/api-inference/.prettierignore | 5 + scripts/api-inference/README.md | 11 + scripts/api-inference/package.json | 26 + scripts/api-inference/pnpm-lock.yaml | 541 ++++++++++++++++++ scripts/api-inference/scripts/.gitignore | 1 + scripts/api-inference/scripts/generate.ts | 504 ++++++++++++++++ .../templates/common/page-header.handlebars | 11 + .../common/snippets-template.handlebars | 42 ++ .../templates/common/specs-headers.handlebars | 9 + .../templates/common/specs-output.handlebars | 9 + .../templates/common/specs-payload.handlebars | 9 + .../task/audio-classification.handlebars | 34 ++ .../automatic-speech-recognition.handlebars | 34 ++ .../templates/task/chat-completion.handlebars | 45 ++ .../task/feature-extraction.handlebars | 35 ++ .../templates/task/fill-mask.handlebars | 29 + .../task/image-classification.handlebars | 30 + .../task/image-segmentation.handlebars | 30 + .../templates/task/image-to-image.handlebars | 35 ++ .../task/object-detection.handlebars | 29 + .../task/question-answering.handlebars | 29 + .../templates/task/summarization.handlebars | 29 + .../task/table-question-answering.handlebars | 29 + .../task/text-classification.handlebars | 29 + .../templates/task/text-generation.handlebars | 39 ++ .../templates/task/text-to-image.handlebars | 29 + 
.../task/token-classification.handlebars | 38 ++ .../templates/task/translation.handlebars | 29 + .../task/zero-shot-classification.handlebars | 29 + scripts/api-inference/tsconfig.json | 20 + 58 files changed, 4697 insertions(+) create mode 100644 docs/TODOs.md create mode 100644 docs/api-inference/_redirects.yml create mode 100644 docs/api-inference/_toctree.yml create mode 100644 docs/api-inference/getting-started.md create mode 100644 docs/api-inference/index.md create mode 100644 docs/api-inference/parameters.md create mode 100644 docs/api-inference/rate-limits.md create mode 100644 docs/api-inference/security.md create mode 100644 docs/api-inference/supported-models.md create mode 100644 docs/api-inference/tasks/audio-classification.md create mode 100644 docs/api-inference/tasks/automatic-speech-recognition.md create mode 100644 docs/api-inference/tasks/chat-completion.md create mode 100644 docs/api-inference/tasks/feature-extraction.md create mode 100644 docs/api-inference/tasks/fill-mask.md create mode 100644 docs/api-inference/tasks/image-classification.md create mode 100644 docs/api-inference/tasks/image-segmentation.md create mode 100644 docs/api-inference/tasks/image-to-image.md create mode 100644 docs/api-inference/tasks/object-detection.md create mode 100644 docs/api-inference/tasks/question-answering.md create mode 100644 docs/api-inference/tasks/summarization.md create mode 100644 docs/api-inference/tasks/table-question-answering.md create mode 100644 docs/api-inference/tasks/text-classification.md create mode 100644 docs/api-inference/tasks/text-generation.md create mode 100644 docs/api-inference/tasks/text-to-image.md create mode 100644 docs/api-inference/tasks/token-classification.md create mode 100644 docs/api-inference/tasks/translation.md create mode 100644 docs/api-inference/tasks/zero-shot-classification.md create mode 100644 scripts/api-inference/.gitignore create mode 100644 scripts/api-inference/.prettierignore create mode 100644 scripts/api-inference/README.md create mode 100644 scripts/api-inference/package.json create mode 100644 scripts/api-inference/pnpm-lock.yaml create mode 100644 scripts/api-inference/scripts/.gitignore create mode 100644 scripts/api-inference/scripts/generate.ts create mode 100644 scripts/api-inference/templates/common/page-header.handlebars create mode 100644 scripts/api-inference/templates/common/snippets-template.handlebars create mode 100644 scripts/api-inference/templates/common/specs-headers.handlebars create mode 100644 scripts/api-inference/templates/common/specs-output.handlebars create mode 100644 scripts/api-inference/templates/common/specs-payload.handlebars create mode 100644 scripts/api-inference/templates/task/audio-classification.handlebars create mode 100644 scripts/api-inference/templates/task/automatic-speech-recognition.handlebars create mode 100644 scripts/api-inference/templates/task/chat-completion.handlebars create mode 100644 scripts/api-inference/templates/task/feature-extraction.handlebars create mode 100644 scripts/api-inference/templates/task/fill-mask.handlebars create mode 100644 scripts/api-inference/templates/task/image-classification.handlebars create mode 100644 scripts/api-inference/templates/task/image-segmentation.handlebars create mode 100644 scripts/api-inference/templates/task/image-to-image.handlebars create mode 100644 scripts/api-inference/templates/task/object-detection.handlebars create mode 100644 scripts/api-inference/templates/task/question-answering.handlebars create mode 100644 
scripts/api-inference/templates/task/summarization.handlebars create mode 100644 scripts/api-inference/templates/task/table-question-answering.handlebars create mode 100644 scripts/api-inference/templates/task/text-classification.handlebars create mode 100644 scripts/api-inference/templates/task/text-generation.handlebars create mode 100644 scripts/api-inference/templates/task/text-to-image.handlebars create mode 100644 scripts/api-inference/templates/task/token-classification.handlebars create mode 100644 scripts/api-inference/templates/task/translation.handlebars create mode 100644 scripts/api-inference/templates/task/zero-shot-classification.handlebars create mode 100644 scripts/api-inference/tsconfig.json diff --git a/docs/TODOs.md b/docs/TODOs.md new file mode 100644 index 000000000..659ee30ac --- /dev/null +++ b/docs/TODOs.md @@ -0,0 +1,11 @@ +## For API-Inference docs: + +From https://github.com/huggingface/hub-docs/pull/1413: +* Use ` for getting started +* Add some screenshots: supported models +* Add flow chart of how API works +* Add table with all tasks +* Add missing tasks: depth estimation and zero shot image classification +* Some tasks have no warm models, should we remove them for now? E.g. https://huggingface.co/models?inference=warm&pipeline_tag=fill-mask&sort=trending BUT many are cold and working, so actually linking to both could make sense - internal issue https://github.com/huggingface-internal/moon-landing/issues/10966 +* See also this [google doc](https://docs.google.com/document/d/1xy5Ug4C_qGbqp4x3T3rj_VOyjQzQLlyce-L6I_hYi94/edit?usp=sharing) +* Add CI to auto-generate the docs when handlebars template are updated \ No newline at end of file diff --git a/docs/api-inference/_redirects.yml b/docs/api-inference/_redirects.yml new file mode 100644 index 000000000..aab354ba5 --- /dev/null +++ b/docs/api-inference/_redirects.yml @@ -0,0 +1,5 @@ +quicktour: index +detailed_parameters: parameters +parallelism: getting_started +usage: getting_started +faq: index diff --git a/docs/api-inference/_toctree.yml b/docs/api-inference/_toctree.yml new file mode 100644 index 000000000..123f62ca4 --- /dev/null +++ b/docs/api-inference/_toctree.yml @@ -0,0 +1,54 @@ +- sections: + - local: index + title: Serverless Inference API + - local: getting-started + title: Getting Started + - local: supported-models + title: Supported Models + - local: rate-limits + title: Rate Limits + - local: security + title: Security + title: Getting Started +- sections: + - local: parameters + title: Parameters + - sections: + - local: tasks/audio-classification + title: Audio Classification + - local: tasks/automatic-speech-recognition + title: Automatic Speech Recognition + - local: tasks/chat-completion + title: Chat Completion + - local: tasks/feature-extraction + title: Feature Extraction + - local: tasks/fill-mask + title: Fill Mask + - local: tasks/image-classification + title: Image Classification + - local: tasks/image-segmentation + title: Image Segmentation + - local: tasks/image-to-image + title: Image to Image + - local: tasks/object-detection + title: Object Detection + - local: tasks/question-answering + title: Question Answering + - local: tasks/summarization + title: Summarization + - local: tasks/table-question-answering + title: Table Question Answering + - local: tasks/text-classification + title: Text Classification + - local: tasks/text-generation + title: Text Generation + - local: tasks/text-to-image + title: Text to Image + - local: tasks/token-classification + title: Token 
Classification + - local: tasks/translation + title: Translation + - local: tasks/zero-shot-classification + title: Zero Shot Classification + title: Detailed Task Parameters + title: API Reference \ No newline at end of file diff --git a/docs/api-inference/getting-started.md b/docs/api-inference/getting-started.md new file mode 100644 index 000000000..ea0007ba9 --- /dev/null +++ b/docs/api-inference/getting-started.md @@ -0,0 +1,95 @@ +# Getting Started + +The Serverless Inference API allows you to easily do inference on a wide range of models and tasks. You can do requests with your favorite tools (Python, cURL, etc). We also provide a Python SDK (`huggingface_hub`) to make it even easier. + +We'll do a minimal example using a [sentiment classification model](https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest). Please visit task-specific parameters and further documentation in our [API Reference](./parameters). + +## Getting a Token + +Using the Serverless Inference API requires passing a user token in the request headers. You can get a token by signing up on the Hugging Face website and then going to the [tokens page](https://huggingface.co/settings/tokens/new?globalPermissions=inference.serverless.write&tokenType=fineGrained). We recommend creating a `fine-grained` token with the scope to `Make calls to the serverless Inference API`. + +For more details about user tokens, check out [this guide](https://huggingface.co/docs/hub/en/security-tokens). + +## cURL + +```bash +curl 'https://api-inference.huggingface.co/models/cardiffnlp/twitter-roberta-base-sentiment-latest' \ +-H "Authorization: Bearer hf_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx" \ +-H 'Content-Type: application/json' \ +-d '{"inputs": "Today is a great day"}' +``` + +## Python + +You can use the `requests` library to make a request to the Inference API. + +```python +import requests + +API_URL = "https://api-inference.huggingface.co/models/cardiffnlp/twitter-roberta-base-sentiment-latest" +headers = {"Authorization": "Bearer hf_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"} +payload = { + "inputs": "Today is a great day", +} + +response = requests.post(API_URL, headers=headers, json=payload) +response.json() +``` + +Hugging Face also provides a [`InferenceClient`](https://huggingface.co/docs/huggingface_hub/guides/inference) that handles inference for you. Make sure to install it with `pip install huggingface_hub` first. + +```python +from huggingface_hub import InferenceClient + +client = InferenceClient( + "cardiffnlp/twitter-roberta-base-sentiment-latest", + token="hf_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx", +) + +client.text_classification("Today is a great day") +``` + +## JavaScript + +```js +import fetch from "node-fetch"; + +async function query(data) { + const response = await fetch( + "https://api-inference.huggingface.co/models/cardiffnlp/twitter-roberta-base-sentiment-latest", + { + method: "POST", + headers: { + Authorization: `Bearer hf_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx`, + "Content-Type": "application/json", + }, + body: JSON.stringify(data), + } + ); + const result = await response.json(); + return result; +} + +query({inputs: "Today is a great day"}).then((response) => { + console.log(JSON.stringify(response, null, 2)); +}); +``` + +Hugging Face also provides a [`HfInference`](https://huggingface.co/docs/huggingface.js/inference/classes/HfInference) client that handles inference. Make sure to install it with `npm install @huggingface/inference` first. 
+ +```js +import { HfInference } from "@huggingface/inference"; + +const inference = new HfInference("hf_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"); + +const result = await inference.textClassification({ + model: "cardiffnlp/twitter-roberta-base-sentiment-latest", + inputs: "Today is a great day", +}); + +console.log(result); +``` + +## Next Steps + +Now that you know the basics, you can explore the [API Reference](./parameters.md) to learn more about task-specific settings and parameters. \ No newline at end of file diff --git a/docs/api-inference/index.md b/docs/api-inference/index.md new file mode 100644 index 000000000..689bdf63b --- /dev/null +++ b/docs/api-inference/index.md @@ -0,0 +1,53 @@ +# Serverless Inference API + +**Instant Access to Thousands of ML Models for Fast Prototyping** + +Explore the most popular models for text, image, speech, and more — all with a simple API request. Build, test, and experiment without worrying about infrastructure or setup. + +--- + +## Why use the Inference API? + +The Serverless Inference API offers a fast and free way to explore thousands of models for a variety of tasks. Whether you're prototyping a new application or experimenting with ML capabilities, this API gives you instant access to high-performing models across multiple domains: + +* **Text Generation:** Generate and experiment with high-quality responses from large language models, including tool-calling prompts. +* **Image Generation:** Easily create customized images, including LoRAs for your own styles. +* **Document Embeddings:** Build search and retrieval systems with SOTA embeddings. +* **Classical AI Tasks:** Ready-to-use models for text classification, image classification, speech recognition, and more. + +⚡ **Fast and Free to Get Started**: The Inference API is free with higher rate limits for PRO users. For production needs, explore [Inference Endpoints](https://ui.endpoints.huggingface.co/) for dedicated resources, autoscaling, advanced security features, and more. + +--- + +## Key Benefits + +- 🚀 **Instant Prototyping:** Access powerful models without setup. +- 🎯 **Diverse Use Cases:** One API for text, image, and beyond. +- 🔧 **Developer-Friendly:** Simple requests, fast responses. + +--- + +## Main Features + +* Leverage over 800,000 models from different open-source libraries (transformers, sentence transformers, adapter transformers, diffusers, timm, etc.). +* Use models for a variety of tasks, including text generation, image generation, document embeddings, NER, summarization, image classification, and more. +* Accelerate your prototyping by using GPU-powered models. +* Run very large models that are challenging to deploy in production. +* Production-grade platform without the hassle: built-in automatic scaling, load balancing, and caching. + +--- + +## Contents + +The documentation is organized into two sections: + +* **Getting Started:** Learn the basics of how to use the Inference API. +* **API Reference:** Dive into task-specific settings and parameters. + +--- + +## Looking for custom support from the Hugging Face team? + + + HuggingFace Expert Acceleration Program
diff --git a/docs/api-inference/parameters.md b/docs/api-inference/parameters.md new file mode 100644 index 000000000..b225cafd5 --- /dev/null +++ b/docs/api-inference/parameters.md @@ -0,0 +1,145 @@ +# Parameters + + +## Additional Options + +### Caching + +There is a cache layer on the inference API to speed up requests when the inputs are exactly the same. Many models, such as classifiers and embedding models, can use those results as is if they are deterministic, meaning the results will be the same. However, if you use a nondeterministic model, you can disable the cache mechanism from being used, resulting in a real new query. + +To do this, you can add `x-use-cache:false` to the request headers. For example + + + + +```diff +curl https://api-inference.huggingface.co/models/MODEL_ID \ + -X POST \ + -d '{"inputs": "Can you please let us know more details about your "}' \ + -H "Authorization: Bearer hf_***" \ + -H "Content-Type: application/json" \ ++ -H "x-use-cache: false" +``` + + + +```diff +import requests + +API_URL = "https://api-inference.huggingface.co/models/MODEL_ID" +headers = { + "Authorization": "Bearer hf_***", + "Content-Type": "application/json", ++ "x-use-cache": "false" +} +data = { + "inputs": "Can you please let us know more details about your " +} +response = requests.post(API_URL, headers=headers, json=data) +print(response.json()) +``` + + + + +```diff +import fetch from "node-fetch"; + +async function query(data) { + const response = await fetch( + "https://api-inference.huggingface.co/models/MODEL_ID", + { + method: "POST", + headers: { + Authorization: `Bearer hf_***`, + "Content-Type": "application/json", ++ "x-use-cache": "false" + }, + body: JSON.stringify(data), + } + ); + const result = await response.json(); + return result; +} + +query({ + inputs: "Can you please let us know more details about your " +}).then((response) => { + console.log(JSON.stringify(response, null, 2)); +}); + +``` + + + + + +### Wait for the model + +When a model is warm, it is ready to be used and you will get a response relatively quickly. However, some models are cold and need to be loaded before they can be used. In that case, you will get a 503 error. Rather than doing many requests until it's loaded, you can wait for the model to be loaded by adding `x-wait-for-model:true` to the request headers. We suggest to only use this flag to wait for the model to be loaded when you are sure that the model is cold. That means, first try the request without this flag and only if you get a 503 error, try again with this flag. 
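For instance, here is a minimal Python sketch of this try-then-retry pattern (the model ID and token are placeholders); it only asks the API to load the model after a first request has come back with a 503:

```py
import requests

# Placeholders: replace with a real model ID and your own token.
API_URL = "https://api-inference.huggingface.co/models/MODEL_ID"
HEADERS = {"Authorization": "Bearer hf_***", "Content-Type": "application/json"}

def query(payload):
    # First attempt without the flag: a warm model answers right away.
    response = requests.post(API_URL, headers=HEADERS, json=payload)
    if response.status_code == 503:
        # The model is cold: retry once, asking the API to load it first.
        response = requests.post(
            API_URL,
            headers={**HEADERS, "x-wait-for-model": "true"},
            json=payload,
        )
    return response.json()

print(query({"inputs": "Can you please let us know more details about your "}))
```

The snippets below show how to set the header explicitly in each language.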
+ + + + + +```diff +curl https://api-inference.huggingface.co/models/MODEL_ID \ + -X POST \ + -d '{"inputs": "Can you please let us know more details about your "}' \ + -H "Authorization: Bearer hf_***" \ + -H "Content-Type: application/json" \ ++ -H "x-wait-for-model: true" +``` + + + +```diff +import requests + +API_URL = "https://api-inference.huggingface.co/models/MODEL_ID" +headers = { + "Authorization": "Bearer hf_***", + "Content-Type": "application/json", ++ "x-wait-for-model": "true" +} +data = { + "inputs": "Can you please let us know more details about your " +} +response = requests.post(API_URL, headers=headers, json=data) +print(response.json()) +``` + + + + +```diff +import fetch from "node-fetch"; + +async function query(data) { + const response = await fetch( + "https://api-inference.huggingface.co/models/MODEL_ID", + { + method: "POST", + headers: { + Authorization: `Bearer hf_***`, + "Content-Type": "application/json", ++ "x-wait-for-model": "true" + }, + body: JSON.stringify(data), + } + ); + const result = await response.json(); + return result; +} + +query({ + inputs: "Can you please let us know more details about your " +}).then((response) => { + console.log(JSON.stringify(response, null, 2)); +}); + +``` + + + + \ No newline at end of file diff --git a/docs/api-inference/rate-limits.md b/docs/api-inference/rate-limits.md new file mode 100644 index 000000000..3077b2884 --- /dev/null +++ b/docs/api-inference/rate-limits.md @@ -0,0 +1,11 @@ +# Rate Limits + +The Inference API has rate limits based on the number of requests. These rate limits are subject to change; in the future they may become compute-based or token-based. + +The Serverless API is not meant to be used for heavy production applications. If you need higher rate limits, consider [Inference Endpoints](https://huggingface.co/docs/inference/endpoints) for dedicated resources. + +| User Tier | Rate Limit | |---------------------|---------------------------| | Unregistered Users | 1 request per hour | | Signed-up Users | 300 requests per hour | | PRO and Enterprise Users | 1000 requests per hour | \ No newline at end of file diff --git a/docs/api-inference/security.md b/docs/api-inference/security.md new file mode 100644 index 000000000..428734361 --- /dev/null +++ b/docs/api-inference/security.md @@ -0,0 +1,15 @@ +# Security & Compliance + +The Inference API is not designed for heavy production requirements. For production needs, explore [Inference Endpoints](https://ui.endpoints.huggingface.co/) for dedicated resources, autoscaling, advanced security features, and more. + +## Data Security/Privacy + +Hugging Face does not store any user data for training purposes. Tokens sent to the API might be stored in a short-term (a few minutes) cache to speed up repeated requests. Logs are stored for debugging for up to 30 days. No other user data or tokens are stored. + +The Serverless Inference API uses TLS/SSL to encrypt data in transit. + +## Hub Security + +The Hugging Face Hub, of which the Serverless Inference API is part, is SOC 2 Type 2 certified.
For more on Hub security: https://huggingface.co/docs/hub/security + + \ No newline at end of file diff --git a/docs/api-inference/supported-models.md b/docs/api-inference/supported-models.md new file mode 100644 index 000000000..81c511f60 --- /dev/null +++ b/docs/api-inference/supported-models.md @@ -0,0 +1,26 @@ +# Supported Models + +Given the fast-paced nature of the open ML ecosystem, the Inference API exposes models that have large community interest and are in active use (based on recent likes, downloads, and usage). Because of this, deployed models can be swapped without prior notice. The Hugging Face stack aims to keep all the latest popular models warm and ready to use. + +You can find: + +* **[Warm models](https://huggingface.co/models?inference=warm&sort=trending):** models ready to be used. +* **[Cold models](https://huggingface.co/models?inference=cold&sort=trending):** models that are not loaded but can be used. +* **[Frozen models](https://huggingface.co/models?inference=frozen&sort=trending):** models that currently can't be run with the API. + +## What do I get with a PRO subscription? + +In addition to thousands of public models available in the Hub, PRO and Enterprise users get higher [rate limits](./rate-limits) and free access to the following models: + + +| Model | Size | Context Length | Use | +|--------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|----------------|--------------------------------------------------------------| +| Meta Llama 3.1 Instruct | [8B](https://huggingface.co/meta-llama/Meta-Llama-3.1-8B-Instruct), [70B](https://huggingface.co/meta-llama/Meta-Llama-3.1-70B-Instruct) | 128k tokens | High quality multilingual chat model with large context length | +| Meta Llama 3 Instruct | [8B](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct), [70B](https://huggingface.co/meta-llama/Meta-Llama-3-70B-Instruct) | 8k tokens | One of the best chat models | +| Llama 2 Chat | [7B](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf), [13B](https://huggingface.co/meta-llama/Llama-2-13b-chat-hf), [70B](https://huggingface.co/meta-llama/Llama-2-70b-chat-hf) | 4k tokens | One of the best conversational models | +| Bark | [0.9B](https://huggingface.co/suno/bark) | - | Text to audio generation | + + +## Running Private Models + +The free Serverless API is designed to run popular public models. If you have a private model, you can use [Inference Endpoints](https://huggingface.co/docs/inference/endpoints) to deploy it. diff --git a/docs/api-inference/tasks/audio-classification.md b/docs/api-inference/tasks/audio-classification.md new file mode 100644 index 000000000..b752e9ee3 --- /dev/null +++ b/docs/api-inference/tasks/audio-classification.md @@ -0,0 +1,129 @@ + + +## Audio Classification + +Audio classification is the task of assigning a label or class to a given audio. + +Example applications: +* Recognizing which command a user is giving +* Identifying a speaker +* Detecting the genre of a song + + + +For more details about the `audio-classification` task, check out its [dedicated page](https://huggingface.co/tasks/audio-classification)! You will find examples and related materials. + + + +### Recommended models + + +This is only a subset of the supported models. 
Find the model that suits you best [here](https://huggingface.co/models?inference=warm&pipeline_tag=audio-classification&sort=trending). + +### Using the API + + + + + +```bash +curl https://api-inference.huggingface.co/models/ \ + -X POST \ + --data-binary '@sample1.flac' \ + -H "Authorization: Bearer hf_***" + +``` + + + +```py +import requests + +API_URL = "https://api-inference.huggingface.co/models/" +headers = {"Authorization": "Bearer hf_***"} + +def query(filename): + with open(filename, "rb") as f: + data = f.read() + response = requests.post(API_URL, headers=headers, data=data) + return response.json() + +output = query("sample1.flac") +``` + +To use the Python client, see `huggingface_hub`'s [package reference](https://huggingface.co/docs/huggingface_hub/package_reference/inference_client#huggingface_hub.InferenceClient.audio_classification). + + + +```js +async function query(filename) { + const data = fs.readFileSync(filename); + const response = await fetch( + "https://api-inference.huggingface.co/models/", + { + headers: { + Authorization: "Bearer hf_***" + "Content-Type": "application/json", + }, + method: "POST", + body: data, + } + ); + const result = await response.json(); + return result; +} + +query("sample1.flac").then((response) => { + console.log(JSON.stringify(response)); +}); +``` + +To use the JavaScript client, see `huggingface.js`'s [package reference](https://huggingface.co/docs/huggingface.js/inference/classes/HfInference#audioclassification). + + + + + + +### API specification + +#### Request + +| Payload | | | +| :--- | :--- | :--- | +| **inputs*** | _string_ | The input audio data as a base64-encoded string. If no `parameters` are provided, you can also provide the audio data as a raw bytes payload. | +| **parameters** | _object_ | Additional inference parameters for Audio Classification | +| **        function_to_apply** | _enum_ | Possible values: sigmoid, softmax, none. | +| **        top_k** | _integer_ | When specified, limits the output to the top K most probable classes. | + + +Some options can be configured by passing headers to the Inference API. Here are the available headers: + +| Headers | | | +| :--- | :--- | :--- | +| **authorization** | _string_ | Authentication header in the form `'Bearer: hf_****'` when `hf_****` is a personal user access token with Inference API permission. You can generate one from [your settings page](https://huggingface.co/settings/tokens). | +| **x-use-cache** | _boolean, default to `true`_ | There is a cache layer on the inference API to speed up requests we have already seen. Most models can use those results as they are deterministic (meaning the outputs will be the same anyway). However, if you use a nondeterministic model, you can set this parameter to prevent the caching mechanism from being used, resulting in a real new query. Read more about caching [here](../parameters#caching]). | +| **x-wait-for-model** | _boolean, default to `false`_ | If the model is not ready, wait for it instead of receiving 503. It limits the number of requests required to get your inference done. It is advised to only set this flag to true after receiving a 503 error, as it will limit hanging in your application to known places. Read more about model availability [here](../overview#eligibility]). | + +For more information about Inference API headers, check out the parameters [guide](../parameters). + +#### Response + +| Body | | +| :--- | :--- | :--- | +| **(array)** | _object[]_ | Output is an array of objects. 
| +| **        label** | _string_ | The predicted class label. | +| **        score** | _number_ | The corresponding probability. | + diff --git a/docs/api-inference/tasks/automatic-speech-recognition.md b/docs/api-inference/tasks/automatic-speech-recognition.md new file mode 100644 index 000000000..7d7a2cc0a --- /dev/null +++ b/docs/api-inference/tasks/automatic-speech-recognition.md @@ -0,0 +1,149 @@ + + +## Automatic Speech Recognition + +Automatic Speech Recognition (ASR), also known as Speech to Text (STT), is the task of transcribing a given audio to text. + +Example applications: +* Transcribing a podcast +* Building a voice assistant +* Generating subtitles for a video + + + +For more details about the `automatic-speech-recognition` task, check out its [dedicated page](https://huggingface.co/tasks/automatic-speech-recognition)! You will find examples and related materials. + + + +### Recommended models + +- [openai/whisper-large-v3](https://huggingface.co/openai/whisper-large-v3): A powerful ASR model by OpenAI. +- [facebook/seamless-m4t-v2-large](https://huggingface.co/facebook/seamless-m4t-v2-large): An end-to-end model that performs ASR and Speech Translation by MetaAI. +- [pyannote/speaker-diarization-3.1](https://huggingface.co/pyannote/speaker-diarization-3.1): Powerful speaker diarization model. + +This is only a subset of the supported models. Find the model that suits you best [here](https://huggingface.co/models?inference=warm&pipeline_tag=automatic-speech-recognition&sort=trending). + +### Using the API + + + + + +```bash +curl https://api-inference.huggingface.co/models/openai/whisper-large-v3 \ + -X POST \ + --data-binary '@sample1.flac' \ + -H "Authorization: Bearer hf_***" + +``` + + + +```py +import requests + +API_URL = "https://api-inference.huggingface.co/models/openai/whisper-large-v3" +headers = {"Authorization": "Bearer hf_***"} + +def query(filename): + with open(filename, "rb") as f: + data = f.read() + response = requests.post(API_URL, headers=headers, data=data) + return response.json() + +output = query("sample1.flac") +``` + +To use the Python client, see `huggingface_hub`'s [package reference](https://huggingface.co/docs/huggingface_hub/package_reference/inference_client#huggingface_hub.InferenceClient.automatic_speech-recognition). + + + +```js +async function query(filename) { + const data = fs.readFileSync(filename); + const response = await fetch( + "https://api-inference.huggingface.co/models/openai/whisper-large-v3", + { + headers: { + Authorization: "Bearer hf_***" + "Content-Type": "application/json", + }, + method: "POST", + body: data, + } + ); + const result = await response.json(); + return result; +} + +query("sample1.flac").then((response) => { + console.log(JSON.stringify(response)); +}); +``` + +To use the JavaScript client, see `huggingface.js`'s [package reference](https://huggingface.co/docs/huggingface.js/inference/classes/HfInference#automaticspeech-recognition). + + + + + + +### API specification + +#### Request + +| Payload | | | +| :--- | :--- | :--- | +| **inputs*** | _string_ | The input audio data as a base64-encoded string. If no `parameters` are provided, you can also provide the audio data as a raw bytes payload. 
| +| **parameters** | _object_ | Additional inference parameters for Automatic Speech Recognition | +| **        return_timestamps** | _boolean_ | Whether to output corresponding timestamps with the generated text | +| **        generate** | _object_ | Ad-hoc parametrization of the text generation process | +| **                temperature** | _number_ | The value used to modulate the next token probabilities. | +| **                top_k** | _integer_ | The number of highest probability vocabulary tokens to keep for top-k-filtering. | +| **                top_p** | _number_ | If set to float < 1, only the smallest set of most probable tokens with probabilities that add up to top_p or higher are kept for generation. | +| **                typical_p** | _number_ | Local typicality measures how similar the conditional probability of predicting a target token next is to the expected conditional probability of predicting a random token next, given the partial text already generated. If set to float < 1, the smallest set of the most locally typical tokens with probabilities that add up to typical_p or higher are kept for generation. See [this paper](https://hf.co/papers/2202.00666) for more details. | +| **                epsilon_cutoff** | _number_ | If set to float strictly between 0 and 1, only tokens with a conditional probability greater than epsilon_cutoff will be sampled. In the paper, suggested values range from 3e-4 to 9e-4, depending on the size of the model. See [Truncation Sampling as Language Model Desmoothing](https://hf.co/papers/2210.15191) for more details. | +| **                eta_cutoff** | _number_ | Eta sampling is a hybrid of locally typical sampling and epsilon sampling. If set to float strictly between 0 and 1, a token is only considered if it is greater than either eta_cutoff or sqrt(eta_cutoff) * exp(-entropy(softmax(next_token_logits))). The latter term is intuitively the expected next token probability, scaled by sqrt(eta_cutoff). In the paper, suggested values range from 3e-4 to 2e-3, depending on the size of the model. See [Truncation Sampling as Language Model Desmoothing](https://hf.co/papers/2210.15191) for more details. | +| **                max_length** | _integer_ | The maximum length (in tokens) of the generated text, including the input. | +| **                max_new_tokens** | _integer_ | The maximum number of tokens to generate. Takes precedence over maxLength. | +| **                min_length** | _integer_ | The minimum length (in tokens) of the generated text, including the input. | +| **                min_new_tokens** | _integer_ | The minimum number of tokens to generate. Takes precedence over maxLength. | +| **                do_sample** | _boolean_ | Whether to use sampling instead of greedy decoding when generating new tokens. | +| **                early_stopping** | _enum_ | Possible values: never, true, false. | +| **                num_beams** | _integer_ | Number of beams to use for beam search. | +| **                num_beam_groups** | _integer_ | Number of groups to divide num_beams into in order to ensure diversity among different groups of beams. See [this paper](https://hf.co/papers/1610.02424) for more details. | +| **                penalty_alpha** | _number_ | The value balances the model confidence and the degeneration penalty in contrastive search decoding. 
| +| **                use_cache** | _boolean_ | Whether the model should use the past last key/values attentions to speed up decoding | + + +Some options can be configured by passing headers to the Inference API. Here are the available headers: + +| Headers | | | +| :--- | :--- | :--- | +| **authorization** | _string_ | Authentication header in the form `'Bearer: hf_****'` when `hf_****` is a personal user access token with Inference API permission. You can generate one from [your settings page](https://huggingface.co/settings/tokens). | +| **x-use-cache** | _boolean, default to `true`_ | There is a cache layer on the inference API to speed up requests we have already seen. Most models can use those results as they are deterministic (meaning the outputs will be the same anyway). However, if you use a nondeterministic model, you can set this parameter to prevent the caching mechanism from being used, resulting in a real new query. Read more about caching [here](../parameters#caching]). | +| **x-wait-for-model** | _boolean, default to `false`_ | If the model is not ready, wait for it instead of receiving 503. It limits the number of requests required to get your inference done. It is advised to only set this flag to true after receiving a 503 error, as it will limit hanging in your application to known places. Read more about model availability [here](../overview#eligibility]). | + +For more information about Inference API headers, check out the parameters [guide](../parameters). + +#### Response + +| Body | | +| :--- | :--- | :--- | +| **text** | _string_ | The recognized text. | +| **chunks** | _object[]_ | When returnTimestamps is enabled, chunks contains a list of audio chunks identified by the model. | +| **        text** | _string_ | A chunk of text identified by the model | +| **        timestamps** | _number[]_ | The start and end timestamps corresponding with the text | + diff --git a/docs/api-inference/tasks/chat-completion.md b/docs/api-inference/tasks/chat-completion.md new file mode 100644 index 000000000..249318eda --- /dev/null +++ b/docs/api-inference/tasks/chat-completion.md @@ -0,0 +1,222 @@ + + +## Chat Completion + +Generate a response given a list of messages. +This is a subtask of [`text-generation`](./text_generation) designed to generate responses in a conversational context. + + + +### Recommended models + +- [google/gemma-2-2b-it](https://huggingface.co/google/gemma-2-2b-it): A text-generation model trained to follow instructions. +- [meta-llama/Meta-Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3.1-8B-Instruct): Very powerful text generation model trained to follow instructions. +- [microsoft/Phi-3-mini-4k-instruct](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct): Small yet powerful text generation model. +- [HuggingFaceH4/starchat2-15b-v0.1](https://huggingface.co/HuggingFaceH4/starchat2-15b-v0.1): Strong coding assistant model. +- [mistralai/Mistral-Nemo-Instruct-2407](https://huggingface.co/mistralai/Mistral-Nemo-Instruct-2407): Very strong open-source large language model. + + + +### Using the API + +The API supports: + +* Using the chat completion API compatible with the OpenAI SDK. +* Using grammars, constraints, and tools. 
+* Streaming the output + + + + + +```bash +curl 'https://api-inference.huggingface.co/models/google/gemma-2-2b-it/v1/chat/completions' \ +-H "Authorization: Bearer hf_***" \ +-H 'Content-Type: application/json' \ +-d '{ + "model": "google/gemma-2-2b-it", + "messages": [{"role": "user", "content": "What is the capital of France?"}], + "max_tokens": 500, + "stream": false +}' + +``` + + + +```py +from huggingface_hub import InferenceClient + +client = InferenceClient( + "google/gemma-2-2b-it", + token="hf_***", +) + +for message in client.chat_completion( + messages=[{"role": "user", "content": "What is the capital of France?"}], + max_tokens=500, + stream=True, +): + print(message.choices[0].delta.content, end="") + +``` + +To use the Python client, see `huggingface_hub`'s [package reference](https://huggingface.co/docs/huggingface_hub/package_reference/inference_client#huggingface_hub.InferenceClient.chat_completion). + + + +```js +import { HfInference } from "@huggingface/inference"; + +const inference = new HfInference("hf_***"); + +for await (const chunk of inference.chatCompletionStream({ + model: "google/gemma-2-2b-it", + messages: [{ role: "user", content: "What is the capital of France?" }], + max_tokens: 500, +})) { + process.stdout.write(chunk.choices[0]?.delta?.content || ""); +} + +``` + +To use the JavaScript client, see `huggingface.js`'s [package reference](https://huggingface.co/docs/huggingface.js/inference/classes/HfInference#chatcompletion). + + + + + + +### API specification + +#### Request + +| Payload | | | +| :--- | :--- | :--- | +| **frequency_penalty** | _number_ | Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim. | +| **logprobs** | _boolean_ | Whether to return log probabilities of the output tokens or not. If true, returns the log probabilities of each output token returned in the content of message. | +| **max_tokens** | _integer_ | The maximum number of tokens that can be generated in the chat completion. | +| **messages*** | _object[]_ | A list of messages comprising the conversation so far. | +| **        content** | _string_ | | +| **        name** | _string_ | | +| **        role*** | _string_ | | +| **        tool_calls** | _object[]_ | | +| **                function*** | _object_ | | +| **                        arguments*** | _unknown_ | | +| **                        description** | _string_ | | +| **                        name*** | _string_ | | +| **                id*** | _integer_ | | +| **                type*** | _string_ | | +| **presence_penalty** | _number_ | Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics | +| **seed** | _integer_ | | +| **stop** | _string[]_ | Up to 4 sequences where the API will stop generating further tokens. | +| **stream** | _boolean_ | | +| **temperature** | _number_ | What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or `top_p` but not both. | +| **tool_choice** | _unknown_ | One of the following: | +| **         (#1)** | | | +| **                FunctionName*** | _string_ | | +| **         (#2)** | | Possible values: OneOf. 
| +| **tool_prompt** | _string_ | A prompt to be appended before the tools | +| **tools** | _object[]_ | A list of tools the model may call. Currently, only functions are supported as a tool. Use this to provide a list of functions the model may generate JSON inputs for. | +| **        function*** | _object_ | | +| **                arguments*** | _unknown_ | | +| **                description** | _string_ | | +| **                name*** | _string_ | | +| **        type*** | _string_ | | +| **top_logprobs** | _integer_ | An integer between 0 and 5 specifying the number of most likely tokens to return at each token position, each with an associated log probability. logprobs must be set to true if this parameter is used. | +| **top_p** | _number_ | An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. | + + +Some options can be configured by passing headers to the Inference API. Here are the available headers: + +| Headers | | | +| :--- | :--- | :--- | +| **authorization** | _string_ | Authentication header in the form `'Bearer: hf_****'` when `hf_****` is a personal user access token with Inference API permission. You can generate one from [your settings page](https://huggingface.co/settings/tokens). | +| **x-use-cache** | _boolean, default to `true`_ | There is a cache layer on the inference API to speed up requests we have already seen. Most models can use those results as they are deterministic (meaning the outputs will be the same anyway). However, if you use a nondeterministic model, you can set this parameter to prevent the caching mechanism from being used, resulting in a real new query. Read more about caching [here](../parameters#caching]). | +| **x-wait-for-model** | _boolean, default to `false`_ | If the model is not ready, wait for it instead of receiving 503. It limits the number of requests required to get your inference done. It is advised to only set this flag to true after receiving a 503 error, as it will limit hanging in your application to known places. Read more about model availability [here](../overview#eligibility]). | + +For more information about Inference API headers, check out the parameters [guide](../parameters). + +#### Response + +Output type depends on the `stream` input parameter. 
+If `stream` is `false` (default), the response will be a JSON object with the following fields: + +| Body | | +| :--- | :--- | :--- | +| **choices** | _object[]_ | | +| **        finish_reason** | _string_ | | +| **        index** | _integer_ | | +| **        logprobs** | _object_ | | +| **                content** | _object[]_ | | +| **                        logprob** | _number_ | | +| **                        token** | _string_ | | +| **                        top_logprobs** | _object[]_ | | +| **                                logprob** | _number_ | | +| **                                token** | _string_ | | +| **        message** | _object_ | | +| **                content** | _string_ | | +| **                name** | _string_ | | +| **                role** | _string_ | | +| **                tool_calls** | _object[]_ | | +| **                        function** | _object_ | | +| **                                arguments** | _unknown_ | | +| **                                description** | _string_ | | +| **                                name** | _string_ | | +| **                        id** | _integer_ | | +| **                        type** | _string_ | | +| **created** | _integer_ | | +| **id** | _string_ | | +| **model** | _string_ | | +| **object** | _string_ | | +| **system_fingerprint** | _string_ | | +| **usage** | _object_ | | +| **        completion_tokens** | _integer_ | | +| **        prompt_tokens** | _integer_ | | +| **        total_tokens** | _integer_ | | + + +If `stream` is `true`, generated tokens are returned as a stream, using Server-Sent Events (SSE). +For more information about streaming, check out [this guide](https://huggingface.co/docs/text-generation-inference/conceptual/streaming). + +| Body | | +| :--- | :--- | :--- | +| **choices** | _object[]_ | | +| **        delta** | _object_ | | +| **                content** | _string_ | | +| **                role** | _string_ | | +| **                tool_calls** | _object_ | | +| **                        function** | _object_ | | +| **                                arguments** | _string_ | | +| **                                name** | _string_ | | +| **                        id** | _string_ | | +| **                        index** | _integer_ | | +| **                        type** | _string_ | | +| **        finish_reason** | _string_ | | +| **        index** | _integer_ | | +| **        logprobs** | _object_ | | +| **                content** | _object[]_ | | +| **                        logprob** | _number_ | | +| **                        token** | _string_ | | +| **                        top_logprobs** | _object[]_ | | +| **                                logprob** | _number_ | | +| **                                token** | _string_ | | +| **created** | _integer_ | | +| **id** | _string_ | | +| **model** | _string_ | | +| **object** | _string_ | | +| **system_fingerprint** | _string_ | | + + diff --git a/docs/api-inference/tasks/feature-extraction.md b/docs/api-inference/tasks/feature-extraction.md new file mode 100644 index 000000000..6eb99703f --- /dev/null +++ b/docs/api-inference/tasks/feature-extraction.md @@ -0,0 +1,130 @@ + + +## Feature Extraction + +Feature extraction is the task of converting a text into a vector (often called "embedding"). + +Example applications: +* Retrieving the most relevant documents for a query (for RAG applications). +* Reranking a list of documents based on their similarity to a query. +* Calculating the similarity between two sentences. 
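For instance, a minimal Python sketch of the sentence-similarity use case above could embed two sentences and compare them with cosine similarity. The token is a placeholder, and the sketch flattens the returned array since the exact nesting of the output can vary:

```py
import numpy as np
import requests

API_URL = "https://api-inference.huggingface.co/models/thenlper/gte-large"
headers = {"Authorization": "Bearer hf_***"}  # placeholder token

def embed(text):
    # One request per sentence; flatten in case the vector comes back nested.
    response = requests.post(API_URL, headers=headers, json={"inputs": text})
    return np.array(response.json()).flatten()

a = embed("I love ice cream.")
b = embed("Ice cream is my favorite dessert.")
similarity = float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
print(similarity)
```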
+ + + +For more details about the `feature-extraction` task, check out its [dedicated page](https://huggingface.co/tasks/feature-extraction)! You will find examples and related materials. + + + +### Recommended models + +- [thenlper/gte-large](https://huggingface.co/thenlper/gte-large): A powerful feature extraction model for natural language processing tasks. + +This is only a subset of the supported models. Find the model that suits you best [here](https://huggingface.co/models?inference=warm&pipeline_tag=feature-extraction&sort=trending). + +### Using the API + + + + + +```bash +curl https://api-inference.huggingface.co/models/thenlper/gte-large \ + -X POST \ + -d '{"inputs": "Today is a sunny day and I will get some ice cream."}' \ + -H 'Content-Type: application/json' \ + -H "Authorization: Bearer hf_***" + +``` + + + +```py +import requests + +API_URL = "https://api-inference.huggingface.co/models/thenlper/gte-large" +headers = {"Authorization": "Bearer hf_***"} + +def query(payload): + response = requests.post(API_URL, headers=headers, json=payload) + return response.json() + +output = query({ + "inputs": "Today is a sunny day and I will get some ice cream.", +}) +``` + +To use the Python client, see `huggingface_hub`'s [package reference](https://huggingface.co/docs/huggingface_hub/package_reference/inference_client#huggingface_hub.InferenceClient.feature_extraction). + + + +```js +async function query(data) { + const response = await fetch( + "https://api-inference.huggingface.co/models/thenlper/gte-large", + { + headers: { + Authorization: "Bearer hf_***" + "Content-Type": "application/json", + }, + method: "POST", + body: JSON.stringify(data), + } + ); + const result = await response.json(); + return result; +} + +query({"inputs": "Today is a sunny day and I will get some ice cream."}).then((response) => { + console.log(JSON.stringify(response)); +}); +``` + +To use the JavaScript client, see `huggingface.js`'s [package reference](https://huggingface.co/docs/huggingface.js/inference/classes/HfInference#featureextraction). + + + + + + +### API specification + +#### Request + +| Payload | | | +| :--- | :--- | :--- | +| **inputs*** | _string_ | The text to embed. | +| **normalize** | _boolean_ | | +| **prompt_name** | _string_ | The name of the prompt that should be used by for encoding. If not set, no prompt will be applied. Must be a key in the `Sentence Transformers` configuration `prompts` dictionary. For example if ``prompt_name`` is "query" and the ``prompts`` is {"query": "query: ", ...}, then the sentence "What is the capital of France?" will be encoded as "query: What is the capital of France?" because the prompt text will be prepended before any text to encode. | +| **truncate** | _boolean_ | | +| **truncation_direction** | _enum_ | Possible values: Left, Right. | + + +Some options can be configured by passing headers to the Inference API. Here are the available headers: + +| Headers | | | +| :--- | :--- | :--- | +| **authorization** | _string_ | Authentication header in the form `'Bearer: hf_****'` when `hf_****` is a personal user access token with Inference API permission. You can generate one from [your settings page](https://huggingface.co/settings/tokens). | +| **x-use-cache** | _boolean, default to `true`_ | There is a cache layer on the inference API to speed up requests we have already seen. Most models can use those results as they are deterministic (meaning the outputs will be the same anyway). 
However, if you use a nondeterministic model, you can set this parameter to prevent the caching mechanism from being used, resulting in a real new query. Read more about caching [here](../parameters#caching]). | +| **x-wait-for-model** | _boolean, default to `false`_ | If the model is not ready, wait for it instead of receiving 503. It limits the number of requests required to get your inference done. It is advised to only set this flag to true after receiving a 503 error, as it will limit hanging in your application to known places. Read more about model availability [here](../overview#eligibility]). | + +For more information about Inference API headers, check out the parameters [guide](../parameters). + +#### Response + +| Body | | +| :--- | :--- | :--- | +| **(array)** | _array[]_ | Output is an array of arrays. | + + diff --git a/docs/api-inference/tasks/fill-mask.md b/docs/api-inference/tasks/fill-mask.md new file mode 100644 index 000000000..d25591df6 --- /dev/null +++ b/docs/api-inference/tasks/fill-mask.md @@ -0,0 +1,128 @@ + + +## Fill-mask + +Mask filling is the task of predicting the right word (token to be precise) in the middle of a sequence. + + + +For more details about the `fill-mask` task, check out its [dedicated page](https://huggingface.co/tasks/fill-mask)! You will find examples and related materials. + + + +### Recommended models + +- [google-bert/bert-base-uncased](https://huggingface.co/google-bert/bert-base-uncased): The famous BERT model. +- [FacebookAI/xlm-roberta-base](https://huggingface.co/FacebookAI/xlm-roberta-base): A multilingual model trained on 100 languages. + +This is only a subset of the supported models. Find the model that suits you best [here](https://huggingface.co/models?inference=warm&pipeline_tag=fill-mask&sort=trending). + +### Using the API + + + + + +```bash +curl https://api-inference.huggingface.co/models/google-bert/bert-base-uncased \ + -X POST \ + -d '{"inputs": "The answer to the universe is [MASK]."}' \ + -H 'Content-Type: application/json' \ + -H "Authorization: Bearer hf_***" + +``` + + + +```py +import requests + +API_URL = "https://api-inference.huggingface.co/models/google-bert/bert-base-uncased" +headers = {"Authorization": "Bearer hf_***"} + +def query(payload): + response = requests.post(API_URL, headers=headers, json=payload) + return response.json() + +output = query({ + "inputs": "The answer to the universe is [MASK].", +}) +``` + +To use the Python client, see `huggingface_hub`'s [package reference](https://huggingface.co/docs/huggingface_hub/package_reference/inference_client#huggingface_hub.InferenceClient.fill_mask). + + + +```js +async function query(data) { + const response = await fetch( + "https://api-inference.huggingface.co/models/google-bert/bert-base-uncased", + { + headers: { + Authorization: "Bearer hf_***" + "Content-Type": "application/json", + }, + method: "POST", + body: JSON.stringify(data), + } + ); + const result = await response.json(); + return result; +} + +query({"inputs": "The answer to the universe is [MASK]."}).then((response) => { + console.log(JSON.stringify(response)); +}); +``` + +To use the JavaScript client, see `huggingface.js`'s [package reference](https://huggingface.co/docs/huggingface.js/inference/classes/HfInference#fillmask). 
+ + + + + + +### API specification + +#### Request + +| Payload | | | +| :--- | :--- | :--- | +| **inputs*** | _string_ | The text with masked tokens | +| **parameters** | _object_ | Additional inference parameters for Fill Mask | +| **        top_k** | _integer_ | When passed, overrides the number of predictions to return. | +| **        targets** | _string[]_ | When passed, the model will limit the scores to the passed targets instead of looking up in the whole vocabulary. If the provided targets are not in the model vocab, they will be tokenized and the first resulting token will be used (with a warning, and that might be slower). | + + +Some options can be configured by passing headers to the Inference API. Here are the available headers: + +| Headers | | | +| :--- | :--- | :--- | +| **authorization** | _string_ | Authentication header in the form `'Bearer: hf_****'` when `hf_****` is a personal user access token with Inference API permission. You can generate one from [your settings page](https://huggingface.co/settings/tokens). | +| **x-use-cache** | _boolean, default to `true`_ | There is a cache layer on the inference API to speed up requests we have already seen. Most models can use those results as they are deterministic (meaning the outputs will be the same anyway). However, if you use a nondeterministic model, you can set this parameter to prevent the caching mechanism from being used, resulting in a real new query. Read more about caching [here](../parameters#caching]). | +| **x-wait-for-model** | _boolean, default to `false`_ | If the model is not ready, wait for it instead of receiving 503. It limits the number of requests required to get your inference done. It is advised to only set this flag to true after receiving a 503 error, as it will limit hanging in your application to known places. Read more about model availability [here](../overview#eligibility]). | + +For more information about Inference API headers, check out the parameters [guide](../parameters). + +#### Response + +| Body | | +| :--- | :--- | :--- | +| **(array)** | _object[]_ | Output is an array of objects. | +| **        sequence** | _string_ | The corresponding input with the mask token prediction. | +| **        score** | _number_ | The corresponding probability | +| **        token** | _integer_ | The predicted token id (to replace the masked one). | +| **        token_str** | _string_ | The predicted token (to replace the masked one). | + diff --git a/docs/api-inference/tasks/image-classification.md b/docs/api-inference/tasks/image-classification.md new file mode 100644 index 000000000..53f5f734f --- /dev/null +++ b/docs/api-inference/tasks/image-classification.md @@ -0,0 +1,126 @@ + + +## Image Classification + +Image classification is the task of assigning a label or class to an entire image. Images are expected to have only one class for each image. + + + +For more details about the `image-classification` task, check out its [dedicated page](https://huggingface.co/tasks/image-classification)! You will find examples and related materials. + + + +### Recommended models + +- [google/vit-base-patch16-224](https://huggingface.co/google/vit-base-patch16-224): A strong image classification model. + +This is only a subset of the supported models. Find the model that suits you best [here](https://huggingface.co/models?inference=warm&pipeline_tag=image-classification&sort=trending). 
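+
+If you prefer to explore the catalogue programmatically, a rough sketch with `huggingface_hub` could look like the following (assuming the installed version of `list_models` supports the `task`, `sort` and `limit` arguments):
+
+```py
+from huggingface_hub import list_models
+
+# List a few popular image-classification models on the Hub.
+# The exact filter arguments may differ between huggingface_hub versions.
+for model in list_models(task="image-classification", sort="downloads", limit=5):
+    print(model.id)
+```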
+ +### Using the API + + + + + +```bash +curl https://api-inference.huggingface.co/models/google/vit-base-patch16-224 \ + -X POST \ + --data-binary '@cats.jpg' \ + -H "Authorization: Bearer hf_***" + +``` + + + +```py +import requests + +API_URL = "https://api-inference.huggingface.co/models/google/vit-base-patch16-224" +headers = {"Authorization": "Bearer hf_***"} + +def query(filename): + with open(filename, "rb") as f: + data = f.read() + response = requests.post(API_URL, headers=headers, data=data) + return response.json() + +output = query("cats.jpg") +``` + +To use the Python client, see `huggingface_hub`'s [package reference](https://huggingface.co/docs/huggingface_hub/package_reference/inference_client#huggingface_hub.InferenceClient.image_classification). + + + +```js +async function query(filename) { + const data = fs.readFileSync(filename); + const response = await fetch( + "https://api-inference.huggingface.co/models/google/vit-base-patch16-224", + { + headers: { + Authorization: "Bearer hf_***" + "Content-Type": "application/json", + }, + method: "POST", + body: data, + } + ); + const result = await response.json(); + return result; +} + +query("cats.jpg").then((response) => { + console.log(JSON.stringify(response)); +}); +``` + +To use the JavaScript client, see `huggingface.js`'s [package reference](https://huggingface.co/docs/huggingface.js/inference/classes/HfInference#imageclassification). + + + + + + +### API specification + +#### Request + +| Payload | | | +| :--- | :--- | :--- | +| **inputs*** | _string_ | The input image data as a base64-encoded string. If no `parameters` are provided, you can also provide the image data as a raw bytes payload. | +| **parameters** | _object_ | Additional inference parameters for Image Classification | +| **        function_to_apply** | _enum_ | Possible values: sigmoid, softmax, none. | +| **        top_k** | _integer_ | When specified, limits the output to the top K most probable classes. | + + +Some options can be configured by passing headers to the Inference API. Here are the available headers: + +| Headers | | | +| :--- | :--- | :--- | +| **authorization** | _string_ | Authentication header in the form `'Bearer: hf_****'` when `hf_****` is a personal user access token with Inference API permission. You can generate one from [your settings page](https://huggingface.co/settings/tokens). | +| **x-use-cache** | _boolean, default to `true`_ | There is a cache layer on the inference API to speed up requests we have already seen. Most models can use those results as they are deterministic (meaning the outputs will be the same anyway). However, if you use a nondeterministic model, you can set this parameter to prevent the caching mechanism from being used, resulting in a real new query. Read more about caching [here](../parameters#caching]). | +| **x-wait-for-model** | _boolean, default to `false`_ | If the model is not ready, wait for it instead of receiving 503. It limits the number of requests required to get your inference done. It is advised to only set this flag to true after receiving a 503 error, as it will limit hanging in your application to known places. Read more about model availability [here](../overview#eligibility]). | + +For more information about Inference API headers, check out the parameters [guide](../parameters). + +#### Response + +| Body | | +| :--- | :--- | :--- | +| **(array)** | _object[]_ | Output is an array of objects. | +| **        label** | _string_ | The predicted class label. 
| +| **        score** | _number_ | The corresponding probability. | + + diff --git a/docs/api-inference/tasks/image-segmentation.md b/docs/api-inference/tasks/image-segmentation.md new file mode 100644 index 000000000..367e4b397 --- /dev/null +++ b/docs/api-inference/tasks/image-segmentation.md @@ -0,0 +1,129 @@ + + +## Image Segmentation + +Image Segmentation divides an image into segments where each pixel in the image is mapped to an object. + + + +For more details about the `image-segmentation` task, check out its [dedicated page](https://huggingface.co/tasks/image-segmentation)! You will find examples and related materials. + + + +### Recommended models + +- [nvidia/segformer-b0-finetuned-ade-512-512](https://huggingface.co/nvidia/segformer-b0-finetuned-ade-512-512): Semantic segmentation model trained on ADE20k benchmark dataset with 512x512 resolution. + +This is only a subset of the supported models. Find the model that suits you best [here](https://huggingface.co/models?inference=warm&pipeline_tag=image-segmentation&sort=trending). + +### Using the API + + + + + +```bash +curl https://api-inference.huggingface.co/models/nvidia/segformer-b0-finetuned-ade-512-512 \ + -X POST \ + --data-binary '@cats.jpg' \ + -H "Authorization: Bearer hf_***" + +``` + + + +```py +import requests + +API_URL = "https://api-inference.huggingface.co/models/nvidia/segformer-b0-finetuned-ade-512-512" +headers = {"Authorization": "Bearer hf_***"} + +def query(filename): + with open(filename, "rb") as f: + data = f.read() + response = requests.post(API_URL, headers=headers, data=data) + return response.json() + +output = query("cats.jpg") +``` + +To use the Python client, see `huggingface_hub`'s [package reference](https://huggingface.co/docs/huggingface_hub/package_reference/inference_client#huggingface_hub.InferenceClient.image_segmentation). + + + +```js +async function query(filename) { + const data = fs.readFileSync(filename); + const response = await fetch( + "https://api-inference.huggingface.co/models/nvidia/segformer-b0-finetuned-ade-512-512", + { + headers: { + Authorization: "Bearer hf_***" + "Content-Type": "application/json", + }, + method: "POST", + body: data, + } + ); + const result = await response.json(); + return result; +} + +query("cats.jpg").then((response) => { + console.log(JSON.stringify(response)); +}); +``` + +To use the JavaScript client, see `huggingface.js`'s [package reference](https://huggingface.co/docs/huggingface.js/inference/classes/HfInference#imagesegmentation). + + + + + + +### API specification + +#### Request + +| Payload | | | +| :--- | :--- | :--- | +| **inputs*** | _string_ | The input image data as a base64-encoded string. If no `parameters` are provided, you can also provide the image data as a raw bytes payload. | +| **parameters** | _object_ | Additional inference parameters for Image Segmentation | +| **        mask_threshold** | _number_ | Threshold to use when turning the predicted masks into binary values. | +| **        overlap_mask_area_threshold** | _number_ | Mask overlap threshold to eliminate small, disconnected segments. | +| **        subtask** | _enum_ | Possible values: instance, panoptic, semantic. | +| **        threshold** | _number_ | Probability threshold to filter out predicted masks. | + + +Some options can be configured by passing headers to the Inference API. 
Here are the available headers:
+
+| Headers | | |
+| :--- | :--- | :--- |
+| **authorization** | _string_ | Authentication header in the form `'Bearer: hf_****'` when `hf_****` is a personal user access token with Inference API permission. You can generate one from [your settings page](https://huggingface.co/settings/tokens). |
+| **x-use-cache** | _boolean, default to `true`_ | There is a cache layer on the inference API to speed up requests we have already seen. Most models can use those results as they are deterministic (meaning the outputs will be the same anyway). However, if you use a nondeterministic model, you can set this parameter to prevent the caching mechanism from being used, resulting in a real new query. Read more about caching [here](../parameters#caching). |
+| **x-wait-for-model** | _boolean, default to `false`_ | If the model is not ready, wait for it instead of receiving 503. It limits the number of requests required to get your inference done. It is advised to only set this flag to true after receiving a 503 error, as it will limit hanging in your application to known places. Read more about model availability [here](../overview#eligibility). |
+
+For more information about Inference API headers, check out the parameters [guide](../parameters).
+
+#### Response
+
+| Body | | |
+| :--- | :--- | :--- |
+| **(array)** | _object[]_ | A predicted mask / segment |
+| **        label** | _string_ | The label of the predicted segment. |
+| **        mask** | _string_ | The corresponding mask as a black-and-white image (base64-encoded). |
+| **        score** | _number_ | The model's confidence score for the predicted segment. |
+
+
diff --git a/docs/api-inference/tasks/image-to-image.md b/docs/api-inference/tasks/image-to-image.md
new file mode 100644
index 000000000..7b5cfaad4
--- /dev/null
+++ b/docs/api-inference/tasks/image-to-image.md
@@ -0,0 +1,75 @@
+
+
+## Image to Image
+
+Image-to-image is the task of transforming a source image to match the characteristics of a target image or a target image domain.
+
+Example applications:
+* Transferring the style of an image to another image
+* Colorizing a black-and-white image
+* Increasing the resolution of an image
+
+
+
+For more details about the `image-to-image` task, check out its [dedicated page](https://huggingface.co/tasks/image-to-image)! You will find examples and related materials.
+
+
+
+### Recommended models
+
+- [timbrooks/instruct-pix2pix](https://huggingface.co/timbrooks/instruct-pix2pix): A model that takes an image and an instruction to edit the image.
+
+This is only a subset of the supported models. Find the model that suits you best [here](https://huggingface.co/models?inference=warm&pipeline_tag=image-to-image&sort=trending).
+
+### Using the API
+
+
+No snippet available for this task.
+
+
+
+### API specification
+
+#### Request
+
+| Payload | | |
+| :--- | :--- | :--- |
+| **inputs*** | _string_ | The input image data as a base64-encoded string. If no `parameters` are provided, you can also provide the image data as a raw bytes payload. |
+| **parameters** | _object_ | Additional inference parameters for Image To Image |
+| **        guidance_scale** | _number_ | For diffusion models. A higher guidance scale value encourages the model to generate images closely linked to the text prompt at the expense of lower image quality. |
+| **        negative_prompt** | _string[]_ | One or several prompts to guide what NOT to include in image generation.
| +| **        num_inference_steps** | _integer_ | For diffusion models. The number of denoising steps. More denoising steps usually lead to a higher quality image at the expense of slower inference. | +| **        target_size** | _object_ | The size in pixel of the output image. | +| **                width*** | _integer_ | | +| **                height*** | _integer_ | | + + +Some options can be configured by passing headers to the Inference API. Here are the available headers: + +| Headers | | | +| :--- | :--- | :--- | +| **authorization** | _string_ | Authentication header in the form `'Bearer: hf_****'` when `hf_****` is a personal user access token with Inference API permission. You can generate one from [your settings page](https://huggingface.co/settings/tokens). | +| **x-use-cache** | _boolean, default to `true`_ | There is a cache layer on the inference API to speed up requests we have already seen. Most models can use those results as they are deterministic (meaning the outputs will be the same anyway). However, if you use a nondeterministic model, you can set this parameter to prevent the caching mechanism from being used, resulting in a real new query. Read more about caching [here](../parameters#caching]). | +| **x-wait-for-model** | _boolean, default to `false`_ | If the model is not ready, wait for it instead of receiving 503. It limits the number of requests required to get your inference done. It is advised to only set this flag to true after receiving a 503 error, as it will limit hanging in your application to known places. Read more about model availability [here](../overview#eligibility]). | + +For more information about Inference API headers, check out the parameters [guide](../parameters). + +#### Response + +| Body | | +| :--- | :--- | :--- | +| **image** | _unknown_ | The output image returned as raw bytes in the payload. | + + diff --git a/docs/api-inference/tasks/object-detection.md b/docs/api-inference/tasks/object-detection.md new file mode 100644 index 000000000..fc8d989c1 --- /dev/null +++ b/docs/api-inference/tasks/object-detection.md @@ -0,0 +1,130 @@ + + +## Object detection + +Object Detection models allow users to identify objects of certain defined classes. These models receive an image as input and output the images with bounding boxes and labels on detected objects. + + + +For more details about the `object-detection` task, check out its [dedicated page](https://huggingface.co/tasks/object-detection)! You will find examples and related materials. + + + +### Recommended models + +- [facebook/detr-resnet-50](https://huggingface.co/facebook/detr-resnet-50): Solid object detection model trained on the benchmark dataset COCO 2017. +- [microsoft/beit-base-patch16-224-pt22k-ft22k](https://huggingface.co/microsoft/beit-base-patch16-224-pt22k-ft22k): Strong object detection model trained on ImageNet-21k dataset. + +This is only a subset of the supported models. Find the model that suits you best [here](https://huggingface.co/models?inference=warm&pipeline_tag=object-detection&sort=trending). 
+ +### Using the API + + + + + +```bash +curl https://api-inference.huggingface.co/models/facebook/detr-resnet-50 \ + -X POST \ + --data-binary '@cats.jpg' \ + -H "Authorization: Bearer hf_***" + +``` + + + +```py +import requests + +API_URL = "https://api-inference.huggingface.co/models/facebook/detr-resnet-50" +headers = {"Authorization": "Bearer hf_***"} + +def query(filename): + with open(filename, "rb") as f: + data = f.read() + response = requests.post(API_URL, headers=headers, data=data) + return response.json() + +output = query("cats.jpg") +``` + +To use the Python client, see `huggingface_hub`'s [package reference](https://huggingface.co/docs/huggingface_hub/package_reference/inference_client#huggingface_hub.InferenceClient.object_detection). + + + +```js +async function query(filename) { + const data = fs.readFileSync(filename); + const response = await fetch( + "https://api-inference.huggingface.co/models/facebook/detr-resnet-50", + { + headers: { + Authorization: "Bearer hf_***" + "Content-Type": "application/json", + }, + method: "POST", + body: data, + } + ); + const result = await response.json(); + return result; +} + +query("cats.jpg").then((response) => { + console.log(JSON.stringify(response)); +}); +``` + +To use the JavaScript client, see `huggingface.js`'s [package reference](https://huggingface.co/docs/huggingface.js/inference/classes/HfInference#objectdetection). + + + + + + +### API specification + +#### Request + +| Payload | | | +| :--- | :--- | :--- | +| **inputs*** | _string_ | The input image data as a base64-encoded string. If no `parameters` are provided, you can also provide the image data as a raw bytes payload. | +| **parameters** | _object_ | Additional inference parameters for Object Detection | +| **        threshold** | _number_ | The probability necessary to make a prediction. | + + +Some options can be configured by passing headers to the Inference API. Here are the available headers: + +| Headers | | | +| :--- | :--- | :--- | +| **authorization** | _string_ | Authentication header in the form `'Bearer: hf_****'` when `hf_****` is a personal user access token with Inference API permission. You can generate one from [your settings page](https://huggingface.co/settings/tokens). | +| **x-use-cache** | _boolean, default to `true`_ | There is a cache layer on the inference API to speed up requests we have already seen. Most models can use those results as they are deterministic (meaning the outputs will be the same anyway). However, if you use a nondeterministic model, you can set this parameter to prevent the caching mechanism from being used, resulting in a real new query. Read more about caching [here](../parameters#caching]). | +| **x-wait-for-model** | _boolean, default to `false`_ | If the model is not ready, wait for it instead of receiving 503. It limits the number of requests required to get your inference done. It is advised to only set this flag to true after receiving a 503 error, as it will limit hanging in your application to known places. Read more about model availability [here](../overview#eligibility]). | + +For more information about Inference API headers, check out the parameters [guide](../parameters). + +#### Response + +| Body | | +| :--- | :--- | :--- | +| **(array)** | _object[]_ | Output is an array of objects. | +| **        label** | _string_ | The predicted label for the bounding box. | +| **        score** | _number_ | The associated score / probability. 
| +| **        box** | _object_ | | +| **                xmin** | _integer_ | The x-coordinate of the top-left corner of the bounding box. | +| **                xmax** | _integer_ | The x-coordinate of the bottom-right corner of the bounding box. | +| **                ymin** | _integer_ | The y-coordinate of the top-left corner of the bounding box. | +| **                ymax** | _integer_ | The y-coordinate of the bottom-right corner of the bounding box. | + diff --git a/docs/api-inference/tasks/question-answering.md b/docs/api-inference/tasks/question-answering.md new file mode 100644 index 000000000..73ccfa13b --- /dev/null +++ b/docs/api-inference/tasks/question-answering.md @@ -0,0 +1,141 @@ + + +## Question Answering + +Question Answering models can retrieve the answer to a question from a given text, which is useful for searching for an answer in a document. + + + +For more details about the `question-answering` task, check out its [dedicated page](https://huggingface.co/tasks/question-answering)! You will find examples and related materials. + + + +### Recommended models + +- [deepset/roberta-base-squad2](https://huggingface.co/deepset/roberta-base-squad2): A robust baseline model for most question answering domains. +- [distilbert/distilbert-base-cased-distilled-squad](https://huggingface.co/distilbert/distilbert-base-cased-distilled-squad): Small yet robust model that can answer questions. + +This is only a subset of the supported models. Find the model that suits you best [here](https://huggingface.co/models?inference=warm&pipeline_tag=question-answering&sort=trending). + +### Using the API + + + + + +```bash +curl https://api-inference.huggingface.co/models/deepset/roberta-base-squad2 \ + -X POST \ + -d '{"inputs": { "question": "What is my name?", "context": "My name is Clara and I live in Berkeley." }}' \ + -H 'Content-Type: application/json' \ + -H "Authorization: Bearer hf_***" + +``` + + + +```py +import requests + +API_URL = "https://api-inference.huggingface.co/models/deepset/roberta-base-squad2" +headers = {"Authorization": "Bearer hf_***"} + +def query(payload): + response = requests.post(API_URL, headers=headers, json=payload) + return response.json() + +output = query({ + "inputs": { + "question": "What is my name?", + "context": "My name is Clara and I live in Berkeley." +}, +}) +``` + +To use the Python client, see `huggingface_hub`'s [package reference](https://huggingface.co/docs/huggingface_hub/package_reference/inference_client#huggingface_hub.InferenceClient.question_answering). + + + +```js +async function query(data) { + const response = await fetch( + "https://api-inference.huggingface.co/models/deepset/roberta-base-squad2", + { + headers: { + Authorization: "Bearer hf_***" + "Content-Type": "application/json", + }, + method: "POST", + body: JSON.stringify(data), + } + ); + const result = await response.json(); + return result; +} + +query({"inputs": { + "question": "What is my name?", + "context": "My name is Clara and I live in Berkeley." +}}).then((response) => { + console.log(JSON.stringify(response)); +}); +``` + +To use the JavaScript client, see `huggingface.js`'s [package reference](https://huggingface.co/docs/huggingface.js/inference/classes/HfInference#questionanswering). 
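+
+For reference, the equivalent call with the Python `InferenceClient` might look like the sketch below (placeholder token, recent `huggingface_hub` assumed):
+
+```py
+from huggingface_hub import InferenceClient
+
+client = InferenceClient(model="deepset/roberta-base-squad2", token="hf_***")
+
+# The answer comes back with its score and the character span in the context.
+answer = client.question_answering(
+    question="What is my name?",
+    context="My name is Clara and I live in Berkeley.",
+)
+print(answer)
+```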
+ + + + + + +### API specification + +#### Request + +| Payload | | | +| :--- | :--- | :--- | +| **inputs*** | _object_ | One (context, question) pair to answer | +| **        context*** | _string_ | The context to be used for answering the question | +| **        question*** | _string_ | The question to be answered | +| **parameters** | _object_ | Additional inference parameters for Question Answering | +| **        top_k** | _integer_ | The number of answers to return (will be chosen by order of likelihood). Note that we return less than topk answers if there are not enough options available within the context. | +| **        doc_stride** | _integer_ | If the context is too long to fit with the question for the model, it will be split in several chunks with some overlap. This argument controls the size of that overlap. | +| **        max_answer_len** | _integer_ | The maximum length of predicted answers (e.g., only answers with a shorter length are considered). | +| **        max_seq_len** | _integer_ | The maximum length of the total sentence (context + question) in tokens of each chunk passed to the model. The context will be split in several chunks (using docStride as overlap) if needed. | +| **        max_question_len** | _integer_ | The maximum length of the question after tokenization. It will be truncated if needed. | +| **        handle_impossible_answer** | _boolean_ | Whether to accept impossible as an answer. | +| **        align_to_words** | _boolean_ | Attempts to align the answer to real words. Improves quality on space separated languages. Might hurt on non-space-separated languages (like Japanese or Chinese) | + + +Some options can be configured by passing headers to the Inference API. Here are the available headers: + +| Headers | | | +| :--- | :--- | :--- | +| **authorization** | _string_ | Authentication header in the form `'Bearer: hf_****'` when `hf_****` is a personal user access token with Inference API permission. You can generate one from [your settings page](https://huggingface.co/settings/tokens). | +| **x-use-cache** | _boolean, default to `true`_ | There is a cache layer on the inference API to speed up requests we have already seen. Most models can use those results as they are deterministic (meaning the outputs will be the same anyway). However, if you use a nondeterministic model, you can set this parameter to prevent the caching mechanism from being used, resulting in a real new query. Read more about caching [here](../parameters#caching]). | +| **x-wait-for-model** | _boolean, default to `false`_ | If the model is not ready, wait for it instead of receiving 503. It limits the number of requests required to get your inference done. It is advised to only set this flag to true after receiving a 503 error, as it will limit hanging in your application to known places. Read more about model availability [here](../overview#eligibility]). | + +For more information about Inference API headers, check out the parameters [guide](../parameters). + +#### Response + +| Body | | +| :--- | :--- | :--- | +| **(array)** | _object[]_ | Output is an array of objects. | +| **        answer** | _string_ | The answer to the question. | +| **        score** | _number_ | The probability associated to the answer. | +| **        start** | _integer_ | The character position in the input where the answer begins. | +| **        end** | _integer_ | The character position in the input where the answer ends. 
| + diff --git a/docs/api-inference/tasks/summarization.md b/docs/api-inference/tasks/summarization.md new file mode 100644 index 000000000..c10a1828b --- /dev/null +++ b/docs/api-inference/tasks/summarization.md @@ -0,0 +1,124 @@ + + +## Summarization + +Summarization is the task of producing a shorter version of a document while preserving its important information. Some models can extract text from the original input, while other models can generate entirely new text. + + + +For more details about the `summarization` task, check out its [dedicated page](https://huggingface.co/tasks/summarization)! You will find examples and related materials. + + + +### Recommended models + +- [facebook/bart-large-cnn](https://huggingface.co/facebook/bart-large-cnn): A strong summarization model trained on English news articles. Excels at generating factual summaries. + +This is only a subset of the supported models. Find the model that suits you best [here](https://huggingface.co/models?inference=warm&pipeline_tag=summarization&sort=trending). + +### Using the API + + + + + +```bash +curl https://api-inference.huggingface.co/models/facebook/bart-large-cnn \ + -X POST \ + -d '{"inputs": "The tower is 324 metres (1,063 ft) tall, about the same height as an 81-storey building, and the tallest structure in Paris. Its base is square, measuring 125 metres (410 ft) on each side. During its construction, the Eiffel Tower surpassed the Washington Monument to become the tallest man-made structure in the world, a title it held for 41 years until the Chrysler Building in New York City was finished in 1930. It was the first structure to reach a height of 300 metres. Due to the addition of a broadcasting aerial at the top of the tower in 1957, it is now taller than the Chrysler Building by 5.2 metres (17 ft). Excluding transmitters, the Eiffel Tower is the second tallest free-standing structure in France after the Millau Viaduct."}' \ + -H 'Content-Type: application/json' \ + -H "Authorization: Bearer hf_***" + +``` + + + +```py +import requests + +API_URL = "https://api-inference.huggingface.co/models/facebook/bart-large-cnn" +headers = {"Authorization": "Bearer hf_***"} + +def query(payload): + response = requests.post(API_URL, headers=headers, json=payload) + return response.json() + +output = query({ + "inputs": "The tower is 324 metres (1,063 ft) tall, about the same height as an 81-storey building, and the tallest structure in Paris. Its base is square, measuring 125 metres (410 ft) on each side. During its construction, the Eiffel Tower surpassed the Washington Monument to become the tallest man-made structure in the world, a title it held for 41 years until the Chrysler Building in New York City was finished in 1930. It was the first structure to reach a height of 300 metres. Due to the addition of a broadcasting aerial at the top of the tower in 1957, it is now taller than the Chrysler Building by 5.2 metres (17 ft). Excluding transmitters, the Eiffel Tower is the second tallest free-standing structure in France after the Millau Viaduct.", +}) +``` + +To use the Python client, see `huggingface_hub`'s [package reference](https://huggingface.co/docs/huggingface_hub/package_reference/inference_client#huggingface_hub.InferenceClient.summarization). 
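+
+For example, a minimal sketch with the Python client (assuming a recent `huggingface_hub` and the same placeholder token) could be:
+
+```py
+from huggingface_hub import InferenceClient
+
+client = InferenceClient(model="facebook/bart-large-cnn", token="hf_***")
+
+# Summarize a long passage; the result contains the generated summary text.
+summary = client.summarization(
+    "The tower is 324 metres (1,063 ft) tall, about the same height as an "
+    "81-storey building, and the tallest structure in Paris."
+)
+print(summary)
+```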
+ + + +```js +async function query(data) { + const response = await fetch( + "https://api-inference.huggingface.co/models/facebook/bart-large-cnn", + { + headers: { + Authorization: "Bearer hf_***" + "Content-Type": "application/json", + }, + method: "POST", + body: JSON.stringify(data), + } + ); + const result = await response.json(); + return result; +} + +query({"inputs": "The tower is 324 metres (1,063 ft) tall, about the same height as an 81-storey building, and the tallest structure in Paris. Its base is square, measuring 125 metres (410 ft) on each side. During its construction, the Eiffel Tower surpassed the Washington Monument to become the tallest man-made structure in the world, a title it held for 41 years until the Chrysler Building in New York City was finished in 1930. It was the first structure to reach a height of 300 metres. Due to the addition of a broadcasting aerial at the top of the tower in 1957, it is now taller than the Chrysler Building by 5.2 metres (17 ft). Excluding transmitters, the Eiffel Tower is the second tallest free-standing structure in France after the Millau Viaduct."}).then((response) => { + console.log(JSON.stringify(response)); +}); +``` + +To use the JavaScript client, see `huggingface.js`'s [package reference](https://huggingface.co/docs/huggingface.js/inference/classes/HfInference#summarization). + + + + + + +### API specification + +#### Request + +| Payload | | | +| :--- | :--- | :--- | +| **inputs*** | _string_ | The input text to summarize. | +| **parameters** | _object_ | Additional inference parameters for summarization. | +| **        clean_up_tokenization_spaces** | _boolean_ | Whether to clean up the potential extra spaces in the text output. | +| **        truncation** | _enum_ | Possible values: do_not_truncate, longest_first, only_first, only_second. | +| **        generate_parameters** | _object_ | Additional parametrization of the text generation algorithm. | + + +Some options can be configured by passing headers to the Inference API. Here are the available headers: + +| Headers | | | +| :--- | :--- | :--- | +| **authorization** | _string_ | Authentication header in the form `'Bearer: hf_****'` when `hf_****` is a personal user access token with Inference API permission. You can generate one from [your settings page](https://huggingface.co/settings/tokens). | +| **x-use-cache** | _boolean, default to `true`_ | There is a cache layer on the inference API to speed up requests we have already seen. Most models can use those results as they are deterministic (meaning the outputs will be the same anyway). However, if you use a nondeterministic model, you can set this parameter to prevent the caching mechanism from being used, resulting in a real new query. Read more about caching [here](../parameters#caching]). | +| **x-wait-for-model** | _boolean, default to `false`_ | If the model is not ready, wait for it instead of receiving 503. It limits the number of requests required to get your inference done. It is advised to only set this flag to true after receiving a 503 error, as it will limit hanging in your application to known places. Read more about model availability [here](../overview#eligibility]). | + +For more information about Inference API headers, check out the parameters [guide](../parameters). + +#### Response + +| Body | | +| :--- | :--- | :--- | +| **summary_text** | _string_ | The summarized text. 
| + diff --git a/docs/api-inference/tasks/table-question-answering.md b/docs/api-inference/tasks/table-question-answering.md new file mode 100644 index 000000000..3eb659892 --- /dev/null +++ b/docs/api-inference/tasks/table-question-answering.md @@ -0,0 +1,150 @@ + + +## Table Question Answering + +Table Question Answering (Table QA) is the answering a question about an information on a given table. + + + +For more details about the `table-question-answering` task, check out its [dedicated page](https://huggingface.co/tasks/table-question-answering)! You will find examples and related materials. + + + +### Recommended models + + +This is only a subset of the supported models. Find the model that suits you best [here](https://huggingface.co/models?inference=warm&pipeline_tag=table-question-answering&sort=trending). + +### Using the API + + + + + +```bash +curl https://api-inference.huggingface.co/models/ \ + -X POST \ + -d '{"inputs": { "query": "How many stars does the transformers repository have?", "table": { "Repository": ["Transformers", "Datasets", "Tokenizers"], "Stars": ["36542", "4512", "3934"], "Contributors": ["651", "77", "34"], "Programming language": [ "Python", "Python", "Rust, Python and NodeJS" ] } }}' \ + -H 'Content-Type: application/json' \ + -H "Authorization: Bearer hf_***" + +``` + + + +```py +import requests + +API_URL = "https://api-inference.huggingface.co/models/" +headers = {"Authorization": "Bearer hf_***"} + +def query(payload): + response = requests.post(API_URL, headers=headers, json=payload) + return response.json() + +output = query({ + "inputs": { + "query": "How many stars does the transformers repository have?", + "table": { + "Repository": ["Transformers", "Datasets", "Tokenizers"], + "Stars": ["36542", "4512", "3934"], + "Contributors": ["651", "77", "34"], + "Programming language": [ + "Python", + "Python", + "Rust, Python and NodeJS" + ] + } +}, +}) +``` + +To use the Python client, see `huggingface_hub`'s [package reference](https://huggingface.co/docs/huggingface_hub/package_reference/inference_client#huggingface_hub.InferenceClient.table_question-answering). + + + +```js +async function query(data) { + const response = await fetch( + "https://api-inference.huggingface.co/models/", + { + headers: { + Authorization: "Bearer hf_***" + "Content-Type": "application/json", + }, + method: "POST", + body: JSON.stringify(data), + } + ); + const result = await response.json(); + return result; +} + +query({"inputs": { + "query": "How many stars does the transformers repository have?", + "table": { + "Repository": ["Transformers", "Datasets", "Tokenizers"], + "Stars": ["36542", "4512", "3934"], + "Contributors": ["651", "77", "34"], + "Programming language": [ + "Python", + "Python", + "Rust, Python and NodeJS" + ] + } +}}).then((response) => { + console.log(JSON.stringify(response)); +}); +``` + +To use the JavaScript client, see `huggingface.js`'s [package reference](https://huggingface.co/docs/huggingface.js/inference/classes/HfInference#tablequestion-answering). + + + + + + +### API specification + +#### Request + +| Payload | | | +| :--- | :--- | :--- | +| **inputs*** | _object_ | One (table, question) pair to answer | +| **        table*** | _object_ | The table to serve as context for the questions | +| **        question*** | _string_ | The question to be answered about the table | +| **parameters** | _object_ | Additional inference parameters for Table Question Answering | + + +Some options can be configured by passing headers to the Inference API. 
Here are the available headers: + +| Headers | | | +| :--- | :--- | :--- | +| **authorization** | _string_ | Authentication header in the form `'Bearer: hf_****'` when `hf_****` is a personal user access token with Inference API permission. You can generate one from [your settings page](https://huggingface.co/settings/tokens). | +| **x-use-cache** | _boolean, default to `true`_ | There is a cache layer on the inference API to speed up requests we have already seen. Most models can use those results as they are deterministic (meaning the outputs will be the same anyway). However, if you use a nondeterministic model, you can set this parameter to prevent the caching mechanism from being used, resulting in a real new query. Read more about caching [here](../parameters#caching]). | +| **x-wait-for-model** | _boolean, default to `false`_ | If the model is not ready, wait for it instead of receiving 503. It limits the number of requests required to get your inference done. It is advised to only set this flag to true after receiving a 503 error, as it will limit hanging in your application to known places. Read more about model availability [here](../overview#eligibility]). | + +For more information about Inference API headers, check out the parameters [guide](../parameters). + +#### Response + +| Body | | +| :--- | :--- | :--- | +| **(array)** | _object[]_ | Output is an array of objects. | +| **        answer** | _string_ | The answer of the question given the table. If there is an aggregator, the answer will be preceded by `AGGREGATOR >`. | +| **        coordinates** | _array[]_ | Coordinates of the cells of the answers. | +| **        cells** | _string[]_ | List of strings made up of the answer cell values. | +| **        aggregator** | _string_ | If the model has an aggregator, this returns the aggregator. | + diff --git a/docs/api-inference/tasks/text-classification.md b/docs/api-inference/tasks/text-classification.md new file mode 100644 index 000000000..2ddea6833 --- /dev/null +++ b/docs/api-inference/tasks/text-classification.md @@ -0,0 +1,129 @@ + + +## Text Classification + +Text Classification is the task of assigning a label or class to a given text. Some use cases are sentiment analysis, natural language inference, and assessing grammatical correctness. + + + +For more details about the `text-classification` task, check out its [dedicated page](https://huggingface.co/tasks/text-classification)! You will find examples and related materials. + + + +### Recommended models + +- [distilbert/distilbert-base-uncased-finetuned-sst-2-english](https://huggingface.co/distilbert/distilbert-base-uncased-finetuned-sst-2-english): A robust model trained for sentiment analysis. +- [ProsusAI/finbert](https://huggingface.co/ProsusAI/finbert): A sentiment analysis model specialized in financial sentiment. +- [cardiffnlp/twitter-roberta-base-sentiment-latest](https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest): A sentiment analysis model specialized in analyzing tweets. +- [papluca/xlm-roberta-base-language-detection](https://huggingface.co/papluca/xlm-roberta-base-language-detection): A model that can classify languages. +- [meta-llama/Prompt-Guard-86M](https://huggingface.co/meta-llama/Prompt-Guard-86M): A model that can classify text generation attacks. + +This is only a subset of the supported models. Find the model that suits you best [here](https://huggingface.co/models?inference=warm&pipeline_tag=text-classification&sort=trending). 
+ +### Using the API + + + + + +```bash +curl https://api-inference.huggingface.co/models/distilbert/distilbert-base-uncased-finetuned-sst-2-english \ + -X POST \ + -d '{"inputs": "I like you. I love you"}' \ + -H 'Content-Type: application/json' \ + -H "Authorization: Bearer hf_***" + +``` + + + +```py +import requests + +API_URL = "https://api-inference.huggingface.co/models/distilbert/distilbert-base-uncased-finetuned-sst-2-english" +headers = {"Authorization": "Bearer hf_***"} + +def query(payload): + response = requests.post(API_URL, headers=headers, json=payload) + return response.json() + +output = query({ + "inputs": "I like you. I love you", +}) +``` + +To use the Python client, see `huggingface_hub`'s [package reference](https://huggingface.co/docs/huggingface_hub/package_reference/inference_client#huggingface_hub.InferenceClient.text_classification). + + + +```js +async function query(data) { + const response = await fetch( + "https://api-inference.huggingface.co/models/distilbert/distilbert-base-uncased-finetuned-sst-2-english", + { + headers: { + Authorization: "Bearer hf_***" + "Content-Type": "application/json", + }, + method: "POST", + body: JSON.stringify(data), + } + ); + const result = await response.json(); + return result; +} + +query({"inputs": "I like you. I love you"}).then((response) => { + console.log(JSON.stringify(response)); +}); +``` + +To use the JavaScript client, see `huggingface.js`'s [package reference](https://huggingface.co/docs/huggingface.js/inference/classes/HfInference#textclassification). + + + + + + +### API specification + +#### Request + +| Payload | | | +| :--- | :--- | :--- | +| **inputs*** | _string_ | The text to classify | +| **parameters** | _object_ | Additional inference parameters for Text Classification | +| **        function_to_apply** | _enum_ | Possible values: sigmoid, softmax, none. | +| **        top_k** | _integer_ | When specified, limits the output to the top K most probable classes. | + + +Some options can be configured by passing headers to the Inference API. Here are the available headers: + +| Headers | | | +| :--- | :--- | :--- | +| **authorization** | _string_ | Authentication header in the form `'Bearer: hf_****'` when `hf_****` is a personal user access token with Inference API permission. You can generate one from [your settings page](https://huggingface.co/settings/tokens). | +| **x-use-cache** | _boolean, default to `true`_ | There is a cache layer on the inference API to speed up requests we have already seen. Most models can use those results as they are deterministic (meaning the outputs will be the same anyway). However, if you use a nondeterministic model, you can set this parameter to prevent the caching mechanism from being used, resulting in a real new query. Read more about caching [here](../parameters#caching]). | +| **x-wait-for-model** | _boolean, default to `false`_ | If the model is not ready, wait for it instead of receiving 503. It limits the number of requests required to get your inference done. It is advised to only set this flag to true after receiving a 503 error, as it will limit hanging in your application to known places. Read more about model availability [here](../overview#eligibility]). | + +For more information about Inference API headers, check out the parameters [guide](../parameters). + +#### Response + +| Body | | +| :--- | :--- | :--- | +| **(array)** | _object[]_ | Output is an array of objects. | +| **        label** | _string_ | The predicted class label. 
| +| **        score** | _number_ | The corresponding probability. | + diff --git a/docs/api-inference/tasks/text-generation.md b/docs/api-inference/tasks/text-generation.md new file mode 100644 index 000000000..22ee84e1a --- /dev/null +++ b/docs/api-inference/tasks/text-generation.md @@ -0,0 +1,216 @@ + + +## Text Generation + +Generate text based on a prompt. + +If you are interested in a Chat Completion task, which generates a response based on a list of messages, check out the [`chat-completion`](./chat_completion) task. + + + +For more details about the `text-generation` task, check out its [dedicated page](https://huggingface.co/tasks/text-generation)! You will find examples and related materials. + + + +### Recommended models + +- [google/gemma-2-2b-it](https://huggingface.co/google/gemma-2-2b-it): A text-generation model trained to follow instructions. +- [bigcode/starcoder](https://huggingface.co/bigcode/starcoder): A code generation model that can generate code in 80+ languages. +- [meta-llama/Meta-Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3.1-8B-Instruct): Very powerful text generation model trained to follow instructions. +- [microsoft/Phi-3-mini-4k-instruct](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct): Small yet powerful text generation model. +- [HuggingFaceH4/starchat2-15b-v0.1](https://huggingface.co/HuggingFaceH4/starchat2-15b-v0.1): Strong coding assistant model. +- [mistralai/Mistral-Nemo-Instruct-2407](https://huggingface.co/mistralai/Mistral-Nemo-Instruct-2407): Very strong open-source large language model. + +This is only a subset of the supported models. Find the model that suits you best [here](https://huggingface.co/models?inference=warm&pipeline_tag=text-generation&sort=trending). + +### Using the API + + + + + +```bash +curl https://api-inference.huggingface.co/models/google/gemma-2-2b-it \ + -X POST \ + -d '{"inputs": "Can you please let us know more details about your "}' \ + -H 'Content-Type: application/json' \ + -H "Authorization: Bearer hf_***" + +``` + + + +```py +import requests + +API_URL = "https://api-inference.huggingface.co/models/google/gemma-2-2b-it" +headers = {"Authorization": "Bearer hf_***"} + +def query(payload): + response = requests.post(API_URL, headers=headers, json=payload) + return response.json() + +output = query({ + "inputs": "Can you please let us know more details about your ", +}) +``` + +To use the Python client, see `huggingface_hub`'s [package reference](https://huggingface.co/docs/huggingface_hub/package_reference/inference_client#huggingface_hub.InferenceClient.text_generation). + + + +```js +async function query(data) { + const response = await fetch( + "https://api-inference.huggingface.co/models/google/gemma-2-2b-it", + { + headers: { + Authorization: "Bearer hf_***" + "Content-Type": "application/json", + }, + method: "POST", + body: JSON.stringify(data), + } + ); + const result = await response.json(); + return result; +} + +query({"inputs": "Can you please let us know more details about your "}).then((response) => { + console.log(JSON.stringify(response)); +}); +``` + +To use the JavaScript client, see `huggingface.js`'s [package reference](https://huggingface.co/docs/huggingface.js/inference/classes/HfInference#textgeneration). 
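+
+As a rough sketch, the same prompt can also be sent through the Python `InferenceClient` (recent `huggingface_hub` assumed, placeholder token):
+
+```py
+from huggingface_hub import InferenceClient
+
+client = InferenceClient(model="google/gemma-2-2b-it", token="hf_***")
+
+prompt = "Can you please let us know more details about your "
+
+# Non-streaming call: returns the generated text as a single string.
+print(client.text_generation(prompt, max_new_tokens=50))
+
+# Streaming call: set stream=True to iterate over tokens as they are generated.
+for token in client.text_generation(prompt, max_new_tokens=50, stream=True):
+    print(token, end="")
+```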
+ + + + + + +### API specification + +#### Request + +| Payload | | | +| :--- | :--- | :--- | +| **inputs*** | _string_ | | +| **parameters** | _object_ | | +| **        best_of** | _integer_ | | +| **        decoder_input_details** | _boolean_ | | +| **        details** | _boolean_ | | +| **        do_sample** | _boolean_ | | +| **        frequency_penalty** | _number_ | | +| **        grammar** | _unknown_ | One of the following: | +| **                 (#1)** | | | +| **                        type*** | _enum_ | Possible values: json. | +| **                        value*** | _unknown_ | A string that represents a [JSON Schema](https://json-schema.org/). JSON Schema is a declarative language that allows to annotate JSON documents with types and descriptions. | +| **                 (#2)** | | | +| **                        type*** | _enum_ | Possible values: regex. | +| **                        value*** | _string_ | | +| **        max_new_tokens** | _integer_ | | +| **        repetition_penalty** | _number_ | | +| **        return_full_text** | _boolean_ | | +| **        seed** | _integer_ | | +| **        stop** | _string[]_ | | +| **        temperature** | _number_ | | +| **        top_k** | _integer_ | | +| **        top_n_tokens** | _integer_ | | +| **        top_p** | _number_ | | +| **        truncate** | _integer_ | | +| **        typical_p** | _number_ | | +| **        watermark** | _boolean_ | | +| **stream** | _boolean_ | | + + +Some options can be configured by passing headers to the Inference API. Here are the available headers: + +| Headers | | | +| :--- | :--- | :--- | +| **authorization** | _string_ | Authentication header in the form `'Bearer: hf_****'` when `hf_****` is a personal user access token with Inference API permission. You can generate one from [your settings page](https://huggingface.co/settings/tokens). | +| **x-use-cache** | _boolean, default to `true`_ | There is a cache layer on the inference API to speed up requests we have already seen. Most models can use those results as they are deterministic (meaning the outputs will be the same anyway). However, if you use a nondeterministic model, you can set this parameter to prevent the caching mechanism from being used, resulting in a real new query. Read more about caching [here](../parameters#caching]). | +| **x-wait-for-model** | _boolean, default to `false`_ | If the model is not ready, wait for it instead of receiving 503. It limits the number of requests required to get your inference done. It is advised to only set this flag to true after receiving a 503 error, as it will limit hanging in your application to known places. Read more about model availability [here](../overview#eligibility]). | + +For more information about Inference API headers, check out the parameters [guide](../parameters). + +#### Response + +Output type depends on the `stream` input parameter. +If `stream` is `false` (default), the response will be a JSON object with the following fields: + +| Body | | +| :--- | :--- | :--- | +| **details** | _object_ | | +| **        best_of_sequences** | _object[]_ | | +| **                finish_reason** | _enum_ | Possible values: length, eos_token, stop_sequence. 
| +| **                generated_text** | _string_ | | +| **                generated_tokens** | _integer_ | | +| **                prefill** | _object[]_ | | +| **                        id** | _integer_ | | +| **                        logprob** | _number_ | | +| **                        text** | _string_ | | +| **                seed** | _integer_ | | +| **                tokens** | _object[]_ | | +| **                        id** | _integer_ | | +| **                        logprob** | _number_ | | +| **                        special** | _boolean_ | | +| **                        text** | _string_ | | +| **                top_tokens** | _array[]_ | | +| **                        id** | _integer_ | | +| **                        logprob** | _number_ | | +| **                        special** | _boolean_ | | +| **                        text** | _string_ | | +| **        finish_reason** | _enum_ | Possible values: length, eos_token, stop_sequence. | +| **        generated_tokens** | _integer_ | | +| **        prefill** | _object[]_ | | +| **                id** | _integer_ | | +| **                logprob** | _number_ | | +| **                text** | _string_ | | +| **        seed** | _integer_ | | +| **        tokens** | _object[]_ | | +| **                id** | _integer_ | | +| **                logprob** | _number_ | | +| **                special** | _boolean_ | | +| **                text** | _string_ | | +| **        top_tokens** | _array[]_ | | +| **                id** | _integer_ | | +| **                logprob** | _number_ | | +| **                special** | _boolean_ | | +| **                text** | _string_ | | +| **generated_text** | _string_ | | + + +If `stream` is `true`, generated tokens are returned as a stream, using Server-Sent Events (SSE). +For more information about streaming, check out [this guide](https://huggingface.co/docs/text-generation-inference/conceptual/streaming). + +| Body | | +| :--- | :--- | :--- | +| **details** | _object_ | | +| **        finish_reason** | _enum_ | Possible values: length, eos_token, stop_sequence. | +| **        generated_tokens** | _integer_ | | +| **        seed** | _integer_ | | +| **generated_text** | _string_ | | +| **index** | _integer_ | | +| **token** | _object_ | | +| **        id** | _integer_ | | +| **        logprob** | _number_ | | +| **        special** | _boolean_ | | +| **        text** | _string_ | | +| **top_tokens** | _object[]_ | | +| **        id** | _integer_ | | +| **        logprob** | _number_ | | +| **        special** | _boolean_ | | +| **        text** | _string_ | | + diff --git a/docs/api-inference/tasks/text-to-image.md b/docs/api-inference/tasks/text-to-image.md new file mode 100644 index 000000000..ec719cba0 --- /dev/null +++ b/docs/api-inference/tasks/text-to-image.md @@ -0,0 +1,133 @@ + + +## Text to Image + +Generate an image based on a given text prompt. + + + +For more details about the `text-to-image` task, check out its [dedicated page](https://huggingface.co/tasks/text-to-image)! You will find examples and related materials. + + + +### Recommended models + +- [black-forest-labs/FLUX.1-dev](https://huggingface.co/black-forest-labs/FLUX.1-dev): One of the most powerful image generation models that can generate realistic outputs. +- [latent-consistency/lcm-lora-sdxl](https://huggingface.co/latent-consistency/lcm-lora-sdxl): A powerful yet fast image generation model. 
+- [stabilityai/stable-diffusion-3-medium-diffusers](https://huggingface.co/stabilityai/stable-diffusion-3-medium-diffusers): A powerful text-to-image model. + +This is only a subset of the supported models. Find the model that suits you best [here](https://huggingface.co/models?inference=warm&pipeline_tag=text-to-image&sort=trending). + +### Using the API + + + + + +```bash +curl https://api-inference.huggingface.co/models/black-forest-labs/FLUX.1-dev \ + -X POST \ + -d '{"inputs": "Astronaut riding a horse"}' \ + -H 'Content-Type: application/json' \ + -H "Authorization: Bearer hf_***" + +``` + + + +```py +import requests + +API_URL = "https://api-inference.huggingface.co/models/black-forest-labs/FLUX.1-dev" +headers = {"Authorization": "Bearer hf_***"} + +def query(payload): + response = requests.post(API_URL, headers=headers, json=payload) + return response.content +image_bytes = query({ + "inputs": "Astronaut riding a horse", +}) +# You can access the image with PIL.Image for example +import io +from PIL import Image +image = Image.open(io.BytesIO(image_bytes)) +``` + +To use the Python client, see `huggingface_hub`'s [package reference](https://huggingface.co/docs/huggingface_hub/package_reference/inference_client#huggingface_hub.InferenceClient.text_to-image). + + + +```js +async function query(data) { + const response = await fetch( + "https://api-inference.huggingface.co/models/black-forest-labs/FLUX.1-dev", + { + headers: { + Authorization: "Bearer hf_***" + "Content-Type": "application/json", + }, + method: "POST", + body: JSON.stringify(data), + } + ); + const result = await response.blob(); + return result; +} +query({"inputs": "Astronaut riding a horse"}).then((response) => { + // Use image +}); +``` + +To use the JavaScript client, see `huggingface.js`'s [package reference](https://huggingface.co/docs/huggingface.js/inference/classes/HfInference#textto-image). + + + + + + +### API specification + +#### Request + +| Payload | | | +| :--- | :--- | :--- | +| **inputs*** | _string_ | The input text data (sometimes called "prompt") | +| **parameters** | _object_ | Additional inference parameters for Text To Image | +| **        guidance_scale** | _number_ | A higher guidance scale value encourages the model to generate images closely linked to the text prompt, but values too high may cause saturation and other artifacts. | +| **        negative_prompt** | _string[]_ | One or several prompt to guide what NOT to include in image generation. | +| **        num_inference_steps** | _integer_ | The number of denoising steps. More denoising steps usually lead to a higher quality image at the expense of slower inference. | +| **        target_size** | _object_ | The size in pixel of the output image | +| **                width*** | _integer_ | | +| **                height*** | _integer_ | | +| **        scheduler** | _string_ | Override the scheduler with a compatible one. | +| **        seed** | _integer_ | Seed for the random number generator. | + + +Some options can be configured by passing headers to the Inference API. Here are the available headers: + +| Headers | | | +| :--- | :--- | :--- | +| **authorization** | _string_ | Authentication header in the form `'Bearer: hf_****'` when `hf_****` is a personal user access token with Inference API permission. You can generate one from [your settings page](https://huggingface.co/settings/tokens). | +| **x-use-cache** | _boolean, default to `true`_ | There is a cache layer on the inference API to speed up requests we have already seen. 
Most models can use those results as they are deterministic (meaning the outputs will be the same anyway). However, if you use a nondeterministic model, you can set this parameter to prevent the caching mechanism from being used, resulting in a real new query. Read more about caching [here](../parameters#caching]). | +| **x-wait-for-model** | _boolean, default to `false`_ | If the model is not ready, wait for it instead of receiving 503. It limits the number of requests required to get your inference done. It is advised to only set this flag to true after receiving a 503 error, as it will limit hanging in your application to known places. Read more about model availability [here](../overview#eligibility]). | + +For more information about Inference API headers, check out the parameters [guide](../parameters). + +#### Response + +| Body | | +| :--- | :--- | :--- | +| **image** | _unknown_ | The generated image returned as raw bytes in the payload. | + diff --git a/docs/api-inference/tasks/token-classification.md b/docs/api-inference/tasks/token-classification.md new file mode 100644 index 000000000..035582250 --- /dev/null +++ b/docs/api-inference/tasks/token-classification.md @@ -0,0 +1,146 @@ + + +## Token Classification + +Token classification is a task in which a label is assigned to some tokens in a text. Some popular token classification subtasks are Named Entity Recognition (NER) and Part-of-Speech (PoS) tagging. + + + +For more details about the `token-classification` task, check out its [dedicated page](https://huggingface.co/tasks/token-classification)! You will find examples and related materials. + + + +### Recommended models + +- [dslim/bert-base-NER](https://huggingface.co/dslim/bert-base-NER): A robust performance model to identify people, locations, organizations and names of miscellaneous entities. +- [FacebookAI/xlm-roberta-large-finetuned-conll03-english](https://huggingface.co/FacebookAI/xlm-roberta-large-finetuned-conll03-english): A strong model to identify people, locations, organizations and names in multiple languages. +- [blaze999/Medical-NER](https://huggingface.co/blaze999/Medical-NER): A token classification model specialized on medical entity recognition. +- [flair/ner-english](https://huggingface.co/flair/ner-english): Flair models are typically the state of the art in named entity recognition tasks. + +This is only a subset of the supported models. Find the model that suits you best [here](https://huggingface.co/models?inference=warm&pipeline_tag=token-classification&sort=trending). + +### Using the API + + + + + +```bash +curl https://api-inference.huggingface.co/models/dslim/bert-base-NER \ + -X POST \ + -d '{"inputs": "My name is Sarah Jessica Parker but you can call me Jessica"}' \ + -H 'Content-Type: application/json' \ + -H "Authorization: Bearer hf_***" + +``` + + + +```py +import requests + +API_URL = "https://api-inference.huggingface.co/models/dslim/bert-base-NER" +headers = {"Authorization": "Bearer hf_***"} + +def query(payload): + response = requests.post(API_URL, headers=headers, json=payload) + return response.json() + +output = query({ + "inputs": "My name is Sarah Jessica Parker but you can call me Jessica", +}) +``` + +To use the Python client, see `huggingface_hub`'s [package reference](https://huggingface.co/docs/huggingface_hub/package_reference/inference_client#huggingface_hub.InferenceClient.token_classification). 
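+
+As a quick orientation, the same call can also be made through the `InferenceClient` mentioned above; a minimal sketch, using the same model and a placeholder token:
+
+```py
+from huggingface_hub import InferenceClient
+
+client = InferenceClient(model="dslim/bert-base-NER", token="hf_***")
+
+# Returns a list of recognized entities with their label, score and character offsets.
+entities = client.token_classification(
+    "My name is Sarah Jessica Parker but you can call me Jessica"
+)
+for entity in entities:
+    print(entity)
+```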
+ + + +```js +async function query(data) { + const response = await fetch( + "https://api-inference.huggingface.co/models/dslim/bert-base-NER", + { + headers: { + Authorization: "Bearer hf_***" + "Content-Type": "application/json", + }, + method: "POST", + body: JSON.stringify(data), + } + ); + const result = await response.json(); + return result; +} + +query({"inputs": "My name is Sarah Jessica Parker but you can call me Jessica"}).then((response) => { + console.log(JSON.stringify(response)); +}); +``` + +To use the JavaScript client, see `huggingface.js`'s [package reference](https://huggingface.co/docs/huggingface.js/inference/classes/HfInference#tokenclassification). + + + + + + +### API specification + +#### Request + +| Payload | | | +| :--- | :--- | :--- | +| **inputs*** | _string_ | The input text data | +| **parameters** | _object_ | Additional inference parameters for Token Classification | +| **        ignore_labels** | _string[]_ | A list of labels to ignore | +| **        stride** | _integer_ | The number of overlapping tokens between chunks when splitting the input text. | +| **        aggregation_strategy** | _string_ | One of the following: | +| **                 (#1)** | _'none'_ | Do not aggregate tokens | +| **                 (#2)** | _'simple'_ | Group consecutive tokens with the same label in a single entity. | +| **                 (#3)** | _'first'_ | Similar to "simple", also preserves word integrity (use the label predicted for the first token in a word). | +| **                 (#4)** | _'average'_ | Similar to "simple", also preserves word integrity (uses the label with the highest score, averaged across the word's tokens). | +| **                 (#5)** | _'max'_ | Similar to "simple", also preserves word integrity (uses the label with the highest score across the word's tokens). | + + +Some options can be configured by passing headers to the Inference API. Here are the available headers: + +| Headers | | | +| :--- | :--- | :--- | +| **authorization** | _string_ | Authentication header in the form `'Bearer: hf_****'` when `hf_****` is a personal user access token with Inference API permission. You can generate one from [your settings page](https://huggingface.co/settings/tokens). | +| **x-use-cache** | _boolean, default to `true`_ | There is a cache layer on the inference API to speed up requests we have already seen. Most models can use those results as they are deterministic (meaning the outputs will be the same anyway). However, if you use a nondeterministic model, you can set this parameter to prevent the caching mechanism from being used, resulting in a real new query. Read more about caching [here](../parameters#caching]). | +| **x-wait-for-model** | _boolean, default to `false`_ | If the model is not ready, wait for it instead of receiving 503. It limits the number of requests required to get your inference done. It is advised to only set this flag to true after receiving a 503 error, as it will limit hanging in your application to known places. Read more about model availability [here](../overview#eligibility]). | + +For more information about Inference API headers, check out the parameters [guide](../parameters). + +#### Response + +Output type depends on the `stream` input parameter. +If `stream` is `false` (default), the response will be a JSON object with the following fields: + +| Body | | +| :--- | :--- | :--- | +| **(array)** | _object[]_ | Output is an array of objects. 
| +| **        entity_group** | _string_ | The predicted label for that group of tokens | +| **        score** | _number_ | The associated score / probability | +| **        word** | _string_ | The corresponding text | +| **        start** | _integer_ | The character position in the input where this group begins. | +| **        end** | _integer_ | The character position in the input where this group ends. | + + +If `stream` is `true`, generated tokens are returned as a stream, using Server-Sent Events (SSE). +For more information about streaming, check out [this guide](https://huggingface.co/docs/token-classification-inference/conceptual/streaming). + + + diff --git a/docs/api-inference/tasks/translation.md b/docs/api-inference/tasks/translation.md new file mode 100644 index 000000000..908aa972e --- /dev/null +++ b/docs/api-inference/tasks/translation.md @@ -0,0 +1,126 @@ + + +## Translation + +Translation is the task of converting text from one language to another. + + + +For more details about the `translation` task, check out its [dedicated page](https://huggingface.co/tasks/translation)! You will find examples and related materials. + + + +### Recommended models + +- [google-t5/t5-base](https://huggingface.co/google-t5/t5-base): A general-purpose Transformer that can be used to translate from English to German, French, or Romanian. + +This is only a subset of the supported models. Find the model that suits you best [here](https://huggingface.co/models?inference=warm&pipeline_tag=translation&sort=trending). + +### Using the API + + + + + +```bash +curl https://api-inference.huggingface.co/models/google-t5/t5-base \ + -X POST \ + -d '{"inputs": "Меня зовут Вольфганг и я живу в Берлине"}' \ + -H 'Content-Type: application/json' \ + -H "Authorization: Bearer hf_***" + +``` + + + +```py +import requests + +API_URL = "https://api-inference.huggingface.co/models/google-t5/t5-base" +headers = {"Authorization": "Bearer hf_***"} + +def query(payload): + response = requests.post(API_URL, headers=headers, json=payload) + return response.json() + +output = query({ + "inputs": "Меня зовут Вольфганг и я живу в Берлине", +}) +``` + +To use the Python client, see `huggingface_hub`'s [package reference](https://huggingface.co/docs/huggingface_hub/package_reference/inference_client#huggingface_hub.InferenceClient.translation). + + + +```js +async function query(data) { + const response = await fetch( + "https://api-inference.huggingface.co/models/google-t5/t5-base", + { + headers: { + Authorization: "Bearer hf_***" + "Content-Type": "application/json", + }, + method: "POST", + body: JSON.stringify(data), + } + ); + const result = await response.json(); + return result; +} + +query({"inputs": "Меня зовут Вольфганг и я живу в Берлине"}).then((response) => { + console.log(JSON.stringify(response)); +}); +``` + +To use the JavaScript client, see `huggingface.js`'s [package reference](https://huggingface.co/docs/huggingface.js/inference/classes/HfInference#translation). + + + + + + +### API specification + +#### Request + +| Payload | | | +| :--- | :--- | :--- | +| **inputs*** | _string_ | The text to translate. | +| **parameters** | _object_ | Additional inference parameters for Translation | +| **        src_lang** | _string_ | The source language of the text. Required for models that can translate from multiple languages. | +| **        tgt_lang** | _string_ | Target language to translate to. Required for models that can translate to multiple languages. 
| +| **        clean_up_tokenization_spaces** | _boolean_ | Whether to clean up the potential extra spaces in the text output. | +| **        truncation** | _enum_ | Possible values: do_not_truncate, longest_first, only_first, only_second. | +| **        generate_parameters** | _object_ | Additional parametrization of the text generation algorithm. | + + +Some options can be configured by passing headers to the Inference API. Here are the available headers: + +| Headers | | | +| :--- | :--- | :--- | +| **authorization** | _string_ | Authentication header in the form `'Bearer: hf_****'` when `hf_****` is a personal user access token with Inference API permission. You can generate one from [your settings page](https://huggingface.co/settings/tokens). | +| **x-use-cache** | _boolean, default to `true`_ | There is a cache layer on the inference API to speed up requests we have already seen. Most models can use those results as they are deterministic (meaning the outputs will be the same anyway). However, if you use a nondeterministic model, you can set this parameter to prevent the caching mechanism from being used, resulting in a real new query. Read more about caching [here](../parameters#caching]). | +| **x-wait-for-model** | _boolean, default to `false`_ | If the model is not ready, wait for it instead of receiving 503. It limits the number of requests required to get your inference done. It is advised to only set this flag to true after receiving a 503 error, as it will limit hanging in your application to known places. Read more about model availability [here](../overview#eligibility]). | + +For more information about Inference API headers, check out the parameters [guide](../parameters). + +#### Response + +| Body | | +| :--- | :--- | :--- | +| **translation_text** | _string_ | The translated text. | + diff --git a/docs/api-inference/tasks/zero-shot-classification.md b/docs/api-inference/tasks/zero-shot-classification.md new file mode 100644 index 000000000..7ccf024aa --- /dev/null +++ b/docs/api-inference/tasks/zero-shot-classification.md @@ -0,0 +1,129 @@ + + +## Zero-Shot Classification + +Zero-shot text classification is super useful to try out classification with zero code, you simply pass a sentence/paragraph and the possible labels for that sentence, and you get a result. The model has not been necessarily trained on the labels you provide, but it can still predict the correct label. + + + +For more details about the `zero-shot-classification` task, check out its [dedicated page](https://huggingface.co/tasks/zero-shot-classification)! You will find examples and related materials. + + + +### Recommended models + +- [facebook/bart-large-mnli](https://huggingface.co/facebook/bart-large-mnli): Powerful zero-shot text classification model. +- [MoritzLaurer/mDeBERTa-v3-base-xnli-multilingual-nli-2mil7](https://huggingface.co/MoritzLaurer/mDeBERTa-v3-base-xnli-multilingual-nli-2mil7): Powerful zero-shot multilingual text classification model that can accomplish multiple tasks. + +This is only a subset of the supported models. Find the model that suits you best [here](https://huggingface.co/models?inference=warm&pipeline_tag=zero-shot-classification&sort=trending). 
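+
+The task takes a piece of text together with the candidate labels and returns one score per label. Before the raw request examples in the next section, here is a minimal sketch of calling it through `huggingface_hub`'s `InferenceClient` and reading the result (the token is a placeholder; with `multi_label=False` the scores sum to 1):
+
+```py
+from huggingface_hub import InferenceClient
+
+client = InferenceClient(model="facebook/bart-large-mnli", token="hf_***")
+
+result = client.zero_shot_classification(
+    "Hi, I recently bought a device from your company but it is not working as advertised and I would like to get reimbursed!",
+    labels=["refund", "legal", "faq"],
+)
+
+# Each element carries a candidate label and its score; pick the highest-scoring one.
+best = max(result, key=lambda item: item.score)
+print(best.label, best.score)
+```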
+ +### Using the API + + + + + +```bash +curl https://api-inference.huggingface.co/models/facebook/bart-large-mnli \ + -X POST \ + -d '{"inputs": "Hi, I recently bought a device from your company but it is not working as advertised and I would like to get reimbursed!", "parameters": {"candidate_labels": ["refund", "legal", "faq"]}}' \ + -H 'Content-Type: application/json' \ + -H "Authorization: Bearer hf_***" + +``` + + + +```py +import requests + +API_URL = "https://api-inference.huggingface.co/models/facebook/bart-large-mnli" +headers = {"Authorization": "Bearer hf_***"} + +def query(payload): + response = requests.post(API_URL, headers=headers, json=payload) + return response.json() + +output = query({ + "inputs": "Hi, I recently bought a device from your company but it is not working as advertised and I would like to get reimbursed!", + "parameters": {"candidate_labels": ["refund", "legal", "faq"]}, +}) +``` + +To use the Python client, see `huggingface_hub`'s [package reference](https://huggingface.co/docs/huggingface_hub/package_reference/inference_client#huggingface_hub.InferenceClient.zero_shot-classification). + + + +```js +async function query(data) { + const response = await fetch( + "https://api-inference.huggingface.co/models/facebook/bart-large-mnli", + { + headers: { + Authorization: "Bearer hf_***" + "Content-Type": "application/json", + }, + method: "POST", + body: JSON.stringify(data), + } + ); + const result = await response.json(); + return result; +} + +query({"inputs": "Hi, I recently bought a device from your company but it is not working as advertised and I would like to get reimbursed!", "parameters": {"candidate_labels": ["refund", "legal", "faq"]}}).then((response) => { + console.log(JSON.stringify(response)); +}); +``` + +To use the JavaScript client, see `huggingface.js`'s [package reference](https://huggingface.co/docs/huggingface.js/inference/classes/HfInference#zeroshot-classification). + + + + + + +### API specification + +#### Request + +| Payload | | | +| :--- | :--- | :--- | +| **inputs*** | _object_ | The input text data, with candidate labels | +| **        text*** | _string_ | The text to classify | +| **        candidateLabels*** | _string[]_ | The set of possible class labels to classify the text into. | +| **parameters** | _object_ | Additional inference parameters for Zero Shot Classification | +| **        hypothesis_template** | _string_ | The sentence used in conjunction with candidateLabels to attempt the text classification by replacing the placeholder with the candidate labels. | +| **        multi_label** | _boolean_ | Whether multiple candidate labels can be true. If false, the scores are normalized such that the sum of the label likelihoods for each sequence is 1. If true, the labels are considered independent and probabilities are normalized for each candidate. | + + +Some options can be configured by passing headers to the Inference API. Here are the available headers: + +| Headers | | | +| :--- | :--- | :--- | +| **authorization** | _string_ | Authentication header in the form `'Bearer: hf_****'` when `hf_****` is a personal user access token with Inference API permission. You can generate one from [your settings page](https://huggingface.co/settings/tokens). | +| **x-use-cache** | _boolean, default to `true`_ | There is a cache layer on the inference API to speed up requests we have already seen. Most models can use those results as they are deterministic (meaning the outputs will be the same anyway). 
However, if you use a nondeterministic model, you can set this parameter to prevent the caching mechanism from being used, resulting in a real new query. Read more about caching [here](../parameters#caching]). | +| **x-wait-for-model** | _boolean, default to `false`_ | If the model is not ready, wait for it instead of receiving 503. It limits the number of requests required to get your inference done. It is advised to only set this flag to true after receiving a 503 error, as it will limit hanging in your application to known places. Read more about model availability [here](../overview#eligibility]). | + +For more information about Inference API headers, check out the parameters [guide](../parameters). + +#### Response + +| Body | | +| :--- | :--- | :--- | +| **(array)** | _object[]_ | Output is an array of objects. | +| **        label** | _string_ | The predicted class label. | +| **        score** | _number_ | The corresponding probability. | + diff --git a/scripts/api-inference/.gitignore b/scripts/api-inference/.gitignore new file mode 100644 index 000000000..53c37a166 --- /dev/null +++ b/scripts/api-inference/.gitignore @@ -0,0 +1 @@ +dist \ No newline at end of file diff --git a/scripts/api-inference/.prettierignore b/scripts/api-inference/.prettierignore new file mode 100644 index 000000000..d4b43ae6c --- /dev/null +++ b/scripts/api-inference/.prettierignore @@ -0,0 +1,5 @@ +pnpm-lock.yaml +# In order to avoid code samples to have tabs, they don't display well on npm +README.md +dist +*.handlebars \ No newline at end of file diff --git a/scripts/api-inference/README.md b/scripts/api-inference/README.md new file mode 100644 index 000000000..67d9c79e6 --- /dev/null +++ b/scripts/api-inference/README.md @@ -0,0 +1,11 @@ +Install dependencies. + +```sh +pnpm install +``` + +Generate documentation. 
+ +```sh +pnpm run generate +``` \ No newline at end of file diff --git a/scripts/api-inference/package.json b/scripts/api-inference/package.json new file mode 100644 index 000000000..13f84e881 --- /dev/null +++ b/scripts/api-inference/package.json @@ -0,0 +1,26 @@ +{ + "name": "api-inference-generator", + "version": "1.0.0", + "description": "", + "main": "index.js", + "type": "module", + "scripts": { + "format": "prettier --write .", + "format:check": "prettier --check .", + "generate": "tsx scripts/generate.ts" + }, + "keywords": [], + "author": "", + "license": "ISC", + "dependencies": { + "@huggingface/tasks": "^0.11.11", + "@types/node": "^22.5.0", + "handlebars": "^4.7.8", + "node": "^20.17.0", + "prettier": "^3.3.3", + "ts-node": "^10.9.2", + "tsx": "^4.17.0", + "type-fest": "^4.25.0", + "typescript": "^5.5.4" + } +} diff --git a/scripts/api-inference/pnpm-lock.yaml b/scripts/api-inference/pnpm-lock.yaml new file mode 100644 index 000000000..58267667d --- /dev/null +++ b/scripts/api-inference/pnpm-lock.yaml @@ -0,0 +1,541 @@ +lockfileVersion: '9.0' + +settings: + autoInstallPeers: true + excludeLinksFromLockfile: false + +importers: + + .: + dependencies: + '@huggingface/tasks': + specifier: ^0.11.11 + version: 0.11.11 + '@types/node': + specifier: ^22.5.0 + version: 22.5.0 + handlebars: + specifier: ^4.7.8 + version: 4.7.8 + node: + specifier: ^20.17.0 + version: 20.17.0 + prettier: + specifier: ^3.3.3 + version: 3.3.3 + ts-node: + specifier: ^10.9.2 + version: 10.9.2(@types/node@22.5.0)(typescript@5.5.4) + tsx: + specifier: ^4.17.0 + version: 4.17.0 + type-fest: + specifier: ^4.25.0 + version: 4.25.0 + typescript: + specifier: ^5.5.4 + version: 5.5.4 + +packages: + + '@cspotcode/source-map-support@0.8.1': + resolution: {integrity: sha512-IchNf6dN4tHoMFIn/7OE8LWZ19Y6q/67Bmf6vnGREv8RSbBVb9LPJxEcnwrcwX6ixSvaiGoomAUvu4YSxXrVgw==} + engines: {node: '>=12'} + + '@esbuild/aix-ppc64@0.23.1': + resolution: {integrity: sha512-6VhYk1diRqrhBAqpJEdjASR/+WVRtfjpqKuNw11cLiaWpAT/Uu+nokB+UJnevzy/P9C/ty6AOe0dwueMrGh/iQ==} + engines: {node: '>=18'} + cpu: [ppc64] + os: [aix] + + '@esbuild/android-arm64@0.23.1': + resolution: {integrity: sha512-xw50ipykXcLstLeWH7WRdQuysJqejuAGPd30vd1i5zSyKK3WE+ijzHmLKxdiCMtH1pHz78rOg0BKSYOSB/2Khw==} + engines: {node: '>=18'} + cpu: [arm64] + os: [android] + + '@esbuild/android-arm@0.23.1': + resolution: {integrity: sha512-uz6/tEy2IFm9RYOyvKl88zdzZfwEfKZmnX9Cj1BHjeSGNuGLuMD1kR8y5bteYmwqKm1tj8m4cb/aKEorr6fHWQ==} + engines: {node: '>=18'} + cpu: [arm] + os: [android] + + '@esbuild/android-x64@0.23.1': + resolution: {integrity: sha512-nlN9B69St9BwUoB+jkyU090bru8L0NA3yFvAd7k8dNsVH8bi9a8cUAUSEcEEgTp2z3dbEDGJGfP6VUnkQnlReg==} + engines: {node: '>=18'} + cpu: [x64] + os: [android] + + '@esbuild/darwin-arm64@0.23.1': + resolution: {integrity: sha512-YsS2e3Wtgnw7Wq53XXBLcV6JhRsEq8hkfg91ESVadIrzr9wO6jJDMZnCQbHm1Guc5t/CdDiFSSfWP58FNuvT3Q==} + engines: {node: '>=18'} + cpu: [arm64] + os: [darwin] + + '@esbuild/darwin-x64@0.23.1': + resolution: {integrity: sha512-aClqdgTDVPSEGgoCS8QDG37Gu8yc9lTHNAQlsztQ6ENetKEO//b8y31MMu2ZaPbn4kVsIABzVLXYLhCGekGDqw==} + engines: {node: '>=18'} + cpu: [x64] + os: [darwin] + + '@esbuild/freebsd-arm64@0.23.1': + resolution: {integrity: sha512-h1k6yS8/pN/NHlMl5+v4XPfikhJulk4G+tKGFIOwURBSFzE8bixw1ebjluLOjfwtLqY0kewfjLSrO6tN2MgIhA==} + engines: {node: '>=18'} + cpu: [arm64] + os: [freebsd] + + '@esbuild/freebsd-x64@0.23.1': + resolution: {integrity: sha512-lK1eJeyk1ZX8UklqFd/3A60UuZ/6UVfGT2LuGo3Wp4/z7eRTRYY+0xOu2kpClP+vMTi9wKOfXi2vjUpO1Ro76g==} + 
engines: {node: '>=18'} + cpu: [x64] + os: [freebsd] + + '@esbuild/linux-arm64@0.23.1': + resolution: {integrity: sha512-/93bf2yxencYDnItMYV/v116zff6UyTjo4EtEQjUBeGiVpMmffDNUyD9UN2zV+V3LRV3/on4xdZ26NKzn6754g==} + engines: {node: '>=18'} + cpu: [arm64] + os: [linux] + + '@esbuild/linux-arm@0.23.1': + resolution: {integrity: sha512-CXXkzgn+dXAPs3WBwE+Kvnrf4WECwBdfjfeYHpMeVxWE0EceB6vhWGShs6wi0IYEqMSIzdOF1XjQ/Mkm5d7ZdQ==} + engines: {node: '>=18'} + cpu: [arm] + os: [linux] + + '@esbuild/linux-ia32@0.23.1': + resolution: {integrity: sha512-VTN4EuOHwXEkXzX5nTvVY4s7E/Krz7COC8xkftbbKRYAl96vPiUssGkeMELQMOnLOJ8k3BY1+ZY52tttZnHcXQ==} + engines: {node: '>=18'} + cpu: [ia32] + os: [linux] + + '@esbuild/linux-loong64@0.23.1': + resolution: {integrity: sha512-Vx09LzEoBa5zDnieH8LSMRToj7ir/Jeq0Gu6qJ/1GcBq9GkfoEAoXvLiW1U9J1qE/Y/Oyaq33w5p2ZWrNNHNEw==} + engines: {node: '>=18'} + cpu: [loong64] + os: [linux] + + '@esbuild/linux-mips64el@0.23.1': + resolution: {integrity: sha512-nrFzzMQ7W4WRLNUOU5dlWAqa6yVeI0P78WKGUo7lg2HShq/yx+UYkeNSE0SSfSure0SqgnsxPvmAUu/vu0E+3Q==} + engines: {node: '>=18'} + cpu: [mips64el] + os: [linux] + + '@esbuild/linux-ppc64@0.23.1': + resolution: {integrity: sha512-dKN8fgVqd0vUIjxuJI6P/9SSSe/mB9rvA98CSH2sJnlZ/OCZWO1DJvxj8jvKTfYUdGfcq2dDxoKaC6bHuTlgcw==} + engines: {node: '>=18'} + cpu: [ppc64] + os: [linux] + + '@esbuild/linux-riscv64@0.23.1': + resolution: {integrity: sha512-5AV4Pzp80fhHL83JM6LoA6pTQVWgB1HovMBsLQ9OZWLDqVY8MVobBXNSmAJi//Csh6tcY7e7Lny2Hg1tElMjIA==} + engines: {node: '>=18'} + cpu: [riscv64] + os: [linux] + + '@esbuild/linux-s390x@0.23.1': + resolution: {integrity: sha512-9ygs73tuFCe6f6m/Tb+9LtYxWR4c9yg7zjt2cYkjDbDpV/xVn+68cQxMXCjUpYwEkze2RcU/rMnfIXNRFmSoDw==} + engines: {node: '>=18'} + cpu: [s390x] + os: [linux] + + '@esbuild/linux-x64@0.23.1': + resolution: {integrity: sha512-EV6+ovTsEXCPAp58g2dD68LxoP/wK5pRvgy0J/HxPGB009omFPv3Yet0HiaqvrIrgPTBuC6wCH1LTOY91EO5hQ==} + engines: {node: '>=18'} + cpu: [x64] + os: [linux] + + '@esbuild/netbsd-x64@0.23.1': + resolution: {integrity: sha512-aevEkCNu7KlPRpYLjwmdcuNz6bDFiE7Z8XC4CPqExjTvrHugh28QzUXVOZtiYghciKUacNktqxdpymplil1beA==} + engines: {node: '>=18'} + cpu: [x64] + os: [netbsd] + + '@esbuild/openbsd-arm64@0.23.1': + resolution: {integrity: sha512-3x37szhLexNA4bXhLrCC/LImN/YtWis6WXr1VESlfVtVeoFJBRINPJ3f0a/6LV8zpikqoUg4hyXw0sFBt5Cr+Q==} + engines: {node: '>=18'} + cpu: [arm64] + os: [openbsd] + + '@esbuild/openbsd-x64@0.23.1': + resolution: {integrity: sha512-aY2gMmKmPhxfU+0EdnN+XNtGbjfQgwZj43k8G3fyrDM/UdZww6xrWxmDkuz2eCZchqVeABjV5BpildOrUbBTqA==} + engines: {node: '>=18'} + cpu: [x64] + os: [openbsd] + + '@esbuild/sunos-x64@0.23.1': + resolution: {integrity: sha512-RBRT2gqEl0IKQABT4XTj78tpk9v7ehp+mazn2HbUeZl1YMdaGAQqhapjGTCe7uw7y0frDi4gS0uHzhvpFuI1sA==} + engines: {node: '>=18'} + cpu: [x64] + os: [sunos] + + '@esbuild/win32-arm64@0.23.1': + resolution: {integrity: sha512-4O+gPR5rEBe2FpKOVyiJ7wNDPA8nGzDuJ6gN4okSA1gEOYZ67N8JPk58tkWtdtPeLz7lBnY6I5L3jdsr3S+A6A==} + engines: {node: '>=18'} + cpu: [arm64] + os: [win32] + + '@esbuild/win32-ia32@0.23.1': + resolution: {integrity: sha512-BcaL0Vn6QwCwre3Y717nVHZbAa4UBEigzFm6VdsVdT/MbZ38xoj1X9HPkZhbmaBGUD1W8vxAfffbDe8bA6AKnQ==} + engines: {node: '>=18'} + cpu: [ia32] + os: [win32] + + '@esbuild/win32-x64@0.23.1': + resolution: {integrity: sha512-BHpFFeslkWrXWyUPnbKm+xYYVYruCinGcftSBaa8zoF9hZO4BcSCFUvHVTtzpIY6YzUnYtuEhZ+C9iEXjxnasg==} + engines: {node: '>=18'} + cpu: [x64] + os: [win32] + + '@huggingface/tasks@0.11.11': + resolution: {integrity: 
sha512-YRleUv67oSqDOkcYm4pFdBeaw8I8Dh6/DYlXo02fxXj5iC/WiDi8PE1wBhAhTdASwkl/n1V4xbL69uKXwDNDGw==} + + '@jridgewell/resolve-uri@3.1.2': + resolution: {integrity: sha512-bRISgCIjP20/tbWSPWMEi54QVPRZExkuD9lJL+UIxUKtwVJA8wW1Trb1jMs1RFXo1CBTNZ/5hpC9QvmKWdopKw==} + engines: {node: '>=6.0.0'} + + '@jridgewell/sourcemap-codec@1.5.0': + resolution: {integrity: sha512-gv3ZRaISU3fjPAgNsriBRqGWQL6quFx04YMPW/zD8XMLsU32mhCCbfbO6KZFLjvYpCZ8zyDEgqsgf+PwPaM7GQ==} + + '@jridgewell/trace-mapping@0.3.9': + resolution: {integrity: sha512-3Belt6tdc8bPgAtbcmdtNJlirVoTmEb5e2gC94PnkwEW9jI6CAHUeoG85tjWP5WquqfavoMtMwiG4P926ZKKuQ==} + + '@tsconfig/node10@1.0.11': + resolution: {integrity: sha512-DcRjDCujK/kCk/cUe8Xz8ZSpm8mS3mNNpta+jGCA6USEDfktlNvm1+IuZ9eTcDbNk41BHwpHHeW+N1lKCz4zOw==} + + '@tsconfig/node12@1.0.11': + resolution: {integrity: sha512-cqefuRsh12pWyGsIoBKJA9luFu3mRxCA+ORZvA4ktLSzIuCUtWVxGIuXigEwO5/ywWFMZ2QEGKWvkZG1zDMTag==} + + '@tsconfig/node14@1.0.3': + resolution: {integrity: sha512-ysT8mhdixWK6Hw3i1V2AeRqZ5WfXg1G43mqoYlM2nc6388Fq5jcXyr5mRsqViLx/GJYdoL0bfXD8nmF+Zn/Iow==} + + '@tsconfig/node16@1.0.4': + resolution: {integrity: sha512-vxhUy4J8lyeyinH7Azl1pdd43GJhZH/tP2weN8TntQblOY+A0XbT8DJk1/oCPuOOyg/Ja757rG0CgHcWC8OfMA==} + + '@types/node@22.5.0': + resolution: {integrity: sha512-DkFrJOe+rfdHTqqMg0bSNlGlQ85hSoh2TPzZyhHsXnMtligRWpxUySiyw8FY14ITt24HVCiQPWxS3KO/QlGmWg==} + + acorn-walk@8.3.3: + resolution: {integrity: sha512-MxXdReSRhGO7VlFe1bRG/oI7/mdLV9B9JJT0N8vZOhF7gFRR5l3M8W9G8JxmKV+JC5mGqJ0QvqfSOLsCPa4nUw==} + engines: {node: '>=0.4.0'} + + acorn@8.12.1: + resolution: {integrity: sha512-tcpGyI9zbizT9JbV6oYE477V6mTlXvvi0T0G3SNIYE2apm/G5huBa1+K89VGeovbg+jycCrfhl3ADxErOuO6Jg==} + engines: {node: '>=0.4.0'} + hasBin: true + + arg@4.1.3: + resolution: {integrity: sha512-58S9QDqG0Xx27YwPSt9fJxivjYl432YCwfDMfZ+71RAqUrZef7LrKQZ3LHLOwCS4FLNBplP533Zx895SeOCHvA==} + + create-require@1.1.1: + resolution: {integrity: sha512-dcKFX3jn0MpIaXjisoRvexIJVEKzaq7z2rZKxf+MSr9TkdmHmsU4m2lcLojrj/FHl8mk5VxMmYA+ftRkP/3oKQ==} + + diff@4.0.2: + resolution: {integrity: sha512-58lmxKSA4BNyLz+HHMUzlOEpg09FV+ev6ZMe3vJihgdxzgcwZ8VoEEPmALCZG9LmqfVoNMMKpttIYTVG6uDY7A==} + engines: {node: '>=0.3.1'} + + esbuild@0.23.1: + resolution: {integrity: sha512-VVNz/9Sa0bs5SELtn3f7qhJCDPCF5oMEl5cO9/SSinpE9hbPVvxbd572HH5AKiP7WD8INO53GgfDDhRjkylHEg==} + engines: {node: '>=18'} + hasBin: true + + fsevents@2.3.3: + resolution: {integrity: sha512-5xoDfX+fL7faATnagmWPpbFtwh/R77WmMMqqHGS65C3vvB0YHrgF+B1YmZ3441tMj5n63k0212XNoJwzlhffQw==} + engines: {node: ^8.16.0 || ^10.6.0 || >=11.0.0} + os: [darwin] + + get-tsconfig@4.7.6: + resolution: {integrity: sha512-ZAqrLlu18NbDdRaHq+AKXzAmqIUPswPWKUchfytdAjiRFnCe5ojG2bstg6mRiZabkKfCoL/e98pbBELIV/YCeA==} + + handlebars@4.7.8: + resolution: {integrity: sha512-vafaFqs8MZkRrSX7sFVUdo3ap/eNiLnb4IakshzvP56X5Nr1iGKAIqdX6tMlm6HcNRIkr6AxO5jFEoJzzpT8aQ==} + engines: {node: '>=0.4.7'} + hasBin: true + + make-error@1.3.6: + resolution: {integrity: sha512-s8UhlNe7vPKomQhC1qFelMokr/Sc3AgNbso3n74mVPA5LTZwkB9NlXf4XPamLxJE8h0gh73rM94xvwRT2CVInw==} + + minimist@1.2.8: + resolution: {integrity: sha512-2yyAR8qBkN3YuheJanUpWC5U3bb5osDywNB8RzDVlDwDHbocAJveqqj1u8+SVD7jkWT4yvsHCpWqqWqAxb0zCA==} + + neo-async@2.6.2: + resolution: {integrity: sha512-Yd3UES5mWCSqR+qNT93S3UoYUkqAZ9lLg8a7g9rimsWmYGK8cVToA4/sF3RrshdyV3sAGMXVUmpMYOw+dLpOuw==} + + node-bin-setup@1.1.3: + resolution: {integrity: sha512-opgw9iSCAzT2+6wJOETCpeRYAQxSopqQ2z+N6BXwIMsQQ7Zj5M8MaafQY8JMlolRR6R1UXg2WmhKp0p9lSOivg==} + + node@20.17.0: + resolution: {integrity: 
sha512-zjgqs6fjta3bWGrwCmtT42gIkupAmvdq5QerbnCgNiQHE+3HrYSXuNrTw5sxQAHG2sZGgMVCxsXQ5OXLV+dkjw==} + engines: {npm: '>=5.0.0'} + hasBin: true + + prettier@3.3.3: + resolution: {integrity: sha512-i2tDNA0O5IrMO757lfrdQZCc2jPNDVntV0m/+4whiDfWaTKfMNgR7Qz0NAeGz/nRqF4m5/6CLzbP4/liHt12Ew==} + engines: {node: '>=14'} + hasBin: true + + resolve-pkg-maps@1.0.0: + resolution: {integrity: sha512-seS2Tj26TBVOC2NIc2rOe2y2ZO7efxITtLZcGSOnHHNOQ7CkiUBfw0Iw2ck6xkIhPwLhKNLS8BO+hEpngQlqzw==} + + source-map@0.6.1: + resolution: {integrity: sha512-UjgapumWlbMhkBgzT7Ykc5YXUT46F0iKu8SGXq0bcwP5dz/h0Plj6enJqjz1Zbq2l5WaqYnrVbwWOWMyF3F47g==} + engines: {node: '>=0.10.0'} + + ts-node@10.9.2: + resolution: {integrity: sha512-f0FFpIdcHgn8zcPSbf1dRevwt047YMnaiJM3u2w2RewrB+fob/zePZcrOyQoLMMO7aBIddLcQIEK5dYjkLnGrQ==} + hasBin: true + peerDependencies: + '@swc/core': '>=1.2.50' + '@swc/wasm': '>=1.2.50' + '@types/node': '*' + typescript: '>=2.7' + peerDependenciesMeta: + '@swc/core': + optional: true + '@swc/wasm': + optional: true + + tsx@4.17.0: + resolution: {integrity: sha512-eN4mnDA5UMKDt4YZixo9tBioibaMBpoxBkD+rIPAjVmYERSG0/dWEY1CEFuV89CgASlKL499q8AhmkMnnjtOJg==} + engines: {node: '>=18.0.0'} + hasBin: true + + type-fest@4.25.0: + resolution: {integrity: sha512-bRkIGlXsnGBRBQRAY56UXBm//9qH4bmJfFvq83gSz41N282df+fjy8ofcEgc1sM8geNt5cl6mC2g9Fht1cs8Aw==} + engines: {node: '>=16'} + + typescript@5.5.4: + resolution: {integrity: sha512-Mtq29sKDAEYP7aljRgtPOpTvOfbwRWlS6dPRzwjdE+C0R4brX/GUyhHSecbHMFLNBLcJIPt9nl9yG5TZ1weH+Q==} + engines: {node: '>=14.17'} + hasBin: true + + uglify-js@3.19.2: + resolution: {integrity: sha512-S8KA6DDI47nQXJSi2ctQ629YzwOVs+bQML6DAtvy0wgNdpi+0ySpQK0g2pxBq2xfF2z3YCscu7NNA8nXT9PlIQ==} + engines: {node: '>=0.8.0'} + hasBin: true + + undici-types@6.19.8: + resolution: {integrity: sha512-ve2KP6f/JnbPBFyobGHuerC9g1FYGn/F8n1LWTwNxCEzd6IfqTwUQcNXgEtmmQ6DlRrC1hrSrBnCZPokRrDHjw==} + + v8-compile-cache-lib@3.0.1: + resolution: {integrity: sha512-wa7YjyUGfNZngI/vtK0UHAN+lgDCxBPCylVXGp0zu59Fz5aiGtNXaq3DhIov063MorB+VfufLh3JlF2KdTK3xg==} + + wordwrap@1.0.0: + resolution: {integrity: sha512-gvVzJFlPycKc5dZN4yPkP8w7Dc37BtP1yczEneOb4uq34pXZcvrtRTmWV8W+Ume+XCxKgbjM+nevkyFPMybd4Q==} + + yn@3.1.1: + resolution: {integrity: sha512-Ux4ygGWsu2c7isFWe8Yu1YluJmqVhxqK2cLXNQA5AcC3QfbGNpM7fu0Y8b/z16pXLnFxZYvWhd3fhBY9DLmC6Q==} + engines: {node: '>=6'} + +snapshots: + + '@cspotcode/source-map-support@0.8.1': + dependencies: + '@jridgewell/trace-mapping': 0.3.9 + + '@esbuild/aix-ppc64@0.23.1': + optional: true + + '@esbuild/android-arm64@0.23.1': + optional: true + + '@esbuild/android-arm@0.23.1': + optional: true + + '@esbuild/android-x64@0.23.1': + optional: true + + '@esbuild/darwin-arm64@0.23.1': + optional: true + + '@esbuild/darwin-x64@0.23.1': + optional: true + + '@esbuild/freebsd-arm64@0.23.1': + optional: true + + '@esbuild/freebsd-x64@0.23.1': + optional: true + + '@esbuild/linux-arm64@0.23.1': + optional: true + + '@esbuild/linux-arm@0.23.1': + optional: true + + '@esbuild/linux-ia32@0.23.1': + optional: true + + '@esbuild/linux-loong64@0.23.1': + optional: true + + '@esbuild/linux-mips64el@0.23.1': + optional: true + + '@esbuild/linux-ppc64@0.23.1': + optional: true + + '@esbuild/linux-riscv64@0.23.1': + optional: true + + '@esbuild/linux-s390x@0.23.1': + optional: true + + '@esbuild/linux-x64@0.23.1': + optional: true + + '@esbuild/netbsd-x64@0.23.1': + optional: true + + '@esbuild/openbsd-arm64@0.23.1': + optional: true + + '@esbuild/openbsd-x64@0.23.1': + optional: true + + '@esbuild/sunos-x64@0.23.1': + optional: 
true + + '@esbuild/win32-arm64@0.23.1': + optional: true + + '@esbuild/win32-ia32@0.23.1': + optional: true + + '@esbuild/win32-x64@0.23.1': + optional: true + + '@huggingface/tasks@0.11.11': {} + + '@jridgewell/resolve-uri@3.1.2': {} + + '@jridgewell/sourcemap-codec@1.5.0': {} + + '@jridgewell/trace-mapping@0.3.9': + dependencies: + '@jridgewell/resolve-uri': 3.1.2 + '@jridgewell/sourcemap-codec': 1.5.0 + + '@tsconfig/node10@1.0.11': {} + + '@tsconfig/node12@1.0.11': {} + + '@tsconfig/node14@1.0.3': {} + + '@tsconfig/node16@1.0.4': {} + + '@types/node@22.5.0': + dependencies: + undici-types: 6.19.8 + + acorn-walk@8.3.3: + dependencies: + acorn: 8.12.1 + + acorn@8.12.1: {} + + arg@4.1.3: {} + + create-require@1.1.1: {} + + diff@4.0.2: {} + + esbuild@0.23.1: + optionalDependencies: + '@esbuild/aix-ppc64': 0.23.1 + '@esbuild/android-arm': 0.23.1 + '@esbuild/android-arm64': 0.23.1 + '@esbuild/android-x64': 0.23.1 + '@esbuild/darwin-arm64': 0.23.1 + '@esbuild/darwin-x64': 0.23.1 + '@esbuild/freebsd-arm64': 0.23.1 + '@esbuild/freebsd-x64': 0.23.1 + '@esbuild/linux-arm': 0.23.1 + '@esbuild/linux-arm64': 0.23.1 + '@esbuild/linux-ia32': 0.23.1 + '@esbuild/linux-loong64': 0.23.1 + '@esbuild/linux-mips64el': 0.23.1 + '@esbuild/linux-ppc64': 0.23.1 + '@esbuild/linux-riscv64': 0.23.1 + '@esbuild/linux-s390x': 0.23.1 + '@esbuild/linux-x64': 0.23.1 + '@esbuild/netbsd-x64': 0.23.1 + '@esbuild/openbsd-arm64': 0.23.1 + '@esbuild/openbsd-x64': 0.23.1 + '@esbuild/sunos-x64': 0.23.1 + '@esbuild/win32-arm64': 0.23.1 + '@esbuild/win32-ia32': 0.23.1 + '@esbuild/win32-x64': 0.23.1 + + fsevents@2.3.3: + optional: true + + get-tsconfig@4.7.6: + dependencies: + resolve-pkg-maps: 1.0.0 + + handlebars@4.7.8: + dependencies: + minimist: 1.2.8 + neo-async: 2.6.2 + source-map: 0.6.1 + wordwrap: 1.0.0 + optionalDependencies: + uglify-js: 3.19.2 + + make-error@1.3.6: {} + + minimist@1.2.8: {} + + neo-async@2.6.2: {} + + node-bin-setup@1.1.3: {} + + node@20.17.0: + dependencies: + node-bin-setup: 1.1.3 + + prettier@3.3.3: {} + + resolve-pkg-maps@1.0.0: {} + + source-map@0.6.1: {} + + ts-node@10.9.2(@types/node@22.5.0)(typescript@5.5.4): + dependencies: + '@cspotcode/source-map-support': 0.8.1 + '@tsconfig/node10': 1.0.11 + '@tsconfig/node12': 1.0.11 + '@tsconfig/node14': 1.0.3 + '@tsconfig/node16': 1.0.4 + '@types/node': 22.5.0 + acorn: 8.12.1 + acorn-walk: 8.3.3 + arg: 4.1.3 + create-require: 1.1.1 + diff: 4.0.2 + make-error: 1.3.6 + typescript: 5.5.4 + v8-compile-cache-lib: 3.0.1 + yn: 3.1.1 + + tsx@4.17.0: + dependencies: + esbuild: 0.23.1 + get-tsconfig: 4.7.6 + optionalDependencies: + fsevents: 2.3.3 + + type-fest@4.25.0: {} + + typescript@5.5.4: {} + + uglify-js@3.19.2: + optional: true + + undici-types@6.19.8: {} + + v8-compile-cache-lib@3.0.1: {} + + wordwrap@1.0.0: {} + + yn@3.1.1: {} diff --git a/scripts/api-inference/scripts/.gitignore b/scripts/api-inference/scripts/.gitignore new file mode 100644 index 000000000..4c43fe68f --- /dev/null +++ b/scripts/api-inference/scripts/.gitignore @@ -0,0 +1 @@ +*.js \ No newline at end of file diff --git a/scripts/api-inference/scripts/generate.ts b/scripts/api-inference/scripts/generate.ts new file mode 100644 index 000000000..286594bbb --- /dev/null +++ b/scripts/api-inference/scripts/generate.ts @@ -0,0 +1,504 @@ +import { snippets, PipelineType } from "@huggingface/tasks"; +import Handlebars from "handlebars"; +import * as fs from "node:fs/promises"; +import * as path from "node:path/posix"; +import type { JsonObject } from "type-fest"; + +const TASKS: PipelineType[] = [ 
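+  // Each task listed here gets a generated doc page; "chat-completion" is handled
+  // separately and appended via TASKS_EXTENDED below.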
+ "automatic-speech-recognition", + "audio-classification", + "feature-extraction", + "fill-mask", + "image-classification", + "image-segmentation", + "image-to-image", + "object-detection", + "question-answering", + "summarization", + "table-question-answering", + "text-classification", + "text-generation", + "text-to-image", + "token-classification", + "translation", + "zero-shot-classification", +]; +const TASKS_EXTENDED = [...TASKS, "chat-completion"]; +const SPECS_REVISION = "main"; + +const inferenceSnippetLanguages = ["python", "js", "curl"] as const; +type InferenceSnippetLanguage = (typeof inferenceSnippetLanguages)[number]; + +// Taken from https://stackoverflow.com/a/31632215 +Handlebars.registerHelper({ + eq: (v1, v2) => v1 === v2, + ne: (v1, v2) => v1 !== v2, + lt: (v1, v2) => v1 < v2, + gt: (v1, v2) => v1 > v2, + lte: (v1, v2) => v1 <= v2, + gte: (v1, v2) => v1 >= v2, + and() { + return Array.prototype.every.call(arguments, Boolean); + }, + or() { + return Array.prototype.slice.call(arguments, 0, -1).some(Boolean); + }, +}); + +console.log("🛠️ Preparing..."); + +//////////////////////// +//// Filepath utils //// +//////////////////////// + +const ROOT_DIR = path + .join(path.normalize(import.meta.url), "..", "..") + .replace(/^(file:)/, ""); +const TEMPLATE_DIR = path.join(ROOT_DIR, "templates"); +const DOCS_DIR = path.join(ROOT_DIR, "..", "..", "docs"); +const TASKS_DOCS_DIR = path.join(DOCS_DIR, "api-inference", "tasks"); + +const NBSP = " "; // non-breaking space +const TABLE_INDENT = NBSP.repeat(8); + +function readTemplate( + templateName: string, + namespace: string, +): Promise { + const templatePath = path.join( + TEMPLATE_DIR, + namespace, + `${templateName}.handlebars`, + ); + console.log(` 🔍 Reading ${templateName}.handlebars`); + return fs.readFile(templatePath, { encoding: "utf-8" }); +} + +function writeTaskDoc(templateName: string, content: string): Promise { + const taskDocPath = path.join(TASKS_DOCS_DIR, `${templateName}.md`); + console.log(` 💾 Saving to ${taskDocPath}`); + const header = PAGE_HEADER({task:templateName}); + const contentWithHeader = `\n\n${content}`; + return fs + .mkdir(TASKS_DOCS_DIR, { recursive: true }) + .then(() => fs.writeFile(taskDocPath, contentWithHeader, { encoding: "utf-8" })); +} + +///////////////////////// +//// Task page utils //// +///////////////////////// + +const TASKS_API_URL = "https://huggingface.co/api/tasks"; +console.log(` 🕸️ Fetching ${TASKS_API_URL}`); +const response = await fetch(TASKS_API_URL); +// eslint-disable-next-line @typescript-eslint/no-explicit-any +const TASKS_DATA = (await response.json()) as any; + +/////////////////////// +//// Snippet utils //// +/////////////////////// + +const GET_SNIPPET_FN = { + curl: snippets.curl.getCurlInferenceSnippet, + js: snippets.js.getJsInferenceSnippet, + python: snippets.python.getPythonInferenceSnippet, +} as const; + +const HAS_SNIPPET_FN = { + curl: snippets.curl.hasCurlInferenceSnippet, + js: snippets.js.hasJsInferenceSnippet, + python: snippets.python.hasPythonInferenceSnippet, +} as const; + +export function getInferenceSnippet( + id: string, + pipeline_tag: PipelineType, + language: InferenceSnippetLanguage, +): string | undefined { + const modelData = { + id, + pipeline_tag, + mask_token: "[MASK]", + library_name: "", + config: {}, + }; + if (HAS_SNIPPET_FN[language](modelData)) { + return GET_SNIPPET_FN[language](modelData, "hf_***"); + } +} + +///////////////////// +//// Specs utils //// +///////////////////// + +type SpecNameType = "input" | "output" | 
"stream_output"; + +const SPECS_URL_TEMPLATE = Handlebars.compile( + `https://raw.githubusercontent.com/huggingface/huggingface.js/${SPECS_REVISION}/packages/tasks/src/tasks/{{task}}/spec/{{name}}.json`, +); +const COMMON_DEFINITIONS_URL = `https://raw.githubusercontent.com/huggingface/huggingface.js/${SPECS_REVISION}/packages/tasks/src/tasks/common-definitions.json`; + +async function fetchOneSpec( + task: PipelineType, + name: SpecNameType, +): Promise { + const url = SPECS_URL_TEMPLATE({ task, name }); + console.log(` 🕸️ Fetching ${task} ${name} specs`); + return fetch(url) + .then((res) => res.json()) + .catch(() => undefined); +} + +async function fetchSpecs( + task: PipelineType, +): Promise< + Record<"input" | "output" | "stream_output", JsonObject | undefined> +> { + return { + input: await fetchOneSpec(task, "input"), + output: await fetchOneSpec(task, "output"), + stream_output: await fetchOneSpec(task, "stream_output"), + }; +} + +async function fetchCommonDefinitions(): Promise { + console.log(` 🕸️ Fetching common definitions`); + return fetch(COMMON_DEFINITIONS_URL).then((res) => res.json()); +} + +const COMMON_DEFINITIONS = await fetchCommonDefinitions(); + +function processPayloadSchema(schema: any): JsonObject[] { + let rows: JsonObject[] = []; + + // Helper function to resolve schema references + function resolveRef(ref: string) { + const refPath = ref.split("#/")[1].split("/"); + let refSchema = ref.includes("common-definitions.json") + ? COMMON_DEFINITIONS + : schema; + for (const part of refPath) { + refSchema = refSchema[part]; + } + return refSchema; + } + + // Helper function to process a schema node + function processSchemaNode( + key: string, + value: any, + required: boolean, + parentPrefix: string, + ): void { + const isRequired = required; + let type = value.type || "unknown"; + let description = value.description || ""; + + if (value.$ref) { + // Resolve the reference + value = resolveRef(value.$ref); + type = value.type || "unknown"; + description = value.description || ""; + } + + if (value.enum) { + type = "enum"; + description = `Possible values: ${value.enum.join(", ")}.`; + } + + const isObject = type === "object" && value.properties; + const isArray = type === "array" && value.items; + const isCombinator = value.oneOf || value.allOf || value.anyOf; + const addRow = + !(isCombinator && isCombinator.length === 1) && + !description.includes("UNUSED") && + !key.includes("SKIP") && + key.length > 0; + + if (isCombinator && isCombinator.length > 1) { + description = "One of the following:"; + } + + if (isArray) { + if (value.items.$ref) { + type = "object[]"; + } else if (value.items.type) { + type = `${value.items.type}[]`; + } + } + + if (addRow) { + // Add the row to the table except if combination with only one option + if (key.includes("(#")) { + // If it's a combination, no need to re-specify the type except if it's to + // specify a constant value. + type = value.const ? 
`'${value.const}'` : ""; + } + const row = { + name: `${parentPrefix}${key}`, + type: type, + description: description.replace(/\n/g, " "), + required: isRequired, + }; + rows.push(row); + } + + if (isObject) { + // Recursively process nested objects + Object.entries(value.properties || {}).forEach( + ([nestedKey, nestedValue]) => { + const nestedRequired = value.required?.includes(nestedKey); + processSchemaNode( + nestedKey, + nestedValue, + nestedRequired, + parentPrefix + TABLE_INDENT, + ); + }, + ); + } else if (isArray) { + // Process array items + processSchemaNode("SKIP", value.items, false, parentPrefix); + } else if (isCombinator) { + // Process combinators like oneOf, allOf, anyOf + const combinators = value.oneOf || value.allOf || value.anyOf; + if (combinators.length === 1) { + // If there is only one option, process it directly + processSchemaNode(key, combinators[0], isRequired, parentPrefix); + } else { + // If there are multiple options, process each one as options + combinators.forEach((subSchema: any, index: number) => { + processSchemaNode( + `${NBSP}(#${index + 1})`, + subSchema, + isRequired, + parentPrefix + TABLE_INDENT, + ); + }); + } + } + } + + // Start processing based on the root type of the schema + if (schema.type === "array") { + // If the root schema is an array, process its items + const row = { + name: "(array)", + type: `${schema.items.type}[]`, + description: + schema.items.description || + `Output is an array of ${schema.items.type}s.`, + required: true, + }; + rows.push(row); + processSchemaNode("", schema.items, false, ""); + } else { + // Otherwise, start with the root object + Object.entries(schema.properties || {}).forEach(([key, value]) => { + const required = schema.required?.includes(key); + processSchemaNode(key, value, required, ""); + }); + } + + return rows; +} + +////////////////////////// +//// Inline templates //// +////////////////////////// + +const TIP_LINK_TO_TASK_PAGE_TEMPLATE = Handlebars.compile(` + +For more details about the \`{{task}}\` task, check out its [dedicated page](https://huggingface.co/tasks/{{task}})! You will find examples and related materials. + +`); + +const TIP_LIST_MODELS_LINK_TEMPLATE = Handlebars.compile( + `This is only a subset of the supported models. 
Find the model that suits you best [here](https://huggingface.co/models?inference=warm&pipeline_tag={{task}}&sort=trending).`, +); + +const SPECS_HEADERS = await readTemplate("specs-headers", "common"); +const PAGE_HEADER = Handlebars.compile( + await readTemplate("page-header", "common"), +); +const SNIPPETS_TEMPLATE = Handlebars.compile( + await readTemplate("snippets-template", "common"), +); +const SPECS_PAYLOAD_TEMPLATE = Handlebars.compile( + await readTemplate("specs-payload", "common"), +); +const SPECS_OUTPUT_TEMPLATE = Handlebars.compile( + await readTemplate("specs-output", "common"), +); + +//////////////////// +//// Data utils //// +//////////////////// + +const DATA: { + constants: { + specsHeaders: string; + }; + models: Record; + snippets: Record; + specs: Record< + string, + { + input: string | undefined; + output: string | undefined; + stream_output: string | undefined; + } + >; + tips: { + linksToTaskPage: Record; + listModelsLink: Record; + }; +} = { + constants: { + specsHeaders: SPECS_HEADERS, + }, + models: {}, + snippets: {}, + specs: {}, + tips: { linksToTaskPage: {}, listModelsLink: {} }, +}; + +// Check for each model if inference status is "warm" +await Promise.all( + TASKS.map(async (task) => { + await Promise.all( + TASKS_DATA[task].models.map( + async (model: { + id: string; + description: string; + inference: string | undefined; + config: JsonObject | undefined; + }) => { + console.log(` ⚡ Checking inference status ${model.id}`); + let url = `https://huggingface.co/api/models/${model.id}?expand[]=inference`; + if (task === "text-generation") { + url += "&expand[]=config"; + } + const modelData = await fetch(url).then((res) => res.json()); + model.inference = modelData.inference; + model.config = modelData.config; + }, + ), + ); + }), +); + +// Fetch recommended models +TASKS.forEach((task) => { + DATA.models[task] = TASKS_DATA[task].models.filter( + (model: { inference: string }) => + ["cold", "loading", "warm"].includes(model.inference), + ); +}); + +// Fetch snippets +// TODO: render snippets only if they are available +TASKS.forEach((task) => { + // Let's take as example the first available model that is recommended. + // Otherwise, fallback to "". + const mainModel = DATA.models[task][0]?.id ?? ""; + const taskSnippets = { + curl: getInferenceSnippet(mainModel, task, "curl"), + python: getInferenceSnippet(mainModel, task, "python"), + javascript: getInferenceSnippet(mainModel, task, "js"), + }; + DATA.snippets[task] = SNIPPETS_TEMPLATE({ + taskSnippets, + taskSnakeCase: task.replace("-", "_"), + taskAttached: task.replace("-", ""), + }); +}); + +// Render specs +await Promise.all( + TASKS_EXTENDED.map(async (task) => { + // @ts-ignore + const specs = await fetchSpecs(task); + DATA.specs[task] = { + input: specs.input + ? SPECS_PAYLOAD_TEMPLATE({ schema: processPayloadSchema(specs.input) }) + : undefined, + output: specs.output + ? SPECS_OUTPUT_TEMPLATE({ schema: processPayloadSchema(specs.output) }) + : undefined, + stream_output: specs.stream_output + ? 
SPECS_OUTPUT_TEMPLATE({ + schema: processPayloadSchema(specs.stream_output), + }) + : undefined, + }; + }), +); + +// Render tips +TASKS.forEach((task) => { + DATA.tips.linksToTaskPage[task] = TIP_LINK_TO_TASK_PAGE_TEMPLATE({ task }); + DATA.tips.listModelsLink[task] = TIP_LIST_MODELS_LINK_TEMPLATE({ task }); +}); + +/////////////////////////////////////////////// +//// Data for chat-completion special case //// +/////////////////////////////////////////////// + +function fetchChatCompletion() { + // Recommended models based on text-generation + DATA.models["chat-completion"] = DATA.models["text-generation"].filter( + // @ts-ignore + (model) => model.config?.tokenizer_config?.chat_template, + ); + + // Snippet specific to chat completion + const mainModel = DATA.models["chat-completion"][0]; + const mainModelData = { + // @ts-ignore + id: mainModel.id, + pipeline_tag: "text-generation", + mask_token: "", + library_name: "", + // @ts-ignore + config: mainModel.config, + }; + const taskSnippets = { + // @ts-ignore + curl: GET_SNIPPET_FN["curl"](mainModelData, "hf_***"), + // @ts-ignore + python: GET_SNIPPET_FN["python"](mainModelData, "hf_***"), + // @ts-ignore + javascript: GET_SNIPPET_FN["js"](mainModelData, "hf_***"), + }; + DATA.snippets["chat-completion"] = SNIPPETS_TEMPLATE({ + taskSnippets, + taskSnakeCase: "chat-completion".replace("-", "_"), + taskAttached: "chat-completion".replace("-", ""), + }); +} + +fetchChatCompletion(); + +///////////////////////// +//// Rendering utils //// +///////////////////////// + +async function renderTemplate( + templateName: string, + data: JsonObject, +): Promise { + console.log(`🎨 Rendering ${templateName}`); + const template = Handlebars.compile(await readTemplate(templateName, "task")); + return template(data); +} + +await Promise.all( + TASKS_EXTENDED.map(async (task) => { + // @ts-ignore + const rendered = await renderTemplate(task, DATA); + await writeTaskDoc(task, rendered); + }), +); + +console.log("✅ All done!"); diff --git a/scripts/api-inference/templates/common/page-header.handlebars b/scripts/api-inference/templates/common/page-header.handlebars new file mode 100644 index 000000000..54aa6c861 --- /dev/null +++ b/scripts/api-inference/templates/common/page-header.handlebars @@ -0,0 +1,11 @@ +This markdown file has been generated from a script. Please do not edit it directly. 
+For more details, check out: +- the `generate.ts` script: https://github.com/huggingface/hub-docs/blob/main/scripts/api-inference/scripts/generate.ts +- the task template defining the sections in the page: https://github.com/huggingface/hub-docs/tree/main/scripts/api-inference/templates/task/{{task}}.handlebars +- the input jsonschema specifications used to generate the input markdown table: https://github.com/huggingface/huggingface.js/blob/main/packages/tasks/src/tasks/{{task}}/spec/input.json +- the output jsonschema specifications used to generate the output markdown table: https://github.com/huggingface/huggingface.js/blob/main/packages/tasks/src/tasks/{{task}}/spec/output.json +- the snippets used to generate the example: + - curl: https://github.com/huggingface/huggingface.js/blob/main/packages/tasks/src/snippets/curl.ts + - python: https://github.com/huggingface/huggingface.js/blob/main/packages/tasks/src/snippets/python.ts + - javascript: https://github.com/huggingface/huggingface.js/blob/main/packages/tasks/src/snippets/js.ts +- the "tasks" content for recommended models: https://huggingface.co/api/tasks \ No newline at end of file diff --git a/scripts/api-inference/templates/common/snippets-template.handlebars b/scripts/api-inference/templates/common/snippets-template.handlebars new file mode 100644 index 000000000..2d0f099e2 --- /dev/null +++ b/scripts/api-inference/templates/common/snippets-template.handlebars @@ -0,0 +1,42 @@ +{{#if (or taskSnippets.curl taskSnippets.python taskSnippets.javascript)}} + + + +{{!-- cURL snippet (if exists) --}} +{{#if taskSnippets.curl}} + +```bash +{{{taskSnippets.curl}}} +``` + +{{/if}} + +{{!-- Python snippet (if exists) --}} +{{#if taskSnippets.python}} + +```py +{{{taskSnippets.python}}} +``` + +To use the Python client, see `huggingface_hub`'s [package reference](https://huggingface.co/docs/huggingface_hub/package_reference/inference_client#huggingface_hub.InferenceClient.{{taskSnakeCase}}). + +{{/if}} + +{{!-- JavaScript snippet (if exists) --}} +{{#if taskSnippets.javascript}} + +```js +{{{taskSnippets.javascript}}} +``` + +To use the JavaScript client, see `huggingface.js`'s [package reference](https://huggingface.co/docs/huggingface.js/inference/classes/HfInference#{{taskAttached}}). + +{{/if}} + + + +{{else}} + +No snippet available for this task. + +{{/if}} \ No newline at end of file diff --git a/scripts/api-inference/templates/common/specs-headers.handlebars b/scripts/api-inference/templates/common/specs-headers.handlebars new file mode 100644 index 000000000..32b6e9d94 --- /dev/null +++ b/scripts/api-inference/templates/common/specs-headers.handlebars @@ -0,0 +1,9 @@ +Some options can be configured by passing headers to the Inference API. Here are the available headers: + +| Headers | | | +| :--- | :--- | :--- | +| **authorization** | _string_ | Authentication header in the form `'Bearer: hf_****'` when `hf_****` is a personal user access token with Inference API permission. You can generate one from [your settings page](https://huggingface.co/settings/tokens). | +| **x-use-cache** | _boolean, default to `true`_ | There is a cache layer on the inference API to speed up requests we have already seen. Most models can use those results as they are deterministic (meaning the outputs will be the same anyway). However, if you use a nondeterministic model, you can set this parameter to prevent the caching mechanism from being used, resulting in a real new query. Read more about caching [here](../parameters#caching]). 
| +| **x-wait-for-model** | _boolean, default to `false`_ | If the model is not ready, wait for it instead of receiving 503. It limits the number of requests required to get your inference done. It is advised to only set this flag to true after receiving a 503 error, as it will limit hanging in your application to known places. Read more about model availability [here](../overview#eligibility]). | + +For more information about Inference API headers, check out the parameters [guide](../parameters). \ No newline at end of file diff --git a/scripts/api-inference/templates/common/specs-output.handlebars b/scripts/api-inference/templates/common/specs-output.handlebars new file mode 100644 index 000000000..7d0e7b4c0 --- /dev/null +++ b/scripts/api-inference/templates/common/specs-output.handlebars @@ -0,0 +1,9 @@ +| Body | | +| :--- | :--- | :--- | +{{#each schema}} +{{#if type}} +| **{{{name}}}** | _{{type}}_ | {{{description}}} | +{{else}} +| **{{{name}}}** | | {{{description}}} | +{{/if}} +{{/each}} \ No newline at end of file diff --git a/scripts/api-inference/templates/common/specs-payload.handlebars b/scripts/api-inference/templates/common/specs-payload.handlebars new file mode 100644 index 000000000..6459be5d9 --- /dev/null +++ b/scripts/api-inference/templates/common/specs-payload.handlebars @@ -0,0 +1,9 @@ +| Payload | | | +| :--- | :--- | :--- | +{{#each schema}} +{{#if type}} +| **{{{name}}}{{#if required}}*{{/if}}** | _{{type}}_ | {{{description}}} | +{{else}} +| **{{{name}}}** | | {{{description}}} | +{{/if}} +{{/each}} \ No newline at end of file diff --git a/scripts/api-inference/templates/task/audio-classification.handlebars b/scripts/api-inference/templates/task/audio-classification.handlebars new file mode 100644 index 000000000..8530b7de2 --- /dev/null +++ b/scripts/api-inference/templates/task/audio-classification.handlebars @@ -0,0 +1,34 @@ +## Audio Classification + +Audio classification is the task of assigning a label or class to a given audio. + +Example applications: +* Recognizing which command a user is giving +* Identifying a speaker +* Detecting the genre of a song + +{{{tips.linksToTaskPage.audio-classification}}} + +### Recommended models + +{{#each models.audio-classification}} +- [{{this.id}}](https://huggingface.co/{{this.id}}): {{this.description}} +{{/each}} + +{{{tips.listModelsLink.audio-classification}}} + +### Using the API + +{{{snippets.audio-classification}}} + +### API specification + +#### Request + +{{{specs.audio-classification.input}}} + +{{{constants.specsHeaders}}} + +#### Response + +{{{specs.audio-classification.output}}} diff --git a/scripts/api-inference/templates/task/automatic-speech-recognition.handlebars b/scripts/api-inference/templates/task/automatic-speech-recognition.handlebars new file mode 100644 index 000000000..fc81651df --- /dev/null +++ b/scripts/api-inference/templates/task/automatic-speech-recognition.handlebars @@ -0,0 +1,34 @@ +## Automatic Speech Recognition + +Automatic Speech Recognition (ASR), also known as Speech to Text (STT), is the task of transcribing a given audio to text. 
+ +Example applications: +* Transcribing a podcast +* Building a voice assistant +* Generating subtitles for a video + +{{{tips.linksToTaskPage.automatic-speech-recognition}}} + +### Recommended models + +{{#each models.automatic-speech-recognition}} +- [{{this.id}}](https://huggingface.co/{{this.id}}): {{this.description}} +{{/each}} + +{{{tips.listModelsLink.automatic-speech-recognition}}} + +### Using the API + +{{{snippets.automatic-speech-recognition}}} + +### API specification + +#### Request + +{{{specs.automatic-speech-recognition.input}}} + +{{{constants.specsHeaders}}} + +#### Response + +{{{specs.automatic-speech-recognition.output}}} diff --git a/scripts/api-inference/templates/task/chat-completion.handlebars b/scripts/api-inference/templates/task/chat-completion.handlebars new file mode 100644 index 000000000..31acb2d21 --- /dev/null +++ b/scripts/api-inference/templates/task/chat-completion.handlebars @@ -0,0 +1,45 @@ +## Chat Completion + +Generate a response given a list of messages. +This is a subtask of [`text-generation`](./text_generation) designed to generate responses in a conversational context. + +{{{tips.linksToTaskPage.chat-completion}}} + +### Recommended models + +{{#each models.chat-completion}} +- [{{this.id}}](https://huggingface.co/{{this.id}}): {{this.description}} +{{/each}} + +{{{tips.listModelsLink.chat-completion}}} + +### Using the API + +The API supports: + +* Using the chat completion API compatible with the OpenAI SDK. +* Using grammars, constraints, and tools. +* Streaming the output + +{{{snippets.chat-completion}}} + +### API specification + +#### Request + +{{{specs.chat-completion.input}}} + +{{{constants.specsHeaders}}} + +#### Response + +Output type depends on the `stream` input parameter. +If `stream` is `false` (default), the response will be a JSON object with the following fields: + +{{{specs.chat-completion.output}}} + +If `stream` is `true`, generated tokens are returned as a stream, using Server-Sent Events (SSE). +For more information about streaming, check out [this guide](https://huggingface.co/docs/text-generation-inference/conceptual/streaming). + +{{{specs.chat-completion.stream_output}}} + diff --git a/scripts/api-inference/templates/task/feature-extraction.handlebars b/scripts/api-inference/templates/task/feature-extraction.handlebars new file mode 100644 index 000000000..0b7b9748f --- /dev/null +++ b/scripts/api-inference/templates/task/feature-extraction.handlebars @@ -0,0 +1,35 @@ +## Feature Extraction + +Feature extraction is the task of converting a text into a vector (often called "embedding"). + +Example applications: +* Retrieving the most relevant documents for a query (for RAG applications). +* Reranking a list of documents based on their similarity to a query. +* Calculating the similarity between two sentences. 
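As a sketch of the sentence-similarity use case listed above, the following assumes the endpoint returns one pooled embedding per input sentence (true for many sentence-transformers models, but the exact output shape is model-dependent); the model ID and token are placeholders:

```python
# Minimal sketch: fetch two embeddings and compare them with cosine similarity.
# Assumes the model returns one flat vector per input sentence.
import math
import requests

API_URL = "https://api-inference.huggingface.co/models/sentence-transformers/all-MiniLM-L6-v2"
headers = {"Authorization": "Bearer hf_***"}

sentences = ["Today is a sunny day", "The weather is great today"]
response = requests.post(API_URL, headers=headers, json={"inputs": sentences})
emb_a, emb_b = response.json()

# Cosine similarity between the two embedding vectors.
dot = sum(a * b for a, b in zip(emb_a, emb_b))
norm = math.sqrt(sum(a * a for a in emb_a)) * math.sqrt(sum(b * b for b in emb_b))
print(dot / norm)
```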
+ +{{{tips.linksToTaskPage.feature-extraction}}} + +### Recommended models + +{{#each models.feature-extraction}} +- [{{this.id}}](https://huggingface.co/{{this.id}}): {{this.description}} +{{/each}} + +{{{tips.listModelsLink.feature-extraction}}} + +### Using the API + +{{{snippets.feature-extraction}}} + +### API specification + +#### Request + +{{{specs.feature-extraction.input}}} + +{{{constants.specsHeaders}}} + +#### Response + +{{{specs.feature-extraction.output}}} + diff --git a/scripts/api-inference/templates/task/fill-mask.handlebars b/scripts/api-inference/templates/task/fill-mask.handlebars new file mode 100644 index 000000000..c9c131e22 --- /dev/null +++ b/scripts/api-inference/templates/task/fill-mask.handlebars @@ -0,0 +1,29 @@ +## Fill-mask + +Mask filling is the task of predicting the right word (or token, to be precise) in the middle of a sequence. + +{{{tips.linksToTaskPage.fill-mask}}} + +### Recommended models + +{{#each models.fill-mask}} +- [{{this.id}}](https://huggingface.co/{{this.id}}): {{this.description}} +{{/each}} + +{{{tips.listModelsLink.fill-mask}}} + +### Using the API + +{{{snippets.fill-mask}}} + +### API specification + +#### Request + +{{{specs.fill-mask.input}}} + +{{{constants.specsHeaders}}} + +#### Response + +{{{specs.fill-mask.output}}} diff --git a/scripts/api-inference/templates/task/image-classification.handlebars b/scripts/api-inference/templates/task/image-classification.handlebars new file mode 100644 index 000000000..96a6ff49a --- /dev/null +++ b/scripts/api-inference/templates/task/image-classification.handlebars @@ -0,0 +1,30 @@ +## Image Classification + +Image classification is the task of assigning a label or class to an entire image. Each image is expected to belong to a single class. + +{{{tips.linksToTaskPage.image-classification}}} + +### Recommended models + +{{#each models.image-classification}} +- [{{this.id}}](https://huggingface.co/{{this.id}}): {{this.description}} +{{/each}} + +{{{tips.listModelsLink.image-classification}}} + +### Using the API + +{{{snippets.image-classification}}} + +### API specification + +#### Request + +{{{specs.image-classification.input}}} + +{{{constants.specsHeaders}}} + +#### Response + +{{{specs.image-classification.output}}} + diff --git a/scripts/api-inference/templates/task/image-segmentation.handlebars b/scripts/api-inference/templates/task/image-segmentation.handlebars new file mode 100644 index 000000000..11ea77f47 --- /dev/null +++ b/scripts/api-inference/templates/task/image-segmentation.handlebars @@ -0,0 +1,30 @@ +## Image Segmentation + +Image Segmentation divides an image into segments where each pixel in the image is mapped to an object.
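To make the image-segmentation flow concrete, here is a minimal Python sketch; the model ID, token and file name are placeholders, and it assumes the response follows the output spec above, i.e. a list of segments each carrying a label, a score and a base64-encoded mask:

```python
# Minimal sketch: image segmentation over the Inference API.
# The request body is the raw image bytes; each returned segment is assumed to
# include a label, a score and a base64-encoded mask image.
import base64
import requests

API_URL = "https://api-inference.huggingface.co/models/facebook/detr-resnet-50-panoptic"
headers = {"Authorization": "Bearer hf_***"}

with open("street.jpg", "rb") as f:
    image = f.read()

response = requests.post(API_URL, headers=headers, data=image)
for segment in response.json():
    mask_bytes = base64.b64decode(segment["mask"])  # decode the per-segment mask
    print(segment["label"], round(segment["score"], 3), f"{len(mask_bytes)} bytes of mask")
```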
+ +{{{tips.linksToTaskPage.image-segmentation}}} + +### Recommended models + +{{#each models.image-segmentation}} +- [{{this.id}}](https://huggingface.co/{{this.id}}): {{this.description}} +{{/each}} + +{{{tips.listModelsLink.image-segmentation}}} + +### Using the API + +{{{snippets.image-segmentation}}} + +### API specification + +#### Request + +{{{specs.image-segmentation.input}}} + +{{{constants.specsHeaders}}} + +#### Response + +{{{specs.image-segmentation.output}}} + diff --git a/scripts/api-inference/templates/task/image-to-image.handlebars b/scripts/api-inference/templates/task/image-to-image.handlebars new file mode 100644 index 000000000..ba21bf4fe --- /dev/null +++ b/scripts/api-inference/templates/task/image-to-image.handlebars @@ -0,0 +1,35 @@ +## Image to Image + +Image-to-image is the task of transforming a source image to match the characteristics of a target image or a target image domain. + +Example applications: +* Transferring the style of an image to another image +* Colorizing a black and white image +* Increasing the resolution of an image + +{{{tips.linksToTaskPage.image-to-image}}} + +### Recommended models + +{{#each models.image-to-image}} +- [{{this.id}}](https://huggingface.co/{{this.id}}): {{this.description}} +{{/each}} + +{{{tips.listModelsLink.image-to-image}}} + +### Using the API + +{{{snippets.image-to-image}}} + +### API specification + +#### Request + +{{{specs.image-to-image.input}}} + +{{{constants.specsHeaders}}} + +#### Response + +{{{specs.image-to-image.output}}} + diff --git a/scripts/api-inference/templates/task/object-detection.handlebars b/scripts/api-inference/templates/task/object-detection.handlebars new file mode 100644 index 000000000..3892e34a3 --- /dev/null +++ b/scripts/api-inference/templates/task/object-detection.handlebars @@ -0,0 +1,29 @@ +## Object detection + +Object Detection models allow users to identify objects of certain defined classes. These models receive an image as input and output the images with bounding boxes and labels on detected objects. + +{{{tips.linksToTaskPage.object-detection}}} + +### Recommended models + +{{#each models.object-detection}} +- [{{this.id}}](https://huggingface.co/{{this.id}}): {{this.description}} +{{/each}} + +{{{tips.listModelsLink.object-detection}}} + +### Using the API + +{{{snippets.object-detection}}} + +### API specification + +#### Request + +{{{specs.object-detection.input}}} + +{{{constants.specsHeaders}}} + +#### Response + +{{{specs.object-detection.output}}} diff --git a/scripts/api-inference/templates/task/question-answering.handlebars b/scripts/api-inference/templates/task/question-answering.handlebars new file mode 100644 index 000000000..3ca4e93d3 --- /dev/null +++ b/scripts/api-inference/templates/task/question-answering.handlebars @@ -0,0 +1,29 @@ +## Question Answering + +Question Answering models can retrieve the answer to a question from a given text, which is useful for searching for an answer in a document. 
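A minimal Python sketch of an extractive question-answering call, with the question and its context sent together in the payload; the model ID and token are placeholders:

```python
# Minimal sketch: extractive question answering over the Inference API.
import requests

API_URL = "https://api-inference.huggingface.co/models/deepset/roberta-base-squad2"
headers = {"Authorization": "Bearer hf_***"}

payload = {
    "inputs": {
        "question": "Which name is also used to describe the Amazon rainforest?",
        "context": "The Amazon rainforest, also known in English as Amazonia, covers most of the Amazon basin.",
    }
}

response = requests.post(API_URL, headers=headers, json=payload)
print(response.json())  # e.g. {"answer": "Amazonia", "score": ..., "start": ..., "end": ...}
```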
+ +{{{tips.linksToTaskPage.question-answering}}} + +### Recommended models + +{{#each models.question-answering}} +- [{{this.id}}](https://huggingface.co/{{this.id}}): {{this.description}} +{{/each}} + +{{{tips.listModelsLink.question-answering}}} + +### Using the API + +{{{snippets.question-answering}}} + +### API specification + +#### Request + +{{{specs.question-answering.input}}} + +{{{constants.specsHeaders}}} + +#### Response + +{{{specs.question-answering.output}}} diff --git a/scripts/api-inference/templates/task/summarization.handlebars b/scripts/api-inference/templates/task/summarization.handlebars new file mode 100644 index 000000000..1df382189 --- /dev/null +++ b/scripts/api-inference/templates/task/summarization.handlebars @@ -0,0 +1,29 @@ +## Summarization + +Summarization is the task of producing a shorter version of a document while preserving its important information. Some models can extract text from the original input, while other models can generate entirely new text. + +{{{tips.linksToTaskPage.summarization}}} + +### Recommended models + +{{#each models.summarization}} +- [{{this.id}}](https://huggingface.co/{{this.id}}): {{this.description}} +{{/each}} + +{{{tips.listModelsLink.summarization}}} + +### Using the API + +{{{snippets.summarization}}} + +### API specification + +#### Request + +{{{specs.summarization.input}}} + +{{{constants.specsHeaders}}} + +#### Response + +{{{specs.summarization.output}}} diff --git a/scripts/api-inference/templates/task/table-question-answering.handlebars b/scripts/api-inference/templates/task/table-question-answering.handlebars new file mode 100644 index 000000000..087ff53bf --- /dev/null +++ b/scripts/api-inference/templates/task/table-question-answering.handlebars @@ -0,0 +1,29 @@ +## Table Question Answering + +Table Question Answering (Table QA) is the task of answering a question about the information contained in a given table. + +{{{tips.linksToTaskPage.table-question-answering}}} + +### Recommended models + +{{#each models.table-question-answering}} +- [{{this.id}}](https://huggingface.co/{{this.id}}): {{this.description}} +{{/each}} + +{{{tips.listModelsLink.table-question-answering}}} + +### Using the API + +{{{snippets.table-question-answering}}} + +### API specification + +#### Request + +{{{specs.table-question-answering.input}}} + +{{{constants.specsHeaders}}} + +#### Response + +{{{specs.table-question-answering.output}}} diff --git a/scripts/api-inference/templates/task/text-classification.handlebars b/scripts/api-inference/templates/task/text-classification.handlebars new file mode 100644 index 000000000..123d1f92a --- /dev/null +++ b/scripts/api-inference/templates/task/text-classification.handlebars @@ -0,0 +1,29 @@ +## Text Classification + +Text Classification is the task of assigning a label or class to a given text. Some use cases are sentiment analysis, natural language inference, and assessing grammatical correctness.
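A minimal Python sketch of a text-classification call (sentiment analysis here); the model ID and token are placeholders, and the response is assumed to be a list of label/score pairs per input, as in the output spec above:

```python
# Minimal sketch: text classification over the Inference API.
import requests

API_URL = "https://api-inference.huggingface.co/models/distilbert-base-uncased-finetuned-sst-2-english"
headers = {"Authorization": "Bearer hf_***"}

response = requests.post(
    API_URL,
    headers=headers,
    json={"inputs": "I like you. I love you."},
)
# e.g. [[{"label": "POSITIVE", "score": 0.99}, {"label": "NEGATIVE", "score": 0.01}]]
print(response.json())
```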
+ +{{{tips.linksToTaskPage.text-classification}}} + +### Recommended models + +{{#each models.text-classification}} +- [{{this.id}}](https://huggingface.co/{{this.id}}): {{this.description}} +{{/each}} + +{{{tips.listModelsLink.text-classification}}} + +### Using the API + +{{{snippets.text-classification}}} + +### API specification + +#### Request + +{{{specs.text-classification.input}}} + +{{{constants.specsHeaders}}} + +#### Response + +{{{specs.text-classification.output}}} diff --git a/scripts/api-inference/templates/task/text-generation.handlebars b/scripts/api-inference/templates/task/text-generation.handlebars new file mode 100644 index 000000000..9720cc175 --- /dev/null +++ b/scripts/api-inference/templates/task/text-generation.handlebars @@ -0,0 +1,39 @@ +## Text Generation + +Generate text based on a prompt. + +If you are interested in a Chat Completion task, which generates a response based on a list of messages, check out the [`chat-completion`](./chat_completion) task. + +{{{tips.linksToTaskPage.text-generation}}} + +### Recommended models + +{{#each models.text-generation}} +- [{{this.id}}](https://huggingface.co/{{this.id}}): {{this.description}} +{{/each}} + +{{{tips.listModelsLink.text-generation}}} + +### Using the API + +{{{snippets.text-generation}}} + +### API specification + +#### Request + +{{{specs.text-generation.input}}} + +{{{constants.specsHeaders}}} + +#### Response + +Output type depends on the `stream` input parameter. +If `stream` is `false` (default), the response will be a JSON object with the following fields: + +{{{specs.text-generation.output}}} + +If `stream` is `true`, generated tokens are returned as a stream, using Server-Sent Events (SSE). +For more information about streaming, check out [this guide](https://huggingface.co/docs/text-generation-inference/conceptual/streaming). + +{{{specs.text-generation.stream_output}}} diff --git a/scripts/api-inference/templates/task/text-to-image.handlebars b/scripts/api-inference/templates/task/text-to-image.handlebars new file mode 100644 index 000000000..ac65056e6 --- /dev/null +++ b/scripts/api-inference/templates/task/text-to-image.handlebars @@ -0,0 +1,29 @@ +## Text to Image + +Generate an image based on a given text prompt. + +{{{tips.linksToTaskPage.text-to-image}}} + +### Recommended models + +{{#each models.text-to-image}} +- [{{this.id}}](https://huggingface.co/{{this.id}}): {{this.description}} +{{/each}} + +{{{tips.listModelsLink.text-to-image}}} + +### Using the API + +{{{snippets.text-to-image}}} + +### API specification + +#### Request + +{{{specs.text-to-image.input}}} + +{{{constants.specsHeaders}}} + +#### Response + +{{{specs.text-to-image.output}}} diff --git a/scripts/api-inference/templates/task/token-classification.handlebars b/scripts/api-inference/templates/task/token-classification.handlebars new file mode 100644 index 000000000..4a627783f --- /dev/null +++ b/scripts/api-inference/templates/task/token-classification.handlebars @@ -0,0 +1,38 @@ +## Token Classification + +Token classification is a task in which a label is assigned to some tokens in a text. Some popular token classification subtasks are Named Entity Recognition (NER) and Part-of-Speech (PoS) tagging. 
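A minimal Python sketch of a token-classification call for Named Entity Recognition; the model ID and token are placeholders, and the entity fields shown are assumed to follow the output spec above:

```python
# Minimal sketch: token classification (NER) over the Inference API.
import requests

API_URL = "https://api-inference.huggingface.co/models/dslim/bert-base-NER"
headers = {"Authorization": "Bearer hf_***"}

response = requests.post(
    API_URL,
    headers=headers,
    json={"inputs": "My name is Sarah Jessica Parker but you can call me Jessica"},
)
for entity in response.json():
    # Each entity is assumed to carry its label, the matched text and its character span.
    print(entity["entity_group"], entity["word"], entity["start"], entity["end"])
```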
+ +{{{tips.linksToTaskPage.token-classification}}} + +### Recommended models + +{{#each models.token-classification}} +- [{{this.id}}](https://huggingface.co/{{this.id}}): {{this.description}} +{{/each}} + +{{{tips.listModelsLink.token-classification}}} + +### Using the API + +{{{snippets.token-classification}}} + +### API specification + +#### Request + +{{{specs.token-classification.input}}} + +{{{constants.specsHeaders}}} + +#### Response + +{{{specs.token-classification.output}}} + diff --git a/scripts/api-inference/templates/task/translation.handlebars b/scripts/api-inference/templates/task/translation.handlebars new file mode 100644 index 000000000..7cbede05d --- /dev/null +++ b/scripts/api-inference/templates/task/translation.handlebars @@ -0,0 +1,29 @@ +## Translation + +Translation is the task of converting text from one language to another. + +{{{tips.linksToTaskPage.translation}}} + +### Recommended models + +{{#each models.translation}} +- [{{this.id}}](https://huggingface.co/{{this.id}}): {{this.description}} +{{/each}} + +{{{tips.listModelsLink.translation}}} + +### Using the API + +{{{snippets.translation}}} + +### API specification + +#### Request + +{{{specs.translation.input}}} + +{{{constants.specsHeaders}}} + +#### Response + +{{{specs.translation.output}}} diff --git a/scripts/api-inference/templates/task/zero-shot-classification.handlebars b/scripts/api-inference/templates/task/zero-shot-classification.handlebars new file mode 100644 index 000000000..e0e830e93 --- /dev/null +++ b/scripts/api-inference/templates/task/zero-shot-classification.handlebars @@ -0,0 +1,29 @@ +## Zero-Shot Classification + +Zero-shot text classification lets you try out classification without writing any code: you simply pass a sentence or paragraph together with the possible labels for it, and you get a result. The model has not necessarily been trained on the labels you provide, but it can still predict the correct label.
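A minimal Python sketch of a zero-shot-classification call, passing the candidate labels as a parameter alongside the input text; the model ID, token and labels are placeholders:

```python
# Minimal sketch: zero-shot classification over the Inference API.
import requests

API_URL = "https://api-inference.huggingface.co/models/facebook/bart-large-mnli"
headers = {"Authorization": "Bearer hf_***"}

payload = {
    "inputs": "Hi, I recently bought a device from your company but it is not working as advertised.",
    "parameters": {"candidate_labels": ["refund", "legal", "faq"]},  # labels the model was not trained on
}

response = requests.post(API_URL, headers=headers, json=payload)
print(response.json())  # e.g. {"sequence": "...", "labels": [...], "scores": [...]}
```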
+ +{{{tips.linksToTaskPage.zero-shot-classification}}} + +### Recommended models + +{{#each models.zero-shot-classification}} +- [{{this.id}}](https://huggingface.co/{{this.id}}): {{this.description}} +{{/each}} + +{{{tips.listModelsLink.zero-shot-classification}}} + +### Using the API + +{{{snippets.zero-shot-classification}}} + +### API specification + +#### Request + +{{{specs.zero-shot-classification.input}}} + +{{{constants.specsHeaders}}} + +#### Response + +{{{specs.zero-shot-classification.output}}} diff --git a/scripts/api-inference/tsconfig.json b/scripts/api-inference/tsconfig.json new file mode 100644 index 000000000..20b47e4ab --- /dev/null +++ b/scripts/api-inference/tsconfig.json @@ -0,0 +1,20 @@ +{ + "compilerOptions": { + "allowSyntheticDefaultImports": true, + "lib": ["ES2022", "DOM"], + "module": "ESNext", + "target": "ESNext", + "moduleResolution": "node", + "forceConsistentCasingInFileNames": true, + "strict": true, + "noImplicitAny": true, + "strictNullChecks": true, + "skipLibCheck": true, + "noImplicitOverride": true, + "outDir": "./dist", + "declaration": true, + "declarationMap": true + }, + "include": ["scripts"], + "exclude": ["dist"] +}