diff --git a/docs/api-inference/_toctree.yml b/docs/api-inference/_toctree.yml
index a68f3abfb..247a96201 100644
--- a/docs/api-inference/_toctree.yml
+++ b/docs/api-inference/_toctree.yml
@@ -12,11 +12,23 @@
- local: parameters
title: Parameters
- sections:
+ - local: tasks/chat_completion
+ title: Chat Completion
- local: tasks/fill_mask
title: Fill Mask
- local: tasks/image_to_image
- title: Image-to-image
+ title: Image to Image
+ - local: tasks/question_answering
+ title: Question Answering
+ - local: tasks/summarization
+ title: Summarization
+ - local: tasks/table_question_answering
+ title: Table Question Answering
+ - local: tasks/text_classification
+ title: Text Classification
+ - local: tasks/text_generation
+ title: Text Generation
- local: tasks/text_to_image
- title: Text-to-image
+ title: Text to Image
title: Detailed Task Parameters
title: API Reference
\ No newline at end of file
diff --git a/docs/api-inference/tasks/chat_completion.md b/docs/api-inference/tasks/chat_completion.md
new file mode 100644
index 000000000..c01fe9ac1
--- /dev/null
+++ b/docs/api-inference/tasks/chat_completion.md
@@ -0,0 +1,202 @@
+## Chat Completion
+
+Generate a response given a list of messages.
+This is a subtask of [`text-generation`](./text_generation) designed to generate responses in a conversational context.
+
+
+
+### Recommended models
+
+- [google/gemma-2-2b-it](https://huggingface.co/google/gemma-2-2b-it): A text-generation model trained to follow instructions.
+- [meta-llama/Meta-Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3.1-8B-Instruct): Very powerful text generation model trained to follow instructions.
+- [microsoft/Phi-3-mini-4k-instruct](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct): Small yet powerful text generation model.
+- [AI-MO/NuminaMath-7B-TIR](https://huggingface.co/AI-MO/NuminaMath-7B-TIR): A very powerful model that can solve mathematical problems.
+- [HuggingFaceH4/starchat2-15b-v0.1](https://huggingface.co/HuggingFaceH4/starchat2-15b-v0.1): Strong coding assistant model.
+- [mistralai/Mistral-Nemo-Instruct-2407](https://huggingface.co/mistralai/Mistral-Nemo-Instruct-2407): Very strong open-source large language model.
+
+
+
+### API specification
+
+#### Request
+
+| Payload | | |
+| :--- | :--- | :--- |
+| **frequency_penalty** | _number_ | Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim. |
+| **logprobs** | _boolean_ | Whether to return log probabilities of the output tokens or not. If true, returns the log probabilities of each output token returned in the content of message. |
+| **max_tokens** | _integer_ | The maximum number of tokens that can be generated in the chat completion. |
+| **messages*** | _object[]_ | A list of messages comprising the conversation so far. |
+| ** content** | _string_ | |
+| ** name** | _string_ | |
+| ** role*** | _string_ | |
+| ** tool_calls** | _object[]_ | |
+| ** function*** | _object_ | |
+| ** arguments*** | _object_ | |
+| ** description** | _string_ | |
+| ** name*** | _string_ | |
+| ** id*** | _integer_ | |
+| ** type*** | _string_ | |
+| **presence_penalty** | _number_ | Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics. |
+| **seed** | _integer_ | |
+| **stop** | _string[]_ | Up to 4 sequences where the API will stop generating further tokens. |
+| **stream** | _boolean_ | |
+| **temperature** | _number_ | What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or `top_p` but not both. |
+| **tool_choice** | _object_ | One of the following: |
+| ** (#1)** | | |
+| ** FunctionName*** | _string_ | |
+| ** (#2)** | | Possible values: OneOf. |
+| **tool_prompt** | _string_ | A prompt to be inserted before the tools. |
+| **tools** | _object[]_ | A list of tools the model may call. Currently, only functions are supported as a tool. Use this to provide a list of functions the model may generate JSON inputs for. |
+| ** function*** | _object_ | |
+| ** arguments*** | _object_ | |
+| ** description** | _string_ | |
+| ** name*** | _string_ | |
+| ** type*** | _string_ | |
+| **top_logprobs** | _integer_ | An integer between 0 and 5 specifying the number of most likely tokens to return at each token position, each with an associated log probability. logprobs must be set to true if this parameter is used. |
+| **top_p** | _number_ | An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. |
+
+
+Some options can be configured by passing headers to the Inference API. Here are the available headers:
+
+| Headers | | |
+| :--- | :--- | :--- |
+| **authorization** | _string_ | Authentication header in the form `'Bearer hf_****'`, where `hf_****` is a personal user access token with Inference API permission. You can generate one from [your settings page](https://huggingface.co/settings/tokens). |
+| **x-use-cache** | _boolean, default to `true`_ | There is a cache layer on the Inference API to speed up requests we have already seen. Most models can use those results as they are deterministic (meaning the outputs will be the same anyway). However, if you use a nondeterministic model, you can set this header to `false` to prevent the caching mechanism from being used, resulting in a real new query. Read more about caching [here](../parameters#caching). |
+| **x-wait-for-model** | _boolean, default to `false`_ | If the model is not ready, wait for it instead of receiving a 503 error. This limits the number of requests required to get your inference done. It is advised to only set this flag to `true` after receiving a 503 error, as it will limit hanging in your application to known places. Read more about model availability [here](../overview#eligibility). |
+
+For more information about Inference API headers, check out the parameters [guide](../parameters).
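
As a minimal sketch, the headers described above could be assembled in Python as follows. The helper name `build_headers` is hypothetical, and sending the boolean headers as the lowercase strings `"true"` and `"false"` is an assumption rather than something this page specifies.

```python
def build_headers(token, use_cache=True, wait_for_model=False):
    """Assemble request headers for the Inference API (illustrative helper)."""
    return {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
        # Assumption: boolean headers are serialized as lowercase strings.
        "x-use-cache": "true" if use_cache else "false",
        "x-wait-for-model": "true" if wait_for_model else "false",
    }


# Bypass the cache and wait for a cold model instead of receiving a 503.
headers = build_headers("hf_***", use_cache=False, wait_for_model=True)
print(headers["x-use-cache"], headers["x-wait-for-model"])
```

The resulting dictionary can be passed as the `headers` argument of any HTTP client call.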
+
+#### Response
+
+Output type depends on the `stream` input parameter.
+If `stream` is `false` (default), the response will be a JSON object with the following fields:
+
+| Body | | |
+| :--- | :--- | :--- |
+| **choices** | _object[]_ | |
+| ** finish_reason** | _string_ | |
+| ** index** | _integer_ | |
+| ** logprobs** | _object_ | |
+| ** content** | _object[]_ | |
+| ** logprob** | _number_ | |
+| ** token** | _string_ | |
+| ** top_logprobs** | _object[]_ | |
+| ** logprob** | _number_ | |
+| ** token** | _string_ | |
+| ** message** | _object_ | |
+| ** content** | _string_ | |
+| ** name** | _string_ | |
+| ** role** | _string_ | |
+| ** tool_calls** | _object[]_ | |
+| ** function** | _object_ | |
+| ** arguments** | _object_ | |
+| ** description** | _string_ | |
+| ** name** | _string_ | |
+| ** id** | _integer_ | |
+| ** type** | _string_ | |
+| **created** | _integer_ | |
+| **id** | _string_ | |
+| **model** | _string_ | |
+| **object** | _string_ | |
+| **system_fingerprint** | _string_ | |
+| **usage** | _object_ | |
+| ** completion_tokens** | _integer_ | |
+| ** prompt_tokens** | _integer_ | |
+| ** total_tokens** | _integer_ | |
+
+
+If `stream` is `true`, generated tokens are returned as a stream, using Server-Sent Events (SSE).
+For more information about streaming, check out [this guide](https://huggingface.co/docs/text-generation-inference/conceptual/streaming).
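
For readers consuming the stream without a client library, the sketch below shows one way to pull the generated text out of the SSE lines. It assumes each event arrives as a `data: {...}` line carrying the `choices[].delta.content` field documented in the table below, and that the stream may end with an OpenAI-style `data: [DONE]` marker. The function name and the sample events are illustrative, not real API output.

```python
import json


def iter_stream_content(lines):
    """Yield the text fragments carried by a chat-completion SSE stream."""
    for raw in lines:
        line = raw.decode("utf-8") if isinstance(raw, bytes) else raw
        line = line.strip()
        # SSE payload lines start with "data:"; skip blanks and comments.
        if not line.startswith("data:"):
            continue
        data = line[len("data:"):].strip()
        # Assumption: the stream may end with an OpenAI-style "[DONE]" marker.
        if data == "[DONE]":
            break
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            content = (choice.get("delta") or {}).get("content")
            if content:
                yield content


# Hand-written sample events, not real API output.
sample = [
    'data: {"choices": [{"index": 0, "delta": {"role": "assistant", "content": "Par"}}]}',
    "",
    'data: {"choices": [{"index": 0, "delta": {"content": "is"}, "finish_reason": "stop"}]}',
    "data: [DONE]",
]
print("".join(iter_stream_content(sample)))
```

The same generator accepts raw byte lines, so it can be fed directly from an HTTP response iterated line by line.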
+
+| Body | | |
+| :--- | :--- | :--- |
+| **choices** | _object[]_ | |
+| ** delta** | _object_ | |
+| ** content** | _string_ | |
+| ** role** | _string_ | |
+| ** tool_calls** | _object_ | |
+| ** function** | _object_ | |
+| ** arguments** | _string_ | |
+| ** name** | _string_ | |
+| ** id** | _string_ | |
+| ** index** | _integer_ | |
+| ** type** | _string_ | |
+| ** finish_reason** | _string_ | |
+| ** index** | _integer_ | |
+| ** logprobs** | _object_ | |
+| ** content** | _object[]_ | |
+| ** logprob** | _number_ | |
+| ** token** | _string_ | |
+| ** top_logprobs** | _object[]_ | |
+| ** logprob** | _number_ | |
+| ** token** | _string_ | |
+| **created** | _integer_ | |
+| **id** | _string_ | |
+| **model** | _string_ | |
+| **object** | _string_ | |
+| **system_fingerprint** | _string_ | |
+
+
+### Using the API
+
+
+
+
+
+```bash
+curl 'https://api-inference.huggingface.co/models/google/gemma-2-2b-it/v1/chat/completions' \
+-H "Authorization: Bearer hf_***" \
+-H 'Content-Type: application/json' \
+-d '{
+ "model": "google/gemma-2-2b-it",
+ "messages": [{"role": "user", "content": "What is the capital of France?"}],
+ "max_tokens": 500,
+ "stream": false
+}'
+
+```
+
+
+
+```py
+from huggingface_hub import InferenceClient
+
+client = InferenceClient(
+    "google/gemma-2-2b-it",
+    token="hf_***",
+)
+
+for message in client.chat_completion(
+    messages=[{"role": "user", "content": "What is the capital of France?"}],
+    max_tokens=500,
+    stream=True,
+):
+    print(message.choices[0].delta.content or "", end="")
+
+```
+
+To use the Python client, see `huggingface_hub`'s [package reference](https://huggingface.co/docs/huggingface_hub/package_reference/inference_client#huggingface_hub.InferenceClient.chat_completion).
+
+
+
+```js
+import { HfInference } from "@huggingface/inference";
+
+const inference = new HfInference("hf_***");
+
+for await (const chunk of inference.chatCompletionStream({
+  model: "google/gemma-2-2b-it",
+  messages: [{ role: "user", content: "What is the capital of France?" }],
+  max_tokens: 500,
+})) {
+  process.stdout.write(chunk.choices[0]?.delta?.content || "");
+}
+
+```
+
+To use the JavaScript client, see `huggingface.js`'s [package reference](https://huggingface.co/docs/huggingface.js/inference/classes/HfInference#chatcompletion).
+
+
+
+
+
diff --git a/docs/api-inference/tasks/fill_mask.md b/docs/api-inference/tasks/fill_mask.md
index 64260ae39..197fef37c 100644
--- a/docs/api-inference/tasks/fill_mask.md
+++ b/docs/api-inference/tasks/fill_mask.md
@@ -1,6 +1,114 @@
-## Fill Mask
+## Fill-mask
-Mask filling is the task of predicting the right word (token to be precise) in the middle of a sequence.
+Mask filling is the task of predicting the right word (token to be precise) in the middle of a sequence.
+
+
+
+For more details about the `fill-mask` task, check out its [dedicated page](https://huggingface.co/tasks/fill-mask)! You will find examples and related materials.
+
+
+
+### Recommended models
+
+- [distilbert-base-uncased](https://huggingface.co/distilbert-base-uncased): A faster and smaller model than the famous BERT model.
+- [xlm-roberta-base](https://huggingface.co/xlm-roberta-base): A multilingual model trained on 100 languages.
+
+This is only a subset of the supported models. Find the model that suits you best [here](https://huggingface.co/models?inference=warm&pipeline_tag=fill-mask&sort=trending).
+
+### API specification
+
+#### Request
+
+| Payload | | |
+| :--- | :--- | :--- |
+| **inputs*** | _string_ | The text with masked tokens |
+| **parameters** | _object_ | Additional inference parameters for Fill Mask |
+| ** top_k** | _integer_ | When passed, overrides the number of predictions to return. |
+| ** targets** | _string[]_ | When passed, the model will limit the scores to the passed targets instead of looking up in the whole vocabulary. If the provided targets are not in the model vocab, they will be tokenized and the first resulting token will be used (with a warning, and that might be slower). |
+
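
To illustrate the optional `parameters` object from the table above, here is a small sketch that builds a request body with `top_k` and `targets`. The helper name `build_fill_mask_payload` and the example target words are hypothetical; only the field names come from the table.

```python
import json


def build_fill_mask_payload(text, top_k=None, targets=None):
    """Build a fill-mask request body, including only the parameters that are set."""
    payload = {"inputs": text}
    parameters = {}
    if top_k is not None:
        parameters["top_k"] = top_k
    if targets:
        parameters["targets"] = list(targets)
    if parameters:
        payload["parameters"] = parameters
    return payload


payload = build_fill_mask_payload(
    "The answer to the universe is [MASK].",
    top_k=3,
    targets=["life", "love"],
)
print(json.dumps(payload, indent=2))
```

Leaving both arguments unset produces a body with only `inputs`, which matches the simplest request form.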
+
+Some options can be configured by passing headers to the Inference API. Here are the available headers:
+
+| Headers | | |
+| :--- | :--- | :--- |
+| **authorization** | _string_ | Authentication header in the form `'Bearer hf_****'`, where `hf_****` is a personal user access token with Inference API permission. You can generate one from [your settings page](https://huggingface.co/settings/tokens). |
+| **x-use-cache** | _boolean, default to `true`_ | There is a cache layer on the Inference API to speed up requests we have already seen. Most models can use those results as they are deterministic (meaning the outputs will be the same anyway). However, if you use a nondeterministic model, you can set this header to `false` to prevent the caching mechanism from being used, resulting in a real new query. Read more about caching [here](../parameters#caching). |
+| **x-wait-for-model** | _boolean, default to `false`_ | If the model is not ready, wait for it instead of receiving a 503 error. This limits the number of requests required to get your inference done. It is advised to only set this flag to `true` after receiving a 503 error, as it will limit hanging in your application to known places. Read more about model availability [here](../overview#eligibility). |
+
+For more information about Inference API headers, check out the parameters [guide](../parameters).
+
+#### Response
+
+| Body | | |
+| :--- | :--- | :--- |
+| **(array)** | _object[]_ | Output is an array of objects. |
+| ** sequence** | _string_ | The corresponding input with the mask token prediction. |
+| ** score** | _number_ | The corresponding probability |
+| ** token** | _integer_ | The predicted token id (to replace the masked one). |
+| ** token_str** | _string_ | The predicted token (to replace the masked one). |
+
+
+### Using the API
+
+
+
+
+
+```bash
+curl https://api-inference.huggingface.co/models/distilbert-base-uncased \
+ -X POST \
+ -d '{"inputs": "The answer to the universe is [MASK]."}' \
+ -H 'Content-Type: application/json' \
+ -H "Authorization: Bearer hf_***"
+
+```
+
+
+
+```py
+import requests
+
+API_URL = "https://api-inference.huggingface.co/models/distilbert-base-uncased"
+headers = {"Authorization": "Bearer hf_***"}
+
+def query(payload):
+    response = requests.post(API_URL, headers=headers, json=payload)
+    return response.json()
+
+output = query({
+    "inputs": "The answer to the universe is [MASK].",
+})
+```
+
+To use the Python client, see `huggingface_hub`'s [package reference](https://huggingface.co/docs/huggingface_hub/package_reference/inference_client#huggingface_hub.InferenceClient.fill_mask).
+
+
+
+```js
+async function query(data) {
+  const response = await fetch(
+    "https://api-inference.huggingface.co/models/distilbert-base-uncased",
+    {
+      headers: {
+        Authorization: "Bearer hf_***",
+        "Content-Type": "application/json",
+      },
+      method: "POST",
+      body: JSON.stringify(data),
+    }
+  );
+  const result = await response.json();
+  return result;
+}
+
+query({"inputs": "The answer to the universe is [MASK]."}).then((response) => {
+  console.log(JSON.stringify(response));
+});
+```
+
+To use the JavaScript client, see `huggingface.js`'s [package reference](https://huggingface.co/docs/huggingface.js/inference/classes/HfInference#fillmask).
+
+
+
-Automated docs below
diff --git a/docs/api-inference/tasks/image_to_image.md b/docs/api-inference/tasks/image_to_image.md
index 1b5e2241e..eb197489e 100644
--- a/docs/api-inference/tasks/image_to_image.md
+++ b/docs/api-inference/tasks/image_to_image.md
@@ -1,4 +1,4 @@
-## Image-to-image
+## Image to Image
Image-to-image is the task of transforming a source image to match the characteristics of a target image or a target image domain.
Any image manipulation and enhancement is possible with image to image models.
@@ -31,28 +31,31 @@ This is only a subset of the supported models. Find the model that suits you bes
| Payload | | |
| :--- | :--- | :--- |
-| **inputs** | _object, required_ | The input image data |
-| **parameters** | _object, optional_ | Additional inference parameters for Image To Image |
-| ** guidance_scale** | _number, optional_ | For diffusion models. A higher guidance scale value encourages the model to generate images closely linked to the text prompt at the expense of lower image quality. |
-| ** negative_prompt** | _array, optional_ | One or several prompt to guide what NOT to include in image generation. |
-| ** num_inference_steps** | _integer, optional_ | For diffusion models. The number of denoising steps. More denoising steps usually lead to a higher quality image at the expense of slower inference. |
-| ** target_size** | _object, optional_ | The size in pixel of the output image |
-| ** width** | _integer, required_ | |
-| ** height** | _integer, required_ | |
+| **inputs*** | _object_ | The input image data |
+| **parameters** | _object_ | Additional inference parameters for Image To Image |
+| ** guidance_scale** | _number_ | For diffusion models. A higher guidance scale value encourages the model to generate images closely linked to the text prompt at the expense of lower image quality. |
+| ** negative_prompt** | _string[]_ | One or several prompts to guide what NOT to include in image generation. |
+| ** num_inference_steps** | _integer_ | For diffusion models. The number of denoising steps. More denoising steps usually lead to a higher quality image at the expense of slower inference. |
+| ** target_size** | _object_ | The size in pixels of the output image |
+| ** width*** | _integer_ | |
+| ** height*** | _integer_ | |
+Some options can be configured by passing headers to the Inference API. Here are the available headers:
+
| Headers | | |
| :--- | :--- | :--- |
-| **authorization** | _string, optional_ | Authentication header in the form `'Bearer: hf_****'` when `hf_****` is a personal user access token with Inference API permission. You can generate one from [your settings page](https://huggingface.co/settings/tokens). |
-| **x-use-cache** | _boolean, optional, default to `true`_ | There is a cache layer on the inference API to speed up requests we have already seen. Most models can use those results as they are deterministic (meaning the outputs will be the same anyway). However, if you use a nondeterministic model, you can set this parameter to prevent the caching mechanism from being used, resulting in a real new query. Read more about caching [here](../parameters#caching]). |
-| **x-wait-for-model** | _boolean, optional, default to `false`_ | If the model is not ready, wait for it instead of receiving 503. It limits the number of requests required to get your inference done. It is advised to only set this flag to true after receiving a 503 error, as it will limit hanging in your application to known places. Read more about model availability [here](../overview#eligibility]). |
+| **authorization** | _string_ | Authentication header in the form `'Bearer hf_****'`, where `hf_****` is a personal user access token with Inference API permission. You can generate one from [your settings page](https://huggingface.co/settings/tokens). |
+| **x-use-cache** | _boolean, default to `true`_ | There is a cache layer on the Inference API to speed up requests we have already seen. Most models can use those results as they are deterministic (meaning the outputs will be the same anyway). However, if you use a nondeterministic model, you can set this header to `false` to prevent the caching mechanism from being used, resulting in a real new query. Read more about caching [here](../parameters#caching). |
+| **x-wait-for-model** | _boolean, default to `false`_ | If the model is not ready, wait for it instead of receiving a 503 error. This limits the number of requests required to get your inference done. It is advised to only set this flag to `true` after receiving a 503 error, as it will limit hanging in your application to known places. Read more about model availability [here](../overview#eligibility). |
+For more information about Inference API headers, check out the parameters [guide](../parameters).
#### Response
| Body | |
-| :--- | :--- |
-| **image** | The output image |
+| :--- | :--- | :--- |
+| **image** | _object_ | The output image |
### Using the API
diff --git a/docs/api-inference/tasks/question_answering.md b/docs/api-inference/tasks/question_answering.md
new file mode 100644
index 000000000..3f724c9c2
--- /dev/null
+++ b/docs/api-inference/tasks/question_answering.md
@@ -0,0 +1,127 @@
+## Question Answering
+
+Question Answering models can retrieve the answer to a question from a given text, which is useful for searching for an answer in a document.
+
+
+
+For more details about the `question-answering` task, check out its [dedicated page](https://huggingface.co/tasks/question-answering)! You will find examples and related materials.
+
+
+
+### Recommended models
+
+- [deepset/roberta-base-squad2](https://huggingface.co/deepset/roberta-base-squad2): A robust baseline model for most question answering domains.
+- [google/tapas-base-finetuned-wtq](https://huggingface.co/google/tapas-base-finetuned-wtq): A special model that can answer questions from tables!
+
+This is only a subset of the supported models. Find the model that suits you best [here](https://huggingface.co/models?inference=warm&pipeline_tag=question-answering&sort=trending).
+
+### API specification
+
+#### Request
+
+| Payload | | |
+| :--- | :--- | :--- |
+| **inputs*** | _object_ | One (context, question) pair to answer |
+| ** context*** | _string_ | The context to be used for answering the question |
+| ** question*** | _string_ | The question to be answered |
+| **parameters** | _object_ | Additional inference parameters for Question Answering |
+| ** top_k** | _integer_ | The number of answers to return (will be chosen by order of likelihood). Note that fewer than `top_k` answers are returned if there are not enough options available within the context. |
+| ** doc_stride** | _integer_ | If the context is too long to fit with the question for the model, it will be split into several chunks with some overlap. This argument controls the size of that overlap. |
+| ** max_answer_len** | _integer_ | The maximum length of predicted answers (i.e., only answers with a shorter length are considered). |
+| ** max_seq_len** | _integer_ | The maximum length of the total sentence (context + question) in tokens of each chunk passed to the model. The context will be split into several chunks (using `doc_stride` as overlap) if needed. |
+| ** max_question_len** | _integer_ | The maximum length of the question after tokenization. It will be truncated if needed. |
+| ** handle_impossible_answer** | _boolean_ | Whether to accept impossible as an answer. |
+| ** align_to_words** | _boolean_ | Attempts to align the answer to real words. Improves quality on space separated languages. Might hurt on non-space-separated languages (like Japanese or Chinese) |
+
+
+Some options can be configured by passing headers to the Inference API. Here are the available headers:
+
+| Headers | | |
+| :--- | :--- | :--- |
+| **authorization** | _string_ | Authentication header in the form `'Bearer hf_****'`, where `hf_****` is a personal user access token with Inference API permission. You can generate one from [your settings page](https://huggingface.co/settings/tokens). |
+| **x-use-cache** | _boolean, default to `true`_ | There is a cache layer on the Inference API to speed up requests we have already seen. Most models can use those results as they are deterministic (meaning the outputs will be the same anyway). However, if you use a nondeterministic model, you can set this header to `false` to prevent the caching mechanism from being used, resulting in a real new query. Read more about caching [here](../parameters#caching). |
+| **x-wait-for-model** | _boolean, default to `false`_ | If the model is not ready, wait for it instead of receiving a 503 error. This limits the number of requests required to get your inference done. It is advised to only set this flag to `true` after receiving a 503 error, as it will limit hanging in your application to known places. Read more about model availability [here](../overview#eligibility). |
+
+For more information about Inference API headers, check out the parameters [guide](../parameters).
+
+#### Response
+
+| Body | | |
+| :--- | :--- | :--- |
+| **(array)** | _object[]_ | Output is an array of objects. |
+| ** answer** | _string_ | The answer to the question. |
+| ** score** | _number_ | The probability associated to the answer. |
+| ** start** | _integer_ | The character position in the input where the answer begins. |
+| ** end** | _integer_ | The character position in the input where the answer ends. |
+
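
The `start` and `end` fields in the table above can be used to locate the answer inside the original context. The sketch below picks the highest-scoring answer and checks its span. It assumes `end` is an exclusive character offset, so that `context[start:end]` equals `answer`. The function name and the sample response, including its scores, are made up for illustration.

```python
def best_answer(output, context):
    """Return the highest-scoring answer and whether its span matches the context."""
    # Assumption: accept a single object as well as the documented array.
    candidates = output if isinstance(output, list) else [output]
    best = max(candidates, key=lambda item: item["score"])
    # Assumption: `start` is inclusive and `end` is exclusive.
    span = context[best["start"]:best["end"]]
    return best["answer"], span == best["answer"]


context = "My name is Clara and I live in Berkeley."
# Hand-written sample response; the scores are made up.
sample_output = [
    {"answer": "Berkeley", "score": 0.02, "start": 31, "end": 39},
    {"answer": "Clara", "score": 0.93, "start": 11, "end": 16},
]
answer, span_matches = best_answer(sample_output, context)
print(answer, span_matches)
```

Checking the span is a cheap way to confirm that the offsets refer to the same context string that was sent in the request.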
+
+### Using the API
+
+
+
+
+
+```bash
+curl https://api-inference.huggingface.co/models/deepset/roberta-base-squad2 \
+ -X POST \
+ -d '{"inputs": { "question": "What is my name?", "context": "My name is Clara and I live in Berkeley." }}' \
+ -H 'Content-Type: application/json' \
+ -H "Authorization: Bearer hf_***"
+
+```
+
+
+
+```py
+import requests
+
+API_URL = "https://api-inference.huggingface.co/models/deepset/roberta-base-squad2"
+headers = {"Authorization": "Bearer hf_***"}
+
+def query(payload):
+    response = requests.post(API_URL, headers=headers, json=payload)
+    return response.json()
+
+output = query({
+    "inputs": {
+        "question": "What is my name?",
+        "context": "My name is Clara and I live in Berkeley.",
+    },
+})
+```
+
+To use the Python client, see `huggingface_hub`'s [package reference](https://huggingface.co/docs/huggingface_hub/package_reference/inference_client#huggingface_hub.InferenceClient.question_answering).
+
+
+
+```js
+async function query(data) {
+  const response = await fetch(
+    "https://api-inference.huggingface.co/models/deepset/roberta-base-squad2",
+    {
+      headers: {
+        Authorization: "Bearer hf_***",
+        "Content-Type": "application/json",
+      },
+      method: "POST",
+      body: JSON.stringify(data),
+    }
+  );
+  const result = await response.json();
+  return result;
+}
+
+query({"inputs": {
+  "question": "What is my name?",
+  "context": "My name is Clara and I live in Berkeley."
+}}).then((response) => {
+  console.log(JSON.stringify(response));
+});
+```
+
+To use the JavaScript client, see `huggingface.js`'s [package reference](https://huggingface.co/docs/huggingface.js/inference/classes/HfInference#questionanswering).
+
+
+
+
+
diff --git a/docs/api-inference/tasks/summarization.md b/docs/api-inference/tasks/summarization.md
new file mode 100644
index 000000000..f0ed74b66
--- /dev/null
+++ b/docs/api-inference/tasks/summarization.md
@@ -0,0 +1,106 @@
+## Summarization
+
+Summarization is the task of producing a shorter version of a document while preserving its important information. Some models can extract text from the original input, while other models can generate entirely new text.
+
+
+
+For more details about the `summarization` task, check out its [dedicated page](https://huggingface.co/tasks/summarization)! You will find examples and related materials.
+
+
+
+### Recommended models
+
+- [facebook/bart-large-cnn](https://huggingface.co/facebook/bart-large-cnn): A strong summarization model trained on English news articles. Excels at generating factual summaries.
+- [google/bigbird-pegasus-large-pubmed](https://huggingface.co/google/bigbird-pegasus-large-pubmed): A summarization model trained on medical articles.
+
+This is only a subset of the supported models. Find the model that suits you best [here](https://huggingface.co/models?inference=warm&pipeline_tag=summarization&sort=trending).
+
+### API specification
+
+#### Request
+
+| Payload | | |
+| :--- | :--- | :--- |
+
+
+Some options can be configured by passing headers to the Inference API. Here are the available headers:
+
+| Headers | | |
+| :--- | :--- | :--- |
+| **authorization** | _string_ | Authentication header in the form `'Bearer hf_****'`, where `hf_****` is a personal user access token with Inference API permission. You can generate one from [your settings page](https://huggingface.co/settings/tokens). |
+| **x-use-cache** | _boolean, default to `true`_ | There is a cache layer on the Inference API to speed up requests we have already seen. Most models can use those results as they are deterministic (meaning the outputs will be the same anyway). However, if you use a nondeterministic model, you can set this header to `false` to prevent the caching mechanism from being used, resulting in a real new query. Read more about caching [here](../parameters#caching). |
+| **x-wait-for-model** | _boolean, default to `false`_ | If the model is not ready, wait for it instead of receiving a 503 error. This limits the number of requests required to get your inference done. It is advised to only set this flag to `true` after receiving a 503 error, as it will limit hanging in your application to known places. Read more about model availability [here](../overview#eligibility). |
+
+For more information about Inference API headers, check out the parameters [guide](../parameters).
+
+#### Response
+
+| Body | | |
+| :--- | :--- | :--- |
+| **summary_text** | _string_ | The summarized text. |
+
+
+### Using the API
+
+
+
+
+
+```bash
+curl https://api-inference.huggingface.co/models/facebook/bart-large-cnn \
+ -X POST \
+ -d '{"inputs": "The tower is 324 metres (1,063 ft) tall, about the same height as an 81-storey building, and the tallest structure in Paris. Its base is square, measuring 125 metres (410 ft) on each side. During its construction, the Eiffel Tower surpassed the Washington Monument to become the tallest man-made structure in the world, a title it held for 41 years until the Chrysler Building in New York City was finished in 1930. It was the first structure to reach a height of 300 metres. Due to the addition of a broadcasting aerial at the top of the tower in 1957, it is now taller than the Chrysler Building by 5.2 metres (17 ft). Excluding transmitters, the Eiffel Tower is the second tallest free-standing structure in France after the Millau Viaduct."}' \
+ -H 'Content-Type: application/json' \
+ -H "Authorization: Bearer hf_***"
+
+```
+
+
+
+```py
+import requests
+
+API_URL = "https://api-inference.huggingface.co/models/facebook/bart-large-cnn"
+headers = {"Authorization": "Bearer hf_***"}
+
+def query(payload):
+    response = requests.post(API_URL, headers=headers, json=payload)
+    return response.json()
+
+output = query({
+ "inputs": "The tower is 324 metres (1,063 ft) tall, about the same height as an 81-storey building, and the tallest structure in Paris. Its base is square, measuring 125 metres (410 ft) on each side. During its construction, the Eiffel Tower surpassed the Washington Monument to become the tallest man-made structure in the world, a title it held for 41 years until the Chrysler Building in New York City was finished in 1930. It was the first structure to reach a height of 300 metres. Due to the addition of a broadcasting aerial at the top of the tower in 1957, it is now taller than the Chrysler Building by 5.2 metres (17 ft). Excluding transmitters, the Eiffel Tower is the second tallest free-standing structure in France after the Millau Viaduct.",
+})
+```
+
+To use the Python client, see `huggingface_hub`'s [package reference](https://huggingface.co/docs/huggingface_hub/package_reference/inference_client#huggingface_hub.InferenceClient.summarization).
+
+
+
+```js
+async function query(data) {
+  const response = await fetch(
+    "https://api-inference.huggingface.co/models/facebook/bart-large-cnn",
+    {
+      headers: {
+        Authorization: "Bearer hf_***",
+        "Content-Type": "application/json",
+      },
+      method: "POST",
+      body: JSON.stringify(data),
+    }
+  );
+  const result = await response.json();
+  return result;
+}
+
+query({"inputs": "The tower is 324 metres (1,063 ft) tall, about the same height as an 81-storey building, and the tallest structure in Paris. Its base is square, measuring 125 metres (410 ft) on each side. During its construction, the Eiffel Tower surpassed the Washington Monument to become the tallest man-made structure in the world, a title it held for 41 years until the Chrysler Building in New York City was finished in 1930. It was the first structure to reach a height of 300 metres. Due to the addition of a broadcasting aerial at the top of the tower in 1957, it is now taller than the Chrysler Building by 5.2 metres (17 ft). Excluding transmitters, the Eiffel Tower is the second tallest free-standing structure in France after the Millau Viaduct."}).then((response) => {
+ console.log(JSON.stringify(response));
+});
+```
+
+To use the JavaScript client, see `huggingface.js`'s [package reference](https://huggingface.co/docs/huggingface.js/inference/classes/HfInference#summarization).
+
+
+
+
+
diff --git a/docs/api-inference/tasks/table_question_answering.md b/docs/api-inference/tasks/table_question_answering.md
new file mode 100644
index 000000000..e3122e425
--- /dev/null
+++ b/docs/api-inference/tasks/table_question_answering.md
@@ -0,0 +1,138 @@
+## Table Question Answering
+
+Table Question Answering (Table QA) is the task of answering a question about the information in a given table.
+
+
+
+For more details about the `table-question-answering` task, check out its [dedicated page](https://huggingface.co/tasks/table-question-answering)! You will find examples and related materials.
+
+
+
+### Recommended models
+
+- [microsoft/tapex-base](https://huggingface.co/microsoft/tapex-base): A table question answering model capable of neural SQL execution, i.e., it uses TAPEX to execute a SQL query on a given table.
+- [google/tapas-base-finetuned-wtq](https://huggingface.co/google/tapas-base-finetuned-wtq): A robust table question answering model.
+
+This is only a subset of the supported models. Find the model that suits you best [here](https://huggingface.co/models?inference=warm&pipeline_tag=table-question-answering&sort=trending).
+
+### API specification
+
+#### Request
+
+| Payload | | |
+| :--- | :--- | :--- |
+| **inputs*** | _object_ | One (table, question) pair to answer |
+| ** table*** | _object_ | The table to serve as context for the questions |
+| ** question*** | _string_ | The question to be answered about the table |
+| **parameters** | _object_ | Additional inference parameters for Table Question Answering |
+
+
+Some options can be configured by passing headers to the Inference API. Here are the available headers:
+
+| Headers | | |
+| :--- | :--- | :--- |
+| **authorization** | _string_ | Authentication header in the form `'Bearer hf_****'`, where `hf_****` is a personal user access token with Inference API permission. You can generate one from [your settings page](https://huggingface.co/settings/tokens). |
+| **x-use-cache** | _boolean, default to `true`_ | There is a cache layer on the inference API to speed up requests we have already seen. Most models can use those results as they are deterministic (meaning the outputs will be the same anyway). However, if you use a nondeterministic model, you can set this header to `false` to prevent the caching mechanism from being used, resulting in a real new query. Read more about caching [here](../parameters#caching). |
+| **x-wait-for-model** | _boolean, default to `false`_ | If the model is not ready, wait for it instead of receiving 503. It limits the number of requests required to get your inference done. It is advised to only set this flag to true after receiving a 503 error, as it will limit hanging in your application to known places. Read more about model availability [here](../overview#eligibility). |
+
+For more information about Inference API headers, check out the parameters [guide](../parameters).
+
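The advice in the headers table (send the request normally first, and only opt into `x-wait-for-model` after a 503) can be sketched as below. This is a minimal sketch, not the client's actual behavior: `build_headers`, `query_with_fallback` and the injected `post` callable are hypothetical helpers introduced for illustration, and sending the header values as the strings `"true"`/`"false"` is an assumption.

```python
def build_headers(token, use_cache=True, wait_for_model=False):
    """Build request headers; only non-default options are added."""
    headers = {"Authorization": f"Bearer {token}"}
    if not use_cache:
        # Assumption: the header value is sent as the string "false".
        headers["x-use-cache"] = "false"
    if wait_for_model:
        # Assumption: the header value is sent as the string "true".
        headers["x-wait-for-model"] = "true"
    return headers


def query_with_fallback(post, payload, token):
    """Try once without waiting; on a 503, retry once with x-wait-for-model.

    `post` is a hypothetical callable taking (headers, payload) and
    returning a (status_code, body) tuple.
    """
    status, body = post(build_headers(token), payload)
    if status == 503:
        status, body = post(build_headers(token, wait_for_model=True), payload)
    return status, body
```

In practice, `post` could wrap `requests.post(API_URL, headers=headers, json=payload)` and return `(response.status_code, response.json())`. Keeping it injectable makes the retry logic easy to exercise without network access.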
+#### Response
+
+| Body | | |
+| :--- | :--- | :--- |
+| **(array)** | _object[]_ | Output is an array of objects. |
+| ** answer** | _string_ | The answer of the question given the table. If there is an aggregator, the answer will be preceded by `AGGREGATOR >`. |
+| ** coordinates** | _array[]_ | Coordinates of the cells of the answers. |
+| ** cells** | _string[]_ | List of strings made up of the answer cell values. |
+| ** aggregator** | _string_ | If the model has an aggregator, this returns the aggregator. |
+
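To illustrate how the fields in the response table fit together, here is a minimal sketch that turns a response into readable lines. The helper name `summarize_answers` and the sample response are invented for illustration; the sample is not real API output, and treating a missing aggregator as `NONE` is an assumption.

```python
def summarize_answers(response):
    """Format each answer object of a Table QA response as one readable line."""
    lines = []
    for item in response:
        cells = ", ".join(item.get("cells", []))
        # Assumption: a missing or empty aggregator is reported as "NONE".
        aggregator = item.get("aggregator") or "NONE"
        lines.append(f"{item['answer']} (cells: {cells}; aggregator: {aggregator})")
    return lines


# Hypothetical response shaped like the table above (illustration only).
sample_response = [
    {
        "answer": "36542",
        "coordinates": [[0, 1]],
        "cells": ["36542"],
        "aggregator": "NONE",
    }
]

for line in summarize_answers(sample_response):
    print(line)
```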
+
+### Using the API
+
+
+
+
+
+```bash
+curl https://api-inference.huggingface.co/models/microsoft/tapex-base \
+ -X POST \
+ -d '{"inputs": { "query": "How many stars does the transformers repository have?", "table": { "Repository": ["Transformers", "Datasets", "Tokenizers"], "Stars": ["36542", "4512", "3934"], "Contributors": ["651", "77", "34"], "Programming language": [ "Python", "Python", "Rust, Python and NodeJS" ] } }}' \
+ -H 'Content-Type: application/json' \
+ -H "Authorization: Bearer hf_***"
+
+```
+
+
+
+```py
+import requests
+
+API_URL = "https://api-inference.huggingface.co/models/microsoft/tapex-base"
+headers = {"Authorization": "Bearer hf_***"}
+
+def query(payload):
+ response = requests.post(API_URL, headers=headers, json=payload)
+ return response.json()
+
+output = query({
+ "inputs": {
+ "query": "How many stars does the transformers repository have?",
+ "table": {
+ "Repository": ["Transformers", "Datasets", "Tokenizers"],
+ "Stars": ["36542", "4512", "3934"],
+ "Contributors": ["651", "77", "34"],
+ "Programming language": [
+ "Python",
+ "Python",
+ "Rust, Python and NodeJS"
+ ]
+ }
+},
+})
+```
+
+To use the Python client, see `huggingface_hub`'s [package reference](https://huggingface.co/docs/huggingface_hub/package_reference/inference_client#huggingface_hub.InferenceClient.table_question_answering).
+
+
+
+```js
+async function query(data) {
+ const response = await fetch(
+ "https://api-inference.huggingface.co/models/microsoft/tapex-base",
+ {
+ headers: {
+ Authorization: "Bearer hf_***",
+ "Content-Type": "application/json",
+ },
+ method: "POST",
+ body: JSON.stringify(data),
+ }
+ );
+ const result = await response.json();
+ return result;
+}
+
+query({"inputs": {
+ "query": "How many stars does the transformers repository have?",
+ "table": {
+ "Repository": ["Transformers", "Datasets", "Tokenizers"],
+ "Stars": ["36542", "4512", "3934"],
+ "Contributors": ["651", "77", "34"],
+ "Programming language": [
+ "Python",
+ "Python",
+ "Rust, Python and NodeJS"
+ ]
+ }
+}}).then((response) => {
+ console.log(JSON.stringify(response));
+});
+```
+
+To use the JavaScript client, see `huggingface.js`'s [package reference](https://huggingface.co/docs/huggingface.js/inference/classes/HfInference#tablequestionanswering).
+
+
+
+
+
diff --git a/docs/api-inference/tasks/text_classification.md b/docs/api-inference/tasks/text_classification.md
new file mode 100644
index 000000000..8fffb6654
--- /dev/null
+++ b/docs/api-inference/tasks/text_classification.md
@@ -0,0 +1,112 @@
+## Text Classification
+
+Text Classification is the task of assigning a label or class to a given text. Some use cases are sentiment analysis, natural language inference, and assessing grammatical correctness.
+
+
+
+For more details about the `text-classification` task, check out its [dedicated page](https://huggingface.co/tasks/text-classification)! You will find examples and related materials.
+
+
+
+### Recommended models
+
+- [distilbert-base-uncased-finetuned-sst-2-english](https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english): A robust model trained for sentiment analysis.
+- [roberta-large-mnli](https://huggingface.co/roberta-large-mnli): Multi-genre natural language inference model.
+
+This is only a subset of the supported models. Find the model that suits you best [here](https://huggingface.co/models?inference=warm&pipeline_tag=text-classification&sort=trending).
+
+### API specification
+
+#### Request
+
+| Payload | | |
+| :--- | :--- | :--- |
+| **inputs*** | _string_ | The text to classify |
+| **parameters** | _object_ | Additional inference parameters for Text Classification |
+| ** function_to_apply** | _enum_ | Possible values: sigmoid, softmax, none. |
+| ** top_k** | _integer_ | When specified, limits the output to the top K most probable classes. |
+
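The `function_to_apply` and `top_k` parameters are easier to picture with a small worked example. The sketch below shows the usual definitions of softmax and sigmoid applied to raw model scores, followed by a top-K cut. It assumes the server applies these functions in the standard way, which this page does not confirm; `apply_function` and `top_k_labels` are hypothetical helper names, and the scores are invented.

```python
import math


def apply_function(logits, function_to_apply="softmax"):
    """Apply the named post-processing function to raw scores."""
    if function_to_apply == "softmax":
        # Subtract the max for numerical stability; results sum to 1.
        highest = max(logits)
        exps = [math.exp(x - highest) for x in logits]
        total = sum(exps)
        return [e / total for e in exps]
    if function_to_apply == "sigmoid":
        # Each score is squashed independently into (0, 1).
        return [1.0 / (1.0 + math.exp(-x)) for x in logits]
    if function_to_apply == "none":
        return list(logits)
    raise ValueError(f"unknown function_to_apply: {function_to_apply}")


def top_k_labels(labels, scores, k=None):
    """Return label/score pairs sorted by score, truncated to k if given."""
    ranked = sorted(zip(labels, scores), key=lambda pair: pair[1], reverse=True)
    if k is not None:
        ranked = ranked[:k]
    return [{"label": label, "score": score} for label, score in ranked]


# Invented raw scores for two classes (illustration only).
probabilities = apply_function([-1.0, 3.0], "softmax")
print(top_k_labels(["NEGATIVE", "POSITIVE"], probabilities, k=1))
```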
+
+Some options can be configured by passing headers to the Inference API. Here are the available headers:
+
+| Headers | | |
+| :--- | :--- | :--- |
+| **authorization** | _string_ | Authentication header in the form `'Bearer hf_****'`, where `hf_****` is a personal user access token with Inference API permission. You can generate one from [your settings page](https://huggingface.co/settings/tokens). |
+| **x-use-cache** | _boolean, default to `true`_ | There is a cache layer on the inference API to speed up requests we have already seen. Most models can use those results as they are deterministic (meaning the outputs will be the same anyway). However, if you use a nondeterministic model, you can set this header to `false` to prevent the caching mechanism from being used, resulting in a real new query. Read more about caching [here](../parameters#caching). |
+| **x-wait-for-model** | _boolean, default to `false`_ | If the model is not ready, wait for it instead of receiving 503. It limits the number of requests required to get your inference done. It is advised to only set this flag to true after receiving a 503 error, as it will limit hanging in your application to known places. Read more about model availability [here](../overview#eligibility). |
+
+For more information about Inference API headers, check out the parameters [guide](../parameters).
+
+#### Response
+
+| Body | | |
+| :--- | :--- | :--- |
+| **(array)** | _object[]_ | Output is an array of objects. |
+| ** label** | _string_ | The predicted class label. |
+| ** score** | _number_ | The corresponding probability. |
+
+
+### Using the API
+
+
+
+
+
+```bash
+curl https://api-inference.huggingface.co/models/distilbert-base-uncased-finetuned-sst-2-english \
+ -X POST \
+ -d '{"inputs": "I like you. I love you"}' \
+ -H 'Content-Type: application/json' \
+ -H "Authorization: Bearer hf_***"
+
+```
+
+
+
+```py
+import requests
+
+API_URL = "https://api-inference.huggingface.co/models/distilbert-base-uncased-finetuned-sst-2-english"
+headers = {"Authorization": "Bearer hf_***"}
+
+def query(payload):
+ response = requests.post(API_URL, headers=headers, json=payload)
+ return response.json()
+
+output = query({
+ "inputs": "I like you. I love you",
+})
+```
+
+To use the Python client, see `huggingface_hub`'s [package reference](https://huggingface.co/docs/huggingface_hub/package_reference/inference_client#huggingface_hub.InferenceClient.text_classification).
+
+
+
+```js
+async function query(data) {
+ const response = await fetch(
+ "https://api-inference.huggingface.co/models/distilbert-base-uncased-finetuned-sst-2-english",
+ {
+ headers: {
+ Authorization: "Bearer hf_***",
+ "Content-Type": "application/json",
+ },
+ method: "POST",
+ body: JSON.stringify(data),
+ }
+ );
+ const result = await response.json();
+ return result;
+}
+
+query({"inputs": "I like you. I love you"}).then((response) => {
+ console.log(JSON.stringify(response));
+});
+```
+
+To use the JavaScript client, see `huggingface.js`'s [package reference](https://huggingface.co/docs/huggingface.js/inference/classes/HfInference#textclassification).
+
+
+
+
+
diff --git a/docs/api-inference/tasks/text_generation.md b/docs/api-inference/tasks/text_generation.md
new file mode 100644
index 000000000..fb3e41b3f
--- /dev/null
+++ b/docs/api-inference/tasks/text_generation.md
@@ -0,0 +1,203 @@
+## Text Generation
+
+Generate text based on a prompt.
+
+If you are interested in a Chat Completion task, which generates a response based on a list of messages, check out the [`chat-completion`](./chat_completion) task.
+
+
+
+For more details about the `text-generation` task, check out its [dedicated page](https://huggingface.co/tasks/text-generation)! You will find examples and related materials.
+
+
+
+### Recommended models
+
+- [google/gemma-2-2b-it](https://huggingface.co/google/gemma-2-2b-it): A text-generation model trained to follow instructions.
+- [bigcode/starcoder](https://huggingface.co/bigcode/starcoder): A code generation model that can generate code in 80+ languages.
+- [meta-llama/Meta-Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3.1-8B-Instruct): Very powerful text generation model trained to follow instructions.
+- [microsoft/Phi-3-mini-4k-instruct](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct): Small yet powerful text generation model.
+- [AI-MO/NuminaMath-7B-TIR](https://huggingface.co/AI-MO/NuminaMath-7B-TIR): A very powerful model that can solve mathematical problems.
+- [HuggingFaceH4/starchat2-15b-v0.1](https://huggingface.co/HuggingFaceH4/starchat2-15b-v0.1): Strong coding assistant model.
+- [mistralai/Mistral-Nemo-Instruct-2407](https://huggingface.co/mistralai/Mistral-Nemo-Instruct-2407): Very strong open-source large language model.
+
+This is only a subset of the supported models. Find the model that suits you best [here](https://huggingface.co/models?inference=warm&pipeline_tag=text-generation&sort=trending).
+
+### API specification
+
+#### Request
+
+| Payload | | |
+| :--- | :--- | :--- |
+| **inputs*** | _string_ | |
+| **parameters** | _object_ | |
+| ** best_of** | _integer_ | |
+| ** decoder_input_details** | _boolean_ | |
+| ** details** | _boolean_ | |
+| ** do_sample** | _boolean_ | |
+| ** frequency_penalty** | _number_ | |
+| ** grammar** | _object_ | One of the following: |
+| ** (#1)** | | |
+| ** type*** | _enum_ | Possible values: json. |
+| ** value*** | _object_ | A string that represents a [JSON Schema](https://json-schema.org/). JSON Schema is a declarative language that allows you to annotate JSON documents with types and descriptions. |
+| ** (#2)** | | |
+| ** type*** | _enum_ | Possible values: regex. |
+| ** value*** | _string_ | |
+| ** max_new_tokens** | _integer_ | |
+| ** repetition_penalty** | _number_ | |
+| ** return_full_text** | _boolean_ | |
+| ** seed** | _integer_ | |
+| ** stop** | _string[]_ | |
+| ** temperature** | _number_ | |
+| ** top_k** | _integer_ | |
+| ** top_n_tokens** | _integer_ | |
+| ** top_p** | _number_ | |
+| ** truncate** | _integer_ | |
+| ** typical_p** | _number_ | |
+| ** watermark** | _boolean_ | |
+| **stream** | _boolean_ | |
+
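Since most rows in the payload table carry no description, a small sketch of how the pieces nest may help: generation options go under `parameters`, a `grammar` is an object with `type` and `value`, and `stream` sits at the top level next to `inputs`. The helper `build_generation_payload` is hypothetical, the parameter values are arbitrary, and passing the JSON Schema as a dictionary (rather than a string) is an assumption.

```python
def build_generation_payload(prompt, json_schema=None, stream=False, **parameters):
    """Assemble a text-generation request body following the payload table."""
    params = dict(parameters)
    if json_schema is not None:
        # Grammar option (#1) from the table: type "json" plus a schema value.
        params["grammar"] = {"type": "json", "value": json_schema}
    payload = {"inputs": prompt}
    if params:
        payload["parameters"] = params
    if stream:
        # `stream` is a top-level field, not part of `parameters`.
        payload["stream"] = True
    return payload


# Arbitrary example values (illustration only).
payload = build_generation_payload(
    "Can you please let us know more details about your ",
    max_new_tokens=20,
    temperature=0.7,
)
print(payload)
```

The resulting dictionary can be passed as `payload` to the `query` function shown under "Using the API".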
+
+Some options can be configured by passing headers to the Inference API. Here are the available headers:
+
+| Headers | | |
+| :--- | :--- | :--- |
+| **authorization** | _string_ | Authentication header in the form `'Bearer hf_****'`, where `hf_****` is a personal user access token with Inference API permission. You can generate one from [your settings page](https://huggingface.co/settings/tokens). |
+| **x-use-cache** | _boolean, default to `true`_ | There is a cache layer on the inference API to speed up requests we have already seen. Most models can use those results as they are deterministic (meaning the outputs will be the same anyway). However, if you use a nondeterministic model, you can set this header to `false` to prevent the caching mechanism from being used, resulting in a real new query. Read more about caching [here](../parameters#caching). |
+| **x-wait-for-model** | _boolean, default to `false`_ | If the model is not ready, wait for it instead of receiving 503. It limits the number of requests required to get your inference done. It is advised to only set this flag to true after receiving a 503 error, as it will limit hanging in your application to known places. Read more about model availability [here](../overview#eligibility). |
+
+For more information about Inference API headers, check out the parameters [guide](../parameters).
+
+#### Response
+
+Output type depends on the `stream` input parameter.
+If `stream` is `false` (default), the response will be a JSON object with the following fields:
+
+| Body | | |
+| :--- | :--- | :--- |
+| **details** | _object_ | |
+| ** best_of_sequences** | _object[]_ | |
+| ** finish_reason** | _enum_ | Possible values: length, eos_token, stop_sequence. |
+| ** generated_text** | _string_ | |
+| ** generated_tokens** | _integer_ | |
+| ** prefill** | _object[]_ | |
+| ** id** | _integer_ | |
+| ** logprob** | _number_ | |
+| ** text** | _string_ | |
+| ** seed** | _integer_ | |
+| ** tokens** | _object[]_ | |
+| ** id** | _integer_ | |
+| ** logprob** | _number_ | |
+| ** special** | _boolean_ | |
+| ** text** | _string_ | |
+| ** top_tokens** | _array[]_ | |
+| ** id** | _integer_ | |
+| ** logprob** | _number_ | |
+| ** special** | _boolean_ | |
+| ** text** | _string_ | |
+| ** finish_reason** | _enum_ | Possible values: length, eos_token, stop_sequence. |
+| ** generated_tokens** | _integer_ | |
+| ** prefill** | _object[]_ | |
+| ** id** | _integer_ | |
+| ** logprob** | _number_ | |
+| ** text** | _string_ | |
+| ** seed** | _integer_ | |
+| ** tokens** | _object[]_ | |
+| ** id** | _integer_ | |
+| ** logprob** | _number_ | |
+| ** special** | _boolean_ | |
+| ** text** | _string_ | |
+| ** top_tokens** | _array[]_ | |
+| ** id** | _integer_ | |
+| ** logprob** | _number_ | |
+| ** special** | _boolean_ | |
+| ** text** | _string_ | |
+| **generated_text** | _string_ | |
+
+
+If `stream` is `true`, generated tokens are returned as a stream, using Server-Sent Events (SSE).
+For more information about streaming, check out [this guide](https://huggingface.co/docs/text-generation-inference/conceptual/streaming).
+
+| Body | | |
+| :--- | :--- | :--- |
+| **details** | _object_ | |
+| ** finish_reason** | _enum_ | Possible values: length, eos_token, stop_sequence. |
+| ** generated_tokens** | _integer_ | |
+| ** seed** | _integer_ | |
+| **generated_text** | _string_ | |
+| **index** | _integer_ | |
+| **token** | _object_ | |
+| ** id** | _integer_ | |
+| ** logprob** | _number_ | |
+| ** special** | _boolean_ | |
+| ** text** | _string_ | |
+| **top_tokens** | _object[]_ | |
+| ** id** | _integer_ | |
+| ** logprob** | _number_ | |
+| ** special** | _boolean_ | |
+| ** text** | _string_ | |
+
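A streamed response can be consumed by reading it line by line and decoding each event. The sketch below assumes each event arrives as a line of the form `data:{json}` (the standard SSE framing) whose JSON matches the stream table above. `iter_sse_events` and `collect_text` are hypothetical helpers, and the sample events are invented rather than captured from the API.

```python
import json


def iter_sse_events(lines):
    """Yield the decoded JSON object of every `data:` line, skipping the rest."""
    for raw in lines:
        line = raw.decode("utf-8") if isinstance(raw, bytes) else raw
        line = line.strip()
        if not line.startswith("data:"):
            continue  # blank separators, comments, other SSE fields
        yield json.loads(line[len("data:"):].strip())


def collect_text(lines):
    """Concatenate the text of all non-special tokens in the stream."""
    return "".join(
        event["token"]["text"]
        for event in iter_sse_events(lines)
        if not event["token"].get("special")
    )


# Invented events shaped like the stream table (illustration only).
sample_stream = [
    b'data:{"index":1,"token":{"id":1,"text":"Hello","logprob":-0.1,"special":false},"generated_text":null,"details":null}',
    b"",
    b'data:{"index":2,"token":{"id":2,"text":" world","logprob":-0.2,"special":false},"generated_text":"Hello world","details":null}',
]

print(collect_text(sample_stream))
```

With `requests`, the `lines` argument could be `response.iter_lines()` from a request made with `stream=True`.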
+
+### Using the API
+
+
+
+
+
+```bash
+curl https://api-inference.huggingface.co/models/google/gemma-2-2b-it \
+ -X POST \
+ -d '{"inputs": "Can you please let us know more details about your "}' \
+ -H 'Content-Type: application/json' \
+ -H "Authorization: Bearer hf_***"
+
+```
+
+
+
+```py
+import requests
+
+API_URL = "https://api-inference.huggingface.co/models/google/gemma-2-2b-it"
+headers = {"Authorization": "Bearer hf_***"}
+
+def query(payload):
+ response = requests.post(API_URL, headers=headers, json=payload)
+ return response.json()
+
+output = query({
+ "inputs": "Can you please let us know more details about your ",
+})
+```
+
+To use the Python client, see `huggingface_hub`'s [package reference](https://huggingface.co/docs/huggingface_hub/package_reference/inference_client#huggingface_hub.InferenceClient.text_generation).
+
+
+
+```js
+async function query(data) {
+ const response = await fetch(
+ "https://api-inference.huggingface.co/models/google/gemma-2-2b-it",
+ {
+ headers: {
+ Authorization: "Bearer hf_***",
+ "Content-Type": "application/json",
+ },
+ method: "POST",
+ body: JSON.stringify(data),
+ }
+ );
+ const result = await response.json();
+ return result;
+}
+
+query({"inputs": "Can you please let us know more details about your "}).then((response) => {
+ console.log(JSON.stringify(response));
+});
+```
+
+To use the JavaScript client, see `huggingface.js`'s [package reference](https://huggingface.co/docs/huggingface.js/inference/classes/HfInference#textgeneration).
+
+
+
+
+
diff --git a/docs/api-inference/tasks/text_to_image.md b/docs/api-inference/tasks/text_to_image.md
index 810c8f68e..6697ca877 100644
--- a/docs/api-inference/tasks/text_to_image.md
+++ b/docs/api-inference/tasks/text_to_image.md
@@ -1,4 +1,4 @@
-## Text-to-image
+## Text to Image
Generate an image based on a given text prompt.
@@ -23,29 +23,32 @@ This is only a subset of the supported models. Find the model that suits you bes
| Payload | | |
| :--- | :--- | :--- |
-| **inputs** | _string, required_ | The input text data (sometimes called "prompt" |
-| **parameters** | _object, optional_ | Additional inference parameters for Text To Image |
-| ** guidance_scale** | _number, optional_ | For diffusion models. A higher guidance scale value encourages the model to generate images closely linked to the text prompt at the expense of lower image quality. |
-| ** negative_prompt** | _array, optional_ | One or several prompt to guide what NOT to include in image generation. |
-| ** num_inference_steps** | _integer, optional_ | For diffusion models. The number of denoising steps. More denoising steps usually lead to a higher quality image at the expense of slower inference. |
-| ** target_size** | _object, optional_ | The size in pixel of the output image |
-| ** width** | _integer, required_ | |
-| ** height** | _integer, required_ | |
-| ** scheduler** | _string, optional_ | For diffusion models. Override the scheduler with a compatible one |
+| **inputs*** | _string_ | The input text data (sometimes called "prompt") |
+| **parameters** | _object_ | Additional inference parameters for Text To Image |
+| ** guidance_scale** | _number_ | For diffusion models. A higher guidance scale value encourages the model to generate images closely linked to the text prompt at the expense of lower image quality. |
+| ** negative_prompt** | _string[]_ | One or several prompts to guide what NOT to include in image generation. |
+| ** num_inference_steps** | _integer_ | For diffusion models. The number of denoising steps. More denoising steps usually lead to a higher quality image at the expense of slower inference. |
+| ** target_size** | _object_ | The size in pixels of the output image |
+| ** width*** | _integer_ | |
+| ** height*** | _integer_ | |
+| ** scheduler** | _string_ | For diffusion models. Override the scheduler with a compatible one |
+Some options can be configured by passing headers to the Inference API. Here are the available headers:
+
| Headers | | |
| :--- | :--- | :--- |
-| **authorization** | _string, optional_ | Authentication header in the form `'Bearer: hf_****'` when `hf_****` is a personal user access token with Inference API permission. You can generate one from [your settings page](https://huggingface.co/settings/tokens). |
-| **x-use-cache** | _boolean, optional, default to `true`_ | There is a cache layer on the inference API to speed up requests we have already seen. Most models can use those results as they are deterministic (meaning the outputs will be the same anyway). However, if you use a nondeterministic model, you can set this parameter to prevent the caching mechanism from being used, resulting in a real new query. Read more about caching [here](../parameters#caching]). |
-| **x-wait-for-model** | _boolean, optional, default to `false`_ | If the model is not ready, wait for it instead of receiving 503. It limits the number of requests required to get your inference done. It is advised to only set this flag to true after receiving a 503 error, as it will limit hanging in your application to known places. Read more about model availability [here](../overview#eligibility]). |
+| **authorization** | _string_ | Authentication header in the form `'Bearer hf_****'`, where `hf_****` is a personal user access token with Inference API permission. You can generate one from [your settings page](https://huggingface.co/settings/tokens). |
+| **x-use-cache** | _boolean, default to `true`_ | There is a cache layer on the inference API to speed up requests we have already seen. Most models can use those results as they are deterministic (meaning the outputs will be the same anyway). However, if you use a nondeterministic model, you can set this header to `false` to prevent the caching mechanism from being used, resulting in a real new query. Read more about caching [here](../parameters#caching). |
+| **x-wait-for-model** | _boolean, default to `false`_ | If the model is not ready, wait for it instead of receiving 503. It limits the number of requests required to get your inference done. It is advised to only set this flag to true after receiving a 503 error, as it will limit hanging in your application to known places. Read more about model availability [here](../overview#eligibility). |
+For more information about Inference API headers, check out the parameters [guide](../parameters).
#### Response
| Body | |
-| :--- | :--- |
-| **image** | The generated image |
+| :--- | :--- | :--- |
+| **image** | _object_ | The generated image |
### Using the API
diff --git a/scripts/api-inference/scripts/generate.ts b/scripts/api-inference/scripts/generate.ts
index 73144662a..9ef681c07 100644
--- a/scripts/api-inference/scripts/generate.ts
+++ b/scripts/api-inference/scripts/generate.ts
@@ -4,6 +4,19 @@ import * as fs from "node:fs/promises";
import * as path from "node:path/posix";
import type { JsonObject } from "type-fest";
+const TASKS: PipelineType[] = [
+ "fill-mask",
+ "image-to-image",
+ "question-answering",
+ "summarization",
+ "table-question-answering",
+ "text-classification",
+ "text-generation",
+ "text-to-image",
+];
+const TASKS_EXTENDED = [...TASKS, "chat-completion"];
+const SPECS_REVISION = "update-specification-for-docs";
+
const inferenceSnippetLanguages = ["python", "js", "curl"] as const;
type InferenceSnippetLanguage = (typeof inferenceSnippetLanguages)[number];
@@ -36,10 +49,17 @@ const TEMPLATE_DIR = path.join(ROOT_DIR, "templates");
const DOCS_DIR = path.join(ROOT_DIR, "..", "..", "docs");
const TASKS_DOCS_DIR = path.join(DOCS_DIR, "api-inference", "tasks");
-function readTemplate(templateName: string): Promise<string> {
+const NBSP = "&nbsp;"; // non-breaking space
+const TABLE_INDENT = NBSP.repeat(8);
+
+function readTemplate(
+ templateName: string,
+ namespace: string,
+): Promise<string> {
const templateNameSnakeCase = templateName.replace(/-/g, "_");
const templatePath = path.join(
TEMPLATE_DIR,
+ namespace,
`${templateNameSnakeCase}.handlebars`,
);
 console.log(` 🔎 Reading ${templateNameSnakeCase}.handlebars`);
@@ -89,7 +109,7 @@ export function getInferenceSnippet(
const modelData = {
id,
pipeline_tag,
- mask_token: "",
+ mask_token: "[MASK]",
library_name: "",
config: {},
};
@@ -105,8 +125,9 @@ export function getInferenceSnippet(
type SpecNameType = "input" | "output" | "stream_output";
const SPECS_URL_TEMPLATE = Handlebars.compile(
- `https://raw.githubusercontent.com/huggingface/huggingface.js/main/packages/tasks/src/tasks/{{task}}/spec/{{name}}.json`,
+ `https://raw.githubusercontent.com/huggingface/huggingface.js/${SPECS_REVISION}/packages/tasks/src/tasks/{{task}}/spec/{{name}}.json`,
);
+const COMMON_DEFINITIONS_URL = `https://raw.githubusercontent.com/huggingface/huggingface.js/${SPECS_REVISION}/packages/tasks/src/tasks/common-definitions.json`;
async function fetchOneSpec(
task: PipelineType,
@@ -131,41 +152,143 @@ async function fetchSpecs(
};
}
-function processPayloadSchema(schema: any, prefix: string = ""): JsonObject[] {
+async function fetchCommonDefinitions(): Promise<JsonObject> {
+ console.log(` 🕸️ Fetching common definitions`);
+ return fetch(COMMON_DEFINITIONS_URL).then((res) => res.json());
+}
+
+const COMMON_DEFINITIONS = await fetchCommonDefinitions();
+
+function processPayloadSchema(schema: any): JsonObject[] {
let rows: JsonObject[] = [];
- Object.entries(schema.properties || {}).forEach(
- ([key, value]: [string, any]) => {
- const isRequired = schema.required?.includes(key);
- let type = value.type || "object";
+ // Helper function to resolve schema references
+ function resolveRef(ref: string) {
+ const refPath = ref.split("#/")[1].split("/");
+ let refSchema = ref.includes("common-definitions.json")
+ ? COMMON_DEFINITIONS
+ : schema;
+ for (const part of refPath) {
+ refSchema = refSchema[part];
+ }
+ return refSchema;
+ }
+
+ // Helper function to process a schema node
+ function processSchemaNode(
+ key: string,
+ value: any,
+ required: boolean,
+ parentPrefix: string,
+ ): void {
+ const isRequired = required;
+ let type = value.type || "object";
+ let description = value.description || "";
+
+ if (value.$ref) {
+ // Resolve the reference
+ value = resolveRef(value.$ref);
+ type = value.type || "object";
+ description = value.description || "";
+ }
- if (value.$ref) {
- // Handle references
- const refSchemaKey = value.$ref.split("/").pop();
- value = schema.$defs?.[refSchemaKey!];
+ if (value.enum) {
+ type = "enum";
+ description = `Possible values: ${value.enum.join(", ")}.`;
+ }
+
+ const isObject = type === "object" && value.properties;
+ const isArray = type === "array" && value.items;
+ const isCombinator = value.oneOf || value.allOf || value.anyOf;
+ const addRow =
+ !(isCombinator && isCombinator.length === 1) &&
+ !description.includes("UNUSED") &&
+ !key.includes("SKIP") &&
+ key.length > 0;
+
+ if (isCombinator && isCombinator.length > 1) {
+ description = "One of the following:";
+ }
+
+ if (isArray) {
+ if (value.items.$ref) {
+ type = "object[]";
+ } else if (value.items.type) {
+ type = `${value.items.type}[]`;
}
+ }
- const description = value.description || "";
- const isObject = type === "object" && value.properties;
+ if (addRow) {
+ // Add the row to the table except if combination with only one option
+ if (key.includes("(#")) {
+ // If it's a combination, no need to re-specify the type
+ type = "";
+ }
const row = {
- name: `${prefix}${key}`,
+ name: `${parentPrefix}${key}`,
type: type,
- description: description,
- required: isRequired ? "required" : "optional",
+ description: description.replace(/\n/g, " "),
+ required: isRequired,
};
rows.push(row);
+ }
- if (isObject) {
- // Recursively process nested objects
- rows = rows.concat(
- processPayloadSchema(
- value,
- prefix + " ",
- ),
- );
+ if (isObject) {
+ // Recursively process nested objects
+ Object.entries(value.properties || {}).forEach(
+ ([nestedKey, nestedValue]) => {
+ const nestedRequired = value.required?.includes(nestedKey);
+ processSchemaNode(
+ nestedKey,
+ nestedValue,
+ nestedRequired,
+ parentPrefix + TABLE_INDENT,
+ );
+ },
+ );
+ } else if (isArray) {
+ // Process array items
+ processSchemaNode("SKIP", value.items, false, parentPrefix);
+ } else if (isCombinator) {
+ // Process combinators like oneOf, allOf, anyOf
+ const combinators = value.oneOf || value.allOf || value.anyOf;
+ if (combinators.length === 1) {
+ // If there is only one option, process it directly
+ processSchemaNode(key, combinators[0], isRequired, parentPrefix);
+ } else {
+ // If there are multiple options, process each one as options
+ combinators.forEach((subSchema: any, index: number) => {
+ processSchemaNode(
+ `${NBSP}(#${index + 1})`,
+ subSchema,
+ isRequired,
+ parentPrefix + TABLE_INDENT,
+ );
+ });
}
- },
- );
+ }
+ }
+
+ // Start processing based on the root type of the schema
+ if (schema.type === "array") {
+ // If the root schema is an array, process its items
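+ // Fall back to "object" when the items schema declares no explicit type
+ const itemsType = schema.items.type || "object";
+ const row = {
+ name: "(array)",
+ type: `${itemsType}[]`,
+ description:
+ schema.items.description ||
+ `Output is an array of ${itemsType}s.`,
+ required: true,
+ };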
+ rows.push(row);
+ processSchemaNode("", schema.items, false, "");
+ } else {
+ // Otherwise, start with the root object
+ Object.entries(schema.properties || {}).forEach(([key, value]) => {
+ const required = schema.required?.includes(key);
+ processSchemaNode(key, value, required, "");
+ });
+ }
return rows;
}
@@ -184,23 +307,21 @@ const TIP_LIST_MODELS_LINK_TEMPLATE = Handlebars.compile(
`This is only a subset of the supported models. Find the model that suits you best [here](https://huggingface.co/models?inference=warm&pipeline_tag={{task}}&sort=trending).`,
);
-const SPECS_HEADERS = await readTemplate("specs-headers");
+const SPECS_HEADERS = await readTemplate("specs-headers", "common");
const SNIPPETS_TEMPLATE = Handlebars.compile(
- await readTemplate("snippets-template"),
+ await readTemplate("snippets-template", "common"),
);
const SPECS_PAYLOAD_TEMPLATE = Handlebars.compile(
- await readTemplate("specs-payload"),
+ await readTemplate("specs-payload", "common"),
);
const SPECS_OUTPUT_TEMPLATE = Handlebars.compile(
- await readTemplate("specs-output"),
+ await readTemplate("specs-output", "common"),
);
////////////////////
//// Data utils ////
////////////////////
-const TASKS: PipelineType[] = ["image-to-image", "text-to-image"];
-
const DATA: {
constants: {
specsHeaders: string;
@@ -238,12 +359,16 @@ await Promise.all(
id: string;
description: string;
inference: string | undefined;
+ config: JsonObject | undefined;
}) => {
 console.log(` ⚡ Checking inference status ${model.id}`);
- const modelData = await fetch(
- `https://huggingface.co/api/models/${model.id}?expand[]=inference`,
- ).then((res) => res.json());
+ let url = `https://huggingface.co/api/models/${model.id}?expand[]=inference`;
+ if (task === "text-generation") {
+ url += "&expand[]=config";
+ }
+ const modelData = await fetch(url).then((res) => res.json());
model.inference = modelData.inference;
+ model.config = modelData.config;
},
),
);
@@ -273,7 +398,8 @@ TASKS.forEach((task) => {
// Render specs
await Promise.all(
- TASKS.map(async (task) => {
+ TASKS_EXTENDED.map(async (task) => {
+ // @ts-ignore
const specs = await fetchSpecs(task);
DATA.specs[task] = {
input: specs.input
@@ -297,6 +423,45 @@ TASKS.forEach((task) => {
DATA.tips.listModelsLink[task] = TIP_LIST_MODELS_LINK_TEMPLATE({ task });
});
+///////////////////////////////////////////////
+//// Data for chat-completion special case ////
+///////////////////////////////////////////////
+
+function fetchChatCompletion() {
+ // Recommended models based on text-generation
+ DATA.models["chat-completion"] = DATA.models["text-generation"].filter(
+ // @ts-ignore
+ (model) => model.config?.tokenizer_config?.chat_template,
+ );
+
+ // Snippet specific to chat completion
+ const mainModel = DATA.models["chat-completion"][0];
+ const mainModelData = {
+ // @ts-ignore
+ id: mainModel.id,
+ pipeline_tag: "text-generation",
+ mask_token: "",
+ library_name: "",
+ // @ts-ignore
+ config: mainModel.config,
+ };
+ const taskSnippets = {
+ // @ts-ignore
+ curl: GET_SNIPPET_FN["curl"](mainModelData, "hf_***"),
+ // @ts-ignore
+ python: GET_SNIPPET_FN["python"](mainModelData, "hf_***"),
+ // @ts-ignore
+ javascript: GET_SNIPPET_FN["js"](mainModelData, "hf_***"),
+ };
+ DATA.snippets["chat-completion"] = SNIPPETS_TEMPLATE({
+ taskSnippets,
+ taskSnakeCase: "chat-completion".replace("-", "_"),
+ taskAttached: "chat-completion".replace("-", ""),
+ });
+}
+
+fetchChatCompletion();
+
/////////////////////////
//// Rendering utils ////
/////////////////////////
@@ -306,12 +471,12 @@ async function renderTemplate(
data: JsonObject,
): Promise<string> {
console.log(`🎨 Rendering ${templateName}`);
- const template = Handlebars.compile(await readTemplate(templateName));
+ const template = Handlebars.compile(await readTemplate(templateName, "task"));
return template(data);
}
await Promise.all(
- TASKS.map(async (task) => {
+ TASKS_EXTENDED.map(async (task) => {
// @ts-ignore
const rendered = await renderTemplate(task, DATA);
await writeTaskDoc(task, rendered);
diff --git a/scripts/api-inference/templates/snippets_template.handlebars b/scripts/api-inference/templates/common/snippets_template.handlebars
similarity index 100%
rename from scripts/api-inference/templates/snippets_template.handlebars
rename to scripts/api-inference/templates/common/snippets_template.handlebars
diff --git a/scripts/api-inference/templates/common/specs_headers.handlebars b/scripts/api-inference/templates/common/specs_headers.handlebars
new file mode 100644
index 000000000..32b6e9d94
--- /dev/null
+++ b/scripts/api-inference/templates/common/specs_headers.handlebars
@@ -0,0 +1,9 @@
+Some options can be configured by passing headers to the Inference API. Here are the available headers:
+
+| Headers | | |
+| :--- | :--- | :--- |
+| **authorization** | _string_ | Authentication header in the form `'Bearer hf_****'`, where `hf_****` is a personal user access token with Inference API permission. You can generate one from [your settings page](https://huggingface.co/settings/tokens). |
+| **x-use-cache** | _boolean, default to `true`_ | There is a cache layer on the Inference API to speed up requests we have already seen. Most models can use those results as they are deterministic (meaning the outputs will be the same anyway). However, if you use a nondeterministic model, you can set this header to `false` to prevent the caching mechanism from being used, resulting in a real new query. Read more about caching [here](../parameters#caching). |
+| **x-wait-for-model** | _boolean, default to `false`_ | If the model is not ready, wait for it instead of receiving a 503 error. This limits the number of requests required to get your inference done. It is advised to only set this flag to `true` after receiving a 503 error, as it will limit hanging in your application to known places. Read more about model availability [here](../overview#eligibility). |
+
+For more information about Inference API headers, check out the parameters [guide](../parameters).
\ No newline at end of file
diff --git a/scripts/api-inference/templates/common/specs_output.handlebars b/scripts/api-inference/templates/common/specs_output.handlebars
new file mode 100644
index 000000000..7d0e7b4c0
--- /dev/null
+++ b/scripts/api-inference/templates/common/specs_output.handlebars
@@ -0,0 +1,9 @@
+| Body | | |
+| :--- | :--- | :--- |
+{{#each schema}}
+{{#if type}}
+| **{{{name}}}** | _{{type}}_ | {{{description}}} |
+{{else}}
+| **{{{name}}}** | | {{{description}}} |
+{{/if}}
+{{/each}}
\ No newline at end of file
diff --git a/scripts/api-inference/templates/common/specs_payload.handlebars b/scripts/api-inference/templates/common/specs_payload.handlebars
new file mode 100644
index 000000000..6459be5d9
--- /dev/null
+++ b/scripts/api-inference/templates/common/specs_payload.handlebars
@@ -0,0 +1,9 @@
+| Payload | | |
+| :--- | :--- | :--- |
+{{#each schema}}
+{{#if type}}
+| **{{{name}}}{{#if required}}*{{/if}}** | _{{type}}_ | {{{description}}} |
+{{else}}
+| **{{{name}}}** | | {{{description}}} |
+{{/if}}
+{{/each}}
\ No newline at end of file
diff --git a/scripts/api-inference/templates/specs_headers.handlebars b/scripts/api-inference/templates/specs_headers.handlebars
deleted file mode 100644
index 44b28ecc8..000000000
--- a/scripts/api-inference/templates/specs_headers.handlebars
+++ /dev/null
@@ -1,5 +0,0 @@
-| Headers | | |
-| :--- | :--- | :--- |
-| **authorization** | _string, optional_ | Authentication header in the form `'Bearer: hf_****'` when `hf_****` is a personal user access token with Inference API permission. You can generate one from [your settings page](https://huggingface.co/settings/tokens). |
-| **x-use-cache** | _boolean, optional, default to `true`_ | There is a cache layer on the inference API to speed up requests we have already seen. Most models can use those results as they are deterministic (meaning the outputs will be the same anyway). However, if you use a nondeterministic model, you can set this parameter to prevent the caching mechanism from being used, resulting in a real new query. Read more about caching [here](../parameters#caching]). |
-| **x-wait-for-model** | _boolean, optional, default to `false`_ | If the model is not ready, wait for it instead of receiving 503. It limits the number of requests required to get your inference done. It is advised to only set this flag to true after receiving a 503 error, as it will limit hanging in your application to known places. Read more about model availability [here](../overview#eligibility]). |
diff --git a/scripts/api-inference/templates/specs_output.handlebars b/scripts/api-inference/templates/specs_output.handlebars
deleted file mode 100644
index 7f3391b98..000000000
--- a/scripts/api-inference/templates/specs_output.handlebars
+++ /dev/null
@@ -1,5 +0,0 @@
-| Body | |
-| :--- | :--- |
-{{#each schema}}
-| **{{{name}}}** | {{{description}}} |
-{{/each}}
\ No newline at end of file
diff --git a/scripts/api-inference/templates/specs_payload.handlebars b/scripts/api-inference/templates/specs_payload.handlebars
deleted file mode 100644
index 70460b184..000000000
--- a/scripts/api-inference/templates/specs_payload.handlebars
+++ /dev/null
@@ -1,5 +0,0 @@
-| Payload | | |
-| :--- | :--- | :--- |
-{{#each schema}}
-| **{{{name}}}** | _{{type}}, {{required}}_ | {{{description}}} |
-{{/each}}
\ No newline at end of file
diff --git a/scripts/api-inference/templates/task/chat_completion.handlebars b/scripts/api-inference/templates/task/chat_completion.handlebars
new file mode 100644
index 000000000..f1274f5c5
--- /dev/null
+++ b/scripts/api-inference/templates/task/chat_completion.handlebars
@@ -0,0 +1,38 @@
+## Chat Completion
+
+Generate a response given a list of messages.
+This is a subtask of [`text-generation`](./text_generation) designed to generate responses in a conversational context.
+
+{{{tips.linksToTaskPage.chat-completion}}}
+
+### Recommended models
+
+{{#each models.chat-completion}}
+- [{{this.id}}](https://huggingface.co/{{this.id}}): {{this.description}}
+{{/each}}
+
+{{{tips.listModelsLink.chat-completion}}}
+
+### API specification
+
+#### Request
+
+{{{specs.chat-completion.input}}}
+
+{{{constants.specsHeaders}}}
+
+#### Response
+
+Output type depends on the `stream` input parameter.
+If `stream` is `false` (default), the response will be a JSON object with the following fields:
+
+{{{specs.chat-completion.output}}}
+
+If `stream` is `true`, generated tokens are returned as a stream, using Server-Sent Events (SSE).
+For more information about streaming, check out [this guide](https://huggingface.co/docs/text-generation-inference/conceptual/streaming).
+
+{{{specs.chat-completion.stream_output}}}
+
+### Using the API
+
+{{{snippets.chat-completion}}}
diff --git a/scripts/api-inference/templates/task/fill_mask.handlebars b/scripts/api-inference/templates/task/fill_mask.handlebars
new file mode 100644
index 000000000..663d2ab9f
--- /dev/null
+++ b/scripts/api-inference/templates/task/fill_mask.handlebars
@@ -0,0 +1,29 @@
+## Fill-mask
+
+Mask filling is the task of predicting the right word (token to be precise) in the middle of a sequence.
+
+{{{tips.linksToTaskPage.fill-mask}}}
+
+### Recommended models
+
+{{#each models.fill-mask}}
+- [{{this.id}}](https://huggingface.co/{{this.id}}): {{this.description}}
+{{/each}}
+
+{{{tips.listModelsLink.fill-mask}}}
+
+### API specification
+
+#### Request
+
+{{{specs.fill-mask.input}}}
+
+{{{constants.specsHeaders}}}
+
+#### Response
+
+{{{specs.fill-mask.output}}}
+
+### Using the API
+
+{{{snippets.fill-mask}}}
diff --git a/scripts/api-inference/templates/image_to_image.handlebars b/scripts/api-inference/templates/task/image_to_image.handlebars
similarity index 97%
rename from scripts/api-inference/templates/image_to_image.handlebars
rename to scripts/api-inference/templates/task/image_to_image.handlebars
index b432eab19..258dec814 100644
--- a/scripts/api-inference/templates/image_to_image.handlebars
+++ b/scripts/api-inference/templates/task/image_to_image.handlebars
@@ -1,4 +1,4 @@
-## Image-to-image
+## Image to Image
Image-to-image is the task of transforming a source image to match the characteristics of a target image or a target image domain.
Any image manipulation and enhancement is possible with image to image models.
diff --git a/scripts/api-inference/templates/task/question_answering.handlebars b/scripts/api-inference/templates/task/question_answering.handlebars
new file mode 100644
index 000000000..101d00fcc
--- /dev/null
+++ b/scripts/api-inference/templates/task/question_answering.handlebars
@@ -0,0 +1,29 @@
+## Question Answering
+
+Question Answering models can retrieve the answer to a question from a given text, which is useful for searching for an answer in a document.
+
+{{{tips.linksToTaskPage.question-answering}}}
+
+### Recommended models
+
+{{#each models.question-answering}}
+- [{{this.id}}](https://huggingface.co/{{this.id}}): {{this.description}}
+{{/each}}
+
+{{{tips.listModelsLink.question-answering}}}
+
+### API specification
+
+#### Request
+
+{{{specs.question-answering.input}}}
+
+{{{constants.specsHeaders}}}
+
+#### Response
+
+{{{specs.question-answering.output}}}
+
+### Using the API
+
+{{{snippets.question-answering}}}
diff --git a/scripts/api-inference/templates/task/summarization.handlebars b/scripts/api-inference/templates/task/summarization.handlebars
new file mode 100644
index 000000000..890487215
--- /dev/null
+++ b/scripts/api-inference/templates/task/summarization.handlebars
@@ -0,0 +1,29 @@
+## Summarization
+
+Summarization is the task of producing a shorter version of a document while preserving its important information. Some models can extract text from the original input, while other models can generate entirely new text.
+
+{{{tips.linksToTaskPage.summarization}}}
+
+### Recommended models
+
+{{#each models.summarization}}
+- [{{this.id}}](https://huggingface.co/{{this.id}}): {{this.description}}
+{{/each}}
+
+{{{tips.listModelsLink.summarization}}}
+
+### API specification
+
+#### Request
+
+{{{specs.summarization.input}}}
+
+{{{constants.specsHeaders}}}
+
+#### Response
+
+{{{specs.summarization.output}}}
+
+### Using the API
+
+{{{snippets.summarization}}}
diff --git a/scripts/api-inference/templates/task/table_question_answering.handlebars b/scripts/api-inference/templates/task/table_question_answering.handlebars
new file mode 100644
index 000000000..4ae8b53fc
--- /dev/null
+++ b/scripts/api-inference/templates/task/table_question_answering.handlebars
@@ -0,0 +1,29 @@
+## Table Question Answering
+
+Table Question Answering (Table QA) is the task of answering a question about the information in a given table.
+
+{{{tips.linksToTaskPage.table-question-answering}}}
+
+### Recommended models
+
+{{#each models.table-question-answering}}
+- [{{this.id}}](https://huggingface.co/{{this.id}}): {{this.description}}
+{{/each}}
+
+{{{tips.listModelsLink.table-question-answering}}}
+
+### API specification
+
+#### Request
+
+{{{specs.table-question-answering.input}}}
+
+{{{constants.specsHeaders}}}
+
+#### Response
+
+{{{specs.table-question-answering.output}}}
+
+### Using the API
+
+{{{snippets.table-question-answering}}}
diff --git a/scripts/api-inference/templates/task/text_classification.handlebars b/scripts/api-inference/templates/task/text_classification.handlebars
new file mode 100644
index 000000000..99c3cabe8
--- /dev/null
+++ b/scripts/api-inference/templates/task/text_classification.handlebars
@@ -0,0 +1,29 @@
+## Text Classification
+
+Text Classification is the task of assigning a label or class to a given text. Some use cases are sentiment analysis, natural language inference, and assessing grammatical correctness.
+
+{{{tips.linksToTaskPage.text-classification}}}
+
+### Recommended models
+
+{{#each models.text-classification}}
+- [{{this.id}}](https://huggingface.co/{{this.id}}): {{this.description}}
+{{/each}}
+
+{{{tips.listModelsLink.text-classification}}}
+
+### API specification
+
+#### Request
+
+{{{specs.text-classification.input}}}
+
+{{{constants.specsHeaders}}}
+
+#### Response
+
+{{{specs.text-classification.output}}}
+
+### Using the API
+
+{{{snippets.text-classification}}}
diff --git a/scripts/api-inference/templates/task/text_generation.handlebars b/scripts/api-inference/templates/task/text_generation.handlebars
new file mode 100644
index 000000000..85bbba97a
--- /dev/null
+++ b/scripts/api-inference/templates/task/text_generation.handlebars
@@ -0,0 +1,39 @@
+## Text Generation
+
+Generate text based on a prompt.
+
+If you are interested in a Chat Completion task, which generates a response based on a list of messages, check out the [`chat-completion`](./chat_completion) task.
+
+{{{tips.linksToTaskPage.text-generation}}}
+
+### Recommended models
+
+{{#each models.text-generation}}
+- [{{this.id}}](https://huggingface.co/{{this.id}}): {{this.description}}
+{{/each}}
+
+{{{tips.listModelsLink.text-generation}}}
+
+### API specification
+
+#### Request
+
+{{{specs.text-generation.input}}}
+
+{{{constants.specsHeaders}}}
+
+#### Response
+
+Output type depends on the `stream` input parameter.
+If `stream` is `false` (default), the response will be a JSON object with the following fields:
+
+{{{specs.text-generation.output}}}
+
+If `stream` is `true`, generated tokens are returned as a stream, using Server-Sent Events (SSE).
+For more information about streaming, check out [this guide](https://huggingface.co/docs/text-generation-inference/conceptual/streaming).
+
+{{{specs.text-generation.stream_output}}}
+
+### Using the API
+
+{{{snippets.text-generation}}}
diff --git a/scripts/api-inference/templates/text_to_image.handlebars b/scripts/api-inference/templates/task/text_to_image.handlebars
similarity index 96%
rename from scripts/api-inference/templates/text_to_image.handlebars
rename to scripts/api-inference/templates/task/text_to_image.handlebars
index 6c9c568d1..6e6ffd0c6 100644
--- a/scripts/api-inference/templates/text_to_image.handlebars
+++ b/scripts/api-inference/templates/task/text_to_image.handlebars
@@ -1,4 +1,4 @@
-## Text-to-image
+## Text to Image
Generate an image based on a given text prompt.