From 1f1df45ecae3dee2b5ba77a85415f7f8db7fd5a0 Mon Sep 17 00:00:00 2001 From: Wauplin Date: Tue, 27 Aug 2024 16:24:32 +0200 Subject: [PATCH 1/9] first draft to add text-generation parameters --- docs/api-inference/_toctree.yml | 2 + docs/api-inference/tasks/image_to_image.md | 4 +- docs/api-inference/tasks/text_generation.md | 165 ++++++++++++++++++ docs/api-inference/tasks/text_to_image.md | 4 +- scripts/api-inference/scripts/generate.ts | 122 ++++++++++--- .../templates/specs_output.handlebars | 8 +- .../templates/specs_payload.handlebars | 4 + .../templates/text_generation.handlebars | 37 ++++ 8 files changed, 317 insertions(+), 29 deletions(-) create mode 100644 docs/api-inference/tasks/text_generation.md create mode 100644 scripts/api-inference/templates/text_generation.handlebars diff --git a/docs/api-inference/_toctree.yml b/docs/api-inference/_toctree.yml index a68f3abfb..a91f5eb55 100644 --- a/docs/api-inference/_toctree.yml +++ b/docs/api-inference/_toctree.yml @@ -16,6 +16,8 @@ title: Fill Mask - local: tasks/image_to_image title: Image-to-image + - local: tasks/text_generation + title: Text generation - local: tasks/text_to_image title: Text-to-image title: Detailed Task Parameters diff --git a/docs/api-inference/tasks/image_to_image.md b/docs/api-inference/tasks/image_to_image.md index 1b5e2241e..f4c12c868 100644 --- a/docs/api-inference/tasks/image_to_image.md +++ b/docs/api-inference/tasks/image_to_image.md @@ -51,8 +51,8 @@ This is only a subset of the supported models. Find the model that suits you bes #### Response | Body | | -| :--- | :--- | -| **image** | The output image | +| :--- | :--- | :--- | +| **image** | _object_ | The output image | ### Using the API diff --git a/docs/api-inference/tasks/text_generation.md b/docs/api-inference/tasks/text_generation.md new file mode 100644 index 000000000..09c8c7ee9 --- /dev/null +++ b/docs/api-inference/tasks/text_generation.md @@ -0,0 +1,165 @@ +## Text generation + +Generate text based on a prompt. + + + +For more details about the `text-generation` task, check out its [dedicated page](https://huggingface.co/tasks/text-generation)! You will find examples and related materials. + + + +### Recommended models + +- [google/gemma-2-2b-it](https://huggingface.co/google/gemma-2-2b-it): A text-generation model trained to follow instructions. +- [bigcode/starcoder](https://huggingface.co/bigcode/starcoder): A code generation model that can generate code in 80+ languages. +- [meta-llama/Meta-Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3.1-8B-Instruct): Very powerful text generation model trained to follow instructions. +- [microsoft/Phi-3-mini-4k-instruct](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct): Small yet powerful text generation model. +- [AI-MO/NuminaMath-7B-TIR](https://huggingface.co/AI-MO/NuminaMath-7B-TIR): A very powerful model that can solve mathematical problems. +- [HuggingFaceH4/starchat2-15b-v0.1](https://huggingface.co/HuggingFaceH4/starchat2-15b-v0.1): Strong coding assistant model. +- [mistralai/Mistral-Nemo-Instruct-2407](https://huggingface.co/mistralai/Mistral-Nemo-Instruct-2407): Very strong open-source large language model. + +This is only a subset of the supported models. Find the model that suits you best [here](https://huggingface.co/models?inference=warm&pipeline_tag=text-generation&sort=trending). 
+ +### API specification + +#### Request + +| Payload | | | +| :--- | :--- | :--- | +| **inputs** | _string, required_ | | +| **parameters** | _object, optional_ | | +| **        best_of** | _integer, optional_ | | +| **        decoder_input_details** | _boolean, optional_ | | +| **        details** | _boolean, optional_ | | +| **        do_sample** | _boolean, optional_ | | +| **        frequency_penalty** | _number, optional_ | | +| **        grammar** | _object, optional_ | One of the following: | +| **                 (#1)** | | | +| **                        type** | _enum, required_ | Possible values: json | +| **                        value** | _object, required_ | A string that represents a [JSON Schema](https://json-schema.org/).

JSON Schema is a declarative language that allows one to annotate JSON documents
with types and descriptions. | +| **                 (#2)** | | | +| **                        type** | _enum, required_ | Possible values: regex | +| **                        value** | _string, required_ | | +| **        max_new_tokens** | _integer, optional_ | | +| **        repetition_penalty** | _number, optional_ | | +| **        return_full_text** | _boolean, optional_ | | +| **        seed** | _integer, optional_ | | +| **        stop** | _array, optional_ | | +| **        temperature** | _number, optional_ | | +| **        top_k** | _integer, optional_ | | +| **        top_n_tokens** | _integer, optional_ | | +| **        top_p** | _number, optional_ | | +| **        truncate** | _integer, optional_ | | +| **        typical_p** | _number, optional_ | | +| **        watermark** | _boolean, optional_ | | +| **stream** | _boolean, optional_ | | + + +| Headers | | | +| :--- | :--- | :--- | +| **authorization** | _string, optional_ | Authentication header in the form `'Bearer: hf_****'` when `hf_****` is a personal user access token with Inference API permission. You can generate one from [your settings page](https://huggingface.co/settings/tokens). | +| **x-use-cache** | _boolean, optional, default to `true`_ | There is a cache layer on the inference API to speed up requests we have already seen. Most models can use those results as they are deterministic (meaning the outputs will be the same anyway). However, if you use a nondeterministic model, you can set this parameter to prevent the caching mechanism from being used, resulting in a real new query. Read more about caching [here](../parameters#caching]). | +| **x-wait-for-model** | _boolean, optional, default to `false`_ | If the model is not ready, wait for it instead of receiving 503. It limits the number of requests required to get your inference done. It is advised to only set this flag to true after receiving a 503 error, as it will limit hanging in your application to known places. Read more about model availability [here](../overview#eligibility]). | + + +#### Response + +Output type depends on the `stream` input parameter. +If `stream` is `false` (default), the response will be a JSON object with the following fields: + +| Body | | +| :--- | :--- | :--- | +| **details** | _object_ | | +| **        best_of_sequences** | _array_ | | +| **        finish_reason** | _enum_ | Possible values: length, eos_token, stop_sequence | +| **        generated_tokens** | _integer_ | | +| **        prefill** | _array_ | | +| **        seed** | _integer_ | | +| **        tokens** | _array_ | | +| **        top_tokens** | _array_ | | +| **generated_text** | _string_ | | + + +If `stream` is `true`, generated tokens are returned as a stream, using Server-Sent Events (SSE). +For more information about streaming, check out [this guide](https://huggingface.co/docs/text-generation-inference/conceptual/streaming). 
+ +| Body | | +| :--- | :--- | :--- | +| **details** | _object_ | | +| **        finish_reason** | _enum_ | Possible values: length, eos_token, stop_sequence | +| **        generated_tokens** | _integer_ | | +| **        seed** | _integer_ | | +| **generated_text** | _string_ | | +| **index** | _integer_ | | +| **token** | _object_ | | +| **        id** | _integer_ | | +| **        logprob** | _number_ | | +| **        special** | _boolean_ | | +| **        text** | _string_ | | +| **top_tokens** | _array_ | | + + +### Using the API + + + + + +```bash +curl https://api-inference.huggingface.co/models/google/gemma-2-2b-it \ + -X POST \ + -d '{"inputs": "Can you please let us know more details about your "}' \ + -H 'Content-Type: application/json' \ + -H "Authorization: Bearer hf_***" + +``` + + + +```py +import requests + +API_URL = "https://api-inference.huggingface.co/models/google/gemma-2-2b-it" +headers = {"Authorization": "Bearer hf_***"} + +def query(payload): + response = requests.post(API_URL, headers=headers, json=payload) + return response.json() + +output = query({ + "inputs": "Can you please let us know more details about your ", +}) +``` + +To use the Python client, see `huggingface_hub`'s [package reference](https://huggingface.co/docs/huggingface_hub/package_reference/inference_client#huggingface_hub.InferenceClient.text_generation). + + + +```js +async function query(data) { + const response = await fetch( + "https://api-inference.huggingface.co/models/google/gemma-2-2b-it", + { + headers: { + Authorization: "Bearer hf_***" + "Content-Type": "application/json", + }, + method: "POST", + body: JSON.stringify(data), + } + ); + const result = await response.json(); + return result; +} + +query({"inputs": "Can you please let us know more details about your "}).then((response) => { + console.log(JSON.stringify(response)); +}); +``` + +To use the JavaScript client, see `huggingface.js`'s [package reference](https://huggingface.co/docs/huggingface.js/inference/classes/HfInference#textgeneration). + + + + + diff --git a/docs/api-inference/tasks/text_to_image.md b/docs/api-inference/tasks/text_to_image.md index 810c8f68e..9f2dd2a7d 100644 --- a/docs/api-inference/tasks/text_to_image.md +++ b/docs/api-inference/tasks/text_to_image.md @@ -44,8 +44,8 @@ This is only a subset of the supported models. 
Find the model that suits you bes #### Response | Body | | -| :--- | :--- | -| **image** | The generated image | +| :--- | :--- | :--- | +| **image** | _object_ | The generated image | ### Using the API diff --git a/scripts/api-inference/scripts/generate.ts b/scripts/api-inference/scripts/generate.ts index 73144662a..2cb04ec9d 100644 --- a/scripts/api-inference/scripts/generate.ts +++ b/scripts/api-inference/scripts/generate.ts @@ -131,39 +131,111 @@ async function fetchSpecs( }; } -function processPayloadSchema(schema: any, prefix: string = ""): JsonObject[] { +function processPayloadSchema( + schema: any, + definitions: any = {}, + prefix: string = "", +): JsonObject[] { let rows: JsonObject[] = []; - Object.entries(schema.properties || {}).forEach( - ([key, value]: [string, any]) => { - const isRequired = schema.required?.includes(key); - let type = value.type || "object"; + // Helper function to resolve schema references + function resolveRef(ref: string) { + const refPath = ref.split("/").slice(1); // remove the initial # + let refSchema = schema; + for (const part of refPath) { + refSchema = refSchema[part]; + } + return refSchema; + } - if (value.$ref) { - // Handle references - const refSchemaKey = value.$ref.split("/").pop(); - value = schema.$defs?.[refSchemaKey!]; - } + // Helper function to process a schema node + function processSchemaNode( + key: string, + value: any, + required: boolean, + parentPrefix: string, + ): void { + const isRequired = required; + let type = value.type || "object"; + let description = value.description || ""; + + if (value.$ref) { + // Resolve the reference + value = resolveRef(value.$ref); + type = value.type || "object"; + description = value.description || ""; + } + + if (value.enum) { + type = "enum"; + description = `Possible values: ${value.enum.join(", ")}`; + } - const description = value.description || ""; - const isObject = type === "object" && value.properties; + const isObject = type === "object" && value.properties; + const isArray = type === "array" && value.items; + const isCombinator = value.oneOf || value.allOf || value.anyOf; + const addRow = !(isCombinator && isCombinator.length === 1); + + if (isCombinator && isCombinator.length > 1) { + description = "One of the following:"; + } + + if (addRow) { + // Add the row to the table except if combination with only one option + if (key.includes("(#")) { + // If it's a combination, no need to re-specify the type + type = ""; + } const row = { - name: `${prefix}${key}`, + name: `${parentPrefix}${key}`, type: type, - description: description, + description: description.replace(/\n/g, "
"), required: isRequired ? "required" : "optional", }; rows.push(row); + } - if (isObject) { - // Recursively process nested objects - rows = rows.concat( - processPayloadSchema( - value, - prefix + "        ", - ), - ); + if (isObject) { + // Recursively process nested objects + Object.entries(value.properties || {}).forEach( + ([nestedKey, nestedValue]) => { + const nestedRequired = value.required?.includes(nestedKey); + processSchemaNode( + nestedKey, + nestedValue, + nestedRequired, + parentPrefix + "        ", + ); + }, + ); + } else if (isArray) { + // Process array items + // processSchemaNode(key + "[]", value.items, false, parentPrefix + "        "); + } else if (isCombinator) { + // Process combinators like oneOf, allOf, anyOf + const combinators = value.oneOf || value.allOf || value.anyOf; + if (combinators.length === 1) { + // If there is only one option, process it directly + processSchemaNode(key, combinators[0], isRequired, parentPrefix); + } else { + // If there are multiple options, process each one as options + combinators.forEach((subSchema: any, index: number) => { + processSchemaNode( + ` (#${index + 1})`, + subSchema, + isRequired, + parentPrefix + "        ", + ); + }); } + } + } + + // Start processing the root schema + Object.entries(schema.properties || {}).forEach( + ([key, value]: [string, any]) => { + const isRequired = schema.required?.includes(key); + processSchemaNode(key, value, isRequired, prefix); }, ); @@ -199,7 +271,11 @@ const SPECS_OUTPUT_TEMPLATE = Handlebars.compile( //// Data utils //// //////////////////// -const TASKS: PipelineType[] = ["image-to-image", "text-to-image"]; +const TASKS: PipelineType[] = [ + "image-to-image", + "text-generation", + "text-to-image", +]; const DATA: { constants: { diff --git a/scripts/api-inference/templates/specs_output.handlebars b/scripts/api-inference/templates/specs_output.handlebars index 7f3391b98..7d0e7b4c0 100644 --- a/scripts/api-inference/templates/specs_output.handlebars +++ b/scripts/api-inference/templates/specs_output.handlebars @@ -1,5 +1,9 @@ | Body | | -| :--- | :--- | +| :--- | :--- | :--- | {{#each schema}} -| **{{{name}}}** | {{{description}}} | +{{#if type}} +| **{{{name}}}** | _{{type}}_ | {{{description}}} | +{{else}} +| **{{{name}}}** | | {{{description}}} | +{{/if}} {{/each}} \ No newline at end of file diff --git a/scripts/api-inference/templates/specs_payload.handlebars b/scripts/api-inference/templates/specs_payload.handlebars index 70460b184..f75404bc9 100644 --- a/scripts/api-inference/templates/specs_payload.handlebars +++ b/scripts/api-inference/templates/specs_payload.handlebars @@ -1,5 +1,9 @@ | Payload | | | | :--- | :--- | :--- | {{#each schema}} +{{#if type}} | **{{{name}}}** | _{{type}}, {{required}}_ | {{{description}}} | +{{else}} +| **{{{name}}}** | | {{{description}}} | +{{/if}} {{/each}} \ No newline at end of file diff --git a/scripts/api-inference/templates/text_generation.handlebars b/scripts/api-inference/templates/text_generation.handlebars new file mode 100644 index 000000000..e7b27b919 --- /dev/null +++ b/scripts/api-inference/templates/text_generation.handlebars @@ -0,0 +1,37 @@ +## Text generation + +Generate text based on a prompt. 
+ +{{{tips.linksToTaskPage.text-generation}}} + +### Recommended models + +{{#each models.text-generation}} +- [{{this.id}}](https://huggingface.co/{{this.id}}): {{this.description}} +{{/each}} + +{{{tips.listModelsLink.text-generation}}} + +### API specification + +#### Request + +{{{specs.text-generation.input}}} + +{{{constants.specsHeaders}}} + +#### Response + +Output type depends on the `stream` input parameter. +If `stream` is `false` (default), the response will be a JSON object with the following fields: + +{{{specs.text-generation.output}}} + +If `stream` is `true`, generated tokens are returned as a stream, using Server-Sent Events (SSE). +For more information about streaming, check out [this guide](https://huggingface.co/docs/text-generation-inference/conceptual/streaming). + +{{{specs.text-generation.stream_output}}} + +### Using the API + +{{{snippets.text-generation}}} From 98a1b54fb7fe573e186625ffbff6a202a42e4ea5 Mon Sep 17 00:00:00 2001 From: Wauplin Date: Tue, 27 Aug 2024 16:31:15 +0200 Subject: [PATCH 2/9] headers --- docs/api-inference/tasks/image_to_image.md | 7 +++++++ docs/api-inference/tasks/text_generation.md | 7 +++++++ docs/api-inference/tasks/text_to_image.md | 7 +++++++ scripts/api-inference/templates/image_to_image.handlebars | 4 ++++ scripts/api-inference/templates/specs_headers.handlebars | 4 ++++ scripts/api-inference/templates/text_generation.handlebars | 4 ++++ scripts/api-inference/templates/text_to_image.handlebars | 4 ++++ 7 files changed, 37 insertions(+) diff --git a/docs/api-inference/tasks/image_to_image.md b/docs/api-inference/tasks/image_to_image.md index f4c12c868..55f84ab64 100644 --- a/docs/api-inference/tasks/image_to_image.md +++ b/docs/api-inference/tasks/image_to_image.md @@ -29,6 +29,8 @@ This is only a subset of the supported models. Find the model that suits you bes #### Request +##### Payload + | Payload | | | | :--- | :--- | :--- | | **inputs** | _object, required_ | The input image data | @@ -41,12 +43,17 @@ This is only a subset of the supported models. Find the model that suits you bes | **                height** | _integer, required_ | | +##### Headers + +Some options can be configured by passing headers to the Inference API. Here are the available headers: + | Headers | | | | :--- | :--- | :--- | | **authorization** | _string, optional_ | Authentication header in the form `'Bearer: hf_****'` when `hf_****` is a personal user access token with Inference API permission. You can generate one from [your settings page](https://huggingface.co/settings/tokens). | | **x-use-cache** | _boolean, optional, default to `true`_ | There is a cache layer on the inference API to speed up requests we have already seen. Most models can use those results as they are deterministic (meaning the outputs will be the same anyway). However, if you use a nondeterministic model, you can set this parameter to prevent the caching mechanism from being used, resulting in a real new query. Read more about caching [here](../parameters#caching]). | | **x-wait-for-model** | _boolean, optional, default to `false`_ | If the model is not ready, wait for it instead of receiving 503. It limits the number of requests required to get your inference done. It is advised to only set this flag to true after receiving a 503 error, as it will limit hanging in your application to known places. Read more about model availability [here](../overview#eligibility]). | +For more information about Inference API headers, check out the parameters [guide](../parameters). 
#### Response diff --git a/docs/api-inference/tasks/text_generation.md b/docs/api-inference/tasks/text_generation.md index 09c8c7ee9..94d764ec0 100644 --- a/docs/api-inference/tasks/text_generation.md +++ b/docs/api-inference/tasks/text_generation.md @@ -24,6 +24,8 @@ This is only a subset of the supported models. Find the model that suits you bes #### Request +##### Payload + | Payload | | | | :--- | :--- | :--- | | **inputs** | _string, required_ | | @@ -55,12 +57,17 @@ This is only a subset of the supported models. Find the model that suits you bes | **stream** | _boolean, optional_ | | +##### Headers + +Some options can be configured by passing headers to the Inference API. Here are the available headers: + | Headers | | | | :--- | :--- | :--- | | **authorization** | _string, optional_ | Authentication header in the form `'Bearer: hf_****'` when `hf_****` is a personal user access token with Inference API permission. You can generate one from [your settings page](https://huggingface.co/settings/tokens). | | **x-use-cache** | _boolean, optional, default to `true`_ | There is a cache layer on the inference API to speed up requests we have already seen. Most models can use those results as they are deterministic (meaning the outputs will be the same anyway). However, if you use a nondeterministic model, you can set this parameter to prevent the caching mechanism from being used, resulting in a real new query. Read more about caching [here](../parameters#caching]). | | **x-wait-for-model** | _boolean, optional, default to `false`_ | If the model is not ready, wait for it instead of receiving 503. It limits the number of requests required to get your inference done. It is advised to only set this flag to true after receiving a 503 error, as it will limit hanging in your application to known places. Read more about model availability [here](../overview#eligibility]). | +For more information about Inference API headers, check out the parameters [guide](../parameters). #### Response diff --git a/docs/api-inference/tasks/text_to_image.md b/docs/api-inference/tasks/text_to_image.md index 9f2dd2a7d..69db59cb3 100644 --- a/docs/api-inference/tasks/text_to_image.md +++ b/docs/api-inference/tasks/text_to_image.md @@ -21,6 +21,8 @@ This is only a subset of the supported models. Find the model that suits you bes #### Request +##### Payload + | Payload | | | | :--- | :--- | :--- | | **inputs** | _string, required_ | The input text data (sometimes called "prompt" | @@ -34,12 +36,17 @@ This is only a subset of the supported models. Find the model that suits you bes | **        scheduler** | _string, optional_ | For diffusion models. Override the scheduler with a compatible one | +##### Headers + +Some options can be configured by passing headers to the Inference API. Here are the available headers: + | Headers | | | | :--- | :--- | :--- | | **authorization** | _string, optional_ | Authentication header in the form `'Bearer: hf_****'` when `hf_****` is a personal user access token with Inference API permission. You can generate one from [your settings page](https://huggingface.co/settings/tokens). | | **x-use-cache** | _boolean, optional, default to `true`_ | There is a cache layer on the inference API to speed up requests we have already seen. Most models can use those results as they are deterministic (meaning the outputs will be the same anyway). However, if you use a nondeterministic model, you can set this parameter to prevent the caching mechanism from being used, resulting in a real new query. 
Read more about caching [here](../parameters#caching]). | | **x-wait-for-model** | _boolean, optional, default to `false`_ | If the model is not ready, wait for it instead of receiving 503. It limits the number of requests required to get your inference done. It is advised to only set this flag to true after receiving a 503 error, as it will limit hanging in your application to known places. Read more about model availability [here](../overview#eligibility]). | +For more information about Inference API headers, check out the parameters [guide](../parameters). #### Response diff --git a/scripts/api-inference/templates/image_to_image.handlebars b/scripts/api-inference/templates/image_to_image.handlebars index b432eab19..530ce22b8 100644 --- a/scripts/api-inference/templates/image_to_image.handlebars +++ b/scripts/api-inference/templates/image_to_image.handlebars @@ -23,8 +23,12 @@ Use cases heavily depend on the model and the dataset it was trained on, but som #### Request +##### Payload + {{{specs.image-to-image.input}}} +##### Headers + {{{constants.specsHeaders}}} #### Response diff --git a/scripts/api-inference/templates/specs_headers.handlebars b/scripts/api-inference/templates/specs_headers.handlebars index 44b28ecc8..2b952054f 100644 --- a/scripts/api-inference/templates/specs_headers.handlebars +++ b/scripts/api-inference/templates/specs_headers.handlebars @@ -1,5 +1,9 @@ +Some options can be configured by passing headers to the Inference API. Here are the available headers: + | Headers | | | | :--- | :--- | :--- | | **authorization** | _string, optional_ | Authentication header in the form `'Bearer: hf_****'` when `hf_****` is a personal user access token with Inference API permission. You can generate one from [your settings page](https://huggingface.co/settings/tokens). | | **x-use-cache** | _boolean, optional, default to `true`_ | There is a cache layer on the inference API to speed up requests we have already seen. Most models can use those results as they are deterministic (meaning the outputs will be the same anyway). However, if you use a nondeterministic model, you can set this parameter to prevent the caching mechanism from being used, resulting in a real new query. Read more about caching [here](../parameters#caching]). | | **x-wait-for-model** | _boolean, optional, default to `false`_ | If the model is not ready, wait for it instead of receiving 503. It limits the number of requests required to get your inference done. It is advised to only set this flag to true after receiving a 503 error, as it will limit hanging in your application to known places. Read more about model availability [here](../overview#eligibility]). | + +For more information about Inference API headers, check out the parameters [guide](../parameters). \ No newline at end of file diff --git a/scripts/api-inference/templates/text_generation.handlebars b/scripts/api-inference/templates/text_generation.handlebars index e7b27b919..71d63d6d5 100644 --- a/scripts/api-inference/templates/text_generation.handlebars +++ b/scripts/api-inference/templates/text_generation.handlebars @@ -16,8 +16,12 @@ Generate text based on a prompt. 
#### Request +##### Payload + {{{specs.text-generation.input}}} +##### Headers + {{{constants.specsHeaders}}} #### Response diff --git a/scripts/api-inference/templates/text_to_image.handlebars b/scripts/api-inference/templates/text_to_image.handlebars index 6c9c568d1..54712d80d 100644 --- a/scripts/api-inference/templates/text_to_image.handlebars +++ b/scripts/api-inference/templates/text_to_image.handlebars @@ -16,8 +16,12 @@ Generate an image based on a given text prompt. #### Request +##### Payload + {{{specs.text-to-image.input}}} +##### Headers + {{{constants.specsHeaders}}} #### Response From 97c7a8b0c37029f5f9a6d403dcdb73576ab94120 Mon Sep 17 00:00:00 2001 From: Wauplin Date: Tue, 27 Aug 2024 16:35:28 +0200 Subject: [PATCH 3/9] more structure --- scripts/api-inference/scripts/generate.ts | 16 ++++++++++------ .../{ => common}/snippets_template.handlebars | 0 .../{ => common}/specs_headers.handlebars | 0 .../{ => common}/specs_output.handlebars | 0 .../{ => common}/specs_payload.handlebars | 0 .../templates/task/chat_completion.handlebars | 0 .../{ => task}/image_to_image.handlebars | 0 .../{ => task}/text_generation.handlebars | 0 .../{ => task}/text_to_image.handlebars | 0 9 files changed, 10 insertions(+), 6 deletions(-) rename scripts/api-inference/templates/{ => common}/snippets_template.handlebars (100%) rename scripts/api-inference/templates/{ => common}/specs_headers.handlebars (100%) rename scripts/api-inference/templates/{ => common}/specs_output.handlebars (100%) rename scripts/api-inference/templates/{ => common}/specs_payload.handlebars (100%) create mode 100644 scripts/api-inference/templates/task/chat_completion.handlebars rename scripts/api-inference/templates/{ => task}/image_to_image.handlebars (100%) rename scripts/api-inference/templates/{ => task}/text_generation.handlebars (100%) rename scripts/api-inference/templates/{ => task}/text_to_image.handlebars (100%) diff --git a/scripts/api-inference/scripts/generate.ts b/scripts/api-inference/scripts/generate.ts index 2cb04ec9d..fa4712872 100644 --- a/scripts/api-inference/scripts/generate.ts +++ b/scripts/api-inference/scripts/generate.ts @@ -36,10 +36,14 @@ const TEMPLATE_DIR = path.join(ROOT_DIR, "templates"); const DOCS_DIR = path.join(ROOT_DIR, "..", "..", "docs"); const TASKS_DOCS_DIR = path.join(DOCS_DIR, "api-inference", "tasks"); -function readTemplate(templateName: string): Promise { +function readTemplate( + templateName: string, + namespace: string, +): Promise { const templateNameSnakeCase = templateName.replace(/-/g, "_"); const templatePath = path.join( TEMPLATE_DIR, + namespace, `${templateNameSnakeCase}.handlebars`, ); console.log(` πŸ” Reading ${templateNameSnakeCase}.handlebars`); @@ -256,15 +260,15 @@ const TIP_LIST_MODELS_LINK_TEMPLATE = Handlebars.compile( `This is only a subset of the supported models. 
Find the model that suits you best [here](https://huggingface.co/models?inference=warm&pipeline_tag={{task}}&sort=trending).`, ); -const SPECS_HEADERS = await readTemplate("specs-headers"); +const SPECS_HEADERS = await readTemplate("specs-headers", "common"); const SNIPPETS_TEMPLATE = Handlebars.compile( - await readTemplate("snippets-template"), + await readTemplate("snippets-template", "common"), ); const SPECS_PAYLOAD_TEMPLATE = Handlebars.compile( - await readTemplate("specs-payload"), + await readTemplate("specs-payload", "common"), ); const SPECS_OUTPUT_TEMPLATE = Handlebars.compile( - await readTemplate("specs-output"), + await readTemplate("specs-output", "common"), ); //////////////////// @@ -382,7 +386,7 @@ async function renderTemplate( data: JsonObject, ): Promise { console.log(`🎨 Rendering ${templateName}`); - const template = Handlebars.compile(await readTemplate(templateName)); + const template = Handlebars.compile(await readTemplate(templateName, "task")); return template(data); } diff --git a/scripts/api-inference/templates/snippets_template.handlebars b/scripts/api-inference/templates/common/snippets_template.handlebars similarity index 100% rename from scripts/api-inference/templates/snippets_template.handlebars rename to scripts/api-inference/templates/common/snippets_template.handlebars diff --git a/scripts/api-inference/templates/specs_headers.handlebars b/scripts/api-inference/templates/common/specs_headers.handlebars similarity index 100% rename from scripts/api-inference/templates/specs_headers.handlebars rename to scripts/api-inference/templates/common/specs_headers.handlebars diff --git a/scripts/api-inference/templates/specs_output.handlebars b/scripts/api-inference/templates/common/specs_output.handlebars similarity index 100% rename from scripts/api-inference/templates/specs_output.handlebars rename to scripts/api-inference/templates/common/specs_output.handlebars diff --git a/scripts/api-inference/templates/specs_payload.handlebars b/scripts/api-inference/templates/common/specs_payload.handlebars similarity index 100% rename from scripts/api-inference/templates/specs_payload.handlebars rename to scripts/api-inference/templates/common/specs_payload.handlebars diff --git a/scripts/api-inference/templates/task/chat_completion.handlebars b/scripts/api-inference/templates/task/chat_completion.handlebars new file mode 100644 index 000000000..e69de29bb diff --git a/scripts/api-inference/templates/image_to_image.handlebars b/scripts/api-inference/templates/task/image_to_image.handlebars similarity index 100% rename from scripts/api-inference/templates/image_to_image.handlebars rename to scripts/api-inference/templates/task/image_to_image.handlebars diff --git a/scripts/api-inference/templates/text_generation.handlebars b/scripts/api-inference/templates/task/text_generation.handlebars similarity index 100% rename from scripts/api-inference/templates/text_generation.handlebars rename to scripts/api-inference/templates/task/text_generation.handlebars diff --git a/scripts/api-inference/templates/text_to_image.handlebars b/scripts/api-inference/templates/task/text_to_image.handlebars similarity index 100% rename from scripts/api-inference/templates/text_to_image.handlebars rename to scripts/api-inference/templates/task/text_to_image.handlebars From 0e37b4c7de17a1e5ae61fd856c45b0d311ced113 Mon Sep 17 00:00:00 2001 From: Wauplin Date: Tue, 27 Aug 2024 17:31:29 +0200 Subject: [PATCH 4/9] add chat-completion --- docs/api-inference/_toctree.yml | 2 + 
docs/api-inference/tasks/chat_completion.md | 149 ++++++++++++++++++ docs/api-inference/tasks/image_to_image.md | 4 - docs/api-inference/tasks/text_generation.md | 6 +- docs/api-inference/tasks/text_to_image.md | 4 - scripts/api-inference/scripts/generate.ts | 61 ++++++- .../templates/task/chat_completion.handlebars | 38 +++++ .../templates/task/image_to_image.handlebars | 4 - .../templates/task/text_generation.handlebars | 4 - .../templates/task/text_to_image.handlebars | 4 - 10 files changed, 244 insertions(+), 32 deletions(-) create mode 100644 docs/api-inference/tasks/chat_completion.md diff --git a/docs/api-inference/_toctree.yml b/docs/api-inference/_toctree.yml index a91f5eb55..d50a9e0dd 100644 --- a/docs/api-inference/_toctree.yml +++ b/docs/api-inference/_toctree.yml @@ -12,6 +12,8 @@ - local: parameters title: Parameters - sections: + - local: tasks/chat_completion + title: Chat completion - local: tasks/fill_mask title: Fill Mask - local: tasks/image_to_image diff --git a/docs/api-inference/tasks/chat_completion.md b/docs/api-inference/tasks/chat_completion.md new file mode 100644 index 000000000..b685b4446 --- /dev/null +++ b/docs/api-inference/tasks/chat_completion.md @@ -0,0 +1,149 @@ +## Chat completion + +Generate a response given a list of messages. +This is a subtask of [`text-generation`](./text_generation) designed to generate responses in a conversational context. + + + +### Recommended models + +- [google/gemma-2-2b-it](https://huggingface.co/google/gemma-2-2b-it): A text-generation model trained to follow instructions. +- [meta-llama/Meta-Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3.1-8B-Instruct): Very powerful text generation model trained to follow instructions. +- [microsoft/Phi-3-mini-4k-instruct](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct): Small yet powerful text generation model. +- [AI-MO/NuminaMath-7B-TIR](https://huggingface.co/AI-MO/NuminaMath-7B-TIR): A very powerful model that can solve mathematical problems. +- [HuggingFaceH4/starchat2-15b-v0.1](https://huggingface.co/HuggingFaceH4/starchat2-15b-v0.1): Strong coding assistant model. +- [mistralai/Mistral-Nemo-Instruct-2407](https://huggingface.co/mistralai/Mistral-Nemo-Instruct-2407): Very strong open-source large language model. + + + +### API specification + +#### Request + +| Payload | | | +| :--- | :--- | :--- | +| **frequency_penalty** | _number, optional_ | Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim. | +| **logprobs** | _boolean, optional_ | Whether to return log probabilities of the output tokens or not. If true, returns the log probabilities of each output token returned in the content of message. | +| **max_tokens** | _integer, optional_ | The maximum number of tokens that can be generated in the chat completion. | +| **messages** | _array, required_ | A list of messages comprising the conversation so far. | +| **presence_penalty** | _number, optional_ | Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics | +| **seed** | _integer, optional_ | | +| **stop** | _array, optional_ | Up to 4 sequences where the API will stop generating further tokens. | +| **stream** | _boolean, optional_ | | +| **temperature** | _number, optional_ | What sampling temperature to use, between 0 and 2. 
Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or `top_p` but not both. | +| **tool_choice** | _object, optional_ | One of the following: | +| **         (#1)** | | | +| **                FunctionName** | _string, required_ | | +| **         (#2)** | | Possible values: OneOf | +| **tool_prompt** | _string, optional_ | A prompt to be appended before the tools | +| **tools** | _array, optional_ | A list of tools the model may call. Currently, only functions are supported as a tool. Use this to provide a list of functions the model may generate JSON inputs for. | +| **top_logprobs** | _integer, optional_ | An integer between 0 and 5 specifying the number of most likely tokens to return at each token position, each with an associated log probability. logprobs must be set to true if this parameter is used. | +| **top_p** | _number, optional_ | An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. | + + +Some options can be configured by passing headers to the Inference API. Here are the available headers: + +| Headers | | | +| :--- | :--- | :--- | +| **authorization** | _string, optional_ | Authentication header in the form `'Bearer: hf_****'` when `hf_****` is a personal user access token with Inference API permission. You can generate one from [your settings page](https://huggingface.co/settings/tokens). | +| **x-use-cache** | _boolean, optional, default to `true`_ | There is a cache layer on the inference API to speed up requests we have already seen. Most models can use those results as they are deterministic (meaning the outputs will be the same anyway). However, if you use a nondeterministic model, you can set this parameter to prevent the caching mechanism from being used, resulting in a real new query. Read more about caching [here](../parameters#caching]). | +| **x-wait-for-model** | _boolean, optional, default to `false`_ | If the model is not ready, wait for it instead of receiving 503. It limits the number of requests required to get your inference done. It is advised to only set this flag to true after receiving a 503 error, as it will limit hanging in your application to known places. Read more about model availability [here](../overview#eligibility]). | + +For more information about Inference API headers, check out the parameters [guide](../parameters). + +#### Response + +Output type depends on the `stream` input parameter. +If `stream` is `false` (default), the response will be a JSON object with the following fields: + +| Body | | +| :--- | :--- | :--- | +| **choices** | _array_ | | +| **created** | _integer_ | | +| **id** | _string_ | | +| **model** | _string_ | | +| **object** | _string_ | | +| **system_fingerprint** | _string_ | | +| **usage** | _object_ | | +| **        completion_tokens** | _integer_ | | +| **        prompt_tokens** | _integer_ | | +| **        total_tokens** | _integer_ | | + + +If `stream` is `true`, generated tokens are returned as a stream, using Server-Sent Events (SSE). +For more information about streaming, check out [this guide](https://huggingface.co/docs/text-generation-inference/conceptual/streaming). 
+ +| Body | | +| :--- | :--- | :--- | +| **choices** | _array_ | | +| **created** | _integer_ | | +| **id** | _string_ | | +| **model** | _string_ | | +| **object** | _string_ | | +| **system_fingerprint** | _string_ | | + + +### Using the API + + + + + +```bash +curl https://api-inference.huggingface.co/models/undefined \ + -X POST \ + -d '{"inputs": "Can you please let us know more details about your "}' \ + -H 'Content-Type: application/json' \ + -H "Authorization: Bearer hf_***" + +``` + + + +```py +import requests + +API_URL = "https://api-inference.huggingface.co/models/undefined" +headers = {"Authorization": "Bearer hf_***"} + +def query(payload): + response = requests.post(API_URL, headers=headers, json=payload) + return response.json() + +output = query({ + "inputs": "Can you please let us know more details about your ", +}) +``` + +To use the Python client, see `huggingface_hub`'s [package reference](https://huggingface.co/docs/huggingface_hub/package_reference/inference_client#huggingface_hub.InferenceClient.chat_completion). + + + +```js +async function query(data) { + const response = await fetch( + "https://api-inference.huggingface.co/models/undefined", + { + headers: { + Authorization: "Bearer hf_***" + "Content-Type": "application/json", + }, + method: "POST", + body: JSON.stringify(data), + } + ); + const result = await response.json(); + return result; +} + +query({"inputs": "Can you please let us know more details about your "}).then((response) => { + console.log(JSON.stringify(response)); +}); +``` + +To use the JavaScript client, see `huggingface.js`'s [package reference](https://huggingface.co/docs/huggingface.js/inference/classes/HfInference#chatcompletion). + + + + + diff --git a/docs/api-inference/tasks/image_to_image.md b/docs/api-inference/tasks/image_to_image.md index 55f84ab64..448e23c23 100644 --- a/docs/api-inference/tasks/image_to_image.md +++ b/docs/api-inference/tasks/image_to_image.md @@ -29,8 +29,6 @@ This is only a subset of the supported models. Find the model that suits you bes #### Request -##### Payload - | Payload | | | | :--- | :--- | :--- | | **inputs** | _object, required_ | The input image data | @@ -43,8 +41,6 @@ This is only a subset of the supported models. Find the model that suits you bes | **                height** | _integer, required_ | | -##### Headers - Some options can be configured by passing headers to the Inference API. Here are the available headers: | Headers | | | diff --git a/docs/api-inference/tasks/text_generation.md b/docs/api-inference/tasks/text_generation.md index 94d764ec0..c23992e35 100644 --- a/docs/api-inference/tasks/text_generation.md +++ b/docs/api-inference/tasks/text_generation.md @@ -24,8 +24,6 @@ This is only a subset of the supported models. Find the model that suits you bes #### Request -##### Payload - | Payload | | | | :--- | :--- | :--- | | **inputs** | _string, required_ | | @@ -38,7 +36,7 @@ This is only a subset of the supported models. Find the model that suits you bes | **        grammar** | _object, optional_ | One of the following: | | **                 (#1)** | | | | **                        type** | _enum, required_ | Possible values: json | -| **                        value** | _object, required_ | A string that represents a [JSON Schema](https://json-schema.org/).

JSON Schema is a declarative language that allows one to annotate JSON documents
with types and descriptions. | +| **                        value** | _object, required_ | A string that represents a [JSON Schema](https://json-schema.org/). JSON Schema is a declarative language that allows to annotate JSON documents with types and descriptions. | | **                 (#2)** | | | | **                        type** | _enum, required_ | Possible values: regex | | **                        value** | _string, required_ | | @@ -57,8 +55,6 @@ This is only a subset of the supported models. Find the model that suits you bes | **stream** | _boolean, optional_ | | -##### Headers - Some options can be configured by passing headers to the Inference API. Here are the available headers: | Headers | | | diff --git a/docs/api-inference/tasks/text_to_image.md b/docs/api-inference/tasks/text_to_image.md index 69db59cb3..79b7d7752 100644 --- a/docs/api-inference/tasks/text_to_image.md +++ b/docs/api-inference/tasks/text_to_image.md @@ -21,8 +21,6 @@ This is only a subset of the supported models. Find the model that suits you bes #### Request -##### Payload - | Payload | | | | :--- | :--- | :--- | | **inputs** | _string, required_ | The input text data (sometimes called "prompt" | @@ -36,8 +34,6 @@ This is only a subset of the supported models. Find the model that suits you bes | **        scheduler** | _string, optional_ | For diffusion models. Override the scheduler with a compatible one | -##### Headers - Some options can be configured by passing headers to the Inference API. Here are the available headers: | Headers | | | diff --git a/scripts/api-inference/scripts/generate.ts b/scripts/api-inference/scripts/generate.ts index fa4712872..cc68ac6c8 100644 --- a/scripts/api-inference/scripts/generate.ts +++ b/scripts/api-inference/scripts/generate.ts @@ -178,7 +178,9 @@ function processPayloadSchema( const isObject = type === "object" && value.properties; const isArray = type === "array" && value.items; const isCombinator = value.oneOf || value.allOf || value.anyOf; - const addRow = !(isCombinator && isCombinator.length === 1); + const addRow = + !(isCombinator && isCombinator.length === 1) && + !description.includes("UNUSED"); if (isCombinator && isCombinator.length > 1) { description = "One of the following:"; @@ -193,7 +195,7 @@ function processPayloadSchema( const row = { name: `${parentPrefix}${key}`, type: type, - description: description.replace(/\n/g, "
"), + description: description.replace(/\n/g, " "), required: isRequired ? "required" : "optional", }; rows.push(row); @@ -280,6 +282,7 @@ const TASKS: PipelineType[] = [ "text-generation", "text-to-image", ]; +const TASKS_EXTENDED = [...TASKS, "chat-completion"]; const DATA: { constants: { @@ -318,12 +321,16 @@ await Promise.all( id: string; description: string; inference: string | undefined; + config: JsonObject | undefined; }) => { console.log(` ⚑ Checking inference status ${model.id}`); - const modelData = await fetch( - `https://huggingface.co/api/models/${model.id}?expand[]=inference`, - ).then((res) => res.json()); + let url = `https://huggingface.co/api/models/${model.id}?expand[]=inference`; + if (task === "text-generation") { + url += "&expand[]=config"; + } + const modelData = await fetch(url).then((res) => res.json()); model.inference = modelData.inference; + model.config = modelData.config; }, ), ); @@ -353,7 +360,8 @@ TASKS.forEach((task) => { // Render specs await Promise.all( - TASKS.map(async (task) => { + TASKS_EXTENDED.map(async (task) => { + // @ts-ignore const specs = await fetchSpecs(task); DATA.specs[task] = { input: specs.input @@ -377,6 +385,45 @@ TASKS.forEach((task) => { DATA.tips.listModelsLink[task] = TIP_LIST_MODELS_LINK_TEMPLATE({ task }); }); +/////////////////////////////////////////////// +//// Data for chat-completion special case //// +/////////////////////////////////////////////// + +function fetchChatCompletion() { + // Recommended models based on text-generation + DATA.models["chat-completion"] = DATA.models["text-generation"].filter( + // @ts-ignore + (model) => model.config?.tokenizer_config?.chat_template, + ); + + // Snippet specific to chat completion + const mainModel = DATA.models["chat-completion"][0].id; + const mainModelData = { + // @ts-ignore + id: mainModel.id, + pipeline_tag: "text-generation", + mask_token: "", + library_name: "", + // @ts-ignore + config: mainModel.config, + }; + const taskSnippets = { + // @ts-ignore + curl: GET_SNIPPET_FN["curl"](mainModelData, "hf_***"), + // @ts-ignore + python: GET_SNIPPET_FN["python"](mainModelData, "hf_***"), + // @ts-ignore + javascript: GET_SNIPPET_FN["js"](mainModelData, "hf_***"), + }; + DATA.snippets["chat-completion"] = SNIPPETS_TEMPLATE({ + taskSnippets, + taskSnakeCase: "chat-completion".replace("-", "_"), + taskAttached: "chat-completion".replace("-", ""), + }); +} + +fetchChatCompletion(); + ///////////////////////// //// Rendering utils //// ///////////////////////// @@ -391,7 +438,7 @@ async function renderTemplate( } await Promise.all( - TASKS.map(async (task) => { + TASKS_EXTENDED.map(async (task) => { // @ts-ignore const rendered = await renderTemplate(task, DATA); await writeTaskDoc(task, rendered); diff --git a/scripts/api-inference/templates/task/chat_completion.handlebars b/scripts/api-inference/templates/task/chat_completion.handlebars index e69de29bb..d0d3272df 100644 --- a/scripts/api-inference/templates/task/chat_completion.handlebars +++ b/scripts/api-inference/templates/task/chat_completion.handlebars @@ -0,0 +1,38 @@ +## Chat completion + +Generate a response given a list of messages. +This is a subtask of [`text-generation`](./text_generation) designed to generate responses in a conversational context. 
+ +{{{tips.linksToTaskPage.chat-completion}}} + +### Recommended models + +{{#each models.chat-completion}} +- [{{this.id}}](https://huggingface.co/{{this.id}}): {{this.description}} +{{/each}} + +{{{tips.listModelsLink.chat-completion}}} + +### API specification + +#### Request + +{{{specs.chat-completion.input}}} + +{{{constants.specsHeaders}}} + +#### Response + +Output type depends on the `stream` input parameter. +If `stream` is `false` (default), the response will be a JSON object with the following fields: + +{{{specs.chat-completion.output}}} + +If `stream` is `true`, generated tokens are returned as a stream, using Server-Sent Events (SSE). +For more information about streaming, check out [this guide](https://huggingface.co/docs/text-generation-inference/conceptual/streaming). + +{{{specs.chat-completion.stream_output}}} + +### Using the API + +{{{snippets.chat-completion}}} diff --git a/scripts/api-inference/templates/task/image_to_image.handlebars b/scripts/api-inference/templates/task/image_to_image.handlebars index 530ce22b8..b432eab19 100644 --- a/scripts/api-inference/templates/task/image_to_image.handlebars +++ b/scripts/api-inference/templates/task/image_to_image.handlebars @@ -23,12 +23,8 @@ Use cases heavily depend on the model and the dataset it was trained on, but som #### Request -##### Payload - {{{specs.image-to-image.input}}} -##### Headers - {{{constants.specsHeaders}}} #### Response diff --git a/scripts/api-inference/templates/task/text_generation.handlebars b/scripts/api-inference/templates/task/text_generation.handlebars index 71d63d6d5..e7b27b919 100644 --- a/scripts/api-inference/templates/task/text_generation.handlebars +++ b/scripts/api-inference/templates/task/text_generation.handlebars @@ -16,12 +16,8 @@ Generate text based on a prompt. #### Request -##### Payload - {{{specs.text-generation.input}}} -##### Headers - {{{constants.specsHeaders}}} #### Response diff --git a/scripts/api-inference/templates/task/text_to_image.handlebars b/scripts/api-inference/templates/task/text_to_image.handlebars index 54712d80d..6c9c568d1 100644 --- a/scripts/api-inference/templates/task/text_to_image.handlebars +++ b/scripts/api-inference/templates/task/text_to_image.handlebars @@ -16,12 +16,8 @@ Generate an image based on a given text prompt. #### Request -##### Payload - {{{specs.text-to-image.input}}} -##### Headers - {{{constants.specsHeaders}}} #### Response From 458879f57296d4d4f25e89edab16744d5aed3c1f Mon Sep 17 00:00:00 2001 From: Wauplin Date: Tue, 27 Aug 2024 17:58:28 +0200 Subject: [PATCH 5/9] better handling of arrays --- docs/api-inference/tasks/chat_completion.md | 64 +++++++++++++++++++-- docs/api-inference/tasks/image_to_image.md | 2 +- docs/api-inference/tasks/text_generation.md | 37 ++++++++++-- docs/api-inference/tasks/text_to_image.md | 2 +- scripts/api-inference/scripts/generate.ts | 29 ++++++++-- 5 files changed, 115 insertions(+), 19 deletions(-) diff --git a/docs/api-inference/tasks/chat_completion.md b/docs/api-inference/tasks/chat_completion.md index b685b4446..3b81214e6 100644 --- a/docs/api-inference/tasks/chat_completion.md +++ b/docs/api-inference/tasks/chat_completion.md @@ -25,10 +25,20 @@ This is a subtask of [`text-generation`](./text_generation) designed to generate | **frequency_penalty** | _number, optional_ | Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim. 
| | **logprobs** | _boolean, optional_ | Whether to return log probabilities of the output tokens or not. If true, returns the log probabilities of each output token returned in the content of message. | | **max_tokens** | _integer, optional_ | The maximum number of tokens that can be generated in the chat completion. | -| **messages** | _array, required_ | A list of messages comprising the conversation so far. | +| **messages** | _object[], required_ | A list of messages comprising the conversation so far. | +| **        content** | _string, optional_ | | +| **        name** | _string, optional_ | | +| **        role** | _string, required_ | | +| **        tool_calls** | _object[], optional_ | | +| **                function** | _object, required_ | | +| **                        arguments** | _object, required_ | | +| **                        description** | _string, optional_ | | +| **                        name** | _string, required_ | | +| **                id** | _integer, required_ | | +| **                type** | _string, required_ | | | **presence_penalty** | _number, optional_ | Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics | | **seed** | _integer, optional_ | | -| **stop** | _array, optional_ | Up to 4 sequences where the API will stop generating further tokens. | +| **stop** | _string[], optional_ | Up to 4 sequences where the API will stop generating further tokens. | | **stream** | _boolean, optional_ | | | **temperature** | _number, optional_ | What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or `top_p` but not both. | | **tool_choice** | _object, optional_ | One of the following: | @@ -36,7 +46,12 @@ This is a subtask of [`text-generation`](./text_generation) designed to generate | **                FunctionName** | _string, required_ | | | **         (#2)** | | Possible values: OneOf | | **tool_prompt** | _string, optional_ | A prompt to be appended before the tools | -| **tools** | _array, optional_ | A list of tools the model may call. Currently, only functions are supported as a tool. Use this to provide a list of functions the model may generate JSON inputs for. | +| **tools** | _object[], optional_ | A list of tools the model may call. Currently, only functions are supported as a tool. Use this to provide a list of functions the model may generate JSON inputs for. | +| **        function** | _object, required_ | | +| **                arguments** | _object, required_ | | +| **                description** | _string, optional_ | | +| **                name** | _string, required_ | | +| **        type** | _string, required_ | | | **top_logprobs** | _integer, optional_ | An integer between 0 and 5 specifying the number of most likely tokens to return at each token position, each with an associated log probability. logprobs must be set to true if this parameter is used. | | **top_p** | _number, optional_ | An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. 
| @@ -58,7 +73,27 @@ If `stream` is `false` (default), the response will be a JSON object with the fo | Body | | | :--- | :--- | :--- | -| **choices** | _array_ | | +| **choices** | _object[]_ | | +| **        finish_reason** | _string_ | | +| **        index** | _integer_ | | +| **        logprobs** | _object_ | | +| **                content** | _object[]_ | | +| **                        logprob** | _number_ | | +| **                        token** | _string_ | | +| **                        top_logprobs** | _object[]_ | | +| **                                logprob** | _number_ | | +| **                                token** | _string_ | | +| **        message** | _object_ | | +| **                content** | _string_ | | +| **                name** | _string_ | | +| **                role** | _string_ | | +| **                tool_calls** | _object[]_ | | +| **                        function** | _object_ | | +| **                                arguments** | _object_ | | +| **                                description** | _string_ | | +| **                                name** | _string_ | | +| **                        id** | _integer_ | | +| **                        type** | _string_ | | | **created** | _integer_ | | | **id** | _string_ | | | **model** | _string_ | | @@ -75,7 +110,26 @@ For more information about streaming, check out [this guide](https://huggingface | Body | | | :--- | :--- | :--- | -| **choices** | _array_ | | +| **choices** | _object[]_ | | +| **        delta** | _object_ | | +| **                content** | _string_ | | +| **                role** | _string_ | | +| **                tool_calls** | _object_ | | +| **                        function** | _object_ | | +| **                                arguments** | _string_ | | +| **                                name** | _string_ | | +| **                        id** | _string_ | | +| **                        index** | _integer_ | | +| **                        type** | _string_ | | +| **        finish_reason** | _string_ | | +| **        index** | _integer_ | | +| **        logprobs** | _object_ | | +| **                content** | _object[]_ | | +| **                        logprob** | _number_ | | +| **                        token** | _string_ | | +| **                        top_logprobs** | _object[]_ | | +| **                                logprob** | _number_ | | +| **                                token** | _string_ | | | **created** | _integer_ | | | **id** | _string_ | | | **model** | _string_ | | diff --git a/docs/api-inference/tasks/image_to_image.md b/docs/api-inference/tasks/image_to_image.md index 448e23c23..595b40361 100644 --- a/docs/api-inference/tasks/image_to_image.md +++ b/docs/api-inference/tasks/image_to_image.md @@ -34,7 +34,7 @@ This is only a subset of the supported models. Find the model that suits you bes | **inputs** | _object, required_ | The input image data | | **parameters** | _object, optional_ | Additional inference parameters for Image To Image | | **        guidance_scale** | _number, optional_ | For diffusion models. A higher guidance scale value encourages the model to generate images closely linked to the text prompt at the expense of lower image quality. | -| **        negative_prompt** | _array, optional_ | One or several prompt to guide what NOT to include in image generation. | +| **        negative_prompt** | _string[], optional_ | One or several prompt to guide what NOT to include in image generation. 
| | **        num_inference_steps** | _integer, optional_ | For diffusion models. The number of denoising steps. More denoising steps usually lead to a higher quality image at the expense of slower inference. | | **        target_size** | _object, optional_ | The size in pixel of the output image | | **                width** | _integer, required_ | | diff --git a/docs/api-inference/tasks/text_generation.md b/docs/api-inference/tasks/text_generation.md index c23992e35..ddaf451b9 100644 --- a/docs/api-inference/tasks/text_generation.md +++ b/docs/api-inference/tasks/text_generation.md @@ -44,7 +44,7 @@ This is only a subset of the supported models. Find the model that suits you bes | **        repetition_penalty** | _number, optional_ | | | **        return_full_text** | _boolean, optional_ | | | **        seed** | _integer, optional_ | | -| **        stop** | _array, optional_ | | +| **        stop** | _string[], optional_ | | | **        temperature** | _number, optional_ | | | **        top_k** | _integer, optional_ | | | **        top_n_tokens** | _integer, optional_ | | @@ -73,13 +73,34 @@ If `stream` is `false` (default), the response will be a JSON object with the fo | Body | | | :--- | :--- | :--- | | **details** | _object_ | | -| **        best_of_sequences** | _array_ | | +| **        best_of_sequences** | _object[]_ | | +| **                finish_reason** | _enum_ | Possible values: length, eos_token, stop_sequence | +| **                generated_text** | _string_ | | +| **                generated_tokens** | _integer_ | | +| **                prefill** | _object[]_ | | +| **                        id** | _integer_ | | +| **                        logprob** | _number_ | | +| **                        text** | _string_ | | +| **                seed** | _integer_ | | +| **                tokens** | _object[]_ | | +| **                        id** | _integer_ | | +| **                        logprob** | _number_ | | +| **                        special** | _boolean_ | | +| **                        text** | _string_ | | +| **                top_tokens** | _array[]_ | | | **        finish_reason** | _enum_ | Possible values: length, eos_token, stop_sequence | | **        generated_tokens** | _integer_ | | -| **        prefill** | _array_ | | +| **        prefill** | _object[]_ | | +| **                id** | _integer_ | | +| **                logprob** | _number_ | | +| **                text** | _string_ | | | **        seed** | _integer_ | | -| **        tokens** | _array_ | | -| **        top_tokens** | _array_ | | +| **        tokens** | _object[]_ | | +| **                id** | _integer_ | | +| **                logprob** | _number_ | | +| **                special** | _boolean_ | | +| **                text** | _string_ | | +| **        top_tokens** | _array[]_ | | | **generated_text** | _string_ | | @@ -99,7 +120,11 @@ For more information about streaming, check out [this guide](https://huggingface | **        logprob** | _number_ | | | **        special** | _boolean_ | | | **        text** | _string_ | | -| **top_tokens** | _array_ | | +| **top_tokens** | _object[]_ | | +| **        id** | _integer_ | | +| **        logprob** | _number_ | | +| **        special** | _boolean_ | | +| **        text** | _string_ | | ### Using the API diff --git a/docs/api-inference/tasks/text_to_image.md b/docs/api-inference/tasks/text_to_image.md index 79b7d7752..87b959197 100644 --- a/docs/api-inference/tasks/text_to_image.md +++ b/docs/api-inference/tasks/text_to_image.md @@ -26,7 +26,7 @@ 
This is only a subset of the supported models. Find the model that suits you bes | **inputs** | _string, required_ | The input text data (sometimes called "prompt" | | **parameters** | _object, optional_ | Additional inference parameters for Text To Image | | **        guidance_scale** | _number, optional_ | For diffusion models. A higher guidance scale value encourages the model to generate images closely linked to the text prompt at the expense of lower image quality. | -| **        negative_prompt** | _array, optional_ | One or several prompt to guide what NOT to include in image generation. | +| **        negative_prompt** | _string[], optional_ | One or several prompt to guide what NOT to include in image generation. | | **        num_inference_steps** | _integer, optional_ | For diffusion models. The number of denoising steps. More denoising steps usually lead to a higher quality image at the expense of slower inference. | | **        target_size** | _object, optional_ | The size in pixel of the output image | | **                width** | _integer, required_ | | diff --git a/scripts/api-inference/scripts/generate.ts b/scripts/api-inference/scripts/generate.ts index cc68ac6c8..8f41f82aa 100644 --- a/scripts/api-inference/scripts/generate.ts +++ b/scripts/api-inference/scripts/generate.ts @@ -36,6 +36,9 @@ const TEMPLATE_DIR = path.join(ROOT_DIR, "templates"); const DOCS_DIR = path.join(ROOT_DIR, "..", "..", "docs"); const TASKS_DOCS_DIR = path.join(DOCS_DIR, "api-inference", "tasks"); +const NBSP = " "; // non-breaking space +const TABLE_INDENT = NBSP.repeat(8); + function readTemplate( templateName: string, namespace: string, @@ -180,12 +183,21 @@ function processPayloadSchema( const isCombinator = value.oneOf || value.allOf || value.anyOf; const addRow = !(isCombinator && isCombinator.length === 1) && - !description.includes("UNUSED"); + !description.includes("UNUSED") && + !key.includes("SKIP"); if (isCombinator && isCombinator.length > 1) { description = "One of the following:"; } + if (isArray) { + if (value.items.$ref) { + type = "object[]"; + } else if (value.items.type) { + type = `${value.items.type}[]`; + } + } + if (addRow) { // Add the row to the table except if combination with only one option if (key.includes("(#")) { @@ -210,13 +222,18 @@ function processPayloadSchema( nestedKey, nestedValue, nestedRequired, - parentPrefix + "        ", + parentPrefix + TABLE_INDENT, ); }, ); - } else if (isArray) { + } else if (isArray && value.items.$ref) { // Process array items - // processSchemaNode(key + "[]", value.items, false, parentPrefix + "        "); + processSchemaNode( + "SKIP", + resolveRef(value.items.$ref), + false, + parentPrefix, + ); } else if (isCombinator) { // Process combinators like oneOf, allOf, anyOf const combinators = value.oneOf || value.allOf || value.anyOf; @@ -227,10 +244,10 @@ function processPayloadSchema( // If there are multiple options, process each one as options combinators.forEach((subSchema: any, index: number) => { processSchemaNode( - ` (#${index + 1})`, + `${NBSP}(#${index + 1})`, subSchema, isRequired, - parentPrefix + "        ", + parentPrefix + TABLE_INDENT, ); }); } From 330312ef6097b8a15776f861a17c6305761a4462 Mon Sep 17 00:00:00 2001 From: Wauplin Date: Tue, 27 Aug 2024 18:03:08 +0200 Subject: [PATCH 6/9] better handling of parameters --- docs/api-inference/tasks/chat_completion.md | 66 +++++++++---------- docs/api-inference/tasks/image_to_image.md | 22 +++---- docs/api-inference/tasks/text_generation.md | 56 ++++++++-------- 
docs/api-inference/tasks/text_to_image.md | 24 +++---- scripts/api-inference/scripts/generate.ts | 2 +- .../templates/common/specs_headers.handlebars | 6 +- .../templates/common/specs_payload.handlebars | 2 +- 7 files changed, 89 insertions(+), 89 deletions(-) diff --git a/docs/api-inference/tasks/chat_completion.md b/docs/api-inference/tasks/chat_completion.md index 3b81214e6..f79672abc 100644 --- a/docs/api-inference/tasks/chat_completion.md +++ b/docs/api-inference/tasks/chat_completion.md @@ -22,47 +22,47 @@ This is a subtask of [`text-generation`](./text_generation) designed to generate | Payload | | | | :--- | :--- | :--- | -| **frequency_penalty** | _number, optional_ | Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim. | -| **logprobs** | _boolean, optional_ | Whether to return log probabilities of the output tokens or not. If true, returns the log probabilities of each output token returned in the content of message. | -| **max_tokens** | _integer, optional_ | The maximum number of tokens that can be generated in the chat completion. | -| **messages** | _object[], required_ | A list of messages comprising the conversation so far. | -| **        content** | _string, optional_ | | -| **        name** | _string, optional_ | | -| **        role** | _string, required_ | | -| **        tool_calls** | _object[], optional_ | | -| **                function** | _object, required_ | | -| **                        arguments** | _object, required_ | | -| **                        description** | _string, optional_ | | -| **                        name** | _string, required_ | | -| **                id** | _integer, required_ | | -| **                type** | _string, required_ | | -| **presence_penalty** | _number, optional_ | Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics | -| **seed** | _integer, optional_ | | -| **stop** | _string[], optional_ | Up to 4 sequences where the API will stop generating further tokens. | -| **stream** | _boolean, optional_ | | -| **temperature** | _number, optional_ | What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or `top_p` but not both. | -| **tool_choice** | _object, optional_ | One of the following: | +| **frequency_penalty** | _number_ | Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim. | +| **logprobs** | _boolean_ | Whether to return log probabilities of the output tokens or not. If true, returns the log probabilities of each output token returned in the content of message. | +| **max_tokens** | _integer_ | The maximum number of tokens that can be generated in the chat completion. | +| **messages*** | _object[]_ | A list of messages comprising the conversation so far. 
| +| **        content** | _string_ | | +| **        name** | _string_ | | +| **        role*** | _string_ | | +| **        tool_calls** | _object[]_ | | +| **                function*** | _object_ | | +| **                        arguments*** | _object_ | | +| **                        description** | _string_ | | +| **                        name*** | _string_ | | +| **                id*** | _integer_ | | +| **                type*** | _string_ | | +| **presence_penalty** | _number_ | Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics | +| **seed** | _integer_ | | +| **stop** | _string[]_ | Up to 4 sequences where the API will stop generating further tokens. | +| **stream** | _boolean_ | | +| **temperature** | _number_ | What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or `top_p` but not both. | +| **tool_choice** | _object_ | One of the following: | | **         (#1)** | | | -| **                FunctionName** | _string, required_ | | +| **                FunctionName*** | _string_ | | | **         (#2)** | | Possible values: OneOf | -| **tool_prompt** | _string, optional_ | A prompt to be appended before the tools | -| **tools** | _object[], optional_ | A list of tools the model may call. Currently, only functions are supported as a tool. Use this to provide a list of functions the model may generate JSON inputs for. | -| **        function** | _object, required_ | | -| **                arguments** | _object, required_ | | -| **                description** | _string, optional_ | | -| **                name** | _string, required_ | | -| **        type** | _string, required_ | | -| **top_logprobs** | _integer, optional_ | An integer between 0 and 5 specifying the number of most likely tokens to return at each token position, each with an associated log probability. logprobs must be set to true if this parameter is used. | -| **top_p** | _number, optional_ | An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. | +| **tool_prompt** | _string_ | A prompt to be appended before the tools | +| **tools** | _object[]_ | A list of tools the model may call. Currently, only functions are supported as a tool. Use this to provide a list of functions the model may generate JSON inputs for. | +| **        function*** | _object_ | | +| **                arguments*** | _object_ | | +| **                description** | _string_ | | +| **                name*** | _string_ | | +| **        type*** | _string_ | | +| **top_logprobs** | _integer_ | An integer between 0 and 5 specifying the number of most likely tokens to return at each token position, each with an associated log probability. logprobs must be set to true if this parameter is used. | +| **top_p** | _number_ | An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. | Some options can be configured by passing headers to the Inference API. 
Here are the available headers: | Headers | | | | :--- | :--- | :--- | -| **authorization** | _string, optional_ | Authentication header in the form `'Bearer: hf_****'` when `hf_****` is a personal user access token with Inference API permission. You can generate one from [your settings page](https://huggingface.co/settings/tokens). | -| **x-use-cache** | _boolean, optional, default to `true`_ | There is a cache layer on the inference API to speed up requests we have already seen. Most models can use those results as they are deterministic (meaning the outputs will be the same anyway). However, if you use a nondeterministic model, you can set this parameter to prevent the caching mechanism from being used, resulting in a real new query. Read more about caching [here](../parameters#caching]). | -| **x-wait-for-model** | _boolean, optional, default to `false`_ | If the model is not ready, wait for it instead of receiving 503. It limits the number of requests required to get your inference done. It is advised to only set this flag to true after receiving a 503 error, as it will limit hanging in your application to known places. Read more about model availability [here](../overview#eligibility]). | +| **authorization** | _string_ | Authentication header in the form `'Bearer: hf_****'` when `hf_****` is a personal user access token with Inference API permission. You can generate one from [your settings page](https://huggingface.co/settings/tokens). | +| **x-use-cache** | _boolean, default to `true`_ | There is a cache layer on the inference API to speed up requests we have already seen. Most models can use those results as they are deterministic (meaning the outputs will be the same anyway). However, if you use a nondeterministic model, you can set this parameter to prevent the caching mechanism from being used, resulting in a real new query. Read more about caching [here](../parameters#caching]). | +| **x-wait-for-model** | _boolean, default to `false`_ | If the model is not ready, wait for it instead of receiving 503. It limits the number of requests required to get your inference done. It is advised to only set this flag to true after receiving a 503 error, as it will limit hanging in your application to known places. Read more about model availability [here](../overview#eligibility]). | For more information about Inference API headers, check out the parameters [guide](../parameters). diff --git a/docs/api-inference/tasks/image_to_image.md b/docs/api-inference/tasks/image_to_image.md index 595b40361..5de1a34ed 100644 --- a/docs/api-inference/tasks/image_to_image.md +++ b/docs/api-inference/tasks/image_to_image.md @@ -31,23 +31,23 @@ This is only a subset of the supported models. Find the model that suits you bes | Payload | | | | :--- | :--- | :--- | -| **inputs** | _object, required_ | The input image data | -| **parameters** | _object, optional_ | Additional inference parameters for Image To Image | -| **        guidance_scale** | _number, optional_ | For diffusion models. A higher guidance scale value encourages the model to generate images closely linked to the text prompt at the expense of lower image quality. | -| **        negative_prompt** | _string[], optional_ | One or several prompt to guide what NOT to include in image generation. | -| **        num_inference_steps** | _integer, optional_ | For diffusion models. The number of denoising steps. More denoising steps usually lead to a higher quality image at the expense of slower inference. 
| -| **        target_size** | _object, optional_ | The size in pixel of the output image | -| **                width** | _integer, required_ | | -| **                height** | _integer, required_ | | +| **inputs*** | _object_ | The input image data | +| **parameters** | _object_ | Additional inference parameters for Image To Image | +| **        guidance_scale** | _number_ | For diffusion models. A higher guidance scale value encourages the model to generate images closely linked to the text prompt at the expense of lower image quality. | +| **        negative_prompt** | _string[]_ | One or several prompt to guide what NOT to include in image generation. | +| **        num_inference_steps** | _integer_ | For diffusion models. The number of denoising steps. More denoising steps usually lead to a higher quality image at the expense of slower inference. | +| **        target_size** | _object_ | The size in pixel of the output image | +| **                width*** | _integer_ | | +| **                height*** | _integer_ | | Some options can be configured by passing headers to the Inference API. Here are the available headers: | Headers | | | | :--- | :--- | :--- | -| **authorization** | _string, optional_ | Authentication header in the form `'Bearer: hf_****'` when `hf_****` is a personal user access token with Inference API permission. You can generate one from [your settings page](https://huggingface.co/settings/tokens). | -| **x-use-cache** | _boolean, optional, default to `true`_ | There is a cache layer on the inference API to speed up requests we have already seen. Most models can use those results as they are deterministic (meaning the outputs will be the same anyway). However, if you use a nondeterministic model, you can set this parameter to prevent the caching mechanism from being used, resulting in a real new query. Read more about caching [here](../parameters#caching]). | -| **x-wait-for-model** | _boolean, optional, default to `false`_ | If the model is not ready, wait for it instead of receiving 503. It limits the number of requests required to get your inference done. It is advised to only set this flag to true after receiving a 503 error, as it will limit hanging in your application to known places. Read more about model availability [here](../overview#eligibility]). | +| **authorization** | _string_ | Authentication header in the form `'Bearer: hf_****'` when `hf_****` is a personal user access token with Inference API permission. You can generate one from [your settings page](https://huggingface.co/settings/tokens). | +| **x-use-cache** | _boolean, default to `true`_ | There is a cache layer on the inference API to speed up requests we have already seen. Most models can use those results as they are deterministic (meaning the outputs will be the same anyway). However, if you use a nondeterministic model, you can set this parameter to prevent the caching mechanism from being used, resulting in a real new query. Read more about caching [here](../parameters#caching]). | +| **x-wait-for-model** | _boolean, default to `false`_ | If the model is not ready, wait for it instead of receiving 503. It limits the number of requests required to get your inference done. It is advised to only set this flag to true after receiving a 503 error, as it will limit hanging in your application to known places. Read more about model availability [here](../overview#eligibility]). | For more information about Inference API headers, check out the parameters [guide](../parameters). 
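The three headers above can be combined in a single request. A minimal Python sketch, assuming the `requests` library and a placeholder model id (the header values shown are illustrative, not defaults):

```py
import requests

# Placeholder model id: substitute any model supported by the Inference API.
API_URL = "https://api-inference.huggingface.co/models/<model-id>"
headers = {
    "Authorization": "Bearer hf_***",  # personal user access token
    "x-use-cache": "false",            # bypass the cache layer for a nondeterministic model
    "x-wait-for-model": "true",        # wait for the model to load instead of receiving a 503
}

response = requests.post(API_URL, headers=headers, json={"inputs": "..."})
response.raise_for_status()
print(response.json())
```

Boolean-valued headers are sent as strings, like any other HTTP header.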
diff --git a/docs/api-inference/tasks/text_generation.md b/docs/api-inference/tasks/text_generation.md index ddaf451b9..f9881087a 100644 --- a/docs/api-inference/tasks/text_generation.md +++ b/docs/api-inference/tasks/text_generation.md @@ -26,42 +26,42 @@ This is only a subset of the supported models. Find the model that suits you bes | Payload | | | | :--- | :--- | :--- | -| **inputs** | _string, required_ | | -| **parameters** | _object, optional_ | | -| **        best_of** | _integer, optional_ | | -| **        decoder_input_details** | _boolean, optional_ | | -| **        details** | _boolean, optional_ | | -| **        do_sample** | _boolean, optional_ | | -| **        frequency_penalty** | _number, optional_ | | -| **        grammar** | _object, optional_ | One of the following: | +| **inputs*** | _string_ | | +| **parameters** | _object_ | | +| **        best_of** | _integer_ | | +| **        decoder_input_details** | _boolean_ | | +| **        details** | _boolean_ | | +| **        do_sample** | _boolean_ | | +| **        frequency_penalty** | _number_ | | +| **        grammar** | _object_ | One of the following: | | **                 (#1)** | | | -| **                        type** | _enum, required_ | Possible values: json | -| **                        value** | _object, required_ | A string that represents a [JSON Schema](https://json-schema.org/). JSON Schema is a declarative language that allows to annotate JSON documents with types and descriptions. | +| **                        type*** | _enum_ | Possible values: json | +| **                        value*** | _object_ | A string that represents a [JSON Schema](https://json-schema.org/). JSON Schema is a declarative language that allows to annotate JSON documents with types and descriptions. | | **                 (#2)** | | | -| **                        type** | _enum, required_ | Possible values: regex | -| **                        value** | _string, required_ | | -| **        max_new_tokens** | _integer, optional_ | | -| **        repetition_penalty** | _number, optional_ | | -| **        return_full_text** | _boolean, optional_ | | -| **        seed** | _integer, optional_ | | -| **        stop** | _string[], optional_ | | -| **        temperature** | _number, optional_ | | -| **        top_k** | _integer, optional_ | | -| **        top_n_tokens** | _integer, optional_ | | -| **        top_p** | _number, optional_ | | -| **        truncate** | _integer, optional_ | | -| **        typical_p** | _number, optional_ | | -| **        watermark** | _boolean, optional_ | | -| **stream** | _boolean, optional_ | | +| **                        type*** | _enum_ | Possible values: regex | +| **                        value*** | _string_ | | +| **        max_new_tokens** | _integer_ | | +| **        repetition_penalty** | _number_ | | +| **        return_full_text** | _boolean_ | | +| **        seed** | _integer_ | | +| **        stop** | _string[]_ | | +| **        temperature** | _number_ | | +| **        top_k** | _integer_ | | +| **        top_n_tokens** | _integer_ | | +| **        top_p** | _number_ | | +| **        truncate** | _integer_ | | +| **        typical_p** | _number_ | | +| **        watermark** | _boolean_ | | +| **stream** | _boolean_ | | Some options can be configured by passing headers to the Inference API. 
Here are the available headers: | Headers | | | | :--- | :--- | :--- | -| **authorization** | _string, optional_ | Authentication header in the form `'Bearer: hf_****'` when `hf_****` is a personal user access token with Inference API permission. You can generate one from [your settings page](https://huggingface.co/settings/tokens). | -| **x-use-cache** | _boolean, optional, default to `true`_ | There is a cache layer on the inference API to speed up requests we have already seen. Most models can use those results as they are deterministic (meaning the outputs will be the same anyway). However, if you use a nondeterministic model, you can set this parameter to prevent the caching mechanism from being used, resulting in a real new query. Read more about caching [here](../parameters#caching]). | -| **x-wait-for-model** | _boolean, optional, default to `false`_ | If the model is not ready, wait for it instead of receiving 503. It limits the number of requests required to get your inference done. It is advised to only set this flag to true after receiving a 503 error, as it will limit hanging in your application to known places. Read more about model availability [here](../overview#eligibility]). | +| **authorization** | _string_ | Authentication header in the form `'Bearer: hf_****'` when `hf_****` is a personal user access token with Inference API permission. You can generate one from [your settings page](https://huggingface.co/settings/tokens). | +| **x-use-cache** | _boolean, default to `true`_ | There is a cache layer on the inference API to speed up requests we have already seen. Most models can use those results as they are deterministic (meaning the outputs will be the same anyway). However, if you use a nondeterministic model, you can set this parameter to prevent the caching mechanism from being used, resulting in a real new query. Read more about caching [here](../parameters#caching]). | +| **x-wait-for-model** | _boolean, default to `false`_ | If the model is not ready, wait for it instead of receiving 503. It limits the number of requests required to get your inference done. It is advised to only set this flag to true after receiving a 503 error, as it will limit hanging in your application to known places. Read more about model availability [here](../overview#eligibility]). | For more information about Inference API headers, check out the parameters [guide](../parameters). diff --git a/docs/api-inference/tasks/text_to_image.md b/docs/api-inference/tasks/text_to_image.md index 87b959197..1d642b357 100644 --- a/docs/api-inference/tasks/text_to_image.md +++ b/docs/api-inference/tasks/text_to_image.md @@ -23,24 +23,24 @@ This is only a subset of the supported models. Find the model that suits you bes | Payload | | | | :--- | :--- | :--- | -| **inputs** | _string, required_ | The input text data (sometimes called "prompt" | -| **parameters** | _object, optional_ | Additional inference parameters for Text To Image | -| **        guidance_scale** | _number, optional_ | For diffusion models. A higher guidance scale value encourages the model to generate images closely linked to the text prompt at the expense of lower image quality. | -| **        negative_prompt** | _string[], optional_ | One or several prompt to guide what NOT to include in image generation. | -| **        num_inference_steps** | _integer, optional_ | For diffusion models. The number of denoising steps. More denoising steps usually lead to a higher quality image at the expense of slower inference. 
| -| **        target_size** | _object, optional_ | The size in pixel of the output image | -| **                width** | _integer, required_ | | -| **                height** | _integer, required_ | | -| **        scheduler** | _string, optional_ | For diffusion models. Override the scheduler with a compatible one | +| **inputs*** | _string_ | The input text data (sometimes called "prompt" | +| **parameters** | _object_ | Additional inference parameters for Text To Image | +| **        guidance_scale** | _number_ | For diffusion models. A higher guidance scale value encourages the model to generate images closely linked to the text prompt at the expense of lower image quality. | +| **        negative_prompt** | _string[]_ | One or several prompt to guide what NOT to include in image generation. | +| **        num_inference_steps** | _integer_ | For diffusion models. The number of denoising steps. More denoising steps usually lead to a higher quality image at the expense of slower inference. | +| **        target_size** | _object_ | The size in pixel of the output image | +| **                width*** | _integer_ | | +| **                height*** | _integer_ | | +| **        scheduler** | _string_ | For diffusion models. Override the scheduler with a compatible one | Some options can be configured by passing headers to the Inference API. Here are the available headers: | Headers | | | | :--- | :--- | :--- | -| **authorization** | _string, optional_ | Authentication header in the form `'Bearer: hf_****'` when `hf_****` is a personal user access token with Inference API permission. You can generate one from [your settings page](https://huggingface.co/settings/tokens). | -| **x-use-cache** | _boolean, optional, default to `true`_ | There is a cache layer on the inference API to speed up requests we have already seen. Most models can use those results as they are deterministic (meaning the outputs will be the same anyway). However, if you use a nondeterministic model, you can set this parameter to prevent the caching mechanism from being used, resulting in a real new query. Read more about caching [here](../parameters#caching]). | -| **x-wait-for-model** | _boolean, optional, default to `false`_ | If the model is not ready, wait for it instead of receiving 503. It limits the number of requests required to get your inference done. It is advised to only set this flag to true after receiving a 503 error, as it will limit hanging in your application to known places. Read more about model availability [here](../overview#eligibility]). | +| **authorization** | _string_ | Authentication header in the form `'Bearer: hf_****'` when `hf_****` is a personal user access token with Inference API permission. You can generate one from [your settings page](https://huggingface.co/settings/tokens). | +| **x-use-cache** | _boolean, default to `true`_ | There is a cache layer on the inference API to speed up requests we have already seen. Most models can use those results as they are deterministic (meaning the outputs will be the same anyway). However, if you use a nondeterministic model, you can set this parameter to prevent the caching mechanism from being used, resulting in a real new query. Read more about caching [here](../parameters#caching]). | +| **x-wait-for-model** | _boolean, default to `false`_ | If the model is not ready, wait for it instead of receiving 503. It limits the number of requests required to get your inference done. 
It is advised to only set this flag to true after receiving a 503 error, as it will limit hanging in your application to known places. Read more about model availability [here](../overview#eligibility]). | For more information about Inference API headers, check out the parameters [guide](../parameters). diff --git a/scripts/api-inference/scripts/generate.ts b/scripts/api-inference/scripts/generate.ts index 8f41f82aa..a9d220a07 100644 --- a/scripts/api-inference/scripts/generate.ts +++ b/scripts/api-inference/scripts/generate.ts @@ -208,7 +208,7 @@ function processPayloadSchema( name: `${parentPrefix}${key}`, type: type, description: description.replace(/\n/g, " "), - required: isRequired ? "required" : "optional", + required: isRequired, }; rows.push(row); } diff --git a/scripts/api-inference/templates/common/specs_headers.handlebars b/scripts/api-inference/templates/common/specs_headers.handlebars index 2b952054f..32b6e9d94 100644 --- a/scripts/api-inference/templates/common/specs_headers.handlebars +++ b/scripts/api-inference/templates/common/specs_headers.handlebars @@ -2,8 +2,8 @@ Some options can be configured by passing headers to the Inference API. Here are | Headers | | | | :--- | :--- | :--- | -| **authorization** | _string, optional_ | Authentication header in the form `'Bearer: hf_****'` when `hf_****` is a personal user access token with Inference API permission. You can generate one from [your settings page](https://huggingface.co/settings/tokens). | -| **x-use-cache** | _boolean, optional, default to `true`_ | There is a cache layer on the inference API to speed up requests we have already seen. Most models can use those results as they are deterministic (meaning the outputs will be the same anyway). However, if you use a nondeterministic model, you can set this parameter to prevent the caching mechanism from being used, resulting in a real new query. Read more about caching [here](../parameters#caching]). | -| **x-wait-for-model** | _boolean, optional, default to `false`_ | If the model is not ready, wait for it instead of receiving 503. It limits the number of requests required to get your inference done. It is advised to only set this flag to true after receiving a 503 error, as it will limit hanging in your application to known places. Read more about model availability [here](../overview#eligibility]). | +| **authorization** | _string_ | Authentication header in the form `'Bearer: hf_****'` when `hf_****` is a personal user access token with Inference API permission. You can generate one from [your settings page](https://huggingface.co/settings/tokens). | +| **x-use-cache** | _boolean, default to `true`_ | There is a cache layer on the inference API to speed up requests we have already seen. Most models can use those results as they are deterministic (meaning the outputs will be the same anyway). However, if you use a nondeterministic model, you can set this parameter to prevent the caching mechanism from being used, resulting in a real new query. Read more about caching [here](../parameters#caching]). | +| **x-wait-for-model** | _boolean, default to `false`_ | If the model is not ready, wait for it instead of receiving 503. It limits the number of requests required to get your inference done. It is advised to only set this flag to true after receiving a 503 error, as it will limit hanging in your application to known places. Read more about model availability [here](../overview#eligibility]). 
| For more information about Inference API headers, check out the parameters [guide](../parameters). \ No newline at end of file diff --git a/scripts/api-inference/templates/common/specs_payload.handlebars b/scripts/api-inference/templates/common/specs_payload.handlebars index f75404bc9..6459be5d9 100644 --- a/scripts/api-inference/templates/common/specs_payload.handlebars +++ b/scripts/api-inference/templates/common/specs_payload.handlebars @@ -2,7 +2,7 @@ | :--- | :--- | :--- | {{#each schema}} {{#if type}} -| **{{{name}}}** | _{{type}}, {{required}}_ | {{{description}}} | +| **{{{name}}}{{#if required}}*{{/if}}** | _{{type}}_ | {{{description}}} | {{else}} | **{{{name}}}** | | {{{description}}} | {{/if}} From d44d7f372c8f33b6624b848a2c89c215ff7d4071 Mon Sep 17 00:00:00 2001 From: Lucain Date: Wed, 28 Aug 2024 16:58:18 +0200 Subject: [PATCH 7/9] Add new tasks pages (fill mask, summarization, question answering, sentence similarity) (#1394) * add fill mask * add summarization * add question answering * Table question answering * handle array output * Add sentence similarity * text classification (almost) * better with an enum * Add mask token * capitalize * remove sentence-similarity * Update docs/api-inference/tasks/table_question_answering.md Co-authored-by: Omar Sanseviero --------- Co-authored-by: Omar Sanseviero --- docs/api-inference/_toctree.yml | 16 +- docs/api-inference/tasks/chat_completion.md | 4 +- docs/api-inference/tasks/fill_mask.md | 114 ++++++++++++++- docs/api-inference/tasks/image_to_image.md | 2 +- .../api-inference/tasks/question_answering.md | 127 ++++++++++++++++ docs/api-inference/tasks/summarization.md | 106 ++++++++++++++ .../tasks/table_question_answering.md | 138 ++++++++++++++++++ .../tasks/text_classification.md | 112 ++++++++++++++ docs/api-inference/tasks/text_generation.md | 20 ++- docs/api-inference/tasks/text_to_image.md | 4 +- scripts/api-inference/scripts/generate.ts | 86 +++++++---- .../templates/task/chat_completion.handlebars | 2 +- .../templates/task/fill_mask.handlebars | 29 ++++ .../templates/task/image_to_image.handlebars | 2 +- .../task/question_answering.handlebars | 29 ++++ .../templates/task/summarization.handlebars | 29 ++++ .../task/table_question_answering.handlebars | 29 ++++ .../task/text_classification.handlebars | 29 ++++ .../templates/task/text_generation.handlebars | 2 +- .../templates/task/text_to_image.handlebars | 2 +- 20 files changed, 828 insertions(+), 54 deletions(-) create mode 100644 docs/api-inference/tasks/question_answering.md create mode 100644 docs/api-inference/tasks/summarization.md create mode 100644 docs/api-inference/tasks/table_question_answering.md create mode 100644 docs/api-inference/tasks/text_classification.md create mode 100644 scripts/api-inference/templates/task/fill_mask.handlebars create mode 100644 scripts/api-inference/templates/task/question_answering.handlebars create mode 100644 scripts/api-inference/templates/task/summarization.handlebars create mode 100644 scripts/api-inference/templates/task/table_question_answering.handlebars create mode 100644 scripts/api-inference/templates/task/text_classification.handlebars diff --git a/docs/api-inference/_toctree.yml b/docs/api-inference/_toctree.yml index d50a9e0dd..247a96201 100644 --- a/docs/api-inference/_toctree.yml +++ b/docs/api-inference/_toctree.yml @@ -13,14 +13,22 @@ title: Parameters - sections: - local: tasks/chat_completion - title: Chat completion + title: Chat Completion - local: tasks/fill_mask title: Fill Mask - local: 
tasks/image_to_image - title: Image-to-image + title: Image to Image + - local: tasks/question_answering + title: Question Answering + - local: tasks/summarization + title: Summarization + - local: tasks/table_question_answering + title: Table Question Answering + - local: tasks/text_classification + title: Text Classification - local: tasks/text_generation - title: Text generation + title: Text Generation - local: tasks/text_to_image - title: Text-to-image + title: Text to Image title: Detailed Task Parameters title: API Reference \ No newline at end of file diff --git a/docs/api-inference/tasks/chat_completion.md b/docs/api-inference/tasks/chat_completion.md index f79672abc..1b6c0cb0b 100644 --- a/docs/api-inference/tasks/chat_completion.md +++ b/docs/api-inference/tasks/chat_completion.md @@ -1,4 +1,4 @@ -## Chat completion +## Chat Completion Generate a response given a list of messages. This is a subtask of [`text-generation`](./text_generation) designed to generate responses in a conversational context. @@ -44,7 +44,7 @@ This is a subtask of [`text-generation`](./text_generation) designed to generate | **tool_choice** | _object_ | One of the following: | | **         (#1)** | | | | **                FunctionName*** | _string_ | | -| **         (#2)** | | Possible values: OneOf | +| **         (#2)** | | Possible values: OneOf. | | **tool_prompt** | _string_ | A prompt to be appended before the tools | | **tools** | _object[]_ | A list of tools the model may call. Currently, only functions are supported as a tool. Use this to provide a list of functions the model may generate JSON inputs for. | | **        function*** | _object_ | | diff --git a/docs/api-inference/tasks/fill_mask.md b/docs/api-inference/tasks/fill_mask.md index 64260ae39..197fef37c 100644 --- a/docs/api-inference/tasks/fill_mask.md +++ b/docs/api-inference/tasks/fill_mask.md @@ -1,6 +1,114 @@ -## Fill Mask +## Fill-mask -Mask filling is the task of predicting the right word (token to be precise) in the middle of a sequence. +Mask filling is the task of predicting the right word (token to be precise) in the middle of a sequence. + + + +For more details about the `fill-mask` task, check out its [dedicated page](https://huggingface.co/tasks/fill-mask)! You will find examples and related materials. + + + +### Recommended models + +- [distilbert-base-uncased](https://huggingface.co/distilbert-base-uncased): A faster and smaller model than the famous BERT model. +- [xlm-roberta-base](https://huggingface.co/xlm-roberta-base): A multilingual model trained on 100 languages. + +This is only a subset of the supported models. Find the model that suits you best [here](https://huggingface.co/models?inference=warm&pipeline_tag=fill-mask&sort=trending). + +### API specification + +#### Request + +| Payload | | | +| :--- | :--- | :--- | +| **inputs*** | _string_ | The text with masked tokens | +| **parameters** | _object_ | Additional inference parameters for Fill Mask | +| **        top_k** | _integer_ | When passed, overrides the number of predictions to return. | +| **        targets** | _string[]_ | When passed, the model will limit the scores to the passed targets instead of looking up in the whole vocabulary. If the provided targets are not in the model vocab, they will be tokenized and the first resulting token will be used (with a warning, and that might be slower). | + + +Some options can be configured by passing headers to the Inference API. 
Here are the available headers: + +| Headers | | | +| :--- | :--- | :--- | +| **authorization** | _string_ | Authentication header in the form `'Bearer: hf_****'` when `hf_****` is a personal user access token with Inference API permission. You can generate one from [your settings page](https://huggingface.co/settings/tokens). | +| **x-use-cache** | _boolean, default to `true`_ | There is a cache layer on the inference API to speed up requests we have already seen. Most models can use those results as they are deterministic (meaning the outputs will be the same anyway). However, if you use a nondeterministic model, you can set this parameter to prevent the caching mechanism from being used, resulting in a real new query. Read more about caching [here](../parameters#caching]). | +| **x-wait-for-model** | _boolean, default to `false`_ | If the model is not ready, wait for it instead of receiving 503. It limits the number of requests required to get your inference done. It is advised to only set this flag to true after receiving a 503 error, as it will limit hanging in your application to known places. Read more about model availability [here](../overview#eligibility]). | + +For more information about Inference API headers, check out the parameters [guide](../parameters). + +#### Response + +| Body | | +| :--- | :--- | :--- | +| **(array)** | _object[]_ | Output is an array of objects. | +| **        sequence** | _string_ | The corresponding input with the mask token prediction. | +| **        score** | _number_ | The corresponding probability | +| **        token** | _integer_ | The predicted token id (to replace the masked one). | +| **        token_str** | _string_ | The predicted token (to replace the masked one). | + + +### Using the API + + + + + +```bash +curl https://api-inference.huggingface.co/models/distilbert-base-uncased \ + -X POST \ + -d '{"inputs": "The answer to the universe is [MASK]."}' \ + -H 'Content-Type: application/json' \ + -H "Authorization: Bearer hf_***" + +``` + + + +```py +import requests + +API_URL = "https://api-inference.huggingface.co/models/distilbert-base-uncased" +headers = {"Authorization": "Bearer hf_***"} + +def query(payload): + response = requests.post(API_URL, headers=headers, json=payload) + return response.json() + +output = query({ + "inputs": "The answer to the universe is [MASK].", +}) +``` + +To use the Python client, see `huggingface_hub`'s [package reference](https://huggingface.co/docs/huggingface_hub/package_reference/inference_client#huggingface_hub.InferenceClient.fill_mask). + + + +```js +async function query(data) { + const response = await fetch( + "https://api-inference.huggingface.co/models/distilbert-base-uncased", + { + headers: { + Authorization: "Bearer hf_***" + "Content-Type": "application/json", + }, + method: "POST", + body: JSON.stringify(data), + } + ); + const result = await response.json(); + return result; +} + +query({"inputs": "The answer to the universe is [MASK]."}).then((response) => { + console.log(JSON.stringify(response)); +}); +``` + +To use the JavaScript client, see `huggingface.js`'s [package reference](https://huggingface.co/docs/huggingface.js/inference/classes/HfInference#fillmask). 
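The optional `parameters` documented in the request table are sent alongside `inputs`. A minimal sketch with illustrative values (neither `top_k=5` nor the target token is prescribed by the specification):

```py
import requests

API_URL = "https://api-inference.huggingface.co/models/distilbert-base-uncased"
headers = {"Authorization": "Bearer hf_***"}

payload = {
    "inputs": "The answer to the universe is [MASK].",
    "parameters": {
        "top_k": 5,             # override how many predictions are returned
        "targets": ["simple"],  # score only these candidate tokens
    },
}
output = requests.post(API_URL, headers=headers, json=payload).json()
```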
+ + + -Automated docs below diff --git a/docs/api-inference/tasks/image_to_image.md b/docs/api-inference/tasks/image_to_image.md index 5de1a34ed..eb197489e 100644 --- a/docs/api-inference/tasks/image_to_image.md +++ b/docs/api-inference/tasks/image_to_image.md @@ -1,4 +1,4 @@ -## Image-to-image +## Image to Image Image-to-image is the task of transforming a source image to match the characteristics of a target image or a target image domain. Any image manipulation and enhancement is possible with image to image models. diff --git a/docs/api-inference/tasks/question_answering.md b/docs/api-inference/tasks/question_answering.md new file mode 100644 index 000000000..3f724c9c2 --- /dev/null +++ b/docs/api-inference/tasks/question_answering.md @@ -0,0 +1,127 @@ +## Question Answering + +Question Answering models can retrieve the answer to a question from a given text, which is useful for searching for an answer in a document. + + + +For more details about the `question-answering` task, check out its [dedicated page](https://huggingface.co/tasks/question-answering)! You will find examples and related materials. + + + +### Recommended models + +- [deepset/roberta-base-squad2](https://huggingface.co/deepset/roberta-base-squad2): A robust baseline model for most question answering domains. +- [google/tapas-base-finetuned-wtq](https://huggingface.co/google/tapas-base-finetuned-wtq): A special model that can answer questions from tables! + +This is only a subset of the supported models. Find the model that suits you best [here](https://huggingface.co/models?inference=warm&pipeline_tag=question-answering&sort=trending). + +### API specification + +#### Request + +| Payload | | | +| :--- | :--- | :--- | +| **inputs*** | _object_ | One (context, question) pair to answer | +| **        context*** | _string_ | The context to be used for answering the question | +| **        question*** | _string_ | The question to be answered | +| **parameters** | _object_ | Additional inference parameters for Question Answering | +| **        top_k** | _integer_ | The number of answers to return (will be chosen by order of likelihood). Note that we return less than topk answers if there are not enough options available within the context. | +| **        doc_stride** | _integer_ | If the context is too long to fit with the question for the model, it will be split in several chunks with some overlap. This argument controls the size of that overlap. | +| **        max_answer_len** | _integer_ | The maximum length of predicted answers (e.g., only answers with a shorter length are considered). | +| **        max_seq_len** | _integer_ | The maximum length of the total sentence (context + question) in tokens of each chunk passed to the model. The context will be split in several chunks (using docStride as overlap) if needed. | +| **        max_question_len** | _integer_ | The maximum length of the question after tokenization. It will be truncated if needed. | +| **        handle_impossible_answer** | _boolean_ | Whether to accept impossible as an answer. | +| **        align_to_words** | _boolean_ | Attempts to align the answer to real words. Improves quality on space separated languages. Might hurt on non-space-separated languages (like Japanese or Chinese) | + + +Some options can be configured by passing headers to the Inference API. 
Here are the available headers: + +| Headers | | | +| :--- | :--- | :--- | +| **authorization** | _string_ | Authentication header in the form `'Bearer: hf_****'` when `hf_****` is a personal user access token with Inference API permission. You can generate one from [your settings page](https://huggingface.co/settings/tokens). | +| **x-use-cache** | _boolean, default to `true`_ | There is a cache layer on the inference API to speed up requests we have already seen. Most models can use those results as they are deterministic (meaning the outputs will be the same anyway). However, if you use a nondeterministic model, you can set this parameter to prevent the caching mechanism from being used, resulting in a real new query. Read more about caching [here](../parameters#caching]). | +| **x-wait-for-model** | _boolean, default to `false`_ | If the model is not ready, wait for it instead of receiving 503. It limits the number of requests required to get your inference done. It is advised to only set this flag to true after receiving a 503 error, as it will limit hanging in your application to known places. Read more about model availability [here](../overview#eligibility]). | + +For more information about Inference API headers, check out the parameters [guide](../parameters). + +#### Response + +| Body | | +| :--- | :--- | :--- | +| **(array)** | _object[]_ | Output is an array of objects. | +| **        answer** | _string_ | The answer to the question. | +| **        score** | _number_ | The probability associated to the answer. | +| **        start** | _integer_ | The character position in the input where the answer begins. | +| **        end** | _integer_ | The character position in the input where the answer ends. | + + +### Using the API + + + + + +```bash +curl https://api-inference.huggingface.co/models/deepset/roberta-base-squad2 \ + -X POST \ + -d '{"inputs": { "question": "What is my name?", "context": "My name is Clara and I live in Berkeley." }}' \ + -H 'Content-Type: application/json' \ + -H "Authorization: Bearer hf_***" + +``` + + + +```py +import requests + +API_URL = "https://api-inference.huggingface.co/models/deepset/roberta-base-squad2" +headers = {"Authorization": "Bearer hf_***"} + +def query(payload): + response = requests.post(API_URL, headers=headers, json=payload) + return response.json() + +output = query({ + "inputs": { + "question": "What is my name?", + "context": "My name is Clara and I live in Berkeley." +}, +}) +``` + +To use the Python client, see `huggingface_hub`'s [package reference](https://huggingface.co/docs/huggingface_hub/package_reference/inference_client#huggingface_hub.InferenceClient.question_answering). + + + +```js +async function query(data) { + const response = await fetch( + "https://api-inference.huggingface.co/models/deepset/roberta-base-squad2", + { + headers: { + Authorization: "Bearer hf_***" + "Content-Type": "application/json", + }, + method: "POST", + body: JSON.stringify(data), + } + ); + const result = await response.json(); + return result; +} + +query({"inputs": { + "question": "What is my name?", + "context": "My name is Clara and I live in Berkeley." +}}).then((response) => { + console.log(JSON.stringify(response)); +}); +``` + +To use the JavaScript client, see `huggingface.js`'s [package reference](https://huggingface.co/docs/huggingface.js/inference/classes/HfInference#questionanswering). 
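For the Python client referenced above, the same request can be sketched as follows (model and token as in the snippets above):

```py
from huggingface_hub import InferenceClient

client = InferenceClient(model="deepset/roberta-base-squad2", token="hf_***")

# The result carries the fields listed in the output table: answer, score, start, end.
answer = client.question_answering(
    question="What is my name?",
    context="My name is Clara and I live in Berkeley.",
)
print(answer)
```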
+ + + + + diff --git a/docs/api-inference/tasks/summarization.md b/docs/api-inference/tasks/summarization.md new file mode 100644 index 000000000..f0ed74b66 --- /dev/null +++ b/docs/api-inference/tasks/summarization.md @@ -0,0 +1,106 @@ +## Summarization + +Summarization is the task of producing a shorter version of a document while preserving its important information. Some models can extract text from the original input, while other models can generate entirely new text. + + + +For more details about the `summarization` task, check out its [dedicated page](https://huggingface.co/tasks/summarization)! You will find examples and related materials. + + + +### Recommended models + +- [facebook/bart-large-cnn](https://huggingface.co/facebook/bart-large-cnn): A strong summarization model trained on English news articles. Excels at generating factual summaries. +- [google/bigbird-pegasus-large-pubmed](https://huggingface.co/google/bigbird-pegasus-large-pubmed): A summarization model trained on medical articles. + +This is only a subset of the supported models. Find the model that suits you best [here](https://huggingface.co/models?inference=warm&pipeline_tag=summarization&sort=trending). + +### API specification + +#### Request + +| Payload | | | +| :--- | :--- | :--- | + + +Some options can be configured by passing headers to the Inference API. Here are the available headers: + +| Headers | | | +| :--- | :--- | :--- | +| **authorization** | _string_ | Authentication header in the form `'Bearer: hf_****'` when `hf_****` is a personal user access token with Inference API permission. You can generate one from [your settings page](https://huggingface.co/settings/tokens). | +| **x-use-cache** | _boolean, default to `true`_ | There is a cache layer on the inference API to speed up requests we have already seen. Most models can use those results as they are deterministic (meaning the outputs will be the same anyway). However, if you use a nondeterministic model, you can set this parameter to prevent the caching mechanism from being used, resulting in a real new query. Read more about caching [here](../parameters#caching]). | +| **x-wait-for-model** | _boolean, default to `false`_ | If the model is not ready, wait for it instead of receiving 503. It limits the number of requests required to get your inference done. It is advised to only set this flag to true after receiving a 503 error, as it will limit hanging in your application to known places. Read more about model availability [here](../overview#eligibility]). | + +For more information about Inference API headers, check out the parameters [guide](../parameters). + +#### Response + +| Body | | +| :--- | :--- | :--- | +| **summary_text** | _string_ | The summarized text. | + + +### Using the API + + + + + +```bash +curl https://api-inference.huggingface.co/models/facebook/bart-large-cnn \ + -X POST \ + -d '{"inputs": "The tower is 324 metres (1,063 ft) tall, about the same height as an 81-storey building, and the tallest structure in Paris. Its base is square, measuring 125 metres (410 ft) on each side. During its construction, the Eiffel Tower surpassed the Washington Monument to become the tallest man-made structure in the world, a title it held for 41 years until the Chrysler Building in New York City was finished in 1930. It was the first structure to reach a height of 300 metres. Due to the addition of a broadcasting aerial at the top of the tower in 1957, it is now taller than the Chrysler Building by 5.2 metres (17 ft). 
Excluding transmitters, the Eiffel Tower is the second tallest free-standing structure in France after the Millau Viaduct."}' \ + -H 'Content-Type: application/json' \ + -H "Authorization: Bearer hf_***" + +``` + + + +```py +import requests + +API_URL = "https://api-inference.huggingface.co/models/facebook/bart-large-cnn" +headers = {"Authorization": "Bearer hf_***"} + +def query(payload): + response = requests.post(API_URL, headers=headers, json=payload) + return response.json() + +output = query({ + "inputs": "The tower is 324 metres (1,063 ft) tall, about the same height as an 81-storey building, and the tallest structure in Paris. Its base is square, measuring 125 metres (410 ft) on each side. During its construction, the Eiffel Tower surpassed the Washington Monument to become the tallest man-made structure in the world, a title it held for 41 years until the Chrysler Building in New York City was finished in 1930. It was the first structure to reach a height of 300 metres. Due to the addition of a broadcasting aerial at the top of the tower in 1957, it is now taller than the Chrysler Building by 5.2 metres (17 ft). Excluding transmitters, the Eiffel Tower is the second tallest free-standing structure in France after the Millau Viaduct.", +}) +``` + +To use the Python client, see `huggingface_hub`'s [package reference](https://huggingface.co/docs/huggingface_hub/package_reference/inference_client#huggingface_hub.InferenceClient.summarization). + + + +```js +async function query(data) { + const response = await fetch( + "https://api-inference.huggingface.co/models/facebook/bart-large-cnn", + { + headers: { + Authorization: "Bearer hf_***" + "Content-Type": "application/json", + }, + method: "POST", + body: JSON.stringify(data), + } + ); + const result = await response.json(); + return result; +} + +query({"inputs": "The tower is 324 metres (1,063 ft) tall, about the same height as an 81-storey building, and the tallest structure in Paris. Its base is square, measuring 125 metres (410 ft) on each side. During its construction, the Eiffel Tower surpassed the Washington Monument to become the tallest man-made structure in the world, a title it held for 41 years until the Chrysler Building in New York City was finished in 1930. It was the first structure to reach a height of 300 metres. Due to the addition of a broadcasting aerial at the top of the tower in 1957, it is now taller than the Chrysler Building by 5.2 metres (17 ft). Excluding transmitters, the Eiffel Tower is the second tallest free-standing structure in France after the Millau Viaduct."}).then((response) => { + console.log(JSON.stringify(response)); +}); +``` + +To use the JavaScript client, see `huggingface.js`'s [package reference](https://huggingface.co/docs/huggingface.js/inference/classes/HfInference#summarization). + + + + + diff --git a/docs/api-inference/tasks/table_question_answering.md b/docs/api-inference/tasks/table_question_answering.md new file mode 100644 index 000000000..4cb92bc0a --- /dev/null +++ b/docs/api-inference/tasks/table_question_answering.md @@ -0,0 +1,138 @@ +## Table Question Answering + +Table Question Answering (Table QA) is answering a question about information on a given table. + + + +For more details about the `table-question-answering` task, check out its [dedicated page](https://huggingface.co/tasks/table-question-answering)! You will find examples and related materials. 
+ + + +### Recommended models + +- [microsoft/tapex-base](https://huggingface.co/microsoft/tapex-base): A table question answering model that is capable of neural SQL execution, i.e., employ TAPEX to execute a SQL query on a given table. +- [google/tapas-base-finetuned-wtq](https://huggingface.co/google/tapas-base-finetuned-wtq): A robust table question answering model. + +This is only a subset of the supported models. Find the model that suits you best [here](https://huggingface.co/models?inference=warm&pipeline_tag=table-question-answering&sort=trending). + +### API specification + +#### Request + +| Payload | | | +| :--- | :--- | :--- | +| **inputs*** | _object_ | One (table, question) pair to answer | +| **        table*** | _object_ | The table to serve as context for the questions | +| **        question*** | _string_ | The question to be answered about the table | +| **parameters** | _object_ | Additional inference parameters for Table Question Answering | + + +Some options can be configured by passing headers to the Inference API. Here are the available headers: + +| Headers | | | +| :--- | :--- | :--- | +| **authorization** | _string_ | Authentication header in the form `'Bearer: hf_****'` when `hf_****` is a personal user access token with Inference API permission. You can generate one from [your settings page](https://huggingface.co/settings/tokens). | +| **x-use-cache** | _boolean, default to `true`_ | There is a cache layer on the inference API to speed up requests we have already seen. Most models can use those results as they are deterministic (meaning the outputs will be the same anyway). However, if you use a nondeterministic model, you can set this parameter to prevent the caching mechanism from being used, resulting in a real new query. Read more about caching [here](../parameters#caching]). | +| **x-wait-for-model** | _boolean, default to `false`_ | If the model is not ready, wait for it instead of receiving 503. It limits the number of requests required to get your inference done. It is advised to only set this flag to true after receiving a 503 error, as it will limit hanging in your application to known places. Read more about model availability [here](../overview#eligibility]). | + +For more information about Inference API headers, check out the parameters [guide](../parameters). + +#### Response + +| Body | | +| :--- | :--- | :--- | +| **(array)** | _object[]_ | Output is an array of objects. | +| **        answer** | _string_ | The answer of the question given the table. If there is an aggregator, the answer will be preceded by `AGGREGATOR >`. | +| **        coordinates** | _array[]_ | Coordinates of the cells of the answers. | +| **        cells** | _string[]_ | List of strings made up of the answer cell values. | +| **        aggregator** | _string_ | If the model has an aggregator, this returns the aggregator. 
| + + +### Using the API + + + + + +```bash +curl https://api-inference.huggingface.co/models/microsoft/tapex-base \ + -X POST \ + -d '{"inputs": { "query": "How many stars does the transformers repository have?", "table": { "Repository": ["Transformers", "Datasets", "Tokenizers"], "Stars": ["36542", "4512", "3934"], "Contributors": ["651", "77", "34"], "Programming language": [ "Python", "Python", "Rust, Python and NodeJS" ] } }}' \ + -H 'Content-Type: application/json' \ + -H "Authorization: Bearer hf_***" + +``` + + + +```py +import requests + +API_URL = "https://api-inference.huggingface.co/models/microsoft/tapex-base" +headers = {"Authorization": "Bearer hf_***"} + +def query(payload): + response = requests.post(API_URL, headers=headers, json=payload) + return response.json() + +output = query({ + "inputs": { + "query": "How many stars does the transformers repository have?", + "table": { + "Repository": ["Transformers", "Datasets", "Tokenizers"], + "Stars": ["36542", "4512", "3934"], + "Contributors": ["651", "77", "34"], + "Programming language": [ + "Python", + "Python", + "Rust, Python and NodeJS" + ] + } +}, +}) +``` + +To use the Python client, see `huggingface_hub`'s [package reference](https://huggingface.co/docs/huggingface_hub/package_reference/inference_client#huggingface_hub.InferenceClient.table_question-answering). + + + +```js +async function query(data) { + const response = await fetch( + "https://api-inference.huggingface.co/models/microsoft/tapex-base", + { + headers: { + Authorization: "Bearer hf_***" + "Content-Type": "application/json", + }, + method: "POST", + body: JSON.stringify(data), + } + ); + const result = await response.json(); + return result; +} + +query({"inputs": { + "query": "How many stars does the transformers repository have?", + "table": { + "Repository": ["Transformers", "Datasets", "Tokenizers"], + "Stars": ["36542", "4512", "3934"], + "Contributors": ["651", "77", "34"], + "Programming language": [ + "Python", + "Python", + "Rust, Python and NodeJS" + ] + } +}}).then((response) => { + console.log(JSON.stringify(response)); +}); +``` + +To use the JavaScript client, see `huggingface.js`'s [package reference](https://huggingface.co/docs/huggingface.js/inference/classes/HfInference#tablequestion-answering). + + + + + diff --git a/docs/api-inference/tasks/text_classification.md b/docs/api-inference/tasks/text_classification.md new file mode 100644 index 000000000..8fffb6654 --- /dev/null +++ b/docs/api-inference/tasks/text_classification.md @@ -0,0 +1,112 @@ +## Text Classification + +Text Classification is the task of assigning a label or class to a given text. Some use cases are sentiment analysis, natural language inference, and assessing grammatical correctness. + + + +For more details about the `text-classification` task, check out its [dedicated page](https://huggingface.co/tasks/text-classification)! You will find examples and related materials. + + + +### Recommended models + +- [distilbert-base-uncased-finetuned-sst-2-english](https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english): A robust model trained for sentiment analysis. +- [roberta-large-mnli](https://huggingface.co/roberta-large-mnli): Multi-genre natural language inference model. + +This is only a subset of the supported models. Find the model that suits you best [here](https://huggingface.co/models?inference=warm&pipeline_tag=text-classification&sort=trending). 
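As a quick complement to the specification and snippets that follow, a minimal Python sketch with `huggingface_hub` could look like this (assuming the `InferenceClient.text_classification` helper is available in your installed version):

```py
from huggingface_hub import InferenceClient

# Illustrative sketch; see "Using the API" below for the raw HTTP snippets.
client = InferenceClient(token="hf_***")

results = client.text_classification(
    "I like you. I love you",
    model="distilbert-base-uncased-finetuned-sst-2-english",
)

# Each element exposes a predicted label and its probability.
for item in results:
    print(f"{item.label}: {item.score:.3f}")
```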
+ +### API specification + +#### Request + +| Payload | | | +| :--- | :--- | :--- | +| **inputs*** | _string_ | The text to classify | +| **parameters** | _object_ | Additional inference parameters for Text Classification | +| **        function_to_apply** | _enum_ | Possible values: sigmoid, softmax, none. | +| **        top_k** | _integer_ | When specified, limits the output to the top K most probable classes. | + + +Some options can be configured by passing headers to the Inference API. Here are the available headers: + +| Headers | | | +| :--- | :--- | :--- | +| **authorization** | _string_ | Authentication header in the form `'Bearer: hf_****'` when `hf_****` is a personal user access token with Inference API permission. You can generate one from [your settings page](https://huggingface.co/settings/tokens). | +| **x-use-cache** | _boolean, default to `true`_ | There is a cache layer on the inference API to speed up requests we have already seen. Most models can use those results as they are deterministic (meaning the outputs will be the same anyway). However, if you use a nondeterministic model, you can set this parameter to prevent the caching mechanism from being used, resulting in a real new query. Read more about caching [here](../parameters#caching]). | +| **x-wait-for-model** | _boolean, default to `false`_ | If the model is not ready, wait for it instead of receiving 503. It limits the number of requests required to get your inference done. It is advised to only set this flag to true after receiving a 503 error, as it will limit hanging in your application to known places. Read more about model availability [here](../overview#eligibility]). | + +For more information about Inference API headers, check out the parameters [guide](../parameters). + +#### Response + +| Body | | +| :--- | :--- | :--- | +| **(array)** | _undefined[]_ | Output is an array of undefineds. | +| **        label** | _string_ | The predicted class label. | +| **        score** | _number_ | The corresponding probability. | + + +### Using the API + + + + + +```bash +curl https://api-inference.huggingface.co/models/distilbert-base-uncased-finetuned-sst-2-english \ + -X POST \ + -d '{"inputs": "I like you. I love you"}' \ + -H 'Content-Type: application/json' \ + -H "Authorization: Bearer hf_***" + +``` + + + +```py +import requests + +API_URL = "https://api-inference.huggingface.co/models/distilbert-base-uncased-finetuned-sst-2-english" +headers = {"Authorization": "Bearer hf_***"} + +def query(payload): + response = requests.post(API_URL, headers=headers, json=payload) + return response.json() + +output = query({ + "inputs": "I like you. I love you", +}) +``` + +To use the Python client, see `huggingface_hub`'s [package reference](https://huggingface.co/docs/huggingface_hub/package_reference/inference_client#huggingface_hub.InferenceClient.text_classification). + + + +```js +async function query(data) { + const response = await fetch( + "https://api-inference.huggingface.co/models/distilbert-base-uncased-finetuned-sst-2-english", + { + headers: { + Authorization: "Bearer hf_***" + "Content-Type": "application/json", + }, + method: "POST", + body: JSON.stringify(data), + } + ); + const result = await response.json(); + return result; +} + +query({"inputs": "I like you. 
I love you"}).then((response) => { + console.log(JSON.stringify(response)); +}); +``` + +To use the JavaScript client, see `huggingface.js`'s [package reference](https://huggingface.co/docs/huggingface.js/inference/classes/HfInference#textclassification). + + + + + diff --git a/docs/api-inference/tasks/text_generation.md b/docs/api-inference/tasks/text_generation.md index f9881087a..7ca1442ca 100644 --- a/docs/api-inference/tasks/text_generation.md +++ b/docs/api-inference/tasks/text_generation.md @@ -1,4 +1,4 @@ -## Text generation +## Text Generation Generate text based on a prompt. @@ -35,10 +35,10 @@ This is only a subset of the supported models. Find the model that suits you bes | **        frequency_penalty** | _number_ | | | **        grammar** | _object_ | One of the following: | | **                 (#1)** | | | -| **                        type*** | _enum_ | Possible values: json | +| **                        type*** | _enum_ | Possible values: json. | | **                        value*** | _object_ | A string that represents a [JSON Schema](https://json-schema.org/). JSON Schema is a declarative language that allows to annotate JSON documents with types and descriptions. | | **                 (#2)** | | | -| **                        type*** | _enum_ | Possible values: regex | +| **                        type*** | _enum_ | Possible values: regex. | | **                        value*** | _string_ | | | **        max_new_tokens** | _integer_ | | | **        repetition_penalty** | _number_ | | @@ -74,7 +74,7 @@ If `stream` is `false` (default), the response will be a JSON object with the fo | :--- | :--- | :--- | | **details** | _object_ | | | **        best_of_sequences** | _object[]_ | | -| **                finish_reason** | _enum_ | Possible values: length, eos_token, stop_sequence | +| **                finish_reason** | _enum_ | Possible values: length, eos_token, stop_sequence. | | **                generated_text** | _string_ | | | **                generated_tokens** | _integer_ | | | **                prefill** | _object[]_ | | @@ -88,7 +88,11 @@ If `stream` is `false` (default), the response will be a JSON object with the fo | **                        special** | _boolean_ | | | **                        text** | _string_ | | | **                top_tokens** | _array[]_ | | -| **        finish_reason** | _enum_ | Possible values: length, eos_token, stop_sequence | +| **                        id** | _integer_ | | +| **                        logprob** | _number_ | | +| **                        special** | _boolean_ | | +| **                        text** | _string_ | | +| **        finish_reason** | _enum_ | Possible values: length, eos_token, stop_sequence. 
| | **        generated_tokens** | _integer_ | | | **        prefill** | _object[]_ | | | **                id** | _integer_ | | @@ -101,6 +105,10 @@ If `stream` is `false` (default), the response will be a JSON object with the fo | **                special** | _boolean_ | | | **                text** | _string_ | | | **        top_tokens** | _array[]_ | | +| **                id** | _integer_ | | +| **                logprob** | _number_ | | +| **                special** | _boolean_ | | +| **                text** | _string_ | | | **generated_text** | _string_ | | @@ -110,7 +118,7 @@ For more information about streaming, check out [this guide](https://huggingface | Body | | | :--- | :--- | :--- | | **details** | _object_ | | -| **        finish_reason** | _enum_ | Possible values: length, eos_token, stop_sequence | +| **        finish_reason** | _enum_ | Possible values: length, eos_token, stop_sequence. | | **        generated_tokens** | _integer_ | | | **        seed** | _integer_ | | | **generated_text** | _string_ | | diff --git a/docs/api-inference/tasks/text_to_image.md b/docs/api-inference/tasks/text_to_image.md index 1d642b357..6697ca877 100644 --- a/docs/api-inference/tasks/text_to_image.md +++ b/docs/api-inference/tasks/text_to_image.md @@ -1,4 +1,4 @@ -## Text-to-image +## Text to Image Generate an image based on a given text prompt. @@ -23,7 +23,7 @@ This is only a subset of the supported models. Find the model that suits you bes | Payload | | | | :--- | :--- | :--- | -| **inputs*** | _string_ | The input text data (sometimes called "prompt" | +| **inputs*** | _string_ | The input text data (sometimes called "prompt") | | **parameters** | _object_ | Additional inference parameters for Text To Image | | **        guidance_scale** | _number_ | For diffusion models. A higher guidance scale value encourages the model to generate images closely linked to the text prompt at the expense of lower image quality. | | **        negative_prompt** | _string[]_ | One or several prompt to guide what NOT to include in image generation. 
| diff --git a/scripts/api-inference/scripts/generate.ts b/scripts/api-inference/scripts/generate.ts index a9d220a07..6d2085ef1 100644 --- a/scripts/api-inference/scripts/generate.ts +++ b/scripts/api-inference/scripts/generate.ts @@ -4,6 +4,19 @@ import * as fs from "node:fs/promises"; import * as path from "node:path/posix"; import type { JsonObject } from "type-fest"; +const TASKS: PipelineType[] = [ + "fill-mask", + "image-to-image", + "question-answering", + "summarization", + "table-question-answering", + "text-classification", + "text-generation", + "text-to-image", +]; +const TASKS_EXTENDED = [...TASKS, "chat-completion"]; +const SPECS_REVISION = "update-specification-for-docs"; + const inferenceSnippetLanguages = ["python", "js", "curl"] as const; type InferenceSnippetLanguage = (typeof inferenceSnippetLanguages)[number]; @@ -96,7 +109,7 @@ export function getInferenceSnippet( const modelData = { id, pipeline_tag, - mask_token: "", + mask_token: "[MASK]", library_name: "", config: {}, }; @@ -112,8 +125,10 @@ export function getInferenceSnippet( type SpecNameType = "input" | "output" | "stream_output"; const SPECS_URL_TEMPLATE = Handlebars.compile( - `https://raw.githubusercontent.com/huggingface/huggingface.js/main/packages/tasks/src/tasks/{{task}}/spec/{{name}}.json`, + `https://raw.githubusercontent.com/huggingface/huggingface.js/${SPECS_REVISION}/packages/tasks/src/tasks/{{task}}/spec/{{name}}.json`, ); +const COMMON_DEFINITIONS_URL = + `https://raw.githubusercontent.com/huggingface/huggingface.js/${SPECS_REVISION}/packages/tasks/src/tasks/common-definitions.json`; async function fetchOneSpec( task: PipelineType, @@ -138,17 +153,22 @@ async function fetchSpecs( }; } -function processPayloadSchema( - schema: any, - definitions: any = {}, - prefix: string = "", -): JsonObject[] { +async function fetchCommonDefinitions(): Promise { + console.log(` πŸ•ΈοΈ Fetching common definitions`); + return fetch(COMMON_DEFINITIONS_URL).then((res) => res.json()); +} + +const COMMON_DEFINITIONS = await fetchCommonDefinitions(); + +function processPayloadSchema(schema: any): JsonObject[] { let rows: JsonObject[] = []; // Helper function to resolve schema references function resolveRef(ref: string) { - const refPath = ref.split("/").slice(1); // remove the initial # - let refSchema = schema; + const refPath = ref.split("#/")[1].split("/"); + let refSchema = ref.includes("common-definitions.json") + ? 
COMMON_DEFINITIONS + : schema; for (const part of refPath) { refSchema = refSchema[part]; } @@ -175,7 +195,7 @@ function processPayloadSchema( if (value.enum) { type = "enum"; - description = `Possible values: ${value.enum.join(", ")}`; + description = `Possible values: ${value.enum.join(", ")}.`; } const isObject = type === "object" && value.properties; @@ -184,7 +204,8 @@ function processPayloadSchema( const addRow = !(isCombinator && isCombinator.length === 1) && !description.includes("UNUSED") && - !key.includes("SKIP"); + !key.includes("SKIP") && + key.length > 0; if (isCombinator && isCombinator.length > 1) { description = "One of the following:"; @@ -226,14 +247,9 @@ function processPayloadSchema( ); }, ); - } else if (isArray && value.items.$ref) { + } else if (isArray) { // Process array items - processSchemaNode( - "SKIP", - resolveRef(value.items.$ref), - false, - parentPrefix, - ); + processSchemaNode("SKIP", value.items, false, parentPrefix); } else if (isCombinator) { // Process combinators like oneOf, allOf, anyOf const combinators = value.oneOf || value.allOf || value.anyOf; @@ -254,13 +270,26 @@ function processPayloadSchema( } } - // Start processing the root schema - Object.entries(schema.properties || {}).forEach( - ([key, value]: [string, any]) => { - const isRequired = schema.required?.includes(key); - processSchemaNode(key, value, isRequired, prefix); - }, - ); + // Start processing based on the root type of the schema + if (schema.type === "array") { + // If the root schema is an array, process its items + const row = { + name: "(array)", + type: `${schema.items.type}[]`, + description: + schema.items.description || + `Output is an array of ${schema.items.type}s.`, + required: true, + }; + rows.push(row); + processSchemaNode("", schema.items, false, ""); + } else { + // Otherwise, start with the root object + Object.entries(schema.properties || {}).forEach(([key, value]) => { + const required = schema.required?.includes(key); + processSchemaNode(key, value, required, ""); + }); + } return rows; } @@ -294,13 +323,6 @@ const SPECS_OUTPUT_TEMPLATE = Handlebars.compile( //// Data utils //// //////////////////// -const TASKS: PipelineType[] = [ - "image-to-image", - "text-generation", - "text-to-image", -]; -const TASKS_EXTENDED = [...TASKS, "chat-completion"]; - const DATA: { constants: { specsHeaders: string; diff --git a/scripts/api-inference/templates/task/chat_completion.handlebars b/scripts/api-inference/templates/task/chat_completion.handlebars index d0d3272df..f1274f5c5 100644 --- a/scripts/api-inference/templates/task/chat_completion.handlebars +++ b/scripts/api-inference/templates/task/chat_completion.handlebars @@ -1,4 +1,4 @@ -## Chat completion +## Chat Completion Generate a response given a list of messages. This is a subtask of [`text-generation`](./text_generation) designed to generate responses in a conversational context. diff --git a/scripts/api-inference/templates/task/fill_mask.handlebars b/scripts/api-inference/templates/task/fill_mask.handlebars new file mode 100644 index 000000000..663d2ab9f --- /dev/null +++ b/scripts/api-inference/templates/task/fill_mask.handlebars @@ -0,0 +1,29 @@ +## Fill-mask + +Mask filling is the task of predicting the right word (token to be precise) in the middle of a sequence. 
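For instance, filling in a masked token from Python might look like the following sketch (an illustration only, assuming the `InferenceClient.fill_mask` helper from `huggingface_hub` and a BERT-style model whose mask token is `[MASK]`):

```py
from huggingface_hub import InferenceClient

client = InferenceClient(token="hf_***")

# The mask token is model-specific: BERT-style models use "[MASK]",
# RoBERTa-style models use "<mask>".
predictions = client.fill_mask(
    "The goal of life is [MASK].",
    model="google-bert/bert-base-uncased",
)

for pred in predictions:
    print(f"{pred.token_str!r} (score={pred.score:.3f})")
```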
+ +{{{tips.linksToTaskPage.fill-mask}}} + +### Recommended models + +{{#each models.fill-mask}} +- [{{this.id}}](https://huggingface.co/{{this.id}}): {{this.description}} +{{/each}} + +{{{tips.listModelsLink.fill-mask}}} + +### API specification + +#### Request + +{{{specs.fill-mask.input}}} + +{{{constants.specsHeaders}}} + +#### Response + +{{{specs.fill-mask.output}}} + +### Using the API + +{{{snippets.fill-mask}}} diff --git a/scripts/api-inference/templates/task/image_to_image.handlebars b/scripts/api-inference/templates/task/image_to_image.handlebars index b432eab19..258dec814 100644 --- a/scripts/api-inference/templates/task/image_to_image.handlebars +++ b/scripts/api-inference/templates/task/image_to_image.handlebars @@ -1,4 +1,4 @@ -## Image-to-image +## Image to Image Image-to-image is the task of transforming a source image to match the characteristics of a target image or a target image domain. Any image manipulation and enhancement is possible with image to image models. diff --git a/scripts/api-inference/templates/task/question_answering.handlebars b/scripts/api-inference/templates/task/question_answering.handlebars new file mode 100644 index 000000000..101d00fcc --- /dev/null +++ b/scripts/api-inference/templates/task/question_answering.handlebars @@ -0,0 +1,29 @@ +## Question Answering + +Question Answering models can retrieve the answer to a question from a given text, which is useful for searching for an answer in a document. + +{{{tips.linksToTaskPage.question-answering}}} + +### Recommended models + +{{#each models.question-answering}} +- [{{this.id}}](https://huggingface.co/{{this.id}}): {{this.description}} +{{/each}} + +{{{tips.listModelsLink.question-answering}}} + +### API specification + +#### Request + +{{{specs.question-answering.input}}} + +{{{constants.specsHeaders}}} + +#### Response + +{{{specs.question-answering.output}}} + +### Using the API + +{{{snippets.question-answering}}} diff --git a/scripts/api-inference/templates/task/summarization.handlebars b/scripts/api-inference/templates/task/summarization.handlebars new file mode 100644 index 000000000..890487215 --- /dev/null +++ b/scripts/api-inference/templates/task/summarization.handlebars @@ -0,0 +1,29 @@ +## Summarization + +Summarization is the task of producing a shorter version of a document while preserving its important information. Some models can extract text from the original input, while other models can generate entirely new text. + +{{{tips.linksToTaskPage.summarization}}} + +### Recommended models + +{{#each models.summarization}} +- [{{this.id}}](https://huggingface.co/{{this.id}}): {{this.description}} +{{/each}} + +{{{tips.listModelsLink.summarization}}} + +### API specification + +#### Request + +{{{specs.summarization.input}}} + +{{{constants.specsHeaders}}} + +#### Response + +{{{specs.summarization.output}}} + +### Using the API + +{{{snippets.summarization}}} diff --git a/scripts/api-inference/templates/task/table_question_answering.handlebars b/scripts/api-inference/templates/task/table_question_answering.handlebars new file mode 100644 index 000000000..4ae8b53fc --- /dev/null +++ b/scripts/api-inference/templates/task/table_question_answering.handlebars @@ -0,0 +1,29 @@ +## Table Question Answering + +Table Question Answering (Table QA) is the answering a question about an information on a given table. 
+ +{{{tips.linksToTaskPage.table-question-answering}}} + +### Recommended models + +{{#each models.table-question-answering}} +- [{{this.id}}](https://huggingface.co/{{this.id}}): {{this.description}} +{{/each}} + +{{{tips.listModelsLink.table-question-answering}}} + +### API specification + +#### Request + +{{{specs.table-question-answering.input}}} + +{{{constants.specsHeaders}}} + +#### Response + +{{{specs.table-question-answering.output}}} + +### Using the API + +{{{snippets.table-question-answering}}} diff --git a/scripts/api-inference/templates/task/text_classification.handlebars b/scripts/api-inference/templates/task/text_classification.handlebars new file mode 100644 index 000000000..99c3cabe8 --- /dev/null +++ b/scripts/api-inference/templates/task/text_classification.handlebars @@ -0,0 +1,29 @@ +## Text Classification + +Text Classification is the task of assigning a label or class to a given text. Some use cases are sentiment analysis, natural language inference, and assessing grammatical correctness. + +{{{tips.linksToTaskPage.text-classification}}} + +### Recommended models + +{{#each models.text-classification}} +- [{{this.id}}](https://huggingface.co/{{this.id}}): {{this.description}} +{{/each}} + +{{{tips.listModelsLink.text-classification}}} + +### API specification + +#### Request + +{{{specs.text-classification.input}}} + +{{{constants.specsHeaders}}} + +#### Response + +{{{specs.text-classification.output}}} + +### Using the API + +{{{snippets.text-classification}}} diff --git a/scripts/api-inference/templates/task/text_generation.handlebars b/scripts/api-inference/templates/task/text_generation.handlebars index e7b27b919..09c7fcfab 100644 --- a/scripts/api-inference/templates/task/text_generation.handlebars +++ b/scripts/api-inference/templates/task/text_generation.handlebars @@ -1,4 +1,4 @@ -## Text generation +## Text Generation Generate text based on a prompt. diff --git a/scripts/api-inference/templates/task/text_to_image.handlebars b/scripts/api-inference/templates/task/text_to_image.handlebars index 6c9c568d1..6e6ffd0c6 100644 --- a/scripts/api-inference/templates/task/text_to_image.handlebars +++ b/scripts/api-inference/templates/task/text_to_image.handlebars @@ -1,4 +1,4 @@ -## Text-to-image +## Text to Image Generate an image based on a given text prompt. From 3eb85e8082a187a9c998d50fcaaff29df305b160 Mon Sep 17 00:00:00 2001 From: Wauplin Date: Wed, 28 Aug 2024 17:04:07 +0200 Subject: [PATCH 8/9] mention chat completion in text generation docs --- docs/api-inference/tasks/table_question_answering.md | 2 +- docs/api-inference/tasks/text_generation.md | 2 ++ scripts/api-inference/scripts/generate.ts | 3 +-- .../api-inference/templates/task/text_generation.handlebars | 2 ++ 4 files changed, 6 insertions(+), 3 deletions(-) diff --git a/docs/api-inference/tasks/table_question_answering.md b/docs/api-inference/tasks/table_question_answering.md index 4cb92bc0a..e3122e425 100644 --- a/docs/api-inference/tasks/table_question_answering.md +++ b/docs/api-inference/tasks/table_question_answering.md @@ -1,6 +1,6 @@ ## Table Question Answering -Table Question Answering (Table QA) is answering a question about information on a given table. +Table Question Answering (Table QA) is the answering a question about an information on a given table. 
diff --git a/docs/api-inference/tasks/text_generation.md b/docs/api-inference/tasks/text_generation.md index 7ca1442ca..fb3e41b3f 100644 --- a/docs/api-inference/tasks/text_generation.md +++ b/docs/api-inference/tasks/text_generation.md @@ -2,6 +2,8 @@ Generate text based on a prompt. +If you are interested in a Chat Completion task, which generates a response based on a list of messages, check out the [`chat-completion`](./chat_completion) task. + For more details about the `text-generation` task, check out its [dedicated page](https://huggingface.co/tasks/text-generation)! You will find examples and related materials. diff --git a/scripts/api-inference/scripts/generate.ts b/scripts/api-inference/scripts/generate.ts index 6d2085ef1..90a118aaa 100644 --- a/scripts/api-inference/scripts/generate.ts +++ b/scripts/api-inference/scripts/generate.ts @@ -127,8 +127,7 @@ type SpecNameType = "input" | "output" | "stream_output"; const SPECS_URL_TEMPLATE = Handlebars.compile( `https://raw.githubusercontent.com/huggingface/huggingface.js/${SPECS_REVISION}/packages/tasks/src/tasks/{{task}}/spec/{{name}}.json`, ); -const COMMON_DEFINITIONS_URL = - `https://raw.githubusercontent.com/huggingface/huggingface.js/${SPECS_REVISION}/packages/tasks/src/tasks/common-definitions.json`; +const COMMON_DEFINITIONS_URL = `https://raw.githubusercontent.com/huggingface/huggingface.js/${SPECS_REVISION}/packages/tasks/src/tasks/common-definitions.json`; async function fetchOneSpec( task: PipelineType, diff --git a/scripts/api-inference/templates/task/text_generation.handlebars b/scripts/api-inference/templates/task/text_generation.handlebars index 09c7fcfab..85bbba97a 100644 --- a/scripts/api-inference/templates/task/text_generation.handlebars +++ b/scripts/api-inference/templates/task/text_generation.handlebars @@ -2,6 +2,8 @@ Generate text based on a prompt. +If you are interested in a Chat Completion task, which generates a response based on a list of messages, check out the [`chat-completion`](./chat_completion) task. 
+ {{{tips.linksToTaskPage.text-generation}}} ### Recommended models From 486e809e12669c4c8c6152d7a0040927d0385425 Mon Sep 17 00:00:00 2001 From: Wauplin Date: Wed, 28 Aug 2024 17:07:31 +0200 Subject: [PATCH 9/9] fix chat completion snippets --- docs/api-inference/tasks/chat_completion.md | 67 ++++++++++----------- scripts/api-inference/scripts/generate.ts | 2 +- 2 files changed, 34 insertions(+), 35 deletions(-) diff --git a/docs/api-inference/tasks/chat_completion.md b/docs/api-inference/tasks/chat_completion.md index 1b6c0cb0b..c01fe9ac1 100644 --- a/docs/api-inference/tasks/chat_completion.md +++ b/docs/api-inference/tasks/chat_completion.md @@ -144,29 +144,35 @@ For more information about streaming, check out [this guide](https://huggingface ```bash -curl https://api-inference.huggingface.co/models/undefined \ - -X POST \ - -d '{"inputs": "Can you please let us know more details about your "}' \ - -H 'Content-Type: application/json' \ - -H "Authorization: Bearer hf_***" +curl 'https://api-inference.huggingface.co/models/google/gemma-2-2b-it/v1/chat/completions' \ +-H "Authorization: Bearer hf_***" \ +-H 'Content-Type: application/json' \ +-d '{ + "model": "google/gemma-2-2b-it", + "messages": [{"role": "user", "content": "What is the capital of France?"}], + "max_tokens": 500, + "stream": false +}' ``` ```py -import requests - -API_URL = "https://api-inference.huggingface.co/models/undefined" -headers = {"Authorization": "Bearer hf_***"} - -def query(payload): - response = requests.post(API_URL, headers=headers, json=payload) - return response.json() - -output = query({ - "inputs": "Can you please let us know more details about your ", -}) +from huggingface_hub import InferenceClient + +client = InferenceClient( + "google/gemma-2-2b-it", + token="hf_***", +) + +for message in client.chat_completion( + messages=[{"role": "user", "content": "What is the capital of France?"}], + max_tokens=500, + stream=True, +): + print(message.choices[0].delta.content, end="") + ``` To use the Python client, see `huggingface_hub`'s [package reference](https://huggingface.co/docs/huggingface_hub/package_reference/inference_client#huggingface_hub.InferenceClient.chat_completion). @@ -174,25 +180,18 @@ To use the Python client, see `huggingface_hub`'s [package reference](https://hu ```js -async function query(data) { - const response = await fetch( - "https://api-inference.huggingface.co/models/undefined", - { - headers: { - Authorization: "Bearer hf_***" - "Content-Type": "application/json", - }, - method: "POST", - body: JSON.stringify(data), - } - ); - const result = await response.json(); - return result; +import { HfInference } from "@huggingface/inference"; + +const inference = new HfInference("hf_***"); + +for await (const chunk of inference.chatCompletionStream({ + model: "google/gemma-2-2b-it", + messages: [{ role: "user", content: "What is the capital of France?" }], + max_tokens: 500, +})) { + process.stdout.write(chunk.choices[0]?.delta?.content || ""); } -query({"inputs": "Can you please let us know more details about your "}).then((response) => { - console.log(JSON.stringify(response)); -}); ``` To use the JavaScript client, see `huggingface.js`'s [package reference](https://huggingface.co/docs/huggingface.js/inference/classes/HfInference#chatcompletion). 
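Both snippets above stream tokens as they are generated. A non-streaming variant of the Python call, sketched below with the same model and placeholder token, passes `stream=False` and reads the aggregated message instead of iterating over deltas:

```py
from huggingface_hub import InferenceClient

client = InferenceClient(
    "google/gemma-2-2b-it",
    token="hf_***",
)

# With stream=False the client returns a single completion object
# instead of an iterator of chunks.
response = client.chat_completion(
    messages=[{"role": "user", "content": "What is the capital of France?"}],
    max_tokens=500,
    stream=False,
)

print(response.choices[0].message.content)
```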
diff --git a/scripts/api-inference/scripts/generate.ts b/scripts/api-inference/scripts/generate.ts index 90a118aaa..9ef681c07 100644 --- a/scripts/api-inference/scripts/generate.ts +++ b/scripts/api-inference/scripts/generate.ts @@ -435,7 +435,7 @@ function fetchChatCompletion() { ); // Snippet specific to chat completion - const mainModel = DATA.models["chat-completion"][0].id; + const mainModel = DATA.models["chat-completion"][0]; const mainModelData = { // @ts-ignore id: mainModel.id,