First draft for text-to-image, image-to-image + generate script (#1384)
* First draft for text-to-image

* add correct code snippets

* Update docs/api-inference/tasks/text-to-image.md

Co-authored-by: Omar Sanseviero <[email protected]>

* better table?

* Generate tasks pages from script (#1386)

* init project

* first script to generate task pages

* commit generated content

* generate payload table as well

* so undecisive

* hey

* better ?

* Add image-to-image page

* template for snippets section + few things

* few things

* Update scripts/api-inference/templates/specs_headers.handlebars

Co-authored-by: Omar Sanseviero <[email protected]>

* Update scripts/api-inference/templates/specs_headers.handlebars

Co-authored-by: Omar Sanseviero <[email protected]>

* generate

* fetch inference status

---------

Co-authored-by: Omar Sanseviero <[email protected]>
Wauplin and osanseviero committed Aug 27, 2024
1 parent 9bf223e commit 51750bf
Showing 17 changed files with 1,231 additions and 0 deletions.
4 changes: 4 additions & 0 deletions docs/api-inference/_toctree.yml
@@ -14,5 +14,9 @@
- sections:
    - local: tasks/fill_mask
      title: Fill Mask
    - local: tasks/image_to_image
      title: Image-to-image
    - local: tasks/text_to_image
      title: Text-to-image
  title: Detailed Task Parameters
title: API Reference
63 changes: 63 additions & 0 deletions docs/api-inference/tasks/image_to_image.md
@@ -0,0 +1,63 @@
## Image-to-image

Image-to-image is the task of transforming a source image to match the characteristics of a target image or a target image domain.
Image-to-image models can perform a wide range of image manipulation and enhancement tasks.

Use cases heavily depend on the model and the dataset it was trained on, but some common use cases include:
- Style transfer
- Image colorization
- Image super-resolution
- Image inpainting

<Tip>

For more details about the `image-to-image` task, check out its [dedicated page](https://huggingface.co/tasks/image-to-image)! You will find examples and related materials.

</Tip>

### Recommended models

- [fal/AuraSR-v2](https://huggingface.co/fal/AuraSR-v2): An image-to-image model to improve image resolution.
- [keras-io/super-resolution](https://huggingface.co/keras-io/super-resolution): A model that increases the resolution of an image.
- [lambdalabs/sd-image-variations-diffusers](https://huggingface.co/lambdalabs/sd-image-variations-diffusers): A model that creates a set of variations of the input image in the style of DALL-E using Stable Diffusion.
- [mfidabel/controlnet-segment-anything](https://huggingface.co/mfidabel/controlnet-segment-anything): A model that generates images based on segments in the input image and the text prompt.
- [timbrooks/instruct-pix2pix](https://huggingface.co/timbrooks/instruct-pix2pix): A model that takes an image and an instruction to edit the image.

This is only a subset of the supported models. Find the model that suits you best [here](https://huggingface.co/models?inference=warm&pipeline_tag=image-to-image&sort=trending).

### API specification

#### Request

| Payload | | |
| :--- | :--- | :--- |
| **inputs** | _object, required_ | The input image data |
| **parameters** | _object, optional_ | Additional inference parameters for Image To Image |
| **&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;guidance_scale** | _number, optional_ | For diffusion models. A higher guidance scale value encourages the model to generate images closely linked to the text prompt at the expense of lower image quality. |
| **&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;negative_prompt** | _array, optional_ | One or several prompts to guide what NOT to include in image generation. |
| **&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;num_inference_steps** | _integer, optional_ | For diffusion models. The number of denoising steps. More denoising steps usually lead to a higher quality image at the expense of slower inference. |
| **&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;target_size** | _object, optional_ | The size in pixels of the output image |
| **&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;width** | _integer, required_ | |
| **&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;height** | _integer, required_ | |


| Headers | | |
| :--- | :--- | :--- |
| **authorization** | _string, optional_ | Authentication header in the form `'Bearer hf_****'`, where `hf_****` is a personal user access token with Inference API permission. You can generate one from [your settings page](https://huggingface.co/settings/tokens). |
| **x-use-cache** | _boolean, optional, default to `true`_ | There is a cache layer on the inference API to speed up requests we have already seen. Most models can use those results as they are deterministic (meaning the outputs will be the same anyway). However, if you use a nondeterministic model, you can set this parameter to prevent the caching mechanism from being used, resulting in a genuinely new query. Read more about caching [here](../parameters#caching). |
| **x-wait-for-model** | _boolean, optional, default to `false`_ | If the model is not ready, wait for it instead of receiving a 503 error. This limits the number of requests required to get your inference done. It is advised to only set this flag to true after receiving a 503 error, as it limits hanging in your application to known places. Read more about model availability [here](../overview#eligibility). |
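
As an illustration, the headers above could be combined to disable caching and wait for a cold model to load. This is a sketch only; header values are sent as strings over HTTP:

```python
# Illustrative only: build the request headers described in the table above.
# "false"/"true" are strings because HTTP header values are text.
headers = {
    "Authorization": "Bearer hf_***",  # personal user access token (placeholder)
    "x-use-cache": "false",            # bypass the cache layer for nondeterministic models
    "x-wait-for-model": "true",        # wait for the model instead of receiving a 503
}
```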


#### Response

| Body | |
| :--- | :--- |
| **image** | The output image |


### Using the API


No snippet available for this task.
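
In the absence of an official snippet, a minimal Python sketch might look like the following. This assumes the endpoint accepts raw image bytes in the request body and returns the transformed image as the response body, as other binary tasks do; the model ID is an arbitrary example from the list above:

```python
import requests

# Hypothetical usage sketch -- the byte-level protocol is an assumption
API_URL = "https://api-inference.huggingface.co/models/timbrooks/instruct-pix2pix"
headers = {"Authorization": "Bearer hf_***"}

def query(image_path):
    # Send the raw image bytes; the response body holds the output image
    with open(image_path, "rb") as f:
        response = requests.post(API_URL, headers=headers, data=f.read())
    return response.content
```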


116 changes: 116 additions & 0 deletions docs/api-inference/tasks/text_to_image.md
@@ -0,0 +1,116 @@
## Text-to-image

Generate an image based on a given text prompt.

<Tip>

For more details about the `text-to-image` task, check out its [dedicated page](https://huggingface.co/tasks/text-to-image)! You will find examples and related materials.

</Tip>

### Recommended models

- [black-forest-labs/FLUX.1-dev](https://huggingface.co/black-forest-labs/FLUX.1-dev): One of the most powerful image generation models that can generate realistic outputs.
- [latent-consistency/lcm-lora-sdxl](https://huggingface.co/latent-consistency/lcm-lora-sdxl): A powerful yet fast image generation model.
- [Kwai-Kolors/Kolors](https://huggingface.co/Kwai-Kolors/Kolors): Text-to-image model for photorealistic generation.
- [stabilityai/stable-diffusion-3-medium-diffusers](https://huggingface.co/stabilityai/stable-diffusion-3-medium-diffusers): A powerful text-to-image model.

This is only a subset of the supported models. Find the model that suits you best [here](https://huggingface.co/models?inference=warm&pipeline_tag=text-to-image&sort=trending).

### API specification

#### Request

| Payload | | |
| :--- | :--- | :--- |
| **inputs** | _string, required_ | The input text data (sometimes called "prompt") |
| **parameters** | _object, optional_ | Additional inference parameters for Text To Image |
| **&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;guidance_scale** | _number, optional_ | For diffusion models. A higher guidance scale value encourages the model to generate images closely linked to the text prompt at the expense of lower image quality. |
| **&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;negative_prompt** | _array, optional_ | One or several prompts to guide what NOT to include in image generation. |
| **&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;num_inference_steps** | _integer, optional_ | For diffusion models. The number of denoising steps. More denoising steps usually lead to a higher quality image at the expense of slower inference. |
| **&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;target_size** | _object, optional_ | The size in pixels of the output image |
| **&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;width** | _integer, required_ | |
| **&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;height** | _integer, required_ | |
| **&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;scheduler** | _string, optional_ | For diffusion models. Override the scheduler with a compatible one |
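
As a sketch, a payload combining several of the optional parameters above might look like this (the parameter values are arbitrary examples, not recommendations):

```python
# Illustrative payload only -- values are arbitrary examples
payload = {
    "inputs": "Astronaut riding a horse",
    "parameters": {
        "guidance_scale": 7.5,                          # stronger adherence to the prompt
        "negative_prompt": ["blurry", "low quality"],   # what NOT to include
        "num_inference_steps": 30,                      # more steps, higher quality, slower
        "target_size": {"width": 1024, "height": 1024}, # output size in pixels
    },
}
```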


| Headers | | |
| :--- | :--- | :--- |
| **authorization** | _string, optional_ | Authentication header in the form `'Bearer hf_****'`, where `hf_****` is a personal user access token with Inference API permission. You can generate one from [your settings page](https://huggingface.co/settings/tokens). |
| **x-use-cache** | _boolean, optional, default to `true`_ | There is a cache layer on the inference API to speed up requests we have already seen. Most models can use those results as they are deterministic (meaning the outputs will be the same anyway). However, if you use a nondeterministic model, you can set this parameter to prevent the caching mechanism from being used, resulting in a genuinely new query. Read more about caching [here](../parameters#caching). |
| **x-wait-for-model** | _boolean, optional, default to `false`_ | If the model is not ready, wait for it instead of receiving a 503 error. This limits the number of requests required to get your inference done. It is advised to only set this flag to true after receiving a 503 error, as it limits hanging in your application to known places. Read more about model availability [here](../overview#eligibility). |


#### Response

| Body | |
| :--- | :--- |
| **image** | The generated image |


### Using the API


<inferencesnippet>

<curl>
```bash
curl https://api-inference.huggingface.co/models/black-forest-labs/FLUX.1-dev \
-X POST \
-d '{"inputs": "Astronaut riding a horse"}' \
-H 'Content-Type: application/json' \
-H "Authorization: Bearer hf_***"

```
</curl>

<python>
```py
import requests

API_URL = "https://api-inference.huggingface.co/models/black-forest-labs/FLUX.1-dev"
headers = {"Authorization": "Bearer hf_***"}

def query(payload):
    response = requests.post(API_URL, headers=headers, json=payload)
    return response.content

image_bytes = query({
    "inputs": "Astronaut riding a horse",
})

# You can access the image with PIL.Image, for example
import io
from PIL import Image
image = Image.open(io.BytesIO(image_bytes))
```

To use the Python client, see `huggingface_hub`'s [package reference](https://huggingface.co/docs/huggingface_hub/package_reference/inference_client#huggingface_hub.InferenceClient.text_to_image).
</python>

<js>
```js
async function query(data) {
const response = await fetch(
"https://api-inference.huggingface.co/models/black-forest-labs/FLUX.1-dev",
{
headers: {
			Authorization: "Bearer hf_***",
"Content-Type": "application/json",
},
method: "POST",
body: JSON.stringify(data),
}
);
const result = await response.blob();
return result;
}
query({"inputs": "Astronaut riding a horse"}).then((response) => {
// Use image
});
```

To use the JavaScript client, see `huggingface.js`'s [package reference](https://huggingface.co/docs/huggingface.js/inference/classes/HfInference#textToImage).
</js>

</inferencesnippet>


1 change: 1 addition & 0 deletions scripts/api-inference/.gitignore
@@ -0,0 +1 @@
dist
5 changes: 5 additions & 0 deletions scripts/api-inference/.prettierignore
@@ -0,0 +1,5 @@
pnpm-lock.yaml
# In order to avoid code samples to have tabs, they don't display well on npm
README.md
dist
*.handlebars
11 changes: 11 additions & 0 deletions scripts/api-inference/README.md
@@ -0,0 +1,11 @@
Install dependencies.

```sh
pnpm install
```

Generate documentation.

```sh
pnpm run generate
```
26 changes: 26 additions & 0 deletions scripts/api-inference/package.json
@@ -0,0 +1,26 @@
{
"name": "api-inference-generator",
"version": "1.0.0",
"description": "",
"main": "index.js",
"type": "module",
"scripts": {
"format": "prettier --write .",
"format:check": "prettier --check .",
"generate": "tsx scripts/generate.ts"
},
"keywords": [],
"author": "",
"license": "ISC",
"dependencies": {
"@huggingface/tasks": "^0.11.11",
"@types/node": "^22.5.0",
"handlebars": "^4.7.8",
"node": "^20.17.0",
"prettier": "^3.3.3",
"ts-node": "^10.9.2",
"tsx": "^4.17.0",
"type-fest": "^4.25.0",
"typescript": "^5.5.4"
}
}
