
First draft for text-to-image, image-to-image + generate script #1384

Merged · 11 commits · Aug 27, 2024
4 changes: 4 additions & 0 deletions docs/api-inference/_toctree.yml
@@ -14,4 +14,8 @@
- sections:
- local: tasks/fill_mask
title: Fill Mask
- local: tasks/image_to_image
title: Image-to-image
- local: tasks/text_to_image
title: Text-to-image
title: Parameters
63 changes: 63 additions & 0 deletions docs/api-inference/tasks/image_to_image.md
@@ -0,0 +1,63 @@
## Image-to-image

Image-to-image is the task of transforming a source image to match the characteristics of a target image or a target image domain.
Any image manipulation or enhancement is possible with image-to-image models.

Use cases heavily depend on the model and the dataset it was trained on, but some common use cases include:
- Style transfer
- Image colorization
- Image super-resolution
- Image inpainting

<Tip>

For more details about the `image-to-image` task, check out its [dedicated page](https://huggingface.co/tasks/image-to-image)! You will find examples and related materials.

</Tip>

### Recommended models
Contributor:

As discussed before, we'll need to filter out models that are not warm. The one I would have here is https://huggingface.co/stabilityai/stable-diffusion-xl-refiner-1.0

Contributor (author):

I thought twice about it and I don't know how we can do that. The `inference=warm` status can change over time, but since the docs are static once generated, they might not be accurate when a user visits the page.

Contributor (author):

I pushed d60825f to fetch the inference status of each model. For now I haven't changed the templates for the "Recommended models" part. For the text-to-image task that would be fine, but for the image-to-image task there are no warm models among the suggested ones.
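The fetched status could be used along these lines. This is a minimal sketch: the Hub API endpoint shape, the `expand[]=inference` query parameter, and the `inference` field returning values such as `warm` are assumptions based on the discussion above, not a documented contract.

```python
import json
from urllib.request import Request, urlopen

HUB_API = "https://huggingface.co/api/models"

def status_url(model_id: str) -> str:
    # Assumed endpoint: the Hub's model-info API with an expand parameter
    # requesting the inference status field.
    return f"{HUB_API}/{model_id}?expand[]=inference"

def is_warm(model_id: str) -> bool:
    # Assumed response shape: a JSON object with an "inference" key,
    # e.g. {"inference": "warm"} for models that are currently deployed.
    with urlopen(Request(status_url(model_id))) as resp:
        info = json.load(resp)
    return info.get("inference") == "warm"
```

A generation script could call `is_warm` at build time to filter the recommended-models list, accepting that the docs may still go stale between regenerations.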


- [fal/AuraSR-v2](https://huggingface.co/fal/AuraSR-v2): An image-to-image model to improve image resolution.
- [keras-io/super-resolution](https://huggingface.co/keras-io/super-resolution): A model that increases the resolution of an image.
- [lambdalabs/sd-image-variations-diffusers](https://huggingface.co/lambdalabs/sd-image-variations-diffusers): A model that creates a set of variations of the input image in the style of DALL-E using Stable Diffusion.
- [mfidabel/controlnet-segment-anything](https://huggingface.co/mfidabel/controlnet-segment-anything): A model that generates images based on segments in the input image and the text prompt.
- [timbrooks/instruct-pix2pix](https://huggingface.co/timbrooks/instruct-pix2pix): A model that takes an image and an instruction to edit the image.

This is only a subset of the supported models. Find the model that suits you best [here](https://huggingface.co/models?inference=warm&pipeline_tag=image-to-image&sort=trending).

### API specification

#### Request

| Payload | | |
| :--- | :--- | :--- |
| **inputs** | _object, required_ | The input image data |
| **parameters** | _object, optional_ | Additional inference parameters for Image To Image |
| **&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;guidance_scale** | _number, optional_ | For diffusion models. A higher guidance scale value encourages the model to generate images closely linked to the text prompt at the expense of lower image quality. |
| **&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;negative_prompt** | _array, optional_ | One or several prompts to guide what NOT to include in the generated image. |
| **&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;num_inference_steps** | _integer, optional_ | For diffusion models. The number of denoising steps. More denoising steps usually lead to a higher quality image at the expense of slower inference. |
| **&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;target_size** | _object, optional_ | The size in pixels of the output image |
| **&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;width** | _integer, required_ | |
| **&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;height** | _integer, required_ | |


| Headers | | |
| :--- | :--- | :--- |
| **authorization** | _string, optional_ | Authentication header in the form `'Bearer hf_****'` where `hf_****` is a personal user access token with Inference API permission. You can generate one from [your settings page](https://huggingface.co/settings/tokens). |
| **x-use-cache** | _boolean, optional, default to `true`_ | There is a cache layer on the Inference API to speed up requests we have already seen. Most models can use those results as they are deterministic (meaning the outputs will be the same anyway). However, if you use a nondeterministic model, you can set this parameter to prevent the caching mechanism from being used, resulting in a genuinely new query. Read more about caching [here](../parameters#additional-parameters-different-page). |
| **x-wait-for-model** | _boolean, optional, default to `false`_ | If the model is not ready, wait for it instead of receiving a 503 error. This limits the number of requests required to get your inference done. It is advised to only set this flag to `true` after receiving a 503 error, as it will limit hanging in your application to known places. Read more about model availability [here](../parameters#additional-parameters-different-page). |


#### Response

| Body | |
| :--- | :--- |
| **image** | The output image |


### Using the API


No snippet available for this task.
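Until an official snippet is generated, a request can be sketched roughly as follows. Sending the raw image bytes as the request body is an assumption here, and `timbrooks/instruct-pix2pix` is simply one of the models listed above; only the standard library is used.

```python
from urllib.request import Request, urlopen

API_URL = "https://api-inference.huggingface.co/models/timbrooks/instruct-pix2pix"
HEADERS = {"Authorization": "Bearer hf_***"}

def query(image_path: str) -> bytes:
    # Read the source image and send its raw bytes as the request body
    # (assumed input format for this task).
    with open(image_path, "rb") as f:
        data = f.read()
    req = Request(API_URL, data=data, headers=HEADERS, method="POST")
    with urlopen(req) as resp:
        return resp.read()  # the transformed image, as bytes
```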


116 changes: 116 additions & 0 deletions docs/api-inference/tasks/text_to_image.md
@@ -0,0 +1,116 @@
## Text-to-image

Generate an image based on a given text prompt.

<Tip>

For more details about the `text-to-image` task, check out its [dedicated page](https://huggingface.co/tasks/text-to-image)! You will find examples and related materials.

</Tip>

### Recommended models

- [black-forest-labs/FLUX.1-dev](https://huggingface.co/black-forest-labs/FLUX.1-dev): One of the most powerful image generation models that can generate realistic outputs.
- [latent-consistency/lcm-lora-sdxl](https://huggingface.co/latent-consistency/lcm-lora-sdxl): A powerful yet fast image generation model.
- [Kwai-Kolors/Kolors](https://huggingface.co/Kwai-Kolors/Kolors): Text-to-image model for photorealistic generation.
- [stabilityai/stable-diffusion-3-medium-diffusers](https://huggingface.co/stabilityai/stable-diffusion-3-medium-diffusers): A powerful text-to-image model.

This is only a subset of the supported models. Find the model that suits you best [here](https://huggingface.co/models?inference=warm&pipeline_tag=text-to-image&sort=trending).

### API specification

#### Request

| Payload | | |
| :--- | :--- | :--- |
| **inputs** | _string, required_ | The input text data (sometimes called "prompt") |
| **parameters** | _object, optional_ | Additional inference parameters for Text To Image |
| **&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;guidance_scale** | _number, optional_ | For diffusion models. A higher guidance scale value encourages the model to generate images closely linked to the text prompt at the expense of lower image quality. |
Contributor:

Is the "diffusion models" clarification important? Only diffusers supports this task

Contributor (author):

🤷‍♂️

Not really, no. I don't know where the original description comes from, but we can update the specs in huggingface.js, yes.

| **&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;negative_prompt** | _array, optional_ | One or several prompts to guide what NOT to include in the generated image. |
| **&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;num_inference_steps** | _integer, optional_ | For diffusion models. The number of denoising steps. More denoising steps usually lead to a higher quality image at the expense of slower inference. |
| **&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;target_size** | _object, optional_ | The size in pixels of the output image |
| **&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;width** | _integer, required_ | |
| **&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;height** | _integer, required_ | |
| **&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;scheduler** | _string, optional_ | For diffusion models. Override the scheduler with a compatible one |
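For illustration, a payload exercising these optional parameters might be built like this; all values are arbitrary examples, not recommendations.

```python
import json

# Example text-to-image payload combining the parameters documented above.
payload = {
    "inputs": "Astronaut riding a horse",
    "parameters": {
        "guidance_scale": 7.5,                          # prompt adherence vs. image quality
        "negative_prompt": ["blurry", "low quality"],   # things to steer away from
        "num_inference_steps": 30,                      # more steps: slower, usually sharper
        "target_size": {"width": 1024, "height": 1024},
    },
}
body = json.dumps(payload)  # serialized JSON request body
```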


| Headers | | |
| :--- | :--- | :--- |
| **authorization** | _string, optional_ | Authentication header in the form `'Bearer hf_****'` where `hf_****` is a personal user access token with Inference API permission. You can generate one from [your settings page](https://huggingface.co/settings/tokens). |
| **x-use-cache** | _boolean, optional, default to `true`_ | There is a cache layer on the Inference API to speed up requests we have already seen. Most models can use those results as they are deterministic (meaning the outputs will be the same anyway). However, if you use a nondeterministic model, you can set this parameter to prevent the caching mechanism from being used, resulting in a genuinely new query. Read more about caching [here](../parameters#additional-parameters-different-page). |
| **x-wait-for-model** | _boolean, optional, default to `false`_ | If the model is not ready, wait for it instead of receiving a 503 error. This limits the number of requests required to get your inference done. It is advised to only set this flag to `true` after receiving a 503 error, as it will limit hanging in your application to known places. Read more about model availability [here](../parameters#additional-parameters-different-page). |


#### Response

| Body | |
| :--- | :--- |
| **image** | The generated image |


### Using the API


<inferencesnippet>

<curl>
```bash
curl https://api-inference.huggingface.co/models/black-forest-labs/FLUX.1-dev \
-X POST \
-d '{"inputs": "Astronaut riding a horse"}' \
-H 'Content-Type: application/json' \
-H "Authorization: Bearer hf_***"

```
</curl>

<python>
```py
import requests

API_URL = "https://api-inference.huggingface.co/models/black-forest-labs/FLUX.1-dev"
headers = {"Authorization": "Bearer hf_***"}

def query(payload):
    response = requests.post(API_URL, headers=headers, json=payload)
    return response.content

image_bytes = query({
    "inputs": "Astronaut riding a horse",
})
# You can access the image with PIL.Image for example
import io
from PIL import Image
image = Image.open(io.BytesIO(image_bytes))
```

To use the Python client, see `huggingface_hub`'s [package reference](https://huggingface.co/docs/huggingface_hub/package_reference/inference_client#huggingface_hub.InferenceClient.text_to_image).
</python>

<js>
```js
async function query(data) {
    const response = await fetch(
        "https://api-inference.huggingface.co/models/black-forest-labs/FLUX.1-dev",
        {
            headers: {
                Authorization: "Bearer hf_***",
                "Content-Type": "application/json",
            },
            method: "POST",
            body: JSON.stringify(data),
        }
    );
    const result = await response.blob();
    return result;
}

query({ inputs: "Astronaut riding a horse" }).then((response) => {
// Use image
});
```

To use the JavaScript client, see `huggingface.js`'s [package reference](https://huggingface.co/docs/huggingface.js/inference/classes/HfInference#textto-image).
</js>

</inferencesnippet>


1 change: 1 addition & 0 deletions scripts/api-inference/.gitignore
@@ -0,0 +1 @@
dist
5 changes: 5 additions & 0 deletions scripts/api-inference/.prettierignore
@@ -0,0 +1,5 @@
pnpm-lock.yaml
# Avoid tabs in code samples; they don't display well on npm
README.md
dist
*.handlebars
11 changes: 11 additions & 0 deletions scripts/api-inference/README.md
@@ -0,0 +1,11 @@
Install dependencies.

```sh
pnpm install
```

Generate documentation.

```sh
pnpm run generate
```
26 changes: 26 additions & 0 deletions scripts/api-inference/package.json
@@ -0,0 +1,26 @@
{
"name": "api-inference-generator",
"version": "1.0.0",
"description": "",
"main": "index.js",
"type": "module",
"scripts": {
"format": "prettier --write .",
"format:check": "prettier --check .",
"generate": "tsx scripts/generate.ts"
},
"keywords": [],
"author": "",
"license": "ISC",
"dependencies": {
"@huggingface/tasks": "^0.11.11",
"@types/node": "^22.5.0",
"handlebars": "^4.7.8",
"node": "^20.17.0",
"prettier": "^3.3.3",
"ts-node": "^10.9.2",
"tsx": "^4.17.0",
"type-fest": "^4.25.0",
"typescript": "^5.5.4"
}
}