# First draft for text-to-image, image-to-image + generate script #1384
@@ -0,0 +1,63 @@
## Image-to-image

Image-to-image is the task of transforming a source image to match the characteristics of a target image or a target image domain.
Any image manipulation and enhancement is possible with image-to-image models.

Use cases heavily depend on the model and the dataset it was trained on, but some common use cases include:

- Style transfer
- Image colorization
- Image super-resolution
- Image inpainting

<Tip>

For more details about the `image-to-image` task, check out its [dedicated page](https://huggingface.co/tasks/image-to-image)! You will find examples and related materials.

</Tip>
### Recommended models

- [fal/AuraSR-v2](https://huggingface.co/fal/AuraSR-v2): An image-to-image model to improve image resolution.
- [keras-io/super-resolution](https://huggingface.co/keras-io/super-resolution): A model that increases the resolution of an image.
- [lambdalabs/sd-image-variations-diffusers](https://huggingface.co/lambdalabs/sd-image-variations-diffusers): A model that creates a set of variations of the input image in the style of DALL-E using Stable Diffusion.
- [mfidabel/controlnet-segment-anything](https://huggingface.co/mfidabel/controlnet-segment-anything): A model that generates images based on segments in the input image and the text prompt.
- [timbrooks/instruct-pix2pix](https://huggingface.co/timbrooks/instruct-pix2pix): A model that takes an image and an instruction to edit the image.

This is only a subset of the supported models. Find the model that suits you best [here](https://huggingface.co/models?inference=warm&pipeline_tag=image-to-image&sort=trending).
### API specification

#### Request

| Payload |  |  |
| :--- | :--- | :--- |
| **inputs** | _object, required_ | The input image data |
| **parameters** | _object, optional_ | Additional inference parameters for Image To Image |
| **&nbsp;&nbsp;guidance_scale** | _number, optional_ | For diffusion models. A higher guidance scale value encourages the model to generate images closely linked to the text prompt, at the expense of lower image quality. |
| **&nbsp;&nbsp;negative_prompt** | _array, optional_ | One or several prompts to guide what NOT to include in image generation. |
| **&nbsp;&nbsp;num_inference_steps** | _integer, optional_ | For diffusion models. The number of denoising steps. More denoising steps usually lead to a higher-quality image at the expense of slower inference. |
| **&nbsp;&nbsp;target_size** | _object, optional_ | The size in pixels of the output image |
| **&nbsp;&nbsp;&nbsp;&nbsp;width** | _integer, required_ |  |
| **&nbsp;&nbsp;&nbsp;&nbsp;height** | _integer, required_ |  |
| Headers |  |  |
| :--- | :--- | :--- |
| **authorization** | _string, optional_ | Authentication header in the form `Bearer hf_****`, where `hf_****` is a personal user access token with Inference API permission. You can generate one from [your settings page](https://huggingface.co/settings/tokens). |
| **x-use-cache** | _boolean, optional, defaults to `true`_ | There is a cache layer on the Inference API to speed up requests we have already seen. Most models can use those results as they are deterministic (meaning the outputs will be the same anyway). However, if you use a nondeterministic model, you can set this parameter to prevent the caching mechanism from being used, resulting in a real new query. Read more about caching [here](../parameters#caching). |
| **x-wait-for-model** | _boolean, optional, defaults to `false`_ | If the model is not ready, wait for it instead of receiving a 503 error. This limits the number of requests required to get your inference done. It is advised to only set this flag to `true` after receiving a 503 error, as it will limit hanging in your application to known places. Read more about model availability [here](../overview#eligibility). |
#### Response

| Body |  |
| :--- | :--- |
| **image** | The output image |
### Using the API

No snippet available for this task.
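While no generated snippet is available for this task yet, a minimal sketch of calling it through `huggingface_hub`'s `InferenceClient` could look like the following. The model choice, file names, and parameter values are examples only, and the exact keyword arguments should be checked against the client's [package reference](https://huggingface.co/docs/huggingface_hub/package_reference/inference_client):

```py
# Rough sketch (not an official snippet): image-to-image via huggingface_hub.
# Model, file names and parameter values below are illustrative only.
from huggingface_hub import InferenceClient

client = InferenceClient(token="hf_***")

# The source image can typically be a local path, raw bytes, or a file-like object.
output = client.image_to_image(
    "cat.png",                           # source image (example file)
    prompt="Turn the cat into a tiger",  # edit instruction / target description
    model="timbrooks/instruct-pix2pix",  # one of the recommended models above
    guidance_scale=7.5,                  # illustrative value
    num_inference_steps=30,              # illustrative value
)

# The client returns a PIL image, so it can be saved directly.
output.save("tiger.png")
```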
@@ -0,0 +1,116 @@
## Text-to-image

Generate an image based on a given text prompt.

<Tip>

For more details about the `text-to-image` task, check out its [dedicated page](https://huggingface.co/tasks/text-to-image)! You will find examples and related materials.

</Tip>

### Recommended models

- [black-forest-labs/FLUX.1-dev](https://huggingface.co/black-forest-labs/FLUX.1-dev): One of the most powerful image generation models that can generate realistic outputs.
- [latent-consistency/lcm-lora-sdxl](https://huggingface.co/latent-consistency/lcm-lora-sdxl): A powerful yet fast image generation model.
- [Kwai-Kolors/Kolors](https://huggingface.co/Kwai-Kolors/Kolors): Text-to-image model for photorealistic generation.
- [stabilityai/stable-diffusion-3-medium-diffusers](https://huggingface.co/stabilityai/stable-diffusion-3-medium-diffusers): A powerful text-to-image model.

This is only a subset of the supported models. Find the model that suits you best [here](https://huggingface.co/models?inference=warm&pipeline_tag=text-to-image&sort=trending).
### API specification

#### Request

| Payload |  |  |
| :--- | :--- | :--- |
| **inputs** | _string, required_ | The input text data (sometimes called "prompt") |
| **parameters** | _object, optional_ | Additional inference parameters for Text To Image |
| **&nbsp;&nbsp;guidance_scale** | _number, optional_ | For diffusion models. A higher guidance scale value encourages the model to generate images closely linked to the text prompt, at the expense of lower image quality. |
| **&nbsp;&nbsp;negative_prompt** | _array, optional_ | One or several prompts to guide what NOT to include in image generation. |
| **&nbsp;&nbsp;num_inference_steps** | _integer, optional_ | For diffusion models. The number of denoising steps. More denoising steps usually lead to a higher-quality image at the expense of slower inference. |
| **&nbsp;&nbsp;target_size** | _object, optional_ | The size in pixels of the output image |
| **&nbsp;&nbsp;&nbsp;&nbsp;width** | _integer, required_ |  |
| **&nbsp;&nbsp;&nbsp;&nbsp;height** | _integer, required_ |  |
| **&nbsp;&nbsp;scheduler** | _string, optional_ | For diffusion models. Override the scheduler with a compatible one. |

> Is the "diffusion models" clarification important?

> 🤷‍♂️ Not really, no. I don't know where the original description comes from, but we can update the specs in
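
To make the payload shape concrete, the optional fields above are nested under `parameters` in the request body, as in the minimal sketch below. All values are illustrative, and not every model supports every parameter; the dictionary is sent as the JSON body of the request, as in the snippets under "Using the API" below.

```py
# Illustrative request body for the text-to-image task; values are made up.
payload = {
    "inputs": "Astronaut riding a horse",
    "parameters": {
        "guidance_scale": 7.5,
        "negative_prompt": ["blurry", "low quality"],
        "num_inference_steps": 30,
        "target_size": {"width": 1024, "height": 768},
    },
}
```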
| Headers |  |  |
| :--- | :--- | :--- |
| **authorization** | _string, optional_ | Authentication header in the form `Bearer hf_****`, where `hf_****` is a personal user access token with Inference API permission. You can generate one from [your settings page](https://huggingface.co/settings/tokens). |
| **x-use-cache** | _boolean, optional, defaults to `true`_ | There is a cache layer on the Inference API to speed up requests we have already seen. Most models can use those results as they are deterministic (meaning the outputs will be the same anyway). However, if you use a nondeterministic model, you can set this parameter to prevent the caching mechanism from being used, resulting in a real new query. Read more about caching [here](../parameters#caching). |
| **x-wait-for-model** | _boolean, optional, defaults to `false`_ | If the model is not ready, wait for it instead of receiving a 503 error. This limits the number of requests required to get your inference done. It is advised to only set this flag to `true` after receiving a 503 error, as it will limit hanging in your application to known places. Read more about model availability [here](../overview#eligibility). |
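For instance, to bypass the cache and wait for a cold model to load instead of receiving a 503 error, these headers can be sent alongside the authorization header. A minimal sketch (the model is only an example, and header values are passed as plain strings):

```py
# Sketch: passing the optional inference headers with the requests library.
import requests

API_URL = "https://api-inference.huggingface.co/models/black-forest-labs/FLUX.1-dev"
headers = {
    "Authorization": "Bearer hf_***",
    "x-use-cache": "false",      # skip the cache layer and force a fresh generation
    "x-wait-for-model": "true",  # wait for the model to load instead of getting a 503
}

response = requests.post(API_URL, headers=headers, json={"inputs": "Astronaut riding a horse"})
image_bytes = response.content  # raw image bytes, as in the snippets below
```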
#### Response

| Body |  |
| :--- | :--- |
| **image** | The generated image |
### Using the API

<inferencesnippet>

<curl>
```bash
curl https://api-inference.huggingface.co/models/black-forest-labs/FLUX.1-dev \
	-X POST \
	-d '{"inputs": "Astronaut riding a horse"}' \
	-H 'Content-Type: application/json' \
	-H "Authorization: Bearer hf_***"
```
</curl>
<python>
```py
import requests

API_URL = "https://api-inference.huggingface.co/models/black-forest-labs/FLUX.1-dev"
headers = {"Authorization": "Bearer hf_***"}

def query(payload):
	response = requests.post(API_URL, headers=headers, json=payload)
	return response.content

image_bytes = query({
	"inputs": "Astronaut riding a horse",
})

# You can access the image with PIL.Image for example
import io
from PIL import Image
image = Image.open(io.BytesIO(image_bytes))
```

To use the Python client, see `huggingface_hub`'s [package reference](https://huggingface.co/docs/huggingface_hub/package_reference/inference_client#huggingface_hub.InferenceClient.text_to_image).
</python>
<js>
```js
async function query(data) {
	const response = await fetch(
		"https://api-inference.huggingface.co/models/black-forest-labs/FLUX.1-dev",
		{
			headers: {
				Authorization: "Bearer hf_***",
				"Content-Type": "application/json",
			},
			method: "POST",
			body: JSON.stringify(data),
		}
	);
	const result = await response.blob();
	return result;
}
query({"inputs": "Astronaut riding a horse"}).then((response) => {
	// Use image
});
```

To use the JavaScript client, see `huggingface.js`'s [package reference](https://huggingface.co/docs/huggingface.js/inference/classes/HfInference#texttoimage).
</js>

</inferencesnippet>
@@ -0,0 +1 @@
dist |
@@ -0,0 +1,5 @@
pnpm-lock.yaml
# To avoid code samples having tabs; they don't display well on npm
README.md
dist
*.handlebars
@@ -0,0 +1,11 @@
Install dependencies.

```sh
pnpm install
```

Generate documentation.

```sh
pnpm run generate
```
@@ -0,0 +1,26 @@
{
  "name": "api-inference-generator",
  "version": "1.0.0",
  "description": "",
  "main": "index.js",
  "type": "module",
  "scripts": {
    "format": "prettier --write .",
    "format:check": "prettier --check .",
    "generate": "tsx scripts/generate.ts"
  },
  "keywords": [],
  "author": "",
  "license": "ISC",
  "dependencies": {
    "@huggingface/tasks": "^0.11.11",
    "@types/node": "^22.5.0",
    "handlebars": "^4.7.8",
    "node": "^20.17.0",
    "prettier": "^3.3.3",
    "ts-node": "^10.9.2",
    "tsx": "^4.17.0",
    "type-fest": "^4.25.0",
    "typescript": "^5.5.4"
  }
}
> As discussed before, we'll need to filter out models that are not warm. The one I would have here is https://huggingface.co/stabilityai/stable-diffusion-xl-refiner-1.0.

> I thought twice about it and I don't know how we can do that. The `inference=warm` status can change over time, but since the docs are static once generated, it might not be accurate when a user visits the page.

> I pushed d60825f to fetch the inference status of each model. For now I haven't changed the templates for the "Recommended models" part. For the `text-to-image` task it would be OK, but for the `image-to-image` task there are no warm models in the suggested ones.

> Maybe https://huggingface.co/stabilityai/stable-diffusion-xl-refiner-1.0 should be listed on https://huggingface.co/tasks/image-to-image, but that seems like a short-term solution.

> https://github.com/huggingface/hub-docs/pull/1396/files
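
For reference, one way a script could check whether a model is currently warm is to ask the Hub for its inference status. This is only a sketch, not the approach taken by this PR's (TypeScript) generate script, and the `expand` argument and `inference` field are assumptions about the Hub API and `huggingface_hub`:

```py
# Sketch only: assumes the Hub exposes an "inference" status (e.g. "warm"/"cold")
# through the expand mechanism of the model info endpoint.
from huggingface_hub import model_info

def is_warm(model_id: str) -> bool:
    info = model_info(model_id, expand=["inference"])
    return getattr(info, "inference", None) == "warm"

print(is_warm("stabilityai/stable-diffusion-xl-refiner-1.0"))
```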