First draft for text-to-image, image-to-image + generate script #1384
Conversation
Yes, adding a curl example is important, and the examples were recently updated 👍
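For reference, a minimal sketch of what such an example would send (in Python; the token and parameter values are placeholders, not part of this PR):

```python
import json

# Hypothetical endpoint and token; a curl example would POST the same body.
API_URL = "https://api-inference.huggingface.co/models/black-forest-labs/FLUX.1-dev"
HEADERS = {"Authorization": "Bearer hf_xxx", "Content-Type": "application/json"}

# Body following the documented schema: `inputs` plus optional `parameters`.
payload = {
    "inputs": "Astronaut riding a horse",
    "parameters": {"guidance_scale": 7.5, "num_inference_steps": 30},
}
body = json.dumps(payload)
# Note: the text-to-image response is raw image bytes, not JSON, so a client
# would write the response body straight to a file.
```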
I've updated the draft based on @osanseviero's feedback:
Cool stuff! 🔥
- [black-forest-labs/FLUX.1-dev](https://huggingface.co/black-forest-labs/FLUX.1-dev): one of the most powerful image generation models that can generate realistic outputs.
- [latent-consistency/lcm-lora-sdxl](https://huggingface.co/latent-consistency/lcm-lora-sdxl): a powerful yet fast image generation model.
- [Kwai-Kolors/Kolors](https://huggingface.co/Kwai-Kolors/Kolors): text-to-image model for photorealistic generation.
This model is frozen. I think it's ok for now, but let's consider filtering for only warm/cold models in the future.
- [Kwai-Kolors/Kolors](https://huggingface.co/Kwai-Kolors/Kolors): text-to-image model for photorealistic generation.
- [stabilityai/stable-diffusion-3-medium-diffusers](https://huggingface.co/stabilityai/stable-diffusion-3-medium-diffusers): a powerful text-to-image model.

This is only a subset of the supported models. Find the model that suits you best [here](https://huggingface.co/models?inference=warm&pipeline_tag=text-to-image&sort=trending).
Opened an internal issue so we can do an OR of warm and cold.
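A sketch of what that OR filter might look like once the status metadata is available (the `inference` field name and values are assumptions mirroring the warm/cold/frozen terminology used in this thread):

```python
def keep_usable(models):
    # Keep models whose inference status is "warm" OR "cold";
    # drop "frozen" ones from the recommended list.
    return [m for m in models if m.get("inference") in ("warm", "cold")]

# Illustrative data only; the statuses are made up for the example.
models = [
    {"id": "black-forest-labs/FLUX.1-dev", "inference": "warm"},
    {"id": "Kwai-Kolors/Kolors", "inference": "frozen"},
    {"id": "stabilityai/stable-diffusion-3-medium-diffusers", "inference": "cold"},
]
usable = keep_usable(models)
```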
| **inputs** | _string, required_ | The input text data (sometimes called "prompt"). |
| **parameters.guidance_scale** | _number, optional_ | For diffusion models. A higher guidance scale value encourages the model to generate images closely linked to the text prompt at the expense of lower image quality. |
| **parameters.negative_prompt[]** | _string, optional_ | One or several prompts to guide what NOT to include in image generation. |
| **parameters.num_inference_steps** | _integer, optional_ | For diffusion models. The number of denoising steps. More denoising steps usually lead to a higher quality image at the expense of slower inference. |
I'm not a fan of the bunch of parameters. ... Let's think about whether we can make something that keeps a clear difference while not being so repetitive.
For the record, I tried using nested lists:
- **inputs** (_string, required_): The input text data (sometimes called "prompt").
- **parameters** (_object, optional_):
- **guidance_scale** (_number, optional_): For diffusion models. A higher guidance scale value encourages the model to generate images closely linked to the text prompt at the expense of lower image quality.
- **negative_prompt[]** (_string or string[], optional_): One or several prompts to guide what NOT to include in image generation.
- **num_inference_steps** (_integer, optional_): For diffusion models. The number of denoising steps. More denoising steps usually lead to a higher quality image at the expense of slower inference.
- **target_size** (_object, optional_):
- **width** (_integer, optional_): The size in pixels of the output image.
- **height** (_integer, optional_): The size in pixels of the output image.
- **scheduler** (_string, optional_): For diffusion models. Override the scheduler with a compatible one.
I also tried a nested table like this, which I'm really not a fan of (wasted space):
| Payload | | | | |
| :--- | :--- | :--- | :--- | :--- |
| **inputs** | | | _string, required_ | The input text data (sometimes called "prompt"). |
| **parameters** | | | | |
| | **guidance_scale** | | _number, optional_ | For diffusion models. A higher guidance scale value encourages the model to generate images closely linked to the text prompt at the expense of lower image quality. |
| | **negative_prompt[]** | | _string, optional_ | One or several prompts to guide what NOT to include in image generation. |
| | **num_inference_steps** | | _integer, optional_ | For diffusion models. The number of denoising steps. More denoising steps usually lead to a higher quality image at the expense of slower inference. |
| | **target_size** | | | |
| | | **width** | _integer, optional_ | The size in pixels of the output image. |
| | | **height** | _integer, optional_ | The size in pixels of the output image. |
| | **scheduler** | | _string, optional_ | For diffusion models. Override the scheduler with a compatible one. |
(current being this)
| Payload | | |
| :--- | :--- | :--- |
| **inputs** | _string, required_ | The input text data (sometimes called "prompt"). |
| **parameters.guidance_scale** | _number, optional_ | For diffusion models. A higher guidance scale value encourages the model to generate images closely linked to the text prompt at the expense of lower image quality. |
| **parameters.negative_prompt[]** | _string, optional_ | One or several prompts to guide what NOT to include in image generation. |
| **parameters.num_inference_steps** | _integer, optional_ | For diffusion models. The number of denoising steps. More denoising steps usually lead to a higher quality image at the expense of slower inference. |
| **parameters.target_size.width** | _integer, optional_ | The size in pixels of the output image. |
| **parameters.target_size.height** | _integer, optional_ | The size in pixels of the output image. |
| **parameters.scheduler** | _string, optional_ | For diffusion models. Override the scheduler with a compatible one. |
Or a solution with a table but some nesting, using a lot of `&nbsp;`:
| Payload | | |
| :--- | :--- | :--- |
| **inputs** | _string, required_ | The input text data (sometimes called "prompt"). |
| **parameters** | _object, optional_ | |
| ** guidance_scale** | _number, optional_ | For diffusion models. A higher guidance scale value encourages the model to generate images closely linked to the text prompt at the expense of lower image quality. |
| ** negative_prompt** | _string or string[], optional_ | One or several prompts to guide what NOT to include in image generation. |
| ** num_inference_steps** | _integer, optional_ | For diffusion models. The number of denoising steps. More denoising steps usually lead to a higher quality image at the expense of slower inference. |
| ** target_size** | _object, optional_ | |
| ** width** | _integer, required_ | The size in pixels of the output image. |
| ** height** | _integer, required_ | The size in pixels of the output image. |
| ** scheduler** | _string, optional_ | For diffusion models. Override the scheduler with a compatible one. |
Let's go with the last solution. It's ugly markdown-wise but the table looks ok. Can be changed in the future.
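For the record, a rough sketch of how a generator script could emit that last style from a nested spec (the helper and spec shape are hypothetical, the real script is TypeScript/Handlebars, and the exact indentation markup differs slightly; this just illustrates the approach):

```python
# Hypothetical renderer: flattens a nested parameter spec into the chosen
# table style, indenting nested fields with non-breaking spaces.
def render_rows(spec, depth=0):
    rows = []
    for name, info in spec.items():
        indent = "&nbsp;&nbsp;" * depth
        rows.append(f"| {indent}**{name}** | _{info.get('type', '')}_ | {info.get('description', '')} |")
        rows.extend(render_rows(info.get("children", {}), depth + 1))
    return rows

# Truncated spec, with field names taken from the discussion above.
spec = {
    "inputs": {"type": "string, required", "description": 'The input text data (sometimes called "prompt").'},
    "parameters": {
        "type": "object, optional",
        "children": {
            "guidance_scale": {"type": "number, optional", "description": "For diffusion models."},
            "target_size": {
                "type": "object, optional",
                "children": {
                    "width": {"type": "integer, required", "description": "The size in pixels of the output image."},
                },
            },
        },
    },
}
table = "\n".join(["| Payload | | |", "| :--- | :--- | :--- |"] + render_rows(spec))
```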
| :--- | :--- | :--- |
| **inputs** | _string, required_ | The input text data (sometimes called "prompt"). |
| **parameters.guidance_scale** | _number, optional_ | For diffusion models. A higher guidance scale value encourages the model to generate images closely linked to the text prompt at the expense of lower image quality. |
| **parameters.negative_prompt[]** | _string, optional_ | One or several prompts to guide what NOT to include in image generation. |
The typing seems a bit off to me. Is it always an array of strings (even if it's just one)?
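If we keep accepting both, a tiny client-side normalizer (a hypothetical helper, not an existing API) makes the typing unambiguous before sending:

```python
def normalize_negative_prompt(value):
    """Accept a single string, a list of strings, or None; always return a list."""
    if value is None:
        return []
    if isinstance(value, str):
        return [value]
    return list(value)
```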
| **inputs** | _string, required_ | The input text data (sometimes called "prompt"). |
| **parameters.guidance_scale** | _number, optional_ | For diffusion models. A higher guidance scale value encourages the model to generate images closely linked to the text prompt at the expense of lower image quality. |
| **parameters.negative_prompt[]** | _string, optional_ | One or several prompts to guide what NOT to include in image generation. |
| **parameters.num_inference_steps** | _integer, optional_ | For diffusion models. The number of denoising steps. More denoising steps usually lead to a higher quality image at the expense of slower inference. |
To check if all of these already work well out of the box.
| **parameters.target_size.height** | _integer, optional_ | The size in pixels of the output image. |
| **parameters.scheduler** | _string, optional_ | For diffusion models. Override the scheduler with a compatible one. |

| Headers | | |
Sounds good. I would maybe also document this in docs/api-inference/task_parameters.md, as it's a general parameter for all models.
Co-authored-by: Omar Sanseviero <[email protected]>
* init project
* first script to generate task pages
* commit generated content
* generate payload table as well
* so undecisive
* hey
* better ?
* Add image-to-image page
* template for snippets section + few things
* few things
With #1386 being merged, we have a clean first part to merge into the
Nice 🔥
</Tip>

### Recommended models
As discussed before, we'll need to filter out models that are not warm. The one I would have here is https://huggingface.co/stabilityai/stable-diffusion-xl-refiner-1.0
I thought twice about it and I don't know how we can do that. The `inference=warm` status can change over time, but since the docs are static once generated, it might not be accurate when a user visits the page.
I pushed d60825f to fetch the inference status of each model. For now I haven't changed the templates for the "Recommended models" part. For the `text-to-image` task it would be ok, but for the `image-to-image` task there are no warm models among the suggested ones.
Maybe https://huggingface.co/stabilityai/stable-diffusion-xl-refiner-1.0 should be listed on https://huggingface.co/tasks/image-to-image, but that seems like a short-term solution.
| :--- | :--- | :--- |
| **inputs** | _string, required_ | The input text data (sometimes called "prompt"). |
| **parameters** | _object, optional_ | Additional inference parameters for Text To Image. |
| ** guidance_scale** | _number, optional_ | For diffusion models. A higher guidance scale value encourages the model to generate images closely linked to the text prompt at the expense of lower image quality. |
Is the "diffusion models" clarification important? Only `diffusers` supports this task.
🤷‍♂️

Not really, no. I don't know where the original description comes from, but we can update the specs in huggingface.js.
yes
@@ -0,0 +1,301 @@
import { snippets, PipelineType } from "@huggingface/tasks";
I'm skipping this file for now.
Co-authored-by: Omar Sanseviero <[email protected]>
As discussed offline, let's merge.
* First draft for text-to-image
* add correct code snippets
* Update docs/api-inference/tasks/text-to-image.md

  Co-authored-by: Omar Sanseviero <[email protected]>
* better table?
* Generate tasks pages from script (#1386)
  * init project
  * first script to generate task pages
  * commit generated content
  * generate payload table as well
  * so undecisive
  * hey
  * better ?
  * Add image-to-image page
  * template for snippets section + few things
  * few things
  * Update scripts/api-inference/templates/specs_headers.handlebars

    Co-authored-by: Omar Sanseviero <[email protected]>
  * Update scripts/api-inference/templates/specs_headers.handlebars

    Co-authored-by: Omar Sanseviero <[email protected]>
  * generate
  * fetch inference status

---------

Co-authored-by: Omar Sanseviero <[email protected]>
* Add draft of docs structure
* Add index page
* Prepare overview and rate limits
* Manage redirects
* Clean up
* Apply suggestions from code review

  Co-authored-by: Pedro Cuenca <[email protected]>
* Apply suggestions from review
* Add additional headers
* Apply suggestions from code review

  Co-authored-by: Lucain <[email protected]>
* Incorporate reviewer's feedback
* First draft for text-to-image, image-to-image + generate script (#1384)
  * First draft for text-to-image
  * add correct code snippets
  * Update docs/api-inference/tasks/text-to-image.md

    Co-authored-by: Omar Sanseviero <[email protected]>
  * better table?
  * Generate tasks pages from script (#1386)
    * init project
    * first script to generate task pages
    * commit generated content
    * generate payload table as well
    * so undecisive
    * hey
    * better ?
    * Add image-to-image page
    * template for snippets section + few things
    * few things
    * Update scripts/api-inference/templates/specs_headers.handlebars

      Co-authored-by: Omar Sanseviero <[email protected]>
    * Update scripts/api-inference/templates/specs_headers.handlebars

      Co-authored-by: Omar Sanseviero <[email protected]>
    * generate
    * fetch inference status

    Co-authored-by: Omar Sanseviero <[email protected]>
* Add getting started
* Update docs/api-inference/getting_started.md

  Co-authored-by: Lucain <[email protected]>
* Draft to add text-generation parameters (#1393)
  * first draft to add text-generation parameters
  * headers
  * more structure
  * add chat-completion
  * better handling of arrays
  * better handling of parameters
* Add new tasks pages (fill mask, summarization, question answering, sentence similarity) (#1394)
  * add fill mask
  * add summarization
  * add question answering
  * Table question answering
  * handle array output
  * Add sentence similarity
  * text classification (almost)
  * better with an enum
  * Add mask token
  * capitalize
  * remove sentence-similarity
  * Update docs/api-inference/tasks/table_question_answering.md

    Co-authored-by: Omar Sanseviero <[email protected]>
* mention chat completion in text generation docs
* fix chat completion snippets

  Co-authored-by: Omar Sanseviero <[email protected]>
* Filter out frozen models from API docs for tasks (#1396)
  * Filter out frozen models
  * use placeholder
* New api docs suggestions (#1397)
  * show as diff
  * reorder toctree
  * wording update
  * diff
* Add comment header on each task page (#1400)
  * Add comment header on each task page
  * add huggingface.co/api/tasks
* Add even more tasks: token classification, translation and zero shot classification (#1398)
  * Add token classification
  * add translation task
  * add zero shot classification
  * more parameters
* More tasks more tasks more tasks! (#1399)
  * add ASR
  * fix early stopping parameter
  * regenrate
  * add audio_classification
  * Image classification
  * Object detection
  * image segementation
  * unknown when we don't know
  * gen
  * feature extraction
  * update
  * regenerate
* pull from main
* coding style
* Update _redirects.yml
* Rename all tasks '_' to '-' (#1405)
  * Rename all tasks '_' to '-'
  * also for other urls
  * Update docs/api-inference/index.md

    Co-authored-by: Victor Muštar <[email protected]>
* Apply feedback for "new_api_docs" (#1408)
  * Update getting started examples
  * Move snippets above specification
  * custom link for finegrained token
* Fixes new docs (#1413)
  * Misc changes
  * Wrap up
  * Apply suggestions from code review
  * generate
  * Add todos to avoid forgetting about them

  Co-authored-by: Lucain <[email protected]>
  Co-authored-by: Wauplin <[email protected]>

---------

Co-authored-by: Pedro Cuenca <[email protected]>
Co-authored-by: Lucain <[email protected]>
Co-authored-by: Wauplin <[email protected]>
Co-authored-by: Victor Muštar <[email protected]>
(related to #1379)

cc @osanseviero for viz'

End goal is to generate this page based on info from:
- `huggingface_hub` (doc example?)
- `huggingface.js` (doc example?)

I wrote the content in this PR manually to validate the format. I have added:
- `text-to-image`: for the example it's annoying because the output is not a JSON, so not describable using openschema. We should find a proxy to say "it's just bytes" in the specs so that docs are generated correctly.
- `huggingface_hub` example (?). Add link to docs.
- `huggingface.js` (?). Add link to docs.

Open questions:
- https://api-inference.huggingface.co/models/black-forest-labs/FLUX.1-dev. Depends on what we do for the curl example but we need it in any case.

Anything else?
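On the "it's just bytes" point: a minimal sketch of what the generated client code amounts to once the spec can express a binary response (the function name and file path are illustrative, not from this PR):

```python
# The text-to-image endpoint returns raw image bytes rather than JSON,
# so the client simply persists the response body as-is.
def save_image_bytes(content: bytes, path: str) -> int:
    with open(path, "wb") as f:
        return f.write(content)
```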