First draft for text-to-image, image-to-image + generate script #1384

Merged
merged 11 commits into from
Aug 27, 2024


Wauplin
Contributor

@Wauplin Wauplin commented Aug 20, 2024

(related to #1379)

cc @osanseviero for viz'

End goal is to generate this page based on info from:

I wrote the content in this PR manually to validate the format. I have added:

  • a small description from the tasks page
  • a link to https://huggingface.co/tasks/text-to-image for more info
  • a list of recommended models from the tasks page
  • the API specification
    • inputs: payload (from specs) and headers (will always be the same)
    • output: payload (from specs). In the text-to-image example this is annoying because the output is not JSON, so it cannot be described using openschema. We should find a proxy in the specs to say "it's just bytes" so that the docs are generated correctly.
  • examples:
    • CURL => how to generate (?)
    • Python => from huggingface_hub example (?). Add link to docs.
    • JavaScript => from huggingface.js (?). Add link to docs.
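To make the API specification above concrete (JSON payload in, raw image bytes out), here is a minimal TypeScript sketch of a client call. It assumes the example endpoint pattern from the open questions (`https://api-inference.huggingface.co/models/<model-id>`), a Bearer-token `Authorization` header as the "always the same" header, and a runtime with a global `fetch` (e.g. Node 18+). The helper names `buildTextToImageRequest` and `textToImage` are hypothetical, not part of `huggingface_hub` or `huggingface.js`.

```typescript
// Hypothetical helpers; the base URL follows the example URL discussed in this PR.
const API_BASE_URL = "https://api-inference.huggingface.co/models";

interface PreparedRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Build the HTTP request for a text-to-image call without sending it.
function buildTextToImageRequest(modelId: string, prompt: string, token: string): PreparedRequest {
  return {
    url: `${API_BASE_URL}/${modelId}`,
    method: "POST",
    // Assumed headers: authentication plus a JSON request body.
    headers: {
      Authorization: `Bearer ${token}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ inputs: prompt }),
  };
}

// Send the request; the response body is raw image bytes, not JSON.
async function textToImage(modelId: string, prompt: string, token: string): Promise<Uint8Array> {
  const req = buildTextToImageRequest(modelId, prompt, token);
  const response = await fetch(req.url, {
    method: req.method,
    headers: req.headers,
    body: req.body,
  });
  if (!response.ok) {
    throw new Error(`Inference request failed with HTTP ${response.status}`);
  }
  return new Uint8Array(await response.arrayBuffer());
}
```

Keeping request construction separate from sending makes the payload easy to inspect, which is also the information a generated curl snippet would need.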

Open questions:

  • we should add an example URL like https://api-inference.huggingface.co/models/black-forest-labs/FLUX.1-dev. It depends on what we do for the curl example, but we need it in any case.
  • should we add a curl example? is it possible to generate it? are the current examples maintained?
  • should we harmonize the python/javascript snippets with the ones from https://huggingface.co/black-forest-labs/FLUX.1-dev?inference_api=true? If yes, how?

Anything else?

@Wauplin Wauplin requested a review from osanseviero August 20, 2024 14:25
@osanseviero
Contributor

> should we add a curl example? is it possible to generate it? are the current examples maintained?

Yes, adding a curl example is important, and the existing examples were recently updated 👍

@Wauplin
Contributor Author

Wauplin commented Aug 20, 2024

I've updated the draft based on @osanseviero's feedback:

Contributor

@osanseviero osanseviero left a comment

Cool stuff! 🔥


- [black-forest-labs/FLUX.1-dev](https://huggingface.co/black-forest-labs/FLUX.1-dev): one of the most powerful image generation models that can generate realistic outputs.
- [latent-consistency/lcm-lora-sdxl](https://huggingface.co/latent-consistency/lcm-lora-sdxl): a powerful yet fast image generation model.
- [Kwai-Kolors/Kolors](https://huggingface.co/Kwai-Kolors/Kolors): text-to-image model for photorealistic generation.
Contributor

This model is frozen. I think it's ok for now, but let's consider filtering for only warm/cold models in the future.

- [Kwai-Kolors/Kolors](https://huggingface.co/Kwai-Kolors/Kolors): text-to-image model for photorealistic generation.
- [stabilityai/stable-diffusion-3-medium-diffusers](https://huggingface.co/stabilityai/stable-diffusion-3-medium-diffusers): a powerful text-to-image model.

This is only a subset of the supported models. Find the model that suits you best [here](https://huggingface.co/models?inference=warm&pipeline_tag=text-to-image&sort=trending).
Contributor

Opened an internal issue so we can do an OR of warm and cold

| **inputs** | _string, required_ | The input text data (sometimes called "prompt"). |
| **parameters.guidance_scale** | _number, optional_ | For diffusion models. A higher guidance scale value encourages the model to generate images closely linked to the text prompt at the expense of lower image quality. |
| **parameters.negative_prompt[]** | _string, optional_ | One or several prompts to guide what NOT to include in image generation. |
| **parameters.num_inference_steps** | _integer, optional_ | For diffusion models. The number of denoising steps. More denoising steps usually lead to a higher quality image at the expense of slower inference. |
Contributor

I'm not a fan of the bunch of parameters. .... Let's think about whether we can find something that keeps a clear distinction without being so repetitive.

Contributor Author

for the record, I tried using nested lists

- **inputs** (_string, required_): The input text data (sometimes called "prompt").
- **parameters** (_object, optional_):
  - **guidance_scale** (_number, optional_): For diffusion models. A higher guidance scale value encourages the model to generate images closely linked to the text prompt at the expense of lower image quality.
  - **negative_prompt[]** (_string or string[], optional_): One or several prompts to guide what NOT to include in image generation.
  - **num_inference_steps** (_integer, optional_): For diffusion models. The number of denoising steps. More denoising steps usually lead to a higher quality image at the expense of slower inference.
  - **target_size** (_object, optional_):
    - **width** (_integer, optional_): The size in pixels of the output image.
    - **height** (_integer, optional_): The size in pixels of the output image.
  - **scheduler** (_string, optional_): For diffusion models. Override the scheduler with a compatible one.

resulting in:

[screenshot of the rendered list]
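The nested list maps directly onto a TypeScript type, which is one way to check that the documented structure is self-consistent. This is a sketch assuming the field names and optionality shown in the list above (a later table marks `width`/`height` as required within `target_size`); the type names and example values are illustrative, not taken from the huggingface.js specs.

```typescript
// Output size; optional fields as in the nested list.
interface TargetSize {
  width?: number; // output width in pixels
  height?: number; // output height in pixels
}

// Parameters for text-to-image, mirroring the nested list (all optional).
interface TextToImageParameters {
  guidance_scale?: number;
  negative_prompt?: string | string[];
  num_inference_steps?: number;
  target_size?: TargetSize;
  scheduler?: string;
}

// Full request payload: `inputs` is required, `parameters` is optional.
interface TextToImagePayload {
  inputs: string;
  parameters?: TextToImageParameters;
}

// Illustrative values only, not recommended defaults.
const examplePayload: TextToImagePayload = {
  inputs: "An astronaut riding a horse on the moon",
  parameters: {
    guidance_scale: 7.5,
    negative_prompt: ["blurry", "low quality"],
    num_inference_steps: 30,
    target_size: { width: 1024, height: 768 },
  },
};

console.log(JSON.stringify(examplePayload));
```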

Contributor Author

@Wauplin Wauplin Aug 21, 2024

I also tried a nested table like this, which I'm really not a fan of (wasted space):

| Payload | | | | |
| :--- | :--- | :--- | :--- | :--- |
| **inputs** |  |  | _string, required_ | The input text data (sometimes called "prompt"). |
| **parameters**  |  | |  |  |
|  | **guidance_scale**  || _number, optional_ | For diffusion models. A higher guidance scale value encourages the model to generate images closely linked to the text prompt at the expense of lower image quality. |
|  | **negative_prompt[]** | | _string, optional_ | One or several prompts to guide what NOT to include in image generation. |
|  | **num_inference_steps** | | _integer, optional_ | For diffusion models. The number of denoising steps. More denoising steps usually lead to a higher quality image at the expense of slower inference. |
|  | **target_size**  ||  |  |
|  |  | **width** | _integer, optional_ | The size in pixels of the output image. |
|  |  | **height** | _integer, optional_ | The size in pixels of the output image. |
|  | **scheduler**  | | _string, optional_ | For diffusion models. Override the scheduler with a compatible one. |

[screenshot of the rendered table]

Contributor Author

(the current version is this one)

| Payload |   |    |
| :--- | :--- | :--- |
| **inputs** | _string, required_ | The input text data (sometimes called "prompt"). |
| **parameters.guidance_scale** | _number, optional_ | For diffusion models. A higher guidance scale value encourages the model to generate images closely linked to the text prompt at the expense of lower image quality. |
| **parameters.negative_prompt[]** | _string, optional_ | One or several prompts to guide what NOT to include in image generation. |
| **parameters.num_inference_steps** | _integer, optional_ | For diffusion models. The number of denoising steps. More denoising steps usually lead to a higher quality image at the expense of slower inference. |
| **parameters.target_size.width** | _integer, optional_ | The size in pixels of the output image. |
| **parameters.target_size.height** | _integer, optional_ | The size in pixels of the output image. |
| **parameters.scheduler** | _string, optional_ | For diffusion models. Override the scheduler with a compatible one. |

[screenshot of the rendered table]
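The dotted-path layout of this table can be produced mechanically by walking the spec recursively and joining property names with `.`. This is a minimal sketch assuming a simplified, hypothetical `SchemaNode` shape, not the actual JSON Schema files or code used by the generation script in this PR.

```typescript
// Simplified schema node: a leaf with a type, or an object with child properties.
interface SchemaNode {
  type: string;
  required?: boolean;
  description?: string;
  properties?: Record<string, SchemaNode>;
}

// Flatten nested properties into rows keyed by dotted paths,
// e.g. "parameters.target_size.width".
function flattenRows(properties: Record<string, SchemaNode>, prefix = ""): string[] {
  const rows: string[] = [];
  for (const [name, node] of Object.entries(properties)) {
    const path = prefix ? `${prefix}.${name}` : name;
    if (node.properties) {
      // Objects get no row of their own; only their children are listed.
      rows.push(...flattenRows(node.properties, path));
    } else {
      const req = node.required ? "required" : "optional";
      rows.push(`| **${path}** | _${node.type}, ${req}_ | ${node.description ?? ""} |`);
    }
  }
  return rows;
}
```

The trade-off raised in the review is visible here: every leaf repeats its full path, which keeps the table flat but makes the `parameters.` prefix appear on almost every row.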

Contributor Author

Or a solution with a table, but with some nesting using a lot of `&nbsp;`:

| Payload |   |    |
| :--- | :--- | :--- |
| **inputs** | _string, required_ | The input text data (sometimes called "prompt"). |
| **parameters** |  _object, optional_ |  |
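| **&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;guidance_scale** |  _number, optional_ | For diffusion models. A higher guidance scale value encourages the model to generate images closely linked to the text prompt at the expense of lower image quality. |
| **&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;negative_prompt** |  _string or string[], optional_ | One or several prompts to guide what NOT to include in image generation. |
| **&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;num_inference_steps** |  _integer, optional_ | For diffusion models. The number of denoising steps. More denoising steps usually lead to a higher quality image at the expense of slower inference. |
| **&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;target_size** |  _object, optional_ |  |
| **&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;width** |  _integer, required_ | The size in pixels of the output image. |
| **&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;height** |  _integer, required_ | The size in pixels of the output image. |
| **&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;scheduler** |  _string, optional_ | For diffusion models. Override the scheduler with a compatible one. |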

[screenshot of the rendered table]
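The `&nbsp;`-indented layout chosen here can also be generated by a recursive walk: unlike a dotted-path table, object properties get their own row and their children are prefixed with eight `&nbsp;` per nesting level, matching the table above. This is a sketch assuming a simplified, hypothetical `SchemaNode` shape rather than the real spec format consumed by the generation script.

```typescript
// Simplified schema node: a leaf with a type, or an object with child properties.
interface SchemaNode {
  type: string;
  required?: boolean;
  description?: string;
  properties?: Record<string, SchemaNode>;
}

// Eight non-breaking spaces per nesting level, as in the table above.
const INDENT = "&nbsp;".repeat(8);

// One row per property (objects included); children are indented one level deeper.
function indentedRows(properties: Record<string, SchemaNode>, depth = 0): string[] {
  const rows: string[] = [];
  for (const [name, node] of Object.entries(properties)) {
    const req = node.required ? "required" : "optional";
    rows.push(`| **${INDENT.repeat(depth)}${name}** | _${node.type}, ${req}_ | ${node.description ?? ""} |`);
    if (node.properties) {
      rows.push(...indentedRows(node.properties, depth + 1));
    }
  }
  return rows;
}
```

The markdown source is noisy, as noted in the thread, but since the rows are generated rather than hand-written, the noise only matters when reading the raw file.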

Contributor Author

Let's go with the last solution. It's ugly markdown-wise but the table looks ok. Can be changed in the future.

| :--- | :--- | :--- |
| **inputs** | _string, required_ | The input text data (sometimes called "prompt"). |
| **parameters.guidance_scale** | _number, optional_ | For diffusion models. A higher guidance scale value encourages the model to generate images closely linked to the text prompt at the expense of lower image quality. |
| **parameters.negative_prompt[]** | _string, optional_ | One or several prompts to guide what NOT to include in image generation. |
Contributor

The typing seems a bit off to me. Is it always an array of strings (even if there's just one)?
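One way to reconcile the two typings under discussion is to declare `negative_prompt` as `string | string[]` (as in the nested-list draft) and normalize it to an array on the consuming side. This is a minimal sketch under that assumption; `normalizeNegativePrompt` is a hypothetical helper and says nothing about how the Inference API itself handles the field.

```typescript
// Either a single prompt or a list of prompts, as in the "string or string[]" draft typing.
type NegativePrompt = string | string[];

// Normalize to an array so downstream code only has to handle one shape.
function normalizeNegativePrompt(value: NegativePrompt | undefined): string[] {
  if (value === undefined) return [];
  return Array.isArray(value) ? value : [value];
}
```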

| **inputs** | _string, required_ | The input text data (sometimes called "prompt"). |
| **parameters.guidance_scale** | _number, optional_ | For diffusion models. A higher guidance scale value encourages the model to generate images closely linked to the text prompt at the expense of lower image quality. |
| **parameters.negative_prompt[]** | _string, optional_ | One or several prompts to guide what NOT to include in image generation. |
| **parameters.num_inference_steps** | _integer, optional_ | For diffusion models. The number of denoising steps. More denoising steps usually lead to a higher quality image at the expense of slower inference. |
Contributor

We should check whether all of these already work well out of the box.

docs/api-inference/tasks/text-to-image.md (outdated, resolved)
| **parameters.target_size.height** | _integer, optional_ | The size in pixels of the output image. |
| **parameters.scheduler** | _string, optional_ | For diffusion models. Override the scheduler with a compatible one. |

| Headers | | |
Contributor

Sounds good. I would maybe also document this in docs/api-inference/task_parameters.md, as it's a general parameter for all models.

docs/api-inference/tasks/text-to-image.md (outdated, resolved)
docs/api-inference/tasks/text-to-image.md (outdated, resolved)
Wauplin and others added 4 commits August 21, 2024 16:12
* init project

* first script to generate task pages

* commit generated content

* generate payload table as well

* so undecisive

* hey

* better ?

* Add image-to-image page

* template for snippets section + few things

* few things
@Wauplin
Contributor Author

Wauplin commented Aug 23, 2024

Now that #1386 is merged, we have a clean first part to merge into new_api_docs. Let's not forget the few TODOs there.

@Wauplin Wauplin changed the title First draft for text-to-image First draft for text-to-image, image-to-image + generate script Aug 23, 2024
@Wauplin Wauplin requested a review from osanseviero August 23, 2024 13:31
Contributor

@osanseviero osanseviero left a comment

Nice 🔥


</Tip>

### Recommended models
Contributor

As discussed before, we'll need to filter out models that are not warm. The one I would include here is https://huggingface.co/stabilityai/stable-diffusion-xl-refiner-1.0

Contributor Author

I thought about it again and I'm not sure how we can do that. The inference=warm status can change over time, but since the docs are static once generated, the status might no longer be accurate when a user visits the page.

Contributor Author

I pushed d60825f to fetch the inference status of each model. For now I haven't changed the templates for the "Recommended models" part. For the text-to-image task it would be ok, but for the image-to-image task none of the suggested models are warm.
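Once each model's status has been fetched, the filtering discussed in this thread (dropping frozen models, keeping an OR of warm and cold) could be as small as the sketch below. It assumes the three status labels mentioned in the review (`warm`, `cold`, `frozen`) form the complete set and uses a hypothetical `ModelInfo` shape; how the status is fetched is left out. Because the docs are static, the result is only a snapshot taken at generation time.

```typescript
// Status labels as mentioned in this PR; assumed to be exhaustive.
type InferenceStatus = "warm" | "cold" | "frozen";

// Hypothetical shape for a recommended model and its fetched status.
interface ModelInfo {
  id: string; // e.g. "org/model-name"
  inference: InferenceStatus;
}

// Drop frozen models; keep warm and cold ones (the "OR of warm and cold").
function filterServable(models: ModelInfo[]): ModelInfo[] {
  return models.filter((m) => m.inference === "warm" || m.inference === "cold");
}

// True when at least one model is warm, e.g. to detect that a
// "warm only" filter would leave the recommended list empty.
function hasWarmModel(models: ModelInfo[]): boolean {
  return models.some((m) => m.inference === "warm");
}
```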


| :--- | :--- | :--- |
| **inputs** | _string, required_ | The input text data (sometimes called "prompt"). |
| **parameters** | _object, optional_ | Additional inference parameters for Text To Image |
| **&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;guidance_scale** | _number, optional_ | For diffusion models. A higher guidance scale value encourages the model to generate images closely linked to the text prompt at the expense of lower image quality. |
Contributor

Is the "diffusion models" clarification important? Only diffusers supports this task.

Contributor Author

🤷‍♂️

Not really, no. I don't know where the original description comes from, but yes, we can update the specs in huggingface.js.

scripts/api-inference/templates/specs_headers.handlebars (outdated, resolved)
scripts/api-inference/templates/specs_headers.handlebars (outdated, resolved)
@@ -0,0 +1,301 @@
import { snippets, PipelineType } from "@huggingface/tasks";
Contributor

I'm skipping this file for now

@Wauplin
Contributor Author

Wauplin commented Aug 27, 2024

As discussed offline, let's merge.

@Wauplin Wauplin merged commit 12ba289 into new_api_docs Aug 27, 2024
1 check passed
@Wauplin Wauplin deleted the add-text-to-image-example branch August 27, 2024 13:35
Wauplin added a commit that referenced this pull request Aug 27, 2024
* First draft for text-to-image

* add correct code snippets

* Update docs/api-inference/tasks/text-to-image.md

Co-authored-by: Omar Sanseviero

* better table?

* Generate tasks pages from script (#1386)

* init project

* first script to generate task pages

* commit generated content

* generate payload table as well

* so undecisive

* hey

* better ?

* Add image-to-image page

* template for snippets section + few things

* few things

* Update scripts/api-inference/templates/specs_headers.handlebars

Co-authored-by: Omar Sanseviero

* Update scripts/api-inference/templates/specs_headers.handlebars

Co-authored-by: Omar Sanseviero

* generate

* fetch inference status

---------

Co-authored-by: Omar Sanseviero
Wauplin added a commit that referenced this pull request Sep 12, 2024
* Add draft of docs structure

* Add index page

* Prepare overview and rate limits

* Manage redirects

* Clean up

* Apply suggestions from code review

Co-authored-by: Pedro Cuenca

* Apply suggestions from review

* Add additional headers

* Apply suggestions from code review

Co-authored-by: Lucain

* Incorporate reviewer's feedback

* First draft for text-to-image, image-to-image + generate script (#1384)

* Add getting started

* Add draft of docs structure

* Add index page

* Prepare overview and rate limits

* Manage redirects

* Clean up

* Apply suggestions from review

* Apply suggestions from code review

Co-authored-by: Pedro Cuenca

* Add additional headers

* Apply suggestions from code review

Co-authored-by: Lucain

* Incorporate reviewer's feedback

* First draft for text-to-image, image-to-image + generate script (#1384)

* Add getting started

* Update docs/api-inference/getting_started.md

Co-authored-by: Lucain

* Draft to add text-generation parameters (#1393)

* first draft to add text-generation parameters

* headers

* more structure

* add chat-completion

* better handling of arrays

* better handling of parameters

* Add new tasks pages (fill mask, summarization, question answering, sentence similarity) (#1394)

* add fill mask

* add summarization

* add question answering

* Table question answering

* handle array output

* Add sentence similarity

* text classification (almost)

* better with an enum

* Add mask token

* capitalize

* remove sentence-similarity

* Update docs/api-inference/tasks/table_question_answering.md

Co-authored-by: Omar Sanseviero

---------

Co-authored-by: Omar Sanseviero

* mention chat completion in text generation docs

* fix chat completion snippets

---------

Co-authored-by: Omar Sanseviero

* Filter out frozen models from API docs for tasks (#1396)

* Filter out frozen models

* use placeholder

* New api docs suggestions (#1397)

* show as diff

* reorder toctree

* wording update

* diff

* Add comment header on each task page (#1400)

* Add comment header on each task page

* add huggingface.co/api/tasks

* Add even more tasks: token classification, translation and zero shot classification (#1398)

* Add token classification

* add translation task

* add zero shot classification

* more parameters

* More tasks more tasks more tasks! (#1399)

* add ASR

* fix early stopping parameter

* regenrate

* add audio_classification

* Image classification

* Object detection

* image segementation

* unknown when we don't know

* gen

* feature extraction

* update

* regenerate

* pull from main

* coding style

* Update _redirects.yml

* Rename all tasks '_' to '-' (#1405)

* Rename all tasks '_' to '-'

* also for other urls

* Update docs/api-inference/index.md

Co-authored-by: Victor Muštar

* Apply feedback for "new_api_docs" (#1408)

* Update getting started examples

* Move snippets above specification

* custom link for finegrained token

* Fixes new docs (#1413)

* Misc changes

* Wrap up

* Apply suggestions from code review

* generate

* Add todos to avoid forgetting about them

---------

Co-authored-by: Lucain
Co-authored-by: Wauplin

---------

Co-authored-by: Pedro Cuenca
Co-authored-by: Lucain
Co-authored-by: Wauplin
Co-authored-by: Victor Muštar

2 participants