Putting the provider arg more front'n'center (and other tweaks) (#1114)
julien-c authored Jan 17, 2025
1 parent f6e1749 commit c83cc3e
Showing 2 changed files with 25 additions and 23 deletions.
README.md: 31 changes (14 additions, 17 deletions)
@@ -27,7 +27,7 @@ await uploadFile({
}
});

- // Use HF Inference API
+ // Use HF Inference API, or external Inference Providers!

await inference.chatCompletion({
model: "meta-llama/Llama-3.1-8B-Instruct",
@@ -39,6 +39,7 @@ await inference.chatCompletion({
],
max_tokens: 512,
temperature: 0.5,
+ provider: "sambanova", // or together, fal-ai, replicate, …
});

await inference.textToImage({
@@ -146,16 +147,16 @@ for await (const chunk of inference.chatCompletionStream({

/// Using a third-party provider:
await inference.chatCompletion({
- model: "meta-llama/Llama-3.1-8B-Instruct",
- messages: [{ role: "user", content: "Hello, nice to meet you!" }],
- max_tokens: 512,
- provider: "sambanova"
+ model: "meta-llama/Llama-3.1-8B-Instruct",
+ messages: [{ role: "user", content: "Hello, nice to meet you!" }],
+ max_tokens: 512,
+ provider: "sambanova", // or together, fal-ai, replicate, …
})

await inference.textToImage({
- model: "black-forest-labs/FLUX.1-dev",
- inputs: "a picture of a green bird",
- provider: "together"
+ model: "black-forest-labs/FLUX.1-dev",
+ inputs: "a picture of a green bird",
+ provider: "fal-ai",
})


@@ -169,14 +170,10 @@ await inference.translation({
},
});

- await inference.textToImage({
- model: 'black-forest-labs/FLUX.1-dev',
- inputs: 'a picture of a green bird',
- })

// pass multimodal files or URLs as inputs
await inference.imageToText({
model: 'nlpconnect/vit-gpt2-image-captioning',
data: await (await fetch('https://picsum.photos/300/300')).blob(),
model: 'nlpconnect/vit-gpt2-image-captioning',
})

// Using your own dedicated inference endpoint: https://hf.co/docs/inference-endpoints/
@@ -188,9 +185,9 @@ const llamaEndpoint = inference.endpoint(
"https://api-inference.huggingface.co/models/meta-llama/Llama-3.1-8B-Instruct"
);
const out = await llamaEndpoint.chatCompletion({
- model: "meta-llama/Llama-3.1-8B-Instruct",
- messages: [{ role: "user", content: "Hello, nice to meet you!" }],
- max_tokens: 512,
+ model: "meta-llama/Llama-3.1-8B-Instruct",
+ messages: [{ role: "user", content: "Hello, nice to meet you!" }],
+ max_tokens: 512,
});
console.log(out.choices[0].message);
```
packages/inference/README.md: 17 changes (11 additions, 6 deletions)
@@ -42,15 +42,15 @@ const hf = new HfInference('your access token')

Your access token should be kept private. If you need to protect it in front-end applications, we suggest setting up a proxy server that stores the access token.
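
For reference, here is a minimal sketch of such a proxy (not part of this package; the `HF_TOKEN` env var, path handling, and port are illustrative). It assumes Node 18+ and simply forwards request bodies to the HF Inference API, attaching the access token server-side:

```ts
import { createServer } from "node:http";

const HF_TOKEN = process.env.HF_TOKEN; // the access token never reaches the browser

createServer(async (req, res) => {
  // Buffer the incoming request body before forwarding it
  const chunks: Buffer[] = [];
  for await (const chunk of req) chunks.push(chunk as Buffer);

  // Forward the call to the HF Inference API, adding the Authorization header server-side
  const upstream = await fetch(`https://api-inference.huggingface.co${req.url}`, {
    method: req.method,
    headers: { Authorization: `Bearer ${HF_TOKEN}`, "Content-Type": "application/json" },
    body: chunks.length > 0 ? Buffer.concat(chunks) : undefined,
  });

  res.writeHead(upstream.status, { "Content-Type": upstream.headers.get("content-type") ?? "application/json" });
  res.end(Buffer.from(await upstream.arrayBuffer()));
}).listen(3000);
```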

- ### Requesting third-party inference providers
+ ### Third-party inference providers

- You can request inference from third-party providers with the inference client.
+ You can send inference requests to third-party providers with the inference client.

Currently, we support the following providers: [Fal.ai](https://fal.ai), [Replicate](https://replicate.com), [Together](https://together.xyz) and [Sambanova](https://sambanova.ai).

- To make request to a third-party provider, you have to pass the `provider` parameter to the inference function. Make sure your request is authenticated with an access token.
+ To send requests to a third-party provider, you have to pass the `provider` parameter to the inference function. Make sure your request is authenticated with an access token.
```ts
- const accessToken = "hf_..."; // Either a HF access token, or an API key from the 3rd party provider (Replicate in this example)
+ const accessToken = "hf_..."; // Either a HF access token, or an API key from the third-party provider (Replicate in this example)

const client = new HfInference(accessToken);
await client.textToImage({
@@ -63,14 +63,19 @@ await client.textToImage({
When authenticated with a Hugging Face access token, the request is routed through https://huggingface.co.
When authenticated with a third-party provider key, the request is made directly against that provider's inference API.
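
As a sketch of these two modes (the tokens below are placeholders, and the call mirrors the `textToImage` example above):

```ts
import { HfInference } from "@huggingface/inference";

// 1) HF access token: the request is routed through huggingface.co to the provider
const viaHF = new HfInference("hf_...");
await viaHF.textToImage({
  model: "black-forest-labs/FLUX.1-dev",
  inputs: "a picture of a green bird",
  provider: "replicate",
});

// 2) Provider API key: the request goes directly to the provider's own inference API
const direct = new HfInference("r8_..."); // e.g. a Replicate API key (placeholder)
await direct.textToImage({
  model: "black-forest-labs/FLUX.1-dev",
  inputs: "a picture of a green bird",
  provider: "replicate",
});
```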

- Only a subset of models are supported when requesting 3rd party providers. You can check the list of supported models per pipeline tasks here:
+ Only a subset of models is supported when requesting third-party providers. You can check the list of supported models per pipeline task here:
- [Fal.ai supported models](./src/providers/fal-ai.ts)
- [Replicate supported models](./src/providers/replicate.ts)
- [Sambanova supported models](./src/providers/sambanova.ts)
- [Together supported models](./src/providers/together.ts)
- [HF Inference API (serverless)](https://huggingface.co/models?inference=warm&sort=trending)

- #### Tree-shaking
+ **Important note:** To be compatible, the third-party API must adhere to the "standard" API shape we expect on HF model pages for each pipeline task type.
+ This is not an issue for LLMs, as everyone has converged on the OpenAI API anyway, but it can be trickier for other tasks like "text-to-image" or "automatic-speech-recognition", where no standard API exists. Let us know if any help is needed or if we can make things easier for you!
+
+ 👋 **Want to add another provider?** Get in touch if you'd like to add support for another Inference provider, and/or request it on https://huggingface.co/spaces/huggingface/HuggingDiscussions/discussions/49
+
+ ### Tree-shaking

You can import the functions you need directly from the module instead of using the `HfInference` class.
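
For example, a sketch using the standalone `textToImage` export, with the access token passed directly in the call arguments:

```ts
import { textToImage } from "@huggingface/inference";

await textToImage({
  accessToken: "hf_...",
  model: "black-forest-labs/FLUX.1-dev",
  inputs: "a picture of a green bird",
});
```

This lets bundlers drop the task functions you never import.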

