Compatibility with Huggingface Serverless inference #453

Open
haesleinhuepf opened this issue Jan 4, 2025 · 4 comments · May be fixed by #454
Comments

@haesleinhuepf
Owner

git-bob comment on how to make a new endpoint in this repository compatible with Huggingface's Serverless Inference. Use Llama 3.3 70B as the default model for this.

If you don't know how to do this, no problem. Just say that and I'll search for example code.

Contributor

github-actions bot commented Jan 4, 2025

This message was generated by git-bob (version: 0.17.0, model: claude-3-5-sonnet-20241022, log), an experimental AI-based assistant. It can make mistakes and has limitations. Check its messages carefully.

I would need to look at examples from Huggingface's documentation to implement this correctly since I'm not familiar with the exact API interface for their serverless inference. I can't make confident recommendations without knowing their specific requirements and endpoint structure.

Would you mind sharing an example of how their serverless inference endpoints typically work? Then I can help adapt the codebase to make it compatible.

@haesleinhuepf
Owner Author

You can find an example of how to use the API here: https://github.com/ScaDS/generative-ai-notebooks/blob/main/docs/15_endpoint_apis/31_huggingface_serverless_inference_api.ipynb

git-bob think about this

Contributor

github-actions bot commented Jan 4, 2025

This message was generated by git-bob (version: 0.17.0, model: claude-3-5-sonnet-20241022, log), an experimental AI-based assistant. It can make mistakes and has limitations. Check its messages carefully.

Based on the example notebook, I can help implement a Huggingface serverless endpoint. Here's a proposed addition to _endpoints.py:

def prompt_huggingface(message: str, model="meta-llama/Llama-3.3-70B-Instruct"):
    """A prompt helper function that uses Huggingface's serverless Inference API."""
    import os
    from huggingface_hub import InferenceClient

    # Authenticate using the HF_TOKEN environment variable
    client = InferenceClient(token=os.environ["HF_TOKEN"])

    # Send the prompt to the model's serverless endpoint and return the generated text
    return client.text_generation(
        message,
        model=model,
        temperature=0.7,
        top_p=0.95,
        stop_sequences=["\n\n"]
    )

You'll need to set the HF_TOKEN environment variable with your Huggingface API token to use this.
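As a minimal usage sketch (assuming the function above lives in _endpoints.py and that HF_TOKEN has already been exported in the shell; the import path depends on the package layout):

# Assumes HF_TOKEN was set beforehand, e.g. `export HF_TOKEN=hf_...`
from _endpoints import prompt_huggingface

answer = prompt_huggingface("What is the capital of France?")
print(answer)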

@haesleinhuepf
Owner Author

Ok, git-bob implement this, add a corresponding elif-block to _terminal.py, and add appropriate information to the README.
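For reference, such an elif-block might look roughly like the following. This is a sketch under assumptions: _terminal.py is not shown in this thread, so the dispatch variable `model` and the `prompt_function` convention are guesses, not the repository's actual code.

# Hypothetical addition to _terminal.py; the surrounding if/elif chain over
# the model name and the prompt_function convention are assumptions.
elif "meta-llama" in model or "huggingface" in model:
    from functools import partial
    from ._endpoints import prompt_huggingface
    # Bind the selected model so callers can invoke prompt_function(message)
    prompt_function = partial(prompt_huggingface, model=model)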

github-actions bot linked a pull request Jan 4, 2025 that will close this issue