Adding Snowflake Provider Support + Distribution Template #351

Draft · sfc-gh-alherrera wants to merge 4 commits into main
Conversation

sfc-gh-alherrera

Adds support for inference with Snowflake's Cortex endpoint. See the docs for more background on Cortex: https://docs.snowflake.com/en/user-guide/snowflake-cortex/cortex-llm-rest-api
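For context, here is a rough sketch of the kind of request the remote::snowflake adapter issues against Cortex. The endpoint path, header names, and payload fields below are assumptions drawn from the linked docs rather than the exact code in this PR:

```python
# Hedged sketch only -- endpoint path, headers, and payload fields are
# assumptions based on the Cortex LLM REST API docs linked above.
import requests

account_url = "https://<account>.snowflakecomputing.com"  # placeholder account URL
api_token = "<snowflake-api-token>"                       # placeholder credential

resp = requests.post(
    f"{account_url}/api/v2/cortex/inference:complete",  # assumed endpoint path
    headers={
        "Authorization": f"Bearer {api_token}",
        "Content-Type": "application/json",
        "Accept": "application/json",
    },
    json={
        "model": "llama3.1-70b",  # one of the Llama models hosted by Cortex
        "messages": [{"role": "user", "content": "Hello from Llama Stack"}],
    },
)
resp.raise_for_status()
print(resp.text)
```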

Testing

  • Built a new stack from scratch using remote::snowflake for the Inference API
  • Ran python -m llama_stack.apis.inference.client localhost
  • Directly submitted a request to the Llama Stack server (see the sketch below)
[Two screenshots of test output, Oct 31 2024]
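For the last check, a direct request to the running server looks roughly like this; the port and route are assumptions based on the Llama Stack server defaults at the time, so adjust to your run config:

```python
# Hedged example of submitting a request directly to the Llama Stack server.
# Port 5000 and the /inference/chat_completion route are assumptions; use
# whatever your run configuration exposes.
import requests

response = requests.post(
    "http://localhost:5000/inference/chat_completion",
    json={
        "model": "Llama3.1-8B-Instruct",  # any model served via remote::snowflake
        "messages": [{"role": "user", "content": "What is Snowflake Cortex?"}],
        "stream": False,
    },
)
print(response.json())
```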

facebook-github-bot added the CLA Signed label on Oct 31, 2024.
providers:
  inference: remote::snowflake
  memory: meta-reference
  safety: meta-reference
Contributor
Are you able to offer llama guard inference as well instead of relying on meta-reference?

Contributor

@raghotham This is actually desirable. Safety impl should be meta-reference -- the important question is that there be a Llama-Guard model available for inference.

sfc-gh-alherrera (Author) · Nov 5, 2024

Unfortunately we don't have a llama-guard model hosted separately in our LLM service (see here for available models right now: https://docs.snowflake.com/user-guide/snowflake-cortex/llm-functions#availability).

We do use Llama Guard in Cortex Guard: https://www.snowflake.com/en/blog/snowflake-cortex-ai-cortex-guard-llm-safeguards/. However, it isn't exposed or hosted separately; it is offered as an optional argument on our normal completion endpoint, meaning the Inference API response is filtered with Llama Guard on the Cortex server side when that option is enabled.

Could we make that work somehow?
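One possible shape, sketched below: pass the guard through as an extra option on the completion request, the way the guardrails option works in the SQL COMPLETE function. Whether the REST endpoint accepts the same field, and what Llama Stack would call it, are assumptions here:

```python
# Hedged sketch: toggling Cortex Guard on the completion request itself.
# The "guardrails" field mirrors the options object of the SQL COMPLETE()
# function; its availability on the REST endpoint is an assumption.
import requests

payload = {
    "model": "llama3.1-70b",
    "messages": [{"role": "user", "content": "Tell me how to pick a lock."}],
    "guardrails": True,  # hypothetical: filter the response server-side with Llama Guard
}

resp = requests.post(
    "https://<account>.snowflakecomputing.com/api/v2/cortex/inference:complete",  # assumed path
    headers={"Authorization": "Bearer <token>", "Content-Type": "application/json"},
    json=payload,
)
print(resp.text)
```

From the Llama Stack side that would look more like a provider-specific inference option than a standalone Safety provider, which may be why it doesn't map cleanly onto the meta-reference safety impl.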

sfc-gh-alherrera (Author)

@ashwinb @raghotham please let me know if you have specific guidance or comments on updating the implementation given the above context, or if we are good to merge for now. In the future I'd like to build out additional components, like memory, which Snowflake can support as well.

sfc-gh-alherrera (Author)

Hi @ashwinb @raghotham, just circling back here. What do you need from our side to get this merged? Thanks.

Contributor

@sfc-gh-alherrera sorry about not responding sooner. We have been working to stabilize the APIs and automate verification for providers. Can you take a look at the latest code and rebase your PR?
