Adding Snowflake Provider Support + Distribution Template #351
base: main
Conversation
providers:
  inference: remote::snowflake
  memory: meta-reference
  safety: meta-reference
Are you able to offer llama guard inference as well instead of relying on meta-reference?
@raghotham This is actually desirable. The safety impl should be meta-reference -- the important question is whether a Llama-Guard model is available for inference.
Unfortunately we don't have a llama-guard model hosted separately in our LLM service (see here for the models available right now: https://docs.snowflake.com/user-guide/snowflake-cortex/llm-functions#availability).
We do use llama-guard in Cortex Guard (https://www.snowflake.com/en/blog/snowflake-cortex-ai-cortex-guard-llm-safeguards/), but it isn't exposed or hosted separately; it is offered as an optional argument on our normal completion serving endpoint. That means the Inference API response is filtered with llama-guard on the Cortex server side when this optional argument is enabled.
Could we make that work somehow?
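For illustration, a minimal sketch of what the option described above could look like from a provider's point of view: Llama Guard filtering is requested through an optional argument on the normal completion payload rather than through a separately hosted model. The field name below is a placeholder, not the documented Cortex API.

def build_completion_payload(model: str, messages: list[dict], filter_with_cortex_guard: bool = False) -> dict:
    # Base payload for the normal completion serving endpoint.
    payload = {"model": model, "messages": messages}
    if filter_with_cortex_guard:
        # Placeholder for the optional argument mentioned above; the real
        # field name and accepted values are defined by the Cortex docs.
        payload["guardrails"] = True
    return payload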
@ashwinb @raghotham please let me know if you have specific guidance or comments on updating the implementation given the above context, or if we are good to merge for now. In the future, I'd like to build out additional components, like memory, that Snowflake can support as well.
Hi @ashwinb @raghotham - just circling back here. What do you need from our side to get this merged? Thanks.
@sfc-gh-alherrera sorry about not responding sooner. We have been working to stabilize the APIs and automate verification for providers. Can you take a look at the latest code and rebase your PR?
Adds support for inference with Snowflake's Cortex endpoint. See the docs for more background on Cortex: https://docs.snowflake.com/en/user-guide/snowflake-cortex/cortex-llm-rest-api
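For background, a rough sketch of a direct call to the Cortex REST endpoint this provider targets. The account URL, token, and model name are placeholders, and the exact path, payload fields, and response handling (including streaming) should be checked against the linked docs.

import requests

# Placeholders: substitute a real account URL, auth token, and model name.
BASE_URL = "https://<account_identifier>.snowflakecomputing.com"
TOKEN = "<token>"

resp = requests.post(
    f"{BASE_URL}/api/v2/cortex/inference:complete",
    headers={"Authorization": f"Bearer {TOKEN}", "Content-Type": "application/json"},
    json={
        "model": "llama3.1-70b",
        "messages": [{"role": "user", "content": "Hello from Llama Stack"}],
    },
    timeout=60,
)
print(resp.status_code)
print(resp.text)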
Testing
python -m llama_stack.apis.inference.client localhost
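As an alternative quick check without the client module, something like the following exercises the locally running stack directly. The port, route, and payload fields are assumptions about the local distribution and should be adjusted to match its configuration.

import requests

# Assumptions: the stack serves locally on port 5000 and exposes a
# chat-completion route at this path; adjust both to match your distribution.
resp = requests.post(
    "http://localhost:5000/inference/chat_completion",
    json={
        "model": "llama3.1-70b",  # illustrative model name
        "messages": [{"role": "user", "content": "hello"}],
        "stream": False,
    },
    timeout=60,
)
print(resp.status_code)
print(resp.text)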