add HorizontalPodAutoscaler stub to catalog #15
Conversation
Thanks a lot for putting this together!!
adding "stub": |
/lgtm /assign Bslabe123 |
@raywainman: GitHub didn't allow me to assign the following users: Bslabe123. Note that only kubernetes-sigs members with read permissions, repo collaborators and people who have commented on this issue/PR can be assigned. Additionally, issues/PRs can only have 10 assignees at the same time. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
/assign @Bslabe123
/lgtm
@Bslabe123: changing LGTM is restricted to collaborators. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
serving-catalog/core/deployment/jetstream/gemma-7b-it/gke/kustomization.yaml
@@ -0,0 +1,20 @@
apiVersion: autoscaling/v2
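Only the first line of the new file is visible in this excerpt. As a rough sketch, an autoscaling/v2 stub wired to the token-latency-ms custom metric described in this PR might look like the following; the resource names, replica bounds, and averageValue are illustrative assumptions, not values taken from the PR:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: jetstream-hpa                # illustrative name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: jetstream-gemma-7b-it      # assumed target deployment name
  minReplicas: 1                     # illustrative bounds
  maxReplicas: 10
  metrics:
    - type: Pods
      pods:
        metric:
          name: token-latency-ms     # the (currently hypothetical) custom metric from the PR description
        target:
          type: AverageValue
          averageValue: "500"        # illustrative per-pod latency target
```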
Thoughts on changing this to live under components/hpa/jetstream/<hpa_target>? (and the same for vllm, etc.)
The patch in this directory would then be reduced to metadata and target averageValue overrides for the specific deployment.
Part of my assumption is that we may end up with different HPAs for token latency, throughput, etc. One may be defined in the deployment's kustomization.yaml as a default, but a user could override it with a different HPA component.
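For concreteness, here is a minimal sketch of what a deployment-level kustomization.yaml could look like under that proposed layout; the relative paths and the token-latency component name are assumptions for illustration, not files in this PR:

```yaml
# core/deployment/jetstream/gemma-7b-it/gke/kustomization.yaml (sketch)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ../base
components:
  # default HPA flavor; a user could swap in a throughput-based component instead
  - ../../../components/hpa/jetstream/token-latency
patches:
  # deployment-specific metadata and target averageValue overrides
  - path: hpa-patch.yaml
```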
Would that kind of structure accommodate adding more servers, metrics, or other dimensions beyond the model later on? Or do you think we might run into a scenario where we end up adding something like components/hpa/*/new-metric for each of 10 different servers that are all basically the same?
I think of components/hpa as a bank of HPA configs keyed on model server and metric. I recommended components because there may be more than one applicable HPA config per model server and model combination, and storing the alternatives somewhere may still be valuable.

> Or do you think we might run into a scenario where we end up adding something like components/hpa/*/new-metric for each of 10 different servers that are all basically the same?

I think this is a symptom of model servers defining different metrics: if we are adding HPA config patches to the directory structure, we have to define HPA patches for 10 different servers either way. I agree there is repetition with components as well. We may be able to put a basic hpa.yaml in core/deployment/base, have components override only the metric, and let the final patch override names and averageValue. Perhaps in the future there will be universal metrics and we can add components/hpa/universal?
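As a sketch of that split, assuming a generic hpa.yaml lives in core/deployment/base, a per-server component could be as small as a single metric override (the paths and metric name here are illustrative):

```yaml
# components/hpa/jetstream/token-latency/kustomization.yaml (sketch)
apiVersion: kustomize.config.k8s.io/v1alpha1
kind: Component
patches:
  - target:
      kind: HorizontalPodAutoscaler
    patch: |-
      # swap the base HPA's placeholder metric for this server's latency metric
      - op: replace
        path: /spec/metrics/0/pods/metric/name
        value: token-latency-ms
```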
Sorry, took a while fiddling with this but I think I have it ironed out now.
@raywainman @skonto @jjk-g is this ready for merge?
- op: add
  path: /metadata/labels
  value:
    app: llama3-70b-vllm-inference-server
Not in this PR's scope, but name/label overriding feels like one of the more repetitive and fragile patterns in the templating.
Would you prefer that each flavor get its own integral name+label-specific stub? What would a more sustainable approach look like? We can try something out as a follow-up PR across all catalog definitions...
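One candidate for that follow-up (a sketch of kustomize's built-in labels transformer, not something this PR adopts) would be to set the label in the kustomization itself rather than hand-writing a JSON patch per flavor:

```yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
labels:
  - pairs:
      app: llama3-70b-vllm-inference-server
    includeSelectors: false   # apply to metadata labels only; leave selectors untouched
```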
/lgtm Thanks for the contribution!
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: jjk-g, nojnhuh. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
This PR adds a HorizontalPodAutoscaler to each of the deployment permutations currently specified by the serving-catalog. It triggers based on a (currently hypothetical) token-latency-ms custom metric.