add HorizontalPodAutoscaler stub to catalog #15
Conversation
Thanks a lot for putting this together!!
adding "stub": |
/lgtm /assign Bslabe123 |
@raywainman: GitHub didn't allow me to assign the following users: Bslabe123. Note that only kubernetes-sigs members with read permissions, repo collaborators and people who have commented on this issue/PR can be assigned. Additionally, issues/PRs can only have 10 assignees at the same time. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
/assign @Bslabe123
/lgtm
@Bslabe123: changing LGTM is restricted to collaborators. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
serving-catalog/core/deployment/jetstream/gemma-7b-it/gke/kustomization.yaml
@@ -0,0 +1,20 @@
apiVersion: autoscaling/v2
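Only the first line of the new file is visible in this excerpt. As a rough sketch, an autoscaling/v2 stub wired to the token-latency-ms custom metric described in this PR might look like the following; the resource names, replica bounds, and averageValue are illustrative assumptions, not values taken from the PR:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: jetstream-hpa                # illustrative name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: jetstream-gemma-7b-it      # assumed target deployment name
  minReplicas: 1                     # illustrative bounds
  maxReplicas: 10
  metrics:
    - type: Pods
      pods:
        metric:
          name: token-latency-ms     # the (currently hypothetical) custom metric from the PR description
        target:
          type: AverageValue
          averageValue: "500"        # illustrative per-pod latency target
```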
Thoughts on changing this to live under components/hpa/jetstream/<hpa_target>? (and the same for vllm, etc.)
The patch in this directory would then be reduced to metadata and target averageValue overrides for the specific deployment.
Part of my assumption is that we may end up with different HPAs for token latency, throughput, etc. One may be defined in the deployment's kustomization.yaml as a default, but a user could override it with a different HPA component.
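For concreteness, here is a minimal sketch of what a deployment-level kustomization.yaml could look like under that proposed layout; the relative paths and the token-latency component name are assumptions for illustration, not files in this PR:

```yaml
# core/deployment/jetstream/gemma-7b-it/gke/kustomization.yaml (sketch)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ../base
components:
  # default HPA flavor; a user could swap in a throughput-based component instead
  - ../../../components/hpa/jetstream/token-latency
patches:
  # deployment-specific metadata and target averageValue overrides
  - path: hpa-patch.yaml
```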
Would that kind of structure accommodate adding more servers, metrics, or other dimensions beyond the model later on? Or do you think we might run into a scenario where we end up adding something like components/hpa/*/new-metric for each of 10 different servers that are all basically the same?
I think of components/hpa as a bank of HPA configs keyed on model server and metric. I recommended components because there may be more than one applicable HPA config per model server and model combination, and storing the alternatives somewhere may still be valuable.

> Or do you think we might run into a scenario where we end up adding something like components/hpa/*/new-metric for each of 10 different servers that are all basically the same?

I think this is a symptom of model servers defining different metrics: if we are adding HPA config patches to the directory structure, we have to define HPA patches for 10 different servers either way. I agree there is repetition with components as well. We may be able to put a basic hpa.yaml in core/deployment/base, have components override only the metric, and let the final patch override names and averageValue. Perhaps in the future there will be universal metrics and we can add components/hpa/universal?
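As a sketch of that split, assuming a generic hpa.yaml lives in core/deployment/base, a per-server component could be as small as a single metric override (the paths and metric name here are illustrative):

```yaml
# components/hpa/jetstream/token-latency/kustomization.yaml (sketch)
apiVersion: kustomize.config.k8s.io/v1alpha1
kind: Component
patches:
  - target:
      kind: HorizontalPodAutoscaler
    patch: |-
      # swap the base HPA's placeholder metric for this server's latency metric
      - op: replace
        path: /spec/metrics/0/pods/metric/name
        value: token-latency-ms
```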
Sorry, took a while fiddling with this but I think I have it ironed out now.
@raywainman @skonto @jjk-g is this ready for merge?
- op: add
  path: /metadata/labels
  value:
    app: llama3-70b-vllm-inference-server
Not in this PR's scope, but name/label overriding feels like one of the more repetitive and fragile patterns in the templating.
Would you prefer that each flavor get its own integral name+label-specific stub? What would a more sustainable approach look like? We can try something out as a follow-up PR across all catalog definitions...
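One candidate for that follow-up (a sketch of kustomize's built-in labels transformer, not something this PR adopts) would be to set the label in the kustomization itself rather than hand-writing a JSON patch per flavor:

```yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
labels:
  - pairs:
      app: llama3-70b-vllm-inference-server
    includeSelectors: false   # apply to metadata labels only; leave selectors untouched
```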
/lgtm Thanks for the contribution!
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: jjk-g, nojnhuh. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
This PR adds a HorizontalPodAutoscaler to each of the deployment permutations currently specified by the serving-catalog. It triggers based on a (currently hypothetical) token-latency-ms custom metric.