
add HorizontalPodAutoscaler stub to catalog #15

Merged
merged 1 commit into kubernetes-sigs:main on Oct 29, 2024

Conversation

@nojnhuh (Contributor) commented Oct 3, 2024

This PR adds a HorizontalPodAutoscaler to each of the deployment permutations currently specified by the serving-catalog. It triggers based on a (currently hypothetical) token-latency-ms custom metric.
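As a rough sketch of the shape such a stub takes (the object names, replica bounds, and threshold below are illustrative assumptions, not the exact contents of the merged hpa.yaml), an autoscaling/v2 HPA keyed on a Pods-type custom metric looks like this:

```yaml
# Illustrative sketch only: names, replica bounds, and the threshold are assumptions.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: inference-server          # hypothetical; patched per deployment permutation
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: inference-server        # hypothetical; overridden for each catalog flavor
  minReplicas: 1
  maxReplicas: 10
  metrics:
    - type: Pods
      pods:
        metric:
          name: token-latency-ms  # the hypothetical custom metric named in the description
        target:
          type: AverageValue
          averageValue: "100"     # placeholder latency target (ms)
```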

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Oct 3, 2024
@k8s-ci-robot k8s-ci-robot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Oct 3, 2024
@nojnhuh (Contributor, Author) commented Oct 3, 2024

/cc @jackfrancis @raywainman

@raywainman left a comment

Thanks a lot for putting this together!!

serving-catalog/core/deployment/base/hpa.yaml (review thread: outdated, resolved)
@nojnhuh (Contributor, Author) commented Oct 9, 2024

adding "stub":
/retitle add HorizontalPodAutoscaler stub to catalog

@k8s-ci-robot k8s-ci-robot changed the title add HorizontalPodAutoscaler to catalog add HorizontalPodAutoscaler stub to catalog Oct 9, 2024
@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Oct 9, 2024
@raywainman

/lgtm

/assign Bslabe123

@k8s-ci-robot (Contributor)

@raywainman: GitHub didn't allow me to assign the following users: Bslabe123.

Note that only kubernetes-sigs members with read permissions, repo collaborators and people who have commented on this issue/PR can be assigned. Additionally, issues/PRs can only have 10 assignees at the same time.
For more information please see the contributor guide

In response to this:

/lgtm

/assign Bslabe123

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 9, 2024
@raywainman

/assign @Bslabe123

@k8s-ci-robot (Contributor)

@raywainman: GitHub didn't allow me to assign the following users: Bslabe123.

Note that only kubernetes-sigs members with read permissions, repo collaborators and people who have commented on this issue/PR can be assigned. Additionally, issues/PRs can only have 10 assignees at the same time.
For more information please see the contributor guide

In response to this:

/assign @Bslabe123

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@Bslabe123 (Contributor)

/assign @Bslabe123

@Bslabe123 (Contributor)

/lgtm

@k8s-ci-robot (Contributor)

@Bslabe123: changing LGTM is restricted to collaborators

In response to this:

/lgtm

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 16, 2024
@@ -0,0 +1,20 @@
apiVersion: autoscaling/v2

Contributor

Thoughts on changing this to be under components/hpa/jetstream/<hpa_target>? (and the same for vllm, etc.)

Then the patch in this directory becomes just the metadata and target averageValue overrides for the specific deployment.

Part of my assumption is that we may have different HPAs for token latency, throughput, etc. One may be defined in the deployment's kustomization.yaml as a default, but a user could override it with a different HPA component.
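For illustration, a component along these lines might look like the sketch below; the paths and file names are hypothetical, not taken from this PR:

```yaml
# Hypothetical layout (illustrative only):
#   components/hpa/jetstream/<hpa_target>/kustomization.yaml
#   components/hpa/vllm/<hpa_target>/kustomization.yaml
# Each component carries the server/metric-specific HPA patch, e.g.:
apiVersion: kustomize.config.k8s.io/v1alpha1
kind: Component
patches:
  - path: hpa-metric-patch.yaml   # hypothetical file overriding the metrics block
    target:
      kind: HorizontalPodAutoscaler
```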

@nojnhuh (Contributor, Author)

Would that kind of structure accommodate adding more servers, metrics, or dimensions later beyond those like the model? Or do you think we might run into a scenario where we end up doing something like adding some components/hpa/*/new-metric for each of 10 different servers that are all basically the same?

@jjk-g (Contributor) commented Oct 18, 2024

I think of components/hpa as a bank of HPA configs keyed by model server and metric. I recommended components because there may be more than one applicable HPA config per model server and model combination, and storing the alternatives somewhere may still be valuable.

> Or do you think we might run into a scenario where we end up doing something like adding some components/hpa/*/new-metric for each of 10 different servers that are all basically the same?

I think this is a symptom of model servers defining different metrics. If we are adding HPA config patches to the directory structure, we still end up having to define HPA patches for 10 different servers.

I agree there is repetition with components as well. We may be able to put a basic hpa.yaml in core/deployment/base so that components just override the metric and the final patch overrides the names and averageValue. Perhaps in the future there will be universal metrics and we can add components/hpa/universal?
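A sketch of that layering with hypothetical paths and values: the base ships a generic hpa.yaml, a component swaps in the server-specific metric, and the deployment-level patch sets the final name and averageValue.

```yaml
# Hypothetical deployment-level kustomization composing the layers described above.
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ../../base                                    # ships the generic hpa.yaml
components:
  - ../../../components/hpa/vllm/token-latency    # overrides the metrics block
patches:
  - target:
      kind: HorizontalPodAutoscaler
    patch: |-
      - op: replace
        path: /metadata/name
        value: llama3-70b-vllm-inference-server   # hypothetical final name
      - op: replace
        path: /spec/metrics/0/pods/target/averageValue
        value: "250"                              # hypothetical threshold
```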

@nojnhuh (Contributor, Author)

Sorry, took a while fiddling with this but I think I have it ironed out now.

@jackfrancis

@raywainman @skonto @jjk-g is this ready for merge?

- op: add
  path: /metadata/labels
  value:
    app: llama3-70b-vllm-inference-server

Contributor

Not in scope for this PR, but name/label overriding feels like one of the more repetitive and fragile patterns in the templating.


Would you prefer that each flavor get its own integral, name+label-specific stub? What would a more sustainable approach look like? We can try something out as a follow-up PR across all catalog definitions...
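As one purely illustrative direction (not something settled in this thread), kustomize's built-in labels transformer could apply a flavor's label once per kustomization instead of repeating a JSON patch on each resource:

```yaml
# Hypothetical flavor-level kustomization using the labels transformer
# in place of a per-resource /metadata/labels patch.
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ../../base
labels:
  - pairs:
      app: llama3-70b-vllm-inference-server
    includeSelectors: false   # set true if pod selectors should be rewritten too
```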

@jjk-g (Contributor) commented Oct 29, 2024

/lgtm
/approve

thanks for the contribution!

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 29, 2024
@k8s-ci-robot (Contributor)

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jjk-g, nojnhuh

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 29, 2024
@k8s-ci-robot k8s-ci-robot merged commit 93745fc into kubernetes-sigs:main Oct 29, 2024
2 checks passed
@nojnhuh nojnhuh deleted the hpa-catalog branch November 4, 2024 21:45