
Add parameter to customize the number of cinderAPI processes #474

Closed
wants to merge 1 commit into from

Conversation

fmount
Contributor

@fmount fmount commented Dec 5, 2024

No description provided.

@fmount fmount requested review from eharney and ASBishop December 5, 2024 08:59
@fmount fmount marked this pull request as draft December 5, 2024 08:59
@openshift-ci openshift-ci bot requested review from stuggi and viroel December 5, 2024 08:59
Contributor

openshift-ci bot commented Dec 5, 2024

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: fmount
Once this PR has been reviewed and has the lgtm label, please assign abays for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@@ -666,6 +666,11 @@ spec:
type: object
transportURLSecret:
type: string
workers:
default: 4
Contributor Author

@ASBishop do you think we need a better default? And also, I'm not sure minimum: 1 is the right thing to do; we might want to start with 4 as the minimum value and scale up.

// +kubebuilder:default=4
// +kubebuilder:validation:Minimum=1
// Workers - Number of processes running CinderAPI
Workers *int32 `json:"workers"`
Contributor Author

@ASBishop as this is a parameter we tune for CinderAPI, my idea is to keep it under CinderAPITemplate: it makes it clear how to access this value from cinder_controller. Do you think there is a better place to expose this value?
FYI I'm doing the same thing for Manila; I'd like to keep the two operators in sync.
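Assuming the field lands under CinderAPITemplate as proposed, a user would set it per-API in the Cinder CR. The exact surrounding structure below is illustrative, not copied from the operator's samples:

```yaml
apiVersion: cinder.openstack.org/v1beta1
kind: Cinder
metadata:
  name: cinder
spec:
  cinderAPI:
    replicas: 1
    # hypothetical override of the kubebuilder default of 4
    workers: 8
```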

@fmount fmount marked this pull request as ready for review December 5, 2024 09:05
@openshift-ci openshift-ci bot requested a review from frenzyfriday December 5, 2024 09:05
@fmount
Contributor Author

fmount commented Dec 5, 2024

@ASBishop one interesting thing here is that we have apiTimeout as a top-level parameter, while I think it's specific to CinderAPI.
I'd like to consolidate those parameters into a substruct (httpdTuning or something like that), scoped at the CinderAPITemplate level, but that might be part of a follow-up patch to keep the focus on the goal of this change.

@stuggi
Contributor

stuggi commented Dec 5, 2024

For httpd tuning, check this keystone PR: openstack-k8s-operators/keystone-operator#500. We should do the same here.

// +kubebuilder:default=4
// +kubebuilder:validation:Minimum=1
// ProcessNumber - Number of processes running in Cinder API
ProcessNumber *int32 `json:"processNumber"`
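Since the field is an optional *int32, controller code typically guards against a nil pointer even though the +kubebuilder:default=4 marker makes the API server fill in the default for new objects. A minimal sketch of that pattern (the helper name is mine, not from the operator):

```go
package main

import "fmt"

// defaultProcessNumber mirrors the +kubebuilder:default=4 marker.
const defaultProcessNumber int32 = 4

// processNumber returns the configured value, falling back to the
// default when the field was never set (nil pointer), e.g. for CRs
// created before the field existed.
func processNumber(configured *int32) int32 {
	if configured == nil {
		return defaultProcessNumber
	}
	return *configured
}

func main() {
	eight := int32(8)
	fmt.Println(processNumber(nil))    // falls back to the default
	fmt.Println(processNumber(&eight)) // explicit value wins
}
```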
Contributor

At least in nova-operator we try to keep the replica as the unit of scaling for a service, with a hardcoded process number. What is the case where replica-based scaling is not a good solution for Cinder?

Contributor Author

There were discussions on architectural aspects where you can't scale replicas (because of scheduling constraints), but you still want to tweak this parameter on the same API.
My initial understanding was pretty much the same as yours: use replicas, as they represent the building block for scaling the API, but to reach the same number of processes with a smaller number of replicas you might need to tune this value [1].

[1] https://issues.redhat.com/browse/OSPRH-10363
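The tradeoff being discussed is simple arithmetic: the effective API capacity is replicas × processes per pod, so a deployment constrained to fewer replicas needs more processes per pod to reach the same total. A trivial illustration (names are mine, not from the operator):

```go
package main

import "fmt"

// totalWorkers is the effective number of cinder-api processes serving
// requests: each replica (pod) runs processesPerPod httpd/WSGI processes.
func totalWorkers(replicas, processesPerPod int32) int32 {
	return replicas * processesPerPod
}

func main() {
	// Same capacity, two shapes: scale out pods, or scale up processes.
	fmt.Println(totalWorkers(4, 2)) // 8 workers via 4 replicas
	fmt.Println(totalWorkers(2, 4)) // 8 workers via 2 replicas
}
```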

Contributor

There were discussions on architectural aspects where you can't scale replicas (because of scheduling constraints), but you still want to tweak this parameter on the same API.

What are the scheduling constraints that prevent us from having many replicas? I believe our anti-affinity is soft, so it only tries to spread the pods across worker nodes but does not fail if more pods need to land on the same worker. I looked at [1] but I don't see the actual scheduling problem described there.

Contributor

Moreover, if we open up the process config to the user, then it is easy to make a nonsensical config with a high process count but a low CPU resource request. I still think that having a hardcoded process count helps define a meaningful resource request for our pods.

Contributor Author

That is a good point. We don't have a way to keep all these parameters consistent and still scale in a way that keeps track of resource consumption. I'm reconsidering this set of patches because they're probably trying to solve a problem that is already addressed at the k8s level. I see we have a default number of processes: 4. Do you know if there are better defaults, or did you do anything in Nova that can lead us to choose better numbers (and if so, what metric was used in this context)?

Contributor

That is a good point. We don't have a way to keep all these parameters consistent and still scale in a way that keeps track of resource consumption. I'm reconsidering this set of patches because they're probably trying to solve a problem that is already addressed at the k8s level. I see we have a default number of processes: 4. Do you know if there are better defaults, or did you do anything in Nova that can lead us to choose better numbers (and if so, what metric was used in this context)?

In nova-operator we ended up with processes=2 [1], based on the tempest results and troubleshooting of some probe timeouts. I think 4 is equally valid if you see that as a good fit for the steps of scaling. We will probably gain more insight from the field, especially for the CPU and memory resource requests. For example, I'm currently fixing and performance-measuring a placement-api timeout / memory-consumption spike upstream, so I have data showing that an idle placement-api process needs 100 MB, but during an allocation-candidate query it easily spikes up to 650 MB while pegging one CPU core. Similar data is probably needed for each service to set up good defaults.

[1] https://github.com/openstack-k8s-operators/nova-operator/blob/main/templates/novaapi/config/httpd.conf#L76
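The nova-operator template referenced above wires that count into httpd's mod_wsgi configuration. An illustrative fragment (not copied verbatim from the template; directive values are assumptions) of how a fixed per-pod process count looks in practice:

```apache
# Fixed number of WSGI daemon processes per pod, one thread each;
# scaling beyond this is done by adding replicas, not processes.
WSGIDaemonProcess nova-api processes=2 threads=1 display-name=%{GROUP}
WSGIProcessGroup nova-api
```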

Contributor Author

Thank you for sharing. We haven't conducted this kind of exercise yet, and it is a prerequisite for learning more about resource consumption and tuning the default parameters. I don't see any particular problem coming from tempest with processes=4 as the default, but it's worth investigating in the future.

Contributor

Yeah, I think 4 is fine. I would advise against 1 or anything bigger than 4, though. Or maybe even bigger than 4 is OK if the resource request is also set accordingly.

Contributor

@Deydra71 since you added this to keystone, was there a reason for keystone to be able to customize the processes, instead of bumping the replicas?


@stuggi
Basically just the Jira conversation: https://issues.redhat.com/browse/OSPRH-10363

@fmount fmount closed this Dec 5, 2024
4 participants