Add parameter to customize the number of cinderAPI processes #474
Conversation
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull-request has been approved by: fmount. The full list of commands accepted by this bot can be found here.
@@ -666,6 +666,11 @@ spec:
             type: object
           transportURLSecret:
             type: string
+          workers:
+            default: 4
@ASBishop do you think we need a better default? Also, I'm not sure `minimum: 1` is the right thing to do; we might want to start with 4 as the minimum value and scale up.
api/v1beta1/cinderapi_types.go (Outdated)
// +kubebuilder:default=4
// +kubebuilder:validation:Minimum=1
// Workers - Number of processes running CinderAPI
Workers *int32 `json:"workers"`
@ASBishop as this is a parameter we tune for CinderAPI, my idea is to keep it under CinderAPITemplate: it makes it clear how to access this value from cinder_controller. Do you think we have a better place to expose this value? FYI I'm doing the same thing for Manila; I'd like to keep the two operators in sync.
@ASBishop one interesting thing here is that we have the same need for httpd tuning; check this keystone PR openstack-k8s-operators/keystone-operator#500. We should do the same here.
Signed-off-by: Francesco Pantano <[email protected]>
// +kubebuilder:default=4
// +kubebuilder:validation:Minimum=1
// ProcessNumber - Number of processes running in Cinder API
ProcessNumber *int32 `json:"processNumber"`
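As a point of reference, a minimal sketch of how the optional field's default could be resolved in controller code. CinderAPISpec and resolveProcessNumber are hypothetical names for illustration, not the actual cinder-operator code; only the kubebuilder markers above come from the PR.

```go
package main

import "fmt"

// CinderAPISpec is a trimmed, hypothetical stand-in for the real
// CinderAPITemplate; only the ProcessNumber field mirrors the code above.
type CinderAPISpec struct {
	ProcessNumber *int32 `json:"processNumber"`
}

// resolveProcessNumber applies the +kubebuilder:default=4 semantics
// in plain Go, for the case where the field was left unset.
func resolveProcessNumber(spec CinderAPISpec) int32 {
	if spec.ProcessNumber != nil {
		return *spec.ProcessNumber
	}
	return 4
}

func main() {
	n := int32(8)
	fmt.Println(resolveProcessNumber(CinderAPISpec{}))                  // 4 (default)
	fmt.Println(resolveProcessNumber(CinderAPISpec{ProcessNumber: &n})) // 8
}
```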
At least in nova-operator we try to keep the replica as the unit of scaling for a service, with a hardcoded process number. What is the case where replica-based scaling is not a good solution for cinder?
There were discussions on architectural aspects where you can't scale replicas (because of scheduling constraints) but still want to tweak this parameter on the same API.
My initial understanding was pretty much the same as yours: use replicas, as they represent the building block for scaling the API; but to reach the same number of processes with a smaller number of replicas you might need to tune this value [1].
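To make the trade-off concrete, a small illustrative sketch; the replica/process combinations below are made up, not taken from either operator.

```go
package main

import "fmt"

func main() {
	// Total API workers = replicas × processes per pod. The same
	// total can be reached with fewer pods by raising the per-pod
	// process count; that is the knob this PR exposes.
	type combo struct{ replicas, processes int32 }
	for _, c := range []combo{{8, 1}, {4, 2}, {2, 4}} {
		fmt.Printf("%d replicas × %d processes = %d workers\n",
			c.replicas, c.processes, c.replicas*c.processes)
	}
}
```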
> There were discussions on architectural aspects where you can't scale replicas (because of scheduling constraints) but still want to tweak this parameter on the same API.

What are the scheduling constraints that prevent us from having many replicas? I believe our anti-affinity is soft, so it only tries to spread the pods across worker nodes but does not fail if more pods need to land on the same worker. I looked at [1] but I don't see the actual scheduling problem described there.
Moreover, if we open up the process config to the user, then it is easy to make a nonsensical config with a high process count but a low CPU resource request. I still think that having a hardcoded process count helps define a meaningful resource request for our pods.
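A minimal sketch of the consistency argument being made here, assuming hypothetical per-process costs (the constant names and figures are illustrative, not cinder-operator values): with a hardcoded process count, the operator can derive the pod's resource request from a single source of truth instead of trusting two independent user inputs to stay consistent.

```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
)

// Hardcoded process count: the operator, not the user, decides it,
// so the derived request below can never drift out of sync with it.
const apiProcesses = 4

// Illustrative per-process cost; real values would come from
// measurement, as discussed later in this thread.
const (
	cpuMilliPerProcess = 250               // 250m CPU
	memBytesPerProcess = 256 * 1024 * 1024 // 256Mi
)

func apiResourceRequests() corev1.ResourceList {
	return corev1.ResourceList{
		corev1.ResourceCPU: *resource.NewMilliQuantity(
			apiProcesses*cpuMilliPerProcess, resource.DecimalSI),
		corev1.ResourceMemory: *resource.NewQuantity(
			apiProcesses*memBytesPerProcess, resource.BinarySI),
	}
}

func main() {
	req := apiResourceRequests()
	cpu := req[corev1.ResourceCPU]
	mem := req[corev1.ResourceMemory]
	fmt.Println(cpu.String(), mem.String()) // 1 1Gi
}
```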
That is a good point. We don't have a way to keep all these parameters consistent while still scaling in a way that keeps track of resource consumption. I'm reconsidering this set of patches because they're probably trying to solve a problem that is already addressed at the k8s level. I see we have a default number of processes: 4. Do you know if there are better defaults, or did you do anything in Nova that can lead us to choose better numbers (if so, which metric was used in this context)?
> That is a good point. We don't have a way to keep all these parameters consistent while still scaling in a way that keeps track of resource consumption. I'm reconsidering this set of patches because they're probably trying to solve a problem that is already addressed at the k8s level. I see we have a default number of processes: 4. Do you know if there are better defaults, or did you do anything in Nova that can lead us to choose better numbers (if so, which metric was used in this context)?
In nova-operator we ended up with processes=2 [1], based on the tempest results and troubleshooting of some probe timeouts. I think 4 is equally valid if you see that as a good fit for the steps of scaling. We will probably gain more insight from the field, especially for the CPU and memory resource requests. For example, I'm currently fixing and performance-measuring a placement-api timeout / memory consumption spike upstream, so I have data showing that an idle placement-api process needs 100MB, but during an allocation-candidate query it easily spikes up to 650MB while pegging 1 CPU core. Similar data is probably needed for each service to set up good defaults.
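A back-of-the-envelope sketch using the figures quoted above. The 100MB/650MB per-process numbers come from the placement-api measurement in this comment; applying them to 4 cinder workers is only an assumption for illustration, since cinder-api has not been measured here.

```go
package main

import "fmt"

func main() {
	const (
		processes  = 4   // this PR's default worker count
		idleMBEach = 100 // quoted idle footprint per process
		peakMBEach = 650 // quoted spike during an allocation-candidate query
	)
	// Worst case assumes every worker spikes at once; a real memory
	// request would likely sit somewhere between the two bounds.
	fmt.Printf("idle: %d MB, worst case: %d MB\n",
		processes*idleMBEach, processes*peakMBEach)
	// Output: idle: 400 MB, worst case: 2600 MB
}
```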
Thank you for sharing. We haven't done this kind of exercise, and it's a prerequisite to learning more about resource consumption and tuning the default parameters. I don't see a particular problem coming from tempest with processes=4 as a default, but it's worth investigating in the future.
Yeah, I think 4 is fine. I would advise against 1, or against going bigger than 4, though. Or maybe even bigger than 4 is OK if the resource request is also set accordingly.
@Deydra71 since you added this to keystone, was there a reason for keystone to be able to customize the processes, instead of bumping the replicas?
@stuggi Basically just the Jira conversation --> https://issues.redhat.com/browse/OSPRH-10363