-
Notifications
You must be signed in to change notification settings - Fork 38
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add parameter to customize the number of cinderAPI processes #474
Closed
Closed
Changes from all commits
Commits
File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.
Oops, something went wrong.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At least in nova-operator we try to keep the replica as the unit of scaling of a service with a hardcoded process number. What is the case where a replica based scaling is not a good solution for cinder?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There were discussions on architectural aspects where you can't scale replicas (because of scheduling constraints), but you still want to tweak this parameter on the same API.
My initial understanding was pretty much the same as yours: use replicas as it represents a building block to scale the API, but to reach the same number of process with a smaller amount of replicas you might need to tune this value [1].
[1] https://issues.redhat.com/browse/OSPRH-10363
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What are the scheduling constraints that prevents us having many replicas? I believe our anti affinity is soft, so it only tries to spread the pods across worker nodes but does not fail if more pods needs to land on the same worker. I looked at [1] but I don't see the actual scheduling problem described there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
moreover if we open up the process config for the user then it is easy to make a nonsensical config with a high process count but a low cpu resource request. I still think that having a hardcoded process count helps defining a meaningful resource request for the our pods.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is a good point. We don't have a way to keep all these parameters consistent and still scale in a way where we keep track of the resource consumption. I'm reconsidering these set of patches because they're not probably solving a problem that is already addressed at k8s level. I see we have a default number of processes:
4
. Do you know if there are better defaults or did you do anything in Nova that can lead us to choose better numbers (if so, which is the metric used in this context?)There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In nova-operator we ended up having processes=2 [1] based on the tempest results and troubleshooting of some probe timeouts. I think 4 is equally valid if you see that as a good fit for the steps of scaling. Probably we will gain more insight from the field especially for the cpu and memory resource request. Like I'm just fixing and performance measuring a placement-api timeout / memory consumption spike upstream so I have data that an idle placement-api process needs 100MB but during an allocation candidate query it spikes up to 650MB easily while pegging 1 CPU core. Probably similar data is needed for each service to set up a nice defaults.
[1]https://github.com/openstack-k8s-operators/nova-operator/blob/main/templates/novaapi/config/httpd.conf#L76
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for sharing. We didn't conduct any of this exercise and this is a prerequisite to learn more about Resource consumption and tuning the default parameters. I don't see a particular problem that comes from tempest with
processes=4
as a default, but worth investigating for the future.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I think 4 is fine. I would advise against 1 or bigger 4 though. Or maybe even bigger than 4 is OK if the resource request also set accordingly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Deydra71 since you added this to keystone, was there a reason for keystone to be able to customize the processes, instead if bumping the replicas?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@stuggi
Basically just the Jira conversation -->
https://issues.redhat.com/browse/OSPRH-10363