-
Notifications
You must be signed in to change notification settings - Fork 430
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Move webhook registration behind feature gate flag #5099
base: main
Are you sure you want to change the base?
Conversation
Hi @bryan-cox. Thanks for your PR. I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
/ok-to-test |
We use the webhooks to forbid creating resources disabled by feature flags. That's also what CAPI does so I think we should align with that: https://github.com/kubernetes-sigs/cluster-api/blob/be86b82e7e30a844bca141ff8bcdc450b0499549/exp/internal/webhooks/machinepool.go#L168. Does a user still get some kind of error here when they try to create an AzureMachinePool when the MachinePool flag is disabled? This seems fine as long as users do some extra work to ensure those CRDs are not installed at all when the feature flags are disabled, but that would force users to adapt to keep the existing behavior and clusterctl doesn't make that easy. Are you seeing any adverse behavior besides the error message? |
We aren't using AzureMachinePool. Yeah, we are seeing more than just the log message; the CAPZ pod restarts constantly. Here are some additional logs before the pod restarts:
We have the MachinePool feature turned off in our pod deployment:
|
FWIW the machines do get provisioned and join our cluster. The CAPZ pod just consistently restarts. |
CAPA seems to follow this same pattern as well - https://github.com/kubernetes-sigs/cluster-api-provider-aws/blob/4507c0bc7371dd44e7b7b719c393f86452be60dd/main.go#L323. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
/assign @nojnhuh
This is causing pod restarts and the fix follows an approach similar to CAPA's, so I think it's reasonable.
LGTM label has been added. Git tree hash: 11c709afba588489dedad6cf904e145c3451676a
|
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #5099 +/- ##
=======================================
Coverage 52.98% 52.98%
=======================================
Files 273 273
Lines 29197 29145 -52
=======================================
- Hits 15469 15443 -26
+ Misses 12926 12900 -26
Partials 802 802 ☔ View full report in Codecov by Sentry. |
If they're no longer reachable, can we remove the checks in the webhooks that the corresponding feature gates for resources are enabled? e.g.
|
@nojnhuh - I don't understand. Could you clarify a bit more? Wouldn't those checks still be valid since those are behind a featuregate flag? |
If we only start the webhooks in main.go when the feature gate is enabled, is it possible for the feature gate to be disabled when we're inside the webhook? The feature gates can't be toggled at runtime AFAIK. |
83f3f66
to
2f2e523
Compare
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
2f2e523
to
362e74d
Compare
362e74d
to
26db138
Compare
Move webhook registration behind feature gate flags similar to controller registration. Signed-off-by: Bryan Cox <[email protected]>
26db138
to
32f73b1
Compare
/test pull-cluster-api-provider-azure-e2e-aks |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
LGTM label has been added. Git tree hash: 802baeabc173b13515eb73bc8556cf322c7e37db
|
Hey @nojnhuh 👋🏻 - if you're good with the changes, can I get a /approve on this PR please? |
Sorry, still working my way back to this. /assign |
^ (and when the AzureMachinePool CRD is installed, as it would be for default CAPZ installations?) @bryan-cox Can you confirm that this is the case? |
@nojnhuh I can try and take a look. The project I'm on doesn't use that CRD in our self-managed setup. |
@nojnhuh - Sorry for the delay, but I tested this yesterday. If I install the AzureMachinePool CRD, there are no errors in the CAPZ pod, machines are provisioned, and join a nodepool in my self-managed setup. However, the issue is still there if the AzureMachinePool CRD isn't installed where the pod restarts continually. |
@nojnhuh Did you have time to take a look at this again? No problem if not and feel free to assign me if so. |
If you could take a look that would be great! /assign @willie-yao |
Looks good to me, although I just wanted a bit of clarification on this statement. Does this issue still exist and how can you reproduce it? Which pod is restarting continually? @bryan-cox @mboersma |
@willie-yao - The issue was still present when I made that comment. I haven't tested it again since. The project I work on in Red Hat uses the self-managed flavor of CAPZ (we BYO cloud infra) with the exception of machine management. We ran into this issue when we were reducing the Azure CRDs we install related to CAPZ. This is related to #5294. We don't use AzureMachinePool and several other CAPZ related CRDs. One could reproduce it using externally managed Azure infra and not installing the AzureMachinePool CRD. The pod that was restarting was the one running the main CAPZ program, manager. |
Just wanted to make sure: This only happens when they try to create an AzureMachinePool when the MachinePool flag is disabled, or just when the MachinePool flag is disabled in general? If it's the former, I think we can go ahead and merge this, otherwise this would probably block the PR. Thoughts? @mboersma |
@willie-yao - The pod restarts consistently when the AzureMachinePool CRD isn't installed. I can't recall completely what the flag status was. Would it be helpful to have a working example of this to go over at one of the office hours? |
Yup that would be great! Thank you so much!! I'll try and reproduce this in the meantime and if so, I'll let you know. Apologies for the hassle and delay on this. |
What type of PR is this?
/kind bug
What this PR does / why we need it:
Move webhook registration behind feature gate flags similar to controller registration.
Without this PR, from a self-managed / externally managed infrastructure perspective, if you want to exclude the CRDs behind the MachinePool and ASOAPI feature flags, you'll get an error because the webhook for them is still registered.
Which issue(s) this PR fixes (optional, in
fixes #<issue number>(, fixes #<issue_number>, ...)
format, will close the issue(s) when PR gets merged):Fixes #
Special notes for your reviewer:
TODOs:
Release note: