-
Notifications
You must be signed in to change notification settings - Fork 30
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix reconcilation problem after restore KamajiControlPlane from backup #83
Fix reconcilation problem after restore KamajiControlPlane from backup #83
Conversation
I'm a bit confused @andreykont, isn't this a problem in Kamaji upstream rather than on the CAPI provider? |
Good day, @prometherion . Yes. I completely agree. Kamaji doesn't restore owner reference of it's CA secret. It's a mistake. But anyway it might take some period of time. I think capi provider has to control such situation (kamaji |
That's correct: https://github.com/clastix/kamaji/blob/eff68db336f3c161447e08a6733344260c10e12a/internal/resources/ca_certificate.go#L92-L110 This snippet shows we're just avoiding rewriting the secret content if the certificates are matching since Velero is not able to restore the Owner Reference.
It's Kubernetes, eventually consistent :) Jokes apart, we can wait for a few iterations before having everything reconciled properly, and I'd say Kamaji resources are our source of truth since the CAPI CP provider is almost dummy.
This is a bit problematic since we started Kamaji completely decoupled from Cluster API, and the CP provider is the one responsible for this, however, we stumbled upon a naming collision afterwards and had to patch in this way. tl;dr; WDYT in fixing this in Kamaji, rather than in CAPI? |
Hello, @prometherion. Let me clarify the problem once more. After restoring kamaji tenant control plane’s secret with cert and key is restored, KamajiControlPlane Provider start reconcilation and adopts the secret changing keys in its map: cluster-api-control-plane-provider-kamaji/controllers/kamajicontrolplane_controller_resources.go Lines 102 to 105 in 346f3ca
Then it updates secret and returns to reconcile process and comes back to check secret once more. cluster-api-control-plane-provider-kamaji/controllers/kamajicontrolplane_controller_resources.go Line 108 in 346f3ca
the secret is not valid for that check: cluster-api-control-plane-provider-kamaji/controllers/kamajicontrolplane_controller_resources.go Line 77 in 346f3ca
and KamajiControlPlaneProvider tries to find ca.crt and ca.key in secret, which it patched during the previous reconcilation loop. As the result it logs errors as it is not able to find in TenantControlPlane secret ca.crt, another solution is to add more checks there cluster-api-control-plane-provider-kamaji/controllers/kamajicontrolplane_controller_resources.go Line 77 in 346f3ca
either by name equals or that secret is already patched for CAPI. I think it should not be patched from the kamaji upstream as it is not aware of kamajiControlPlane provider. Moreover, i think it can be fixed in kamajiControlPlane provider. If we come to CAPI Project to fix that it is going to break their agreement on how to prepare secrets i think. https://cluster-api.sigs.k8s.io/developer/architecture/controllers/cluster.html#secrets |
True the fact Kamaji is not aware of the CAPI CP provider, but Owner/Controller References should be fixed upon first reconciliation, that's the reason why I'm suggesting fixing in Kamaji. Once we have the OwnerReference in place, there's not bug in in the CP provider, if I understand correctly. |
Hello, @prometherion. |
Fixed in clastix/kamaji#427 |
My goal is restore managed cluster from backup.
I use Velero for restore.
First of all I recover etcd database, TenantControlPlane and other generic capi resources.
On last stage I have to recover KamajiControlPlane resource.
KamajiControlPlane gets "Ready" status, but I see error in kamaji control plane controller's log:
2024-02-21T13:54:48Z ERROR Reconciler error {"controller": "kamajicontrolplane", "controllerGroup": "controlplane.cluster.x-k8s.io", "controllerKind": "KamajiControlPlane", "KamajiControlPlane": {"name":"ocean-ocean-avkontya3-jvpinos","namespace":"ocean-ocean-avkontya3-jvpinos"}, "namespace": "ocean-ocean-avkontya3-jvpinos", "name": "ocean-ocean-avkontya3-jvpinos", "reconcileID": "2cece214-b52e-41fc-884a-cac3ebd28bd6", "error": "cannot create or update CA secret: missing Certificate value from *kamajiv1alpha1.TenantControlPlane CA", "errorVerbose": "missing Certificate value from *kamajiv1alpha1.TenantControlPlane CA\ngithub.com/clastix/cluster-api-control-plane-provider-kamaji/controllers.(*KamajiControlPlaneReconciler).createOrUpdateCertificateAuthor
Cause of problem:
In my case kamajiCA and capiCA secret names are equals. Also when I restore kcp with Velero, it doesn't restore secret OwnerReferences. Kamaji control plane detects it like it's own object and delete "ca.crt" and "ca.key" data in kamajiCA secret. On the second iteration it makes error, because ca.crt deleted.
Wanted behavior: KamajiControlPlane controller doesn't control kamajiCA secrets.
How I solved the error:
Our provider doesn't have to control KamajiCA secret.
I replaced condition that check who is owner of CA secret.
Now we control it by secret names.