From b20c5e52fca6297b091ac8ad45c641ed21ca178f Mon Sep 17 00:00:00 2001 From: michaelawyu Date: Thu, 21 Nov 2024 04:16:33 +0800 Subject: [PATCH 1/6] Added the drift detection how-to doc --- docs/howtos/drift-detection.md | 313 +++++++++++++++++++ pkg/controllers/updaterun/controller.go | 5 +- pkg/controllers/updaterun/controller_test.go | 3 +- 3 files changed, 318 insertions(+), 3 deletions(-) create mode 100644 docs/howtos/drift-detection.md diff --git a/docs/howtos/drift-detection.md b/docs/howtos/drift-detection.md new file mode 100644 index 000000000..9541506b1 --- /dev/null +++ b/docs/howtos/drift-detection.md @@ -0,0 +1,313 @@ +# How-to Guide: Enabling Drift Detection in Fleet + +This guide provides an overview on how to enable drift detection in Fleet. This feature can help +developers and admins identify (and act upon) configuration drifts in their Kubernetes system, +which are often brought by temporary fixes, inadvertent changes, and failed automations. + +> Before you begin +> +> The new drift detection experience is currently in preview. Contact the Fleet team for more information on how to have a peek at the experience. +> +> Note that the APIs for the new experience are only available in the Fleet v1beta1 API, not the v1 API. If you do not see the new APIs in command outputs, verify that you are explicitly requesting the v1beta1 API objects, as opposed to the v1 API objects (the default). + +## What is a drift? + +A drift occurs when a non-Fleet agent (e.g., a developer or a controller) makes changes to +a field of a Fleet-managed resource directly on the member cluster side without modifying +the corresponding resource template created on the hub cluster. + +See the steps below for an example; the code assumes that you have a Fleet of two clusters, +`member-1` and `member-2`. 
+ +* Switch to the hub cluster in the preview environment: + + ```sh + kubectl config use-context hub-admin + ``` + +* Create a namespace, `work`, on the hub cluster, with some labels: + + ```sh + kubectl create ns work + kubectl label ns work app=work + kubectl label ns work owner=redfield + ``` + +* Create a CRP object, which places the namespace on all member clusters: + + ```sh + cat < Important: + > + > The presence of drifts will NOT stop Fleet from rolling out newer resource versions. If you choose to edit the resource template on the hub cluster, Fleet will always apply the new resource template in the rollout process, which may also resolve the drift. + +## Comparison options + +One may have found out that the namespace on the member cluster has another drift, the +label `use=hack`, which is not reported in the CRP status by Fleet. This is because by default +Fleet compares only managed fields, i.e., fields that are explicitly specified in the resource +template. If a field is not populated on the hub cluster side, Fleet will not recognize its +presence on the member cluster side as a drift. This allows controllers on the member cluster +side to manage some fields automatically without Fleet's involvement; for example, one might would +like to use an HPA solution to auto-scale Deployments as appropriate and consequently decide not +to include the `.spec.replicas field` in the resource template. + +Fleet recognizes that there might be cases where developers and admins would like to have their +resources look exactly the same across their fleet. 
If this scenario applies, one might set up +the `comparisonOption` field in the apply strategy from the `partialComparison` value +(the default) to `fullComparison`: + +```yaml +apiVersion: placement.kubernetes-fleet.io/v1beta1 +kind: ClusterResourcePlacement +metadata: + name: work +spec: + resourceSelectors: + - group: "" + kind: Namespace + version: v1 + labelSelector: + matchLabels: + app: work + policy: + placementType: PickAll + strategy: + applyStrategy: + whenToApply: IfNotDrifted + comparisonOption: fullComparison +``` + +With this setting, Fleet will recognize the presence of any unmanaged fields (i.e., fields that +are present on the member cluster side, but not set on the hub cluster side) as drifts as well. +If anyone adds a field to a Fleet-managed object directly on the member cluster, it would trigger +an apply error, the details of which you can find the same way as illustrated in the +section above. + +## Summary + +Below is a summary of how the `whenToApply` and `comparisonOption` settings interact: + +| `whenToApply` setting | `comparisonOption` setting | Drift scenario | Outcome | +| -------- | ------- | -------- | ------- | +| `IfNotDrifted` | `partialComparison` | A managed field (i.e., a field that has been explicitly set in the hub cluster resource template) is edited. | Fleet will report an apply error in the status, plus the drift details. | +| `IfNotDrifted` | `partialComparison` | An unmanaged field (i.e., a field that has not been explicitly set in the hub cluster resource template) is edited/added. | N/A; the change is left untouched, and Fleet will ignore it. | +| `IfNotDrifted` | `fullComparison` | Any field is edited/added. | Fleet will report an apply error in the status, plus the drift details. | +| `Always` | `partialComparison` | A managed field (i.e., a field that has been explicitly set in the hub cluster resource template) is edited. | N/A; the change is overwritten shortly.
| +| `Always` | `partialComparison` | An unmanaged field (i.e., a field that has not been explicitly set in the hub cluster resource template) is edited/added. | N/A; the change is left untouched, and Fleet will ignore it. | +| `Always` | `fullComparison` | Any field is edited/added. | The change on managed fields will be overwritten shortly; Fleet will report drift details about changes on unmanaged fields, but this is not considered as an apply error. | + + + diff --git a/pkg/controllers/updaterun/controller.go b/pkg/controllers/updaterun/controller.go index d570bf399..27d7a3bf5 100644 --- a/pkg/controllers/updaterun/controller.go +++ b/pkg/controllers/updaterun/controller.go @@ -11,8 +11,6 @@ import ( "fmt" "time" - "go.goms.io/fleet/pkg/utils" - "go.goms.io/fleet/pkg/utils/controller" "k8s.io/apimachinery/pkg/types" "k8s.io/client-go/tools/record" "k8s.io/client-go/util/workqueue" @@ -26,6 +24,9 @@ import ( "sigs.k8s.io/controller-runtime/pkg/predicate" "sigs.k8s.io/controller-runtime/pkg/reconcile" + "go.goms.io/fleet/pkg/utils" + "go.goms.io/fleet/pkg/utils/controller" + placementv1alpha1 "go.goms.io/fleet/apis/placement/v1alpha1" ) diff --git a/pkg/controllers/updaterun/controller_test.go b/pkg/controllers/updaterun/controller_test.go index bf7d07f53..2fd1b3032 100644 --- a/pkg/controllers/updaterun/controller_test.go +++ b/pkg/controllers/updaterun/controller_test.go @@ -8,11 +8,12 @@ package updaterun import ( "testing" - placementv1alpha1 "go.goms.io/fleet/apis/placement/v1alpha1" "k8s.io/client-go/util/workqueue" "sigs.k8s.io/controller-runtime/pkg/client" "sigs.k8s.io/controller-runtime/pkg/controller/controllertest" "sigs.k8s.io/controller-runtime/pkg/reconcile" + + placementv1alpha1 "go.goms.io/fleet/apis/placement/v1alpha1" ) func TestHandleClusterApprovalRequest(t *testing.T) { From 8f0c259a75a4df2229ccc6d5badc5e0080278f11 Mon Sep 17 00:00:00 2001 From: michaelawyu Date: Thu, 21 Nov 2024 22:42:39 +0800 Subject: [PATCH 2/6] Minor fixes --- 
docs/howtos/drift-detection.md | 16 ++++++++++------ 1 file changed, 10 insertions(+), 6 deletions(-) diff --git a/docs/howtos/drift-detection.md b/docs/howtos/drift-detection.md index 9541506b1..0050272c2 100644 --- a/docs/howtos/drift-detection.md +++ b/docs/howtos/drift-detection.md @@ -44,7 +44,7 @@ See the steps below for an example; the code assumes that you have a Fleet of tw name: work spec: resourceSelectors: - - group: "" + - group: "" kind: Namespace version: v1 # Select all namespaces with the label app=work. @@ -179,7 +179,7 @@ illustrated by the steps below: this setting, Fleet will check for drifts periodically; if drifts are found, Fleet will stop applying the resource templates and report in the CRP status. -* witch to the first member cluster and edit the labels for a second time, effectively introducing +* Switch to the first member cluster and edit the labels for a second time, effectively introducing a drift in the system. After it's done, switch back to the hub cluster: ```sh @@ -245,9 +245,13 @@ and its details will be reported in the status of the CRP object: * To fix the drift, consider one of the following options: - * Switch the whenToApply setting back to Always, which will instruct Fleet to overwrite the drifts using values from the hub cluster resource template; or - * Edit the drifted field directly on the member cluster side, so that the value is consistent with that on the hub cluster; Fleet will periodically re-evaluate drifts and should report that no drifts are found soon after. - * Delete the resource from the member cluster. Fleet will then re-apply the resource template and re-create the resource. 
+ * Switch the `whenToApply` setting back to `Always`, which will instruct Fleet to overwrite + the drifts using values from the hub cluster resource template; or + * Edit the drifted field directly on the member cluster side, so that the value is + consistent with that on the hub cluster; Fleet will periodically re-evaluate drifts + and should report that no drifts are found soon after. + * Delete the resource from the member cluster. Fleet will then re-apply the resource + template and re-create the resource. > Important: > @@ -262,7 +266,7 @@ template. If a field is not populated on the hub cluster side, Fleet will not re presence on the member cluster side as a drift. This allows controllers on the member cluster side to manage some fields automatically without Fleet's involvement; for example, one might would like to use an HPA solution to auto-scale Deployments as appropriate and consequently decide not -to include the `.spec.replicas field` in the resource template. +to include the `.spec.replicas` field in the resource template. Fleet recognizes that there might be cases where developers and admins would like to have their resources look exactly the same across their fleet. 
If this scenario applies, one might set up From cd4f50f212e491cf516268ade6d49b747893e1c4 Mon Sep 17 00:00:00 2001 From: michaelawyu Date: Thu, 21 Nov 2024 22:44:07 +0800 Subject: [PATCH 3/6] Minor fixes --- docs/howtos/drift-detection.md | 78 +++++++++++++++++----------------- 1 file changed, 39 insertions(+), 39 deletions(-) diff --git a/docs/howtos/drift-detection.md b/docs/howtos/drift-detection.md index 0050272c2..68e21de7c 100644 --- a/docs/howtos/drift-detection.md +++ b/docs/howtos/drift-detection.md @@ -41,26 +41,26 @@ See the steps below for an example; the code assumes that you have a Fleet of tw apiVersion: placement.kubernetes-fleet.io/v1beta1 kind: ClusterResourcePlacement metadata: - name: work + name: work spec: - resourceSelectors: - - group: "" - kind: Namespace - version: v1 - # Select all namespaces with the label app=work. - labelSelector: - matchLabels: - app: work - policy: - placementType: PickAll - strategy: - # For simplicity reasons, the CRP is configured to roll out changes to - # all member clusters at once. This is not a setup recommended for production - # use. - type: RollingUpdate - rollingUpdate: - maxUnavailable: 100% - unavailablePeriodSeconds: 1 + resourceSelectors: + - group: "" + kind: Namespace + version: v1 + # Select all namespaces with the label app=work. + labelSelector: + matchLabels: + app: work + policy: + placementType: PickAll + strategy: + # For simplicity reasons, the CRP is configured to roll out changes to + # all member clusters at once. This is not a setup recommended for production + # use. + type: RollingUpdate + rollingUpdate: + maxUnavailable: 100% + unavailablePeriodSeconds: 1 EOF ``` @@ -145,28 +145,28 @@ illustrated by the steps below: apiVersion: placement.kubernetes-fleet.io/v1beta1 kind: ClusterResourcePlacement metadata: - name: work + name: work spec: - resourceSelectors: + resourceSelectors: - group: "" - kind: Namespace - version: v1 - # Select all namespaces with the label app=work. 
- labelSelector: - matchLabels: - app: work - policy: - placementType: PickAll - strategy: - applyStrategy: - whenToApply: IfNotDrifted - # For simplicity reasons, the CRP is configured to roll out changes to - # all member clusters at once. This is not a setup recommended for production - # use. - type: RollingUpdate - rollingUpdate: - maxUnavailable: 100% - unavailablePeriodSeconds: 1 + kind: Namespace + version: v1 + # Select all namespaces with the label app=work. + labelSelector: + matchLabels: + app: work + policy: + placementType: PickAll + strategy: + applyStrategy: + whenToApply: IfNotDrifted + # For simplicity reasons, the CRP is configured to roll out changes to + # all member clusters at once. This is not a setup recommended for production + # use. + type: RollingUpdate + rollingUpdate: + maxUnavailable: 100% + unavailablePeriodSeconds: 1 EOF ``` From 1d10c730bebe6ea9513af8694e1d52ae92e26e3d Mon Sep 17 00:00:00 2001 From: michaelawyu Date: Thu, 21 Nov 2024 22:46:06 +0800 Subject: [PATCH 4/6] Minor fixes --- docs/howtos/drift-detection.md | 30 +++++++++++++++--------------- 1 file changed, 15 insertions(+), 15 deletions(-) diff --git a/docs/howtos/drift-detection.md b/docs/howtos/drift-detection.md index 68e21de7c..767d36bcb 100644 --- a/docs/howtos/drift-detection.md +++ b/docs/howtos/drift-detection.md @@ -49,16 +49,16 @@ See the steps below for an example; the code assumes that you have a Fleet of tw version: v1 # Select all namespaces with the label app=work. labelSelector: - matchLabels: + matchLabels: app: work policy: - placementType: PickAll + placementType: PickAll strategy: - # For simplicity reasons, the CRP is configured to roll out changes to - # all member clusters at once. This is not a setup recommended for production - # use. - type: RollingUpdate - rollingUpdate: + # For simplicity reasons, the CRP is configured to roll out changes to + # all member clusters at once. This is not a setup recommended for production + # use. 
+ type: RollingUpdate + rollingUpdate: maxUnavailable: 100% unavailablePeriodSeconds: 1 EOF @@ -153,18 +153,18 @@ illustrated by the steps below: version: v1 # Select all namespaces with the label app=work. labelSelector: - matchLabels: + matchLabels: app: work policy: - placementType: PickAll + placementType: PickAll strategy: - applyStrategy: + applyStrategy: whenToApply: IfNotDrifted - # For simplicity reasons, the CRP is configured to roll out changes to - # all member clusters at once. This is not a setup recommended for production - # use. - type: RollingUpdate - rollingUpdate: + # For simplicity reasons, the CRP is configured to roll out changes to + # all member clusters at once. This is not a setup recommended for production + # use. + type: RollingUpdate + rollingUpdate: maxUnavailable: 100% unavailablePeriodSeconds: 1 EOF From e21bd6d7d0d427a60af98dfefb836dc835da29e8 Mon Sep 17 00:00:00 2001 From: michaelawyu Date: Thu, 21 Nov 2024 22:54:12 +0800 Subject: [PATCH 5/6] Minor fixes --- docs/howtos/drift-detection.md | 10 +++++----- pkg/controllers/updaterun/controller.go | 3 +-- 2 files changed, 6 insertions(+), 7 deletions(-) diff --git a/docs/howtos/drift-detection.md b/docs/howtos/drift-detection.md index 767d36bcb..d8feba875 100644 --- a/docs/howtos/drift-detection.md +++ b/docs/howtos/drift-detection.md @@ -89,7 +89,7 @@ See the steps below for an example; the code assumes that you have a Fleet of tw ``` NAME STATUS AGE LABELS - work-1 Active 91m app=work,owner=redfield,kubernetes.io/metadata.name=work-1 + work Active 91m app=work,owner=redfield,kubernetes.io/metadata.name=work ``` * Anyone with proper access to the member cluster could modify the namespace as they want; @@ -198,12 +198,12 @@ and its details will be reported in the status of the CRP object: # The command above uses JSON paths to query the drift details directly and # uses the jq utility to pretty print the output JSON. 
# - # If your system does not have jq installed, consider installing it, or drop - # it from the command. + # jq might not be available in your environment. You may have to install it + # seperately, or omit it from the command. # # If the output is empty, the status might have not been populated properly - # yet. You can switch the output type from jsonpath to yaml to see the full - # object. + # yet. Retry in a few seconds; you may also want to switch the output type + # from jsonpath to yaml to see the full object. ``` The output should look like this: diff --git a/pkg/controllers/updaterun/controller.go b/pkg/controllers/updaterun/controller.go index 27d7a3bf5..43fe30c4b 100644 --- a/pkg/controllers/updaterun/controller.go +++ b/pkg/controllers/updaterun/controller.go @@ -24,10 +24,9 @@ import ( "sigs.k8s.io/controller-runtime/pkg/predicate" "sigs.k8s.io/controller-runtime/pkg/reconcile" + placementv1alpha1 "go.goms.io/fleet/apis/placement/v1alpha1" "go.goms.io/fleet/pkg/utils" "go.goms.io/fleet/pkg/utils/controller" - - placementv1alpha1 "go.goms.io/fleet/apis/placement/v1alpha1" ) // Reconciler reconciles a ClusterStagedUpdateRun object. From a76d626933c054ec4865aff4f0e8e29604657caf Mon Sep 17 00:00:00 2001 From: michaelawyu Date: Thu, 21 Nov 2024 23:01:46 +0800 Subject: [PATCH 6/6] Minor fixes --- docs/howtos/drift-detection.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/howtos/drift-detection.md b/docs/howtos/drift-detection.md index d8feba875..f3cd899cb 100644 --- a/docs/howtos/drift-detection.md +++ b/docs/howtos/drift-detection.md @@ -199,7 +199,7 @@ and its details will be reported in the status of the CRP object: # uses the jq utility to pretty print the output JSON. # # jq might not be available in your environment. You may have to install it - # seperately, or omit it from the command. + # separately, or omit it from the command. # # If the output is empty, the status might have not been populated properly # yet. 
Retry in a few seconds; you may also want to switch the output type