Nil Pointer Dereference happening when spec.coreTemplate.spec.replicas is not defined in EMQX CRD #1098

Open
Kallepan opened this issue Jan 12, 2025 · 3 comments
Labels
bug Something isn't working

Kallepan commented Jan 12, 2025

Describe the bug

With the EMQX CRD version v2beta1, the operator controller can hit a nil pointer dereference while updating the Replicas status if spec.coreTemplate.spec.replicas is not set in the EMQX resource.

Logs

{"level":"info","ts":"2025-01-12T17:42:42Z","msg":"Starting workers","controller":"rebalance","controllerGroup":"apps.emqx.io","controllerKind":"Rebalance","worker count":1}
{"level":"info","ts":"2025-01-12T17:42:42Z","msg":"Starting workers","controller":"emqx","controllerGroup":"apps.emqx.io","controllerKind":"EMQX","worker count":1}
{"level":"info","ts":"2025-01-12T17:42:43Z","msg":"Observed a panic in reconciler: runtime error: invalid memory address or nil pointer dereference","controller":"emqx","controllerGroup":"apps.emqx.io","controllerKind":"EMQX","EMQX":{"name":"emqx","namespace":"emqx"},"namespace":"emqx","name":"emqx","reconcileID":"eaeaed64-f44c-420c-81e2-8e1613bbeb4c"}
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
    panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x177dc84]
goroutine 248 [running]:
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile.func1()
    /go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:116 +0x1e5
panic({0x195d980?, 0x2c00a90?})
    /usr/local/go/src/runtime/panic.go:770 +0x132
github.com/emqx/emqx-operator/controllers/apps/v2beta1.(*updateStatus).reconcile(0xc00049e020?, {0x1e76b08?, 0xc00098a690?}, {{0xc0001c5830?, 0xc00058cc08?}, 0xc00098a690?}, 0xc00058cc08?, {0x0?, 0x0?})
    /workspace/controllers/apps/v2beta1/update_emqx_status.go:24 +0x44
github.com/emqx/emqx-operator/controllers/apps/v2beta1.(*EMQXReconciler).Reconcile(0xc000429f50, {0x1e76b08, 0xc00098a690}, {{{0xc0009afaac?, 0x5?}, {0xc0009afaa8?, 0xc0006a9d10?}}})
    /workspace/controllers/apps/v2beta1/emqx_controller.go:137 +0x7c3
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile(0x1e7b088?, {0x1e76b08?, 0xc00098a690?}, {{{0xc0009afaac?, 0xb?}, {0xc0009afaa8?, 0x0?}}})
    /go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:119 +0xb7
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler(0xc0002f6960, {0x1e76b40, 0xc0000507d0}, {0x1a0de80, 0xc0000ac4e0})
    /go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:316 +0x3bc
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem(0xc0002f6960, {0x1e76b40, 0xc0000507d0})
    /go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:266 +0x1be
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2()
    /go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:227 +0x79
created by sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2 in goroutine 145
    /go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:223 +0x50c

To Reproduce

  1. Deploy the EMQX Operator as described in the official docs

  2. Create an EMQX resource without defining spec.coreTemplate.spec.replicas:

    ---
    apiVersion: apps.emqx.io/v2beta1
    kind: EMQX
    metadata:
      name: emqx
      namespace: emqx
    spec:
      image: emqx:5
      coreTemplate:
        metadata:
          labels:
            app: emqx

  3. Observe that the operator crashes due to a nil pointer dereference.

Expected Behavior
An EMQX instance should be deployed successfully from the manifest without crashing the operator controller. The operator should handle the absence of replicas gracefully.

The following manifest, which sets replicas explicitly, avoids the crash:

apiVersion: apps.emqx.io/v2beta1
kind: EMQX
metadata:
  name: emqx
  namespace: emqx
spec:
  image: emqx:5
  coreTemplate:
    spec:
      replicas: 1

Additional Information
The issue is likely caused by the lack of a nil check on instance.Spec.CoreTemplate.Spec.Replicas in the controller code, specifically at this line:

instance.Status.CoreNodesStatus.Replicas = *instance.Spec.CoreTemplate.Spec.Replicas

Dereferencing Replicas panics when the field is not specified and the pointer is therefore nil.

To fix this:

  • The controller should ensure that instance.Spec.CoreTemplate.Spec.Replicas is checked for nil before dereferencing.
  • Alternatively, an admission webhook could be added to ensure that a valid value for replicas is provided when the EMQX resource is created.
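
The first option could look roughly like the sketch below. It is only an illustration: the struct types are hypothetical, trimmed-down stand-ins for the operator's v2beta1 types, `coreReplicas` and `assumedDefaultReplicas` are names invented here, and the fallback of 2 is taken from the CRD default mentioned in this thread rather than from the operator source.

```go
package main

import "fmt"

// Hypothetical stand-ins for the operator's v2beta1 types; only the
// fields named in this issue are modeled.
type CoreTemplateSpec struct {
	Replicas *int32 // nil when replicas is omitted and no default was applied
}

type CoreTemplate struct {
	Spec CoreTemplateSpec
}

type EMQXSpec struct {
	CoreTemplate CoreTemplate
}

type CoreNodesStatus struct {
	Replicas int32
}

type EMQXStatus struct {
	CoreNodesStatus CoreNodesStatus
}

type EMQX struct {
	Spec   EMQXSpec
	Status EMQXStatus
}

// assumedDefaultReplicas is an assumption: it mirrors the CRD default of 2
// mentioned in this thread.
const assumedDefaultReplicas int32 = 2

// coreReplicas returns the requested replica count and falls back to the
// assumed default instead of dereferencing a nil pointer.
func coreReplicas(instance *EMQX) int32 {
	if r := instance.Spec.CoreTemplate.Spec.Replicas; r != nil {
		return *r
	}
	return assumedDefaultReplicas
}

func main() {
	instance := &EMQX{} // replicas omitted, so the pointer is nil
	instance.Status.CoreNodesStatus.Replicas = coreReplicas(instance)
	fmt.Println(instance.Status.CoreNodesStatus.Replicas) // prints 2

	three := int32(3)
	instance.Spec.CoreTemplate.Spec.Replicas = &three
	instance.Status.CoreNodesStatus.Replicas = coreReplicas(instance)
	fmt.Println(instance.Status.CoreNodesStatus.Replicas) // prints 3
}
```

Whether the controller should fall back to a default or report an error for a missing value is a design decision for the maintainers; the sketch only shows that the status update can be made safe against a nil pointer.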

Environment Details:

  • Kubernetes version: 1.32.0
  • Cloud provider/provisioner: Self Hosted Kubernetes Cluster
  • EMQX Operator version: 2.2.26
  • Installation method: Helm via CICD

Kallepan added the bug label on Jan 12, 2025

Rory-Z (Member) commented Jan 15, 2025

Hi @Kallepan, the default value of .spec.coreTemplate.spec.replicas is 2, so it shouldn't be nil. I deployed EMQX with the following file and it works:

apiVersion: apps.emqx.io/v2beta1
kind: EMQX
metadata:
  name: emqx
spec:
  image: emqx:5

I checked your YAML file, which looks like this:

apiVersion: apps.emqx.io/v2beta1
kind: EMQX
metadata:
  name: emqx
  namespace: emqx
spec:
  image: emqx:5
  metadata:
    labels:
      app: emqx

It looks like there is an error, because the CRD has no .spec.metadata field. Could you please provide more information?

Kallepan (Author) commented Jan 15, 2025

Hi,

My mistake, I forgot to add the key coreTemplate in the example. I have adjusted the example in the issue. Basically, this error occurs when coreTemplate is defined in the manifest but replicas is omitted.
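
To illustrate what the controller would see in that case, here is a small sketch that decodes the JSON equivalents of the two `spec` sections from this issue into a hypothetical, minimal mirror of the Go types. It assumes the object reaches the controller without the CRD default having been applied, which the panic suggests but which has not been confirmed in this thread; the type names and `replicasIsNil` are invented for the example.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Hypothetical minimal mirror of the relevant part of the EMQX spec.
type coreTemplateSpec struct {
	Replicas *int32 `json:"replicas,omitempty"`
}

type coreTemplate struct {
	Metadata map[string]interface{} `json:"metadata,omitempty"`
	Spec     coreTemplateSpec       `json:"spec"`
}

type emqxSpec struct {
	Image        string       `json:"image"`
	CoreTemplate coreTemplate `json:"coreTemplate"`
}

// replicasIsNil decodes a spec and reports whether replicas was left unset.
func replicasIsNil(raw string) (bool, error) {
	var s emqxSpec
	if err := json.Unmarshal([]byte(raw), &s); err != nil {
		return false, err
	}
	return s.CoreTemplate.Spec.Replicas == nil, nil
}

func main() {
	// JSON equivalent of the spec in the reproduction manifest.
	failing := `{"image":"emqx:5","coreTemplate":{"metadata":{"labels":{"app":"emqx"}}}}`
	// JSON equivalent of the spec in the manifest that sets replicas.
	working := `{"image":"emqx:5","coreTemplate":{"spec":{"replicas":1}}}`

	for _, raw := range []string{failing, working} {
		isNil, err := replicasIsNil(raw)
		if err != nil {
			fmt.Println("decode error:", err)
			continue
		}
		fmt.Println("replicas is nil:", isNil)
	}
}
```

With the first input the pointer stays nil, so an unguarded `*Replicas` would panic; with the second it points at 1.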

Rory-Z (Member) commented Jan 16, 2025

Yes, you are right, but I'm still looking for the reason. You can check the default value in the CRD with:

kubectl get crd emqxes.apps.emqx.io -o yaml | yq '.spec.versions[1].schema.openAPIV3Schema.properties.spec.properties.coreTemplate.properties.spec.properties.replicas'

This shows that I have already set the default to 2, but it does not take effect in your case.
