Controller preventing rollout restart #858

Open
bmorton opened this issue Dec 2, 2024 · 0 comments
Comments

@bmorton
Contributor

bmorton commented Dec 2, 2024

While rebuilding lots of clusters trying to find a fix for #708, I noticed that the controller starts reconciling resources whenever I restart a deployment to experiment with certificate refreshes: it immediately scales down any new ReplicaSet created by a rollout restart. If I scale the temporal-operator controller down, I can restart the rollout as expected. Once I scale the controller back up, it immediately scales down the new ReplicaSet and scales the old ReplicaSet back up.

I'll add the steps below to reproduce this. What's the expected behavior here?

Repro steps

  1. I've been using this configuration:
apiVersion: temporal.io/v1beta1
kind: TemporalCluster
metadata:
  name: temporal
  namespace: temporal-repro
spec:
  version: 1.23.0
  numHistoryShards: 8
  persistence:
    defaultStore:
      sql:
        user: temporal
        pluginName: postgres
        databaseName: temporal
        connectAddr: temporal-db-rw:5432
        connectProtocol: tcp
      passwordSecretRef:
        name: temporal-db-credentials
        key: password
    visibilityStore:
      sql:
        user: temporal
        pluginName: postgres
        databaseName: temporal_visibility
        connectAddr: temporal-db-rw:5432
        connectProtocol: tcp
      passwordSecretRef:
        name: temporal-db-credentials
        key: password
  ui:
    enabled: true
  mTLS:
    provider: cert-manager
    internode:
      enabled: true
    frontend:
      enabled: true
    certificatesDuration:
      clientCertificates: 1h0m0s
      frontendCertificate: 1h0m0s
      intermediateCAsCertificates: 1h30m0s
      internodeCertificate: 1h0m0s
      rootCACertificate: 2h0m0s
    renewBefore: 55m0s
    refreshInterval: 1m0s
  admintools:
    enabled: true
  metrics:
    enabled: true
    prometheus:
      listenPort: 9090
  services:
    frontend:
      replicas: 4
    history:
      replicas: 4
    matching:
      replicas: 4
    worker:
      replicas: 4
  2. After the cluster is provisioned, I try to restart one of the services: kubectl -n temporal-repro rollout restart deployment/temporal-frontend

  3. A few new pods come online, but they are gone almost immediately as the new ReplicaSet is scaled down to 0. Watching the controller logs, entries about reconciling objects appear immediately.
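The sequence above can be sketched with kubectl. Note this is only an illustration of the repro/workaround flow: the operator's deployment name (temporal-operator-controller-manager) and namespace (temporal-system) are assumptions and will vary by install, so adjust them to match your cluster.

```shell
# Pause the operator so it stops reconciling the frontend Deployment.
# (Controller deployment name and namespace are assumptions; check your install.)
kubectl -n temporal-system scale deployment/temporal-operator-controller-manager --replicas=0

# With the controller paused, the rollout restart completes normally.
kubectl -n temporal-repro rollout restart deployment/temporal-frontend
kubectl -n temporal-repro rollout status deployment/temporal-frontend

# Watch the ReplicaSets: resuming the controller scales the new ReplicaSet
# back down to 0 and restores the old one, reproducing the issue.
kubectl -n temporal-system scale deployment/temporal-operator-controller-manager --replicas=1
kubectl -n temporal-repro get replicasets -w
```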
