Controller preventing rollout restart #858

Open
bmorton opened this issue Dec 2, 2024 · 0 comments
Comments

@bmorton
Contributor

bmorton commented Dec 2, 2024

While rebuilding lots of clusters trying to find a fix for #708, I noticed that the controller starts reconciling resources whenever I restart a deployment to experiment with certificate refreshes: it immediately scales down any new ReplicaSet created by a rollout restart. If I scale the temporal-operator controller down, I can restart the rollout as expected. Once I scale the controller back up, it immediately scales down the new ReplicaSet and scales the old ReplicaSet back up.

I'll add the steps below to reproduce this. What's the expected behavior here?

Repro steps

  1. I've been using this configuration:
apiVersion: temporal.io/v1beta1
kind: TemporalCluster
metadata:
  name: temporal
  namespace: temporal-repro
spec:
  version: 1.23.0
  numHistoryShards: 8
  persistence:
    defaultStore:
      sql:
        user: temporal
        pluginName: postgres
        databaseName: temporal
        connectAddr: temporal-db-rw:5432
        connectProtocol: tcp
      passwordSecretRef:
        name: temporal-db-credentials
        key: password
    visibilityStore:
      sql:
        user: temporal
        pluginName: postgres
        databaseName: temporal_visibility
        connectAddr: temporal-db-rw:5432
        connectProtocol: tcp
      passwordSecretRef:
        name: temporal-db-credentials
        key: password
  ui:
    enabled: true
  mTLS:
    provider: cert-manager
    internode:
      enabled: true
    frontend:
      enabled: true
    certificatesDuration:
      clientCertificates: 1h0m0s
      frontendCertificate: 1h0m0s
      intermediateCAsCertificates: 1h30m0s
      internodeCertificate: 1h0m0s
      rootCACertificate: 2h0m0s
    renewBefore: 55m0s
    refreshInterval: 1m0s
  admintools:
    enabled: true
  metrics:
    enabled: true
    prometheus:
      listenPort: 9090
  services:
    frontend:
      replicas: 4
    history:
      replicas: 4
    matching:
      replicas: 4
    worker:
      replicas: 4
  2. After the cluster is provisioned, I try to restart one of the services: kubectl -n temporal-repro rollout restart deployment/temporal-frontend

  3. A few new pods come online, but they are gone almost immediately as the new ReplicaSet is scaled down to 0. Watching the controller logs, entries about reconciling objects appear immediately.
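The sequence above can be sketched with kubectl. Note this is only an illustration of the repro/workaround flow: the operator's deployment name (temporal-operator-controller-manager) and namespace (temporal-system) are assumptions and will vary by install, so adjust them to match your cluster.

```shell
# Pause the operator so it stops reconciling the frontend Deployment.
# (Controller deployment name and namespace are assumptions; check your install.)
kubectl -n temporal-system scale deployment/temporal-operator-controller-manager --replicas=0

# With the controller paused, the rollout restart completes normally.
kubectl -n temporal-repro rollout restart deployment/temporal-frontend
kubectl -n temporal-repro rollout status deployment/temporal-frontend

# Watch the ReplicaSets: resuming the controller scales the new ReplicaSet
# back down to 0 and restores the old one, reproducing the issue.
kubectl -n temporal-system scale deployment/temporal-operator-controller-manager --replicas=1
kubectl -n temporal-repro get replicasets -w
```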
