This info was migrated from the old internal repo's issues.
Going from StatefulSet -> Job:
we gain the ability to set restartPolicy: Never and backoffLimit: 0, which together prevent containers and Pods from restarting
we have to work around any containers which currently rely on restart behaviour during startup (e.g. using a new dependsOn field)
we lose the --wait functionality we got from helm install. We'll have to do something ourselves, maybe as a post-install Helm hook
Going from StatefulSet -> bare Pod:
we gain the ability to set restartPolicy: Never, which prevents containers from restarting
we have to work around any containers which currently rely on restart behaviour during startup
the --wait functionality of helm install is suboptimal: it keeps waiting for all Pods to become "ready" until the timeout (which can be hours for large runs), even after a Pod has become "failed"
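One way to avoid waiting out the full timeout would be to poll Pod status ourselves and stop as soon as any Pod has failed. The following is only a minimal sketch, assuming Pods are available as plain dicts shaped like the `items` of `kubectl get pods -o json`. The function names `pod_is_ready` and `release_state` are hypothetical and not part of the existing code.

```python
from typing import Any, Iterable


def pod_is_ready(pod: dict[str, Any]) -> bool:
    """Return True if the Pod reports a Ready condition with status "True"."""
    conditions = pod.get("status", {}).get("conditions") or []
    return any(
        c.get("type") == "Ready" and c.get("status") == "True" for c in conditions
    )


def release_state(pods: Iterable[dict[str, Any]]) -> str:
    """Classify a set of Pods as "failed", "ready" or "pending".

    Unlike waiting only for readiness, a single Pod in phase "Failed"
    is reported straight away so the caller can stop waiting.
    """
    pods = list(pods)
    if any(p.get("status", {}).get("phase") == "Failed" for p in pods):
        return "failed"
    # An empty list is treated as "pending": nothing has been scheduled yet.
    if pods and all(pod_is_ready(p) for p in pods):
        return "ready"
    return "pending"
```

A caller would invoke `release_state` in a loop and abort on `"failed"` rather than sleeping until the Helm timeout.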
We could support something a bit like dependsOn from compose.yaml.
We'd implement this by adding an initContainer which uses a service account to wait until the required services are ready. This would involve passing a service account to the initContainer. Questions remain as to what timeout behaviour to use.
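To make the timeout question more concrete, the wait loop inside such an initContainer could look roughly like the sketch below. It is an illustration under assumptions, not a proposed implementation: `wait_for_services` and its parameters are hypothetical names, and the `is_ready` callback stands in for whatever readiness query the service account would be used for.

```python
import time
from typing import Callable, Iterable


def wait_for_services(
    names: Iterable[str],
    is_ready: Callable[[str], bool],
    timeout_s: float,
    interval_s: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Block until every named service is ready, or the timeout expires.

    Returns True if all services became ready and False on timeout.
    The clock and sleep functions are injectable so the loop can be tested.
    """
    pending = set(names)
    deadline = clock() + timeout_s
    while True:
        # Only re-check services that have not been seen ready yet.
        pending = {name for name in pending if not is_ready(name)}
        if not pending:
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

With this shape, the initContainer would exit non-zero when the function returns False, so the timeout behaviour reduces to choosing `timeout_s` and deciding what a failed initContainer should mean for the Pod.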
Other suggestions were to
Expose the restartPolicy to the user and let them control it (would still entail using Jobs)
Write our own controller, though this is a large undertaking
Use a Helm hook (like test) which the Python code can call periodically to check that there have been no restarts
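Whichever mechanism is used to run the check, the restart detection itself could be a small function over Pod status. A minimal sketch, assuming Pods are plain dicts shaped like the `items` of `kubectl get pods -o json`; the name `total_restarts` is hypothetical:

```python
from typing import Any, Iterable


def total_restarts(pods: Iterable[dict[str, Any]]) -> int:
    """Sum restartCount over all container and init container statuses."""
    total = 0
    for pod in pods:
        status = pod.get("status", {})
        for key in ("containerStatuses", "initContainerStatuses"):
            # The key may be absent or null before containers are created.
            for container in status.get(key) or []:
                total += int(container.get("restartCount", 0))
    return total
```

A periodic check would then treat any non-zero result as a signal that a container has restarted.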
We noted that we don't have a good handle on how often this is an issue.
This is described in the docs https://k8s-sandbox.ai-safety-institute.org.uk/design/limitations/#containers-may-restart - please read this first to make sense of the info below.