
Enhance "Helm install failed: context deadline exceeded" errors #50

Open
craigwalton-dsit opened this issue Jan 8, 2025 · 1 comment

Comments

@craigwalton-dsit
Collaborator

These are typically caused by:

  • insufficient cluster capacity
  • an error starting the container
  • an error pulling the image

See docs https://k8s-sandbox.ai-safety-institute.org.uk/tips/troubleshooting/#im-seeing-helm-install-context-deadline-exceeded-errors
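As a rough sketch of how the three causes above might be recognised programmatically, the Python below maps Kubernetes reason strings (Pod event reasons and container `waiting` reasons) onto them. The reason strings are real Kubernetes values, but the grouping, the `CAUSES` table and the `classify_reason` helper are hypothetical, and the lists are not exhaustive.

```python
from typing import Optional

# Hypothetical grouping of real Kubernetes reason strings into the three
# causes listed above. Not exhaustive; extend as new failure modes appear.
CAUSES = {
    "insufficient cluster capacity": {"FailedScheduling"},
    "error pulling the image": {"ErrImagePull", "ImagePullBackOff", "InvalidImageName"},
    "error starting the container": {
        "CrashLoopBackOff",
        "CreateContainerConfigError",
        "CreateContainerError",
        "RunContainerError",
    },
}


def classify_reason(reason: str) -> Optional[str]:
    """Return the likely cause for an event or container-waiting reason, if known."""
    for cause, reasons in CAUSES.items():
        if reason in reasons:
            return cause
    return None  # Unknown reason: caller should fall back to the raw message.
```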

Oftentimes, the underlying error can be gleaned by viewing the Pod events, for example in K9s (:pod, select the relevant Pod, press d, scroll to the bottom). Should we try to extract these events and bundle them into the error message, to make the user's life easier?
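A minimal sketch of the extraction step, assuming we shell out to `kubectl` rather than use a Kubernetes client library: `fetch_pod_events` runs `kubectl get events` filtered to a single Pod, and `summarize_warnings` keeps only the most recent `Warning` events for inclusion in the error message. Both function names and the `limit` parameter are hypothetical.

```python
import json
import subprocess
from typing import Any


def fetch_pod_events(namespace: str, pod: str) -> dict[str, Any]:
    """Return `kubectl get events` output for one Pod, parsed from JSON."""
    out = subprocess.run(
        [
            "kubectl", "get", "events",
            "--namespace", namespace,
            # Restrict to events whose involved object is this Pod.
            "--field-selector", f"involvedObject.kind=Pod,involvedObject.name={pod}",
            "--output", "json",
        ],
        check=True,
        capture_output=True,
        text=True,
    ).stdout
    return json.loads(out)


def summarize_warnings(events: dict[str, Any], limit: int = 5) -> str:
    """Format the `limit` most recent Warning events, oldest first."""
    warnings = [e for e in events.get("items", []) if e.get("type") == "Warning"]
    # lastTimestamp can be null on some events; fall back to eventTime.
    warnings.sort(key=lambda e: e.get("lastTimestamp") or e.get("eventTime") or "")
    lines = [f"{e.get('reason')}: {e.get('message', '').strip()}" for e in warnings[-limit:]]
    return "\n".join(lines)
```

Something like `summarize_warnings(fetch_pod_events(namespace, pod))` could then be appended to the Helm error text. Keeping the formatting separate from the `kubectl` call means it can be tested without a cluster.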

Some difficulties:

  • By the time the helm install --wait fails, will Helm have automatically removed the resources?
  • There can be many Pods in a release; do we infer which one is problematic based on its status?
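On the second difficulty, one possible heuristic (an assumption, not a settled design) is to report every Pod that is unschedulable, or that has a non-ready container stuck in a `waiting` or `terminated` state. The sketch below works on the JSON from `kubectl get pods -o json`; `diagnose_pod` and `problematic_pods` are hypothetical names. It deliberately ignores init containers and containers that are running but failing readiness probes.

```python
from typing import Any


def diagnose_pod(pod: dict[str, Any]) -> list[str]:
    """Return human-readable problems for one Pod; empty if it looks healthy."""
    name = pod["metadata"]["name"]
    status = pod.get("status", {})
    problems = []
    for cond in status.get("conditions", []):
        # An unschedulable Pod typically has no container statuses yet,
        # so the PodScheduled condition is where the reason lives.
        if cond.get("type") == "PodScheduled" and cond.get("status") == "False":
            problems.append(f"{name}: {cond.get('reason')}: {cond.get('message', '')}")
    for cs in status.get("containerStatuses", []):
        if cs.get("ready"):
            continue
        state = cs.get("state", {})
        # Only waiting/terminated states carry a reason worth reporting here.
        detail = state.get("waiting") or state.get("terminated")
        if detail:
            problems.append(
                f"{name}/{cs['name']}: {detail.get('reason')}: {detail.get('message', '')}"
            )
    return problems


def problematic_pods(pod_list: dict[str, Any]) -> dict[str, list[str]]:
    """Map Pod name to its problems, skipping Pods that look healthy."""
    result = {}
    for pod in pod_list.get("items", []):
        problems = diagnose_pod(pod)
        if problems:
            result[pod["metadata"]["name"]] = problems
    return result
```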
@craigwalton-dsit
Collaborator Author

craigwalton-dsit commented Jan 9, 2025

I don't believe Helm will uninstall the chart if the installation fails unless the --atomic flag is set (which we don't currently do). So the resources should still be around for us to query.

I also tried passing the --debug flag to helm install. This tells us which resources are not ready, e.g.

ready.go:425: 2025-01-09 12:54:00.254703868 +0000 UTC m=+9.082961258 [debug] StatefulSet is not ready: craig-walton-default/agent-env-foo-default. 0 out of 1 expected pods are ready

but not why they're not ready, which can be gleaned from K9s/kubectl:

Failed to pull image "python:3.12-bookworm-doesnotexist": rpc error: code = NotFound desc = failed to pull and unpack image "docker.io/library/python:3.12-bookworm-doesnotexist": failed to resolve reference "docker.io/library/python:3.12-bookworm-doesnotexist": docker.io/library/python:3.12-bookworm-doesnotexist: not found     
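If we do run with --debug, the not-ready line could tell us which workload to inspect. A sketch of parsing it, assuming the exact message format quoted above: Helm's debug output is not a stable interface, so the regex is a guess that may not match other resource kinds or Helm versions. `parse_not_ready` and `NotReady` are hypothetical names.

```python
import re
from typing import NamedTuple, Optional

# Assumed format, based only on the single `helm install --debug` line
# quoted in this issue. May need adjusting for other kinds/versions.
NOT_READY = re.compile(
    r"\[debug\] (?P<kind>\w+) is not ready: "
    r"(?P<namespace>[^/\s]+)/(?P<name>\S+?)\. "
    r"(?P<ready>\d+) out of (?P<expected>\d+) expected pods are ready"
)


class NotReady(NamedTuple):
    kind: str
    namespace: str
    name: str
    ready: int
    expected: int


def parse_not_ready(line: str) -> Optional[NotReady]:
    """Extract the unready workload from a Helm debug line, or None if no match."""
    m = NOT_READY.search(line)
    if m is None:
        return None
    return NotReady(m["kind"], m["namespace"], m["name"], int(m["ready"]), int(m["expected"]))
```

For the example above this yields the StatefulSet agent-env-foo-default in namespace craig-walton-default, whose Pods could then be checked for events like the image-pull failure quoted above.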
