Add optional support for k8s liveness and rediness probes #30

michaelyaakoby · 2020-08-31T09:41:44Z

See https://docs.spring.io/spring-boot/docs/current/reference/html/production-ready-features.html#production-ready-kubernetes-probes for how its done in spring-boot's actuator.

In high level, pyctuator should support telling k8s when an application/service is ready to serve requests (i.e. readiness probe) and if the application is alive (this is the liveness probe) - this is different from the health status returned by /pyctuator/health for two reasons:

it should return fast, so shouldn't include health checks of external resources
it should reflect the k8s lifecycle as described in actuator's documentation (and in k8s docs of course)

While actuator allow to configure additional checks to these probes, it is suggested that initially we'll provide default implementation that users can choose to use.

The text was updated successfully, but these errors were encountered:

michaelyaakoby · 2020-09-21T07:44:08Z

Since if the Liveness State of an application is broken, Kubernetes will try to solve that problem by restarting the application instance, the “liveness” probe should not depend on health checks for external systems.
Otherwise, if an external system fails (e.g. a database, a Web API, an external cache) triggering the liveness to fail, Kubernetes might restart all application instances and create cascading failures.

Also, since Kubernetes will not route traffic to an instance of the application that it's "readiness state" is unready, checking external systems must be made carefully by the application developers.

The probes should return 200 if they the server is alive/ready and >=400 otherwise (or completely fail to repsonse).

For more details on these probes, see https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/.

Therefore I suggest:

Add two new endpoints under /pyctuator/health/probes', one for readinessand another forliveness`.
By default all they do is returning 200 OK.
Allow to register custom "readiness" checks which are functions that throw an exception to indicate not-ready.

WDYT?

simoneggler · 2021-06-29T11:12:23Z

there's another problem. when the registration with the admin server fails, pyctuator does not serve actuator endpoints at all. this behavior leads to cascading restarts since the readiness probe fails with connection refused.

michaelyaakoby · 2021-06-29T11:50:38Z

I see,
Can you provide steps to reproduce?
Thanks

simoneggler · 2021-06-29T20:43:46Z

point registration_url to an url that does not resolve (e.g. service in k8s cluster with no running pod)
create instance of Pyctuator (with flask)
flask only becomes ready after a timeout of more than two minutess (Failed registering with boot-admin, [Errno 110] Connection timed out)

expected behavoir: exception is thrown much earlier

michaelyaakoby · 2021-06-29T20:58:00Z

Thanks, will look into this.

michaelyaakoby · 2021-06-29T21:00:11Z

Actually, assuming you run your flask application in a k8s pod too, I'm thinking if this might be a behavior of k8s.
Can you try to curl to the same URL from the pod running the flask application? I wonder if you'll get an immediate answer.

simoneggler · 2021-06-29T21:02:31Z

Curl times out after two minutes...

michaelyaakoby · 2021-06-29T21:03:50Z

Yes, I just tested it myself.
Not sure that in this case I can make pyctuator timeout earlier.

simoneggler · 2021-06-29T21:06:09Z

How about connecting in the background? The service may be healthy without admin registration....

michaelyaakoby · 2021-06-29T21:16:04Z

I agree that you don't want to block an application from starting because the monitoring system is down.

Pyctuator is using the builtin http.client.HTTPConnection so it doesnt force you to use any library - whatever will be the solution, we need to maintain this.

Looking at http.client.HTTPConnection, I see there's an optional timeout that we may use, see https://docs.python.org/3/library/http.client.html

I can make this a config parameter with a default that's much lower, maybe 10s.

simoneggler · 2021-06-29T21:17:52Z

That‘d be great, thanks.

michaelyaakoby · 2021-07-12T19:05:05Z

Oh, just noticed we are discussing this hanging within the issue asking k8s probes.
Moving the discussion to #51

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add optional support for k8s liveness and rediness probes #30

Add optional support for k8s liveness and rediness probes #30

michaelyaakoby commented Aug 31, 2020

michaelyaakoby commented Sep 21, 2020

simoneggler commented Jun 29, 2021

michaelyaakoby commented Jun 29, 2021

simoneggler commented Jun 29, 2021

michaelyaakoby commented Jun 29, 2021

michaelyaakoby commented Jun 29, 2021

simoneggler commented Jun 29, 2021

michaelyaakoby commented Jun 29, 2021

simoneggler commented Jun 29, 2021

michaelyaakoby commented Jun 29, 2021

simoneggler commented Jun 29, 2021

michaelyaakoby commented Jul 12, 2021

Add optional support for k8s liveness and rediness probes #30

Add optional support for k8s liveness and rediness probes #30

Comments

michaelyaakoby commented Aug 31, 2020

michaelyaakoby commented Sep 21, 2020

simoneggler commented Jun 29, 2021

michaelyaakoby commented Jun 29, 2021

simoneggler commented Jun 29, 2021

michaelyaakoby commented Jun 29, 2021

michaelyaakoby commented Jun 29, 2021

simoneggler commented Jun 29, 2021

michaelyaakoby commented Jun 29, 2021

simoneggler commented Jun 29, 2021

michaelyaakoby commented Jun 29, 2021

simoneggler commented Jun 29, 2021

michaelyaakoby commented Jul 12, 2021