Define own WatchErrorHandler for toolscache.Reflector #2987

Open
alexanderstephan opened this issue Oct 18, 2024 · 7 comments
Labels
lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale.

Comments

@alexanderstephan

alexanderstephan commented Oct 18, 2024

client-go sets a default handler for watch errors, func DefaultWatchErrorHandler(r *Reflector, err error) (see https://github.com/kubernetes/client-go/blob/23900f492969db045e8a36396852381fd189569b/tools/cache/reflector.go#L154). Right now, when client-go is used inside controller-runtime, this handler logs through client-go's own logger, klog, by default, which leads to inconsistent log output. Therefore, I propose that controller-runtime overwrite this handler, i.e., use its own.

This handler could look something like this:

import (
	"fmt"
	"io"

	apierrors "k8s.io/apimachinery/pkg/api/errors"
	utilruntime "k8s.io/apimachinery/pkg/util/runtime"
	toolscache "k8s.io/client-go/tools/cache"
	ctrl "sigs.k8s.io/controller-runtime"
)

// WatchErrorHandler mirrors client-go's DefaultWatchErrorHandler but logs
// through controller-runtime's logger instead of klog.
func WatchErrorHandler(r *toolscache.Reflector, err error) {
	log := ctrl.Log.WithValues("name", r.Name(), "description", r.TypeDescription())
	switch {
	case apierrors.IsResourceExpired(err) || apierrors.IsGone(err):
		// Don't set LastSyncResourceVersionUnavailable - LIST call with ResourceVersion=RV already
		// has a semantic that it returns data at least as fresh as provided RV.
		// So first try to LIST with setting RV to resource version of last observed object.
		log.Error(err, "watch closed")
	case err == io.EOF:
		// watch closed normally
	case err == io.ErrUnexpectedEOF:
		log.Error(err, "watch closed with unexpected EOF")
	default:
		utilruntime.HandleError(fmt.Errorf("%s: Failed to watch %v: %v", r.Name(), r.TypeDescription(), err))
	}
}

As you can see, the logger now carries two fields from the reflector. I exposed these accessors recently (see kubernetes/client-go@ce42c29), so controller-runtime can essentially redefine client-go's default handler, which also logs these fields.

I would like to implement this, if it gets your approval.

@sbueringer
Member

sbueringer commented Oct 18, 2024

If I see it correctly, this part is currently being refactored upstream: kubernetes/kubernetes#127709

EDIT: Sorry, I linked the wrong PR; this is the right one: https://github.com/kubernetes/kubernetes/pull/126387/files#diff-9ccdf713e010f73dbebd01e936cb0077fc63e4f5ab941d865ded42da219d84ec

(cc @alvaroaleman)

@troy0820
Member

I see your point. When I implemented the PR that made the WatchErrorHandler overridable, it didn't occur to me that klog would get in the way of controller-runtime's logging. Implementing this seems doable: we could allow the handler to be set either to the default we provide or to an overriding function in the cache options.

However, by not using the upstream default handler we run into the problem of keeping parity with upstream. Maintaining that seems like a fair amount of drift, but it is still doable. A rough sketch of the cache-options wiring follows below.
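For illustration, here is a minimal sketch of that wiring. The cache.Options field name DefaultWatchErrorHandler is an assumption based on the override PR mentioned above, and it reuses the WatchErrorHandler proposed in the issue description:

import (
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/cache"
)

func newManager() (ctrl.Manager, error) {
	// Sketch only: DefaultWatchErrorHandler (field name assumed from the
	// override PR) replaces client-go's klog-based handler for every
	// informer the cache starts; users could still pass their own function.
	return ctrl.NewManager(ctrl.GetConfigOrDie(), ctrl.Options{
		Cache: cache.Options{
			DefaultWatchErrorHandler: WatchErrorHandler,
		},
	})
}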

@alexanderstephan
Author

Thanks, so my takeaway is that this seems reasonable to implement.

@sbueringer So, should I wait until the refactoring is done to avoid merge conflicts? Or maybe start now and use the PR's branch as a basis?

@sbueringer
Member

sbueringer commented Oct 22, 2024

I could be misunderstanding this completely, but I was wondering whether the problem still exists if we use the error handler that takes a context (and takes the logger out of the context).

I would recommend waiting until the upstream PR is merged and we have bumped to a Kubernetes version that includes the change. Then we can check whether we can simply switch to the error handler that takes a context.

(If they merge the upstream PR, we'll have the change in CR directly after the next Kubernetes alpha/beta release, which shouldn't take long.)
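For reference, a rough sketch of what such a context-taking handler could look like. The signature follows the context-taking variant in the linked upstream PR and is not a settled controller-runtime API; it assumes the logger is carried in the context:

import (
	"context"
	"io"

	"github.com/go-logr/logr"
	apierrors "k8s.io/apimachinery/pkg/api/errors"
	toolscache "k8s.io/client-go/tools/cache"
)

// Sketch only: signature assumed from the linked upstream refactor.
func watchErrorHandlerWithContext(ctx context.Context, r *toolscache.Reflector, err error) {
	// Pull whatever logger the caller stored in the context, so
	// controller-runtime's logger is used instead of klog.
	log := logr.FromContextOrDiscard(ctx).WithValues("name", r.Name(), "description", r.TypeDescription())
	switch {
	case apierrors.IsResourceExpired(err) || apierrors.IsGone(err):
		log.Error(err, "watch closed")
	case err == io.EOF:
		// watch closed normally
	case err == io.ErrUnexpectedEOF:
		log.Error(err, "watch closed with unexpected EOF")
	default:
		log.Error(err, "failed to watch")
	}
}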

@alexanderstephan
Author

Yes, okay, that seems good; let's wait then. Using the context here would also make sense to me. And from my understanding, we then wouldn't have the drift problem mentioned before.

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-ci-robot added the lifecycle/stale label on Jan 20, 2025
@alexanderstephan
Author

@sbueringer Do you have any updates on whether this issue is still relevant, now that I can see the other PR has been merged?
