Define own WatchErrorHandler for toolscache.Reflector #2987

Open
alexanderstephan opened this issue Oct 18, 2024 · 7 comments
Labels
lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale.

Comments

@alexanderstephan

alexanderstephan commented Oct 18, 2024

client-go sets a default handler for watch errors, func DefaultWatchErrorHandler(r *Reflector, err error) (see https://github.com/kubernetes/client-go/blob/23900f492969db045e8a36396852381fd189569b/tools/cache/reflector.go#L154). Right now, when client-go is used inside controller-runtime, this handler logs through client-go's own logger, klog, by default, which leads to inconsistent log output. Therefore, I propose that controller-runtime overwrite this handler, i.e., use its own.

This handler could look something like this:

import (
	"fmt"
	"io"

	apierrors "k8s.io/apimachinery/pkg/api/errors"
	utilruntime "k8s.io/apimachinery/pkg/util/runtime"
	toolscache "k8s.io/client-go/tools/cache"
	ctrl "sigs.k8s.io/controller-runtime"
)

// WatchErrorHandler mirrors client-go's DefaultWatchErrorHandler but logs
// through controller-runtime's logger instead of klog.
func WatchErrorHandler(r *toolscache.Reflector, err error) {
	log := ctrl.Log.WithValues("name", r.Name(), "description", r.TypeDescription())
	switch {
	case apierrors.IsResourceExpired(err) || apierrors.IsGone(err):
		// Don't set LastSyncResourceVersionUnavailable - LIST call with ResourceVersion=RV already
		// has a semantic that it returns data at least as fresh as provided RV.
		// So first try to LIST with setting RV to resource version of last observed object.
		log.Error(err, "watch closed")
	case err == io.EOF:
		// watch closed normally
	case err == io.ErrUnexpectedEOF:
		log.Error(err, "watch closed with unexpected EOF")
	default:
		utilruntime.HandleError(fmt.Errorf("%s: Failed to watch %v: %v", r.Name(), r.TypeDescription(), err))
	}
}

As you can see, the logger now carries two fields from the reflector. I exposed these accessors recently (see kubernetes/client-go@ce42c29), so controller-runtime can essentially redefine client-go's default handler, which also logs these fields.

I would like to implement this, if it gets your approval.

@sbueringer
Member

sbueringer commented Oct 18, 2024

If I see it correctly, this part is currently being refactored upstream: kubernetes/kubernetes#127709

EDIT: Sorry, I linked the wrong PR; this is the right one: https://github.com/kubernetes/kubernetes/pull/126387/files#diff-9ccdf713e010f73dbebd01e936cb0077fc63e4f5ab941d865ded42da219d84ec

(cc @alvaroaleman)

@troy0820
Member

I see your point. When I implemented the PR that made the WatchErrorHandler overridable, it didn't occur to me that klog would get in the way of controller-runtime's logging. Implementing this seems doable: we could allow the handler to be set either to the default we provide or to an overriding function in the cache options.

However, by not using the upstream default handler we run into the problem of keeping parity with upstream. Maintaining that seems like a fair amount of drift, but it is still doable. A rough sketch of the cache-options wiring follows below.
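For illustration, here is a minimal sketch of that wiring. The cache.Options field name DefaultWatchErrorHandler is an assumption based on the override PR mentioned above, and it reuses the WatchErrorHandler proposed in the issue description:

import (
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/cache"
)

func newManager() (ctrl.Manager, error) {
	// Sketch only: DefaultWatchErrorHandler (field name assumed from the
	// override PR) replaces client-go's klog-based handler for every
	// informer the cache starts; users could still pass their own function.
	return ctrl.NewManager(ctrl.GetConfigOrDie(), ctrl.Options{
		Cache: cache.Options{
			DefaultWatchErrorHandler: WatchErrorHandler,
		},
	})
}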

@alexanderstephan
Author

Thanks, so my takeaway is that this seems reasonable to implement.

@sbueringer So, should I wait until the refactoring is done to avoid merge conflicts? Or maybe start now and use the PR's branch as a basis?

@sbueringer
Member

sbueringer commented Oct 22, 2024

I could be misunderstanding this completely, but I was wondering whether the problem still exists if we use the error handler that takes a context (and takes the logger out of the context).

I would recommend waiting until the upstream PR is merged and we have bumped to a Kubernetes version that includes the change. Then we can check whether we can simply switch to the error handler that takes a context.

(If they merge the upstream PR, we'll have the change in CR directly after the next Kubernetes alpha/beta release, which shouldn't take long.)
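For reference, a rough sketch of what such a context-taking handler could look like. The signature follows the context-taking variant in the linked upstream PR and is not a settled controller-runtime API; it assumes the logger is carried in the context:

import (
	"context"
	"io"

	"github.com/go-logr/logr"
	apierrors "k8s.io/apimachinery/pkg/api/errors"
	toolscache "k8s.io/client-go/tools/cache"
)

// Sketch only: signature assumed from the linked upstream refactor.
func watchErrorHandlerWithContext(ctx context.Context, r *toolscache.Reflector, err error) {
	// Pull whatever logger the caller stored in the context, so
	// controller-runtime's logger is used instead of klog.
	log := logr.FromContextOrDiscard(ctx).WithValues("name", r.Name(), "description", r.TypeDescription())
	switch {
	case apierrors.IsResourceExpired(err) || apierrors.IsGone(err):
		log.Error(err, "watch closed")
	case err == io.EOF:
		// watch closed normally
	case err == io.ErrUnexpectedEOF:
		log.Error(err, "watch closed with unexpected EOF")
	default:
		log.Error(err, "failed to watch")
	}
}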

@alexanderstephan
Author

Yes, okay, that seems good; let's wait then. Using the context here would also make sense to me. And from my understanding, we then wouldn't have the drift problem mentioned before.

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-ci-robot added the lifecycle/stale label on Jan 20, 2025
@alexanderstephan
Author

@sbueringer Do you have any updates on whether this issue is still relevant, now that I can see the other PR has been merged?
