Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add renewal for scanner to prevent lease timeout #277

Merged
merged 1 commit into from
Dec 23, 2024

Conversation

tanay-vakharia
Copy link
Contributor

This change adds an option to Scan (Renew) that allows for renewal of scanners during periods of starvation to prevent lease timeout. Currently, we are renewing every lease timeout / 2 seconds (10s currently).

Copy link
Contributor

@cluckett-arista cluckett-arista left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't normally work on this, so just a few general nits and a request to run gofmt

integration_test.go Outdated Show resolved Hide resolved
hrpc/scan.go Show resolved Hide resolved
integration_test.go Outdated Show resolved Hide resolved
@tanay-vakharia tanay-vakharia force-pushed the renewScanners branch 2 times, most recently from ef48bd3 to 8a503e4 Compare December 4, 2024 19:37
integration_test.go Outdated Show resolved Hide resolved
scanner.go Outdated Show resolved Hide resolved
scanner.go Outdated Show resolved Hide resolved
scanner.go Outdated Show resolved Hide resolved
scanner.go Outdated Show resolved Hide resolved
scanner.go Outdated Show resolved Hide resolved
scanner.go Outdated Show resolved Hide resolved
scanner.go Outdated Show resolved Hide resolved
scanner.go Outdated Show resolved Hide resolved
scanner.go Outdated Show resolved Hide resolved
scanner.go Outdated Show resolved Hide resolved
hrpc/scan.go Outdated Show resolved Hide resolved
@tanay-vakharia tanay-vakharia force-pushed the renewScanners branch 3 times, most recently from 67a91e0 to 8df1eda Compare December 6, 2024 22:39
scanner.go Outdated Show resolved Hide resolved
integration_test.go Outdated Show resolved Hide resolved
scanner.go Outdated
panic("Renew timer was not stopped before starting again")
}
if s.rpc.RenewInterval() > 0 {
s.renewTimer = time.AfterFunc(s.rpc.RenewInterval(), s.renewScanner)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A different way to do this would be to start a goroutine that sleeps on a ticker and context.

renewCtx, cancel := context.WithCancel(s.Context())
s.renewCancel = cancel
go s.renewLoop(renewCtx)

And renewLoop would look like:

func (s *scanner) renewLoop() {
    t := time.Ticker(s.rpc.RenewInterval()
    defer t.Stop()
    for {
        select {
        case <-t.C:
             if err := s.renew(); err != nil {
                    return
              }
        case <-ctx.Done():
            return
        }
    }
}

The advantage here is that the renewLoop routine will clean itself up immediately after the context is canceled and doesn't have to wait for the timer to go off to notice the context is done. And I think it's slightly simpler to have a loop rather than renewScanner() needing to start a new renew timer.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another advantage here is that it provides a place for us to add a metric to track the number of live renewals. It can be done in a follow on change, but we will want to add a metric that increments when this function starts and decrements when it exits. That way we can track if we are leaking scanners.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, this definitely cleans up the code and makes it easier to add metrics in the future. I've implemented this, feel free to take a look!

scanner.go Outdated Show resolved Hide resolved
scanner.go Outdated Show resolved Hide resolved
scanner.go Outdated Show resolved Hide resolved
@timoha
Copy link
Collaborator

timoha commented Dec 7, 2024

Just my 2c:
What's preventing scanning go routine to keep scanning and storing in a buffer end purposefully renewing the lease? I think the lease should only be renewed purposefully if you are actively scanning data.

Also, please make sure to not have this behavior by default as it might be unexpected for some users, as now their "periods of starvation" in client layer will cause leaked snanners within regionservers.

@aaronbee
Copy link
Collaborator

aaronbee commented Dec 7, 2024

Just my 2c: What's preventing scanning go routine to keep scanning and storing in a buffer end purposefully renewing the lease? I think the lease should only be renewed purposefully if you are actively scanning data.

Also, please make sure to not have this behavior by default as it might be unexpected for some users, as now their "periods of starvation" in client layer will cause leaked snanners within regionservers.

Hey Andrey!
Yeah, I agree this shouldn't be enabled by default inside gohbase.
We have some needs for keeping a long lived scanner running. Buffering extra data is not great because of the extra memory use.

@timoha
Copy link
Collaborator

timoha commented Dec 7, 2024

Hey Aaron!
Got it, it makes sense as an option. Just another alternative I'd suggest in those cases is just increasing lease timeout (server or client config). I would expect that eventually you would have to timeout anyway with your scan

@tanay-vakharia tanay-vakharia force-pushed the renewScanners branch 3 times, most recently from e00cce6 to 440f87c Compare December 10, 2024 19:35
@aaronbee aaronbee force-pushed the renewScanners branch 4 times, most recently from 13132ce to 7a09f93 Compare December 23, 2024 21:51
@aaronbee aaronbee merged commit 7abcffa into tsuna:master Dec 23, 2024
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants