Retry more aggressively in VssStore. #368
src/io/vss_store.rs
Outdated
.with_max_total_delay(Duration::from_secs(2))
let retry_policy = ExponentialBackoffRetryPolicy::new(Duration::from_millis(50))
.with_max_attempts(6)
.with_max_total_delay(Duration::from_secs(7))
Even seven seconds seems like a really short period before we give up and crash entirely :/
Maybe. 7 was just calculated from the backoff intervals between retries for approximately 6 attempts.
I wanted to increase it further, but increasing it too much leaves the node stuck for that duration, so too much is also bad? (Difficult to determine; 10 secs?)
Open to other options: we could also remove the limit on max_total_delay and just keep something like 8 exponential backoff retries, or use a naive retry with jitter and a fixed delay.
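To make the trade-off between attempt count and total delay concrete, here is a minimal sketch that lists the waits a plain doubling backoff would produce under both limits. It assumes each wait is simply double the previous one; `backoff_schedule` is a hypothetical helper written for illustration, and the actual `ExponentialBackoffRetryPolicy` may compute its delays differently.

```rust
use std::time::Duration;

/// Hypothetical helper (not part of any library): returns the waits between
/// attempts for a plain doubling backoff, stopping once either limit is hit.
fn backoff_schedule(
    base: Duration,
    max_attempts: u32,
    max_total_delay: Option<Duration>,
) -> Vec<Duration> {
    let mut delays = Vec::new();
    let mut total = Duration::ZERO;
    // N attempts have at most N - 1 waits between them.
    for i in 0..max_attempts.saturating_sub(1) {
        // Assumed formula: base * 2^i. The real policy may differ.
        let delay = base.saturating_mul(2u32.saturating_pow(i));
        let next_total = total.saturating_add(delay);
        if let Some(cap) = max_total_delay {
            if next_total > cap {
                // The next wait would exceed the total-delay budget, so give up here.
                break;
            }
        }
        total = next_total;
        delays.push(delay);
    }
    delays
}
```

Passing `None` for `max_total_delay` models the "remove the limit and keep only an attempt count" option; summing the returned durations gives the worst-case time the caller could spend waiting under this assumed formula.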
Mhh, so I agree with Matt that we had discussed retrying forever on persistence failure until we get a proper async-persistence interface as we just can't arbitrarily crash when our connectivity drops. I guess a raised max delay is better than nothing, but we probably still want to drop max attempts and max delay limits, or at the very least bump them way, way up?
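For illustration, the "retry forever" idea could look roughly like the sketch below: no attempt or total-delay limit, with the wait doubling up to a ceiling so it never grows unbounded. `retry_forever` and its parameters are hypothetical names, not part of the VSS client or LDK, and the blocking `sleep` is an assumption that mirrors the concern about the node being stuck while persistence is unavailable.

```rust
use std::thread::sleep;
use std::time::Duration;

/// Hypothetical helper: keeps calling `op` until it succeeds.
/// The wait between attempts doubles each time but is capped at `max_delay`.
fn retry_forever<T, E>(
    base: Duration,
    max_delay: Duration,
    mut op: impl FnMut() -> Result<T, E>,
) -> T {
    let mut delay = base;
    loop {
        match op() {
            Ok(value) => return value,
            Err(_) => {
                // Blocks the calling thread; nothing else runs here until the wait ends.
                sleep(delay);
                // Double the wait, but never exceed the ceiling.
                delay = delay.saturating_mul(2).min(max_delay);
            }
        }
    }
}
```

This version never gives up, so it never crashes on a dropped connection, but it also never returns control to the caller while the failure persists.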
I did bump them after the previous comment;
it is now 10 attempts and a 15-second max delay.
Let me know if you have a specific number in mind.
I think intermittently we still want to go the 'forever' route as recently discussed offline. In any case, I'm going to merge this as it's a step in the right direction, and it's now tracked here: #380
We also discussed reducing the number of potential panics, e.g., actually using
I think those changes are independent of these.
Since a failed persistence might cause LDK to panic.