Retry more aggressively in VssStore. #368

Merged: 1 commit merged into lightningdevkit:main on Oct 16, 2024

Conversation

@G8XSU (Contributor) commented Oct 10, 2024

Since a failed persistence might cause LDK to panic.

```rust
	.with_max_total_delay(Duration::from_secs(2))   // previous value
let retry_policy = ExponentialBackoffRetryPolicy::new(Duration::from_millis(50))
	.with_max_attempts(6)
	.with_max_total_delay(Duration::from_secs(7))   // updated value
```
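As an illustrative aside, the following is a generic sketch of how a base delay, a max-attempts cap, and a max-total-delay cap interact in an exponential-backoff loop. It is not vss-client's implementation, just a minimal model of the policy being tuned above.

```rust
use std::time::{Duration, Instant};

// Illustrative only: not vss-client's implementation. Shows where the "give up"
// point sits once either the attempt cap or the total-delay cap is exceeded.
fn retry_with_backoff<T, E>(
	mut op: impl FnMut() -> Result<T, E>, base: Duration, max_attempts: u32,
	max_total_delay: Duration,
) -> Result<T, E> {
	let start = Instant::now();
	let mut delay = base;
	let mut attempts = 0u32;
	loop {
		let err = match op() {
			Ok(v) => return Ok(v),
			Err(e) => e,
		};
		attempts += 1;
		// Once either cap is hit, the error is surfaced to the caller; in
		// VssStore's case that is where a panic could follow.
		if attempts >= max_attempts || start.elapsed() + delay > max_total_delay {
			return Err(err);
		}
		std::thread::sleep(delay);
		delay *= 2; // exponential growth between retries
	}
}
```

With the values from the snippet above, such a loop gives up after at most 6 attempts or roughly 7 seconds of accumulated delay, whichever limit is hit first.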


Even seven seconds seems like a really short period before we give up and crash entirely :/

@G8XSU (Contributor Author):

Maybe. 7 was just calculated based on the backoff intervals between retries for approximately 6 attempts.

I wanted to increase it further, but increasing it too much leads to the node being stuck for that duration, so too much is also bad? (Difficult to determine; 10 secs?)

Open to other options: we could also remove the limit on max_total_delay and just keep something like 8 exponential-backoff retries, or use a naive retry with jitter and a fixed delay.
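That estimate can be sanity-checked under the assumption that the delay simply doubles after each retry starting from the 50 ms base (an assumption about the backoff formula, not taken from vss-client):

```rust
use std::time::Duration;

// Assumed model: the n-th retry waits base * 2^n, so six retries from a 50 ms base
// wait 100 + 200 + 400 + 800 + 1600 + 3200 ms = 6300 ms, just under the 7 s cap.
fn total_backoff(base: Duration, retries: u32) -> Duration {
	(1..=retries).map(|n| base * 2u32.pow(n)).sum()
}

fn main() {
	assert_eq!(total_backoff(Duration::from_millis(50), 6), Duration::from_millis(6300));
}
```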

Collaborator (tnull):

Mhh, so I agree with Matt that we had discussed retrying forever on persistence failure until we get a proper async-persistence interface, as we just can't arbitrarily crash when our connectivity drops. I guess a raised max delay is better than nothing, but we probably still want to drop the max attempts and max delay limits, or at the very least bump them way, way up?

@G8XSU (Contributor Author) Oct 16, 2024

I did bump them after the previous comment;
now it is 10 attempts and 15 secs max delay.
Let me know if you have a specific number in mind.
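Under the same doubling assumption as above, ten retries from a 50 ms base would accumulate far more than 15 seconds of raw backoff, so the max_total_delay cap is what would actually bound how long the node stays stuck (again an illustration of the assumed model, not vss-client's exact formula):

```rust
use std::time::Duration;

// Same assumed doubling model as in the earlier estimate: the n-th retry waits base * 2^n.
fn total_backoff(base: Duration, retries: u32) -> Duration {
	(1..=retries).map(|n| base * 2u32.pow(n)).sum()
}

fn main() {
	// 100 + 200 + ... + 51_200 ms, roughly 102 s of raw backoff for 10 retries,
	// so the 15 s max_total_delay would be the binding limit, not the attempt count.
	let raw = total_backoff(Duration::from_millis(50), 10);
	assert_eq!(raw, Duration::from_millis(102_300));
	println!("raw backoff sum: {:?} vs. cap: {:?}", raw, Duration::from_secs(15));
}
```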

Collaborator (tnull):

I think in the interim we still want to go the 'forever' route, as recently discussed offline. In any case, going to merge this as it's a step in the right direction; it's now tracked here: #380

@tnull (Collaborator) commented Oct 11, 2024

> Since a failed persistence might cause LDK to panic.

We also discussed reducing the number of potential panics, e.g., actually using ReplayEvent in event handling now that #358 has landed. Should we still do this here or in a follow-up PR?

@G8XSU (Contributor Author) commented Oct 14, 2024

> We also discussed reducing the number of potential panics, e.g., actually using ReplayEvent in event handling now that #358 has landed. Should we still do this here or in a follow-up PR?

I think those changes are independent of these.
I made changes to ReplayEvent handling in case of persistence failure, but I'm not entirely sure whether they are sufficient/exhaustive for ldk-node: #374
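For illustration, a minimal sketch of what using ReplayEvent on persistence failure can look like in an event handler, assuming an LDK version where the handler returns Result<(), ReplayEvent> (#358); persist_side_effects is a hypothetical stand-in for whatever the node persists while handling an event:

```rust
use lightning::events::{Event, ReplayEvent};

// Minimal sketch of an event handler that requests a replay on persistence failure
// instead of panicking. `persist_side_effects` is a hypothetical helper.
fn handle_event(event: Event) -> Result<(), ReplayEvent> {
	match persist_side_effects(&event) {
		Ok(()) => Ok(()),
		// Returning ReplayEvent asks LDK to hand this event back on a later
		// event-processing pass rather than considering it handled.
		Err(_e) => Err(ReplayEvent()),
	}
}

// Hypothetical stand-in for the per-event persistence the handler performs.
fn persist_side_effects(_event: &Event) -> Result<(), std::io::Error> {
	Ok(())
}
```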

Commit: Since a failed persistence might cause LDK to panic.
@tnull merged commit ca6c2fa into lightningdevkit:main on Oct 16, 2024
12 of 13 checks passed