Bug: Crash on start if can't connected to bitcoind #1022
Comments
Can I work on this if the issue has not been solved? |
Would it work to check every few seconds (at an interval set in the config file) whether a daemon like bitcoind is running, and log an |
There's some design debate here to be had, I guess. Does |
There's a very good reason to do that: to detect misconfiguration early. If these daemons ignored failures then it'd take extra steps to verify that the configuration is correct. However your request is a very valid one and it has a neat solution: use systemd socket activation. The trick is to bind sockets first and then start all services in parallel. Once a service needs to call into another service it just blocks until the other service is ready. You can also postpone start of a service until it's actually needed by something (but that's probably not your problem). However to support socket activation you need the service to be able to reuse an already-bound socket. IOW it requires support from Note also that |
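For illustration, here is a minimal sketch of what "reusing an already-bound socket" could look like on the listening side. It follows the documented systemd convention (`LISTEN_PID` and `LISTEN_FDS` environment variables, passed descriptors starting at fd 3). The function names `inherited_fd_count` and `obtain_listener`, the TCP socket type, and the fallback address are assumptions made for this example, not code from any of the daemons discussed here.

```rust
use std::net::TcpListener;
use std::os::unix::io::FromRawFd;

/// First file descriptor systemd passes to a socket-activated service.
const SD_LISTEN_FDS_START: i32 = 3;

/// Returns how many sockets were handed over, given the values of
/// `LISTEN_PID` and `LISTEN_FDS` and our own process id.
/// (Hypothetical helper, kept pure so it is easy to test.)
fn inherited_fd_count(listen_pid: Option<&str>, listen_fds: Option<&str>, my_pid: u32) -> usize {
    // LISTEN_PID must name this process; otherwise the variables
    // were meant for some other process and must be ignored.
    let pid_matches = listen_pid
        .and_then(|p| p.parse::<u32>().ok())
        .map_or(false, |p| p == my_pid);
    if !pid_matches {
        return 0;
    }
    listen_fds.and_then(|n| n.parse::<usize>().ok()).unwrap_or(0)
}

/// Reuse the socket the supervisor already bound, or bind `fallback_addr` ourselves.
fn obtain_listener(fallback_addr: &str) -> std::io::Result<TcpListener> {
    let pid = std::env::var("LISTEN_PID").ok();
    let fds = std::env::var("LISTEN_FDS").ok();
    if inherited_fd_count(pid.as_deref(), fds.as_deref(), std::process::id()) >= 1 {
        // SAFETY: when LISTEN_PID/LISTEN_FDS are set for this process, fd 3 is
        // an open socket passed by the supervisor. This sketch assumes it is a
        // listening TCP socket.
        return Ok(unsafe { TcpListener::from_raw_fd(SD_LISTEN_FDS_START) });
    }
    TcpListener::bind(fallback_addr)
}

fn main() -> std::io::Result<()> {
    // Without socket activation this simply binds an ephemeral local port.
    let listener = obtain_listener("127.0.0.1:0")?;
    println!("listening on {}", listener.local_addr()?);
    Ok(())
}
```

Because the socket exists before the service is ready, clients that connect early just wait in the accept queue instead of getting a connection refused error.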
Yeah, the motives are good, but in practice, for a long-running service it just means the behavior towards the external service is inconsistent. "Detect misconfig. early" matters maybe when the service is first being set up. But this inconsistent behavior is present every time the app starts afterwards.
Ignoring it being OS-dependent, somewhat extra complexity, etc. it just doesn't work e.g. if The inconsistency of behavior when starting and already running makes the "detect misconfiguration" motivation invalid. |
To solve this we'd have to remember what the last configuration was and then compare it with the current one (or compare their hashes). It smells bad but I'm not sure why.
I think it's safe to assume tests are running locally using regtest and that's where speed matters. Once you deploy it the speed is not really that important because you'll launch it once per several months at most.
How? The configuration cannot be changed at run time so one has to restart anyway. |
If bitcoind is unavailable while electrs is already running, electrs will just keep retrying. If it's unavailable when electrs is starting, electrs will fail.
Trying to detect misconfiguration might be well meaning, but just misplaced and misguided, as it introduces weird behavior inconsistency, attempting to achieve something that can't be done right at this level anyway. |
IME changing ports and such, which would have a real impact on this, is very rare. It basically never happens. Getting the initial configuration to work is the "hard" part. I believe the current behavior provides a great balance of costs and benefits even if it looks inconsistent. Also, if your goal is to start the tests as soon as possible then we should have some way to force electrs to retry the connection before a timer expires. Probably using a signal. But it'd be best to check if systemd supports some mechanism to do this and make it compatible. |
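One way the "retry before the timer expires" idea could be structured is sketched below: the retry loop waits on a channel with a timeout instead of sleeping, so anything holding the sender can cut the wait short. This is only an illustration. The `Wake` enum and `wait_for_retry` are hypothetical names, and the thread in `main` stands in for whatever would actually deliver the wake-up (a signal handler, for instance); it is not how electrs works today.

```rust
use std::sync::mpsc::{channel, Receiver, RecvTimeoutError};
use std::time::Duration;

/// Why the wait between two connection attempts ended.
#[derive(Debug, PartialEq)]
enum Wake {
    /// The normal retry interval elapsed.
    TimerExpired,
    /// Someone explicitly asked for an immediate retry.
    Forced,
    /// The wake-up source is gone; the caller should fall back to plain sleeping.
    SourceGone,
}

/// Block until `timeout` elapses or a wake-up message arrives, whichever is first.
fn wait_for_retry(wake_rx: &Receiver<()>, timeout: Duration) -> Wake {
    match wake_rx.recv_timeout(timeout) {
        Ok(()) => Wake::Forced,
        Err(RecvTimeoutError::Timeout) => Wake::TimerExpired,
        Err(RecvTimeoutError::Disconnected) => Wake::SourceGone,
    }
}

fn main() {
    let (wake_tx, wake_rx) = channel();

    // Stand-in for a signal handler: request a retry after 50 ms.
    std::thread::spawn(move || {
        std::thread::sleep(Duration::from_millis(50));
        let _ = wake_tx.send(());
    });

    // The nominal retry interval is 10 s, but the wake-up ends the wait early.
    let reason = wait_for_retry(&wake_rx, Duration::from_secs(10));
    println!("retrying now: {:?}", reason);
}
```

Note that once every sender is dropped, `recv_timeout` returns immediately, so a real loop would need to handle `SourceGone` by sleeping normally to avoid spinning.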
hitting this while doing some tests:
if i restart the bitcoind service, electrs will stop on:
it's not a big deal in this case as the electrs service will restart, but it looks a bit ugly; I'd expect at least a timeout in this case, not a direct stop |
@pythcoiner that doesn't match what @dpc said - that it does retry. Which of you is correct? I do think it should retry if it's already running. |
@Kixunil in my case it does not retry at all; it stops directly at the instant I |
Describe the bug
The core question is: should a daemon like electrs crash on start if it can't connect to bitcoind?
Note the first timestamp: `20:13.921`; the whole test suite started at timestamp `20:13.911`.
`bitcoind` spawned in the background earlier, but was available for querying only a few seconds later. But 30ms into the test suite, `electrs` already gave up on it.
It seems like all Bitcoin daemons we're using are like that: lightningd, lnd, electrs, which makes me wonder: is this some shared design decision that I never learned, or just a weird coincidence? :D All three are different languages, different teams, etc.
Sure, in a real deployment there will always be some kind of supervisor to restart things, but still... I would expect daemons to never shut down just because they can't connect to another networked service. What's the point, if the supervisor is just going to start them again?
The context is: I'm trying to optimize our test suite starting time: letting more things start in parallel, etc. And it would be nice if I could start some daemons around the same time I'm starting `bitcoind`, and not have to postpone everything until `bitcoind` takes a shower, brushes teeth, eats breakfast and is finally ready for work.
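To make the request concrete, the behavior being asked for (keep trying at startup instead of exiting on the first failed connection) could look roughly like the sketch below. This is a generic illustration, not electrs code: `retry_with_backoff`, its parameters, and the address used in `main` are assumptions chosen for the example.

```rust
use std::thread::sleep;
use std::time::Duration;

/// Call `op` until it succeeds or `max_attempts` tries have failed.
/// The delay doubles after each failure, capped at `max_delay`.
/// At least one attempt is always made.
fn retry_with_backoff<T, E>(
    max_attempts: u32,
    initial_delay: Duration,
    max_delay: Duration,
    mut op: impl FnMut() -> Result<T, E>,
) -> Result<T, E> {
    let mut delay = initial_delay;
    let mut attempt = 1;
    loop {
        match op() {
            Ok(value) => return Ok(value),
            // Out of attempts: surface the last error to the caller.
            Err(e) if attempt >= max_attempts => return Err(e),
            Err(_) => {
                sleep(delay);
                delay = (delay * 2).min(max_delay);
                attempt += 1;
            }
        }
    }
}

fn main() {
    use std::net::TcpStream;

    // Hypothetical target: a local bitcoind RPC endpoint that may not be up yet.
    let addr = "127.0.0.1:8332";
    let result = retry_with_backoff(
        5,
        Duration::from_millis(100),
        Duration::from_secs(2),
        || TcpStream::connect(addr),
    );
    match result {
        Ok(_) => println!("connected to {}", addr),
        Err(e) => eprintln!("giving up on {}: {}", addr, e),
    }
}
```

With a finite `max_attempts` a persistent misconfiguration is still reported, just later than it is today, which is roughly the trade-off debated in the comments.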