While offline, edgeHub fails after automatic renewal of certificates #7321
Comments
I've encountered the same issue on my devices.
Hi @sejonssonr, sorry for the late answer. I have started investigating this and looked at the logs you attached. I see the periodic certificate renewal and a number of errors about not being able to connect to IoT Hub. I assume that is expected and caused by being offline. I also see a module connecting (roger-test-240424/dispatcher), although the related upstream connection fails. Again, I assume this is expected, as the upstream network connection was disabled for this test. The part I am not sure I understand is that the error description says "the edgeHub is stopped and fails to start again". In the logs, however, I see edgeHub restarting every few minutes, and based on the logs it accepts incoming connections. I can't see this part in the logs, but the expected behavior is that if the "dispatcher" module sends messages, they will be forwarded once edgeHub gets online again. Based on your description, "This causes both data loss and complete failure of downstream devices to run configured modules.", is this what does not happen? To recap, I want to clarify:
@vipeller Sorry for the late reply. I'm a colleague of @sejonssonr.
You can see that the edgeHub logs stop at 12:08:51, after one last attempt to start without connectivity.
Messages get lost when edgeHub is down, of course, but the other important issue is that all IoT Edge downstream devices (nested child devices) fail to start when the edgeHub on the parent device is down. It looks like, in this case, the Identity Service on the child devices fails indefinitely and is stuck in a restart loop. We restored the system by manually restarting edgeHub on the parent once connectivity had been restored.
Hi, is there any news on this?
@vipeller any update on this one?
Any updates?
Hey folks, we are looking into prioritizing a fix. Currently we think it might be on the SDK side. The hypothesis is that when edgeAgent is offline, it makes a call into the SDK that gets stuck (never returns), and because of that it stops restarting stopped modules. Because edgeHub stops in order to renew its cert, it then stays stopped. We still need to find some bandwidth on the team to get an isolated repro. Will update soon.
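To make the hypothesis above concrete, here is a minimal Python sketch of that failure mode. It is not edgeAgent or SDK code: `report_upstream`, `reconcile_once`, and the module status strings are hypothetical names chosen for illustration, and the only assumption modeled is "a call that never returns while offline blocks the pass that restarts stopped modules".

```python
import asyncio
from typing import Dict, Optional, Tuple


async def report_upstream(offline: bool) -> None:
    """Hypothetical stand-in for an SDK call that never returns while offline."""
    if offline:
        await asyncio.Event().wait()  # the event is never set, so this hangs


async def reconcile_once(modules: Dict[str, str], offline: bool,
                         timeout: Optional[float]) -> None:
    """One supervisor pass: call upstream, then restart stopped modules."""
    try:
        # timeout=None reproduces the hypothesized bug (wait forever).
        await asyncio.wait_for(report_upstream(offline), timeout)
    except asyncio.TimeoutError:
        pass  # give up on the upstream call and carry on locally
    for name, status in modules.items():
        if status == "stopped":
            modules[name] = "running"


async def demo() -> Tuple[str, str]:
    # Unbounded call: the pass hangs, so edgeHub is never restarted.
    # The outer wait_for only exists so the demo itself terminates.
    unbounded = {"edgeHub": "stopped"}
    try:
        await asyncio.wait_for(reconcile_once(unbounded, True, None), 0.2)
    except asyncio.TimeoutError:
        pass

    # Bounded call: the pass continues and edgeHub is restarted.
    bounded = {"edgeHub": "stopped"}
    await reconcile_once(bounded, True, 0.05)
    return unbounded["edgeHub"], bounded["edgeHub"]


result = asyncio.run(demo())
print(result)  # ('stopped', 'running')
```

Bounding the call with a timeout is only one way such a hang could be contained; whether it applies here depends on the isolated repro mentioned above.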
Just a quick update to say we are tracking this and should be able to line up some bandwidth for more investigation soon.
Our company has a large number of IoT devices that rely on the offline capabilities of IoT Edge.
We recently discovered that devices can run offline for a maximum of ~25 days.
The behaviour seems to be caused by the automatic renewal of device/workload certificates. The renewal interval can be specified by setting the edgeHub environment variable ServerCertificateRenewAfterInMs, but it maxes out at about 25 days (int32.max milliseconds).
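As a quick check of the ~25 day figure, the following Python sketch converts the largest signed 32-bit value into a duration. It assumes, as described above, that ServerCertificateRenewAfterInMs is held as a signed 32-bit count of milliseconds; `INT32_MAX` and `limit` are names introduced here for illustration.

```python
from datetime import timedelta

# Largest value a signed 32-bit integer can hold.
INT32_MAX = 2**31 - 1  # 2147483647

# Interpret that value as milliseconds (assumed unit of the setting).
limit = timedelta(milliseconds=INT32_MAX)

print(limit)                                # 24 days, 20:31:23.647000
print(round(limit / timedelta(days=1), 2))  # 24.86
```

So the ceiling is a little under 25 days, which matches the offline lifetime we observe.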
When the certificate is renewed, the edgeHub is stopped and fails to start again if the device is offline. This causes both data loss and complete failure of downstream devices to run configured modules. The edgeHub does not recover when connectivity is restored.
In the documentation found here the following is stated: "While disconnected from IoT Hub, the IoT Edge device, its deployed modules, and any downstream devices can operate indefinitely."
Expected Behavior
IoT Edge modules, including the edgeHub, can operate indefinitely in offline mode.
Current Behavior
The edgeHub stops after being offline for ~25 days, which causes data loss on the devices.
Steps to Reproduce
Provide a detailed set of steps to reproduce the bug.
[edge_ca]
cert = "file:///etc/pki/tls/certs/<mydeviceca>.full-chain.ca.cert.pem"
pk = "file:///etc/pki/tls/private/<mydeviceca>.key.pem"
Context (Environment)
Output of iotedge check
Device Information
Runtime Versions
Logs
aziot-edged logs
[iotedge_system_logs.txt](https://github.com/user-attachments/files/16100719/iotedge_system_logs.txt)
edge-agent logs
[edgeAgent_logs.txt](https://github.com/user-attachments/files/16100711/edgeAgent_logs.txt)
edge-hub logs
[edgeHub_logs.txt](https://github.com/user-attachments/files/16100674/edgeHub_logs.txt)
Additional Information
Logs supplied as files due to max character limit.