Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

edgeHub crashes on reboot, device goes into zombie state #7398

Open
codeputer opened this issue Nov 25, 2024 · 3 comments
Open

edgeHub crashes on reboot, device goes into zombie state #7398

codeputer opened this issue Nov 25, 2024 · 3 comments
Assignees

Comments

@codeputer
Copy link

Expected Behavior

Raspberry PI running iotedge, does not hang when device is asked to reboot

Current Behavior

issuing an "sudo reboot" command causes the edgeHub module to throw an exception on shutdown. This causes the device to hang, and requires a power cycle to clear. Power Cycling my devices in Africa is not cool.

Steps to Reproduce

Provide a detailed set of steps to reproduce the bug.

  1. Execute my configuration for a custom module to run
  2. Created a crontab job to reboot every 24 hours (may not be a great idea)
  3. device fails to shutdown (not 100% is consistent, but it does occur often)
  4. device is power cycled
  5. log of the iotEdge device show the exception

Context (Environment)

Raspberry zero 2 (x64) for both os and iotEdge

##workaround
'sudo iotedge system stop' before issuing the 'sudo reboot' command

Output of iotedge check

GFSAdmin001@GFS0000000315:~ $ sudo iotedge check --verbose

Configuration checks (aziot-identity-service)

√ keyd configuration is well-formed - OK
√ certd configuration is well-formed - OK
√ tpmd configuration is well-formed - OK
√ identityd configuration is well-formed - OK
√ daemon configurations up-to-date with config.toml - OK
√ identityd config toml file specifies a valid hostname - OK
√ aziot-identity-service package is up-to-date - OK
√ host time is close to reference time - OK
√ preloaded certificates are valid - OK
√ keyd is running - OK
√ certd is running - OK
√ identityd is running - OK
√ read all preloaded certificates from the Certificates Service - OK
√ read all preloaded key pairs from the Keys Service - OK
√ check all EST server URLs utilize HTTPS - OK
√ ensure all preloaded certificates match preloaded private keys with the same ID - OK

Connectivity checks (aziot-identity-service)

√ host can connect to and perform TLS handshake with iothub AMQP port - OK
√ host can connect to and perform TLS handshake with iothub HTTPS / WebSockets port - OK
√ host can connect to and perform TLS handshake with iothub MQTT port - OK

Configuration checks

√ aziot-edged configuration is well-formed - OK
√ configuration up-to-date with config.toml - OK
√ container engine is installed and functional - OK
√ configuration has correct URIs for daemon mgmt endpoint - OK
√ aziot-edge package is up-to-date - OK
× container time is close to host time - Error
Detected time drift between host and container
caused by: Detected time drift between host and container (NOTE: this happens only occasionally and is corrected when sync occurs)support_bundle_2024_11_25_18_34_29_UTC.zip

√ DNS server - OK
√ production readiness: logs policy - OK
√ production readiness: Edge Agent's storage directory is persisted on the host filesystem - OK
√ production readiness: Edge Hub's storage directory is persisted on the host filesystem - OK
√ Agent image is valid and can be pulled from upstream - OK
√ proxy settings are consistent in aziot-edged, aziot-identityd, moby daemon and config.toml - OK

Connectivity checks

√ container on the default network can connect to upstream AMQP port - OK
√ container on the default network can connect to upstream HTTPS / WebSockets port - OK
√ container on the default network can connect to upstream MQTT port - OK
skipping because of not required in this configuration
√ container on the IoT Edge module network can connect to upstream AMQP port - OK
√ container on the IoT Edge module network can connect to upstream HTTPS / WebSockets port - OK
√ container on the IoT Edge module network can connect to upstream MQTT port - OK
skipping because of not required in this configuration
34 check(s) succeeded.
1 check(s) raised errors.
2 check(s) were skipped due to errors from other checks. (NOTE: can't seem to remove this error)

Click here
<6> 2024-11-24 16:00:12.478 +00:00 [INF] - Termination requested, initiating shutdown.
<6> 2024-11-24 16:00:12.556 +00:00 [INF] - Stopping the protocol heads...
<6> 2024-11-24 16:00:12.665 +00:00 [INF] - Closing protocol heads - (MQTT, AMQP, HTTP)
<6> 2024-11-24 16:00:12.768 +00:00 [INF] - Stopping MQTT protocol head
<6> 2024-11-24 16:00:13.247 +00:00 [INF] - Closing HTTP head
<6> 2024-11-24 16:00:13.567 +00:00 [INF] - Waiting for cleanup to finish
<6> 2024-11-24 16:00:13.722 +00:00 [INF] - Closed HTTP head
<3> 2024-11-24 16:00:13.865 +00:00 [ERR] - Shutting down because no response from unix:///var/run/iotedge/workload.sock for Decrypt
System.Net.Sockets.SocketException (111): Connection refused
   at System.Net.Sockets.Socket.AwaitableSocketAsyncEventArgs.CreateException(SocketError error, Boolean forAsyncThrow)
   at System.Net.Sockets.Socket.AwaitableSocketAsyncEventArgs.ConnectAsync(Socket socket)
   at System.Net.Sockets.Socket.ConnectAsync(EndPoint remoteEP, CancellationToken cancellationToken)
   at System.Net.Sockets.Socket.ConnectAsync(EndPoint remoteEP)
   at Microsoft.Azure.Devices.Edge.Util.Uds.HttpUdsMessageHandler.SendAsync(HttpRequestMessage request, CancellationToken cancellationToken) in /mnt/vss/_work/1/s/edge-util/src/Microsoft.Azure.Devices.Edge.Util/uds/HttpUdsMessageHandler.cs:line 30
   at System.Runtime.CompilerServices.AsyncMethodBuilderCore.Start[TStateMachine](TStateMachine& stateMachine)
   at Microsoft.Azure.Devices.Edge.Util.Uds.HttpUdsMessageHandler.SendAsync(HttpRequestMessage request, CancellationToken cancellationToken)
   at System.Net.Http.HttpMessageInvoker.SendAsync(HttpRequestMessage request, CancellationToken cancellationToken)
   at System.Net.Http.HttpClient.<SendAsync>g__Core|83_0(HttpRequestMessage request, HttpCompletionOption completionOption, CancellationTokenSource cts, Boolean disposeCts, CancellationTokenSource pendingRequestsCts, CancellationToken originalCancellationToken)
   at System.Runtime.CompilerServices.AsyncMethodBuilderCore.Start[TStateMachine](TStateMachine& stateMachine)
   at System.Net.Http.HttpClient.<SendAsync>g__Core|83_0(HttpRequestMessage request, HttpCompletionOption completionOption, CancellationTokenSource cts, Boolean disposeCts, CancellationTokenSource pendingRequestsCts, CancellationToken originalCancellationToken)
   at Microsoft.Azure.Devices.Edge.Util.Edged.Version_2019_01_30.GeneratedCode.HttpWorkloadClient.DecryptAsync(String api_version, String name, String genid, DecryptRequest payload, CancellationToken cancellationToken) in /mnt/vss/_work/1/s/edge-util/src/Microsoft.Azure.Devices.Edge.Util/edged/version_2019_01_30/generatedCode/HttpWorkloadClient.cs:line 352
   at System.Runtime.CompilerServices.AsyncMethodBuilderCore.Start[TStateMachine](TStateMachine& stateMachine)
   at Microsoft.Azure.Devices.Edge.Util.Edged.Version_2019_01_30.GeneratedCode.HttpWorkloadClient.DecryptAsync(String api_version, String name, String genid, DecryptRequest payload, CancellationToken cancellationToken)
   at Microsoft.Azure.Devices.Edge.Util.Edged.Version_2019_01_30.GeneratedCode.HttpWorkloadClient.DecryptAsync(String api_version, String name, String genid, DecryptRequest payload) in /mnt/vss/_work/1/s/edge-util/src/Microsoft.Azure.Devices.Edge.Util/edged/version_2019_01_30/generatedCode/HttpWorkloadClient.cs:line 308
   at Microsoft.Azure.Devices.Edge.Util.Edged.Version_2019_01_30.WorkloadClient.<>c__DisplayClass6_0.<DecryptAsync>b__0() in /mnt/vss/_work/1/s/edge-util/src/Microsoft.Azure.Devices.Edge.Util/edged/version_2019_01_30/WorkloadClient.cs:line 81
   at Microsoft.Azure.Devices.Edge.Util.TransientFaultHandling.AsyncExecution`1.ExecuteAsyncImpl(Task ignore) in /mnt/vss/_work/1/s/edge-util/src/Microsoft.Azure.Devices.Edge.Util/transientFaultHandling/AsyncExecution[T].cs:line 64
   at Microsoft.Azure.Devices.Edge.Util.TransientFaultHandling.AsyncExecution`1.ExecuteAsync() in /mnt/vss/_work/1/s/edge-util/src/Microsoft.Azure.Devices.Edge.Util/transientFaultHandling/AsyncExecution[T].cs:line 43
   at Microsoft.Azure.Devices.Edge.Util.TransientFaultHandling.RetryPolicy.ExecuteAsync[TResult](Func`1 taskFunc, CancellationToken cancellationToken) in /mnt/vss/_work/1/s/edge-util/src/Microsoft.Azure.Devices.Edge.Util/transientFaultHandling/RetryPolicy.cs:line 254
   at Microsoft.Azure.Devices.Edge.Util.TransientFaultHandling.RetryPolicy.ExecuteAsync[TResult](Func`1 taskFunc) in /mnt/vss/_work/1/s/edge-util/src/Microsoft.Azure.Devices.Edge.Util/transientFaultHandling/RetryPolicy.cs:line 233
   at Microsoft.Azure.Devices.Edge.Util.Edged.WorkloadClientVersioned.ExecuteWithRetry[T](Func`1 func, Action`1 onRetry, ITransientErrorDetectionStrategy transientErrorDetection) in /mnt/vss/_work/1/s/edge-util/src/Microsoft.Azure.Devices.Edge.Util/edged/WorkloadClientVersioned.cs:line 89
   at Microsoft.Azure.Devices.Edge.Util.Edged.WorkloadClientVersioned.Execute[T](Func`1 func, String operation)
   at System.Runtime.CompilerServices.AsyncMethodBuilderCore.Start[TStateMachine](TStateMachine& stateMachine)
   at Microsoft.Azure.Devices.Edge.Util.Edged.WorkloadClientVersioned.Execute[T](Func`1 func, String operation)
   at Microsoft.Azure.Devices.Edge.Util.Edged.Version_2019_01_30.WorkloadClient.DecryptAsync(String initializationVector, String encryptedText) in /mnt/vss/_work/1/s/edge-util/src/Microsoft.Azure.Devices.Edge.Util/edged/version_2019_01_30/WorkloadClient.cs:line 81
   at System.Runtime.CompilerServices.AsyncMethodBuilderCore.Start[TStateMachine](TStateMachine& stateMachine)
   at Microsoft.Azure.Devices.Edge.Util.Edged.Version_2019_01_30.WorkloadClient.DecryptAsync(String initializationVector, String encryptedText)
   at Microsoft.Azure.Devices.Edge.Util.Edged.WorkloadClient.DecryptAsync(String initializationVector, String encryptedText) in /mnt/vss/_work/1/s/edge-util/src/Microsoft.Azure.Devices.Edge.Util/edged/WorkloadClient.cs:line 30
   at Microsoft.Azure.Devices.Edge.Util.Edged.EncryptionProvider.DecryptAsync(String encryptedText) in /mnt/vss/_work/1/s/edge-util/src/Microsoft.Azure.Devices.Edge.Util/edged/EncryptionProvider.cs:line 41
   at Microsoft.Azure.Devices.Edge.Storage.UpdatableEncryptedStore`2.DecryptValue(String value) in /mnt/vss/_work/1/s/edge-util/src/Microsoft.Azure.Devices.Edge.Storage/UpdatableEncryptedStore.cs:line 46
   at System.Runtime.CompilerServices.AsyncMethodBuilderCore.Start[TStateMachine](TStateMachine& stateMachine)
   at Microsoft.Azure.Devices.Edge.Storage.UpdatableEncryptedStore`2.DecryptValue(String value)
   at Microsoft.Azure.Devices.Edge.Storage.EncryptedStore`2.<Get>b__14_0(String e) in /mnt/vss/_work/1/s/edge-util/src/Microsoft.Azure.Devices.Edge.Storage/EncryptedStore.cs:line 50
   at System.Runtime.CompilerServices.AsyncMethodBuilderCore.Start[TStateMachine](TStateMachine& stateMachine)
   at Microsoft.Azure.Devices.Edge.Storage.EncryptedStore`2.<Get>b__14_0(String e)
   at Microsoft.Azure.Devices.Edge.Util.Option`1.Map[TResult](Func`2 mapping) in /mnt/vss/_work/1/s/edge-util/src/Microsoft.Azure.Devices.Edge.Util/Option.cs:line 177
   at Microsoft.Azure.Devices.Edge.Storage.EncryptedStore`2.Get(TK key, CancellationToken cancellationToken) in /mnt/vss/_work/1/s/edge-util/src/Microsoft.Azure.Devices.Edge.Storage/EncryptedStore.cs:line 47
   at System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1.AsyncStateMachineBox`1.ExecutionContextCallback(Object s)
   at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state)
   at System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1.AsyncStateMachineBox`1.MoveNext(Thread threadPoolThread)
   at System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1.AsyncStateMachineBox`1.MoveNext()
   at System.Threading.Tasks.AwaitTaskContinuation.RunOrScheduleAction(IAsyncStateMachineBox box, Boolean allowInlining)
   at System.Threading.Tasks.Task.RunContinuations(Object continuationObject)
   at System.Threading.Tasks.Task`1.TrySetResult(TResult result)
   at System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1.SetExistingTaskResult(Task`1 task, TResult result)
   at System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1.SetResult(TResult result)
   at Microsoft.Azure.Devices.Edge.Util.TaskEx.TimeoutAfter[T](Task`1 task, TimeSpan timeout, Action action) in /mnt/vss/_work/1/s/edge-util/src/Microsoft.Azure.Devices.Edge.Util/TaskEx.cs:line 137
   at System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1.AsyncStateMachineBox`1.ExecutionContextCallback(Object s)
   at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state)
   at System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1.AsyncStateMachineBox`1.MoveNext(Thread threadPoolThread)
   at System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1.AsyncStateMachineBox`1.MoveNext()
   at System.Threading.Tasks.AwaitTaskContinuation.RunOrScheduleAction(IAsyncStateMachineBox box, Boolean allowInlining)
   at System.Threading.Tasks.Task.RunContinuations(Object continuationObject)
   at System.Threading.Tasks.Task`1.TrySetResult(TResult result)
   at System.Threading.Tasks.Task.TwoTaskWhenAnyPromise`1.Invoke(Task completingTask)
   at System.Threading.Tasks.Task.RunContinuations(Object continuationObject)
   at System.Threading.Tasks.Task`1.TrySetResult(TResult result)
   at System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1.SetExistingTaskResult(Task`1 task, TResult result)
   at System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1.SetResult(TResult result)
   at Microsoft.Azure.Devices.Edge.Storage.KeyValueStoreMapper`4.Get(TK key, CancellationToken cancellationToken) in /mnt/vss/_work/1/s/edge-util/src/Microsoft.Azure.Devices.Edge.Storage/KeyValueStoreMapper.cs:line 51
   at System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1.AsyncStateMachineBox`1.ExecutionContextCallback(Object s)
   at System.Threading.ExecutionContext.RunFromThreadPoolDispatchLoop(Thread threadPoolThread, ExecutionContext executionContext, ContextCallback callback, Object state)
   at System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1.AsyncStateMachineBox`1.MoveNext(Thread threadPoolThread)
   at System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1.AsyncStateMachineBox`1.ExecuteFromThreadPool(Thread threadPoolThread)
   at System.Threading.ThreadPoolWorkQueue.Dispatch()
   at System.Threading.PortableThreadPool.WorkerThread.WorkerThreadStart()
--- End of stack trace from previous location ---
   at Microsoft.Azure.Devices.Edge.Util.Uds.HttpUdsMessageHandler.SendAsync(HttpRequestMessage request, CancellationToken cancellationToken) in /mnt/vss/_work/1/s/edge-util/src/Microsoft.Azure.Devices.Edge.Util/uds/HttpUdsMessageHandler.cs:line 30
   at System.Net.Http.HttpClient.<SendAsync>g__Core|83_0(HttpRequestMessage request, HttpCompletionOption completionOption, CancellationTokenSource cts, Boolean disposeCts, CancellationTokenSource pendingRequestsCts, CancellationToken originalCancellationToken)
   at Microsoft.Azure.Devices.Edge.Util.Edged.Version_2019_01_30.GeneratedCode.HttpWorkloadClient.DecryptAsync(String api_version, String name, String genid, DecryptRequest payload, CancellationToken cancellationToken) in /mnt/vss/_work/1/s/edge-util/src/Microsoft.Azure.Devices.Edge.Util/edged/version_2019_01_30/generatedCode/HttpWorkloadClient.cs:line 352
   at Microsoft.Azure.Devices.Edge.Util.TaskEx.TimeoutAfter[T](Task`1 task, TimeSpan timeout, Action action) in /mnt/vss/_work/1/s/edge-util/src/Microsoft.Azure.Devices.Edge.Util/TaskEx.cs:line 135
   at Microsoft.Azure.Devices.Edge.Util.Edged.WorkloadClientVersioned.Execute[T](Func`1 func, String operation) in /mnt/vss/_work/1/s/edge-util/src/Microsoft.Azure.Devices.Edge.Util/edged/WorkloadClientVersioned.cs:line 61
2024-11-24 16:02:06  Starting Edge Hub
2024-11-24 16:02:06  Starting Edge Hub
2024-11-24 16:02:06  Changing ownership of storage folder: /tmp/edgeHub/edgeHub to 13623
2024-11-24 16:02:06  Changing ownership of backup folder: /tmp/edgeHub_backup to 13623
2024-11-24 16:02:07.406 +00:00 Edge Hub Main()
<3> 2024-11-24 16:02:10.301 +00:00 [ERR] - Shutting down because no response from unix:///var/run/iotedge/workload.sock for CreateServerCertificateAsync
System.Net.Sockets.SocketException (111): Connection refused
   at System.Net.Sockets.Socket.AwaitableSocketAsyncEventArgs.CreateException(SocketError error, Boolean forAsyncThrow)
   at System.Net.Sockets.Socket.AwaitableSocketAsyncEventArgs.ConnectAsync(Socket socket)
   at System.Net.Sockets.Socket.ConnectAsync(EndPoint remoteEP, CancellationToken cancellationToken)
   at System.Net.Sockets.Socket.ConnectAsync(EndPoint remoteEP)
   at Microsoft.Azure.Devices.Edge.Util.Uds.HttpUdsMessageHandler.SendAsync(HttpRequestMessage request, CancellationToken cancellationToken) in /mnt/vss/_work/1/s/edge-util/src/Microsoft.Azure.Devices.Edge.Util/uds/HttpUdsMessageHandler.cs:line 30
   at System.Runtime.CompilerServices.AsyncMethodBuilderCore.Start[TStateMachine](TStateMachine& stateMachine)
   at Microsoft.Azure.Devices.Edge.Util.Uds.HttpUdsMessageHandler.SendAsync(HttpRequestMessage request, CancellationToken cancellationToken)
   at System.Net.Http.HttpMessageInvoker.SendAsync(HttpRequestMessage request, CancellationToken cancellationToken)
   at System.Net.Http.HttpClient.<SendAsync>g__Core|83_0(HttpRequestMessage request, HttpCompletionOption completionOption, CancellationTokenSource cts, Boolean disposeCts, CancellationTokenSource pendingRequestsCts, CancellationToken originalCancellationToken)
   at System.Runtime.CompilerServices.AsyncMethodBuilderCore.Start[TStateMachine](TStateMachine& stateMachine)
   at System.Net.Http.HttpClient.<SendAsync>g__Core|83_0(HttpRequestMessage request, HttpCompletionOption completionOption, CancellationTokenSource cts, Boolean disposeCts, CancellationTokenSource pendingRequestsCts, CancellationToken originalCancellationToken)
   at Microsoft.Azure.Devices.Edge.Util.Edged.Version_2019_01_30.GeneratedCode.HttpWorkloadClient.CreateServerCertificateAsync(String api_version, String name, String genid, ServerCertificateRequest request, CancellationToken cancellationToken) in /mnt/vss/_work/1/s/edge-util/src/Microsoft.Azure.Devices.Edge.Util/edged/version_2019_01_30/generatedCode/HttpWorkloadClient.cs:line 594
   at System.Runtime.CompilerServices.AsyncMethodBuilderCore.Start[TStateMachine](TStateMachine& stateMachine)
   at Microsoft.Azure.Devices.Edge.Util.Edged.Version_2019_01_30.GeneratedCode.HttpWorkloadClient.CreateServerCertificateAsync(String api_version, String name, String genid, ServerCertificateRequest request, CancellationToken cancellationToken)
   at Microsoft.Azure.Devices.Edge.Util.Edged.Version_2019_01_30.GeneratedCode.HttpWorkloadClient.CreateServerCertificateAsync(String api_version, String name, String genid, ServerCertificateRequest request) in /mnt/vss/_work/1/s/edge-util/src/Microsoft.Azure.Devices.Edge.Util/edged/version_2019_01_30/generatedCode/HttpWorkloadClient.cs:line 550
   at Microsoft.Azure.Devices.Edge.Util.Edged.Version_2019_01_30.WorkloadClient.<>c__DisplayClass2_0.<CreateServerCertificateAsync>b__0() in /mnt/vss/_work/1/s/edge-util/src/Microsoft.Azure.Devices.Edge.Util/edged/version_2019_01_30/WorkloadClient.cs:line 35
   at Microsoft.Azure.Devices.Edge.Util.TransientFaultHandling.AsyncExecution`1.ExecuteAsyncImpl(Task ignore) in /mnt/vss/_work/1/s/edge-util/src/Microsoft.Azure.Devices.Edge.Util/transientFaultHandling/AsyncExecution[T].cs:line 64
   at Microsoft.Azure.Devices.Edge.Util.TransientFaultHandling.AsyncExecution`1.ExecuteAsync() in /mnt/vss/_work/1/s/edge-util/src/Microsoft.Azure.Devices.Edge.Util/transientFaultHandling/AsyncExecution[T].cs:line 43
   at Microsoft.Azure.Devices.Edge.Util.TransientFaultHandling.RetryPolicy.ExecuteAsync[TResult](Func`1 taskFunc, CancellationToken cancellationToken) in /mnt/vss/_work/1/s/edge-util/src/Microsoft.Azure.Devices.Edge.Util/transientFaultHandling/RetryPolicy.cs:line 254
   at Microsoft.Azure.Devices.Edge.Util.TransientFaultHandling.RetryPolicy.ExecuteAsync[TResult](Func`1 taskFunc) in /mnt/vss/_work/1/s/edge-util/src/Microsoft.Azure.Devices.Edge.Util/transientFaultHandling/RetryPolicy.cs:line 233
   at Microsoft.Azure.Devices.Edge.Util.Edged.WorkloadClientVersioned.ExecuteWithRetry[T](Func`1 func, Action`1 onRetry, ITransientErrorDetectionStrategy transientErrorDetection) in /mnt/vss/_work/1/s/edge-util/src/Microsoft.Azure.Devices.Edge.Util/edged/WorkloadClientVersioned.cs:line 89
   at Microsoft.Azure.Devices.Edge.Util.Edged.WorkloadClientVersioned.Execute[T](Func`1 func, String operation)
   at System.Runtime.CompilerServices.AsyncMethodBuilderCore.Start[TStateMachine](TStateMachine& stateMachine)
   at Microsoft.Azure.Devices.Edge.Util.Edged.WorkloadClientVersioned.Execute[T](Func`1 func, String operation)
   at Microsoft.Azure.Devices.Edge.Util.Edged.Version_2019_01_30.WorkloadClient.CreateServerCertificateAsync(String hostname, DateTime expiration) in /mnt/vss/_work/1/s/edge-util/src/Microsoft.Azure.Devices.Edge.Util/edged/version_2019_01_30/WorkloadClient.cs:line 35
   at System.Runtime.CompilerServices.AsyncMethodBuilderCore.Start[TStateMachine](TStateMachine& stateMachine)
   at Microsoft.Azure.Devices.Edge.Util.Edged.Version_2019_01_30.WorkloadClient.CreateServerCertificateAsync(String hostname, DateTime expiration)
   at Microsoft.Azure.Devices.Edge.Util.Edged.WorkloadClient.CreateServerCertificateAsync(String hostname, DateTime expiration) in /mnt/vss/_work/1/s/edge-util/src/Microsoft.Azure.Devices.Edge.Util/edged/WorkloadClient.cs:line 22
   at Microsoft.Azure.Devices.Edge.Util.CertificateHelper.GetServerCertificatesFromEdgelet(Uri workloadUri, String workloadApiVersion, String workloadClientApiVersion, String moduleId, String moduleGenerationId, String edgeHubHostname, DateTime expiration, ILogger logger) in /mnt/vss/_work/1/s/edge-util/src/Microsoft.Azure.Devices.Edge.Util/CertificateHelper.cs:line 257
   at System.Runtime.CompilerServices.AsyncMethodBuilderCore.Start[TStateMachine](TStateMachine& stateMachine)
   at Microsoft.Azure.Devices.Edge.Util.CertificateHelper.GetServerCertificatesFromEdgelet(Uri workloadUri, String workloadApiVersion, String workloadClientApiVersion, String moduleId, String moduleGenerationId, String edgeHubHostname, DateTime expiration, ILogger logger)
   at Microsoft.Azure.Devices.Edge.Hub.Service.EdgeHubCertificates.LoadAsync(IConfigurationRoot configuration, ILogger logger) in /mnt/vss/_work/1/s/edge-hub/core/src/Microsoft.Azure.Devices.Edge.Hub.Service/EdgeHubCertificates.cs:line 57
   at System.Runtime.CompilerServices.AsyncMethodBuilderCore.Start[TStateMachine](TStateMachine& stateMachine)
   at Microsoft.Azure.Devices.Edge.Hub.Service.EdgeHubCertificates.LoadAsync(IConfigurationRoot configuration, ILogger logger)
   at Microsoft.Azure.Devices.Edge.Hub.Service.Program.MainAsync(IConfigurationRoot configuration, ILogger logger) in /mnt/vss/_work/1/s/edge-hub/core/src/Microsoft.Azure.Devices.Edge.Hub.Service/Program.cs:line 90
   at System.Runtime.CompilerServices.AsyncMethodBuilderCore.Start[TStateMachine](TStateMachine& stateMachine)
   at Microsoft.Azure.Devices.Edge.Hub.Service.Program.MainAsync(IConfigurationRoot configuration, ILogger logger)
   at Microsoft.Azure.Devices.Edge.Hub.Service.Program.Main() in /mnt/vss/_work/1/s/edge-hub/core/src/Microsoft.Azure.Devices.Edge.Hub.Service/Program.cs:line 67
--- End of stack trace from previous location ---
   at Microsoft.Azure.Devices.Edge.Util.Uds.HttpUdsMessageHandler.SendAsync(HttpRequestMessage request, CancellationToken cancellationToken) in /mnt/vss/_work/1/s/edge-util/src/Microsoft.Azure.Devices.Edge.Util/uds/HttpUdsMessageHandler.cs:line 30
   at System.Net.Http.HttpClient.<SendAsync>g__Core|83_0(HttpRequestMessage request, HttpCompletionOption completionOption, CancellationTokenSource cts, Boolean disposeCts, CancellationTokenSource pendingRequestsCts, CancellationToken originalCancellationToken)
   at Microsoft.Azure.Devices.Edge.Util.Edged.Version_2019_01_30.GeneratedCode.HttpWorkloadClient.CreateServerCertificateAsync(String api_version, String name, String genid, ServerCertificateRequest request, CancellationToken cancellationToken) in /mnt/vss/_work/1/s/edge-util/src/Microsoft.Azure.Devices.Edge.Util/edged/version_2019_01_30/generatedCode/HttpWorkloadClient.cs:line 594
   at Microsoft.Azure.Devices.Edge.Util.TaskEx.TimeoutAfter[T](Task`1 task, TimeSpan timeout, Action action) in /mnt/vss/_work/1/s/edge-util/src/Microsoft.Azure.Devices.Edge.Util/TaskEx.cs:line 135
   at Microsoft.Azure.Devices.Edge.Util.Edged.WorkloadClientVersioned.Execute[T](Func`1 func, String operation) in /mnt/vss/_work/1/s/edge-util/src/Microsoft.Azure.Devices.Edge.Util/edged/WorkloadClientVersioned.cs:line 61


Device Information

  • OS: Debian 12 arm64
  • Architecture: arm64
  • Container OS: linux:

Runtime Versions

  • aziot-edged version 1.5.13
  • Edge Agent : 1.5
  • Edge Hub 1.5:
  • Docker/Moby: Docker version 27.0.3-1, build 7d4bcd863a4c863e650eed02a550dfeb98560b83

Note: when using Windows containers on Windows, run docker -H npipe:////./pipe/iotedge_moby_engine version instead

Logs

Attached the support bundel

Additional Information

This issue, for me, caused damaged with a client, as all devices zombied just after 24 hours (new update)

@jlian
Copy link
Member

jlian commented Nov 26, 2024

@vadim-kovalyov please take a look

@codeputer
Copy link
Author

Any progress? We are holding production units to hopefully understand why units are requiring a complete powercycle to restart.

@vadim-kovalyov
Copy link
Contributor

Hey @codeputer, sorry for late response, holiday season... Happy New Year, btw! Sorry to hear that you having this issue and it blocks your production unit.

Unfortunately, it is not clear for me how the EdgeHub container would block the system restart. Edge service (aziot-edged) basically works as pass-through for docker (moby engine) commands. So when you shut down the edge, it will ask docker to stop containers. Docker will send SIGTERM and give a container a grace period (default is 10s) and if container process is not stopped, it will SIGKILL it.

EdgeHub can throw an exception on shutdown, it is ok. But it cannot stay alive after SIGKILL. I can see in the support bundle logs you provided that indeed the edgehub was forcefully killed (SIGKILL) by docker.

docler.txt:58

Nov 25 10:30:28 GFS0000000315 dockerd[581]: time="2024-11-25T10:30:28.798641775-08:00" level=info msg="Container failed to exit within 10s of signal 15 - using the force" container=dfece988d823fb503824cc5a7ff3d77d01239a05a698a1dd247c98f248710862

One suspicions thing is that even if the container is SIGKILL'ed, I don't see something like this in your docker logs

dockerd[1384]: time="2025-01-07T16:54:32.859172116-08:00" level=info msg="Daemon shutdown complete"

So maybe try to query journalctl to see what else is going on with either docker or any other system service.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants