Azure IoT Edge runtime does not handle DPS based provisioning correctly when device is initially rejected - Failed to clear provisioning cache before reprovision #7411
Comments
As @aleupo mentioned, I was working on this issue recently. We receive a As a result, the runtime performs a reprovision immediately after a successful provisioning, which the DPS then rejects with a 400 error. This behaviour (an OS error instead of a simple network error) may differ in our setup because we build the runtime using https://github.com/Azure/meta-iotedge for Yocto Linux. Here is a log showing this behaviour: Log
To fix this for our case, I simply patched
Patch
Again, I'm not familiar with Rust, so there are probably better ways to achieve this.
@aleupo The fix mentioned is for offline scenarios, which doesn't seem to be your case? From speaking with my colleague about this, it appears that what might be happening is a race condition between identityd on startup and edged on failure to provision. With DPS things just work because it responds with 200 in both cases, but that might not be the case with your custom DPS allocator...
We are using Azure IoT Edge on an Embedded Linux device. For the provisioning of the devices, we are using DPS with X.509 based bootstrap certificates and group enrolments. For our application, the device has to be registered in our backend before it is allowed to connect to the IoT Hub. Depending on the registration state, the DPS will either accept the device (status code 200) or reject it (status code 400). To check the registration state of the devices, we are using the DPS custom allocation function. Rejected devices will try again endlessly until they have been registered and are accepted by the DPS.
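The accept/reject flow described above can be sketched as a small model. This is only an illustration of the described behaviour, not our actual allocation function or the runtime's retry code: `Backend`, `AllocationResult` and `provision_with_retry` are hypothetical names, and a real device would wait between attempts instead of retrying immediately.

```rust
use std::collections::HashSet;

/// Outcome of the (hypothetical) custom allocation check.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum AllocationResult {
    Accepted, // DPS answers with status code 200
    Rejected, // DPS answers with status code 400
}

impl AllocationResult {
    /// Status code the device would observe for this outcome.
    pub fn status_code(self) -> u16 {
        match self {
            AllocationResult::Accepted => 200,
            AllocationResult::Rejected => 400,
        }
    }
}

/// Hypothetical stand-in for the backend that knows which devices are registered.
pub struct Backend {
    registered: HashSet<String>,
}

impl Backend {
    pub fn new() -> Self {
        Backend { registered: HashSet::new() }
    }

    /// Mark a device as registered in the backend.
    pub fn register(&mut self, registration_id: &str) {
        self.registered.insert(registration_id.to_string());
    }

    /// Accept only devices that have been registered beforehand.
    pub fn allocate(&self, registration_id: &str) -> AllocationResult {
        if self.registered.contains(registration_id) {
            AllocationResult::Accepted
        } else {
            AllocationResult::Rejected
        }
    }
}

/// Retry until the attempt is accepted or `max_attempts` is reached.
/// Returns the 1-based number of the accepted attempt, if any.
pub fn provision_with_retry<F>(mut attempt: F, max_attempts: u32) -> Option<u32>
where
    F: FnMut() -> AllocationResult,
{
    for n in 1..=max_attempts {
        if attempt() == AllocationResult::Accepted {
            return Some(n);
        }
        // A real device would back off here before trying again.
    }
    None
}
```

In this model, a device that is registered between the second and third attempt is rejected twice and then accepted once. After that point, no further provisioning attempt is expected.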
For the case that the device is accepted upon first attempt at the DPS, everything works as intended. (I.e. it has been registered in the backend before the first connection attempt)
For the case that the device is first rejected and then accepted later, we are observing faulty behaviour in some cases (seemingly random). After being rejected initially, and then receiving 200 from the DPS, the Edge runtime will go into a faulty state where it does not recover. It mainly revolves around edged reporting the error "Failed to clear provisioning cache before reprovision: No such file or directory". From our understanding, edged triggers a reprovisioning at identityd when it should not. Then identityd cannot recover from this.
Expected Behavior
Current Behavior
So, after receiving 200 from the DPS, provisioning is triggered again.
Note 1: We can see that the provisioning was initially successful, because the device shows up in the IoT Hub with the registration ID we provided.
Note 2: After the "200" reply from the DPS, the DPS will respond to further requests with "400" again. You can see it in the logs below. We are currently not sure if this behaviour is correct, it may be the result of how our custom allocation function is implemented. We are investigating this separately. However, it should not be the case that provisioning is re-triggered immediately on the device.
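The "No such file or directory" part of the error suggests that the cache being cleared did not exist at that moment. As a minimal sketch of one way such a step could tolerate that, assuming the cache is a single file on disk (an assumption, we have not verified how the runtime stores it), a missing file can be treated as "already cleared". The function name `clear_cache_file` is hypothetical; this is neither the runtime's code nor the patch mentioned in the comments.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Remove a cache file, treating "file does not exist" as success.
///
/// Returns Ok(true) if a file was removed, Ok(false) if there was
/// nothing to remove, and Err(_) for any other I/O failure.
pub fn clear_cache_file(path: &Path) -> io::Result<bool> {
    match fs::remove_file(path) {
        Ok(()) => Ok(true),
        // ENOENT: the cache is already gone, so the goal is met.
        Err(e) if e.kind() == io::ErrorKind::NotFound => Ok(false),
        // Permission errors and the like are still reported to the caller.
        Err(e) => Err(e),
    }
}
```

With this shape, calling the function twice in a row on the same path succeeds both times, so a second clear request arriving right after the first would not fail on its own. Whether that would also avoid the unwanted reprovision is a separate question.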
Context
Configuration File
Output of iotedge check
Device Information
Runtime Versions
iotedge version: iotedge 1.5.5
docker version: 20.10.21
Logs
iotedge system logs (info level)
Summary:
Additional Information
A colleague has already created a patch which apparently fixed the issue. He will post here later. However, I want to clarify whether this behaviour is actually a bug, or whether we have a misconfiguration on our side.
I have also attached a log file with debug level: iotedge log debug level.txt