-
Notifications
You must be signed in to change notification settings - Fork 67
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Follow up tasks due to the introduction of TaskScheduler #565
Comments
Hi @schlimmchen I really appreciate your valuable feedback. To track your suggestions I create a task list.
|
@schlimmchen do you know when the tasks switch will happen in TaskScheduler loop?. Always after a full loop cycle or also in between? My assumption is (without any knowledge), that it runs at least one complete loop cycle. MQTT callbacks have an internal mutex to handle the complete data. So there might be no problem. |
loop()
paradigm. This has the potential to make the project more efficient and responsive.
Hi @helgeerbe,
and I enjoy contributing to this project and working with you 😊
What task switch? As far as I understand the TaskScheduler, only its tasks will not preempt each other. However, other FreeRTOS threads like the MQTT client and AsyncWebServer will preempt each other and the TaskScheduler scheduler pass. So there are tasks (TaskScheduler tasks) and threads (FreeRTOS tasks). Tasks will run in the context of one thread, while other threads are still around (MQTT client and AsyncWebServer in particular). Threads are still scheduled preemptively by the underlying FreeRTOS. However, your question suggests the idea that TaskScheduler might take care that one scheduler pass is never preempted by other threads. I am quite confident that this is not the case. In particular, this would force TaskScheduler to be aware of the type of OS it is running on and call respective guards or functions. That makes no sense. The TaskScheduler documentation is not explicit about this issue. However, have a look at Are we on the same page now? I think I researched this now to a point where I feel confident to have understood it corretly: We still need to take care to lock FreeRTOS threads against concurrent access of shared data. Related to researching this: tbnobody#1598.
Where? Within the lib? Or within the callbacks in our part of the code? The former I don't know about, the latter would surprise me... |
@schlimmchen Thanks for the deep dive. I hope, that I got the complete picture now. Regarding the mqtt callback, ist should be in the lib. Ahh, it's in the esspressif MQTT (https://docs.espressif.com/projects/esp-idf/en/latest/esp32/api-reference/protocols/mqtt.html). This API is could be executed from a user task or from a MQTT event callback i.e. internal MQTT task (API is protected by internal mutex, so it might block if a longer data receive operation is in progress. Too many frameworks. I have to check the our mqtt lib. |
@schlimmchen I tried to find some more information. In MQTT it seems, that the callbacks seems to be thread safe. So on subscribed messages the callbacks wouldn't interrupt themself with an updated message. I found the hint, that received objects should be synchronized with the main process. Checking quickly the message handler of dpl and a battery, I found only switches or limits, but those are atomic data. Do you have an example, where we have to synchronize the instance a complete object, while otherwise changing only single values will corrupt the whole object? |
By grepping for "mqttsettings.subscribe" I found four users:
Assessment:
|
I am not touching the power meter implementation. The only thing we can improve is accessing the map of subscriptions. However, that map is only initialized at boot and is not written afterwards (since the Web App handler reboots the ESP when updating the power meter setting). Making the access to And if I am going to refactor this, it's going to be nuked altogether and the respective power meter implementations will move to their own classes. A whole bunch of problems will be gone, then. Especially the need to reboot the ESP on powermeter settings updates. |
To summarize:
|
Thanks for pointing that out @helgeerbe I probably did not spot the TaskScheduler implementations in the rather long commit and I am still unaware of most of the OpenDTU and onBattery code in general. I just had the hunch that there is a lot more related to this change using the TaskScheduler than what meets the eye. But I think @schlimmchen made a very conclusive review of the necessary changes. Regarding the ESPAsyncWebServer becoming synchronous again I would kind of object, because this was one of the main issues in earlier AhoyDTU and OpenDTU implementations. Parallel requests with loads of response data led to several reboots in the past. I also quickly peeked into the following article to synchronize threads across multiple FreeRTOS "threads" and/or "fibers" Use Both Cores on an ESP32: Easy Synchronization with Esp32SynchronizationContext Maybe this distinction is helpful, as the notion of "fibers" was new to me:
|
Who mentioned that the web server should become synchronous again? That's not on my list at least. Instead, we do need to keep in mind that the web request run in a different thread (not fiber) and hence those callbacks must not corrupt shared data. |
I may have misinterpreted your old comment as you were referencing #350 (comment) some 5 months ago.
|
Oh, wow, your memory is excellent! Well, the async web server still does run in its own thread AFAIK. And in reality this does not cause too much trouble, if any. Going forward, I think we should do the same thing as with the MQTT callbacks: analyze the web callbacks and make sure they operate on thread-safe methods/data or move the execution of the respective actions into the TaskScheduler context. |
So... I ticked of four of the boxes that Helge listed, as they are either obsolete or addressed in the meantime. The remaining ones are "Schedule the loop of [JK BMS|DPL|VictronMpptController]", which is nice to have, but merely exactly that: nice to have. I don't think we should bother changing them and to risk regressions. Then we talked about that the MQTT callbacks and webserver callbacks run asynchronously and whether we need to keep an eye on that or even make then synchronous. We did not, and that was a good call, as the projects runs smoothly. That, however, is because we now about these two tasks and took measures to prevent issues. The power meter refactoring did contribute to that in particular. In the meantime we even gained a write guard for the config such that the webserver callbacks will not corrupt the config. Given that the firmware runs stable (based on the fact that users don't open issues about instability), there is no need to do anything more in this context. |
This issue has been automatically locked since there has not been any recent activity after it was closed. Please open a new discussion or issue for related concerns. |
@stefan123t They are part of the merge commit 7c11a5a. @helgeerbe did all the work required to migrate all OpenDTU-OnBattery-specific components to TaskScheduler. Have a look at the changes to
src/Battery.cpp
in the aforementioned commit for an example.He did it in the most backwards-compatible way, which makes perfect sense. This means the components have their
loop()
methods made private, but they are called periodically as often as possible, just like before. In the near future, we should update the respective tasks' frequency dynamically (which is explicitly supported) depending on the actually required frequency. The DPL uses a dynamic backoff time which should be used here to schedule the next iteration of the DPL's loop at the appropriate time instead of checking every time in theloop()
whether or not the backoff elapsed. The same goes for the JK BMS Controller for sure. Also the VE.Directloop()
can be called at a reasonable frequency since we know only one message per second will arrive.Regarding the mutex in
BatteryClass
: This protects_upProvider
and must be part of the newly implementedupdateSettings()
. Currently that function accesses_upProvider
without locking the mutex, which happens when the battery settings are changed from the WebUI. This is probably an oversight, as it was done the right way inVictronMppt
.And this is where my concers start: As far as I can tell, AsyncWebServer does indeed still run in a thread other than the main
loop()
thread, i.e., the TaskScheduler context.As before, all `loop()´ functions that we care about run in the same context. Now they are part of the TaskScheduler. However, also as before, some callbacks run in the context of the respective caller, i.e., MQTT and WebServer callbacks in particular. Those are not part of the TaskScheduler context. Same with the WiFi driver (and potentially other drivers as well), which also may execute callbacks that access shared data structures.
As an example I looked closer at AsyncWebServer. I don't see that AsyncWebServer would have any knowledge about TaskScheduler such that it would ask TaskScheduler to execute callbacks as TaskScheduler tasks. Instead, the callbacks are (still) executed in the "async_tcp" FreeRTOS thread, which is scheduled pre-emptively alongside the TaskScheduler thread.
Hence I don't see how we could remove mutexes and locking mechanisms, as the concurrency issues are the same as before.
This is based on reading code as I am unable to conduct experiments right now. However, I am fairly confident that my assessment holds true.
It absolutely is not. Serial output will be garbled from time to time. Not within a single
println()
orprintf()
call, but in between words or formatting boundaries. And worse, we are back to the original limitation of the static buffer, which causes JK BMS and VE.Direct verbose logging messages to be cut (very) short in the web console. I will propose a new PR that restores the previous MessageOutput implementation by myself. It ran perfectly well for months now at all devices that used respectively recent releases.For the GridProfileParser, the library "Frozen" was added. That's also very interesting and can serve performance improvements and code reduction for JK BMS and VE.Direct at least.
If this was not already released, I would argue to go ahead and do so. If users experience errors, we can address them as needed. This is a good time for experiments as sun is scarce anyways and people might have more time than usual to report, inspect and fix issues.
#350 was effectively reverted. Let's keep that in mind in case spontaneous reboots or data corruption are reported while using the MQTT interface. We might need to have a look at all MQTT callbacks within OpenDTU-OnBattery.
Originally posted by @schlimmchen in #556 (comment)
The text was updated successfully, but these errors were encountered: