Resets with IDF 4.4/Arduino Core 2.x platform on network request #2518
Comments
@Aircoookie I am not seeing any resets no matter how fast I hit WLED with My build environment is:
|
@blazoncek does not happen with |
@Aircoookie Looks like a bug in the Arduino core 2.0.x. Core 2.0.x is heavily refactored.
This setup uses the latest original Arduino ESP32 core from the Espressif repo. EDIT: I just asked our dev for the "Lights" part in Tasmota. We do not have issues in Tasmota when using RMT |
@Jason2866 Thank you for the info! |
I had opened an issue with @Makuna a while ago and it turns out the latest RMT code (also) has problems feeding data to RMT buffers. Especially when ISRs take more time than needed. |
FYI I can't reproduce this on Tasmota using RMT and latest Core/idf, however hard I hammer it on the network side. Tested on Atom Matrix (Esp32 Pico D4) with an animation over 25 leds every 50ms. This does not confirm a bug in Arduino or IDF |
Found the thread... |
@Aircoookie Do a symbol dump on that method "rmt_ll_enable_tx_err_interrupt", if you actually find it listed what is the address? It shouldn't be listed (it is marked inline) but sometimes the compiler will ignore the suggestion of inline. There is a known GNU C++ bug that static inline functions may not honor attributes (inherited or specifically marked) and thus not be placed into IRAM memory location. Thus when run will require it to be loaded from cache (note your error). I ran into this myself and had to do some shenanigans to get the methods marked with IRAM (no longer inline, no longer listed in a header). |
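To make the symbol-dump check above concrete, here is a minimal sketch of how an address from such a dump could be classified. It assumes the default memory layout of the classic ESP32 only (internal instruction RAM at 0x40080000 to 0x400A0000, flash-mapped code at 0x400D0000 to 0x40400000); other chips such as the S2, S3, or C3 use different windows. The names `Region`, `classifySymbol`, and `safeWhenCacheDisabled` are hypothetical helpers for illustration and are not part of any Espressif tool or API.

```cpp
#include <cassert>   // used by the checks in the usage note
#include <cstdint>

// Assumed address windows for the classic ESP32 (default linker layout).
// These constants are an assumption of this sketch, not values from the thread.
enum class Region { Iram, FlashText, Other };

const std::uint32_t kIramStart  = 0x40080000u; // internal instruction RAM
const std::uint32_t kIramEnd    = 0x400A0000u; // exclusive upper bound
const std::uint32_t kFlashStart = 0x400D0000u; // code executed from flash via the cache
const std::uint32_t kFlashEnd   = 0x40400000u; // exclusive upper bound

// Map a symbol address (as printed by a symbol dump) to a memory region.
inline Region classifySymbol(std::uint32_t addr) {
    if (addr >= kIramStart && addr < kIramEnd) return Region::Iram;
    if (addr >= kFlashStart && addr < kFlashEnd) return Region::FlashText;
    return Region::Other;
}

// Code that an interrupt may call while the flash cache is disabled
// has to live in IRAM; anything placed in flash text would fault.
inline bool safeWhenCacheDisabled(std::uint32_t addr) {
    return classifySymbol(addr) == Region::Iram;
}
```

With this helper, a function that is expected to be inlined or IRAM-resident but shows up in the dump with an address for which `safeWhenCacheDisabled` returns false would be a candidate for the behavior described above.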
@Makuna thank you for the helpful pointer! Not entirely positive this is what you are referring to, but as expected, I am also confused about this line in the error: @s-hadinger thank you for testing! The difference might very well be that WLED uses ESPAsyncWebserver, which in itself operates on interrupts upon network requests while Tasmota uses (as far as I can tell) the synchronous ESP8266Webserver. |
Perhaps whatever function is inlining |
I don't have conclusive evidence, but it seems like this only happens if the network request accesses flash data in any way (filesystem operation or PROGMEM strings) |
Also strange that core 1 (handling network interrupts) panics. I would have assumed RMT interrupts to use core 0. |
I have the issue with 0.13.1 where it frequently crashes and reboots when being controlled via Home Assistant, but I am not sure if it is due to the same reason. The Environment string on the info pages says "esp32 V3.3.6-16-gcc5440f6a2". Is there anything I can do to help debug the issue? |
Latest Tasmota framework (based on 2.0.3 rc1 and IDF 4.4.1) can be used with
|
I have built latest |
I have now also tried a build with the just released 2.0.3 (not rc) from Tasmota. Same result as above, unsurprisingly. |
Getting some trace output with the latest official (non-RC) 2.0.3 tasmota release (https://github.com/tasmota/platform-espressif32/releases/tag/v.2.0.3) when replicating this issue on esp32-c3 (previously with 2.0.2 I was only getting stack memory dump, mentioned here: #2596 (comment)) Though, unsure if the trace output is accurate, based on WIP PR: platformio/platform-espressif32#612
|
Hey! This issue has been open for quite some time without any new comments now. It will be closed automatically in a week if no further activity occurs. |
reminds me of the pending use-after-free bug in Async Webserver : me-no-dev/ESPAsyncWebServer#951 and proposed fix in me-no-dev/ESPAsyncWebServer#952 |
@Aircoookie perhaps you could pull this into your fork? |
@softhack007 Is there an easy way (without connecting any LEDs) to reproduce the resets? I would like to test with a recently built framework (using a newer IDF commit than Espressif uses in the Arduino core 2.0.5 release). Many bug fixes in IDF
Probably fixes wifi stability issues for esp32 and esp32c3 |
Hi @Jason2866, I think the problem in async webserver was found by code analysis and inspection. So it might be hard to reliably reproduce the crash. Sorry I don't have a clear scenario for you, maybe someone else here can help with that. We have several users reporting such crashes, and the stacktrace provided by @fallspectrum clearly points in the same direction. It seems to happen with IDF v4.4.x and arduino-esp32 2.0.x on new MCUs like S2 and C3, but it has also been seen on "classic" ESP32. Maybe memory management in IDF was changed, or it is just pure luck that we don't see such crashes with older frameworks. The root cause seems to be a use-after-free coding error in async webserver, so I see very little chance that it can be solved with updates in other components. Edit: it could be that we are talking about two different bugs here, as the original report from @Aircoookie was having a crash in a different area:
I've seen this one on 'classic ESP32' and on -s3 when using new framework. My solution was to use latest NeoPixelBus (master). |
There are changes in IDF regarding rmt. The changes are in latest Tasmota core. |
I have reviewed AsyncWebServer's way of handling unneeded headers and although not apparent there is a possibility of deleting a pointer while still using it. I would have rewritten I did not debug thoroughly yet to see if error lies in WLED code (callbacks) or AsyncWebServer itself. |
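As an illustration of the general hazard described above (freeing an object while other code still holds a pointer to it), here is a minimal sketch of one way to drop unneeded headers without manual `delete`. This is not AsyncWebServer's actual code or the rewrite mentioned in the comment; `Header`, `HeaderList`, `isInteresting`, and `dropUnneededHeaders` are hypothetical names, and the sketch assumes headers are owned by a `std::list` of `std::unique_ptr`.

```cpp
#include <algorithm>
#include <cassert>   // used by the checks in the usage note
#include <cstddef>
#include <list>
#include <memory>
#include <string>
#include <vector>

// Hypothetical header record; not the library's own type.
struct Header {
    std::string name;
    std::string value;
};

// The list owns every header, so no raw pointer has to be deleted by hand.
using HeaderList = std::list<std::unique_ptr<Header>>;

// Exact, case-sensitive match for simplicity; real HTTP header names
// are case-insensitive, which this sketch does not handle.
inline bool isInteresting(const std::string& name,
                          const std::vector<std::string>& wanted) {
    return std::find(wanted.begin(), wanted.end(), name) != wanted.end();
}

// Remove all headers that are not in `wanted` and return how many were dropped.
// remove_if unlinks and destroys each element in one step, so the loop never
// dereferences an element that has already been freed.
inline std::size_t dropUnneededHeaders(HeaderList& headers,
                                       const std::vector<std::string>& wanted) {
    const std::size_t before = headers.size();
    headers.remove_if([&](const std::unique_ptr<Header>& h) {
        return !isInteresting(h->name, wanted);
    });
    return before - headers.size();
}
```

The design choice here is ownership by the container: callers that need a header after filtering would look it up in the list again rather than keep a pointer obtained before the filtering step.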
Done. |
Tested, it does no harm on ESP32 but it also does not help with web server on ESP32-S2. |
We have zero issues with the S2 and wifi. It does connect fine with DHCP and fixed ip addresses. Connection is stable too. If AP is switched off reconnecting does work too. |
Do not get me wrong, my ESP32-S2 (Lolin Wemos S2 mini) always connects to my WiFi after a cold boot, but after warm boot not always. I am also seeing it drop off of network while it is still reacting to button presses and plays WLED effects. In such cases I can also sometimes observe there is no more debug data dumped to serial interface. |
Yes, and this is related to WLED. Zero issues, cold boot, warm boot, or whatever: the S2 is running without a hiccup with Tasmota. The CDC is working too, no problem. |
As you only talk about Tasmota, I think what it means is "related to a scenario we don't have in Tasmota". This could be many things, I guess. Some aspects that we should think about:
Just a few ideas that might help us dig down to the root cause of crashes. |
I am guessing that the most probable suspects for issues are LittleFS and Async* library (could also be AsyncTCP as it is used by AsyncWebServer). If I have S2 unit run a single effect it will run happily until rebooted or powered off. But as soon as I run playlist which loads presets from LittleFS every so often it will soon lock up/reboot (without any UI/web interaction). Similar thing happens if I start playing with UI (trying not to trigger any LittleFS use) some page loads will stall and/or never complete loading (files from PROGMEM strings). Unfortunately I am no good at JTAG debugging nor have proper equipment so can't use that. BTW, yesterday I added a few DEBUG_PRINT commands to web server callbacks and a single call to |
It is NOT LittleFS. That is sure. We use it in every Tasmota32 version as default file system and it is used heavily.
|
I have ruled out NeoPixelBus (2.6.9) as LEDs are updated correctly without issues. Lockups/reboots occur with or without LEDs being updated. Below is partial debug output (trimmed for clarity and comments added).
From this point on WiFi can no longer reconnect (credentials hardcoded into binary) but WLED still reacts to buttons and I2S interface. |
If I see mDNS in the Espressif Arduino code, I have the No. 1 candidate for issues. We removed it in all builds (ESP8266 and ESP32x). Since that day, all the strange issues we had in the past are gone. mDNS is a mess. It will never work reliably. |
Commented out mDNS and look what happens. USB serial no longer works, WLED is running effect as normal, web UI is no longer accessible after a few minutes. |
Strange problem when using ESPAsyncWebServer with a S3 espressif/arduino-esp32#7268 |
Using S2 with very selective HTTP JSON API calls (and keeping it at minimum) I can operate it stable for hours. |
Looks like ESP32-C3 works flawlessly with:
|
OMG! This is amazing! I have been waiting to get my C3 M5 Stamp to drive my WLED installation for months.. and now it's all working with no restarts! Thank you so much for the config - it indeed works flawlessly! 🎉 |
There is an issue with classic ESP32 and writing to LittleFS while there are interrupts involved (e.g. driving LEDS). |
I am only able to get 2 pins to actually work. I can rearrange them to different pins and they'll work, but when I add a 3rd one it freezes up when saving and reverts back to what I had before. I've tried driving 30 LEDs on each channel, but that doesn't seem to make a difference. |
What has this to do with OP? |
WLED is working well on many -S3/-S2/-C3 with platform 5.3.0. I assume that bugs in the core libraries were causing problems when network requests and RMT interrupts collided. Closing. |
What happened?
WLED frequently resets upon network requests (e.g. to /json while loading main UI, applying presets...) while (SK6812) LEDs are on.
To Reproduce Bug
Compile WLED with the custom IDF 4.4 + arduino 2.0.2 platform courtesy of Tasmota (this branch).
The LITTLEFS library by lorol is replaced by the built-in LittleFS in the 2.x.x arduino core.
The reset reason is the low level RMT interrupts failing during a network request, although I don't yet understand the exact cause. Exception decoder trace below.
This has been reproduced on ESP32 and ESP32-S2 and the platforms:
Expected Behavior
No resets
Install Method
Self-Compiled
What version of WLED?
arduinocore2 branch
Which microcontroller/board are you seeing the problem on?
ESP32
Relevant log/trace output
Anything else?
@Jason2866 I suspect that this issue is due to a programming error on my end surfacing with the newer IDF/Arduino releases or possibly with the exchanged LittleFS library. Still, if you know what could be a likely cause, I'd be super happy to know as I see you did quite a few changes to sdkconfig (which I really like, as it drastically reduces binary size!)
My first guess was that the issue was due to CONFIG_DISABLE_HAL_LOCKS being set in 2.0.2.1; however, the issue is also present in https://github.com/tasmota/platform-espressif32/releases/download/v2.0.2/platform-tasmota-espressif32-2.0.2.zip, where that option was unset as far as I can tell.