Improve command retry #12
Comments
Thank you for the detailed post, I agree with most points, to be fair I am not happy with the current implementation either.
This is simple to fix: the backoff library that I use for retries has its own logger. I'll fix this by only logging retries if debug logging is enabled for the component.

For the other points, I'm not sure what the ideal solution would be. I originally added a retry strategy to the requests because my washing machine was also rejecting requests for no apparent reason. But as you can see, even with this retry mechanism the device sometimes becomes unavailable during a wash cycle. I still don't understand why this happens, so I'm afraid I need to tweak the retry logic a bit.

At the moment the integration polls the device every minute (plus a few extra retries if a request fails), and I don't necessarily want to make this less frequent, because delayed status updates wouldn't be a good UX. I also thought of using exponentially increasing refresh intervals. This would avoid unnecessary polls when the device is truly offline, but after a few days of being turned off (which is not uncommon for a washing machine) the interval would be so long that, once the device was finally turned on, the next update would be hugely delayed.

I think the ultimate solution will be something similar to what you described as a "grace period". I'd make the updater "stateful" and store the timestamp of the last successful request. Based on this, the updater would poll more frequently at first, but once a certain amount of time has passed since the last successful update, it would switch to a less frequent interval.
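A minimal sketch of the "grace period" idea described above, assuming a fast interval of one minute (the current polling rate) and purely illustrative values for the slow interval and the grace period. The class name, method names, and constants are hypothetical and are not taken from the integration.

```python
from datetime import datetime, timedelta

# Hypothetical values, chosen only for illustration.
FAST_INTERVAL = timedelta(minutes=1)   # matches the current one-minute polling
SLOW_INTERVAL = timedelta(minutes=10)  # assumed interval for an offline device
GRACE_PERIOD = timedelta(minutes=30)   # assumed time to keep polling quickly


class PollScheduler:
    """Remembers the last successful request and picks the next poll interval."""

    def __init__(self, now: datetime) -> None:
        # Treat start-up as a success so the device gets a full grace period.
        self.last_success = now

    def record_success(self, now: datetime) -> None:
        """Call this whenever a request to the device succeeds."""
        self.last_success = now

    def next_interval(self, now: datetime) -> timedelta:
        """Poll quickly within the grace period, slowly after it has expired."""
        if now - self.last_success <= GRACE_PERIOD:
            return FAST_INTERVAL
        return SLOW_INTERVAL
```

Unlike exponentially increasing intervals, the delay here never exceeds the slow interval, so a machine that is switched on after several days is still noticed within one slow poll, and the first success puts the scheduler back on the fast interval.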
Not sure if it helps, but I noticed that the HTTP server (mine is a wine cooler) only supports one connection, so:
1 - every connection must be closed before trying to open another one: aiohttp.ClientSession(connector=aiohttp.TCPConnector(force_close=True))
2 - if you are using the app at the same time, it will consume the only connection allowed
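If the device really accepts only one connection at a time, it may also help to make sure the integration never has two requests in flight at once. The sketch below is an assumption layered on top of the force_close suggestion, not something the integration is known to do: it serializes requests with an asyncio.Lock. SingleConnectionGate and its run method are hypothetical names.

```python
import asyncio
from typing import Any, Awaitable, Callable


class SingleConnectionGate:
    """Lets only one request coroutine run at a time (hypothetical helper)."""

    def __init__(self) -> None:
        # Create the gate inside a running event loop so the lock
        # belongs to the loop that will use it.
        self._lock = asyncio.Lock()

    async def run(self, request: Callable[[], Awaitable[Any]]) -> Any:
        """Wait for any in-flight request to finish, then run this one."""
        async with self._lock:
            return await request()
```

Each call to the device would be wrapped as `await gate.run(lambda: fetch_status())`, where fetch_status stands in for whatever coroutine performs the HTTP request. This does nothing about point 2: the vendor app can still take the only connection.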
Thank you @rogerlz for the suggestion, I will try running this modification locally for a few days to see how stable it is.
Creating my own
I will get back to you in a few days once I have tested this. I'm currently adding support for the wine cooler.
FWIW, I modified the component to specify None as the update_interval so that there is no polling at all. I then created an automation that triggers against the motion sensor I have in my kitchen and then simply polls the machine every 2 minutes until the machine becomes unavailable or is idle. I'm not so bothered about the UI accurately reflecting the state of the machine but more about generating notifications when I need to go back to the machine to do something. My plan is to modify the automation to do something roughly like this:
I think it would be quite useful to have the option to disable polling in the config flow and choose to observe state via automation only. I imagine that automation blueprints could be used to define behaviours according to how people want to use the sensor. Or just use polling!
I wonder if this is because of this bug: https://docs.aiohttp.org/en/stable/client_advanced.html#graceful-shutdown
At the moment the integration queries the washer every minute, and if it does not get a reply it makes several (max_tries=10?) connection attempts. This behavior is not good when the device is unavailable, for example because it is off or asleep.
For example, in my case the machine goes into sleep mode (Wi-Fi is disconnected) 1-2 minutes after it completes the washing cycle. This floods the log with rows like this:
In a few days my Home Assistant log has grown to several megabytes, with 30k+ errors like the above.
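One possible way to keep retry messages out of the log unless debugging is enabled, sketched under assumptions: the backoff library writes its retry and give-up messages to a logger named "backoff", and the component logger name used here is hypothetical, as is the helper function.

```python
import logging

# Hypothetical logger name for the integration; the real one may differ.
COMPONENT_LOGGER_NAME = "custom_components.example_appliance"


def sync_backoff_logging(
    component_logger_name: str = COMPONENT_LOGGER_NAME,
) -> logging.Logger:
    """Show backoff's retry messages only when the component logs at DEBUG."""
    component_logger = logging.getLogger(component_logger_name)
    backoff_logger = logging.getLogger("backoff")
    if component_logger.isEnabledFor(logging.DEBUG):
        # Debugging is on: let every retry message through.
        backoff_logger.setLevel(logging.DEBUG)
    else:
        # Otherwise hide both retry (INFO) and give-up (ERROR) messages.
        backoff_logger.setLevel(logging.CRITICAL)
    return backoff_logger
```

Calling this once during setup would be enough if the log level does not change at runtime. If it can change, the function would need to be called again after each change.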
--
Another related problem is that the retry mechanism does not really prevent connection problems. For example, the following chart shows that the machine became unavailable several times during the washing cycles.
This "unavailable time" is exactly 1 minute, the time between one query and the next.
So the retry mechanism does not actually improve the connection reliability.
--
Possible improvements:
Thanks for the great job you are doing with the integration.