Does not re-connect when MQTT broker restarts #30

Open
crashmatt opened this issue Dec 4, 2021 · 22 comments

Comments

@crashmatt

Broker connection is broken after a restart.

Tested reconnect by entering configuration, touching a field and then saving status.
Touch may not be required. This is not tested.
Reboot may make a reconnect. This is not tested.

@crashmatt
Author

I suspect a Lua script could act as a watchdog for this service.

The Client class in the plugin has methods that seem like they should be exposed to Lua.
I can find no method that plainly exposes the connection state, so detecting whether it is connected may be difficult.

I can find no mechanism in client.py that watches over the connection state.

I would make adjustments to the plugin myself but I have had no success building a good build environment for these plugins.

@quazzie
Owner

quazzie commented Dec 4, 2021

Uhm, Lua?
You mean we should add a separate Lua plugin to watch this?
There should be no need for a watchdog for the MQTT connection; according to the documentation, the paho client should reconnect by itself. But I have just noticed this problem myself: if the connection is lost, it does not reconnect. I'll have a look at what I can do when I get some free time.

@crashmatt
Author

Yes. Run a small lua script on a timer to check for connection and restart if required.

This lua script interacts with your plugin to send the mqtt debug message.

The only flaw in this plan is that Client does not have the correct methods.

-- hass mqtt plugin has to be installed
local mqtt = nil

function onInit()
    mqtt = require 'HASSMQTT.Client'
    if mqtt == nil then
        print "No mqtt client"
    else
        print "mqtt client ok"
    end
end

function onDeviceStateChanged(device, state, stateValue)
    -- Only react to the device named "ClockTick"
    local dev_name = device:name()
    if dev_name == "ClockTick" then
        print "sent debug message"
        mqtt:_debug("bahhh")
    end
end

@crashmatt
Author

but fixing the behavior in paho would be better

@crashmatt
Author

A Lua script workaround for use while the paho reconnect is broken.

  1. A fake Nexa switch device is added as "HassTellstickMQTTWatchdog"
  2. A Hass automation sets this device through MQTT once every 10 seconds
  3. The Lua script checks whether a watchdog event arrived during each 30 s window
  4. If no watchdog signal was received, Lua sets the "hostname" of the mqtt client. This triggers a disconnect followed by a connect. I did not find a better way to force a disconnect-connect.

-- hass mqtt plugin has to be installed
local mqtt = require 'HASSMQTT.Client'
local deviceManager = require "telldus.DeviceManager"
local running_timer = false
local watchdog_count = 0
local watchdog_timeout_seconds = 30 -- Delay in seconds

function init()
    if mqtt == nil then
        print "No mqtt client"
    else
        print "mqtt client ok"
    end
end

function onInit()
    init()
end

function onDeviceStateChanged(device, state, stateValue)
    if mqtt == nil then
        return
    end

    -- Count heartbeats; the name must match the watchdog device exactly
    local dev_name = device:name()
    if dev_name == "HassTellstickMQTTWatchdog" then
        if device:state() == 1 then
            watchdog_count = watchdog_count + 1
            print "HassTellstickMQTTWatchdog signal received"
        end
    end

    -- Run one timing window at a time; a window without heartbeats forces a reconnect
    if not running_timer then
        running_timer = true
        watchdog_count = 0
        sleep(watchdog_timeout_seconds*1000)

        if watchdog_count == 0 then
            print("HassTellstickMQTTWatchdog timeout")
            -- Re-setting the hostname makes the client disconnect and connect again
            mqtt:configWasUpdated('hostname', '<HASS_ADDRESS>')
        else
            -- print does not expand format specifiers, so concatenate instead
            print("HassTellstickMQTTWatchdog count " .. watchdog_count)
        end
        running_timer = false
    end
end
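
For anyone who would rather move this check into the Python plugin, the heartbeat logic of the script above can be sketched without a blocking sleep. This is only an illustration under assumptions, not code from the plugin: the class name HeartbeatWatchdog, its methods, and the injected clock are all hypothetical, and it is written for Python 3, whereas the TellStick appears to run Python 2.7.

```python
import time


class HeartbeatWatchdog:
    """Remembers when the last heartbeat arrived (hypothetical helper, not plugin code)."""

    def __init__(self, timeout_seconds=30, clock=time.monotonic):
        self.timeout_seconds = timeout_seconds
        self._clock = clock  # injectable so the logic can be checked without waiting
        self._last_beat = clock()

    def feed(self):
        """Call whenever the watchdog device is switched on through MQTT."""
        self._last_beat = self._clock()

    def expired(self):
        """Return True if no heartbeat has arrived within the timeout."""
        return self._clock() - self._last_beat > self.timeout_seconds


if __name__ == "__main__":
    now = [0.0]  # fake clock, in seconds
    dog = HeartbeatWatchdog(timeout_seconds=30, clock=lambda: now[0])
    now[0] = 10.0
    dog.feed()
    now[0] = 35.0
    print(dog.expired())  # False: only 25 s since the last heartbeat
    now[0] = 45.0
    print(dog.expired())  # True: 35 s since the last heartbeat
```

A caller would poll expired() from an existing timer and trigger whatever reconnect action is available, then call feed() again to start a new window. Comparing timestamps avoids the situation in the Lua script where every device event has to wait for the sleep to finish.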

@henripalmroth

Looks interesting. Could you also share the HA automation part?

@pierrebengtsson

pierrebengtsson commented Jan 17, 2022

I'm experiencing the same issue. In my case we have a lot of power outages in the winter, and when my znet starts up before my HA instance the MQTT connection fails and won't reconnect until I power cycle my znet.
@crashmatt could you share your HA automation? I created an automation that sets the fictional device to "on" every 10 seconds, but the only message I get from the Lua script is "HassTellstickMQTTWatchdog timeout".

@crashmatt
Author

crashmatt commented Jan 17, 2022 via email

@crashmatt
Author

crashmatt commented Feb 4, 2022 via email

@fredrike
Contributor

I've had this issue for quite some time, don't know a good solution though.

#9

It would be great if @crashmatt could share the lua and ha config (with formatting) for a watchdog.

@crashmatt
Author

crashmatt commented Apr 17, 2023 via email

@tiehfood

You might have a look at my comments on the mentioned ticket. Maybe that's relevant?

@fredrike
Contributor

Here are my current configurations.

  1. Created a switch in Telldus Live (called MQTT-watchdog)
  2. Changed id for the new switch to switch.tellstick_mqtt_watchdog in HA
  3. Built the following automation in HA:
    alias: HassTellstickMQTTWatchdog
    description: ""
    trigger:
      - platform: time_pattern
        seconds: /10
    condition: []
    action:
      - service: switch.turn_on
        data: {}
        target:
          entity_id: switch.tellstick_mqtt_watchdog
    mode: single
  4. Built the following Lua script on my Telldus TellStick (accessed through the local IP):
    -- hass mqtt plugin has to be installed
    local mqtt = require 'HASSMQTT.Client'
    local deviceManager = require "telldus.DeviceManager"
    local running_timer = false
    local watchdog_count = 0
    local watchdog_timeout_seconds = 30 -- Delay in seconds
    
    function init()
       if mqtt == nil then
          print "No mqtt client"
       else
          print "mqtt client ok"
       end
    end
    
    function onInit()
       init()
    end
    
    function onDeviceStateChanged(device, state, stateValue)
       if mqtt == nil then
          return
       end
    
       -- Must equal the watchdog switch's name exactly as Telldus reports it
       local dev_name = device:name()
       if dev_name == "HassTellstickMQTTWatchdog" then
          if device:state() == 1 then
             watchdog_count = watchdog_count + 1
             print "HassTellstickMQTTWatchdog signal received"
          end
       end
    
       if not running_timer then
          running_timer = true
          watchdog_count = 0
          sleep(watchdog_timeout_seconds*1000)
    
          if watchdog_count == 0 then
             print("HassTellstickMQTTWatchdog timeout")
             mqtt:connect()
          else
             -- print does not expand format specifiers, so concatenate instead
             print("HassTellstickMQTTWatchdog count " .. watchdog_count)
          end
          running_timer = false
       end
    end

I've not had any issues with MQTT since I started running this, but I can't say that it is just because of this script (I might be lucky too).

@tiehfood

Traceback (most recent call last):
  File "/usr/lib/python2.7/threading.py", line 801, in __bootstrap_inner
    self.run()
  File "/usr/lib/python2.7/threading.py", line 754, in run
    self.__target(*self.__args, **self.__kwargs)
  File "build/bdist.linux-x86_64/egg/paho/mqtt/client.py", line 3591, in _thread_main
  File "build/bdist.linux-x86_64/egg/paho/mqtt/client.py", line 1779, in loop_forever
  File "build/bdist.linux-x86_64/egg/paho/mqtt/client.py", line 1044, in reconnect
  File "build/bdist.linux-x86_64/egg/paho/mqtt/client.py", line 3685, in _create_socket_connection
  File "/usr/lib/python2.7/socket.py", line 575, in create_connection
    raise err
timeout: timed out

This is the error that paho throws if the MQTT server is restarted or shut down.

The same problem is reported here:
eclipse-paho/paho.mqtt.python#636
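
The traceback shows the socket timeout escaping from reconnect and ending the network thread, after which nothing retries. A minimal sketch of one way to guard against that follows. It assumes a hypothetical zero-argument connect callable standing in for whatever reconnect call is actually used; the function name, parameters, and delays are illustrative and are not part of the plugin or of paho. It is written for Python 3, where socket.timeout is a subclass of OSError; under Python 2.7, as in the traceback, the exception hierarchy differs and the except clause would need adjusting.

```python
import socket
import time


def connect_with_retry(connect, max_attempts=None, first_delay=1.0,
                       max_delay=60.0, sleep=time.sleep):
    """Call connect() until it succeeds, backing off after each failure.

    connect is any zero-argument callable that raises OSError (which
    covers socket.timeout in Python 3) when the broker is unreachable.
    Returns the number of attempts used. When max_attempts is given and
    exhausted, the last error is re-raised; None means retry forever.
    """
    delay = first_delay
    attempt = 0
    while True:
        attempt += 1
        try:
            connect()
            return attempt
        except OSError:
            if max_attempts is not None and attempt >= max_attempts:
                raise
            sleep(delay)
            delay = min(delay * 2, max_delay)  # exponential backoff, capped


if __name__ == "__main__":
    # Simulate a broker that is down for two attempts, then comes back.
    outcomes = [socket.timeout("timed out"), socket.timeout("timed out"), None]

    def fake_connect():
        result = outcomes.pop(0)
        if result is not None:
            raise result

    print(connect_with_retry(fake_connect, sleep=lambda seconds: None))  # 3
```

Passing the sleep function in keeps the example quick to run and makes the backoff schedule easy to inspect.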

@crashmatt
Author

crashmatt commented Apr 25, 2023 via email

@tiehfood

tiehfood commented Apr 25, 2023

MQTT_Homeassistant-0.90.4_paho-1.5.1.zip
@crashmatt, @fredrike you may want to try this version. It seems that reconnecting works better with paho <1.6.0, so this is just the current version 0.90.4 repacked with paho 1.5.1 from version 0.90.0. For me this is far more stable on reconnects, and no exception has been thrown so far.

P.S. The files are signed and unmodified from this repo; otherwise it would not be possible to load them in Telldus. So you can trust the content of the ZIP 😉

@crashmatt
Author

crashmatt commented Apr 26, 2023 via email

@tiehfood

tiehfood commented Apr 26, 2023

Just install the zip file as a plug-in (don't extract it), the same way you install the official plugin from the releases page. And yes, it just replaces the paho version (from 1.6.1 down to 1.5.1).

@sampod

sampod commented Apr 26, 2023

MQTT_Homeassistant-0.90.4_paho-1.5.1.zip

This seems to be working well. I installed this and tried a couple of times restarting my mqtt server and power cycling my network switch and the mqtt connection was restored correctly.

@hauard

hauard commented Dec 6, 2023

MQTT_Homeassistant-0.90.4_paho-1.5.1.zip

This seems to be working well. I installed this and tried a couple of times restarting my mqtt server and power cycling my network switch and the mqtt connection was restored correctly.

I forgot to check which version I had first, but I tried the zip file and my problem still persists. Lately the addon has disconnected just seconds after connecting, making the znet dumb as f**k.

Going to try the Lua script now, fingers crossed.

@hauard

hauard commented Dec 6, 2023

Looks like the client disconnects just seconds after connecting either way. I made myself a virtual switch that a Lua script listens to and then connects the MQTT client, which makes it easier to investigate. Earlier I had to log in and bump the addon by removing and re-adding a number in the addon's config.

I have several other clients that connect to the broker without issues. I tried increasing and decreasing the keepalive ping on the broker, but no luck.

Is there any way to enable logging on the znet, to see what's going on there?

@grEvenX

grEvenX commented Dec 7, 2023

My problems with disconnects only happen after a while. After years of issues where I had to restart the znet manually from time to time, I've now connected it to a power switch that I automatically power cycle every night.
Now my setup is finally stable 🙈
