v1.5-beta: problems with starting services in proper order/time/delay #521

Open
interduo opened this issue Jul 12, 2024 · 5 comments

@interduo
Contributor

interduo commented Jul 12, 2024

After a reboot, this problem always occurs:

Jul 12 09:04:44 libreqos-beta systemd[1]: Started lqos_node_manager.service.
Jul 12 09:04:44 libreqos-beta lqos_node_manager[938]: Rocket has launched from http://[::]:9123
Jul 12 09:04:44 libreqos-beta lqos_node_manager[938]: Error: Unable to access /run/lqos/bus. Check that lqosd is running and you have appropriate permissions.

After running systemctl restart lqos_node_manager, everything starts correctly.

Solution 1 (temporary): add ExecStartPre=/bin/sleep 10 to the systemd service unit file

cat /etc/systemd/system/lqos_node_manager.service

[Unit]
After=network.service lqosd.service
Requires=lqosd.service

[Service]
WorkingDirectory=/opt/libreqos/src/bin
ExecStartPre=/bin/sleep 10
ExecStart=/opt/libreqos/src/bin/lqos_node_manager
Restart=always
# Turn on debugging for the service
#Environment=RUST_LOG=info

[Install]
WantedBy=default.target

Solution 2 (proper fix): use the notify service type

  1. Set Type=notify in the lqosd service unit.
  2. Once lqosd has fully started, have it send a message (via sd_notify) telling systemd that the service is ready.

What do you think?
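
For illustration only, here is a minimal sketch of what step 2 amounts to on the wire. lqosd is written in Rust, so a real implementation would do the equivalent there; Python is used here just to show the protocol. It assumes the unit sets Type=notify, in which case systemd passes a datagram socket path in the NOTIFY_SOCKET environment variable. The function name sd_notify below is a local helper, not an lqosd API.

```python
import os
import socket


def sd_notify(state: str = "READY=1") -> bool:
    """Send a state string to systemd's notification socket.

    Returns False when NOTIFY_SOCKET is unset (the process is not running
    under a Type=notify unit), True once the datagram has been sent.
    """
    path = os.environ.get("NOTIFY_SOCKET")
    if not path:
        return False
    addr = path
    # A leading "@" denotes a Linux abstract-namespace socket, which is
    # addressed with a leading NUL byte instead.
    if path.startswith("@"):
        addr = b"\0" + path[1:].encode()
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        sock.sendto(state.encode(), addr)
    return True
```

The daemon would call this once, after /run/lqos/bus has been created and is accepting connections. Units ordered with After=lqosd.service would then be held back until that point, rather than starting as soon as the lqosd process is spawned.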

@interduo
Contributor Author

interduo commented Jul 12, 2024

This also affects lqos_scheduler:

What I did:

journalctl --vacuum-time=15min --rotate
reboot
journalctl -u lqos_scheduler
-- Boot b0a56b4b200144f3802715e47c588a83 --
Jul 12 09:24:31 libreqos-beta systemd[1]: Starting lqos_scheduler.service...
Jul 12 09:24:31 libreqos-beta python3[943]: thread '<unnamed>' panicked at lqos_python/src/lib.rs:269:70:
Jul 12 09:24:31 libreqos-beta python3[943]: called `Result::unwrap()` on an `Err` value: Socket (typically /run/lqos/bus) not found. Check that lqosd is running, and you have permi>
Jul 12 09:24:31 libreqos-beta python3[943]: note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
Jul 12 09:24:31 libreqos-beta python3[943]: Running Python Version 3.12.3 (main, Apr 10 2024, 05:33:47) [GCC 13.2.0]
Jul 12 09:24:31 libreqos-beta python3[943]: refreshShapers starting at 12/07/2024 09:24:31
Jul 12 09:24:31 libreqos-beta python3[943]: First time run since system boot.
Jul 12 09:24:31 libreqos-beta python3[943]: Validating input files 'ShapedDevices.csv' and 'network.json'
Jul 12 09:24:33 libreqos-beta python3[943]: Traceback (most recent call last):
Jul 12 09:24:33 libreqos-beta python3[943]:   File "/opt/libreqos/src/scheduler.py", line 69, in <module>
Jul 12 09:24:33 libreqos-beta python3[943]:     importAndShapeFullReload()
Jul 12 09:24:33 libreqos-beta python3[943]:   File "/opt/libreqos/src/scheduler.py", line 62, in importAndShapeFullReload
Jul 12 09:24:33 libreqos-beta python3[943]:     refreshShapers()
Jul 12 09:24:33 libreqos-beta python3[943]:   File "/opt/libreqos/src/LibreQoS.py", line 448, in refreshShapers
Jul 12 09:24:33 libreqos-beta python3[943]:     if (validateNetworkAndDevices() == True):
Jul 12 09:24:33 libreqos-beta python3[943]:         ^^^^^^^^^^^^^^^^^^^^^^^^^^^
Jul 12 09:24:33 libreqos-beta python3[943]:   File "/opt/libreqos/src/LibreQoS.py", line 130, in validateNetworkAndDevices
Jul 12 09:24:33 libreqos-beta python3[943]:     rustValid = validate_shaped_devices()
Jul 12 09:24:33 libreqos-beta python3[943]:                 ^^^^^^^^^^^^^^^^^^^^^^^^^
Jul 12 09:24:33 libreqos-beta python3[943]: pyo3_runtime.PanicException: called `Result::unwrap()` on an `Err` value: Socket (typically /run/lqos/bus) not found. Check that lqosd i>
Jul 12 09:24:33 libreqos-beta systemd[1]: lqos_scheduler.service: Main process exited, code=exited, status=1/FAILURE
Jul 12 09:24:41 libreqos-beta systemd[1]: lqos_scheduler.service: Failed with result 'exit-code'.
Jul 12 09:24:41 libreqos-beta systemd[1]: Failed to start lqos_scheduler.service.
Jul 12 09:24:41 libreqos-beta systemd[1]: lqos_scheduler.service: Consumed 2.281s CPU time.
Jul 12 09:24:41 libreqos-beta systemd[1]: lqos_scheduler.service: Scheduled restart job, restart counter is at 1.
Jul 12 09:24:41 libreqos-beta systemd[1]: Starting lqos_scheduler.service...

Setting ExecStartPre=/bin/sleep 60 in lqos_scheduler.service works around this.

@interduo
Contributor Author

interduo commented Jul 12, 2024

Temporary solution: #522

It doesn't require implementing anything in lqosd.

@interduo changed the title from "v1.5-beta: lqos_node_manager - Error: Unable to access /run/lqos/bus. Check that lqosd is running and you have appropriate permissions." to "v1.5-beta: problems with starting services in proper order/time/delay" on Jul 12, 2024
@thebracket
Collaborator

The good news is that with UI2, there's no more rocket or separate node_manager daemon - so the Rocket side of things is going away. The scheduler needs to do an "is lqosd running? If not, delay" check - that should be easy enough.
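
A minimal sketch of such an "is lqosd running? If not, delay" check, assuming that the presence of a Unix socket at /run/lqos/bus (the path from the error messages above) is a good enough readiness signal. The function name wait_for_bus and the timeout and interval defaults are illustrative, not part of the scheduler.

```python
import os
import stat
import time


def wait_for_bus(path: str = "/run/lqos/bus", timeout: float = 60.0,
                 interval: float = 1.0) -> bool:
    """Poll until a Unix socket exists at `path` or `timeout` seconds elapse.

    Returns True as soon as the socket is found, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            if stat.S_ISSOCK(os.stat(path).st_mode):
                return True
        except OSError:
            pass  # not there yet (or not accessible); keep waiting
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

The scheduler could call wait_for_bus() before its first refreshShapers() run and exit with a clear message if it returns False. Note that this only checks that the socket file exists; it does not prove lqosd is accepting connections on it.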

@thebracket added the bug label on Jul 12, 2024
@thebracket added this to the v1.5 Beta 2 milestone on Jul 12, 2024
@thebracket self-assigned this on Jul 12, 2024
@interduo
Contributor Author

Well, I think this should be handled at the systemd level.
It was created for this kind of thing too.

@interduo
Contributor Author

OK, the situation now is:
the scheduler did not start because lqosd was not running
lqosd did not start because the QSFP+ link was not up
(sometimes it takes a few seconds to negotiate the connection)
the scheduler gave up and logged an error in dmesg that it could not be started. I then manually started lqosd, followed by the scheduler.

lqos_scheduler should check the link state before checking lqosd (?). If the interfaces are not up, it should just sleep for a while and check again.
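
A rough sketch of that link-state check, assuming the kernel's /sys/class/net/<iface>/operstate file is an acceptable signal that a link has finished negotiating. The function names, the timeout defaults, and the sysfs_root parameter (included only so the code can be exercised against a fake directory) are all hypothetical and not taken from the scheduler.

```python
import time
from pathlib import Path


def link_is_up(iface: str, sysfs_root: str = "/sys/class/net") -> bool:
    """Return True when the kernel reports operstate "up" for `iface`."""
    try:
        state = (Path(sysfs_root) / iface / "operstate").read_text().strip()
    except OSError:
        return False  # interface missing or unreadable
    return state == "up"


def wait_for_links(ifaces, timeout: float = 30.0, interval: float = 1.0,
                   sysfs_root: str = "/sys/class/net") -> bool:
    """Sleep and re-check until every interface is up or `timeout` elapses."""
    deadline = time.monotonic() + timeout
    while True:
        if all(link_is_up(i, sysfs_root) for i in ifaces):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

The interface names would have to come from the LibreQoS configuration. Only once wait_for_links(...) returns True would the scheduler go on to check for lqosd.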

@rchac modified the milestones: v1.5 Beta 2, v2.0 on Jan 13, 2025