kobo worker logging thread may hang indefinitely on TLS handshake #269

Open
kdudka opened this issue Jan 21, 2025 · 0 comments
kdudka commented Jan 21, 2025

I am forwarding an issue from Red Hat internal Jira that I was debugging in May 2024 but that I have not resolved yet.

Current Behavior:
An OSH task hung indefinitely on an OSH worker while the child process was blocked on a write to stdout/stderr. The kobo worker logging thread was blocked indefinitely on a TLS handshake:

(gdb) py-bt
Traceback (most recent call first):
  File "/usr/lib64/python3.9/ssl.py", line 1343, in do_handshake
    self._sslobj.do_handshake()
  File "/usr/lib64/python3.9/ssl.py", line 1074, in _create
    self.do_handshake()
  File "/usr/lib64/python3.9/ssl.py", line 501, in wrap_socket
    return self.sslsocket_class._create(
  File "/usr/lib64/python3.9/http/client.py", line 1454, in connect
    self.sock = self._context.wrap_socket(self.sock,
  File "/usr/lib64/python3.9/http/client.py", line 980, in send
    self.connect()
  File "/usr/lib64/python3.9/http/client.py", line 1040, in _send_output
    self.send(msg)
  File "/usr/lib64/python3.9/http/client.py", line 1280, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/usr/lib64/python3.9/xmlrpc/client.py", line 1321, in send_content
    connection.endheaders(request_body)
  File "/usr/lib64/python3.9/xmlrpc/client.py", line 1291, in send_request
    self.send_content(connection, request_body)
  File "/usr/lib/python3.9/site-packages/kobo/xmlrpc.py", line 369, in _single_request3
    h = self.send_request(host, handler, request_body, verbose)
  File "/usr/lib64/python3.9/xmlrpc/client.py", line 1166, in request
    return self.single_request(host, handler, request_body, verbose)
  File "/usr/lib/python3.9/site-packages/kobo/xmlrpc.py", line 477, in request
    result = transport_class.request(self, *args, **kwargs)
  File "/usr/lib64/python3.9/xmlrpc/client.py", line 1464, in __request
    response = self.__transport.request(
  File "/usr/lib64/python3.9/xmlrpc/client.py", line 1122, in __call__
    return self.__send(self.__name, args)
  File "/usr/lib/python3.9/site-packages/kobo/client/__init__.py", line 510, in upload_task_log
    self._hub.worker.upload_task_log(task_id, remote_file_name, mode, chunk_start, chunk_len, chunk_checksum, encoded_chunk)
  File "/usr/lib/python3.9/site-packages/kobo/worker/logger.py", line 65, in run
    self._hub.upload_task_log(BytesIO(self._send_data), self._task_id, "stdout.log", append=True)
  File "/usr/lib64/python3.9/threading.py", line 980, in _bootstrap_inner
    self.run()
  File "/usr/lib64/python3.9/threading.py", line 937, in _bootstrap
    self._bootstrap_inner()

The Python code where the thread was blocked appears to support a timeout for the TLS handshake, but the kobo/xmlrpc stack does not set one:

(gdb) py-list
1338            self._check_connected()
1339            timeout = self.gettimeout()
1340            try:
1341                if timeout == 0.0 and block:
1342                    self.settimeout(None)
>1343                self._sslobj.do_handshake()
1344            finally:
1345                self.settimeout(timeout)
1346    
1347        def _real_connect(self, addr, connect_ex):
1348            if self.server_side:
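
One possible way to bound the handshake, shown as a minimal sketch rather than a tested fix for kobo: `http.client` passes the connection's `timeout` attribute to the socket it creates, and the handshake code listed above honors the socket timeout. The class name `TimeoutSafeTransport` and the default of 60 seconds are hypothetical; the transport classes in `kobo/xmlrpc.py` would need an equivalent change.

```python
import xmlrpc.client


class TimeoutSafeTransport(xmlrpc.client.SafeTransport):
    """Hypothetical HTTPS transport that applies a socket timeout.

    Not part of kobo; it only illustrates where a timeout could be set.
    """

    def __init__(self, timeout=60.0, **kwargs):
        super().__init__(**kwargs)
        self._timeout = timeout  # seconds; the value is an assumption

    def make_connection(self, host):
        # SafeTransport returns an http.client.HTTPSConnection. Its
        # `timeout` attribute is used when the socket is created, so it
        # also limits the TLS handshake performed in connect().
        conn = super().make_connection(host)
        conn.timeout = self._timeout
        return conn
```

With such a transport, a stalled handshake would raise `socket.timeout` in the logging thread instead of blocking forever, and the caller would still have to decide whether to retry or fail the task.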

Expected Behavior:
The task should either fail or stop transferring the captured output to the hub, but it should not hang indefinitely.

Steps to reproduce:
I am not sure how it happened, but I suspect it was caused by an intermittent network issue.
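
A stalled handshake can be imitated locally, assuming the original hang came from a peer that completed the TCP connection but never answered the TLS ClientHello (this is a guess, not a confirmed cause). The sketch below uses a listening socket that never accepts; the helper name `handshake_times_out` is made up for illustration. Without the `timeout` argument, the `wrap_socket` call would block just like the traceback above.

```python
import socket
import ssl


def handshake_times_out(timeout=0.5):
    """Return True if a TLS handshake against a silent peer hits the timeout."""
    # The kernel backlog completes the TCP connection, but nothing ever
    # accepts it or sends TLS data, so the client waits for a ServerHello.
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))
    server.listen(1)

    ctx = ssl.create_default_context()
    ctx.check_hostname = False       # no real certificate is involved here
    ctx.verify_mode = ssl.CERT_NONE

    try:
        sock = socket.create_connection(server.getsockname(), timeout=timeout)
        try:
            # The socket timeout set above also applies to do_handshake().
            ctx.wrap_socket(sock, server_hostname="localhost")
        except socket.timeout:
            return True
        finally:
            sock.close()
        return False
    finally:
        server.close()
```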

Impact Statement:
Such OSH tasks unnecessarily block the OSH scanning queue.
