ApiException pod does not exist #40

Open
craigwalton-dsit opened this issue Dec 20, 2024 · 0 comments
Labels
3rd party errors Errors observed from 3rd party code such as websocket or SSL errors

Migrated from internal repo.
Complete stack trace and logs (sensitive) https://github.com/AI-Safety-Institute/aisi-inspect-tools/issues/149
Original date: 25 Oct 2024

During an eval, an ApiException was raised, which caused a task to fail.

    "traceback": "Traceback (most recent call last):
  File \"redacted/.venv/lib/python3.12/site-packages/kubernetes/stream/ws_client.py\", line 528, in websocket_call
    client = WSClient(configuration, url, headers, capture_all, binary=binary)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File \"redacted/.venv/lib/python3.12/site-packages/kubernetes/stream/ws_client.py\", line 68, in __init__
    self.sock = create_websocket(configuration, url, headers)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File \"redacted/.venv/lib/python3.12/site-packages/kubernetes/stream/ws_client.py\", line 494, in create_websocket
    websocket.connect(url, **connect_opt)
  File \"redacted/.venv/lib/python3.12/site-packages/websocket/_core.py\", line 261, in connect
    self.handshake_response = handshake(self.sock, url, *addrs, **options)
                              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File \"redacted/.venv/lib/python3.12/site-packages/websocket/_handshake.py\", line 65, in handshake
    status, resp = _get_resp_headers(sock)
                   ^^^^^^^^^^^^^^^^^^^^^^^
  File \"redacted/.venv/lib/python3.12/site-packages/websocket/_handshake.py\", line 150, in _get_resp_headers
    raise WebSocketBadStatusException(
websocket._exceptions.WebSocketBadStatusException: Handshake status 404 Not Found -+-+- {'content-length': '18', 'content-type': 'text/plain; charset=utf-8', 'date': 'Fri, 25 Oct 2024 12:35:35 GMT'} -+-+- b'pod does not exist'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File \"redacted/.venv/lib/python3.12/site-packages/inspect_ai/_eval/task/run.py\", line 266, in task_run
    sample_results = await asyncio.gather(*sample_coroutines)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File \"redacted/.venv/lib/python3.12/site-packages/inspect_ai/_eval/task/run.py\", line 431, in task_run_sample
    error = sample_error(ex)
            ^^^^^^^^^^^^^^^^
  File \"redacted/.venv/lib/python3.12/site-packages/inspect_ai/_eval/task/error.py\", line 22, in __call__
    raise ex
  File \"redacted/.venv/lib/python3.12/site-packages/inspect_ai/_eval/task/run.py\", line 423, in task_run_sample
    state = await plan(state, generate)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File \"redacted/.venv/lib/python3.12/site-packages/inspect_ai/solver/_plan.py\", line 106, in __call__
    state = await solver(state, generate)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File \"redacted/.venv/lib/python3.12/site-packages/inspect_ai/solver/_basic_agent.py\", line 184, in solve
    tool_results = await call_tools(state.output.message, state.tools)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File \"redacted/.venv/lib/python3.12/site-packages/inspect_ai/model/_call_tools.py\", line 154, in call_tools
    results = await asyncio.gather(*tasks)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File \"redacted/.venv/lib/python3.12/site-packages/inspect_ai/model/_call_tools.py\", line 74, in call_tool_task
    result = await call_tool(tdefs, message.text, call)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File \"redacted/.venv/lib/python3.12/site-packages/inspect_ai/model/_call_tools.py\", line 196, in call_tool
    result = await tool_def.tool(**arguments)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File \"redacted/.venv/lib/python3.12/site-packages/...bash.py\", line 48, in bash
    result, new_cwd = await run_bash_command(
                      ^^^^^^^^^^^^^^^^^^^^^^^
  File \"redacted/.venv/lib/python3.12/site-packages/...bash.py\", line 98, in run_bash_command
    result = await bash_sandbox.exec([\"bash\", \"-c\", code], timeout=timeout_seconds)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File \"redacted/.venv/lib/python3.12/site-packages/aisitools/k8s_sandbox/sandbox_environment.py\", line 102, in exec
    return await self._pod.exec(cmd, input, cwd, timeout)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File \"redacted/.venv/lib/python3.12/site-packages/aisitools/k8s_sandbox/pod.py\", line 91, in exec
    result = await self._run_asynchronously(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File \"redacted/.venv/lib/python3.12/site-packages/aisitools/k8s_sandbox/pod.py\", line 149, in _run_asynchronously
    return await loop.run_in_executor(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File \"/usr/lib/python3.12/concurrent/futures/thread.py\", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File \"redacted/.venv/lib/python3.12/site-packages/aisitools/k8s_sandbox/pod.py\", line 92, in <lambda>
    lambda: executor.exec(cmd, stdin, cwd, timeout)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File \"redacted/.venv/lib/python3.12/site-packages/aisitools/k8s_sandbox/pod.py\", line 246, in exec
    with self._interactive_shell(timeout) as ws_client:
  File \"/usr/lib/python3.12/contextlib.py\", line 137, in __enter__
    return next(self.gen)
           ^^^^^^^^^^^^^^
  File \"redacted/.venv/lib/python3.12/site-packages/aisitools/k8s_sandbox/pod.py\", line 266, in _interactive_shell
    ws_client: WSClient = stream(
                          ^^^^^^^
  File \"redacted/.venv/lib/python3.12/site-packages/kubernetes/stream/stream.py\", line 36, in _websocket_request
    out = api_method(*args, **kwargs)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File \"redacted/.venv/lib/python3.12/site-packages/kubernetes/client/api/core_v1_api.py\", line 994, in connect_get_namespaced_pod_exec
    return self.connect_get_namespaced_pod_exec_with_http_info(name, namespace, **kwargs)  # noqa: E501
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File \"redacted/.venv/lib/python3.12/site-packages/kubernetes/client/api/core_v1_api.py\", line 1101, in connect_get_namespaced_pod_exec_with_http_info
    return self.api_client.call_api(
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File \"redacted/.venv/lib/python3.12/site-packages/kubernetes/client/api_client.py\", line 348, in call_api
    return self.__call_api(resource_path, method,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File \"redacted/.venv/lib/python3.12/site-packages/kubernetes/client/api_client.py\", line 180, in __call_api
    response_data = self.request(
                    ^^^^^^^^^^^^^
  File \"redacted/.venv/lib/python3.12/site-packages/kubernetes/stream/ws_client.py\", line 538, in websocket_call
    raise ApiException(status=0, reason=str(e))
kubernetes.client.exceptions.ApiException: (0)
Reason: Handshake status 404 Not Found -+-+- {'content-length': '18', 'content-type': 'text/plain; charset=utf-8', 'date': 'Fri, 25 Oct 2024 12:35:35 GMT'} -+-+- b'pod does not exist'
",
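The `Reason` line at the end of the traceback packs the HTTP status, the response headers and the response body into one string separated by `-+-+-`. Below is a minimal sketch of splitting it apart so the "pod does not exist" case can be recognised programmatically. The helper name `parse_handshake_reason` and the `HandshakeFailure` type are hypothetical, and the format is inferred only from the single string shown in this issue, so other messages may not parse.

```python
import ast
import re
from typing import NamedTuple, Optional


class HandshakeFailure(NamedTuple):
    """Fields recovered from a 'Handshake status ...' reason string."""
    status: int
    message: str
    headers: dict
    body: str


# Separator as it appears in the reason string in this issue.
_SEPARATOR = " -+-+- "


def parse_handshake_reason(reason: str) -> Optional[HandshakeFailure]:
    """Hypothetical helper: split the reason into status, headers and body.

    Returns None if the string does not match the format seen in this issue.
    """
    parts = reason.strip().split(_SEPARATOR)
    if len(parts) != 3:
        return None
    match = re.fullmatch(r"Handshake status (\d{3}) (.*)", parts[0])
    if match is None:
        return None
    try:
        # Headers are printed as a dict repr and the body as a bytes repr.
        headers = ast.literal_eval(parts[1])
        body = ast.literal_eval(parts[2])
    except (ValueError, SyntaxError):
        return None
    if isinstance(body, bytes):
        body = body.decode("utf-8", errors="replace")
    return HandshakeFailure(int(match.group(1)), match.group(2), headers, str(body))


reason = (
    "Handshake status 404 Not Found -+-+- "
    "{'content-length': '18', 'content-type': 'text/plain; charset=utf-8', "
    "'date': 'Fri, 25 Oct 2024 12:35:35 GMT'} -+-+- b'pod does not exist'"
)
parsed = parse_handshake_reason(reason)
pod_missing = parsed is not None and parsed.status == 404 and "pod does not exist" in parsed.body
print(parsed, pod_missing)
```

Because the `ApiException` is raised with `status=0`, the 404 is only visible inside the reason text, which is why the sketch parses the string rather than checking the exception's status.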

The key information is "pod does not exist". Looking at the output of `kubectl get events` for `agent-env-w3qxf7hy-default-0`, we can see:

2024-10-25T12:20:11Z   Normal    agent-env-w3qxf7hy-default-0                  Scheduled               Successfully assigned agent/agent-env-w3qxf7hy-default-0 to ip-192-168-118-142.eu-west-2.compute.internal
2024-10-25T12:20:13Z   Normal    agent-env-w3qxf7hy-default-0                  Created                 Created container resolve-coredns-ip
2024-10-25T12:20:13Z   Normal    agent-env-w3qxf7hy-default-0                  Pulled                  Container image "toolbelt/dig:2024-09-23" already present on machine
2024-10-25T12:20:13Z   Normal    agent-env-w3qxf7hy-default-0                  Started                 Started container resolve-coredns-ip
2024-10-25T12:20:14Z   Normal    agent-env-w3qxf7hy-default-0                  Pulled                  Container image "redacted" already present on machine
2024-10-25T12:20:15Z   Normal    agent-env-w3qxf7hy-default-0                  Created                 Created container default
2024-10-25T12:20:15Z   Normal    agent-env-w3qxf7hy-default-0                  Started                 Started container default
2024-10-25T12:22:27Z   Warning   agent-env-w3qxf7hy-default-0                  NodeNotReady            Node is not ready
2024-10-25T12:27:32Z   Normal    agent-env-w3qxf7hy-default-0                  TaintManagerEviction    Marking for deletion Pod agent/agent-env-w3qxf7hy-default-0
2024-10-25T12:27:32Z   Normal    agent-env-w3qxf7hy-default-0                  Scheduled               Successfully assigned agent/agent-env-w3qxf7hy-default-0 to ip-192-168-103-228.eu-west-2.compute.internal
2024-10-25T12:27:32Z   Normal    agent-env-w3qxf7hy-default                    SuccessfulCreate        create Pod agent-env-w3qxf7hy-default-0 in StatefulSet agent-env-w3qxf7hy-default successful
2024-10-25T12:27:33Z   Normal    agent-env-w3qxf7hy-default-0                  Pulled                  Container image "toolbelt/dig:2024-09-23" already present on machine
2024-10-25T12:27:33Z   Normal    agent-env-w3qxf7hy-default-0                  Created                 Created container resolve-coredns-ip
2024-10-25T12:27:34Z   Normal    agent-env-w3qxf7hy-default-0                  Started                 Started container resolve-coredns-ip
2024-10-25T12:27:34Z   Normal    agent-env-w3qxf7hy-default-0                  Pulled                  Container image "redacted" already present on machine
2024-10-25T12:27:34Z   Normal    agent-env-w3qxf7hy-default-0                  Created                 Created container default
2024-10-25T12:27:35Z   Normal    agent-env-w3qxf7hy-default-0                  Started                 Started container default
2024-10-25T12:35:35Z   Normal    agent-env-w3qxf7hy-default-0                  Killing                 Stopping container default
2024-10-25T12:35:36Z   Normal    agent-env-w3qxf7hy-default-0                  Killing                 Stopping container default

Note the NodeNotReady warning at 12:22:27, after which it looks like the pod was marked for deletion and rescheduled onto a different node at 12:27:32.

I don't know why the node became not ready.

Many evals on old versions of challenges, which did not specify resource requests, were also running at the time.
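The gaps between the events can be checked with a quick calculation. The sketch below uses timestamps copied from the event listing above and the `date` header from the failed handshake. The variable names are illustrative only.

```python
from datetime import datetime
from email.utils import parsedate_to_datetime


def ts(value: str) -> datetime:
    """Parse an ISO 8601 timestamp with a trailing 'Z' as UTC."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


# Timestamps copied from the event listing.
node_not_ready = ts("2024-10-25T12:22:27Z")   # NodeNotReady
eviction = ts("2024-10-25T12:27:32Z")         # TaintManagerEviction
second_start = ts("2024-10-25T12:27:35Z")     # second "Started container default"
first_killing = ts("2024-10-25T12:35:35Z")    # first "Killing" event

# 'date' header from the 404 handshake response in the traceback.
handshake_failed = parsedate_to_datetime("Fri, 25 Oct 2024 12:35:35 GMT")

eviction_delay_s = (eviction - node_not_ready).total_seconds()
second_start_to_killing_s = (first_killing - second_start).total_seconds()
exec_vs_killing_s = (handshake_failed - first_killing).total_seconds()

print(f"NodeNotReady -> eviction: {eviction_delay_s:.0f} s")
print(f"second start -> Killing: {second_start_to_killing_s:.0f} s")
print(f"Killing -> failed exec handshake: {exec_vs_killing_s:.0f} s")
```

The roughly five-minute gap between NodeNotReady and TaintManagerEviction is consistent with the 300-second toleration that Kubernetes adds to pods by default for the not-ready node taint, although this cluster's settings have not been checked. The failed exec handshake carries the same second-level timestamp as the first Killing event.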

craigwalton-dsit added the "3rd party errors" label on Dec 20, 2024