Running KinD in Kubernetes Pod with gVisor RuntimeClass #11313

Open
ericjohansson89 opened this issue Dec 20, 2024 · 5 comments

ericjohansson89 commented Dec 20, 2024

Description

Hello,

I am trying to run a KinD cluster within a Pod inside Kubernetes, where the pod uses the gVisor RuntimeClass.
I am using the docker daemon provided by your basic images (docker-in-gvisor) and the regular docker-cli image, version 27.3.1-cli.
The pod has the capabilities recommended in your "docker in gvisor" guide:
audit_write, chown, dac_override, fowner, fsetid, kill, mknod, net_bind_service, net_admin, net_raw, setfcap, setgid, setpcap, setuid, sys_admin, sys_chroot, sys_ptrace
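
Whether the container actually received these capabilities can be checked from inside it. The following is only a sketch, assuming a Linux-style /proc/self/status with a CapEff line. The helper names has_cap and effective_caps are made up for illustration, and the bit numbers are the ones from linux/capability.h.

```shell
#!/bin/sh
# Capability bit numbers as defined in <linux/capability.h>.
CAP_NET_ADMIN=12
CAP_NET_RAW=13
CAP_SYS_PTRACE=19
CAP_SYS_ADMIN=21

# has_cap MASK_HEX BIT: succeed if capability number BIT is set in the
# hexadecimal capability mask MASK_HEX (hypothetical helper).
has_cap() {
    mask=$((0x$1))
    [ $(( (mask >> $2) & 1 )) -eq 1 ]
}

# effective_caps [STATUS_FILE]: print the CapEff mask found in STATUS_FILE
# (default: the status file of the current process).
effective_caps() {
    awk '/^CapEff:/ { print $2 }' "${1:-/proc/self/status}"
}

# Example use inside the container:
#   has_cap "$(effective_caps)" "$CAP_SYS_ADMIN" && echo "sys_admin is effective"
```

A missing bit here would point at the pod's securityContext rather than at gVisor itself.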

Additionally, the following runsc configuration is set:

[runsc_config]
debug = "true"
debug-log = "/var/log/runsc/%ID%/gvisor.%COMMAND%.log"
net-raw = "true"
overlay2 = "root:self"

containerd config

[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
SystemdCgroup = true
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runsc]
runtime_type = "io.containerd.runsc.v1"
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runsc.options]
TypeUrl = "io.containerd.runsc.v1.options"
ConfigPath = "/etc/containerd/runsc.toml"

I have managed to get both docker build and docker run to work, but kind create cluster fails, and the daemon logs are sparse even with the debug flag.

The same setup works in Kubernetes when the RuntimeClass is the cluster's default runc and the pod runs as privileged. With gVisor it does not.

daemon-logs-working.txt (default cluster runc with privileged)
daemon-logs-not-working.txt (runsc RuntimeClass)

Zooming in, the working setup logs the following:

time="2024-12-20T09:57:11.982332748Z" level=debug msg="Programming external connectivity on endpoint kind-control-plane (1ed3ae2722359aada81db09dbb17e01b1fa9edb015f695109f39c3bf432b00dc)" spanID=32a7af2ae5c4aa04 traceID=304a2903477e6027a5f02a60b3adbb30
time="2024-12-20T09:57:11.992323656Z" level=debug msg="EnableService 1d5396597f5861e7a321b952cfa415fa579b973ced13fa912876fa1e879cb0b8 START"
time="2024-12-20T09:57:11.992357339Z" level=debug msg="EnableService 1d5396597f5861e7a321b952cfa415fa579b973ced13fa912876fa1e879cb0b8 DONE"
time="2024-12-20T09:57:11.994789447Z" level=debug msg="bundle dir created" bundle=/var/run/docker/containerd/1d5396597f5861e7a321b952cfa415fa579b973ced13fa912876fa1e879cb0b8 module=libcontainerd namespace=moby root=/var/lib/docker/overlay2/8d1278868acfa80371564712cd4a6b6c2f3098e68a57336992b6bdcfab166f47/merged
time="2024-12-20T09:57:12.011775325Z" level=debug msg="shim bootstrap parameters" address="unix:///run/containerd/s/e38e0141c6c721f68c79ad46eeac3007350a5898bbcb2311400c527dbefbe82e" namespace=moby protocol=ttrpc
time="2024-12-20T09:57:12.014914690Z" level=info msg="loading plugin \"io.containerd.event.v1.publisher\"..." runtime=io.containerd.runc.v2 type=io.containerd.event.v1

These lines do not appear in the daemon logs of the non-working (runsc) setup. Is bundle creation affected by overlay not being available inside the docker daemon, since it is using VFS as its storage driver?
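
To see exactly where the two attached daemon logs diverge, it can help to reduce each line to its msg text and mask the container IDs before diffing. A rough sketch, assuming the logfmt-style lines shown above; the function name normalize_daemon_log is hypothetical:

```shell
#!/bin/sh
# normalize_daemon_log: read dockerd log lines on stdin and print only the
# msg="..." text, with long hex IDs replaced by <id>, so that two runs can be
# compared line by line. Lines without a level=... msg="..." pair are dropped.
normalize_daemon_log() {
    sed -nE 's/^.*level=[a-z]+ msg="(([^"\\]|\\.)*)".*$/\1/p' |
        sed -E 's/[0-9a-f]{12,64}/<id>/g'
}

# Example use with the two attachments from this issue:
#   normalize_daemon_log < daemon-logs-working.txt     > working.msgs
#   normalize_daemon_log < daemon-logs-not-working.txt > failing.msgs
#   diff working.msgs failing.msgs
```

The first message that shows up only in working.msgs is a reasonable place to start looking.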

There is also a section of iptables errors at the top of the gVisor logs, but since iptables is disabled I guess that is expected?
The initial commands run in the daemon pod, as per your image, are:

set -xe -o pipefail

dev=$(ip route show default | sed 's/.*\sdev\s\(\S*\)\s.*$/\1/')
addr=$(ip addr show dev "$dev"  | grep -w inet | sed 's/^\s*inet\s\(\S*\)\/.*$/\1/')

echo 1 > /proc/sys/net/ipv4/ip_forward
iptables-legacy -t nat -A POSTROUTING -o "$dev" -j SNAT --to-source "$addr" -p tcp
iptables-legacy -t nat -A POSTROUTING -o "$dev" -j SNAT --to-source "$addr" -p udp

exec /usr/bin/dockerd --iptables=false --ip6tables=false -D
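
The two sed expressions in the script above rely on the GNU \s and \S escapes, and the first one only matches when whitespace follows the device name. If the parsing ever needs to be ruled out as a cause, a field-based variant is easy to test in isolation. This is a sketch, not the script shipped in the image; default_dev and first_inet_addr are hypothetical names:

```shell
#!/bin/sh
# default_dev: read `ip route show default` output on stdin and print the
# interface name that follows the "dev" keyword on the first line that has one.
default_dev() {
    awk '{ for (i = 1; i < NF; i++) if ($i == "dev") { print $(i + 1); exit } }'
}

# first_inet_addr: read `ip addr show dev DEV` output on stdin and print the
# first IPv4 address without its /prefix length.
first_inet_addr() {
    awk '$1 == "inet" { sub(/\/.*/, "", $2); print $2; exit }'
}

# Example use, mirroring the variables of the original script:
#   dev=$(ip route show default | default_dev)
#   addr=$(ip addr show dev "$dev" | first_inet_addr)
```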

kind also complains about the cgroup namespace setting when run from the docker-cli container:

Creating cluster "kind" ...
 • Ensuring node image (nexus.int.clxnetworks.net:8009/kindest/node:v1.29.8) 🖼  ...
DEBUG: docker/images.go:67] Pulling image: nexus.int.clxnetworks.net:8009/kindest/node:v1.29.8 ...
 ✓ Ensuring node image (nexus.int.clxnetworks.net:8009/kindest/node:v1.29.8) 🖼
 • Preparing nodes 📦   ...
 ✗ Preparing nodes 📦 
Deleted nodes: ["kind-control-plane"]
ERROR: failed to create cluster: command "docker run --name kind-control-plane --hostname kind-control-plane --label io.x-k8s.kind.role=control-plane --privileged --security-opt seccomp=unconfined --security-opt apparmor=unconfined --tmpfs /tmp --tmpfs /run --volume /var --volume /lib/modules:/lib/modules:ro -e KIND_EXPERIMENTAL_CONTAINERD_SNAPSHOTTER --detach --tty --label io.x-k8s.kind.cluster=kind --net kind --restart=on-failure:1 --init=false --cgroupns=private --publish=127.0.0.1:58758:6443/TCP -e KUBECONFIG=/etc/kubernetes/admin.conf nexus.int.clxnetworks.net:8009/kindest/node:v1.29.8" failed with error: exit status 125
Command Output: WARNING: Your kernel does not support cgroup namespaces.  Cgroup namespace setting discarded.
d7ef273962a1ba0a3dfc55418aab0d030a71a3b1cb299cd568b00e7ed0b4a06a
docker: Error response from daemon: not a device node.

Is there something we need to do here?

Steps to reproduce

  • Create a Kubernetes cluster with a node that has gVisor installed
  • Install the gVisor RuntimeClass
  • Configure containerd and runsc
  • Launch a pod with
    • two containers, one docker-cli and the other "docker-in-gvisor"
    • the capabilities described above
  • In the docker-cli container, download kind and run "kind create cluster"
  • Watch kind fail to start the cluster with a "not a device node" error
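
For the RuntimeClass step, a manifest along these lines should do. It is a sketch: the class name gvisor is an assumption, while the handler runsc matches the containerd runtime name in the config above. The function name runtimeclass_manifest is made up for illustration.

```shell
#!/bin/sh
# runtimeclass_manifest NAME HANDLER: print a RuntimeClass manifest that maps
# the class NAME to the containerd runtime handler HANDLER.
runtimeclass_manifest() {
    cat <<EOF
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: $1
handler: $2
EOF
}

# Example use (the class name "gvisor" is an assumption):
#   runtimeclass_manifest gvisor runsc | kubectl apply -f -
```

The pod then selects the class through spec.runtimeClassName.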

runsc version

runsc --version
runsc version release-20241217.0
spec: 1.1.0-rc.1

docker version (if using docker)

From docker-cli container


docker info
Client:
 Version:    27.3.1
 Context:    default
 Debug Mode: false
 Plugins:
  buildx: Docker Buildx (Docker Inc.)
    Version:  v0.19.1
    Path:     /usr/local/libexec/docker/cli-plugins/docker-buildx
  compose: Docker Compose (Docker Inc.)
    Version:  v2.31.0
    Path:     /usr/local/libexec/docker/cli-plugins/docker-compose
Server:
 Containers: 0
  Running: 0
  Paused: 0
  Stopped: 0
 Images: 0
 Server Version: 27.3.1
 Storage Driver: vfs
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Cgroup Version: 1
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local splunk syslog
 Swarm: inactive
 Runtimes: io.containerd.runc.v2 runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 7f7fdf5fed64eb6a7caf99b3e12efcf9d60e311c
 runc version: v1.1.14-0-g2c9f560
 init version: de40ad0
 Security Options:
  seccomp
   Profile: builtin
 Kernel Version: 4.4.0
 Operating System: Alpine Linux v3.20 (containerized)
 OSType: linux
 Architecture: aarch64
 CPUs: 4
 Total Memory: 5GiB
 Name: runner-scsfzeqwr-project-65012842-concurrent-0-efbxgh1t
 ID: 8adf71ea-bdfc-405a-9043-31e1aa4109f6
 Docker Root Dir: /var/lib/docker
 Debug Mode: true
  File Descriptors: 24
  Goroutines: 48
  System Time: 2024-12-20T10:12:41.23162601Z
  EventsListeners: 0
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false
 Product License: Community Engine
[DEPRECATION NOTICE]: API is accessible on http://0.0.0.0:2375 without encryption.
         Access to the remote API is equivalent to root access on the host. Refer
         to the 'Docker daemon attack surface' section in the documentation for
         more information: https://docs.docker.com/go/attack-surface/
In future versions this will be a hard failure preventing the daemon from starting! Learn more at: https://docs.docker.com/go/api-security/
WARNING: No swap limit support
WARNING: No kernel memory TCP limit support
WARNING: No oom kill disable support
WARNING: No blkio throttle.read_bps_device support
WARNING: No blkio throttle.write_bps_device support
WARNING: No blkio throttle.read_iops_device support
WARNING: No blkio throttle.write_iops_device support
WARNING: bridge-nf-call-iptables is disabled
WARNING: bridge-nf-call-ip6tables is disabled

uname

uname -a
Linux ip-10-15-40-36.eu-west-1.compute.internal 6.1.119-129.201.amzn2023.aarch64 #1 SMP Tue Dec 3 21:06:52 UTC 2024 aarch64 aarch64 aarch64 GNU/Linux
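
Note that docker info above reports Kernel Version 4.4.0 while uname on the node reports 6.1.119. That is consistent with the daemon running inside the gVisor sandbox, which presents its own kernel version to applications. A quick sanity check that a shell is inside the sandbox is to look for gVisor's banner in dmesg. This is a sketch: the function name looks_like_gvisor is hypothetical, and it assumes the sandbox's dmesg output mentions gVisor.

```shell
#!/bin/sh
# looks_like_gvisor: read kernel log text on stdin and succeed if it mentions
# gVisor anywhere (case-insensitive).
looks_like_gvisor() {
    grep -qi 'gvisor'
}

# Example use inside a container:
#   dmesg | looks_like_gvisor && echo "running inside gVisor"
```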

kubectl (if using Kubernetes)

kubectl version
Client Version: v1.28.2
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.29.11-eks-56e63d8

repo state (if built from source)

No response

runsc debug logs (if available)

Can provide logs if it helps (don't know which file to add)
@ericjohansson89 ericjohansson89 added the type: bug Something isn't working label Dec 20, 2024
milantracy (Contributor) commented

Can you umount /dev/termination-logs and run your workflow again? I believe this is a known issue in gVisor.
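
Before unmounting, it may be worth confirming that the path is really a mount point in the daemon container. Kubernetes' default terminationMessagePath is /dev/termination-log, so the exact name can differ from pod to pod. A minimal sketch that reads /proc/self/mounts; both function names are hypothetical:

```shell
#!/bin/sh
# is_mount_point PATH [MOUNTS_FILE]: succeed if PATH appears as a mount point
# in MOUNTS_FILE (default: /proc/self/mounts).
is_mount_point() {
    awk -v p="$1" '$2 == p { found = 1 } END { exit !found }' \
        "${2:-/proc/self/mounts}"
}

# unmount_if_mounted PATH: run umount only when PATH is a mount point.
unmount_if_mounted() {
    if is_mount_point "$1"; then
        umount "$1"
    else
        echo "$1 is not a mount point, nothing to do" >&2
    fi
}

# Example use in the daemon container, before starting dockerd:
#   unmount_if_mounted /dev/termination-log
```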

@milantracy milantracy self-assigned this Dec 20, 2024
milantracy (Contributor) commented Dec 20, 2024

To clarify, the error

docker: Error response from daemon: not a device node.

should be gone once you umount /dev/termination-logs. This is the known issue, and we expect to have a fix soon.

However, it will still fail because of the cgroup namespace. I can reproduce it via docker:

docker: Error response from daemon: failed to create task for container: failed to create shim task: OCI runtime create failed: runc create failed: cgroup namespaces aren't enabled in the kernel: unknown.

milantracy (Contributor) commented

It looks to me like, under the covers, kind uses a docker container to start the kind control plane. The failing command is:

failed to create cluster: command "docker run --name kind-control-plane --hostname kind-control-plane --label io.x-k8s.kind.role=control-plane --privileged --security-opt seccomp=unconfined --security-opt apparmor=unconfined --tmpfs /tmp --tmpfs /run --volume /var --volume /lib/modules:/lib/modules:ro -e KIND_EXPERIMENTAL_CONTAINERD_SNAPSHOTTER --detach --tty --label io.x-k8s.kind.cluster=kind --net kind --restart=on-failure:1 --init=false --cgroupns=private --publish=127.0.0.1:41965:6443/TCP -e KUBECONFIG=/etc/kubernetes/admin.conf kindest/node:v1.32.0@sha256:c48c62eac5da28cdadcf560d1d8616cfa6783b58f0d94cf63ad1bf49600cb027" failed with error: exit status 125

milantracy (Contributor) commented

Look at the option --cgroupns=private: it is the root cause. I believe gVisor doesn't support cgroup namespaces. The same limitation is documented for kind at https://kind.sigs.k8s.io/docs/user/known-issues/#older-linux-distributions

From gVisor's perspective, it can be reproduced simply via:

root@e31b0a2d24a6:/# docker run --cgroupns=private hello-world
WARNING: Your kernel does not support cgroup namespaces.  Cgroup namespace setting discarded.
docker: Error response from daemon: failed to create task for container: failed to create shim task: OCI runtime create failed: runc create failed: cgroup namespaces aren't enabled in the kernel: unknown.
ERRO[0000] error waiting for container:
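
A cheaper check that does not need a container run is to look for the cgroup namespace entry under /proc. This is a sketch under the assumption that its absence is what triggers the warning and the runc error above; as far as I know both Docker and runc probe /proc/self/ns/cgroup, but that is not confirmed in this thread. The function name supports_cgroupns is hypothetical.

```shell
#!/bin/sh
# supports_cgroupns [PROC_DIR]: succeed if PROC_DIR/ns/cgroup exists.
# PROC_DIR defaults to /proc/self; it is a parameter only so the check can be
# exercised against a fake directory tree.
supports_cgroupns() {
    [ -e "${1:-/proc/self}/ns/cgroup" ]
}

# Example use inside the docker daemon container:
#   supports_cgroupns || echo "no cgroup namespaces here, --cgroupns=private will fail"
```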

ericjohansson89 (Author) commented

Hey @milantracy, thanks for looking into this. Is there any chance this will be supported in the future?
