bwrap broken #303

Open
tbodt opened this issue Jun 7, 2024 · 11 comments

Comments

@tbodt

tbodt commented Jun 7, 2024

sudo brl fetch debian -r testing -n test
sudo strat test apt install bubblewrap
strat test bwrap --dev-bind / / true

gives

bwrap: No permissions to create new namespace, likely because the kernel does not allow non-privileged user namespaces. See <https://deb.li/bubblewrap> or <file:///usr/share/doc/bubblewrap/README.Debian.gz>.

However, bwrap installed in the arch linux arm stratum I have (from hijacking, used as init) worked fine.

@paradigm
Member

paradigm commented Jun 7, 2024

Bedrock has to balance making processes from different strata see different things (so they each see what they need to work without conflicting) and making them see the same thing (so they interact and everything feels like one cohesive system). In 0.7 Poki, all processes share the same mount namespace, so they can all mount/unmount each other's items. However, they have different root directories (/ is local). It turns out this trips a sanity check enforced when creating user namespaces, which is what bwrap relies on. I didn't learn this until it was too late to adjust course in Poki. For the upcoming 0.8 Naga the plan is to use per-stratum mount namespaces, which should make this and a few other situations just work, at the cost of making some of Bedrock's internal implementation details more complex.

In Poki, your choices are to:

  • As you found, use bwrap from the init-providing stratum. This passes the user namespace sanity check constraint.
  • setuid the bwrap executable. This elevates its permissions and allows it to make namespace changes despite the sanity check constraint, but comes at the expense of trusting that bwrap drops permissions correctly.

I need to document this on https://bedrocklinux.org/0.7/feature-compatibility.html.

@tbodt
Author

tbodt commented Jun 7, 2024

Thank you, I will probably use setuid. What exactly is that sanity check?

@paradigm
Member

paradigm commented Jun 7, 2024

Thank you, I will probably use setuid.

Happy to help

What exactly is that sanity check?

From man 2 unshare:

   EPERM (since Linux 3.9)
          CLONE_NEWUSER  was  specified in flags and the caller is in a chroot environment (i.e., the caller's root directory does not match the root directory of the mount namespace in which it resides).

Mount namespaces have their own concept of a root directory, and the virtual filesystem has its own concept (which one usually changes with chroot(2)). These two things need to align for unprivileged unshare(2) calls (which is how bwrap works). In Poki, these two roots don't align: it's one global mount namespace with only one mount namespace root (like on most distros), but with a per-stratum virtual filesystem root. The plan in Naga is to give each stratum its own mount namespace root which maps 1:1 with the virtual filesystem root, resolving this issue.
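
As a rough way to see this check from userspace, the sketch below attempts unshare(CLONE_NEWUSER) in a throwaway child process and reports the resulting errno. The function name probe_unshare_newuser is made up for illustration, and the probe goes through ctypes rather than anything bwrap itself does. EPERM has more than one possible cause, so the errno alone does not prove the chroot check is the one that fired.

```python
import ctypes
import errno
import os

CLONE_NEWUSER = 0x10000000  # value from <linux/sched.h>


def probe_unshare_newuser():
    """Try unshare(CLONE_NEWUSER) in a child process (hypothetical helper).

    Returns 0 on success, the errno value on failure, or None if the
    probe could not run (no unshare symbol, or the child was killed).
    """
    libc = ctypes.CDLL(None, use_errno=True)
    if not hasattr(libc, "unshare"):
        return None
    pid = os.fork()
    if pid == 0:
        # Child: make the call and report the outcome through the exit status,
        # so the parent's own namespaces are never touched.
        rc = libc.unshare(CLONE_NEWUSER)
        os._exit(0 if rc == 0 else (ctypes.get_errno() or 255))
    _, status = os.waitpid(pid, 0)
    if not os.WIFEXITED(status):
        return None
    return os.WEXITSTATUS(status)


if __name__ == "__main__":
    result = probe_unshare_newuser()
    if result is None:
        print("probe could not run")
    elif result == 0:
        print("unshare(CLONE_NEWUSER) succeeded")
    else:
        print("unshare(CLONE_NEWUSER) failed:", errno.errorcode.get(result, result))
```

Running it once as an unprivileged user from the init-providing stratum and once through strat from another stratum should show whether the two contexts behave differently.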

@tbodt
Author

tbodt commented Jun 8, 2024

Sadly, this did not work either...

$ sudo strat debian bwrap --dev-bind / / bash
bwrap: pivot_root: Invalid argument

@tbodt
Author

tbodt commented Jun 8, 2024

231680 execve("/usr/bin/bwrap", ["bwrap", "--dev-bind", "/", "/", "bash"], 0xffffee3953d8 /* 21 vars */) = 0
231680 brk(NULL)                        = 0xaaaaea241000
231680 mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xffff80843000
231680 faccessat(AT_FDCWD, "/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory)
231680 openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
231680 newfstatat(3, "", {st_mode=S_IFREG|0644, st_size=47147, ...}, AT_EMPTY_PATH) = 0
231680 mmap(NULL, 47147, PROT_READ, MAP_PRIVATE, 3, 0) = 0xffff80837000
231680 close(3)                         = 0
231680 openat(AT_FDCWD, "/lib/aarch64-linux-gnu/libselinux.so.1", O_RDONLY|O_CLOEXEC) = 3
231680 read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0\267\0\1\0\0\0\0\0\0\0\0\0\0\0"..., 832) = 832
231680 newfstatat(3, "", {st_mode=S_IFREG|0644, st_size=198800, ...}, AT_EMPTY_PATH) = 0
231680 mmap(NULL, 337464, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_DENYWRITE, -1, 0) = 0xffff807b7000
231680 mmap(0xffff807c0000, 271928, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0) = 0xffff807c0000
231680 munmap(0xffff807b7000, 36864)    = 0
231680 munmap(0xffff80803000, 26168)    = 0
231680 mprotect(0xffff807ed000, 73728, PROT_NONE) = 0
231680 mmap(0xffff807ff000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x2f000) = 0xffff807ff000
231680 mmap(0xffff80801000, 5688, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0xffff80801000
231680 close(3)                         = 0
231680 openat(AT_FDCWD, "/lib/aarch64-linux-gnu/libcap.so.2", O_RDONLY|O_CLOEXEC) = 3
231680 read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0\267\0\1\0\0\0\220z\0\0\0\0\0\0"..., 832) = 832
231680 newfstatat(3, "", {st_mode=S_IFREG|0644, st_size=67704, ...}, AT_EMPTY_PATH) = 0
231680 mmap(NULL, 196696, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_DENYWRITE, -1, 0) = 0xffff8078f000
231680 mmap(0xffff80790000, 131160, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0) = 0xffff80790000
231680 munmap(0xffff8078f000, 4096)     = 0
231680 munmap(0xffff807b1000, 57432)    = 0
231680 mprotect(0xffff8079a000, 86016, PROT_NONE) = 0
231680 mmap(0xffff807af000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0xf000) = 0xffff807af000
231680 close(3)                         = 0
231680 openat(AT_FDCWD, "/lib/aarch64-linux-gnu/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
231680 read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0\267\0\1\0\0\0000w\2\0\0\0\0\0"..., 832) = 832
231680 newfstatat(3, "", {st_mode=S_IFREG|0755, st_size=1716384, ...}, AT_EMPTY_PATH) = 0
231680 mmap(NULL, 1826736, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_DENYWRITE, -1, 0) = 0xffff805d2000
231680 mmap(0xffff805e0000, 1761200, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0) = 0xffff805e0000
231680 munmap(0xffff805d2000, 57344)    = 0
231680 munmap(0xffff8078e000, 8112)     = 0
231680 mprotect(0xffff80770000, 53248, PROT_NONE) = 0
231680 mmap(0xffff8077d000, 20480, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x19d000) = 0xffff8077d000
231680 mmap(0xffff80782000, 49072, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0xffff80782000
231680 close(3)                         = 0
231680 openat(AT_FDCWD, "/lib/aarch64-linux-gnu/libpcre2-8.so.0", O_RDONLY|O_CLOEXEC) = 3
231680 read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0\267\0\1\0\0\0\0\0\0\0\0\0\0\0"..., 832) = 832
231680 newfstatat(3, "", {st_mode=S_IFREG|0644, st_size=592440, ...}, AT_EMPTY_PATH) = 0
231680 mmap(NULL, 721648, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_DENYWRITE, -1, 0) = 0xffff8052f000
231680 mmap(0xffff80530000, 656112, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0) = 0xffff80530000
231680 munmap(0xffff8052f000, 4096)     = 0
231680 munmap(0xffff805d1000, 58096)    = 0
231680 mprotect(0xffff805b9000, 90112, PROT_NONE) = 0
231680 mmap(0xffff805cf000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x8f000) = 0xffff805cf000
231680 close(3)                         = 0
231680 mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xffff80835000
231680 set_tid_address(0xffff80835510)  = 231680
231680 set_robust_list(0xffff80835520, 24) = 0
231680 rseq(0xffff80835b60, 0x20, 0, 0xd428bc00) = 0
231680 mprotect(0xffff8077d000, 12288, PROT_READ) = 0
231680 mprotect(0xffff805cf000, 4096, PROT_READ) = 0
231680 mprotect(0xffff807af000, 4096, PROT_READ) = 0
231680 mprotect(0xffff807ff000, 4096, PROT_READ) = 0
231680 mprotect(0xaaaab71df000, 4096, PROT_READ) = 0
231680 mprotect(0xffff80848000, 8192, PROT_READ) = 0
231680 prlimit64(0, RLIMIT_STACK, NULL, {rlim_cur=8192*1024, rlim_max=RLIM64_INFINITY}) = 0
231680 munmap(0xffff80837000, 47147)    = 0
231680 prctl(PR_CAPBSET_READ, CAP_MAC_OVERRIDE) = 1
231680 prctl(PR_CAPBSET_READ, 0x30 /* CAP_??? */) = -1 EINVAL (Invalid argument)
231680 prctl(PR_CAPBSET_READ, CAP_CHECKPOINT_RESTORE) = 1
231680 prctl(PR_CAPBSET_READ, 0x2c /* CAP_??? */) = -1 EINVAL (Invalid argument)
231680 prctl(PR_CAPBSET_READ, 0x2a /* CAP_??? */) = -1 EINVAL (Invalid argument)
231680 prctl(PR_CAPBSET_READ, 0x29 /* CAP_??? */) = -1 EINVAL (Invalid argument)
231680 statfs("/sys/fs/selinux", 0xffffda027ca0) = -1 ENOENT (No such file or directory)
231680 statfs("/selinux", 0xffffda027ca0) = -1 ENOENT (No such file or directory)
231680 getrandom("\xd7\x1c\x78\x4d\xdf\xa6\xf9\x8d", 8, GRND_NONBLOCK) = 8
231680 brk(NULL)                        = 0xaaaaea241000
231680 brk(0xaaaaea262000)              = 0xaaaaea262000
231680 openat(AT_FDCWD, "/proc/filesystems", O_RDONLY|O_CLOEXEC) = 3
231680 newfstatat(3, "", {st_mode=S_IFREG|0444, st_size=0, ...}, AT_EMPTY_PATH) = 0
231680 read(3, "nodev\tsysfs\nnodev\ttmpfs\nnodev\tbd"..., 1024) = 437
231680 read(3, "", 1024)                = 0
231680 close(3)                         = 0
231680 faccessat(AT_FDCWD, "/etc/selinux/config", F_OK) = -1 ENOENT (No such file or directory)
231680 getuid()                         = 0
231680 getgid()                         = 0
231680 geteuid()                        = 0
231680 capget({version=_LINUX_CAPABILITY_VERSION_3, pid=0}, {effective=1<<CAP_CHOWN|1<<CAP_DAC_OVERRIDE|1<<CAP_DAC_READ_SEARCH|1<<CAP_FOWNER|1<<CAP_FSETID|1<<CAP_KILL|1<<CAP_SETGID|1<<CAP_SETUID|1<<CAP_SETPCAP|1<<CAP_LINUX_IMMUTABLE|1<<CAP_NET_BIND_SERVICE|1<<CAP_NET_BROADCAST|1<<CAP_NET_ADMIN|1<<CAP_NET_RAW|1<<CAP_IPC_LOCK|1<<CAP_IPC_OWNER|1<<CAP_SYS_MODULE|1<<CAP_SYS_RAWIO|1<<CAP_SYS_CHROOT|1<<CAP_SYS_PTRACE|1<<CAP_SYS_PACCT|1<<CAP_SYS_ADMIN|1<<CAP_SYS_BOOT|1<<CAP_SYS_NICE|1<<CAP_SYS_RESOURCE|1<<CAP_SYS_TIME|1<<CAP_SYS_TTY_CONFIG|1<<CAP_MKNOD|1<<CAP_LEASE|1<<CAP_AUDIT_WRITE|1<<CAP_AUDIT_CONTROL|1<<CAP_SETFCAP|1<<CAP_MAC_OVERRIDE|1<<CAP_MAC_ADMIN|1<<CAP_SYSLOG|1<<CAP_WAKE_ALARM|1<<CAP_BLOCK_SUSPEND|1<<CAP_AUDIT_READ|1<<CAP_PERFMON|1<<CAP_BPF|1<<CAP_CHECKPOINT_RESTORE, permitted=1<<CAP_CHOWN|1<<CAP_DAC_OVERRIDE|1<<CAP_DAC_READ_SEARCH|1<<CAP_FOWNER|1<<CAP_FSETID|1<<CAP_KILL|1<<CAP_SETGID|1<<CAP_SETUID|1<<CAP_SETPCAP|1<<CAP_LINUX_IMMUTABLE|1<<CAP_NET_BIND_SERVICE|1<<CAP_NET_BROADCAST|1<<CAP_NET_ADMIN|1<<CAP_NET_RAW|1<<CAP_IPC_LOCK|1<<CAP_IPC_OWNER|1<<CAP_SYS_MODULE|1<<CAP_SYS_RAWIO|1<<CAP_SYS_CHROOT|1<<CAP_SYS_PTRACE|1<<CAP_SYS_PACCT|1<<CAP_SYS_ADMIN|1<<CAP_SYS_BOOT|1<<CAP_SYS_NICE|1<<CAP_SYS_RESOURCE|1<<CAP_SYS_TIME|1<<CAP_SYS_TTY_CONFIG|1<<CAP_MKNOD|1<<CAP_LEASE|1<<CAP_AUDIT_WRITE|1<<CAP_AUDIT_CONTROL|1<<CAP_SETFCAP|1<<CAP_MAC_OVERRIDE|1<<CAP_MAC_ADMIN|1<<CAP_SYSLOG|1<<CAP_WAKE_ALARM|1<<CAP_BLOCK_SUSPEND|1<<CAP_AUDIT_READ|1<<CAP_PERFMON|1<<CAP_BPF|1<<CAP_CHECKPOINT_RESTORE, inheritable=0}) = 0
231680 prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) = 0
231680 openat(AT_FDCWD, "/proc/sys/kernel/overflowuid", O_RDONLY|O_CLOEXEC) = 3
231680 read(3, "65534\n", 4079)         = 6
231680 read(3, "", 4073)                = 0
231680 close(3)                         = 0
231680 openat(AT_FDCWD, "/proc/sys/kernel/overflowgid", O_RDONLY|O_CLOEXEC) = 3
231680 read(3, "65534\n", 4079)         = 6
231680 read(3, "", 4073)                = 0
231680 close(3)                         = 0
231680 ioctl(1, TCGETS, {c_iflag=IGNPAR|ICRNL|IXON|IMAXBEL|IUTF8, c_oflag=NL0|CR0|TAB0|BS0|VT0|FF0|OPOST|ONLCR, c_cflag=B38400|CS8|CREAD, c_lflag=ISIG|ICANON|ECHO|ECHOE|ECHOK|IEXTEN|ECHOCTL|ECHOKE, ...}) = 0
231680 ioctl(1, TCGETS, {c_iflag=IGNPAR|ICRNL|IXON|IMAXBEL|IUTF8, c_oflag=NL0|CR0|TAB0|BS0|VT0|FF0|OPOST|ONLCR, c_cflag=B38400|CS8|CREAD, c_lflag=ISIG|ICANON|ECHO|ECHOE|ECHOK|IEXTEN|ECHOCTL|ECHOKE, ...}) = 0
231680 ioctl(1, TCGETS, {c_iflag=IGNPAR|ICRNL|IXON|IMAXBEL|IUTF8, c_oflag=NL0|CR0|TAB0|BS0|VT0|FF0|OPOST|ONLCR, c_cflag=B38400|CS8|CREAD, c_lflag=ISIG|ICANON|ECHO|ECHOE|ECHOK|IEXTEN|ECHOCTL|ECHOKE, ...}) = 0
231680 newfstatat(1, "", {st_mode=S_IFCHR|0600, st_rdev=makedev(0x88, 0x3), ...}, AT_EMPTY_PATH) = 0
231680 readlinkat(AT_FDCWD, "/proc/self/fd/1", "/dev/pts/3", 4095) = 10
231680 newfstatat(AT_FDCWD, "/dev/pts/3", {st_mode=S_IFCHR|0600, st_rdev=makedev(0x88, 0x3), ...}, 0) = 0
231680 getuid()                         = 0
231680 openat(AT_FDCWD, "/proc", O_RDONLY|O_PATH) = 3
231680 rt_sigprocmask(SIG_BLOCK, [CHLD], NULL, 8) = 0
231680 wait4(-1, 0xffffda025b10, WNOHANG, NULL) = -1 ECHILD (No child processes)
231680 eventfd2(0, EFD_CLOEXEC)         = 4
231680 clone(child_stack=NULL, flags=CLONE_NEWNS|SIGCHLD) = 231681
231681 read(4,  <unfinished ...>
231680 openat(3, "231681/ns", O_RDONLY|O_PATH) = 5
231680 newfstatat(5, "mnt", {st_mode=S_IFREG|0444, st_size=0, ...}, 0) = 0
231680 close(5)                         = 0
231680 capset({version=_LINUX_CAPABILITY_VERSION_3, pid=0}, {effective=0, permitted=0, inheritable=0}) = 0
231680 prctl(PR_SET_DUMPABLE, SUID_DUMP_USER) = 0
231680 write(4, "\1\0\0\0\0\0\0\0", 8)  = 8
231681 <... read resumed>"\1\0\0\0\0\0\0\0", 8) = 8
231680 close(4)                         = 0
231681 close(4 <unfinished ...>
231680 openat(3, "self/fd", O_RDONLY|O_NOCTTY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY <unfinished ...>
231681 <... close resumed>)             = 0
231680 <... openat resumed>)            = 4
231681 umask(000 <unfinished ...>
231680 newfstatat(4, "",  <unfinished ...>
231681 <... umask resumed>)             = 022
231680 <... newfstatat resumed>{st_mode=S_IFDIR|0500, st_size=5, ...}, AT_EMPTY_PATH) = 0
231680 fcntl(4, F_GETFL <unfinished ...>
231681 mount(NULL, "/", NULL, MS_REC|MS_SILENT|MS_SLAVE, NULL <unfinished ...>
231680 <... fcntl resumed>)             = 0x24800 (flags O_RDONLY|O_NONBLOCK|O_LARGEFILE|O_DIRECTORY)
231680 fcntl(4, F_SETFD, FD_CLOEXEC <unfinished ...>
231681 <... mount resumed>)             = 0
231680 <... fcntl resumed>)             = 0
231681 mount("tmpfs", "/tmp", "tmpfs", MS_NOSUID|MS_NODEV, NULL <unfinished ...>
231680 getdents64(4, 0xaaaaea2429b0 /* 7 entries */, 32768) = 168
231681 <... mount resumed>)             = 0
231680 close(3 <unfinished ...>
231681 getcwd( <unfinished ...>
231680 <... close resumed>)             = 0
231681 <... getcwd resumed>"/home/tbodt/ncdu", 4096) = 17
231681 chdir("/tmp" <unfinished ...>
231680 getdents64(4, 0xaaaaea2429b0 /* 0 entries */, 32768) = 0
231681 <... chdir resumed>)             = 0
231681 mkdirat(AT_FDCWD, "newroot", 0755 <unfinished ...>
231680 close(4)                         = 0
231681 <... mkdirat resumed>)           = 0
231680 signalfd4(-1, [CHLD], 8, SFD_CLOEXEC|SFD_NONBLOCK <unfinished ...>
231681 mount("newroot", "newroot", NULL, MS_MGC_VAL|MS_BIND|MS_REC|MS_SILENT, NULL <unfinished ...>
231680 <... signalfd4 resumed>)         = 3
231681 <... mount resumed>)             = 0
231680 ppoll([{fd=3, events=POLLIN}], 1, NULL, NULL, 0 <unfinished ...>
231681 mkdirat(AT_FDCWD, "oldroot", 0755) = 0
231681 pivot_root("/tmp", "oldroot")    = -1 EINVAL (Invalid argument)
231681 write(2, "bwrap: ", 7)           = 7
231681 write(2, "pivot_root", 10)       = 10
231681 write(2, ": Invalid argument", 18) = 18
231681 write(2, "\n", 1)                = 1
231681 exit_group(1)                    = ?
231681 +++ exited with 1 +++
231680 <... ppoll resumed>)             = 1 ([{fd=3, revents=POLLIN}])
231680 read(3, "\21\0\0\0\0\0\0\0\1\0\0\0\1\211\3\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 128) = 128
231680 wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 1}], WNOHANG, NULL) = 231681
231680 exit_group(1)                    = ?
231680 +++ exited with 1 +++

@tbodt
Author

tbodt commented Jun 8, 2024

I wonder if you could simply replace the chroot operation in brl strat with some bind mounts and a pivot_root?

@paradigm
Member

paradigm commented Jun 8, 2024

It definitely works to some degree - I've had this issue come up elsewhere; the fact I haven't had the time to update the website accordingly since then is slightly concerning.


Looks like it's:

231681 pivot_root("/tmp", "oldroot")    = -1 EINVAL (Invalid argument)

from man 2 pivot_root:

EINVAL new_root is not a mount point.

I see the mount point being created in your strace log, so I don't think this is it.

EINVAL put_old is not at or underneath new_root.

It's unlikely bwrap made this mistake

EINVAL The current root directory is not a mount point (because of an earlier chroot(2)).

Bedrock should be ensuring this isn't a concern, but you could double check that's the case on your machine just in case.

EINVAL The current root is on the rootfs (initial ramfs) mount; see NOTES.

This isn't a rootfs, probably not relevant

EINVAL Either the mount point at new_root, or the parent mount of that mount point, has propagation type MS_SHARED.

IIRC Bedrock doesn't set this on stratum roots or any parent directory, but this is also something you could double check

EINVAL put_old is a mount point and has the propagation type MS_SHARED.

Also something you could double check
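
One possible way to do that double checking is to read the propagation flags out of /proc/self/mountinfo, where a shared mount carries a "shared:N" optional field and a slave mount a "master:N" one. The helper names parse_mountinfo and mount_and_parent below are invented for this sketch; it is not a Bedrock or bwrap tool.

```python
def parse_mountinfo(text):
    """Parse /proc/<pid>/mountinfo content into a list of dicts."""
    mounts = []
    for line in text.splitlines():
        fields = line.split()
        # Optional fields sit between the six fixed leading fields and a lone "-".
        if "-" not in fields or fields.index("-") < 6:
            continue  # not a well-formed mountinfo line
        optional = fields[6:fields.index("-")]
        mounts.append({
            "id": int(fields[0]),
            "parent": int(fields[1]),
            "mount_point": fields[4],
            "shared": any(f.startswith("shared:") for f in optional),
            "slave": any(f.startswith("master:") for f in optional),
        })
    return mounts


def mount_and_parent(mounts, mount_point):
    """Return (mount, parent) for the topmost mount at mount_point.

    Returns None if nothing is mounted there. parent is None when the
    parent mount does not appear in the listing.
    """
    by_id = {m["id"]: m for m in mounts}
    matches = [m for m in mounts if m["mount_point"] == mount_point]
    if not matches:
        return None
    mount = matches[-1]  # later entries are stacked on top of earlier ones
    return mount, by_id.get(mount["parent"])
```

Feeding it the contents of /proc/self/mountinfo and calling mount_and_parent(mounts, "/") shows whether the root mount is shared. The parent of the root mount may not be listed when the file is read from inside a chroot, in which case it comes back as None and the listing has to be taken from a process that can see the parent.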


I wonder if you could simply replace the chroot operation in brl strat with some bind mounts and a pivot_root?

This is a very deep change and unsuitable for Poki. As mentioned before, I've already got a firm plan here for Naga.

You clearly have enough technical acumen to be able to assist with Bedrock and I really don't want to scare you off, but you've also found Bedrock at an awkward time between when effort on Poki is more or less done, and Naga isn't at a good point for this kind of dynamically-discovered contribution.

@tbodt
Author

tbodt commented Jun 8, 2024

Bedrock should be ensuring this isn't a concern, but you could double check that's the case on your machine just in case.

I'm like 99% sure it's this - the same exact thing that made the unshare check fail.

This is a very deep change and unsuitable for Poki. As mentioned before, I've already got a firm plan here for Naga.

Yeah I figured this out in the process of attempting to implement it :P after realizing that to pivot_root without making everything explode, strat needs to create a new mount namespace first, which is just what Naga will do with a couple extra steps.

Perhaps what I can do is build this version of strat as a fork, or maybe a flag, to be used only in the rare situations where it solves more problems than it causes.

@tbodt
Author

tbodt commented Jun 8, 2024

I'm like 99% sure it's this - the same exact thing that made the unshare check fail.

Wait, the check is slightly different? pivot_root requires a mount, unshare requires the root mount? In that case, worth looking closer... because strat debian cat /proc/self/mounts does show that / is a mount.

@tbodt
Author

tbodt commented Jun 8, 2024

Learned about kprobes (holy shit what a tool!) and, well...the checks in the kernel source don't perfectly line up with the man page

	error = -EINVAL;
	new_mnt = real_mount(new.mnt);
	root_mnt = real_mount(root.mnt);
	old_mnt = real_mount(old.mnt);
	ex_parent = new_mnt->mnt_parent;
	root_parent = root_mnt->mnt_parent;
	if (IS_MNT_SHARED(old_mnt) ||
		IS_MNT_SHARED(ex_parent) ||
		IS_MNT_SHARED(root_parent))
		goto out4;

new, root, and old are struct paths: new and old are looked up from the parameters to pivot_root, and root is the current chroot path. So what is this checking, exactly? It returns EINVAL if

  • put_old is on a shared mount
  • the parent of the new mount is a shared mount
  • the parent of the current chroot's mount is a shared mount

and the manpage says

  • put_old is a mount point of a shared mount
  • the new mount is a shared mount
  • the parent of the new mount is a shared mount

only one of these is the same! wow...

thinking about this, the kernel checks actually make sense for the goal of preventing pivot_root from affecting other namespaces, so it's the manpage that's wrong. this always happens...


the third IS_MNT_SHARED is what's failing here. bwrap remembers to remount / as MS_SLAVE, but its / is /bedrock/strata/debian and there is a different mount above it which is still shared. idk why no one has run into this before
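
To make the mismatch concrete, here is a toy model of the two sets of conditions as read in this comment. The function names kernel_rejects and manpage_rejects are invented, the booleans stand in for IS_MNT_SHARED on the respective mounts, and the model covers only the propagation checks, not the rest of pivot_root.

```python
def kernel_rejects(old_shared, new_parent_shared, root_parent_shared):
    """Propagation check as in the kernel excerpt quoted above."""
    return old_shared or new_parent_shared or root_parent_shared


def manpage_rejects(put_old_shared, new_shared, new_parent_shared):
    """Propagation check as the man page text was read above."""
    return put_old_shared or new_shared or new_parent_shared


# Situation described above: bwrap made its own / a slave mount, so
# put_old, new_root and new_root's parent are not shared, but the mount
# above the stratum root is still shared.
print(kernel_rejects(old_shared=False, new_parent_shared=False,
                     root_parent_shared=True))   # True
print(manpage_rejects(put_old_shared=False, new_shared=False,
                      new_parent_shared=False))  # False
```

Under this model the kernel returns EINVAL in a case where nothing in the man page wording predicts it, which matches the failure seen in the strace log.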

@tbodt tbodt mentioned this issue Jun 8, 2024
@tbodt
Author

tbodt commented Jun 8, 2024

#304
