Docker, and the containers it makes has completely changed the way of packaging applications and deploying them. It injects our source code with mobility enabling them to run at scale. I use Docker for practically everything on my laptop, creating local development environments, testing environments and using disposable containers for doing some crazy stuff and deleting them right away if things get messy😅. I am sure that everyone who has ever used Docker must be amazed that how a single docker-run
command, let us create isolated and independent machines(called containers) within a few seconds.
Let's understand this magic while implementing our own Docker with a few lines Go code.
At the end of this post we'll be having a Go binary that would be capable of running any valid linux command/executable inside an isolated process(practically known as container).
docker run image <cmd> <params>
go run main.go run {some command} <cmd> <params>
- Go SDK(Linux)
- Any Linux distribution
- Docker for Linux
Linux is required because containers are practically a wrap around some awesome Linux technologies that we'll be exploring next.
- Namespaces - what an isolated process can see is defined and controlled by namespaces. It creates isolation by providing each process its own pseudo environment.
- Chroots - it control root filesystem for each process.
- Cgroups - what an isloted process can use as resource from host machine is enforced by cgroups(control groups).
Namespaces provide the isolation needed to run multiple containers on one machine while giving each what appears like it’s own environment. We have following 6 namespaces thus providing different levels and kinds of isolation.
- Unix Timesharing System
- Process IDs
- Mounts
- Networks
- User IDs
- InterProcess Communication
Enough with theory and definitions, now let's actually see how a container behaves differently from host machine.
We'll create an ubuntu docker container passing /bin/bash
as entrypoint. Use the following snippet.
docker run -it --rm ubuntu /bin/bash
We'll run a few Linux commands inside container(ubuntu 20.04) and host machine(ubuntu 20.04) and observer their behaviour inside both environments:
- hostname - return name of the host inside which bash is running.
- ps - return list of active process running inside the environment.
We can see that, when we run same commands in docker container and our host machine we get different results.
- Container is assigned a hostname from docker(container ID), while our system have a completely different hostname.
- Lots of process are running inside our host but our container is only aware of process running inside it, thus providing isolation.
We have got a taste of a how these containers function. It's time to open our editor and write some go code to achieve something similar that docker does.
Create a main.go
file with main package and start adding the following components. Do read the comments mentioned in these code snippet for a better understanding.
- command switch
func main() {
switch os.Args[1] {
case "run":
run()
case "child":
child()
default:
panic("invalid command")
}
}
The command switch picks the command-line argument passed, and runs the function mapped to that argument.
- run function
func run() {
fmt.Printf("Running %v as %d\n", os.Args[2:], os.Getpid())
// proc dir is a directory where all processes metadata is there
// our temporary binary will also be present here
// below line executes child function inside the newly created container
cmd := exec.Command("/proc/self/exe", append([]string{"child"}, os.Args[2:]...)...)
// attatching os-std process to our cmd-std process
cmd.Stdin = os.Stdin
cmd.Stdout = os.Stdout
cmd.Stderr = os.Stderr
// setting some system process attributes
// below line of code is responsible for creating a new isolated process
cmd.SysProcAttr = &syscall.SysProcAttr{
// Cloning is what creates the process(container) in which we would be running our command.
// CLONE_NEWUTS will allow to have our own hostname inside our container by creating a new unix timesharing system.
// CLONE_NEWPID assigns pids to only process inside the new namspace.
// CLONE_NEWNS new namespace for mount.
Cloneflags: syscall.CLONE_NEWUTS | syscall.CLONE_NEWPID | syscall.CLONE_NEWNS,
// Mounts in systemd gets recursively shared property.
// Unshare the recursively shared property for new mount namespace.
// It prevents sharing of new namespace with the host.
Unshareflags: syscall.CLONE_NEWNS,
}
// running the command and catching error
if err := cmd.Run(); err != nil {
log.Fatal("Error: ", err)
}
}
The run()
function is responsible for creating an isolated process(container) and then execute itself inside this isolated process, this time invoking child()
function
- child function
func child() {
fmt.Printf("Running %v as %d\n", os.Args[2:], os.Getpid())
// below are some system calls that set some container properties
// sets hostname for newly created namespace
must(syscall.Sethostname([]byte("maverick")))
// setting root director for the container
must(syscall.Chroot("/"))
// making "/" as default dir
must(syscall.Chdir("/"))
// mounting proc dir to see the process running inside container
must(syscall.Mount("proc", "proc", "proc", 0, ""))
// below line finally executes the user-command inside our own created container!
cmd := exec.Command(os.Args[2], os.Args[3:]...)
// attatching os-std process to our cmd-std process
cmd.Stdin = os.Stdin
cmd.Stdout = os.Stdout
cmd.Stderr = os.Stderr
// running the command and catching error
if err := cmd.Run(); err != nil {
log.Fatal("Error: ", err)
}
// unmount the proc after command is finished
syscall.Unmount("/proc", 0)
}
The child()
function is invoked by run()
as a child process inside container created by run()
. It is responsible for some system calls for setting some container properties and finally execute the command dispatched from user.
- must function
func must(err error) {
if err != nil {
panic(err)
}
}
The must()
is a simple error wrapper that panics if any system call invoked inside child function fails.
Now that we have a mini docker program that can actually create isolated and independent containers on our host machine. Let's put it to some use!
The command that we would be running inside our container is /bin/bash
, that will start a new bash program inside our container.
Run following snippet to run the binary that will create the container.
go run main.go run /bin/bash
When we run the above command passing /bin/bash
as argument following changes happen
- a new container is created running a bash process in isolation to the system.
Running [/bin/bash] as 279518
means our bash process running in the container has a pid of 279518 on host, whileRunning [/bin/bash] as 1
is the pid of the same process inside our container.- both
root@maverick
and hostname command tells us that hostname of our container is maverick, that is passed as system call by our Go program. - running
ps
return only the process running inside our container and has no information about other system processes. - since we mounted the
root dir
of our host machine asChroot
of our container, upon runningls
we can see all those root files.
Now we are sure that our bash is running as an isolated process, let's exit
from it and observe come changes.
- upon exit we kill our container and return back to host machine
- hostname and ps return different metrics and values that represents our host machine.
In above execution we mounted root directory of host machine as our container's root dir
. This practically gives all executables and metadata which our host machine holds, thus creating a vulnerability. A better approach to this, could be creating a pseudo root directory with stripped down executable and thus mounting is as container's root dir
.
We have successfully created a Go program that runs a Linux executable as an isolated container with its own hostname, own process management and pseudo root-dir
. For creating a fully isolated and independent container similar to a Docker container we'll have to touch all six namespaces, configure all Cgroups, dynamically create Chroots for every container and do layer caching(which makes docker awesome to use 😀) is beyond the scope of this article.
I hope this article helped you understand how docker works and how it wraps cool pieces of technology that comes with linux kernel.
For an in-depth understanding of how docker works?, I would recommend you to check containers from scratch by Liz Rice.
Akshit Sadana [email protected]
- Github: @Akshit8
- LinkedIn: @akshitsadana
- containers from scratch by Liz Rice
- Build Your Own Container Using Less than 100 Lines of Go by Julian Friedman