At Reputation, we work extensively with Kubernetes clusters. Working on these clusters and debugging them helped me gain a deep understanding of how containers work at a low level. This blog post describes my findings.
Setup
- Ubuntu 18.04
- Docker version 18.09.7, build 2d0083d
- containerd 1.2.6-0ubuntu1~18.04.2
- runc version spec: 1.0.1-dev
- kernel version: 5.3.0-61-generic
Internals
There is no such thing as a container in the kernel. Docker leverages a kernel feature called namespaces, through several cooperating programs, to give the illusion of a container. Each namespace isolates a different aspect of the system for the container, and a namespace can be shared with other containers.
Namespaces
Docker uses five namespaces by default when it creates a container:
- pid: isolation of processes; each container has its own PID numbering
- net: isolation of the network stack, including network interfaces (eth0)
- ipc: isolation of inter-process communication
- mnt: isolation of filesystem mount points
- uts: isolation of the hostname and domain name (UTS: UNIX Time-Sharing System)
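The namespaces a process belongs to are visible as symbolic links under /proc/<pid>/ns. The following is a minimal sketch for a Linux host, written in POSIX sh; ns_inode and list_ns are hypothetical helper names, not Docker tools.

```shell
# ns_inode: extract the inode number from a namespace link target such as
# "pid:[4026531836]". Two processes share a namespace when the numbers match.
ns_inode() {
    printf '%s\n' "$1" | sed -n 's/^[a-z_]*:\[\([0-9]*\)\]$/\1/p'
}

# list_ns: print "<name> <inode>" for the five namespaces listed above.
# Defaults to the current process; pass a PID to inspect another one
# (reading a process owned by another user usually needs root).
list_ns() {
    pid="${1:-self}"
    for ns in pid net ipc mnt uts; do
        # Skip quietly when /proc is unavailable or the link is unreadable.
        target=$(readlink "/proc/$pid/ns/$ns" 2>/dev/null) || continue
        printf '%s %s\n' "$ns" "$(ns_inode "$target")"
    done
}

list_ns
```

Running list_ns as root against a containerized PID and against PID 1 should print different numbers for every namespace that is isolated.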
Control Groups
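Control groups, or cgroups for short, are a kernel mechanism that limits and accounts for the hardware resources (CPU, memory, block I/O, number of processes) that a group of processes can use. Where namespaces control what a container can see, cgroups control how much it can use.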
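Every process lists its cgroup membership in /proc/<pid>/cgroup, one line per hierarchy in the form hierarchy-ID:controller-list:cgroup-path. The following is a minimal sketch that looks up the path for one controller. cgroup_path is a hypothetical helper name, and the sample lines are illustrative (container ID shortened), not copied from my system.

```shell
# cgroup_path: read the text of a /proc/<pid>/cgroup file on stdin and print
# the cgroup path for one controller.
cgroup_path() {
    awk -F: -v want="$1" '
        {
            n = split($2, ctrls, ",")
            for (i = 1; i <= n; i++) {
                if (ctrls[i] == want) {
                    path = $0
                    # Drop the first two fields; the path itself may contain colons.
                    sub(/^[^:]*:[^:]*:/, "", path)
                    print path
                    exit
                }
            }
        }'
}

# Illustrative sample for a containerized process.
sample='12:memory:/docker/79415509360e
7:cpu,cpuacct:/docker/79415509360e
1:name=systemd:/docker/79415509360e'

printf '%s\n' "$sample" | cgroup_path memory

# On a Linux host the same helper works on the current process.
if [ -r /proc/self/cgroup ]; then
    cgroup_path memory < /proc/self/cgroup
fi
```

With the cgroupfs driver on cgroup v1, the path printed for a container should correspond to a directory under /sys/fs/cgroup/<controller>/ that holds the limit files.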
Filesystem
A container does not use the host's filesystem directly. Docker builds the container's root filesystem from stacked layers using a union filesystem; on this system the storage driver is overlay2 (OverlayFS). Other storage drivers are:
- AUFS
- btrfs
- vfs
- Device Mapper
Container Formats
The three kernel features mentioned above (namespaces, control groups and a union filesystem) together define what is called a container format. For Docker this container format is called libcontainer; other container runtimes have their own, for example Rocket (rkt), a competitor of Docker, uses App Container.
Architecture
Docker by itself is just a daemon that runs in the background; it uses a client-server model. The familiar docker command-line tool is the client, and the daemon that manages images and containers is the server. Docker is composed of pluggable components that handle different aspects of running a container.
- dockerd: The Docker daemon. It listens for Docker API calls and uses containerd to create new containers. It binds to /var/run/docker.sock and creates the PID file /var/run/docker.pid. The socket is owned by root and the docker group, which is why you need to be root or a member of the docker group to talk to the daemon. The daemon can listen on three types of sockets (tcp, fd, unix); the default on my system is fd, as shown by the ps command:
$ ps $(pidof dockerd)
PID TTY STAT TIME COMMAND
6811 ? Ssl 0:14 /usr/bin/dockerd -H fd:// --containerd=/run/containerd/containerd.sock
- containerd: It sets up network interfaces and filesystem/storage, basically preparing the environment for the actual container runtime, in this case runc [1]. It also implements the OCI runtime specification [1].
- runc: The component that actually creates and runs the container process. It is started through containerd-shim, which is itself a child process of containerd.
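The client-server split can be observed without the docker CLI, because the daemon speaks HTTP over its Unix socket. The following is a minimal sketch, assuming curl is installed and the socket lives at /var/run/docker.sock as on my system. docker_api_get is a hypothetical helper name that only wraps curl's --unix-socket option.

```shell
# docker_api_get: send a GET request to the Docker Engine API over a Unix
# socket. Returns non-zero when the socket is missing or curl is unavailable.
docker_api_get() {
    sock="$1"
    path="$2"
    if [ ! -S "$sock" ]; then
        echo "no socket at $sock" >&2
        return 1
    fi
    if ! command -v curl >/dev/null 2>&1; then
        echo "curl not found" >&2
        return 1
    fi
    # The host part of the URL is ignored for Unix sockets; "localhost" is a placeholder.
    curl --silent --max-time 5 --unix-socket "$sock" "http://localhost$path"
}

# Ask the daemon for its version information.
docker_api_get /var/run/docker.sock /version || echo "daemon not reachable from this shell"
```

The same permission rules apply here as for the docker command: the request only succeeds for root or members of the docker group.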
A high-level overview of these components is shown below:
Docker Deployment Analysis
Running docker info prints the configuration options with which Docker was deployed to stdout. On my system it looks like this.
$ docker info
Containers: 48
Running: 1
Paused: 0
Stopped: 47
Images: 110
Server Version: 18.09.7
Storage Driver: overlay2
Backing Filesystem: extfs
Supports d_type: true
Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: bridge host macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version:
runc version: N/A
init version: v0.18.0 (expected: fec3683b971d9c3ef73f284f176672c44b448662)
Security Options:
apparmor
seccomp
Profile: default
Kernel Version: 5.3.0-61-generic
Operating System: Ubuntu 18.04.3 LTS
OSType: linux
Architecture: x86_64
CPUs: 8
Total Memory: 15.49GiB
Name: xxxx
ID: xxxx:xxxx:xxxx:xxxx:xxxx:xxxx:xxxx:xxxx:xxxx:
Docker Root Dir: /var/lib/docker
Username: xxxxxxxxx
Registry: https://index.docker.io/v1/
Labels:
From here we can see that the Docker root is located at /var/lib/docker. The contents of the directory are as follows.
$ sudo ls -all /var/lib/docker
total 84
drwx--x--x 14 root root 4096 .
drwxr-xr-x 90 root root 4096 ..
drwx------ 2 root root 4096 builder
drwx------ 4 root root 4096 buildkit
drwx------ 50 root root 4096 containers
drwx------ 3 root root 4096 image
drwxr-x--- 3 root root 4096 network
drwx------ 213 root root 28672 overlay2
drwx------ 4 root root 4096 plugins
drwx------ 2 root root 4096 runtimes
drwx------ 2 root root 4096 swarm
drwx------ 2 root root 4096 tmp
drwx------ 2 root root 4096 trust
drwx------ 3 root root 4096 volumes
This directory holds all the files needed by docker/containerd/runc to actually execute a container. Most notably, the containers directory holds the configuration for each container's environment, while the overlay2 directory holds the filesystem directories that will be mounted as the container's filesystem.
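Since the root directory is configurable, scripts that inspect these directories are better off reading it from docker info than hard-coding /var/lib/docker. The following is a minimal sketch that parses the plain-text output; docker_root_from_info is a hypothetical helper name, and the sample input is taken from the output above.

```shell
# docker_root_from_info: read "docker info" text on stdin and print the value
# of the "Docker Root Dir" line.
docker_root_from_info() {
    sed -n 's/^[[:space:]]*Docker Root Dir:[[:space:]]*//p'
}

# Sample input copied from the docker info output above.
root=$(printf 'Storage Driver: overlay2\nDocker Root Dir: /var/lib/docker\n' | docker_root_from_info)

# The two subdirectories this post focuses on.
printf '%s\n' "$root/containers" "$root/overlay2"
```

On a machine with a running daemon, piping the real command into the helper (docker info | docker_root_from_info) should give the same value; docker info --format '{{.DockerRootDir}}' is a shorter way to get it.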
Running a container
We will analyze a container in depth, starting with a docker command that runs a sample nginx image and publishes it on a random host port.
$ docker run -P -d nginx
79415509360edd0ddbb33ce05887a0b5633fed7da58d32c83536af8c04f72f28
- -P: Publish all exposed ports to randomly selected host ports. You can set them statically with -p 80:80 or -p 443:443.
- -d: Detached mode. The container runs in the background instead of being attached to the terminal; you can still kill it with docker kill ${ID}.
What is returned here is the internal ID for the container that was just spawned.
This will be the basis for further analysis.
Process Map
On Ubuntu the initial process is systemd, and all other processes descend from it, including containerd and dockerd. This can be seen by viewing the running process tree:
$ pstree
systemd,1 splash
|-containerd,905
| |-containerd-shim,11738 -namespace moby -workdir...
| | |-bash,25790
| | |-nginx,11756
| | | `-nginx,11867
| | |-{containerd-shim},11739
| | |-{containerd-shim},11740
| | |-{containerd-shim},11741
| | |-{containerd-shim},11742
| | |-{containerd-shim},11743
| | |-{containerd-shim},11744
| | |-{containerd-shim},11798
| | |-{containerd-shim},11799
| | `-{containerd-shim},12181
| |-{containerd},939
| |-{containerd},940
| |-{containerd},941
| |-{containerd},942
| |-{containerd},943
| |-{containerd},980
| |-{containerd},981
| |-{containerd},982
| |-{containerd},984
| |-{containerd},1005
| |-{containerd},1006
| |-{containerd},1015
| |-{containerd},1016
| |-{containerd},1028
| |-{containerd},6871
| |-{containerd},6872
| |-{containerd},9688
| |-{containerd},9689
| |-{containerd},15141
| |-{containerd},4423
| |-{containerd},6450
| |-{containerd},12766
| |-{containerd},8467
| |-{containerd},13764
| |-{containerd},2002
| `-{containerd},7109
|-dockerd,6811 -H fd:// --containerd=/run/containerd/containerd.sock
| |-docker-proxy,11728 -proto tcp -host-ip 0.0.0.0 -host-port 32768 -container-ip 172.17.0.2 -container-port 80
| | |-{docker-proxy},11730
| | |-{docker-proxy},11731
| | |-{docker-proxy},11732
| | |-{docker-proxy},11733
| | |-{docker-proxy},11734
| | |-{docker-proxy},11735
| | |-{docker-proxy},11736
| | `-{docker-proxy},11737
| |-{dockerd},6824
| |-{dockerd},6826
| |-{dockerd},6827
| |-{dockerd},6828
| |-{dockerd},6830
| |-{dockerd},6832
| |-{dockerd},6833
| |-{dockerd},6834
| |-{dockerd},6835
| |-{dockerd},6836
| |-{dockerd},6850
| |-{dockerd},6851
| |-{dockerd},6852
| |-{dockerd},6853
| |-{dockerd},6856
| |-{dockerd},6857
| |-{dockerd},6859
| |-{dockerd},6860
| |-{dockerd},6861
| |-{dockerd},6862
| |-{dockerd},6863
| |-{dockerd},6864
| |-{dockerd},6865
| |-{dockerd},6866
| |-{dockerd},6867
| |-{dockerd},6868
| `-{dockerd},6869
From here we can see that the systemd process has PID=1 and is the first process on the system, followed by containerd with PID=905 and dockerd with PID=6811.
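One detail worth noticing in the tree is that nginx is not a descendant of dockerd at all. The following is a minimal sketch that walks the parent chain from a table of PID, PPID and command rows. ancestors is a hypothetical helper name, and the sample rows are taken from the pstree output above.

```shell
# ancestors: read "PID PPID COMMAND" rows on stdin and print the chain of
# commands from the given PID up to the root of the tree.
ancestors() {
    awk -v pid="$1" '
        { parent[$1] = $2; name[$1] = $3 }
        END {
            chain = ""
            # Walk up until we reach a PID we have no row for (PPID 0 for systemd).
            while (pid in name) {
                chain = (chain == "" ? "" : chain " <- ") name[pid] "(" pid ")"
                pid = parent[pid]
            }
            print chain
        }'
}

sample='1 0 systemd
905 1 containerd
6811 1 dockerd
11738 905 containerd-shim
11756 11738 nginx
11867 11756 nginx'

printf '%s\n' "$sample" | ancestors 11756
```

On a live system the rows can come from ps -e -o pid=,ppid=,comm= instead of the sample.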
dockerd is started with the host flag -H set to fd://, which means it does not open its listening socket itself: it receives an already-open file descriptor from systemd (socket activation). The --containerd flag sets the gRPC address of containerd, here /run/containerd/containerd.sock. dockerd also forks a docker-proxy process for the published port [4].
containerd is started with no flags, but when a container is run it spawns a containerd-shim (PID=11738) that handles the container creation process. containerd-shim is started with the following arguments, as can be seen by grepping the ps output for containerd-shim.
$ ps -ef | grep containerd-shim
UID PID PPID C TTY TIME CMD
root 11738 905 0 ? 00:00:01 containerd-shim -namespace moby -workdir /var/lib/containerd/io.containerd.runtime.v1.linux/moby/79415509360edd0ddbb33ce05887a0b5633fed7da58d32c83536af8c04f72f28 -address /run/containerd/containerd.sock -containerd-binary /usr/bin/containerd -runtime-root /var/run/docker/runtime-runc
- -namespace: moby
- -workdir: /var/lib/containerd/io.containerd.runtime.v1.linux/moby/7941550…
- -address: /run/containerd/containerd.sock
- -containerd-binary: /usr/bin/containerd
- -runtime-root: /var/run/docker/runtime-runc
The namespace appears as a directory under /var/lib/containerd/io.containerd.runtime.v1.linux, and it holds a folder named after the Docker container ID.
/var/lib/containerd/io.containerd.runtime.v1.linux $ tree
.
└── moby
    └── 79415509360edd0ddbb33ce05887a0b5633fed7da58d32c83536af8c04f72f28
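Because the last component of -workdir is the container ID, a shim process can be mapped back to its container from the command line alone. The following is a minimal sketch; shim_flag and shim_container_id are hypothetical helper names, and the parsing assumes every flag takes exactly one value, which holds for the command line shown above.

```shell
# shim_flag: print the value that follows a given flag in a containerd-shim
# command line read from stdin.
shim_flag() {
    awk -v flag="-$1" '{ for (i = 2; i < NF; i++) if ($i == flag) { print $(i + 1); exit } }'
}

# shim_container_id: the last path component of -workdir is the container ID.
shim_container_id() {
    workdir=$(shim_flag workdir)
    basename "$workdir"
}

# Command line copied from the ps output above.
cmdline='containerd-shim -namespace moby -workdir /var/lib/containerd/io.containerd.runtime.v1.linux/moby/79415509360edd0ddbb33ce05887a0b5633fed7da58d32c83536af8c04f72f28 -address /run/containerd/containerd.sock -containerd-binary /usr/bin/containerd -runtime-root /var/run/docker/runtime-runc'

printf '%s\n' "$cmdline" | shim_flag namespace
printf '%s\n' "$cmdline" | shim_container_id
```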
Docker Anatomy
I already have a container running the nginx image. This can be verified by running docker ps.
$ docker ps --no-trunc
CONTAINER ID IMAGE COMMAND PORTS
79415509360edd0ddbb33ce05887a0b5633fed7da58d32c83536af8c04f72f28 nginx "/docker-entrypoint.sh nginx -g 'daemon off;'" 0.0.0.0:32768->80/tcp
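The PORTS column packs the whole mapping created by -P into one string. The following is a minimal sketch that splits it into its parts; parse_port_mapping is a hypothetical helper name, and the input is the value from the docker ps output above.

```shell
# parse_port_mapping: split "HOST_IP:HOST_PORT->CONTAINER_PORT/PROTO" into
# labelled fields. Prints nothing when the input does not have that shape.
parse_port_mapping() {
    printf '%s\n' "$1" |
        sed -n 's|^\(.*\):\([0-9]*\)->\([0-9]*\)/\([a-z]*\)$|host_ip=\1 host_port=\2 container_port=\3 proto=\4|p'
}

parse_port_mapping '0.0.0.0:32768->80/tcp'
```

Here host port 32768 on every host interface forwards to port 80 inside the container, which matches the arguments of the docker-proxy process seen earlier.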
When a container is created, Docker also creates a directory named after the container ID under /var/lib/docker/containers, or more generally ${DOCKER_ROOT}/containers.
$ ls -all /var/lib/docker/containers/79415509360edd0ddbb33ce05887a0b5633fed7da58d32c83536af8c04f72f28
total 44
drwx------ 4 root root 4096 jun 28 20:47 .
drwx------ 51 root root 4096 jun 28 20:47 ..
-rw-r----- 1 root root 1072 jun 28 20:47 79415509360edd0ddbb33ce05887a0b5633fed7da58d32c83536af8c04f72f28-json.log
drwx------ 2 root root 4096 jun 28 20:47 checkpoints
-rw------- 1 root root 2969 jun 28 20:47 config.v2.json
-rw-r--r-- 1 root root 1420 jun 28 20:47 hostconfig.json
-rw-r--r-- 1 root root 13 jun 28 20:47 hostname
-rw-r--r-- 1 root root 174 jun 28 20:47 hosts
drwx------ 3 root root 4096 jun 28 20:47 mounts
-rw-r--r-- 1 root root 645 jun 28 20:47 resolv.conf
-rw-r--r-- 1 root root 71 jun 28 20:47 resolv.conf.hash
A breakdown of these files is shown below:
- ${CONTAINER_ID}-json.log: This file records what the container writes to stdout and stderr, one JSON object per line (the json-file logging driver shown in docker info). Since I've only just started the container, all it has logged so far is /docker-entrypoint.sh looking for and running the scripts in /docker-entrypoint.d/, such as 10-listen-on-ipv6-by-default.sh. This can be verified by checking the contents of the log file:
{"log":"/docker-entrypoint.sh: /docker-entrypoint.d/ is not empty, will attempt to perform configuration\n","stream":"stdout"}
{"log":"/docker-entrypoint.sh: Looking for shell scripts in /docker-entrypoint.d/\n","stream":"stdout"}
{"log":"/docker-entrypoint.sh: Launching /docker-entrypoint.d/10-listen-on-ipv6-by-default.sh\n","stream":"stdout"}
{"log":"10-listen-on-ipv6-by-default.sh: Getting the checksum of /etc/nginx/conf.d/default.conf\n","stream":"stdout"}
{"log":"10-listen-on-ipv6-by-default.sh: Enabled listen on IPv6 in /etc/nginx/conf.d/default.conf\n","stream":"stdout"}
{"log":"/docker-entrypoint.sh: Launching /docker-entrypoint.d/20-envsubst-on-templates.sh\n","stream":"stdout"}
{"log":"/docker-entrypoint.sh: Configuration complete; ready for start up\n","stream":"stdout"}
Exec into the container to verify this:
$ docker exec -it 79415509360e /bin/bash
root@79415509360e:/# cat /docker-entrypoint.sh
#!/usr/bin/env sh
# vim:sw=4:ts=4:et
set -e
if [ "$1" = "nginx" -o "$1" = "nginx-debug" ]; then
    if /usr/bin/find "/docker-entrypoint.d/" -mindepth 1 -maxdepth 1 -type f -print -quit 2>/dev/null | read v; then
        echo "$0: /docker-entrypoint.d/ is not empty, will attempt to perform configuration"
        echo "$0: Looking for shell scripts in /docker-entrypoint.d/"
        find "/docker-entrypoint.d/" -follow -type f -print | sort -n | while read -r f; do
            case "$f" in
                *.sh)
                    if [ -x "$f" ]; then
                        echo "$0: Launching $f";
                        "$f"
                    else
                        # warn on shell scripts without exec bit
                        echo "$0: Ignoring $f, not executable";
                    fi
                    ;;
                *) echo "$0: Ignoring $f";;
            esac
        done
        echo "$0: Configuration complete; ready for start up"
    else
        echo "$0: No files found in /docker-entrypoint.d/, skipping configuration"
    fi
fi
exec "$@"
root@79415509360e:~# ls -all /docker-entrypoint.d/
total 16
drwxr-xr-x 1 root root 4096 .
drwxr-xr-x 1 root root 4096 ..
-rwxrwxr-x 1 root root 1963 10-listen-on-ipv6-by-default.sh
-rwxrwxr-x 1 root root 1043 20-envsubst-on-templates.sh
As you can see, at container startup /docker-entrypoint.sh runs every executable *.sh script it finds in /docker-entrypoint.d/. Since 10-listen-on-ipv6-by-default.sh is one of them, its output shows up in ${CONTAINER_ID}-json.log.
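Each record in the log file is a single JSON object, so the original output can be recovered line by line. The following is a deliberately simple sketch using sed; log_lines is a hypothetical helper name. It assumes "log" is the first key followed by "stream", as in the records above, and it only strips the trailing \n rather than decoding every JSON escape.

```shell
# log_lines: print "[stream] message" for each json-file log record on stdin.
log_lines() {
    sed -n 's/^{"log":"\(.*\)\\n","stream":"\([a-z]*\)".*$/[\2] \1/p'
}

# Record copied from the log file above.
record='{"log":"/docker-entrypoint.sh: Configuration complete; ready for start up\n","stream":"stdout"}'

printf '%s\n' "$record" | log_lines
```

For anything beyond a quick look, a real JSON parser such as jq (jq -r .log) is the more robust choice, and docker logs reads this same file for you.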
- config.v2.json: A JSON file holding the container's configuration; docker inspect ${CONTAINER_ID} gets most of its data from this file. When running docker run with parameters, those parameters are mapped to attributes in this file; for example, -p 80:80 creates the corresponding mapping in NetworkSettings.Ports. Broadly speaking, all the parameters and Dockerfile configuration are stored in this file.
root@earth:/var/lib/docker/containers/79415509360edd0ddbb33ce05887a0b5633fed7da58d32c83536af8c04f72f28# cat config.v2.json | jq
{
"StreamConfig": {},
"State": {
"Running": true,
"Paused": false,
"Restarting": false,
"OOMKilled": false,
"RemovalInProgress": false,
"Dead": false,
"Pid": 11756,
"ExitCode": 0,
"Error": "",
"StartedAt": "2020-06-29T00:47:27.064480116Z",
"FinishedAt": "0001-01-01T00:00:00Z",
"Health": null
},
"ID": "79415509360edd0ddbb33ce05887a0b5633fed7da58d32c83536af8c04f72f28",
"Created": "2020-06-29T00:47:26.622421221Z",
"Managed": false,
"Path": "/docker-entrypoint.sh",
"Args": [
"nginx",
"-g",
"daemon off;"
],
"Config": {
"Hostname": "79415509360e",
"Domainname": "",
"User": "",
"AttachStdin": false,
"AttachStdout": false,
"AttachStderr": false,
"ExposedPorts": {
"80/tcp": {}
},
"Tty": false,
"OpenStdin": false,
"StdinOnce": false,
"Env": [
"PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin",
"NGINX_VERSION=1.19.0",
"NJS_VERSION=0.4.1",
"PKG_RELEASE=1~buster"
],
"Cmd": [
"nginx",
"-g",
"daemon off;"
],
"ArgsEscaped": true,
"Image": "nginx",
"Volumes": null,
"WorkingDir": "",
"Entrypoint": [
"/docker-entrypoint.sh"
],
"OnBuild": null,
"Labels": {
"maintainer": "NGINX Docker Maintainers <docker-maint@nginx.com>"
},
"StopSignal": "SIGTERM"
},
"Image": "sha256:4392e5dad77dbaf6a573650b0fe1e282b57c5fba6e6cea00a27c7d4b68539b81",
"NetworkSettings": {
"Bridge": "",
"SandboxID": "603fbb30c2e591f9ccc4c21d7b7fc569abb377095d1bdcf7bb79ae7a5c5155c7",
"HairpinMode": false,
"LinkLocalIPv6Address": "",
"LinkLocalIPv6PrefixLen": 0,
"Networks": {
"bridge": {
"IPAMConfig": null,
"Links": null,
"Aliases": null,
"NetworkID": "a8b4c847b9a1ee58ef02a801e84288b536e44030eed13c67566ab0f56585ae49",
"EndpointID": "9430a8eba0a7bdf871c921fc12d7dab52047e37b49207b6dbe2d7370c7a914e1",
"Gateway": "172.17.0.1",
"IPAddress": "172.17.0.2",
"IPPrefixLen": 16,
"IPv6Gateway": "",
"GlobalIPv6Address": "",
"GlobalIPv6PrefixLen": 0,
"MacAddress": "02:42:ac:11:00:02",
"DriverOpts": null,
"IPAMOperational": false
}
},
"Service": null,
"Ports": {
"80/tcp": [
{
"HostIp": "0.0.0.0",
"HostPort": "32768"
}
]
},
"SandboxKey": "/var/run/docker/netns/603fbb30c2e5",
"SecondaryIPAddresses": null,
"SecondaryIPv6Addresses": null,
"IsAnonymousEndpoint": true,
"HasSwarmEndpoint": false
},
"LogPath": "/var/lib/docker/containers/79415509360edd0ddbb33ce05887a0b5633fed7da58d32c83536af8c04f72f28/79415509360edd0ddbb33ce05887a0b5633fed7da58d32c83536af8c04f72f28-json.log",
"Name": "/dazzling_goldberg",
"Driver": "overlay2",
"OS": "linux",
"MountLabel": "",
"ProcessLabel": "",
"RestartCount": 0,
"HasBeenStartedBefore": true,
"HasBeenManuallyStopped": false,
"MountPoints": {},
"SecretReferences": null,
"ConfigReferences": null,
"AppArmorProfile": "docker-default",
"HostnamePath": "/var/lib/docker/containers/79415509360edd0ddbb33ce05887a0b5633fed7da58d32c83536af8c04f72f28/hostname",
"HostsPath": "/var/lib/docker/containers/79415509360edd0ddbb33ce05887a0b5633fed7da58d32c83536af8c04f72f28/hosts",
"ShmPath": "/var/lib/docker/containers/79415509360edd0ddbb33ce05887a0b5633fed7da58d32c83536af8c04f72f28/mounts/shm",
"ResolvConfPath": "/var/lib/docker/containers/79415509360edd0ddbb33ce05887a0b5633fed7da58d32c83536af8c04f72f28/resolv.conf",
"SeccompProfile": "",
"NoNewPrivileges": false
}
- hostconfig.json: This file holds the host-side settings for the container: network mode, port bindings, DNS, restart policy and resource limits, among other things. If you start containers through docker-compose (a tool for defining and running multi-container Docker applications), this file would contain different settings. Since we are using docker to run a single simple container, the file looks like this.
root@earth:/var/lib/docker/containers/79415509360edd0ddbb33ce05887a0b5633fed7da58d32c83536af8c04f72f28# cat hostconfig.json | jq
{
"Binds": null,
"ContainerIDFile": "",
"LogConfig": {
"Type": "json-file",
"Config": {}
},
"NetworkMode": "default",
"PortBindings": {},
"RestartPolicy": {
"Name": "no",
"MaximumRetryCount": 0
},
"AutoRemove": false,
"VolumeDriver": "",
"VolumesFrom": null,
"CapAdd": null,
"CapDrop": null,
"Dns": [],
"DnsOptions": [],
"DnsSearch": [],
"ExtraHosts": null,
"GroupAdd": null,
"IpcMode": "shareable",
"Cgroup": "",
"Links": null,
"OomScoreAdj": 0,
"PidMode": "",
"Privileged": false,
"PublishAllPorts": true,
"ReadonlyRootfs": false,
"SecurityOpt": null,
"UTSMode": "",
"UsernsMode": "",
"ShmSize": 67108864,
"Runtime": "runc",
"ConsoleSize": [
0,
0
],
"Isolation": "",
"CpuShares": 0,
"Memory": 0,
"NanoCpus": 0,
"CgroupParent": "",
"BlkioWeight": 0,
"BlkioWeightDevice": [],
"BlkioDeviceReadBps": null,
"BlkioDeviceWriteBps": null,
"BlkioDeviceReadIOps": null,
"BlkioDeviceWriteIOps": null,
"CpuPeriod": 0,
"CpuQuota": 0,
"CpuRealtimePeriod": 0,
"CpuRealtimeRuntime": 0,
"CpusetCpus": "",
"CpusetMems": "",
"Devices": [],
"DeviceCgroupRules": null,
"DiskQuota": 0,
"KernelMemory": 0,
"MemoryReservation": 0,
"MemorySwap": 0,
"MemorySwappiness": null,
"OomKillDisable": false,
"PidsLimit": 0,
"Ulimits": null,
"CpuCount": 0,
"CpuPercent": 0,
"IOMaximumIOps": 0,
"IOMaximumBandwidth": 0,
"MaskedPaths": [
"/proc/asound",
"/proc/acpi",
"/proc/kcore",
"/proc/keys",
"/proc/latency_stats",
"/proc/timer_list",
"/proc/timer_stats",
"/proc/sched_debug",
"/proc/scsi",
"/sys/firmware"
],
"ReadonlyPaths": [
"/proc/bus",
"/proc/fs",
"/proc/irq",
"/proc/sys",
"/proc/sysrq-trigger"
]
}
- hostname: Contains the hostname for the container.
# Host machine
$ cat /var/lib/docker/containers/79415509360edd0ddbb33ce05887a0b5633fed7da58d32c83536af8c04f72f28/hostname
79415509360e
# On the container
root@79415509360e $ cat /etc/hostname
79415509360e
- resolv.conf: Contains the resolv.conf for the container, mounted at /etc/resolv.conf.
root@earth:/var/lib/docker/containers/79415509360edd0ddbb33ce05887a0b5633fed7da58d32c83536af8c04f72f28# cat /var/lib/docker/containers/79415509360edd0ddbb33ce05887a0b5633fed7da58d32c83536af8c04f72f28/resolv.conf
# This file is managed by man:systemd-resolved(8). Do not edit.
#
# This is a dynamic resolv.conf file for connecting local clients directly to
# all known uplink DNS servers. This file lists all configured search domains.
#
# Third party programs must not access this file directly, but only through the
# symlink at /etc/resolv.conf. To manage man:resolv.conf(5) in a different way,
# replace this symlink by a static file or a different symlink.
#
# See man:systemd-resolved.service(8) for details about the supported modes of
# operation for /etc/resolv.conf.
nameserver 192.168.1.7
nameserver 1.1.1.1
nameserver 8.8.8.8
search localdomain
root@79415509360e:~# cat /etc/resolv.conf
# This file is managed by man:systemd-resolved(8). Do not edit.
#
# This is a dynamic resolv.conf file for connecting local clients directly to
# all known uplink DNS servers. This file lists all configured search domains.
#
# Third party programs must not access this file directly, but only through the
# symlink at /etc/resolv.conf. To manage man:resolv.conf(5) in a different way,
# replace this symlink by a static file or a different symlink.
#
# See man:systemd-resolved.service(8) for details about the supported modes of
# operation for /etc/resolv.conf.
nameserver 192.168.1.7
nameserver 1.1.1.1
nameserver 8.8.8.8
search localdomain
- resolv.conf.hash: A file that holds the SHA-256 checksum of the previously mentioned resolv.conf file.
$ cat /var/lib/docker/containers/79415509360edd0ddbb33ce05887a0b5633fed7da58d32c83536af8c04f72f28/resolv.conf.hash
sha256:4b2f8601ebbea68e34b4218913bd52e6acb9e816a7024e6ff9e4ed3ecf71b878
$ shasum -a 256 resolv.conf
4b2f8601ebbea68e34b4218913bd52e6acb9e816a7024e6ff9e4ed3ecf71b878 resolv.conf
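The manual comparison above can be scripted. The following is a minimal sketch that checks a file against a stored "sha256:<hex>" string. sha256_of and hash_matches are hypothetical helper names, and the demo uses an empty stand-in file rather than the real resolv.conf.

```shell
# sha256_of: print the hex SHA-256 digest of a file, using whichever of
# sha256sum or shasum is installed.
sha256_of() {
    if command -v sha256sum >/dev/null 2>&1; then
        sha256sum "$1" | awk '{ print $1 }'
    else
        shasum -a 256 "$1" | awk '{ print $1 }'
    fi
}

# hash_matches: succeed when the file's digest equals the "sha256:<hex>"
# string stored in the hash file.
hash_matches() {
    [ "sha256:$(sha256_of "$1")" = "$(cat "$2")" ]
}

# Demo with an empty stand-in file and the well-known digest of empty input.
demo=$(mktemp -d)
: > "$demo/resolv.conf"
printf 'sha256:e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855' > "$demo/resolv.conf.hash"

if hash_matches "$demo/resolv.conf" "$demo/resolv.conf.hash"; then
    echo "resolv.conf matches its stored hash"
else
    echo "resolv.conf was modified"
fi
rm -rf "$demo"
```

On the real files, run it as root from the container's directory: hash_matches resolv.conf resolv.conf.hash.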
Docker Filesystem and Files
We know that some files from /var/lib/docker/containers/${CONTAINER_ID} are mounted automatically as part of the startup process, most notably hostname and resolv.conf, but what about the rest of the filesystem? For that we need to look at docker inspect.
$ docker inspect 79415509360e
[
{
"Id": "79415509360edd0ddbb33ce05887a0b5633fed7da58d32c83536af8c04f72f28",
"Created": "2020-06-29T00:47:26.622421221Z",
"Path": "/docker-entrypoint.sh",
"Args": [
"nginx",
"-g",
"daemon off;"
],
"State": {
"Status": "running",
"Running": true,
"Paused": false,
"Restarting": false,
"OOMKilled": false,
"Dead": false,
"Pid": 11756,
"ExitCode": 0,
"Error": "",
"StartedAt": "2020-06-29T00:47:27.064480116Z",
"FinishedAt": "0001-01-01T00:00:00Z"
},
"Image": "sha256:4392e5dad77dbaf6a573650b0fe1e282b57c5fba6e6cea00a27c7d4b68539b81",
"ResolvConfPath": "/var/lib/docker/containers/79415509360edd0ddbb33ce05887a0b5633fed7da58d32c83536af8c04f72f28/resolv.conf",
"HostnamePath": "/var/lib/docker/containers/79415509360edd0ddbb33ce05887a0b5633fed7da58d32c83536af8c04f72f28/hostname",
"HostsPath": "/var/lib/docker/containers/79415509360edd0ddbb33ce05887a0b5633fed7da58d32c83536af8c04f72f28/hosts",
"LogPath": "/var/lib/docker/containers/79415509360edd0ddbb33ce05887a0b5633fed7da58d32c83536af8c04f72f28/79415509360edd0ddbb33ce05887a0b5633fed7da58d32c83536af8c04f72f28-json.log",
"Name": "/dazzling_goldberg",
"RestartCount": 0,
"Driver": "overlay2",
"Platform": "linux",
"MountLabel": "",
"ProcessLabel": "",
"AppArmorProfile": "docker-default",
"ExecIDs": [
"2aabd28bfd6a8b19e9190809a33ecfaa90473980f63ce417903cc6679513af22"
],
"HostConfig": {
"Binds": null,
"ContainerIDFile": "",
"LogConfig": {
"Type": "json-file",
"Config": {}
},
"NetworkMode": "default",
"PortBindings": {},
"RestartPolicy": {
"Name": "no",
"MaximumRetryCount": 0
},
"AutoRemove": false,
"VolumeDriver": "",
"VolumesFrom": null,
"CapAdd": null,
"CapDrop": null,
"Dns": [],
"DnsOptions": [],
"DnsSearch": [],
"ExtraHosts": null,
"GroupAdd": null,
"IpcMode": "shareable",
"Cgroup": "",
"Links": null,
"OomScoreAdj": 0,
"PidMode": "",
"Privileged": false,
"PublishAllPorts": true,
"ReadonlyRootfs": false,
"SecurityOpt": null,
"UTSMode": "",
"UsernsMode": "",
"ShmSize": 67108864,
"Runtime": "runc",
"ConsoleSize": [
0,
0
],
"Isolation": "",
"CpuShares": 0,
"Memory": 0,
"NanoCpus": 0,
"CgroupParent": "",
"BlkioWeight": 0,
"BlkioWeightDevice": [],
"BlkioDeviceReadBps": null,
"BlkioDeviceWriteBps": null,
"BlkioDeviceReadIOps": null,
"BlkioDeviceWriteIOps": null,
"CpuPeriod": 0,
"CpuQuota": 0,
"CpuRealtimePeriod": 0,
"CpuRealtimeRuntime": 0,
"CpusetCpus": "",
"CpusetMems": "",
"Devices": [],
"DeviceCgroupRules": null,
"DiskQuota": 0,
"KernelMemory": 0,
"MemoryReservation": 0,
"MemorySwap": 0,
"MemorySwappiness": null,
"OomKillDisable": false,
"PidsLimit": 0,
"Ulimits": null,
"CpuCount": 0,
"CpuPercent": 0,
"IOMaximumIOps": 0,
"IOMaximumBandwidth": 0,
"MaskedPaths": [
"/proc/asound",
"/proc/acpi",
"/proc/kcore",
"/proc/keys",
"/proc/latency_stats",
"/proc/timer_list",
"/proc/timer_stats",
"/proc/sched_debug",
"/proc/scsi",
"/sys/firmware"
],
"ReadonlyPaths": [
"/proc/bus",
"/proc/fs",
"/proc/irq",
"/proc/sys",
"/proc/sysrq-trigger"
]
},
"GraphDriver": {
"Data": {
"LowerDir": "/var/lib/docker/overlay2/a125ed1c07acbc7873e2bcb2e05b0536dd8e2d86b24bf35116a3435fe6659bf2-init/diff:/var/lib/docker/overlay2/df0cfc3535d26b05d11e44e4398f99bf39a920615c5810f1690faaae120826e9/diff:/var/lib/docker/overlay2/bc9429ad6ede34e6704739f342a5c4f2400e02744d5f85e11084866ed636069c/diff:/var/lib/docker/overlay2/092a847640b36f15c2e9103767cb06d8d04e7be675dbef4d0f3ee81d019b71d4/diff:/var/lib/docker/overlay2/bbf65ed8358dec49eafff808a9e61ebdb5edb8a592238e81045a4f79daeb1cd6/diff:/var/lib/docker/overlay2/fe33dfe667d8a4d39e6fcdccdfa1fe3681d84140f4fd9b6dbe0e8195446dee69/diff",
"MergedDir": "/var/lib/docker/overlay2/a125ed1c07acbc7873e2bcb2e05b0536dd8e2d86b24bf35116a3435fe6659bf2/merged",
"UpperDir": "/var/lib/docker/overlay2/a125ed1c07acbc7873e2bcb2e05b0536dd8e2d86b24bf35116a3435fe6659bf2/diff",
"WorkDir": "/var/lib/docker/overlay2/a125ed1c07acbc7873e2bcb2e05b0536dd8e2d86b24bf35116a3435fe6659bf2/work"
},
"Name": "overlay2"
},
"Mounts": [],
"Config": {
"Hostname": "79415509360e",
"Domainname": "",
"User": "",
"AttachStdin": false,
"AttachStdout": false,
"AttachStderr": false,
"ExposedPorts": {
"80/tcp": {}
},
"Tty": false,
"OpenStdin": false,
"StdinOnce": false,
"Env": [
"PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin",
"NGINX_VERSION=1.19.0",
"NJS_VERSION=0.4.1",
"PKG_RELEASE=1~buster"
],
"Cmd": [
"nginx",
"-g",
"daemon off;"
],
"ArgsEscaped": true,
"Image": "nginx",
"Volumes": null,
"WorkingDir": "",
"Entrypoint": [
"/docker-entrypoint.sh"
],
"OnBuild": null,
"Labels": {
"maintainer": "NGINX Docker Maintainers <docker-maint@nginx.com>"
},
"StopSignal": "SIGTERM"
},
"NetworkSettings": {
"Bridge": "",
"SandboxID": "603fbb30c2e591f9ccc4c21d7b7fc569abb377095d1bdcf7bb79ae7a5c5155c7",
"HairpinMode": false,
"LinkLocalIPv6Address": "",
"LinkLocalIPv6PrefixLen": 0,
"Ports": {
"80/tcp": [
{
"HostIp": "0.0.0.0",
"HostPort": "32768"
}
]
},
"SandboxKey": "/var/run/docker/netns/603fbb30c2e5",
"SecondaryIPAddresses": null,
"SecondaryIPv6Addresses": null,
"EndpointID": "9430a8eba0a7bdf871c921fc12d7dab52047e37b49207b6dbe2d7370c7a914e1",
"Gateway": "172.17.0.1",
"GlobalIPv6Address": "",
"GlobalIPv6PrefixLen": 0,
"IPAddress": "172.17.0.2",
"IPPrefixLen": 16,
"IPv6Gateway": "",
"MacAddress": "02:42:ac:11:00:02",
"Networks": {
"bridge": {
"IPAMConfig": null,
"Links": null,
"Aliases": null,
"NetworkID": "a8b4c847b9a1ee58ef02a801e84288b536e44030eed13c67566ab0f56585ae49",
"EndpointID": "9430a8eba0a7bdf871c921fc12d7dab52047e37b49207b6dbe2d7370c7a914e1",
"Gateway": "172.17.0.1",
"IPAddress": "172.17.0.2",
"IPPrefixLen": 16,
"IPv6Gateway": "",
"GlobalIPv6Address": "",
"GlobalIPv6PrefixLen": 0,
"MacAddress": "02:42:ac:11:00:02",
"DriverOpts": null
}
}
}
}
]
We can see that the files mentioned previously are referenced here by their paths on the host, which Docker mounts into the container.
"ResolvConfPath": "/var/lib/docker/containers/79415509360edd0ddbb33ce05887a0b5633fed7da58d32c83536af8c04f72f28/resolv.conf",
"HostnamePath": "/var/lib/docker/containers/79415509360edd0ddbb33ce05887a0b5633fed7da58d32c83536af8c04f72f28/hostname",
"HostsPath": "/var/lib/docker/containers/79415509360edd0ddbb33ce05887a0b5633fed7da58d32c83536af8c04f72f28/hosts",
"LogPath": "/var/lib/docker/containers/79415509360edd0ddbb33ce05887a0b5633fed7da58d32c83536af8c04f72f28/79415509360edd0ddbb33ce05887a0b5633fed7da58d32c83536af8c04f72f28-json.log",
The storage driver is also shown here.
"Driver": "overlay2",
There are also references to some /proc/ directories that are either masked or marked as read-only; these attributes also appear in hostconfig.json.
"MaskedPaths": [
"/proc/asound",
"/proc/acpi",
"/proc/kcore",
"/proc/keys",
"/proc/latency_stats",
"/proc/timer_list",
"/proc/timer_stats",
"/proc/sched_debug",
"/proc/scsi",
"/sys/firmware"
],
"ReadonlyPaths": [
"/proc/bus",
"/proc/fs",
"/proc/irq",
"/proc/sys",
"/proc/sysrq-trigger"
]
Finally, the GraphDriver attribute describes the overlay filesystem used by the container.
"GraphDriver": {
"Data": {
"LowerDir": "/var/lib/docker/overlay2/a125ed1c07acbc7873e2bcb2e05b0536dd8e2d86b24bf35116a3435fe6659bf2-init/diff:/var/lib/docker/overlay2/df0cfc3535d26b05d11e44e4398f99bf39a920615c5810f1690faaae120826e9/diff:/var/lib/docker/overlay2/bc9429ad6ede34e6704739f342a5c4f2400e02744d5f85e11084866ed636069c/diff:/var/lib/docker/overlay2/092a847640b36f15c2e9103767cb06d8d04e7be675dbef4d0f3ee81d019b71d4/diff:/var/lib/docker/overlay2/bbf65ed8358dec49eafff808a9e61ebdb5edb8a592238e81045a4f79daeb1cd6/diff:/var/lib/docker/overlay2/fe33dfe667d8a4d39e6fcdccdfa1fe3681d84140f4fd9b6dbe0e8195446dee69/diff",
"MergedDir": "/var/lib/docker/overlay2/a125ed1c07acbc7873e2bcb2e05b0536dd8e2d86b24bf35116a3435fe6659bf2/merged",
"UpperDir": "/var/lib/docker/overlay2/a125ed1c07acbc7873e2bcb2e05b0536dd8e2d86b24bf35116a3435fe6659bf2/diff",
"WorkDir": "/var/lib/docker/overlay2/a125ed1c07acbc7873e2bcb2e05b0536dd8e2d86b24bf35116a3435fe6659bf2/work"
},
"Name": "overlay2"
},
This is interesting, since the ID in the overlay2 directory is not the same as the container ID that I've been working with. I'll call it the overlay ID, with value a125ed1c07acbc7873e2bcb2e05b0536dd8e2d86b24bf35116a3435fe6659bf2. The contents of this directory are:
root@earth:~# ls -al /var/lib/docker/overlay2/a125ed1c07acbc7873e2bcb2e05b0536dd8e2d86b24bf35116a3435fe6659bf2
total 56
drwx------ 5 root root 4096 .
drwx------ 215 root root 28672 ..
drwxr-xr-x 6 root root 4096 diff
-rw-r--r-- 1 root root 26 link
-rw-r--r-- 1 root root 173 lower
drwxr-xr-x 1 root root 4096 merged
drwx------ 3 root root 4096 work
Running tree on the directory shows the complete filesystem for the container.
$ tree /var/lib/docker/overlay2/a125ed1c07acbc7873e2bcb2e05b0536dd8e2d86b24bf35116a3435fe6659bf2 -d -L 2
/var/lib/docker/overlay2/a125ed1c07acbc7873e2bcb2e05b0536dd8e2d86b24bf35116a3435fe6659bf2
├── diff
│   ├── etc
│   ├── root
│   ├── run
│   └── var
├── merged
│   ├── bin
│   ├── boot
│   ├── dev
│   ├── docker-entrypoint.d
│   ├── etc
│   ├── home
│   ├── lib
│   ├── lib64
│   ├── media
│   ├── mnt
│   ├── opt
│   ├── proc
│   ├── root
│   ├── run
│   ├── sbin
│   ├── srv
│   ├── sys
│   ├── tmp
│   ├── usr
│   └── var
└── work
    └── work
The work directory holds no container data; it is only used internally by OverlayFS, so it's irrelevant for now. The lower file describes the image layers stacked beneath this one; checking the file yields:
$ cat lower
l/7WPZKXJV2SBCQBE57LVAP4OUIE:l/ON3TWDAENYU5UIS7SW5QWGW6F3:l/W5ZIQ33SFPQXS4WV3LHDGAIEVC:l/QJMVZIUNLOO3E6QS5EX4L4ZTAS:l/6FB65JHRJYH2FMWHZZ2LF2QMUT:l/O46JWE6HPO5PKFWKVEDKCDA66X
This is a colon-separated list of values. Formatted one per line:
l/7WPZKXJV2SBCQBE57LVAP4OUIE
l/ON3TWDAENYU5UIS7SW5QWGW6F3
l/W5ZIQ33SFPQXS4WV3LHDGAIEVC
l/QJMVZIUNLOO3E6QS5EX4L4ZTAS
l/6FB65JHRJYH2FMWHZZ2LF2QMUT
l/O46JWE6HPO5PKFWKVEDKCDA66X
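The same formatting can be done by the shell, together with resolving where each short link points. The following is a minimal sketch. resolve_layers is a hypothetical helper name, and the demo builds a tiny stand-in for /var/lib/docker/overlay2 (with made-up link names) so that it runs without Docker or root.

```shell
# resolve_layers: for each short link listed in <layer-dir>/lower, print the
# link and the target it points to. Entries in "lower" are relative to the
# parent overlay2 directory.
resolve_layers() {
    layer_dir="$1"
    overlay_root=$(dirname "$layer_dir")
    # The file has no trailing newline, so also handle a final unterminated line.
    tr ':' '\n' < "$layer_dir/lower" | while read -r link || [ -n "$link" ]; do
        printf '%s -> %s\n' "$link" "$(readlink "$overlay_root/$link")"
    done
}

# Stand-in directory tree: two image layers and one container layer.
demo=$(mktemp -d)
mkdir -p "$demo/l" "$demo/base/diff" "$demo/app/diff" "$demo/container"
ln -s ../base/diff "$demo/l/BASELINK"
ln -s ../app/diff "$demo/l/APPLINK"
printf 'l/APPLINK:l/BASELINK' > "$demo/container/lower"

resolve_layers "$demo/container"
rm -rf "$demo"
```

On the real system the argument would be the overlay ID directory under /var/lib/docker/overlay2, read as root.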
Listing the contents of each of these l directories:
$ ls ../l/7WPZKXJV2SBCQBE57LVAP4OUIE
dev etc
$ ls ../l/ON3TWDAENYU5UIS7SW5QWGW6F3
docker-entrypoint.d
$ ls ../l/W5ZIQ33SFPQXS4WV3LHDGAIEVC
docker-entrypoint.d
$ ls ../l/QJMVZIUNLOO3E6QS5EX4L4ZTAS
docker-entrypoint.sh
$ ls ../l/6FB65JHRJYH2FMWHZZ2LF2QMUT
docker-entrypoint.d etc lib tmp usr var
$ ls ../l/O46JWE6HPO5PKFWKVEDKCDA66X
bin boot dev etc home lib lib64 media mnt opt proc root run sbin srv sys tmp usr var
These seem to be the files that were modified by the different layers as the image was being built by a docker build command. The base image (the last entry) holds all the root filesystem directories, and each layer above it modifies certain files and directories.
Moving on, the diff and merged directories are interesting. Let's test them by creating a file in the container and seeing whether it shows up here.
root@79415509360e:~# echo "HELLLOOO FROM THE CONTAINER" > helloworld.txt
root@79415509360e:~# cat helloworld.txt
Then list the contents of the directory on the host machine.
root@earth:/var/lib/docker/overlay2/a125ed1c07acbc7873e2bcb2e05b0536dd8e2d86b24bf35116a3435fe6659bf2# tree diff -L 4
diff
├── etc
│   └── nginx
│       └── conf.d
│           └── default.conf
├── root
│   └── helloworld.txt
├── run
│   └── nginx.pid
└── var
    └── cache
        └── nginx
            ├── client_temp
            ├── fastcgi_temp
            ├── proxy_temp
            ├── scgi_temp
            └── uwsgi_temp
$ cat diff/root/helloworld.txt
HELLLOOO FROM THE CONTAINER
$ cat merged/root/helloworld.txt
HELLLOOO FROM THE CONTAINER
So this is it! The diff directory holds the container's data. But so does the merged directory. To understand why, let's take a quick look at the documentation [3]:
OverlayFS layers two directories on a single Linux host and presents them as a single directory. These directories are called layers and the unification process is referred to as a union mount.
OverlayFS refers to the lower directory as lowerdir and the upper directory a upperdir. The unified view is exposed through its own directory called merged.
So the merged directory is the unified filesystem that the container sees. The link file holds this layer's shortened identifier, which is also the name of a symbolic link in the l directory that points back at the layer. This can be verified by running:
$ ls -all ../l/ | grep 7DE4KVX7BSNFR2KAQ7BJ3SDU73
lrwxrwxrwx 1 root root 72 jun 28 20:47 7DE4KVX7BSNFR2KAQ7BJ3SDU73 -> ../a125ed1c07acbc7873e2bcb2e05b0536dd8e2d86b24bf35116a3435fe6659bf2/diff
So it is basically a symbolic link to the layer's diff directory. The short names exist to keep the arguments passed to the mount command from growing too long when the filesystem is mounted.
The diff directory contains only what this layer changed relative to the layers beneath it.
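To make the precedence between layers concrete, here is a minimal sketch that emulates how a union mount resolves a path: the upper directory is checked first, then each lower directory from left to right. It is only an illustration of the lookup order. The real OverlayFS does this inside the kernel and also handles deleted files (whiteouts) and merged directory listings, which this sketch ignores. The function name overlay_lookup and the throwaway directories are hypothetical.

```shell
# overlay_lookup PATH UPPER LOWER...
# Prints the first location that contains PATH, checking the upper
# directory first and then the lower directories in the order given.
overlay_lookup() {
  path=$1
  shift
  for layer in "$@"; do
    if [ -e "$layer/$path" ]; then
      printf '%s\n' "$layer/$path"
      return 0
    fi
  done
  return 1   # PATH does not exist in any layer
}

# Demo with temporary directories standing in for real layers.
demo=$(mktemp -d)
mkdir -p "$demo/upper/root" "$demo/lower/etc" "$demo/lower/root"
echo "from the container" > "$demo/upper/root/helloworld.txt"
echo "from the image" > "$demo/lower/etc/hostname"

# Written in the container, so it is found in the upper directory.
overlay_lookup root/helloworld.txt "$demo/upper" "$demo/lower"
# Untouched by the container, so the lookup falls through to the lower one.
overlay_lookup etc/hostname "$demo/upper" "$demo/lower"
```

This mirrors what was observed with helloworld.txt: the file lives in diff (the upper directory) and is visible through merged, while files the container never touched come from the image layers.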
Checking the actual running system we can see that all of this makes sense.
$ df -Tih
Filesystem Type Inodes IUsed IFree IUse% Mounted on
udev devtmpfs 2,0M 591 2,0M 1% /dev
tmpfs tmpfs 2,0M 1,1K 2,0M 1% /run
/dev/nvme0n1p2 ext4 30M 1,4M 29M 5% /
tmpfs tmpfs 2,0M 461 2,0M 1% /dev/shm
tmpfs tmpfs 2,0M 6 2,0M 1% /run/lock
tmpfs tmpfs 2,0M 18 2,0M 1% /sys/fs/cgroup
/dev/nvme0n1p1 vfat 0 0 0 - /boot/efi
tmpfs tmpfs 2,0M 33 2,0M 1% /run/user/1000
overlay overlay 30M 1,4M 29M 5% /var/lib/docker/overlay2/a125ed1c07acbc7873e2bcb2e05b0536dd8e2d86b24bf35116a3435fe6659bf2/merged
shm tmpfs 2,0M 1 2,0M 1% /var/lib/docker/containers/79415509360edd0ddbb33ce05887a0b5633fed7da58d32c83536af8c04f72f28/mounts/shm
Viewing the overlay filesystem, we can see that the merged directory gets mounted, as well as the shm directory. A quick thing to note is that the -i option of the df command reports inodes instead of blocks. The overlay mount and the host's root filesystem report the same inode figures (29845461 free, shown as 29M), because the overlay is backed by the host's ext4 filesystem rather than having inodes of its own.
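df shows that an overlay is mounted but not which directories it combines. Those are visible in the mount options, so a small helper can pull them out of /proc/mounts-style lines. This is a sketch under assumptions: the function name overlay_opt is my own, it expects the usual field order (device, mount point, type, options), and it would misparse paths that contain a comma.

```shell
# overlay_opt NAME
# Reads /proc/mounts-style lines on stdin and prints the value of mount
# option NAME (for example lowerdir, upperdir or workdir) for every
# mount whose filesystem type is overlay.
overlay_opt() {
  awk -v name="$1" '$3 == "overlay" {
    n = split($4, opts, ",")
    for (i = 1; i <= n; i++) {
      # Match options of the form NAME=value and print only the value.
      if (index(opts[i], name "=") == 1) {
        print substr(opts[i], length(name) + 2)
      }
    }
  }'
}

# Usage on the host:
#   overlay_opt upperdir < /proc/mounts
#   overlay_opt lowerdir < /proc/mounts | tr ':' '\n'
```

On a host like the one in this post, the upperdir value should be the layer's diff directory, and the lowerdir value should be a colon-separated list of the l/ links seen earlier.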
Comparing against the filesystem as seen from inside the container:
root@79415509360e:~# df -Tih
Filesystem Type Inodes IUsed IFree IUse% Mounted on
overlay overlay 30M 1.4M 29M 5% /
tmpfs tmpfs 2.0M 16 2.0M 1% /dev
tmpfs tmpfs 2.0M 17 2.0M 1% /sys/fs/cgroup
/dev/nvme0n1p2 ext4 30M 1.4M 29M 5% /etc/hosts
shm tmpfs 2.0M 1 2.0M 1% /dev/shm
tmpfs tmpfs 2.0M 1 2.0M 1% /proc/asound
tmpfs tmpfs 2.0M 1 2.0M 1% /proc/acpi
tmpfs tmpfs 2.0M 1 2.0M 1% /proc/scsi
tmpfs tmpfs 2.0M 1 2.0M 1% /sys/firmware
So to summarise, the host's root (/) is an ext4 filesystem and the container's root (/) is an overlay filesystem. The references to overlay2 are not a filesystem type; overlay2 is the storage driver Docker uses to mount the filesystem. We also see that /proc/asound, /proc/acpi, /proc/scsi and /sys/firmware are mounted as tmpfs in memory. These are the same paths that were listed under MaskedPaths in the output of the docker inspect command.
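The comparison between the tmpfs mounts and MaskedPaths can be done with a short filter rather than by eye. This is a sketch: the function name tmpfs_mounts is my own, it assumes the column layout of df -T shown above, and the docker inspect line in the comments assumes your Docker version reports the list under HostConfig.MaskedPaths.

```shell
# tmpfs_mounts
# Reads df -T style output on stdin, skips the header row, and prints
# the mount point (last column) of every row whose Type column is tmpfs.
tmpfs_mounts() {
  awk 'NR > 1 && $2 == "tmpfs" { print $NF }'
}

# Usage inside the container:
#   df -Tih | tmpfs_mounts
# Usage on the host, to print the masked paths for comparison:
#   docker inspect --format '{{json .HostConfig.MaskedPaths}}' 79415509360e
```

The filter also prints ordinary tmpfs mounts such as /dev and /dev/shm, so only the entries that appear in both lists are masked paths.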
I hope this post gave you a clear picture of Docker internals. Please leave a comment if you have any questions.
References
1: http://alexander.holbreich.org/docker-components-explained/ “Docker Components”
2: https://github.com/opencontainers/runtime-spec “OCI-Runtime-Spec”
3: https://docs.docker.com/storage/storagedriver/overlayfs-driver/ “Docker Overlay2”
4: https://windsock.io/the-docker-proxy/ “Docker Proxy”