Docker In-Depth

Fig 1: Docker Components for Running Containers

At Reputation, we work extensively with Kubernetes clusters. Working on these clusters and debugging them helped me gain a deep understanding of how containers work at a low level. This blog post describes my findings.

Setup

  • Ubuntu 18.04
  • Docker version 18.09.7, build 2d0083d
  • containerd 1.2.6-0ubuntu1~18.04.2
  • runc version spec: 1.0.1-dev
  • kernel version: 5.3.0-61-generic

Internals

There is no such thing as a container in the kernel. Docker leverages kernel features, chiefly namespaces, through several cooperating programs to give the illusion of a container construct. Each namespace isolates a different dimension of the container and can be shared with other containers.

Namespaces

Five namespaces are involved in the container creation process:

  • pid: Isolation of processes, with a separate PID numbering per container
  • net: Isolation of network interfaces (for example eth0)
  • ipc: Isolation of inter-process communication
  • mnt: Isolation of filesystem mount points
  • uts: Isolation of the hostname and domain name (UTS: Unix Timesharing System)
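
On Linux, the namespaces a process belongs to are exposed as symlinks under /proc/PID/ns, which gives a simple way to observe the isolation described above. The following is a minimal sketch; the function name list_namespaces is hypothetical and it assumes a Linux host with /proc mounted.

```shell
#!/bin/sh
# Print each namespace a process belongs to, one per line.
# Usage: list_namespaces [PID]   (defaults to the current shell)
list_namespaces() {
    pid="${1:-$$}"
    for ns in /proc/"$pid"/ns/*; do
        # Skip the unexpanded glob when /proc/PID/ns does not exist.
        [ -e "$ns" ] || [ -L "$ns" ] || continue
        # Each entry is a symlink whose target looks like "pid:[4026531836]".
        printf '%s -> %s\n' "${ns##*/}" "$(readlink "$ns")"
    done
}

list_namespaces
```

Run as root with the PID of a containerized process as the argument, the inode numbers in brackets can be compared with those of a host shell: a different number means the process lives in a different namespace of that type.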

Control Groups

Control groups, or cgroups for short, are a kernel mechanism that limits and accounts for the hardware resources (CPU, memory, block I/O and so on) available to a group of processes, such as the processes running inside a container.
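
The cgroups a process belongs to can be read from /proc/PID/cgroup, where each line has the form hierarchy-ID:controllers:path. The sketch below only reformats such lines; the function name cgroup_paths is hypothetical, and the sample path /docker/79415509360e is an assumption made for illustration rather than output captured from my system.

```shell
#!/bin/sh
# Read /proc/PID/cgroup-style lines ("hierarchy-ID:controllers:path") on stdin
# and print "controllers<TAB>path" for each of them.
cgroup_paths() {
    while IFS=: read -r _id controllers path; do
        # cgroup v2 lines have an empty controller field; label them "unified".
        printf '%s\t%s\n' "${controllers:-unified}" "$path"
    done
}

# Illustrative sample line (assumed, not captured output).
printf '%s\n' '10:memory:/docker/79415509360e' | cgroup_paths

# Against a real process (needs read access to its /proc entry):
#   cgroup_paths < /proc/11756/cgroup
```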

Filesystem

A container doesn’t use the host machine’s root filesystem directly. It uses a union filesystem, which works by stacking layers on top of each other. Other storage drivers include:

  • AUFS
  • btrfs
  • vfs
  • Device Mapper

Container Formats

Together, the three features above (namespaces, cgroups and the union filesystem) define what is called a container format. For Docker this container format is called libcontainer; other container runtimes have their own, for example Rocket, a competitor of Docker, uses one called App Container.

Architecture

Docker itself is essentially a daemon that runs in the background, and it operates on a client-server model. The familiar docker command is the client, and the daemon running on the host where containers are deployed is the “server”. Docker is composed of pluggable components that handle different aspects of running a container.

  • dockerd: The Docker daemon. It listens for Docker API calls and uses containerd to create new containers. It binds to /var/run/docker.sock and creates the PID file /var/run/docker.pid, which is why you need to be root or a member of the docker group to run docker commands. The daemon can listen on 3 different types of sockets (tcp, fd, unix); the default on my system is fd, as shown by the ps command:
$ ps $(pidof dockerd)
PID TTY STAT TIME COMMAND
6811 ? Ssl 0:14 /usr/bin/dockerd -H fd:// --containerd=/run/containerd/containerd.sock
  • containerd: Sets up network interfaces and the filesystem / storage, basically preparing an environment for the actual container runtime, in this case runc [1]. It also implements the OCI runtime specifications [1].
  • runc: The runtime that actually starts the container process. It is launched through containerd-shim, which itself runs as a child process of containerd.

A high-level overview of these components is shown in Fig 1.

Docker Deployment Analysis

Running docker info prints the configuration options with which Docker was deployed to stdout. On my system it looks like this:
$ docker info
Containers: 48
Running: 1
Paused: 0
Stopped: 47
Images: 110
Server Version: 18.09.7
Storage Driver: overlay2
Backing Filesystem: extfs
Supports d_type: true
Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: bridge host macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version:
runc version: N/A
init version: v0.18.0 (expected: fec3683b971d9c3ef73f284f176672c44b448662)
Security Options:
apparmor
seccomp
Profile: default
Kernel Version: 5.3.0-61-generic
Operating System: Ubuntu 18.04.3 LTS
OSType: linux
Architecture: x86_64
CPUs: 8
Total Memory: 15.49GiB
Name: xxxx
ID: xxxx:xxxx:xxxx:xxxx:xxxx:xxxx:xxxx:xxxx:xxxx:
Docker Root Dir: /var/lib/docker
Username: xxxxxxxxx
Registry: https://index.docker.io/v1/
Labels:

From here we can see that the Docker root is located at /var/lib/docker. The contents of the directory are as follows:
$ sudo ls -all /var/lib/docker
total 84
drwx--x--x 14 root root 4096 .
drwxr-xr-x 90 root root 4096 ..
drwx------ 2 root root 4096 builder
drwx------ 4 root root 4096 buildkit
drwx------ 50 root root 4096 containers
drwx------ 3 root root 4096 image
drwxr-x--- 3 root root 4096 network
drwx------ 213 root root 28672 overlay2
drwx------ 4 root root 4096 plugins
drwx------ 2 root root 4096 runtimes
drwx------ 2 root root 4096 swarm
drwx------ 2 root root 4096 tmp
drwx------ 2 root root 4096 trust
drwx------ 3 root root 4096 volumes

This directory holds all the files needed by docker/containerd/runc to actually execute a container. Most notably, the containers directory holds the configuration of each container’s environment, while the overlay2 directory holds the filesystem directories that will be mounted as the container’s filesystem.
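
When scripting against a deployment it helps to pull single values, such as the root directory or the storage driver, out of the docker info text. A minimal sketch follows; the helper name info_field is hypothetical, and it assumes the plain "Label: value" layout shown in the output above.

```shell
#!/bin/sh
# Print the value of one "Label: value" line from docker info text on stdin.
# $1 is the label exactly as printed, for example "Docker Root Dir".
info_field() {
    sed -n "s/^[[:space:]]*$1:[[:space:]]*//p" | head -n 1
}

# Sample input taken from the output above.
printf '%s\n' 'Storage Driver: overlay2' 'Docker Root Dir: /var/lib/docker' |
    info_field 'Docker Root Dir'

# Against a live daemon:
#   docker info 2>/dev/null | info_field 'Storage Driver'
```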

Running a container

To analyze a container in depth, we start a sample nginx image with its exposed port published on a random host port:

$ docker run  -P -d  nginx 
79415509360edd0ddbb33ce05887a0b5633fed7da58d32c83536af8c04f72f28
  • -P: Publish all exposed ports on randomly selected host ports; you can set them statically with -p 80:80 or -p 443:443
  • -d: Detached mode; it runs the container in the background, detached from the terminal. You can still kill the container with docker kill ${ID}

What is returned here is the internal ID for the container that was just spawned.

This will be the basis for further analysis.
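
The full ID is 64 characters long, but the rest of this post mostly uses its first 12 characters (79415509360e), which is also what appears as the container's hostname. A small sketch of that abbreviation, with a hypothetical helper name short_id:

```shell
#!/bin/sh
# Abbreviate a full container ID to its first 12 characters.
short_id() {
    printf '%s\n' "$1" | cut -c 1-12
}

short_id 79415509360edd0ddbb33ce05887a0b5633fed7da58d32c83536af8c04f72f28
# prints 79415509360e
```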

Process Map

The initial process on Ubuntu is systemd, and all other processes descend from it, including containerd and dockerd, as illustrated below:

Fig 2: Docker Process Tree

Viewing the running process tree:

$ pstree 
systemd,1 splash 
  |-containerd,905 
  |   |-containerd-shim,11738 -namespace moby -workdir... 
  |   |   |-bash,25790 
  |   |   |-nginx,11756 
  |   |   |   `-nginx,11867 
  |   |   |-{containerd-shim},11739 
  |   |   |-{containerd-shim},11740 
  |   |   |-{containerd-shim},11741 
  |   |   |-{containerd-shim},11742 
  |   |   |-{containerd-shim},11743 
  |   |   |-{containerd-shim},11744 
  |   |   |-{containerd-shim},11798 
  |   |   |-{containerd-shim},11799 
  |   |   `-{containerd-shim},12181 
  |   |-{containerd},939 
  |   |-{containerd},940 
  |   |-{containerd},941 
  |   |-{containerd},942 
  |   |-{containerd},943 
  |   |-{containerd},980 
  |   |-{containerd},981 
  |   |-{containerd},982 
  |   |-{containerd},984 
  |   |-{containerd},1005 
  |   |-{containerd},1006 
  |   |-{containerd},1015 
  |   |-{containerd},1016 
  |   |-{containerd},1028 
  |   |-{containerd},6871 
  |   |-{containerd},6872 
  |   |-{containerd},9688 
  |   |-{containerd},9689 
  |   |-{containerd},15141 
  |   |-{containerd},4423 
  |   |-{containerd},6450 
  |   |-{containerd},12766 
  |   |-{containerd},8467 
  |   |-{containerd},13764 
  |   |-{containerd},2002 
  |   `-{containerd},7109 
  |-dockerd,6811 -H fd:// --containerd=/run/containerd/containerd.sock 
  |   |-docker-proxy,11728 -proto tcp -host-ip 0.0.0.0 -host-port 32768 -container-ip 172.17.0.2 -container-port 80 
  |   |   |-{docker-proxy},11730 
  |   |   |-{docker-proxy},11731 
  |   |   |-{docker-proxy},11732 
  |   |   |-{docker-proxy},11733 
  |   |   |-{docker-proxy},11734 
  |   |   |-{docker-proxy},11735 
  |   |   |-{docker-proxy},11736 
  |   |   `-{docker-proxy},11737 
  |   |-{dockerd},6824 
  |   |-{dockerd},6826 
  |   |-{dockerd},6827 
  |   |-{dockerd},6828 
  |   |-{dockerd},6830 
  |   |-{dockerd},6832 
  |   |-{dockerd},6833 
  |   |-{dockerd},6834 
  |   |-{dockerd},6835 
  |   |-{dockerd},6836 
  |   |-{dockerd},6850 
  |   |-{dockerd},6851 
  |   |-{dockerd},6852 
  |   |-{dockerd},6853 
  |   |-{dockerd},6856 
  |   |-{dockerd},6857 
  |   |-{dockerd},6859 
  |   |-{dockerd},6860 
  |   |-{dockerd},6861 
  |   |-{dockerd},6862 
  |   |-{dockerd},6863 
  |   |-{dockerd},6864 
  |   |-{dockerd},6865 
  |   |-{dockerd},6866 
  |   |-{dockerd},6867 
  |   |-{dockerd},6868 
  |   `-{dockerd},6869

From here we can see that the systemd process has a PID=1 and is the first process on the system, followed by containerd with PID=905 and dockerd with PID=6811.

  • dockerd is started with the host flag -H set to fd://, which means it receives its listening socket as a Linux file descriptor, and with the --containerd flag, which sets the gRPC address, here /run/containerd/containerd.sock. It also forks a docker-proxy process [4].
  • containerd is started with no flags, but when a container is run it uses a containerd-shim (here with PID=11738) that handles the container creation process. containerd-shim is started with the following arguments, as can be seen by grepping the ps output for containerd-shim:
$ ps -ef  | grep containerd-shim 
UID      PID     PPID  C  TTY      TIME     CMD 
root     11738   905   0  ?        00:00:01 containerd-shim -namespace moby -workdir /var/lib/containerd/io.containerd.runtime.v1.linux/moby/79415509360edd0ddbb33ce05887a0b5633fed7da58d32c83536af8c04f72f28 -address /run/containerd/containerd.sock -containerd-binary /usr/bin/containerd -runtime-root /var/run/docker/runtime-runc
  • -namespace: moby
  • -workdir: /var/lib/containerd/io.containerd.runtime.v1.linux/moby/7941550….
  • -address: /run/containerd/containerd.sock
  • -containerd-binary: /usr/bin/containerd
  • -runtime-root: /var/run/docker/runtime-runc

The namespace name (moby) appears as a directory under /var/lib/containerd/io.containerd.runtime.v1.linux, and it holds a folder named after the container ID from Docker.

/var/lib/containerd/io.containerd.runtime.v1.linux $ tree 
. 
└── moby 
    └── 79415509360edd0ddbb33ce05887a0b5633fed7da58d32c83536af8c04f72f28
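
Because the shim's -workdir ends with the container ID, a shim process can be mapped back to its container from the ps output alone. The sketch below assumes the argument layout shown above; the function name shim_container_id is hypothetical.

```shell
#!/bin/sh
# Extract the container ID from a containerd-shim command line.
# $1: the command line as printed by ps.
shim_container_id() {
    # Capture the value that follows the -workdir flag.
    workdir=$(printf '%s\n' "$1" | sed -n 's/.* -workdir \([^ ]*\).*/\1/p')
    # The last path component of the workdir is the container ID.
    [ -n "$workdir" ] && basename "$workdir"
}

shim_container_id 'containerd-shim -namespace moby -workdir /var/lib/containerd/io.containerd.runtime.v1.linux/moby/79415509360edd0ddbb33ce05887a0b5633fed7da58d32c83536af8c04f72f28 -address /run/containerd/containerd.sock'
```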

Docker Anatomy

I already have a container running the nginx image. This can be verified by running docker ps.

$ docker ps --no-trunc 
CONTAINER ID                                                     IMAGE   COMMAND                                         PORTS         
79415509360edd0ddbb33ce05887a0b5633fed7da58d32c83536af8c04f72f28 nginx  "/docker-entrypoint.sh nginx -g 'daemon off;'"   0.0.0.0:32768->80/tcp

When a container is created, Docker also creates a directory named after the container ID under /var/lib/docker/containers, or more generally ${DOCKER_ROOT}/containers:

$ ls -all /var/lib/docker/containers/79415509360edd0ddbb33ce05887a0b5633fed7da58d32c83536af8c04f72f28 
total 44 
drwx------  4 root root 4096 jun 28 20:47 . 
drwx------ 51 root root 4096 jun 28 20:47 .. 
-rw-r-----  1 root root 1072 jun 28 20:47 79415509360edd0ddbb33ce05887a0b5633fed7da58d32c83536af8c04f72f28-json.log 
drwx------  2 root root 4096 jun 28 20:47 checkpoints 
-rw-------  1 root root 2969 jun 28 20:47 config.v2.json 
-rw-r--r--  1 root root 1420 jun 28 20:47 hostconfig.json 
-rw-r--r--  1 root root   13 jun 28 20:47 hostname 
-rw-r--r--  1 root root  174 jun 28 20:47 hosts 
drwx------  3 root root 4096 jun 28 20:47 mounts 
-rw-r--r--  1 root root  645 jun 28 20:47 resolv.conf 
-rw-r--r--  1 root root   71 jun 28 20:47 resolv.conf.hash

A breakdown of these files is shown below:

  • ${CONTAINER_ID}-json.log: This file records what the container writes to its output streams. Since I’ve only just started the container, all it has done so far is run /docker-entrypoint.sh and the scripts it launches, such as 10-listen-on-ipv6-by-default.sh. This can be verified by checking the contents of the log file:
{"log":"/docker-entrypoint.sh: /docker-entrypoint.d/ is not empty, will attempt to perform configuration\n","stream":"stdout"} 
{"log":"/docker-entrypoint.sh: Looking for shell scripts in /docker-entrypoint.d/\n","stream":"stdout"} 
{"log":"/docker-entrypoint.sh: Launching /docker-entrypoint.d/10-listen-on-ipv6-by-default.sh\n","stream":"stdout"} 
{"log":"10-listen-on-ipv6-by-default.sh: Getting the checksum of /etc/nginx/conf.d/default.conf\n","stream":"stdout"} 
{"log":"10-listen-on-ipv6-by-default.sh: Enabled listen on IPv6 in /etc/nginx/conf.d/default.conf\n","stream":"stdout"} 
{"log":"/docker-entrypoint.sh: Launching /docker-entrypoint.d/20-envsubst-on-templates.sh\n","stream":"stdout"} 
{"log":"/docker-entrypoint.sh: Configuration complete; ready for start up\n","stream":"stdout"}

Exec into the container to verify this:

$ docker exec -it 79415509360e /bin/bash
root@79415509360e:/# cat /docker-entrypoint.sh 
#!/usr/bin/env sh 
# vim:sw=4:ts=4:et 
 
set -e 
 
if [ "$1" = "nginx" -o "$1" = "nginx-debug" ]; then 
    if /usr/bin/find "/docker-entrypoint.d/" -mindepth 1 -maxdepth 1 -type f -print -quit 2>/dev/null | read v; then 
        echo "$0: /docker-entrypoint.d/ is not empty, will attempt to perform configuration" 
 
        echo "$0: Looking for shell scripts in /docker-entrypoint.d/" 
        find "/docker-entrypoint.d/" -follow -type f -print | sort -n | while read -r f; do 
            case "$f" in 
                *.sh) 
                    if [ -x "$f" ]; then 
                        echo "$0: Launching $f"; 
                        "$f" 
                    else 
                        # warn on shell scripts without exec bit 
                        echo "$0: Ignoring $f, not executable"; 
                    fi 
                    ;; 
                *) echo "$0: Ignoring $f";; 
            esac 
        done 
 
        echo "$0: Configuration complete; ready for start up" 
    else 
        echo "$0: No files found in /docker-entrypoint.d/, skipping configuration" 
    fi 
fi 
 
exec "$@"
root@79415509360e:~# ls -all /docker-entrypoint.d/ 
total 16 
drwxr-xr-x 1 root root 4096  . 
drwxr-xr-x 1 root root 4096  .. 
-rwxrwxr-x 1 root root 1963  10-listen-on-ipv6-by-default.sh 
-rwxrwxr-x 1 root root 1043  20-envsubst-on-templates.sh

As you can see, at startup the container executes /docker-entrypoint.sh, which in turn runs every executable .sh script found in /docker-entrypoint.d/. That is why 10-listen-on-ipv6-by-default.sh shows up in ${CONTAINER_ID}-json.log.
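
Since each line of the log file is a standalone JSON object, the plain messages can be recovered with jq, which is used later in this post as well. This is a sketch under the assumption that every line carries the log and stream keys shown above; the function name container_stdout is hypothetical.

```shell
#!/bin/sh
# Print only the stdout messages from a json-file log.
# Reads the files given as arguments, or stdin when none are given.
container_stdout() {
    # Keep stdout entries and drop the trailing newline stored in each message.
    jq -r 'select(.stream == "stdout") | .log | rtrimstr("\n")' "$@"
}

# Usage (as root, on the host):
#   container_stdout /var/lib/docker/containers/${CONTAINER_ID}/${CONTAINER_ID}-json.log
```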

  • config.v2.json: A JSON file holding the container’s configuration; docker inspect ${CONTAINER_ID} gets most of its data from this file. When running docker run with parameters, those parameters are mapped to attributes of this file, for example -p 80:80 will create the corresponding mapping in NetworkSettings.Ports. Broadly speaking, all the parameters and Dockerfile configurations are stored in this file.
root@earth:/var/lib/docker/containers/79415509360edd0ddbb33ce05887a0b5633fed7da58d32c83536af8c04f72f28# cat config.v2.json  | jq 
{ 
  "StreamConfig": {}, 
  "State": { 
    "Running": true, 
    "Paused": false, 
    "Restarting": false, 
    "OOMKilled": false, 
    "RemovalInProgress": false, 
    "Dead": false, 
    "Pid": 11756, 
    "ExitCode": 0, 
    "Error": "", 
    "StartedAt": "2020-06-29T00:47:27.064480116Z", 
    "FinishedAt": "0001-01-01T00:00:00Z", 
    "Health": null 
  }, 
  "ID": "79415509360edd0ddbb33ce05887a0b5633fed7da58d32c83536af8c04f72f28", 
  "Created": "2020-06-29T00:47:26.622421221Z", 
  "Managed": false, 
  "Path": "/docker-entrypoint.sh", 
  "Args": [ 
    "nginx", 
    "-g", 
    "daemon off;" 
  ], 
  "Config": { 
    "Hostname": "79415509360e", 
    "Domainname": "", 
    "User": "", 
    "AttachStdin": false, 
    "AttachStdout": false, 
    "AttachStderr": false, 
    "ExposedPorts": { 
      "80/tcp": {} 
    }, 
    "Tty": false, 
    "OpenStdin": false, 
    "StdinOnce": false, 
    "Env": [ 
      "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin", 
      "NGINX_VERSION=1.19.0", 
      "NJS_VERSION=0.4.1", 
      "PKG_RELEASE=1~buster" 
    ], 
    "Cmd": [ 
      "nginx", 
      "-g", 
      "daemon off;" 
    ], 
    "ArgsEscaped": true, 
    "Image": "nginx", 
    "Volumes": null, 
    "WorkingDir": "", 
    "Entrypoint": [ 
      "/docker-entrypoint.sh" 
    ], 
    "OnBuild": null, 
    "Labels": { 
      "maintainer": "NGINX Docker Maintainers <docker-maint@nginx.com>" 
    }, 
    "StopSignal": "SIGTERM" 
  }, 
  "Image": "sha256:4392e5dad77dbaf6a573650b0fe1e282b57c5fba6e6cea00a27c7d4b68539b81", 
  "NetworkSettings": { 
    "Bridge": "", 
    "SandboxID": "603fbb30c2e591f9ccc4c21d7b7fc569abb377095d1bdcf7bb79ae7a5c5155c7", 
    "HairpinMode": false, 
    "LinkLocalIPv6Address": "", 
    "LinkLocalIPv6PrefixLen": 0, 
    "Networks": { 
      "bridge": { 
        "IPAMConfig": null, 
        "Links": null, 
        "Aliases": null, 
        "NetworkID": "a8b4c847b9a1ee58ef02a801e84288b536e44030eed13c67566ab0f56585ae49", 
        "EndpointID": "9430a8eba0a7bdf871c921fc12d7dab52047e37b49207b6dbe2d7370c7a914e1", 
        "Gateway": "172.17.0.1", 
        "IPAddress": "172.17.0.2", 
        "IPPrefixLen": 16, 
        "IPv6Gateway": "", 
        "GlobalIPv6Address": "", 
        "GlobalIPv6PrefixLen": 0, 
        "MacAddress": "02:42:ac:11:00:02", 
        "DriverOpts": null, 
        "IPAMOperational": false 
      } 
    }, 
    "Service": null, 
    "Ports": { 
      "80/tcp": [ 
        { 
          "HostIp": "0.0.0.0", 
          "HostPort": "32768" 
        } 
      ] 
    }, 
    "SandboxKey": "/var/run/docker/netns/603fbb30c2e5", 
    "SecondaryIPAddresses": null, 
    "SecondaryIPv6Addresses": null, 
    "IsAnonymousEndpoint": true, 
    "HasSwarmEndpoint": false 
  }, 
  "LogPath": "/var/lib/docker/containers/79415509360edd0ddbb33ce05887a0b5633fed7da58d32c83536af8c04f72f28/79415509360edd0ddbb33ce05887a0b5633fed7da58d32c83536af8c04f72f28-json.log", 
  "Name": "/dazzling_goldberg", 
  "Driver": "overlay2", 
  "OS": "linux", 
  "MountLabel": "", 
  "ProcessLabel": "", 
  "RestartCount": 0, 
  "HasBeenStartedBefore": true, 
  "HasBeenManuallyStopped": false, 
  "MountPoints": {}, 
  "SecretReferences": null, 
  "ConfigReferences": null, 
  "AppArmorProfile": "docker-default", 
  "HostnamePath": "/var/lib/docker/containers/79415509360edd0ddbb33ce05887a0b5633fed7da58d32c83536af8c04f72f28/hostname", 
  "HostsPath": "/var/lib/docker/containers/79415509360edd0ddbb33ce05887a0b5633fed7da58d32c83536af8c04f72f28/hosts", 
  "ShmPath": "/var/lib/docker/containers/79415509360edd0ddbb33ce05887a0b5633fed7da58d32c83536af8c04f72f28/mounts/shm", 
  "ResolvConfPath": "/var/lib/docker/containers/79415509360edd0ddbb33ce05887a0b5633fed7da58d32c83536af8c04f72f28/resolv.conf", 
  "SeccompProfile": "", 
  "NoNewPrivileges": false 
}
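
Rather than reading the whole document, jq can pull out just the part of interest. The sketch below prints the published port mappings from NetworkSettings.Ports in the same host->container form that docker ps uses; the function name published_ports is hypothetical, and the key layout is assumed to match the file shown above.

```shell
#!/bin/sh
# Print "HostIp:HostPort->ContainerPort" for every published port.
# Reads config.v2.json from the files given as arguments, or from stdin.
published_ports() {
    jq -r '.NetworkSettings.Ports
           | to_entries[]
           | .key as $container_port
           | .value[]?
           | "\(.HostIp):\(.HostPort)->\($container_port)"' "$@"
}

# Usage (as root, inside the container's directory):
#   published_ports config.v2.json
```

For the container above this should yield 0.0.0.0:32768->80/tcp, matching the PORTS column of docker ps.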
  • hostconfig.json: This file mainly holds host-side settings such as the network mode, port bindings, DNS and resource limits. If you are running docker-compose (a tool for defining and running multi-container Docker applications), this file would contain different settings; since we are using docker to run a single simple container, the file looks like this:
root@earth:/var/lib/docker/containers/79415509360edd0ddbb33ce05887a0b5633fed7da58d32c83536af8c04f72f28# cat hostconfig.json | jq 
{ 
  "Binds": null, 
  "ContainerIDFile": "", 
  "LogConfig": { 
    "Type": "json-file", 
    "Config": {} 
  }, 
  "NetworkMode": "default", 
  "PortBindings": {}, 
  "RestartPolicy": { 
    "Name": "no", 
    "MaximumRetryCount": 0 
  }, 
  "AutoRemove": false, 
  "VolumeDriver": "", 
  "VolumesFrom": null, 
  "CapAdd": null, 
  "CapDrop": null, 
  "Dns": [], 
  "DnsOptions": [], 
  "DnsSearch": [], 
  "ExtraHosts": null, 
  "GroupAdd": null, 
  "IpcMode": "shareable", 
  "Cgroup": "", 
  "Links": null, 
  "OomScoreAdj": 0, 
  "PidMode": "", 
  "Privileged": false, 
  "PublishAllPorts": true, 
  "ReadonlyRootfs": false, 
  "SecurityOpt": null, 
  "UTSMode": "", 
  "UsernsMode": "", 
  "ShmSize": 67108864, 
  "Runtime": "runc", 
  "ConsoleSize": [ 
    0, 
    0 
  ], 
  "Isolation": "", 
  "CpuShares": 0, 
  "Memory": 0, 
  "NanoCpus": 0, 
  "CgroupParent": "", 
  "BlkioWeight": 0, 
  "BlkioWeightDevice": [], 
  "BlkioDeviceReadBps": null, 
  "BlkioDeviceWriteBps": null, 
  "BlkioDeviceReadIOps": null, 
  "BlkioDeviceWriteIOps": null, 
  "CpuPeriod": 0, 
  "CpuQuota": 0, 
  "CpuRealtimePeriod": 0, 
  "CpuRealtimeRuntime": 0, 
  "CpusetCpus": "", 
  "CpusetMems": "", 
  "Devices": [], 
  "DeviceCgroupRules": null, 
  "DiskQuota": 0, 
  "KernelMemory": 0, 
  "MemoryReservation": 0, 
  "MemorySwap": 0, 
  "MemorySwappiness": null, 
  "OomKillDisable": false, 
  "PidsLimit": 0, 
  "Ulimits": null, 
  "CpuCount": 0, 
  "CpuPercent": 0, 
  "IOMaximumIOps": 0, 
  "IOMaximumBandwidth": 0, 
  "MaskedPaths": [ 
    "/proc/asound", 
    "/proc/acpi", 
    "/proc/kcore", 
    "/proc/keys", 
    "/proc/latency_stats", 
    "/proc/timer_list", 
    "/proc/timer_stats", 
    "/proc/sched_debug", 
    "/proc/scsi", 
    "/sys/firmware" 
  ], 
  "ReadonlyPaths": [ 
    "/proc/bus", 
    "/proc/fs", 
    "/proc/irq", 
    "/proc/sys", 
    "/proc/sysrq-trigger" 
  ] 
}
  • hostname: Contains the hostname for the container.
# Host machine 
$ cat /var/lib/docker/containers/79415509360edd0ddbb33ce05887a0b5633fed7da58d32c83536af8c04f72f28/hostname 
79415509360e
# On the container  
root@79415509360e $ cat /etc/hostname 
79415509360e
  • resolv.conf: Contains the resolv.conf for the container, mounted at /etc/resolv.conf
root@earth:/var/lib/docker/containers/79415509360edd0ddbb33ce05887a0b5633fed7da58d32c83536af8c04f72f28# cat /var/lib/docker/containers/79415509360edd0ddbb33ce05887a0b5633fed7da58d32c83536af8c04f72f28/resolv.conf 
# This file is managed by man:systemd-resolved(8). Do not edit. 
# 
# This is a dynamic resolv.conf file for connecting local clients directly to 
# all known uplink DNS servers. This file lists all configured search domains. 
# 
# Third party programs must not access this file directly, but only through the 
# symlink at /etc/resolv.conf. To manage man:resolv.conf(5) in a different way, 
# replace this symlink by a static file or a different symlink. 
# 
# See man:systemd-resolved.service(8) for details about the supported modes of 
# operation for /etc/resolv.conf.
nameserver 192.168.1.7 
nameserver 1.1.1.1 
nameserver 8.8.8.8 
search localdomain
root@79415509360e:~# cat /etc/resolv.conf 
# This file is managed by man:systemd-resolved(8). Do not edit. 
# 
# This is a dynamic resolv.conf file for connecting local clients directly to 
# all known uplink DNS servers. This file lists all configured search domains. 
# 
# Third party programs must not access this file directly, but only through the 
# symlink at /etc/resolv.conf. To manage man:resolv.conf(5) in a different way, 
# replace this symlink by a static file or a different symlink. 
# 
# See man:systemd-resolved.service(8) for details about the supported modes of 
# operation for /etc/resolv.conf. 
 
nameserver 192.168.1.7 
nameserver 1.1.1.1 
nameserver 8.8.8.8 
search localdomain
  • resolv.conf.hash: Holds the SHA-256 checksum of the previously mentioned resolv.conf file
$ cat /var/lib/docker/containers/79415509360edd0ddbb33ce05887a0b5633fed7da58d32c83536af8c04f72f28/resolv.conf.hash 
sha256:4b2f8601ebbea68e34b4218913bd52e6acb9e816a7024e6ff9e4ed3ecf71b878
$ shasum -a 256 resolv.conf
4b2f8601ebbea68e34b4218913bd52e6acb9e816a7024e6ff9e4ed3ecf71b878  resolv.conf
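
The manual comparison above can be scripted. This is a minimal sketch, assuming the hash file always carries the sha256: prefix seen here; it uses sha256sum from coreutils instead of shasum, and the function name verify_resolv_hash is hypothetical.

```shell
#!/bin/sh
# Check that resolv.conf in a container directory matches resolv.conf.hash.
# $1: the container directory. Returns 0 when the digests match.
verify_resolv_hash() {
    dir="$1"
    # Strip the "sha256:" prefix stored in front of the digest.
    expected=$(sed 's/^sha256://' "$dir/resolv.conf.hash")
    # sha256sum prints "<digest>  <file>"; keep only the digest.
    actual=$(sha256sum "$dir/resolv.conf" | cut -d ' ' -f 1)
    [ "$expected" = "$actual" ]
}

# Usage (as root):
#   verify_resolv_hash /var/lib/docker/containers/${CONTAINER_ID} && echo match
```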

Docker Filesystem and Files

We know that some files from /var/lib/docker/containers/${CONTAINER_ID} are mounted automatically as part of the startup process, most notably hostname and resolv.conf. But what about the rest of the filesystem? For that we need to look at docker inspect.

$ docker inspect 79415509360e
[ 
    { 
        "Id": "79415509360edd0ddbb33ce05887a0b5633fed7da58d32c83536af8c04f72f28", 
        "Created": "2020-06-29T00:47:26.622421221Z", 
        "Path": "/docker-entrypoint.sh", 
        "Args": [ 
            "nginx", 
            "-g", 
            "daemon off;" 
        ], 
        "State": { 
            "Status": "running", 
            "Running": true, 
            "Paused": false, 
            "Restarting": false, 
            "OOMKilled": false, 
            "Dead": false, 
            "Pid": 11756, 
            "ExitCode": 0, 
            "Error": "", 
            "StartedAt": "2020-06-29T00:47:27.064480116Z", 
            "FinishedAt": "0001-01-01T00:00:00Z" 
        }, 
        "Image": "sha256:4392e5dad77dbaf6a573650b0fe1e282b57c5fba6e6cea00a27c7d4b68539b81", 
        "ResolvConfPath": "/var/lib/docker/containers/79415509360edd0ddbb33ce05887a0b5633fed7da58d32c83536af8c04f72f28/resolv.conf", 
        "HostnamePath": "/var/lib/docker/containers/79415509360edd0ddbb33ce05887a0b5633fed7da58d32c83536af8c04f72f28/hostname", 
        "HostsPath": "/var/lib/docker/containers/79415509360edd0ddbb33ce05887a0b5633fed7da58d32c83536af8c04f72f28/hosts", 
        "LogPath": "/var/lib/docker/containers/79415509360edd0ddbb33ce05887a0b5633fed7da58d32c83536af8c04f72f28/79415509360edd0ddbb33ce05887a0b5633fed7da58d32c83536af8c04f72f28-json.log", 
        "Name": "/dazzling_goldberg", 
        "RestartCount": 0, 
        "Driver": "overlay2", 
        "Platform": "linux", 
        "MountLabel": "", 
        "ProcessLabel": "", 
        "AppArmorProfile": "docker-default", 
        "ExecIDs": [ 
            "2aabd28bfd6a8b19e9190809a33ecfaa90473980f63ce417903cc6679513af22" 
        ], 
        "HostConfig": { 
            "Binds": null, 
            "ContainerIDFile": "", 
            "LogConfig": { 
                "Type": "json-file", 
                "Config": {} 
            }, 
            "NetworkMode": "default", 
            "PortBindings": {}, 
            "RestartPolicy": { 
                "Name": "no", 
                "MaximumRetryCount": 0 
            }, 
            "AutoRemove": false, 
            "VolumeDriver": "", 
            "VolumesFrom": null, 
            "CapAdd": null, 
            "CapDrop": null, 
            "Dns": [], 
            "DnsOptions": [], 
            "DnsSearch": [], 
            "ExtraHosts": null, 
            "GroupAdd": null, 
            "IpcMode": "shareable", 
            "Cgroup": "", 
            "Links": null, 
            "OomScoreAdj": 0, 
            "PidMode": "", 
            "Privileged": false, 
            "PublishAllPorts": true, 
            "ReadonlyRootfs": false, 
            "SecurityOpt": null, 
            "UTSMode": "", 
            "UsernsMode": "", 
            "ShmSize": 67108864, 
            "Runtime": "runc", 
            "ConsoleSize": [ 
                0, 
                0 
            ], 
            "Isolation": "", 
            "CpuShares": 0, 
            "Memory": 0, 
            "NanoCpus": 0, 
            "CgroupParent": "", 
            "BlkioWeight": 0, 
            "BlkioWeightDevice": [], 
            "BlkioDeviceReadBps": null, 
            "BlkioDeviceWriteBps": null, 
            "BlkioDeviceReadIOps": null, 
            "BlkioDeviceWriteIOps": null, 
            "CpuPeriod": 0, 
            "CpuQuota": 0, 
            "CpuRealtimePeriod": 0, 
            "CpuRealtimeRuntime": 0, 
            "CpusetCpus": "", 
            "CpusetMems": "", 
            "Devices": [], 
            "DeviceCgroupRules": null, 
            "DiskQuota": 0, 
            "KernelMemory": 0, 
            "MemoryReservation": 0, 
            "MemorySwap": 0, 
            "MemorySwappiness": null, 
            "OomKillDisable": false, 
            "PidsLimit": 0, 
            "Ulimits": null, 
            "CpuCount": 0, 
            "CpuPercent": 0, 
            "IOMaximumIOps": 0, 
            "IOMaximumBandwidth": 0, 
            "MaskedPaths": [ 
                "/proc/asound", 
                "/proc/acpi", 
                "/proc/kcore", 
                "/proc/keys", 
                "/proc/latency_stats", 
                "/proc/timer_list", 
                "/proc/timer_stats", 
                "/proc/sched_debug", 
                "/proc/scsi", 
                "/sys/firmware" 
            ], 
            "ReadonlyPaths": [ 
                "/proc/bus", 
                "/proc/fs", 
                "/proc/irq", 
                "/proc/sys", 
                "/proc/sysrq-trigger" 
            ] 
        }, 
        "GraphDriver": { 
            "Data": { 
                "LowerDir": "/var/lib/docker/overlay2/a125ed1c07acbc7873e2bcb2e05b0536dd8e2d86b24bf35116a3435fe6659bf2-init/diff:/var/lib/docker/overlay2/df0cfc3535d26b05d11e44e4398f99bf39a920615c5810f1690faaae120826e9/diff:/var/lib/docker/overlay2/bc9429ad6ede34e6704739f342a5c4f2400e02744d5f85e11084866ed636069c/diff:/var/lib/docker/overlay2/092a847640b36f15c2e9103767cb06d8d04e7be675dbef4d0f3ee81d019b71d4/diff:/var/lib/docker/overlay2/bbf65ed8358dec49eafff808a9e61ebdb5edb8a592238e81045a4f79daeb1cd6/diff:/var/lib/docker/overlay2/fe33dfe667d8a4d39e6fcdccdfa1fe3681d84140f4fd9b6dbe0e8195446dee69/diff", 
                "MergedDir": "/var/lib/docker/overlay2/a125ed1c07acbc7873e2bcb2e05b0536dd8e2d86b24bf35116a3435fe6659bf2/merged", 
                "UpperDir": "/var/lib/docker/overlay2/a125ed1c07acbc7873e2bcb2e05b0536dd8e2d86b24bf35116a3435fe6659bf2/diff", 
                "WorkDir": "/var/lib/docker/overlay2/a125ed1c07acbc7873e2bcb2e05b0536dd8e2d86b24bf35116a3435fe6659bf2/work" 
            }, 
            "Name": "overlay2" 
        }, 
        "Mounts": [], 
        "Config": { 
            "Hostname": "79415509360e", 
            "Domainname": "", 
            "User": "", 
            "AttachStdin": false, 
            "AttachStdout": false, 
            "AttachStderr": false, 
            "ExposedPorts": { 
                "80/tcp": {} 
            }, 
            "Tty": false, 
            "OpenStdin": false, 
            "StdinOnce": false, 
            "Env": [ 
                "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin", 
                "NGINX_VERSION=1.19.0", 
                "NJS_VERSION=0.4.1", 
                "PKG_RELEASE=1~buster" 
            ], 
            "Cmd": [ 
                "nginx", 
                "-g", 
                "daemon off;" 
            ], 
            "ArgsEscaped": true, 
            "Image": "nginx", 
            "Volumes": null, 
            "WorkingDir": "", 
            "Entrypoint": [ 
                "/docker-entrypoint.sh" 
            ], 
            "OnBuild": null, 
            "Labels": { 
                "maintainer": "NGINX Docker Maintainers <docker-maint@nginx.com>" 
            }, 
            "StopSignal": "SIGTERM" 
        }, 
        "NetworkSettings": { 
            "Bridge": "", 
            "SandboxID": "603fbb30c2e591f9ccc4c21d7b7fc569abb377095d1bdcf7bb79ae7a5c5155c7", 
            "HairpinMode": false, 
            "LinkLocalIPv6Address": "", 
            "LinkLocalIPv6PrefixLen": 0, 
            "Ports": { 
                "80/tcp": [ 
                    { 
                        "HostIp": "0.0.0.0", 
                        "HostPort": "32768" 
                    } 
                ] 
            }, 
            "SandboxKey": "/var/run/docker/netns/603fbb30c2e5", 
            "SecondaryIPAddresses": null, 
            "SecondaryIPv6Addresses": null, 
            "EndpointID": "9430a8eba0a7bdf871c921fc12d7dab52047e37b49207b6dbe2d7370c7a914e1", 
            "Gateway": "172.17.0.1", 
            "GlobalIPv6Address": "", 
            "GlobalIPv6PrefixLen": 0, 
            "IPAddress": "172.17.0.2", 
            "IPPrefixLen": 16, 
            "IPv6Gateway": "", 
            "MacAddress": "02:42:ac:11:00:02", 
            "Networks": { 
                "bridge": { 
                    "IPAMConfig": null, 
                    "Links": null, 
                    "Aliases": null, 
                    "NetworkID": "a8b4c847b9a1ee58ef02a801e84288b536e44030eed13c67566ab0f56585ae49", 
                    "EndpointID": "9430a8eba0a7bdf871c921fc12d7dab52047e37b49207b6dbe2d7370c7a914e1", 
                    "Gateway": "172.17.0.1", 
                    "IPAddress": "172.17.0.2", 
                    "IPPrefixLen": 16, 
                    "IPv6Gateway": "", 
                    "GlobalIPv6Address": "", 
                    "GlobalIPv6PrefixLen": 0, 
                    "MacAddress": "02:42:ac:11:00:02", 
                    "DriverOpts": null 
                } 
            } 
        } 
    } 
]

The inspect output confirms that the files mentioned previously are supplied by Docker from the container’s directory on the host:

"ResolvConfPath": "/var/lib/docker/containers/79415509360edd0ddbb33ce05887a0b5633fed7da58d32c83536af8c04f72f28/resolv.conf", 
"HostnamePath": "/var/lib/docker/containers/79415509360edd0ddbb33ce05887a0b5633fed7da58d32c83536af8c04f72f28/hostname", 
"HostsPath": "/var/lib/docker/containers/79415509360edd0ddbb33ce05887a0b5633fed7da58d32c83536af8c04f72f28/hosts", 
"LogPath": "/var/lib/docker/containers/79415509360edd0ddbb33ce05887a0b5633fed7da58d32c83536af8c04f72f28/79415509360edd0ddbb33ce05887a0b5633fed7da58d32c83536af8c04f72f28-json.log",

The storage driver in use also appears in the output.

"Driver": "overlay2",

There are also references to paths under /proc (plus /sys/firmware) that are either masked or marked as read-only. These attributes are also recorded in hostconfig.json.

"MaskedPaths": [ 
    "/proc/asound", 
    "/proc/acpi", 
    "/proc/kcore", 
    "/proc/keys", 
    "/proc/latency_stats", 
    "/proc/timer_list", 
    "/proc/timer_stats", 
    "/proc/sched_debug", 
    "/proc/scsi", 
    "/sys/firmware" 
], 
"ReadonlyPaths": [ 
    "/proc/bus", 
    "/proc/fs", 
    "/proc/irq", 
    "/proc/sys", 
    "/proc/sysrq-trigger" 
]

Finally, the GraphDriver attribute references the overlay filesystem.

"GraphDriver": { 
    "Data": { 
        "LowerDir": "/var/lib/docker/overlay2/a125ed1c07acbc7873e2bcb2e05b0536dd8e2d86b24bf35116a3435fe6659bf2-init/diff:/var/lib/docker/overlay2/df0cfc3535d26b05d11e44e4398f99bf39a920615c5810f1690faaae120826e9/diff:/var/lib/docker/overlay2/bc9429ad6ede34e6704739f342a5c4f2400e02744d5f85e11084866ed636069c/diff:/var/lib/docker/overlay2/092a847640b36f15c2e9103767cb06d8d04e7be675dbef4d0f3ee81d019b71d4/diff:/var/lib/docker/overlay2/bbf65ed8358dec49eafff808a9e61ebdb5edb8a592238e81045a4f79daeb1cd6/diff:/var/lib/docker/overlay2/fe33dfe667d8a4d39e6fcdccdfa1fe3681d84140f4fd9b6dbe0e8195446dee69/diff", 
        "MergedDir": "/var/lib/docker/overlay2/a125ed1c07acbc7873e2bcb2e05b0536dd8e2d86b24bf35116a3435fe6659bf2/merged", 
        "UpperDir": "/var/lib/docker/overlay2/a125ed1c07acbc7873e2bcb2e05b0536dd8e2d86b24bf35116a3435fe6659bf2/diff", 
        "WorkDir": "/var/lib/docker/overlay2/a125ed1c07acbc7873e2bcb2e05b0536dd8e2d86b24bf35116a3435fe6659bf2/work" 
    }, 
    "Name": "overlay2" 
},
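As a minimal sketch, the same fields can be pulled out of saved docker inspect output with a few lines of Python. The snippet assumes the JSON has been captured into a string; here it is an abridged copy of the GraphDriver block above that keeps only two of the six lower layers. The helper name overlay_layers is hypothetical.

```python
import json
from pathlib import PurePosixPath

# Abridged copy of the GraphDriver block shown above (two of six lower layers).
ROOT = "/var/lib/docker/overlay2"
CID = "a125ed1c07acbc7873e2bcb2e05b0536dd8e2d86b24bf35116a3435fe6659bf2"
BASE = "fe33dfe667d8a4d39e6fcdccdfa1fe3681d84140f4fd9b6dbe0e8195446dee69"
INSPECT_JSON = json.dumps([{
    "GraphDriver": {
        "Data": {
            "LowerDir": f"{ROOT}/{CID}-init/diff:{ROOT}/{BASE}/diff",
            "MergedDir": f"{ROOT}/{CID}/merged",
            "UpperDir": f"{ROOT}/{CID}/diff",
            "WorkDir": f"{ROOT}/{CID}/work",
        },
        "Name": "overlay2",
    }
}])


def overlay_layers(inspect_json: str) -> dict:
    """Return the driver name, the overlay ID and the ordered lower layers."""
    container = json.loads(inspect_json)[0]  # docker inspect prints a JSON list
    driver = container["GraphDriver"]
    data = driver["Data"]
    return {
        "driver": driver["Name"],
        # The overlay ID is the directory that holds merged/, diff/ and work/.
        "overlay_id": PurePosixPath(data["MergedDir"]).parent.name,
        # LowerDir is colon-separated, with the topmost lower layer first.
        "lower": data["LowerDir"].split(":"),
    }


info = overlay_layers(INSPECT_JSON)
print(info["driver"], info["overlay_id"][:12], len(info["lower"]))
```

Deriving the overlay ID from MergedDir rather than hard-coding it keeps the sketch usable for any container whose inspect output follows the same layout.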

This is interesting, since the ID in the overlay2 directory is not the same as the container ID that I’ve been working with so far. I’ll call it the overlay ID; its value is a125ed1c07acbc7873e2bcb2e05b0536dd8e2d86b24bf35116a3435fe6659bf2. The contents of this directory are:

root@earth:~# ls -al /var/lib/docker/overlay2/a125ed1c07acbc7873e2bcb2e05b0536dd8e2d86b24bf35116a3435fe6659bf2 
total 56 
drwx------   5 root root  4096  . 
drwx------ 215 root root 28672  .. 
drwxr-xr-x   6 root root  4096  diff 
-rw-r--r--   1 root root    26  link 
-rw-r--r--   1 root root   173  lower 
drwxr-xr-x   1 root root  4096  merged 
drwx------   3 root root  4096  work

Running tree on the directory shows the complete filesystem for the container.

$ tree  /var/lib/docker/overlay2/a125ed1c07acbc7873e2bcb2e05b0536dd8e2d86b24bf35116a3435fe6659bf2 -d -L 2 
/var/lib/docker/overlay2/a125ed1c07acbc7873e2bcb2e05b0536dd8e2d86b24bf35116a3435fe6659bf2 
├── diff 
│   ├── etc 
│   ├── root 
│   ├── run 
│   └── var 
├── merged 
│   ├── bin 
│   ├── boot 
│   ├── dev 
│   ├── docker-entrypoint.d 
│   ├── etc 
│   ├── home 
│   ├── lib 
│   ├── lib64 
│   ├── media 
│   ├── mnt 
│   ├── opt 
│   ├── proc 
│   ├── root 
│   ├── run 
│   ├── sbin 
│   ├── srv 
│   ├── sys 
│   ├── tmp 
│   ├── usr 
│   └── var 
└── work 
    └── work

The work directory is only used internally by OverlayFS, so it’s irrelevant for now. The lower file relates to the layering of the underlying image layers; its contents are:

$ cat lower 
l/7WPZKXJV2SBCQBE57LVAP4OUIE:l/ON3TWDAENYU5UIS7SW5QWGW6F3:l/W5ZIQ33SFPQXS4WV3LHDGAIEVC:l/QJMVZIUNLOO3E6QS5EX4L4ZTAS:l/6FB65JHRJYH2FMWHZZ2LF2QMUT:l/O46JWE6HPO5PKFWKVEDKCDA66X

This is a colon-separated list of values. Formatted one per line, it reads:

l/7WPZKXJV2SBCQBE57LVAP4OUIE 
l/ON3TWDAENYU5UIS7SW5QWGW6F3 
l/W5ZIQ33SFPQXS4WV3LHDGAIEVC 
l/QJMVZIUNLOO3E6QS5EX4L4ZTAS 
l/6FB65JHRJYH2FMWHZZ2LF2QMUT 
l/O46JWE6HPO5PKFWKVEDKCDA66X
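The same split can be sketched in Python. This assumes the contents of the lower file have already been read into a string (abridged here to the first and last entries above) and that the entries are relative to /var/lib/docker/overlay2. The helper name lower_entries is hypothetical.

```python
from pathlib import PurePosixPath

OVERLAY_ROOT = PurePosixPath("/var/lib/docker/overlay2")

# First and last entries of the lower file shown above.
LOWER = "l/7WPZKXJV2SBCQBE57LVAP4OUIE:l/O46JWE6HPO5PKFWKVEDKCDA66X"


def lower_entries(lower: str) -> list:
    """Split a colon-separated lower string into absolute paths, in file order."""
    return [OVERLAY_ROOT / entry for entry in lower.strip().split(":") if entry]


for path in lower_entries(LOWER):
    print(path)
```

The strip() call is there because the file has no trailing newline in the output above, but a copy pasted from a terminal may carry one.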

Listing the contents of each of these l directories:

$ ls ../l/7WPZKXJV2SBCQBE57LVAP4OUIE 
dev  etc
$ ls ../l/ON3TWDAENYU5UIS7SW5QWGW6F3 
docker-entrypoint.d
$ ls ../l/W5ZIQ33SFPQXS4WV3LHDGAIEVC 
docker-entrypoint.d
$ ls ../l/QJMVZIUNLOO3E6QS5EX4L4ZTAS 
docker-entrypoint.sh
$ ls ../l/6FB65JHRJYH2FMWHZZ2LF2QMUT 
docker-entrypoint.d  etc  lib  tmp  usr  var
$ ls ../l/O46JWE6HPO5PKFWKVEDKCDA66X 
bin  boot  dev  etc  home  lib  lib64  media  mnt  opt  proc  root  run  sbin  srv  sys  tmp  usr  var

These seem to be the files that were modified by the different layers as the image was built by a docker build command. The base image holds all the root filesystem directories, and each subsequent layer modified certain files and directories.

Moving on, the diff and merged directories are interesting. Let’s test them by creating a file in the container and checking whether it shows up here.

root@79415509360e:~# echo "HELLLOOO FROM THE CONTAINER" > helloworld.txt 
root@79415509360e:~# cat helloworld.txt

Then list the contents of the directory on the host machine.

root@earth:/var/lib/docker/overlay2/a125ed1c07acbc7873e2bcb2e05b0536dd8e2d86b24bf35116a3435fe6659bf2# tree diff -L 4 
diff 
├── etc 
│   └── nginx 
│       └── conf.d 
│           └── default.conf 
├── root 
│   └── helloworld.txt 
├── run 
│   └── nginx.pid 
└── var 
    └── cache 
        └── nginx 
            ├── client_temp 
            ├── fastcgi_temp 
            ├── proxy_temp 
            ├── scgi_temp 
            └── uwsgi_temp
$ cat diff/root/helloworld.txt 
HELLLOOO FROM THE CONTAINER
$ cat merged/root/helloworld.txt 
HELLLOOO FROM THE CONTAINER

So this is it! The diff directory holds the container data. But so does the merged directory. To understand why, here is a quick look at the documentation [3]:

OverlayFS layers two directories on a single Linux host and presents them as a single directory. These directories are called layers and the unification process is referred to as a union mount.
OverlayFS refers to the lower directory as lowerdir and the upper directory as upperdir. The unified view is exposed through its own directory called merged.

So the merged directory is the unified filesystem that the container sees. The link file holds a shortened identifier for this layer, and the l directory contains a symbolic link with that name pointing back to the layer’s diff directory. This can be verified by running:

$ ls -all ../l/ | grep 7DE4KVX7BSNFR2KAQ7BJ3SDU73 
lrwxrwxrwx   1 root root    72 jun 28 20:47 7DE4KVX7BSNFR2KAQ7BJ3SDU73 -> ../a125ed1c07acbc7873e2bcb2e05b0536dd8e2d86b24bf35116a3435fe6659bf2/diff

So each entry in l is just a symbolic link to a layer’s diff directory. The short names are used to avoid hitting the page size limit on the arguments of the mount command when the filesystem is mounted.

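The diff directory contains only what this layer changes relative to the layers below it. For the container’s own layer, that is everything written at runtime, which is why helloworld.txt appears there.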
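The lookup rule behind the unified view can be sketched with plain dictionaries. This is a toy model, not how the kernel implements OverlayFS: each layer is a mapping from path to contents, the upper layer wins over the lower ones, and the lower layers are ordered topmost first. Deletions (whiteouts) are left out, and apart from helloworld.txt the paths and contents are illustrative.

```python
def merged_view(upper: dict, lowers: list) -> dict:
    """Toy union: lowers are ordered topmost first, upper overrides them all."""
    merged = {}
    # Apply the bottom layer first so that higher layers overwrite it.
    for layer in reversed(lowers):
        merged.update(layer)
    merged.update(upper)
    return merged


# Illustrative stand-ins for two image layers and the container's diff directory.
base_layer = {"/etc/motd": "base", "/bin/sh": "shell"}
nginx_layer = {"/docker-entrypoint.sh": "entrypoint", "/etc/motd": "nginx"}
upper = {"/root/helloworld.txt": "HELLLOOO FROM THE CONTAINER"}

view = merged_view(upper, [nginx_layer, base_layer])
print(sorted(view))
print(view["/etc/motd"])
```

In this model the file written in the container lives only in upper, yet it is visible in the merged view, which matches what the two cat commands showed.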

Checking the actual running system we can see that all of this makes sense.

$ df -Tih 
Filesystem     Type     Inodes IUsed IFree IUse% Mounted on 
udev           devtmpfs   2,0M   591  2,0M    1% /dev 
tmpfs          tmpfs      2,0M  1,1K  2,0M    1% /run 
/dev/nvme0n1p2 ext4        30M  1,4M   29M    5% / 
tmpfs          tmpfs      2,0M   461  2,0M    1% /dev/shm 
tmpfs          tmpfs      2,0M     6  2,0M    1% /run/lock 
tmpfs          tmpfs      2,0M    18  2,0M    1% /sys/fs/cgroup 
/dev/nvme0n1p1 vfat          0     0     0     - /boot/efi 
tmpfs          tmpfs      2,0M    33  2,0M    1% /run/user/1000 
overlay        overlay     30M  1,4M   29M    5% /var/lib/docker/overlay2/a125ed1c07acbc7873e2bcb2e05b0536dd8e2d86b24bf35116a3435fe6659bf2/merged 
shm            tmpfs      2,0M     1  2,0M    1% /var/lib/docker/containers/79415509360edd0ddbb33ce05887a0b5633fed7da58d32c83536af8c04f72f28/mounts/shm

The overlay entry shows that the merged directory is mounted, along with the shm directory. Note that the -i option of df reports inodes rather than blocks. The overlay and the host filesystem report the same number of free inodes, 29845461 or 29M, because the overlay sits on top of the host’s ext4 filesystem.

Comparing against the container filesystem.

root@79415509360e:~# df -Tih 
Filesystem     Type    Inodes IUsed IFree IUse% Mounted on 
overlay        overlay    30M  1.4M   29M    5% / 
tmpfs          tmpfs     2.0M    16  2.0M    1% /dev 
tmpfs          tmpfs     2.0M    17  2.0M    1% /sys/fs/cgroup 
/dev/nvme0n1p2 ext4       30M  1.4M   29M    5% /etc/hosts 
shm            tmpfs     2.0M     1  2.0M    1% /dev/shm 
tmpfs          tmpfs     2.0M     1  2.0M    1% /proc/asound 
tmpfs          tmpfs     2.0M     1  2.0M    1% /proc/acpi 
tmpfs          tmpfs     2.0M     1  2.0M    1% /proc/scsi 
tmpfs          tmpfs     2.0M     1  2.0M    1% /sys/firmware

So to summarise: on the host, / is an ext4 filesystem, while in the container / is an overlay filesystem. The name overlay2 does not refer to a filesystem type; it is the storage driver Docker uses to mount that overlay filesystem. We also see that /proc/asound, /proc/acpi, /proc/scsi and /sys/firmware are mounted as tmpfs (in memory) inside the container; these are the same paths listed under MaskedPaths in the docker inspect output.
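As a rough cross-check, the MaskedPaths list can be compared with the mount points from the container’s df output. The sketch below copies both lists from the outputs above. The explanation in the comments is an assumption about why the remaining entries do not appear as separate tmpfs mounts: runc covers masked directories with an empty read-only tmpfs and masks files by bind-mounting /dev/null over them.

```python
# MaskedPaths from the docker inspect output above.
masked_paths = [
    "/proc/asound", "/proc/acpi", "/proc/kcore", "/proc/keys",
    "/proc/latency_stats", "/proc/timer_list", "/proc/timer_stats",
    "/proc/sched_debug", "/proc/scsi", "/sys/firmware",
]

# (filesystem type, mount point) pairs from df -Tih inside the container.
container_mounts = [
    ("overlay", "/"), ("tmpfs", "/dev"), ("tmpfs", "/sys/fs/cgroup"),
    ("ext4", "/etc/hosts"), ("tmpfs", "/dev/shm"), ("tmpfs", "/proc/asound"),
    ("tmpfs", "/proc/acpi"), ("tmpfs", "/proc/scsi"), ("tmpfs", "/sys/firmware"),
]

tmpfs_points = {point for fstype, point in container_mounts if fstype == "tmpfs"}

# Masked directories are covered by an empty tmpfs and show up in df.
masked_dirs = [p for p in masked_paths if p in tmpfs_points]
# The remaining masked paths are files (assumed to be masked with /dev/null).
masked_files = [p for p in masked_paths if p not in tmpfs_points]

print("tmpfs-masked:", masked_dirs)
print("not listed by df:", masked_files)
```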

I hope this post gave you a clear picture of Docker internals. Please leave a comment if you have any questions.

References

1: http://alexander.holbreich.org/docker-components-explained/ “Docker Components”

2: https://github.com/opencontainers/runtime-spec “OCI-Runtime-Spec”

3: https://docs.docker.com/storage/storagedriver/overlayfs-driver/ “Docker Overlay2”

4: https://windsock.io/the-docker-proxy/ “Docker Proxy”
