At exactly one year ago, I set up an Anycast service with Docker in the DN42 network (Chinese only atm). Back then, I customized the container's image and added a Bird installation to it, then put in a config file to broadcast Anycast routes via OSPF. However, as time went by, a few problems were exposed:
- The process of installing Bird takes time. Instead of installing Bird with
apt-get, since my Dockerfiles need to support multiple architectures (Chinese only atm), and Bird isn't available in some architecture's repos for Debian. And since my building server is AMD64, and is running images of other architectures with
qemu-user-static(Chinese only atm), a lot of instruction translation is needed in the image building and software compilation progress, which is extremely inefficient. It may take more than 2 hours to build an image for different architectures, while if I installed it with Bird, it will take less than 5 minutes.
- Customizing image also takes time. Since both the target application (such as PowerDNS) and Bird need to be run simultaneously, I cannot simply use the target app as the ENTRYPOINT. Adding other managing software (supervisord, s6-supervise, tini, or a custom Bash script) adds extra complexity to the image (and therefore increases chances of error), and other factors such as signals, return values and zombie processes need to be taken into account.
Recently when I was reading
docker-compose's reference document, I realized that
network_mode, or the container's network scheme, can be set to
service:[name], which means multiple containers can share the same namespace, and therefore the same IP allocations and routes. This means I can create an individual Bird container, attach it to the application container's network namespace, to achieve Anycast without modifying the original container image.
Scheme 1: Two Containers
The most straightforward solution is using two containers, one for application and one for Bird. Using PowerDNS as an example, this is my config file:
services: powerdns: image: xddxdd/powerdns container_name: powerdns restart: always volumes: - "./conf/powerdns:/etc/powerdns:ro" - "/etc/geoip:/etc/geoip:ro" depends_on: - mysql - docker-ipv6nat ports: - "53:53" - "53:53/udp" networks: default: ipv4_address: 172.18.3.54 ipv6_address: fcf9:a876:eddd:c85a:8a93::54 anycast_ip: ipv4_address: 172.22.76.109 ipv6_address: fdbc:f9dc:67ad:2547::54 powerdns-bird: image: xddxdd/bird container_name: powerdns-bird restart: always network_mode: "service:powerdns" volumes: - "./conf/powerdns/bird-static.conf:/etc/bird-static.conf:ro" cap_add: - NET_ADMIN depends_on: - docker-ipv6nat - powerdns networks: [Redacted...]
Here I set all networking-related configuration (including IP and ports) on the PowerDNS container, and set
network_mode: service:powerdns on the Bird container, so they share the network namespace. Here
NET_ADMIN capability is still required by Bird to handle the broadcasting and routing, but no longer assigned to PowerDNS. Hence security is improved a bit?
docker-compose up -d to start both containers.
And Problem Arises
Not long has passed before the host cannot receive OSPF broadcasts from the Bird container anymore. I entered the Bird container, did some checks, and saw:
# ip addr 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000 link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00 inet 127.0.0.1/8 scope host lo valid_lft forever preferred_lft forever 2: sit0@NONE: <NOARP> mtu 1480 qdisc noop state DOWN group default qlen 1000 link/sit 0.0.0.0 brd 0.0.0.0 3: gre0@NONE: <NOARP> mtu 1476 qdisc noop state DOWN group default qlen 1000 link/gre 0.0.0.0 brd 0.0.0.0 4: gretap0@NONE: <BROADCAST,MULTICAST> mtu 1462 qdisc noop state DOWN group default qlen 1000 link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff 5: erspan0@NONE: <BROADCAST,MULTICAST> mtu 1450 qdisc noop state DOWN group default qlen 1000 link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff
The IP allocation is gone. Upon further inspection on the PowerDNS container, I realized that since I have Watchtower automatically updating images to the latest version, and my building server updated PowerDNS's image, the PowerDNS container has been recreated.
network_mode is, in fact, a convenience function provided by
docker-compose, and is assigned by container ID on Docker level, when PowerDNS container was recreated, the network namespace for Bird container is lost as well.
Now, if I attempt to restart the Bird container, Docker will show an error about unable to find the container with the original ID. Here the complexity arises: if I run
docker-compose up -d again, instead of recreating containers,
docker-compose simply tries to start existing containers, which will fail.
Therefore, I need a container that is always running and never updated for the network namespace, and attach PowerDNS and Bird containers, which may be updated any time, onto the long-running container, to avoid issues when updating containers.
Scheme 2: Three Containers
I chose the Busybox container to run forever since it's small enough and occupies negligible memory space. The latest version of Busybox was 1.31.1 when I was writing this post, but since images of 1.31.1 was still updated periodically, I chose the image for 1.31.0, which was last updated 3 months ago.
tail -f /dev/null forever with Busybox without it eating CPU cycles. In addition, I set labels to the container to prevent Watchtower from auto-updating it.
Here is my updated config file:
services: powerdns-net: image: amd64/busybox:1.31.0 container_name: powerdns-net restart: always entrypoint: "tail -f /dev/null" labels: - com.centurylinklabs.watchtower.enable=false depends_on: - docker-ipv6nat ports: - "53:53" - "53:53/udp" networks: default: ipv4_address: 172.18.3.54 ipv6_address: fcf9:a876:eddd:c85a:8a93::54 anycast_ip: ipv4_address: 172.22.76.109 ipv6_address: fdbc:f9dc:67ad:2547::54 powerdns: image: xddxdd/powerdns container_name: powerdns restart: always network_mode: "service:powerdns-net" volumes: - "./conf/powerdns:/etc/powerdns:ro" - "/etc/geoip:/etc/geoip:ro" depends_on: - docker-ipv6nat - powerdns-net powerdns-bird: image: xddxdd/bird container_name: powerdns-bird restart: always network_mode: "service:powerdns-net" volumes: - "./conf/powerdns/bird-static.conf:/etc/bird-static.conf:ro" cap_add: - NET_ADMIN depends_on: - docker-ipv6nat - powerdns networks: [Redacted...]
Here the Busybox container will run forever stably to keep the network namespace running, and PowerDNS and Bird will attach to it and provide services. Either PowerDNS or Bird can be updated at any time without affecting the existence of the whole network namespace.
As for resource consumptions of Busybox:
# docker stats --no-stream CONTAINER ID NAME CPU % MEM USAGE / LIMIT MEM % NET I/O BLOCK I/O PIDS ... 803b11f02b3a powerdns-recursor-net 0.00% 384KiB / 734MiB 0.05% 10.3MB / 3.98MB 1.43MB / 0B 1 ...
The size is merely 384KB and can be simply ignored.