Network setup for Cambridge's new DNS servers

The SCCS-to-git
project that I wrote about previously was the prelude to setting up
new DNS servers with an entirely overhauled infrastructure.

The current setup, which I am replacing, uses Solaris Zones (like
FreeBSD Jails or Linux Containers) to host the various name server
instances on three physical boxes. The new setup will use Ubuntu
virtual machines on our shared VM service (should I call it a "private
cloud"?) for the authoritative servers. I am making a couple of
changes to the authoritative setup: changing to a hidden master, and
eliminating differences in which zones are served by each server.

I have obtained dedicated hardware for the recursive servers. Our
main concern is that they should be able to boot and work with no
dependencies on other services beyond power and networking, because
basically all the other services rely on the recursive DNS servers.
The machines are Dell R320s, each with one Xeon E5-2420 (6
hyperthreaded cores, 2.2GHz), 32 GB RAM, and a Dell-branded Intel
160GB SSD.

Failover for recursive DNS servers

The most important change to the recursive DNS service will be
automatic failover. Whenever I need to loosen my bowels I just
contemplate dealing with a failure of one of the current elderly
machines, which involves a lengthy and delicate manual playbook
described on our wiki...

Often when I mention DNS and failover, the immediate response is
"Anycast?". We will not be doing anycast on the new servers,
though that may change in the future. My current plan is to do
failover with VRRP using keepalived. (Several people have
told me they are successfully using keepalived, though its
documentation is shockingly bad. I would like to know of any better
alternatives.) There are a number of reasons for using VRRP rather
than anycast:

  • The recursive DNS service addresses are recdns0 on subnet 8 and
    recdns1 on subnet 12. (They have IPv6 addresses too.) The two
    subnets are actually VLANs on the same physical network. It is not
    feasible to change these addresses.

  • The 8 and 12 subnets are our general server subnets, used for a
    large proportion of our services, most of which use the recdns
    servers. So anycasting recdns[01] requires punching holes in the
    server network routing.

  • The server network routers do not provide proxy ARP and my
    colleagues in network systems do not want to change this. But our
    Cisco routers can't punch a /32 anycast hole in the server subnets
    without proxy ARP. So if we did do anycast we would also have
    to do VRRP to support failover for recdns clients on the server
    network.

  • The server network spans four sites, connected via our own city-wide
    fibre. The sites are linked at layer 2: the same Ethernet VLANs
    are present at all four sites. So VRRP failover gives us pretty good
    resilience in the face of server, rack, or site failures.
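The "/32 anycast hole" in the bullets above comes from longest-prefix matching: a host route for one service address beats the connected subnet route for that address and for no other. The Python sketch below models only the router's lookup. The address 131.111.8.100 is a hypothetical stand-in, not the real recdns0, and the /23 for subnet 8 is taken from the netmask in the interfaces template further down. It deliberately leaves out the other half of the problem: clients on subnet 8 treat any subnet 8 address as on-link and ARP for it directly, so the router never sees their packets unless it answers those ARPs, which is where proxy ARP comes in.

```python
import ipaddress

# Subnet 8 is a /23 (per the interfaces template); 131.111.8.100 is a
# hypothetical anycast service address used only for illustration.
ROUTES = {
    ipaddress.ip_network("131.111.8.0/23"): "connected: deliver on subnet 8",
    ipaddress.ip_network("131.111.8.100/32"): "anycast: nearest recdns instance",
}


def next_hop(dst: str) -> str:
    """Choose a route the way a router does: the most specific matching prefix wins."""
    addr = ipaddress.ip_address(dst)
    matches = [net for net in ROUTES if addr in net]
    if not matches:
        return "default route"
    return ROUTES[max(matches, key=lambda net: net.prefixlen)]


print(next_hop("131.111.8.100"))  # anycast: nearest recdns instance
print(next_hop("131.111.9.17"))   # connected: deliver on subnet 8
```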

VRRP will be a massive improvement over our current setup, and it
should provide us a lot of the robustness that other places would
normally need anycast for, but with significantly less complexity. And
less complexity means less time before I can take the old machines out
of service.

After the new setup is in place, it might make sense for us to revisit
anycast. For instance, we could put recursive servers at other points
of presence where our server network does not reach (e.g. the
Addenbrooke's medical research site). But in practice there are not
many situations when our server network is unreachable but the rest of
the University data network is functioning, so it might not be worth
it.

Configuration management

The old machines are special snowflake servers. The new setup is
being managed by Ansible.

I first used Ansible in 2013 to set up the DHCP servers that were a
crucial part of the network renumbering we did when moving our main
office from the city centre to the West Cambridge site. I liked how
easy it was to get started with Ansible. The way its --check mode
(together with --diff) prints a diff of remote config file changes is
a killer feature for
me. And it uses ssh rather than rolling its own crypto and host
authentication like some other config management software.

I spent a lot of December working through the configuration of the new
servers, starting with the hidden master and an authoritative server
(a staging server which is a clone of the future live servers). It
felt like quite a lot of elapsed time without much visible progress,
though I was steadily knocking items off the list of things to get
done.

The best bit was the last day before the xmas break. The new recdns
hardware arrived on Monday 22nd, so I spent Tuesday racking them up
and getting them running.

My Ansible setup already included most of the special cases
required for the recdns servers, so I just uncommented their hostnames
in the inventory file and told Ansible to run the playbook. It pretty
much Just Worked, which was extremely pleasing :-) All that steady
work paid off big time.

Multi-VLAN network setup

The main part of the recdns config which did not work was the network
interface configuration, which was OK because I didn't expect it to
work without fiddling.

The recdns servers are plugged into switch ports which present subnet
8 untagged (mainly to support initial bootstrap without requiring
special setup of the machine's BIOS), and subnet 12 with VLAN tags
(VLAN number 812). Each server has its own IPv4 and IPv6 addresses on
subnet 8 and subnet 12.
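Each server's addresses follow a fixed pattern: the interfaces template later in this post fills a single per-host number, ifnum, into the last IPv4 octet and the last IPv6 group on both subnets. A minimal Python sketch of that derivation, assuming nothing beyond the template's own address patterns (the host number 42 used below is made up):

```python
from __future__ import annotations

import ipaddress


def host_addresses(ifnum: int) -> dict[str, str]:
    """Derive one server's four addresses from its host number, as the template does."""
    n = str(ifnum)
    addrs = {
        "subnet 8 IPv4": f"131.111.8.{n}",
        "subnet 8 IPv6": f"2001:630:212:8::d:{n}",
        "subnet 12 IPv4": f"131.111.12.{n}",
        "subnet 12 IPv6": f"2001:630:212:12::d:{n}",
    }
    for text in addrs.values():
        ipaddress.ip_address(text)  # raises ValueError for an unusable ifnum
    return addrs


for label, addr in host_addresses(42).items():
    print(f"{label}: {addr}")
```

Because the digits are pasted in textually, the IPv6 group is read as hexadecimal: ifnum 42 gives ...::d:42, which is 0x42 rather than decimal 42. Parsing each result with ipaddress catches host numbers that cannot work, such as anything above 255 for IPv4.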

The service addresses recdns0 (subnet 8) and recdns1 (subnet 12) will
be additional (virtual) addresses which can be brought up on any of
the four servers. They will usually be configured something like:

  • recdns-wcdc: VRRP master for recdns0
  • recdns-rnb: VRRP backup for recdns0
  • recdns-sby: VRRP backup for recdns1
  • recdns-cnh: VRRP master for recdns1

And in case of multi-site failures, the recdns1 servers will act as
additional backups for the recdns0 servers and vice versa.
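VRRP elects, for each virtual address, the live router with the highest priority as master; the others wait as backups and take over when the master's advertisements stop. In keepalived that priority is set per vrrp_instance. A minimal Python sketch of the election for the four servers, with made-up priorities chosen only to reproduce the roles above, including the cross-subnet backups. It models steady state only and ignores advertisement timers and preemption.

```python
from __future__ import annotations

# Hypothetical VRRP priorities (higher wins). Only the ordering matters; it
# mirrors the roles above, with each subnet's servers backing up the other's.
PRIORITIES = {
    "recdns0": {"recdns-wcdc": 150, "recdns-rnb": 100, "recdns-cnh": 50, "recdns-sby": 40},
    "recdns1": {"recdns-cnh": 150, "recdns-sby": 100, "recdns-wcdc": 50, "recdns-rnb": 40},
}

ALL = {"recdns-wcdc", "recdns-rnb", "recdns-sby", "recdns-cnh"}


def vrrp_master(vip: str, alive: set[str]) -> str | None:
    """Return which live server would hold the virtual address `vip`.

    Real VRRP breaks priority ties by primary IP address; this sketch
    sidesteps that by giving every server a distinct priority.
    """
    live = {host: prio for host, prio in PRIORITIES[vip].items() if host in alive}
    if not live:
        return None  # every server for this address is down
    return max(live, key=live.get)


print(vrrp_master("recdns0", ALL))                           # recdns-wcdc
print(vrrp_master("recdns0", ALL - {"recdns-wcdc"}))         # recdns-rnb
print(vrrp_master("recdns0", {"recdns-sby", "recdns-cnh"}))  # recdns-cnh
```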

There were two problems with my initial untested configuration.

The known problem was that I was likely to need policy routing, to
ensure that packets with a subnet 12 source address were sent out with
VLAN 812 tags. This turned out to be true for IPv4, whereas IPv6 does
the Right Thing by default.
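Policy routing here means choosing the routing table by source address, so that packets sent from a subnet 12 address (the server's own, or recdns1 when this server holds it) leave via the subnet 12 router on VLAN 812 instead of via the main table's default route on untagged subnet 8. A simplified Python model of that decision, assuming subnet 12 is 131.111.12.0/24 as the address and netmask in the interfaces template imply; the route descriptions are illustrative labels, not real router addresses. The rule list is tried in order and falls through to the main table, mirroring how an extra ip rule is consulted before the main table.

```python
import ipaddress

# Assumption: subnet 12 is 131.111.12.0/24 (address + netmask 24 in the template).
SUBNET12 = ipaddress.ip_network("131.111.12.0/24")

# Illustrative descriptions of each table's default route; no real gateways.
TABLES = {
    "12": "default via subnet 12 router, out of em1.812 (VLAN 812 tagged)",
    "main": "default via subnet 8 router, out of em1 (untagged)",
}

# Rules are tried in order, like `ip rule` priorities; None matches any source.
RULES = [(SUBNET12, "12"), (None, "main")]


def route_for(src: str) -> str:
    """Pick the routing table for an outgoing IPv4 packet by its source address."""
    addr = ipaddress.ip_address(src)
    for prefix, table in RULES:
        if prefix is None or addr in prefix:
            return TABLES[table]
    raise LookupError("no rule matched")  # unreachable with a catch-all rule


print(route_for("131.111.12.42"))
print(route_for("131.111.8.42"))
```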

The unknown problem was that the VLAN 812 interface came up only
half-configured: it was using SLAAC for IPv6 instead of the static
address that I specified. This took a while to debug. The clue to the
solution came from running ifup with the -v flag to
get it to print out what it was doing:

# ip link delete em1.812
# ifup -v em1.812

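This showed that interface configuration was failing when it tried to
set up the default route on that interface, because there can be only
one default route and there was already one on the main subnet 8
interface. D'oh! (Presumably that failure also stopped ifup before it
reached the static IPv6 settings, which is why SLAAC had taken over.)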

Having got ifup to run to completion I was able to verify
that the subnet 12 routing worked for IPv6 but not for IPv4, pretty
much as expected. With advice from my colleagues David McBride and
Anton Altaparmakov I added the necessary runes to the configuration.

My final /etc/network/interfaces files on the recdns servers
are generated from the following Jinja template:

# This file describes the network interfaces available on the system
# and how to activate them. For more information, see interfaces(5).

# NOTE: There must be only one "gateway" line because there can be
# only one default route. Interface configuration will fail part-way
# through when you bring up a second interface with a gateway
# specification.

# The loopback network interface
auto lo
iface lo inet loopback

# The primary network interface, on subnet 8
auto em1

iface em1 inet static
      address 131.111.8.{{ ifnum }}
      netmask 23

iface em1 inet6 static
      address 2001:630:212:8::d:{{ ifnum }}
      netmask 64

# VLAN tagged interface on subnet 12
auto em1.812

iface em1.812 inet static
      address 131.111.12.{{ ifnum }}
      netmask 24

      # send packets with subnet 12 source address
      # through routing table 12 to subnet 12 router

      up   ip -4 rule  add from table 12
      down ip -4 rule  del from table 12
      up   ip -4 route add default table 12 via
      down ip -4 route del default table 12 via

iface em1.812 inet6 static
      address 2001:630:212:12::d:{{ ifnum }}
      netmask 64

      # auto-configured routing works OK for IPv6

# eof
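The NOTE at the top of the template records the mistake that left em1.812 half-configured, and a check before deployment could catch it. A small, hypothetical Python check (not part of the setup described here) finds every iface stanza with a gateway option and complains if more than one sets a default route in the same address family. It counts per family because IPv4 and IPv6 have separate routing tables; the default route added by the up commands lives in table 12, so it does not clash with the main table and is not counted. It assumes the interfaces(5) rule that comments occupy whole lines.

```python
from __future__ import annotations


def gateway_stanzas(text: str) -> list[tuple[str, str]]:
    """Return (interface, family) for each iface stanza that has a gateway option."""
    found: list[tuple[str, str]] = []
    current: tuple[str, str] | None = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):  # interfaces(5): comments are whole lines
            continue
        words = line.split()
        if words[0] == "iface" and len(words) >= 3:
            current = (words[1], words[2])  # e.g. ("em1.812", "inet")
        elif words[0] in ("auto", "mapping", "source", "source-directory") or words[0].startswith("allow-"):
            current = None  # a non-iface stanza ends the previous iface stanza
        elif words[0] == "gateway" and current is not None:
            found.append(current)
    return found


def check_single_gateway(text: str) -> None:
    """Raise ValueError if two stanzas in one address family both set a gateway."""
    by_family: dict[str, list[str]] = {}
    for iface, family in gateway_stanzas(text):
        by_family.setdefault(family, []).append(iface)
    for family, ifaces in by_family.items():
        if len(ifaces) > 1:
            raise ValueError(
                f"{family}: gateway on {', '.join(ifaces)}; only one default route fits"
            )
```

Run against the rendered file, for example with check_single_gateway(Path("/etc/network/interfaces").read_text()), it would raise ValueError if both the em1 and em1.812 inet stanzas carried a gateway line.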