Anycasting with Docker and Exabgp

Context

I’ve been playing with the idea of Anycasting some of my services for a while.

The ideal candidate for this is syslog (and this is what this blog post will focus on), because quite a few of the products I run support only a single syslog endpoint; Watchguard firewalls are a perfect example of this.

In my current architecture pretty much all non-Windows services run as Docker containers on a number of non-clustered CoreOS servers. There is no need to cluster them, as they either operate independently of each other
or replicate at the application layer (e.g. DNS master/slave replication).

Here is how two independent logstash instances receiving syslog messages look on these nodes.

ROH:

9525f3def0dd        logstash:latest             "/docker-entrypoin..."   9 days ago          Up 20 hours             0.0.0.0:1514->1514/tcp, 0.0.0.0:5044->5044/tcp, 0.0.0.0:9600->9600/tcp, 0.0.0.0:514->10514/udp, 0.0.0.0:32768->10514/tcp         logstash

MFC:

2a4f9fe37700        logstash:latest             "/docker-entrypoin..."   11 days ago          Up 8 hours             0.0.0.0:1514->1514/tcp, 0.0.0.0:5044->5044/tcp, 0.0.0.0:9600->9600/tcp, 0.0.0.0:514->10514/udp, 0.0.0.0:32768->10514/tcp         logstash

These two run in two geographically distributed sites, ROH being in Asia, and MFC in Europe.

Idea

As I mentioned above, the idea is to allow for downtime on one of these nodes without losing (too many) syslog messages.

I initially looked at Kubernetes and Calico as suggested in various places on the internet, but I gave up at the stage of reading the documentation, simply because this would mean redesigning the way I work with my containers.

So instead I created a loopback address on the CoreOS host and started experimenting with bird to announce that loopback to the neighbouring router.
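
For reference, the manual loopback step was roughly this; a minimal sketch, assuming the 10.255.255.1 anycast address used later in this post:

# add the anycast address to the loopback interface on the CoreOS host
sudo ip addr add 10.255.255.1/32 dev lo

# confirm it is present
ip addr show dev lo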

I then realised that I needed to track the health of the containers, because the fact that the CoreOS node is up does not necessarily mean that the specific container, or more importantly the service within that container, is also fine.
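
A quick way to see the difference is to probe the service itself rather than just the host; for example, with the same nc check that the health check below uses:

# the host being reachable does not mean logstash is listening;
# this exits 0 only when something is accepting connections on the syslog port
nc -z -w2 localhost 1514; echo $?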

While googling for a health check solution I came across this Reddit discussion where ExaBGP is mentioned.

So I replaced bird with ExaBGP, and tried a few quick tests:

Containers up in both locations

$ show ip bgp 10.255.255.1  
BGP routing table entry for 10.255.255.1/32
Paths: (2 available, best #1, table Default-IP-Routing-Table)
  Not advertised to any peer
  65001 65003
    10.255.254.21 from 10.255.254.21 (10.255.1.254)
      Origin IGP, metric 0, localpref 100, valid, external, best
      Last update: Thu Dec  28 20:01:48 2017
  65129 65131
    10.255.254.9 from 10.255.254.9 (10.255.129.254)
      Origin IGP, metric 0, localpref 100, valid, external
      Last update: Thu Dec  28 03:08:36 2017

One Container down

$ show ip bgp 10.255.255.1  
BGP routing table entry for 10.255.255.1/32
Paths: (1 available, best #1, table Default-IP-Routing-Table)
  Not advertised to any peer
  65129 65131
    10.255.254.9 from 10.255.254.9 (10.255.129.254)
      Origin IGP, metric 0, localpref 100, valid, external
      Last update: Thu Dec 28 03:08:36 2017

I then thought about this a little longer, and realised that every Anycasted service should have its own separate /32 address, so that if the underlying container is down, its route can be withdrawn without affecting other services
running on the same CoreOS instance.

Luckily, ExaBGP’s health check supports adding the loopback addresses on demand. I struggled to get this to work, but it boiled down to running exabgp with NET_ADMIN capabilities AND as root (user = root in exabgp.env, to be specific).
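
For reference, the relevant part of exabgp.env ends up looking like this; a minimal sketch, assuming the stock section name used by exabgp 3.x:

[exabgp.daemon]
user = root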

My exabgp container mikenowak/exabgp already has this set up, so grab that and don’t forget to star it.

And here is the config used in the above example:

neighbor 10.255.3.254 {
        router-id 10.255.3.4;
        local-as 65003;
        peer-as 65001;
        md5-password 'PASSWORD';

        api services {
                processes [ watch-loghost ];
        }
}

process watch-loghost {
        encoder text;
        run python -m exabgp healthcheck --cmd "nc -z -w2 localhost 1514" --no-syslog --label loghost --withdraw-on-down --ip 10.255.255.1/32;
}
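
If I were to Anycast a second service from the same node, it would get its own /32 and its own health check process (plus an entry in the processes list above). A hypothetical sketch; the dns label, the port 53 check and the 10.255.255.2/32 address are made up for illustration:

process watch-dns {
        encoder text;
        run python -m exabgp healthcheck --cmd "nc -z -w2 localhost 53" --no-syslog --label dns --withdraw-on-down --ip 10.255.255.2/32;
}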

Quite happy with the results, I built the mikenowak/exabgp Docker container to wrap everything up in one simple place.

This can now be run as follows:

docker run -d --name exabgp --restart always -p 10.255.3.4:179:179 --cap-add=NET_ADMIN --net=host -v exabgp_usr_etc_exabgp:/usr/etc/exabgp mikenowak/exabgp
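
Assuming the setup above, a quick way to confirm everything is working from the CoreOS side:

# check the exabgp container output for the BGP session and health check activity
docker logs exabgp

# the anycast /32 should be present on the loopback while the service is healthy
ip addr show dev lo | grep 10.255.255.1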

A few things to remember:

  • If you are receiving syslog over TCP+SSL, make sure that the server certificate has the proper SAN; in my case I use loghost.domain.local.
  • Docker containers will listen on all IP addresses of the CoreOS host (primary + loopbacks) unless you pass something like `-p 1.2.3.4:514:514/udp` when starting them. I do that for all containers by default, but NOT for the Anycasted ones. Scanning the anycast addresses with Nessus could easily produce inconsistent results whenever one container is at a different version than the other, so I instead scan the CoreOS primary IP (which the unpinned Anycasted containers also answer on) and call it a day. See the sketch below.
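
A minimal sketch of the difference (someapp and port 8080 are made-up placeholders; the logstash port mappings are the ones shown earlier):

# regular container: pinned to the primary IP of the CoreOS host
docker run -d --name someapp -p 10.255.3.4:8080:8080 someapp:latest

# Anycasted logstash: no IP in the -p mappings, so it also answers on the anycast loopback
docker run -d --name logstash -p 514:10514/udp -p 1514:1514 logstash:latest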

Hope this helps somebody!