Disabling TLS 1.0: Veeam Backup & Replication 9.5 failed to truncate Microsoft SQL Server transaction logs – 0x80004005

A recent vulnerability scan flagged that the GPO to disable TLS 1.0 (think PCI DSS) hadn’t been picked up by all the systems it should have been.

The reasons why it wasn’t are out of scope of this blog post.

What is worth discussing, however, is that once this was rectified and TLS 1.0 was disabled, Veeam Backup & Replication 9.5 could no longer do application-aware processing on the MS SQL instances running on these machines.

Here is what the Veeam console reported:

Unable to truncate Microsoft SQL Server transaction logs. Details: Failed to process 'TruncateSQLLog' command. Failed to truncate SQL server transaction logs for instances: MYINSTANCE. See guest helper log.

OK, so checking the guest helper log, I found these entries:

7/11/2018 3:23:04 AM   4328                  Using default SQL provider 'sqloledb' to connect to SQL server
7/11/2018 3:23:04 AM   4328  INFO            Connecting to mssql, connection string: Provider='sqloledb';Data Source='(local)';Integrated Security='SSPI';Persist Security Info=False, timeout: 15
7/11/2018 3:23:22 AM   4328  WARN                	Code = 0x80004005
7/11/2018 3:23:22 AM   4328  WARN                	Code meaning = Unspecified error
7/11/2018 3:23:22 AM   4328  WARN                	Source = Microsoft OLE DB Provider for SQL Server
7/11/2018 3:23:22 AM   4328  WARN                	Description = [DBNETLIB][ConnectionOpen (SECCreateCredentials()).]SSL Security error.
7/11/2018 3:23:22 AM   4328  WARN                COM error:  Code: 0x80004005

Once I saw that SSL Security error, it was obvious to me that this was related to the recent TLS 1.0 disablement.

Just to be sure, I re-enabled TLS 1.0 on one of the affected machines, and the warning went away on the next scheduled backup.

While I can’t find the exact URL of the MSDN post that led me to the resolution, it suggested that enabling the SQL Native Client provider should do the trick.

This needs to be done on every machine running MS SQL that Veeam backs up using application-aware processing.

So, in order to get that going, create the following registry value (DWORD) and set it to 1:

HKEY_LOCAL_MACHINE\SOFTWARE\Veeam\Veeam Backup and Replication\UseSqlNativeClientProvider
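
For reference, a one-liner from an elevated command prompt should do it (a sketch; the key path is exactly as above):

reg add "HKLM\SOFTWARE\Veeam\Veeam Backup and Replication" /v UseSqlNativeClientProvider /t REG_DWORD /d 1 /f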

I also rebooted the machine, though I’m not sure that was required.

Anycasting with Docker and ExaBGP

Context

I’ve been playing with the idea of anycasting some of my services for a while.

The ideal candidate for this is syslog (and it is what this blog post will focus on), because quite a few products that I have support only a single syslog endpoint; Watchguard firewalls are a perfect example.

In my current architecture pretty much all non-Windows services run as Docker containers on a number of non-clustered CoreOS servers. There is no need to cluster them, as they either operate independently of each other or replicate at the application layer (e.g. DNS master/slave replication).

Here is what two independent Logstash instances receiving syslog messages look like on these nodes.

ROH:

9525f3def0dd        logstash:latest             "/docker-entrypoin..."   9 days ago          Up 20 hours             0.0.0.0:1514->1514/tcp, 0.0.0.0:5044->5044/tcp, 0.0.0.0:9600->9600/tcp, 0.0.0.0:514->10514/udp, 0.0.0.0:32768->10514/tcp         logstash

MFC:

2a4f9fe37700        logstash:latest             "/docker-entrypoin..."   11 days ago          Up 8 hours             0.0.0.0:1514->1514/tcp, 0.0.0.0:5044->5044/tcp, 0.0.0.0:9600->9600/tcp, 0.0.0.0:514->10514/udp, 0.0.0.0:32768->10514/tcp         logstash

These two run in two geographically distributed sites, ROH being in Asia, and MFC in Europe.

Idea

As I mentioned previously, the idea is to allow for downtime of one of these nodes without losing (too many) syslog messages.

I initially looked at Kubernetes and Calico, as suggested in various places on the internet, but I gave up at the stage of reading the documentation, simply because this would have meant redesigning the way I work with my containers.

So I instead created a loopback address on the CoreOS host and started experimenting with bird to announce that loopback to the neighbouring router.
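
For reference, adding such a loopback by hand is a one-liner; this sketch uses the 10.255.255.1 anycast address that appears later in this post:

sudo ip address add 10.255.255.1/32 dev lo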

I then realised that I had to track the health of the containers, because the fact that the CoreOS node is up does not necessarily mean that the specific container, or even more importantly the service within that container, is also fine.

While googling for a health-check solution, I came across this Reddit discussion where ExaBGP is mentioned.

So I replaced bird with ExaBGP and tried a few quick tests:

Containers up in both locations

$ show ip bgp 10.255.255.1  
BGP routing table entry for 10.255.255.1/32
Paths: (2 available, best #1, table Default-IP-Routing-Table)
  Not advertised to any peer
  65001 65003
    10.255.254.21 from 10.255.254.21 (10.255.1.254)
      Origin IGP, metric 0, localpref 100, valid, external, best
      Last update: Thu Dec  28 20:01:48 2017
  65129 65131
    10.255.254.9 from 10.255.254.9 (10.255.129.254)
      Origin IGP, metric 0, localpref 100, valid, external
      Last update: Thu Dec  28 03:08:36 2017

One Container down

$ show ip bgp 10.255.255.1  
BGP routing table entry for 10.255.255.1/32
Paths: (1 available, best #1, table Default-IP-Routing-Table)
  Not advertised to any peer
  65129 65131
    10.255.254.9 from 10.255.254.9 (10.255.129.254)
      Origin IGP, metric 0, localpref 100, valid, external
      Last update: Thu Dec 28 03:08:36 2017

I then thought about this a little longer, and realised that every anycasted service should have its own /32, so that if the underlying container is down, its route can be withdrawn without affecting other services running on the same CoreOS instance.

Luckily, ExaBGP’s healthcheck supports adding loopback addresses on demand. I struggled to get this to work, but it boiled down to running exabgp with the NET_ADMIN capability AND as root (user = root in exabgp.env, to be specific).
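
For reference, the relevant part of exabgp.env looked something like this (exabgp 3.x env-file format; section and key names as in the stock file):

[exabgp.daemon]
user = 'root'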

My exabgp container, mikenowak/exabgp, has this already set up, so grab it and don’t forget to star it.

And here is the config used in the above example:

neighbor 10.255.3.254 {
        router-id 10.255.3.4;
        local-as 65003;
        peer-as 65001;
        md5-password 'PASSWORD';

        api services {
                processes [ watch-loghost ];
        }
}

process watch-loghost {
        encoder text;
        run python -m exabgp healthcheck --cmd "nc -z -w2 localhost 1514" --no-syslog --label loghost --withdraw-on-down --ip 10.255.255.1/32;
}
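
The --cmd argument is simply a command whose exit status drives the announce/withdraw decision, so you can test the health check by hand on the node:

$ nc -z -w2 localhost 1514 && echo up || echo down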

Quite happy with the results, I built that mikenowak/exabgp Docker container to wrap everything in one simple place.

This can now be run as follows:

docker run -d --name exabgp --restart always -p 10.255.3.4:179:179 --cap-add=NET_ADMIN --net=host -v exabgp_usr_etc_exabgp:/usr/etc/exabgp mikenowak/exabgp

A few things to remember:

  • If you are receiving TCP+SSL syslog, make sure that the server certificate has the proper SAN; in my case I use loghost.domain.local.
  • Docker containers will listen on all IP addresses of the CoreOS host (primary + loopbacks) unless you specify `-p 1.2.3.4:514:514/udp` as an argument when starting them. I do that for all containers by default, but NOT for anycasted containers. The reason is that it could easily confuse Nessus into producing inconsistent results when scanning the anycast addresses, if one container is at a different version than the other. So I instead scan the CoreOS primary IP and call it a day.

Hope this helps somebody!

Mind your MTU. A tale of UniFi, EdgeRouter-X, IPSec and NPS.

As I previously wrote here, I’ve replaced one of the Watchguards with a UniFi AP and an EdgeRouter X. Everything was pretty much fine, until we started converting wired computers to wireless in an effort to get rid of some obscure cabling.

To give you a bit of background: in this setup the domain-joined wireless clients authenticate to the network using EAP-TLS against an NPS RADIUS server.

I have this setup working perfectly fine behind Watchguards in other locations, so I basically replicated the settings on the UniFi controller, but the clients refused to join the network for some reason.

So there I was, looking at the incredibly difficult to read accounting logs on the NPS server, where it appeared that the clients were completing the authentication just fine. Well, at least a <Reason-Code data_type="0">0</Reason-Code> was being logged. Anyway, my eyes got tired pretty fast looking at that stuff!

I then saw that others on the Internet had a bunch of NPS events in their Event Log while mine was pretty empty, so I spent a day trying to get NPS event logging to work.

When I finally got it to work, I saw this event being logged:

Authentication Details:
    Connection Request Policy Name:  MY-WIFI-NETWORK
    Network Policy Name:             -
    Authentication Provider:         Windows
    Authentication Server:           domain-controller.local
    Authentication Type:             -
    EAP Type:                        -
    Account Session Identifier:      -
    Reason Code:                     3
    Reason:                          The RADIUS Request message that Network Policy Server received from the network access server was malformed.

A malformed request, you say? Well, OK, I accept the challenge!

So I ran a packet capture, and got this:

16:22:28.767060 IP unifi-ap.34381 > domain-controller.radius: RADIUS, Access-Request (1), id: 0x5a length: 198
16:22:28.802027 IP domain-controller.radius > unifi-ap.34381: RADIUS, Access-Challenge (11), id: 0x5a length: 90
16:22:28.811600 IP unifi-ap.34381 > domain-controller.radius: RADIUS, Access-Request (1), id: 0x5b length: 312
16:22:28.847224 IP domain-controller.radius > unifi-ap.34381: RADIUS, Access-Challenge (11), id: 0x5b length: 1472
16:22:28.847289 IP domain-controller > unifi-ap: udp
16:22:28.851982 IP unifi-ap.34381 > domain-controller.radius: RADIUS, Access-Request (1), id: 0x5c length: 213
16:22:28.884595 IP domain-controller.radius > unifi-ap.34381: RADIUS, Access-Challenge (11), id: 0x5c length: 1472
16:22:28.884655 IP domain-controller > unifi-ap: udp
16:22:28.889344 IP unifi-ap.34381 > domain-controller.radius: RADIUS, Access-Request (1), id: 0x5d length: 213
16:22:28.921571 IP domain-controller.radius > unifi-ap.34381: RADIUS, Access-Challenge (11), id: 0x5d length: 932
16:22:28.960512 IP unifi-ap.34381 > domain-controller.radius: RADIUS, Access-Request (1), id: 0x5e length: 1472
16:22:28.960530 IP unifi-ap > domain-controller: udp
16:22:31.960962 IP unifi-ap.34381 > domain-controller.radius: RADIUS, Access-Request (1), id: 0x5e length: 1472
16:22:31.960969 IP unifi-ap > domain-controller: udp
16:22:37.961414 IP unifi-ap.34381 > domain-controller.radius: RADIUS, Access-Request (1), id: 0x5e length: 1472

That didn’t tell me much, so I tried verbose mode, and got this back:

15:01:14.454542 IP (tos 0x0, ttl 64, id 16451, offset 0, flags [+], proto UDP (17), length 1500)
    unifi-ap.32887 > domain-controller.radius: RADIUS, length: 1472
        Access-Request (1), id: 0xb0, Authenticator: fd09f3b2dcd8dd0d07d0cad52894ffa
          User-Name Attribute (1), length: 26, Value: host/windows7.local
          NAS-IP-Address Attribute (4), length: 6, Value: unifi-ap
          NAS-Identifier Attribute (32), length: 14, Value: f09fc229df71
          NAS-Port Attribute (5), length: 6, Value: 0
          Called-Station-Id Attribute (30), length: 29, Value: XX-XX-XX-XX-XX-XX:MY-WIFI-NETWORK
          Calling-Station-Id Attribute (31), length: 19, Value: XX-XX-XX-XX-XX-XX
          Framed-MTU Attribute (12), length: 6, Value: 1400
          NAS-Port-Type Attribute (61), length: 6, Value: Wireless - IEEE 802.11
          Connect-Info Attribute (77), length: 23, Value: CONNECT 0Mbps 802.11b
          EAP-Message Attribute (79), length: 255, Value: .F....
          EAP-Message Attribute (79), length: 255, Value: ..
          EAP-Message Attribute (79), length: 255, Value: A.2F2.l..0..9...zF?....
          EAP-Message Attribute (79), length: 255, Value: CA.crl0m..+........a0_0]..+.....0..Qhttp://pki.local/Enterprise%20Certificate%20Authority.crt0...*.H.......
          EAP-Message Attribute (79), length: 255, Value: c.I&....pBt.......6...b.......K&...."za...\.&.z..o.`^.O.k.x.Ox..b]{f........)U.L.+.&&f▒j..%.^Cw.\...z.~..$.........[7..A..g..0...L..4.{.z.LY....NY.O.o..B.XRLM6...>R!.E........a....... t.....0..,.a.u.l.Q..|..K..Q..4yz..M...K..H.......e;p'.wd..A..^...o~.>
          EAP-Message Attribute (79), length: 229 (bogus, goes past end of packet)

That “bogus, goes past end of packet” caught my eye immediately, and then I noticed the packet length, which appeared strangely big for an IPsec-protected GRE tunnel.

So I googled and googled, and found that one way around this is to reduce the MTU on the GRE interfaces.

However, I also came across the MSS clamp, which appears less intrusive, as it puts the overhead of managing the packet size on the end device rather than the router.

My calculations for the MSS clamp are as follows:

  1500  Ethernet MTU
  - 20  TCP header *
  - 20  IP header *
  - 20  IPsec header
  - 52  ESP header
  - 24  GRE header
  ------
  1364

So I rounded it down to 1360 for good measure, committed, and… nothing happened!

Of course, I forgot: RADIUS traffic is UDP, and the MSS clamp applies to TCP only. I am leaving it in place anyway, as quite a few people on the Ubiquiti forums complained about dodgy TCP traffic over IPsec on these devices, and now that I think of it, this might have been the root cause of another issue with flaky RDP to that site.

So the maths to get to the right MTU size is the same as the MSS clamp calculation above, minus the items with the asterisk (the IP and TCP headers): 1364 + 20 + 20 = 1404. Let’s round that down to 1400, as recommended by Cisco.

And this is set as follows on the EdgeRouter-X:

set firewall options mss-clamp interface-type tun
set firewall options mss-clamp mss 1360
set interfaces tunnel tun0 mtu 1400

Upon commit, the clients began authenticating successfully.
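
To sanity-check the new path MTU from a Linux client behind the tunnel, you can ping across it with fragmentation prohibited; 1372 bytes of ICMP payload plus 28 bytes of IP/ICMP headers comes to exactly 1400:

ping -M do -s 1372 domain-controller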

So there it is; the lesson for today is: Mind your MTU.

Working around the read-only file systems in CoreOS with overlay

I had a specific use case: placing quiesce scripts on a CoreOS instance running in a VMware virtual machine, so that I could take consistent backups with Veeam.

While I generally agree this is a bad idea, and I admit that I store most of the important stuff in git, there are times when I am lazy in development and just want to have a backup of some sort.

So right back to the subject, shall we?

Of course, building my own image and keeping it up to date is one of the options, but let’s call it plan Z for the moment.

Luckily, an overlay mount can be used to work around the fact that /usr is a read-only partition.

I decided to keep the scripts in /opt/sbin (as this location is read-write and persists across reboots).

It is as simple as:

mkdir /opt/sbin
mount -o "lowerdir=/usr/sbin:/opt/sbin" -t overlay overlay /usr/sbin

Also, in order to survive reboots, we need the following systemd mount unit:

[Unit]
Description=Overlay mount for /usr/sbin
Before=local-fs.target
ConditionPathExists=/opt/sbin

[Mount]
Type=overlay
What=overlay
Where=/usr/sbin
Options=lowerdir=/usr/sbin:/opt/sbin

[Install]
WantedBy=local-fs.target
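
Note that systemd requires mount units to be named after the mount point, so the unit above needs to be saved as usr-sbin.mount; a sketch of installing and enabling it:

sudo cp usr-sbin.mount /etc/systemd/system/usr-sbin.mount
sudo systemctl daemon-reload
sudo systemctl enable --now usr-sbin.mount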

Finally, here are the quiesce scripts that I use.

The /usr/sbin/pre-freeze-script script shuts down all the Docker containers.

$ cat /usr/sbin/pre-freeze-script
#!/bin/bash
docker stop $(docker ps -aq) >/dev/null 2>&1

The /usr/sbin/post-thaw-script script restarts docker.service. This forces all containers to start up in the right order (think legacy links). I attempted to write logic to start the containers without a service restart, but that became pretty complex code with no added benefit, so I just gave up.

$ cat /usr/sbin/post-thaw-script
#!/bin/bash
systemctl restart docker.service >/dev/null 2>&1

NPS Authentication events not showing up in Event Log

While debugging EAP-TLS authentication between a Windows 7 desktop and a Windows Server 2016 NPS, I noticed that the Event Log for Network Policy and Access Services was pretty empty compared to the screenshots I had found while talking to Google.

The only Event IDs that I could see at the time were 4400, generated when NPS connects to AD (LDAP), and 13, generated when Nessus scans the network overnight.
There were none of the authentication events (6272 and 6278) that I had seen on the Internet.

I double-checked NPS event logging, and it was indeed enabled.

Then I came across an article suggesting that Network Policy Server (NPS) may not log successful or failed authentication events in the Security log in Event Viewer. It actually talks about Windows 2008, but I decided to give it a go anyway, and it didn’t work either.

Finally a colleague suggested checking the audit settings in GPO, as he had recollections of changing something there in the rather distant past.

So I talked to Google again and figured out that a new GPO enabling success and failure auditing for Network Policy Server (under Advanced Audit Policy Configuration > Logon/Logoff) would be in order.
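
For reference, the equivalent local setting can be inspected and enabled with auditpol; this is a sketch of what the GPO effectively sets:

auditpol /get /subcategory:"Network Policy Server"
auditpol /set /subcategory:"Network Policy Server" /success:enable /failure:enable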

I applied that to the NPS servers and BOOM! The authentication events finally started to show up in the Event Log.

Update

I checked the archives from ELK going back to 2015, when 802.1X was originally set up, and I never saw any of these 6272 Event IDs, not even on NPS running on Windows 2012 R2, so my guess is that this auditing is disabled by default and has to be explicitly enabled.

All in all, this exercise was worth the trouble, because the Event Log is much easier to read than that funny accounting log.

New ESXi installation findings in Nessus

While running a vulnerability scan against a new system that I am building for a colo, I had Nessus pick these up on a brand new ESXi install.

SSL Certificate Cannot Be Trusted

Self-signed certificates are a real nightmare for implementing a successful security programme, as they form bad habits among the user base (“I will accept that certificate without thinking, and then any other that I come across”), and as such they should be replaced with properly signed certificates before the service/application reaches the production stage.

Anyway, the procedure below describes the steps needed to replace the self-signed certificates that came with the ESXi with your own.

Firstly, generate a proper CSR with the below command:

openssl req \
-new \
-newkey rsa:4096 \
-nodes \
-subj "/C=AQ/ST=Antarctica/L=South Pole/O=mikenowak.org/CN=esxi1.mikenowak.org" \
-reqexts SAN \
-config <(cat /etc/ssl/openssl.cnf \
<(printf "[SAN]\nsubjectAltName='DNS:esxi1.mikenowak.org'")) \
-keyout ${1}.key \
-out ${1}.csr

The reason I stress *proper* above is that the CSRs generated from within the ESXi UI (at the time of writing this post) lack the SAN (subjectAltName) attribute.

Now, starting with version 58, Chrome deprecated CN subject matching, and certificates without a valid SAN are rejected with NET::ERR_CERT_COMMON_NAME_INVALID, like in the screenshot below.

Now have that CSR signed by your Corporate CA or some other CA trusted by your clients.

With both the key and the signed certificate at hand, we can now install them on the ESXi server in the following locations (see the copy commands after the list):

* Key: /etc/vmware/ssl/rui.key
* Certificate: /etc/vmware/ssl/rui.crt
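
With SSH enabled on the ESXi host, copying them into place could look like this (the local file names are hypothetical; the hostname matches the CSR above):

scp esxi1.key root@esxi1.mikenowak.org:/etc/vmware/ssl/rui.key
scp esxi1.crt root@esxi1.mikenowak.org:/etc/vmware/ssl/rui.crt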

NB: It is also a good idea to add your Corporate CA Certificate to /etc/vmware/ssl/castore.pem if you’re planning on shipping syslog from ESXi to Splunk over TCP+SSL.

Save the changes by running auto-backup.sh

Finally, run /etc/init.d/rhttpproxy restart to restart the rhttpproxy service.

SSH Server CBC Mode Ciphers Enabled

So, for reference, the solution is pretty simple and boils down to changing the ciphers and algorithms in /etc/ssh/sshd_config, as follows.

replace

Ciphers aes128-ctr,aes192-ctr,aes256-ctr,3des-cbc

MACs hmac-sha2-256,hmac-sha2-512,hmac-sha1

with

Ciphers chacha20-poly1305@openssh.com,aes256-gcm@openssh.com,aes128-gcm@openssh.com,aes256-ctr,aes192-ctr,aes128-ctr
KexAlgorithms curve25519-sha256@libssh.org,ecdh-sha2-nistp521,ecdh-sha2-nistp384,ecdh-sha2-nistp256,diffie-hellman-group-exchange-sha256
MACs hmac-sha2-512-etm@openssh.com,hmac-sha2-256-etm@openssh.com,umac-128-etm@openssh.com,hmac-sha2-512,hmac-sha2-256,umac-128@openssh.com

Now run auto-backup.sh to save the change, and restart sshd.
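
On ESXi that boils down to something like the following; the nmap line is an optional check from another box that CBC ciphers are no longer offered (it assumes nmap with the ssh2-enum-algos script):

/sbin/auto-backup.sh
/etc/init.d/SSH restart
nmap --script ssh2-enum-algos -p 22 esxi1.mikenowak.org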

I also highly recommend checking Mozilla’s OpenSSH security guidelines.

Good luck!

Homenet update September 2017

I recently went for a month-long vacation in Europe, and while there I managed to do some infrastructure clean-up that was long overdue.

Sayonara Watchguard

As of today most of my network runs on Watchguards. This is because I was able to get them very cheap, second-hand, on eBay. Plus, I had a deal with their reseller who, in exchange for some consulting I did for them in the past, was able to get me a decent price on the LiveSecurity support contracts. That changed recently, as the said reseller no longer carries Watchguard.

Not a big problem just yet, because I renewed most of the support contracts with a 3-year pack at the end of 2016.

The exception was one ancient Watchguard XTM22-W that had been EOL for a while, and that I couldn’t be bothered with until sometime this summer, when it had a close encounter with lightning that partially killed the thing.

I say partially, because it still pushed packets, did switching and all that, but the Wi-Fi and all the gigabit ports were as dead as it gets.

Hajimemashite UniFi

So, I’ve heard a lot of good things about Ubiquiti’s UniFi AP range, and for a while now I’ve been toying with the idea of trying them out; but because the Watchguards worked just fine, I couldn’t justify the purchase. Until now! Oh happy days!

I ordered the UAP-LR, the Long Range model, as we have a bit of land at this location and it would be nice to have connectivity in the yard.

I initially connected it to a 100M port on the Watchguard, since the 1G ports were all kaput.
I then set up the UniFi controller (I went the Docker image route) and, to be honest, I am loving it.

After playing with it for a few days, I convinced myself that it was worth the money, and that when the support on the other Watchguards expires, this is the way to go.

But that 100M port on the Watchguard didn’t really appeal to me, so one very late evening I pulled the trigger and ordered an EdgeRouter PoE.

To keep a long story short, this turned out to be a complete mistake and a waste of money, brainpower and time.
Not that I have anything against the model, but the truth is that I didn’t do my research properly and did not realise that it does not support VLAN-aware switching, which I need for this specific use case.

# set interfaces switch switch0 switch-port vlan-aware enable
Error: platform does not support this setting in switch0

Oh well, we live and learn.

Then, some $100 shorter, I convinced myself that I needed to be bothered to do the research properly this time!

This time I settled on the EdgeRouter X, which is VLAN-aware, has PoE pass-through, and is half the price of the ER-PoE model.
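
For the curious, here is roughly what the setting that failed above looks like when it works on the ER-X (a sketch; the VLAN IDs are made up for illustration):

set interfaces switch switch0 switch-port vlan-aware enable
set interfaces switch switch0 switch-port interface eth1 vlan vid 10
set interfaces switch switch0 switch-port interface eth2 vlan pvid 10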

I really love the PoE pass-through as this means that I can power up both the router and the AP from a single PoE injector.

At the time of writing this blog post this kit has been in service for just over 2 months and it’s been pretty much rock solid.

Oh and the HP Microserver G7

I’ve also finally shut down the old HP Microserver G7 that had been serving as my primary European Domain Controller, WSUS, WDS, and file server for a good few years now.

These services were virtualised and migrated to a newer HP Microserver Gen8 over the years, and the only reason the G7 wasn’t decommissioned earlier is that it doesn’t have iLO, so I couldn’t do it remotely. Oh yeah, I totally love the iLO functionality on the Gen8.

I then listed both the Microserver and the EdgeRouter PoE for sale on the Internets. The Microserver sold the day I listed it, and strangely I made a profit on it, if we count the HP cashback that I received when I originally bought it. Not bad!

Now, if only I could flog that EdgeRouter PoE, that would be great!

BGP filters on EdgeOS

Hello Internet Exchange

I’ve been playing with Bird lately, all in preparation for an anycast project that I hope will come my way later in the year.

Since my setup was pretty much contained within my homelab, I hit some limitations that stopped me from making progress in the direction I was hoping for.

To keep this introduction short, I am going to omit the details of the project and just say that I made the decision to expand my homelab and join an Internet Exchange.

I applied for an ASN via an LIR back in May, and AS62184 was assigned to me. I used LIR SERVICES to facilitate the application.

As for the exchange, I decided on KleyReX in Frankfurt, Germany, because at this time they offer a free 100M port, and keeping my costs down is of course a priority here.

To connect all of this up, I acquired a second-hand Ubiquiti EdgeRouter PRO and a 4GB memory upgrade (a HYNIX HMT351S6CFR8C-PB, to be specific), so that multiple full tables would fit comfortably.

It’s been shipped to Frankfurt and is operational now.

So what’s next?

Defining the Peering Policy

As soon as I connected to KleyReX, requests for peering started coming my way, but I wasn’t ready, and had to defer them until I had filters in place.

So, I needed filters, but what was I going to filter? Well, I needed a peering policy to define that.

With that in mind, I spent a few evenings researching what the big boys do and what the industry practices were. I then mixed that with what I already knew and wanted to do. The result was this list:

* MD5 Authentication is strongly preferred.
* Accept prefixes of length /24 and shorter for IPv4 and /48 or shorter for IPv6.
* Max-prefix filters are used for all peerings.
* Discard prefixes where NEXT_HOP doesn’t match the neighbour’s IP Address.
* Discard prefixes where first AS in the AS_PATH doesn’t match neighbour’s AS.
* Discard prefixes with Private AS anywhere in the AS_PATH.
* Discard bogon prefixes.

Now that I had a policy, I had to make sure it was enforced!

Building the filters

First, let’s define a new BGP neighbour:

set protocols bgp 206001 neighbor 193.189.82.197 description 'Google Inc. IPv4 (AS15169) via kleyrex'
set protocols bgp 206001 neighbor 193.189.82.197 remote-as 15169
set protocols bgp 206001 neighbor 193.189.82.197 password PASSWORD
set protocols bgp 206001 neighbor 193.189.82.197 route-map import import4-AS15169
set protocols bgp 206001 neighbor 193.189.82.197 route-map export export4-AS15169
set protocols bgp 206001 neighbor 193.189.82.197 address-family ipv6-unicast route-map import deny6
set protocols bgp 206001 neighbor 193.189.82.197 address-family ipv6-unicast route-map export deny6
set protocols bgp 206001 neighbor 193.189.82.197 soft-reconfiguration inbound
set protocols bgp 206001 neighbor 193.189.82.197 update-source 193.189.82.XXX
set protocols bgp 206001 neighbor 193.189.82.197 remove-private-as
set protocols bgp 206001 neighbor 193.189.82.197 maximum-prefix 15000

As you have probably noticed, the above takes care of both the “MD5 Authentication is strongly preferred” and “Max-prefix filters are used for all peerings” requirements.

Now, with that in place, let’s take each of the remaining policy points one by one and build a configuration to enforce them.

Accept prefixes of length /24 and shorter for IPv4 and /48 or shorter for IPv6.

Note that the prefix-list entries permit the unwanted lengths, so that the route-map deny rule below matches (and discards) anything the list permits:

set policy prefix-list too-long-or-too-short4 rule 1 action permit
set policy prefix-list too-long-or-too-short4 rule 1 ge 25
set policy prefix-list too-long-or-too-short4 rule 1 prefix 0.0.0.0/0
set policy prefix-list too-long-or-too-short4 rule 2 action permit
set policy prefix-list too-long-or-too-short4 rule 2 le 7
set policy prefix-list too-long-or-too-short4 rule 2 prefix 0.0.0.0/0

set policy prefix-list6 too-long-or-too-short6 rule 1 action permit
set policy prefix-list6 too-long-or-too-short6 rule 1 ge 49
set policy prefix-list6 too-long-or-too-short6 rule 1 prefix '::/0'
set policy prefix-list6 too-long-or-too-short6 rule 2 action permit
set policy prefix-list6 too-long-or-too-short6 rule 2 le 7
set policy prefix-list6 too-long-or-too-short6 rule 2 prefix '::/0'

set policy route-map import4-AS15169 description "Google Inc. IPv4 IMPORT"
set policy route-map import4-AS15169 rule 1 action deny
set policy route-map import4-AS15169 rule 1 match ip address prefix-list too-long-or-too-short4

Discard prefixes with Private AS anywhere in the AS_PATH.

set policy as-path-list private rule 1 action permit
set policy as-path-list private rule 1 regex '_6(4(5(1[2-9]|[2-9][0-9])|[6-9][0-9][0-9])|5([0-4][0-9][0-9]|5([0-2][0-9]|3[0-5])))_'
set policy as-path-list private rule 2 action permit
set policy as-path-list private rule 2 regex '_42([0-8][0-9][0-9][0-9][0-9][0-9][0-9][0-9]|9([0-3][0-9][0-9][0-9][0-9][0-9][0-9]|4([0-8][0-9][0-9][0-9][0-9][0-9]|9([0-5][0-9][0-9][0-9][0-9]|6([0-6][0-9][0-9][0-9]|7([0-1][0-9][0-9]|2([0-8][0-9]|9[0-4])))))))_'

set policy route-map import4-AS15169 rule 2 action deny
set policy route-map import4-AS15169 rule 2 match as-path private

Only permit prefixes with a valid IRRDB entry (and therefore discard bogon prefixes). Rule 100 is the single permit rule; the match conditions added in this and the following sections are ANDed together, while the catch-all deny sits at rule 999 so that it is evaluated last.

set policy route-map import4-AS15169 rule 100 action permit
set policy route-map import4-AS15169 rule 100 match ip address prefix-list prefix4-AS-GOOGLE
set policy route-map import4-AS15169 rule 999 action deny

We also need to build the list of prefixes to accept, but this subject is intentionally left for another blog post.

Discard prefixes where NEXT_HOP doesn’t match the neighbour’s IP Address.

set policy prefix-list nexthop4-AS15169 rule 1 action permit
set policy prefix-list nexthop4-AS15169 rule 1 prefix 193.189.82.197/32
set policy route-map import4-AS15169 rule 100 match ip nexthop prefix-list nexthop4-AS15169

Discard prefixes where first AS in the AS_PATH doesn’t match neighbour’s AS.

set policy as-path-list path-AS15169 rule 1 action permit
set policy as-path-list path-AS15169 rule 1 regex "^15169_"
set policy route-map import4-AS15169 rule 100 match as-path path-AS15169
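
Since soft-reconfiguration inbound is enabled on this neighbour, it is easy to eyeball what the filters are doing by comparing what was received with what was accepted:

show ip bgp neighbors 193.189.82.197 received-routes
show ip bgp neighbors 193.189.82.197 routes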

Automation

I wrapped everything discussed here in a set of scripts that I now publish as asn-tools; they have since been extended to support IPv6 neighbours and a RIPE whois import/export rules generator.

So the config used in the above example would be generated by confgen.rb (part of asn-tools) as follows:

Usage: confgen.rb [options]
    -r, --router=ROUTER    Router hostname
    -t, --type=TYPE        Peer type
    -a, --as=AS            AS Number
    -A, --all              Generate configuration for all peers
    -h, --help             Display this screen

$ confgen.rb -r rtr1 -t peering -a 15169

Backup Watchguard configs with Oxidized

In order to help a client meet a compliance requirement, I worked on a project to bring their Watchguard firewall configuration under version control.

The client had an existing oxidized installation that was backing up their routers and switches. The missing piece was the Watchguard firewalls.

I found that Watchguard devices were already supported by oxidized; the only challenge at the time was that the Watchguards run SSH on port 4118, as opposed to the default SSH port 22.

I quickly found that this could be addressed in oxidized.conf as follows:

source:
  default: csv
  csv:
    [...]
    vars_map:
      ssh_port: 4

and then in router.db we declare the device in a fashion similar to the snippet below:

hostname:firewareos:status:password:4118

Unfortunately, having set that up, I learned that oxidized could not connect to the Watchguard devices, because my client uses logon disclaimers that have to be accepted by typing “yes”.

This functionality wasn’t implemented in oxidized, so I added logon disclaimer support, and I was really amazed at how fast the PR was merged upstream.

The last piece of the puzzle was the requirement to run the config retrieval as a non-privileged user (in the spirit of least privilege), so another PR went upstream to implement that.

Now the configs are being properly backed up by oxidized, and the client confirmed that they were able to successfully test restoring a config from oxidized to a device, by simply loading the XML from oxidized into the XTM Policy Manager and saving it to the device.

Here is a screenshot of what it looks like in oxidized:

Update

The client had a failure of one of their Watchguard firewalls, and it was replaced by the vendor. They loaded the config from oxidized and found that the certificates were invalid.

I opened a case with Watchguard about this, and it was confirmed to me that the command oxidized uses to retrieve the config (export config to console) does not back up the following:

* admin/status passwords
* certificates
* feature keys

Fortunately, the client keeps the certificates in a separate offline repository, so they were able to restore them swiftly, but you’ve been warned.

Here is a quote from my ticket with Watchguard that explains this and the alternative in detail:

When you restore using the Configuration file (.xml file), the passwords and certificates are not restored. Those are restored only if you use the backup image (.fxi file).

A backup image is an encrypted and saved copy of the disk image from the Firebox local memory. A Firebox backup image includes:

  • Fireware OS
  • Configuration file
  • Certificates
  • Feature keys

You can run a backup image from the CLI (SSH interface) to either a USB or an FTP server:

backup image (password) to [location | usb filename]

Unfortunately, a scheduled backup is not possible; it has to be done manually.