Mind your MTU. A tale of UniFi, EdgeRouter-X, IPSec and NPS.

As I previously wrote here, I’ve replaced one of the Watchguards with a UniFi AP and EdgeRouter X. Everything was pretty much fine until we started converting wired computers to wireless in an effort to get rid of some obscure cabling.

To give you a bit of background: in this setup, the domain-joined wireless clients authenticate to the network using EAP-TLS against an NPS RADIUS server.

I have this setup working perfectly fine behind Watchguards in other locations, so I basically replicated the settings on the UniFi controller, but the clients refused to join the network for some reason.

So there I was, looking at the incredibly difficult-to-read Accounting Logs on the NPS server, and it appeared that the clients were completing the authentication just fine. Well, at least <Reason-Code data_type="0">0</Reason-Code> was logged. Anyway, my eyes got tired pretty fast looking at that stuff!

I then saw that others on the Internet had a bunch of NPS events in their Event Log while mine was pretty empty, so I spent a day trying to get NPS Event Logging to work.

When I finally got it to work, I saw this event being logged:

Authentication Details:
	Connection Request Policy Name:	MY-WIFI-NETWORK
	Network Policy Name:                       -
	Authentication Provider:                  Windows
	Authentication Server:                      domain-controller.local
	Authentication Type:                         -
	EAP Type:                                            -
	Account Session Identifier:              -
	Reason Code:                                      3
	Reason:                                                The RADIUS Request message that Network Policy Server received from the network access server was malformed.

A malformed request, you say? Well OK, I accept the challenge!

So I ran the packet capture, and got this:

16:22:28.767060 IP unifi-ap.34381 > domain-controller.radius: RADIUS, Access-Request (1), id: 0x5a length: 198
16:22:28.802027 IP domain-controller.radius > unifi-ap.34381: RADIUS, Access-Challenge (11), id: 0x5a length: 90
16:22:28.811600 IP unifi-ap.34381 > domain-controller.radius: RADIUS, Access-Request (1), id: 0x5b length: 312
16:22:28.847224 IP domain-controller.radius > unifi-ap.34381: RADIUS, Access-Challenge (11), id: 0x5b length: 1472
16:22:28.847289 IP domain-controller > unifi-ap: udp
16:22:28.851982 IP unifi-ap.34381 > domain-controller.radius: RADIUS, Access-Request (1), id: 0x5c length: 213
16:22:28.884595 IP domain-controller.radius > unifi-ap.34381: RADIUS, Access-Challenge (11), id: 0x5c length: 1472
16:22:28.884655 IP domain-controller > unifi-ap: udp
16:22:28.889344 IP unifi-ap.34381 > domain-controller.radius: RADIUS, Access-Request (1), id: 0x5d length: 213
16:22:28.921571 IP domain-controller.radius > unifi-ap.34381: RADIUS, Access-Challenge (11), id: 0x5d length: 932
16:22:28.960512 IP unifi-ap.34381 > domain-controller.radius: RADIUS, Access-Request (1), id: 0x5e length: 1472
16:22:28.960530 IP unifi-ap > domain-controller: udp
16:22:31.960962 IP unifi-ap.34381 > domain-controller.radius: RADIUS, Access-Request (1), id: 0x5e length: 1472
16:22:31.960969 IP unifi-ap > domain-controller: udp
16:22:37.961414 IP unifi-ap.34381 > domain-controller.radius: RADIUS, Access-Request (1), id: 0x5e length: 1472

That didn’t tell me much, so I tried verbose mode, and got this back:

15:01:14.454542 IP (tos 0x0, ttl 64, id 16451, offset 0, flags [+], proto UDP (17), length 1500)
    unifi-ap.32887 > domain-controller.radius: RADIUS, length: 1472
        Access-Request (1), id: 0xb0, Authenticator: fd09f3b2dcd8dd0d07d0cad52894ffa
          User-Name Attribute (1), length: 26, Value: host/windows7.local
          NAS-IP-Address Attribute (4), length: 6, Value: unifi-ap
          NAS-Identifier Attribute (32), length: 14, Value: f09fc229df71
          NAS-Port Attribute (5), length: 6, Value: 0
          Called-Station-Id Attribute (30), length: 29, Value: XX-XX-XX-XX-XX-XX:MY-WIFI-NETWORK
          Calling-Station-Id Attribute (31), length: 19, Value: XX-XX-XX-XX-XX-XX
          Framed-MTU Attribute (12), length: 6, Value: 1400
          NAS-Port-Type Attribute (61), length: 6, Value: Wireless - IEEE 802.11
          Connect-Info Attribute (77), length: 23, Value: CONNECT 0Mbps 802.11b
          EAP-Message Attribute (79), length: 255, Value: .F....
          EAP-Message Attribute (79), length: 255, Value: ..
          EAP-Message Attribute (79), length: 255, Value: A.2F2.l..0..9...zF?....
          EAP-Message Attribute (79), length: 255, Value: CA.crl0m..+........a0_0]..+.....0..Qhttp://pki.local/Enterprise%20Certificate%20Authority.crt0...*.H.......
          EAP-Message Attribute (79), length: 255, Value: c.I&....pBt.......6...b.......K&...."za...\.&.z..o.`^.O.k.x.Ox..b]{f........)U.L.+.&&f▒j..%.^Cw.\...z.~..$.........[7..A..g..0...L..4.{.z.LY....NY.O.o..B.XRLM6...>R!.E........a....... t.....0..,.a.u.l.Q..|..K..Q..4yz..M...K..H.......e;p'.wd..A..^...o~.>
          EAP-Message Attribute (79), length: 229 (bogus, goes past end of packet)

That “bogus, goes past end of packet” caught my eye immediately, and then I noticed the packet length, which appeared strangely big for an IPSec-protected GRE tunnel.
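The 1472 in the capture is no coincidence. Here is a minimal Python sketch, assuming plain IPv4 with no options (20-byte header) plus the 8-byte UDP header; the function name is made up for illustration. It shows that 1472 bytes is the most UDP payload that fits into a single 1500-byte IP packet, so the AP was already sending full-size packets (note the flags [+], meaning more fragments follow) before any tunnel overhead was added on top.

```python
IPV4_HEADER = 20  # assumption: IPv4 header without options
UDP_HEADER = 8    # fixed UDP header size


def max_udp_payload(ip_mtu):
    """Largest UDP payload that fits in one unfragmented IPv4 packet."""
    return ip_mtu - IPV4_HEADER - UDP_HEADER


# A 1500-byte IP packet, as seen in the verbose capture
print(max_udp_payload(1500))  # 1472
```

Anything the tunnel adds to a packet that is already 1500 bytes long has nowhere to go on a 1500-byte link.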

So I googled and googled, and found that one way around this was to reduce the MTU on the GRE interfaces.

However, I also came across MSS-Clamp, which appears less intrusive, as it puts the overhead of managing the packet size on the end device rather than the router.

My calculations for the MSS-Clamp are as follows:

1500 Ethernet MTU
– 20 TCP Header *
– 20 IP Header *

– 20 IPSec Header
– 52 ESP Header
– 24 GRE Header
= 1364

So I rounded it down to 1360 for good measure, committed, and… nothing happened!

Of course, I forgot: the RADIUS traffic is UDP, and MSS-Clamp applies to TCP only. I am leaving the clamp in place anyway, as quite a few people on the Ubiquiti forums complained about dodgy TCP traffic over IPSec on these devices. Now that I think of it, this might have been the root cause of another issue with flaky RDP to that site.

So the maths to get to the right MTU size is the same as above for the MSS-Clamp, but leaving out the items marked with an asterisk (the IP and TCP headers): 1500 - 20 - 52 - 24 = 1404. Let’s round it down to 1400, as recommended by Cisco.
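For anyone who wants to redo the sums for their own tunnel, here is a small Python sketch of the arithmetic. The overhead figures are simply the ones used in this post; real ESP overhead depends on the cipher and mode, so treat them as assumptions. The function names and rounding steps are made up for illustration.

```python
# Overhead figures as used in this post (assumptions, not universal values)
ETHERNET_MTU = 1500
TCP_HEADER = 20
IP_HEADER = 20
TUNNEL_OVERHEAD = {"ipsec": 20, "esp": 52, "gre": 24}


def tunnel_mtu(link_mtu=ETHERNET_MTU, overhead=TUNNEL_OVERHEAD):
    """MTU left for packets inside the tunnel."""
    return link_mtu - sum(overhead.values())


def tcp_mss(mtu):
    """TCP payload that fits in one packet of the given MTU."""
    return mtu - IP_HEADER - TCP_HEADER


def round_down(value, step):
    """Round down to a multiple of step, to leave a little headroom."""
    return value - value % step


mtu = tunnel_mtu()
mss = tcp_mss(mtu)
print(mtu, round_down(mtu, 100))  # 1404 1400
print(mss, round_down(mss, 10))   # 1364 1360
```

Swap in your own overhead values if your tunnel uses different encapsulation.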

And this is set as follows on the EdgeRouter-X:

set firewall options mss-clamp interface-type tun
set firewall options mss-clamp mss 1360
set interfaces tunnel tun0 mtu 1400

Upon commit, the clients began authenticating successfully.

So here it is, and the lesson for today is – Mind your MTU.