On missing IPv6 router advertisements

03.05.2020 16:58

I've been having problems with Internet connectivity for the past week or so. Connections would randomly time out and some things would work very slowly or not at all. In the end it turned out to be a problem with IPv6 routing. It seems my Internet service provider is having problems with sending out periodic Router Advertisements, so the default route on my router often times out. I've temporarily worked around it by manually adding a route.

I'm running a simple, dual-stack network setup. There's a router serving a LAN. The router is connected over an optical link to the ISP, which is doing Prefix Delegation. The problems were intermittent. A lot of software seems to gracefully fall back onto IPv4 if IPv6 stops working, but there's usually a more or less annoying delay before it does that. On the other hand, some programs don't fall back and seem to assume that there's global connectivity as long as a host has a globally-routable IPv6 address.

The most apparent and reproducible symptom was that IPv6 pings to hosts outside of the LAN often weren't working. At the same time, hosts on the LAN had valid, globally-routable IPv6 addresses, and pings inside the LAN would work fine:

$ ping -6 -c3 host-on-the-internet
connect: Network is unreachable
$ ping -6 -c3 host-on-the-LAN
PING ...(... (2a01:...)) 56 data bytes
64 bytes from ... (2a01:...): icmp_seq=1 ttl=64 time=0.404 ms
64 bytes from ... (2a01:...): icmp_seq=2 ttl=64 time=0.353 ms
64 bytes from ... (2a01:...): icmp_seq=3 ttl=64 time=0.355 ms

--- ... ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2026ms
rtt min/avg/max/mdev = 0.353/0.370/0.404/0.032 ms

Rebooting my router seemed to help for a while, but then the problem would reappear. After some debugging I found out that the immediate cause of the problems was that the default route on my router would disappear approximately 30 minutes after the router had been rebooted. It would then randomly reappear and disappear a few times a day.

On my router, the following command would return nothing most of the time:

$ ip -6 route | grep default

But immediately after a reboot, or if I got lucky, I would get a route. I'm not sure why there are two nearly identical entries here. The only difference between them is the from field:

$ ip -6 route | grep default
default from 2a01::... via fe80::... dev eth0 proto static metric 512 pref medium
default from 2a01::... via fe80::... dev eth0 proto static metric 512 pref medium

The following graph shows the number of entries returned by the command above over time. You can see that for most of the day the router didn't have a default route:

Number of valid routes obtained from RA over time.
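
A log like the one behind this graph can be collected with a few lines of shell. This is only a minimal sketch, not the exact script used for the graph. The function names and the polling interval are assumptions made for illustration.

```shell
#!/bin/sh
# Count the default routes in `ip -6 route` output read from stdin.
# grep -c exits non-zero when the count is 0, hence the `|| true`.
count_default_routes() {
    grep -c '^default' || true
}

# Print one "unix-timestamp route-count" sample every $1 seconds, forever.
log_default_routes() {
    interval="$1"
    while true; do
        n=$(ip -6 route | count_default_routes)
        echo "$(date +%s) $n"
        sleep "$interval"
    done
}
```

Calling log_default_routes 60 on the router and redirecting the output to a file would record one sample per minute, which can then be plotted.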

The thing that confused me the most was the fact that the mechanism for getting the default IPv6 route is distinct from the way the prefix delegation is done. This means that every device in the LAN can get a perfectly valid, globally-routable IPv6 address, but at the same time there can be no configured route for packets going outside of the LAN.

The route is automatically configured via Router Advertisement (RA) packets, which are part of the Neighbor Discovery Protocol. When my router first connects to the ISP, it sends out a Router Solicitation (RS). In response to the RS, the ISP sends back an RA. The RA contains the link-local address to which the traffic intended for the Internet should be directed, as well as a Router Lifetime. The Router Lifetime sets the time interval for which this route is valid. This lifetime appears to be 30 minutes in my case, which is why rebooting the router seemed to fix the problems for a short while.

The trick is that the ISP should later periodically re-send the RA by itself, refreshing the information and lifetime, and hence pushing back the deadline at which the route times out. Normally, a new RA should arrive well before the lifetime of the previous one runs out. In my case, however, it seemed that for some reason the ISP suddenly started sending out RAs only sporadically. Hence the route would time out in most cases, and my router wouldn't know where to send the packets that were going outside of my LAN.
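
The timing logic can be illustrated with a small helper. Given the arrival times of RAs and the advertised Router Lifetime, it prints the periods during which the route from the last RA has already expired and no new RA has arrived yet. This is a sketch of the reasoning above, not of what the kernel does internally. The function name and the input format (one arrival time in seconds per line, ascending) are assumptions.

```shell
#!/bin/sh
# Read RA arrival times (seconds, ascending, one per line) from stdin.
# $1 is the Router Lifetime in seconds. Each RA keeps the route alive
# until arrival + lifetime; a gap is reported whenever the next RA
# arrives after that deadline.
route_gaps() {
    awk -v lifetime="$1" '
        NR > 1 && $1 > prev + lifetime {
            printf "no route from %d to %d (%d s)\n",
                   prev + lifetime, $1, $1 - prev - lifetime
        }
        { prev = $1 }
    '
}
```

For example, with a lifetime of 1800 seconds and RAs arriving at 0, 600 and 4000 seconds, the second RA refreshes the route in time, but the third one comes 1600 seconds after the route has expired.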

To monitor RA packets on the router using tcpdump:

$ tcpdump -v -n -i eth0 "icmp6 && ip6[40] == 134"

This should show packets like the following arriving at intervals that should be much shorter than the advertised router lifetime. On a different, correctly working network, I've seen packets arriving roughly once every 10 minutes with a lifetime of 30 minutes:

18:52:01.080280 IP6 (flowlabel 0xb42b9, hlim 255, next-header ICMPv6 (58) payload length: 176)
fe80::... > ff02::1: [icmp6 sum ok] ICMP6, router advertisement, length 176
	hop limit 64, Flags [managed, other stateful], pref medium, router lifetime 1800s, reachable time 0ms, retrans timer 0ms
	...
19:00:51.599538 IP6 (flowlabel 0xb42b9, hlim 255, next-header ICMPv6 (58) payload length: 176)
fe80::... > ff02::1: [icmp6 sum ok] ICMP6, router advertisement, length 176
	hop limit 64, Flags [managed, other stateful], pref medium, router lifetime 1800s, reachable time 0ms, retrans timer 0ms
	...
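
To compare the interval between RAs with the advertised lifetime without reading timestamps by eye, the tcpdump output can be piped through a small filter. The following is a rough sketch that assumes the default tcpdump timestamp format shown above. The function name is made up, and it does not handle captures that run across midnight.

```shell
#!/bin/sh
# Read `tcpdump -v` output from stdin and print the number of whole
# seconds between consecutive router advertisements.
ra_intervals() {
    awk '
        # Packet header lines start with an HH:MM:SS.ffffff timestamp.
        /^[0-9][0-9]:[0-9][0-9]:[0-9][0-9]/ {
            split($1, t, ":")
            ts = t[1] * 3600 + t[2] * 60 + int(t[3])
        }
        # Remember the time of each RA and print the distance to the last one.
        /router advertisement/ {
            if (seen) print ts - last
            last = ts
            seen = 1
        }
    '
}
```

For the two packets shown above, this prints 530, that is a bit under 9 minutes between RAs against a lifetime of 1800 seconds.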

On my ISP's link, however, this wasn't happening. As in the graph above, these packets only arrived sporadically. As far as I know, this is an indication that something is wrong on the ISP side. Sending an RA in response to an RS seems to work, but periodic RA sending doesn't. Strictly speaking there's nothing that can be done to fix this on my end. My understanding of RFC 4861 is that a downstream host should only send out an RS once, after connecting to the link:

Once the host sends a Router Solicitation, and receives a valid Router Advertisement with a non-zero Router Lifetime, the host MUST desist from sending additional solicitations on that interface, until the next time one of the above events occurs.

Indeed, as far as I can see, Linux doesn't have any provisions for re-sending an RS in case all routes from previously received RAs time out. This answer argues that it should, but I can find no references that would confirm this. On the other hand, this answer agrees with me that an RS should only be sent when connecting to a link. On that note, I've also found a discussion that mentions blocked multicast packets as a cause of similar problems. I don't believe that is the case here.

In the end I used an ugly workaround to keep things working. I've manually added a permanent route that is identical to what is randomly advertised in RA packets:

$ ip -6 route add default via fe80::... dev eth0

Unlike entries originating from RAs, this manual entry in the routing table won't time out, at least not until my router gets rebooted. It also doesn't hurt anything if additional, identical routes occasionally get added via RA. Of course, it still goes completely against the IPv6 neighbor discovery mechanism. If anything changes on the ISP side, for example if the link-local address of their router changes, the entry won't get updated and the network will break again. However, it does seem to fix my issues at the moment. The fact that it's working also seems to confirm my suspicion that something is only wrong with RA transmissions on the ISP side, and that actual routing on their end works correctly. I've reported my findings to the ISP and hopefully things will get fixed on their end, but in the meantime, this will have to do.
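
Since the manual route is lost on reboot, one possible variation is to wrap the command in a check that only adds the route when no default route is present, and to run that periodically, for example from cron. This is a hypothetical sketch and not what I'm actually running. The function name is made up, and the gateway argument stands for the link-local address seen in the RAs, which has to be filled in by hand.

```shell
#!/bin/sh
# Add a default IPv6 route via gateway $1 on device $2, but only if the
# routing table currently has no default route at all.
ensure_default_route() {
    gateway="$1"   # placeholder: link-local address of the ISP router
    dev="$2"       # upstream interface, e.g. eth0
    if ip -6 route | grep -q '^default'; then
        return 0   # a default route (from an RA or added earlier) exists
    fi
    ip -6 route add default via "$gateway" dev "$dev"
}
```

Note that this has the same drawback as the manual command: if the link-local address on the ISP side changes, the configured gateway has to be updated by hand.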

Posted by Tomaž | Categories: Code

Comments

I found your post after I discovered that the exact same thing seemed to be happening with my ISP (Rogers) in Ontario, Canada. I was hoping for a better solution than adding the default route manually, but it works for now.

Posted by Alex
