CubieTruck Ethernet RX drops

22.11.2014 19:16

Last month I wrote about occasional CRC errors on the CubieTruck's SATA bus. The wired Ethernet interface is another device on the same board that reports intermittent errors in a similar fashion. This is what the packet error statistics page in Munin looks like:

CubieTruck eth0 errors by month.

190 micropackets per second on average is not a lot. That's roughly one dropped packet every hour and a half, or one for every 47000 packets received. Presumably the higher-level protocols detect the dropped packet and request a retransmission, so apart from higher lag every once in a while, this shouldn't be noticeable (in contrast to, for instance, the EeePC TCP checksum thing). Still, it is curious. This CubieTruck board is the only computer running Linux I've personally seen with a drops figure higher than zero.
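The conversion from Munin's per-second average to an interval between drops can be checked in a few lines. This is only a back-of-the-envelope sketch: the average receive rate it prints is not a measured value, it is back-calculated from the one-in-47000 ratio quoted above.

```python
# Average drop rate shown by Munin: 190 micropackets per second.
drop_rate = 190e-6  # packets per second

# Mean time between two dropped packets.
seconds_between_drops = 1.0 / drop_rate
hours_between_drops = seconds_between_drops / 3600.0

# Assumption: one drop per 47000 received packets (the ratio from the text).
packets_per_drop = 47000

# Average receive rate implied by the two figures above.
implied_rx_rate = packets_per_drop * drop_rate  # packets per second

print(f"{seconds_between_drops:.0f} s between drops ({hours_between_drops:.2f} h)")
print(f"implied average receive rate: {implied_rx_rate:.2f} packets/s")
```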

Munin picks up the number from /sys/class/net/eth0/statistics/rx_dropped. This counter is somewhat ambiguously described by the kernel documentation:

Indicates the number of packets received by the network device but dropped, that are not forwarded to the upper layers for packet processing. See the network driver for the exact meaning of this value.
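For reference, the same counters can be read directly, without going through Munin. The following is a minimal sketch, not what the Munin plugin actually does: the function name and the `root` parameter are made up for illustration, and the list of counters is just a selection of the files the kernel exposes in the statistics directory.

```python
from pathlib import Path

# A selection of per-interface counters exposed by the kernel in sysfs.
RX_COUNTERS = ("rx_packets", "rx_dropped", "rx_errors", "rx_crc_errors")


def read_rx_counters(iface="eth0", root="/sys/class/net"):
    """Return {counter name: value}, with None for files that can't be read."""
    stats_dir = Path(root) / iface / "statistics"
    counters = {}
    for name in RX_COUNTERS:
        try:
            counters[name] = int((stats_dir / name).read_text().strip())
        except (OSError, ValueError):
            # Interface or counter missing, or unexpected file content.
            counters[name] = None
    return counters


if __name__ == "__main__":
    for name, value in read_rx_counters().items():
        print(f"{name}: {value}")
```

Reading rx_dropped next to rx_errors and rx_crc_errors makes it easier to see which of the counters is actually moving.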

At first I thought the drops might be Ethernet frame CRC errors. However, it turns out that those are counted separately. The errors also don't go away when changing the UTP cable or the switch the CubieTruck is connected to. At least judging by a simple test with iperf, the occurrence of errors isn't correlated with the load on the interface (in other words, pushing more packets around doesn't make drops more common).
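One way to put a number on the load test is to compare the drops-per-packet ratio over two time windows, one with the link idle and one with iperf running. This is a rough sketch under a few assumptions: the helper names are hypothetical, the interface is eth0, and whether rx_packets includes the dropped packets depends on the driver.

```python
import time
from pathlib import Path


def snapshot(iface="eth0", root="/sys/class/net"):
    """Read (rx_packets, rx_dropped) for one interface from sysfs."""
    stats = Path(root) / iface / "statistics"
    return tuple(int((stats / n).read_text()) for n in ("rx_packets", "rx_dropped"))


def drops_per_packet(before, after):
    """Dropped packets per received packet between two snapshots."""
    packets = after[0] - before[0]
    dropped = after[1] - before[1]
    return dropped / packets if packets > 0 else 0.0


def measure(seconds, iface="eth0"):
    """Sample the counters at the start and at the end of a time window."""
    before = snapshot(iface)
    time.sleep(seconds)
    return drops_per_packet(before, snapshot(iface))
```

Calling measure() once on an idle link and once under iperf load gives two ratios to compare. With drops this rare, each window has to cover a lot of packets (many multiples of 47000) before the ratio means much.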

I guess it's time to look into the kernel source, as the documentation suggests. I dug into the Linux 3.4.95 sources for Allwinner A20 that I'm currently running. However, the errors also appeared when I was running the kernel shipped with the CubieTruck Ubuntu Server installation. As far as I can see, no recent changes that would affect this driver were committed to the git repository. In any case, the only place where the rx_dropped count is incremented is in drivers/net/ethernet/allwinner/gmac/gmac_core.c on line 1179:

skb = priv->rx_skbuff[entry];
if (unlikely(!skb)) {
	pr_err("%s: Inconsistent Rx descriptor chain\n",
		priv->ndev->name);
	priv->ndev->stats.rx_dropped++;
	break;
}

This is as far as I have got in tracking down the source of this problem. I have no idea what an "inconsistent Rx descriptor chain" is or why it might happen.

Searching the web didn't turn up any complaints regarding CubieTruck's Ethernet, so I might just be lucky and have got a marginal board. Or maybe I'm just more annoyed by non-zero error counts than most. On the other hand, some quick inquiries on IRC channels suggested that it is indeed quite unusual to see any rx_dropped number higher than zero. So, if anyone has seen similar errors on a CubieTruck, please leave a comment below.

Posted by Tomaž | Categories: Code

Comments

Hi Tomaž, I do have network problems with my cubietruck and ifconfig shows: RX packets:86768817 errors:0 dropped:506838 which is a bit less than 1%. However, I'm also experiencing the sort of networking problems that you don't want to have: hung (and eventually broken) tcp connections where there's data in the send queue; intermittent problems when trying to open simple web pages etcetera. I don't like it. I have a bad feeling about this.

My setup is a bit odd, though. My cubietruck is connected to a VLAN trunk, in order to be able to discriminate between three different VLANs, so eth0 actually just receives the tagged packets, and vlan3, vlan4 and vlan10 do the actual work. This used to work well when my system was a Globalscale Sheeva Plug; now it's the Cubietruck and it's... suboptimal.

Forgot to mention: this is with Debian Jessie, with kernel 3.16.0-4-armmp-lpae. Unfortunately, I haven't had time to fully debug this.

Valentijn, thanks for your comment. So far I haven't seen any hung TCP connections. On the other hand, my CubieTruck is a web server, so I might not have noticed.

I do plan to move to Jessie soon and I'll keep an eye out for the problems you describe. I think the upstream kernel shipped with Debian uses a different gmac driver than the sunxi kernel I'm currently using.

Posted by Tomaž
