CubieTruck UDMA CRC errors
Last year I bought a CubieTruck, a small, low-powered ARM computer, to host this web site and a few other things. Combined with a Samsung 840 EVO SSD on the SATA bus, it proved to be a relatively decent replacement for my aging Intel box.
One thing that has been bothering me right from the start though is that every once in a while, there were problems with the SATA bus. Occasionally, isolated error messages like these appeared in the kernel log:
kernel: ata1.00: exception Emask 0x10 SAct 0x2000000 SErr 0x400100 action 0x6 frozen kernel: ata1.00: irq_stat 0x08000000, interface fatal error kernel: ata1: SError: { UnrecovData Handshk } kernel: ata1.00: failed command: WRITE FPDMA QUEUED kernel: ata1.00: cmd 61/18:c8:68:0e:49/00:00:02:00:00/40 tag 25 ncq 12288 out kernel: res 40/00:c8:68:0e:49/00:00:02:00:00/40 Emask 0x10 (ATA bus error) kernel: ata1.00: status: { DRDY } kernel: ata1: hard resetting link kernel: ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 310) kernel: ata1.00: supports DRM functions and may not be fully accessible kernel: ata1.00: supports DRM functions and may not be fully accessible kernel: ata1.00: configured for UDMA/133 kernel: ata1: EH complete
At the same time, the SSD reported increased UDMA CRC error count through the SMART interface:
These errors were mostly benign. Apart from the cruft in the log files they did not appear to have any adverse effects. Only once or twice in the last 10 months or so did they cause the kernel to remount filesystems on the SSD as read-only, which required some manual intervention to get the CubieTruck back on-line.
I've seen some forum discussions that suggested this might be caused by a bad power supply. However, checking the power lines with an oscilloscope did not show anything suspicious. On the other hand, I did notice during this test that the errors seemed to occur when I was touching the SATA cable. This made me think that the cable or the connectors on it might be the culprit - something that was also suggested in the forums.
Originally, CubieTruck comes with a custom SATA cable that combines both power and data lines for the hard drive and has special connectors (at least considering what you usually see in the context of the SATA cabling) on the motherboard side.
Last few weeks it appeared that the errors were getting increasingly more common, so I decided to try replacing the cable. Instead of ordering a new CubieTruck SSD kit I improvised a bit: I didn't have proper connectors for CubieTruck's power lines at hand, so I just soldered the cables directly to the motherboard. On the SSD drive I used the standard 15-pin SATA power connector.
For the data connection, I used an ordinary SATA data cable. The shortest one I could find was about three times as long as necessary, so it looks a bit uglier now. The connector on the motherboard side also needed some work with a scalpel to fit into CubieTruck's socket. The original connector on the cable that came with CubieTruck is thinner than those on standard SATA cables I tried.
So far it seems this fixed the CRC errors. In the past few days since I replaced the cable I haven't seen any new errors pop up, but I guess it will take a month or so to be sure.