Working around broken Boost.Thread

21.02.2015 12:42

Debian Wheezy ships with a broken boost/thread.hpp header. Specifically, there's a conflict between the TIME_UTC preprocessor macro that was introduced in C11 and an enum value of the same name defined by Boost 1.49 (bug 701377). This results in somewhat cryptic compile errors for all C++ files that happen to include this header:

/usr/include/boost/thread/xtime.hpp:23:5: error: expected identifier before numeric constant
/usr/include/boost/thread/xtime.hpp:23:5: error: expected ‘}’ before numeric constant
/usr/include/boost/thread/xtime.hpp:23:5: error: expected unqualified-id before numeric constant
/usr/include/boost/thread/xtime.hpp:46:14: error: expected type-specifier before ‘system_time’

I first encountered this when compiling GNU Radio from source, but the problem is not specific to that project. The obvious solution would be to upgrade Boost to a newer version, but this does not appear to be possible on Wheezy without upgrading many other packages. Some web searches reveal that the general consensus regarding the solution seems to be to open the /usr/include/.../xtime.hpp header in a text editor and change the offending enum.

This is bad advice. Straightforward editing of files that are managed by the package manager is never a good idea. Your fix will be silently overwritten by any minor Boost package update. Also, when debugging any kind of problem, nobody expects that files shipped in a package have been modified locally. Among other things, this can lead to hard-to-solve bugs that are not reproducible on other, seemingly identical systems.

The correct way to solve this problem (at least until Debian ships a fixed Boost library) is to work around it in the source you are compiling. For instance, including the following at the top of every C++ file that includes boost/thread.hpp removes the conflicting macro:

#include <time.h>
#undef TIME_UTC

The trick is to have this before any other #includes. You want it to be the first time time.h is included in the compilation unit. Since time.h is protected against multiple includes, TIME_UTC won't get redefined later on when it is included for the second time through Boost headers. Of course, then the TIME_UTC macro isn't available anymore, but so far I haven't seen any code that would use it.
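To illustrate the mechanism without requiring the broken Boost package, here is a minimal sketch. The demo namespace and its enum are my own stand-in for the declaration in xtime.hpp, not the actual Boost code, and the sketch assumes a C library whose time.h defines TIME_UTC and uses an include guard (as glibc does). In a real source file, the stand-in would be replaced by the #include of boost/thread.hpp.

```cpp
// The workaround: this must come before any other #include in the file.
#include <time.h>
#undef TIME_UTC

// Hypothetical stand-in for the enum in boost/thread/xtime.hpp (Boost 1.49).
// Without the #undef above, the macro would expand here and the compiler
// would see a numeric constant where it expects an identifier.
namespace demo {
enum xtime_clock_types {
    TIME_UTC = 1
};
}

// Boost headers include time.h again internally. The include guard makes
// this second inclusion a no-op, so the macro stays undefined.
#include <time.h>

#ifdef TIME_UTC
#error "TIME_UTC was redefined by the second inclusion of time.h"
#endif

// The enum value is usable like any other identifier from here on.
constexpr int utc_clock_type() { return demo::TIME_UTC; }
```

If you remove the #undef line, compiling this with g++ should fail with the same kind of "expected identifier before numeric constant" error as shown above.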

I've also made a patch for GNU Radio 3.7.5 that applies this workaround in all necessary places.

Posted by Tomaž | Categories: Code

Another hard drive failure

07.02.2015 21:41

Earlier today one of my hard drives died. It was a fairly old 750 GB "Caviar GP" drive from a Western Digital "My Book" external enclosure. All it does now is emit an impressively loud metallic clicking noise.

I should have seen this coming, of course. At this point I have a pile of failed drives stashed in a box somewhere. I remember that this particular one was unusually slow to start and mount the last couple of times I used it. Also, smartd had previously reported "2 Currently unreadable (pending) sectors". I ignored both warnings, because I assumed this was yet another problem with the power supply. I had a "My Book" 12V external power supply fail before with similar symptoms.

I only used this drive for backups recently, so except for some archival copies of machines I no longer own, probably nothing of value was lost. Having at least a listing of contents before it failed would be nice though.

Disassembled Western Digital "My Book" external drive.

Of course, I opened it up to see if there's anything obviously wrong with it. The "My Book" USB interface board and the power supply are not the cause, because the drive has the same problem even when it is connected directly to a SATA port. I can hear the platters spinning and the clicking noise can only be caused by heads thrashing around, so those are not stuck either.

Corrosion of surface finish on the controller PCB.

The only thing that immediately looks wrong is the unusual amount of corrosion on the hard drive controller PCB. It's bad enough that on some exposed test points both the immersion gold and the copper layer are completely gone. I'm not quite sure what could have caused that. As far as I can remember, this drive was sitting somewhere around my desk for the whole time, so it hasn't been exposed to any hostile environments. It might be a manufacturing defect of some sort - maybe the board was not rinsed well enough after processing.

Bottom side of the hard drive controller PCB.

I cleaned the pads where the motor and the head connect to the circuit board, but that didn't make any difference.

The copper below the green solder mask looks fine though. The bottom side of the PCB contains one large BGA chip. Maybe that one developed some bad connections, if the problem is indeed in the controller board. Just as an experiment, I also tried the disk-in-the-freezer trick, but that did not make the disk behave any differently.

Posted by Tomaž | Categories: Digital