World wide wheel, reinventing of

09.05.2013 22:00

The direction browsers and web technology are moving these days truly baffles me. As usual in the software world, it's all about piling one shiny feature on top of another. Now, I'm not against shiny per se, but it seems that a lot of these innovations are by people who haven't even taken an hour to look at the already existing body of knowledge and standards that has accumulated over the years. With the frenzy of rolling releases and implementation-is-the-standard hotness, it's not even surprising that those are then implemented by browsers before someone with a long enough beard can stand up and shout "Hey! We already thought of that in this here RFC."

Take for example all the buzz about finally solving the problem with authentication on the web. Finally, there's a way to securely sign into a website without all the mess with hundreds of hard-for-me-to-remember yet easy-to-guess-by-the-cracker user name and password combinations. Wonderful. Except that this exact thing has existed on the web since people did cave paintings and used Netscape to browse the web. It's called SSL client-side certificates and, amazingly, it worked well enough for on-line banking and government sites even before the invention of pottery and cloud-based identity providers.

But that's just the most glaring case. Another front where this madness continues is pushing things from the old HTTP headers to the fancy new HTML5. Take for example a proposal to add an HTML attribute that defines whether a browser should display something or save it to disk by default. This functionality has existed for ages in the form of an HTTP header, yet this is somehow dismissed as a server-side solution (what does that even mean?).

I wonder how many web developers today are even aware that there exists a mechanism for a client to tell the server which language the user prefers (but we most certainly need the annoying language selection whole-screen pop-up-and-click-through!). Or that the client can tell the server whether it would rather have a PNG for downloading or a friendly HTML page for viewing in a browser (meh, we'll just fudge that with some magic on-click JavaScript handlers).
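The language preference mentioned above travels in the Accept-Language request header. As a rough illustration, here is a minimal Python sketch of how a server might honor it. The function names, the default language and the simplified matching rules are my own assumptions for the example, not a complete implementation of HTTP content negotiation.

```python
def parse_accept_language(header):
    """Parse an Accept-Language value into (tag, q) pairs, best first."""
    prefs = []
    for item in header.split(","):
        item = item.strip()
        if not item:
            continue
        tag, _, params = item.partition(";")
        params = params.strip()
        q = 1.0  # a missing q parameter means full preference
        if params.startswith("q="):
            try:
                q = float(params[2:])
            except ValueError:
                q = 0.0  # treat a malformed weight as "not acceptable"
        prefs.append((tag.strip().lower(), q))
    # list.sort is stable, so equal weights keep their header order
    prefs.sort(key=lambda pair: -pair[1])
    return prefs


def pick_language(header, available, default="en"):
    """Pick the first available language the client says it accepts."""
    for tag, q in parse_accept_language(header):
        if q <= 0:
            continue
        if tag == "*":
            return default
        primary = tag.split("-")[0]  # "sl-si" also matches a plain "sl" page
        for candidate in (tag, primary):
            if candidate in available:
                return candidate
    return default


print(pick_language("sl-SI,sl;q=0.9,en;q=0.5", ["en", "sl"]))
```

With a header like this a site that has a Slovenian version can serve it directly, with no pop-up required.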

Now I can see someone laughing and saying how ridiculous this idea is and asking if I have ever even tried to use one of those ancient features. No it's not, and I have. It's consistently painful. But it's only so because, for some reason, browsers long ago decided to make the most horrible interface to such functionality imaginable to man and then forgot to ever fix it. Mostly it's hidden 10 levels down in some obscure dialog box, and if banks didn't give you click-by-click instructions on how to import a certificate, 99% of people would give up after a few hours and continue chiseling clay tablets. Now imagine if a tenth of the time spent on reinventing the wheel were spent just improving the usability of existing features. Why can't I go to a web page and get a prompt: "Hey! This web page wants you to log in. Do you want me to use one of these existing certificates or generate a new, throw-away one?" The world would be just a tiny bit better, believe me.

In the end, I think modern browsers have focused way too much on improving the situation for the remote web page they are displaying and neglected the local part around it. And I believe this direction is bad in the long run. Consider also the European cookie directive. I'm pretty sure this bizarre catch-22 situation where web pages are now required to manage cookie preferences for you would not be needed if browsers provided a sane interface for handling these preferences in the first place. My Firefox has three places (that I know of!) where I can set which websites are allowed to store persistent state on my computer. Plus it manages to regularly lose them, but that's a different story.

Posted by Tomaž | Categories: Life | Comments »

Cost of a Kindle server

01.05.2013 10:48

I was wondering how much running a Kindle as an always-on, underpowered Debian box was costing me in terms of electricity. So I plugged it into one of the Energycount 3000 devices and monitored its power consumption over the last 4 days. This took into account the power consumption of the Kindle as well as the efficiency of a small Nokia cell phone charger I'm using to power it.

ec3k reported an average power of 1.0 W (and a maximum of 2.6 W). Dividing the watt-seconds count by time also yielded 1 W to three decimal places. This nice round number makes me suspect that it's due to limited precision of the measurement, but let's consider it accurate for the moment.

1 W is equal to 0.72 kWh per month. With the current prices I'm paying for electricity this costs me 0.083 € per month. For comparison, a cup of synthetic-tasting coffee from a machine at work costs around twice as much and running my desktop machine all the time would be around a hundred times as expensive.
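For anyone who wants to redo the arithmetic, here is a small Python sketch. The 30-day month is my assumption, the watt-second count in the example is a made-up value that merely matches a 4-day run at 1 W, and the price per kWh is only what the two numbers above imply, not a figure from my bill.

```python
HOURS_PER_MONTH = 24 * 30  # assumes a 30-day month


def average_power_w(energy_ws, seconds):
    """Average power from an accumulated watt-second count."""
    return energy_ws / seconds


def monthly_energy_kwh(power_w):
    """Energy used in a month by a constant load of power_w watts."""
    return power_w * HOURS_PER_MONTH / 1000.0


def implied_price_per_kwh(monthly_cost_eur, power_w):
    """Electricity price implied by a monthly cost at a constant load."""
    return monthly_cost_eur / monthly_energy_kwh(power_w)


four_days = 4 * 24 * 3600                          # seconds in 4 days
print(average_power_w(345600.0, four_days))        # hypothetical meter reading
print(monthly_energy_kwh(1.0))                     # kWh per month at 1 W
print(round(implied_price_per_kwh(0.083, 1.0), 3)) # EUR per kWh
```

The implied price comes out at roughly 0.115 € per kWh.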

Interesting battery failure mode

28.04.2013 13:29

Thanks to my previous posts about Amazon Kindle, I have another broken specimen on my desk now. This one seems to have experienced an interesting battery failure.

Amazon Kindle 3 batteries

Kindle's battery has 4 terminals: ground, a positive terminal for power and SDA and SCL pins for I2C communication with the integrated battery management circuit. On a normal battery, the positive terminal is around 3.7 V above ground, depending on the charge level of the Li-ion cell and the I2C lines are on ground level, because they need external pull-ups.

This broken battery, however, has the positive terminal at 0 V relative to the ground terminal, while the I2C pins are at -2.5 V. I can't imagine what kind of failure mode could cause pins to go lower than ground, unless the polarity of the cell got reversed somehow. I don't see how a failure in the battery management circuit or a loose connection somewhere could cause such readings. I'm pretty sure it's not an artifact of my multimeter either, because the battery can drive some milliamps of current from the ground to one of the I2C pins. For the record, this looks like an original 1830 mAh battery. Date of manufacture is April 2011 and type is 170-1032-01 Rev. A.

The master I2C interface on the Kindle wasn't damaged though, because it boots and reads out battery state just fine when attached to a different battery. There does seem to be a problem with a bad connection somewhere on the motherboard, because it crashes if I lightly knock on the CPU package. Possibly a hairline crack in some solder joint. But that's a topic for some other time.

Contiki and libopencm3 licensing

19.03.2013 18:08

At the beginning of March a discussion started on the Contiki mailing list regarding the merging of a pull request by Jeff Ciesielski that added a port of Contiki to STM32 microcontrollers using the libopencm3 Cortex M3 peripherals library. The issue raised was the difference in licensing. While Contiki is available under a permissive BSD-style license, libopencm3 uses the GNU Lesser General Public License version 3. Jeff's pull request was later reverted as the result of this discussion. It was similar to my own effort a while ago that was also rejected due to the libopencm3 license.

Both the thread on contiki-devel and the later one on libopencm3-devel might be an interesting read if you are into open source hardware because they exposed some valid concerns regarding firmware licensing. Two topics got the most attention: First, if you ship a device with a proprietary firmware that uses an LGPL library, what does the license actually require from you? And second, is the anti-tivoization clause still justified outside of the field of consumer electronics?

I'll try to summarize my understanding of the discussion and add a few comments.

Only the libopencm3-using STM32 port of Contiki would be affected by the LGPL. Builds for other targets would be unaffected by the libopencm3 license and still be BSD licensed, since their binaries would not be linked in any way with libopencm3. Still, it was seen as a problem that not all builds of Contiki would be licensed with the same license. Apart from added complexity, I don't see why that would be problematic. FFmpeg is an example of an existing project that has been operating in this way for some time now.

LGPL requires you to distribute any changes to the library under the same license and provide means of using your software with a different (possibly further modified) version of the library. The second requirement is simple to satisfy on systems that support dynamic linking. However this is very rare in microcontroller firmware. In this case, at the very least you have to provide binary object files for the proprietary part and a script that links them with the LGPL library into a working, statically-linked firmware.
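To make the relinking requirement a bit more concrete, here is a minimal Python sketch of what such a script could boil down to. Everything specific in it is an assumption for illustration: the object file names, the linker script name, the library directory, the library name and the use of an arm-none-eabi GCC toolchain. It only builds and prints the link command instead of running it.

```python
import shlex


def relink_command(objects, library_dir, library="opencm3_stm32f1",
                   linker_script="firmware.ld", output="firmware.elf",
                   toolchain="arm-none-eabi-"):
    """Build a GCC command line that statically links shipped object
    files against a (possibly modified) build of the LGPL library."""
    cmd = [toolchain + "gcc", "-nostartfiles", "-T", linker_script]
    cmd += list(objects)                             # proprietary part, as .o files
    cmd += ["-L", library_dir, "-l" + library]       # user-supplied library build
    cmd += ["-o", output]
    return cmd


# Hypothetical file names, for illustration only.
cmd = relink_command(["app.o", "drivers.o"], "./libopencm3/lib")
print(" ".join(shlex.quote(arg) for arg in cmd))
```

The point is that the user only needs the object files, the linker script and a toolchain to swap in a different build of the library.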

I can see how this can be hard to comply with from the point of view of the typical firmware developer. Such linking requires an unusual build process that might be hard to set up in IDEs. Additionally, modern visual tools often hide the object files and linking details completely. With proprietary compilers it might even be impossible to have any kind of portable binary objects. In any case, this is seen by some as enough of a hurdle to make reimplementation of LGPL code easier than complying with the license.

From this point of view, GPL and LGPL licenses don't seem to have a lot of difference in practice (note that libopencm3 already switched from GPL to LGPL to address concerns that it should be easier to use in commercial products). SDCC project solved this problem by adding a special exception to the GPL.

The other issue was the anti-tivoization clause. This clause was added to the third revision of the GNU public licenses to ensure that freedom to modify software can't be restricted by hardware devices that do cryptographic signature verification. This was mostly a response to the practice in consumer electronics where free software was used to enable business models that depended on anti-features, like DRM, and hence required unmodifiable software to be viable. However in microcontroller firmware there might be reasons for locking down firmware reprogramming that are easier to justify from engineering and moral standpoints.

The first such case is where software modification can enable fraud (for instance energy meters) or make the device illegal to use (for instance due to FCC requirements for radio equipment) or both. In a lot of these cases however there is a very simple answer: if the user does not own the device (as is usually the case for metering equipment), no license requires the owner to enable software modification or even disclose the source code. Where that is not the case, usually the technical means are only one part of the story. The user can be bound by a contract not to change particular aspects of the device and be subject to inspections. The anti-tivoization clause also does not prevent tampering indicators. However it might be that in some cases software covered by anti-tivoization might simply not be usable in practice.

The other case is where changed firmware can have harmful effects. Some strong opinions were voiced that people hacking firmware on certain dangerous devices cannot know enough not to be a danger to their surroundings. This is certainly a valid concern, but the question I see is, why suddenly draw the line at firmware modification?

Search the web and you will find cases where using a wrong driver on a laptop can lead to the thing catching fire, which can certainly lead to injuries. Does that mean that people should not be allowed to modify the operating system on their computers? A similar argument was made years ago in computer security, but I believe it has been proved enough times by now that manufacturers of proprietary software are not always the most knowledgeable about their products. I am sure that every device that can be made harmful with a firmware update can be made so much more easily with a screwdriver.

In general, artificially limiting the number of people tinkering with your products will limit the number of people doing harmful things, but also limit the number of people doing useful modifications. A lot of hardware that was found to be easily modifiable has been adopted for research purposes in much more fancy institutions than your local hackerspace.

I haven't been involved in the design of any truly dangerous product, so perhaps I can't really have an opinion about this. However I do believe that the responsibility of a designer of such products ends with clear and unambiguous warnings as to the dangers of modifying or bypassing safety features.

Embedded modules

02.03.2013 20:56

I've written before about problems with VESNA deployments that have come to consume large amounts of time and nerves. Several of these have come in turn from two proprietary microprocessor modules we use: Digi Connect ME for Ethernet connectivity and Atmel SerialNet for IEEE 802.15.4 mesh networking.

One of these issues, which now finally seems to be just on the brink of being resolved, has been dragging on since late summer last year. We have deployed several Digi Connect ME modules as parts of gateways between the IEEE 802.15.4 mesh in clusters of VESNA nodes and the Internet. One of the deployments has proved especially problematic. Encrypted SSL connections from the module would randomly get dropped and re-connect only after several hours of downtime.

The issue at first proved impossible to reproduce in a lab environment, and since the exact same device worked on other networks, the ISP and the firewall performing NAT were blamed. However, several trips to the location and many packet captures later, I could find no specific problem with TCP or IP headers that I could point my finger at. We replaced a network switch with no effect. Later, by experimenting with Digi Connect TCP keep-alive settings, a colleague found a setting that caused the dropped connection to be re-established immediately instead of causing hours of downtime, making the deployment at least partially useful.

Finally, last week I managed to reproduce the problem on my desk. I noticed that the TCP connections from that location had an unusually low MSS - just 536 bytes. By simulating this I could reliably reproduce connection drops and by experimenting further I found out that SSL data records fragmented in a particular way will cause the module to drop the connection. It was somewhat specific to the Java SSL implementation we used on the other end of connection and very unlikely to happen with other connections that used larger segment sizes.
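To illustrate why a small MSS changes things, here is a Python sketch that lays a few SSL/TLS records end to end in a byte stream and checks which 5-byte record headers end up straddling a TCP segment boundary. The record sizes are made up, and a split header is only one example of an awkward fragmentation; I'm not claiming it is the exact pattern that trips the module.

```python
import struct

RECORD_HEADER_LEN = 5  # content type (1) + version (2) + length (2)


def make_record(payload, content_type=23, version=(3, 1)):
    """Build a record: 5-byte header followed by the (dummy) payload."""
    header = struct.pack("!BBBH", content_type, version[0], version[1],
                         len(payload))
    return header + payload


def segment(stream, mss):
    """Split a byte stream into TCP-like segments of at most mss bytes."""
    return [stream[i:i + mss] for i in range(0, len(stream), mss)]


def split_headers(record_lengths, mss):
    """Indices of records whose header straddles a segment boundary."""
    hits, offset = [], 0
    for index, length in enumerate(record_lengths):
        last_header_byte = offset + RECORD_HEADER_LEN - 1
        if offset // mss != last_header_byte // mss:
            hits.append(index)
        offset += length
    return hits


# Hypothetical payload sizes chosen so that the third header gets split.
records = [make_record(b"x" * n) for n in (100, 424, 50)]
lengths = [len(r) for r in records]
stream = b"".join(records)

print(len(segment(stream, 536)), split_headers(lengths, 536))
print(len(segment(stream, 1460)), split_headers(lengths, 1460))
```

With the larger segment size the whole exchange fits in one segment, which matches the observation that such fragmentation is unlikely on ordinary connections.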

The cause of the issue was therefore in the Digi Connect module. Before having a reproducible test case I hadn't even considered the possibility that a change on the link layer somewhere in the route could trigger a bug at the application layer.

After I had that piece of information, a helpful member of the support forums quickly provided a solution. The issue itself however is not yet resolved since the change in the firmware broke all sorts of other things which now need to be looked into and fixed as well.

I can't say that all of our hard-to-solve bugs came from Digi Connect or Atmel modules. We caused plenty ourselves. But having now experienced working with these two fine products, my opinion is that less time would be wasted if we went for a lower-level solution (just an interface on the physical layer) and then used an open source network stack on top. It would take more time to get to a working solution but I think problems would be much easier to diagnose and solve than with what is essentially a magical black box.

Both Digi Connect and Atmel modules suffer from the fact that they hide some very complex machinery behind a very simplistic interface. Aside from the problem of leaky abstractions, when the machinery itself fails, they provide no information that would help you work around the problem (solving it is out of the question anyway because of proprietary software). Both also come with documentation that is focused on getting a working system as fast as possible, but lacks details on what happens in corner cases. These are mostly left to your imagination and experiments and as experience has shown, behavior can change between firmware revisions. In most cases you can't even practically test against these changes, since that would involve complicated hardware test harnesses.

Visiting the capital

23.02.2013 14:10

I spent the last week in Brussels. The CREW project I'm involved in at IJS has organized a couple of events there as well as a plenary meeting, so the past few days have been quite exhausting, not to mention the week leading to it spent in worries and preparations.

I haven't been to Belgium in a few years and this was the first time I actually flew in. The prices for flights from Ljubljana have always been unreasonably high, with kind of an urban legend going on that it's an unofficial way of the government subsidizing our national air line by paying high prices for frequent flights of various government officials.

In any case, my flight landed late at night and the Brussels airport was more or less dark and deserted. Dark, except for big, brightly lit LED boards with advertisements. These were giving optimistic visions of a bright and sustainable future all along the long path you have to walk from the gate to the train station. And the first of these was telling us in large, friendly letters that the European Parliament protects our rights. I can tell you that coming from our small, unfashionable airport to this scene reminded me of a certain kind of not-so-optimistic science fiction story. The fact that my hotel reservation came with a legal disclaimer about assistance with authorities did not help the issue either.

Vrije Universiteit Brussel

The first order of business in Brussels was a workshop on TV white-spaces for members of the European Commission. As you might know, there is a lot of discussion going on about how to re-use the frequencies that were freed by the transition to digital broadcasts. As a project that also works in that field we presented our view on that topic to people working on spectrum regulation.

The visit to the European Commission offices was actually quite different from what I expected. I was anticipating a dusty, gray place with laser-printed passive-aggressive notices hanging around the hallways that I usually associate with government buildings here. Instead, the part of the Beaulieu 33 I saw could probably compete with Hekovnik on the number of colorful and inspiring messages stuck to the walls. Not to mention various, kind of silly posters regarding network security (in the fashion of "you don't share your toothbrush, you shouldn't share passwords either"). The security procedures as well, while visible, were pretty unobtrusive and mostly involved displaying a kind of a self-destructing badge (it got crossed-out with "expired" all by itself after a day through some kind of a chemical process I guess).

Carolina Fortuna giving a tutorial on ProtoStack

The other public event we held was the CREW training days at the Vrije Universiteit Brussel. I gave a tutorial there on how to use Jožef Stefan Institute's cognitive radio testbed in Logatec and the hardware I developed for various experiments. I'm happy that I received some positive responses to that. At least for me it was a big confirmation that the tools we are developing at the Institute are actually useful to this research community and that we are contributing in a positive way.

While trying to enter the university building on Thursday we found ourselves in front of a crowd of protesters (there was a general strike in Brussels that day), so perhaps not everyone agrees with that. I'm not quite sure though whether the university itself or people employed there were the target of their protest or we just happened to be in the wrong place at the wrong time. I would say the latter, but on the list of offices on that particular address I couldn't find any kind of institution that would be worth protesting against in my opinion. Anyway, I was not able to understand any of their complaints through loud fireworks while we were entering the building under the watch of the local riot control police.

Further adventures in Chromebook-land

28.01.2013 21:33

As you might remember from my previous blog post, I have a disassembled ARM-based Samsung Chromebook lying around, occupying various horizontal surfaces that might otherwise be put to better use. After an initial success with replacing the original, feature-challenged OS with the armhf port of Debian Wheezy I hit a couple of snags. First, I found out that running the computer with a non-Google signed OS means having to look at an annoying warning message at each boot and having to press Ctrl-D (and being careful not to turn off developer mode by touching any other keys by mistake) or wait for a minute or so. And second, by carelessly playing around with alsamixer, I managed to get the left built-in speaker to melt through the bottom casing of the laptop.

Naturally, the smart decision would at that point be to return the laptop to the shop and demand my money back. Of course, I chose the other way.

Chromebook left speaker close-up

As far as the speaker is concerned, it's quite beyond repair. While the body (which I guess is kind of a resonance chamber?) and the magnet are quite unharmed, the membrane and the coil (made with a piece of flexible PCB as far as I could see) ended up in a puddle of molten plastic. I suppose the other speaker is still working correctly, but I didn't test it as I have it disconnected from the motherboard for now.

Chromebook motherboard, top side

Once you remove the human interface, the business part of the laptop is a surprisingly small motherboard, containing little more than the Exynos system-on-chip surrounded by a bunch of memory chips and power supply circuits.

Chromebook motherboard, bottom side

The problem with the annoying bootloader turned out to be harder to solve than I thought. As I understand the boot process, the CPU first runs pre-boot code (apparently some proprietary initialization code from Samsung). This then loads a secondary program loader, which in turn loads U-Boot. This one then annoys you and goes on to load anything you want from the SSD, provided you are in developer mode, of course. I'm not sure about the first two parts, but U-Boot is stored on a Winbond 25Q32DW series serial flash chip with an SPI interface.

Serial flash IC on Chromebook motherboard

This chip has an active-low write-protect pin. The pin is pulled low by default somehow, which prevents the main Exynos CPU from writing to flash. It doesn't seem to be tied to ground though, so I'm guessing it might be controlled from the embedded system controller (or a GPIO pin from Exynos, but that would be kind of stupid). If you browse Google's documentation there are some mentions of a mysterious servo2 debug board that apparently allows you to overwrite the flash and even boot the computer if the flash is corrupted. I haven't been able to find any kind of details about it, not even where you are supposed to connect it. There are no special debug connectors on the laptop's motherboard as far as I can see, so it's either plugged into one of the externally accessible connectors (USB, HDMI, SD card, audio) and does some magic through there (possibly with the help of ESC), or servo2 refers to a special version of the motherboard that has some additional debug capabilities.

In any case, without the debug board's magic, replacing the bootloader doesn't look simple. I can rewire the write-protect pin, but that will give me exactly one try at programming the flash. If I botch it, Exynos will crash on boot and I won't get another chance. I'm not sure I'm capable of desoldering the flash chip without destroying it, reprogramming it externally and soldering it back without messing up any of the tiny SMD components around it. There seems to already be an Arduino-based programmer available for these chips though, so at least I would be spared the task of coding that myself.

I've built the Chromium OS development environment which in turn can also be used to build the flash images. While everything seems to build without problems, I'm kind of confused as to whether the images built in this way still include the annoying warning or not. The build process itself turned out to be quite convoluted, involving surprising amounts of complex Python code (what's wrong with Makefiles?) hidden behind Gentoo's Portage scripts and has so far resisted my attempts to find out how the flash image is actually constructed.

Unfortunately, while poking inside the laptop I managed to add a third problem on top of the previous two. While re-attaching the copper cooling plate my screwdriver slipped and shattered one of the tiny bare-die flip-chip packages around the STM32F10086 controller.

Shattered flip-chip component

These seem to be bare silicon soldered directly to the PCB without any kind of packaging and are surprisingly brittle. From what's left of it and by looking at similar components on the board, I'm guessing the laser-etched back-side marking originally said 2822HN. I'm not sure what its function was and I can't find any references on the web for these components (the other type of a similar component used on the motherboard is 28DCV7). Perhaps a discrete logic gate? In any case, it's quite beyond my capabilities to replace, even if I managed to get a replacement part.

Surprisingly, with this component in the broken state as it is, the laptop still boots. So far I haven't yet tested if any peripheral isn't working. One effect seems to be that the power button is now kind of unreliable, taking several presses before the computer turns on. But that might also not be related - with the casing open, everything is kind of wobbly and I wouldn't be surprised if keyboard isn't properly supported in this setup.

In any case, this Chromebook seems to be a failure as far as replacing my EeePC goes. In this broken form it certainly won't become a computer I can rely on when traveling, even if I manage to replace the bootloader. Might eventually turn out useful for some other project though.

GPG key transition

13.01.2013 20:22

I've been using the same GnuPG key pair for signing and encrypting my mail since 2001. If you are not using an email client that is OpenPGP-aware you might have noticed that all my electronic correspondence seems to have a piece of robot barf appended at the end. I've been stubbornly insisting on at least signing all of my out-going mail, even for recipients that I know don't use public-key cryptography, in a futile attempt to raise awareness about these things.

This secret 1024 bit DSA/ElGamal pair will soon be 12 years old. It has been moved between many machines and, while I'm quite careful about these things, it's at least probable that in all these years it has leaked somewhere. It's also hopelessly outdated by any modern standard and quite within the reach of modern code-breakers. Listening to the RSA factorization in the real world talk at 29C3 finally reminded me to take the plunge and replace it with a modern 4096 bit RSA key. I've also moved to SHA256 digests, as recommended by Debian. And finally, to prevent the new key from getting this far beyond its best-before date, I've also set the expiry date to 5 years.

So, my new key is:

pub   4096R/0A822E7A 2013-01-13 [expires: 2018-01-12]
      Key fingerprint = 4EC1 9BBE DE7A 4AA1 E6EB  A82F 059A 0D2C 0A82 2E7A

I will be immediately switching all signatures to it. I will not revoke my old key for the next 90 days, but if you encrypt your mail, please use my new key instead. Also, if you got one of my Moo cards recently, please note that the GPG fingerprint on the back side refers to my old key.

You can import my new public key into your key chain by using the following command:

$ gpg --keyserver --recv-key 0A822E7A

I would appreciate it if you signed my new key to integrate it into the web of trust. If you meet me in person in the future, I will probably give you the key fingerprint, so you can be sure it's the correct one. Otherwise, if you trust my old key, you can check my official key transition statement, which is signed by both my old and my new key.

Pinkie sense debugging

12.01.2013 18:29

Here's another story about a debugging session that took an embarrassing amount of time and effort and confirmed once again that a well thought-out design for debuggability will pay for itself many times over in saved stomach acid and general developer well-being.

As you might remember, the UHF receiver I designed a while back has been deployed on VESNA sensor nodes in a cognitive radio testbed. When the first demos of the deployment were presented however it became apparent that nodes equipped with it had a problem: experiments would often fail mysteriously and had to be repeated a few times before valid measurements could be retrieved.

In this particular case firmware running on VESNA's ARM CPU implements a very simple, home-brew scheduler. An experimenter sends a list of tasks over the management network to the sensor node and the scheduler attempts to execute them roughly at the specified times. After a while, the experimenter asks the node if the given tasks have been completed, and if they have, requests a download of the recorded data. Often however nodes would simply continue replying that the tasks have not been completed well after the last command should have been concluded.

This kind of problem was specific to nodes equipped with the UHF receiver (not that other nodes don't have problems of their own) and seemingly limited to sensor nodes deployed high on light-poles as the one test article on my desk refused to exhibit this bug.

A hint of what might be going on came when I started monitoring the uptime of sensor nodes with Munin. When the bug manifested itself the number of seconds since the CPU reboot fell to zero, making apparent that the node was resetting itself during experiments. Upon reset the scheduler would forget about the scheduled tasks stored in volatile memory, leaving the non-volatile state that was queried by experimenters in a perpetual running state. This oversight on the part of scheduler design and the fact that the node can't signal errors back to the infrastructure over the management network protocol has already made debugging this issue harder than necessary.
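The failure mode is easier to see in a toy model. The Python sketch below is not the VESNA firmware; the class, the status strings and the optional recovery step are all hypothetical and only mimic the behavior described above: tasks live in volatile memory, the status that experimenters query lives in non-volatile memory, and a reset wipes only the former.

```python
class ToyNode:
    """Toy model of a node with volatile tasks and non-volatile status."""

    def __init__(self):
        self.nonvolatile_status = "idle"  # survives a reset
        self.volatile_tasks = []          # lost on reset

    def submit(self, tasks):
        self.volatile_tasks = list(tasks)
        self.nonvolatile_status = "running"

    def run_all(self):
        self.volatile_tasks.clear()
        self.nonvolatile_status = "done"

    def reset(self, recover=False):
        self.volatile_tasks = []  # RAM contents are gone
        # Hypothetical fix: notice on boot that tasks were interrupted.
        if recover and self.nonvolatile_status == "running":
            self.nonvolatile_status = "failed"

    def query(self):
        return self.nonvolatile_status


node = ToyNode()
node.submit(["sweep 470-862 MHz"])
node.reset()             # watchdog strikes mid-experiment
print(node.query())      # the experimenter keeps seeing this forever
```

Without the recovery step the node reports a task as running even though nothing will ever complete it.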

Next step was to determine what was causing these resets. Fortunately the STM32F103 CPU used on VESNA provides a helpful set of flags in the RCC_CSR register that allow you to distinguish between six different CPU reset reasons. Unfortunately, the bootloader on deployed nodes clobbers the values in the register, leaving no way to determine its value after reboot.
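For reference, here is a small Python sketch that decodes a raw RCC_CSR value into reset reasons. The bit positions are as I read them in the STM32F10x reference manual, so double-check them against your copy. The example value is made up. Note that firmware has to read the register before anything clears the flags, which is exactly what the bootloader prevented here.

```python
# Reset flag bits in RCC_CSR (bit 24, RMVF, clears them when written).
RESET_FLAGS = {
    31: "LPWRRSTF (low-power reset)",
    30: "WWDGRSTF (window watchdog reset)",
    29: "IWDGRSTF (independent watchdog reset)",
    28: "SFTRSTF (software reset)",
    27: "PORRSTF (power-on/power-down reset)",
    26: "PINRSTF (NRST pin reset)",
}


def decode_reset_flags(rcc_csr):
    """Return the names of all reset flags set in a raw RCC_CSR value."""
    return [name
            for bit, name in sorted(RESET_FLAGS.items(), reverse=True)
            if rcc_csr & (1 << bit)]


# Hypothetical register dump with bits 29 and 26 set.
for reason in decode_reset_flags(0x24000000):
    print(reason)
```

Several flags can be set at once, so the decoder returns a list rather than a single reason.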

VESNA with SNE-ISMTV-UHF in a weather-proof box.

Back to square one, I reasoned that resets might be hardware related. Since the UHF tuner is quite power hungry I guessed that it might have something to do with power supply on deployed nodes. I also suspected hangs in the STM32's I2C interface, which is supposedly notoriously buggy when presented with marginal signals on the bus. Add the fact that weather-proof plastic boxes used to house sensor nodes turned out not to be as resistant to rain as we originally hoped and hardware related problems did not seem that far-fetched.

However poking around the circuit did not reveal anything obviously wrong and with no way to reproduce the problem in a lab I came to another dead end.

Next break-through came when I managed to reproduce the problem on a node that had been unmounted and brought back to the lab. It turned out that on this node a specific command would result in a node reset in around 2 cases out of 100. This might not sound like much, but a real-life experiment would typically consist of many such tasks, adding up to a much higher probability of failure. Still, it took around two hours to reliably reproduce a reset in this way and this resulted in careful monitoring of failure probabilities versus the physical node and firmware versions. This monitoring later proved that all nodes exhibited this bug, even the test article I initially marked as problem-free.
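To put a number on "adding up", here is a small sketch that computes the chance of at least one reset during an experiment. It assumes that every task independently triggers a reset with the same probability, which is a simplification; the function name is made up for this example.

```c
/* Probability that at least one of n tasks triggers a reset, assuming each
 * task independently resets the node with probability p. */
double experiment_failure_probability(double p, unsigned n)
{
    double survive = 1.0;   /* probability that no task so far caused a reset */

    for (unsigned i = 0; i < n; i++)
        survive *= 1.0 - p;

    return 1.0 - survive;
}
```

With p = 0.02 and a hypothetical experiment of 50 tasks, this comes out at roughly 64%, so under these assumptions most such experiments would be cut short by a reset.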

Having a reproducible test case on my desk however did little to help the issue. With no JTAG or serial console available on the production configuration, it was impossible to use an on-chip debugger. It did make it possible to upload a fixed bootloader though and the cause of the resets was revealed to be the hardware watchdog.

VESNA uses the STM32 independent watchdog, which is a piece of hardware in the microcontroller that resets the CPU state unless a specific register write occurs every minute or so. The fact that the watchdog was resetting the CPU pointed to a software problem. Unfortunately the wise developers in STM did not provide any way to actually determine what the CPU has been doing before the watchdog killed it. There is no way to get the last program counter value or even a pointer to the last stack frame and hence no sane way of debugging watchdog-related issues (I did briefly toy with the idea of patching the CPU reset vector and dumping memory contents on reset, but soon decided that does not classify as a sane method).

This led to another fruitless hunt around the source code for functions that might be hanging the CPU for too long.

Then I noticed that one of the newer firmware versions had a much lower chance of failure - 1 failure in 2500 tries. Someone, probably unknowingly since I didn't see any commit messages that would tell me otherwise, already fixed, or nearly fixed, the bug. Since such accidental fixes tend also to be accidentally removed it still made sense to figure out what exactly was causing this bug to make sure the fix stayed put. I fired up git bisect and after a few days of testing I came up with the following situation:

git-bisect result

Note that git bisect is being run in reverse here. Since I was searching for a commit that fixed a bug, not introduced it, bisect/bad marks a revision with the bug fixed while bisect/good marks a revision with a bug.

But if you look closely, you can see that the first revision that fixed it was a merge commit of two branches, both of which exhibited the bug. To make things even more curious, this was a straightforward merge with no conflicts. It made no sense that this merge would introduce any timing changes large enough to trip the watchdog timer. However the fact that the changes had to do with the integrated A/D converter did curiously point to a certain direction.

After carefully testing for and excluding bootloader and programming issues (after all, I did find a bug once where firmware would not be uploaded to flash properly if it was exactly a multiple of 512 bytes long), I came upon this little piece of code:

while (!(ADC_GetFlagStatus(ADC1, ADC_FLAG_EOC)));

Using the STM's scarily-licensed firmware library, this triggers a one-shot A/D conversion and waits for the end of conversion flag to be set by hardware. When I carefully added a timeout to this loop the bug disappeared in all firmware versions that previously exhibited the bug. I say carefully because at this point I was not trusting the compiler either, which means I added some timeout code to the loop first that had a larger-than-realistic timeout, made sure the bug was still there, and then only changed the timeout value without touching the code itself.
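A bounded wait of this kind might look something like the sketch below. It is not the actual fix that went into the firmware: the names are hypothetical, and the flag is read through a callback so that the loop can be exercised on a desktop machine without the hardware. The second function is just a stand-in flag source for such host-side testing.

```c
#include <stdbool.h>
#include <stdint.h>

/* Callback that returns true once the awaited hardware flag is set. */
typedef bool (*flag_poll_fn)(void *ctx);

/* Poll the flag at most max_polls times. Returns true if the flag was seen,
 * false if the limit was reached and the caller should report an error. */
bool wait_for_flag(flag_poll_fn poll, void *ctx, uint32_t max_polls)
{
    for (uint32_t i = 0; i < max_polls; i++) {
        if (poll(ctx))
            return true;
    }
    return false;
}

/* Stand-in for hardware: reports the flag as set only after the counter
 * pointed to by ctx has been polled down to zero. */
bool fake_flag_after_n_polls(void *ctx)
{
    uint32_t *remaining = ctx;

    if (*remaining == 0)
        return true;
    (*remaining)--;
    return false;
}
```

On the target, the callback would presumably just wrap the ADC_GetFlagStatus(ADC1, ADC_FLAG_EOC) call from the original loop. The important part is that a false return value gets reported somewhere instead of silently waiting for the watchdog.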

Now it would be most convenient to blame everything on a buggy microcontroller peripheral. Thus far most clues seem to point to the fact that some minor timing issues during ADC calibration and turn-on may cause the ADC to sporadically hang or take far longer than expected to finish a conversion (turning off the watchdog did not usually result in a hang).

But even if that is the case (and I'm still not completely convinced although this case is now closed as far as I'm concerned), this journey mostly showed what happens when debugging a 40,000-line embedded code base with little to no internal debug tools at your disposal. It's a tremendous time sink and takes careful planning and note taking - I'm left with pages of calculations I made to make sure tests were running long enough to ensure a good probability of a correct git-bisect result knowing prior probabilities of encountering a bug.
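The simplest form of such a calculation can be sketched as follows. This is not a reconstruction of the actual notes. It only answers one question: if a buggy revision resets with probability p on each try, how many clean tries in a row are needed before the chance of a buggy revision slipping through drops to alpha or below? Tries are assumed to be independent and the function name is invented for this example.

```c
/* Smallest number of consecutive clean tries n such that (1 - p)^n <= alpha,
 * where p is the per-try probability that a buggy revision shows the bug.
 * Returns 0 when alpha >= 1 (no tries needed) and also, as a sentinel, when
 * p <= 0 or alpha <= 0 (no finite number of tries is enough). */
unsigned tries_needed(double p, double alpha)
{
    double survive = 1.0;   /* chance that a buggy revision passed every try */
    unsigned n = 0;

    if (p <= 0.0 || alpha <= 0.0 || alpha >= 1.0)
        return 0;

    while (survive > alpha) {
        survive *= 1.0 - p;
        n++;
    }
    return n;
}
```

For a bug that shows up in 2 tries out of 100, getting the chance of a wrong verdict down to 1% takes 228 clean tries per revision under these assumptions. A revision where the bug is only made rarer, not removed, complicates the picture further.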

So, if you made it this far and are writing code, please make sure proper error reporting facilities are your top priority. Make sure you fail as early as possible and give as much feedback as possible to people that might be trying to debug things. Try to be reasonably robust to failures in code and hardware outside of your control. And above all, make absolutely sure you don't interfere with any kind of debugging facilities your platform provides. As limited as they might appear to be, they are still better than nothing.

Posted by Tomaž | Categories: Life | Comments »

Galaksija updates

05.01.2013 12:07

One of the results of me squatting a table in the retro-gaming assembly of 29C3 is the following short list of news regarding Galaksija:

Applebloom simulation on Galaksija at 29C3 retro-gaming assembly.

I've uploaded a new version of Galaksija development tools today. Version 0.2.2 includes the assembly source for the Not my department scroller demo that I wrote on a whim on day 1 of the Congress and that could be sporadically seen running on the TV screen in the retro-gaming area. The demo uses the built-in video driver routine in ROM, but tweaks the timings of the video signal to achieve smoother vertical motion of the text than what the usual low-resolution graphics would allow. This is a trick similar to what ROM terminal emulation routines use to smoothly scroll screen contents upwards. Here's a pretty horrible video of the running demo.

Before the congress I also updated the CMOS Galaksija page. It now finally includes complete design documentation of my Galaksija-compatible motherboard and keyboard, including schematics and PCB artwork, as well as full text of my thesis that covers a lot of details of how the circuit works (in Slovene language, for information in English it's still best to just search for "galaksija" on this blog). Over the years many people asked for these documents and now I finally managed to dig out the final version from my old CVS repository and verify that it reflects correctly the circuit in my single working specimen. I would love to hear from anyone that would like to attempt building his own CMOS Galaksija using this documentation.

Finally, if you missed my Galaksija talk at 29C3, the video has been uploaded to YouTube and you can find slides from the presentation in the PDF format on the Fahrplan.

29C3 wrap-up

03.01.2013 10:40

On Sunday the latest iteration of the Chaos Communication Congress concluded and now, a few days later, I feel that I have paid back enough of my sleep debt to be able to write a coherent wrap-up post.

29C3 in the Congress Centrum Hamburg

As you probably heard, the Congress moved from Berlin to Hamburg in search of a bigger venue, as the Berlin Congress Center was getting increasingly crowded. At least for our small delegation from Kiberpipa this complicated travel arrangements a bit, since it turned out that Hamburg does not have any cheap airline connections from our corner of the world. So we had to opt for a day of car and train travel. Actually, I always enjoyed traveling on German ICE trains, looking out the window at 300 km/h with a cup of coffee in my hand. Although for next year we certainly need to find a way that does not include driving in a sleep-deprived caffeine haze as the last leg of the journey home.

None of the fears that the new venue would somehow negatively affect the Congress were realized. The Congress Center Hamburg certainly is huge - four floors with intermediate ones in between, one huge auditorium and two somewhat smaller lecture rooms. The place was so large in fact that I regularly got lost and made a few extra circles around the building to find an assembly or a workshop I was looking for. The size made the name Ten Forward for the lounge at the top of the building quite appropriate.

Printing all tweets with the #29c3 tag

Even though more than 6000 tickets were sold, the place was not crowded. You could always find a couch to sit in and a power socket for your laptop. Except for the more popular talks in the smaller lecture rooms where you sometimes still had to be half an hour early to get a seat. Oh, and the huge queue on the first day (getting a wrist band on day 0 was definitely a good idea). But despite that, the organization team deserves all respect for keeping such an enormous event running so smoothly. Apparently one tenth of the visitors also volunteered as angels which I just find amazing and I think it throws a very good light on the kind of crowd that gathers each year at these Congresses.

Network conditions were on the same level and it should suffice to say that Wi-Fi was truly ubiquitous and my old EeePC 901 had no problems connecting to it. This was the first congress where I didn't feel the need to jack into an Ethernet port to download something. Also worth mentioning was the rumor of dropping IPv4 connectivity next year, which will be interesting if true. Definitely a good excuse to get all of my machines reachable from IPv6 by the end of the next year.

Ang Cui and Michael Costello on hacking Cisco phones

As far as talks were concerned, I really enjoyed those that I attended. It might be that some of the speakers opted to stay in Berlin for BerlinSides or EHSM, and while I would love to attend some of the talks there, I also didn't feel like there was a lack of talks on some particular interesting topic in Hamburg this year. Here are some of the more memorable off the top of my head: Natalie Silvanovich gave a wonderful talk about reverse engineering Tamagotchis, from decapping chips to dumping ROMs and IR protocols. Ang Cui and Michael Costello showed plenty of ways someone can install spyware on your Cisco IP phone and gave a nice overview of the security of the Unix-like system that runs on them. Two talks concerning weird machines are also well worth a look: it turns out that Turing-completeness can be found just about everywhere these days, you just have to look close enough. The talk about real-world RSA factorization convinced me it's time to get a larger GPG key. Finally, it would be interesting to try low-cost chip microprobing, although a suitable microscope doesn't seem to be that cheap.

Again, this short list doesn't do the whole Fahrplan justice. There were also a lot of ad-hoc organized meet-ups and workshops and these four days were just not enough to peek into all corners of the Congress. I myself will be slowly going through the back-log of video recordings of events I missed for the next few weeks at least. It's also worth mentioning that a lot of talks were presented by female speakers. While that might not mean much and I was told that I am not supposed to have an opinion on this topic, it does give one data point against criticisms that this community is hostile to women.

Seriously: Fragile Infrastructure beyond this point

Also missing from the list of technical talks above is the keynote and other talks connected to this year's motto, Not my department (a quote from a satire about Wernher von Braun I believe, who supposedly didn't care where his rockets fell since that was not his department). The main topics discussed were government surveillance and whistle-blowing and the general message was that people should take a broader look at what the effects of their work on society are. It's important to occasionally take a step back and see if your work is being used for things that you would otherwise object to. All of the talks in English were about the situation in the United States though. It will be interesting to check some of the translated German talks on this topic as well.

To conclude, the Congress was as awesome as ever and I couldn't wish for a better way to spend the last days of this year and I couldn't care less that I slept through midnight on New Year's Eve because of it. The sheer number of people I meet trying their best to make tomorrow a better place always manages to fill me with optimism that lasts for months afterwards. Presenting a talk this year made it a somewhat different experience but the feedback I got made all of the extra hours of sleep I lost well worth it, plus it made me appreciate all of the effort going into the Congress so much more. Thanks for everything and see you next year!


22.12.2012 18:13

A while ago, in fact as soon as Amazon started shipping them to Europe, I ordered the new ARM-based Samsung Chromebook. It seemed like the perfect replacement for my aging EeePC 901. Looking back, this little laptop has served me incredibly well for more than four years and cost 330 €. But recently it has started to show its age. While the battery still has more than two thirds of its original capacity (which is quite amazing) it seems that the SSD slowed down considerably and the battery sometimes won't charge until I disconnect it from the laptop and reconnect it.

ARM-based Samsung Chromebook

Anyway, the new dual-core Exynos system seemed like it would be an improvement over the old Atom and other specs are more or less the same. The most important part for me, weight, is just about the same as the 901, while the Chromebook has a considerably larger screen and keyboard. The Chromebook lacks the built-in wired Ethernet interface, so I bought a separate USB-to-Ethernet dongle. It also has an HDMI instead of a VGA connector for video, which is great for watching movies on a modern TV, but not that much for giving presentations, as most places still expect you to connect via VGA to the projector.

Some reviews I've seen criticized Chromebook's LCD panel. Certainly it can't compare in contrast and brightness to my work laptop, but that one cost almost ten times as much. However putting the EeePC and Chromebook side by side, the Chromebook seems considerably better. As far as ergonomics is concerned the only part that seems worse is the touch pad. While it is larger in surface it appears less accurate and so far I haven't liked the no-button approach (you press the whole surface down to click). I am used to having a thumb on the button while moving the cursor with my other fingers. This doesn't work here since resting a finger on the touch-sensitive surface is recognized as a two-finger gesture. I might get used to it with time though.

Oh, and the laptop is plastic, in case you were wondering. You weren't expecting machined aluminum for this kind of money, right?

As far as the pre-installed Chrome OS goes, I can say that this was one of the best out-of-the-box experiences I've seen in modern computing. There is pretty much none of the unnecessary nonsense which everyone seems to expect from a Windows laptop these days. A short OS update, enter Google account credentials and you have a browser.

But of course, I bought this laptop to run Debian on it. I switched the laptop to developer mode and overwrote the original system with a Debian Wheezy armhf installation. I first started with armel architecture, but switched later because it turns out that binaries from Google for Chrome OS only work on armhf, and you need those to have anything better than a framebuffer driver. Getting Debian to run was quite painless thanks to this Ubuntu guide, although there's a lot to learn if you are only used to setting up Intel boxes. One annoying thing is that there seems to be no nice way of disabling the big warning screen about the disabled OS signature check on each boot and the mandatory Ctrl-D. As I understand you need to overwrite the first-stage bootloader via an SPI bus to get around that.

Unfortunately my adventures with this new toy ended as soon as I opened the first YouTube video in the browser and wanted to hear the sound out of the built-in speakers. It turns out playing with ALSA mixer settings is more dangerous than I thought and soon I started smelling the stench of burning insulation and the underside of my shiny new computer started melting. At this point I was panicking and cursing the lack of a hardware power button and the non-removable battery, so the computer continued to melt while I typed in shutdown -h now.

Samsung Chromebook without the bottom cover.

This was maybe a month ago and I haven't touched the computer since then. Just yesterday I opened it up (Google provides nice disassembly instructions) and found out it's not as bad as I thought. The membrane and coil of the left speaker have heated up to the point of melting into the plastic case, but it appears there was no collateral damage as far as electronics is concerned. Meanwhile Ubuntu folk produced a patch for this problem. So in the end I might still get to use it once I sort everything out with the Debian installation. For the time being though, I'm sticking with my trusty old EeePC.

Oh, and if anyone has any source of replacement Chromebook speakers, I might buy a set. I plan to poke local Samsung service shops, but I'm not optimistic they have access to this kind of hardware.

TEDx Ljubljana 2012

17.12.2012 22:38

Yesterday I attended TEDx Ljubljana, the local, independent incarnation of the famous TED conference. Apparently this was already the 12th Slovenian TEDx event. I attended the first one in 2009 at Jožef Stefan Institute and one in fall 2010 and had quite a good time at both of them. I wanted to check how the event has evolved since then, so I set a reminder and managed to get a ticket in the first minute or so before they ran out.

The event took place in Ljubljana's opera house and was professionally organized. You got a free ticket for Ljubljana's public transportation (which got me two strange looks from the bus driver) and the registration went surprisingly smoothly. For a moment after I arrived I even had a feeling they had more staff to help you find a place than actual visitors.

TEDx Ljubljana 2012 in Ljubljana opera house.

Of course, that was not the case. Apparently they filled the opera house to the brim with 600 visitors, which certainly shows that these events are getting quite a lot of attention. Not to mention that the talks were streamed live on the national television's web site.

While I can't complain about organization (OK, perhaps the hour long break bringing 600 people into the lobby at once was a bit of a mistake), the content of the conference left me quite disappointed. I see TED talks as a balanced mixture of science, technology and arts and I think the two previous events I attended managed to hit that balance pretty well.

This event however lacked the technological and scientific component almost completely.

I can't really complain about people talking about their views on life and how it changed after this or that traumatic experience. I guess listening to one such story every once in a while can remind you of the shortness of life and the importance of enjoying it. But listening to such stories one after another just leaves you with the impression that no matter how much the speaker manages to engage the audience, his experience is his own and unlikely to change anyone else's life.

There was also a talk from the Cultural Centre of European Space Technologies and I have a special bone to pick with those people. Dragging 60 years of history through the mud because it has no cultural value in their opinion just shows how little they understand what motivates scientists and why we do research in the first place. If they won't allow you an interpretative dance routine aboard the International Space Station that doesn't mean that basic sciences contribute nothing above raw data. If you dismiss that some people can find beauty and meaning in good engineering you just lost all credibility in my eyes. Not that much of it was left after the shallow interpretation of the history of space travel.

I also hold a grudge against them for taking Slovenian space travel pioneer Herman Potočnik as their own in all their public appearances and interpreting his work as an art statement. A few years back they even went as far as ruining a reprint of his book by underlaying his nice original engineering drawings with ugly, red, purposeless graphics and then even had the audacity to publish it under a No-Derivatives Creative Commons license. But I digress.

Really the only talk with a scientific background was by Miha Krofel of the SloWolf project and the thunderous applause he got from the audience in the end did manage somewhat to correct the bad feeling from the previous talks.

I was also a bit confused as to the international aspect of this TEDx. I was under the impression that these are intended to be local events. While I can certainly understand inviting a speaker from abroad, I'm confused as to why native Slovenian speakers were giving talks in English.

In the end, I was left wondering where all the Slovenian scientists and engineers were. I'm quite certain there are plenty of them that can engage and inspire an audience like this. As the motto was turning a new page and the topic was predominantly life, it should be shown that science can give it as much meaning as arts, entrepreneurship and sports. And that contributing to human knowledge can be as rewarding as donating food for the hungry.

The ultimate Galaksija talk

11.12.2012 10:58

A few days ago the first version of the Fahrplan for 29th Chaos Communication Congress was released. As you might have guessed from the title, I'm happy to announce that I'll be presenting a talk this year about Galaksija.

Galaksija screenshot

Why ultimate? First of all, because it's modeled after the two previous ultimate talks about vintage computers: Michael Steil's about Commodore 64 and Sven Oliver's about Atari 2600. They are both well worth a watch if you are into vintage computing and should give you an impression of what my talk will look like.

The second reason is that this is probably the last talk about Galaksija I'll do. Looking back, it's amazing that I've been playing with it for over six years now. I have presented this little computer in a lot of places: from academic institutions through hackerspaces to vintage computing festivals and art festivals. I feel that I have explored just about everything about this small piece of history. Just last month I also met for the first time Galaksija's original author, Voja Antonić, who helped me clarify a few remaining questions about the original design. So while probably last, I'm confident that this will be the best of my talks on the topic.

So, if this sounds interesting and you are into old 8-bit machines, you are kindly invited on day 4 at 14:00 to Saal 6 (subject to change, as usual with Fahrplans). I'll also make sure that by that time I'll update the pages about my CMOS replica and include remaining documentation that has not been published yet.

Fountains of Paradise

23.11.2012 22:15

On my recent trip to Trinity College Dublin, I spent almost two full days at airports and on airplanes. I did have a loaded Kindle with me, so during that time I had the opportunity to finish The Fountains of Paradise by Arthur C. Clarke which I started reading a few days earlier.

I'm pretty much a fan of his science fiction works and have most of them on my bookshelf in paperback form. Fountains of Paradise however is one novel I hadn't yet read. In a somewhat fragmented way it follows the construction of a space elevator and its chief engineer. The space elevator idea actually appears quite often in Clarke's stories (for instance in The Songs of Distant Earth, or 3001). I guess that's not surprising, considering the concept is related to the geostationary orbit, something Clarke has professionally worked on. But there it's mostly just a part of the scenery. Here though, the elevator's construction is the focus of the story, as are the philosophical questions and engineering and social challenges it raises.

Main characters in the novel are portrayed in much the same way as I remember from most of Clarke's other stories I have read. They are always most rational thinkers, very devoted to their profession and fully in control of their minds. Even when they are overcome with emotions or do something that might appear irrational to an external observer, there's always a detached, internally rational view of their behavior. Some may say that this makes them unrealistic, but I think it fits with the general futuristic theme. As a tower to the geostationary orbit might be something that bridge builders of the future might professionally strive for, such rationality can on the other hand be something people might wish to achieve personally.

Talking about towers to the stars, Clarke's vivid descriptions of natural phenomena and feats of fictional engineering are amazing and Fountains of Paradise is no exception in that regard. Some of them literally gave me goosebumps and they definitely show that contrary to the popular opinion, the world seen through the technically correct eyes of an engineer doesn't need to be dull.

Definitely worth a read if you are into this sort of thing.
