Follow up on Atmel ZigBit modules

27.08.2014 12:09

I've ranted before about the problematic Atmel ZigBit modules and the buggy SerialNet firmware. In my back-of-the-envelope analysis of failure modes in the Jožef Stefan Institute's sensor networks, one particular problem related to this low-power mesh networking hardware stood out: a puzzling failure that prevents a module from joining the mesh and can seemingly be fixed only by reprogramming the module's firmware.

A week ago Adam posted a link to the SerialNet source in a comment to my old blog post. While I've mostly moved on to other things, this new piece of information gave me sufficient excuse to spend another few hours exploring this problem.

Atmel ATZB-900-B0 module on a VESNA SNR-MOD board.

A quick look around Atmel's source package revealed that it contains only the code for the serial interface to the underlying proprietary ZigBee stack. There are no low-level hardware drivers for the radio and no actual network stack code in there. It didn't seem likely that the bug I was hunting was caused by this thin AT-command interface code. On the other hand, this code could be responsible for dropping characters in the serial stream. However, we have sufficient workarounds in place for that bug and it's not worth spending more time on it.

One thing caught my eye in the source: the ATPEEK and ATPOKE commands. ATZB-900-B0 modules consist of an ATmega1281 microcontroller and an AT86RF212 transceiver. These two commands allow raw access to the radio hardware registers, microcontroller RAM, code flash ROM and configuration EEPROM. I thought that, given these, maybe I could find out what gets corrupted in the module's non-volatile memories and perhaps fix it through the AT-command interface.

Only after figuring out how to use them by studying the source did I find out that these two commands are in fact documented in revision 8369B of the SerialNet User Guide. Somehow I had overlooked this addition previously.


For the sake of completeness, here is a more detailed description of the problem:

A module that previously worked fine and passed all of my system tests will suddenly no longer respond to an AT+WJOIN command. It responds with neither OK nor ERROR (or their numeric equivalents). However, it responds to other commands in a normal fashion. This can happen after the module has been deployed for several months or after only a few hours.

A power cycle, reset or restoring factory defaults does not fix this. The only known way of restoring the module is to reprogram its firmware through the serial port using Atmel's Bootloader PC Tool for Windows. This reprogramming invokes a bootloader mode on the module and refreshes the contents of the microcontroller's flash ROM as well as resets the configuration EEPROM contents.

It appears that this manifests more often on sensor nodes that are power-cycled regularly. However, in our setup a node only joins the network once, right after a power cycle. Even if the bug is caused by some random event that can happen at any time during the node's uptime, it will not be noticed until the next power cycle. So it is possible that it's not the power cycling itself that causes the problem. Aggressive power-cycling tests don't seem to increase the occurrence of the bug either.


So, with the newfound knowledge of ATPEEK I dumped the contents of the EEPROM and flash ROM from two known-bad modules and a few good ones. Comparing the dumps revealed that both of the bad modules are missing a 256-byte block of code from the flash starting at address 0x00011100:

--- zb_046041_good_flash.hex	2014-08-25 16:41:51.000000000 +0200
+++ zb_046041_bad_flash.hex	2014-08-25 16:41:47.000000000 +0200
@@ -4362,22 +4362,8 @@
 000110d0  88 23 41 f4 0e 94 40 88  86 e0 80 93 d5 13 0e 94  |.#A...@.........|
 000110e0  f0 7b 1c c0 80 91 da 13  88 23 99 f0 82 e2 61 ee  |.{.......#....a.|
 000110f0  73 e1 0e 94 41 0c 81 e0  80 93 e5 13 8e e3 91 e7  |s...A...........|
-00011100  90 93 e7 13 80 93 e6 13  8b ed 93 e1 0e 94 86 14  |................|
-00011110  05 c0 0e 94 9a 70 88 81  0e 94 b5 70 df 91 cf 91  |.....p.....p....|
-00011120  08 95 fc 01 80 81 88 23  29 f4 0e 94 40 88 0e 94  |.......#)...@...|
-00011130  e1 71 08 95 0e 94 b5 70  08 95 a2 e1 b0 e0 e3 ea  |.q.....p........|
-00011140  f8 e8 0c 94 71 f4 80 e0  94 e0 90 93 c7 17 80 93  |....q...........|
-00011150  c6 17 0e 94 0f 78 80 91  d9 13 83 70 83 30 61 f1  |.....x.....p.0a.|
-00011160  88 e2 be 01 6f 5f 7f 4f  0e 94 41 0c 89 81 88 23  |....o_.O..A....#|
-00011170  19 f1 0e 94 a6 9f 6b e1  70 e1 48 e0 50 e0 0e 94  |......k.p.H.P...|
-00011180  47 f5 8c 01 8b e2 be 01  6e 5f 7f 4f 0e 94 41 0c  |G.......n_.O..A.|
-00011190  8a 81 88 23 19 f0 01 15  11 05 71 f4 8e 01 0d 5f  |...#......q...._|
-000111a0  1f 4f 87 e2 b8 01 0e 94  41 0c c8 01 60 e0 0e 94  |.O......A...`...|
-000111b0  22 5c 80 e0 0e 94 8d 5c  80 91 d9 13 81 ff 14 c0  |"\.....\........|
-000111c0  0e 94 40 88 0e 94 68 67  80 91 40 10 87 70 19 f4  |..@...hg..@..p..|
-000111d0  0e 94 e1 71 15 c0 81 50  82 30 90 f4 86 e0 80 93  |...q...P.0......|
-000111e0  d5 13 0e 94 f0 7b 0c c0  80 91 40 10 87 70 19 f4  |.....{....@..p..|
-000111f0  0e 94 bf 71 05 c0 81 50  82 30 10 f4 0e 94 df 70  |...q...P.0.....p|
+00011100  ff ff ff ff ff ff ff ff  ff ff ff ff ff ff ff ff  |................|
+*
 00011200  62 96 e4 e0 0c 94 8d f4  80 91 d7 13 90 91 d8 13  |b...............|
 00011210  00 97 d9 f4 84 e5 97 e7  90 93 d8 13 80 93 d7 13  |................|
 00011220  80 91 40 10 87 70 82 30  11 f4 0e 94 04 77 0e 94  |..@..p.0.....w..|
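
The listing above is just a unified diff of hexdump-style dumps of the two flash images. A rough sketch of an equivalent comparison on raw dump files, looking for 256-byte pages that differ (the file names here are made up), could look like this:

PAGE = 256

def pages(path):
    # Read a raw dump and split it into fixed-size pages.
    with open(path, 'rb') as f:
        data = f.read()
    return [data[i:i + PAGE] for i in range(0, len(data), PAGE)]

good = pages('zb_good_flash.bin')
bad = pages('zb_bad_flash.bin')

for n, (g, b) in enumerate(zip(good, bad)):
    if g != b:
        note = ' (erased to 0xff)' if b == b'\xff' * len(b) else ''
        print('page at 0x%08x differs%s' % (n * PAGE, note))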

This is puzzling for several reasons.

First of all, it seems unlikely that this is a hardware problem. Both bad modules (with serial numbers several tens of thousands apart) had lost the same block of code. If the flash lost its contents due to an out-of-spec voltage during programming or some other hardware problem, I would expect the corrupted address to be random.

However, a software bug causing a failure like that seems highly unlikely as well. I was expecting to see some kind of EEPROM corruption: the EEPROM is used to persistently store module settings and I assume the firmware writes to it often. The flash ROM, on the other hand, should be mostly read-only. I find it hard to imagine what kind of bug could erase a block - reprogramming the flash on a microcontroller is typically a somewhat involved procedure that is unlikely to be triggered by chance.

One possibility is that we are somehow unknowingly invoking the bootloader mode during the operation of the sensor node. During my testing, however, I found that just invoking the serial bootloader mode without supplying a fresh firmware image corrupts the flash ROM badly enough that the module does not boot at all. The Bootloader PC Tool seems to suggest that these modules also have some kind of over-the-air upgrade functionality, but I haven't yet looked into how that works. It's possible we're triggering that somehow.

Unfortunately, the poke functionality does not allow you to actually write to the flash (you can write to RAM and EEPROM though). So even though I can now detect a corrupt module flash while the node is running, that only tells me that the module won't come back on-line after a reboot. I can't fix the problem without fully reprogramming the firmware. This means either hooking the module up to a laptop or implementing the reprogramming procedure on the sensor node itself. The latter is not trivial, because it involves implementing the programming protocol and somehow arranging for the storage of a complete, uncorrupted SerialNet firmware image on the sensor node.

Posted by Tomaž | Categories: Code | Comments »

jsonmerge

20.08.2014 21:01

As I mentioned in my earlier post, my participation at the Open Contracting code sprint during EuroPython resulted in the jsonmerge library. After the conference I slowly cleaned up the remaining few issues and brought unit test code coverage up to 99%. The first release is now available from PyPI under the MIT license.

jsonmerge tries to solve a problem that seems simple at first: given a series of structured JSON documents, how to create a single document that contains an aggregate of all their contents. With simple documents that might be as trivial as calling an update() method on a dict:

>>> a = {'foo': 1}
>>> b = {'bar': 2}

>>> a.update(b)
>>> a
{'foo': 1, 'bar': 2}

However, even with just two plain dictionaries, things can quickly get complicated. What should happen if both documents contain a field with the same name? Should a later value overwrite the earlier one? Or should the resulting document have in that place a list that contains both values? Source JSON documents themselves can also contain arrays (or arrays of arrays) and handling those is even less straightforward than dictionaries in this example.
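
For instance, update() simply lets the later value win, even when both values are arrays that you might have wanted concatenated:

>>> a = {'foo': [1, 2]}
>>> b = {'foo': [3]}
>>> a.update(b)
>>> a
{'foo': [3]}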

Often I've seen a problem like this solved in application code - it's relatively simple to encode your wishes in several hundred lines of Python. However, JSON is a very flexible format and such code is typically brittle. Change the input document a bit and more often than not your code will start throwing KeyErrors left and right. Another problem with this approach is that it's often not obvious from the code what kind of strategy is taken for merging changes in different parts of the document. If you want the behavior well documented, you have to write, and keep up to date, a piece of English prose that describes it.

Open Contracting folks are all about making a data standard. Having a piece of code instead of a specification clearly seemed like the wrong approach there. They were already using JSON schema to codify the format of various JSON documents for their procedures. So my idea was to extend the JSON schema format to also encode the information on how to merge consecutive versions of those documents.

The result of this line of thought was jsonmerge. For example, to say that arrays appearing in the bar field should be appended instead of replaced, the following schema can be used:

schema = {
    "properties": {
        "bar": {
            "mergeStrategy": "append"
        }
    }
}
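
Merging with such a schema then goes roughly like this (the exact interface is described in the library's README; the documents here are made up):

>>> from jsonmerge import Merger
>>> merger = Merger(schema)
>>> base = {"bar": [1]}
>>> head = {"bar": [2]}
>>> merger.merge(base, head)
{'bar': [1, 2]}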

This way, the definition of the merge process is fairly flexible. jsonmerge contains what I hope are sane defaults for when strategies are not explicitly defined. This means that the merge operation should not easily break when new fields are added to documents. This kind of schema is also a bit more self-explanatory than a pure Python implementation of the same process. If you already have a JSON schema for your documents, adding merge strategies should be fairly straightforward.

One more thing that this approach makes possible is that given such an annotated schema for source documents, jsonmerge can automatically produce a JSON schema for the resulting merged document. The merged schema can be used with a schema validator to validate any other implementations of the document merge operation (or as a sanity check to check jsonmerge against itself). Again, this was convenient for Open Contracting since they expect their standards to have multiple implementations.
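
Getting that merged-document schema is a single call on the same Merger object - roughly like this (method name as per the README):

>>> merged_schema = merger.get_schema()

The result can then be fed to any JSON schema validator.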

Since it works on JSON schema documents, the library structure borrows heavily from the jsonschema validator. I believe I managed to make the library general enough that extending it with additional merge strategies shouldn't be too complicated. The operations performed on the documents are somewhat similar to what version control systems do, so I borrowed terminology from there: jsonmerge documentation and source talk about base and head documents and merge strategies. The meanings are similar to what you would expect from a git man page.

So, if that sounds useful, fetch the latest release from PyPI or get the development version from GitHub. The README should contain further instructions on how to use the library. Consult the docstrings for specific details on the API - there aren't many, as the public interface is fairly limited.

As always, patches and bug reports are welcome.

Posted by Tomaž | Categories: Code | Comments »

vesna-drivers git visualization

16.06.2014 17:41

Continuing in the context of two years of development of firmware for VESNA sensor nodes in Logatec, here's another view of the firmware. I used git-big-picture to graph the branching and merging of the source code.

We use a private GitHub repository to collaborate on the code. At the time of writing, it had 16 forks and 7 contributors to the master branch. Before that we used Subversion, but switched to git soon after I joined the team.

vesna-drivers repository branching visualization

Red are commits pointed to by tags, blue are branch heads and white are merge and bifurcation commits. Other uninteresting commits are not shown - zero or more intermediary commits are implied by graph edges. Time, or rather entropy, roughly increases from left to right.

Below is a detail from that picture. If you're curious, you can also download the complete graph as a rather large PDF (most branch names and tags have been redacted though).

vesna-drivers repository branching visualization, detail

At first glance, this looks pretty chaotic. Our academic environment is definitely a bit less strict regarding code structure than what you might see in a production project. That's not necessarily a bad thing. Many people here are researchers first and software developers second. For many, this has been their first encounter with source version control.

On the other hand, the whole point of this repository is that work can be shared. Pieces that have been useful in one project are often useful again in another. Searching 16 repositories and countless branches for a driver you might reuse isn't very fun. So some kind of structure is a must.

It's hard to get this balance right though. On one hand you don't want to be too strict when accepting code into the git master, since then you have less code to share. Often it's even impossible for the reviewer to run-test the code, since he might not have the necessary hardware. On the other hand, merging code that will be a maintenance hell later on is counterproductive as well. There's no use trying to share code that will cost people more time to debug than it would take to rewrite from scratch.

We're currently going with a system of GitHub pull requests and a Wiki page that lists all the projects in private branches, and I think setting up automated builds was worth the effort. But I guess after two years we're still looking for the sweet spot.

Posted by Tomaž | Categories: Code | Comments »

Evolution of VESNA firmware size

11.06.2014 21:23

Yesterday I got an idea to plot how the size of the firmware images for VESNA sensor nodes evolved over time. Thanks to my diligent tagging of releases in git and making sure binaries can be built with a single make, all it took was a short bash script to come up with this:

Size of binary firmware image over time.
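
In case you're wondering, the script itself does nothing clever: for each release tag, check out, build and record the size of the image. A rough equivalent sketched in Python (the tag pattern, build invocation and image path are assumptions, not necessarily what we use):

import os
import subprocess

# List release tags; the "v*" pattern is an assumption about our naming.
tags = subprocess.check_output(['git', 'tag', '-l', 'v*']).decode().split()

for tag in sorted(tags):            # lexical sort is good enough for a plot
    subprocess.check_call(['git', 'checkout', '-q', tag])
    subprocess.check_call(['make', '-s'])
    size = os.path.getsize('firmware.bin')
    print('%s\t%d' % (tag, size))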

Only now do I realize that we've been polishing this code base for two years now.

When we originally started working on the firmware that would run on spectrum sensing nodes in the Logatec testbed, a decision was made to develop everything from scratch and not go with an existing operating system like Contiki. The rationale was that we didn't need all of its facilities and that making something from scratch would be simpler than learning and building on existing work.

As it usually happens in such cases, over time we basically developed our own, small operating system. Given all the invested time, it is now hard to make a break from it, even when we have applications that would benefit from a standard platform like Contiki.

Looking at the graph, I'm actually surprised that the size of the firmware didn't increase more than it has. Keep in mind that these binary images are uploaded over a flaky ZigBit network where the best throughput is in the hundreds-of-bytes-per-second range. From the upload times to 50 nodes it certainly felt like it had increased a lot.

I didn't look into which features caused the biggest size increases. I'm pretty sure the recent large jump between versions 2.40 and 2.42 was because of a new SPI and microSD card driver. Also, one of the size decreases early on came after we added the -ffunction-sections and -fdata-sections options to GCC.

Posted by Tomaž | Categories: Code | Comments »

Concentrated Python cuteness

20.03.2014 19:45

Yesterday, I wrote the following piece of Python code to answer a question asked on IRC by Jure. It was more of a joke than a serious suggestion.

Update: In case you want to use this code in some serious application, please be warned that it does not work in the general case. See comments below for some better suggestions on how to solve this task.

def list_pages(page_num, cur_page, n):
	k = sorted([	0, 2*n+1, 
			cur_page-(n+1), cur_page+n, 
			page_num-(2*n+1), page_num])
	return range(1, page_num+1)[k[1]:k[-2]]

What it does is return a list of page numbers out of a total of page_num pages. If possible, the list is centered on cur_page and includes n pages before and after it. At the edges, it still returns a list of the same length. For example:

list_pages(10, 1, 2)  = [1, 2, 3, 4, 5]
list_pages(10, 2, 2)  = [1, 2, 3, 4, 5]
list_pages(10, 3, 2)  = [1, 2, 3, 4, 5]
list_pages(10, 4, 2)  = [2, 3, 4, 5, 6]
list_pages(10, 5, 2)  = [3, 4, 5, 6, 7]
list_pages(10, 6, 2)  = [4, 5, 6, 7, 8]
list_pages(10, 7, 2)  = [5, 6, 7, 8, 9]
list_pages(10, 8, 2)  = [6, 7, 8, 9, 10]
list_pages(10, 9, 2)  = [6, 7, 8, 9, 10]
list_pages(10, 10, 2) = [6, 7, 8, 9, 10]

This is likely one of the most incomprehensible pieces of Python code I have ever written. The idea came from Python's cutest clamp function. It's a kind of cleverness I never want to see in a serious product.

The version Jure ended up using took around 30 lines of code to accomplish the same thing. However, even with descriptive variable names it's not immediately obvious to me what it does or how it does it. If I came across it without any additional comments, it would probably take some time and a few tests to figure out its purpose.

I failed to come up with a version that would be self-explanatory.

Perhaps comprehensive code documentation is dead these days, but I think this is one example where an English sentence can be much more expressive than code.

Posted by Tomaž | Categories: Code | Comments »

Moving a SSL certificate to a hardware token

05.01.2014 21:35

This is how you move a client-side SSL certificate from Firefox to a hardware cryptographic token. It's something I do just rarely enough that I have to look it up every time. And due to the chaotic nature of the OpenSC documentation, it's not easy to find all the steps in one place. Here is a quick guide for future reference:

This assumes that OpenSC is already installed on the system and working correctly. I'm using a Schlumberger Cryptoflex USB token.

Cryptoflex works with 2048-bit RSA keys. I haven't tried larger ones.

First export the private key and certificate to a PKCS #12 file: Edit → Preferences → Advanced → Certificates → View Certificates → Your Certificates → Backup.

You can verify that it worked by:

$ openssl pkcs12 -in file.p12

Now insert the USB token or a smart card into the reader. You can inspect existing contents by:

$ pkcs15-tool -D

The Cryptoflex 32K doesn't seem to have enough memory for two key pairs, so you have to delete any existing content before uploading a new certificate. It might be possible to just delete individual files from the token, but I couldn't figure it out, so I just erase the whole device and set everything up from scratch.

First, erase the token and set up the PKCS #15 structure on it. The default transport key offered by OpenSC works.

$ pkcs15-init --erase-card
$ pkcs15-init --create-pkcs15

Create the PIN and PUK entries on the token:

$ pkcs15-init --store-pin --auth-id 1 --label "My PIN"

Now upload the key you exported from Firefox to the token and protect it with the PIN you entered previously:

$ pkcs15-init -S file.p12 -f PKCS12 --auth-id 1

Verify that it has been written to the token correctly using pkcs15-tool -D. You can now remove the certificate from Firefox's software storage. You can do that from the certificate manager, but you have to remove the token from the system first, because Firefox's UI hides certificates in software storage while a hardware token is present.

Make sure you keep a safe backup of the file.p12 (and remember the passphrase). It should be impossible to retrieve the private key from the hardware token, so this file is now your only way to recover it in case you want to move it to a new device in the future.

Some more background info is available on the old OpenSC wiki. It's not linked from anywhere right now because supposedly they have a new wiki, but a lot of documentation didn't make it there yet.

Posted by Tomaž | Categories: Code | Comments »

OpenSC on Wheezy

27.10.2013 11:43

One of the things that broke for me on upgrade to Debian Wheezy was smartcard support in Iceweasel. I regularly use a Schlumberger Cryptoflex USB key to authenticate on websites using client-side SSL certificates, so fixing this was kind of important to me.

OpenSC documentation is a mess and from the terse error messages it was hard to make heads or tails of what was actually broken. So here's what I had to do to make authentication work again in the browser.

First, fixing the most obvious thing: with the introduction of multiarch the PKCS #11 module has moved from /usr/lib/opensc-pkcs11.so to /usr/lib/x86_64-linux-gnu/opensc-pkcs11.so. This means you have to correct the path in Iceweasel. Go to Preferences, Advanced, Certificates, Security Devices and select the OpenSC module there. Click Unload to remove the module and then Load to load the module from the new path.

Also, you might have noticed that mozilla-opensc package was removed in Wheezy. I'm not sure if it was even required in the previous release, but it's definitely not needed now.

Second, the version of OpenSC shipped with Wheezy only supports accessing the smartcard readers through the pcscd daemon. You have to install the pcscd package or OpenSC will not detect any readers.

$ opensc-tool -l
# Detected readers (pcsc)
Nr.  Card  Features  Name
0    Yes             Axalto/Schlumberger/Gemalo egate token 00 00

Now for the tricky part. With the changes above, I still got a very helpful error message whenever I tried connecting to a secure website: A PKCS #11 module returned CKR_GENERAL_ERROR, indicating that an unrecoverable error has occurred. (Error code: sec_error_pkcs11_general_error).

sec_error_pkcs11_general_error message in Iceweasel

Running a test with the pkcs11-tool showed that there was something wrong with the signing operation:

$ OPENSC_DEBUG=9 pkcs11-tool --module /usr/lib/x86_64-linux-gnu/opensc-pkcs11.so -t -l
Using slot 1 with a present token (0x1)
Logging in to "OpenSC Card (tomaz)".
Please enter User PIN: 
C_SeedRandom() and C_GenerateRandom():
  seeding (C_SeedRandom) not supported
  seems to be OK
Digests:
  all 4 digest functions seem to work
  MD5: OK
  SHA-1: OK
  RIPEMD160: OK
Signatures (currently only RSA signatures)
  testing key 0 (Private Key) 
... lots of debug output skipped ...
iso7816.c:103:iso7816_check_sw: Command incompatible with file structure
card-flex.c:1067:cryptoflex_compute_signature: Card returned error: -1200 (Card command failed)
sec.c:56:sc_compute_signature: returning with: -1200 (Card command failed)
card.c:330:sc_unlock: called
pkcs15-sec.c:380:sc_pkcs15_compute_signature: sc_compute_signature() failed: -1200 (Card command failed)
card.c:330:sc_unlock: called
reader-pcsc.c:548:pcsc_unlock: called
framework-pkcs15.c:2721:pkcs15_prkey_sign: Sign complete. Result -1200.
misc.c:59:sc_to_cryptoki_error_common: libopensc return value: -1200 (Card command failed)
pkcs11-object.c:691:C_SignFinal: C_SignFinal() = CKR_GENERAL_ERROR
error: PKCS11 function C_SignFinal failed: rv = CKR_GENERAL_ERROR (0x5)

Aborting.

This seems to be a bug in the 0.12.2-3 version of the opensc package. Luckily, it is fixed in 0.13.0-3, which is currently in Unstable. Upgrading is pretty trivial and doesn't require upgrading a lot of other packages on the system.

With this upgrade in place, everything works again for me as it did in Squeeze.

Update: You might want to also upgrade libpcsclite1 and pcscd to versions from Unstable (1.8.10-1). With versions from Wheezy I'm still occasionally getting errors.

Posted by Tomaž | Categories: Code | Comments »

time_t on embedded systems

03.10.2013 18:44

Recently I got pointed to this slide deck from Theo de Raadt on moving to a 64-bit type for representing time in OpenBSD. It addresses ways of overcoming the limitation of the way Unix-based systems represent time. Among other things it also mentions the proliferation of the POSIX standard into the domain of embedded systems and the problematic combination of the timer overflow in the not-so-far future and the particularly long life cycles of such systems.

One interesting observation he makes is that applications often do arithmetic on the clock (for instance when calculating timeouts) and that might produce bugs even before the clock itself overflows. So things will probably start breaking well before the January 2038 deadline.
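
To illustrate with made-up numbers, here is how a long timeout computed with 32-bit signed arithmetic wraps around years before the clock itself does:

INT32_MAX = 2**31 - 1

def add_int32(a, b):
    # Addition with two's complement wrap-around, like on a 32-bit time_t.
    s = (a + b) & 0xffffffff
    return s - 2**32 if s > INT32_MAX else s

now = 1893456000                            # 1 January 2030 in Unix time
timeout = add_int32(now, 10*365*24*3600)    # set a ten-year timer
print(timeout)                              # negative - "expired" decades ago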

Regarding embedded systems, I would like to add that this problem most likely affects a much larger set of systems than just those running POSIX-like operating systems.

One reason is that most systems that are programmed in C use the standard C library. While the C standard itself doesn't dictate how time is stored, all of the most common library implementations come from the POSIX world and bring in its definition of time_t as a signed 32-bit counter of seconds since 1 January 1970. For instance, software on VESNA uses newlib. Its clock used to count seconds from 1 January 2012, but that was later changed to the standard POSIX epoch to make it simpler to use the built-in time-handling functions in the C library.

The other reason is that embedded systems often rely on hardware counters to keep the clock. ARM microcontrollers from STM, for example, have a 32-bit counter in the RTC peripheral. As far as I know it is a true unsigned 32-bit counter, meaning it has one more usable bit than the signed 32-bit time_t in POSIX (which adds another 68 years to its lifetime). However, there is no simple way around this limitation in software. You could start using a software counter instead, but that can potentially change response times to other interrupts that might be critical on a microcontroller system. I don't even want to imagine retrofitting that onto a system that hasn't been touched in years and might control something big and expensive.
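
The two deadlines themselves are easy enough to check:

>>> from datetime import datetime, timedelta
>>> datetime(1970, 1, 1) + timedelta(seconds=2**31 - 1)
datetime.datetime(2038, 1, 19, 3, 14, 7)
>>> datetime(1970, 1, 1) + timedelta(seconds=2**32 - 1)
datetime.datetime(2106, 2, 7, 6, 28, 15)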

Anyway, for the moment I will go with the prevalent belief that code running on VESNA won't be around for that long and plan to book a holiday in the southern hemisphere for 2038.

Posted by Tomaž | Categories: Code | Comments »

Python Unidecode release 0.04.14

21.09.2013 14:40

Yesterday I released Unidecode 0.04.14, a new version of my Python port of Sean Burke's Text::Unidecode Perl module for transliterating Unicode strings to 7-bit ASCII.

Together with a few other minor changes, this release reverts one quite controversial change in the previous version. In the new version, Latin characters with diaeresis (ä, ö, ü) are again simply stripped of accents instead of using German transliterations (ae, oe, ue).
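
In other words, using the Finnish greeting that comes up again below, the new release gives:

>>> from unidecode import unidecode
>>> print(unidecode(u'Hyvää päivää'))
Hyvaa paivaa

With 0.04.13 the same call gave Hyvaeae paeivaeae.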

Using German transliterations instead of simple language-neutral accent stripping was the most often requested change to Unidecode. However, after that change was released, the most often reported bug was again concerning the transliteration of these characters. This reaction was interesting and several lessons have been learned from it.


Before even diving into the problem of transliteration itself, it was obvious that people wrote code under the assumption that Unidecode would not change its transliterations over time. Apparently the 0.04.13 release broke many websites because of this, since the most popular use of Unidecode is to generate ASCII-only URLs from article titles. This is interesting because each and every release of Python Unidecode has contained changes to the mappings between Unicode and ASCII characters, as was clearly stated in the ChangeLog. This bug only became apparent once the transliteration of an often-used character, like ä, was changed.

So, the lesson here is that if you are using Unidecode for generating article slugs, you should only do transliteration once and store it in the database.


Now, the issue with automatic transliteration is that it's hard to do right. In fact, it's something that requires understanding of natural language, which is itself a strong-AI problem. If you want to get a rough idea of what it involves and what kind of trade-offs Unidecode makes, I suggest reading Sean Burke's article about it. Here's a relevant quote from it:

The grand lesson here is tht if y lv lttrs ot, ppl cn stll make sense of it, but ifa yaou gao araounada inasaeratainaga laetataerasa, the result is pretty confusing.

The above explains nicely why having German transliterations in Unidecode doesn't work. While German might be the most common use of these characters, it makes transliteration for other languages using the same characters much worse. As Sean points out, it is much easier for a German speaker to recognize words with the e's missing than it is, for instance, for a Finnish reader to deal with extra e's (compare Hyvää päivää with Hyvaeae paeivaeae or Wörterbuch with Worterbuch). It was my error that I did not remember this argument before accepting the German transliteration patch.

However, this is such a popular issue that even WordPress implements the German transliteration as the only language-specific exception in their own transliteration tables. It should also be pointed out, though, that this simple fix does not actually make German transliteration perfect. There is an issue with capitalization (without understanding the context you can't know whether to replace an upper-case Ä with Ae or AE).

The solution here, which I will suggest in the future to anyone who reports this as a bug, is to use a language-specific transliteration step before using Unidecode. You can find several of them on the web. Some, like Unihandecode, have been built on top of Unidecode.

One thing you should be aware of, though, is that these language-specific solutions can give a false sense of correctness. You are now relying on people responsibly setting the string language correctly, when many don't even get string encodings right. Also, can you be sure that all of the input will be in the language that was specified? An occasional foreign visitor might be much more upset with a wrong language-specific transliteration of her name than with a somewhat language-neutral one provided by Unidecode.

In any case, you should be aware that using automatic transliteration to produce strings that are visible to the general public will lead to such problems. This is something that developers of Plone and Launchpad got to experience first-hand (although I believe the latter was not due to Unidecode).


In conclusion, I will now be much more careful accepting patches to Unidecode that deal with language-specific characters. In contrast to Sean I don't have a Master's degree in linguistics and only have a working knowledge of three languages. This makes me mostly unqualified to judge whether proposed changes make sense. Even if submitters put forth good arguments, they probably don't have a complete picture of which other languages their change might affect. Even though they didn't raise as much dust as this most recent one, I'm now actually afraid of how much damage was caused by those other few changes to Sean's original transliteration tables I accepted.

Posted by Tomaž | Categories: Code | Comments »

Hot plugging monitors in XFCE

17.09.2013 13:59

Ever since Debian Wheezy was released, I've been putting off the upgrade of my non-server machines because I don't want to switch away from my cozy GNOME 2 desktop. As kind of a trial run, I've been using Wheezy with XFCE on my work laptop since mid-July. With some work I managed to get it pretty close to the GNOME 2 functionality I am used to. However, it still feels like a significant step backwards in usability and it has several annoying features that I have so far been unable to work around.

I might post a list of the tweaks I made later on in another post. For now, here is a somewhat ugly hack to work around the fact that XFCE never fails to do exactly the wrong thing when a monitor is added to or removed from the system. As I often move between a docking station, an external monitor and an occasional projector, it's extremely annoying to have to drop into a terminal all the time to poke around with xrandr.

This is based on this post I found. The original solution didn't work for me and was a bit more convoluted than it strictly needed to be. Plus having some error checking is always nice.

Anyway, put the following into /etc/udev/rules.d/20-monitor.rules to tell udev to call a shell script on a monitor hot-plug event:

ACTION=="change", SUBSYSTEM=="drm", ENV{HOTPLUG}=="1", ENV{DEVNAME}=="dri/card0", RUN+="/etc/udev/monitors.sh"

The corresponding script in my case looks like this:

#!/bin/bash

set -e

XUSER=`w -h -s|awk '$3~"^:0"{print $1; exit 0}'`
if [ -z "$XUSER" ]; then
	echo "No one logged in, exiting."
	exit 0;
fi

XHOMEDIR=`getent passwd $XUSER|cut -d: -f6`

export DISPLAY=":0"
export XAUTHORITY="$XHOMEDIR/.Xauthority"

# Adjust the following to fit your needs

function dock {
	xrandr --output DisplayPort-2 --left-of LVDS --primary --auto
}

function undock {
	xrandr --auto
}

if xrandr|grep -q 'DisplayPort-2 connected'; then
	dock
else
	undock
fi

By the way, to debug scripts called from udev, use:

# udevadm control --log-priority=debug

This causes udev to log the standard output and error streams emitted by scripts to syslog and is pretty useful when a script mysteriously doesn't seem to do what it is supposed to.

The original warnings still apply. This solution only works when there is a single graphical login. It also doesn't work when you first turn on the computer or when the configuration of monitors changes while the laptop is suspended.

For these cases I have a shortcut to monitors.sh on my desktop which I can click in anger when my high-DPI 24" flat panel starts pretending it has the resolution of a Game Boy LCD.

Posted by Tomaž | Categories: Code | Comments »

Unbreaking GNU Radio GUI

01.09.2013 21:14

Here's a recipe for solving a particular problem with GNU Radio. It's more or less for my own record, as I'm pretty sure I'll come across this same problem again on some other machine and it took some time to figure out.

If GNU Radio instrumentation using the WX GUI (like FFT and waterfall plots) suddenly looks all strange and broken like this:

GNU Radio WX GUI without OpenGL

the cause of the problem is that the Python OpenGL bindings are not working for some reason. In this case GNU Radio automatically falls back to somewhat simpler widgets, without any warning in the console. These fallback widgets, however, seem to be very outdated or downright broken. A lot of buttons and controls are missing and the waterfall plot is completely useless.

The package that needs to be installed on Debian is python-opengl.

After this is corrected, the same WX GUI FFT Sink as above will be shown like this:

GNU Radio WX GUI with OpenGL

You can override the auto-detection of a working OpenGL installation by modifying the following part of /etc/gnuradio/conf.d/gr-wxgui.conf. This is useful because if you force the style to gl you will actually see the error that is causing the code to fall back to the non-GL widgets:

[wxgui]
# 'gl', 'nongl', or 'auto'
style = auto

Posted by Tomaž | Categories: Code | Comments »

CuteCom fixes

19.08.2013 16:49

When developing embedded software, having a working serial terminal emulator is almost as important as having a working C compiler. So when working with VESNA I more or less have at least one instance of CuteCom running on my desktop at all times.

While CuteCom does its thing relatively well, the last version released by the original author (0.22.0) has a few minor annoyances that only get bigger with daily use:

  • Program occasionally stops saving terminal settings,
  • device drop-down history starts accumulating empty lines and
  • input widgets are not disabled when serial port is opened in read-only mode.

Fortunately, all of these can be fixed with more or less trivial patches. Unfortunately, there has been zero response to these patches from the upstream author. I have also tried to reach the maintainer of the Debian package via e-mail to ask if he might include these patches in the version shipped by Debian. It has now been more than a year since these attempts at communication and I have yet to receive any response.

So basically, now I'm just throwing the whole thing into the wind and hoping that maybe, just maybe, at one point someone important will pick it up and I won't have to keep installing a locally-compiled version of this package on every machine I use.

You can get the Debian source package and amd64 binary for Wheezy. If you want to include these fixes in some other distribution of CuteCom (gasp!) I have also been kind enough to nicely package them separately for you.

Posted by Tomaž | Categories: Code | Comments »

Beware of long lines in git

14.08.2013 15:45

These days I use git almost exclusively for version management. Not just for code, but also sometimes for plain text or LaTeX documents. One thing that soon becomes apparent when using git for prose is that the default command-line tools are very bad at handling long lines. There are some workarounds that improve the situation somewhat, but you have to remember to apply them on a per-repository basis and they don't solve all problems.

However, recently I noticed that this deficiency can also become a problem when handling code contributions and pull requests on GitHub.

Take for instance this example where you are merging in some changes from someone else's branch. git diff doesn't show any indication that something may be hidden beyond the right edge of the terminal:

git diff --cached output for modified hello-world.c

A pull request on GitHub fares marginally better. Here you at least get a scroll bar at the bottom that hints at the fact that something may be amiss. However, if this branch contained some more changes, the scroll bar could be several screen heights below the line that is causing it. Unless you have a mouse that can do horizontal scrolling, it becomes practically impossible to review code like that (the viewport is also conveniently set to a fixed width, so resizing the browser window doesn't help).

To be honest, until yesterday I didn't even notice that you get a scroll bar.

GitHub pull request example.

Only opening the modified file in an editor that will automatically break long lines (and not all of them do that!) will unambiguously reveal the complete content of the pull request.

Modified hello-world.c opened in vi.

So I guess the conclusion would be to not accept pull requests from untrusted sources based solely on git diff output and be mindful of scroll bars on GitHub.

Posted by Tomaž | Categories: Code | Comments »

Playing the Smile song on VESNA

26.06.2013 17:53

A while back I wrote about wireless microphone simulation with VESNA. The reason back then was to be able to use remotely-accessible sensor nodes as targets for spectrum sensing experiments. However, when I was writing the direct digital synthesis code it occurred to me that transmitting just a stationary tone is slightly boring. Basically nothing was preventing me from transmitting something more fun.

Since playing back recorded audio has somewhat prohibitive storage requirements for a low-power sensor node, I decided to try a software music synthesizer. If microcomputers in the '80s were able to do it, there should be no reason why a modern ARM CPU couldn't. I had never actually written a synthesizer before and, as usual when playing with sound or graphics, experimenting with this code was incredibly exciting.

In the end I implemented a 6-channel wavetable synthesizer. Since this time I was more concerned with audio quality than with a correct spectral envelope, I replaced the 2-bit quantization with 1-bit quantization and added delta-sigma modulation. This way I was able to exploit the high bit rate supported by the CC1101 transceiver to get 4 effective bits of audio signal resolution, since delta-sigma modulation moves some of the quantization noise above audible frequencies. The best (subjective, I guess) compromise between resolution and sample rate was a 25 kHz audio sample rate with 16x oversampling.
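
Just to illustrate the principle, here is what a first-order 1-bit delta-sigma modulator looks like, sketched in Python (a simplified illustration, not the code that actually runs on VESNA):

import math

def delta_sigma_1bit(samples, oversample=16):
    # Hold each input sample for "oversample" output bits, integrate the
    # error between the input and the previous 1-bit output, and output
    # the sign of the integrator.
    bits = []
    integrator = 0.0
    out = 1.0
    for x in samples:               # input samples scaled to [-1.0, 1.0]
        for _ in range(oversample):
            integrator += x - out
            out = 1.0 if integrator >= 0.0 else -1.0
            bits.append(1 if out > 0.0 else 0)
    return bits

# 10 ms of a 1 kHz tone at a 25 kHz sample rate; with 16x oversampling
# this produces 25000 * 16 = 400000 one-bit samples per second.
audio = [math.sin(2 * math.pi * 1000 * n / 25000.0) for n in range(250)]
bitstream = delta_sigma_1bit(audio)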

The most complicated thing in all of this was actually figuring out how to interpret time information in a MIDI file. There's a lot of information about this floating around but in the end understanding how to convert all of those weird music units into seconds and milliseconds took way more time than I thought.

VESNA audio synthesis demo block diagram.

It turns out that the 72 MHz Cortex M3 CPU on VESNA runs such a synthesizer without breaking a sweat. The limiting factor was the interrupt load, because streaming data to the CC1101 had to be done with software bit-banging. Since this use case wasn't exactly planned for, VESNA has no integrated serial bus peripheral connected to the GPIO lines that the CC1101 uses for data streaming. The software implementation handles at most 400 kbps while the transceiver goes up to 800 kbps, which would allow for one additional bit of audio signal resolution.

At high bit rates the size of the synthesis buffer also starts getting problematic. Since you want the buffer to cover at least a few periods of the lowest audio frequency, you quickly run out of the 96 kB of RAM on VESNA if the baseband bit rate is significantly higher than the audio rate.

Anyway, hacking on this has been seriously enjoyable. It was one of those side projects that give you strange looks at the office and are hard to explain to someone without giving a fair amount of background information. But in the end there's just something fundamentally fun about having at your fingertips the power to convert 50 remote devices into wireless microphones transmitting a happy song sung by a cartoon pink pony (because seriously, what else would you want to transmit over a city-wide sensor network?)

You can see a short video demonstration above which should give you an idea of what kind of audio quality is possible with this setup.

Audio synthesis source code is hosted on GitHub if you are curious about the details. Also, if you'll be in Köln two weeks from now, look for my lightning talk about this hack at SIGINT 2013.

Posted by Tomaž | Categories: Code | Comments »

Cookie law compliance

02.06.2013 19:06

On 15 June a new Slovenian law on electronic communications comes into effect. Among other things it implements the European cookie directive which makes Slovenia one of the last countries in the Union to comply with it.

Slovenian lawmakers decided on a stricter interpretation of the directive than most other countries, making it illegal, for instance, to store any state in a visitor's browser without her explicit consent. There are very few exceptions to this rule, for example where local state is the only way to implement some functionality (shopping carts for instance) or where this is expected (when you log in with a user name). But otherwise our Information commissioner is quite clear that various tracking cookies and such for anonymous users must go away unless a click-through warning is added to a web page. It remains to be seen to what extent this will be enforced though, especially since parts of the law also attempt to restrict what kind of processing you can do on plain web server access logs.

Since I started writing this blog I've been quite careful to respect the privacy of visitors to my web site. I never used cookies and for most of its existence these pages didn't include any third-party Javascript. I never used Google Analytics or other third-party analytics services and my own web server logs are only used for occasional local statistical processing when I'm curious about the number of visitors to particular articles.

I was therefore somewhat surprised when I was discussing this topic in our local Open data community and we ran some automated tests against my pages. It turned out the situation was not as rosy as I thought.

The first problem was that, against all odds, cookies were getting set for my domain. I tracked it down to jsMath, which I use to typeset mathematical notation in some blog posts. jsMath uses a cookie to store settings that you can change using a small toolbox that appears in the lower right corner of the website when text with mathematical symbols is displayed (I'm quite sure nobody noticed it though).

That settings cookie isn't problematic in itself, since changing settings is an explicit action that is expected to save some state (the box even has an option for how long you wish the settings to be retained, making that fact even clearer). However, for some reason jsMath always sets a default cookie on the first visit, even if you don't touch any settings. That's not OK, even though the cookie doesn't include any unique identifiers (and is in fact used solely on the client side, even though the browser sends it to my server on each HTTP request).

I'm not sure whether this is a bug in jsMath or if this is intentional. Anyway, the simple one-line patch below corrects this behavior and retains the setting-saving functionality if you happen to use it:

--- a/uncompressed/jsMath.js
+++ b/uncompressed/jsMath.js
@@ -1834,7 +1834,7 @@ jsMath.Controls = {
   cookie: {
     scale: 100,
     font: 'tex', autofont: 1, scaleImg: 0, alpha: 1,
-    warn: 1, fonts: '/', printwarn: 1, stayhires: 0,
+    warn: 0, fonts: '/', printwarn: 1, stayhires: 0,
     button: 1, progress: 1, asynch: 0, blank: 0,
     print: 0, keep: '0D', global: 'auto', hiddenGlobal: 1
   },

The second problem I had was that a few of my posts embed YouTube videos. That's a problem, since the YouTube player will drop two Flash Local Shared Objects on the visitor's computer as soon as it is loaded (even if you use the nocookie domain).

To my knowledge it is now impossible to embed a YouTube video on a web site and comply with the Slovenian law unless you provide a click-through warning. Since I find those obnoxious I chose to remove all embedded videos and replace them with static thumbnails that you can click-through to watch the video on the YouTube web page itself.

The other option would be to find some other video hosting service that would not set cookies (if it even exists) or host video files myself (which didn't end well a while ago). Both of these require more time than I'm willing to spend fixing this issue at the moment.

Posted by Tomaž | Categories: Code | Comments »