Two months ago I bought a CubieTruck, one of the many cheap, bare-bone ARM-based computers that keep popping-up everywhere these days. My idea was to replace the aging x86 server that is running this website with something more power-efficient. So I was looking for a reasonably powerful board with a proper SATA interface and a decent amount of RAM. Raspberry Pi was out of the question, but the latest incarnation of CubieBoard with a dual-core 1 GHz ARM Cortex-A7, 2 GB of RAM, SATA 2.0 and Gigabit Ethernet seemed to fit the bill.
Unfortunately I could not find any reliable benchmarks I could use to estimate how ARM SoCs perform in comparison with my existing setup. So before I decided to migrate I took a while to do some performance tests and get to know this hardware.
The software setup I'm interested in benchmarking is somewhat archaic in these days of Node.js and NoSQL. I'm using Perl 5 with HTML::Template doing most of the heavy lifting (at least according to Devel::NYTProf profiler). Most parts are statically generated and some are dynamic using a handful of SpeedyCGI Perl 5 scripts. These are combined into a consistent website you see here with a somewhat convoluted Apache configuration using the threaded worker.
In the following benchmarks I'm comparing:
- An AMD Duron at 700 MHz, 1.2 GB RAM running stock x86 Debian Squeeze. Root filesystem is mounted from an IDE hard drive.
- A CubieTruck A20 running armhf Debian Wheezy and the kernel supplied for the CubieTruck Ubuntu Server installation. Root filesystem is mounted from an SD card.
Both machines were connected through a 100 Mb/s Ethernet switch to a laptop which was running the remote end of the benchmarks.
First, to see how fast the static part of the web site is generated, I ran the full (single threaded) HTML rebuild. I measured the required user space CPU time with the time utility. This is the fastest run of three on each machine:
|CPU time to rebuild static pages||45.3 s||61.8 s|
Then, to check if network was operating at the bit rate I thought it was, I ran iperf to measure TCP throughput between the server and the laptop:
|iperf throughput test||94.0 Mb/s||94.5 Mb/s|
Finally, I ran a suite of tests using the Apache benchmarking tool. I measured how many requests per minute a server can handle for different types of content and different number of concurrent requests. Numbers in parentheses show size of HTTP body (without headers).
The site rebuild is somewhat disappointingly almost one-third slower than on a 10 year old PC. However the single threaded Apache performance is on par with it. In the case of more concurrent users the CubieTruck of course has an advantage because of an additional CPU core. Actually in both cases with static content CubieTruck managed to saturate the line when there was more than one concurrent request.
I tried to make these tests in a way that the slow SD card in the CubieTruck would minimally affect their outcome. All of data should fit into the buffer cache, which is why in the first test I only took into account the fastest run and only user space CPU time. However I now suspect that the SD card still affected the numbers somehow (the rebuild operation is the heaviest of the tests regarding filesystem I/O). I don't know for sure how kernel computes the time returned by the time utility.
These results are good enough that I can't dismiss CubieTruck based on performance. If a proper SATA drive wouldn't speed it up, I could probably parallelize the build process with not much work. That should cut down on time if it's really Perl performance on ARM that is slowing it down. On the other hand I'm having some other concerns about using CubieTruck as a personal server so I'm not completely decided yet about putting it on my rack.