Raspberry Pi Compute Module eMMC benchmarks

03.07.2016 13:52

I have a Raspberry Pi Compute Module development kit on my desk at the moment. I'm doing some testing and prototyping because we're considering using it for a project at the Institute. The Compute Module is basically a small PCB with the Broadcom's BCM2835 system-on-chip, 4 GB of flash ROM on an eMMC connection and little else. Even providing power supply at a number of different voltages is left as an exercise for the user.

Raspberry Pi Compute Module

I was wondering how the eMMC flash performs compared to the SD card on the more common Pies. I couldn't find any good benchmarks on the web. Wikipedia says that the latest eMMC standard rivals SATA speeds, but there's not much info around on what kind the Compute Module uses. I've used Samsung's ARM Chromebook with eMMC flash a while ago and that felt pretty fast. On the other hand, watching package updates scroll by on the Compute Module gave me a feeling that it's quite sluggish.

To get some more objective benchmark, I decided to compare the I/O performance with my Raspberry Pi Zero. Zero uses the same BCM2835 SoC, so the results should be somewhat comparable. I used the SD card that originally came with Zero preloaded with the Noobs distribution. It only has the raspberry logo printed on it, so I don't know the exact model or manufacturer. Both Compute Module and Zero were running the latest Raspbian Jessie.

One surprising discovery during this benchmark was that CPU on Zero runs between 700 MHz and 1 GHz while the Compute Module will only run at 700 MHz. These are the ranges detected at boot by bcm2835-cpufreq and default /boot/config.txt that came with the Raspbian image (i.e. no special overclocking). Because of this I performed the benchmarks on Zero at 700 MHz and 1 GHz.

For comparison, I also ran the same benchmark on my Cubietruck that has an Allwinner A20 system-on-chip with SATA-connected Samsung EVO 840 SSD and runs vanilla Debian Jessie.

This is the benchmark script I used. For each run, I chose the fastest result out of 5:

N=5

DEVICE=/dev/sda
#DEVICE=/dev/mmcblk0

I=0
while [ $I -lt $N ]; do
	hdparm -t $DEVICE
	I=$(($I+1))
done

I=0
while [ $I -lt $N ]; do
	hdparm -T $DEVICE
	I=$(($I+1))
done

I=0
while [ $I -lt $N ]; do
	dd if=/dev/zero of=tempfile bs=1M count=128 conv=fdatasync 2>&1
	I=$(($I+1))
done

I=0
while [ $I -lt $N ]; do
	echo 3 > /proc/sys/vm/drop_caches
	dd if=tempfile of=/dev/null bs=1M count=128 2>&1
	I=$(($I+1))
done

I=0
while [ $I -lt $N ]; do
	dd if=tempfile of=/dev/null bs=1M count=128 2>&1
	I=$(($I+1))
done

Here is write performance, as measured by dd. I wonder if dd figures are affected by filesystem fragmentation since it writes an actual file that might not be contiguous. I've been using Zero for a while with this Raspbian image while the Compute Module has been freshly re-imaged. Fragmentation shouldn't be as significant as with spinning disks, but it probably still has some effect.

Comparison of write performance.

Read performance, as measured by hdparm as well as dd. To remove the effect of cache when measuring with dd, I explicitly dropped kernel block device caches before each run.

Comparison of read performance.

From this it seems Compute Module's eMMC flash is slightly faster than the SD card, both on read and writes when comparing to Zero running at the same CPU clock frequency. It's interesting that Zero's results change significantly with CPU frequency, which seems to suggest that some part of SD card I/O is CPU bound. That said, performance seems to be somewhere roughly on the same order of magnitude. Cubietruck is significantly faster than both. In light of this result, it's sad that never versions of Cubieboard (and cheap ARM SoCs in general) dropped the SATA interface.

Finally, I tested block device cache performance. This more or less shows only RAM and CPU performance and shouldn't depend on storage speed.

Comparison of cached read performance.

Interestingly, Zero seems to be somewhat faster than the Compute Module at 700 MHz here. /proc/cpuinfo shows a different revision, although it's not clear to me whether that marks board revision or SoC revision. It might be that processors in Zero and Compute Module are not identical pieces of silicon.

In the end, I should note that these results are not super accurate. Complexities of I/O benchmarking on Linux aside, there are several things that might have affected the results. I already mentioned different filesystem state. A different SD card in Zero might give very different results (I didn't have a second empty card at hand to try that). While Raspberry Pies were idle during these tests, Cubietruck was running my web server and various other little tidbits that tend to accumulate on such machines.

Posted by Tomaž | Categories: Digital

Add a new comment


(No HTML tags allowed. Separate paragraphs with a blank line.)