CubieTruck SSL performance
For years now the general consensus seems to be that the overhead of serving websites securely over SSL is negligible compared to plain-text HTTP, at least as far as CPU load is concerned. This might be true when talking about big, dynamically generated websites served by expensive rack servers humming away in some data center. But how about serving a small blog with mostly static HTML from a small ARM-based computer? It turns out that such a view is overly optimistic.
Before moving this site to HTTPS-only today, I performed some benchmarks with my CubieTruck server. I reused the same benchmarking scripts I used back in January to estimate the number of requests Apache can handle per second for a few different types of content.
My setup today is mostly identical to what I used then. The hardware is the same CubieTruck A20 on 100 Mb/s Ethernet, although now the root filesystem is on a Samsung 840 EVO SSD. The kernel is Linux 3.4.95 for Allwinner A20 from DanAnd.de. Static content is served by Apache 2.2.22 as shipped in Debian Wheezy (armhf architecture). Dynamic parts are rendered with Perl 5 and HTML::Template and served through Apache and Speedy CGI.
I compared the following two Apache configurations:
- plain-text HTTP and
- secure HTTPS using the TLS protocol (TLSv1.2, which this GCM cipher suite requires), the ECDHE-RSA-AES256-GCM-SHA384 cipher suite and a 2048-bit RSA key (a reasonably secure setup according to SSL Labs).
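To see how the local OpenSSL build describes the cipher suite named above, something like the following Python sketch could be used. It relies only on the standard `ssl` module; the helper name `describe_cipher` is my own invention for illustration, and the output depends on the OpenSSL version Python is linked against.

```python
import ssl

SUITE = "ECDHE-RSA-AES256-GCM-SHA384"


def describe_cipher(name):
    """Return OpenSSL's description of one cipher suite, or None if unavailable."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    try:
        # Restrict the (pre-TLS 1.3) cipher list to the requested suite.
        ctx.set_ciphers(name)
    except ssl.SSLError:
        # OpenSSL could not select any cipher for this name.
        return None
    # TLS 1.3 suites may still be listed, so filter by name.
    for entry in ctx.get_ciphers():
        if entry["name"] == name:
            return entry
    return None


info = describe_cipher(SUITE)
if info is None:
    print(SUITE, "is not available in this OpenSSL build")
else:
    # Shows the suite name, the protocol version it belongs to and the key strength.
    print(info["name"], info["protocol"], info["strength_bits"])
```

This only inspects the client-side library; it says nothing about what a particular Apache instance will actually negotiate.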
Below are results from the Apache benchmarking tool (ab). Numbers in parentheses show the size of the HTTP response body (without headers).
As you can see, encryption unfortunately makes the server somewhere between 2 and 7 times slower. The impact is highest when serving small static files. Most likely CPU time spent in the SSL layer dominates there. On the other hand, there is less slow-down with the dynamic content where the CPU had something to do even in the plain-text case.
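The slowdown factor is simply the plain-text request rate divided by the encrypted one. A minimal Python sketch of that arithmetic, assuming the usual "Requests per second" summary line printed by the Apache benchmarking tool, might look like this. The function names are hypothetical and the two sample lines are made up for illustration; they are not measurements from the CubieTruck.

```python
import re

# Matches the summary line printed by ab, e.g.
# "Requests per second:    123.45 [#/sec] (mean)"
RPS_PATTERN = re.compile(r"^Requests per second:\s+([0-9.]+)", re.MULTILINE)


def requests_per_second(ab_output):
    """Extract the mean requests-per-second figure from ab's text output."""
    match = RPS_PATTERN.search(ab_output)
    if match is None:
        raise ValueError("no 'Requests per second' line found")
    return float(match.group(1))


def slowdown(http_output, https_output):
    """How many times slower HTTPS is compared to plain-text HTTP."""
    return requests_per_second(http_output) / requests_per_second(https_output)


# Made-up sample lines, NOT results from the benchmark described here.
SAMPLE_HTTP = "Requests per second:    300.00 [#/sec] (mean)\n"
SAMPLE_HTTPS = "Requests per second:    100.00 [#/sec] (mean)\n"

print(f"{slowdown(SAMPLE_HTTP, SAMPLE_HTTPS):.1f}x slower")
```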
Can something be done to improve these numbers without degrading the cipher settings? Not much, it seems.
Until relatively recently, OpenSSL in Debian was compiled on ARM without hand-optimized assembly routines. I have checked, however, and the version of OpenSSL I was using already has the patch that enables them applied (together with around 50 other OpenSSL patches that Debian ships with these days).
The Allwinner A20 SoC appears to have some kind of crypto hardware on board. Unfortunately, there is very little information available about it. If it could be exposed through the kernel's crypto API, it might speed things up a bit. An experimental kernel driver has been around since June this year. The remark about not using DMA, however, makes me a bit pessimistic regarding its usefulness at this point. I might try to set it up eventually, but it will probably be tricky. Even with working kernel support, enabling hardware acceleration for OpenSSL in this way seems to be non-trivial.
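One way to check whether a hardware driver has actually registered itself with the kernel's crypto API is to look at /proc/crypto, which lists every implementation together with its priority. For a given algorithm name, the kernel prefers the implementation with the highest priority. The Python sketch below parses that format. The function names are my own, and the sample text, including the driver name `cbc-aes-hypothetical-hw`, is invented for illustration and is not the output of the A20 driver.

```python
import re


def parse_proc_crypto(text):
    """Split /proc/crypto-style text into one dict per registered implementation."""
    entries = []
    # Entries are separated by blank lines; each line is "key : value".
    for block in re.split(r"\n\s*\n", text.strip()):
        fields = {}
        for line in block.splitlines():
            key, sep, value = line.partition(":")
            if sep:
                fields[key.strip()] = value.strip()
        if fields:
            entries.append(fields)
    return entries


def best_driver(entries, algorithm):
    """Return the driver with the highest priority for an algorithm, or None."""
    candidates = [e for e in entries if e.get("name") == algorithm]
    if not candidates:
        return None
    best = max(candidates, key=lambda e: int(e.get("priority", 0)))
    return best.get("driver")


# Invented sample; the hardware driver name is hypothetical.
SAMPLE = """\
name         : cbc(aes)
driver       : cbc-aes-hypothetical-hw
module       : kernel
priority     : 300

name         : cbc(aes)
driver       : cbc(aes-generic)
module       : kernel
priority     : 100
"""

print(best_driver(parse_proc_crypto(SAMPLE), "cbc(aes)"))
```

On a real system the text would come from reading /proc/crypto instead of `SAMPLE`. Note that a driver showing up there does not by itself mean OpenSSL will use it.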
This seems like far too large a degradation. Was the benchmark client configured to reuse the initial SSL session key, and was Apache configured to cache those keys? If not, most of the time will be lost in key generation.