17.06.2011 21:49

Here is another interesting result extracted from the dataset of 150.000 blog screenshots I mentioned in my previous post. I averaged the pixel values of all the images to create a screenshot of the average blog:

Browser window averaged over 150 thousand blogs

Actually, I made this on a whim after remembering a beautiful average face that decorated a New Scientist cover a while back. It took only around 40 lines of Python code using NumPy and the Python Imaging Library, plus a few hours of processing time. I wouldn't say the result is cover-page material, but it is interesting nonetheless.
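The original 40 or so lines aren't included in this post, so the following is only a minimal sketch of the same idea. It assumes that all screenshots have the same dimensions and are stored as RGB image files. Pixel values are accumulated in a float64 NumPy array, which avoids the overflow an 8-bit sum would hit after just two images, and the total is divided by the number of images at the end. The function names `average_images` and `load_rgb` are hypothetical.

```python
import numpy as np


def average_images(images):
    """Return the pixel-wise mean of equally sized RGB images.

    `images` is any iterable of H x W x 3 uint8 arrays. Passing a
    generator keeps only one screenshot in memory at a time.
    """
    total = None
    count = 0
    for img in images:
        # float64 accumulator: a uint8 sum would overflow almost immediately
        arr = np.asarray(img, dtype=np.float64)
        if total is None:
            total = np.zeros_like(arr)
        elif arr.shape != total.shape:
            raise ValueError(f"image shape {arr.shape} != {total.shape}")
        total += arr
        count += 1
    if count == 0:
        raise ValueError("no images given")
    # Round back to 8-bit values so the result can be saved as an image.
    return np.rint(total / count).astype(np.uint8)


def load_rgb(path):
    """Load a screenshot file as an RGB array (requires Pillow)."""
    from PIL import Image  # imported lazily; only needed for real files

    with Image.open(path) as im:
        return np.asarray(im.convert("RGB"))
```

With Pillow installed, `Image.fromarray(average_images(load_rgb(p) for p in paths)).save("average.png")` would write out the averaged screenshot, where `paths` stands for whatever list of screenshot files you have.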

I guess everyone can draw their own conclusions from it. The most prominent feature is the Firefox notification bar, which is an artifact of my screenshotting method: the browser I used didn't have Adobe Flash installed. The comments on my original post suggest several ways of taking web page screenshots properly, which I will definitely use should I want to repeat a survey like this.

In hindsight, this bar might have affected the HSV histograms a bit. It is quite possible that on pages with a patterned background, the yellow background of the notification bar would be picked up as the dominant color on the page. However, I think the effect isn't significant: the bar is a single flat color, so it would have produced a spike in just one bar of the dominant color histogram in the yellow part, while the peak observed there covers at least two histogram bars.

United colors of the blogosphere

10.06.2011 21:57

Several months ago we had a discussion in the office about the icons that Zemanta automatically adds to the footer of blog posts that contain suggested content. The conversation mostly revolved around how aesthetically pleasing they look combined with the various web site designs out there.

What bothered me is that most of the arguments were based on guesses and anecdotal evidence. It made me curious about which colors actually prevail on web sites out there. So I dumped the list of blogs Zemanta knows about, threw together a bunch of really simple shell scripts and let a machine crawl blogs around the world. Of course it wasn't that simple: the crawler spent a week making screen shots of a Firefox error window before I noticed and fixed the bug. The whole machinery grew pretty complex towards the end, mostly because it turns out that modern desktop software just isn't up to such a task (and I refused to go through the process of embedding an HTML rendering engine into some custom software). When you are visiting tens of thousands of pages, a browser instance is good for at best one page load, and the X server survives maybe a thousand browser restarts.
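The post doesn't describe the crawler in detail, but the restart policy from the last sentence can be sketched as a small supervisor loop. This is a hypothetical outline, not the actual machinery: `screenshot_with_fresh_browser` and `restart_x_server` are placeholder callbacks you would implement on top of your own browser and X server setup, and the default threshold of 1000 simply mirrors the rough figure above.

```python
from typing import Callable, Iterable, List


def crawl(
    urls: Iterable[str],
    screenshot_with_fresh_browser: Callable[[str], bool],
    restart_x_server: Callable[[], None],
    restarts_per_x_server: int = 1000,
) -> List[str]:
    """Visit each URL in a new browser, recycling the X server periodically.

    screenshot_with_fresh_browser(url) is a hypothetical callback that
    starts a new browser instance, loads the page, saves a screenshot,
    kills the browser and returns False if anything went wrong.
    restart_x_server() is a hypothetical callback that tears down and
    relaunches the display server. The X server is assumed to be running
    when crawl() starts. Returns the URLs whose screenshots failed.
    """
    failed = []
    browser_starts = 0
    for url in urls:
        # One browser instance per page; after enough browser restarts,
        # recycle the X server before it falls over on its own.
        if browser_starts >= restarts_per_x_server:
            restart_x_server()
            browser_starts = 0
        browser_starts += 1
        if not screenshot_with_fresh_browser(url):
            failed.append(url)
    return failed
```

Collecting failures instead of ignoring them is one cheap way to notice a crawler that has gone wrong, provided the callback can actually tell a failed page load from a successful one.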

Collage of screen shots of a few blogs.

After around two months and a bit over 150.000 visited blogs I ended up with 50 GB of screen shots, which hopefully make a representative sample of the world's blogger population.

So far I have extracted two values from each of those files: the average color (the mean red, green and blue values over the whole page) and the dominant color (the red, green and blue values of the color present in the most pixels on the page). The idea is that the dominant color should generally equal the background color (except on pages that use a patterned background), while the average color is also affected by the content of the page.
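Both measurements can be sketched in a few lines of NumPy. This is my assumption of how they might be computed, not the exact code used for the survey: `average_color` takes the per-channel mean over all pixels, while `dominant_color` counts each distinct RGB triple with `np.unique(..., axis=0, return_counts=True)` and picks the most frequent one. Both function names are hypothetical.

```python
import numpy as np


def average_color(img):
    """Mean R, G and B values over all pixels of an H x W x 3 image array."""
    pixels = np.asarray(img).reshape(-1, 3).astype(np.float64)
    return tuple(float(v) for v in pixels.mean(axis=0))


def dominant_color(img):
    """The (R, G, B) value that occurs in the most pixels of the image.

    Ties go to whichever color np.unique sorts first.
    """
    pixels = np.asarray(img, dtype=np.uint8).reshape(-1, 3)
    # Each row is one pixel; count how often every distinct row occurs.
    colors, counts = np.unique(pixels, axis=0, return_counts=True)
    r, g, b = colors[np.argmax(counts)]
    return int(r), int(g), int(b)
```

For a page with a plain background, `dominant_color` should return the background color, while text and images pull `average_color` away from it.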

Here is what the histograms of those values look like when converted to the HSV color model. Let's start with the dominant colors:

Histogram of dominant color hue used in blog themes.

You can see fairly well-defined peaks around orange and blue, and a curious sharp peak around green. Note that this graph only shows hue, so the orange peak also includes pages with, for instance, a light brown background.

I excluded pages where the dominant color had zero saturation (meaning shades of gray from black to white) and as such had an undefined hue.
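The HSV conversion and the exclusion of grays can be sketched with Python's standard `colorsys` module, whose `rgb_to_hsv` takes components in the 0 to 1 range and returns hue, saturation and value in the same range. For grays it returns zero saturation (and a meaningless hue of 0), so those colors are counted separately and left out of the histogram. The function name and the number of bins (36 here, i.e. 10 degrees of hue per bar) are assumptions; the post doesn't state the bin width that was used.

```python
import colorsys


def hue_histogram(colors, bins=36):
    """Histogram of hues for (R, G, B) tuples with components in 0-255.

    Colors with zero saturation (black, white and grays in between) have
    no defined hue and are skipped. Returns (counts, number_skipped).
    """
    counts = [0] * bins
    skipped = 0
    for r, g, b in colors:
        h, s, _v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
        if s == 0:
            skipped += 1
            continue
        # h lies in [0, 1); min() guards against h * bins rounding up to bins.
        counts[min(int(h * bins), bins - 1)] += 1
    return counts, skipped
```

Pure red lands in the first bar and orange (255, 165, 0) in the fourth, while black, white and mid-gray are all skipped.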

Histogram of dominant color saturation used in blog themes.

The saturation histogram is weighted heavily towards unsaturated colors (note that the peak at zero is much higher and is cut off in this picture). This is pretty reasonable. Saturated backgrounds are a bad choice for blogs, which mainly publish written content and should focus on the legibility of the text.

Histogram of dominant color value used in blog themes.

Again, this result is pretty much what I expected: there are peaks at very light and very dark colors. Backgrounds in the middle of the scale don't leave much room for contrast with the text.

Moving on to histograms of average colors:

Histogram of average hues used in blog themes.

Average color hues are pretty much equivalent to dominant color hues, which increases my confidence in these distributions. We still have high peaks around orange and blue, although they are a bit more spread out. That is expected, since average colors are affected by the content of the site: different blogs using the same theme but publishing different content will have slightly different average colors.

Histogram of average color saturation used in blog themes.

Again, weighted strongly towards unsaturated colors.

Histogram of average color value used in blog themes.

Now this is interesting. The peak around black has disappeared completely! This suggests that the black peak in dominant colors was an artifact, probably due to the black color of the text being dominant over any single background color (say in a patterned background). The white peak is again very spread out, probably due to light background colors mixing with dark text in the foreground.
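A toy example shows how this artifact can arise. The page below is synthetic, not taken from the dataset: its background is tiled from eight slightly different light colors, and the first 15 of its 100 pixel columns are painted black to stand in for text. No single background color covers as many pixels as the black "text", so black wins the dominant color count, while the average color stays light. The helper names are hypothetical.

```python
from collections import Counter


def synthetic_page(width, height, pattern, text_columns):
    """Build a toy page as a flat list of (R, G, B) pixels.

    The background tiles the colors in `pattern` diagonally, and the
    first `text_columns` columns of every row are black, standing in
    for text.
    """
    pixels = []
    for y in range(height):
        for x in range(width):
            if x < text_columns:
                pixels.append((0, 0, 0))
            else:
                pixels.append(pattern[(x + y) % len(pattern)])
    return pixels


def dominant_and_average(pixels):
    """Return (most common color, per-channel mean color)."""
    dominant = Counter(pixels).most_common(1)[0][0]
    n = len(pixels)
    average = tuple(sum(p[i] for p in pixels) / n for i in range(3))
    return dominant, average


# Eight light, slightly different background colors.
pattern = [(240, 220 + 4 * k, 200) for k in range(8)]
dominant, average = dominant_and_average(synthetic_page(100, 100, pattern, 15))
```

Here black covers 15% of the pixels, while each background color covers only about 10.6%, so `dominant` is black even though the page is mostly light. Converted to HSV, the dominant color has a value of 0, while the average color, with a red component of 204, sits at a value of 0.8.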

The conclusions at this point would be that light backgrounds outnumber dark ones, that the most popular colors are based on orange and blue, and that most bloggers have the common sense to use desaturated colors in their designs.

I'm sure there are loads of other interesting metrics that could be extracted from this dataset, so any suggestions and comments are welcome as always. I also spent this Zemanta Hack Day working on a fancy interactive visualization, which will be the subject of a future blog post.

Troubles with air conditioning

07.06.2011 16:03

Last week the remote control for my Airwell air conditioning unit stopped working. In a purely Murphy-inspired coincidence, it failed on the very day a serviceman was scheduled to come and perform the yearly maintenance. After some embarrassing moments when I couldn't get the thing to start, he got it going with the manual controls.

I quickly verified with a camera (most digital camera sensors can see infrared light) that the remote control was actually transmitting, so the problem had to be in the unit itself. He didn't have any spare parts with him, so I decided to look into the issue myself before calling the service again, and he was kind enough to show me how to get the cover off.

Airwell air conditioning unit with the cover removed.

With the cover removed, the location of the IR receiver board was immediately obvious. It sat right next to some shoddy wiring work that made me reach for my soldering iron and shrink tubing. I know electricians despise using a soldering iron, but you do not connect wires on kilowatt-range equipment by merely twisting them together.

IR receiver board from an Airwell air conditioning unit.

Anyway, this is the so-called receiver board. In fact it holds four visible diagnostic LEDs, one high-intensity IR LED and one integrated optical receiver module, all on independent circuits. The receiver also has a tank capacitor on its supply line and a common-collector amplifier on its output.

Nothing was obviously broken, and after connecting the receiver part to a 5 V supply it sort of worked. However, the output signal wasn't swinging rail-to-rail as it should, and it seemed to be affected by mechanical stress. None of the solder joints looked faulty on visual inspection, but after I reflowed them the problem went away.

I wonder what the IR LED is used for. It can't be for two-way communication with the remote control, because the remote only has two LEDs and no receivers (at least on the model I have). Once the unit was working again, I checked with the camera: the IR LED is indeed active and seems to transmit something whenever the receiver gets a command from the remote. Perhaps it is for synchronization when multiple units are installed in the same room?

IR burst sent from air conditioning remote control.

By the way, the remote control emits a burst of data whenever a key is pressed. The packet is quite large, taking more than 200 ms to transmit. That is several times longer than with ordinary remotes, say for TVs, so it looks like the remote transmits the entire state of the unit rather than individual commands.

Talking analog in Cyberpipe, part 3

05.06.2011 11:09

Kiberpipa is extending its season well into June so I'm pleased to announce that there will be another talk next week for electronics enthusiasts, continuing the series I started in January.

This time the talk will depart from the details of small-signal electronics and instead focus on the problem of powering those circuits on the go. If you are designing a mobile device, chances are it will be powered by a rechargeable battery. And even in today's fully digital devices, battery management remains a stubborn island of analog technology.

New cell chemistries are constantly being developed and old ones continuously improved with new methods and materials, driven in part by the demands of new electric vehicles. Different chemistries can dictate very different designs for the circuits they power. So we will discuss the battery technologies in use today, their individual strengths and weaknesses, and what developments we are likely to see in the future. You will also hear about specific charging and discharging techniques and how to avoid the most common mistakes.

The talk will be given this Tuesday, 7 June, at 19:00 by Gregor Maček, electrical engineer, entrepreneur and designer of the eCAT line of electric vehicles.

The talk (in Slovenian) will be streamed live and recorded by Kiberpipa.

