## On black magic

27.09.2011 17:42

A while ago, during lunch break at a BarCamp, I sat at a table with some of my colleagues from Zemanta. There was an iPhone 4 on the table and conversation touched its infamous antenna issue. Gašper said "You know, I heard antenna design is kind of a black magic" to which I replied that somewhere an RF engineer is probably saying to her colleagues "You know, I heard software development is kind of a black magic".

While I was only half serious at the time, it did get me thinking. I believe that design in all fields of engineering has parts that can be calculated analytically. Parts where laws of nature, either directly or through simplification and abstraction permit parameters to be calculated with formulas and algorithms to some useful engineering precision. There are also other parts where the only solution is an experiment. Those are the parts where you have difficult to predict parameters, chaotic systems or simply involve systems so complex that an analytical solution either doesn't exist or costs more to obtain than doing a series of experiments that will lead to the same result.

I understand black magic as a problem that favors an extremely experiment-oriented approach. A problem where a solution can only be obtained with a heuristical process. One that requires intuition of a designer that has over the years accumulated and internalized a large number of failed and successful experiments. Designer with enough experience that when presented with a problem he can intuitively recall what approach worked well in a similar case, even if he can't explain it.

The two approaches, the purely analytical and purely experimental, overlap to a large degree. For example when designing analog circuits you can go pretty far calculating exact values down to the last resistor in the system, counting in stray capacitances and so on. Or you can take a more experimental approach. Calculate some basic parameters with wide enough error margins, build a prototype circuit or a simulation and tweak it until it works to your satisfaction. I would argue that the sweet spot lies somewhere between both extremes. Both in the aspect of least effort and quality of solution. Of course, hitting that sweet spot, knowing when you went too deep into theory, is also where intuition and experience come into play. It's a different kind of intuition though and arguably one that is more broadly applicable than the black magic one mentioned above.

Original by Paul A. Hoadley CC BY-SA 2.5

It's quite the same with software engineering. Just replace pages upon pages of differential equations with design documentation in a waterfall model. Or incremental building and optimization with the coveted agile approach.

In fact my growing dissatisfaction with the software development world is that the mindset is ever more moving towards extremely experimental design. This is most apparent in the fresh field of web application programming. There is little to no strict documentation on much of the modern frameworks that hold the web together. Few well defined interfaces have specifications that are too complex to fully understand and are riddled with implementation bugs. It's the field that favors the magicians that have enough mileage to recite individual web browser CSS bugs by heart in the middle of the night, ordered backwards by the issue numbers. With continuous deployment, increasing hate of version numbers and release cycles for software this bias is only growing more prevalent throughout the software industry.

It's an interesting contrast to the inherently analytical nature of computer programs and it certainly doesn't look like the fictional RF engineer's statement would be any less well grounded than my colleague's.

Posted by | Categories: Ideas | Comments »

## The most average blog

17.06.2011 21:49

Here is another interesting result extracted from the dataset of 150.000 blog screenshots I mentioned in my previous post. I summed the pixel values of all images and created the screenshot of an average blog:

Actually I made this on a whim after remembering a beautiful average face that decorated a NewScientist cover a while back. It took only around 40 lines of Python code using Numpy and Python imaging library and a few hours of processing time. I wouldn't say the result is cover-page material, but it interesting nonetheless.

I guess everyone can draw their own conclusions from it. The most prominent feature is the Firefox notification bar, which is the artifact of my screenshotting method - the browser I used didn't have Adobe Flash installed. There are methods suggested in the comments to my original post on how to do web page screenshots properly, which I will definitely use should I want to repeat a survey like this.

In hindsight, this bar might have affected the HSV histograms a bit. It is quite possible that on pages with a patterned background the yellow background of the notification bar would be picked up as the dominant color on the page. However I think the effect isn't significant, since this would have resulted in a single-color spike on the dominant color histogram in the yellow part and the spike observed there covers at least two histogram bars.

Posted by | Categories: Ideas | Comments »

## United colors of the blogosphere

10.06.2011 21:57

Several months ago we had a discussion in the office about the icons that Zemanta automatically adds to the footer of blog posts that contain suggested content. The conversation mostly revolved about how aesthetically pleasing they are combined with various web site designs out there.

What bothered me is that most of the arguments there were based on guesses and anecdotal evidence. It made me curios about what are the actual prevailing colors used on web sites out there. So I dumped the list of blogs Zemanta knows about, threw together a bunch of really simple shell scripts and let a machine crawl the blogs around the world. Of course it wasn't that simple and it wasted a week making screen shots of a Firefox error window before I noticed and fixed the bug. The whole machinery grew up to be pretty complex towards the end, mostly because it turns out that modern desktop software just isn't up to such a task (and I refused to go through the process of embedding a HTML rendering engine into some custom software). When you are visiting tens of thousands of pages a browser instance is good for at best one page load and the X server instance survives maybe thousand browser restarts.

After around two months and a bit over 150.000 visited blogs I ended up with 50 GB of screen shots, which hopefully make a representative sample of the world's blogger population.

So far I extracted two numbers from each of those files: the average color (the mean red, green and blue values for each page) and the dominant color (the red, green and blue value for the color that is present in the most pixels on the page). The idea is that the dominant color should generally be equal to the background color (except for pages that use a patterned background), while the average color is also affected by the content of the page.

Here are how histograms of those values look like, when converted to the HSV color model. Let's start with the dominant colors:

You can see pretty well defined peaks around orange, blue and a curious sharp peak around green. Note that this graph only shows hue, so that orange peak also includes pages with, for instance, light brown background.

I excluded pages where the dominant color had zero saturation (meaning shades of gray from black to white) and as such had an undefined hue.

The saturation histogram is weighted heavily towards unsaturated colors (note that the peak at zero is much higher and is cut off in this picture). This is pretty reasonable. Saturated backgrounds are a bad choice for blogs, which mainly publish written content and should focus on the legibility of the text.

Again this result is pretty much what I expected. Peaks at very light colors and very dark ones. Backgrounds in the middle of the scale don't leave much space for text contrast.

Moving on to histograms of average colors:

Average color hues are pretty much equivalent to dominant color hues, which increases my confidence in these distributions. Still we have high peaks around orange and blue, although they are a bit more spread out. That is expected, since average colors are affected by content on the site and different blogs using the same theme but publishing different content will have a slightly different average color.

Again, weighted strongly towards unsaturated colors.

Now this is interesting. The peak around black has disappeared completely! This suggests that the black peak in dominant colors was an artifact, probably due to the black color of the text being dominant over any single background color (say in a patterned background). The white peak is again very spread out, probably due to light background colors mixing with dark text in the foreground.

Conclusions at this point would be that light backgrounds are in majority over dark backgrounds, most popular colors are based on orange and blue and most bloggers have the common sense to use desaturated colors in their designs.

I'm sure there are loads of other interesting metrics that can be extracted from this dataset, so any suggestions and comments are welcome as always. I also spent this Zemanta Hack Day working on a fancy interactive visualization, which will be a subject of a future blog post.

Posted by | Categories: Ideas | Comments »

## Aiming for the center

11.04.2011 21:08

Software that exposes its user interface over the web offers limitless opportunities for monitoring the behavior of people using it. You can literally watch each move of the mouse and each key press. And it's so tempting to gather all this information and try to make sense out of it that some are pushing this idea of data centered design where each decision must come from a bump in a graph or two. From my point of view as an engineer (UX is not in my verbal vocabulary) I find that kind of approach shortsighted at best.

You are invariably optimizing for an average user, even if you divide people into neat little numbered bins. Consider an analogy from the physical world: Center of gravity is the mean location of mass in an object. In many objects, like in this ring for instance, it is also devoid of any mass. Optimizing something for the average user can mean you simply make no one happy and everyone equally miserable.

AB testing and similar ideas are just hill-climbing optimization algorithms for some variable that ultimately depends on processes in a wet human brain. Such simple approaches fall short in the real world where laws of physics are infinitely less complex. How can they be the main guide for development in user interface design, where a linear law is replaced by chaotic mess of firing neurons? I don't doubt that in some time in the future well be able to understand and model it (and that certainly won't be a two dimensional graph with vaguely defined axes). Until then it will take a human to understand a human.

Some may argue that at large scales groups of people are surprisingly predictable. My answer is that it's also well known that entropy increases in the universe. That doesn't bring you any closer to designing an internal combustion engine.

I'll stop myself here and won't go into the impossibility of doing controlled experiments with people or the moral problem I see in choosing profit as the universal optimization variable for all design questions. I'll offend enough liberal arts students as it is.

Measurements are important in evaluation, but anything involving a human in the loop is one significant figure at best. Modifying things based on that second decimal increase in metric instead of going with the gut feeling of what is best just takes everyone a little further from happiness.

Even if you reap mad profits from that bouncing blinking purchase button.

Posted by | Categories: Ideas | Comments »

## Treacherous waters

09.02.2011 21:44

These days (or rather months) my daily work at Zemanta often takes me to the part of the web I would not normally visit. Its shadier parts so to speak. And it turns out that those are surprisingly crowded these days.

I'm sure you've been there. Probably when you were searching for some useful piece of information and such sites cluttered the top of the result list and you had to sort through piles and piles of fluff before you found what you were looking for. Or maybe it was recommended by a friend through one of the many channels such recommendations travel in the age of social web. Perhaps you even had to deal with a bug report because a piece of your web-facing software, while compliant to all relevant standards, didn't perform up to some user's satisfaction when dealing with such a web site.

Imagine for a second the stereotypical web site of this class: fixed width design, unreadably small, gray type on white backgrounds. Left-top logo in saturated colors and gray gradients, courtesy of web-two-point-oh. Probably the definitive destination for some wildly popular topic right off the first page of your typical yellow press (celebrities, health, cars, shopping) or emerging interests (say Android development). At most two paragraphs of actual content and the rest filled with ads, user comments and everything in between. And of course at least 10 ways to share this page on all social networks you know about plus 10 more that you don't.

Considering that serving those abominations of the web is the only thing the companies behind them do, they are surprisingly incompetent about it. Pages won't validate or will throw a hundred Javascript errors from tens of different scripts that load behind the curtains. The little content there is was scraped from the Wikipedia or it looks like someone from a less fortunate country was hired to copy-and-paste a few statements on a prescribed topic from all around the Internet. Everything under a CC license is considered free-for-all (but don't dare break their lengthy personal-use-only terms of use!). Nobody cared about the fact that there is anything other than ASCII encoding or the subtleties XML parsing or for that matter even that the description of software product and an image that shows a porn star of the same name do not refer the same thing. As long as half a dopamine-starved human brain is able to decode it it's good enough.

What's puzzling at first is that such sites seem to be getting a shocking amount of traffic (and probably revenue as well). Of course, the opinions about the quality differ. Even among my colleagues some consider such sites valuable destinations. They have comment buttons and you can share them on Twitter! They're way more fun to visit than some tired old Wikipedia that doesn't even have Facebook integration. Regardless of the fact that any user-contributed discussion is as devoid of actual content as the site itself.

What I see in such pages is an evolution of link farming. Social farming if you will. Search engines have gotten better at detecting content that has just been blatantly and automatically copied from somewhere. So an up-to-date spammer, er, vertical influencer has switched from a website copying bot to a few mechanical turks producing syntactically unique but semantically carbon-copied content. The network effect of modern social networks brings more and more people to the site, producing worthless comments that again give the appearance of a respectable site. At this point they are trying to trick an algorithm by introducing living people into the content copying process.

Therefore you can hear a lot about how the traditional search engines will in time be completely replaced by your social network, introducing wetware on the other side as well. The idea is that natural language processing and information retrieval won't be able to distinguish between what you would consider a reputable site and a link-farmed site that approximately copied content from that reputable site. But your friends in a social network will. First because they are (hopefully) human and can understand things AI can't and second because you share their interests and trust what they trust. They can therefore in theory push more useful information in your direction than some algorithm, even when it is intimately familiar with your click- and search history.

However I think the sites I described above are a perfect example why this scheme won't bring any reduction to the fluff you will need to go through before getting to the information you want. At this point you even don't have to fool search engines any more because once you've got people clicking those little "share" buttons they will bring more visitors to your site and push your coupon codes and endorsements and whatnot regardless. In the end it's just as easy to subvert the human social network to pass around unsolicited advertisements as it is for a software algorithm. You just need a different kind of engineering.

Posted by | Categories: Ideas | Comments »

## Wireframing

10.06.2010 20:12

After seeing the beautiful x0xb0x front panel on Keith's blog I spent some time thinking about the design of the exterior of my new lab power supply.

Compared to your average (computer) graphical user interface there is really not much to it. I want to keep it simple, with knobs and old-fashioned analogue panel meters.

I chose this wide, low case so the power supply will fit under my oscilloscope on the table.

I would love to have that brushed-metal finish. I've seen front panel adhesive films on which you can print with a laser printer. However the case is wider than A4 paper format, which makes it tricky to print on in one piece.

All my previous projects that used the "double-sided-tape + laser print on paper + transparent protective adhesive foil" approach got wrinkly with time. So I'm more than willing to try something new this time.

Posted by | Categories: Ideas | Comments »

## Camera in a bathroom

07.01.2010 21:53

Hot water in my apartment is supplied by a storage-type electrical boiler. As uneconomical as it is to use electricity for heating in a big city I can't really do much about that. However what I did notice is that the insulation on the boiler is pretty good - the water will remain hot enough to use even if the heater has been turned off for a couple of days. So maybe I could optimize the time the heater is working. Instead of a simple thermostat I could add a timer that would only allow the heater to work at night when electrical energy costs around half as much as during the day.

However, before I start tearing the boiler apart I want to have at least a rough approximation of how much energy the boiler actually uses and how much such a modification would save me per month in lower electricity bills.

Unfortunately I don't have proper equipment at hand to directly measure and record electrical power (plus that would mean more messing with the wiring in the bathroom and that's a can of worms want to open as rarely as possible).

So I used another approach. I took my EeePC, put it on a high shelf across the boiler and trained the camera on the indicator that lights up when the heater is operating. From the recording it will be obvious at what time the heater was on and since I know approximate power I'll be able to calculate the energy consumption and cost.

Of course, the thought has crossed my mind that having a camera operating in a bathroom can be a really bad idea. Even more so if it's connected to a Wi-Fi capable laptop. So I took some extra precautions to prevent anyone ending up on YouTube (i.e. I disabled the wireless adapter).

I used the following GStreamer pipeline to record the video:

gst-launch -v \
v4l2src ! video/x-raw-yuv,width=640,height=480,fps=5 ! ffmpegcolorspace ! \
videocrop bottom=112 left=288 right=288 top=112 ! \
videoscale ! videorate ! video/x-raw-yuv,width=128,height=512,framerate=1/4 ! \
clockoverlay ! \
jpegenc ! avimux ! filesink location=boiler.avi


Everything up to ffmpegcolorspace appears to be necessary for the camera to operate properly. videocrop crops the video so that only the boiler can be seen on the image (makes for a boring video, but that's what I'm aiming for here). videorate drops 19 frames out of 20 to save disk space.

I also added a clock overlay, so that each frame would have the exact time recorded, just in case I couldn't later deduce the time by other, more sophisticated means. I had to scale the video width up by 2 (hence the videoscale element) to make it wide enough for the clock display to fit onto it.

Video is saved in the simple motion-JPEG format. Not really the most efficient encoding out there, but everything else seemed to produce unusable results. I guess the default settings for Theora and friends don't work very well at 0.25 fps.

I left this setup running for a week and it produced around 3.5 MB of video data per hour - no problem for EeePC's somewhat limited storage capabilities.

If ran at a normal 25 frames per second, this video shows a day in 15 minutes, so in case I won't be able to do anything automatically I would still be able to watch it and manually record the time (however I'm confident it won't come to that)

Posted by | Categories: Ideas | Comments »

## Reverse biased LED

25.11.2009 16:10

Here's an interesting question: If you force current through a LED in the reverse direction (i.e. by causing a breakdown in the junction), will it emit light, and if not, why?

Let's start with an experiment. I took an old low-intensity 3 mm yellow LED (I'm guessing GaAsP chemistry) - I want to limit this discussion only to simple P-N junction devices and ignore for the moment complicated new LEDs with quantum wells and such.

I applied a high reverse bias through a large resistor (1 MΩ, to limit power dissipation) and visually checked for any light. The diode started conducting a significant current at 185 V (which, by the way, is surprisingly high as all datasheets I've seen rate such LEDs at maximum 5 V reverse bias). At that reverse voltage I let 0.1 mA of current through it (which gave total power of something around the rated 20 mW). I couldn't see any light coming from the LED.

As a control I then reversed the polarity and let 0.1 mA flow in the forward direction. In that case I could clearly see the LED lit up in a darkened room. So, the LED wasn't destroyed in the experiment and 0.1 mA was enough to produce a visible effect.

So, the experiment confirms that a LED will not light up when the current flows in the reverse direction. But what is the theory behind it?

Visible photons emitted by the LED are generated by electron-hole recombinations in the semiconductor. The material has just the right energy gap that an electron in the conducting band can give away it's excess energy to a visible photon when it fills a hole in the valence band. In forward operation most of these recombinations happen after charge carriers travel through the depletion region and are diffusing into N or P material as minority carriers.

When the junction is in breakdown, a similar thing happens (from the high reverse voltage I measured I'm guessing the avalanche mode of breakdown). However this time carriers don't diffuse through the junction but are generated there through impact ionization. But because the voltage is reversed, electrons enter the N material as majority carriers (same goes for holes in the P material). So gone are the recombinations near the depletion region and no significant number of photons is produced.

Interesting though, carriers from an avalanche breakdown have significantly more energy than thermal ones in forward operation. And they do in fact emit light after they pass through the junction (i.e. through bremsstrahlung and hot carrier recombinations). However the light's wavelength is no longer defined by the material's band gap and its spectrum is completely different to that of forward operation. So the LED might have as well lit up in my experiment, but with intensity and wavelengths that were invisible to an unaided eye.

Posted by | Categories: Ideas | Comments »

## Google Translate idea

03.08.2009 18:40

Google Translate is currently pretty useless for any serious translation (regardless of what translators of user manuals for consumer electronics think).

However, I sometimes find myself searching for, say, a Slovene equivalent to a specific English technical term or phrase. I'm fluent in both languages, it's just that the term may be too far out of my expertise to know the correct translation - although I can usually spot the correct one when I see it in context.

Scientific terminology is usually very strict, with one specific name for a phenomenon. And if you want the translation to appear correct in the eyes of someone knowledgeable in that field, a simple by-the-dictionary translation of that name may make you a popular target of in-jokes (take note, authors of subtitles on Slovenian television networks).

So, if I don't have someone fluent in that terminology at hand, what I usually do is first check for equivalent pages in different languages of Wikipedia. More often than not, that approach fails miserably. Then I'm off to Google, where I search for the term I want to translate (in quotes) plus some terms in the targete language that I estimate must appear nearby in the translation.

However, that's not really a good task for a general search engine - translations may for example appear on different web pages (pages often have an English and an Slovenian section separated on different pages). On the other hand my search will only return results when a single page contains both English and Slovene texts, so a lot of potentially useful results would be missed.

Here's my idea: machine translation tools like Google Translate already recognize pairs of texts on the web that are direct translations of each other. That's the input for their machine learning algorithms, where they learn how to (badly) translate free text.

Wouldn't it be nice if you could enter just an English phrase and select a language, and get back a list of English texts containing that phrase, plus the matching texts in Slovene? It's not even necessary to point out where exactly in the text the equivalent phrases are - A paragraph-level of precision would be more than enough. I can find the exact spot myself while reading the context (which I must, to make sure I'm using the correct translation).

So in effect you would be using the infrastructure that most likely already exists, but for human instead of machine learning. I'm sure that would be a most useful tool for people translating technical texts. At least until machine translation becomes a little more accurate.

Posted by | Categories: Ideas | Comments »

## Entering PIN without confirmation, part 2

20.06.2009 19:44

In my last post I was trying to figure out an optimal strategy for guessing a combination for a lock that only checks the last m symbols entered.

It turned out my venture into the graph theory was a bit of a dead end. As Luka Rahne pointed out to me (on Facebook, no less), there's a different branch of mathematics that offers a much better approach to this problem and even has a connection to my home field of electronics.

Linear feedback shift registers are actually direct solutions to such a problem, albeit only if you limit yourself to two characters (n = 2). These are digital circuits that will generate all 2m-1 different sequences of length m in a shift register (i.e. for each step they insert a single digit at the beginning of the register and drop one at the end). Exception is the state that contains all zeroes, but that is easy to correct with a special case.

It turns out that there exists a generalization of the LFSR for n states. Basically you replace the XOR operations in the feedback loops with arithmetic multiplication and addition modulo n. There is quite a bit of theory behind, of course, but I had trouble finding a higher-level description of it on the internet. A great deal also appears to be patented judging by the ratio of patents that came up in the search.

Just like with ordinary LFSRs, where you need to find the correct locations of the feedback taps, you need to find the correct set of m+1 multiplicative constants in order to get the maximum length sequence (or M-sequence) out of it. This is not really straightforward, as it involves finding a primitive polynomial for the finite (Galois) field GF(nm). Luka posted his Mathematica code to calculate the polynomial's coefficients, but I found it easier to use COS.

Here Si are memory cells holding register states and aj are coefficients of the polynomial:

a_4 x^4 + a_3 x^3 + a_2 x^2 + a_1 x + a_0

My knowledge of this field is a bit superficial and I don't completely understand the connection between this polynomial and the fact that its coefficients will produce the M-sequence. I admit I only skimmed the most interesting parts of an introductory book yesterday. Even my curiosity has limits.

There was only one obstacle left: a finite field only exists for prime characterics n, which means that you can't get a M-sequence for, say, 10 symbols directly. The answer is to factor n and produce a M-sequence for each prime factor (2 and 5 for 10 symbols). You then run two sequences in parallel and assign a symbol to each unique ordered pair you get in this way. So for ordered pairs where the first element can have one of 2 different values and the second element can have one of 5, you get 10 different pairs.

It's easy to show that two sequences ran in parallel with periods

have a composite period of

(n_1 n_2)^m

if n1 and n2 are prime.

So, the final result, a sequence of 10003 keypresses that checks all 4-digit combinations, is below. C code that calculated it and checked for its correctness is here if anyone wants to play with it.

00007559254372858084599981873643682873396507388560205737258390890406577109057085
84407755656998452440953103431203840871976305548700449533810790452084777529615001
28865995256970698666717343853647244639134769304182425975616718038220171921697881
22883937678958816840159505057807206235198963108968447113618950692228573181015663
86685937212772232024957943835883460493374181748586283133070574410206513727615467
46423377876136810868191927035025480677350167368920043177812754478240711183257227
86425153254550496826957587659209644291712969724740441319298114038042533189239285
44227331436116270606531581499603224811334367500940227153898374812000175583453629
48085599890972652682973387417289460215736349380980407577018156094844177547479885
42441953012530212840971967215449600459532901780542085777438714010288759943479607
88667717252952656244739125679205082435974707708128221171830796890228939367699489
06841159414156816206335189873009868457112709940782229573090114672866959363037623
22025957852934892460593365091649486293132161564500207513636714476464333769671269
00869191836134034480777341077269820053176903744568241711092356236864351523455405
86827957496758218644391703879625640451318389104128043533098338294442373305271063
60607531490598612224911325277401840237152989364902001175492552638480955989819627
42683973296516298460315727259281880417576109146184845177456578894424519521035203
02841971876314458600559523811681442095776529704100289759852578616886777163439427
46245739034778214082535965617609028231170921786980229939276798498068511585051469
06207335098972018868557103619841682239572181104762867959272136632220359569439249
82461593274190658486393123071465400217512727704566465333678770278008791909271241
24481777250176278820153167813645468251710183346326865351432554414868379565877483
08645391612978634640551309299005028053532189328384443373214370072606175305815887
02225911234376410840337143899265802011174583542728481955898918636426939723875063
88461315636358290880517567019047084855176547568984425519430134212028519709673045
48601559432910690442195767439605000299758943568706887777072538436462557381257683
04083535874716618028331161831687880239938367788588069511494150478062173341899621
08869557012718850682339563091005662877958363126722221359478538258824715923651807
48487393032170474400317503637605466475332769760368009791818370250244917763411663
68821153076912654468351701093247226875350523544504869379474976492086553907039687
24641551218398014028153523099229284453372305360162607175214914896022359103253665
00841337052998274802111165493443628491954989908726427939632974072884713147273483
80881517476118056084955167457469884435518521124302029519618772054486115585239007
80443195676538614000399749853469606897776163528526463557290356692040935349657067
08029331070930696880339929277689488079510585140568063173250998630088795561037089
40683339472190014662977949273027622231358569528348825715832750816484973921231605
64401317412736614466575323679661268019790909360340245917672510672688311521679027
44469351610192256226975341433445404879378565966582087553816138696246515503093881
04029153432198238284553363215261062617174305904986023359012352674008513361439883
64803111074592452628591945899809626437938723964162885713056372492808915165671081
46085955076556478884535509431025202039518709762144487115494338016804531947675287
04001399658952478606997767073429426473556381346782041935258756076080393301619207
86881339838376698488179501495041468073172341988720089795470136098406933385631801
04663977858372036622331349479429248835714923740906485973830330614644113165037267
04467575232778670268119781819261240255916763500762689311430778036444793507011823
46227975250532454404979369475867482097552907128786247515412192890040391525231883
28285553272314270062717165215805886033358103342764009513270538892648131101655825
42629591854998818626537929633865062895712147362582809915074770090460959541675465
68885535418530034202139509619663044497114585328106805531856774296040113987499425
68607997676172438426573547291247682051934349746166081393210718216868913389293667
88489179410594050468173163251889620099794561126188407933294730810046739769493621
26623331258578438248935705833641806495972921320704645113074136276044775743237687
60269119690918270240355907673401662699310521768126445793416110832462379743415225
44405979278574876482197543817029686257514503182980041391434330892282955523633043
60063717074314814886133349013243664019512361528982649131010754834426395909459889
08627537838732874062995703057263482819914165760180461959450774474688955345095201
24203139418718672044597105495229006815530947764386041113896598434686179967671625
28427573456390256682151925259647066091392301708306869913298392676884991785015841
40469173072350898620199785471027088417932385720900047739678592630266333303495685
28249935614932650806595963831221604655112165126366045775652336696602791187819083
60241355816772410662799301431669026455792507100922463379652514234444159783695649
66483197452916038686357505413083880051390525320982283955432732052600737161653049
04887133258112252664119503271429882659130101744924427395818558898086375369297229
64063995612156272482919905075661080471958541764564689955254194210242131385097087
62045597014594238006915521857665286051112987588524687179876770634284375725473803
46683151834358656066191383211609206879912389382766885991694114850404791721633409
88621199694570036088517923295621800057738769582720267333212594694282599347059227
40807595872930230604755103075027266055774743326786603791096918092602513549077625
00663799210530678026555783417001822473378743504324445159692794658664931965439061
28687357414512092880151381435221882293954523722142601737070752058048971323491023
42665119412370438882759121011645824437394909548988087375278396238640739947031463
62483919814174670080571949451665464699954345184300243131294196096620555961055843
28007915430956674286151103897489424697178967760724285375634572812466931509253487
46067191292310618206979903299283666895990785104940405791630732418886311987855601
26089517832394630800157729679483620277332303584784283599256158236408175949639203
20605755012174036266155765653227686613790187908182603513458176634006737983015207
68027555692516010822573369653405224455158783784748665931874538070286973565055021
82881151290534230882393945433623042611736161742148049971232590032426751185033605
28883759030110654824537385819449888097374369386328641739856130472624939189051647
60081571858550674464799945255085200253130385186186621555870154852280179145219467
64287151012996498424797169877661624295374725562902467931418352496460771903833007
08207979812398292666995981695005840415790721722508887311896954610260995169233847
20801157638778492620377323213485684293598347148326409175858738212206157541031641
26267155674752236686713781097809082613512549166724007737892114216680375547835061
00823573278752414224555149693685648675930965528160287973474154030828911503815243
20883393854532632042711727071643048059970323580122427751094132614288937581211007
44825537294918458888197365279287228651738947120562625939098150656600915709495407
64465799854354094200353121295087086631554961144942281179054318476642971501039865
88425797078976670624395365635463802477930509342586461771812932016082179789033883
82667995890794014840515781631623408897310987944700261995078332856208111567297685
82621377232312494684393589257049226419174949728302207157450130650262771547657423
26687713690196818082713503459067624017736983104306681375456934070008335723697425
04225555058792694648775921875429060297972565144120829911412914252208933929455227
22043711636170652048159961233481022437750185122704289937490310016448355363859085
48889197274378296228751729857021462635938189140746601915618594416644757989453441
84201353030394096086731545871045842291178145308566643971410138874884357961699667
60625395274734472802577921419243486471770903922106083179698132892826779949817841
04841515690730632408997301897845600271994169322946209111476396694826313763233025
84685393498356058226519165859629202217156541120740263771456756432266977127811869
08083713412558076624117727893005206691374547924160009335632796434042355541497827
84649775830974438060397963475045020839910503904342209933838554236220537107271607
42049159870332490022537741095023604299936581300106449355272958094488991963653683
86229751638956030462735929099041646611914709584506645757898552450842113521213841
86087731454970054842391169055209466653970501128964885357870798676606353943657245
62803577830518252486571761813823006093178789122982827779858916850048515147817207
22409997210996854600371985079223846219110567386784827313672332034846953925893461
48227519074958638202317147451021640273770547746522267977036910878080937125035481
66625117636992014206791365457825060019334723786524043355450596836846597749219645
28061397872574054020939901413805242219932929544326221537016370616420591589613225
80023537650194032604399927491201006459354363948184489991872752692862397507299461
20463735838198050646711905619485406655756989542540843113430312850860977305459601
44843391078154218466753961411029864895356961788766607353852756254628135769215083
42487571670912832006193169699023882837778949906940049515056916216224199963019869
44601371894178232846319101477287684837312763322124847953834992470482375181659487
28203317056550030640373761457647422277976127900968081937034134490666351167279821
04207791274556834060119325633687424053354541586926847597658318654280713969635641
44021939810512814242319923839445226231536107360706421591498712234800335367411841
22605399836590210006559345273849084499990963742782863397416398470204737349291881
40647711814718494406755747899443440853112521302940861977214558610448533901691443
08467753870510038864995347871689666617352943746344629135678314092424975707619029
22007193078798032882937769859807840059514147906306225199872118878446113709851683
22847319010576296684937303673223024857952925982560483375090758496282133161475401
20641373670556656422377967037801868091936125124580667351076378830042177903655469
24061119234732696424153345451487826857596749308744281713878734650440319389015029
04243319832938454226331527017261606431590589702324801335276510850226153989275803
000


In conclusion, it turned out that there is quite a lot more than one way optimal way to guess the combination (as each finite field has multiple primitive polynomials and each polynomial yields an unique M-sequence). Still, you won't hack into a beer store anytime soon with this, as any practical implementation surely locks you out after some number of tries or adds an delay.

Posted by | Categories: Ideas | Comments »

## Entering PIN without confirmation

16.06.2009 19:52

A brand of electronic locks marketed by a certain security company in Slovenia has a peculiar way of recognizing the correct code.

To unlock it, you need to enter a four digit PIN. However, instead of the usual way where you have to press a confirm key after entering four digits this one unlocks as soon as you enter the correct sequence.

So basically, if the correct code is 1-2-3-4, it will unlock after pressing 9-8-1-2-3-4, or whatever sequence of digits comes before the 1-2-3-4.

It's certain that such a design reduces the time needed to guess the right combination. In theory you can check a new combination for every keypress where a normal lock would require four plus one for confirmation. For example, entering 9-8-1-2-3-4 checks 9-8-1-2, 8-1-2-3 and 1-2-3-4 with 6 presses while a normal lock would require 15 presses).

I'm wondering. What is the optimum strategy for guessing the combination? Is the theoretical one combination per key possible? Or is there necessarily some overlap between sequences entered in this way?

In the general case of n digits and m places I'm picturing this as a directed graph with nm vertices, one for each possible combination, where each vertex has n outgoing edges (corresponding to n possible keys you can press next). So the problem translates to finding a Hamiltonian path in such a graph (which is a NP-complete problem in the general case).

Path for n=5 m=2

Path for n=2 m=3

I have been able to find nicely symmetrical paths for simple case of m = 2. At least from those examples it seems that you can construct the path incrementally, by first exhausting all combinations with n digits before moving to n + 1

It appears possible that an analogue way exists for more places, but I'm having problems drawing three and four-dimensional grids on paper.

Posted by | Categories: Ideas | Comments »

## Angels, Demons and antimatter

18.05.2009 21:17

On Saturday I went to see Angels & demons. It was entertaining to watch, although it was apparent at times that whoever wrote the script was more at home in medieval myths than physics. I guess I got used to weird things characters tend to say in Hollywood movies whenever the topic of conversation shifts to science.

One thing that did made me curious is a scene towards the end when they had to get rid of an anti-matter explosive device in the middle of Rome with 5 minutes to spare. So a guy jumps into a helicopter and flies straight up. When the time is up, the helicopter is high enough that the explosion only causes a spectacular fireworks display and shatters some frescoes.

At the first thought, that seemed like a really bad idea to me. They say that the explosion had an equivalent yield of 5 kT of TNT, which is comparable to a small nuclear bomb. This kind of devices do the greatest damage when they're exploded at some height above the target, so is it really feasible to save a city in the way shown in the movie?

First question is how high can you get in an average helicopter in 5 minutes? After sampling some random technical specs on Wikipedia it appears that the rate of climb of a helicopter varies from 6 m/s in civilian to 13 m/s in military craft. In the movie they use a police or a medical helicopter. It's also empty, so take an average, say 8 m/s, to get a nice rounded figure of 500 m/min.

Second question: How dangerous is a 5 kT explosion at 2500 m if you're standing right below it (like the crowds does in the movie)?

Unfortunately the Strangelove Slide Rule doesn't provide this kind of calculation. I did some more searching and found a couple of old DOS programs from the US Defense Nuclear Agency that answer exactly this kind of questions (they're complete with warnings to reset the computer after entering classified data - I guess they haven't yet come to shredding hard disks at that stage).

The result shows that 5 kT explosion will cause a shock-wave with 10 kPa of maximum pressure on the ground zero directly below. That's a little under 1.5 PSI, which is, according to the previously mentioned slide rule, not so bad for any by-standers. Such a shock will shatter windows, but won't knock down houses or even damage most people's hearing.

Surprisingly, that's exactly what you see in the movie. So this may be one case where they actually got it right.

Posted by | Categories: Ideas | Comments »

## Having given both a whirl

29.04.2009 20:02

I'm fluent in both Perl and Python. I've been writing Perl scripts of various sizes since I began using Linux and more recently I've also worked on larger projects using Perl (from G-Tablix to Wikiprep). On the other hand my work at Zemanta has largely been about writing software in Python. Large amounts of complicated, production critical software in fact.

There's this very obvious philosophical difference between these two languages: Perl is all about there is more than one way to do it, while in Python there is (preferably) only One.

Recently I've rewritten software for this blog in Perl. My colleagues were surprised when I mentioned this. Why didn't I do it in Python, when I had plenty of examples at work of how easy it is to set up a web page with Django for example?

In my experience with both languages the fundamental difference in design philosophy shows in an interesting way: Writing software in Perl is fun by itself. The multiplicity of solutions to a single problem that Perl offers gives you the opportunity to express yourself. You can stop for a moment to contemplate the most elegant way of doing something. It's about writing software in your own, distinct style. To match your thought patterns. Not surprisingly, Perl is often compared to poetry.

Writing Python on the other hand is dull. If you already know in advance the algorithm you are about to write, writing code in Python is pure drudgery. You only have this limited number of valid expressions. You are forced to do indentation just right. Chances are that you'll write exactly the same code as the other guy if he was given the same specification. It's boring because it's simple and straightforward.

You better be discovering something new on the algorithmic level when using Python, or it won't be very enjoyable. On the other hand writing Perl code itself is fun.

Python is for experiments on a logical level. Perl is for experiments on the syntax level.

Then there's the different approach to errors: In majority of cases Perl will do what you want (unless what you want is consistency, as perlfunc(1) author remarks), maybe giving you a warning that you can later research and fix if you're a perfectionist. But in most cases if an undef gets into the stream it's an one-in-a-thousand occurrence that you can safely ignore.

Python on the other will usually throw exceptions in the most unexpected places and a single uncaught exception will kill your program (which was probably 5 minutes away from completing that 100 hour task).

It's the bugs in the code where the situation turns around. Fixing bugs in Python is fun. You get a descriptive back trace by default on all errors, code is easy to read even when it's not yours and it's satisfying to follow that stray None value as it traversed half a dozen modules until it got to some function that actually tried to do arithmetic on it.

On the other hand when something does go seriously wrong in Perl you better be the author of the code you are looking at and be familiar with interpreter's guts. Since functions will silently pass around invalid values and wrongly encoded strings, the root cause of the error you're hunting may be thousands of lines separated from its effects.

Now it's probably clear why I wrote my blogging software in Perl. It's an application that will most likely never be used, maintained or extended by anyone else by me. I was reinventing the wheel. There is no opportunity to be creative with software design of a yet another blog. It's a straightforward job of telling the computer what you want. So at least I was able to get some satisfaction from the act of expressing my wishes by jotting them down in Perl.

Having said that, I still firmly believe Python is way better language for any kind of cooperative work when compared with Perl. Most of software is at one point worked on by more than one person, so on average Python is probably a better bet when starting a project from scratch.

On another thought, even going through your own code a few years later may be considered working together with your younger self. So far I hadn't had any problems understanding my oldest Perl scripts. Then again maybe they're just not old enough.

Posted by | Categories: Ideas | Comments »

## Get Satisfaction suggestion

18.04.2009 0:52

It's not my duty to read and reply to complaints that get posted to Zemanta's corner of Get Satisfaction public support forums. However I do get notified about them and usually read a few that look like they may have something to do with my department.

Surprisingly many of them rank pretty high on the level of politeness and usefulness in regard to solving the problem. However, there are some that make you wonder which evolutionary step the writer of that particular post missed. For those gems, I propose a minor change to the usual I'm feeling happy/sad/whatever cuteness, which would at least Give Satisfaction to the replier, since the original poster is obviously incapable of getting any.

* Not his real name. No offense meant to the chimpanzees.

Posted by | Categories: Ideas | Comments »

## Reversible computing

10.04.2009 19:35

I've stumbled upon a Wikipedia page about reversible computing recently. It's a concept I haven't seen before.

The idea is that there's a theoretical limit to the efficiency of computation. For every irreversible bit operation (for example AND, OR, but not XOR or NOT), it is thought that two new microscopic states are added to the thermodynamical system that's doing the computation, increasing its entropy by:

\Delta S = k\ln{2}

If the temperature T of the computer remains constant, an amount of heat must be released to the environment:

\Delta Q = kT\ln{2}

That's called the von Neumann-Landauer limit, and it comes to 33 picowatts at 10 GHz clock and 75°C

As it's name suggests, reversible computing is exploring possibilities of doing computations with only reversible operations, that is logic gates that have a 1 to 1 mapping between input and output states. In contrast to irreversible operations there is no known lower limit to the amount of work that has to be done by such a gate.

Such computers would be very different from what we have today. For example, a mere variable assignment in procedural languages is irreversible, since the previous content of the variable is lost. On the other hand, you could run a program backwards starting with the result and ending with the input parameters. That would be an interesting feature for a debugger.

As I see it, this kind of the research fits in the same box as Dyson spheres and warp drives. These kinds of limits won't be reached any time soon in practice (they may bug Universal AC eventually). But it is still fascinating to see just how far you can push some concept before the most basic physics starts to interfere with you.

Posted by | Categories: Ideas | Comments »