Pillow talk

18.09.2008 19:12

Software is broken. No, really.

Ask any self-respecting software engineer and he'll tell you that software never breaks. It can't wear out, just as a mathematical equation never degrades over years of use. The same person is usually quick to add how much he hates the hardware his perfect programs run on: hard drives fail just when you need them, CPUs overheat and fan bearings seize.

Why is it then that software failure has become so ubiquitous in our lives that a catastrophic failure in most systems does not even fall under warranty terms, while hardware is guaranteed to work for at least a year without errors or your money back? Why must basically every device today have a little button that says "reset" (or, if it lacks one, you curse it because that same common operation then involves removing a battery or pulling the plug)? Watchdog timers are common: a mechanism where imperfect hardware helps infallible algorithms do their job. I'm sure the probability of data loss due to some software bug is several orders of magnitude higher than that of hardware failure.
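The watchdog mechanism can be illustrated with a software analogue. The sketch below is hypothetical and written in Python for clarity: a real hardware watchdog counts down in silicon and resets the whole machine when the firmware stops petting it (on Linux it is typically petted by writing to /dev/watchdog); here a background thread merely records that the deadline was missed.

```python
import threading
import time

class Watchdog:
    """Software analogue of a hardware watchdog timer (hypothetical
    sketch). If the program stops calling pet() for longer than
    `timeout` seconds, the watchdog assumes it has hung. Real
    hardware would reset the system at that point; here we only
    record the fact."""

    def __init__(self, timeout):
        self.timeout = timeout
        self._last_pet = time.monotonic()
        self._expired = threading.Event()
        # The "hardware": an independent thread counting down on its own.
        threading.Thread(target=self._watch, daemon=True).start()

    def pet(self):
        # The program proves it is still alive by petting the dog.
        self._last_pet = time.monotonic()

    def _watch(self):
        while not self._expired.is_set():
            if time.monotonic() - self._last_pet > self.timeout:
                self._expired.set()  # a real watchdog would reset here
            time.sleep(self.timeout / 10)

    @property
    def expired(self):
        return self._expired.is_set()
```

A main loop would call pet() on every iteration; an infinite loop or a deadlock stops the petting, and the timer fires no matter how confused the software is.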

IBM 402 plugboard by Chris Shrigley

Photo by Chris Shrigley CC BY 2.5

The software itself may indeed be immune to wear and tear (although even that could be debated), but its human authors are anything but perfect, especially when faced with the immense complexity that is common in software engineering today. In contrast to physical products, software is usually just as broken when it's brand new as when it's of a ripe old age.

Complexity is causing all of these problems. The vast majority of production software today should fall under the label of crude prototype. Engineering means understanding what you are doing, and software engineers do not understand their creations. Not with all the layers of abstraction, from high-level programming languages to underlying operating systems and complex CPU instruction sets. Even if you're writing low-level assembly, chances are you can't predict exactly how your code will execute on a user's PC. And given the reliability of embedded software in consumer electronics, it looks like that's impossible even when you know exactly what hardware the program will run on.

High-level programming languages have made this problem worse: they give the programmer a false sense of security. It was way too easy for a C program to outgrow its creator's capacity to comprehend all its possible execution paths; it's stupidly easy to do that in Python. The latest trend towards web applications sounds like a bad joke in this respect. An industry that isn't capable of creating reliable consumer software that runs on a single computer wants to move to systems that span thousands of interconnected processes.

Physical systems do not tend to grow that large because production costs rise fast with complexity. Software has no production costs, only design and prototyping. And even then, the majority of design is usually skipped in favor of getting a semi-working prototype out on the market as soon as possible. The lack of documentation and write-only code is a running joke that comes true way too often.

Code reuse is seen by some as a holy grail that will solve this problem. The theory is that you use a library of well-checked, proven code instead of rolling your own, probably flawed solution to a common problem. In practice, however, this usually means that such code is used without understanding its full behavior and side effects, even when they are properly documented. It also makes it easier to blindly assume that someone else did his homework properly so you don't have to. In short, it makes the software author think he's actually in control.

This is not a technological problem and as such cannot be solved purely by technological means. Software is still a novelty. Most users will fall for the shiniest, best-advertised product, not for the one that will serve them best. Sadly, the shiniest is usually the one with the most features and hence the most complex and unreliable. Hopefully this will slowly correct itself as the market gets more educated and computers stop being magic dust to most people. It's shockingly apparent that today, in a lot of cases, the final users are the ones with the worst ideas about what functionality the product should have.

The software industry should also get its act together. It should have the courage to resist the vocal minority of users demanding thousands of new features and focus on providing simpler software that will work for the silent majority. Bugs should not be seen as an unavoidable problem. The engineering community should learn to respect simple, reliable solutions, not the cleverest ones. Engineers should get a firm grasp of the complexity of the systems they are working on, even beyond the lines of code they themselves have written.

And finally, new developer tools that aim to help this situation should focus on revealing the underlying complexity, not hiding it. They should help developers write better software, not more software. Rapid application development should become a thing of the past.

Posted by Tomaž | Categories: Ideas


Some very interesting thoughts, not sure I agree with all of them, but at least a few strike true. I don't agree that going back to C is the way forward. I think abstractions help to simplify the problem. High-level programming languages help constrain the problem. Adding features such as garbage collection and not allowing developers to do pointer arithmetic greatly reduces the final complexity.

I think it is true what you say about complexity though. Software is a new beast and no one really knows how to deal with it. It evolves and changes based on who is coding and who is driving development. The specs can change on a daily basis and it is rarely clear exactly what one is trying to build. While the building blocks are not perfect, they are getting better and some should be used. For example, the networking infrastructure such as the TCP library has been developed and managed well. There is no real reason to have to code your own. It would be great to see software engineers doing their work and understanding the libraries they are using, but I don't really see that happening. Right now I think we need to move in the direction of directing developers' attention to particular problems, though even that we don't really know how to do. I guess what I'm trying to get at is that with reuse, if a problem is discovered in the underlying library it needs to be fixed only once. There are bad libraries out there, but there are also quite a few rather good ones.

Found this article using Zemanta when writing a post comparing Software Engineering to other forms of Engineering. Check it out if interested (http://www.blog.graphsy.com/?p=140)

Thanks for your comment Maxim.

I didn't say that we should move back to C. Yet garbage collection is one of those things that, in my opinion, only hides complexity from the developer. When writing in C, programmers are aware that they must free the memory they allocate. But when you tell them that they have garbage collection, memory management is suddenly considered magic. People rarely understand how their particular GC is implemented and how it can still lead to problems like circular references or nondeterministic run time.
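To make the circular-reference point concrete, here is a minimal sketch of what happens in CPython, where reference counting alone can never reclaim a cycle and a separate cycle detector has to step in:

```python
import gc

class Node:
    def __init__(self):
        self.other = None

gc.disable()   # make collection explicit for this demonstration
gc.collect()   # start from a clean slate

a, b = Node(), Node()
a.other, b.other = b, a   # a -> b -> a: a reference cycle

# Drop the only external references. Each object still holds a
# reference to the other, so its reference count never reaches
# zero and plain reference counting never frees the pair.
del a, b

# Only the cycle detector can reclaim them.
found = gc.collect()      # number of unreachable objects found
assert found >= 2         # the two Nodes (plus their __dict__s)
gc.enable()
```

And the "magic" has real limits: before Python 3.4 (PEP 442), a cycle whose objects defined __del__ could not be collected at all and piled up in gc.garbage instead.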

I agree with what you say about libraries. Of course, no one should write their own TCP implementation. Networking is a nice example of this, because a lot of programs I've seen will crash when a network transaction doesn't complete in the expected way: not because the TCP library has a problem, but because the user of the library didn't know the failure modes of the particular function he was using. It gets worse when you have several layers of libraries.

The problem I see with code reuse is that it makes it easy to make programs that seem to work and pass some test cases, but will fail in a lot of cases that weren't actually tested.

Posted by Tomaž
