Python regex trap
17.04.2008 19:09
This simple Python one-liner will not do what you expect:
>>> import re
>>> re.sub("the", "", "The Last Lecture", re.I)
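Before someone again accuses me of being illiterate: yes, this behavior is documented. The fourth positional argument of re.sub() is count, not flags, so re.I (which is just the integer 2) silently turns into "replace at most two occurrences" while the match stays case-sensitive. It's still one of those things that go undetected for a long time and bite you at the most inappropriate moment. It's right up there with C's "if(x=0) ..." on the annoyance scale.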
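A minimal Python 3 sketch of the trap and two ways around it. Python 3.13 deprecated passing count and flags positionally, so the wrong call is wrapped to keep that warning quiet in this demonstration:

```python
import re
import warnings

text = "The Last Lecture"

# The trap: the signature is re.sub(pattern, repl, string, count=0, flags=0).
# The 4th positional argument is `count`, so re.I (an int flag equal to 2)
# means "replace at most 2 occurrences" and the match stays case-sensitive.
with warnings.catch_warnings():
    warnings.simplefilter("ignore", DeprecationWarning)  # Python 3.13+ warns here
    wrong = re.sub("the", "", text, re.I)

# Fix 1: pass the flags by keyword.
right = re.sub("the", "", text, flags=re.I)

# Fix 2: bake the flag into a compiled pattern.
also_right = re.compile("the", re.I).sub("", text)

print(repr(wrong))       # 'The Last Lecture'
print(repr(right))       # ' Last Lecture'
print(repr(also_right))  # ' Last Lecture'
```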
Juggling parentheses in Perl
26.03.2008 18:27
Matching balanced constructs (like some text that must contain matching pairs of parentheses) is a classical example where traditional regular expressions fail. It isn't surprising then that the Perl FAQ says that you should use text parsing functions from Text::Balanced module when you need to do that.
Perl, however, also provides extended regular expressions that, among a lot of other things, also support this kind of match. Theory says that such extended expressions can't be compiled to finite automata and so typically have worse complexity than the traditional O(n) - something that is also mentioned in another part of the Perl documentation.
So when I was confronted today with a problem that involved matching balanced "{{{" and "}}}" pairs (guess what kind of markup uses them) I naturally followed the suggestion in the FAQ.
Well, it turned out that that suggestion isn't one of the best ones. Text::Balanced is slow. Actually it's so slow that the identical code that uses the extended regular expression from Regexp::Common::balanced is around 50 times faster.
So much for the difference between theory and practice.
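Python's standard re module has no recursive patterns, so there the usual answer is a single left-to-right scan with a nesting counter, which keeps the O(n) behavior. A minimal sketch for the "{{{ ... }}}" case; the function name is made up for illustration, and this is neither Text::Balanced nor Regexp::Common:

```python
def balanced_spans(text, open_tok="{{{", close_tok="}}}"):
    """Return (start, end) offsets of each outermost balanced open_tok...close_tok span.

    Stray closing tokens are skipped; an opening token that is never closed
    produces no span.
    """
    spans = []
    depth = 0
    start = None
    i = 0
    while i < len(text):
        if text.startswith(open_tok, i):
            if depth == 0:
                start = i          # remember where the outermost group begins
            depth += 1
            i += len(open_tok)
        elif depth > 0 and text.startswith(close_tok, i):
            depth -= 1
            i += len(close_tok)
            if depth == 0:
                spans.append((start, i))
        else:
            i += 1
    return spans


sample = "a {{{ b {{{ c }}} d }}} e {{{ f }}} }}}"
for s, e in balanced_spans(sample):
    print(sample[s:e])
# {{{ b {{{ c }}} d }}}
# {{{ f }}}
```

Each position is examined once and token comparisons are constant-length, so the cost grows linearly with the input.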
Linux mmap weirdness
14.03.2008 19:10
The Linux mmap() call is full of surprises. Take for example the MAP_PRIVATE option. According to the man page, mmap() with this option makes copy-on-write pages, meaning that any changes the process makes to the mmaped region will not be propagated to the underlying file.
This of course works as advertised, but there is a catch. What if some other process modifies the file? Take for example this simple program:
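#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/mman.h>

int main(void)
{
    int fd = open("file", O_RDWR);
    if (fd < 0) {
        perror("open");
        return 1;
    }
    /* Private (copy-on-write) mapping of the first byte of the file. */
    char *p = mmap(NULL, 1, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
    if (p == MAP_FAILED) {
        perror("mmap");
        return 1;
    }
    /* Print the mapped byte every two seconds; the loop never ends. */
    while (1) {
        printf("%c\n", *p);
        sleep(2);
    }
    munmap(p, 1);
    close(fd);
    return 0;
}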
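If the file contains the character "A", this program will print a stream of As to the console. Now, while the program is running, change the contents of the file. Does the output of the program change?
It turns out it does! However, only as long as the program hasn't written anything to the mmaped region. If you modify the program above so that it writes to the mapping, the stream of characters on the console no longer changes when you modify the contents of the file. The man page in fact leaves this case unspecified: it is up to the implementation whether changes made to the file after the mmap() call are visible in a MAP_PRIVATE mapping. On Linux they stay visible until the first write to a page gives the process its own private copy of it.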
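The write side of this is easy to check from Python, whose mmap module uses a private, copy-on-write mapping (MAP_PRIVATE on Unix) when asked for access=mmap.ACCESS_COPY. This sketch only verifies that writes through the mapping never reach the file; whether later outside changes to the file show up is the unspecified part, so it is deliberately not tested here:

```python
import mmap
import os
import tempfile

# A one-byte file containing "A", like "file" in the C example.
fd, path = tempfile.mkstemp()
os.write(fd, b"A")
os.close(fd)

with open(path, "r+b") as f:
    # ACCESS_COPY requests a private, copy-on-write mapping:
    # writes change only this process's copy of the page.
    m = mmap.mmap(f.fileno(), 1, access=mmap.ACCESS_COPY)
    before = m[:1]
    m[:1] = b"B"            # first write: the page becomes a private copy
    after = m[:1]
    m.close()

with open(path, "rb") as f:
    on_disk = f.read()
os.remove(path)

print(before, after, on_disk)  # b'A' b'B' b'A'
```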
Why is this significant (and why did I spend a couple of hours exploring it)? It turns out that the dlopen() call in Linux loads shared objects by simply mmaping them (look at the strace output if you don't believe me). So if you change a .so file on disk while some application is using it, you'll get a nice segmentation fault.
Now, the /proc/*/maps file reveals that the executable itself is also mmaped. However, a program doesn't crash when you modify the executable file, so either something gets changed in the program's image after it gets mmaped or I still don't understand everything that's going on here.
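To see which files a running process has mapped without reaching for strace, /proc/<pid>/maps can be parsed directly. A small Linux-only sketch; the function name is made up for illustration:

```python
import sys


def mapped_files(pid="self"):
    """Return the set of file paths mapped into a process (Linux only).

    Each line of /proc/<pid>/maps has the form
    "address perms offset dev inode [pathname]". Anonymous mappings have
    no pathname and special ones use names such as [heap] or [stack].
    If a mapped file has been deleted (for example replaced by a new file
    via rename), the kernel appends " (deleted)" to its path.
    """
    paths = set()
    with open("/proc/%s/maps" % pid) as maps:
        for line in maps:
            fields = line.split(maxsplit=5)
            if len(fields) == 6 and fields[5].startswith("/"):
                paths.add(fields[5].rstrip("\n"))
    return paths


if sys.platform.startswith("linux"):
    # Print the shared objects (and similar) mapped into this interpreter.
    for path in sorted(mapped_files()):
        if ".so" in path:
            print(path)
```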
LiquidPCB
02.03.2008 19:09
A couple of weeks ago a post on geda-dev mentioned a new, free (as in freedom) application for designing printed circuit boards - LiquidPCB. On its web site Hugo Elias, LiquidPCB's author, points out some shortcomings of gEDA's PCB and other traditional CAD tools and how he attempts to fix them in his software. He says it's all about a new, modern user interface.
The discussion that followed on gEDA's mailing list didn't actually touch on any of the problems Hugo pointed out. With some help I managed to get LiquidPCB running this weekend on an old laptop with Windows, so here are some of my thoughts about it.
I mostly agree with Hugo regarding user interface problems. gEDA's PCB has an archaic GUI - most core developers still use its Lesstif incarnation (Lesstif is a reimplementation of Motif, a GUI toolkit from the late 1980s). It certainly requires a lot of clicking around that could be avoided one way or another, not to mention that it looks a bit out of place on a modern desktop. However, I think most of these problems could be solved in a way that doesn't depart so radically from what everyone is used to today.
I think the problems that LiquidPCB tries to solve can be split into two groups. First, there are those shared by basically every complex desktop application today. I believe it's wrong to try to solve this kind of problem separately in each application. For example, LiquidPCB reimplements a file selector dialog. From a usability standpoint this is a component that should be shared between all applications, so I believe a better way of solving this problem would be, for example, to submit patches for the GTK file chooser dialog.
Another example is the menus and toolbars. I believe these can be improved by placing menu entries better (gEDA's PCB, for example, has a problem with the "Select" and "Buffer" menus, where I often open the wrong one). I can't say if the hexagon menu system used by LiquidPCB is better than a well-done traditional interface. I found it awkward, but that's because I'm used to traditional interfaces. What bothers me the most is the requirement to hold down the right mouse button while traveling through menus. What I'm almost sure of is that it can't replace keyboard shortcuts. The keyboard with its 100 or so keys still provides a faster way of switching program states (like switching between select mode, via mode, etc.).
The other group of problems is specific to PCB design. In this field LiquidPCB's approach looks superior to anything PCB has to offer. What you are doing in PCB design is solving a topological problem: you have a graph and you need to place its vertices and edges on the plane so that their placement follows certain rules. The approach LiquidPCB takes is to make you solve exactly that problem without requiring you to fiddle with side issues like the exact location of elements. You only say "pin A is placed on this side of line B" and LiquidPCB takes care of the rest. It feels like an autorouter that uses the user's brain for what it does best (solving the topology) and the computer for what it does best (finding placement for vias and lines). The example on LiquidPCB's web site showing how to insert a via in the middle of four tracks shows just how much work this approach saves you. In PCB this operation would really involve hundreds of steps: you would first move each of the existing lines out of the way (which could also require moving several components) and only then insert the new via.
I see some purely practical problems with LiquidPCB's approach. For example the constant dynamic optimization means that just opening a file also makes modifications in it. And also, at least from half an hour of playing with it today, it seems it is quite easy to make a mess of things if you're not careful. But I'm sure these problems will be easy to fix.
So in conclusion I think LiquidPCB addresses some quite real problems and shows some promising solutions to them. The program itself is in an early stage of development and isn't useful yet for anything more than playing with it. It will be very interesting to follow its development.
Gschem grid hack
27.01.2008 14:10
Some time ago gEDA moved their source repositories from CVS to Git. I've heard a lot of nice things about Git, but so far I had stayed away from it since its philosophy (and terminology - for example the meaning of words like commits and revisions) is different enough from CVS and SVN that I never quite understood what certain commands did. Yesterday I finally took some time and read through most of its manual.
I must say that after reading the documentation things do begin to make sense. So I checked out the latest gEDA sources and experimented with Git a bit. Since I wanted to try out how convenient it is to make modifications to your local branch and then submit patches back upstream I started modifying things and in the end I (mostly) fixed one annoyance with gschem.
The problem is that when you have the view zoomed out the schematic quickly gets unreadable. I tried to solve this problem a while ago by porting gschem to Cairo, which provided nice anti-aliased lines and fonts. However that had problems of its own and Cairo support still hasn't been incorporated into gschem source.
Now I took a much simpler approach. It turned out that the major factor contributing to unreadability is the grid. When zoomed out, the features of the schematic get lost among closely spaced grid points. However, making the grid darker made the points hard to see when zoomed in. So I did a simple modification (made less simple by the awkward handling of colors in the code) that makes grid points stand out less from the background when the view is zoomed out.
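The patch itself isn't reproduced here. Purely as an illustration of the idea, here is a Python sketch that fades the grid color toward the background as the on-screen grid spacing shrinks. The thresholds, default colors and function names are assumptions, not gschem's code:

```python
def blend(fg, bg, t):
    """Linearly interpolate between two RGB colors: t=0 gives fg, t=1 gives bg."""
    return tuple(round(f + (b - f) * t) for f, b in zip(fg, bg))


def grid_color(spacing_px, fg=(128, 128, 128), bg=(0, 0, 0),
               near=20.0, far=4.0, max_fade=0.8):
    """Fade grid dots toward the background as they crowd together.

    spacing_px is the on-screen distance between grid dots. At `near`
    pixels or more the full grid color is used; at `far` pixels or less
    the color is faded by `max_fade`; in between the fade is linear.
    """
    if spacing_px >= near:
        t = 0.0
    elif spacing_px <= far:
        t = max_fade
    else:
        t = max_fade * (near - spacing_px) / (near - far)
    return blend(fg, bg, t)


for spacing in (40, 12, 3):
    print(spacing, grid_color(spacing))
```

Capping the fade below 1.0 keeps the grid faintly visible even at the most zoomed-out levels.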
As you can see from the pictures below, the difference on a zoomed-out view is quite noticeable. It definitely makes gschem usable for editing at a more zoomed-out level than before.
Left patched gschem, right original gschem.
Left patched gschem, right original gschem with grid color changed so that it matches grid color of patched gschem when zoomed out.
You can download the patch from SourceForge.
C++ streams suck
18.01.2008 20:19
My work at Zemanta in the last two weeks included optimizing performance of some core software components. This mostly means moving stuff from Python to C++ which also means that I've been exposed to more C++ in the last week than in five years at the Faculty (where I tried to stay away from it as much as possible except for a few odd KDE hacks back when I still used it).
Anyway, IO streams always seemed awkward to me. But that's usually the case when you start using a language you're not fluent in. So I tried to go on with them and waited for the enlightenment when I would say "Oh! I don't know how I ever lived with the plain old stdio.h". Well, today I finally got it, but it was the other way around. I'm moments from making it a policy that all my C++ code will strictly use the plain old stdio.h for console and file I/O.
Why? Well, consider the following example. I want to make a string that will hold a hex representation of a 32-bit integer. It's going to be read by another program, so the format must be pretty strict. Using the C++ way, you would do that like so:
stringstream g;
g << "0x" << hex << setw(8) << setfill('0') << i << endl;
It's long and relatively unreadable. Add a couple more manipulators and you could easily have a statement that prints 10 characters spanning multiple lines. Compare this with the C equivalent:
printf("0x%08x\n", i);
Ok, maybe I'm biased regarding readability. However compare what these two programs print out:
# C++ version
0x000b,eaf
# C version
0x0000beaf
Why is there a comma in the middle? I just spent an hour debugging this earlier today. The cause? The comma is the thousands separator from the current locale, which the stream library helpfully inserts into your string. Yes, a thousands separator that in hex actually separates multiples of four thousand and ninety-six. The solution? Imbuing the stream with yet another locale (for example std::locale::classic()), since a manipulator that just turns off the thousands separator doesn't exist. Just beautiful.
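For comparison, a Python sketch of the same strict format. Python's %-formatting and the "x" format type never consult the locale (only the "n" presentation type does), so the output stays 0x0000beaf regardless of locale settings. hex32 is a hypothetical helper name:

```python
def hex32(i):
    """Format an unsigned 32-bit integer as "0x" plus exactly 8 lowercase hex digits."""
    if not 0 <= i <= 0xFFFFFFFF:
        raise ValueError("value does not fit in 32 bits: %r" % i)
    # %-formatting never consults the locale, so no separators can sneak in.
    return "0x%08x" % i


print(hex32(0xbeaf))                   # 0x0000beaf
print("0x" + format(0xbeaf, "08x"))    # 0x0000beaf (format spec, also locale-free)
```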
The fun doesn't stop there. I spent another hour today debugging some piece of code that opens a file. In the end it turned out that the open call was failing because I tried to open a file that was a couple of megabytes over the 2 GB limit and the program wasn't compiled with support for large files. I don't mind that, it was my mistake. However I do mind that it took me as long as it did to find the cause.
The fstream objects just silently fail when there is something wrong. The damn things don't even throw exceptions by default. Isn't that the C++ way of handling errors? Instead they set some hidden state you have to specially check and even then you can only get the message that something failed, not what specifically went wrong. Add to that the horrible mess that is STL documentation and you see why I only found out what was causing the program to crash when I dumped fstreams and rewrote the whole thing to use stdio.h functions.
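Python, for contrast, reports a failed open() by raising an OSError that carries the errno and its description. A minimal sketch (open_checked is a made-up name) that turns that into a message naming the cause. The errno values are the same ones C code sees: open(2) documents EOVERFLOW as the error for a file over 2 GB opened by a 32-bit program built without large-file support, which is exactly the case described above.

```python
import errno
import os
import tempfile


def open_checked(path, mode="rb"):
    """Open a file; on failure raise an error that names the errno and its meaning."""
    try:
        return open(path, mode)
    except OSError as e:
        name = errno.errorcode.get(e.errno, str(e.errno))
        raise RuntimeError("cannot open %r: %s (%s)" % (path, name, e.strerror)) from e


with tempfile.TemporaryDirectory() as tmp:
    missing = os.path.join(tmp, "missing.dat")
    try:
        open_checked(missing)
    except RuntimeError as err:
        message = str(err)
print(message)  # names ENOENT and its description
```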
So, in conclusion, C++ itself is pretty nice as a language (and blindingly fast compared to Python, which has the speed of a snail nailed to the table) but the standard infrastructure that comes with it causes more problems than it solves. Instead of using it and waiting to find out the bright sides of it, I'll use the old way I know from C and wait until someone else shares his enlightenment with me.
Eee power utility
02.01.2008 21:41
The asus_acpi.ko kernel module provides an interface to what look like electronic power switches for different peripheral devices in the Asus Eee PC.
For example doing echo "0" > /proc/acpi/asus/camera will cut power to the integrated web camera. If you want to get as much battery life out of Eee as possible it helps to switch off devices that you're not using. However typing commands like above quickly gets boring. So here's a little Perl script that does that in a bit more user-friendly way:
avian@galxpolx:~$ eeepower
Asus Eee PC peripheral power utility. Copyright (c) Tomaz Solc. Under GPL.
SD/MMC reader off
Wireless LAN on
Camera off
avian@galxpolx:~$ eeepower help
USAGE: eeepower [ device ] [ on | off ]
Available devices:
cardr SD/MMC reader
wlan Wireless LAN
camera Camera
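The Perl script itself isn't reproduced here. A minimal Python sketch of the same idea follows, assuming (this is an assumption, not something shown above) that each device name from the help output is a file under /proc/acpi/asus/ that reads and accepts 0 or 1, as the camera example does. The base directory is a parameter so the logic can be tried without an Eee:

```python
import os

# Device names and descriptions as shown by `eeepower help`.
# Assumption: each name is a file under /proc/acpi/asus/ holding "0" or "1".
DEVICES = {
    "cardr": "SD/MMC reader",
    "wlan": "Wireless LAN",
    "camera": "Camera",
}
BASE = "/proc/acpi/asus"


def get_state(device, base=BASE):
    with open(os.path.join(base, device)) as f:
        return f.read().strip() == "1"


def set_state(device, on, base=BASE):
    with open(os.path.join(base, device), "w") as f:
        f.write("1" if on else "0")


def eeepower(args, base=BASE):
    """Mimic the command line shown above; returns an exit status."""
    if not args:
        for dev, desc in DEVICES.items():
            print("%-18s %s" % (desc, "on" if get_state(dev, base) else "off"))
        return 0
    if len(args) == 2 and args[0] in DEVICES and args[1] in ("on", "off"):
        set_state(args[0], args[1] == "on", base)
        return 0
    print("USAGE: eeepower [ device ] [ on | off ]")
    print("Available devices:")
    for dev, desc in DEVICES.items():
        print("%-8s %s" % (dev, desc))
    return 0 if args == ["help"] else 1
```

From a script run as root this would be wired up as sys.exit(eeepower(sys.argv[1:])).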
gEDA application icons
17.12.2007 20:31
I just finished the new collection of Tango-style application icons for the gEDA suite. Who says that professional software can't also look nice?
Actually, a lot of credit for these icons could go to Wikipedia contributors. If they hadn't made the English Wikipedia so large, I wouldn't have had time to play with such things.
On a more serious note, these icons go nicely with MIME-type icons I made previously. They are available under the GPL license - from what I gathered content under Creative Commons can't be included with gEDA source distribution for some reason. You can download scalable versions here.
Zemanta NanoBlogger plug-in
02.12.2007 16:26
I'm a big fan of NanoBlogger. I used it to make a number of web sites - from tablix.org and the blog you're currently reading to OpenOffice.org conference site and Society of cultural studies.
You might also have heard about Zemanta's Suggest system. Among other things it makes blogging easier by automatically adding explanatory in-text links, stock photographs, keywords and links to related articles to your blog posts. It's currently being tested and will be available soon for popular blogging platforms like Wordpress and Blogger.
Since I'm one of the developers of Zemanta Suggest and I want to use the system on my blog (eat your own dog food and all that), I made a NanoBlogger plug-in for it. It's written in Bash and Perl and communicates with the system via JSON. It supports all features of the system and is fully configurable. For example, I'm currently only using it for in-text links because I think the other additions don't fit my type of blog.
Here's how an ordinary blog post looks on a vanilla installation of NanoBlogger:
And here is the same blog post with the plug-in installed and with all features turned on:
This is how it looks behind the curtains (i.e. the user interface when you are writing a new blog post). Note the perfect integration into NanoBlogger user experience:
Anyway, if you want to try it out, drop me a mail (I can't just put it up for download because you need an API key). Installation is easy - you just drop two files into NanoBlogger's plugin directory.
Update: a coworker unfamiliar with NanoBlogger said I should give a before-and-after comparison.
My next Mac will be a PC
01.12.2007 14:55
One simple reason: it got really hard to do anything useful on my PowerBook after I upgraded to OS X 10.5 Leopard.
My office suite of choice is OpenOffice. Granted, it wasn't very responsive even before the upgrade, but at least it responded fast enough to be usable. Now it either doesn't start at all (version 2.2) or randomly crashes (version 2.3) after a couple of minutes of work. So now I'm using NeoOffice. That works, but has other quirks (like every once in a while it doesn't let me change the font). Did I mention how slow it is? It uses up practically all of the computer's 1.25 GB of RAM and still takes around 5 seconds to show a font selection dialog.
Other applications like Firefox and Thunderbird also got less stable. I often get first the spinning beach ball cursor and then after some minutes of unresponsiveness a crash dialog.
X11 applications are another pain with Leopard. GIMP and Inkscape will crash X11 or X11 will crash them once every 15 minutes. And they are also slow - much slower in my opinion than on OS X 10.3.
I also had to disable practically all of Leopard's fancy new features (at least those I could turn off). Spaces, for example. After a couple of weeks of using it I found out that Spaces is a terribly broken implementation of virtual desktops. For example, if I have several terminal windows scattered across desktops and I switch to the Terminal application with alt-tab, Spaces will take me to a random desktop that has a Terminal window on it. Unusable and nerve-racking, since most of the windows I use are terminals. It also plays terribly with X11.
There's also a problem with reboots - for some reason the system won't shut down properly. It closes all applications but then hangs, showing just the desktop (something that painfully reminds me of similar problems I once had with Windows 95). Only a long press of the power button helps. The worst thing about that is that it does the same thing when restarting after installing software updates. I'm just waiting for some broken update to do some more serious damage.
Finally there are also some minor things that make the entire system feel unpolished. Like clipping errors on icon titles in Finder and some weird cases where I get random noise instead of transparent background in terminal windows.
Oh, and did you know that Finder now represents all non-Apple computers on the network with icons showing monitors that are displaying Blue Screen of Death? Now that's just childish. No Linux desktop I've ever seen did this and the average Linux user probably hates Windows more than an average Apple user.
Two years ago when I bought this PowerBook I bought it because it was a computer that really just worked. It seamlessly synchronized with my mobile phone and my other hardware, all applications worked out of the box and most free software I used on Linux also ran on OS X. I could really concentrate on my work instead of having to continuously tweak the operating system - something I don't want to do when I'm traveling with my laptop. Looking back this computer gave me the least problems and best experience of all systems I used.
With Leopard, all these benefits are gone... And if I can't have a system that works, then I'd rather have a PC with Linux, where at least I can fix things myself when they aren't working.
My very own Perl bug
19.11.2007 0:30
They say that every hacker finds a bug in Perl once in his life. It seems I just found mine.
It started when I was hacking on Wikiprep, the Perl script we're using to parse Wikipedia. After I made some trivial change and ran it on a sample of Wikipedia pages, I was greeted with a Segmentation fault message on the console. Since this was the first time I saw Perl do that, I quickly ran the same thing on another machine (I still don't trust Leopard on my laptop), this time a trusted Debian GNU/Linux development server. When I got the same result I knew I was onto something.
After a couple of hours of testing I came up with this short script that consistently crashed Perl on every machine I tried:
sub a {
my ($ref) = @_;
$$ref =~ s/\[\[[^\[]*\]\]/&b(pos($$ref))/eg;
}
sub b {
my ($a) = @_;
return "";
}
$a = "";
open(FILE, "<sf.dat");
binmode(FILE, ':utf8');
while(<FILE>) {
$a .= $_;
}
close(FILE);
&a(\$a);
sf.dat in this case is a UTF-8 encoded file with some links in MediaWiki markup (you can see it here).
After some consulting with people on the #perl IRC channel we found a workaround and my work on Wikipedia could continue. The crash seems to be caused by the pos() function, which can be replaced with lookups in the @- and @+ arrays.
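For readers coming from Python, the counterpart of reading $-[0] and $+[0] inside the replacement is asking the match object for its offsets. A small sketch using the same [[...]] pattern as the Perl script; strip_links is a made-up name:

```python
import re

LINK = re.compile(r"\[\[[^\[]*\]\]")   # same pattern as in the Perl script


def strip_links(text):
    """Remove [[...]] links; return the new text and each link's (start, end) offsets."""
    offsets = []

    def repl(m):
        # m.start()/m.end() play the role of Perl's $-[0] and $+[0].
        offsets.append((m.start(), m.end()))
        return ""

    return LINK.sub(repl, text), offsets


print(strip_links("a [[b]] c [[d|e]]"))   # ('a  c ', [(2, 7), (10, 17)])
```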
I haven't yet reported this issue since I was told Perl 5.10 is just around the corner, which might have this issue already fixed.
Quantum top
27.10.2007 10:00
Large data structures in Python
20.10.2007 20:57
A lot of my work at Zemanta has to do with storing large amounts of data (like titles of lots and lots of Wikipedia pages) in memory. Since the main problem here is running out of swap space, I've done a couple of simple experiments with different data structures in Python and measured how much memory each of them used versus the number of stored objects.
How I got these results: I used Python 2.4.4 (as packaged for current Debian Testing). The test machine is running Linux kernel 2.6.21 (again from Debian Testing) and has 1 GB of RAM. The metric is virtual memory size (VmSize), as reported by the kernel in /proc/*/status (I used this piece of code).
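The linked piece of code isn't reproduced here. A hypothetical stand-in that reads the same VmSize figure from /proc/self/status (where it is given in kB) and returns bytes, so that the memory()/1024.0/1024.0 expressions in the loops below print megabytes, might look like this (Linux only):

```python
def memory(pid="self"):
    """Return the virtual memory size (VmSize) of a process in bytes. Linux only."""
    with open("/proc/%s/status" % pid) as status:
        for line in status:
            if line.startswith("VmSize:"):
                # The line looks like "VmSize:     123456 kB".
                return int(line.split()[1]) * 1024
    raise RuntimeError("no VmSize line in /proc/%s/status" % pid)
```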
First I tested dictionaries (or hashes in Perl-speak). The test code simply adds entries to the dictionary one by one and writes out the virtual memory size every 1000 iterations. I made four different tests here: a simple integer-to-integer mapping, string to integer, string to list and string to tuple.
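import sys

# memory() (helper not shown) returns the virtual memory size in bytes.
bighash={}
count=0
step=1000
while True:
    for n in xrange(step):
        # Uncomment exactly one of the following tests:
        # bighash[count]=1             #1 integer -> integer
        # bighash[str(count)]=1        #2 string -> integer
        # bighash[str(count)]=[1]      #3 string -> list
        # bighash[str(count)]=(1)      #4 string -> (1), which is just the integer 1; (1,) is a tuple
        count+=1
    if count>=10000000:
        sys.exit(0)
    sys.stdout.write("%d\t%f\n"%(count, memory()/1024.0/1024.0))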
Results:
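The most interesting point here is how inefficient lists are: a dictionary of one-element lists takes much more memory than a dictionary of integers. Test #4 showed no significant difference from storing an integer, but that is because (1) is just the integer 1 in parentheses; a one-element tuple is written (1,), so test #4 did not actually measure tuples.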
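The per-object sizes can be inspected directly with sys.getsizeof(), which reports the size of the object itself (not of the objects it refers to). Exact numbers differ between Python versions and platforms, so treat them as illustrative. In current CPython a real one-element tuple is smaller than a one-element list, because a list carries extra bookkeeping for its separately allocated, growable item array:

```python
import sys

samples = [
    ("1", 1),
    ("(1)", (1)),      # parentheses alone don't make a tuple: this is the int 1
    ("(1,)", (1,)),    # a real one-element tuple
    ("[1]", [1]),      # a one-element list
]
for label, value in samples:
    print("%-5s %-6s %d bytes" % (label, type(value).__name__, sys.getsizeof(value)))
```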
In the second part I compared tuples and lists. For the first test I used append method to add elements to the list on each iteration. For the second test I repeated that with tuples. However because they are immutable I created a new tuple on each iteration with one additional element. The third test is identical to the second except this time it was done with lists (i.e. I treated lists as if they were immutable like tuples).
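import sys

# Uncomment the matching initialization and test line:
# biglist=[]                  #1 and #3
# bigtuple=()                 #2
count=0
step=1000
while True:
    for n in xrange(step):
        # biglist.append(1)            #1 append to a list
        # bigtuple=bigtuple+(1,)       #2 build a new tuple each time
        # biglist=biglist+[1]          #3 build a new list each time
        count+=1
    if count>=100000:
        sys.exit(0)
    sys.stdout.write("%d\t%f\n"%(count, memory()/1024.0/1024.0))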
The second and third tests ran so slowly that I only tested sizes from 1 to 100000.
An interesting result here is that tuples do not seem to be significantly more efficient than lists when storing a lot of items (see the right end of the lower graph). However, adding another element by creating a new tuple is very inefficient and takes a lot of memory and CPU time (as expected for an immutable data structure).
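The CPU cost has a simple explanation: bigtuple+(1,) builds a brand-new tuple and copies every existing element, so n additions cost O(n²) work in total, while list.append() over-allocates spare slots and only occasionally has to grow its array. In CPython that over-allocation is visible through sys.getsizeof(), which for a list counts the allocated (not just the used) slots. capacity_steps is a made-up helper:

```python
import sys


def capacity_steps(n):
    """Append n items to a list and record (length, getsizeof) whenever the size changes."""
    lst = []
    steps = [(0, sys.getsizeof(lst))]
    for i in range(n):
        lst.append(i)
        size = sys.getsizeof(lst)
        if size != steps[-1][1]:
            steps.append((len(lst), size))
    return steps


steps = capacity_steps(1000)
print(len(steps) - 1, "size changes for 1000 appends")
print(steps[:6])
```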
Wikipedia is broken
14.10.2007 18:24
When the World Wide Web and HTML were designed, a decision was made to make web page authoring as easy as possible. That meant that web browsers gracefully accepted all documents, even those that did not strictly conform to the HTML syntax, and tried their best to present them on the screen the way the document authors intended. This was probably one of the key factors in why the WWW became so popular - everyone with a text editor and some patience could come up with some tag soup document that his web browser would silently render without displaying a single error message. However, this also became a major problem of the web, because no one wrote standards-compliant HTML and browsers were forced to become more and more complex to cope with all the mistyped garbage that was floating around.
Wikipedia was founded a good 10 years after the World Wide Web, and its current engine, MediaWiki, a year later. By that time the tag soup problem of the web was already well known. You would think that the founders of Wikipedia would learn from history and know that giving your users too much freedom in markup syntax only leads to problems. In reality it seems that exactly the opposite is true.
The syntax behind Wikipedia pages today is so diverse, and so filled with hacks and workarounds for errors and typos that page editors made, that the only thing capable of properly rendering a page from Wikipedia is MediaWiki itself. It's wonderfully difficult to use Wikipedia dumps from any other software and for any purpose other than displaying them in the browser. It takes, for example, a 2000-line Perl script (Wikipedia Preprocessor) to make sense of most of the garbage and make the information even remotely machine-readable.
Consider for example this comment from Wikiprep:
# The correct form to create a redirect is #REDIRECT [[ link ]],
# and function 'Parse::MediaWikiDump::page->redirect' only supports this form.
# However, it seems that Wikipedia can also tolerate a variety of other forms, such as
# REDIRECT|REDIRECTS|REDIRECTED|REDIRECTION, then an optional ":", optional "to" or optional "=".
# Therefore, we use our own function to handle these cases as well.
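A rough Python sketch of accepting those redirect variants. The regular expression is hypothetical, built only from the forms listed in the comment; matching case-insensitively is an extra assumption:

```python
import re

# Built from the forms listed in the Wikiprep comment: #REDIRECT, #REDIRECTS,
# #REDIRECTED or #REDIRECTION, then an optional ":", then an optional "to"
# or "=", then the [[link]]. Ignoring case is an assumption.
REDIRECT_RE = re.compile(
    r"#REDIRECT(?:S|ED|ION)?\s*:?\s*(?:to\b|=)?\s*\[\[([^\]|]+)",
    re.IGNORECASE,
)


def redirect_target(wikitext):
    """Return the redirect target of a page, or None if it isn't a redirect."""
    m = REDIRECT_RE.match(wikitext.lstrip())
    return m.group(1).strip() if m else None


print(redirect_target("#REDIRECTS: to [[Bar baz]]"))   # Bar baz
```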
What possible reason could there be to allow this kind of flexibility in the markup syntax? The only one I can think of is that some administrator noticed a broken page that had, for example, a "REDIRECTS" keyword instead of "REDIRECT" and, instead of fixing that page, fixed MediaWiki to support the typo. There are a lot of other cases like this. For example, disambiguation pages can be marked with {{disambiguation}}, {{disambig}} or {{dab}} for the benefit of those who can't remember the name. Then there is this strange policy of ignoring the case of the first letter in a page title while distinguishing the case of subsequent letters. I can't imagine a good reason for that.
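Both quirks are at least easy to encode. A hypothetical Python sketch: titles are compared after upper-casing only their first character, and a page counts as a disambiguation page if it uses any of the three template names. Since template names follow the same first-letter rule, this sketch accepts {{Dab}} but treats {{DAB}} as a different template:

```python
import re

# The three template names mentioned above. Only the first letter is
# case-insensitive, mirroring how MediaWiki treats page titles.
DISAMBIG_RE = re.compile(
    r"\{\{\s*[Dd](?:isambiguation|isambig|ab)\s*(?:\|[^{}]*)?\}\}"
)


def normalize_title(title):
    """Upper-case the first character of a title; the rest stays case-sensitive."""
    title = title.strip()
    return title[:1].upper() + title[1:]


def is_disambiguation(wikitext):
    return DISAMBIG_RE.search(wikitext) is not None
```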
In the end I have a feeling the syntax itself is starting to bite back. With time it got more and more complex. Take for example the source of this Wikipedia template:
<div class="notice metadata" id="disambig">
{|style="background:none"
|style="vertical-align:middle;"|[[Image:Disambig gray.svg|30px]]
|style="vertical-align:middle;"|''This
[[Wikipedia:Disambiguation|disambiguation]] page lists articles about distinct
geographical locations with the same name. If <!-- you are viewing this
online as opposed to as a [[hard copy]] and -->an
[[Special:Whatlinkshere/{{FULLPAGENAME}}|internal link]] led you here, you may
wish to change the link to point directly to the intended article.''
|}</div>
</div><includeonly>[[Category:Ambiguous place
names]]</includeonly><noinclude>[[Category:Disambiguation and
redirection templates|Geodis]]</noinclude>
This is neither human- nor machine-readable, and the only thing that can make sense of it is MediaWiki with its 100000 lines of PHP code dedicated to interpreting mess like this. Just figuring out what gets included from a template page is complex and full of special cases and exceptions:
# We're storing template text for future inclusion, therefore,
# remove all <noinclude> text and keep all <includeonly> text
# (but eliminate <includeonly> tags per se).
# However, if <onlyinclude> ... </onlyinclude> parts are present,
# then only keep them and discard the rest of the template body.
# This is because using <onlyinclude> on a text fragment is
# equivalent to enclosing it in <includeonly> tags **AND**
# enclosing all the rest of the template body in <noinclude> tags.
# These definitions can easily span several lines, hence the "/s" modifiers.
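The rules in that comment translate fairly directly into a few regular expressions. A simplified Python sketch with a hypothetical function name (not Wikiprep's code); re.S plays the role of Perl's /s modifier so that sections can span lines, and nested or unbalanced tags are not handled:

```python
import re


def template_text_for_inclusion(text):
    """Return the part of a template's source that gets transcluded."""
    only = re.findall(r"<onlyinclude>(.*?)</onlyinclude>", text, flags=re.S)
    if only:
        # <onlyinclude> present: keep just those parts, drop everything else.
        text = "".join(only)
    # Remove <noinclude> sections entirely (they may span several lines).
    text = re.sub(r"<noinclude>.*?</noinclude>", "", text, flags=re.S)
    # Keep <includeonly> content, but strip the tags themselves.
    return re.sub(r"</?includeonly>", "", text)
```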
The very wiki markup that made Wikipedia accessible to many is now making it hard for ordinary people to contribute. If I want to make a new page on Wikipedia today and I mess up the markup, there is a good chance it will get deleted. It isn't realistic to expect people to read through long, boring pages describing the markup.
How exactly would one solve this problem? I don't know, but I'm sure it won't be easy - most pages on the Web still aren't standards-compliant. The difference with Wikipedia is that it is all under the control of the Wikimedia Foundation, so in theory it would be possible to try to automatically convert all pages to some saner, stricter markup and manually fix those that failed to convert. However, it would require an enormous effort and would probably turn away a lot of current editors, so I don't think it will happen any time soon.
Galaksija ROM disassembly
27.09.2007 14:20
I've put online an incomplete Galaksija ROM disassembly in HTML format (with links to functions and all).
Oh, and z80dasm 1.1.0 is in Debian Unstable now.
Galaksija genuine advantage
22.09.2007 20:36
There are a lot of Galaksija emulators out there. First there's one for DOS and Windows. Then you have one for the ZX Spectrum and now also one for the SAM Coupé.
So how can you, as a simple user, be sure that you're getting the real 1980s Galaksija experience instead of some emulator of a questionable origin? Well, fear no more: Galaksija genuine advantage software will tell you exactly what kind of machine is behind the display you are looking at:
On a more serious note: it's amazing that Galaksija can be emulated so well on another machine of its class (same CPU with a similar clock frequency), but it isn't that hard to tell an emulator apart from the real thing.
This diagnostic program just looks at the properties of different sections of the Z80 address space. CNST means that you get consistent values from it, suggesting there is something attached to the bus at that address, and MMRY means that it can remember a value that was previously written to it (like RAM, for example). Emulators also patch some bytes in ROM, so a checksum is calculated and compared to a known value.
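The actual program runs on the Z80 itself. Purely to illustrate the logic, here is a Python model in which read() and write() are hypothetical stand-ins for bus accesses. The "----" label for addresses that are neither, the number of reads, and the 16-bit additive checksum are all assumptions:

```python
import itertools


def classify(read, write, addr, tries=8):
    """Classify one address as "MMRY", "CNST" or "----" (the last label is made up)."""
    values = {read(addr) for _ in range(tries)}
    if len(values) != 1:
        return "----"               # inconsistent reads: nothing drives the bus
    original = values.pop()
    probe = original ^ 0xFF          # a value guaranteed to differ from the original
    write(addr, probe)
    remembered = read(addr) == probe
    write(addr, original)            # put the old value back
    return "MMRY" if remembered else "CNST"


def rom_checksum(read, start, end):
    """16-bit sum of the bytes in [start, end); the real checksum method is unknown."""
    return sum(read(a) for a in range(start, end)) & 0xFFFF


# A toy address space: 0 is ROM, 1 is RAM, 2 is an unconnected (floating) bus.
memory_cells = {0: 0x12, 1: 0x34}
noise = itertools.count()

def bus_read(addr):
    return next(noise) & 0xFF if addr == 2 else memory_cells[addr]

def bus_write(addr, value):
    if addr == 1:                    # only the RAM cell accepts writes
        memory_cells[addr] = value

for addr in range(3):
    print(addr, classify(bus_read, bus_write, addr))
```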
Update: Here's how it looks when it runs on an emulator:
Latex tip
21.09.2007 23:54
I have a feeling I had this problem at least once before, found the solution and forgot about it. So, here's the whole story hoping I won't forget about this the third time.
The twoside option to the article document class in Latex makes the document suitable for duplex printing (for example by making margins slightly different on odd and even pages, etc.). It's just one of those little things that make Latex documents look really nice when printed. However Latex documentation doesn't mention that it also has one nasty side effect...
It changes page layout from this:
to this:
Why is this, you ask? Some grepping through the Latex configuration brings up this little gem (in article.cls):
\if@twoside \else \raggedbottom \fi
Which means that \raggedbottom is only applied when the twoside option is not enabled; with twoside, the vertical space on each page is stretched so that all pages end at the same height. I'm sure a perfectly reasonable explanation exists for this (as it seems to for every other thing in Latex), however to me this kind of layout just looks plain ugly.
The solution is obviously to include a \raggedbottom command at the beginning of the document (in the preamble, after \documentclass). This way the page layout you've painstakingly fine-tuned doesn't break when you switch from single-sided to duplex printing.
z80dasm 1.1.0
17.09.2007 22:43
A few days ago I uploaded version 1.1.0 of z80dasm. Here are a few bits from the changelog:
Rewritten symbol table routines. Replaced the old symbol table from dz80, which used statically allocated strings, with some more modern code. Better comments in the symbol file about where and how a particular symbol is used are a nice side effect of this.
Support for input symbol files. You can now give the disassembler knowledge about known symbol values (constants, function locations, etc.) in the form of a standard Z80 assembler symbol file.
Support for splitting binary file into data and code blocks. Along with the symbol file you can also now give z80dasm a description of data and code sections in the binary file you're working on. It will then write data sections with "defb" or "defw" directives in the assembly file instead of disassembling them.
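To make the last two features more concrete, here is a rough Python sketch of the idea: read known symbols from an assembler-style symbol file, then write a data block as defb lines, placing a label wherever a known symbol points inside the block. The "label: equ value" line format and every function name here are assumptions for illustration; z80dasm's actual parser, block-description format and output layout may differ (and defw output isn't covered).

```python
import re
from collections.abc import Iterable

# Assumed symbol-file line format: "label: equ value", optionally followed by "; comment"
SYMBOL_LINE = re.compile(r"^\s*([A-Za-z_.][\w.]*)\s*:?\s+equ\s+([^\s;]+)", re.IGNORECASE)


def parse_number(text: str) -> int:
    """Accept 0x1234, $1234, 1234h (hex) or plain decimal."""
    t = text.strip()
    if t.lower().startswith("0x"):
        return int(t[2:], 16)
    if t.startswith("$"):
        return int(t[1:], 16)
    if t.lower().endswith("h"):
        return int(t[:-1], 16)
    return int(t, 10)


def load_symbols(lines: Iterable[str]) -> dict[str, int]:
    symbols = {}
    for line in lines:
        m = SYMBOL_LINE.match(line.split(";", 1)[0])   # ignore comments
        if m:
            symbols[m.group(1)] = parse_number(m.group(2))
    return symbols


def emit_data_block(data: bytes, start: int, symbols: dict[str, int],
                    per_line: int = 8) -> list[str]:
    """Write a data block as defb lines, starting a new line at each known label."""
    names = {addr: name for name, addr in symbols.items()}
    out: list[str] = []
    line: list[int] = []
    line_addr = start

    def flush() -> None:
        if line:
            hex_bytes = ",".join(f"0{b:02x}h" for b in line)
            out.append(f"\tdefb {hex_bytes}\t;{line_addr:04x}")
            line.clear()

    for off, byte in enumerate(data):
        addr = start + off
        if addr in names or len(line) == per_line:
            flush()
        if addr in names:
            out.append(f"{names[addr]}:")
        if not line:
            line_addr = addr
        line.append(byte)
    flush()
    return out


syms = load_symbols(["table:\tequ 0x0102 ; lookup table", "; comment only"])
print("\n".join(emit_data_block(bytes([1, 2, 3, 4]), 0x0100, syms)))
```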
Collection of Galaksija demos
24.08.2007 18:28
Here's a short video of all the new demos that are included in the 0.2.0 release of Galaksija development tools:
Chariots of Fire
23.07.2007 0:15
Chariots of Fire is a short (260 bytes) program for Galaksija that plays part of the "Chariots of Fire" track by Vangelis.
The music it produces resembles some of the more advanced beeper music from the Sinclair Spectrum. I was always curious how people were able to reproduce anything other than simple tones on a 1-bit D/A converter, and this small program was a nice opportunity to learn about it.
It turns out that the code implements two independently running loops at slightly different frequencies. Each one of them changes the state of the audio output on each iteration (from 1 to 0 or 0 to 1). The result is an oscillator that slowly changes its duty cycle from 0% to 100% and back again. Here's the waveform of one note:
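Since both loops toggle the same output bit, the output level is simply the XOR of the two loops' own square waves. A small Python model shows how that produces the slowly drifting duty cycle. The loop periods (100 and 101 samples) are made-up values chosen to make the effect easy to see, not numbers taken from the program.

```python
def square(n: int, period: int) -> int:
    """1-bit square wave in samples: high for the first half of each period."""
    return 1 if (n % period) < period / 2 else 0


def beeper(n_samples: int, period_a: int, period_b: int) -> list[int]:
    """Output bit toggled by two independent loops.

    Each loop flips the output once per iteration, so its contribution is a
    square wave; the combined output is the XOR of the two.
    """
    return [square(n, period_a) ^ square(n, period_b) for n in range(n_samples)]


def duty_cycles(samples: list[int], window: int) -> list[float]:
    """Fraction of time the output is high in each consecutive window."""
    return [
        sum(samples[i:i + window]) / window
        for i in range(0, len(samples) - window + 1, window)
    ]


# Periods in samples (assumed). The phase between the loops drifts by one
# sample per period, so the duty cycle sweeps 0% -> 100% -> 0%.
wave = beeper(10100, 100, 101)
duties = duty_cycles(wave, 100)
print(duties[0], duties[50])   # 0.01 1.0
```

When the two loops are in phase their toggles cancel and the output is almost always low; half a beat period later they are in antiphase and the output is almost always high.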
(listen)
The changing duty cycle periodically shifts the signal's energy from lower to higher frequencies and back again. Since the beeper (or whatever is between the audio output and your brain, including your ear) has a limited bandwidth, a lot of the higher frequencies get cut off. This means that the sound gets periodically stronger and weaker, which can be used to produce the illusion that the square wave signal has an envelope added to it.
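The envelope effect can be checked with a simplified model that assumes the bandwidth limit keeps only the fundamental frequency of the pulse wave. For a 0/1 pulse wave with duty cycle D, the fundamental's amplitude is 2·sin(πD)/π, which peaks at 50% and vanishes at 0% and 100%. The sketch below computes it numerically with a one-bin DFT and prints the analytic value next to it.

```python
import math


def fundamental_amplitude(duty: float, n: int = 1000) -> float:
    """Amplitude of the first harmonic of a 0/1 pulse wave with the given duty cycle.

    One period is sampled at n points and a single DFT bin is evaluated.
    """
    high = round(duty * n)   # number of samples the output is high
    re = sum(math.cos(2 * math.pi * k / n) for k in range(high))
    im = sum(math.sin(2 * math.pi * k / n) for k in range(high))
    return 2 * math.hypot(re, im) / n


for d in (0.0, 0.1, 0.25, 0.5, 0.75, 0.9, 1.0):
    analytic = 2 * math.sin(math.pi * d) / math.pi
    print(f"duty {d:4.2f}: {fundamental_amplitude(d):.3f}  (analytic {analytic:.3f})")
```

As the duty cycle sweeps from 0% to 100%, the audible fundamental rises and falls, which is the "envelope" heard in the note.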
Compare this with a simple square wave of the same frequency:
(listen)
And here is the same square wave with an added envelope, roughly matching the changing duty cycle:
(listen)
Dancing Demon on Galaksija
15.07.2007 16:04
Dancing Demon was a famous game for the Tandy TRS-80 Model I, written by Leo Christopherson. I found out about it when I was researching the origins of Galaksija's operating system.
It turns out TRS-80 and Galaksija not only have very similar BASIC interpreters but also very similar graphics capabilities (which led me to believe Galaksija's ROM was based on Tandy's ROM, not Microsoft BASIC, but that is another story). To prove that, I ported Dancing Demon to Galaksija:
Of course, the whole program didn't fit into Galaksija's 6 kB of RAM (I believe the TRS-80 model on which the original Dancing Demon ran had 16 kB). I had to strip away the editor and basically everything else except the dancing animation (however you can still edit the dancing routine with BASIC editor in Galaksija's ROM). Melody playback also didn't make it (besides, Galaksija's software video would make that tricky), but I did manage to preserve the clicking sound (played back through Galaksija's cassette port, of course).
Source will be included in the next release of Galaksija development tools.
Galaksija demo
27.06.2007 0:49
Here's a short assembly program (around 180 bytes of code) that demonstrates a small part of Galaksija's considerable graphics abilities :).
Some screenshots (it actually looks quite nice in motion if you're looking from far enough away):
Manic Miner
21.06.2007 23:50
Manic Miner was one of the first computer games I played. It was also the first game I tried on my newly repaired Spectrum. Out of curiosity I did some googling around for technical information about this game. It turns out there's plenty floating around the Internet.
I was quite amazed that besides being one of the most popular games on the Spectrum it was also very well designed. The author (Matthew Smith) pushed a lot of the game's complexity from code into data: there's a pretty simple game engine taking the bottom 12 kB of the Spectrum's 48 kB of RAM. The rest is available for data describing maps, sprites, etc. Within the engine's limits you can make your own monster-infested levels simply by manipulating the data structures, with no need to modify (or even understand) the engine's machine code. This made it possible for other people to make completely new games (I guess people would call them mods today) using his engine: there are dozens of them listed at World of Spectrum. This was made even simpler by the fact that the original Manic Miner release didn't employ any copy protection schemes or fancy loaders. You could press BREAK after the first part of the game's code loaded and make some simple modifications directly from Spectrum's BASIC, without needing any extra software.
It's also interesting how the engine development progressed over time. With Manic Miner's engine you could load at most 20 levels into Spectrum's memory (in addition to the engine), with each level taking 1024 bytes. It supported conveyor belts, crumbling floors and two types of monsters per level. The next version of the engine (used in the sequel Jet Set Willy), on the other hand, supported 60 levels (256 bytes each) and added swinging ropes, stairways, more monsters per level and the ability to connect different levels, so that the player can freely walk between them instead of just passing from the first to the last level as in Manic Miner.
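The level storage arithmetic works out to 20 x 1024 = 20480 bytes for Manic Miner and 60 x 256 = 15360 bytes for Jet Set Willy. The sketch below illustrates the data-driven idea in its most basic form: an engine that finds a level purely by its index in a blob of fixed-size records. The flat layout and names are assumptions for illustration, not the actual memory map of either game.

```python
MANIC_MINER = {"levels": 20, "level_size": 1024}
JET_SET_WILLY = {"levels": 60, "level_size": 256}


def level_record(blob: bytes, index: int, level_size: int) -> bytes:
    """Return the raw bytes of level `index` from back-to-back fixed-size records."""
    count = len(blob) // level_size
    if not 0 <= index < count:
        raise IndexError(f"level {index} out of range (0..{count - 1})")
    start = index * level_size
    return blob[start:start + level_size]


for name, spec in (("Manic Miner", MANIC_MINER), ("Jet Set Willy", JET_SET_WILLY)):
    total = spec["levels"] * spec["level_size"]
    print(f"{name}: {spec['levels']} x {spec['level_size']} B = {total} B")
# Manic Miner: 20 x 1024 B = 20480 B
# Jet Set Willy: 60 x 256 B = 15360 B
```

Because the engine only needs the index and the record size, new levels can be made by editing the data alone, which is what made all those fan-made variants possible.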
All this reminds me of more recent games from id Software, like Doom and Quake. They are also written in much the same way: a compact, simple engine and a larger WAD file containing data about the levels. Not surprisingly, these game engines were also used in a large number of other games.
Here's some further technical reading: Jet Set Willy and Manic Miner internals.
Lolcode
20.06.2007 12:47
If you think C supports some funny syntax, how about this:
HAI
CAN HAS STDIO?
PLZ OPEN FILE "LOLCATS.TXT"?
    AWSUM THX
        VISIBLE FILE
    O NOES
        INVISIBLE "ERROR!"
KTHXBYE
Lolcode is a new Turing-complete language that will probably end up alongside such legends as Whitespace and Brainfuck. They even have a book cover ready:
z80dasm 1.0.0
18.06.2007 13:55
Here's the first release of z80dasm, my fork of the dz80 disassembler.
Some basic information about it can be found here. More details are in the README file and the included man page.
Enjoy.
Assemble, disassemble, assemble, ...
14.06.2007 23:50
I'm getting tired of hunting Z80 assembler and disassembler bugs each time I want to focus on researching Galaksija's operating system.
Today I wanted to squash all those bugs once and for all, so I turned z80asm (a nice Z80 macro assembler) against its bitter rival z80dasm (my soon-to-be-released fork of dz80 Z80 disassembler).
After a long debugging session of z80dasm, I finally got the desired result (interestingly I found no new bugs in z80asm):
$ stress
Final results of stress testing
===============================
Tested: 2000
Failed: 0
$
(In each iteration this script copies 16 kB of data from /dev/urandom, disassembles it, assembles it again and compares the result with the original.)
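The round-trip idea behind such a script can be sketched in a few lines of Python. In a real run the two callables would be thin wrappers (for example around subprocess.run) that invoke z80dasm and z80asm on temporary files; their exact command-line options aren't given here, so the usage below plugs in trivial hex-encoding stand-ins instead. The function name and signature are hypothetical.

```python
import os
from typing import Callable


def stress(iterations: int,
           disassemble: Callable[[bytes], str],
           assemble: Callable[[str], bytes],
           size: int = 16 * 1024) -> int:
    """Round-trip random binaries through disassemble -> assemble and count mismatches."""
    failed = 0
    for _ in range(iterations):
        original = os.urandom(size)          # same role as reading /dev/urandom
        if assemble(disassemble(original)) != original:
            failed += 1
    print("Final results of stress testing")
    print("===============================")
    print(f"Tested: {iterations}")
    print(f"Failed: {failed}")
    return failed


# Stand-ins for the real tools: a lossless "disassembler" and "assembler" pair
stress(10, lambda b: b.hex(), bytes.fromhex)
```

Random input is a good torture test because it hits every opcode, including undocumented ones, that a hand-written test file might never contain.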
IPv6 connectivity in Debian
11.06.2007 11:55
Hruške wrote about how to get IPv6 connectivity in Debian.
A few days ago I found out by accident that there is a simpler way: just install the tspc package. (Some web site told me "you are using IPv6" and I thought "Wow. When did I configure that?", then remembered that I had installed some IPv6-related package a while ago to read its documentation.) No configuration is necessary and it works from behind a NAT.
It is a different kind of connectivity though (a point-to-point tunnel versus 6to4), but it's good enough for just poking around the 128-bit address space and trying out a few pings.
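A quick way to check from a script whether such a tunnel gives you a usable route is to connect a UDP socket to some global IPv6 address. This only asks the kernel to pick a route and a source address; no packets are sent. The target below (Google's public DNS) is just an example, and any global IPv6 address would do.

```python
import ipaddress
import socket


def has_global_ipv6_route(target: str = "2001:4860:4860::8888", port: int = 53) -> bool:
    """Return True if the kernel would reach `target` from a global IPv6 address."""
    try:
        with socket.socket(socket.AF_INET6, socket.SOCK_DGRAM) as s:
            s.connect((target, port))        # UDP connect: route lookup only
            local = s.getsockname()[0]
    except OSError:                          # no IPv6 support or no route
        return False
    try:
        return ipaddress.ip_address(local.split("%")[0]).is_global
    except ValueError:
        return False


print("IPv6 route:", has_global_ipv6_route())
```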
PVM over firewalls and the Internet
06.06.2007 23:06
PVM was designed to connect different machines into a cluster over low-latency, high-speed LANs. Some applications, however, do not need good connectivity between cluster nodes in order to work efficiently.
Now imagine you have a four-way Opteron machine at your disposal and the only thing that prevents you from adding it to your virtual machine is the fact that it is 30 km away and the only route to it leads through two firewalls and a block of public Internet.
PVM uses a combination of TCP and UDP connections for communication between nodes, so you can't use SSH port forwarding, which only handles TCP. However, there is a way to tunnel any kind of traffic over SSH and make a simple virtual private network using a combination of SSH and PPP daemons (see the VPN HOWTO), which will also carry PVM traffic.
The complete installation described in the HOWTO is too complicated for one-time use, so here are simple step-by-step instructions:
- Make sure you can make an SSH connection from master to opteron with a single command and without entering a password (e.g. by using RSA or DSA authentication). To traverse firewalls you can use SSH port forwarding for this connection, since it only needs one standard TCP connection from master to opteron (in my case it was SSH-over-SSH-over-SSH).
- Download and install pty-redir on master. It's only 100 lines of code, so it's easy to audit if you're worried about security.
- Install pppd on both master (the local host that runs the pvm console and controls the cluster) and opteron (the remote host you want to add to the cluster). Pray that both machines have PPP support enabled in the kernel.
- Put the following into /etc/ppp/options on master:
    noauth
- Put the following into /etc/ppp/options on opteron:
    ipcp-accept-local
    ipcp-accept-remote
    proxyarp
    noauth
- Run ./pty-redir /usr/bin/ssh -C -c blowfish -t -e none -o 'Batchmode yes' -i path-to-your-rsa-key opteron-host-name /usr/sbin/pppd on master.
- Run pppd /dev/ttyp0 192.168.X.1:192.168.X.2 on master (replace /dev/ttyp0 with the device name that is printed by the previous command). Choose X so that these two IPs do not conflict with any other networks master and opteron are connected to. 192.168.X.1 becomes master's IP and 192.168.X.2 becomes opteron's IP in this little VPN.
- Write a PVM hostfile like this:
    192.168.X.1 sp=1000
    192.168.X.2 sp=10000
    ...and any other local machines in your cluster
- Start PVM on master: pvm -n192.168.X.1 hostfile
- Enjoy :)
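Picking X and writing the hostfile are mechanical enough to script. Here is a minimal Python sketch using the standard ipaddress module: it returns the first 192.168.X.0/24 that overlaps none of the (IPv4) networks you list, then formats hostfile lines like the ones above. The function names are hypothetical, and you still have to collect the list of networks from both master and opteron yourself.

```python
import ipaddress
from collections.abc import Sequence


def pick_vpn_subnet(existing: Sequence[str]) -> int:
    """Return the first X such that 192.168.X.0/24 overlaps none of the given IPv4 networks."""
    taken = [ipaddress.ip_network(net, strict=False) for net in existing]
    for x in range(256):
        candidate = ipaddress.ip_network(f"192.168.{x}.0/24")
        if not any(candidate.overlaps(net) for net in taken):
            return x
    raise RuntimeError("no free 192.168.X.0/24 subnet")


def pvm_hostfile(x: int, extra_hosts: Sequence[str] = ()) -> str:
    """PVM hostfile with the two VPN endpoints first, then any other hosts."""
    lines = [f"192.168.{x}.1 sp=1000", f"192.168.{x}.2 sp=10000", *extra_hosts]
    return "\n".join(lines) + "\n"


# Networks that master and opteron are already attached to (example values)
x = pick_vpn_subnet(["192.168.0.0/24", "192.168.1.0/24", "10.0.0.0/8"])
print(f"pppd /dev/ttyp0 192.168.{x}.1:192.168.{x}.2")
print(pvm_hostfile(x), end="")
```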
Galaksija tools 0.1.0
29.05.2007 10:15
From the README file:
This is a loose collection of tools for use with Galaksija home
microcomputer:
gtp2wav      - Convert a binary Galaksija tape file to an audio file for
               loading to a Galaksija computer through the sound card.
bin2gtp      - Encapsulate a Z80 machine code block into a Galaksija tape
               file together with a simple BASIC loader.
pgm2scr      - Convert a bitmap to a Galaksija video framebuffer.
chargendump  - Inspect and dump characters from a Galaksija character
               generator ROM image.
include/     - Header files for use with a Z80 assembler that contain useful
               macro definitions for writing assembly programs for
               Galaksija.
examples/    - Example assembly programs for Galaksija
Get the tarball here.
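To give an idea of what a bitmap-to-framebuffer conversion like pgm2scr's involves, here is a rough Python sketch. It assumes a 32x16 character screen where each character cell is a 2x3 block of pseudo-graphic pixels (64x48 pixels in total), and a hypothetical TRS-80-style encoding of 128 plus six pixel bits numbered left to right, top to bottom. Galaksija's real bit order isn't given here and may differ, so treat the encoding as a placeholder.

```python
COLS, ROWS = 32, 16                  # assumed character screen size
CELL_W, CELL_H = 2, 3                # assumed pixels per block-graphics character
WIDTH, HEIGHT = COLS * CELL_W, ROWS * CELL_H   # 64 x 48 pixels


def bitmap_to_framebuffer(bitmap: list[list[int]]) -> bytes:
    """Pack a 64x48 1-bit bitmap into one block-graphics character per cell.

    Hypothetical encoding: 128 + bit i set for pixel i of the 2x3 cell,
    with pixels numbered left to right, top to bottom.
    """
    if len(bitmap) != HEIGHT or any(len(row) != WIDTH for row in bitmap):
        raise ValueError(f"bitmap must be {WIDTH}x{HEIGHT}")
    fb = bytearray()
    for row in range(ROWS):
        for col in range(COLS):
            code = 0
            for bit in range(CELL_W * CELL_H):
                y = row * CELL_H + bit // CELL_W
                x = col * CELL_W + bit % CELL_W
                if bitmap[y][x]:
                    code |= 1 << bit
            fb.append(128 + code)
    return bytes(fb)


# A diagonal line as a quick example
image = [[1 if x == y else 0 for x in range(WIDTH)] for y in range(HEIGHT)]
screen = bitmap_to_framebuffer(image)
print(len(screen), screen[:4].hex())
```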
