12.03.2010 21:25
I receive approximately 10^4 spam messages per month at my personal email address (compare this to around 3000 in September 2007).
I long ago abandoned all hope of hiding the address itself from spammers and their crawlers by playing tricks with obfuscation and Turing tests. You can now find it in the clear on numerous sites. I'm still convinced that hiding it is not worth the effort, and I wouldn't go back to obfuscation even if I started using a fresh address. It's far too fragile a defense. All it takes is a single breach - one web site not hiding the address well enough (you can't control them all!), one person with a spyware-infested computer and your address in the address book - and most of the effort has been for nothing.
These days, on average 5 spams per day get through my more or less default Bogofilter setup. I don't know how many legitimate mails end up in the spam folder - it's impossible to check them all manually. Every once in a while I check a few dozen of the mails classified as spam that are least likely to be spam according to Bogofilter's scoring. So far I have only seen a handful (fewer than 10) of useful mails end up there, which is enough to keep me convinced that the false-positive rate is negligible.
I run Bogofilter in constant learning mode and the database I'm currently using is now a little more than 2.5 years old (I think the previous one got corrupted in a power outage). While tuning some classification parameters I found that it has this peculiar characteristic:
$ bogoutil -H ./wordlist.db
Histogram
score count pct histogram
0.00 518443 19.51 ############
0.05 3923 0.15 #
0.10 5205 0.20 #
0.15 1910 0.07 #
0.20 1418 0.05 #
0.25 5231 0.20 #
0.30 1753 0.07 #
0.35 1069 0.04 #
0.40 2573 0.10 #
0.45 1113 0.04 #
0.50 2070 0.08 #
0.55 1509 0.06 #
0.60 1422 0.05 #
0.65 1316 0.05 #
0.70 1405 0.05 #
0.75 1327 0.05 #
0.80 1284 0.05 #
0.85 1346 0.05 #
0.90 1621 0.06 #
0.95 2101188 79.08 ################################################
tot 2657126
hapaxes: ham 318147 (11.97%), spam 1679040 (63.19%)
pure: ham 511489 (19.25%), spam 2099448 (79.01%)
I'm not sure what such databases dwelling in other corners of the internet look like. This histogram means that my legitimate mail has a very distinct vocabulary, with words that almost never appear in spam. Relatively few words appear in both classes (only 1.7% of about 2.7 million!). I was expecting a much more continuous distribution.
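The 1.7% figure follows from the "pure" line of the bogoutil output: whatever is neither pure ham nor pure spam must have been seen in both classes. A quick sketch of that arithmetic, with the three counts copied from the histogram above (the function name is made up for illustration and is not part of Bogofilter):

```shell
# shared_words TOTAL PURE_HAM PURE_SPAM
# Prints how many tokens were seen in both ham and spam, and their share
# of the whole word list. (Hypothetical helper, not a Bogofilter tool.)
shared_words() {
    awk -v total="$1" -v ham="$2" -v spam="$3" 'BEGIN {
        shared = total - ham - spam
        printf "%d shared tokens (%.2f%%)\n", shared, 100 * shared / total
    }'
}

shared_words 2657126 511489 2099448   # prints: 46189 shared tokens (1.74%)
```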
I think some of this is probably because part of my mail is in Slovene (and Slovenian spam is thankfully almost nonexistent). But I don't think that alone is enough to explain such a result.
On the other hand, considering the excellent success rate of filtering, I should have expected an outcome like this.
Posted by Tomaž | Categories: Code
03.03.2010 21:37
This is what was left of a 1N4148 diode after a 47 μF aluminum electrolytic capacitor was repeatedly charged through it from a low-impedance source. The average power dissipation was well below the diode's specified maximum.
Signal diodes don't survive peak currents much larger than their continuous current rating.
Posted by Tomaž | Categories: Life
01.03.2010 19:34
Here's another weird Perl quirk that can make a script produce error messages that send you in completely the wrong direction.
$ mkdir foo
$ perl -le 'print open(F, "<foo");'
1
To cite the Perl documentation, "Open returns nonzero upon success". Obviously, this means that Perl thinks the open() call above succeeded. However, the filehandle F is useless - all it ever returns is undef.
So I guess this means that before every call to open() you should check whether the argument accidentally points to a directory, so that you can give a meaningful error message. Otherwise you might read a bunch of undefs from it without noticing.
Posted by Tomaž | Categories: Code
28.02.2010 20:20
A while ago I wrote about a method of sandboxing certain untrusted applications by using unprivileged user accounts.
Obviously the Chrome browser and Skype from that example had to have access to the network to be useful. However, applications today have a nasty habit of phoning home and sharing all sorts of data with their creators, some of which you might prefer to keep private. So for an untrusted application that has no business talking to the network it's only logical to preemptively prevent it from doing that.
On a recent Linux system, it's really simple to do that, as long as the application is running under its own user ID:
# iptables -A OUTPUT ! -o lo -m owner --uid-owner foo -j DROP
What this does is drop all packets that originate from a process owned by user foo and are not leaving through the loopback interface. You can put this line into /etc/rc.local, for instance, to make the setting permanent.
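Since a startup script may run more than once, or alongside a manual invocation, it can be handy to add the rule only when it isn't there yet. A sketch, assuming an iptables recent enough to have the -C (check) option; the function name and the IPTABLES override variable are made up for illustration:

```shell
# block_user_network USER
# Adds the DROP rule for USER unless an identical rule already exists.
# IPTABLES can be overridden, e.g. to try this out without root.
block_user_network() {
    ipt="${IPTABLES:-iptables}"
    # -C exits with status 0 if the rule is already in the chain.
    if ! "$ipt" -C OUTPUT ! -o lo -m owner --uid-owner "$1" -j DROP 2>/dev/null; then
        "$ipt" -A OUTPUT ! -o lo -m owner --uid-owner "$1" -j DROP
    fi
}
```

Calling block_user_network foo as root should then leave exactly one such rule in the OUTPUT chain, no matter how many times it runs.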
Of course, just as with my previous post, a warning is in order here. This will only prevent casual network transmissions from applications not specifically written to be resilient to such methods.
Actually, it's pretty easy to circumvent if you know what you're dealing with. Pings from /bin/ping, for instance, will get through on my system, because that binary is setuid root.
Posted by Tomaž | Categories: Code
26.02.2010 20:56
I've been wanting to design and build a new 50 W lab power supply for some time now. It has turned out to be one of those projects that you think will take a month tops and then the lack of time stretches it to half a year and counting.
The minimum requirements are 0 - 25 V and 2 A with an adjustable current limit. I've considered four approaches for such a design:
- plain linear regulator,
- linear regulator combined with a transformer with multiple taps,
- linear regulator with a thyristor pre-regulator and
- switched-mode regulator.
These are pretty much sorted by ascending complexity and efficiency.
The power requirements are just barely within the reach of the first option. However, that would require a big passive heat sink (I want to stay as far away from unreliable fans as possible). Plus, building a new device that would operate at around 10 - 20% efficiency most of the time doesn't really feel right. So scratch that.
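To put rough numbers on that: an ideal linear regulator passes the load current straight through, so its efficiency is about Vout / Vin and the pass element burns (Vin - Vout) * Iout. A back-of-the-envelope sketch, assuming an unregulated input of 30 V (an assumed figure, the real one depends on the transformer) and ignoring the regulator's own quiescent current:

```shell
# linear_reg VIN VOUT IOUT
# Efficiency and pass-element dissipation of an idealized linear regulator.
linear_reg() {
    awk -v vin="$1" -v vout="$2" -v iout="$3" 'BEGIN {
        printf "%.0f%% efficient, %.1f W dissipated\n",
               100 * vout / vin, (vin - vout) * iout
    }'
}

linear_reg 30 5 1     # prints: 17% efficient, 25.0 W dissipated
linear_reg 30 25 2    # prints: 83% efficient, 10.0 W dissipated
```

Under the same assumption, the full 2 A into a short circuit works out to 60 W in the pass element, which is where the big heat sink comes from.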
I spent quite a bit of time researching the second option. In fact, I have an almost completed design for it on the drawing board right now. It uses a two-tap transformer with a relay to switch between the taps - transformers with more taps aren't easy to find. The regulator part is roughly based on the 0-30 V power supply from Electronics lab.
Still, I'm not really happy with it. I have doubts about the longevity of the relay and worst-case heat dissipation is still uncomfortably high.
By the way, the original Electronics lab design is pretty broken in several ways and I strongly doubt that it meets its specifications - but that is perhaps a topic for another post.
I'm not going to even consider building a switcher for this purpose. It's noisy and has worse regulation characteristics than a linear design. Not really something I would want in a lab supply. Plus finding appropriate ferrite cores for switchers is always a pain.
So, right now I'm looking into a thyristor pre-regulator. There's a pretty good application note from Linear Technology that has a basic circuit. It looks solid on paper and I'm going to give it a try tomorrow to see how it behaves in practice. If it works as advertised I'm more than prepared to go back to the drawing board with this.
Posted by Tomaž | Categories: Analog
22.02.2010 16:31
How do you make a shell (er, Bash) script wait until a certain line appears in a log file? Sounds simple, but I have yet to find an elegant solution for this task.
A common use case for this is when you start a daemon that forks into the background and you need the script to wait until the daemon has finished doing something.
The following is the best I came up with:
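tail -f "$LOG" | (
    IFS=""
    # Read the log line by line until the canary shows up.
    while read -r LINE; do
        if echo "$LINE" | grep -q "$CANARY"; then
            break
        fi
    done
    # tail would otherwise keep running, so kill it explicitly.
    pkill -f "tail -f $LOG"
)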
IFS is set to an empty string here because it appears to help with buffering for some reason. Without that line the script will sometimes keep waiting even after the $CANARY has been appended to the file. That can be problematic when the line you're looking for is the last one that will be written to the log.
The most obvious flaw here is that pkill will kill every tail process following this log, even those that were not started from this script.
Any better solutions are most welcome.
Update: Thanks to Nace, here's a better version of the script that is more careful about killing the tail process:
# Use whichever of these two lines matches your Bash version.
PARENT="$BASHPID"                # Bash 4.x
PARENT=`$SHELL -c 'echo $PPID'`  # Bash 3.x
tail -f "$LOG" | (
    IFS=""
    while read -r LINE; do
        if echo "$LINE" | grep -q "$CANARY"; then
            break
        fi
    done
    # Only kill the tail that is a child of this script.
    pkill -P "$PARENT" -xf "tail -f $LOG"
)
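A different approach that avoids tail and pkill altogether is to poll the file with grep. It's less elegant and can react up to a second late, but there is no stray process to clean up and a timeout comes almost for free. This is a sketch of that alternative, not the method used above; the function name and the default timeout are arbitrary:

```shell
# wait_for_line FILE PATTERN [TIMEOUT_SECONDS]
# Returns 0 as soon as PATTERN appears in FILE, 1 if the timeout expires.
wait_for_line() {
    local left="${3:-30}"
    while [ "$left" -gt 0 ]; do
        # The file may not exist yet right after the daemon starts.
        if grep -q -- "$2" "$1" 2>/dev/null; then
            return 0
        fi
        sleep 1
        left=$((left - 1))
    done
    return 1
}

# Usage: wait_for_line "$LOG" "$CANARY" 60 || echo "daemon did not start" >&2
```

Note that this matches anywhere in the file, including lines left over from a previous run, so it works best with a log that is fresh or rotated at startup.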
Posted by Tomaž | Categories: Code
26.01.2010 22:16
I spent the better part of the day in the Royal Air Force Museum in London (and the rest of the day traveling to and from it via various combinations of walking, trains, subways and taxis).
Needless to say, there's a lot of amazing technology piled up in there and I would recommend a visit to anyone interested in aircraft. In fact, there are so many machines crammed into the old hangars that it's quite a challenge to take a good photograph of any single one of them.
The exhibits are arranged in a somewhat chaotic order as far as the timeline goes. But that can be a plus - it's interesting to compare the size of a modern jet fighter with a Spitfire.
The nose of Eurofighter Typhoon.
Handley Page Victor
Bomb bay of Avro Vulcan
Boeing B17G
Posted by Tomaž | Categories: Life
23.01.2010 23:09
Here are a couple of conclusions from my electric water heater monitoring project.
- On average I use 3 kWh of electrical energy per day on hot water. That's 90 kWh per month or roughly half my monthly electricity bill.
- For comparison, space heating consumes 1120 kWh of heat per month in winter (if I can believe what the district heating company is billing me). According to last year's car calculations, I burned up 740 kWh worth of chemical energy per month in my car.
- Shifting all of the water heater's consumption to night time would save me 2.2€ per month or 26€ yearly.
Surprisingly, providing hot water seems to require roughly an order of magnitude less energy than either my daily car commute (about 8 times less) or keeping the living quarters moderately warm (about 12 times less). That's something I didn't really expect.
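For the record, here are the ratios behind that comparison, using only the monthly figures from the list above (a 30-day month is assumed, as in the 90 kWh figure):

```shell
# ratio A B: prints A / B with one decimal place
ratio() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", a / b }'
}

water=$((3 * 30))      # kWh per month, from 3 kWh per day
ratio 1120 "$water"    # space heating vs. hot water, prints 12.4
ratio 740 "$water"     # car vs. hot water, prints 8.2
```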
Accordingly, the savings are low too. A timer would perhaps pay for itself in two years or so, if I don't count my own time required to install it. And I doubt I'll still be in the same apartment two years from now.
So it's not really worth doing anything right now. At least not until there's a bigger difference in electrical energy prices between day and night or the smart grid becomes a reality.
Posted by Tomaž | Categories: Life
22.01.2010 18:51
I finally found the time to finish assembling the "Russian" tube clock I got from Dedek Mraz. It seems I'm starting to gather quite a collection of weird timekeeping devices.
Assembling the kit was a no-brainer - instructions include three pictures for every component you need to solder and have you check parts of the circuit before continuing. The only slightly tricky bit was getting the tube to align nicely with the casing before soldering its (many) pins.
Considering the kit comes from the US, it was a nice touch to include the 24 h European time and date format (yes, the clock can also show the day of the week and the current date). The power brick only had a US power plug though. I replaced it with an EU one - the circuit itself already supported 230 V.
Also, the pictures don't really show what the display looks like. Mine glows in a blue-green color and isn't particularly bright (I have the brightness set to 55). If you remember old video recorders that used to blink "12:00" (they also had VFDs) - that's what the tube actually looks like.
One annoying thing I noticed is that it's hard to set the clock to the second. You can't wait for the wall clock to catch up with the frozen seconds display because the menu will time out in less than a minute.
This makes it harder to assess the accuracy. The circuit also doesn't provide a trimmer capacitor to fine-tune the oscillator frequency. I had the idea of adding a DCF77 receiver (it appears there's a contact provided on the PCB for that). However, the FAQ page says the boost converter is too noisy for such a receiver to work near the clock. If that's true, I wonder what other devices are also affected by this interference.
Posted by Tomaž | Categories: Digital
20.01.2010 23:45
I went to see Avatar last week.
Yes, if you ignore the visuals it would be boring. But if anything, this is the kind of movie to see for the pretty moving images alone. This also means that it was still worth watching even after hearing all kinds of spoilers - it's been in theaters for a month now and I overheard most of the story in various conversations before I even went to the cinema. Still, I very much enjoyed it. The pseudo-3D-induced headache that followed, not so much.
Actually, one of the reasons I didn't go to watch it sooner was that I didn't find the still frames on the posters very appealing. Interesting how the perception changes when things are moving.
One pleasant surprise was that nothing I saw in the movie was outrageously outside the domain of the possible. Slower-than-light travel, no artificial gravity, a planet with an unbreathable atmosphere and aliens that don't speak English (and no universal translator) are all rare nice touches in mainstream Hollywood science fiction. OK, the floating mountains stretch that a bit, but the explanation involving the Meissner effect at least passes the first mental plausibility test.
The movie obviously features as much fictional biology as it does technology. There I found some weirdness harder to ignore. One thing that caught my attention was that the principles of evolution seemed kind of broken. Take a chunk of Earth and the larger lifeforms walking around (including any humans) will look pretty similar: you know, four limbs, fur, two eyes, mouth, etc. But on Pandora, no animals are seen with fur, they breathe through their stomachs and have different numbers of limbs and eyes. The Na'vi with their hair and human-like bodies seem out of place in that scheme.
Talking about the blue folk, it's interesting how their feline traits (large eyes, ears, tails) on the screen appear to unrealistically amplify your perception of emotion on their faces. I wonder why that is. Reading other people's feelings is basically an image recognition task and that's something the brain is very well adapted for. I guess such a face combines the features of expressions you instinctively recognize from both animals (ears) and humans (facial gestures). Since these two things never appear on the same individual, you can't experience their combined effect in real life. With a bit of additional exaggeration that's possible with CGI (for example pupil dilation), no wonder some scenes feel like emotion overload.
In the end I left the theater thinking that a realistic sequel to the story would be very short. In 10 years, when the next ship from Earth arrives, somebody says "Nuke them from orbit. It's the only way to be sure." But I guess that would be bad for ticket sales.
Posted by Tomaž | Categories: Life