SYN trickle

24.02.2011 16:34

Back on Monday I started getting "possible SYN flooding on port 25. Sending cookies." warnings in the kernel log of my server. Investigating with tcpdump I got the following trace:

03:49:48.313336 IP 203.81.64.yyy.9204 > S 846930886:846930886(0) win 61690 <mss 1460,nop,nop,sackOK>
03:49:48.313493 IP > 203.81.64.yyy.9204: S 1717881734:1717881734(0) ack 846930887 win 5808 <mss 1452,nop,nop,sackOK>
03:49:51.681833 IP > 203.81.64.yyy.9204: S 1717881734:1717881734(0) ack 846930887 win 5808 <mss 1452,nop,nop,sackOK>
03:49:58.081830 IP > 203.81.64.yyy.9204: S 1717881734:1717881734(0) ack 846930887 win 5808 <mss 1452,nop,nop,sackOK>
03:50:10.881785 IP > 203.81.64.yyy.9204: S 1717881734:1717881734(0) ack 846930887 win 5808 <mss 1452,nop,nop,sackOK>

I counted 28 different IP addresses from that network ("Myanma Post and Telecommunication" according to whois) sending TCP SYN packets to my port 25, but never answering the repeated SYN-ACKs my machine sends back. After what looks like a timeout the remote host tries again with a new SYN. When I started monitoring, the rate was well below anything serious, at a few packets per minute.
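Tallying the offending hosts from a saved capture takes only a few lines. A quick sketch that assumes the plain-text tcpdump output shown above; the function name and regular expression are mine, not part of any tool:

```python
import re
from collections import Counter

def count_syn_sources(lines):
    """Tally source IP addresses from plain-text tcpdump output lines."""
    counts = Counter()
    for line in lines:
        # Source address is the dotted quad right after "IP ",
        # followed by the source port and " >".
        m = re.search(r"IP (\d+\.\d+\.\d+\.\d+)\.\d+ >", line)
        if m:
            counts[m.group(1)] += 1
    return counts

# e.g. count_syn_sources(open("syn.log")) over a saved capture file
```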

What could be causing such traffic? It looks to me like the SYN-ACKs from my server are being dropped somewhere along the line. Perhaps spam bots on compromised machines and a failed attempt at SMTP filtering that drops in-bound packets instead of out-bound ones?

Posted by Tomaž | Categories: Code | Comments »


21.02.2011 22:55

After almost two months of waiting, my Kindle (Wi-Fi only version 3) arrived by mail mid-January. I've been using it daily for a while now and can give a few comments on it.

First, I should mention that it worked out of the box (ha!). There was no need to fix any solder joints, apply duct tape or add cardboard before use, which is quite refreshing. That's even better when you consider that there are no obvious non-destructive means of disassembly.

6 inch model of Amazon Kindle 3

The most important part of it is the electronic ink display, of course. From a normal reading distance it looks really good and reminds me of reading text from a glossy magazine page. Or maybe something printed by a laser printer on white glossy plastic. It is certainly way ahead of any backlit LCD display I've seen. It's also different from your normal paperback, but not in a bad way. The glare can be annoying and I had to re-adjust my night-time reading lamp a bit to get the optimum contrast.

When you look closer the fonts break up into surprisingly large pixels. Most characters are anti-aliased though and look just fine from a normal reading distance. In the menus and user interface some ghosting is also noticeable (like a shadow of the previously displayed image). But I haven't seen it in book-reading mode, which I guess uses a slower but more accurate way of refreshing the screen.

The 6 inches in the diagonal are plenty for comfortable reading, but sometimes I wish it had a thicker bezel for a better grip. That's probably one idea behind all those covers and sleeves you can find on the market.

I'm using my Kindle to read some of the books from Project Gutenberg and for longer articles from the web via Instapaper. So far it has performed that task wonderfully. I've also tried the PDF support and the web browser, but those two are better left to a PC. Both require way too much scrolling, which is a pain with the slow refresh of the eInk display. Even in Google Reader I simply skip or just glance through too many articles for the experience to be enjoyable on the Kindle. It's better to queue longer texts with Instapaper during the day and read them in comfort on the Kindle in the evening.

I haven't registered the device with Amazon and right now I don't plan to buy any of their DRM-encumbered books. I can't say I will never do it - maybe after the current to-read list from Gutenberg runs out. I had a plan to brush up on my German and read a book or two with the help of the built-in dictionary but it turns out there are no German-to-any-language-I-know Kindle dictionaries available. There's no question I'll keep using paper books. As far as I know there are no Slovenian books in the Kindle store and all non-fiction literature will stay either on dead trees or on the PC.

Finally I should mention that after the initial charge the battery lasted for a bit over a month, which is quite shocking considering most of my other battery powered devices need daily charging. Note though that after experimenting with it I only switched the wireless LAN connection on occasionally to download a new issue of Instapaper.

Maybe we don't have a city on the moon and a manned mission to Jupiter, but at least one prediction from those old movies appears to be coming true.

Posted by Tomaž | Categories: Life | Comments »

To Mac and back again

12.02.2011 23:33

I used to be a Mac user. I enjoyed using my Powerbook G4 running OS X Panther and I have fond memories of it. Before things went sour, that is.

Recently I remembered that that laptop, now gathering dust in a drawer, holds many files I still care about. Past projects from the Faculty, photos, old documents, all stored in a single copy on an aging hard drive that hasn't spun in years. (I noticed a while ago that my last backup on an external drive got corrupted - for an unknown reason a file in a compressed set went missing.)

So I powered it up, copied the files to my current computer - and was left with two home directories three years apart that needed merging. Some of the stuff I had already copied from the Mac as I needed it. Some was later modified. Some was simply duplicated in different places. Do you see where this is going? In hindsight I would have saved myself a lot of work if I had simply moved everything off the computer as soon as I stopped using it.

There are tools like fdupes that attempt to find duplicate files and offer to delete or hard-link the copies. But of course, it can't be that simple.
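The naive approach those tools take can be sketched in a few lines of Python: group files by a hash of their raw contents and report any group with more than one member. A toy illustration, not how fdupes is actually implemented:

```python
import hashlib
import os
from collections import defaultdict

def find_duplicates(root):
    """Group files under root by the SHA-1 of their raw contents."""
    groups = defaultdict(list)
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            h = hashlib.sha1()
            with open(path, "rb") as f:
                # Read in chunks so large files don't fill memory.
                for chunk in iter(lambda: f.read(65536), b""):
                    h.update(chunk)
            groups[h.hexdigest()].append(path)
    # Only hashes shared by two or more files are duplicates.
    return [paths for paths in groups.values() if len(paths) > 1]
```

This is exactly where the trouble described below starts: two copies of the same photo or song hash differently as soon as some program has rewritten the metadata in one of them.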

The culprit is modern software that is too smart and modifies files it shouldn't really touch. F-spot will for instance modify EXIF data in images it tracks in its library. On the Mac side, iTunes modifies ID3 tags in audio files it sees. The solution? Lots of shell scripting to strip metadata from files before comparing them (ExifTool and id3v2 came in handy).

Speaking of images, it turned out that at some point the clock in my camera got reset to 2001-01-01 and I took around a hundred photos before noticing, which means all the dates in their EXIF data are off by an unknown offset. Before moving everything to F-spot I wanted to correct that. Browsing through the collection I found several shots from the FOWA 2007 conference. I thought that if I could find a schedule and pin down that particular lecture, I could calculate the clock offset and correct it.

Well, scratch that idea. FOWA 2007 may as well never have happened as far as the official conference site is concerned, and my own blog post about it came on top of most Google searches I tried. I did however find the exact same shots as mine on Flickr. From several cameras in fact, and their EXIF dates all agreed to within a couple of minutes. So hurray for multitudes of redundant photos on the web!
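Once a single photo has a trustworthy timestamp, fixing the whole batch is just adding a constant timedelta. A minimal sketch with made-up dates; both values here are hypothetical, standing in for the wrong EXIF date of one shot and the real time recovered from other people's Flickr uploads:

```python
from datetime import datetime, timedelta

# Hypothetical values for illustration: what the camera wrote into
# EXIF for one FOWA shot, and the real time of that scene.
wrong = datetime(2001, 1, 3, 11, 20, 0)
actual = datetime(2007, 10, 4, 14, 35, 0)

# The camera clock was off by a constant amount after the reset,
# so one offset fixes every photo in the batch.
offset = actual - wrong

def corrected(exif_date):
    return exif_date + offset
```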

At this point I should also mention that iPhoto stores dates as seconds since 2001-01-01 in the DateAsTimerInterval field of the AlbumData.xml file (which is NOT the UNIX timestamp format, as some people on the Internet would have you believe). Here's a Python snippet to convert it to something more sane:

from datetime import datetime, timedelta
print(datetime(2001, 1, 1) + timedelta(seconds=DateAsTimerInterval))

By the way, I'm not the first one to go down this path. Donald seems to have taken the same hundred steps himself and wrote iphoto2fspot. I didn't use it though, because my F-spot collection is organized in a completely different way than what I had in iPhoto and most of the metadata there was useless anyway. Also, it parses the XML file with a regular expression.

Posted by Tomaž | Categories: Life | Comments »

Treacherous waters

09.02.2011 21:44

These days (or rather months) my daily work at Zemanta often takes me to parts of the web I would not normally visit. Its shadier parts, so to speak. And it turns out that those are surprisingly crowded these days.

I'm sure you've been there. Probably when you were searching for some useful piece of information and such sites cluttered the top of the result list, so you had to sort through piles and piles of fluff before you found what you were looking for. Or maybe it was recommended by a friend through one of the many channels such recommendations travel in the age of the social web. Perhaps you even had to deal with a bug report because a piece of your web-facing software, while compliant with all relevant standards, didn't perform up to some user's satisfaction when dealing with such a web site.

Dark tunnel that is HTML 5

Imagine for a second the stereotypical web site of this class: fixed-width design with unreadably small gray type on a white background. A top-left logo in saturated colors and gray gradients, courtesy of web-two-point-oh. Probably the definitive destination for some wildly popular topic right off the first page of your typical yellow press (celebrities, health, cars, shopping) or for emerging interests (say, Android development). At most two paragraphs of actual content and the rest filled with ads, user comments and everything in between. And of course at least 10 ways to share the page on all the social networks you know about, plus 10 more that you don't.

Considering that serving those abominations of the web is the only thing the companies behind them do, they are surprisingly incompetent at it. Pages won't validate or will throw a hundred Javascript errors from tens of different scripts that load behind the curtains. The little content there is was scraped from Wikipedia, or it looks like someone from a less fortunate country was hired to copy-and-paste a few statements on a prescribed topic from all around the Internet. Everything under a CC license is considered free-for-all (but don't you dare break their lengthy personal-use-only terms of use!). Nobody cared that anything other than ASCII encoding exists, or about the subtleties of XML parsing, or for that matter that the description of a software product and an image showing a porn star of the same name do not refer to the same thing. As long as half a dopamine-starved human brain is able to decode it, it's good enough.

What's puzzling at first is that such sites seem to be getting a shocking amount of traffic (and probably revenue as well). Of course, opinions about their quality differ. Even among my colleagues some consider such sites valuable destinations. They have comment buttons and you can share them on Twitter! They're way more fun to visit than some tired old Wikipedia that doesn't even have Facebook integration. Never mind that any user-contributed discussion is as devoid of actual content as the site itself.

What I see in such pages is an evolution of link farming. Social farming if you will. Search engines have gotten better at detecting content that has just been blatantly and automatically copied from somewhere. So an up-to-date spammer, er, vertical influencer has switched from a website copying bot to a few mechanical turks producing syntactically unique but semantically carbon-copied content. The network effect of modern social networks brings more and more people to the site, producing worthless comments that again give the appearance of a respectable site. At this point they are trying to trick an algorithm by introducing living people into the content copying process.

Therefore you can hear a lot about how the traditional search engines will in time be completely replaced by your social network, introducing wetware on the other side as well. The idea is that natural language processing and information retrieval won't be able to distinguish between what you would consider a reputable site and a link-farmed site that approximately copied content from that reputable site. But your friends in a social network will. First because they are (hopefully) human and can understand things AI can't and second because you share their interests and trust what they trust. They can therefore in theory push more useful information in your direction than some algorithm, even when it is intimately familiar with your click- and search history.

However, I think the sites I described above are a perfect example of why this scheme won't reduce the amount of fluff you will need to go through before getting to the information you want. At this point you don't even have to fool search engines any more, because once you've got people clicking those little "share" buttons they will bring more visitors to your site and push your coupon codes and endorsements and whatnot regardless. In the end it's just as easy to subvert a human social network into passing around unsolicited advertisements as it is a software algorithm. You just need a different kind of engineering.

Posted by Tomaž | Categories: Ideas | Comments »