Tech: August 2009 Archives

I recently spent a month using Yahoo instead of Google as my default search engine. (Incidentally, the "Google's better, because the Yahoo home page is too busy" argument is bunk--Yahoo's search page is every bit as clean and simple as Google's.)

I was surprised at how decent Yahoo's search was. Though Google's ranking still seemed to make more intuitive sense, Yahoo did a reasonably good job throughout. Unfortunately, I never felt like Yahoo's search was actually doing a noticeably better job than Google's. It was just a series of minor disappointments when I noticed results that Google could have done better with.

Yahoo's typo detection is very poor compared to Google's:

  • When I searched for "eage book mod_perl" (with "eagle" misspelled "eage"), Yahoo didn't catch my typo. Google caught the error, and gave me exactly what I wanted--information on Practical mod_perl, a.k.a. the "Eagle" book.
  • When I searched for "ighthouseapp", Yahoo didn't figure out that I wanted to find out about the Lighthouse product. Google did.
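To make the two typos above concrete, here is a minimal sketch of "did you mean" matching using Python's standard `difflib`. It is only an illustration of fuzzy matching against a word list, not how Google or Yahoo actually detect typos; the `KNOWN_TERMS` vocabulary, the `suggest` function, and the 0.8 cutoff are all assumptions made up for this example.

```python
import difflib

# Hypothetical vocabulary. A real search engine would presumably draw on
# its index or query logs rather than a hand-written list.
KNOWN_TERMS = ["eagle", "book", "mod_perl", "lighthouseapp", "puppet"]


def suggest(query, vocabulary=KNOWN_TERMS, cutoff=0.8):
    """Return the query with each unknown word replaced by its closest known term."""
    corrected = []
    for word in query.lower().split():
        if word in vocabulary:
            corrected.append(word)
            continue
        # get_close_matches ranks candidates by similarity ratio and drops
        # anything scoring below the cutoff.
        matches = difflib.get_close_matches(word, vocabulary, n=1, cutoff=cutoff)
        corrected.append(matches[0] if matches else word)
    return " ".join(corrected)


if __name__ == "__main__":
    print(suggest("eage book mod_perl"))
    print(suggest("ighthouseapp"))
```

Both of my misspelled queries differ from the intended word by a single dropped letter, which is about the easiest case a fuzzy matcher can face.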

Some of the results were just bizarre, but have since been fixed by a reindex:

  • I searched for "yahoo india news" and the fourth result down was a nonexistent page on Yahoo's own site, leading to a 404.
  • For some reason, results from one particular site were coming up absurdly high for a wide variety of queries. It looks like this has been addressed now.

I was also surprised at how tech-centric Google is, vs. Yahoo:

  • When I searched for "puppet" on Yahoo, the first page of results referred to, well, puppets. Over on Google, the first two hits were for the open source system administration software called Puppet. (Of course, in this case, I really was looking for the Puppet software, so the point goes to Google, but the bias is interesting to note.)

My biggest practical irritation with Yahoo Search wasn't with the web search itself, but the lack of an integrated blog search. I frequently jump to Google's blog search when I'm trying to find out what people are saying about something. Not having that be just a click away changed the way I interacted with the web, very much for the worse.

The Yahoo challenge was fun, but it actually made me appreciate Google even more. Next up, I'm looking forward to trying Bing for a month. When doing side-by-side tests, it seems to return significantly more relevant results than Yahoo Search (which is likely one good reason for the recent deal). I may end up back at Google, but I want to know that I'm using it for the right reasons, and not just laziness-induced lock-in.

I'm a big fan of Arc90's Readability tool, which "makes reading on the Web more enjoyable by removing the clutter around what you're reading." It identifies the main body of the article or blog you're reading, re-presents it using an easy-to-read stylesheet, and hides everything else. It's a clever app, and I use it almost every day.

I needed to be able to pull out the main content of a web page for a personal project; it took me a few days till I realized that Readability does exactly that, and that Arc90 actually encourages ports to other platforms.

I just released HTML::ExtractMain, my Perl rewrite of Readability's content identification strategies. It's online at CPAN, and free to use under standard open source licenses. It's been a while since I released code as open source, and it feels good to be able to scratch my own itch while sharing code with other developers.
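For readers curious what "content identification" can look like, here is a rough sketch in Python rather than Perl. It is not Readability's or HTML::ExtractMain's actual algorithm, just one simple heuristic in the same spirit: credit the text of each `<p>` to its parent element, then keep the parent with the most paragraph text. The `MainContentFinder` class and `extract_main` function are hypothetical names, and the sketch assumes reasonably well-formed HTML with explicit closing tags.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not touch the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class MainContentFinder(HTMLParser):
    """Group paragraph text by the element that directly contains each <p>."""

    def __init__(self):
        super().__init__()
        self.stack = []       # ids of the currently open non-<p> elements
        self.paragraphs = {}  # element id -> list of paragraph texts
        self.counter = 0
        self.current = None   # text chunks of the open <p>, or None

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if tag == "p":
            self.current = []
        elif self.current is None:
            # Inline tags inside a paragraph are ignored; only containers
            # opened outside a paragraph get an id.
            self.counter += 1
            self.stack.append(self.counter)

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if tag == "p":
            if self.current is not None and self.stack:
                text = " ".join("".join(self.current).split())
                if text:
                    self.paragraphs.setdefault(self.stack[-1], []).append(text)
            self.current = None
        elif self.current is None and self.stack:
            self.stack.pop()

    def handle_data(self, data):
        if self.current is not None:
            self.current.append(data)


def extract_main(markup):
    """Return the paragraphs of the container holding the most paragraph text."""
    finder = MainContentFinder()
    finder.feed(markup)
    finder.close()
    if not finder.paragraphs:
        return ""
    best = max(finder.paragraphs.values(),
               key=lambda texts: sum(len(t) for t in texts))
    return "\n\n".join(best)


if __name__ == "__main__":
    page = """
    <html><body>
    <div id="nav"><p>Home</p><p>About</p></div>
    <div id="post"><p>I recently spent a month using a different search engine.</p>
    <p>It was a <em>series</em> of minor disappointments.</p></div>
    <div id="footer"><p>Copyright</p></div>
    </body></html>
    """
    print(extract_main(page))
```

A real implementation has to cope with much messier markup (comment threads, sidebars full of links, unclosed tags), which is where the bulk of the work goes.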


Anirvan Chatterjee is a San Francisco Bay Area tech geek and bibliophile.


About this Archive

This page is an archive of entries in the Tech category from August 2009.
