Anirvan Chatterjee: August 2009 Archives

I recently spent a month using Yahoo instead of Google as my default search engine. (Incidentally, the "Google's better, because the Yahoo home page is too busy" argument is bunk--Yahoo's search page is every bit as clean and simple as Google's.)

I was surprised at how decent Yahoo's search was. Though Google's ranking still seemed to make more intuitive sense, Yahoo did a reasonably good job throughout. Unfortunately, I never felt like Yahoo's search was actually doing a noticeably better job than Google's. It was just a series of minor disappointments when I noticed results that Google could have done better with.

Yahoo's typo detection is very poor, compared to Google's:

  • When I searched for "eage book mod_perl" (with "eagle" misspelled "eage"), Yahoo didn't catch my typo. Google caught the error, and gave me exactly what I wanted--information on Practical mod_perl, a.k.a. the "Eagle" book.
  • When I searched for "ighthouseapp", Yahoo didn't figure out that I wanted to find out about the Lighthouse product. Google did.

Some of the results were just bizarre, but have since been fixed by a reindex:

  • I searched for "yahoo india news" and the fourth result down was a nonexistent page on Yahoo's own site, leading to a 404.
  • For some reason, results from one particular site were coming up absurdly high for a wide variety of searches. It looks like this has been addressed now.

I was also surprised at how tech-centric Google is, vs. Yahoo:

  • When I searched for "puppet" on Yahoo, the first page of results referred to, well, puppets. Over on Google, the first two hits were for the open source system administration software called Puppet. (Of course, in this case, I really was looking for the Puppet software, so the point goes to Google, but the bias is interesting to note.)

My biggest practical irritation with Yahoo Search wasn't with the web search itself, but the lack of an integrated blog search. I frequently jump to Google's blog search when I'm trying to find out what people are saying about something. Not having that be just a click away changed the way I interacted with the web, very much for the worse.

The Yahoo challenge was fun, but it actually made me appreciate Google even more. Next up, I'm looking forward to trying Bing for a month. When doing side-by-side tests, it seems to return significantly more relevant results than Yahoo Search (which is likely one good reason for the recent deal). I may end up back at Google, but I want to know that I'm using it for the right reasons, and not just laziness-induced lock-in.

August 15 is India's Independence Day. B and I spent the day marching to save South Asia in Richmond, California, while keeping an eye on the India Day parade in New York.

We spent our day in Richmond, California, participating in a mobilization to turn up the heat on Chevron, a major California polluter, and part of a group of companies that's been trying to thwart the best efforts of citizens from around the world trying to come up with a fair, realistic, science-based approach to dealing with global warming.

The mayor was there, as were labor activists, environmentalists, community health organizations, and representatives from communities around the world where Chevron has a presence (Burma, Nigeria, Ecuador, etc.). The message: Chevron, the 5th largest corporation on the planet, needs to stop poisoning the communities it operates in, and stay out of the climate talks, where the future of the planet will be decided.

In Bangladesh and West Bengal, rural communities are dealing with declining fish catches, and bigger floods and droughts. In Bangladesh today, climate refugees are losing everything they own. This is not theoretical. The front-line affected communities of Bangladesh and West Bengal didn't invent this problem; many of those affected don't even have reliable access to electricity. Man-made climate change was caused by developed nations like the U.S., and we need to take a leadership role in dealing with the issue; things won't magically change by themselves.

On the other side of the country, the tri-state Federation of Indian Associations was holding its annual India Day parade in New York. The last year was particularly significant for people of Indian origin, as the Delhi High Court just overturned Section 377, the 150-year-old British-era legislation that criminalized gay and lesbian Indians. Unfortunately, the organizers of the India Day parade didn't get the hint, and refused to let Indian gays and lesbians, their friends, and allies celebrate this momentous victory as part of the parade. (Yup, you can have a Pride parade in Bombay, but you can't have a gay float in New York.)

The Indian-American community supports the Indian government's decriminalization of homosexuality, and B and I did our very small parts to directly question the organizers about their stances. A feminist organization ultimately invited LGBT community members to march with them; I wish we could have been there to cheer them on.

The ultimate result? Loads of happy Indian-American LGBTQ people in the parade, and public embarrassment for the organizers. While parade organizers have previously gone to court to keep gays and lesbians out, the Indian government's change of stance is a game-changer. New York India Day parade organizers can choose to wallow in their cultural conservatism in New York, or change with the times as India moves ahead.

(photos by Roopa Singh)

I'm a big fan of Arc90's Readability tool, which "makes reading on the Web more enjoyable by removing the clutter around what you're reading." It identifies the main body of the article or blog you're reading, re-presents it using an easy-to-read stylesheet, and hides everything else. It's a clever app, and I use it almost every day.

I needed to be able to pull out the main content of a web page for a personal project; it took me a few days till I realized that Readability does exactly that, and that Arc90 actually encourages ports to other platforms.

I just released HTML::ExtractMain, my Perl rewrite of Readability's content identification strategies. It's online at CPAN, and free to use under standard open source licenses. It's been a while since I released code as open source, and it feels good to be able to scratch my own itch while sharing code with other developers.
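
To give a feel for what "identifying the main body" of a page can involve, here is a minimal Python sketch of one simple approach: score each container element by how much paragraph text sits directly inside it, subtract points for link text outside paragraphs (which tends to be navigation), and keep the highest-scoring container. This is an illustrative assumption, not Readability's actual heuristics and not the HTML::ExtractMain code; the names ContainerScorer and extract_main_text, and the list of container tags, are made up for the example.

```python
from html.parser import HTMLParser

# Tags treated as candidate "main content" containers (an assumption of this sketch).
CONTAINERS = {"body", "div", "article", "section", "td"}


class ContainerScorer(HTMLParser):
    """Scores each container by the paragraph text directly inside it."""

    def __init__(self):
        super().__init__()
        self.open_containers = []  # stack of open containers, innermost last
        self.candidates = []       # every container seen, in document order
        self.paragraph = None      # text chunks of the <p> being read, if any
        self.link_depth = 0        # > 0 while inside an <a> element
        self.skip_depth = 0        # > 0 while inside <script> or <style>

    def _flush_paragraph(self):
        # Credit the finished paragraph to the innermost open container.
        if self.paragraph is not None and self.open_containers:
            text = " ".join("".join(self.paragraph).split())
            if text:
                node = self.open_containers[-1]
                node["paragraphs"].append(text)
                node["score"] += len(text)
        self.paragraph = None

    def handle_starttag(self, tag, attrs):
        if tag in CONTAINERS:
            self._flush_paragraph()
            node = {"tag": tag, "score": 0, "paragraphs": []}
            self.candidates.append(node)
            self.open_containers.append(node)
        elif tag == "p":
            self._flush_paragraph()
            self.paragraph = []
        elif tag == "a":
            self.link_depth += 1
        elif tag in ("script", "style"):
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in CONTAINERS:
            self._flush_paragraph()
            if self.open_containers and self.open_containers[-1]["tag"] == tag:
                self.open_containers.pop()
        elif tag == "p":
            self._flush_paragraph()
        elif tag == "a":
            self.link_depth = max(0, self.link_depth - 1)
        elif tag in ("script", "style"):
            self.skip_depth = max(0, self.skip_depth - 1)

    def handle_data(self, data):
        if self.skip_depth or not self.open_containers:
            return
        if self.paragraph is not None:
            self.paragraph.append(data)
        elif self.link_depth:
            # Link text outside paragraphs looks like navigation: penalize it.
            self.open_containers[-1]["score"] -= len(data.strip())


def extract_main_text(html):
    """Return the paragraphs of the best-scoring container, or None."""
    scorer = ContainerScorer()
    scorer.feed(html)
    scorer.close()
    best = max(scorer.candidates, key=lambda node: node["score"], default=None)
    if best is None or best["score"] <= 0:
        return None
    return "\n\n".join(best["paragraphs"])
```

Calling extract_main_text(page_source) on a typical blog page should return the post's paragraphs and skip link-only blocks such as navigation bars. A real extractor needs many more signals (class and id names, link density inside paragraphs, sibling merging), which is the kind of work a dedicated library handles.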


Anirvan Chatterjee is a San Francisco Bay Area tech geek and bibliophile.


About this Archive

This page is an archive of recent entries written by Anirvan Chatterjee in August 2009.
