Introducing DesiFilter


I recently launched a new web tool called DesiFilter.

Like a lot of folks from immigrant communities, I tend to be hyper-aware of names from my culture. If I'm watching a movie, part of my brain goes "hey, wow!" when I see that the gaffer's backup caterer is named Banerjee or Patel or Khan.

DesiFilter sample results

South Asian American community journalists and bloggers will regularly do the same--scanning long lists of names to find community members involved in larger news stories. So I built a tool to help out, based on a list of over 26,000 uniquely South Asian first and last names I collected and hand-edited. (The word "Desi" is often used interchangeably with South Asian in diaspora.)

You just give DesiFilter a URL or a bunch of text, and it'll find and highlight possible South Asian names. Commercial name ethnicity matching tools have been around for a while, and are used for things like targeted marketing and political campaigning. I believe this is the first freely available tool of its kind that handles South Asian names.

It wasn't particularly hard to build; the tech side (powered by Perl's Regexp::Assemble) was a breeze compared to the difficult task of collecting and refining name lists. South Asian names come from all over, so I ended up making a lot of awkward decisions to maximize usability in majority-Anglo countries, including throwing out most Anglo and many Portuguese names common in South Asia to minimize false positives. This means, for example, that it'll fail to identify John Abraham as a South Asian name. Short of a hard-to-build-and-visualize system of weights, I can't think of a much better solution.
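To make the matching idea concrete, here is a minimal sketch in Python. It is not DesiFilter's actual code: the real tool uses Perl's Regexp::Assemble, which merges many patterns into one optimized regex, whereas this sketch simply joins escaped names into a plain alternation. The name list, the exclusion set, the function names, and the `[[...]]` highlight markers are all made up for illustration.

```python
import re

# Hypothetical stand-in for the real, hand-edited list of 26,000+ names.
SAMPLE_NAMES = ["Banerjee", "Patel", "Khan", "Abraham"]

# Hypothetical exclusion set: names common in South Asia that are thrown
# out to reduce false positives in majority-Anglo countries.
EXCLUDED = {"Abraham", "John"}


def build_name_pattern(names, excluded=()):
    """Compile one regex that matches any kept name as a whole word."""
    kept = sorted(set(names) - set(excluded), key=len, reverse=True)
    # Escape each name so punctuation in a name is matched literally.
    alternation = "|".join(re.escape(name) for name in kept)
    return re.compile(r"\b(?:" + alternation + r")\b")


def highlight(text, pattern):
    """Wrap every matched name in [[...]] markers."""
    return pattern.sub(lambda m: "[[" + m.group(0) + "]]", text)


pattern = build_name_pattern(SAMPLE_NAMES, EXCLUDED)
print(highlight("Catering by R. Banerjee and John Abraham", pattern))
# prints: Catering by R. [[Banerjee]] and John Abraham
```

Note how the exclusion set produces exactly the trade-off described above: "Banerjee" is highlighted, while "John Abraham" passes through unmarked. The word-boundary anchors also keep "Khan" from matching inside a longer name such as "Khanna".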

DesiFilter got some big love on Sepia Mutiny. I'm currently working on some features to make it more useful to the folks over at the South Asian Journalists Association.
