Nerding out with Ruby, Tokyo Cabinet, Hpricot, Twitter, Sinatra, Haml & Passenger

[Update: The service described in this article has been relocated to http://gavinmcgovern.com]

I wanted to get back into Ruby and since it’s been many years I went for total immersion. Only one rule: don’t use anything I used in the past. So easy; everything is pretty much new to me now.

The project: grab the names of hot & happening bands and see which of them are getting any buzz on Twitter. What bands are getting talked about and where? Nothing fancy.

Hpricot

I use Hpricot to seed my list of bands. There are tons of sites out there that offer lists of new bands, releases, etc. Hpricot provides a nice CSS selector search method, kinda jQuery-ish. One reason I went this route rather than with something like RSS/Atom is that I just want names. Extracting a band name from a feed’s body content is a whole other research project.

Twitter Search API

Once I have the band list I search Twitter for each one. For every matching Tweet I’ll do another little Hpricot scrape for the Tweeter’s location (“ul.entry-author>li>span.adr”, Hpricot makes it so easy.) I’m sure there’s some way of doing this using the Twitter API but I figured it’s public stuff anyway, no need to authenticate for it. Plus more Hpricot practice!

Tokyo Cabinet

All the search results end up in a Tokyo Cabinet database. Very simply, TC is a really fast key-value store. For my purposes I went with the schema-less table-based option, just rows of maps. I’ve been wanting to spend time outside the SQL world so Tokyo Cabinet is perfect for me. Plus building & installing Tokyo Cabinet on my Mac was painless. (Ubuntu was a tiny bit more complicated: ldconfig /usr/local/lib did the trick.)

Sinatra, Haml & Passenger

I keep a lot of data but I only do daily reports. Thankfully Tokyo Cabinet has a query method that lets me do simple filtering. Once I have that sliced up I just use standard Ruby methods to collect, count & sort the results. I have a very simple Sinatra app running under Phusion Passenger & Apache. It handles presenting the report and uses Haml & Sass for the templating.
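To make the "collect, count & sort" step concrete, here is a minimal sketch in plain Ruby. It is not the code behind the report: the rows are made-up sample data, the field names ("band", "location", "date") and the helper names are assumptions, and an ordinary array of hashes stands in for what the Tokyo Cabinet query would return.

```ruby
# Hypothetical sample rows, shaped like "rows of maps": string keys, string
# values. Field names and dates are assumptions for illustration only.
ROWS = [
  { "band" => "Gomez",      "location" => "Chile",          "date" => "2009-06-01" },
  { "band" => "Gomez",      "location" => "United Kingdom", "date" => "2009-06-01" },
  { "band" => "Gomez",      "location" => "Chile",          "date" => "2009-06-01" },
  { "band" => "Dolby Anol", "location" => "Brooklyn",       "date" => "2009-06-01" },
  { "band" => "Gomez",      "location" => "Jalisco",        "date" => "2009-05-31" }
]

# Keep only the rows for one day (a stand-in for the store's query filter).
def rows_for(rows, day)
  rows.select { |row| row["date"] == day }
end

# Count mentions per band, most-mentioned first; ties are broken by name.
def mention_counts(rows)
  rows.group_by { |row| row["band"] }
      .map { |band, hits| [band, hits.size] }
      .sort_by { |band, count| [-count, band] }
end

# Names of the n most frequent locations for one band.
def top_locations(rows, band, n = 3)
  rows.select { |row| row["band"] == band }
      .group_by { |row| row["location"] }
      .map { |place, hits| [place, hits.size] }
      .sort_by { |place, count| [-count, place] }
      .first(n)
      .map(&:first)
end

todays_rows = rows_for(ROWS, "2009-06-01")
mention_counts(todays_rows).each { |band, count| puts "#{band}: #{count}" }
puts "Gomez: " + top_locations(todays_rows, "Gomez").join(", ")
```

Nothing clever here, just select, group_by and sort_by. A Sinatra route could hand the resulting arrays straight to a Haml template.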

Next Steps

As you can see, there’s something a bit off (besides the old data). Band names that are also common names or phrases get far more mentions than the truly unique band names: Gomez has 50 mentions vs. 8 for Dolby Anol. But there’s good stuff in there! The top 3 locations for Gomez are Chile, United Kingdom and Jalisco. Makes sense: Gomez is a common Spanish name and a British band.

There are various approaches I can take with the search results to handle false positives. Thankfully the Ruby world is full of possibilities. More on this later.

One more addition. The report is just a snapshot in time. I’d like to add some history so I can get a better idea of activity. Is the band buzzing up? Buzzing down? What sorts of trends can we see?
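One way the history idea might look, as a rough sketch only: compare two daily snapshots of mention counts and label each band as buzzing up, down or flat. Nothing like this exists in the app yet. The `trends` helper, the snapshot shape (band name mapped to a count) and the second day's numbers are all made up for illustration.

```ruby
# Hypothetical snapshots: band name => mention count for one day.
# The second day's numbers are invented sample data.
EARLIER = { "Gomez" => 50, "Dolby Anol" => 8 }
LATER   = { "Gomez" => 42, "Dolby Anol" => 15, "New Band" => 3 }

DIRECTIONS = { 1 => :up, 0 => :flat, -1 => :down }

# For every band seen in either snapshot, return [band, delta, direction].
# A band missing from a snapshot counts as zero mentions that day.
def trends(earlier, later)
  (earlier.keys | later.keys).sort.map do |band|
    delta = later.fetch(band, 0) - earlier.fetch(band, 0)
    [band, delta, DIRECTIONS[delta <=> 0]]
  end
end

trends(EARLIER, LATER).each do |band, delta, direction|
  puts format("%-12s %+d (%s)", band, delta, direction)
end
```

Two snapshots only give a direction. Anything resembling a real trend would need counts kept for more days.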

I’m excited. My last project with Ruby was in 2003. The Ruby world of today is almost unrecognizable. In fact, the only thing still around that I remember is Rails!
