Twitter apps sing on MongoDB

As we’ve previously mentioned, at Squeejee, we’re big fans of MongoDB. We’re excited that some very smart people are starting to take notice of Mongo, too.

In this first post of what we hope will be a helpful introductory series, we’ll cover the use case for a document store like MongoDB.

Mashups are all the rage

These days it’s rare that a prospective client brings us a new project that doesn’t pull data from some third-party API. Twitter seems to be the most popular choice of late, perhaps because it adds value both as a content provider and a transportation medium. One of the challenges of integrating with Twitter in any large-scale way is creating your own local cache of Twitter data so you can ration those precious API calls and stay under Twitter’s rate limits. When we built our own mashup, Tweet Congress, we wanted to integrate with a number of third-party Twitter services that each pulled data from Twitter, analyzed it, and passed that value on via their own APIs. The common thread was that each of these services exposed data as JSON (JavaScript Object Notation). Taking these deep, rich data structures and breaking them out into relational database tables seemed silly (not to mention tedious). With the relational approach, every time a service exposed a new value that we wanted to consume, we’d have to extend the schema on our end to capture it.

Schema-less is more

MongoDB is a high-performance, open source, schema-free, document-oriented database. This means we can stash those deep data structures we get back from those nifty web services much more easily. MongoDB uses BSON to store your documents (or objects). This BSON smells less like a buffalo and more like JSON, which you already know. The main difference is that BSON can also serialize dates and binary data (images, and even Word documents, oh the irony!). This schema-less storage means your app can store all that new data TwitterCounter returns tomorrow without needing to change your database.
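To make the schema-less idea concrete, here is a minimal sketch in plain Ruby. It uses an in-memory array as a stand-in for a collection rather than a real MongoDB connection, and the payloads and field names (followers, rank) are made up for illustration; they are not the actual TwitterCounter response format.

```ruby
require 'json'

# Parse a JSON response body and append the resulting document
# to the collection unchanged, nested structures and all.
def store(collection, json_body)
  doc = JSON.parse(json_body)
  collection << doc
  doc
end

# Hypothetical payloads: tomorrow's response carries a new nested field.
today    = '{"screen_name": "squeejee", "followers": 1200}'
tomorrow = '{"screen_name": "squeejee", "followers": 1250, "rank": {"overall": 4821}}'

collection = [] # in-memory stand-in for a document collection
store(collection, today)
store(collection, tomorrow)

# Both documents sit side by side even though their fields differ.
collection.each { |doc| puts "#{doc['screen_name']}: #{doc.keys.join(', ')}" }
```

The point is only that nothing had to be declared or migrated before the second document, with its extra field, could be stored.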

Hey, ever heard of CouchDB?

Certainly CouchDB (like many other tools that predate it) does document storage and does it well. We are still big CouchDB fans. However, we found that querying CouchDB dynamically was a lot like getting a question answered at the DMV. CouchDB shines when you know up front how you’re going to query the data, in static views of schema-less data.

MongoDB provides all that schema-less goodness without giving up any of the rich dynamic querying you get with a relational database. More on this in our next post.

Back to our Twitter mashup

Let’s talk specifically about Twitter for a moment. The Twitter API has methods to retrieve the social graph for a particular user, returning an array of integer user IDs for the user’s friends or followers. With a relational database, you’d normally insert a row for each of those values. So a user with a thousand friends would get a thousand database rows, each with something like user_id and friend_id (and id if you’re using some Rails magic). With MongoDB, we can just stash that array with the object itself and query it. A condition on an array field matches any document whose array contains the value, so this finds every user who lists the current user as a friend: :conditions => {:friend_ids => current_user.id}
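As a rough illustration of that matching rule, here is a small plain-Ruby imitation. It is not the MongoDB driver or any mapper library; the USERS data and the matches? and find_all helpers are hypothetical names invented for this sketch.

```ruby
# Each user document carries its friend IDs as an embedded array.
USERS = [
  { name: 'alice', twitter_id: 1, friend_ids: [2, 3] },
  { name: 'bob',   twitter_id: 2, friend_ids: [1] },
  { name: 'carol', twitter_id: 3, friend_ids: [1, 2] }
].freeze

# Imitates how a condition on an array field matches:
# an array field matches when it contains the wanted value,
# any other field matches on plain equality.
def matches?(doc, conditions)
  conditions.all? do |field, wanted|
    actual = doc[field]
    actual.is_a?(Array) ? actual.include?(wanted) : actual == wanted
  end
end

# Return every document that satisfies all of the conditions.
def find_all(docs, conditions)
  docs.select { |doc| matches?(doc, conditions) }
end

# Who lists user 2 among their friends?
puts find_all(USERS, friend_ids: 2).map { |u| u[:name] }.inspect
# prints ["alice", "carol"]
```

One document per user replaces the thousand join rows, and the lookup stays a single condition.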

Fields on-the-fly

Another benefit of all this schema-less goodness is that your application can define fields at run-time. On our Floxee platform, we support different custom fields on a per-site basis. Due to the magic of MongoDB, we can have custom fields for party, state, and district for all the congressfolk in Tweet Congress and fields like skills and soda_preference on our own Squeejee Twitter directory. Just upload your CSV with some custom columns and we’ll create those fields on-the-fly (after sanitizing them, of course).
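Here is one way that upload step could look, sketched in plain Ruby. This is not Floxee’s actual code: the sanitize_field and rows_to_documents helpers are hypothetical, the sample row is invented, and we assume the CSV has already been parsed into a header array and row arrays. The sanitizing rule shown (lowercase, underscores only) is just one reasonable choice; it also happens to strip the dots and leading dollar signs that MongoDB treats specially in field names.

```ruby
# Turn an arbitrary column header into a safe field name:
# lowercase, runs of other characters collapsed to one underscore,
# leading and trailing underscores trimmed.
def sanitize_field(header)
  header.to_s.downcase.gsub(/[^a-z0-9]+/, '_').gsub(/\A_+|_+\z/, '')
end

# Build one document (a Hash) per row, keyed by the sanitized headers.
# Columns with a blank header or a blank value are skipped.
def rows_to_documents(headers, rows)
  fields = headers.map { |h| sanitize_field(h) }
  rows.map do |row|
    fields.zip(row).each_with_object({}) do |(field, value), doc|
      next if field.empty? || value.nil? || value.to_s.strip.empty?
      doc[field] = value.to_s.strip
    end
  end
end

# Invented sample data with custom columns.
headers = ['Screen Name', 'Party', 'State', 'District #']
rows    = [['RepExample', 'Independent', 'TX', '7']]

puts rows_to_documents(headers, rows).inspect
```

Because each document only holds the fields its row actually supplied, two sites can upload completely different columns into the same kind of collection.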

More to come

In our upcoming posts we’ll cover Ruby bindings, performance, deployment, and more. Stay tuned!
