Should You Switch To NoSQL Too?

With Twitter going public with their plans to switch to Cassandra, a lot of people are asking if they should switch too.

Ian Eure from Digg (also switching to Cassandra) gave a great rule of thumb last week at PyCon: “if you’re deploying memcache on top of your database, you’re inventing your own ad-hoc, difficult to maintain NoSQL database,” and you should seriously consider using something explicitly designed for that instead.
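To make Ian's rule of thumb concrete, here is a minimal sketch of the memcache-on-top-of-a-database pattern he is describing. Everything in it is an assumption for illustration: the `CachedStore` class is hypothetical, and plain Python dicts stand in for both the SQL database and memcache. The point is that the application, not the datastore, ends up owning the cache lookup, the fallback read, and the invalidation logic.

```python
class CachedStore:
    """Hypothetical read-through cache in front of a slower 'database'.

    Both the database and the cache are plain dicts here; in a real
    deployment they would be a SQL database and memcache.
    """

    def __init__(self, database):
        self.database = database  # stand-in for the SQL database
        self.cache = {}           # stand-in for memcache
        self.db_reads = 0         # counts how often we fall through to the database

    def get(self, key):
        # Serve from the cache when possible.
        if key in self.cache:
            return self.cache[key]
        # Cache miss: read from the database and remember the result.
        self.db_reads += 1
        value = self.database.get(key)
        if value is not None:
            self.cache[key] = value
        return value

    def put(self, key, value):
        # Write to the database, then invalidate the cached copy.
        # Forgetting this step on any write path leaves stale data behind.
        self.database[key] = value
        self.cache.pop(key, None)
```

Every write path in the application has to remember the invalidation step in `put`, which is a large part of what makes the hand-rolled version hard to maintain.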

While there are other reasons to use Cassandra, such as the less-rigid schema or the best-in-class support for replication across multiple datacenters, most people are using it because a single machine can’t handle their query volume. This is the case for other distributed NoSQL databases as well, although there are other kinds of NoSQL database that are also useful for some applications.

The price of scaling is that Cassandra provides poor support for ad-hoc queries, emphasizing denormalization instead. For analytics, the upcoming 0.6 release (in beta now) offers Hadoop map/reduce integration, but for high-volume, low-latency queries you will still need to design your app around denormalization.
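As a rough illustration of what designing around denormalization means, the sketch below stores data in the shape of the query that will read it. It is not Cassandra's API and not a description of how any particular site works; the `followers` and `timelines` structures and the `follow`, `post`, and `read_timeline` functions are hypothetical, with in-memory dicts standing in for the datastore. Each post is copied into every follower's timeline at write time, so a read is a single lookup by key rather than a join.

```python
from collections import defaultdict

# Hypothetical in-memory stand-ins for two denormalized "tables".
followers = defaultdict(set)    # author -> set of readers following that author
timelines = defaultdict(list)   # reader -> list of (author, text), oldest first


def follow(reader, author):
    """Record that `reader` follows `author`."""
    followers[author].add(reader)


def post(author, text):
    """Copy the post into every follower's timeline at write time."""
    for reader in followers[author]:
        timelines[reader].append((author, text))


def read_timeline(reader, limit=20):
    """Return the most recent `limit` entries with one lookup and no join."""
    return timelines[reader][-limit:]
```

The trade-off is that writes do more work and the same data is stored many times, and a query the layout was not designed for (say, "all posts containing a given word") has no efficient path.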

So, NoSQL systems are not drop-in replacements for a relational database. It looks like Twitter has been working on moving to Cassandra at least since Evan’s post in July last year; your application may be easier or harder to port, but it’s a useful data point to keep in mind.

For more on why the difficulty of scaling relational databases is driving developers to Cassandra, see the video or slides from my talk last week at PyCon, and for an introduction to Cassandra itself, see Eric Evans’ talk at FOSDEM, also available with video or slides.

Related Posts: The Cassandra Project, How Do You Put Database In The Cloud?, NoSQL Ecosystem

  • I agree with Ian that people mis-characterize their existing MySQL+Memcache architectures, but calling them “Eventually consistent” is wrong. They have no guarantees of converging on consistency.

    At Twitter we call such a system “Potentially Consistent”. 🙂

  • Using memcached to cache the results of SQL queries is a largely solved problem. Throwing out your SQL database based on the “well we are using memcached” “rule of thumb” means you lose the ability to populate your cache with SQL-based results, and also means your entire data model has to throw out ACID. You might need ACID, and you might need caching of query results. Throwing memcached on top of that in no way means you’re reinventing Cassandra.

    • Jan Lehnardt

      NoSQL does not mean that you automatically lose ACID.

  • Hi Jonathan!

    As you probably know ( 😉 ), the Drizzle folks actually see NoSQL solutions as a great partner technology. SQL and NoSQL needn’t compete. They solve different problems, and the truly innovative projects will recognize this fact and work closely together so as to integrate as seamlessly as possible and get the best of both worlds.

    Here’s to working together to solve the world’s problems! 🙂


  • @Jay, yes, we’ll have to come up with a new category name. And NoSQL was so catchy, too. 🙂

  • Jeo

    What keeps me from using Cassandra right now is the lack of atomic updates. There’s no way to do a reliable counter, unless you just save all your increments as separate items and sum them up when you read. The guys at Digg are helping to work toward fixing that.

  • Randy

    Another good NoSQL approach is

  • Steve

    “…Cassandra provides poor support for ad-hoc queries, emphasizing denormalization instead.”

    This suggests denormalized data is the cause of poor support for ad-hoc queries. Denormalization is what is done in a typical data warehouse implementation to facilitate easier ad-hoc queries.

    By removing the need to join tons of tables, a subject area can be represented in a few dimensions and one or more fact tables. This makes it easier to write queries, since you don’t need to understand the normalized structure to get what you want.

    Maybe there is a better term to use?

  • See this, esp. the video, to judge the advantages of NoSQL for yourself… and to get a good laugh 🙂
