Ian Eure from Digg (also switching to Cassandra) gave a great rule of thumb last week at PyCon: “if you’re deploying memcache on top of your database, you’re inventing your own ad-hoc, difficult to maintain NoSQL database,” and you should seriously consider using something explicitly designed for that instead.
While there are other reasons to use Cassandra, such as the less-rigid schema or the best-in-class support for replication across multiple datacenters, most people are using it because a single machine can’t handle their query volume. This is the case for other distributed NoSQL databases as well, although there are other kinds of NoSQL databases that are also useful for some applications.
The price of scaling is that Cassandra provides poor support for ad-hoc queries, emphasizing denormalization instead. For analytics, the release now in beta offers Hadoop map/reduce integration, but for high-volume, low-latency queries you will still need to design your app around denormalization.
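To make "designing around denormalization" concrete, here is a minimal sketch in plain Python. It does not use Cassandra's API: the dictionaries `posts_by_author` and `timeline_by_reader`, the `followers` map, and the `publish` and `timeline` functions are all hypothetical stand-ins invented for illustration. The idea it shows is that each write is copied into every view a later query will need, so each read becomes a single key lookup instead of a join.

```python
from collections import defaultdict

# Hypothetical in-memory stand-ins for two denormalized views.
# These are plain dicts for illustration, not Cassandra's API.
posts_by_author = defaultdict(list)     # author -> [(ts, text)]
timeline_by_reader = defaultdict(list)  # reader -> [(ts, author, text)]

# Hypothetical follower graph: author -> readers who follow that author.
followers = {"alice": ["bob", "carol"], "bob": ["carol"]}


def publish(author, ts, text):
    """Write one post into every view that a later read will need."""
    posts_by_author[author].append((ts, text))
    # Fan out at write time: one extra copy per follower.
    for reader in followers.get(author, []):
        timeline_by_reader[reader].append((ts, author, text))


def timeline(reader, limit=10):
    """Read a timeline with a single key lookup, newest first, no join."""
    return sorted(timeline_by_reader[reader], reverse=True)[:limit]


publish("alice", 1, "hello")
publish("bob", 2, "hi alice")
publish("alice", 3, "trying Cassandra")

for ts, author, text in timeline("carol"):
    print(ts, author, text)
```

The trade-off is visible in `publish`: writes cost more and the same data is stored several times, in exchange for reads that touch only one key. A relational design would store each post once and join against the follower table at read time, which is exactly the kind of ad-hoc query that is hard to scale across many machines.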
So, NoSQL systems are not drop-in replacements for a relational database. It looks like Twitter has been working on moving to Cassandra at least since Evan’s post in July last year; your application may be easier or harder to port, but it’s a useful data point to keep in mind.
For more on why the difficulty of scaling relational databases is driving developers to Cassandra, see the video or slides from my talk last week at PyCon, and for an introduction to Cassandra itself, see Eric Evans’ talk at FOSDEM, also available with video or slides.