Hadoop is the leading open-source project for MapReduce computation and its supporting infrastructure (such as HDFS, the Hadoop Distributed FileSystem, modeled on the Google File System). The 2008 Hadoop Summit drew about 150 attendees; 2009 had five times that number. I am not a Hadoop expert, but as a Cassandra developer I'm interested in meeting people who work with large datasets, and there was no better place for that than the Hadoop Summit.
Videos from the Hadoop Summit are not out yet but should be soon. My favorite talks were the ones on Amazon Elastic MapReduce, Pig, and Hive. (At The Rackspace Cloud we compete with Amazon, but I have to give them credit for their talk!) Pig and Hive both offer a higher-level language for writing MapReduce jobs, but they take somewhat different approaches: Pig uses a procedural dataflow language, Pig Latin, while Hive provides a SQL-like query language. We use Pig internally.
I should also mention that the first 500 people to register for the Hadoop Summit received a free copy of Hadoop: The Definitive Guide, which I recommend to anyone looking for an introduction to both using and administering Hadoop.
The NoSQL conference the next day featured overviews of half a dozen of the most interesting open-source distributed databases, plus CouchDB, which targets scaling down to mobile devices rather than out to hundreds of servers in your datacenter. The NoSQL videos are up, and of course I have to point out the comment calling the Cassandra presentation (by Avinash Lakshman of Facebook) “hands-down the most interesting.” Besides ours, I recommend Todd’s overview as well as the Voldemort and HBase talks. Yes, there are cases where I would use one of those instead of Cassandra, but that’s a subject for another post! (In the meantime, Toby Negrin from Yahoo posted some notes on each.)
Want to know more about Hadoop at Rackspace? Be sure to check out this video interview from building43.com.
If you want to try out Hadoop or a distributed database but don’t have a cluster of your own, visit our Cloud Servers page for more information.