Connecting Your Data: Hadoop Summit 2014

Hadoop Summit kicks off today. I could start this post with a long diatribe about the momentum and disruption in the big data space, but I'll spare you. That momentum is being felt across every sector and is reaching almost every new data workload as it comes into existence. The argument no longer needs to be made.

Big Data is here. It’s real, and it’s running at internet scale. At the heart of the Big Data ecosystem is the Apache Hadoop community. This community is building a model for taking a project from incubation to real-world relevance in record time. That model is becoming the ethos of a new way to innovate in technology, one that draws on a broad ecosystem of partners and developers to address both the common problems users face and the unique problems the community raises. Under this model, understanding each participant's roadmap and vision is essential to understanding how their viewpoints differ.

At Rackspace, we have doubled down on our commitment to open-source Hadoop. We deliver Hadoop at scale via our OpenStack-based cloud and let users access fully functional environments in a matter of minutes. This is due in large part to our close partnership with Hortonworks and the Hadoop expertise it brings to the table. We provide the Hortonworks Data Platform through two managed services: Cloud Big Data Platform and Managed Big Data Platform. These offerings represent the two main deployment models we see in the Hadoop community: bare-metal commodity clusters and on-demand virtual clusters in the cloud.

With both platforms entering general availability, it’s an exciting time to revisit the work we have done on them and our overall contribution to understanding Hadoop design on OpenStack.

Here are three things we are really excited to talk to the community about at Hadoop Summit this week in San Jose, Calif.

Opening the Floodgates to the Demands of Big Data – You may have noticed that our platforms have been in limited availability. Big Data brings a new set of demands from our users that differ from those of the average cloud workload. Very early on, we heard demand for an offering that could scale to petabytes in the cloud. We had to balance this with our belief that Hadoop needs a specialized architecture to run at its best. We spent the limited availability period building out the capacity to meet these demands while staying true to a Hadoop-optimized platform. Cloud Big Data Platform uses local drives, 10 Gb networking and various other architectural enhancements that make it an ideal engine for Hadoop in the cloud. Cloud Big Data Platform and Managed Big Data Platform are now generally available. We have been listening, and we are ready to meet the sometimes monolithic demands of our data-centric users.

Expanding the Scope of Hadoop with HDP 2.1 – We are one of a handful of Hortonworks partners in production with the latest iteration of the Hortonworks Data Platform (version 2.1). Available today on both platforms, HDP 2.1 represents a shift in how people use Hadoop. Our users want to address not only batch queries but also queries over streaming and interactive data sets. As the scope of that work grows, the enhancements in the latest versions of HDP become increasingly important. For a deeper dive into the enhancements in HDP 2.1, see our earlier blog post on the release.

Connecting Your Data – One thing we have learned from the community is that data platforms rarely exist in silos. Data needs to move between platforms to get the desired end result, whether that is transactional scale, high performance or analytics. Moving data, especially unstructured data, between platforms can be difficult. You may need to rethink a schema or write a lot of new code to extract, transform and load (ETL) and manage the data before the new system can ingest it. This is taxing, it is expensive if you buy a third-party solution, and it can lead to over-architecting and to multiple replicated copies that all have to be managed. To ease transfers between some of the most commonly paired workloads, we have developed two connectors that improve interoperability between Cloud Big Data Platform and our other offerings. The first lets Cloud Big Data Platform users process data that lives on Cloud Files, our OpenStack Swift-based file storage offering; a video walks through how to use the Swift connector with Cloud Big Data Platform. The second, released today, lets users house their data in MongoDB (ObjectRocket) and run analytical queries against that data store via a Post Init Script.

We are excited to showcase these improvements and to hear from the Hadoop community at this year’s Hadoop Summit, presented by our partners Hortonworks and Yahoo (June 3 through June 5). We will be at booth G1, so please stop by and chat with us.
