Big Data On Bare Metal: Don’t Sacrifice Performance For Flexibility

In the world of Big Data, bare metal is king. Many companies are looking for an architecture that makes full use of resources like I/O and throughput. But we often hear from you that, when it comes to Big Data, you are forced to trade the advantages of the cloud (elasticity, on-demand provisioning, flexibility) for the consistency and predictability of bare metal. We don’t think you should have to sacrifice one for the other.

When we unveiled OnMetal Cloud Servers earlier this year, we gave you the ability to spin up single-tenant bare metal servers in the Rackspace Managed Cloud. OnMetal servers are highly optimized, highly scalable and available in minutes via API or Control Panel. We launched several flavors of OnMetal servers to meet a variety of workload demands, and one opportunity we identified was the benefit a bare metal offering like OnMetal could bring to Big Data workloads.

Today, we’re combining the best of both worlds and giving you the opportunity to run Hadoop on OnMetal servers. Now, you have a choice of which platform makes the most sense for your Hadoop environment: cloud, dedicated or bare metal. Additionally, we’re adding support for the Apache Spark in-memory computing project. Rackspace is the first managed hosting provider to fully support the Apache Spark project – we have a team of Spark rock stars ready to help you harness this powerful new tool.

With that in mind, here are three reasons Rackspace Cloud Big Data OnMetal is a game changer:

1. Hadoop Is Hard

Open source technologies bring with them the promise of accelerated development and collaborative innovation, but institutional knowledge isn’t always readily available. One industry analyst said Hadoop is “free like a puppy”: the software and tools are free and readily available, but industry expertise and tribal knowledge are scarce. Even the smartest users can struggle to stand up Hadoop, and fewer still succeed in optimizing it for their workloads. The Rackspace Cloud Big Data Platform addresses this by provisioning a fully optimized and supported Apache Hadoop environment in minutes, with experts on hand to help you design and grow your clusters to meet the exact demands of your data.

2. No More Tradeoffs

Virtualized Hadoop in any form comes with a tax on resources, most notably I/O and throughput. Hadoop helps offset this by spreading resources across multiple instances, allowing for horizontal scaling. Even so, this sometimes results in overprovisioning to meet the exact demands of the operation, along with an inherent performance penalty. Bare metal is ideal for Hadoop, but deploying it often involves unknowns. Users don’t always know the demands of their workloads, or they constantly change their designs to optimize further. This can result in very costly mistakes or prolonged proof-of-concept (POC) and testing phases. You shouldn’t have to make these tradeoffs. That is where Cloud Big Data OnMetal emerges as the killer solution for high-demand scenarios where you need both flexibility and consistent performance. In preliminary testing, our Cloud Big Data OnMetal platform completed the standard TeraSort and TestDFSIO benchmarks in roughly half the time of traditional virtualized Hadoop.

That means your queries could potentially run in half the time on our OnMetal offering. Seconds matter in demanding scenarios like recommendation engines, machine learning and fraud detection, so make sure you choose a solution that delivers the response time your analysis demands.

3. Hadoop Is Changing (Apache Spark)

Apache Hadoop began as a batch-processing system, born out of work at Internet giants like Google and Yahoo to analyze extremely large data sets. Traditional SQL and relational databases were good at quickly analyzing small subsets of data, but Hadoop was better at analyzing the large data sets that were problematic for legacy relational platforms. In reality, few organizations have as much data as Google or Yahoo, but many want to learn more about their customers and optimize their business. Apache Hadoop is evolving to meet these needs with new tools like Apache Spark.

Spark processes data in memory, so many jobs finish in a fraction of the time they take on traditional Hadoop. Apache Spark also provides additional tools and supported languages. For instance, developers can interact with data in real time through the Scala or Python shells. And with high-level tools like Spark SQL, Spark Streaming, MLlib and GraphX, developers and data scientists can take advantage of Spark for streaming, machine learning and graph processing applications.

The main constraint is that Apache Spark performs best when the data it works on fits into RAM, which means that running Spark well takes a lot of RAM and CPU. This is where our OnMetal offering really shines, with 128GB of RAM and two 8-core processors. In addition, our OnMetal High I/O server offers increased I/O and throughput plus a bonded 10 Gb/s network. Cloud Big Data OnMetal is the ideal architecture for running Apache Spark jobs. It is also an extremely performant platform for Hadoop operations focused on streaming, interactive and real-time querying. But don’t take our word for it: open up the sunroof and feel the wind in your hair by taking Cloud Big Data OnMetal for a two-month free test drive. For more information visit us here.
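To make the RAM constraint concrete, here is a rough capacity check in Python. It is a back-of-the-envelope sketch, not a Rackspace sizing tool: only the 128GB-per-node figure comes from this post. The `usable_fraction` (the share of each node’s RAM we assume Spark can use for cached data) and `expansion` (how much larger we assume the in-memory form of the data is than its on-disk size) parameters are illustrative assumptions, and the function names are hypothetical.

```python
import math

# Hypothetical sizing sketch. Only the 128 GB-per-node figure comes from the post;
# usable_fraction and expansion are illustrative assumptions, not Spark settings.
RAM_PER_NODE_GB = 128


def cache_capacity_gb(nodes, ram_per_node_gb=RAM_PER_NODE_GB, usable_fraction=0.6):
    """Approximate cluster-wide RAM available for cached Spark data."""
    return nodes * ram_per_node_gb * usable_fraction


def fits_in_memory(dataset_gb, nodes, expansion=2.0,
                   ram_per_node_gb=RAM_PER_NODE_GB, usable_fraction=0.6):
    """True if the assumed in-memory footprint fits in the cluster's cache capacity.

    False does not mean the job cannot run: partitions that do not fit are
    recomputed or spilled to disk, which gives up much of the in-memory speedup.
    """
    footprint = dataset_gb * expansion
    return footprint <= cache_capacity_gb(nodes, ram_per_node_gb, usable_fraction)


def nodes_needed(dataset_gb, expansion=2.0,
                 ram_per_node_gb=RAM_PER_NODE_GB, usable_fraction=0.6):
    """Smallest node count whose cache capacity covers the in-memory footprint."""
    per_node = ram_per_node_gb * usable_fraction
    return math.ceil(dataset_gb * expansion / per_node)


if __name__ == "__main__":
    for size_gb in (200, 500, 1000):
        print(f"{size_gb} GB on disk -> about {nodes_needed(size_gb)} nodes of 128 GB")
```

Under these assumptions, a 500GB data set needs about 14 nodes to stay fully in memory. The sketch treats memory as the binding constraint, which matches the point above, while acknowledging that Spark degrades to recomputation or disk spill rather than failing outright when data does not fit.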

Rackspace is rising to the challenge of giving open source developers access to the tools, platforms and expertise they need to succeed in the quickly evolving landscape of data technologies, with our portfolio of open source database products including MongoDB, Apache Hadoop, Apache Spark, Redis, MySQL and Elasticsearch (coming soon). These platforms are all backed by Fanatical Support and by experts who are committed to demystifying the fast-paced world of Big Data.

Come see us this week at the O’Reilly Strata Hadoop World in New York (October 15 to October 17). Pop by booth No. 221 to talk to our data experts, and while you’re there sign up for a chance to win an Xbox and Guitar Hero game.

And please join me at 1 p.m. CDT on Wednesday, October 29, for a live webinar in which I’ll discuss how Rackspace and OnMetal can accelerate your Spark workloads. Register here.

John Engates joined Rackspace in August 2000, just a year after the company was founded, as Vice President of Operations, managing the datacenter operations and customer-service teams. Two years later, when Rackspace decided to add new services for larger enterprise customers, John created and helped develop the Intensive Hosting business unit. Most recently, John has played an active role in the evolution and evangelism of Rackspace's cloud-computing strategy and cloud products. John meets frequently with customers to hear about their needs and concerns, and to discuss Rackspace's vision for the future of cloud computing. John currently serves as Chief Technology Officer. John is also an internationally recognized cloud computing expert and a sought-after speaker at technology conferences, including CA World, the Goldman Sachs Techtonics Conference and Cloud Expo. He speaks on the future of cloud computing, enterprise cloud adoption, data center efficiency, green data center best practices, and more. Prior to joining Rackspace, John was a founder and General Manager at Internet Direct, one of the original Internet service providers in Texas. John is a graduate of the University of Texas at San Antonio and holds a B.B.A. in Accounting.

2 COMMENTS

  1. This makes a lot of sense and the analogy I see is when developing code in Java or in a language like C or C++ which runs on bare metal. If you are looking for fast development and are not that worried about performance, Java will generally be used, however, if you need OnLine Transaction Processing (Old term, but still a goodie IMHO) levels of service, you’ll code in C or C++ (or even some of the other 3gl languages like COBOL) because it is compiled to machine code and run on the bare metal of the machine where it runs.
