The world of data platforms is moving forward with increasing velocity. To stay relevant in today’s Big Data conversation, technologies must ship features and enhancements at a swifter cadence than legacy technology. The only way this is possible is through an open ecosystem of participants working together around the world. Consider Apache Hadoop: this level of advancement would not be possible without a broad network of developers and engineers collaborating to innovate rapidly and solve new problems. Beyond fixing the issues users have with Hadoop, the community is changing the perception of how users can leverage it. Once a go-to tool for large batch processing jobs, Hadoop is evolving to serve multiple workloads simultaneously, such as streaming and interactive workloads, all at the same scale as the original batch jobs.
One of the more notable updates to this ecosystem was last month's release of the Hortonworks Data Platform (HDP) version 2.1. This version of HDP represents advancements that span every aspect of enterprise Hadoop, from data management, data access, integration and governance to security and operations, all while staying true to the core Apache Hadoop distribution. Some enhancements are transparent, like the improvements to Apache Hive, Apache Tez and YARN delivered through the now-complete Stinger initiative, while taking advantage of others, like Knox and Falcon, requires additional thought and architecture. Together, these provide a healthy enterprise update to the Modern Data Architecture of Hadoop.
Today we’ve made two key updates to our Cloud Big Data Platform powered by Apache Hadoop™. Still in limited availability, Cloud Big Data Platform is a Hadoop-based service aimed at providing all of the power of Apache Hadoop without the need to own, operate, troubleshoot and administer clusters. Cloud Big Data Platform allows you to run Pig, Hive, MapReduce or YARN scripts directly on data nodes provisioned in the Rackspace Public Cloud, all while working with a 100 percent open Hadoop distribution and a sea of ecosystem partners already validated with Hortonworks.
Two notable changes have been rolled out today:
Self-Provisioning of up to three Data Nodes
Rackspace Public Cloud customers can now deploy up to three data nodes on their own. Even though we are still in limited availability, all customers can spin up a small Hadoop cluster for testing, proofs of concept (POC) or smaller workloads. You can do this by visiting your Rackspace Cloud Control Panel and selecting the “Big Data” tab at the top of the navigation menu.
Once you are in this tab, simply hit the “Create Cluster” button and you will be given your deployment options. You can select the version of HDP you wish to deploy (1.3 or 2.1), the data node instance size (1.25TB, or 10TB with approval), the geographical datacenter location and whether you wish to connect to a Cloud Files container.
Once this is done, you can begin to interact with your Hadoop cluster in a matter of minutes. All of the components of the distribution are installed and ready to use, and Hadoop experts are standing by to help. For guidance on pricing and best practices for deploying Rackspace Cloud Big Data Platform, you can visit our knowledge center article. Need more than three data nodes? No problem: simply send an email to our Hadoop specialists.
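As a rough illustration, the deployment options described above can be captured in a short Python sketch that checks a request against the self-service limits before you open the Control Panel. Everything here is hypothetical: the `ClusterRequest` class, the `validate` function, the field names and the `example-region` placeholder are invented for this example and are not part of any Rackspace API. The sketch also assumes a cluster needs at least one data node, which this post does not state.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical constants mirroring the options listed in this post.
SELF_SERVICE_MAX_DATA_NODES = 3      # more than three requires contacting a specialist
HDP_VERSIONS = ("1.3", "2.1")        # distributions offered at cluster creation
NODE_SIZES_TB = (1.25, 10.0)         # the 10 TB size requires approval


@dataclass
class ClusterRequest:
    """Hypothetical description of the choices made on the Create Cluster form."""
    hdp_version: str
    data_nodes: int
    node_size_tb: float
    region: str                                   # datacenter location (placeholder value)
    cloud_files_container: Optional[str] = None   # optional Cloud Files connection
    large_node_approved: bool = False             # True once the 10 TB size is approved


def validate(request: ClusterRequest) -> List[str]:
    """Return a list of problems; an empty list means the request fits the limits."""
    problems = []
    if request.hdp_version not in HDP_VERSIONS:
        problems.append("HDP version must be 1.3 or 2.1")
    # Assumption: at least one data node is required.
    if not 1 <= request.data_nodes <= SELF_SERVICE_MAX_DATA_NODES:
        problems.append("self-service clusters allow one to three data nodes")
    if request.node_size_tb not in NODE_SIZES_TB:
        problems.append("data node size must be 1.25 TB or 10 TB")
    elif request.node_size_tb == 10.0 and not request.large_node_approved:
        problems.append("the 10 TB data node size requires approval")
    if not request.region:
        problems.append("a datacenter location is required")
    return problems


if __name__ == "__main__":
    request = ClusterRequest(
        hdp_version="2.1",
        data_nodes=3,
        node_size_tb=1.25,
        region="example-region",
    )
    # Print any problems found; prints nothing extra when the request is valid.
    for problem in validate(request):
        print(problem)
    print("valid" if not validate(request) else "invalid")
```

A check like this is only a planning aid; the Control Panel remains the source of truth for which options are available to your account.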
Hortonworks Data Platform Version 2.1
We now offer the latest Apache Hadoop distribution from Hortonworks (version 2.1) for deployment on Cloud Big Data Platform. We outlined some of the enhancements in 2.1 above; on Cloud Big Data Platform specifically, you can now access features like:
- Faster Query Times via the Stinger initiative. Stinger leverages the YARN architecture to process multiple jobs in parallel to improve the query experience in Hadoop.
- Interactive workloads via Apache Tez. Tez enables Hadoop users to run interactive workloads and queries, and Tez and Hive work together to deliver enhanced query performance across the cluster.
- Secondary Name Node: Since the release of HDP version 2.0, you have been able to provision a secondary name node for failover and redundancy. We have now added a secondary name node to the deployment of Cloud Big Data Platform for all cluster profiles. You do not need to change anything about how you deploy the solution, but you will notice a public IP and server instance for the secondary name node during cluster creation. This functionality keeps the name node from being a single point of failure and allows more production and live workloads to run in Cloud Big Data Platform.
These updates and additions to our Hadoop-powered Cloud Big Data Platform will make it easier for you and your business to leverage Big Data in meaningful ways. For more information or to join the limited availability program for Cloud Big Data Platform, click here or call your Rackspace support specialist.