Introducing Big Data Analytics and Archiving for Rackspace Cloud Databases

Today, Rackspace announced support for new Hadoop ecosystem tools that connect traditional relational databases with big data services such as Rackspace Cloud Big Data. By building integration between databases and data processing platforms, users can move data in and out of systems to create technology stacks with multi-tiered functionality.

MySQL is an industry-standard open-source database used as the backend of countless applications. It has risen to mainstay status in the common technology stack, even representing the M in the popular LAMP stack. There is no ignoring the widespread reach and prevalence of MySQL, and while MySQL is a reliable, consistent and predictable database backend, it lacks some of the flexibility, analytical functionality and horizontal scaling benefits of newer data platforms and NoSQL technologies. Previously, in order to gain the benefits of both types of platforms, companies had to set up expensive and redundant database environments and even hire teams to manage the flow of data between these systems to ensure that data wasn’t lost or corrupted. They also had to run sometimes-complicated ETL (extract, transform, load) activities to ensure data format compatibility.

Apache Sqoop is a Hadoop ecosystem tool that comes bundled in all the popular Hadoop distributions. Inside the Hortonworks Data Platform, Sqoop is a preferred method of transferring data and results between your Hadoop clusters and a relational database or a service like Rackspace Cloud Databases. Apache Oozie is a workflow scheduler for Hadoop that helps handle ingestion and export on a desired cadence. Together, these two tools improve interoperability between relational database platforms and Hadoop clusters.

At Rackspace, cloud users have access to these data products inside their cloud control panel and can now easily interface with multiple data platforms inside the same data center and region, which alleviates concerns about latency and data loss. We have previously built connections between Cloud Big Data and Cloud Files and also between Cloud Big Data and our ObjectRocket MongoDB platform. These integrations help users bridge the gap between data environments and allow them to build modern data architectures that exist solely in the cloud. This has opened up new use cases that were previously impractical because of the complexity of moving data.

Today, we want to talk about two exciting new use cases our users can build with simple clicks inside our user interface.

Analytics

While MySQL provides some analytical capability, it is often not the best choice for running analytic queries, either because of the time it takes to return a query or because a single isolated database is hard to scale. Hadoop has a stellar reputation as an analytical engine and, through the use of MapReduce, it can compute analytical results over huge sets of data held in clusters of servers. That said, you won’t find many production application engineers using anything other than a relational database for their backend database. So how do we leverage the strengths of both platforms to have a high-performing relational database and also run periodic analytics? With this new functionality we can do exactly that. Sqoop will extract your relational data into HDFS (the Hadoop Distributed File System) so that you can run analysis on top of the same data. The beauty of doing this in the cloud is that you can have a Cloud Database that is active all the time and only deploy Cloud Big Data when you want to run the analysis. You can also use Sqoop to put the results of your analysis back into the relational database. This saves users time and money, because they keep a Hadoop cluster running only for as long as they need it to perform the analysis.
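To make the round trip concrete, here is a minimal Python sketch that assembles the Sqoop command lines for pulling a table into HDFS and pushing results back. It only builds and prints the commands; it does not connect to anything. The host name, database, user, table names and HDFS paths are all hypothetical placeholders, and the flags shown are standard Sqoop 1 options rather than anything specific to the Rackspace control panel.

```python
import shlex


def sqoop_import_cmd(jdbc_url, username, password_file, table, target_dir, mappers=4):
    """Build a `sqoop import` command that copies one table into HDFS."""
    return [
        "sqoop", "import",
        "--connect", jdbc_url,             # JDBC URL of the relational database
        "--username", username,
        "--password-file", password_file,  # file holding the password
        "--table", table,                  # source table to read
        "--target-dir", target_dir,        # HDFS directory to write into
        "--num-mappers", str(mappers),     # number of parallel map tasks
    ]


def sqoop_export_cmd(jdbc_url, username, password_file, table, export_dir):
    """Build a `sqoop export` command that loads HDFS files into a table."""
    return [
        "sqoop", "export",
        "--connect", jdbc_url,
        "--username", username,
        "--password-file", password_file,
        "--table", table,                  # destination table (must already exist)
        "--export-dir", export_dir,        # HDFS directory holding the results
    ]


if __name__ == "__main__":
    # Hypothetical connection details, for illustration only.
    url = "jdbc:mysql://db.example.com/shop"
    print(shlex.join(sqoop_import_cmd(url, "app", "/user/app/.dbpass", "orders", "/data/orders")))
    print(shlex.join(sqoop_export_cmd(url, "app", "/user/app/.dbpass", "order_stats", "/results/order_stats")))
```

In practice the import would run when the Hadoop cluster is brought up, the analysis would write its output under the export directory, and the export would run before the cluster is torn down.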

Archival

As stated above, relational databases have some constraints in scaling, mostly due to a guiding principle called ACID compliance. While this principle helps ensure that no data is lost, it presents challenges when users try to scale beyond a single-server environment. Hadoop does not have this constraint and can scale over multiple machines, yielding petabytes of storage. As a result, Hadoop has become popular for archiving information from databases with storage constraints. One popular use case is using Hadoop to archive material no longer needed by a relational database. This is where Apache Oozie comes into play. Oozie allows users to schedule these archiving jobs at whatever cadence they want.

Combined, these two tools allow users to set up self-service archiving jobs, which means they don’t have to oversee or manage the operation and can feel confident their relational data is archived and safe before deleting it on a local machine.
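As a rough illustration of what one scheduled archiving run might do, the Python sketch below works out a cutoff date and builds a Sqoop import that copies only the rows older than that cutoff into a dated HDFS directory. The table, date column, retention period and paths are hypothetical, and the scheduling itself is left out: this only shows the kind of command a scheduler such as Oozie could be configured to run on each cycle.

```python
from datetime import date, timedelta


def archive_cutoff(today, keep_days):
    """Return the date before which rows are considered ready to archive."""
    return today - timedelta(days=keep_days)


def sqoop_archive_cmd(jdbc_url, username, password_file, table,
                      date_column, cutoff, archive_root):
    """Build a `sqoop import` that copies only rows older than `cutoff`."""
    stamp = cutoff.isoformat()
    return [
        "sqoop", "import",
        "--connect", jdbc_url,
        "--username", username,
        "--password-file", password_file,
        "--table", table,
        # Restrict the import to rows older than the cutoff date.
        "--where", f"{date_column} < '{stamp}'",
        # One HDFS directory per run, so archives never overwrite each other.
        "--target-dir", f"{archive_root}/{table}/before_{stamp}",
    ]


if __name__ == "__main__":
    # Hypothetical values: keep 90 days of orders in MySQL, archive the rest.
    cutoff = archive_cutoff(date.today(), keep_days=90)
    cmd = sqoop_archive_cmd("jdbc:mysql://db.example.com/shop", "app",
                            "/user/app/.dbpass", "orders", "created_at",
                            cutoff, "/archive")
    print(" ".join(cmd))
```

Rows should only be deleted from the relational database after the archived copy has been verified, for example by comparing row counts on both sides.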

These are just two practical examples of how you can use the new functionality of our Cloud Big Data Platform and Cloud Databases service to get the most out of your data. If you have any questions, please reach out to our dedicated data experts.

Rack Blogger is our catchall blog byline, subbed in when a Racker author moves on, or used when we publish a guest post. You can email Rack Blogger at blog@rackspace.com.
