“Big Data” is one of the biggest buzzwords in the technology industry today – you can’t read the tech press or look at social media without seeing it. But what do we mean by Big Data, and what’s wrong with our old friend “little data”, which has served us well for 30-odd years?
We are constantly bombarded with information, and making sense of it is becoming a challenge on both an individual and an enterprise scale. As individuals we now have access to a multitude of information sources, communication methods and purchasing options. Choosing between them and making sense of it all can be a headache. This has been building for the last decade or so and has now reached fever pitch. So much data, so little time!
Of course, the corollary of this is that enterprises now have a great opportunity in front of them. How do they harness the data at their disposal to identify the right information source and the best communication method to entice the individual to buy from them and not their arch rival? These new data sources do not easily fit into a structured, relational database format and often require an entirely different approach to how data is stored, manipulated and understood. Twitter feeds and Facebook pages are good examples: the data is textual and can be mined for sentiment, connections or purchase intention, but it does not map neatly onto rows and columns.
Welcome to the world of Big Data!
But how did we get here? For three decades now most companies have held all of their essential information in some form of structured relational database, queried with SQL: stock control, customer details, employee records, customer payments and everything in between. SQL is well suited to this and a whole industry has grown up around it. Numerous software and hardware vendors have created a massive ecosystem designed to help companies keep all this data in check, so they can keep the lights on and make their shareholders happy. This technology is now essential and integral to any organization.
The fundamental structure behind these databases hasn’t changed much over the decades, and enterprises have invested heavily in the technology and the staff to operate them. The typical enterprise today has many database administrators and data scientists who are fluent in SQL, a tool they use every day to retain control over their data and gain a deep understanding of it.
However, if we were to load the explosion of social media and other types of data into these traditional SQL databases, the systems would quickly go into meltdown: they were not designed for data of that volume or that lack of structure.
On the one hand, for the core enterprise systems we have a mature industry built around SQL databases, with an ecosystem of suppliers and many skilled administrators and technicians. On the other hand, for new unstructured data sources we have an emerging technology built around the likes of the Hadoop Distributed File System (HDFS), with a relative scarcity of people able to run a MapReduce job to help make sense of all this new data. And where do you start if you want to integrate the two?
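For readers who have not met MapReduce, the idea can be sketched in a few lines of plain Python. This is only an illustration of the programming model, not of Hadoop itself: the sample posts and the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are made up for the example, and a real job would spread the same steps across many machines.

```python
from collections import defaultdict

# Hypothetical sample posts standing in for a social media feed.
POSTS = [
    "love the new phone",
    "the new phone battery is poor",
    "love the camera",
]


def map_phase(post):
    """Emit a (word, 1) pair for every word in one post."""
    return [(word, 1) for word in post.lower().split()]


def shuffle(pairs):
    """Group emitted values by key, as a framework would between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Collapse all values for one key into a single count."""
    return key, sum(values)


def word_count(posts):
    """Run map, shuffle and reduce in sequence on one machine."""
    pairs = [pair for post in posts for pair in map_phase(post)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


counts = word_count(POSTS)
```

Even in this toy form the contrast with SQL is clear: a word count that would be a one-line `GROUP BY` has to be expressed as separate map and reduce functions, which is part of why the skills are scarce.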
So what’s to be done? Do we abandon SQL, put all data, old and new, into a new Big Data format so it’s all in one place, and retrain all those skilled administrators to run MapReduce jobs? We are all far too heavily invested in the SQL ecosystem for this to be a viable option.
For Big Data sources to gain traction in the enterprise, and add maximum value, they must be as easy to access and analyze as all those old SQL databases – not forgetting that for maximum business value the two must be cross-referenced against each other. Tools that allow Big Data sources to be analyzed as if they were SQL databases (where this is possible) would provide a great bridge between the old and the new. There is already a drive towards letting DBAs and data scientists consume and interact with big datasets in a SQL-like way, to make them more comfortable with the inevitable change.
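To make the bridging idea concrete, here is a minimal sketch using Python’s built-in `sqlite3` and `json` modules. It is not a Big Data tool, only a small-scale illustration of the pattern: raw, loosely structured posts are exposed as a table and then cross-referenced against an ordinary customer table with plain SQL. The feed contents, table names, customer records and the keyword-based notion of “purchase intent” are all assumptions invented for the example.

```python
import json
import sqlite3

# Hypothetical raw feed: one JSON document per line.
RAW_FEED = """
{"user": "alice", "text": "I want to buy the new phone"}
{"user": "bob", "text": "Terrible service today"}
{"user": "alice", "text": "Thinking about an upgrade"}
""".strip().splitlines()


def build_database():
    """Load structured customers and unstructured posts into one SQL database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (handle TEXT PRIMARY KEY, name TEXT)")
    conn.execute("CREATE TABLE posts (handle TEXT, body TEXT)")
    conn.executemany(
        "INSERT INTO customers VALUES (?, ?)",
        [("alice", "Alice Smith"), ("bob", "Bob Jones")],
    )
    # Flatten each JSON document into a (handle, body) row.
    rows = [(doc["user"], doc["text"]) for doc in map(json.loads, RAW_FEED)]
    conn.executemany("INSERT INTO posts VALUES (?, ?)", rows)
    return conn


def customers_with_purchase_intent(conn):
    """Join posts to customers and count posts that hint at a purchase."""
    query = """
        SELECT c.name, COUNT(*) AS mentions
        FROM posts AS p
        JOIN customers AS c ON c.handle = p.handle
        WHERE p.body LIKE '%buy%' OR p.body LIKE '%upgrade%'
        GROUP BY c.name
        ORDER BY mentions DESC
    """
    return conn.execute(query).fetchall()


conn = build_database()
results = customers_with_purchase_intent(conn)
```

The point of the sketch is the query, not the plumbing: once the new data looks like a table, anyone fluent in SQL can join it to existing records without learning a new programming model.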
SQL is not dead, and likely won’t be for some time, but things are changing – fast. One way to preserve both SQL and its ecosystem is to use tools that let unstructured data be queried with SQL. Long live SQL.