YeSQL: An Overview of the Various Query Semantics in the Post Only-SQL World

Filed by Angela Bartels | July 14, 2010 1:54 pm

This post was written and contributed by Nati Shalom[1], CTO & Co-Founder of GigaSpaces, a Rackspace[2] Cloud[3] Tools Partner[4].

The NoSQL movement[5] blames the SQL query language for many of the scalability issues we face today with the traditional database approach.

I think the main reason so many people have come to see SQL as the source of all evil is that, traditionally, the query language was burned into the database implementation. So by saying NoSQL, you are basically saying “No” to traditional, non-scalable RDBMS implementations.

This view has brought on a flood of alternative query languages, each aiming either to address something missing from the traditional SQL approach, such as a document model, or to offer a simpler approach, such as Key/Value query.

Most of the people I speak with seem fairly confused about this subject and tend to use query semantics and architecture interchangeably. So I thought a good start would be a quick overview of what each query term stands for in the context of the NoSQL world. Then I’ll try to dispel some common misconceptions, which is what led me to come up with the YeSQL term.

Common Query Semantics in the Post Only-SQL World
The following are some of the common query semantics in the NoSQL world: Key/Value query[6], document-based query[7], template query[8], Map/Reduce[9], and SQL query[10]. For those interested in code examples, I’ve linked each category to the relevant GigaSpaces reference API.
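The following minimal Python sketch makes the difference between three of these styles concrete. It runs Key/Value, template (query-by-example), and document-based queries against the same tiny dataset. Every name in it (`STORE`, `kv_get`, `template_query`, `document_query`) is a hypothetical illustration and not the GigaSpaces API. The rule that a `None` field in a template acts as a wildcard is borrowed from JavaSpaces-style template matching and should be read as an assumption of this sketch.

```python
from typing import Any, Optional

# A plain dict stands in for a Key/Value store: primary key -> record.
STORE: dict[str, dict[str, Any]] = {
    "u1": {"name": "alice", "city": "London", "address": {"zip": "E1"}},
    "u2": {"name": "bob", "city": "Paris", "address": {"zip": "75001"}},
    "u3": {"name": "carol", "city": "London", "address": {"zip": "N1"}},
}


def kv_get(key: str) -> Optional[dict[str, Any]]:
    """Key/Value query: a direct lookup by key and nothing else."""
    return STORE.get(key)


def template_query(template: dict[str, Any]) -> list[str]:
    """Template query (query-by-example): None fields act as wildcards."""
    return sorted(
        key
        for key, record in STORE.items()
        if all(want is None or record.get(field) == want
               for field, want in template.items())
    )


def _lookup(doc: Any, path: str) -> Any:
    """Follow a dotted path such as "address.zip" into nested dicts."""
    for part in path.split("."):
        if not isinstance(doc, dict) or part not in doc:
            return None
        doc = doc[part]
    return doc


def document_query(path: str, value: Any) -> list[str]:
    """Document query: match on a nested field addressed by a dotted path."""
    return sorted(key for key, doc in STORE.items() if _lookup(doc, path) == value)
```

For example, `template_query({"city": "London", "name": None})` returns `["u1", "u3"]`, and `document_query("address.zip", "E1")` returns `["u1"]`.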

YeSQL — There’s Nothing Wrong with SQL!
Now that we’ve covered some of the concepts behind these query formats, it becomes apparent that there is nothing really wrong with SQL. Like almost any language, SQL gives you a fairly long rope with which to hang yourself if you choose to. If you design your data model to fit a distributed model, you may find SQL a fairly useful format for managing your data. Good examples are Hive/Pig/HBase and Google JPA/Bigtable. In both cases, the underlying data store is a scalable Key/Value store, but the front-end query language happens to be SQL-based. MongoDB aims at a similar goal. The main difference is that it provides SQL-like support that doesn’t fully comply with any existing standard.
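The point about fitting the data model to a distributed model can be sketched with a toy SQL front end over a Key/Value store. This is a hypothetical illustration. It assumes a `KV` layout with `id` as the key column, and it understands only `SELECT * FROM <table> WHERE <field> = '<value>'`. A predicate on the key becomes a single `get`. Any other predicate forces a full scan. Which queries are cheap therefore depends on how the data model was designed, not on SQL itself.

```python
import re
from typing import Any

# Hypothetical layout: table name -> {primary key -> row}; "id" is the key column.
KV: dict[str, dict[str, dict[str, Any]]] = {
    "users": {
        "1": {"id": "1", "name": "alice", "city": "London"},
        "2": {"id": "2", "name": "bob", "city": "Paris"},
        "3": {"id": "3", "name": "carol", "city": "London"},
    }
}

# The only statement shape this toy front end understands.
QUERY = re.compile(r"SELECT \* FROM (\w+) WHERE (\w+) = '([^']*)'", re.IGNORECASE)


def execute(sql: str) -> tuple[str, list[dict[str, Any]]]:
    """Translate a tiny SQL subset into Key/Value operations.

    Returns the access path that was used ("get" or "scan") and the rows.
    """
    match = QUERY.fullmatch(sql.strip())
    if match is None:
        raise ValueError(f"unsupported query: {sql!r}")
    table, field, value = match.groups()
    rows = KV[table]
    if field == "id":
        # Predicate on the key: a single direct lookup.
        row = rows.get(value)
        return ("get", [row] if row is not None else [])
    # Any other predicate: no index here, so every row must be examined.
    return ("scan", [r for r in rows.values() if r.get(field) == value])
```

`execute("SELECT * FROM users WHERE id = '2'")` takes the `"get"` path. Filtering on `city` instead takes the `"scan"` path.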

It’s About the Architecture, Stupid!
NoSQL implementations such as Hive[11]/HBase and JPA/Bigtable[12] are good examples of how next-generation databases can support both linear scaling and a SQL API.
The key is decoupling the query semantics from the underlying data store, as illustrated in the diagram below:

[Figure: Supporting a SQL API on top of a NoSQL data store in Google Bigtable]
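Here is a minimal sketch of that decoupling. The query function `where` is written once, against a small hypothetical `DataStore` protocol. It works unchanged over a single-node `LocalStore` or over a `PartitionedStore` that spreads keys across partitions by hash. The use of CRC32 as the hash is an arbitrary choice. None of these names come from Bigtable, HBase, or GigaSpaces.

```python
import zlib
from typing import Any, Callable, Iterator, Optional, Protocol

Row = dict[str, Any]


class DataStore(Protocol):
    """The only contract the query layer relies on."""

    def put(self, key: str, value: Row) -> None: ...
    def get(self, key: str) -> Optional[Row]: ...
    def scan(self) -> Iterator[tuple[str, Row]]: ...


class LocalStore:
    """All data in one dict, like a single node."""

    def __init__(self) -> None:
        self._data: dict[str, Row] = {}

    def put(self, key: str, value: Row) -> None:
        self._data[key] = value

    def get(self, key: str) -> Optional[Row]:
        return self._data.get(key)

    def scan(self) -> Iterator[tuple[str, Row]]:
        yield from self._data.items()


class PartitionedStore:
    """Keys spread over N partitions by a hash of the key."""

    def __init__(self, partitions: int) -> None:
        self._parts = [LocalStore() for _ in range(partitions)]

    def _owner(self, key: str) -> LocalStore:
        # CRC32 is stable across processes, unlike the built-in hash() for str.
        return self._parts[zlib.crc32(key.encode("utf-8")) % len(self._parts)]

    def put(self, key: str, value: Row) -> None:
        self._owner(key).put(key, value)

    def get(self, key: str) -> Optional[Row]:
        return self._owner(key).get(key)

    def scan(self) -> Iterator[tuple[str, Row]]:
        # Visited one after another here; a real system would fan out in parallel.
        for part in self._parts:
            yield from part.scan()


def where(store: DataStore, predicate: Callable[[Row], bool]) -> list[str]:
    """Query semantics written once, against the protocol rather than a store."""
    return sorted(key for key, value in store.scan() if predicate(value))
```

Swapping `LocalStore()` for `PartitionedStore(4)` changes where the data lives but leaves `where` untouched. That is the property the diagram is meant to convey.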

Convergence is Underway
Last week I spent some time at the Hadoop Summit[13]. Hadoop created a fairly generic substrate that has fostered an innovative ecosystem around it. Many frameworks, such as Hive, Cascading, and Pig, already provide different levels of abstraction over the way Hadoop manages data, for both querying and processing. Many of them provide tools that Hadoop’s original creators never even thought of.
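Frameworks such as Hive work by compiling a declarative query into Map/Reduce jobs. The sketch below shows the idea in a single process for `SELECT city, COUNT(*) FROM users GROUP BY city`. The map phase emits `(city, 1)`, the shuffle groups values by key, and the reduce phase sums each group. The `map_reduce` and `group_by_count` helpers are hypothetical and are not Hadoop’s API.

```python
from collections import defaultdict
from typing import Any, Callable, Iterable, Iterator

Record = dict[str, Any]


def map_reduce(
    records: Iterable[Record],
    mapper: Callable[[Record], Iterator[tuple[Any, Any]]],
    reducer: Callable[[Any, list[Any]], Any],
) -> dict[Any, Any]:
    groups: dict[Any, list[Any]] = defaultdict(list)
    for record in records:
        # Map: each record independently emits zero or more (key, value) pairs.
        for key, value in mapper(record):
            groups[key].append(value)  # Shuffle: collect values by key.
    # Reduce: fold each key's values into a single result.
    return {key: reducer(key, values) for key, values in groups.items()}


def group_by_count(records: Iterable[Record], column: str) -> dict[Any, int]:
    """SELECT <column>, COUNT(*) ... GROUP BY <column>, lowered onto map/reduce."""

    def mapper(record: Record) -> Iterator[tuple[Any, int]]:
        yield record[column], 1  # emit (group key, 1)

    def reducer(_key: Any, ones: list[int]) -> int:
        return sum(ones)  # COUNT(*) is the number of emitted 1s

    return map_reduce(records, mapper, reducer)
```

The user writes only the query. The framework supplies the map, shuffle, and reduce plumbing, which is what lets these layers sit on top of the same Hadoop substrate.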

This brings me to my point: we can apply the same decoupling pattern described above to support a document model alongside SQL. In other words, I believe that most of the leading databases will support all of the semantics listed above, and we won’t have to choose a database implementation just because it supports a certain query language.

We’ve already seen a similar trend with dynamic languages. In the past, a language had to come with a full stack behind it (compiler, libraries, and development tools), which made the choice of a particular language quite strategic. Today, the JVM and the .NET CLR each provide a common runtime that can host a large variety of dynamic languages. Good examples are Groovy and JRuby, which both run on the JVM alongside Java.

Final Words
As I pointed out throughout this post, SQL is actually a fairly good query language and will continue to play a major role in the post only-SQL world. However, the idea that one size fits all doesn’t hold up. The data management world is going to be built from a variety of tools and data management languages, each serving a particular purpose. Ideally, we should be able to access any data using any of several query languages, regardless of how it was stored. For example, I should be able to store a JSON object using a document model and then, at any time, query that JSON object using SQL query semantics or a simple Key/Value API.
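The JSON example can be sketched as follows. The sketch assumes a hypothetical `DocumentStore` that keeps each document as JSON text under a key. `kv_get` is the Key/Value view. `sql` exposes the same documents to real SQL through Python’s built-in `sqlite3` module. It does this by projecting each document’s top-level scalar fields into a temporary in-memory table named `docs`, and it leaves nested objects and arrays out. Rebuilding that table on every query keeps the sketch short; a real system would presumably maintain the projection incrementally.

```python
import json
import sqlite3
from typing import Any, Optional


def _is_scalar(value: Any) -> bool:
    """True for values SQLite can store directly (bool is an int subclass)."""
    return value is None or isinstance(value, (str, int, float))


class DocumentStore:
    def __init__(self) -> None:
        self._docs: dict[str, str] = {}  # key -> JSON text

    def put(self, key: str, doc: dict[str, Any]) -> None:
        """Document model: store the whole object as JSON."""
        self._docs[key] = json.dumps(doc)

    def kv_get(self, key: str) -> Optional[dict[str, Any]]:
        """Key/Value view: fetch one document by key."""
        text = self._docs.get(key)
        return None if text is None else json.loads(text)

    def sql(self, query: str) -> list[tuple[Any, ...]]:
        """SQL view over the same documents, via a temporary table named docs."""
        docs = {key: json.loads(text) for key, text in self._docs.items()}
        # Columns: "_key" plus every top-level scalar field found in any document.
        fields = sorted({f for d in docs.values() for f, v in d.items()
                         if f != "_key" and _is_scalar(v)})
        columns = ["_key"] + fields
        quoted = ", ".join('"' + c.replace('"', '""') + '"' for c in columns)
        marks = ", ".join("?" for _ in columns)
        # Missing or non-scalar values become NULL in the projection.
        rows = [[key] + [d.get(f) if _is_scalar(d.get(f)) else None for f in fields]
                for key, d in docs.items()]
        conn = sqlite3.connect(":memory:")
        try:
            conn.execute(f"CREATE TABLE docs ({quoted})")
            conn.executemany(f"INSERT INTO docs VALUES ({marks})", rows)
            return conn.execute(query).fetchall()
        finally:
            conn.close()
```

After `store.put("u1", {"name": "alice", "city": "London", "tags": ["admin"]})`, the same object is available both as `store.kv_get("u1")` and through `store.sql("SELECT _key, name FROM docs WHERE city = 'London'")`.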

Paul Ford, from Rackspace Corporate Development, is Your Connection to the Rackspace “Cloud Tools Partners” Ecosystem [14]. To find out more about how GigaSpaces and other tools can increase your productivity, satisfy your IT needs, and generally make your life easier, contact him any time at[15]

  1. Nati Shalom
  2. Rackspace
  3. Cloud
  4. Tools Partner
  5. NoSQL movement
  6. Key/Value query
  7. Document-based query
  8. Template query
  9. Map/Reduce
  10. SQL query
  11. Hive
  12. JPA/Bigtable
  13. Hadoop summit
  14. Rackspace “Cloud Tools Partners” Ecosystem
