The NoSQL meeting sought to raise awareness of non-relational databases, which promise to be cheaper, simpler to administer and maintain, and more scalable. Michael Stonebraker, co-creator of Ingres and Postgres, thinks that the end of the RDBMS era is close, while others think that we are not there yet.
The first NoSQL meeting, held in San Francisco, US, in June, was attended by architects and developers from companies such as LinkedIn, Facebook, SpringSource and Google, and centered on a number of presentations about non-RDBMS data stores. The organizer, Johan Oskarsson, a developer at Last.fm and a committer for Apache Hadoop and Hive, posted the slides and videos:
Voldemort - Jay Kreps, LinkedIn (slides pdf ppt, video1, video2)
Cassandra - Avinash Lakshman, Facebook (slides pdf ppt, video)
Dynomite - Cliff Moon, Powerset (slides, video)
HBase - Ryan Rawson, StumbleUpon (slides, video)
Hypertable - Doug Judd, Zvents (slides pdf ppt, video1, video2)
CouchDB - Chris Anderson, couch.io (slides, video1, video2)
VPork - Jon Travis, SpringSource (slides, video)
MongoDB - Dwight Merriman, 10gen (slides, video)
Infinite Scalability - Jonas S Karlsson, Google (slides, video)
ComputerWorld reported that Jon Travis, principal engineer at SpringSource, said during the meeting:
Relational databases give you too much. They force you to twist your object data to fit a RDBMS… NoSQL-based alternatives "just give you what you need".
Oskarsson said:
Many had even dumped the open-source MySQL database, a long-time Web 2.0 favorite, for a NoSQL alternative, because the advantages were too compelling to ignore.
But he admitted that we are not there yet, and that even the company he works for does not yet use a NoSQL database in production:
It's true that [NoSQL] aren't relevant right now to mainstream enterprises, but that might change one to two years down the line.
Michael Stonebraker, co-creator of Ingres in the 1970s and, later, Postgres, has predicted the demise of RDBMSes for many years. He recently reiterated his position, giving some market reasons why we are approaching the end of the relational DBMS era:
In the data warehouse market, a column store beats a row store by approximately a factor of 50 on typical business intelligence queries. The reason is because column stores read only the columns of interest to the query and not all of them. In addition, compression is more effective in a column store. Since the legacy systems are all row stores, they are vulnerable to competition from the newer column stores.
… In the online transaction processing (OLTP) market, a lightweight main memory DBMS beats a row store by a factor of 50. Leveraging main memory and the fact that no DBMS application will send a message to a human user in the middle of a transaction, allows an OLTP DBMS to run transactions to completion with no resource contention or locking overhead.
… In the science DBMS market, users have never liked relational DBMSs and want a non-relational model and query facility.
… Text applications have never used relational DBMSs. This was pointed out to me most clearly by Eric Brewer nearly 15 years ago in the early days of Inktomi. He wanted to use a relational DBMS to store the results of Web crawling, but found RDBMS to be two orders of magnitude slower than a home-brew system. All the major Web-search engines use home-brew text software to serve us search results. None use relational DBMSs.
Stonebraker suggests the following paths to superior performance:
Use a non-relational data model:
If the user’s data is naturally something other than tables and if simulating his natural data model on top of tables is awkward, then chances are that a native implementation of the natural data model will significantly outperform a conventional RDBMS. This is certainly true in scientific data.
Use a different implementation of tables:
If something other than a row store accelerates the user’s queries, then a direct implementation of the relational model using non-row store technology will run circles around a conventional RDBMS. This is true in the data warehouse marketplace.
Use a different implementation of transactions:
Current row stores give you a “one size fits all” implementation of transactions. This can be radically beaten if a user has lesser requirements or if the system can take advantage of workload specific features. This is true in the OLTP marketplace.
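The column-store argument quoted above (a query reads only the columns it needs) can be illustrated with a minimal Python sketch. It is a toy model under stated assumptions, not how any particular DBMS is implemented: the table, the field names and the helper functions `to_columns`, `sum_row_store` and `sum_column_store` are all hypothetical, and "values read" is only a stand-in for I/O cost. It does not attempt to reproduce the factor of 50 that Stonebraker cites.

```python
# Toy comparison of a row layout and a column layout for one aggregate query.
# All names and data below are hypothetical and exist only for illustration.

ROWS = [
    {"order_id": 1, "customer": "alice", "region": "EU", "amount": 120.0},
    {"order_id": 2, "customer": "bob", "region": "US", "amount": 80.0},
    {"order_id": 3, "customer": "carol", "region": "EU", "amount": 200.0},
    {"order_id": 4, "customer": "dave", "region": "US", "amount": 50.0},
]


def to_columns(rows):
    """Pivot a list of records into one list of values per field."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


def sum_row_store(rows, field):
    """Row layout: each whole record is read in order to reach one field."""
    total = 0.0
    values_read = 0
    for row in rows:
        values_read += len(row)  # every field of the record is touched
        total += row[field]
    return total, values_read


def sum_column_store(columns, field):
    """Column layout: only the requested column is read."""
    column = columns[field]
    return sum(column), len(column)


if __name__ == "__main__":
    columns = to_columns(ROWS)
    print(sum_row_store(ROWS, "amount"))
    print(sum_column_store(columns, "amount"))
```

Both functions return the same total, but in this model the row layout touches every field of every record while the column layout touches one value per record. The gap grows with the number of columns in the table, which is the effect the quote describes for business intelligence queries.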
BJ Clark, a developer for Grasshopper and a consultant, considers it premature to declare the SQL game over. The main advantage claimed for column databases, key/value stores and document databases is scalability, but he has reviewed several of them and found that not all scale as promised. In his evaluation, Tokyo, a key/value store with full-text search, Redis, another key/value store, and MongoDB, a document database, do not scale, at least not yet. Amazon S3 and Voldemort scale well according to his findings, although he presents no data to back up these claims, only the conclusions his team reached while looking for the best data store for their project. His conclusion is:
So, does RDBMS scale? I would say the answer is: not any worse than lots of other things. Most of what doesn’t scale in a RDBMS is stuff people don’t use that often anyway. And does NoSQL scale: a couple solutions do, most don’t. You might even argue that it’s just as easy to scale mysql (with sharding via mysql proxy) as it is to shard some of these NoSQL dbs. And I think it’s a pretty far leap to declare the RDBMS dead.
It is obvious that RDBMSes are no longer the main keepers of data, especially at some of the large companies that have risen during the Internet era: Amazon, Google, Facebook, LinkedIn, and others. But it is also true that many organizations have invested heavily in Oracle, DB2 or MS SQL, and those databases are still serving their needs. Relational databases are very unlikely to disappear any time soon, but a gradual move towards open source non-SQL data stores is possible for reasons of cost, simplicity and scalability.