Following on from the Stinger initiative delivered in Apache Hive 0.13, Hortonworks has laid out the Stinger.next roadmap to provide fully ACID transactions, a sub-second query engine, and more complete SQL 2011 analytics support, all driving towards the goal of “enhancing the speed, scale and breadth of SQL support” in Hive.
All development for Stinger.next will be driven by the Hive community. Transactions will be delivered in the first phase, expected to land by the end of 2014. Full ACID transactions are a significant advance over the previous write-once, read-many model, and open up data use cases that require periodic changes.
The first Stinger initiative leveraged YARN to deliver a 100x speed improvement, and speed is again a major feature of the Stinger.next plan. Sub-second queries are scheduled for the second phase, in the first half of 2015. They will use a new hybrid engine built on Apache Tez and a new technology called Live Long And Process (LLAP), an optional daemon process running on multiple nodes. LLAP provides fast response times through efficient in-memory data caching and low-latency processing.
The final phase will extend Hive’s SQL capabilities by adding non-equi joins, set functions, interval types and sub-queries. This subset covers many of the more frequently used SQL 2011 analytics functions, with more to follow if user demand warrants it.
The community has promised several other features: Hive on Spark for machine learning tasks, Hive streaming ingest for working on the latest data, cross-geo query support, materialized views, and usability and deployment improvements.
Hive is far from alone in the SQL-on-Hadoop space. Cloudera’s Impala, IBM’s Big SQL, and the Apache Phoenix and Drill projects are just some of its proprietary and open source competitors. It’s not yet clear whether SQL jobs represent the ideal Hadoop data use case, or whether their growing popularity is due to bringing big data scale to existing BI tools such as Tableau and Informatica.