Yahoo spun out its core Hadoop engineering team yesterday with an investment from Benchmark Capital, forming a new company, Hortonworks. CEO Eric Baldeschwieler presented the company's vision in a keynote at the Hadoop Summit today: Hortonworks will sell support, training, and certification, primarily in partnership with other companies. The company is initially focusing on making Hadoop more accessible by simplifying installation and making frequent sustaining releases. Going into 2012, it will focus on Hadoop 0.23, with its Next Generation MapReduce, improved performance, simplified management, and continuing investment to eliminate single points of failure in Hadoop. Baldeschwieler emphasized support for Hadoop, HBase, Pig, and Hive.
Baldeschwieler quoted an analyst organization as "seeing Hadoop in all of our Fortune 2000 data accounts." He said that the technology is hard to use today and that adopters need to hire experts or consultants, so Hortonworks is focusing on plugging technology and knowledge gaps. The company will sell training and support via partners and will also do product development. It won't sell its own distribution, but will sell support for the open source distribution.
The company's objectives are to make Apache Hadoop:
- easier to install, manage and use
- more robust (with better performance, High Availability, administration and monitoring)
- easier to integrate and extend with open APIs
In 2011 the company is focusing on:
- Accessibility for Hadoop and other projects "as we get involved with them", including 0.20.203, "the most stable Hadoop version ever"
- Easy installation by releasing stable code via Apache (e.g., RPMs and .debs)
- Frequent sustaining releases off of stable branches
In 2012 the company is emphasizing:
- Next generation Apache Hadoop (with betas in October 2011) including new MapReduce, HDFS Federation, and a substantial improvement in job performance
- Fixing key product gaps: a new write pipeline for HBase and High Availability (by eliminating Single Points of Failure incrementally)
- Enabling community and partner innovation via a modular architecture and open APIs, working with the community to define an integrated stack
- Data support through HCatalog 0.3 - a standard for file metadata that generalizes Hive's metastore, enabling support for Pig, Hive, and HBase, including storage in HDFS and HBase, as well as performance and storage enhancements
- Management and ease of use: testing all components together, making them installable as a stack, centralized configuration management, and a REST API and a GUI for administration
Baldeschwieler emphasized working through Apache: "All our work is going back into Hadoop, no if's what's or buts." He said the company wants others to base their "offering on the Apache offering, anyone who does that is our partner." It's worth noting that Apache recently signaled more aggressive enforcement of its trademark rules with regard to Hadoop. Baldeschwieler noted that Yahoo is a development partner as well as a customer of and investor in Hortonworks, and that Yahoo will continue to be central in the release process for Apache Hadoop. He also announced that business operations for the company will be run by its President, Rob Bearden, the former COO of both SpringSource and JBoss. In response to a question about the relationship between Hortonworks and Cloudera, he said, "That's to be determined. We've been working together to improve Apache Hadoop. In the past we've had our differences, but as I said anyone who's focusing on Apache Hadoop is our partner."