At Hadoop World this week, Membase and Cloudera announced the integration of Membase Server with Cloudera’s Distribution for Hadoop. Membase is a NoSQL database that was released the day before, as reported by InfoQ. Hadoop is an open source project that includes a distributed storage system and a MapReduce processing framework. At the conference, AOL Advertising and ShareThis presented how they used this integration for their ad targeting and serving platforms.
James Phillips, co-founder and SVP of Products at Membase, wrote that:
On the technology integration front, we have built and are making available to customers two mechanisms for integrating Membase and Cloudera Distribution for Hadoop (CDH). The first is a Membase NodeCode module that can stream data from Membase to CDH in real-time. As new operational data enters Membase, it can be massaged in real time and pumped into a CDH cluster for processing. The second is a Sqoop-derived batch loader utility that enables loading of data from Membase to CDH, and vice versa.
According to Perry Krug, Systems Engineer at Membase, the real-time integration uses Cloudera's Flume project to pass Membase updates as events, which are then stored in the Hadoop Distributed File System (HDFS).
Pero Subasic, Chief Architect at AOL, pointed out:
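The pattern described here, where each write to the operational store is emitted as an event and appended to a long-term log for later batch processing, can be illustrated with a small sketch. This is not the Membase or Flume API: `EventingStore`, `AppendOnlySink`, and `demo` are hypothetical names, the store is an in-memory dictionary, and an in-memory text buffer stands in for a file in a distributed file system.

```python
import io
import json
import time
from typing import Callable, Dict, List, Optional, TextIO

Event = Dict[str, object]


class EventingStore:
    """Hypothetical in-memory key-value store that notifies listeners on every write."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}
        self._listeners: List[Callable[[Event], None]] = []

    def subscribe(self, listener: Callable[[Event], None]) -> None:
        # Register a callback that receives one event per write.
        self._listeners.append(listener)

    def set(self, key: str, value: str) -> None:
        # Apply the write locally, then forward it as an event.
        self._data[key] = value
        event: Event = {"op": "set", "key": key, "value": value, "ts": time.time()}
        for listener in self._listeners:
            listener(event)

    def get(self, key: str) -> Optional[str]:
        return self._data.get(key)


class AppendOnlySink:
    """Stand-in for a log file in a distributed file system: one JSON event per line."""

    def __init__(self, stream: TextIO) -> None:
        self._stream = stream

    def __call__(self, event: Event) -> None:
        self._stream.write(json.dumps(event) + "\n")


def demo() -> List[Event]:
    """Write two updates and return the events captured by the sink."""
    buffer = io.StringIO()
    store = EventingStore()
    store.subscribe(AppendOnlySink(buffer))
    store.set("user:1", "profile-a")
    store.set("user:1", "profile-b")
    return [json.loads(line) for line in buffer.getvalue().splitlines()]
```

The store always serves the latest value for a key, while the sink retains every update in order, which is what lets a batch system analyze the full history afterwards.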
AOL serves billions of impressions per day from our ad serving platforms, and any incremental improvement in processing time translates to huge benefits in our ability to more effectively serve the ads needed to meet our contractual commitments. Traditional databases lack the scalability required to support our goal of five milliseconds per read/write. Creating user profiles with Hadoop, then serving them from Membase, reduces profile read and write access to under a millisecond, leaving the bulk of the processing time budget for improved targeting and customization.
Mike Olson, Cloudera CEO, noted that:
Integrating Membase Server with Cloudera's Distribution for Hadoop adds complementary functionality that customers are interested in. The result is a highly optimized data delivery system with virtually no lag time. This real-time processing capability is essential for any solution on which split-second decisions must be made, including ad targeting and social gaming.
In addition to the release of integration software, Phillips noted that Cloudera and Membase were collaborating to build joint solutions for "ad, offer and content targeting; log and event stream capture and analysis; and social gaming."