Big Data Content on InfoQ
-
Hadoop and Metadata (Removing the Impedance Mis-match)
The new Apache HCatalog project is a table and storage management layer for Hadoop that enables different data processing tools – Pig, MapReduce, and Hive – to interoperate on the same data more easily. HCatalog presents users with a relational view of data and ensures that they need not worry about where or in what format their data is stored – RCFile format, text files, or sequence files.
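As a rough, hedged illustration of what this buys a MapReduce developer (this is not code from the article, and the database name "default" and table name "mytable" are hypothetical), a job can read an HCatalog-managed table through HCatInputFormat without knowing the underlying storage format:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hive.hcatalog.data.HCatRecord;
import org.apache.hive.hcatalog.mapreduce.HCatInputFormat;

// Sketch: the mapper sees only HCatRecord objects; HCatalog resolves the
// table's actual storage (RCFile, text, or sequence files) behind the scenes.
public class HCatReadJob {
  public static class ReadMapper
      extends Mapper<WritableComparable, HCatRecord, Text, IntWritable> {
    @Override
    protected void map(WritableComparable key, HCatRecord value, Context ctx)
        throws java.io.IOException, InterruptedException {
      // Position-based access; the column layout comes from the table schema.
      ctx.write(new Text(String.valueOf(value.get(0))), new IntWritable(1));
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "hcat-read-sketch");
    job.setJarByClass(HCatReadJob.class);
    HCatInputFormat.setInput(job, "default", "mytable"); // hypothetical table
    job.setInputFormatClass(HCatInputFormat.class);
    job.setMapperClass(ReadMapper.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    job.setNumReduceTasks(0);
    FileOutputFormat.setOutputPath(job, new Path(args[0]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}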
-
Transitioning from RDBMS to NoSQL. Interview with Couchbase’s Dipti Borkar
While relational databases have been used for decades and still represent a viable solution for many use cases, NoSQL is increasingly chosen for scalability and performance reasons. This article contains an interview with Dipti Borkar, Director of Product Management at Couchbase, on the challenges, benefits, and process of migrating from RDBMS to NoSQL.
-
Implementing Aggregation Functions in MongoDB
In this article, authors Arun Viswanathan and Shruthi Kumar discuss how to implement common aggregation functions on a MongoDB document database using its MapReduce functionality. They also discuss a typical application of such aggregations: business reporting of sales data.
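As a minimal sketch of the idea (not the authors' code; the "sales" collection and the region/amount field names are hypothetical), the classic MongoDB Java driver can submit JavaScript map and reduce functions to compute a per-key sum:

import com.mongodb.DB;
import com.mongodb.DBCollection;
import com.mongodb.DBObject;
import com.mongodb.MapReduceCommand;
import com.mongodb.MapReduceOutput;
import com.mongodb.MongoClient;

public class SalesAggregation {
  public static void main(String[] args) throws Exception {
    MongoClient client = new MongoClient("localhost", 27017);
    DB db = client.getDB("test");
    DBCollection sales = db.getCollection("sales");

    // Map: emit one (region, amount) pair per document.
    String map = "function() { emit(this.region, this.amount); }";
    // Reduce: sum all amounts emitted for the same region.
    String reduce = "function(key, values) { return Array.sum(values); }";

    MapReduceCommand cmd = new MapReduceCommand(
        sales, map, reduce, null, MapReduceCommand.OutputType.INLINE, null);
    MapReduceOutput out = sales.mapReduce(cmd);

    for (DBObject result : out.results()) {
      System.out.println(result); // e.g. { "_id" : "EMEA", "value" : 1234.5 }
    }
    client.close();
  }
}

Inline output like this suits small result sets; for the reporting scenario the article describes, results would more likely be written to an output collection.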
-
Evolution in Data Integration From EII to Big Data
With the emergence of inexpensive cloud-based storage and cost-effective ways to process large volumes of complex data, there has been a shift in approach toward data integration.
-
Implementing Lucene Spatial Support
The Lucene geospatial extension proposed in this article is based on a two-level search: the first level performs a Cartesian grid search, and the second level implements shape-specific spatial calculations.
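To illustrate the general idea independently of Lucene's actual classes (which the article covers), here is a plain-Java sketch: a coarse grid cell acts as a cheap first-level filter, and an exact great-circle distance check refines the candidates. The one-degree cell size and the Haversine formula are illustrative choices, and a real index would also probe neighboring cells that intersect the query radius:

import java.util.ArrayList;
import java.util.List;

// Sketch of a two-level spatial search: level 1 filters by coarse grid
// cell, level 2 applies an exact distance calculation (Haversine).
public class TwoLevelSearch {
  static final double CELL_DEG = 1.0; // grid cell size in degrees (illustrative)

  // Level 1: map a point to a coarse grid cell id (would be indexed as a term).
  static String cellId(double lat, double lon) {
    return (int) Math.floor(lat / CELL_DEG) + ":" + (int) Math.floor(lon / CELL_DEG);
  }

  // Level 2: exact great-circle distance in km (Haversine formula).
  static double haversineKm(double lat1, double lon1, double lat2, double lon2) {
    double dLat = Math.toRadians(lat2 - lat1), dLon = Math.toRadians(lon2 - lon1);
    double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
        + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
          * Math.sin(dLon / 2) * Math.sin(dLon / 2);
    return 6371.0 * 2 * Math.atan2(Math.sqrt(a), Math.sqrt(1 - a));
  }

  public static void main(String[] args) {
    double[][] points = { {40.7, -74.0}, {40.8, -73.9}, {48.9, 2.3} };
    double qLat = 40.75, qLon = -73.95, radiusKm = 20;
    String qCell = cellId(qLat, qLon);

    List<double[]> hits = new ArrayList<>();
    for (double[] p : points) {
      // Level 1: cheap filter on the grid cell term.
      if (!qCell.equals(cellId(p[0], p[1]))) continue;
      // Level 2: exact, shape-specific check on the survivors.
      if (haversineKm(qLat, qLon, p[0], p[1]) <= radiusKm) hits.add(p);
    }
    System.out.println(hits.size() + " point(s) within " + radiusKm + " km");
  }
}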
-
Exploring Hadoop OutputFormat
As more companies adopt Hadoop, its integration with other applications is becoming more important. A key to such integration is the use of an appropriate OutputFormat, which produces output data in the form best suited to those applications.
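As a minimal sketch of the pattern (not the article's code; the CSV formatting is an illustrative choice), a custom OutputFormat wraps a RecordWriter that emits records in a consumer-friendly form:

import java.io.IOException;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.RecordWriter;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Sketch: an OutputFormat that writes key,value pairs as CSV lines so a
// downstream application can consume the job's output directly.
public class CsvOutputFormat extends FileOutputFormat<Text, IntWritable> {

  @Override
  public RecordWriter<Text, IntWritable> getRecordWriter(TaskAttemptContext ctx)
      throws IOException, InterruptedException {
    Path file = getDefaultWorkFile(ctx, ".csv");
    FileSystem fs = file.getFileSystem(ctx.getConfiguration());
    final FSDataOutputStream out = fs.create(file, false);

    return new RecordWriter<Text, IntWritable>() {
      @Override
      public void write(Text key, IntWritable value) throws IOException {
        out.writeBytes(key.toString() + "," + value.get() + "\n");
      }
      @Override
      public void close(TaskAttemptContext context) throws IOException {
        out.close();
      }
    };
  }
}

A job opts in with job.setOutputFormatClass(CsvOutputFormat.class).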
-
Uncovering mysteries of InputFormat: Providing better control for your Map Reduce execution.
In this article, authors Boris Lublinsky and Mike Segel show how to leverage a custom InputFormat class implementation to more tightly control the execution strategy of maps in Hadoop MapReduce jobs.
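For a flavor of the technique (a sketch under simple assumptions, not the authors' implementation), subclassing FileInputFormat lets a job dictate how input is split, and therefore how many map tasks run; here splitting is simply disabled so that each input file gets exactly one mapper:

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.LineRecordReader;

// Sketch: an InputFormat that turns off splitting, forcing one map task
// per input file regardless of HDFS block size.
public class WholeFileTextInputFormat extends FileInputFormat<LongWritable, Text> {

  @Override
  protected boolean isSplitable(JobContext context, Path file) {
    return false; // one split (and hence one mapper) per file
  }

  @Override
  public RecordReader<LongWritable, Text> createRecordReader(
      InputSplit split, TaskAttemptContext context) {
    return new LineRecordReader(); // reuse the standard line reader
  }
}

Richer control, such as combining many small files into a single split, comes from overriding getSplits() as well.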
-
Extending Oozie
In this article, the authors show how to leverage Oozie's extensibility to implement custom language extensions. This approach can be viewed as specializing the workflow language for a given company or line of business.
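As one concrete flavor of such an extension (a hedged sketch, not the authors' code), Oozie's workflow EL can be extended with plain static Java methods, registered under the oozie.service.ELService.ext.functions.* properties in oozie-site.xml; the class and function names below are hypothetical:

// Hypothetical EL extension: static methods exposed to workflow definitions.
// After registering e.g. "firstNotNull=com.example.oozie.CustomELFunctions#firstNotNull"
// in oozie-site.xml, a workflow can call ${firstNotNull(a, b)}.
public class CustomELFunctions {

  /** Returns the first non-null, non-empty argument (a common workflow idiom). */
  public static String firstNotNull(String a, String b) {
    return (a != null && !a.isEmpty()) ? a : b;
  }

  /** Normalizes a business-date parameter into a company-specific path fragment. */
  public static String datePath(String yyyyMMdd) {
    return yyyyMMdd.substring(0, 4) + "/" + yyyyMMdd.substring(4, 6)
        + "/" + yyyyMMdd.substring(6, 8);
  }
}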
-
Oozie by Example
An end-to-end Oozie example, including process design, resource coordinator, and workflow implementation.
-
Data Mining in the Swamp: Taming Unruly Data With Cloud Computing
Matrix presents a white paper on using the open-source tool Hadoop to implement a MapReduce and cloud computing strategy for solving business intelligence problems.
-
SOA Agents: Grid Computing meets SOA
Grid technology can improve scalability, high availability, and throughput in SOA implementations. In this article, Boris Lublinsky explains how grid computing can be used in the overall SOA architecture and introduces a programming model for grid utilization in service implementation. He also introduces an experimental grid implementation that can support this proposed architecture.