Big Data Content on InfoQ
-
Hadoop and Metadata (Removing the Impedance Mis-match)
The new Apache HCatalog project is a table and storage management layer for Hadoop that enables different data processing tools – Pig, MapReduce, and Hive – to interoperate on the same data more easily. HCatalog presents users with a relational view of data and ensures that they need not worry about where or in what format their data is stored – RCFile format, text files, or sequence files.
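As a rough, hedged illustration of what this buys a MapReduce developer (this is not code from the article, and the database name "default" and table name "mytable" are hypothetical), a job can read an HCatalog-managed table through HCatInputFormat without knowing the underlying storage format:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hive.hcatalog.data.HCatRecord;
import org.apache.hive.hcatalog.mapreduce.HCatInputFormat;

// Sketch: the mapper sees only HCatRecord objects; HCatalog resolves the
// table's actual storage (RCFile, text, or sequence files) behind the scenes.
public class HCatReadJob {
  public static class ReadMapper
      extends Mapper<WritableComparable, HCatRecord, Text, IntWritable> {
    @Override
    protected void map(WritableComparable key, HCatRecord value, Context ctx)
        throws java.io.IOException, InterruptedException {
      // Position-based access; the column layout comes from the table schema.
      ctx.write(new Text(String.valueOf(value.get(0))), new IntWritable(1));
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "hcat-read-sketch");
    job.setJarByClass(HCatReadJob.class);
    HCatInputFormat.setInput(job, "default", "mytable"); // hypothetical table
    job.setInputFormatClass(HCatInputFormat.class);
    job.setMapperClass(ReadMapper.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    job.setNumReduceTasks(0);
    FileOutputFormat.setOutputPath(job, new Path(args[0]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}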
-
Transitioning from RDBMS to NoSQL. Interview with Couchbase’s Dipti Borkar
While relational databases have been used for decades and still represent a viable solution for many use cases, NoSQL is increasingly chosen for scalability and performance reasons. This article contains an interview with Dipti Borkar, Director of Product Management at Couchbase, on the challenges, benefits, and process of migrating from RDBMS to NoSQL.
-
Implementing Aggregation Functions in MongoDB
In this article, authors Arun Viswanathan and Shruthi Kumar discuss how to implement common aggregation functions on a MongoDB document database using its MapReduce functionality. They also discuss a typical application of such aggregations: business reporting of sales data.
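As a minimal sketch of the idea (not the authors' code; the "sales" collection and the region/amount field names are hypothetical), the classic MongoDB Java driver can submit JavaScript map and reduce functions to compute a per-key sum:

import com.mongodb.DB;
import com.mongodb.DBCollection;
import com.mongodb.DBObject;
import com.mongodb.MapReduceCommand;
import com.mongodb.MapReduceOutput;
import com.mongodb.MongoClient;

public class SalesAggregation {
  public static void main(String[] args) throws Exception {
    MongoClient client = new MongoClient("localhost", 27017);
    DB db = client.getDB("test");
    DBCollection sales = db.getCollection("sales");

    // Map: emit one (region, amount) pair per document.
    String map = "function() { emit(this.region, this.amount); }";
    // Reduce: sum all amounts emitted for the same region.
    String reduce = "function(key, values) { return Array.sum(values); }";

    MapReduceCommand cmd = new MapReduceCommand(
        sales, map, reduce, null, MapReduceCommand.OutputType.INLINE, null);
    MapReduceOutput out = sales.mapReduce(cmd);

    for (DBObject result : out.results()) {
      System.out.println(result); // e.g. { "_id" : "EMEA", "value" : 1234.5 }
    }
    client.close();
  }
}

Inline output like this suits small result sets; for the reporting scenario the article describes, results would more likely be written to an output collection.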
-
Evolution in Data Integration From EII to Big Data
With the emergence of inexpensive cloud-based storage and cost-effective ways to process large volumes of complex data, there has been a shift in approach toward data integration.
-
Implementing Lucene Spatial Support
The Lucene geospatial extension proposed in this article is based on a two-level search: the first level performs a Cartesian grid search, and the second level implements shape-specific spatial calculations.
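To illustrate the general idea independently of Lucene's actual classes (which the article covers), here is a plain-Java sketch: a coarse grid cell acts as a cheap first-level filter, and an exact great-circle distance check refines the candidates. The one-degree cell size and the Haversine formula are illustrative choices, and a real index would also probe neighboring cells that intersect the query radius:

import java.util.ArrayList;
import java.util.List;

// Sketch of a two-level spatial search: level 1 filters by coarse grid
// cell, level 2 applies an exact distance calculation (Haversine).
public class TwoLevelSearch {
  static final double CELL_DEG = 1.0; // grid cell size in degrees (illustrative)

  // Level 1: map a point to a coarse grid cell id (would be indexed as a term).
  static String cellId(double lat, double lon) {
    return (int) Math.floor(lat / CELL_DEG) + ":" + (int) Math.floor(lon / CELL_DEG);
  }

  // Level 2: exact great-circle distance in km (Haversine formula).
  static double haversineKm(double lat1, double lon1, double lat2, double lon2) {
    double dLat = Math.toRadians(lat2 - lat1), dLon = Math.toRadians(lon2 - lon1);
    double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
        + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
          * Math.sin(dLon / 2) * Math.sin(dLon / 2);
    return 6371.0 * 2 * Math.atan2(Math.sqrt(a), Math.sqrt(1 - a));
  }

  public static void main(String[] args) {
    double[][] points = { {40.7, -74.0}, {40.8, -73.9}, {48.9, 2.3} };
    double qLat = 40.75, qLon = -73.95, radiusKm = 20;
    String qCell = cellId(qLat, qLon);

    List<double[]> hits = new ArrayList<>();
    for (double[] p : points) {
      // Level 1: cheap filter on the grid cell term.
      if (!qCell.equals(cellId(p[0], p[1]))) continue;
      // Level 2: exact, shape-specific check on the survivors.
      if (haversineKm(qLat, qLon, p[0], p[1]) <= radiusKm) hits.add(p);
    }
    System.out.println(hits.size() + " point(s) within " + radiusKm + " km");
  }
}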
-
Exploring Hadoop OutputFormat
As more companies adopt Hadoop, its integration with other applications is becoming more important. A key to such integration is the use of an appropriate OutputFormat, which produces output data in the form best suited to those applications.
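As a minimal sketch of the pattern (not the article's code; the CSV formatting is an illustrative choice), a custom OutputFormat wraps a RecordWriter that emits records in a consumer-friendly form:

import java.io.IOException;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.RecordWriter;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Sketch: an OutputFormat that writes key,value pairs as CSV lines so a
// downstream application can consume the job's output directly.
public class CsvOutputFormat extends FileOutputFormat<Text, IntWritable> {

  @Override
  public RecordWriter<Text, IntWritable> getRecordWriter(TaskAttemptContext ctx)
      throws IOException, InterruptedException {
    Path file = getDefaultWorkFile(ctx, ".csv");
    FileSystem fs = file.getFileSystem(ctx.getConfiguration());
    final FSDataOutputStream out = fs.create(file, false);

    return new RecordWriter<Text, IntWritable>() {
      @Override
      public void write(Text key, IntWritable value) throws IOException {
        out.writeBytes(key.toString() + "," + value.get() + "\n");
      }
      @Override
      public void close(TaskAttemptContext context) throws IOException {
        out.close();
      }
    };
  }
}

A job opts in with job.setOutputFormatClass(CsvOutputFormat.class).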
-
Uncovering mysteries of InputFormat: Providing better control for your Map Reduce execution.
In this article, authors Boris Lublinsky and Mike Segel show how to leverage a custom InputFormat class implementation to more tightly control the execution strategy of maps in Hadoop MapReduce jobs.
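For a flavor of the technique (a sketch under simple assumptions, not the authors' implementation), subclassing FileInputFormat lets a job dictate how input is split, and therefore how many map tasks run; here splitting is simply disabled so that each input file gets exactly one mapper:

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.LineRecordReader;

// Sketch: an InputFormat that turns off splitting, forcing one map task
// per input file regardless of HDFS block size.
public class WholeFileTextInputFormat extends FileInputFormat<LongWritable, Text> {

  @Override
  protected boolean isSplitable(JobContext context, Path file) {
    return false; // one split (and hence one mapper) per file
  }

  @Override
  public RecordReader<LongWritable, Text> createRecordReader(
      InputSplit split, TaskAttemptContext context) {
    return new LineRecordReader(); // reuse the standard line reader
  }
}

Richer control, such as combining many small files into a single split, comes from overriding getSplits() as well.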
-
Extending Oozie
In this article, the authors show how to leverage Oozie's extensibility to implement custom language extensions. This approach can be viewed as specializing the workflow language for a given company or line of business.
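As one concrete flavor of such an extension (a hedged sketch, not the authors' code), Oozie's workflow EL can be extended with plain static Java methods, registered under the oozie.service.ELService.ext.functions.* properties in oozie-site.xml; the class and function names below are hypothetical:

// Hypothetical EL extension: static methods exposed to workflow definitions.
// After registering e.g. "firstNotNull=com.example.oozie.CustomELFunctions#firstNotNull"
// in oozie-site.xml, a workflow can call ${firstNotNull(a, b)}.
public class CustomELFunctions {

  /** Returns the first non-null, non-empty argument (a common workflow idiom). */
  public static String firstNotNull(String a, String b) {
    return (a != null && !a.isEmpty()) ? a : b;
  }

  /** Normalizes a business-date parameter into a company-specific path fragment. */
  public static String datePath(String yyyyMMdd) {
    return yyyyMMdd.substring(0, 4) + "/" + yyyyMMdd.substring(4, 6)
        + "/" + yyyyMMdd.substring(6, 8);
  }
}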
-
Oozie by Example
An end-to-end Oozie example, including process design, resource coordinator, and workflow implementation.
-
Data Mining in the Swamp: Taming Unruly Data With Cloud Computing
Matrix presents a white paper on using the open-source tool Hadoop to implement a MapReduce and cloud computing strategy for solving business intelligence problems.
-
SOA Agents: Grid Computing meets SOA
Grid technology can improve scalability, high availability, and throughput in SOA implementations. In this article, Boris Lublinsky explains how grid computing can be used in the overall SOA architecture and introduces a programming model for grid utilization in service implementation. He also introduces an experimental grid implementation that can support this proposed architecture.