Hadoop Content on InfoQ

Articles

AI, ML & Data Engineering

Performance Tuning Techniques of Hive Big Data Table

In this article, author Sudhish Koloth discusses how to tackle performance problems when using Hive Big Data tables.

Sudhish Koloth
on Feb 05, 2021
AI, ML & Data Engineering

Peter Cnudde on How Yahoo Uses Hadoop, Deep Learning and Big Data Platform

Yahoo uses Hadoop for different use cases in big data & machine learning areas. They also use deep learning techniques in their products like Flickr. InfoQ spoke with Peter Cnudde on how Yahoo leverages big data platform technologies.

Srini Penchikala
on Oct 13, 2016
AI, ML & Data Engineering

Data Lake-as-a-Service: Big Data Processing and Analytics in the Cloud

Data Lake-as-a-Service solutions provide big data processing in the cloud for faster business outcomes in a very cost-effective way. InfoQ spoke with Lovan Chetty and Hannah Smalltree from the Cazena team about how Data Lake as a Service works.

Srini Penchikala
on Dec 10, 2015
AI, ML & Data Engineering

Oozie Plugin for Eclipse

The Oozie Eclipse plugin is a new tool for editing Apache Oozie workflows graphically inside Eclipse. With this plugin, developers can skip writing process definitions in HPDL, which are hard to develop and maintain. Instead, a process graph is defined graphically by placing process actions on a palette and connecting them. This article introduces the Oozie Eclipse plugin and provides an example of its usage.

Ahmed Mahran
on Oct 30, 2015
Big Data as a Service, an Interview with Google's William Vambenepe

Many of the Big Data technologies in common use originated at Google and have become popular open source platforms, but now Google is bringing an increasing range of big data services to market as part of its Google Cloud Platform. InfoQ caught up with Google's William Vambenepe, who is lead product manager for Big Data services, to ask him about the shift towards service-based consumption.

Chris Swan
on Jul 06, 2015
Java

Designing a Highly Available, Fault Tolerant, Hadoop Cluster with Data Isolation

As data grows exponentially, the modern Hadoop ecosystem provides not only a reliable distributed aggregation system that delivers data parallelism, but also analytics for great data insights. In this article Monica Beckwith, starting from core Hadoop components, investigates the design of a highly available, fault tolerant Hadoop cluster, adding security and data-level isolation.

Monica Beckwith
on Dec 16, 2014
Interview with Alex Holmes, author of “Hadoop in Practice. Second Edition”

The new “Hadoop in Practice. Second Edition” book by Alex Holmes provides deep insight into the Hadoop ecosystem, covering a wide spectrum of topics such as data organization, layouts and serialization, data processing (including MapReduce and big data patterns), special structures and their use in simplifying big data processing, and SQL on Hadoop data.

Boris Lublinsky
on Nov 20, 2014
Matt Schumpert on Datameer Smart Execution

Datameer, a big data analytics application for Hadoop, introduced Datameer 5.0 with Smart Execution to dynamically select the optimal compute framework at each step in the big data analytics process. InfoQ spoke with Matt Schumpert from the Datameer team about the new product and how it helps with big data analytics needs.

Srini Penchikala
on Nov 13, 2014
AI, ML & Data Engineering

Real-Time Stream Processing as Game Changer in a Big Data World with Hadoop and Data Warehouse

This article discusses what stream processing is, how it fits into a big data architecture with Hadoop and a data warehouse (DWH), when stream processing makes sense, and what technologies and products you can choose from.

Kai Wähner
on Sep 10, 2014
Nikita Ivanov on GridGain’s In-Memory Accelerator for Hadoop

GridGain recently announced the In-Memory Accelerator for Hadoop, offering the benefits of in-memory computing to Hadoop-based applications. It includes two components: an in-memory file system and a MapReduce implementation. InfoQ spoke with Nikita Ivanov, CTO of GridGain, about the architecture of the product.

Srini Penchikala
on Sep 08, 2014
Rich Reimer on SQL-on-Hadoop Databases and Splice Machine

SQL-on-Hadoop technologies include a SQL layer or a SQL database over Hadoop. These solutions have recently become popular because they address the data management issues of Hadoop and provide a scale-out alternative to traditional RDBMSs. InfoQ spoke with Rich Reimer, VP of Marketing and Product Management at Splice Machine, about the architecture and data patterns for SQL-on-Hadoop databases.

Srini Penchikala
on Jun 19, 2014
Lambda Architecture: Design Simpler, Resilient, Maintainable and Scalable Big Data Solutions

Lambda Architecture proposes a simpler, elegant paradigm designed to store and process large amounts of data. In this article, author Daniel Jebaraj presents the motivation behind the Lambda Architecture and reviews its structure with the help of a sample Java application.

Daniel Jebaraj
on Mar 12, 2014
