MapReduce Content on InfoQ

News

AI, ML & Data Engineering

Uber Open-Sourced Its Highly Scalable and Reliable Shuffle as a Service for Apache Spark

Uber Engineering recently open-sourced its highly scalable and reliable shuffle as a service for Apache Spark. Spark is one of the most important tools and platforms in data engineering and analytics. By default it shuffles data on local machines, which causes challenges at very large scale. Shuffle as a service is the solution Uber developed for this problem.

Reza Rahimi
on Aug 14, 2022
Cloud

Google Introduces Cloud Storage Connector for Hadoop Big Data Workloads

In a recent blog post, Google announced a new Cloud Storage connector for Hadoop. This new capability allows organizations to substitute Google Cloud Storage for their traditional HDFS. Columnar file formats such as Parquet and ORC may realize increased throughput, and customers will benefit from Cloud Storage directory isolation, lower latency, increased parallelization and intelligent defaults.

Kent Weare
on Sep 09, 2019
Cloud

Cloudera and Hortonworks Merge with Goal to Increase Competition with Cloud Offerings

Earlier this month, Cloudera and Hortonworks announced an all-stock merger at a combined value of around $5.2 billion. Analysts have argued that this merger is aimed at increased competition that both companies are facing from cloud vendors like Amazon, Google and Microsoft. In this article we log reactions from analysts and the industry, and the implications for current customers.

Alex Giamas
on Oct 31, 2018
Glenn Tamkin on Applying Apache Hadoop to NASA's Big Climate Data

The NASA Center for Climate Simulation (NCCS) is using Apache Hadoop for high-performance data analytics. Glenn Tamkin of the NASA team recently spoke at ApacheCon and shared details of the platform they built for climate data analysis with Hadoop.

Srini Penchikala
on May 06, 2015
Google Open Sources MapReduce Framework for C to Run Native Code in Hadoop

Last week Google announced the release of MR4C, an open source MapReduce framework for C that allows developers to run native code in the Hadoop framework. MR4C brings together the performance and flexibility of natively developed algorithms with the scalability and throughput provided by the Hadoop execution framework.

Srini Penchikala
on Feb 25, 2015
MapR-DB NoSQL Database Integrated into MapR Community Edition for Unlimited Production Use

MapR Technologies, provider of an Apache Hadoop distribution, has open sourced its MapR-DB NoSQL database for unlimited production use. MapR-DB is a wide-column NoSQL database with native integration with Hadoop and support for strong consistency and ACID transactions.

Srini Penchikala
on Dec 05, 2014
Big Data Analytics: Using Hunk with Hadoop and Elastic MapReduce

Hunk is a relatively new product from Splunk for exploring and visualizing Hadoop and other NoSQL data stores. New in this release is support for Amazon’s Elastic MapReduce.

Jonathan Allen
on Oct 07, 2014
Apache Drill Included in MapR Latest Distribution Release

MapR recently announced that it is including Apache Drill in the latest release of its distribution. Apache Drill is the open source version of Google’s Dremel, the infrastructure on which BigQuery is based. Drill offers a low-latency SQL-on-Hadoop interface. While this puts it in the same space as several other technologies around Hadoop, Drill has some unique characteristics setting it

Alex Giamas
on Sep 30, 2014
Hydra Takes On Hadoop

The social-networking company AddThis open-sourced Hydra under the Apache License, Version 2.0, in a recent announcement. Hydra grew from an in-house platform created to process semi-structured social data as live streams and to run efficient queries on those data sets.

Rags Srinivas
on Apr 11, 2014
Hazelcast Introduces MapReduce API

Hazelcast, an open source in-memory data grid solution, introduces a MapReduce API for its offering.

Michael Hausenblas
on Feb 18, 2014
AI, ML & Data Engineering

Twitter Open-Sources its MapReduce Streaming Framework Summingbird

Twitter has open sourced its MapReduce streaming framework, called Summingbird. Available under the Apache 2 license, Summingbird is a large-scale data processing system that lets developers uniformly execute code in batch mode (Hadoop/MapReduce-based), in stream mode (Storm-based), or in a combination of the two, called hybrid mode.

Michael Hausenblas
on Jan 16, 2014
New Education Opportunities for Data Scientists

2013 has been rich in announcements for new programs, degrees and grants for aspiring data scientists and Big Data practitioners.

Charles Menguy
on Jan 14, 2014
Hadoop Jobs on GPU with ParallelX

The MapReduce paradigm is not always ideal when dealing with large computationally intensive algorithms. A small team of entrepreneurs is building a product called ParallelX to solve that bottleneck by harnessing the power of GPUs to give Hadoop jobs a significant boost.

Charles Menguy
on Dec 26, 2013
Elastic Mesos service automates Mesos cluster deployment in EC2

EC2 users can now automate the deployment of Apache Mesos, an open-source tool to share cluster resources between multiple data processing frameworks, at scale through a new web service called Elastic Mesos provided by Big Data startup Mesosphere.

Charles Menguy
on Dec 17, 2013
Apache Tez - a Generalization of the MapReduce Data Processing

A new Apache incubator project, Tez, generalizes the MapReduce paradigm to execute a complex DAG (directed acyclic graph) of tasks.

Boris Lublinsky
on Sep 20, 2013
