Run Your Own Google Style Computing Cluster with Hadoop and Amazon EC2

Clustered grid computing software does not simply happen; efficient architectures must be designed. One of the core technologies used by Google is the MapReduce programming model, which allows for the processing and generation of large data sets. By defining a scalable program structure up front, MapReduce allows algorithms to scale easily across machines:

Our implementation of MapReduce runs on a large cluster of commodity machines and is highly scalable: a typical MapReduce computation processes many terabytes of data on thousands of machines. Programmers find the system easy to use: hundreds of MapReduce programs have been implemented and upwards of one thousand MapReduce jobs are executed on Google's clusters every day.
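To make the shape of the model concrete, here is a minimal single-machine sketch of the map, group-by-key, and reduce phases for a word count. This is plain Java, not Google's or Hadoop's API; all names are illustrative. On a real cluster the framework would distribute each phase across many machines, but the program structure stays the same:

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Single-machine illustration of the MapReduce dataflow for a word count
public class MapReduceSketch {
  public static void main(String[] args) {
    String[] inputLines = { "to be or not to be", "to see or not to see" };

    // Map phase: emit an intermediate (word, 1) pair for every token
    List<Map.Entry<String, Integer>> intermediate =
        new ArrayList<Map.Entry<String, Integer>>();
    for (String line : inputLines) {
      for (String word : line.split("\\s+")) {
        intermediate.add(new AbstractMap.SimpleEntry<String, Integer>(word, 1));
      }
    }

    // Shuffle phase: group intermediate values by key
    // (handled transparently by the framework on a real cluster)
    Map<String, List<Integer>> grouped = new TreeMap<String, List<Integer>>();
    for (Map.Entry<String, Integer> pair : intermediate) {
      List<Integer> values = grouped.get(pair.getKey());
      if (values == null) {
        values = new ArrayList<Integer>();
        grouped.put(pair.getKey(), values);
      }
      values.add(pair.getValue());
    }

    // Reduce phase: sum the values collected under each key
    for (Map.Entry<String, List<Integer>> entry : grouped.entrySet()) {
      int sum = 0;
      for (Integer count : entry.getValue()) {
        sum += count;
      }
      System.out.println(entry.getKey() + "\t" + sum);
    }
  }
}
```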

Doug Cutting, the creator of Lucene and now an employee of Yahoo, has been working on Hadoop, an open source implementation of MapReduce written in Java that also includes a distributed file system. Hadoop has already been tested on clusters of up to 600 nodes.

Hadoop is a framework for running applications on large clusters of commodity hardware. The Hadoop framework transparently provides applications both reliability and data motion. Hadoop implements a computational paradigm named map/reduce, where the application is divided into many small fragments of work, each of which may be executed or reexecuted on any node in the cluster. In addition, it provides a distributed file system that stores data on the compute nodes, providing very high aggregate bandwidth across the cluster. Both map/reduce and the distributed file system are designed so that node failures are automatically handled by the framework.
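As an illustration of how an application is expressed against the framework, below is a sketch of the canonical word-count job written against Hadoop's Java MapReduce API (the org.apache.hadoop.mapred package; exact class and method names vary between Hadoop releases, so treat this as a sketch rather than a definitive listing). The framework splits the input across the cluster, runs the mapper on each fragment, groups the intermediate pairs by key, and feeds each group to the reducer:

```java
import java.io.IOException;
import java.util.Iterator;
import java.util.StringTokenizer;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

public class WordCount {

  // Map: emit (word, 1) for every token in a line of input
  public static class WordMapper extends MapReduceBase
      implements Mapper<LongWritable, Text, Text, IntWritable> {
    private final static IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    public void map(LongWritable key, Text value,
                    OutputCollector<Text, IntWritable> output,
                    Reporter reporter) throws IOException {
      StringTokenizer tokenizer = new StringTokenizer(value.toString());
      while (tokenizer.hasMoreTokens()) {
        word.set(tokenizer.nextToken());
        output.collect(word, ONE);
      }
    }
  }

  // Reduce: sum the counts the framework has grouped under each word
  public static class SumReducer extends MapReduceBase
      implements Reducer<Text, IntWritable, Text, IntWritable> {
    public void reduce(Text key, Iterator<IntWritable> values,
                       OutputCollector<Text, IntWritable> output,
                       Reporter reporter) throws IOException {
      int sum = 0;
      while (values.hasNext()) {
        sum += values.next().get();
      }
      output.collect(key, new IntWritable(sum));
    }
  }

  public static void main(String[] args) throws IOException {
    JobConf conf = new JobConf(WordCount.class);
    conf.setJobName("wordcount");

    conf.setOutputKeyClass(Text.class);
    conf.setOutputValueClass(IntWritable.class);

    conf.setMapperClass(WordMapper.class);
    conf.setCombinerClass(SumReducer.class);
    conf.setReducerClass(SumReducer.class);

    // Input and output are paths in the distributed file system
    FileInputFormat.setInputPaths(conf, new Path(args[0]));
    FileOutputFormat.setOutputPath(conf, new Path(args[1]));

    JobClient.runJob(conf);
  }
}
```

The same job definition runs whether the cluster has one node or hundreds; scaling out is a matter of cluster configuration rather than program structure.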

Amazon recently released its Elastic Compute Cloud (EC2), which allows developers to acquire computing power at a rate of $0.10 per hour consumed. Work has also been done recently to allow Hadoop to run on EC2. The combination lets developers write scalable algorithms, bring up large numbers of servers to run them, and then shut those servers down when they are no longer needed.
