Siva's Blog...: Why Hadoop? The power of Hadoop and Who uses it?

~-Scribbles by Sivananda Hanumanthu

Friday, December 23, 2011

Why Hadoop? The power of Hadoop and Who uses it?

Hadoop is a framework for running applications on large clusters built of commodity hardware. The Hadoop framework transparently provides applications with both reliability and data motion. Hadoop implements a computational paradigm named Map/Reduce, where the application is divided into many small fragments of work, each of which may be executed or re-executed on any node in the cluster. In addition, it provides a distributed file system (HDFS) that stores data on the compute nodes, providing very high aggregate bandwidth across the cluster. Both Map/Reduce and the distributed file system are designed so that node failures are automatically handled by the framework.
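
To make the "executed or re-executed on any node" idea a bit more concrete, here is a toy sketch in plain Python. It is not Hadoop's scheduler or API: the names run_job, run_on_node and NodeFailure, and the idea of passing in a set of failed nodes, are all made up for illustration. It only shows that when fragments are independent, a failed fragment can simply be retried somewhere else.

```python
from typing import Callable, Dict, List, Set


class NodeFailure(Exception):
    """Raised when a simulated node cannot finish its fragment (hypothetical)."""


def run_on_node(node: str, failed_nodes: Set[str],
                work: Callable[[int], int], fragment: int) -> int:
    # Pretend to ship the fragment to a node; nodes in failed_nodes always fail.
    if node in failed_nodes:
        raise NodeFailure(node)
    return work(fragment)


def run_job(fragments: List[int], nodes: List[str], failed_nodes: Set[str],
            work: Callable[[int], int]) -> Dict[int, int]:
    """Run every fragment, re-executing it on another node after a failure."""
    results: Dict[int, int] = {}
    for index, fragment in enumerate(fragments):
        # Start at a different node per fragment, then walk the ring on failure.
        for attempt in range(len(nodes)):
            node = nodes[(index + attempt) % len(nodes)]
            try:
                results[fragment] = run_on_node(node, failed_nodes, work, fragment)
                break  # fragment finished, move on to the next one
            except NodeFailure:
                continue  # re-execute the same fragment on the next node
        else:
            # Every node was tried and none of them could run the fragment.
            raise RuntimeError(f"no healthy node could run fragment {fragment!r}")
    return results
```

For example, run_job([1, 2, 3, 4], ["node-a", "node-b", "node-c"], {"node-b"}, lambda n: n * n) still returns a result for every fragment, even though "node-b" never completes anything. The caller never has to know which node failed, which is the point of the framework handling failures for you.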

Map/Reduce is a programming paradigm popularized by Google, wherein a task is divided into small portions and distributed to a large number of nodes for processing (map), and the results are then summarized into the final answer (reduce). Google and Yahoo use this for their search engine technology, among other things.
Hadoop is a generic framework for implementing this kind of processing scheme. As for why it kicks ass, mostly because it provides neat features such as fault tolerance and lets you bring together pretty much any kind of hardware to do the processing. It also scales extremely well, provided your problem fits the paradigm.
You can read all about it on the Apache Hadoop website (http://hadoop.apache.org/).
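
If the map and reduce steps still feel abstract, the classic word count may help. The sketch below is plain single-process Python, not the Hadoop API; the function names (map_phase, shuffle, reduce_phase, word_count) are my own, and the "shuffle" step that groups values by key is something a real framework does for you between the two phases.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def map_phase(document: str) -> List[Tuple[str, int]]:
    """Map: turn one document into (word, 1) pairs, independently of all others."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group all values that share a key, so each key can be reduced on its own."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: summarize every value seen for one word into a single count."""
    return key, sum(values)


def word_count(documents: Iterable[str]) -> Dict[str, int]:
    # In a real cluster each map_phase call could run on a different node.
    mapped = [pair for document in documents for pair in map_phase(document)]
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())
```

Calling word_count(["Hadoop scales", "hadoop stores data"]) counts "hadoop" twice and every other word once. Because map_phase only ever looks at one document, nothing stops the documents from being spread across as many machines as you have.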
As for some examples, here are a few that are not so web-centric:

Rendering a 3D film. The "map" step distributes the geometry for every frame to a different node, the nodes render it, and the rendered frames are recombined in the "reduce" step.
Computing the energy in a system in a molecular model. Each frame of a system trajectory is distributed to a node in the "map" step. The nodes compute the energy for each frame, and then the results are summarized in the "reduce" step.
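
The molecular model example can be sketched in a few lines of Python. Everything specific here is an assumption for illustration: a frame is represented as a list of (mass, speed) pairs, the "energy" is just kinetic energy (0.5 * m * v^2), and the summary is a total, mean and maximum. A real simulation would use its own energy function; the shape of the computation is what matters.

```python
from typing import Dict, List, Tuple

# Hypothetical data layout: one frame is a list of (mass, speed) particles.
Frame = List[Tuple[float, float]]


def frame_energy(frame: Frame) -> float:
    """Map step: compute the energy of one frame, independent of other frames."""
    return sum(0.5 * mass * speed * speed for mass, speed in frame)


def summarize(energies: List[float]) -> Dict[str, float]:
    """Reduce step: combine the per-frame energies into one summary."""
    total = sum(energies)
    return {
        "total": total,
        "mean": total / len(energies),
        "max": max(energies),
    }


def trajectory_energy(trajectory: List[Frame]) -> Dict[str, float]:
    # Each frame_energy call could run on a different node in a cluster.
    energies = list(map(frame_energy, trajectory))
    return summarize(energies)
```

Since no frame needs to know anything about any other frame, the map step parallelizes trivially, and only the small list of per-frame energies has to travel to the reduce step.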

Essentially, the model works very well for a problem that can be broken down into similar discrete computations that are completely independent and can be recombined to produce a final result.

References:

When Should You Use Hadoop?
http://www.readwriteweb.com/cloud/2011/01/when-should-you-use-hadoop.php

Employees From Yahoo, Google, And Facebook Are Flocking To These Start-Ups:
http://www.businessinsider.com/employees-from-yahoo-google-and-facebook-are-flocking-to-these-start-ups-2011-12

Apache Hadoop:
http://hadoop.apache.org/
http://wiki.apache.org/hadoop/