My latest piece of work for GigaOM Pro just went live. "Scaling Hadoop clusters: the role of cluster management" is available to GigaOM Pro subscribers, and was underwritten by StackIQ.

Thanks to everyone who took the time to speak with me during the preparation of this report.

As the blurb describes,

From Facebook to Johns Hopkins University, organizations are coping with the challenge of processing unprecedented volumes of data. It is possible to manually build, run and maintain a large cluster and to use it to run applications such as Hadoop. However, many of the processes involved are repetitive, time-consuming and error-prone. So IT managers (and companies like IBM and Dell) are increasingly turning to cluster-management solutions capable of automating a wide range of tasks associated with cluster creation, management and maintenance.

This report provides an introduction to Hadoop and then turns to more-complicated matters like ensuring efficient infrastructure and exploring the role of cluster management. Also included is an analysis of different cluster-management tools, from Rocks to Apache Ambari, and how to integrate them with Hadoop.

Compulsory picture of an elephant, as it’s a Hadoop story, provided by Flickr user Brian Snelson.