The Basics of Administering a Hadoop Cluster

So, assuming you followed and completed my first post, Getting Started with Hortonworks Data Platform 2.3, you should now have your very own Hadoop cluster (albeit one that pales slightly next to Yahoo!’s reported 4,500-node cluster).

Still, you’ve taken a very big step towards learning about Hadoop and how to use it effectively. Well done!

We left off in the previous post looking at a webpage in your browser: the front end of Ambari.

Hopefully it looks something like this:


Ambari is a very important source of information for everything about your cluster. The center tiles tell you at a glance many of the most important aspects of your cluster, including available disk space, nodes available, and cluster uptime. Listed on the left are the Services installed on your cluster. Services are individual components in the Hadoop ecosystem that provide a great number of capabilities beyond the distributed file system Hadoop is best known for.
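If you'd rather see those dashboard numbers from a terminal, the same HDFS capacity figures are available from the command line once you SSH into the sandbox. A minimal sketch, assuming the standard `hdfs` superuser account that ships with the sandbox image:

```shell
# Print an HDFS capacity report: configured capacity, DFS used/remaining,
# and per-DataNode details (on the single-node sandbox, just the one DataNode).
# Run as the hdfs superuser, since regular users may not have permission.
sudo -u hdfs hdfs dfsadmin -report
```

This is the same information the disk-usage tile summarizes, just in raw form.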

Take a moment to click around and investigate what these are. Some of the essentials are HDFS, MapReduce2, YARN, Hive, and Pig. Of course there are many more listed (and even more available out in the wilds of the Internet), but these are the ones included with the Hortonworks Data Platform.

Service health is indicated by the color next to the name: green indicates good health, yellow indicates warnings, and red indicates errors or alerts. A first aid kit icon indicates that the service is stopped and in maintenance mode.

To start, stop, restart, or enter maintenance mode for any service, select it on the left, then click the ‘Service Actions’ drop-down menu at the far right of the service’s page, circled below.
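Everything that drop-down menu does goes through Ambari's REST API, so you can script it too. A sketch, assuming the HDP sandbox defaults: Ambari on localhost:8080, credentials admin/admin, and a cluster named "Sandbox" (yours may differ; check the URL in your browser):

```shell
AMBARI="http://localhost:8080/api/v1"
CLUSTER="Sandbox"

# Stop HDFS by setting its desired state to INSTALLED.
# The X-Requested-By header is required by Ambari for write operations.
curl -s -u admin:admin -H 'X-Requested-By: ambari' -X PUT \
  -d '{"RequestInfo":{"context":"Stop HDFS via REST"},"Body":{"ServiceInfo":{"state":"INSTALLED"}}}' \
  "$AMBARI/clusters/$CLUSTER/services/HDFS"

# Start it again by setting the desired state back to STARTED.
curl -s -u admin:admin -H 'X-Requested-By: ambari' -X PUT \
  -d '{"RequestInfo":{"context":"Start HDFS via REST"},"Body":{"ServiceInfo":{"state":"STARTED"}}}' \
  "$AMBARI/clusters/$CLUSTER/services/HDFS"
```

Each call returns a request object you can poll to watch the operation's progress, just like the progress bars in the UI.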


Installing a new service is as easy as selecting the ‘Actions’ drop-down menu below all the listed Services and following the installation wizard. I’ll go through that in a later post, however.

Next we’ll look at the other tabs available in Ambari: Hosts, Alerts, and Admin.

The Hosts tab shows all the nodes, both worker and master, that are connected to your cluster, along with their health (same indicators as before), name, IP address, server rack, cores, RAM, disk usage, load averages, installed Hadoop version, and finally all the components installed on that particular node. In our cluster, of course, there is only the one node to see here:
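The Hosts tab is also backed by the REST API, which is handy once a cluster grows past what you want to eyeball. A sketch, again assuming the sandbox defaults (admin/admin on localhost:8080; `sandbox.hortonworks.com` is the sandbox image's default hostname, so substitute your own):

```shell
AMBARI="http://localhost:8080/api/v1"

# List every host Ambari knows about.
curl -s -u admin:admin "$AMBARI/hosts"

# Drill into one host, requesting just the fields the Hosts tab displays:
# core count, total memory, and IP address.
curl -s -u admin:admin \
  "$AMBARI/hosts/sandbox.hortonworks.com?fields=Hosts/cpu_count,Hosts/total_mem,Hosts/ip"
```

The `fields=` parameter trims the response down to only the attributes you ask for, which keeps the JSON readable.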


The Alerts tab gives you a play-by-play of the things that go wrong in your cluster, like process failures, node failures, or critically low disk space. It’ll tell you what went wrong, when it went wrong, which service it’s associated with, and what its current state is. We get some alerts from when Hortonworks first established this virtual machine image. In my case, this happened 15 days ago. You can also see that those alerts have been resolved. Thank you, Hortonworks.
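Alerts are exposed over the REST API as well, which is useful if you want to feed them into a monitoring script. A sketch under the same sandbox assumptions (cluster named "Sandbox", admin/admin credentials):

```shell
AMBARI="http://localhost:8080/api/v1"
CLUSTER="Sandbox"

# Pull the current alerts for the cluster, keeping just the state,
# the human-readable label, and the alert text.
curl -s -u admin:admin \
  "$AMBARI/clusters/$CLUSTER/alerts?fields=Alert/state,Alert/label,Alert/text"
```

Filtering on `Alert/state` in a cron job is a simple way to get notified of CRITICAL alerts without watching the tab.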


The Admin tab doesn’t let you do much, but it’s still full of information that’s very important in administering a Hadoop cluster, particularly if you have other users running about inside of it (we’ll simulate this later). It tells you which services you have installed and which versions of those services are running. That’s nice to know when someone asks, “Hey, what version of Hive are we running?” because they need to know whether a bug fix was implemented or not.
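You can answer that version question from the sandbox shell, too. The individual component commands report their own versions, and on HDP specifically, the `hdp-select` utility shows which HDP release each component points at:

```shell
# Ask each component directly for its version.
hadoop version      # Hadoop build version and compilation details
hive --version      # Hive release version
pig --version       # Pig release version

# HDP-specific: list the HDP stack version(s) installed on this node.
hdp-select versions
```

Handy when you're SSHed in anyway and don't want to click back to the Admin tab.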


Also on this tab is a thing called Kerberos. Kerberos is a useful security feature in Hadoop; it was essentially the first and only security feature available to a Hadoop cluster other than basic user authentication. Unfortunately, implementing it is rather complicated and outside the scope of this post. Worse, if it’s done incorrectly, your entire cluster can end up locked down, leaving you to either pray to the Hadoop gods for mercy or start over completely from scratch. The latter is not a good idea in a production environment.

This post gave a good overview of what Ambari is and how it helps you administer a cluster on a basic level. The next post will involve us actually getting our hands dirty and typing things on the command line. Exciting!

Until then, questions or comments below please.