This article provides a step-by-step guide to installing Hadoop on the Ubuntu operating system, covering prerequisites, VMware Player setup, Java JDK and SSH installation, and Hadoop download, installation, and configuration.
Introduction
In my previous article, I gave an overview of Big Data and Hadoop. In this article, I will show you how to install Hadoop (as a single node cluster) on the Ubuntu operating system. Windows users can also follow this article to install Ubuntu in a virtual machine and get a flavor of Hadoop. :)
Prerequisites of Hadoop
- JDK: The Java Development Kit (JDK) is a software development environment used for developing Java applications and applets. It includes the Java Runtime Environment (JRE), an interpreter/loader (java), a compiler (javac), an archiver (jar), a documentation generator (javadoc) and other tools needed in Java development. Since the Hadoop framework is written in Java, it requires the JDK.
- SSH: SSH ("Secure Shell") is a protocol for securely accessing one computer from another. Despite the name, SSH allows you to run command-line and graphical programs, transfer files, and even create secure virtual private networks over the Internet. Hadoop uses SSH to log in and start its daemons, even on a single-node cluster, which is why passwordless SSH to localhost must be set up.
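Before you begin, you can quickly check whether either prerequisite is already present. Both commands are standard on Ubuntu and simply print the installed version (or an error if the tool is missing):

java -version   # prints the installed Java version, if any
ssh -V          # prints the installed OpenSSH version, if any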
Install VMWare Player and Ubuntu Operating System
This step is for Windows users only. If you already have an Ubuntu system installed, skip this step and start from the "Install Java 8 JDK" section.
Install Java 8 JDK
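If you prefer the terminal, OpenJDK 8 can be installed directly from Ubuntu's standard repositories. A minimal sketch:

sudo apt-get update                  # refresh the package index
sudo apt-get install openjdk-8-jdk   # install the Java 8 compiler and runtime
java -version                        # verify the installation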
Setting JAVA_HOME Variable
- Run this command to get JDK path:
update-alternatives --config java
The output shows that the JDK is installed under the “/usr/lib/jvm/java-8-openjdk-amd64” path.
- Edit the environment variables file by typing the following command (root privileges are needed to write to /etc/environment):
sudo gedit /etc/environment
- This will open an editor. Add the following line to the end of the file:
JAVA_HOME="/usr/lib/jvm/java-8-openjdk-amd64"
- Click on “Save” and close the window.
- Run this command to load the file into the current shell and check that it is error free:
source /etc/environment
- Run this command to check if the JAVA_HOME variable has been added properly:
echo $JAVA_HOME
Installing SSH
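A minimal command-line sketch of this step: install the OpenSSH packages, generate a key pair with an empty passphrase, and authorize it for localhost, so that Hadoop can start its daemons without prompting for a password.

sudo apt-get install ssh                         # installs the OpenSSH client and server
ssh-keygen -t rsa -P "" -f ~/.ssh/id_rsa         # generate a key pair with an empty passphrase
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys  # authorize the key for localhost logins
ssh localhost                                    # should now log in without a password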
Download Hadoop
Download Hadoop version 2.7.3 from the Apache Hadoop downloads page.
Click on the binary link for version 2.7.3:
- Click on the link marked in red to download the file. This will open a dialog. Select the “Save File” option and click on the “Save” button.
- This will start downloading the file.
- The file will be saved in the default download location set in the browser.
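Alternatively, the release can be fetched from the terminal. The URL below points at the Apache release archive and was correct at the time of writing; verify it against the Hadoop downloads page if it has moved:

wget https://archive.apache.org/dist/hadoop/common/hadoop-2.7.3/hadoop-2.7.3.tar.gz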
Installing Hadoop
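In short, installing Hadoop here just means unpacking the downloaded archive into your home directory. A sketch, assuming the browser saved the tarball to ~/Downloads:

cd ~
tar -xzf ~/Downloads/hadoop-2.7.3.tar.gz   # creates the hadoop-2.7.3 directory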
Configuring Hadoop
- In Terminal, log in as root using the following command. Use the same password you set when installing Ubuntu:
sudo su
- Run this command to edit “.bashrc” file:
gedit ~/.bashrc
- This will open an editor. Add the following lines to the end of the file. Replace <JAVA_PATH> and <HADOOP_HOME_PATH> with the appropriate paths:
#HADOOP VARIABLES START
export JAVA_HOME=<JAVA_PATH>
export PATH=${JAVA_HOME}/bin:${PATH}
export HADOOP_INSTALL=<HADOOP_HOME_PATH>
export PATH=$PATH:$HADOOP_INSTALL/bin
export PATH=$PATH:$HADOOP_INSTALL/sbin
export HADOOP_CLASSPATH=${JAVA_HOME}/lib/tools.jar
export HADOOP_MAPRED_HOME=$HADOOP_INSTALL
export HADOOP_COMMON_HOME=$HADOOP_INSTALL
export HADOOP_HDFS_HOME=$HADOOP_INSTALL
export YARN_HOME=$HADOOP_INSTALL
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_INSTALL/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_INSTALL/lib"
#HADOOP VARIABLES END
- In my case, <JAVA_PATH> is /usr/lib/jvm/java-8-openjdk-amd64 and <HADOOP_HOME_PATH> is /home/fazlur/hadoop-2.7.3.
- Save and close the editor.
- Run the following command to reload the .bashrc file and check that it is error free:
source ~/.bashrc
- Change into the “hadoop-2.7.3/etc/hadoop” directory by running the following command:
cd <HADOOP PATH>
In my case, it is:
cd /home/fazlur/hadoop-2.7.3/etc/hadoop
- Edit “hadoop-env.sh” file using the following command:
gedit hadoop-env.sh
- This will open an editor. Append this line to the end of the file. Save and close the editor.
export JAVA_HOME=<Your Java Path>
In my case, it looks like this:
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
- Run the following command to load hadoop-env.sh and check that it is error free:
source hadoop-env.sh
- Make a directory called “hadoop_store” in the same directory where hadoop-2.7.3 exists, and change into it. Run the following commands to do that:
cd <HOME PATH>
mkdir hadoop_store
cd hadoop_store
- In my case, it is:
cd /home/fazlur
- Make a directory called “hdfs” and get into it. Run these commands to do that:
mkdir hdfs
cd hdfs
- Make two directories called “namenode” and “datanode” inside the “hdfs” directory. Run these commands to do that; the resulting structure is hadoop_store/hdfs/namenode and hadoop_store/hdfs/datanode:
mkdir namenode
mkdir datanode
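Equivalently, the whole directory tree can be created in a single command with mkdir -p, which creates any missing parent directories along the way:

mkdir -p ~/hadoop_store/hdfs/namenode ~/hadoop_store/hdfs/datanode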
- Get back into the “hadoop-2.7.3/etc/hadoop” directory by running the following command:
cd <HADOOP PATH>
In my case, it is:
cd /home/fazlur/hadoop-2.7.3/etc/hadoop
- Edit “hdfs-site.xml” by running the following command. This will open an editor:
gedit hdfs-site.xml
- Append the following lines between the <configuration></configuration> tags. Replace <NAMENODE_FOLDER_PATH> and <DATANODE_FOLDER_PATH> with the appropriate paths:
<property>
  <name>dfs.replication</name>
  <value>1</value>
  <description>Default block replication.
  The actual number of replications can be specified when the file is created.
  The default is used if replication is not specified in create time.
  </description>
</property>
<property>
  <name>dfs.namenode.name.dir</name>
  <value>file:<NAMENODE_FOLDER_PATH></value>
</property>
<property>
  <name>dfs.datanode.data.dir</name>
  <value>file:<DATANODE_FOLDER_PATH></value>
</property>
- In my case, <NAMENODE_FOLDER_PATH> is /home/fazlur/hadoop_store/hdfs/namenode and <DATANODE_FOLDER_PATH> is /home/fazlur/hadoop_store/hdfs/datanode.
- Save and close the editor.
- Get into the “hadoop-2.7.3” folder and create a directory called “tmp”. The following commands do this:
cd <hadoop-2.7.3 path>
mkdir tmp
In my case:
cd /home/fazlur/hadoop-2.7.3
mkdir tmp
- Edit “core-site.xml” file using the following command:
gedit core-site.xml
- This will open an editor. Append the following lines between the <configuration></configuration> tags. Replace <TMP_FOLDER_PATH> with the appropriate path:
<property>
  <name>hadoop.tmp.dir</name>
  <value><TMP_FOLDER_PATH></value>
  <description>A base for other temporary directories.</description>
</property>
<property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:54310</value>
  <description>The name of the default file system. A URI whose
  scheme and authority determine the FileSystem implementation. The
  uri's scheme determines the config property (fs.SCHEME.impl) naming
  the FileSystem implementation class. The uri's authority is used to
  determine the host, port, etc. for a filesystem.</description>
</property>
- In my case, <TMP_FOLDER_PATH> is /home/fazlur/hadoop-2.7.3/tmp.
- Save and close the editor.
- Run the following command to create “mapred-site.xml” file using “mapred-site.xml.template” template:
cp mapred-site.xml.template mapred-site.xml
- Edit “mapred-site.xml” using the following command:
gedit mapred-site.xml
- This will open an editor. Append the following lines between the <configuration></configuration> tags:
<property>
  <name>mapred.job.tracker</name>
  <value>localhost:54311</value>
  <description>The host and port that the MapReduce job tracker runs
  at. If "local", then jobs are run in-process as a single map
  and reduce task.
  </description>
</property>
- Save and close the editor.
- Go to your home directory by executing the “cd” command.
- Format the Hadoop file system by running the following command:
hadoop namenode -format
- Restart your machine.
- Open the terminal and switch to the superuser with “sudo su”.
- Run this command to start Hadoop:
start-all.sh
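Note that start-all.sh is flagged as deprecated in Hadoop 2.x; it still works, but the equivalent is to start HDFS and YARN separately:

start-dfs.sh    # starts NameNode, DataNode and SecondaryNameNode
start-yarn.sh   # starts ResourceManager and NodeManager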
- Run this command to check if all the services have been started:
jps
- If the jps output shows that the NameNode service is not running (as it did in my case), follow these steps to get it working:
  - Restart your machine.
  - Open the terminal and switch to the superuser with “sudo su”.
  - Type “cd” to move to your home directory.
  - Execute “hadoop namenode -format” to format the Hadoop file system.
  - Execute “start-all.sh” to start all services.
  - Execute “jps” to check that all the services have been started.
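For reference, a healthy single-node setup typically shows the following six processes in the jps output. The process IDs below are only illustrative and will differ on your machine:

2864 NameNode
3012 DataNode
3317 SecondaryNameNode
3472 ResourceManager
3602 NodeManager
3955 Jps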
- Now open your favourite browser and type the following URL:
http://localhost:8088
- If everything is up and running, this opens the YARN ResourceManager web UI, which shows the cluster status.
- Type the following URL to check the datanodes as well as browse the Hadoop file system:
http://localhost:50070
- This opens the NameNode web UI, which shows an overview of the file system and the live datanodes.
- Navigate to “Utilities --> Browse the file system” to check the Hadoop file system.
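You can also inspect the Hadoop file system from the terminal instead of the browser; for example:

hadoop fs -ls /   # list the root of HDFS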
Conclusion
I hope you enjoyed reading this article and have a successful installation of Hadoop on your Ubuntu system. In my next articles, I will explain the different components of Hadoop in detail.
Thank you for reading my article, and keep in touch.
History
- 26th January, 2017: Initial version