
Hadoop Beginners Guide - How to Install

22 May 2022 | CPOL | 7 min read
Step-by-step procedure to install Hadoop 2.7.3 on the Ubuntu 16.04 operating system
This article provides a step-by-step guide to installing Hadoop on the Ubuntu operating system, covering the prerequisites, VMware Player setup, Java JDK and SSH installation, and downloading, installing, and configuring Hadoop.

Introduction

In my previous article, I gave an overview of Big Data and Hadoop. In this article, I will show you how to install Hadoop (single node cluster) on the Ubuntu operating system. Windows users can also follow this article to install Ubuntu in a virtual machine and get the flavor of Hadoop. :)

Prerequisites of Hadoop

  • JDK: The Java Development Kit (JDK) is a software development environment used for developing Java applications and applets. It includes the Java Runtime Environment (JRE), an interpreter/loader (java), a compiler (javac), an archiver (jar), a documentation generator (javadoc) and other tools needed in Java development. Since the Hadoop framework is written in Java, it requires the JDK.
  • SSH: SSH ("Secure SHell") is a protocol for securely accessing one computer from another. Despite the name, SSH allows you to run command line and graphical programs, transfer files, and even create secure virtual private networks over the Internet.

Install VMware Player and Ubuntu Operating System

This step is for Windows users only. If you already have an Ubuntu system installed, skip this step and start from the step "Install Java 8 JDK".

  • Download VMware Player from here
  • Install VMware Player
  • Download Ubuntu from here
  • Open VMware Player

    Image 1

  • Click on “Create a New Virtual Machine” which opens the following screen:

    Image 2

  • Choose option “I will install the operating system later” and click on “Next” button which opens the following screen:

    Image 3

  • Choose option “Linux”, select “Ubuntu 64-bit” from the version drop-down list and click on the “Next” button to go to the next screen:

    Image 4

  • Enter the name of virtual machine, set the location and click on “Next” button to go to the next screen:

    Image 5

  • Set maximum disk size as 40 GB if you have enough disk space, choose option “Store virtual disk as a single file” and click on “Next” button which navigates to the next screen:

    Image 6

  • Click on Customize Hardware if you have more than 4GB RAM:

    Image 7

  • Select 2GB RAM and click on the “Close” button. Then click on the “Finish” button.

    Image 8

  • Click on “Edit virtual machine settings”:

    Image 9

  • Click on “CD/DVD (SATA)” hardware, choose option “Use ISO image file” and browse to the Ubuntu ISO file. Click “OK” to close this window.

  • Click on “Play Virtual Machine”. This will start installing the Ubuntu operating system. Follow the step-by-step procedure and finish the installation.

Install Java 8 JDK

  • Log in to the Ubuntu machine
  • Open Terminal by pressing Ctrl+Alt+T
  • Switch to the root user ("su", super user) using the following command. Enter the password you set when installing Ubuntu:
    sudo su
  • Type "cd" (change directory) and press Enter to move to the root user's home directory (/root):
    cd
    Image 10
  • Type the following command and press Enter:
    apt-get install openjdk-8-jdk 

    Image 11

  • This will ask for a confirmation. Type Y and press Enter:

    Image 12

  • This will take some time to complete. Execute “clear” command to clear the screen:
    clear
  • Execute the following command to see if JDK is installed successfully:
    java -version
    javac -version

    Image 13
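
The version check above can also be scripted. The following is a minimal sketch, assuming a POSIX shell; the helper name `require_tool` is hypothetical and not part of Java or Hadoop. It only reports whether a command is available on the PATH.

```shell
# require_tool NAME: succeed if NAME is found on the PATH, otherwise
# print a message to stderr and return a non-zero status.
# (Hypothetical helper for illustration only.)
require_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1 found at $(command -v "$1")"
    else
        echo "$1 not found on PATH" >&2
        return 1
    fi
}

# Check the two JDK commands used in this article.
for tool in java javac; do
    require_tool "$tool" || echo "Install the JDK before continuing."
done
```

If either command is reported as missing, repeating the apt-get step above is the likely fix.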

Setting JAVA_HOME Variable

  • Run this command to get JDK path:
    update-alternatives --config java

    Image 14
    So the JDK is installed in the “/usr/lib/jvm/java-8-openjdk-amd64” path.

  • Edit environment variables by typing the following command:
    gedit /etc/environment
  • This will open an editor. Add the following line to the end of the editor:
    JAVA_HOME="/usr/lib/jvm/java-8-openjdk-amd64"

    Image 15

  • Click on “Save” and close the window.

  • Run this command to load the edited file into the current shell. It also reveals any syntax error in the file:

    source /etc/environment
  • Run this command to check if JAVA_HOME variable has been added properly:
    echo $JAVA_HOME

    Image 16
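
As an alternative to reading the JDK path off the update-alternatives screen, the path can be derived from the location of javac. This is a sketch under two assumptions: `readlink -f` is available (it is part of GNU coreutils, which Ubuntu ships), and the helper name `jdk_home_from_javac` is hypothetical.

```shell
# jdk_home_from_javac PATH: strip the trailing /bin/javac from a resolved
# javac path to obtain the JDK directory. (Hypothetical helper.)
jdk_home_from_javac() {
    printf '%s\n' "${1%/bin/javac}"
}

# Resolve the javac symlink chain, then print the JDK directory.
if command -v javac >/dev/null 2>&1; then
    jdk_home_from_javac "$(readlink -f "$(command -v javac)")"
else
    echo "javac is not installed yet" >&2
fi
```

The printed directory is the value to use for JAVA_HOME in /etc/environment.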

Installing SSH

  • Run the following command:
    apt-get install ssh
  • This will ask for a confirmation. Type Y and press Enter.

    Image 17

  • Once done, generate public/private rsa key pair by executing the following command:
    ssh-keygen -t rsa -P ""
  • This will ask “Enter file in which to save the key (/root/.ssh/id_rsa):”. Type nothing and press Enter.

    Image 18

  • Make the generated public key authorized by running the following command:
    cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys

    Image 19

  • Check if ssh is installed and running properly by executing the following command:
    ssh localhost
  • This will ask “Are you sure you want to continue connecting (yes/no)?”. Type yes and press Enter.

    Image 20

  • If it shows an error, execute the same command again:
    ssh localhost

    Image 21

  • It should display the above message if ssh is installed and running properly.
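
Running the `cat ... >> authorized_keys` step more than once appends the same key again. The sketch below, assuming a POSIX shell, adds the key only when it is missing and restricts the file permissions. The helper name `authorize_key` and the example key text are hypothetical, and the demonstration works on throwaway files so nothing under ~/.ssh is touched.

```shell
# authorize_key PUBKEY_FILE AUTH_FILE: append the public key to AUTH_FILE
# only if that exact line is not already present, then restrict permissions.
# (Hypothetical helper; the body runs in a subshell to keep variables local.)
authorize_key() (
    pub=$1
    auth=$2
    touch "$auth"
    if ! grep -qxF -- "$(cat "$pub")" "$auth"; then
        cat "$pub" >> "$auth"
    fi
    chmod 600 "$auth"
)

# Demonstration on temporary files with a made-up key line.
demo=$(mktemp -d)
echo "ssh-rsa AAAA-example-key user@host" > "$demo/id_rsa.pub"
authorize_key "$demo/id_rsa.pub" "$demo/authorized_keys"
authorize_key "$demo/id_rsa.pub" "$demo/authorized_keys"   # second call adds nothing
wc -l < "$demo/authorized_keys"
rm -rf "$demo"
```

On the real machine, the call would be `authorize_key "$HOME/.ssh/id_rsa.pub" "$HOME/.ssh/authorized_keys"`.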

Download Hadoop

Download Hadoop version 2.7.3 from this link.

Click on 2.7.3 version binary:

Image 22

  • Click on the link marked as red to download the file. This will open a window. Select “Save File” option and click on “Save” button.

    Image 23

  • This will start downloading the file:

    Image 24

  • The file will be saved in default download location set in the browser.

Installing Hadoop

  • Close the terminal and open it again. No need to log in as “su”.
  • Find the path where the Hadoop installation file was downloaded and run the following command to unpack it:
    tar -xvzf '<downloaded package path>'
  • In my case, it is:
    tar -xvzf '/home/fazlur/Downloads/hadoop-2.7.3.tar.gz'
  • This creates a directory "hadoop-2.7.3" in the current directory, which is the home directory for a freshly opened terminal:

    Image 25
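
The tar command above unpacks into whatever directory the terminal is currently in. As a sketch, the `-C` option of tar makes the target directory explicit; the helper name `unpack_to` is hypothetical.

```shell
# unpack_to ARCHIVE DEST: extract a .tar.gz archive into DEST,
# creating DEST first if needed. (Hypothetical helper.)
unpack_to() {
    mkdir -p "$2" && tar -xzf "$1" -C "$2"
}

# Example call, using the download location from this article:
# unpack_to "$HOME/Downloads/hadoop-2.7.3.tar.gz" "$HOME"
```

With the example call, the "hadoop-2.7.3" directory lands under the home directory regardless of where the command is run from.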

Configuring Hadoop

  • In Terminal, switch to root using the following command. Enter the password you set when installing Ubuntu:
    sudo su
  • Run this command to edit “.bashrc” file:
    gedit ~/.bashrc
  • This will open an editor. Add the following lines to the end of the file. Replace <JAVA_PATH> and <HADOOP_HOME_PATH> with appropriate paths:
    #HADOOP VARIABLES START
    export JAVA_HOME=<JAVA_PATH>
    export PATH=${JAVA_HOME}/bin:${PATH}
    export HADOOP_INSTALL=<HADOOP_HOME_PATH>
    export PATH=$PATH:$HADOOP_INSTALL/bin
    export PATH=$PATH:$HADOOP_INSTALL/sbin
    export HADOOP_CLASSPATH=${JAVA_HOME}/lib/tools.jar
    export HADOOP_MAPRED_HOME=$HADOOP_INSTALL
    export HADOOP_COMMON_HOME=$HADOOP_INSTALL
    export HADOOP_HDFS_HOME=$HADOOP_INSTALL
    export YARN_HOME=$HADOOP_INSTALL
    export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_INSTALL/lib/native
    export HADOOP_OPTS="-Djava.library.path=$HADOOP_INSTALL/lib"
    #HADOOP VARIABLES END
  • In my case, it looks like this:

    Image 26

  • Save and close the editor.
  • Run the following command to reload the .bashrc file and check that it contains no errors:
    source ~/.bashrc

    Image 27
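
If the .bashrc step is repeated, the variable block ends up in the file twice. The following sketch, assuming a POSIX shell, appends a shortened version of the block only when the start marker is not yet present. The helper name `append_hadoop_vars` is hypothetical, and only some of the variables from the full block are included to keep the example short.

```shell
# append_hadoop_vars RCFILE JAVA_HOME_DIR HADOOP_DIR: append a shortened
# variable block to RCFILE unless the start marker is already there.
# (Hypothetical helper; the body runs in a subshell to keep variables local.)
append_hadoop_vars() (
    rc=$1
    marker='#HADOOP VARIABLES START'
    if [ -f "$rc" ] && grep -qF "$marker" "$rc"; then
        echo "Hadoop variables already present in $rc"
        exit 0
    fi
    # \$ keeps the variable references literal in the written file.
    cat >> "$rc" <<EOF
$marker
export JAVA_HOME=$2
export PATH=\${JAVA_HOME}/bin:\${PATH}
export HADOOP_INSTALL=$3
export PATH=\$PATH:\$HADOOP_INSTALL/bin
export PATH=\$PATH:\$HADOOP_INSTALL/sbin
#HADOOP VARIABLES END
EOF
)

# Example call with the paths used in this article:
# append_hadoop_vars ~/.bashrc /usr/lib/jvm/java-8-openjdk-amd64 /home/fazlur/hadoop-2.7.3
```
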

  • Get into path “hadoop-2.7.3/etc/hadoop” by running the following command:
    cd <HADOOP PATH>

    In my case, it is:

    cd /home/fazlur/hadoop-2.7.3/etc/hadoop

    Image 28

  • Edit “hadoop-env.sh” file using the following command:
    gedit hadoop-env.sh
  • This will open an editor. Append this line to the end of the editor. Save and close the editor.
    export JAVA_HOME=<Your Java Path>

    In my case, it looks like this:

    export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64

    Image 29

  • Run the following command to check if there is any error in hadoop-env.sh file:
    source hadoop-env.sh
  • Make a directory called “hadoop_store” in the same directory where hadoop-2.7.3 exists, and move into it. Run the following commands to do that:
    cd <HOME PATH>
    mkdir hadoop_store
    cd hadoop_store
  • In my case, the first command is:
    cd /home/fazlur
  • Make a directory called “hdfs” and get into it. Run these commands to do that:
    mkdir hdfs
    cd hdfs
  • Make two directories called “namenode” and “datanode” inside “hdfs” directory. Run these commands to do that. The screenshot shows the consecutive commands and directory structure:
    mkdir namenode
    mkdir datanode
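
The same directory tree can be created in one step with `mkdir -p`, which creates missing parent directories and does not fail if a directory already exists. This is a minimal sketch; the helper name `make_hdfs_dirs` is hypothetical.

```shell
# make_hdfs_dirs BASE: create BASE/hadoop_store/hdfs/namenode and
# BASE/hadoop_store/hdfs/datanode, including missing parents.
# (Hypothetical helper.)
make_hdfs_dirs() {
    mkdir -p "$1/hadoop_store/hdfs/namenode" "$1/hadoop_store/hdfs/datanode"
}

# Example call: make_hdfs_dirs "$HOME"
```
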
  • Get into path “hadoop-2.7.3/etc/hadoop” by running the following command:
    cd <HADOOP PATH>

    In my case, it is:

    cd /home/fazlur/hadoop-2.7.3/etc/hadoop
  • Edit “hdfs-site.xml” by running the following command. This will open an editor:
    gedit hdfs-site.xml
  • Append the following lines between <configuration></configuration> tags. Replace <NAMENODE_FOLDER_PATH> and <DATANODE_FOLDER_PATH> with appropriate paths.
    <property>
      <name>dfs.replication</name>
      <value>1</value>
      <description>Default block replication.
      The actual number of replications can be specified when the file is created.
      The default is used if replication is not specified in create time.
      </description>
    </property>
    <property>
      <name>dfs.namenode.name.dir</name>
      <value>file:<NAMENODE_FOLDER_PATH></value>
    </property>
    <property>
      <name>dfs.datanode.data.dir</name>
      <value>file:<DATANODE_FOLDER_PATH></value>
    </property>
  • It looks like this in my case:

    Image 30

  • Save and close the editor.
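
To make the placeholder substitution in hdfs-site.xml concrete, the sketch below prints the two directory properties with paths filled in. The helper name `print_hdfs_dir_properties` is hypothetical, and the example paths assume the "hadoop_store" tree was created under /home/fazlur as in this article; adjust them to your own layout.

```shell
# print_hdfs_dir_properties NAMENODE_DIR DATANODE_DIR: print the two
# directory properties with the placeholders filled in.
# (Hypothetical helper.)
print_hdfs_dir_properties() {
    cat <<EOF
<property>
  <name>dfs.namenode.name.dir</name>
  <value>file:$1</value>
</property>
<property>
  <name>dfs.datanode.data.dir</name>
  <value>file:$2</value>
</property>
EOF
}

# Paths follow the article's layout; they are an assumption about your system.
print_hdfs_dir_properties \
    /home/fazlur/hadoop_store/hdfs/namenode \
    /home/fazlur/hadoop_store/hdfs/datanode
```

The printed text can be pasted between the <configuration></configuration> tags in place of the placeholder versions.
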
  • Get into “hadoop-2.7.3” folder and create a directory called “tmp”. The following commands do this:
    cd <hadoop-2.7.3 path>
    mkdir tmp

    In my case:

    cd /home/fazlur/hadoop-2.7.3
    mkdir tmp
  • Return to the “etc/hadoop” directory inside “hadoop-2.7.3” and edit the “core-site.xml” file using the following commands:
    cd etc/hadoop
    gedit core-site.xml
  • This will open an editor. Append the following lines between <configuration></configuration> tags. Replace “/home/fazlur/hadoop-2.7.3/tmp” with the path of the “tmp” directory you just created.
    <property>
     <name>hadoop.tmp.dir</name>
     <value>/home/fazlur/hadoop-2.7.3/tmp</value>
     <description>A base for other temporary directories.</description>
    </property>
    
    <property>
     <name>fs.default.name</name>
     <value>hdfs://localhost:54310</value>
     <description>The name of the default file system.  A URI whose
     scheme and authority determine the FileSystem implementation.  The
     uri's scheme determines the config property (fs.SCHEME.impl) naming
     the FileSystem implementation class.  The uri's authority is used to
     determine the host, port, etc. for a filesystem.</description>
    </property>
  • Here is what mine looks like:

    Image 31

  • Save and close the editor.
  • Run the following command to create “mapred-site.xml” file using “mapred-site.xml.template” template:
    cp mapred-site.xml.template mapred-site.xml
  • Edit “mapred-site.xml” using the following command:
    gedit mapred-site.xml
  • This will open an editor. Append the following lines between <configuration></configuration> tags.
    <property>
     <name>mapred.job.tracker</name>
     <value>localhost:54311</value>
     <description>The host and port that the MapReduce job tracker runs
     at.  If "local", then jobs are run in-process as a single map
     and reduce task.
     </description>
    </property>
  • Here is what mine looks like:

    Image 32

  • Save and close the editor.
  • Move to the home directory by executing command “cd”.
  • Format Hadoop File System by running the following command:
    hadoop namenode -format
  • Restart your machine.
  • Open the terminal and login as “su”.
  • Run this command to start hadoop:
    start-all.sh
  • Run this command to check if all the services have been started:
    jps

    Image 33

  • If the output shows that the NameNode service is not running, as in the screenshot above, follow these steps to get it working:
    • Restart your machine.
    • Open the terminal and log in as “su”.
    • Type “cd” to move to the home directory.
    • Execute command “hadoop namenode -format” to format the Hadoop file system.
    • Execute command “start-all.sh” to start all services.
    • Execute command “jps” to check if all the services have been started.

      Image 34

  • Now open your favourite browser and type the following url:
    http://localhost:8088
  • It opens a page like this if everything is up and running:

    Image 35

  • Type the following url to check datanodes as well as browse hadoop file system:
    http://localhost:50070
  • This opens a page like this:

    Image 36

  • Navigate to “Utilities-->Browse the file system” to check hadoop file system:

    Image 37
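
The jps check earlier in this section can be scripted instead of read by eye. The sketch below assumes that a single-node Hadoop 2.x setup started with start-all.sh is expected to run the five daemons listed in the loop; the helper name `missing_daemons` is hypothetical.

```shell
# missing_daemons: read `jps` output on stdin and print the name of each
# expected daemon that does not appear in it. (Hypothetical helper; the
# daemon list is an assumption about a single-node Hadoop 2.x setup.)
missing_daemons() (
    output=$(cat)
    for daemon in NameNode DataNode SecondaryNameNode ResourceManager NodeManager; do
        # jps prints "<pid> <name>", so compare the whole second field.
        # This keeps "SecondaryNameNode" from counting as "NameNode".
        if ! printf '%s\n' "$output" |
            awk -v d="$daemon" '$2 == d { found = 1 } END { exit !found }'; then
            echo "$daemon"
        fi
    done
)

# Typical use on the Hadoop machine:
# jps | missing_daemons
```

An empty result means all five daemons were found. Any name that is printed points at the service to investigate, for example by repeating the NameNode steps above.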

Conclusion

I hope you enjoyed reading this article and now have a successful installation of Hadoop on your Ubuntu system. In my next articles, I will explain the different components of Hadoop in detail.

Thank you for reading my article and keeping in touch.

History

  • 26th January, 2017: Initial version

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)