Chapter 3: Sqoop Installation

Sqoop is set up differently depending on the Hadoop distribution you use, so I will take you through each case separately.

Here I am considering the three setups that people normally use:

  • Own configured Hadoop – Custom configuration
  • Cloudera CDH
  • HortonWorks

Now I will explain each of them in detail.

1. Custom Sqoop Installation

Here I assume that you have already installed Hadoop and Java on your system and now just want to install Sqoop.

Follow the steps below to install Sqoop on a cluster you have configured yourself:

Step 1: Download Sqoop

In this tutorial, we are using Sqoop 1.4.6, the latest fully functional version at the time of writing. You can download this Sqoop version from here.

You will find multiple files there; download the sqoop-1.4.6.bin__hadoop-2.0.4-alpha.tar.gz file.

Step 2: Start with the Sqoop installation

First, untar the downloaded file. Use the command below in your command-line interface (e.g. PuTTY), then switch to the root user for the next step.

$ tar -xvf sqoop-1.4.6.bin__hadoop-2.0.4-alpha.tar.gz

$ su

password:

Now you can move the extracted directory to the standard location “/usr/lib/sqoop” using the command below:

# mv sqoop-1.4.6.bin__hadoop-2.0.4-alpha /usr/lib/sqoop
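The untar and move steps above can also be wrapped in a small helper. This is only a minimal sketch: the function name install_sqoop_tarball is hypothetical (it is not part of Sqoop), and it assumes the archive contains a single top-level directory. It refuses to run if the destination already exists, because mv would otherwise place the extracted folder inside the existing directory instead of renaming it.

```shell
# Hypothetical helper (not part of Sqoop): extract a release tarball and
# move its top-level directory to the chosen install location.
install_sqoop_tarball() {
  tarball="$1"   # e.g. sqoop-1.4.6.bin__hadoop-2.0.4-alpha.tar.gz
  dest="$2"      # e.g. /usr/lib/sqoop

  [ -f "$tarball" ] || { echo "tarball not found: $tarball" >&2; return 1; }
  if [ -e "$dest" ]; then
    echo "destination already exists: $dest" >&2
    return 1
  fi

  # Extract into a scratch directory first so a failed extraction
  # never leaves a half-populated install location behind.
  workdir="$(mktemp -d)" || return 1
  if ! tar -xf "$tarball" -C "$workdir"; then
    rm -rf "$workdir"
    return 1
  fi

  # Assumption: the archive holds exactly one top-level directory.
  top="$(ls "$workdir" | head -n 1)"
  mkdir -p "$(dirname "$dest")" && mv "$workdir/$top" "$dest"
  status=$?
  rm -rf "$workdir"
  return "$status"
}
```

As root, you would call it as: install_sqoop_tarball sqoop-1.4.6.bin__hadoop-2.0.4-alpha.tar.gz /usr/lib/sqoop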

Step 3: Configure bashrc file

Now set SQOOP_HOME to the directory where you placed Sqoop. In the previous step we moved it to “/usr/lib/sqoop”, so SQOOP_HOME should point to that location.

Append the following lines to your ~/.bashrc file:

#Sqoop

export SQOOP_HOME=/usr/lib/sqoop

export PATH=$PATH:$SQOOP_HOME/bin

Whenever you change the bashrc file, you have to source it so that the changes take effect in the current shell. Use the command below to do that:

$ source ~/.bashrc
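If you would rather not edit the bashrc file by hand, the same lines can be appended from a script. The sketch below is only an illustration: the function name add_sqoop_env is hypothetical, and it assumes that an existing line starting with "export SQOOP_HOME=" means the file is already configured, so it leaves the file alone in that case.

```shell
# Hypothetical helper: append the Sqoop variables to a shell startup file
# only if SQOOP_HOME is not already exported there.
add_sqoop_env() {
  rcfile="$1"       # e.g. "$HOME/.bashrc"
  sqoop_home="$2"   # e.g. /usr/lib/sqoop

  if grep -q '^export SQOOP_HOME=' "$rcfile" 2>/dev/null; then
    echo "SQOOP_HOME already set in $rcfile"
    return 0
  fi

  # Single quotes keep $PATH and $SQOOP_HOME unexpanded, so they are
  # resolved when the startup file is sourced, not when this runs.
  {
    echo ''
    echo '#Sqoop'
    echo "export SQOOP_HOME=$sqoop_home"
    echo 'export PATH=$PATH:$SQOOP_HOME/bin'
  } >> "$rcfile"
}
```

For example: add_sqoop_env "$HOME/.bashrc" /usr/lib/sqoop, followed by sourcing the file as shown above. Running it a second time does not add duplicate lines.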

Step 4: Configure Sqoop now

So far we have untarred the file and set SQOOP_HOME. Now let’s perform the basic Sqoop configuration.

To do this, we will edit sqoop-env.sh, which lives under the $SQOOP_HOME/conf folder.

First, change to the conf folder and create sqoop-env.sh by renaming the template file, using the two commands below:

$ cd $SQOOP_HOME/conf

$ mv sqoop-env-template.sh sqoop-env.sh

Now open the sqoop-env.sh file and set the following two lines (replace /usr/local/hadoop with your own Hadoop installation directory if it differs):

export HADOOP_COMMON_HOME=/usr/local/hadoop

export HADOOP_MAPRED_HOME=/usr/local/hadoop
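The configuration step above can be sketched as a script as well. Note the assumptions: the function name configure_sqoop_env is hypothetical, and this sketch uses cp instead of mv so that the template file is kept, which is a choice of this example rather than what the commands above do. It only warns, and does not stop, when the Hadoop directory is missing.

```shell
# Hypothetical helper: create sqoop-env.sh from the template (keeping the
# template) and append the two Hadoop locations used in this tutorial.
configure_sqoop_env() {
  conf_dir="$1"      # e.g. "$SQOOP_HOME/conf"
  hadoop_home="$2"   # e.g. /usr/local/hadoop

  template="$conf_dir/sqoop-env-template.sh"
  target="$conf_dir/sqoop-env.sh"

  # Only create sqoop-env.sh if it is not there yet.
  if [ ! -f "$target" ]; then
    [ -f "$template" ] || { echo "template not found: $template" >&2; return 1; }
    cp "$template" "$target" || return 1
  fi

  [ -d "$hadoop_home" ] || echo "warning: $hadoop_home is not a directory" >&2

  {
    echo "export HADOOP_COMMON_HOME=$hadoop_home"
    echo "export HADOOP_MAPRED_HOME=$hadoop_home"
  } >> "$target"
}
```

For example: configure_sqoop_env "$SQOOP_HOME/conf" /usr/local/hadoop. Each call appends the two export lines again, so run it once per installation.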

Step 5: Configure MySQL

Sqoop transfers data between an RDBMS such as MySQL or Oracle and Hadoop, so it needs a JDBC driver for the database. Here we set up the MySQL connector.

First, download the mysql-connector-java-5.1.30.tar.gz file from this link.

Now untar the file and move the connector JAR to the /usr/lib/sqoop/lib directory.

$ tar -zxf mysql-connector-java-5.1.30.tar.gz

$ su

password:

# cd mysql-connector-java-5.1.30

# mv mysql-connector-java-5.1.30-bin.jar /usr/lib/sqoop/lib
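Before moving on, you may want to confirm that the JAR really landed in the lib directory. Here is a minimal sketch of such a check; the function name has_mysql_connector is hypothetical, and it assumes the file name starts with "mysql-connector-java-" as in the download used above.

```shell
# Hypothetical helper: report whether a MySQL connector JAR is present in
# the given lib directory.
has_mysql_connector() {
  lib_dir="$1"   # e.g. /usr/lib/sqoop/lib

  # If nothing matches, the glob stays literal and the -f test fails.
  for jar in "$lib_dir"/mysql-connector-java-*.jar; do
    if [ -f "$jar" ]; then
      echo "found: $jar"
      return 0
    fi
  done

  echo "no mysql-connector-java JAR in $lib_dir" >&2
  return 1
}
```

For example: has_mysql_connector /usr/lib/sqoop/lib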

You are all done now. Just verify Sqoop and you are good to go.

Step 6: Verify Sqoop

$ cd $SQOOP_HOME/bin

$ sqoop-version
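If the version command is not found, the usual cause is that $SQOOP_HOME/bin is missing from PATH. A small check like the one below can help narrow that down. It is only a sketch, and the function name check_sqoop_on_path is hypothetical.

```shell
# Hypothetical helper: check that a sqoop launcher is reachable via PATH.
check_sqoop_on_path() {
  if command -v sqoop >/dev/null 2>&1; then
    echo "sqoop found at: $(command -v sqoop)"
    return 0
  fi
  echo "sqoop not found on PATH; check SQOOP_HOME and ~/.bashrc" >&2
  return 1
}
```

Run check_sqoop_on_path in a new terminal, or after sourcing ~/.bashrc, so that the updated PATH is in effect.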

That’s all!!

Now let me tell you about the Sqoop installation in Cloudera and HortonWorks.

2. Install Sqoop in HortonWorks

Usually, if you are using the HortonWorks image, you will find Sqoop and MySQL already installed and can start using them directly from the command line.

Check this link for the details on installing Sqoop in HortonWorks.

3. Sqoop installation in Cloudera

If you are already using Cloudera CDH, Sqoop typically comes with it, so you should not need to install it again. You can check this by running the commands below on the command line:

$ sqoop help

$ sqoop version

$ sqoop import

You can find further details on the official Cloudera site using this link.

This ends the Sqoop Installation chapter. If you have any issues, just email me at ashutosh741@gmail.com.
