
How to Import Data in Hive using Sqoop

As we know from the Sqoop tutorial, Sqoop is mainly used to import data from an RDBMS into the Hadoop system and to export it from the Hadoop system back to an RDBMS.

Earlier we saw how to import data from an RDBMS into HDFS and HBase, and how to export data from HDFS back to an RDBMS.

Here in this tutorial, we will see how to import data into Hive using Sqoop. The logic is the same as the one we used when importing into HBase; the only change you will find is the command.

This will be helpful when you have to use Hive for data analysis: it saves you the time of first importing into HDFS and then loading into Hive separately.

So let's see how we can import data into Hive using Sqoop efficiently.

How to import data in Hive using Sqoop

Hive is basically a data warehouse used for data analysis on large data sets. If you need record-level operations, prefer HBase instead.

But there is a limitation: you can import an RDBMS table into Hive only in the Text and Sequence file formats.

If you are looking to import the data in the ORC or RC file format, you will have to follow a multi-step procedure instead:

  • First, import the RDBMS table into HDFS (see the earlier tutorial on importing from RDBMS to HDFS for details)
  • Convert the data into the ORC file format (a sketch of this step follows below)
  • Then create a Hive table and load the HDFS data into it using the command below

LOAD DATA INPATH 'hdfs_file_path' [OVERWRITE] INTO TABLE tablename
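
For clarity, here is a minimal HiveQL sketch of the convert-and-load steps. The table names emp_staging and emp_orc, the HDFS path /staging/empdetails, and the two columns are all hypothetical; adjust them to your own schema.

-- Hypothetical staging table for the Sqoop-imported text data
CREATE TABLE emp_staging (id INT, name STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE;

-- Load the text files Sqoop wrote to HDFS into the staging table
LOAD DATA INPATH '/staging/empdetails' OVERWRITE INTO TABLE emp_staging;

-- Final table stored as ORC; the INSERT ... SELECT performs the conversion
CREATE TABLE emp_orc (id INT, name STRING)
STORED AS ORC;

INSERT OVERWRITE TABLE emp_orc SELECT * FROM emp_staging;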

Here is an architectural diagram shared by Hortonworks that shows how importing data into Hive using Sqoop works.

[Figure: Sqoop operation architecture]

You need to use the --hive-import option to import an RDBMS table into Hive using Sqoop.

So here is the script to import data into Hive using Sqoop:

$ sqoop import --connect jdbc:mysql://localhost/dbname \
--username root -P \
--table empdetails \
--hive-import \
--hive-table dbname.empdetails -m 1

Later you can use the show tables command in Hive to check whether the RDBMS table has been imported correctly or not.

hive> show tables;

Now depending on your requirements, you can perform any operation you want on the table.
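
For example, here are a couple of quick checks you could run on the imported table (assuming the dbname.empdetails table from the script above):

hive> SELECT COUNT(*) FROM dbname.empdetails;
hive> SELECT * FROM dbname.empdetails LIMIT 10;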

All the parameters used here mean the same as in a normal Sqoop import. For details, you are advised to check the earlier Sqoop import tutorial.

Let's see how to import data into Hive using Sqoop with the following example for a clear understanding.

Import RDBMS Table in Hive using Sqoop Example

Id | Name | Company | EmailID      | Designation
---|------|---------|--------------|------------
1  | John | ABC     | john@abc.com | Director

Now we will import the above table (say emp) into Hive using Sqoop.
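
If you want to reproduce this example end to end, the source table in MySQL could look like the sketch below. The column types here are an assumption, so adjust them to your actual schema.

-- Hypothetical MySQL definition of the source emp table
CREATE TABLE emp (
  Id INT PRIMARY KEY,
  Name VARCHAR(20),
  Company VARCHAR(20),
  EmailID VARCHAR(50),
  Designation VARCHAR(20)
);

-- The sample row shown in the table above
INSERT INTO emp VALUES (1, 'John', 'ABC', 'john@abc.com', 'Director');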

For this we will create a table in Hive. This table can be either internal or external, depending on your requirements. You can check the article on internal vs. external tables to decide which to use.

Use the below script to create a table in Hive with a similar schema. Here I am using an internal table.

CREATE TABLE userdb.emp (
Id INT,
Name VARCHAR(20),
Company VARCHAR(20),
EmailID VARCHAR(50),
Designation VARCHAR(20))
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE;

Now execute the below script to import the RDBMS table into the Hive table using Sqoop.

sqoop import \
--connect jdbc:mysql://localhost/userdb \
--username root \
--password root \
--table emp --fields-terminated-by ',' \
--hive-import \
--hive-table userdb.emp \
-m 1 \
--target-dir /emp

This way, your RDBMS table emp will land in the Hive table userdb.emp. You can use the show tables command to check the output.
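
For a quick sanity check, you can also select the rows back in the Hive shell; with the sample data above, you should see the single record for John:

hive> show tables;
hive> SELECT * FROM userdb.emp;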

Conclusion

This was all about how to import data into Hive using Sqoop. I hope that with the help of this tutorial you can easily import an RDBMS table into Hive using Sqoop.

If you want to keep the data in the Text or Sequence file format, simply import it straight into Hive; otherwise, first import into HDFS, convert the file format, and then load the data into Hive.

Do try this out, and comment below if you face any issue.
