As we know from Sqoop Tutorial that Sqoop is mainly used to import the data from RDBMS to Hadoop system and export the same from Hadoop system to RDBMS.
Here in this tutorial, we will see how to import data in Hive using Sqoop. The logic will be same as we used while importing in HBase. The only change you will find while importing data in Hive using Sqoop is the command.
This will be helpful when you will have to use Hive for data analysis. In such way you can save your time while first importing in HDFS and then to Hive.
So let’s see how we can import data in Hive using Sqoop efficiently.
How to import data in Hive using Sqoop
Hive is basically a data warehouse which is used for data analysis on large set of data. If you are looking for record level operation, prefer to use HBase for that.
But there is a limitation also. You can only import RDBMS table in Hive in Text and Sequence file formats.
If you are looking to import the data in ORC or RC file formats, you will have to follow the two-step procedures.
- First you should import the RDBMS tables in HDFS- Check this link for details
- Convert the data into ORC file format
- Then create Hive table and import the HDFS data to Hive table using the below command
Here is the architectural diagram shared by HortonWorks which shows how the import data in Hive using Sqoop works.
You need to use hive-import command to import RDBMS table in Hive using Sqoop.
So here is the script to import data into Hive using Sqoop-
Later you can use show table command in Hive to check whether the RDBMS table has been imported correctly or not.
Now depending on your requirements, you can perform any operation you want on the table.
Here all the parameters used means the same as it used to be normally. For details, you are advised to check this tutorial.
Let’s see this how to import data in Hive using Sqoop with the following example for clear understanding.
Import RDBMS Table in Hive using Sqoop Example
Now we will import the above table (say emp) into Hive using Sqoop.
For this we will create a table in Hive. This table can be either internal or external depending on your requirements. You can check this article to know when to use external or internal table.
Use the below script to create a table in Hive with the similar schema. Here I am using internal table.
Now execute the below script to import the RDBMS table in the Hive table using sqoop.
By this way, your RDBMS table emp will be in Hive table userdb.emp. You can use the command show table to check the output.
This was all about how to import data in Hive using Sqoop. I hope with the help of this tutorial, you can easily import RDBMS table in Hive using Sqoop.
If you want to keep the data in Text or Sequence files, simply make the tables into Hive else first import in HDFS and then keep the data in Hive.
Do try this and comment down for any issue.