Sqoop Import Function

Chapter 4: Sqoop Import

We have come to one of the most useful features of Sqoop: the import tool. Sqoop import is used to bring tables from a relational database such as MySQL or Oracle into the Hadoop ecosystem, for example into HDFS or HBase.

Each row of the RDBMS table is treated as a record in the resulting file. Text data is stored in plain text files, while binary data can be stored in SequenceFiles or Avro files.
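The file format can also be chosen explicitly at import time. A minimal sketch, assuming the MySQL database userdb and the emp table introduced just below-

$ sqoop import \
--connect jdbc:mysql://localhost/userdb \
--username root \
--table emp \
--as-avrodatafile

Swap --as-avrodatafile for --as-sequencefile to get SequenceFiles, or leave it out entirely for the default text format (--as-textfile).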

Let me explain the Sqoop import function using a simple example.

Suppose we have the table below, named "emp", in the database "userdb" on a MySQL database server:

Id    Name    Salary   Department   Designation
101   John    5000     IT           Developer
102   Chris   6000     Sales        Manager
103   Jamie   7000     Support      Director
Now we will import this emp table's data into HDFS. Let's see how easy it is-

The Sqoop import tool is used to bring the table in; the command below gets this done-

$ sqoop import \
--connect jdbc:mysql://localhost/userdb \
--username root \
--table emp \
-m 1

Once you run the above command, you will see it launch a MapReduce job with the requested number of mappers. Once it is done, you can view the output using the command below-

$HADOOP_HOME/bin/hadoop fs -cat /emp/part-m-*

You will find the output file in HDFS looking something like below; the fields are comma (,) separated.

101,John,5000,IT,Developer

102,Chris,6000,Sales,Manager

103,Jamie,7000,Support,Director

With the above command, the files are saved in a default location in HDFS (typically a directory named after the table under your HDFS home directory).
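A side note on the -m option used above: it sets the number of parallel map tasks. With more than one mapper, Sqoop needs a column to split the work on - the table's primary key by default, or a column named via --split-by. A sketch, assuming Id is a reasonable split key-

$ sqoop import \
--connect jdbc:mysql://localhost/userdb \
--username root \
--table emp \
--split-by Id \
-m 4

This produces multiple output files (part-m-00000 through part-m-00003) instead of a single one.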

Import Table Data in a Particular Location

But what if we want to save the files in a particular location (let's say rdbmsdata) in HDFS?

You can specify the location (target directory) in the above command using the option below-

--target-dir /rdbmsdata

So the complete command will look like below-

$ sqoop import \
--connect jdbc:mysql://localhost/userdb \
--username root \
--table emp \
-m 1 \
--target-dir /rdbmsdata

Verify the result using the command below-

$HADOOP_HOME/bin/hadoop fs -cat /rdbmsdata/part-m-*
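One caveat: the import fails if the target directory already exists in HDFS. If you rerun an import into the same location, you can ask Sqoop to remove the old directory first. A sketch using the --delete-target-dir option-

$ sqoop import \
--connect jdbc:mysql://localhost/userdb \
--username root \
--table emp \
-m 1 \
--delete-target-dir \
--target-dir /rdbmsdata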

Import Data Based on a Condition

Sqoop also provides the ability to import only part of a table: you can use the --where option to apply a condition to the import.

For example, if we only need the details of the employees who are managers, we simply put that condition into the command.

The overall command will look like below-

$ sqoop import \
--connect jdbc:mysql://localhost/userdb \
--username root \
--table emp \
-m 1 \
--where "Designation = 'Manager'" \
--target-dir /folderlocation

You can verify the file using the command below-

$ $HADOOP_HOME/bin/hadoop fs -cat /folderlocation/part-m-*
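Beyond --where, Sqoop can also import the result of an arbitrary SQL query using the --query option. A sketch: the query must contain the literal $CONDITIONS token, --target-dir is mandatory, and with -m 1 no split column is needed (the directory /querydata here is just an illustrative name)-

$ sqoop import \
--connect jdbc:mysql://localhost/userdb \
--username root \
--query 'SELECT Id, Name, Salary FROM emp WHERE Designation = "Manager" AND $CONDITIONS' \
-m 1 \
--target-dir /querydata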

Incremental Import

This section will be highly helpful if you are working with real-time data. For example, suppose you handle the employee data of a company where new hiring happens every day, so new employees keep getting added to the table.

In such a situation you will want to import only the newly added records, and this is where incremental import helps a lot.

To get this done, you mainly need three things-

  • --incremental – the mode, for example append
  • --check-column – the column based on which Sqoop decides which rows are new
  • --last-value – the last value already imported, so that Sqoop picks up only the rows that come after it

Now suppose a new employee has been added to the emp table with the record below-

104,Henning,4000,IT,Developer

We take the Id column as the check column, so the last imported value was 103 and the new row's value is 104.

Here is the complete command to import the newly added rows of the emp table-

$ sqoop import \
--connect jdbc:mysql://localhost/userdb \
--username root \
--table emp \
-m 1 \
--incremental append \
--check-column Id \
--last-value 103

Again, you can check the output using the command below-

$HADOOP_HOME/bin/hadoop fs -cat /emp/part-m-*
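Tracking --last-value by hand gets tedious. Sqoop can instead store the incremental import state in a saved job, updating the last value automatically after each run. A sketch (the job name empjob is arbitrary)-

$ sqoop job --create empjob \
-- import \
--connect jdbc:mysql://localhost/userdb \
--username root \
--table emp \
-m 1 \
--incremental append \
--check-column Id \
--last-value 0

$ sqoop job --exec empjob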

As you have seen so far, the result is comma separated by default when importing data from a relational database into Hadoop.
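If a comma does not suit your data, the field delimiter can be changed at import time. A sketch using --fields-terminated-by to produce tab-separated output-

$ sqoop import \
--connect jdbc:mysql://localhost/userdb \
--username root \
--table emp \
-m 1 \
--fields-terminated-by '\t'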

In all the above examples, I have shown you how to import a table into HDFS through Sqoop. Now let me show you how to import data into HBase.

How to Import Table Data in HBase with Sqoop?

Just as we imported tables into HDFS through Sqoop, we can do the same for HBase, but there is a twist: Sqoop does not currently permit importing a relational table directly into an HBase table that has multiple column families. To work around this limitation, you create the HBase table first and then run a separate Sqoop import for each column family.

Let's start by creating the HBase table. The general syntax in the HBase shell is-

create 'tablename', 'columnfamily'

For example: create 'empdetails', 'personal_data', 'professional_data'

Once you have created the table in HBase, you are all set to import the RDBMS table into HBase through Sqoop. Below is the command, which loads the personal_data column family (note that the row key column must be among the imported columns)-

$ sqoop import \
--connect jdbc:mysql://localhost/userdb \
--username root -P \
--table emp \
--columns "Id,Name" \
--hbase-table empdetails \
--column-family personal_data \
--hbase-row-key Id \
-m 1
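Since a single Sqoop import can load only one column family, a second import along the same lines - same row key, different columns - fills the professional_data family created above-

$ sqoop import \
--connect jdbc:mysql://localhost/userdb \
--username root -P \
--table emp \
--columns "Id,Designation" \
--hbase-table empdetails \
--column-family professional_data \
--hbase-row-key Id \
-m 1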

Once executed, you can scan the HBase table to check the output-

scan 'empdetails'

Sqoop Import All Tables

There will be a time when you need to import all the tables of a database into Hadoop, and in that case the import-all-tables tool of Sqoop will help you out.

Let's say I have a MySQL database called "emp_warehouse" with around 1,000 tables for different purposes, and I want to import all of them into Hadoop. Then I simply run the command below and I am done-

$ sqoop import-all-tables \
--connect jdbc:mysql://localhost/emp_warehouse \
--username root

Here the tool creates one directory per table in HDFS.

One more thing you should note: if you want to import a complete database this way, each table must have a primary key, or you will need to fall back to a single mapper (with -m 1 or --autoreset-to-one-mapper).
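Two options worth knowing with import-all-tables: --warehouse-dir sets the parent HDFS directory under which the per-table directories are created, and --exclude-tables skips tables you do not want. A sketch (the directory and table names here are only illustrative)-

$ sqoop import-all-tables \
--connect jdbc:mysql://localhost/emp_warehouse \
--username root \
--warehouse-dir /emp_warehouse_data \
--exclude-tables temp_table,audit_log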

Previous: Sqoop Installation | Next: Chapter 5: Sqoop Export Tool