Chapter 1: Sqoop Introduction

Before big data came into existence, all data was stored on relational database servers, in relational table structures.

When big data (more specifically, Hadoop) came into the picture and developers started working with Hadoop and ecosystem tools such as Hive and Pig, they needed a system to move data from their existing RDBMS into Hadoop. Sqoop was created to fill that need, providing a solid bridge between relational database servers and Hadoop’s HDFS.

This is also where the name Sqoop comes from:


Sqoop: SQL to Hadoop and Hadoop to SQL Tool


What is Sqoop?

Sqoop is a tool for transferring data between an RDBMS (such as MySQL or Oracle) and Hadoop (HDFS, Hive, HBase, etc.).

It is used to import data from an RDBMS into Hadoop and to export data from Hadoop back to an RDBMS.

Sqoop is an Apache Software Foundation project and works well with relational databases such as Teradata, Netezza, Oracle, MySQL, and PostgreSQL.


Why is Sqoop used?

A big data developer’s work starts once the data is in the Hadoop system, for example in HDFS, Hive, or HBase. From there, they do their magic to dig out the valuable information hidden in such a huge amount of data.

Before Sqoop, developers had to write their own code to import and export data between Hadoop and an RDBMS, so a dedicated tool was needed to do the same job.

Sqoop filled that gap by handling data transfer between relational databases and the Hadoop system.

Sqoop uses MapReduce for its import and export operations, so the work runs in parallel and is fault tolerant.
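To make the parallel mechanism concrete: by default, Sqoop runs an import with four map tasks and divides the table among them by the value range of a split column (normally the primary key). The sketch below is a simplified Python illustration of that idea for an integer key. It is not Sqoop's actual code, and the function name split_ranges and the column name id are made up for this example.

```python
def split_ranges(min_id, max_id, num_mappers):
    """Divide the inclusive range [min_id, max_id] into contiguous,
    near-equal chunks, one per mapper (illustrative only)."""
    if max_id < min_id:
        raise ValueError("max_id must be >= min_id")
    total = max_id - min_id + 1
    # Never create more chunks than there are key values.
    num_mappers = max(1, min(num_mappers, total))
    base, extra = divmod(total, num_mappers)

    ranges = []
    start = min_id
    for i in range(num_mappers):
        # The first `extra` chunks take one additional key each.
        size = base + (1 if i < extra else 0)
        end = start + size - 1
        ranges.append((start, end))
        start = end + 1
    return ranges


# Each mapper would then read only its own slice of the table.
for low, high in split_ranges(1, 1000, 4):
    print(f"mapper reads rows WHERE id >= {low} AND id <= {high}")
```

Because each slice is independent, a failed map task can be retried on its own slice without redoing the whole transfer, which is where the fault tolerance comes from.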

With Sqoop, developers only need to specify the source and the destination; the tool takes care of the rest.
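As a rough illustration of "source and destination", the Python sketch below assembles a basic sqoop import and a basic sqoop export command line. It only builds and prints the commands, so it runs without Sqoop installed. The host name, database, user, table, and paths are hypothetical placeholders, and the two helper functions are invented for this example; the flags themselves are standard Sqoop 1.4.x options.

```python
import shlex


def sqoop_import_args(jdbc_url, username, password_file, table, target_dir):
    """Argument list for a basic import (RDBMS -> HDFS)."""
    return [
        "sqoop", "import",
        "--connect", jdbc_url,             # source: JDBC URL of the database
        "--username", username,
        "--password-file", password_file,  # file that holds the password
        "--table", table,                  # source table
        "--target-dir", target_dir,        # destination directory in HDFS
    ]


def sqoop_export_args(jdbc_url, username, password_file, table, export_dir):
    """Argument list for a basic export (HDFS -> RDBMS)."""
    return [
        "sqoop", "export",
        "--connect", jdbc_url,             # destination database
        "--username", username,
        "--password-file", password_file,
        "--table", table,                  # destination table (must already exist)
        "--export-dir", export_dir,        # source directory in HDFS
    ]


# All values below are placeholders for illustration.
import_cmd = sqoop_import_args(
    "jdbc:mysql://dbhost/sales", "etl_user", "/user/etl/.db_password",
    "orders", "/data/sales/orders",
)
export_cmd = sqoop_export_args(
    "jdbc:mysql://dbhost/sales", "etl_user", "/user/etl/.db_password",
    "order_summary", "/data/sales/order_summary",
)

print(" ".join(shlex.quote(a) for a in import_cmd))
print(" ".join(shlex.quote(a) for a in export_cmd))
```

The printed lines can be pasted into a shell on a machine where Sqoop and the matching JDBC driver are installed.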

Now let’s talk about some of the amazing features of Sqoop for big data developers.


Features of Sqoop

Sqoop is robust, easy to use, and backed by community support and contributions. This tutorial uses Sqoop 1.4.6, the latest version at the time of writing.

Here are some notable features of Sqoop:

• Full Load
• Incremental Load
• Parallel import/export
• Import results of SQL query
• Compression
• Connectors for all major RDBMS
• Kerberos Security Integration
• Load data directly into Hive/HBase
• Support for Accumulo
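Several of the features above correspond to command-line options of sqoop import. The Python sketch below shows how a few of them might be combined; it only builds and prints the commands and does not run Sqoop. The connection details, table names, and paths are hypothetical placeholders, while the options are standard Sqoop 1.4.x flags.

```python
import shlex

# Hypothetical connection details shared by every example.
base = [
    "sqoop", "import",
    "--connect", "jdbc:mysql://dbhost/sales",
    "--username", "etl_user",
    "--password-file", "/user/etl/.db_password",
]

# Full load of one table, compressed, using 8 parallel map tasks.
full_load = base + [
    "--table", "orders",
    "--target-dir", "/data/sales/orders",
    "--num-mappers", "8",
    "--compress",
]

# Incremental load: only rows whose id is greater than the last value imported.
incremental_load = base + [
    "--table", "orders",
    "--target-dir", "/data/sales/orders",
    "--incremental", "append",
    "--check-column", "id",
    "--last-value", "1000",
]

# Import the result of a SQL query. Sqoop expects the literal token
# $CONDITIONS in the WHERE clause, and --split-by for a parallel import.
query_import = base + [
    "--query", "SELECT id, total FROM orders WHERE $CONDITIONS",
    "--split-by", "id",
    "--target-dir", "/data/sales/order_totals",
]

# Load a table directly into Hive.
hive_load = base + ["--table", "orders", "--hive-import"]

examples = {
    "full load": full_load,
    "incremental load": incremental_load,
    "query import": query_import,
    "hive load": hive_load,
}
for name, args in examples.items():
    print(f"{name}: " + " ".join(shlex.quote(a) for a in args))
```

Building the command as a list and quoting each argument keeps the $CONDITIONS token from being expanded by the shell when the printed command is pasted into a terminal.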

In the next chapter, I will go through the Sqoop architecture, which is also very simple. In fact, once you start enjoying it, you will find the whole of Sqoop easy. All our Sqoop tutorial chapters are small units and won’t take much of your time.
