Chapter 6: Sqoop Jobs
In Sqoop, you can save job definitions so that they can be re-run whenever you need, either on demand or through an external scheduler.
If you have configured the Hadoop ecosystem yourself using a distribution such as CDH or Hortonworks, make sure your cluster is running before executing a job. In a corporate environment the servers are rarely shut down, so this is usually not a concern.
Now let me show you how to create and run Sqoop jobs.
Create a job to import data from RDBMS to Hadoop
Here we will create a job (let's call it firstjob) to import a table from MySQL to HDFS. We are assuming that MySQL has a table 'emp' in the 'empdb' database.
So, you can use the command below to create a job that imports the table data from RDBMS to Hadoop.
$ sqoop job --create firstjob \
  -- import \
  --connect jdbc:mysql://localhost/empdb \
  --username root \
  --table emp \
  -m 1
That's all. Your job is saved, but it won't run on its own. In the next section, I will show you how to execute a Sqoop job.
How to check all Sqoop Jobs?
The '--list' argument is used to check all the saved Sqoop jobs. Use the command below to get the list of all your saved jobs.
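For example, listing the saved jobs looks like this (the job name shown assumes the firstjob created above):

```shell
# Print the names of all saved Sqoop jobs
$ sqoop job --list
# The output includes a line with each saved job name, e.g.:
#   firstjob
```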
How to find the details of a Job?
Let's say you have a job (in our case, firstjob) and you want to know its details. You can do this easily by using the '--show' argument.
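For our firstjob, the command looks like this:

```shell
# Show the saved configuration of a job (connect string, table, mappers, etc.)
$ sqoop job --show firstjob
```

Sqoop prints the stored job properties, such as the JDBC connect string, the table name, and the number of mappers.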
How to execute a Sqoop job?
As I told you, you have just created a job, but it won't run until you execute it. So let's execute it. The '--exec' argument is used to execute a saved job.
$ sqoop job --exec firstjob
Here are the arguments for the different operations on Sqoop jobs.
| Argument | Description |
|----------|-------------|
| `--create` | Defines a new job with the specified job-id (name). The actual Sqoop import command must be separated by `--`. |
| `--delete` | Deletes a saved job. |
| `--exec` | Executes the saved job. |
| `--show` | Shows the saved job's configuration. |
| `--list` | Lists all the saved jobs. |
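Putting the table above to use, here is how you would clean up the firstjob created earlier once you no longer need it:

```shell
# Remove the saved job definition from the Sqoop metastore
$ sqoop job --delete firstjob

# Verify it is gone: firstjob should no longer appear in the list
$ sqoop job --list
```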