Sqoop is a tool designed to transfer data between Hadoop and relational databases or mainframes. You can use Sqoop to import data from a relational database management system (RDBMS) such as MySQL or Oracle or a mainframe into the Hadoop Distributed File System (HDFS), transform the data in Hadoop MapReduce, and then export the data back into an RDBMS.
Sqoop automates most of this process, relying on the database to describe the schema for the data to be imported. Sqoop uses MapReduce to import and export the data, which provides parallel operation as well as fault tolerance.
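In other words, Sqoop was designed from the outset as a data transfer tool. You can use it to move data between Hadoop and a relational database: for example, importing data from MySQL or Oracle into Hadoop, and exporting data from Hadoop back into the RDBMS.
Sqoop simplifies the import and export workflow. Because it runs the transfers as MapReduce jobs, it also provides strong fault tolerance.
Traditional application management systems, in which applications interact with relational databases through an RDBMS, are one of the sources of big data. The data they generate is stored in relational structures on relational database servers. When big data storage and the analyzers of the Hadoop ecosystem (MapReduce, Hive, HBase, Cassandra, Pig, and others) emerged, they needed a tool that could interact with relational database servers to import and export the data residing there. This is where Sqoop fits into the Hadoop ecosystem: it provides a workable bridge between relational database servers and Hadoop's HDFS.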
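The import and export directions described above can be sketched as two Sqoop command lines. This is only an illustration: the JDBC URL, the account `sqoop_user`, the tables `emp` and `emp_copy`, and the HDFS path are hypothetical, and the helper functions merely print the commands so they can be reviewed before being run against a real cluster.

```shell
#!/bin/sh
# Sketch: build the Sqoop command lines for an import (RDBMS -> HDFS) and an
# export (HDFS -> RDBMS). Host, database, account, table and path names are
# hypothetical placeholders, not values taken from this deployment.

JDBC_URL="jdbc:mysql://hd1:3306/testdb"   # assumed MySQL host and database
DB_USER="sqoop_user"                      # assumed database account

# Print the import command: copy table $1 into HDFS directory $2 with one mapper.
build_import_cmd() {
  echo "sqoop import --connect $JDBC_URL --username $DB_USER -P --table $1 --target-dir $2 -m 1"
}

# Print the export command: load HDFS directory $2 into the existing table $1.
build_export_cmd() {
  echo "sqoop export --connect $JDBC_URL --username $DB_USER -P --table $1 --export-dir $2"
}

# Dry run: show both commands without executing them.
build_import_cmd emp /user/hadoop/emp
build_export_cmd emp_copy /user/hadoop/emp
```

The `-P` option makes Sqoop prompt for the password instead of placing it on the command line, and `-m 1` limits the job to a single map task, which is a reasonable starting point for a small test table.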
[hadoop@hd1 conf]$ more /etc/redhat-release
Red Hat Enterprise Linux Server release 6.6 (Santiago)
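As the introduction shows, Sqoop is a data transfer tool, so deploying it requires data to move. In this setup Hadoop stores the structured data and MySQL stores the relational data; to make the data in Hadoop easier to work with, we use Hive.
hadoop: Hadoop 2.6.0-cdh5.7.0
Start all Hadoop components: start-all.sh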
[hadoop@hd1 ~]$ start-all.sh
This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
18/10/30 19:36:44 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Starting namenodes on [hd1]
hd1: starting namenode, logging to /home/hadoop/hadoop-2.6.0-cdh5.7.0/logs/hadoop-hadoop-namenode-hd1.out
hd4: starting datanode, logging to /home/hadoop/hadoop-2.6.0-cdh5.7.0/logs/hadoop-hadoop-datanode-hd4.out
hd3: starting datanode, logging to /home/hadoop/hadoop-2.6.0-cdh5.7.0/logs/hadoop-hadoop-datanode-hd3.out
hd2: starting datanode, logging to /home/hadoop/hadoop-2.6.0-cdh5.7.0/logs/hadoop-hadoop-datanode-hd2.out
Starting secondary namenodes [hd2]
hd2: starting secondarynamenode, logging to /home/hadoop/hadoop-2.6.0-cdh5.7.0/logs/hadoop-hadoop-secondarynamenode-hd2.out
18/10/30 19:37:04 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
starting yarn daemons
starting resourcemanager, logging to /home/hadoop/hadoop-2.6.0-cdh5.7.0/logs/yarn-hadoop-resourcemanager-hd1.out
hd2: starting nodemanager, logging to /home/hadoop/hadoop-2.6.0-cdh5.7.0/logs/yarn-hadoop-nodemanager-hd2.out
hd3: starting nodemanager, logging to /home/hadoop/hadoop-2.6.0-cdh5.7.0/logs/yarn-hadoop-nodemanager-hd3.out
hd4: starting nodemanager, logging to /home/hadoop/hadoop-2.6.0-cdh5.7.0/logs/yarn-hadoop-nodemanager-hd4.out
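The startup log only shows that the daemons were launched, not that they stayed up. One way to confirm this on the master node is to look for the expected Java processes in the output of `jps`, the JDK process lister. The sketch below assumes, based on the log, that hd1 hosts the NameNode and the ResourceManager; `missing_daemons` is a hypothetical helper, not a Hadoop command.

```shell
#!/bin/sh
# Sketch: report which expected master daemons are absent from a jps-style
# listing read on stdin. The expected list (NameNode, ResourceManager) is an
# assumption derived from the startup log for host hd1.

missing_daemons() {
  listing="$(cat)"
  for daemon in NameNode ResourceManager; do
    # -w matches whole words only, so "SecondaryNameNode" does not count as "NameNode".
    printf '%s\n' "$listing" | grep -qw "$daemon" || echo "$daemon"
  done
}

# Run the check only where jps is installed; no output means nothing is missing.
if command -v jps >/dev/null 2>&1; then
  jps | missing_daemons
fi
```

On the worker nodes the same idea applies with `DataNode` and `NodeManager` in the expected list.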
MySQL: 5.7.20 MySQL Community Server (GPL)
/etc/init.d/mysqld start
Hive: hive-1.1.0-cdh5.7.0
The figure below illustrates the Sqoop workflow.
First, deploy a Hadoop cluster; the deployment guide is at https://my.oschina.net/u/3862440/blog/1862524
Hive deployment is covered at https://my.oschina.net/u/3862440/blog/2251273
tar -xvf sqoop-1.4.6-cdh5.7.0.tar.gz -C /home/hadoop/
(http://archive-primary.cloudera.com/cdh5/cdh/5/)
#Set path to where bin/hadoop is available
export HADOOP_COMMON_HOME=/home/hadoop/hadoop-2.6.0-cdh5.7.0
#Set path to where hadoop-*-core.jar is available
export HADOOP_MAPRED_HOME=/home/hadoop/hadoop-2.6.0-cdh5.7.0
#set the path to where bin/hbase is available
#export HBASE_HOME=
#Set the path to where bin/hive is available
export HIVE_HOME=/home/hadoop/hive-1.1.0-cdh5.7.0
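A wrong path in these settings (which presumably live in conf/sqoop-env.sh under the Sqoop directory) tends to surface only later as a confusing runtime error, so it can help to confirm that each directory exists first. The following is a minimal sketch using the paths from the settings above; `check_home` is a hypothetical helper, not part of Sqoop.

```shell
#!/bin/sh
# Sketch: verify that the directories referenced by the Sqoop environment
# settings exist. check_home is a hypothetical helper name.

# Print "ok NAME" when DIR exists; otherwise print "missing NAME: DIR" and return 1.
check_home() {
  if [ -d "$2" ]; then
    echo "ok $1"
  else
    echo "missing $1: $2"
    return 1
  fi
}

status=0
check_home HADOOP_COMMON_HOME /home/hadoop/hadoop-2.6.0-cdh5.7.0 || status=1
check_home HADOOP_MAPRED_HOME /home/hadoop/hadoop-2.6.0-cdh5.7.0 || status=1
check_home HIVE_HOME /home/hadoop/hive-1.1.0-cdh5.7.0 || status=1
[ "$status" -eq 0 ] || echo "fix the Sqoop environment settings before running Sqoop"
```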
3. Configure the JDBC driver
cp mysql-connector-java.jar /home/hadoop/sqoop-1.4.6-cdh5.7.0/lib/
export SQOOP_HOME=/home/hadoop/sqoop-1.4.6-cdh5.7.0
export PATH=$PATH:$SQOOP_HOME/bin
[hadoop@hd1 conf]$ sqoop-version
18/10/30 19:52:06 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6-cdh5.7.0
Sqoop 1.4.6-cdh5.7.0
git commit id
Compiled by jenkins on Wed Mar 23 11:30:51 PDT 2016
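If the installation check needs to be scripted, for instance to confirm that the Sqoop build matches the CDH release of the cluster, the version can be pulled out of the `sqoop-version` output shown above. This is a sketch that assumes the output keeps the format seen here; `parse_sqoop_version` is a hypothetical helper.

```shell
#!/bin/sh
# Sketch: extract the version string from `sqoop-version` output read on stdin.
# Assumes the output contains a two-field line of the form "Sqoop <version>".

parse_sqoop_version() {
  # Log lines start with a timestamp, so only the bare "Sqoop <version>" line matches.
  awk '$1 == "Sqoop" && NF == 2 { print $2; exit }'
}

# Run only where Sqoop is on the PATH; stderr is discarded to drop log noise.
if command -v sqoop-version >/dev/null 2>&1; then
  sqoop-version 2>/dev/null | parse_sqoop_version
fi
```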