Sqoop 1.4.6-cdh5.7.0 Installation

Introduction to Sqoop:

Sqoop is a tool designed to transfer data between Hadoop and relational databases or mainframes. You can use Sqoop to import data from a relational database management system (RDBMS) such as MySQL or Oracle or a mainframe into the Hadoop Distributed File System (HDFS), transform the data in Hadoop MapReduce, and then export the data back into an RDBMS.

 

Sqoop automates most of this process, relying on the database to describe the schema for the data to be imported. Sqoop uses MapReduce to import and export the data, which provides parallel operation as well as fault tolerance.

Sqoop was designed from the outset as a data transfer tool: you can use it to move data between Hadoop and a relational database (RDBMS), for example importing MySQL or Oracle data into Hadoop, and it likewise supports exporting data from Hadoop back into an RDBMS.

Sqoop simplifies the data import and export workflow. It uses MapReduce to do the transfer, which provides both parallelism and fault tolerance.
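As a concrete illustration of that MapReduce-backed transfer, here is a minimal import sketch; the host, database, table, and credentials are placeholders rather than values from this guide:

sqoop import \
  --connect jdbc:mysql://hd1:3306/testdb \
  --username root \
  --password 123456 \
  --table emp \
  --target-dir /user/hadoop/emp \
  -m 4

The -m 4 option splits the import across four map tasks: each mapper reads one slice of the table, and a failed task is retried independently, which is where the parallelism and fault tolerance come from.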

Traditional application management systems, that is, applications interacting with a relational database through an RDBMS, are one of the sources of big data. Big data generated this way is stored on relational database servers in relational structures. When big-data storage and the analysis engines of the Hadoop ecosystem (MapReduce, Hive, HBase, Cassandra, Pig, and so on) arrived, they needed a tool to interact with relational database servers in order to import and export the big data residing there. This is where Sqoop earns its place in the Hadoop ecosystem: it provides practical interaction between relational database servers and Hadoop's HDFS.

Environment:

[hadoop@hd1 conf]$ more /etc/redhat-release 
Red Hat Enterprise Linux Server release 6.6 (Santiago)

As the introduction above makes clear, Sqoop is a data transfer tool, so deploying it requires data on both sides. Here Hadoop stores the imported data and MySQL holds the relational data; to work with the data inside Hadoop more conveniently, we use Hive.

Hadoop: Hadoop 2.6.0-cdh5.7.0

Start all Hadoop components: start-all.sh

[hadoop@hd1 ~]$ start-all.sh 
This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
18/10/30 19:36:44 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Starting namenodes on [hd1]
hd1: starting namenode, logging to /home/hadoop/hadoop-2.6.0-cdh5.7.0/logs/hadoop-hadoop-namenode-hd1.out
hd4: starting datanode, logging to /home/hadoop/hadoop-2.6.0-cdh5.7.0/logs/hadoop-hadoop-datanode-hd4.out
hd3: starting datanode, logging to /home/hadoop/hadoop-2.6.0-cdh5.7.0/logs/hadoop-hadoop-datanode-hd3.out
hd2: starting datanode, logging to /home/hadoop/hadoop-2.6.0-cdh5.7.0/logs/hadoop-hadoop-datanode-hd2.out
Starting secondary namenodes [hd2]
hd2: starting secondarynamenode, logging to /home/hadoop/hadoop-2.6.0-cdh5.7.0/logs/hadoop-hadoop-secondarynamenode-hd2.out
18/10/30 19:37:04 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
starting yarn daemons
starting resourcemanager, logging to /home/hadoop/hadoop-2.6.0-cdh5.7.0/logs/yarn-hadoop-resourcemanager-hd1.out
hd2: starting nodemanager, logging to /home/hadoop/hadoop-2.6.0-cdh5.7.0/logs/yarn-hadoop-nodemanager-hd2.out
hd3: starting nodemanager, logging to /home/hadoop/hadoop-2.6.0-cdh5.7.0/logs/yarn-hadoop-nodemanager-hd3.out
hd4: starting nodemanager, logging to /home/hadoop/hadoop-2.6.0-cdh5.7.0/logs/yarn-hadoop-nodemanager-hd4.out
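To confirm the daemons actually came up, jps can be checked on each node; the expected processes below are inferred from the startup log above:

jps
# expected on hd1: NameNode, ResourceManager
# expected on hd2: DataNode, NodeManager, SecondaryNameNode
# expected on hd3/hd4: DataNode, NodeManager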

MySQL: 5.7.20 MySQL Community Server (GPL)

/etc/init.d/mysqld start 
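To confirm the server is up before pointing Sqoop at it, a quick check (the root account here is only an example):

mysql -uroot -p -e "SELECT VERSION();"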

Hive: hive-1.1.0-cdh5.7.0

How does Sqoop work?

In outline, Sqoop works as follows: for an import, it first reads the table's schema from the database over JDBC, then submits a map-only MapReduce job whose mappers each read a slice of the table in parallel and write the rows into HDFS; an export runs the same flow in reverse, with mappers reading files from HDFS and writing rows back into the target table.

First you need a deployed Hadoop cluster; the deployment guide is at https://my.oschina.net/u/3862440/blog/1862524

Hive deployment is covered at https://my.oschina.net/u/3862440/blog/2251273

Sqoop deployment:

1. Unpack the installation archive

tar -xvf sqoop-1.4.6-cdh5.7.0.tar.gz -C /home/hadoop/

(http://archive-primary.cloudera.com/cdh5/cdh/5/)
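If you still need the tarball, it can be fetched from the CDH archive above; the exact file name is assumed to match the version used here:

wget http://archive-primary.cloudera.com/cdh5/cdh/5/sqoop-1.4.6-cdh5.7.0.tar.gz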

2. Edit sqoop-env.sh
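If conf/sqoop-env.sh does not exist yet, it can usually be created from the template shipped in the tarball, then edited with the settings below:

cd /home/hadoop/sqoop-1.4.6-cdh5.7.0/conf
cp sqoop-env-template.sh sqoop-env.sh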

#Set path to where bin/hadoop is available
export HADOOP_COMMON_HOME=/home/hadoop/hadoop-2.6.0-cdh5.7.0

#Set path to where hadoop-*-core.jar is available
export HADOOP_MAPRED_HOME=/home/hadoop/hadoop-2.6.0-cdh5.7.0

#set the path to where bin/hbase is available
#export HBASE_HOME=

#Set the path to where bin/hive is available
export HIVE_HOME=/home/hadoop/hive-1.1.0-cdh5.7.0

3. Configure the JDBC driver

cp mysql-connector-java.jar  /home/hadoop/sqoop-1.4.6-cdh5.7.0/lib/

4. Configure the Sqoop PATH

export SQOOP_HOME=/home/hadoop/sqoop-1.4.6-cdh5.7.0
export PATH=$PATH:$SQOOP_HOME/bin
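Assuming the two export lines were added to ~/.bash_profile (the original does not name the file), reload it so the current shell picks them up:

source ~/.bash_profile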

5. Verify the version

[hadoop@hd1 conf]$ sqoop-version
18/10/30 19:52:06 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6-cdh5.7.0
Sqoop 1.4.6-cdh5.7.0
git commit id
Compiled by jenkins on Wed Mar 23 11:30:51 PDT 2016
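Beyond the version check, a useful smoke test is to have Sqoop talk to MySQL through the driver installed in step 3; connection details are placeholders:

sqoop list-databases \
  --connect jdbc:mysql://hd1:3306 \
  --username root \
  --password 123456

If this prints the database list, Sqoop, the JDBC driver, and MySQL are all wired together correctly.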
