Introduction to Sqoop1 and Sqoop2

Main source: http://www.linuxidc.com/Linux/2014-10/108337.htm

http://sqoop.apache.org/ --- official site
http://sqoop.apache.org/docs/1.4.4/SqoopUserGuide.html
http://sqoop.apache.org/docs/1.4.4/index.html
http://blog.csdn.net/maixia24/article/details/9266275 --- a good article

1. What is Sqoop

Sqoop, short for SQL-to-Hadoop, is a convenient tool for migrating data between traditional relational databases and Hadoop. It takes full advantage of MapReduce's parallelism to speed up transfers in a batch-processing fashion. Over its development it has evolved into two major versions, Sqoop1 and Sqoop2.

Sqoop acts as a bridge between relational databases and the Hadoop ecosystem: it supports importing and exporting data between relational databases and Hive, HDFS, and HBase, and can perform both full-table and incremental imports.
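As a minimal sketch of the two import modes mentioned above (the connection string, database, table, and paths are all hypothetical placeholders, not from the original article):

```shell
# Full-table import from MySQL into HDFS
# (host db01, database sales, table orders are illustrative)
sqoop import \
  --connect jdbc:mysql://db01:3306/sales \
  --username etl -P \
  --table orders \
  --target-dir /data/sales/orders

# Incremental import: only fetch rows whose `id` is greater
# than the last value imported previously
sqoop import \
  --connect jdbc:mysql://db01:3306/sales \
  --username etl -P \
  --table orders \
  --incremental append \
  --check-column id \
  --last-value 10000
```

With `--incremental append`, Sqoop records the highest value seen in `--check-column` so the next run can continue from it.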

So why choose Sqoop?

- Efficient, controllable use of resources: configurable task parallelism and timeouts
- Data type mapping and conversion, automatic by default and customizable by the user
- Support for many mainstream databases: MySQL, Oracle, SQL Server, DB2, and more
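The first two points can be illustrated with job options (connection details and column names below are illustrative assumptions):

```shell
# Control parallelism: run the import with 8 map tasks,
# splitting the table on its primary key
# Control type mapping: force the AMOUNT column to map to Java Double
sqoop import \
  --connect jdbc:oracle:thin:@db02:1521:ORCL \
  --username etl -P \
  --table INVOICES \
  --split-by INVOICE_ID \
  --num-mappers 8 \
  --map-column-java AMOUNT=Double
```

`--num-mappers` (or `-m`) sets how many parallel transfers run, and `--map-column-java` overrides the automatic type mapping for individual columns.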

2. Similarities and differences between Sqoop1 and Sqoop2

- The two versions are completely incompatible with each other.
- Version numbering: Apache releases 1.4.x are Sqoop1 and 1.99.x are Sqoop2; in CDH, Sqoop-1.4.3-cdh4 is Sqoop1 and Sqoop2-1.99.2-cdh4.5.0 is Sqoop2.
- Sqoop2 improvements over Sqoop1: it introduces a Sqoop server that centralizes management of connectors; it offers multiple access methods (CLI, web UI, REST API); and it introduces a role-based security mechanism.

3. Sqoop1 and Sqoop2 architecture diagrams

Sqoop architecture diagram 1

Sqoop architecture diagram 2

===========

 

Moving data between MySQL/Oracle and HDFS/HBase with Sqoop http://www.linuxidc.com/Linux/2013-06/85817.htm

[Hadoop] Sqoop installation walkthrough http://www.linuxidc.com/Linux/2013-05/84082.htm

Transferring data between MySQL and HDFS with Sqoop http://www.linuxidc.com/Linux/2013-04/83447.htm

Hadoop Oozie study notes: working around Oozie's lack of Sqoop support http://www.linuxidc.com/Linux/2012-08/67027.htm

Building a Hadoop ecosystem (hadoop hive hbase zookeeper oozie Sqoop) http://www.linuxidc.com/Linux/2012-03/55721.htm

Hadoop learning notes: importing data from MySQL into Hive with Sqoop http://www.linuxidc.com/Linux/2012-01/51993.htm

4. Pros and cons of Sqoop1 and Sqoop2

| Aspect | Sqoop1 | Sqoop2 |
| --- | --- | --- |
| Architecture | Uses only a single Sqoop client | Introduces a Sqoop server that centralizes connector management, plus a REST API and web UI, and adds a permission/security mechanism |
| Deployment | Simple to deploy, but installation requires root privileges, and connectors must conform to the JDBC model | Somewhat more complex architecture; configuration and deployment are more involved |
| Usage | Command-line only and error-prone; the format is tightly coupled; cannot support all data types; the security mechanism is weak (e.g. passwords can be exposed) | Multiple interfaces: command line, web UI, REST API; connectors are managed centrally, with all connections installed on the Sqoop server; complete permission management; connectors are standardized and responsible only for reading and writing data |

5. Installing and deploying Sqoop1

5.0 Installation environment

hadoop:hadoop-2.3.0-cdh5.1.2

sqoop:sqoop-1.4.4-cdh5.1.2

5.1 Download and extract the package

tar -zxvf sqoop-1.4.4-cdh5.1.2.tar.gz

ln -s sqoop-1.4.4-cdh5.1.2  sqoop

5.2 Configure environment variables and the configuration file

cd sqoop/conf/

cp sqoop-env-template.sh sqoop-env.sh

vi sqoop-env.sh

Add the following to sqoop-env.sh:

# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements.  See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License.  You may obtain a copy of the License at
#
#    http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# included in all the hadoop scripts with source command
# should not be executable directly
# also should not be passed any arguments, since we need original $*

# Set Hadoop-specific environment variables here.

#Set path to where bin/hadoop is available
export HADOOP_COMMON_HOME=/home/hadoop/hadoop

#Set path to where hadoop-*-core.jar is available
export HADOOP_MAPRED_HOME=/home/hadoop/hadoop

#set the path to where bin/hbase is available
export HBASE_HOME=/home/hadoop/hbase

#Set the path to where bin/hive is available
export HIVE_HOME=/home/hadoop/hive

#Set the path for where zookeper config dir is
export ZOOCFGDIR=/home/hadoop/zookeeper

In this file, only the HADOOP_COMMON_HOME setting is required. The HBase and Hive settings are needed only if you use those components; otherwise they can be left unset.
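Once the configuration is in place, a quick sanity check can confirm the installation (this assumes `sqoop/bin` is on the PATH; the MySQL host and user below are placeholders, and the MySQL JDBC driver jar must be present under sqoop/lib):

```shell
# Print the installed Sqoop version to verify the setup
sqoop version

# List databases on a MySQL server to verify JDBC driver and connectivity
sqoop list-databases \
  --connect jdbc:mysql://db01:3306/ \
  --username etl -P
```

If `sqoop version` runs without errors, the environment variables in sqoop-env.sh are being picked up correctly.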

The article continues on the next page: http://www.linuxidc.com/Linux/2014-10/108337p2.htm
