1. What is Sqoop
Sqoop, short for "SQL to Hadoop", is a convenient tool for migrating data between traditional relational databases and Hadoop. It makes full use of MapReduce's parallelism to move data in batches, which speeds up transfers. Over time it has evolved into two major versions, Sqoop1 and Sqoop2.
Sqoop acts as a bridge between relational databases and Hadoop: it supports moving data back and forth between relational databases and Hive, HDFS, and HBase, and offers both full-table and incremental imports.
So why choose Sqoop?
Efficient, controllable use of resources: task parallelism and timeouts can be tuned.
Data type mapping and conversion happens automatically and can also be customized by the user.
Support for many mainstream databases: MySQL, Oracle, SQL Server, DB2, and so on.
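To make these knobs concrete, here is a hedged sketch of an import command; the connection string, the orders table, and the id and amount columns are hypothetical, but the flags are standard Sqoop1 options for parallelism, type mapping, and incremental import:

# -m / --num-mappers sets the degree of parallelism;
# --map-column-java overrides the default type mapping for one column;
# --incremental append pulls only rows whose id exceeds --last-value.
sqoop import --connect jdbc:mysql://127.0.0.1:3306/testdb --username root -P \
  --table orders --num-mappers 4 \
  --map-column-java amount=String \
  --incremental append --check-column id --last-value 0 \
  --target-dir /user/sqoop/orders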
2. Sqoop1 and Sqoop2 compared
The two versions are completely incompatible.
Version numbering: the Apache releases are 1.4.x (Sqoop1) and 1.99.x (Sqoop2); the CDH releases are Sqoop-1.4.3-cdh4 (Sqoop1) and Sqoop2-1.99.2-cdh4.5.0 (Sqoop2).
Improvements of Sqoop2 over Sqoop1: it introduces a Sqoop server that centrally manages connectors and other resources; it supports multiple access methods (CLI, Web UI, REST API); and it introduces a role-based security mechanism.
3. Sqoop1 and Sqoop2 architecture diagrams
Sqoop architecture diagram 1
Sqoop architecture diagram 2
4. Advantages and disadvantages of Sqoop1 and Sqoop2
| Comparison | Sqoop1 | Sqoop2 |
| Architecture | Uses only a Sqoop client | Introduces a Sqoop server that centrally manages connectors, adds a REST API and a Web UI, and introduces a permission/security mechanism |
| Deployment | Simple to deploy, but installation requires root privileges and connectors must conform to the JDBC model | Architecture is somewhat more complex; configuration and deployment are more involved |
| Usage | The command-line style is error prone, the format is tightly coupled, not all data types are supported, and the security mechanism is weak (e.g., passwords are exposed) | Multiple interaction modes (command line, Web UI, REST API); connectors are centrally managed and all connections live on the Sqoop server; a complete permission-management mechanism; connectors are standardized and only responsible for reading and writing data |
5. Installing and deploying Sqoop1
5.0 Installation environment
hadoop:hadoop-2.3.0-cdh5.1.2
sqoop:sqoop-1.4.4-cdh5.1.2
5.1 Download and extract the installation package
tar -zxvf sqoop-1.4.4-cdh5.1.2.tar.gz
ln -s sqoop-1.4.4-cdh5.1.2 sqoop
5.2 Configure environment variables and configuration files
cd sqoop/conf/
cat sqoop-env-template.sh >> sqoop-env.sh
vi sqoop-env.sh
Add the following to sqoop-env.sh:
# Set Hadoop-specific environment variables here.

# Set path to where bin/hadoop is available
export HADOOP_COMMON_HOME=/home/hadoop/hadoop

# Set path to where hadoop-*-core.jar is available
export HADOOP_MAPRED_HOME=/home/hadoop/hadoop

# Set the path to where bin/hbase is available
export HBASE_HOME=/home/hadoop/hbase

# Set the path to where bin/hive is available
export HIVE_HOME=/home/hadoop/hive

# Set the path for where zookeeper config dir is
export ZOOCFGDIR=/home/hadoop/zookeeper
Only HADOOP_COMMON_HOME is strictly required in this file; the HBase and Hive settings only need to be configured if you actually use HBase or Hive.
5.3 Add the required JARs to the lib directory
The JARs in question are the JDBC drivers for the relational databases you want to connect to (MySQL, Oracle, and so on); you have to copy them into the lib directory yourself.
cp ~/hive/lib/mysql-connector-java-5.1.30.jar ~/sqoop/lib/
5.4 Add environment variables
vi ~/.profile
Add the following:
export SQOOP_HOME=/home/hadoop/sqoop
export SBT_HOME=/home/hadoop/sbt
export PATH=$PATH:$SBT_HOME/bin:$SQOOP_HOME/bin
export CLASSPATH=$CLASSPATH:$SQOOP_HOME/lib
Run source ~/.profile to make the changes take effect.
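A quick way to confirm that the new variables took effect is to ask Sqoop for its version and the list of available tools:

sqoop version
sqoop help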
5.5 Test the connection to a MySQL database
① Connect to MySQL and list all databases
hadoop@caozw:~/sqoop/conf$ sqoop list-databases --connect jdbc:mysql://127.0.0.1:3306/ --username root -P
Warning: /home/hadoop/sqoop/../hive-hcatalog does not exist! HCatalog jobs will fail. Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /home/hadoop/sqoop/../accumulo does not exist! Accumulo imports will fail. Please set $ACCUMULO_HOME to the root of your Accumulo installation.
14/10/21 18:15:15 INFO sqoop.Sqoop: Running Sqoop version: 1.4.4-cdh5.1.2
Enter password:
14/10/21 18:15:19 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
information_schema
XINGXUNTONG
XINGXUNTONG_HIVE
amon
hive
hmon
mahout
mysql
oozie
performance_schema
realworld
rman
scm
smon
-P makes Sqoop prompt for the password; you can also specify the password directly with --password.
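For example, the same listing with the password given inline (the value here is a placeholder); note that an inline password ends up in your shell history and in the process list, so -P is the safer habit:

sqoop list-databases --connect jdbc:mysql://127.0.0.1:3306/ --username root --password 123456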
② Import a MySQL table into HDFS
hadoop@caozw:~/sqoop/conf$ sqoop import -m 1 --connect jdbc:mysql://127.0.0.1:3306/realworld --username root -P --table weblogs --target-dir /user/sqoop/test1
Warning: /home/hadoop/sqoop/../hive-hcatalog does not exist! HCatalog jobs will fail. Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /home/hadoop/sqoop/../accumulo does not exist! Accumulo imports will fail. Please set $ACCUMULO_HOME to the root of your Accumulo installation.
14/10/21 18:19:18 INFO sqoop.Sqoop: Running Sqoop version: 1.4.4-cdh5.1.2
Enter password:
14/10/21 18:19:21 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
14/10/21 18:19:21 INFO tool.CodeGenTool: Beginning code generation
14/10/21 18:19:22 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `weblogs` AS t LIMIT 1
14/10/21 18:19:22 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `weblogs` AS t LIMIT 1
14/10/21 18:19:22 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /home/hadoop/hadoop
Note: /tmp/sqoop-hadoop/compile/15cb67e2b315154cdf02e3a17cf32bbe/weblogs.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
14/10/21 18:19:23 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hadoop/compile/15cb67e2b315154cdf02e3a17cf32bbe/weblogs.jar
14/10/21 18:19:23 WARN manager.MySQLManager: It looks like you are importing from mysql.
14/10/21 18:19:23 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
14/10/21 18:19:23 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
14/10/21 18:19:23 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
14/10/21 18:19:23 INFO mapreduce.ImportJobBase: Beginning import of weblogs
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/hadoop/hadoop-2.3.0-cdh5.1.2/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/hadoop/hbase-0.98.1-cdh5.1.2/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
14/10/21 18:19:24 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
14/10/21 18:19:24 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
14/10/21 18:19:25 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
14/10/21 18:19:25 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
14/10/21 18:19:40 INFO db.DBInputFormat: Using read commited transaction isolation
14/10/21 18:19:41 INFO mapreduce.JobSubmitter: number of splits:1
14/10/21 18:19:42 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1413879907572_0002
14/10/21 18:19:46 INFO impl.YarnClientImpl: Submitted application application_1413879907572_0002
14/10/21 18:19:46 INFO mapreduce.Job: The url to track the job: N/A
14/10/21 18:19:46 INFO mapreduce.Job: Running job: job_1413879907572_0002
14/10/21 18:20:12 INFO mapreduce.Job: Job job_1413879907572_0002 running in uber mode : false
14/10/21 18:20:12 INFO mapreduce.Job: map 0% reduce 0%
14/10/21 18:20:41 INFO mapreduce.Job: map 100% reduce 0%
14/10/21 18:20:45 INFO mapreduce.Job: Job job_1413879907572_0002 completed successfully
14/10/21 18:20:46 INFO mapreduce.Job: Counters: 30
    File System Counters
        FILE: Number of bytes read=0
        FILE: Number of bytes written=107189
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=87
        HDFS: Number of bytes written=251130
        HDFS: Number of read operations=4
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=2
    Job Counters
        Launched map tasks=1
        Other local map tasks=1
        Total time spent by all maps in occupied slots (ms)=22668
        Total time spent by all reduces in occupied slots (ms)=0
        Total time spent by all map tasks (ms)=22668
        Total vcore-seconds taken by all map tasks=22668
        Total megabyte-seconds taken by all map tasks=23212032
    Map-Reduce Framework
        Map input records=3000
        Map output records=3000
        Input split bytes=87
        Spilled Records=0
        Failed Shuffles=0
        Merged Map outputs=0
        GC time elapsed (ms)=41
        CPU time spent (ms)=1540
        Physical memory (bytes) snapshot=133345280
        Virtual memory (bytes) snapshot=1201442816
        Total committed heap usage (bytes)=76021760
    File Input Format Counters
        Bytes Read=0
    File Output Format Counters
        Bytes Written=251130
14/10/21 18:20:46 INFO mapreduce.ImportJobBase: Transferred 245.2441 KB in 80.7974 seconds (3.0353 KB/sec)
14/10/21 18:20:46 INFO mapreduce.ImportJobBase: Retrieved 3000 records.
-m specifies how many map tasks are started to read the data. If the table has no primary key, this parameter must be set and can only be set to 1; otherwise you get the error:
14/10/21 18:18:27 ERROR tool.ImportTool: Error during import: No primary key could be found for table weblogs. Please specify one with --split-by or perform a sequential import with '-m 1'.
The value of -m also directly determines how many pieces the imported data is split into on HDFS. For example, with -m 1 a single data file is produced:
14/10/21 18:23:54 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 2 items
-rw-r--r-- 1 hadoop supergroup 0 2014-10-21 18:20 /user/sqoop/test1/_SUCCESS
-rw-r--r-- 1 hadoop supergroup 251130 2014-10-21 18:20 /user/sqoop/test1/part-m-00000
Now add a primary key:
mysql> desc weblogs;
+--------------+-------------+------+-----+---------+-------+
| Field        | Type        | Null | Key | Default | Extra |
+--------------+-------------+------+-----+---------+-------+
| md5          | varchar(32) | YES  |     | NULL    |       |
| url          | varchar(64) | YES  |     | NULL    |       |
| request_date | date        | YES  |     | NULL    |       |
| request_time | time        | YES  |     | NULL    |       |
| ip           | varchar(15) | YES  |     | NULL    |       |
+--------------+-------------+------+-----+---------+-------+
5 rows in set (0.00 sec)

mysql> alter table weblogs add primary key(md5,ip);
Query OK, 3000 rows affected (1.60 sec)
Records: 3000  Duplicates: 0  Warnings: 0

mysql> desc weblogs;
+--------------+-------------+------+-----+---------+-------+
| Field        | Type        | Null | Key | Default | Extra |
+--------------+-------------+------+-----+---------+-------+
| md5          | varchar(32) | NO   | PRI |         |       |
| url          | varchar(64) | YES  |     | NULL    |       |
| request_date | date        | YES  |     | NULL    |       |
| request_time | time        | YES  |     | NULL    |       |
| ip           | varchar(15) | NO   | PRI |         |       |
+--------------+-------------+------+-----+---------+-------+
5 rows in set (0.02 sec)
Then specify -m again:
hadoop@caozw:~/sqoop/conf$ sqoop import -m 2 --connect jdbc:mysql://127.0.0.1:3306/realworld --username root -P --table weblogs --target-dir /user/sqoop/test2
Warning: /home/hadoop/sqoop/../hive-hcatalog does not exist! HCatalog jobs will fail. Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /home/hadoop/sqoop/../accumulo does not exist! Accumulo imports will fail. Please set $ACCUMULO_HOME to the root of your Accumulo installation.
14/10/21 18:22:40 INFO sqoop.Sqoop: Running Sqoop version: 1.4.4-cdh5.1.2
Enter password:
14/10/21 18:24:04 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
14/10/21 18:24:04 INFO tool.CodeGenTool: Beginning code generation
14/10/21 18:24:04 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `weblogs` AS t LIMIT 1
14/10/21 18:24:04 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `weblogs` AS t LIMIT 1
14/10/21 18:24:04 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /home/hadoop/hadoop
Note: /tmp/sqoop-hadoop/compile/7061f445f29510afa2b89729126a57b9/weblogs.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
14/10/21 18:24:07 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hadoop/compile/7061f445f29510afa2b89729126a57b9/weblogs.jar
14/10/21 18:24:07 WARN manager.MySQLManager: It looks like you are importing from mysql.
14/10/21 18:24:07 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
14/10/21 18:24:07 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
14/10/21 18:24:07 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
14/10/21 18:24:07 ERROR tool.ImportTool: Error during import: No primary key could be found for table weblogs. Please specify one with --split-by or perform a sequential import with '-m 1'.
hadoop@caozw:~/sqoop/conf$ sqoop import -m 2 --connect jdbc:mysql://127.0.0.1:3306/realworld --username root -P --table weblogs --target-dir /user/sqoop/test2
Warning: /home/hadoop/sqoop/../hive-hcatalog does not exist! HCatalog jobs will fail. Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /home/hadoop/sqoop/../accumulo does not exist! Accumulo imports will fail. Please set $ACCUMULO_HOME to the root of your Accumulo installation.
14/10/21 18:30:04 INFO sqoop.Sqoop: Running Sqoop version: 1.4.4-cdh5.1.2
Enter password:
14/10/21 18:30:07 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
14/10/21 18:30:07 INFO tool.CodeGenTool: Beginning code generation
14/10/21 18:30:07 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `weblogs` AS t LIMIT 1
14/10/21 18:30:07 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `weblogs` AS t LIMIT 1
14/10/21 18:30:07 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /home/hadoop/hadoop
Note: /tmp/sqoop-hadoop/compile/6dbf2401c1a51b81c5b885e6f7d43137/weblogs.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
14/10/21 18:30:09 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hadoop/compile/6dbf2401c1a51b81c5b885e6f7d43137/weblogs.jar
14/10/21 18:30:09 WARN manager.MySQLManager: It looks like you are importing from mysql.
14/10/21 18:30:09 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
14/10/21 18:30:09 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
14/10/21 18:30:09 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
14/10/21 18:30:09 WARN manager.CatalogQueryManager: The table weblogs contains a multi-column primary key. Sqoop will default to the column md5 only for this job.
14/10/21 18:30:09 WARN manager.CatalogQueryManager: The table weblogs contains a multi-column primary key. Sqoop will default to the column md5 only for this job.
14/10/21 18:30:09 INFO mapreduce.ImportJobBase: Beginning import of weblogs
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/hadoop/hadoop-2.3.0-cdh5.1.2/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/hadoop/hbase-0.98.1-cdh5.1.2/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
14/10/21 18:30:09 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
14/10/21 18:30:09 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
14/10/21 18:30:10 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
14/10/21 18:30:10 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
14/10/21 18:30:17 INFO db.DBInputFormat: Using read commited transaction isolation
14/10/21 18:30:17 INFO db.DataDrivenDBInputFormat: BoundingValsQuery: SELECT MIN(`md5`), MAX(`md5`) FROM `weblogs`
14/10/21 18:30:17 WARN db.TextSplitter: Generating splits for a textual index column.
14/10/21 18:30:17 WARN db.TextSplitter: If your database sorts in a case-insensitive order, this may result in a partial import or duplicate records.
14/10/21 18:30:17 WARN db.TextSplitter: You are strongly encouraged to choose an integral split column.
14/10/21 18:30:18 INFO mapreduce.JobSubmitter: number of splits:4
14/10/21 18:30:18 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1413879907572_0003
14/10/21 18:30:19 INFO impl.YarnClientImpl: Submitted application application_1413879907572_0003
14/10/21 18:30:19 INFO mapreduce.Job: The url to track the job: N/A
14/10/21 18:30:19 INFO mapreduce.Job: Running job: job_1413879907572_0003
14/10/21 18:30:32 INFO mapreduce.Job: Job job_1413879907572_0003 running in uber mode : false
14/10/21 18:30:32 INFO mapreduce.Job: map 0% reduce 0%
14/10/21 18:31:12 INFO mapreduce.Job: map 50% reduce 0%
14/10/21 18:31:13 INFO mapreduce.Job: map 75% reduce 0%
14/10/21 18:31:15 INFO mapreduce.Job: map 100% reduce 0%
14/10/21 18:31:21 INFO mapreduce.Job: Job job_1413879907572_0003 completed successfully
14/10/21 18:31:22 INFO mapreduce.Job: Counters: 30
    File System Counters
        FILE: Number of bytes read=0
        FILE: Number of bytes written=429312
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=532
        HDFS: Number of bytes written=251209
        HDFS: Number of read operations=16
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=8
    Job Counters
        Launched map tasks=4
        Other local map tasks=4
        Total time spent by all maps in occupied slots (ms)=160326
        Total time spent by all reduces in occupied slots (ms)=0
        Total time spent by all map tasks (ms)=160326
        Total vcore-seconds taken by all map tasks=160326
        Total megabyte-seconds taken by all map tasks=164173824
    Map-Reduce Framework
        Map input records=3001
        Map output records=3001
        Input split bytes=532
        Spilled Records=0
        Failed Shuffles=0
        Merged Map outputs=0
        GC time elapsed (ms)=806
        CPU time spent (ms)=5450
        Physical memory (bytes) snapshot=494583808
        Virtual memory (bytes) snapshot=4805771264
        Total committed heap usage (bytes)=325058560
    File Input Format Counters
        Bytes Read=0
    File Output Format Counters
        Bytes Written=251209
14/10/21 18:31:22 INFO mapreduce.ImportJobBase: Transferred 245.3213 KB in 72.5455 seconds (3.3816 KB/sec)
The number of files produced is related to the split column as well as to -m: because the split column here is a varchar, the TextSplitter warned above that it was generating splits for a textual index column and ended up with 4 splits for -m 2, so you do not always get exactly -m output files.
hadoop@caozw:~/study/cdh5$ hadoop fs -ls /user/sqoop/test2/
14/10/21 18:32:01 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 5 items
-rw-r--r-- 1 hadoop supergroup 0 2014-10-21 18:31 /user/sqoop/test2/_SUCCESS
-rw-r--r-- 1 hadoop supergroup 0 2014-10-21 18:31 /user/sqoop/test2/part-m-00000
-rw-r--r-- 1 hadoop supergroup 251130 2014-10-21 18:31 /user/sqoop/test2/part-m-00001
-rw-r--r-- 1 hadoop supergroup 0 2014-10-21 18:31 /user/sqoop/test2/part-m-00002
-rw-r--r-- 1 hadoop supergroup 79 2014-10-21 18:31 /user/sqoop/test2/part-m-00003
The primary key chosen here is poorly suited to range splitting, so the data ends up very unevenly distributed across the output files; this could be improved.
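One way to get more even splits (a sketch only; the weblogs table has no integer column, so request_date is simply the least bad split column in this schema, and /user/sqoop/test3 is a hypothetical target directory) is to choose the split column yourself with --split-by instead of letting Sqoop fall back to the md5 key:

sqoop import -m 2 --split-by request_date \
  --connect jdbc:mysql://127.0.0.1:3306/realworld --username root -P \
  --table weblogs --target-dir /user/sqoop/test3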
③ Exporting data to Oracle and importing into HBase
Use export to move data from HDFS into a remote database:
sqoop export --connect jdbc:oracle:thin:@192.168.**.**:**:** --username ** --password=** -m 1 --table VEHICLE --export-dir /user/root/VEHICLE
Import data into HBase:
sqoop import --connect jdbc:oracle:thin:@192.168.**.**:**:** --username ** --password=** -m 1 --table VEHICLE --hbase-create-table --hbase-table VEHICLE --hbase-row-key ID --column-family VEHICLEINFO --split-by ID
5.6 Testing against a MySQL database
Prerequisite: the MySQL JDBC driver JAR has been copied into Sqoop's lib directory.
① Test the database connection
sqoop list-databases --connect jdbc:mysql://192.168.10.63 --username root --password 123456
② Using Sqoop
In all of the commands below, each line is followed by a space (the lines together form one command); do not forget it.
(None of the following six commands has been successfully tested.)
<1> hdfs -> mysql
sqoop export --connect
jdbc:mysql://192.168.10.63/ipj
--username root
--password 123456
--table ipj_flow_user
--export-dir hdfs://192.168.10.63:8020/user/flow/part-m-00000
Prerequisites:
(1) The HDFS path /user/flow/part-m-00000 must exist.
(2) If the cluster is configured with LZO compression, LZO must also be installed and correctly configured on the local machine.
(3) Every node in the Hadoop cluster must have permission to access MySQL.
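For example, one way to satisfy (3) on the MySQL side is a grant that covers the cluster hosts; this is only a sketch, the user, password, and wildcard host are placeholders, and a '%' grant is normally too broad for production:

mysql -u root -p -e "GRANT ALL PRIVILEGES ON ipj.* TO 'root'@'%' IDENTIFIED BY '123456'; FLUSH PRIVILEGES;"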
<2> mysql -> hdfs
sqoop import --connect
jdbc:mysql://192.168.10.63/ipj
--table ipj_flow_user
<3> mysql -> hbase
sqoop import --connect
jdbc:mysql://192.168.10.63/ipj
--table ipj_flow_user
--hbase-table ipj_statics_test
--hbase-create-table
--hbase-row-key id
--column-family imei
<4> hbase -> mysql
Sqoop does not directly support moving HBase data into MySQL; three approaches are generally used:
First: flatten the HBase data into HDFS files, then move them into MySQL with Sqoop.
Second: load the HBase data into a Hive table, then move it into MySQL.
Third: read the table data directly with the HBase Java API and write it into MySQL yourself; Sqoop is not needed at all.
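As a rough sketch of the second approach (the Hive table names, column list, and column mapping below are hypothetical), the HBase table can be exposed to Hive through the HBase storage handler, copied into a plain text-format Hive table, and that table's data files exported with Sqoop:

# 1. Map the HBase table into Hive and materialize it as a plain, comma-delimited Hive table.
hive -e "
CREATE EXTERNAL TABLE hbase_flow (rowkey STRING, imei STRING)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ('hbase.columns.mapping' = ':key,imei:imei')
TBLPROPERTIES ('hbase.table.name' = 'ipj_statics_test');
CREATE TABLE flow_export
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS TEXTFILE
AS SELECT * FROM hbase_flow;
"
# 2. Export the plain table's warehouse directory to MySQL, telling Sqoop about the delimiter.
sqoop export --connect jdbc:mysql://192.168.10.63/ipj --username root --password 123456 \
  --table ipj_flow_user --export-dir /user/hive/warehouse/flow_export \
  --input-fields-terminated-by ','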
<5> mysql -> hive
sqoop import --connect
jdbc:mysql://192.168.10.63/ipj
--table hive_table_test
--hive-import
--hive-table hive_test_table (add --create-hive-table to have Sqoop create the Hive table)
<6> hive -> mysql
sqoop export --connect
jdbc:mysql://192.168.10.63/ipj
--username hive
--password 123456
--table target_table
--export-dir /user/hive/warehouse/uv/dt=mytable
Prerequisite: the target table must already exist in MySQL.
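A further pitfall worth noting here (an assumption on my part, not part of the original command): data written by Hive with its default SerDe uses \001 as the field separator and \N for NULL, so the export usually needs both spelled out, roughly like this:

sqoop export --connect jdbc:mysql://192.168.10.63/ipj --username hive --password 123456 \
  --table target_table --export-dir /user/hive/warehouse/uv/dt=mytable \
  --input-fields-terminated-by '\001' \
  --input-null-string '\\N' --input-null-non-string '\\N'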
③ Other Sqoop operations
<1> List all databases in MySQL
sqoop list-databases --connect jdbc:mysql://192.168.10.63:3306/ --username root --password 123456
<2> List all tables in a given MySQL database
sqoop list-tables --connect jdbc:mysql://192.168.10.63:3306/ipj --username root --password 123456
6. Sqoop1 performance
Test data:
Table name: tb_keywords
Row count: 11,628,209
Data file size: 1.4 GB
Test results:
|                  | HDFS--->DB | HDFS<---DB |
| Sqoop            | 428s       | 166s       |
| HDFS<->FILE<->DB | 209s       | 105s       |
From these results, using a file as the intermediate step performs better than Sqoop, for the following reason:
Under the hood Sqoop uses JDBC, so it cannot be more efficient than MySQL's own import/export tools. Taking loading data into the database as an example, Sqoop is designed around staged commits: if a table has 1,000 rows, it reads 100 rows (the default), inserts them, commits, reads the next 100 rows, and so on.
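If you do stay with Sqoop, that batching behaviour can be tuned; the following is only a sketch of commonly used knobs (the target table weblogs_copy and the directories are hypothetical, the numbers are illustrative, and --direct only works when the MySQL client tools are installed on the task nodes):

# Commit larger batches per statement and per transaction, and use JDBC batch mode.
sqoop export -D sqoop.export.records.per.statement=1000 \
  -D sqoop.export.statements.per.transaction=100 \
  --connect jdbc:mysql://127.0.0.1:3306/realworld --username root -P \
  --table weblogs_copy --export-dir /user/sqoop/test1 --batch

# Or bypass JDBC with the MySQL-specific fast path (mysqldump/mysqlimport).
sqoop import --direct -m 4 \
  --connect jdbc:mysql://127.0.0.1:3306/realworld --username root -P \
  --table weblogs --target-dir /user/sqoop/test_direct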
Even so, Sqoop has its advantages, such as ease of use and fault tolerance during task execution. In some test environments it is still worth keeping around as a tool when you need one.