First, make sure HDFS and HiveServer2 are running. The cluster runs on three hosts: debugo01, debugo02, and debugo03.
Create a test database and a test table, employee_salary, in MySQL on debugo03.
mysql -uroot -p

mysql> create database test_sqoop;
Query OK, 1 row affected (0.00 sec)
mysql> use test_sqoop;

SET FOREIGN_KEY_CHECKS=0;
DROP TABLE IF EXISTS `employee_salary`;
CREATE TABLE `employee_salary` (
  `name` text,
  `id` int(8) NOT NULL AUTO_INCREMENT,
  `salary` int(8) DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=MyISAM AUTO_INCREMENT=3 DEFAULT CHARSET=latin1;
INSERT INTO `employee_salary` VALUES ('zhangsan', 1, 5000);
INSERT INTO `employee_salary` VALUES ('lisi', 2, 5500);
commit;
CREATE USER 'test'@'%' IDENTIFIED BY 'test';
GRANT ALL PRIVILEGES ON test_sqoop.* TO 'test'@'%';
Install Sqoop and copy the MySQL JDBC driver into Sqoop's lib directory:

yum install sqoop
cp /usr/share/java/mysql-connector-java.jar /usr/lib/sqoop/lib
(1) sqoop help
[root@hadoop01 ~]# sqoop help
Available commands:
  codegen             Generate code to interact with database records
  create-hive-table   Import a table definition into Hive
  eval                Evaluate a SQL statement and display the results
  export              Export an HDFS directory to a database table
  help                List available commands
  import              Import a table from a database to HDFS
  import-all-tables   Import tables from a database to HDFS
  import-mainframe    Import datasets from a mainframe server to HDFS
  job                 Work with saved jobs
  list-databases      List available databases on a server
  list-tables         List available tables in a database
  merge               Merge results of incremental imports
  metastore           Run a standalone Sqoop metastore
  version             Display version information
(2) List all databases (useful for testing the connection)

This is typically used to test the connection; the result only contains databases the MySQL user has privileges on.
sqoop list-databases --connect jdbc:mysql://debugo03 --username test --password test
information_schema
test_sqoop
(3) List all tables
sqoop list-tables --connect jdbc:mysql://debugo03/test_sqoop --username test --password test
employee_salary
(4) Import a MySQL table into HDFS
Here, -D mapred.job.queue.name=lesson specifies the YARN queue the job runs in.
-m 1 sets the number of map tasks to 1.
sqoop import -D mapred.job.queue.name=lesson \
  --connect jdbc:mysql://debugo03/test_sqoop \
  --username test --password test \
  --table employee_salary -m 1 \
  --target-dir /user/sqoop
Create a test directory on HDFS:
su - hdfs
hdfs dfs -mkdir /user/sqoop
hdfs dfs -chown sqoop:hadoop /user/sqoop
Run the import:
su - sqoop -s /bin/sh
sqoop import -D mapred.job.queue.name=lesson \
  --connect jdbc:mysql://debugo03/test_sqoop \
  --username test --password test \
  --table employee_salary -m 1 \
  --target-dir /user/sqoop/employee_salary
If the following error occurs, update the MySQL connector JAR under /usr/lib/sqoop/lib. Issue: https://issues.apache.org/jira/browse/SQOOP-1400
ERROR manager.SqlManager: Error reading from database: java.sql.SQLException:
Streaming result set com.mysql.jdbc.RowDataDynamic@2c176ab7 is still active.
No statements may be issued when any streaming result sets are open and in use
on a given connection. Ensure that you have called .close() on any active
streaming result sets before attempting more queries.
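A minimal sketch of that fix, assuming you have downloaded a newer Connector/J build; the version number and source path below are placeholders, not taken from this setup:

```shell
# Replace the MySQL connector JAR under Sqoop's lib directory.
# 5.1.32 and /tmp/... are assumptions; use any Connector/J release
# that resolves SQOOP-1400 in your environment.
cd /usr/lib/sqoop/lib
rm -f mysql-connector-java.jar
cp /tmp/mysql-connector-java-5.1.32-bin.jar .
ln -s mysql-connector-java-5.1.32-bin.jar mysql-connector-java.jar
```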
Check the data on HDFS; it matches the employee_salary table in MySQL.
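One way to inspect the imported data is a plain `hdfs dfs -cat`; the part file name below is the usual default for a single-mapper import, not taken from this run:

```shell
# List and print the files Sqoop wrote; a -m 1 import produces
# a single part-m-00000 file of comma-separated rows.
hdfs dfs -ls /user/sqoop/employee_salary
hdfs dfs -cat /user/sqoop/employee_salary/part-m-00000
```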
(5) Import all data from the MySQL table into Hive

Create the corresponding database in Hive:
0: jdbc:hive2://debugo02:10000> create database test_sqoop;
No rows affected (0.147 seconds)
0: jdbc:hive2://debugo02:10000> show databases;
+----------------+--+
| database_name  |
+----------------+--+
| default        |
| sales          |
| test_sqoop     |
+----------------+--+
3 rows selected (0.046 seconds)
Create the table with sqoop:
sqoop create-hive-table -D mapred.job.queue.name=work \
  --connect jdbc:mysql://debugo03/test_sqoop \
  --username test --password test \
  --table employee_salary \
  --hive-table test_sqoop.employee_salary
OK
Time taken: 1.515 seconds
Import the data from MySQL into Hive with sqoop.

Make sure that every node in the cluster can log in to MySQL.
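A quick way to check this, reusing the host names and test credentials from earlier (the loop itself is not part of the original setup):

```shell
# Try a trivial query from every cluster node against MySQL on debugo03.
for h in debugo01 debugo02 debugo03; do
  ssh "$h" "mysql -h debugo03 -utest -ptest -e 'select 1' test_sqoop" \
    >/dev/null 2>&1 && echo "$h: OK" || echo "$h: cannot reach MySQL"
done
```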
sqoop import -D mapred.job.queue.name=lesson \
  --connect jdbc:mysql://debugo03/test_sqoop \
  --username test --password test \
  --table employee_salary \
  --hive-import --hive-table test_sqoop.employee_salary
OK
Time taken: 0.698 seconds
Check the result: success!
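For example, query the table through HiveServer2. The connection string matches the one used above; running beeline non-interactively with -u/-e is an assumption about your client setup:

```shell
# Read the imported rows back from Hive via beeline.
beeline -u jdbc:hive2://debugo02:10000 \
  -e "select * from test_sqoop.employee_salary;"
```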
(6) Export data from Hive to MySQL

First, delete the data in the MySQL employee_salary table:
mysql> use test_sqoop;
Database changed
mysql> show tables;
+----------------------+
| Tables_in_test_sqoop |
+----------------------+
| employee_salary      |
+----------------------+
1 row in set (0.00 sec)
mysql> truncate employee_salary;
Query OK, 0 rows affected (0.00 sec)
mysql> select * from employee_salary;
Empty set (0.00 sec)
Export the data:
sqoop export -D mapred.job.queue.name=work \
  --connect jdbc:mysql://debugo03/test_sqoop \
  --username test --password test \
  --table employee_salary \
  --export-dir /user/hive/warehouse/test_sqoop.db/employee_salary/ \
  --input-fields-terminated-by '\0001'
......
15/04/01 19:40:55 INFO mapreduce.ExportJobBase: Exported 2 records
Check the result; the records have been exported to MySQL successfully.
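For example, with the test account created earlier (a non-interactive mysql invocation; adjust host and credentials to your environment):

```shell
# Confirm the exported rows are back in MySQL.
mysql -h debugo03 -utest -ptest test_sqoop \
  -e "select * from employee_salary;"
```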
(7) Incrementally import data from the MySQL table into Hive

Insert incremental data into the MySQL table: a single row for wangwu.
mysql> insert into employee_salary values('wangwu', 3, 6000);
Query OK, 1 row affected (0.00 sec)
mysql> commit;
Query OK, 0 rows affected (0.00 sec)
mysql> select * from employee_salary;
+----------+----+--------+
| name     | id | salary |
+----------+----+--------+
| lisi     |  2 |   5500 |
| zhangsan |  1 |   5000 |
| wangwu   |  3 |   6000 |
+----------+----+--------+
3 rows in set (0.00 sec)
Run the incremental import:
sqoop import -D mapred.job.queue.name=lesson \
  --connect jdbc:mysql://debugo03/test_sqoop \
  --username test --password test \
  --table employee_salary \
  --hive-import --hive-table test_sqoop.employee_salary \
  --check-column id --incremental append --last-value 2
OK
Time taken: 0.77 seconds
Loading data to table test_sqoop.employee_salary
Table test_sqoop.employee_salary stats: [numFiles=3, numRows=0, totalSize=42, rawDataSize=0]
OK
Time taken: 0.625 seconds
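Passing --last-value by hand is easy to get wrong on repeated runs. A sketch of the alternative is a saved Sqoop job, whose metastore remembers the highest imported id between executions; the job name here is made up:

```shell
# Create a saved job; Sqoop's metastore updates --last-value
# automatically after each successful --exec.
sqoop job --create employee_salary_incr -- import \
  --connect jdbc:mysql://debugo03/test_sqoop \
  --username test --password test \
  --table employee_salary \
  --hive-import --hive-table test_sqoop.employee_salary \
  --check-column id --incremental append --last-value 0

# Each run imports only rows with id greater than the stored value.
sqoop job --exec employee_salary_incr
```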
Check the result:
select * from test_sqoop.employee_salary;