Importing from MySQL into HDFS with Sqoop2 (hadoop-2.7.1, Sqoop 1.99.6)

1、Environment setup

1. Hadoop

http://my.oschina.net/u/204498/blog/519789


2. Sqoop2.x

http://my.oschina.net/u/204498/blog/518941

3. MySQL


2、Importing from MySQL into HDFS

1. Create the MySQL database, table, and test data

xxxxxxxx$  mysql -uroot -p
Enter password: 

mysql> show databases;
+--------------------+
| Database           |
+--------------------+
| information_schema |
| mysql              |
| performance_schema |
| test               |
+--------------------+
4 rows in set (0.00 sec)

test  => the newly created database

mysql> use test;

mysql> show tables;
+----------------------+
| Tables_in_test       |
+----------------------+
| test                 |
+----------------------+
1 row in set (0.00 sec)

test => the newly added table

mysql> desc test;
+-------+-------------+------+-----+---------+----------------+
| Field | Type        | Null | Key | Default | Extra          |
+-------+-------------+------+-----+---------+----------------+
| id    | int(11)     | NO   | PRI | NULL    | auto_increment |
| name  | varchar(45) | YES  |     | NULL    |                |
| age   | int(11)     | YES  |     | NULL    |                |
+-------+-------------+------+-----+---------+----------------+
3 rows in set (0.00 sec)

mysql> select * from test;
+----+------+------+
| id | name | age  |
+----+------+------+
|  7 | a    |    1 |
|  8 | b    |    2 |
|  9 | c    |    3 |
+----+------+------+
3 rows in set (0.00 sec)
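
The database, table, and rows shown above could be recreated with something like the sketch below. It only writes a SQL file; the file name setup_test.sql is an arbitrary choice, and the column definitions are inferred from the `desc test` output rather than taken from the author's original DDL. On a fresh table auto_increment will assign ids starting from 1, not the 7 to 9 seen above.

```shell
#!/bin/sh
# Write the SQL that recreates the `test` database and `test` table.
# setup_test.sql is a hypothetical file name chosen for this sketch.
cat > setup_test.sql <<'SQL'
CREATE DATABASE IF NOT EXISTS test;
USE test;
CREATE TABLE IF NOT EXISTS test (
  id   INT(11)     NOT NULL AUTO_INCREMENT,
  name VARCHAR(45) DEFAULT NULL,
  age  INT(11)     DEFAULT NULL,
  PRIMARY KEY (id)
);
INSERT INTO test (name, age) VALUES ('a', 1), ('b', 2), ('c', 3);
SQL
echo "wrote setup_test.sql; load it with: mysql -uroot -p < setup_test.sql"
```

Keeping the statements in a file makes it easy to rebuild the same test data on another machine.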

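2. Grant privileges to each user

Note: after Sqoop submits the job, every node accesses the database during the map phase, so the privileges must be granted beforehand.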

mysql> grant [all | select | ...] on {db}.{table} to {user}@{host} identified by {passwd};
mysql> flush privileges;

# Here I grant a specific hostname access as username root, password root, to any table in db test, with ALL privileges
mysql> grant all on test.* to 'root'@{host} identified by 'root';
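
Because every node that runs a map task needs access, it can help to generate one GRANT per host instead of typing each by hand. The following is a minimal sketch under stated assumptions: the host names are placeholders, and the user, password, and database simply mirror the example above. The `GRANT ... IDENTIFIED BY` form works on the MySQL 5.x servers of this article's era; MySQL 8.0 removed it in favour of a separate CREATE USER statement.

```shell
#!/bin/sh
# Print one GRANT statement per cluster node, followed by FLUSH PRIVILEGES.
# The host names passed in below are placeholders; substitute your own nodes.
gen_grants() {
  for host in "$@"; do
    printf "GRANT ALL ON test.* TO 'root'@'%s' IDENTIFIED BY 'root';\n" "$host"
  done
  printf 'FLUSH PRIVILEGES;\n'
}

gen_grants node1.example.com node2.example.com
```

The output can be reviewed first and then piped into the client, for example `gen_grants node1.example.com | mysql -uroot -p`.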

3. Start sqoop2-server

[hadoop@hftclclw0001 sqoop-1.99.6-bin-hadoop200]$ pwd
/home/hadoop/sqoop-1.99.6-bin-hadoop200

[hadoop@hftclclw0001 sqoop-1.99.6-bin-hadoop200]$ ./bin/sqoop2-server start
...
...

You can verify that the server is up through the web UI, or check the log.
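
A command-line check is also possible through the server's REST interface. The sketch below assumes the default port 12000 and the default webapp name sqoop, neither of which is stated in this article; adjust them if your installation differs. The function names are made up for this example.

```shell
#!/bin/sh
# Build the URL of the Sqoop2 REST "version" resource.
# Host localhost, port 12000, and webapp name "sqoop" are assumed defaults.
sqoop_version_url() {
  host="${1:-localhost}"
  port="${2:-12000}"
  printf 'http://%s:%s/sqoop/version\n' "$host" "$port"
}

# Return 0 if the server answers the version request, non-zero otherwise.
check_sqoop_server() {
  curl --silent --fail --max-time 5 "$(sqoop_version_url "$@")" > /dev/null
}
```

For example, `check_sqoop_server hftclclw0001 12000 && echo "sqoop2-server is up"` prints the message only when the request succeeds.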

4. Start sqoop2-shell

[hadoop@hftclclw0001 sqoop-1.99.6-bin-hadoop200]$ pwd
/home/hadoop/sqoop-1.99.6-bin-hadoop200

[hadoop@hftclclw0001 sqoop-1.99.6-bin-hadoop200]$ ./bin/sqoop2-shell 
...
...

sqoop:000> show version
...
...

sqoop:000> show connector
+----+------------------------+---------+------------------------------------------------------+----------------------+
| Id |          Name          | Version |                        Class                         | Supported Directions |
+----+------------------------+---------+------------------------------------------------------+----------------------+
| 1  | generic-jdbc-connector | 1.99.6  | org.apache.sqoop.connector.jdbc.GenericJdbcConnector | FROM/TO              |
| 2  | kite-connector         | 1.99.6  | org.apache.sqoop.connector.kite.KiteConnector        | FROM/TO              |
| 3  | hdfs-connector         | 1.99.6  | org.apache.sqoop.connector.hdfs.HdfsConnector        | FROM/TO              |
| 4  | kafka-connector        | 1.99.6  | org.apache.sqoop.connector.kafka.KafkaConnector      | TO                   |
+----+------------------------+---------+------------------------------------------------------+----------------------+

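Create a link for each connector you need
sqoop:000> create link -c 1      => first create the JDBC link
You will be prompted for the name, JDBC driver, URL, username, password, and so on.

sqoop:000> create link -c 3      => then create the HDFS link
You will be prompted for the name, the HDFS URL, and so on.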

sqoop:000> show link
+----+-------------+--------------+------------------------+---------+
| Id |    Name     | Connector Id |     Connector Name     | Enabled |
+----+-------------+--------------+------------------------+---------+
| 3  | 10-21_jdbc1 | 1            | generic-jdbc-connector | true    |
| 4  | 10-21_hdfs1 | 3            | hdfs-connector         | true    |
+----+-------------+--------------+------------------------+---------+

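Create the job: -f => from, -t => to, i.e. which link to import from and which link to import into
sqoop:000> create job -f 3 -t 4
You will be prompted for the corresponding table information and the HDFS information.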

sqoop:000> show job             
+----+---------------+----------------+--------------+---------+
| Id |     Name      | From Connector | To Connector | Enabled |
+----+---------------+----------------+--------------+---------+
| 1  | 10-20_sqoopy2 | 1              | 3            | true    |
+----+---------------+----------------+--------------+---------+

# Start the job (pass the Id that show job reports for your job)
sqoop:000> start job -j 2
...
...
...

You can follow the progress in the web UI, or use:
sqoop:000> status job -j 2
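
Once the job has finished, the imported records can be inspected directly in HDFS. This is a minimal sketch: the function name is made up for this example, and the directory argument stands for whatever output directory you entered when creating the job, which this article does not show.

```shell
#!/bin/sh
# List the files an import wrote and print the first few records.
# $1 is the HDFS output directory you entered when creating the job.
show_import_output() {
  dir="$1"
  if [ -z "$dir" ]; then
    echo "usage: show_import_output <hdfs-output-dir>" >&2
    return 2
  fi
  hdfs dfs -ls "$dir" || return 1
  # Quote the glob so that HDFS, not the local shell, expands it.
  hdfs dfs -cat "$dir/*" | head -n 10
}
```

For the three-row test table above you would expect to see three records, one per row.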

Sqoop guide

http://sqoop.apache.org/


5. Troubleshooting

Read the logs carefully and narrow the problem down step by step.
