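1. Problem description
After importing a MySQL table into a Hive table with Sqoop, every column of the Hive table showed NULL.
The Hive table's delimiter is "\u001B", and the delimiter given to Sqoop is also "\u001B".
Running show create table test_hive_delimiter returns the following DDL: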
0: jdbc:hive2://localhost:10000/> show create table test_hive_delimiter;
...
INFO : OK
+----------------------------------------------------+--+
| createtab_stmt |
+----------------------------------------------------+--+
| CREATE EXTERNAL TABLE `test_hive_delimiter`( |
| `id` int, |
| `name` string, |
| `address` string) |
| ROW FORMAT SERDE |
| 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' |
| WITH SERDEPROPERTIES ( |
| 'field.delim'='\u0015', |
| 'serialization.format'='\u0015') |
| STORED AS INPUTFORMAT |
| 'org.apache.hadoop.mapred.TextInputFormat' |
| OUTPUTFORMAT |
| 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' |
| LOCATION |
| 'hdfs://ip-172-31-6-148.fayson.com:8020/fayson/test_hive_delimiter' |
| TBLPROPERTIES ( |
| 'COLUMN_STATS_ACCURATE'='false', |
| 'numFiles'='0', |
| 'numRows'='-1', |
| 'rawDataSize'='-1', |
| 'totalSize'='0', |
| 'transient_lastDdlTime'='1504705887') |
+----------------------------------------------------+--+
22 rows selected (0.084 seconds)
0: jdbc:hive2://localhost:10000/>
The original DDL declared "\u001B" as the delimiter, but show create table test_hive_delimiter reports "\u0015": Hive has changed the delimiter.
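Why does a wrong delimiter produce all-NULL columns instead of an error? The Python sketch below is a simplified model of a delimited-text deserializer; it is an assumption that mimics the observed behaviour, not Hive's LazySimpleSerDe code. Each line is split on the table's field delimiter, missing fields become NULL (None), and a field that cannot be converted to the column type also becomes NULL. The sample row is hypothetical, modelled on the rows the fixed table returns at the end of this article.

```python
from typing import Callable, List, Optional

# Simplified model of reading one delimited text line into typed columns
# (assumption: reproduces the observable result, not Hive's real implementation).
def parse_row(line: str, delim: str,
              types: List[Callable[[str], object]]) -> List[Optional[object]]:
    fields = line.split(delim)
    row: List[Optional[object]] = []
    for i, convert in enumerate(types):
        if i >= len(fields):
            row.append(None)          # field missing from the line -> NULL
            continue
        try:
            row.append(convert(fields[i]))
        except ValueError:
            row.append(None)          # value not parsable as the column type -> NULL
    return row

# Hypothetical record written by Sqoop with the ESC (0x1B) delimiter.
line = "1\x1bfayson\x1bguangdong"
column_types = [int, str, str]        # id int, name string, address string

print(parse_row(line, "\x15", column_types))  # [None, None, None]
print(parse_row(line, "\x1b", column_types))  # [1, 'fayson', 'guangdong']
```

With 0x15 the line never splits, so the whole line lands in `id`, fails the int conversion, and the other two columns are missing: three NULLs, as in the query output.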
2. Reproducing the problem
1. Create the Hive table test_hive_delimiter with "\u001B" as the field delimiter
create external table test_hive_delimiter
(
  id int,
  name string,
  address string
)
row format delimited fields terminated by '\u001B'
stored as textfile location '/fayson/test_hive_delimiter';
2. Use Sqoop to import the test table from MySQL into the Hive table test_hive_delimiter
[root@ip-172-31-6-148 ~]# sqoop import --connect jdbc:mysql://ip-172-31-6-148.fayson.com:3306/fayson \
--username root --password 123456 \
--table test -m 1 \
--hive-import --fields-terminated-by "\0x001B" \
--target-dir /fayson/test_hive_delimiter \
--hive-table test_hive_delimiter
The import succeeds:
[root@ip-172-31-6-148 ~]# hadoop fs -ls /fayson/test_hive_delimiter
Found 2 items
-rw-r--r-- 3 fayson supergroup 0 2017-09-06 13:46 /fayson/test_hive_delimiter/_SUCCESS
-rwxr-xr-x 3 fayson supergroup 56 2017-09-06 13:46 /fayson/test_hive_delimiter/part-m-00000
[root@ip-172-31-6-148 ~]# hadoop fs -ls /fayson/test_hive_delimiter/part-m-00000
-rwxr-xr-x 3 fayson supergroup 56 2017-09-06 13:46 /fayson/test_hive_delimiter/part-m-00000
[root@ip-172-31-6-148 ~]#
3. Query the test_hive_delimiter table
[root@ip-172-31-6-148 ~]# beeline
Beeline version 1.1.0-cdh5.12.1 by Apache Hive
beeline> !connect jdbc:hive2://localhost:10000/;principal=hive/ip-172-31-6-148.fayson.com@FAYSON.COM
...
Transaction isolation: TRANSACTION_REPEATABLE_READ
0: jdbc:hive2://localhost:10000/> select * from test_hive_delimiter;
...
INFO : OK
+-------------------------+---------------------------+------------------------------+--+
| test_hive_delimiter.id | test_hive_delimiter.name | test_hive_delimiter.address |
+-------------------------+---------------------------+------------------------------+--+
| NULL | NULL | NULL |
| NULL | NULL | NULL |
| NULL | NULL | NULL |
+-------------------------+---------------------------+------------------------------+--+
3 rows selected (0.287 seconds)
0: jdbc:hive2://localhost:10000/>
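4. Check the table's DDL: show create table test_hive_delimiter returns the statement shown in section 1, where the field delimiter has become "\u0015".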
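Before deciding which side is at fault, it can help to confirm which delimiter byte Sqoop actually wrote into part-m-00000. The following is a minimal Python sketch under stated assumptions: the script name find_delims.py and its helper functions are hypothetical, and the file content is either piped in (argument "-") or read from a local copy. It counts control bytes other than the newline line terminator and prints each one in hex, \u and octal notation.

```python
import sys
from collections import Counter

def control_bytes(data: bytes) -> Counter:
    """Count control bytes (< 0x20), ignoring '\\n', used here as the line terminator."""
    return Counter(b for b in data if b < 0x20 and b != 0x0A)

def describe(code: int) -> str:
    # Render one byte in the three notations used in this article.
    return f"0x{code:02X} (\\u{code:04X}, octal \\{code:03o})"

def main(argv: list) -> None:
    if len(argv) < 2:
        print("usage: find_delims.py <local-file | ->", file=sys.stderr)
        return
    if argv[1] == "-":
        data = sys.stdin.buffer.read()        # e.g. piped from hadoop fs -cat
    else:
        with open(argv[1], "rb") as f:
            data = f.read()
    for code, n in sorted(control_bytes(data).items()):
        print(f"{describe(code)}: {n} occurrence(s)")

if __name__ == "__main__":
    main(sys.argv)
```

A possible invocation is `hadoop fs -cat /fayson/test_hive_delimiter/part-m-00000 | python3 find_delims.py -`. If the file holds the three rows returned by the final query, one would expect only 0x1B, six times (two per row); those three rows plus their newlines also add up to exactly the 56 bytes listed by hadoop fs -ls.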
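3. Solution
The delimiter "\u001B" is written in hexadecimal, but Hive actually expects delimiters in octal. A hexadecimal delimiter is therefore converted (escaped) by Hive, which is why the table created with "\u001B" ends up showing "\u0015" as its delimiter.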
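The specific value 0x15 is worth a closer look, since it is not 0x1B in octal (that would be \033). The Python sketch below tests one hypothesis, which is an assumption and not something established in this article: that this Hive version decodes the four digits after \u using decimal place values (1000, 100, 10, 1) instead of hexadecimal ones (4096, 256, 16, 1). Under that assumption "\u001B" gives 0·1000 + 0·100 + 1·10 + 11·1 = 21 = 0x15, exactly what show create table reported, while a common delimiter such as "\u0001" would still decode correctly.

```python
def decode_correct(digits: str) -> int:
    """Standard decoding of the four digits after \\u: hex digits, powers of 16."""
    return int(digits, 16)

def decode_decimal_weights(digits: str) -> int:
    """Hypothesised faulty decoding: hex digit values weighted 1000/100/10/1."""
    weights = (1000, 100, 10, 1)
    return sum(int(d, 16) * w for d, w in zip(digits, weights))

for digits in ("001B", "0001"):
    good = decode_correct(digits)
    bad = decode_decimal_weights(digits)
    print(f"\\u{digits}: expected 0x{good:02X}, hypothesised 0x{bad:02X}")
# \u001B: expected 0x1B, hypothesised 0x15
# \u0001: expected 0x01, hypothesised 0x01
```

If the hypothesis holds, it would also explain why octal escapes such as \033 avoid the problem: they never go through the \u decoding path.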
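To fix this without changing the delimiter in the data files, convert the hexadecimal delimiter to octal and create the Hive table with the octal form.
1. Convert the hexadecimal delimiter to octal
"\u001B" in octal is "\033". Online converter: http://tool.lu/hexconvert/
2. Change the CREATE TABLE statement to use the octal "\033" as the delimiter (drop the existing table first; because it is an external table, dropping it leaves the data files in place)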
create external table test_hive_delimiter
(
  id int,
  name string,
  address string
)
row format delimited fields terminated by '\033'
stored as textfile location '/fayson/test_hive_delimiter';
Check the DDL again with show create table test_hive_delimiter:
0: jdbc:hive2://localhost:10000/> show create table test_hive_delimiter;
...
INFO : OK
+----------------------------------------------------+--+
| createtab_stmt |
+----------------------------------------------------+--+
| CREATE EXTERNAL TABLE `test_hive_delimiter`( |
| `id` int, |
| `name` string, |
| `address` string) |
| ROW FORMAT SERDE |
| 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' |
| WITH SERDEPROPERTIES ( |
| 'field.delim'='\u001B', |
| 'serialization.format'='\u001B') |
| STORED AS INPUTFORMAT |
| 'org.apache.hadoop.mapred.TextInputFormat' |
| OUTPUTFORMAT |
| 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' |
| LOCATION |
| 'hdfs://ip-172-31-6-148.fayson.com:8020/fayson/test_hive_delimiter' |
| TBLPROPERTIES ( |
| 'COLUMN_STATS_ACCURATE'='false', |
| 'numFiles'='0', |
| 'numRows'='-1', |
| 'rawDataSize'='-1', |
| 'totalSize'='0', |
| 'transient_lastDdlTime'='1504707693') |
+----------------------------------------------------+--+
22 rows selected (0.079 seconds)
0: jdbc:hive2://localhost:10000/>
3. Query the test_hive_delimiter table
0: jdbc:hive2://localhost:10000/> select * from test_hive_delimiter;
...
INFO : OK
+-------------------------+---------------------------+------------------------------+--+
| test_hive_delimiter.id | test_hive_delimiter.name | test_hive_delimiter.address |
+-------------------------+---------------------------+------------------------------+--+
| 1 | fayson | guangdong |
| 2 | zhangsan | shenzheng |
| 3 | lisi | shanghai |
+-------------------------+---------------------------+------------------------------+--+
3 rows selected (0.107 seconds)
0: jdbc:hive2://localhost:10000/>
Creating the table with the octal "\033" instead of the hexadecimal "\u001B" solves the problem.
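Instead of the online converter used in step 1, the hex-to-octal conversion is a one-liner. The Python sketch below is illustrative only: the helper names from_hex and hive_octal_escape are hypothetical, and the sketch deliberately restricts itself to 7-bit ASCII delimiters such as the control character 0x1B used here. It turns hex digits into a character and renders that character as a three-digit octal escape suitable for the fields terminated by clause.

```python
def from_hex(hex_digits: str) -> str:
    """Turn hex digits such as '001B' (as in \\u001B or Sqoop's \\0x001B) into a character."""
    return chr(int(hex_digits, 16))

def hive_octal_escape(delimiter: str) -> str:
    """Return the three-digit octal escape, e.g. '\\033', for a one-character delimiter."""
    if len(delimiter) != 1:
        raise ValueError("expected exactly one character")
    code = ord(delimiter)
    if code > 0x7F:
        # Limitation of this sketch: only 7-bit ASCII delimiters are handled.
        raise ValueError("sketch supports ASCII delimiters only")
    return f"\\{code:03o}"

# Rebuild the delimiter clause used in the corrected DDL.
delim = from_hex("001B")
clause = f"fields terminated by '{hive_octal_escape(delim)}'"
print(hive_octal_escape(delim))  # \033
print(clause)                    # fields terminated by '\033'
```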
4. Notes
Sqoop user guide, File Formats section: https://sqoop.apache.org/docs/1.4.6/SqoopUserGuide.html#_file_formats
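The Sqoop user guide linked above describes the forms accepted by delimiter arguments such as --fields-terminated-by: a plain character, the escapes \b, \n, \r, \t, \", \', \\ and \0, an octal code point written \0ooo, and a hexadecimal code point written \0xhhh. The Python function below is a rough reading of those documented rules, not Sqoop's own code, and its name is hypothetical. It shows that the "\0x001B" passed in the import command and the octal "\033" denote the same character, 0x1B.

```python
def parse_sqoop_delimiter(arg: str) -> str:
    """Approximate the delimiter escapes described in the Sqoop user guide.

    Assumption: this follows the documented rules, not Sqoop's implementation.
    """
    simple = {"b": "\b", "n": "\n", "r": "\r", "t": "\t",
              '"': '"', "'": "'", "\\": "\\", "0": "\0"}
    if not arg:
        raise ValueError("empty delimiter argument")
    if arg[:3].lower() == "\\0x":                # hexadecimal form: \0xhhh
        return chr(int(arg[3:], 16))
    if arg.startswith("\\0") and len(arg) > 2:   # octal form: \0ooo
        return chr(int(arg[2:], 8))
    if arg.startswith("\\") and len(arg) == 2:   # one-character escape such as \t or \0
        if arg[1] not in simple:
            raise ValueError(f"unsupported escape: {arg}")
        return simple[arg[1]]
    if arg.startswith("\\") and len(arg) > 2:
        raise ValueError(f"unsupported escape: {arg}")
    return arg[0]                                # plain character; only the first is used

for arg in ("\\0x001B", "\\033", "\\t", ","):
    print(arg, "->", hex(ord(parse_sqoop_delimiter(arg))))
```

Both the hexadecimal and the octal spelling resolve to 0x1b here, which is consistent with the data file itself being correct and the mismatch arising only on the Hive side.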
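Drunk, I whip the famous steed; youth is full of swagger! Lingnan, to the tune of Huanxisha: retching beneath the tavern! Close friends won't let go; we play with data in style!
Follow Hadoop實操 for more hands-on Hadoop content as soon as it is published; feel free to forward and share.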