HIVE Quick Start

Date: 2019-11-11

Tags: hive, quick start   Category: Hadoop


(I) Getting Started
1. Create a table
create table if not exists ljh_emp(
name string,
salary float,
gender string)
comment 'basic information of an employee'
row format delimited fields terminated by ',';

2. Prepare the data file
Create a directory named test that contains exactly one file, with the following content:
ljh,25000,male
jediael,25000,male
llq,15000,female

3. Load the data into the table
load data local inpath '/home/ljhn1829/test' into table ljh_emp;

4. Query the table
select * from ljh_emp;
OK
ljh   25000.0   male
jediael   25000.0   male
llq   15000.0   female
Time taken: 0.159 seconds, Fetched: 3 row(s)

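(II) About delimiters
1. Default delimiters
In Hive the default row delimiter is \n and the default field delimiter is Ctrl+A. Ctrl+B and Ctrl+C are also used, as delimiters inside array, struct and map values; see Programming Hive (《hive編程指南》), p. 44.
Therefore, if you do not specify row format delimited fields terminated by ',' when creating the table, Hive assumes the field delimiter is Ctrl+A.
There are two ways to deal with this:
the first is to specify the delimiter when creating the table, as in the example above;
the second is to use Ctrl+A as the delimiter in the data file, as in the example below.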

2. Using Ctrl+A as the delimiter in the data file
(1) Create the table
create table ljh_test_emp(name string, salary float, gender string);
(2) Prepare the data file
Create a directory named test2 that contains exactly one file, with the following content:
ljh^A25000^Amale
jediael^A25000^Amale
llq^A15000^Afemale
The ^A characters are visible in vi but not with a plain cat (cat -v does display them as ^A).
To type ^A: in vi insert mode, press Ctrl+V and then Ctrl+A.
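As an alternative to typing ^A by hand in vi, the data file can be generated by a short script. This is a minimal sketch assuming Python 3; the directory name test2 comes from the text above, while the file name data.txt and the helper names are hypothetical (the original does not name the file).

```python
import os

# Hive's default field delimiter: Ctrl+A, i.e. the byte \x01 (octal \001).
FIELD_DELIM = "\x01"

ROWS = [
    ["ljh", "25000", "male"],
    ["jediael", "25000", "male"],
    ["llq", "15000", "female"],
]

def to_ctrl_a_text(rows):
    """Join each row's fields with Ctrl+A and end each row with a newline."""
    return "".join(FIELD_DELIM.join(row) + "\n" for row in rows)

def write_data_file(directory, filename, rows):
    """Create `directory` if needed and write the Ctrl+A-delimited rows into it."""
    os.makedirs(directory, exist_ok=True)
    path = os.path.join(directory, filename)
    # newline="" stops Python from translating "\n" into "\r\n" on Windows.
    with open(path, "w", encoding="utf-8", newline="") as f:
        f.write(to_ctrl_a_text(rows))
    return path

if __name__ == "__main__":
    # Hypothetical file name: the original only says test2 holds a single file.
    print(write_data_file("test2", "data.txt", ROWS))
```

Running it creates test2/data.txt; cat -v test2/data.txt should show the ^A markers between the fields.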
(3) Load the data into the table
load data local inpath '/home/ljhn1829/test2' into table ljh_test_emp;
(4) Query the data
hive> select * from ljh_test_emp;
OK
ljh   25000.0   male
jediael   25000.0   male
llq   15000.0   female
Time taken: 0.2 seconds, Fetched: 3 row(s)

3. Specifying no delimiter while the file does not use Ctrl+A either gives the wrong result shown below
(1) Create the table
create table if not exists ljh_emp_test(
name string,
salary float,
gender string)
comment 'basic information of an employee';
(2) Prepare the data
ljh,25000,male
jediael,25000,male
llq,15000,female
(3) Load the data into the table
load data local inpath '/home/ljhn1829/test' into table ljh_emp_test;
(4) View the data in the table
select * from ljh_emp_test;
OK
ljh,25000,male   NULL   NULL
jediael,25000,male   NULL   NULL
llq,15000,female   NULL   NULL
Time taken: 0.185 seconds, Fetched: 3 row(s)
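As you can see, because the field delimiter defaults to Ctrl+A and the file contains none, Hive reads each entire line as the first field, so the other two fields are NULL.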
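The NULL columns above can be reproduced with a simplified model of how one delimited text line is mapped onto a table's columns: split on the field delimiter, fill missing fields with NULL, and turn values that fail type conversion into NULL. This is an illustrative sketch only, not Hive's actual SerDe code; parse_row and the SCHEMA encoding are hypothetical.

```python
# Simplified model (not Hive's real implementation) of reading one text line.
SCHEMA = [("name", str), ("salary", float), ("gender", str)]

def parse_row(line, schema, delim="\x01"):
    """Split `line` on `delim` and map the pieces onto `schema`.

    Missing fields become None (displayed as NULL by Hive), and values that
    cannot be converted to the column type also become None.
    """
    parts = line.rstrip("\n").split(delim)
    values = []
    for i, (_, col_type) in enumerate(schema):
        if i >= len(parts):
            values.append(None)          # the line has no such field
            continue
        try:
            values.append(col_type(parts[i]))
        except ValueError:
            values.append(None)          # e.g. "abc" in a float column
    return values

if __name__ == "__main__":
    line = "ljh,25000,male"
    print(parse_row(line, SCHEMA))             # table uses the default Ctrl+A
    print(parse_row(line, SCHEMA, delim=","))  # table declared with ','
```

With the default delimiter the whole line lands in name and the other two columns are None, mirroring the Hive output; declaring ',' as the delimiter yields three proper values.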

(III) A more complex table
1. Create the table
create table employees (
    name string,
    slalary float,
    suboddinates array<string>,
    deductions map<string,float>,
    address struct<stree:string, city:string, state:string, zip:int>
)
partitioned by(country string, state string);

2. Prepare the data
John Doe^A100001.1^AMary Smith^BTodd Jones^AFederal Taxes^C.2^BStateTaxes^C.05^BInsurance^C.1^A1 Michigan Ave.^BChicago^BIL^B60600
Mary Smith^A80000.0^ABill King^AFederal Taxes^C.2^BState Taxes^C.05^BInsurance^C.1^A100 Ontario St.^BChicago^BIL^B60601
Todd Jones^A70000.0^A^AFederal Taxes^C.15^BState Taxes^C.03^BInsurance^C.1^A200 Chicago Ave.^BOak Park^BIL^B60700
Bill King^A60001.0^A^AFederal Taxes^C.15^BState Taxes^C.03^BInsurance^C.1^A300 Obscure Dr.^BObscuria^BIL^B60100
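Note: ^A separates fields; ^B separates the elements of an array, the fields of a struct and the entries of a map; ^C separates the key from the value within a map entry.
See Programming Hive (《hive編程指南》), p. 44.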
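To make the ^A/^B/^C layout concrete, the sketch below splits one line of the employees data into its five columns using the three default delimiters. It is a hypothetical, simplified parser written for illustration (parse_employee is not a Hive API) and it only handles this particular schema.

```python
# Default Hive text-format delimiters.
FIELD = "\x01"      # ^A: between columns
ITEM = "\x02"       # ^B: between array elements, struct fields and map entries
KEY_VALUE = "\x03"  # ^C: between a map key and its value

def parse_employee(line):
    """Parse one employees line into (name, salary, subordinates, deductions, address)."""
    name, salary, subs, deds, addr = line.rstrip("\n").split(FIELD)
    subordinates = subs.split(ITEM) if subs else []   # empty field -> empty array
    deductions = {}
    if deds:
        for entry in deds.split(ITEM):
            key, value = entry.split(KEY_VALUE)
            deductions[key] = float(value)
    street, city, state, zip_code = addr.split(ITEM)
    # "stree" mirrors the struct field name used in the table definition.
    address = {"stree": street, "city": city, "state": state, "zip": int(zip_code)}
    return name, float(salary), subordinates, deductions, address

if __name__ == "__main__":
    # Second data line from above, with ^A/^B/^C written as escape sequences.
    sample = ("Mary Smith\x0180000.0\x01Bill King\x01Federal Taxes\x03.2\x02"
              "State Taxes\x03.05\x02Insurance\x03.1\x01"
              "100 Ontario St.\x02Chicago\x02IL\x0260601")
    print(parse_employee(sample))
```

The printed tuple carries the same values Hive shows for the Mary Smith row in the query output.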

3. Load the data into the table
load data local inpath '/home/ljhn1829/phd' into table employees partition(country='us',state='ca');

4. View the table data
hive> select * from employees;
OK
John Doe   100001.1   ["Mary Smith","Todd Jones"]   {"Federal Taxes":0.2,"StateTaxes":0.05,"Insurance":0.1}   {"stree":"1 Michigan Ave.","city":"Chicago","state":"IL","zip":60600}   us   ca
Mary Smith   80000.0   ["Bill King"]   {"Federal Taxes":0.2,"State Taxes":0.05,"Insurance":0.1}   {"stree":"100 Ontario St.","city":"Chicago","state":"IL","zip":60601}   us   ca
Todd Jones   70000.0   []   {"Federal Taxes":0.15,"State Taxes":0.03,"Insurance":0.1}   {"stree":"200 Chicago Ave.","city":"Oak Park","state":"IL","zip":60700}   us   ca
Bill King   60001.0   []   {"Federal Taxes":0.15,"State Taxes":0.03,"Insurance":0.1}   {"stree":"300 Obscure Dr.","city":"Obscuria","state":"IL","zip":60100}   us   ca
Time taken: 0.312 seconds, Fetched: 4 row(s)

5. View the file in HDFS
hadoop fs -ls /data/gamein/g4_us/meta/employees/country=us/state=ca
Found 1 items
-rwxr-x---   3 ljhn1829 g4_us        428 2015-05-12 12:49 /data/gamein/g4_us/meta/employees/country=us/state=ca/progamming_hive_data.txt
The content of this file is identical to the original local file.

(IV) Inserting data with a select clause
1. Create the table
create table employees2 (
    name string,
    slalary float,
    suboddinates array<string>,
    deductions map<string,float>,
    address struct<stree:string, city:string, state:string, zip:int>
)
partitioned by(country string, state string);

2. Insert the data
First switch dynamic partitioning to non-strict mode:
hive> set hive.exec.dynamic.partition.mode=nonstrict;
Otherwise the insert fails with the following exception:
FAILED: SemanticException [Error 10096]: Dynamic partition strict mode requires at least one static partition column. To turn this off set hive.exec.dynamic.partition.mode=nonstrict
Then run the insert:
insert into table employees2 partition (country,state) select name,slalary,suboddinates,deductions,address, e.country, e.state from employees e;
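With dynamic partitioning, Hive takes the partition values from the last columns of the select list, in the order given in the partition clause, and writes each row under the matching country=.../state=... directory, the same layout seen in the HDFS listing in (III). The sketch below models only that routing step; group_by_partition, partition_dir and the table directory are hypothetical, and real Hive additionally escapes special characters in partition values.

```python
from collections import defaultdict

PARTITION_COLS = ["country", "state"]

def partition_dir(table_dir, cols, values):
    """Build a Hive-style partition path such as <table_dir>/country=us/state=ca."""
    parts = [f"{c}={v}" for c, v in zip(cols, values)]
    return "/".join([table_dir.rstrip("/")] + parts)

def group_by_partition(rows, table_dir, cols=PARTITION_COLS):
    """Route each row to a directory using its trailing partition values.

    The last len(cols) values of each row are the dynamic partition values.
    They are dropped from the stored row, because Hive keeps partition
    values in directory names rather than in the data files.
    """
    n = len(cols)
    out = defaultdict(list)
    for row in rows:
        data, part_values = row[:-n], row[-n:]
        out[partition_dir(table_dir, cols, part_values)].append(data)
    return dict(out)
```

Because every employees row here was loaded into partition(country='us', state='ca'), all rows end up in a single country=us/state=ca directory of employees2.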
