Hive Dynamic Partitioning

Date: 2020-07-11

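When inserting data into a partitioned Hive table, creating many partitions by hand is tedious: if the data is partitioned by the values of some column, you end up copying, pasting, and editing one SQL statement per partition, which is inefficient. Because Hive is a batch-processing system, it offers dynamic partitioning: Hive infers the partition values from the position of the columns in the query's select list and creates the partitions for you.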
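
As a rough illustration of the manual approach described above, loading two city values with static partitions takes one statement per partition. The sketch borrows the table names dpartition and mytest_tmp2_p from the steps that follow, and the values 'beijing' and 'beijing1' from the article's later output.

```sql
-- Static partitioning: each partition value is written by hand,
-- so every new city needs its own copy of the statement.
insert overwrite table dpartition partition(ct='beijing')
select id, name from mytest_tmp2_p where city = 'beijing';

insert overwrite table dpartition partition(ct='beijing1')
select id, name from mytest_tmp2_p where city = 'beijing1';
```

With dynamic partitioning, a single statement replaces all of these.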

   1. Create a table partitioned on a single column

    hive>
       create table dpartition(id int ,name string )
       partitioned by(ct string );

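   2. Load data into the table and create the partitions dynamically, using city as the dynamic partition value

    hive>
     -- Enable dynamic partitioning (off by default in older Hive releases)
     set hive.exec.dynamic.partition=true;
     -- Allow every partition column to be dynamic; otherwise at least one static partition is required
     set hive.exec.dynamic.partition.mode=nonstrict;
     insert overwrite table dpartition
     partition(ct)
     select id ,name,city from mytest_tmp2_p;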

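    Key point: dpartition has only two regular columns, so when the query selects three columns (the extra one being city), Hive uses the last column, city, as the partition value. This works because a table's
    partition columns are also treated as columns of that table and always come last, after the regular columns. The columns that feed the partitions must therefore be placed at the end of the select list, in the right order. Selecting four columns
    would fail, because the table has only three columns including the partition column. Note that Hive matches partition columns by position in the select list, not by column name.
    hive>-- Listing the partitions shows that Hive created them dynamically from the values of city.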
    hive (fdm_sor)> show partitions dpartition;
    partition
    ct=beijing
    ct=beijing1
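
To make the position rule concrete, the following sketch reuses the same tables and shows two variations. The alias anything is made up for illustration, and the sketch assumes that name and city are both strings, so both statements are accepted. Only the first one partitions by city.

```sql
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;

-- The alias is ignored: the third selected column still feeds ct,
-- so the partitions are named after the city values.
insert overwrite table dpartition partition(ct)
select id, name, city as anything from mytest_tmp2_p;

-- Swapping the last two columns changes the meaning:
-- city is stored in the name column and the partitions are named after name.
insert overwrite table dpartition partition(ct)
select id, city, name from mytest_tmp2_p;
```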

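Note: when loading data with insert ... select, the number of columns selected must equal the number of columns the target expects, no more and no fewer; otherwise the statement fails. If a column's type does not match, however, Hive does not raise an error: a value that cannot be converted to the target type is stored as NULL. Loading with load data performs no such check. When the table is read, extra fields in the file are dropped, missing fields are filled with NULL, and values whose type does not match also become NULL.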
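
The NULL behaviour on a type mismatch can be seen with an explicit cast, used here as a small stand-in for the conversion Hive applies when a selected value does not fit the target column. This is only a sketch with made-up literals, and a select without a from clause assumes Hive 0.13 or later.

```sql
-- A string that cannot be parsed as an integer becomes NULL instead of raising an error.
select cast('abc' as int);

-- A string that can be parsed converts normally.
select cast('42' as int);
```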

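3. With multiple partition columns: semi-dynamic partitioning (some partition columns are static; note that static partition columns must come before dynamic ones)

    1. Create a partitioned table with one regular column and two partition columns
    hive (fdm_sor)> create table ds_parttion(id int )
                  > partitioned by (state string ,ct string );
    2. Insert data into the table with semi-dynamic partitioning
    hive>
     set hive.exec.dynamic.partition=true;
     set hive.exec.dynamic.partition.mode=nonstrict;
     -- state is a static partition; ct is dynamic and takes its value from the selected city column
     insert overwrite table ds_parttion
     partition(state='china',ct)
     select id ,city from mytest_tmp2_p;

    3. Query results: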
    hive (fdm_sor)> select * from ds_parttion where state='china'
                  > ;
    ds_parttion.id ds_parttion.state       ds_parttion.ct
    4       china   beijing
    3       china   beijing
    2       china   beijing
    1       china   beijing
    4       china   beijing1
    3       china   beijing1
    2       china   beijing1
    1       china   beijing1

    hive (fdm_sor)> select * from ds_parttion where state='china' and ct='beijing';
    ds_parttion.id ds_parttion.state       ds_parttion.ct
    4       china   beijing
    3       china   beijing
    2       china   beijing
    1       china   beijing

    hive (fdm_sor)> select * from ds_parttion where state='china' and ct='beijing1';
    ds_parttion.id ds_parttion.state       ds_parttion.ct
    4       china   beijing1
    3       china   beijing1
    2       china   beijing1
    1       china   beijing1
    Time taken: 0.072 seconds, Fetched: 4 row(s)
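
The ordering rule from the heading of this section can be sketched as follows. The first statement is the accepted form from step 2. The second, shown commented out, puts a dynamic column ahead of a static one, which Hive rejects when it compiles the statement. The value 'beijing' is taken from the output above.

```sql
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;

-- Accepted: static partition column (state) first, dynamic column (ct) after it.
insert overwrite table ds_parttion partition(state='china', ct)
select id, city from mytest_tmp2_p;

-- Rejected: a dynamic partition column may not precede a static one.
-- insert overwrite table ds_parttion partition(state, ct='beijing')
-- select id, country from mytest_tmp2_p;
```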

4. With multiple partition columns: fully dynamic partition inserts

     set hive.exec.dynamic.partition=true;
     set hive.exec.dynamic.partition.mode=nonstrict;
     insert overwrite table ds_parttion
     partition(state,ct)
     select id ,country,city from mytest_tmp2_p;
    Note: the number and the order of the selected columns must be exactly right.
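
After a fully dynamic insert it can help to check which partitions were actually created. A minimal sketch, assuming the insert above has been run; the filter value 'china' is only an example.

```sql
-- List every (state, ct) combination that was created.
show partitions ds_parttion;

-- Restrict the listing to one value of the first partition column.
show partitions ds_parttion partition(state='china');
```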

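5. Dynamic partitioning properties

Parameters that must be set to use dynamic partitioning:

    set hive.exec.dynamic.partition=true (default false before Hive 0.9.0, true in later releases): enables dynamic partitioning
    set hive.exec.dynamic.partition.mode=nonstrict (default strict): allows all partition columns to be dynamic; otherwise at least one static partition column is required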
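
The effect of the mode setting can be sketched with the ds_parttion table from section 3. In strict mode the fully dynamic insert, shown commented out, is refused, while the semi-dynamic form with a static state value is still allowed.

```sql
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=strict;

-- Refused in strict mode: no partition column is given a static value.
-- insert overwrite table ds_parttion partition(state, ct)
-- select id, country, city from mytest_tmp2_p;

-- Allowed in strict mode: state is static, only ct is dynamic.
insert overwrite table ds_parttion partition(state='china', ct)
select id, city from mytest_tmp2_p;
```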

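Tuning parameters related to dynamic partitioning:

    set hive.exec.max.dynamic.partitions.pernode=100 (default 100; it is often raised, for example to 1000)

       The maximum number of dynamic partitions that each mapper or reducer may create. Exceeding it raises an error.

    set hive.exec.max.dynamic.partitions=1000 (default)

       The maximum number of dynamic partitions that a single statement may create in total. Exceeding it raises an error.

    set hive.exec.max.created.files=100000 (default)

       The maximum number of files that all mappers and reducers of a job may create in total. Exceeding it raises an error.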
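
Put together, a session that expects more partitions than the defaults allow might begin like this. Apart from the suggestion of 1000 partitions per node, the numbers are illustrative assumptions and not recommendations from the original article.

```sql
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
-- Per mapper/reducer limit, raised from 100 as the text suggests.
set hive.exec.max.dynamic.partitions.pernode=1000;
-- Per statement limit (assumed value); it makes little sense to set it below the per-node limit.
set hive.exec.max.dynamic.partitions=5000;

insert overwrite table ds_parttion partition(state, ct)
select id, country, city from mytest_tmp2_p;
```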
---------------------
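Author: 牛大財有大才
Source: CSDN
Original: https://blog.csdn.net/qq_26442553/article/details/80382174
Copyright notice: this is an original article by the blogger; please include a link to the original post when reposting.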
