Setting Variables in Hive

Date: 2019-11-10

Tags: hive, setting variables. Category: Hadoop


hive --define --hivevar --hiveconf

set

1. The hivevar namespace

User-defined variables.

hive -d name=zhangsan
hive --define name=zhangsan
hive -d a=1 -d b=2

--define (-d) has the same effect as --hivevar:

hive --hivevar a=1 --hivevar b=2

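When referencing a variable in the hivevar namespace, the hivevar: prefix is optional. When assigning, the explicit prefix is the safer choice, because an unprefixed set name=value may be stored as a configuration property instead: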

set name;
set name=zhangsan;
set hivevar:name;
set hivevar:name=zhangsan;

In queries, reference variables with ${}; the hivevar: prefix is optional here as well:

create table ${a} (${b} int);
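
The same substitution applies to script files run with hive -f. A minimal sketch, assuming a hypothetical script name (create_table.hql) and hypothetical variable names (tbl, col) that do not come from the original article; the hive call is skipped when the client is not installed:

```shell
# Write a small HiveQL script that references two hivevar variables.
# The quoted heredoc delimiter ('EOF') stops the shell from expanding ${...},
# so Hive receives the placeholders untouched.
workdir=$(mktemp -d)
script="$workdir/create_table.hql"   # hypothetical file name
cat > "$script" <<'EOF'
create table ${hivevar:tbl} (${col} int);
EOF

# tbl and col are illustrative names; one uses the prefix, the other omits it.
if command -v hive >/dev/null 2>&1; then
  hive --hivevar tbl=demo_table --hivevar col=id -f "$script"
else
  echo "hive not found; would run: hive --hivevar tbl=demo_table --hivevar col=id -f $script"
fi
```

Keeping the values on the command line lets one script serve several tables without editing it.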

2. The hiveconf namespace

Hive configuration parameters. Values given here override those in hive-site.xml (hive-default.xml).

hive --hiveconf hive.cli.print.current.db=true --hiveconf hive.cli.print.header=true

hive --hiveconf hive.root.logger=INFO,console

Specify a per-user warehouse directory at startup, so that each user gets a different directory:

hive --hiveconf hive.metastore.warehouse.dir=/hive/$USER
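
Note that $USER in this command is expanded by the shell before Hive starts, so Hive only ever sees the resulting path. A small sketch that makes the expansion visible (the fallback name "unknown" is an assumption of this sketch, for environments where USER is unset):

```shell
# The shell substitutes $USER before hive is launched.
user="${USER:-unknown}"          # fallback value is an assumption of this sketch
warehouse="/hive/$user"
echo "hive --hiveconf hive.metastore.warehouse.dir=$warehouse"

# Single quotes would suppress the expansion and pass the literal text $USER.
literal='/hive/$USER'
echo "unexpanded: $literal"
```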

When reading or setting a hiveconf variable with set, the hiveconf: prefix is optional:

set hive.cli.print.header;
set hive.cli.print.header=false;

3. The system namespace

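Java system properties of the JVM that runs Hive. They are normally only read from Hive; unlike env variables they can technically be assigned with set system:name=value, but this is rarely needed.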

The system: prefix is required when referencing them:

set system:user.name;

create table ${system:user.name} (a int);

4. The env namespace

Shell environment variables. The env: prefix is required when referencing them:

set env:USER;
set env:HADOOP_HOME;

create table ${env:USER} (${env:USER} string);
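
An environment variable has to be exported before Hive starts for the env: namespace to see it. A minimal sketch, assuming a hypothetical variable TARGET_TABLE; the query is single-quoted so that the shell leaves ${env:...} alone and Hive performs the substitution:

```shell
# Export the variable so the Hive process inherits it.
TARGET_TABLE=demo_env_table     # hypothetical name, for illustration only
export TARGET_TABLE

# Single quotes keep ${env:...} away from the shell.
# In double quotes the shell would try to interpret the expression itself.
query='create table ${env:TARGET_TABLE} (a int);'

if command -v hive >/dev/null 2>&1; then
  hive -e "$query"
else
  echo "hive not found; would run: hive -e '$query'"
fi
```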

Appendix: commonly used settings

Print log messages in the session

hive --hiveconf hive.root.logger=DEBUG,console

You can also change the hive.root.logger property in $HIVE_HOME/conf/hive-log4j.properties, but it cannot be changed with the set command.

Show the current database

set hive.cli.print.current.db=true;

Show column names

set hive.cli.print.header=true;
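
Display settings like these are often wanted in every session. One way to avoid retyping them is an initialization file passed with hive -i. A minimal sketch, assuming a hypothetical file name (session-init.hql) and a throwaway query; the hive call is skipped when the client is not installed:

```shell
# Collect the display settings in an initialization file.
workdir=$(mktemp -d)
init="$workdir/session-init.hql"    # hypothetical file name
cat > "$init" <<'EOF'
set hive.cli.print.current.db=true;
set hive.cli.print.header=true;
EOF

# -i runs the file before the query given with -e.
if command -v hive >/dev/null 2>&1; then
  hive -i "$init" -e 'show databases;'
else
  echo "hive not found; would run: hive -i $init -e 'show databases;'"
fi
```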

Before inserting data into a bucketed table, bucketing must be enabled

create table t1 (id int) clustered by (id) into 4 buckets;
set hive.enforce.bucketing=true;
insert into table t1 select * from t2;

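With bucketing enforced, Hive automatically sets the number of reducers from the table's bucket count when inserting into a bucketed table. Otherwise you have to set the number of reducers yourself with set mapreduce.job.reduces=N (where N is the number of buckets defined for the table), or with the older name mapred.reduce.tasks, and then add a cluster by clause after the select statement.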
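
The manual variant can be sketched as follows, reusing t1 (4 buckets on id) and t2 from the example above. The script name is hypothetical, and the sketch assumes t2 also has an id column:

```shell
# Write the manual-bucketing variant to a script file.
workdir=$(mktemp -d)
script="$workdir/bucket_insert.hql"   # hypothetical file name
cat > "$script" <<'EOF'
-- t1 was declared with 4 buckets, so use 4 reducers
set mapreduce.job.reduces=4;
-- cluster by sends rows with the same id to the same reducer
insert into table t1 select * from t2 cluster by id;
EOF

if command -v hive >/dev/null 2>&1; then
  hive -f "$script"
else
  echo "hive not found; would run: hive -f $script"
fi
```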

Dynamic partitioning

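-- enable dynamic partitioning
set hive.exec.dynamic.partition=true;
-- dynamic partition mode: strict requires at least one static partition, nonstrict has no such restriction
set hive.exec.dynamic.partition.mode=nonstrict;
-- each mapper node may create at most 100 partitions
set hive.exec.max.dynamic.partitions.pernode=100;
-- total number of partitions that may be created
set hive.exec.max.dynamic.partitions=1000;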

from t insert overwrite table p partition(country, dt) select ... country, dt

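While the query above runs, the number of partitions written by a single map task is not controlled and may exceed the limit configured in hive.exec.max.dynamic.partitions.pernode. This can be solved by distributing the rows on the partition columns, so the SQL becomes: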

from t insert overwrite table p partition(country, dt) select ... country, dt distribute by country, dt;

Execution mode for Hive operations

set hive.mapred.mode=strict;

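strict: refuses to run risky operations (ones that produce huge MapReduce jobs), for example Cartesian products, queries on a partitioned table without a partition filter, comparisons between bigint and string, comparisons between bigint and double, and order by without limit.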

nonstrict: no restrictions.
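
To make the two modes concrete, here is a sketch of queries that strict mode rejects next to variants it accepts. It reuses the tables t2 and p from the earlier examples; the script name and the partition values are assumptions for illustration. The rejected queries are left commented out so the script runs cleanly:

```shell
# Write a script that contrasts rejected and accepted queries in strict mode.
workdir=$(mktemp -d)
script="$workdir/strict_mode.hql"   # hypothetical file name
cat > "$script" <<'EOF'
set hive.mapred.mode=strict;

-- rejected in strict mode: order by without limit
-- select * from t2 order by id;
select * from t2 order by id limit 10;

-- rejected in strict mode: partitioned table queried without a partition filter
-- select * from p;
select * from p where country = 'CN' and dt = '2019-11-10';
EOF

if command -v hive >/dev/null 2>&1; then
  hive -f "$script"
else
  echo "hive not found; would run: hive -f $script"
fi
```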

Compress intermediate MapReduce data

set hive.exec.compress.intermediate=true;

-- compression codec for intermediate data; the default is org.apache.hadoop.io.compress.DefaultCodec
set mapreduce.map.output.compress.codec=org.apache.hadoop.io.compress.SnappyCodec;

Compress MapReduce output

set hive.exec.compress.output=true;

-- compression codec for output data; GZip gives a better compression ratio, but the files are not splittable for MapReduce
set mapreduce.output.fileoutputformat.compress.codec=org.apache.hadoop.io.compress.GzipCodec;

-- if the output is a SequenceFile, use block-level compression
set mapreduce.output.fileoutputformat.compress.type=BLOCK;

Enable partition archiving

set hive.archive.enabled=true;
