Common Hive statements: CREATE TABLE

Date: 2019-11-21


1. Hive CREATE TABLE statements

create table page_view
(
page_id bigint comment 'page ID',
page_name string comment 'page name',
page_url string comment 'page URL'
)
comment 'page views'
partitioned by (ds string comment 'current date, used as the partition column')
row format delimited
stored as rcfile
location '/user/hive/test';

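A note on the stored as keyword. Hive supports several storage formats (newer releases also offer ORC and Parquet, among others); the three covered here are:

1: textfile, the most basic format. The data is not compressed, so both disk usage and parsing cost are high.

2: SequenceFile, a binary file format provided by the Hadoop API. It is easy to use, splittable, and compressible.

3: rcfile, which combines row and column storage. It first splits the data into blocks by row, guaranteeing that a whole record sits in a single block, so reading one record never requires reading several blocks. Within each block the data is then stored column by column, which helps storage (compression) and fast column access.

Because RCFile stores data in columnar form, loading data into it is more expensive, but it offers good query response times and a good compression ratio.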
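
As a rough illustration of the load-cost trade-off described above, a common pattern is to land raw text in a textfile staging table and then rewrite it into an rcfile table with insert ... select, because load data only moves files and does not convert their format. This is a sketch, not the original author's setup: the table names page_view_staging and page_view_rc and the input path /tmp/page_view.tsv are assumptions made for the example.

```sql
-- Sketch only: table names and the input path are hypothetical.

-- Staging table that matches the raw tab-separated input.
create table page_view_staging
(
  page_id bigint,
  page_name string,
  page_url string
)
row format delimited
fields terminated by '\t'
stored as textfile;

-- load data moves the file as-is; no format conversion happens here.
load data inpath '/tmp/page_view.tsv' into table page_view_staging;

-- Target table in RCFile format.
create table page_view_rc
(
  page_id bigint,
  page_name string,
  page_url string
)
stored as rcfile;

-- The conversion cost is paid here, when Hive rewrites the rows as RCFile.
insert overwrite table page_view_rc
select page_id, page_name, page_url
from page_view_staging;
```

The insert step runs as a job over the whole staging table, which is where the extra load-time cost of the columnar layout shows up.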

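If the table needs partitions, use a statement like the one below.

Here partitioned by specifies the column the data is split on, which is usually a time column.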

create table test_ds
(
  id int comment 'user ID',
  name string comment 'user name'
)
comment 'test partitioned table'
partitioned by(ds string comment 'date partition column')
row format delimited
fields terminated by '\t'
stored as rcfile;
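
A short sketch of how a partitioned table like test_ds might then be used. The partition value '2019-11-21' is an arbitrary example, not something from the original text.

```sql
-- Sketch only: the partition value is an arbitrary example.

-- Register a partition; Hive creates a ds=2019-11-21 subdirectory for it.
alter table test_ds add if not exists partition (ds='2019-11-21');

-- List the partitions the table currently has.
show partitions test_ds;

-- Filtering on the partition column lets Hive read only that partition.
select id, name
from test_ds
where ds = '2019-11-21';
```

Although ds is not in the column list of the create table statement, it can be queried like an ordinary column.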

If some columns need to be stored clustered (bucketed), which makes it easier to sample on the clustered columns, write the SQL like this:

create table test_ds
(
  id int comment 'user ID',
  name string comment 'user name'
)
comment 'test partitioned table'
partitioned by(ds string comment 'date partition column')
clustered by(id) sorted by(name) into 32 buckets
row format delimited 
fields terminated by '\t'
stored as rcfile;

This means that, within each partition, rows are hashed on id into 32 buckets, and the rows inside each bucket are sorted by name.
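
The sampling use case mentioned above could look roughly like the following. This is a sketch under assumptions: user_source is a hypothetical source table with id and name columns, the partition value is arbitrary, and the set statements apply only to Hive releases before 2.0, where bucketing and sorting were not enforced on insert by default.

```sql
-- Sketch only: user_source and the partition value are hypothetical.

-- Needed on Hive releases before 2.0 only; omit these on later versions.
set hive.enforce.bucketing = true;
set hive.enforce.sorting = true;

-- Populate one partition; Hive distributes the rows into the 32 buckets.
insert overwrite table test_ds partition (ds='2019-11-21')
select id, name
from user_source;

-- Read roughly 1/32 of the partition by sampling a single bucket on id.
select id, name
from test_ds tablesample(bucket 1 out of 32 on id) s
where ds = '2019-11-21';
```

Because the table is already clustered on id, sampling on the same column lets Hive pick the matching bucket file instead of scanning the whole partition.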

To change where the table is stored in HDFS, specify the path explicitly with the location clause:

create table test_another_location
(
   id int,
   name string,
   url string
)
comment 'test another location'
row format delimited
fields terminated by '\t'
stored as textfile
location '/tmp/test_location';

The /tmp/test_location directory does not need to be created beforehand.
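
One way to check where the table actually ended up is to inspect its metadata. A minimal sketch:

```sql
-- Prints detailed table metadata; the Location field in the output
-- shows the HDFS path backing the table.
describe formatted test_another_location;
```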
