Using Hive data types such as array, map, and struct

Date: 2019-11-22

Tags: hive, array, map, struct, data types. Category: Hadoop



Traditional databases validate data when it is written (schema on write); Hive validates it when it is read (schema on read).
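To make schema on read concrete, here is a minimal Python model (not Hive code). It assumes a hypothetical two-column schema, name STRING and age INT: writing accepts any text, and the types are applied only when rows are read. Fields that fail to convert or are missing come back as None, roughly the way Hive shows NULL for such fields in a delimited text table.

```python
from typing import Callable, Dict, List, Optional, Tuple


def to_int(text: str) -> Optional[int]:
    """Parse an INT field; an unparseable value becomes None (Hive would show NULL)."""
    try:
        return int(text)
    except ValueError:
        return None


# Hypothetical schema for illustration: name STRING, age INT.
SCHEMA: List[Tuple[str, Callable[[str], object]]] = [("name", str), ("age", to_int)]


def write(storage: List[str], line: str) -> None:
    """Schema on read: writing stores the raw text without any validation."""
    storage.append(line)


def read(storage: List[str], schema=SCHEMA, sep: str = "\t") -> List[Dict[str, object]]:
    """Apply the schema only while reading; bad or missing fields become None."""
    rows = []
    for line in storage:
        fields = line.split(sep)
        row = {}
        for i, (col, convert) in enumerate(schema):
            row[col] = convert(fields[i]) if i < len(fields) else None
        rows.append(row)
    return rows


storage: List[str] = []
for line in ["alice\t30", "bob\tnot_a_number", "carol"]:
    write(storage, line)  # all three lines are accepted at write time
print(read(storage))
```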

describe extended h5_gif; shows detailed information about the table

describe formatted h5_gif; also shows detailed information about the table, in a more readable tabular layout

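Table types: regular (managed) tables, partitioned tables, and external tables (created with the EXTERNAL keyword, i.e. CREATE EXTERNAL TABLE)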

set hive.mapred.mode=strict; refuses to submit a query on a partitioned table that does not filter on a partition
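A toy Python model of the check strict mode performs, assuming a table partitioned by p_hour whose partitions are held in a dict (a stand-in, not how Hive stores data). Without a p_hour filter the query is rejected in strict mode and turns into a full scan otherwise; StrictModeError is a hypothetical name, not Hive's real error.

```python
from typing import Dict, List, Optional


class StrictModeError(Exception):
    """Raised when strict mode rejects a query (hypothetical, not Hive's error type)."""


def scan(partitions: Dict[str, List[str]], p_hour: Optional[str], strict: bool) -> List[str]:
    """Return the rows a query on a p_hour-partitioned table would read.

    partitions maps a partition value such as "02" to that partition's rows.
    p_hour is the value from the WHERE clause, or None if there is no partition filter.
    """
    if p_hour is None:
        if strict:
            # hive.mapred.mode=strict: refuse instead of scanning every partition
            raise StrictModeError("no partition filter on a partitioned table")
        # nonstrict: read all partitions
        return [row for rows in partitions.values() for row in rows]
    # with a filter, only the matching partition is read
    return list(partitions.get(p_hour, []))


data = {"01": ["a"], "02": ["b", "c"]}
print(scan(data, "02", strict=True))   # ['b', 'c']
print(scan(data, None, strict=False))  # ['a', 'b', 'c']
```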

show partitions nginx_log; lists all partitions of a table

Example table definition:
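-- `user` is backquoted because USER is a reserved keyword in newer Hive versions
CREATE TABLE `user` (
  name     STRING,
  info     STRUCT<name:STRING, age:INT>,
  `string` STRING
)
PARTITIONED BY (p_hour STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
-- separates the members of the info struct
COLLECTION ITEMS TERMINATED BY ':'
LINES TERMINATED BY '\n'
-- TEXTFILE matches the plain-text user.log loaded below, LOAD DATA does not convert files to RCFILE
STORED AS TEXTFILE;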
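Assuming the data is stored as plain text (TEXTFILE), the delimiters in the table definition decide how a line of user.log maps onto the columns. A rough Python model of that split; the sample line is made up and the function names are hypothetical.

```python
from typing import Dict, List, Optional

FIELD_SEP = "\t"      # FIELDS TERMINATED BY '\t'
COLLECTION_SEP = ":"  # COLLECTION ITEMS TERMINATED BY ':' (between struct members)


def parse_int(text: Optional[str]) -> Optional[int]:
    """INT conversion: a missing or unparseable value becomes None (NULL)."""
    if text is None:
        return None
    try:
        return int(text)
    except ValueError:
        return None


def parse_user_line(line: str) -> Dict[str, object]:
    """Split one text line into name, info (struct) and string.

    p_hour is not read from the file: Hive takes the partition value
    from the partition the file was loaded into.
    """
    fields: List[Optional[str]] = list(line.rstrip("\n").split(FIELD_SEP))
    fields += [None] * (3 - len(fields))  # missing trailing columns -> None
    name, info_raw, text = fields[:3]
    info = None
    if info_raw is not None:
        members: List[Optional[str]] = list(info_raw.split(COLLECTION_SEP))
        members += [None] * (2 - len(members))
        info = {"name": members[0], "age": parse_int(members[1])}
    return {"name": name, "info": info, "string": text}


print(parse_user_line("tom\ttom:25\thello\n"))
```

In HiveQL the struct members are then read with dot notation, e.g. info.name and info.age.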

load data local inpath '/root/java/testhive/user.log' overwrite into table `user` partition(p_hour="02");

select * from `user` where p_hour="02";
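The partition(p_hour="02") clause decides where the file ends up: Hive keeps each partition in its own key=value subdirectory of the table directory, which is what show partitions reports and what lets a p_hour filter read a single directory. A toy Python model, with an in-memory dict standing in for HDFS and a hypothetical table path warehouse/user:

```python
from typing import Dict, List


def partition_dir(table_dir: str, key: str, value: str) -> str:
    """Each partition lives in a key=value subdirectory of the table directory."""
    return f"{table_dir}/{key}={value}"


def load_data(fs: Dict[str, List[str]], table_dir: str, key: str, value: str,
              file_name: str, overwrite: bool) -> None:
    """Toy LOAD DATA ... [OVERWRITE] INTO TABLE ... PARTITION(key=value)."""
    target = partition_dir(table_dir, key, value)
    if overwrite or target not in fs:
        fs[target] = []  # OVERWRITE replaces what the partition held before
    fs[target].append(file_name)  # without OVERWRITE the file is added


def show_partitions(fs: Dict[str, List[str]], table_dir: str) -> List[str]:
    """Toy SHOW PARTITIONS: the key=value directory names under the table."""
    prefix = table_dir + "/"
    return sorted(path[len(prefix):] for path in fs if path.startswith(prefix))


fs: Dict[str, List[str]] = {}
load_data(fs, "warehouse/user", "p_hour", "02", "user.log", overwrite=True)
load_data(fs, "warehouse/user", "p_hour", "03", "other.log", overwrite=False)
print(show_partitions(fs, "warehouse/user"))  # ['p_hour=02', 'p_hour=03']
```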

./hive -S -e 'select * from `user` where p_hour="02"'  -S (silent mode) suppresses messages such as "OK" and "Time taken"

set hive.cli.print.header=true; prints the column names as a header row in query output

ORDER BY, SORT BY, DISTRIBUTE BY, CLUSTER BY

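ORDER BY sorts the entire input globally, so it runs with a single reducer (several reducers could not guarantee a global order). On large data sets this makes the job slow.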

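SORT BY sorts the rows before they reach each reducer, so the output of every reducer is sorted, but there is no global order across reducers.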
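The difference between ORDER BY and SORT BY can be sketched in a few lines of Python, where each inner list is one reducer's output. Round-robin assignment of rows to reducers is an assumption made for the sketch; it is not how Hive routes rows.

```python
from typing import List


def order_by(rows: List[int]) -> List[List[int]]:
    """ORDER BY: every row goes to a single reducer, which sorts them all."""
    return [sorted(rows)]


def sort_by(rows: List[int], num_reducers: int) -> List[List[int]]:
    """SORT BY: rows are spread over several reducers, each sorting only its share.

    Round-robin assignment stands in for Hive's real routing (an assumption).
    """
    reducers: List[List[int]] = [[] for _ in range(num_reducers)]
    for i, row in enumerate(rows):
        reducers[i % num_reducers].append(row)
    return [sorted(part) for part in reducers]


rows = [5, 3, 8, 1, 9, 2]
print(order_by(rows))    # [[1, 2, 3, 5, 8, 9]]
print(sort_by(rows, 2))  # [[5, 8, 9], [1, 2, 3]]
```

Each reducer's list is sorted, but reading them one after another (5, 8, 9, 1, 2, 3) is not globally ordered.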

DISTRIBUTE BY sends rows to different reducers according to the given column(s); rows with the same value always go to the same reducer.

CLUSTER BY does what DISTRIBUTE BY does and also what SORT BY does, on the same column(s).

However, the sort direction cannot be chosen: CLUSTER BY always sorts in ascending order, and ASC or DESC cannot be specified.
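A Python sketch of DISTRIBUTE BY ... SORT BY and CLUSTER BY. Routing by the key's hash modulo the number of reducers is an assumption for illustration (Python's hash() of a small non-negative int is the int itself); the point is that equal keys share a reducer and each reducer sorts only its own rows. When a descending order is needed, DISTRIBUTE BY col SORT BY col DESC can be used instead of CLUSTER BY col.

```python
from typing import Dict, List

Row = Dict[str, int]


def distribute_sort(rows: List[Row], dist_key: str, sort_key: str,
                    num_reducers: int, descending: bool = False) -> List[List[Row]]:
    """Model of DISTRIBUTE BY dist_key SORT BY sort_key [DESC]."""
    reducers: List[List[Row]] = [[] for _ in range(num_reducers)]
    for row in rows:
        # equal dist_key values always map to the same reducer
        reducers[hash(row[dist_key]) % num_reducers].append(row)
    # each reducer sorts only its own rows
    return [sorted(part, key=lambda r: r[sort_key], reverse=descending) for part in reducers]


def cluster_by(rows: List[Row], key: str, num_reducers: int) -> List[List[Row]]:
    """Model of CLUSTER BY key: DISTRIBUTE BY key SORT BY key, ascending only."""
    return distribute_sort(rows, key, key, num_reducers)


events = [{"uid": 1, "ts": 30}, {"uid": 2, "ts": 10}, {"uid": 1, "ts": 20}, {"uid": 2, "ts": 40}]
print(distribute_sort(events, "uid", "ts", 2, descending=True))
print(cluster_by([{"k": 3}, {"k": 1}, {"k": 2}, {"k": 3}, {"k": 4}], "k", 2))
```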

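To convert a floating-point number to an integer, use round() or floor() rather than cast, because cast simply drops the fractional part.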
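The three options differ in how the fractional part is handled: CAST(x AS INT) truncates toward zero, floor() rounds down, and round() rounds to the nearest integer with halves going away from zero. Python's built-in round() uses banker's rounding, so this sketch models Hive's round() with Decimal and ROUND_HALF_UP. The function names are hypothetical, and they return Python ints, whereas Hive's round() returns a DOUBLE.

```python
import math
from decimal import ROUND_HALF_UP, Decimal


def hive_cast_to_int(x: float) -> int:
    """Model of CAST(x AS INT): the fractional part is dropped (toward zero)."""
    return math.trunc(x)


def hive_floor(x: float) -> int:
    """Model of floor(x): the largest integer not greater than x."""
    return math.floor(x)


def hive_round(x: float) -> int:
    """Model of round(x): nearest integer, halves rounded away from zero."""
    return int(Decimal(str(x)).quantize(Decimal("1"), rounding=ROUND_HALF_UP))


for value in (2.7, -2.7, 2.5):
    print(value, hive_cast_to_int(value), hive_floor(value), hive_round(value))
```

The -2.7 case shows why cast can surprise: it gives -2, while floor gives -3 and round gives -3.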

Sampling is usually done with rand() or with buckets (TABLESAMPLE).
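Both kinds of sampling pick whole buckets: TABLESAMPLE(BUCKET x OUT OF y ON col) keeps the rows whose column hash falls into bucket x of y, and ON rand() assigns rows to buckets at random. A Python model, assuming the classic hashing where a non-negative INT hashes to its own value (newer Hive versions may hash differently); the helper names are hypothetical.

```python
import random
from typing import Dict, List

Row = Dict[str, int]


def check_bucket_args(x: int, y: int) -> None:
    """Buckets are numbered 1..y in the TABLESAMPLE clause."""
    if not 1 <= x <= y:
        raise ValueError("expected 1 <= x <= y")


def bucket_sample(rows: List[Row], col: str, x: int, y: int) -> List[Row]:
    """Model of TABLESAMPLE(BUCKET x OUT OF y ON col) for a non-negative INT col."""
    check_bucket_args(x, y)
    # bucket index = hash(value) mod y, shifted by one because x counts from 1
    return [r for r in rows if hash(r[col]) % y == x - 1]


def rand_bucket_sample(rows: List[Row], x: int, y: int, seed: int = 0) -> List[Row]:
    """Model of TABLESAMPLE(BUCKET x OUT OF y ON rand()): rows land in random buckets."""
    check_bucket_args(x, y)
    rng = random.Random(seed)  # seeded only so that the sketch is reproducible
    return [r for r in rows if rng.randrange(y) == x - 1]


rows = [{"id": i} for i in range(10)]
print([r["id"] for r in bucket_sample(rows, "id", 1, 4)])  # [0, 4, 8]
```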