An Introduction to Hive Syntax and Commands

Date: 2020-04-24

Tags: hive, syntax, commands. Category: Hadoop


1. Basic Hive syntax:

create database mydb;                                   -- create a database
show databases;                                         -- list all databases
use mydb;                                               -- switch to a database
create table t_user(id int, name string, age int);      -- create a table
create table t_user(id int, name string, age int) row format delimited fields terminated by ',';   -- create a table with a given field delimiter (here ',')
insert into table t_user values(value1, value2, value3);   -- insert a row
select * from t_user;                                   -- query the table
load data inpath 'HDFS path' into table t_name;         -- load data from HDFS
load data local inpath 'linux path' into table t_name;  -- load a file from the local Linux file system into Hive

2. Hive DDL operations:

(1) Database operations:

Creating a database:

create database if not exists myhive;                         -- create the database only if it does not already exist
create database if not exists myhive2 location 'hdfs path';   -- specify the database's location

Viewing databases:

show databases;                  -- list all databases in Hive
desc database dbname;            -- show database details
select current_database();       -- show the database currently in use
show create database db_name;    -- show the CREATE DATABASE statement

Dropping a database:

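drop database db_name restrict;
drop database if exists dbname;
Note: by default (RESTRICT), Hive refuses to drop a database that still contains tables. There are two ways around this:
1. Drop all of its tables manually, then drop the database.
2. Use the CASCADE keyword: drop database myhive cascade;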
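The RESTRICT/CASCADE behavior can be modeled with a toy in-memory catalog. This is a minimal Python sketch, not Hive code: the catalog dictionary, `drop_database`, and `DropError` are hypothetical names used only for illustration.

```python
class DropError(Exception):
    """Raised when a RESTRICT drop hits a non-empty database (hypothetical name)."""


def drop_database(catalog, db, cascade=False):
    """Mimic DROP DATABASE db [RESTRICT | CASCADE]; RESTRICT is the default.

    catalog: hypothetical dict mapping database name -> set of table names.
    """
    tables = catalog[db]
    if tables and not cascade:
        # RESTRICT: refuse to drop a database that still contains tables
        raise DropError(f"database {db} is not empty ({len(tables)} tables)")
    # CASCADE (or an already empty database): drop the tables, then the database
    tables.clear()
    del catalog[db]


catalog = {"myhive": {"t_user", "student"}, "empty_db": set()}
drop_database(catalog, "empty_db")            # fine: the database has no tables
try:
    drop_database(catalog, "myhive")          # RESTRICT: refused
except DropError as err:
    print("refused:", err)
drop_database(catalog, "myhive", cascade=True)
print(catalog)
```

The second call fails exactly as a plain `drop database myhive;` would, and the CASCADE call then removes the tables together with the database.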

(2) Table operations:

Creating tables:
Syntax:

CREATE [EXTERNAL] TABLE [IF NOT EXISTS] table_name 
[(col_name data_type [COMMENT col_comment], ...)] 
[COMMENT table_comment] 
[PARTITIONED BY (col_name data_type [COMMENT col_comment], ...)]
[CLUSTERED BY (col_name, col_name, ...) [SORTED BY (col_name [ASC|DESC], ...)] INTO num_buckets BUCKETS] 
[ROW FORMAT row_format]
[STORED AS file_format]
[LOCATION hdfs_path]
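[EXTERNAL] TABLE                  -- whether the table is managed (internal) or external
[IF NOT EXISTS] table_name        -- avoids an error if the table already exists
[(col_name data_type [COMMENT col_comment], ...)]   -- the table's columns
[COMMENT table_comment]           -- table description
[PARTITIONED BY (col_name data_type [COMMENT col_comment], ...)]   -- makes this a partitioned table
[CLUSTERED BY (col_name, col_name, ...) [SORTED BY (col_name [ASC|DESC], ...)] INTO num_buckets BUCKETS]   -- bucketing columns, sort order within each bucket, and number of buckets
[ROW FORMAT row_format]           -- delimiters, e.g. ROW FORMAT DELIMITED followed by:
  fields terminated by ''         -- column (field) delimiter
  lines terminated by ''          -- line delimiter (currently only '\n' is supported)
[STORED AS file_format]           -- file format used to store the data
[LOCATION hdfs_path]              -- data directory (typically used when creating an external table)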
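For text tables declared with ROW FORMAT DELIMITED, each line is split on the field delimiter and the pieces are mapped to the columns in order. The Python sketch below models that under simplifying assumptions (only `int` and `string` columns are handled, and `parse_delimited_row` is a hypothetical helper, not a Hive API): missing trailing fields become NULL (`None`), values that cannot be parsed as the column type become NULL, and extra fields are ignored.

```python
def parse_delimited_row(line, schema, field_delim=","):
    """Split one text line into typed columns, roughly like a delimited text table.

    schema: list of (column_name, type) pairs; only "int" and "string" are modeled here.
    """
    fields = line.rstrip("\n").split(field_delim)
    row = {}
    for i, (name, col_type) in enumerate(schema):
        if i >= len(fields):
            row[name] = None                # missing trailing field -> NULL
        elif col_type == "int":
            try:
                row[name] = int(fields[i])
            except ValueError:
                row[name] = None            # not a valid int -> NULL
        else:
            row[name] = fields[i]
    return row                              # fields beyond the schema are ignored


schema = [("id", "int"), ("name", "string"), ("age", "int")]
print(parse_delimited_row("1,tom,20", schema))
print(parse_delimited_row("2,jack", schema))
print(parse_delimited_row("x,lily,18,extra", schema))
```

This is why a wrong delimiter in the CREATE TABLE statement usually shows up as rows full of NULLs rather than as an error.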

Table creation examples:

-- managed (internal) table
create table if not exists student(id int, name string) row format delimited fields terminated by ',';

-- external table
create external table if not exists student(id int, name string) row format delimited fields terminated by ',' location '/hive/data/';
-- partitioned table
create table if not exists student(id int, name string) partitioned by (age int comment 'partitioned comment') row format delimited fields terminated by ',';   -- the partition column must not be one of the table's regular columns
-- bucketed table
create table if not exists student(id int, name string, age int) clustered by (age) sorted by (age desc) into 10 buckets row format delimited fields terminated by ',';   -- the bucketing column must be one of the table's columns
-- LIKE
create table student like t_student;   -- copies only the table structure; partitioned and bucketed tables can be copied too (for a partitioned table only the definition from creation time is copied, partitions added later are not)
-- CTAS
create table student as select * from t_student;   -- create the table and copy the data

Altering tables:

alter table old_name rename to new_name;   -- rename a table
alter table t_name set tblproperties ('property_name'='property_value');   -- change table properties
alter table t_name set serdeproperties ('field.delim'='-');   -- change the field delimiter
alter table t_name add columns (f_name type);   -- add a column
-- alter table t_name drop column ...: not supported, Hive has no DROP COLUMN (use REPLACE COLUMNS instead)
alter table t_name replace columns (id int, name string);   -- replace all columns
alter table t_name change old_field_name new_field_name type [first|after field];   -- change a column's name, type, and position
-- partition operations:
alter table t_name add partition (part_col='value');   -- add a partition
alter table t_name add partition (part_col='value') partition (part_col='value');   -- add several partitions
alter table t_name drop partition (part_col='value');   -- drop a partition
alter table t_name partition (part_col='value') set location 'hdfs path';   -- change a partition's location
alter table t_name partition (part_col='value') enable no_drop;   -- protect a partition from being dropped (removed in Hive 2.0.0)
alter table t_name partition (part_col='value') enable offline;   -- prevent a partition from being queried (removed in Hive 2.0.0)
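Each partition is stored as its own subdirectory under the table's directory, named `column=value`, with one nesting level per partition column. The helper below is a hypothetical sketch of that default layout; it assumes simple values that Hive would not need to escape, and the warehouse path in the example is only an assumed default location.

```python
import posixpath


def partition_path(table_location, spec):
    """Build a partition directory from an ordered spec, e.g. [("department", "SC")].

    Assumes the default layout: one key=value directory per partition column, in order.
    """
    return posixpath.join(table_location, *(f"{col}={val}" for col, val in spec))


table_dir = "/user/hive/warehouse/mydb.db/student_ptn"   # assumed default warehouse location
print(partition_path(table_dir, [("department", "SC")]))
print(partition_path(table_dir, [("sex", "M"), ("department", "SC")]))
```

`add partition` and `drop partition` therefore correspond to creating or removing one such directory (plus its metastore entry), and `set location` points a partition at a directory outside this layout.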

Dropping tables:

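drop table if exists t_name;   -- drop a table
Notes:
1. Dropping a managed (internal) table deletes both the metadata and the data.
2. Dropping an external table deletes only the metadata; the data files stay in HDFS.
3. Dropping a partitioned (managed) table drops all of its partitions, metadata and data alike.
4. Dropping a bucketed table works exactly like dropping an ordinary table.
truncate table t_name;   -- delete all rows but keep the table definition (managed tables only)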
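The difference between dropping a managed and an external table can be shown with two toy dictionaries, one standing in for the metastore and one for HDFS. This is an illustrative Python sketch; `drop_table` and the dictionary layouts are hypothetical.

```python
def drop_table(metastore, hdfs, name):
    """Mimic DROP TABLE: the metadata always goes; the data only for a managed table.

    metastore: hypothetical dict table name -> {"external": bool, "location": str}
    hdfs:      hypothetical dict directory -> list of file names
    """
    table = metastore.pop(name)                # metadata removed in both cases
    if not table["external"]:
        hdfs.pop(table["location"], None)      # managed table: its data directory is deleted
    # external table: the files under table["location"] are left in place


metastore = {
    "t_managed": {"external": False, "location": "/user/hive/warehouse/t_managed"},
    "t_external": {"external": True, "location": "/hive/data"},
}
hdfs = {"/user/hive/warehouse/t_managed": ["000000_0"], "/hive/data": ["student.txt"]}
drop_table(metastore, hdfs, "t_managed")
drop_table(metastore, hdfs, "t_external")
print(metastore)
print(hdfs)
```

After both drops the metastore is empty, but the external table's files are still there, so recreating the external table with the same LOCATION makes the data visible again.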

Common table commands:

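show tables;                                              -- list all tables in the current database
show partitions t_name;                                   -- list the table's partitions
show partitions t_name partition (part_col='value');      -- show a specific partition
desc t_name;                                              -- show the table's columns
desc extended t_name;                                     -- show detailed table information
desc formatted t_name;                                    -- show detailed table information in a readable layout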

3. Hive DML operations:

(1) Loading data:

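load data local inpath 'linux path' into table t_name;   -- load from the local file system
load data inpath 'hdfs path' into table t_name;          -- load from HDFS
-- Note: when loading from HDFS, the source files are moved (not copied) into the table's directory
load data local inpath 'linux path' overwrite into table t_name;   -- replace the existing data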

(2) Inserting data:

Note:

insert into        -- append
insert overwrite   -- overwrite (replace the existing data)

insert into table t_name(fields1, fields2, fields3) values(value1, value2, value3);   -- insert a single row
insert into table t_name select * from tt_name;   -- insert the result of a query
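The difference between INSERT INTO and INSERT OVERWRITE can be sketched with a dictionary standing in for a partitioned table; note that an overwrite aimed at one partition replaces only that partition. `insert` and the dictionary layout are hypothetical, for illustration only.

```python
def insert(table, partition, rows, overwrite=False):
    """table: hypothetical dict partition value -> list of rows."""
    if overwrite:
        table[partition] = list(rows)                 # INSERT OVERWRITE: replace this partition only
    else:
        table.setdefault(partition, []).extend(rows)  # INSERT INTO: append to it


t = {"SC": [1, 2], "AC": [3]}
insert(t, "SC", [4])                    # append
insert(t, "ES", [5])                    # a new partition is created on first insert
insert(t, "SC", [6], overwrite=True)    # replace only the SC partition
print(t)
```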
-- multiple inserts into a partitioned table
insert into table student_ptn partition(department='SC') select id, name, age, sex from student where dept='SC';
insert into table student_ptn partition(department='AC') select id, name, age, sex from student where dept='AC';
insert into table student_ptn partition(department='ES') select id, name, age, sex from student where dept='ES';
Each statement above queries the table on its own, so every statement scans all of the data in student, which is inefficient.
Rewritten as a single multi-insert:
from student
insert into table student_ptn partition(department='SC') select id, name, age, sex where dept='SC'
insert into table student_ptn partition(department='AC') select id, name, age, sex where dept='AC'
insert into table student_ptn partition(department='ES') select id, name, age, sex where dept='ES';
Processed this way, the table is scanned only once: the whole MapReduce job has one input and several outputs. Any specified partition that does not exist yet is created automatically when the statement runs.
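The single-scan idea behind the multi-insert form can be sketched in Python: read each source row once and hand it to every target whose condition matches. `multi_insert` and the sample rows are hypothetical; the point is only the count of rows read.

```python
def multi_insert(source_rows, routes):
    """One pass over the source, sending each row to every target whose predicate matches.

    routes: dict target partition -> predicate. Returns (targets, rows_read).
    """
    targets = {name: [] for name in routes}
    rows_read = 0
    for row in source_rows:            # the source table is scanned once
        rows_read += 1
        for name, predicate in routes.items():
            if predicate(row):
                targets[name].append(row)
    return targets, rows_read


students = [{"id": 1, "dept": "SC"}, {"id": 2, "dept": "AC"},
            {"id": 3, "dept": "ES"}, {"id": 4, "dept": "SC"}]
# d=d binds each department now; a plain closure would see only the last value
routes = {d: (lambda row, d=d: row["dept"] == d) for d in ("SC", "AC", "ES")}
targets, rows_read = multi_insert(students, routes)
print(rows_read, "rows read in one pass vs", len(students) * len(routes), "with separate inserts")
print({name: [r["id"] for r in rows] for name, rows in targets.items()})
```

With three separate INSERT statements the source would be read three times; the multi-insert reads it once regardless of how many targets there are.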
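-- bucketed tables: populate them with insert (not load)
insert into table stu_bck select * from t_name;   -- same as an ordinary insert
Bucketing rule: bucket = hash(bucketing column) % number of buckets, so rows with the same value always land in the same bucket.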
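The bucketing rule can be made concrete for an int column. This Python sketch assumes the classic scheme in which an int hashes to its own value and the bucket is `(hash & Integer.MAX_VALUE) % num_buckets`; other column types, multiple bucketing columns, and newer Hive bucketing versions may use different hash functions. `bucket_for_int` is a hypothetical name.

```python
def bucket_for_int(value, num_buckets):
    """Bucket index of an int key: (hash & Integer.MAX_VALUE) % num_buckets.

    Assumes the classic scheme in which an int hashes to its own value.
    """
    return (value & 0x7FFFFFFF) % num_buckets   # mask keeps the result non-negative


ages = [18, 20, 28, 30]
print({age: bucket_for_int(age, 10) for age in ages})
```

For the `clustered by (age) ... into 10 buckets` table created earlier, ages 18 and 28 would share a bucket, as would 20 and 30.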

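Dynamic vs. static partition inserts:
Static partition insert: the target partition is specified manually, before the data is inserted.
Dynamic partition insert: removes this limitation of static inserts. Hive checks the value of the partition column in each row and automatically creates a partition for every distinct value it encounters.
Examples: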

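-- static partition insert:
load data local inpath 'path' into table t_name partition(dpt='value');
insert into table student_ptn partition(department='SC') select id, name, age, sex from student where dept='SC';
-- dynamic partition insert:
insert into table t_name partition(part_col) select * from tt_name;   -- the last column of the query must be the partition column
-- dynamic insert into several partition columns:
insert into table stu_ptn01 partition(sex, department) select id, name, age, sex, department from student_manager;   -- the last columns of the query must be the partition columns, in the same order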
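Dynamic partitioning can be modeled as routing each row by the values of its trailing columns. In this Python sketch (`dynamic_partition_insert` is a hypothetical helper), the last `num_partition_cols` values of each row form the partition key and are split off from the stored data columns, mirroring how Hive keeps partition values in the directory names rather than in the data files.

```python
from collections import defaultdict


def dynamic_partition_insert(rows, num_partition_cols):
    """Route SELECT rows to partitions by their trailing column values.

    The last num_partition_cols values of each row are the partition columns, in the
    order listed in PARTITION(...). A key seen for the first time creates a new partition.
    """
    if num_partition_cols < 1:
        raise ValueError("need at least one partition column")
    partitions = defaultdict(list)
    for row in rows:
        data, key = row[:-num_partition_cols], row[-num_partition_cols:]
        partitions[key].append(data)
    return dict(partitions)


# rows shaped like: select id, name, age, sex, department from student_manager
rows = [(1, "a", 20, "M", "SC"), (2, "b", 21, "F", "SC"), (3, "c", 22, "M", "AC")]
print(dynamic_partition_insert(rows, 2))
```

The sketch also shows why the column order matters: if the query listed `department, sex` instead, the same rows would be keyed (and nested) the other way round.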

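Note: dynamic partition inserts require enabling a couple of Hive parameters:

set hive.exec.dynamic.partition=true;             -- turn on dynamic partitioning
set hive.exec.dynamic.partition.mode=nonstrict;   -- turn off strict mode, which otherwise requires at least one static partition column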

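Differences between static and dynamic partition inserts:
- A static insert requires you to name the partition; a dynamic insert does not.
- With static partitions a partition may end up with no data (an empty directory); dynamic partitions are created from the actual data, so every partition holds at least one row.
- With dynamic partitioning, each partition can receive one output file per reduce task, so the number of files per partition depends on the configured number of reducers, e.g.:
set mapreduce.job.reduces=3;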

(3) Exporting data:

-- single export
insert overwrite local directory 'linux path' select * from t_name;
-- multiple exports from one scan
from t_name
insert overwrite local directory 'linux path' select * where ...
insert overwrite local directory 'linux path' select * where ...;

4. Hive DQL (query) operations:

Written order of a query: select fields ... from ... [join ...] where ... group by ... having ... order by ... limit
Execution order of a query: from -> join -> where -> group by -> having -> select -> order by -> limit
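The execution order can be traced with a small Python pipeline over a single in-memory table (so the join step is omitted). `run_query`, the sample `scores` rows, and the lambdas standing in for each clause are hypothetical; the sketch only illustrates which clause sees which intermediate result.

```python
from itertools import groupby


def run_query(rows, where, group_key, having, select, order_key, limit):
    """Evaluate clauses in logical order: FROM/WHERE -> GROUP BY -> HAVING -> SELECT -> ORDER BY -> LIMIT."""
    filtered = [r for r in rows if where(r)]                                # WHERE filters rows first
    filtered.sort(key=group_key)                                            # groupby needs sorted input
    groups = [(k, list(g)) for k, g in groupby(filtered, key=group_key)]    # GROUP BY
    kept = [(k, g) for k, g in groups if having(k, g)]                      # HAVING filters groups
    projected = [select(k, g) for k, g in kept]                             # SELECT computes output columns
    projected.sort(key=order_key)                                           # ORDER BY sees SELECT output
    return projected[:limit]                                                # LIMIT is applied last


# select dept, count(*) c from scores where score >= 60
# group by dept having count(*) >= 2 order by c desc limit 2
scores = [{"dept": "SC", "score": 90}, {"dept": "SC", "score": 70}, {"dept": "SC", "score": 60},
          {"dept": "AC", "score": 50}, {"dept": "AC", "score": 80}, {"dept": "AC", "score": 65},
          {"dept": "ES", "score": 99}]
result = run_query(
    scores,
    where=lambda r: r["score"] >= 60,
    group_key=lambda r: r["dept"],
    having=lambda k, g: len(g) >= 2,
    select=lambda k, g: (k, len(g)),
    order_key=lambda out: -out[1],
    limit=2,
)
print(result)
```

Because WHERE runs before GROUP BY, the AC score of 50 never reaches the count; because HAVING runs after it, the ES group is dropped as a whole.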

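(1) Joins in Hive

Characteristics:
- Join conditions must be equalities; non-equi joins are not supported (this applies to Hive versions before 2.2.0).
- Multiple join conditions can be combined with AND but not with OR (same version caveat).
- Hive supports joining many tables, but avoid Cartesian products when joining.
- Hive supports IN and EXISTS, but they are very inefficient.
Examples: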

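-- inner join: only rows that match on both sides (intersection)
select a.id aid, a.name name, b.id bid, b.score score from test_a a inner join test_b b on a.id=b.id;
-- left outer join: the table on the left of JOIN is the base table; all of its rows are returned, matching right-side rows fill in their columns, and unmatched ones are filled with NULL
select a.id aid, a.name name, b.id bid, b.score score from test_a a left join test_b b on a.id=b.id;
-- right outer join: the table on the right of JOIN is the base table
select a.id aid, a.name name, b.id bid, b.score score from test_a a right join test_b b on a.id=b.id;
-- full outer join: union of both tables
select a.id aid, a.name name, b.id bid, b.score score from test_a a full join test_b b on a.id=b.id;
-- left semi join: like an inner join, but only left-table data is returned: the left rows that have a match in the right table
select * from test_a a left semi join test_b b on a.id=b.id;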
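The five join types above can be compared side by side with a small Python model of `test_a` (id, name) and `test_b` (id, score). The sample rows and the `join` / `left_semi_join` helpers are hypothetical; `None` stands in for the NULL padding Hive produces.

```python
def join(left, right, how="inner"):
    """Equi-join on the first field of each row: left rows are (id, name), right rows (id, score).

    Returns (aid, name, bid, score) tuples; None stands in for NULL padding.
    """
    out, matched_right = [], set()
    for a_id, a_name in left:
        hits = [(j, row) for j, row in enumerate(right) if row[0] == a_id]
        for j, (b_id, b_score) in hits:
            matched_right.add(j)
            out.append((a_id, a_name, b_id, b_score))
        if not hits and how in ("left", "full"):
            out.append((a_id, a_name, None, None))        # unmatched left row
    if how in ("right", "full"):
        for j, (b_id, b_score) in enumerate(right):
            if j not in matched_right:
                out.append((None, None, b_id, b_score))   # unmatched right row
    return out


def left_semi_join(left, right):
    """Left rows that have a match in the right table; only left columns are returned."""
    right_keys = {row[0] for row in right}
    return [row for row in left if row[0] in right_keys]


test_a = [(1, "a"), (2, "b"), (3, "c")]
test_b = [(1, 90), (3, 80), (4, 70)]
for how in ("inner", "left", "right", "full"):
    print(how, join(test_a, test_b, how))
print("semi", left_semi_join(test_a, test_b))
```

Row order here is just the order the loops produce; Hive itself makes no ordering promise without ORDER BY.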

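Characteristics of left semi join:
left semi join is a more efficient implementation of Hive's EXISTS/IN.
- Restriction: the right-hand table may be filtered only in the ON clause, not in the WHERE clause, the SELECT clause, or anywhere else.
- left semi join passes only the join key of the right table to the map stage, so the final SELECT may reference only columns of the left table.
- Because left semi join has in(keySet) semantics, once a left row finds a match it skips the remaining duplicate records of the right table, whereas join keeps iterating over all of them. So when the right table contains duplicates, left semi join produces one row where join produces several, which also makes left semi join faster.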
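The duplicate-handling difference in the last point can be shown directly. In this hypothetical Python sketch the right table holds three rows for key 1: an inner join (keeping only left columns) emits the left row once per match, while the semi join emits it once.

```python
def inner_join_left_cols(left, right):
    """Inner join keeping only left columns: one output row per matching right row."""
    return [a for a in left for b in right if b[0] == a[0]]


def left_semi_join(left, right):
    """in(keySet) semantics: each left row is emitted at most once, however many right rows match."""
    right_keys = {b[0] for b in right}
    return [a for a in left if a[0] in right_keys]


left = [(1, "a"), (2, "b")]
right = [(1, 90), (1, 95), (1, 60)]     # key 1 appears three times on the right
print(inner_join_left_cols(left, right))
print(left_semi_join(left, right))
```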

(2) Sorting in Hive

order by
Characteristic: global sort (all data passes through a single reducer)
Example: select * from t_name order by col desc;   -- descending order

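sort by
Characteristic: sort by is a local sort performed inside each reducer; with a single reduce task the result is the same as a global sort.
How it works: when sort by is used on its own, rows are assigned to reduce tasks arbitrarily, not by the value of any column.
Example: select * from t_name sort by col;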

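distribute by
Characteristic: distributes rows among reducers by the specified column; on its own it does not sort.
Example 1: select * from t_name distribute by col;   -- reducer = hash(col) % number of reducers
Example 2: select * from t_name distribute by dist_col sort by sort_col;   -- distribute by one column, then sort within each reducer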

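cluster by
Characteristic: distributes and sorts by the same column.
Example: select * from t_name cluster by col;
Note: when the distribute column and the sort column are the same, distribute by + sort by = cluster by; otherwise distribute by + sort by is more flexible (cluster by can only sort in ascending order).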
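The four sorting clauses can be compared in a small Python model where each reducer is a list. Assumptions, stated plainly: Python's built-in `hash` stands in for Hive's hash function, a seeded random generator stands in for the arbitrary row distribution of a bare SORT BY, and all function names are hypothetical.

```python
import random


def order_by(rows, key):
    """ORDER BY: a single global sort (all rows pass through one reducer)."""
    return sorted(rows, key=key)


def sort_by(rows, key, num_reducers, seed=0):
    """SORT BY alone: rows land on reducers arbitrarily; each reducer sorts only its share."""
    rng = random.Random(seed)               # stands in for the arbitrary distribution
    shares = [[] for _ in range(num_reducers)]
    for row in rows:
        shares[rng.randrange(num_reducers)].append(row)
    return [sorted(share, key=key) for share in shares]


def distribute_by(rows, dist_key, num_reducers, sort_key=None):
    """DISTRIBUTE BY: reducer = hash(dist_key) % num_reducers; add sort_key for SORT BY."""
    shares = [[] for _ in range(num_reducers)]
    for row in rows:
        shares[hash(dist_key(row)) % num_reducers].append(row)
    if sort_key is None:
        return shares                       # distribute by alone does not sort
    return [sorted(share, key=sort_key) for share in shares]


def cluster_by(rows, key, num_reducers):
    """CLUSTER BY col is DISTRIBUTE BY col SORT BY col (ascending only)."""
    return distribute_by(rows, key, num_reducers, sort_key=key)


def identity(x):
    return x


rows = [5, 3, 8, 1, 9, 2]
print(order_by(rows, identity))
print(sort_by(rows, identity, 2))
print(cluster_by(rows, identity, 2))
```

Each reducer's output is sorted under SORT BY and CLUSTER BY, but only ORDER BY yields one totally ordered result; concatenating the reducer outputs of the others does not.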

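(3) union vs. union all

Both union and union all concatenate query results (the combined queries must have the same structure, i.e. the same columns).

select * from xxx union select * from xxx;
select * from xxx union all select * from xxx;

union removes duplicate rows; union all keeps them.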
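The two operators differ only in the de-duplication step, which this hypothetical Python sketch makes explicit (rows are tuples so they can be hashed).

```python
def union_all(*results):
    """UNION ALL: concatenate query results and keep duplicate rows."""
    return [row for result in results for row in result]


def union(*results):
    """UNION: concatenate, then drop duplicate rows.

    First-occurrence order is kept only for readability; Hive does not promise any order.
    """
    seen, out = set(), []
    for row in union_all(*results):
        if row not in seen:
            seen.add(row)
            out.append(row)
    return out


q1 = [(1, "a"), (2, "b")]
q2 = [(2, "b"), (3, "c")]
print(union_all(q1, q2))
print(union(q1, q2))
```

Since the de-duplication needs extra work, union all is the cheaper choice whenever duplicates are impossible or acceptable.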
