The Road to Learning Hive (7): Hive DDL Operations

Date: 2019-11-19

Tags: hive, learning path, ddl. Category: Hadoop


Database operations

1. Creating a database

Syntax

CREATE (DATABASE|SCHEMA) [IF NOT EXISTS] database_name

  [COMMENT database_comment]      // description of the database

  [LOCATION hdfs_path]            // where the database is stored on HDFS

  [WITH DBPROPERTIES (property_name=property_value, ...)];    // database properties

  Default location: /user/hive/warehouse/db_name.db/table_name/partition_name/…
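The default location rule above can be sketched as a small path helper. This is only an illustration in Python, not part of Hive: the function name `default_location` is hypothetical, and the warehouse root is assumed to be /user/hive/warehouse (on a real cluster it comes from the hive.metastore.warehouse.dir setting).

```python
import posixpath

# Assumed warehouse root; a real cluster reads it from hive.metastore.warehouse.dir.
WAREHOUSE_ROOT = "/user/hive/warehouse"


def default_location(db_name, table_name=None, warehouse_root=WAREHOUSE_ROOT):
    """Return the default HDFS directory of a database, or of a table inside it."""
    if db_name == "default":
        # The built-in default database uses the warehouse root itself.
        db_dir = warehouse_root
    else:
        # Every other database gets a "<name>.db" directory under the root.
        db_dir = posixpath.join(warehouse_root, db_name + ".db")
    if table_name is None:
        return db_dir
    return posixpath.join(db_dir, table_name)


print(default_location("t3"))             # /user/hive/warehouse/t3.db
print(default_location("t2", "student"))  # /user/hive/warehouse/t2.db/student
```

A database created with an explicit LOCATION clause does not follow this rule; the helper only covers the case where no location is given.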

Ways to create a database

(1) Create a plain database

0: jdbc:hive2://hadoop3:10000> create database t1;
No rows affected (0.308 seconds)
0: jdbc:hive2://hadoop3:10000> show databases;
+----------------+
| database_name  |
+----------------+
| default        |
| myhive         |
| t1             |
+----------------+
3 rows selected (0.393 seconds)
0: jdbc:hive2://hadoop3:10000>

(2) Create a database only if it does not already exist

0: jdbc:hive2://hadoop3:10000> create database if not exists t1;
No rows affected (0.176 seconds)
0: jdbc:hive2://hadoop3:10000>

(3) Create a database with a comment

0: jdbc:hive2://hadoop3:10000> create database if not exists t2 comment 'learning hive';
No rows affected (0.217 seconds)
0: jdbc:hive2://hadoop3:10000>

(4) Create a database with properties

0: jdbc:hive2://hadoop3:10000> create database if not exists t3 with dbproperties('creator'='hadoop','date'='2018-04-05');
No rows affected (0.255 seconds)
0: jdbc:hive2://hadoop3:10000>

2. Viewing databases

Ways to view databases

(1) List the existing databases

0: jdbc:hive2://hadoop3:10000> show databases;
+----------------+
| database_name  |
+----------------+
| default        |
| myhive         |
| t1             |
| t2             |
| t3             |
+----------------+
5 rows selected (0.164 seconds)
0: jdbc:hive2://hadoop3:10000>

(2) Show a database's detailed properties

Syntax

desc database [extended] dbname;

Example

0: jdbc:hive2://hadoop3:10000> desc database extended t3;
+----------+----------+------------------------------------------+-------------+-------------+------------------------------------+
| db_name  | comment  |                 location                 | owner_name  | owner_type  |             parameters             |
+----------+----------+------------------------------------------+-------------+-------------+------------------------------------+
| t3       |          | hdfs://myha01/user/hive/warehouse/t3.db  | hadoop      | USER        | {date=2018-04-05, creator=hadoop}  |
+----------+----------+------------------------------------------+-------------+-------------+------------------------------------+
1 row selected (0.11 seconds)
0: jdbc:hive2://hadoop3:10000>

(3) Show which database is currently in use

0: jdbc:hive2://hadoop3:10000> select current_database();
+----------+
|   _c0    |
+----------+
| default  |
+----------+
1 row selected (1.36 seconds)
0: jdbc:hive2://hadoop3:10000>

(4) Show the full statement that created a database

0: jdbc:hive2://hadoop3:10000> show create database t3;
+----------------------------------------------+
|                createdb_stmt                 |
+----------------------------------------------+
| CREATE DATABASE `t3`                         |
| LOCATION                                     |
|   'hdfs://myha01/user/hive/warehouse/t3.db'  |
| WITH DBPROPERTIES (                          |
|   'creator'='hadoop',                        |
|   'date'='2018-04-05')                       |
+----------------------------------------------+
6 rows selected (0.155 seconds)
0: jdbc:hive2://hadoop3:10000>

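3. Dropping a database

Notes

Statements for dropping a database:

drop database dbname;
drop database if exists dbname;

By default, Hive does not allow dropping a database that still contains tables. There are two ways around this:

1. Manually drop every table in the database, then drop the database

2. Use the cascade keyword

drop database if exists dbname cascade;

restrict is the default behavior, so the following two statements are equivalent:

drop database if exists myhive;
drop database if exists myhive restrict;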
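The difference between restrict and cascade can be modeled with a toy in-memory catalog. The sketch below is purely illustrative: `ToyMetastore` and its methods are hypothetical names, not a Hive or metastore API, and the error types are arbitrary.

```python
class ToyMetastore:
    """Hypothetical in-memory stand-in for the metastore; not a Hive API."""

    def __init__(self):
        self.databases = {}  # database name -> set of table names

    def create_database(self, name):
        self.databases.setdefault(name, set())

    def create_table(self, db, table):
        self.databases[db].add(table)

    def drop_database(self, name, if_exists=False, cascade=False):
        if name not in self.databases:
            if if_exists:
                return  # IF EXISTS: a missing database is not an error
            raise KeyError(f"database {name} does not exist")
        if self.databases[name] and not cascade:
            # RESTRICT (the default): refuse to drop a non-empty database
            raise RuntimeError(f"database {name} is not empty")
        # Either the database is empty, or CASCADE drops its tables with it
        del self.databases[name]


ms = ToyMetastore()
ms.create_database("t1")
ms.create_database("t3")
ms.create_table("t3", "student")

ms.drop_database("t1")                # empty database: plain drop works
ms.drop_database("t3", cascade=True)  # non-empty database: needs cascade
print(sorted(ms.databases))           # []
```

Calling `ms.drop_database("t3")` without `cascade=True` while the student table still exists raises the "not empty" error, which mirrors the default restrict behavior described above.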

Examples

(1) Drop a database that contains no tables

0: jdbc:hive2://hadoop3:10000> show tables in t1;
+-----------+
| tab_name  |
+-----------+
+-----------+
No rows selected (0.147 seconds)
0: jdbc:hive2://hadoop3:10000> drop database t1;
No rows affected (0.178 seconds)
0: jdbc:hive2://hadoop3:10000> show databases;
+----------------+
| database_name  |
+----------------+
| default        |
| myhive         |
| t2             |
| t3             |
+----------------+
4 rows selected (0.124 seconds)
0: jdbc:hive2://hadoop3:10000>

(2) Drop a database that contains tables

0: jdbc:hive2://hadoop3:10000> drop database if exists t3 cascade;
No rows affected (1.56 seconds)
0: jdbc:hive2://hadoop3:10000>

4. Switching databases

Syntax

use database_name;

Example

0: jdbc:hive2://hadoop3:10000> use t2;
No rows affected (0.109 seconds)
0: jdbc:hive2://hadoop3:10000>

Table operations

1. Creating a table

Syntax

CREATE [EXTERNAL] TABLE [IF NOT EXISTS] table_name

　　[(col_name data_type [COMMENT col_comment], ...)]

　　[COMMENT table_comment]

　　[PARTITIONED BY (col_name data_type [COMMENT col_comment], ...)]

　　[CLUSTERED BY (col_name, col_name, ...)

　　　　[SORTED BY (col_name [ASC|DESC], ...)] INTO num_buckets BUCKETS]

　　[ROW FORMAT row_format]

　　[STORED AS file_format]

　　[LOCATION hdfs_path]

For details, see: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-CreateTable

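•CREATE TABLE creates a table with the given name. If a table with the same name already exists, an exception is thrown; use the IF NOT EXISTS option to suppress it
•EXTERNAL lets you create an external table, specifying a path (LOCATION) to the actual data when the table is created
•LIKE lets you copy the structure of an existing table without copying its data
•COMMENT adds a description to the table or to a column

•PARTITIONED BY specifies the partition columns
•ROW FORMAT
  DELIMITED [FIELDS TERMINATED BY char] [COLLECTION ITEMS TERMINATED BY char]
    [MAP KEYS TERMINATED BY char] [LINES TERMINATED BY char]
    | SERDE serde_name [WITH SERDEPROPERTIES
    (property_name=property_value, property_name=property_value, ...)]
  When creating a table you can supply a custom SerDe or use a built-in one. If ROW FORMAT is not specified, or ROW FORMAT DELIMITED is specified, the built-in SerDe is used. You also define the table's columns at creation time, and a custom SerDe can be specified together with them; Hive uses the SerDe to determine the data of each column.
•STORED AS
  SEQUENCEFILE // sequence file
  | TEXTFILE // plain text file format
  | RCFILE   // file format that combines row and column storage
  | INPUTFORMAT input_format_classname OUTPUTFORMAT output_format_classname // custom file format
  If the data is plain text, use STORED AS TEXTFILE. If the data needs to be compressed, use STORED AS SEQUENCEFILE.

•LOCATION specifies the table's storage path on HDFS

Best practice:
  If the data is already stored on HDFS and is used by multiple users or clients, prefer an external table.
  Otherwise, prefer a managed (internal) table.

  If LOCATION is not specified, the table is stored under the default warehouse path according to the default rule.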

Examples

The following examples run in the t2 database.

(1) Create a default managed (internal) table

0: jdbc:hive2://hadoop3:10000> create table student(id int, name string, sex string, age int,department string) row format delimited fields terminated by ",";
No rows affected (0.222 seconds)
0: jdbc:hive2://hadoop3:10000> desc student;
+-------------+------------+----------+
|  col_name   | data_type  | comment  |
+-------------+------------+----------+
| id          | int        |          |
| name        | string     |          |
| sex         | string     |          |
| age         | int        |          |
| department  | string     |          |
+-------------+------------+----------+
5 rows selected (0.168 seconds)
0: jdbc:hive2://hadoop3:10000>
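To make "row format delimited fields terminated by ','" more concrete, here is a rough Python sketch of how one line of a text file maps onto the student table's columns. It is a simplification rather than Hive's actual SerDe code: the function name `parse_delimited` is hypothetical, only int and string columns are handled, and the sample line is an assumed example of what student.txt might contain.

```python
def parse_delimited(line, columns, field_sep=","):
    """Split one text line into typed columns, loosely mimicking
    ROW FORMAT DELIMITED FIELDS TERMINATED BY ','."""
    casts = {"int": int, "string": str}
    values = line.rstrip("\n").split(field_sep)
    row = {}
    for i, (name, type_name) in enumerate(columns):
        # A missing trailing field is treated as NULL (None here).
        raw = values[i] if i < len(values) else None
        try:
            row[name] = casts[type_name](raw) if raw is not None else None
        except ValueError:
            # A value that cannot be parsed as the column type becomes NULL.
            row[name] = None
    return row


# Column list taken from the create table statement above.
STUDENT_COLUMNS = [
    ("id", "int"),
    ("name", "string"),
    ("sex", "string"),
    ("age", "int"),
    ("department", "string"),
]

# Assumed sample line; the real contents of student.txt are not shown in this post.
print(parse_delimited("95002,劉晨,女,19,IS", STUDENT_COLUMNS))
```

The point of the sketch is that the schema is applied when the file is read: the file itself is just delimited text, and a field that does not fit its column type shows up as NULL instead of failing the load.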

(2) External table

0: jdbc:hive2://hadoop3:10000> create external table student_ext
(id int, name string, sex string, age int,department string) row format delimited fields terminated by "," location "/hive/student";
No rows affected (0.248 seconds)
0: jdbc:hive2://hadoop3:10000>

(3) Partitioned table

0: jdbc:hive2://hadoop3:10000> create external table student_ptn(id int, name string, sex string, age int,department string)
. . . . . . . . . . . . . . .> partitioned by (city string)
. . . . . . . . . . . . . . .> row format delimited fields terminated by ","
. . . . . . . . . . . . . . .> location "/hive/student_ptn";
No rows affected (0.24 seconds)
0: jdbc:hive2://hadoop3:10000>

Add partitions

0: jdbc:hive2://hadoop3:10000> alter table student_ptn add partition(city="beijing");
No rows affected (0.269 seconds)
0: jdbc:hive2://hadoop3:10000> alter table student_ptn add partition(city="shenzhen");
No rows affected (0.236 seconds)
0: jdbc:hive2://hadoop3:10000>

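If a table is partitioned, each partition is represented by a subdirectory under the table's data directory.
In a partitioned table, data files must be stored inside a partition; they cannot be placed directly in the table directory.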
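The partition-to-subdirectory mapping can be sketched as follows. This is an illustrative Python helper with a hypothetical name (`partition_dir`); it assumes the usual key=value naming of partition directories and ignores the escaping Hive applies to special characters in partition values.

```python
import posixpath


def partition_dir(table_location, partition_spec):
    """Build the default directory of a partition as key=value path segments."""
    segments = [f"{key}={value}" for key, value in partition_spec]
    return posixpath.join(table_location, *segments)


# The two partitions added to student_ptn above.
print(partition_dir("/hive/student_ptn", [("city", "beijing")]))
# /hive/student_ptn/city=beijing
print(partition_dir("/hive/student_ptn", [("city", "shenzhen")]))
# /hive/student_ptn/city=shenzhen
```

A table with several partition columns simply nests one directory level per column, in the order the columns are declared in partitioned by.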

(4) Bucketed table

0: jdbc:hive2://hadoop3:10000> create external table student_bck(id int, name string, sex string, age int,department string)
. . . . . . . . . . . . . . .> clustered by (id) sorted by (id asc, name desc) into 4 buckets
. . . . . . . . . . . . . . .> row format delimited fields terminated by ","
. . . . . . . . . . . . . . .> location "/hive/student_bck";
No rows affected (0.216 seconds)
0: jdbc:hive2://hadoop3:10000>
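Conceptually, "clustered by (id) into 4 buckets" assigns each row to a bucket by hashing the clustering column and taking the result modulo the number of buckets. The Python sketch below assumes the classic scheme in which the hash of an int column is the int value itself; newer Hive versions may use a different hash function, so treat the concrete bucket numbers as illustrative. The function name `bucket_for_int` is hypothetical.

```python
def bucket_for_int(value, num_buckets):
    """Bucket index for an INT clustering column, assuming hash(int) == int.

    The mask keeps the hash non-negative before the modulo is taken.
    """
    return (value & 0x7FFFFFFF) % num_buckets


# A few ids from the student data, distributed over 4 buckets.
for student_id in (95001, 95002, 95003, 95004, 95005):
    print(student_id, "-> bucket", bucket_for_int(student_id, 4))
```

Under this assumption 95001 and 95005 land in the same bucket (both leave remainder 1), which is the property bucketing relies on: rows with equal hash values of the clustering column always end up in the same file.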

(5) Create a table with CTAS

Purpose: create a table from the result of a SQL query and store that result in it

First, load data into the student table

0: jdbc:hive2://hadoop3:10000> load data local inpath "/home/hadoop/student.txt" into table student;
No rows affected (0.715 seconds)
0: jdbc:hive2://hadoop3:10000> select * from student;
+-------------+---------------+--------------+--------------+---------------------+
| student.id  | student.name  | student.sex  | student.age  | student.department  |
+-------------+---------------+--------------+--------------+---------------------+
| 95002       | 劉晨            | 女            | 19           | IS                  |
| 95017       | 王風娟           | 女            | 18           | IS                  |
| 95018       | 王一            | 女            | 19           | IS                  |
| 95013       | 馮偉            | 男            | 21           | CS                  |
| 95014       | 王小麗           | 女            | 19           | CS                  |
| 95019       | 邢小麗           | 女            | 19           | IS                  |
| 95020       | 趙錢            | 男            | 21           | IS                  |
| 95003       | 王敏            | 女            | 22           | MA                  |
| 95004       | 張立            | 男            | 19           | IS                  |
| 95012       | 孫花            | 女            | 20           | CS                  |
| 95010       | 孔小濤           | 男            | 19           | CS                  |
| 95005       | 劉剛            | 男            | 18           | MA                  |
| 95006       | 孫慶            | 男            | 23           | CS                  |
| 95007       | 易思玲           | 女            | 19           | MA                  |
| 95008       | 李娜            | 女            | 18           | CS                  |
| 95021       | 週二            | 男            | 17           | MA                  |
| 95022       | 鄭明            | 男            | 20           | MA                  |
| 95001       | 李勇            | 男            | 20           | CS                  |
| 95011       | 包小柏           | 男            | 18           | MA                  |
| 95009       | 夢圓圓           | 女            | 18           | MA                  |
| 95015       | 王君            | 男            | 18           | MA                  |
+-------------+---------------+--------------+--------------+---------------------+
21 rows selected (0.342 seconds)
0: jdbc:hive2://hadoop3:10000>

Create the table with CTAS

0: jdbc:hive2://hadoop3:10000> create table student_ctas as select * from student where id < 95012;
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution 
engine (i.e. spark, tez) or using Hive 1.X releases.
No rows affected (34.514 seconds)
0: jdbc:hive2://hadoop3:10000> select * from student_ctas
. . . . . . . . . . . . . . .> ;
+------------------+--------------------+-------------------+-------------------+--------------------------+
| student_ctas.id  | student_ctas.name  | student_ctas.sex  | student_ctas.age  | student_ctas.department  |
+------------------+--------------------+-------------------+-------------------+--------------------------+
| 95002            | 劉晨                 | 女                 | 19                | IS                       |
| 95003            | 王敏                 | 女                 | 22                | MA                       |
| 95004            | 張立                 | 男                 | 19                | IS                       |
| 95010            | 孔小濤                | 男                 | 19                | CS                       |
| 95005            | 劉剛                 | 男                 | 18                | MA                       |
| 95006            | 孫慶                 | 男                 | 23                | CS                       |
| 95007            | 易思玲                | 女                 | 19                | MA                       |
| 95008            | 李娜                 | 女                 | 18                | CS                       |
| 95001            | 李勇                 | 男                 | 20                | CS                       |
| 95011            | 包小柏                | 男                 | 18                | MA                       |
| 95009            | 夢圓圓                | 女                 | 18                | MA                       |
+------------------+--------------------+-------------------+-------------------+--------------------------+
11 rows selected (0.445 seconds)
0: jdbc:hive2://hadoop3:10000>

(6) Copy a table's structure

0: jdbc:hive2://hadoop3:10000> create table student_copy like student;
No rows affected (0.217 seconds)
0: jdbc:hive2://hadoop3:10000>

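Note:

Whether the copy is a managed or an external table depends only on the create statement itself:
Without the external keyword before table, the new table is always a managed (internal) table.
With the external keyword before table, the new table is always an external table.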

2. Viewing tables

(1) List tables

List the tables in the database currently in use

0: jdbc:hive2://hadoop3:10000> show tables;
+---------------+
|   tab_name    |
+---------------+
| student       |
| student_bck   |
| student_copy  |
| student_ctas  |
| student_ext   |
| student_ptn   |
+---------------+
6 rows selected (0.163 seconds)
0: jdbc:hive2://hadoop3:10000>

List the tables in a database other than the current one

0: jdbc:hive2://hadoop3:10000> show tables in myhive;
+-----------+
| tab_name  |
+-----------+
| student   |
+-----------+
1 row selected (0.144 seconds)
0: jdbc:hive2://hadoop3:10000>

List the tables whose names start with a given prefix

0: jdbc:hive2://hadoop3:10000> show tables like 'student_c*';
+---------------+
|   tab_name    |
+---------------+
| student_copy  |
| student_ctas  |
+---------------+
2 rows selected (0.13 seconds)
0: jdbc:hive2://hadoop3:10000>
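The pattern in show tables like is not a SQL LIKE pattern: '*' stands for any run of characters and '|' separates alternatives. A small Python sketch of that matching rule is shown below; `show_tables_like` is a hypothetical helper, not a Hive function, and it only models these two wildcards.

```python
import re


def show_tables_like(tables, pattern):
    """Filter table names with a SHOW TABLES LIKE style pattern:
    '*' matches any run of characters and '|' separates alternatives."""
    alternatives = []
    for alt in pattern.split("|"):
        # Escape the literal parts and join them with ".*" where '*' appeared.
        alternatives.append(".*".join(re.escape(part) for part in alt.split("*")))
    regex = re.compile("(?:" + "|".join(alternatives) + ")")
    return sorted(name for name in tables if regex.fullmatch(name))


# Table names taken from the show tables output above.
TABLES = ["student", "student_bck", "student_copy",
          "student_ctas", "student_ext", "student_ptn"]

print(show_tables_like(TABLES, "student_c*"))
# ['student_copy', 'student_ctas']
```

With the same helper, a pattern such as 'student_ext|student_ptn' selects exactly those two tables.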

(2) View a table's details

View the table's columns

0: jdbc:hive2://hadoop3:10000> desc student;
+-------------+------------+----------+
|  col_name   | data_type  | comment  |
+-------------+------------+----------+
| id          | int        |          |
| name        | string     |          |
| sex         | string     |          |
| age         | int        |          |
| department  | string     |          |
+-------------+------------+----------+
5 rows selected (0.149 seconds)
0: jdbc:hive2://hadoop3:10000>

View detailed table information (hard-to-read format)

0: jdbc:hive2://hadoop3:10000> desc extended student;

View detailed table information (readable format)

0: jdbc:hive2://hadoop3:10000> desc formatted student;

View partition information

0: jdbc:hive2://hadoop3:10000> show partitions student_ptn;

(3) View the full statement that created a table

0: jdbc:hive2://hadoop3:10000> show create table student_ptn;

3. Altering tables

(1) Rename a table

0: jdbc:hive2://hadoop3:10000> alter table student rename to new_student;

(2) Change column definitions

A. Add a column

0: jdbc:hive2://hadoop3:10000> alter table new_student add columns (score int);

B. Change a column's definition

0: jdbc:hive2://hadoop3:10000> alter table new_student change name new_name string;

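C. Drop a column

Not supported directly: Hive has no statement for dropping a single column. Use replace columns (see D) to redefine the column list without it.

D. Replace all columns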

0: jdbc:hive2://hadoop3:10000> alter table new_student replace columns (id int, name string, address string);

(3) Change partition information

A. Add partitions

Static partitions

  Add one partition

0: jdbc:hive2://hadoop3:10000> alter table student_ptn add partition(city="chongqing");

  Add several partitions

0: jdbc:hive2://hadoop3:10000> alter table student_ptn add partition(city="chongqing2") partition(city="chongqing3") partition(city="chongqing4");

Dynamic partitions

First, load data into the student_ptn table:

0: jdbc:hive2://hadoop3:10000> load data local inpath "/home/hadoop/student.txt" into table student_ptn partition(city="beijing");

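Now insert the contents of this table directly into another table, student_ptn_age, using age as a dynamic partition (the age values are not listed in the statement; Hive decides which partition each row belongs to)

First create student_ptn_age, partitioned by age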

0: jdbc:hive2://hadoop3:10000> create table student_ptn_age(id int,name string,sex string,department string) partitioned by (age int);

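Query the student_ptn table and insert the result into student_ptn_age. The dynamic partition column (age) must come last in the select list; if the session runs in strict dynamic-partition mode, set hive.exec.dynamic.partition.mode=nonstrict first.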

0: jdbc:hive2://hadoop3:10000> insert overwrite table student_ptn_age partition(age)
. . . . . . . . . . . . . . .> select id,name,sex,department,age from student_ptn;
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
No rows affected (27.905 seconds)
0: jdbc:hive2://hadoop3:10000>
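What the dynamic-partition insert does can be pictured as grouping the selected rows by the partition column. The Python sketch below is a simplified model with a hypothetical function name (`dynamic_partition_insert`); it looks the partition column up by name, whereas Hive matches it by position at the end of the select list.

```python
from collections import defaultdict


def dynamic_partition_insert(rows, partition_column):
    """Group rows by the value of the partition column. Each distinct value
    becomes its own partition, and the column is removed from the stored row
    because its value is carried by the partition instead."""
    partitions = defaultdict(list)
    for row in rows:
        data = {k: v for k, v in row.items() if k != partition_column}
        partitions[row[partition_column]].append(data)
    return dict(partitions)


# Three rows taken from the student data shown earlier.
rows = [
    {"id": 95002, "name": "劉晨", "sex": "女", "department": "IS", "age": 19},
    {"id": 95017, "name": "王風娟", "sex": "女", "department": "IS", "age": 18},
    {"id": 95018, "name": "王一", "sex": "女", "department": "IS", "age": 19},
]

result = dynamic_partition_insert(rows, "age")
print(sorted(result))   # [18, 19]
print(len(result[19]))  # 2
```

In the real table each key of this mapping corresponds to one partition (age=18, age=19, and so on) that Hive creates on the fly during the insert.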

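B. Modify a partition

Modifying a partition generally means changing the directory where the partition's data is stored

Specify the partition's data directory directly when adding the partition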

0: jdbc:hive2://hadoop3:10000> alter table student_ptn add if not exists partition(city='beijing') 
. . . . . . . . . . . . . . .> location '/student_ptn_beijing' partition(city='cc') location '/student_cc';
No rows affected (0.306 seconds)
0: jdbc:hive2://hadoop3:10000>

Change the data directory of an existing partition

0: jdbc:hive2://hadoop3:10000> alter table student_ptn partition (city='beijing') set location '/student_ptn_beijing';

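At this point the original partition directory still exists, but data added to the partition afterwards is written only to the new partition directory

C. Drop a partition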

0: jdbc:hive2://hadoop3:10000> alter table student_ptn drop partition (city='beijing');

4. Dropping a table

0: jdbc:hive2://hadoop3:10000> drop table new_student;

5. Truncating a table

0: jdbc:hive2://hadoop3:10000> truncate table student_ptn;
