Databases
CREATE (DATABASE|SCHEMA) [IF NOT EXISTS] database_name
[COMMENT database_comment] -- description of the database
[LOCATION hdfs_path] -- storage location of the database on HDFS
[WITH DBPROPERTIES (property_name=property_value, ...)]; -- database properties
Default location: /user/hive/warehouse/db_name.db/table_name/partition_name/…
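The default location rule can be sketched as a small Python helper. This is only an illustration: the function name default_hive_path is hypothetical, and the warehouse root is assumed to be /user/hive/warehouse as stated above (on a real cluster it is set by hive.metastore.warehouse.dir).

```python
def default_hive_path(db_name, table_name=None, partition=None,
                      warehouse="/user/hive/warehouse"):
    """Build the default HDFS path for a database, table, or partition.

    Hypothetical helper for illustration; it only mirrors the
    db_name.db/table_name/partition_name layout described above.
    """
    parts = [warehouse.rstrip("/"), f"{db_name}.db"]
    if table_name is not None:
        parts.append(table_name)
        if partition is not None:
            parts.append(partition)
    return "/".join(parts)


print(default_hive_path("t3"))
print(default_hive_path("t2", "student"))
```

The built-in default database is a special case (its tables sit directly under the warehouse root), which this sketch does not handle.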
0: jdbc:hive2://hadoop3:10000> create database t1;
No rows affected (0.308 seconds)
0: jdbc:hive2://hadoop3:10000> show databases;
+----------------+
| database_name  |
+----------------+
| default        |
| myhive         |
| t1             |
+----------------+
3 rows selected (0.393 seconds)
0: jdbc:hive2://hadoop3:10000> create database if not exists t1;
No rows affected (0.176 seconds)
0: jdbc:hive2://hadoop3:10000> create database if not exists t2 comment 'learning hive';
No rows affected (0.217 seconds)
0: jdbc:hive2://hadoop3:10000> create database if not exists t3 with dbproperties('creator'='hadoop','date'='2018-04-05');
No rows affected (0.255 seconds)
0: jdbc:hive2://hadoop3:10000> show databases;
+----------------+
| database_name  |
+----------------+
| default        |
| myhive         |
| t1             |
| t2             |
| t3             |
+----------------+
5 rows selected (0.164 seconds)
Syntax
desc database [extended] dbname;
Example
0: jdbc:hive2://hadoop3:10000> desc database extended t3;
+----------+----------+------------------------------------------+-------------+-------------+------------------------------------+
| db_name  | comment  | location                                 | owner_name  | owner_type  | parameters                         |
+----------+----------+------------------------------------------+-------------+-------------+------------------------------------+
| t3       |          | hdfs://myha01/user/hive/warehouse/t3.db  | hadoop      | USER        | {date=2018-04-05, creator=hadoop}  |
+----------+----------+------------------------------------------+-------------+-------------+------------------------------------+
1 row selected (0.11 seconds)
0: jdbc:hive2://hadoop3:10000> select current_database();
+----------+
| _c0      |
+----------+
| default  |
+----------+
1 row selected (1.36 seconds)
0: jdbc:hive2://hadoop3:10000> show create database t3;
+----------------------------------------------+
| createdb_stmt                                |
+----------------------------------------------+
| CREATE DATABASE `t3`                         |
| LOCATION                                     |
|   'hdfs://myha01/user/hive/warehouse/t3.db'  |
| WITH DBPROPERTIES (                          |
|   'creator'='hadoop',                        |
|   'date'='2018-04-05')                       |
+----------------------------------------------+
6 rows selected (0.155 seconds)
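Dropping a database
drop database dbname;
drop database if exists dbname;
By default, Hive does not allow dropping a database that still contains tables. There are two ways around this:
1. Manually drop every table in the database, then drop the database.
2. Use the cascade keyword:
drop database if exists dbname cascade;
The default behavior is restrict, so the following two statements are equivalent:
drop database if exists myhive;
drop database if exists myhive restrict;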
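The difference between restrict and cascade can be modeled with a toy in-memory catalog. This is a simplified sketch, not Hive's implementation: the drop_database function, the DatabaseNotEmptyError class, and the table name some_table are all hypothetical names chosen for illustration.

```python
class DatabaseNotEmptyError(Exception):
    """Raised when a non-empty database is dropped in restrict mode."""


def drop_database(catalog, db_name, if_exists=False, cascade=False):
    """Model DROP DATABASE on a dict mapping database name -> set of tables."""
    if db_name not in catalog:
        if if_exists:
            return  # IF EXISTS: silently do nothing
        raise KeyError(f"database {db_name} does not exist")
    if catalog[db_name] and not cascade:
        # restrict (the default): refuse to drop a database that has tables
        raise DatabaseNotEmptyError(f"database {db_name} is not empty")
    del catalog[db_name]  # with cascade, the contained tables go too


# Hypothetical catalog: t1 is empty, t3 holds one made-up table.
catalog = {"t1": set(), "t3": {"some_table"}}
drop_database(catalog, "t1")                # empty, so restrict is enough
drop_database(catalog, "t3", cascade=True)  # non-empty, so cascade is needed
print(sorted(catalog))
```

Calling drop_database on a non-empty database without cascade=True raises DatabaseNotEmptyError, which mirrors the default restrict behavior described above.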
0: jdbc:hive2://hadoop3:10000> show tables in t1;
+-----------+
| tab_name  |
+-----------+
+-----------+
No rows selected (0.147 seconds)
0: jdbc:hive2://hadoop3:10000> drop database t1;
No rows affected (0.178 seconds)
0: jdbc:hive2://hadoop3:10000> show databases;
+----------------+
| database_name  |
+----------------+
| default        |
| myhive         |
| t2             |
| t3             |
+----------------+
4 rows selected (0.124 seconds)
0: jdbc:hive2://hadoop3:10000> drop database if exists t3 cascade;
No rows affected (1.56 seconds)
use database_name
0: jdbc:hive2://hadoop3:10000> use t2;
No rows affected (0.109 seconds)
CREATE [EXTERNAL] TABLE [IF NOT EXISTS] table_name
[(col_name data_type [COMMENT col_comment], ...)]
[COMMENT table_comment]
[PARTITIONED BY (col_name data_type [COMMENT col_comment], ...)]
[CLUSTERED BY (col_name, col_name, ...)
[SORTED BY (col_name [ASC|DESC], ...)] INTO num_buckets BUCKETS]
[ROW FORMAT row_format]
[STORED AS file_format]
[LOCATION hdfs_path]
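For details, see: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-CreateTable
• CREATE TABLE creates a table with the given name. If a table with the same name already exists, an exception is thrown; the IF NOT EXISTS option suppresses this exception.
• EXTERNAL creates an external table; a path to the actual data (LOCATION) is specified when the table is created.
• LIKE copies the structure of an existing table without copying its data.
• COMMENT adds a description to a table or a column.
• PARTITIONED BY specifies the partition columns.
• ROW FORMAT
DELIMITED [FIELDS TERMINATED BY char] [COLLECTION ITEMS TERMINATED BY char]
[MAP KEYS TERMINATED BY char] [LINES TERMINATED BY char]
| SERDE serde_name [WITH SERDEPROPERTIES
(property_name=property_value, property_name=property_value, ...)]
When creating a table, you can supply a custom SerDe or use a built-in one. If ROW FORMAT is not specified, or ROW FORMAT DELIMITED is used, the built-in SerDe is used. You also declare the table's columns when creating it and can name a custom SerDe at the same time; Hive uses the SerDe to determine the data of each column.
• STORED AS
SEQUENCEFILE -- sequence file
| TEXTFILE -- plain text file format
| RCFILE -- hybrid row-columnar file
| INPUTFORMAT input_format_classname OUTPUTFORMAT output_format_classname -- custom file format
If the data is plain text, use STORED AS TEXTFILE. If the data needs to be compressed, use STORED AS SEQUENCEFILE.
• LOCATION specifies the table's storage path on HDFS.
Best practice:
If the data is already stored on HDFS and will be used by multiple users or clients, create an external table.
Otherwise, create an internal (managed) table.
If LOCATION is not specified, the table is stored under the default warehouse path according to the default rules.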
The following examples use the t2 database.
0: jdbc:hive2://hadoop3:10000> create table student(id int, name string, sex string, age int,department string) row format delimited fields terminated by ",";
No rows affected (0.222 seconds)
0: jdbc:hive2://hadoop3:10000> desc student;
+-------------+------------+----------+
| col_name    | data_type  | comment  |
+-------------+------------+----------+
| id          | int        |          |
| name        | string     |          |
| sex         | string     |          |
| age         | int        |          |
| department  | string     |          |
+-------------+------------+----------+
5 rows selected (0.168 seconds)
0: jdbc:hive2://hadoop3:10000> create external table student_ext(id int, name string, sex string, age int,department string) row format delimited fields terminated by "," location "/hive/student";
No rows affected (0.248 seconds)
0: jdbc:hive2://hadoop3:10000> create external table student_ptn(id int, name string, sex string, age int,department string)
. . . . . . . . . . . . . . .> partitioned by (city string)
. . . . . . . . . . . . . . .> row format delimited fields terminated by ","
. . . . . . . . . . . . . . .> location "/hive/student_ptn";
No rows affected (0.24 seconds)
Adding partitions
0: jdbc:hive2://hadoop3:10000> alter table student_ptn add partition(city="beijing");
No rows affected (0.269 seconds)
0: jdbc:hive2://hadoop3:10000> alter table student_ptn add partition(city="shenzhen");
No rows affected (0.236 seconds)
If a table is partitioned, each partition appears as a subdirectory under the table's data directory.
In a partitioned table, data files must be stored inside a partition; they cannot be placed directly in the table directory.
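The partition-as-subdirectory layout can be sketched in Python. The helper name partition_dir is hypothetical, and the sketch assumes the usual column=value directory naming; it ignores the escaping Hive applies to special characters in partition values.

```python
def partition_dir(table_location, **partition_spec):
    """Return the directory of a partition under the table's location.

    Each partition column contributes one column=value path component,
    in the order the columns are given.
    """
    path = table_location.rstrip("/")
    for column, value in partition_spec.items():
        path += f"/{column}={value}"
    return path


# The two partitions added to student_ptn above.
print(partition_dir("/hive/student_ptn", city="beijing"))
print(partition_dir("/hive/student_ptn", city="shenzhen"))
```

Under this assumption, data files for the beijing partition live in /hive/student_ptn/city=beijing rather than directly in /hive/student_ptn.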
0: jdbc:hive2://hadoop3:10000> create external table student_bck(id int, name string, sex string, age int,department string)
. . . . . . . . . . . . . . .> clustered by (id) sorted by (id asc, name desc) into 4 buckets
. . . . . . . . . . . . . . .> row format delimited fields terminated by ","
. . . . . . . . . . . . . . .> location "/hive/student_bck";
No rows affected (0.216 seconds)
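How rows are spread over the 4 buckets can be illustrated with a short Python sketch. It assumes the classic scheme in which an int clustering column hashes to its own value and the bucket is that hash modulo the bucket count; newer Hive versions may use a different hash function, so real bucket numbers can differ. The function name bucket_for is hypothetical.

```python
def bucket_for(int_value, num_buckets):
    """Assign a row to a bucket from its integer clustering column.

    Assumption: the column's hash is the integer itself, masked to a
    non-negative 31-bit value, then taken modulo the bucket count.
    """
    return (int_value & 0x7FFFFFFF) % num_buckets


# Sample student ids, bucketed into 4 buckets as in student_bck.
for student_id in (95001, 95002, 95003, 95004):
    print(student_id, "-> bucket", bucket_for(student_id, 4))
```

Rows whose id values map to the same bucket number end up in the same bucket file, which is what makes bucketed sampling and joins on id cheaper.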
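Purpose: CTAS (CREATE TABLE ... AS SELECT) creates a table from the result of a query and stores that result.
First, load data into the student table: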
0: jdbc:hive2://hadoop3:10000> load data local inpath "/home/hadoop/student.txt" into table student;
No rows affected (0.715 seconds)
0: jdbc:hive2://hadoop3:10000> select * from student;
+-------------+---------------+--------------+--------------+---------------------+
| student.id  | student.name  | student.sex  | student.age  | student.department  |
+-------------+---------------+--------------+--------------+---------------------+
| 95002       | 劉晨          | 女           | 19           | IS                  |
| 95017       | 王風娟        | 女           | 18           | IS                  |
| 95018       | 王一          | 女           | 19           | IS                  |
| 95013       | 馮偉          | 男           | 21           | CS                  |
| 95014       | 王小麗        | 女           | 19           | CS                  |
| 95019       | 邢小麗        | 女           | 19           | IS                  |
| 95020       | 趙錢          | 男           | 21           | IS                  |
| 95003       | 王敏          | 女           | 22           | MA                  |
| 95004       | 張立          | 男           | 19           | IS                  |
| 95012       | 孫花          | 女           | 20           | CS                  |
| 95010       | 孔小濤        | 男           | 19           | CS                  |
| 95005       | 劉剛          | 男           | 18           | MA                  |
| 95006       | 孫慶          | 男           | 23           | CS                  |
| 95007       | 易思玲        | 女           | 19           | MA                  |
| 95008       | 李娜          | 女           | 18           | CS                  |
| 95021       | 週二          | 男           | 17           | MA                  |
| 95022       | 鄭明          | 男           | 20           | MA                  |
| 95001       | 李勇          | 男           | 20           | CS                  |
| 95011       | 包小柏        | 男           | 18           | MA                  |
| 95009       | 夢圓圓        | 女           | 18           | MA                  |
| 95015       | 王君          | 男           | 18           | MA                  |
+-------------+---------------+--------------+--------------+---------------------+
21 rows selected (0.342 seconds)
Create the table with CTAS
0: jdbc:hive2://hadoop3:10000> create table student_ctas as select * from student where id < 95012;
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
No rows affected (34.514 seconds)
0: jdbc:hive2://hadoop3:10000> select * from student_ctas
. . . . . . . . . . . . . . .> ;
+------------------+--------------------+-------------------+-------------------+--------------------------+
| student_ctas.id  | student_ctas.name  | student_ctas.sex  | student_ctas.age  | student_ctas.department  |
+------------------+--------------------+-------------------+-------------------+--------------------------+
| 95002            | 劉晨               | 女                | 19                | IS                       |
| 95003            | 王敏               | 女                | 22                | MA                       |
| 95004            | 張立               | 男                | 19                | IS                       |
| 95010            | 孔小濤             | 男                | 19                | CS                       |
| 95005            | 劉剛               | 男                | 18                | MA                       |
| 95006            | 孫慶               | 男                | 23                | CS                       |
| 95007            | 易思玲             | 女                | 19                | MA                       |
| 95008            | 李娜               | 女                | 18                | CS                       |
| 95001            | 李勇               | 男                | 20                | CS                       |
| 95011            | 包小柏             | 男                | 18                | MA                       |
| 95009            | 夢圓圓             | 女                | 18                | MA                       |
+------------------+--------------------+-------------------+-------------------+--------------------------+
11 rows selected (0.445 seconds)
0: jdbc:hive2://hadoop3:10000> create table student_copy like student;
No rows affected (0.217 seconds)
Note:
If the external keyword is omitted before table, the copied table is always an internal table, regardless of the source table.
If the external keyword is present before table, the copied table is always an external table.
0: jdbc:hive2://hadoop3:10000> show tables;
+---------------+
| tab_name      |
+---------------+
| student       |
| student_bck   |
| student_copy  |
| student_ctas  |
| student_ext   |
| student_ptn   |
+---------------+
6 rows selected (0.163 seconds)
0: jdbc:hive2://hadoop3:10000> show tables in myhive;
+-----------+
| tab_name  |
+-----------+
| student   |
+-----------+
1 row selected (0.144 seconds)
0: jdbc:hive2://hadoop3:10000> show tables like 'student_c*';
+---------------+
| tab_name      |
+---------------+
| student_copy  |
| student_ctas  |
+---------------+
2 rows selected (0.13 seconds)
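The wildcard match in the last command can be approximated in Python. This sketch assumes only two special characters in the pattern, '*' for any run of characters and '|' for alternatives; the function name matches_show_pattern is hypothetical, and name case handling is not modeled.

```python
import re


def matches_show_pattern(name, pattern):
    """Check a table name against a SHOW TABLES LIKE style pattern.

    Assumption: only '*' (any run of characters) and '|' (alternatives)
    are special, which covers the pattern used above.
    """
    for alternative in pattern.split("|"):
        # Escape the literal pieces and join them with '.*' for each '*'.
        regex = ".*".join(re.escape(piece) for piece in alternative.split("*"))
        if re.fullmatch(regex, name):
            return True
    return False


tables = ["student", "student_bck", "student_copy",
          "student_ctas", "student_ext", "student_ptn"]
print([t for t in tables if matches_show_pattern(t, "student_c*")])
```

For the six tables listed above, the pattern 'student_c*' selects student_copy and student_ctas, the same two rows beeline returned.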
0: jdbc:hive2://hadoop3:10000> desc student;
+-------------+------------+----------+
| col_name    | data_type  | comment  |
+-------------+------------+----------+
| id          | int        |          |
| name        | string     |          |
| sex         | string     |          |
| age         | int        |          |
| department  | string     |          |
+-------------+------------+----------+
5 rows selected (0.149 seconds)
0: jdbc:hive2://hadoop3:10000> desc extended student;
0: jdbc:hive2://hadoop3:10000> desc formatted student;
0: jdbc:hive2://hadoop3:10000> show partitions student_ptn;
0: jdbc:hive2://hadoop3:10000> show create table student_ptn;
0: jdbc:hive2://hadoop3:10000> alter table student rename to new_student;
0: jdbc:hive2://hadoop3:10000> alter table new_student add columns (score int);
0: jdbc:hive2://hadoop3:10000> alter table new_student change name new_name string;
Dropping a single column is not supported. Replacing the whole column list is:
0: jdbc:hive2://hadoop3:10000> alter table new_student replace columns (id int, name string, address string);
Static partitions
Adding one partition
0: jdbc:hive2://hadoop3:10000> alter table student_ptn add partition(city="chongqing");
Adding several partitions
0: jdbc:hive2://hadoop3:10000> alter table student_ptn add partition(city="chongqing2") partition(city="chongqing3") partition(city="chongqing4");
Dynamic partitions
First, load data into the student_ptn table (the same student.txt file used earlier):
0: jdbc:hive2://hadoop3:10000> load data local inpath "/home/hadoop/student.txt" into table student_ptn partition(city="beijing");
Next, insert the contents of this table directly into another table, student_ptn_age, with age as a dynamic partition (the age values are not listed in advance; Hive decides which partition each row goes to).
First create student_ptn_age, partitioned by age:
0: jdbc:hive2://hadoop3:10000> create table student_ptn_age(id int,name string,sex string,department string) partitioned by (age int);
Then query the data from student_ptn and insert it into student_ptn_age:
0: jdbc:hive2://hadoop3:10000> insert overwrite table student_ptn_age partition(age)
. . . . . . . . . . . . . . .> select id,name,sex,department,age from student_ptn;
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
No rows affected (27.905 seconds)
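What the dynamic-partition insert does with each row can be sketched in Python: the last column of the SELECT list supplies the partition value, and rows with the same value land in the same partition directory. The function name route_to_partitions and the relative directory name student_ptn_age are assumptions for illustration; the four rows are taken from the student data shown earlier.

```python
from collections import defaultdict

# A few rows in SELECT order: (id, name, sex, department, age).
rows = [
    (95002, "劉晨", "女", "IS", 19),
    (95017, "王風娟", "女", "IS", 18),
    (95013, "馮偉", "男", "CS", 21),
    (95014, "王小麗", "女", "CS", 19),
]


def route_to_partitions(rows, table_dir):
    """Group rows by their last column, which acts as the dynamic partition value."""
    partitions = defaultdict(list)
    for *data_columns, age in rows:
        # The partition value becomes a directory name, not a stored column.
        partitions[f"{table_dir}/age={age}"].append(tuple(data_columns))
    return dict(partitions)


routed = route_to_partitions(rows, "student_ptn_age")
for directory, members in sorted(routed.items()):
    print(directory, [member[0] for member in members])
```

This is why age is listed last in the select above: the dynamic partition column must come after the regular columns so that it can be split off as the partition value.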
Modifying a partition generally means changing the directory where its data is stored.
When adding a partition, you can specify its data directory directly:
0: jdbc:hive2://hadoop3:10000> alter table student_ptn add if not exists partition(city='beijing')
. . . . . . . . . . . . . . .> location '/student_ptn_beijing' partition(city='cc') location '/student_cc';
No rows affected (0.306 seconds)
To change the data directory of a partition that already exists:
0: jdbc:hive2://hadoop3:10000> alter table student_ptn partition (city='beijing') set location '/student_ptn_beijing';
The original partition directory still exists, but data added to the partition from now on goes only to the new directory.
0: jdbc:hive2://hadoop3:10000> alter table student_ptn drop partition (city='beijing');
0: jdbc:hive2://hadoop3:10000> drop table new_student;
0: jdbc:hive2://hadoop3:10000> truncate table student_ptn;