DDL: Hive Data Definition Language
Put simply, DDL covers operations on databases and tables. This post summarizes the basic statements.
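By default, the Hive data warehouse lives under /user/hive/warehouse on HDFS.
Hive has a built-in database named default.
However, no default folder is created under /user/hive/warehouse. Each table in default gets its own folder directly under /user/hive/warehouse.
In Hive, a database maps to a path on HDFS (more precisely, a directory). A table also maps to a directory, and the table's data maps to files inside that directory.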
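Here is a small Python sketch of this directory mapping. It assumes the default warehouse root /user/hive/warehouse and Hive's usual layout. Tables in default sit directly under the root, any other database gets a `<name>.db` directory, and an explicit LOCATION overrides both. `table_dir` is a hypothetical helper written for illustration only.

```python
from typing import Optional

WAREHOUSE = "/user/hive/warehouse"  # assumed: default warehouse root, not reconfigured


def table_dir(db: str, table: str, db_location: Optional[str] = None,
              table_location: Optional[str] = None) -> str:
    """Return the HDFS directory a table would use (hypothetical helper)."""
    if table_location is not None:
        # An explicit table LOCATION always wins.
        return table_location
    if db_location is not None:
        # Database created with LOCATION: its tables are subdirectories of it.
        return f"{db_location.rstrip('/')}/{table.lower()}"
    if db.lower() == "default":
        # The default database has no folder of its own.
        return f"{WAREHOUSE}/{table.lower()}"
    # Any other database gets a <name>.db folder under the warehouse root.
    return f"{WAREHOUSE}/{db.lower()}.db/{table.lower()}"


print(table_dir("default", "student"))   # /user/hive/warehouse/student
# Hypothetical: hive1101 as if it had been created without LOCATION
print(table_dir("hive1101", "student"))  # /user/hive/warehouse/hive1101.db/student
print(table_dir("hive1101", "student", db_location="/usr/hive_test"))  # /usr/hive_test/student
```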
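A managed table is also called an internal table. Tables that Hive creates by default are managed tables.
The data of both managed and external tables is stored on HDFS, since both are Hive tables.
When data is loaded into a managed table, Hive moves it into the table's directory under the warehouse path on HDFS.
When you create an external table, Hive does not move the data. It only records the data's location in the metastore.
The biggest difference is what happens on drop. Dropping a managed table deletes both the data and the metadata. Dropping an external table deletes only the metadata and leaves the data in place.
Because of this, managed tables are ill-suited to shared data: if one user drops the table, the data is gone for everyone who relies on it.
In practice, external tables are generally used.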
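The difference between dropping a managed table and dropping an external table can be sketched with a toy Python model, in which the metastore and HDFS are just two dicts. This illustrates the semantics described above and is not Hive code. The class `ToyHive` and its methods are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ToyHive:
    """Toy model: metastore entries plus the files stored at each location."""
    metastore: dict = field(default_factory=dict)  # table name -> (type, location)
    hdfs: dict = field(default_factory=dict)       # location -> list of file names

    def create(self, name: str, location: str, external: bool = False) -> None:
        kind = "EXTERNAL_TABLE" if external else "MANAGED_TABLE"
        self.metastore[name] = (kind, location)
        self.hdfs.setdefault(location, [])  # never wipes files already there

    def drop(self, name: str) -> None:
        kind, location = self.metastore.pop(name)  # metadata is always removed
        if kind == "MANAGED_TABLE":
            # Only a managed table's data directory is deleted as well.
            self.hdfs.pop(location, None)


hive = ToyHive()
hive.create("managed_t", "/user/hive/warehouse/managed_t")
hive.create("ext_t", "/data/pageview", external=True)
hive.hdfs["/data/pageview"].append("part-0000")
hive.drop("managed_t")
hive.drop("ext_t")
print(hive.metastore)  # {}
print(hive.hdfs)       # {'/data/pageview': ['part-0000']}
```

Re-creating `ext_t` at the same location would make the surviving file visible again, which is why external tables are the safer choice for shared data.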
Check a table's type
hive> desc formatted student_p;
Table Type:             MANAGED_TABLE
Managed table to external table
hive> alter table student_p set tblproperties('EXTERNAL'='TRUE');
Table Type: EXTERNAL_TABLE
External table to managed table
hive> alter table student_p set tblproperties('EXTERNAL'='FALSE');
Note: the property must be written in uppercase, exactly as shown.
Create a database
CREATE (DATABASE|SCHEMA) [IF NOT EXISTS] database_name
  [COMMENT database_comment]
  [LOCATION hdfs_path]
  [WITH DBPROPERTIES (property_name=property_value, ...)];
Example
hive> create database hive1101 location '/usr/hive_test';
OK
Time taken: 0.12 seconds
Note that this location is not Hive's default HDFS warehouse path. A non-default location can be specified.
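Drop a database. The database must be empty unless CASCADE is given. RESTRICT, the default, refuses to drop a database that still contains tables.
DROP (DATABASE|SCHEMA) [IF EXISTS] database_name [RESTRICT|CASCADE];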
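The "must be empty" rule is what RESTRICT enforces, while CASCADE drops the tables first. Here is a minimal Python sketch of that behavior, using a plain dict as a stand-in catalog. `drop_database` and `DatabaseNotEmpty` are hypothetical names, and nothing here talks to Hive.

```python
class DatabaseNotEmpty(Exception):
    """Raised when RESTRICT is in effect and the database still has tables."""


def drop_database(catalog: dict, db: str, cascade: bool = False) -> None:
    """Toy DROP DATABASE; catalog maps database name -> set of table names."""
    tables = catalog[db]
    if tables and not cascade:
        # RESTRICT (the default) refuses to drop a non-empty database.
        raise DatabaseNotEmpty(f"database {db} still has tables: {sorted(tables)}")
    # CASCADE drops the contained tables first, then the database itself.
    tables.clear()
    del catalog[db]


catalog = {"hive1101": {"student", "student1"}}
try:
    drop_database(catalog, "hive1101")  # RESTRICT: fails, catalog unchanged
except DatabaseNotEmpty as err:
    print(err)
drop_database(catalog, "hive1101", cascade=True)
print(catalog)  # {}
```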
Alter a database
ALTER (DATABASE|SCHEMA) database_name SET DBPROPERTIES (property_name=property_value, ...);   -- (Note: SCHEMA added in Hive 0.14.0)
ALTER (DATABASE|SCHEMA) database_name SET OWNER [USER|ROLE] user_or_role;   -- (Note: Hive 0.13.0 and later; SCHEMA added in Hive 0.14.0)
ALTER (DATABASE|SCHEMA) database_name SET LOCATION hdfs_path;   -- (Note: Hive 2.2.1, 2.4.0 and later)
Example
hive> alter database hive1101 set dbproperties ('edit_by'='wjd');
OK
Time taken: 0.118 seconds
Note: the location cannot be changed, except possibly in Hive 2.2.1, 2.4.0 and later (see the version note on SET LOCATION). My version is 2.3.6, and I have not tested it.
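Switch to a target database
USE database_name;
USE DEFAULT;
List all database names
show databases;
Show a database's details
hive> desc database hive1101;
OK
hive1101		hdfs://hadoop10:9000/usr/hive_test	root	USER
Time taken: 0.182 seconds, Fetched: 1 row(s)
This shows only the basic metadata. Adding extended after database (desc database extended hive1101;) shows more, such as the properties set with DBPROPERTIES.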
Create a table
CREATE [TEMPORARY] [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name    -- (Note: TEMPORARY available in Hive 0.14.0 and later)
  [(col_name data_type [column_constraint_specification] [COMMENT col_comment], ... [constraint_specification])]
  [COMMENT table_comment]
  [PARTITIONED BY (col_name data_type [COMMENT col_comment], ...)]
  [CLUSTERED BY (col_name, col_name, ...) [SORTED BY (col_name [ASC|DESC], ...)] INTO num_buckets BUCKETS]
  [SKEWED BY (col_name, col_name, ...)                  -- (Note: Available in Hive 0.10.0 and later)]
     ON ((col_value, col_value, ...), (col_value, col_value, ...), ...)
     [STORED AS DIRECTORIES]
  [
   [ROW FORMAT row_format]
   [STORED AS file_format]
     | STORED BY 'storage.handler.class.name' [WITH SERDEPROPERTIES (...)]  -- (Note: Available in Hive 0.6.0 and later)
  ]
  [LOCATION hdfs_path]
Many more clauses follow. See the official documentation listed in the references below.
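Parameter notes
temporary: creates a table that is visible only to the current session and is dropped automatically when the session ends (Hive 0.14.0 and later).
external: creates an external table. You also specify where the actual data lives with LOCATION.
like: copies the table structure but not the data.
row format: specifies the format of each row. If the source data does not match this format, the load still succeeds, but the fields are not parsed correctly.
// delimited fields terminated by '\t': fields are separated by \t
// delimited fields terminated by ',': in my tests this only worked with a real CSV file; a file I typed by hand produced errors
// delimited: separated; terminated: ended by
ROW FORMAT DELIMITED [FIELDS TERMINATED BY char] [COLLECTION ITEMS TERMINATED BY char]
        [MAP KEYS TERMINATED BY char] [LINES TERMINATED BY char]
  | SERDE serde_name [WITH SERDEPROPERTIES (property_name=property_value, property_name=property_value, ...)]
stored as: the file format of the data being loaded
// for plain text files use stored as textfile; for data stored compressed, a binary format such as stored as SEQUENCEFILE can be used
// many other formats exist, such as ORC and JSON; see the official documentation
partitioned by: creates a partitioned table. This is very important and gets its own post later.
CLUSTERED BY: creates a bucketed table. It will be covered later together with partitioned tables.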
Examples
Create a table with tab-separated fields:
hive> create table student(id int,name string) row format delimited fields terminated by '\t';

Create a table with the same schema as an existing table:
hive> create table if not exists student1 like student;

Create a managed (internal) table:
hive> create table if not exists mytable(sid int,sname string)
    > row format delimited fields terminated by '\005'
    > stored as textfile;

Create an external table:
hive> create external table if not exists pageview(
    > pageid int,
    > page_url string comment 'the page url'
    > )
    > row format delimited fields terminated by ','
    > location 'hdfs://192.168.220.144:9000/user/hive/warehouse';

Create a partitioned table:
hive> create table student_p(id int,name string,sexex string,age int,dept string)
    > partitioned by(part string)
    > row format delimited fields terminated by ','
    > stored as textfile;
Testing row format
Write the following data into student, separated by \t:
1	a
2	b
3	c
4 d,
Clearly, the last line is not separated by \t.
hive> load data local inpath '/usr/lib/hive2.3.6/1.txt' into table student;
Loading data to table hive1101.student
OK
Time taken: 0.868 seconds
hive> select * from student;
OK
1	a
2	b
3	c
NULL	NULL
Time taken: 0.17 seconds, Fetched: 4 row(s)
As you can see, the last line was not loaded correctly. Both of its columns come back as NULL.
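The NULL NULL row can be reproduced with a toy model of how a delimited text table is read. Each line is split on the field delimiter, and each field is converted to its column's type. Anything missing or unconvertible becomes NULL (None here), and extra fields are ignored. This is a simplified sketch, not Hive's actual SerDe code, and `parse_row` is a hypothetical name.

```python
from typing import Callable, List, Optional, Sequence

Column = Callable[[str], object]  # converts a raw text field to the column's type


def parse_row(line: str, columns: Sequence[Column],
              delimiter: str = "\t") -> List[Optional[object]]:
    """Split a text line on the delimiter and convert each field; failures become None."""
    fields = line.split(delimiter)
    row: List[Optional[object]] = []
    for i, convert in enumerate(columns):
        if i >= len(fields):
            row.append(None)  # fewer fields than columns -> NULL
            continue
        try:
            row.append(convert(fields[i]))
        except ValueError:
            row.append(None)  # e.g. "4 d," is not a valid int -> NULL
    return row


schema = [int, str]  # mirrors student(id int, name string)
for line in ["1\ta", "2\tb", "3\tc", "4 d,"]:
    print(parse_row(line, schema))
# [1, 'a']
# [2, 'b']
# [3, 'c']
# [None, None]
```

With no tab in the last line, the whole line becomes a single field. That field is not an integer, and the name column has no field at all, so both columns come back as NULL.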
Drop a table
DROP TABLE [IF EXISTS] table_name [PURGE];     -- (Note: PURGE available in Hive 0.14.0 and later)
Truncate a table. Note that external tables cannot be truncated.
TRUNCATE TABLE table_name [PARTITION partition_spec];

partition_spec:
  : (partition_column = partition_col_value, partition_column = partition_col_value, ...)
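Here is a toy Python model of the TRUNCATE rules above. External tables are rejected, and an optional partition spec limits the deletion to one partition directory (Hive stores partitions as key=value subdirectories). Only data files are removed, and the table definition stays. `TruncateError` and `truncate_table` are hypothetical names, and this does not call Hive.

```python
from typing import Dict, List, Optional


class TruncateError(Exception):
    """Raised when TRUNCATE is attempted on a table that is not managed."""


def truncate_table(table_type: str, files: Dict[str, List[str]],
                   partition: Optional[str] = None) -> None:
    """Toy TRUNCATE; files maps each partition directory (e.g. 'part=a') to its data files."""
    if table_type != "MANAGED_TABLE":
        # Mirrors the rule above: external tables cannot be truncated.
        raise TruncateError("TRUNCATE is only allowed on managed tables")
    targets = [partition] if partition is not None else list(files)
    for p in targets:
        files[p].clear()  # keep the directory, delete its data files


files = {"part=a": ["f1"], "part=b": ["f2"]}
truncate_table("MANAGED_TABLE", files, partition="part=a")
print(files)  # {'part=a': [], 'part=b': ['f2']}
```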
Alter a table (rename, table properties, SerDe)
ALTER TABLE table_name RENAME TO new_table_name;

ALTER TABLE table_name SET TBLPROPERTIES table_properties;

table_properties:
  : (property_name = property_value, property_name = property_value, ... )

ALTER TABLE table_name SET TBLPROPERTIES ('comment' = new_comment);

ALTER TABLE table_name [PARTITION partition_spec] SET SERDE serde_class_name [WITH SERDEPROPERTIES serde_properties];

ALTER TABLE table_name [PARTITION partition_spec] SET SERDEPROPERTIES serde_properties;

serde_properties:
  : (property_name = property_value, property_name = property_value, ... )
Change a column's name, type, position, and so on
ALTER TABLE table_name [PARTITION partition_spec] CHANGE [COLUMN] col_old_name col_new_name column_type [COMMENT col_comment] [FIRST|AFTER column_name] [CASCADE|RESTRICT];
Example
CREATE TABLE test_change (a int, b int, c int);

-- First change column a's name to a1.
ALTER TABLE test_change CHANGE a a1 INT;

-- Next change column a1's name to a2, its data type to string, and put it after column b.
ALTER TABLE test_change CHANGE a1 a2 STRING AFTER b;
-- The new table's structure is: b int, a2 string, c int.

-- Then change column c's name to c1, and put it as the first column.
ALTER TABLE test_change CHANGE c c1 INT FIRST;
-- The new table's structure is: c1 int, b int, a2 string.

-- Add a comment to column a2.
ALTER TABLE test_change CHANGE a2 a2 STRING COMMENT 'this is column a2';
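The column reordering in this example can be traced with a small Python model that treats a schema as an ordered list of (name, type) pairs. It is a sketch under that assumption, and `change_column` is a hypothetical helper. Like the real statement, it only rewrites the schema. Per the Hive docs, CHANGE COLUMN modifies metadata only and does not rewrite existing data files.

```python
from typing import List, Optional, Tuple

Schema = List[Tuple[str, str]]  # (column name, type), in table order


def change_column(schema: Schema, old: str, new: str, col_type: str,
                  first: bool = False, after: Optional[str] = None) -> Schema:
    """Toy ALTER TABLE ... CHANGE: rename/retype a column and optionally move it."""
    names = [name for name, _ in schema]
    idx = names.index(old)  # ValueError if the column does not exist
    rest = schema[:idx] + schema[idx + 1:]
    col = (new, col_type)
    if first:
        return [col] + rest
    if after is not None:
        pos = [name for name, _ in rest].index(after) + 1
        return rest[:pos] + [col] + rest[pos:]
    return schema[:idx] + [col] + schema[idx + 1:]  # keep the original position


s = [("a", "int"), ("b", "int"), ("c", "int")]
s = change_column(s, "a", "a1", "int")
s = change_column(s, "a1", "a2", "string", after="b")
print(s)  # [('b', 'int'), ('a2', 'string'), ('c', 'int')]
s = change_column(s, "c", "c1", "int", first=True)
print(s)  # [('c1', 'int'), ('b', 'int'), ('a2', 'string')]
```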
Add or replace columns
ALTER TABLE table_name
  [PARTITION partition_spec]                 -- (Note: Hive 0.14.0 and later)
  ADD|REPLACE COLUMNS (col_name data_type [COMMENT col_comment], ...)
  [CASCADE|RESTRICT]                         -- (Note: Hive 1.1.0 and later)
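Per the Hive documentation, ADD COLUMNS appends new columns after the existing data columns but before the partition columns. REPLACE COLUMNS drops all existing data columns and installs the new list. The toy Python model below shows the resulting column order. `add_columns` and `replace_columns` are hypothetical helpers, and the sample columns are made up.

```python
from typing import List, Tuple

Schema = List[Tuple[str, str]]  # (column name, type), in table order


def add_columns(data_cols: Schema, part_cols: Schema, new_cols: Schema) -> Schema:
    """ADD COLUMNS: new columns go after the data columns, before the partition columns."""
    return data_cols + new_cols + part_cols


def replace_columns(part_cols: Schema, new_cols: Schema) -> Schema:
    """REPLACE COLUMNS: all existing data columns are dropped; partition columns stay."""
    return new_cols + part_cols


data = [("id", "int"), ("name", "string")]
part = [("part", "string")]
print(add_columns(data, part, [("age", "int")]))
# [('id', 'int'), ('name', 'string'), ('age', 'int'), ('part', 'string')]
print(replace_columns(part, [("sid", "int"), ("sname", "string")]))
# [('sid', 'int'), ('sname', 'string'), ('part', 'string')]
```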
Create an index (note: indexing was removed in Hive 3.0)
CREATE INDEX index_name
  ON TABLE base_table_name (col_name, ...)
  AS index_type
  [WITH DEFERRED REBUILD]
  [IDXPROPERTIES (property_name=property_value, ...)]
  [IN TABLE index_table_name]
  [
     [ ROW FORMAT ...] STORED AS ...
     | STORED BY ...
  ]
  [LOCATION hdfs_path]
  [TBLPROPERTIES (...)]
  [COMMENT "index comment"];
Drop an index
DROP INDEX [IF EXISTS] index_name ON table_name;
Rebuild an index
ALTER INDEX index_name ON table_name [PARTITION partition_spec] REBUILD;
Show commands
SHOW (DATABASES|SCHEMAS) [LIKE 'identifier_with_wildcards'];
SHOW TABLES [IN database_name] ['identifier_with_wildcards'];
SHOW TBLPROPERTIES tblname;
SHOW TBLPROPERTIES tblname("foo");
SHOW CREATE TABLE ([db_name.]table_name|view_name);
SHOW [FORMATTED] (INDEX|INDEXES) ON table_with_index [(FROM|IN) db_name];
SHOW COLUMNS (FROM|IN) table_name [(FROM|IN) db_name];
Example
-- SHOW COLUMNS
CREATE DATABASE test_db;
USE test_db;
CREATE TABLE foo(col1 INT, col2 INT, col3 INT, cola INT, colb INT, colc INT, a INT, b INT, c INT);

-- SHOW COLUMNS basic syntax
SHOW COLUMNS FROM foo;                            -- show all columns in foo
SHOW COLUMNS FROM foo "*";                        -- show all columns in foo
SHOW COLUMNS IN foo "col*";                       -- show columns in foo starting with "col"       OUTPUT col1,col2,col3,cola,colb,colc
SHOW COLUMNS FROM foo '*c';                       -- show columns in foo ending with "c"           OUTPUT c,colc
SHOW COLUMNS FROM foo LIKE "col1|cola";           -- show columns in foo either col1 or cola       OUTPUT col1,cola
SHOW COLUMNS FROM foo FROM test_db LIKE 'col*';   -- show columns in foo starting with "col"       OUTPUT col1,col2,col3,cola,colb,colc
SHOW COLUMNS IN foo IN test_db LIKE 'col*';       -- show columns in foo starting with "col" (FROM/IN same)  OUTPUT col1,col2,col3,cola,colb,colc

-- Non existing column pattern resulting in no match
SHOW COLUMNS IN foo "nomatch*";
SHOW COLUMNS IN foo "col+";                       -- + wildcard not supported
SHOW COLUMNS IN foo "nomatch";
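To see how these wildcard patterns behave, here is an approximate Python model. In it, '*' matches any run of characters, '|' separates alternatives, and every other character (including '+') is literal. Matching is assumed to be case-insensitive. Matches are returned sorted because the example outputs appear in sorted order. `show_columns` is a hypothetical helper, not Hive's implementation.

```python
import re
from typing import Iterable, List


def show_columns(columns: Iterable[str], pattern: str = "*") -> List[str]:
    """Approximate SHOW COLUMNS ... LIKE pattern matching, returning sorted matches."""
    alternatives = [
        # '*' becomes '.*'; every other character is escaped and matched literally.
        "".join(".*" if ch == "*" else re.escape(ch) for ch in alt)
        for alt in pattern.split("|")
    ]
    regex = re.compile("|".join(f"(?:{a})" for a in alternatives), re.IGNORECASE)
    return sorted(c for c in columns if regex.fullmatch(c))


cols = ["col1", "col2", "col3", "cola", "colb", "colc", "a", "b", "c"]
print(show_columns(cols, "col*"))       # ['col1', 'col2', 'col3', 'cola', 'colb', 'colc']
print(show_columns(cols, "*c"))         # ['c', 'colc']
print(show_columns(cols, "col1|cola"))  # ['col1', 'cola']
print(show_columns(cols, "col+"))       # []
```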
This belongs to DML rather than DDL and will be covered in a later post.
References:
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL (official documentation)
https://ask.hellobi.com/blog/wujiadong/9483
https://blog.csdn.net/xiaozelulu/article/details/81585867