Hive Tutorial (3): DDL Basics

Date: 2019-11-22

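DDL stands for Data Definition Language; in Hive it is the set of statements that define databases and tables.

Put simply, DDL covers the operations on databases and their tables. This article summarizes the basic statements.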

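Hive data warehouse configuration

By default, the Hive data warehouse is located on HDFS under /user/hive/warehouse.

Hive has a default database named default.

However, no default folder is created under /user/hive/warehouse; tables in the default database get their folders directly under /user/hive/warehouse.

In Hive, a database corresponds to a path on HDFS (a directory, to be more precise), a table also corresponds to a path on HDFS, and the data corresponds to files on HDFS.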
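
The mapping between databases, tables, and HDFS paths can be checked from Hive itself. The following is a minimal sketch, assuming the default warehouse path described above; the names demo_db and t1 are hypothetical and exist only for illustration.

```sql
-- Hypothetical names for illustration: demo_db, t1
CREATE DATABASE IF NOT EXISTS demo_db;
CREATE TABLE IF NOT EXISTS demo_db.t1 (id INT, name STRING);

-- The location shown in each output is the HDFS directory Hive assigned.
-- With the default warehouse path these are expected to look like
--   /user/hive/warehouse/demo_db.db      (database directory)
--   /user/hive/warehouse/demo_db.db/t1   (table directory)
DESCRIBE DATABASE demo_db;
DESCRIBE FORMATTED demo_db.t1;
```

A non-default database normally gets a directory named after the database with a .db suffix, while tables in default sit directly under the warehouse root.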

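Managed tables vs. external tables

A managed table is also called an internal table; the tables Hive creates by default are managed tables.

The data of both managed and external tables is stored on HDFS, since both are Hive tables.

Differences

With a managed (internal) table, Hive moves the data into the path designated by the data warehouse, somewhere on HDFS.

With an external table, the data is not moved; Hive only records the data's location in the metadata.

The biggest difference: dropping an internal table deletes both the data and the metadata, while dropping an external table deletes only the metadata and leaves the data in place.

Because of this behavior, managed tables are a poor fit for shared data and can easily cause data-safety problems.

In practice, external tables are generally used.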
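
The difference in drop behavior can be sketched as follows. The table names and the /tmp/demo_external path are hypothetical; this is an illustration of the behavior described above, not a recommended layout.

```sql
-- Hypothetical table names and path, for illustration only
CREATE TABLE demo_managed (id INT, name STRING)
  ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';

CREATE EXTERNAL TABLE demo_external (id INT, name STRING)
  ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
  LOCATION '/tmp/demo_external';

-- Managed table: the metadata and the table's data directory are both removed
-- (the files may go to the HDFS trash first, if trash is enabled).
DROP TABLE demo_managed;

-- External table: only the metadata is removed; the files under
-- /tmp/demo_external are expected to remain on HDFS.
DROP TABLE demo_external;
```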

Converting between the two

Check the table type

hive> desc formatted student_p;
Table Type:             MANAGED_TABLE

Managed table to external table

hive> alter table student_p set tblproperties('EXTERNAL'='TRUE');
Table Type:             EXTERNAL_TABLE

External table to managed table

hive> alter table student_p set tblproperties('EXTERNAL'='FALSE');

Note that 'EXTERNAL', 'TRUE', and 'FALSE' must be written in uppercase.

Database

Create Database

CREATE (DATABASE|SCHEMA) [IF NOT EXISTS] database_name
  [COMMENT database_comment]
  [LOCATION hdfs_path]
  [WITH DBPROPERTIES (property_name=property_value, ...)];

Example

hive> create database hive1101 location '/usr/hive_test';
OK
Time taken: 0.12 seconds

Note that the location used here is not Hive's default HDFS path, which shows that a non-default location can be specified.
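
The optional clauses from the syntax above can be combined in one statement. A minimal sketch, in which the database name, path, and property keys are all hypothetical:

```sql
-- Hypothetical database name, location, and properties
CREATE DATABASE IF NOT EXISTS demo_db2
  COMMENT 'database for DDL experiments'
  LOCATION '/tmp/demo_db2'
  WITH DBPROPERTIES ('created_by' = 'tutorial', 'purpose' = 'demo');
```

IF NOT EXISTS makes the statement safe to rerun: if the database already exists, Hive skips the creation instead of raising an error.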

Drop Database

By default (RESTRICT) the database must be empty; with CASCADE, the tables in it are dropped as well.

DROP (DATABASE|SCHEMA) [IF EXISTS] database_name [RESTRICT|CASCADE];
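
As a short illustration of the two modes, assuming a hypothetical database named demo_db:

```sql
-- RESTRICT is the default: the statement fails if demo_db still contains tables
DROP DATABASE IF EXISTS demo_db RESTRICT;

-- CASCADE drops the tables in demo_db first, then the database itself
DROP DATABASE IF EXISTS demo_db CASCADE;
```

CASCADE is destructive for managed tables, because their data is removed along with the metadata, so it is worth checking the table list before using it.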

Alter Database

Change the properties of a database

ALTER (DATABASE|SCHEMA) database_name SET DBPROPERTIES (property_name=property_value, ...);   -- (Note: SCHEMA added in Hive 0.14.0)
ALTER (DATABASE|SCHEMA) database_name SET OWNER [USER|ROLE] user_or_role;   -- (Note: Hive 0.13.0 and later; SCHEMA added in Hive 0.14.0)
ALTER (DATABASE|SCHEMA) database_name SET LOCATION hdfs_path; -- (Note: Hive 2.2.1, 2.4.0 and later)

Example

hive> alter database hive1101 set dbproperties ('edit_by'='wjd');
OK
Time taken: 0.118 seconds

Note that I could not change the location.

This probably works only in Hive 2.2.1, 2.4.0 and later; my version is 2.3.6, and I have not tested it further.

Use Database

Switch to the target database

USE database_name;
USE DEFAULT;

Show Database

List all database names

show databases;

Show database information

hive> desc database hive1101;
OK
hive1101        hdfs://hadoop10:9000/usr/hive_test    root    USER    
Time taken: 0.182 seconds, Fetched: 1 row(s)

This shows only the basic metadata; adding extended after database displays some additional information.
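
For instance, the extended form applied to the hive1101 database from the examples above. The output is not reproduced here; it is expected to include the database properties, such as the edit_by property set earlier with ALTER DATABASE.

```sql
-- Same as DESC DATABASE, plus the DBPROPERTIES of the database
DESCRIBE DATABASE EXTENDED hive1101;
```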

Table

Create Table

CREATE [TEMPORARY] [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name    -- (Note: TEMPORARY available in Hive 0.14.0 and later)
  [(col_name data_type [column_constraint_specification] [COMMENT col_comment], ... [constraint_specification])]
  [COMMENT table_comment]
  [PARTITIONED BY (col_name data_type [COMMENT col_comment], ...)]
  [CLUSTERED BY (col_name, col_name, ...) [SORTED BY (col_name [ASC|DESC], ...)] INTO num_buckets BUCKETS]
  [SKEWED BY (col_name, col_name, ...)                  -- (Note: Available in Hive 0.10.0 and later)]
     ON ((col_value, col_value, ...), (col_value, col_value, ...), ...)
     [STORED AS DIRECTORIES]
  [
   [ROW FORMAT row_format] 
   [STORED AS file_format]
     | STORED BY 'storage.handler.class.name' [WITH SERDEPROPERTIES (...)]  -- (Note: Available in Hive 0.6.0 and later)
  ]
  [LOCATION hdfs_path]

There are many more options after these; for details, see the official documentation in the references below.

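Parameter notes

temporary: creates a temporary table, which is visible only in the current session

external: creates an external table; use location to specify the path where the actual data lives

like: copies the table structure but not the data

row format: specifies the format of each row; if the source data does not match this format, it can still be loaded into the table, but it will not be read back correctly

    // delimited fields terminated by '\t'    fields are separated by \t

    // delimited fields terminated by ','    fields are separated by commas; note that in my experience this only worked with csv files, and a hand-written file produced errors

    // delimited: separated; terminated: ended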

ROW FORMAT 
DELIMITED [FIELDS TERMINATED BY char] [COLLECTION ITEMS TERMINATED BY char] 
    [MAP KEYS TERMINATED BY char] [LINES TERMINATED BY char] 
   | SERDE serde_name [WITH SERDEPROPERTIES (property_name=property_value, property_name=property_value, ...)]

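stored as: the file format of the table's data

    // for plain text files, use stored as textfile; for compressed data, stored as SEQUENCEFILE can be used

    // there are many other formats, such as ORC and JSON; see the official documentation

partitioned by: creates a partitioned table; this is important and is covered separately later

CLUSTERED BY: creates a bucketed table; covered later together with partitioned tables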

Example

hive> create table student(id int,name string) row format delimited fields terminated by '\t'; -- create a table with \t as the field delimiter
hive> create  table if not exists student1 like student; -- create a table with the same schema as student

hive> create table if not exists mytable(sid int,sname string)
    >  row format delimited fields terminated by '\005' 
    >  stored as textfile; -- create an internal (managed) table
    
hive> create external table if not exists pageview(
    >  pageid int,
    >  page_url string comment 'the page url'
    > )
    > row format delimited fields terminated by ','
    > location 'hdfs://192.168.220.144:9000/user/hive/warehouse'; -- create an external table
    
hive> create table student_p(id int,name string,sexex string,age int,dept string)
    >  partitioned by(part string)
    >  row format delimited fields terminated by ','
    >  stored as textfile;    -- create a partitioned table
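
The examples above only use FIELDS TERMINATED BY. The other delimiter clauses of ROW FORMAT DELIMITED apply to complex column types. The following is a minimal sketch; the table student_scores and its sample line are hypothetical.

```sql
-- Hypothetical table; one line of input might look like:
--   1<TAB>math,physics<TAB>math:90,physics:85
CREATE TABLE IF NOT EXISTS student_scores (
  id      INT,
  courses ARRAY<STRING>,        -- items separated by ','
  scores  MAP<STRING, INT>      -- entries separated by ',', key and value by ':'
)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '\t'
  COLLECTION ITEMS TERMINATED BY ','
  MAP KEYS TERMINATED BY ':'
  LINES TERMINATED BY '\n'
STORED AS TEXTFILE;

-- Array elements are read by index, map values by key
SELECT id, courses[0], scores['math'] FROM student_scores;
```

The same COLLECTION ITEMS delimiter separates both array elements and map entries, so it has to be a character that does not occur inside the values themselves.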

Testing row format

Load the following data into student, with fields separated by \t

Clearly, the last line is not separated by \t

hive> load data local inpath '/usr/lib/hive2.3.6/1.txt' into table student;
Loading data to table hive1101.student
OK
Time taken: 0.868 seconds
hive> select * from student;
OK
1    a
2    b
3    c
NULL    NULL
Time taken: 0.17 seconds, Fetched: 4 row(s)

As you can see, the last line was not read correctly.
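
Since Hive accepts the file and only reports the mismatch as NULL values at query time, a quick check after loading can help. A minimal sketch against the student table above, assuming that id is never legitimately empty in the source file:

```sql
-- Rows whose first field could not be parsed as INT come back with id = NULL
SELECT COUNT(*) AS suspicious_rows
FROM student
WHERE id IS NULL;
```

A non-zero count only hints at a delimiter or type problem; real missing values in the data would be counted as well.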

Drop Table

DROP TABLE [IF EXISTS] table_name [PURGE];     -- (Note: PURGE available in Hive 0.14.0 and later)

Truncate Table

Empties a table. Note that an external table cannot be truncated.

TRUNCATE TABLE table_name [PARTITION partition_spec];
 
partition_spec:
  : (partition_column = partition_col_value, partition_column = partition_col_value, ...)
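
A short illustration using the managed tables from the Create Table examples; the partition value '2019' is hypothetical.

```sql
-- Remove all rows from a managed table, keeping its definition
TRUNCATE TABLE mytable;

-- Remove only the rows of one partition of the partitioned table
TRUNCATE TABLE student_p PARTITION (part = '2019');
```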

Alter Table

Modify the properties of a table

Rename Table

ALTER TABLE table_name RENAME TO new_table_name;

Alter Table Properties

ALTER TABLE table_name SET TBLPROPERTIES table_properties;
 
table_properties:
  : (property_name = property_value, property_name = property_value, ... )

Alter Table Comment

ALTER TABLE table_name SET TBLPROPERTIES ('comment' = new_comment);
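
As a concrete instance of the two forms above, applied to the student table from the earlier examples. The property key owner_team and both values are hypothetical.

```sql
-- Set the table comment and an arbitrary custom property in one statement
ALTER TABLE student SET TBLPROPERTIES ('comment' = 'student roster', 'owner_team' = 'data-eng');

-- Verify the result
SHOW TBLPROPERTIES student;
```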

Add SerDe Properties

ALTER TABLE table_name [PARTITION partition_spec] SET SERDE serde_class_name [WITH SERDEPROPERTIES serde_properties];
 
ALTER TABLE table_name [PARTITION partition_spec] SET SERDEPROPERTIES serde_properties;
 
serde_properties:
  : (property_name = property_value, property_name = property_value, ... )

Alter Column

Change Column Name/Type/Position/Comment

Change a column's name, type, position, and so on

ALTER TABLE table_name [PARTITION partition_spec] CHANGE [COLUMN] col_old_name col_new_name column_type
  [COMMENT col_comment] [FIRST|AFTER column_name] [CASCADE|RESTRICT];

Example

CREATE TABLE test_change (a int, b int, c int);

-- First change column a's name to a1.
ALTER TABLE test_change CHANGE a a1 INT;

-- Next change column a1's name to a2, its data type to string, and put it after column b.
ALTER TABLE test_change CHANGE a1 a2 STRING AFTER b;
-- The new table's structure is:  b int, a2 string, c int.

-- Then change column c's name to c1, and put it as the first column.
ALTER TABLE test_change CHANGE c c1 INT FIRST;
-- The new table's structure is:  c1 int, b int, a2 string.

-- Add a comment to column a2 (a1 no longer exists after the rename above)
ALTER TABLE test_change CHANGE a2 a2 STRING COMMENT 'this is column a2';

Add/Replace Columns

Add or replace columns

ALTER TABLE table_name 
  [PARTITION partition_spec]                 -- (Note: Hive 0.14.0 and later)
  ADD|REPLACE COLUMNS (col_name data_type [COMMENT col_comment], ...)
  [CASCADE|RESTRICT]                         -- (Note: Hive 1.1.0 and later)
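
A minimal sketch of both variants on a hypothetical table demo_cols. Both statements change only the table's schema in the metastore and do not rewrite the data files.

```sql
-- Hypothetical table for illustration
CREATE TABLE IF NOT EXISTS demo_cols (a INT, b INT);

-- ADD COLUMNS appends new columns after the existing ones: a, b, c
ALTER TABLE demo_cols ADD COLUMNS (c STRING COMMENT 'added column');

-- REPLACE COLUMNS swaps the whole column list: the table now has a, b2
ALTER TABLE demo_cols REPLACE COLUMNS (a INT, b2 INT);

DESCRIBE demo_cols;
```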

Index

Create Index

CREATE INDEX index_name
  ON TABLE base_table_name (col_name, ...)
  AS index_type
  [WITH DEFERRED REBUILD]
  [IDXPROPERTIES (property_name=property_value, ...)]
  [IN TABLE index_table_name]
  [
     [ ROW FORMAT ...] STORED AS ...
     | STORED BY ...
  ]
  [LOCATION hdfs_path]
  [TBLPROPERTIES (...)]
  [COMMENT "index comment"];

Drop Index

DROP INDEX [IF EXISTS] index_name ON table_name;

Alter Index

ALTER INDEX index_name ON table_name [PARTITION partition_spec] REBUILD;

Show

Show Databases

SHOW (DATABASES|SCHEMAS) [LIKE 'identifier_with_wildcards'];

Show Tables

SHOW TABLES [IN database_name] ['identifier_with_wildcards'];
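
A few usage forms, assuming the hive1101 database and the tables created in the earlier examples. In the pattern, * matches any characters and | separates alternatives.

```sql
-- All tables in a given database
SHOW TABLES IN hive1101;

-- Tables in hive1101 whose names start with "stu"
SHOW TABLES IN hive1101 'stu*';

-- Tables in the current database named either student or mytable
SHOW TABLES 'student|mytable';
```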

Show Table Properties

SHOW TBLPROPERTIES tblname;
SHOW TBLPROPERTIES tblname("foo");

Show Create Table

SHOW CREATE TABLE ([db_name.]table_name|view_name);

Show Indexes

SHOW [FORMATTED] (INDEX|INDEXES) ON table_with_index [(FROM|IN) db_name];

Show Columns

SHOW COLUMNS (FROM|IN) table_name [(FROM|IN) db_name];

Example

-- SHOW COLUMNS
CREATE DATABASE test_db;
USE test_db;
CREATE TABLE foo(col1 INT, col2 INT, col3 INT, cola INT, colb INT, colc INT, a INT, b INT, c INT);
  
-- SHOW COLUMNS basic syntax
SHOW COLUMNS FROM foo;                            -- show all column in foo
SHOW COLUMNS FROM foo "*";                        -- show all column in foo
SHOW COLUMNS IN foo "col*";                       -- show columns in foo starting with "col"                 OUTPUT col1,col2,col3,cola,colb,colc
SHOW COLUMNS FROM foo '*c';                       -- show columns in foo ending with "c"                     OUTPUT c,colc
SHOW COLUMNS FROM foo LIKE "col1|cola";           -- show columns in foo either col1 or cola                 OUTPUT col1,cola
SHOW COLUMNS FROM foo FROM test_db LIKE 'col*';   -- show columns in foo starting with "col"                 OUTPUT col1,col2,col3,cola,colb,colc
SHOW COLUMNS IN foo IN test_db LIKE 'col*';       -- show columns in foo starting with "col" (FROM/IN same)  OUTPUT col1,col2,col3,cola,colb,colc
  
-- Non existing column pattern resulting in no match
SHOW COLUMNS IN foo "nomatch*";
SHOW COLUMNS IN foo "col+";                       -- + wildcard not supported
SHOW COLUMNS IN foo "nomatch";