Hive Managed (Internal) Tables vs. External Tables: The Differences Explained

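Managed Tables & External Tables
A table created without the EXTERNAL keyword is a managed (internal) table; a table created with EXTERNAL is an external table.
Differences:
Hive itself manages the data of a managed table; the data of an external table is managed by HDFS, outside Hive's control.
Managed table data is stored under hive.metastore.warehouse.dir (default: /user/hive/warehouse). An external table's storage location is specified by the table itself. If no LOCATION is given, Hive creates a folder named after the external table under /user/hive/warehouse on HDFS and stores the table's data there.
Dropping a managed table deletes both the metadata and the stored data. Dropping an external table deletes only the metadata; the files on HDFS are not deleted.
Changes to a managed table are synchronized to the metadata directly, whereas after the structure or partitioning of an external table is changed, the metadata has to be repaired (MSCK REPAIR TABLE table_name;).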
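The repair step in the last point can be sketched as follows. This is a minimal illustration, not part of the original experiment: the table name logs_ext, the partition column dt, and the path /user/logs_ext are all hypothetical, and the dfs line assumes the statements are run from the Hive CLI.

```sql
-- Hypothetical partitioned external table (names and path are assumptions).
CREATE EXTERNAL TABLE logs_ext (msg string)
PARTITIONED BY (dt string)
LOCATION '/user/logs_ext';

-- Simulate a partition directory that appears on HDFS without Hive knowing about it.
dfs -mkdir -p /user/logs_ext/dt=2020-01-01;

-- Ask Hive to register partition directories that exist on HDFS
-- but are missing from the metastore.
MSCK REPAIR TABLE logs_ext;

-- List the partitions the metastore now knows about.
SHOW PARTITIONS logs_ext;
```

MSCK REPAIR TABLE relies on the partition directories following the column=value naming shown here. To register a single partition explicitly instead, ALTER TABLE logs_ext ADD PARTITION (dt='2020-01-01'); does the same job for one directory.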

The experiments below make these differences concrete.

Experiments
Create the managed table t1
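-- Columns are split on ',', array elements and map entries on '-',
-- and each map key is split from its value on ':'.
create table t1(
    id      int
   ,name    string
   ,hobby   array<string>
   ,add     map<string,string>
)
row format delimited
fields terminated by ','
collection items terminated by '-'
map keys terminated by ':'
;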

View the table description: desc t1;


Load data (t1)
Note: the plain INSERT statement (as opposed to INSERT OVERWRITE) is rarely used, because inserting even a single row launches a MapReduce job. Here we use LOAD DATA instead.

LOAD DATA [LOCAL] INPATH 'filepath' [OVERWRITE] INTO TABLE tablename [PARTITION (partcol1=val1, partcol2=val2 ...)]
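To make the optional parts of this syntax concrete, here are three variants, offered as a sketch. The local path is the one used later in this article; the HDFS path /tmp/staging/data is a hypothetical example.

```sql
-- With LOCAL: the file is read from the local file system and copied into the table.
-- Without OVERWRITE, the file is added to whatever the table already holds.
LOAD DATA LOCAL INPATH '/home/hadoop/Desktop/data' INTO TABLE t1;

-- Without LOCAL: the path refers to HDFS, and the file is moved into the table's directory.
-- '/tmp/staging/data' is an assumed path for illustration.
LOAD DATA INPATH '/tmp/staging/data' INTO TABLE t1;

-- With OVERWRITE: the existing contents of the table are replaced by the loaded file.
LOAD DATA LOCAL INPATH '/home/hadoop/Desktop/data' OVERWRITE INTO TABLE t1;
```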
Create a file, paste the records into it, and load it, as shown in the figure below:

The file contents are as follows:

1,xiaoming,book-TV-code,beijing:chaoyang-shanghai:pudong
2,lilei,book-code,nanjing:jiangning-taiwan:taibei
3,lihua,music-book,heilongjiang:haerbin
Then load it:

load data local inpath '/home/hadoop/Desktop/data' overwrite into table t1;
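Don't forget the file name /data at the end of the path. The first time, I left it off and loaded the entire Desktop directory, and the query returned nothing but NULLs and garbage.
View the table contents: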

select * from t1;
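Beyond select *, the array and map columns can be addressed element by element, which is a quick way to confirm that the delimiters were parsed as intended. A small sketch against the t1 table and sample data above:

```sql
-- hobby[0] is the first array element, size(hobby) the number of elements,
-- and add['beijing'] looks up a map key (NULL when the row has no such key).
SELECT id,
       name,
       hobby[0]       AS first_hobby,
       size(hobby)    AS hobby_count,
       add['beijing'] AS beijing_district
FROM t1;
```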


Create an external table t2
-- EXTERNAL plus LOCATION: Hive records the table in the metastore
-- but reads its files from /user/t2.
create external table t2(
    id      int
   ,name    string
   ,hobby   array<string>
   ,add     map<string,string>
)
row format delimited
fields terminated by ','
collection items terminated by '-'
map keys terminated by ':'
location '/user/t2'
;


Load data (t2)
load data local inpath '/home/hadoop/Desktop/data' overwrite into table t2;


Check the file locations
As the figure below shows, in the NameNode web UI at NameNode:50070/explorer.html#/user/ we can see the t2 directory.


Where is t1? Under the default warehouse path we configured earlier.


We can also get the location of both tables from the command line:

desc formatted table_name;

Note: in the output, a table type of MANAGED_TABLE means a managed (internal) table, and EXTERNAL_TABLE means an external table.
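Applied to the two tables from this experiment, the check might look like the sketch below. The comments describe what to look for rather than exact output, and they assume both tables were created in the default database with the default warehouse directory.

```sql
-- Look for "Table Type: MANAGED_TABLE" and a Location under the warehouse directory.
DESCRIBE FORMATTED t1;

-- Look for "Table Type: EXTERNAL_TABLE" and a Location ending in /user/t2.
DESCRIBE FORMATTED t2;

-- Print the warehouse directory this session is configured with.
SET hive.metastore.warehouse.dir;
```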
Drop the managed table and the external table
Next, drop the managed table and the external table and compare what happens.
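The original post showed this step only as a screenshot. The statements were presumably along these lines (a sketch, not a transcript of the author's session):

```sql
-- Drops the metadata and the data files of the managed table.
-- If the PURGE option is not specified, the data is moved to a trash folder
-- rather than deleted immediately (see the official description quoted later).
DROP TABLE t1;

-- Drops only the metadata of the external table; /user/t2 is left untouched.
DROP TABLE t2;

-- Neither table should be listed any more.
SHOW TABLES;
```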


Look at the files on HDFS
t1 no longer exists.


But t2 is still there.

So dropping an external table removes only its metadata.
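The same check can be made without the NameNode web UI by using the dfs command built into the Hive CLI. This sketch assumes the default warehouse path /user/hive/warehouse and that t1 lived in the default database.

```sql
-- The t1 directory should be missing from the warehouse listing.
dfs -ls /user/hive/warehouse;

-- The data file loaded into t2 should still be listed here.
dfs -ls /user/t2;
```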

Recreate the external table t2
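-- Same definition and LOCATION as before; the files under /user/t2
-- were left in place when the table was dropped.
create external table t2(
    id      int
   ,name    string
   ,hobby   array<string>
   ,add     map<string,string>
)
row format delimited
fields terminated by ','
collection items terminated by '-'
map keys terminated by ':'
location '/user/t2'
;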


Without loading any data into it, run select * and look at the result.

The data is still there!

Official documentation
The following is the official documentation's description of external tables:

A table created without the EXTERNAL clause is called a managed table because Hive manages its data.
Managed and External Tables
By default Hive creates managed tables, where files, metadata and statistics are managed by internal Hive processes. A managed table is stored under the hive.metastore.warehouse.dir path property, by default in a folder path similar to /apps/hive/warehouse/databasename.db/tablename/. The default location can be overridden by the location property during table creation. If a managed table or partition is dropped, the data and metadata associated with that table or partition are deleted. If the PURGE option is not specified, the data is moved to a trash folder for a defined duration.
Use managed tables when Hive should manage the lifecycle of the table, or when generating temporary tables.
An external table describes the metadata / schema on external files. External table files can be accessed and managed by processes outside of Hive. External tables can access data stored in sources such as Azure Storage Volumes (ASV) or remote HDFS locations. If the structure or partitioning of an external table is changed, an MSCK REPAIR TABLE table_name statement can be used to refresh metadata information.
Use external tables when files are already present or in remote locations, and the files should remain even if the table is dropped.
Managed or external tables can be identified using the DESCRIBE FORMATTED table_name command, which will display either MANAGED_TABLE or EXTERNAL_TABLE depending on table type.
Statistics can be managed on internal and external tables and partitions for query optimization.

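Hive official documentation: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-DescribeTable/View/Column
Copyright notice: this is an original article by CSDN blogger 「劉金寶_Arvin」, released under the CC 4.0 BY-SA license. When reposting, please include a link to the original source and this notice.
Original link: https://blog.csdn.net/qq_36743482/article/details/78393678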
