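Hive Metadata

Introduction:

Hive is a data warehouse built on top of Hadoop and is typically used to read, write, and manage large datasets. Data stored in Hive actually lives on HDFS as plain files, which cannot be read or written as tables on their own, so we need metadata, also called a schema, to manage the data on HDFS. Are the metadata tables related to one another? Yes, they are. By default Hive stores its metadata tables in Derby, but Derby allows only a single session, so we usually switch to MySQL.

So how do we enable MySQL to manage the metadata?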

<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://hadoop001:3306/ruoze_d6?createDatabaseIfNotExist=true&amp;characterEncoding=UTF-8</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>root</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>123456</value>
  </property>
</configuration>

With the configuration above in place, Hive uses MySQL to manage its metadata.
The javax.jdo.option.ConnectionURL property names the MySQL database, ruoze_d6. The javax.jdo.option.ConnectionUserName and javax.jdo.option.ConnectionPassword properties set the MySQL login user name and password. The ruoze_d6 database does not need to be created in MySQL beforehand, because createDatabaseIfNotExist=true in the URL creates it on first connection.
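
As a quick sanity check on the configuration, the sketch below parses a hive-site.xml style snippet with Python's standard library and pulls the host, port, database name, and URL options out of the JDBC URL. The helper names (read_properties, describe_jdbc_url) are made up for this illustration and are not part of Hive.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit, parse_qs

# A trimmed copy of the configuration shown above.
HIVE_SITE = """<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://hadoop001:3306/ruoze_d6?createDatabaseIfNotExist=true&amp;characterEncoding=UTF-8</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>root</value>
  </property>
</configuration>"""


def read_properties(xml_text):
    """Return the <property> entries as a {name: value} dict."""
    root = ET.fromstring(xml_text)
    return {p.findtext("name"): p.findtext("value") for p in root.iter("property")}


def describe_jdbc_url(jdbc_url):
    """Split a jdbc:mysql:// URL into host, port, database, and options."""
    # Drop the "jdbc:" prefix so urlsplit sees an ordinary mysql:// URL.
    parts = urlsplit(jdbc_url[len("jdbc:"):])
    return {
        "host": parts.hostname,
        "port": parts.port,
        "database": parts.path.lstrip("/"),
        "options": {k: v[0] for k, v in parse_qs(parts.query).items()},
    }


props = read_properties(HIVE_SITE)
info = describe_jdbc_url(props["javax.jdo.option.ConnectionURL"])
print(info)
```

Note that the XML parser turns &amp; back into a plain & before the URL is split, which is why the escaped form is required in the file but not in the parsed value.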

mysql> use ruoze_d6;
Database changed

 

mysql> show tables;
+---------------------------+
| Tables_in_ruoze_d6        |
+---------------------------+
| bucketing_cols            |
| cds                       |
| columns_v2                |
| database_params           |
| dbs                       |
| func_ru                   |
| funcs                     |
| global_privs              |
| groupinfor                |
| idxs                      |
| index_params              |
| makedata_job              |
| part_col_privs            |
| part_col_stats            |
| part_privs                |
| partition_key_vals        |
| partition_keys            |
| partition_params          |
| partitions                |
| roles                     |
| sd_params                 |
| sds                       |
| sequence_table            |
| serde_params              |
| serdes                    |
| skewed_col_names          |
| skewed_col_value_loc_map  |
| skewed_string_list        |
| skewed_string_list_values |
| skewed_values             |
| sort_cols                 |
| tab_col_stats             |
| table_params              |
| tbl_col_privs             |
| tbl_privs                 |
| tbls                      |
| version                   |
+---------------------------+
37 rows in set (0.00 sec)

There are 37 tables here. Let's go through the most important ones.

  • version (metadata table that stores the Hive version)

mysql> select * from version ;
+--------+----------------+----------------------------------------+
| VER_ID | SCHEMA_VERSION | VERSION_COMMENT                        |
+--------+----------------+----------------------------------------+
|     11 | 1.1.0          | Set by MetaStore hadoop@172.16.202.233 |
+--------+----------------+----------------------------------------+
1 row in set (0.00 sec)
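  • Notes
  1. The first column is the primary-key ID, the second is the Hive schema version, and the third is a version comment. This table holds exactly one row and must never hold more than one. If the table is dropped, starting the Hive CLI fails with "Table 'hive.version' doesn't exist".
  2. That error only appears when a certain parameter is turned off. If that parameter is on, the table is recreated automatically after being dropped or emptied. I have forgotten which parameter it is and will add it here once I remember.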

 

  • DBS (metadata table for Hive databases)

mysql> select * from DBS \G;
*************************** 1. row ***************************
          DB_ID: 1
           DESC: Default Hive database
DB_LOCATION_URI: hdfs://hadoop001:9000/user/hive/warehouse
           NAME: default
     OWNER_NAME: public
     OWNER_TYPE: ROLE
*************************** 2. row ***************************
          DB_ID: 6
           DESC: NULL
DB_LOCATION_URI: hdfs://hadoop001:9000/user/hive/warehouse/hadoop_g6.db
           NAME: hadoop_g6
     OWNER_NAME: hadoop
     OWNER_TYPE: USER
*************************** 3. row ***************************
          DB_ID: 11
           DESC: NULL
DB_LOCATION_URI: hdfs://hadoop001:9000/user/hive/warehouse/ruoze_d6.db
           NAME: ruoze_d6
     OWNER_NAME: hadoop
     OWNER_TYPE: USER
3 rows in set (0.00 sec)
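  • Note: this table stores basic information about every database in Hive.
Column           Description
DB_ID            Database ID
DESC             Database description
DB_LOCATION_URI  HDFS path of the database
NAME             Database name
OWNER_NAME       User name of the database owner
OWNER_TYPE       Owner type (USER or ROLE in the output above)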

 

  • database_params (metadata table for Hive databases)

mysql> desc database_params;
+-------------+---------------+------+-----+---------+-------+
| Field       | Type          | Null | Key | Default | Extra |
+-------------+---------------+------+-----+---------+-------+
| DB_ID       | bigint(20)    | NO   | PRI | NULL    |       |
| PARAM_KEY   | varchar(180)  | NO   | PRI | NULL    |       |
| PARAM_VALUE | varchar(4000) | YES  |     | NULL    |       |
+-------------+---------------+------+-----+---------+-------+
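  • Note: this table stores database parameters, that is, the properties given with WITH DBPROPERTIES (property_name=property_value, …) in CREATE DATABASE.
Field        Description      Example
DB_ID        Database ID      11
PARAM_KEY    Parameter name   createby
PARAM_VALUE  Parameter value  root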

 

 

  • TBLS (metadata table for Hive tables and views)

mysql> select * from TBLS \G;
*************************** 1. row ***************************
            TBL_ID: 37
       CREATE_TIME: 1555494334
             DB_ID: 1
  LAST_ACCESS_TIME: 0
             OWNER: hadoop
         RETENTION: 0
             SD_ID: 37
          TBL_NAME: makedata_job
          TBL_TYPE: MANAGED_TABLE
VIEW_EXPANDED_TEXT: NULL
VIEW_ORIGINAL_TEXT: NULL
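  • Note: this table stores basic information about Hive tables, views, and index tables.
Column              Description
TBL_ID              Table ID
CREATE_TIME         Creation time
DB_ID               Database ID (corresponds to DB_ID in DBS)
LAST_ACCESS_TIME    Last access time
OWNER               Owner
RETENTION           Reserved field
SD_ID               Storage information ID (corresponds to SD_ID in the SDS table)
TBL_NAME            Table name
TBL_TYPE            Table type
VIEW_EXPANDED_TEXT  Expanded HQL statement of the view
VIEW_ORIGINAL_TEXT  Original HQL statement of the view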
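
To make the relationship between DBS and TBLS concrete, here is a minimal sketch that loads the rows shown above into an in-memory SQLite database and joins them on DB_ID. SQLite stands in for MySQL only so the example is self-contained. The cut-down table definitions are assumptions for illustration, not the real metastore DDL. CREATE_TIME is an epoch timestamp in seconds, so it is converted to a readable UTC time.

```python
import sqlite3
from datetime import datetime, timezone

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE DBS  (DB_ID INTEGER PRIMARY KEY, NAME TEXT, DB_LOCATION_URI TEXT);
CREATE TABLE TBLS (TBL_ID INTEGER PRIMARY KEY, DB_ID INTEGER, SD_ID INTEGER,
                   TBL_NAME TEXT, TBL_TYPE TEXT, CREATE_TIME INTEGER);
INSERT INTO DBS  VALUES (1, 'default', 'hdfs://hadoop001:9000/user/hive/warehouse');
INSERT INTO TBLS VALUES (37, 1, 37, 'makedata_job', 'MANAGED_TABLE', 1555494334);
""")

# TBLS.DB_ID points at DBS.DB_ID, so one join lists every table with its database.
rows = conn.execute("""
    SELECT d.NAME, t.TBL_NAME, t.TBL_TYPE, t.CREATE_TIME
    FROM TBLS t
    JOIN DBS d ON t.DB_ID = d.DB_ID
    ORDER BY d.NAME, t.TBL_NAME
""").fetchall()

# Convert the epoch seconds in CREATE_TIME to an ISO-8601 UTC string.
tables = [
    (db, tbl, tbl_type, datetime.fromtimestamp(ts, tz=timezone.utc).isoformat())
    for db, tbl, tbl_type, ts in rows
]
print(tables)
```

The same join works unchanged against the MySQL metastore, since it only uses standard SQL.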
   

 

 

  • table_params (metadata table for Hive tables and views)

mysql> select * from table_params;
+--------+-----------------------+-------------+
| TBL_ID | PARAM_KEY             | PARAM_VALUE |
+--------+-----------------------+-------------+
|     37 | COLUMN_STATS_ACCURATE | true        |
|     37 | numFiles              | 5           |
|     37 | numRows               | 0           |
|     37 | rawDataSize           | 0           |
|     37 | totalSize             | 2921282     |
|     37 | transient_lastDdlTime | 1555551458  |
|     42 | EXTERNAL              | TRUE        |
|     42 | transient_lastDdlTime | 1555555620  |
|     46 | COLUMN_STATS_ACCURATE | true        |
|     46 | numFiles              | 1           |
|     46 | numRows               | 500000      |
|     46 | rawDataSize           | 72051224    |
|     46 | totalSize             | 30284817    |
|     46 | transient_lastDdlTime | 1555557177  |
|     51 | EXTERNAL              | TRUE        |
|     51 | transient_lastDdlTime | 1555772013  |
|     52 | COLUMN_STATS_ACCURATE | true        |
|     52 | numFiles              | 1           |
|     52 | numRows               | 500000      |
|     52 | rawDataSize           | 67551224    |
|     52 | totalSize             | 75265591    |
|     52 | transient_lastDdlTime | 1555772485  |
|     56 | COLUMN_STATS_ACCURATE | true        |
|     56 | numFiles              | 1           |
|     56 | numRows               | 500000      |
|     56 | rawDataSize           | 64051224    |
|     56 | totalSize             | 64641768    |
|     56 | transient_lastDdlTime | 1555773864  |
|     66 | COLUMN_STATS_ACCURATE | true        |
|     66 | numFiles              | 1           |
|     66 | numRows               | 500000      |
|     66 | rawDataSize           | 359000000   |
|     66 | totalSize             | 17782969    |
|     66 | transient_lastDdlTime | 1555775575  |
|     67 | COLUMN_STATS_ACCURATE | true        |
|     67 | numFiles              | 1           |
|     67 | numRows               | 500000      |
|     67 | orc.compress          | NONE        |
|     67 | rawDataSize           | 359000000   |
|     67 | totalSize             | 53967047    |
|     67 | transient_lastDdlTime | 1555775880  |
|     68 | COLUMN_STATS_ACCURATE | true        |
|     68 | numFiles              | 1           |
|     68 | numRows               | 500000      |
|     68 | rawDataSize           | 4000000     |
|     68 | totalSize             | 61117546    |
|     68 | transient_lastDdlTime | 1555776185  |
|     69 | COLUMN_STATS_ACCURATE | true        |
|     69 | numFiles              | 1           |
|     69 | numRows               | 500000      |
|     69 | rawDataSize           | 4000000     |
|     69 | totalSize             | 16854027    |
|     69 | transient_lastDdlTime | 1555776356  |
|     71 | COLUMN_STATS_ACCURATE | true        |
|     71 | numFiles              | 1           |
|     71 | numRows               | 1           |
|     71 | rawDataSize           | 0           |
|     71 | totalSize             | 1           |
|     71 | transient_lastDdlTime | 1555809751  |
|     76 | transient_lastDdlTime | 1555836141  |
|     77 | COLUMN_STATS_ACCURATE | true        |
|     77 | numFiles              | 1           |
|     77 | numRows               | 0           |
|     77 | rawDataSize           | 0           |
|     77 | totalSize             | 366         |
|     77 | transient_lastDdlTime | 1555837173  |
+--------+-----------------------+-------------+
  • Note: this table stores the properties of tables and views.
Field        Description
TBL_ID       Table ID (corresponds to TBL_ID in TBLS)
PARAM_KEY    Property name
PARAM_VALUE  Property value
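
Because table_params is a key/value table, one table's properties are spread over several rows. The sketch below is a hypothetical helper, not part of Hive. It pivots a few of the rows shown above into one dictionary per TBL_ID and derives the average on-disk bytes per row. PARAM_VALUE is a varchar column, so the numbers arrive as strings and must be converted before doing arithmetic.

```python
from collections import defaultdict

# (TBL_ID, PARAM_KEY, PARAM_VALUE) rows copied from the output above.
rows = [
    (66, "numRows", "500000"),
    (66, "rawDataSize", "359000000"),
    (66, "totalSize", "17782969"),
    (67, "numRows", "500000"),
    (67, "orc.compress", "NONE"),
    (67, "rawDataSize", "359000000"),
    (67, "totalSize", "53967047"),
]


def pivot(param_rows):
    """Group key/value rows into {TBL_ID: {PARAM_KEY: PARAM_VALUE}}."""
    params = defaultdict(dict)
    for tbl_id, key, value in param_rows:
        params[tbl_id][key] = value
    return dict(params)


params = pivot(rows)

# totalSize is the size on disk in bytes; divide by numRows for bytes per row.
bytes_per_row = {
    tbl_id: int(p["totalSize"]) / int(p["numRows"])
    for tbl_id, p in params.items()
}
print(bytes_per_row)
```

In this sample, table 67 (created with orc.compress set to NONE) takes roughly three times as many bytes per row as table 66 for the same row count.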

 

  • TBL_PRIVS (stores grant information for tables and views; not covered in detail here)

mysql> desc TBL_PRIVS;
+----------------+--------------+------+-----+---------+-------+
| Field          | Type         | Null | Key | Default | Extra |
+----------------+--------------+------+-----+---------+-------+
| TBL_GRANT_ID   | bigint(20)   | NO   | PRI | NULL    |       |
| CREATE_TIME    | int(11)      | NO   |     | NULL    |       |
| GRANT_OPTION   | smallint(6)  | NO   |     | NULL    |       |
| GRANTOR        | varchar(128) | YES  |     | NULL    |       |
| GRANTOR_TYPE   | varchar(128) | YES  |     | NULL    |       |
| PRINCIPAL_NAME | varchar(128) | YES  |     | NULL    |       |
| PRINCIPAL_TYPE | varchar(128) | YES  |     | NULL    |       |
| TBL_PRIV       | varchar(128) | YES  |     | NULL    |       |
| TBL_ID         | bigint(20)   | YES  | MUL | NULL    |       |
+----------------+--------------+------+-----+---------+-------+
9 rows in set (0.01 sec)
TBL_ID corresponds to TBL_ID in TBLS.
  • sds (metadata table for Hive file storage information)

mysql> desc sds;
+---------------------------+---------------+------+-----+---------+-------+
| Field                     | Type          | Null | Key | Default | Extra |
+---------------------------+---------------+------+-----+---------+-------+
| SD_ID                     | bigint(20)    | NO   | PRI | NULL    |       |
| CD_ID                     | bigint(20)    | YES  | MUL | NULL    |       |
| INPUT_FORMAT              | varchar(4000) | YES  |     | NULL    |       |
| IS_COMPRESSED             | bit(1)        | NO   |     | NULL    |       |
| IS_STOREDASSUBDIRECTORIES | bit(1)        | NO   |     | NULL    |       |
| LOCATION                  | varchar(4000) | YES  |     | NULL    |       |
| NUM_BUCKETS               | int(11)       | NO   |     | NULL    |       |
| OUTPUT_FORMAT             | varchar(4000) | YES  |     | NULL    |       |
| SERDE_ID                  | bigint(20)    | YES  | MUL | NULL    |       |
+---------------------------+---------------+------+-----+---------+-------+

 

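  • Note: basic information about file storage:
Field                      Description
SD_ID                      Storage information ID
CD_ID                      Column information ID
INPUT_FORMAT               File input format
IS_COMPRESSED              Whether the data is compressed
IS_STOREDASSUBDIRECTORIES  Whether the data is stored in subdirectories
LOCATION                   HDFS path
NUM_BUCKETS                Number of buckets
OUTPUT_FORMAT              File output format
SERDE_ID                   Serialization class (SerDe) ID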
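
The SD_ID and SERDE_ID columns are what tie a table to its storage location and serialization class. The sketch below follows that chain (TBLS to SDS to SERDES) in an in-memory SQLite database. The post does not show any SDS rows, so the table name, IDs, and HDFS path are invented purely for illustration, and the table definitions are cut down to the columns the join needs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE TBLS   (TBL_ID INTEGER PRIMARY KEY, SD_ID INTEGER, TBL_NAME TEXT);
CREATE TABLE SDS    (SD_ID INTEGER PRIMARY KEY, CD_ID INTEGER, SERDE_ID INTEGER,
                     INPUT_FORMAT TEXT, LOCATION TEXT);
CREATE TABLE SERDES (SERDE_ID INTEGER PRIMARY KEY, SLIB TEXT);
-- Illustrative rows only: the table name, IDs, and path are made up.
INSERT INTO TBLS   VALUES (1, 10, 'demo_orc');
INSERT INTO SDS    VALUES (10, 100, 1000,
                           'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat',
                           'hdfs://namenode:9000/warehouse/demo_orc');
INSERT INTO SERDES VALUES (1000, 'org.apache.hadoop.hive.ql.io.orc.OrcSerde');
""")


def storage_of(table_name):
    """Return (LOCATION, INPUT_FORMAT, SLIB) for a table, or None if unknown."""
    return conn.execute("""
        SELECT s.LOCATION, s.INPUT_FORMAT, r.SLIB
        FROM TBLS t
        JOIN SDS s    ON t.SD_ID = s.SD_ID
        JOIN SERDES r ON s.SERDE_ID = r.SERDE_ID
        WHERE t.TBL_NAME = ?
    """, (table_name,)).fetchone()


location, input_format, serde = storage_of("demo_orc")
print(location, input_format, serde)
```

The CD_ID column in SDS plays the same linking role for column definitions, pointing at the rows of columns_v2.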

 

  • SD_PARAMS (metadata table for Hive file storage information)

mysql> desc SD_PARAMS;
+-------------+---------------+------+-----+---------+-------+
| Field       | Type          | Null | Key | Default | Extra |
+-------------+---------------+------+-----+---------+-------+
| SD_ID       | bigint(20)    | NO   | PRI | NULL    |       |
| PARAM_KEY   | varchar(256)  | NO   | PRI | NULL    |       |
| PARAM_VALUE | varchar(4000) | YES  |     | NULL    |       |
+-------------+---------------+------+-----+---------+-------+
3 rows in set (0.00 sec)
  • Note: this table stores storage-level properties, which are specified at table creation time with STORED BY 'storage.handler.class.name' [WITH SERDEPROPERTIES (…)].

  • serdes (metadata table for Hive file storage information)

mysql> select * from serdes;
+----------+------+-------------------------------------------------------------+
| SERDE_ID | NAME | SLIB                                                        |
+----------+------+-------------------------------------------------------------+
|       37 | NULL | org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe          |
|       42 | NULL | org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe          |
|       43 | NULL | org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe          |
|       46 | NULL | org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe          |
|       51 | NULL | org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe          |
|       52 | NULL | org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe          |
|       56 | NULL | org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe        |
|       66 | NULL | org.apache.hadoop.hive.ql.io.orc.OrcSerde                   |
|       67 | NULL | org.apache.hadoop.hive.ql.io.orc.OrcSerde                   |
|       68 | NULL | org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe |
|       69 | NULL | org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe |
|       71 | NULL | org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe          |
|       76 | NULL | org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe          |
|       77 | NULL | org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe          |
+----------+------+-------------------------------------------------------------+
14 rows in set (0.00 sec)
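  • Note: this table stores the classes used for serialization.
Field     Description
SERDE_ID  Serialization class configuration ID (corresponds to SERDE_ID in SDS)
NAME      Alias of the serialization class
SLIB      Serialization class
  • serde_params (metadata table for Hive file storage information)

mysql> select * from serde_params;
+----------+----------------------+-------------+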
| SERDE_ID | PARAM_KEY            | PARAM_VALUE |
+----------+----------------------+-------------+
|       37 | field.delim          |                |
|       37 | serialization.format |                |
|       42 | field.delim          |                |
|       42 | serialization.format |                |
|       43 | field.delim          |                |
|       43 | serialization.format |                |
|       46 | serialization.format | 1           |
|       51 | field.delim          |                |
|       51 | serialization.format |                |
|       52 | serialization.format | 1           |
|       56 | serialization.format | 1           |
|       66 | serialization.format | 1           |
|       67 | serialization.format | 1           |
|       68 | serialization.format | 1           |
|       69 | serialization.format | 1           |
|       71 | serialization.format | 1           |
|       76 | field.delim          |                |
|       76 | serialization.format |                |
|       77 | field.delim          |                |
|       77 | serialization.format |                |
+----------+----------------------+-------------+
20 rows in set (0.00 sec)
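  • Note: this table stores serialization properties and format information, such as the row and column delimiters.
Field        Description
SERDE_ID     Serialization class configuration ID (corresponds to SERDE_ID in SDS)
PARAM_KEY    Property name
PARAM_VALUE  Property value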

 

  • columns_v2 (metadata table for Hive table columns)

mysql> select * from columns_v2;
+-------+---------+-------------+--------------+-------------+
| CD_ID | COMMENT | COLUMN_NAME | TYPE_NAME    | INTEGER_IDX |
+-------+---------+-------------+--------------+-------------+
|    37 | NULL    | ip          | varchar(20)  |           4 |
|    37 | NULL    | levelnm     | varchar(6)   |           2 |
|    37 | NULL    | region      | varchar(6)   |           1 |
|    37 | NULL    | time_random | varchar(20)  |           3 |
|    37 | NULL    | traffic     | varchar(12)  |           7 |
|    37 | NULL    | urlid       | varchar(100) |           6 |
|    37 | NULL    | urlnm       | varchar(6)   |           0 |
|    37 | NULL    | urlym       | varchar(20)  |           5 |
|    42 | NULL    | cdn         | string       |           0 |
|    42 | NULL    | domain      | string       |           5 |
|    42 | NULL    | ip          | string       |           4 |
|    42 | NULL    | level       | string       |           2 |
|    42 | NULL    | region      | string       |           1 |
|    42 | NULL    | time        | string       |           3 |
|    42 | NULL    | traffic     | bigint       |           7 |
|    42 | NULL    | url         | string       |           6 |
+-------+---------+-------------+--------------+-------------+
17 rows in set (0.00 sec)
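  • Note: column information for tables.
Field        Description
CD_ID        Column information ID (corresponds to CD_ID in SDS)
COMMENT      Column comment
COLUMN_NAME  Column name
TYPE_NAME    Column type
INTEGER_IDX  Column position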
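
The query output above comes back in no particular column order. INTEGER_IDX is what records each column's position in the table. As a small sketch, the function below (a hypothetical helper name) rebuilds the declared column order for CD_ID 37 from the rows shown above.

```python
# (CD_ID, COLUMN_NAME, TYPE_NAME, INTEGER_IDX) rows copied from the output above.
rows = [
    (37, "ip", "varchar(20)", 4),
    (37, "levelnm", "varchar(6)", 2),
    (37, "region", "varchar(6)", 1),
    (37, "time_random", "varchar(20)", 3),
    (37, "traffic", "varchar(12)", 7),
    (37, "urlid", "varchar(100)", 6),
    (37, "urlnm", "varchar(6)", 0),
    (37, "urlym", "varchar(20)", 5),
    (42, "cdn", "string", 0),
    (42, "region", "string", 1),
]


def ordered_columns(column_rows, cd_id):
    """Return [(name, type)] for one CD_ID, sorted by INTEGER_IDX."""
    cols = [r for r in column_rows if r[0] == cd_id]
    cols.sort(key=lambda r: r[3])
    return [(name, type_name) for _, name, type_name, _ in cols]


columns = ordered_columns(rows, 37)
column_list = ", ".join(f"{name} {type_name}" for name, type_name in columns)
print(column_list)
```

In SQL the equivalent is simply ORDER BY INTEGER_IDX with a WHERE clause on CD_ID.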

 

 

 

  • partitions (metadata table for Hive table partitions)

mysql> select * from partitions ;
+---------+-------------+------------------+--------------+-------+--------+
| PART_ID | CREATE_TIME | LAST_ACCESS_TIME | PART_NAME    | SD_ID | TBL_ID |
+---------+-------------+------------------+--------------+-------+--------+
|      21 |  1555555926 |                0 | day=20190418 |    43 |     42 |
+---------+-------------+------------------+--------------+-------+--------+
1 row in set (0.00 sec)
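  • Note: basic information about partitions.
Field             Description
PART_ID           Partition ID
CREATE_TIME       Partition creation time
LAST_ACCESS_TIME  Last access time
PART_NAME         Partition name
SD_ID             Storage information ID of the partition
TBL_ID            Table ID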

 

 

  • partition_keys (metadata table for Hive table partitions)

mysql> select * from partition_keys;
+--------+--------------+-----------+-----------+-------------+
| TBL_ID | PKEY_COMMENT | PKEY_NAME | PKEY_TYPE | INTEGER_IDX |
+--------+--------------+-----------+-----------+-------------+
|     42 | NULL         | day       | string    |           0 |
+--------+--------------+-----------+-----------+-------------+
1 row in set (0.00 sec)
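  • Note: the partition columns of each table.
Field         Description
TBL_ID        Table ID
PKEY_COMMENT  Partition column comment
PKEY_NAME     Partition column name
PKEY_TYPE     Partition column type
INTEGER_IDX   Partition column position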

 

 

  • partition_key_vals (metadata table for Hive table partitions)

mysql> select * from partition_key_vals;
+---------+--------------+-------------+
| PART_ID | PART_KEY_VAL | INTEGER_IDX |
+---------+--------------+-------------+
|      21 | 20190418     |           0 |
+---------+--------------+-------------+
1 row in set (0.00 sec)
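  • Note: this table stores the partition column values.
Field         Description
PART_ID       Partition ID
PART_KEY_VAL  Partition column value
INTEGER_IDX   Position of the partition column value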
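
The three partition tables fit together through TBL_ID, PART_ID, and INTEGER_IDX: partition_keys gives the column names per table, partition_key_vals gives the values per partition, and matching them by position yields the PART_NAME stored in partitions. The sketch below, with a hypothetical helper name, rebuilds that name from the rows shown above. Multi-level partitions are joined with "/", as in Hive's directory layout.

```python
# Rows copied from the outputs above.
partitions = [(21, 42, "day=20190418")]      # (PART_ID, TBL_ID, PART_NAME)
partition_keys = [(42, "day", "string", 0)]  # (TBL_ID, PKEY_NAME, PKEY_TYPE, INTEGER_IDX)
partition_key_vals = [(21, "20190418", 0)]   # (PART_ID, PART_KEY_VAL, INTEGER_IDX)


def build_part_name(part_id):
    """Rebuild key=value[/key=value...] for a partition from keys and values."""
    tbl_id = next(t for p, t, _ in partitions if p == part_id)
    # Map each position to its key name and its value, then pair them up by position.
    names = {idx: name for t, name, _, idx in partition_keys if t == tbl_id}
    values = {idx: val for p, val, idx in partition_key_vals if p == part_id}
    return "/".join(f"{names[idx]}={values[idx]}" for idx in sorted(names))


rebuilt = build_part_name(21)
stored = next(name for p, _, name in partitions if p == 21)
print(rebuilt, stored)
```

A mismatch between the rebuilt and stored names would point to inconsistent metadata for that partition.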

 

  • partition_params (metadata table for Hive table partitions)

mysql> select * from partition_params;
+---------+-----------------------+-------------+
| PART_ID | PARAM_KEY             | PARAM_VALUE |
+---------+-----------------------+-------------+
|      21 | COLUMN_STATS_ACCURATE | true        |
|      21 | numFiles              | 1           |
|      21 | totalSize             | 29975501    |
|      21 | transient_lastDdlTime | 1555556171  |
+---------+-----------------------+-------------+
4 rows in set (0.00 sec)
  • Note: this table stores the properties of partitions.
Field        Description
PART_ID      Partition ID
PARAM_KEY    Partition property name
PARAM_VALUE  Partition property value

 

 

 

  • Other, less commonly used metadata tables


The figure is reproduced from https://mp.weixin.qq.com/s/c2C4SYaj-GUP6hTkPNV_hQ

Reference blog: https://mp.weixin.qq.com/s/c2C4SYaj-GUP6hTkPNV_hQ
