簡介:
hive是創建在hadoop之上的數據倉庫,通常用於對大型數據集的讀寫和管理,存在hive裏的數據實際上就是存在HDFS上,都是以文件的形式存在,不能進行讀寫操做,因此咱們須要元數據或者說叫schem來對hdfs上的數據進行管理。那元數據表之間有沒有什麼關聯呢?答案是確定的。hive默認元數據表是存儲在derby中的,可是derby是單session的,因此咱們通常會修改會mysqljava
那麼該如何啓用mysql來管理元數據呢?
1 <configuration> 2 <property> 3 <name>javax.jdo.option.ConnectionURL</name> 4 <value>jdbc:mysql://hadoop001:3306/ruoze_d6?createDatabaseIfNotExist=true&characterEncoding=UTF-8</value> 5 </property> 6 <property> 7 <name>javax.jdo.option.ConnectionDriverName</name> 8 <value>com.mysql.jdbc.Driver</value> 9 </property> 10 <property> 11 <name>javax.jdo.option.ConnectionUserName</name> 12 <value>root</value> 13 </property> 14 <property> 15 <name>javax.jdo.option.ConnectionPassword</name> 16 <value>123456</value> 17 </property> 18 </configuration>
以上的配置就會啓用MYSQL管理元數據
第4行的配置是配置了mysql裏的數據庫名字叫ruoze_d6,第11行和第16行式配置了MySQL的登陸用戶名和密碼,而且ruoze_d6這個庫不須要在mysql中特別創建
mysql> use ruoze_d6; Database changed mysql>
1 mysql> show tables; 2 +---------------------------+ 3 | Tables_in_ruoze_d6 | 4 +---------------------------+ 5 | bucketing_cols | 6 | cds | 7 | columns_v2 | 8 | database_params | 9 | dbs | 10 | func_ru | 11 | funcs | 12 | global_privs | 13 | groupinfor | 14 | idxs | 15 | index_params | 16 | makedata_job | 17 | part_col_privs | 18 | part_col_stats | 19 | part_privs | 20 | partition_key_vals | 21 | partition_keys | 22 | partition_params | 23 | partitions | 24 | roles | 25 | sd_params | 26 | sds | 27 | sequence_table | 28 | serde_params | 29 | serdes | 30 | skewed_col_names | 31 | skewed_col_value_loc_map | 32 | skewed_string_list | 33 | skewed_string_list_values | 34 | skewed_values | 35 | sort_cols | 36 | tab_col_stats | 37 | table_params | 38 | tbl_col_privs | 39 | tbl_privs | 40 | tbls | 41 | version | 42 | | 43 +---------------------------+ 44 37 rows in set (0.00 sec)
這裏一共有37張表, 咱們撿主次分析一下
-
version(存儲Hive版本的元數據表)
mysql> select * from version ; +--------+----------------+----------------------------------------+ | VER_ID | SCHEMA_VERSION | VERSION_COMMENT | +--------+----------------+----------------------------------------+ | 11 | 1.1.0 | Set by MetaStore hadoop@172.16.202.233 | +--------+----------------+----------------------------------------+ 1 row in set (0.00 sec)
- 說明:
- 第一列是ID主鍵;第二列是hive的版本,第三列是版本說明,而且這張表裏只有一條數據,且只能有一條數據,若是這張表被刪除,當啓動Hive-Cli時候,就會報錯」Table ‘hive.version’ doesn’t exist」。
- 可是前提示關閉某個參數,若是那個參數開着,那麼你若是刪除了這張表或者說清空這張表,他都會自動創建,那個參數我忘記是啥了,回頭想起來會來補上
-
DBS(hive數據庫相關的元數據表)
mysql> select * from DBS \G; *************************** 1. row *************************** DB_ID: 1 DESC: Default Hive database DB_LOCATION_URI: hdfs://hadoop001:9000/user/hive/warehouse NAME: default OWNER_NAME: public OWNER_TYPE: ROLE *************************** 2. row *************************** DB_ID: 6 DESC: NULL DB_LOCATION_URI: hdfs://hadoop001:9000/user/hive/warehouse/hadoop_g6.db NAME: hadoop_g6 OWNER_NAME: hadoop OWNER_TYPE: USER *************************** 3. row *************************** DB_ID: 11 DESC: NULL DB_LOCATION_URI: hdfs://hadoop001:9000/user/hive/warehouse/ruoze_d6.db NAME: ruoze_d6 OWNER_NAME: hadoop OWNER_TYPE: USER 3 rows in set (0.00 sec)
- 說明:該表存儲Hive中全部數據庫的基本信息
列名 | 解釋 |
DB_ID |
數據庫ID |
DESC |
數據庫描述 |
DB_LOCATION_URI |
數據庫HDFS路徑 |
NAME |
數據庫名 |
OWNER_NAME |
數據庫全部者用戶名 |
OWNER_TYPE |
全部者角色 |
-
database_params(hive數據庫相關的元數據表)
mysql> desc database_params; +-------------+---------------+------+-----+---------+-------+ | Field | Type | Null | Key | Default | Extra | +-------------+---------------+------+-----+---------+-------+ | DB_ID | bigint(20) | NO | PRI | NULL | | | PARAM_KEY | varchar(180) | NO | PRI | NULL | | | PARAM_VALUE | varchar(4000) | YES | | NULL | | +-------------+---------------+------+-----+---------+-------+
- 說明:該表存儲數據庫的相關參數,在CREATE DATABASE時候用 WITH DBPROPERTIES (property_name=property_value, …)指定的參數
字段 | 說明 | 示例 |
DB_ID |
數據庫ID | 11 |
PARAM_KEY |
參數名 | createby |
PARAM_VALUE |
參數值 | root |
-
TBLS(Hive表和視圖相關的元數據表)
mysql> select * from TBLS \G; *************************** 1. row *************************** TBL_ID: 37 CREATE_TIME: 1555494334 DB_ID: 1 LAST_ACCESS_TIME: 0 OWNER: hadoop RETENTION: 0 SD_ID: 37 TBL_NAME: makedata_job TBL_TYPE: MANAGED_TABLE VIEW_EXPANDED_TEXT: NULL VIEW_ORIGINAL_TEXT: NULL
- 說明:該表中存儲Hive表、視圖、索引表的基本信息。
TBL_ID |
表ID |
CREATE_TIME |
建立時間 |
DB_ID |
數據庫ID |
LAST_ACCESS_TIME |
上次訪問時間 |
OWNER |
全部者 |
RETENTION |
保留字段 |
SD_ID |
序列化配置信息(對應SDS表中的SD_ID) |
TBL_NAME |
表名 |
TBL_TYPE |
表類型 |
VIEW_EXPANDED_TEXT |
視圖詳細的HQL語句 |
VIEW_ORIGINAL_TEXT |
視圖原始的HQL語句 |
-
table_params(Hive表和視圖相關的元數據表)
mysql> select * from table_params; +--------+-----------------------+-------------+ | TBL_ID | PARAM_KEY | PARAM_VALUE | +--------+-----------------------+-------------+ | 37 | COLUMN_STATS_ACCURATE | true | | 37 | numFiles | 5 | | 37 | numRows | 0 | | 37 | rawDataSize | 0 | | 37 | totalSize | 2921282 | | 37 | transient_lastDdlTime | 1555551458 | | 42 | EXTERNAL | TRUE | | 42 | transient_lastDdlTime | 1555555620 | | 46 | COLUMN_STATS_ACCURATE | true | | 46 | numFiles | 1 | | 46 | numRows | 500000 | | 46 | rawDataSize | 72051224 | | 46 | totalSize | 30284817 | | 46 | transient_lastDdlTime | 1555557177 | | 51 | EXTERNAL | TRUE | | 51 | transient_lastDdlTime | 1555772013 | | 52 | COLUMN_STATS_ACCURATE | true | | 52 | numFiles | 1 | | 52 | numRows | 500000 | | 52 | rawDataSize | 67551224 | | 52 | totalSize | 75265591 | | 52 | transient_lastDdlTime | 1555772485 | | 56 | COLUMN_STATS_ACCURATE | true | | 56 | numFiles | 1 | | 56 | numRows | 500000 | | 56 | rawDataSize | 64051224 | | 56 | totalSize | 64641768 | | 56 | transient_lastDdlTime | 1555773864 | | 66 | COLUMN_STATS_ACCURATE | true | | 66 | numFiles | 1 | | 66 | numRows | 500000 | | 66 | rawDataSize | 359000000 | | 66 | totalSize | 17782969 | | 66 | transient_lastDdlTime | 1555775575 | | 67 | COLUMN_STATS_ACCURATE | true | | 67 | numFiles | 1 | | 67 | numRows | 500000 | | 67 | orc.compress | NONE | | 67 | rawDataSize | 359000000 | | 67 | totalSize | 53967047 | | 67 | transient_lastDdlTime | 1555775880 | | 68 | COLUMN_STATS_ACCURATE | true | | 68 | numFiles | 1 | | 68 | numRows | 500000 | | 68 | rawDataSize | 4000000 | | 68 | totalSize | 61117546 | | 68 | transient_lastDdlTime | 1555776185 | | 69 | COLUMN_STATS_ACCURATE | true | | 69 | numFiles | 1 | | 69 | numRows | 500000 | | 69 | rawDataSize | 4000000 | | 69 | totalSize | 16854027 | | 69 | transient_lastDdlTime | 1555776356 | | 71 | COLUMN_STATS_ACCURATE | true | | 71 | numFiles | 1 | | 71 | numRows | 1 | | 71 | rawDataSize | 0 | | 71 | totalSize | 1 | | 71 | transient_lastDdlTime | 1555809751 | | 76 | transient_lastDdlTime | 1555836141 | | 77 | COLUMN_STATS_ACCURATE | true | | 77 | numFiles | 1 | | 77 | numRows | 0 | | 77 | rawDataSize | 0 | | 77 | totalSize | 366 | | 77 | transient_lastDdlTime | 1555837173 | +--------+-----------------------+-------------+
- 說明:該表存儲表/視圖的屬性信息。
字段 | dec |
TBL_ID |
表ID(對應TBLS中的TBL_ID) |
PARAM_KEY |
屬性名 |
PARAM_VALUES |
屬性值 |
-
TBL_PRIVS 該表存儲表/視圖的受權信息(不作詳細說明)
mysql> desc TBL_PRIVS; +----------------+--------------+------+-----+---------+-------+ | Field | Type | Null | Key | Default | Extra | +----------------+--------------+------+-----+---------+-------+ | TBL_GRANT_ID | bigint(20) | NO | PRI | NULL | | | CREATE_TIME | int(11) | NO | | NULL | | | GRANT_OPTION | smallint(6) | NO | | NULL | | | GRANTOR | varchar(128) | YES | | NULL | | | GRANTOR_TYPE | varchar(128) | YES | | NULL | | | PRINCIPAL_NAME | varchar(128) | YES | | NULL | | | PRINCIPAL_TYPE | varchar(128) | YES | | NULL | | | TBL_PRIV | varchar(128) | YES | | NULL | | | TBL_ID | bigint(20) | YES | MUL | NULL | | +----------------+--------------+------+-----+---------+-------+ 9 rows in set (0.01 sec)
TBL_ID對應TBLS中的TBL_ID
-
sds(Hive文件存儲信息相關的元數據表)
mysql> desc sds; +---------------------------+---------------+------+-----+---------+-------+ | Field | Type | Null | Key | Default | Extra | +---------------------------+---------------+------+-----+---------+-------+ | SD_ID | bigint(20) | NO | PRI | NULL | | | CD_ID | bigint(20) | YES | MUL | NULL | | | INPUT_FORMAT | varchar(4000) | YES | | NULL | | | IS_COMPRESSED | bit(1) | NO | | NULL | | | IS_STOREDASSUBDIRECTORIES | bit(1) | NO | | NULL | | | LOCATION | varchar(4000) | YES | | NULL | | | NUM_BUCKETS | int(11) | NO | | NULL | | | OUTPUT_FORMAT | varchar(4000) | YES | | NULL | | | SERDE_ID | bigint(20) | YES | MUL | NULL | | +---------------------------+---------------+------+-----+---------+-------+
- 說明:文件存儲的基本信息:
SD_ID |
|
CD_ID |
字段信息ID |
INPUT_FORMAT |
文件輸入格式 |
IS_COMPRESSED |
是否壓縮 |
IS_STOREDASSUBDIRECTORIES |
是否以子目錄存儲 |
LOCATION |
HDFS路徑 |
NUM_BUCKETS |
分桶數量 |
OUTPUT_FORMAT |
文件輸出格式 |
SERDE_ID |
序列化類ID |
字段 | 說明 |
-
SD_PARAMS(Hive文件存儲信息相關的元數據表)
mysql> desc SD_PARAMS; +-------------+---------------+------+-----+---------+-------+ | Field | Type | Null | Key | Default | Extra | +-------------+---------------+------+-----+---------+-------+ | SD_ID | bigint(20) | NO | PRI | NULL | | | PARAM_KEY | varchar(256) | NO | PRI | NULL | | | PARAM_VALUE | varchar(4000) | YES | | NULL | | +-------------+---------------+------+-----+---------+-------+ 3 rows in set (0.00 sec)
- 說明:該表存儲Hive存儲的屬性信息,在建立表時候使用
STORED BY ‘storage.handler.class.name’ [WITH SERDEPROPERTIES (…)指定。mysql
-
serdes(Hive文件存儲信息相關的元數據表)
mysql> select * from serdes; +----------+------+-------------------------------------------------------------+ | SERDE_ID | NAME | SLIB | +----------+------+-------------------------------------------------------------+ | 37 | NULL | org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe | | 42 | NULL | org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe | | 43 | NULL | org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe | | 46 | NULL | org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe | | 51 | NULL | org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe | | 52 | NULL | org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe | | 56 | NULL | org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe | | 66 | NULL | org.apache.hadoop.hive.ql.io.orc.OrcSerde | | 67 | NULL | org.apache.hadoop.hive.ql.io.orc.OrcSerde | | 68 | NULL | org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe | | 69 | NULL | org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe | | 71 | NULL | org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe | | 76 | NULL | org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe | | 77 | NULL | org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe | +----------+------+-------------------------------------------------------------+ 14 rows in set (0.00 sec)
- 說明:該表存儲序列化使用的類信息
字段 | 字段說明 |
SERDE_ID |
序列化類配置ID(對應SDS的SERDE_ID ) |
NAME |
序列化類別名 |
SLIB |
序列化類 |
-
serde_params(Hive文件存儲信息相關的元數據表)
mysql> select * from serde_params; +----------+----------------------+-------------+ | SERDE_ID | PARAM_KEY | PARAM_VALUE | +----------+----------------------+-------------+ | 37 | field.delim | | | 37 | serialization.format | | | 42 | field.delim | | | 42 | serialization.format | | | 43 | field.delim | | | 43 | serialization.format | | | 46 | serialization.format | 1 | | 51 | field.delim | | | 51 | serialization.format | | | 52 | serialization.format | 1 | | 56 | serialization.format | 1 | | 66 | serialization.format | 1 | | 67 | serialization.format | 1 | | 68 | serialization.format | 1 | | 69 | serialization.format | 1 | | 71 | serialization.format | 1 | | 76 | field.delim | | | 76 | serialization.format | | | 77 | field.delim | | | 77 | serialization.format | | +----------+----------------------+-------------+ 20 rows in set (0.00 sec)
- 說明:該表存儲序列化的一些屬性、格式信息,好比:行、列分隔符
字段 | 字段說明 |
SERDE_ID |
序列化類配置ID(對應SDS的SERDE_ID ) |
PARAM_KEY |
屬性名 |
PARAM_VALUE |
屬性值 |
-
columns_v2Hive表字段相關的元數據表
mysql> select * from columns_v2; +-------+---------+-------------+--------------+-------------+ | CD_ID | COMMENT | COLUMN_NAME | TYPE_NAME | INTEGER_IDX | +-------+---------+-------------+--------------+-------------+ | 37 | NULL | ip | varchar(20) | 4 | | 37 | NULL | levelnm | varchar(6) | 2 | | 37 | NULL | region | varchar(6) | 1 | | 37 | NULL | time_random | varchar(20) | 3 | | 37 | NULL | traffic | varchar(12) | 7 | | 37 | NULL | urlid | varchar(100) | 6 | | 37 | NULL | urlnm | varchar(6) | 0 | | 37 | NULL | urlym | varchar(20) | 5 | | 42 | NULL | cdn | string | 0 | | 42 | NULL | domain | string | 5 | | 42 | NULL | ip | string | 4 | | 42 | NULL | level | string | 2 | | 42 | NULL | region | string | 1 | | 42 | NULL | time | string | 3 | | 42 | NULL | traffic | bigint | 7 | | 42 | NULL | url | string | 6 | +-------+---------+-------------+--------------+-------------+ 17 rows in set (0.00 sec)
- 說明:表的字段信息
字段 | 字段說明 |
CD_ID |
字段信息ID(對應表SDS的CD_ID) |
COMMENT |
字段註釋 |
COLUMN_NAME |
字段名 |
TYPE_NAME |
字段類型 |
INTEGER_IDX |
字段順序 |
-
partitions(Hive表分區相關的元數據表)
mysql> select * from partitions ; +---------+-------------+------------------+--------------+-------+--------+ | PART_ID | CREATE_TIME | LAST_ACCESS_TIME | PART_NAME | SD_ID | TBL_ID | +---------+-------------+------------------+--------------+-------+--------+ | 21 | 1555555926 | 0 | day=20190418 | 43 | 42 | +---------+-------------+------------------+--------------+-------+--------+ 1 row in set (0.00 sec)
- 說明:分區的基本信息
字段 | 字段說明 |
PART_ID |
分區ID |
CREATE_TIME |
分區建立時間 |
LAST_ACCESS_TIME |
最後一次訪問時間 |
PART_NAME |
分區名稱 |
SD_ID |
分區存儲ID |
TBL_ID |
表ID |
-
partition_keys(Hive表分區相關的元數據表)
mysql> select * from partition_keys; +--------+--------------+-----------+-----------+-------------+ | TBL_ID | PKEY_COMMENT | PKEY_NAME | PKEY_TYPE | INTEGER_IDX | +--------+--------------+-----------+-----------+-------------+ | 42 | NULL | day | string | 0 | +--------+--------------+-----------+-----------+-------------+ 1 row in set (0.00 sec)
- 說明:分區的字段信息
字段名稱 | 字段說明 |
TBL_ID |
表ID |
PKEY_COMMENT |
分區字段說明 |
PKEY_NAME |
分區字段名稱 |
PKEY_TYPE |
分區字段類型 |
INTEGER_IDX |
分區字段順序 |
-
partition_key_vals(Hive表分區相關的元數據表)
mysql> select * from partition_key_vals; +---------+--------------+-------------+ | PART_ID | PART_KEY_VAL | INTEGER_IDX | +---------+--------------+-------------+ | 21 | 20190418 | 0 | +---------+--------------+-------------+ 1 row in set (0.00 sec)
- 說明:該表存儲分區字段值
字段 | 字段說明 |
PART_ID |
分區ID |
PART_KEY_VAL |
分區字段值 |
INTEGER_IDX |
分區字段值順序 |
-
partition_params(Hive表分區相關的元數據表)
mysql> select * from partition_params; +---------+-----------------------+-------------+ | PART_ID | PARAM_KEY | PARAM_VALUE | +---------+-----------------------+-------------+ | 21 | COLUMN_STATS_ACCURATE | true | | 21 | numFiles | 1 | | 21 | totalSize | 29975501 | | 21 | transient_lastDdlTime | 1555556171 | +---------+-----------------------+-------------+ 4 rows in set (0.00 sec)
- 說明:該表存儲分區的屬性信息.
字段 | 字段說明 |
PART_ID |
分區ID |
PARAM_KEY |
分區屬性名 |
PARAM_VALUE |
分區屬性值 |
-
其餘不經常使用的元數據表
此圖轉載於https://mp.weixin.qq.com/s/c2C4SYaj-GUP6hTkPNV_hQsql
參考博客:https://mp.weixin.qq.com/s/c2C4SYaj-GUP6hTkPNV_hQ數據庫