The example database is db_hive.
1. Create a table: create-table.sql
create table if not exists db_hive.tb_user (
  id int,
  username string comment 'user name',
  age int comment 'age',
  address string comment 'address'
)
comment 'user table'
row format delimited fields terminated by ','
stored as textfile
location '/user/hive/warehouse/db_hive.db/db_user';
2. Run the table-creation script
hive -f 'create-table.sql'
3. Load test data into the tb_user table
Data file: /root/files/tb_user.txt
1001,Logan,16,shenzhen
1002,Herry potter,12,Magic school
1003,孫悟空,500,花果山
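The sample file can be created and sanity-checked from the shell before it is loaded. This is a minimal sketch: DATA_DIR is a hypothetical variable introduced here so the snippet runs anywhere (the article itself keeps the file in /root/files), and the awk check only confirms that each row has the four comma-separated fields that tb_user declares.

```shell
# DATA_DIR is a hypothetical variable; the article stores the file in /root/files.
DATA_DIR="${DATA_DIR:-$(mktemp -d)}"
USER_FILE="$DATA_DIR/tb_user.txt"

# Write the three sample rows: id,username,age,address
cat > "$USER_FILE" <<'EOF'
1001,Logan,16,shenzhen
1002,Herry potter,12,Magic school
1003,孫悟空,500,花果山
EOF

# Count rows whose number of comma-separated fields is not 4.
BAD_ROWS=$(awk -F',' 'NF != 4 { n++ } END { print n + 0 }' "$USER_FILE")
echo "rows with an unexpected field count: $BAD_ROWS"
```

A check like this is worth doing because a delimited text table does not reject malformed rows at load time; missing columns simply come back as NULL when queried.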
In the Hive interactive command line, run load data local inpath '/root/files/tb_user.txt' into table db_hive.tb_user;
as shown below:
hive (db_hive)> load data local inpath '/root/files/tb_user.txt' into table db_hive.tb_user;
To overwrite the existing data, add overwrite, as shown below:
hive (db_hive)> load data local inpath '/root/files/tb_user.txt' overwrite into table db_hive.tb_user;
4. Query data
hive -e "select id,username from db_hive.tb_user"
5. Create a sub-table containing only some columns of an existing table
create table if not exists db_hive.tb_user_sub as select id,username from db_hive.tb_user;
6. Create a table with like
create table if not exists db_hive.tb_user_like like db_hive.tb_user;
Insert data
insert into table db_hive.tb_user_like select * from db_hive.tb_user;
7. Rename a table
alter table tb_user_like rename to tb_user_rename ;
8. Create an external table. Dropping it removes only the metadata, not the table data.
create external table if not exists db_hive.tb_ext(id string);
9. Create a partitioned table
create table if not exists db_hive.tb_logs( ip string, text string, log_time string ) partitioned by (month string) row format delimited fields terminated by "\t";
Data file: /root/files/tb_logs.txt (one record per line, fields separated by tabs)
192.168.32.100	login	20190429072650
192.168.32.100	order	20190429072730
192.168.32.101	browse	20190429072812
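Because the table is declared with fields terminated by "\t", the file must contain real tab characters, which are easy to lose when copying from a web page. The following sketch writes the three records with printf and verifies them; DATA_DIR is a hypothetical variable used here instead of the article's /root/files.

```shell
# DATA_DIR is a hypothetical variable; the article stores the file in /root/files.
DATA_DIR="${DATA_DIR:-$(mktemp -d)}"
LOG_FILE="$DATA_DIR/tb_logs.txt"

# printf expands \t to a real tab and reuses the format for every three arguments.
printf '%s\t%s\t%s\n' \
  192.168.32.100 login  20190429072650 \
  192.168.32.100 order  20190429072730 \
  192.168.32.101 browse 20190429072812 > "$LOG_FILE"

# Rows whose tab-separated field count is not 3 (ip, text, log_time).
BAD_ROWS=$(awk -F'\t' 'NF != 3 { n++ } END { print n + 0 }' "$LOG_FILE")
# Number of distinct values in the first column (ip).
DISTINCT_IPS=$(cut -f1 "$LOG_FILE" | sort -u | wc -l | tr -d ' ')
echo "bad rows: $BAD_ROWS, distinct ips: $DISTINCT_IPS"
```

The distinct-IP count computed locally is the value a count(distinct ip) query should report for any partition loaded from this file.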
Load data
load data local inpath '/root/files/tb_logs.txt' into table db_hive.tb_logs partition (month = '201904')
Query data
select ip,text,log_time from tb_logs where month = '201904';
10. Manually create partition data and repair the partitioned table
Create the partition directory
hdfs dfs -mkdir -p /user/hive/warehouse/db_hive.db/tb_logs/month=201905
Upload the data file to the partition directory
hdfs dfs -put /root/files/tb_logs.txt /user/hive/warehouse/db_hive.db/tb_logs/month=201905
Now run the query
select count(distinct ip) from db_hive.tb_logs where month = '201905';
The query returns 0.
[Cause]: The files are in place, but the partition has not been registered in the metastore. Check the metadata in the configured MySQL database:
mysql> use hive_metastore;
mysql> select * from PARTITIONS;
In this example, the Hive metadata is stored in the hive_metastore database in MySQL.
Querying the PARTITIONS table shows only one record, as shown below:
+---------+-------------+------------------+--------------+-------+--------+
| PART_ID | CREATE_TIME | LAST_ACCESS_TIME | PART_NAME    | SD_ID | TBL_ID |
+---------+-------------+------------------+--------------+-------+--------+
|       1 |  1556494255 |                0 | month=201904 |    29 |     28 |
+---------+-------------+------------------+--------------+-------+--------+
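The CREATE_TIME column in PARTITIONS holds seconds since the Unix epoch, so it can be turned into a readable timestamp to confirm when a partition was registered. A small sketch, assuming GNU date (the -d @N form is a GNU extension; BSD and macOS date use -r N instead):

```shell
# CREATE_TIME value copied from the month=201904 row of the PARTITIONS table.
CREATE_TIME=1556494255

# GNU date: "-d @N" interprets N as seconds since the Unix epoch; -u prints UTC.
CREATED_UTC=$(date -u -d "@$CREATE_TIME" '+%Y-%m-%d %H:%M:%S')
echo "$CREATED_UTC"
```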
[Fix 1] Run the repair command directly
msck repair table tb_logs
The PARTITIONS table now contains:
+---------+-------------+------------------+--------------+-------+--------+
| PART_ID | CREATE_TIME | LAST_ACCESS_TIME | PART_NAME    | SD_ID | TBL_ID |
+---------+-------------+------------------+--------------+-------+--------+
|       1 |  1556494255 |                0 | month=201904 |    29 |     28 |
|       2 |  1556495227 |                0 | month=201905 |    30 |     28 |
+---------+-------------+------------------+--------------+-------+--------+
Run the query
select count(distinct ip) from db_hive.tb_logs where month = '201905';
The result is 2; the data is now correctly attached to the partition.
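Conceptually, msck repair table compares the partition directories under the table location with the partitions recorded in the metastore and registers the ones that are missing. The sketch below only simulates that comparison locally with hard-coded lists (stand-ins for an HDFS directory listing and for show partitions output); it is an illustration of the idea, not Hive's implementation.

```shell
WORK_DIR=$(mktemp -d)

# Partition directories present under the table location (stand-in for an HDFS listing).
printf '%s\n' month=201904 month=201905 | sort > "$WORK_DIR/on_disk.txt"
# Partitions registered in the metastore (stand-in for "show partitions" output).
printf '%s\n' month=201904 | sort > "$WORK_DIR/registered.txt"

# comm -23 keeps lines found only in the first file:
# directories that exist on disk but are not registered.
MISSING=$(comm -23 "$WORK_DIR/on_disk.txt" "$WORK_DIR/registered.txt")
echo "unregistered partitions: $MISSING"
```

Any name reported here corresponds to a directory whose data stays invisible to queries until the partition is registered.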
[Fix 2] Use the add partition command
Steps: create a new partition directory and upload the data file with the following commands:
hive (db_hive)> dfs -mkdir -p /user/hive/warehouse/db_hive.db/tb_logs/month=201906;
hive (db_hive)> dfs -put /root/files/tb_logs.txt /user/hive/warehouse/db_hive.db/tb_logs/month=201906;
Run the add partition command
alter table tb_logs add partition(month = '201906');
Querying the data now returns the expected results.
The PARTITIONS metadata table now contains:
+---------+-------------+------------------+--------------+-------+--------+
| PART_ID | CREATE_TIME | LAST_ACCESS_TIME | PART_NAME    | SD_ID | TBL_ID |
+---------+-------------+------------------+--------------+-------+--------+
|       1 |  1556494255 |                0 | month=201904 |    29 |     28 |
|       2 |  1556495227 |                0 | month=201905 |    30 |     28 |
|       3 |  1556495635 |                0 | month=201906 |    31 |     28 |
+---------+-------------+------------------+--------------+-------+--------+
Command to list a table's partitions
show partitions db_hive.tb_logs;
11. Export table data
export table db_hive.tb_logs to '/user/hive/warehouse/export/db_hive/tb_logs';
12. Import table data
Create the table
create table tb_logs_like like tb_logs;
Import data
import table tb_logs_like from '/user/hive/warehouse/export/db_hive/tb_logs';
13. Export data to local files
insert overwrite local directory '/root/files/hive_out' row format delimited fields terminated by '\t' collection items terminated by '\n' select * from db_hive.tb_logs;
Common Hive commands and statements