Common Hive Command-Line Operations (Databases and Tables)

Date: 2019-11-21

1. Database operations

1.1: List all databases: hive> show databases;

1.2: Use the default database: hive> use default;

1.3: View information about a database: hive> describe database default;

1.4: Show the current database in the prompt: hive> set hive.cli.print.current.db=true;

1.5: Print column headers in query results: hive> set hive.cli.print.header=true;

1.6: Create a database: hive> create database test;

1.7: Switch the current database: hive> use test;

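1.8: Drop a database: by default, Hive refuses to drop a database that still contains tables and reports an error. Appending the CASCADE keyword drops the tables first and then the database. RESTRICT is the default behavior: if any table exists, the drop is rejected.

hive> drop database dbname [CASCADE|RESTRICT];  (the keyword is optional)
hive> drop database if exists dbname CASCADE;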
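
A short session may make the difference between RESTRICT and CASCADE concrete. This is only a sketch: the database demo_db and the table t1 are names made up for illustration, and the exact error text depends on the Hive version.

```sql
-- demo_db and t1 are hypothetical names used only for this illustration
create database if not exists demo_db;
create table demo_db.t1 (id int);

-- RESTRICT is the default, so this statement is expected to fail
-- because demo_db still contains the table t1
drop database demo_db;

-- CASCADE drops the tables in demo_db first, then the database itself
drop database if exists demo_db cascade;
```

Adding "if exists" keeps the statement from failing when the database is already gone, which is convenient in cleanup scripts.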

1.9: Default storage path on HDFS: Hive stores its data on HDFS under a root directory set by the hive.metastore.warehouse.dir parameter in hive-site.xml. The default value is /user/hive/warehouse.

1.10: A database is stored on HDFS at ${hive.metastore.warehouse.dir}/dbname.db.

For example, with the default warehouse directory, the database ethan is stored at /user/hive/warehouse/ethan.db; ".db" is the suffix appended to the database directory name.

1.11: Specify the storage path when creating a database: hive> create database test location '/user/hive/mytest';
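
To check where a database actually lives, the statements below can be run after creating it. This is a minimal sketch; demo_loc, t1 and the HDFS path are assumed names, not part of the original article.

```sql
-- demo_loc, t1 and the path are hypothetical
create database if not exists demo_loc location '/user/hive/demo_loc';

-- the output includes the HDFS location of the database
describe database demo_loc;

-- a managed table created without its own LOCATION is placed
-- under the database directory; the Location row shows the path
create table demo_loc.t1 (id int);
describe formatted demo_loc.t1;
```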

2. Table operations

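2.1: List the tables in a database: hive> show tables in dbname; (a plain "show tables;" lists the tables of the current database)

A wildcard pattern can also be used to filter by name: hive> show tables like 'e*';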
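
A few pattern variants, as a hedged sketch. In the Hive versions this article appears to target, the pattern is not a full regular expression: the language manual describes only '*' (any characters) and '|' (alternatives) as wildcards, and behavior may differ in other versions. The name dbname is a placeholder.

```sql
-- all tables in the current database
show tables;

-- tables in the placeholder database dbname whose names start with "e"
show tables in dbname like 'e*';

-- '|' separates alternatives: names starting with "e" or ending with "_tmp"
show tables like 'e*|*_tmp';
```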

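2.2: Create a managed (internal) table: create table tablename (...); Typical use: intermediate and result tables inside Hive, where no data needs to be supplied from outside.

Create an external table: create external table tablename (...); Typical use: source tables, where external data is mapped into the table on a regular basis.

Table creation example: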

create external table test1
     (id int,
      name string
     )
comment 'test table'
partitioned by (day string)
row format delimited
fields terminated by ','
stored as textfile
location 'hdfs://cdh5/tmp/**';

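Keywords explained:
External: marks the table as an external table; without the External keyword the table is a managed (internal) table.
Comment: adds a comment to a table or a column.
Partitioned by: marks the table as partitioned; here the partition column is day, of type string.
Row format delimited: specifies the delimiters of the table, usually combined with the following keywords:
Fields terminated by ',': the field delimiter within a row is a comma.
Lines terminated by '\n': the row delimiter.
Collection items terminated by ',': the delimiter between elements of a collection.
Map keys terminated by ':': the delimiter between key and value in Map-typed data.

Stored as: specifies the file format of the table on HDFS. The available formats are:
Textfile: plain text, the default.
Sequencefile: binary sequence file.
Rcfile: columnar storage format, supported since Hive 0.6.
Orc: columnar storage format with a higher compression ratio and better read/write performance than Rcfile, supported since Hive 0.11.
Parquet: columnar storage format, supported since Hive 0.13.
Location: specifies where the table is stored on HDFS.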
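
The delimiter keywords are easiest to see together on a table with collection columns. The following is an illustrative sketch only; the table emp_demo, its columns and the sample line are assumptions, not taken from the original article.

```sql
-- emp_demo and its columns are hypothetical
create table emp_demo (
  name   string,
  skills array<string>,
  scores map<string,int>
)
row format delimited
  fields terminated by '\t'
  collection items terminated by ','
  map keys terminated by ':'
  lines terminated by '\n'
stored as textfile;

-- a matching input line, with fields separated by a tab character:
--   alice<TAB>sql,java<TAB>math:90,art:85

-- array elements are read by index, map values by key
select name, skills[0], scores['math'] from emp_demo;
```

The delimiter clauses are written in the order fields, collection items, map keys, lines, which is the order Hive's grammar expects.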

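Other ways to create a table:

Create a table from an existing one:

hive> create table test2 like test1; (copies only the table structure, not the data; no MapReduce job is needed)

hive> create table test3 as select id,name from test1; (copies both the structure of the selected columns and their data; a MapReduce job is run)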
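
Creating a table from a query can also switch the storage format in the same step. A minimal sketch, assuming the test1 table from the example above; test1_orc is a made-up name.

```sql
-- test1_orc is a hypothetical name; test1 is the table defined earlier
-- the new table is stored as ORC and filled from the query result
create table test1_orc stored as orc as
select id, name, day from test1;

-- inspect the generated definition of the new table
show create table test1_orc;
```

In this sketch the partition column day of test1 becomes an ordinary column of test1_orc, since the statement does not declare any partitioning for the new table.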

2.3: Show the CREATE TABLE statement of a table: hive> show create table tablename;

2.4: Show table details: hive> desc extended tablename; or hive> desc formatted tablename;

2.5: Load data into a table: hive> load data local inpath '/c/xxx/hivedata/data' overwrite into table tablename;

Without overwrite, the file is copied in as an additional file and the existing data is kept rather than replaced.
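
For a partitioned table such as test1 from the earlier example, the load usually names the target partition. The sketch below is illustrative: the local file paths and the partition value are assumptions.

```sql
-- the local paths and the partition value are hypothetical
-- OVERWRITE replaces whatever the partition day='2019-11-21' already holds
load data local inpath '/tmp/hivedata/data.csv'
overwrite into table test1 partition (day='2019-11-21');

-- without OVERWRITE the file is added next to the existing files
load data local inpath '/tmp/hivedata/more.csv'
into table test1 partition (day='2019-11-21');

-- check which partitions exist and how many rows the partition now holds
show partitions test1;
select count(*) from test1 where day='2019-11-21';
```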

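2.6: Drop a table: hive> drop table tablename;

Dropping a managed (internal) table also deletes its data on HDFS. Dropping an external table removes only the table's metadata and leaves the data on HDFS untouched.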
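
Because the effect of a drop depends on the table type, it can be worth checking the type first. A small sketch, with tablename as a placeholder:

```sql
-- the "Table Type" row of the output reads MANAGED_TABLE or EXTERNAL_TABLE
describe formatted tablename;

-- "if exists" avoids an error when the table is already gone
drop table if exists tablename;
```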

3. Connecting to HiveServer2

3.1: Connect to HiveServer2 with beeline. Start beeline: bin/beeline --color=true --fastConnect=true

Connect to the server: !connect jdbc:hive2://ctrl:10000

For example:

[root@cheyo hive]# bin/beeline --color=true --fastConnect=true
Beeline version 1.0.0 by Apache Hive
beeline> !connect jdbc:hive2://ctrl:10000
scan complete in 13ms
Connecting to jdbc:hive2://ctrl:10000
Enter username for jdbc:hive2://ctrl:10000: root
Enter password for jdbc:hive2://ctrl:10000:

Start beeline and connect in one step:

bin/beeline --color=true --fastConnect=true -u jdbc:hive2://ctrl:10000
# Log in with an explicit username
bin/beeline --color=true --fastConnect=true -u jdbc:hive2://ctrl:10000 -n root -p ""