1. Creating a database
create database <database_name>;
2. Using a database
use <database_name>;
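A minimal sketch, assuming a hypothetical database named pvdb; adding "if not exists" makes the statement safe to re-run:
create database if not exists pvdb;
use pvdb;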
3. Creating tables
Internal (managed) table: the table directory is laid out according to Hive's conventions, under the Hive warehouse directory /user/hive/warehouse.
create table t_pv_log(ip string, url string, access_time string) row format delimited fields terminated by ',';
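To confirm where Hive placed the table directory, describe the table; the Location field in the output should point under /user/hive/warehouse:
describe formatted t_pv_log;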
External table: the table directory is specified by the user.
Create the directory on HDFS:
hadoop fs -mkdir -p /pvlog/2017-09-16
Prepare test data:
192.168.33.1,http://sina.com/a,2017-09-16 12:52:01
192.168.33.2,http://sina.com/a,2017-09-16 12:51:01
192.168.33.1,http://sina.com/a,2017-09-16 12:50:01
192.168.33.2,http://sina.com/b,2017-09-16 12:49:01
192.168.33.1,http://sina.com/b,2017-09-16 12:48:01
192.168.33.4,http://sina.com/a,2017-09-16 12:47:01
192.168.33.3,http://sina.com/a,2017-09-16 12:46:01
192.168.33.2,http://sina.com/b,2017-09-16 12:45:01
192.168.33.2,http://sina.com/a,2017-09-16 12:44:01
192.168.33.1,http://sina.com/a,2017-09-16 13:43:01
Upload the data to /pvlog/2017-09-16 on HDFS:
hadoop fs -put ./pv.log /pvlog/2017-09-16
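Optionally verify the upload; the listing should show pv.log under the directory:
hadoop fs -ls /pvlog/2017-09-16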
Create the external table:
create external table t_pv_log(ip string, url string, access_time string) row format delimited fields terminated by ',' location '/pvlog/2017-09-16';
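Since the table directory already contains the uploaded file, the external table can be queried immediately; a quick check:
select * from t_pv_log limit 3;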
Difference between internal and external tables:
Dropping an internal table deletes both the table and its data.
Dropping an external table removes only the table metadata; the data files remain on HDFS.
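To see the difference for the external table above, drop it in Hive and then list its directory from the shell; the data file should still be there:
drop table t_pv_log;
hadoop fs -ls /pvlog/2017-09-16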
4. Partitioned tables
The essence of a partitioned table: partition subdirectories are created for the data files inside the table directory, so that at query time the MR job only processes the data in the matching partition subdirectories, narrowing the range of data that must be read.
For example, a website generates page-view records every day. The records belong in a single table, but often we only need to analyze one particular day's records.
In that case the table can be created as a partitioned table, with each day's data loaded into its own partition, as sketched below.
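A sketch of the resulting layout on HDFS, assuming the default warehouse path and per-day files named pv.log.15 and pv.log.16; each partition value becomes a subdirectory of the table directory:
/user/hive/warehouse/t_pv_log/day=20170915/pv.log.15
/user/hive/warehouse/t_pv_log/day=20170916/pv.log.16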
Prepare the data:
192.168.33.1,http://sina.com/a,2017-09-16 12:52:01
192.168.33.2,http://sina.com/a,2017-09-16 12:51:01
192.168.33.1,http://sina.com/a,2017-09-16 12:50:01
192.168.33.2,http://sina.com/b,2017-09-16 12:49:01
192.168.33.1,http://sina.com/b,2017-09-15 12:48:01
192.168.33.4,http://sina.com/a,2017-09-15 12:47:01
192.168.33.3,http://sina.com/a,2017-09-15 12:46:01
192.168.33.2,http://sina.com/b,2017-09-15 12:45:01
192.168.33.2,http://sina.com/a,2017-09-15 12:44:01
192.168.33.1,http://sina.com/a,2017-09-15 13:43:01
Create the partitioned table:
create table t_pv_log(ip string, url string, access_time string) partitioned by(day string) row format delimited fields terminated by ',';
Load the data into the newly created table (each day's file goes into the matching partition):
load data local inpath '/usr/local/hivetest/pv.log.15' into table t_pv_log partition(day='20170915');
load data local inpath '/usr/local/hivetest/pv.log.16' into table t_pv_log partition(day='20170916');
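To verify that both partitions were registered:
show partitions t_pv_log;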
Query data through the partition field:
0: jdbc:hive2://hadoop00:10000> select * from t_pv_log where day='20170916';
+---------------+--------------------+-----------------------+---------------+--+
|  t_pv_log.ip  |    t_pv_log.url    | t_pv_log.access_time  | t_pv_log.day  |
+---------------+--------------------+-----------------------+---------------+--+
| 192.168.33.1  | http://sina.com/a  | 2017-09-16 12:52:01   | 20170916      |
| 192.168.33.2  | http://sina.com/a  | 2017-09-16 12:51:01   | 20170916      |
| 192.168.33.1  | http://sina.com/a  | 2017-09-16 12:50:01   | 20170916      |
| 192.168.33.2  | http://sina.com/b  | 2017-09-16 12:49:01   | 20170916      |
| 192.168.33.1  | http://sina.com/b  | 2017-09-16 12:48:01   | 20170916      |
| 192.168.33.4  | http://sina.com/a  | 2017-09-16 12:47:01   | 20170916      |
| 192.168.33.3  | http://sina.com/a  | 2017-09-16 12:46:01   | 20170916      |
| 192.168.33.2  | http://sina.com/b  | 2017-09-16 12:45:01   | 20170916      |
| 192.168.33.2  | http://sina.com/a  | 2017-09-16 12:44:01   | 20170916      |
| 192.168.33.1  | http://sina.com/a  | 2017-09-16 13:43:01   | 20170916      |
+---------------+--------------------+-----------------------+---------------+--+
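Because the filter is on the partition column, Hive only reads the day=20170916 subdirectory. A minimal sketch of a per-partition aggregation, counting page views per IP for that day:
select ip, count(*) as pv
from t_pv_log
where day='20170916'
group by ip;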
5. Importing files
Method 1: manually place the file into the table directory with an HDFS command.
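For example, assuming the default warehouse location for the t_pv_log table created above:
hadoop fs -put ./pv.log /user/hive/warehouse/t_pv_log/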
Method 2: in Hive's interactive shell, use a Hive command to load local data into the table directory:
load data local inpath '/usr/local/data/' into table `order`;
(order is a reserved word in HiveQL, so the table name must be back-quoted.)
Method 3: use a Hive command to load a data file from HDFS into the table directory:
load data inpath 'access.log' into table t_access partition(day='20170916');
Note the difference between importing a local file and importing an HDFS file:
Importing a local file copies it into the table directory.
Importing an HDFS file moves it into the table directory.
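A quick way to observe this, assuming the files used above:
ls /usr/local/data/          # local source still present after load data local: copy
hadoop fs -ls access.log     # source gone from its original HDFS path after load data: move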