1. Creating a database
create database <database_name>;
2. Using a database
use <database_name>;
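A minimal sketch, assuming a hypothetical database named pvdb; adding "if not exists" makes the statement safe to re-run:
create database if not exists pvdb;
use pvdb;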
3. Creating tables
Internal (managed) table: the table directory is laid out according to Hive's conventions, under the Hive warehouse directory /user/hive/warehouse.
create table t_pv_log(ip string, url string, access_time string) row format delimited fields terminated by ',';
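To confirm where Hive placed the table directory, describe the table; the Location field in the output should point under /user/hive/warehouse:
describe formatted t_pv_log;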
External table: the table directory is specified by the user.
Create the directory on HDFS:
hadoop fs -mkdir -p /pvlog/2017-09-16
Prepare test data:
192.168.33.1,http://sina.com/a,2017-09-16 12:52:01
192.168.33.2,http://sina.com/a,2017-09-16 12:51:01
192.168.33.1,http://sina.com/a,2017-09-16 12:50:01
192.168.33.2,http://sina.com/b,2017-09-16 12:49:01
192.168.33.1,http://sina.com/b,2017-09-16 12:48:01
192.168.33.4,http://sina.com/a,2017-09-16 12:47:01
192.168.33.3,http://sina.com/a,2017-09-16 12:46:01
192.168.33.2,http://sina.com/b,2017-09-16 12:45:01
192.168.33.2,http://sina.com/a,2017-09-16 12:44:01
192.168.33.1,http://sina.com/a,2017-09-16 13:43:01
Upload the data to /pvlog/2017-09-16 on HDFS:
hadoop fs -put ./pv.log /pvlog/2017-09-16
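Optionally verify the upload; the listing should show pv.log under the directory:
hadoop fs -ls /pvlog/2017-09-16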
Create the external table:
create external table t_pv_log(ip string, url string, access_time string) row format delimited fields terminated by ',' location '/pvlog/2017-09-16';
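Since the table directory already contains the uploaded file, the external table can be queried immediately; a quick check:
select * from t_pv_log limit 3;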
Difference between internal and external tables:
Dropping an internal table deletes both the table and its data.
Dropping an external table removes only the table metadata; the data files remain on HDFS.
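To see the difference for the external table above, drop it in Hive and then list its directory from the shell; the data file should still be there:
drop table t_pv_log;
hadoop fs -ls /pvlog/2017-09-16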
4. Partitioned tables
The essence of a partitioned table: partition subdirectories are created for the data files inside the table directory, so that at query time the MR job only processes the data in the matching partition subdirectories, narrowing the range of data that must be read.
For example, a website generates page-view records every day. The records belong in a single table, but often we only need to analyze one particular day's records.
In that case the table can be created as a partitioned table, with each day's data loaded into its own partition, as sketched below.
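A sketch of the resulting layout on HDFS, assuming the default warehouse path and per-day files named pv.log.15 and pv.log.16; each partition value becomes a subdirectory of the table directory:
/user/hive/warehouse/t_pv_log/day=20170915/pv.log.15
/user/hive/warehouse/t_pv_log/day=20170916/pv.log.16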
Prepare the data:
192.168.33.1,http://sina.com/a,2017-09-16 12:52:01
192.168.33.2,http://sina.com/a,2017-09-16 12:51:01
192.168.33.1,http://sina.com/a,2017-09-16 12:50:01
192.168.33.2,http://sina.com/b,2017-09-16 12:49:01
192.168.33.1,http://sina.com/b,2017-09-15 12:48:01
192.168.33.4,http://sina.com/a,2017-09-15 12:47:01
192.168.33.3,http://sina.com/a,2017-09-15 12:46:01
192.168.33.2,http://sina.com/b,2017-09-15 12:45:01
192.168.33.2,http://sina.com/a,2017-09-15 12:44:01
192.168.33.1,http://sina.com/a,2017-09-15 13:43:01
Create the partitioned table:
create table t_pv_log(ip string, url string, access_time string) partitioned by(day string) row format delimited fields terminated by ',';
Load the data into the newly created table (each day's file goes into the matching partition):
load data local inpath '/usr/local/hivetest/pv.log.15' into table t_pv_log partition(day='20170915');
load data local inpath '/usr/local/hivetest/pv.log.16' into table t_pv_log partition(day='20170916');
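To verify that both partitions were registered:
show partitions t_pv_log;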
Query data through the partition field:
0: jdbc:hive2://hadoop00:10000> select * from t_pv_log where day='20170916';
+---------------+--------------------+-----------------------+---------------+--+
|  t_pv_log.ip  |    t_pv_log.url    | t_pv_log.access_time  | t_pv_log.day  |
+---------------+--------------------+-----------------------+---------------+--+
| 192.168.33.1  | http://sina.com/a  | 2017-09-16 12:52:01   | 20170916      |
| 192.168.33.2  | http://sina.com/a  | 2017-09-16 12:51:01   | 20170916      |
| 192.168.33.1  | http://sina.com/a  | 2017-09-16 12:50:01   | 20170916      |
| 192.168.33.2  | http://sina.com/b  | 2017-09-16 12:49:01   | 20170916      |
| 192.168.33.1  | http://sina.com/b  | 2017-09-16 12:48:01   | 20170916      |
| 192.168.33.4  | http://sina.com/a  | 2017-09-16 12:47:01   | 20170916      |
| 192.168.33.3  | http://sina.com/a  | 2017-09-16 12:46:01   | 20170916      |
| 192.168.33.2  | http://sina.com/b  | 2017-09-16 12:45:01   | 20170916      |
| 192.168.33.2  | http://sina.com/a  | 2017-09-16 12:44:01   | 20170916      |
| 192.168.33.1  | http://sina.com/a  | 2017-09-16 13:43:01   | 20170916      |
+---------------+--------------------+-----------------------+---------------+--+
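Because the filter is on the partition column, Hive only reads the day=20170916 subdirectory. A minimal sketch of a per-partition aggregation, counting page views per IP for that day:
select ip, count(*) as pv
from t_pv_log
where day='20170916'
group by ip;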
5. Importing files
Method 1: manually place the file into the table directory with an HDFS command.
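For example, assuming the default warehouse location for the t_pv_log table created above:
hadoop fs -put ./pv.log /user/hive/warehouse/t_pv_log/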
Method 2: in Hive's interactive shell, use a Hive command to load local data into the table directory:
load data local inpath '/usr/local/data/' into table `order`;
(order is a reserved word in HiveQL, so the table name must be back-quoted.)
Method 3: use a Hive command to load a data file from HDFS into the table directory:
load data inpath 'access.log' into table t_access partition(day='20170916');
Note the difference between importing a local file and importing an HDFS file:
Importing a local file copies it into the table directory.
Importing an HDFS file moves it into the table directory.
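A quick way to observe this, assuming the files used above:
ls /usr/local/data/          # local source still present after load data local: copy
hadoop fs -ls access.log     # source gone from its original HDFS path after load data: move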