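What is Hive?
Open-sourced by Facebook; originally built to compute statistics over massive volumes of structured log data;
An ETL (Extraction-Transformation-Loading) tool;
A data warehouse built on top of Hadoop;
Uses MapReduce (MR) for computation and HDFS for storage;
Hive defines an SQL-like query language, HQL;
HQL is similar to SQL, but not exactly the same;
Typically used for offline (batch) data processing, executed as MapReduce jobs;
Can be thought of as a translator from HQL to MR.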
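To make the "HQL-to-MR translator" idea concrete, here is a toy Python sketch of how a query such as `SELECT key, COUNT(*) FROM t GROUP BY key` maps onto the map, shuffle and reduce phases. It is not Hive's actual query planner. The function names and sample data are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(rows: Iterable[str]) -> Iterator[Tuple[str, int]]:
    # Map: emit (group-by key, 1) for every input row
    for row in rows:
        yield row, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    # Shuffle: bring together all values that share the same key
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_phase(groups: Dict[str, List[int]]) -> Dict[str, int]:
    # Reduce: aggregate each key's values, i.e. COUNT(*) per group
    return {key: sum(values) for key, values in groups.items()}


def group_by_count(rows: Iterable[str]) -> Dict[str, int]:
    # Roughly what "SELECT key, COUNT(*) FROM t GROUP BY key" becomes as one MR job
    return reduce_phase(shuffle_phase(map_phase(rows)))


if __name__ == "__main__":
    print(group_by_count(["a", "b", "a"]))  # {'a': 2, 'b': 1}
```

In a real cluster the map and reduce functions run in parallel on many nodes, and the shuffle moves data over the network. The single-process version only shows the data flow.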
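Typical Hive use cases
Log analysis
Counting a website's PV (page views) and UV (unique visitors) over a time period
Multi-dimensional data analysis
Most Internet companies, including Baidu and Taobao, use Hive for log analysis
Other scenarios
Offline analysis of massive structured data
Low-cost data analysis (without writing MR code directly)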
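As an illustration of the PV/UV statistic mentioned above, the sketch below computes both values for a time window. It assumes a hypothetical log record of (timestamp, user_id, url) and identifies a visitor by `user_id`. Real sites may identify visitors by cookie or device instead. The table and column names in the docstring are also hypothetical.

```python
from datetime import datetime
from typing import Iterable, Set, Tuple

# Hypothetical log record layout: (timestamp, user_id, url)
LogRecord = Tuple[datetime, str, str]


def pv_uv(logs: Iterable[LogRecord], start: datetime, end: datetime) -> Tuple[int, int]:
    """Return (pv, uv) for records with start <= timestamp < end.

    Roughly what an HQL query like this computes:
        SELECT COUNT(*) AS pv, COUNT(DISTINCT user_id) AS uv
        FROM access_log
        WHERE ts >= '<start>' AND ts < '<end>';
    """
    pv = 0
    visitors: Set[str] = set()
    for ts, user_id, _url in logs:
        if start <= ts < end:
            pv += 1                # every request counts toward PV
            visitors.add(user_id)  # each distinct visitor counts once toward UV
    return pv, len(visitors)


if __name__ == "__main__":
    sample = [
        (datetime(2024, 1, 1, 10), "u1", "/"),
        (datetime(2024, 1, 1, 11), "u1", "/a"),
        (datetime(2024, 1, 1, 12), "u2", "/"),
        (datetime(2024, 1, 2, 9), "u3", "/"),
    ]
    print(pv_uv(sample, datetime(2024, 1, 1), datetime(2024, 1, 2)))  # (3, 2)
```

With Hive, the same logic runs over billions of log lines stored in HDFS, without anyone writing the MapReduce job by hand.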
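Why use Hive?
Simple and easy to get started
Provides HQL, an SQL-like query language;
Compute and scaling capacity designed for very large datasets
MR as the compute engine, HDFS as the storage engine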
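Hive components
User interfaces
CLI, JDBC/ODBC, and WebUI
Metadata storage (metastore)
Stored by default in the bundled Derby database; in production it is usually switched to MySQL
Driver
Interpreter, compiler, optimizer, executor
Hadoop
MapReduce for computation, HDFS for storage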
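The Driver's four stages can be pictured as a pipeline. The toy Python sketch below is purely illustrative and is not Hive's real implementation. It supports only `SELECT c1, c2 FROM t`. The "optimizer" merely drops duplicate projected columns, and the "executor" scans an in-memory dict instead of submitting MapReduce jobs. All function names are hypothetical.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]
Plan = Dict[str, Any]


def interpret(hql: str) -> List[str]:
    # "Interpreter": turn the HQL text into tokens (a real parser builds an AST)
    return hql.replace(",", " , ").split()


def compile_plan(tokens: List[str]) -> Plan:
    # "Compiler": map the tokens of SELECT c1, c2 FROM t onto a tiny logical plan
    upper = [t.upper() for t in tokens]
    if not upper or upper[0] != "SELECT" or "FROM" not in upper:
        raise ValueError("toy compiler only supports SELECT ... FROM ...")
    from_idx = upper.index("FROM")
    if from_idx + 1 >= len(tokens):
        raise ValueError("missing table name after FROM")
    columns = [t for t in tokens[1:from_idx] if t != ","]
    return {"table": tokens[from_idx + 1], "columns": columns}


def optimize(plan: Plan) -> Plan:
    # "Optimizer": a stand-in rewrite that removes duplicate projected columns
    return {**plan, "columns": list(dict.fromkeys(plan["columns"]))}


def execute(plan: Plan, tables: Dict[str, List[Row]]) -> List[Row]:
    # "Executor": Hive would launch MapReduce jobs here; this just scans a dict
    return [{c: row[c] for c in plan["columns"]} for row in tables[plan["table"]]]


def run(hql: str, tables: Dict[str, List[Row]]) -> List[Row]:
    # Chain the four stages in the order the Driver applies them
    return execute(optimize(compile_plan(interpret(hql))), tables)


if __name__ == "__main__":
    data = {"teacher": [{"id": 1, "name": "Li"}, {"id": 2, "name": "Wang"}]}
    print(run("SELECT name, name FROM teacher", data))
```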
Hive deployment architecture: lab environment
Data types (the list keeps growing...)
Data definition language (DDL)
CREATE [EXTERNAL] TABLE [IF NOT EXISTS] table_name
  (col_name data_type, ...)
  [PARTITIONED BY (col_name data_type, ...)]
  [CLUSTERED BY (col_name, col_name, ...) [SORTED BY (col_name [ASC|DESC], ...)] INTO num_buckets BUCKETS]
  [SKEWED BY (col_name, col_name, ...) ON ((col_value, col_value, ...), ...)]
  [ [ROW FORMAT row_format] [STORED AS file_format] ]
  [LOCATION hdfs_path]
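To show how the optional clauses of this grammar compose, here is a small Python sketch that assembles a CREATE TABLE statement string. It covers only a subset of the grammar (no CLUSTERED BY, SORTED BY or SKEWED BY) and does no identifier quoting or validation. The function name `create_table_ddl` and its parameters are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

Column = Tuple[str, str]  # (col_name, data_type)


def create_table_ddl(
    table_name: str,
    columns: Sequence[Column],
    external: bool = False,
    if_not_exists: bool = True,
    partitioned_by: Sequence[Column] = (),
    field_delimiter: Optional[str] = None,
    stored_as: Optional[str] = None,
    location: Optional[str] = None,
) -> str:
    parts: List[str] = ["CREATE"]
    if external:
        parts.append("EXTERNAL")  # data stays in place when the table is dropped
    parts.append("TABLE")
    if if_not_exists:
        parts.append("IF NOT EXISTS")
    cols = ", ".join(f"{name} {dtype}" for name, dtype in columns)
    parts.append(f"{table_name} ({cols})")
    if partitioned_by:
        pcols = ", ".join(f"{name} {dtype}" for name, dtype in partitioned_by)
        parts.append(f"PARTITIONED BY ({pcols})")
    if field_delimiter is not None:
        # One concrete row_format: delimited text with a custom field separator
        parts.append(f"ROW FORMAT DELIMITED FIELDS TERMINATED BY '{field_delimiter}'")
    if stored_as:
        parts.append(f"STORED AS {stored_as}")
    if location:
        parts.append(f"LOCATION '{location}'")
    return " ".join(parts) + ";"


if __name__ == "__main__":
    print(create_table_ddl("teacher", [("id", "int"), ("name", "string")],
                           if_not_exists=False, field_delimiter="\\t"))
```

Passing `"\\t"` emits the two characters `\t` into the HQL text. Hive then interprets them as the tab delimiter, matching the `teacher` example later in these notes.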
1: Download URL
http://archive.apache.org/dist/hive
2: Extract the archive
3: Configure Hive's environment variables
Add the following to the current user's .bashrc:
export HIVE_HOME=/home/hadoop/bd/apache-hive-2.1.0-bin
4: Configure hive-env.sh in the conf directory of the Hive installation
The file can be created by copying hive-env.sh.template and renaming the copy
Set the following:
# Set HADOOP_HOME to point to a specific hadoop install directory
HADOOP_HOME=/home/hadoop/bd/hadoop-2.7.3
# Hive Configuration Directory can be controlled by:
export HIVE_CONF_DIR=/home/hadoop/bd/apache-hive-2.1.0-bin/conf
# Folder containing extra libraries required for hive compilation/execution can be controlled by:
export HIVE_AUX_JARS_PATH=/home/hadoop/bd/apache-hive-2.1.0-bin/lib
5: Change the directory where Hive writes its log files
cp hive-log4j2.properties.template hive-log4j2.properties
Edit the log directory with vi:
property.hive.log.dir = /home/hadoop/bd/apache-hive-2.1.0-bin/logs
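If you would rather script this edit than make it in vi, the Python sketch below rewrites the `property.hive.log.dir` line of a properties file, or appends the line if the key is missing. It is only a sketch. The function names are hypothetical, and it does not preserve `\r\n` line endings on the replaced line.

```python
import re
from pathlib import Path

_LOG_DIR_LINE = re.compile(r"^property\.hive\.log\.dir\s*=.*$", re.MULTILINE)


def set_hive_log_dir(properties_text: str, log_dir: str) -> str:
    """Return properties_text with property.hive.log.dir set to log_dir."""
    new_line = f"property.hive.log.dir = {log_dir}"
    if _LOG_DIR_LINE.search(properties_text):
        # A function replacement avoids re-interpreting backslashes in the path
        return _LOG_DIR_LINE.sub(lambda _m: new_line, properties_text, count=1)
    # Key not present: append it on its own line
    suffix = "" if properties_text.endswith("\n") or not properties_text else "\n"
    return properties_text + suffix + new_line + "\n"


def update_properties_file(path: Path, log_dir: str) -> None:
    # Read, rewrite, and save the file in place
    path.write_text(set_hive_log_dir(path.read_text(encoding="utf-8"), log_dir),
                    encoding="utf-8")
```

For example, `update_properties_file(Path("hive-log4j2.properties"), "/home/hadoop/bd/apache-hive-2.1.0-bin/logs")` applies the same change as the vi edit above when run from the conf directory.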
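6: Start the Hadoop cluster
7: Initialize the default Derby database as Hive's metastore
You can first run ./schematool --help to see schematool's options
Run ./schematool -dbType derby -initSchema to initialize Derby as the metastore database
8: Run the hive command in the bin directory to enter the Hive CLI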
./hive
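If no errors appear, Hive has been installed successfully
1: Create a table
create table table_name
Create a table with a specified field delimiter: create table teacher (id int, name string) row format delimited fields terminated by '\t';
II: Switch the metastore database to MySQL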
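To show what the `fields terminated by '\t'` clause means for the data files, the Python sketch below splits one line of a text file into a `teacher` row. As we understand Hive's default delimited-text handling, the sketch approximates it as follows:
- a value that cannot be read as `int` becomes NULL;
- missing trailing fields become NULL;
- extra fields are ignored.

Here `None` stands in for NULL, and the function names are hypothetical.

```python
from typing import Iterable, List, Optional, Tuple

# One row of the teacher table: (id int, name string); None plays the role of NULL
TeacherRow = Tuple[Optional[int], Optional[str]]


def parse_teacher_line(line: str, delimiter: str = "\t") -> TeacherRow:
    fields: List[Optional[str]] = list(line.rstrip("\r\n").split(delimiter))
    # Missing trailing fields are read as NULL
    while len(fields) < 2:
        fields.append(None)
    raw_id, name = fields[0], fields[1]  # fields beyond the 2 declared columns are ignored
    try:
        teacher_id = int(raw_id) if raw_id is not None else None
    except ValueError:
        teacher_id = None  # a value that is not a valid int becomes NULL, not an error
    return teacher_id, name


def parse_teacher_file(lines: Iterable[str]) -> List[TeacherRow]:
    return [parse_teacher_line(line) for line in lines]


if __name__ == "__main__":
    print(parse_teacher_file(["1\tLi\n", "abc\tWang\n", "3\n"]))
```

Hive applies the schema when the data is read ("schema on read"), so a malformed line shows up as NULLs in query results rather than being rejected at load time.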
1: Copy hive-default.xml.template and rename the copy to hive-site.xml
cp hive-default.xml.template hive-site.xml
2: Remove the existing configuration from hive-site.xml
and add our own settings:
<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://hm02:3306/hive?createDatabaseIfNotExist=true</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>root</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>123</value>
  </property>
</configuration>
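If you prefer to generate this file rather than edit it by hand, the Python sketch below renders the same four metastore properties with the standard-library `xml.etree.ElementTree`. The values are the lab settings from this step (host hm02, user root, password 123). `build_hive_site` is a hypothetical helper, and `ET.indent` requires Python 3.9 or later.

```python
import xml.etree.ElementTree as ET
from typing import Dict

# Metastore connection settings used in this lab setup
METASTORE_PROPS: Dict[str, str] = {
    "javax.jdo.option.ConnectionURL": "jdbc:mysql://hm02:3306/hive?createDatabaseIfNotExist=true",
    "javax.jdo.option.ConnectionDriverName": "com.mysql.jdbc.Driver",
    "javax.jdo.option.ConnectionUserName": "root",
    "javax.jdo.option.ConnectionPassword": "123",
}


def build_hive_site(props: Dict[str, str]) -> str:
    """Render a Hadoop-style <configuration> document from name/value pairs."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value  # characters such as & get escaped
    ET.indent(root)  # pretty-print with two-space indentation (Python 3.9+)
    body = ET.tostring(root, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8" standalone="no"?>\n' + body + "\n"


if __name__ == "__main__":
    print(build_hive_site(METASTORE_PROPS), end="")
```

Letting the XML library do the escaping matters once the JDBC URL carries more than one parameter. A raw `&` between parameters must appear as `&amp;` in hive-site.xml, and hand-edited files often get this wrong.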
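3: Copy the MySQL JDBC driver JAR into the lib directory of the Hive installation
4: Grant MySQL privileges and initialize the metastore
1) If privileges have already been granted for this host and user, you can skip this; otherwise grant them (see the Sqoop chapter):
(grant all privileges on *.* to root@'hostname' identified by 'password')
Run this after switching to the mysql database (use mysql).
2) Command to initialize the metastore: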
./schematool -dbType mysql -initSchema
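5: Notes on using MySQL as the metastore database
1) Information about tables created in Hive is stored in the metastore's TBLS table
2) Each table's column information is stored in the metastore's COLUMNS_V2 table
3) Each table's location on HDFS is stored in the metastore's SDS table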
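To see how these three tables relate, the sketch below recreates a heavily simplified subset of them in an in-memory SQLite database. The real metastore tables have many more columns. The sketch joins TBLS to SDS on SD_ID and SDS to COLUMNS_V2 on CD_ID to recover a table's columns and HDFS location. The sample rows, including the `hdfs://namenode:9000` location, are hypothetical. Against the real MySQL `hive` database the joins would be the same, but a MySQL client library would use its own placeholder style (for example `%s` instead of `?`).

```python
import sqlite3
from typing import List, Tuple


def demo_metastore() -> sqlite3.Connection:
    """Build an in-memory, heavily simplified copy of three metastore tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE TBLS (TBL_ID INTEGER, TBL_NAME TEXT, SD_ID INTEGER);
        CREATE TABLE SDS (SD_ID INTEGER, CD_ID INTEGER, LOCATION TEXT);
        CREATE TABLE COLUMNS_V2 (CD_ID INTEGER, COLUMN_NAME TEXT,
                                 TYPE_NAME TEXT, INTEGER_IDX INTEGER);
        -- hypothetical sample rows for the teacher table
        INSERT INTO TBLS VALUES (1, 'teacher', 10);
        INSERT INTO SDS VALUES (10, 100, 'hdfs://namenode:9000/user/hive/warehouse/teacher');
        INSERT INTO COLUMNS_V2 VALUES (100, 'name', 'string', 1);
        INSERT INTO COLUMNS_V2 VALUES (100, 'id', 'int', 0);
        """
    )
    return conn


def describe_table(conn: sqlite3.Connection, table_name: str) -> Tuple[str, List[Tuple[str, str]]]:
    """Return (HDFS location, [(column, type), ...]) for a Hive table."""
    row = conn.execute(
        "SELECT s.LOCATION FROM TBLS t JOIN SDS s ON t.SD_ID = s.SD_ID "
        "WHERE t.TBL_NAME = ?",
        (table_name,),
    ).fetchone()
    if row is None:
        raise KeyError(table_name)
    columns = conn.execute(
        "SELECT c.COLUMN_NAME, c.TYPE_NAME FROM TBLS t "
        "JOIN SDS s ON t.SD_ID = s.SD_ID "
        "JOIN COLUMNS_V2 c ON s.CD_ID = c.CD_ID "
        "WHERE t.TBL_NAME = ? ORDER BY c.INTEGER_IDX",  # INTEGER_IDX keeps declared column order
        (table_name,),
    ).fetchall()
    return row[0], columns


if __name__ == "__main__":
    print(describe_table(demo_metastore(), "teacher"))
```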
hive-site.xml
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
   Licensed to the Apache Software Foundation (ASF) under one or more
   contributor license agreements. See the NOTICE file distributed with
   this work for additional information regarding copyright ownership.
   The ASF licenses this file to You under the Apache License, Version 2.0
   (the "License"); you may not use this file except in compliance with
   the License. You may obtain a copy of the License at
       http://www.apache.org/licenses/LICENSE-2.0
   Unless required by applicable law or agreed to in writing, software
   distributed under the License is distributed on an "AS IS" BASIS,
   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
   See the License for the specific language governing permissions and
   limitations under the License.
-->
<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://hm02:3306/hive?createDatabaseIfNotExist=true</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>root</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>123</value>
  </property>
</configuration>
hive-site-back.xml
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
   Licensed to the Apache Software Foundation (ASF) under one or more
   contributor license agreements. See the NOTICE file distributed with
   this work for additional information regarding copyright ownership.
   The ASF licenses this file to You under the Apache License, Version 2.0
   (the "License"); you may not use this file except in compliance with
   the License. You may obtain a copy of the License at
       http://www.apache.org/licenses/LICENSE-2.0
   Unless required by applicable law or agreed to in writing, software
   distributed under the License is distributed on an "AS IS" BASIS,
   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
   See the License for the specific language governing permissions and
   limitations under the License.
-->
<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://hm:3306/hive?createDatabaseIfNotExist=true</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>root</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>123</value>
  </property>
</configuration>