Big Data (Hive: Setup and Basic Usage)

Hive Background and Use Cases

What is Hive?

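Open-sourced by Facebook, originally built to compute statistics over massive volumes of structured log data;
   an ETL (Extraction-Transformation-Loading) tool

A data warehouse built on top of Hadoop;
   computation uses MR (MapReduce), storage uses HDFS

Hive defines a SQL-like query language, HQL;
   similar to SQL, but not identical

Generally used for offline (batch) data processing (using MapReduce);

Can be thought of as a translator from HQL to MR.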
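
To make the "HQL to MR translator" idea more concrete, here is a minimal Python sketch of how a query such as SELECT name, COUNT(*) FROM logs GROUP BY name could be expressed as a map phase, a shuffle, and a reduce phase. It only illustrates the idea and is not Hive's actual execution plan; the table logs, its columns, and all function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical rows of a table logs(name, url).
ROWS = [("alice", "/a"), ("bob", "/b"), ("alice", "/c")]


def map_phase(rows):
    """Emit (group key, 1) per row, like a mapper for COUNT(*) ... GROUP BY name."""
    for name, _url in rows:
        yield name, 1


def shuffle(pairs):
    """Group values by key; the MapReduce framework does this between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Sum the values of each key, producing one output row per group."""
    return {key: sum(values) for key, values in grouped.items()}


def group_count(rows):
    """Run the three phases in order on an in-memory list of rows."""
    return reduce_phase(shuffle(map_phase(rows)))


print(group_count(ROWS))
```

In a real cluster the mappers and reducers run on different nodes and the shuffle moves data over the network; the sketch keeps everything in one process to show only the data flow.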

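Typical Hive use cases

Log analysis
   Counting a website's pv (page views) and uv (unique visitors) over a time period
   Multi-dimensional data analysis
   Most internet companies use Hive for log analysis, including Baidu and Taobao

Other scenarios
   Offline analysis of massive structured data
   Low-cost data analysis (no need to write MR directly)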
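
As a small illustration of the pv/uv statistic mentioned under log analysis, the Python sketch below computes both numbers for a time window. In HQL this would roughly correspond to COUNT(*) for pv and COUNT(DISTINCT user_id) for uv over a filtered log table. The record layout (user_id, url, timestamp) and the sample data are assumptions made for this example.

```python
from datetime import datetime

# Hypothetical access-log records: (user_id, url, timestamp).
LOGS = [
    ("u1", "/index", datetime(2017, 1, 1, 10, 0)),
    ("u2", "/index", datetime(2017, 1, 1, 10, 5)),
    ("u1", "/cart", datetime(2017, 1, 1, 11, 0)),
    ("u3", "/index", datetime(2017, 1, 2, 9, 0)),
]


def pv_uv(logs, start, end):
    """Return (pv, uv) for records with start <= timestamp < end."""
    in_window = [rec for rec in logs if start <= rec[2] < end]
    pv = len(in_window)                                        # every request counts
    uv = len({user_id for user_id, _url, _ts in in_window})    # distinct users only
    return pv, uv


print(pv_uv(LOGS, datetime(2017, 1, 1), datetime(2017, 1, 2)))
```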

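Why use Hive?

Simple and easy to get started
   Provides the SQL-like query language HQL;

Compute and scaling capability designed for very large datasets
   MR as the compute engine, HDFS as the storage system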

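Hive basic architecture

Hive components

User interfaces
   Include the CLI, JDBC/ODBC, and WebUI

Metadata store (metastore)
   Stored by default in the embedded Derby database; in production it is usually replaced with MySQL

Driver
   Interpreter, compiler, optimizer, executor

Hadoop
   Uses MapReduce for computation and HDFS for storage

Hive deployment architecture: lab environment

Data types (the list keeps growing...)

Data definition statements (DDL)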

CREATE [EXTERNAL] TABLE [IF NOT EXISTS] table_name
(col_name data_type, ...)
[PARTITIONED BY (col_name data_type, ...)]
[CLUSTERED BY (col_name, col_name, ...) [SORTED BY
(col_name [ASC|DESC], ...)] INTO num_buckets BUCKETS]
[SKEWED BY (col_name, col_name, ...)]
[ [ROW FORMAT row_format] [STORED AS file_format] ]
[LOCATION hdfs_path]
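
As one concrete reading of the syntax above, the sketch below assembles a CREATE TABLE statement that uses a few of the optional clauses. Python is used only to build and print the string, which would then be run in the Hive CLI. The helper create_table_ddl, the table name page_view, its columns, and the LOCATION path are all hypothetical.

```python
def create_table_ddl(name, columns, partitions=None, delimiter=None,
                     file_format=None, location=None, external=False):
    """Build a Hive CREATE TABLE statement from a subset of the clauses shown above."""
    cols = ", ".join(f"{col} {typ}" for col, typ in columns)
    keyword = "EXTERNAL TABLE" if external else "TABLE"
    parts = [f"CREATE {keyword} IF NOT EXISTS {name} ({cols})"]
    if partitions:
        part_cols = ", ".join(f"{col} {typ}" for col, typ in partitions)
        parts.append(f"PARTITIONED BY ({part_cols})")
    if delimiter:
        parts.append(f"ROW FORMAT DELIMITED FIELDS TERMINATED BY '{delimiter}'")
    if file_format:
        parts.append(f"STORED AS {file_format}")
    if location:
        parts.append(f"LOCATION '{location}'")
    return "\n".join(parts) + ";"


ddl = create_table_ddl(
    "page_view",
    [("user_id", "BIGINT"), ("url", "STRING")],
    partitions=[("dt", "STRING")],
    delimiter=r"\t",                      # raw string: Hive sees the two characters \t
    file_format="TEXTFILE",
    location="/user/hive/external/page_view",
    external=True,
)
print(ddl)
```

Note that the partition column dt appears only in PARTITIONED BY and not in the main column list, which is how partition columns are declared in Hive.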

 

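Setup steps

1: Download
http://archive.apache.org/dist/hive

2: Extract the archive
3: Configure the Hive environment variable
    Add the following to the current user's .bashrc
    export HIVE_HOME=/home/hadoop/bd/apache-hive-2.1.0-bin

4: Configure hive-env.sh in the conf directory under the Hive installation directory
    This file can be created by copying hive-env.sh.template and renaming the copy
    Set the following: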
    # Set HADOOP_HOME to point to a specific hadoop install directory
     HADOOP_HOME=/home/hadoop/bd/hadoop-2.7.3

    # Hive Configuration Directory can be controlled by:
     export HIVE_CONF_DIR=/home/hadoop/bd/apache-hive-2.1.0-bin/conf

    # Folder containing extra libraries required for hive compilation/execution can be controlled by:
     export HIVE_AUX_JARS_PATH=/home/hadoop/bd/apache-hive-2.1.0-bin/lib

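5: Change where Hive writes its log files
    cp hive-log4j2.properties.template hive-log4j2.properties
    Edit the log directory in this file with vi
    property.hive.log.dir = /home/hadoop/bd/apache-hive-2.1.0-bin/logs

6: Start the Hadoop cluster

7: Initialize the default Derby database as Hive's metastore
    You can first run ./schematool --help to see the options of the schematool command
     ./schematool -dbType derby -initSchema
    This command initializes Derby as the metastore database

8: Run the hive command in the bin directory to enter the Hive command line
    ./hive

If no errors appear, Hive has been installed successfully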


1: Create a table
    create table table_name
    Create a table with a specified field delimiter: create table teacher (id int, name string) row format delimited fields terminated by '\t';
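
The teacher table above expects each line of its data file to hold the id and the name separated by a tab. The Python sketch below writes such a file and reads it back by splitting on tabs, which mirrors how a tab-delimited text table is split into columns. The file name teacher.txt and the sample rows are made up for illustration.

```python
import csv
import os
import tempfile

# Hypothetical sample rows matching teacher(id int, name string).
ROWS = [(1, "Zhang"), (2, "Li")]


def write_tsv(path, rows):
    """Write one row per line with tab-separated fields."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        csv.writer(f, delimiter="\t", lineterminator="\n").writerows(rows)


def read_tsv(path):
    """Split each line on tabs and convert the first field to int."""
    with open(path, newline="", encoding="utf-8") as f:
        return [(int(id_), name) for id_, name in csv.reader(f, delimiter="\t")]


path = os.path.join(tempfile.mkdtemp(), "teacher.txt")
write_tsv(path, ROWS)
print(read_tsv(path))
```

A file in this layout can then be loaded from the Hive CLI with a statement of the form LOAD DATA LOCAL INPATH '...' INTO TABLE teacher; using the actual path of the file.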

Part 2: Switch the metastore database to MySQL

1: Copy hive-default.xml.template and name the copy hive-site.xml
    cp hive-default.xml.template hive-site.xml

2: Clear the existing configuration in hive-site.xml
    and add our own settings

    <configuration>
     <property>
        <name>javax.jdo.option.ConnectionURL</name>
        <value>jdbc:mysql://hm02:3306/hive?createDatabaseIfNotExist=true</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionDriverName</name>
        <value>com.mysql.jdbc.Driver</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionUserName</name>
        <value>root</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionPassword</name>
        <value>123</value>
    </property>
  </configuration>
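
Before initializing the metastore it can help to check that the four connection properties are actually present in the file. The following is a minimal Python sketch of such a check, assuming the configuration shown above; the names REQUIRED, load_properties, and missing_properties are hypothetical helpers and not part of Hive.

```python
import xml.etree.ElementTree as ET

# The same settings as in the configuration above, embedded for a self-contained example.
HIVE_SITE = """<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://hm02:3306/hive?createDatabaseIfNotExist=true</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>root</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>123</value>
  </property>
</configuration>"""

REQUIRED = (
    "javax.jdo.option.ConnectionURL",
    "javax.jdo.option.ConnectionDriverName",
    "javax.jdo.option.ConnectionUserName",
    "javax.jdo.option.ConnectionPassword",
)


def load_properties(xml_text):
    """Return a {name: value} dict for every <property> in the configuration."""
    root = ET.fromstring(xml_text)
    return {p.findtext("name"): p.findtext("value") for p in root.iter("property")}


def missing_properties(xml_text):
    """List the required metastore settings that the configuration does not define."""
    props = load_properties(xml_text)
    return [name for name in REQUIRED if name not in props]


print(missing_properties(HIVE_SITE))
```

To check a real file, read its contents and pass the text to missing_properties in place of HIVE_SITE.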

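3: Copy the MySQL driver JAR into the lib directory under the Hive installation directory

4: Grant MySQL privileges and initialize the metastore
    1) If this host and user have already been granted access, there is no need to grant it again; otherwise grant it now (see the Sqoop chapter)
    (grant all privileges on *.* to root@'hostname' identified by 'password')
    Run use mysql first so that the mysql database is selected.

    2) Command to initialize the metastore:
    ./schematool -dbType mysql -initSchema

5: Notes on using MySQL as the metastore database
    1) Information about tables created in Hive is stored in the TBLS table of the metastore database
    2) The column information of those tables is stored in the COLUMNS_V2 table
    3) The HDFS location of those tables is stored in the SDS table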
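
To show how these three tables relate, the sketch below builds a tiny in-memory stand-in with sqlite3 and joins it to list a table's location and columns. This is an illustration under assumptions: the real metastore tables have many more columns, the join keys used here (SD_ID and CD_ID) reflect the schema as I understand it and should be verified against your own metastore, and the sample row values are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Simplified stand-ins for the metastore tables; only the columns used below.
CREATE TABLE TBLS (TBL_ID INTEGER, TBL_NAME TEXT, SD_ID INTEGER);
CREATE TABLE SDS (SD_ID INTEGER, CD_ID INTEGER, LOCATION TEXT);
CREATE TABLE COLUMNS_V2 (CD_ID INTEGER, COLUMN_NAME TEXT, TYPE_NAME TEXT, INTEGER_IDX INTEGER);
INSERT INTO TBLS VALUES (1, 'teacher', 10);
INSERT INTO SDS VALUES (10, 100, 'hdfs://namenode/user/hive/warehouse/teacher');
INSERT INTO COLUMNS_V2 VALUES (100, 'id', 'int', 0);
INSERT INTO COLUMNS_V2 VALUES (100, 'name', 'string', 1);
""")

QUERY = """
SELECT t.TBL_NAME, s.LOCATION, c.COLUMN_NAME, c.TYPE_NAME
FROM TBLS t
JOIN SDS s ON t.SD_ID = s.SD_ID
JOIN COLUMNS_V2 c ON s.CD_ID = c.CD_ID
WHERE t.TBL_NAME = ?
ORDER BY c.INTEGER_IDX
"""


def describe(table_name):
    """Return (table, location, column, type) rows for one Hive table."""
    return conn.execute(QUERY, (table_name,)).fetchall()


for row in describe("teacher"):
    print(row)
```

The same SELECT, with the placeholder replaced by a literal table name, could be tried in the mysql client against the hive database once the metastore has been initialized.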

hive-site.xml

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?><!--
   Licensed to the Apache Software Foundation (ASF) under one or more
   contributor license agreements.  See the NOTICE file distributed with
   this work for additional information regarding copyright ownership.
   The ASF licenses this file to You under the Apache License, Version 2.0
   (the "License"); you may not use this file except in compliance with
   the License.  You may obtain a copy of the License at

       http://www.apache.org/licenses/LICENSE-2.0

   Unless required by applicable law or agreed to in writing, software
   distributed under the License is distributed on an "AS IS" BASIS,
   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
   See the License for the specific language governing permissions and
   limitations under the License.
--><configuration>
 	<property>
		<name>javax.jdo.option.ConnectionURL</name>
		<value>jdbc:mysql://hm02:3306/hive?createDatabaseIfNotExist=true</value>
	</property>
	<property>
		<name>javax.jdo.option.ConnectionDriverName</name>
		<value>com.mysql.jdbc.Driver</value>
	</property>
	<property>
		<name>javax.jdo.option.ConnectionUserName</name>
		<value>root</value>
	</property>
	<property>
		<name>javax.jdo.option.ConnectionPassword</name>
		<value>123</value>
	</property>
  </configuration>

hive-site-back.xml

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?><!--
   Licensed to the Apache Software Foundation (ASF) under one or more
   contributor license agreements.  See the NOTICE file distributed with
   this work for additional information regarding copyright ownership.
   The ASF licenses this file to You under the Apache License, Version 2.0
   (the "License"); you may not use this file except in compliance with
   the License.  You may obtain a copy of the License at

       http://www.apache.org/licenses/LICENSE-2.0

   Unless required by applicable law or agreed to in writing, software
   distributed under the License is distributed on an "AS IS" BASIS,
   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
   See the License for the specific language governing permissions and
   limitations under the License.
--><configuration>
	<property>
		<name>javax.jdo.option.ConnectionURL</name>
		<value>jdbc:mysql://hm:3306/hive?createDatabaseIfNotExist=true</value>
	</property>
	<property>
		<name>javax.jdo.option.ConnectionDriverName</name>
		<value>com.mysql.jdbc.Driver</value>
	</property>
	<property>
		<name>javax.jdo.option.ConnectionUserName</name>
		<value>root</value>
	</property>
	<property>
		<name>javax.jdo.option.ConnectionPassword</name>
		<value>123</value>
	</property>
  </configuration>