The Apache Hive™ data warehouse software facilitates reading, writing, and managing large datasets residing in distributed storage using SQL. Structure can be projected onto data already in storage. A command line tool and JDBC driver are provided to connect users to Hive.

-- In other words, Apache Hive lets you use SQL to read, write, and manage large datasets on a distributed system.
Hive is a data-management framework that sits on top of a Hadoop cluster, so before setting up Hive we first need to set up Hadoop and MySQL.
MySQL is mainly used to store Hive's metadata; the data blocks themselves live in Hadoop (HDFS). Hive itself simply accepts the user's SQL statements, maps each one into a MapReduce job, and lets YARN schedule and execute it. Process-wise this design is not especially efficient, but the benefit is that Hadoop development no longer requires writing Java code: Hive provides a SQL-like language (HiveQL), so you work with Hadoop much as you would with a relational database.
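As a quick illustration (a sketch, not from the original notes; the table name `docs` is hypothetical, and it assumes Hive is already installed as described below), a single HiveQL statement run from the shell is compiled into a MapReduce job that YARN executes:

```bash
# One HiveQL statement; Hive compiles it into a MapReduce job and
# submits it to YARN -- no Java code required.
hive -e "SELECT word, COUNT(*) AS cnt FROM docs GROUP BY word;"
```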
For simplicity, the Hadoop cluster here is built on virtual machines.
There is nothing special about installing the Linux OS itself, so that step is skipped.
Node plan:
| Node | NN | SNN | DN | MR | YARN |
|------|----|-----|----|----|------|
| hd1  | Y  |     |    | Y  | Y    |
| hd2  |    | Y   | Y  |    |      |
| hd3  |    |     | Y  |    |      |
| hd4  |    |     | Y  |    |      |
A few concepts first:
NameNode (NN), main functions: it manages the HDFS namespace and keeps the filesystem metadata in memory.

The metadata is also persisted to a disk file named fsimage. Block locations are not stored in fsimage; to keep lookups fast, block location information lives only in memory. When a piece of metadata changes, HDFS does not immediately rewrite fsimage; it first records the operation in the edits log and merges it into fsimage once certain conditions are met. You can think of it this way: fsimage is a periodic full backup of the metadata, and edits is a real-time incremental record of changes; full backup plus incremental backup together give a complete copy of the metadata.
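Once the cluster is formatted and running (later in this post), you can inspect these files yourself with Hadoop's offline viewers; a sketch, where the fsimage/edits file names depend on your actual transaction IDs:

```bash
# Dump an fsimage checkpoint to XML with the Offline Image Viewer.
hdfs oiv -p XML -i fsimage_0000000000000000000 -o fsimage.xml

# Dump an edits log segment with the Offline Edits Viewer.
hdfs oev -p xml -i edits_0000000000000000001-0000000000000000042 -o edits.xml
```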
SecondaryNameNode (SNN): it is not a full backup of the NN, only a backup of part of the NN's data (though it can serve as one). Its main job is to assist the NN in merging the edits log, which shortens NN startup time. Merging fsimage with edits deletes the old files and involves heavy IO, so the SNN takes this work off the NN: after the merge, a new fsimage is generated on the SNN and pushed back to the NN. This cycle then repeats.
YARN: its introduction allows multiple computing frameworks to run on a single cluster, with each application getting its own ApplicationMaster. Many computing frameworks can now run on YARN.
Basic functions of MapReduce on YARN: each MapReduce job corresponds to one MRAppMaster, which schedules and monitors that job's tasks; for fault tolerance, YARN can restart a failed MRAppMaster and let the job recover.
DataNode (DN): on startup a DN reports its block information to the NN, then stays in sync by sending the NN a heartbeat every 3 seconds. If the NN receives no heartbeat from a DN for 10 minutes, it considers that DN lost and re-replicates its blocks to other DNs.
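Once the cluster is running (see the startup steps below), you can watch this from the NN's point of view; a quick check, run as the hadoop user:

```bash
# Show every DataNode the NameNode currently knows about, including
# last-heartbeat time, capacity, and live/dead status.
hdfs dfsadmin -report
```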
For convenience, the nodes here are created by cloning the VM; the one thing to watch out for is the NIC configuration, see https://my.oschina.net/u/3862440/blog/2250996.
You can also refer to http://hadoop.apache.org/docs/r2.7.3/hadoop-project-dist/hadoop-common/ClusterSetup.html.
Set the hostname on each node (shown here for hd2):

```bash
[root@hd1 ~]# vi /etc/sysconfig/network
NETWORKING=yes
HOSTNAME=hd2.localdomain
```
Create the hadoop and mysql users and groups:

```bash
groupadd -g 10010 hadoop
useradd -u 10010 -g hadoop -d /home/hadoop hadoop
groupadd -g 10012 mysql
useradd -u 10012 -g mysql -d /home/mysql mysql
```
Install the JDK:

```bash
tar -xvf jdk-8u11-linux-x64.tar -C /usr/
```
Set the Java environment variables (in /etc/profile or the user's profile):

```bash
JAVA_HOME=/usr/java/jdk1.8.0_11
export JAVA_HOME
export JRE_HOME=/usr/java/jdk1.8.0_11
export CLASSPATH=.:$CLASSPATH:$JAVA_HOME/lib:$JRE_HOME/lib
export PATH=$PATH:$JAVA_HOME/bin:$JRE_HOME/bin
```
Configure name resolution on every node:

```bash
[root@hd1 ~]# vi /etc/hosts
127.0.0.1     localhost localhost.localdomain localhost4 localhost4.localdomain4
::1           localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.83.11 hd1
192.168.83.12 hd2
192.168.83.13 hd3
192.168.83.14 hd4
```
Generate SSH keys for the hadoop user:

```bash
ssh-keygen -t rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 0600 ~/.ssh/authorized_keys
```
hd1 needs the public keys of hd2, hd3, and hd4 in its authorized_keys, and those nodes need hd1's public key; that way hd1 can SSH to the other nodes without a password, and they can SSH back to hd1 without one. One way to exchange the keys is shown below.
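A sketch of distributing the keys; it assumes OpenSSH's ssh-copy-id is available and will prompt once per node for the hadoop password:

```bash
# Run on hd1 as the hadoop user: push hd1's public key to the other nodes.
for host in hd2 hd3 hd4; do
  ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@"$host"
done

# Then run the same loop (with host hd1) on hd2, hd3, and hd4 so that
# each of them can also reach hd1 without a password.
```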
Software used:

- hadoop-2.6.0-cdh5.7.0
- hive-1.1.0-cdh5.7.0.tar
- mysql-5.7.20-linux-glibc2.12-x86_64.tar
Configure JAVA_HOME for Hadoop:

```bash
vi etc/hadoop/hadoop-env.sh
export JAVA_HOME=/usr/java/jdk1.8.0_11
```
core-site.xml:

```xml
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://hd1:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>file:/usr/hadoop/hadoop-2.7.1/tmp</value>
    <description>A base for other temporary directories.</description>
  </property>
</configuration>
```
hdfs-site.xml:

```xml
<configuration>
  <property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>hd2:50090</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/usr/hadoop/hadoop-2.7.1/tmp/dfs/name</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/usr/hadoop/hadoop-2.7.1/tmp/dfs/data</value>
  </property>
</configuration>
```
mapred-site.xml:

```xml
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>hd1:10020</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>hd1:19888</value>
  </property>
</configuration>
```
yarn-site.xml:

```xml
<configuration>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>hd1</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
```
List the DataNode hosts in the slaves file:

```bash
[hadoop@hd1 hadoop]$ vi slaves
hd2
hd3
hd4
```
Copy the hadoop-2.7.1 directory from hd1 to hd2, hd3, and hd4:
```bash
scp -r /home/hadoop/hadoop-2.7.1 hd2:/home/hadoop/
scp -r /home/hadoop/hadoop-2.7.1 hd3:/home/hadoop/
scp -r /home/hadoop/hadoop-2.7.1 hd4:/home/hadoop/
```
Configure the Hadoop environment variables (in ~/.bash_profile) to make later operations easier:
```bash
export HADOOP_INSTALL=/home/hadoop/hadoop-2.7.1
export PATH=$PATH:$HADOOP_INSTALL/bin:$HADOOP_INSTALL/sbin
```
Copy the profile to hd2, hd3, and hd4:
```bash
scp ~/.bash_profile hd2:~/
scp ~/.bash_profile hd3:~/
scp ~/.bash_profile hd4:~/
```
Apply the hadoop user's new environment on each node:

```bash
source ~/.bash_profile
```
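To confirm the PATH took effect, you can check that the hadoop binary resolves (a quick sanity check, not in the original notes):

```bash
# Both commands should succeed on every node if the profile is in effect.
which hadoop
hadoop version
```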
Startup:
As we know, HDFS is a distributed filesystem with its own on-disk structures, layered on top of the nodes' local ext3/ext4 filesystems, so it must be formatted before it is first started. Format the NameNode:
```bash
hadoop namenode -format
```
```
java.net.UnknownHostException: hd1.localdomain: hd1.localdomain: unknown error
        at java.net.InetAddress.getLocalHost(InetAddress.java:1484)
        at org.apache.hadoop.net.DNS.resolveLocalHostname(DNS.java:264)
        at org.apache.hadoop.net.DNS.<clinit>(DNS.java:57)
        at org.apache.hadoop.hdfs.server.namenode.NNStorage.newBlockPoolID(NNStorage.java:966)
        at org.apache.hadoop.hdfs.server.namenode.NNStorage.newNamespaceInfo(NNStorage.java:575)
        at org.apache.hadoop.hdfs.server.namenode.FSImage.format(FSImage.java:157)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.format(NameNode.java:991)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1429)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1554)
Caused by: java.net.UnknownHostException: hd1.localdomain: unknown error
        at java.net.Inet4AddressImpl.lookupAllHostAddr(Native Method)
        at java.net.InetAddress$2.lookupAllHostAddr(InetAddress.java:907)
        at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1302)
        at java.net.InetAddress.getLocalHost(InetAddress.java:1479)
        ... 8 more
18/10/23 22:49:36 WARN net.DNS: Unable to determine address of the host-falling back to "localhost" address
java.net.UnknownHostException: hd1.localdomain: hd1.localdomain: unknown error
        at java.net.InetAddress.getLocalHost(InetAddress.java:1484)
        at org.apache.hadoop.net.DNS.resolveLocalHostIPAddress(DNS.java:287)
        at org.apache.hadoop.net.DNS.<clinit>(DNS.java:58)
        at org.apache.hadoop.hdfs.server.namenode.NNStorage.newBlockPoolID(NNStorage.java:966)
        at org.apache.hadoop.hdfs.server.namenode.NNStorage.newNamespaceInfo(NNStorage.java:575)
        at org.apache.hadoop.hdfs.server.namenode.FSImage.format(FSImage.java:157)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.format(NameNode.java:991)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1429)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1554)
Caused by: java.net.UnknownHostException: hd1.localdomain: unknown error
        at java.net.Inet4AddressImpl.lookupAllHostAddr(Native Method)
        at java.net.InetAddress$2.lookupAllHostAddr(InetAddress.java:907)
        at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1302)
        at java.net.InetAddress.getLocalHost(InetAddress.java:1479)
        ... 8 more
18/10/23 22:49:36 INFO namenode.FSImage: Allocated new BlockPoolId: BP-520690254-127.0.0.1-1540306176095
18/10/23 22:49:36 INFO common.Storage: Storage directory /usr/hadoop/hadoop-2.7.1/tmp/dfs/name has been successfully formatted.
18/10/23 22:49:36 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0
18/10/23 22:49:36 INFO util.ExitUtil: Exiting with status 0
18/10/23 22:49:36 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at java.net.UnknownHostException: hd1.localdomain: hd1.localdomain: unknown error
************************************************************/
```
As you can see, hd1.localdomain cannot be resolved, so change the hostname to plain hd1, without the domain suffix.
All nodes need this change.
```bash
vi /etc/sysconfig/network
NETWORKING=yes
HOSTNAME=hd1
```
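After the change takes effect, it is worth verifying that the name now resolves through /etc/hosts (a quick check, not in the original notes):

```bash
hostname          # should print just: hd1
getent hosts hd1  # should print: 192.168.83.11  hd1
```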
Format again:
```
[hadoop@hd1 ~]$ hadoop namenode -format
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.
18/10/23 22:54:03 INFO util.GSet: capacity = 2^15 = 32768 entries
Re-format filesystem in Storage Directory /usr/hadoop/hadoop-2.7.1/tmp/dfs/name ? (Y or N) Y
18/10/23 22:54:06 INFO namenode.FSImage: Allocated new BlockPoolId: BP-1243395970-192.168.83.11-1540306446216
18/10/23 22:54:06 INFO common.Storage: Storage directory /usr/hadoop/hadoop-2.7.1/tmp/dfs/name has been successfully formatted.
18/10/23 22:54:06 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0
18/10/23 22:54:06 INFO util.ExitUtil: Exiting with status 0
18/10/23 22:54:06 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at hd1/192.168.83.11
************************************************************/
```
Success.
Start HDFS:
```
[hadoop@hd1 hadoop]$ start-dfs.sh
18/10/23 23:01:27 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Starting namenodes on [hd1]
hd1: starting namenode, logging to /home/hadoop/hadoop-2.7.1/logs/hadoop-hadoop-namenode-hd1.out
hd4: starting datanode, logging to /home/hadoop/hadoop-2.7.1/logs/hadoop-hadoop-datanode-hd4.out
hd3: starting datanode, logging to /home/hadoop/hadoop-2.7.1/logs/hadoop-hadoop-datanode-hd3.out
hd2: starting datanode, logging to /home/hadoop/hadoop-2.7.1/logs/hadoop-hadoop-datanode-hd2.out
Starting secondary namenodes [hd2]
hd2: starting secondarynamenode, logging to /home/hadoop/hadoop-2.7.1/logs/hadoop-hadoop-secondarynamenode-hd2.out
18/10/23 23:01:47 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
```
As the output shows, the NN starts on the first node, the SNN on the second, and the DNs on hd2, hd3, and hd4, exactly as planned.
Start YARN:
```
[hadoop@hd1 hadoop]$ start-yarn.sh
starting yarn daemons
starting resourcemanager, logging to /home/hadoop/hadoop-2.7.1/logs/yarn-hadoop-resourcemanager-hd1.out
hd2: starting nodemanager, logging to /home/hadoop/hadoop-2.7.1/logs/yarn-hadoop-nodemanager-hd2.out
hd4: starting nodemanager, logging to /home/hadoop/hadoop-2.7.1/logs/yarn-hadoop-nodemanager-hd4.out
hd3: starting nodemanager, logging to /home/hadoop/hadoop-2.7.1/logs/yarn-hadoop-nodemanager-hd3.out
```
You can see the ResourceManager starts on hd1 and NodeManagers on hd2, hd3, and hd4.
YARN is a resource-management and job-scheduling framework: it allocates resources, and any computation job submitted to the cluster must request resources from YARN. So YARN runs a ResourceManager process (here on the same node as the NN) to accept users' job requests; and because it manages resources, it must know the resource state of every node, so each DN node runs a NodeManager process.
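You can ask the ResourceManager which NodeManagers have registered (a quick check I would run here, not in the original notes):

```bash
# List the NodeManagers known to the ResourceManager; hd2, hd3, and hd4
# should appear in RUNNING state.
yarn node -list
```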
Start the MapReduce JobHistoryServer:

```
[hadoop@hd1 hadoop]$ mr-jobhistory-daemon.sh start historyserver
starting historyserver, logging to /home/hadoop/hadoop-2.7.1/logs/mapred-hadoop-historyserver-hd1.out
```
MapReduce is a computation framework deployed on top of YARN; apart from the JobHistoryServer started above, it has no long-running daemons of its own, so once YARN is up, MapReduce jobs can run.
Check each node's daemons with jps:
```
[hadoop@hd1 hadoop]$ jps
3681 NameNode
4259 JobHistoryServer
3957 ResourceManager
4362 Jps

[hadoop@hd2 ~]$ jps
2643 DataNode
2973 Jps
2735 SecondaryNameNode
2815 NodeManager

[hadoop@hd3 ~]$ jps
2216 DataNode
2472 Jps
2317 NodeManager

[hadoop@hd4 ~]$ jps
2368 NodeManager
2504 Jps
2265 DataNode
```
HDFS test:
```
[hadoop@hd1 ~]$ hdfs dfs -mkdir /hadoop/
18/10/23 23:26:24 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
[hadoop@hd1 ~]$ hdfs dfs -ls /
18/10/23 23:26:29 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 2 items
drwxr-xr-x   - hadoop supergroup          0 2018-10-23 23:26 /hadoop
drwxrwx---   - hadoop supergroup          0 2018-10-23 23:09 /tmp
```
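With HDFS working, it is also worth exercising MapReduce and YARN end to end; a sketch using the examples jar that ships with the Hadoop distribution (the jar path and version suffix are assumptions, adjust to your install):

```bash
# Estimate pi with 2 map tasks x 10 samples each; the job should appear
# in the ResourceManager UI and print an estimate when it finishes.
hadoop jar $HADOOP_INSTALL/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.1.jar pi 2 10
```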
Next, set up MySQL. Initialize the data directory (as the mysql user):

```
[mysql@hd1 bin]$ ./mysqld --initialize-insecure --basedir=/home/mysql/mysql-5.7.20 --datadir=/home/mysql/mysql-5.7.20/data --user=mysql
2018-10-23T15:33:43.531824Z 0 [Warning] Changed limits: max_open_files: 1024 (requested 5000)
2018-10-23T15:33:43.532051Z 0 [Warning] Changed limits: table_open_cache: 431 (requested 2000)
2018-10-23T15:33:43.532611Z 0 [Warning] TIMESTAMP with implicit DEFAULT value is deprecated. Please use --explicit_defaults_for_timestamp server option (see documentation for more details).
2018-10-23T15:33:46.626624Z 0 [Warning] InnoDB: New log files created, LSN=45790
2018-10-23T15:33:47.139730Z 0 [Warning] InnoDB: Creating foreign key constraint system tables.
2018-10-23T15:33:47.498281Z 0 [Warning] No existing UUID has been found, so we assume that this is the first time that this server has been started. Generating a new UUID: 08944547-d6d9-11e8-b3f4-000c297eaaf3.
2018-10-23T15:33:47.550419Z 0 [Warning] Gtid table is not ready to be used. Table 'mysql.gtid_executed' cannot be opened.
2018-10-23T15:33:47.564974Z 1 [Warning] root@localhost is created with an empty password ! Please consider switching off the --initialize-insecure option.
```
Configure my.cnf:

```ini
[mysqld]
basedir=/home/mysql/mysql-5.7.20
datadir=/home/mysql/mysql-5.7.20/data
socket=/tmp/mysql.sock
log_error=/home/mysql/mysql-5.7.20/mysql.err
user=mysql

[mysql]
socket=/tmp/mysql.sock
```
Start the server:

```
[mysql@hd1 support-files]$ ./mysql.server start
Starting MySQL..[ OK ]
```
For convenient startup, copy the MySQL init script to /etc/init.d/ so the service can be managed like any other (and, if desired, registered to start with the OS, e.g. via chkconfig).
```
[root@hd1 support-files]# cp mysql.server /etc/init.d/mysqld
[root@hd1 support-files]# /etc/init.d/mysqld restart
Shutting down MySQL..[ OK ]
Starting MySQL.[ OK ]
```
Log in (the root password is still empty at this point):

```
[mysql@hd1 ~]$ mysql
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 3
Server version: 5.7.20 MySQL Community Server (GPL)

Copyright (c) 2000, 2017, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> show databases;
+--------------------+
| Database           |
+--------------------+
| information_schema |
| mysql              |
| performance_schema |
| sys                |
+--------------------+
4 rows in set (0.00 sec)
```
Set the MySQL root password:
```
[mysql@hd1 bin]$ mysqladmin -u root password 'Oracle123' -S '/tmp/mysql.sock'
mysqladmin: [Warning] Using a password on the command line interface can be insecure.
Warning: Since password will be sent to server in plain text, use ssl connection to ensure password safety.
```
Log in with the password:
```
[mysql@hd1 bin]$ mysql -uroot -p -S '/tmp/mysql.sock'
Enter password:
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 5
Server version: 5.7.20 MySQL Community Server (GPL)

Copyright (c) 2000, 2017, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> show database;
ERROR 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'database' at line 1
mysql> show databases;
+--------------------+
| Database           |
+--------------------+
| information_schema |
| mysql              |
| performance_schema |
| sys                |
+--------------------+
4 rows in set (0.00 sec)
```
That completes the MySQL installation.
Now install Hive. Unpack:
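The exact commands are not in the original notes; a sketch, assuming the tarball sits in the hadoop user's home directory:

```bash
# Unpack the Hive tarball under the hadoop user's home directory.
tar -xvf hive-1.1.0-cdh5.7.0.tar -C /home/hadoop/

# Optional: put the hive binary on the PATH.
export HIVE_HOME=/home/hadoop/hive-1.1.0-cdh5.7.0
export PATH=$PATH:$HIVE_HOME/bin
```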
Configure hive-site.xml ([hadoop@hd1 conf]$ more hive-site.xml):
```xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>hive.metastore.local</name>
    <value>true</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://192.168.83.11:3306/hive?characterEncoding=UTF-8</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>root</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>Oracle123</value><!-- the database password -->
  </property>
</configuration>
```
This configures the MySQL account and password that Hive uses. You also need to place the MySQL JDBC driver (mysql-connector-java.jar) into Hive's lib directory.
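For instance (a sketch; the exact connector jar name and its current location depend on where you downloaded it):

```bash
# Copy the MySQL JDBC driver into Hive's lib directory so the metastore
# can reach MySQL.
cp mysql-connector-java.jar /home/hadoop/hive-1.1.0-cdh5.7.0/lib/
```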
You must also create a database in MySQL beforehand to hold Hive's metadata.
```
mysql> create database hive;
Query OK, 1 row affected (0.11 sec)

mysql> show databases;
+--------------------+
| Database           |
+--------------------+
| information_schema |
| hive               |
| mysql              |
| performance_schema |
| sys                |
+--------------------+
5 rows in set (0.01 sec)
```
Start Hive:
```
[hadoop@hd1 conf]$ hive
which: no hbase in (/usr/lib64/qt-3.3/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/usr/java/jdk1.8.0_11/bin:/usr/java/jdk1.8.0_11/bin:/home/hadoop/bin:/home/hadoop/hadoop-2.6.0-cdh5.7.0/bin:/home/hadoop/hadoop-2.6.0-cdh5.7.0/sbin:/home/hadoop/hive-1.1.0-cdh5.7.0/bin)
18/10/24 03:14:34 WARN conf.HiveConf: HiveConf of name hive.metastore.local does not exist

Logging initialized using configuration in jar:file:/home/hadoop/hive-1.1.0-cdh5.7.0/lib/hive-common-1.1.0-cdh5.7.0.jar!/hive-log4j.properties
WARNING: Hive CLI is deprecated and migration to Beeline is recommended.
```
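Hive is up. As a final smoke test (my own, not from the original notes), you can create a table and confirm its definition lands in the MySQL hive database:

```bash
# Create a table and list tables from the shell; behind the scenes the
# definition is written to the MySQL metastore, and the table's data
# directory is created under HDFS's Hive warehouse path.
hive -e "CREATE TABLE test (id INT, name STRING); SHOW TABLES;"

# The new table should now appear in the metastore's TBLS table.
mysql -uroot -p -S /tmp/mysql.sock -e "SELECT TBL_NAME FROM hive.TBLS;"
```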