title: Installing and Configuring Hive
summary: Keywords: Hive, Ubuntu, installation and configuration, Derby, MySQL, PostgreSQL, metastore database connections
date: 2019-5-19 13:25
urlname: 2019051903
author: foochane
img: /medias/featureimages/19.jpg
categories: Big Data
tags:
Author: foochane
Link: https://foochane.cn/article/2019051903.html
Before installing Hive, you need a working Hadoop cluster. If you do not have one yet, see: Setting up a Hadoop distributed cluster.
Software | Version | Download |
---|---|---|
linux | Ubuntu Server 18.04.2 LTS | https://www.ubuntu.com/downlo... |
hadoop | hadoop-2.7.1 | http://archive.apache.org/dis... |
java | jdk-8u211-linux-x64 | https://www.oracle.com/techne... |
hive | hive-2.3.5 | http://mirror.bit.edu.cn/apac... |
mysql-connector-java | mysql-connector-java-5.1.45.jar | installed from the command line |
postgresql-jdbc4 | postgresql-jdbc4.jar | installed from the command line |
Node | IP | Hostname |
---|---|---|
Master node | 192.168.233.200 | Master |
Slave node 1 | 192.168.233.201 | Slave01 |
Slave node 2 | 192.168.233.202 | Slave02 |
Note: in this article, Hive, MySQL, and PostgreSQL are all installed only on the Master node. In a real production environment, adjust this to your actual situation.
By default, Hive stores its metadata in an embedded Derby database, which is the simplest storage option. When Derby is used, running Hive creates a derby.log file and a metastore_db directory in the current working directory. Derby only allows a single session to connect at a time, so it is suitable only for simple testing and cannot be used in production. To support multiple user sessions you need a standalone metastore database; MySQL or PostgreSQL can serve as that metastore database, and Hive has good built-in support for both.
This article walks through, one by one, the installation and configuration needed to connect Hive to each of these three databases: Derby, PostgreSQL, and MySQL.
Download the Hive package and extract it to `/usr/local/bigdata`:

```
$ tar -zxvf apache-hive-2.3.5-bin.tar.gz -C /usr/local/bigdata && cd /usr/local/bigdata
$ mv apache-hive-2.3.5-bin hive-2.3.5
$ sudo chown -R hadoop:hadoop hive-2.3.5   # the bigdata directory's ownership was already changed earlier
```
The files to modify are in the `/usr/local/bigdata/hive-2.3.5/conf` directory; three files need to be changed: `hive-site.xml`, `hive-env.sh`, and `hive-log4j2.properties`.

First make a copy of each `.template` file, then edit the copies.
```
$ cd /usr/local/bigdata/hive-2.3.5/conf
$ cp hive-default.xml.template hive-site.xml
$ cp hive-env.sh.template hive-env.sh
$ cp hive-log4j2.properties.template hive-log4j2.properties
```
To configure Derby, you only need to modify `javax.jdo.option.ConnectionURL` to specify where `metastore_db` should be stored.

The change is as follows:
```
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:derby:;databaseName=/usr/local/bigdata/hive-2.3.5/metastore/metastore_db;create=true</value>
  <description>
    JDBC connect string for a JDBC metastore.
    To use SSL to encrypt/authenticate the connection, provide database-specific SSL flag in the connection URL.
    For example, jdbc:postgresql://myhost/db?ssl=true for postgres database.
  </description>
</property>
```
In `hive-env.sh`, add:
```
export HADOOP_HOME=/usr/local/bigdata/hadoop-2.7.1
export HIVE_CONF_DIR=/usr/local/bigdata/hive-2.3.5/conf
```
The logging configuration (`hive-log4j2.properties`) can be left at its defaults for now; nothing needs to change there.
Add the following to `~/.bashrc`, then run `source ~/.bashrc` to make it take effect.
```
export HIVE_HOME=/usr/local/bigdata/hive-2.3.5
export PATH=$PATH:/usr/local/bigdata/hive-2.3.5/bin
```
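To confirm the environment variables took effect, a quick sanity check like the following can be run (a minimal sketch; it only assumes the paths configured above):

```
# reload the shell configuration and confirm hive is on the PATH
source ~/.bashrc
echo $HIVE_HOME     # should print /usr/local/bigdata/hive-2.3.5
which hive          # should print /usr/local/bigdata/hive-2.3.5/bin/hive
hive --version      # should report Hive 2.3.5
```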
Note: start the Hadoop cluster first.
```
$ hadoop fs -mkdir -p /user/hive/warehouse
$ hadoop fs -mkdir -p /tmp
$ hadoop fs -chmod g+w /user/hive/warehouse
$ hadoop fs -chmod g+w /tmp
```
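If you want to double-check the directories and their permissions, a listing such as the following can be used (purely a sanity check, not a required step):

```
# verify the warehouse and tmp directories exist with group write permission
hadoop fs -ls /user/hive
hadoop fs -ls -d /tmp /user/hive/warehouse
```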
Initialize the metastore database:
```
$ schematool -initSchema -dbType derby
```
A successful initialization should produce output like the following:
```
$ schematool -initSchema -dbType derby
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/bigdata/hive-2.3.5/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/bigdata/hadoop-2.7.1/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Metastore connection URL:        jdbc:derby:;databaseName=/usr/local/bigdata/hive-2.3.5/metastore/metastore_db;create=true
Metastore Connection Driver :    org.apache.derby.jdbc.EmbeddedDriver
Metastore connection User:       APP
Starting metastore schema initialization to 2.3.0
Initialization script hive-schema-2.3.0.derby.sql
Initialization script completed
schemaTool completed
```
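Optionally, the recorded schema version can be checked with schematool's `-info` mode (a hedged sketch; the exact output wording varies between Hive versions):

```
# query the metastore schema version recorded by the initialization
schematool -dbType derby -info
```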
Start Hive:
```
$ hive
```
If it starts successfully, you should see output like this:
```
$ hive
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/bigdata/hive-2.3.5/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/bigdata/hadoop-2.7.1/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]

Logging initialized using configuration in file:/usr/local/bigdata/hive-2.3.5/conf/hive-log4j2.properties Async: true
Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
hive>
    >
```
Create a table:
```
create table t1(
    id    int
   ,name  string
   ,hobby array<string>
   ,add   map<String,string>
)
row format delimited
fields terminated by ','
collection items terminated by '-'
map keys terminated by ':'
;
```
```
hive> show databases;
OK
default
Time taken: 22.279 seconds, Fetched: 1 row(s)
hive> create table t1(
    >     id    int
    >    ,name  string
    >    ,hobby array<string>
    >    ,add   map<String,string>
    > )
    > row format delimited
    > fields terminated by ','
    > collection items terminated by '-'
    > map keys terminated by ':'
    > ;
OK
Time taken: 1.791 seconds
hive>
```
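To go one step further, you could load a small sample file into t1 and query it back. This is only an illustrative sketch: the file name `/tmp/sample.txt` and its contents are made up here, and the delimiters simply follow the table definition above (fields `,`, collection items `-`, map keys `:`):

```
# create a tiny sample file matching the delimiters declared for t1
cat > /tmp/sample.txt <<'EOF'
1,zhangsan,reading-coding,home:beijing-office:shanghai
2,lisi,music,home:guangzhou
EOF

# load it into the table and read it back
hive -e "LOAD DATA LOCAL INPATH '/tmp/sample.txt' INTO TABLE t1; SELECT * FROM t1;"
```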
At this point, Hive with Derby as the metastore database is fully configured.

The next sections show how to connect Hive to PostgreSQL and to MySQL.
Install PostgreSQL with the following command:
```
$ sudo apt install postgresql postgresql-contrib
```
After installation there is a default administrative user named `postgres`, which has no password.
Enable and start the PostgreSQL service:

```
$ sudo systemctl enable postgresql
$ sudo systemctl start postgresql
```
Log in to PostgreSQL to check that it is running:

```
hadoop@Master:~$ sudo -i -u postgres
postgres@Master:~$ psql
psql (10.8 (Ubuntu 10.8-0ubuntu0.18.04.1))
Type "help" for help.

postgres=# help
You are using psql, the command-line interface to PostgreSQL.
Type:  \copyright for distribution terms
       \h for help with SQL commands
       \? for help with psql commands
       \g or terminate with semicolon to execute query
       \q to quit
postgres=#
```
Install the PostgreSQL JDBC driver and link it into Hive's lib directory:

```
$ sudo apt-get install libpostgresql-jdbc-java
$ ln -s /usr/share/java/postgresql-jdbc4.jar /usr/local/bigdata/hive-2.3.5/lib
```
Edit the `/etc/postgresql/10/main/pg_hba.conf` file:
```
# Database administrative login by Unix domain socket
#local  all             postgres                                peer
local   all             postgres                                trust

# TYPE  DATABASE        USER            ADDRESS                 METHOD

# "local" is for Unix domain socket connections only
#local  all             all                                     peer
local   all             all                                     trust
# IPv4 local connections:
#host   all             all             127.0.0.1/32            md5
host    all             all             127.0.0.1/32            trust
# IPv6 local connections:
#host   all             all             ::1/128                 md5
host    all             all             ::1/128                 trust
# Allow replication connections from localhost, by a user with the
# replication privilege.
#local  replication     all                                     peer
local   replication     all                                     trust
host    replication     all             127.0.0.1/32            trust
host    replication     all             ::1/128                 trust
```
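After editing `pg_hba.conf`, PostgreSQL has to re-read it before the new `trust` rules apply; reloading (or restarting) the service is enough:

```
# make PostgreSQL pick up the modified pg_hba.conf
sudo systemctl reload postgresql
```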
First create a user named `hiveuser` with password `123456`, then create a database named `metastore`:
```
$ sudo -u postgres psql
postgres=# CREATE USER hiveuser WITH PASSWORD '123456';
postgres=# CREATE DATABASE metastore;
```
Test that the user can log in to the database:
```
$ psql -h localhost -U hiveuser -d metastore
```
A successful login means the configuration is complete.
```
hadoop@Master:~$ psql -h localhost -U hiveuser -d metastore
Password for user hiveuser:
psql (10.8 (Ubuntu 10.8-0ubuntu0.18.04.1))
SSL connection (protocol: TLSv1.2, cipher: ECDHE-RSA-AES256-GCM-SHA384, bits: 256, compression: off)
Type "help" for help.

metastore=>
```
Earlier we configured Derby as the metastore database; for PostgreSQL we again modify the `hive-site.xml` file.

First, add the following at the beginning of the file:
```
<property>
  <name>system:java.io.tmpdir</name>
  <value>/tmp/hive/java</value>
</property>
<property>
  <name>system:user.name</name>
  <value>${user.name}</value>
</property>
```
Then modify the following properties:
name | value | description |
---|---|---|
javax.jdo.option.ConnectionURL | jdbc:postgresql://localhost/metastore | the database to connect to (created earlier) |
javax.jdo.option.ConnectionDriverName | org.postgresql.Driver | JDBC driver class |
javax.jdo.option.ConnectionUserName | hiveuser | user name (created earlier) |
javax.jdo.option.ConnectionPassword | 123456 | password for that user |
The full configuration:
```
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:postgresql://localhost/metastore</value>
  <description>
    JDBC connect string for a JDBC metastore.
    To use SSL to encrypt/authenticate the connection, provide database-specific SSL flag in the connection URL.
    For example, jdbc:postgresql://myhost/db?ssl=true for postgres database.
  </description>
</property>
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>org.postgresql.Driver</value>
  <description>Driver class name for a JDBC metastore</description>
</property>
<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>hiveuser</value>
  <description>Username to use against metastore database</description>
</property>
<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>123456</value>
  <description>password to use against metastore database</description>
</property>
```
First run `schematool` to initialize the schema:
```
schematool -dbType postgres -initSchema
```
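If the initialization succeeds, the metastore tables should now exist in the `metastore` database. A quick way to check, using the `hiveuser` account created earlier, is:

```
# list the tables schematool created in the metastore database
psql -h localhost -U hiveuser -d metastore -c '\dt'
```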
Then run `$ hive` to start Hive.

Create a table to test:
```
hadoop@Master:~$ hive
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/bigdata/hive-2.3.5/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/bigdata/hadoop-2.7.7/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]

Logging initialized using configuration in file:/usr/local/bigdata/hive-2.3.5/conf/hive-log4j2.properties Async: true
Java HotSpot(TM) 64-Bit Server VM warning: You have loaded library /usr/local/bigdata/hadoop-2.7.7/lib/native/libhadoop.so which might have disabled stack guard. The VM will try to fix the stack guard now.
It's highly recommended that you fix the library with 'execstack -c <libfile>', or link it with '-z noexecstack'.
Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
hive> show databases;
OK
default
Time taken: 12.294 seconds, Fetched: 1 row(s)
hive> create table t1(
    >     id    int
    >    ,name  string
    >    ,hobby array<string>
    >    ,add   map<String,string>
    > )
    > row format delimited
    > fields terminated by ','
    > collection items terminated by '-'
    > map keys terminated by ':'
    > ;
OK
Time taken: 1.239 seconds
hive>
Connection reset by 192.168.233.200 port 22
```
Check whether the table was created successfully:
```
$ psql -h localhost -U hiveuser -d metastore
psql (10.8 (Ubuntu 10.8-0ubuntu0.18.04.1))
SSL connection (protocol: TLSv1.2, cipher: ECDHE-RSA-AES256-GCM-SHA384, bits: 256, compression: off)
Type "help" for help.

metastore=> SELECT * from "TBLS";
 TBL_ID | CREATE_TIME | DB_ID | LAST_ACCESS_TIME | OWNER  | RETENTION | SD_ID | TBL_NAME |   TBL_TYPE    | VIEW_EXPANDED_TEXT | VIEW_ORIGINAL_TEXT | IS_REWRITE_ENABLED
--------+-------------+-------+------------------+--------+-----------+-------+----------+---------------+--------------------+--------------------+--------------------
      1 |  1560074934 |     1 |                0 | hadoop |         0 |     1 | t1       | MANAGED_TABLE |                    |                    | f
(1 row)
```
Install MySQL:

```
$ sudo apt install mysql-server
```
If no password has been set for root, set one. Here the password is set to `hadoop`.
```
$ mysql -u root -p
```
Create a database to hold Hive's metadata; it corresponds to the `mysql://localhost:3306/metastore` setting in the Hive configuration file `hive-site.xml`.
```
# create the database and the user
mysql> create database if not exists metastore;
mysql> CREATE USER 'hiveuser'@'localhost' IDENTIFIED BY '123456';

# set the login privileges
mysql> REVOKE ALL PRIVILEGES, GRANT OPTION FROM 'hiveuser'@'localhost';
mysql> GRANT ALL PRIVILEGES ON metastore.* TO 'hiveuser'@'localhost';

# flush the privilege tables
mysql> FLUSH PRIVILEGES;
mysql> quit;
```
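Before pointing Hive at MySQL it is worth confirming that the new account works; a simple check like this (assuming the user and password created above) is enough:

```
# log in as hiveuser and confirm the metastore database is visible
mysql -u hiveuser -p123456 -e "SHOW DATABASES;"
```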
Install the MySQL JDBC connector and link it into Hive's lib directory:

```
$ sudo apt-get install libmysql-java
$ ln -s /usr/share/java/mysql-connector-java-5.1.45.jar /usr/local/bigdata/hive-2.3.5/lib
```
As with PostgreSQL, modify `hive-site.xml`. First, add the following at the beginning of the file:
```
<property>
  <name>system:java.io.tmpdir</name>
  <value>/tmp/hive/java</value>
</property>
<property>
  <name>system:user.name</name>
  <value>${user.name}</value>
</property>
```
Then modify the following properties:
name | value | description |
---|---|---|
javax.jdo.option.ConnectionURL | jdbc:mysql://localhost:3306/metastore?useSSL=true | the database to connect to (created earlier) |
javax.jdo.option.ConnectionDriverName | com.mysql.jdbc.Driver | JDBC driver class |
javax.jdo.option.ConnectionUserName | hiveuser | user name (created earlier) |
javax.jdo.option.ConnectionPassword | 123456 | password for that user |
The full configuration:
```
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://localhost:3306/metastore?useSSL=true</value>
  <description>
    JDBC connect string for a JDBC metastore.
    To use SSL to encrypt/authenticate the connection, provide database-specific SSL flag in the connection URL.
    For example, jdbc:postgresql://myhost/db?ssl=true for postgres database.
  </description>
</property>
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.jdbc.Driver</value>
  <description>Driver class name for a JDBC metastore</description>
</property>
<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>hiveuser</value>
  <description>Username to use against metastore database</description>
</property>
<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>123456</value>
  <description>password to use against metastore database</description>
</property>
```
First initialize the schema:
```
schematool -dbType mysql -initSchema
```
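As with PostgreSQL, you can optionally confirm that the metastore tables were created in MySQL:

```
# list the tables schematool created in the metastore database
mysql -u hiveuser -p123456 -D metastore -e "SHOW TABLES;"
```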
As before, run:
```
$ hive
```
The following problems came up along the way. When initializing Derby, this error appeared, complaining that `hive-exec-*.jar` is missing:
```
hadoop@Master:~$ schematool -initSchema -dbType derby
Missing Hive Execution Jar: /usr/local/biddata/hive-2.3.5/lib/hive-exec-*.jar
```
Check whether `hive-exec-2.3.5.jar` really is missing from that directory; if it is, download a copy and put it there.

Download: https://mvnrepository.com/art...

If the jar does exist, the environment variables are almost certainly misconfigured; check that `HIVE_HOME` and `$HIVE_HOME/bin` are set correctly (in the error above, the path reads `biddata` instead of `bigdata`, which points to a mistyped `HIVE_HOME`).
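A quick way to check both possibilities at once (a small sketch based on the paths used in this article):

```
# confirm HIVE_HOME points to the real install directory and the jar exists
echo $HIVE_HOME
ls $HIVE_HOME/lib/hive-exec-*.jar
```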
Another error that appeared:
```
Exception in thread "main" java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: ${system:java.io.tmpdir%7D/$%7Bsystem:user.na
        at org.apache.hadoop.fs.Path.initialize(Path.java:205)
        at org.apache.hadoop.fs.Path.<init>(Path.java:171)
        at org.apache.hadoop.hive.ql.session.SessionState.createSessionDirs(SessionState.java:659)
        at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:582)
        at org.apache.hadoop.hive.ql.session.SessionState.beginStart(SessionState.java:549)
        at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:750)
        at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:686)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: java.net.URISyntaxException: Relative path in absolute URI: ${system:java.io.tmpdir%7D/$%7Bsystem:user.name%7D
        at java.net.URI.checkPath(URI.java:1823)
        at java.net.URI.<init>(URI.java:745)
        at org.apache.hadoop.fs.Path.initialize(Path.java:202)
        ... 12 more
```
The fix is to add the following configuration at the beginning of the `hive-site.xml` file:
```
<property>
  <name>system:java.io.tmpdir</name>
  <value>/tmp/hive/java</value>
</property>
<property>
  <name>system:user.name</name>
  <value>${user.name}</value>
</property>
```
Running `$ schematool -dbType postgres -initSchema` reported an error:
```
hadoop@Master:~$ schematool -dbType postgres -initSchema
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/bigdata/hive-2.3.5/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/bigdata/hadoop-2.7.7/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Metastore connection URL:        jdbc:postgresql://localhost/pymetastore
Metastore Connection Driver :    org.postgresql.Driver
Metastore connection User:       hive
Starting metastore schema initialization to 2.3.0
Initialization script hive-schema-2.3.0.postgres.sql
Error: ERROR: relation "BUCKETING_COLS" already exists (state=42P07,code=0)
org.apache.hadoop.hive.metastore.HiveMetaException: Schema initialization FAILED! Metastore state would be inconsistent !!
Underlying cause: java.io.IOException : Schema script failed, errorcode 2
Use --verbose for detailed stacktrace.
*** schemaTool failed ***
```
This related error can also appear:
```
Error: ERROR: relation "txns" already exists (state=42P07,code=0)
org.apache.hadoop.hive.metastore.HiveMetaException: Schema initialization FAILED! Metastore state would be inconsistent !!
Underlying cause: java.io.IOException : Schema script failed, errorcode 2
Use --verbose for detailed stacktrace.
*** schemaTool failed ***
```
I tried for a long time without finding the cause of this problem. Some posts online blame the Hive version, but even after switching to older releases such as hive-1.2.1 and hive-1.2.2 the error persisted. In the end, recreating the user and the database made the problem go away, so it appears the existing contents of the database were conflicting with the initialization.
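For reference, recreating the database and user from scratch can be done roughly as follows (a hedged sketch: it assumes nothing in the metastore database needs to be kept, since dropping it destroys any existing Hive metadata):

```
# drop the conflicting metastore database and user, then recreate them
sudo -u postgres psql <<'EOF'
DROP DATABASE IF EXISTS metastore;
DROP USER IF EXISTS hiveuser;
CREATE USER hiveuser WITH PASSWORD '123456';
CREATE DATABASE metastore;
EOF

# then run the initialization again
schematool -dbType postgres -initSchema
```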
```
Error: Duplicate key name 'PCS_STATS_IDX' (state=42000,code=1061)
org.apache.hadoop.hive.metastore.HiveMetaException: Schema initialization FAILED! Metastore state would be inconsistent !!
Underlying cause: java.io.IOException : Schema script failed, errorcode 2
Use --verbose for detailed stacktrace.
*** schemaTool failed ***
```
Note that when MySQL is used to store the metadata, running as the root user may lack the required privileges and produce errors. Also, `$ schematool -initSchema` (for whichever `-dbType` you are using) only needs to be executed once.