This blog introduces the basics of using SparkSQL with Hive as the metastore, covering the following:
1. Hive installation
2. Integrating Spark with Hive
3. SparkSQL operations
注:在操做本博客的內容時,須要安裝Hadoop和Spark。數據庫
其中hadoop安裝可參考:https://my.oschina.net/u/729917/blog/1556872apache
spark安裝可參考:https://my.oschina.net/u/729917/blog/1556871vim
1. Hive Installation
a) Install the MySQL database; this step is left to the reader (plenty of guides are available online).
b) Download Hive from the official mirror: http://mirror.bit.edu.cn/apache/hive/. The author saved the download as /home/hadoop/tools/apache-hive-2.2.0-bin.tar.gz.
c) Move the archive to the target directory and extract it. The author extracted it under /usr/local/, the same directory where Hadoop and Spark are installed.
sudo mv /home/hadoop/tools/apache-hive-2.2.0-bin.tar.gz /usr/local/
cd /usr/local
sudo tar -zxvf apache-hive-2.2.0-bin.tar.gz
d) Configure the environment variables:
vim ~/.bashrc
export HIVE_HOME=/usr/local/apache-hive-2.2.0-bin
export PATH=$PATH:${HIVE_HOME}/bin
Reload the configuration so the variables take effect:
source ~/.bashrc
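To confirm the variables took effect, a quick check (version output will vary with your installation):

echo $HIVE_HOME
hive --version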
e) Create a new hive-site.xml under the conf directory to configure Hive, using MySQL to store Hive's metadata:
hadoop@Master:/usr/local/apache-hive-2.2.0-bin/conf$ touch hive-site.xml
The contents of hive-site.xml are as follows:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
    <property>
        <name>javax.jdo.option.ConnectionURL</name>
        <value>jdbc:mysql://localhost:3306/hive?createDatabaseIfNotExist=true</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionDriverName</name>
        <value>com.mysql.jdbc.Driver</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionUserName</name>
        <value>root</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionPassword</name>
        <value>0000</value>
    </property>
    <property>
        <name>hive.metastore.schema.verification</name>
        <value>false</value>
        <description>
            Enforce metastore schema version consistency.
            True: Verify that version information stored in metastore matches with one from Hive jars.
            Also disable automatic schema migration attempt. Users are required to manually migrate
            schema after Hive upgrade which ensures proper metastore schema migration. (Default)
            False: Warn if the version information stored in metastore doesn't match with one in Hive jars.
        </description>
    </property>
</configuration>
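Two preparatory steps are easy to miss here: Hive needs the MySQL JDBC driver on its classpath, and Hive 2.x requires the metastore schema to be initialized once with schematool before the first start. A minimal sketch; the connector version and path below are assumptions, so adjust them to your environment:

# Copy the MySQL JDBC driver into Hive's lib directory (version/path are illustrative)
cp /home/hadoop/tools/mysql-connector-java-5.1.44.jar /usr/local/apache-hive-2.2.0-bin/lib/
# One-time initialization of the metastore schema in MySQL
schematool -dbType mysql -initSchema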
f) Start Hive by simply typing the hive command.
A successful start looks like this:
hadoop@Master:~$ hive
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/apache-hive-2.2.0-bin/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/hadoop-2.7.3/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]

Logging initialized using configuration in jar:file:/usr/local/apache-hive-2.2.0-bin/lib/hive-common-2.2.0.jar!/hive-log4j2.properties Async: true
Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
hive>
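At the prompt, a quick sanity check confirms the MySQL-backed metastore is working (the table name is purely illustrative):

hive> show databases;
hive> create table test(id int, name string);
hive> show tables;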
2. Integrating Spark with Hive
a) Create a hive-site.xml in Spark's conf directory, pointing Spark at the Hive metastore's Thrift service (9083 is the metastore's default port):
hadoop@Master:/usr/local/spark-2.2.0-bin-hadoop2.7/conf$ touch hive-site.xml
<configuration>
    <property>
        <name>hive.metastore.uris</name>
        <value>thrift://Master:9083</value>
    </property>
</configuration>
b) Start Hadoop and Spark.
c) Start the Hive metastore service:
hadoop@Master:/usr/local/spark-2.2.0-bin-hadoop2.7/bin$ hive --service metastore &
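To verify the metastore came up, you can check that it is listening on port 9083 (flag support may vary by distribution):

netstat -tlnp | grep 9083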
d) Launch spark-sql to test the integration:
hadoop@Master:/usr/local/spark-2.2.0-bin-hadoop2.7/bin$ ./spark-sql
Partial output after a successful start:
17/11/19 21:50:37 INFO SessionState: Created HDFS directory: /tmp/hive/hadoop/ce95f463-74ca-42de-ac85-3a283aa1520a
17/11/19 21:50:37 INFO SessionState: Created local directory: /tmp/hadoop/ce95f463-74ca-42de-ac85-3a283aa1520a
17/11/19 21:50:37 INFO SessionState: Created HDFS directory: /tmp/hive/hadoop/ce95f463-74ca-42de-ac85-3a283aa1520a/_tmp_space.db
17/11/19 21:50:37 INFO HiveClientImpl: Warehouse location for Hive client (version 1.2.1) is file:/usr/local/spark-2.2.0-bin-hadoop2.7/bin/spark-warehouse
17/11/19 21:50:37 INFO StateStoreCoordinatorRef: Registered StateStoreCoordinator endpoint
17/11/19 21:50:38 INFO SessionState: Created local directory: /tmp/2110b645-b83e-4b65-87a8-5e9f1482699e_resources
17/11/19 21:50:38 INFO SessionState: Created HDFS directory: /tmp/hive/hadoop/2110b645-b83e-4b65-87a8-5e9f1482699e
17/11/19 21:50:38 INFO SessionState: Created local directory: /tmp/hadoop/2110b645-b83e-4b65-87a8-5e9f1482699e
17/11/19 21:50:38 INFO SessionState: Created HDFS directory: /tmp/hive/hadoop/2110b645-b83e-4b65-87a8-5e9f1482699e/_tmp_space.db
17/11/19 21:50:38 INFO HiveClientImpl: Warehouse location for Hive client (version 1.2.1) is file:/usr/local/spark-2.2.0-bin-hadoop2.7/bin/spark-warehouse
spark-sql>
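At the spark-sql prompt, the databases and tables created through Hive should now be visible, which confirms that Spark is reading metadata from the shared metastore. This assumes the illustrative test table created during the sanity check in section 1:

spark-sql> show databases;
spark-sql> show tables;
spark-sql> select * from test;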