1 Install Hive
Download Hive
Download from http://mirrors.shu.edu.cn/apache/hive/hive-1.2.2/; the pre-built binary package (the -bin tarball) does not need to be compiled.
By default Hive stores its metadata in an embedded local Derby database, but this approach has an obvious drawback: Derby does not support multiple concurrent sessions. This article therefore uses MySQL as the metadata store.
Install MySQL
Install MySQL via yum:

wget -i -c http://dev.mysql.com/get/mysql57-community-release-el7-10.noarch.rpm
yum -y install mysql57-community-release-el7-10.noarch.rpm
yum -y install mysql-community-server

Start MySQL:

systemctl start mysqld.service

Check that MySQL is running:

systemctl status mysqld.service

MySQL is now running, but before you can log in you need the temporary root password that MySQL 5.7 writes to its log file. Find it, then log in:

grep 'temporary password' /var/log/mysqld.log
mysql -uroot -p    # enter the temporary password at the prompt

After logging in, change the password:

ALTER USER 'root'@'localhost' IDENTIFIED BY 'new password';

For details see https://www.cnblogs.com/brianzhu/p/8575243.html
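Optionally (not in the original post), you can sanity-check the new MySQL instance from Python before wiring Hive to it. The sketch below assumes PyMySQL is installed (pip install pymysql) and uses placeholder host/credentials; it only confirms that the server accepts connections and pre-creates the hive metastore database (the createDatabaseIfNotExist=true flag in the JDBC URL configured later would do the latter automatically).

# Sanity-check the MySQL instance that will back the Hive metastore.
# Assumes "pip install pymysql"; host/user/password are placeholders.
import pymysql

conn = pymysql.connect(host="hostIP", user="root", password="new password")
try:
    with conn.cursor() as cur:
        cur.execute("SELECT VERSION()")
        print("MySQL version:", cur.fetchone()[0])
        # Optional: pre-create the metastore DB; createDatabaseIfNotExist=true
        # in the JDBC URL would also create it on first use.
        cur.execute("CREATE DATABASE IF NOT EXISTS hive")
    conn.commit()
finally:
    conn.close()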
Extract the package with tar -zxvf apache-hive-1.2.2-bin.tar.gz, then go into the conf directory and copy the template configuration:
cp hive-default.xml.template hive-site.xml
Edit the file hive-site.xml:
<?xml version="1.0" encoding="utf-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed to the Apache Software Foundation (ASF) under one or more
  contributor license agreements. See the NOTICE file distributed with
  this work for additional information regarding copyright ownership.
  The ASF licenses this file to You under the Apache License, Version 2.0
  (the "License"); you may not use this file except in compliance with
  the License. You may obtain a copy of the License at

      http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License.
-->
<configuration>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>xxxx</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>xxxx</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://hostIP:3306/hive?createDatabaseIfNotExist=true&amp;useSSL=false&amp;nullNamePatternMatchesAll=true</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
  </property>
</configuration>
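A quick way to catch mistakes here (not part of the original post) is to parse the file before starting Hive; note in particular that & characters inside the JDBC URL must be escaped as &amp; in XML. A minimal Python sketch, assuming hive-site.xml is in the current directory:

# Check that hive-site.xml is well-formed and print the metastore settings.
import xml.etree.ElementTree as ET

tree = ET.parse("hive-site.xml")   # raises ParseError if the XML is malformed
for prop in tree.getroot().findall("property"):
    name = prop.findtext("name")
    if name and name.startswith("javax.jdo.option."):
        print(name, "=", prop.findtext("value"))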
Copy the MySQL JDBC driver jar (mysql-connector-java) into hive/lib.
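If you want to double-check that the jar landed in the right place, a small sketch (assuming HIVE_HOME is set; the fallback path below is only a guess based on the log output further down):

# Confirm a MySQL Connector/J jar is present in $HIVE_HOME/lib.
import glob
import os

hive_home = os.environ.get("HIVE_HOME", "/home/sms/app/apache-hive-1.2.2-bin")
jars = glob.glob(os.path.join(hive_home, "lib", "mysql-connector-*.jar"))
print("MySQL driver jars found:", jars or "none - copy the connector jar into lib/")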
Before running Hive, make sure the metastore service has been started:

nohup hive --service metastore > metastore.log 2>&1 &

If a remote client (for example Tableau) needs to connect to the Hive database, HiveServer2 must be started as well:

nohup hive --service hiveserver2 > hiveserver2.log 2>&1 &
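Any HiveServer2-capable client will do. As a minimal connectivity test (not in the original post), the Python sketch below uses PyHive, assuming pip install "pyhive[hive]" and that HiveServer2 listens on its default port 10000; host and username are placeholders:

# Minimal connectivity test against HiveServer2 (default port 10000).
from pyhive import hive

conn = hive.Connection(host="hostIP", port=10000, username="sms")
cursor = conn.cursor()
cursor.execute("show databases")
print(cursor.fetchall())   # e.g. [('default',)]
conn.close()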
[sms@gc64 conf]$ hive --help
Usage ./hive <parameters> --service serviceName <service parameters>
Service List: beeline cli help hiveburninclient hiveserver2 hiveserver hwi jar lineage metastore metatool orcfiledump rcfilecat schemaTool version
Parameters parsed:
--auxpath : Auxillary jars
--config : Hive configuration directory
--service : Starts specific service/component. cli is default
Parameters used:
HADOOP_HOME or HADOOP_PREFIX : Hadoop install directory
HIVE_OPT : Hive options
For help on a particular service:
./hive --service serviceName --help
Debug help: ./hive --debug --help
Hive versions below 2.0 do not provide a web UI for monitoring.
[sms@gc64 ~]$ hive
Logging initialized using configuration in jar:file:/home/sms/app/apache-hive-1.2.2-bin/lib/hive-common-1.2.2.jar!/hive-log4j.properties
Loading class `com.mysql.jdbc.Driver'. This is deprecated. The new driver class is `com.mysql.cj.jdbc.Driver'. The driver is automatically registered via the SPI and manual loading of the driver class is generally unnecessary.
hive> show databases;
OK
default
Time taken: 1.285 seconds, Fetched: 1 row(s)
hive>
# Query Hive from PySpark using HiveContext (Spark 1.x API).
from pyspark.sql import HiveContext, Row
from pyspark import SparkConf, SparkContext

conf = SparkConf().setMaster("local").setAppName("count")
sc = SparkContext(conf=conf)
hiveCtx = HiveContext(sc)

# List the tables in the current Hive database.
hiveCtx.sql("show tables").show()
# Count distinct msid values in the raw_data table.
hiveCtx.sql("select count(1) from (select msid from raw_data group by msid) a").show()
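Note, not from the original post: on Spark 2.x and later, HiveContext is deprecated in favour of a SparkSession built with Hive support; a rough equivalent of the snippet above:

# Spark 2.x+ equivalent of the HiveContext example above.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .master("local")
         .appName("count")
         .enableHiveSupport()   # read hive-site.xml and use the Hive metastore
         .getOrCreate())
spark.sql("show tables").show()
spark.sql("select count(1) from (select msid from raw_data group by msid) a").show()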