A Worked Example of Installing Hadoop + Hive on Windows

Hadoop installation:

First, download hadoop-2.7.7 from the official mirror, link below:
https://mirrors.tuna.tsinghua.edu.cn/apache/hadoop/common/
Then fetch hadooponwindows-master.zip from the network drive, link below:
https://pan.baidu.com/s/1VdG6PBnYKM91ia0hlhIeHg
After extracting hadoop-2.7.7.tar.gz,
replace the bin and etc directories of hadoop-2.7.7 with the bin and etc from hadooponwindows-master.


Note: installing Hadoop 2.7.7
  Download Hadoop 2.7.7 from the official site. When installing, avoid putting it under a path that contains spaces, e.g. Program Files; otherwise the Hadoop configuration files will fail to find the JDK (reportedly, quoting the path in the configuration file fixes this, but I could not get that to work).
Configure HADOOP_HOME.

Add %HADOOP_HOME%\bin to PATH (on Windows 10, or in the list-style edit dialog, no semicolon is needed; on other versions append a ;).
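To sanity-check the variables, open a fresh command prompt and run hadoop version (assuming the paths configured above):

C:\>hadoop version

If HADOOP_HOME and PATH are set correctly, this prints the Hadoop version banner instead of a "'hadoop' is not recognized as an internal or external command" error.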

 

----------------------------------------------------------- Configuration files ----------------------------

Open E:\Hadoop2.7.7\hadoop-2.7.7\etc\hadoop\hadoop-env.cmd in an editor
and fix the JAVA_HOME path:
change the set JAVA_HOME line to point at the JDK location.
Note that PROGRA~1 stands for Program Files:
set JAVA_HOME=E:\PROGRA~1\Java\jdk1.8.0_171
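PROGRA~1 is the 8.3 "short name" Windows generates for Program Files; if your JDK sits under a different space-containing path, you can look up its short name with dir /x, e.g.:

E:\>dir /x

and put the short name shown next to the directory into JAVA_HOME.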

Open hadoop-2.7.7/etc/hadoop/hdfs-site.xml
and point the paths at the namenode and datanode directories under the Hadoop install:

<property>
  <name>dfs.replication</name>
  <!-- 1 is assumed here: the usual value for a single-node setup -->
  <value>1</value>
</property>
<property>
  <name>dfs.namenode.name.dir</name>
  <value>/E:/Hadoop2.7.7/hadoop-2.7.7/data/namenode</value>
</property>
<property>
  <name>dfs.datanode.data.dir</name>
  <value>/E:/Hadoop2.7.7/hadoop-2.7.7/data/datanode</value>
</property>

In the E:\Hadoop-2.7.7 directory, add a tmp folder.
Under E:/Hadoop2.7.7/hadoop-2.7.7/, add a data folder with namenode and datanode subfolders.
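The etc folder taken from hadooponwindows-master should already ship a core-site.xml that wires the default filesystem and the tmp folder together. Purely for reference, a minimal single-node core-site.xml conventionally looks like the sketch below; both values are assumptions based on the paths above, so keep whatever came with the replacement etc:

<property>
  <name>fs.defaultFS</name>
  <value>hdfs://localhost:9000</value>
</property>
<property>
  <name>hadoop.tmp.dir</name>
  <!-- assumed to point at the tmp folder created above -->
  <value>/E:/Hadoop2.7.7/hadoop-2.7.7/tmp</value>
</property>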

You also need to copy hadoop.dll (from …) into C:\Windows\System32;

 

otherwise MapReduce test runs on the Windows platform will report errors.

Open a command prompt as administrator
and run hdfs namenode -format; seeing "successfully" in the output means the format worked.

 

Change to the hadoop-2.7.7\sbin directory and run start-all to start the Hadoop cluster; stop-all shuts it down.

Run jps to see all the running nodes.
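On a healthy single-node start, jps typically shows entries like the following (the process IDs will differ):

1234 NameNode
2345 DataNode
3456 ResourceManager
4567 NodeManager
5678 Jps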

 

Visit http://localhost:50070 for the Hadoop web UI.

 

---------------------------------------------------------------------

Once Hadoop is up, create the following HDFS directories:

D:\Code\hadoop-2.7.7\hadoop-2.7.7\sbin>hdfs dfs -mkdir /user
D:\Code\hadoop-2.7.7\hadoop-2.7.7\sbin>hdfs dfs -mkdir /user/hive
D:\Code\hadoop-2.7.7\hadoop-2.7.7\sbin>hdfs dfs -mkdir /user/hive/warehouse
D:\Code\hadoop-2.7.7\hadoop-2.7.7\sbin>hdfs dfs -mkdir /tmp
D:\Code\hadoop-2.7.7\hadoop-2.7.7\sbin>hdfs dfs -mkdir /tmp/hive
D:\Code\hadoop-2.7.7\hadoop-2.7.7\sbin>hadoop fs -chmod -R 777 /tmp
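To double-check the directories and the permission change, list them back:

D:\Code\hadoop-2.7.7\hadoop-2.7.7\sbin>hdfs dfs -ls -R /user
D:\Code\hadoop-2.7.7\hadoop-2.7.7\sbin>hdfs dfs -ls /tmp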

Hive installation:

1. Install Hadoop (as above).

2. Download mysql-connector-java-5.1.26-bin.jar (or another version of the jar) from Maven and put it in the lib folder under the Hive directory.

3. Configure the Hive environment variable: HIVE_HOME=F:\hadoop\apache-hive-2.1.1-bin (and put %HIVE_HOME%\bin on PATH, as shown below).
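So that the hive command used later resolves, add %HIVE_HOME%\bin to PATH the same way as for HADOOP_HOME; the per-session equivalent in a command prompt would be:

set HIVE_HOME=F:\hadoop\apache-hive-2.1.1-bin
set PATH=%PATH%;%HIVE_HOME%\bin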

4. Configure Hive.

Hive's configuration files live under $HIVE_HOME/conf, which contains four default templates:

hive-default.xml.template              default template
hive-env.sh.template                   defaults for hive-env.sh
hive-exec-log4j.properties.template    defaults for exec logging
hive-log4j.properties.template         defaults for logging

Hive can run without any changes: by default it stores its metadata in an embedded Derby database. Since few people are familiar with Derby, we switch to MySQL for the metastore, and we also want to change the data and log locations, so a custom configuration is needed. The steps follow.
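The JDBC URL configured below points at a database named hive on the local MySQL; that database generally needs to exist before the metastore can use it. A minimal sketch, assuming a local MySQL reachable as root with the same credentials used later in hive-site.xml:

mysql -u root -p -e "CREATE DATABASE hive DEFAULT CHARACTER SET utf8;"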

(1) Create the configuration files:

$HIVE_HOME/conf/hive-default.xml.template  -> $HIVE_HOME/conf/hive-site.xml

$HIVE_HOME/conf/hive-env.sh.template  -> $HIVE_HOME/conf/hive-env.sh

$HIVE_HOME/conf/hive-exec-log4j.properties.template ->  $HIVE_HOME/conf/hive-exec-log4j.properties

$HIVE_HOME/conf/hive-log4j.properties.template  -> $HIVE_HOME/conf/hive-log4j.properties

(2) Edit hive-env.sh:

export HADOOP_HOME=F:\hadoop\hadoop-2.7.2
export HIVE_CONF_DIR=F:\hadoop\apache-hive-2.1.1-bin\conf
export HIVE_AUX_JARS_PATH=F:\hadoop\apache-hive-2.1.1-bin\lib

(3) Edit hive-site.xml:

<!-- Settings to modify -->

<property>
  <name>hive.metastore.warehouse.dir</name>
  <!-- Hive's data warehouse directory; this path is on HDFS -->
  <value>/user/hive/warehouse</value>
  <description>location of default database for the warehouse</description>
</property>

<property>
  <name>hive.exec.scratchdir</name>
  <!-- Hive's temporary data directory; this path is on HDFS -->
  <value>/tmp/hive</value>
  <description>HDFS root scratch dir for Hive jobs which gets created with write all (733) permission. For each connecting user, an HDFS scratch dir: ${hive.exec.scratchdir}/&lt;username&gt; is created, with ${hive.scratch.dir.permission}.</description>
</property>

<property>
  <name>hive.exec.local.scratchdir</name>
  <!-- local directory -->
  <value>F:/hadoop/apache-hive-2.1.1-bin/hive/iotmp</value>
  <description>Local scratch space for Hive jobs</description>
</property>

<property>
  <name>hive.downloaded.resources.dir</name>
  <!-- local directory -->
  <value>F:/hadoop/apache-hive-2.1.1-bin/hive/iotmp</value>
  <description>Temporary local directory for added resources in the remote file system.</description>
</property>

<property>
  <name>hive.querylog.location</name>
  <!-- local directory -->
  <value>F:/hadoop/apache-hive-2.1.1-bin/hive/iotmp</value>
  <description>Location of Hive run time structured log file</description>
</property>

<property>
  <name>hive.server2.logging.operation.log.location</name>
  <value>F:/hadoop/apache-hive-2.1.1-bin/hive/iotmp/operation_logs</value>
  <description>Top level directory where operation logs are stored if logging functionality is enabled</description>
</property>

<!-- Settings to add -->

<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://localhost:3306/hive?characterEncoding=UTF-8</value>
</property>

<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.jdbc.Driver</value>
</property>

<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>root</value>
</property>

<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>root</value>
</property>

<!-- Fixes: Required table missing : "`VERSION`" in Catalog "" Schema "". DataNucleus requires this table to perform its persistence operations. Either your MetaData is incorrect, or you need to enable "datanucleus.autoCreateTables" -->

<property>
  <name>datanucleus.autoCreateSchema</name>
  <value>true</value>
</property>

<property>
  <name>datanucleus.autoCreateTables</name>
  <value>true</value>
</property>

<property>
  <name>datanucleus.autoCreateColumns</name>
  <value>true</value>
</property>

<!-- Fixes: Caused by: MetaException(message:Version information not found in metastore.) -->

<property>
  <name>hive.metastore.schema.verification</name>
  <value>false</value>
  <description>
    Enforce metastore schema version consistency.
    True: Verify that version information stored in metastore matches with one from Hive jars. Also disable automatic
          schema migration attempt. Users are required to manually migrate schema after Hive upgrade which ensures
          proper metastore schema migration. (Default)
    False: Warn if the version information stored in metastore doesn't match with one from in Hive jars.
  </description>
</property>

Note: the HDFS directories must be created on Hadoop beforehand (the hdfs dfs -mkdir commands above).

Start the metastore service: hive --service metastore

This generates the corresponding metastore tables in the hive database in MySQL.
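For reference, the documented way to initialize the metastore schema is Hive's schematool (schematool -dbType mysql -initSchema); the Hive 2.1.1 Windows cmd wrappers may not expose it, which is why this walkthrough relies on the datanucleus.autoCreate* settings above instead.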

 

Start Hive: hive

 

-------------------------------------------------------------- Creating a table, and a query example

Create a table in Hive:

CREATE TABLE testB (
id INT,
name string,
area string
) PARTITIONED BY (create_time string) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' STORED AS TEXTFILE;
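For the LOAD statement further down to parse rows, the data file (bbb.txt here) must be tab-separated, one row per line, covering the three non-partition columns; create_time comes from the PARTITION clause, not from the file. A made-up example of what bbb.txt could contain (fields separated by tabs):

1	Alice	North
2	Bob	South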

 

Upload the local file to HDFS:

Run from the Hadoop sbin directory:    D:\Code\hadoop-2.7.7\hadoop-2.7.7\sbin>hdfs dfs -put D:\Code\hadoop-2.7.7\gxy\bbb.txt /user/hive/warehouse

Load the HDFS data into Hive:

LOAD DATA INPATH '/user/hive/warehouse/bbb.txt' INTO TABLE testb PARTITION(create_time='2015-07-08');

 

Run a select:

select * from testb;
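Because the table is partitioned, the partition column can also be used as a filter, which restricts the scan to that partition alone:

select * from testb where create_time = '2015-07-08';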
