CentOS 7 + Hadoop 2.7.2 (HA + Federation) + Hive 1.2.1 + Spark 2.1.0 Fully Distributed Cluster Installation

1       VM Network Configuration
2       CentOS Configuration
2.1       Download
2.2       Activate the NIC
2.3       SecureCRT
2.4       Change the Hostname
2.5       yum Behind a Proxy
2.6       Install ifconfig
2.7       Install wget and Set Its Proxy
2.8       Install VMware Tools
2.9       Other
2.9.1     Issues
2.9.2     Settings
2.9.2.1   Remove the Boot Menu Delay
2.9.2.2   VM Tuning
2.9.3     Commands
2.9.3.1   Shutdown and Reboot
2.9.3.2   Stopping and Disabling Services
2.9.3.3   Finding Large Files and Directories
2.9.3.4   Checking Disk Usage
2.9.3.5   Checking Memory Usage
3       Install the JDK
4       Clone the Virtual Machines
5       Passwordless SSH Login
5.1       How Ordinary SSH Works (Password Required)
5.2       How Passwordless Login Works
5.3       Passwordless SSH Setup
6       HA + Federation Server Plan
7       ZooKeeper
7.1       Superuser Access
7.2       Issues
8       Hadoop
8.1       hadoop-env.sh
8.2       hdfs-site.xml
8.3       core-site.xml
8.4       slaves
8.5       yarn-env.sh
8.6       mapred-site.xml
8.7       yarn-site.xml
8.8       Copy and Adjust
8.9       Start ZooKeeper
8.10      Format zkfc
8.11      Start the JournalNodes
8.12      Format and Start the NameNodes
8.13      Start zkfc
8.14      Start the DataNodes
8.15      HDFS Verification
8.16      HA Verification
8.16.1    Manual Failover
8.17      Start YARN
8.18      MapReduce Test
8.19      Scripts
8.19.1    Start and Stop Scripts
8.19.2    Reboot and Shutdown
8.20      Eclipse Plugin
8.20.1    Plugin Installation
8.20.2    WordCount Project
8.20.2.1  WordCount.java
8.20.2.2  yarn-default.xml
8.20.2.3  build.xml
8.20.2.4  log4j.properties
8.20.3    Package and Run
8.20.4    Permission Issues
8.21      Killing a Job
8.22      Logs
8.22.1    Hadoop Service Logs
8.22.2    MapReduce Logs
8.22.3    System.out
8.22.4    log4j
9       MySQL
10      Hive Installation
10.1      Three Installation Modes
10.2      Remote Mode Installation
11      Scala Installation
12      Spark Installation
12.1      Test
12.2      Hive Startup Issue
13      Cleanup and Compaction
14      Common Hadoop 2.x Ports
15      Linux Commands
16      Hadoop File System Commands

 


 

 

This document records the installation of a Hadoop + Hive + Spark cluster, with HA (high availability) configured for both the NameNodes and the ResourceManager, and horizontal scaling of the NameNode namespace (HDFS Federation).

 

1       VM Network Configuration

Set the subnet IP to 192.168.1.0, set the gateway to 192.168.1.2, and disable DHCP.

After these settings, the IP of virtual NIC VMnet8 becomes 192.168.1.1.

It does not matter that the virtual machines and the physical host are not in the same subnet.

2                      CentOS Configuration

2.1       Download

http://mirrors.neusoft.edu.cn/centos/7/isos/x86_64/CentOS-7-x86_64-Minimal-1511.iso

Download the minimal installation image (without a desktop).

2.2       Activate the NIC

Activate the network interface and assign the IP addresses.

Set the gateway and DNS to the gateway configured for virtual NIC VMnet8 above.

2.3       SecureCRT

Once the NIC is up, a SecureCRT terminal can connect to the Linux hosts remotely, which makes the rest of the work easier. How to connect is omitted here; after connecting, apply the following simple settings:

2.4       Change the Hostname

/etc/sysconfig/network

 

/etc/hostname

 

/etc/hosts

192.168.1.11   node1

192.168.1.12   node2

192.168.1.13   node3

192.168.1.14   node4
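On CentOS 7 the hostname can also be set in one step with hostnamectl; a minimal sketch for node1 (repeat on each node with its own name, not shown in the original screenshots):

[root@node1 ~]# hostnamectl set-hostname node1
[root@node1 ~]# cat /etc/hostname
node1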

 

2.5       yum Behind a Proxy

The office network goes through a proxy, so yum cannot reach the repositories directly.

Set the yum proxy: vi /etc/yum.conf. With the proxy configured (see the sketch below), running yum again shows that package searches work.
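The yum.conf edit itself was only shown as a screenshot; assuming the same proxy host that is configured for wget in section 2.7, the line to add to /etc/yum.conf is:

proxy=http://10.19.110.55:8080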

 

2.6       Install ifconfig
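The minimal CentOS 7 image does not ship ifconfig; it is provided by the net-tools package (the original shows this step only as a screenshot):

[root@node1 ~]# yum -y install net-tools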

2.7       Install wget and Set Its Proxy

After wget is installed, its configuration file /etc/wgetrc appears under /etc, and the wget proxy can be configured there:

[root@node1 ~]# vi /etc/wgetrc

http_proxy = http://10.19.110.55:8080

https_proxy = http://10.19.110.55:8080

ftp_proxy = http://10.19.110.55:8080

2.8       Install VMware Tools

VMware Tools is needed so that the guest clock stays in sync with the host.

 

[root@node1 opt]# yum -y install perl

[root@node1 ~]# mount /dev/cdrom /mnt

[root@node1 ~]# tar -zxvf /mnt/VMwareTools-9.6.1-1378637.tar.gz -C /root

[root@node1 ~]# umount /dev/cdrom

[root@node1 ~]# /root/vmware-tools-distrib/vmware-install.pl

[root@node1 ~]# rm -rf /root/vmware-tools-distrib

Note: do not install the file-sharing and mouse drag-and-drop features below, otherwise the installation runs into problems:

[root@node1 ~]# chkconfig --list | grep vmware

vmware-tools    0:    1:    2:    3:    4:    5:    6:

vmware-tools-thinprint  0:    1:    2:    3:    4:    5:    6:

[root@node1 ~]# chkconfig vmware-tools-thinprint off

[root@node1 ~]# find / -name *vmware-tools-thinprint* | xargs rm -rf

 

2.9       Other

2.9.1  Issues

The following error appears on first boot:

Editing the VM configuration file node1.vmx fixes it:

vcpu.hotadd = "FALSE"

mem.hotadd = "FALSE"

 

2.9.2  Settings

2.9.2.1 Remove the boot menu delay

[root@node1 ~]# vim /etc/default/grub

GRUB_TIMEOUT=0                                               # default is 5

 

[root@node1 ~]# grub2-mkconfig -o /boot/grub2/grub.cfg

2.9.2.2 VM tuning

Note: only do this on hosts with little memory.

Edit node1.vmx:

mainMem.useNamedFile = "FALSE"

To work in full screen and make command-line input easier, make the following adjustments, and hide the status bar:

2.9.3  Commands

2.9.3.1 Shutdown and reboot

[root@node1 ~]# reboot

[root@node1 ~]# shutdown -h now

2.9.3.2 Stopping and disabling services

# list services enabled at boot

[root@node1 ~]# systemctl list-unit-files | grep enabled | sort

auditd.service                               enabled

crond.service                               enabled

dbus-org.freedesktop.NetworkManager.service enabled

dbus-org.freedesktop.nm-dispatcher.service  enabled

default.target                              enabled

dm-event.socket                             enabled

getty@.service                              enabled

irqbalance.service                          enabled

lvm2-lvmetad.socket                         enabled

lvm2-lvmpolld.socket                        enabled

lvm2-monitor.service                        enabled

microcode.service                           enabled

multi-user.target                           enabled

NetworkManager-dispatcher.service           enabled

NetworkManager.service                      enabled

postfix.service                             enabled

remote-fs.target                            enabled

rsyslog.service                             enabled

sshd.service                                enabled

systemd-readahead-collect.service           enabled

systemd-readahead-drop.service              enabled

systemd-readahead-replay.service            enabled

tuned.service                               enabled

 

[root@node1 ~]#  systemctl | grep running | sort 

crond.service                   loaded active running   Command Scheduler

dbus.service                    loaded active running   D-Bus System Message Bus

dbus.socket                     loaded active running   D-Bus System Message Bus Socket

getty@tty1.service              loaded active running   Getty on tty1

irqbalance.service              loaded active running   irqbalance daemon

lvm2-lvmetad.service            loaded active running   LVM2 metadata daemon

lvm2-lvmetad.socket             loaded active running   LVM2 metadata daemon socket

NetworkManager.service          loaded active running   Network Manager

polkit.service                  loaded active running   Authorization Manager

postfix.service                 loaded active running   Postfix Mail Transport Agent

rsyslog.service                 loaded active running   System Logging Service

session-1.scope                 loaded active running   Session 1 of user root

session-2.scope                 loaded active running   Session 2 of user root

session-3.scope                 loaded active running   Session 3 of user root

sshd.service                    loaded active running   OpenSSH server daemon

systemd-journald.service        loaded active running   Journal Service

systemd-journald.socket         loaded active running   Journal Socket

systemd-logind.service          loaded active running   Login Service

systemd-udevd-control.socket    loaded active running   udev Control Socket

systemd-udevd-kernel.socket     loaded active running   udev Kernel Socket

systemd-udevd.service           loaded active running   udev Kernel Device Manager

tuned.service                   loaded active running   Dynamic System Tuning Daemon

vmware-tools.service            loaded active running   SYSV: Manages the services needed to run VMware software

wpa_supplicant.service          loaded active running   WPA Supplicant daemon

 

# check the status of a service

systemctl status auditd.service

 

# enable a service at boot

systemctl enable auditd.service

 

# disable a service at boot

systemctl disable auditd.service

systemctl disable postfix.service

systemctl disable rsyslog.service

systemctl disable wpa_supplicant.service

 

# check whether a service starts at boot

systemctl is-enabled auditd.service

2.9.3.3 Find large files and directories

find . -type f -size +10M  -print0 | xargs -0 du -h | sort -nr

 

List the 20 largest directories; --max-depth limits the directory depth, and without it all subdirectories are traversed:

du -hm --max-depth=5 / | sort -nr | head -20

 

find /etc -name '*srm*'  # find files under /etc whose names contain 'srm'

2.9.3.4 Check disk usage

[root@node1 dev]# df -h

Filesystem               Size  Used Avail Use% Mounted on

/dev/mapper/centos-root   50G  1.5G   49G    3% /

devtmpfs                 721M     0  721M    0% /dev

tmpfs                    731M     0  731M    0% /dev/shm

tmpfs                    731M  8.5M  723M    2% /run

tmpfs                    731M     0  731M    0% /sys/fs/cgroup

/dev/mapper/centos-home   47G   33M   47G    1% /home

/dev/sda1                497M  106M  391M   22% /boot

tmpfs                    147M     0  147M    0% /run/user/0

2.9.3.5 Check memory usage

[root@node1 dev]# top

3                      Install the JDK

All older JDK versions can be downloaded from the official archive: http://www.oracle.com/technetwork/java/archive-139210.html

Download jdk-8u92-linux-x64.tar.gz and store it under /root:

wget -O /root/jdk-8u92-linux-x64.tar.gz http://120.52.72.24/download.oracle.com/c3pr90ntc0td/otn/java/jdk/8u92-b14/jdk-8u92-linux-x64.tar.gz

 

 

[root@node1 ~]# tar -zxvf /root/jdk-8u92-linux-x64.tar.gz -C /root

 

[root@node1 ~]# vi /etc/profile

 

/etc/profile文件的最末加上以下內容:

export JAVA_HOME=/root/jdk1.8.0_92
export PATH=.:$PATH:$JAVA_HOME/bin

export CLASSPATH=.:$JAVA_HOME/jre/lib/rt.jar:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar

 

[root@node1 ~]# source /etc/profile

[root@node1 ~]# java -version

java version "1.8.0_92"

Java(TM) SE Runtime Environment (build 1.8.0_92-b14)

Java HotSpot(TM) 64-Bit Server VM (build 25.92-b14, mixed mode)

 

Use the env command to check that the environment variables are set correctly:

[root@node1 ~]# env | grep CLASSPATH

CLASSPATH=.:/root/jdk1.8.0_92/jre/lib/rt.jar:/root/jdk1.8.0_92/lib/dt.jar:/root/jdk1.8.0_92/lib/tools.jar

4                      Clone the Virtual Machines

Only node1 has been installed so far; now clone node2, node3 and node4 from node1:

node1    192.168.1.11
node2    192.168.1.12
node3    192.168.1.13
node4    192.168.1.14

Change the display name of each cloned VM accordingly.

When powering on, choose "I copied it".

Change the hostname:

[root@node1 ~]# vi /etc/sysconfig/network

 

[root@node1 ~]# vi /etc/hostname

5                      Passwordless SSH Login

RSA is a typical asymmetric encryption algorithm.

RSA can be used for data encryption (encrypt with the public key, decrypt with the private key) and for digital signatures or authentication (sign with the private key, verify with the public key).

5.1       How ordinary SSH works (password required)

The client sends a connection request to the server.

The server sends its public key to the client.

The client encrypts the login password with the server's public key and sends it to the server.

Even if the exchange is intercepted, an eavesdropper who knows the public key and the ciphertext still cannot decrypt it without the private key (RSA).

The server receives the ciphertext, decrypts it with its private key, and obtains the password.

5.2       How passwordless login works

First create a key pair on the client and place the public key on the server that will be accessed.

The client sends a request to the server asking for key-based authentication.

The server looks for the client's public key in that user's home directory and compares it with the public key sent by the client. If they match, the server encrypts a "challenge" with the public key and sends it to the client.

The client decrypts the challenge with its private key and sends it back to the server.

The server compares the returned challenge with the original; if they match, access is granted and the session is established.

 

5.3       Passwordless SSH setup

First delete any previously generated keys:

rm -rf /root/.ssh

生成密鑰:

[root@node1 ~]# ssh-keygen -t rsa

[root@node2 ~]# ssh-keygen -t rsa

[root@node3 ~]# ssh-keygen -t rsa

[root@node4 ~]# ssh-keygen -t rsa

The command "ssh-keygen -t rsa" generates an RSA key pair; after pressing Enter it prompts three times, and pressing Enter each time accepts the defaults.

Inspect the generated keys: id_rsa.pub is the public key, id_rsa the private key.

Copy the public key between servers:

ssh-copy-id -i /root/.ssh/id_rsa.pub <hostname>

This copies the local public key to the target host and appends it to authorized_keys there (creating the file if it does not exist). When copying to the local machine itself, use the local hostname.

[root@node1 ~]# ssh-copy-id -i /root/.ssh/id_rsa.pub node1

[root@node1 ~]# ssh-copy-id -i /root/.ssh/id_rsa.pub node2

[root@node1 ~]# ssh-copy-id -i /root/.ssh/id_rsa.pub node3

[root@node1 ~]# ssh-copy-id -i /root/.ssh/id_rsa.pub node4

 

[root@node2 ~]# ssh-copy-id -i /root/.ssh/id_rsa.pub node1

[root@node2 ~]# ssh-copy-id -i /root/.ssh/id_rsa.pub node2

[root@node2 ~]# ssh-copy-id -i /root/.ssh/id_rsa.pub node3

[root@node2 ~]# ssh-copy-id -i /root/.ssh/id_rsa.pub node4

 

[root@node3 ~]# ssh-copy-id -i /root/.ssh/id_rsa.pub node1

[root@node3 ~]# ssh-copy-id -i /root/.ssh/id_rsa.pub node2

[root@node3 ~]# ssh-copy-id -i /root/.ssh/id_rsa.pub node3

[root@node3 ~]# ssh-copy-id -i /root/.ssh/id_rsa.pub node4

 

[root@node4 ~]# ssh-copy-id -i /root/.ssh/id_rsa.pub node1

[root@node4 ~]# ssh-copy-id -i /root/.ssh/id_rsa.pub node2

[root@node4 ~]# ssh-copy-id -i /root/.ssh/id_rsa.pub node3

[root@node4 ~]# ssh-copy-id -i /root/.ssh/id_rsa.pub node4

Note: if the cloned virtual machines all end up with the same public key, first delete /etc/udev/rules.d/70-persistent-net.rules, then delete the /root/.ssh directory and regenerate the keys.

 

6                      HA + Federation Server Plan

 

 

Role (process name)               node1           node2           node3           node4
NameNode (Hadoop)                 Y (cluster1)    Y (cluster1)    Y (cluster2)    Y (cluster2)
DataNode                          -               Y               Y               Y
NodeManager                       -               Y               Y               Y
JournalNode                       Y               Y               Y               -
zkfc (DFSZKFailoverController)    Y               Y               Y               Y
ResourceManager                   Y               Y               -               -
ZooKeeper (QuorumPeerMain)        Y               Y               Y               -
MySQL (for Hive)                  -               -               -               Y
Hive metastore (RunJar)           -               -               Y               -
Hive (RunJar)                     Y               -               -               -
Scala (for Spark)                 Y               Y               Y               Y
Spark master                      Y               -               -               -
Spark worker                      -               Y               Y               Y

Note: a zkfc runs wherever a NameNode runs.

Different NameNodes share the same set of DataNodes through a common ClusterID.

[Figure: HDFS Federation architecture]

Each NS-n unit:

[Figure: Hadoop HA architecture]

[Figure: MapReduce NextGen (YARN) architecture]

 

7                      ZooKeeper

[root@node1 ~]# wget -O /root/zookeeper-3.4.9.tar.gz https://mirrors.tuna.tsinghua.edu.cn/apache/zookeeper/zookeeper-3.4.9/zookeeper-3.4.9.tar.gz

 

[root@node1 ~]# tar -zxvf /root/zookeeper-3.4.9.tar.gz -C /root

 

[root@node1 conf]# cp /root/zookeeper-3.4.9/conf/zoo_sample.cfg /root/zookeeper-3.4.9/conf/zoo.cfg

 

[root@node1 conf]# vi /root/zookeeper-3.4.9/conf/zoo.cfg
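The zoo.cfg contents appeared only as a screenshot. A minimal sketch consistent with the rest of this document (dataDir from section 7.2, client port 2181 as used by zkCli.sh, the three ZooKeeper hosts from the server plan; the 2888/3888 ports are the usual defaults and are an assumption):

tickTime=2000
initLimit=10
syncLimit=5
dataDir=/root/zookeeper-3.4.9/zkData
clientPort=2181
server.1=node1:2888:3888
server.2=node2:2888:3888
server.3=node3:2888:3888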

 

[root@node1 conf]# mkdir /root/zookeeper-3.4.9/zkData

[root@node1 conf]# touch /root/zookeeper-3.4.9/zkData/myid

[root@node1 conf]# echo 1 > /root/zookeeper-3.4.9/zkData/myid

 

[root@node1 conf]# scp -r /root/zookeeper-3.4.9 node2:/root

[root@node1 conf]# scp -r /root/zookeeper-3.4.9 node3:/root

[root@node2 conf]# echo 2 > /root/zookeeper-3.4.9/zkData/myid

[root@node3 conf]# echo 3 > /root/zookeeper-3.4.9/zkData/myid

7.1       Superuser Access

[root@node1 ~]# vi /root/zookeeper-3.4.9/bin/zkServer.sh

In the place where zkServer.sh launches Java, add the startup parameter "-Dzookeeper.DigestAuthenticationProvider.superDigest=super:Q9YtF+3h9Ko5UNT8apBWr8hovH4="; the digest after super: corresponds to the password AAAaaa111:
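Such a digest can be generated with the DigestAuthenticationProvider utility that ships with ZooKeeper; a sketch (the exact jar names under lib/ may differ in your distribution):

[root@node1 ~]# cd /root/zookeeper-3.4.9
[root@node1 zookeeper-3.4.9]# java -cp zookeeper-3.4.9.jar:lib/slf4j-api-1.6.1.jar org.apache.zookeeper.server.auth.DigestAuthenticationProvider super:AAAaaa111
super:AAAaaa111->super:Q9YtF+3h9Ko5UNT8apBWr8hovH4=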

 

[root@node1 ~]# /root/zookeeper-3.4.9/bin/zkCli.sh

[zk: localhost:2181(CONNECTED) 11] addauth digest super:AAAaaa111

Any znode data can now be removed:

[zk: localhost:2181(CONNECTED) 15] rmr /rmstore/ZKRMStateRoot

7.2       Issues

ZooKeeper fails to start with "Unable to load database on disk":

 

[root@node3 ~]# more zookeeper.out

2017-01-24 11:31:31,827 [myid:3] - ERROR [main:QuorumPeer@557] - Unable to load database on disk

java.io.IOException: The accepted epoch, d is less than the current epoch, 17

        at org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:554)

        at org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:500)

        at org.apache.zookeeper.server.quorum.QuorumPeerMain.runFromConfig(QuorumPeerMain.java:153)

        at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:111)

        at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)

 

[root@node3 ~]# more /root/zookeeper-3.4.9/conf/zoo.cfg | grep dataDir

dataDir=/root/zookeeper-3.4.9/zkData

[root@node3 ~]# ls /root/zookeeper-3.4.9/zkData

myid  version-2  zookeeper_server.pid

Clear everything under the version-2 directory:

[root@node3 ~]# rm -f /root/zookeeper-3.4.9/zkData/version-2/*.*

[root@node3 ~]# rm -rf /root/zookeeper-3.4.9/zkData/version-2/acceptedEpoch

[root@node3 ~]# rm -rf /root/zookeeper-3.4.9/zkData/version-2/currentEpoch

8                      Hadoop

[root@node1 ~]# wget -O /root/hadoop-2.7.2.tar.gz  http://mirror.bit.edu.cn/apache/hadoop/common/hadoop-2.7.2/hadoop-2.7.2.tar.gz

[root@node1 ~]# tar -zxvf /root/hadoop-2.7.2.tar.gz -C /root

8.1       hadoop-env.sh

[root@node1 ~]# vi /root/hadoop-2.7.2/etc/hadoop/hadoop-env.sh

 

The location where the PID files are stored must be changed (see the sketch below), otherwise errors such as "XXX running as process 1609. Stop it first." can appear.
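The screenshot with the actual hadoop-env.sh edits is not reproduced here. A minimal sketch of the two relevant lines, using the JDK path from section 3 and a hypothetical PID directory (any directory outside /tmp works):

export JAVA_HOME=/root/jdk1.8.0_92
export HADOOP_PID_DIR=/root/hadoop-2.7.2/pids      # assumed path; avoids losing PID files under /tmp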

8.2       hdfs-site.xml

[root@node1 ~]# vi /root/hadoop-2.7.2/etc/hadoop/hdfs-site.xml

<configuration>

   

       <property>

               <name>dfs.replication</name>

               <value>2</value>

<description>Number of block replicas stored by the DataNodes. The default is 3; any value not greater than the number of DataNodes in the cluster is fine.</description>

        </property>

 

<property>

  <name>dfs.blocksize</name>

  <value>134217728</value>

  <description>

      The default block size for new files, in bytes.

      You can use the following suffix (case insensitive):

      k(kilo), m(mega), g(giga), t(tera), p(peta), e(exa) to specify the size (such as 128k, 512m, 1g, etc.),

      Or provide complete size in bytes (such as 134217728 for 128 MB).

      Note: in 1.X and earlier the default was 64 MB and the property was called dfs.block.size.

  </description>

</property>

 

<property>

     <name>dfs.permissions.enabled</name>

     <value>false</value>

     <description>Note: if permission problems remain, run "/root/hadoop-2.7.2/bin/hdfs dfs -chmod -R 777 /"</description>

</property>

 

<property>

  <name>dfs.nameservices</name>

  <value>cluster1,cluster2</value>

<description>With federation, two HDFS name services are used. The two NameService names declared here are just aliases; any non-conflicting names will do, separated by commas. Note: these names only define logical namespaces. cluster1 and cluster2 are not two separate clusters; together they form one cluster, each covering part of the namespace (plus an HA standby NameNode for each part). Whether cluster1 and cluster2 belong to the same cluster is decided by the clusterID, which is specified when formatting the NameNodes; see the section on formatting and starting the NameNodes.</description>

</property>

<property>

  <name>dfs.ha.namenodes.cluster1</name>

  <value>nn1,nn2</value>

<description>Logical names of the NameNodes in cluster1. Note: these are arbitrary logical names, not the real NameNode hostnames; the binding to hosts is configured below.</description>

</property>

<property>

  <name>dfs.ha.namenodes.cluster2</name>

  <value>nn3,nn4</value>

<description>Logical names of the NameNodes in cluster2</description>

</property>

 

<!-- The following entries bind the logical names to physical hosts -->

<property>

  <name>dfs.namenode.rpc-address.cluster1.nn1</name>

  <value>node1:8020</value>

<description>8020 is the HDFS client access port (for both the command line and programs); some setups use 9000</description>

</property>

<property>

  <name>dfs.namenode.rpc-address.cluster1.nn2</name>

  <value>node2:8020</value>

</property>

<property>

  <name>dfs.namenode.rpc-address.cluster2.nn3</name>

  <value>node3:8020</value>

</property>

<property>

  <name>dfs.namenode.rpc-address.cluster2.nn4</name>

  <value>node4:8020</value>

</property>

<property>

  <name>dfs.namenode.http-address.cluster1.nn1</name>

  <value>node1:50070</value>

<description>NameNode web UI address</description>

</property>

<property>

  <name>dfs.namenode.http-address.cluster1.nn2</name>

  <value>node2:50070</value>

</property>

<property>

  <name>dfs.namenode.http-address.cluster2.nn3</name>

  <value>node3:50070</value>

</property>

<property>

  <name>dfs.namenode.http-address.cluster2.nn4</name>

  <value>node4:50070</value>

</property>

 

<property>

  <name>dfs.namenode.shared.edits.dir</name>

  <value>qjournal://node1:8485;node2:8485;node3:8485/cluster1</value>

<description>JournalNode quorum used by the two NameNodes of cluster1 to share their edits directory.
Use this setting on node1 and node2.</description>

</property>

<!--

<property>

  <name>dfs.namenode.shared.edits.dir</name>

  <value>qjournal://node1:8485;node2:8485;node3:8485/cluster2</value>

<description>JournalNode quorum used by the two NameNodes of cluster2 to share their edits directory.
Use this setting on node3 and node4.</description>

</property>

-->

 

<property>

<name>dfs.ha.automatic-failover.enabled.cluster1</name>

<value>true</value>

<description>Whether automatic failover is enabled for cluster1, i.e. whether to switch to the other NameNode automatically when the active NameNode fails</description>

</property>

<property>

<name>dfs.ha.automatic-failover.enabled.cluster2</name>

<value>true</value>

</property>

<property>

  <name>dfs.client.failover.proxy.provider.cluster1</name>

  <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>

<description>The Java class that performs failover on the client side when cluster1 fails over</description>

</property>

<property>

  <name>dfs.client.failover.proxy.provider.cluster2</name>

  <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>

</property>

 

<property>

  <name>dfs.journalnode.edits.dir</name>

  <value>/root/hadoop-2.7.2/tmp/journal</value>

<description>Local disk path where the JournalNodes store their own data</description>

</property>

 

<property>

  <name>dfs.ha.fencing.methods</name>

  <value>sshfence</value>

  <description>Use SSH to fence the old active NameNode during a failover</description>

</property>

<property>

  <name>dfs.ha.fencing.ssh.private-key-files</name>

  <value>/root/.ssh/id_rsa</value>

<description>Location of the private key used for the SSH connection when SSH fencing is used</description>

</property>

 

</configuration>

8.3       core-site.xml

[root@node1 ~]# vi /root/hadoop-2.7.2/etc/hadoop/core-site.xml

<configuration>

       <property>

                <name>fs.defaultFS</name>

                <value>hdfs://cluster1:8020</value>

                <description>The file system used by clients (and programs) when no explicit URI is given; the nameservice comes from hdfs-site.xml. Note: the same value is configured on every host.</description>

       </property>

       <property>

               <name>hadoop.tmp.dir</name>

               <value>/root/hadoop-2.7.2/tmp</value>

               <description>Base directory under which NameNode, DataNode, JournalNode and others store their data by default</description>

       </property>

<property>

   <name>ha.zookeeper.quorum</name>

   <value>node1:2181,node2:2181,node3:2181</value>

   <description>Addresses and ports of the ZooKeeper ensemble. Note: the number of nodes must be odd and no fewer than three.</description>

</property>

 

 

<!-- The following setting works around NameNode-to-JournalNode connection timeout exceptions -->

<property>

  <name>ipc.client.connect.retry.interval</name>

  <value>10000</value>

  <description>Indicates the number of milliseconds a client will wait for

    before retrying to establish a server connection.

  </description>

</property>

 

</configuration>

8.4       slaves

Lists the hosts that run DataNodes:

[root@node1 ~]# vi /root/hadoop-2.7.2/etc/hadoop/slaves
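The file contents are not shown in the original; based on the server plan above (DataNodes on node2-node4), the likely contents are:

node2
node3
node4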

8.5       yarn-env.sh

[root@node1 ~]# vi /root/hadoop-2.7.2/etc/hadoop/yarn-env.sh
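The edit itself was shown only as a screenshot; typically it is just the JDK path, same as in hadoop-env.sh (assumption):

export JAVA_HOME=/root/jdk1.8.0_92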

8.6       mapred-site.xml

[root@node1 ~]# vi /root/hadoop-2.7.2/etc/hadoop/mapred-site.xml

<configuration>

          <property>

 <name>mapreduce.framework.name</name>

                <value>yarn</value>

<description>Run MapReduce on the YARN framework</description>

           </property>

 

    <property>

       <name>mapreduce.jobhistory.address</name>

       <value>node1:10020</value>

<description>Note: this value is different on every host. After copying, change the hostname accordingly (node2:10020, node3:10020, node4:10020); the port stays the same.</description>

    </property>

 

    <property>

       <name>mapreduce.jobhistory.webapp.address</name>

       <value>node1:19888</value>

       <description>Note: this value is different on every host. After copying, change the hostname accordingly (node2:19888, node3:19888, node4:19888); the port stays the same.</description>

</property>

</configuration>

8.7       yarn-site.xml

[root@node1 ~]# vi /root/hadoop-2.7.2/etc/hadoop/yarn-site.xml

<configuration>

        <property>

               <name>yarn.nodemanager.aux-services</name>

               <value>mapreduce_shuffle</value>

        </property>

        <property>                                                               

<name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>

               <value>org.apache.hadoop.mapred.ShuffleHandler</value>

        </property>

 

<property>

  <name>yarn.resourcemanager.ha.enabled</name>

  <value>true</value>

</property>

 

<property>

  <name>yarn.resourcemanager.cluster-id</name>

  <value>yarn-cluster</value>

</property>

<property>

  <name>yarn.resourcemanager.ha.rm-ids</name>

  <value>rm1,rm2</value>

</property>

<property>

  <name>yarn.resourcemanager.hostname.rm1</name>

  <value>node1</value>

</property>

<property>

  <name>yarn.resourcemanager.hostname.rm2</name>

  <value>node2</value>

</property>

<property>

  <name>yarn.resourcemanager.webapp.address.rm1</name>

  <value>node1:8088</value>

</property>

<property>

  <name>yarn.resourcemanager.webapp.address.rm2</name>

  <value>node2:8088</value>

</property>

<property>

  <name>yarn.resourcemanager.zk-address</name>

  <value>node1:2181,node2:2181,node3:2181</value>

</property>

 

<property>

<name>yarn.resourcemanager.recovery.enabled</name>

<value>true</value>

</property>

<property>

<name>yarn.resourcemanager.store.class</name>

<value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>

<description>By default the RM state is stored under /rmstore in ZooKeeper; the path can be changed with yarn.resourcemanager.zk-state-store.parent-path</description>

</property>

 

 

<property>

<name>yarn.log-aggregation-enable</name>  

<value>true</value>

<description>Enable log aggregation: the local log files produced on every node that ran tasks are copied to a central location on HDFS, so job logs can be viewed from any machine in the cluster</description>

</property>

 

<property>

  <name>yarn.log.server.url</name>

  <value>http://node1:19888/jobhistory/logs</value>

  <description>Note: this value is different on every host. After copying, change the hostname accordingly (http://node2:19888/jobhistory/logs, http://node3:19888/jobhistory/logs, http://node4:19888/jobhistory/logs); the port stays the same.</description>

</property>

 

 

</configuration>

8.8       Copy and Adjust

Copy the Hadoop directory to the other nodes, then adjust the per-node settings: on node3 and node4 switch dfs.namenode.shared.edits.dir in hdfs-site.xml to the cluster2 variant, and on node2-node4 update the hostname-specific entries in mapred-site.xml and yarn-site.xml as noted in the descriptions above.

[root@node1 ~]# scp -r /root/hadoop-2.7.2/ node2:/root

[root@node1 ~]# scp -r /root/hadoop-2.7.2/ node3:/root

[root@node1 ~]# scp -r /root/hadoop-2.7.2/ node4:/root

 

[root@node3 ~]# vi /root/hadoop-2.7.2/etc/hadoop/hdfs-site.xml

[root@node3 ~]# scp /root/hadoop-2.7.2/etc/hadoop/hdfs-site.xml node4:/root/hadoop-2.7.2/etc/hadoop

 

[root@node2 ~]# vi /root/hadoop-2.7.2/etc/hadoop/mapred-site.xml

[root@node3 ~]# vi /root/hadoop-2.7.2/etc/hadoop/mapred-site.xml

[root@node4 ~]# vi /root/hadoop-2.7.2/etc/hadoop/mapred-site.xml

 

[root@node2 ~]# vi /root/hadoop-2.7.2/etc/hadoop/yarn-site.xml

[root@node3 ~]# vi /root/hadoop-2.7.2/etc/hadoop/yarn-site.xml

[root@node4 ~]# vi /root/hadoop-2.7.2/etc/hadoop/yarn-site.xml

8.9       Start ZooKeeper

[root@node1 bin]# /root/zookeeper-3.4.9/bin/zkServer.sh start

[root@node2 bin]# /root/zookeeper-3.4.9/bin/zkServer.sh start

[root@node3 bin]# /root/zookeeper-3.4.9/bin/zkServer.sh start

[root@node1 bin]# jps

1622 QuorumPeerMain

 

Check the status:

[root@node1 ~]# /root/zookeeper-3.4.9/bin/zkServer.sh status

ZooKeeper JMX enabled by default

Using config: /root/zookeeper-3.4.9/bin/../conf/zoo.cfg

Mode: follower

[root@node2 ~]# /root/zookeeper-3.4.9/bin/zkServer.sh status

ZooKeeper JMX enabled by default

Using config: /root/zookeeper-3.4.9/bin/../conf/zoo.cfg

Mode: leader

 

Inspect the znodes:

[root@node1 hadoop-2.7.2]# /root/zookeeper-3.4.9/bin/zkCli.sh

[zk: localhost:2181(CONNECTED) 0] ls /

[zookeeper]

8.10    Format zkfc

Run this on any one node of each cluster; it creates the corresponding HA znodes for that cluster in ZooKeeper.

 

[root@node1 ~]# /root/hadoop-2.7.2/bin/hdfs zkfc -formatZK

[root@node3 ~]# /root/hadoop-2.7.2/bin/hdfs zkfc -formatZK

 

After formatting, a znode named hadoop-ha is created in ZooKeeper:

[root@node1 ~]# /root/zookeeper-3.4.9/bin/zkCli.sh

8.11    Start the JournalNodes

[root@node1 ~]# /root/hadoop-2.7.2/sbin/hadoop-daemon.sh start journalnode

[root@node2 ~]# /root/hadoop-2.7.2/sbin/hadoop-daemon.sh start journalnode

[root@node3 ~]# /root/hadoop-2.7.2/sbin/hadoop-daemon.sh start journalnode

[root@node1 ~]# jps

1810 JournalNode

8.12    Format and Start the NameNodes

[root@node1 ~]# /root/hadoop-2.7.2/bin/hdfs namenode -format -clusterId CLUSTER_UUID_1

[root@node1 ~]# /root/hadoop-2.7.2/sbin/hadoop-daemon.sh start namenode

[root@node1 ~]# jps

1613 NameNode

 

All members of the same cluster (NameNodes, DataNodes, etc.) must share the same cluster ID:

[root@node2 ~]# /root/hadoop-2.7.2/bin/hdfs namenode -bootstrapStandby

[root@node2 ~]# /root/hadoop-2.7.2/sbin/hadoop-daemon.sh start namenode

 

[root@node3 ~]# /root/hadoop-2.7.2/bin/hdfs namenode -format -clusterId CLUSTER_UUID_1

[root@node3 ~]# /root/hadoop-2.7.2/sbin/hadoop-daemon.sh start namenode

[root@node4 ~]# /root/hadoop-2.7.2/bin/hdfs namenode -bootstrapStandby

[root@node4 ~]# /root/hadoop-2.7.2/sbin/hadoop-daemon.sh start namenode

8.13    Start zkfc

ZKFC (ZooKeeper Failover Controller) monitors the NameNode state and performs the active/standby switch; run it on every NameNode host.

 

[root@node1 ~]# /root/hadoop-2.7.2/sbin/hadoop-daemon.sh start zkfc

[root@node2 ~]# /root/hadoop-2.7.2/sbin/hadoop-daemon.sh start zkfc

[root@node1 ~]# jps

5280 DFSZKFailoverController

 

Automatic failover works:

 

[root@node3 ~]# /root/hadoop-2.7.2/sbin/hadoop-daemon.sh start zkfc

[root@node4 ~]# /root/hadoop-2.7.2/sbin/hadoop-daemon.sh start zkfc

 

8.14    Start the DataNodes

[root@node2 ~]# /root/hadoop-2.7.2/sbin/hadoop-daemon.sh start datanode

[root@node3 ~]# /root/hadoop-2.7.2/sbin/hadoop-daemon.sh start datanode

[root@node4 ~]# /root/hadoop-2.7.2/sbin/hadoop-daemon.sh start datanode

8.15    HDFS Verification

Upload to a specific cluster, here cluster2:

[root@node1 ~]# /root/hadoop-2.7.2/bin/hdfs dfs -put /root/hadoop-2.7.2.tar.gz hdfs://cluster2/

[root@node1 ~]# /root/hadoop-2.7.2/bin/hdfs dfs -put /root/test_upload.tar hdfs://cluster1:8020/

If no nameservice is specified explicitly, the upload goes to the fs.defaultFS configured in core-site.xml:

[root@node1 ~]# /root/hadoop-2.7.2/bin/hdfs dfs -put /root/hadoop-2.7.2.tar.gz /

A specific host can also be addressed directly (it must be the active NameNode):

/root/hadoop-2.7.2/bin/hdfs dfs -put /root/hadoop-2.7.2.tar hdfs://node3:8020/

/root/hadoop-2.7.2/bin/hdfs dfs -put /root/hadoop-2.7.2.tar hdfs://node3/
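A quick way to confirm which namespace each file landed in (not in the original) is to list both nameservices:

[root@node1 ~]# /root/hadoop-2.7.2/bin/hdfs dfs -ls hdfs://cluster1/
[root@node1 ~]# /root/hadoop-2.7.2/bin/hdfs dfs -ls hdfs://cluster2/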

 

8.16    HA Verification

[root@node1 ~]# /root/hadoop-2.7.2/bin/hdfs haadmin -ns cluster1 -getServiceState nn1

active

[root@node1 ~]# /root/hadoop-2.7.2/bin/hdfs haadmin -ns cluster1 -getServiceState nn2

standby

[root@node1 ~]# jps

2448 NameNode

3041 DFSZKFailoverController

3553 Jps

2647 JournalNode

2954 QuorumPeerMain

[root@node1 ~]# kill 2448

[root@node1 ~]# /root/hadoop-2.7.2/bin/hdfs haadmin -ns cluster1 -getServiceState nn2

active

8.16.1              Manual failover

/root/hadoop-2.7.2/bin/hdfs haadmin -ns cluster1 -failover nn2 nn1

/root/hadoop-2.7.2/bin/hdfs haadmin -ns cluster2 -failover nn4 nn3
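After a manual switch, the roles can be re-checked with the same haadmin command used earlier:

/root/hadoop-2.7.2/bin/hdfs haadmin -ns cluster1 -getServiceState nn1
/root/hadoop-2.7.2/bin/hdfs haadmin -ns cluster1 -getServiceState nn2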

8.17    Start YARN

[root@node1 ~]# /root/hadoop-2.7.2/sbin/yarn-daemon.sh start resourcemanager

[root@node2 ~]# /root/hadoop-2.7.2/sbin/yarn-daemon.sh start resourcemanager

 

[root@node2 ~]# /root/hadoop-2.7.2/sbin/yarn-daemon.sh start nodemanager

[root@node3 ~]# /root/hadoop-2.7.2/sbin/yarn-daemon.sh start nodemanager

[root@node4 ~]# /root/hadoop-2.7.2/sbin/yarn-daemon.sh start nodemanager

 

http://node1:8088/cluster/cluster

Note: use a URL of the form http://XXXXX/cluster/cluster; otherwise a standby ResourceManager automatically redirects to the active one.

http://node2:8088/cluster/cluster

 

Command to check the ResourceManager state:

[root@node4 logs]# /root/hadoop-2.7.2/bin/yarn rmadmin -getServiceState rm2

8.18    MapReduce Test

[root@node4 ~]# /root/hadoop-2.7.2/bin/hdfs dfs -mkdir hdfs://cluster1/hadoop

[root@node4 ~]# /root/hadoop-2.7.2/bin/hdfs dfs -put /root/hadoop-2.7.2/etc/hadoop/*xml* hdfs://cluster1/hadoop

[root@node4 ~]# /root/hadoop-2.7.2/bin/hadoop jar /root/hadoop-2.7.2/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar wordcount hdfs://cluster1:8020/hadoop/h* hdfs://cluster1:8020/hadoop/m* hdfs://cluster1/wordcountOutput

Note: the MapReduce output should live in the same cluster (namespace) as its input. The job also succeeds when the output is placed in the other cluster, but the result files then cannot be found through the web UI.
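To inspect the result (the reducer output file name part-r-00000 is the usual default and is not shown in the original):

[root@node4 ~]# /root/hadoop-2.7.2/bin/hdfs dfs -cat hdfs://cluster1/wordcountOutput/part-r-00000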

8.19    Scripts

The following scripts are run on node1.

8.19.1              Start and stop scripts

Automated interaction (expect) is used when the scripts perform the manual RM switch:

[root@node1 ~]# yum install expect

 

[root@node1 ~]# vi /root/starthadoop.sh

#rm -rf /root/hadoop-2.7.2/logs/*.*

#ssh root@node2 'export BASH_ENV=/etc/profile;rm -rf /root/hadoop-2.7.2/logs/*.*'

#ssh root@node3 'export BASH_ENV=/etc/profile;rm -rf /root/hadoop-2.7.2/logs/*.*'

#ssh root@node4 'export BASH_ENV=/etc/profile;rm -rf /root/hadoop-2.7.2/logs/*.*'

 

/root/zookeeper-3.4.9/bin/zkServer.sh start

ssh root@node2 'export BASH_ENV=/etc/profile;/root/zookeeper-3.4.9/bin/zkServer.sh start'

ssh root@node3 'export BASH_ENV=/etc/profile;/root/zookeeper-3.4.9/bin/zkServer.sh start'

 

/root/hadoop-2.7.2/sbin/start-all.sh

ssh root@node2 'export BASH_ENV=/etc/profile;/root/hadoop-2.7.2/sbin/yarn-daemon.sh start resourcemanager'

 

/root/hadoop-2.7.2/sbin/hadoop-daemon.sh start zkfc

ssh root@node2 'export BASH_ENV=/etc/profile;/root/hadoop-2.7.2/sbin/hadoop-daemon.sh start zkfc'

ssh root@node3 'export BASH_ENV=/etc/profile;/root/hadoop-2.7.2/sbin/hadoop-daemon.sh start zkfc'

ssh root@node4 'export BASH_ENV=/etc/profile;/root/hadoop-2.7.2/sbin/hadoop-daemon.sh start zkfc'

 

#ret=`/root/hadoop-2.7.2/bin/hdfs dfsadmin -safemode get | grep ON | head -1`

#while [ -n "$ret" ]

#do

#echo 'waiting to leave safe mode'

#sleep 1s

#ret=`/root/hadoop-2.7.2/bin/hdfs dfsadmin -safemode get | grep ON | head -1`

#done

 

/root/hadoop-2.7.2/bin/hdfs haadmin -ns cluster1 -failover nn2 nn1

/root/hadoop-2.7.2/bin/hdfs haadmin -ns cluster2 -failover nn4 nn3

echo 'Y' | ssh root@node1 'export BASH_ENV=/etc/profile;/root/hadoop-2.7.2/bin/yarn rmadmin -transitionToActive --forcemanual rm1'

 

/root/hadoop-2.7.2/sbin/mr-jobhistory-daemon.sh start historyserver

ssh root@node2 'export BASH_ENV=/etc/profile;/root/hadoop-2.7.2/sbin/mr-jobhistory-daemon.sh start historyserver'

ssh root@node3 'export BASH_ENV=/etc/profile;/root/hadoop-2.7.2/sbin/mr-jobhistory-daemon.sh start historyserver'

ssh root@node4 'export BASH_ENV=/etc/profile;/root/hadoop-2.7.2/sbin/mr-jobhistory-daemon.sh start historyserver'

 

# the next line starts Spark; remove it if only Hadoop is installed

/root/spark-2.1.0-bin-hadoop2.7/sbin/start-all.sh

 

echo '--------------node1---------------'

jps | grep -v Jps | sort  -k 2 -t ' '

echo '--------------node2---------------'

ssh root@node2 "export PATH=/usr/bin:$PATH;jps | grep -v Jps | sort  -k 2 -t ' '"

echo '--------------node3---------------'

ssh root@node3 "export PATH=/usr/bin:$PATH;jps | grep -v Jps | sort  -k 2 -t ' '"

echo '--------------node4---------------'

ssh root@node4 "export PATH=/usr/bin:$PATH;jps | grep -v Jps | sort  -k 2 -t ' '"

 

# the next two lines start Hive; remove them if Hive is not installed

ssh root@node4 'export BASH_ENV=/etc/profile;service mysql start'

ssh root@node3 'export BASH_ENV=/etc/profile;/root/hive-1.2.1/bin/hive --service metastore&'

[root@node1 ~]# vi /root/stophadoop.sh

# the next line stops Spark; remove it if Spark is not installed

/root/spark-2.1.0-bin-hadoop2.7/sbin/stop-all.sh

# the next two lines stop Hive; remove them if Hive is not installed

ssh root@node4 'export BASH_ENV=/etc/profile;service mysql stop'

ssh root@node3 'export BASH_ENV=/etc/profile;/root/jdk1.8.0_92/bin/jps | grep RunJar | head -1 |cut -f1 -d " "|  xargs kill'

 

ssh root@node2 'export BASH_ENV=/etc/profile;/root/hadoop-2.7.2/sbin/yarn-daemon.sh stop resourcemanager'

/root/hadoop-2.7.2/sbin/stop-all.sh

 

/root/hadoop-2.7.2/sbin/hadoop-daemon.sh stop zkfc

ssh root@node2 'export BASH_ENV=/etc/profile;/root/hadoop-2.7.2/sbin/hadoop-daemon.sh stop zkfc'

ssh root@node3 'export BASH_ENV=/etc/profile;/root/hadoop-2.7.2/sbin/hadoop-daemon.sh stop zkfc'

ssh root@node4 'export BASH_ENV=/etc/profile;/root/hadoop-2.7.2/sbin/hadoop-daemon.sh stop zkfc'

 

/root/zookeeper-3.4.9/bin/zkServer.sh stop

ssh root@node2 'export BASH_ENV=/etc/profile;/root/zookeeper-3.4.9/bin/zkServer.sh stop'

ssh root@node3 'export BASH_ENV=/etc/profile;/root/zookeeper-3.4.9/bin/zkServer.sh stop'

 

/root/hadoop-2.7.2/sbin/mr-jobhistory-daemon.sh stop historyserver

ssh root@node2 'export BASH_ENV=/etc/profile;/root/hadoop-2.7.2/sbin/mr-jobhistory-daemon.sh stop historyserver'

ssh root@node3 'export BASH_ENV=/etc/profile;/root/hadoop-2.7.2/sbin/mr-jobhistory-daemon.sh stop historyserver'

ssh root@node4 'export BASH_ENV=/etc/profile;/root/hadoop-2.7.2/sbin/mr-jobhistory-daemon.sh stop historyserver'

 

[root@node1 ~]# chmod 777 starthadoop.sh stophadoop.sh

8.19.2              Reboot and shutdown

[root@node1 ~]# vi /root/reboot.sh 

ssh root@node2 "export PATH=/usr/bin:$PATH;reboot"

ssh root@node3 "export PATH=/usr/bin:$PATH;reboot"

ssh root@node4 "export PATH=/usr/bin:$PATH;reboot"

reboot

 

[root@node1 ~]# vi /root/shutdown.sh

ssh root@node2 "export PATH=/usr/bin:$PATH;shutdown -h now"

ssh root@node3 "export PATH=/usr/bin:$PATH;shutdown -h now"

ssh root@node4 "export PATH=/usr/bin:$PATH;shutdown -h now"

shutdown -h now

 

[root@node1 ~]# chmod 777 /root/shutdown.sh /root/reboot.sh

 

8.20    Eclipse Plugin

8.20.1              Plugin installation

1. Extract hadoop-2.7.2.tar.gz (the CentOS build compiled earlier) to D:\hadoop, copy winutils.exe, hadoop.dll and the related files into the bin folder of that Hadoop directory, and also copy hadoop.dll to C:\Windows and C:\Windows\System32.

2. Add a HADOOP_HOME environment variable with the value D:\hadoop\hadoop-2.7.2 and append %HADOOP_HOME%\bin to the Path variable.

3. Double-click winutils.exe; if it complains about a missing MSVCR120.dll, install the VC++ 2013 runtime.

4. Copy hadoop-eclipse-plugin-2.7.2.jar (this plugin also has to be built on Windows, which is tedious, so look for a prebuilt one) into the Eclipse plugins directory.

5. Start Eclipse and configure:

  • Map/Reduce V2 Master: this port does not matter and does not affect remote job submission or execution. If configured correctly, job progress can be monitored inside Eclipse (this never quite worked here, although it did with hadoop 1.2.1).

  • DFS Master: the NameNode IP and port, i.e. the port from dfs.namenode.rpc-address in hdfs-site.xml; this setting decides whether the DFS tree on the left can connect to HDFS.

8.20.2              WordCount project

8.20.2.1         WordCount.java

package jzj;

 

import java.io.IOException;

import java.net.URI;

import java.util.StringTokenizer;

 

import org.apache.hadoop.conf.Configuration;

import org.apache.hadoop.fs.FileSystem;

import org.apache.hadoop.fs.Path;

import org.apache.hadoop.io.IntWritable;

import org.apache.hadoop.io.Text;

import org.apache.hadoop.mapreduce.Job;

import org.apache.hadoop.mapreduce.Mapper;

import org.apache.hadoop.mapreduce.Reducer;

import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

import org.apache.log4j.Logger;

 

public class WordCount {

	public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {

		private final static IntWritable one = new IntWritable(1);
		private Text word = new Text();
		private Logger log = Logger.getLogger(TokenizerMapper.class);

		public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
			log.debug("[Thread=" + Thread.currentThread().hashCode() + "] map task, log4j output: wordcount, key=" + key + ", value=" + value);
			System.out.println("[Thread=" + Thread.currentThread().hashCode() + "] map task, System.out output: wordcount, key=" + key + ", value=" + value);
			StringTokenizer itr = new StringTokenizer(value.toString());
			while (itr.hasMoreTokens()) {
				word.set(itr.nextToken());
				context.write(word, one);
			}
		}
	}

	public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
		private IntWritable result = new IntWritable();
		private Logger log = Logger.getLogger(IntSumReducer.class);

		public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
			int sum = 0;
			for (IntWritable val : values) {
				sum += val.get();
			}
			result.set(sum);
			context.write(key, result);
			log.debug("[Thread=" + Thread.currentThread().hashCode() + "] reduce task, log4j output: wordcount, key=" + key + ", count=" + sum);
			System.out.println("[Thread=" + Thread.currentThread().hashCode() + "] reduce task, System.out output: wordcount, key=" + key + ", count=" + sum);
		}
	}

	public static void main(String[] args) throws Exception {
		Logger log = Logger.getLogger(WordCount.class);
		log.debug("JOB main method, log4j output: wordcount");
		System.out.println("JOB main method, System.out output: wordcount");
		Configuration conf = new Configuration();
		// Note: the job jar needs an empty yarn-default.xml, otherwise a remotely submitted job hangs forever. Why?
		conf.set("mapreduce.framework.name", "yarn"); // run on the YARN framework
		conf.set("yarn.resourcemanager.address", "node1:8032"); // which host the job is submitted to
		// Required, otherwise: java.io.IOException: The ownership on the staging
		// directory /tmp/hadoop-yarn/staging/15040078/.staging is not as expected.
		// It is owned by . The directory must be owned by the submitter 15040078 or by 15040078
		conf.set("fs.defaultFS", "hdfs://node1:8020"); // the NameNode
		// Required, otherwise: Stack trace: ExitCodeException exitCode=1: /bin/bash: line 0: fg: no job control
		conf.set("mapreduce.app-submission.cross-platform", "true");

		// The key "mapred.jar" must not be changed; the value is the jar exported from this project.
		// Without it the job fails because the classes cannot be found.
		conf.set("mapred.jar", "wordcount.jar");

		Job job = Job.getInstance(conf, "wordcount");
		job.setJarByClass(WordCount.class);
		job.setMapperClass(TokenizerMapper.class);
		// If a Combiner is set, reduce logs also show up on the map side: the combiner (a reduce)
		// runs right after the map phase in the same task, so seeing reduce logs there is expected.
		// job.setCombinerClass(IntSumReducer.class);
		job.setReducerClass(IntSumReducer.class);
		job.setOutputKeyClass(Text.class);
		job.setOutputValueClass(IntWritable.class);
		// job.setNumReduceTasks(4);
		FileInputFormat.addInputPath(job, new Path("hdfs://node1/hadoop/core-site.xml"));
		FileInputFormat.addInputPath(job, new Path("hdfs://node1/hadoop/m*"));

		FileSystem fs = FileSystem.get(URI.create("hdfs://node1"), conf);
		fs.delete(new Path("/wordcountOutput"), true);

		FileOutputFormat.setOutputPath(job, new Path("hdfs://node1/wordcountOutput"));

		boolean success = job.waitForCompletion(true);
		System.out.println(job.getStatus().getJobID());
		System.exit(success ? 0 : 1);
	}
}

 

8.20.2.2      yarn-default.xml

Note: the yarn-default.xml in the project is an empty file, but testing shows it must be present.

8.20.2.3      build.xml

<project default="jar" name="Acid">
	<property name="lib.dir" value="D:/hadoop/hadoop-2.7.2/share/hadoop"/>
	<property name="src.dir" value="../src"/>
	<property name="classes.dir" value="../bin"/>

	<property name="output.dir" value=".."/>
	<property name="jarname" value="wordcount.jar"/>
	<property name="mainclass" value="jzj.WordCount"/>

	<!-- classpath for third-party jars -->
	<path id="lib-classpath">
		<fileset dir="${lib.dir}">
			<include name="**/*.jar"/>
		</fileset>
	</path>

	<!-- 1. init: create directories, clean old output -->
	<target name="init">
		<mkdir dir="${classes.dir}"/>
		<mkdir dir="${output.dir}"/>
		<delete file="${output.dir}/wordcount.jar"/>
		<delete verbose="true" includeemptydirs="true">
			<fileset dir="${classes.dir}">
				<include name="**/*"/>
			</fileset>
		</delete>
	</target>

	<!-- 2. compile -->
	<target name="compile" depends="init">
		<javac srcdir="${src.dir}" destdir="${classes.dir}" includeantruntime="on">
			<compilerarg line="-encoding GBK"/>
			<classpath refid="lib-classpath"/>
		</javac>
	</target>

	<!-- 3. build the jar -->
	<target name="jar" depends="compile">
		<copy todir="${classes.dir}">
			<fileset dir="${src.dir}">
				<include name="**"/>
				<exclude name="build.xml"/>
				<!-- Note: do not exclude log4j.properties; it must go into the jar, otherwise nothing
				     is logged at run time. That log4j config only applies to the JOB, i.e. logs produced
				     on the submitting client; the TASKs (map/reduce) are governed by
				     /root/hadoop-2.7.2/etc/hadoop/log4j.properties -->
				<!-- exclude name="log4j.properties" / -->
			</fileset>
		</copy>
		<!-- output path of the jar -->
		<jar destfile="${output.dir}/${jarname}" basedir="${classes.dir}">
			<manifest>
				<attribute name="Main-class" value="${mainclass}"/>
			</manifest>
		</jar>
	</target>
</project>

8.20.2.4      log4j.properties

log4j.rootLogger=info,stdout,R 

log4j.appender.stdout=org.apache.log4j.ConsoleAppender 

log4j.appender.stdout.layout=org.apache.log4j.PatternLayout 

log4j.appender.stdout.layout.ConversionPattern=%5p-%m%n 

log4j.appender.R=org.apache.log4j.RollingFileAppender 

log4j.appender.R.File=mapreduce_test.log 

log4j.appender.R.MaxFileSize=1MB 

log4j.appender.R.MaxBackupIndex=1

log4j.appender.R.layout=org.apache.log4j.PatternLayout 

log4j.appender.R.layout.ConversionPattern=%p%t%c-%m%n 

 

log4j.logger.jzj =DEBUG

8.20.3              Package and run

Open the build.xml build file in the project and press SHIFT+ALT+X, Q to produce the job jar inside the project.

The jar layout looks like this:

Then open the WordCount.java source file in the project and click:

8.20.4              Permission issues

If the following exception is thrown at run time:

 

Exception in thread "main" org.apache.hadoop.security.AccessControlException: Permission denied: user=15040078, access=EXECUTE, inode="/tmp/hadoop-yarn/staging/15040078/.staging/job_1484039063795_0001":root:supergroup:drwxrwx---

       at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:319)

       at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkTraverse(FSPermissionChecker.java:259)

       at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:205)

       at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:190)

       at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1720)

       at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1704)

       at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkOwner(FSDirectory.java:1673)

       at org.apache.hadoop.hdfs.server.namenode.FSDirAttrOp.setPermission(FSDirAttrOp.java:61)

       at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.setPermission(FSNamesystem.java:1653)

       at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.setPermission(NameNodeRpcServer.java:695)

       at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.setPermission(ClientNamenodeProtocolServerSideTranslatorPB.java:453)

       at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)

       at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)

       at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)

       at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)

       at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)

       at java.security.AccessController.doPrivileged(Native Method)

       at javax.security.auth.Subject.doAs(Subject.java:422)

       at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)

       at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2043)

 

open up the permissions on HDFS:

[root@node1 ~]# /root/hadoop-2.7.2/bin/hdfs dfs -chmod -R 777 /

8.21    Killing a Job

If a submitted job gets stuck and makes no progress, it can be killed:

[root@node1 ~]# /root/hadoop-2.7.2/bin/hadoop job -list

[root@node1 ~]# /root/hadoop-2.7.2/bin/hadoop job -kill job_1475762778825_0008
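On YARN the same thing can be done with the yarn CLI; the application id mirrors the job id (e.g. application_1475762778825_0008 for the job above):

[root@node1 ~]# /root/hadoop-2.7.2/bin/yarn application -list
[root@node1 ~]# /root/hadoop-2.7.2/bin/yarn application -kill application_1475762778825_0008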

8.22    Logs

8.22.1              Hadoop service logs

The logs produced by the built-in services (NameNode, SecondaryNameNode, HistoryServer, ResourceManager, DataNode, NodeManager, etc.) go to ${HADOOP_HOME}/logs by default and can also be viewed through the web UI:

http://node1:19888/logs/

These correspond to local log files on each host; log in to the host to see the raw files.

When a log file reaches a certain size it is rolled into a new file; the larger the trailing number, the older the log. By default only the last 20 files are kept. The log location and size are configured in ${HADOOP_HOME}/etc/hadoop/log4j.properties, and the environment variables used there are set by the other configuration files under ${HADOOP_HOME}/etc/hadoop/.

*.out files: standard output is redirected here.

 

http://node2:19888/logs/

http://node3:19888/logs/

http://node4:19888/logs/

They can also be reached by clicking through the web UI:

 

8.22.2              MapReduce logs

MapReduce logs fall into job-history logs and container logs.

(1) Job-history records contain how many map and reduce tasks a job used, the submission time, the start time, the completion time, and so on. They are useful for analysis: how many jobs succeeded or failed per day, how many jobs each queue ran, etc. Their location is configured as follows.

Note: this kind of log is stored on HDFS.

(2) Container logs contain the ApplicationMaster log and the logs of the ordinary tasks.

YARN offers two places to keep container logs:

1)  HDFS: when log aggregation is enabled (yarn.log-aggregation-enable), container logs are copied to HDFS and the local copies are deleted; the location is set by yarn.nodemanager.remote-app-log-dir in yarn-site.xml and defaults to /tmp/logs on HDFS:

<property>

    <description>Where to aggregate logs to.</description>

    <name>yarn.nodemanager.remote-app-log-dir</name>

    <value>/tmp/logs</value>

  </property>

Default configuration of the subdirectory under /tmp/logs:

<property>

    <description>The remote log dir will be created at {yarn.nodemanager.remote-app-log-dir}/${user}/{thisParam}

    </description>

    <name>yarn.nodemanager.remote-app-log-dir-suffix</name>

    <value>logs</value>

  </property>

By default these local logs are kept under ${HADOOP_HOME}/logs/userlogs; this can be changed with the configuration below.

2)  Local: when log aggregation is disabled (yarn.log-aggregation-enable is false), the logs stay on the machine that executed the task, under $HADOOP_HOME/logs/userlogs, and are not moved to HDFS after the job finishes.

 

 

The logs of running and finished jobs can be reached from http://node1:8088/cluster/apps;

clicking the corresponding links shows the log of every map and reduce task.

8.22.3              System.out

System.out in the job's main method is printed on the terminal of the node that submits the job; if the job is submitted remotely from Eclipse, it appears in the Eclipse console.

If the job is submitted on a remote server, the output shows up on the terminal of whichever node the job was started from.

Output from the Map or Reduce classes goes into files under ${HADOOP_HOME}/logs/userlogs (if log aggregation is enabled these are moved to HDFS once the task finishes, so look at them before the job completes):

These logs can also be viewed through the http://node1:8088/cluster/apps pages.

 

8.22.4              log4j

When started from Eclipse:

Logging from the job-submission code (the main method) and the console output during the run are controlled by the log4j.properties packaged inside the job jar.

Because that log4j.properties configures a console appender, the output is printed straight to the Eclipse console. Besides the main-method logging there is a large amount of framework logging produced while the job runs, also emitted via log4j; all of it (main-method output plus framework output) is written to mapreduce_test.log as well.

When the job is submitted to the cluster, the relevant configuration file is /root/hadoop-2.7.2/etc/hadoop/log4j.properties.

The log level of the MapReduce tasks is configured in mapred-site.xml; these are the defaults:

<property>

  <name>mapreduce.map.log.level</name>

  <value>INFO</value>

  <description>The logging level for the map task. The allowed levels are:

  OFF, FATAL, ERROR, WARN, INFO, DEBUG, TRACE and ALL.

  The setting here could be overridden if "mapreduce.job.log4j-properties-file"

  is set.

  </description>

</property>

 

<property>

  <name>mapreduce.reduce.log.level</name>

  <value>INFO</value>

  <description>The logging level for the reduce task. The allowed levels are:

  OFF, FATAL, ERROR, WARN, INFO, DEBUG, TRACE and ALL.

  The setting here could be overridden if "mapreduce.job.log4j-properties-file"

  is set.

  </description>

</property>

 

log4j output from the Map and Reduce classes goes straight into the files under ${HADOOP_HOME}/logs/userlogs (moved to HDFS after the task finishes if log aggregation is enabled), not into the log file named in /root/hadoop-2.7.2/etc/hadoop/log4j.properties (that file points to hadoop.log by default, which never seems to show up).

Note: if a Combiner is set, reduce logs also appear on the map side; with a Combiner, a reduce runs right after the map phase in the same task, so seeing reduce logs there is expected.

9                      MySQL

1. Download the MySQL repo package:

[root@node4 ~]# wget http://repo.mysql.com/mysql-community-release-el7-5.noarch.rpm

2. Install mysql-community-release-el7-5.noarch.rpm:

[root@node4 ~]# rpm -ivh mysql-community-release-el7-5.noarch.rpm

Installing this package adds two MySQL yum repos: /etc/yum.repos.d/mysql-community.repo and /etc/yum.repos.d/mysql-community-source.repo.

3. Install MySQL:

[root@node4 ~]# yum install mysql-server

4. Start the database:

[root@node4 /root]# service mysql start

5. Set the root password:

[root@node4 /root]# mysqladmin -u root password 'AAAaaa111'

6. Allow remote access. For security, only local logins are allowed by default, so remote access has to be granted explicitly:

[root@node4 /root]# mysql -h localhost -u root -p

Enter password: AAAaaa111

mysql> GRANT ALL PRIVILEGES ON *.* TO 'root'@'%' IDENTIFIED BY 'AAAaaa111' WITH GRANT OPTION;

mysql> flush privileges;

7. Check the database character sets:

mysql> show variables like 'character%';

8. Change the character set:

[root@node4 /root]# vi /etc/my.cnf

[client]
default-character-set=utf8
[mysql]
default-character-set=utf8
[mysqld]
character-set-server=utf8

9. Case-sensitivity setting, so that table names are not case sensitive:

[root@node4 /root]# vi /etc/my.cnf

[mysqld]
lower_case_table_names = 1

where 0 means case sensitive and 1 means case insensitive.

10. Restart the service:

[root@node4 /root]# service mysql stop

[root@node4 /root]# service mysql start

11. [root@node4 /root]# mysql -h localhost -u root -p

12. Check the character sets again after the change:

mysql> show variables like 'character%';

13. Create the database:

mysql> create database hive;

14. List the databases:

mysql> show databases;

15. Switch to the database:

mysql> use hive;

16. List the tables in the database:

mysql> show tables;

17. Quit:

mysql> exit;

 

10               Hive Installation

10.1    Three installation modes

Basic concept: the metastore consists of two parts, the service process and the storage for the metadata.

The figure on page 374 of "Hadoop: The Definitive Guide", 2nd edition, shows the three layouts:

[Figure: Hive metastore deployment modes]

1. The top layout is embedded mode: the hive service and the metastore service run in the same process, and the Derby database also runs inside that process. This mode needs no special configuration.

2. The middle layout is local mode: the hive service and the metastore service still run in the same process, but MySQL runs as a separate process, on the same machine or a remote one. This mode only requires pointing ConnectionURL in hive-site.xml at MySQL and configuring the driver name and the database credentials.

[Figure: Hive local vs. remote metastore]

3. The bottom layout is remote mode: the hive service and the metastore run in different processes, possibly on different machines. This mode requires setting hive.metastore.local to false and hive.metastore.uris to the metastore server URI(s), comma-separated if there are several. The URI format is thrift://host:port (Thrift is Hive's communication protocol):

<property>
<name>hive.metastore.uris</name>
<value>thrift://127.0.0.1:9083</value>
</property>

With this in mind it becomes clear that merely connecting to a remote MySQL does not make a setup "remote mode"; remote or not refers to whether the metastore service and the hive service run in the same process, in other words how far apart the metastore and the hive service are.

10.2    Remote Mode Installation

Hive is installed on node1 and the metastore service on node3:

1. Download from http://apache.fayea.com/hive. Hadoop here is 2.7.2, so download apache-hive-1.2.1-bin.tar.gz:

[root@node1 ~]# wget http://apache.fayea.com/hive/stable/apache-hive-1.2.1-bin.tar.gz

2. [root@node1 ~]# tar -zxvf apache-hive-1.2.1-bin.tar.gz

3. [root@node1 ~]# mv apache-hive-1.2.1-bin hive-1.2.1

4. [root@node1 ~]# vi /etc/profile

export HIVE_HOME=/root/hive-1.2.1

export PATH=.:$PATH:$JAVA_HOME/bin:$HIVE_HOME/bin

5. [root@node1 ~]# source /etc/profile

6. Put the mysql-connector-java-5.6-bin.jar driver under /root/hive-1.2.1/lib/

7. [root@node1 ~]# cp /root/hive-1.2.1/conf/hive-env.sh.template /root/hive-1.2.1/conf/hive-env.sh

8. [root@node1 ~]# vi /root/hive-1.2.1/conf/hive-env.sh

After these steps Hive can already be started with its default configuration (using the embedded Derby database). Note: start Hadoop before running Hive:

[root@node1 ~]# hive

Logging initialized using configuration in jar:file:/root/hive-1.2.1/lib/hive-common-1.2.1.jar!/hive-log4j.properties

hive>

9. Copy Hive from node1 to node3:

[root@node1 ~]# scp -r /root/hive-1.2.1 node3:/root

[root@node1 ~]# scp /etc/profile node3:/etc/profile

[root@node3 ~]# source /etc/profile

10. [root@node1 ~]# vi /root/hive-1.2.1/conf/hive-site.xml

<configuration>

<property>

<name>hive.metastore.uris</name>

<value>thrift://node3:9083</value>

</property>   

</configuration>

 

11. [root@node3 ~]# vi /root/hive-1.2.1/conf/hive-site.xml

<configuration>

    <property>

      <name>hive.metastore.warehouse.dir</name>

      <value>/user/hive/warehouse</value>

    </property>

 

    <property>

      <name>javax.jdo.option.ConnectionURL</name>

      <value>jdbc:mysql://node4:3306/hive?createDatabaseIfNotExist=true&amp;characterEncoding=UTF-8</value>

    </property>

 

    <property>

      <name>javax.jdo.option.ConnectionDriverName</name>

      <value>com.mysql.jdbc.Driver</value>

    </property>

 

    <property>

      <name>javax.jdo.option.ConnectionUserName</name>

      <value>root</value>

    </property>

 

    <property>

      <name>javax.jdo.option.ConnectionPassword</name>

      <value>AAAaaa111</value>

    </property>

</configuration>

 

12. Start the metastore service:

[root@node3 ~]# hive --service metastore&

[1] 2561

Starting Hive Metastore Server

[root@hadoop-slave1 /root]# jps

2561 RunJar

The & runs the metastore service in the background.

 

13. Start the Hive server:

[root@node1 ~]# hive --service hiveserver2 &

[1] 3310

[root@hadoop-master /root]# jps

3310 RunJar

The process also shows up as RunJar.

Note: do not start the service with "hive --service hiveserver", which throws:

Exception in thread "main" java.lang.ClassNotFoundException: org.apache.hadoop.hive.service.HiveServer

Starting the hive shell with the plain hive command already brings up a hiveserver, so in remote mode only the metastore needs to be started separately; this step can therefore be skipped and the hive shell entered directly.

 

14. Start the hive command line:

[root@hadoop-master /root]# hive

Logging initialized using configuration in jar:file:/root/hive-1.2.1/lib/hive-common-1.2.1.jar!/hive-log4j.properties

hive>

Note: starting hive also starts a hiveserver, so running hive --service hiveserver2 & is not necessary.

 

15. Verify Hive:

[root@hadoop-master /root]# hive

 

Logging initialized using configuration in jar:file:/root/hive-1.2.1/lib/hive-common-1.2.1.jar!/hive-log4j.properties

hive> show tables;

OK

Time taken: 1.011 seconds

hive> create table test(id int,name string);

One of the following two exceptions may appear:

FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:For direct MetaStore DB connections, we don't support retries at the client level.)

 

FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:javax.jdo.JDODataStoreException: An exception was thrown while adding/validating class(es) : Specified key was too long; max key length is 767 bytes

com.mysql.jdbc.exceptions.MySQLSyntaxErrorException: Specified key was too long; max key length is 767 bytes

 

This is caused by the database character set; log in to MySQL and change it:

[root@node4 /root]# mysql -h localhost -u root -p

mysql> alter database hive character set latin1;

 

16. Log in to MySQL to inspect the metadata:

mysql> use hive;

17. Check the warehouse directory on HDFS:

[root@node1 ~]# hadoop-2.7.2/bin/hdfs dfs -ls /user/hive/warehouse

Found 1 items

drwxr-xr-x   - root supergroup          0 2017-01-22 23:45 /user/hive/warehouse/test

11               Scala安裝

一、    [root@node1 ~]# wget -O /root/scala-2.12.1.tgz http://downloads.lightbend.com/scala/2.12.1/scala-2.12.1.tgz

二、    [root@node1 ~]# tar -zxvf /root/scala-2.12.1.tgz

三、    [root@node1 ~]# vi /etc/profile

export SCALA_HOME=/root/scala-2.12.1

export PATH=.:$PATH:$JAVA_HOME/bin:$HIVE_HOME/bin:$SCALA_HOME/bin

4.    [root@node1 ~]# source /etc/profile

5.    [root@node1 ~]# scala -version

Scala code runner version 2.12.1 -- Copyright 2002-2016, LAMP/EPFL and Lightbend, Inc.

 

[root@node1 ~]# scala

Welcome to Scala 2.12.1 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_92).

Type in expressions for evaluation. Or try :help.

 

scala> 9*9;

res0: Int = 81

 

scala>

6.    [root@node1 ~]# scp -r /root/scala-2.12.1 node2:/root

[root@node1 ~]# scp -r /root/scala-2.12.1 node3:/root

[root@node1 ~]# scp -r /root/scala-2.12.1 node4:/root

[root@node1 ~]# scp /etc/profile node2:/etc

[root@node1 ~]# scp /etc/profile node3:/etc

[root@node1 ~]# scp /etc/profile node4:/etc

[root@node2 ~]# source /etc/profile

[root@node3 ~]# source /etc/profile

[root@node4 ~]# source /etc/profile

12               Spark Installation

1.    [root@node1 ~]# wget -O /root/spark-2.1.0-bin-hadoop2.7.tgz http://d3kbcqa49mib13.cloudfront.net/spark-2.1.0-bin-hadoop2.7.tgz

2.    [root@node1 ~]# tar -zxvf /root/spark-2.1.0-bin-hadoop2.7.tgz

3.    [root@node1 ~]# vi /etc/profile

export SPARK_HOME=/root/spark-2.1.0-bin-hadoop2.7

export PATH=.:$PATH:$JAVA_HOME/bin:$HIVE_HOME/bin:$SCALA_HOME/bin:$SPARK_HOME/bin

4.    [root@node1 ~]# source /etc/profile

5.    [root@node1 ~]# cp /root/spark-2.1.0-bin-hadoop2.7/conf/spark-env.sh.template /root/spark-2.1.0-bin-hadoop2.7/conf/spark-env.sh

6.    [root@node1 ~]# vi /root/spark-2.1.0-bin-hadoop2.7/conf/spark-env.sh

export SCALA_HOME=/root/scala-2.12.1

export JAVA_HOME=/root/jdk1.8.0_92

export HADOOP_CONF_DIR=/root/hadoop-2.7.2/etc/hadoop

7.    [root@node1 ~]# cp /root/spark-2.1.0-bin-hadoop2.7/conf/slaves.template /root/spark-2.1.0-bin-hadoop2.7/conf/slaves

8.    [root@node1 ~]# vi /root/spark-2.1.0-bin-hadoop2.7/conf/slaves

List the Worker hosts in this file, one per line (node2, node3 and node4 here, matching the jps output in step 10):

node2
node3
node4

9.    [root@node1 ~]# scp -r /root/spark-2.1.0-bin-hadoop2.7 node2:/root

[root@node1 ~]# scp -r /root/spark-2.1.0-bin-hadoop2.7 node3:/root

[root@node1 ~]# scp -r /root/spark-2.1.0-bin-hadoop2.7 node4:/root

[root@node1 ~]# scp /etc/profile node2:/etc

[root@node1 ~]# scp /etc/profile node3:/etc

[root@node1 ~]# scp /etc/profile node4:/etc

[root@node2 ~]# source /etc/profile

[root@node3 ~]# source /etc/profile

[root@node4 ~]# source /etc/profile

10.    Start the standalone cluster and check the processes on each node:

[root@node1 conf]# /root/spark-2.1.0-bin-hadoop2.7/sbin/start-all.sh

[root@node1 ~]# jps

2569 Master

[root@node2 ~]# jps

2120 Worker

 [root@node3 ~]# jps

2121 Worker

[root@node4 ~]# jps

2198 Worker
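The standalone Master also serves a web UI, by default on port 8080 (Spark picks the next free port if 8080 is taken). A quick check from the command line (a hedged example using wget, which was installed earlier):

[root@node1 ~]# wget -qO- http://node1:8080 | head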

 

12.1    Testing

Test directly in the Spark shell:

[root@node1 conf]# spark-shell

val file=sc.textFile("hdfs://node1/hadoop/core-site.xml")

val rdd = file.flatMap(line => line.split(" ")).map(word => (word,1)).reduceByKey(_+_)

rdd.collect()

rdd.foreach(println)
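Optionally (a hedged addition; the output path is illustrative), the result can be written back to HDFS from the same shell:

rdd.saveAsTextFile("hdfs://node1/output/spark-wordcount")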

 

Use spark-submit to run the WordCount example that ships with Hadoop:

[root@node1 ~]# spark-submit --master spark://node1:7077 --class org.apache.hadoop.examples.WordCount --name wordcount /root/hadoop-2.7.2/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar hdfs://node1/hadoop/core-site.xml hdfs://node1/output

However, this still runs as a MapReduce job rather than a Spark job: the example jar is written against the plain Java MapReduce API and does not use Spark anywhere.

 

Test with the WordCount example that ships with Spark:

spark-submit --master spark://node1:7077 --class org.apache.spark.examples.JavaWordCount --name wordcount /root/spark-2.1.0-bin-hadoop2.7/examples/jars/spark-examples_2.11-2.1.0.jar hdfs://node1/hadoop/core-site.xml hdfs://node1/output

This example is also implemented in Java, but it is built on the Spark API, so it runs as a Spark job:

12.2    Hive startup issue

Fix for Hive failing to start once Spark 2.x is installed, complaining that ../lib/spark-assembly-*.jar: No such file or directory (Spark 2.x no longer ships an assembly jar under lib/; its jars live under jars/):

 

[root@node1 ~]# vi /root/hive-1.2.1/bin/hive

  #sparkAssemblyPath=`ls ${SPARK_HOME}/lib/spark-assembly-*.jar`

  sparkAssemblyPath=`ls ${SPARK_HOME}/jars/*.jar`

[root@node1 ~]# scp /root/hive-1.2.1/bin/hive node3:/root/hive-1.2.1/bin

 

13               Cleanup and Compression

yum stores downloaded packages and headers in its cache and does not remove them automatically. Clear the YUM cache:

[root@node1 ~]# yum clean all

[root@node1 ~]# dd if=/dev/zero of=/0bits bs=20M       # fill the free space with zeros so the virtual disk can be shrunk later; dd ends with a "no space left on device" message, which can be ignored

[root@node1 ~]# rm  /0bits                           # remove the zero-fill file created above

 

Shut down the virtual machine, open cmd on the Windows host, cd into the VMware installation folder (e.g. D:\BOE4), and run:

vmware-vdiskmanager -k  D:\hadoop\spark\VM\node1\node1.vmdk       (note: point at the parent .vmdk file, not one of its split child files)

14               Common Hadoop 2.x Ports

 

| Component | Node | Default port | Configuration | Purpose |
|-----------|------|--------------|---------------|---------|
| HDFS | DataNode | 50010 | dfs.datanode.address | DataNode data-transfer port |
| HDFS | DataNode | 50075 | dfs.datanode.http.address | HTTP service port |
| HDFS | DataNode | 50475 | dfs.datanode.https.address | HTTPS service port |
| HDFS | DataNode | 50020 | dfs.datanode.ipc.address | IPC service port |
| HDFS | NameNode | 50070 | dfs.namenode.http-address | HTTP service port |
| HDFS | NameNode | 50470 | dfs.namenode.https-address | HTTPS service port |
| HDFS | NameNode | 8020 | fs.defaultFS | RPC port that accepts client connections, used to fetch filesystem metadata |
| HDFS | JournalNode | 8485 | dfs.journalnode.rpc-address | RPC service |
| HDFS | JournalNode | 8480 | dfs.journalnode.http-address | HTTP service |
| HDFS | ZKFC | 8019 | dfs.ha.zkfc.port | ZooKeeper FailoverController, used for NameNode HA |
| YARN | ResourceManager | 8032 | yarn.resourcemanager.address | RM applications manager (ASM) port |
| YARN | ResourceManager | 8030 | yarn.resourcemanager.scheduler.address | Scheduler IPC port |
| YARN | ResourceManager | 8031 | yarn.resourcemanager.resource-tracker.address | IPC |
| YARN | ResourceManager | 8033 | yarn.resourcemanager.admin.address | IPC |
| YARN | ResourceManager | 8088 | yarn.resourcemanager.webapp.address | HTTP (web UI) port |
| YARN | NodeManager | 8040 | yarn.nodemanager.localizer.address | Localizer IPC |
| YARN | NodeManager | 8042 | yarn.nodemanager.webapp.address | HTTP (web UI) port |
| YARN | NodeManager | 8041 | yarn.nodemanager.address | NM container manager port |
| YARN | JobHistory Server | 10020 | mapreduce.jobhistory.address | IPC |
| YARN | JobHistory Server | 19888 | mapreduce.jobhistory.webapp.address | HTTP (web UI) port |
| HBase | Master | 60000 | hbase.master.port | IPC |
| HBase | Master | 60010 | hbase.master.info.port | HTTP (web UI) port |
| HBase | RegionServer | 60020 | hbase.regionserver.port | IPC |
| HBase | RegionServer | 60030 | hbase.regionserver.info.port | HTTP (web UI) port |
| HBase | HQuorumPeer | 2181 | hbase.zookeeper.property.clientPort | HBase-managed ZK mode; not used when an external ZooKeeper cluster is used |
| HBase | HQuorumPeer | 2888 | hbase.zookeeper.peerport | HBase-managed ZK mode; not used when an external ZooKeeper cluster is used |
| HBase | HQuorumPeer | 3888 | hbase.zookeeper.leaderport | HBase-managed ZK mode; not used when an external ZooKeeper cluster is used |
| Hive | Metastore | 9083 | export PORT=<port> in /etc/default/hive-metastore to change the default | Metastore Thrift service port |
| Hive | HiveServer2 | 10000 | export HIVE_SERVER2_THRIFT_PORT=<port> in /etc/hive/conf/hive-env.sh to change the default | HiveServer2 Thrift service port |
| ZooKeeper | Server | 2181 | clientPort=<port> in /etc/zookeeper/conf/zoo.cfg | Port that serves client requests |
| ZooKeeper | Server | 2888 | server.x=[hostname]:nnnnn[:nnnnn] in /etc/zookeeper/conf/zoo.cfg (first port) | Used by followers to connect to the leader; only the leader listens on it |
| ZooKeeper | Server | 3888 | server.x=[hostname]:nnnnn[:nnnnn] in /etc/zookeeper/conf/zoo.cfg (second port) | Used for leader election; only needed when electionAlg is 1, 2 or 3 (the default) |
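To check which of these ports are actually listening on a node, a quick spot check can be done with ss (a hedged example; only a few HDFS/YARN ports are queried here):

[root@node1 ~]# ss -lntp | grep -E ':(8020|50070|8088)'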

 

15               Linux Commands

Find files larger than 10 MB:

find . -type f -size +10M  -print0 | xargs -0 du -h | sort -nr

 

List the 20 largest directories; --max-depth limits how deep to descend (remove it to traverse all subdirectories):

du -hm --max-depth=5 / | sort -nr | head -20

 

find /etc -name '*srm*'   # find all files under /etc whose names contain "srm"

 

 

Clear the YUM cache

yum stores downloaded packages and headers in its cache and does not remove them automatically. If they are taking up disk space, remove them with yum clean: yum clean headers removes the cached headers, yum clean packages removes the downloaded rpm packages, and yum clean all removes everything.

 

Change the owner of a directory tree (recursive, verbose output):

chown -R -v 15040078 /tmp

 

16               Hadoop Filesystem Commands

[root@node1 ~/hadoop-2.7.2/bin]# ./hdfs dfs -chmod -R 700 /tmp
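A few more commands in the same vein (illustrative additions; the paths match those used earlier in this document):

[root@node1 ~]# /root/hadoop-2.7.2/bin/hdfs dfs -ls -R /user/hive/warehouse

[root@node1 ~]# /root/hadoop-2.7.2/bin/hdfs dfs -du -h /

[root@node1 ~]# /root/hadoop-2.7.2/bin/hdfs dfs -rm -r /output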

 
