This document records the installation of a Hadoop + Hive + Spark cluster, including HA (high availability) configuration for the NameNode and ResourceManager, as well as horizontal scaling of the NameNode (HDFS Federation).
Set the subnet IP to 192.168.1.0:
Set the gateway to 192.168.1.2:
and disable DHCP.
After this configuration, the IP of virtual network adapter VMnet8 becomes 192.168.1.1:
(It does not matter that the virtual machines and the physical host are not on the same subnet.)
http://mirrors.neusoft.edu.cn/centos/7/isos/x86_64/CentOS-7-x86_64-Minimal-1511.iso
Download the minimal installation version without a desktop.
Activate the network adapter and set its IP address:
Set the gateway and DNS to the gateway configured above for virtual adapter VMnet8.
Once the network adapter is up, a SecureCRT terminal can be used to connect to Linux remotely, which makes the following steps easier. How to connect is omitted here.
After connecting, make the following basic settings:
/etc/sysconfig/network
/etc/hostname
/etc/hosts
192.168.1.11 node1
192.168.1.12 node2
192.168.1.13 node3
192.168.1.14 node4
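The contents of the hostname files listed above are not shown; as a sketch for node1 (an assumption, each node uses its own name), they would be roughly:
# /etc/hostname - a single line with the node's short name
node1
# /etc/sysconfig/network - legacy file, still read by some tools
NETWORKING=yes
HOSTNAME=node1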
Because the company accesses the Internet through a proxy, yum cannot reach the network to search for packages.
Set the yum proxy: vi /etc/yum.conf
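The exact line added to /etc/yum.conf is not reproduced here; a minimal sketch, assuming the same proxy address as the wgetrc example further below, would be:
# /etc/yum.conf - add under the [main] section
proxy=http://10.19.110.55:8080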
Run yum again; it can now reach the network and search for packages:
After wget is installed, the wget configuration file wgetrc appears under /etc, where the wget proxy can be configured:
[root@node1 ~]# vi /etc/wgetrc
http_proxy = http://10.19.110.55:8080
https_proxy = http://10.19.110.55:8080
ftp_proxy = http://10.19.110.55:8080
To keep the virtual machine's clock synchronized with the host, install VMware Tools.
[root@node1 opt]# yum -y install perl
[root@node1 ~]# mount /dev/cdrom /mnt
[root@node1 ~]# tar -zxvf /mnt/VMwareTools-9.6.1-1378637.tar.gz -C /root
[root@node1 ~]# umount /dev/cdrom
[root@node1 ~]# /root/vmware-tools-distrib/vmware-install.pl
[root@node1 ~]# rm -rf /root/vmware-tools-distrib
Note: do not install the file sharing and mouse drag-and-drop features below, otherwise the installation will run into problems:
[root@node1 ~]# chkconfig --list | grep vmware
vmware-tools 0:off 1:off 2:on 3:on 4:on 5:on 6:off
vmware-tools-thinprint 0:off 1:off 2:on 3:on 4:on 5:on 6:off
[root@node1 ~]# chkconfig vmware-tools-thinprint off
[root@node1 ~]# find / -name *vmware-tools-thinprint* | xargs rm -rf
The following error message appears right after startup:
Modifying the virtual machine configuration file node1.vmx fixes it:
vcpu.hotadd = "FALSE"
mem.hotadd = "FALSE"
[root@node1 ~]# vim /etc/default/grub
GRUB_TIMEOUT=0 #the default is 5 seconds
[root@node1 ~]# grub2-mkconfig -o /boot/grub2/grub.cfg
(Note: disable this tweak on low-memory hosts)
Modify the node1.vmx file:
mainMem.useNamedFile = "FALSE"
To get a full-screen display and make command-line input easier, make the following adjustments:
and hide the status bar:
[root@node1 ~]# reboot
[root@node1 ~]# shutdown -h now
#List the services enabled at boot
[root@node1 ~]# systemctl list-unit-files | grep enabled | sort
auditd.service enabled
crond.service enabled
dbus-org.freedesktop.NetworkManager.service enabled
dbus-org.freedesktop.nm-dispatcher.service enabled
default.target enabled
dm-event.socket enabled
getty@.service enabled
irqbalance.service enabled
lvm2-lvmetad.socket enabled
lvm2-lvmpolld.socket enabled
lvm2-monitor.service enabled
microcode.service enabled
multi-user.target enabled
NetworkManager-dispatcher.service enabled
NetworkManager.service enabled
postfix.service enabled
remote-fs.target enabled
rsyslog.service enabled
sshd.service enabled
systemd-readahead-collect.service enabled
systemd-readahead-drop.service enabled
systemd-readahead-replay.service enabled
tuned.service enabled
[root@node1 ~]# systemctl | grep running | sort
crond.service loaded active running Command Scheduler
dbus.service loaded active running D-Bus System Message Bus
dbus.socket loaded active running D-Bus System Message Bus Socket
getty@tty1.service loaded active running Getty on tty1
irqbalance.service loaded active running irqbalance daemon
lvm2-lvmetad.service loaded active running LVM2 metadata daemon
lvm2-lvmetad.socket loaded active running LVM2 metadata daemon socket
NetworkManager.service loaded active running Network Manager
polkit.service loaded active running Authorization Manager
postfix.service loaded active running Postfix Mail Transport Agent
rsyslog.service loaded active running System Logging Service
session-1.scope loaded active running Session 1 of user root
session-2.scope loaded active running Session 2 of user root
session-3.scope loaded active running Session 3 of user root
sshd.service loaded active running OpenSSH server daemon
systemd-journald.service loaded active running Journal Service
systemd-journald.socket loaded active running Journal Socket
systemd-logind.service loaded active running Login Service
systemd-udevd-control.socket loaded active running udev Control Socket
systemd-udevd-kernel.socket loaded active running udev Kernel Socket
systemd-udevd.service loaded active running udev Kernel Device Manager
tuned.service loaded active running Dynamic System Tuning Daemon
vmware-tools.service loaded active running SYSV: Manages the services needed to run VMware software
wpa_supplicant.service loaded active running WPA Supplicant daemon
#Check the status of a service
systemctl status auditd.service
#Enable a service at boot
systemctl enable auditd.service
#Disable a service at boot
systemctl disable auditd.service
systemctl disable postfix.service
systemctl disable rsyslog.service
systemctl disable wpa_supplicant.service
#Check whether a service is enabled at boot
systemctl is-enabled auditd.service
find . -type f -size +10M -print0 | xargs -0 du -h | sort -nr
List the 20 largest directories; --max-depth sets the directory depth, and without it all subdirectories are traversed:
du -hm --max-depth=5 / | sort -nr | head -20
find /etc -name '*srm*' #find files under /etc whose names contain the string srm
[root@node1 dev]# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/centos-root 50G 1.5G 49G 3% /
devtmpfs 721M 0 721M 0% /dev
tmpfs 731M 0 731M 0% /dev/shm
tmpfs 731M 8.5M 723M 2% /run
tmpfs 731M 0 731M 0% /sys/fs/cgroup
/dev/mapper/centos-home 47G 33M 47G 1% /home
/dev/sda1 497M 106M 391M 22% /boot
tmpfs 147M 0 147M 0% /run/user/0
[root@node1 dev]# top
Download addresses for all old JDK versions on the official site: http://www.oracle.com/technetwork/java/archive-139210.html
Download jdk-8u92-linux-x64.tar.gz and store it under /root:
wget -O /root/jdk-8u92-linux-x64.tar.gz http://120.52.72.24/download.oracle.com/c3pr90ntc0td/otn/java/jdk/8u92-b14/jdk-8u92-linux-x64.tar.gz
[root@node1 ~]# tar -zxvf /root/jdk-8u92-linux-x64.tar.gz -C /root
[root@node1 ~]# vi /etc/profile
Append the following to the end of /etc/profile:
export JAVA_HOME=/root/jdk1.8.0_92
export PATH=.:$PATH:$JAVA_HOME/bin
export CLASSPATH=.:$JAVA_HOME/jre/lib/rt.jar:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
[root@node1 ~]# source /etc/profile
[root@node1 ~]# java -version
java version "1.8.0_92"
Java(TM) SE Runtime Environment (build 1.8.0_92-b14)
Java HotSpot(TM) 64-Bit Server VM (build 25.92-b14, mixed mode)
Use the env command to check that the environment variables are set correctly:
[root@node1 ~]# env | grep CLASSPATH
CLASSPATH=.:/root/jdk1.8.0_92/jre/lib/rt.jar:/root/jdk1.8.0_92/lib/dt.jar:/root/jdk1.8.0_92/lib/tools.jar
So far only one machine, node1, has been installed; now clone node2, node3 and node4 from node1.
node1 | 192.168.1.11
node2 | 192.168.1.12
node3 | 192.168.1.13
node4 | 192.168.1.14
Change the display name of each virtual machine accordingly:
When powering on, choose "I Copied It":
Change the hostname:
[root@node1 ~]# vi /etc/sysconfig/network
[root@node1 ~]# vi /etc/hostname
RSA is a typical asymmetric encryption algorithm.
RSA can be used for data encryption (encrypt with the public key, decrypt with the private key) and for digital signatures or authentication (encrypt with the private key, verify with the public key).
The client sends a connection request to the server.
The server sends its public key to the client.
The client encrypts the login password with the server's public key and sends it to the server.
Even if the communication is intercepted, an eavesdropper who knows the public key and the ciphertext still cannot decrypt it without the private key (RSA).
After receiving the ciphertext, the server decrypts it with its private key and obtains the password.
First, generate a key pair on the client and place the public key on the servers that need to be accessed.
The client sends a request to the server asking for security verification with its key.
After receiving the request, the server looks for the client's public key in the user's home directory on that server and compares it with the public key the client sent. If the two keys match, the server encrypts a "challenge" with the public key and sends it to the client.
After receiving the challenge, the client decrypts it with its private key and sends the result back to the server.
The server compares the returned challenge with the original; if they match, access is granted and the session is established.
First delete any previously generated keys:
rm -rf /root/.ssh
Generate the keys:
[root@node1 ~]# ssh-keygen -t rsa
[root@node2 ~]# ssh-keygen -t rsa
[root@node3 ~]# ssh-keygen -t rsa
[root@node4 ~]# ssh-keygen -t rsa
The command "ssh-keygen -t rsa" generates a key pair using RSA. After running it you are prompted three times; just press Enter each time.
View the generated keys:
id_rsa.pub is the public key and id_rsa is the private key.
Copy public keys between the servers:
ssh-copy-id -i /root/.ssh/id_rsa.pub <hostname>
This copies the local public key to the target host (e.g. hadoop-slave1) and automatically appends it to the authorized_keys file there, creating the file if it does not exist. When copying to the local machine itself, use its own hostname.
[root@node1 ~]# ssh-copy-id -i /root/.ssh/id_rsa.pub node1
[root@node1 ~]# ssh-copy-id -i /root/.ssh/id_rsa.pub node2
[root@node1 ~]# ssh-copy-id -i /root/.ssh/id_rsa.pub node3
[root@node1 ~]# ssh-copy-id -i /root/.ssh/id_rsa.pub node4
[root@node2 ~]# ssh-copy-id -i /root/.ssh/id_rsa.pub node1
[root@node2 ~]# ssh-copy-id -i /root/.ssh/id_rsa.pub node2
[root@node2 ~]# ssh-copy-id -i /root/.ssh/id_rsa.pub node3
[root@node2 ~]# ssh-copy-id -i /root/.ssh/id_rsa.pub node4
[root@node3 ~]# ssh-copy-id -i /root/.ssh/id_rsa.pub node1
[root@node3 ~]# ssh-copy-id -i /root/.ssh/id_rsa.pub node2
[root@node3 ~]# ssh-copy-id -i /root/.ssh/id_rsa.pub node3
[root@node3 ~]# ssh-copy-id -i /root/.ssh/id_rsa.pub node4
[root@node4 ~]# ssh-copy-id -i /root/.ssh/id_rsa.pub node1
[root@node4 ~]# ssh-copy-id -i /root/.ssh/id_rsa.pub node2
[root@node4 ~]# ssh-copy-id -i /root/.ssh/id_rsa.pub node3
[root@node4 ~]# ssh-copy-id -i /root/.ssh/id_rsa.pub node4
Note: if the public keys generated on the cloned virtual machines all turn out to be identical, first delete the /etc/udev/rules.d/70-persistent-net.rules file, then delete the /root/.ssh directory and regenerate the keys.
Role | Component | node1 | node2 | node3 | node4
NameNode | Hadoop | Y (cluster1) | Y (cluster1) | Y (cluster2) | Y (cluster2)
DataNode | Hadoop | | Y | Y | Y
NodeManager | Hadoop | | Y | Y | Y
JournalNode | Hadoop | Y | Y | Y |
zkfc (DFSZKFailoverController) | Hadoop | Y (a zkfc runs wherever a NameNode runs) | Y | Y | Y
ResourceManager | Hadoop | Y | Y | |
ZooKeeper (QuorumPeerMain) | ZooKeeper | Y | Y | Y |
MySQL | Hive | | | | Y
metastore (RunJar) | Hive | | | Y |
HIVE (RunJar) | Hive | Y | | |
Scala | Spark | Y | Y | Y | Y
Spark-master | Spark | Y | | |
Spark-worker | Spark | | Y | Y | Y
Different NameNodes share the same set of DataNodes by using the same ClusterID:
NS-n units (one per NameService):
[root@node1 ~]# wget -O /root/zookeeper-3.4.9.tar.gz https://mirrors.tuna.tsinghua.edu.cn/apache/zookeeper/zookeeper-3.4.9/zookeeper-3.4.9.tar.gz
[root@node1 ~]# tar -zxvf /root/zookeeper-3.4.9.tar.gz -C /root
[root@node1 conf]# cp /root/zookeeper-3.4.9/conf/zoo_sample.cfg /root/zookeeper-3.4.9/conf/zoo.cfg
[root@node1 conf]# vi /root/zookeeper-3.4.9/conf/zoo.cfg
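The zoo.cfg edit itself is not shown; a minimal sketch consistent with the dataDir and the three-node ensemble used elsewhere in this document (the timing parameters keep the sample defaults) would be:
# /root/zookeeper-3.4.9/conf/zoo.cfg (sketch)
dataDir=/root/zookeeper-3.4.9/zkData
clientPort=2181
server.1=node1:2888:3888
server.2=node2:2888:3888
server.3=node3:2888:3888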
[root@node1 conf]# mkdir /root/zookeeper-3.4.9/zkData
[root@node1 conf]# touch /root/zookeeper-3.4.9/zkData/myid
[root@node1 conf]# echo 1 > /root/zookeeper-3.4.9/zkData/myid
[root@node1 conf]# scp -r /root/zookeeper-3.4.9 node2:/root
[root@node1 conf]# scp -r /root/zookeeper-3.4.9 node3:/root
[root@node2 conf]# echo 2 > /root/zookeeper-3.4.9/zkData/myid
[root@node3 conf]# echo 3 > /root/zookeeper-3.4.9/zkData/myid
[root@node1 ~]# vi /root/zookeeper-3.4.9/bin/zkServer.sh
Where Java is launched below, add the startup parameter "-Dzookeeper.DigestAuthenticationProvider.superDigest=super:Q9YtF+3h9Ko5UNT8apBWr8hovH4="; the part after super: is the digest of the password (AAAaaa111):
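For reference, in zkServer.sh 3.4.x the start branch launches Java roughly as below; the superDigest flag is appended to that command line (the exact wrapping of the original line may differ from this sketch):
nohup "$JAVA" "-Dzookeeper.log.dir=${ZOO_LOG_DIR}" "-Dzookeeper.root.logger=${ZOO_LOG4J_PROP}" \
    "-Dzookeeper.DigestAuthenticationProvider.superDigest=super:Q9YtF+3h9Ko5UNT8apBWr8hovH4=" \
    -cp "$CLASSPATH" $JVMFLAGS $ZOOMAIN "$ZOOCFG" > "$_ZOO_DAEMON_OUT" 2>&1 < /dev/null &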
[root@node1 ~]# /root/zookeeper-3.4.9/bin/zkCli.sh
[zk: localhost:2181(CONNECTED) 11] addauth digest super:AAAaaa111
Now any znode data can be deleted:
[zk: localhost:2181(CONNECTED) 15] rmr /rmstore/ZKRMStateRoot
ZooKeeper fails to start with "Unable to load database on disk"
[root@node3 ~]# more zookeeper.out
2017-01-24 11:31:31,827 [myid:3] - ERROR [main:QuorumPeer@557] - Unable to load database on disk
java.io.IOException: The accepted epoch, d is less than the current epoch, 17
at org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:554)
at org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:500)
at org.apache.zookeeper.server.quorum.QuorumPeerMain.runFromConfig(QuorumPeerMain.java:153)
at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:111)
at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)
[root@node3 ~]# more /root/zookeeper-3.4.9/conf/zoo.cfg | grep dataDir
dataDir=/root/zookeeper-3.4.9/zkData
[root@node3 ~]# ls /root/zookeeper-3.4.9/zkData
myid version-2 zookeeper_server.pid
Clear all files under version-2:
[root@node3 ~]# rm -f /root/zookeeper-3.4.9/zkData/version-2/*.*
[root@node3 ~]# rm -rf /root/zookeeper-3.4.9/zkData/version-2/acceptedEpoch
[root@node3 ~]# rm -rf /root/zookeeper-3.4.9/zkData/version-2/currentEpoch
[root@node1 ~]# wget -O /root/hadoop-2.7.2.tar.gz http://mirror.bit.edu.cn/apache/hadoop/common/hadoop-2.7.2/hadoop-2.7.2.tar.gz
[root@node1 ~]# tar -zxvf /root/hadoop-2.7.2.tar.gz -C /root
[root@node1 ~]# vi /root/hadoop-2.7.2/etc/hadoop/hadoop-env.sh
The location below where PID files are stored must be changed, otherwise you may later see: XXX running as process 1609. Stop it first.
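The hadoop-env.sh edit is not reproduced; a sketch of the two lines typically changed (the PID path /root/hadoop-2.7.2/tmp/pid is an assumption, any directory outside /tmp works) would be:
# /root/hadoop-2.7.2/etc/hadoop/hadoop-env.sh (sketch)
export JAVA_HOME=/root/jdk1.8.0_92
# keep PID files out of /tmp to avoid the "XXX running as process ... Stop it first." problem
export HADOOP_PID_DIR=/root/hadoop-2.7.2/tmp/pid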
[root@node1 ~]# vi /root/hadoop-2.7.2/etc/hadoop/hdfs-site.xml
<configuration>
<property>
<name>dfs.replication</name>
<value>2</value>
<description>Number of block replicas stored on the DataNodes. The default is 3; we currently have 4 DataNodes, so any value not greater than 4 is fine.</description>
</property>
<property>
<name>dfs.blocksize</name>
<value>134217728</value>
<description>
The default block size for new files, in bytes.
You can use the following suffix (case insensitive):
k(kilo), m(mega), g(giga), t(tera), p(peta), e(exa) to specify the size (such as 128k, 512m, 1g, etc.),
Or provide complete size in bytes (such as 134217728 for 128 MB).
Note: in 1.x and earlier the default was 64 MB and the property was named dfs.block.size.
</description>
</property>
<property>
<name>dfs.permissions.enabled</name>
<value>false</value>
<description>Note: if permission problems remain, run "/root/hadoop-2.7.2/bin/hdfs dfs -chmod -R 777 /".</description>
</property>
<property>
<name>dfs.nameservices</name>
<value>cluster1,cluster2</value>
<description>With federation, two HDFS name services are used. Declaring two NameServices here simply gives aliases to the two groups of NameNodes; the names are arbitrary as long as they do not clash, and multiple entries are separated by commas. Note: these names are only logical namespaces. cluster1 and cluster2 are not two separate clusters; cluster1 + cluster2 together form one cluster, each being only a part of it (logically the whole cluster is split into these two parts, and a further HA NameNode group could be added as a third part). Whether cluster1 and cluster2 belong to the same cluster is determined by the clusterID, which is specified when formatting the NameNodes; see the NameNode format-and-start section.</description>
</property>
<property>
<name>dfs.ha.namenodes.cluster1</name>
<value>nn1,nn2</value>
<description>Logical names of the NameNodes in cluster1. Note: these are just arbitrary logical names, not the real NameNode hostnames; the binding to hosts is configured below.</description>
</property>
<property>
<name>dfs.ha.namenodes.cluster2</name>
<value>nn3,nn4</value>
<description>Logical names of the NameNodes in cluster2</description>
</property>
<!-- The following settings bind the logical names to physical hosts -->
<property>
<name>dfs.namenode.rpc-address.cluster1.nn1</name>
<value>node1:8020</value>
<description>8020 is the HDFS client access port (for both the command line and programs); some installations use 9000</description>
</property>
<property>
<name>dfs.namenode.rpc-address.cluster1.nn2</name>
<value>node2:8020</value>
</property>
<property>
<name>dfs.namenode.rpc-address.cluster2.nn3</name>
<value>node3:8020</value>
</property>
<property>
<name>dfs.namenode.rpc-address.cluster2.nn4</name>
<value>node4:8020</value>
</property>
<property>
<name>dfs.namenode.http-address.cluster1.nn1</name>
<value>node1:50070</value>
<description>NameNode web UI address</description>
</property>
<property>
<name>dfs.namenode.http-address.cluster1.nn2</name>
<value>node2:50070</value>
</property>
<property>
<name>dfs.namenode.http-address.cluster2.nn3</name>
<value>node3:50070</value>
</property>
<property>
<name>dfs.namenode.http-address.cluster2.nn4</name>
<value>node4:50070</value>
</property>
<property>
<name>dfs.namenode.shared.edits.dir</name>
<value>qjournal://node1:8485;node2:8485;node3:8485/cluster1</value>
<description>JournalNode quorum used by the two NameNodes of cluster1 to share the edits directory.
Use this setting on node1 and node2.</description>
</property>
<!--
<property>
<name>dfs.namenode.shared.edits.dir</name>
<value>qjournal://node1:8485;node2:8485;node3:8485/cluster2</value>
<description>JournalNode quorum used by the two NameNodes of cluster2 to share the edits directory.
Use this setting on node3 and node4.</description>
</property>
-->
<property>
<name>dfs.ha.automatic-failover.enabled.cluster1</name>
<value>true</value>
<description>Whether automatic failover is enabled for cluster1, i.e. whether to switch to the other NameNode automatically when the active NameNode fails</description>
</property>
<property>
<name>dfs.ha.automatic-failover.enabled.cluster2</name>
<value>true</value>
</property>
<property>
<name>dfs.client.failover.proxy.provider.cluster1</name>
<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
<description>The Java class responsible for performing failover when cluster1 fails</description>
</property>
<property>
<name>dfs.client.failover.proxy.provider.cluster2</name>
<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
<property>
<name>dfs.journalnode.edits.dir</name>
<value>/root/hadoop-2.7.2/tmp/journal</value>
<description>Local disk path where the JournalNode stores its own data</description>
</property>
<property>
<name>dfs.ha.fencing.methods</name>
<value>sshfence</value>
<description>Use SSH to fence the NameNodes during active/standby switchover</description>
</property>
<property>
<name>dfs.ha.fencing.ssh.private-key-files</name>
<value>/root/.ssh/id_rsa</value>
<description>When SSH is used for failover fencing, the location of the private key used for the SSH connection</description>
</property>
</configuration>
[root@node1 ~]# vi /root/hadoop-2.7.2/etc/hadoop/core-site.xml
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://cluster1:8020</value>
<description>Used when a client (or program) does not specify a concrete address; the name service comes from the hdfs-site.xml configuration. Note: this setting is identical on all hosts.</description>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/root/hadoop-2.7.2/tmp</value>
<description>Default parent directory for the data stored by the NameNode, DataNode, JournalNode, etc.</description>
</property>
<property>
<name>ha.zookeeper.quorum</name>
<value>node1:2181,node2:2181,node3:2181</value>
<description>Addresses and ports of the ZooKeeper ensemble. Note: the number of nodes must be odd and no fewer than three</description>
</property>
<!-- The following setting works around NameNode-to-JournalNode connection timeouts -->
<property>
<name>ipc.client.connect.retry.interval</name>
<value>10000</value>
<description>Indicates the number of milliseconds a client will wait for
before retrying to establish a server connection.
</description>
</property>
</configuration>
Specify the hosts that run DataNodes:
[root@node1 ~]# vi /root/hadoop-2.7.2/etc/hadoop/slaves
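The slaves file content is not shown; judging from the deployment table above (DataNodes on node2, node3 and node4), it would contain:
node2
node3
node4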
[root@node1 ~]# vi /root/hadoop-2.7.2/etc/hadoop/yarn-env.sh
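The yarn-env.sh change is not shown either; typically only JAVA_HOME needs to be set there, e.g. this sketch:
# /root/hadoop-2.7.2/etc/hadoop/yarn-env.sh (sketch)
export JAVA_HOME=/root/jdk1.8.0_92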
[root@node1 ~]# vi /root/hadoop-2.7.2/etc/hadoop/mapred-site.xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
<description>Run MapReduce on the YARN framework</description>
</property>
<property>
<name>mapreduce.jobhistory.address</name>
<value>node1:10020</value>
<description>Note: this value differs on each machine and must be changed to the local hostname (keep the port), e.g. node2:10020, node3:10020, node4:10020; adjust it after copying</description>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>node1:19888</value>
<description>Note: this value differs on each machine and must be changed to the local hostname (keep the port), e.g. node2:19888, node3:19888, node4:19888; adjust it after copying</description>
</property>
</configuration>
[root@node1 ~]# vi /root/hadoop-2.7.2/etc/hadoop/yarn-site.xml
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.resourcemanager.ha.enabled</name>
<value>true</value>
</property>
<property>
<name>yarn.resourcemanager.cluster-id</name>
<value>yarn-cluster</value>
</property>
<property>
<name>yarn.resourcemanager.ha.rm-ids</name>
<value>rm1,rm2</value>
</property>
<property>
<name>yarn.resourcemanager.hostname.rm1</name>
<value>node1</value>
</property>
<property>
<name>yarn.resourcemanager.hostname.rm2</name>
<value>node2</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address.rm1</name>
<value>node1:8088</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address.rm2</name>
<value>node2:8088</value>
</property>
<property>
<name>yarn.resourcemanager.zk-address</name>
<value>node1:2181,node2:2181,node3:2181</value>
</property>
<property>
<name>yarn.resourcemanager.recovery.enabled</name>
<value>true</value>
</property>
<property>
<name>yarn.resourcemanager.store.class</name>
<value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
<description>By default the RM state is stored under /rmstore in ZooKeeper; the path can be changed via yarn.resourcemanager.zk-state-store.parent-path</description>
</property>
<property>
<name>yarn.log-aggregation-enable</name>
<value>true</value>
<description>Enable log aggregation: the local log files produced on each machine that runs tasks are copied to a central location in HDFS, so job logs can be viewed from any machine in the cluster</description>
</property>
<property>
<name>yarn.log.server.url</name>
<value>http://node1:19888/jobhistory/logs</value>
<description>Note: this value differs on each machine and must be changed to the local hostname (keep the port), e.g. http://node2:19888/jobhistory/logs, http://node3:19888/jobhistory/logs, http://node4:19888/jobhistory/logs; adjust it after copying</description>
</property>
</configuration>
[root@node1 ~]# scp -r /root/hadoop-2.7.2/ node2:/root
[root@node1 ~]# scp -r /root/hadoop-2.7.2/ node3:/root
[root@node1 ~]# scp -r /root/hadoop-2.7.2/ node4:/root
[root@node3 ~]# vi /root/hadoop-2.7.2/etc/hadoop/hdfs-site.xml
[root@node3 ~]# scp /root/hadoop-2.7.2/etc/hadoop/hdfs-site.xml node4:/root/hadoop-2.7.2/etc/hadoop
[root@node2 ~]# vi /root/hadoop-2.7.2/etc/hadoop/mapred-site.xml
[root@node3 ~]# vi /root/hadoop-2.7.2/etc/hadoop/mapred-site.xml
[root@node4 ~]# vi /root/hadoop-2.7.2/etc/hadoop/mapred-site.xml
[root@node2 ~]# vi /root/hadoop-2.7.2/etc/hadoop/yarn-site.xml
[root@node3 ~]# vi /root/hadoop-2.7.2/etc/hadoop/yarn-site.xml
[root@node4 ~]# vi /root/hadoop-2.7.2/etc/hadoop/yarn-site.xml
[root@node1 bin]# /root/zookeeper-3.4.9/bin/zkServer.sh start
[root@node2 bin]# /root/zookeeper-3.4.9/bin/zkServer.sh start
[root@node3 bin]# /root/zookeeper-3.4.9/bin/zkServer.sh start
[root@node1 bin]# jps
1622 QuorumPeerMain
Check the status:
[root@node1 ~]# /root/zookeeper-3.4.9/bin/zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /root/zookeeper-3.4.9/bin/../conf/zoo.cfg
Mode: follower
[root@node2 ~]# /root/zookeeper-3.4.9/bin/zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /root/zookeeper-3.4.9/bin/../conf/zoo.cfg
Mode: leader
View the znodes:
[root@node1 hadoop-2.7.2]# /root/zookeeper-3.4.9/bin/zkCli.sh
[zk: localhost:2181(CONNECTED) 0] ls /
[zookeeper]
Run this on any one node of each cluster; the purpose is to create the corresponding HA znodes in the ZooKeeper ensemble.
[root@node1 ~]# /root/hadoop-2.7.2/bin/hdfs zkfc -formatZK
[root@node3 ~]# /root/hadoop-2.7.2/bin/hdfs zkfc -formatZK
After formatting, a znode named hadoop-ha is created in ZooKeeper:
[root@node1 ~]# /root/zookeeper-3.4.9/bin/zkCli.sh
[root@node1 ~]# /root/hadoop-2.7.2/sbin/hadoop-daemon.sh start journalnode
[root@node2 ~]# /root/hadoop-2.7.2/sbin/hadoop-daemon.sh start journalnode
[root@node3 ~]# /root/hadoop-2.7.2/sbin/hadoop-daemon.sh start journalnode
[root@node1 ~]# jps
1810 JournalNode
[root@node1 ~]# /root/hadoop-2.7.2/bin/hdfs namenode -format -clusterId CLUSTER_UUID_1
[root@node1 ~]# /root/hadoop-2.7.2/sbin/hadoop-daemon.sh start namenode
[root@node1 ~]# jps
1613 NameNode
All cluster IDs within one cluster must be identical (for NameNodes, DataNodes, etc.):
[root@node2 ~]# /root/hadoop-2.7.2/bin/hdfs namenode -bootstrapStandby
[root@node2 ~]# /root/hadoop-2.7.2/sbin/hadoop-daemon.sh start namenode
[root@node3 ~]# /root/hadoop-2.7.2/bin/hdfs namenode -format -clusterId CLUSTER_UUID_1
[root@node3 ~]# /root/hadoop-2.7.2/sbin/hadoop-daemon.sh start namenode
[root@node4 ~]# /root/hadoop-2.7.2/bin/hdfs namenode -bootstrapStandby
[root@node4 ~]# /root/hadoop-2.7.2/sbin/hadoop-daemon.sh start namenode
ZKFC (ZooKeeper Failover Controller) monitors NameNode state and assists with active/standby NameNode switchover; run it on every NameNode.
[root@node1 ~]# /root/hadoop-2.7.2/sbin/hadoop-daemon.sh start zkfc
[root@node2 ~]# /root/hadoop-2.7.2/sbin/hadoop-daemon.sh start zkfc
[root@node1 ~]# jps
5280 DFSZKFailoverController
Automatic switchover works:
[root@node3 ~]# /root/hadoop-2.7.2/sbin/hadoop-daemon.sh start zkfc
[root@node4 ~]# /root/hadoop-2.7.2/sbin/hadoop-daemon.sh start zkfc
[root@node2 ~]# /root/hadoop-2.7.2/sbin/hadoop-daemon.sh start datanode
[root@node3 ~]# /root/hadoop-2.7.2/sbin/hadoop-daemon.sh start datanode
[root@node4 ~]# /root/hadoop-2.7.2/sbin/hadoop-daemon.sh start datanode
Upload to the specified cluster2:
[root@node1 ~]# /root/hadoop-2.7.2/bin/hdfs dfs -put /root/hadoop-2.7.2.tar.gz hdfs://cluster2/
[root@node1 ~]# /root/hadoop-2.7.2/bin/hdfs dfs -put /root/test_upload.tar hdfs://cluster1:8020/
If no path is specified explicitly when uploading, the fs.defaultFS setting from core-site.xml is used by default:
[root@node1 ~]# /root/hadoop-2.7.2/bin/hdfs dfs -put /root/hadoop-2.7.2.tar.gz /
A specific host can also be addressed (but it must be the active NameNode):
/root/hadoop-2.7.2/bin/hdfs dfs -put /root/hadoop-2.7.2.tar hdfs://node3:8020/
/root/hadoop-2.7.2/bin/hdfs dfs -put /root/hadoop-2.7.2.tar hdfs://node3/
[root@node1 ~]# /root/hadoop-2.7.2/bin/hdfs haadmin -ns cluster1 -getServiceState nn1
active
[root@node1 ~]# /root/hadoop-2.7.2/bin/hdfs haadmin -ns cluster1 -getServiceState nn2
standby
[root@node1 ~]# jps
2448 NameNode
3041 DFSZKFailoverController
3553 Jps
2647 JournalNode
2954 QuorumPeerMain
[root@node1 ~]# kill 2448
[root@node1 ~]# /root/hadoop-2.7.2/bin/hdfs haadmin -ns cluster1 -getServiceState nn2
active
/root/hadoop-2.7.2/bin/hdfs haadmin -ns cluster1 -failover nn2 nn1
/root/hadoop-2.7.2/bin/hdfs haadmin -ns cluster2 -failover nn4 nn3
[root@node1 ~]# /root/hadoop-2.7.2/sbin/yarn-daemon.sh start resourcemanager
[root@node2 ~]# /root/hadoop-2.7.2/sbin/yarn-daemon.sh start resourcemanager
[root@node2 ~]# /root/hadoop-2.7.2/sbin/yarn-daemon.sh start nodemanager
[root@node3 ~]# /root/hadoop-2.7.2/sbin/yarn-daemon.sh start nodemanager
[root@node4 ~]# /root/hadoop-2.7.2/sbin/yarn-daemon.sh start nodemanager
http://node1:8088/cluster/cluster
Note: the URL must have the form http://XXXXX/cluster/cluster; otherwise a standby host automatically redirects to the active one.
http://node2:8088/cluster/cluster
Command to check the RM state:
[root@node4 logs]# /root/hadoop-2.7.2/bin/yarn rmadmin -getServiceState rm2
[root@node4 ~]# /root/hadoop-2.7.2/bin/hdfs dfs -mkdir hdfs://cluster1/hadoop
[root@node4 ~]# /root/hadoop-2.7.2/bin/hdfs dfs -put /root/hadoop-2.7.2/etc/hadoop/*xml* hdfs://cluster1/hadoop
[root@node4 ~]# /root/hadoop-2.7.2/bin/hadoop jar /root/hadoop-2.7.2/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar wordcount hdfs://cluster1:8020/hadoop/h* hdfs://cluster1:8020/hadoop/m* hdfs://cluster1/wordcountOutput
Note: the MapReduce output should be in the same cluster as its input. The job may still succeed when the output is placed in the other cluster, but the output files cannot be found when browsing through the web UI.
Run the following scripts on node1.
Automated interaction (expect): used when the script performs the manual RM switchover below.
[root@node1 ~]# yum install expect
[root@node1 ~]# vi /root/starthadoop.sh
#rm -rf /root/hadoop-2.7.2/logs/*.*
#ssh root@node2 'export BASH_ENV=/etc/profile;rm -rf /root/hadoop-2.7.2/logs/*.*'
#ssh root@node3 'export BASH_ENV=/etc/profile;rm -rf /root/hadoop-2.7.2/logs/*.*'
#ssh root@node4 'export BASH_ENV=/etc/profile;rm -rf /root/hadoop-2.7.2/logs/*.*'
/root/zookeeper-3.4.9/bin/zkServer.sh start
ssh root@node2 'export BASH_ENV=/etc/profile;/root/zookeeper-3.4.9/bin/zkServer.sh start'
ssh root@node3 'export BASH_ENV=/etc/profile;/root/zookeeper-3.4.9/bin/zkServer.sh start'
/root/hadoop-2.7.2/sbin/start-all.sh
ssh root@node2 'export BASH_ENV=/etc/profile;/root/hadoop-2.7.2/sbin/yarn-daemon.sh start resourcemanager'
/root/hadoop-2.7.2/sbin/hadoop-daemon.sh start zkfc
ssh root@node2 'export BASH_ENV=/etc/profile;/root/hadoop-2.7.2/sbin/hadoop-daemon.sh start zkfc'
ssh root@node3 'export BASH_ENV=/etc/profile;/root/hadoop-2.7.2/sbin/hadoop-daemon.sh start zkfc'
ssh root@node4 'export BASH_ENV=/etc/profile;/root/hadoop-2.7.2/sbin/hadoop-daemon.sh start zkfc'
#ret=`/root/hadoop-2.7.2/bin/hdfs dfsadmin -safemode get | grep ON | head -1`
#while [ -n "$ret" ]
#do
#echo 'waiting to leave safe mode'
#sleep 1s
#ret=`/root/hadoop-2.7.2/bin/hdfs dfsadmin -safemode get | grep ON | head -1`
#done
/root/hadoop-2.7.2/bin/hdfs haadmin -ns cluster1 -failover nn2 nn1
/root/hadoop-2.7.2/bin/hdfs haadmin -ns cluster2 -failover nn4 nn3
echo 'Y' | ssh root@node1 'export BASH_ENV=/etc/profile;/root/hadoop-2.7.2/bin/yarn rmadmin -transitionToActive --forcemanual rm1'
/root/hadoop-2.7.2/sbin/mr-jobhistory-daemon.sh start historyserver
ssh root@node2 'export BASH_ENV=/etc/profile;/root/hadoop-2.7.2/sbin/mr-jobhistory-daemon.sh start historyserver'
ssh root@node3 'export BASH_ENV=/etc/profile;/root/hadoop-2.7.2/sbin/mr-jobhistory-daemon.sh start historyserver'
ssh root@node4 'export BASH_ENV=/etc/profile;/root/hadoop-2.7.2/sbin/mr-jobhistory-daemon.sh start historyserver'
#This line starts Spark; remove it if only Hadoop is installed
/root/spark-2.1.0-bin-hadoop2.7/sbin/start-all.sh
echo '--------------node1---------------'
jps | grep -v Jps | sort -k 2 -t ' '
echo '--------------node2---------------'
ssh root@node2 "export PATH=/usr/bin:$PATH;jps | grep -v Jps | sort -k 2 -t ' '"
echo '--------------node3---------------'
ssh root@node3 "export PATH=/usr/bin:$PATH;jps | grep -v Jps | sort -k 2 -t ' '"
echo '--------------node4---------------'
ssh root@node4 "export PATH=/usr/bin:$PATH;jps | grep -v Jps | sort -k 2 -t ' '"
#The next two lines start Hive; remove them if Hive is not installed
ssh root@node4 'export BASH_ENV=/etc/profile;service mysql start'
ssh root@node3 'export BASH_ENV=/etc/profile;/root/hive-1.2.1/bin/hive --service metastore&'
[root@node1 ~]# vi /root/stophadoop.sh
#This line stops Spark; remove it if Spark is not installed
/root/spark-2.1.0-bin-hadoop2.7/sbin/stop-all.sh
#The next two lines stop Hive; remove them if Hive is not installed
ssh root@node4 'export BASH_ENV=/etc/profile;service mysql stop'
ssh root@node3 'export BASH_ENV=/etc/profile;/root/jdk1.8.0_92/bin/jps | grep RunJar | head -1 |cut -f1 -d " "| xargs kill'
ssh root@node2 'export BASH_ENV=/etc/profile;/root/hadoop-2.7.2/sbin/yarn-daemon.sh stop resourcemanager'
/root/hadoop-2.7.2/sbin/stop-all.sh
/root/hadoop-2.7.2/sbin/hadoop-daemon.sh stop zkfc
ssh root@node2 'export BASH_ENV=/etc/profile;/root/hadoop-2.7.2/sbin/hadoop-daemon.sh stop zkfc'
ssh root@node3 'export BASH_ENV=/etc/profile;/root/hadoop-2.7.2/sbin/hadoop-daemon.sh stop zkfc'
ssh root@node4 'export BASH_ENV=/etc/profile;/root/hadoop-2.7.2/sbin/hadoop-daemon.sh stop zkfc'
/root/zookeeper-3.4.9/bin/zkServer.sh stop
ssh root@node2 'export BASH_ENV=/etc/profile;/root/zookeeper-3.4.9/bin/zkServer.sh stop'
ssh root@node3 'export BASH_ENV=/etc/profile;/root/zookeeper-3.4.9/bin/zkServer.sh stop'
/root/hadoop-2.7.2/sbin/mr-jobhistory-daemon.sh stop historyserver
ssh root@node2 'export BASH_ENV=/etc/profile;/root/hadoop-2.7.2/sbin/mr-jobhistory-daemon.sh stop historyserver'
ssh root@node3 'export BASH_ENV=/etc/profile;/root/hadoop-2.7.2/sbin/mr-jobhistory-daemon.sh stop historyserver'
ssh root@node4 'export BASH_ENV=/etc/profile;/root/hadoop-2.7.2/sbin/mr-jobhistory-daemon.sh stop historyserver'
[root@node1 ~]# chmod 777 starthadoop.sh stophadoop.sh
[root@node1 ~]# vi /root/reboot.sh
ssh root@node2 "export PATH=/usr/bin:$PATH;reboot"
ssh root@node3 "export PATH=/usr/bin:$PATH;reboot"
ssh root@node4 "export PATH=/usr/bin:$PATH;reboot"
reboot
[root@node1 ~]# vi /root/shutdown.sh
ssh root@node2 "export PATH=/usr/bin:$PATH;shutdown -h now"
ssh root@node3 "export PATH=/usr/bin:$PATH;shutdown -h now"
ssh root@node4 "export PATH=/usr/bin:$PATH;shutdown -h now"
shutdown -h now
[root@node1 ~]# chmod 777 /root/shutdown.sh /root/reboot.sh
1. Extract hadoop-2.7.2.tar.gz (the CentOS build compiled earlier) into D:\hadoop, copy winutils.exe, hadoop.dll and the related files into the bin folder of the Hadoop installation directory, and also copy hadoop.dll into C:\Windows and C:\Windows\System32.
2. Add a HADOOP_HOME environment variable with the value D:\hadoop\hadoop-2.7.2 and append %HADOOP_HOME%\bin to the Path environment variable.
3. Double-click winutils.exe; if a "missing MSVCR120.dll" message appears, install the VC++ 2013 runtime components.
4. Copy the hadoop-eclipse-plugin-2.7.2.jar plugin (it would also have to be built on Windows, which is very painful, so find a prebuilt one) into the Eclipse plugins directory.
5. Start Eclipse and configure it:
package jzj;
import java.io.IOException;
import java.net.URI;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.log4j.Logger;
public class WordCount {
public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
private final static IntWritable one = new IntWritable(1);
private Text word = new Text();
private Logger log = Logger.getLogger(TokenizerMapper.class);
public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
log.debug("[Thread=" + Thread.currentThread().hashCode() + "] map task: log4j output: wordcount, key=" + key + ", value=" + value);
System.out.println("[Thread=" + Thread.currentThread().hashCode() + "] map task: System.out output: wordcount, key=" + key + ", value="
+ value);
StringTokenizer itr = new StringTokenizer(value.toString());
while (itr.hasMoreTokens()) {
word.set(itr.nextToken());
context.write(word, one);
}
}
}
public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
private IntWritable result = new IntWritable();
private Logger log = Logger.getLogger(IntSumReducer.class);
public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
int sum = 0;
for (IntWritable val : values) {
sum += val.get();
}
result.set(sum);
context.write(key, result);
log.debug("[Thread=" + Thread.currentThread().hashCode() + "] reduce任務:log4j輸出:wordcount,key=" + key + ",count=" + sum);
System.out.println("[Thread=" + Thread.currentThread().hashCode() + "] reduce任務:System.out輸出:wordcount,key=" + key + ",count="
+ sum);
}
}
public static void main(String[] args) throws Exception {
Logger log = Logger.getLogger(WordCount.class);
log.debug("JOB main method: log4j output: wordcount");
System.out.println("JOB main method: System.out output: wordcount");
Configuration conf = new Configuration();
// Note: the job jar must contain an empty yarn-default.xml file, otherwise a remotely submitted job waits forever. Why?
conf.set("mapreduce.framework.name", "yarn");// run on the YARN framework
conf.set("yarn.resourcemanager.address", "node1:8032"); // which machine the job is submitted to
// Required, otherwise the following is thrown: java.io.IOException: The ownership on the staging
// directory /tmp/hadoop-yarn/staging/15040078/.staging
// is not as expected. It is owned by . The directory must be owned by
// the submitter 15040078 or by 15040078
conf.set("fs.defaultFS", "hdfs://node1:8020");// specify the namenode
// Required, otherwise the following is thrown: Stack trace: ExitCodeException exitCode=1: /bin/bash: line 0:
// fg: no job control
conf.set("mapreduce.app-submission.cross-platform", "true");
// Do not change the key mapred.jar; the value is the jar exported from this project. If it is not set, a class-not-found error is thrown
conf.set("mapred.jar", "wordcount.jar");
Job job = Job.getInstance(conf, "wordcount");
job.setJarByClass(WordCount.class);
job.setMapperClass(TokenizerMapper.class);
// If a Combiner is set here, reduce logs also appear on the map side: with a Combiner, the map task runs the reduce (combine) step after the map phase, so seeing reduce-task logs on the map side is not surprising
// job.setCombinerClass(IntSumReducer.class);
job.setReducerClass(IntSumReducer.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);
// job.setNumReduceTasks(4);
FileInputFormat.addInputPath(job, new Path("hdfs://node1/hadoop/core-site.xml"));
FileInputFormat.addInputPath(job, new Path("hdfs://node1/hadoop/m*"));
FileSystem fs = FileSystem.get(URI.create("hdfs://node1"), conf);
fs.delete(new Path("/wordcountOutput"), true);
FileOutputFormat.setOutputPath(job, new Path("hdfs://node1/wordcountOutput"));
boolean completed = job.waitForCompletion(true);
System.out.println(job.getStatus().getJobID());
System.exit(completed ? 0 : 1);
}
}
Note: the yarn-default.xml in the project is an empty file, but testing shows that it is definitely required.
<project default="jar" name="Acid">
<property name="lib.dir" value="D:/hadoop/hadoop-2.7.2/share/hadoop"/>
<property name="src.dir" value="../src"/>
<property name="classes.dir" value="../bin"/>
<property name="output.dir" value=".."/>
<property name="jarname" value="wordcount.jar"/>
<property name="mainclass" value="jzj.WordCount"/>
<!-- Path to third-party jars -->
<path id="lib-classpath">
<fileset dir="${lib.dir}">
<include name="**/*.jar"/>
</fileset>
</path>
<!-- 1. Initialization: create directories, etc. -->
<target name="init">
<mkdir dir="${classes.dir}"/>
<mkdir dir="${output.dir}"/>
<delete file="${output.dir}/wordcount.jar"/>
<delete verbose="true" includeemptydirs="true">
<fileset dir="${classes.dir}">
<include name="**/*"/>
</fileset>
</delete>
</target>
<!-- 2. Compile -->
<target name="compile" depends="init">
<javac srcdir="${src.dir}" destdir="${classes.dir}" includeantruntime="on">
<compilerarg line="-encoding GBK"/>
<classpath refid="lib-classpath"/>
</javac>
</target>
<!-- 3. Build the jar -->
<target name="jar" depends="compile">
<copy todir="${classes.dir}">
<fileset dir="${src.dir}">
<include name="**"/>
<exclude name="build.xml"/>
<!-- Note: do not exclude log4j.properties; it must be packaged as well, otherwise no logs are shown at runtime.
That log4j configuration only affects the JOB, i.e. the client that submits the job, while TASK (MapReduce task)
logging is governed by /root/hadoop-2.7.2/etc/hadoop/log4j.properties -->
<!--exclude name="log4j.properties" / -->
</fileset>
</copy>
<!-- Output path of the jar -->
<jar destfile="${output.dir}/${jarname}" basedir="${classes.dir}">
<manifest>
<attribute name="Main-class" value="${mainclass}"/>
</manifest>
</jar>
</target>
</project>
log4j.rootLogger=info,stdout,R
log4j.appender.stdout=org.apache.log4j.ConsoleAppender
log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
log4j.appender.stdout.layout.ConversionPattern=%5p-%m%n
log4j.appender.R=org.apache.log4j.RollingFileAppender
log4j.appender.R.File=mapreduce_test.log
log4j.appender.R.MaxFileSize=1MB
log4j.appender.R.MaxBackupIndex=1
log4j.appender.R.layout=org.apache.log4j.PatternLayout
log4j.appender.R.layout.ConversionPattern=%p%t%c-%m%n
log4j.logger.jzj =DEBUG
Open the build.xml build file in the project and press SHIFT+ALT+X, Q to build the job jar inside the project:
The package structure is as follows:
Then open the WordCount.java source file in the project and click:
If the following exception is thrown at runtime:
Exception in thread "main" org.apache.hadoop.security.AccessControlException: Permission denied: user=15040078, access=EXECUTE, inode="/tmp/hadoop-yarn/staging/15040078/.staging/job_1484039063795_0001":root:supergroup:drwxrwx---
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:319)
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkTraverse(FSPermissionChecker.java:259)
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:205)
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:190)
at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1720)
at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1704)
at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkOwner(FSDirectory.java:1673)
at org.apache.hadoop.hdfs.server.namenode.FSDirAttrOp.setPermission(FSDirAttrOp.java:61)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.setPermission(FSNamesystem.java:1653)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.setPermission(NameNodeRpcServer.java:695)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.setPermission(ClientNamenodeProtocolServerSideTranslatorPB.java:453)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2043)
[root@node1 ~]# /root/hadoop-2.7.2/bin/hdfs dfs -chmod -R 777 /
If the job makes no progress after being submitted, it can be killed:
[root@node1 ~]# /root/hadoop-2.7.2/bin/hadoop job -list
[root@node1 ~]# /root/hadoop-2.7.2/bin/hadoop job -kill job_1475762778825_0008
Logs from the built-in services such as NameNode, SecondaryNameNode, HistoryServer, ResourceManager, DataNode and NodeManager are stored under ${HADOOP_HOME}/logs by default and can also be viewed through the web UI:
These logs correspond to local log files on each host; log in to the host to see the raw files:
When a log reaches a certain size it is rolled into a new file; the larger the trailing number, the older the log. By default only the 20 most recent files are kept. The location and size of these system logs are configured in ${HADOOP_HOME}/etc/hadoop/log4j.properties, and the environment variables referenced there are set by the other configuration files under ${HADOOP_HOME}/etc/hadoop/.
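For reference, the size and retention values mentioned above correspond to these properties in the stock ${HADOOP_HOME}/etc/hadoop/log4j.properties (shown with their 2.7.x defaults):
hadoop.log.dir=.
hadoop.log.file=hadoop.log
hadoop.log.maxfilesize=256MB
hadoop.log.maxbackupindex=20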
*.out files: standard output is redirected there.
You can also navigate here by clicking:
MapReduce logs fall into two categories: job history logs and container logs.
(1) Job history records contain information such as how many map and reduce tasks a job used, the submission time, the start time and the completion time. They are very useful for analysis: from them you can tell how many jobs succeeded or failed each day, how many jobs ran in each queue, and so on. The history is configured with the following settings:
Note: this type of log file is stored on HDFS.
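The configuration referred to above is not included in this document; a sketch of the usual mapred-site.xml properties (shown with their mapred-default.xml values) is:
<property>
  <name>mapreduce.jobhistory.intermediate-done-dir</name>
  <value>${yarn.app.mapreduce.am.staging-dir}/history/done_intermediate</value>
</property>
<property>
  <name>mapreduce.jobhistory.done-dir</name>
  <value>${yarn.app.mapreduce.am.staging-dir}/history/done</value>
</property>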
(2) Container logs: include the ApplicationMaster logs and the ordinary task logs.
YARN provides two ways of storing container logs:
1) HDFS: if the log aggregation service is enabled (configured via yarn.log-aggregation-enable), container logs are copied to HDFS and the local files are deleted; the location is set by yarn.nodemanager.remote-app-log-dir in yarn-site.xml, by default the /tmp/logs directory in HDFS:
<property>
<description>Where to aggregate logs to.</description>
<name>yarn.nodemanager.remote-app-log-dir</name>
<value>/tmp/logs</value>
</property>
Default sub-directory configuration under /tmp/logs:
<property>
<description>The remote log dir will be created at {yarn.nodemanager.remote-app-log-dir}/${user}/{thisParam}
</description>
<name>yarn.nodemanager.remote-app-log-dir-suffix</name>
<value>logs</value>
</property>
By default, these (non-aggregated) logs are stored under the ${HADOOP_HOME}/logs/userlogs directory:
This location can be changed with the following configuration:
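The property meant here is presumably yarn.nodemanager.log-dirs in yarn-site.xml; its default resolves to ${HADOOP_HOME}/logs/userlogs:
<property>
  <name>yarn.nodemanager.log-dirs</name>
  <value>${yarn.log.dir}/userlogs</value>
</property>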
2) Local: when the log aggregation service is disabled (yarn.log-aggregation-enable is false), the logs stay under $HADOOP_HOME/logs/userlogs on the machine where the task ran and are not moved to HDFS after the job finishes.
Go to http://node1:8088/cluster/apps and click through to view the logs of running and completed jobs:
Click the corresponding link to view the log of each map or reduce task:
System.out in the main method of the JOB launcher class is printed on the terminal of the node that submits the job. If the job is submitted remotely from Eclipse, it appears in the Eclipse console:
If the job is submitted to a remote server, the output appears on the terminal of whichever node (jobtracker) launches the job:
Output from the Map or Reduce classes goes into files under ${HADOOP_HOME}/logs/userlogs (if log aggregation is enabled, they are moved to HDFS after the task completes, so when experimenting look at them before the job finishes):
These logs can also be viewed through the http://node1:8088/cluster/apps page.
Launching from Eclipse:
Logging in the job submission code (the main method) and the Eclipse console output during the run are governed by the log4j.properties packaged into the job jar:
Because that log4j.properties configures a Console appender, the output is printed directly in the Eclipse console:
Besides the main method's log output, there is also a large amount of logging produced by the job framework during the run; all of it (the main output and the framework output) is also written to mapreduce_test.log:
When the job is submitted to the server, the relevant configuration file is /root/hadoop-2.7.2/etc/hadoop/log4j.properties.
The log level of MapReduce tasks is configured in mapred-site.xml; the defaults are:
<property>
<name>mapreduce.map.log.level</name>
<value>INFO</value>
<description>The logging level for the map task. The allowed levels are:
OFF, FATAL, ERROR, WARN, INFO, DEBUG, TRACE and ALL.
The setting here could be overridden if "mapreduce.job.log4j-properties-file"
is set.
</description>
</property>
<property>
<name>mapreduce.reduce.log.level</name>
<value>INFO</value>
<description>The logging level for the reduce task. The allowed levels are:
OFF, FATAL, ERROR, WARN, INFO, DEBUG, TRACE and ALL.
The setting here could be overridden if "mapreduce.job.log4j-properties-file"
is set.
</description>
</property>
log4j output from the Map and Reduce classes goes directly into the corresponding files under ${HADOOP_HOME}/logs/userlogs (moved to HDFS after the task finishes if log aggregation is enabled), not into the log file configured in /root/hadoop-2.7.2/etc/hadoop/log4j.properties (that file configures a default file name of hadoop.log, which I have never actually found!?):
Note: if a Combiner is set, reduce logs also appear on the map side, because with a Combiner the map task runs the reduce (combine) step after the map phase, so seeing reduce-task logs on the map side is not surprising.
1. Download the MySQL repo package
[root@node4 ~]# wget http://repo.mysql.com/mysql-community-release-el7-5.noarch.rpm
2. Install the mysql-community-release-el7-5.noarch.rpm package
[root@node4 ~]# rpm -ivh mysql-community-release-el7-5.noarch.rpm
Installing this package provides two MySQL yum repos: /etc/yum.repos.d/mysql-community.repo and /etc/yum.repos.d/mysql-community-source.repo.
3. Install MySQL
[root@node4 ~]# yum install mysql-server
4. Start the database
[root@node4 /root]# service mysql start
5. Change the root password
[root@node4 /root]# mysqladmin -u root password 'AAAaaa111'
6. Configure remote access. For security, only local logins are allowed by default and remote access from other IPs is restricted.
[root@node4 /root]# mysql -h localhost -u root -p
Enter password: AAAaaa111
mysql> GRANT ALL PRIVILEGES ON *.* TO 'root'@'%' IDENTIFIED BY 'AAAaaa111' WITH GRANT OPTION;
mysql> flush privileges;
7. Check the database character set
mysql> show variables like 'character%';
8. Change the character set
[root@node4 /root]# vi /etc/my.cnf
[client]
default-character-set=utf8
[mysql]
default-character-set=utf8
[mysqld]
character-set-server=utf8
9. Case-sensitivity configuration
Make table names case-insensitive:
[root@node4 /root]# vi /etc/my.cnf
[mysqld]
lower_case_table_names = 1
0 means case-sensitive, 1 means case-insensitive.
10. Restart the service
[root@node4 /root]# service mysql stop
[root@node4 /root]# service mysql start
11. [root@node4 /root]# mysql -h localhost -u root -p
12. Check the character set again after the change:
mysql> show variables like 'character%';
13. Create the database
mysql> create database hive;
14. Show the databases
mysql> show databases;
15. Connect to the database
mysql> use hive;
16. List the tables in the database
mysql> show tables;
17. Exit:
mysql> exit;
Basic concept: the metastore consists of two parts, the service process and the data storage.
See this figure on page 374 of Hadoop: The Definitive Guide, 2nd edition:
1. The top shows embedded mode: the Hive service and the metastore service run in the same process, and the Derby database also runs in that process.
This mode needs no special configuration.
2. The middle shows local mode: the Hive service and the metastore service run in the same process, while MySQL is a separate process, on the same machine or a remote one.
For this mode, just point ConnectionURL in hive-site.xml at MySQL and configure the driver name and the database account:
3. The bottom shows remote mode: the Hive service and the metastore run in different processes, possibly on different machines.
This mode requires setting hive.metastore.local to false and hive.metastore.uris to the metastore server URI(s); multiple URIs are separated by commas. The URI format is thrift://host:port (Thrift is Hive's communication protocol):
<property>
<name>hive.metastore.uris</name>
<value>thrift://127.0.0.1:9083</value>
</property>
Once this is understood, it is clear that merely connecting to a remote MySQL is not what "remote mode" means; what matters is whether the metastore and the Hive service run in the same process. In other words, "remote" refers to the metastore being "far" from the Hive service.
Install Hive on node1 and the metastore service on node3:
1. Download address: http://apache.fayea.com/hive
The Hadoop version is 2.7.2, so download the apache-hive-1.2.1-bin.tar.gz package:
[root@node1 ~]# wget http://apache.fayea.com/hive/stable/apache-hive-1.2.1-bin.tar.gz
2. [root@node1 ~]# tar -zxvf apache-hive-1.2.1-bin.tar.gz
3. [root@node1 ~]# mv apache-hive-1.2.1-bin hive-1.2.1
4. [root@node1 ~]# vi /etc/profile
export HIVE_HOME=/root/hive-1.2.1
export PATH=.:$PATH:$JAVA_HOME/bin:$HIVE_HOME/bin
5. [root@node1 ~]# source /etc/profile
6. Put the mysql-connector-java-5.6-bin.jar driver into the /root/hive-1.2.1/lib/ directory
7. [root@node1 ~]# cp /root/hive-1.2.1/conf/hive-env.sh.template /root/hive-1.2.1/conf/hive-env.sh
8. [root@node1 ~]# vi /root/hive-1.2.1/conf/hive-env.sh
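The hive-env.sh edit is not shown; usually it only needs HADOOP_HOME (and optionally the Hive conf dir), e.g. this sketch:
# /root/hive-1.2.1/conf/hive-env.sh (sketch)
HADOOP_HOME=/root/hadoop-2.7.2
export HIVE_CONF_DIR=/root/hive-1.2.1/conf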
After the steps above, Hive should start with its default configuration (the embedded Derby database). Note: start Hadoop before running Hive:
[root@node1 ~]# hive
Logging initialized using configuration in jar:file:/root/hive-1.2.1/lib/hive-common-1.2.1.jar!/hive-log4j.properties
hive>
9. Copy Hive from node1 to node3
[root@node1 ~]# scp -r /root/hive-1.2.1 node3:/root
[root@node1 ~]# scp /etc/profile node3:/etc/profile
[root@node3 ~]# source /etc/profile
10. [root@node1 ~]# vi /root/hive-1.2.1/conf/hive-site.xml
<configuration>
<property>
<name>hive.metastore.uris</name>
<value>thrift://node3:9083</value>
</property>
</configuration>
11. [root@node3 ~]# vi /root/hive-1.2.1/conf/hive-site.xml
<configuration>
<property>
<name>hive.metastore.warehouse.dir</name>
<value>/user/hive/warehouse</value>
</property>
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://node4:3306/hive?createDatabaseIfNotExist=true&amp;characterEncoding=UTF-8</value>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>root</value>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>AAAaaa111</value>
</property>
</configuration>
12. Start the metastore service:
[root@node3 ~]# hive --service metastore&
[1] 2561
Starting Hive Metastore Server
[root@hadoop-slave1 /root]# jps
2561 RunJar
The & runs the metastore service in the background.
13. Start the Hive server:
[root@node1 ~]# hive --service hiveserver2 &
[1] 3310
[root@hadoop-master /root]# jps
3310 RunJar
The process name is also RunJar.
Note: do not use hive --service hiveserver to start the service, or the following exception is thrown:
Exception in thread "main" java.lang.ClassNotFoundException: org.apache.hadoop.hive.service.HiveServer
Note: running the hive command to enter the shell already starts a hiveserver as a side effect, so in remote mode only the metastore needs to be started separately; after that the shell works normally. This step can therefore be skipped; just run hive to enter the shell.
14. Start the Hive command line:
[root@hadoop-master /root]# hive
Logging initialized using configuration in jar:file:/root/hive-1.2.1/lib/hive-common-1.2.1.jar!/hive-log4j.properties
hive>
Note: starting hive also starts the hiveserver, so running hive --service hiveserver2 & is not necessary.
15. Verify Hive:
[root@hadoop-master /root]# hive
Logging initialized using configuration in jar:file:/root/hive-1.2.1/lib/hive-common-1.2.1.jar!/hive-log4j.properties
hive> show tables;
OK
Time taken: 1.011 seconds
hive> create table test(id int,name string);
One of the following two exceptions may occur:
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:For direct MetaStore DB connections, we don't support retries at the client level.)
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:javax.jdo.JDODataStoreException: An exception was thrown while adding/validating class(es) : Specified key was too long; max key length is 767 bytes
com.mysql.jdbc.exceptions.MySQLSyntaxErrorException: Specified key was too long; max key length is 767 bytes
This is caused by the database character set; log in to MySQL and change it:
[root@node4 /root]# mysql -h localhost -u root -p
mysql> alter database hive character set latin1;
16. Log in to MySQL to inspect the metastore information
mysql> use hive;
Check from Hadoop (HDFS):
[root@node1 ~]# hadoop-2.7.2/bin/hdfs dfs -ls /user/hive/warehouse
Found 1 items
drwxr-xr-x - root supergroup 0 2017-01-22 23:45 /user/hive/warehouse/test
1. [root@node1 ~]# wget -O /root/scala-2.12.1.tgz http://downloads.lightbend.com/scala/2.12.1/scala-2.12.1.tgz
2. [root@node1 ~]# tar -zxvf /root/scala-2.12.1.tgz
3. [root@node1 ~]# vi /etc/profile
export SCALA_HOME=/root/scala-2.12.1
export PATH=.:$PATH:$JAVA_HOME/bin:$HIVE_HOME/bin:$SCALA_HOME/bin
4. [root@node1 ~]# source /etc/profile
5. [root@node1 ~]# scala -version
Scala code runner version 2.12.1 -- Copyright 2002-2016, LAMP/EPFL and Lightbend, Inc.
[root@node1 ~]# scala
Welcome to Scala 2.12.1 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_92).
Type in expressions for evaluation. Or try :help.
scala> 9*9;
res0: Int = 81
scala>
6. [root@node1 ~]# scp -r /root/scala-2.12.1 node2:/root
[root@node1 ~]# scp -r /root/scala-2.12.1 node3:/root
[root@node1 ~]# scp -r /root/scala-2.12.1 node4:/root
[root@node1 ~]# scp /etc/profile node2:/etc
[root@node1 ~]# scp /etc/profile node3:/etc
[root@node1 ~]# scp /etc/profile node4:/etc
[root@node2 ~]# source /etc/profile
[root@node3 ~]# source /etc/profile
[root@node4 ~]# source /etc/profile
1. [root@node1 ~]# wget -O /root/spark-2.1.0-bin-hadoop2.7.tgz http://d3kbcqa49mib13.cloudfront.net/spark-2.1.0-bin-hadoop2.7.tgz
2. [root@node1 ~]# tar -zxvf /root/spark-2.1.0-bin-hadoop2.7.tgz
3. [root@node1 ~]# vi /etc/profile
export SPARK_HOME=/root/spark-2.1.0-bin-hadoop2.7
export PATH=.:$PATH:$JAVA_HOME/bin:$HIVE_HOME/bin:$SCALA_HOME/bin:$SPARK_HOME/bin
4. [root@node1 ~]# source /etc/profile
5. [root@node1 ~]# cp /root/spark-2.1.0-bin-hadoop2.7/conf/spark-env.sh.template /root/spark-2.1.0-bin-hadoop2.7/conf/spark-env.sh
6. [root@node1 ~]# vi /root/spark-2.1.0-bin-hadoop2.7/conf/spark-env.sh
export SCALA_HOME=/root/scala-2.12.1
export JAVA_HOME=/root/jdk1.8.0_92
export HADOOP_CONF_DIR=/root/hadoop-2.7.2/etc/hadoop
7. [root@node1 ~]# cp /root/spark-2.1.0-bin-hadoop2.7/conf/slaves.template /root/spark-2.1.0-bin-hadoop2.7/conf/slaves
8. [root@node1 ~]# vi /root/spark-2.1.0-bin-hadoop2.7/conf/slaves
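The Spark slaves file content is not shown; given that Workers later appear on node2, node3 and node4, it would contain:
node2
node3
node4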
9. [root@node1 ~]# scp -r /root/spark-2.1.0-bin-hadoop2.7 node2:/root
[root@node1 ~]# scp -r /root/spark-2.1.0-bin-hadoop2.7 node3:/root
[root@node1 ~]# scp -r /root/spark-2.1.0-bin-hadoop2.7 node4:/root
[root@node1 ~]# scp /etc/profile node2:/etc
[root@node1 ~]# scp /etc/profile node3:/etc
[root@node1 ~]# scp /etc/profile node4:/etc
[root@node2 ~]# source /etc/profile
[root@node3 ~]# source /etc/profile
[root@node4 ~]# source /etc/profile
10. [root@node1 conf]# /root/spark-2.1.0-bin-hadoop2.7/sbin/start-all.sh
[root@node1 ~]# jps
2569 Master
[root@node2 ~]# jps
2120 Worker
[root@node3 ~]# jps
2121 Worker
[root@node4 ~]# jps
2198 Worker
Test directly in the Spark shell:
[root@node1 conf]# spark-shell
val file=sc.textFile("hdfs://node1/hadoop/core-site.xml")
val rdd = file.flatMap(line => line.split(" ")).map(word => (word,1)).reduceByKey(_+_)
rdd.collect()
rdd.foreach(println)
Submit the WordCount example that ships with Hadoop through Spark:
[root@node1 ~]# spark-submit --master spark://node1:7077 --class org.apache.hadoop.examples.WordCount --name wordcount /root/hadoop-2.7.2/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar hdfs://node1/hadoop/core-site.xml hdfs://node1/output
However, this is still submitted as a MapReduce job rather than a Spark job: the example jar is written in Java and does not use the Spark API.
Test with the WordCount example that ships with Spark:
spark-submit --master spark://node1:7077 --class org.apache.spark.examples.JavaWordCount --name wordcount /root/spark-2.1.0-bin-hadoop2.7/examples/jars/spark-examples_2.11-2.1.0.jar hdfs://node1/hadoop/core-site.xml hdfs://node1/output
This example is also implemented in Java, but it uses the Spark API, so a Spark job is produced:
Fix for Hive failing at startup with Spark 2.0.0: unable to access ../lib/spark-assembly-*.jar: No such file or directory
[root@node1 ~]# vi /root/hive-1.2.1/bin/hive
#sparkAssemblyPath=`ls ${SPARK_HOME}/lib/spark-assembly-*.jar`
sparkAssemblyPath=`ls ${SPARK_HOME}/jars/*.jar`
[root@node1 ~]# scp /root/hive-1.2.1/bin/hive node3:/root/hive-1.2.1/bin
yum stores downloaded packages and headers in its cache and does not delete them automatically. Clear the yum cache:
[root@node1 ~]# yum clean all
[root@node1 ~]# dd if=/dev/zero of=/0bits bs=20M   # fill the free space with zeros; the "disk full" message at the end can be ignored
[root@node1 ~]# rm /0bits   # delete the zero-fill file
Shut down the virtual machine, open cmd, cd into the VMware installation folder (e.g. D:\BOE4) and run:
vmware-vdiskmanager -k D:\hadoop\spark\VM\node1\node1.vmdk     (note: this is the parent vmdk file, not one of the child disks)
Component | Node | Default port | Configuration | Description
HDFS | DataNode | 50010 | dfs.datanode.address | DataNode service port, used for data transfer
HDFS | DataNode | 50075 | dfs.datanode.http.address | HTTP service port
HDFS | DataNode | 50475 | dfs.datanode.https.address | HTTPS service port
HDFS | DataNode | 50020 | dfs.datanode.ipc.address | IPC service port
HDFS | NameNode | 50070 | dfs.namenode.http-address | HTTP service port
HDFS | NameNode | 50470 | dfs.namenode.https-address | HTTPS service port
HDFS | NameNode | 8020 | fs.defaultFS | RPC port for client connections, used to obtain file system metadata
HDFS | JournalNode | 8485 | dfs.journalnode.rpc-address | RPC service
HDFS | JournalNode | 8480 | dfs.journalnode.http-address | HTTP service
HDFS | ZKFC | 8019 | dfs.ha.zkfc.port | ZooKeeper FailoverController, used for NameNode HA
YARN | ResourceManager | 8032 | yarn.resourcemanager.address | RM applications manager (ASM) port
YARN | ResourceManager | 8030 | yarn.resourcemanager.scheduler.address | IPC port of the scheduler component
YARN | ResourceManager | 8031 | yarn.resourcemanager.resource-tracker.address | IPC
YARN | ResourceManager | 8033 | yarn.resourcemanager.admin.address | IPC
YARN | ResourceManager | 8088 | yarn.resourcemanager.webapp.address | HTTP service port
YARN | NodeManager | 8040 | yarn.nodemanager.localizer.address | localizer IPC
YARN | NodeManager | 8042 | yarn.nodemanager.webapp.address | HTTP service port
YARN | NodeManager | 8041 | yarn.nodemanager.address | port of the container manager inside the NM
YARN | JobHistory Server | 10020 | mapreduce.jobhistory.address | IPC
YARN | JobHistory Server | 19888 | mapreduce.jobhistory.webapp.address | HTTP service port
HBase | Master | 60000 | hbase.master.port | IPC
HBase | Master | 60010 | hbase.master.info.port | HTTP service port
HBase | RegionServer | 60020 | hbase.regionserver.port | IPC
HBase | RegionServer | 60030 | hbase.regionserver.info.port | HTTP service port
HBase | HQuorumPeer | 2181 | hbase.zookeeper.property.clientPort | HBase-managed ZK mode; not used when an independent ZooKeeper ensemble is used
HBase | HQuorumPeer | 2888 | hbase.zookeeper.peerport | HBase-managed ZK mode; not used when an independent ZooKeeper ensemble is used
HBase | HQuorumPeer | 3888 | hbase.zookeeper.leaderport | HBase-managed ZK mode; not used when an independent ZooKeeper ensemble is used
Hive | Metastore | 9083 | export PORT=<port> in /etc/default/hive-metastore to change the default port |
Hive | HiveServer | 10000 | export HIVE_SERVER2_THRIFT_PORT=<port> in /etc/hive/conf/hive-env.sh to change the default port |
ZooKeeper | Server | 2181 | clientPort=<port> in /etc/zookeeper/conf/zoo.cfg | port that serves clients
ZooKeeper | Server | 2888 | the first nnnnn in server.x=[hostname]:nnnnn[:nnnnn] in /etc/zookeeper/conf/zoo.cfg | used by followers to connect to the leader; only the leader listens on this port
ZooKeeper | Server | 3888 | the second nnnnn in server.x=[hostname]:nnnnn[:nnnnn] in /etc/zookeeper/conf/zoo.cfg | used for leader election; only needed when electionAlg is 1, 2 or 3 (the default)
Find files larger than 10 MB:
find . -type f -size +10M -print0 | xargs -0 du -h | sort -nr
List the 20 largest directories; --max-depth sets the directory depth, and without it all subdirectories are traversed:
du -hm --max-depth=5 / | sort -nr | head -20
find /etc -name '*srm*' # find all files under /etc whose names contain the string srm
Clear the yum cache
yum stores downloaded packages and headers in its cache and does not delete them automatically. If they take up too much disk space, use yum clean: yum clean headers removes the headers, yum clean packages removes the downloaded rpm packages, and yum clean all removes everything.
Change the owner
chown -R -v 15040078 /tmp
[root@node1 ~/hadoop-2.6.0/bin]# ./hdfs dfs -chmod -R 700 /tmp