1. Hadoop Cluster Planning
IP | Hostname | Installed software | Roles | Running processes |
---|---|---|---|---|
10.124.147.22 | hadoop1 | jdk、zookeeper、hadoop | namenode/zookeeper/jobhistoryserver | DFSZKFailoverController、NameNode、JobHistoryServer、QuorumPeerMain |
10.124.147.23 | hadoop2 | jdk、zookeeper、hadoop | namenode/zookeeper | DFSZKFailoverController、NameNode、QuorumPeerMain |
10.124.147.32 | hadoop3 | jdk、zookeeper、hadoop | resourcemanager/zookeeper | ResourceManager、QuorumPeerMain |
10.124.147.33 | hadoop4 | jdk、zookeeper、hadoop | resourcemanager/zookeeper | ResourceManager、QuorumPeerMain |
10.110.92.161 | hadoop5 | jdk、hadoop | datanode/journalnode | NodeManager、JournalNode、DataNode |
10.110.92.162 | hadoop6 | jdk、hadoop | datanode/journalnode | NodeManager、JournalNode、DataNode |
10.122.147.37 | hadoop7 | jdk、hadoop | datanode/journalnode | NodeManager、JournalNode、DataNode |
2. Base Environment
- OS: CentOS 6.5
- Hadoop: 2.7.6
- ZooKeeper: 3.4.12
- JDK: 1.8.0
3. Environment Preparation
3.1 hosts setup
```
[root@10-124-147-23 local]# cat /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
10.124.147.22 hadoop1 10-124-147-22
10.124.147.23 hadoop2 10-124-147-23
10.124.147.32 hadoop3 10-124-147-32
10.124.147.33 hadoop4 10-124-147-33
10.110.92.161 hadoop5 10-110-92-161
10.110.92.162 hadoop6 10-110-92-162
10.122.147.37 hadoop7 10-122-147-37
```
A few points to note here:

- Do not put any of the cluster hostnames after 127.0.0.1, such as the 10-124-147-22 alias shown above.
- It is best to remove the localhost entries from the IPv6 (::1) line.
- Besides hadoop1, I also added the 10-124-147-22 alias because I did not want to change the machine's hostname; in a real deployment you can simply change the hostname directly (see the sketch below).
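For reference, a minimal sketch of changing the hostname permanently on CentOS 6 (assumes root access; hadoop1 is just an example name):

```bash
# takes effect for the running system immediately
hostname hadoop1

# persist the change across reboots on CentOS 6
sed -i 's/^HOSTNAME=.*/HOSTNAME=hadoop1/' /etc/sysconfig/network
```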
3.2 Installing the Java environment
3.2.1 Unpacking the JDK
```
[root@10-124-147-23 letv]# tar xvf jdk-8u141-linux-x64.tar.gz
[root@10-124-147-23 letv]# ln -svfn /letv/jdk1.8.0_141 /usr/local/java
```
3.2.2 Updating /etc/profile
```
[root@10-124-147-23 letv]# tail -3 /etc/profile
export JAVA_HOME=/usr/local/java
export HADOOP_HOME=/usr/local/hadoop
export PATH=$HADOOP_HOME/bin:$JAVA_HOME/bin:$PATH
[root@10-124-147-23 letv]# source /etc/profile
```
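A quick way to confirm the profile change took effect (a sketch; the exact version output depends on your JDK build):

```bash
# should print the JDK path and version configured above
echo $JAVA_HOME
java -version
```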
3.3 Installing the ZooKeeper cluster
3.3.1 Unpacking ZooKeeper
```
[root@10-124-147-23 letv]# tar xvf zookeeper-3.4.12.tar.gz
[root@10-124-147-23 letv]# ln -svnf /letv/zookeeper-3.4.12 /usr/local/zookeeper
[root@10-124-147-23 letv]# cd /usr/local/zookeeper/conf
[root@10-124-147-23 conf]# ll
total 16
-rw-rw-r-- 1 1000 1000  535 Mar 27 12:32 configuration.xsl
-rw-rw-r-- 1 1000 1000 2161 Mar 27 12:32 log4j.properties
-rw-rw-r-- 1 1000 1000  922 Mar 27 12:32 zoo_sample.cfg
[root@10-124-147-23 conf]# cp zoo_sample.cfg zoo.cfg
```
3.3.2 Modifying zoo.cfg
```
[root@10-124-147-23 conf]# grep ^[^#] zoo.cfg
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/usr/local/zookeeper/data
clientPort=2181
server.1=hadoop1:2888:3888
server.2=hadoop2:2888:3888
server.3=hadoop3:2888:3888
server.4=hadoop4:2888:3888
```
Modify the dataDir value, and since we are building a ZooKeeper cluster at the same time, also add the corresponding server.N addresses shown above.
```
[root@10-124-147-23 conf]# echo 1 > /usr/local/zookeeper/data/myid
```
Write this host's id within the ZooKeeper cluster into myid, then start ZooKeeper.
3.3.3 Starting ZooKeeper
```
[root@10-124-147-23 bin]# pwd
/usr/local/zookeeper/bin
[root@10-124-147-23 bin]# ./zkServer.sh start
```
Start ZooKeeper on the other hosts in the same way; the only difference is the value in /usr/local/zookeeper/data/myid, which must be different on each host.
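A minimal sketch of writing a distinct myid to every node in one pass (assumes root SSH access from one node and that the hostname-to-id mapping matches the server.N lines in zoo.cfg):

```bash
# server.N in zoo.cfg must match the value in that node's myid file
id=1
for host in hadoop1 hadoop2 hadoop3 hadoop4; do
    ssh "$host" "mkdir -p /usr/local/zookeeper/data && echo $id > /usr/local/zookeeper/data/myid"
    id=$((id + 1))
done
```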
3.3.4 Checking ZooKeeper status
```
[root@10-124-147-23 bin]# ./zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /usr/local/zookeeper/bin/../conf/zoo.cfg
Mode: follower
[root@10-124-147-33 ~]# /usr/local/zookeeper/bin/zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /usr/local/zookeeper/bin/../conf/zoo.cfg
Mode: leader
```
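As an alternative check, ZooKeeper also answers four-letter-word commands on its client port; a sketch assuming nc (netcat) is installed and the four-letter-word commands are not restricted:

```bash
# "ruok" returns "imok" if the server is alive; "stat" shows its role and connections
echo ruok | nc hadoop1 2181
echo stat | nc hadoop1 2181
```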
4. Installing Hadoop
Hadoop 2.0 officially provides two HDFS HA solutions, NFS and QJM; here we use the simpler QJM. In this scheme the active and standby NameNodes synchronize metadata through a group of JournalNodes: an edit is considered successfully written once it has been written to a majority of the JournalNodes. The number of JournalNodes must be odd.
4.1 Unpacking Hadoop
```
[root@10-124-147-33 letv]# tar xvf hadoop-2.7.6.tar.gz
[root@10-124-147-23 ~]# ln -svnf /letv/hadoop-2.7.6 /usr/local/hadoop
```
4.2 Hadoop environment
For this Hadoop installation we only need to point at the Java and Hadoop environments; both ZooKeeper and Hadoop require a Java runtime, which was already configured above.
```
[root@10-124-147-23 letv]# tail -3 /etc/profile
export JAVA_HOME=/usr/local/java
export HADOOP_HOME=/usr/local/hadoop
export PATH=$HADOOP_HOME/bin:$JAVA_HOME/bin:$PATH
```
4.3 Modifying the Hadoop configuration files
The Hadoop configuration files live under etc/hadoop; the six main files to edit are listed below.
4.3.1 hadoop-env.sh
```
[root@10-124-147-23 ~]# grep JAVA_HOME /usr/local/hadoop/etc/hadoop/hadoop-env.sh
# The only required environment variable is JAVA_HOME.  All others are
# set JAVA_HOME in this file, so that it is correctly defined on
export JAVA_HOME=/usr/local/java
```
JAVA_HOME here has to point at the actual path of the Java installation; you cannot use ${JAVA_HOME} directly, because the variable is not resolved in this file (the exact reason is unclear to me).
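If you need to apply the same change on every node, a hedged one-liner sketch (assumes the stock hadoop-env.sh with its default `export JAVA_HOME=...` line):

```bash
# replace the default JAVA_HOME export with the concrete JDK path
sed -i 's|^export JAVA_HOME=.*|export JAVA_HOME=/usr/local/java|' \
    /usr/local/hadoop/etc/hadoop/hadoop-env.sh
```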
4.3.2 hdfs-site.xml
```
[root@10-124-147-23 ~]# cat /usr/local/hadoop/etc/hadoop/hdfs-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
    <!-- Name of the HDFS nameservice, ns1; must match core-site.xml -->
    <property>
        <name>dfs.nameservices</name>
        <value>ns1</value>
    </property>
    <!-- ns1 has two NameNodes, nn1 and nn2 -->
    <property>
        <name>dfs.ha.namenodes.ns1</name>
        <value>nn1,nn2</value>
    </property>
    <!-- RPC address of nn1 -->
    <property>
        <name>dfs.namenode.rpc-address.ns1.nn1</name>
        <value>hadoop1:9000</value>
    </property>
    <!-- HTTP address of nn1 -->
    <property>
        <name>dfs.namenode.http-address.ns1.nn1</name>
        <value>hadoop1:50070</value>
    </property>
    <!-- RPC address of nn2 -->
    <property>
        <name>dfs.namenode.rpc-address.ns1.nn2</name>
        <value>hadoop2:9000</value>
    </property>
    <!-- HTTP address of nn2 -->
    <property>
        <name>dfs.namenode.http-address.ns1.nn2</name>
        <value>hadoop2:50070</value>
    </property>
    <!-- Where the NameNode metadata (edit log) is stored on the JournalNodes -->
    <property>
        <name>dfs.namenode.shared.edits.dir</name>
        <value>qjournal://hadoop5:8485;hadoop6:8485;hadoop7:8485/ns1</value>
    </property>
    <!-- Where each JournalNode stores its data on local disk -->
    <property>
        <name>dfs.journalnode.edits.dir</name>
        <value>/usr/local/hadoop/data/journaldata</value>
    </property>
    <!-- Enable automatic NameNode failover -->
    <property>
        <name>dfs.ha.automatic-failover.enabled</name>
        <value>true</value>
    </property>
    <!-- Failover proxy provider used for automatic failover -->
    <property>
        <name>dfs.client.failover.proxy.provider.ns1</name>
        <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
    </property>
    <!-- Fencing methods, separated by newlines, one per line -->
    <property>
        <name>dfs.ha.fencing.methods</name>
        <value>
            sshfence
            shell(/bin/true)
        </value>
    </property>
    <!-- sshfence requires passwordless SSH; path to the private key -->
    <property>
        <name>dfs.ha.fencing.ssh.private-key-files</name>
        <value>/home/hadoop/.ssh/id_rsa</value>
    </property>
    <!-- Timeout for the sshfence connection -->
    <property>
        <name>dfs.ha.fencing.ssh.connect-timeout</name>
        <value>30000</value>
    </property>
</configuration>
```
Note that in Hadoop 3 the HDFS web UI port has changed from 50070 to 9870.
4.3.3 mapred-site.xml
```
[root@10-124-147-23 ~]# cat /usr/local/hadoop/etc/hadoop/mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
    <!-- Run the MapReduce framework on YARN -->
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.address</name>
        <value>hadoop1:10020</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>hadoop1:19888</value>
    </property>
</configuration>
```
4.3.4 core-site.xml
```
[root@10-124-147-23 ~]# cat /usr/local/hadoop/etc/hadoop/core-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
    <!-- Point the default filesystem at the ns1 nameservice -->
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://ns1</value>
    </property>
    <!-- Hadoop temporary directory -->
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/usr/local/hadoop/data/tmp</value>
    </property>
    <!-- ZooKeeper quorum addresses -->
    <property>
        <name>ha.zookeeper.quorum</name>
        <value>hadoop1:2181,hadoop2:2181,hadoop3:2181,hadoop4:2181</value>
    </property>
</configuration>
```
4.3.5 yarn-site.xml
```
[root@10-124-147-23 ~]# cat /usr/local/hadoop/etc/hadoop/yarn-site.xml
<?xml version="1.0"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<configuration>
    <!-- Site specific YARN configuration properties -->
    <!-- Enable ResourceManager HA -->
    <property>
        <name>yarn.resourcemanager.ha.enabled</name>
        <value>true</value>
    </property>
    <!-- Cluster id of the RM pair -->
    <property>
        <name>yarn.resourcemanager.cluster-id</name>
        <value>yrc</value>
    </property>
    <!-- Logical names of the two RMs -->
    <property>
        <name>yarn.resourcemanager.ha.rm-ids</name>
        <value>rm1,rm2</value>
    </property>
    <!-- Hosts of the two RMs -->
    <property>
        <name>yarn.resourcemanager.hostname.rm1</name>
        <value>hadoop3</value>
    </property>
    <property>
        <name>yarn.resourcemanager.hostname.rm2</name>
        <value>hadoop4</value>
    </property>
    <!-- ZooKeeper quorum addresses -->
    <property>
        <name>yarn.resourcemanager.zk-address</name>
        <value>hadoop1:2181,hadoop2:2181,hadoop3:2181,hadoop4:2181</value>
    </property>
    <!-- Allow task state to be recovered after an RM takeover -->
    <property>
        <name>yarn.resourcemanager.recovery.enabled</name>
        <value>true</value>
    </property>
    <!-- Where YARN state is stored; the default is HDFS, here we use ZooKeeper -->
    <property>
        <name>yarn.resourcemanager.store.class</name>
        <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
    </property>
    <!-- Enable the mapreduce_shuffle auxiliary service so MapReduce can run on YARN -->
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
</configuration>
```
4.3.6 slaves
```
[root@10-124-147-23 ~]# cat /usr/local/hadoop/etc/hadoop/slaves
hadoop5
hadoop6
hadoop7
```
The meaning of slaves here is twofold. For hadoop1, which is a namenode, its slaves are the slaves of the HDFS layer, i.e. the datanodes; in this article hadoop5, hadoop6 and hadoop7 are set as datanodes.

For hadoop3, which is a resourcemanager, its slaves are the slaves of the YARN layer, i.e. the nodemanagers. A nodemanager monitors the resource usage of its own machine and reports it to the resourcemanager, and a datanode normally runs a nodemanager as well.

In this article the journalnode, nodemanager and datanode roles all live on the same machines. In fact the journalnode only takes part in namenode HA and has nothing to do with the other two roles. A cluster must never have two namenodes working at the same time, otherwise the namespace would be corrupted; but for HA the standby namenode has to keep its data consistent with the active namenode. To stay in sync, the two namenodes communicate through a group of independent processes called journalnodes. Whenever the active namenode modifies its namespace, it records the change on a majority of the journalnodes. The standby namenode can read the changes from the journalnodes, keeps watching the edit log, and applies every change to its own namespace, which ensures that when the cluster fails over the namespace state is already fully synchronized.

In normal production journalnodes are usually set to 5, and the number of zookeeper nodes is usually 5 as well; the 4 zookeeper nodes used in this article are actually not a sensible choice.

In summary, for hadoop3 the slaves can also be set to hadoop5, hadoop6 and hadoop7, so all nodes in this article can share the same Hadoop configuration.
4.3.7 SSH key setup
In real production only the namenodes actually need passwordless SSH between themselves; in this test environment, however, the other slave nodes are started directly from the namenode via scripts, so passwordless SSH has to be set up more broadly.

Concretely, every datanode needs the SSH public keys of the two namenodes and the two resourcemanagers, and the namenodes and resourcemanagers also need their own keys so they can start their local daemons. So the SSH public keys of the hadoop user on the four hosts hadoop1, hadoop2, hadoop3 and hadoop4 have to be placed under the hadoop user of every host.
```
[root@10-124-147-23 ~]# useradd hadoop
[hadoop@10-124-147-23 ~]$ ssh-keygen
[hadoop@10-124-147-23 ~]$ cat .ssh/id_rsa.pub
ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEAyQ9T7zTAlhqFM9XQoHTPzwfgDwAzwLUgqe7NnDpufiirK9QqCdLZFNE6PNtN7oNyWMu3r9UE5aMYv9uLMu22m+8xyTXXINYfPW9hsityu/N6a9DwhEC9joNS3DVjBR8YRMQG2sxtDbebbaG2R4BK77DZyoB0uyqRItxLIMYTiZ/00LCMJCoAINUQVzOrteVpLHAviRNnrwZewoD2sUgeZU0A0hT++RiE/prqI+jIFJSacduVaKsabRu/zKan9b8coC1b+GJnypqk+CPyahJL+0jgb9Jgrjm2Lt4erbBo/k3u16nSJpSoSdf7kr5HKv3ds5+fwcMQV5oKV1jv6ximIw== hadoop@10-124-147-23
```
Then switch to the other nodes, create the hadoop user on each of them in turn, and write in the namenode's SSH public key:
```
[root@10-124-147-33 letv]# useradd hadoop
[hadoop@10-124-147-33 ~]$ mkdir .ssh
[hadoop@10-124-147-33 ~]$ chmod g-w .ssh
```

This step is important. Normally you would set a password for the hadoop user and then use ssh-copy-id to write the key to the other hosts automatically. Here the hadoop user has no password, and for security sshd rejects keys when group or other users have write permission on .ssh, so the w bit for group and other has to be removed from the .ssh directory. The same applies to the authorized_keys file below.

```
[hadoop@10-124-147-33 ~]$ vim .ssh/authorized_keys     # paste in the id_rsa.pub from hadoop1
[hadoop@10-124-147-33 ~]$ chmod 600 .ssh/authorized_keys
[hadoop@10-124-147-33 ~]$ ll .ssh/authorized_keys
-rw------- 1 hadoop hadoop 1608 Jul 19 11:43 .ssh/authorized_keys
[hadoop@10-124-147-33 ~]$ ll -d .ssh/
drwxr-xr-x 2 hadoop hadoop 4096 Jul 19 11:43 .ssh/
```
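To avoid repeating this on every host by hand, here is a rough sketch of pushing the collected public keys out from a machine that already has root SSH access to all nodes; the keys.pub file holding the four public keys is a hypothetical helper, not part of the original setup:

```bash
# keys.pub contains the id_rsa.pub lines of the hadoop user on hadoop1-hadoop4
for host in hadoop1 hadoop2 hadoop3 hadoop4 hadoop5 hadoop6 hadoop7; do
    ssh "root@$host" "id hadoop >/dev/null 2>&1 || useradd hadoop"
    ssh "root@$host" "mkdir -p ~hadoop/.ssh && chmod 700 ~hadoop/.ssh"
    scp keys.pub "root@$host:/home/hadoop/.ssh/authorized_keys"
    ssh "root@$host" "chmod 600 ~hadoop/.ssh/authorized_keys && chown -R hadoop:hadoop ~hadoop/.ssh"
done
```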
4.3.8 Copying the Hadoop files
scp the whole hadoop directory from hadoop1 to the other nodes; remember /etc/profile as well, plus the Java environment on any node that does not have it yet.
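A minimal sketch of that distribution step (assumes the passwordless SSH set up above and identical paths on every node; /etc/profile and the JDK still have to be handled as root on each host):

```bash
# sync the hadoop configuration directory to every other node, run as the hadoop user
for host in hadoop2 hadoop3 hadoop4 hadoop5 hadoop6 hadoop7; do
    scp -r /usr/local/hadoop/etc/hadoop/ "$host":/usr/local/hadoop/etc/
done
```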
4.4 Starting Hadoop
4.4.1 Starting the journalnodes
```
[hadoop@10-110-92-161 ~]$ cd /usr/local/hadoop/
[hadoop@10-110-92-161 hadoop]$ sbin/hadoop-daemon.sh start journalnode
[hadoop@10-110-92-161 hadoop]$ jps
1557 JournalNode
22439 Jps
```
The journalnode has to be started on all three nodes.
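A sketch of starting all three from one shell (assumes passwordless SSH as the hadoop user):

```bash
# start a JournalNode on each of the three journal hosts
for host in hadoop5 hadoop6 hadoop7; do
    ssh "$host" "/usr/local/hadoop/sbin/hadoop-daemon.sh start journalnode"
done
```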
4.4.2 Formatting the namenode
```
[hadoop@10-124-147-22 hadoop]$ hdfs namenode -format
```
4.4.3 Starting the active namenode
```
[hadoop@10-124-147-22 hadoop]$ sbin/hadoop-daemon.sh start namenode
[hadoop@10-124-147-22 hadoop]$ jps
2580 DFSZKFailoverController
29590 Jps
1487 NameNode
```
4.4.4 Copying the active namenode metadata to the standby namenode
Formatting the active namenode creates files under the directory configured by hadoop.tmp.dir in core-site.xml. You can copy them to the standby namenode directly, or pull them from the active namenode with the -bootstrapStandby option; pulling them with the command requires the active namenode process to be running.
```
[hadoop@10-124-147-23 hadoop]$ hdfs namenode -bootstrapStandby
[hadoop@10-124-147-23 hadoop]$ sbin/hadoop-daemon.sh start namenode
[hadoop@10-124-147-23 hadoop]$ jps
899 NameNode
11846 Jps
1353 DFSZKFailoverController
```
4.4.5 Formatting zkfc
```
[hadoop@10-124-147-22 hadoop]$ hdfs zkfc -formatZK
```
4.4.6 Starting HDFS
```
[hadoop@10-124-147-22 hadoop]$ sbin/start-dfs.sh
```
4.4.7 Starting the resourcemanager
```
[hadoop@10-124-147-32 hadoop]$ pwd
/usr/local/hadoop
[hadoop@10-124-147-32 hadoop]$ sbin/start-yarn.sh
[hadoop@10-124-147-32 hadoop]$ jps
30882 ResourceManager
26868 Jps
```
4.4.8 Starting the standby resourcemanager
```
[hadoop@10-124-147-33 hadoop]$ pwd
/usr/local/hadoop
[hadoop@10-124-147-33 hadoop]$ sbin/yarn-daemon.sh start resourcemanager
[hadoop@10-124-147-33 hadoop]$ jps
22675 Jps
26980 ResourceManager
```
4.4.9 Checking the cluster state
```
[hadoop@10-124-147-22 hadoop]$ hdfs haadmin -getServiceState nn1
active
[hadoop@10-124-147-22 hadoop]$ hdfs haadmin -getServiceState nn2
standby
[hadoop@10-124-147-22 hadoop]$ yarn rmadmin -getServiceState rm1
active
[hadoop@10-124-147-22 hadoop]$ yarn rmadmin -getServiceState rm2
standby
```
At this point you can reach the web UI of the active namenode on port 50070 and the active resourcemanager on port 8088.
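A quick command-line sanity check of those UIs (assumes curl is available; point at whichever nodes are currently active):

```bash
# an HTTP 200/302 response indicates the web UI is up
curl -s -o /dev/null -w "%{http_code}\n" http://hadoop1:50070/
curl -s -o /dev/null -w "%{http_code}\n" http://hadoop3:8088/cluster
```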
4.4.10 Starting the history server
Start it on the active namenode:
```
[hadoop@10-124-147-22 hadoop]$ sbin/mr-jobhistory-daemon.sh start historyserver
[hadoop@10-124-147-22 hadoop]$ pwd
/usr/local/hadoop
[hadoop@10-124-147-22 hadoop]$ jps
2580 DFSZKFailoverController
31781 Jps
2711 JobHistoryServer
1487 NameNode
```
4.5 Basic Hadoop usage
4.5.1 Uploading a file to HDFS
Create a file /tmp/test.txt:

```
[hadoop@10-124-147-22 hadoop]$ cat /tmp/test.txt
hello world
hello mysql
hello mongo
hello elasticsearch
hello hadoop
hello hdfs
hello yarn
hello namenode
hello datanode
hello resourcemanager
hello nodemanager
hello journalnode
```

Upload /tmp/test.txt to HDFS, renaming it to wordcount:

```
[hadoop@10-124-147-22 hadoop]$ hadoop fs -put /tmp/test.txt /wordcount
[hadoop@10-124-147-22 hadoop]$ hadoop fs -cat /wordcount
hello world
hello mysql
hello mongo
hello elasticsearch
hello hadoop
hello hdfs
hello yarn
hello namenode
hello datanode
hello resourcemanager
hello nodemanager
hello journalnode
```
4.5.2 Testing a Hadoop job
Hadoop ships with a jar of simple example jobs that can be used for testing.
```
[hadoop@10-124-147-22 hadoop]$ hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.6.jar pi 2 10
Number of Maps  = 2
Samples per Map = 10
Wrote input for Map #0
Wrote input for Map #1
Starting Job
18/07/23 15:41:47 INFO input.FileInputFormat: Total input paths to process : 2
18/07/23 15:41:47 INFO mapreduce.JobSubmitter: number of splits:2
18/07/23 15:41:47 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1532056892547_0003
18/07/23 15:41:47 INFO impl.YarnClientImpl: Submitted application application_1532056892547_0003
18/07/23 15:41:47 INFO mapreduce.Job: The url to track the job: http://hadoop3:8088/proxy/application_1532056892547_0003/
18/07/23 15:41:47 INFO mapreduce.Job: Running job: job_1532056892547_0003
18/07/23 15:41:53 INFO mapreduce.Job: Job job_1532056892547_0003 running in uber mode : false
18/07/23 15:41:53 INFO mapreduce.Job:  map 0% reduce 0%
18/07/23 15:41:58 INFO mapreduce.Job:  map 100% reduce 0%
18/07/23 15:42:03 INFO mapreduce.Job:  map 100% reduce 100%
18/07/23 15:42:04 INFO mapreduce.Job: Job job_1532056892547_0003 completed successfully
18/07/23 15:42:05 INFO mapreduce.Job: Counters: 49
    File System Counters
        FILE: Number of bytes read=50
        FILE: Number of bytes written=376437
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=510
        HDFS: Number of bytes written=215
        HDFS: Number of read operations=11
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=3
    Job Counters
        Launched map tasks=2
        Launched reduce tasks=1
        Data-local map tasks=2
        Total time spent by all maps in occupied slots (ms)=5283
        Total time spent by all reduces in occupied slots (ms)=2804
        Total time spent by all map tasks (ms)=5283
        Total time spent by all reduce tasks (ms)=2804
        Total vcore-milliseconds taken by all map tasks=5283
        Total vcore-milliseconds taken by all reduce tasks=2804
        Total megabyte-milliseconds taken by all map tasks=5409792
        Total megabyte-milliseconds taken by all reduce tasks=2871296
    Map-Reduce Framework
        Map input records=2
        Map output records=4
        Map output bytes=36
        Map output materialized bytes=56
        Input split bytes=274
        Combine input records=0
        Combine output records=0
        Reduce input groups=2
        Reduce shuffle bytes=56
        Reduce input records=4
        Reduce output records=0
        Spilled Records=8
        Shuffled Maps =2
        Failed Shuffles=0
        Merged Map outputs=2
        GC time elapsed (ms)=219
        CPU time spent (ms)=3030
        Physical memory (bytes) snapshot=752537600
        Virtual memory (bytes) snapshot=6612717568
        Total committed heap usage (bytes)=552075264
    Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
    File Input Format Counters
        Bytes Read=236
    File Output Format Counters
        Bytes Written=97
Job Finished in 18.492 seconds
Estimated value of Pi is 3.80000000000000000000
```
While the job is running you can watch the resourcemanager web UI on port 8088, which shows the job's progress.

Next, run a word count job.
Run a word count over the /wordcount file in HDFS and write the result to /wordcount-to-output:

```
[hadoop@10-124-147-22 hadoop]$ hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.6.jar wordcount /wordcount /wordcount-to-output
18/07/23 15:45:12 INFO input.FileInputFormat: Total input paths to process : 1
18/07/23 15:45:13 INFO mapreduce.JobSubmitter: number of splits:1
18/07/23 15:45:13 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1532056892547_0004
18/07/23 15:45:13 INFO impl.YarnClientImpl: Submitted application application_1532056892547_0004
18/07/23 15:45:13 INFO mapreduce.Job: The url to track the job: http://hadoop3:8088/proxy/application_1532056892547_0004/
18/07/23 15:45:13 INFO mapreduce.Job: Running job: job_1532056892547_0004
18/07/23 15:45:19 INFO mapreduce.Job: Job job_1532056892547_0004 running in uber mode : false
18/07/23 15:45:19 INFO mapreduce.Job:  map 0% reduce 0%
18/07/23 15:45:23 INFO mapreduce.Job:  map 100% reduce 0%
18/07/23 15:45:29 INFO mapreduce.Job:  map 100% reduce 100%
18/07/23 15:45:29 INFO mapreduce.Job: Job job_1532056892547_0004 completed successfully
18/07/23 15:45:29 INFO mapreduce.Job: Counters: 49
    File System Counters
        FILE: Number of bytes read=197
        FILE: Number of bytes written=250631
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=264
        HDFS: Number of bytes written=140
        HDFS: Number of read operations=6
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=2
    Job Counters
        Launched map tasks=1
        Launched reduce tasks=1
        Data-local map tasks=1
        Total time spent by all maps in occupied slots (ms)=2492
        Total time spent by all reduces in occupied slots (ms)=3007
        Total time spent by all map tasks (ms)=2492
        Total time spent by all reduce tasks (ms)=3007
        Total vcore-milliseconds taken by all map tasks=2492
        Total vcore-milliseconds taken by all reduce tasks=3007
        Total megabyte-milliseconds taken by all map tasks=2551808
        Total megabyte-milliseconds taken by all reduce tasks=3079168
    Map-Reduce Framework
        Map input records=12
        Map output records=24
        Map output bytes=275
        Map output materialized bytes=197
        Input split bytes=85
        Combine input records=24
        Combine output records=13
        Reduce input groups=13
        Reduce shuffle bytes=197
        Reduce input records=13
        Reduce output records=13
        Spilled Records=26
        Shuffled Maps =1
        Failed Shuffles=0
        Merged Map outputs=1
        GC time elapsed (ms)=155
        CPU time spent (ms)=2440
        Physical memory (bytes) snapshot=465940480
        Virtual memory (bytes) snapshot=4427837440
        Total committed heap usage (bytes)=350224384
    Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
    File Input Format Counters
        Bytes Read=179
    File Output Format Counters
        Bytes Written=140
```
The result:

```
[hadoop@10-124-147-22 hadoop]$ hadoop fs -ls /
Found 5 items
drwxrwx---   - hadoop supergroup          0 2018-07-20 11:21 /tmp
drwxr-xr-x   - hadoop supergroup          0 2018-07-20 11:47 /user
-rw-r--r--   3 hadoop supergroup        179 2018-07-20 11:22 /wordcount
drwxr-xr-x   - hadoop supergroup          0 2018-07-23 15:45 /wordcount-to-output
[hadoop@10-124-147-22 hadoop]$ hadoop fs -ls /wordcount-to-output
Found 2 items
-rw-r--r--   3 hadoop supergroup          0 2018-07-23 15:45 /wordcount-to-output/_SUCCESS
-rw-r--r--   3 hadoop supergroup        140 2018-07-23 15:45 /wordcount-to-output/part-r-00000
[hadoop@10-124-147-22 hadoop]$ hadoop fs -cat /wordcount-to-output/part-r-00000
datanode    1
elasticsearch    1
hadoop    1
hdfs    1
hello    12
journalnode    1
mongo    1
mysql    1
namenode    1
nodemanager    1
resourcemanager    1
world    1
yarn    1
```
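If you want the result back on the local filesystem, `hadoop fs -get` or `-getmerge` is the usual way; a short sketch (the local path /tmp/wordcount-result.txt is just an example):

```bash
# merge all reducer outputs under the job's output directory into one local file
hadoop fs -getmerge /wordcount-to-output /tmp/wordcount-result.txt
cat /tmp/wordcount-result.txt
```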
5. Miscellaneous
5.1 Port changes from Hadoop 2 to Hadoop 3
```
Namenode ports:     50470 --> 9871, 50070 --> 9870, 8020 --> 9820
Secondary NN ports: 50091 --> 9869, 50090 --> 9868
Datanode ports:     50020 --> 9867, 50010 --> 9866, 50475 --> 9865, 50075 --> 9864
KMS service:        16000 --> 9600
```
The slaves file has also changed: the slaves file of Hadoop 2 is named workers in Hadoop 3.
5.2 Starting datanodes in production
In production a Hadoop cluster usually has hundreds or thousands of datanode hosts, and in practice each datanode is started locally on its own machine rather than being launched from the namenode, so the SSH keys set up in 4.3.7 are largely unnecessary in real production. Also, although journalnodes consume few resources, they are generally not placed on the same hosts as the datanodes.
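For reference, a sketch of what that per-host start looks like with the stock Hadoop 2.x scripts, run locally as the hadoop user on each worker node:

```bash
# start only this machine's HDFS and YARN worker daemons
/usr/local/hadoop/sbin/hadoop-daemon.sh start datanode
/usr/local/hadoop/sbin/yarn-daemon.sh start nodemanager

# verify: jps should list DataNode and NodeManager
jps
```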