Enough preamble; let's get straight to practice and deploy a high-performance Hadoop cluster.
Topology: tiandong63 acts as the NameNode (master), with tiandong64 and tiandong65 as DataNodes.
1. Preparing the environment
1) Configure the hosts file on all three hosts (copy it to the other two hosts):
[root@tiandong63 ~]# more /etc/hosts
192.168.199.3 tiandong63
192.168.199.4 tiandong64
192.168.199.5 tiandong65
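As an optional sanity check, name resolution can be verified from any of the three hosts, for example:
[root@tiandong63 ~]# ping -c 1 tiandong64
[root@tiandong63 ~]# ping -c 1 tiandong65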
2) Create the hadoop account (it must be created on the other two hosts as well):
[root@tiandong63 ~]# useradd -u 8000 hadoop
[root@tiandong63 ~]# echo '123456' | passwd --stdin hadoop
3) Grant the hadoop user sudo privileges by adding the following line (configure this on the other two hosts as well):
[root@tiandong63 ~]# vim /etc/sudoers
hadoop ALL=(ALL) ALL
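To confirm the sudoers entry took effect, a minimal check (it will prompt for the hadoop password) is:
[root@tiandong63 ~]# su - hadoop -c 'sudo -l'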
4) Set up mutual SSH trust between the hosts (on tiandong63):
This enables passwordless SSH logins to tiandong63, tiandong64 and tiandong65, which makes it easier to copy files and start services later: when the NameNode starts, it connects to the DataNodes to start the corresponding services.
[root@tiandong63 ~]# su - hadoop
[hadoop@tiandong63 ~]$ ssh-keygen
[hadoop@tiandong63 ~]$ ssh-copy-id hadoop@192.168.199.4
[hadoop@tiandong63 ~]$ ssh-copy-id hadoop@192.168.199.5
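A quick way to verify passwordless login is to run a remote command from tiandong63; it should complete without asking for a password:
[hadoop@tiandong63 ~]$ ssh hadoop@tiandong64 hostname
[hadoop@tiandong63 ~]$ ssh hadoop@tiandong65 hostname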
2. Configuring the Hadoop environment
1) Configure the JDK (on all three hosts):
[root@tiandong63 ~]# ll jdk-8u191-linux-x64.tar.gz
-rw-r--r-- 1 root root 191753373 Dec 30 00:58 jdk-8u191-linux-x64.tar.gz
[root@tiandong63 ~]# tar zxvf jdk-8u191-linux-x64.tar.gz -C /usr/local/src/
[root@tiandong63 ~]# vim /etc/profile
export JAVA_HOME=/usr/local/src/jdk1.8.0_191
export JAVA_BIN=/usr/local/src/jdk1.8.0_191/bin
export PATH=${JAVA_HOME}/bin:$PATH
export CLASSPATH=.:${JAVA_HOME}/lib/dt.jar:${JAVA_HOME}/lib/tools.jar
export HADOOP_ROOT_LOGGER=DEBUG,console
[root@tiandong63 ~]# source /etc/profile
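After sourcing the profile, the JDK can be verified on each host; it should report version 1.8.0_191:
[root@tiandong63 ~]# java -version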
2) Disable the firewall (on all three hosts):
[root@tiandong63 ~]# /etc/init.d/iptables stop
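Stopping the service only affects the current boot. On CentOS 6 (assuming the iptables service is in use), the firewall can additionally be kept from starting at boot with:
[root@tiandong63 ~]# chkconfig iptables off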
3) Install Hadoop on tiandong63 and configure it as the NameNode master node:
[root@tiandong63 ~]# cd /home/hadoop/
[root@tiandong63 hadoop]# ll hadoop-2.7.7.tar.gz
-rw-r--r-- 1 hadoop hadoop 218720521 Dec 29 21:11 hadoop-2.7.7.tar.gz
[root@tiandong63 ~]# su - hadoop
[hadoop@tiandong63 ~]$ tar -zxvf hadoop-2.7.7.tar.gz
Create the Hadoop working directories:
[hadoop@tiandong63 ~]$ mkdir -p /home/hadoop/dfs/name/ /home/hadoop/dfs/data/ /home/hadoop/tmp/
4) Configure Hadoop (edit 7 configuration files):
The files are: hadoop-env.sh, yarn-env.sh, slaves, core-site.xml, hdfs-site.xml, mapred-site.xml and yarn-site.xml.
[hadoop@tiandong63 ~]$ cd /home/hadoop/hadoop-2.7.7/etc/hadoop/
[hadoop@tiandong63 hadoop]$ vim hadoop-env.sh    (set the Java runtime used by Hadoop)
export JAVA_HOME=/usr/local/src/jdk1.8.0_191
[hadoop@tiandong63 hadoop]$ vim yarn-env.sh    (set the Java runtime used by the YARN framework)
JAVA_HOME=/usr/local/src/jdk1.8.0_191
[hadoop@tiandong63 hadoop]$ vim slaves    (list the DataNode data-storage servers)
tiandong64
tiandong65
[hadoop@tiandong63 hadoop]$ vim core-site.xml    (set the default filesystem URI, i.e. the NameNode address)
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://tiandong63:9000</value>
    </property>

    <property>
        <name>io.file.buffer.size</name>
        <value>131072</value>
    </property>

    <property>
        <name>hadoop.tmp.dir</name>
        <value>file:/home/hadoop/tmp</value>
        <description>Abase for other temporary directories.</description>
    </property>
</configuration>
[hadoop@tiandong63 hadoop]$ mkdir -p /home/hadoop/tmp
[hadoop@tiandong63 hadoop]$ vim hdfs-site.xml
This is the HDFS configuration file: dfs.namenode.secondary.http-address sets the HTTP address of the secondary NameNode, and dfs.replication sets the number of replicas per block, which generally should not exceed the number of slave nodes.
<configuration>
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>tiandong63:9001</value>
    </property>

    <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:/home/hadoop/dfs/name</value>
    </property>

    <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:/home/hadoop/dfs/data</value>
    </property>

    <property>
        <name>dfs.replication</name>
        <value>2</value>
    </property>

    <property>
        <name>dfs.webhdfs.enabled</name>
        <value>true</value>
    </property>
</configuration>
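One caveat before the next file: the Hadoop 2.7.x tarball ships only mapred-site.xml.template, not mapred-site.xml, so the file usually has to be created first (assuming the stock distribution layout):
[hadoop@tiandong63 hadoop]$ cp mapred-site.xml.template mapred-site.xml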
[hadoop@tiandong63 hadoop]$ vim mapred-site.xml
<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>

    <property>
        <name>mapreduce.jobhistory.address</name>
        <value>tiandong63:10020</value>
    </property>

    <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>tiandong63:19888</value>
    </property>
</configuration>
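The job history addresses above only matter if the history server is actually running; start-dfs.sh/start-yarn.sh do not start it. If it is wanted, it can be started separately with the helper script shipped in sbin of Hadoop 2.x:
[hadoop@tiandong63 ~]$ /home/hadoop/hadoop-2.7.7/sbin/mr-jobhistory-daemon.sh start historyserver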
[hadoop@tiandong63 hadoop]$ vim yarn-site.xml    (YARN framework configuration, mainly the addresses its services listen on)
<configuration>

    <!-- Site specific YARN configuration properties -->
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>

    <property>
        <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
        <value>org.apache.hadoop.mapred.ShuffleHandler</value>
    </property>

    <property>
        <name>yarn.resourcemanager.address</name>
        <value>tiandong63:8032</value>
    </property>

    <property>
        <name>yarn.resourcemanager.scheduler.address</name>
        <value>tiandong63:8030</value>
    </property>

    <property>
        <name>yarn.resourcemanager.resource-tracker.address</name>
        <value>tiandong63:8031</value>
    </property>

    <property>
        <name>yarn.resourcemanager.admin.address</name>
        <value>tiandong63:8033</value>
    </property>

    <property>
        <name>yarn.resourcemanager.webapp.address</name>
        <value>tiandong63:8088</value>
    </property>
</configuration>
Copy the installation to the other DataNode nodes (192.168.199.4/5):
[hadoop@tiandong63 ~]$ scp -r /home/hadoop/hadoop-2.7.7 hadoop@tiandong64:~/
[hadoop@tiandong63 ~]$ scp -r /home/hadoop/hadoop-2.7.7 hadoop@tiandong65:~/
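To confirm the copy landed on both DataNodes, a quick remote check from tiandong63 is enough:
[hadoop@tiandong63 ~]$ ssh hadoop@tiandong64 'ls /home/hadoop/hadoop-2.7.7/etc/hadoop | head'
[hadoop@tiandong63 ~]$ ssh hadoop@tiandong65 'ls /home/hadoop/hadoop-2.7.7/etc/hadoop | head'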
3. Starting Hadoop
Start everything from tiandong63 (as the hadoop user).
1) Format the NameNode:
[hadoop@tiandong63 ~]$ cd /home/hadoop/hadoop-2.7.7/bin/
[hadoop@tiandong63 bin]$ ./hdfs namenode -format
2) Start HDFS:
[hadoop@tiandong63 ~]$ /home/hadoop/hadoop-2.7.7/sbin/start-dfs.sh
Check the processes (look for the NameNode on tiandong63):
[hadoop@tiandong63 ~]$ ps aux | grep namenode --color
hadoop 42851 0.4 17.6 2762196 177412 ? Sl 18:13 1:26 /usr/local/src/jdk1.8.0_191/bin/java -Dproc_namenode -Xmx1000m -Djava.net.preferIPv4Stack=true -Dhadoop.log.dir=/home/hadoop/hadoop-2.7.7/logs -Dhadoop.log.file=hadoop.log -Dhadoop.home.dir=/home/hadoop/hadoop-2.7.7 -Dhadoop.id.str=hadoop -Dhadoop.root.logger=INFO,console -Djava.library.path=/home/hadoop/hadoop-2.7.7/lib/native -Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true -Djava.net.preferIPv4Stack=true -Djava.net.preferIPv4Stack=true -Dhadoop.log.dir=/home/hadoop/hadoop-2.7.7/logs -Dhadoop.log.file=hadoop-hadoop-namenode-tiandong63.log -Dhadoop.home.dir=/home/hadoop/hadoop-2.7.7 -Dhadoop.id.str=hadoop -Dhadoop.root.logger=INFO,RFA -Djava.library.path=/home/hadoop/hadoop-2.7.7/lib/native -Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true -Dhadoop.security.logger=INFO,RFAS -Dhdfs.audit.logger=INFO,NullAppender -Dhadoop.security.logger=INFO,RFAS -Dhdfs.audit.logger=INFO,NullAppender -Dhadoop.security.logger=INFO,RFAS -Dhdfs.audit.logger=INFO,NullAppender -Dhadoop.security.logger=INFO,RFAS org.apache.hadoop.hdfs.server.namenode.NameNode
hadoop 43046 0.2 13.3 2733336 134344 ? Sl 18:13 0:35 /usr/local/src/jdk1.8.0_191/bin/java -Dproc_secondarynamenode -Xmx1000m -Djava.net.preferIPv4Stack=true -Dhadoop.log.dir=/home/hadoop/hadoop-2.7.7/logs -Dhadoop.log.file=hadoop.log -Dhadoop.home.dir=/home/hadoop/hadoop-2.7.7 -Dhadoop.id.str=hadoop -Dhadoop.root.logger=INFO,console -Djava.library.path=/home/hadoop/hadoop-2.7.7/lib/native -Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true -Djava.net.preferIPv4Stack=true -Djava.net.preferIPv4Stack=true -Dhadoop.log.dir=/home/hadoop/hadoop-2.7.7/logs -Dhadoop.log.file=hadoop-hadoop-secondarynamenode-tiandong63.log -Dhadoop.home.dir=/home/hadoop/hadoop-2.7.7 -Dhadoop.id.str=hadoop -Dhadoop.root.logger=INFO,RFA -Djava.library.path=/home/hadoop/hadoop-2.7.7/lib/native -Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true -Dhadoop.security.logger=INFO,RFAS -Dhdfs.audit.logger=INFO,NullAppender -Dhadoop.security.logger=INFO,RFAS -Dhdfs.audit.logger=INFO,NullAppender -Dhadoop.security.logger=INFO,RFAS -Dhdfs.audit.logger=INFO,NullAppender -Dhadoop.security.logger=INFO,RFAS org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode
Check the processes on tiandong64 and tiandong65 (DataNode):
[root@tiandong64 ~]# ps aux | grep datanode
hadoop 3938 0.3 12.2 2757576 122592 ? Sl 18:13 1:04 /usr/local/src/jdk1.8.0_191/bin/java -Dproc_datanode -Xmx1000m -Djava.net.preferIPv4Stack=true -Dhadoop.log.dir=/home/hadoop/hadoop-2.7.7/logs -Dhadoop.log.file=hadoop.log -Dhadoop.home.dir=/home/hadoop/hadoop-2.7.7 -Dhadoop.id.str=hadoop -Dhadoop.root.logger=INFO,console -Djava.library.path=/home/hadoop/hadoop-2.7.7/lib/native -Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true -Djava.net.preferIPv4Stack=true -Djava.net.preferIPv4Stack=true -Dhadoop.log.dir=/home/hadoop/hadoop-2.7.7/logs -Dhadoop.log.file=hadoop-hadoop-datanode-tiandong64.log -Dhadoop.home.dir=/home/hadoop/hadoop-2.7.7 -Dhadoop.id.str=hadoop -Dhadoop.root.logger=INFO,RFA -Djava.library.path=/home/hadoop/hadoop-2.7.7/lib/native -Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true -server -Dhadoop.security.logger=ERROR,RFAS -Dhadoop.security.logger=ERROR,RFAS -Dhadoop.security.logger=ERROR,RFAS -Dhadoop.security.logger=INFO,RFAS org.apache.hadoop.hdfs.server.datanode.DataNode
3) Start YARN on tiandong63 (this starts the distributed computation layer):
[hadoop@tiandong63 ~]$ /home/hadoop/hadoop-2.7.7/sbin/start-yarn.sh
Check the processes on tiandong63 (the ResourceManager process):
[hadoop@tiandong63 ~]$ ps -ef|grep resourcemanager --color
hadoop 43196 1 0 18:14 pts/1 00:02:55 /usr/local/src/jdk1.8.0_191/bin/java -Dproc_resourcemanager -Xmx1000m -Dhadoop.log.dir=/home/hadoop/hadoop-2.7.7/logs -Dyarn.log.dir=/home/hadoop/hadoop-2.7.7/logs -Dhadoop.log.file=yarn-hadoop-resourcemanager-tiandong63.log -Dyarn.log.file=yarn-hadoop-resourcemanager-tiandong63.log -Dyarn.home.dir= -Dyarn.id.str=hadoop -Dhadoop.root.logger=INFO,RFA -Dyarn.root.logger=INFO,RFA -Djava.library.path=/home/hadoop/hadoop-2.7.7/lib/native -Dyarn.policy.file=hadoop-policy.xml -Dhadoop.log.dir=/home/hadoop/hadoop-2.7.7/logs -Dyarn.log.dir=/home/hadoop/hadoop-2.7.7/logs -Dhadoop.log.file=yarn-hadoop-resourcemanager-tiandong63.log -Dyarn.log.file=yarn-hadoop-resourcemanager-tiandong63.log -Dyarn.home.dir=/home/hadoop/hadoop-2.7.7 -Dhadoop.home.dir=/home/hadoop/hadoop-2.7.7 -Dhadoop.root.logger=INFO,RFA -Dyarn.root.logger=INFO,RFA -Djava.library.path=/home/hadoop/hadoop-2.7.7/lib/native -classpath /home/hadoop/hadoop-2.7.7/etc/hadoop:/home/hadoop/hadoop-2.7.7/etc/hadoop:/home/hadoop/hadoop-2.7.7/etc/hadoop:/home/hadoop/hadoop-2.7.7/share/hadoop/common/lib/*:/home/hadoop/hadoop-2.7.7/share/hadoop/common/*:/home/hadoop/hadoop-2.7.7/share/hadoop/hdfs:/home/hadoop/hadoop-2.7.7/share/hadoop/hdfs/lib/*:/home/hadoop/hadoop-2.7.7/share/hadoop/hdfs/*:/home/hadoop/hadoop-2.7.7/share/hadoop/yarn/lib/*:/home/hadoop/hadoop-2.7.7/share/hadoop/yarn/*:/home/hadoop/hadoop-2.7.7/share/hadoop/mapreduce/lib/*:/home/hadoop/hadoop-2.7.7/share/hadoop/mapreduce/*:/contrib/capacity-scheduler/*.jar:/contrib/capacity-scheduler/*.jar:/contrib/capacity-scheduler/*.jar:/contrib/capacity-scheduler/*.jar:/home/hadoop/hadoop-2.7.7/share/hadoop/yarn/*:/home/hadoop/hadoop-2.7.7/share/hadoop/yarn/lib/*:/home/hadoop/hadoop-2.7.7/etc/hadoop/rm-config/log4j.properties org.apache.hadoop.yarn.server.resourcemanager.ResourceManager
Check the processes on tiandong64 and tiandong65 (NodeManager):
[root@tiandong64 ~]# ps aux | grep nodemanager --color
hadoop 4048 0.5 15.7 2802552 158380 ? Sl 18:14 1:42 /usr/local/src/jdk1.8.0_191/bin/java -Dproc_nodemanager -Xmx1000m -Dhadoop.log.dir=/home/hadoop/hadoop-2.7.7/logs -Dyarn.log.dir=/home/hadoop/hadoop-2.7.7/logs -Dhadoop.log.file=yarn-hadoop-nodemanager-tiandong64.log -Dyarn.log.file=yarn-hadoop-nodemanager-tiandong64.log -Dyarn.home.dir= -Dyarn.id.str=hadoop -Dhadoop.root.logger=INFO,RFA -Dyarn.root.logger=INFO,RFA -Djava.library.path=/home/hadoop/hadoop-2.7.7/lib/native -Dyarn.policy.file=hadoop-policy.xml -server -Dhadoop.log.dir=/home/hadoop/hadoop-2.7.7/logs -Dyarn.log.dir=/home/hadoop/hadoop-2.7.7/logs -Dhadoop.log.file=yarn-hadoop-nodemanager-tiandong64.log -Dyarn.log.file=yarn-hadoop-nodemanager-tiandong64.log -Dyarn.home.dir=/home/hadoop/hadoop-2.7.7 -Dhadoop.home.dir=/home/hadoop/hadoop-2.7.7 -Dhadoop.root.logger=INFO,RFA -Dyarn.root.logger=INFO,RFA -Djava.library.path=/home/hadoop/hadoop-2.7.7/lib/native -classpath /home/hadoop/hadoop-2.7.7/etc/hadoop:/home/hadoop/hadoop-2.7.7/etc/hadoop:/home/hadoop/hadoop-2.7.7/etc/hadoop:/home/hadoop/hadoop-2.7.7/share/hadoop/common/lib/*:/home/hadoop/hadoop-2.7.7/share/hadoop/common/*:/home/hadoop/hadoop-2.7.7/share/hadoop/hdfs:/home/hadoop/hadoop-2.7.7/share/hadoop/hdfs/lib/*:/home/hadoop/hadoop-2.7.7/share/hadoop/hdfs/*:/home/hadoop/hadoop-2.7.7/share/hadoop/yarn/lib/*:/home/hadoop/hadoop-2.7.7/share/hadoop/yarn/*:/home/hadoop/hadoop-2.7.7/share/hadoop/mapreduce/lib/*:/home/hadoop/hadoop-2.7.7/share/hadoop/mapreduce/*:/contrib/capacity-scheduler/*.jar:/contrib/capacity-scheduler/*.jar:/home/hadoop/hadoop-2.7.7/share/hadoop/yarn/*:/home/hadoop/hadoop-2.7.7/share/hadoop/yarn/lib/*:/home/hadoop/hadoop-2.7.7/etc/hadoop/nm-config/log4j.properties org.apache.hadoop.yarn.server.nodemanager.NodeManager
Note: the start-dfs.sh and start-yarn.sh scripts can be replaced by a single start-all.sh.
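A quicker way to check which daemons are up on each node is the jps tool that ships with the JDK; with the configuration above, tiandong63 should list NameNode, SecondaryNameNode and ResourceManager, and each DataNode should list DataNode and NodeManager:
[hadoop@tiandong63 ~]$ jps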
4) Check the status of the HDFS distributed filesystem:
[hadoop@tiandong63 ~]$ /home/hadoop/hadoop-2.7.7/bin/hdfs dfsadmin -report
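The same status is also exposed over HTTP: the NameNode web UI defaults to port 50070 in Hadoop 2.x, and the ResourceManager UI listens on the port configured above (8088). A minimal reachability check from any host:
[hadoop@tiandong63 ~]$ curl -s -o /dev/null -w '%{http_code}\n' http://tiandong63:50070/
[hadoop@tiandong63 ~]$ curl -s -o /dev/null -w '%{http_code}\n' http://tiandong63:8088/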
4. Basic Hadoop usage
Run a Hadoop compute job: the wordcount word-count example.
Create a directory on HDFS:
[hadoop@tiandong63 ~]$ /home/hadoop/hadoop-2.7.7/bin/hadoop fs -mkdir -p /test/input
List the directory:
[hadoop@tiandong63 ~]$ /home/hadoop/hadoop-2.7.7/bin/hadoop fs -ls /test/
drwxr-xr-x - hadoop supergroup 0 2018-12-30 21:40 /test/input
Upload a file:
First create a file to run the computation on:
[hadoop@tiandong63 ~]$ more file1.txt
welcome to beijing
my name is thunder
what is your name
Upload it to the /test/input directory on HDFS:
[hadoop@tiandong63 ~]$ /home/hadoop/hadoop-2.7.7/bin/hadoop fs -put /home/hadoop/file1.txt /test/input
Verify that the upload succeeded:
[hadoop@tiandong63 ~]$ /home/hadoop/hadoop-2.7.7/bin/hadoop fs -ls /test/input
Found 1 items
-rw-r--r-- 2 hadoop supergroup 56 2018-12-30 21:40 /test/input/file1.txt
Run the computation:
[hadoop@tiandong63 ~]$ /home/hadoop/hadoop-2.7.7/bin/hadoop jar /home/hadoop/hadoop-2.7.7/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.7.jar wordcount /test/input /test/output
View the result:
[hadoop@tiandong63 ~]$ /home/hadoop/hadoop-2.7.7/bin/hadoop fs -cat /test/output/part-r-00000
beijing 1
is 2
my 1
name 2
thunder 1
to 1
welcome 1
what 1
your 1
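Note that MapReduce will not overwrite an existing output directory; to re-run the example, the old output has to be removed first, for instance:
[hadoop@tiandong63 ~]$ /home/hadoop/hadoop-2.7.7/bin/hadoop fs -rm -r /test/output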