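The company's Commerce Cloud platform offers a host provisioning service. Yesterday I tried it out, requested 3 machines, and set up a Hadoop environment on them. The machine configuration is as follows:
emi-centos-6.4-x86_64
medium | 6 GB memory | 2 virtual cores | 30.0 GB disk
The hostname and IP plan for the 3 machines is as follows:
IP address      Hostname   Role
192.168.0.101   hd1        namenode
192.168.0.102   hd2        datanode
192.168.0.103   hd3        datanode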
I. System settings
(Every step in this part must be run on every node.)
1. Set the hostname and IP address resolution
1) Set the hostname (use each node's own name; hd1 is shown here)
[root@hd1 toughhou]# hostname hd1
[root@hd1 toughhou]# cat /etc/sysconfig/network
NETWORKING=yes
HOSTNAME=hd1
2) Add the IP-to-hostname mappings
[root@hd1 toughhou]# vi /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.0.101 hd1
192.168.0.102 hd2
192.168.0.103 hd3
3) Verify that it works
[toughhou@hd1 ~]$ ping hd2
PING hd2 (192.168.0.102) 56(84) bytes of data.
64 bytes from hd2 (192.168.0.102): icmp_seq=1 ttl=63 time=2.55 ms
[toughhou@hd1 ~]$ ping hd3
PING hd3 (192.168.0.103) 56(84) bytes of data.
64 bytes from hd3 (192.168.0.103): icmp_seq=1 ttl=63 time=2.48 ms
If the pings succeed, name resolution is working.
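Since the same three mappings have to be added on every node, the edit can be scripted so that running it twice does not duplicate lines. The following is a minimal sketch: `add_host_entry` is a hypothetical helper name, and the demo writes to a scratch file rather than the real /etc/hosts.

```shell
# Append "ip name" to a hosts file only if that hostname is not mapped yet.
# add_host_entry is a hypothetical helper, not part of any existing tool.
add_host_entry() {
    local file="$1" ip="$2" name="$3"
    # Look for the name as a whole field on any non-comment line.
    if ! awk -v n="$name" '
            $1 ~ /^#/ { next }
            { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
            END { exit !found }' "$file"; then
        printf '%s %s\n' "$ip" "$name" >> "$file"
    fi
}

# Demo on a scratch file; on a real node, pass /etc/hosts and run as root.
hosts_file="$(mktemp)"
add_host_entry "$hosts_file" 192.168.0.101 hd1
add_host_entry "$hosts_file" 192.168.0.102 hd2
add_host_entry "$hosts_file" 192.168.0.103 hd3
add_host_entry "$hosts_file" 192.168.0.101 hd1   # already present, so nothing is added
cat "$hosts_file"
```

The check compares whole fields, so an existing entry for hd10 would not be mistaken for hd1.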
2. Disable the firewall
[root@hd1 toughhou]# chkconfig iptables off
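3. Passwordless SSH login
1) Generate the private and public keys
Log in to hd1, generate a key pair, and append the contents of id_rsa.pub (the public key) to the authorized_keys file. Then log in to hd2 and hd3, generate a key pair on each, and append each node's id_rsa.pub to authorized_keys on hd1. Finally, scp the combined file from hd1 into the .ssh directory on hd2 and hd3.
[toughhou@hd1 ~]$ ssh-keygen -t rsa
[toughhou@hd1 ~]$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
[toughhou@hd2 ~]$ ssh-keygen -t rsa
[toughhou@hd2 ~]$ cat ~/.ssh/id_rsa.pub | ssh hd1 'cat >> ~/.ssh/authorized_keys'
[toughhou@hd3 ~]$ ssh-keygen -t rsa
[toughhou@hd3 ~]$ cat ~/.ssh/id_rsa.pub | ssh hd1 'cat >> ~/.ssh/authorized_keys'
2) scp authorized_keys to hd2 and hd3
[toughhou@hd1 ~]$ scp ~/.ssh/authorized_keys 192.168.0.102:/home/toughhou/.ssh/
[toughhou@hd1 ~]$ scp ~/.ssh/authorized_keys 192.168.0.103:/home/toughhou/.ssh/
3) Verify that SSH login is passwordless
(The first login still asks for a password; if everything is configured correctly, later logins do not.)
[toughhou@hd1 ~]$ ssh 192.168.0.102
[toughhou@hd2 ~]$
[toughhou@hd1 ~]$ ssh 192.168.0.103
[toughhou@hd3 ~]$
For more on passwordless SSH login, see the article "SSH without entering a password", which covers the SSH settings in more detail.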
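If key-based login still prompts for a password after the steps above, a common cause is file permissions: with sshd's default StrictModes setting, keys are ignored when ~/.ssh or authorized_keys is writable by other users. The sketch below tightens both. `secure_ssh_dir` is a hypothetical helper name, and the demo runs against a scratch directory.

```shell
# Restrict an SSH directory to its owner and make authorized_keys owner-only.
# secure_ssh_dir is a hypothetical helper name.
secure_ssh_dir() {
    local dir="$1"
    chmod 700 "$dir"
    if [ -f "$dir/authorized_keys" ]; then
        chmod 600 "$dir/authorized_keys"
    fi
}

# Demo on a scratch directory; on a real node, run: secure_ssh_dir "$HOME/.ssh"
demo_dir="$(mktemp -d)"
touch "$demo_dir/authorized_keys"
chmod 664 "$demo_dir/authorized_keys"   # simulate a file that is too permissive
secure_ssh_dir "$demo_dir"
ls -ld "$demo_dir" "$demo_dir/authorized_keys"
```

This needs to be done on each node for the user that runs Hadoop (toughhou in this walkthrough).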
II. Install the JDK and Hadoop, and set the environment variables
1. Download the JDK and Hadoop packages
http://download.oracle.com/otn-pub/java/jdk/7u65-b17/jdk-7u65-linux-x64.tar.gz
http://mirrors.cnnic.cn/apache/hadoop/common/hadoop-2.4.0/hadoop-2.4.0.tar.gz
2. Extract the archives
[toughhou@hd1 software]$ tar zxvf jdk-7u65-linux-x64.tar.gz
[toughhou@hd1 software]$ tar zxvf hadoop-2.4.0.tar.gz
[root@hd1 software]# mv hadoop-2.4.0 /opt/hadoop-2.4.0
[root@hd1 software]# mv jdk1.7.0_65 /opt/jdk1.7.0
3. Set the Java and Hadoop environment variables
Log in as root, edit /etc/profile, and add the following:
[root@hd1 software]# vi /etc/profile
#java
export JAVA_HOME=/opt/jdk1.7.0
export JRE_HOME=$JAVA_HOME/jre
export PATH=$PATH:$JAVA_HOME/bin
export CLASSPATH=./:$JAVA_HOME/lib:$JAVA_HOME/jre/lib
#hadoop
export HADOOP_HOME=/opt/hadoop-2.4.0
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_YARN_HOME=$HADOOP_HOME
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$HADOOP_HOME/lib
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib"
export JAVA_LIBRARY_PATH=$HADOOP_HOME/lib/native
4. Verify the environment variables
[toughhou@hd1 hadoop]$ java -version
[toughhou@hd1 hadoop]$ hadoop
Usage: hadoop [--config confdir] COMMAND
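Beyond running `java -version` and `hadoop`, it can help to confirm that the variables from /etc/profile actually reached the current shell (a new login, or `source /etc/profile`, is needed first). A small sketch, where `report_unset_vars` is a hypothetical helper name:

```shell
# Print a line for every named variable that is unset or empty.
# report_unset_vars is a hypothetical helper name.
report_unset_vars() {
    local name value
    for name in "$@"; do
        # Indirect lookup of the variable whose name is stored in $name.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            printf '%s is not set\n' "$name"
        fi
    done
}

# Check the main variables defined in /etc/profile above.
report_unset_vars JAVA_HOME HADOOP_HOME HADOOP_CONF_DIR
```

No output means all three variables are set.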
III. Hadoop cluster configuration
1. Edit the Hadoop configuration files
[toughhou@hd1 hadoop]$ cd /opt/hadoop-2.4.0/etc/hadoop
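1) hadoop-env.sh and yarn-env.sh: set the JAVA_HOME environment variable
At first I assumed that, because JAVA_HOME was already set in /etc/profile, hadoop-env.sh and yarn-env.sh would pick it up and nothing more was needed. That turned out not to work in hadoop-2.4.0: start-all.sh failed with "hd1: Error: JAVA_HOME is not set and could not be found."
Find the JAVA_HOME line in each file and change it to the actual path (here /opt/jdk1.7.0).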
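The same edit can be scripted instead of done by hand in vi. The sketch below assumes GNU sed (as on CentOS) and that an active assignment, if present, starts with `export JAVA_HOME=`; when only a commented-out line exists, it appends a new assignment. `set_java_home` is a hypothetical helper name, and the demo uses a scratch file.

```shell
# Point the JAVA_HOME assignment in an env script at a fixed path.
# set_java_home is a hypothetical helper name; sed -i as used here assumes GNU sed.
set_java_home() {
    local file="$1" java_home="$2"
    if grep -q '^export JAVA_HOME=' "$file"; then
        # Rewrite the existing assignment; '|' is the delimiter because paths contain '/'.
        sed -i "s|^export JAVA_HOME=.*|export JAVA_HOME=${java_home}|" "$file"
    else
        # No active assignment found, so add one at the end of the file.
        printf 'export JAVA_HOME=%s\n' "$java_home" >> "$file"
    fi
}

# Demo on a scratch file standing in for hadoop-env.sh.
env_file="$(mktemp)"
printf '%s\n' 'export JAVA_HOME=${JAVA_HOME}' > "$env_file"
set_java_home "$env_file" /opt/jdk1.7.0
cat "$env_file"
```

On the cluster it would be called once for /opt/hadoop-2.4.0/etc/hadoop/hadoop-env.sh and once for yarn-env.sh.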
2) slaves
This file lists all datanode hosts, one per line, so that the start scripts run on the namenode can reach them.
[toughhou@hd1 hadoop]$ vi slaves
hd2
hd3
3) core-site.xml
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://hd1:9000</value>
  </property>
  <property>
    <name>io.file.buffer.size</name>
    <value>131072</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/hadoop/temp</value>
    <description>A base for other temporary directories.</description>
  </property>
  <property>
    <name>hadoop.proxyuser.root.hosts</name>
    <value>hd1</value>
  </property>
  <property>
    <name>hadoop.proxyuser.root.groups</name>
    <value>*</value>
  </property>
</configuration>
4) hdfs-site.xml
<configuration>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/hadoop/name</value>
    <final>true</final>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/hadoop/data</value>
    <final>true</final>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
  <property>
    <name>dfs.permissions</name>
    <value>false</value>
  </property>
</configuration>
5) mapred-site.xml
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://hd1:9000</value>
  </property>
  <property>
    <name>io.file.buffer.size</name>
    <value>131072</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/hadoop/temp</value>
    <description>A base for other temporary directories.</description>
  </property>
  <property>
    <name>hadoop.proxyuser.root.hosts</name>
    <value>hd1</value>
  </property>
  <property>
    <name>hadoop.proxyuser.root.groups</name>
    <value>*</value>
  </property>
</configuration>
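The mapred-site.xml listing above carries the same properties as core-site.xml. The wordcount log later in this post shows the job being submitted to the ResourceManager, which normally requires `mapreduce.framework.name` to be set to `yarn` in mapred-site.xml; whether the original file contained that property is not shown here. The sketch below is a rough way to check a site file for it. `uses_yarn_framework` is a hypothetical helper name, the check is a plain text match rather than an XML parse, and it assumes `<value>` sits on the line right after `<name>`.

```shell
# Succeed if the site file sets mapreduce.framework.name to yarn.
# uses_yarn_framework is a hypothetical helper; this is a text match, not an XML parser.
uses_yarn_framework() {
    # -A1 also prints the line after the <name> match, where <value> is assumed to sit.
    grep -A1 '<name>mapreduce.framework.name</name>' "$1" | grep -q '<value>yarn</value>'
}

# Demo on a scratch file containing only the property in question.
site_file="$(mktemp)"
cat > "$site_file" <<'EOF'
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
EOF

if uses_yarn_framework "$site_file"; then
    echo "mapreduce.framework.name is yarn"
else
    echo "mapreduce.framework.name is missing or not yarn"
fi
```

On the cluster the file to check would be /opt/hadoop-2.4.0/etc/hadoop/mapred-site.xml.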
6) yarn-site.xml
<configuration>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>hd1:18040</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>hd1:18030</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>hd1:18025</value>
  </property>
  <property>
    <name>yarn.resourcemanager.admin.address</name>
    <value>hd1:18041</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>hd1:8088</value>
  </property>
  <property>
    <name>yarn.nodemanager.local-dirs</name>
    <value>/hadoop/mynode/my</value>
  </property>
  <property>
    <name>yarn.nodemanager.log-dirs</name>
    <value>/hadoop/mynode/logs</value>
  </property>
  <property>
    <name>yarn.nodemanager.log.retain-seconds</name>
    <value>10800</value>
  </property>
  <property>
    <name>yarn.nodemanager.remote-app-log-dir</name>
    <value>/logs</value>
  </property>
  <property>
    <name>yarn.nodemanager.remote-app-log-dir-suffix</name>
    <value>logs</value>
  </property>
  <property>
    <name>yarn.log-aggregation.retain-seconds</name>
    <value>-1</value>
  </property>
  <property>
    <name>yarn.log-aggregation.retain-check-interval-seconds</name>
    <value>-1</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
2. Copy the following files to the other nodes
[root@hd1 toughhou]# scp -r /opt/hadoop-2.4.0/ hd2:/opt/
[root@hd1 toughhou]# scp -r /opt/hadoop-2.4.0/ hd3:/opt/
[root@hd1 toughhou]# scp -r /opt/jdk1.7.0/ hd2:/opt/
[root@hd1 toughhou]# scp -r /opt/jdk1.7.0/ hd3:/opt/
[root@hd1 toughhou]# scp /etc/profile hd2:/etc/profile
[root@hd1 toughhou]# scp /etc/profile hd3:/etc/profile
[root@hd1 toughhou]# scp /etc/hosts hd2:/etc/hosts
[root@hd1 toughhou]# scp /etc/hosts hd3:/etc/hosts
Once the configuration is complete, reboot the machines.
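The copy step repeats the same command for each path and each node, so it lends itself to a loop. The sketch below only prints the scp commands, which makes it safe to review before anything is copied. `print_copy_commands` is a hypothetical helper name, and each path is copied into the same parent directory on the target node.

```shell
# Print one scp command per (node, path) pair; nothing is copied by this function.
# print_copy_commands is a hypothetical helper name.
print_copy_commands() {
    local nodes="$1"
    shift
    local node path
    for node in $nodes; do
        for path in "$@"; do
            # The destination is the parent directory of the source path.
            printf 'scp -r %s %s:%s/\n' "$path" "$node" "$(dirname "$path")"
        done
    done
}

print_copy_commands "hd2 hd3" /opt/hadoop-2.4.0 /opt/jdk1.7.0 /etc/profile /etc/hosts
```

Piping the output to `sh` as root on hd1 would run the copies.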
3. Format the namenode
This is needed only once, before the first start; it is not repeated afterwards.
[toughhou@hd1 bin]$ hdfs namenode -format
If the output ends with "Exiting with status 0", the format succeeded.
14/07/23 03:26:33 INFO util.ExitUtil: Exiting with status 0
4. Start the cluster
[toughhou@hd1 sbin]$ cd /opt/hadoop-2.4.0/sbin
[toughhou@hd1 sbin]$ ./start-all.sh
This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
Starting namenodes on [hd1]
hd1: namenode running as process 12580. Stop it first.
hd2: starting datanode, logging to /opt/hadoop-2.4.0/logs/hadoop-toughhou-datanode-hd2.out
hd3: starting datanode, logging to /opt/hadoop-2.4.0/logs/hadoop-toughhou-datanode-hd3.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: secondarynamenode running as process 12750. Stop it first.
starting yarn daemons
resourcemanager running as process 11900. Stop it first.
hd3: starting nodemanager, logging to /opt/hadoop-2.4.0/logs/yarn-toughhou-nodemanager-hd3.out
hd2: starting nodemanager, logging to /opt/hadoop-2.4.0/logs/yarn-toughhou-nodemanager-hd2.out
5. Check the status of each node
[toughhou@hd1 sbin]$ jps
16358 NameNode
16535 SecondaryNameNode
16942 Jps
16683 ResourceManager
[toughhou@hd2 ~]$ jps
2253 NodeManager
2369 Jps
2152 DataNode
[toughhou@hd3 ~]$ jps
2064 NodeManager
2178 Jps
1963 DataNode
The expected daemons are running on every node, so the cluster is up.
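Reading the jps listings by eye works for three nodes; the check can also be automated. The sketch below reads jps output on standard input and prints any expected daemon that is absent. `missing_daemons` is a hypothetical helper name, and the demo feeds it sample text instead of calling jps.

```shell
# Read `jps` output on stdin and print each expected daemon name that is absent.
# missing_daemons is a hypothetical helper name.
missing_daemons() {
    local output name
    output="$(cat)"
    for name in "$@"; do
        # jps prints "<pid> <name>"; compare the second field exactly.
        if ! printf '%s\n' "$output" | awk -v n="$name" '$2 == n { ok = 1 } END { exit !ok }'; then
            printf '%s\n' "$name"
        fi
    done
}

# Demo with sample text in which ResourceManager is not running.
printf '16358 NameNode\n16535 SecondaryNameNode\n16942 Jps\n' |
    missing_daemons NameNode SecondaryNameNode ResourceManager
```

On a live node the call would be `jps | missing_daemons NameNode SecondaryNameNode ResourceManager` on hd1 and `jps | missing_daemons DataNode NodeManager` on hd2 and hd3. The exact field comparison keeps SecondaryNameNode from being counted as NameNode.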
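6. Add hostname shortcuts on Windows
For convenience, we can also edit the %systemroot%\system32\drivers\etc\hosts file and add the following IP-to-hostname mappings:
192.168.0.101 hd1
192.168.0.102 hd2
192.168.0.103 hd3
With these in place, a node can be reached from our own machine at http://hd2:8042/node instead of http://192.168.0.102:8042/node.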
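7. wordcount test
To verify the Hadoop environment further, we can run one of the examples that ship with Hadoop.
wordcount is the classic MapReduce example. We go to the examples directory and run the bundled jar to check that the environment works.
Steps:
1) Create directories on HDFS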
[toughhou@hd1 ~]$ hadoop fs -mkdir -p /in/wordcount
[toughhou@hd1 ~]$ hadoop fs -mkdir /out/
2) Upload a file to HDFS
[toughhou@hd1 ~]$ cat in1.txt
Hello World , Hello China, Hello Shanghai
I love China
How are you
[toughhou@hd1 ~]$ hadoop fs -put in1.txt /in/wordcount
3) Run wordcount
[toughhou@hd1 ~]$ cd /opt/hadoop-2.4.0/share/hadoop/mapreduce/
[toughhou@hd2 mapreduce]$ hadoop jar hadoop-mapreduce-examples-2.4.0.jar wordcount /in/wordcount /out/out1
14/07/23 10:42:36 INFO client.RMProxy: Connecting to ResourceManager at hd1/192.168.0.101:18040
14/07/23 10:42:38 INFO input.FileInputFormat: Total input paths to process : 2
14/07/23 10:42:38 INFO mapreduce.JobSubmitter: number of splits:2
14/07/23 10:42:38 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1406105556378_0003
14/07/23 10:42:38 INFO impl.YarnClientImpl: Submitted application application_1406105556378_0003
14/07/23 10:42:38 INFO mapreduce.Job: The url to track the job: http://hd1:8088/proxy/application_1406105556378_0003/
14/07/23 10:42:38 INFO mapreduce.Job: Running job: job_1406105556378_0003
14/07/23 10:42:46 INFO mapreduce.Job: Job job_1406105556378_0003 running in uber mode : false
14/07/23 10:42:46 INFO mapreduce.Job: map 0% reduce 0%
14/07/23 10:42:55 INFO mapreduce.Job: map 100% reduce 0%
14/07/23 10:43:01 INFO mapreduce.Job: map 100% reduce 100%
4) Check the result
[toughhou@hd2 mapreduce]$ hadoop fs -cat /out/out1/part-r-00000
, 1
China 1
China, 1
Hello 3
How 1
I 1
Shanghai 1
World 1
are 1
love 1
you 1
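The result can be cross-checked without the cluster, because the bundled wordcount example splits lines on whitespace only, which is why "China" and "China," are counted separately. The pipeline below is a local sketch of the same counting using standard Unix tools. `local_wordcount` is a hypothetical helper name, and the three-line layout of in1.txt is an assumption that does not affect the counts.

```shell
# Count whitespace-separated tokens on stdin and print "token<TAB>count".
# local_wordcount is a hypothetical helper name.
local_wordcount() {
    tr -s '[:space:]' '\n' |            # one token per line
        grep -v '^$' |                  # drop an empty leading line, if any
        LC_ALL=C sort |                 # byte order, so punctuation and capitals sort first
        uniq -c |                       # collapse repeats into "count token"
        awk '{ print $2 "\t" $1 }'      # reorder to "token<TAB>count"
}

printf 'Hello World , Hello China, Hello Shanghai\nI love China\nHow are you\n' | local_wordcount
```

For this input the pipeline reports 11 distinct tokens, with Hello counted 3 times and every other token once, in the same order as the Hadoop output above.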
That completes the hadoop-2.4.0 cluster setup.