Notes on Setting Up Hadoop 2.5.2 on CentOS 6.5

Plan: three machines

1. master   192.168.79.135
2. slave1   192.168.79.131
3. slave2   192.168.79.132
 
OS: CentOS 6.5
Hadoop: hadoop-2.5.2
JDK: 1.7.0_67
 
Because I am using VMware, I cloned the machines for convenience. With clones you will often find that a guest cannot reach the external network, and the IP addresses sometimes drift. I recommend cloning in order and booting in order: install master first, then clone master to produce slave1, then clone slave1 to produce slave2.
After cloning, boot the machines in the same order. (I originally intended to use static IPs but could not get them working, so for now the addresses are obtained automatically.) If you are not using VMs, you will need to install three systems separately and configure their IPs.
 
1. Edit the /etc/hosts file
vi /etc/hosts
#127.0.0.1              localhost.localdomain localhost     (to be safe I commented this one out too; normally commenting out only the IPv6 ::1 line is enough)
#::1            localhost6.localdomain6 localhost6
192.168.79.135  master
192.168.79.131  slave1
192.168.79.132  slave2
You can configure each machine separately, or use scp to copy the file between servers; at this point, however, passwordless login has not been set up yet (SSH passwordless login is described below), so the copy will prompt for a password. For example:
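From master (the slaves are addressed by IP here, since the hostnames are not resolvable until the file is in place; you will be prompted for each slave's password):
scp /etc/hosts root@192.168.79.131:/etc/hosts
scp /etc/hosts root@192.168.79.132:/etc/hosts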
2. Network configuration:
The main task is setting the hostname. On CentOS 6.5 this is done in the network file:
#vi /etc/sysconfig/network
 
NETWORKING=yes
NETWORKING_IPV6=no
HOSTNAME=master
NTPSERVERARGS=iburst

Change HOSTNAME to the name planned for that machine. All three machines need to be edited, and of course the names must not repeat.

Disable the firewall:
  a. disable permanently:           chkconfig iptables off
  b. stop for the current session:  service iptables stop
  c. check the status (should report that the firewall is not running):  service iptables status

3. SSH passwordless login:
Run ssh-keygen on every machine.
Then, on the master node, run:
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
ssh slave1 cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
ssh slave2 cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

After these commands finish, master's authorized_keys contains the public keys of all three machines. Copy the authorized_keys file into the ~/.ssh/ directory on every machine.

 
After copying, ssh from every machine to every other machine, including to itself. This adds each host's RSA key to known_hosts, and it also verifies that passwordless login was set up successfully.
 
Once every machine can log in to every other machine without a password, scp file transfers no longer prompt for a password either. A minimal sketch of the copy-and-verify steps follows.
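The sketch below assumes the root user and the hostnames configured above; the scp commands still prompt for a password, because the keys are not on the slaves yet:
scp ~/.ssh/authorized_keys slave1:~/.ssh/
scp ~/.ssh/authorized_keys slave2:~/.ssh/
# verify from master (repeat from every machine, including ssh to itself)
ssh master hostname
ssh slave1 hostname
ssh slave2 hostname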

4. Install the JDK:
First check whether the system already contains a JDK; the OS installer sometimes installs OpenJDK automatically (in my experience, CentOS 7 installs OpenJDK when the developer tools group is selected; CentOS 6.5 does not).
If OpenJDK is installed, uninstall it before installing the Sun JDK (http://blog.csdn.net/tralonzhang/article/details/7773824).
Check first:
rpm -qa | grep java

If it prints something like:
java-1.4.2-gcj-compat-1.4.2.0-40jpp.115
java-1.6.0-openjdk-1.6.0.0-1.7.b09.el5

uninstall with:
rpm -e --nodeps java-1.4.2-gcj-compat-1.4.2.0-40jpp.115
rpm -e --nodeps java-1.6.0-openjdk-1.6.0.0-1.7.b09.el5

Download the tar package for the matching JDK version and extract it to the directory of your choice (I used /usr/java/).
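For example (the exact tarball name is an assumption; use whatever file you downloaded):
mkdir -p /usr/java
tar -zxvf jdk-7u67-linux-x64.tar.gz -C /usr/java/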
After extracting, configure the JDK environment variables in /etc/profile:
export JAVA_HOME=/usr/java/jdk1.7.0_67
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export JRE_HOME=/usr/java/jdk1.7.0_67/jre
export PATH=$PATH:$JAVA_HOME/bin:$JRE_HOME/bin

After editing, run source /etc/profile so the environment variables take effect.

Next, verify that the JDK installed successfully:
[root@master ~]# java -version
java version "1.7.0_67"
Java(TM) SE Runtime Environment (build 1.7.0_67-b15)
Java HotSpot(TM) 64-Bit Server VM (build 24.79-b02, mixed mode)
[root@master ~]# javac -version
javac 1.7.0_67
[root@master ~]# $JAVA_HOME
-bash: /usr/java/jdk1.7.0_67: is a directory

If the version and JDK path display correctly as above, the installation succeeded. You can then copy the Java installation files and /etc/profile to the other nodes and run source /etc/profile there.

 
5. Install hadoop-2.5.2:
Hadoop install directory:        /data/hadoop/hadoop-2.5.2/
Filesystem data directory:       /data/hadoop/hadoop-2.5.2/hdfs/data/
Filesystem namenode directory:   /data/hadoop/hadoop-2.5.2/hdfs/name/
Create the directory /data/hadoop/ on every machine, copy hadoop-2.5.2.tar.gz into it, cd into the directory, and extract it:
tar -zxvf hadoop-2.5.2.tar.gz
Then comes the most critical part, the Hadoop configuration:
Newer Hadoop versions rearranged the directory layout: in 0.20 the configuration files all lived under conf/, but they have now moved to /data/hadoop/hadoop-2.5.2/etc/hadoop/.
I started out following two configuration guides found online, but every attempt ended the same way: the cluster would start, then the ResourceManager would drop out and everything failed. I then went straight to the official documentation and found that the online tutorials even got the letter casing of property names wrong. Working through it property by property against the official docs, the configuration succeeded on the next try.
Even better, the Hadoop distribution itself ships the complete official documentation, including the default value of every property, so you can copy those files out and edit them, avoiding transcription errors. The cluster setup document is at: {package location}hadoop-2.5.2/share/doc/hadoop/hadoop-project-dist/hadoop-common/ClusterSetup.html
First-hand official documentation is simply more reliable; from now on I will check whether a download ships its own docs before wasting time on random tutorials around the web (yes, I realize I am producing another random tutorial myself... ha).
Back to the point:
言歸正傳:
(1) In hadoop-env.sh and yarn-env.sh, set JAVA_HOME to the same path used in the environment variables; see the line below.
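Concretely, the line to set in both etc/hadoop/hadoop-env.sh and etc/hadoop/yarn-env.sh, reusing the JDK path from step 4:
export JAVA_HOME=/usr/java/jdk1.7.0_67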
(2) core-site.xml
<configuration>
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://master:9000</value>
  <description>The name of the default file system.  A URI whose
  scheme and authority determine the FileSystem implementation.  The
  uri's scheme determines the config property (fs.SCHEME.impl) naming
  the FileSystem implementation class.  The uri's authority is used to
  determine the host, port, etc. for a filesystem.</description>
</property>
<property>
  <name>hadoop.tmp.dir</name>
  <value>/data/hadoop/tmp/hadoop-${user.name}</value>
  <description>A base for other temporary directories.</description>
</property>
<!-- i/o properties -->
<property>
  <name>io.file.buffer.size</name>
  <value>131072</value>
  <description>The size of buffer for use in sequence files.
  The size of this buffer should probably be a multiple of hardware
  page size (4096 on Intel x86), and it determines how much data is
  buffered during read and write operations.</description>
</property>
</configuration>

(3) hdfs-site.xml

<configuration>
<property>
  <name>dfs.namenode.name.dir</name>
  <value>/data/hadoop/hadoop-2.5.2/hdfs/name</value>
  <description>Determines where on the local filesystem the DFS name node
      should store the name table(fsimage).  If this is a comma-delimited list
      of directories then the name table is replicated in all of the
      directories, for redundancy. </description>
</property>
<property>
  <name>dfs.datanode.data.dir</name>
  <value>/data/hadoop/hadoop-2.5.2/hdfs/data</value>
  <description>Determines where on the local filesystem an DFS data node
  should store its blocks.  If this is a comma-delimited
  list of directories, then data will be stored in all named
  directories, typically on different devices.
  Directories that do not exist are ignored.
  </description>
</property>
<property>
  <name>dfs.replication</name>
  <value>2</value>
  <description>Default block replication.
  The actual number of replications can be specified when the file is created.
  The default is used if replication is not specified in create time.
  </description>
</property>
<property>
  <name>dfs.blocksize</name>
  <value>134217728</value>
  <description>
      The default block size for new files, in bytes.
      You can use the following suffix (case insensitive):
      k(kilo), m(mega), g(giga), t(tera), p(peta), e(exa) to specify the size (such as 128k, 512m, 1g, etc.),
      Or provide complete size in bytes (such as 134217728 for 128 MB).
  </description>
</property>
<property>
  <name>dfs.namenode.handler.count</name>
  <value>10</value>
  <description>The number of server threads for the namenode.</description>
</property>
</configuration>

 

(4) mapred-site.xml
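Note: the distribution ships only a template for this file under etc/hadoop/; create mapred-site.xml from it first:
cp etc/hadoop/mapred-site.xml.template etc/hadoop/mapred-site.xml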


<configuration>
<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
  <description>The runtime framework for executing MapReduce jobs.
  Can be one of local, classic or yarn.
  </description>
</property>
<!-- jobhistory properties -->
<property>
  <name>mapreduce.jobhistory.address</name>
  <value>master:10020</value>
  <description>MapReduce JobHistory Server IPC host:port</description>
</property>
<property>
  <name>mapreduce.jobhistory.webapp.address</name>
  <value>master:19888</value>
  <description>MapReduce JobHistory Server Web UI host:port</description>
</property>
</configuration>

 

(5) yarn-site.xml

<configuration>
<!-- Site specific YARN configuration properties -->
<property>
    <description>The hostname of the RM.</description>
    <name>yarn.resourcemanager.hostname</name>
    <value>master</value>
  </property>
<property>
    <name>yarn.resourcemanager.address</name>
    <value>${yarn.resourcemanager.hostname}:8032</value>
    <description>The address of the applications manager interface in the RM.</description>
</property>
<property>
    <description>The address of the scheduler interface.</description>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>${yarn.resourcemanager.hostname}:8030</value>
  </property>
<property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>${yarn.resourcemanager.hostname}:8031</value>
  </property>
<property>
    <description>The address of the RM admin interface.</description>
    <name>yarn.resourcemanager.admin.address</name>
    <value>${yarn.resourcemanager.hostname}:8033</value>
  </property>
<property>
    <description>The http address of the RM web application.</description>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>${yarn.resourcemanager.hostname}:8088</value>
  </property>
<property>
    <description>The minimum allocation for every container request at the RM,
    in MBs. Memory requests lower than this won't take effect,
    and the specified value will get allocated at minimum.
        default is 1024
    </description>
    <name>yarn.scheduler.minimum-allocation-mb</name>
    <value>512</value>
  </property>
<property>
    <description>The maximum allocation for every container request at the RM,
    in MBs. Memory requests higher than this won't take effect,
    and will get capped to this value.
    default value is 8192</description>
    <name>yarn.scheduler.maximum-allocation-mb</name>
    <value>2048</value>
  </property>
<property>
    <description>Amount of physical memory, in MB, that can be allocated
    for containers.default value is 8192</description>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>2048</value>
  </property>
<property>
    <description>Whether to enable log aggregation. Log aggregation collects
      each container's logs and moves these logs onto a file-system, for e.g.
      HDFS, after the application completes. Users can configure the
      "yarn.nodemanager.remote-app-log-dir" and
      "yarn.nodemanager.remote-app-log-dir-suffix" properties to determine
      where these logs are moved to. Users can access the logs via the
      Application Timeline Server.
    </description>
    <name>yarn.log-aggregation-enable</name>
    <value>true</value>
  </property>
</configuration>

 

(6) slaves (according to the official documentation, entries in slaves can be either IPs or hostnames; I use hostnames so that IP changes are easier to handle):
slave1
slave2

Many of the properties above are set to Hadoop's default values and are included only for clarity; when configuring, any value identical to the documented default can be omitted.

That completes the Hadoop configuration files. Next come the Hadoop environment variables; the Java variables from earlier are part of the same file:

/etc/profile

#set java_env

export JAVA_HOME=/usr/java/jdk1.7.0_67
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export JRE_HOME=/usr/java/jdk1.7.0_67/jre
export PATH=$PATH:$JAVA_HOME/bin:$JRE_HOME/bin
###set hadoop_env
export HADOOP_HOME=/data/hadoop/hadoop-2.5.2
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_YARN_HOME=$HADOOP_HOME
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$HADOOP_HOME/lib
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib" 

Again, run source /etc/profile.

Test whether the configuration succeeded with $HADOOP_HOME and hadoop version.
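If everything is in place, the output should begin roughly like this (further build details follow the first line):
[root@master ~]# hadoop version
Hadoop 2.5.2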
If all of the above works, copy the installation files (including /etc/profile) to the other machines; note that the paths must be identical on every node. Then Hadoop can be started.
 
6. First format the namenode with hadoop namenode -format (you can also use bin/hdfs namenode -format).
If the output contains "Storage directory /data/hadoop/hadoop-2.5.2/hdfs/name has been successfully formatted", the format succeeded.
Hadoop can then be started. The official documentation starts each daemon individually, which seems rather cumbersome,
so I use start-dfs.sh to start HDFS.
Several log-output locations are printed, among them SecondaryNameNode (0.0.0.0). This is correct, because hdfs-default.xml notes:
<property>
  <name>dfs.namenode.rpc-bind-host</name>
  <value></value>
  <description>
    The actual address the RPC server will bind to. If this optional address is
    set, it overrides only the hostname portion of dfs.namenode.rpc-address.
    It can also be specified per name node or name service for HA/Federation.
    This is useful for making the name node listen on all interfaces by
    setting it to 0.0.0.0.
  </description>
</property>
[root@master hadoop]# jps
2630 Jps
1955 SecondaryNameNode
1785 NameNode
 
[root@slave1 ~]# jps
1942 Jps
1596 DataNode

If jps on master shows the two processes above and jps on a slave shows DataNode, then DFS started successfully (you should still check the logs on the slave nodes and track down any errors there).
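For example, to inspect the DataNode log on slave1 (the exact file name depends on the user and hostname, so the name below is an assumption):
tail -n 50 $HADOOP_HOME/logs/hadoop-root-datanode-slave1.log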

Once everything is up, start YARN with:
./start-yarn.sh
 
[root@master hadoop]# jps
2630 Jps
1955 SecondaryNameNode
1785 NameNode
2316 ResourceManager
 
[root@slave1 ~]# jps
1942 Jps
1596 DataNode
1774 NodeManager

After startup, master gains a ResourceManager and each slave gains a NodeManager: YARN started successfully.
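As a quick smoke test of the whole cluster, you can run the bundled pi example (the jar path below follows the 2.5.2 distribution layout; adjust it if yours differs):
$HADOOP_HOME/bin/hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.5.2.jar pi 2 10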

7. The documentation also describes a JobHistory web UI. If you only start DFS and YARN, this UI is not available; it has to be started separately, and that is where the daemon script comes in.
The following command starts the MapReduce JobHistory server (skip it if you do not need the history UI); jps will then show an additional JobHistoryServer process:
./mr-jobhistory-daemon.sh --config $HADOOP_CONF_DIR start historyserver
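To stop it later, the same script takes stop instead of start:
./mr-jobhistory-daemon.sh --config $HADOOP_CONF_DIR stop historyserver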
 
The web UIs of the Hadoop environment:
Daemon                        Web Interface           Notes
NameNode                      http://nn_host:port/    Default HTTP port is 50070.
ResourceManager               http://rm_host:port/    Default HTTP port is 8088.
MapReduce JobHistory Server   http://jhs_host:port/   Default HTTP port is 19888.
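With the hostnames used in this setup, that means, for example, http://master:50070/ for the NameNode, http://master:8088/ for the ResourceManager, and http://master:19888/ for the JobHistory server.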
 
 
Appendix: cause of and fix for the 0.0.0.0:8031 error
Exception:
org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.net.ConnectException: Call From slave1/192.168.79.131 to 0.0.0.0:8031 failed on connection exception: java.net.ConnectException: Connection refused; For more details see:   http://wiki.apache.org/hadoop/ConnectionRefused

When YARN starts, the slaves keep trying to connect to 0.0.0.0:8031. The 0.0.0.0 here is in effect the default hostname of the ResourceManager: if yarn.resourcemanager.hostname is not set in yarn-site.xml, this connection fails, YARN startup ultimately fails, and connection-refused errors appear in the slave logs.
The property can be set once in yarn-site.xml and then referenced as a variable by the other address properties, which makes later changes convenient. (Note: in yarn-default.xml the value of this property is an IP; for easier maintenance I set it to a hostname here, which still had to be confirmed as workable by actually starting the cluster.)
 
Variables defined in one configuration file can be referenced from the other configuration files.