Prerequisites: JDK 1.7.0_45 or later. Download the software from http://archive.cloudera.com/cdh5/cdh/5/. Only 64-bit systems are currently supported; I used CentOS 6.5.

Configure the environment in /etc/profile (as root):

export JAVA_HOME=/usr/java/jdk1.7.0_45
export JRE_HOME=${JAVA_HOME}/jre
export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib
export HADOOP_HOME=/usr/hadoop/chd5
export HADOOP_PID_DIR=/usr/hadoop/hadoop_pid_dir
export HADOOP_MAPRED_HOME=${HADOOP_HOME}
export HADOOP_COMMON_HOME=${HADOOP_HOME}
export HADOOP_HDFS_HOME=${HADOOP_HOME}
export YARN_HOME=${HADOOP_HOME}
export HADOOP_YARN_HOME=${HADOOP_HOME}
export HADOOP_CONF_DIR=${HADOOP_HOME}/etc/hadoop
export HDFS_CONF_DIR=${HADOOP_HOME}/etc/hadoop
export YARN_CONF_DIR=${HADOOP_HOME}/etc/hadoop
export HADOOP_COMMON_LIB_NATIVE_DIR=/usr/hadoop/chd5
export ZOOKEEPER_HOME=/usr/hadoop/zookeeper
export PATH=${JAVA_HOME}/bin:${ZOOKEEPER_HOME}/bin:$PATH

Run source /etc/profile to make the new variables take effect.

Role assignment:

master:  192.168.1.10  roles: namenode, JournalNode
master2: 192.168.1.9   roles: namenode, JournalNode
slave1:  192.168.1.11  roles: datanode, JournalNode
slave2:  192.168.1.12  role:  datanode
slave3:  192.168.1.13  role:  datanode

core-site.xml configuration》》》》》》》》》》》》》》》》》》》》》》》》》》》》》》》》》》》》》》》》》》

<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://cluster</value>
    <!-- fs.defaultFS is the logical name of the HDFS path. We start two
         NameNodes at different addresses, so after a failover users would
         otherwise have to change their paths. With a logical name, users
         need not worry about path inconsistency caused by NameNode
         switching. -->
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/usr/hadoop/tmp</value>
    <!-- create the tmp directory under /usr/hadoop -->
    <description>A base for other temporary directories.</description>
  </property>
  <property>
    <name>dfs.name.dir</name>
    <value>/usr/hadoop/hdfs/name</value>
    <!-- create the hdfs directory under /usr/hadoop, with subdirectories data and name -->
  </property>
  <property>
    <name>fs.trash.interval</name>
    <value>10080</value>
  </property>
  <property>
    <name>fs.trash.checkpoint.interval</name>
    <value>10080</value>
  </property>
  <property>
    <name>topology.script.file.name</name>
    <value>/usr/hadoop/chd5/etc/hadoop/rack.py</value>
    <!-- rack.py is the rack-awareness script -->
  </property>
  <property>
    <name>topology.script.number.args</name>
    <value>6</value>
  </property>
  <property>
    <name>hadoop.native.lib</name>
    <value>false</value>
    <description>Should native hadoop libraries, if present, be used.</description>
  </property>
  <property>
    <name>hadoop.security.group.mapping</name>
    <value>org.apache.hadoop.security.JniBasedUnixGroupsMappingWithFallback</value>
  </property>
  <property>
    <name>hadoop.proxyuser.hadoop.hosts</name>
    <value>*</value>
  </property>
  <property>
    <name>hadoop.proxyuser.hadoop.groups</name>
    <value>*</value>
  </property>
</configuration>
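The local directories referenced above must exist before anything is formatted or started. A short sketch of the required commands, with paths taken from the configuration above; the final line simply illustrates that, once the cluster is running, clients address HDFS through the logical name regardless of which NameNode is active:

mkdir -p /usr/hadoop/tmp             # hadoop.tmp.dir
mkdir -p /usr/hadoop/hdfs/name       # dfs.name.dir, on the NameNodes
mkdir -p /usr/hadoop/hdfs/data       # dfs.data.dir, on the DataNodes
mkdir -p /usr/hadoop/hadoop_pid_dir  # HADOOP_PID_DIR

# After startup, the logical nameservice resolves to whichever
# NameNode is currently active:
hadoop fs -ls hdfs://cluster/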
hdfs-site.xml configuration》》》》》》》》》》》》》》》》》》》》》》》》》》》》》》》》》》》》》》》》》》》》

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
  <property>
    <name>dfs.blocksize</name>
    <value>16m</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/usr/hadoop/hdfs/data</value>
  </property>
  <property>
    <name>dfs.nameservices</name>
    <value>cluster</value>
  </property>
  <property>
    <name>dfs.ha.namenodes.cluster</name>
    <value>master,master2</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.cluster.master</name>
    <value>master:9000</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.cluster.master2</name>
    <value>master2:9000</value>
  </property>
  <property>
    <name>dfs.namenode.http-address.cluster.master</name>
    <value>master:50070</value>
  </property>
  <property>
    <name>dfs.namenode.http-address.cluster.master2</name>
    <value>master2:50070</value>
  </property>
  <property>
    <name>dfs.namenode.secondary.http-address.cluster.master</name>
    <value>master:50090</value>
  </property>
  <property>
    <name>dfs.namenode.secondary.http-address.cluster.master2</name>
    <value>master2:50090</value>
  </property>
  <property>
    <name>dfs.namenode.shared.edits.dir</name>
    <value>qjournal://master:8485;master2:8485;slave1:8485/cluster</value>
  </property>
  <property>
    <name>dfs.client.failover.proxy.provider.cluster</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
  </property>
  <property>
    <name>ha.zookeeper.quorum</name>
    <value>master:2181,slave1:2181,slave2:2181,slave3:2181,master2:2181</value>
  </property>
  <property>
    <name>dfs.ha.fencing.methods</name>
    <value>sshfence</value>
  </property>
  <property>
    <name>dfs.ha.fencing.ssh.private-key-files</name>
    <value>/home/hadoop/.ssh/id_rsa</value>
  </property>
  <property>
    <name>dfs.journalnode.edits.dir</name>
    <value>/usr/hadoop/tmp/journal</value>
  </property>
  <property>
    <name>dfs.ha.automatic-failover.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>dfs.permissions</name>
    <value>false</value>
  </property>
  <property>
    <name>dfs.webhdfs.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>dfs.datanode.max.xcievers</name>
    <value>1000000</value>
  </property>
  <property>
    <name>dfs.balance.bandwidthPerSec</name>
    <value>104857600</value>
    <description>
      Specifies the maximum amount of bandwidth that each datanode can
      utilize for the balancing purpose, in bytes per second.
    </description>
  </property>
  <property>
    <name>dfs.hosts.exclude</name>
    <value>/usr/hadoop/chd5/etc/hadoop/excludes</value>
    <description>
      Names a file that contains a list of hosts that are not permitted to
      connect to the namenode. The full pathname of the file must be
      specified. If the value is empty, no hosts are excluded.
    </description>
  </property>
</configuration>

mapred-site.xml configuration》》》》》》》》》》》》》》》》》》》》》》》》》》》》》》》》》》》》》》》》》》》》

<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>master:10020</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>master:19888</value>
  </property>
  <property>
    <name>mapreduce.output.fileoutputformat.compress</name>
    <value>true</value>
  </property>
  <property>
    <name>mapreduce.output.fileoutputformat.compress.type</name>
    <value>BLOCK</value>
  </property>
  <property>
    <name>mapreduce.output.fileoutputformat.compress.codec</name>
    <value>org.apache.hadoop.io.compress.SnappyCodec</value>
  </property>
  <property>
    <name>mapreduce.map.output.compress</name>
    <value>true</value>
  </property>
  <property>
    <name>mapreduce.map.output.compress.codec</name>
    <value>org.apache.hadoop.io.compress.SnappyCodec</value>
  </property>
</configuration>

yarn-site.xml configuration》》》》》》》》》》》》》》》》》》》》》》》》》》》》》》》》》》》》》》》》》》》》》》》
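A minimal sketch of the YARN-side settings for this layout, assuming the ResourceManager runs on master and MapReduce jobs use the standard shuffle auxiliary service; the property values here are illustrative assumptions, not taken verbatim from this cluster:

cat > ${HADOOP_CONF_DIR}/yarn-site.xml <<'EOF'
<configuration>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <!-- assumed: ResourceManager co-located with the master NameNode -->
    <value>master</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <!-- required so MapReduce jobs can shuffle map output on YARN -->
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
EOF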
rack.py》》》》》》》》》》》》》》》》》》》》》》》》》》》》》》》》》》》》》》》》》》》》》

#!/usr/bin/env python
# Rack-awareness script: maps each host name or IP passed on the command
# line to a rack path read from rack.data in the same directory.
import sys, os, time

pwd = os.path.realpath(__file__)
rack_file = os.path.dirname(pwd) + "/rack.data"

# Every useful line of rack.data holds one or more hosts followed by a rack path.
rack_list = [l.strip().split() for l in open(rack_file).readlines()
             if len(l.strip().split()) > 1]
rack_map = {}
for item in rack_list:
    for host in item[:-1]:
        rack_map[host] = item[-1]
# Hosts that are not listed fall back to the default rack.
rack_map['default'] = 'default' in rack_map and rack_map['default'] or '/default/rack'
rack_result = [av in rack_map and rack_map[av] or rack_map['default'] for av in sys.argv[1:]]
print ' '.join(rack_result)

# Log every invocation for debugging.
f = open('/tmp/rack.log', 'a+')
f.writelines("[%s] %s\n" % (time.strftime("%F %T"), str(sys.argv)))
f.close()

zoo.cfg configuration for ZooKeeper》》》》》》》》》》》》》》》》》》》》》》》》》》》》》》》》》》》》》》》》》》

Create a data directory under the zookeeper folder for storing data, and inside it create a file named myid. Write 1, 2, 3, 4 or 5 into it according to the server id, so that it matches server.1, server.2, ... in the configuration below.

# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=5
# the directory where the snapshot is stored.
# do not use /tmp for storage, /tmp here is just
# example sakes.
dataDir=/usr/hadoop/zookeeper/data
# the port at which the clients will connect
clientPort=2181
server.1=master:2888:3888
server.2=master2:2888:3888
server.3=slave1:2888:3888
server.4=slave2:2888:3888
server.5=slave3:2888:3888
#
# Be sure to read the maintenance section of the
# administrator guide before turning on autopurge.
#
# http://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_maintenance
#
# The number of snapshots to retain in dataDir
#autopurge.snapRetainCount=3
# Purge task interval in hours
# Set to "0" to disable auto purge feature
#autopurge.purgeInterval=1
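rack.py expects a rack.data file next to it, and each ZooKeeper node needs its own myid. A minimal sketch of both; the rack grouping below is an illustrative assumption, not taken from the original post:

# Hypothetical rack.data: one or more host keys per line, rack path last.
# Hostname and IP are both listed because the NameNode may pass either.
cat > /usr/hadoop/chd5/etc/hadoop/rack.data <<'EOF'
master  192.168.1.10 /rack1
master2 192.168.1.9  /rack1
slave1  192.168.1.11 /rack2
slave2  192.168.1.12 /rack2
slave3  192.168.1.13 /rack2
default /default/rack
EOF

# On each ZooKeeper node, write its server id into myid.
# Example for slave1, which is server.3 in zoo.cfg:
echo 3 > /usr/hadoop/zookeeper/data/myid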
The following commands must be executed strictly in this order; do not rearrange them!
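One prerequisite before step 1: the sshfence method configured in hdfs-site.xml assumes the NameNodes can SSH to each other without a password using the key at /home/hadoop/.ssh/id_rsa. A minimal sketch, assuming a hadoop user exists on both masters:

# On master, as the hadoop user (repeat in the other direction on master2):
ssh-keygen -t rsa -f /home/hadoop/.ssh/id_rsa -N ''
ssh-copy-id hadoop@master2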
1. On master, master2 and slave1, run: hadoop-daemon.sh start journalnode
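To confirm the JournalNodes came up, jps on each of the three nodes should list a JournalNode process:

jps    # expect a JournalNode entry on master, master2 and slave1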
2. On all 5 nodes, start ZooKeeper by running: zkServer.sh start
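A quick way to check that the ensemble has formed; one node should report leader mode and the rest follower:

zkServer.sh status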
3. On master, run: hdfs namenode -format
Then, still on master, run: hadoop-daemon.sh start namenode
4. On master2, run: hdfs namenode -bootstrapStandby
Then, on master2, run: hadoop-daemon.sh start namenode
At this point, visiting both 192.168.1.9:50070 and 192.168.1.10:50070 shows the standby state.
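The same can be checked from the command line; until the HA state is formatted in ZooKeeper (step 5), both NameNodes report standby:

hdfs haadmin -getServiceState master
hdfs haadmin -getServiceState master2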
5. Format the HA state in ZooKeeper by running:
hdfs zkfc -formatZK
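If formatting succeeded, a znode for the nameservice appears in ZooKeeper; a quick check, using the zkCli.sh that ships with ZooKeeper:

zkCli.sh -server master:2181
# inside the zkCli prompt:
ls /hadoop-ha        # should list [cluster]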
6. On either NameNode, run start-dfs.sh; this also brings up the DataNodes and the ZKFC daemons. One of the NameNodes will switch from standby to active while the other remains standby.
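To verify that automatic failover really works, stop the active NameNode and watch the standby take over; a sketch, assuming master is currently the active one:

# On master (currently active):
hadoop-daemon.sh stop namenode
# From any node, master2 should now report active:
hdfs haadmin -getServiceState master2
# Restart the stopped NameNode; it rejoins as standby:
hadoop-daemon.sh start namenode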
If you have different opinions or views, please leave your valuable comments so corrections can be made promptly; let's improve together.