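1. Pre-installation preparation

Operating system: CentOS 6.5, 64-bit
Environment: JDK 1.7.0_45 or later; this guide uses jdk-7u55-linux-x64.tar.gz

    master01    10.10.2.57     NameNode
    master02    10.10.2.58     NameNode
    slave01     10.10.2.173    DataNode
    slave02     10.10.2.59     DataNode
    slave03     10.10.2.60     DataNode

Note: Hadoop 2.0 and later are run here on JDK 1.7, so uninstall the JDK that ships with Linux and install JDK 1.7 in its place.
JDK download: http://www.oracle.com/technetwork/java/javase/downloads/index.html
Software versions: hadoop-2.3.0-cdh5.1.0.tar.gz, zookeeper-3.4.5-cdh5.1.0.tar.gz
Download: http://archive.cloudera.com/cdh5/cdh/5/

Installation begins below.

2. JDK installation

(1) Check whether a JDK is already installed:

    rpm -qa | grep jdk
    java-1.6.0-openjdk-1.6.0.0-1.45.1.11.1.el6.i686

(2) Remove the bundled JDK:

    yum -y remove java-1.6.0-openjdk-1.6.0.0-1.45.1.11.1.el6.i686

(3) Install jdk-7u55-linux-x64.tar.gz. Create a java directory under /usr/, then run the following inside it to extract the archive there:

    tar -zxvf jdk-7u55-linux-x64.tar.gz

    [root@master01 java]# ls
    jdk1.7.0_55

3. Configure environment variables

Run vi /etc/profile and add the export lines:

    # /etc/profile
    # System wide environment and startup programs, for login setup
    # Functions and aliases go in /etc/bashrc
    export JAVA_HOME=/usr/java/jdk1.7.0_55
    export JRE_HOME=/usr/java/jdk1.7.0_55/jre
    export CLASSPATH=/usr/java/jdk1.7.0_55/lib
    export PATH=$JAVA_HOME/bin:$PATH

Save the file and run source /etc/profile to reload the environment variables, then run java -version:

    [root@master01 java]# java -version
    java version "1.7.0_55"
    Java(TM) SE Runtime Environment (build 1.7.0_55-b13)
    Java HotSpot(TM) 64-Bit Server VM (build 24.55-b03, mixed mode)

The JDK is now configured.

4. System configuration

Prepare the 5 machines in advance and configure their IP addresses.

(1) Disable the firewall:

    chkconfig iptables off

This disables the firewall permanently, but chkconfig only affects subsequent boots; to stop a firewall that is already running, also run service iptables stop.

(2) Configure the hostname and the hosts file:

    [root@master01 java]# vi /etc/hosts
    127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
    ::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
    10.10.2.57  master01
    10.10.2.58  master02
    10.10.2.173 slave01
    10.10.2.59  slave02
    10.10.2.60  slave03

Set each machine's hostname to the name that matches its IP address.

(3) Configure passwordless SSH

Hadoop manages its daemons remotely while it runs: the NameNode connects to each DataNode over SSH (Secure Shell) to start or stop its processes, so these logins must work without a password. Configure passwordless SSH from the NameNodes to the DataNodes and, likewise, from the DataNodes to the NameNodes.

On every machine, open /etc/ssh/sshd_config (vi /etc/ssh/sshd_config) and enable:

    RSAAuthentication yes       # enable RSA authentication
    PubkeyAuthentication yes    # enable public/private key pair authentication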
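To confirm that the two directives are explicitly enabled on a machine, a small check can be sketched as follows. The function name sshd_option_enabled is hypothetical, and the check only looks for an uncommented "yes" line; it does not account for sshd's built-in defaults.

```shell
#!/bin/sh
# Hypothetical helper: succeed if OPTION is explicitly set to "yes" in an
# sshd_config-style file. Commented-out lines are ignored.
sshd_option_enabled() {
    option="$1"
    config="${2:-/etc/ssh/sshd_config}"
    # sshd keywords are case-insensitive, hence -i
    grep -Eiq "^[[:space:]]*${option}[[:space:]]+yes([[:space:]]|\$)" "$config"
}

# Report both directives when the default config file is readable.
if [ -r /etc/ssh/sshd_config ]; then
    for opt in RSAAuthentication PubkeyAuthentication; do
        if sshd_option_enabled "$opt"; then
            echo "$opt: enabled"
        else
            echo "$opt: no explicit 'yes' line found"
        fi
    done
fi
```

After changing sshd_config, restart the daemon (service sshd restart on CentOS 6) so that the settings take effect.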
On master01, run:

    ssh-keygen -t rsa -P ''

Press Enter at the prompt without typing a passphrase. The key pair is stored in /root/.ssh by default. Append the public key to the local authorized_keys file:

    cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

    [root@master01 .ssh]# ls
    authorized_keys  id_rsa  id_rsa.pub  known_hosts

Do the same on slave01. Then append master01's /root/.ssh/id_rsa.pub to authorized_keys in the same directory on slave01, so that slave01 holds master01's public key, and run ssh slave01 from master01 to check that it connects without a password. Next, append slave01's id_rsa.pub to master01's authorized_keys and run ssh master01 from slave01 to check that slave01 can connect to master01 directly.

    [root@master01 ~]# ssh slave01
    Last login: Tue Aug 19 14:28:15 2014 from master01
    [root@slave01 ~]#

Repeat the same procedure for each of these pairs:

    master01 - master02
    master01 - slave01
    master01 - slave02
    master01 - slave03
    master02 - slave01
    master02 - slave02
    master02 - slave03

5. Install Hadoop

Create the directory /usr/local/cloud. It holds a data directory for data and log files, plus the Hadoop and ZooKeeper distributions:

    [root@slave01 cloud]# ls
    data  hadoop  tar  zookeeper

5.1 Configure hadoop-env.sh

Go to /usr/local/cloud/hadoop/etc/hadoop and edit hadoop-env.sh (vi hadoop-env.sh), which sets up the Hadoop runtime environment:

    export JAVA_HOME=/usr/java/jdk1.7.0_55

5.2 Configure core-site.xml

    <!-- hadoop.tmp.dir: many Hadoop paths depend on it. Do not delete this directory on a NameNode, or the NameNode must be reformatted. -->
    <property>
      <name>hadoop.tmp.dir</name>
      <value>/usr/local/cloud/data/hadoop/tmp</value>
    </property>

    <!-- URL of the cluster's NameNode. With HA this is the logical nameservice name rather than a single host. Every DataNode in the cluster needs to know the NameNode address before its data can be used. -->
    <property>
      <name>fs.defaultFS</name>
      <value>hdfs://zzg</value>
    </property>

    <!-- Addresses and ports of the ZooKeeper ensemble; preferably an odd number of servers, at least 3 -->
    <property>
      <name>ha.zookeeper.quorum</name>
      <value>master01:2181,slave01:2181,slave02:2181</value>
    </property>

(2) Configure hdfs-site.xml

    <!-- Storage directory for NameNode data. It applies only to the NameNode and holds the file system metadata. -->
    <property>
      <name>dfs.namenode.name.dir</name>
      <value>/usr/local/cloud/data/hadoop/dfs/nn</value>
    </property>

    <!-- Local path where each DataNode stores its data. It need not be identical on every machine, but keeping it the same simplifies management. -->
    <property>
      <name>dfs.datanode.data.dir</name>
      <value>/usr/local/cloud/data/hadoop/dfs/dn</value>
    </property>

    <!-- Number of replicas kept for each file; the default is 3 -->
    <property>
      <name>dfs.replication</name>
      <value>3</value>
    </property>

    <!-- Set dfs.webhdfs.enabled to true; otherwise some operations, such as WebHDFS LISTSTATUS, are unavailable -->
    <property>
      <name>dfs.webhdfs.enabled</name>
      <value>true</value>
    </property>
    <!-- Optional: disable permission checking to avoid unnecessary trouble (dfs.permissions is the older name of dfs.permissions.enabled) -->
    <property>
      <name>dfs.permissions</name>
      <value>false</value>
    </property>

    <!-- Optional: disable permission checking to avoid unnecessary trouble -->
    <property>
      <name>dfs.permissions.enabled</name>
      <value>false</value>
    </property>

    <!-- HA configuration -->
    <!-- Logical name of the cluster (nameservice) -->
    <property>
      <name>dfs.nameservices</name>
      <value>zzg</value>
    </property>

    <!-- Logical IDs of the NameNodes in the nameservice -->
    <property>
      <name>dfs.ha.namenodes.zzg</name>
      <value>nn1,nn2</value>
    </property>

    <!-- RPC address of each NameNode ID; clients use it for file system operations such as uploading and reading files -->
    <property>
      <name>dfs.namenode.rpc-address.zzg.nn1</name>
      <value>master01:9000</value>
    </property>
    <property>
      <name>dfs.namenode.rpc-address.zzg.nn2</name>
      <value>master02:9000</value>
    </property>

    <!-- Address and port of each NameNode's web UI -->
    <property>
      <name>dfs.namenode.http-address.zzg.nn1</name>
      <value>master01:50070</value>
    </property>
    <property>
      <name>dfs.namenode.http-address.zzg.nn2</name>
      <value>master02:50070</value>
    </property>

    <!-- Service RPC address used for communication with each NameNode -->
    <property>
      <name>dfs.namenode.servicerpc-address.zzg.nn1</name>
      <value>master01:53310</value>
    </property>
    <property>
      <name>dfs.namenode.servicerpc-address.zzg.nn2</name>
      <value>master02:53310</value>
    </property>

    <!-- JournalNode quorum that holds the shared edit log -->
    <property>
      <name>dfs.namenode.shared.edits.dir</name>
      <value>qjournal://master01:8485;slave01:8485;slave02:8485/zzg</value>
    </property>

    <!-- Local directory where each JournalNode stores the edits shared between the NameNodes -->
    <property>
      <name>dfs.journalnode.edits.dir</name>
      <value>/usr/local/cloud/data/hadoop/ha/journal</value>
    </property>

    <!-- Failover proxy provider class used by clients -->
    <property>
      <name>dfs.client.failover.proxy.provider.zzg</name>
      <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
    </property>

    <!-- Enable automatic failover -->
    <property>
      <name>dfs.ha.automatic-failover.enabled</name>
      <value>true</value>
    </property>
    <property>
      <name>ha.zookeeper.quorum</name>
      <value>master01:2181,slave01:2181,slave02:2181</value>
    </property>

    <!-- Use SSH fencing during failover -->
    <property>
      <name>dfs.ha.fencing.methods</name>
      <value>sshfence</value>
    </property>

    <!-- Location of the SSH private key used for fencing -->
    <property>
      <name>dfs.ha.fencing.ssh.private-key-files</name>
      <value>/root/.ssh/id_rsa</value>
    </property>

5.3 Configure mapred-site.xml

    <property>
      <name>mapreduce.framework.name</name>
      <value>yarn</value>
    </property>

5.4 Configure YARN HA

Set the Java environment in yarn-env.sh:

    # some Java parameters
    export JAVA_HOME=/usr/java/jdk1.7.0_55

5.5 Configure yarn-site.xml

    <!-- Interval before retrying a lost connection to the ResourceManager, in milliseconds -->
    <property>
      <name>yarn.resourcemanager.connect.retry-interval.ms</name>
      <value>2000</value>
    </property>

    <!-- Enable ResourceManager HA; the default is false -->
    <property>
      <name>yarn.resourcemanager.ha.enabled</name>
      <value>true</value>
    </property>

    <!-- Enable automatic failover -->
    <property>
      <name>yarn.resourcemanager.ha.automatic-failover.enabled</name>
      <value>true</value>
    </property>

    <!-- Logical IDs of the ResourceManagers -->
    <property>
      <name>yarn.resourcemanager.ha.rm-ids</name>
      <value>rm1,rm2</value>
    </property>

    <!-- Set this to rm1 on master01 and to rm2 on master02 -->
    <property>
      <name>yarn.resourcemanager.ha.id</name>
      <value>rm1</value>
      <description>If we want to launch more than one RM in single node, we need this configuration</description>
    </property>

    <!-- Enable automatic recovery -->
    <property>
      <name>yarn.resourcemanager.recovery.enabled</name>
      <value>true</value>
    </property>

    <!-- ZooKeeper connection address -->
    <property>
      <name>yarn.resourcemanager.zk-state-store.address</name>
      <value>localhost:2181</value>
    </property>
    <property>
      <name>yarn.resourcemanager.store.class</name>
      <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
    </property>
    <property>
      <name>yarn.resourcemanager.zk-address</name>
      <value>localhost:2181</value>
    </property>
    <property>
      <name>yarn.resourcemanager.cluster-id</name>
      <value>yarn-cluster</value>
    </property>

    <!-- How long to wait when the connection to the scheduler is lost, in milliseconds -->
    <property>
      <name>yarn.app.mapreduce.am.scheduler.connection.wait.interval-ms</name>
      <value>5000</value>
    </property>

    <!-- Configure rm1 -->
    <property>
      <name>yarn.resourcemanager.address.rm1</name>
      <value>master01:23140</value>
    </property>
    <property>
      <name>yarn.resourcemanager.scheduler.address.rm1</name>
      <value>master01:23130</value>
    </property>
    <property>
      <name>yarn.resourcemanager.webapp.address.rm1</name>
      <value>master01:23188</value>
    </property>
    <property>
      <name>yarn.resourcemanager.resource-tracker.address.rm1</name>
      <value>master01:23125</value>
    </property>
    <property>
      <name>yarn.resourcemanager.admin.address.rm1</name>
      <value>master01:23141</value>
    </property>
    <property>
      <name>yarn.resourcemanager.ha.admin.address.rm1</name>
      <value>master01:23142</value>
    </property>

    <!-- Configure rm2 -->
    <property>
      <name>yarn.resourcemanager.address.rm2</name>
      <value>master02:23140</value>
    </property>
    <property>
      <name>yarn.resourcemanager.scheduler.address.rm2</name>
      <value>master02:23130</value>
    </property>
    <property>
      <name>yarn.resourcemanager.webapp.address.rm2</name>
      <value>master02:23188</value>
    </property>
    <property>
      <name>yarn.resourcemanager.resource-tracker.address.rm2</name>
      <value>master02:23125</value>
    </property>
    <property>
      <name>yarn.resourcemanager.admin.address.rm2</name>
      <value>master02:23141</value>
    </property>
    <property>
      <name>yarn.resourcemanager.ha.admin.address.rm2</name>
      <value>master02:23142</value>
    </property>

    <!-- Configure the NodeManager -->
    <property>
      <description>Address where the localizer IPC is.</description>
      <name>yarn.nodemanager.localizer.address</name>
      <value>0.0.0.0:23344</value>
    </property>

    <!-- NodeManager HTTP port -->
    <property>
      <description>NM Webapp address.</description>
      <name>yarn.nodemanager.webapp.address</name>
      <value>0.0.0.0:23999</value>
    </property>
    <property>
      <name>yarn.nodemanager.aux-services</name>
      <value>mapreduce_shuffle</value>
    </property>
    <!-- The key must carry the same service name as aux-services above (mapreduce_shuffle) -->
    <property>
      <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
      <value>org.apache.hadoop.mapred.ShuffleHandler</value>
    </property>
    <property>
      <name>yarn.nodemanager.local-dirs</name>
      <value>/usr/local/cloud/data/hadoop/yarn/local</value>
    </property>
    <property>
      <name>yarn.nodemanager.log-dirs</name>
      <value>/usr/local/cloud/data/logs/hadoop</value>
    </property>
    <property>
      <name>mapreduce.shuffle.port</name>
      <value>23080</value>
    </property>

    <!-- Failover proxy provider class -->
    <property>
      <name>yarn.client.failover-proxy-provider</name>
      <value>org.apache.hadoop.yarn.client.ConfiguredRMFailoverProxyProvider</value>
    </property>

6. Configure the ZooKeeper ensemble

Create a data directory and a logs directory under the zookeeper directory, then edit zoo.cfg:

    dataDir=/usr/local/cloud/zookeeper/data
    dataLogDir=/usr/local/cloud/zookeeper/logs
    # the port at which the clients will connect
    clientPort=2181
    server.1=master01:2888:3888
    server.2=master02:2888:3888
    server.3=slave01:2888:3888
    server.4=slave02:2888:3888
    server.5=slave03:2888:3888

Create a myid file in the data directory on each machine and write that machine's server number into it. With the configuration above, master01 (server.1) gets 1, master02 gets 2, and so on for the remaining machines.

On each machine, run zkServer.sh start from the bin directory under zookeeper, then run zkServer.sh status. If each node reports leader or follower, the ensemble is configured correctly.

All configuration files are now in place.

7. Start the Hadoop cluster (first start); follow this order strictly

(1) Start ZooKeeper on every node:

    zookeeper/bin/zkServer.sh start

(2) Format the HA state in ZooKeeper to create the namespace:

    hadoop/bin/hdfs zkfc -formatZK

(3) Start the JournalNodes on the nodes configured for them (master01, slave01, slave02):

    hadoop/sbin/hadoop-daemon.sh start journalnode

(4) Format the primary NameNode, then start it on that machine:

    ./bin/hadoop namenode -format zzg
    hadoop/sbin/hadoop-daemon.sh start namenode

(5) On the standby NameNode, copy over the metadata directory formatted on the primary NameNode, then start the standby:

    hadoop/bin/hdfs namenode -bootstrapStandby
    hadoop/sbin/hadoop-daemon.sh start namenode

(6) On both NameNodes, start the ZooKeeper failover controller:

    ./sbin/hadoop-daemon.sh start zkfc

(7) On every DataNode, start the DataNode:

    ./sbin/hadoop-daemon.sh start datanode

(8) On the primary NameNode, start YARN by running start-yarn.sh:

    ./sbin/start-yarn.sh

jps then shows the following.

NameNode node:

    [root@master01 ~]# jps
    38972 JournalNode
    38758 NameNode
    39166 DFSZKFailoverController
    37473 QuorumPeerMain
    39778 ResourceManager
    42620 Jps

DataNode node:

    [root@slave01 ~]# jps
    33440 DataNode
    35277 Jps
    32681 QuorumPeerMain
    33568 JournalNode
    34231 NodeManager
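Beyond jps, it can be useful to confirm that one NameNode is active and the other is standby. The following is a minimal sketch, assuming hadoop/bin is on PATH so that hdfs resolves; the function name print_namenode_states is hypothetical, and nn1 and nn2 are the IDs set in dfs.ha.namenodes.zzg.

```shell
#!/bin/sh
# Hypothetical helper: print the HA state of each NameNode ID passed in.
print_namenode_states() {
    for nn in "$@"; do
        # "hdfs haadmin -getServiceState <id>" reports active or standby;
        # any failure (node down, unknown ID) is shown as "unreachable".
        state=$(hdfs haadmin -getServiceState "$nn" 2>/dev/null) || state="unreachable"
        echo "$nn: $state"
    done
}

# Usage on a cluster node:
#   print_namenode_states nn1 nn2
```

In a healthy cluster one ID should report active and the other standby; stopping the active NameNode and running the check again is a simple way to observe automatic failover.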