A cluster needs at least 3 nodes, i.e., at least 3 servers: 1 master and 2 slaves, connected over a LAN and able to ping each other. The table below lists the configuration of the 3 servers:
Hostname | IP            | User   | Password | Role
---------|---------------|--------|----------|---------
master   | 192.168.1.101 | hadoop | 123456   | namenode
slave1   | 192.168.1.105 | hadoop | 123456   | datanode
slave2   | 192.168.1.106 | hadoop | 123456   | datanode
For ease of maintenance, use the same username, the same password, and the same hadoop, hbase, and zookeeper directory layout on every node.
All of the above software can be downloaded from http://www.apache.org/dyn/closer.cgi.
This is simple: log in to each of the three nodes as root and create the hadoop account.
$ useradd hadoop
$ passwd hadoop    # set the password to 123456
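On CentOS the password can also be set non-interactively, which is handy when repeating this step on all three nodes (a sketch that relies on the Red Hat-specific --stdin flag of passwd):

$ echo "123456" | passwd --stdin hadoop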
Add the hosts mappings on each of the three nodes:
$ vim /etc/hosts
Add the following content:
192.168.1.101 master
192.168.1.105 slave1
192.168.1.106 slave2
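To quickly verify that the mappings and LAN connectivity work, you can run a small check like this on each of the three machines (just a sketch):

$ for h in master slave1 slave2; do ping -c 1 "$h" > /dev/null && echo "$h OK" || echo "$h unreachable"; done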
CentOS installs ssh by default; if yours does not have it, install ssh first.
The cluster depends on passwordless ssh logins: each machine must be able to ssh to itself without a password, and the master and slaves must be able to log in to each other without a password in both directions; no such requirement exists between slaves.
There are three main steps:
$ ssh-keygen -t rsa -f ~/.ssh/id_rsa
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
$ chmod 700 ~/.ssh && chmod 600 ~/.ssh/*
Test it. The first login may ask for a yes confirmation; after that you can log in directly:
$ ssh localhost
Last login: Sat Jul 18 22:57:44 2015 from localhost
For slave1 and slave2, set up passwordless self-login in the same way as above.
$ cat ~/.ssh/id_rsa.pub | ssh hadoop@slave1 'cat - >> ~/.ssh/authorized_keys'
$ cat ~/.ssh/id_rsa.pub | ssh hadoop@slave2 'cat - >> ~/.ssh/authorized_keys'
Test:
[hadoop@master ~]$ ssh hadoop@slave1
Last login: Sat Jul 18 23:25:41 2015 from master
[hadoop@master ~]$ ssh hadoop@slave2
Last login: Sat Jul 18 23:25:14 2015 from master
Run the following on slave1 and slave2, respectively:
$ cat ~/.ssh/id_rsa.pub | ssh hadoop@master 'cat - >> ~/.ssh/authorized_keys'
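If your OpenSSH installation ships the ssh-copy-id helper, it is an equivalent shortcut for the cat-over-ssh pipelines above; for example, on a slave:

$ ssh-copy-id hadoop@master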
This is a prerequisite for all three packages; I recommend installing from rpm, which is very simple. I wrote a dedicated blog post on this, 《Centos6.6 64位安裝配置JDK 8教程》; although I am now on CentOS 7, the tutorial is not affected by the OS version.
On the master node, unpack the hadoop, hbase, and zookeeper archives into /home/hadoop (the hadoop account's home directory) and rename the resulting directories to hadoop, hbase, and zookeeper. Hadoop's configuration files all live under ~/hadoop/etc/hadoop/; configure hadoop as follows.
Edit core-site.xml:

<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://master:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>file:/home/hadoop/hadoop/tmp</value>
  </property>
  <property>
    <name>io.file.buffer.size</name>
    <value>131702</value>
  </property>
</configuration>
Edit hdfs-site.xml:

<configuration>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/home/hadoop/hadoop/hdfs/name</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/home/hadoop/hadoop/hdfs/data</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
  <property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>master:9001</value>
  </property>
  <property>
    <name>dfs.webhdfs.enabled</name>
    <value>true</value>
  </property>
</configuration>
Edit mapred-site.xml:

<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>master:10020</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>master:19888</value>
  </property>
</configuration>
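Note: in a stock Hadoop 2.x tarball, mapred-site.xml usually does not exist yet; create it from the bundled template before adding the block above:

$ cd ~/hadoop/etc/hadoop
$ cp mapred-site.xml.template mapred-site.xml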
In both hadoop-env.sh and yarn-env.sh, change export JAVA_HOME=${JAVA_HOME} to export JAVA_HOME=/usr/java/default
Edit yarn-site.xml:

<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>master:8032</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>master:8030</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>master:8035</value>
  </property>
  <property>
    <name>yarn.resourcemanager.admin.address</name>
    <value>master:8033</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>master:8088</value>
  </property>
  <property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>768</value>
  </property>
</configuration>
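Once the XML files are in place, you can sanity-check that Hadoop actually picks up a value, for example with the hdfs getconf subcommand:

$ ~/hadoop/bin/hdfs getconf -confKey fs.defaultFS
hdfs://master:9000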
vi masters, and add the following content:
master
vi slaves, and add the following content:
slave1
slave2
Use scp to copy the configured hadoop directory from the local machine to the remote ones:
scp -r /home/hadoop/hadoop slave1:/home/hadoop
scp -r /home/hadoop/hadoop slave2:/home/hadoop
On master, go into the ~/hadoop directory and run:
$ bin/hadoop namenode -format
This formats the namenode. It is a one-time operation performed before starting the services for the first time; it should not be run again afterwards.
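In Hadoop 2 the hadoop namenode form is deprecated (it still works but prints a warning); the current equivalent is:

$ bin/hdfs namenode -format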
On the master node, run the following command to start the hadoop cluster:
[hadoop@master ~]$ ~/hadoop/sbin/start-dfs.sh
Starting namenodes on [master]
master: starting namenode, logging to /home/hadoop/hadoop/logs/hadoop-hadoop-namenode-master.out
slave2: starting datanode, logging to /home/hadoop/hadoop/logs/hadoop-hadoop-datanode-slave2.out
slave1: starting datanode, logging to /home/hadoop/hadoop/logs/hadoop-hadoop-datanode-slave1.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /home/hadoop/hadoop/logs/hadoop-hadoop-secondarynamenode-master.out
[hadoop@master ~]$
On the master node, run the following command and check whether the processes below are present:
[hadoop@master ~]$ jps
2598 SecondaryNameNode
2714 Jps
2395 NameNode
[hadoop@master ~]$
On the two slaves, run the following command and check whether the processes below are present:
[hadoop@slave1 ~]$ jps
2394 Jps
2317 DataNode
[hadoop@slave1 ~]$
#############
[hadoop@slave2 ~]$ jps
2396 Jps
2319 DataNode
[hadoop@slave2 ~]$
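Besides jps, you can confirm from master that both datanodes have registered with the namenode; the report should list two live datanodes:

[hadoop@master ~]$ ~/hadoop/bin/hdfs dfsadmin -report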
On the master node, run the following command to start YARN:
[hadoop@master ~]$ ~/hadoop/sbin/start-yarn.sh
starting yarn daemons
starting resourcemanager, logging to /home/hadoop/hadoop/logs/yarn-hadoop-resourcemanager-master.out
slave2: starting nodemanager, logging to /home/hadoop/hadoop/logs/yarn-hadoop-nodemanager-slave2.out
slave1: starting nodemanager, logging to /home/hadoop/hadoop/logs/yarn-hadoop-nodemanager-slave1.out
[hadoop@master ~]$ jps
Run jps; if the ResourceManager process appears, YARN has started successfully.
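You can also ask YARN directly whether both nodemanagers have registered; slave1 and slave2 should show up as RUNNING:

[hadoop@master ~]$ ~/hadoop/bin/yarn node -list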
Finally, to verify that the cluster can run computations, execute one of the examples that ships with Hadoop:
~/hadoop/bin/hadoop jar ~/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar randomwriter out
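When the job finishes, randomwriter leaves its output in the out directory on HDFS (relative paths resolve under /user/hadoop); list it to confirm:

$ ~/hadoop/bin/hdfs dfs -ls out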
Open the following in a browser:
http://192.168.1.101:8088/
http://192.168.1.101:50070/
Check that they load, to get an overview of the cluster.
Unpack the zookeeper archive, rename the directory to zookeeper, and then do the following.
Go into the ~/zookeeper/conf directory and copy zoo_sample.cfg to zoo.cfg:
$ cp zoo_sample.cfg zoo.cfg
Edit zoo.cfg with the following content:
dataDir=/home/hadoop/zookeeper/data
server.1=master:2888:3888
server.2=slave1:2888:3888
server.3=slave2:2888:3888
Create a myid file in the dataDir directory containing a single number (1 on master, 2 on slave1, 3 on slave2). For example, on the master host:
$ mkdir /home/hadoop/zookeeper/data
$ echo "1" > /home/hadoop/zookeeper/data/myid
Likewise, you can use scp to copy the directory to the other nodes; just remember to change the number in each node's myid file.
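For example, run from master (a sketch that assumes the data directory created above is copied along with the tree):

$ scp -r ~/zookeeper slave1:/home/hadoop
$ ssh slave1 'echo "2" > ~/zookeeper/data/myid'
$ scp -r ~/zookeeper slave2:/home/hadoop
$ ssh slave2 'echo "3" > ~/zookeeper/data/myid'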
On every node of the ZooKeeper cluster, run the script that starts the ZooKeeper service:
$ ~/zookeeper/bin/zkServer.sh start
To emphasize: ZooKeeper must be started on all three machines; starting it only on master will not work.
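Once all three are up, verify each node's role; one node should report Mode: leader and the other two Mode: follower:

$ ~/zookeeper/bin/zkServer.sh status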
Unpack the hbase archive, rename the directory to hbase, and then configure it as follows.
Edit hbase-env.sh:

export JAVA_HOME=/usr/java/default
export HBASE_CLASSPATH=/home/hadoop/hadoop/etc/hadoop/
export HBASE_MANAGES_ZK=false
Edit hbase-site.xml:

<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://master:9000/hbase</value>
  </property>
  <property>
    <name>hbase.master</name>
    <value>master</value>
  </property>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.clientPort</name>
    <value>2181</value>
  </property>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>master,slave1,slave2</value>
  </property>
  <property>
    <name>zookeeper.session.timeout</name>
    <value>60000000</value>
  </property>
  <property>
    <name>dfs.support.append</name>
    <value>true</value>
  </property>
</configuration>
Add the slave list to the regionservers file:
slave1
slave2
Remote-copy the entire hbase installation directory to every slave server:
$ scp -r /home/hadoop/hbase slave1:/home/hadoop
$ scp -r /home/hadoop/hbase slave2:/home/hadoop
On the master node, run the following command:
[hadoop@master ~]$ ~/hbase/bin/start-hbase.sh
starting master, logging to /home/hadoop/hbase/bin/../logs/hbase-hadoop-master-master.out
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option PermSize=128m; support was removed in 8.0
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=128m; support was removed in 8.0
slave1: starting regionserver, logging to /home/hadoop/hbase/bin/../logs/hbase-hadoop-regionserver-slave1.out
slave2: starting regionserver, logging to /home/hadoop/hbase/bin/../logs/hbase-hadoop-regionserver-slave2.out
slave1: Java HotSpot(TM) 64-Bit Server VM warning: ignoring option PermSize=128m; support was removed in 8.0
slave1: Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=128m; support was removed in 8.0
slave2: Java HotSpot(TM) 64-Bit Server VM warning: ignoring option PermSize=128m; support was removed in 8.0
slave2: Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=128m; support was removed in 8.0
[hadoop@master ~]$
On the master node, check the processes with jps:
[hadoop@master ~]$ jps
3586 Jps
2408 NameNode
2808 ResourceManager
3387 HMaster
2607 SecondaryNameNode
3231 QuorumPeerMain
[hadoop@master ~]$
On the slave nodes, check the processes with jps:
[hadoop@slave1 ~]$ jps
2736 HRegionServer
2952 Jps
2313 DataNode
2621 QuorumPeerMain
[hadoop@slave1 ~]$
If HMaster and HRegionServer are present, HBase has started successfully.
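You can also check the HBase Master web UI in a browser; in HBase 1.x it listens on port 16010 by default:

http://192.168.1.101:16010/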
8.6. Using the shell to operate HBase
[hadoop@master ~]$ ~/hbase/bin/hbase shell
2016-10-27 11:52:47,394 WARN  [main] util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
HBase Shell; enter 'help<RETURN>' for list of supported commands.
Type "exit<RETURN>" to leave the HBase Shell
Version 1.2.3, rbd63744624a26dc3350137b564fe746df7a721a4, Mon Aug 29 15:13:42 PDT 2016

hbase(main):001:0> list
TABLE
member
1 row(s) in 0.4470 seconds

=> ["member"]
hbase(main):002:0> version
1.2.3, rbd63744624a26dc3350137b564fe746df7a721a4, Mon Aug 29 15:13:42 PDT 2016

hbase(main):004:0> status
1 active master, 0 backup masters, 2 servers, 0 dead, 1.5000 average load

hbase(main):005:0> exit
If these commands run without problems, HBase has started successfully. The member table in the output is one I created later; you can create one yourself with the create command.
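For example, a minimal table with a single column family (the names here are just for illustration):

hbase(main):001:0> create 'member', 'info'
hbase(main):002:0> put 'member', 'row1', 'info:name', 'hadoop'
hbase(main):003:0> scan 'member'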
Incidentally, if you try the old start-all.sh script instead of the two commands above, Hadoop 2 warns you:

This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
Hadoop 2 manages HDFS and YARN separately. This separation makes it easier to run HDFS with HA or Federation and to scale HDFS out linearly, ensuring high availability of the HDFS cluster. Looked at from another angle, HDFS can then serve as a general-purpose distributed storage system that third-party distributed computing frameworks can build on: YARN-style frameworks, or others such as Spark.
YARN is MapReduce V2. It splits the JobTracker of Hadoop 1.x into two parts: one responsible for resource management (the ResourceManager) and one responsible for task scheduling (the Scheduler).
https://www.iwwenbo.com/hadoop-hbase-zookeeper/ (special thanks to this author; I completed my installation by following his tutorial, but the hadoop, zookeeper, and hbase versions he used are quite old, so in this tutorial I added the YARN configuration and my own understanding)
http://blog.csdn.net/shirdrn/article/details/9731423
http://blog.csdn.net/renfengjun/article/details/25320043
http://blog.csdn.net/young_kim1/article/details/50324345