Note: this cluster was built on CentOS 7.5 with Hadoop 3.1.1.
The cluster consists of three machines:
| Hostname | IP | Roles |
| --- | --- | --- |
| hadoop01 | 10.0.0.10 | DataNode, NodeManager, NameNode |
| hadoop02 | 10.0.0.11 | DataNode, NodeManager, ResourceManager, SecondaryNameNode |
| hadoop03 | 10.0.0.12 | DataNode, NodeManager |
[clsn@hadoop01 /home/clsn] $ cat /etc/redhat-release
CentOS Linux release 7.5.1804 (Core)
[clsn@hadoop01 /home/clsn] $ uname -r
3.10.0-862.el7.x86_64
[clsn@hadoop01 /home/clsn] $ sestatus
SELinux status:                 disabled
[clsn@hadoop01 /home/clsn] $ systemctl status firewalld.service
● firewalld.service - firewalld - dynamic firewall daemon
   Loaded: loaded (/usr/lib/systemd/system/firewalld.service; disabled; vendor preset: enabled)
   Active: inactive (dead)
     Docs: man:firewalld(1)
[clsn@hadoop01 /home/clsn] $ id clsn
uid=1000(clsn) gid=1000(clsn) groups=1000(clsn)
[clsn@hadoop01 /home/clsn] $ cat /etc/hosts
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
10.0.0.10 hadoop01
10.0.0.11 hadoop02
10.0.0.12 hadoop03
Note: every process in this cluster is started by the clsn user.
ssh-keygen
ssh-copy-id -i ~/.ssh/id_rsa.pub 127.0.0.1
scp -rp ~/.ssh hadoop02:/home/clsn
scp -rp ~/.ssh hadoop03:/home/clsn
The following must be done on all three machines.
tar xf jdk-8u191-linux-x64.tar.gz -C /usr/local/
ln -s /usr/local/jdk1.8.0_191 /usr/local/jdk
sed -i.ori '$a export JAVA_HOME=/usr/local/jdk\nexport PATH=$JAVA_HOME/bin:$JAVA_HOME/jre/bin:$PATH\nexport CLASSPATH=.:$JAVA_HOME/lib:$JAVA_HOME/jre/lib:$JAVA_HOME/lib/tools.jar' /etc/profile
. /etc/profile
wget http://mirrors.tuna.tsinghua.edu.cn/apache/hadoop/common/hadoop-3.1.1/hadoop-3.1.1.tar.gz
tar xf hadoop-3.1.1.tar.gz -C /usr/local/
ln -s /usr/local/hadoop-3.1.1 /usr/local/hadoop
sudo chown -R clsn.clsn /usr/local/hadoop-3.1.1/
All configuration files live under /usr/local/hadoop/etc/hadoop.
[clsn@hadoop01 /usr/local/hadoop/etc/hadoop] $ head hadoop-env.sh
. /etc/profile
#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements.  See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership.  The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License.  You may obtain a copy of the License at
[clsn@hadoop01 /usr/local/hadoop/etc/hadoop] $ cat core-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
    <!-- RPC address of the HDFS master (NameNode) -->
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://hadoop01:9000</value>
    </property>
    <!-- Base directory for files Hadoop generates at runtime -->
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/data/tmp</value>
    </property>
</configuration>
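The fs.defaultFS value above is the address every other component (including HBase later on) must agree with. As a quick sanity check, a small Python sketch can pull a property out of this file format; the XML string is an abbreviated copy of the core-site.xml shown above:

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

# Abbreviated copy of the core-site.xml above.
CORE_SITE = """<configuration>
  <property><name>fs.defaultFS</name><value>hdfs://hadoop01:9000</value></property>
  <property><name>hadoop.tmp.dir</name><value>/data/tmp</value></property>
</configuration>"""

def get_property(xml_text, key):
    """Return the <value> of the property named `key`, or None if absent."""
    root = ET.fromstring(xml_text)
    for prop in root.findall("property"):
        if prop.findtext("name") == key:
            return prop.findtext("value")
    return None

fs = get_property(CORE_SITE, "fs.defaultFS")
u = urlparse(fs)
print(u.hostname, u.port)  # hadoop01 9000
```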
[clsn@hadoop01 /usr/local/hadoop/etc/hadoop] $ cat hdfs-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
    <!-- HTTP address of the NameNode -->
    <property>
        <name>dfs.namenode.http-address</name>
        <value>hadoop01:50070</value>
    </property>
    <!-- HTTP address of the SecondaryNameNode -->
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>hadoop02:50090</value>
    </property>
    <!-- Directory where the NameNode stores its metadata -->
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>/data/name</value>
    </property>
    <!-- HDFS replication factor -->
    <property>
        <name>dfs.replication</name>
        <value>2</value>
    </property>
    <!-- Directory where DataNodes store blocks -->
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>/data/datanode</value>
    </property>
    <property>
        <name>dfs.permissions</name>
        <value>false</value>
    </property>
</configuration>
[clsn@hadoop01 /usr/local/hadoop/etc/hadoop] $ cat mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
    <!-- Tell the MapReduce framework to run on YARN -->
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <property>
        <name>mapreduce.application.classpath</name>
        <value>
            /usr/local/hadoop/etc/hadoop,
            /usr/local/hadoop/share/hadoop/common/*,
            /usr/local/hadoop/share/hadoop/common/lib/*,
            /usr/local/hadoop/share/hadoop/hdfs/*,
            /usr/local/hadoop/share/hadoop/hdfs/lib/*,
            /usr/local/hadoop/share/hadoop/mapreduce/*,
            /usr/local/hadoop/share/hadoop/mapreduce/lib/*,
            /usr/local/hadoop/share/hadoop/yarn/*,
            /usr/local/hadoop/share/hadoop/yarn/lib/*
        </value>
    </property>
</configuration>
[clsn@hadoop01 /usr/local/hadoop/etc/hadoop] $ cat yarn-site.xml
<?xml version="1.0"?>
<configuration>
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>hadoop02</value>
    </property>
    <property>
        <description>The http address of the RM web application.</description>
        <name>yarn.resourcemanager.webapp.address</name>
        <value>${yarn.resourcemanager.hostname}:8088</value>
    </property>
    <property>
        <description>The address of the applications manager interface in the RM.</description>
        <name>yarn.resourcemanager.address</name>
        <value>${yarn.resourcemanager.hostname}:8032</value>
    </property>
    <property>
        <description>The address of the scheduler interface.</description>
        <name>yarn.resourcemanager.scheduler.address</name>
        <value>${yarn.resourcemanager.hostname}:8030</value>
    </property>
    <property>
        <name>yarn.resourcemanager.resource-tracker.address</name>
        <value>${yarn.resourcemanager.hostname}:8031</value>
    </property>
    <property>
        <description>The address of the RM admin interface.</description>
        <name>yarn.resourcemanager.admin.address</name>
        <value>${yarn.resourcemanager.hostname}:8033</value>
    </property>
</configuration>
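yarn-site.xml leans on Hadoop's property expansion: every address is written as ${yarn.resourcemanager.hostname}:port, so moving the ResourceManager means editing a single value. A rough Python sketch of that expansion (a simplified model, not Hadoop's actual implementation):

```python
import re

# Property table mirroring the yarn-site.xml above.
props = {
    "yarn.resourcemanager.hostname": "hadoop02",
    "yarn.resourcemanager.webapp.address": "${yarn.resourcemanager.hostname}:8088",
    "yarn.resourcemanager.address": "${yarn.resourcemanager.hostname}:8032",
    "yarn.resourcemanager.scheduler.address": "${yarn.resourcemanager.hostname}:8030",
    "yarn.resourcemanager.resource-tracker.address": "${yarn.resourcemanager.hostname}:8031",
    "yarn.resourcemanager.admin.address": "${yarn.resourcemanager.hostname}:8033",
}

def resolve(key):
    """Recursively expand ${...} references against the property table."""
    return re.sub(r"\$\{([^}]+)\}", lambda m: resolve(m.group(1)), props[key])

print(resolve("yarn.resourcemanager.webapp.address"))  # hadoop02:8088
```

Changing yarn.resourcemanager.hostname to a new host would update all five resolved addresses at once.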
echo 'hadoop02' >> /usr/local/hadoop/etc/hadoop/masters
echo 'hadoop03
hadoop01' >> /usr/local/hadoop/etc/hadoop/slaves

Note: Hadoop 3.x renamed the worker-list file from slaves to workers; if the DataNodes fail to start, put the same hostnames into /usr/local/hadoop/etc/hadoop/workers.
The startup scripts all live under /usr/local/hadoop/sbin:
(1) Add the following to start-dfs.sh and stop-dfs.sh:
HDFS_DATANODE_USER=clsn
HADOOP_SECURE_DN_USER=hdfs
HDFS_NAMENODE_USER=clsn
HDFS_SECONDARYNAMENODE_USER=clsn
(2) Add the following to start-yarn.sh and stop-yarn.sh:
YARN_RESOURCEMANAGER_USER=clsn
HADOOP_SECURE_DN_USER=yarn
YARN_NODEMANAGER_USER=clsn
mkdir -p /data/tmp
mkdir -p /data/name
mkdir -p /data/datanode
chown -R clsn.clsn /data
Create these directories on every machine in the cluster, or simply copy the tree over:
for i in hadoop02 hadoop03
do
    sudo scp -rp /data $i:/
done
for i in hadoop02 hadoop03
do
    sudo scp -rp /usr/local/hadoop-3.1.1 $i:/usr/local/
done
(1) Format the NameNode before the first start
/usr/local/hadoop/bin/hdfs namenode -format
(2) Start the cluster
cd /usr/local/hadoop/sbin
./start-all.sh
(1) Use jps to check that the daemons running on each node match the plan
[clsn@hadoop01 /home/clsn] $ pssh -ih cluster "`which jps`"
[1] 11:30:31 [SUCCESS] hadoop03
7947 DataNode
8875 Jps
8383 NodeManager
[2] 11:30:31 [SUCCESS] hadoop01
20193 DataNode
20665 NodeManager
21017 NameNode
22206 Jps
[3] 11:30:31 [SUCCESS] hadoop02
8896 DataNode
9427 NodeManager
10883 Jps
9304 ResourceManager
10367 SecondaryNameNode
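Comparing the jps output against the role table by eye gets error-prone as clusters grow. A small Python sketch of the expected-versus-actual check (the role sets come from the planning table at the top of this post; parsing of real pssh output is left out):

```python
# Expected daemons per host, from the planning table at the top of this post.
expected = {
    "hadoop01": {"DataNode", "NodeManager", "NameNode"},
    "hadoop02": {"DataNode", "NodeManager", "ResourceManager", "SecondaryNameNode"},
    "hadoop03": {"DataNode", "NodeManager"},
}

def check(actual):
    """Return {host: (missing, unexpected)} for every host that deviates from the plan."""
    problems = {}
    for host, want in expected.items():
        have = actual.get(host, set())
        missing, extra = want - have, have - want
        if missing or extra:
            problems[host] = (missing, extra)
    return problems

# Roles transcribed from the jps output above (the Jps process itself is ignored).
actual = {
    "hadoop01": {"DataNode", "NodeManager", "NameNode"},
    "hadoop02": {"DataNode", "NodeManager", "ResourceManager", "SecondaryNameNode"},
    "hadoop03": {"DataNode", "NodeManager"},
}
print(check(actual))  # {} -> everything matches the plan
```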
(2) Visit http://hadoop02:8088/cluster/nodes in a browser
This is the ResourceManager web UI; the cluster's three Active Nodes should be listed there.
(3) Visit http://hadoop01:50070/dfshealth.html#tab-datanode in a browser
This is the NameNode web UI.
cd /opt/
wget http://mirrors.tuna.tsinghua.edu.cn/apache/hbase/1.4.9/hbase-1.4.9-bin.tar.gz
tar xf hbase-1.4.9-bin.tar.gz -C /usr/local/
ln -s /usr/local/hbase-1.4.9 /usr/local/hbase
# add this line
. /etc/profile
[clsn@hadoop01 /usr/local/hbase/conf] $ cat hbase-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
    <property>
        <!-- Where HBase stores its data; the port must match fs.defaultFS in Hadoop's core-site.xml -->
        <name>hbase.rootdir</name>
        <value>hdfs://hadoop01:9000/hbase/hbase_db</value>
    </property>
    <property>
        <!-- Run in fully distributed mode -->
        <name>hbase.cluster.distributed</name>
        <value>true</value>
    </property>
    <property>
        <!-- Nodes running the ZooKeeper quorum; use an odd number of them -->
        <name>hbase.zookeeper.quorum</name>
        <value>hadoop01,hadoop02,hadoop03</value>
    </property>
    <property>
        <!-- Where ZooKeeper keeps its data and logs; the directory must already exist -->
        <name>hbase.zookeeper.property.dataDir</name>
        <value>/data/hbase/zookeeper</value>
    </property>
    <property>
        <!-- Port for the HBase master web UI -->
        <name>hbase.master.info.port</name>
        <value>16610</value>
    </property>
</configuration>
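As the comment notes, the host and port in hbase.rootdir must match fs.defaultFS from core-site.xml; a mismatch here is a classic cause of HBase startup failures. A small Python sketch of that consistency check, with the two values copied from the files above:

```python
from urllib.parse import urlparse

# Values copied from core-site.xml and hbase-site.xml above.
fs_default_fs = "hdfs://hadoop01:9000"
hbase_rootdir = "hdfs://hadoop01:9000/hbase/hbase_db"

def rootdir_matches(defaultfs, rootdir):
    """True when hbase.rootdir points at the same NameNode scheme/host/port as fs.defaultFS."""
    a, b = urlparse(defaultfs), urlparse(rootdir)
    return (a.scheme, a.hostname, a.port) == (b.scheme, b.hostname, b.port)

print(rootdir_matches(fs_default_fs, hbase_rootdir))  # True
```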
Note the following property of ZooKeeper:
The ensemble as a whole stays available as long as more than half of its servers are working.
So with 2 ZooKeeper servers, losing 1 leaves only 1, which is not more than half, so the ensemble is unavailable; a 2-server ensemble tolerates 0 failures.
Likewise, with 3 servers, losing 1 leaves 2 working, which is a majority, so a 3-server ensemble tolerates 1 failure.
Listing a few more: 2->0, 3->1, 4->1, 5->2, 6->2. The pattern is that 2n and 2n-1 servers both tolerate n-1 failures, so the extra even-numbered server buys nothing; there is no reason to add it.
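The tolerance figures above follow directly from the majority rule: an ensemble of n servers stays available while more than n/2 of them are alive, so it tolerates ceil(n/2) - 1 failures. In Python:

```python
def zk_failure_tolerance(n):
    """Servers that can fail while a strict majority (more than n/2) remains alive."""
    quorum = n // 2 + 1          # smallest strict majority of n
    return n - quorum

for n in range(2, 7):
    print(n, "->", zk_failure_tolerance(n))
# 2 -> 0, 3 -> 1, 4 -> 1, 5 -> 2, 6 -> 2
```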
[clsn@hadoop01 /usr/local/hbase/conf] $ cat regionservers
hadoop01
hadoop02
hadoop03
for i in hadoop02 hadoop03
do
    sudo scp -rp /usr/local/hbase-1.4.9 $i:/usr/local/
done
[clsn@hadoop01 /usr/local/hbase/bin] $ sudo ./start-hbase.sh
hadoop03: running zookeeper, logging to /usr/local/hbase-1.4.9/bin/../logs/hbase-root-zookeeper-hadoop03.out
hadoop02: running zookeeper, logging to /usr/local/hbase-1.4.9/bin/../logs/hbase-root-zookeeper-hadoop02.out
hadoop01: running zookeeper, logging to /usr/local/hbase-1.4.9/bin/../logs/hbase-root-zookeeper-hadoop01.out
running master, logging to /usr/local/hbase-1.4.9/bin/../logs/hbase-root-master-hadoop01.out
hadoop02: running regionserver, logging to /usr/local/hbase-1.4.9/bin/../logs/hbase-root-regionserver-hadoop02.out
hadoop03: running regionserver, logging to /usr/local/hbase-1.4.9/bin/../logs/hbase-root-regionserver-hadoop03.out
hadoop01: running regionserver, logging to /usr/local/hbase-1.4.9/bin/../logs/hbase-root-regionserver-hadoop01.out
Visit http://hadoop01:16610/master-status to check HBase's status.
[clsn@hadoop01 /usr/local/hbase/bin] $ ./hbase shell      # start the HBase shell
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/hbase-1.4.9/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/hadoop-3.1.1/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
HBase Shell
Use "help" to get list of supported commands.
Use "exit" to quit this interactive shell.
Version 1.4.9, rd625b212e46d01cb17db9ac2e9e927fdb201afa1, Wed Dec  5 11:54:10 PST 2018

hbase(main):001:0> create 'clsn','cf'       # create table clsn with one column family, cf
0 row(s) in 7.8790 seconds
=> Hbase::Table - clsn
hbase(main):003:0> list                     # list all tables
TABLE
clsn
1 row(s) in 0.0860 seconds
=> ["clsn"]
hbase(main):004:0> put 'clsn','1000000000','cf:name','clsn'   # put one cell into clsn: rowkey 1000000000, column name
0 row(s) in 0.3390 seconds
hbase(main):005:0> put 'clsn','1000000000','cf:sex','male'    # same rowkey, column sex
0 row(s) in 0.0300 seconds
hbase(main):006:0> put 'clsn','1000000000','cf:age','24'      # same rowkey, column age
0 row(s) in 0.0290 seconds
hbase(main):007:0> count 'clsn'
1 row(s) in 0.2100 seconds
=> 1
hbase(main):008:0> get 'clsn','cf'
COLUMN                          CELL
0 row(s) in 0.1050 seconds
hbase(main):009:0> get 'clsn','1000000000'  # fetch the row
COLUMN                          CELL
 cf:age                         timestamp=1545710530665, value=24
 cf:name                        timestamp=1545710495871, value=clsn
 cf:sex                         timestamp=1545710509333, value=male
1 row(s) in 0.0830 seconds
hbase(main):010:0> list
TABLE
clsn
1 row(s) in 0.0240 seconds
=> ["clsn"]
hbase(main):011:0> drop clsn
NameError: undefined local variable or method `clsn' for #<Object:0x6f731759>
hbase(main):012:0> drop 'clsn'
ERROR: Table clsn is enabled. Disable it first.
Here is some help for this command:
Drop the named table. Table must first be disabled:
  hbase> drop 't1'
  hbase> drop 'ns1:t1'
hbase(main):013:0> list
TABLE
clsn
1 row(s) in 0.0330 seconds
=> ["clsn"]
hbase(main):015:0> disable 'clsn'
0 row(s) in 2.4710 seconds
hbase(main):016:0> list
TABLE
clsn
1 row(s) in 0.0210 seconds
=> ["clsn"]
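The failed drop in the session above illustrates HBase's table lifecycle: a table must be disabled before it can be dropped. A toy Python model of that rule (just the state transitions the shell enforces, not HBase code):

```python
class TableStateError(Exception):
    pass

class ToyHBase:
    """Minimal model of the enable/disable/drop rules the HBase shell enforces."""
    def __init__(self):
        self.tables = {}  # table name -> "enabled" | "disabled"

    def create(self, name):
        self.tables[name] = "enabled"   # newly created tables start enabled

    def disable(self, name):
        self.tables[name] = "disabled"

    def drop(self, name):
        if self.tables[name] == "enabled":
            # Mirrors: "ERROR: Table clsn is enabled. Disable it first."
            raise TableStateError(f"Table {name} is enabled. Disable it first.")
        del self.tables[name]

hb = ToyHBase()
hb.create("clsn")
try:
    hb.drop("clsn")          # fails, like the shell session above
except TableStateError as e:
    print(e)
hb.disable("clsn")
hb.drop("clsn")              # now succeeds
print(list(hb.tables))       # []
```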