Setting Up a Hadoop Pseudo-Distributed Cluster

Versions and Environment

  • Ubuntu (with OpenSSH)
  • JDK 1.8.0_231
  • Hadoop 3.1.3

Preparation

  • (Note: complete all of the following configuration before cloning the slave nodes.)
  • Install Ubuntu (remember to install OpenSSH).
  • Extract the Hadoop and JDK tarballs: tar -zxvf xxx.tar.gz
  • Move the Hadoop root directory: mv hadoop-3.1.3 /usr/local/hadoop3
  • Move the JDK root directory: mv jdk-1.8.0_231 /usr/local/jdk1.8
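  • Put together, the preparation looks roughly like this (a sketch only: the tarball names and extracted directory names are assumptions based on the versions above, so match them to your actual downloads):
# tar -zxvf hadoop-3.1.3.tar.gz
# tar -zxvf jdk-8u231-linux-x64.tar.gz
# mv hadoop-3.1.3 /usr/local/hadoop3
# mv jdk-1.8.0_231 /usr/local/jdk1.8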

Adding Environment Variables

  • Run the following and append the variables below to .bashrc:
# cd ~
# vim .bashrc
  • Java variables:
export JAVA_HOME=/usr/local/jdk1.8
export JRE_HOME=$JAVA_HOME/jre
export CLASSPATH=.:$JAVA_HOME/lib:$JRE_HOME/lib
export PATH=$JAVA_HOME/bin:$PATH
  • Hadoop variables:
export HADOOP_HOME=/usr/local/hadoop3
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib"
  • Apply the environment variables:
# source .bashrc
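  • To confirm the variables took effect in the current shell, both of these should print version information:
# java -version
# hadoop version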

Configuring Hadoop

  • Enter the configuration directory: cd /usr/local/hadoop3/etc/hadoop
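  • First edit hadoop-env.sh and set JAVA_HOME explicitly; daemons launched over ssh do not read .bashrc, so hadoop-env.sh is the reliable place for it (path follows the install location above):
export JAVA_HOME=/usr/local/jdk1.8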
  • Configure core-site.xml:
<property>
    <name>hadoop.tmp.dir</name>
    <value>/usr/local/hadoop3/tmp</value>
    <description>Base directory for temporary files</description>
</property>
<property>
    <name>fs.defaultFS</name>
    <!-- 1.x: fs.default.name -->
    <value>hdfs://master:9000</value>
    <description>HDFS NameNode URI</description>
</property>
<property>
    <name>io.file.buffer.size</name>
    <value>102400</value>
    <description>Read/write buffer size</description>
</property>
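  • Note that every XML snippet in this section goes inside the <configuration> root element of its file, e.g.:
<configuration>
    <property>...</property>
</configuration>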
  • Configure hdfs-site.xml:
<property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>slave1:50080</value>
</property>
<property>
    <name>dfs.replication</name>
    <value>1</value>
    <description>Number of block replicas</description>
</property>
<property>
    <name>dfs.namenode.name.dir</name>
    <!-- 1.x: dfs.name.dir -->
    <value>/usr/local/hadoop3/hdfs/name</value>
    <description>NameNode directory</description>
</property>
<property>
    <name>dfs.datanode.data.dir</name>
    <!-- 1.x: dfs.data.dir -->
    <value>/usr/local/hadoop3/hdfs/data</value>
    <description>DataNode directory</description>
</property>
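  • Create the tmp/name/data directories referenced above now, so each clone inherits them (HDFS can create most of these itself, but creating them up front avoids permission surprises); a minimal sketch:
# mkdir -p /usr/local/hadoop3/tmp /usr/local/hadoop3/hdfs/name /usr/local/hadoop3/hdfs/data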
  • Configure mapred-site.xml (the three HADOOP_MAPRED_HOME entries prevent the common "Could not find or load main class org.apache.hadoop.mapreduce.v2.app.MRAppMaster" failure on Hadoop 3):
<property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.address</name>
        <value>master:10020</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>master:19888</value>
    </property>
    <property>
        <name>yarn.app.mapreduce.am.env</name>
        <value>HADOOP_MAPRED_HOME=/usr/local/hadoop3</value>
    </property>
    <property>
        <name>mapreduce.map.env</name>
        <value>HADOOP_MAPRED_HOME=/usr/local/hadoop3</value>
    </property>
    <property>
        <name>mapreduce.reduce.env</name>
        <value>HADOOP_MAPRED_HOME=/usr/local/hadoop3</value>
    </property>
  • Configure yarn-site.xml (the last property disables YARN's virtual-memory check, which otherwise tends to kill containers on small VMs):
<property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
        <value>org.apache.hadoop.mapred.ShuffleHandler</value>
    </property>
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>master</value>
    </property>
    <property>
        <name>yarn.resourcemanager.address</name>
        <value>master:8032</value>
    </property>
    <property>
        <name>yarn.resourcemanager.scheduler.address</name>
        <value>master:8030</value>
    </property>
    <property>
        <name>yarn.resourcemanager.resource-tracker.address</name>
        <value>master:8031</value>
    </property>
    <property>
        <name>yarn.resourcemanager.admin.address</name>
        <value>master:8033</value>
    </property>
    <property>
        <name>yarn.resourcemanager.webapp.address</name>
        <value>master:8088</value>
    </property>
    <property>
        <name>yarn.nodemanager.vmem-check-enabled</name>
        <value>false</value>
    </property>
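  • One more file before cloning: in Hadoop 3, etc/hadoop/workers (the successor of the 2.x slaves file) tells start-all.sh where to launch DataNodes and NodeManagers. Assuming slave1 and slave2 serve as the worker nodes:
# vim /usr/local/hadoop3/etc/hadoop/workers
slave1
slave2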

Cloning the Nodes

  • With all of the above in place, this machine can serve as a template: clone it to create the remaining nodes.
  • This guide uses two slave nodes as an example.

Configuring Hostnames and IPs

  • Set the hostnames to master, slave1, and slave2 respectively:
# hostnamectl set-hostname xxx
  • If the file /etc/cloud/cloud.cfg exists, set preserve_hostname to true so cloud-init does not overwrite the hostname on reboot.
  • Assign the static IPs 192.168.127.134, 192.168.127.135, and 192.168.127.136 respectively:
# vim /etc/netplan/50-cloud-init.yaml
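  • A minimal netplan sketch for master (the interface name ens33 and the gateway/DNS address 192.168.127.2 are assumptions; adjust them to your VM network, and use .135/.136 on the slaves):
network:
  version: 2
  ethernets:
    ens33:
      dhcp4: false
      addresses: [192.168.127.134/24]
      gateway4: 192.168.127.2
      nameservers:
        addresses: [192.168.127.2]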

  • Apply the IP configuration: # netplan apply
  • On every node, add static hostname entries, for example:
# vim /etc/hosts
192.168.127.134 master
192.168.127.135 slave1
192.168.127.136 slave2
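  • A quick sanity check from master that name resolution and connectivity work:
# ping -c 1 slave1
# ping -c 1 slave2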

Setting Up Passwordless SSH Between Nodes

  • On master, slave1, and slave2, run: ssh-keygen -t rsa -P ""
  • On master, merge the public keys of all three nodes into authorized_keys:
# cd ~/.ssh
# scp -P 22 slave1:~/.ssh/id_rsa.pub id_rsa.pub1
# scp -P 22 slave2:~/.ssh/id_rsa.pub id_rsa.pub2
# cat id_rsa.pub >> authorized_keys
# cat id_rsa.pub1 >> authorized_keys
# cat id_rsa.pub2 >> authorized_keys
  • Copy the merged file to slave1 and slave2:
# scp -P 22 authorized_keys slave1:~/.ssh/
# scp -P 22 authorized_keys slave2:~/.ssh/
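  • Verify from master that every node, itself included, is now reachable without a password prompt:
# ssh master hostname
# ssh slave1 hostname
# ssh slave2 hostname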

Configuring the Startup Scripts

  • Only the master node needs this step; Hadoop 3 refuses to launch daemons as root unless these *_USER variables are defined.
  • Enter the directory holding the control scripts: cd /usr/local/hadoop3/sbin
  • Add the following at the top of start-dfs.sh and stop-dfs.sh:
HDFS_DATANODE_USER=root
HADOOP_SECURE_DN_USER=hdfs
HDFS_NAMENODE_USER=root
HDFS_SECONDARYNAMENODE_USER=root
  • Add the following at the top of start-yarn.sh and stop-yarn.sh:
YARN_RESOURCEMANAGER_USER=root
HADOOP_SECURE_DN_USER=yarn
YARN_NODEMANAGER_USER=root

Starting and Verifying
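
  • Before the very first start, format HDFS on master (run once only; reformatting later destroys the namespace metadata):
# hdfs namenode -format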

  • Start the cluster: # /usr/local/hadoop3/sbin/start-all.sh
  • List the running Java processes on each node: jps (a rough expectation is sketched after this list)
  • Browse to master:8088 (YARN) and master:9870 (HDFS) to view Hadoop's built-in web UIs.
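  • With the layout above, jps should report roughly the following on each node (PIDs omitted; SecondaryNameNode runs on slave1 per dfs.namenode.secondary.http-address):
master: NameNode, ResourceManager
slave1: DataNode, NodeManager, SecondaryNameNode
slave2: DataNode, NodeManager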

Running a Test Job

  • Enter the directory: /usr/local/hadoop3
  • Create an input directory in HDFS: # hdfs dfs -mkdir -p /data/input
  • Upload any txt file into it: # hdfs dfs -put README.txt /data/input
  • Run the MapReduce wordcount example (the output directory must not exist beforehand): # hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.3.jar wordcount /data/input /data/output/result
  • View the result: # hdfs dfs -cat /data/output/result/part-r-00000