Reference: http://hadoop.apache.org/docs...html
Hadoop is a distributed system infrastructure developed by the Apache Foundation, designed to solve the storage and analytical computation of massive data sets.
Advantages of Hadoop: high reliability, high scalability, high efficiency, and high fault tolerance.
Hadoop 2.x and 3.x are composed of HDFS (data storage), YARN (resource scheduling), MapReduce (computation), and Common (shared utilities).
Note: in the 1.x line there was no YARN; MapReduce handled both computation and resource scheduling.
| IP / Hostname | OS | Hardware | Roles |
| --- | --- | --- | --- |
| 192.168.122.10 / hadoop10 | CentOS 7.5 | 1 core / 4 GB RAM / 50 GB disk | NameNode, DataNode, NodeManager |
| 192.168.122.11 / hadoop11 | CentOS 7.5 | 1 core / 4 GB RAM / 50 GB disk | ResourceManager, DataNode, NodeManager |
| 192.168.122.12 / hadoop12 | CentOS 7.5 | 1 core / 4 GB RAM / 50 GB disk | SecondaryNameNode, DataNode, NodeManager |
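For these hostnames to resolve, every node needs the IP-to-hostname mappings. The post does not show this step, so the entries below are my assumption, derived from the table above:

```bash
# Append the cluster mappings to /etc/hosts on all three machines
# (assumed step; addresses taken from the planning table above)
sudo tee -a /etc/hosts <<'EOF'
192.168.122.10 hadoop10
192.168.122.11 hadoop11
192.168.122.12 hadoop12
EOF
```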
```bash
yum install -y epel-release
yum update
```
```bash
[v2admin@hadoop10 ~]$ ssh-keygen -t rsa
# Press Enter at each prompt to generate the key pair id_rsa and id_rsa.pub.
# My user is v2admin; all subsequent operations use this user.
[v2admin@hadoop10 ~]$ ssh-copy-id hadoop10
[v2admin@hadoop10 ~]$ ssh-copy-id hadoop11
[v2admin@hadoop10 ~]$ ssh-copy-id hadoop12
# Repeat the same steps on hadoop11 and hadoop12.
```
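A quick way to confirm passwordless login works before continuing (my addition, not in the original post):

```bash
# Each command should print the remote hostname without prompting for a password
for host in hadoop10 hadoop11 hadoop12; do
    ssh "$host" hostname
done
```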
```bash
# My own machine runs Ubuntu 18.04, so I upload directly with scp.
# On Windows, you can install lrzsz or upload to the VMs via FTP instead.
scp jdk-8u212-linux-x64.tar.gz hadoop-3.1.3.tar.gz v2admin@192.168.122.10:/home/v2admin
scp jdk-8u212-linux-x64.tar.gz hadoop-3.1.3.tar.gz v2admin@192.168.122.11:/home/v2admin
scp jdk-8u212-linux-x64.tar.gz hadoop-3.1.3.tar.gz v2admin@192.168.122.12:/home/v2admin
```
```bash
[v2admin@hadoop10 ~]$ tar zxvf jdk-8u212-linux-x64.tar.gz
[v2admin@hadoop10 ~]$ sudo mv jdk1.8.0_212/ /usr/local/jdk8
```
```bash
[v2admin@hadoop10 ~]$ sudo tar zxvf hadoop-3.1.3.tar.gz -C /opt
# Change the owner and group to the current user
[v2admin@hadoop10 ~]$ sudo chown -R v2admin:v2admin /opt/hadoop-3.1.3
```
```bash
[v2admin@hadoop10 ~]$ sudo vim /etc/profile
# Append at the end of the file:
# set jdk hadoop env
export JAVA_HOME=/usr/local/jdk8
export JRE_HOME=${JAVA_HOME}/jre
export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib
export HADOOP_HOME=/opt/hadoop-3.1.3
export PATH=${PATH}:${JAVA_HOME}/bin:${HADOOP_HOME}/bin:${HADOOP_HOME}/sbin

[v2admin@hadoop10 ~]$ source /etc/profile
[v2admin@hadoop10 ~]$ java -version    # verify the JDK
java version "1.8.0_212"
Java(TM) SE Runtime Environment (build 1.8.0_212-b10)
Java HotSpot(TM) 64-Bit Server VM (build 25.212-b10, mixed mode)
[v2admin@hadoop10 ~]$ hadoop version   # verify Hadoop
Hadoop 3.1.3
Source code repository https://gitbox.apache.org/repos/asf/hadoop.git -r ba631c436b806728f8ec2f54ab1e289526c90579
Compiled by ztang on 2019-09-12T02:47Z
Compiled with protoc 2.5.0
From source with checksum ec785077c385118ac91aadde5ec9799
This command was run using /opt/hadoop-3.1.3/share/hadoop/common/hadoop-common-3.1.3.jar
```
Since the configuration files on all three VMs are identical, without a sync script you would have to configure each machine one by one, which is tedious.
The script file is named xrsync.sh.
Grant it execute permission and put it in a bin directory on your PATH so it can be called directly, like any other shell command (see the commands after the script).
```bash
#!/bin/bash
if [ $# -lt 1 ]
then
    echo "Missing required arguments"
    exit 1
fi
# Iterate over the cluster hosts
for host in hadoop10 hadoop11 hadoop12
do
    for file in "$@"
    do
        if [ -e "$file" ]
        then
            # Resolve the parent directory (following symlinks)
            pdir=$(cd -P "$(dirname "$file")"; pwd)
            # Get the bare file name
            filename=$(basename "$file")
            ssh "$host" "mkdir -p $pdir"
            rsync -av "$pdir/$filename" "$host:$pdir"
        else
            echo "$file does not exist!"
        fi
    done
done
```
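One way to install it, matching the post's suggestion of a bin directory (/bin here, though any directory on PATH works):

```bash
# Make the script executable and put it on the PATH
[v2admin@hadoop10 ~]$ chmod +x xrsync.sh
[v2admin@hadoop10 ~]$ sudo mv xrsync.sh /bin/
```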
```bash
[v2admin@hadoop10 ~]$ cd /opt/hadoop-3.1.3/etc/hadoop
[v2admin@hadoop10 ~]$ vim hadoop-env.sh
# Set JAVA_HOME explicitly inside hadoop-env.sh:
export JAVA_HOME=/usr/local/jdk8
# Sync the updated config file to the other two hosts
[v2admin@hadoop10 ~]$ xrsync.sh hadoop-env.sh
```
core-site.xml
```xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
    <!-- NameNode address -->
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://hadoop10:9820</value>
    </property>
    <!-- Base directory for Hadoop data -->
    <property>
        <name>hadoop.data.dir</name>
        <value>/opt/hadoop-3.1.3/data</value>
    </property>
    <!-- Allow v2admin to act as a proxy user from any host and group -->
    <property>
        <name>hadoop.proxyuser.v2admin.hosts</name>
        <value>*</value>
    </property>
    <property>
        <name>hadoop.proxyuser.v2admin.groups</name>
        <value>*</value>
    </property>
    <!-- Static user for the web UI -->
    <property>
        <name>hadoop.http.staticuser.user</name>
        <value>v2admin</value>
    </property>
</configuration>
```
hdfs-site.xml
```xml
<configuration>
    <!-- NameNode data directory -->
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>file://${hadoop.data.dir}/name</value>
    </property>
    <!-- DataNode data directory -->
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>file://${hadoop.data.dir}/data</value>
    </property>
    <!-- SecondaryNameNode data directory -->
    <property>
        <name>dfs.namenode.checkpoint.dir</name>
        <value>file://${hadoop.data.dir}/namesecondary</value>
    </property>
    <property>
        <name>dfs.client.datanode-restart.timeout</name>
        <value>30</value>
    </property>
    <!-- NameNode web UI address -->
    <property>
        <name>dfs.namenode.http-address</name>
        <value>hadoop10:9870</value>
    </property>
</configuration>
```
yarn-site.xml
```xml
<configuration>
    <!-- Site specific YARN configuration properties -->
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>hadoop11</value>
    </property>
    <property>
        <name>yarn.nodemanager.env-whitelist</name>
        <value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME</value>
    </property>
    <!-- Enable log aggregation -->
    <property>
        <name>yarn.log-aggregation-enable</name>
        <value>true</value>
    </property>
    <property>
        <name>yarn.log.server.url</name>
        <value>http://hadoop10:19888/jobhistory/logs</value>
    </property>
    <!-- Retain aggregated logs for 7 days -->
    <property>
        <name>yarn.log-aggregation.retain-seconds</name>
        <value>604800</value>
    </property>
</configuration>
```
mapred-site.xml
```xml
<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <!-- JobHistory server address -->
    <property>
        <name>mapreduce.jobhistory.address</name>
        <value>hadoop10:10020</value>
    </property>
    <!-- JobHistory server web UI address -->
    <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>hadoop10:19888</value>
    </property>
</configuration>
```
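The post does not show the remaining first-run steps. Assuming a fresh cluster, the usual sequence is to sync the config directory, list the DataNode hosts in the workers file (Hadoop 3.x uses workers instead of the old slaves file), and format the NameNode once; a sketch:

```bash
# Sync the finished config directory to the other two hosts (assumed step)
[v2admin@hadoop10 ~]$ xrsync.sh /opt/hadoop-3.1.3/etc/hadoop

# workers lists every host that should run a DataNode/NodeManager
[v2admin@hadoop10 ~]$ cat /opt/hadoop-3.1.3/etc/hadoop/workers
hadoop10
hadoop11
hadoop12

# Format the NameNode once, on hadoop10 only; never re-run on a cluster holding data
[v2admin@hadoop10 ~]$ hdfs namenode -format
```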
Starting the cluster normally means running the start commands on each server by hand. To make startup easier and to view status information, write a launch script, startMyCluster.sh:
```bash
#!/bin/bash
if [ $# -lt 1 ]
then
    echo "Not enough arguments Input !!!"
    exit
fi
case $1 in
# Start the cluster
"start")
    echo "==========start hdfs============="
    ssh hadoop10 /opt/hadoop-3.1.3/sbin/start-dfs.sh
    echo "==========start historyServer============"
    ssh hadoop10 /opt/hadoop-3.1.3/bin/mapred --daemon start historyserver
    echo "==========start yarn============"
    ssh hadoop11 /opt/hadoop-3.1.3/sbin/start-yarn.sh
    ;;
# Stop the cluster
"stop")
    echo "==========stop hdfs============="
    ssh hadoop10 /opt/hadoop-3.1.3/sbin/stop-dfs.sh
    echo "==========stop yarn============"
    ssh hadoop11 /opt/hadoop-3.1.3/sbin/stop-yarn.sh
    echo "==========stop historyserver===="
    ssh hadoop10 /opt/hadoop-3.1.3/bin/mapred --daemon stop historyserver
    ;;
# Show running Java processes on every node
"jps")
    for i in hadoop10 hadoop11 hadoop12
    do
        echo "==============$i jps================"
        ssh $i /usr/local/jdk8/bin/jps
    done
    ;;
*)
    echo "Input Args Error!!!"
    ;;
esac
```
As before, place it in /bin so it can be invoked directly.
```bash
[v2admin@hadoop10 ~]$ startMyCluster.sh start   # start the cluster
==========start hdfs=============
Starting namenodes on [hadoop10]
Starting datanodes
Starting secondary namenodes [hadoop12]
==========start historyServer============
==========start yarn============
Starting resourcemanager
Starting nodemanagers
[v2admin@hadoop10 ~]$ startMyCluster.sh jps     # check the processes
==============hadoop10 jps================
1831 NameNode
2504 Jps
2265 JobHistoryServer
1980 DataNode
2382 NodeManager
==============hadoop11 jps================
1635 DataNode
1814 ResourceManager
2297 Jps
1949 NodeManager
==============hadoop12 jps================
1795 NodeManager
1590 DataNode
1927 Jps
1706 SecondaryNameNode
```
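Beyond jps, the web UIs give a quick sanity check. This is my addition, using the ports configured above plus YARN's default ResourceManager port:

```bash
# NameNode UI (dfs.namenode.http-address); expect HTTP 200
curl -s -o /dev/null -w "%{http_code}\n" http://hadoop10:9870
# ResourceManager UI (default port 8088 on yarn.resourcemanager.hostname)
curl -s -o /dev/null -w "%{http_code}\n" http://hadoop11:8088
# JobHistory UI (mapreduce.jobhistory.webapp.address)
curl -s -o /dev/null -w "%{http_code}\n" http://hadoop10:19888
```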
After deployment, you may hit NoClassDefFoundError: javax/activation/DataSource when starting up.
I never saw this on my earlier 2.x installs, but it appeared with this 3.x install; the cause is that YARN's lib directory is missing the required jar.
Solution:
```bash
cd /opt/hadoop-3.1.3/share/hadoop/yarn/lib
wget https://repo1.maven.org/maven2/javax/activation/activation/1.1.1/activation-1.1.1.jar
```
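Since every node runs YARN daemons, the jar must be present on hadoop11 and hadoop12 as well. A sketch of the remaining steps, assuming the xrsync.sh and startMyCluster.sh scripts from earlier:

```bash
# Push the jar to the same path on the other hosts
xrsync.sh /opt/hadoop-3.1.3/share/hadoop/yarn/lib/activation-1.1.1.jar
# Restart so the YARN daemons pick it up
startMyCluster.sh stop
startMyCluster.sh start
```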