Recently someone asked whether I could post some big-data content. No problem! Today we start with the installation environment and build our own learning setup.
Hadoop has three deployment modes, each suited to a different environment: local (standalone) mode, pseudo-distributed mode, and fully distributed mode.
This article describes how to build a fully distributed Hadoop cluster with one master node and two data nodes.
Virtual machines, physical machines, or cloud instances all work; this article uses three instances in an OpenStack private cloud for the deployment.
| Server | OS | Memory | IP | Role | JDK | Hadoop |
|---|---|---|---|---|---|---|
| node1 | Ubuntu 18.04.2 LTS | 8 GB | 10.101.18.21 | master | JDK 1.8.0_222 | hadoop-3.2.1 |
| node2 | Ubuntu 18.04.2 LTS | 8 GB | 10.101.18.8 | slave1 | JDK 1.8.0_222 | hadoop-3.2.1 |
| node3 | Ubuntu 18.04.2 LTS | 8 GB | 10.101.18.24 | slave2 | JDK 1.8.0_222 | hadoop-3.2.1 |
Since Hadoop is written in Java, a Java environment must be installed on every machine. I use JDK 1.8.0_222 here (the Sun/Oracle JDK is recommended).
Installation command:

```bash
sudo apt install openjdk-8-jdk-headless
```
Configure the Java environment variables by appending the following to the .profile file in the current user's home directory:
```bash
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
export PATH=$JAVA_HOME/bin:$PATH
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
```
Use the source command to apply it immediately:

```bash
source .profile
```
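A quick sanity check I'd suggest here (my addition, not in the original steps) is to confirm the JDK is actually on the PATH of each node:

```bash
# Should print the OpenJDK 1.8 version on all three nodes
java -version
javac -version
```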
Modify the hosts file on all three servers:
```bash
vim /etc/hosts

# Add the following entries, adjusted to your own servers' IPs
10.101.18.21 master
10.101.18.8  slave1
10.101.18.24 slave2
```
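Before going further, it may be worth verifying that the names resolve (a small check of my own, not in the original):

```bash
# Each hostname should answer from the IP configured above
ping -c 1 master
ping -c 1 slave1
ping -c 1 slave2
```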
Next, set up passwordless SSH between the nodes. Generate an RSA key pair on each node:

```bash
ssh-keygen -t rsa
```

Copy the public key to every node, including the local one:

```bash
ssh-copy-id -i ~/.ssh/id_rsa.pub master
ssh-copy-id -i ~/.ssh/id_rsa.pub slave1
ssh-copy-id -i ~/.ssh/id_rsa.pub slave2
```

Verify that each node can be reached without a password:

```bash
ssh master
ssh slave1
ssh slave2
```
We first download the Hadoop package on the master node and edit the configuration there; afterwards it can be copied to the slave nodes with only minor changes.
```bash
# Download
wget http://apache.claz.org/hadoop/common/hadoop-3.2.1/hadoop-3.2.1.tar.gz
# Extract into the /usr/local directory
sudo tar -xzvf hadoop-3.2.1.tar.gz -C /usr/local
# Fix ownership of the extracted files
sudo chown -R ubuntu:ubuntu /usr/local/hadoop-3.2.1
# Rename the directory
sudo mv /usr/local/hadoop-3.2.1 /usr/local/hadoop
```
As with the JDK environment variables, edit the .profile file in the user's home directory and add the Hadoop environment variables:
```bash
export HADOOP_HOME=/usr/local/hadoop
export PATH=$PATH:$HADOOP_HOME/bin
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
```
Run `source .profile` to apply it immediately.
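To confirm the variables took effect, a quick check (my own, not part of the original walkthrough) is:

```bash
# Should print "Hadoop 3.2.1" plus build details
hadoop version
```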
Each Hadoop component is configured through XML files, all of which live in the /usr/local/hadoop/etc/hadoop directory:
a. Edit the core-site.xml file and change it as follows:
```xml
<configuration>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>file:/usr/local/hadoop/tmp</value>
        <description>Abase for other temporary directories.</description>
    </property>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://master:9000</value>
    </property>
</configuration>
```
Parameter notes:

If the hadoop.tmp.dir parameter is not configured, the system uses the default temporary directory /tmp/hadoop-hadoop. That directory is wiped on every reboot, so the namenode would have to be reformatted each time, otherwise errors occur. fs.defaultFS is the URI (the NameNode's host and port) that clients use to reach HDFS.
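Since hadoop.tmp.dir points at a path that does not exist yet, it seems prudent to create it up front (my addition; formatting will also create it):

```bash
# Run on the master; matches the value configured in core-site.xml
mkdir -p /usr/local/hadoop/tmp
```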
b. Edit hdfs-site.xml as follows:
```xml
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>3</value>
    </property>
    <property>
        <name>dfs.name.dir</name>
        <value>/usr/local/hadoop/hdfs/name</value>
    </property>
    <property>
        <name>dfs.data.dir</name>
        <value>/usr/local/hadoop/hdfs/data</value>
    </property>
</configuration>
```
Parameter notes:

dfs.replication is the number of replicas kept for each block (with only two DataNodes in this cluster, new blocks will be reported as under-replicated; a value of 2 would match the hardware). dfs.name.dir and dfs.data.dir are the local paths where the NameNode stores its metadata and the DataNodes store block data, respectively.
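These storage directories also need to exist. A housekeeping step of my own (not in the original) is to create them on every node; the name dir is used on the master and the data dir on the slaves, and creating both everywhere is harmless:

```bash
mkdir -p /usr/local/hadoop/hdfs/name /usr/local/hadoop/hdfs/data
```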
c. Edit mapred-site.xml as follows:
```xml
<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <property>
        <name>mapreduce.application.classpath</name>
        <value>$HADOOP_HOME/share/hadoop/mapreduce/*:$HADOOP_HOME/share/hadoop/mapreduce/lib/*</value>
    </property>
</configuration>
```
d. Edit yarn-site.xml as follows:
```xml
<configuration>
    <!-- Site specific YARN configuration properties -->
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>master</value>
    </property>
    <property>
        <name>yarn.nodemanager.env-whitelist</name>
        <value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_HOME</value>
    </property>
</configuration>
```
e. Edit workers as follows:
```
slave1
slave2
```
Configure the worker nodes
Pack up the Hadoop directory configured on the master node and send it to the other two nodes:
```bash
# Pack the hadoop directory; -C keeps the archive paths relative
# so it extracts cleanly into /usr/local on the other nodes
tar -czf hadoop.tar.gz -C /usr/local hadoop
# Copy it to the other two nodes
scp hadoop.tar.gz ubuntu@slave1:~
scp hadoop.tar.gz ubuntu@slave2:~
```
On the other nodes, extract the Hadoop package into the /usr/local directory:
```bash
sudo tar -xzvf hadoop.tar.gz -C /usr/local/
```
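Because the extraction runs under sudo, the files may end up owned by root. Restoring ownership to the working user (assumed here to be ubuntu, as on the master) avoids permission problems later; this step is my addition:

```bash
sudo chown -R ubuntu:ubuntu /usr/local/hadoop
```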
Configure the Hadoop environment variables on the slave1 and slave2 nodes (again in ~/.profile, followed by `source .profile`):
```bash
export HADOOP_HOME=/usr/local/hadoop
export PATH=$PATH:$HADOOP_HOME/bin
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
```
Go into the Hadoop directory on the master node and run the following:
```bash
bin/hadoop namenode -format
```
This formats the namenode. It is done once, before starting the services for the first time, and must not be run again afterwards.
An excerpt of the log (line 5 shows that the format succeeded):
```
2019-11-11 13:34:18,960 INFO util.GSet: VM type       = 64-bit
2019-11-11 13:34:18,960 INFO util.GSet: 0.029999999329447746% max memory 1.7 GB = 544.5 KB
2019-11-11 13:34:18,961 INFO util.GSet: capacity      = 2^16 = 65536 entries
2019-11-11 13:34:18,994 INFO namenode.FSImage: Allocated new BlockPoolId: BP-2017092058-10.101.18.21-1573450458983
2019-11-11 13:34:19,010 INFO common.Storage: Storage directory /usr/local/hadoop/hdfs/name has been successfully formatted.
2019-11-11 13:34:19,051 INFO namenode.FSImageFormatProtobuf: Saving image file /usr/local/hadoop/hdfs/name/current/fsimage.ckpt_0000000000000000000 using no compression
2019-11-11 13:34:19,186 INFO namenode.FSImageFormatProtobuf: Image file /usr/local/hadoop/hdfs/name/current/fsimage.ckpt_0000000000000000000 of size 401 bytes saved in 0 seconds .
2019-11-11 13:34:19,207 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0
2019-11-11 13:34:19,214 INFO namenode.FSImage: FSImageSaver clean checkpoint: txid=0 when meet shutdown.
```
Start all services:

```bash
sbin/start-all.sh
```
Problems encountered during startup, and how to fix them:
a. Error: `master: rcmd: socket: Permission denied`

Fix: run

```bash
echo "ssh" > /etc/pdsh/rcmd_default
```
b. Error: `JAVA_HOME is not set and could not be found.`

Fix: edit hadoop-env.sh on all three nodes and add the JAVA_HOME variable below:

```bash
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
```
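One way to apply this to all three nodes at once, relying on the passwordless SSH configured earlier, is a small loop (a sketch of mine, assuming the install path used in this article):

```bash
# Append JAVA_HOME to hadoop-env.sh on every node over SSH
for h in master slave1 slave2; do
  ssh "$h" 'echo "export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64" >> /usr/local/hadoop/etc/hadoop/hadoop-env.sh'
done
```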
Running jps on the master node outputs:
```
19557 ResourceManager
19914 Jps
19291 SecondaryNameNode
18959 NameNode
```
Running jps on the slave nodes outputs:
```
18580 NodeManager
18366 DataNode
18703 Jps
```
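To double-check that both NodeManagers registered with the ResourceManager, one more check can be run from the master (my addition):

```bash
# Should list slave1 and slave2, typically in RUNNING state
yarn node -list
```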
Run the report command to inspect the HDFS cluster:

```bash
hadoop dfsadmin -report
```

The result:
```
Configured Capacity: 41258442752 (38.42 GB)
Present Capacity: 5170511872 (4.82 GB)
DFS Remaining: 5170454528 (4.82 GB)
DFS Used: 57344 (56 KB)
DFS Used%: 0.00%
Replicated Blocks:
	Under replicated blocks: 0
	Blocks with corrupt replicas: 0
	Missing blocks: 0
	Missing blocks (with replication factor 1): 0
	Low redundancy blocks with highest priority to recover: 0
	Pending deletion blocks: 0
Erasure Coded Block Groups:
	Low redundancy block groups: 0
	Block groups with corrupt internal blocks: 0
	Missing block groups: 0
	Low redundancy blocks with highest priority to recover: 0
	Pending deletion blocks: 0

-------------------------------------------------
Live datanodes (2):

Name: 10.101.18.24:9866 (slave2)
Hostname: slave2
Decommission Status : Normal
Configured Capacity: 20629221376 (19.21 GB)
DFS Used: 28672 (28 KB)
Non DFS Used: 16919797760 (15.76 GB)
DFS Remaining: 3692617728 (3.44 GB)
DFS Used%: 0.00%
DFS Remaining%: 17.90%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Mon Nov 11 15:00:27 CST 2019
Last Block Report: Mon Nov 11 14:05:48 CST 2019
Num of Blocks: 0

Name: 10.101.18.8:9866 (slave1)
Hostname: slave1
Decommission Status : Normal
Configured Capacity: 20629221376 (19.21 GB)
DFS Used: 28672 (28 KB)
Non DFS Used: 19134578688 (17.82 GB)
DFS Remaining: 1477836800 (1.38 GB)
DFS Used%: 0.00%
DFS Remaining%: 7.16%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Mon Nov 11 15:00:24 CST 2019
Last Block Report: Mon Nov 11 13:53:57 CST 2019
Num of Blocks: 0
```
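As a final smoke test that reads and writes actually flow through the cluster, one might upload a small file and read it back (my addition; the paths and file name are arbitrary):

```bash
# Create a home directory in HDFS for the current user and upload a file
hdfs dfs -mkdir -p /user/ubuntu
echo "hello hadoop" > hello.txt
hdfs dfs -put hello.txt /user/ubuntu/
# Read it back; should print "hello hadoop"
hdfs dfs -cat /user/ubuntu/hello.txt
```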
To stop all services:

```bash
sbin/stop-all.sh
```
Open http://10.101.18.21:9870 in a browser to view the HDFS NameNode web UI.
Open http://10.101.18.21:8088 in a browser to view the YARN ResourceManager web UI.
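If no browser is handy, the same endpoints can be probed from the shell (my addition):

```bash
# Both should return HTTP 200 once the daemons are up
curl -s -o /dev/null -w "%{http_code}\n" http://10.101.18.21:9870
curl -s -o /dev/null -w "%{http_code}\n" http://10.101.18.21:8088
```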