Since this is a fairly long post, let's skip the chatter and get straight to the point.

A Hadoop installation can be done in one of three modes: standalone, pseudo-distributed, or fully distributed.

This post covers the fully distributed mode, on CentOS 6.5 with hadoop-2.6.5.
First, check the current hostname and edit /etc/sysconfig/network:

[root@localhost ~]# hostname
localhost.localdomain
[root@localhost ~]# vi /etc/sysconfig/network
[root@localhost ~]# hostname
localhost.localdomain
Change the file to:
NETWORKING=yes
HOSTNAME=hadoop10
Because this change only takes effect after a reboot, hostname still shows the old value. I don't want to reboot here, so I use the temporary-change command directly:
[root@localhost ~]# hostname hodoop10
[root@localhost ~]# hostname
hodoop10
The temporary setting, in turn, is lost on reboot, which is why we also edited the file above.
Change the hostname on each of the 4 machines in turn.
Next, /etc/hosts. This file has nothing to do with the hostname change above; it must be present on every node in the cluster so that each node knows which hostname corresponds to which IP. It effectively acts as a local DNS.

Edit it with vi and add the following:
192.168.10.10 hadoop10
192.168.10.11 hadoop11
192.168.10.12 hadoop12
192.168.10.13 hadoop13
Make the same edit to /etc/hosts on each of the 4 machines.
Next, disable the firewall on every node so the daemons can reach each other:

[root@hadoop11 ~]# chkconfig iptables off
[root@localhost ~]# chkconfig --list iptables
iptables        0:off   1:off   2:off   3:off   4:off   5:off   6:off
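Note that chkconfig only changes what happens at boot. If iptables is currently running, you can also stop it right away with the standard CentOS 6 service command:

service iptables stop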
Next, passwordless SSH. First check that ssh itself works:

[root@localhost ~]# ssh localhost
The authenticity of host 'localhost (::1)' can't be established.
RSA key fingerprint is d1:40:d3:50:c8:2d:af:d4:a0:d4:cb:9f:6d:8d:ed:2f.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'localhost' (RSA) to the list of known hosts.
root@localhost's password:
Last login: Tue Sep 17 01:11:07 2019 from 192.168.10.1
Output like the above means ssh is already installed. If it isn't, install it with:
yum install openssh-server -y
[root@hodoop10 ~]# cd ~/.ssh
[root@hodoop10 .ssh]# ls
known_hosts
Initially the directory contains only this one file; known_hosts records the public keys of the machines ssh has connected to.
[root@hodoop10 .ssh]# ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is:
df:db:71:2b:a7:59:96:95:88:cd:0d:7e:25:85:f1:0d root@hodoop10
The key's randomart image is:
+--[ RSA 2048]----+
|              Eo.|
|              .+.|
|             .. +|
|            = +.o|
|         S . = +.|
|          . . . o|
|          . . .+.|
|             +++.|
|            .o=. |
+-----------------+
Just press Enter at every prompt; nothing else is needed.

Now ls the directory again and you can see the public key and private key:
[root@hodoop10 .ssh]# ls
id_rsa  id_rsa.pub  known_hosts
Generate a key pair on each of the 4 machines in turn.

First, append this machine's own public key to the authorized_keys file:
[root@hodoop10 .ssh]# cat id_rsa.pub >> authorized_keys
[root@hodoop10 .ssh]# ls
authorized_keys  id_rsa  id_rsa.pub  known_hosts
Then send authorized_keys to every other node:
[root@hodoop10 .ssh]# scp authorized_keys root@hadoop12:~/.ssh
The authenticity of host 'hadoop12 (192.168.10.12)' can't be established.
RSA key fingerprint is 43:68:54:4e:85:ed:ac:30:7c:b2:a1:48:02:b9:67:57.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'hadoop12,192.168.10.12' (RSA) to the list of known hosts.
root@hadoop12's password:
authorized_keys
You may also need to change the permissions of authorized_keys to 644.
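If logins still ask for a password, it is usually a permissions problem: sshd ignores key files it considers too open. A minimal fix, assuming the default file locations:

chmod 700 ~/.ssh
chmod 644 ~/.ssh/authorized_keys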
At this point you can test passwordless login from this node to the others; no password should be required:
[root@hodoop10 .ssh]# ssh hadoop12
Last login: Tue Sep 17 01:49:50 2019 from localhost
[root@hadoop12 ~]# exit
logout
Connection to hadoop12 closed.
We logged in successfully and logged back out.

Repeat the steps above on each of the 4 machines.

In the end, authorized_keys contains the public keys of all 4 nodes:
ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEAuFkD0t6HZM/H7pyqjqBnrnF+4wr2gI8p4wjCDdN8smAH8ujLviUAK0rE1Gh8bcXtWSjLmFLOf1oQwrCvtWnP4q9+enFwgqFFLEkQvT5jRbKrJImYWpafGimOlO5hb1jPZKrxpRZlMy9LFzLnfr5aJ+fESE2sSrTwlXbfXm0w1xhBKzoo5JZq8xIvzYXYQ8qyaTRFd2+EZbZKJ0CgVw83hKjiq9bjrbqtEg2oo8FdQwi4SNZ6d4jozhw54J8nCk8YduVneYoFSf1gmdwUcMb2iyGUfMRrhK3k0vUxBZKsfrG9aS4P4Gzd/CVGtMlqEWVldyTS9vmORHNAHEFqdyVI/w== root@hodoop10
ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEA7pZA4t2E00jJtotZeFST+HWXrAtzfjGFBvDkpnqwoYs1cEjsr8Ez2XjWbcdGBqbEFNohTWUh0dpfQHyWcT2fun10aRJ9GyYuebzSJm5BWT06PKWB5QavqNtdmqNTSzEfNXGjyvaV8PbfFA8kfIeaiq0/uTwTrtjcLHmN9ENm1NjJqibZxNSNJnQGXJs7Gj6ujIXrVmr//G9OqS97ZM5slgHw68F7azvpCfzHBsJu3QTZYL96WRUSRXHH8GteRMtBYVlRzg7N1gU+YKx4fMXjEk7xu/p8ub5IG5kClCIU+mR+Z0VNReGVP3n4GZuE/Fa3OMerESUs6i/GWczNbA2cSQ== root@hadoop11
ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEAqL/aQhVUd4B7VsfnzOFEXFQJX/rV1obelijX6M/eVns2IlpxB54UUgYoAet97Xew5vc31tAAbURW8zS4CAJujKWKFnAB/R2UIzLww6CxahsTqrsPkj89SiLl3Q4SsBDC49hULfbd5AxuEdq/v0XIFT2jsbpaUtWQ2pF5HxzkhpnrpEbcwHjc14GfM1cFtyPcR3XXZC4P+scaLGgdn8I3So0k6ENqo7LfQ7y2/FNQMXtKxObfO0j7bESsNWQxPGwolXdVeBO4VEYIrYH/6/gPdOxtNGe2gCnr8MM8z7eElLXy1cF5wTddv6vCdBv9bl5H3/BHtUrJ+/5/XjkkyRVECw== root@hadoop12
ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEAvOn53kK/2uoDBKKq/0LQhJ63S34K6lnksgAGJYWTugx57TxroRvms2DkdrV3EKhlIzVkpE3Xzrx4hyOFHXfnfAdsrvj22zgsPx4cNxM0Tmx6ELwCpcLPF381lDjEc5/7MEqQB+wV07tjAZAXOl5wETLLO269iHvbX3oEZ3Q62xq52BLoKCkBunk5C0lVDHAhKtzBp1XTntixircUIxpNWWduhoUwiaTrUrki8gEyC2O/Hm9Wq6h2RyC7SvH8jaAZoC9UUso50TitD10J5bhdeg8iYnhb/wUJZ5zhkwSJuj8H4j8huCo5j/eX7sPXe/3eKnVlpEz/PX0/8eAQYJY6SQ== root@hadoop13
The end result is that every machine can ssh to every other machine without a password.

There are many ways to get here; feel free to look up alternatives.
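One common alternative is ssh-copy-id, which ships with openssh-clients and appends your public key to the remote authorized_keys for you. A sketch (hostnames as in our /etc/hosts; run on each node after ssh-keygen):

for host in hadoop10 hadoop11 hadoop12 hadoop13; do
    ssh-copy-id root@$host    # asks for the password once; logins are key-based afterwards
done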
Next, Java. First check whether it is already installed:

yum list installed | grep java

List the versions available through yum, then install:
yum -y list java*
yum -y install java-1.8.0-openjdk*
Check the version:
[root@node .ssh]# java -version
openjdk version "1.8.0_181"
Now download hadoop. Take care not to download the tar package whose name contains src; that is the source release, not the binaries, and it will trip you up.

Just extract it.

Then set the environment variables and test whether the installation works:
[root@hadoop10 lib]# vi /etc/profile
[root@hadoop10 lib]# source /etc/profile
[root@hadoop10 lib]# hadoop
Usage: hadoop [--config confdir] COMMAND
       where COMMAND is one of:
  fs                   run a generic filesystem user client
  version              print the version
  jar <jar>            run a jar file
  checknative [-a|-h]  check native hadoop and compression libraries availability
  distcp <srcurl> <desturl> copy file or directories recursively
  archive -archiveName NAME -p <parent path> <src>* <dest> create a hadoop archive
  classpath            prints the class path needed to get the
  credential           interact with credential providers
                       Hadoop jar and the required libraries
  daemonlog            get/set the log level for each daemon
  trace                view and modify Hadoop tracing settings
 or
  CLASSNAME            run the class named CLASSNAME

Most commands print help when invoked w/o parameters.
This output means the installation succeeded.

The environment variables are set as follows:
export HADOOP_HOME=/usr/lib/hadoop-2.6.5
export PATH=.:$HADOOP_HOME/bin:$PATH
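Optionally you can put the sbin directory on the PATH as well, so the start/stop scripts can be run from anywhere; this guide always invokes them as sbin/... from the Hadoop root, so this is purely a convenience:

export PATH=.:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH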
Install Hadoop on each of the 4 machines in turn.

Note: doing only this step (step 7) gives you the standalone installation mode. Yes, that is really all it takes. Let's give standalone mode a quick test here.

Go into the Hadoop root directory, create an input folder, put a file log.txt into input, and then run the word-count example from the root directory:
[root@hadoop10 lib]# cd hadoop-2.6.5
[root@hadoop10 hadoop-2.6.5]# ls
bin  etc  include  lib  libexec  LICENSE.txt  NOTICE.txt  README.txt  sbin  share
[root@hadoop10 hadoop-2.6.5]# mkdir input
[root@hadoop10 input]# ls
log.txt
[root@hadoop10 hadoop-2.6.5]# hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.5.jar wordcount input output
19/09/17 23:07:20 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
19/09/17 23:07:20 INFO Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
19/09/17 23:07:20 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
19/09/17 23:07:20 INFO input.FileInputFormat: Total input paths to process : 1
19/09/17 23:07:20 INFO mapreduce.JobSubmitter: number of splits:1
19/09/17 23:07:20 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_local1348719737_0001
19/09/17 23:07:21 INFO mapreduce.Job: The url to track the job: http://localhost:8080/
19/09/17 23:07:21 INFO mapreduce.Job: Running job: job_local1348719737_0001
19/09/17 23:07:21 INFO mapred.LocalJobRunner: OutputCommitter set in config null
19/09/17 23:07:21 INFO mapred.LocalJobRunner: OutputCommitter is org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter
19/09/17 23:07:21 INFO mapred.LocalJobRunner: Waiting for map tasks
19/09/17 23:07:21 INFO mapred.LocalJobRunner: Starting task: attempt_local1348719737_0001_m_000000_0
19/09/17 23:07:21 INFO mapred.Task: Using ResourceCalculatorProcessTree : [ ]
19/09/17 23:07:21 INFO mapred.MapTask: Processing split: file:/usr/lib/hadoop-2.6.5/input/log.txt:0+183
19/09/17 23:07:21 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
19/09/17 23:07:21 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100
19/09/17 23:07:21 INFO mapred.MapTask: soft limit at 83886080
...
19/09/17 23:07:22 INFO mapreduce.Job: map 100% reduce 100%
Now for the fully distributed configuration. Hadoop's configuration files all live under etc/hadoop inside the installation directory (here /usr/lib/hadoop-2.6.5/etc/hadoop).

First, in hadoop-env.sh, change the Java environment variable to an absolute path.
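For example, something like the following; the exact JDK path is an assumption and depends on which openjdk package yum installed (readlink -f $(which java) shows where yours lives):

# etc/hadoop/hadoop-env.sh
# replace the default "export JAVA_HOME=${JAVA_HOME}" with an absolute path
export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk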
Next, core-site.xml. Set the address of the HDFS NameNode, and set the storage path for Hadoop's runtime temporary files:
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://hadoop10:8020</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/opt/module/hadoop-2.6.5/data/tmp</value>
    </property>
</configuration>
If hadoop.tmp.dir is not set, the default storage path is /tmp/hadoop-<username>.

Next, hdfs-site.xml. Set the HDFS replication factor (the default is 3) and the SecondaryNameNode address:
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>5</value>
    </property>
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>hadoop10:50090</value>
    </property>
</configuration>
Rename mapred-site.xml.template to mapred-site.xml (it seems things may work even without this step):
mv mapred-site.xml.template mapred-site.xml
Edit the MapReduce configuration file to set which framework MapReduce jobs run on:
<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>
This makes MapReduce run on YARN.

Next, the slaves file. Delete its original contents and write in the hostnames of all the nodes; this is what makes one-command startup of the whole cluster possible:
hadoop10
hadoop11
hadoop12
hadoop13
Note that the file must contain no spaces and no blank lines.

Likewise change the Java environment variable in yarn-env.sh to an absolute path (you can also try skipping this step). Then configure yarn-site.xml:
<configuration>
    <!-- the address of YARN's master, the ResourceManager -->
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>hadoop10</value>
    </property>
    <!-- how reducers fetch data -->
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
</configuration>
Finally, distribute the configuration directory to the other nodes, e.g.:

scp -r /usr/lib/hadoop-2.6.5/etc/hadoop root@hadoop13:/usr/lib/hadoop-2.6.5/etc/
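To push the directory to all the other nodes in one go, a small loop works (a sketch, with the hostnames from our /etc/hosts):

for host in hadoop11 hadoop12 hadoop13; do
    scp -r /usr/lib/hadoop-2.6.5/etc/hadoop root@$host:/usr/lib/hadoop-2.6.5/etc/
done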
The cluster is now set up. Before storing any data, format HDFS first; this clears any leftovers and creates the initial metadata structures (the "format" here initializes HDFS, it does not actually format the disks).

Note: format only before the very first startup; do not format again for later startups.

Check which node the NameNode is configured on (fs.defaultFS in core-site.xml), then run the following command on that node:
bin/hdfs namenode -format
Start the NameNode by itself on that single node:
[root@hadoop10 hadoop-2.6.5]# sbin/hadoop-daemon.sh start namenode
starting namenode, logging to /usr/lib/hadoop-2.6.5/logs/hadoop-root-namenode-hadoop10.out
[root@hadoop10 hadoop-2.6.5]# jps
3877 NameNode
3947 Jps
Start a DataNode on a single node:
[root@hadoop10 hadoop-2.6.5]# sbin/hadoop-daemon.sh start datanode
starting datanode, logging to /usr/lib/hadoop-2.6.5/logs/hadoop-root-datanode-hadoop10.out
[root@hadoop10 hadoop-2.6.5]# jps
3877 NameNode
4060 Jps
3982 DataNode
Start a DataNode on each of the other nodes in turn.

Starting HDFS this way is tedious, and notice that the SecondaryNameNode never got started, so Hadoop provides other startup methods.

Start the cluster's whole HDFS layer (NameNode, DataNodes, SecondaryNameNode) in one step:
[root@hadoop10 hadoop-2.6.5]# sbin/start-dfs.sh
19/09/18 18:37:23 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Starting namenodes on [hadoop10]
hadoop10: starting namenode, logging to /usr/lib/hadoop-2.6.5/logs/hadoop-root-namenode-hadoop10.out
hadoop10: starting datanode, logging to /usr/lib/hadoop-2.6.5/logs/hadoop-root-datanode-hadoop10.out
hadoop13: starting datanode, logging to /usr/lib/hadoop-2.6.5/logs/hadoop-root-datanode-hadoop13.out
hadoop12: starting datanode, logging to /usr/lib/hadoop-2.6.5/logs/hadoop-root-datanode-hadoop12.out
hadoop11: starting datanode, logging to /usr/lib/hadoop-2.6.5/logs/hadoop-root-datanode-hadoop11.out
Starting secondary namenodes [hadoop10]
hadoop10: starting secondarynamenode, logging to /usr/lib/hadoop-2.6.5/logs/hadoop-root-secondarynamenode-hadoop10.out
19/09/18 18:37:40 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
[root@hadoop10 hadoop-2.6.5]# jps
6162 NameNode
6258 DataNode
6503 Jps
6381 SecondaryNameNode
Similarly, check which node YARN is configured on (yarn.resourcemanager.hostname in yarn-site.xml), then run the following on that node:
[root@hadoop10 hadoop-2.6.5]# sbin/start-yarn.sh
starting yarn daemons
starting resourcemanager, logging to /usr/lib/hadoop-2.6.5/logs/yarn-root-resourcemanager-hadoop10.out
hadoop10: starting nodemanager, logging to /usr/lib/hadoop-2.6.5/logs/yarn-root-nodemanager-hadoop10.out
hadoop13: starting nodemanager, logging to /usr/lib/hadoop-2.6.5/logs/yarn-root-nodemanager-hadoop13.out
hadoop11: starting nodemanager, logging to /usr/lib/hadoop-2.6.5/logs/yarn-root-nodemanager-hadoop11.out
hadoop12: starting nodemanager, logging to /usr/lib/hadoop-2.6.5/logs/yarn-root-nodemanager-hadoop12.out
[root@hadoop10 hadoop-2.6.5]# jps
6162 NameNode
6770 NodeManager
6258 DataNode
7012 Jps
6668 ResourceManager
6381 SecondaryNameNode
Both the ResourceManager and the NodeManagers get started.

YARN can also be started one daemon at a time.
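For example, with the per-daemon script, in the same pattern as hadoop-daemon.sh above:

sbin/yarn-daemon.sh start resourcemanager    # on the ResourceManager node
sbin/yarn-daemon.sh start nodemanager        # on each NodeManager node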
Still a hassle, so Hadoop also provides one-command start and stop for everything:
sbin/start-all.sh
sbin/stop-all.sh
However, these one-shot scripts are officially discouraged and are deprecated in newer versions.
Next, the web UIs, served from the NameNode's IP:

Port 50070 serves the HDFS web UI: http://192.168.10.10:50070
Port 8088 serves the YARN ResourceManager UI, where MapReduce jobs show up: http://192.168.10.10:8088
Create a directory in the HDFS filesystem; there are two equivalent ways:
bin/hdfs dfs -mkdir -p /usr/input/yanshw
bin/hadoop fs -mkdir -p /usr/input/yanshw
The directory is visible remotely, e.g. in the HDFS web UI.

Upload a file:
bin/hadoop fs -put README.txt /usr/input/yanshw
Again it is visible remotely.

Now run a job.

You must specify both the input and the output; the output directory must not exist beforehand:
hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.5.jar wordcount /usr/input/yanshw /usr/output/yanshw
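If you re-run the job, delete the old output directory first; otherwise the job aborts because the output already exists:

bin/hadoop fs -rm -r /usr/output/yanshw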
Check the result remotely.

A curious question: where do these directories actually live? In theory they should be under hadoop.tmp.dir, and indeed they are, just buried quite deep.

You can cat a block file directly and see that its content is exactly the file we uploaded.

That file is small, so it takes up only one block; a large file would be split into multiple blocks, each looking like the one above.

We can cat >> all the blocks into a single file to reconstruct the file we uploaded.
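A sketch of what that looks like on a DataNode; the data path matches the hadoop.tmp.dir we configured above, but the block-pool ID and blk_ numbers below are placeholders and yours will differ:

# locate the block files (skip the .meta checksum files)
find /opt/module/hadoop-2.6.5/data/tmp/dfs/data -name 'blk_*' ! -name '*.meta'
# concatenate the blocks in order to reconstruct the uploaded file
cat blk_1073741825 blk_1073741826 >> restored.txt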
HDFS shell commands generally follow the pattern hadoop fs -<linux command>.

For example:
$ hdfs dfs -ls /
$ hdfs dfs -mkdir /user/hduser
$ hdfs dfs -put /home/hduser/input.txt /user/hduser
$ hdfs dfs -get input.txt /home/hduser
Although the Hadoop cluster we deployed above works, it is not ideal: we put the NameNode, the SecondaryNameNode, and the ResourceManager all on one server.

That puts that server under heavy load, and each of the three components only gets a squeezed share of the resources.

So before building a cluster it is best to plan the layout first, along these lines:

put the three core components on three separate servers. The simplest such cluster needs only 3 servers.
1. jps not found

jps lists the running Java processes.

If the jps command cannot be found, Java is not installed properly and you need to set the Java environment variables.
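A sketch of the relevant /etc/profile lines, assuming the openjdk path used earlier (note that jps itself ships in the -devel part of the openjdk packages, which the wildcard install above pulls in):

export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk
export PATH=$JAVA_HOME/bin:$PATH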
2. DataNodes won't start after a reboot

Typically everything works when the cluster is first set up, but after a restart the DataNodes fail to come up. The reason is that the DataNodes are no longer recognized by the NameNode.
When the NameNode is formatted it generates two identifiers, blockPoolId and clusterId.

When a DataNode joins, it adopts these two identifiers as its proof of belonging to that NameNode; that is what binds the cluster together.

Once the NameNode is re-formatted, both identifiers are regenerated.

The DataNodes, however, still turn up with the old identifiers, and are naturally turned away.
Fix: delete the data on all nodes, i.e. the tmp directory, including the NameNode's own data; then re-format and start again.
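Concretely, something like this; the path is the hadoop.tmp.dir we configured above, so double-check it before deleting, since this wipes all HDFS data:

# on every node: remove the old HDFS data
rm -rf /opt/module/hadoop-2.6.5/data/tmp/*
# on the NameNode only: re-format, then start the cluster again
bin/hdfs namenode -format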
3. 各類操做都會有以下 警告
WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
You can ignore it; it is only a warning. If you really do want to resolve it, look up a fix for the native-hadoop library warning.
References:

https://www.cnblogs.com/laov/p/3421479.html (hadoop 1.2.1)
https://blog.csdn.net/baidu_28997655/article/details/81586418 (hadoop 2.6.5)
https://blog.csdn.net/qq285016127/article/details/80501418 (hadoop 2.6.4)
https://www.cnblogs.com/xia520pi/archive/2012/05/16/2503949.html#_label3_0 (very detailed)