Hadoop 3.0.0-alpha2 Installation (Part 1)

1. Cluster Deployment Overview

1.1 Hadoop Overview

    The R&D team needs a Hadoop environment for data mining and statistics, so this installation test was carried out using just 3 virtual machines. Introduction... (a lot omitted here)... the details are easy to look up yourself.

    From what you find there you can summarize that the NameNode and JobTracker are responsible for dispatching tasks, while the DataNodes and TaskTrackers handle data computation and storage. A cluster can therefore have one NameNode+JobTracker and any number of DataNode/TaskTracker nodes.

### This was copied straight from a Word document into the blog editor, so readers should watch out for the occasional spacing problem!

1.2 Version Information

The software versions used for this test installation are listed in Table 1-1.

Table 1-1: Software version information

| Name | Version |
| --- | --- |
| Operating system | CentOS-6.8-x86_64-bin-DVD1.iso |
| Java | jdk-8u121-linux-x64.tar.gz |
| Hadoop | hadoop-3.0.0-alpha2.tar.gz |

1.3 Test Environment

    This experiment was installed and tested on virtual machines. The Hadoop cluster consists of 1 master and 2 slaves, and the nodes can reach each other over the internal network. The hostnames and IP addresses are listed in Table 1-2.

Table 1-2: Hostnames and IP addresses

| Hostname | Simulated external IP (eth1) | Role |
| --- | --- | --- |
| master | 192.168.24.15 | NameNode+JobTracker |
| slave1 | 192.168.24.16 | DataNode+TaskTracker |
| slave2 | 192.168.24.17 | DataNode+TaskTracker |

### Note: the gray-shaded parts of this document are file contents being edited or command output.

2、操做系統設置

1. Install common packages

### 因爲操做系統是最小化安裝,因此安裝一些經常使用的軟件包

# yum install gcc gcc-c++ openssh-clients vim make ntpdate unzip cmake tcpdump openssl openssl-devel lzo lzo-devel zlib zlib-devel snappy snappy-devel lz4 lz4-devel bzip2 bzip2-devel wget

2. Change the hostname

# vim /etc/sysconfig/network        # on the other two nodes the hostnames are slave1 and slave2 respectively

NETWORKING=yes

HOSTNAME=master

3. Configure the hosts file

# vim /etc/hosts      # add the following entries on the master and on every slave

10.0.24.15 master

10.0.24.16 slave1

10.0.24.17 slave2

4. Create the account

# useradd hadoop

5. File descriptor limits

# vim /etc/security/limits.conf

*  soft nofile 65000

*  hard nofile 65535

$ ulimit -n   # check the current limit
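The new limits only apply to sessions opened after the change; after logging in again they can be confirmed with the standard shell builtins:

$ ulimit -Sn   # current soft limit

$ ulimit -Hn   # current hard limit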

6. Kernel parameter tuning (sysctl.conf)

net.ipv4.ip_forward = 0

net.ipv4.conf.default.rp_filter = 1

net.ipv4.conf.default.accept_source_route = 0

kernel.sysrq = 0

kernel.core_uses_pid = 1

net.ipv4.tcp_syncookies = 1

kernel.msgmnb = 65536

kernel.msgmax = 65536

kernel.shmmax = 68719476736

kernel.shmall = 4294967296

net.ipv4.tcp_max_tw_buckets = 60000

net.ipv4.tcp_sack = 1

net.ipv4.tcp_window_scaling = 1

net.ipv4.tcp_rmem = 4096 87380 4194304

net.ipv4.tcp_wmem = 4096 16384 4194304

net.core.wmem_default = 8388608

net.core.rmem_default = 8388608

net.core.rmem_max = 16777216

net.core.wmem_max = 16777216

net.core.netdev_max_backlog = 262144

net.core.somaxconn = 262144

net.ipv4.tcp_max_orphans = 3276800

net.ipv4.tcp_max_syn_backlog = 262144

net.ipv4.tcp_timestamps = 0

net.ipv4.tcp_synack_retries = 1

net.ipv4.tcp_syn_retries = 1

net.ipv4.tcp_tw_recycle = 1

net.ipv4.tcp_tw_reuse = 1

net.ipv4.tcp_mem = 94500000 915000000 927000000

net.ipv4.tcp_fin_timeout = 1

net.ipv4.tcp_keepalive_time = 1200

net.ipv4.tcp_max_syn_backlog = 65536

net.ipv4.tcp_timestamps = 0

net.ipv4.tcp_synack_retries = 2

net.ipv4.tcp_syn_retries = 2

net.ipv4.tcp_tw_recycle = 1

#net.ipv4.tcp_tw_len = 1

net.ipv4.tcp_tw_reuse = 1

#net.ipv4.tcp_fin_timeout = 30

#net.ipv4.tcp_keepalive_time = 120

net.ipv4.ip_local_port_range = 1024  65535
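Assuming the values above were written to /etc/sysctl.conf, they can be loaded without a reboot:

# sysctl -p    # reload /etc/sysctl.conf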

 

7. Disable SELINUX

# vim /etc/selinux/config  

#SELINUX=enforcing      

#SELINUXTYPE=targeted  

SELINUX=disabled       

# reboot                    # reboot the server for the change to take effect
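If a reboot is inconvenient at this point, SELinux can also be put into permissive mode right away (the config change above still disables it fully at the next boot):

# setenforce 0     # permissive until reboot

# getenforce       # verify the current mode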

8. Configure ssh

# vim /etc/ssh/sshd_config       # uncomment the following lines (remove the leading "#")

HostKey /etc/ssh/ssh_host_rsa_key

RSAAuthentication yes

PubkeyAuthentication yes

AuthorizedKeysFile      .ssh/authorized_keys
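The ssh and scp commands later in this guide use port 2221, so sshd presumably also listens on that port; if so, the corresponding directive (an assumption, it is not shown above) would be:

Port 2221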

# /etc/init.d/sshd restart

9. Configure passwordless SSH login between master and slaves

(1) Generate keys on both the master and slave servers

# su - hadoop

$ ssh-keygen -b 1024 -t rsa

Generating public/private rsa key pair.

Enter file in which to save the key (/home/hadoop/.ssh/id_rsa): <press Enter>

Enter passphrase (empty for no passphrase): <press Enter>

Enter same passphrase again: <press Enter>

Your identification has been saved in /home/hadoop/.ssh/id_rsa.

Your public key has been saved in /home/hadoop/.ssh/id_rsa.pub.

The key fingerprint is: ……

Note: when prompted for a passphrase, just press Enter, which means the key has no passphrase.

(2) Create the authorized_keys file under the hadoop user on both the master and slave servers

$ cd .ssh

$ vim authorized_keys   # add the contents of the id_rsa.pub files from the hadoop user on the master and slave servers

ssh-rsa AAAAB3Nza…省略…HxNDk= hadoop@master

ssh-rsa AAAAB3Nza…省略…7CmlRs= hadoop@slave1

ssh-rsa AAAAB3Nza…省略…URmXD0= hadoop@slave2

$ chmod 644 authorized_keys
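sshd is picky about permissions on the key directory; if the logins below still ask for a password, it is worth locking down the .ssh directory itself as a standard precaution:

$ chmod 700 ~/.ssh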

$ ssh -p2221 hadoop@10.0.24.16   $ ssh -p2221 slave1  # test ssh connectivity between the nodes

3. Java Environment Installation

### Java must be installed on every node of the Hadoop cluster

# mkdir /data  && cd /data

# tar zxf jdk-8u121-linux-x64.tar.gz

# ln -sv jdk1.8.0_121 jdk

# chown -R root. jdk*

# cat >> /etc/profile.d/java.sh<<'EOF'

# Set java environment

export JAVA_HOME=/data/jdk

export CLASSPATH=.:$JAVA_HOME/lib:$JAVA_HOME/jre/lib

export PATH=$PATH:$JAVA_HOME/bin:$JAVA_HOME/jre/bin

EOF 

# source /etc/profile     # apply immediately
# java -version   # javac -version   # check the version information

4. Hadoop Cluster Installation

4.1 Install Hadoop on the master

# cd /data

# tar zxf hadoop-3.0.0-alpha2.tar.gz

# ln -sv hadoop-3.0.0-alpha2 hadoop
# mkdir -p /data/hadoop/logs
# chown -R hadoop:hadoop /data/hadoop/logs

# mkdir -p /data/hadoop/tmp          # used by the core-site.xml configuration

# mkdir -p /data/{hdfsname1,hdfsname2}/hdfs/name

# mkdir -p /data/{hdfsdata1,hdfsdata2}/hdfs/data

# chown -R hadoop:hadoop /data/hdfs*

# the four directories above are used in hdfs-site.xml

# cat >> /etc/profile.d/hadoop.sh<<'EOF'

# Set hadoop environment

export HADOOP_HOME=/data/hadoop

export PATH=$PATH:$HADOOP_HOME/bin

EOF 

# source /etc/profile

# chown -R hadoop:hadoop hadoop*
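With the profile sourced, the installation can be sanity-checked from the hadoop binary now on the PATH:

# hadoop version    # should report 3.0.0-alpha2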

4.2 Configure Hadoop on the master

# cd /data/hadoop/etc/hadoop

4.2.1 hadoop-env.sh

# vim hadoop-env.sh       # append to the end of the file on both the master and slaves

# Set java environment

export JAVA_HOME=/data/jdk

export HADOOP_SSH_OPTS="-p 2221"

4.2.2 core-site.xml

# vim core-site.xml

<configuration>

   <property>

       <name>fs.defaultFS</name>

       <value>hdfs://master:9000</value>

   </property>

   <property>

       <name>hadoop.tmp.dir</name>

       <value>/data/hadoop/tmp</value>

    </property>

   <property>

       <name>io.compression.codecs</name>

       <value>org.apache.hadoop.io.compress.DefaultCodec,com.hadoop.compression.lzo.LzoCodec,com.hadoop.compression.lzo.LzopCodec,org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.BZip2Codec</value>

   </property>

   <property>

       <name>io.compression.codec.lzo.class</name>

       <value>com.hadoop.compression.lzo.LzoCodec</value>

   </property>

</configuration>

### Notes:

(1) First <property>: defines the hostname and port of the HDFS NameNode (this machine); the hostname is configured in /etc/hosts.

(2) Second <property>: if hadoop.tmp.dir is not configured, the system default temporary directory /tmp/hadoop-hadoop is used. That directory is wiped on every reboot, which forces you to re-run format, otherwise errors follow. By default it is the common parent directory under which the NameNode, DataNode, JournalNode, etc. store their data; you can also configure separate directories for each of these node types. The /data/hadoop/tmp directory here was created by hand; once configured, it is also created automatically when the NameNode is formatted.

(3) Third <property>: defines the compression codecs available to HDFS (this item was temporarily disabled for this test and can be commented out).

(4) Fourth <property>: defines the LZO codec class (also temporarily disabled for this test and can be commented out).

4.2.3 hdfs-site.xml

# vim hdfs-site.xml

<configuration>

   <property>

       <name>dfs.name.dir</name>

       <value>file:///data/hdfsname1/hdfs/name,file:///data/hdfsname2/hdfs/name</value>

       <description> </description>

   </property>

   <property>

       <name>dfs.data.dir</name>

       <value>file:///data/hdfsdata1/hdfs/data,file:///data/hdfsdata2/hdfs/data</value>

       <description> </description>

   </property>

   <property>

       <name>dfs.replication</name>

       <value>2</value>

   </property>

   <property>

       <name>dfs.datanode.du.reserved</name>

       <value>1073741824</value>

   </property>

   <property>

       <name>dfs.block.size</name>

       <value>134217728</value>

   </property>

   <property>

       <name>dfs.permissions</name>

       <value>false</value>

   </property>

</configuration>

(1) First <property>: defines where the HDFS NameNode persistently stores the namespace and the transaction log. Multiple paths can be separated with ","; the configuration here simulates multiple mounted disks.

(2) Second <property>: defines where a DataNode stores its blocks on the local filesystem. A comma-separated list of directories may be given, in which case the data is stored in all of the named directories, usually on different devices.

(3) Third <property>: defines how many replicas of each block are kept. The default is 3; since we only have 2 DataNodes here, the value should not exceed 2. More replicas are safer but slower.

(4) Fourth <property>: dfs.datanode.du.reserved reserves space (in bytes) on each volume that the DataNode will not use for block storage; 1073741824 bytes = 1 GB here.

(5) Fifth <property>: defines the HDFS block size; the old default was 64 MB, and 128 MB (134217728 bytes) is used here.

(6) Sixth <property>: HDFS permission checking; it is easiest to turn it off (false) for this test.

4.2.4 mapred-site.xml

# cp -a mapred-site.xml.template mapred-site.xml

# vim mapred-site.xml

<configuration>

   <property>

       <name>mapreduce.framework.name</name>

       <value>yarn</value>

   </property>

   <property>

       <name>mapreduce.application.classpath</name>

       <value>

           /data/hadoop/etc/hadoop,

           /data/hadoop/share/hadoop/common/*,

           /data/hadoop/share/hadoop/common/lib/*,

           /data/hadoop/share/hadoop/hdfs/*,

           /data/hadoop/share/hadoop/hdfs/lib/*,

           /data/hadoop/share/hadoop/mapreduce/*,

           /data/hadoop/share/hadoop/mapreduce/lib/*,

           /data/hadoop/share/hadoop/yarn/*,

           /data/hadoop/share/hadoop/yarn/lib/*

     </value>

    </property>

</configuration>

### Note:

The mapreduce.application.classpath above was not configured at first, which caused MapReduce jobs to fail with:

Error: Could not find or load main class org.apache.hadoop.mapreduce.v2.app.MRAppMaster
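Later Hadoop 3.x documentation describes an alternative fix for this error: pointing the MapReduce ApplicationMaster and tasks at HADOOP_MAPRED_HOME in mapred-site.xml. It is offered here only as a hedged alternative; the classpath approach above is what was actually tested in this setup.

   <property>
       <name>yarn.app.mapreduce.am.env</name>
       <value>HADOOP_MAPRED_HOME=/data/hadoop</value>
   </property>
   <property>
       <name>mapreduce.map.env</name>
       <value>HADOOP_MAPRED_HOME=/data/hadoop</value>
   </property>
   <property>
       <name>mapreduce.reduce.env</name>
       <value>HADOOP_MAPRED_HOME=/data/hadoop</value>
   </property>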

4.2.5 yarn-site.xml

# vim yarn-site.xml

<configuration>

 <property>

    <name>yarn.resourcemanager.hostname</name>

   <value>master</value>

 </property>

 <property>

   <name>yarn.nodemanager.aux-services</name>

   <value>mapreduce_shuffle</value>

 </property>

</configuration>

(1) First <property>: specifies the node on which the ResourceManager runs.

(2) Second <property>: since Hadoop 2.2.0 the value is mapreduce_shuffle; be sure to get it exactly right.

### Note: this test otherwise used the default file and did not add any extra content.

4.2.6 workers

# vim workers      # list the slave hostnames here (in Hadoop 3 this file replaces the old slaves file), otherwise the slave nodes will not start

slave1

slave2

4.3 Install Hadoop on the slaves

Copy the Hadoop installation and configuration from the master node to every slave. Important: the destination path must be identical to the one on the master.

$ scp -P2221 hadoop.tar.gz hadoop@slave1:/home/hadoop

$ scp -P2221 hadoop.tar.gz hadoop@slave2:/home/hadoop
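On each slave the archive then has to be unpacked to the same path as on the master. The exact commands were not recorded; a plausible sequence, assuming hadoop.tar.gz was made from /data/hadoop-3.0.0-alpha2 and is unpacked as root, would be:

# tar zxf /home/hadoop/hadoop.tar.gz -C /data

# ln -sv /data/hadoop-3.0.0-alpha2 /data/hadoop

# chown -R hadoop:hadoop /data/hadoop-3.0.0-alpha2

The /data/hadoop/tmp, /data/hadoop/logs and /data/hdfsdata*/hdfs/data directories from section 4.1 also have to exist on the slaves, since the DataNodes store their blocks there.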

4.4 Configure the firewall

### For this experiment the firewall can simply be turned off to avoid unnecessary trouble; it can be tuned properly later.
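On CentOS 6 the simplest way to do that is to stop iptables and keep it from starting at boot:

# service iptables stop

# chkconfig iptables off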

4.5 Start and Verify Hadoop

4.5.1 Format the HDFS filesystem on the master

### Note: go back to the master server and run the following:

# su - hadoop

$ /data/hadoop/bin/hdfs namenode -format      # the output is as follows:

2017-03-15 19:02:50,062 INFO namenode.NameNode: STARTUP_MSG:

/************************************************************

STARTUP_MSG: Starting NameNode

STARTUP_MSG:  user = hadoop

STARTUP_MSG:  host = master/10.0.24.15

STARTUP_MSG:  args = [-format]

STARTUP_MSG:  version = 3.0.0-alpha2

... (a lot of output omitted) ...

Re-format filesystem in Storage Directory /data/hdfsname1/hdfs/name ? (Y or N) y

Re-format filesystem in Storage Directory /data/hdfsname2/hdfs/name ? (Y or N) y

... (a lot of output omitted) ...

2017-03-15 19:03:48,703 INFO namenode.FSImage: Allocated new BlockPoolId: BP-1344030132-10.0.24.15-1489575828688

... (a lot of output omitted) ...

2017-03-15 19:03:48,999 INFO util.ExitUtil: Exiting with status 0

2017-03-15 19:03:49,002 INFO namenode.NameNode: SHUTDOWN_MSG:

/************************************************************

SHUTDOWN_MSG: Shutting down NameNode at master/10.0.24.15

************************************************************/

4.5.2 Start, verify, and stop the cluster

$ cd /data/hadoop/sbin   # run on the master server

(1) $ ./start-all.sh  # start the cluster       # output below; the WARNING/WARN messages are not resolved yet, see section 5, FAQ

WARNING: Attempting to start all Apache Hadoop daemons as hadoop in 10 seconds.

WARNING: This is not a recommended production deployment configuration.

WARNING: Use CTRL-C to abort.

Starting namenodes on [master]

Starting datanodes

Starting secondary namenodes [master]

2017-03-21 18:51:03,092 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

Starting resourcemanager

Starting nodemanagers

(2) $ /data/jdk1.8.0_121/bin/jps       # check the processes on the master

9058 SecondaryNameNode

9272 ResourceManager

9577 RunJar

8842 NameNode

9773 Jps

(3) $ /data/jdk1.8.0_121/bin/jps       # check the processes on slave1/slave2

5088 DataNode

5340 Jps

5213 NodeManager

(4) $ ./stop-all.sh                    # run on the master server to stop the cluster

WARNING: Stopping all Apache Hadoop daemons as hadoop in 10 seconds.

WARNING: Use CTRL-C to abort.

Stopping namenodes on [master]

Stopping datanodes

Stopping secondary namenodes [master]

2017-03-21 18:57:20,746 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

Stopping nodemanagers

slave1: WARNING: nodemanager did not stop gracefully after 5 seconds: Trying to kill with kill -9

slave2: WARNING: nodemanager did not stop gracefully after 5 seconds: Trying to kill with kill -9

Stopping resourcemanager

(5) $ /data/jdk1.8.0_121/bin/jps     # check again that all processes have shut down cleanly

11500 Jps

(6) Web pages

1) http://192.168.24.15:8088      # YARN ResourceManager web UI

(screenshot: YARN ResourceManager web UI)

2) http://192.168.24.15:9870      # HDFS NameNode web UI

(screenshot: HDFS NameNode web UI)

4.5.3 MapReduce test programs

$ cd /data/hadoop/bin

1. First test method:

$ hadoop jar ../share/hadoop/mapreduce/hadoop-mapreduce-examples-3.0.0-alpha2.jar pi 1 1  # the output below indicates success

Number of Maps = 1

Samples per Map = 1

Wrote input for Map #0

Starting Job

2017-04-01 05:34:34,150 INFO client.RMProxy: Connecting to ResourceManager at master/192.168.24.15:8032

2017-04-01 05:34:35,765 INFO input.FileInputFormat: Total input files to process : 1

2017-04-01 05:34:35,876 INFO mapreduce.JobSubmitter: number of splits:1

2017-04-01 05:34:35,926 INFO Configuration.deprecation: yarn.resourcemanager.system-metrics-publisher.enabled is deprecated. Instead, use yarn.system-metrics-publisher.enabled

2017-04-01 05:34:36,402 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1490957345671_0007

2017-04-01 05:34:36,939 INFO impl.YarnClientImpl: Submitted application application_1490957345671_0007

2017-04-01 05:34:37,085 INFO mapreduce.Job: The url to track the job: http://master:8088/proxy/application_1490957345671_0007/

2017-04-01 05:34:37,086 INFO mapreduce.Job: Running job: job_1490957345671_0007

2017-04-01 05:34:47,336 INFO mapreduce.Job: Job job_1490957345671_0007 running in uber mode : false

2017-04-01 05:34:47,340 INFO mapreduce.Job:  map 0% reduce 0%

2017-04-01 05:34:57,496 INFO mapreduce.Job:  map 100% reduce 0%

2017-04-01 05:35:05,574 INFO mapreduce.Job:  map 100% reduce 100%

2017-04-01 05:35:05,588 INFO mapreduce.Job: Job job_1490957345671_0007 completed successfully

 

2. Second test method:

(1) Create the HDFS directories needed to run MapReduce jobs

$ hdfs dfs -mkdir /user  

$ hdfs dfs -mkdir /user/hduser

(2) Copy the input file into the distributed filesystem

$ hdfs dfs -mkdir /user/hduser/input

$ hdfs dfs -put ../etc/hadoop/yarn-site.xml /user/hduser/input

(3) Run the provided example program

$ hadoop jar ../share/hadoop/mapreduce/hadoop-mapreduce-examples-3.0.0-alpha2.jar grep /user/hduser/input/yarn-site.xml output 'dfs[a-z.]+'   ... (output omitted) ...

2017-03-31 10:58:46,650 INFO mapreduce.Job:  map 100% reduce 100%

2017-03-31 10:58:46,664 INFO mapreduce.Job: Job job_1490957345671_0003 completed successfully

2017-03-31 10:58:46,860 INFO mapreduce.Job: Counters: 49

... (output omitted) ...
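The grep job writes its results into the relative output directory under the hadoop user's HDFS home; they can be listed and printed with the usual dfs commands:

$ hdfs dfs -ls output

$ hdfs dfs -cat output/*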

The result can also be seen at http://192.168.24.15:9870:

(screenshot: HDFS NameNode web UI showing the job output)

### Because of the blog's length limit, this had to be split into parts:
Link to Hadoop 3.0.0-alpha2 Installation (Part 2):

http://laowafang.blog.51cto.com/251518/1912345

劉政委  2017-04-01
