HADOOP on Docker (1): installing a Hadoop test cluster (a bit of a pain)

1. Environment Preparation

1.1. Machine Plan

Hostname      Alias    IP           Roles
9321a27a2b91  hadoop1  172.17.0.10  NN1 ZK RM
7c3a3c9cd595  hadoop2  172.17.0.9   NN2 ZK RM JOBHIS
f89eaf2a2548  hadoop3  172.17.0.8   DN ZK ND
28620eee1426  hadoop4  172.17.0.7   DN QJM1 ND
ae1f06bd04c8  hadoop5  172.17.0.6   DN QJM2 ND
11c433a003b6  hadoop6  172.17.0.5   DN QJM3 ND

1.2. Users and Groups

User       Group   Purpose
hdfs       hadoop  manages HDFS
yarn       hadoop  manages YARN
zookeeper  hadoop  manages ZooKeeper
hive       hadoop  manages Hive
hbase      hadoop  manages HBase
     
Script:
groupadd hadoop
useradd -g hadoop hdfs
passwd hdfs <<EOF
hdfs
hdfs
EOF
useradd -g hadoop yarn
passwd yarn <<EOF
yarn
yarn
EOF
useradd -g hadoop zookeeper
passwd zookeeper <<EOF
zookeeper
zookeeper
EOF
useradd -g hadoop hive
passwd hive <<EOF
hive
hive
EOF
useradd -g hadoop hbase
passwd hbase <<EOF
hbase
hbase
EOF
echo user added!
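If the heredoc trick above does not take (passwd on some distributions ignores stdin unless --stdin is given), a minimal alternative sketch using chpasswd, with the same user/password pairs as the table (password equals username, acceptable only for a lab):

  # Create the group and users, then set each password to the username via chpasswd
  groupadd hadoop
  for u in hdfs yarn zookeeper hive hbase; do
    useradd -g hadoop "$u"
    echo "${u}:${u}" | chpasswd
  done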
 
 

1.3. Edit /etc/hosts

Add every node's IP, container hostname, and alias (matching the plan table above):
  echo "127.0.0.1 localhost" > /etc/hosts
  echo "172.17.0.10 9321a27a2b91 hadoop1" >> /etc/hosts
  echo "172.17.0.9 7c3a3c9cd595 hadoop2" >> /etc/hosts
  echo "172.17.0.8 f89eaf2a2548 hadoop3" >> /etc/hosts
  echo "172.17.0.7 28620eee1426 hadoop4" >> /etc/hosts
  echo "172.17.0.6 ae1f06bd04c8 hadoop5" >> /etc/hosts
  echo "172.17.0.5 11c433a003b6 hadoop6" >> /etc/hosts

1.4. Passwordless SSH

Run on every machine as the hdfs user (generate a key pair first if one does not exist yet; see the sketch after this block):
  su hdfs
  ssh-copy-id -i ~/.ssh/id_rsa.pub 172.17.0.5
  ssh-copy-id -i ~/.ssh/id_rsa.pub 172.17.0.6
  ssh-copy-id -i ~/.ssh/id_rsa.pub 172.17.0.7
  ssh-copy-id -i ~/.ssh/id_rsa.pub 172.17.0.8
  ssh-copy-id -i ~/.ssh/id_rsa.pub 172.17.0.9
  ssh-copy-id -i ~/.ssh/id_rsa.pub 172.17.0.10
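ssh-copy-id needs an existing key pair. A minimal sketch for creating one for the hdfs user (run once per node before the ssh-copy-id commands above):

  # Create ~/.ssh and a passphrase-less RSA key pair for the hdfs user
  su - hdfs
  mkdir -p ~/.ssh && chmod 700 ~/.ssh
  ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa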
 

1.5. Adjust ulimit

First check ulimit -a as the hdfs, yarn, hive, etc. users. If -n, -m, -l, -u and so on do not meet your needs, edit /etc/security/limits.conf:
  [hdfs@9321a27a2b91 root]$ ulimit -a
  core file size          (blocks, -c) unlimited
  data seg size           (kbytes, -d) unlimited
  scheduling priority             (-e) 0
  file size               (blocks, -f) unlimited
  pending signals                 (-i) 95612
  max locked memory       (kbytes, -l) 64
  max memory size         (kbytes, -m) unlimited
  open files                      (-n) 65536
  pipe size            (512 bytes, -p) 8
  POSIX message queues     (bytes, -q) 819200
  real-time priority              (-r) 0
  stack size              (kbytes, -s) 8192
  cpu time               (seconds, -t) unlimited
  max user processes              (-u) 1024
  virtual memory          (kbytes, -v) unlimited
  file locks                      (-x) unlimited
If the limits are insufficient, edit /etc/security/limits.conf and add:
  hdfs hard nofile 65536
  hdfs soft nofile 65536
  yarn hard nofile 65536
  yarn soft nofile 65536
  ......
nofile is the number of open files; nproc (max user processes) and others can be set the same way.
Note: the test machines in this walkthrough are left unmodified.
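If the limits do need raising on a real cluster, a sketch (the values are illustrative assumptions, not tuned numbers):

  # Raise open-file and process limits for the hdfs user (repeat for yarn, hive, ...)
  echo "hdfs soft nofile 65536" >> /etc/security/limits.conf
  echo "hdfs hard nofile 65536" >> /etc/security/limits.conf
  echo "hdfs soft nproc 32768" >> /etc/security/limits.conf
  echo "hdfs hard nproc 32768" >> /etc/security/limits.conf
  # Log in again as the user and verify
  su - hdfs -c 'ulimit -n -u'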
 

1.6. Disable the Firewall

  service iptables stop

1.7. Disable SELinux

Permanent: set SELINUX=disabled in /etc/selinux/config.
Temporary: run setenforce 0.
Since the Docker container cannot simply be rebooted, the second method is used here:
  setenforce 0
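A quick check that the change took effect:

  # Should print "Permissive" (or "Disabled") after setenforce 0
  getenforce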
 
 

2. Software Preparation

2.1. Install the JDK

On node hadoop1:
Download the JDK 8u121 tarball from the official site and unpack it into /usr/local/java:
  [root@9321a27a2b91 ~]# mkdir /usr/local/java
  [root@9321a27a2b91 ~]# cp jdk-8u121-linux-x64.tar.gz /usr/local/java/
  [root@9321a27a2b91 ~]# chown -R hdfs:hadoop /usr/local/java/
  [root@9321a27a2b91 ~]# su hdfs
  [hdfs@9321a27a2b91 root]$ cd /usr/local/java/
  [hdfs@9321a27a2b91 java]$ tar -zxvf jdk-8u121-linux-x64.tar.gz
Do this on every node. Alternatively, do it once and copy the result to the other nodes (a loop version is sketched after this block):
  mkdir /usr/local/java
  chown hdfs:hadoop /usr/local/java
  su hdfs
  scp -r hdfs@hadoop1:/usr/local/java/jdk1.8.0_121 /usr/local/java
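With several nodes, a small loop from hadoop1 saves typing. This is only a sketch: it assumes root SSH access to the hadoop2..hadoop6 aliases, which this walkthrough never sets up, so adapt it to your own access model:

  # Push the unpacked JDK from hadoop1 to every other node
  for h in hadoop2 hadoop3 hadoop4 hadoop5 hadoop6; do
    ssh root@"$h" "mkdir -p /usr/local/java && chown hdfs:hadoop /usr/local/java"
    scp -r /usr/local/java/jdk1.8.0_121 root@"$h":/usr/local/java/
  done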
 

2.2. Hadoop Package

On node hadoop1:
Download hadoop-2.7.3 from the official site and unpack it into /opt/hadoop:
  [root@9321a27a2b91 ~]# mkdir /opt/hadoop
  [root@9321a27a2b91 ~]# chown hdfs:hadoop hadoop-2.7.3.tar.gz
  [root@9321a27a2b91 ~]# chown hdfs:hadoop /opt/hadoop
  [root@9321a27a2b91 ~]# cp hadoop-2.7.3.tar.gz /opt/hadoop
  [root@9321a27a2b91 ~]# su hdfs
  [hdfs@9321a27a2b91 root]$ cd /opt/hadoop/
  [hdfs@9321a27a2b91 hadoop]$ tar -zxvf hadoop-2.7.3.tar.gz
 

2.3. NTP Service

hadoop1 acts as the NTP server; the other nodes are NTP clients.
1) On hadoop1, as root:
If ntp is not installed yet, install it with yum:
  yum -y install ntp
Edit the NTP configuration file /etc/ntp.conf and add:
  # hosts in this subnet may synchronize
  restrict 172.17.0.0 mask 255.255.0.0 nomodify
  # preferred time server
  server 172.17.0.10 prefer
  # log file location
  logfile /var/log/ntp.log
Then start ntpd:
  [root@9321a27a2b91 hadoop]# service ntpd start
  Starting ntpd: [ OK ]
  [root@9321a27a2b91 hadoop]# service ntpd status
  ntpd dead but pid file exists
ntpd has died. Check /var/log/ntp.log:
  3 Apr 11:20:08 ntpd[732]: ntp_io: estimated max descriptors: 65536, initial socket boundary: 16
  3 Apr 11:20:08 ntpd[732]: Listen and drop on 0 v4wildcard 0.0.0.0 UDP 123
  3 Apr 11:20:08 ntpd[732]: Listen and drop on 1 v6wildcard :: UDP 123
  3 Apr 11:20:08 ntpd[732]: Listen normally on 2 lo 127.0.0.1 UDP 123
  3 Apr 11:20:08 ntpd[732]: Listen normally on 3 eth0 172.17.0.10 UDP 123
  3 Apr 11:20:08 ntpd[732]: Listen normally on 4 lo ::1 UDP 123
  3 Apr 11:20:08 ntpd[732]: Listen normally on 5 eth0 fe80::42:acff:fe11:a UDP 123
  3 Apr 11:20:08 ntpd[732]: Listening on routing socket on fd #22 for interface updates
  3 Apr 11:20:08 ntpd[732]: 0.0.0.0 c016 06 restart
  3 Apr 11:20:08 ntpd[732]: ntp_adjtime() failed: Operation not permitted
  3 Apr 11:20:08 ntpd[732]: 0.0.0.0 c012 02 freq_set kernel 0.000 PPM
  3 Apr 11:20:08 ntpd[732]: 0.0.0.0 c011 01 freq_not_set
  3 Apr 11:20:08 ntpd[732]: cap_set_proc() failed to drop root privileges: Operation not permitted
After some searching, the likely cause is that the kernel seen inside the container will not let ntpd drop root privileges (i.e. run as a non-root user), and recompiling the kernel is not an option. So edit /etc/sysconfig/ntpd and comment out:
  OPTIONS="-u ntp:ntp -p /var/run/ntpd.pid -g"
or rewrite the file directly:
  echo "# Drop root to id 'ntp:ntp' by default." > /etc/sysconfig/ntpd
  echo '#OPTIONS="-u ntp:ntp -p /var/run/ntpd.pid -g"' >> /etc/sysconfig/ntpd

Then start ntpd again:
  [root@9321a27a2b91 hadoop]# service ntpd start
  Starting ntpd: [ OK ]
  [root@9321a27a2b91 hadoop]# service ntpd status
  ntpd (pid 796) is running...
 
2) On the other nodes:
Edit /etc/ntp.conf and add:
  server 172.17.0.10 prefer
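Once ntpd is running on a client, the synchronization status can be checked quickly:

  # Lists the peers ntpd is using; the hadoop1 server (172.17.0.10) should appear, ideally marked with '*'
  ntpq -p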
 
 

2.4. MySQL Database

MySQL is only needed by Hive. For convenience it can simply be installed with yum; installing manually from a downloaded package also works (a yum sketch follows).
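A minimal sketch of the yum route, assuming CentOS 6 style package and service names (the root password below is only an illustration):

  # Install and start the MySQL server, then set a root password
  yum -y install mysql-server
  service mysqld start
  mysqladmin -u root password 'hive'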
 
 

3. Installing Hadoop and Its Components

3.1 Install HDFS and YARN

3.1.1 Set environment variables in .bash_profile
On hadoop1, edit /home/hdfs/.bash_profile:
  su hdfs
  vi ~/.bash_profile
  JAVA_HOME=/usr/local/java/jdk1.8.0_121
  CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar:$JRE_HOME/lib:$CLASSPATH
  HADOOP_HOME=/opt/hadoop/hadoop-2.7.3
  HADOOP_PREFIX=/opt/hadoop/hadoop-2.7.3
  HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
  HADOOP_YARN_HOME=$HADOOP_HOME
  LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$JAVA_HOME/jre/lib/amd64/server
  PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
  export PATH JAVA_HOME CLASSPATH HADOOP_HOME HADOOP_PREFIX HADOOP_CONF_DIR HADOOP_YARN_HOME LD_LIBRARY_PATH
Copy it to all the other machines:
  su hdfs
  scp hdfs@hadoop1:/home/hdfs/.bash_profile ~
 
3.1.2 Set up the Hadoop startup environment files (xxx-env.sh)
On hadoop1:
3.1.2.1 hadoop-env.sh
This file mainly holds the JVM memory settings, environment variables, and so on:
  export JAVA_HOME=/usr/local/java/jdk1.8.0_121
  export HADOOP_HOME=/opt/hadoop/hadoop-2.7.3
  # Maximum heap size of the Hadoop daemons (namenode/datanode/secondarynamenode etc.), default 1000 MB
  #export HADOOP_HEAPSIZE=
  # Initial heap size of the namenode, defaults to the value above; size it as needed
  #export HADOOP_NAMENODE_INIT_HEAPSIZE=""
  # JVM startup options, empty by default
  export HADOOP_OPTS="$HADOOP_OPTS -Djava.net.preferIPv4Stack=true"
  # Memory can also be configured per component:
  #export HADOOP_NAMENODE_OPTS=
  #export HADOOP_DATANODE_OPTS=
  #export HADOOP_SECONDARYNAMENODE_OPTS=
  # Hadoop log directory, default $HADOOP_HOME/logs
  export HADOOP_LOG_DIR=${HADOOP_LOG_DIR}/$USER
Set these according to your own sizing plan. Note that the namenode keeps its block map and namespace in the heap, so production clusters need a fairly large heap size. Also watch the total memory used by all components; in production leave 5-15% of RAM (typically about 10 GB) for the Linux OS itself.

None of these parameters are set here; size them as needed.
 
3.1.2.2 yarn-env.sh
  export JAVA_HOME=/usr/local/java/jdk1.8.0_121
  JAVA_HEAP_MAX=-Xmx1000m
  # YARN_HEAPSIZE=1000                          # heap size of the YARN daemons
  #export YARN_RESOURCEMANAGER_HEAPSIZE=1000    # heap size of the ResourceManager only
  #export YARN_TIMELINESERVER_HEAPSIZE=1000     # heap size of the TimelineServer only
  #export YARN_RESOURCEMANAGER_OPTS=            # extra JVM options for the ResourceManager
  #export YARN_NODEMANAGER_HEAPSIZE=1000        # heap size of the NodeManager only
  #export YARN_NODEMANAGER_OPTS=                # extra JVM options for the NodeManager
As with hadoop-env.sh, size these as needed.
 
3.1.3 Edit the Hadoop configuration files
On hadoop1:
3.1.3.1 core-site.xml
  <configuration>
    <property>
      <name>fs.defaultFS</name>
      <value>hdfs://hadoop1:9000</value>
      <description>HDFS address and port</description>
    </property>
    <property>
      <name>io.file.buffer.size</name>
      <value>131072</value>
    </property>
    <property>
      <name>hadoop.tmp.dir</name>
      <value>/opt/hadoop/hadoop-2.7.3/tmp</value>
      <description>defaults to /tmp/hadoop-${user.name}; point it at a persistent directory</description>
    </property>
  </configuration>
Create the directory as the hdfs user:

  mkdir ${HADOOP_HOME}/tmp
 
3.1.3.2 hdfs-site.xml
  <configuration>
    <property>
      <name>dfs.replication</name>
      <value>3</value>
      <description>number of block replicas</description>
    </property>
    <property>
      <name>dfs.namenode.name.dir</name>
      <value>/opt/hadoop/hadoop-2.7.3/namenodedir</value>
      <description>directory for namenode metadata; create it yourself</description>
    </property>
    <property>
      <name>dfs.blocksize</name>
      <value>134217728</value>
      <description>block size, 128 MB</description>
    </property>
    <property>
      <name>dfs.datanode.data.dir</name>
      <value>/opt/hadoop/hadoop-2.7.3/datadir</value>
      <description>datanode data directory</description>
    </property>
  </configuration>
Create the directories as the hdfs user:
  mkdir ${HADOOP_HOME}/datadir
  mkdir ${HADOOP_HOME}/namenodedir
 
3.1.3.3 mapred-site.xml
MapReduce job settings (values are the official recommendations):
  mapreduce.framework.name                  yarn       execution framework (run MR on YARN)
  mapreduce.map.memory.mb                   1536       resource limit for map tasks
  mapreduce.map.java.opts                   -Xmx1024M  heap size for the map child JVMs
  mapreduce.reduce.memory.mb                3072       resource limit for reduce tasks
  mapreduce.reduce.java.opts                -Xmx2560M  heap size for the reduce child JVMs
  mapreduce.task.io.sort.mb                 512        memory used while sorting data
  mapreduce.task.io.sort.factor             100        number of streams merged at once while sorting
  mapreduce.reduce.shuffle.parallelcopies   50         parallel copies run by reduces to fetch map outputs
JobHistory Server settings:
  mapreduce.jobhistory.address                 host:port of the JobHistory server, default port 10020
  mapreduce.jobhistory.webapp.address          host:port of the JobHistory web UI, default port 19888
  mapreduce.jobhistory.intermediate-done-dir   directory where MapReduce jobs write history files (default /mr-history/tmp)
  mapreduce.jobhistory.done-dir                directory where the JobHistory server manages history files (default /mr-history/done)

Only the following are configured here:
  <configuration>
    <property>
      <name>mapreduce.framework.name</name>
      <value>yarn</value>
      <description>let YARN manage MapReduce</description>
    </property>
    <property>
      <name>mapreduce.jobhistory.address</name>
      <value>hadoop2</value>
    </property>
    <property>
      <name>mapreduce.jobhistory.webapp.address</name>
      <value>hadoop2</value>
    </property>
    <property>
      <name>mapreduce.jobhistory.intermediate-done-dir</name>
      <value>/opt/hadoop/hadoop-2.7.3/mrHtmp</value>
    </property>
    <property>
      <name>mapreduce.jobhistory.done-dir</name>
      <value>/opt/hadoop/hadoop-2.7.3/mrhHdone</value>
    </property>
  </configuration>
Create the directories on hadoop2:
  mkdir ${HADOOP_HOME}/mrHtmp
  mkdir ${HADOOP_HOME}/mrhHdone
 
3.1.3.4 yarn-site.xml
yarn-site.xml has a large number of settings, most of them with defaults; see the official docs.
A few of the more important ones:
ResourceManager settings:
  yarn.resourcemanager.address                   host:port for clients to submit jobs; overrides yarn.resourcemanager.hostname if set
  yarn.resourcemanager.scheduler.address         host:port for ApplicationMasters to talk to the Scheduler; overrides yarn.resourcemanager.hostname
  yarn.resourcemanager.resource-tracker.address  host:port that NodeManagers report to; overrides yarn.resourcemanager.hostname
  yarn.resourcemanager.admin.address             host:port for administrative commands; overrides yarn.resourcemanager.hostname
  yarn.resourcemanager.webapp.address            host:port of the ResourceManager web UI; has a default
  yarn.resourcemanager.hostname                  single hostname that replaces all the yarn.resourcemanager.*address settings, using default ports
  yarn.resourcemanager.scheduler.class           scheduler class: CapacityScheduler (recommended), FairScheduler (also recommended), or FifoScheduler
  yarn.scheduler.minimum-allocation-mb           minimum memory (MB) allocated to each container
  yarn.scheduler.maximum-allocation-mb           maximum memory (MB) allocated to each container
  yarn.resourcemanager.nodes.include-path / yarn.resourcemanager.nodes.exclude-path   lists of permitted/excluded NodeManagers

NodeManager settings:
  yarn.nodemanager.resource.memory-mb         total physical memory (MB) on the NodeManager made available to running containers
  yarn.nodemanager.vmem-pmem-ratio            ratio by which a task's virtual memory may exceed its physical memory limit before it is killed
  yarn.nodemanager.local-dirs                 comma-separated local paths for intermediate data; multiple paths spread disk I/O
  yarn.nodemanager.log-dirs                   comma-separated local paths for logs; multiple paths spread disk I/O
  yarn.nodemanager.log.retain-seconds         10800 by default; time (seconds) to retain logs on the NodeManager (only if log aggregation is disabled)
  yarn.nodemanager.remote-app-log-dir         /logs by default; HDFS directory where application logs are moved on completion (only if log aggregation is enabled)
  yarn.nodemanager.remote-app-log-dir-suffix  logs by default; suffix appended to the remote log dir: ${yarn.nodemanager.remote-app-log-dir}/${user}/${thisParam}
  yarn.nodemanager.aux-services               mapreduce_shuffle; shuffle service required by MapReduce applications

YARN ACL settings:
  yarn.acl.enable               true/false; enable ACLs, defaults to false
  yarn.admin.acl                cluster admins, as comma-separated users, a space, then comma-separated groups (e.g. root,yarn); defaults to *, meaning anyone; a single space means no one
  yarn.log-aggregation-enable   false by default; enable log aggregation (collect logs onto one node)

In this experiment only yarn.resourcemanager.hostname (plus the NodeManager shuffle service) is set:
  <configuration>
    <!-- Site specific YARN configuration properties -->
    <property>
      <name>yarn.resourcemanager.hostname</name>
      <value>hadoop1</value>
      <description>the resourcemanager node</description>
    </property>
    <property>
      <name>yarn.nodemanager.aux-services</name>
      <value>mapreduce_shuffle</value>
      <description>aux service for the nodemanagers</description>
    </property>
  </configuration>
 
3.1.4 Set up the slaves file
On hadoop1, add the datanode/nodemanager hostnames to ${HADOOP_HOME}/etc/hadoop/slaves:
  vi $HADOOP_HOME/etc/hadoop/slaves
  hadoop3
  hadoop4
  hadoop5
  hadoop6
 
3.1.5 Start Hadoop
3.1.5.1 Copy the Hadoop installation to the other machines
On each of the other machines:
  mkdir /opt/hadoop
  chown hdfs:hadoop /opt/hadoop
  su hdfs
  scp -r hdfs@hadoop1:/opt/hadoop/hadoop-2.7.3 /opt/hadoop
 
3.1.5.2 Format the namenode
  $HADOOP_HOME/bin/hdfs namenode -format
 
3.1.5.3 Start HDFS
Pick a machine and run:
  [hdfs@9321a27a2b91 hadoop-2.7.3]$ start-dfs.sh
  Starting namenodes on [hadoop1]
  hadoop1: starting namenode, logging to /opt/hadoop/hadoop-2.7.3/logs/hadoop-hdfs-namenode-9321a27a2b91.out
  hadoop3: starting datanode, logging to /opt/hadoop/hadoop-2.7.3/logs/hadoop-hdfs-datanode-f89eaf2a2548.out
  hadoop4: starting datanode, logging to /opt/hadoop/hadoop-2.7.3/logs/hadoop-hdfs-datanode-28620eee1426.out
  hadoop5: starting datanode, logging to /opt/hadoop/hadoop-2.7.3/logs/hadoop-hdfs-datanode-ae1f06bd04c8.out
  hadoop6: starting datanode, logging to /opt/hadoop/hadoop-2.7.3/logs/hadoop-hdfs-datanode-11c433a003b6.out
  Starting secondary namenodes [0.0.0.0]
  0.0.0.0: starting secondarynamenode, logging to /opt/hadoop/hadoop-2.7.3/logs/hadoop-hdfs-secondarynamenode-9321a27a2b91.out
Run jps to check the processes:
  [hdfs@9321a27a2b91 hadoop]$ jps
  11105 Jps
  10981 SecondaryNameNode
  10777 NameNode
 
Verify HDFS:
  [hdfs@9321a27a2b91 hadoop-2.7.3]$ hdfs dfs -put NOTICE.txt /
  [hdfs@9321a27a2b91 hadoop-2.7.3]$ hdfs dfs -ls /
  Found 1 items
  -rw-r--r-- 3 hdfs supergroup 14978 2017-04-03 19:15 /NOTICE.txt
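The datanodes can also be checked from the command line (a quick sanity check):

  # Lists live datanodes and per-node capacity; all four DN hosts should show up
  hdfs dfsadmin -report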
Check the HDFS web page:
  [root@9321a27a2b91 hdfs]# curl hadoop1:50070
  <!--
  Licensed to the Apache Software Foundation (ASF) under one or more
  contributor license agreements. See the NOTICE file distributed with
  this work for additional information regarding copyright ownership.
  ................
Of course, with port mapping in place you can open the web UI in a browser.
For how to set up port mapping, see my other article "docker iptables port mapping".
 
 
 
3.1.5.4 Start YARN
The original plan was to start YARN as the yarn user, but that would mean setting up SSH and environment variables for yarn as well, which is a hassle, so YARN is started as the hdfs user too.
  [hdfs@9321a27a2b91 hadoop]$ start-yarn.sh
  starting yarn daemons
  starting resourcemanager, logging to /opt/hadoop/hadoop-2.7.3/logs/yarn-hdfs-resourcemanager-9321a27a2b91.out
  hadoop5: starting nodemanager, logging to /opt/hadoop/hadoop-2.7.3/logs/yarn-hdfs-nodemanager-ae1f06bd04c8.out
  hadoop6: starting nodemanager, logging to /opt/hadoop/hadoop-2.7.3/logs/yarn-hdfs-nodemanager-11c433a003b6.out
  hadoop3: starting nodemanager, logging to /opt/hadoop/hadoop-2.7.3/logs/yarn-hdfs-nodemanager-f89eaf2a2548.out
  hadoop4: starting nodemanager, logging to /opt/hadoop/hadoop-2.7.3/logs/yarn-hdfs-nodemanager-28620eee1426.out
Run jps to check the processes:
  [hdfs@9321a27a2b91 hadoop]$ jps
  11105 Jps
  10981 SecondaryNameNode
  10777 NameNode
  10383 ResourceManager
 
Check the YARN web page:
curl returned no data, but the web UI is reachable through the mapped port.
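NodeManager registration can be confirmed from the shell as well (a quick sanity check):

  # Lists the NodeManagers registered with the ResourceManager
  yarn node -list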
 
3.1.5.5 Test the cluster configuration
Use the Hadoop example program to verify that the cluster works:
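The job appears to be the grep example from the official single-node guide (30 splits over etc/hadoop); the exact jar invocation is elided in the transcript below, so the following is an assumed sketch rather than the captured command:

  # Run the grep example against the uploaded config files (the output directory must not exist yet)
  bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar grep input output 'dfs[a-z.]+'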
  [hdfs@9321a27a2b91 hadoop-2.7.3]$ bin/hdfs dfs -mkdir /user
  [hdfs@9321a27a2b91 hadoop-2.7.3]$ bin/hdfs dfs -mkdir /user/hdfs
  [hdfs@9321a27a2b91 hadoop-2.7.3]$ bin/hdfs dfs -put etc/hadoop input
  ...............
  17/04/12 12:38:24 INFO mapreduce.JobSubmitter: number of splits:30
  17/04/12 12:38:24 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1491968887469_0003
  17/04/12 12:38:24 INFO mapreduce.JobSubmitter: Cleaning up the staging area /tmp/hadoop-yarn/staging/hdfs/.staging/job_1491968887469_0003
  java.lang.IllegalArgumentException: Does not contain a valid host:port authority: hadoop2
  at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:213)
  ........................
Oops, an error!
The error message: the hadoop2 host cannot be found!
Wait, the host hadoop2? Shouldn't that be 7c3a3c9cd595? Damn. Quickly change every hadoop* hostname in the configs to the real Docker hostnames:
core-site.xml:
  <value>hdfs://hadoop1:9000</value>  changed to  <value>hdfs://9321a27a2b91:9000</value>
yarn-site.xml:
  <value>hadoop1</value>  changed to  <value>9321a27a2b91</value>
mapred-site.xml:
  <value>hadoop2</value>  changed to  <value>7c3a3c9cd595</value>
slaves file:
  hadoop3
  hadoop4
  hadoop5
  hadoop6
changed to:
  f89eaf2a2548
  28620eee1426
  ae1f06bd04c8
  11c433a003b6

Stop YARN and HDFS, then scp the configuration files under etc/hadoop to every node:
  scp -r hdfs@hadoop1:/opt/hadoop/hadoop-2.7.3/etc/hadoop/* /opt/hadoop/hadoop-2.7.3/etc/hadoop/
Start HDFS again:
  [hdfs@9321a27a2b91 hadoop]$ start-dfs.sh
  Starting namenodes on [9321a27a2b91]
  The authenticity of host '9321a27a2b91 (172.17.0.10)' can't be established.
  RSA key fingerprint is 60:0c:61:73:2c:49:ef:e3:f7:61:c9:27:93:5a:1d:c7.
  Are you sure you want to continue connecting (yes/no)?
Damn, SSH wants confirmation all over again! The passwordless setup is keyed on the hostname string, so 9321a27a2b91 and hadoop1 are apparently not treated as the same host. I have no words.
 
3.1.5.6 Start the HDFS daemons on each node
Fine, start the HDFS daemons on each node individually.
On 9321a27a2b91 (hadoop1), start the namenode:
  $HADOOP_HOME/sbin/hadoop-daemon.sh --config $HADOOP_CONF_DIR --script hdfs start namenode
Start the datanodes on these nodes:
  f89eaf2a2548 hadoop3
  28620eee1426 hadoop4
  ae1f06bd04c8 hadoop5
  11c433a003b6 hadoop6
  [hdfs@11c433a003b6 hadoop-2.7.3]$ $HADOOP_HOME/sbin/hadoop-daemons.sh start datanode
  The authenticity of host '28620eee1426 (172.17.0.7)' can't be established.
  RSA key fingerprint is 60:0c:61:73:2c:49:ef:e3:f7:61:c9:27:93:5a:1d:c7.
  Are you sure you want to continue connecting (yes/no)? The authenticity of host '11c433a003b6 (172.17.0.5)' can't be established.
  RSA key fingerprint is 60:0c:61:73:2c:49:ef:e3:f7:61:c9:27:93:5a:1d:c7.
  Are you sure you want to continue connecting (yes/no)? The authenticity of host 'ae1f06bd04c8 (172.17.0.6)' can't be established.
  RSA key fingerprint is 60:0c:61:73:2c:49:ef:e3:f7:61:c9:27:93:5a:1d:c7.
  Are you sure you want to continue connecting (yes/no)? f89eaf2a2548: datanode running as process 5764. Stop it first.
What, running it on one node tries to start all the other nodes too! OK, hadoop-daemons.sh clearly reads the slaves file. So remove the slaves file on the datanode (rm $HADOOP_HOME/etc/hadoop/slaves) and start the datanode again:
  [hdfs@11c433a003b6 hadoop-2.7.3]$ $HADOOP_HOME/sbin/hadoop-daemons.sh --config $HADOOP_CONF_DIR --script hdfs start datanode
  cat: /opt/hadoop/hadoop-2.7.3/etc/hadoop/slaves: No such file or directory
Still an error. Unbelievable! Let's look at what is inside $HADOOP_HOME/sbin/hadoop-daemons.sh:
  usage="Usage: hadoop-daemons.sh [--config confdir] [--hosts hostlistfile] [start|stop] command args..."
  # if no args specified, show usage
  if [ $# -le 1 ]; then
    echo $usage
    exit 1
  fi
  bin=`dirname "${BASH_SOURCE-$0}"`
  bin=`cd "$bin"; pwd`
  DEFAULT_LIBEXEC_DIR="$bin"/../libexec
  HADOOP_LIBEXEC_DIR=${HADOOP_LIBEXEC_DIR:-$DEFAULT_LIBEXEC_DIR}
  . $HADOOP_LIBEXEC_DIR/hadoop-config.sh
  exec "$bin/slaves.sh" --config $HADOOP_CONF_DIR cd "$HADOOP_PREFIX" \; "$bin/hadoop-daemon.sh" --config $HADOOP_CONF_DIR "$@"
So it just wraps slaves.sh around hadoop-daemon.sh!!! Single-node startup obviously needs hadoop-daemon.sh, yet the official docs tell us to use hadoop-daemons.sh. Thanks a lot.
OK, start it with hadoop-daemon.sh instead:
  $HADOOP_HOME/sbin/hadoop-daemon.sh --config $HADOOP_CONF_DIR --script hdfs start datanode
And it finally starts successfully.
 
3.1.5.7 Start the YARN daemons on each node

On 9321a27a2b91 (hadoop1), start the resourcemanager:

  [hdfs@9321a27a2b91 hadoop]$ $HADOOP_YARN_HOME/sbin/yarn-daemon.sh --config $HADOOP_CONF_DIR start resourcemanager
  starting resourcemanager, logging to /opt/hadoop/hadoop-2.7.3/logs/yarn-hdfs-resourcemanager-9321a27a2b91.out
Start the nodemanagers on these nodes:
  f89eaf2a2548 hadoop3
  28620eee1426 hadoop4
  ae1f06bd04c8 hadoop5
  11c433a003b6 hadoop6
  [hdfs@f89eaf2a2548 hadoop-2.7.3]$ $HADOOP_YARN_HOME/sbin/yarn-daemon.sh --config $HADOOP_CONF_DIR start nodemanager
  starting nodemanager, logging to /opt/hadoop/hadoop-2.7.3/logs/yarn-hdfs-nodemanager-f89eaf2a2548.out
Again, the official docs point at the wrong script (yarn-daemons.sh instead of yarn-daemon.sh). Argh!

YARN is now up as well.
Run the example program again:
  [hdfs@9321a27a2b91 hadoop-2.7.3]$ bin/hdfs dfs -put etc/hadoop input
  ...............
  17/04/12 12:38:24 INFO mapreduce.JobSubmitter: number of splits:30
  17/04/12 12:38:24 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1491968887469_0003
  17/04/12 12:38:24 INFO mapreduce.JobSubmitter: Cleaning up the staging area /tmp/hadoop-yarn/staging/hdfs/.staging/job_1491968887469_0003
  java.lang.IllegalArgumentException: Does not contain a valid host:port authority: hadoop2
  at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:213)
  ........................
The same old problem, damn it! So it is not the hostname alias issue after all. It must be these settings:
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>7c3a3c9cd595</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>7c3a3c9cd595</value>
  </property>
The port has to be included, so change them to:
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>7c3a3c9cd595:10020</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>7c3a3c9cd595:19888</value>
  </property>
Distribute the configuration to all the nodes:
  scp -r hdfs@hadoop1:/opt/hadoop/hadoop-2.7.3/etc/hadoop/* /opt/hadoop/hadoop-2.7.3/etc/hadoop/
Restart HDFS and YARN and test again. This time that error is gone!
Why was the port left out the first time? Because a Hadoop 2.6.3 install did not need the JobHistory port written out, so... I got lazy.
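For those addresses to actually answer, the JobHistory server itself also has to be running on hadoop2; a sketch using the standard daemon script shipped in the Hadoop 2.7.3 sbin directory:

  # Start the MapReduce JobHistory server on the node named in mapreduce.jobhistory.address
  $HADOOP_PREFIX/sbin/mr-jobhistory-daemon.sh --config $HADOOP_CONF_DIR start historyserver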
That error is gone, but a different one shows up:
  2017-04-03 19:13:12,328 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: IOException in offerService
  java.io.EOFException: End of File Exception between local host is: "ae1f06bd04c8/172.17.0.6"; destination host is: "hadoop1":9000; : java.io.EOFException; For more details see: http://wiki.apache.org/hadoop/EOFException
  at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
  at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
  at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
This shows up in the worker-node logs: I/O fails, and three of the four nodemanagers die. When only one nodemanager is running, the job completes successfully. That points to disk contention between the Docker containers on the same host machine: once one process grabs the disk, the others cannot use it. Next time I will try mounting a separate directory into each container.
 

4. Summary

4.1. Hadoop installation steps

It is actually quite simple; most of the work is environment setup:
1) Install the JDK
2) Create the hadoop group and the hdfs user (add yarn too if you like)
3) Set environment variables in /home/hdfs/.bash_profile and /etc/hosts
4) Disable the firewall
5) Adjust ulimit
6) Configure SSH (optional, as long as you are willing to start the daemons on every node by hand)
7) Install NTP
8) Download and unpack the Hadoop package
9) Edit the Hadoop configuration files and start HDFS and YARN

4.2. Problem summary

1) The JobHistory addresses must include the port (why could 2.6.3 do without it?)
2) When starting daemons on individual nodes (instead of start-dfs.sh / start-yarn.sh), use hadoop-daemon.sh, not hadoop-daemons.sh!

ZooKeeper is not installed yet; it will get its own section once the Docker disk-contention issue is sorted out.
Planned next steps:
1. Solve the Docker disk contention
2. Install ZooKeeper
3. Install HDFS HA
4. Install Hive
5. Install HBase
6. Install Kafka
7. Install Solr
8. Install ES
Some of this may change, of course.
 
 
 
Postscript:
The next day I rebuilt the Hadoop cluster from scratch, because I had sorted out four things in Docker: iptables port mapping, container hostnames, fixed IPs, and mounted directories. The new cluster uses these IPs:
172.18.0.11
172.18.0.12
172.18.0.13
172.18.0.14
172.18.0.15
172.18.0.16
The roles are the same as above. Future updates will be based on the new deployment.

Note: cnblog supports markdown, so future posts will be written in markdown format.
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 


