Hadoop boils down to two main pieces: the distributed file system HDFS and the MapReduce computing model. Below I walk through the process I used to set up a Hadoop environment.
Hadoop test environment
- 4 test machines in total: 1 namenode and 3 datanodes
- OS version: RHEL 5.5 x86_64
- Hadoop: 0.20.203.0
- JDK: jdk1.7.0
- Role        IP address
- namenode 192.168.57.75
- datanode1 192.168.57.76
- datanode2 192.168.57.78
- datanode3 192.168.57.79
I. Preparations before deploying Hadoop
- 1 Hadoop depends on Java and SSH
- Java 1.5.x or later must be installed.
- ssh must be installed and sshd must be kept running so that the Hadoop scripts can manage the remote Hadoop daemons.
- 2 Create a common Hadoop account
- All nodes should use the same user name; it can be added with the following commands:
- useradd hadoop
- passwd hadoop
- 3 Configure host names in /etc/hosts
- tail -n 3 /etc/hosts
- 192.168.57.75 namenode
- 192.168.57.76 datanode1
- 192.168.57.78 datanode2
- 192.168.57.79 datanode3
- 4 All of the above must be configured identically on every node (namenode and datanodes); a scripted sketch for steps 2 and 3 follows this list.
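Steps 2 and 3 have to be repeated on every machine, so they can also be pushed out from the namenode in a loop. This is only a sketch, assuming root can SSH into each node and that the literal password "hadoop" is acceptable for a test cluster:
- # run as root on the namenode; assumes root SSH access to every node
- for node in datanode1 datanode2 datanode3; do
-     ssh root@"$node" "useradd hadoop"                       # create the common account
-     ssh root@"$node" "echo hadoop | passwd --stdin hadoop"  # set the password (passwd --stdin is available on RHEL)
-     scp /etc/hosts root@"$node":/etc/hosts                  # keep the host name mapping identical everywhere
- done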
II. SSH configuration
For a detailed introduction to SSH, see the separate write-up.
- 1 Generate the private key id_rsa and the public key id_rsa.pub
- [hadoop@hadoop1 ~]$ ssh-keygen -t rsa
- Generating public/private rsa key pair.
- Enter file in which to save the key (/home/hadoop/.ssh/id_rsa):
- Enter passphrase (empty for no passphrase):
- Enter same passphrase again:
- Your identification has been saved in /home/hadoop/.ssh/id_rsa.
- Your public key has been saved in /home/hadoop/.ssh/id_rsa.pub.
- The key fingerprint is:
- d6:63:76:43:e2:5b:8e:85:ab:67:a2:7c:a6:8f:23:f9 hadoop@hadoop1.test.com
- 2 Confirm the private key id_rsa and public key id_rsa.pub were created
- [hadoop@hadoop1 ~]$ ls .ssh/
- authorized_keys id_rsa id_rsa.pub known_hosts
- 3 Upload the public key to the datanode servers (a looped sketch follows this list)
- [hadoop@hadoop1 ~]$ ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@datanode1
- hadoop@datanode1's password:
- Now try logging into the machine, with "ssh 'hadoop@datanode1'", and check in:
- .ssh/authorized_keys
- to make sure we haven't added extra keys that you weren't expecting.
- [hadoop@hadoop1 ~]$ ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@datanode2
- hadoop@datanode2's password:
- Now try logging into the machine, with "ssh 'hadoop@datanode2'", and check in:
- .ssh/authorized_keys
- to make sure we haven't added extra keys that you weren't expecting.
- [hadoop@hadoop1 ~]$ ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@datanode3
- hadoop@datanode3's password:
- Now try logging into the machine, with "ssh 'hadoop@datanode3'", and check in:
- .ssh/authorized_keys
- to make sure we haven't added extra keys that you weren't expecting.
- [hadoop@hadoop1 ~]$ ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@localhost
- hadoop@localhost's password:
- Now try logging into the machine, with "ssh 'hadoop@localhost'", and check in:
- .ssh/authorized_keys
- to make sure we haven't added extra keys that you weren't expecting.
- 4 Verify passwordless login
- [hadoop@hadoop1 ~]$ ssh datanode1
- Last login: Thu Feb 2 09:01:16 2012 from 192.168.57.71
- [hadoop@hadoop2 ~]$ exit
- logout
- [hadoop@hadoop1 ~]$ ssh datanode2
- Last login: Thu Feb 2 09:01:18 2012 from 192.168.57.71
- [hadoop@hadoop3 ~]$ exit
- logout
- [hadoop@hadoop1 ~]$ ssh datanode3
- Last login: Thu Feb 2 09:01:20 2012 from 192.168.57.71
- [hadoop@hadoop4 ~]$ exit
- logout
- [hadoop@hadoop1 ~]$ ssh localhost
- Last login: Thu Feb 2 09:01:24 2012 from 192.168.57.71
- [hadoop@hadoop1 ~]$ exit
- logout
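The four ssh-copy-id calls and the manual logins above can be collapsed into two loops. A minimal sketch, assuming the key pair from step 1 already exists and the hadoop password is typed at each prompt:
- # run as the hadoop user on the namenode
- for node in datanode1 datanode2 datanode3 localhost; do
-     ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@"$node"   # prompts for the password once per node
- done
- # verification: each line should print the remote host name with no password prompt
- for node in datanode1 datanode2 datanode3 localhost; do
-     ssh hadoop@"$node" hostname
- done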
III. Java environment configuration
- 1 Download a suitable JDK
- // this is the RPM package for 64-bit Linux systems
- wget http://download.oracle.com/otn-pub/java/jdk/7/jdk-7-linux-x64.rpm
- 2 Install the JDK
- rpm -ivh jdk-7-linux-x64.rpm
- 3 Verify the Java installation
- [root@hadoop1 ~]# java -version
- java version "1.7.0"
- Java(TM) SE Runtime Environment (build 1.7.0-b147)
- Java HotSpot(TM) 64-Bit Server VM (build 21.0-b17, mixed mode)
- [root@hadoop1 ~]# ls /usr/java/
- default jdk1.7.0 latest
- 4 Configure the Java environment variables
- # vim /etc/profile // add the following lines to the profile:
- #add for hadoop
- export JAVA_HOME=/usr/java/jdk1.7.0
- export CLASSPATH=.:$JAVA_HOME/jre/lib/rt.jar:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/
- export PATH=$PATH:$JAVA_HOME/bin
- // make the environment variables take effect
- source /etc/profile
- 5 Copy /etc/profile to the datanodes
- [root@hadoop1 src]# scp /etc/profile root@datanode1:/etc/
- The authenticity of host 'datanode1 (192.168.57.86)' can't be established.
- RSA key fingerprint is b5:00:d1:df:73:4c:94:f1:ea:1f:b5:cd:ed:3a:cc:e1.
- Are you sure you want to continue connecting (yes/no)? yes
- Warning: Permanently added 'datanode1,192.168.57.86' (RSA) to the list of known hosts.
- root@datanode1's password:
- profile 100% 1624 1.6KB/s 00:00
- [root@hadoop1 src]# scp /etc/profile root@datanode2:/etc/
- The authenticity of host 'datanode2 (192.168.57.87)' can't be established.
- RSA key fingerprint is 57:cf:96:15:78:a3:94:93:30:16:8e:66:47:cd:f9:cd.
- Are you sure you want to continue connecting (yes/no)? yes
- Warning: Permanently added 'datanode2,192.168.57.87' (RSA) to the list of known hosts.
- root@datanode2's password:
- profile 100% 1624 1.6KB/s 00:00
- [root@hadoop1 src]# scp /etc/profile root@datanode3:/etc/
- The authenticity of host 'datanode3 (192.168.57.88)' can't be established.
- RSA key fingerprint is 31:73:e8:3c:20:0c:1e:b2:59:5c:d1:01:4b:26:41:70.
- Are you sure you want to continue connecting (yes/no)? yes
- Warning: Permanently added 'datanode3,192.168.57.88' (RSA) to the list of known hosts.
- root@datanode3's password:
- profile 100% 1624 1.6KB/s 00:00
- 6 Copy the JDK package to every datanode and install it there (a scripted sketch follows this list)
- [root@hadoop1 ~]# scp -r /home/hadoop/src/ hadoop@datanode1:/home/hadoop/
- hadoop@datanode1's password:
- hadoop-0.20.203.0rc1.tar.gz 100% 58MB 57.8MB/s 00:01
- jdk-7-linux-x64.rpm 100% 78MB 77.9MB/s 00:01
- [root@hadoop1 ~]# scp -r /home/hadoop/src/ hadoop@datanode2:/home/hadoop/
- hadoop@datanode2's password:
- hadoop-0.20.203.0rc1.tar.gz 100% 58MB 57.8MB/s 00:01
- jdk-7-linux-x64.rpm 100% 78MB 77.9MB/s 00:01
- [root@hadoop1 ~]# scp -r /home/hadoop/src/ hadoop@datanode3:/home/hadoop/
- hadoop@datanode3's password:
- hadoop-0.20.203.0rc1.tar.gz 100% 58MB 57.8MB/s 00:01
- jdk-7-linux-x64.rpm 100% 78MB 77.9MB/s 00:01
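Steps 5 and 6 can be scripted the same way. The sketch below assumes the JDK RPM and the Hadoop tarball sit in /home/hadoop/src on the namenode, and that the root and hadoop passwords are entered at the prompts:
- for node in datanode1 datanode2 datanode3; do
-     scp /etc/profile root@"$node":/etc/                     # push the environment variables
-     scp -r /home/hadoop/src/ hadoop@"$node":/home/hadoop/   # push the JDK RPM and the Hadoop tarball
-     ssh root@"$node" "rpm -ivh /home/hadoop/src/jdk-7-linux-x64.rpm && java -version"   # install and verify the JDK
- done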
IV. Hadoop configuration
// Note: perform these steps as the hadoop user
- 1 Installation directory layout
- [hadoop@hadoop1 ~]$ pwd
- /home/hadoop
- [hadoop@hadoop1 ~]$ ll
- total 59220
- lrwxrwxrwx 1 hadoop hadoop 17 Feb 1 16:59 hadoop -> hadoop-0.20.203.0
- drwxr-xr-x 12 hadoop hadoop 4096 Feb 1 17:31 hadoop-0.20.203.0
- -rw-r--r-- 1 hadoop hadoop 60569605 Feb 1 14:24 hadoop-0.20.203.0rc1.tar.gz
- 2 Configure hadoop-env.sh to point at the Java installation
- vim hadoop/conf/hadoop-env.sh
- export JAVA_HOME=/usr/java/jdk1.7.0
- 3 Configure core-site.xml // tells the file system where the namenode is
- [hadoop@hadoop1 ~]$ cat hadoop/conf/core-site.xml
- <?xml version="1.0"?>
- <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
- <!-- Put site-specific property overrides in this file. -->
- <configuration>
- <property>
- <name>fs.default.name</name>
- <value>hdfs://namenode:9000</value>
- </property>
- </configuration>
- 4 Configure mapred-site.xml // tells MapReduce which master node runs the jobtracker
- [hadoop@hadoop1 ~]$ cat hadoop/conf/mapred-site.xml
- <?xml version="1.0"?>
- <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
- <!-- Put site-specific property overrides in this file. -->
- <configuration>
- <property>
- <name>mapred.job.tracker</name>
- <value>namenode:9001</value>
- </property>
- </configuration>
- 5 Configure hdfs-site.xml // sets the HDFS replication factor
- [hadoop@hadoop1 ~]$ cat hadoop/conf/hdfs-site.xml
- <?xml version="1.0"?>
- <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
- <!-- Put site-specific property overrides in this file. -->
- <configuration>
- <property>
- <name>dfs.replication</name>
- <value>3</value>
- </property>
- </configuration>
- 6 Configure the masters and slaves files
- [hadoop@hadoop1 ~]$ cat hadoop/conf/masters
- namenode
- [hadoop@hadoop1 ~]$ cat hadoop/conf/slaves
- datanode1
- datanode2
- datanode3
- 7 Copy the hadoop directory to every datanode
- [hadoop@hadoop1 ~]$ scp -r hadoop hadoop@datanode1:/home/hadoop/
- [hadoop@hadoop1 ~]$ scp -r hadoop hadoop@datanode2:/home/hadoop/
- [hadoop@hadoop1 ~]$ scp -r hadoop hadoop@datanode3:/home/hadoop
- 8 Format HDFS
- [hadoop@hadoop1 hadoop]$ bin/hadoop namenode -format
- 12/02/02 11:31:15 INFO namenode.NameNode: STARTUP_MSG:
- /************************************************************
- STARTUP_MSG: Starting NameNode
- STARTUP_MSG: host = hadoop1.test.com/127.0.0.1
- STARTUP_MSG: args = [-format]
- STARTUP_MSG: version = 0.20.203.0
- STARTUP_MSG: build = http://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20-security-203 -r 1099333; compiled by 'oom' on Wed May 4 07:57:50 PDT 2011
- ************************************************************/
- Re-format filesystem in /tmp/hadoop-hadoop/dfs/name ? (Y or N) Y // enter Y here
- 12/02/02 11:31:17 INFO util.GSet: VM type = 64-bit
- 12/02/02 11:31:17 INFO util.GSet: 2% max memory = 19.33375 MB
- 12/02/02 11:31:17 INFO util.GSet: capacity = 2^21 = 2097152 entries
- 12/02/02 11:31:17 INFO util.GSet: recommended=2097152, actual=2097152
- 12/02/02 11:31:17 INFO namenode.FSNamesystem: fsOwner=hadoop
- 12/02/02 11:31:18 INFO namenode.FSNamesystem: supergroup=supergroup
- 12/02/02 11:31:18 INFO namenode.FSNamesystem: isPermissionEnabled=true
- 12/02/02 11:31:18 INFO namenode.FSNamesystem: dfs.block.invalidate.limit=100
- 12/02/02 11:31:18 INFO namenode.FSNamesystem: isAccessTokenEnabled=false accessKeyUpdateInterval=0 min(s), accessTokenLifetime=0 min(s)
- 12/02/02 11:31:18 INFO namenode.NameNode: Caching file names occuring more than 10 times
- 12/02/02 11:31:18 INFO common.Storage: Image file of size 112 saved in 0 seconds.
- 12/02/02 11:31:18 INFO common.Storage: Storage directory /tmp/hadoop-hadoop/dfs/name has been successfully formatted.
- 12/02/02 11:31:18 INFO namenode.NameNode: SHUTDOWN_MSG:
- /************************************************************
- SHUTDOWN_MSG: Shutting down NameNode at hadoop1.test.com/127.0.0.1
- ************************************************************/
- [hadoop@hadoop1 hadoop]$
- 9 Start the Hadoop daemons
- [hadoop@hadoop1 hadoop]$ bin/start-all.sh
- starting namenode, logging to /home/hadoop/hadoop/bin/../logs/hadoop-hadoop-namenode-hadoop1.test.com.out
- datanode1: starting datanode, logging to /home/hadoop/hadoop/bin/../logs/hadoop-hadoop-datanode-hadoop2.test.com.out
- datanode2: starting datanode, logging to /home/hadoop/hadoop/bin/../logs/hadoop-hadoop-datanode-hadoop3.test.com.out
- datanode3: starting datanode, logging to /home/hadoop/hadoop/bin/../logs/hadoop-hadoop-datanode-hadoop4.test.com.out
- starting jobtracker, logging to /home/hadoop/hadoop/bin/../logs/hadoop-hadoop-jobtracker-hadoop1.test.com.out
- datanode1: starting tasktracker, logging to /home/hadoop/hadoop/bin/../logs/hadoop-hadoop-tasktracker-hadoop2.test.com.out
- datanode2: starting tasktracker, logging to /home/hadoop/hadoop/bin/../logs/hadoop-hadoop-tasktracker-hadoop3.test.com.out
- datanode3: starting tasktracker, logging to /home/hadoop/hadoop/bin/../logs/hadoop-hadoop-tasktracker-hadoop4.test.com.out
- 10 Verify the daemons (a looped check is sketched after this list)
- //namenode
- [hadoop@hadoop1 logs]$ jps
- 2883 JobTracker
- 3002 Jps
- 2769 NameNode
- //datanode
- [hadoop@hadoop2 ~]$ jps
- 2743 TaskTracker
- 2670 DataNode
- 2857 Jps
- [hadoop@hadoop3 ~]$ jps
- 2742 TaskTracker
- 2856 Jps
- 2669 DataNode
- [hadoop@hadoop4 ~]$ jps
- 2742 TaskTracker
- 2852 Jps
- 2659 DataNode
- Hadoop monitoring web UI:
- http://192.168.57.75:50070/dfshealth.jsp
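Rather than logging in to every node to run jps, the checks in step 10 can be driven from the namenode over the passwordless SSH set up earlier. A small sketch; the explicit source is needed because a non-interactive SSH session does not read /etc/profile:
- # expect NameNode and JobTracker locally, DataNode and TaskTracker on each datanode
- jps
- for node in datanode1 datanode2 datanode3; do
-     echo "== $node =="
-     ssh hadoop@"$node" "source /etc/profile; jps"
- done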
V. A simple HDFS verification
- The Hadoop file system commands take the following form (a combined smoke-test sketch appears after the command listing below):
- hadoop fs -cmd <args>
- // create a directory
- [hadoop@hadoop1 hadoop]$ bin/hadoop fs -mkdir /test-hadoop
- // list a directory
- [hadoop@hadoop1 hadoop]$ bin/hadoop fs -ls /
- Found 2 items
- drwxr-xr-x - hadoop supergroup 0 2012-02-02 13:32 /test-hadoop
- drwxr-xr-x - hadoop supergroup 0 2012-02-02 11:32 /tmp
- // list a directory recursively, including subdirectories
- [hadoop@hadoop1 hadoop]$ bin/hadoop fs -lsr /
- drwxr-xr-x - hadoop supergroup 0 2012-02-02 13:32 /test-hadoop
- drwxr-xr-x - hadoop supergroup 0 2012-02-02 11:32 /tmp
- drwxr-xr-x - hadoop supergroup 0 2012-02-02 11:32 /tmp/hadoop-hadoop
- drwxr-xr-x - hadoop supergroup 0 2012-02-02 11:32 /tmp/hadoop-hadoop/mapred
- drwx------ - hadoop supergroup 0 2012-02-02 11:32 /tmp/hadoop-hadoop/mapred/system
- -rw------- 2 hadoop supergroup 4 2012-02-02 11:32 /tmp/hadoop-hadoop/mapred/system/jobtracker.info
- // upload a file
- [hadoop@hadoop1 hadoop]$ bin/hadoop fs -put /home/hadoop/hadoop-0.20.203.0rc1.tar.gz /test-hadoop
- [hadoop@hadoop1 hadoop]$ bin/hadoop fs -lsr /
- drwxr-xr-x - hadoop supergroup 0 2012-02-02 13:34 /test-hadoop
- -rw-r--r-- 2 hadoop supergroup 60569605 2012-02-02 13:34 /test-hadoop/hadoop-0.20.203.0rc1.tar.gz
- drwxr-xr-x - hadoop supergroup 0 2012-02-02 11:32 /tmp
- drwxr-xr-x - hadoop supergroup 0 2012-02-02 11:32 /tmp/hadoop-hadoop
- drwxr-xr-x - hadoop supergroup 0 2012-02-02 11:32 /tmp/hadoop-hadoop/mapred
- drwx------ - hadoop supergroup 0 2012-02-02 11:32 /tmp/hadoop-hadoop/mapred/system
- -rw------- 2 hadoop supergroup 4 2012-02-02 11:32 /tmp/hadoop-hadoop/mapred/system/jobtracker.info
- // fetch a file
- [hadoop@hadoop1 hadoop]$ bin/hadoop fs -get /test-hadoop/hadoop-0.20.203.0rc1.tar.gz /tmp/
- [hadoop@hadoop1 hadoop]$ ls /tmp/*.tar.gz
- /tmp/1.tar.gz /tmp/hadoop-0.20.203.0rc1.tar.gz
- // delete a file
- [hadoop@hadoop1 hadoop]$ bin/hadoop fs -rm /test-hadoop/hadoop-0.20.203.0rc1.tar.gz
- Deleted hdfs://namenode:9000/test-hadoop/hadoop-0.20.203.0rc1.tar.gz
- [hadoop@hadoop1 hadoop]$ bin/hadoop fs -lsr /
- drwxr-xr-x - hadoop supergroup 0 2012-02-02 13:57 /test-hadoop
- drwxr-xr-x - hadoop supergroup 0 2012-02-02 11:32 /tmp
- drwxr-xr-x - hadoop supergroup 0 2012-02-02 11:32 /tmp/hadoop-hadoop
- drwxr-xr-x - hadoop supergroup 0 2012-02-02 11:32 /tmp/hadoop-hadoop/mapred
- drwx------ - hadoop supergroup 0 2012-02-02 11:32 /tmp/hadoop-hadoop/mapred/system
- -rw------- 2 hadoop supergroup 4 2012-02-02 11:32 /tmp/hadoop-hadoop/mapred/system/jobtracker.info
- drwxr-xr-x - hadoop supergroup 0 2012-02-02 13:36 /user
- -rw-r--r-- 2 hadoop supergroup 321 2012-02-02 13:36 /user/hadoop
- // delete a directory
- [hadoop@hadoop1 hadoop]$ bin/hadoop fs -rmr /test-hadoop
- Deleted hdfs://namenode:9000/test-hadoop
- [hadoop@hadoop1 hadoop]$ bin/hadoop fs -lsr /
- drwxr-xr-x - hadoop supergroup 0 2012-02-02 11:32 /tmp
- drwxr-xr-x - hadoop supergroup 0 2012-02-02 11:32 /tmp/hadoop-hadoop
- drwxr-xr-x - hadoop supergroup 0 2012-02-02 11:32 /tmp/hadoop-hadoop/mapred
- drwx------ - hadoop supergroup 0 2012-02-02 11:32 /tmp/hadoop-hadoop/mapred/system
- -rw------- 2 hadoop supergroup 4 2012-02-02 11:32 /tmp/hadoop-hadoop/mapred/system/jobtracker.info
- drwxr-xr-x - hadoop supergroup 0 2012-02-02 13:36 /user
- -rw-r--r-- 2 hadoop supergroup 321 2012-02-02 13:36 /user/hadoop
- // hadoop fs help (excerpt)
- [hadoop@hadoop1 hadoop]$ bin/hadoop fs -help
- hadoop fs is the command to execute fs commands. The full syntax is:
- hadoop fs [-fs <local | file system URI>] [-conf <configuration file>]
- [-D <property=value>] [-ls <path>] [-lsr <path>] [-du <path>]
- [-dus <path>] [-mv <src> <dst>] [-cp <src> <dst>] [-rm [-skipTrash] <src>]
- [-rmr [-skipTrash] <src>] [-put <localsrc> ... <dst>] [-copyFromLocal <localsrc> ... <dst>]
- [-moveFromLocal <localsrc> ... <dst>] [-get [-ignoreCrc] [-crc] <src> <localdst>
- [-getmerge <src> <localdst> [addnl]] [-cat <src>]
- [-copyToLocal [-ignoreCrc] [-crc] <src> <localdst>] [-moveToLocal <src> <localdst>]
- [-mkdir <path>] [-report] [-setrep [-R] [-w] <rep> <path/file>]
- [-touchz <path>] [-test -[ezd] <path>] [-stat [format] <path>]
- [-tail [-f] <path>] [-text <path>]
- [-chmod [-R] <MODE[,MODE]... | OCTALMODE> PATH...]
- [-chown [-R] [OWNER][:[GROUP]] PATH...]
- [-chgrp [-R] GROUP PATH...]
- [-count[-q] <path>]
- [-help [cmd]]
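The individual commands above can also be strung together into a repeatable smoke test. A sketch that round-trips the Hadoop tarball through HDFS and compares checksums (an extra step not shown above); it assumes /test-hadoop does not already exist and that it is run from /home/hadoop/hadoop on the namenode:
- #!/bin/bash
- # round-trip a file through HDFS and verify it comes back unchanged
- cd /home/hadoop/hadoop || exit 1
- FILE=/home/hadoop/hadoop-0.20.203.0rc1.tar.gz
- bin/hadoop fs -mkdir /test-hadoop
- bin/hadoop fs -put "$FILE" /test-hadoop/
- bin/hadoop fs -ls /test-hadoop
- bin/hadoop fs -get "/test-hadoop/$(basename "$FILE")" /tmp/roundtrip.tar.gz
- md5sum "$FILE" /tmp/roundtrip.tar.gz    # the two checksums should match
- bin/hadoop fs -rmr /test-hadoop         # clean up HDFS
- rm -f /tmp/roundtrip.tar.gz             # clean up the local copy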
Conclusion
Setting up a Hadoop environment takes quite a few fiddly steps and some Linux system knowledge. Note that the environment built with the steps above is only enough to get a rough feel for Hadoop; if you want to use HDFS for a production service, the Hadoop configuration files need further tuning. Follow-up documents will be published as future blog posts, so stay tuned.