This article is also published on my blog: liaosi's blog - Hadoop (2): Hadoop's Hello World (installation and usage in standalone mode). The examples in this article use a VMware virtual machine running 64-bit CentOS 7, with Hadoop 2.8.2 and JDK 1.8, logged in as the hadoop account created earlier (see Hadoop (1): Introduction to Hadoop and preparation before installation).
Before installing Hadoop, make sure the Java JDK is already installed on the system and the Java environment variables are configured.
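You can confirm this quickly (assuming java is already on the PATH):

java -version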
Hadoop has three startup modes: standalone (local) mode, pseudo-distributed mode, and fully distributed mode.
This article walks through an example in standalone mode.
1. Download Hadoop from the official site at http://hadoop.apache.org/rele... and extract it to a directory on the server (here I am logged in as the hadoop user and extract it to the ${HOME}/app directory).
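As a rough sketch of this step, the commands below download and unpack the release; the archive.apache.org mirror URL is an assumption, so use whatever link the release page actually gives you:

cd ${HOME}/app
# Download the Hadoop 2.8.2 binary release (mirror URL assumed).
wget https://archive.apache.org/dist/hadoop/common/hadoop-2.8.2/hadoop-2.8.2.tar.gz
# Unpack it into the current directory, producing hadoop-2.8.2/.
tar -zxvf hadoop-2.8.2.tar.gz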
2. Configure the Java installation directory in Hadoop's runtime environment file.
Edit the ${HADOOP_HOME}/etc/hadoop/hadoop-env.sh file and set JAVA_HOME to the root path of the Java installation.
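For example, the line in hadoop-env.sh might end up looking like this; the JDK path below is an assumption, so substitute the actual path on your machine:

# In ${HADOOP_HOME}/etc/hadoop/hadoop-env.sh (JDK path is an example, not a given).
export JAVA_HOME=/usr/java/jdk1.8.0_152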
3. Configure Hadoop's environment variables.
Add the following to the /etc/profile file:
export HADOOP_HOME=/home/hadoop/app/hadoop-2.8.2
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
For example, this is exactly how my own /etc/profile is set up (screenshot of the file omitted here).
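After saving /etc/profile, reload it so the current shell picks up the new variables:

source /etc/profile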
4. Run the hadoop version command to verify that the environment variables are configured correctly. If all is well, you will see output similar to the following:
[hadoop@server01 hadoop]$ hadoop version
Hadoop 2.8.2
Subversion https://git-wip-us.apache.org/repos/asf/hadoop.git -r 66c47f2a01ad9637879e95f80c41f798373828fb
Compiled by jdu on 2017-10-19T20:39Z
Compiled with protoc 2.5.0
From source with checksum dce55e5afe30c210816b39b631a53b1d
This command was run using /home/hadoop/app/hadoop-2.8.2/share/hadoop/common/hadoop-common-2.8.2.jar
[hadoop@server01 hadoop]$
Hadoop ships with a MapReduce program, $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.8.2.jar, which demonstrates the basic features of MapReduce as examples and can run computations including wordcount, terasort, join, grep, and more.
You can run the following command to see which MapReduce programs this .jar file supports.
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.8.2.jar
[hadoop@server01 mapreduce]$ hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.8.2.jar
An example program must be given as the first argument.
Valid program names are:
  aggregatewordcount: An Aggregate based map/reduce program that counts the words in the input files.
  aggregatewordhist: An Aggregate based map/reduce program that computes the histogram of the words in the input files.
  bbp: A map/reduce program that uses Bailey-Borwein-Plouffe to compute exact digits of Pi.
  dbcount: An example job that count the pageview counts from a database.
  distbbp: A map/reduce program that uses a BBP-type formula to compute exact bits of Pi.
  grep: A map/reduce program that counts the matches of a regex in the input.
  join: A job that effects a join over sorted, equally partitioned datasets
  multifilewc: A job that counts words from several files.
  pentomino: A map/reduce tile laying program to find solutions to pentomino problems.
  pi: A map/reduce program that estimates Pi using a quasi-Monte Carlo method.
  randomtextwriter: A map/reduce program that writes 10GB of random textual data per node.
  randomwriter: A map/reduce program that writes 10GB of random data per node.
  secondarysort: An example defining a secondary sort to the reduce.
  sort: A map/reduce program that sorts the data written by the random writer.
  sudoku: A sudoku solver.
  teragen: Generate data for the terasort
  terasort: Run the terasort
  teravalidate: Checking results of terasort
  wordcount: A map/reduce program that counts the words in the input files.
  wordmean: A map/reduce program that counts the average length of the words in the input files.
  wordmedian: A map/reduce program that counts the median length of the words in the input files.
  wordstandarddeviation: A map/reduce program that counts the standard deviation of the length of the words in the input files.
[hadoop@server01 mapreduce]$
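Passing just a program name with no further arguments makes the jar print that program's usage; for example, the command below should print something like "Usage: wordcount <in> [<in>...] <out>" (the exact wording may vary by version):

hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.8.2.jar wordcount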
Now let's run some of these examples in standalone mode.
1. Create a directory to hold the data we want to process. It can live anywhere (here I create an input directory under /home/hadoop/hadoopdata), and put the files you want to analyze into it (here I copy Hadoop's configuration files into the input directory).
cd /home/hadoop/hadoopdata
mkdir input
cp /home/hadoop/app/hadoop-2.8.2/etc/hadoop/*.xml input
ls -l input
[hadoop@server01 hadoopdata]$ cp /home/hadoop/app/hadoop-2.8.2/etc/hadoop/*.xml input
[hadoop@server01 hadoopdata]$ ll input
total 52
-rw-r--r--. 1 hadoop hadoop 4942 Apr 30 11:43 capacity-scheduler.xml
-rw-r--r--. 1 hadoop hadoop 1144 Apr 30 11:43 core-site.xml
-rw-r--r--. 1 hadoop hadoop 9683 Apr 30 11:43 hadoop-policy.xml
-rw-r--r--. 1 hadoop hadoop  854 Apr 30 11:43 hdfs-site.xml
-rw-r--r--. 1 hadoop hadoop  620 Apr 30 11:43 httpfs-site.xml
-rw-r--r--. 1 hadoop hadoop 3518 Apr 30 11:43 kms-acls.xml
-rw-r--r--. 1 hadoop hadoop 5546 Apr 30 11:43 kms-site.xml
-rw-r--r--. 1 hadoop hadoop  871 Apr 30 11:43 mapred-site.xml
-rw-r--r--. 1 hadoop hadoop 1067 Apr 30 11:43 yarn-site.xml
[hadoop@server01 hadoopdata]$
2. In this example we take all the files in the input folder as input, pick out the words that match the regular expression dfs[a-z.]+, and count how many times they occur. Run the following command in the /home/hadoop/hadoopdata directory to launch the MapReduce job.
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.8.2.jar grep input output 'dfs[a-z.]+'
If it succeeds, a stream of processing information is printed and the result is written to the output folder. View the result with cat output/*: the word dfsadmin, which matches the regex, appeared once:
Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
File Input Format Counters
        Bytes Read=123
File Output Format Counters
        Bytes Written=23
[hadoop@server01 hadoopdata]$ cat output/*
1       dfsadmin
[hadoop@server01 hadoopdata]$ ll output/
total 4
-rw-r--r--. 1 hadoop hadoop 11 Apr 30 12:51 part-r-00000
-rw-r--r--. 1 hadoop hadoop  0 Apr 30 12:51 _SUCCESS
[hadoop@server04 hadoopdata]$
Note that Hadoop does not overwrite result files by default, so running another job that also writes to the output directory will fail with an error; you must delete the output directory first.
3. After deleting the output directory, we run the wordcount example to count word frequencies:
[hadoop@server04 hadoopdata]$ rm -rf output/
[hadoop@server04 hadoopdata]$ hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.8.2.jar wordcount input output
The results look like this:
File Input Format Counters
        Bytes Read=26548
File Output Format Counters
        Bytes Written=10400
[hadoop@server04 hadoopdata]$ cat output/*
"*"     18
"AS     8
"License");     8
"alice,bob      18
"clumping"      1
"kerberos".     1
"simple"        1
'HTTP/' 1
'none'  1
'random'        1
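The output is sorted by word rather than by count. If you would rather see the most frequent words first, an ordinary shell pipeline on top of the result works (this is just a convenience, not part of the example jar):

# Sort the wordcount output numerically by the second (count) column, descending, top 10.
cat output/* | sort -k2 -nr | head -n 10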
With that, we have successfully used Hadoop's bundled MapReduce program to run its word-counting functionality.