<configuration>
  <!-- Specify the NameNode address -->
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
    <description>HDFS URI: filesystem://namenode-host:port</description>
  </property>
  <!-- Directory where Hadoop stores the files it generates at runtime -->
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/D:/bigdata/hadoop-2.7.4/workplace/tmp</value>
    <description>Local Hadoop temp directory on the NameNode</description>
  </property>
</configuration>
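The `fs.defaultFS` value is an ordinary URI; HDFS clients derive the NameNode host and RPC port from it. A minimal sketch with Python's standard `urllib.parse`, using the exact value configured above:

```python
from urllib.parse import urlparse

# Parse the fs.defaultFS URI configured in core-site.xml
uri = urlparse("hdfs://localhost:9000")
print(uri.scheme, uri.hostname, uri.port)
```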
<configuration>
  <!-- Number of replicas HDFS keeps for each block -->
  <property>
    <name>dfs.replication</name>
    <value>1</value>
    <description>Replica count; the default is 3, and it should not exceed the number of DataNode machines</description>
  </property>
  <property>
    <name>dfs.name.dir</name>
    <value>/D:/bigdata/hadoop-2.7.4/workplace/name</value>
    <description>Where the NameNode stores the HDFS namespace metadata</description>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/D:/bigdata/hadoop-2.7.4/workplace/data</value>
    <description>Physical storage location of data blocks on the DataNodes</description>
  </property>
  <property>
    <name>dfs.webhdfs.enabled</name>
    <value>true</value>
    <description>Enable the WebHDFS REST interface</description>
  </property>
  <property>
    <name>dfs.permissions</name>
    <value>false</value>
    <description>Disable HDFS permission checking</description>
  </property>
</configuration>
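With `dfs.webhdfs.enabled` set to `true`, HDFS can also be reached over plain HTTP on the NameNode web port (50070 by default in Hadoop 2.x). A small sketch that builds a WebHDFS `LISTSTATUS` URL; the path `/user/Leo/input` is just the example directory used later in this article, and you would only actually issue the request once the cluster is up:

```python
def webhdfs_url(path, op, host="localhost", port=50070):
    """Build a WebHDFS REST URL against the NameNode web UI port."""
    return "http://%s:%d/webhdfs/v1%s?op=%s" % (host, port, path, op)

# e.g. list the HDFS input directory created later in this article
print(webhdfs_url("/user/Leo/input", "LISTSTATUS"))
```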
<configuration>
  <!-- The auxiliary service the NodeManager runs so reducers can fetch map output: shuffle -->
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <!-- My machine has 8 GB of RAM; around 6 GB is the usual choice, but for convenience I just used the full 8 GB -->
  <property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>8192</value>
  </property>
  <!-- Minimum memory the scheduler allocates per container -->
  <property>
    <name>yarn.scheduler.minimum-allocation-mb</name>
    <value>1536</value>
  </property>
  <!-- Maximum memory the scheduler allocates per container -->
  <property>
    <name>yarn.scheduler.maximum-allocation-mb</name>
    <value>4096</value>
  </property>
  <!-- Number of CPU vcores this NodeManager offers to MR tasks -->
  <property>
    <name>yarn.nodemanager.resource.cpu-vcores</name>
    <value>2</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
</configuration>
<configuration>
  <!-- Tell Hadoop to run MapReduce on YARN -->
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>mapreduce.map.memory.mb</name>
    <value>2048</value>
  </property>
  <property>
    <name>mapreduce.reduce.memory.mb</name>
    <value>2048</value>
  </property>
  <property>
    <name>mapreduce.jobtracker.http.address</name>
    <value>localhost:50030</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>localhost:10020</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>localhost:19888</value>
  </property>
  <property>
    <name>mapred.job.tracker</name>
    <value>http://localhost:9001</value>
  </property>
</configuration>
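The per-task container sizes requested here have to fit the scheduler bounds configured in yarn-site.xml, otherwise YARN rounds the request up or rejects it. A quick sanity check over the values used in this article (all numbers below are the ones from the two config files, nothing else):

```python
# Values taken from yarn-site.xml and mapred-site.xml above
sched_min, sched_max = 1536, 4096   # yarn.scheduler.{minimum,maximum}-allocation-mb
node_total = 8192                    # yarn.nodemanager.resource.memory-mb
requests = {"mapreduce.map.memory.mb": 2048,
            "mapreduce.reduce.memory.mb": 2048}

for name, mb in requests.items():
    # Each request must lie inside the scheduler bounds, which in turn
    # must not exceed what the NodeManager advertises
    assert sched_min <= mb <= sched_max <= node_total, name
    print("%s=%d MB fits scheduler bounds [%d, %d]" % (name, mb, sched_min, sched_max))
```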
D:\bigdata\hadoop-2.7.4\etc\hadoop\
D:\bigdata\hadoop-2.7.4
D:\bigdata\hadoop-2.7.4\bin
hadoop namenode -format
cd /d D:\bigdata\hadoop-2.7.4\sbin
dir
## Recommended startup sequence
start-dfs.cmd
start-yarn.cmd
## The blunt alternative: start everything at once
start-all.cmd
jps
# Check that each component started successfully and holds a process ID
# Running the examples jar without a program name lists the available example programs
hadoop jar D:\bigdata\hadoop-2.7.4\share\hadoop\mapreduce\hadoop-mapreduce-examples-2.7.4.jar
# Run the most common one, the pi estimator (3 map tasks, 100 samples each; total samples = 3 x 100)
hadoop jar D:\bigdata\hadoop-2.7.4\share\hadoop\mapreduce\hadoop-mapreduce-examples-2.7.4.jar pi 3 100
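The pi example estimates π by sampling points in the unit square and counting how many fall inside the quarter circle (Hadoop's version uses a quasi-random Halton sequence; the plain pseudo-random sketch below illustrates the same idea on one machine):

```python
import random

def estimate_pi(samples, seed=42):
    """Monte Carlo estimate of pi: fraction of random points in the
    unit square that land inside the unit quarter circle, times 4."""
    rng = random.Random(seed)
    inside = sum(1 for _ in range(samples)
                 if rng.random() ** 2 + rng.random() ** 2 <= 1.0)
    return 4.0 * inside / samples

print(estimate_pi(100000))
```

With 100000 samples the estimate typically lands within a few hundredths of 3.14159.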
Usage: $HADOOP_PREFIX/bin/hadoop jar hadoop-streaming.jar [options]
Options:
-input <path> DFS input file(s) for the Map step.
-output <path> DFS output directory for the Reduce step.
-mapper <cmd|JavaClassName> Optional. Command to be run as mapper.
-combiner <cmd|JavaClassName> Optional. Command to be run as combiner.
-reducer <cmd|JavaClassName> Optional. Command to be run as reducer.
-file <file> Optional. File/dir to be shipped in the Job jar file.
Deprecated. Use generic option "-files" instead.
-inputformat <TextInputFormat(default)|SequenceFileAsTextInputFormat|JavaClassName>
Optional. The input format class.
-outputformat <TextOutputFormat(default)|JavaClassName>
Optional. The output format class.
-partitioner <JavaClassName> Optional. The partitioner class.
-numReduceTasks <num> Optional. Number of reduce tasks.
-inputreader <spec> Optional. Input recordreader spec.
-cmdenv <n>=<v> Optional. Pass env.var to streaming commands.
-mapdebug <cmd> Optional. To run this script when a map task fails.
-reducedebug <cmd> Optional. To run this script when a reduce task fails.
-io <identifier> Optional. Format to use for input to and output
from mapper/reducer commands
-lazyOutput Optional. Lazily create Output.
-background Optional. Submit the job and don't wait till it completes.
-verbose Optional. Print verbose output.
-info Optional. Print detailed usage.
-help Optional. Print help message.
Generic options supported are
-conf <configuration file> specify an application configuration file
-D <property=value> use value for given property
-fs <local|namenode:port> specify a namenode
-jt <local|resourcemanager:port> specify a ResourceManager
-files <comma separated list of files> specify comma separated files to be copied to the map reduce cluster
-libjars <comma separated list of jars> specify comma separated jar files to include in the classpath.
-archives <comma separated list of archives> specify comma separated archives to be unarchived on the compute machines.
The general command line syntax is
bin/hadoop command [genericOptions] [commandOptions]
Usage tips:
In -input: globbing on <path> is supported and can have multiple -input
Default Map input format: a line is a record in UTF-8 the key part ends at first
TAB, the rest of the line is the value
To pass a Custom input format:
-inputformat package.MyInputFormat
Similarly, to pass a custom output format:
-outputformat package.MyOutputFormat
The files with extensions .class and .jar/.zip, specified for the -file
argument[s], end up in "classes" and "lib" directories respectively inside
the working directory when the mapper and reducer are run. All other files
specified for the -file argument[s] end up in the working directory when the
mapper and reducer are run. The location of this working directory is
unspecified.
To set the number of reduce tasks (num. of output files) as, say 10:
Use -numReduceTasks 10
To skip the sort/combine/shuffle/sort/reduce step:
Use -numReduceTasks 0
Map output then becomes a 'side-effect output' rather than a reduce input.
This speeds up processing. This also feels more like "in-place" processing
because the input filename and the map input order are preserved.
This is equivalent to -reducer NONE
To speed up the last maps:
-D mapreduce.map.speculative=true
To speed up the last reduces:
-D mapreduce.reduce.speculative=true
To name the job (appears in the JobTracker Web UI):
-D mapreduce.job.name='My Job'
To change the local temp directory:
-D dfs.data.dir=/tmp/dfs
-D stream.tmpdir=/tmp/streaming
Additional local temp directories with -jt local:
-D mapreduce.cluster.local.dir=/tmp/local
-D mapreduce.jobtracker.system.dir=/tmp/system
-D mapreduce.cluster.temp.dir=/tmp/temp
To treat tasks with non-zero exit status as SUCCEEDED:
-D stream.non.zero.exit.is.failure=false
Use a custom hadoop streaming build along with standard hadoop install:
$HADOOP_PREFIX/bin/hadoop jar /path/my-hadoop-streaming.jar [...]\
[...] -D stream.shipped.hadoopstreaming=/path/my-hadoop-streaming.jar
For more details about jobconf parameters see:
http://wiki.apache.org/hadoop/JobConfFile
To set an environment variable in a streaming command:
-cmdenv EXAMPLE_DIR=/home/example/dictionaries/
Shortcut:
setenv HSTREAMING "$HADOOP_PREFIX/bin/hadoop jar hadoop-streaming.jar"
Example: $HSTREAMING -mapper "/usr/local/bin/perl5 filter.pl"
-file /local/filter.pl -input "/logs/0604*/*" [...]
Ships a script, invokes the non-shipped perl interpreter. Shipped files go to
the working directory so filter.pl is found by perl. Input files are all the
daily logs for days in month 2006-04
["milton-paradise.txt", "[ Paradise Lost by John Milton 1667 ] Book I Of Man ' s first disobedience , and the fruit Of that forbidden tree whose mortal taste Brought death into the World , and all our woe , With loss of Eden , till one greater Man Restore us , and regain the blissful seat , Sing , Heavenly Muse , that , on the secret top Of Oreb , or of Sinai , didst inspire That shepherd who first taught the chosen seed In the beginning how the heavens and earth Rose out of Chaos : or , if Sion hill Delight thee more , and Siloa ' s brook that flowed Fast by the oracle of God , I thence Invoke thy aid to my adventurous song , That with no middle flight intends to soar Above th ' Aonian mount , while it pursues Things unattempted yet in prose or rhyme ."]
["edgeworth-parents.txt", "[ The Parent ' s Assistant , by Maria Edgeworth ] THE ORPHANS . Near the ruins of the castle of Rossmore , in Ireland , is a small cabin , in which there once lived a widow and her four children . As long as she was able to work , she was very industrious , and was accounted the best spinner in the parish ; but she overworked herself at last , and fell ill , so that she could not sit to her wheel as she used to do , and was obliged to give it up to her eldest daughter , Mary ."]
["austen-emma.txt", "[ Emma by Jane Austen 1816 ] VOLUME I CHAPTER I Emma Woodhouse , handsome , clever , and rich , with a comfortable home and happy disposition , seemed to unite some of the best blessings of existence ; and had lived nearly twenty - one years in the world with very little to distress or vex her . She was the youngest of the two daughters of a most affectionate , indulgent father ; and had , in consequence of her sister ' s marriage , been mistress of his house from a very early period . Her mother had died too long ago for her to have more than an indistinct remembrance of her caresses ; and her place had been supplied by an excellent woman as governess , who had fallen little short of a mother in affection ."]
["chesterton-ball.txt", "[ The Ball and The Cross by G . K . Chesterton 1909 ] I . A DISCUSSION SOMEWHAT IN THE AIR The flying ship of Professor Lucifer sang through the skies like a silver arrow ; the bleak white steel of it , gleaming in the bleak blue emptiness of the evening . That it was far above the earth was no expression for it ; to the two men in it , it seemed to be far above the stars . The professor had himself invented the flying machine , and had also invented nearly everything in it ."]
["bible-kjv.txt", "[ The King James Bible ] The Old Testament of the King James Bible The First Book of Moses : Called Genesis 1 : 1 In the beginning God created the heaven and the earth . 1 : 2 And the earth was without form , and void ; and darkness was upon the face of the deep . And the Spirit of God moved upon the face of the waters . 1 : 3 And God said , Let there be light : and there was light . 1 : 4 And God saw the light , that it was good : and God divided the light from the darkness . 1 : 5 And God called the light Day , and the darkness he called Night . And the evening and the morning were the first day ."]
["chesterton-thursday.txt", "[ The Man Who Was Thursday by G . K . Chesterton 1908 ] To Edmund Clerihew Bentley A cloud was on the mind of men , and wailing went the weather , Yea , a sick cloud upon the soul when we were boys together . Science announced nonentity and art admired decay ; The world was old and ended : but you and I were gay ; Round us in antic order their crippled vices came -- Lust that had lost its laughter , fear that had lost its shame . Like the white lock of Whistler , that lit our aimless gloom , Men showed their own white feather as proudly as a plume . Life was a fly that faded , and death a drone that stung ; The world was very old indeed when you and I were young ."]
["blake-poems.txt", "[ Poems by William Blake 1789 ] SONGS OF INNOCENCE AND OF EXPERIENCE and THE BOOK of THEL SONGS OF INNOCENCE INTRODUCTION Piping down the valleys wild , Piping songs of pleasant glee , On a cloud I saw a child , And he laughing said to me : \" Pipe a song about a Lamb !\" So I piped with merry cheer . \" Piper , pipe that song again ;\" So I piped : he wept to hear . \" Drop thy pipe , thy happy pipe ; Sing thy songs of happy cheer :!\" So I sang the same again , While he wept with joy to hear . \" Piper , sit thee down and write In a book , that all may read .\" So he vanish ' d from my sight ; And I pluck ' d a hollow reed , And I made a rural pen , And I stain ' d the water clear , And I wrote my happy songs Every child may joy to hear ."]
["shakespeare-caesar.txt", "[ The Tragedie of Julius Caesar by William Shakespeare 1599 ] Actus Primus . Scoena Prima . Enter Flauius , Murellus , and certaine Commoners ouer the Stage . Flauius . Hence : home you idle Creatures , get you home : Is this a Holiday ? What , know you not ( Being Mechanicall ) you ought not walke Vpon a labouring day , without the signe Of your Profession ? Speake , what Trade art thou ? Car . Why Sir , a Carpenter Mur . Where is thy Leather Apron , and thy Rule ? What dost thou with thy best Apparrell on ? You sir , what Trade are you ? Cobl . Truely Sir , in respect of a fine Workman , I am but as you would say , a Cobler Mur . But what Trade art thou ? Answer me directly Cob . A Trade Sir , that I hope I may vse , with a safe Conscience , which is indeed Sir , a Mender of bad soules Fla ."]
["whitman-leaves.txt", "[ Leaves of Grass by Walt Whitman 1855 ] Come , said my soul , Such verses for my Body let us write , ( for we are one ,) That should I after return , Or , long , long hence , in other spheres , There to some group of mates the chants resuming , ( Tallying Earth ' s soil , trees , winds , tumultuous waves ,) Ever with pleas ' d smile I may keep on , Ever and ever yet the verses owning -- as , first , I here and now Signing for Soul and Body , set to them my name , Walt Whitman [ BOOK I . INSCRIPTIONS ] } One ' s - Self I Sing One ' s - self I sing , a simple separate person , Yet utter the word Democratic , the word En - Masse ."]
["melville-moby_dick.txt", "[ Moby Dick by Herman Melville 1851 ] ETYMOLOGY . ( Supplied by a Late Consumptive Usher to a Grammar School ) The pale Usher -- threadbare in coat , heart , body , and brain ; I see him now . He was ever dusting his old lexicons and grammars , with a queer handkerchief , mockingly embellished with all the gay flags of all the known nations of the world . He loved to dust his old grammars ; it somehow mildly reminded him of his mortality ."]
Task requirement:
Data:
["test_1.txt", "[ apple pipe ]"]
["test_2.txt", "[ apple company ]"]
Result:
apple ["test_1.txt", "test_2.txt"]
pipe ["test_1.txt"]
company ["test_2.txt"]
mapper.py:
#!/usr/bin/env python
# -*- coding: UTF-8 -*-
""" Created on 2017-10-30 @author: Leo """
# Python standard library
import sys
import json

# Each input line is a JSON array: [filename, text]
for line in sys.stdin:
    line = line.strip()
    record = json.loads(line)
    file_name = record[0]
    value = record[1]
    words = value.split()
    # Emit one (word, filename) pair per word
    for word in words:
        print("%s\t%s" % (word, file_name))
reducer.py:
#!/usr/bin/env python
# -*- coding: UTF-8 -*-
""" Created on 2017-10-30 @author: Leo """
# Python standard library
import sys

media = {}
word_in_media = {}

# Collect the file names emitted for each word
for line in sys.stdin:
    (word, file_name) = line.strip().split('\t', 1)
    media.setdefault(word, [])
    media[word].append(file_name)

# Deduplicate the file list for every word
for word in media:
    word_in_media.setdefault(word, list(set(media[word])))

for word in word_in_media:
    print("%s\t%s" % (word, word_in_media[word]))
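Before submitting to Hadoop, the mapper/reducer logic is worth a local smoke test (the classic `cat data | mapper | sort | reducer` pipeline). A self-contained sketch of the same map, shuffle-sort, and reduce steps, run against the two-record example from the task description:

```python
import json

# The two sample records from the task description
records = ['["test_1.txt", "[ apple pipe ]"]',
           '["test_2.txt", "[ apple company ]"]']

# Map step: emit (word, filename) pairs, exactly as mapper.py does
pairs = []
for line in records:
    file_name, value = json.loads(line)
    for word in value.split():
        pairs.append((word, file_name))

# Shuffle-sort, then reduce: group file names per word, as reducer.py does
index = {}
for word, file_name in sorted(pairs):
    index.setdefault(word, [])
    if file_name not in index[word]:
        index[word].append(file_name)

print(index["apple"])  # both files contain "apple"
```

Note that because the mapper splits on whitespace only, the bracket tokens "[" and "]" also end up in the index; a real job would want a tokenizer that strips punctuation.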
Upload books.json to HDFS
# If you have not created a directory on HDFS yet, create one first
# The path below is the one I used; pick your own as you like
hdfs dfs -mkdir -p /user/Leo/input
# Upload from the local filesystem to HDFS
hdfs dfs -copyFromLocal <absolute path>\books.json /user/Leo/input/
# Open the HDFS web UI at localhost:50070 and check that the file is there
# If the MapReduce job fails midway, remember to delete the output directory after fixing the problem and before rerunning
hdfs dfs -rm -r /user/Leo/output

# If an abnormal operation has pushed HDFS into safe mode, leave it manually
hdfs dfsadmin -safemode leave
# The command below is long, so it is shown with Linux-style line continuations
hadoop jar D:/bigdata/hadoop-2.7.4/share/hadoop/tools/lib/hadoop-streaming-2.7.4.jar \
-D stream.non.zero.exit.is.failure=false \
-input /user/Leo/input/books.json \
-output /user/Leo/output \
-mapper "python mapper.py" \
-reducer "python reducer.py" \
-file C:/Users/Administrator/Desktop/MingDong_Work/Work_2/mapper.py \
-file C:/Users/Administrator/Desktop/MingDong_Work/Work_2/reducer.py
# Explanation:
1. The path after jar is the streaming jar; the docs recommend an environment variable plus relative path, but I use the absolute path here for clarity.
2. -D stream.non.zero.exit.is.failure=false means a mapper or reducer that exits with a non-zero status (i.e. does not return 0) is not treated as a failure; with this set, the exit-status check is skipped.
3. -input: the input file on HDFS.
4. -output: the directory where the results land once the MapReduce job finishes.
5. -mapper: the script or command to run as the mapper.
6. -reducer: the script or command to run as the reducer.
7. -file: the location of the code to ship (use -files for multiple files; the older per-file -file form is used here for clarity).
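One detail worth emphasizing from point 2: generic options such as -D must come before the streaming-specific options, or the submission fails to parse. A sketch that assembles the same argument list programmatically and checks that ordering (the paths are the ones used above and purely illustrative):

```python
# Assemble the streaming job arguments; generic options (-D, -files, ...)
# must precede command options such as -input/-output.
argv = ["hadoop", "jar", "hadoop-streaming-2.7.4.jar",
        "-D", "stream.non.zero.exit.is.failure=false",
        "-input", "/user/Leo/input/books.json",
        "-output", "/user/Leo/output",
        "-mapper", "python mapper.py",
        "-reducer", "python reducer.py",
        "-file", "mapper.py",
        "-file", "reducer.py"]

# Generic options first, then command options
assert argv.index("-D") < argv.index("-input")
print(" ".join(argv))
```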
hdfs dfs -get <HDFS path>(/user/Leo/output/part-00000) <local path>
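The downloaded part-00000 file contains one word per line, a tab, then the Python list that reducer.py printed. A sketch that parses a line of that format back into usable data (the sample line mirrors the expected result from the task description; `ast.literal_eval` safely evaluates the list literal):

```python
import ast

def parse_output_line(line):
    """Parse one reducer output line: word<TAB>python-list-of-filenames."""
    word, file_list = line.rstrip("\n").split("\t", 1)
    return word, ast.literal_eval(file_list)

word, files = parse_output_line("apple\t['test_1.txt', 'test_2.txt']")
print(word, files)
```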
Advantages: convenient and quick.
Disadvantages: mainly performance related.