Hadoop Benchmarking (Part 2)

Hadoop Examples

Besides the tests covered in "Hadoop Benchmarking (Part 1)", Hadoop also ships with a number of example programs, such as WordCount and TeraSort, packaged in hadoop-examples-2.6.0-mr1-cdh5.16.1.jar and hadoop-examples.jar. Run the following command:

hadoop jar hadoop-examples-2.6.0-mr1-cdh5.16.1.jar

It lists all of the available example programs:

bash-4.2$ hadoop jar hadoop-examples-2.6.0-mr1-cdh5.16.1.jar
An example program must be given as the first argument.
Valid program names are:
  aggregatewordcount: An Aggregate based map/reduce program that counts the words in the input files.
  aggregatewordhist: An Aggregate based map/reduce program that computes the histogram of the words in the input files.
  bbp: A map/reduce program that uses Bailey-Borwein-Plouffe to compute exact digits of Pi.
  dbcount: An example job that count the pageview counts from a database.
  distbbp: A map/reduce program that uses a BBP-type formula to compute exact bits of Pi.
  grep: A map/reduce program that counts the matches of a regex in the input.
  join: A job that effects a join over sorted, equally partitioned datasets
  multifilewc: A job that counts words from several files.
  pentomino: A map/reduce tile laying program to find solutions to pentomino problems.
  pi: A map/reduce program that estimates Pi using a quasi-Monte Carlo method.
  randomtextwriter: A map/reduce program that writes 10GB of random textual data per node.
  randomwriter: A map/reduce program that writes 10GB of random data per node.
  secondarysort: An example defining a secondary sort to the reduce.
  sort: A map/reduce program that sorts the data written by the random writer.
  sudoku: A sudoku solver.
  teragen: Generate data for the terasort
  terasort: Run the terasort
  teravalidate: Checking results of terasort
  wordcount: A map/reduce program that counts the words in the input files.
  wordmean: A map/reduce program that counts the average length of the words in the input files.
  wordmedian: A map/reduce program that counts the median length of the words in the input files.
  wordstandarddeviation: A map/reduce program that counts the standard deviation of the length of the words in the input files.
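To run a specific example, pass its name as the first argument, followed by that example's own arguments. For instance, the pi estimator takes the number of map tasks and the number of samples per map (the values below are arbitrary, chosen only for illustration):

# Estimate Pi with 10 map tasks and 1000 samples per map
hadoop jar hadoop-examples-2.6.0-mr1-cdh5.16.1.jar pi 10 1000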

Word Count Test

Go into the folder created for the hdfs role, run the command vim words.txt, and enter the following content:

hello hadoop hbase mytest
hadoop-node1
hadoop-master
hadoop-node2
this is my test
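If you would rather not open an editor, the same file can be created non-interactively with a heredoc (an equivalent sketch of the step above):

# Write the five test lines into words.txt in one step
cat > words.txt <<'EOF'
hello hadoop hbase mytest
hadoop-node1
hadoop-master
hadoop-node2
this is my test
EOF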

Run the command:

../bin/hadoop fs -put words.txt /tmp/

This uploads the file to HDFS.
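To verify the upload before running the job, you can list the file and print it back from HDFS (a quick sanity check, not part of the original walkthrough):

# Confirm the file landed in /tmp and inspect its contents
hadoop fs -ls /tmp/words.txt
hadoop fs -cat /tmp/words.txt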

Next, run the following command to count the words in the file with MapReduce and write the results to the specified output path:

hadoop jar ../jars/hadoop-examples-2.6.0-mr1-cdh5.16.1.jar wordcount /tmp/words.txt /tmp/words_result.txt

It returns the following output:

bash-4.2$ hadoop jar ../jars/hadoop-examples-2.6.0-mr1-cdh5.16.1.jar wordcount /tmp/words.txt /tmp/words_result.txt
19/04/04 13:53:02 INFO client.RMProxy: Connecting to ResourceManager at node1/10.200.101.131:8032
19/04/04 13:53:02 INFO input.FileInputFormat: Total input paths to process : 1
19/04/04 13:53:02 INFO mapreduce.JobSubmitter: number of splits:1
19/04/04 13:53:03 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1552358721447_0060
19/04/04 13:53:03 INFO impl.YarnClientImpl: Submitted application application_1552358721447_0060
19/04/04 13:53:03 INFO mapreduce.Job: The url to track the job: http://node1:8088/proxy/application_1552358721447_0060/
19/04/04 13:53:03 INFO mapreduce.Job: Running job: job_1552358721447_0060
19/04/04 13:53:08 INFO mapreduce.Job: Job job_1552358721447_0060 running in uber mode : false
19/04/04 13:53:08 INFO mapreduce.Job:  map 0% reduce 0%
19/04/04 13:53:13 INFO mapreduce.Job:  map 100% reduce 0%
19/04/04 13:53:20 INFO mapreduce.Job:  map 100% reduce 15%
19/04/04 13:53:21 INFO mapreduce.Job:  map 100% reduce 100%
19/04/04 13:53:22 INFO mapreduce.Job: Job job_1552358721447_0060 completed successfully
19/04/04 13:53:22 INFO mapreduce.Job: Counters: 49
        File System Counters
                FILE: Number of bytes read=1092
                FILE: Number of bytes written=7337288
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
                HDFS: Number of bytes read=172
                HDFS: Number of bytes written=96
                HDFS: Number of read operations=147
                HDFS: Number of large read operations=0
                HDFS: Number of write operations=96
        Job Counters
                Launched map tasks=1
                Launched reduce tasks=48
                Data-local map tasks=1
                Total time spent by all maps in occupied slots (ms)=2607
                Total time spent by all reduces in occupied slots (ms)=265156
                Total time spent by all map tasks (ms)=2607
                Total time spent by all reduce tasks (ms)=265156
                Total vcore-milliseconds taken by all map tasks=2607
                Total vcore-milliseconds taken by all reduce tasks=265156
                Total megabyte-milliseconds taken by all map tasks=2669568
                Total megabyte-milliseconds taken by all reduce tasks=271519744
        Map-Reduce Framework
                Map input records=5
                Map output records=10
                Map output bytes=116
                Map output materialized bytes=900
                Input split bytes=96
                Combine input records=10
                Combine output records=10
                Reduce input groups=10
                Reduce shuffle bytes=900
                Reduce input records=10
                Reduce output records=10
                Spilled Records=20
                Shuffled Maps =48
                Failed Shuffles=0
                Merged Map outputs=48
                GC time elapsed (ms)=10404
                CPU time spent (ms)=63490
                Physical memory (bytes) snapshot=15954518016
                Virtual memory (bytes) snapshot=142149648384
                Total committed heap usage (bytes)=30278156288
        Shuffle Errors
                BAD_ID=0
                CONNECTION=0
                IO_ERROR=0
                WRONG_LENGTH=0
                WRONG_MAP=0
                WRONG_REDUCE=0
        File Input Format Counters
                Bytes Read=76
        File Output Format Counters
                Bytes Written=96

The job's result files are saved under the output directory in HDFS.
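They can be listed with the standard fs shell (listing output omitted here):

# List the per-reducer part files under the output directory
hadoop fs -ls /tmp/words_result.txt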

The result files are numbered from 00000 to 00047, 48 in total, matching the Launched reduce tasks=48 counter above; each part file corresponds to one reduce task.
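The 48 part files simply reflect this cluster's default number of reduce tasks. For an input this small, one reducer would yield a single consolidated file. The example programs accept the generic -D option, so a sketch like the following should work; note that MapReduce refuses to write to an existing output path, so a fresh (hypothetical) directory name is used:

# Re-run WordCount with a single reducer (mapred.reduce.tasks is the older MR1 alias)
hadoop jar ../jars/hadoop-examples-2.6.0-mr1-cdh5.16.1.jar wordcount \
    -D mapreduce.job.reduces=1 \
    /tmp/words.txt /tmp/words_result_single.txt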

Run the following command to view the job's results:

bash-4.2$ hadoop fs -cat hdfs:///tmp/words_result.txt/part-r-*****

The results are as follows:

bash-4.2$ hadoop fs -cat hdfs:///tmp/words_result.txt/part-r-00000
bash-4.2$ hadoop fs -cat hdfs:///tmp/words_result.txt/part-r-00011
is      1
bash-4.2$ hadoop fs -cat hdfs:///tmp/words_result.txt/part-r-00015
this    1
bash-4.2$ hadoop fs -cat hdfs:///tmp/words_result.txt/part-r-00022
hadoop  1
bash-4.2$ hadoop fs -cat hdfs:///tmp/words_result.txt/part-r-00024
hbase   1
bash-4.2$ hadoop fs -cat hdfs:///tmp/words_result.txt/part-r-00040
hadoop-node1    1
bash-4.2$ hadoop fs -cat hdfs:///tmp/words_result.txt/part-r-00041
hadoop-master   1
hadoop-node2    1
bash-4.2$ hadoop fs -cat hdfs:///tmp/words_result.txt/part-r-00045
my      1
bash-4.2$ hadoop fs -cat hdfs:///tmp/words_result.txt/part-r-00047
mytest  1
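Rather than cat-ing each part file individually, hadoop fs -getmerge concatenates all part files of an output directory into a single local file (a convenience step, not shown in the original run):

# Merge all part-r-* files into one local file and print it
hadoop fs -getmerge /tmp/words_result.txt words_result_local.txt
cat words_result_local.txt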


Reference: https://jeoygin.org/2012/02/22/running-hadoop-on-centos-single-node-cluster/
