Hadoop Examples
In addition to the tests covered in "Hadoop基準測試(一)" (Hadoop Benchmarking, Part 1), Hadoop ships with a number of example programs, such as WordCount and TeraSort. These examples are packaged in hadoop-examples-2.6.0-mr1-cdh5.16.1.jar and hadoop-examples.jar. Run the following command:
hadoop jar hadoop-examples-2.6.0-mr1-cdh5.16.1.jar
It lists all of the example programs:
bash-4.2$ hadoop jar hadoop-examples-2.6.0-mr1-cdh5.16.1.jar
An example program must be given as the first argument.
Valid program names are:
  aggregatewordcount: An Aggregate based map/reduce program that counts the words in the input files.
  aggregatewordhist: An Aggregate based map/reduce program that computes the histogram of the words in the input files.
  bbp: A map/reduce program that uses Bailey-Borwein-Plouffe to compute exact digits of Pi.
  dbcount: An example job that count the pageview counts from a database.
  distbbp: A map/reduce program that uses a BBP-type formula to compute exact bits of Pi.
  grep: A map/reduce program that counts the matches of a regex in the input.
  join: A job that effects a join over sorted, equally partitioned datasets
  multifilewc: A job that counts words from several files.
  pentomino: A map/reduce tile laying program to find solutions to pentomino problems.
  pi: A map/reduce program that estimates Pi using a quasi-Monte Carlo method.
  randomtextwriter: A map/reduce program that writes 10GB of random textual data per node.
  randomwriter: A map/reduce program that writes 10GB of random data per node.
  secondarysort: An example defining a secondary sort to the reduce.
  sort: A map/reduce program that sorts the data written by the random writer.
  sudoku: A sudoku solver.
  teragen: Generate data for the terasort
  terasort: Run the terasort
  teravalidate: Checking results of terasort
  wordcount: A map/reduce program that counts the words in the input files.
  wordmean: A map/reduce program that counts the average length of the words in the input files.
  wordmedian: A map/reduce program that counts the median length of the words in the input files.
  wordstandarddeviation: A map/reduce program that counts the standard deviation of the length of the words in the input files.
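Each program is run by passing its name as the first argument, followed by program-specific arguments. For example, the pi estimator makes a convenient smoke test of the cluster; its two arguments are the number of map tasks and the number of samples per map (10 and 100 below are arbitrary values chosen here for illustration):

hadoop jar hadoop-examples-2.6.0-mr1-cdh5.16.1.jar pi 10 100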
Word Count Test
Go into a directory created by the hdfs role user, run the command vim words.txt, and enter the following content:
hello hadoop hbase mytest hadoop-node1 hadoop-master hadoop-node2 this is my test
Run the command:
../bin/hadoop fs -put words.txt /tmp/
This uploads the file to HDFS.
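Before running the job, the upload can be verified by listing the target directory and printing the file back from HDFS (a quick sanity check using the same path as above):

../bin/hadoop fs -ls /tmp/
../bin/hadoop fs -cat /tmp/words.txt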
Run the following command to count the words in the file with MapReduce and write the result to the specified output path:
hadoop jar ../jars/hadoop-examples-2.6.0-mr1-cdh5.16.1.jar wordcount /tmp/words.txt /tmp/words_result.txt
The command returns the following output:
bash-4.2$ hadoop jar ../jars/hadoop-examples-2.6.0-mr1-cdh5.16.1.jar wordcount /tmp/words.txt /tmp/words_result.txt
19/04/04 13:53:02 INFO client.RMProxy: Connecting to ResourceManager at node1/10.200.101.131:8032
19/04/04 13:53:02 INFO input.FileInputFormat: Total input paths to process : 1
19/04/04 13:53:02 INFO mapreduce.JobSubmitter: number of splits:1
19/04/04 13:53:03 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1552358721447_0060
19/04/04 13:53:03 INFO impl.YarnClientImpl: Submitted application application_1552358721447_0060
19/04/04 13:53:03 INFO mapreduce.Job: The url to track the job: http://node1:8088/proxy/application_1552358721447_0060/
19/04/04 13:53:03 INFO mapreduce.Job: Running job: job_1552358721447_0060
19/04/04 13:53:08 INFO mapreduce.Job: Job job_1552358721447_0060 running in uber mode : false
19/04/04 13:53:08 INFO mapreduce.Job: map 0% reduce 0%
19/04/04 13:53:13 INFO mapreduce.Job: map 100% reduce 0%
19/04/04 13:53:20 INFO mapreduce.Job: map 100% reduce 15%
19/04/04 13:53:21 INFO mapreduce.Job: map 100% reduce 100%
19/04/04 13:53:22 INFO mapreduce.Job: Job job_1552358721447_0060 completed successfully
19/04/04 13:53:22 INFO mapreduce.Job: Counters: 49
    File System Counters
        FILE: Number of bytes read=1092
        FILE: Number of bytes written=7337288
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=172
        HDFS: Number of bytes written=96
        HDFS: Number of read operations=147
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=96
    Job Counters
        Launched map tasks=1
        Launched reduce tasks=48
        Data-local map tasks=1
        Total time spent by all maps in occupied slots (ms)=2607
        Total time spent by all reduces in occupied slots (ms)=265156
        Total time spent by all map tasks (ms)=2607
        Total time spent by all reduce tasks (ms)=265156
        Total vcore-milliseconds taken by all map tasks=2607
        Total vcore-milliseconds taken by all reduce tasks=265156
        Total megabyte-milliseconds taken by all map tasks=2669568
        Total megabyte-milliseconds taken by all reduce tasks=271519744
    Map-Reduce Framework
        Map input records=5
        Map output records=10
        Map output bytes=116
        Map output materialized bytes=900
        Input split bytes=96
        Combine input records=10
        Combine output records=10
        Reduce input groups=10
        Reduce shuffle bytes=900
        Reduce input records=10
        Reduce output records=10
        Spilled Records=20
        Shuffled Maps =48
        Failed Shuffles=0
        Merged Map outputs=48
        GC time elapsed (ms)=10404
        CPU time spent (ms)=63490
        Physical memory (bytes) snapshot=15954518016
        Virtual memory (bytes) snapshot=142149648384
        Total committed heap usage (bytes)=30278156288
    Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
    File Input Format Counters
        Bytes Read=76
    File Output Format Counters
        Bytes Written=96
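Note that MapReduce refuses to overwrite an existing output path: if /tmp/words_result.txt is left over from a previous run, the job fails with a FileAlreadyExistsException. Remove the old output before re-running:

hadoop fs -rm -r /tmp/words_result.txt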
The job's result files are saved under the HDFS output directory /tmp/words_result.txt. The part files are numbered from 00000 to 00047, 48 in total, and each part file corresponds to one reduce task.
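The 48 part files simply reflect the cluster's default reducer count; this tiny input does not need that many. Assuming the example honors Hadoop's generic options (the stock wordcount example parses them via GenericOptionsParser), the reducer count can be overridden per job, for instance with a single reducer writing to a fresh output path (words_result2.txt below is just an illustrative name):

hadoop jar ../jars/hadoop-examples-2.6.0-mr1-cdh5.16.1.jar wordcount -D mapreduce.job.reduces=1 /tmp/words.txt /tmp/words_result2.txt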
Run the following command to view the results of the job:
bash-4.2$ hadoop fs -cat hdfs:///tmp/words_result.txt/part-r-*****
The returned results are as follows:
bash-4.2$ hadoop fs -cat hdfs:///tmp/words_result.txt/part-r-00000
bash-4.2$ hadoop fs -cat hdfs:///tmp/words_result.txt/part-r-00011
is    1
bash-4.2$ hadoop fs -cat hdfs:///tmp/words_result.txt/part-r-00015
this    1
bash-4.2$ hadoop fs -cat hdfs:///tmp/words_result.txt/part-r-00022
hadoop    1
bash-4.2$ hadoop fs -cat hdfs:///tmp/words_result.txt/part-r-00024
hbase    1
bash-4.2$ hadoop fs -cat hdfs:///tmp/words_result.txt/part-r-00040
hadoop-node1    1
bash-4.2$ hadoop fs -cat hdfs:///tmp/words_result.txt/part-r-00041
hadoop-master    1
hadoop-node2    1
bash-4.2$ hadoop fs -cat hdfs:///tmp/words_result.txt/part-r-00045
my    1
bash-4.2$ hadoop fs -cat hdfs:///tmp/words_result.txt/part-r-00047
mytest    1
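Rather than cat-ing each part file individually, the whole output directory can also be merged into a single local file with getmerge (the local file name here is just an example):

hadoop fs -getmerge /tmp/words_result.txt ./words_result_merged.txt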
Reference: https://jeoygin.org/2012/02/22/running-hadoop-on-centos-single-node-cluster/