Testing is important for verifying a system's correctness and analyzing its performance, but it is often overlooked. To get a fuller picture of the system, find its bottlenecks, and improve its performance, I plan to start with testing and work through Hadoop's main testing tools.
TestDFSIO
TestDFSIO tests the I/O performance of HDFS. It uses a MapReduce job to perform reads and writes concurrently: each map task reads or writes one file, the map output collects statistics about the files processed, and the reduce task accumulates those statistics and produces a summary.
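A typical TestDFSIO session has three phases: write first (to generate the data), then read, then clean up. A minimal sketch using the flags exercised later in this post (the jar name matches the CDH 5.16.1 build used here):

  # write phase: each of 10 map tasks writes one 1000 MB file under /benchmarks/TestDFSIO
  hadoop jar hadoop-test-2.6.0-mr1-cdh5.16.1.jar TestDFSIO -write -nrFiles 10 -fileSize 1000
  # read phase: reads back the files created by the write phase
  hadoop jar hadoop-test-2.6.0-mr1-cdh5.16.1.jar TestDFSIO -read -nrFiles 10 -fileSize 1000
  # clean phase: deletes /benchmarks/TestDFSIO
  hadoop jar hadoop-test-2.6.0-mr1-cdh5.16.1.jar TestDFSIO -clean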
The NameNode address is 10.*.*.131:7180.
Running hadoop version prints the path where the Hadoop jar files are located.
Change into that directory and run hadoop jar hadoop-test-2.6.0-mr1-cdh5.16.1.jar, which returns the following:
An example program must be given as the first argument. Valid program names are:
  DFSCIOTest: Distributed i/o benchmark of libhdfs.
  DistributedFSCheck: Distributed checkup of the file system consistency.
  MRReliabilityTest: A program that tests the reliability of the MR framework by injecting faults/failures
  TestDFSIO: Distributed i/o benchmark.
  dfsthroughput: measure hdfs throughput
  filebench: Benchmark SequenceFile(Input|Output)Format (block,record compressed and uncompressed), Text(Input|Output)Format (compressed and uncompressed)
  loadgen: Generic map/reduce load generator
  mapredtest: A map/reduce test check.
  minicluster: Single process HDFS and MR cluster.
  mrbench: A map/reduce benchmark that can create many small jobs
  nnbench: A benchmark that stresses the namenode.
  testarrayfile: A test for flat files of binary key/value pairs.
  testbigmapoutput: A map/reduce program that works on a very big non-splittable file and does identity map/reduce
  testfilesystem: A test for FileSystem read/write.
  testmapredsort: A map/reduce program that validates the map-reduce framework's sort.
  testrpc: A test for rpc.
  testsequencefile: A test for flat files of binary key value pairs.
  testsequencefileinputformat: A test for sequence file input format.
  testsetfile: A test for flat files of binary key/value pairs.
  testtextinputformat: A test for text input format.
  threadedmapbench: A map/reduce benchmark that compares the performance of maps with multiple spills over maps with 1 spill
Run hadoop jar hadoop-test-2.6.0-mr1-cdh5.16.1.jar TestDFSIO -write -nrFiles 10 -fileSize 1000.
It returns the following:
19/04/02 16:22:30 INFO fs.TestDFSIO: TestDFSIO.1.7
19/04/02 16:22:30 INFO fs.TestDFSIO: nrFiles = 10
19/04/02 16:22:30 INFO fs.TestDFSIO: nrBytes (MB) = 1000.0
19/04/02 16:22:30 INFO fs.TestDFSIO: bufferSize = 1000000
19/04/02 16:22:30 INFO fs.TestDFSIO: baseDir = /benchmarks/TestDFSIO
19/04/02 16:22:31 INFO fs.TestDFSIO: creating control file: 1048576000 bytes, 10 files
java.io.IOException: Permission denied: user=root, access=WRITE, inode="/":hdfs:supergroup:drwxr-xr-x
This fails with: java.io.IOException: Permission denied: user=root, access=WRITE, inode="/":hdfs:supergroup:drwxr-xr-x
Run su hdfs to switch to the hdfs user.
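Alternatively, if sudo is configured for the hdfs account, each command can be prefixed instead of switching shells (an equivalent workaround, not what was done here):

  sudo -u hdfs hadoop jar hadoop-test-2.6.0-mr1-cdh5.16.1.jar TestDFSIO -write -nrFiles 10 -fileSize 1000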
Then run hadoop jar hadoop-test-2.6.0-mr1-cdh5.16.1.jar TestDFSIO -write -nrFiles 10 -fileSize 1000 again as the hdfs user.
It returns the following:
bash-4.2$ hadoop jar hadoop-test-2.6.0-mr1-cdh5.16.1.jar TestDFSIO -write -nrFiles 10 -fileSize 1000
19/04/02 16:26:39 INFO fs.TestDFSIO: TestDFSIO.1.7
19/04/02 16:26:39 INFO fs.TestDFSIO: nrFiles = 10
19/04/02 16:26:39 INFO fs.TestDFSIO: nrBytes (MB) = 1000.0
19/04/02 16:26:39 INFO fs.TestDFSIO: bufferSize = 1000000
19/04/02 16:26:39 INFO fs.TestDFSIO: baseDir = /benchmarks/TestDFSIO
19/04/02 16:26:40 INFO fs.TestDFSIO: creating control file: 1048576000 bytes, 10 files
19/04/02 16:26:40 INFO fs.TestDFSIO: created control files for: 10 files
19/04/02 16:26:40 INFO client.RMProxy: Connecting to ResourceManager at node1/10.200.101.131:8032
19/04/02 16:26:40 INFO client.RMProxy: Connecting to ResourceManager at node1/10.200.101.131:8032
19/04/02 16:26:41 INFO mapred.FileInputFormat: Total input paths to process : 10
19/04/02 16:26:41 INFO mapreduce.JobSubmitter: number of splits:10
19/04/02 16:26:41 INFO Configuration.deprecation: io.bytes.per.checksum is deprecated. Instead, use dfs.bytes-per-checksum
19/04/02 16:26:41 INFO Configuration.deprecation: dfs.https.address is deprecated. Instead, use dfs.namenode.https-address
19/04/02 16:26:41 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1552358721447_0002
19/04/02 16:26:41 INFO impl.YarnClientImpl: Submitted application application_1552358721447_0002
19/04/02 16:26:41 INFO mapreduce.Job: The url to track the job: http://node1:8088/proxy/application_1552358721447_0002/
19/04/02 16:26:41 INFO mapreduce.Job: Running job: job_1552358721447_0002
19/04/02 16:26:48 INFO mapreduce.Job: Job job_1552358721447_0002 running in uber mode : false
19/04/02 16:26:48 INFO mapreduce.Job:  map 0% reduce 0%
19/04/02 16:27:02 INFO mapreduce.Job:  map 30% reduce 0%
19/04/02 16:27:03 INFO mapreduce.Job:  map 100% reduce 0%
19/04/02 16:27:08 INFO mapreduce.Job:  map 100% reduce 100%
19/04/02 16:27:08 INFO mapreduce.Job: Job job_1552358721447_0002 completed successfully
19/04/02 16:27:08 INFO mapreduce.Job: Counters: 49
  File System Counters
    FILE: Number of bytes read=379
    FILE: Number of bytes written=1653843
    FILE: Number of read operations=0
    FILE: Number of large read operations=0
    FILE: Number of write operations=0
    HDFS: Number of bytes read=2310
    HDFS: Number of bytes written=10485760082
    HDFS: Number of read operations=43
    HDFS: Number of large read operations=0
    HDFS: Number of write operations=12
  Job Counters
    Launched map tasks=10
    Launched reduce tasks=1
    Data-local map tasks=10
    Total time spent by all maps in occupied slots (ms)=128477
    Total time spent by all reduces in occupied slots (ms)=2621
    Total time spent by all map tasks (ms)=128477
    Total time spent by all reduce tasks (ms)=2621
    Total vcore-milliseconds taken by all map tasks=128477
    Total vcore-milliseconds taken by all reduce tasks=2621
    Total megabyte-milliseconds taken by all map tasks=131560448
    Total megabyte-milliseconds taken by all reduce tasks=2683904
  Map-Reduce Framework
    Map input records=10
    Map output records=50
    Map output bytes=784
    Map output materialized bytes=1033
    Input split bytes=1190
    Combine input records=0
    Combine output records=0
    Reduce input groups=5
    Reduce shuffle bytes=1033
    Reduce input records=50
    Reduce output records=5
    Spilled Records=100
    Shuffled Maps =10
    Failed Shuffles=0
    Merged Map outputs=10
    GC time elapsed (ms)=2657
    CPU time spent (ms)=94700
    Physical memory (bytes) snapshot=7229349888
    Virtual memory (bytes) snapshot=32021716992
    Total committed heap usage (bytes)=6717702144
  Shuffle Errors
    BAD_ID=0
    CONNECTION=0
    IO_ERROR=0
    WRONG_LENGTH=0
    WRONG_MAP=0
    WRONG_REDUCE=0
  File Input Format Counters
    Bytes Read=1120
  File Output Format Counters
    Bytes Written=82
java.io.FileNotFoundException: TestDFSIO_results.log (Permission denied)
This also fails: java.io.FileNotFoundException: TestDFSIO_results.log (Permission denied)
This is because the hdfs user does not have sufficient permissions on the current working directory (TestDFSIO appends its summary to TestDFSIO_results.log in the local directory it is run from); see the comments under https://blog.csdn.net/qq_15547319/article/details/53543587 for details.
Solution: create a new directory ** (mkdir **), grant the hdfs user access to it (sudo chmod -R 777 **), cd into **, and run hadoop jar ../jars/hadoop-test-2.6.0-mr1-cdh5.16.1.jar TestDFSIO -write -nrFiles 10 -fileSize 1000 from there. It returns the following:
bash-4.2$ hadoop jar ../jars/hadoop-test-2.6.0-mr1-cdh5.16.1.jar TestDFSIO -write -nrFiles 10 -fileSize 1000
19/04/03 10:26:32 INFO fs.TestDFSIO: TestDFSIO.1.7
19/04/03 10:26:32 INFO fs.TestDFSIO: nrFiles = 10
19/04/03 10:26:32 INFO fs.TestDFSIO: nrBytes (MB) = 1000.0
19/04/03 10:26:32 INFO fs.TestDFSIO: bufferSize = 1000000
19/04/03 10:26:32 INFO fs.TestDFSIO: baseDir = /benchmarks/TestDFSIO
19/04/03 10:26:32 INFO fs.TestDFSIO: creating control file: 1048576000 bytes, 10 files
19/04/03 10:26:33 INFO fs.TestDFSIO: created control files for: 10 files
19/04/03 10:26:33 INFO client.RMProxy: Connecting to ResourceManager at node1/10.200.101.131:8032
19/04/03 10:26:33 INFO client.RMProxy: Connecting to ResourceManager at node1/10.200.101.131:8032
19/04/03 10:26:33 INFO mapred.FileInputFormat: Total input paths to process : 10
19/04/03 10:26:33 INFO mapreduce.JobSubmitter: number of splits:10
19/04/03 10:26:33 INFO Configuration.deprecation: io.bytes.per.checksum is deprecated. Instead, use dfs.bytes-per-checksum
19/04/03 10:26:33 INFO Configuration.deprecation: dfs.https.address is deprecated. Instead, use dfs.namenode.https-address
19/04/03 10:26:33 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1552358721447_0006
19/04/03 10:26:34 INFO impl.YarnClientImpl: Submitted application application_1552358721447_0006
19/04/03 10:26:34 INFO mapreduce.Job: The url to track the job: http://node1:8088/proxy/application_1552358721447_0006/
19/04/03 10:26:34 INFO mapreduce.Job: Running job: job_1552358721447_0006
19/04/03 10:26:39 INFO mapreduce.Job: Job job_1552358721447_0006 running in uber mode : false
19/04/03 10:26:39 INFO mapreduce.Job:  map 0% reduce 0%
19/04/03 10:26:53 INFO mapreduce.Job:  map 30% reduce 0%
19/04/03 10:26:54 INFO mapreduce.Job:  map 90% reduce 0%
19/04/03 10:26:55 INFO mapreduce.Job:  map 100% reduce 0%
19/04/03 10:27:00 INFO mapreduce.Job:  map 100% reduce 100%
19/04/03 10:27:00 INFO mapreduce.Job: Job job_1552358721447_0006 completed successfully
19/04/03 10:27:00 INFO mapreduce.Job: Counters: 49
  File System Counters
    FILE: Number of bytes read=392
    FILE: Number of bytes written=1653853
    FILE: Number of read operations=0
    FILE: Number of large read operations=0
    FILE: Number of write operations=0
    HDFS: Number of bytes read=2310
    HDFS: Number of bytes written=10485760082
    HDFS: Number of read operations=43
    HDFS: Number of large read operations=0
    HDFS: Number of write operations=12
  Job Counters
    Launched map tasks=10
    Launched reduce tasks=1
    Data-local map tasks=10
    Total time spent by all maps in occupied slots (ms)=125653
    Total time spent by all reduces in occupied slots (ms)=2636
    Total time spent by all map tasks (ms)=125653
    Total time spent by all reduce tasks (ms)=2636
    Total vcore-milliseconds taken by all map tasks=125653
    Total vcore-milliseconds taken by all reduce tasks=2636
    Total megabyte-milliseconds taken by all map tasks=128668672
    Total megabyte-milliseconds taken by all reduce tasks=2699264
  Map-Reduce Framework
    Map input records=10
    Map output records=50
    Map output bytes=783
    Map output materialized bytes=1030
    Input split bytes=1190
    Combine input records=0
    Combine output records=0
    Reduce input groups=5
    Reduce shuffle bytes=1030
    Reduce input records=50
    Reduce output records=5
    Spilled Records=100
    Shuffled Maps =10
    Failed Shuffles=0
    Merged Map outputs=10
    GC time elapsed (ms)=1881
    CPU time spent (ms)=78110
    Physical memory (bytes) snapshot=6980759552
    Virtual memory (bytes) snapshot=31983017984
    Total committed heap usage (bytes)=6693060608
  Shuffle Errors
    BAD_ID=0
    CONNECTION=0
    IO_ERROR=0
    WRONG_LENGTH=0
    WRONG_MAP=0
    WRONG_REDUCE=0
  File Input Format Counters
    Bytes Read=1120
  File Output Format Counters
    Bytes Written=82
19/04/03 10:27:00 INFO fs.TestDFSIO: ----- TestDFSIO ----- : write
19/04/03 10:27:00 INFO fs.TestDFSIO:            Date & time: Wed Apr 03 10:27:00 CST 2019
19/04/03 10:27:00 INFO fs.TestDFSIO:        Number of files: 10
19/04/03 10:27:00 INFO fs.TestDFSIO: Total MBytes processed: 10000.0
19/04/03 10:27:00 INFO fs.TestDFSIO:      Throughput mb/sec: 114.77630098937172
19/04/03 10:27:00 INFO fs.TestDFSIO: Average IO rate mb/sec: 115.29634094238281
19/04/03 10:27:00 INFO fs.TestDFSIO:  IO rate std deviation: 7.880011777295818
19/04/03 10:27:00 INFO fs.TestDFSIO:     Test exec time sec: 27.05
19/04/03 10:27:00 INFO fs.TestDFSIO:
bash-4.2$
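As a reading aid, my understanding of the TestDFSIO summary is: Throughput mb/sec is the total data volume divided by the summed I/O time of all map tasks, while Average IO rate mb/sec is the mean of the per-file rates (and IO rate std deviation measures their spread). On that reading, 10000 MB at about 114.8 MB/s implies roughly 87 seconds of aggregate map I/O time, even though the wall-clock Test exec time was only 27.05 seconds, because the 10 maps ran in parallel.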
After the test command completes successfully, a directory is created in the Hadoop file system (/benchmarks/TestDFSIO) to hold the generated test files, along with a set of small files. Downloading one of the small files to the local machine shows it is 1 KB in size; opening it in Notepad++ shows that the content is binary, not human-readable.
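The generated files can also be inspected in place with the standard HDFS shell instead of downloading them (<control file> below is a placeholder for any of the small files listed):

  hdfs dfs -ls -R /benchmarks/TestDFSIO    # list everything the benchmark created
  hdfs dfs -du -h /benchmarks/TestDFSIO    # sizes per subdirectory
  hdfs dfs -cat /benchmarks/TestDFSIO/<control file> | od -c | head    # confirms the content is binary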
Run: hadoop jar ../jars/hadoop-test-2.6.0-mr1-cdh5.16.1.jar TestDFSIO -read -nrFiles 10 -fileSize 1000
It returns the following:
bash-4.2$ hadoop jar ../jars/hadoop-test-2.6.0-mr1-cdh5.16.1.jar TestDFSIO -read -nrFiles 10 -fileSize 1000
19/04/03 10:51:05 INFO fs.TestDFSIO: TestDFSIO.1.7
19/04/03 10:51:05 INFO fs.TestDFSIO: nrFiles = 10
19/04/03 10:51:05 INFO fs.TestDFSIO: nrBytes (MB) = 1000.0
19/04/03 10:51:05 INFO fs.TestDFSIO: bufferSize = 1000000
19/04/03 10:51:05 INFO fs.TestDFSIO: baseDir = /benchmarks/TestDFSIO
19/04/03 10:51:05 INFO fs.TestDFSIO: creating control file: 1048576000 bytes, 10 files
19/04/03 10:51:06 INFO fs.TestDFSIO: created control files for: 10 files
19/04/03 10:51:06 INFO client.RMProxy: Connecting to ResourceManager at node1/10.200.101.131:8032
19/04/03 10:51:06 INFO client.RMProxy: Connecting to ResourceManager at node1/10.200.101.131:8032
19/04/03 10:51:06 INFO mapred.FileInputFormat: Total input paths to process : 10
19/04/03 10:51:06 INFO mapreduce.JobSubmitter: number of splits:10
19/04/03 10:51:06 INFO Configuration.deprecation: io.bytes.per.checksum is deprecated. Instead, use dfs.bytes-per-checksum
19/04/03 10:51:06 INFO Configuration.deprecation: dfs.https.address is deprecated. Instead, use dfs.namenode.https-address
19/04/03 10:51:06 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1552358721447_0007
19/04/03 10:51:07 INFO impl.YarnClientImpl: Submitted application application_1552358721447_0007
19/04/03 10:51:07 INFO mapreduce.Job: The url to track the job: http://node1:8088/proxy/application_1552358721447_0007/
19/04/03 10:51:07 INFO mapreduce.Job: Running job: job_1552358721447_0007
19/04/03 10:51:12 INFO mapreduce.Job: Job job_1552358721447_0007 running in uber mode : false
19/04/03 10:51:12 INFO mapreduce.Job:  map 0% reduce 0%
19/04/03 10:51:19 INFO mapreduce.Job:  map 100% reduce 0%
19/04/03 10:51:25 INFO mapreduce.Job:  map 100% reduce 100%
19/04/03 10:51:25 INFO mapreduce.Job: Job job_1552358721447_0007 completed successfully
19/04/03 10:51:25 INFO mapreduce.Job: Counters: 49
  File System Counters
    FILE: Number of bytes read=345
    FILE: Number of bytes written=1653774
    FILE: Number of read operations=0
    FILE: Number of large read operations=0
    FILE: Number of write operations=0
    HDFS: Number of bytes read=10485762310
    HDFS: Number of bytes written=81
    HDFS: Number of read operations=53
    HDFS: Number of large read operations=0
    HDFS: Number of write operations=2
  Job Counters
    Launched map tasks=10
    Launched reduce tasks=1
    Data-local map tasks=10
    Total time spent by all maps in occupied slots (ms)=50265
    Total time spent by all reduces in occupied slots (ms)=2630
    Total time spent by all map tasks (ms)=50265
    Total time spent by all reduce tasks (ms)=2630
    Total vcore-milliseconds taken by all map tasks=50265
    Total vcore-milliseconds taken by all reduce tasks=2630
    Total megabyte-milliseconds taken by all map tasks=51471360
    Total megabyte-milliseconds taken by all reduce tasks=2693120
  Map-Reduce Framework
    Map input records=10
    Map output records=50
    Map output bytes=774
    Map output materialized bytes=1020
    Input split bytes=1190
    Combine input records=0
    Combine output records=0
    Reduce input groups=5
    Reduce shuffle bytes=1020
    Reduce input records=50
    Reduce output records=5
    Spilled Records=100
    Shuffled Maps =10
    Failed Shuffles=0
    Merged Map outputs=10
    GC time elapsed (ms)=1310
    CPU time spent (ms)=35780
    Physical memory (bytes) snapshot=6365962240
    Virtual memory (bytes) snapshot=31838441472
    Total committed heap usage (bytes)=6873415680
  Shuffle Errors
    BAD_ID=0
    CONNECTION=0
    IO_ERROR=0
    WRONG_LENGTH=0
    WRONG_MAP=0
    WRONG_REDUCE=0
  File Input Format Counters
    Bytes Read=1120
  File Output Format Counters
    Bytes Written=81
19/04/03 10:51:25 INFO fs.TestDFSIO: ----- TestDFSIO ----- : read
19/04/03 10:51:25 INFO fs.TestDFSIO:            Date & time: Wed Apr 03 10:51:25 CST 2019
19/04/03 10:51:25 INFO fs.TestDFSIO:        Number of files: 10
19/04/03 10:51:25 INFO fs.TestDFSIO: Total MBytes processed: 10000.0
19/04/03 10:51:25 INFO fs.TestDFSIO:      Throughput mb/sec: 897.4243919949744
19/04/03 10:51:25 INFO fs.TestDFSIO: Average IO rate mb/sec: 898.6844482421875
19/04/03 10:51:25 INFO fs.TestDFSIO:  IO rate std deviation: 33.68623587810037
19/04/03 10:51:25 INFO fs.TestDFSIO:     Test exec time sec: 19.035
19/04/03 10:51:25 INFO fs.TestDFSIO:
bash-4.2$
Run: hadoop jar ../jars/hadoop-test-2.6.0-mr1-cdh5.16.1.jar TestDFSIO -clean
It returns the following:
bash-4.2$ hadoop jar ../jars/hadoop-test-2.6.0-mr1-cdh5.16.1.jar TestDFSIO -clean
19/04/03 11:17:25 INFO fs.TestDFSIO: TestDFSIO.1.7
19/04/03 11:17:25 INFO fs.TestDFSIO: nrFiles = 1
19/04/03 11:17:25 INFO fs.TestDFSIO: nrBytes (MB) = 1.0
19/04/03 11:17:25 INFO fs.TestDFSIO: bufferSize = 1000000
19/04/03 11:17:25 INFO fs.TestDFSIO: baseDir = /benchmarks/TestDFSIO
19/04/03 11:17:26 INFO fs.TestDFSIO: Cleaning up test files
bash-4.2$
The TestDFSIO directory is removed from the Hadoop file system at the same time.
nnbench
nnbench stress-tests the NameNode: it generates a large number of HDFS-related requests, placing heavy load on the NameNode. It can simulate creating, reading, renaming, and deleting files on HDFS.
The nnbench options are described below:
NameNode Benchmark 0.4
Usage: nnbench <options>
Options:
  -operation <Available operations are create_write open_read rename delete. This option is mandatory>
    * NOTE: The open_read, rename and delete operations assume that the files they operate on, are already available. The create_write operation must be run before running the other operations.
  -maps <number of maps. default is 1. This is not mandatory>
  -reduces <number of reduces. default is 1. This is not mandatory>
  -startTime <time to start, given in seconds from the epoch. Make sure this is far enough into the future, so all maps (operations) will start at the same time>. default is launch time + 2 mins. This is not mandatory
  -blockSize <Block size in bytes. default is 1. This is not mandatory>
  -bytesToWrite <Bytes to write. default is 0. This is not mandatory>
  -bytesPerChecksum <Bytes per checksum for the files. default is 1. This is not mandatory>
  -numberOfFiles <number of files to create. default is 1. This is not mandatory>
  -replicationFactorPerFile <Replication factor for the files. default is 1. This is not mandatory>
  -baseDir <base DFS path. default is /becnhmarks/NNBench. This is not mandatory>
  -readFileAfterOpen <true or false. if true, it reads the file and reports the average time to read. This is valid with the open_read operation. default is false. This is not mandatory>
  -help: Display the help statement
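Note the ordering constraint in the usage text: create_write must run before open_read, rename, and delete, since the latter operate on files that must already exist. A minimal sketch of a full pass over all four operations (same jar and base directory as the run below; other flags left at their defaults):

  hadoop jar ../jars/hadoop-test-2.6.0-mr1-cdh5.16.1.jar nnbench -operation create_write -numberOfFiles 1000 -baseDir /benchmarks/NNBench
  hadoop jar ../jars/hadoop-test-2.6.0-mr1-cdh5.16.1.jar nnbench -operation open_read -numberOfFiles 1000 -baseDir /benchmarks/NNBench
  hadoop jar ../jars/hadoop-test-2.6.0-mr1-cdh5.16.1.jar nnbench -operation rename -numberOfFiles 1000 -baseDir /benchmarks/NNBench
  hadoop jar ../jars/hadoop-test-2.6.0-mr1-cdh5.16.1.jar nnbench -operation delete -numberOfFiles 1000 -baseDir /benchmarks/NNBench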
To create 1000 files using 12 mappers and 6 reducers, run: hadoop jar ../jars/hadoop-test-2.6.0-mr1-cdh5.16.1.jar nnbench -operation create_write -maps 12 -reduces 6 -blockSize 1 -bytesToWrite 0 -numberOfFiles 1000 -replicationFactorPerFile 3 -readFileAfterOpen true -baseDir /benchmarks/NNBench
It returns the following:
bash-4.2$ hadoop jar ../jars/hadoop-test-2.6.0-mr1-cdh5.16.1.jar nnbench -operation create_write -maps 12 -reduces 6 -blockSize 1 -bytesToWrite 0 -numberOfFiles 1000 -replicationFactorPerFile 3 -readFileAfterOpen true -baseDir /benchmarks/NNBench
NameNode Benchmark 0.4
19/04/03 16:11:22 INFO hdfs.NNBench: Test Inputs:
19/04/03 16:11:22 INFO hdfs.NNBench:            Test Operation: create_write
19/04/03 16:11:22 INFO hdfs.NNBench:                Start time: 2019-04-03 16:13:22,755
19/04/03 16:11:22 INFO hdfs.NNBench:            Number of maps: 12
19/04/03 16:11:22 INFO hdfs.NNBench:         Number of reduces: 6
19/04/03 16:11:22 INFO hdfs.NNBench:                Block Size: 1
19/04/03 16:11:22 INFO hdfs.NNBench:            Bytes to write: 0
19/04/03 16:11:22 INFO hdfs.NNBench:        Bytes per checksum: 1
19/04/03 16:11:22 INFO hdfs.NNBench:           Number of files: 1000
19/04/03 16:11:22 INFO hdfs.NNBench:        Replication factor: 3
19/04/03 16:11:22 INFO hdfs.NNBench:                  Base dir: /benchmarks/NNBench
19/04/03 16:11:22 INFO hdfs.NNBench:      Read file after open: true
19/04/03 16:11:23 INFO hdfs.NNBench: Deleting data directory
19/04/03 16:11:23 INFO hdfs.NNBench: Creating 12 control files
19/04/03 16:11:24 INFO Configuration.deprecation: io.bytes.per.checksum is deprecated. Instead, use dfs.bytes-per-checksum
19/04/03 16:11:24 INFO client.RMProxy: Connecting to ResourceManager at node1/10.200.101.131:8032
19/04/03 16:11:24 INFO client.RMProxy: Connecting to ResourceManager at node1/10.200.101.131:8032
19/04/03 16:11:24 WARN mapreduce.JobResourceUploader: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
19/04/03 16:11:24 INFO mapred.FileInputFormat: Total input paths to process : 12
19/04/03 16:11:24 INFO mapreduce.JobSubmitter: number of splits:12
19/04/03 16:11:24 INFO Configuration.deprecation: io.bytes.per.checksum is deprecated. Instead, use dfs.bytes-per-checksum
19/04/03 16:11:24 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1552358721447_0009
19/04/03 16:11:24 INFO impl.YarnClientImpl: Submitted application application_1552358721447_0009
19/04/03 16:11:24 INFO mapreduce.Job: The url to track the job: http://node1:8088/proxy/application_1552358721447_0009/
19/04/03 16:11:24 INFO mapreduce.Job: Running job: job_1552358721447_0009
19/04/03 16:11:31 INFO mapreduce.Job: Job job_1552358721447_0009 running in uber mode : false
19/04/03 16:11:31 INFO mapreduce.Job:  map 0% reduce 0%
19/04/03 16:11:48 INFO mapreduce.Job:  map 50% reduce 0%
19/04/03 16:11:49 INFO mapreduce.Job:  map 67% reduce 0%
19/04/03 16:13:26 INFO mapreduce.Job:  map 100% reduce 0%
19/04/03 16:13:31 INFO mapreduce.Job:  map 100% reduce 17%
19/04/03 16:13:32 INFO mapreduce.Job:  map 100% reduce 100%
19/04/03 16:13:32 INFO mapreduce.Job: Job job_1552358721447_0009 completed successfully
19/04/03 16:13:32 INFO mapreduce.Job: Counters: 49
  File System Counters
    FILE: Number of bytes read=519
    FILE: Number of bytes written=2736365
    FILE: Number of read operations=0
    FILE: Number of large read operations=0
    FILE: Number of write operations=0
    HDFS: Number of bytes read=2908
    HDFS: Number of bytes written=170
    HDFS: Number of read operations=66
    HDFS: Number of large read operations=0
    HDFS: Number of write operations=12012
  Job Counters
    Launched map tasks=12
    Launched reduce tasks=6
    Data-local map tasks=12
    Total time spent by all maps in occupied slots (ms)=1363711
    Total time spent by all reduces in occupied slots (ms)=18780
    Total time spent by all map tasks (ms)=1363711
    Total time spent by all reduce tasks (ms)=18780
    Total vcore-milliseconds taken by all map tasks=1363711
    Total vcore-milliseconds taken by all reduce tasks=18780
    Total megabyte-milliseconds taken by all map tasks=1396440064
    Total megabyte-milliseconds taken by all reduce tasks=19230720
  Map-Reduce Framework
    Map input records=12
    Map output records=84
    Map output bytes=2016
    Map output materialized bytes=3276
    Input split bytes=1418
    Combine input records=0
    Combine output records=0
    Reduce input groups=7
    Reduce shuffle bytes=3276
    Reduce input records=84
    Reduce output records=7
    Spilled Records=168
    Shuffled Maps =72
    Failed Shuffles=0
    Merged Map outputs=72
    GC time elapsed (ms)=2335
    CPU time spent (ms)=35880
    Physical memory (bytes) snapshot=9088864256
    Virtual memory (bytes) snapshot=52095377408
    Total committed heap usage (bytes)=11191975936
  Shuffle Errors
    BAD_ID=0
    CONNECTION=0
    IO_ERROR=0
    WRONG_LENGTH=0
    WRONG_MAP=0
    WRONG_REDUCE=0
  File Input Format Counters
    Bytes Read=1490
  File Output Format Counters
    Bytes Written=170
19/04/03 16:13:32 INFO hdfs.NNBench: -------------- NNBench -------------- :
19/04/03 16:13:32 INFO hdfs.NNBench:                                Version: NameNode Benchmark 0.4
19/04/03 16:13:32 INFO hdfs.NNBench:                            Date & time: 2019-04-03 16:13:32,475
19/04/03 16:13:32 INFO hdfs.NNBench:
19/04/03 16:13:32 INFO hdfs.NNBench:                         Test Operation: create_write
19/04/03 16:13:32 INFO hdfs.NNBench:                             Start time: 2019-04-03 16:13:22,755
19/04/03 16:13:32 INFO hdfs.NNBench:                            Maps to run: 12
19/04/03 16:13:32 INFO hdfs.NNBench:                         Reduces to run: 6
19/04/03 16:13:32 INFO hdfs.NNBench:                     Block Size (bytes): 1
19/04/03 16:13:32 INFO hdfs.NNBench:                         Bytes to write: 0
19/04/03 16:13:32 INFO hdfs.NNBench:                     Bytes per checksum: 1
19/04/03 16:13:32 INFO hdfs.NNBench:                        Number of files: 1000
19/04/03 16:13:32 INFO hdfs.NNBench:                     Replication factor: 3
19/04/03 16:13:32 INFO hdfs.NNBench:             Successful file operations: 0
19/04/03 16:13:32 INFO hdfs.NNBench:
19/04/03 16:13:32 INFO hdfs.NNBench:         # maps that missed the barrier: 0
19/04/03 16:13:32 INFO hdfs.NNBench:                           # exceptions: 0
19/04/03 16:13:32 INFO hdfs.NNBench:
19/04/03 16:13:32 INFO hdfs.NNBench:                TPS: Create/Write/Close: 0
19/04/03 16:13:32 INFO hdfs.NNBench: Avg exec time (ms): Create/Write/Close: 0.0
19/04/03 16:13:32 INFO hdfs.NNBench:             Avg Lat (ms): Create/Write: NaN
19/04/03 16:13:32 INFO hdfs.NNBench:                    Avg Lat (ms): Close: NaN
19/04/03 16:13:32 INFO hdfs.NNBench:
19/04/03 16:13:32 INFO hdfs.NNBench:                  RAW DATA: AL Total #1: 0
19/04/03 16:13:32 INFO hdfs.NNBench:                  RAW DATA: AL Total #2: 0
19/04/03 16:13:32 INFO hdfs.NNBench:               RAW DATA: TPS Total (ms): 0
19/04/03 16:13:32 INFO hdfs.NNBench:        RAW DATA: Longest Map Time (ms): 0.0
19/04/03 16:13:32 INFO hdfs.NNBench:                    RAW DATA: Late maps: 0
19/04/03 16:13:32 INFO hdfs.NNBench:              RAW DATA: # of exceptions: 0
19/04/03 16:13:32 INFO hdfs.NNBench:
bash-4.2$
Once the job has finished, its details can be viewed at http://*.*.*.*:19888/jobhistory/job/job_1552358721447_0009.
An NNBench directory is also created in the Hadoop file system to store the files produced by the job.
Navigating to /benchmarks/NNBench/control and viewing the metadata of one of the files, NNBench_Controlfile_0, shows that the file is stored on three nodes.
Downloading it and opening it in Notepad++ shows that the content is unreadable binary data.
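The replica placement can also be confirmed from the command line with the standard fsck tool, using the path shown above:

  hdfs fsck /benchmarks/NNBench/control/NNBench_Controlfile_0 -files -blocks -locations
  # -files prints the file status, -blocks lists its blocks, and -locations shows
  # the DataNodes holding each replica (three here, matching the metadata view)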
mrbench
mrbench runs a small job repeatedly, checking whether small jobs run reproducibly and efficiently on the cluster. Its usage is as follows:
Usage: mrbench
  [-baseDir <base DFS path for output/input, default is /benchmarks/MRBench>]
  [-jar <local path to job jar file containing Mapper and Reducer implementations, default is current jar file>]
  [-numRuns <number of times to run the job, default is 1>]
  [-maps <number of maps for each run, default is 2>]
  [-reduces <number of reduces for each run, default is 1>]
  [-inputLines <number of input lines to generate, default is 1>]
  [-inputType <type of input to generate, one of ascending (default), descending, random>]
  [-verbose]
Run: hadoop jar ../jars/hadoop-test-2.6.0-mr1-cdh5.16.1.jar mrbench -numRuns 50
It returns the following:
……
  Shuffle Errors
    BAD_ID=0
    CONNECTION=0
    IO_ERROR=0
    WRONG_LENGTH=0
    WRONG_MAP=0
    WRONG_REDUCE=0
  File Input Format Counters
    Bytes Read=3
  File Output Format Counters
    Bytes Written=3
19/04/03 17:10:15 INFO mapred.MRBench: Running job 49: input=hdfs://node1:8020/benchmarks/MRBench/mr_input output=hdfs://node1:8020/benchmarks/MRBench/mr_output/output_299739316
19/04/03 17:10:15 INFO client.RMProxy: Connecting to ResourceManager at node1/10.200.101.131:8032
19/04/03 17:10:15 INFO client.RMProxy: Connecting to ResourceManager at node1/10.200.101.131:8032
19/04/03 17:10:15 INFO mapred.FileInputFormat: Total input paths to process : 1
19/04/03 17:10:15 INFO mapreduce.JobSubmitter: number of splits:2
19/04/03 17:10:15 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1552358721447_0059
19/04/03 17:10:15 INFO impl.YarnClientImpl: Submitted application application_1552358721447_0059
19/04/03 17:10:15 INFO mapreduce.Job: The url to track the job: http://node1:8088/proxy/application_1552358721447_0059/
19/04/03 17:10:15 INFO mapreduce.Job: Running job: job_1552358721447_0059
19/04/03 17:10:21 INFO mapreduce.Job: Job job_1552358721447_0059 running in uber mode : false
19/04/03 17:10:21 INFO mapreduce.Job:  map 0% reduce 0%
19/04/03 17:10:25 INFO mapreduce.Job:  map 100% reduce 0%
19/04/03 17:10:30 INFO mapreduce.Job:  map 100% reduce 100%
19/04/03 17:10:30 INFO mapreduce.Job: Job job_1552358721447_0059 completed successfully
19/04/03 17:10:30 INFO mapreduce.Job: Counters: 49
  File System Counters
    FILE: Number of bytes read=27
    FILE: Number of bytes written=450422
    FILE: Number of read operations=0
    FILE: Number of large read operations=0
    FILE: Number of write operations=0
    HDFS: Number of bytes read=239
    HDFS: Number of bytes written=3
    HDFS: Number of read operations=9
    HDFS: Number of large read operations=0
    HDFS: Number of write operations=2
  Job Counters
    Launched map tasks=2
    Launched reduce tasks=1
    Data-local map tasks=2
    Total time spent by all maps in occupied slots (ms)=5134
    Total time spent by all reduces in occupied slots (ms)=2562
    Total time spent by all map tasks (ms)=5134
    Total time spent by all reduce tasks (ms)=2562
    Total vcore-milliseconds taken by all map tasks=5134
    Total vcore-milliseconds taken by all reduce tasks=2562
    Total megabyte-milliseconds taken by all map tasks=5257216
    Total megabyte-milliseconds taken by all reduce tasks=2623488
  Map-Reduce Framework
    Map input records=1
    Map output records=1
    Map output bytes=5
    Map output materialized bytes=39
    Input split bytes=236
    Combine input records=0
    Combine output records=0
    Reduce input groups=1
    Reduce shuffle bytes=39
    Reduce input records=1
    Reduce output records=1
    Spilled Records=2
    Shuffled Maps =2
    Failed Shuffles=0
    Merged Map outputs=2
    GC time elapsed (ms)=196
    CPU time spent (ms)=2550
    Physical memory (bytes) snapshot=1503531008
    Virtual memory (bytes) snapshot=8690847744
    Total committed heap usage (bytes)=1791492096
  Shuffle Errors
    BAD_ID=0
    CONNECTION=0
    IO_ERROR=0
    WRONG_LENGTH=0
    WRONG_MAP=0
    WRONG_REDUCE=0
  File Input Format Counters
    Bytes Read=3
  File Output Format Counters
    Bytes Written=3
DataLines       Maps    Reduces AvgTime (milliseconds)
1               2       1       15357
bash-4.2$
The AvgTime column shows that the average job completion time was 15357 ms, i.e. about 15 seconds per run.
Opening http://*.*.*.*:8088/cluster shows information about the jobs that were executed.
Corresponding directories were also created in the Hadoop file system, but their contents are empty.
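mrbench has no clean option (see the usage above), so the leftover directories under /benchmarks/MRBench have to be removed by hand if desired, e.g. with the standard HDFS shell:

  hdfs dfs -rm -r /benchmarks/MRBench    # removes the empty mr_input/mr_output directories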
References: https://blog.51cto.com/7543154/1243883 ; http://www.javashuo.com/article/p-ftetlymu-ha.html ; https://blog.csdn.net/flygoa/article/details/52127382