This post is part of the Hadoop family series, which covers the Hadoop family of products. Frequently used projects include Hadoop, Hive, Pig, HBase, Sqoop, Mahout, Zookeeper, Avro, Ambari, and Chukwa; newer additions include YARN, Hcatalog, Oozie, Cassandra, Hama, Whirr, Flume, Bigtop, Crunch, Hue, and others.
Since 2011, China has entered an era of booming big data, and the Hadoop family of software, with Hadoop itself as its flagship, has come to dominate big-data processing. Open-source projects and vendors alike have aligned their data software with Hadoop, and Hadoop has grown from a niche, elite technology into the de facto standard for big-data development. On top of the original Hadoop stack, a whole family of Hadoop products has emerged, continuously innovating around the idea of "big data" and pushing the technology forward.
As developers in the IT industry, we should keep pace, seize the opportunity, and rise together with Hadoop!
Please credit the source when reposting:
http://blog.fens.me/hadoop-mahout-mapreduce-itemcf/
Preface
Mahout is a member of the Hadoop family and, by lineage, inherits the characteristics of Hadoop programs: HDFS access and distributed MapReduce algorithms. As Mahout has evolved, version 0.7 brought a major change: the single-machine, in-memory implementations of some algorithms were removed, leaving only Hadoop-based MapReduce parallel computation.
This shows Mahout's determination to go all-in on big data and parallelization. Within the larger Hadoop framework, Mahout may well become a star product for big data.
In the article "Building a Mahout Project with Maven" we already set up a Maven-based Mahout development environment; in this post we continue with distributed program development in Mahout.
This article uses Mahout 0.8.
Development environment: the same Maven + Eclipse setup from the earlier article, on either Win7 or Linux.
In pom.xml, change the Mahout version to 0.8:
<mahout.version>0.8</mahout.version>
Then download the dependencies:
~ mvn clean install
Because the class org.conan.mymahout.cluster06.Kmeans.java was written against mahout-0.6, it will no longer compile; comment it out for now.
We can develop on either Win7 or Linux, and during development we can debug in the local environment; the standard tools are Maven and Eclipse.
While running, Mahout automatically ships the packaged MapReduce algorithm code to the Hadoop cluster, so this development-and-execution model is very close to a real production environment.
Implementation steps:
1). Prepare the data file: item.csv
Upload the test data to HDFS. For the single-machine, in-memory version of this experiment, see the article: Building a Mahout Project with Maven.
~ hadoop fs -mkdir /user/hdfs/userCF
~ hadoop fs -copyFromLocal /home/conan/datafiles/item.csv /user/hdfs/userCF
~ hadoop fs -cat /user/hdfs/userCF/item.csv

1,101,5.0
1,102,3.0
1,103,2.5
2,101,2.0
2,102,2.5
2,103,5.0
2,104,2.0
3,101,2.5
3,104,4.0
3,105,4.5
3,107,5.0
4,101,5.0
4,103,3.0
4,104,4.5
4,106,4.0
5,101,4.0
5,102,3.0
5,103,2.0
5,104,4.0
5,105,3.5
5,106,4.0
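Each row of item.csv is a userID,itemID,preference triple, which is the same format Mahout's in-memory recommenders read. As an optional local sanity check before uploading, a minimal sketch along these lines should work (CheckItemCsv is a hypothetical helper, not part of the original project; it assumes the datafile/item.csv path used by the program below):

// Local sanity check of item.csv with Mahout's in-memory DataModel.
import java.io.File;

import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
import org.apache.mahout.cf.taste.model.DataModel;

public class CheckItemCsv {
    public static void main(String[] args) throws Exception {
        DataModel model = new FileDataModel(new File("datafile/item.csv"));
        System.out.println("users: " + model.getNumUsers());   // expect 5 for the sample data
        System.out.println("items: " + model.getNumItems());   // expect 7 for the sample data
    }
}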
2). Java program: HdfsDAO.java
HdfsDAO.java is a utility class for HDFS operations; it implements the common HDFS commands through the Hadoop API. See the article: Hadoop Programming: Calling HDFS.
Here we use the following methods of the HdfsDAO.java class:
HdfsDAO hdfs = new HdfsDAO(HDFS, conf);
hdfs.rmr(inPath);
hdfs.mkdirs(inPath);
hdfs.copyFile(localFile, inPath);
hdfs.ls(inPath);
hdfs.cat(inFile);
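The full HdfsDAO.java is covered in the linked article; as a rough reference, here is a minimal sketch of how these calls map onto the Hadoop FileSystem API (the class name and the lack of error handling are simplifications of mine, not the original code):

// A minimal sketch of the HDFS operations used above (not the original HdfsDAO.java).
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class HdfsDAOSketch {

    private final String hdfsPath;   // e.g. "hdfs://192.168.1.210:9000"
    private final Configuration conf;

    public HdfsDAOSketch(String hdfsPath, Configuration conf) {
        this.hdfsPath = hdfsPath;
        this.conf = conf;
    }

    private FileSystem fs() throws Exception {
        return FileSystem.get(URI.create(hdfsPath), conf);
    }

    // hadoop fs -rmr <folder>
    public void rmr(String folder) throws Exception {
        fs().delete(new Path(folder), true);
    }

    // hadoop fs -mkdir <folder>
    public void mkdirs(String folder) throws Exception {
        fs().mkdirs(new Path(folder));
    }

    // hadoop fs -copyFromLocal <local> <remote>
    public void copyFile(String local, String remote) throws Exception {
        fs().copyFromLocalFile(new Path(local), new Path(remote));
    }

    // hadoop fs -ls <folder>
    public void ls(String folder) throws Exception {
        for (FileStatus f : fs().listStatus(new Path(folder))) {
            System.out.println("name: " + f.getPath() + ", folder: " + f.isDir() + ", size: " + f.getLen());
        }
    }

    // hadoop fs -cat <file>
    public void cat(String remoteFile) throws Exception {
        FSDataInputStream in = fs().open(new Path(remoteFile));
        try {
            IOUtils.copyBytes(in, System.out, 4096, false);
        } finally {
            in.close();
        }
    }
}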
3). Java program: ItemCFHadoop.java
The distributed algorithm we implement with Mahout follows the item-based collaborative filtering workflow explained in Mahout in Action.
The implementation:
package org.conan.mymahout.recommendation;

import org.apache.hadoop.mapred.JobConf;
import org.apache.mahout.cf.taste.hadoop.item.RecommenderJob;
import org.conan.mymahout.hdfs.HdfsDAO;

public class ItemCFHadoop {

    private static final String HDFS = "hdfs://192.168.1.210:9000";

    public static void main(String[] args) throws Exception {
        String localFile = "datafile/item.csv";
        String inPath = HDFS + "/user/hdfs/userCF";
        String inFile = inPath + "/item.csv";
        String outPath = HDFS + "/user/hdfs/userCF/result/";
        String outFile = outPath + "/part-r-00000";
        String tmpPath = HDFS + "/tmp/" + System.currentTimeMillis();

        JobConf conf = config();
        HdfsDAO hdfs = new HdfsDAO(HDFS, conf);
        hdfs.rmr(inPath);
        hdfs.mkdirs(inPath);
        hdfs.copyFile(localFile, inPath);
        hdfs.ls(inPath);
        hdfs.cat(inFile);

        StringBuilder sb = new StringBuilder();
        sb.append("--input ").append(inPath);
        sb.append(" --output ").append(outPath);
        sb.append(" --booleanData true");
        sb.append(" --similarityClassname org.apache.mahout.math.hadoop.similarity.cooccurrence.measures.EuclideanDistanceSimilarity");
        sb.append(" --tempDir ").append(tmpPath);
        args = sb.toString().split(" ");

        RecommenderJob job = new RecommenderJob();
        job.setConf(conf);
        job.run(args);

        hdfs.cat(outFile);
    }

    public static JobConf config() {
        JobConf conf = new JobConf(ItemCFHadoop.class);
        conf.setJobName("ItemCFHadoop");
        conf.addResource("classpath:/hadoop/core-site.xml");
        conf.addResource("classpath:/hadoop/hdfs-site.xml");
        conf.addResource("classpath:/hadoop/mapred-site.xml");
        return conf;
    }
}
RecommenderJob.java essentially wraps the execution of the whole distributed, parallel workflow: the 8-step MapReduce pipeline from Mahout in Action. Without this wrapper, we would have to implement each of those 8 MapReduce steps ourselves.
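RecommenderJob is also a standard Hadoop Tool (it extends Mahout's AbstractJob), so instead of the setConf()/run() pattern above it can be launched through ToolRunner, which additionally parses generic Hadoop options. A minimal sketch under that assumption; the argument values mirror the StringBuilder above, the tempDir value is only illustrative, and the Hadoop *-site.xml resources are expected on the classpath (or can be passed in via a JobConf, as in config() above):

// Alternative launch of RecommenderJob through ToolRunner (a sketch, not the original code).
import org.apache.hadoop.util.ToolRunner;
import org.apache.mahout.cf.taste.hadoop.item.RecommenderJob;

public class RunRecommenderJob {
    public static void main(String[] args) throws Exception {
        String[] jobArgs = {
            "--input", "hdfs://192.168.1.210:9000/user/hdfs/userCF",
            "--output", "hdfs://192.168.1.210:9000/user/hdfs/userCF/result/",
            "--booleanData", "true",
            "--similarityClassname",
            "org.apache.mahout.math.hadoop.similarity.cooccurrence.measures.EuclideanDistanceSimilarity",
            "--tempDir", "hdfs://192.168.1.210:9000/tmp/itemcf"   // illustrative temp path
        };
        // ToolRunner.run(conf, tool, args) also accepts a pre-built Configuration/JobConf.
        int exitCode = ToolRunner.run(new RecommenderJob(), jobArgs);
        System.exit(exitCode);
    }
}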
For a deeper analysis of the algorithm, see the article: Implementing the MapReduce Collaborative Filtering Algorithm in R.
4). Run the program
Console output:
Delete: hdfs://192.168.1.210:9000/user/hdfs/userCF Create: hdfs://192.168.1.210:9000/user/hdfs/userCF copy from: datafile/item.csv to hdfs://192.168.1.210:9000/user/hdfs/userCF ls: hdfs://192.168.1.210:9000/user/hdfs/userCF ========================================================== name: hdfs://192.168.1.210:9000/user/hdfs/userCF/item.csv, folder: false, size: 229 ========================================================== cat: hdfs://192.168.1.210:9000/user/hdfs/userCF/item.csv 1,101,5.0 1,102,3.0 1,103,2.5 2,101,2.0 2,102,2.5 2,103,5.0 2,104,2.0 3,101,2.5 3,104,4.0 3,105,4.5 3,107,5.0 4,101,5.0 4,103,3.0 4,104,4.5 4,106,4.0 5,101,4.0 5,102,3.0 5,103,2.0 5,104,4.0 5,105,3.5 5,106,4.0SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder". SLF4J: Defaulting to no-operation (NOP) logger implementation SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details. 2013-10-14 10:26:35 org.apache.hadoop.util.NativeCodeLoader 警告: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable 2013-10-14 10:26:35 org.apache.hadoop.mapreduce.lib.input.FileInputFormat listStatus 信息: Total input paths to process : 1 2013-10-14 10:26:35 org.apache.hadoop.io.compress.snappy.LoadSnappy 警告: Snappy native library not loaded 2013-10-14 10:26:36 org.apache.hadoop.mapred.JobClient monitorAndPrintJob 信息: Running job: job_local_0001 2013-10-14 10:26:36 org.apache.hadoop.mapred.Task initialize 信息: Using ResourceCalculatorPlugin : null 2013-10-14 10:26:36 org.apache.hadoop.mapred.MapTask$MapOutputBuffer 信息: io.sort.mb = 100 2013-10-14 10:26:36 org.apache.hadoop.mapred.MapTask$MapOutputBuffer 信息: data buffer = 79691776/99614720 2013-10-14 10:26:36 org.apache.hadoop.mapred.MapTask$MapOutputBuffer 信息: record buffer = 262144/327680 2013-10-14 10:26:36 org.apache.hadoop.mapred.MapTask$MapOutputBuffer flush 信息: Starting flush of map output 2013-10-14 10:26:36 org.apache.hadoop.io.compress.CodecPool getCompressor 信息: Got brand-new compressor 2013-10-14 10:26:36 org.apache.hadoop.mapred.MapTask$MapOutputBuffer sortAndSpill 信息: Finished spill 0 2013-10-14 10:26:36 org.apache.hadoop.mapred.Task done 信息: Task:attempt_local_0001_m_000000_0 is done. And is in the process of commiting 2013-10-14 10:26:36 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate 信息: 2013-10-14 10:26:36 org.apache.hadoop.mapred.Task sendDone 信息: Task 'attempt_local_0001_m_000000_0' done. 2013-10-14 10:26:36 org.apache.hadoop.mapred.Task initialize 信息: Using ResourceCalculatorPlugin : null 2013-10-14 10:26:36 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate 信息: 2013-10-14 10:26:36 org.apache.hadoop.mapred.Merger$MergeQueue merge 信息: Merging 1 sorted segments 2013-10-14 10:26:36 org.apache.hadoop.io.compress.CodecPool getDecompressor 信息: Got brand-new decompressor 2013-10-14 10:26:36 org.apache.hadoop.mapred.Merger$MergeQueue merge 信息: Down to the last merge-pass, with 1 segments left of total size: 42 bytes 2013-10-14 10:26:36 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate 信息: 2013-10-14 10:26:36 org.apache.hadoop.mapred.Task done 信息: Task:attempt_local_0001_r_000000_0 is done. 
And is in the process of commiting 2013-10-14 10:26:36 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate 信息: 2013-10-14 10:26:36 org.apache.hadoop.mapred.Task commit 信息: Task attempt_local_0001_r_000000_0 is allowed to commit now 2013-10-14 10:26:36 org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter commitTask 信息: Saved output of task 'attempt_local_0001_r_000000_0' to hdfs://192.168.1.210:9000/tmp/1381717594500/preparePreferenceMatrix/itemIDIndex 2013-10-14 10:26:36 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate 信息: reduce > reduce 2013-10-14 10:26:36 org.apache.hadoop.mapred.Task sendDone 信息: Task 'attempt_local_0001_r_000000_0' done. 2013-10-14 10:26:37 org.apache.hadoop.mapred.JobClient monitorAndPrintJob 信息: map 100% reduce 100% 2013-10-14 10:26:37 org.apache.hadoop.mapred.JobClient monitorAndPrintJob 信息: Job complete: job_local_0001 2013-10-14 10:26:37 org.apache.hadoop.mapred.Counters log 信息: Counters: 19 2013-10-14 10:26:37 org.apache.hadoop.mapred.Counters log 信息: File Output Format Counters 2013-10-14 10:26:37 org.apache.hadoop.mapred.Counters log 信息: Bytes Written=187 2013-10-14 10:26:37 org.apache.hadoop.mapred.Counters log 信息: FileSystemCounters 2013-10-14 10:26:37 org.apache.hadoop.mapred.Counters log 信息: FILE_BYTES_READ=3287330 2013-10-14 10:26:37 org.apache.hadoop.mapred.Counters log 信息: HDFS_BYTES_READ=916 2013-10-14 10:26:37 org.apache.hadoop.mapred.Counters log 信息: FILE_BYTES_WRITTEN=3443292 2013-10-14 10:26:37 org.apache.hadoop.mapred.Counters log 信息: HDFS_BYTES_WRITTEN=645 2013-10-14 10:26:37 org.apache.hadoop.mapred.Counters log 信息: File Input Format Counters 2013-10-14 10:26:37 org.apache.hadoop.mapred.Counters log 信息: Bytes Read=229 2013-10-14 10:26:37 org.apache.hadoop.mapred.Counters log 信息: Map-Reduce Framework 2013-10-14 10:26:37 org.apache.hadoop.mapred.Counters log 信息: Map output materialized bytes=46 2013-10-14 10:26:37 org.apache.hadoop.mapred.Counters log 信息: Map input records=21 2013-10-14 10:26:37 org.apache.hadoop.mapred.Counters log 信息: Reduce shuffle bytes=0 2013-10-14 10:26:37 org.apache.hadoop.mapred.Counters log 信息: Spilled Records=14 2013-10-14 10:26:37 org.apache.hadoop.mapred.Counters log 信息: Map output bytes=84 2013-10-14 10:26:37 org.apache.hadoop.mapred.Counters log 信息: Total committed heap usage (bytes)=376569856 2013-10-14 10:26:37 org.apache.hadoop.mapred.Counters log 信息: SPLIT_RAW_BYTES=116 2013-10-14 10:26:37 org.apache.hadoop.mapred.Counters log 信息: Combine input records=21 2013-10-14 10:26:37 org.apache.hadoop.mapred.Counters log 信息: Reduce input records=7 2013-10-14 10:26:37 org.apache.hadoop.mapred.Counters log 信息: Reduce input groups=7 2013-10-14 10:26:37 org.apache.hadoop.mapred.Counters log 信息: Combine output records=7 2013-10-14 10:26:37 org.apache.hadoop.mapred.Counters log 信息: Reduce output records=7 2013-10-14 10:26:37 org.apache.hadoop.mapred.Counters log 信息: Map output records=21 2013-10-14 10:26:37 org.apache.hadoop.mapreduce.lib.input.FileInputFormat listStatus 信息: Total input paths to process : 1 2013-10-14 10:26:37 org.apache.hadoop.mapred.JobClient monitorAndPrintJob 信息: Running job: job_local_0002 2013-10-14 10:26:37 org.apache.hadoop.mapred.Task initialize 信息: Using ResourceCalculatorPlugin : null 2013-10-14 10:26:37 org.apache.hadoop.mapred.MapTask$MapOutputBuffer 信息: io.sort.mb = 100 2013-10-14 10:26:37 org.apache.hadoop.mapred.MapTask$MapOutputBuffer 信息: data buffer = 79691776/99614720 2013-10-14 10:26:37 org.apache.hadoop.mapred.MapTask$MapOutputBuffer 信息: record buffer = 262144/327680 
2013-10-14 10:26:37 org.apache.hadoop.mapred.MapTask$MapOutputBuffer flush 信息: Starting flush of map output 2013-10-14 10:26:37 org.apache.hadoop.mapred.MapTask$MapOutputBuffer sortAndSpill 信息: Finished spill 0 2013-10-14 10:26:37 org.apache.hadoop.mapred.Task done 信息: Task:attempt_local_0002_m_000000_0 is done. And is in the process of commiting 2013-10-14 10:26:37 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate 信息: 2013-10-14 10:26:37 org.apache.hadoop.mapred.Task sendDone 信息: Task 'attempt_local_0002_m_000000_0' done. 2013-10-14 10:26:37 org.apache.hadoop.mapred.Task initialize 信息: Using ResourceCalculatorPlugin : null 2013-10-14 10:26:37 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate 信息: 2013-10-14 10:26:37 org.apache.hadoop.mapred.Merger$MergeQueue merge 信息: Merging 1 sorted segments 2013-10-14 10:26:37 org.apache.hadoop.mapred.Merger$MergeQueue merge 信息: Down to the last merge-pass, with 1 segments left of total size: 68 bytes 2013-10-14 10:26:37 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate 信息: 2013-10-14 10:26:37 org.apache.hadoop.mapred.Task done 信息: Task:attempt_local_0002_r_000000_0 is done. And is in the process of commiting 2013-10-14 10:26:37 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate 信息: 2013-10-14 10:26:37 org.apache.hadoop.mapred.Task commit 信息: Task attempt_local_0002_r_000000_0 is allowed to commit now 2013-10-14 10:26:37 org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter commitTask 信息: Saved output of task 'attempt_local_0002_r_000000_0' to hdfs://192.168.1.210:9000/tmp/1381717594500/preparePreferenceMatrix/userVectors 2013-10-14 10:26:37 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate 信息: reduce > reduce 2013-10-14 10:26:37 org.apache.hadoop.mapred.Task sendDone 信息: Task 'attempt_local_0002_r_000000_0' done. 
2013-10-14 10:26:38 org.apache.hadoop.mapred.JobClient monitorAndPrintJob 信息: map 100% reduce 100% 2013-10-14 10:26:38 org.apache.hadoop.mapred.JobClient monitorAndPrintJob 信息: Job complete: job_local_0002 2013-10-14 10:26:38 org.apache.hadoop.mapred.Counters log 信息: Counters: 20 2013-10-14 10:26:38 org.apache.hadoop.mapred.Counters log 信息: org.apache.mahout.cf.taste.hadoop.item.ToUserVectorsReducer$Counters 2013-10-14 10:26:38 org.apache.hadoop.mapred.Counters log 信息: USERS=5 2013-10-14 10:26:38 org.apache.hadoop.mapred.Counters log 信息: File Output Format Counters 2013-10-14 10:26:38 org.apache.hadoop.mapred.Counters log 信息: Bytes Written=288 2013-10-14 10:26:38 org.apache.hadoop.mapred.Counters log 信息: FileSystemCounters 2013-10-14 10:26:38 org.apache.hadoop.mapred.Counters log 信息: FILE_BYTES_READ=6574274 2013-10-14 10:26:38 org.apache.hadoop.mapred.Counters log 信息: HDFS_BYTES_READ=1374 2013-10-14 10:26:38 org.apache.hadoop.mapred.Counters log 信息: FILE_BYTES_WRITTEN=6887592 2013-10-14 10:26:38 org.apache.hadoop.mapred.Counters log 信息: HDFS_BYTES_WRITTEN=1120 2013-10-14 10:26:38 org.apache.hadoop.mapred.Counters log 信息: File Input Format Counters 2013-10-14 10:26:38 org.apache.hadoop.mapred.Counters log 信息: Bytes Read=229 2013-10-14 10:26:38 org.apache.hadoop.mapred.Counters log 信息: Map-Reduce Framework 2013-10-14 10:26:38 org.apache.hadoop.mapred.Counters log 信息: Map output materialized bytes=72 2013-10-14 10:26:38 org.apache.hadoop.mapred.Counters log 信息: Map input records=21 2013-10-14 10:26:38 org.apache.hadoop.mapred.Counters log 信息: Reduce shuffle bytes=0 2013-10-14 10:26:38 org.apache.hadoop.mapred.Counters log 信息: Spilled Records=42 2013-10-14 10:26:38 org.apache.hadoop.mapred.Counters log 信息: Map output bytes=63 2013-10-14 10:26:38 org.apache.hadoop.mapred.Counters log 信息: Total committed heap usage (bytes)=575930368 2013-10-14 10:26:38 org.apache.hadoop.mapred.Counters log 信息: SPLIT_RAW_BYTES=116 2013-10-14 10:26:38 org.apache.hadoop.mapred.Counters log 信息: Combine input records=0 2013-10-14 10:26:38 org.apache.hadoop.mapred.Counters log 信息: Reduce input records=21 2013-10-14 10:26:38 org.apache.hadoop.mapred.Counters log 信息: Reduce input groups=5 2013-10-14 10:26:38 org.apache.hadoop.mapred.Counters log 信息: Combine output records=0 2013-10-14 10:26:38 org.apache.hadoop.mapred.Counters log 信息: Reduce output records=5 2013-10-14 10:26:38 org.apache.hadoop.mapred.Counters log 信息: Map output records=21 2013-10-14 10:26:38 org.apache.hadoop.mapreduce.lib.input.FileInputFormat listStatus 信息: Total input paths to process : 1 2013-10-14 10:26:38 org.apache.hadoop.mapred.JobClient monitorAndPrintJob 信息: Running job: job_local_0003 2013-10-14 10:26:38 org.apache.hadoop.mapred.Task initialize 信息: Using ResourceCalculatorPlugin : null 2013-10-14 10:26:38 org.apache.hadoop.mapred.MapTask$MapOutputBuffer 信息: io.sort.mb = 100 2013-10-14 10:26:38 org.apache.hadoop.mapred.MapTask$MapOutputBuffer 信息: data buffer = 79691776/99614720 2013-10-14 10:26:38 org.apache.hadoop.mapred.MapTask$MapOutputBuffer 信息: record buffer = 262144/327680 2013-10-14 10:26:38 org.apache.hadoop.mapred.MapTask$MapOutputBuffer flush 信息: Starting flush of map output 2013-10-14 10:26:38 org.apache.hadoop.mapred.MapTask$MapOutputBuffer sortAndSpill 信息: Finished spill 0 2013-10-14 10:26:38 org.apache.hadoop.mapred.Task done 信息: Task:attempt_local_0003_m_000000_0 is done. 
And is in the process of commiting 2013-10-14 10:26:38 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate 信息: 2013-10-14 10:26:38 org.apache.hadoop.mapred.Task sendDone 信息: Task 'attempt_local_0003_m_000000_0' done. 2013-10-14 10:26:38 org.apache.hadoop.mapred.Task initialize 信息: Using ResourceCalculatorPlugin : null 2013-10-14 10:26:38 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate 信息: 2013-10-14 10:26:38 org.apache.hadoop.mapred.Merger$MergeQueue merge 信息: Merging 1 sorted segments 2013-10-14 10:26:38 org.apache.hadoop.mapred.Merger$MergeQueue merge 信息: Down to the last merge-pass, with 1 segments left of total size: 89 bytes 2013-10-14 10:26:38 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate 信息: 2013-10-14 10:26:38 org.apache.hadoop.mapred.Task done 信息: Task:attempt_local_0003_r_000000_0 is done. And is in the process of commiting 2013-10-14 10:26:38 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate 信息: 2013-10-14 10:26:38 org.apache.hadoop.mapred.Task commit 信息: Task attempt_local_0003_r_000000_0 is allowed to commit now 2013-10-14 10:26:38 org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter commitTask 信息: Saved output of task 'attempt_local_0003_r_000000_0' to hdfs://192.168.1.210:9000/tmp/1381717594500/preparePreferenceMatrix/ratingMatrix 2013-10-14 10:26:38 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate 信息: reduce > reduce 2013-10-14 10:26:38 org.apache.hadoop.mapred.Task sendDone 信息: Task 'attempt_local_0003_r_000000_0' done. 2013-10-14 10:26:39 org.apache.hadoop.mapred.JobClient monitorAndPrintJob 信息: map 100% reduce 100% 2013-10-14 10:26:39 org.apache.hadoop.mapred.JobClient monitorAndPrintJob 信息: Job complete: job_local_0003 2013-10-14 10:26:39 org.apache.hadoop.mapred.Counters log 信息: Counters: 21 2013-10-14 10:26:39 org.apache.hadoop.mapred.Counters log 信息: File Output Format Counters 2013-10-14 10:26:39 org.apache.hadoop.mapred.Counters log 信息: Bytes Written=335 2013-10-14 10:26:39 org.apache.hadoop.mapred.Counters log 信息: org.apache.mahout.cf.taste.hadoop.preparation.ToItemVectorsMapper$Elements 2013-10-14 10:26:39 org.apache.hadoop.mapred.Counters log 信息: USER_RATINGS_NEGLECTED=0 2013-10-14 10:26:39 org.apache.hadoop.mapred.Counters log 信息: USER_RATINGS_USED=21 2013-10-14 10:26:39 org.apache.hadoop.mapred.Counters log 信息: FileSystemCounters 2013-10-14 10:26:39 org.apache.hadoop.mapred.Counters log 信息: FILE_BYTES_READ=9861349 2013-10-14 10:26:39 org.apache.hadoop.mapred.Counters log 信息: HDFS_BYTES_READ=1950 2013-10-14 10:26:39 org.apache.hadoop.mapred.Counters log 信息: FILE_BYTES_WRITTEN=10331958 2013-10-14 10:26:39 org.apache.hadoop.mapred.Counters log 信息: HDFS_BYTES_WRITTEN=1751 2013-10-14 10:26:39 org.apache.hadoop.mapred.Counters log 信息: File Input Format Counters 2013-10-14 10:26:39 org.apache.hadoop.mapred.Counters log 信息: Bytes Read=288 2013-10-14 10:26:39 org.apache.hadoop.mapred.Counters log 信息: Map-Reduce Framework 2013-10-14 10:26:39 org.apache.hadoop.mapred.Counters log 信息: Map output materialized bytes=93 2013-10-14 10:26:39 org.apache.hadoop.mapred.Counters log 信息: Map input records=5 2013-10-14 10:26:39 org.apache.hadoop.mapred.Counters log 信息: Reduce shuffle bytes=0 2013-10-14 10:26:39 org.apache.hadoop.mapred.Counters log 信息: Spilled Records=14 2013-10-14 10:26:39 org.apache.hadoop.mapred.Counters log 信息: Map output bytes=336 2013-10-14 10:26:39 org.apache.hadoop.mapred.Counters log 信息: Total committed heap usage (bytes)=775290880 2013-10-14 10:26:39 org.apache.hadoop.mapred.Counters log 信息: SPLIT_RAW_BYTES=157 
2013-10-14 10:26:39 org.apache.hadoop.mapred.Counters log 信息: Combine input records=21 2013-10-14 10:26:39 org.apache.hadoop.mapred.Counters log 信息: Reduce input records=7 2013-10-14 10:26:39 org.apache.hadoop.mapred.Counters log 信息: Reduce input groups=7 2013-10-14 10:26:39 org.apache.hadoop.mapred.Counters log 信息: Combine output records=7 2013-10-14 10:26:39 org.apache.hadoop.mapred.Counters log 信息: Reduce output records=7 2013-10-14 10:26:39 org.apache.hadoop.mapred.Counters log 信息: Map output records=21 2013-10-14 10:26:39 org.apache.hadoop.mapreduce.lib.input.FileInputFormat listStatus 信息: Total input paths to process : 1 2013-10-14 10:26:39 org.apache.hadoop.mapred.JobClient monitorAndPrintJob 信息: Running job: job_local_0004 2013-10-14 10:26:39 org.apache.hadoop.mapred.Task initialize 信息: Using ResourceCalculatorPlugin : null 2013-10-14 10:26:39 org.apache.hadoop.mapred.MapTask$MapOutputBuffer 信息: io.sort.mb = 100 2013-10-14 10:26:39 org.apache.hadoop.mapred.MapTask$MapOutputBuffer 信息: data buffer = 79691776/99614720 2013-10-14 10:26:39 org.apache.hadoop.mapred.MapTask$MapOutputBuffer 信息: record buffer = 262144/327680 2013-10-14 10:26:39 org.apache.hadoop.mapred.MapTask$MapOutputBuffer flush 信息: Starting flush of map output 2013-10-14 10:26:39 org.apache.hadoop.mapred.MapTask$MapOutputBuffer sortAndSpill 信息: Finished spill 0 2013-10-14 10:26:39 org.apache.hadoop.mapred.Task done 信息: Task:attempt_local_0004_m_000000_0 is done. And is in the process of commiting 2013-10-14 10:26:39 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate 信息: 2013-10-14 10:26:39 org.apache.hadoop.mapred.Task sendDone 信息: Task 'attempt_local_0004_m_000000_0' done. 2013-10-14 10:26:39 org.apache.hadoop.mapred.Task initialize 信息: Using ResourceCalculatorPlugin : null 2013-10-14 10:26:39 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate 信息: 2013-10-14 10:26:39 org.apache.hadoop.mapred.Merger$MergeQueue merge 信息: Merging 1 sorted segments 2013-10-14 10:26:39 org.apache.hadoop.mapred.Merger$MergeQueue merge 信息: Down to the last merge-pass, with 1 segments left of total size: 118 bytes 2013-10-14 10:26:39 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate 信息: 2013-10-14 10:26:39 org.apache.hadoop.mapred.Task done 信息: Task:attempt_local_0004_r_000000_0 is done. And is in the process of commiting 2013-10-14 10:26:39 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate 信息: 2013-10-14 10:26:39 org.apache.hadoop.mapred.Task commit 信息: Task attempt_local_0004_r_000000_0 is allowed to commit now 2013-10-14 10:26:39 org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter commitTask 信息: Saved output of task 'attempt_local_0004_r_000000_0' to hdfs://192.168.1.210:9000/tmp/1381717594500/weights 2013-10-14 10:26:39 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate 信息: reduce > reduce 2013-10-14 10:26:39 org.apache.hadoop.mapred.Task sendDone 信息: Task 'attempt_local_0004_r_000000_0' done. 
2013-10-14 10:26:40 org.apache.hadoop.mapred.JobClient monitorAndPrintJob 信息: map 100% reduce 100% 2013-10-14 10:26:40 org.apache.hadoop.mapred.JobClient monitorAndPrintJob 信息: Job complete: job_local_0004 2013-10-14 10:26:40 org.apache.hadoop.mapred.Counters log 信息: Counters: 20 2013-10-14 10:26:40 org.apache.hadoop.mapred.Counters log 信息: File Output Format Counters 2013-10-14 10:26:40 org.apache.hadoop.mapred.Counters log 信息: Bytes Written=381 2013-10-14 10:26:40 org.apache.hadoop.mapred.Counters log 信息: FileSystemCounters 2013-10-14 10:26:40 org.apache.hadoop.mapred.Counters log 信息: FILE_BYTES_READ=13148476 2013-10-14 10:26:40 org.apache.hadoop.mapred.Counters log 信息: HDFS_BYTES_READ=2628 2013-10-14 10:26:40 org.apache.hadoop.mapred.Counters log 信息: FILE_BYTES_WRITTEN=13780408 2013-10-14 10:26:40 org.apache.hadoop.mapred.Counters log 信息: HDFS_BYTES_WRITTEN=2551 2013-10-14 10:26:40 org.apache.hadoop.mapred.Counters log 信息: File Input Format Counters 2013-10-14 10:26:40 org.apache.hadoop.mapred.Counters log 信息: Bytes Read=335 2013-10-14 10:26:40 org.apache.hadoop.mapred.Counters log 信息: org.apache.mahout.math.hadoop.similarity.cooccurrence.RowSimilarityJob$Counters 2013-10-14 10:26:40 org.apache.hadoop.mapred.Counters log 信息: ROWS=7 2013-10-14 10:26:40 org.apache.hadoop.mapred.Counters log 信息: Map-Reduce Framework 2013-10-14 10:26:40 org.apache.hadoop.mapred.Counters log 信息: Map output materialized bytes=122 2013-10-14 10:26:40 org.apache.hadoop.mapred.Counters log 信息: Map input records=7 2013-10-14 10:26:40 org.apache.hadoop.mapred.Counters log 信息: Reduce shuffle bytes=0 2013-10-14 10:26:40 org.apache.hadoop.mapred.Counters log 信息: Spilled Records=16 2013-10-14 10:26:40 org.apache.hadoop.mapred.Counters log 信息: Map output bytes=516 2013-10-14 10:26:40 org.apache.hadoop.mapred.Counters log 信息: Total committed heap usage (bytes)=974651392 2013-10-14 10:26:40 org.apache.hadoop.mapred.Counters log 信息: SPLIT_RAW_BYTES=158 2013-10-14 10:26:40 org.apache.hadoop.mapred.Counters log 信息: Combine input records=24 2013-10-14 10:26:40 org.apache.hadoop.mapred.Counters log 信息: Reduce input records=8 2013-10-14 10:26:40 org.apache.hadoop.mapred.Counters log 信息: Reduce input groups=8 2013-10-14 10:26:40 org.apache.hadoop.mapred.Counters log 信息: Combine output records=8 2013-10-14 10:26:40 org.apache.hadoop.mapred.Counters log 信息: Reduce output records=5 2013-10-14 10:26:40 org.apache.hadoop.mapred.Counters log 信息: Map output records=24 2013-10-14 10:26:40 org.apache.hadoop.mapreduce.lib.input.FileInputFormat listStatus 信息: Total input paths to process : 1 2013-10-14 10:26:40 org.apache.hadoop.mapred.JobClient monitorAndPrintJob 信息: Running job: job_local_0005 2013-10-14 10:26:40 org.apache.hadoop.mapred.Task initialize 信息: Using ResourceCalculatorPlugin : null 2013-10-14 10:26:40 org.apache.hadoop.mapred.MapTask$MapOutputBuffer 信息: io.sort.mb = 100 2013-10-14 10:26:40 org.apache.hadoop.mapred.MapTask$MapOutputBuffer 信息: data buffer = 79691776/99614720 2013-10-14 10:26:40 org.apache.hadoop.mapred.MapTask$MapOutputBuffer 信息: record buffer = 262144/327680 2013-10-14 10:26:40 org.apache.hadoop.mapred.MapTask$MapOutputBuffer flush 信息: Starting flush of map output 2013-10-14 10:26:40 org.apache.hadoop.mapred.MapTask$MapOutputBuffer sortAndSpill 信息: Finished spill 0 2013-10-14 10:26:40 org.apache.hadoop.mapred.Task done 信息: Task:attempt_local_0005_m_000000_0 is done. 
And is in the process of commiting 2013-10-14 10:26:40 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate 信息: 2013-10-14 10:26:40 org.apache.hadoop.mapred.Task sendDone 信息: Task 'attempt_local_0005_m_000000_0' done. 2013-10-14 10:26:40 org.apache.hadoop.mapred.Task initialize 信息: Using ResourceCalculatorPlugin : null 2013-10-14 10:26:40 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate 信息: 2013-10-14 10:26:40 org.apache.hadoop.mapred.Merger$MergeQueue merge 信息: Merging 1 sorted segments 2013-10-14 10:26:40 org.apache.hadoop.mapred.Merger$MergeQueue merge 信息: Down to the last merge-pass, with 1 segments left of total size: 121 bytes 2013-10-14 10:26:40 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate 信息: 2013-10-14 10:26:40 org.apache.hadoop.mapred.Task done 信息: Task:attempt_local_0005_r_000000_0 is done. And is in the process of commiting 2013-10-14 10:26:40 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate 信息: 2013-10-14 10:26:40 org.apache.hadoop.mapred.Task commit 信息: Task attempt_local_0005_r_000000_0 is allowed to commit now 2013-10-14 10:26:40 org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter commitTask 信息: Saved output of task 'attempt_local_0005_r_000000_0' to hdfs://192.168.1.210:9000/tmp/1381717594500/pairwiseSimilarity 2013-10-14 10:26:40 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate 信息: reduce > reduce 2013-10-14 10:26:40 org.apache.hadoop.mapred.Task sendDone 信息: Task 'attempt_local_0005_r_000000_0' done. 2013-10-14 10:26:41 org.apache.hadoop.mapred.JobClient monitorAndPrintJob 信息: map 100% reduce 100% 2013-10-14 10:26:41 org.apache.hadoop.mapred.JobClient monitorAndPrintJob 信息: Job complete: job_local_0005 2013-10-14 10:26:41 org.apache.hadoop.mapred.Counters log 信息: Counters: 21 2013-10-14 10:26:41 org.apache.hadoop.mapred.Counters log 信息: File Output Format Counters 2013-10-14 10:26:41 org.apache.hadoop.mapred.Counters log 信息: Bytes Written=392 2013-10-14 10:26:41 org.apache.hadoop.mapred.Counters log 信息: FileSystemCounters 2013-10-14 10:26:41 org.apache.hadoop.mapred.Counters log 信息: FILE_BYTES_READ=16435577 2013-10-14 10:26:41 org.apache.hadoop.mapred.Counters log 信息: HDFS_BYTES_READ=3488 2013-10-14 10:26:41 org.apache.hadoop.mapred.Counters log 信息: FILE_BYTES_WRITTEN=17230010 2013-10-14 10:26:41 org.apache.hadoop.mapred.Counters log 信息: HDFS_BYTES_WRITTEN=3408 2013-10-14 10:26:41 org.apache.hadoop.mapred.Counters log 信息: File Input Format Counters 2013-10-14 10:26:41 org.apache.hadoop.mapred.Counters log 信息: Bytes Read=381 2013-10-14 10:26:41 org.apache.hadoop.mapred.Counters log 信息: org.apache.mahout.math.hadoop.similarity.cooccurrence.RowSimilarityJob$Counters 2013-10-14 10:26:41 org.apache.hadoop.mapred.Counters log 信息: PRUNED_COOCCURRENCES=0 2013-10-14 10:26:41 org.apache.hadoop.mapred.Counters log 信息: COOCCURRENCES=57 2013-10-14 10:26:41 org.apache.hadoop.mapred.Counters log 信息: Map-Reduce Framework 2013-10-14 10:26:41 org.apache.hadoop.mapred.Counters log 信息: Map output materialized bytes=125 2013-10-14 10:26:41 org.apache.hadoop.mapred.Counters log 信息: Map input records=5 2013-10-14 10:26:41 org.apache.hadoop.mapred.Counters log 信息: Reduce shuffle bytes=0 2013-10-14 10:26:41 org.apache.hadoop.mapred.Counters log 信息: Spilled Records=14 2013-10-14 10:26:41 org.apache.hadoop.mapred.Counters log 信息: Map output bytes=744 2013-10-14 10:26:41 org.apache.hadoop.mapred.Counters log 信息: Total committed heap usage (bytes)=1174011904 2013-10-14 10:26:41 org.apache.hadoop.mapred.Counters log 信息: SPLIT_RAW_BYTES=129 2013-10-14 
10:26:41 org.apache.hadoop.mapred.Counters log 信息: Combine input records=21 2013-10-14 10:26:41 org.apache.hadoop.mapred.Counters log 信息: Reduce input records=7 2013-10-14 10:26:41 org.apache.hadoop.mapred.Counters log 信息: Reduce input groups=7 2013-10-14 10:26:41 org.apache.hadoop.mapred.Counters log 信息: Combine output records=7 2013-10-14 10:26:41 org.apache.hadoop.mapred.Counters log 信息: Reduce output records=7 2013-10-14 10:26:41 org.apache.hadoop.mapred.Counters log 信息: Map output records=21 2013-10-14 10:26:41 org.apache.hadoop.mapreduce.lib.input.FileInputFormat listStatus 信息: Total input paths to process : 1 2013-10-14 10:26:41 org.apache.hadoop.mapred.JobClient monitorAndPrintJob 信息: Running job: job_local_0006 2013-10-14 10:26:41 org.apache.hadoop.mapred.Task initialize 信息: Using ResourceCalculatorPlugin : null 2013-10-14 10:26:41 org.apache.hadoop.mapred.MapTask$MapOutputBuffer 信息: io.sort.mb = 100 2013-10-14 10:26:41 org.apache.hadoop.mapred.MapTask$MapOutputBuffer 信息: data buffer = 79691776/99614720 2013-10-14 10:26:41 org.apache.hadoop.mapred.MapTask$MapOutputBuffer 信息: record buffer = 262144/327680 2013-10-14 10:26:41 org.apache.hadoop.mapred.MapTask$MapOutputBuffer flush 信息: Starting flush of map output 2013-10-14 10:26:41 org.apache.hadoop.mapred.MapTask$MapOutputBuffer sortAndSpill 信息: Finished spill 0 2013-10-14 10:26:41 org.apache.hadoop.mapred.Task done 信息: Task:attempt_local_0006_m_000000_0 is done. And is in the process of commiting 2013-10-14 10:26:41 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate 信息: 2013-10-14 10:26:41 org.apache.hadoop.mapred.Task sendDone 信息: Task 'attempt_local_0006_m_000000_0' done. 2013-10-14 10:26:41 org.apache.hadoop.mapred.Task initialize 信息: Using ResourceCalculatorPlugin : null 2013-10-14 10:26:41 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate 信息: 2013-10-14 10:26:41 org.apache.hadoop.mapred.Merger$MergeQueue merge 信息: Merging 1 sorted segments 2013-10-14 10:26:41 org.apache.hadoop.mapred.Merger$MergeQueue merge 信息: Down to the last merge-pass, with 1 segments left of total size: 158 bytes 2013-10-14 10:26:41 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate 信息: 2013-10-14 10:26:41 org.apache.hadoop.mapred.Task done 信息: Task:attempt_local_0006_r_000000_0 is done. And is in the process of commiting 2013-10-14 10:26:41 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate 信息: 2013-10-14 10:26:41 org.apache.hadoop.mapred.Task commit 信息: Task attempt_local_0006_r_000000_0 is allowed to commit now 2013-10-14 10:26:41 org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter commitTask 信息: Saved output of task 'attempt_local_0006_r_000000_0' to hdfs://192.168.1.210:9000/tmp/1381717594500/similarityMatrix 2013-10-14 10:26:41 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate 信息: reduce > reduce 2013-10-14 10:26:41 org.apache.hadoop.mapred.Task sendDone 信息: Task 'attempt_local_0006_r_000000_0' done. 
2013-10-14 10:26:42 org.apache.hadoop.mapred.JobClient monitorAndPrintJob 信息: map 100% reduce 100% 2013-10-14 10:26:42 org.apache.hadoop.mapred.JobClient monitorAndPrintJob 信息: Job complete: job_local_0006 2013-10-14 10:26:42 org.apache.hadoop.mapred.Counters log 信息: Counters: 19 2013-10-14 10:26:42 org.apache.hadoop.mapred.Counters log 信息: File Output Format Counters 2013-10-14 10:26:42 org.apache.hadoop.mapred.Counters log 信息: Bytes Written=554 2013-10-14 10:26:42 org.apache.hadoop.mapred.Counters log 信息: FileSystemCounters 2013-10-14 10:26:42 org.apache.hadoop.mapred.Counters log 信息: FILE_BYTES_READ=19722740 2013-10-14 10:26:42 org.apache.hadoop.mapred.Counters log 信息: HDFS_BYTES_READ=4342 2013-10-14 10:26:42 org.apache.hadoop.mapred.Counters log 信息: FILE_BYTES_WRITTEN=20674772 2013-10-14 10:26:42 org.apache.hadoop.mapred.Counters log 信息: HDFS_BYTES_WRITTEN=4354 2013-10-14 10:26:42 org.apache.hadoop.mapred.Counters log 信息: File Input Format Counters 2013-10-14 10:26:42 org.apache.hadoop.mapred.Counters log 信息: Bytes Read=392 2013-10-14 10:26:42 org.apache.hadoop.mapred.Counters log 信息: Map-Reduce Framework 2013-10-14 10:26:42 org.apache.hadoop.mapred.Counters log 信息: Map output materialized bytes=162 2013-10-14 10:26:42 org.apache.hadoop.mapred.Counters log 信息: Map input records=7 2013-10-14 10:26:42 org.apache.hadoop.mapred.Counters log 信息: Reduce shuffle bytes=0 2013-10-14 10:26:42 org.apache.hadoop.mapred.Counters log 信息: Spilled Records=14 2013-10-14 10:26:42 org.apache.hadoop.mapred.Counters log 信息: Map output bytes=599 2013-10-14 10:26:42 org.apache.hadoop.mapred.Counters log 信息: Total committed heap usage (bytes)=1373372416 2013-10-14 10:26:42 org.apache.hadoop.mapred.Counters log 信息: SPLIT_RAW_BYTES=140 2013-10-14 10:26:42 org.apache.hadoop.mapred.Counters log 信息: Combine input records=25 2013-10-14 10:26:42 org.apache.hadoop.mapred.Counters log 信息: Reduce input records=7 2013-10-14 10:26:42 org.apache.hadoop.mapred.Counters log 信息: Reduce input groups=7 2013-10-14 10:26:42 org.apache.hadoop.mapred.Counters log 信息: Combine output records=7 2013-10-14 10:26:42 org.apache.hadoop.mapred.Counters log 信息: Reduce output records=7 2013-10-14 10:26:42 org.apache.hadoop.mapred.Counters log 信息: Map output records=25 2013-10-14 10:26:42 org.apache.hadoop.mapreduce.lib.input.FileInputFormat listStatus 信息: Total input paths to process : 1 2013-10-14 10:26:42 org.apache.hadoop.mapreduce.lib.input.FileInputFormat listStatus 信息: Total input paths to process : 1 2013-10-14 10:26:42 org.apache.hadoop.mapred.JobClient monitorAndPrintJob 信息: Running job: job_local_0007 2013-10-14 10:26:42 org.apache.hadoop.mapred.Task initialize 信息: Using ResourceCalculatorPlugin : null 2013-10-14 10:26:42 org.apache.hadoop.mapred.MapTask$MapOutputBuffer 信息: io.sort.mb = 100 2013-10-14 10:26:42 org.apache.hadoop.mapred.MapTask$MapOutputBuffer 信息: data buffer = 79691776/99614720 2013-10-14 10:26:42 org.apache.hadoop.mapred.MapTask$MapOutputBuffer 信息: record buffer = 262144/327680 2013-10-14 10:26:42 org.apache.hadoop.mapred.MapTask$MapOutputBuffer flush 信息: Starting flush of map output 2013-10-14 10:26:42 org.apache.hadoop.mapred.MapTask$MapOutputBuffer sortAndSpill 信息: Finished spill 0 2013-10-14 10:26:42 org.apache.hadoop.mapred.Task done 信息: Task:attempt_local_0007_m_000000_0 is done. And is in the process of commiting 2013-10-14 10:26:42 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate 信息: 2013-10-14 10:26:42 org.apache.hadoop.mapred.Task sendDone 信息: Task 'attempt_local_0007_m_000000_0' done. 
2013-10-14 10:26:42 org.apache.hadoop.mapred.Task initialize 信息: Using ResourceCalculatorPlugin : null 2013-10-14 10:26:42 org.apache.hadoop.mapred.MapTask$MapOutputBuffer 信息: io.sort.mb = 100 2013-10-14 10:26:42 org.apache.hadoop.mapred.MapTask$MapOutputBuffer 信息: data buffer = 79691776/99614720 2013-10-14 10:26:42 org.apache.hadoop.mapred.MapTask$MapOutputBuffer 信息: record buffer = 262144/327680 2013-10-14 10:26:42 org.apache.hadoop.mapred.MapTask$MapOutputBuffer flush 信息: Starting flush of map output 2013-10-14 10:26:42 org.apache.hadoop.mapred.MapTask$MapOutputBuffer sortAndSpill 信息: Finished spill 0 2013-10-14 10:26:42 org.apache.hadoop.mapred.Task done 信息: Task:attempt_local_0007_m_000001_0 is done. And is in the process of commiting 2013-10-14 10:26:42 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate 信息: 2013-10-14 10:26:42 org.apache.hadoop.mapred.Task sendDone 信息: Task 'attempt_local_0007_m_000001_0' done. 2013-10-14 10:26:42 org.apache.hadoop.mapred.Task initialize 信息: Using ResourceCalculatorPlugin : null 2013-10-14 10:26:42 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate 信息: 2013-10-14 10:26:42 org.apache.hadoop.mapred.Merger$MergeQueue merge 信息: Merging 2 sorted segments 2013-10-14 10:26:42 org.apache.hadoop.io.compress.CodecPool getDecompressor 信息: Got brand-new decompressor 2013-10-14 10:26:42 org.apache.hadoop.mapred.Merger$MergeQueue merge 信息: Down to the last merge-pass, with 2 segments left of total size: 233 bytes 2013-10-14 10:26:42 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate 信息: 2013-10-14 10:26:42 org.apache.hadoop.mapred.Task done 信息: Task:attempt_local_0007_r_000000_0 is done. And is in the process of commiting 2013-10-14 10:26:42 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate 信息: 2013-10-14 10:26:42 org.apache.hadoop.mapred.Task commit 信息: Task attempt_local_0007_r_000000_0 is allowed to commit now 2013-10-14 10:26:42 org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter commitTask 信息: Saved output of task 'attempt_local_0007_r_000000_0' to hdfs://192.168.1.210:9000/tmp/1381717594500/partialMultiply 2013-10-14 10:26:42 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate 信息: reduce > reduce 2013-10-14 10:26:42 org.apache.hadoop.mapred.Task sendDone 信息: Task 'attempt_local_0007_r_000000_0' done. 
2013-10-14 10:26:43 org.apache.hadoop.mapred.JobClient monitorAndPrintJob 信息: map 100% reduce 100% 2013-10-14 10:26:43 org.apache.hadoop.mapred.JobClient monitorAndPrintJob 信息: Job complete: job_local_0007 2013-10-14 10:26:43 org.apache.hadoop.mapred.Counters log 信息: Counters: 19 2013-10-14 10:26:43 org.apache.hadoop.mapred.Counters log 信息: File Output Format Counters 2013-10-14 10:26:43 org.apache.hadoop.mapred.Counters log 信息: Bytes Written=572 2013-10-14 10:26:43 org.apache.hadoop.mapred.Counters log 信息: FileSystemCounters 2013-10-14 10:26:43 org.apache.hadoop.mapred.Counters log 信息: FILE_BYTES_READ=34517913 2013-10-14 10:26:43 org.apache.hadoop.mapred.Counters log 信息: HDFS_BYTES_READ=8751 2013-10-14 10:26:43 org.apache.hadoop.mapred.Counters log 信息: FILE_BYTES_WRITTEN=36182630 2013-10-14 10:26:43 org.apache.hadoop.mapred.Counters log 信息: HDFS_BYTES_WRITTEN=7934 2013-10-14 10:26:43 org.apache.hadoop.mapred.Counters log 信息: File Input Format Counters 2013-10-14 10:26:43 org.apache.hadoop.mapred.Counters log 信息: Bytes Read=0 2013-10-14 10:26:43 org.apache.hadoop.mapred.Counters log 信息: Map-Reduce Framework 2013-10-14 10:26:43 org.apache.hadoop.mapred.Counters log 信息: Map output materialized bytes=241 2013-10-14 10:26:43 org.apache.hadoop.mapred.Counters log 信息: Map input records=12 2013-10-14 10:26:43 org.apache.hadoop.mapred.Counters log 信息: Reduce shuffle bytes=0 2013-10-14 10:26:43 org.apache.hadoop.mapred.Counters log 信息: Spilled Records=56 2013-10-14 10:26:43 org.apache.hadoop.mapred.Counters log 信息: Map output bytes=453 2013-10-14 10:26:43 org.apache.hadoop.mapred.Counters log 信息: Total committed heap usage (bytes)=2558459904 2013-10-14 10:26:43 org.apache.hadoop.mapred.Counters log 信息: SPLIT_RAW_BYTES=665 2013-10-14 10:26:43 org.apache.hadoop.mapred.Counters log 信息: Combine input records=0 2013-10-14 10:26:43 org.apache.hadoop.mapred.Counters log 信息: Reduce input records=28 2013-10-14 10:26:43 org.apache.hadoop.mapred.Counters log 信息: Reduce input groups=7 2013-10-14 10:26:43 org.apache.hadoop.mapred.Counters log 信息: Combine output records=0 2013-10-14 10:26:43 org.apache.hadoop.mapred.Counters log 信息: Reduce output records=7 2013-10-14 10:26:43 org.apache.hadoop.mapred.Counters log 信息: Map output records=28 2013-10-14 10:26:43 org.apache.hadoop.mapreduce.lib.input.FileInputFormat listStatus 信息: Total input paths to process : 1 2013-10-14 10:26:43 org.apache.hadoop.mapred.JobClient monitorAndPrintJob 信息: Running job: job_local_0008 2013-10-14 10:26:43 org.apache.hadoop.mapred.Task initialize 信息: Using ResourceCalculatorPlugin : null 2013-10-14 10:26:43 org.apache.hadoop.mapred.MapTask$MapOutputBuffer 信息: io.sort.mb = 100 2013-10-14 10:26:43 org.apache.hadoop.mapred.MapTask$MapOutputBuffer 信息: data buffer = 79691776/99614720 2013-10-14 10:26:43 org.apache.hadoop.mapred.MapTask$MapOutputBuffer 信息: record buffer = 262144/327680 2013-10-14 10:26:43 org.apache.hadoop.mapred.MapTask$MapOutputBuffer flush 信息: Starting flush of map output 2013-10-14 10:26:43 org.apache.hadoop.mapred.MapTask$MapOutputBuffer sortAndSpill 信息: Finished spill 0 2013-10-14 10:26:43 org.apache.hadoop.mapred.Task done 信息: Task:attempt_local_0008_m_000000_0 is done. And is in the process of commiting 2013-10-14 10:26:43 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate 信息: 2013-10-14 10:26:43 org.apache.hadoop.mapred.Task sendDone 信息: Task 'attempt_local_0008_m_000000_0' done. 
2013-10-14 10:26:43 org.apache.hadoop.mapred.Task initialize 信息: Using ResourceCalculatorPlugin : null 2013-10-14 10:26:43 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate 信息: 2013-10-14 10:26:43 org.apache.hadoop.mapred.Merger$MergeQueue merge 信息: Merging 1 sorted segments 2013-10-14 10:26:43 org.apache.hadoop.mapred.Merger$MergeQueue merge 信息: Down to the last merge-pass, with 1 segments left of total size: 206 bytes 2013-10-14 10:26:43 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate 信息: 2013-10-14 10:26:43 org.apache.hadoop.mapred.Task done 信息: Task:attempt_local_0008_r_000000_0 is done. And is in the process of commiting 2013-10-14 10:26:43 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate 信息: 2013-10-14 10:26:43 org.apache.hadoop.mapred.Task commit 信息: Task attempt_local_0008_r_000000_0 is allowed to commit now 2013-10-14 10:26:43 org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter commitTask 信息: Saved output of task 'attempt_local_0008_r_000000_0' to hdfs://192.168.1.210:9000/user/hdfs/userCF/result 2013-10-14 10:26:43 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate 信息: reduce > reduce 2013-10-14 10:26:43 org.apache.hadoop.mapred.Task sendDone 信息: Task 'attempt_local_0008_r_000000_0' done. 2013-10-14 10:26:44 org.apache.hadoop.mapred.JobClient monitorAndPrintJob 信息: map 100% reduce 100% 2013-10-14 10:26:44 org.apache.hadoop.mapred.JobClient monitorAndPrintJob 信息: Job complete: job_local_0008 2013-10-14 10:26:44 org.apache.hadoop.mapred.Counters log 信息: Counters: 19 2013-10-14 10:26:44 org.apache.hadoop.mapred.Counters log 信息: File Output Format Counters 2013-10-14 10:26:44 org.apache.hadoop.mapred.Counters log 信息: Bytes Written=217 2013-10-14 10:26:44 org.apache.hadoop.mapred.Counters log 信息: FileSystemCounters 2013-10-14 10:26:44 org.apache.hadoop.mapred.Counters log 信息: FILE_BYTES_READ=26299802 2013-10-14 10:26:44 org.apache.hadoop.mapred.Counters log 信息: HDFS_BYTES_READ=7357 2013-10-14 10:26:44 org.apache.hadoop.mapred.Counters log 信息: FILE_BYTES_WRITTEN=27566408 2013-10-14 10:26:44 org.apache.hadoop.mapred.Counters log 信息: HDFS_BYTES_WRITTEN=6269 2013-10-14 10:26:44 org.apache.hadoop.mapred.Counters log 信息: File Input Format Counters 2013-10-14 10:26:44 org.apache.hadoop.mapred.Counters log 信息: Bytes Read=572 2013-10-14 10:26:44 org.apache.hadoop.mapred.Counters log 信息: Map-Reduce Framework 2013-10-14 10:26:44 org.apache.hadoop.mapred.Counters log 信息: Map output materialized bytes=210 2013-10-14 10:26:44 org.apache.hadoop.mapred.Counters log 信息: Map input records=7 2013-10-14 10:26:44 org.apache.hadoop.mapred.Counters log 信息: Reduce shuffle bytes=0 2013-10-14 10:26:44 org.apache.hadoop.mapred.Counters log 信息: Spilled Records=42 2013-10-14 10:26:44 org.apache.hadoop.mapred.Counters log 信息: Map output bytes=927 2013-10-14 10:26:44 org.apache.hadoop.mapred.Counters log 信息: Total committed heap usage (bytes)=1971453952 2013-10-14 10:26:44 org.apache.hadoop.mapred.Counters log 信息: SPLIT_RAW_BYTES=137 2013-10-14 10:26:44 org.apache.hadoop.mapred.Counters log 信息: Combine input records=0 2013-10-14 10:26:44 org.apache.hadoop.mapred.Counters log 信息: Reduce input records=21 2013-10-14 10:26:44 org.apache.hadoop.mapred.Counters log 信息: Reduce input groups=5 2013-10-14 10:26:44 org.apache.hadoop.mapred.Counters log 信息: Combine output records=0 2013-10-14 10:26:44 org.apache.hadoop.mapred.Counters log 信息: Reduce output records=5 2013-10-14 10:26:44 org.apache.hadoop.mapred.Counters log 信息: Map output records=21 cat: 
hdfs://192.168.1.210:9000/user/hdfs/userCF/result//part-r-00000
1	[104:1.280239,106:1.1462644,105:1.0653841,107:0.33333334]
2	[106:1.560478,105:1.4795978,107:0.69935876]
3	[103:1.2475469,106:1.1944525,102:1.1462644]
4	[102:1.6462644,105:1.5277859,107:0.69935876]
5	[107:1.1993587]
5). Interpreting the recommendation results
The log above can be broken down into three parts:
a. Environment initialization
Initialize the HDFS data directory and working directory, and upload the data file.
Delete: hdfs://192.168.1.210:9000/user/hdfs/userCF
Create: hdfs://192.168.1.210:9000/user/hdfs/userCF
copy from: datafile/item.csv to hdfs://192.168.1.210:9000/user/hdfs/userCF
ls: hdfs://192.168.1.210:9000/user/hdfs/userCF
==========================================================
name: hdfs://192.168.1.210:9000/user/hdfs/userCF/item.csv, folder: false, size: 229
==========================================================
cat: hdfs://192.168.1.210:9000/user/hdfs/userCF/item.csv
b. Algorithm execution
The 8 MapReduce jobs of the workflow run one after another. From the temp-directory names in the log, they are: itemIDIndex, userVectors, and ratingMatrix (preparePreferenceMatrix); then weights, pairwiseSimilarity, and similarityMatrix (RowSimilarityJob); then partialMultiply; and finally the job that writes the recommendations to the result directory.
Job complete: job_local_0001
Job complete: job_local_0002
Job complete: job_local_0003
Job complete: job_local_0004
Job complete: job_local_0005
Job complete: job_local_0006
Job complete: job_local_0007
Job complete: job_local_0008
c. Print the recommendation results
This lets us see the computed recommendations directly.
cat: hdfs://192.168.1.210:9000/user/hdfs/userCF/result//part-r-00000
1	[104:1.280239,106:1.1462644,105:1.0653841,107:0.33333334]
2	[106:1.560478,105:1.4795978,107:0.69935876]
3	[103:1.2475469,106:1.1944525,102:1.1462644]
4	[102:1.6462644,105:1.5277859,107:0.69935876]
5	[107:1.1993587]
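Each output line is "userID<TAB>[itemID:score,...]", with the highest-scoring item first; for example, the top recommendation for user 1 is item 104 with a predicted score of about 1.28. If you need to consume part-r-00000 downstream, a minimal parsing sketch (a hypothetical helper that only assumes the textual format shown above):

// Parse the text output of RecommenderJob: "userID<TAB>[itemID:score,itemID:score,...]".
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ParseRecommendations {

    public static Map<Long, List<String>> parse(List<String> lines) {
        Map<Long, List<String>> result = new LinkedHashMap<Long, List<String>>();
        for (String line : lines) {
            String[] parts = line.trim().split("\\s+", 2);               // userID, "[item:score,...]"
            long userId = Long.parseLong(parts[0]);
            String body = parts[1].substring(1, parts[1].length() - 1);  // strip "[" and "]"
            List<String> items = new ArrayList<String>();
            for (String pair : body.split(",")) {
                items.add(pair);                                         // e.g. "104:1.280239"
            }
            result.put(userId, items);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> demo = new ArrayList<String>();
        demo.add("1\t[104:1.280239,106:1.1462644,105:1.0653841,107:0.33333334]");
        System.out.println(parse(demo));
        // {1=[104:1.280239, 106:1.1462644, 105:1.0653841, 107:0.33333334]}
    }
}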
https://github.com/bsspirit/maven_mahout_template/tree/mahout-0.8
You can download this project and use it as a starting point for your own development.
~ git clone https://github.com/bsspirit/maven_mahout_template
~ git checkout mahout-0.8
With that, we have completed a distributed implementation of item-based collaborative filtering. Next we will cover Mahout's distributed KMeans implementation; see the article: Mahout Distributed Program Development: KMeans Clustering.
Please credit the source when reposting:
http://blog.fens.me/hadoop-mahout-mapreduce-itemcf/