Pseudo-distributed Hadoop + Mahout deployment and a classic 20newsgroups algorithm test

--------------------------------------------------------------------------
Phase 1: pseudo-distributed Hadoop installation

Phase 2: Mahout installation

Phase 3: 20newsgroups Bayes algorithm test
-------------------------------------------------------------------------
Note: after installing VMware Tools, CentOS must be rebooted for it to take effect.
Phase 1: pseudo-distributed Hadoop installation
1. JDK installation
1.1 Uninstall the OpenJDK that ships with CentOS
1. Check the current Java version:   # java -version
   List the installed Java packages: # rpm -qa | grep java
   Remove a bundled package:         # rpm -e --nodeps <package>
To remove OpenJDK, run the following:
[root@Centos 桌面]# rpm -e --nodeps <package name>
Re-check with # rpm -qa | grep java -- no output means the uninstall is complete.
----------------------------------------------------------------------------------
[root@Centos 桌面]# java -version
java version "1.7.0_09-icedtea"
OpenJDK Runtime Environment (rhel-2.3.4.1.el6_3-x86_64)
OpenJDK 64-Bit Server VM (build 23.2-b09, mixed mode)
[root@Centos 桌面]# rpm -qa | grep java
tzdata-java-2012j-1.el6.noarch
java-1.7.0-openjdk-1.7.0.9-2.3.4.1.el6_3.x86_64
java-1.6.0-openjdk-1.6.0.0-1.50.1.11.5.el6_3.x86_64
[root@Centos 桌面]# rpm -e --nodeps tzdata-java-2012j-1.el6.noarch
[root@Centos 桌面]# rpm -e --nodeps java-1.7.0-openjdk-1.7.0.9-2.3.4.1.el6_3.x86_64
[root@Centos 桌面]# rpm -e --nodeps java-1.6.0-openjdk-1.6.0.0-1.50.1.11.5.el6_3.x86_64
[root@Centos 桌面]# rpm -qa | grep java
[root@Centos 桌面]#
----------------------------------------------------------------------------------
1.2 Install the downloaded JDK and configure environment variables
1. Unpack the JDK
------------------------------------------------------------
[root@Centos 桌面]# cd /root
[root@Centos ~]# tar zxvf jdk-8u65-linux-x64.gz
------------------------------------------------------------
2.配置環境變量
1.編輯/etc/profile文件 命令行 vi /etc/profile
[root@Centos ~]# vi /etc/profile
2.配置環境變量---在/etc/profile文件裏添加jdk路徑
export JAVA_HOME=/root/jdk1.8.0_65
export JRE_HOME=/root/jdk1.8.0_65/jre
export CLASSPATH=.:$CLASSPATH:$JAVA_HOME/lib:$JRE_HOME/lib
export PATH=$PATH:$JAVA_HOME/bin:$JRE_HOME/bin

3. Apply the changes
[root@Centos ~]# source /etc/profile
[root@Centos ~]# echo $JAVA_HOME
3. Verify the installation
Run the following commands and check that the output looks normal:
[root@Centos ~]# java
[root@Centos ~]# javac
[root@Centos ~]# java -version
---------------------------------------------------------------------------------
[root@Centos ~]# java
Usage: java [-options] class [args...]
(to execute a class)
or  java [-options] -jar jarfile [args...]
(to execute a jar file)
where options include:
-d32 use a 32-bit data model if available
-d64 use a 64-bit data model if available
-server to select the "server" VM
The default VM is server.

-cp <class search path of directories and zip/jar files>
-classpath <class search path of directories and zip/jar files>
A : separated list of directories, JAR archives,
and ZIP archives to search for class files.
-D<name>=<value>
set a system property
-verbose:[class|gc|jni]
enable verbose output
-version print product version and exit
-version:<value>
Warning: this feature is deprecated and will be
removed in a future release.
Require the specified version to run
-showversion print product version and continue
-jre-restrict-search | -no-jre-restrict-search
Warning: this feature is deprecated and will be
removed in a future release.
include/exclude user private JREs in the version search
-? -help print this help message
-X print help on non-standard options
-ea[:<packagename>...|:<classname>]
-enableassertions[:<packagename>...|:<classname>]
enable assertions with specified granularity
-da[:<packagename>...|:<classname>]
-disableassertions[:<packagename>...|:<classname>]
disable assertions with specified granularity
-esa | -enablesystemassertions
enable system assertions
-dsa | -disablesystemassertions
disable system assertions
-agentlib:<libname>[=<options>]
load native agent library <libname>, e.g. -agentlib:hprof
see also -agentlib:jdwp=help and -agentlib:hprof=help
-agentpath:<pathname>[=<options>]
load native agent library by full pathname
-javaagent:<jarpath>[=<options>]
load Java programming language agent, see java.lang.instrument
-splash:<imagepath>
show splash screen with specified image
See http://www.oracle.com/technetwork/java/javase/documentation/index.html for more details.
[root@Centos ~]# javac
Usage: javac <options> <source files>
where possible options include:
-g                        Generate all debugging info
-g:none                   Generate no debugging info
-g:{lines,vars,source}    Generate only some debugging info
-nowarn                   Generate no warnings
-verbose                  Output messages about what the compiler is doing
-deprecation              Output source locations where deprecated APIs are used
-classpath <path>         Specify where to find user class files and annotation processors
-cp <path>                Specify where to find user class files and annotation processors
-sourcepath <path>        Specify where to find input source files
-bootclasspath <path>     Override location of bootstrap class files
-extdirs <dirs>           Override location of installed extensions
-endorseddirs <dirs>      Override location of endorsed standards path
-proc:{none,only}         Control whether annotation processing and/or compilation is done.
-processor <class1>[,<class2>,<class3>...] Names of the annotation processors to run; bypasses default discovery process
-processorpath <path>     Specify where to find annotation processors
-parameters               Generate metadata for reflection on method parameters
-d <directory>            Specify where to place generated class files
-s <directory>            Specify where to place generated source files
-h <directory>            Specify where to place generated native header files
-implicit:{none,class}    Specify whether or not to generate class files for implicitly referenced files
-encoding <encoding>      Specify character encoding used by source files
-source <release>         Provide source compatibility with specified release
-target <release>         Generate class files for specific VM version
-profile <profile>        Check that API used is available in the specified profile
-version                  Version information
-help                     Print a synopsis of standard options
-A<key>[=<value>]         Options to pass to annotation processors
-X                        Print a synopsis of nonstandard options
-J<flag>                  Pass <flag> directly to the runtime system
-Werror                   Terminate compilation if warnings occur
@<filename>               Read options and filenames from file

[root@Centos ~]# java -version
java version "1.8.0_65"
Java(TM) SE Runtime Environment (build 1.8.0_65-b17)
Java HotSpot(TM) 64-Bit Server VM (build 25.65-b01, mixed mode)

······························· JDK installation complete ·····································
2. Hadoop installation
1. In Hadoop's conf directory, configure hadoop-env.sh, core-site.xml, hdfs-site.xml and mapred-site.xml
1.1 Set Hadoop's JDK path in hadoop-env.sh
---------------------------------------------
[root@Centos ~]# cd hadoop-1.2.1/
[root@Centos hadoop-1.2.1]# cd conf
[root@Centos conf]# vi hadoop-env.sh
---------------------------------------------
The configuration is as follows:
export JAVA_HOME=/root/jdk1.8.0_65
1.2 Set the HDFS address and port in core-site.xml
------------------------------------------------
[root@Centos conf]# vi core-site.xml
------------------------------------------------
The configuration is as follows:
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>
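Optional, and not part of the original walkthrough: with only fs.default.name set, Hadoop 1.x keeps HDFS metadata and blocks under /tmp/hadoop-root (as the namenode -format log further below shows), which may be wiped on reboot and force a re-format. A hedged sketch of a more durable core-site.xml, assuming /root/hadoop-tmp is a directory you create yourself:
------------------------------------------------
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9000</value>
</property>
<!-- optional addition, not in the original setup; /root/hadoop-tmp is a hypothetical path, create it first -->
<property>
<name>hadoop.tmp.dir</name>
<value>/root/hadoop-tmp</value>
</property>
</configuration>
------------------------------------------------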
1.3 Set the HDFS replication factor in hdfs-site.xml
-------------------------------------------------
[root@Centos conf]# vi hdfs-site.xml
-------------------------------------------------
The configuration is as follows:
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
1.4 Set the JobTracker address in mapred-site.xml
-------------------------------------------------
[root@Centos conf]# vi mapred-site.xml
--------------------------------------------
The configuration is as follows:
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>localhost:9001</value>
</property>
</configuration>
--------------------------------------------------------------------
[root@Centos conf]# vi hadoop-env.sh
[root@Centos conf]# vi core-site.xml
[root@Centos conf]# vi hdfs-site.xml
[root@Centos conf]# vi mapred-site.xml
--------------------------------------------------------------------
2. Passwordless SSH login
--------------------------------------------------------------------
[root@Centos conf]# cd /root
[root@Centos ~]# ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is:
ed:48:64:29:62:37:c1:e9:3d:84:bf:ad:4e:50:5e:66 root@Centos
The key's randomart image is:
+--[ RSA 2048]----+
| ..o |
| +... |
| o.++= E |
| . o.B+= |
| . S+. |
| o.o. |
| o.. |
| .. |
| .. |
+-----------------+
[root@Centos ~]# cd .ssh
[root@Centos .ssh]# ls
id_rsa id_rsa.pub
[root@Centos .ssh]# cp id_rsa.pub authorized_keys
[root@Centos .ssh]# ls
authorized_keys id_rsa id_rsa.pub
[root@Centos .ssh]# ssh localhost
The authenticity of host 'localhost (::1)' can't be established.
RSA key fingerprint is 3f:84:db:2f:53:a9:09:a6:61:a2:3a:82:80:6c:af:1a.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'localhost' (RSA) to the list of known hosts.
-------------------------------------------------------------------------------
Verify passwordless login
-------------------------------------------------------------------------------
[root@Centos ~]# ssh localhost
Last login: Sun Apr 3 23:19:51 2016 from localhost
[root@Centos ~]# exit
logout
Connection to localhost closed.
[root@Centos ~]# ssh localhost
Last login: Sun Apr 3 23:20:12 2016 from localhost
[root@Centos ~]# exit
logout
Connection to localhost closed.
[root@Centos ~]#
---------------------------- Passwordless SSH login configured successfully ----------------------------
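If ssh localhost still prompts for a password at this point (it did not in the session above), a hedged fix is to tighten the permissions that sshd requires on the key files:
------------------------------------------------
# sshd refuses keys whose files are group/world accessible
chmod 700 /root/.ssh
chmod 600 /root/.ssh/authorized_keys
------------------------------------------------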
3. Format HDFS
Command: # bin/hadoop namenode -format
-----------------------------------------------------------------------------
[root@Centos ~]# cd /root/hadoop-1.2.1/
[root@Centos hadoop-1.2.1]# bin/hadoop namenode -format
16/04/03 23:24:12 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG: host = java.net.UnknownHostException: Centos: Centos: unknown error
STARTUP_MSG: args = [-format]
STARTUP_MSG: version = 1.2.1
STARTUP_MSG: build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.2 -r 1503152; compiled by 'mattf' on Mon Jul 22 15:23:09 PDT 2013
STARTUP_MSG: java = 1.8.0_65
************************************************************/
16/04/03 23:24:13 INFO util.GSet: Computing capacity for map BlocksMap
16/04/03 23:24:13 INFO util.GSet: VM type = 64-bit
16/04/03 23:24:13 INFO util.GSet: 2.0% max memory = 1013645312
16/04/03 23:24:13 INFO util.GSet: capacity = 2^21 = 2097152 entries
16/04/03 23:24:13 INFO util.GSet: recommended=2097152, actual=2097152
16/04/03 23:24:15 INFO namenode.FSNamesystem: fsOwner=root
16/04/03 23:24:15 INFO namenode.FSNamesystem: supergroup=supergroup
16/04/03 23:24:15 INFO namenode.FSNamesystem: isPermissionEnabled=true
16/04/03 23:24:15 INFO namenode.FSNamesystem: dfs.block.invalidate.limit=100
16/04/03 23:24:15 INFO namenode.FSNamesystem: isAccessTokenEnabled=false accessKeyUpdateInterval=0 min(s), accessTokenLifetime=0 min(s)
16/04/03 23:24:15 INFO namenode.FSEditLog: dfs.namenode.edits.toleration.length = 0
16/04/03 23:24:15 INFO namenode.NameNode: Caching file names occuring more than 10 times
16/04/03 23:24:17 INFO common.Storage: Image file /tmp/hadoop-root/dfs/name/current/fsimage of size 110 bytes saved in 0 seconds.
16/04/03 23:24:18 INFO namenode.FSEditLog: closing edit log: position=4, editlog=/tmp/hadoop-root/dfs/name/current/edits
16/04/03 23:24:18 INFO namenode.FSEditLog: close success: truncate to 4, editlog=/tmp/hadoop-root/dfs/name/current/edits
16/04/03 23:24:18 INFO common.Storage: Storage directory /tmp/hadoop-root/dfs/name has been successfully formatted.
16/04/03 23:24:18 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at java.net.UnknownHostException: Centos: Centos: unknown error
************************************************************/
-----------------------------------------------------------------------------
The format reports an error -- Centos: unknown error. Don't worry; go straight to the next configuration step.
--------------------------------------------------------------------------
[root@Centos hadoop-1.2.1]# vi /etc/hosts
The configuration is as follows:
127.0.0.1 localhost Centos
-------------------------------------------------------------------------
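A quick sanity check, suggested here rather than taken from the original procedure: confirm that the hostname now resolves before re-running the format.
------------------------------------------------
hostname          # should print Centos
ping -c 1 Centos  # should answer from 127.0.0.1 after the /etc/hosts edit
------------------------------------------------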
Run the format again
--------------------------------------------------------------------------
[root@Centos hadoop-1.2.1]# vi /etc/hosts
[root@Centos hadoop-1.2.1]# bin/hadoop namenode -format
16/04/03 23:26:30 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG: host = Centos/127.0.0.1
STARTUP_MSG: args = [-format]
STARTUP_MSG: version = 1.2.1
STARTUP_MSG: build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.2 -r 1503152; compiled by 'mattf' on Mon Jul 22 15:23:09 PDT 2013
STARTUP_MSG: java = 1.8.0_65
************************************************************/
Re-format filesystem in /tmp/hadoop-root/dfs/name ? (Y or N) Y
16/04/03 23:26:33 INFO util.GSet: Computing capacity for map BlocksMap
16/04/03 23:26:33 INFO util.GSet: VM type = 64-bit
16/04/03 23:26:33 INFO util.GSet: 2.0% max memory = 1013645312
16/04/03 23:26:33 INFO util.GSet: capacity = 2^21 = 2097152 entries
16/04/03 23:26:33 INFO util.GSet: recommended=2097152, actual=2097152
16/04/03 23:26:33 INFO namenode.FSNamesystem: fsOwner=root
16/04/03 23:26:33 INFO namenode.FSNamesystem: supergroup=supergroup
16/04/03 23:26:33 INFO namenode.FSNamesystem: isPermissionEnabled=true
16/04/03 23:26:33 INFO namenode.FSNamesystem: dfs.block.invalidate.limit=100
16/04/03 23:26:33 INFO namenode.FSNamesystem: isAccessTokenEnabled=false accessKeyUpdateInterval=0 min(s), accessTokenLifetime=0 min(s)
16/04/03 23:26:33 INFO namenode.FSEditLog: dfs.namenode.edits.toleration.length = 0
16/04/03 23:26:33 INFO namenode.NameNode: Caching file names occuring more than 10 times
16/04/03 23:26:34 INFO common.Storage: Image file /tmp/hadoop-root/dfs/name/current/fsimage of size 110 bytes saved in 0 seconds.
16/04/03 23:26:34 INFO namenode.FSEditLog: closing edit log: position=4, editlog=/tmp/hadoop-root/dfs/name/current/edits
16/04/03 23:26:34 INFO namenode.FSEditLog: close success: truncate to 4, editlog=/tmp/hadoop-root/dfs/name/current/edits
16/04/03 23:26:34 INFO common.Storage: Storage directory /tmp/hadoop-root/dfs/name has been successfully formatted.
16/04/03 23:26:34 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at Centos/127.0.0.1
************************************************************/
--------------------------- NameNode formatted successfully ------------------------------
4. Start Hadoop
Stop the firewall:        # service iptables stop
Start the Hadoop cluster: # start-all.sh
Stop the Hadoop cluster:  # stop-all.sh
---------------------------------------------------------------------------
Stop the firewall
[root@Centos hadoop-1.2.1]# service iptables stop
iptables: Flushing firewall rules:                 [  OK  ]
iptables: Setting chains to policy ACCEPT: filter  [  OK  ]
iptables: Unloading modules:                       [  OK  ]
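Note that service iptables stop only lasts until the next reboot. A hedged addition for a throwaway test VM like this one (do not do this on a production host) is to disable the firewall service permanently:
------------------------------------------------
# CentOS 6: keep iptables disabled across reboots
chkconfig iptables off
------------------------------------------------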
Start the Hadoop cluster
[root@Centos hadoop-1.2.1]# bin/start-all.sh
starting namenode, logging to /root/hadoop-1.2.1/libexec/../logs/hadoop-root-namenode-Centos.out
localhost: starting datanode, logging to /root/hadoop-1.2.1/libexec/../logs/hadoop-root-datanode-Centos.out
localhost: starting secondarynamenode, logging to /root/hadoop-1.2.1/libexec/../logs/hadoop-root-secondarynamenode-Centos.out
starting jobtracker, logging to /root/hadoop-1.2.1/libexec/../logs/hadoop-root-jobtracker-Centos.out
localhost: starting tasktracker, logging to /root/hadoop-1.2.1/libexec/../logs/hadoop-root-tasktracker-Centos.out
Verify that the cluster started correctly -- the startup succeeded if the five daemons appear in the jps list
Check the running processes again
[root@Centos hadoop-1.2.1]# cd mahout-distribution-0.6/
[root@Centos mahout-distribution-0.6]# jps
30692 SecondaryNameNode
30437 NameNode
31382 Jps
30903 TaskTracker
30775 JobTracker
30553 DataNode
[root@Centos mahout-distribution-0.6]# jps
30692 SecondaryNameNode
31477 Jps
30437 NameNode
30903 TaskTracker
30775 JobTracker
30553 DataNode
[root@Centos mahout-distribution-0.6]# cd ..
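If one of the five daemons is missing from the jps output, a hedged first step is to read its log under the logs/ directory and to open the standard Hadoop 1.x web UIs (NameNode on http://localhost:50070, JobTracker on http://localhost:50030). The file name below follows the pattern shown in the start-all.sh output above; adjust it to whatever ls actually lists.
------------------------------------------------
ls /root/hadoop-1.2.1/logs/
# example only -- the exact file name depends on the daemon and hostname
tail -n 50 /root/hadoop-1.2.1/logs/hadoop-root-namenode-Centos.out
------------------------------------------------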
Stop the Hadoop cluster
[root@Centos hadoop-1.2.1]# bin/stop-all.sh
stopping jobtracker
localhost: stopping tasktracker
stopping namenode
localhost: stopping datanode
localhost: stopping secondarynamenode
[root@Centos hadoop-1.2.1]#
------------------------ Pseudo-distributed Hadoop installed successfully ------------------------
**********************************************************************
**********************************************************************
Phase 2: Mahout installation
1. Unpack and install Mahout
[root@Centos hadoop-1.2.1]# tar zxvf mahout-distribution-0.6.tar.gz
2. Configure environment variables
export HADOOP_HOME=/root/hadoop-1.2.1
export HADOOP_CONF_DIR=/root/hadoop-1.2.1/conf
export MAHOUT_HOME=/root/hadoop-1.2.1/mahout-distribution-0.6
export MAHOUT_CONF_DIR=/root/hadoop-1.2.1/mahout-distribution-0.6/conf
export PATH=$PATH:$MAHOUT_HOME/conf:$MAHOUT_HOME/bin
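These exports only apply to the current shell, which is why the session below retypes them by hand. A hedged option is to append them to /etc/profile next to the JDK variables and reload it; this assumes the tarball was unpacked to /root/hadoop-1.2.1/mahout-distribution-0.6, the path visible in the shell prompts later on.
------------------------------------------------
vi /etc/profile        # append the five export lines above
source /etc/profile
echo $MAHOUT_HOME      # should print /root/hadoop-1.2.1/mahout-distribution-0.6
------------------------------------------------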
3. Test that Mahout starts
-------------------------------------------------------------------------
[root@Centos mahout-distribution-0.6]# cd ..
[root@Centos hadoop-1.2.1]# bin/stop-all.sh
stopping jobtracker
localhost: stopping tasktracker
stopping namenode
localhost: stopping datanode
localhost: stopping secondarynamenode
[root@Centos hadoop-1.2.1]# cd ..
You have mail in /var/spool/mail/root
[root@Centos ~]# cd ruanjian/
[root@Centos ruanjian]# tar zxvf
tar: old option 'f' requires an argument.
Try 'tar --help' or 'tar --usage' for more information.
[root@Centos ruanjian]# cd ..
[root@Centos ~]# cd hadoop-1.2.1/
[root@Centos hadoop-1.2.1]# export HADOOP_HOME=/root/hadoop-1.2.1
[root@Centos hadoop-1.2.1]# export HADOOP_CONF_DIR=/root/hadoop-1.2.1/conf
[root@Centos hadoop-1.2.1]# export MAHOUT_HOME=/root/hadoop-1.2.1/mahout-distribution-0.6
[root@Centos hadoop-1.2.1]# export MAHOUT_CONF_DIR=/root/hadoop-1.2.1/mahout-distribution-0.6/conf
[root@Centos hadoop-1.2.1]# export PATH=$PATH:$MAHOUT_HOME/conf:$MAHOUT_HOME/bin
[root@Centos hadoop-1.2.1]# cd mahout-distribution-0.6/
[root@Centos mahout-distribution-0.6]# bin/mahout
MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath.
Running on hadoop, using HADOOP_HOME=/root/hadoop-1.2.1
HADOOP_CONF_DIR=/root/hadoop-1.2.1/conf
MAHOUT-JOB: /root/hadoop-1.2.1/mahout-distribution-0.6/mahout-examples-0.6-job.jar
Warning: $HADOOP_HOME is deprecated.

An example program must be given as the first argument.
Valid program names are:
arff.vector: : Generate Vectors from an ARFF file or directory
baumwelch: : Baum-Welch algorithm for unsupervised HMM training
canopy: : Canopy clustering
cat: : Print a file or resource as the logistic regression models would see it
cleansvd: : Cleanup and verification of SVD output
clusterdump: : Dump cluster output to text
clusterpp: : Groups Clustering Output In Clusters
cmdump: : Dump confusion matrix in HTML or text formats
cvb: : LDA via Collapsed Variation Bayes (0th deriv. approx)
cvb0_local: : LDA via Collapsed Variation Bayes, in memory locally.
dirichlet: : Dirichlet Clustering
eigencuts: : Eigencuts spectral clustering
evaluateFactorization: : compute RMSE and MAE of a rating matrix factorization against probes
fkmeans: : Fuzzy K-means clustering
fpg: : Frequent Pattern Growth
hmmpredict: : Generate random sequence of observations by given HMM
itemsimilarity: : Compute the item-item-similarities for item-based collaborative filtering
kmeans: : K-means clustering
lda: : Latent Dirchlet Allocation
ldatopics: : LDA Print Topics
lucene.vector: : Generate Vectors from a Lucene index
matrixdump: : Dump matrix in CSV format
matrixmult: : Take the product of two matrices
meanshift: : Mean Shift clustering
minhash: : Run Minhash clustering
pagerank: : compute the PageRank of a graph
parallelALS: : ALS-WR factorization of a rating matrix
prepare20newsgroups: : Reformat 20 newsgroups data
randomwalkwithrestart: : compute all other vertices' proximity to a source vertex in a graph
recommendfactorized: : Compute recommendations using the factorization of a rating matrix
recommenditembased: : Compute recommendations using item-based collaborative filtering
regexconverter: : Convert text files on a per line basis based on regular expressions
rowid: : Map SequenceFile<Text,VectorWritable> to {SequenceFile<IntWritable,VectorWritable>, SequenceFile<IntWritable,Text>}
rowsimilarity: : Compute the pairwise similarities of the rows of a matrix
runAdaptiveLogistic: : Score new production data using a probably trained and validated AdaptivelogisticRegression model
runlogistic: : Run a logistic regression model against CSV data
seq2encoded: : Encoded Sparse Vector generation from Text sequence files
seq2sparse: : Sparse Vector generation from Text sequence files
seqdirectory: : Generate sequence files (of Text) from a directory
seqdumper: : Generic Sequence File dumper
seqwiki: : Wikipedia xml dump to sequence file
spectralkmeans: : Spectral k-means clustering
split: : Split Input data into test and train sets
splitDataset: : split a rating dataset into training and probe parts
ssvd: : Stochastic SVD
svd: : Lanczos Singular Value Decomposition
testclassifier: : Test the text based Bayes Classifier
testnb: : Test the Vector-based Bayes classifier
trainAdaptiveLogistic: : Train an AdaptivelogisticRegression model
trainclassifier: : Train the text based Bayes Classifier
trainlogistic: : Train a logistic regression using stochastic gradient descent
trainnb: : Train the Vector-based Bayes classifier
transpose: : Take the transpose of a matrix
validateAdaptiveLogistic: : Validate an AdaptivelogisticRegression model against hold-out data set
vecdist: : Compute the distances between a set of Vectors (or Cluster or Canopy, they must fit in memory) and a list of Vectors
vectordump: : Dump vectors from a sequence file to text
viterbi: : Viterbi decoding of hidden states from given output states sequence
wikipediaDataSetCreator: : Splits data set of wikipedia wrt feature like country
wikipediaXMLSplitter: : Reads wikipedia data and creates ch
[root@Centos mahout-distribution-0.6]#
********** If "An example program must be given as the first argument." appears, Mahout is installed successfully **********

-------------------------------- Mahout installed successfully --------------------------------

Phase 3: 20newsgroups Bayes algorithm test
1. Unpack the 20newsgroups archive
1. Create a data directory under /root and unpack the downloaded 20newsgroups archive into it
----------------------------------------------------------------------
[root@Centos mahout-distribution-0.6]# cd ..
[root@Centos hadoop-1.2.1]# cd ..
[root@Centos ~]# mkdir data
[root@Centos ~]# ls
anaconda-ks.cfg install.log ruanjian 視頻 下載
data install.log.syslog 公共的 圖片 音樂
hadoop-1.2.1 jdk1.8.0_65 模板 文檔 桌面
[root@Centos ~]# cd data/
[root@Centos data]# ls
20news-bydate.tar.gz
[root@Centos data]# tar zxvf
tar: old option 'f' requires an argument.
Try 'tar --help' or 'tar --usage' for more information.
[root@Centos data]# tar zxvf 20news-bydate.tar.gz
[root@Centos data]# ls
20news-bydate.tar.gz 20news-bydate-test 20news-bydate-train
[root@Centos data]#
-----------------------------------------------------------------------------------
2. Start Mahout
----------------------------------------------------------------------------------
[root@Centos data]# cd /root/hadoop-1.2.1/mahout-distribution-0.6/
[root@Centos mahout-distribution-0.6]# jps
34338 Jps
[root@Centos mahout-distribution-0.6]# cd ..
[root@Centos hadoop-1.2.1]# bin/start-all.sh
Warning: $HADOOP_HOME is deprecated.

starting namenode, logging to /root/hadoop-1.2.1/libexec/../logs/hadoop-root-namenode-Centos.out
localhost: starting datanode, logging to /root/hadoop-1.2.1/libexec/../logs/hadoop-root-datanode-Centos.out
localhost: starting secondarynamenode, logging to /root/hadoop-1.2.1/libexec/../logs/hadoop-root-secondarynamenode-Centos.out
starting jobtracker, logging to /root/hadoop-1.2.1/libexec/../logs/hadoop-root-jobtracker-Centos.out
localhost: starting tasktracker, logging to /root/hadoop-1.2.1/libexec/../logs/hadoop-root-tasktracker-Centos.out
[root@Centos hadoop-1.2.1]# cd mahout-distribution-0.6/
[root@Centos mahout-distribution-0.6]# jps
34979 Jps
34757 JobTracker
34886 TaskTracker
34663 SecondaryNameNode
34408 NameNode
34524 DataNode
[root@Centos mahout-distribution-0.6]# bin/mahout
-------------------------------------------------------------------------------------------------
----------------------------------------------------------------------------------------------------------------------------
************************************************************************************************************************
Bayes algorithm test ----- automatic text classification of 20newsgroups
Step 1: build the test and training sets
(Note: the test set should be prepared from 20news-bydate-test and the training set from 20news-bydate-train. The session log below accidentally fed 20news-bydate-train to both commands, which is why the bayes-test-input and bayes-train-input listings on HDFS later show identical file sizes.)
bin/mahout org.apache.mahout.classifier.bayes.PrepareTwentyNewsgroups \
-p /root/data/20news-bydate-test \
-o /root/data/bayes-test-input \
-a org.apache.mahout.vectorizer.DefaultAnalyzer \
-c UTF-8

bin/mahout org.apache.mahout.classifier.bayes.PrepareTwentyNewsgroups \
-p /root/data/20news-bydate-train \
-o /root/data/bayes-train-input \
-a org.apache.mahout.vectorizer.DefaultAnalyzer \
-c UTF-8
-----------------------------------------------------------------------------------------------------
Build the data sets
------------------------------------------------------------------------------------------------------
[root@Centos mahout-distribution-0.6]# bin/mahout org.apache.mahout.classifier.bayes.PrepareTwentyNewsgroups \
> -p /root/data/20news-bydate-train \
> -o /root/data/bayes-test-input \
> -a org.apache.mahout.vectorizer.DefaultAnalyzer \
>
MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath.
Running on hadoop, using HADOOP_HOME=/root/hadoop-1.2.1
HADOOP_CONF_DIR=/root/hadoop-1.2.1/conf
MAHOUT-JOB: /root/hadoop-1.2.1/mahout-distribution-0.6/mahout-examples-0.6-job.jar
Warning: $HADOOP_HOME is deprecated.

16/04/04 08:59:20 WARN driver.MahoutDriver: No org.apache.mahout.classifier.bayes.PrepareTwentyNewsgroups.props found on classpath, will use command-line arguments only
Usage:
[--analyzerName <analyzerName> --charset <charset> --outputDir <outputDir>
--parent <parent> --help]
Options
--analyzerName (-a) analyzerName The class name of the analyzer
--charset (-c) charset The name of the character encoding of the
input files
--outputDir (-o) outputDir The output directory
--parent (-p) parent Parent dir containing the newsgroups
--help (-h) Print out help
16/04/04 08:59:20 INFO driver.MahoutDriver: Program took 167 ms (Minutes: 0.0027833333333333334)
[root@Centos mahout-distribution-0.6]# bin/mahout org.apache.mahout.classifier.bayes.PrepareTwentyNewsgroups \
> -p /root/data/20news-bydate-train \
> -o /root/data/bayes-test-input \
> -a org.apache.mahout.vectorizer.DefaultAnalyzer \
> -c UTF-8
MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath.
Running on hadoop, using HADOOP_HOME=/root/hadoop-1.2.1
HADOOP_CONF_DIR=/root/hadoop-1.2.1/conf
MAHOUT-JOB: /root/hadoop-1.2.1/mahout-distribution-0.6/mahout-examples-0.6-job.jar
Warning: $HADOOP_HOME is deprecated.

16/04/04 08:59:41 WARN driver.MahoutDriver: No org.apache.mahout.classifier.bayes.PrepareTwentyNewsgroups.props found on classpath, will use command-line arguments only
16/04/04 09:00:29 INFO driver.MahoutDriver: Program took 47897 ms (Minutes: 0.7982833333333333)
[root@Centos mahout-distribution-0.6]# bin/mahout org.apache.mahout.classifier.bayes.PrepareTwentyNewsgroups \
> -p /root/data/20news-bydate-train \
> -o /root/data/bayes-train-input \
> -a org.apache.mahout.vectorizer.DefaultAnalyzer \
> -c UTF-8
MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath.
Running on hadoop, using HADOOP_HOME=/root/hadoop-1.2.1
HADOOP_CONF_DIR=/root/hadoop-1.2.1/conf
MAHOUT-JOB: /root/hadoop-1.2.1/mahout-distribution-0.6/mahout-examples-0.6-job.jar
Warning: $HADOOP_HOME is deprecated.

16/04/04 09:01:07 WARN driver.MahoutDriver: No org.apache.mahout.classifier.bayes.PrepareTwentyNewsgroups.props found on classpath, will use command-line arguments only
16/04/04 09:01:27 INFO driver.MahoutDriver: Program took 19347 ms (Minutes: 0.32245)
--------------- Check the output files
[root@Centos mahout-distribution-0.6]# cd ..
[root@Centos hadoop-1.2.1]# cd ..
[root@Centos ~]# cd data
[root@Centos data]# ls
20news-bydate.tar.gz 20news-bydate-train bayes-train-input
20news-bydate-test bayes-test-input
[root@Centos data]#
------------------- bayes-test-input and bayes-train-input ----------------- data sets built successfully ---



Step 2: upload to HDFS
Create the target directory: bin/hadoop fs -mkdir 20news
Upload to HDFS:              bin/hadoop fs -put <local directory> 20news
List the results:            bin/hadoop fs -ls
                             bin/hadoop fs -ls 20news
-----------------------------------------------------------------------------------------------------------------------
[root@Centos hadoop-1.2.1]# cd /root/hadoop-1.2.1/
[root@Centos hadoop-1.2.1]# bin/hadoop fs -mkdir 20news
[root@Centos hadoop-1.2.1]# bin/hadoop fs -ls
Warning: $HADOOP_HOME is deprecated.

Found 1 items
drwxr-xr-x - root supergroup 0 2016-04-04 09:08 /user/root/20news

[root@Centos hadoop-1.2.1]# bin/hadoop fs -put ../data/bayes-train-input/ ./20news
[root@Centos hadoop-1.2.1]# bin/hadoop fs -ls 20news
Warning: $HADOOP_HOME is deprecated.
Found 1 items
drwxr-xr-x - root supergroup 0 2016-04-04 09:08 /user/root/20news/bayes-train-input
[root@Centos hadoop-1.2.1]# bin/hadoop fs -put ../data/bayes-test-input/ ./20news
[root@Centos hadoop-1.2.1]# bin/hadoop fs -ls 20news
Warning: $HADOOP_HOME is deprecated.
Found 2 items
drwxr-xr-x - root supergroup 0 2016-04-04 09:08 /user/root/20news/bayes-test-input
drwxr-xr-x - root supergroup 0 2016-04-04 09:08 /user/root/20news/bayes-train-input
[root@Centos hadoop-1.2.1]# bin/hadoop fs -ls 20news/bayes-train-input
Warning: $HADOOP_HOME is deprecated.

Found 20 items
-rw-r--r-- 1 root supergroup 773301 2016-04-04 09:08 /user/root/20news/bayes-train-input/alt.atheism.txt
-rw-r--r-- 1 root supergroup 687018 2016-04-04 09:08 /user/root/20news/bayes-train-input/comp.graphics.txt
-rw-r--r-- 1 root supergroup 1371301 2016-04-04 09:08 /user/root/20news/bayes-train-input/comp.os.ms-windows.misc.txt
-rw-r--r-- 1 root supergroup 605082 2016-04-04 09:08 /user/root/20news/bayes-train-input/comp.sys.ibm.pc.hardware.txt
-rw-r--r-- 1 root supergroup 539488 2016-04-04 09:08 /user/root/20news/bayes-train-input/comp.sys.mac.hardware.txt
-rw-r--r-- 1 root supergroup 924668 2016-04-04 09:08 /user/root/20news/bayes-train-input/comp.windows.x.txt
-rw-r--r-- 1 root supergroup 457202 2016-04-04 09:08 /user/root/20news/bayes-train-input/misc.forsale.txt
-rw-r--r-- 1 root supergroup 649942 2016-04-04 09:08 /user/root/20news/bayes-train-input/rec.autos.txt
-rw-r--r-- 1 root supergroup 610103 2016-04-04 09:08 /user/root/20news/bayes-train-input/rec.motorcycles.txt
-rw-r--r-- 1 root supergroup 648313 2016-04-04 09:08 /user/root/20news/bayes-train-input/rec.sport.baseball.txt
-rw-r--r-- 1 root supergroup 870760 2016-04-04 09:08 /user/root/20news/bayes-train-input/rec.sport.hockey.txt
-rw-r--r-- 1 root supergroup 1139592 2016-04-04 09:08 /user/root/20news/bayes-train-input/sci.crypt.txt
-rw-r--r-- 1 root supergroup 616166 2016-04-04 09:08 /user/root/20news/bayes-train-input/sci.electronics.txt
-rw-r--r-- 1 root supergroup 901841 2016-04-04 09:08 /user/root/20news/bayes-train-input/sci.med.txt
-rw-r--r-- 1 root supergroup 913047 2016-04-04 09:08 /user/root/20news/bayes-train-input/sci.space.txt
-rw-r--r-- 1 root supergroup 1004842 2016-04-04 09:08 /user/root/20news/bayes-train-input/soc.religion.christian.txt
-rw-r--r-- 1 root supergroup 973157 2016-04-04 09:08 /user/root/20news/bayes-train-input/talk.politics.guns.txt
-rw-r--r-- 1 root supergroup 1317255 2016-04-04 09:08 /user/root/20news/bayes-train-input/talk.politics.mideast.txt
-rw-r--r-- 1 root supergroup 980920 2016-04-04 09:08 /user/root/20news/bayes-train-input/talk.politics.misc.txt
-rw-r--r-- 1 root supergroup 623882 2016-04-04 09:08 /user/root/20news/bayes-train-input/talk.religion.misc.txt

[root@Centos hadoop-1.2.1]# bin/hadoop fs -ls 20news/bayes-test-input
Warning: $HADOOP_HOME is deprecated.

Found 20 items
-rw-r--r-- 1 root supergroup 773301 2016-04-04 09:08 /user/root/20news/bayes-test-input/alt.atheism.txt
-rw-r--r-- 1 root supergroup 687018 2016-04-04 09:08 /user/root/20news/bayes-test-input/comp.graphics.txt
-rw-r--r-- 1 root supergroup 1371301 2016-04-04 09:08 /user/root/20news/bayes-test-input/comp.os.ms-windows.misc.txt
-rw-r--r-- 1 root supergroup 605082 2016-04-04 09:08 /user/root/20news/bayes-test-input/comp.sys.ibm.pc.hardware.txt
-rw-r--r-- 1 root supergroup 539488 2016-04-04 09:08 /user/root/20news/bayes-test-input/comp.sys.mac.hardware.txt
-rw-r--r-- 1 root supergroup 924668 2016-04-04 09:08 /user/root/20news/bayes-test-input/comp.windows.x.txt
-rw-r--r-- 1 root supergroup 457202 2016-04-04 09:08 /user/root/20news/bayes-test-input/misc.forsale.txt
-rw-r--r-- 1 root supergroup 649942 2016-04-04 09:08 /user/root/20news/bayes-test-input/rec.autos.txt
-rw-r--r-- 1 root supergroup 610103 2016-04-04 09:08 /user/root/20news/bayes-test-input/rec.motorcycles.txt
-rw-r--r-- 1 root supergroup 648313 2016-04-04 09:08 /user/root/20news/bayes-test-input/rec.sport.baseball.txt
-rw-r--r-- 1 root supergroup 870760 2016-04-04 09:08 /user/root/20news/bayes-test-input/rec.sport.hockey.txt
-rw-r--r-- 1 root supergroup 1139592 2016-04-04 09:08 /user/root/20news/bayes-test-input/sci.crypt.txt
-rw-r--r-- 1 root supergroup 616166 2016-04-04 09:08 /user/root/20news/bayes-test-input/sci.electronics.txt
-rw-r--r-- 1 root supergroup 901841 2016-04-04 09:08 /user/root/20news/bayes-test-input/sci.med.txt
-rw-r--r-- 1 root supergroup 913047 2016-04-04 09:08 /user/root/20news/bayes-test-input/sci.space.txt
-rw-r--r-- 1 root supergroup 1004842 2016-04-04 09:08 /user/root/20news/bayes-test-input/soc.religion.christian.txt
-rw-r--r-- 1 root supergroup 973157 2016-04-04 09:08 /user/root/20news/bayes-test-input/talk.politics.guns.txt
-rw-r--r-- 1 root supergroup 1317255 2016-04-04 09:08 /user/root/20news/bayes-test-input/talk.politics.mideast.txt
-rw-r--r-- 1 root supergroup 980920 2016-04-04 09:08 /user/root/20news/bayes-test-input/talk.politics.misc.txt
-rw-r--r-- 1 root supergroup 623882 2016-04-04 09:08 /user/root/20news/bayes-test-input/talk.religion.misc.txt

[root@Centos hadoop-1.2.1]# bin/hadoop fs -cat 20news/bayes-train-input/talk.politics.misc.txt


rce most part uninformed ignorant public democracy i don't think so society's sense justice judged basis treatment people who make up society all those people yes includes gays lesbians bisexuals whose crimes have victims who varied diverse society wich part frank jordan d d d c c c gay arab bassoonists unite
talk.politics.misc from steveh thor.isc br.com steve hendricks subject re limiting govt re employment re why concentrate summary promoting competition does depend upon libertarians organization free barbers inc lines 60 nntp posting host thor.isc br.com article c5kh8g 961 cbnewse.cb.att.com doctor1 cbnewse.cb.att.com patrick.b.hailey writes article 1993apr15.170731.8797 isc br.isc br.com steveh thor.isc br.com steve hendricks writes two paragraphs from two different posts splicing them together my intention change steve's meaning misrepresent him any way i don't think i've done so noted another thread limiting govt problem libertarians face insuring limited government seek does become tool private interests pursue own agenda failure libertarianism ideology does provide any reasonable way restrain actions other than utopian dreams just marxism fails specify how pure communism achieved state wither away libertarians frequently fail show how weakening power state result improvement human condition patrick's example anti competitive regulations auto dealers deleted here's what i see libertarianism offering you does seem me utopian dream basic human decency common sense real grass roots example freedom liberty yes having few people acting our masters approving rejecting each our basic transactions each other does strike me wonderful way improve human condition thanks awfully patrick let me try drag discussion back original issues i've noted before i'm necessarily disputing benefits eliminating anti competitive legislation regard auto dealers barbers etc one need however swallow entire libertarian agenda accomplish end just because one grants benefits allowing anyone who wishes cut hair sell his her services without regulation does mean same unregulated barbers should free bleed people medical service without government intervention some many libertarians would argue case case basis cost benefit ratio government regulation obviously worthwhile libertarian agenda however does call assessment assumes costs regulation any kind always outweigh its benefits approach avoids all sorts difficult analysis strikes many rest us dogmatic say least i have objection analysis medical care education national defense local police suggests free market can provide more effective efficient means accomplishing social obj

 

Step 3: train the Bayes classifier
1. Model training: the training text set has already been uploaded, so the Bayes classifier model is now trained from it.
A quick explanation of the options: -i is the HDFS input path of the training set; -o is the output path for the classification model; -type is the classifier type (bayes or cbayes; cbayes is used here); -ng is the n-gram size, default 1; -source
is where the data lives, HDFS or HBase. The same options apply to the test step below.

bin/mahout trainclassifier \
-i /user/root/20news/bayes-train-input \
-o /user/root/20news/newsmodel \
-type cbayes \
-ng 2 \
-source hdfs

 

---------------------------------------------------------------------------------------------------------------
[root@Centos hadoop-1.2.1]# cd mahout-distribution-0.6/
[root@Centos mahout-distribution-0.6]# bin/mahout
MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath.
Running on hadoop, using HADOOP_HOME=/root/hadoop-1.2.1
HADOOP_CONF_DIR=/root/hadoop-1.2.1/conf
MAHOUT-JOB: /root/hadoop-1.2.1/mahout-distribution-0.6/mahout-examples-0.6-job.jar
Warning: $HADOOP_HOME is deprecated.
[root@Centos mahout-distribution-0.6]# bin/mahout trainclassifier \
> -i /user/root/20news/bayes-train-input \
> -o /user/root/20news/newsmodel \
> -type cbayes \
> -ng 2 \
> -source hdfs
MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath.
Running on hadoop, using HADOOP_HOME=/root/hadoop-1.2.1
HADOOP_CONF_DIR=/root/hadoop-1.2.1/conf
MAHOUT-JOB: /root/hadoop-1.2.1/mahout-distribution-0.6/mahout-examples-0.6-job.jar
Warning: $HADOOP_HOME is deprecated.

16/04/04 09:21:58 WARN driver.MahoutDriver: No trainclassifier.props found on classpath, will use command-line arguments only
16/04/04 09:21:58 INFO bayes.TrainClassifier: Training Complementary Bayes Classifier
16/04/04 09:21:59 INFO cbayes.CBayesDriver: Reading features...
16/04/04 09:22:01 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
16/04/04 09:22:02 INFO util.NativeCodeLoader: Loaded the native-hadoop library
16/04/04 09:22:02 WARN snappy.LoadSnappy: Snappy native library not loaded
16/04/04 09:22:02 INFO mapred.FileInputFormat: Total input paths to process : 20
16/04/04 09:22:04 INFO mapred.JobClient: Running job: job_201604040854_0001
16/04/04 09:22:05 INFO mapred.JobClient: map 0% reduce 0%
16/04/04 09:22:48 INFO mapred.JobClient: map 1% reduce 0%
16/04/04 09:22:49 INFO mapred.JobClient: map 2% reduce 0%
16/04/04 09:23:11 INFO mapred.JobClient: map 3% reduce 0%
16/04/04 09:23:12 INFO mapred.JobClient: map 4% reduce 0%
····································
··········································
···················································
16/04/04 10:04:11 INFO mapred.JobClient: Job complete: job_201604040854_0004
16/04/04 10:04:11 INFO mapred.JobClient: Counters: 30
16/04/04 10:04:11 INFO mapred.JobClient: Map-Reduce Framework
16/04/04 10:04:11 INFO mapred.JobClient: Spilled Records=4309
16/04/04 10:04:12 INFO mapred.JobClient: Map output materialized bytes=1473
16/04/04 10:04:12 INFO mapred.JobClient: Reduce input records=41
16/04/04 10:04:12 INFO mapred.JobClient: Virtual memory (bytes) snapshot=7733702656
16/04/04 10:04:12 INFO mapred.JobClient: Map input records=3146637
16/04/04 10:04:12 INFO mapred.JobClient: SPLIT_RAW_BYTES=416
16/04/04 10:04:12 INFO mapred.JobClient: Map output bytes=965613985
16/04/04 10:04:12 INFO mapred.JobClient: Reduce shuffle bytes=1473
16/04/04 10:04:12 INFO mapred.JobClient: Physical memory (bytes) snapshot=682602496
16/04/04 10:04:12 INFO mapred.JobClient: Map input bytes=150138778
16/04/04 10:04:12 INFO mapred.JobClient: Reduce input groups=20
16/04/04 10:04:12 INFO mapred.JobClient: Combine output records=2128
16/04/04 10:04:12 INFO mapred.JobClient: Reduce output records=20
16/04/04 10:04:12 INFO mapred.JobClient: Map output records=28673441
16/04/04 10:04:12 INFO mapred.JobClient: Combine input records=28675528
16/04/04 10:04:12 INFO mapred.JobClient: CPU time spent (ms)=210830
16/04/04 10:04:12 INFO mapred.JobClient: Total committed heap usage (bytes)=498544640
16/04/04 10:04:12 INFO mapred.JobClient: File Input Format Counters
16/04/04 10:04:12 INFO mapred.JobClient: Bytes Read=150140285
16/04/04 10:04:12 INFO mapred.JobClient: FileSystemCounters
16/04/04 10:04:12 INFO mapred.JobClient: HDFS_BYTES_READ=150140770
16/04/04 10:04:12 INFO mapred.JobClient: FILE_BYTES_WRITTEN=383730
16/04/04 10:04:12 INFO mapred.JobClient: FILE_BYTES_READ=152894
16/04/04 10:04:12 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=932
16/04/04 10:04:12 INFO mapred.JobClient: File Output Format Counters
16/04/04 10:04:12 INFO mapred.JobClient: Bytes Written=932
16/04/04 10:04:12 INFO mapred.JobClient: Job Counters
16/04/04 10:04:12 INFO mapred.JobClient: Launched map tasks=3
16/04/04 10:04:12 INFO mapred.JobClient: Launched reduce tasks=1
16/04/04 10:04:12 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=214633
16/04/04 10:04:12 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
16/04/04 10:04:12 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=320403
16/04/04 10:04:12 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
16/04/04 10:04:12 INFO mapred.JobClient: Data-local map tasks=3
16/04/04 10:04:14 INFO common.HadoopUtil: Deleting /user/root/20news/newsmodel/trainer-docCount
16/04/04 10:04:15 INFO common.HadoopUtil: Deleting /user/root/20news/newsmodel/trainer-termDocCount
16/04/04 10:04:15 INFO common.HadoopUtil: Deleting /user/root/20news/newsmodel/trainer-featureCount
16/04/04 10:04:15 INFO common.HadoopUtil: Deleting /user/root/20news/newsmodel/trainer-wordFreq
16/04/04 10:04:15 INFO common.HadoopUtil: Deleting /user/root/20news/newsmodel/trainer-tfIdf/trainer-vocabCount
16/04/04 10:04:16 INFO driver.MahoutDriver: Program took 2537700 ms (Minutes: 42.29723333333333)
[root@Centos mahout-distribution-0.6]#
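Before testing, it may help to confirm that the model actually landed on HDFS; a hedged check, run from the hadoop-1.2.1 directory (the sub-directories listed are whatever the trainer left behind, e.g. trainer-tfIdf mentioned in the log above):
------------------------------------------------
bin/hadoop fs -ls /user/root/20news/newsmodel
------------------------------------------------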
------------------------------------------------------------------------------------------------------------------------
Step 4: test the Bayes model
bin/mahout testclassifier \
-m /user/root/20news/newsmodel \
-d /user/root/20news/bayes-test-input \
-type cbayes \
-ng 2 \
-source hdfs \
-method mapreduce
---------------------------------------------------------------------------------
Run the test:
---------------------------------
[root@Centos mahout-distribution-0.6]# bin/mahout testclassifier \
> -m /user/root/20news/newtestsmodel \
> -d /user/root/20news/bayes-test-input \
> -type cbayes \
> -ng 2 \
> -source hdfs \
> -method mapreduce
MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath.
Running on hadoop, using HADOOP_HOME=/root/hadoop-1.2.1
HADOOP_CONF_DIR=/root/hadoop-1.2.1/conf
MAHOUT-JOB: /root/hadoop-1.2.1/mahout-distribution-0.6/mahout-examples-0.6-job.jar
Warning: $HADOOP_HOME is deprecated.

16/04/04 14:10:54 WARN driver.MahoutDriver: No testclassifier.props found on classpath, will use command-line arguments only
16/04/04 14:10:56 INFO common.HadoopUtil: Deleting /user/root/20news/bayes-test-input-output
16/04/04 14:10:56 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
16/04/04 14:11:00 INFO util.NativeCodeLoader: Loaded the native-hadoop library
16/04/04 14:11:00 WARN snappy.LoadSnappy: Snappy native library not loaded
16/04/04 14:11:00 INFO mapred.FileInputFormat: Total input paths to process : 20
16/04/04 14:11:02 INFO mapred.JobClient: Running job: job_201604040854_0011
16/04/04 14:11:03 INFO mapred.JobClient: map 0% reduce 0%
16/04/04 14:11:47 INFO mapred.JobClient: map 5% reduce 0%
16/04/04 14:11:52 INFO mapred.JobClient: map 10% reduce 0%
16/04/04 14:12:33 INFO mapred.JobClient: map 19% reduce 0%
16/04/04 14:12:45 INFO mapred.JobClient: map 19% reduce 6%
16/04/04 14:12:58 INFO mapred.JobClient: map 29% reduce 6%
16/04/04 14:13:09 INFO mapred.JobClient: map 29% reduce 10%
16/04/04 14:13:36 INFO mapred.JobClient: map 39% reduce 10%
16/04/04 14:13:45 INFO mapred.JobClient: map 39% reduce 13%
16/04/04 14:13:53 INFO mapred.JobClient: map 44% reduce 13%
16/04/04 14:13:54 INFO mapred.JobClient: map 50% reduce 13%
16/04/04 14:14:01 INFO mapred.JobClient: map 50% reduce 16%
16/04/04 14:14:03 INFO mapred.JobClient: map 55% reduce 16%
16/04/04 14:14:04 INFO mapred.JobClient: map 60% reduce 16%
16/04/04 14:14:11 INFO mapred.JobClient: map 60% reduce 20%
16/04/04 14:14:22 INFO mapred.JobClient: map 70% reduce 20%
16/04/04 14:14:31 INFO mapred.JobClient: map 70% reduce 23%
16/04/04 14:14:34 INFO mapred.JobClient: map 80% reduce 23%
16/04/04 14:14:41 INFO mapred.JobClient: map 80% reduce 26%
16/04/04 14:14:43 INFO mapred.JobClient: map 85% reduce 26%
16/04/04 14:14:44 INFO mapred.JobClient: map 90% reduce 26%
16/04/04 14:14:47 INFO mapred.JobClient: map 90% reduce 30%
16/04/04 14:14:52 INFO mapred.JobClient: map 95% reduce 30%
16/04/04 14:14:53 INFO mapred.JobClient: map 100% reduce 30%
16/04/04 14:15:02 INFO mapred.JobClient: map 100% reduce 66%
16/04/04 14:15:11 INFO mapred.JobClient: map 100% reduce 100%
16/04/04 14:15:16 INFO mapred.JobClient: Job complete: job_201604040854_0011
16/04/04 14:15:28 INFO mapred.JobClient: Counters: 30
16/04/04 14:15:28 INFO mapred.JobClient: Map-Reduce Framework
16/04/04 14:15:28 INFO mapred.JobClient: Spilled Records=40
16/04/04 14:15:28 INFO mapred.JobClient: Map output materialized bytes=993
16/04/04 14:15:28 INFO mapred.JobClient: Reduce input records=20
16/04/04 14:15:28 INFO mapred.JobClient: Virtual memory (bytes) snapshot=40516427776
16/04/04 14:15:28 INFO mapred.JobClient: Map input records=11314
16/04/04 14:15:28 INFO mapred.JobClient: SPLIT_RAW_BYTES=2573
16/04/04 14:15:28 INFO mapred.JobClient: Map output bytes=470632
16/04/04 14:15:28 INFO mapred.JobClient: Reduce shuffle bytes=993
16/04/04 14:15:28 INFO mapred.JobClient: Physical memory (bytes) snapshot=4085964800
16/04/04 14:15:28 INFO mapred.JobClient: Map input bytes=16607880
16/04/04 14:15:28 INFO mapred.JobClient: Reduce input groups=20
16/04/04 14:15:28 INFO mapred.JobClient: Combine output records=20
16/04/04 14:15:28 INFO mapred.JobClient: Reduce output records=20
16/04/04 14:15:28 INFO mapred.JobClient: Map output records=11314
16/04/04 14:15:28 INFO mapred.JobClient: Combine input records=11314
16/04/04 14:15:28 INFO mapred.JobClient: CPU time spent (ms)=34980
16/04/04 14:15:28 INFO mapred.JobClient: Total committed heap usage (bytes)=3097051136
16/04/04 14:15:28 INFO mapred.JobClient: File Input Format Counters
16/04/04 14:15:28 INFO mapred.JobClient: Bytes Read=16607880
16/04/04 14:15:28 INFO mapred.JobClient: FileSystemCounters
16/04/04 14:15:28 INFO mapred.JobClient: HDFS_BYTES_READ=16610453
16/04/04 14:15:28 INFO mapred.JobClient: FILE_BYTES_WRITTEN=1166412
16/04/04 14:15:28 INFO mapred.JobClient: FILE_BYTES_READ=879
16/04/04 14:15:28 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=1092
16/04/04 14:15:28 INFO mapred.JobClient: File Output Format Counters
16/04/04 14:15:28 INFO mapred.JobClient: Bytes Written=1092
16/04/04 14:15:28 INFO mapred.JobClient: Job Counters
16/04/04 14:15:28 INFO mapred.JobClient: Launched map tasks=20
16/04/04 14:15:28 INFO mapred.JobClient: Launched reduce tasks=1
16/04/04 14:15:28 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=195607
16/04/04 14:15:28 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
16/04/04 14:15:28 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=406966
16/04/04 14:15:28 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
16/04/04 14:15:28 INFO mapred.JobClient: Data-local map tasks=20
16/04/04 14:15:38 INFO bayes.BayesClassifierDriver: =======================================================
Confusion Matrix
-------------------------------------------------------
a b c d e f g h i j k l m n o p q r s t <--Classified as
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | 0 a = soc.religion.christian
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | 0 b = rec.autos
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | 0 c = talk.religion.misc
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | 0 d = comp.windows.x
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | 0 e = rec.sport.baseball
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | 0 f = comp.graphics
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | 0 g = talk.politics.mideast
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | 0 h = comp.sys.ibm.pc.hardware
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | 0 i = sci.med
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | 0 j = comp.os.ms-windows.misc
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | 0 k = sci.crypt
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | 0 l = comp.sys.mac.hardware
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | 0 m = misc.forsale
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | 0 n = rec.motorcycles
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | 0 o = talk.politics.misc
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | 0 p = sci.electronics
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | 0 q = rec.sport.hockey
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | 0 r = sci.space
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | 0 s = alt.atheism
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | 0 t = talk.politics.guns


16/04/04 14:15:38 INFO driver.MahoutDriver: Program took 283133 ms (Minutes: 4.718883333333333)
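The confusion matrix above is all zeros, i.e. nothing was classified. One likely culprit is that the run above pointed -m at /user/root/20news/newtestsmodel, while training in Step 3 wrote the model to /user/root/20news/newsmodel. Another thing to verify is that bayes-test-input was really prepared from 20news-bydate-test rather than from the training data (see the note in Step 1). A hedged re-run using the model path from the Step 4 template:
------------------------------------------------
# same command as the Step 4 template, with -m pointing at the directory trainclassifier actually wrote
bin/mahout testclassifier \
-m /user/root/20news/newsmodel \
-d /user/root/20news/bayes-test-input \
-type cbayes \
-ng 2 \
-source hdfs \
-method mapreduce
------------------------------------------------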

-------------------------------
