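For installing the RHadoop environment, see the following article: http://chen.yi.bo.blog.163.com/blog/static/1506211092012720111910827/
Notes:
1. Setting environment variables:
I tried to be clever and put the environment variables in /etc/profile instead. The result was a disaster, and the job failed with this error: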
java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:362)
at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:576)
at org.apache.hadoop.streaming.PipeReducer.close(PipeReducer.java:137)
at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:530)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:421)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1136)
at org.apache.hadoop.mapred.Child.main(Child.java:249)
log4j:WARN No appenders could be found for logger (org.apache.hadoop.hdfs.DFSClient).
So it is better to play it safe and set them the way the article above describes.
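One quick sanity check is to ask R which variables the session can actually see. The sketch below assumes the variables in question are HADOOP_CMD and HADOOP_STREAMING, the names rmr2 usually expects; neither name appears in this post, so substitute whatever the article you follow specifies. It only inspects the current R session, not the streaming subprocesses on the task nodes.

```r
# Assumption: these are the variable names your RHadoop setup relies on.
vars <- c("HADOOP_CMD", "HADOOP_STREAMING")

# unset = NA lets us tell "not set" apart from "set to an empty string".
values <- Sys.getenv(vars, unset = NA)

missing_vars <- vars[is.na(values)]
if (length(missing_vars) > 0) {
  message("Not visible in this R session: ",
          paste(missing_vars, collapse = ", "))
} else {
  message("All expected variables are visible in this R session.")
}
```

If a variable shows up in a login shell but not here, the place where it was set is probably not being read by the process that launches R.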
2. The wordcount example code in the original article has a problem.
wordcount <- function (input, output=NULL, split='[[:punct:][:space:]]+'){
  mapreduce(input=input, output=output,
    map=function(k, v){
      v2=unlist(strsplit(x=v, split=split))
      v3=v2[v2!=' ']
      lapply(v3, function(w){keyval(w, 1)})
    },
    reduce=function(k, vv){
      keyval(k, sum(unlist(vv)))
    })
}
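Run this way, the job still fails, because the map function does not return a keyval value. A keyval value has the form list(key<list>, value<list>), whereas the lapply call returns list(list<key,value>, list<key,value>, ...). So the lapply line (highlighted in red in the original post) should be changed to: keyval(v3, rep(1, length(v3))), and then it works.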
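The difference between the two shapes is easier to see in plain R. The sketch below uses ordinary lists as a stand-in for keyval (an illustration only, not rmr2's real internal structure) and a made-up input line. It then defines a hypothetical wordcount_fixed with the one-line change described above; that function is only defined here, since calling it needs rmr2 and a working Hadoop setup.

```r
split <- "[[:punct:][:space:]]+"
v <- "hello world, hello hadoop"          # made-up sample line

v2 <- unlist(strsplit(x = v, split = split))
v3 <- v2[v2 != " "]

# Shape returned by the lapply version: one small list per word.
per_word <- lapply(v3, function(w) list(key = w, val = 1))

# Shape the fix returns: one key vector plus a parallel vector of counts.
vectorized <- list(key = v3, val = rep(1, length(v3)))

str(per_word)
str(vectorized)

# Hypothetical name; same code as the post with only the map result changed.
# Defined but not called: it requires rmr2 and a running Hadoop cluster.
wordcount_fixed <- function(input, output = NULL,
                            split = "[[:punct:][:space:]]+") {
  rmr2::mapreduce(
    input = input, output = output,
    map = function(k, v) {
      v2 <- unlist(strsplit(x = v, split = split))
      v3 <- v2[v2 != " "]
      # One vectorized keyval call instead of a list of per-word keyvals.
      rmr2::keyval(v3, rep(1, length(v3)))
    },
    reduce = function(k, vv) {
      rmr2::keyval(k, sum(unlist(vv)))
    })
}
```

The first structure is a list with one element per word, while the second is a single pair of equal-length vectors, which is the form the post says map has to return.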
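These are the problems I ran into on my first try with RHadoop. I am recording them here for future reference, and I hope they also help others who are currently looking into RHadoop.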