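First, my environment:
Windows 7, 64-bit
Spring Tool Suite Version: 3.4.0.RELEASE
Hadoop 2.6.0
Hadoop 2.x no longer ships an Eclipse plugin, so out of the box we cannot debug code in Eclipse. We have to package the MapReduce Java code as a jar and run it on Linux, which makes debugging inconvenient. We therefore build an Eclipse plugin ourselves so that we can debug locally. Compared with the Hadoop 1.x days, building the Eclipse plugin for Hadoop 2.x is much simpler. Below we build the hadoop-eclipse-plugin and then use it to develop for Hadoop in Eclipse.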
1) Install the JDK
2) Configure the environment variables
Set JAVA_HOME, CLASSPATH, PATH and so on. I won't go into detail here; there is plenty of material online.
1) Download eclipse-jee-juno-SR2.rar
2) Extract it to a local disk, as shown in the figure:
1) Download Ant
http://ant.apache.org/bindownload.cgi
apache-ant-1.9.4-bin.zip
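2) Extract it to a local drive, as shown in the figure:
3) Configure the environment variables
Create ANT_HOME=E:\ant\apache-ant-1.9.4-bin\apache-ant-1.9.4
Append ;%ANT_HOME%\bin to PATH
4) Open cmd and check that the configuration is correct
ant -version, as shown in the figure: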
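1) Download the Hadoop package
hadoop-2.6.0.tar.gz
Extract it to a local disk, as shown in the figure:
Download the hadoop2x-eclipse-plugin source code
1) The Eclipse plugin source for Hadoop 2 is currently hosted on GitHub at https://github.com/winghc/hadoop2x-eclipse-plugin. Click the Download ZIP link on the right-hand side to download it, as shown in the figure:
2) Download hadoop2x-eclipse-plugin-master.zip
Extract it to a local disk, as shown in the figure: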
Build the plugin with Ant, as shown in the figure:
ant jar -Dversion=2.6.0 -Declipse.home=F:\tool\eclipse-jee-juno-SR2\eclipse-jee-juno-SR2 -Dhadoop.home=E:\hadoop\hadoop-2.6.0\hadoop-2.6.0
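1) Click Window --> Show View --> MapReduce Tools and select Map/Reduce Locations
2) Open the Map/Reduce Locations tab and click the small elephant icon on the right to open the Hadoop Location configuration window. Enter any Location Name, then configure Map/Reduce Master and DFS Master. The Host and Port must match the settings in hdfs-site.xml and core-site.xml.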
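1. Right-click and choose New -> Map/Reduce Project
2. Create WordCount.java (find the MapReduce example under Hadoop's share directory and copy it over)
3. Create an input directory in HDFS (the output directory does not need to be created; it is created automatically when the MR job runs) and upload a file named file01 containing a few arbitrary words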
hdfs dfs -mkdir -p /user/root/input
hdfs dfs -mkdir -p /user/root/output
hadoop fs -put file01 /user/root/input
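4. Right-click WordCount.java --> Run As --> Run Configurations and set the input and output directory paths, as shown in the figure:
5. Right-click WordCount.java --> Run As --> Run on Hadoop
Then look in the output/count directory. It contains a result file with the word counts; if you can view the results, the configuration works.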
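To know what to expect in the result file, it can help to run the same counting logic locally first. The sketch below is not the Hadoop example itself: it is a plain Java approximation (assuming Java 8 or later) that splits on whitespace like the example's mapper and sums per word like its reducer. The class name LocalWordCount and the sample text are made up for illustration.

```java
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Local, Hadoop-free approximation of what the WordCount job computes.
class LocalWordCount {

    // Split the text on whitespace and count how often each word occurs.
    static Map<String, Integer> count(String text) {
        Map<String, Integer> counts = new TreeMap<>(); // sorted by word
        StringTokenizer tokens = new StringTokenizer(text);
        while (tokens.hasMoreTokens()) {
            counts.merge(tokens.nextToken(), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical contents of file01.
        String file01 = "hello hadoop hello eclipse";
        // Print one "word<TAB>count" line per distinct word.
        count(file01).forEach((word, n) -> System.out.println(word + "\t" + n));
    }
}
```

If the file in output/count lists the same words and totals as this local run for the same input, the job is reading and writing HDFS correctly.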
Problem 1. An internal error occurred during: "Map/Reduce location status updater". java.lang.NullPointerException
We put hadoop-eclipse-plugin-2.6.0.jar into Eclipse's plugins directory (ours is F:\tool\eclipse-jee-juno-SR2\eclipse-jee-juno-SR2\plugins) and restarted Eclipse. Under Window --> Preferences the Hadoop Map/Reduce option then appears, but clicking it raised An internal error occurred during: "Map/Reduce location status updater". java.lang.NullPointerException, as shown in the figure:
Solution:
We found that the freshly deployed Hadoop 2 did not yet have input and output directories, so first create them on HDFS.
#bin/hdfs dfs -mkdir -p /user/root/input
#bin/hdfs dfs -mkdir -p /user/root/output
The two directories now appear under DFS Locations in Eclipse, as shown in the figure:
Problem 2. Exception in thread "main" java.lang.NullPointerException at java.lang.ProcessBuilder.start(Unknown Source)
Running the Hadoop 2 WordCount.java code produced this error:
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Exception in thread "main" java.lang.NullPointerException
    at java.lang.ProcessBuilder.start(Unknown Source)
    at org.apache.hadoop.util.Shell.runCommand(Shell.java:482)
    at org.apache.hadoop.util.Shell.run(Shell.java:455)
    at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:715)
    at org.apache.hadoop.util.Shell.execCommand(Shell.java:808)
    at org.apache.hadoop.util.Shell.execCommand(Shell.java:791)
    at
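Analysis:
The Hadoop 2 and later downloads do not include winutils.exe in the bin directory.
Solution:
1. Download Hadoop2.6.0-eclipse插件.zip from http://pan.baidu.com/s/1qWG7XxU and extract it. Copy winutils.exe from the directory Hadoop2.6.0-eclipse插件.zip\eclipse插件\2.4之後 (the folder for versions after 2.4) into the Hadoop2/bin directory, as shown in the figure:
2. In Eclipse, under Window -> Preferences -> Hadoop Map/Reduce, point to the Hadoop directory on our local disk, as shown in the figure:
3. Configure the HADOOP_HOME and PATH environment variables for Hadoop 2, as shown in the figure: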
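Before rerunning the job, a quick check from Java can confirm that the three steps above took effect. This is a minimal sketch, assuming (as far as I know) that Hadoop looks for its home directory in the hadoop.home.dir system property first and falls back to the HADOOP_HOME environment variable. The class and method names are my own.

```java
import java.io.File;

// Checks whether winutils.exe is where Hadoop on Windows expects it.
class WinutilsCheck {

    // Prefer the hadoop.home.dir system property, then fall back to HADOOP_HOME.
    static String resolveHadoopHome(String systemProperty, String envVariable) {
        return systemProperty != null ? systemProperty : envVariable;
    }

    // Location of winutils.exe relative to the Hadoop home directory.
    static File winutilsFile(String hadoopHome) {
        return new File(new File(hadoopHome, "bin"), "winutils.exe");
    }

    // True only if a home directory is known and bin/winutils.exe exists in it.
    static boolean hasWinutils(String hadoopHome) {
        return hadoopHome != null && winutilsFile(hadoopHome).isFile();
    }

    public static void main(String[] args) {
        String home = resolveHadoopHome(
                System.getProperty("hadoop.home.dir"),
                System.getenv("HADOOP_HOME"));
        if (hasWinutils(home)) {
            System.out.println("Found " + winutilsFile(home));
        } else {
            System.out.println("winutils.exe not found, Hadoop home is: " + home);
        }
    }
}
```

If the Hadoop home prints as null, Eclipse was probably started before HADOOP_HOME was set and needs a restart to pick up the new environment.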
Problem 3. Exception in thread "main" java.lang.UnsatisfiedLinkError: org.apache.hadoop.io.nativeio.NativeIO$Windows.access0(Ljava/lang/String;I)Z
After solving Problem 2, running the WordCount.java code produced this problem:
log4j:WARN No appenders could be found for logger (org.apache.hadoop.metrics2.lib.MutableMetricsFactory).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Exception in thread "main" java.lang.UnsatisfiedLinkError: org.apache.hadoop.io.nativeio.NativeIO$Windows.access0(Ljava/lang/String;I)Z
    at org.apache.hadoop.io.nativeio.NativeIO$Windows.access0(Native Method)
    at org.apache.hadoop.io.nativeio.NativeIO$Windows.access(NativeIO.java:557)
    at org.apache.hadoop.fs.FileUtil.canRead(FileUtil.java:977)
    at org.apache.hadoop.util.DiskChecker.checkAccessByFileMethods(DiskChecker.java:187)
    at org.apache.hadoop.util.DiskChecker.checkDirAccess(DiskChecker.java:174)
    at org.apache.hadoop.util.DiskChecker.checkDir(DiskChecker.java:108)
    at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.confChanged(LocalDirAllocator.java:285)
    at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:344)
    at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:150)
    at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:131)
    at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:115)
    at org.apache.hadoop.mapred.LocalDistributedCacheManager.setup(LocalDistributedCacheManager.java:131)
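Analysis:
hadoop.dll is missing from C:\Windows\System32; copying the file there should be enough.
Solution:
Put hadoop.dll from the archive into C:\Windows\System32 and restart the computer. It may not be that simple, though, and the same problem can still occur. If it is still not solved, it is best to also copy the file into %HADOOP_HOME%/bin.
Further analysis:
The error points at org.apache.hadoop.io.nativeio.NativeIO$Windows.access(NativeIO.java:557), so let's look at line 557 of the NativeIO class, as shown in the figure:
This Windows-only method checks whether the current process has the requested access rights for a given path. As a workaround we grant access unconditionally by modifying the source so that it returns true. Download the matching Hadoop source, hadoop-2.6.0-src.tar.gz, extract it, copy NativeIO.java from hadoop-2.6.0-src\hadoop-common-project\hadoop-common\src\main\java\org\apache\hadoop\io\nativeio into the corresponding Eclipse project, and change line 557 to return true, as shown in the figure: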
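Before patching NativeIO.java, it may be worth confirming whether the JVM can see hadoop.dll at all. The sketch below walks a path list and reports the first directory that contains the file. It assumes that the java.library.path system property is what the JVM searches for native libraries, and that on Windows it typically includes the system directories and the PATH entries. The class name HadoopDllCheck is made up.

```java
import java.io.File;

// Looks for a native library file in a path list such as java.library.path.
class HadoopDllCheck {

    // Returns the first existing file named fileName found in pathList, or null.
    static File findInPath(String pathList, String fileName) {
        if (pathList == null) {
            return null;
        }
        // Entries are separated by ';' on Windows and ':' elsewhere.
        for (String dir : pathList.split(File.pathSeparator)) {
            if (dir.isEmpty()) {
                continue;
            }
            File candidate = new File(dir, fileName);
            if (candidate.isFile()) {
                return candidate;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        File dll = findInPath(System.getProperty("java.library.path"), "hadoop.dll");
        System.out.println(dll != null
                ? "hadoop.dll found at " + dll
                : "hadoop.dll is not on java.library.path");
    }
}
```

Note that a file being found does not prove it matches the Hadoop version or the JVM's 32/64-bit architecture, so a mismatch could still produce the same UnsatisfiedLinkError.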
Problem 4: org.apache.hadoop.security.AccessControlException: Permission denied: user=zhengcy, access=WRITE, inode="/user/root/output":root:supergroup:drwxr-xr-x
Running the WordCount.java code produced this problem:
2014-12-18 16:03:24,092 WARN (org.apache.hadoop.mapred.LocalJobRunner:560) - job_local374172562_0001
org.apache.hadoop.security.AccessControlException: Permission denied: user=zhengcy, access=WRITE, inode="/user/root/output":root:supergroup:drwxr-xr-x
    at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkFsPermission(FSPermissionChecker.java:271)
    at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:257)
    at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:238)
    at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:179)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:6512)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:6494)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkAncestorAccess(FSNamesystem.java:6446)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInternal(FSNamesystem.java:4248)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInt(FSNamesystem.java:4218)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirs(FSNamesystem.java:4191)
    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.mkdirs(NameNodeRpcServer.java:813)
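Analysis:
We do not have permission to write to the output directory.
Solution:
The location where HDFS stores its files is configured in hdfs-site.xml. I covered this in my post on Hadoop pseudo-distributed deployment; as a reminder, see the figure:
In hdfs-site.xml under etc/hadoop, add: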
<property>
<name>dfs.permissions</name>
<value>false</value>
</property>
This turns permission checking off. Do not use this setting on a production server.
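To see why the write was refused in the first place, the error message can be read field by field: the user is zhengcy, the directory is owned by root with group supergroup, and the mode is drwxr-xr-x. The sketch below replays a simplified owner/group/other check on those values. It is my own illustration of the POSIX-style model HDFS follows, not Hadoop's code, and it ignores the superuser and ACLs.

```java
import java.util.Collections;
import java.util.Set;

// Simplified owner/group/other write check, modelled on the fields in the error message.
class PermissionSketch {

    // mode is the 10-character string from the message, e.g. "drwxr-xr-x".
    static boolean canWrite(String user, Set<String> userGroups,
                            String owner, String group, String mode) {
        if (user.equals(owner)) {
            return mode.charAt(2) == 'w'; // owner write bit
        }
        if (userGroups.contains(group)) {
            return mode.charAt(5) == 'w'; // group write bit
        }
        return mode.charAt(8) == 'w';     // "other" write bit
    }

    public static void main(String[] args) {
        // Assumption: the Windows user zhengcy is not a member of supergroup.
        Set<String> noGroups = Collections.emptySet();
        System.out.println("zhengcy may write: "
                + canWrite("zhengcy", noGroups, "root", "supergroup", "drwxr-xr-x"));
        System.out.println("root may write: "
                + canWrite("root", noGroups, "root", "supergroup", "drwxr-xr-x"));
    }
}
```

Under these assumptions only root passes the check, which matches the exception: the job runs as the local Windows account, not as the owner of /user/root/output.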
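Problem 5: File /usr/root/input/file01._COPYING_ could only be replicated to 0 nodes instead of minReplication (=1). There are 0 datanode(s) running and no node(s) are excluded in this operation
As shown in the figure:
Analysis:
The first time, we ran #hadoop namenode -format and then #sbin/start-all.sh.
Running #jps then showed the DataNode. After running #hadoop namenode -format again, #jps no longer showed the DataNode, as shown in the figure:
We then tried to put text files into the input directory by running bin/hdfs dfs -put /usr/local/hadoop/hadoop-2.6.0/test/* /user/root/input, which uploads the files under /test/ to /user/root/input on HDFS, and this problem appeared.
Solution:
The cause is that we ran hadoop namenode -format too many times, so the NameNode was initialized more than once. To fix it, delete the DataNode and NameNode storage directories that are configured in hdfs-site.xml.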
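A likely mechanism, which the post does not spell out and I am stating as an assumption, is that each format gives the NameNode a new clusterID while the DataNode keeps the old one in its own storage directory, so the DataNode refuses to join. Both sides record the value in a properties-style file named VERSION under their current directory. The sketch below compares the two; the class name and the command-line usage are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Properties;

// Compares the clusterID recorded in a NameNode and a DataNode VERSION file.
class ClusterIdCheck {

    // Parses properties-style text and returns the clusterID value, or null if absent.
    static String clusterIdOf(String versionText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(versionText));
        } catch (IOException e) {
            throw new IllegalStateException("Could not parse VERSION text", e);
        }
        return props.getProperty("clusterID");
    }

    // True only when both texts carry the same, non-null clusterID.
    static boolean sameCluster(String nameNodeVersion, String dataNodeVersion) {
        String id = clusterIdOf(nameNodeVersion);
        return id != null && id.equals(clusterIdOf(dataNodeVersion));
    }

    private static String read(String path) throws IOException {
        return new String(Files.readAllBytes(Paths.get(path)), StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.out.println("usage: ClusterIdCheck <namenode VERSION> <datanode VERSION>");
            return;
        }
        System.out.println(sameCluster(read(args[0]), read(args[1]))
                ? "clusterID matches"
                : "clusterID differs or is missing");
    }
}
```

If the IDs differ, that would be consistent with the DataNode disappearing from jps after the second format. Keep in mind that deleting the storage directories, as described above, also discards whatever data was in HDFS.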
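Problem 6: After copying hadoop.dll, WordCount runs for a moment and then exits without printing anything
Solution: Add a log4j configuration file and check the log output; the logs may reveal the problem.
Its contents are: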
log4j.rootLogger=debug,stdout,R
log4j.appender.stdout=org.apache.log4j.ConsoleAppender
log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
log4j.appender.stdout.layout.ConversionPattern=%5p - %m%n
log4j.appender.R=org.apache.log4j.RollingFileAppender
log4j.appender.R.File=mapreduce_test.log
log4j.appender.R.MaxFileSize=1MB
log4j.appender.R.MaxBackupIndex=1
log4j.appender.R.layout=org.apache.log4j.PatternLayout
log4j.appender.R.layout.ConversionPattern=%p %t %c - %m%n
log4j.logger.com.codefutures=DEBUG
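Problem 7: With log4j output in place, problems are much easier to diagnose. If the same MR job is run twice, it fails because the output directory already exists
Solution: Delete the existing output directory, or change the output path in the code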
Exception in thread "main" org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory hdfs://192.168.233.11:8020/mroutput already exists
    at org.apache.hadoop.mapreduce.lib.output.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:146)
    at org.apache.hadoop.mapreduce.JobSubmitter.checkSpecs(JobSubmitter.java:562)
    at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:432)
    at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1296)
    at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1293)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
    at org.apache.hadoop.mapreduce.Job.submit(Job.java:1293)
    at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1314)
    at test.WordCount.main(WordCount.java:87)
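For the second option, changing the output path in code, one simple approach is to derive a fresh directory name for every run. The helper below is only a sketch of that idea and is not part of the WordCount example: it appends a timestamp to a base path, and the name OutputPaths is made up. The resulting string would be used wherever the job currently takes its output path argument.

```java
// Builds an output path that differs on every run by appending a timestamp.
class OutputPaths {

    // Example: ("/mroutput", 42) gives "/mroutput-42".
    static String uniqueOutput(String base, long timestampMillis) {
        // Drop one trailing slash so the suffix attaches to the directory name.
        String trimmed = (base.length() > 1 && base.endsWith("/"))
                ? base.substring(0, base.length() - 1)
                : base;
        return trimmed + "-" + timestampMillis;
    }

    public static void main(String[] args) {
        // Prints a path that changes with each run.
        System.out.println(uniqueOutput("/mroutput", System.currentTimeMillis()));
    }
}
```

The trade-off is that old result directories accumulate on HDFS and need to be cleaned up by hand, whereas deleting the output directory before each run keeps only the latest result.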
Problem 8: Out-of-memory error, java.lang.OutOfMemoryError
WARN - job_local845949011_0001
java.lang.Exception: java.lang.OutOfMemoryError: Java heap space
    at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:522)
Caused by: java.lang.OutOfMemoryError: Java heap space
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.init(MapTask.java:983)
    at org.apache.hadoop.mapred.MapTask.createSortingCollector(MapTask.java:401)
    at org.apache.hadoop.mapred.MapTask.access$100(MapTask.java:81)
    at org.apache.hadoop.mapred.MapTask$NewOutputCollector.<init>(MapTask.java:695)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:767)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
    at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:619)
Solution: right-click WordCount --> run Confi....
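The trace shows the error inside MapTask$MapOutputBuffer.init, which is where the map task sets up its in-memory sort buffer. As far as I know the size of that buffer comes from mapreduce.task.io.sort.mb, which defaults to 100 MB in Hadoop 2.x; treat that figure as an assumption and check your own configuration. The sketch below only diagnoses: it compares the heap available to the JVM launched from Eclipse with that buffer size. The class name HeapCheck is made up.

```java
// Compares the JVM's maximum heap with the map-side sort buffer size.
class HeapCheck {

    // Assumed default of mapreduce.task.io.sort.mb in Hadoop 2.x.
    static final int DEFAULT_SORT_MB = 100;

    // The buffer is allocated in one piece, so the heap must be strictly larger than it.
    static boolean heapFitsSortBuffer(long maxHeapBytes, int sortMb) {
        return maxHeapBytes > (long) sortMb * 1024 * 1024;
    }

    public static void main(String[] args) {
        long maxHeap = Runtime.getRuntime().maxMemory();
        System.out.println("Max heap: " + (maxHeap / (1024 * 1024)) + " MB");
        System.out.println(heapFitsSortBuffer(maxHeap, DEFAULT_SORT_MB)
                ? "Heap is larger than the sort buffer"
                : "Heap is too small for the sort buffer");
    }
}
```

Run it with the same Run Configuration as WordCount so that it reports the heap of the JVM that actually executes the local job.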
Acknowledgements: (some of the content is taken from the sources below, with my own modifications and additions)