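Article overview:
With the Hadoop high-availability platform in place (see "Hadoop High-Availability Platform Setup"), the next step is to run jobs on the cluster. This article covers submitting jobs to a Hadoop cluster remotely from Eclipse.
Files on the Hadoop cluster can be viewed through the web UI or the hadoop command line. To make it convenient to create, delete, modify, and browse cluster files from within Eclipse, we need to build the Hadoop Eclipse plugin. The steps are as follows: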
① Environment preparation
JDK: set JAVA_HOME and add its bin directory to the PATH.
Ant: set ANT_HOME and add its bin directory to the PATH.
Verify both in cmd:
② Software preparation
hadoop2x-eclipse-plugin-master https://github.com/winghc/hadoop2x-eclipse-plugin
hadoop-common-2.2.0-bin-master https://github.com/srccodes/hadoop-common-2.2.0-bin
hadoop-2.6.0
eclipse-jee-luna-SR2-win32-x86_64
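③ Build
Note: the paths below are the locations on the author's machine. Adjust them to your own installation rather than copying them verbatim.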
E:\>cd E:\hadoop\hadoop2x-eclipse-plugin-master\src\contrib\eclipse-plugin
E:\hadoop\hadoop2x-eclipse-plugin-master\src\contrib\eclipse-plugin>ant jar -Dversion=2.6.0 -Declipse.home=E:\eclipse -Dhadoop.home=E:\hadoop\hadoop-2.6.0
Buildfile: E:\hadoop\hadoop2x-eclipse-plugin-master\src\contrib\eclipse-plugin\build.xml
check-contrib:
init:
     [echo] contrib: eclipse-plugin
init-contrib:
ivy-probe-antlib:
ivy-init-antlib:
ivy-init:
[ivy:configure] :: Ivy 2.1.0 - 20090925235825 :: http://ant.apache.org/ivy/ ::
[ivy:configure] :: loading settings :: file = E:\hadoop\hadoop2x-eclipse-plugin-master\ivy\ivysettings.xml
ivy-resolve-common:
ivy-retrieve-common:
[ivy:cachepath] DEPRECATED: 'ivy.conf.file' is deprecated, use 'ivy.settings.file' instead
[ivy:cachepath] :: loading settings :: file = E:\hadoop\hadoop2x-eclipse-plugin-master\ivy\ivysettings.xml
compile:
     [echo] contrib: eclipse-plugin
    [javac] E:\hadoop\hadoop2x-eclipse-plugin-master\src\contrib\eclipse-plugin\build.xml:76: warning: 'includeantruntime' was not set, defaulting to build.sysclasspath=last; set to false for repeatable builds
jar:
BUILD SUCCESSFUL
Total time: 10 seconds
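The build succeeds and produces the file shown below:
④ Copy that file into the plugins directory of the Eclipse installation and restart Eclipse. The following appears:
Open Map/Reduce Locations
Edit the Map/Reduce location settings:
Following the previous article, we configure the user hadoop and the location information of the Active HDFS and Active NM.
Once this is done, the HDFS files can be browsed from within Eclipse:
Next we write a simple HDFS example that operates on Hadoop remotely.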
package com.diexun.cn.mapred;

import java.io.IOException;
import java.net.URI;
import java.net.URISyntaxException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class MR2Test {

    static final String INPUT_PATH = "hdfs://192.168.137.101:9000/hello";
    static final String OUTPUT_PATH = "hdfs://192.168.137.101:9000/output";

    public static void main(String[] args) throws IOException, URISyntaxException {
        Configuration conf = new Configuration();
        final FileSystem fileSystem = FileSystem.get(new URI(INPUT_PATH), conf);

        // Remove any previous output directory (recursive delete).
        final Path outPath = new Path(OUTPUT_PATH);
        if (fileSystem.exists(outPath)) {
            fileSystem.delete(outPath, true);
        }

        // Create (or overwrite) /hello. Closing the stream flushes the data to HDFS.
        try (FSDataOutputStream out = fileSystem.create(new Path(INPUT_PATH))) {
            out.writeBytes("welcome to here ...");
        }
    }
}
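Viewing the HDFS files in Eclipse shows that the content of the hello file is now "welcome to here ...".
Now for the main topic of this article. Create a new Map/Reduce Project, which pulls in many jars (note: we normally create a Maven project instead, which makes the program easier to migrate and keeps it smaller; later articles will build with Maven), and copy the bundled WordCount example into Eclipse.
Configure the run arguments (note: enter paths on the HDFS cluster; local paths do not work).
Run it. The following result appears: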
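The note about run arguments can be sketched as a small pre-flight check. This is only an illustration: HdfsArgCheck and isHdfsPath are hypothetical names, not part of Hadoop or of the WordCount example, and the check is deliberately stricter than Hadoop itself, since it rejects any argument without an explicit hdfs scheme.

```java
import java.net.URI;
import java.net.URISyntaxException;

/** Hypothetical helper, not part of Hadoop or of the original WordCount example. */
class HdfsArgCheck {

    /** Returns true only when the argument parses as a URI whose scheme is hdfs. */
    static boolean isHdfsPath(String arg) {
        try {
            return "hdfs".equalsIgnoreCase(new URI(arg).getScheme());
        } catch (URISyntaxException e) {
            // Windows paths with backslashes are not valid URIs at all.
            return false;
        }
    }

    public static void main(String[] args) {
        // Fail fast before the job is configured if any argument is not an HDFS path.
        for (String arg : args) {
            if (!isHdfsPath(arg)) {
                throw new IllegalArgumentException("Expected an hdfs:// path but got: " + arg);
            }
        }
        System.out.println("All " + args.length + " arguments are HDFS paths.");
    }
}
```

With the addresses used earlier in this article, arguments such as hdfs://192.168.137.101:9000/hello and hdfs://192.168.137.101:9000/output would pass, while a path on the E: drive would be rejected.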
log4j:WARN No appenders could be found for logger (org.apache.hadoop.metrics2.lib.MutableMetricsFactory).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Exception in thread "main" java.lang.UnsatisfiedLinkError: org.apache.hadoop.io.nativeio.NativeIO$Windows.access0(Ljava/lang/String;I)Z
at org.apache.hadoop.io.nativeio.NativeIO$Windows.access0(Native Method)
at org.apache.hadoop.io.nativeio.NativeIO$Windows.access(NativeIO.java:557)
at org.apache.hadoop.fs.FileUtil.canRead(FileUtil.java:977)
at org.apache.hadoop.util.DiskChecker.checkAccessByFileMethods(DiskChecker.java:187)
at org.apache.hadoop.util.DiskChecker.checkDirAccess(DiskChecker.java:174)
at org.apache.hadoop.util.DiskChecker.checkDir(DiskChecker.java:108)
at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.confChanged(LocalDirAllocator.java:285)
at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:344)
at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:150)
at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:131)
at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:115)
at org.apache.hadoop.mapred.LocalDistributedCacheManager.setup(LocalDistributedCacheManager.java:131)
at org.apache.hadoop.mapred.LocalJobRunner$Job.<init>(LocalJobRunner.java:163)
at org.apache.hadoop.mapred.LocalJobRunner.submitJob(LocalJobRunner.java:731)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:536)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1296)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1293)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1293)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1314)
at WordCount.main(WordCount.java:76)
To make later log output easier to read, first add a log4j.properties file:
log4j.rootLogger=DEBUG,stdout,R
log4j.appender.stdout=org.apache.log4j.ConsoleAppender
log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
log4j.appender.stdout.layout.ConversionPattern=%5p - %m%n
log4j.appender.R=org.apache.log4j.RollingFileAppender
log4j.appender.R.File=mapreduce_test.log
log4j.appender.R.MaxFileSize=1MB
log4j.appender.R.MaxBackupIndex=1
log4j.appender.R.layout=org.apache.log4j.PatternLayout
log4j.appender.R.layout.ConversionPattern=%p %t %c - %m%n
log4j.logger.com.codefutures=INFO
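According to the error, the failure comes from the statement return access0(path, desiredAccess.accessRight()); in NativeIO.java. Comment out that statement and return true instead.
After modifying the source, create a package in the project with the same name as the Apache one (org.apache.hadoop.io.nativeio) and place the modified class there. This package overrides the one in the Apache jar, as shown below:
Run again: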
INFO - Job job_local401325246_0001 completed successfully
DEBUG - PrivilegedAction as:wangxiaolong (auth:SIMPLE) from:org.apache.hadoop.mapreduce.Job.getCounters(Job.java:764)
INFO - Counters: 38
    File System Counters
        FILE: Number of bytes read=16290
        FILE: Number of bytes written=545254
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=38132
        HDFS: Number of bytes written=6834
        HDFS: Number of read operations=15
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=4
    Map-Reduce Framework
        Map input records=174
        Map output records=1139
        Map output bytes=23459
        Map output materialized bytes=7976
        Input split bytes=99
        Combine input records=1139
        Combine output records=286
        Reduce input groups=286
        Reduce shuffle bytes=7976
        Reduce input records=286
        Reduce output records=286
        Spilled Records=572
        Shuffled Maps =1
        Failed Shuffles=0
        Merged Map outputs=1
        GC time elapsed (ms)=18
        CPU time spent (ms)=0
        Physical memory (bytes) snapshot=0
        Virtual memory (bytes) snapshot=0
        Total committed heap usage (bytes)=468713472
    Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
    File Input Format Counters
        Bytes Read=19066
    File Output Format Counters
        Bytes Written=6834
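The job did run successfully, but note the job_local prefix in "INFO - Job job_local401325246_0001 completed successfully".
The page at http://nns:8088/cluster/apps does not list the job either, which means it was never submitted to the cluster for execution.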
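The two job IDs that appear in this article's logs make the difference easy to check in code: the run that stayed on the local machine reports job_local401325246_0001, while the run that reaches the cluster later reports job_1438912697979_0023. The sketch below relies only on that observed prefix. JobIdCheck and ranLocally are hypothetical names chosen for illustration and are not part of the Hadoop API.

```java
/** Hypothetical helper that classifies a job ID by the prefix seen in this article's logs. */
class JobIdCheck {

    /** Returns true when the ID carries the job_local prefix used by the in-process local runner. */
    static boolean ranLocally(String jobId) {
        return jobId.startsWith("job_local");
    }

    public static void main(String[] args) {
        // Both IDs are copied from the log output shown in this article.
        String[] ids = {"job_local401325246_0001", "job_1438912697979_0023"};
        for (String id : ids) {
            System.out.println(id + (ranLocally(id) ? " ran locally" : " was submitted to the cluster"));
        }
    }
}
```

A check like this is a quick sanity test after a run, and looking the job up in the ResourceManager web UI, as done above, remains the authoritative confirmation.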
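Add the configuration files, as follows:
The configuration files are downloaded directly from the cluster. (Note: the value of "yarn.resourcemanager.ha.id" in yarn-site.xml differs from node to node.) So which copy should be downloaded?
Since the Active RM in the cluster is nns, download yarn-site.xml from nns. Run: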
Error: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class com.diexun.cn.mapred.WordCount$TokenizerMapper not found
    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2074)
    at org.apache.hadoop.mapreduce.task.JobContextImpl.getMapperClass(JobContextImpl.java:186)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:742)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: java.lang.ClassNotFoundException: Class com.diexun.cn.mapred.WordCount$TokenizerMapper not found
    at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1980)
    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2072)
    ... 8 more
The cluster cannot find the job's classes. Package the code into a jar and point the configuration at it with conf.set("mapred.jar", "**.jar"); then run again:
Exception message: /bin/bash: line 0: fg: no job control
Stack trace: ExitCodeException exitCode=1: /bin/bash: line 0: fg: no job control
    at org.apache.hadoop.util.Shell.runCommand(Shell.java:538)
    at org.apache.hadoop.util.Shell.run(Shell.java:455)
    at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:715)
    at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:211)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
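The error above is caused by the platform difference between the Windows client and the Linux cluster. In Hadoop 2.2 to 2.5 this requires modifying the source and recompiling (omitted here). In Hadoop 2.6 it is enough to add a configuration entry, conf.set("mapreduce.app-submission.cross-platform", "true");, or to set the same property directly in mapred-site.xml. Run again: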
INFO - Job job_1438912697979_0023 completed successfully
DEBUG - PrivilegedAction as:wangxiaolong (auth:SIMPLE) from:org.apache.hadoop.mapreduce.Job.getCounters(Job.java:764)
DEBUG - IPC Client (1894045259) connection to dn2/192.168.137.104:56327 from wangxiaolong sending #217
DEBUG - IPC Client (1894045259) connection to dn2/192.168.137.104:56327 from wangxiaolong got value #217
DEBUG - Call: getCounters took 139ms
INFO - Counters: 49
    File System Counters
        FILE: Number of bytes read=149
        FILE: Number of bytes written=325029
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=255
        HDFS: Number of bytes written=86
        HDFS: Number of read operations=9
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=2
    Job Counters
        Launched map tasks=2
        Launched reduce tasks=1
        Data-local map tasks=2
        Total time spent by all maps in occupied slots (ms)=45308
        Total time spent by all reduces in occupied slots (ms)=9324
        Total time spent by all map tasks (ms)=45308
        Total time spent by all reduce tasks (ms)=9324
        Total vcore-seconds taken by all map tasks=45308
        Total vcore-seconds taken by all reduce tasks=9324
        Total megabyte-seconds taken by all map tasks=46395392
        Total megabyte-seconds taken by all reduce tasks=9547776
    Map-Reduce Framework
        Map input records=3
        Map output records=12
        Map output bytes=119
        Map output materialized bytes=155
        Input split bytes=184
        Combine input records=12
        Combine output records=12
        Reduce input groups=11
        Reduce shuffle bytes=155
        Reduce input records=12
        Reduce output records=11
        Spilled Records=24
        Shuffled Maps =2
        Failed Shuffles=0
        Merged Map outputs=2
        GC time elapsed (ms)=827
        CPU time spent (ms)=4130
        Physical memory (bytes) snapshot=479911936
        Virtual memory (bytes) snapshot=6192558080
        Total committed heap usage (bytes)=261115904
    Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
    File Input Format Counters
        Bytes Read=71
    File Output Format Counters
        Bytes Written=86
At this point the job has been successfully submitted to the cluster and executed there.
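For the mapred-site.xml route mentioned above, the entry would look roughly like the fragment below. It is a sketch that assumes the property is added to the client-side mapred-site.xml on the Eclipse project's classpath, inside the file's existing configuration element. Only the property name and value come from this article.

```xml
<!-- Assumed placement: inside <configuration> of the client-side mapred-site.xml -->
<property>
  <name>mapreduce.app-submission.cross-platform</name>
  <value>true</value>
</property>
```

Setting it in the file keeps the driver code free of platform-specific settings, while the conf.set call keeps everything in one place for a quick test.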
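Sometimes we want to run a job as a specific user (note: when dfs.permissions.enabled is true, user permission problems often come up). This can be set in the VM arguments, so that the submitter of the job becomes the configured user.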
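The article does not spell out which VM argument is meant. One common choice, assumed here, is -DHADOOP_USER_NAME=hadoop: with simple authentication, Hadoop 2.x looks for HADOOP_USER_NAME as an environment variable and then as a JVM system property. The sketch below shows the programmatic counterpart. SubmitAsUser and useHadoopUser are hypothetical names, and the user hadoop is the one configured earlier in this article.

```java
/** Hypothetical helper; only the property name HADOOP_USER_NAME comes from Hadoop. */
class SubmitAsUser {

    static final String USER_KEY = "HADOOP_USER_NAME";

    /** Programmatic counterpart of passing -DHADOOP_USER_NAME=user as a VM argument. */
    static void useHadoopUser(String user) {
        System.setProperty(USER_KEY, user);
    }

    public static void main(String[] args) {
        // Call this before any Hadoop class (FileSystem, Job, ...) is touched,
        // because the login user is resolved once and then cached.
        useHadoopUser("hadoop");
        System.out.println(USER_KEY + "=" + System.getProperty(USER_KEY));
    }
}
```

The VM argument has the advantage that the job code stays unchanged, so the same class can be run under different users from different Eclipse run configurations.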
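This article covered building the Hadoop Eclipse plugin and submitting jobs to a Hadoop cluster remotely. When building the plugin, make sure the paths match where the software is actually installed. When submitting jobs remotely, watch for the differences between local and HDFS file paths, loading the right configuration files, running as a specific user, and cross-platform errors.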