Submitting Hadoop Cluster Jobs Remotely from Eclipse

Overview:

1. Introduction
2. Browsing remote Hadoop cluster files from Eclipse
3. Submitting jobs to the remote Hadoop cluster from Eclipse
4. Summary
 

1 Introduction

  With the Hadoop HA platform in place (see the previous post, 《Hadoop高可用平臺搭建》), the next step is to run jobs on the cluster. This post walks through submitting Hadoop cluster jobs remotely from Eclipse.

2 Browsing remote Hadoop cluster files from Eclipse

2.1 Building the hadoop eclipse plugin

  Cluster files can be browsed through the web UI or the hadoop command line. To create, delete, update, and browse cluster files conveniently from Eclipse, we need to build the hadoop eclipse plugin. The steps are as follows:

  ① Environment preparation

    JDK: set JAVA_HOME and add its bin directory to PATH

    ANT: set ANT_HOME and add its bin directory to PATH

    Verify in cmd (e.g. run java -version and ant -version to confirm both are on PATH):

    

  ② Software preparation

    hadoop2x-eclipse-plugin-master  https://github.com/winghc/hadoop2x-eclipse-plugin

    hadoop-common-2.2.0-bin-master  https://github.com/srccodes/hadoop-common-2.2.0-bin

    hadoop-2.6.0

    eclipse-jee-luna-SR2-win32-x86_64

  ③ Build

  Note: the paths below are the locations on my machine; do not copy them verbatim.

E:\>cd E:\hadoop\hadoop2x-eclipse-plugin-master\src\contrib\eclipse-plugin

E:\hadoop\hadoop2x-eclipse-plugin-master\src\contrib\eclipse-plugin>ant jar -Dversion=2.6.0 -Declipse.home=E:\eclipse -Dhadoop.home=E:\hadoop\hadoop-2.6.0
Buildfile: E:\hadoop\hadoop2x-eclipse-plugin-master\src\contrib\eclipse-plugin\build.xml

check-contrib:

init:
     [echo] contrib: eclipse-plugin

init-contrib:

ivy-probe-antlib:

ivy-init-antlib:

ivy-init:
[ivy:configure] :: Ivy 2.1.0 - 20090925235825 :: http://ant.apache.org/ivy/ ::
[ivy:configure] :: loading settings :: file = E:\hadoop\hadoop2x-eclipse-plugin-master\ivy\ivysettings.xml

ivy-resolve-common:

ivy-retrieve-common:
[ivy:cachepath] DEPRECATED: 'ivy.conf.file' is deprecated, use 'ivy.settings.file' instead
[ivy:cachepath] :: loading settings :: file = E:\hadoop\hadoop2x-eclipse-plugin-master\ivy\ivysettings.xml

compile:
     [echo] contrib: eclipse-plugin
    [javac] E:\hadoop\hadoop2x-eclipse-plugin-master\src\contrib\eclipse-plugin\build.xml:76: warning: 'includeantruntime' was not set, defaulting to build.sysclasspath=last; set to false for repeatable builds

jar:

BUILD SUCCESSFUL
Total time: 10 seconds

    The build succeeds and produces the plugin jar, as shown below:

      

  ④ Copy this jar into Eclipse's plugins directory and restart Eclipse; the following appears:

    

2.2 Configuring Hadoop options

  Open the Map/Reduce Locations view:

     

  Edit the Map/Reduce location settings:

     

  Following the previous post, we fill in the user hadoop, along with the Active HDFS and Active NM address information.

  Once this is done, HDFS files can be browsed from within Eclipse:

    

2.3 A simple HDFS example

  Let's write a simple HDFS example that operates on the remote Hadoop cluster.

package com.diexun.cn.mapred;

import java.io.IOException;
import java.net.URI;
import java.net.URISyntaxException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class MR2Test {

    static final String INPUT_PATH = "hdfs://192.168.137.101:9000/hello";
    static final String OUTPUT_PATH = "hdfs://192.168.137.101:9000/output";

    public static void main(String[] args) throws IOException, URISyntaxException {
        Configuration conf = new Configuration();
        final FileSystem fileSystem = FileSystem.get(new URI(INPUT_PATH), conf);

        // Remove the output directory if it already exists
        final Path outPath = new Path(OUTPUT_PATH);
        if (fileSystem.exists(outPath)) {
            fileSystem.delete(outPath, true);
        }

        // Overwrite /hello on the remote HDFS; close the stream so the write is flushed
        FSDataOutputStream fsDataOutputStream = fileSystem.create(new Path(INPUT_PATH));
        fsDataOutputStream.writeBytes("welcome to here ...");
        fsDataOutputStream.close();
    }

}

  Browsing HDFS from Eclipse afterwards, we find that the hello file now contains "welcome to here ...".

3 Submitting jobs to the remote Hadoop cluster from Eclipse

  Now for the real topic of this post. Create a new Map/Reduce Project; it pulls in a large number of jars (note: normally we would develop in a Maven project instead, which helps portability and keeps the project lean — later posts will build with Maven). Copy the bundled WordCount example into Eclipse and configure the run arguments (note: use paths on the HDFS cluster, e.g. hdfs://192.168.137.101:9000/hello and hdfs://192.168.137.101:9000/output; local paths will not work):

  

  Run it, and the following appears:

log4j:WARN No appenders could be found for logger (org.apache.hadoop.metrics2.lib.MutableMetricsFactory).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Exception in thread "main" java.lang.UnsatisfiedLinkError: org.apache.hadoop.io.nativeio.NativeIO$Windows.access0(Ljava/lang/String;I)Z
    at org.apache.hadoop.io.nativeio.NativeIO$Windows.access0(Native Method)
    at org.apache.hadoop.io.nativeio.NativeIO$Windows.access(NativeIO.java:557)
    at org.apache.hadoop.fs.FileUtil.canRead(FileUtil.java:977)
    at org.apache.hadoop.util.DiskChecker.checkAccessByFileMethods(DiskChecker.java:187)
    at org.apache.hadoop.util.DiskChecker.checkDirAccess(DiskChecker.java:174)
    at org.apache.hadoop.util.DiskChecker.checkDir(DiskChecker.java:108)
    at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.confChanged(LocalDirAllocator.java:285)
    at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:344)
    at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:150)
    at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:131)
    at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:115)
    at org.apache.hadoop.mapred.LocalDistributedCacheManager.setup(LocalDistributedCacheManager.java:131)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.<init>(LocalJobRunner.java:163)
    at org.apache.hadoop.mapred.LocalJobRunner.submitJob(LocalJobRunner.java:731)
    at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:536)
    at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1296)
    at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1293)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
    at org.apache.hadoop.mapreduce.Job.submit(Job.java:1293)
    at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1314)
    at WordCount.main(WordCount.java:76)

  To make the later output easier to read, first add a log4j.properties file:

log4j.rootLogger=DEBUG,stdout,R
 
log4j.appender.stdout=org.apache.log4j.ConsoleAppender 
log4j.appender.stdout.layout=org.apache.log4j.PatternLayout 
log4j.appender.stdout.layout.ConversionPattern=%5p - %m%n
 
log4j.appender.R=org.apache.log4j.RollingFileAppender 
log4j.appender.R.File=mapreduce_test.log 
log4j.appender.R.MaxFileSize=1MB 
log4j.appender.R.MaxBackupIndex=1 
log4j.appender.R.layout=org.apache.log4j.PatternLayout 
log4j.appender.R.layout.ConversionPattern=%p %t %c - %m%n 
log4j.logger.com.codefutures=INFO 

  The error points at return access0(path, desiredAccess.accessRight()); in NativeIO.java: comment out that line and simply return true instead.

  After modifying the source, create a package in the project with exactly the same fully-qualified name as the Apache one; classes in this package take precedence over the ones shipped in the Hadoop jar:
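  For reference, the patched method in the copied class would look roughly like this (a sketch against the Hadoop 2.6 source; only the body of access changes):

// Inside the copied org.apache.hadoop.io.nativeio.NativeIO.Windows class
public static boolean access(String path, AccessRight desiredAccess)
        throws IOException {
    // Original line, which fails on Windows without winutils/hadoop.dll:
    // return access0(path, desiredAccess.accessRight());
    return true; // skip the native permission check when running from Eclipse
}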

  

  Run again:

 INFO - Job job_local401325246_0001 completed successfully
DEBUG - PrivilegedAction as:wangxiaolong (auth:SIMPLE) from:org.apache.hadoop.mapreduce.Job.getCounters(Job.java:764)
 INFO - Counters: 38
    File System Counters
        FILE: Number of bytes read=16290
        FILE: Number of bytes written=545254
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=38132
        HDFS: Number of bytes written=6834
        HDFS: Number of read operations=15
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=4
    Map-Reduce Framework
        Map input records=174
        Map output records=1139
        Map output bytes=23459
        Map output materialized bytes=7976
        Input split bytes=99
        Combine input records=1139
        Combine output records=286
        Reduce input groups=286
        Reduce shuffle bytes=7976
        Reduce input records=286
        Reduce output records=286
        Spilled Records=572
        Shuffled Maps =1
        Failed Shuffles=0
        Merged Map outputs=1
        GC time elapsed (ms)=18
        CPU time spent (ms)=0
        Physical memory (bytes) snapshot=0
        Virtual memory (bytes) snapshot=0
        Total committed heap usage (bytes)=468713472
    Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
    File Input Format Counters 
        Bytes Read=19066
    File Output Format Counters 
        Bytes Written=6834

  The job did run successfully — note "INFO - Job job_local401325246_0001 completed successfully" — but the job id begins with job_local, and nothing appears at http://nns:8088/cluster/apps, which means the job ran in the local runner and was never actually submitted to the cluster.

  To fix this, add the cluster configuration files to the project, as shown below:

  

    The configuration files are downloaded straight from the cluster (note: the value of "yarn.resourcemanager.ha.id" in yarn-site.xml differs between the resource managers), so which copy should be downloaded? Since the active RM in the cluster is nns, download the yarn-site.xml from nns.
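  If the downloaded XML files are not placed directly on the project classpath, they can also be added to the job Configuration explicitly — a minimal sketch, assuming the files were copied into the project root:

// In the WordCount driver, before the Job is created.
// Assumes core-site.xml / hdfs-site.xml / mapred-site.xml / yarn-site.xml
// were downloaded from the cluster (yarn-site.xml taken from nns) into the project root.
Configuration conf = new Configuration();
conf.addResource(new Path("core-site.xml"));
conf.addResource(new Path("hdfs-site.xml"));
conf.addResource(new Path("mapred-site.xml"));
conf.addResource(new Path("yarn-site.xml"));

  Run again: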

Error: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class com.diexun.cn.mapred.WordCount$TokenizerMapper not found
    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2074)
    at org.apache.hadoop.mapreduce.task.JobContextImpl.getMapperClass(JobContextImpl.java:186)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:742)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: java.lang.ClassNotFoundException: Class com.diexun.cn.mapred.WordCount$TokenizerMapper not found
    at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1980)
    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2072)
    ... 8 more

  The cluster cannot find our class files because the job jar was never shipped to it. Package the code into a jar and point the configuration at it with conf.set("mapred.jar", "**.jar");.
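  A sketch of the change in the driver (the jar path is illustrative — point it at wherever the exported jar actually lives):

// Export the project as a jar first, then tell the framework which jar
// carries our Mapper/Reducer classes so it is shipped to the cluster.
conf.set("mapred.jar", "E:/hadoop/wordcount.jar");
// Equivalent alternative on the Job object:
// job.setJar("E:/hadoop/wordcount.jar");

  Run again: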

Exception message: /bin/bash: line 0: fg: no job control

Stack trace: ExitCodeException exitCode=1: /bin/bash: line 0: fg: no job control

    at org.apache.hadoop.util.Shell.runCommand(Shell.java:538)
    at org.apache.hadoop.util.Shell.run(Shell.java:455)
    at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:715)
    at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:211)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

  This error is a cross-platform issue (submitting from Windows to a Linux cluster). On hadoop 2.2–2.5 it requires patching and rebuilding the source (not covered here); on hadoop 2.6 it can be fixed purely by configuration, either conf.set("mapreduce.app-submission.cross-platform", "true"); in code or the corresponding property in mapred-site.xml. Run again:

 INFO - Job job_1438912697979_0023 completed successfully
DEBUG - PrivilegedAction as:wangxiaolong (auth:SIMPLE) from:org.apache.hadoop.mapreduce.Job.getCounters(Job.java:764)
DEBUG - IPC Client (1894045259) connection to dn2/192.168.137.104:56327 from wangxiaolong sending #217
DEBUG - IPC Client (1894045259) connection to dn2/192.168.137.104:56327 from wangxiaolong got value #217
DEBUG - Call: getCounters took 139ms
 INFO - Counters: 49
    File System Counters
        FILE: Number of bytes read=149
        FILE: Number of bytes written=325029
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=255
        HDFS: Number of bytes written=86
        HDFS: Number of read operations=9
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=2
    Job Counters 
        Launched map tasks=2
        Launched reduce tasks=1
        Data-local map tasks=2
        Total time spent by all maps in occupied slots (ms)=45308
        Total time spent by all reduces in occupied slots (ms)=9324
        Total time spent by all map tasks (ms)=45308
        Total time spent by all reduce tasks (ms)=9324
        Total vcore-seconds taken by all map tasks=45308
        Total vcore-seconds taken by all reduce tasks=9324
        Total megabyte-seconds taken by all map tasks=46395392
        Total megabyte-seconds taken by all reduce tasks=9547776
    Map-Reduce Framework
        Map input records=3
        Map output records=12
        Map output bytes=119
        Map output materialized bytes=155
        Input split bytes=184
        Combine input records=12
        Combine output records=12
        Reduce input groups=11
        Reduce shuffle bytes=155
        Reduce input records=12
        Reduce output records=11
        Spilled Records=24
        Shuffled Maps =2
        Failed Shuffles=0
        Merged Map outputs=2
        GC time elapsed (ms)=827
        CPU time spent (ms)=4130
        Physical memory (bytes) snapshot=479911936
        Virtual memory (bytes) snapshot=6192558080
        Total committed heap usage (bytes)=261115904
    Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
    File Input Format Counters 
        Bytes Read=71
    File Output Format Counters 
        Bytes Written=86

  At this point, the job is successfully submitted to the cluster and executed there.
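  To recap, a sketch of what the complete submission code might look like with all the tweaks from this section applied (class, jar, and path names are illustrative assumptions; the cluster configuration files are assumed to be on the classpath as described above):

package com.diexun.cn.mapred;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class RemoteWordCountDriver {

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // core-site.xml / hdfs-site.xml / mapred-site.xml / yarn-site.xml downloaded
        // from the cluster are assumed to be on the classpath (see above).
        // Ship our classes to the cluster (jar path is illustrative).
        conf.set("mapred.jar", "E:/hadoop/wordcount.jar");
        // Submitting from Windows to a Linux cluster.
        conf.set("mapreduce.app-submission.cross-platform", "true");

        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(RemoteWordCountDriver.class);
        job.setMapperClass(WordCount.TokenizerMapper.class);   // from the copied WordCount example
        job.setCombinerClass(WordCount.IntSumReducer.class);
        job.setReducerClass(WordCount.IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        Path in = new Path(args[0]);   // e.g. hdfs://192.168.137.101:9000/hello
        Path out = new Path(args[1]);  // e.g. hdfs://192.168.137.101:9000/output
        FileSystem fs = out.getFileSystem(conf);
        if (fs.exists(out)) {          // MapReduce refuses to write into an existing output dir
            fs.delete(out, true);
        }
        FileInputFormat.addInputPath(job, in);
        FileOutputFormat.setOutputPath(job, out);

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}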

  Sometimes we want to submit the job as a specific user (note: when dfs.permissions.enabled is true, HDFS permission checks come into play). This can be set through the VM arguments, so that the job is submitted as the configured user.
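  A common way to do this (an assumption here, since the original screenshot is omitted) is the HADOOP_USER_NAME property: pass -DHADOOP_USER_NAME=hadoop in the run configuration's VM arguments, or set it programmatically before any Hadoop object is created:

// Equivalent to the VM argument -DHADOOP_USER_NAME=hadoop; must be set before
// the Configuration/FileSystem/Job objects are created.
System.setProperty("HADOOP_USER_NAME", "hadoop");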

  

 

4 Summary

  This post covered building the hadoop eclipse plugin and submitting jobs to a remote hadoop cluster. When building the plugin, make sure the software paths match your own installation. When submitting jobs remotely, watch out for the difference between local and HDFS file paths, loading the right cluster configuration files, specifying the submitting user, and the cross-platform exception.

 

References:

http://www.cxyclub.cn/n/48423/

http://zy19982004.iteye.com/blog/2031172

http://www.iteye.com/blogs/subjects/Hadoop

http://qindongliang.iteye.com/blog/2078452

http://qindongliang.iteye.com/blog/2119620

http://www.aboutyun.com/forum.php?mod=viewthread&tid=8498
