6.2.2 Helper Classes GenericOptionsParser, Tool, and ToolRunner: An In-Depth Look

Date: 2019-11-18


(1) Why use ToolRunner

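If MapReduce job configuration parameters are hard-coded in Java, every change means editing the source, recompiling, repackaging, and redeploying. If the job depends on a configuration file, you have to write Java code by hand that uses DistributedCache to upload the file to HDFS so that the map and reduce functions can read it. And if your map or reduce function depends on a third-party jar, you may pass "-libjars" on the command line only to find that it has no effect, because nothing in your driver parses that option. Hadoop provides the GenericOptionsParser class for this purpose: it interprets the common Hadoop command-line options and sets the corresponding values on a Configuration object, so these tasks can be handled with simple command-line arguments. GenericOptionsParser is usually not used directly. The more convenient approach is to implement the Tool interface and run the application through ToolRunner, which calls GenericOptionsParser internally to parse the command line and populate the Configuration object.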

(2) Steps for using ToolRunner

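Define a class, ToolRunnerDemo, that extends Configured and implements the Tool interface. Implement Tool's run(String[] args) method, and in main call the static method ToolRunner.run(Tool tool, String[] args). Inside ToolRunner.run, the statement GenericOptionsParser parser = new GenericOptionsParser(conf, args); creates a GenericOptionsParser that parses the command-line arguments and then stores the parsed values in the Configuration object.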

1) Create the ToolRunnerDemo class

package org.jediael.hadoopdemo.toolrunnerdemo;

import java.util.Map.Entry;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class ToolRunnerDemo extends Configured implements Tool {

    static {
        // Uncomment to load the HDFS and MapReduce resources as defaults too.
        //Configuration.addDefaultResource("hdfs-default.xml");
        //Configuration.addDefaultResource("hdfs-site.xml");
        //Configuration.addDefaultResource("mapred-default.xml");
        //Configuration.addDefaultResource("mapred-site.xml");
    }

    @Override
    public int run(String[] args) throws Exception {
        Configuration conf = getConf();
        // Print every property held by the Configuration.
        for (Entry<String, String> entry : conf) {
            System.out.printf("%s=%s\n", entry.getKey(), entry.getValue());
        }
        return 0;
    }

    public static void main(String[] args) throws Exception {
        // ToolRunnerDemo implements Tool, so a reference to it can be passed in.
        // When ToolRunner calls tool.run(), it is the run method overridden
        // above in ToolRunnerDemo that executes.
        int exitCode = ToolRunner.run(new ToolRunnerDemo(), args);
        System.exit(exitCode);
    }
}

The Configurable interface has only two methods, one to set and one to get the Configuration object.

package org.apache.hadoop.conf;

public interface Configurable {
    void setConf(Configuration conf);
    Configuration getConf();
}

The Configured class implements the Configurable interface.

package org.apache.hadoop.conf;

public class Configured implements Configurable {

    private Configuration conf;

    public Configured() {
        this(null);
    }

    public Configured(Configuration conf) {
        setConf(conf);
    }

    @Override
    public void setConf(Configuration conf) {
        this.conf = conf;
    }

    @Override
    public Configuration getConf() {
        return conf;
    }
}

2) ToolRunner.run() creates a GenericOptionsParser object internally

public static int run(Configuration conf, Tool tool, String[] args) throws Exception {
    if (conf == null) {
        conf = new Configuration();
    }
    // Parsing the generic options writes their values into conf.
    GenericOptionsParser parser = new GenericOptionsParser(conf, args);
    tool.setConf(conf);
    // Only the arguments that are not generic options reach the tool.
    String[] toolArgs = parser.getRemainingArgs();
    return tool.run(toolArgs);
}
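The hand-off performed by ToolRunner.run can be modeled without Hadoop on the classpath. The following is a minimal sketch, not Hadoop's code: MiniTool, MiniToolRunner, and EchoTool are hypothetical stand-ins for Tool, ToolRunner, and a user-defined tool, a plain Map stands in for Configuration, and only the "-D key=value" form is recognized.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for Tool plus Configurable. The real Tool.run
// declares "throws Exception"; that is omitted to keep the sketch short.
interface MiniTool {
    void setConf(Map<String, String> conf);
    Map<String, String> getConf();
    int run(String[] args);
}

// Hypothetical stand-in for ToolRunner: parse, configure, then delegate.
final class MiniToolRunner {
    private MiniToolRunner() {}

    static int run(Map<String, String> conf, MiniTool tool, String[] args) {
        if (conf == null) {
            conf = new LinkedHashMap<>();
        }
        int i = 0;
        // Consume leading "-D key=value" pairs; stop at the first other argument.
        while (i + 1 < args.length && args[i].equals("-D")) {
            String[] kv = args[i + 1].split("=", 2);
            if (kv.length == 2) {
                conf.put(kv[0], kv[1]);
            }
            i += 2;
        }
        List<String> remaining = new ArrayList<>();
        for (; i < args.length; i++) {
            remaining.add(args[i]);
        }
        tool.setConf(conf);                                // configure first
        return tool.run(remaining.toArray(new String[0])); // then delegate
    }
}

// Hypothetical tool that records the application arguments it was given.
final class EchoTool implements MiniTool {
    private Map<String, String> conf;
    String[] seenArgs = new String[0];

    @Override public void setConf(Map<String, String> conf) { this.conf = conf; }
    @Override public Map<String, String> getConf() { return conf; }

    @Override public int run(String[] args) {
        seenArgs = args;
        System.out.println("color=" + getConf().get("color") + ", args=" + args.length);
        return 0;
    }

    public static void main(String[] args) {
        System.exit(MiniToolRunner.run(null, new EchoTool(), args));
    }
}
```

Calling MiniToolRunner.run(null, new EchoTool(), new String[] {"-D", "color=yello", "in", "out"}) stores color in the map and passes only in and out to the tool, which is the same division of labor the real classes provide.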

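3) GenericOptionsParser constructor 1 calls constructor 2, and constructor 2 calls the parsing method parseGeneralOptions. The GenericOptionsParser(conf, args) overload that ToolRunner.run uses delegates to constructor 2 in the same way.

// Constructor 1
public GenericOptionsParser(Options opts, String[] args) throws IOException {
    this(new Configuration(), opts, args);
}

// Constructor 2
public GenericOptionsParser(Configuration conf, Options options, String[] args) throws IOException {
    this.parseGeneralOptions(options, conf, args);
    this.conf = conf;
}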

4) parseGeneralOptions first calls parser.parse to parse the command line, then calls this.processGeneralOptions() to apply the parsed options to the Configuration.

private void parseGeneralOptions(Options opts, Configuration conf, String[] args) throws IOException {
    opts = buildGeneralOptions(opts);
    GnuParser parser = new GnuParser();
    try {
        this.commandLine = parser.parse(opts, this.preProcessForWindows(args), true);
        this.processGeneralOptions(conf, this.commandLine);
    } catch (ParseException e) {
        // A parse failure is logged and the usage text printed; it is not rethrown.
        LOG.warn("options parsing failed: " + e.getMessage());
        HelpFormatter formatter = new HelpFormatter();
        formatter.printHelp("general options are: ", opts);
    }
}

5) processGeneralOptions applies each generic option it finds: fs, jt, conf, libjars, files, and archives, as well as D and tokenCacheFile.

private void processGeneralOptions(Configuration conf, CommandLine line) throws IOException {
    // -fs: set the default file system.
    if (line.hasOption("fs")) {
        FileSystem.setDefaultUri(conf, line.getOptionValue("fs"));
    }

    // -jt: in Hadoop 1 this was the JobTracker's IP address and port (the service
    // that listens for TaskTracker heartbeats carrying resource usage and task
    // status). Here it sets the YARN ResourceManager address instead.
    if (line.hasOption("jt")) {
        String jtAddress = line.getOptionValue("jt");
        if (jtAddress.equalsIgnoreCase("local")) {
            conf.set("mapreduce.framework.name", jtAddress);
        }
        conf.set("yarn.resourcemanager.address", jtAddress, "from -jt command line option");
    }

    // -conf: add extra configuration files.
    if (line.hasOption("conf")) {
        for (String resource : line.getOptionValues("conf")) {
            conf.addResource(new Path(resource));
        }
    }

    // -libjars: copy the given jars from the local file system to the shared file
    // system and add them to the MapReduce task classpath. This is a very useful
    // way to ship a job's dependency jars.
    if (line.hasOption("libjars")) {
        conf.set("tmpjars", this.validateFiles(line.getOptionValue("libjars"), conf), "from -libjars command line option");
        URL[] libjars = getLibJars(conf);
        if (libjars != null && libjars.length > 0) {
            conf.setClassLoader(new URLClassLoader(libjars, conf.getClassLoader()));
            Thread.currentThread().setContextClassLoader(new URLClassLoader(libjars, Thread.currentThread().getContextClassLoader()));
        }
    }

    // -files: copy the given files from the local file system to the shared file
    // system so that MapReduce tasks can use them.
    if (line.hasOption("files")) {
        conf.set("tmpfiles", this.validateFiles(line.getOptionValue("files"), conf), "from -files command line option");
    }

    // -archives: copy the given archives from the local file system to the shared
    // file system so that MapReduce tasks can use them.
    if (line.hasOption("archives")) {
        conf.set("tmparchives", this.validateFiles(line.getOptionValue("archives"), conf), "from -archives command line option");
    }

    // -D: set individual property values.
    if (line.hasOption('D')) {
        for (String prop : line.getOptionValues('D')) {
            String[] keyval = prop.split("=", 2);
            if (keyval.length == 2) {
                conf.set(keyval[0], keyval[1], "from command line");
            }
        }
    }

    conf.setBoolean("mapreduce.client.genericoptionsparser.used", true);

    // -tokenCacheFile: load credentials from a local token file.
    if (line.hasOption("tokenCacheFile")) {
        String fileName = line.getOptionValue("tokenCacheFile");
        FileSystem localFs = FileSystem.getLocal(conf);
        Path p = localFs.makeQualified(new Path(fileName));
        if (!localFs.exists(p)) {
            throw new FileNotFoundException("File " + fileName + " does not exist.");
        }
        if (LOG.isDebugEnabled()) {
            LOG.debug("setting conf tokensFile: " + fileName);
        }
        UserGroupInformation.getCurrentUser().addCredentials(Credentials.readTokenStorageFile(p, conf));
        conf.set("mapreduce.job.credentials.json", p.toString(), "from -tokenCacheFile command line option");
    }
}
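The -D branch depends on String.split("=", 2), so only the first equals sign separates the key from the value. A small stand-alone sketch of that detail follows; DefineOption and its apply method are illustrative names rather than part of Hadoop, and a plain Map stands in for Configuration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class DefineOption {
    private DefineOption() {}

    // Same shape as the -D branch: split once on '=', ignore malformed input.
    static void apply(Map<String, String> conf, String prop) {
        String[] keyval = prop.split("=", 2);
        if (keyval.length == 2) {
            conf.put(keyval[0], keyval[1]);
        }
    }

    public static void main(String[] args) {
        Map<String, String> conf = new LinkedHashMap<>();
        apply(conf, "color=yello");
        apply(conf, "url=http://host/path?a=b"); // the value keeps its own '='
        apply(conf, "novalue");                  // skipped: no '=' present
        System.out.println(conf);
    }
}
```

With a limit of 2, a value that itself contains "=" survives intact, and an argument with no "=" at all is dropped without an error, which is worth remembering when a -D setting seems to be ignored.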

6) At this point every generic option on the command line has been parsed and applied, and the resulting values are stored in the Configuration object, where they can be read back.

Back in ToolRunner's run method from step 2):

public static int run(Configuration conf, Tool tool, String[] args) throws Exception {
    if (conf == null) {
        conf = new Configuration();
    }
    GenericOptionsParser parser = new GenericOptionsParser(conf, args);
    tool.setConf(conf);
    String[] toolArgs = parser.getRemainingArgs();
    return tool.run(toolArgs);
}

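parser.getRemainingArgs() returns what is left of the command line parsed in step 4), that is, the arguments that were not consumed as generic options:

public String[] getRemainingArgs() {
    return this.commandLine == null ? new String[0] : this.commandLine.getArgs();
}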

7) The call to the Tool interface's run method actually invokes the run method overridden in ToolRunnerDemo. ToolRunnerDemo implements Tool, and a reference to a ToolRunnerDemo object is what was passed to ToolRunner.run, so tool.run() dispatches to ToolRunnerDemo's implementation.

@Override
public int run(String[] args) throws Exception {
    Configuration conf = getConf();
    // Print every property value.
    for (Entry<String, String> entry : conf) {
        System.out.printf("%s=%s\n", entry.getKey(), entry.getValue());
    }
    return 0;
}

(3) Examples of setting Hadoop parameters with ToolRunnerDemo

1) Use ToolRunnerDemo to print all configuration properties

[root@jediael project]# hadoop jar toolrunnerdemo.jar org.jediael.hadoopdemo.toolrunnerdemo.ToolRunnerDemo
io.seqfile.compress.blocksize=1000000
keep.failed.task.files=false
mapred.disk.healthChecker.interval=60000
dfs.df.interval=60000
dfs.datanode.failed.volumes.tolerated=0
mapreduce.reduce.input.limit=-1
mapred.task.tracker.http.address=0.0.0.0:50060
mapred.used.genericoptionsparser=true
mapred.userlog.retain.hours=24
dfs.max.objects=0
mapred.jobtracker.jobSchedulable=org.apache.hadoop.mapred.JobSchedulable
mapred.local.dir.minspacestart=0
hadoop.native.lib=true

2) Pass a new property with -D: here -D sets the property color to yello, and grep confirms that it was set.

[root@jediael project]# hadoop org.jediael.hadoopdemo.toolrunnerdemo.ToolRunnerDemo -D color=yello | grep color

color=yello

3) Add a configuration file with -conf; the wc command counts the resulting properties.

hadoop jar toolrunnerdemo.jar org.jediael.hadoopdemo.toolrunnerdemo.ToolRunnerDemo -conf /opt/jediael/hadoop-1.2.0/conf/mapred-site.xml | wc

68 68 3028

The contents of mapred-site.xml are as follows:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
</configuration>
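Each property element in such a file contributes one name/value pair. To make that structure concrete, here is a small sketch that reads the pairs using only the JDK's DOM parser. SiteXmlReader is a hypothetical helper written for this illustration; it is not how Hadoop's Configuration loads resources, and it ignores features such as variable expansion and final properties.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

final class SiteXmlReader {
    private SiteXmlReader() {}

    // Collects every <property> element's <name>/<value> pair, in document order.
    static Map<String, String> read(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            Map<String, String> props = new LinkedHashMap<>();
            NodeList list = doc.getElementsByTagName("property");
            for (int i = 0; i < list.getLength(); i++) {
                Element property = (Element) list.item(i);
                String name = text(property, "name");
                String value = text(property, "value");
                if (name != null && value != null) {
                    props.put(name, value);
                }
            }
            return props;
        } catch (Exception e) {
            throw new IllegalArgumentException("not a valid configuration document", e);
        }
    }

    // Returns the text of the first child element with the given tag, or null.
    private static String text(Element parent, String tag) {
        NodeList nodes = parent.getElementsByTagName(tag);
        return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent();
    }

    public static void main(String[] args) {
        String xml = "<?xml version=\"1.0\"?>\n"
                + "<configuration>\n"
                + "  <property>\n"
                + "    <name>mapred.job.tracker</name>\n"
                + "    <value>localhost:9001</value>\n"
                + "  </property>\n"
                + "</configuration>\n";
        System.out.println(read(xml));
    }
}
```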

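(4) Summary of ToolRunner options

-D color=yello	Sets the property color to yello; grep on the output confirms the setting.
-conf conf/mapred-site.xml	Adds a configuration file.
-fs uri	Sets the default file system to the given URI; equivalent to -D fs.defaultFS=uri (fs.default.name in Hadoop 1).
-jt 10.21.34.11:3800	In Hadoop 1, sets the JobTracker's IP address and port, where it listens for heartbeats from the TaskTrackers carrying resource usage and task status. In Hadoop 2, specifies the YARN ResourceManager address; equivalent to -D yarn.resourcemanager.address=10.21.34.11:3800.
-files file1,file2	Copies the given files from the local file system to the shared file system used by the JobTracker so that MapReduce tasks can use them.
-libjars jars1,jars2	Copies the given jars from the local file system to the shared file system used by the JobTracker and adds them to the MapReduce task classpath. This is a very useful way to ship a job's dependency jars.
-archives archive1,archive2	Copies the given archives from the local file system to the shared file system used by the JobTracker so that MapReduce tasks can use them.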
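The -files, -libjars, and -archives options all take a comma-separated list. The sketch below shows one way such a list could be turned into URIs, under the assumption (suggested by the descriptions above, but not taken from Hadoop's source) that an entry without a URI scheme refers to the local file system. FileListOption is a hypothetical helper; unlike Hadoop, it does not check that the files exist.

```java
import java.net.URI;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

final class FileListOption {
    private FileListOption() {}

    // Splits "a.jar,hdfs://nn/b.jar" and qualifies scheme-less entries as file: URIs.
    static List<URI> resolve(String commaSeparated) {
        List<URI> resolved = new ArrayList<>();
        for (String item : commaSeparated.split(",")) {
            String entry = item.trim();
            if (entry.isEmpty()) {
                continue; // tolerate stray commas
            }
            URI uri = URI.create(entry);
            if (uri.getScheme() == null) {
                // Assumption: no scheme means a path on the local file system.
                uri = Paths.get(entry).toAbsolutePath().normalize().toUri();
            }
            resolved.add(uri);
        }
        return resolved;
    }

    public static void main(String[] args) {
        for (URI uri : resolve("lib/a.jar,hdfs://nn:8020/lib/b.jar")) {
            System.out.println(uri);
        }
    }
}
```

Note that the separator is an ASCII comma with no surrounding spaces on a real command line; the trimming here is only a convenience of the sketch.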