Resolving JAR version conflicts in Hadoop MapReduce programs

Date: 2019-11-12


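MapReduce (MR) programs often depend on third-party libraries. If those libraries are not present on the cluster, there are several ways to ship them to the cluster along with the job. But what do you do when a JAR that already exists on the cluster conflicts in version with a JAR your MR program needs?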

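Below is the problem I ran into and how I solved it. I am writing it down briefly so that anyone who hits the same issue can use it as a reference: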

Yesterday I used commons-net-3.2.jar to connect to an FTP server and collect logs.

The calling code:

FTPClient ftpClient = new FTPClient();

ftpClient.setConnectTimeout(1000);

// This method exists in commons-net-3.2.jar but not in commons-net-1.4.1.jar

Normally, when an MR job is run with hadoop jar, the JARs under $HADOOP_HOME/lib are loaded first.

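The Hadoop installation in use ships with commons-net-1.4.1.jar, so version 1.4.1 is loaded first and the version 3.2 JAR supplied by the user is ignored. The call therefore fails with this exception: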
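The "first entry on the classpath wins" behaviour described above can be reproduced without Hadoop. The following is only a minimal sketch under stated assumptions: the class name FirstMatchWins, the file name version.txt and the two temporary directories are hypothetical stand-ins for $HADOOP_HOME/lib and the user's JAR, and a plain URLClassLoader stands in for the task JVM's classpath. It looks up a resource rather than a class, but URLClassLoader searches its URLs in the same order for both.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

final class FirstMatchWins {

    /** Returns the content of version.txt as seen through a loader with the given search order. */
    static String resolve(boolean userFirst) {
        try {
            // Hypothetical stand-ins for $HADOOP_HOME/lib and the user-supplied JAR.
            Path systemLib = Files.createTempDirectory("system-lib");
            Path userLib = Files.createTempDirectory("user-lib");
            Path systemFile = Files.write(systemLib.resolve("version.txt"),
                    "1.4.1".getBytes(StandardCharsets.UTF_8));
            Path userFile = Files.write(userLib.resolve("version.txt"),
                    "3.2".getBytes(StandardCharsets.UTF_8));

            URL sys = systemLib.toUri().toURL();
            URL usr = userLib.toUri().toURL();
            URL[] searchOrder = userFirst ? new URL[] {usr, sys} : new URL[] {sys, usr};

            // Parent is null so only the two directories above are searched.
            try (URLClassLoader loader = new URLClassLoader(searchOrder, null);
                 BufferedReader reader = new BufferedReader(new InputStreamReader(
                         loader.getResourceAsStream("version.txt"), StandardCharsets.UTF_8))) {
                return reader.readLine();
            } finally {
                // Clean up the temporary files and directories.
                Files.delete(systemFile);
                Files.delete(userFile);
                Files.delete(systemLib);
                Files.delete(userLib);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("system first: " + resolve(false)); // prints "system first: 1.4.1"
        System.out.println("user first:   " + resolve(true));  // prints "user first:   3.2"
    }
}
```

Both copies are always on the search path; only their order decides which one is seen, which is why the older commons-net shadows the newer one.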

Error: org.apache.commons.net.ftp.FTPClient.setConnectTimeout(I)V

// The exception points at the call to setConnectTimeout. (I)V is the JVM descriptor for a method that takes an int and returns void.
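When an error like this appears, it can help to check which JAR a class was actually loaded from and whether the expected method is present. The following is a rough diagnostic sketch, not part of the original job: the class name ClassOrigin and its helper methods are hypothetical, and it uses only standard JDK reflection. Run inside the task (or with the same classpath), it would report the location of the commons-net JAR that won.

```java
import java.security.CodeSource;

final class ClassOrigin {

    /** Returns the JAR or directory a class was loaded from, if the JVM records it. */
    static String locationOf(Class<?> cls) {
        CodeSource source = cls.getProtectionDomain().getCodeSource();
        // JDK core classes have no code source.
        return source == null ? "(bootstrap or unknown)" : String.valueOf(source.getLocation());
    }

    /** Returns true if the class exposes a public method with this name and parameter types. */
    static boolean hasMethod(Class<?> cls, String name, Class<?>... parameterTypes) {
        try {
            cls.getMethod(name, parameterTypes);
            return true;
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Defaults to the class from the error message; pass another name as the first argument.
        String className = args.length > 0 ? args[0] : "org.apache.commons.net.ftp.FTPClient";
        try {
            Class<?> cls = Class.forName(className);
            System.out.println("loaded from: " + locationOf(cls));
            System.out.println("has setConnectTimeout(int): "
                    + hasMethod(cls, "setConnectTimeout", int.class));
        } catch (ClassNotFoundException e) {
            System.out.println(className + " is not on the classpath");
        }
    }
}
```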

// Hadoop source that builds the classpath for the child task JVM:

static List<String> getClassPaths(JobConf conf, File workDir,
    TaskDistributedCacheManager taskDistributedCacheManager)
    throws IOException {
  // Accumulates class paths for child.
  List<String> classPaths = new ArrayList<String>();

  // This setting changes the order in which the system classpath is loaded; the default should be false.
  boolean userClassesTakesPrecedence = conf.userClassesTakesPrecedence();

  if (!userClassesTakesPrecedence) { // false by default: the TaskTracker machine's system classpath always comes first
    // start with same classpath as parent process
    appendSystemClasspaths(classPaths);
  }

  // include the user specified classpath
  appendJobJarClasspaths(conf.getJar(), classPaths);

  // Distributed cache paths
  if (taskDistributedCacheManager != null)
    classPaths.addAll(taskDistributedCacheManager.getClassPaths());

  // Include the working dir too
  classPaths.add(workDir.toString());

  if (userClassesTakesPrecedence) {
    // parent process's classpath is added last
    appendSystemClasspaths(classPaths);
  }

  return classPaths;
}

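The source above shows that the option -Dmapreduce.task.classpath.user.precedence changes where the system classpath is placed in the load order. Setting it to true puts the user's JARs ahead of the system ones. The exact property name depends on the Hadoop version and distribution.

Verification: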
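The effect of the flag on ordering can be seen in isolation with a small stand-alone sketch. This is a simplified model, not Hadoop code: the class ClasspathOrderSketch, the method buildClasspath and the JAR paths are hypothetical, and the distributed cache and working directory entries are folded into the single "user" list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class ClasspathOrderSketch {

    /** Places the system entries before or after the user entries, depending on the flag. */
    static List<String> buildClasspath(boolean userFirst, List<String> system, List<String> user) {
        List<String> paths = new ArrayList<>();
        if (!userFirst) {
            paths.addAll(system); // default: system JARs are searched first
        }
        paths.addAll(user);
        if (userFirst) {
            paths.addAll(system); // with the flag set: system JARs are searched last
        }
        return paths;
    }

    public static void main(String[] args) {
        // Hypothetical paths, for illustration only.
        List<String> system = Arrays.asList("lib/commons-net-1.4.1.jar");
        List<String> user = Arrays.asList("job/commons-net-3.2.jar");
        System.out.println(buildClasspath(false, system, user));
        // prints [lib/commons-net-1.4.1.jar, job/commons-net-3.2.jar]
        System.out.println(buildClasspath(true, system, user));
        // prints [job/commons-net-3.2.jar, lib/commons-net-1.4.1.jar]
    }
}
```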

hadoop jar collect_log.jar com.collect.LogCollectJob -Dmapreduce.task.classpath.user.precedence=true -libjars commons-net-3.2.jar /new_log_collect/input /new_log_collect/output

The job ran successfully.
