Hive on Tez pitfalls (1): Hive 0.13 on Tez

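  We are preparing to upgrade our cluster to CDH 5.2.0 and start using Tez. CDH 5.2.0 has already been running stably on our test cluster for a long time, so I started experimenting with Hive on Tez. I ran into quite a few problems along the way and am recording them here.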

Deploying Hive on Tez is fairly straightforward, and the wiki covers it. The main points to watch:

1. Build with:

mvn clean package -Dtar -DskipTests=true -Dmaven.javadoc.skip=true

2. Upload the Tez jars to HDFS and set tez.lib.uris in tez-site.xml:

  <property>
    <name>tez.lib.uris</name>
    <value>${fs.defaultFS}/tez,${fs.defaultFS}/tez/lib</value>
  </property>
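Hadoop's Configuration expands `${...}` references in property values, so `${fs.defaultFS}` above is replaced by the cluster's default filesystem URI and the value becomes a comma-separated list of HDFS locations. The Java sketch below only models that expansion and split using the JDK; the class and method names are hypothetical, the namenode address is a made-up example, and the real Configuration class also handles cases (system properties, recursion limits) that this skips.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: replaces ${name} references with other property values,
// then splits the comma-separated tez.lib.uris value into individual locations.
class TezLibUrisSketch {
    private static final Pattern VAR = Pattern.compile("\\$\\{([^}]+)\\}");

    // Replace every ${name} with props.get(name); unknown names are left as-is.
    static String expand(String value, Map<String, String> props) {
        Matcher m = VAR.matcher(value);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String replacement = props.getOrDefault(m.group(1), m.group(0));
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    // Split on commas, trimming whitespace and skipping empty entries.
    static List<String> splitUris(String expanded) {
        List<String> uris = new ArrayList<>();
        for (String part : expanded.split(",")) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) {
                uris.add(trimmed);
            }
        }
        return uris;
    }

    public static void main(String[] args) {
        Map<String, String> props = new HashMap<>();
        props.put("fs.defaultFS", "hdfs://namenode:8020"); // example address (assumption)
        String value = "${fs.defaultFS}/tez,${fs.defaultFS}/tez/lib";
        System.out.println(splitUris(expand(value, props)));
    }
}
```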

Then set mapreduce.framework.name in mapred-site.xml:

  <property>
      <name>mapreduce.framework.name</name>
      <value>yarn-tez</value>
  </property>


3. Remember to update the classpath in hadoop-env.sh:

export TEZ_HOME=/home/vipshop/platform/tez
# Append every jar under $TEZ_HOME and $TEZ_HOME/lib to HADOOP_CLASSPATH
# (globs instead of parsing `ls` output, so odd file names are handled safely)
for jar in "$TEZ_HOME"/*.jar "$TEZ_HOME"/lib/*.jar; do
    export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:$jar
done
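For readers who prefer to see what the loop produces, here is the same idea as a Java sketch: collect the jars directly inside $TEZ_HOME and $TEZ_HOME/lib and append them to an existing classpath with the platform path separator. The class and method names are hypothetical and the sketch only picks up files ending in `.jar`; in practice it is the shell loop in hadoop-env.sh that Hadoop's launcher scripts actually read.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch of the classpath the hadoop-env.sh loop builds.
class TezClasspathSketch {
    // Return the .jar files directly inside dir (not recursive), sorted for a stable order.
    static List<String> jarsIn(Path dir) throws IOException {
        List<String> jars = new ArrayList<>();
        if (!Files.isDirectory(dir)) {
            return jars;
        }
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, "*.jar")) {
            for (Path p : stream) {
                jars.add(p.toString());
            }
        }
        Collections.sort(jars);
        return jars;
    }

    // Append the top-level Tez jars, then the lib/ jars, to an existing classpath.
    static String extendClasspath(String existing, Path tezHome) throws IOException {
        StringBuilder cp = new StringBuilder(existing == null ? "" : existing);
        List<String> all = new ArrayList<>(jarsIn(tezHome));
        all.addAll(jarsIn(tezHome.resolve("lib")));
        for (String jar : all) {
            cp.append(File.pathSeparator).append(jar);
        }
        return cp.toString();
    }

    public static void main(String[] args) throws IOException {
        Path tezHome = Paths.get(args.length > 0 ? args[0] : "/home/vipshop/platform/tez");
        System.out.println(extendClasspath(System.getenv("HADOOP_CLASSPATH"), tezHome));
    }
}
```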

Otherwise you get the following error (the Tez classes cannot be loaded, so Cluster initialization fails):

java.io.IOException: Cannot initialize Cluster. Please check your configuration for mapreduce.framework.name and the correspond server addresses.
        at org.apache.hadoop.mapreduce.Cluster.initialize(Cluster.java:120)
        at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:82)
        at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:75)
        at org.apache.hadoop.mapreduce.Job$9.run(Job.java:1265)
        at org.apache.hadoop.mapreduce.Job$9.run(Job.java:1261)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
        at org.apache.hadoop.mapreduce.Job.connect(Job.java:1260)
        at org.apache.hadoop.mapreduce.Job.submit(Job.java:1289)
        at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1313)
        at org.apache.hadoop.mapreduce.SleepJob.run(SleepJob.java:261)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
        at org.apache.hadoop.mapreduce.SleepJob.main(SleepJob.java:194)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:72)
        at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:145)
        at org.apache.hadoop.test.MapredTestDriver.run(MapredTestDriver.java:118)
        at org.apache.hadoop.test.MapredTestDriver.main(MapredTestDriver.java:126)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
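The message comes from Cluster.initialize, which loads the available ClientProtocolProvider implementations through java.util.ServiceLoader and asks each one for a client matching mapreduce.framework.name; if none accepts the configured value, it throws this IOException. Since the classes that handle yarn-tez cannot be loaded here, no provider matches. The sketch below models only that selection loop in plain Java; the Provider interface and the client names are hypothetical stand-ins, not Hadoop's real classes.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Simplified, hypothetical model of choosing a client for mapreduce.framework.name:
// each provider either accepts the name or declines (returns null);
// if every provider declines, initialization fails.
class FrameworkSelectionSketch {
    interface Provider {
        // Return a client description if this provider handles the framework, else null.
        String create(String frameworkName);
    }

    static String initialize(String frameworkName, List<Provider> providers) throws IOException {
        for (Provider provider : providers) {
            String client = provider.create(frameworkName);
            if (client != null) {
                return client;
            }
        }
        throw new IOException("Cannot initialize Cluster. Please check your configuration for "
                + "mapreduce.framework.name and the correspond server addresses.");
    }

    // A provider that accepts exactly one framework name.
    static Provider acceptsOnly(String name, String client) {
        return fw -> name.equals(fw) ? client : null;
    }

    public static void main(String[] args) throws IOException {
        List<Provider> providers = new ArrayList<>();
        providers.add(acceptsOnly("local", "LocalClient"));
        providers.add(acceptsOnly("yarn", "YarnClient"));
        try {
            initialize("yarn-tez", providers); // no Tez provider loaded
        } catch (IOException e) {
            System.out.println("Without Tez classes: " + e.getMessage());
        }
        providers.add(acceptsOnly("yarn-tez", "TezClient"));
        System.out.println("With Tez classes: " + initialize("yarn-tez", providers));
    }
}
```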

Once deployed, Tez jobs submitted with hadoop jar ran fine. Next, test Hive on Tez:

hive -hiveconf hive.execution.engine=tez -hiveconf hive.root.logger=DEBUG,console

This fails with the following error:

Exception in thread "main" java.lang.NoSuchMethodError: 
org.apache.tez.mapreduce.hadoop.MRHelpers.updateEnvironmentForMRAM(Lorg/apache/hadoop/conf/Configuration;Ljava/util/Map;)V
        at org.apache.hadoop.hive.ql.exec.tez.TezSessionState.open(TezSessionState.java:182)
        at org.apache.hadoop.hive.ql.exec.tez.TezSessionState.open(TezSessionState.java:123)
        at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:355)
        at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:681)
        at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:625)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
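NoSuchMethodError names the missing method by its JVM descriptor: `(Lorg/apache/hadoop/conf/Configuration;Ljava/util/Map;)V` means the method takes a Configuration and a Map and returns void (`V`). So hive-exec was compiled against a MRHelpers that had `updateEnvironmentForMRAM(Configuration, Map)`, but the MRHelpers loaded at runtime has no such method. The small decoder below, written for this post and not part of Hive or Tez, turns such descriptors into readable Java types; it handles primitives, object types, and arrays.

```java
import java.util.ArrayList;
import java.util.List;

// Decodes JVM method descriptors such as
// "(Lorg/apache/hadoop/conf/Configuration;Ljava/util/Map;)V"
// into "returnType (param1, param2, ...)".
class DescriptorDecoder {
    // Parse one type starting at pos[0]; advances pos[0] past it.
    private static String parseType(String d, int[] pos) {
        char c = d.charAt(pos[0]++);
        switch (c) {
            case 'B': return "byte";
            case 'C': return "char";
            case 'D': return "double";
            case 'F': return "float";
            case 'I': return "int";
            case 'J': return "long";
            case 'S': return "short";
            case 'Z': return "boolean";
            case 'V': return "void";
            case 'L': { // object type: L<binary/name>;
                int end = d.indexOf(';', pos[0]);
                String name = d.substring(pos[0], end).replace('/', '.');
                pos[0] = end + 1;
                return name;
            }
            case '[': return parseType(d, pos) + "[]"; // array of the following type
            default: throw new IllegalArgumentException("Bad descriptor char '" + c + "' in " + d);
        }
    }

    static String decode(String descriptor) {
        if (descriptor.isEmpty() || descriptor.charAt(0) != '(') {
            throw new IllegalArgumentException("Not a method descriptor: " + descriptor);
        }
        int[] pos = {1};
        List<String> params = new ArrayList<>();
        while (descriptor.charAt(pos[0]) != ')') {
            params.add(parseType(descriptor, pos));
        }
        pos[0]++; // skip ')'
        String returnType = parseType(descriptor, pos);
        return returnType + " (" + String.join(", ", params) + ")";
    }

    public static void main(String[] args) {
        System.out.println(decode("(Lorg/apache/hadoop/conf/Configuration;Ljava/util/Map;)V"));
    }
}
```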

The stack trace shows the error is raised during session initialization; the call path is:

org.apache.hadoop.hive.cli.CliDriver.main->org.apache.hadoop.hive.cli.CliDriver.run->
org.apache.hadoop.hive.ql.session.SessionState.start

In SessionState.start:

    if (HiveConf.getVar(startSs.getConf(), HiveConf.ConfVars.HIVE_EXECUTION_ENGINE)
        .equals("tez") && (startSs.isHiveServerQuery == false)) { // hive.execution.engine is set to tez (default: mr)
      try {
        if (startSs.tezSessionState == null) {
          startSs.tezSessionState = new TezSessionState(startSs.getSessionId()); // create a TezSessionState
        }
        startSs.tezSessionState.open(startSs.conf); // call TezSessionState.open
      } catch (Exception e) {
        throw new RuntimeException(e);
      }
    } else {
      LOG.info("No Tez session required at this point. hive.execution.engine=mr.");
    }

TezSessionState.open first calls createTezDir to create the session's scratch directory:

    // create the tez tmp dir
    tezScratchDir = createTezDir(sessionId);
    String dir = tezScratchDir.toString();
    // Localize resources to session scratch dir
    localizedResources = utils.localizeTempFilesFromConf(dir, conf); // DagUtils.localizeTempFilesFromConf
    List<LocalResource> handlerLr = utils.localizeTempFiles(dir, conf, additionalFiles); // DagUtils.localizeTempFiles
    if (handlerLr != null) {
      if (localizedResources == null) {
        localizedResources = handlerLr;
      } else {
        localizedResources.addAll(handlerLr);
      }
      additionalFilesNotFromConf = new HashSet<String>();
      for (String originalFile : additionalFiles) {
        additionalFilesNotFromConf.add(originalFile);
      }
    }
    // generate basic tez config
    TezConfiguration tezConfig = new TezConfiguration(conf); // then create a TezConfiguration
    // Set the Tez staging dir (property tez.staging-dir, default /tmp/tez/staging).
    // Here it defaults to "/tmp/hive-" + System.getProperty("user.name") + "/_tez_session_dir/" + sessionId
    tezConfig.set(TezConfiguration.TEZ_AM_STAGING_DIR, tezScratchDir.toUri().toString());
    appJarLr = createJarLocalResource(utils.getExecJarPathLocal()); // localize hive-exec.jar
    // configuration for the application master
    Map<String, LocalResource> commonLocalResources = new HashMap<String, LocalResource>();
    commonLocalResources.put( utils.getBaseName( appJarLr), appJarLr );
    if (localizedResources != null) {
      for (LocalResource lr : localizedResources) {
        commonLocalResources.put( utils.getBaseName(lr), lr);
      }
    }
    // Create environment for AM.
    Map<String, String> amEnv = new HashMap<String, String>();
    MRHelpers.updateEnvironmentForMRAM(conf, amEnv); // call MRHelpers.updateEnvironmentForMRAM
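By default, the session directory used as the Tez staging dir above ends up as /tmp/hive-<user.name>/_tez_session_dir/<sessionId>. The tiny sketch below shows how that path is composed; the class and methods are hypothetical, and the root is hard-coded to the default, whereas Hive derives it from its scratch-directory configuration.

```java
// Hypothetical sketch of the default per-session Tez scratch directory layout:
// <scratch root>/_tez_session_dir/<sessionId>, with root "/tmp/hive-" + user.name.
class TezScratchDirSketch {
    static final String TEZ_DIR = "_tez_session_dir";

    static String defaultScratchRoot(String userName) {
        return "/tmp/hive-" + userName;
    }

    // Join with '/', since HDFS paths use '/' regardless of the local OS.
    static String sessionDir(String scratchRoot, String sessionId) {
        String root = scratchRoot.endsWith("/")
                ? scratchRoot.substring(0, scratchRoot.length() - 1)
                : scratchRoot;
        return root + "/" + TEZ_DIR + "/" + sessionId;
    }

    public static void main(String[] args) {
        String root = defaultScratchRoot(System.getProperty("user.name"));
        System.out.println(sessionDir(root, "example-session-id"));
    }
}
```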

In Tez 0.5.0, org.apache.tez.mapreduce.hadoop.MRHelpers no longer has an updateEnvironmentForMRAM method. Instead it provides updateEnvBasedOnMRTaskEnv (sets environment variables for mappers and reducers) and updateEnvBasedOnMRAMEnv (sets environment variables for the AM):

public static void updateEnvBasedOnMRAMEnv(Configuration conf, Map<String, String> environment) {
  TezYARNUtils.appendToEnvFromInputString(environment, conf.get(MRJobConfig.MR_AM_ADMIN_USER_ENV),
      File.pathSeparator);
  TezYARNUtils.appendToEnvFromInputString(environment, conf.get(MRJobConfig.MR_AM_ENV),
      File.pathSeparator);
}

In 0.4.1-incubating, by contrast, updateEnvironmentForMRAM does exist:

public static void updateEnvironmentForMRAM(Configuration conf, Map<String, String> environment) {
  TezYARNUtils.setEnvFromInputString(environment, conf.get(MRJobConfig.MR_AM_ADMIN_USER_ENV),
    File.pathSeparator);
  TezYARNUtils.setEnvFromInputString(environment, conf.get(MRJobConfig.MR_AM_ENV),
    File.pathSeparator);
}
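Both versions read the same two MapReduce properties (MRJobConfig.MR_AM_ADMIN_USER_ENV and MRJobConfig.MR_AM_ENV), whose values are comma-separated KEY=VALUE pairs; what changed is the method name and the TezYARNUtils helper it delegates to (setEnvFromInputString in 0.4.1, appendToEnvFromInputString in 0.5.0). The sketch below is a simplified, hypothetical model of merging such a value into the amEnv map. It assumes a variable that is already present gets extended with the separator rather than replaced, and it ignores details (such as variable expansion) that the real Tez code may handle.

```java
import java.io.File;
import java.util.LinkedHashMap;
import java.util.Map;

// Simplified, hypothetical model of merging "A=foo,B=bar" into an environment map.
// Assumption: an existing variable is extended with the separator, not replaced.
class EnvStringSketch {
    static void appendToEnv(Map<String, String> env, String input, String separator) {
        if (input == null || input.isEmpty()) {
            return; // property not set: nothing to add
        }
        for (String pair : input.split(",")) {
            int eq = pair.indexOf('=');
            if (eq <= 0) {
                continue; // skip malformed entries without a name
            }
            String name = pair.substring(0, eq).trim();
            String value = pair.substring(eq + 1).trim();
            String existing = env.get(name);
            env.put(name, existing == null ? value : existing + separator + value);
        }
    }

    public static void main(String[] args) {
        Map<String, String> amEnv = new LinkedHashMap<>();
        // Example values only; real ones come from the MapReduce configuration.
        appendToEnv(amEnv, "LD_LIBRARY_PATH=/opt/hadoop/lib/native", File.pathSeparator);
        appendToEnv(amEnv, "LD_LIBRARY_PATH=/opt/extra,JAVA_OPTS=-Xmx1g", File.pathSeparator);
        System.out.println(amEnv);
    }
}
```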

The corresponding code in Hive:

Hive 0.13:

    // Create environment for AM.
    Map<String, String> amEnv = new HashMap<String, String>();
    MRHelpers.updateEnvironmentForMRAM(conf, amEnv);

Hive 0.14:

    // Create environment for AM.
    Map<String, String> amEnv = new HashMap<String, String>();
    MRHelpers.updateEnvBasedOnMRAMEnv(conf, amEnv);

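The Tez API changed substantially between 0.4.x and 0.5.x. Hive 0.13.x was compiled against the 0.4.x API and calls MRHelpers.updateEnvironmentForMRAM, which Tez 0.5.x no longer provides, hence the NoSuchMethodError at runtime. In short, Tez 0.5.x is not compatible with Hive 0.13.x; to use Tez 0.5.x you need Hive 0.14.x.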
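Because this incompatibility only surfaces at runtime, a quick reflection probe can tell which generation of MRHelpers is on the classpath before starting Hive. The sketch below is a hypothetical diagnostic helper, not something Hive or Tez ships: it loads a class by name without initializing it and reports which candidate method names it exposes. Run it with both the Hadoop and Tez jars on the classpath (otherwise the class lookup or the method listing fails). Based on the comparison above, 0.4.x should expose updateEnvironmentForMRAM and 0.5.x updateEnvBasedOnMRAMEnv.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

// Hypothetical diagnostic: report which candidate public methods a class exposes.
class ApiProbe {
    static List<String> presentMethods(String className, String... candidates)
            throws ClassNotFoundException {
        // Load without running static initializers.
        Class<?> cls = Class.forName(className, false, ApiProbe.class.getClassLoader());
        List<String> found = new ArrayList<>();
        for (String candidate : candidates) {
            for (Method m : cls.getMethods()) { // public methods, including inherited ones
                if (m.getName().equals(candidate)) {
                    found.add(candidate);
                    break;
                }
            }
        }
        return found;
    }

    public static void main(String[] args) {
        String cls = args.length > 0 ? args[0] : "org.apache.tez.mapreduce.hadoop.MRHelpers";
        try {
            System.out.println(cls + " exposes: "
                    + presentMethods(cls, "updateEnvironmentForMRAM", "updateEnvBasedOnMRAMEnv"));
        } catch (ClassNotFoundException e) {
            System.out.println(cls + " is not on the classpath");
        }
    }
}
```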
Download the Hive 0.14 source from GitHub, then build it and test Hive on Tez:
https://codeload.github.com/apache/hive/zip/branch-0.14

mvn clean package -DskipTests -Phadoop-2 -Pdist