Getting Started with Big Data: Introduction and Usage of the Hive Tez Execution Engine

1. Introduction

Hive's default execution engine is MapReduce (MR). To speed up computation, we can switch to the Tez engine. The figure below illustrates why Tez improves performance:

(Figure: four dependent MR jobs compared with a single Tez job)

When Hive generates MR programs directly, suppose there are four MR jobs with dependencies between them. In the figure above, green boxes are Reduce Tasks, and the cloud shapes represent write barriers, where intermediate results must be persisted to HDFS.

Tez can merge multiple dependent jobs into a single job, so HDFS is written only once and there are fewer intermediate stages, which greatly improves job performance.

2. Preparing the Installation Package

1) Download the Tez release package from http://tez.apache.org

2) Copy apache-tez-0.9.1-bin.tar.gz to the /opt/module directory on hadoop102

[root@hadoop102 module]$ ls

apache-tez-0.9.1-bin.tar.gz

3) Extract apache-tez-0.9.1-bin.tar.gz

[root@hadoop102 module]$ tar -zxvf apache-tez-0.9.1-bin.tar.gz

4) Rename the directory

[root@hadoop102 module]$ mv apache-tez-0.9.1-bin/ tez-0.9.1

3. Configuring Tez in Hive

1) Go to Hive's configuration directory: /opt/module/hive/conf

[root@hadoop102 conf]$ pwd
/opt/module/hive/conf

2) Add the Tez environment variables and dependency jar configuration to hive-env.sh

[root@hadoop102 conf]$ vim hive-env.sh

Add the following configuration:

# Set HADOOP_HOME to point to a specific hadoop install directory
export HADOOP_HOME=/opt/module/hadoop-2.7.2

# Hive Configuration Directory can be controlled by:
export HIVE_CONF_DIR=/opt/module/hive/conf

# Folder containing extra libraries required for hive compilation/execution can be controlled by:
export TEZ_HOME=/opt/module/tez-0.9.1    # the directory where you extracted Tez
export TEZ_JARS=""
for jar in `ls $TEZ_HOME |grep jar`; do
    export TEZ_JARS=$TEZ_JARS:$TEZ_HOME/$jar
done
for jar in `ls $TEZ_HOME/lib`; do
    export TEZ_JARS=$TEZ_JARS:$TEZ_HOME/lib/$jar
done

export HIVE_AUX_JARS_PATH=/opt/module/hadoop-2.7.2/share/hadoop/common/hadoop-lzo-0.4.20.jar$TEZ_JARS
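Note that TEZ_JARS as built above begins with a leading colon, which is why it can be concatenated directly after the LZO jar on the HIVE_AUX_JARS_PATH line. Below is a minimal stand-alone sketch of the same loop logic, run against a throwaway directory with made-up jar names rather than a real Tez install:

```shell
# Simulate a Tez layout in a temp directory (jar names are hypothetical).
TEZ_HOME=$(mktemp -d)
mkdir -p $TEZ_HOME/lib
touch $TEZ_HOME/tez-api-0.9.1.jar $TEZ_HOME/lib/commons-demo.jar

# Same loop pattern as in hive-env.sh: collect top-level jars, then lib/ jars.
TEZ_JARS=""
for jar in `ls $TEZ_HOME | grep jar`; do
    TEZ_JARS=$TEZ_JARS:$TEZ_HOME/$jar
done
for jar in `ls $TEZ_HOME/lib`; do
    TEZ_JARS=$TEZ_JARS:$TEZ_HOME/lib/$jar
done

# Result is ":<jar1>:<jar2>", i.e. a classpath fragment with a leading colon.
echo $TEZ_JARS
rm -rf $TEZ_HOME
```

If TEZ_JARS prints empty here, the loops did not see any jars, and in the real hive-env.sh the Tez dependencies would silently be missing from HIVE_AUX_JARS_PATH.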

3) Add the following property to hive-site.xml to change Hive's execution engine:

<property>
    <name>hive.execution.engine</name>
    <value>tez</value>
</property>
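The property above changes the default engine for every session. The engine can also be switched per session from the Hive CLI using standard Hive `set` commands, which is handy for comparing the two engines or falling back to MR if a Tez job misbehaves:

```sql
-- Override the engine for the current session only:
set hive.execution.engine=tez;
-- Print the current value:
set hive.execution.engine;
-- Temporarily fall back to MapReduce without editing hive-site.xml:
set hive.execution.engine=mr;
```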

4. Configuring Tez

1) Create a tez-site.xml file in Hive's conf directory, /opt/module/hive/conf

[root@hadoop102 conf]$ pwd
/opt/module/hive/conf
[root@hadoop102 conf]$ vim tez-site.xml

Add the following content:

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
    <name>tez.lib.uris</name>
    <value>${fs.defaultFS}/tez/tez-0.9.1,${fs.defaultFS}/tez/tez-0.9.1/lib</value>
</property>
<property>
    <name>tez.lib.uris.classpath</name>
    <value>${fs.defaultFS}/tez/tez-0.9.1,${fs.defaultFS}/tez/tez-0.9.1/lib</value>
</property>
<property>
    <name>tez.use.cluster.hadoop-libs</name>
    <value>true</value>
</property>
<property>
    <name>tez.history.logging.service.class</name>
    <value>org.apache.tez.dag.history.logging.ats.ATSHistoryLoggingService</value>
</property>
</configuration>

5. Uploading Tez to the Cluster

1) Upload /opt/module/tez-0.9.1 to the /tez path on HDFS, so that it matches the tez.lib.uris paths configured above

[root@hadoop102 conf]$ hadoop fs -mkdir /tez
[root@hadoop102 conf]$ hadoop fs -put /opt/module/tez-0.9.1/ /tez
[root@hadoop102 conf]$ hadoop fs -ls /tez
/tez/tez-0.9.1

6. Testing

1) Start Hive

[root@hadoop102 hive]$ bin/hive

2) Create a test table

hive (default)> create table student(
id int,
name string);

3) Insert a row into the table

hive (default)> insert into student values(1,"zhangsan");

4) If no errors are reported, the setup was successful

hive (default)> select * from student;
1       zhangsan
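Optionally, you can confirm that queries really run on Tez by inspecting the plan: with the Tez engine active, the stage plan is organized as a Tez DAG (vertices such as `Map 1`) rather than MapReduce stages. A quick check, using the `student` table created above:

```sql
hive (default)> explain select count(*) from student;
```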

7. Summary

1) A Tez job may be killed by the NodeManager for using too much memory:

Caused by: org.apache.tez.dag.api.SessionNotRunning: TezSession has already shutdown. Application application_1546781144082_0005 failed 2 times due to AM Container for appattempt_1546781144082_0005_000002 exited with  exitCode: -103
For more detailed output, check application tracking page:http://hadoop103:8088/cluster/app/application_1546781144082_0005Then, click on links to logs of each attempt.
Diagnostics: Container [pid=11116,containerID=container_1546781144082_0005_02_000001] is running beyond virtual memory limits. Current usage: 216.3 MB of 1 GB physical memory used; 2.6 GB of 2.1 GB virtual memory used. Killing container.

This happens when a container running on a worker node tries to use more memory than its limit and is killed by the NodeManager.
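The "2.1 GB of virtual memory" limit in the log is not arbitrary: YARN computes the virtual memory cap as the container's physical memory multiplied by `yarn.nodemanager.vmem-pmem-ratio`, which defaults to 2.1. A quick sanity check of that arithmetic for the 1 GB container in the log above:

```shell
# Virtual memory limit = physical memory * vmem-pmem ratio (default 2.1).
pmem_gb=1      # container physical memory from the diagnostics (1 GB)
ratio=2.1      # default value of yarn.nodemanager.vmem-pmem-ratio
vmem_limit=$(LC_ALL=C awk -v p=$pmem_gb -v r=$ratio 'BEGIN { printf "%.1f", p * r }')
echo "virtual memory limit: $vmem_limit GB"   # 2.1 GB, matching the log
```

The job was using 2.6 GB of virtual memory, so it exceeded the 2.1 GB cap even though its 216.3 MB of physical usage was well under 1 GB; only the virtual memory check failed.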

[Excerpt] The NodeManager is killing your container. It sounds like you are trying to use hadoop streaming which is running as a child process of the map-reduce task. The NodeManager monitors the entire process tree of the task and if it eats up more memory than the maximum set in mapreduce.map.memory.mb or mapreduce.reduce.memory.mb respectively, we would expect the Nodemanager to kill the task, otherwise your task is stealing memory belonging to other containers, which you don't want.

Solutions:

Option 1: Disable the virtual memory check (the approach chosen here). Edit yarn-site.xml:

<property>
    <name>yarn.nodemanager.vmem-check-enabled</name>
    <value>false</value>
</property>

Option 2: Configure the Map and Reduce task memory in mapred-site.xml (adjust the values to your machines' memory and workload):

<property>
  <name>mapreduce.map.memory.mb</name>
  <value>1536</value>
</property>
<property>
  <name>mapreduce.map.java.opts</name>
  <value>-Xmx1024M</value>
</property>
<property>
  <name>mapreduce.reduce.memory.mb</name>
  <value>3072</value>
</property>
<property>
  <name>mapreduce.reduce.java.opts</name>
  <value>-Xmx2560M</value>
</property>
