Hive on Tez: Performance Comparison of the Tez and MapReduce Engines

Date: 2019-11-10

This post continues the previous article: https://my.oschina.net/zhzhenqin/blog/781670

The point of installing Tez on YARN is to provide an execution engine for Hive or Pig.

Hive supports several execution engines out of the box: MapReduce, Tez, and Spark (via Hive on Spark). Switching Hive to Tez is therefore simple; just set the following in hive-site.xml:

<property>
    <name>hive.execution.engine</name>
    <value>tez</value>
</property>
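
To double-check which engine a configuration file selects, the property can be read back with a few lines of Python. This is only a minimal sketch: the SAMPLE string stands in for a real hive-site.xml (assumed here to use the usual `<configuration>` root element), and `get_property` is a hypothetical helper, not part of Hive.

```python
import xml.etree.ElementTree as ET

# Stand-in for the contents of hive-site.xml (assumed layout:
# a <configuration> root wrapping <property> entries).
SAMPLE = """<configuration>
  <property>
    <name>hive.execution.engine</name>
    <value>tez</value>
  </property>
</configuration>"""


def get_property(xml_text, name):
    """Return the value of the named property, or None if it is absent."""
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        if prop.findtext("name") == name:
            return prop.findtext("value")
    return None


print(get_property(SAMPLE, "hive.execution.engine"))  # prints "tez"
```

For a real installation, the file contents would be read from disk and passed in place of SAMPLE.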

With hive.execution.engine set to tez, start Hive and run a query:

hive> select count(*) as c from userinfo;
Query ID = zhenqin_20161104150743_4155afab-4bfa-4e8a-acb0-90c8c50ecfb5
Total jobs = 1
Launching Job 1 out of 1


Status: Running (Executing on YARN cluster with App id application_1478229439699_0007)

--------------------------------------------------------------------------------
        VERTICES      STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
--------------------------------------------------------------------------------
Map 1 ..........   SUCCEEDED      2          2        0        0       0       0
Reducer 2 ......   SUCCEEDED      1          1        0        0       0       0
--------------------------------------------------------------------------------
VERTICES: 02/02  [==========================>>] 100%  ELAPSED TIME: 6.19 s     
--------------------------------------------------------------------------------
OK
1000000
Time taken: 6.611 seconds, Fetched: 1 row(s)

As you can see, my userinfo table holds 1,000,000 rows, and one count over it takes 6.19 s. Now switch the engine to mr:

set hive.execution.engine=mr;

Run the count on userinfo again:

hive> select count(*) as c from userinfo;
Query ID = zhenqin_20161104152022_c7e6c5bd-d456-4ec7-b895-c81a369aab27
Total jobs = 1
Launching Job 1 out of 1
Starting Job = job_1478229439699_0010, Tracking URL = http://localhost:8088/proxy/application_1478229439699_0010/
Kill Command = /Users/zhenqin/software/hadoop/bin/hadoop job  -kill job_1478229439699_0010
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2016-11-04 15:20:28,323 Stage-1 map = 0%,  reduce = 0%
2016-11-04 15:20:34,587 Stage-1 map = 100%,  reduce = 0%
2016-11-04 15:20:40,796 Stage-1 map = 100%,  reduce = 100%
Ended Job = job_1478229439699_0010
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 1  Reduce: 1   HDFS Read: 215 HDFS Write: 0 SUCCESS
Total MapReduce CPU Time Spent: 0 msec
OK
1000000
Time taken: 19.46 seconds, Fetched: 1 row(s)
hive>

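As you can see, Tez is nearly 3 times faster than MapReduce here (6.611 s versus 19.46 s total). In addition, when Hive runs on Tez it shows a live ==>> progress bar, whereas with mr the log only prints the map and reduce progress percentages. The Tez output is also much cleaner.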
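
The speedup can be worked out from the two "Time taken" values Hive reported in the sessions above. The snippet below is just that arithmetic; the helper name `speedup` is made up for illustration.

```python
# Wall-clock times copied from the "Time taken" lines of the two runs.
tez_seconds = 6.611
mr_seconds = 19.46


def speedup(baseline, candidate):
    """Return how many times faster `candidate` is than `baseline`."""
    return baseline / candidate


ratio = speedup(mr_seconds, tez_seconds)
print(f"Tez is {ratio:.2f}x faster than MapReduce on this query")
# prints: Tez is 2.94x faster than MapReduce on this query
```

This is a single run on one small table, so the figure is a rough indication rather than a benchmark.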

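In the many complex SQL statements I tested, Tez was considerably faster than MapReduce; how much faster depends on the complexity of the SQL. Simple select statements do not show Tez's advantage. When Tez compiles SQL, it can combine Map and Reduce stages freely, for example Map, Reduce, Reduce, whereas MR is limited to Map->Reduce->Map->Reduce. That is why Tez has a clear advantage on complex SQL.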
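
The difference in plan shape can be illustrated with a toy model. The sketch below is not how Hive or Tez plan queries internally; it only assumes that a MapReduce job is one Map phase followed by at most one Reduce phase, so a Reduce that directly follows another Reduce forces a second job with a pass-through Map in front of it. The function name `mapreduce_jobs` and the stage labels are hypothetical.

```python
from typing import List


def mapreduce_jobs(stages: List[str]) -> List[List[str]]:
    """Split a logical chain of Map/Reduce stages into MapReduce jobs.

    Toy rule: each job is one Map optionally followed by one Reduce.
    A Reduce with no unpaired Map before it gets an identity Map.
    """
    jobs: List[List[str]] = []
    for stage in stages:
        if stage == "Map":
            jobs.append(["Map"])
        elif stage == "Reduce":
            if jobs and jobs[-1] == ["Map"]:
                # Pair this Reduce with the preceding, still unpaired Map.
                jobs[-1].append("Reduce")
            else:
                # No free Map available: start a new job with a pass-through Map.
                jobs.append(["Map (identity)", "Reduce"])
        else:
            raise ValueError(f"unknown stage: {stage}")
    return jobs


# A Map, Reduce, Reduce chain (for instance an aggregation followed by a sort).
plan = ["Map", "Reduce", "Reduce"]
jobs = mapreduce_jobs(plan)
print(len(jobs), "MapReduce jobs:", jobs)
print("Tez: 1 DAG with vertices", plan)
```

In this model the Map, Reduce, Reduce chain becomes two MapReduce jobs, with the first job's output stored before the second job starts, while Tez runs the same three stages as vertices of a single DAG.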

Once the Tez Timeline mentioned in the previous article is configured, every Tez DAG job shows up in the UI.