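Continuing from the previous article: https://my.oschina.net/zhzhenqin/blog/781670
The point of installing Tez on YARN is to provide an execution engine for Hive or Pig.
Hive supports the MapReduce, Tez, and Spark (Hive on Spark) execution engines out of the box, so switching Hive to Tez is simple. Just set the following in hive-site.xml: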
<property>
    <name>hive.execution.engine</name>
    <value>tez</value>
</property>
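To confirm the setting actually landed in the file, a short script can read it back. The following is a minimal sketch, assuming the usual Hadoop-style layout in which `<property>` elements sit under a `<configuration>` root; the helper name `get_property` and the inline sample text are hypothetical and only serve the illustration.

```python
import xml.etree.ElementTree as ET

# Sample hive-site.xml content (assumption: a real file holds many more properties).
HIVE_SITE = """
<configuration>
  <property>
    <name>hive.execution.engine</name>
    <value>tez</value>
  </property>
</configuration>
"""


def get_property(xml_text, name):
    """Return the value of the named property, or None if it is not set.

    Hypothetical helper for this example, not part of Hive.
    """
    root = ET.fromstring(xml_text.strip())
    for prop in root.iter("property"):
        if prop.findtext("name") == name:
            return prop.findtext("value")
    return None


print(get_property(HIVE_SITE, "hive.execution.engine"))
```

To check a real installation, the same function could be fed the contents of the actual hive-site.xml instead of the sample string.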
With hive.execution.engine set to tez, start Hive and run a query:
hive> select count(*) as c from userinfo;
Query ID = zhenqin_20161104150743_4155afab-4bfa-4e8a-acb0-90c8c50ecfb5
Total jobs = 1
Launching Job 1 out of 1

Status: Running (Executing on YARN cluster with App id application_1478229439699_0007)

--------------------------------------------------------------------------------
        VERTICES      STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
--------------------------------------------------------------------------------
Map 1 ..........   SUCCEEDED      2          2        0        0       0       0
Reducer 2 ......   SUCCEEDED      1          1        0        0       0       0
--------------------------------------------------------------------------------
VERTICES: 02/02  [==========================>>] 100%  ELAPSED TIME: 6.19 s
--------------------------------------------------------------------------------
OK
1000000
Time taken: 6.611 seconds, Fetched: 1 row(s)
As you can see, my userinfo table holds 1,000,000 records, and one count over it takes 6.19 s to execute. Now switch the engine to mr:
set hive.execution.engine=mr;
Run the count on userinfo again:
hive> select count(*) as c from userinfo;
Query ID = zhenqin_20161104152022_c7e6c5bd-d456-4ec7-b895-c81a369aab27
Total jobs = 1
Launching Job 1 out of 1
Starting Job = job_1478229439699_0010, Tracking URL = http://localhost:8088/proxy/application_1478229439699_0010/
Kill Command = /Users/zhenqin/software/hadoop/bin/hadoop job -kill job_1478229439699_0010
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2016-11-04 15:20:28,323 Stage-1 map = 0%, reduce = 0%
2016-11-04 15:20:34,587 Stage-1 map = 100%, reduce = 0%
2016-11-04 15:20:40,796 Stage-1 map = 100%, reduce = 100%
Ended Job = job_1478229439699_0010
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Reduce: 1 HDFS Read: 215 HDFS Write: 0 SUCCESS
Total MapReduce CPU Time Spent: 0 msec
OK
1000000
Time taken: 19.46 seconds, Fetched: 1 row(s)
hive>
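As you can see, Tez is nearly 3 times faster than MapReduce here (6.611 s versus 19.46 s in total). In addition, when Hive runs on the Tez engine it shows a live ==>> progress bar, whereas with mr the only feedback is log lines reporting the map and reduce percentages. The log output under Tez is also much cleaner.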
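The "nearly 3 times" figure can be checked directly from the two "Time taken" values printed by the Hive CLI. The snippet below is just that arithmetic; the variable names are illustrative.

```python
# Wall-clock times reported by the Hive CLI ("Time taken") in the two runs above.
tez_seconds = 6.611
mr_seconds = 19.46

# Speedup is the ratio of the slower run to the faster one.
speedup = mr_seconds / tez_seconds
print(f"Tez speedup over MapReduce: {speedup:.2f}x")  # prints 2.94x
```

This is a single run on one small table, so the ratio is an indication rather than a benchmark.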
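In the many complex SQL queries I have tested, Tez was consistently much faster than MapReduce; how much faster depends on the complexity of the SQL. Simple select statements do not really show Tez's advantage. When SQL is compiled for Tez, the stages can be combined freely, for example Map, Reduce, Reduce, whereas MR can only chain Map->Reduce->Map->Reduce. That is why Tez's advantage is so clear on complex SQL.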
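The difference in stage layout can be pictured with a toy model. This is only a sketch under a simplifying assumption (a query that needs one map phase followed by a chain of reduce phases), not how Hive actually plans queries; the function names `mapreduce_plan` and `tez_plan` are hypothetical. In the MapReduce case every reduce needs its own job with a map in front of it, and consecutive jobs hand data over through HDFS; in the Tez case the reducers follow one another inside a single DAG.

```python
def mapreduce_plan(reduce_phases):
    """Toy model: a chain of MapReduce jobs, one job per reduce phase."""
    stages = []
    for i in range(reduce_phases):
        # Each job is a fixed Map -> Reduce pair.
        stages += [f"Map {i + 1}", f"Reduce {i + 1}"]
    jobs = reduce_phases
    # Consecutive jobs exchange intermediate data through HDFS.
    hdfs_handoffs = max(reduce_phases - 1, 0)
    return stages, jobs, hdfs_handoffs


def tez_plan(reduce_phases):
    """Toy model: one Tez DAG, a map vertex followed directly by reducers."""
    stages = ["Map 1"] + [f"Reducer {i + 2}" for i in range(reduce_phases)]
    return stages, 1, 0


for name, plan in (("MapReduce", mapreduce_plan), ("Tez", tez_plan)):
    stages, jobs, handoffs = plan(2)
    print(f"{name}: {' -> '.join(stages)} (jobs={jobs}, HDFS handoffs={handoffs})")
```

For two reduce phases the model gives four stages across two jobs for MapReduce and three vertices in one DAG for Tez, which is the Map->Reduce->Map->Reduce versus Map, Reduce, Reduce contrast described above.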
Once the Tez Timeline described in the previous article is configured, every Tez DAG job shows up in the UI.