Pig on Tez: Switching Pig's Execution Engine to Tez

Date: 2019-11-16

Tags: pig, tez, execution engine


For installing Tez, see the previous article: https://my.oschina.net/zhzhenqin/blog/781670

Once Tez on YARN is installed successfully, its purpose is to serve as the execution engine for Hive or Pig.

Installing Pig

Download: http://apache.fayea.com/pig/pig-0.15.0/pig-0.15.0.tar.gz

After downloading, extract the archive to a local directory. If Hadoop is already installed, Pig can be used right away.
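The extraction step can be sketched as a small shell function. This is only an illustration: the function name unpack_pig and the destination directory are hypothetical, and it assumes the tarball unpacks into a directory named after the archive (pig-0.15.0.tar.gz into pig-0.15.0/).

```shell
# Sketch: unpack a downloaded Pig tarball and print the exports one might
# add to a shell profile. Both arguments are assumptions chosen by the caller:
#   $1 = path to the tarball, $2 = directory to extract into.
unpack_pig() {
  local tarball="$1" dest="$2"
  mkdir -p "$dest" || return 1
  tar -xzf "$tarball" -C "$dest" || return 1
  # Assumes the archive's top-level directory matches the tarball name.
  local home="$dest/$(basename "$tarball" .tar.gz)"
  echo "export PIG_HOME=$home"
  echo 'export PATH=$PIG_HOME/bin:$PATH'
}
```

For example, unpack_pig pig-0.15.0.tar.gz /opt/software would extract under /opt/software and print the two export lines; nothing is written to any profile file automatically.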

Pig's default execution engine is mr (MapReduce). The engines Pig can use are as follows:

# Execution Mode. Local mode is much faster, but only suitable for small amounts
# of data. Local mode interprets paths on the local file system; Mapreduce mode
# on the HDFS. Read more under 'Execution Modes' within the Getting Started
# documentation.
#
# * mapreduce (default): use the Hadoop cluster defined in your Hadoop config files
# * local: use local mode
# * tez: use Tez on Hadoop cluster
# * tez_local: use Tez local mode
#
exectype=tez

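At run time you can select the engine explicitly, for example with pig -x local script.pig. The command-line option takes precedence and the setting in the configuration file is ignored.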
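The -x option can be wrapped in a small helper that rejects mistyped engine names before Pig is started. The sketch below is an illustration only: the function name pig_cmd and the PIG_BIN variable are hypothetical, and the accepted modes are simply the four listed in the properties comment above.

```shell
# Hypothetical helper: validate the mode, then print the pig command line.
# PIG_BIN is an assumed override; it defaults to "pig" found on the PATH.
pig_cmd() {
  local mode="$1" script="$2"
  case "$mode" in
    local|mapreduce|tez|tez_local)
      echo "${PIG_BIN:-pig} -x $mode $script"
      ;;
    *)
      echo "unknown exectype: $mode" >&2
      return 1
      ;;
  esac
}
```

Calling pig_cmd tez script.pig prints the command rather than running it, which makes it easy to review or log before execution.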

A Pig Hello World

As in the previous two articles, nie.txt is a text file containing the Apache License.

group-limit-word.pig

words = load '/user/hadoop/nie.txt' using PigStorage(' ') as (line); -- load the file, using a space as the field delimiter
grpd = group words by line;     -- group by each word
cntd = foreach grpd generate group, COUNT(words); -- count the records in each group
cous = order cntd by $1 desc; -- sort by count, descending
dump cous;

Then run it:

pig group-limit-word.pig
pig -x tez group-limit-word.pig

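I ran the script with local, tez_local, tez, and mapreduce in turn. For a text file of a few tens of KB, the local modes are naturally the faster ones. As with Tez under Hive, tez is still much faster than mapreduce, and even for other, more complex Pig scripts tez is likewise faster than mapreduce. For complex scripts over large data volumes, tez should have a very large advantage over mapreduce.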
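A comparison like this can be repeated with a short loop over the four engines. The following is a rough sketch rather than a benchmark harness: the function name time_modes and the PIG_BIN variable are hypothetical, timings have one-second resolution, and the local modes read paths from the local file system, so the input file has to exist there as well.

```shell
# Sketch: run one script under each engine and report wall-clock seconds.
# PIG_BIN is an assumed override; it defaults to "pig" found on the PATH.
# Pig's own output is discarded so that only the timing lines are printed.
time_modes() {
  local script="$1" mode start end rc
  for mode in local tez_local tez mapreduce; do
    start=$(date +%s)
    "${PIG_BIN:-pig}" -x "$mode" "$script" >/dev/null 2>&1
    rc=$?
    end=$(date +%s)
    echo "$mode $((end - start))s exit=$rc"
  done
}
```

Running time_modes group-limit-word.pig prints one line per engine. A single run on a small file mostly measures startup cost, so several runs on a larger input give a fairer picture.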

[Figure: the Tez UI showing the Pig job]

Problems Encountered

After the run, the Tez job on YARN finished successfully, but there was no output at all.

The local log showed the following:

[main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2117: Unexpected error when launching Tez job.

Details at logfile: /opt/software/pig/pig_1479783397791.log

The log file pig_1479783397791.log contained this error:

Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 2117: Unexpected error when launching Tez job.
	at org.apache.pig.backend.hadoop.executionengine.tez.TezLauncher.handleUnCaughtException(TezLauncher.java:282)
	at org.apache.pig.backend.hadoop.executionengine.tez.TezLauncher.launchPig(TezLauncher.java:235)
	at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.launchPig(HExecutionEngine.java:304)
	at org.apache.pig.PigServer.launchPlan(PigServer.java:1390)
	at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1375)
	at org.apache.pig.PigServer.storeEx(PigServer.java:1034)
	... 15 more
Caused by: java.lang.IllegalArgumentException: No enum constant org.apache.tez.common.counters.TaskCounter.SHUFFLE_CHUNK_COUNT
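In a long Pig log, the last "Caused by:" line is normally the most specific one, as in the trace above. A small sketch for pulling it out with grep follows; the function name root_cause is hypothetical.

```shell
# Sketch: print the innermost "Caused by:" line of a log file.
# $1 = path to the Pig log (for example the file named after "Details at logfile").
root_cause() {
  grep '^Caused by:' "$1" | tail -n 1
}
```

For the log in this article, root_cause /opt/software/pig/pig_1479783397791.log would print the line that ends the trace shown above.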

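This is clearly caused by a Tez version mismatch. The Tez bundled under Pig's lib/h2 directory is 0.7.0 by default, whereas I had installed its bug-fix release, 0.7.1. Delete the Tez jars under lib/h2 and copy the Tez 0.7.1 jars into that directory: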

rm -v lib/h2/tez-*.jar
cp -v tez/tez-*.jar lib/h2/
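A slightly more cautious variant of the same swap moves the bundled jars aside instead of deleting them, so they can be restored later. This is a sketch under assumptions: the function name swap_tez_jars and the backup directory tez-bundled-backup are hypothetical, and the two arguments stand for the Pig and Tez installation directories.

```shell
# Sketch: replace Pig's bundled Tez jars with those of the installed Tez.
#   $1 = Pig home (contains lib/h2), $2 = Tez home (contains tez-*.jar).
# The backup directory name is an assumption; it is kept outside lib/ so the
# old jars are not picked up again.
swap_tez_jars() {
  local pig_home="$1" tez_home="$2"
  local backup="$pig_home/tez-bundled-backup"
  mkdir -p "$backup" || return 1
  # Move rather than delete, so the bundled jars can be restored.
  mv -v "$pig_home"/lib/h2/tez-*.jar "$backup"/ || return 1
  cp -v "$tez_home"/tez-*.jar "$pig_home/lib/h2/"
}
```

If no bundled Tez jars are found, the mv step fails and the function returns before copying anything.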

After that, running the Pig script produced the expected output.

Running the Pig script with mapreduce fails: the internal RPC cannot connect to port 10020.

[main] INFO org.apache.hadoop.ipc.Client - Retrying connect to server: 0.0.0.0/0.0.0.0:10020. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)

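This happens because the MapReduce JobHistory Server, which listens on port 10020 by default, is not running. Start it from the Hadoop home directory: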

sbin/mr-jobhistory-daemon.sh start historyserver

Starting a MapReduce history server is all that is needed.
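Whether the history server is actually up can be checked from the JVM process list. The sketch below assumes that jps from the JDK is available and that the daemon appears under the name JobHistoryServer; the function name history_server_running is hypothetical.

```shell
# Sketch: succeed if a jps-style listing on stdin contains JobHistoryServer.
history_server_running() {
  grep -qw 'JobHistoryServer'
}
```

A typical use would be jps | history_server_running && echo "history server is up", run on the host where the daemon was started.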
