Environment: JDK 1.8.0, Hadoop 2.6.0, Scala 2.11.8, Spark 2.1.2, Oozie 4.1, Hue 3.9
Jar uploaded to the Hue workspace lib directory:

hdfs://localcluster/user/hue/oozie/workspaces/hue-oozie-1570773494.4/lib/DataWarehouse-1.0-SNAPSHOT.jar

Jar, main class, and Spark options for the action:

hdfs://localcluster/user/hue/oozie/workspaces/hue-oozie-1570773494.4/lib/DataWarehouse-1.0-SNAPSHOT.jar dw.user.qhy.wc.WordCount --properties-file spark.properties
<workflow-app name="data_warehouse.test" xmlns="uri:oozie:workflow:0.5">
    <start to="spark-2d66"/>
    <kill name="Kill">
        <message>Action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <action name="spark-2d66">
        <spark xmlns="uri:oozie:spark-action:0.2">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <master>yarn</master>
            <mode>cluster</mode>
            <name>data_warehouse.workflow</name>
            <class>dw.update.JobStream</class>
            <jar>hdfs://localcluster/user/hue/oozie/workspaces/hue-oozie-1578979482.24/lib/DataWarehouse-1.0.jar</jar>
            <spark-opts>--properties-file spark.properties</spark-opts>
            <arg>${CustomArgs}</arg>
        </spark>
        <ok to="End"/>
        <error to="Kill"/>
    </action>
    <end name="End"/>
</workflow-app>
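The `--properties-file` in `<spark-opts>` points at a spark.properties file shipped alongside the workflow. A hypothetical example of what such a file might contain (the keys and values below are illustrative, not taken from the original job):

```
spark.executor.memory=2g
spark.executor.cores=2
spark.serializer=org.apache.spark.serializer.JavaSerializer
```

Keeping tuning options in a properties file rather than in `<spark-opts>` means they can be changed without editing and redeploying workflow.xml.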
import requests
import json
import time
from pprint import pprint

HEADER = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/68.0.3440.106 Safari/537.36',
    'Content-Type': 'application/xml;charset=UTF-8',
}
# oozie.wf.application.path points at the HDFS directory that holds workflow.xml
XML = '''
<configuration>
    <property>
        <name>oozie.wf.application.path</name>
        <value>hdfs://localcluster/user/hue/oozie/workspaces/hue-oozie-1578979482.24</value>
    </property>
    <property>
        <name>oozie.use.system.libpath</name>
        <value>True</value>
    </property>
    <property>
        <name>user.name</name>
        <value>walker</value>
    </property>
    <property>
        <name>jobTracker</name>
        <value>rm1</value>
    </property>
    <property>
        <name>mapreduce.job.user.name</name>
        <value>walker</value>
    </property>
    <property>
        <name>nameNode</name>
        <value>hdfs://localcluster</value>
    </property>
    <property>
        <name>CustomArgs</name>
        <value>%s</value>
    </property>
</configuration>
'''

CustomArgs = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/68.0.3440.106 Safari/537.36',
    'Content-Type': 'application/xml;charset=UTF-8',
    'hello': 'world'
}
XML = XML % json.dumps(CustomArgs)
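One thing to watch here: the `json.dumps` output is spliced into the XML document verbatim, so argument values containing `&` or `<` would produce malformed configuration XML. A minimal sketch of escaping the substituted value with the standard library (the sample dict below is hypothetical):

```python
import json
from xml.sax.saxutils import escape

# Hypothetical arguments containing XML-unsafe characters
custom_args = {'hello': 'world', 'note': 'a & b <tag>'}

# Escape &, < and > before substituting into the <value> element
value = escape(json.dumps(custom_args))
print(value)
```

The escaped string is safe to place inside `<value>%s</value>`; Oozie will unescape the entities when it parses the configuration.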
# Submit the job
r = requests.post('http://oozie.walker:11000/oozie/v1/jobs?action=start', data=XML, headers=HEADER)
print(r.text)  # {"id":"0000034-191012235641226-oozie-vipc-W"}

# Get the job ID (the deprecated encoding= argument to json.loads is not needed)
jobid = json.loads(r.text)['id']

# Poll the job status
url = 'http://oozie.walker:11000/oozie/v1/job/%s?show=info&timezone=GMT' % jobid
while True:
    time.sleep(5)
    r = requests.get(url)
    pprint(r.text)  # print the current status
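The loop above prints the raw info JSON forever; in practice you want to stop polling once the workflow reaches a terminal state. A small sketch of the status handling, assuming the usual status strings Oozie reports (SUCCEEDED, KILLED, FAILED, RUNNING) and a canned response body in place of a live request:

```python
import json

# Workflow states after which polling can stop (assumed set of terminal states)
TERMINAL_STATES = {'SUCCEEDED', 'KILLED', 'FAILED'}

def extract_status(info_text):
    """Pull the 'status' field out of the ?show=info response body."""
    return json.loads(info_text).get('status')

def is_terminal(status):
    """True once the workflow has finished, successfully or not."""
    return status in TERMINAL_STATES

# Example against a canned response body instead of a live Oozie server
sample = '{"id": "0000034-191012235641226-oozie-vipc-W", "status": "RUNNING"}'
print(is_terminal(extract_status(sample)))
```

Inside the polling loop you would call `extract_status(r.text)` and `break` when `is_terminal(...)` returns True, then inspect whether the final state was SUCCEEDED.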
Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.SparkMain], main() threw exception, Attempt to add (hdfs://localcluster/user/hue/oozie/workspaces/hue-oozie-1570758098.65/lib/DataWarehouse-1.0-SNAPSHOT.jar) multiple times to the distributed cache.
You can refer to the approach in this article: java.lang.IllegalArgumentException: Attempt to add (custom-jar-with-spark-code.jar) multiple times to the distributed cache
java.io.IOException: java.lang.NullPointerException
java.io.EOFException
com.esotericsoftware.kryo.KryoException
This is probably caused by improper use of the Kryo serializer. The simplest fix is to change
spark.serializer=org.apache.spark.serializer.KryoSerializer
back to the default:
spark.serializer=org.apache.spark.serializer.JavaSerializer
For more detail, see the solution in this article: Spark2 serialization (JavaSerializer/KryoSerializer)
This article is from walker.