[TOC]
Scheduling frameworks: Linux Crontab, Azkaban, oozie, zeus
A comparison of three task scheduling systems
oozie is a workflow scheduling system
Below is a record of the pitfalls I ran into.
Error encountered: `Error: E0505 : E0505: App definition [hdfs://localhost:8020/tmp/oozie-app/coordinator/] does not exist`
This error message is misleading: the problem turned out not to be a wrong directory, but a badly named coordinator.xml file.
Preparation: unify the time zone
It is recommended to use East Eight Zone time (GMT+0800).
On the server, run `date -R`. If it prints something like the line below, the server is already in GMT+0800; if not, set the time zone, typically to Beijing or Shanghai, with `ln -sf /usr/share/zoneinfo/Asia/Shanghai /etc/localtime`.

```
Sat, 30 Sep 2017 10:26:58 +0800
```
Next, edit oozie-site.xml; if the following property does not exist, add it:
```xml
<property>
    <name>oozie.processing.timezone</name>
    <value>GMT+0800</value>
</property>
```
This also makes the times displayed in the Oozie web UI correct.
File directory structure:
```
├── ooziespark
│   ├── job.properties
│   ├── lib
│   │   └── spark-1.6.2-1.0-SNAPSHOT.jar
│   └── workflow.xml
```
workflow.xml
```xml
<?xml version="1.0" encoding="utf-8"?>
<workflow-app xmlns="uri:oozie:workflow:0.5" name="SparkWordCount">
    <start to="spark-node"/>
    <action name="spark-node">
        <spark xmlns="uri:oozie:spark-action:0.1">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <prepare>
                <delete path="${outputdir}"/>
            </prepare>
            <master>${master}</master>
            <name>Spark-Wordcount</name>
            <class>WordCount</class>
            <jar>${nameNode}/user/LJK/ooziecoor/lib/spark-1.6.2-1.0-SNAPSHOT.jar</jar>
            <spark-opts>--driver-memory 512M --executor-memory 512M</spark-opts>
            <arg>${inputdir}</arg>
            <arg>${outputdir}</arg>
        </spark>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <kill name="fail">
        <message>Workflow failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>
```
job.properties
```properties
nameNode=hdfs://nn1:8020
jobTracker=rm:8050
master=yarn-cluster
queueName=default
inputdir=/user/LJK/hello-spark
outputdir=/user/LJK/output
oozie.use.system.libpath=true
oozie.wf.application.path=/user/LJK/ooziespark
#oozie.coord.application.path=${nameNode}/user/LJK/ooziespark
#start=2017-09-28T17:00+0800
#end=2017-09-30T17:00+0800
#workflowAppUri=${nameNode}/user/LJK/ooziespark/
```
Package the program and copy the jar into the app's lib directory. The test source code is as follows:
```scala
import org.apache.spark.{SparkConf, SparkContext}

object WordCount {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      // .setJars(List("/Users/LJK/Documents/code/github/study-spark1.6.2/target/spark-1.6.2-1.0-SNAPSHOT.jar"))
      // .set("spark.yarn.historyServer.address", "rm:18080")
      // .set("spark.eventLog.enabled", "true")
      // .set("spark.eventLog.dir", "hdfs://nn1:8020/spark-history")
      .set("spark.testing.memory", "1073741824")
    val sc = new SparkContext(conf)
    val rdd = sc.textFile(args(0))
      .flatMap(_.split(" "))
      .map((_, 1))
      .reduceByKey(_ + _)
    rdd.saveAsTextFile(args(1))
    sc.stop()
  }
}
```
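The build step itself is not shown here; as a minimal sketch, assuming the jar is produced by a Maven build (the commented-out `setJars` path above points at a `target/` directory), packaging and staging it could look like:

```shell
# Build the jar (assumed Maven project) and copy it into the Oozie app's lib directory
mvn clean package
cp target/spark-1.6.2-1.0-SNAPSHOT.jar ooziespark/lib/
```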
Upload this directory to HDFS with: `hdfs dfs -put ooziespark /user/LJK/`
Note: job.properties does not need to be uploaded to HDFS, because the submit command reads the local copy, not the one on HDFS.
Start the job with: `oozie job -oozie http://rm:11000/oozie -config /usr/local/share/applications/ooziespark/job.properties -run`
Or simply: `oozie job -config /usr/local/share/applications/ooziespark/job.properties -run`
The short form works only if you have configured the environment variable OOZIE_URL, which is used as the default value for the `-oozie` option; see `oozie help` for details.
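For example, a minimal way to set it, assuming the same Oozie server as in the full command above:

```shell
# Provide a default server URL so the -oozie option can be omitted from oozie CLI calls
export OOZIE_URL=http://rm:11000/oozie
```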
You can watch the job execute in the Oozie web UI.
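The same information is also available from the CLI; `<job-id>` below is a placeholder for the id printed by the `-run` command:

```shell
# Show the status and actions of a submitted workflow job
oozie job -info <job-id>
```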
Simple scheduling: run WordCount every five minutes.
File directory structure:
```
├── ooziecoor
│   ├── coordinator.xml
│   ├── job.properties
│   ├── lib
│   │   └── spark-1.6.2-1.0-SNAPSHOT.jar
│   └── workflow.xml
```
coordinator.xml
```xml
<coordinator-app name="cron-coord" frequency="${coord:minutes(5)}" start="${start}" end="${end}"
                 timezone="GMT+0800" xmlns="uri:oozie:coordinator:0.4">
    <action>
        <workflow>
            <app-path>${workflowAppUri}</app-path>
            <configuration>
                <property>
                    <name>jobTracker</name>
                    <value>${jobTracker}</value>
                </property>
                <property>
                    <name>nameNode</name>
                    <value>${nameNode}</value>
                </property>
                <property>
                    <name>queueName</name>
                    <value>${queueName}</value>
                </property>
            </configuration>
        </workflow>
    </action>
</coordinator-app>
```
Change the previous job.properties to:
```properties
nameNode=hdfs://nn1:8020
jobTracker=rm:8050
master=yarn-cluster
queueName=default
inputdir=/user/LJK/hello-spark
outputdir=/user/LJK/output
oozie.use.system.libpath=true
#oozie.wf.application.path=/user/LJK/ooziespark
oozie.coord.application.path=${nameNode}/user/LJK/ooziecoor
start=2017-09-30T9:30+0800
end=2017-09-30T17:00+0800
workflowAppUri=${nameNode}/user/LJK/ooziecoor
```
The previous workflow.xml can be reused as-is without moving the jar, but to keep each job's files together it is cleaner to update the `<jar>` path to point at the new lib directory.
Upload to HDFS and run: `oozie job -config /usr/local/share/applications/ooziecoor/job.properties -run`
You can view the job in the web UI.
File structure:
```
├── ooziebundle
│   ├── bundle.xml
│   ├── coordinator.xml
│   ├── job.properties
│   ├── lib
│   │   └── spark-1.6.2-1.0-SNAPSHOT.jar
│   └── workflow.xml
```
Add bundle.xml:
```xml
<bundle-app name='bundle-app' xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance' xmlns='uri:oozie:bundle:0.1'>
    <coordinator name='coord-1'>
        <app-path>${nameNode}/user/LJK/ooziebundle/coordinator.xml</app-path>
        <configuration>
            <property>
                <name>start</name>
                <value>${start}</value>
            </property>
            <property>
                <name>end</name>
                <value>${end}</value>
            </property>
        </configuration>
    </coordinator>
</bundle-app>
```
Modify job.properties:
```properties
nameNode=hdfs://nn1:8020
jobTracker=rm:8050
master=yarn-cluster
queueName=default
inputdir=/user/LJK/hello-spark
outputdir=/user/LJK/output
oozie.use.system.libpath=true
#oozie.wf.application.path=/user/LJK/ooziespark
#oozie.coord.application.path=${nameNode}/user/LJK/ooziecoor
oozie.bundle.application.path=${nameNode}/user/LJK/ooziebundle
start=2017-09-30T9:30+0800
end=2017-09-30T17:00+0800
workflowAppUri=${nameNode}/user/LJK/ooziebundle
```
Upload to HDFS and run: `oozie job -config /usr/local/share/applications/ooziebundle/job.properties -run`
View the job in the web UI.
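Because coordinator and bundle jobs keep firing until their end time, the lifecycle subcommands of the oozie CLI are useful; `<job-id>` is a placeholder for the bundle or coordinator job id:

```shell
oozie job -suspend <job-id>   # pause the recurring job
oozie job -resume <job-id>    # resume a suspended job
oozie job -kill <job-id>      # stop it permanently
```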
File structure. The lib directory is not packaged into a single jar, so its contents are not listed here; you can choose to build a single jar instead.
```
javaExample/
├── job.properties
├── lib
└── workflow.xml
```
Note: if you use the Spring Boot framework, you need to add exclusions in the pom, otherwise there will be jar conflicts and Oozie will report errors:
```xml
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter</artifactId>
    <exclusions>
        <exclusion>
            <artifactId>spring-boot-starter-logging</artifactId>
            <groupId>org.springframework.boot</groupId>
        </exclusion>
    </exclusions>
</dependency>
```
workflow.xml
```xml
<workflow-app name="My_Workflow" xmlns="uri:oozie:workflow:0.5">
    <start to="java-2d81"/>
    <kill name="Kill">
        <message>Action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <action name="java-2d81">
        <java>
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <main-class>com.sharing.App</main-class>
            <arg>hello</arg>
            <arg>springboot</arg>
        </java>
        <ok to="End"/>
        <error to="Kill"/>
    </action>
    <end name="End"/>
</workflow-app>
```
job.properties
```properties
oozie.use.system.libpath=false
queueName=default
jobTracker=rm.ambari:8050
nameNode=hdfs://nn1.ambari:8020
oozie.wf.application.path=${nameNode}/user/LJK/javaExample
```
Java program source code:
```java
package com.sharing;

import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;

@SpringBootApplication
public class App {
    public static void main(String[] args) {
        SpringApplication.run(App.class, args);
        System.out.println(args[0] + " " + args[1]);
    }
}
```
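The submit step is not shown for this example; assuming the app directory is staged the same way as the earlier ones (the local path below is only an illustration), it would look like:

```shell
# Upload the app to HDFS and submit the workflow (local job.properties path is an assumption)
hdfs dfs -put javaExample /user/LJK/
oozie job -config /usr/local/share/applications/javaExample/job.properties -run
```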
File structure:
```
shell
├── job.properties
└── workflow.xml
```
workflow.xml
```xml
<workflow-app name="My_Workflow" xmlns="uri:oozie:workflow:0.5">
    <start to="shell-2504"/>
    <kill name="Kill">
        <message>Action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <action name="shell-2504">
        <shell xmlns="uri:oozie:shell-action:0.1">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <exec>echo</exec>
            <argument>hello shell</argument>
            <capture-output/>
        </shell>
        <ok to="End"/>
        <error to="Kill"/>
    </action>
    <end name="End"/>
</workflow-app>
```
job.properties
```properties
hue-id-w=50057
jobTracker=rm.ambari:8050
mapreduce.job.user.name=admin
nameNode=hdfs://nn1.ambari:8020
oozie.use.system.libpath=True
oozie.wf.application.path=hdfs://nn1.ambari:8020/user/LJK/shell
user.name=admin
```
File structure:
```
hiveExample/
├── hive-site.xml
├── input
│   └── inputdata
├── job.properties
├── output
├── script.q
└── workflow.xml
```
Hive script: write a Hive script; the file name is up to you. Contents of script.q:
```sql
DROP TABLE IF EXISTS test;

CREATE EXTERNAL TABLE test (a INT) STORED AS TEXTFILE
LOCATION '${INPUT}';

INSERT OVERWRITE DIRECTORY '${OUTPUT}' SELECT * FROM test;
```
workflow.xml
```xml
<workflow-app name="My_Workflow" xmlns="uri:oozie:workflow:0.5">
    <start to="hive-bfbc"/>
    <kill name="Kill">
        <message>Action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <action name="hive-bfbc" cred="hcat">
        <hive xmlns="uri:oozie:hive-action:0.2">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <prepare>
                <delete path="${nameNode}/user/LJK/hiveExample/output"/>
                <mkdir path="${nameNode}/user/LJK/hiveExample/output"/>
            </prepare>
            <job-xml>/user/LJK/hiveExample/hive-site.xml</job-xml>
            <script>/user/LJK/hiveExample/script.q</script>
            <param>INPUT=/user/LJK/hiveExample/input</param>
            <param>OUTPUT=/user/LJK/hiveExample/output</param>
        </hive>
        <ok to="End"/>
        <error to="Kill"/>
    </action>
    <end name="End"/>
</workflow-app>
```
job.properties
```properties
hue-id-w=50059
jobTracker=rm.ambari:8050
mapreduce.job.user.name=admin
nameNode=hdfs://nn1.ambari:8020
oozie.use.system.libpath=True
oozie.wf.application.path=hdfs://nn1.ambari:8020/user/LJK/hiveExample
user.name=admin
```
A data file must be placed under hdfs://nn1.ambari:8020/user/LJK/hiveExample/input; the file name is up to you. Contents of inputdata:
```
1
2
3
4
6
7
8
9
```
After the job runs successfully, the output directory contains a file 000000_0 whose content matches inputdata.
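A quick way to verify, using the paths from this example:

```shell
# Print the Hive output file and compare it with inputdata
hdfs dfs -cat /user/LJK/hiveExample/output/000000_0
```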
This is basically the same as the Hive action; only workflow.xml needs to change:
```xml
<workflow-app name="My_Workflow" xmlns="uri:oozie:workflow:0.5">
    <start to="hive2-8f27"/>
    <kill name="Kill">
        <message>Action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <action name="hive2-8f27" cred="hive2">
        <hive2 xmlns="uri:oozie:hive2-action:0.1">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <prepare>
                <delete path="${nameNode}/user/LJK/hiveExample/output"/>
                <mkdir path="${nameNode}/user/LJK/hiveExample/output"/>
            </prepare>
            <job-xml>/user/LJK/hiveExample/hive-site.xml</job-xml>
            <jdbc-url>jdbc:hive2://rm.ambari:10000/default</jdbc-url>
            <script>/user/LJK/hiveExample/script.q</script>
            <param>INPUT=/user/LJK/hiveExample/input</param>
            <param>OUTPUT=/user/LJK/hiveExample/output</param>
        </hive2>
        <ok to="End"/>
        <error to="Kill"/>
    </action>
    <end name="End"/>
</workflow-app>
```
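If the Hive2 action cannot connect, it may help to first test the JDBC URL manually with beeline (authentication options depend on your cluster and are omitted here):

```shell
# Sanity-check that HiveServer2 is reachable at the URL used in the action
beeline -u jdbc:hive2://rm.ambari:10000/default
```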