Oozie Overview

[TOC]

Scheduling frameworks: Linux Crontab, Azkaban, Oozie, Zeus

A comparison of three task scheduling systems

Introduction

Oozie is a workflow scheduling system:

  • Workflows are scheduled as DAGs (directed acyclic graphs)
  • Scalable: each Oozie launcher runs as a MapReduce job, but map-only, with no reduce
  • Reliable: failed tasks can be retried
  • Integrates other Hadoop-ecosystem tasks, such as MR, Pig, Hive, Sqoop, and Spark

Main components

  • Tomcat (servlets handle requests and display jobs in the web UI)
  • A database (stores job definitions and state)
  • Bundle, Coordinator, Workflow

Architecture diagram


Three service modules

  • Oozie V3: a server-based Bundle engine: wraps multiple coordinators, so a group of coordinator jobs can be started, stopped, suspended, killed, or restarted together
  • Oozie V2: a server-based Coordinator engine: can run multiple workflows; structure: start->workflows->end
  • Oozie V1: a server-based Workflow engine; structure: start->mr->pig->fork->mr/hive->join->end

workflow


coordinator

A pitfall I ran into.
The error:
Error: E0505 : E0505: App definition [hdfs://localhost:8020/tmp/oozie-app/coordinator/] does not exist
This error message is misleading: it turned out the directory was not wrong at all; the coordinator.xml file had been misnamed.
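A quick way to rule this out is to list the app directory and confirm that a file named exactly coordinator.xml is present (a minimal check, using the path from the error message above):

hdfs dfs -ls hdfs://localhost:8020/tmp/oozie-app/coordinator/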


Preparation: unify time zones

Using UTC+8 time (GMT+0800) is recommended.

On the server, if date -R prints output like the line below, the machine is on UTC+8. If not, set the time zone, typically to Beijing or Shanghai, with ln -sf /usr/share/zoneinfo/Asia/Shanghai /etc/localtime
Sat, 30 Sep 2017 10:26:58 +0800

Next, edit oozie-site.xml; if this property does not exist, add it:

<property>
  <name>oozie.processing.timezone</name>
  <value>GMT+0800</value>
</property>

This makes the times shown in the web UI display correctly as well.


Examples

Spark Action

Workflow spark on yarn

For details, see the official Spark action documentation.

Directory layout

├── ooziespark
│   ├── job.properties
│   ├── lib
│   │   └── spark-1.6.2-1.0-SNAPSHOT.jar
│   └── workflow.xml

workflow.xml

<?xml version="1.0" encoding="utf-8"?>
<workflow-app xmlns="uri:oozie:workflow:0.5" name="SparkWordCount">  
  <start to="spark-node"/>  
  <action name="spark-node"> 
    <spark xmlns="uri:oozie:spark-action:0.1">  
      <job-tracker>${jobTracker}</job-tracker>  
      <name-node>${nameNode}</name-node>  
      <prepare> 
        <delete path="${outputdir}"/>
      </prepare>  
      <master>${master}</master>  
      <name>Spark-Wordcount</name>  
      <class>WordCount</class>  
      <jar>${nameNode}/user/LJK/ooziespark/lib/spark-1.6.2-1.0-SNAPSHOT.jar</jar>
      <spark-opts>--driver-memory 512M --executor-memory 512M</spark-opts>  
      <arg>${inputdir}</arg>  
      <arg>${outputdir}</arg> 
    </spark>  
    <ok to="end"/>  
    <error to="fail"/> 
  </action>  
  <kill name="fail"> 
    <message>Workflow failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message> 
  </kill>  
  <end name="end"/> 
</workflow-app>

job.properties

nameNode=hdfs://nn1:8020
jobTracker=rm:8050
master=yarn-cluster
queueName=default
inputdir=/user/LJK/hello-spark
outputdir=/user/LJK/output
oozie.use.system.libpath=true
oozie.wf.application.path=/user/LJK/ooziespark
#oozie.coord.application.path=${nameNode}/user/LJK/ooziespark
#start=2017-09-28T17:00+0800
#end=2017-09-30T17:00+0800
#workflowAppUri=${nameNode}/user/LJK/ooziespark/
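oozie.use.system.libpath=true lets the job pick up the Spark sharelib. On recent Oozie versions you can confirm the sharelib is actually installed (a sketch, reusing the server address from the commands below):

oozie admin -oozie http://rm:11000/oozie -shareliblist spark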

Package the program and copy the jar into the app's lib directory. The test source code is as follows:

import org.apache.spark.{SparkConf, SparkContext}

object WordCount {

  def main(args: Array[String]): Unit = {

    val conf = new SparkConf()
//      .setJars(List("/Users/LJK/Documents/code/github/study-spark1.6.2/target/spark-1.6.2-1.0-SNAPSHOT.jar"))
//      .set("spark.yarn.historyServer.address", "rm:18080")
//      .set("spark.eventLog.enabled", "true")
//      .set("spark.eventLog.dir", "hdfs://nn1:8020/spark-history")
      .set("spark.testing.memory", "1073741824")
    val sc = new SparkContext(conf)
    // Classic word count: read the input dir (args(0)), split on spaces,
    // count occurrences, and write the result to the output dir (args(1)).
    val rdd = sc.textFile(args(0))
      .flatMap(_.split(" "))
      .map((_, 1))
      .reduceByKey(_ + _)
    rdd.saveAsTextFile(args(1))
    sc.stop()
  }
}

Upload this directory to HDFS: hdfs dfs -put ooziespark /user/LJK/
Note: job.properties does not need to be uploaded to HDFS, because the CLI reads the local copy, not the one on HDFS.

Start the job with Oozie:
oozie job -oozie http://rm:11000/oozie -config /usr/local/share/applications/ooziespark/job.properties -run
or
oozie job -config /usr/local/share/applications/ooziespark/job.properties -run
The short form requires that you configure the env variable OOZIE_URL, which "is used as default value for the '-oozie' option"; see oozie help for details.
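For example (a minimal sketch, assuming the server address used above):

export OOZIE_URL=http://rm:11000/oozie
oozie job -config /usr/local/share/applications/ooziespark/job.properties -run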

View the job's execution in the Oozie web UI.

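The same status is also available from the CLI (the workflow job ID below is hypothetical; use the one printed by -run):

oozie job -info 0000000-170930000000000-oozie-oozi-W
oozie job -log 0000000-170930000000000-oozie-oozi-W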

Coordinator spark on yarn


A simple schedule: run WordCount every five minutes.

Directory layout

├── ooziecoor
│   ├── coordinator.xml
│   ├── job.properties
│   ├── lib
│   │   └── spark-1.6.2-1.0-SNAPSHOT.jar
│   └── workflow.xml

coordinator.xml

<coordinator-app name="cron-coord" frequency="${coord:minutes(5)}" start="${start}" end="${end}" timezone="GMT+0800"
                 xmlns="uri:oozie:coordinator:0.4">
    <action>
        <workflow>
            <app-path>${workflowAppUri}</app-path>
            <configuration>
                <property>
                    <name>jobTracker</name>
                    <value>${jobTracker}</value>
                </property>
                <property>
                    <name>nameNode</name>
                    <value>${nameNode}</value>
                </property>
                <property>
                    <name>queueName</name>
                    <value>${queueName}</value>
                </property>
            </configuration>
        </workflow>
    </action>
</coordinator-app>

Modify the earlier job.properties to:

nameNode=hdfs://nn1:8020
jobTracker=rm:8050
master=yarn-cluster
queueName=default
inputdir=/user/LJK/hello-spark
outputdir=/user/LJK/output
oozie.use.system.libpath=true
#oozie.wf.application.path=/user/LJK/ooziespark
oozie.coord.application.path=${nameNode}/user/LJK/ooziecoor
start=2017-09-30T09:30+0800
end=2017-09-30T17:00+0800
workflowAppUri=${nameNode}/user/LJK/ooziecoor
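Before submitting for real, a dryrun shows what the coordinator would materialize without actually scheduling anything (a sketch, using the oozie CLI):

oozie job -dryrun -config /usr/local/share/applications/ooziecoor/job.properties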

The earlier workflow.xml can be kept as-is; leaving the jar location unchanged also works, but to keep each job self-contained it is nicer to update the jar path to the new directory.

Upload the directory to HDFS and run:
oozie job -config /usr/local/share/applications/ooziecoor/job.properties -run

The job can be viewed in the web UI.

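The coordinator can also be managed from the CLI (a sketch; the coordinator job ID below is hypothetical, use the one printed by -run):

oozie jobs -jobtype coordinator
oozie job -suspend 0000001-170930000000000-oozie-oozi-C
oozie job -resume 0000001-170930000000000-oozie-oozi-C
oozie job -kill 0000001-170930000000000-oozie-oozi-C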

Bundle spark on yarn

Directory layout

├── ooziebundle
│   ├── bundle.xml
│   ├── coordinator.xml
│   ├── job.properties
│   ├── lib
│   │   └── spark-1.6.2-1.0-SNAPSHOT.jar
│   └── workflow.xml

增長bundle.xml

<bundle-app name='bundle-app' xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance' xmlns='uri:oozie:bundle:0.1'>
    <coordinator name='coord-1'>
        <app-path>${nameNode}/user/LJK/ooziebundle/coordinator.xml</app-path>
        <configuration>
            <property>
                <name>start</name>
                <value>${start}</value>
            </property>
            <property>
                <name>end</name>
                <value>${end}</value>
            </property>
        </configuration>
    </coordinator>
</bundle-app>

Modify job.properties:

nameNode=hdfs://nn1:8020
jobTracker=rm:8050
master=yarn-cluster
queueName=default
inputdir=/user/LJK/hello-spark
outputdir=/user/LJK/output
oozie.use.system.libpath=true
#oozie.wf.application.path=/user/LJK/ooziespark
#oozie.coord.application.path=${nameNode}/user/LJK/ooziecoor
oozie.bundle.application.path=${nameNode}/user/LJK/ooziebundle
start=2017-09-30T09:30+0800
end=2017-09-30T17:00+0800
workflowAppUri=${nameNode}/user/LJK/ooziebundle

Upload the directory to HDFS and run:
oozie job -config /usr/local/share/applications/ooziebundle/job.properties -run

View the job in the web UI.



Java Action

Directory layout. The dependencies in lib are not packed into a single jar, so they are not listed here; you can choose to build a single jar instead.

javaExample/
├── job.properties
├── lib
└── workflow.xml

Note:
If you use the Spring Boot framework, you need to add exclusions to the pom; otherwise there will be jar conflicts and Oozie will fail with errors.

<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter</artifactId>
    <exclusions>
        <exclusion>
            <artifactId>spring-boot-starter-logging</artifactId>
            <groupId>org.springframework.boot</groupId>
        </exclusion>
    </exclusions>
</dependency>

workflow.xml

<workflow-app name="My_Workflow" xmlns="uri:oozie:workflow:0.5">
    <start to="java-2d81"/>
    <kill name="Kill">
        <message>Action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <action name="java-2d81">
        <java>
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <main-class>com.sharing.App</main-class>
            <arg>hello</arg>
            <arg>springboot</arg>
        </java>
        <ok to="End"/>
        <error to="Kill"/>
    </action>
    <end name="End"/>
</workflow-app>

job.properties

oozie.use.system.libpath=false
queueName=default
jobTracker=rm.ambari:8050
nameNode=hdfs://nn1.ambari:8020
oozie.wf.application.path=${nameNode}/user/LJK/javaExample

Java program source:

import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;

@SpringBootApplication
public class App {

    public static void main(String[] args) {
        // Start the Spring context, then print the two arguments passed
        // in from the Oozie <arg> elements ("hello" and "springboot").
        SpringApplication.run(App.class, args);
        System.out.println(args[0] + " " + args[1]);
    }
}
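One way to populate lib is to build with Maven, gather the dependency jars, and then upload (a sketch assuming a Maven project; the artifact name is illustrative):

mvn clean package
cp target/your-app.jar lib/                              # hypothetical jar name
mvn dependency:copy-dependencies -DoutputDirectory=lib   # pull dependencies into lib
hdfs dfs -put javaExample /user/LJK/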

Shell Action

Directory layout

shell
├── job.properties
└── workflow.xml

workflow.xml

<workflow-app name="My_Workflow" xmlns="uri:oozie:workflow:0.5">
    <start to="shell-2504"/>
    <kill name="Kill">
        <message>Action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <action name="shell-2504">
<shell xmlns="uri:oozie:shell-action:0.1">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <exec>echo</exec>
            <argument>hello shell</argument>
            <capture-output/>
        </shell>
        <ok to="End"/>
        <error to="Kill"/>
    </action>
    <end name="End"/>
</workflow-app>
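A note on <capture-output/>: it only yields usable data when the command prints key=value lines (Java properties format), which later nodes can read through the wf:actionData EL function. A hedged variant of the action above (the key name "result" is made up for illustration):

<exec>echo</exec>
<argument>result=hello shell</argument>
<capture-output/>

A downstream node could then reference ${wf:actionData('shell-2504')['result']}.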

job.properties

hue-id-w=50057
jobTracker=rm.ambari:8050
mapreduce.job.user.name=admin
nameNode=hdfs://nn1.ambari:8020
oozie.use.system.libpath=True
oozie.wf.application.path=hdfs://nn1.ambari:8020/user/LJK/shell
user.name=admin

Hive Action

Directory layout

hiveExample/
├── hive-site.xml
├── input
│   └── inputdata
├── job.properties
├── output
├── script.q
└── workflow.xml

Hive script: write a Hive script (the file name is up to you).
Contents of script.q:

DROP TABLE IF EXISTS test;
CREATE EXTERNAL TABLE test (a INT) STORED AS TEXTFILE LOCATION '${INPUT}';
INSERT OVERWRITE DIRECTORY '${OUTPUT}' SELECT * FROM test;

workflow.xml

<workflow-app name="My_Workflow" xmlns="uri:oozie:workflow:0.5">
    <start to="hive-bfbc"/>
    <kill name="Kill">
        <message>Action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <action name="hive-bfbc" cred="hcat">
        <hive xmlns="uri:oozie:hive-action:0.2">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
<prepare>
                <delete path="${nameNode}/user/LJK/hiveExample/output"/>
                <mkdir path="${nameNode}/user/LJK/hiveExample/output"/>
            </prepare>
            <job-xml>/user/LJK/hiveExample/hive-site.xml</job-xml>
            <script>/user/LJK/hiveExample/script.q</script>
            <param>INPUT=/user/LJK/hiveExample/input</param>
            <param>OUTPUT=/user/LJK/hiveExample/output</param>
        </hive>
        <ok to="End"/>
        <error to="Kill"/>
    </action>
    <end name="End"/>
</workflow-app>

job.properties

hue-id-w=50059
jobTracker=rm.ambari:8050
mapreduce.job.user.name=admin
nameNode=hdfs://nn1.ambari:8020
oozie.use.system.libpath=True
oozie.wf.application.path=hdfs://nn1.ambari:8020/user/LJK/hiveExample
user.name=admin

A file must be placed under hdfs://nn1.ambari:8020/user/LJK/hiveExample/input (the file name is up to you).
Contents of inputdata:

1
2
3
4
6
7
8
9

After a successful run, the output directory contains a generated file 000000_0 whose contents match inputdata.
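A quick way to check the result (a sketch):

hdfs dfs -cat /user/LJK/hiveExample/output/000000_0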

Hive2 Action

Basically the same as the Hive action; only workflow.xml needs to change:

<workflow-app name="My_Workflow" xmlns="uri:oozie:workflow:0.5">
    <start to="hive2-8f27"/>
    <kill name="Kill">
        <message>Action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <action name="hive2-8f27" cred="hive2">
        <hive2 xmlns="uri:oozie:hive2-action:0.1">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
<prepare>
                <delete path="${nameNode}/user/LJK/hiveExample/output"/>
                <mkdir path="${nameNode}/user/LJK/hiveExample/output"/>
            </prepare>
            <job-xml>/user/LJK/hiveExample/hive-site.xml</job-xml>
            <jdbc-url>jdbc:hive2://rm.ambari:10000/default</jdbc-url>
            <script>/user/LJK/hiveExample/script.q</script>
            <param>INPUT=/user/LJK/hiveExample/input</param>
            <param>OUTPUT=/user/LJK/hiveExample/output</param>
        </hive2>
        <ok to="End"/>
        <error to="Kill"/>
    </action>
    <end name="End"/>
</workflow-app>
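If the hive2 action fails to connect, the JDBC URL can be sanity-checked outside Oozie with beeline (a sketch):

beeline -u jdbc:hive2://rm.ambari:10000/default -e "show tables;"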
