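Overview
The main open-source workflow managers for Hadoop jobs today are Oozie and Azkaban. This article starts with Oozie: how to configure and install it, and the basics of how it runs jobs.
Configuration and installation
(Reference: https://segmentfault.com/a/1190000002738484)
1. First, download the source code and build it yourself.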
git clone https://github.com/apache/oozie.git
2. Edit the pom file to set the Scala and Hadoop versions.
<spark.scala.binary.version>2.11</spark.scala.binary.version>
<hadoop.version>2.7.0</hadoop.version>
3. Build
bin/mkdistro.sh -DskipTests -Dhadoop.version=2.7.0 -Pspark-2 -Phadoop-2
Note: to run Spark in YARN mode, the -Phadoop-2 profile is required. The build output is placed in the distro/target/oozie-4.3.0-distro/oozie-4.3.0 directory.
4. Install the Oozie server
(1) Create the libext directory.
cd distro/target/oozie-4.3.0-distro/oozie-4.3.0/
mkdir libext
(2) The Oozie server needs a JavaScript library (ExtJS 2.2), which can be downloaded from CSDN (http://download.csdn.net/detail/on_way_/8674059). After downloading, put ext-2.2.zip into the libext directory.
cp ~/Downloads/ext-2.2.zip libext/
(3) Copy the Hadoop jars into libext as well, for example:
cp ${HADOOP_HOME}/share/hadoop/*/*.jar libext/
cp ${HADOOP_HOME}/share/hadoop/*/lib/*.jar libext/
(4) Delete the following jars; they must not be in libext.
cd libext
rm jasper-compiler-5.5.23.jar
rm jasper-runtime-5.5.23.jar
rm jsp-api-2.1.jar
cd ../
(5) Running jobs through Hue + Oozie fails with java.lang.NoSuchFieldError: HADOOP_CLASSPATH. The fix (see http://stackoverflow.com/questions/41205447/oozie-example-map-reduce-job-fails-with-java-lang-nosuchfielderror-hadoop-class) is to remove the bundled Hadoop and Hive jars from oozie.war and repackage it:
mkdir tmp
cp oozie.war tmp
cd tmp
jar -xvf oozie.war
rm -f WEB-INF/lib/hadoop-*.jar
rm -f WEB-INF/lib/hive-*.jar
rm oozie.war
jar -cvf oozie.war ./*
cp oozie.war ../
cd ../
bin/oozie-setup.sh prepare-war
(6) Copy the Hadoop configuration into conf/hadoop-conf.
mkdir conf/hadoop-conf
cp ${HADOOP_HOME}/etc/hadoop/* conf/hadoop-conf
(7) Add the Spark configuration in conf/spark-conf.
mkdir conf/spark-conf
cp ${SPARK_HOME}/conf/* conf/spark-conf
(8) Add the proxy-user settings to conf/oozie-site.xml.
<property>
  <name>oozie.service.ProxyUserService.proxyuser.hadoop.groups</name>
  <value>*</value>
</property>
<property>
  <name>oozie.service.ProxyUserService.proxyuser.hadoop.hosts</name>
  <value>*</value>
</property>
To verify, open:
http://localhost:11000/oozie/v1/admin/configuration?timezone=America%2FLos_Angeles&user.name=hadoop&doAs=test
(9) Run the following command to upload the required jars (the sharelib) to HDFS.
bin/oozie-setup.sh sharelib create -fs hdfs://mycluster
(10) Start the server.
bin/oozied.sh start
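After the server is started it is worth confirming that it actually came up. The following is a minimal sketch, assuming the server listens on http://localhost:11000/oozie (the URL used elsewhere in this post) and that the script is run from the Oozie install directory. The helper is_oozie_normal is a hypothetical name introduced here for illustration; the check relies on the "System mode: NORMAL" line that the admin -status command prints for a healthy server.

```shell
# Assumption: default server URL; override OOZIE_URL if yours differs.
OOZIE_URL="${OOZIE_URL:-http://localhost:11000/oozie}"

# Hypothetical helper: succeeds only if the status text reports NORMAL mode.
is_oozie_normal() {
  printf '%s\n' "$1" | grep -q 'System mode: NORMAL'
}

# Ask the server for its status. "|| true" keeps the script going when the
# client is missing or the server is down, so the failure is reported below.
status="$(bin/oozie admin -oozie "$OOZIE_URL" -status 2>&1 || true)"

if is_oozie_normal "$status"; then
  echo "Oozie is up: $status"
else
  echo "Oozie is not ready yet: $status"
fi
```

If the check keeps failing, the server logs under the install directory are usually the next place to look.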
Trial run with Oozie
Once job.properties and workflow.xml are configured, you can run a Spark job.
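To make the configuration step concrete, here is a rough sketch of what a job.properties for the Spark example might contain, written out with a shell heredoc. The values are assumptions, not taken from the original setup: the NameNode address reuses hdfs://mycluster from the sharelib step, the jobTracker address and the local[*] master are placeholders, and APP_DIR is a hypothetical scratch directory. The one setting that matters in any setup using the sharelib is oozie.use.system.libpath=true.

```shell
# Hypothetical target directory; point it at your own workflow application.
APP_DIR="${APP_DIR:-./spark-app-sketch}"
mkdir -p "$APP_DIR"

# The quoted 'EOF' stops the shell from expanding ${...}; Oozie resolves
# those placeholders itself when the job is submitted.
cat > "$APP_DIR/job.properties" <<'EOF'
# Assumed cluster addresses, replace with your own.
nameNode=hdfs://mycluster
jobTracker=localhost:8032
# Assumed Spark master; change it to match how you want Spark to run.
master=local[*]
queueName=default
# Make the jars uploaded by "sharelib create" visible to the job.
oozie.use.system.libpath=true
# HDFS directory that holds workflow.xml (assumed layout).
oozie.wf.application.path=${nameNode}/user/${user.name}/examples/apps/spark
EOF

echo "Wrote $APP_DIR/job.properties"
```

The workflow.xml itself has to be uploaded to the HDFS path named by oozie.wf.application.path, while job.properties stays on the local machine and is passed with -config.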
cd oozie-4.3.0/examples/src/main/apps/spark
oozie job -oozie http://localhost:11000/oozie -config job.properties -run
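The submit command prints a line of the form "job: <id>". A variant that captures that ID and then asks the server for the job's status could look like the sketch below. The helper extract_job_id is a hypothetical name introduced here, and the server URL is the same assumed default as above; the -info subcommand is part of the standard oozie client.

```shell
# Assumption: default server URL; override OOZIE_URL if yours differs.
OOZIE_URL="${OOZIE_URL:-http://localhost:11000/oozie}"

# Hypothetical helper: pull the ID out of the "job: <id>" line printed by -run.
extract_job_id() {
  printf '%s\n' "$1" | sed -n 's/^job: *//p'
}

# Guarded so the sketch does nothing on machines without the oozie client.
if command -v oozie >/dev/null 2>&1; then
  run_output="$(oozie job -oozie "$OOZIE_URL" -config job.properties -run)"
  job_id="$(extract_job_id "$run_output")"
  echo "Submitted $job_id"
  # Show the workflow status and the state of each action.
  oozie job -oozie "$OOZIE_URL" -info "$job_id"
fi
```

Note that this submits the job again, so use it instead of the plain command above rather than after it.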
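Limitations of Oozie
In Oozie, fork and join nodes must appear in matched pairs, so some dependency graphs cannot be expressed directly. For example, the dependencies A->C, B->C, B->D cannot be written as-is: C needs both A and B, while D needs only B.
Finally, here is a flow diagram of how Oozie runs a job: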
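To illustrate the restriction, the sketch below writes out the closest approximation that paired fork and join nodes allow: run A and B in parallel, join, then run C and D in parallel. This is an assumption about how one might work around the limit, not something from the original setup. It over-constrains the graph, because D now also waits for A even though it only depends on B. The action bodies are left out, so the file is an outline rather than a deployable workflow; the file name and node names are hypothetical.

```shell
# Hypothetical output file for the outline.
OUT="${OUT:-./fork-join-sketch.xml}"

cat > "$OUT" <<'EOF'
<workflow-app name="fork-join-sketch" xmlns="uri:oozie:workflow:0.5">
  <start to="fork-ab"/>
  <fork name="fork-ab">
    <path start="A"/>
    <path start="B"/>
  </fork>
  <action name="A">
    <!-- action body omitted in this sketch -->
    <ok to="join-ab"/>
    <error to="fail"/>
  </action>
  <action name="B">
    <!-- action body omitted in this sketch -->
    <ok to="join-ab"/>
    <error to="fail"/>
  </action>
  <!-- The join waits for both A and B, so D cannot start after B alone. -->
  <join name="join-ab" to="fork-cd"/>
  <fork name="fork-cd">
    <path start="C"/>
    <path start="D"/>
  </fork>
  <action name="C">
    <!-- action body omitted in this sketch -->
    <ok to="join-cd"/>
    <error to="fail"/>
  </action>
  <action name="D">
    <!-- action body omitted in this sketch -->
    <ok to="join-cd"/>
    <error to="fail"/>
  </action>
  <join name="join-cd" to="end"/>
  <kill name="fail">
    <message>Workflow failed</message>
  </kill>
  <end name="end"/>
</workflow-app>
EOF

# Every fork in the outline has a matching join.
echo "forks: $(grep -c '<fork ' "$OUT"), joins: $(grep -c '<join ' "$OUT")"
```

The cost of this layout is lost parallelism: if A is slow, D sits idle until A finishes.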