For the commands used to run a workflow, see the blog post https://www.jianshu.com/p/6cb3a4b78556 or type oozie help to view the built-in help.
The job.properties file holds the parameters that workflow.xml may reference.
job.properties
# Note: variable names must not contain special characters, otherwise Spark will fail to resolve them
# The path in oozie.wf.application.path must be on HDFS, because the whole cluster needs to access it
nameNode=hdfs://txz-data0:9820
resourceManager=txz-data0:8032
oozie.use.system.libpath=true
oozie.libpath=${nameNode}/share/lib/spark2/jars/,${nameNode}/share/lib/spark2/python/lib/,${nameNode}/share/lib/spark2/hive-site.xml
oozie.wf.application.path=${nameNode}/workflow/data-factory/download_report_voice_and_upload/Workflow
oozie.action.sharelib.for.spark=spark2
archive=${nameNode}/envs/py3.tar.gz#py
# If dryrun is true, the current workflow is only being tested and no corresponding job is actually recorded
dryrun=false
sparkMaster=yarn-cluster
sparkMode=cluster
scriptRoot=/workflow/data-factory/download_report_voice_and_upload/Python
sparkScriptBasename=download_parquet_from_data0_upload_online.py
sparkScript=${scriptRoot}/${sparkScriptBasename}
pysparkPath=py/py3/bin/python3
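To see what the ${...} references in job.properties expand to, the substitution can be imitated in a few lines of Python. This is only a rough sketch: parse_properties and resolve are hypothetical helpers written for this post, they handle plain key=value lines only, and Oozie's own parameter evaluation is more involved than this.

```python
import re

# Excerpt of the job.properties above (values copied from the post).
JOB_PROPERTIES = """
nameNode=hdfs://txz-data0:9820
scriptRoot=/workflow/data-factory/download_report_voice_and_upload/Python
sparkScriptBasename=download_parquet_from_data0_upload_online.py
sparkScript=${scriptRoot}/${sparkScriptBasename}
archive=${nameNode}/envs/py3.tar.gz#py
"""

PLACEHOLDER = re.compile(r"\$\{([^}]+)\}")


def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def resolve(props, max_passes=10):
    """Expand ${name} references repeatedly until no value changes.

    Unknown names are left untouched so that they remain easy to spot.
    """
    resolved = dict(props)
    for _ in range(max_passes):
        changed = False
        for key, value in list(resolved.items()):
            new_value = PLACEHOLDER.sub(
                lambda m: resolved.get(m.group(1), m.group(0)), value
            )
            if new_value != value:
                resolved[key] = new_value
                changed = True
        if not changed:
            break
    return resolved


if __name__ == "__main__":
    for name, value in resolve(parse_properties(JOB_PROPERTIES)).items():
        print(f"{name} = {value}")
```

Printing the resolved values before submitting makes a mistyped variable name easy to notice, because an unresolved reference stays in the output as a literal ${...}.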
The workflow.xml file
<!-- This supplies the parameters for the Oozie workflow; the variables used here come from job.properties by default -->
<workflow-app xmlns='uri:oozie:workflow:1.0' name='download_parquet_from_data0_upload_online'>
    <global>
        <resource-manager>${resourceManager}</resource-manager>
        <name-node>${nameNode}</name-node>
    </global>
    <start to='spark-node' />
    <action name='spark-node'>
        <spark xmlns="uri:oozie:spark-action:1.0">
            <master>${sparkMaster}</master>
            <mode>${sparkMode}</mode>
            <name>report_voice_download_pyspark</name>
            <jar>${sparkScriptBasename}</jar>
            <spark-opts>
                --conf spark.yarn.appMasterEnv.PYSPARK_PYTHON=${pysparkPath}
            </spark-opts>
            <file>${sparkScript}#${sparkScriptBasename}</file>
            <archive>${archive}</archive>
        </spark>
        <ok to="end" />
        <error to="fail" />
    </action>
    <kill name="fail">
        <message>
            Workflow failed, error message[${wf:errorMessage(wf:lastErrorNode())}]
        </message>
    </kill>
    <end name='end' />
</workflow-app>
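Since every ${name} in workflow.xml has to be supplied by job.properties, a quick check before submitting can catch a missing parameter. The sketch below is only an illustration: undefined_variables is a hypothetical helper, the XML is a shortened excerpt of the file above, and the DEFINED set deliberately leaves out archive to show what a miss looks like.

```python
import re

# Match simple ${name} placeholders only. EL function calls such as
# ${wf:errorMessage(...)} contain ':' and are deliberately not matched.
SIMPLE_VAR = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def undefined_variables(workflow_xml, defined_names):
    """Return the sorted placeholder names used in the XML but not defined."""
    used = set(SIMPLE_VAR.findall(workflow_xml))
    return sorted(used - set(defined_names))


# Shortened excerpt of the workflow.xml above.
WORKFLOW_SNIPPET = """
<spark xmlns="uri:oozie:spark-action:1.0">
    <master>${sparkMaster}</master>
    <mode>${sparkMode}</mode>
    <jar>${sparkScriptBasename}</jar>
    <file>${sparkScript}#${sparkScriptBasename}</file>
    <archive>${archive}</archive>
</spark>
<message>Workflow failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
"""

# Names assumed to be defined in job.properties; "archive" is omitted on purpose.
DEFINED = {"sparkMaster", "sparkMode", "sparkScriptBasename", "sparkScript"}

if __name__ == "__main__":
    print(undefined_variables(WORKFLOW_SNIPPET, DEFINED))  # prints ['archive']
```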
Put both files on the local disk for testing, for example in the folder /home/workflow/.
Run the following command to start this workflow:
oozie job -oozie http://txz-data0:11000/oozie -config /home/workflow/job.properties -run
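If the submission needs to be scripted, the same command can be assembled from Python. This is a minimal sketch under a few assumptions: build_oozie_run_command and submit are hypothetical helper names, only the flags shown in the command above are used, and submit works only where the oozie client is on PATH and the server is reachable.

```python
import shlex
import subprocess


def build_oozie_run_command(oozie_url, config_path):
    """Build the argument list for the `oozie job ... -run` command."""
    return ["oozie", "job", "-oozie", oozie_url, "-config", config_path, "-run"]


def submit(oozie_url, config_path):
    """Run the command and return whatever the oozie client prints.

    Raises subprocess.CalledProcessError if the client exits with an error.
    """
    result = subprocess.run(
        build_oozie_run_command(oozie_url, config_path),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


if __name__ == "__main__":
    command = build_oozie_run_command(
        "http://txz-data0:11000/oozie", "/home/workflow/job.properties"
    )
    # Show the command without executing it; call submit(...) to actually run it.
    print(shlex.join(command))
```

Passing the arguments as a list rather than a single shell string avoids quoting problems if the configuration path ever contains spaces.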
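A workflow configured by hand like this is not visible in Hue, so from here on the workflows are configured in Hue, and the Schedule is configured after that. The blog post https://blog.csdn.net/qq_22918243/article/details/89204111 covers the detailed configuration.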