More Hue/Oozie pitfalls: workflow and coordinator finally both run

I previously summarized some pitfalls with Sqoop 1, Oozie, and HBase under Hue. The project is due today, so the Oozie workflow and its scheduled execution absolutely have to work.

1. All the earlier Sqoop MySQL import/export pitfalls have already been covered. It turned out that CDH (5.15) not auto-configuring Sqoop 1 hardly matters: after configuring it by hand, installing the sharelib, and copying in the jars that were missing (sqoop, hbase, mysql, oozie, and so on), it basically runs in Hue too (Hue runs Sqoop through Oozie, and the Python-generated XML has an escaping bug, so the command must not contain quotes). At first the JDBC driver could never be found; following advice online I added the MySQL driver under Oozie's lib, libext, and libtools directories and under the various Sqoop lib directories, and it still failed. What finally fixed it was changing the proxy users in the HDFS core-site.xml:

<property><name>hadoop.proxyuser.hue.hosts</name><value>*</value></property>
<property><name>hadoop.proxyuser.hue.groups</name><value>*</value></property>
<property><name>hadoop.proxyuser.oozie.hosts</name><value>*</value></property>
<property><name>hadoop.proxyuser.oozie.groups</name><value>*</value></property>
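If you would rather not restart the whole cluster after changing the proxy-user settings, the NameNode and ResourceManager can reload them at runtime. This is a sketch; on CDH the safety-valve change still has to be saved and deployed through Cloudera Manager first:

```shell
# Reload proxy-user (impersonation) settings on the active NameNode
hdfs dfsadmin -refreshSuperUserGroupsConfiguration

# The YARN ResourceManager keeps its own copy of the proxy-user config
yarn rmadmin -refreshSuperUserGroupsConfiguration
```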

Manual Sqoop 1 configuration:

Sqoop 1 Client Service Environment Advanced Configuration Snippet (Safety Valve):

SQOOP_CONF_DIR=/etc/sqoop/conf
HADOOP_COMMON_HOME=/opt/cloudera/parcels/CDH/lib/hadoop
HADOOP_MAPRED_HOME=/opt/cloudera/parcels/CDH/lib/hadoop-mapreduce
HBASE_HOME=/opt/cloudera/parcels/CDH/lib/hbase
HIVE_HOME=/opt/cloudera/parcels/CDH/lib/hive
ZOOCFGDIR=/opt/cloudera/parcels/CDH/lib/zookeeper

Sqoop 1 Client Advanced Configuration Snippet (Safety Valve) for sqoop-conf/sqoop-env.sh:

#Set path to where bin/hadoop is available
export HADOOP_COMMON_HOME=/opt/cloudera/parcels/CDH/lib/hadoop

#Set path to where hadoop-*-core.jar is available
export HADOOP_MAPRED_HOME=/opt/cloudera/parcels/CDH/lib/hadoop-mapreduce

#set the path to where bin/hbase is available
export HBASE_HOME=/opt/cloudera/parcels/CDH/lib/hbase

#Set the path to where bin/hive is available
export HIVE_HOME=/opt/cloudera/parcels/CDH/lib/hive

#Set the path to where the zookeeper config dir is
export ZOOCFGDIR=/opt/cloudera/parcels/CDH/lib/zookeeper

export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:/opt/cloudera/parcels/SQOOP_TERADATA_CONNECTOR-1.7c5/lib/tdgssconfig.jar:/opt/cloudera/parcels/SQOOP_TERADATA_CONNECTOR-1.7c5/lib/terajdbc4.jar


export  SQOOP_CONF_DIR=/etc/sqoop/conf

2. Because the source is a relational database with in-place updates, importing into HBase deduplicates the modified rows that come back through incremental sync and keeps the data consistent. Then there are the type conversions: int, float(11,2), and datetime all have to map to Java types. int is the worst offender: some relational columns are tinyint, and without a mapping they land in HBase as true/false. The Hive warehouse doesn't even complain; the problem only surfaces once you query through Impala.
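A sketch of how those two fixes look on a Sqoop 1 import (the database, table, and column names here are made up for illustration): `tinyInt1isBit=false` on the JDBC URL stops MySQL TINYINT(1) columns from arriving as true/false, and `--map-column-java` pins any remaining columns to explicit Java types:

```shell
sqoop import \
  --connect "jdbc:mysql://master:3306/mydb?tinyInt1isBit=false" \
  --username bigdata --password-file /user/hue/.mysql_pw \
  --table orders \
  --hbase-table orders --column-family cf --hbase-row-key id \
  --map-column-java price=Double,created_at=String
```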

3. With the data in, Hive external tables were built over HBase and the computations began. Because the computation is incremental, tables have to be dropped, recreated, and exported to MySQL on a schedule. The export pitfalls were covered earlier too: the -columns option, the null-conversion parameters, and a custom class to settle the type issues once and for all. Another trap is that the export tables live in HDFS and every Hue user has different permissions. Hue does have a share feature, but it only shares the script text; a shared script doesn't execute correctly and throws HDFS permission errors. Adding every team member as a proxy user in core-site.xml, as above, silenced the errors, but problems remained. For example, drop table on a table someone else created removes it from Hive but leaves the HDFS directory behind, and creating data has similar issues. The conclusion: in Hue everyone should run their own programs and scripts, and for going live copy everything under a single account and run it there. There is no real collaboration, only sharing scripts for review along the way; you can't chain each other's pieces into one flow, because of differing permissions, temp directories that can't be created, and databases the proxy user can't touch.

4. With the import, compute, and output SQL scripts all ready, they had to be wired into a flow and scheduled. But at first Oozie, like Sqoop 1, would only run from the command line. Later a workflow did run in Hue, yet editing the configuration under the workspace as suggested online had no effect, and the configuration even got reverted. It turns out many Hue settings can't be changed by editing config files the way an open-source install allows: Hue may not read that file at all, keeps its own directories, and key parameters such as oozie.wf.application.path and oozie.coord.application.path have defaults. Worse, the cron schedule generated from the workflow designer's menu could not be submitted at all; it just said "undefined". (Screenshot omitted.)

After ruling out everything else, I went back and ran the OOZIE-examples from scratch. They still worked on the command line, and the copied examples still would not run in Hue. After trying every variation, I sat down and re-read the Oozie fundamentals a few times; two write-ups worth mentioning: http://shiyanjun.cn/archives/684.html and https://www.cnblogs.com/en-heng/p/5581331.html. Finally I understood that workflows and coordinators are peers, not a parent-child containment. So I created a brand-new coordinator from the Hue menu and then selected the workflow, and the schedule actually ran. The cron settings inside the workflow editor are a trap. Some posts online never actually ran in Hue at all; they just write workflow files inside Hue the way you would on the command line, and in my tests Hue 4.1 (CDH 5.15.1) simply won't run them: either a parameter is duplicated, or your configuration gets reverted after submission.

 

The correct way (screenshot omitted):

Note that Workflow-1 here is selected after creating the SCHEDULE; it is purely a matter of order. After that, the submission went through!

5. A few smaller pitfalls remain. For time zones, most advice online still uses UTC, but following the official docs I set this in the Oozie Server Advanced Configuration Snippet (Safety Valve) for oozie-site.xml:

<property><name>oozie.processing.timezone</name><value>GMT+0800</value></property>

The coordinator then switches to Shanghai time:

<coordinator-app name="MY_APP" frequency="${coord:minutes(2)}" start="${start}" end="${end}" timezone="Asia/Shanghai" xmlns="uri:oozie:coordinator:0.2">
   <action>
      <workflow>
         <app-path>${workflowAppUri}</app-path>
      </workflow>
   </action>
</coordinator-app>

But on submission it complained the format was wrong and wanted +0800:

Error: E1003 : E1003: Invalid coordinator application attributes, parameter [start] = [2018-09-19T16:35] must be Date in GMT+08:00 format (yyyy-MM-dd'T'HH:mm+0800). Parsing error java.text.ParseException: Could not parse [2018-09-19T16:35] using [yyyy-MM-dd'T'HH:mm+0800] mask
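The mask in the error message is easy to produce with GNU date, which helps when generating start/end values from a script (a sketch; assumes GNU coreutils):

```shell
# Format a timestamp in Oozie's yyyy-MM-dd'T'HH:mm+0800 mask
TZ=Asia/Shanghai date -d '2018-09-27 16:35' '+%Y-%m-%dT%H:%M+0800'
# → 2018-09-27T16:35+0800
```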
So job.properties became:

oozie.use.system.libpath=true
security_enabled=False
dryrun=False
send_email=False
jobTracker=master:8032
start=2018-09-27T16:35+0800
nameNode=hdfs://master:8020
end=2018-09-27T18:35+0800
workflowAppUri=${nameNode}/user/hue/oozie/apps/sqoop   # the auto-generated workflow doesn't write these two lines at all
oozie.coord.application.path=${nameNode}/user/hue/oozie/apps/sqoop   # ditto

The hand-written workflow definition:
<workflow-app name="My Workflow" xmlns="uri:oozie:workflow:0.5">
    <start to="sqoop-ace0"/>
    <kill name="Kill">
        <message>Action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <action name="sqoop-ace0">
        <sqoop xmlns="uri:oozie:sqoop-action:0.2">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <command> list-databases --connect jdbc:mysql://master:3306/ --username bigdata --password xxxxx </command>
        </sqoop>
        <ok to="End"/>
        <error to="Kill"/>
    </action>
    <end name="End"/>
</workflow-app>
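For reference, this hand-written pair is what I could submit straight from the command line with the Oozie client (the Oozie server URL is my cluster's; adjust as needed):

```shell
# Submit and start the coordinator defined by job.properties
oozie job -oozie http://master:11000/oozie -config job.properties -run

# Check coordinator status afterwards
oozie jobs -oozie http://master:11000/oozie -jobtype coordinator
```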

The workflow and job.properties that Hue itself created:

<workflow-app name="Workflow-1" xmlns="uri:oozie:workflow:0.5">
    <start to="hive-d593"/>
    <kill name="Kill">
        <message>Action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <action name="hive-d593" cred="hive2">
        <hive2 xmlns="uri:oozie:hive2-action:0.1">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <jdbc-url>jdbc:hive2://master:10000/default</jdbc-url>
            <script>${wf:appPath()}/hive-d593.sql</script>
        </hive2>
        <ok to="hive-9e6a"/>
        <error to="Kill"/>
    </action>
    <action name="hive-9e6a" cred="hive2">
        <hive2 xmlns="uri:oozie:hive2-action:0.1">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <jdbc-url>jdbc:hive2://master:10000/default</jdbc-url>
            <script>${wf:appPath()}/hive-9e6a.sql</script>
        </hive2>
        <ok to="hive-016c"/>
        <error to="Kill"/>
    </action>
    <action name="hive-016c" cred="hive2">
        <hive2 xmlns="uri:oozie:hive2-action:0.1">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <jdbc-url>jdbc:hive2://master:10000/default</jdbc-url>
            <script>${wf:appPath()}/hive-016c.sql</script>
        </hive2>
        <ok to="hive-02ec"/>
        <error to="Kill"/>
    </action>
    <action name="hive-3c77" cred="hive2">
        <hive2 xmlns="uri:oozie:hive2-action:0.1">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <jdbc-url>jdbc:hive2://master:10000/default</jdbc-url>
            <script>${wf:appPath()}/hive-3c77.sql</script>
        </hive2>
        <ok to="hive-6ffa"/>
        <error to="Kill"/>
    </action>
    <action name="hive-02ec" cred="hive2">
        <hive2 xmlns="uri:oozie:hive2-action:0.1">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <jdbc-url>jdbc:hive2://master:10000/default</jdbc-url>
            <script>${wf:appPath()}/hive-02ec.sql</script>
        </hive2>
        <ok to="hive-3c77"/>
        <error to="Kill"/>
    </action>
    <action name="hive-6ffa" cred="hive2">
        <hive2 xmlns="uri:oozie:hive2-action:0.1">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <jdbc-url>jdbc:hive2://master:10000/default</jdbc-url>
            <script>${wf:appPath()}/hive-6ffa.sql</script>
        </hive2>
        <ok to="hive-bbd4"/>
        <error to="Kill"/>
    </action>
    <action name="hive-bbd4" cred="hive2">
        <hive2 xmlns="uri:oozie:hive2-action:0.1">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <jdbc-url>jdbc:hive2://master:10000/default</jdbc-url>
            <script>${wf:appPath()}/hive-bbd4.sql</script>
        </hive2>
        <ok to="End"/>
        <error to="Kill"/>
    </action>
    <end name="End"/>
</workflow-app>
oozie.use.system.libpath=True
security_enabled=False
dryrun=False
start_date=2018-09-27T17:28
end_date=2018-09-27T18:28

And the auto-generated coordinator:

<coordinator-app name="Schedule-1"
  frequency="0,33,40 * * * *"
  start="${start_date}" end="${end_date}" timezone="Asia/Shanghai"
  xmlns="uri:oozie:coordinator:0.2"
  >
  <controls>
    <execution>FIFO</execution>
  </controls>
  <action>
    <workflow>
      <app-path>${wf_application_path}</app-path>
      <configuration>
        <property>
            <name>oozie.use.system.libpath</name>
            <value>True</value>
        </property>
        <property>
            <name>start_date</name>
            <value>${start_date}</value>
        </property>
        <property>
            <name>end_date</name>
            <value>${end_date}</value>
        </property>
      </configuration>
   </workflow>
  </action>
</coordinator-app>
Plus the configuration Hue submitted with it:

oozie.use.system.libpath True
security_enabled False
oozie.coord.application.path hdfs://master:8020/user/hue/oozie/deployments/_hue_-oozie-3509-1538040624.53
dryrun False
end_date 2018-09-27T18:28+0800
jobTracker master:8032
mapreduce.job.user.name hue
user.name hue
hue-id-c 3509
nameNode hdfs://master:8020
wf_application_path hdfs://master:8020/user/hue/oozie/workspaces/hue-oozie-1537954075.59
start_date 2018-09-27T17:28+0800

 

6. Other pitfalls hit while installing the production cluster

HiveServer2 looked up in CM, but nothing was listening on port 10000. The cause was nasty: the temporary-file directory under /tmp that it uses didn't have sufficient permissions.
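The fix amounts to restoring the sticky, world-writable mode that /tmp needs; sketched here on a throwaway directory so it is safe to try anywhere:

```shell
# /tmp must be world-writable with the sticky bit set (mode 1777),
# otherwise HiveServer2 cannot create its scratch files and never
# binds port 10000 even though CM shows the role as started.
d=$(mktemp -d)        # stand-in for /tmp; on a real node: chmod 1777 /tmp
chmod 1777 "$d"
stat -c '%a' "$d"     # → 1777
rm -rf "$d"
```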

7. HBase was unstable, and ZooKeeper kept reporting health-check problems: one moment the master was gone, the next a ZK node couldn't be written, then port 60000 was reported as taken. The master processes were actually all alive, but HBase was unreachable or complained about its ZooKeeper connection. In the end it was a ZooKeeper session/health-check timing issue; HBase became much more stable after adding the configuration below (configuration screenshot omitted):
