Job scheduling with Oozie

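A while ago I wrote a MapReduce program that eventually needed job scheduling. I wasn't sure which approach would suit it best, and in the end I chose Oozie.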

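I had always written web applications before, until the bosses suddenly asked me to work with Hadoop. I happily took on the job, and as a newcomer I ran into a great many pitfalls along the way...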

Environment: Hadoop 1.2.1, Sqoop 1.4.4, Oozie 3.3.2

1. For installing Oozie, see my earlier article, which covers the pitfalls I ran into back then: http://blog.csdn.net/jueshengtianya/article/details/25300761

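2. An Oozie workflow looks for the jars it runs in the lib directory that sits next to it (in the same directory as workflow.xml). Every dependency jar the workflow needs is also looked up under this path.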

3. My Oozie working directory on HDFS:

    hadoop@steven:~/hadoop1.1.2/hadoop-1.2.1/iesRunShell/oozie/iesCron$ ../../../bin/hadoop fs -ls /ies/oozie/cron/
    Found 4 items
    -rw-r--r--   3 hadoop supergroup       1591 2014-05-12 19:37 /ies/oozie/cron/coordinator.xml
    -rw-r--r--   3 hadoop supergroup       1032 2014-05-10 20:12 /ies/oozie/cron/job.properties
    drwxr-xr-x   - hadoop supergroup          0 2014-05-13 21:41 /ies/oozie/cron/lib
    -rw-r--r--   3 hadoop supergroup       5450 2014-05-13 20:13 /ies/oozie/cron/workflow.xml
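
Since the workflow only sees jars under the lib directory next to workflow.xml, a quick local check before uploading the application directory may save a failed run. This is a minimal sketch of my own, not an Oozie tool: the class `LibDirCheck` is hypothetical, and it assumes you keep a local copy of the application directory (workflow.xml plus lib/) that you later upload with `hadoop fs -put`. It opens every `*.jar` under `lib/` with `java.util.jar.JarFile` and reports which `<main-class>` values have no matching `.class` entry.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.jar.JarFile;

// Hypothetical pre-upload check: do the jars in <appDir>/lib contain the
// classes named in the workflow's <main-class> elements?
final class LibDirCheck {

    /** Returns the main classes (in input order) that no jar under appDir/lib contains. */
    static List<String> missingMainClasses(Path appDir, List<String> mainClasses) {
        Path lib = appDir.resolve("lib");
        Set<String> missing = new LinkedHashSet<>(mainClasses);
        if (!Files.isDirectory(lib)) {
            return new ArrayList<>(missing); // no lib/ at all, so nothing can be found
        }
        try (DirectoryStream<Path> jars = Files.newDirectoryStream(lib, "*.jar")) {
            for (Path jar : jars) {
                try (JarFile jf = new JarFile(jar.toFile())) {
                    // com.miaozhen.ies.job.JoinJob -> com/miaozhen/ies/job/JoinJob.class
                    missing.removeIf(cls -> jf.getEntry(cls.replace('.', '/') + ".class") != null);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new ArrayList<>(missing);
    }
}
```

For example, `LibDirCheck.missingMainClasses(Paths.get("iesCron"), Arrays.asList("com.miaozhen.ies.job.IesJob4MZSEQ", "com.miaozhen.ies.job.JoinJob", "com.miaozhen.ies.job.IesResultJob", "com.miaozhen.ies.job.ResultJob"))` should return an empty list when all four classes are packaged in lib/. Here `iesCron` stands for the assumed local copy of the application directory. The check only covers the main classes, not their own dependencies.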

4. My workflow file is configured as follows. There isn't much to explain, so here it is:

    <workflow-app xmlns="uri:oozie:workflow:0.2" name="java-main-wf">
        <start to="firstMid"/>

        <!-- Generate the first intermediate result -->
        <action name="firstMid">
            <java>
                <job-tracker>${jobTracker}</job-tracker>
                <name-node>${nameNode}</name-node>
                <configuration>
                    <property>
                        <name>mapred.job.queue.name</name>
                        <value>${queueName}</value>
                    </property>
                </configuration>
                <main-class>com.miaozhen.ies.job.IesJob4MZSEQ</main-class>
                <arg>/ies/output/mid</arg>
            </java>
            <ok to="joinLog"/>
            <error to="fail"/>
        </action>

        <!-- Join the intermediate result with the current day's logs -->
        <action name="joinLog">
            <java>
                <job-tracker>${jobTracker}</job-tracker>
                <name-node>${nameNode}</name-node>
                <configuration>
                    <property>
                        <name>mapred.job.queue.name</name>
                        <value>${queueName}</value>
                    </property>
                </configuration>
                <main-class>com.miaozhen.ies.job.JoinJob</main-class>
                <arg>/ies/join</arg>
            </java>
            <ok to="generateResult"/>
            <error to="fail"/>
        </action>

        <fork name="generateResult">
            <path start="iesResult"/>
            <path start="spidResult"/>
        </fork>

        <!-- Generate the results -->
        <action name="iesResult">
            <java>
                <job-tracker>${jobTracker}</job-tracker>
                <name-node>${nameNode}</name-node>
                <configuration>
                    <property>
                        <name>mapred.job.queue.name</name>
                        <value>${queueName}</value>
                    </property>
                </configuration>
                <main-class>com.miaozhen.ies.job.IesResultJob</main-class>
                <arg>/ies/join/joinResult/iesResult-r-00000</arg>
                <arg>/ies/iesResult</arg>
            </java>
            <ok to="completed"/>
            <error to="fail"/>
        </action>

        <action name="spidResult">
            <java>
                <job-tracker>${jobTracker}</job-tracker>
                <name-node>${nameNode}</name-node>
                <configuration>
                    <property>
                        <name>mapred.job.queue.name</name>
                        <value>${queueName}</value>
                    </property>
                </configuration>
                <main-class>com.miaozhen.ies.job.ResultJob</main-class>
                <arg>/ies/join/joinResult/iesResult-r-00000</arg>
                <arg>/ies/spidResult</arg>
            </java>
            <ok to="completed"/>
            <error to="fail"/>
        </action>

        <join name="completed" to="sqoopResult"/>

        <fork name="sqoopResult">
            <path start="sqoopIesResult"/>
            <path start="sqoopSpidResult"/>
            <path start="sqoopRelationResult"/>
        </fork>

        <action name="sqoopIesResult">
            <sqoop xmlns="uri:oozie:sqoop-action:0.2">
                <job-tracker>${jobTracker}</job-tracker>
                <name-node>${nameNode}</name-node>
                <command>
                    export --connect jdbc:mysql://127.0.0.1:3306/ies2 --username root --table ies_report --export-dir /ies/iesResult/data/iesRegionResult-r-00000 --columns iesId,caid,imp3rd,clk3rd,period,regionId,insertTime
                </command>
            </sqoop>
            <ok to="sqoopCompleted"/>
            <error to="fail"/>
        </action>

        <action name="sqoopSpidResult">
            <sqoop xmlns="uri:oozie:sqoop-action:0.2">
                <job-tracker>${jobTracker}</job-tracker>
                <name-node>${nameNode}</name-node>
                <command>
                    export --connect jdbc:mysql://127.0.0.1:3306/ies2 --username root --table spots_report --export-dir /ies/spidResult/spid/spidResult-r-00000 --columns spid,impIES,clkIES,insertTime
                </command>
            </sqoop>
            <ok to="sqoopCompleted"/>
            <error to="fail"/>
        </action>

        <action name="sqoopRelationResult">
            <sqoop xmlns="uri:oozie:sqoop-action:0.2">
                <job-tracker>${jobTracker}</job-tracker>
                <name-node>${nameNode}</name-node>
                <command>
                    export --connect jdbc:mysql://127.0.0.1:3306/ies2 --username root --table relation --export-dir /ies/spidResult/spid/relation-r-00000 --columns iesId,spid,insertTime
                </command>
            </sqoop>
            <ok to="sqoopCompleted"/>
            <error to="fail"/>
        </action>

        <join name="sqoopCompleted" to="end"/>

        <kill name="fail">
            <message>Java failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
        </kill>
        <end name="end"/>
    </workflow-app>

5. A note on calling Sqoop from Oozie:

    export --connect jdbc:mysql://127.0.0.1:3306/ies2 --username root --table ies_report --export-dir /ies/iesResult/data/iesRegionResult-r-00000 --columns iesId,caid,imp3rd,clk3rd,period,regionId,insertTime

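When exporting insertTime, make sure the time is written in the format yyyy-MM-dd HH:mm:ss. When Sqoop inserts a time value it converts the date into a java.sql.Timestamp, so if you do not keep the hours, minutes and seconds, the export throws the following error: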

    java.io.IOException: Can't export data, please check task tracker logs
        at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:112)
        at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:39)
        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
        at org.apache.sqoop.mapreduce.AutoProgressMapper.run(AutoProgressMapper.java:64)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:364)
        at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
        at org.apache.hadoop.mapred.Child.main(Child.java:249)
    Caused by: java.lang.IllegalArgumentException: Timestamp format must be yyyy-mm-dd hh:mm:ss[.fffffffff]
        at java.sql.Timestamp.valueOf(Timestamp.java:202)
        at spots_report.__loadFromFields(spots_report.java:266)
        at spots_report.parse(spots_report.java:203)
        at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:83)
        ... 10 more
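
The `Caused by` line shows that the class Sqoop generated for the table (`spots_report`) hands the insertTime field to `java.sql.Timestamp.valueOf`, which rejects anything not in the form `yyyy-mm-dd hh:mm:ss[.fffffffff]`. A minimal stand-alone reproduction (the class name `TimestampCheck` is only for illustration) shows the difference without running Sqoop:

```java
import java.sql.Timestamp;

// Illustrative check: would java.sql.Timestamp.valueOf, the call that fails
// in the stack trace above, accept this insertTime string?
final class TimestampCheck {

    static boolean acceptedByTimestampValueOf(String insertTime) {
        try {
            Timestamp.valueOf(insertTime);
            return true;
        } catch (IllegalArgumentException e) {
            // Same exception type as the "Caused by" line: time part missing or malformed
            return false;
        }
    }
}
```

`acceptedByTimestampValueOf("2014-05-13")` returns false, while `acceptedByTimestampValueOf("2014-05-13 00:00:00")` returns true, matching the behavior described above.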

If your output uses yyyy-MM-dd HH:mm:ss rather than plain yyyy-MM-dd, Sqoop's date conversion works without problems.
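
On the MapReduce side, this means formatting insertTime with the full pattern before writing each record. The sketch below assumes the job writes comma-separated text (which, as far as I know, is Sqoop's default field delimiter when `--input-fields-terminated-by` is not given) with the fields in the same order as the `--columns` list of the spots_report export (spid,impIES,clkIES,insertTime). The class and method names are hypothetical.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;

// Hypothetical helper for the job whose output feeds the spots_report export.
// Assumption: fields are comma-separated and follow the Sqoop --columns order
// spid,impIES,clkIES,insertTime.
final class SpotsReportLine {

    // Keep HH:mm:ss: a date-only value makes Timestamp.valueOf fail during export.
    private static final String INSERT_TIME_PATTERN = "yyyy-MM-dd HH:mm:ss";

    static String format(String spid, long impIES, long clkIES, Date insertTime) {
        // SimpleDateFormat is not thread-safe, so build one per call. Locale.ROOT keeps
        // digits and calendar locale-independent; the JVM's default time zone is used.
        SimpleDateFormat fmt = new SimpleDateFormat(INSERT_TIME_PATTERN, Locale.ROOT);
        return spid + "," + impIES + "," + clkIES + "," + fmt.format(insertTime);
    }
}
```

With a `Date` for 2014-05-13 21:41:00 in the JVM's default time zone, `format("spid001", 120, 7, date)` yields `spid001,120,7,2014-05-13 21:41:00` (the spid and counts are made-up sample values). If the job only has a date, writing it as `yyyy-MM-dd 00:00:00` also satisfies `Timestamp.valueOf`.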
