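A while ago I wrote a MapReduce program and eventually needed to schedule it as a job. I wasn't sure which approach would fit best, and in the end I chose Oozie.
I used to write web applications, until my bosses suddenly asked me to work with Hadoop, so I happily took on the job. As a newcomer I ran into a great many pitfalls along the way...
Environment: Hadoop 1.2.1, Sqoop 1.4.4, Oozie 3.3.2
1. For installing Oozie, please refer to my earlier post: http://blog.csdn.net/jueshengtianya/article/details/25300761 It covers the pitfalls I ran into before.
2. An Oozie workflow looks for the jar it runs in the lib directory that sits alongside workflow.xml; every jar the workflow depends on is resolved from that same path.
3. My Oozie working directory: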
- hadoop@steven:~/hadoop1.1.2/hadoop-1.2.1/iesRunShell/oozie/iesCron$ ../../../bin/hadoop fs -ls /ies/oozie/cron/
- Found 4 items
- -rw-r--r-- 3 hadoop supergroup 1591 2014-05-12 19:37 /ies/oozie/cron/coordinator.xml
- -rw-r--r-- 3 hadoop supergroup 1032 2014-05-10 20:12 /ies/oozie/cron/job.properties
- drwxr-xr-x - hadoop supergroup 0 2014-05-13 21:41 /ies/oozie/cron/lib
- -rw-r--r-- 3 hadoop supergroup 5450 2014-05-13 20:13 /ies/oozie/cron/workflow.xml
4. My workflow file is configured as follows. There isn't much to explain, so just take a look:
- <workflow-app xmlns="uri:oozie:workflow:0.2" name="java-main-wf">
- <start to="firstMid"/>
-
- <!-- Generate the first intermediate result -->
- <action name="firstMid">
- <java>
- <job-tracker>${jobTracker}</job-tracker>
- <name-node>${nameNode}</name-node>
- <configuration>
- <property>
- <name>mapred.job.queue.name</name>
- <value>${queueName}</value>
- </property>
- </configuration>
- <main-class>com.miaozhen.ies.job.IesJob4MZSEQ</main-class>
- <arg>/ies/output/mid</arg>
- </java>
- <ok to="joinLog"/>
- <error to="fail"/>
- </action>
-
- <!-- Join the intermediate result with the current day's logs -->
- <action name="joinLog">
- <java>
- <job-tracker>${jobTracker}</job-tracker>
- <name-node>${nameNode}</name-node>
- <configuration>
- <property>
- <name>mapred.job.queue.name</name>
- <value>${queueName}</value>
- </property>
- </configuration>
- <main-class>com.miaozhen.ies.job.JoinJob</main-class>
- <arg>/ies/join</arg>
- </java>
- <ok to="generateResult"/>
- <error to="fail"/>
- </action>
-
- <fork name="generateResult">
- <path start="iesResult"/>
- <path start="spidResult"/>
- </fork>
-
- <!-- Generate the results -->
- <action name="iesResult">
- <java>
- <job-tracker>${jobTracker}</job-tracker>
- <name-node>${nameNode}</name-node>
- <configuration>
- <property>
- <name>mapred.job.queue.name</name>
- <value>${queueName}</value>
- </property>
- </configuration>
- <main-class>com.miaozhen.ies.job.IesResultJob</main-class>
- <arg>/ies/join/joinResult/iesResult-r-00000</arg>
- <arg>/ies/iesResult</arg>
- </java>
- <ok to="completed"/>
- <error to="fail"/>
- </action>
-
- <action name="spidResult">
- <java>
- <job-tracker>${jobTracker}</job-tracker>
- <name-node>${nameNode}</name-node>
- <configuration>
- <property>
- <name>mapred.job.queue.name</name>
- <value>${queueName}</value>
- </property>
- </configuration>
- <main-class>com.miaozhen.ies.job.ResultJob</main-class>
- <arg>/ies/join/joinResult/iesResult-r-00000</arg>
- <arg>/ies/spidResult</arg>
- </java>
- <ok to="completed"/>
- <error to="fail"/>
- </action>
-
- <join name="completed" to="sqoopResult"/>
-
- <fork name="sqoopResult">
- <path start="sqoopIesResult"/>
- <path start="sqoopSpidResult"/>
- <path start="sqoopRelationResult"/>
- </fork>
-
-
- <action name="sqoopIesResult">
- <sqoop xmlns="uri:oozie:sqoop-action:0.2">
- <job-tracker>${jobTracker}</job-tracker>
- <name-node>${nameNode}</name-node>
- <command>
- export --connect jdbc:mysql://127.0.0.1:3306/ies2 --username root --table ies_report --export-dir /ies/iesResult/data/iesRegionResult-r-00000 --columns iesId,caid,imp3rd,clk3rd,period,regionId,insertTime
- </command>
- </sqoop>
- <ok to="sqoopCompleted"/>
- <error to="fail"/>
- </action>
-
- <action name="sqoopSpidResult">
- <sqoop xmlns="uri:oozie:sqoop-action:0.2">
- <job-tracker>${jobTracker}</job-tracker>
- <name-node>${nameNode}</name-node>
- <command>
- export --connect jdbc:mysql://127.0.0.1:3306/ies2 --username root --table spots_report --export-dir /ies/spidResult/spid/spidResult-r-00000 --columns spid,impIES,clkIES,insertTime
- </command>
- </sqoop>
- <ok to="sqoopCompleted"/>
- <error to="fail"/>
- </action>
-
- <action name="sqoopRelationResult">
- <sqoop xmlns="uri:oozie:sqoop-action:0.2">
- <job-tracker>${jobTracker}</job-tracker>
- <name-node>${nameNode}</name-node>
- <command>
- export --connect jdbc:mysql://127.0.0.1:3306/ies2 --username root --table relation --export-dir /ies/spidResult/spid/relation-r-00000 --columns iesId,spid,insertTime
- </command>
- </sqoop>
- <ok to="sqoopCompleted"/>
- <error to="fail"/>
- </action>
-
- <join name="sqoopCompleted" to="end"/>
-
- <kill name="fail">
- <message>Java failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
- </kill>
- <end name="end"/>
- </workflow-app>
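Each `<java>` action in the workflow above names a main class and passes the `<arg>` values to its `main` method in order. The following is only a minimal sketch of what such an entry point might look like, not the actual code of `IesJob4MZSEQ`: the class name `ExampleOozieJavaMain` and the helpers `parseOutputDir` and `runJob` are hypothetical, and the MapReduce job itself is replaced by a placeholder. It relies on the documented Oozie behavior that a java action takes the `ok` transition when `main` returns normally and the `error` transition when it throws, so failures are reported by throwing rather than by calling `System.exit`.

```java
import java.util.Arrays;

// Hypothetical entry point for an Oozie <java> action (illustration only).
class ExampleOozieJavaMain {

    // Validates the <arg> values; this sketch expects one absolute HDFS path.
    static String parseOutputDir(String[] args) {
        if (args.length != 1 || !args[0].startsWith("/")) {
            throw new IllegalArgumentException(
                "expected one absolute HDFS output path, got " + Arrays.toString(args));
        }
        return args[0];
    }

    // Placeholder for building and running the real MapReduce job.
    static boolean runJob(String outputDir) {
        System.out.println("would write intermediate results to " + outputDir);
        return true;
    }

    public static void main(String[] args) {
        String outputDir = parseOutputDir(args);
        if (!runJob(outputDir)) {
            // Throwing makes the action follow <error to="fail"/>.
            throw new IllegalStateException("job failed for " + outputDir);
        }
        // Returning normally makes the action follow its <ok to="..."/> transition.
    }
}
```

With `<arg>/ies/output/mid</arg>`, as in the `firstMid` action, `args[0]` would be `/ies/output/mid`.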
5. One thing worth pointing out about calling Sqoop from Oozie:
- export --connect jdbc:mysql://127.0.0.1:3306/ies2 --username root --table ies_report --export-dir /ies/iesResult/data/iesRegionResult-r-00000 --columns iesId,caid,imp3rd,clk3rd,period,regionId,insertTime
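When exporting the insertTime column, make sure the time is written in the format yyyy-MM-dd HH:mm:ss. During the export Sqoop converts the date value to a timestamp, and if the hours, minutes, and seconds are missing it throws the following error: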
- java.io.IOException: Can't export data, please check task tracker logs
- at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:112)
- at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:39)
- at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
- at org.apache.sqoop.mapreduce.AutoProgressMapper.run(AutoProgressMapper.java:64)
- at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
- at org.apache.hadoop.mapred.MapTask.run(MapTask.java:364)
- at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
- at java.security.AccessController.doPrivileged(Native Method)
- at javax.security.auth.Subject.doAs(Subject.java:415)
- at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
- at org.apache.hadoop.mapred.Child.main(Child.java:249)
- Caused by: java.lang.IllegalArgumentException: Timestamp format must be yyyy-mm-dd hh:mm:ss[.fffffffff]
- at java.sql.Timestamp.valueOf(Timestamp.java:202)
- at spots_report.__loadFromFields(spots_report.java:266)
- at spots_report.parse(spots_report.java:203)
- at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:83)
- ... 10 more
If your output uses the yyyy-MM-dd HH:mm:ss format rather than yyyy-MM-dd, Sqoop's date conversion works without problems.
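The stack trace shows the failure coming from `java.sql.Timestamp.valueOf`, so the difference between the two formats can be reproduced without Sqoop. The sketch below is an illustration only: the class name `InsertTimeFormat` and its helper methods are hypothetical, and it assumes the MapReduce job writes insertTime as a plain string. It formats a date the way the export expects and checks whether a given string would be accepted by `Timestamp.valueOf`.

```java
import java.sql.Timestamp;
import java.text.SimpleDateFormat;
import java.util.Date;

// Hypothetical helper for writing and checking the insertTime field.
class InsertTimeFormat {

    // Date plus hours, minutes, and seconds, as Timestamp.valueOf requires.
    static final String PATTERN = "yyyy-MM-dd HH:mm:ss";

    // SimpleDateFormat is not thread-safe, so create one per call.
    static String format(Date date) {
        return new SimpleDateFormat(PATTERN).format(date);
    }

    // True if the string can be converted by java.sql.Timestamp.valueOf.
    static boolean isParsableTimestamp(String value) {
        try {
            Timestamp.valueOf(value);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String full = format(new Date());
        System.out.println(full + " parsable: " + isParsableTimestamp(full));
        System.out.println("2014-05-13 parsable: " + isParsableTimestamp("2014-05-13"));
    }
}
```

Running a check like `isParsableTimestamp` over a few output rows before the export may catch a date-only value earlier than the Sqoop map task does.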