Using Oozie with Sqoop and Hive to Export Data Analysis Results to MySQL

文件/RDBMS -> flume/sqoop -> HDFS -> Hive -> HDFS -> Sqoop -> RDBMS

Of this pipeline, this article implements:

  • Using Sqoop to read data from the RDBMS (not run through Oozie; the specific error is explained at the end of this article)
  • Processing data with Hive and storing the result in HDFS
  • Using Sqoop to export the HDFS data into the RDBMS

1. Copy an existing sqoop example application, then copy the hive-site.xml file and the MySQL JDBC driver jar into its lib directory.
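
A rough sketch of this step; the Oozie example location and the Hive lib directory holding the MySQL connector jar are assumptions based on the CDH paths used later in this article:

# assumed paths, adjust to your installation
cd /opt/cdh5.3.6/oozie-4.1.0-cdh5.12.0/oozie-apps
cp -r ../examples/apps/sqoop sqoop2hive2sqoop
cp /opt/cdh5.3.6/hive-1.1.0-cdh5.12.0/conf/hive-site.xml sqoop2hive2sqoop/
cp /opt/cdh5.3.6/hive-1.1.0-cdh5.12.0/lib/mysql-connector-java-*.jar sqoop2hive2sqoop/lib/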

2. Add a sqoop-import.sql options file for importing data from the RDBMS into Hive:

--connect
jdbc:mysql://cen-ubuntu:3306/test
--username
root
--password
ubuntu
--table
user
--hive-database
default
--hive-table
import_from_mysql
--hive-import
--hive-overwrite
--delete-target-dir

3. Add a select.sql script that uses Hive to process the data and write it to HDFS (note that the output field delimiter must be specified):

INSERT OVERWRITE DIRECTORY '/user/cen/oozie-apps/sqoop2hive2sqoop/output/' ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' SELECT id, name FROM default.import_from_mysql;
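
Before wiring it into Oozie, the script can be dry-run to confirm the delimiter and output path (a sketch, assuming the hive and hdfs clients are on the PATH):

# run the Hive script locally and peek at the comma-delimited output
hive -f select.sql
hdfs dfs -cat /user/cen/oozie-apps/sqoop2hive2sqoop/output/* | head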

4. Add a sqoop-export.sql options file that uses Sqoop to load the HDFS files into the RDBMS:

--connect
jdbc:mysql://cen-ubuntu:3306/test
--username
root
--password
ubuntu
--table
export_from_hdfs
--export-dir
/user/cen/oozie-apps/sqoop2hive2sqoop/output/
--fields-terminated-by
','
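
Sqoop export requires the target table to already exist in MySQL. A minimal sketch of that table, assuming the two columns (id, name) produced by select.sql; the column types here are guesses:

# hypothetical DDL for the export target; adjust types to the real data
mysql -h cen-ubuntu -u root -pubuntu test -e "
CREATE TABLE IF NOT EXISTS export_from_hdfs (
  id   INT,
  name VARCHAR(64)
);"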

5. Edit the job.properties file:

nameNode=hdfs://cen-ubuntu.cenzhongman.com:8020
jobTracker=localhost:8032
queueName=default
oozieAppsRoot=oozie-apps

oozie.use.system.libpath=true

oozie.wf.application.path=${nameNode}/user/cen/${oozieAppsRoot}/sqoop2hive2sqoop/
outputDir=sqoop2hive2sqoop/output

6. Edit the workflow.xml file:

<workflow-app xmlns="uri:oozie:workflow:0.5" name="sqoop2hive2sqoop-wf">
    <start to="hive-node"/>

    <action name="hive-node">
        <hive xmlns="uri:oozie:hive-action:0.5">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <prepare>
                <delete path="${nameNode}/user/cen/${oozieAppsRoot}/${outputDir}"/>
            </prepare>
            <job-xml>${nameNode}/user/cen/${oozieAppsRoot}/sqoop2hive2sqoop/hive-site.xml</job-xml>
            <configuration>
                <property>
                    <name>mapred.job.queue.name</name>
                    <value>${queueName}</value>
                </property>
            </configuration>
            <script>select.sql</script>
        </hive>
        <ok to="sqoop-export-node"/>
        <error to="hive-fail"/>
    </action>

    <action name="sqoop-export-node">
        <sqoop xmlns="uri:oozie:sqoop-action:0.3">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <configuration>
                <property>
                    <name>mapred.job.queue.name</name>
                    <value>${queueName}</value>
                </property>
            </configuration>
            <command>export --options-file sqoop-export.sql</command>
            <file>${nameNode}/user/cen/${oozieAppsRoot}/sqoop2hive2sqoop/sqoop-export.sql#sqoop-export.sql</file>
        </sqoop>
        <ok to="end"/>
        <error to="sqoop-export-fail"/>
    </action>

    <kill name="hive-fail">
        <message>hive failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <kill name="sqoop-export-fail">
        <message>Sqoop export failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>

Notes

  • Mind the schema version declared on each action node (see the validation sketch after this list)
  • When an action needs extra files shipped with it, declare them with the <file> element
  • Do not forget Hive's configuration file (hive-site.xml)
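
The workflow definition, including the action schema versions, can be sanity-checked with the Oozie client before uploading, roughly as follows (path as used in this article):

bin/oozie validate oozie-apps/sqoop2hive2sqoop/workflow.xml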

7. Upload the application files to HDFS.
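
For example (a sketch, assuming the local application directory from step 1 and the HDFS layout declared in job.properties):

# remove any previous copy, then upload the whole application directory
hdfs dfs -rm -r -f /user/cen/oozie-apps/sqoop2hive2sqoop
hdfs dfs -put /opt/cdh5.3.6/oozie-4.1.0-cdh5.12.0/oozie-apps/sqoop2hive2sqoop /user/cen/oozie-apps/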

8. Run Sqoop to import the data from MySQL into Hive (this step hit the error "Could not load org.apache.hadoop.hive.conf.HiveConf. Make sure HIVE_CONF_DIR is set correctly."; see Note 2 for the cause and the fix):

bin/sqoop import --options-file /opt/cdh5.3.6/oozie-4.1.0-cdh5.12.0/oozie-apps/sqoop2hive2sqoop/sqoop-import.sql

9. Check that the data now exists in Hive, then run the Oozie job:

export OOZIE_URL=http://cen-ubuntu:11000/oozie/
bin/oozie job --config /opt/cdh5.3.6/oozie-4.1.0-cdh5.12.0/oozie-apps/sqoop2hive2sqoop/job.properties -run
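
The Hive-side check mentioned in this step, and monitoring of the submitted workflow, might look like the following (a sketch; the job id placeholder stands for whatever -run prints):

# confirm the import landed in Hive before relying on it
hive -e "SELECT COUNT(*) FROM default.import_from_mysql;"

# watch the running workflow (replace <job-id> with the id printed by -run)
bin/oozie job -info <job-id>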

10. Check the workflow execution and the output results in MySQL.
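
A hypothetical check on the MySQL side (table name and connection details as used earlier in this article):

mysql -h cen-ubuntu -u root -pubuntu test -e "SELECT * FROM export_from_hdfs LIMIT 10;"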

Note 1: Running the Sqoop import into Hive through Oozie fails (the same options file succeeds when run locally), and the logs show no useful output. The complete workflow.xml is pasted below for reference only:

<workflow-app xmlns="uri:oozie:workflow:0.5" name="sqoop2hive2sqoop-wf">
    <start to="sqoop-import-node"/>

    <action name="sqoop-import-node">
        <sqoop xmlns="uri:oozie:sqoop-action:0.3">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <configuration>
                <property>
                    <name>mapred.job.queue.name</name>
                    <value>${queueName}</value>
                </property>
            </configuration>
            <command>import --options-file sqoop-import.sql</command>
            <file>${nameNode}/user/cen/${oozieAppsRoot}/sqoop2hive2sqoop/sqoop-import.sql#sqoop-import.sql</file>
        </sqoop>
        <ok to="hive-node"/>
        <error to="sqoop-import-fail"/>
    </action>

    <action name="hive-node">
        <hive xmlns="uri:oozie:hive-action:0.5">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <prepare>
                <delete path="${nameNode}/user/cen/${oozieAppsRoot}/${outputDir}"/>
            </prepare>
            <job-xml>${nameNode}/user/cen/${oozieAppsRoot}/sqoop2hive2sqoop/hive-site.xml</job-xml>
            <configuration>
                <property>
                    <name>mapred.job.queue.name</name>
                    <value>${queueName}</value>
                </property>
            </configuration>
            <script>select.sql</script>
        </hive>
        <ok to="sqoop-export-node"/>
        <error to="hive-fail"/>
    </action>

    <action name="sqoop-export-node">
        <sqoop xmlns="uri:oozie:sqoop-action:0.3">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <configuration>
                <property>
                    <name>mapred.job.queue.name</name>
                    <value>${queueName}</value>
                </property>
            </configuration>
            <command>export --options-file sqoop-export.sql</command>
            <file>${nameNode}/user/cen/${oozieAppsRoot}/sqoop2hive2sqoop/sqoop-export.sql#sqoop-export.sql</file>
        </sqoop>
        <ok to="end"/>
        <error to="sqoop-export-fail"/>
    </action>

    <kill name="sqoop-import-fail">
        <message>Sqoop import failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <kill name="hive-fail">
        <message>hive failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <kill name="sqoop-export-fail">
        <message>Sqoop export failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>

Cause analysis: the error occurs in the sqoop-import-node.

  • The Hive configuration file cannot be found. Attempt (1): declare it as in hive-node --> no effect. Attempt (2): add --hive-home /opt/xxx/xxx/xxx to sqoop-import.sql --> no effect. Attempt (3): modify conf/action-conf/hive.xml --> it turned out not to be configured.
  • Perhaps Sqoop launched this way cannot invoke the local Hive installation? To be investigated when there is a chance.

Note 2: Running Sqoop fails with the error "Could not load org.apache.hadoop.hive.conf.HiveConf. Make sure HIVE_CONF_DIR is set correctly."

  • Cause: Sqoop relies on the $HADOOP_CLASSPATH environment variable, which was not defined on this machine
  • Fix: add the following line to the user environment file ~/.bash_profile

    export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:/opt/cdh5.3.6/hive-1.1.0-cdh5.12.0/lib/*
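
After adding the line, reload the profile and rerun the import, along these lines:

source ~/.bash_profile
bin/sqoop import --options-file /opt/cdh5.3.6/oozie-4.1.0-cdh5.12.0/oozie-apps/sqoop2hive2sqoop/sqoop-import.sql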
