Synchronizing Data from Oracle to MySQL with DataX: Installation and Configuration

DataX

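DataX is an offline data synchronization tool/platform widely used within Alibaba Group. It provides efficient data synchronization between heterogeneous data sources, including MySQL, SQL Server, Oracle, PostgreSQL, HDFS, Hive, HBase, OTS, and ODPS.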

Features

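As a data synchronization framework, DataX abstracts synchronization between different data sources into Reader plugins, which read data from the source, and Writer plugins, which write data to the target. In theory, the DataX framework can support synchronization for any type of data source. The plugin system also works as an ecosystem: each newly added data source immediately becomes interoperable with the existing ones.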
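The Reader/Writer split described above can be illustrated with a small sketch. This is not DataX's actual plugin API (DataX plugins are written in Java); the class and function names below (`Reader`, `Writer`, `ListReader`, `ListWriter`, `sync`) are hypothetical and only show why any reader can be paired with any writer once both sides agree on a record format.

```python
from abc import ABC, abstractmethod


class Reader(ABC):
    """Hypothetical reader plugin: yields records from a source."""

    @abstractmethod
    def read(self):
        ...


class Writer(ABC):
    """Hypothetical writer plugin: stores records in a target."""

    @abstractmethod
    def write(self, records):
        ...


class ListReader(Reader):
    """Reads rows from an in-memory list (stands in for a database)."""

    def __init__(self, rows):
        self.rows = rows

    def read(self):
        for row in self.rows:
            yield list(row)


class ListWriter(Writer):
    """Appends rows to an in-memory list (stands in for a database)."""

    def __init__(self):
        self.sink = []

    def write(self, records):
        count = 0
        for record in records:
            self.sink.append(record)
            count += 1
        return count


def sync(reader, writer):
    """The framework only connects a reader to a writer."""
    return writer.write(reader.read())


writer = ListWriter()
copied = sync(ListReader([(1, "a"), (2, "b")]), writer)
print(copied, writer.sink)
```

Because `sync` depends only on the two abstract interfaces, adding one new reader in this sketch makes it usable with every existing writer, which mirrors the interoperability claim above.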

System Requirements

Linux
JDK (1.8 or later; 1.8 recommended)
Python (Python 2.6.X recommended)
Apache Maven 3.X (to compile DataX)

Quick Start

Tool deployment

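Method 1: Download the DataX package directly (DataX download link). After downloading, extract it to a local directory and enter the bin directory; you can then run a synchronization job: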

$ cd  {YOUR_DATAX_HOME}/bin
$ python datax.py {YOUR_JOB.json}
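If the job needs to be launched from another program, the two commands above can be wrapped as follows. This is a minimal sketch written for Python 3 (separate from the Python 2.6 interpreter that runs `datax.py` itself); the helper names `build_datax_command` and `run_job` are hypothetical, and the sketch assumes only the layout shown above, with `datax.py` located in the `bin` directory.

```python
import os
import subprocess


def build_datax_command(datax_home, job_file, python_exe="python"):
    """Return the argument list for `python datax.py {YOUR_JOB.json}`."""
    launcher = os.path.join(datax_home, "bin", "datax.py")
    return [python_exe, launcher, job_file]


def run_job(datax_home, job_file):
    """Start the job and return the exit code of the launcher process."""
    return subprocess.call(build_datax_command(datax_home, job_file))


# Example paths are placeholders, not paths from the article.
print(build_datax_command("/opt/datax", "/opt/datax/job/job.json"))
```

Passing an absolute path to `datax.py` removes the need to `cd` into the bin directory first.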

Method 2: Download the DataX source code and compile it yourself (DataX source code).

①. Install the JDK

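tar xvf jdk-8u151-linux-x64.tar.gz
# Move the extracted JDK to /usr/local so that it matches JAVA_HOME below
mv jdk1.8.0_151 /usr/local/
vim /etc/profile.d/jdk.sh
# Contents of /etc/profile.d/jdk.sh:
export JAVA_HOME=/usr/local/jdk1.8.0_151
export JAVA_BIN=$JAVA_HOME/bin
export PATH=$PATH:$JAVA_BIN
export CLASSPATH=$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
# Load the new variables into the current shell
source /etc/profile.d/jdk.sh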

Verify that the installation succeeded:

[root@oracle ~]# java -version
java version "1.8.0_151"
Java(TM) SE Runtime Environment (build 1.8.0_151-b12)
Java HotSpot(TM) 64-Bit Server VM (build 25.151-b12, mixed mode)

②. Check the Python version. If it does not meet the requirement, install a suitable version yourself.

[root@oracle ~]# python -V
Python 2.6.6

③. Install Maven

Install and configure it:

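tar xvf apache-maven-3.6.1-bin.tar.gz
# The archive extracts to the directory apache-maven-3.6.1
mv apache-maven-3.6.1 ../maven
vim /etc/profile
# Add the following lines to /etc/profile:
export M2_HOME=/usr/local/maven
export PATH=${M2_HOME}/bin:/u01/mysql/bin:${PATH}
# Load the new variables into the current shell
source /etc/profile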

Verify that Maven was installed successfully:

[root@oracle src]# mvn -v
Apache Maven 3.6.1 (bb52d8502b132ec0a5a3f4c09453c07478323dc5; 2015-11-11T00:41:47+08:00)
Maven home: /usr/local/maven
Java version: 1.8.0_151, vendor: Oracle Corporation
Java home: /usr/local/jdk1.8.0_151/jre
Default locale: en_US, platform encoding: UTF-8
OS name: "linux", version: "3.10.0-862.el7.x86_64", arch: "amd64", family: "unix"
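The three manual checks above (`java -version`, `python -V`, `mvn -v`) can be preceded by a quick test that the executables are on `PATH` at all. The following is a minimal Python 3 sketch; the function name `find_missing_tools` is hypothetical, and the script only checks presence, not versions.

```python
import shutil


def find_missing_tools(tools=("java", "python", "mvn"), which=shutil.which):
    """Return the names of the tools that cannot be found on PATH."""
    return [name for name in tools if which(name) is None]


missing = find_missing_tools()
if missing:
    print("Missing from PATH: " + ", ".join(missing))
else:
    print("java, python and mvn are all on PATH")
```

The `which` parameter exists only so the lookup can be replaced in tests; in normal use the default `shutil.which` is enough.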

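The system requirements are now in place. Next, build DataX from source. Obtain the source code using either of the following two methods:

Download page: https://github.com/alibaba/DataX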
git clone git@github.com:alibaba/DataX.git

Build from source:

unzip DataX-master.zip
mv DataX-master ../
cd ../DataX-master
mvn -U clean package assembly:assembly -Dmaven.test.skip=true

This step takes a very long time, so be patient. When packaging succeeds, the end of the output looks like this:

[INFO] BUILD SUCCESS
[INFO] -----------------------------------------------------------------
[INFO] Total time: 08:12 min
[INFO] Finished at: 2015-12-13T16:26:48+08:00
[INFO] Final Memory: 133M/960M
[INFO] -----------------------------------------------------------------

After a successful build, the DataX package is located in {DataX_source_code_home}/target/datax/datax/, with the following structure:

cd /usr/local/DataX-master
[root@oracle DataX-master]# ls -a ./target/datax/datax/
.  ..  bin  conf  job  lib  log  log_perf  plugin  script  tmp
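The directory listing above can also be checked programmatically before running any job. The sketch below is written for Python 3; the function name `missing_dirs` is hypothetical, and the choice of `bin`, `conf`, `job`, `lib`, and `plugin` as the directories to check is an assumption taken from the listing, not a documented DataX requirement.

```python
import os
import tempfile

# Assumed subset of the directories shown in the listing above.
EXPECTED_DIRS = ("bin", "conf", "job", "lib", "plugin")


def missing_dirs(datax_home, expected=EXPECTED_DIRS):
    """Return the expected subdirectories that are absent under datax_home."""
    return [name for name in expected
            if not os.path.isdir(os.path.join(datax_home, name))]


# Demonstration on a throwaway directory that lacks "plugin".
with tempfile.TemporaryDirectory() as demo_home:
    for name in ("bin", "conf", "job", "lib"):
        os.mkdir(os.path.join(demo_home, name))
    demo_missing = missing_dirs(demo_home)
print(demo_missing)
```

For a real build, call `missing_dirs` with the path to `target/datax/datax`; an empty list means every expected directory is present.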

Configuration example: read data from a stream and print it to the console.

Step 1. Create the job configuration file (JSON format). You can view a configuration template with the command: python datax.py -r {YOUR_READER} -w {YOUR_WRITER}

cd /usr/local/DataX-master/target/datax/datax/bin
./datax.py -r streamreader -w streamwriter
DataX (DATAX-OPENSOURCE-3.0), From Alibaba !
Copyright (C) 2010-2017, Alibaba Group. All Rights Reserved.


Please refer to the streamreader document:
     https://github.com/alibaba/DataX/blob/master/streamreader/doc/streamreader.md 

Please refer to the streamwriter document:
     https://github.com/alibaba/DataX/blob/master/streamwriter/doc/streamwriter.md 
 
Please save the following configuration as a json file and  use
     python {DATAX_HOME}/bin/datax.py {JSON_FILE_NAME}.json 
to run the job.

{
    "job": {
        "content": [
            {
                "reader": {
                    "name": "streamreader", 
                    "parameter": {
                        "column": [], 
                        "sliceRecordCount": ""
                    }
                }, 
                "writer": {
                    "name": "streamwriter", 
                    "parameter": {
                        "encoding": "", 
                        "print": true
                    }
                }
            }
        ], 
        "setting": {
            "speed": {
                "channel": ""
            }
        }
    }
}

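Using the template as a starting point, write the JSON for your own job. The configuration used here, with oraclereader and mysqlwriter, is as follows: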

{
    "job": {
        "content": [
            {
                "reader": {
                    "name": "oraclereader",
                    "parameter": {
                        "column": ["zybh","xmbh","jsbh","xmmc","sfje","fylx","bz","sjc","zllx"],
                        "connection": [
                            {
                                "jdbcUrl": ["jdbc:oracle:thin:@192.168.11.91:1521:orcl"],
                                "table": ["JDZLJCXM"]
                            }
                        ],
                        "password": "jgzdwffz",
                        "username": "bjxxjgxt"
                    }
                },
                "writer": {
                    "name": "mysqlwriter",
                    "parameter": {
                        "column": ["zybh","xmbh","jsbh","xmmc","sfje","fylx","bz","sjc","zllx"],
                        "connection": [
                            {
                                "jdbcUrl": "jdbc:mysql://192.168.11.75:3336/prison_practical_platform",
                                "table": ["JDZLJCXM"]
                            }
                        ],
                        "password": "root",
                        "username": "root"
                    }
                }
            }
        ],
        "setting": {
            "speed": {
                "channel": "5"
            }
        }
    }
}
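Hand-written JSON is prone to small syntax slips such as trailing commas, which strict parsers reject. One way to avoid them is to generate the job file with a JSON library. The following Python 3 sketch reproduces the structure of the configuration above; the function name `build_job`, the placeholder credentials, and the output approach are assumptions made for illustration, while the keys (`oraclereader`, `mysqlwriter`, `column`, `connection`, `jdbcUrl`, `table`, `channel`) are taken from the configuration shown.

```python
import json


def build_job(columns, oracle_url, mysql_url, table,
              reader_auth, writer_auth, channel=5):
    """Build an oraclereader -> mysqlwriter job with the layout shown above."""
    reader = {
        "name": "oraclereader",
        "parameter": {
            "column": list(columns),
            # In the configuration above, oraclereader takes a list of URLs.
            "connection": [{"jdbcUrl": [oracle_url], "table": [table]}],
            "username": reader_auth[0],
            "password": reader_auth[1],
        },
    }
    writer = {
        "name": "mysqlwriter",
        "parameter": {
            "column": list(columns),
            # In the configuration above, mysqlwriter takes a single URL string.
            "connection": [{"jdbcUrl": mysql_url, "table": [table]}],
            "username": writer_auth[0],
            "password": writer_auth[1],
        },
    }
    return {
        "job": {
            "content": [{"reader": reader, "writer": writer}],
            "setting": {"speed": {"channel": str(channel)}},
        }
    }


job = build_job(
    columns=["zybh", "xmbh", "jsbh", "xmmc", "sfje", "fylx", "bz", "sjc", "zllx"],
    oracle_url="jdbc:oracle:thin:@192.168.11.91:1521:orcl",
    mysql_url="jdbc:mysql://192.168.11.75:3336/prison_practical_platform",
    table="JDZLJCXM",
    reader_auth=("ORACLE_USER", "ORACLE_PASSWORD"),  # placeholders
    writer_auth=("MYSQL_USER", "MYSQL_PASSWORD"),    # placeholders
)
job_text = json.dumps(job, indent=4)
print(job_text)
```

Because both plugins receive the same `columns` list, the reader and writer column lists cannot drift apart, and `json.dumps` always emits syntactically valid JSON. Write `job_text` to a file in the job directory to use it.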

Step 2. Start DataX

cd /usr/local/DataX-master/target/datax/datax/bin
./datax.py ../job/oracle11mysql8.json

When the synchronization finishes, the log looks like this:

DataX (DATAX-OPENSOURCE-3.0), From Alibaba !
Copyright (C) 2010-2017, Alibaba Group. All Rights Reserved.


2019-05-24 11:59:06.065 [main] INFO  VMInfo - VMInfo# operatingSystem class => sun.management.OperatingSystemImpl
2019-05-24 11:59:06.075 [main] INFO  Engine - the machine info  => 

    osInfo:    Oracle Corporation 1.8 25.161-b12
    jvmInfo:    Linux amd64 3.10.0-862.el7.x86_64
    cpu num:    1

    totalPhysicalMemory:    -0.00G
    freePhysicalMemory:    -0.00G
    maxFileDescriptorCount:    -1
    currentOpenFileDescriptorCount:    -1

    GC Names    [Copy, MarkSweepCompact]

    MEMORY_NAME                    | allocation_size                | init_size                      
    Eden Space                     | 273.06MB                       | 273.06MB                       
    Code Cache                     | 240.00MB                       | 2.44MB                         
    Survivor Space                 | 34.13MB                        | 34.13MB                        
    Compressed Class Space         | 1,024.00MB                     | 0.00MB                         
    Metaspace                      | -0.00MB                        | 0.00MB                         
    Tenured Gen                    | 682.69MB                       | 682.69MB                       


2019-05-24 11:59:06.093 [main] INFO  Engine - 
{
    "content":[
        {
            "reader":{
                "name":"oraclereader",
                "parameter":{
                    "column":[
                        "zybh",
                        "xmbh",
                        "jsbh",
                        "xmmc",
                        "sfje",
                        "fylx",
                        "bz",
                        "sjc",
                        "zllx"
                    ],
                    "connection":[
                        {
                            "jdbcUrl":[
                                "jdbc:oracle:thin:@192.168.11.91:1521:orcl"
                            ],
                            "table":[
                                "JDZLJCXM"
                            ]
                        }
                    ],
                    "password":"********",
                    "username":"bjxxjgxt"
                }
            },
            "writer":{
                "name":"mysqlwriter",
                "parameter":{
                    "column":[
                        "zybh",
                        "xmbh",
                        "jsbh",
                        "xmmc",
                        "sfje",
                        "fylx",
                        "bz",
                        "sjc",
                        "zllx"
                    ],
                    "connection":[
                        {
                            "jdbcUrl":"jdbc:mysql://192.168.11.75:3336/prison_practical_platform",
                            "table":[
                                "JDZLJCXM"
                            ]
                        }
                    ],
                    "password":"****",
                    "username":"root"
                }
            }
        }
    ],
    "setting":{
        "speed":{
            "channel":"5"
        }
    }
}

2019-05-24 11:59:06.111 [main] WARN  Engine - prioriy set to 0, because NumberFormatException, the value is: null
2019-05-24 11:59:06.121 [main] INFO  PerfTrace - PerfTrace traceId=job_-1, isEnable=false, priority=0
2019-05-24 11:59:06.121 [main] INFO  JobContainer - DataX jobContainer starts job.
2019-05-24 11:59:06.122 [main] INFO  JobContainer - Set jobId = 0
2019-05-24 11:59:06.488 [job-0] INFO  OriginalConfPretreatmentUtil - Available jdbcUrl:jdbc:oracle:thin:@192.168.11.91:1521:orcl.
2019-05-24 11:59:06.605 [job-0] INFO  OriginalConfPretreatmentUtil - table:[JDZLJCXM] has columns:[ZYBH,XMBH,JSBH,XMMC,SFJE,FYLX,BZ,SJC,ZLLX].
2019-05-24 11:59:06.952 [job-0] INFO  OriginalConfPretreatmentUtil - table:[JDZLJCXM] all columns:[
ZYBH,XMBH,JSBH,XMMC,SFJE,FYLX,BZ,SJC,ZLLX
].
2019-05-24 09:48:44.768 [job-0] INFO  OriginalConfPretreatmentUtil - Write data [
INSERT INTO %s (zybh,xmbh,jsbh,xmmc,sfje,fylx,bz,sjc,zllx) VALUES(?,?,?,?,?,?,?,?,?)
], which jdbcUrl like:[jdbc:mysql://192.168.11.75:3336/prison_practical_platform?yearIsDateType=false&zeroDateTimeBehavior=convertToNull&tinyInt1isBit=false&rewriteBatchedStatements=true]
2019-05-24 09:48:44.768 [job-0] INFO  JobContainer - jobContainer starts to do prepare ...
2019-05-24 09:48:44.769 [job-0] INFO  JobContainer - DataX Reader.Job [oraclereader] do prepare work .
2019-05-24 09:48:44.769 [job-0] INFO  JobContainer - DataX Writer.Job [mysqlwriter] do prepare work .
2019-05-24 09:48:44.769 [job-0] INFO  JobContainer - jobContainer starts to do split ...
2019-05-24 09:48:44.769 [job-0] INFO  JobContainer - Job set Channel-Number to 5 channels.
2019-05-24 09:48:44.772 [job-0] INFO  JobContainer - DataX Reader.Job [oraclereader] splits to [1] tasks.
2019-05-24 09:48:44.772 [job-0] INFO  JobContainer - DataX Writer.Job [mysqlwriter] splits to [1] tasks.
2019-05-24 09:48:44.820 [job-0] INFO  JobContainer - jobContainer starts to do schedule ...
2019-05-24 09:48:44.830 [job-0] INFO  JobContainer - Scheduler starts [1] taskGroups.
2019-05-24 09:48:44.832 [job-0] INFO  JobContainer - Running by standalone Mode.
2019-05-24 09:48:44.866 [taskGroup-0] INFO  TaskGroupContainer - taskGroupId=[0] start [1] channels for [1] tasks.
2019-05-24 09:48:44.876 [taskGroup-0] INFO  Channel - Channel set byte_speed_limit to -1, No bps activated.
2019-05-24 09:48:44.877 [taskGroup-0] INFO  Channel - Channel set record_speed_limit to -1, No tps activated.
2019-05-24 09:48:44.898 [taskGroup-0] INFO  TaskGroupContainer - taskGroup[0] taskId[0] attemptCount[1] is started
2019-05-24 09:48:44.901 [0-0-0-reader] INFO  CommonRdbmsReader$Task - Begin to read record by Sql: [select zybh,xmbh,jsbh,xmmc,sfje,fylx,bz,sjc,zllx from JDZLJCXM 
] jdbcUrl:[jdbc:oracle:thin:@192.168.11.91:1521:orcl].
2019-05-24 09:48:45.050 [0-0-0-reader] INFO  CommonRdbmsReader$Task - Finished read record by Sql: [select zybh,xmbh,jsbh,xmmc,sfje,fylx,bz,sjc,zllx from JDZLJCXM 
] jdbcUrl:[jdbc:oracle:thin:@192.168.11.91:1521:orcl].
2019-05-24 09:48:46.419 [taskGroup-0] INFO  TaskGroupContainer - taskGroup[0] taskId[0] is successed, used[1538]ms
2019-05-24 09:48:46.420 [taskGroup-0] INFO  TaskGroupContainer - taskGroup[0] completed it's tasks.
2019-05-24 09:48:54.873 [job-0] INFO  StandAloneJobContainerCommunicator - Total 1554 records, 79079 bytes | Speed 7.72KB/s, 155 records/s | Error 0 records, 0 bytes |  All Task WaitWriterTime 0.020s |  All Task WaitReaderTime 0.124s | Percentage 100.00%
2019-05-24 09:48:54.874 [job-0] INFO  AbstractScheduler - Scheduler accomplished all tasks.
2019-05-24 09:48:54.874 [job-0] INFO  JobContainer - DataX Writer.Job [mysqlwriter] do post work.
2019-05-24 09:48:54.874 [job-0] INFO  JobContainer - DataX Reader.Job [oraclereader] do post work.
2019-05-24 09:48:54.874 [job-0] INFO  JobContainer - DataX jobId [0] completed successfully.
2019-05-24 09:48:54.875 [job-0] INFO  HookInvoker - No hook invoked, because base dir not exists or is a file: /usr/local/DataX-master/target/datax/datax/hook
2019-05-24 09:48:54.875 [job-0] INFO  JobContainer - 
     [total cpu info] => 
        averageCpu                     | maxDeltaCpu                    | minDeltaCpu                    
        -1.00%                         | -1.00%                         | -1.00%
                        

     [total gc info] => 
         NAME                 | totalGCCount       | maxDeltaGCCount    | minDeltaGCCount    | totalGCTime        | maxDeltaGCTime     | minDeltaGCTime     
         Copy                 | 0                  | 0                  | 0                  | 0.000s             | 0.000s             | 0.000s             
         MarkSweepCompact     | 0                  | 0                  | 0                  | 0.000s             | 0.000s             | 0.000s             

2019-05-24 09:48:54.875 [job-0] INFO  JobContainer - PerfTrace not enable!
2019-05-24 09:48:54.876 [job-0] INFO  StandAloneJobContainerCommunicator - Total 1554 records, 79079 bytes | Speed 7.72KB/s, 155 records/s | Error 0 records, 0 bytes |  All Task WaitWriterTime 0.020s |  All Task WaitReaderTime 0.124s | Percentage 100.00%
2019-05-24 09:48:54.876 [job-0] INFO  JobContainer - 
任務啓動時刻                    : 2019-05-24 09:48:43
任務結束時刻                    : 2019-05-24 09:48:54
任務總計耗時                    :                 11s
任務平均流量                    :            7.72KB/s
記錄寫入速度                    :            155rec/s
讀出記錄總數                    :                1554
讀寫失敗總數                    :                   0
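The summary line printed by StandAloneJobContainerCommunicator in the log above contains the record and error counts, so a script can decide whether a run was clean. This is a minimal Python 3 sketch; the function name `parse_summary` is hypothetical, and the regular expression assumes the exact line format shown in this log, which may differ in other DataX versions.

```python
import re

# Matches e.g. "Total 1554 records, 79079 bytes | ... | Error 0 records, 0 bytes"
SUMMARY = re.compile(
    r"Total (\d+) records, (\d+) bytes \|.*?\| Error (\d+) records, (\d+) bytes"
)


def parse_summary(log_text):
    """Return the counts from the last summary line, or None if absent."""
    matches = SUMMARY.findall(log_text)
    if not matches:
        return None
    total, total_bytes, errors, error_bytes = (int(x) for x in matches[-1])
    return {
        "total_records": total,
        "total_bytes": total_bytes,
        "error_records": errors,
        "error_bytes": error_bytes,
    }


sample = (
    "2019-05-24 09:48:54.876 [job-0] INFO  StandAloneJobContainerCommunicator - "
    "Total 1554 records, 79079 bytes | Speed 7.72KB/s, 155 records/s | "
    "Error 0 records, 0 bytes |  All Task WaitWriterTime 0.020s |  "
    "All Task WaitReaderTime 0.124s | Percentage 100.00%"
)
summary = parse_summary(sample)
print(summary)
```

For the run shown above, the parsed values are 1554 total records and 0 error records, matching the final summary table.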

The following problems came up during the build. The first error was:

[ERROR] Failed to execute goal on project otsstreamreader: Could not resolve dependencies for project com.alibaba.datax:otsstreamreader:jar:1.0.0-SNAPSHOT: Could not find artifact 
com.aliyun.openservices:tablestore-streamclient:jar:1.0.0-SNAPSHOT -> [Help 1]

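This error is caused by a snapshot version mismatch. Since OTS is rarely needed, you can simply remove <module>ots</module> from pom.xml. Alternatively, you can change the default version in otsstreamreader from 0.0.1 to 1.0.0.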

[ERROR] Failed to execute goal org.apache.maven.plugins:maven-jar-plugin:2.4:jar (default-jar) on project ocswriter: Error assembling JAR: /Users/FengZhen/Desktop/Hadoop/DataX/源碼/DataX/ocswriter/pom.xml isn't a file. -> [Help 1]

Comment out the ocs module and repackage.
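Both fixes above amount to disabling module entries in the top-level pom.xml. The edit can be done by hand; the Python 3 sketch below shows one way to script it on the file's text. The function name `comment_out_modules` is hypothetical, and the module names `otsstreamreader` and `ocswriter` are taken from the project names in the error messages, so check them against the `<module>` entries in your own pom.xml before relying on them.

```python
import re


def comment_out_modules(pom_text, module_names):
    """Wrap matching <module>name</module> entries in XML comments."""
    for name in module_names:
        # The lookbehind skips entries that are already commented out,
        # so running the function twice does not nest comments.
        pattern = re.compile(
            r"(?<!<!-- )(<module>\s*" + re.escape(name) + r"\s*</module>)"
        )
        pom_text = pattern.sub(r"<!-- \1 -->", pom_text)
    return pom_text


sample_pom = """<modules>
    <module>mysqlwriter</module>
    <module>otsstreamreader</module>
    <module>ocswriter</module>
</modules>"""

patched = comment_out_modules(sample_pom, ["otsstreamreader", "ocswriter"])
print(patched)
```

Commenting the entries out rather than deleting them keeps the change easy to revert if the modules are needed later.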

References: https://github.com/alibaba/DataX/blob/master/userGuid.md

http://www.cnblogs.com/EnzoDin/p/9979583.html
