Sqoop incremental import: the duplicate data problem

If rows are duplicated when importing data by an auto-increment ID, the following method can be used:

[Image: image.png]

Image source:
http://cn.voidcc.com/question...

The relevant section of the official user guide follows:
https://sqoop.apache.org/docs...

7.2.10. Incremental Imports

Sqoop provides an incremental import mode which can be used to retrieve only rows newer than some previously-imported set of rows.

The following arguments control incremental imports:

Argument                  Description

--check-column (col)
    Specifies the column to be examined when determining which rows to import. (The column should not be of type CHAR/NCHAR/VARCHAR/VARNCHAR/LONGVARCHAR/LONGNVARCHAR.)

--incremental (mode)
    Specifies how Sqoop determines which rows are new. Legal values for mode include append and lastmodified.

--last-value (value)
    Specifies the maximum value of the check column from the previous import.


Sqoop supports two types of incremental imports: append and lastmodified. You can use the --incremental argument to specify the type of incremental import to perform.

You should specify append mode when importing a table where new rows are continually being added with increasing row id values. You specify the column containing the row's id with --check-column. Sqoop imports rows where the check column has a value greater than the one specified with --last-value.
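As a rough illustration of append mode, the sketch below wraps an import of a table keyed by an auto-increment id. The host db.example.com, database shop, user etl_user, table orders, password file and target directory are all made-up placeholders, and the SQOOP variable is only a convenience added here so the command is printed rather than executed by default. Since only rows with an id greater than --last-value are fetched, passing a value lower than the maximum already imported would be one likely way to end up with duplicate rows.

```shell
#!/bin/sh
# Dry run by default: the command is only printed.
# Set SQOOP=sqoop in the environment to execute it for real.
SQOOP="${SQOOP:-echo sqoop}"

# Hypothetical placeholders: db.example.com, shop, etl_user, orders,
# the password file and the HDFS target directory.
import_new_orders() {
  # $1 = highest id already imported; only rows with id > $1 are fetched.
  $SQOOP import \
    --connect jdbc:mysql://db.example.com/shop \
    --username etl_user \
    --password-file /user/etl/.db-password \
    --table orders \
    --target-dir /data/raw/orders \
    --check-column id \
    --incremental append \
    --last-value "$1"
}

# First run starts from 0; later runs pass the value Sqoop printed last time.
import_new_orders "${LAST_VALUE:-0}"
```

Calling the script with LAST_VALUE=100 would request only rows whose id is above 100.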

An alternate table update strategy supported by Sqoop is called lastmodified mode. You should use this when rows of the source table may be updated, and each such update will set the value of a last-modified column to the current timestamp. Rows where the check column holds a timestamp more recent than the timestamp specified with --last-value are imported.
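A lastmodified import might look like the sketch below. The column updated_at, the connection details and the paths are hypothetical, and the timestamp format is an assumption. The --merge-key option is not covered in the excerpt above; it is included on the assumption of a Sqoop 1.4.x release, where it asks Sqoop to merge re-imported rows into the existing data by key so that an updated row replaces its older copy instead of appearing twice.

```shell
#!/bin/sh
# Dry run by default: the command is only printed (without shell quoting).
# Set SQOOP=sqoop in the environment to execute it for real.
SQOOP="${SQOOP:-echo sqoop}"

# Hypothetical placeholders: db.example.com, shop, etl_user, orders,
# updated_at, the password file and the HDFS target directory.
import_changed_orders() {
  # $1 = timestamp reported at the end of the previous import.
  $SQOOP import \
    --connect jdbc:mysql://db.example.com/shop \
    --username etl_user \
    --password-file /user/etl/.db-password \
    --table orders \
    --target-dir /data/raw/orders \
    --check-column updated_at \
    --incremental lastmodified \
    --last-value "$1" \
    --merge-key id
}

# Assumed timestamp format; the default below simply means "everything".
import_changed_orders "${LAST_VALUE:-1970-01-01 00:00:00}"
```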

At the end of an incremental import, the value which should be specified as --last-value for a subsequent import is printed to the screen. When running a subsequent import, you should specify --last-value in this way to ensure you import only the new or updated data. This is handled automatically by creating an incremental import as a saved job, which is the preferred mechanism for performing a recurring incremental import. See the section on saved jobs later in this document for more information.
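The saved-job approach could be sketched as follows. The job name orders_incremental and all connection details are invented for illustration. The job is created once with an initial --last-value and then executed repeatedly; Sqoop keeps the last value in the job's saved state, so it does not have to be copied by hand between runs.

```shell
#!/bin/sh
# Dry run by default: the commands are only printed.
# Set SQOOP=sqoop in the environment to execute them for real.
SQOOP="${SQOOP:-echo sqoop}"
JOB_NAME="orders_incremental"   # hypothetical job name

# Run once: stores the import definition, starting from id 0.
create_job() {
  $SQOOP job --create "$JOB_NAME" -- import \
    --connect jdbc:mysql://db.example.com/shop \
    --username etl_user \
    --password-file /user/etl/.db-password \
    --table orders \
    --target-dir /data/raw/orders \
    --check-column id \
    --incremental append \
    --last-value 0
}

# Run on every schedule tick: imports only rows newer than the saved value.
run_job() {
  $SQOOP job --exec "$JOB_NAME"
}

create_job
run_job
```

In real use the two steps would be separated: create_job once during setup, run_job from a scheduler such as cron.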

References:
http://cn.voidcc.com/question...
https://sqoop.apache.org/docs...
