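Discovery mode provides a way for a tap to describe the data streams it supports. It uses JSON Schema to describe the structure and type of the data in each stream.
The implementation of discovery mode depends on the tap's data source. Some taps hard-code the schema for each stream, while others connect to an API that provides a description of the available streams.
When run in discovery mode, a tap should write a list of streams, called the catalog, to stdout. Each entry contains some basic information about the stream and the JSON schema that describes it.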
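As a minimal sketch of the hard-coded approach, the following Python builds a catalog from a fixed schema and writes it to stdout. The stream name `users`, the `SCHEMAS` dictionary, and the function names are illustrative assumptions, not part of any particular tap.

```python
import json
import sys

# Hypothetical hard-coded schemas, keyed by stream name (illustrative only).
SCHEMAS = {
    "users": {
        "type": ["null", "object"],
        "additionalProperties": False,
        "properties": {
            "id": {"type": ["null", "string"]},
            "name": {"type": ["null", "string"]},
        },
    }
}


def discover():
    """Build the catalog: one entry per stream the tap supports."""
    streams = []
    for name, schema in SCHEMAS.items():
        streams.append({"tap_stream_id": name, "stream": name, "schema": schema})
    return {"streams": streams}


def write_catalog(out=sys.stdout):
    """Write the catalog as JSON to the given file object (stdout by default)."""
    json.dump(discover(), out, indent=2)
    out.write("\n")
```

A tap that discovers its streams from an API would replace `SCHEMAS` with a lookup against that API, but the shape of the output would stay the same.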
To run a tap in discovery mode, use the --discover flag:
tap --config CONFIG --discover
The output can be redirected to a file at run time:
tap --config CONFIG --discover > catalog.json
Some legacy taps use properties.json as the catalog.
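The --config and --discover flags shown above could be parsed in a tap's entry point roughly as follows. This is a sketch using Python's standard argparse module; the function name `parse_args` and the help texts are assumptions for illustration.

```python
import argparse


def parse_args(argv=None):
    """Parse the two flags used in this article: --config CONFIG and --discover."""
    parser = argparse.ArgumentParser(prog="tap")
    parser.add_argument("--config", required=True, help="path to the config file")
    parser.add_argument(
        "--discover",
        action="store_true",
        help="run in discovery mode and write the catalog to stdout",
    )
    return parser.parse_args(argv)


# Example: the same arguments as `tap --config CONFIG --discover`.
args = parse_args(["--config", "CONFIG", "--discover"])
print(args.config, args.discover)
```

Using `store_true` makes --discover a plain switch, so a tap can branch on `args.discover` to decide whether to print the catalog.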
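JSON is used to represent data because it is ubiquitous, readable, and especially well suited to the large number of sources that expose data as JSON, such as web APIs. However,
JSON is far from perfect: for example, it has no native date type, so date values must be carried as strings and described by a schema.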
The Stitch target and the Stitch API use schemas like the following:
{
  "type": [
    "null",
    "object"
  ],
  "additionalProperties": false,
  "properties": {
    "id": {
      "type": [
        "null",
        "string"
      ]
    },
    "name": {
      "type": [
        "null",
        "string"
      ]
    },
    "date_modified": {
      "type": [
        "null",
        "string"
      ],
      "format": "date-time"
    }
  }
}
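To illustrate how such a schema constrains a record, here is a deliberately small checker that understands only the keywords used above: type, properties, and additionalProperties. It ignores format, and it is a sketch rather than a JSON Schema implementation; a real tap or target would normally rely on a full validator library. The names `TYPE_MAP`, `matches_type`, and `conforms` are hypothetical.

```python
# Map JSON Schema type names to Python types (only the ones used above).
TYPE_MAP = {
    "null": type(None),
    "string": str,
    "object": dict,
}


def matches_type(value, type_names):
    """Return True if value matches any of the listed JSON Schema types."""
    if isinstance(type_names, str):
        type_names = [type_names]
    return any(isinstance(value, TYPE_MAP[name]) for name in type_names)


def conforms(record, schema):
    """Check a record against the small schema subset used in this article."""
    if not matches_type(record, schema["type"]):
        return False
    if record is None:
        return True
    properties = schema.get("properties", {})
    # With additionalProperties false, undeclared keys are rejected.
    if schema.get("additionalProperties") is False:
        if any(key not in properties for key in record):
            return False
    return all(
        matches_type(record[key], properties[key]["type"])
        for key in record
        if key in properties
    )
```

Because every property lists "null" among its types, a record may carry null for any field and still conform.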
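The output of discovery mode should be a list of the data streams the tap supports. This JSON-formatted list is called the catalog. The top level is an object with a single key called "streams" that points to an array of objects,
each of which has the following fields: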
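Property | Type | Required? | Description |
---|---|---|---|
tap_stream_id | string | required | The unique identifier for the stream. This is allowed to differ from the name of the stream, in order to allow for sources that have duplicate stream names. |
schema | object | required | The JSON schema for the stream. |
table_name | string | optional | For a database source, the name of the table. |
metadata | array of metadata | optional | See the metadata section below for an explanation. |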
Example:
{
  "streams": [
    {
      "tap_stream_id": "users",
      "stream": "users",
      "schema": {
        "type": ["null", "object"],
        "additionalProperties": false,
        "properties": {
          "id": {
            "type": [
              "null",
              "string"
            ]
          },
          "name": {
            "type": [
              "null",
              "string"
            ]
          },
          "date_modified": {
            "type": [
              "null",
              "string"
            ],
            "format": "date-time"
          }
        }
      }
    }
  ]
}
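A consumer of the catalog might load it and check the fields marked as required above before doing anything else. The following is a sketch under that assumption; `REQUIRED_FIELDS`, `load_streams`, and `field_names` are hypothetical names, not functions from a Singer library.

```python
import json

# The fields the table above marks as required.
REQUIRED_FIELDS = ("tap_stream_id", "schema")


def load_streams(catalog_text):
    """Parse catalog JSON and return its stream entries, checking required fields."""
    catalog = json.loads(catalog_text)
    streams = catalog["streams"]
    for entry in streams:
        missing = [field for field in REQUIRED_FIELDS if field not in entry]
        if missing:
            raise ValueError(f"stream entry is missing required fields: {missing}")
    return streams


def field_names(entry):
    """List the property names declared in a stream's schema, sorted."""
    return sorted(entry["schema"].get("properties", {}))
```

Given a file produced with `tap --config CONFIG --discover > catalog.json`, the text passed to `load_streams` would simply be the contents of catalog.json.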
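Metadata is the preferred mechanism for associating extra information with nodes in the schema.
Certain metadata should be both written and read by the tap. This metadata is called discoverable metadata. Other metadata is written by other systems, such as a UI,
and should therefore only be read by the tap. This type of metadata is called non-discoverable metadata.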
The metadata keywords are:
Keyword | Tap Type | Discoverable? | Description |
---|---|---|---|
selected | any | non-discoverable | Either true or false. Indicates that this node in the schema has been selected by the user for replication. |
replication-method | any | non-discoverable | Either FULL_TABLE, INCREMENTAL, or LOG_BASED. The replication method to use for a stream. |
replication-key | any | non-discoverable | The name of a property in the source to use as a "bookmark". For example, this will often be an "updated-at" field or an auto-incrementing primary key (requires replication-method). |
view-key-properties | database | non-discoverable | List of key properties for a database view. |
inclusion | any | discoverable | Either available, automatic, or unsupported. available means the field is available for selection, and the tap will only emit values for that field if it is marked with "selected": true. automatic means that the tap will emit values for the field. unsupported means that the field exists in the source data but the tap is unable to provide it. |
selected-by-default | any | discoverable | Either true or false. Indicates if a node in the schema should be replicated if a user has not expressed any opinion on whether or not to replicate it. |
valid-replication-keys | any | discoverable | List of the fields that could be used as replication keys. |
schema-name | any | discoverable | The name of the stream. |
forced-replication-method | any | discoverable | Used to force the replication method to either FULL_TABLE or INCREMENTAL. |
table-key-properties | database | discoverable | List of key properties for a database table. |
is-view | database | discoverable | Either true or false. Indicates whether a stream corresponds to a database view. |
row-count | database | discoverable | Number of rows in a database table/view. |
database-name | database | discoverable | Name of database. |
sql-datatype | database | discoverable | Represents the datatype of a database column. |
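Read together, inclusion, selected, and selected-by-default suggest a decision rule for whether a field is replicated. The sketch below is one interpretation of the table above, not code taken from a Singer library. The function name `should_replicate` is hypothetical, and treating a missing selected-by-default as false is an assumption.

```python
def should_replicate(metadata):
    """Decide whether a schema node is replicated, given its metadata dict."""
    inclusion = metadata.get("inclusion")
    if inclusion == "automatic":
        # The tap emits values for automatic fields regardless of selection.
        return True
    if inclusion == "unsupported":
        # The field exists in the source but the tap cannot provide it.
        return False
    if "selected" in metadata:
        # The user has expressed an opinion, so it takes precedence.
        return bool(metadata["selected"])
    # No user opinion: fall back to the default (assumed False if absent).
    return bool(metadata.get("selected-by-default", False))
```

Checking inclusion first means a user's selection can never turn off an automatic field or turn on an unsupported one.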
Each metadata entry has the following format:
{
  "metadata" : {
    "selected" : true,
    "some-other-metadata" : "whatever"
  },
  "breadcrumb" : ["properties", "some-field-name"]
}
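The breadcrumb above defines the path through the schema to the node that the metadata belongs to. Metadata for the stream itself has an empty breadcrumb.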
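Since a breadcrumb is just a list of keys, resolving it amounts to walking the schema one key at a time. The following sketch shows that walk, plus an index of metadata entries keyed by breadcrumb; the function names `node_at` and `metadata_by_breadcrumb` are hypothetical.

```python
def node_at(schema, breadcrumb):
    """Follow a breadcrumb (a list of keys) from the schema root to a node."""
    node = schema
    for key in breadcrumb:
        node = node[key]
    return node


def metadata_by_breadcrumb(entries):
    """Index metadata entries by breadcrumb, using tuples as dictionary keys."""
    return {tuple(entry["breadcrumb"]): entry["metadata"] for entry in entries}


schema = {
    "type": ["null", "object"],
    "properties": {"some-field-name": {"type": ["null", "string"]}},
}
entries = [
    {"metadata": {"selected": True}, "breadcrumb": ["properties", "some-field-name"]},
    {"metadata": {"inclusion": "available"}, "breadcrumb": []},
]
index = metadata_by_breadcrumb(entries)
print(node_at(schema, ["properties", "some-field-name"]))
print(index[()])  # the empty breadcrumb holds the stream-level metadata
```

Lists are not hashable in Python, which is why the index converts each breadcrumb to a tuple before using it as a key.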
A complete example:
{
  "streams": [
    {
      "tap_stream_id": "users",
      "stream": "users",
      "schema": {
        "type": ["null", "object"],
        "additionalProperties": false,
        "properties": {
          "id": {
            "type": [
              "null",
              "string"
            ]
          },
          "name": {
            "type": [
              "null",
              "string"
            ]
          },
          "date_modified": {
            "type": [
              "null",
              "string"
            ],
            "format": "date-time"
          }
        }
      },
      "metadata": [
        {
          "metadata": {
            "inclusion": "available",
            "table-key-properties": ["id"],
            "selected-by-default": true,
            "valid-replication-keys": ["date_modified"],
            "schema-name": "users"
          },
          "breadcrumb": []
        },
        {
          "metadata": {
            "inclusion": "automatic"
          },
          "breadcrumb": ["properties", "id"]
        },
        {
          "metadata": {
            "inclusion": "available",
            "selected-by-default": true
          },
          "breadcrumb": ["properties", "name"]
        },
        {
          "metadata": {
            "inclusion": "automatic"
          },
          "breadcrumb": ["properties", "date_modified"]
        }
      ]
    }
  ]
}
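A tap could generate metadata of this shape from its schema rather than writing it by hand. The sketch below follows one assumed convention that happens to match the example above: key properties and valid replication keys are marked automatic, and every other field is available and selected by default. That convention and the function name `build_metadata` are assumptions for illustration, not rules stated by the source.

```python
def build_metadata(stream_name, schema, key_properties, replication_keys):
    """Build a metadata list: one stream-level entry plus one per property."""
    entries = [{
        "metadata": {
            "inclusion": "available",
            "table-key-properties": list(key_properties),
            "selected-by-default": True,
            "valid-replication-keys": list(replication_keys),
            "schema-name": stream_name,
        },
        "breadcrumb": [],  # an empty breadcrumb refers to the stream itself
    }]
    # Assumed convention: keys and replication keys are always emitted.
    automatic = set(key_properties) | set(replication_keys)
    for field in schema.get("properties", {}):
        if field in automatic:
            field_metadata = {"inclusion": "automatic"}
        else:
            field_metadata = {"inclusion": "available", "selected-by-default": True}
        entries.append({
            "metadata": field_metadata,
            "breadcrumb": ["properties", field],
        })
    return entries
```

Calling `build_metadata("users", schema, ["id"], ["date_modified"])` with the schema from the example yields four entries: one for the stream and one each for id, name, and date_modified.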
Source: https://github.com/singer-io/getting-started/blob/master/docs/DISCOVERY_MODE.md