Brothers stand together to fight the tiger; father and son go into battle side by side.
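This chapter lays the groundwork for complex searches. Parent-child documents are introduced here so that ES operations in more complex scenarios are easier to explain later.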
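In a relational database we often run queries that join one table with another. For example, joining a student table with a grades table returns both the students' details and their grades. In ES, parent-child documents play a role similar to such a table join.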
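ES 5.x relied on parent-child documents to implement multi-table join queries; the core idea was that several types could be created under a single index. From ES 6.x onward, however, only one type is allowed per index, and types are slated to be removed entirely in a future version. To keep supporting join-style queries, ES 6.x introduced the new join field type for creating parent-child documents.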
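Suppose we have the following requirement: a blog has many articles, and each article has a title, content, author, date, and so on. An article also has comments, and each comment has its own content, author, date, and so on. We want to store the blog's articles and comments in ES.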
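Here the article is the "parent" and each comment is a "child". This kind of problem can also be solved with nested objects. In most cases nested objects and parent-child documents can stand in for each other, but each has its own strengths and weaknesses. Both data structures are introduced below.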
The data structure of a single article looks like this:
{ "title":"ElasticSearch6.x實戰教程", "author":"OKevin", "content":"這是一篇水文", "created":1562141626000, "comments":[{ "name":"張三", "content":"寫的真菜", "created":1562141689000 },{ "name":"李四", "content":"辣雞", "created":1562141745000 }] }
Create the index and define its mapping through the RESTful API:
PUT http://localhost:9200/blog
{
  "mappings": {
    "article": {
      "properties": {
        "title": { "type": "text", "analyzer": "ik_smart", "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } } },
        "author": { "type": "text", "analyzer": "ik_smart", "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } } },
        "content": { "type": "text", "analyzer": "ik_smart" },
        "created": { "type": "date" },
        "comments": {
          "type": "nested",
          "properties": {
            "name": { "type": "text", "analyzer": "ik_smart", "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } } },
            "content": { "type": "text", "analyzer": "ik_smart", "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } } },
            "created": { "type": "date" }
          }
        }
      }
    }
  }
}
Insert the data:
POST http://localhost:9200/blog/article { "title":"ElasticSearch6.x實戰教程", "author":"OKevin", "content":"這是一篇水文", "created":1562141626000, "comments":[{ "name":"張三", "content":"寫的真菜", "created":1562141689000 },{ "name":"李四", "content":"辣雞", "created":1562141745000 }] }
POST http://localhost:9200/blog/article { "title":"ElasticSearch6.x從入門到放棄", "author":"OKevin", "content":"這是一篇ES從入門到放棄文章", "created":1562144089000, "comments":[{ "name":"張三", "content":"我已入門", "created":1562144089000 },{ "name":"李四", "content":"我已放棄", "created":1562144089000 }] }
POST http://localhost:9200/blog/article { "title":"ElasticSearch6.x原理解析", "author":"專家", "content":"這是一篇ES原理解析的文章", "created":1562144089000, "comments":[{ "name":"張三", "content":"牛逼,專家就是不同", "created":1562144089000 },{ "name":"李四", "content":"大牛", "created":1562144089000 }] }
Query for the articles whose author is "OKevin":
GET http://localhost:9200/blog/article/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "author.keyword": "OKevin" } }
      ]
    }
  }
}
ES returns the two documents whose author is "OKevin", each with all of its data.
Query for the articles by "OKevin" that have a comment containing "辣雞":
GET http://localhost:9200/blog/article/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "author.keyword": "OKevin" } },
        {
          "nested": {
            "path": "comments",
            "query": {
              "bool": {
                "must": [
                  { "match": { "comments.content": "辣雞" } }
                ]
              }
            }
          }
        }
      ]
    }
  }
}
ES does indeed return only the article whose comments contain "辣雞".
Both queries return the whole document, comments included.
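The nested query above can also be assembled programmatically. The following is a minimal sketch in Python that only builds the request body and sends nothing to ES. The helper names `nested_match` and `articles_by_author_with_comment` are hypothetical, introduced here for illustration; the inner `bool`/`must` wrapper from the request above is collapsed into a single `match`, which should be equivalent when there is only one clause.

```python
import json


def nested_match(path, field, text):
    """Build a `nested` clause that matches `text` in `<path>.<field>`.

    Hypothetical helper, not part of any ES client library.
    """
    return {
        "nested": {
            "path": path,
            "query": {"match": {f"{path}.{field}": text}},
        }
    }


def articles_by_author_with_comment(author, comment_text):
    """Build the search body: exact author match AND a matching nested comment."""
    return {
        "query": {
            "bool": {
                "must": [
                    {"match": {"author.keyword": author}},
                    nested_match("comments", "content", comment_text),
                ]
            }
        }
    }


if __name__ == "__main__":
    body = articles_by_author_with_comment("OKevin", "辣雞")
    # The printed JSON can be sent as the body of
    # GET http://localhost:9200/blog/article/_search
    print(json.dumps(body, ensure_ascii=False, indent=2))
```

Keeping the `nested` clause in its own helper makes it harder to forget the `path` prefix on the field name (`comments.content`), which the nested query needs.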
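Since parent-child documents can implement table-style join queries, their data structures should look like this:
Article data structure: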
{ "title":"ElasticSearch6.x實戰教程", "author":"OKevin", "content":"這是一篇實戰教程", "created":1562141626000, "comments":[] }
Comment data structure:
{ "name":"張三", "content":"寫的真菜", "created":1562141689000 }
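Before ES 6.x, these two structures were stored in two separate types and linked together, which looks much closer to a join between tables in a relational database. Because ES 6.x allows only one type per index, implementing a join now means squeezing the two "tables" into one. For this purpose ES 6.x defines a new data type: join.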
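Create the index and define its mapping through the RESTful API (if the blog index from the nested example still exists, delete it first):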
PUT http://localhost:9200/blog
{
  "mappings": {
    "article": {
      "properties": {
        "title": { "type": "text", "analyzer": "ik_smart", "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } } },
        "author": { "type": "text", "analyzer": "ik_smart", "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } } },
        "content": { "type": "text", "analyzer": "ik_smart" },
        "created": { "type": "date" },
        "comments": {
          "type": "join",
          "relations": { "article": "comment" }
        }
      }
    }
  }
}
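Pay particular attention to the "comments" field. Its type is defined as join, and relations defines which side is the parent and which is the child: "article":"comment" means that article is the parent and comment is the child.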
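Parent and child documents are inserted separately (you can think of it as several tables packed into a single table).
Insert the parent documents: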
POST http://localhost:9200/blog/article/1 { "title":"ElasticSearch6.x實戰教程", "author":"OKevin", "content":"這是一篇水文", "created":1562141626000, "comments":"article" }
POST http://localhost:9200/blog/article/2 { "title":"ElasticSearch6.x從入門到放棄", "author":"OKevin", "content":"這是一篇ES從入門到放棄文章", "created":1562144089000, "comments":"article" }
POST http://localhost:9200/blog/article/3 { "title":"ElasticSearch6.x原理解析", "author":"專家", "content":"這是一篇ES原理解析的文章", "created":1562144089000, "comments":"article" }
Insert the child documents:
POST http://localhost:9200/blog/article/4?routing=1 { "name":"張三", "content":"寫的真菜", "created":1562141689000, "comments":{ "name":"comment", "parent":1 } }
POST http://localhost:9200/blog/article/5?routing=1 { "name":"李四", "content":"辣雞", "created":1562141745000, "comments":{ "name":"comment", "parent":1 } }
POST http://localhost:9200/blog/article/6?routing=2 { "name":"張三", "content":"我已入門", "created":1562144089000, "comments":{ "name":"comment", "parent":2 } }
POST http://localhost:9200/blog/article/7?routing=2 { "name":"李四", "content":"我已放棄", "created":1562144089000, "comments":{ "name":"comment", "parent":2 } }
POST http://localhost:9200/blog/article/8?routing=3 { "name":"張三", "content":"牛逼,專家就是不同", "created":1562144089000, "comments":{ "name":"comment", "parent":3 } }
POST http://localhost:9200/blog/article/9?routing=3 { "name":"李四", "content":"大牛", "created":1562144089000, "comments":{ "name":"comment", "parent":3 } }
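Every child request above repeats the parent ID twice: once in the `routing` query parameter and once in the join field. ES requires the routing value because a child must be indexed on the same shard as its parent. Below is a minimal Python sketch that derives both from a single argument so the two cannot drift apart. The function `child_request` and the constants are hypothetical names for illustration; the code only builds the URL and body and performs no HTTP call.

```python
BASE_URL = "http://localhost:9200/blog/article"  # index and type used in this chapter
JOIN_FIELD = "comments"                          # name of the join field in the mapping


def child_request(doc_id, parent_id, name, content, created):
    """Return (url, body) for indexing one comment as a child of `parent_id`.

    Hypothetical helper: the same `parent_id` feeds both the routing
    parameter and the join field, mirroring the requests above.
    """
    url = f"{BASE_URL}/{doc_id}?routing={parent_id}"
    body = {
        "name": name,
        "content": content,
        "created": created,
        JOIN_FIELD: {"name": "comment", "parent": parent_id},
    }
    return url, body


if __name__ == "__main__":
    url, body = child_request(4, 1, "張三", "寫的真菜", 1562141689000)
    print(url)
    print(body)
```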
If you query the index you will find nine documents in total: unlike nested, the "comments" are not embedded inside the "articles".
Query for the comments whose parent article was written by "OKevin":
GET http://localhost:9200/blog/article/_search
{
  "query": {
    "has_parent": {
      "parent_type": "article",
      "query": {
        "match": { "author.keyword": "OKevin" }
      }
    }
  }
}
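ES returns only the comment documents, not the article data along with them. This is one of the differences between nested object queries and parent-child document queries: child documents can be returned on their own.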
Query for the articles that have a comment containing "辣雞":
GET http://localhost:9200/blog/article/_search
{
  "query": {
    "has_child": {
      "type": "comment",
      "query": {
        "match": { "content": "辣雞" }
      }
    }
  }
}
Likewise, ES returns only the parent documents, without the child documents (comments).
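To make the direction of the two queries concrete, here is a small in-memory Python simulation over the nine documents inserted earlier. It is only a sketch of the semantics, not of how ES evaluates these queries: the dictionary layout, the function names, and the use of equality and substring checks in place of ES's analyzed `match` are all assumptions made for illustration.

```python
# The nine documents from this chapter, reduced to the fields the queries touch.
DOCS = {
    1: {"author": "OKevin", "join": "article"},
    2: {"author": "OKevin", "join": "article"},
    3: {"author": "專家", "join": "article"},
    4: {"content": "寫的真菜", "join": "comment", "parent": 1},
    5: {"content": "辣雞", "join": "comment", "parent": 1},
    6: {"content": "我已入門", "join": "comment", "parent": 2},
    7: {"content": "我已放棄", "join": "comment", "parent": 2},
    8: {"content": "牛逼,專家就是不同", "join": "comment", "parent": 3},
    9: {"content": "大牛", "join": "comment", "parent": 3},
}


def has_parent(parent_pred):
    """IDs of child documents whose parent satisfies `parent_pred`."""
    return sorted(
        doc_id
        for doc_id, doc in DOCS.items()
        if doc["join"] == "comment" and parent_pred(DOCS[doc["parent"]])
    )


def has_child(child_pred):
    """IDs of parent documents with at least one child satisfying `child_pred`."""
    return sorted(
        {
            doc["parent"]
            for doc in DOCS.values()
            if doc["join"] == "comment" and child_pred(doc)
        }
    )


if __name__ == "__main__":
    # Comments under articles written by OKevin: children only.
    print(has_parent(lambda parent: parent["author"] == "OKevin"))  # [4, 5, 6, 7]
    # Articles that have a comment containing "辣雞": parents only.
    print(has_child(lambda child: "辣雞" in child["content"]))  # [1]
```

In both cases the predicate is applied to one side of the relation while the result comes from the other side, which is why neither query returns articles and comments together.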
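The biggest difference between nested objects and parent-child documents is that in a nested object the "parent" and its "children" are a single document, whereas in the parent-child model they are separate documents. This means that any operation on a nested object, including updates, deletes, and queries, affects the whole document: the document is re-indexed, but queries are fast. With parent-child documents, each parent or child is an independent document, so operating on one does not affect the other: nothing else is re-indexed, but queries are comparatively slow.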