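Baidu: when we want to look up any kind of information, we go to Baidu and search for it, for example a film we like, a book we like, or a news story we are interested in. That is most people's first impression of search, but Baidu != search.
Vertical search (site search)
Internet search: e-commerce sites, recruitment sites, news sites, all kinds of apps
IT-system search: OA (office automation) software, meeting management, schedule management, project management, employee management, where you might search for "Zhang San", "Zhang San'er", or "Zhang Xiaosan"; or the seller's back-office system of an e-commerce site, where searching orders for "toothpaste" should return the toothpaste-related orders
Search means finding the information you want in any scenario: you type in some keywords and expect to get back information related to them.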
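Anyone who does software development, or has some familiarity with IT and computing, knows that data is stored in databases: product information for an e-commerce site, job postings for a recruitment site, articles for a news site, and so on. So from a technical point of view, the natural first idea for implementing, say, site search on an e-commerce site is to run the search against the database.
Implementing search with a database is not very reliable, and performance is usually poor: a condition such as LIKE '%keyword%' cannot use an ordinary index, so the database has to scan every row.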
Full-text search: inverted index
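Lucene: a jar that packages ready-made code for building inverted indexes and running searches, including the various algorithms involved. When developing in Java, we add the Lucene jar as a dependency and program against the Lucene API. With Lucene we can build an index over existing data, and Lucene organizes the index data structures on the local disk for us. We can also use the features and APIs Lucene provides to search the index data on disk.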
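To make the idea of an inverted index concrete, here is a minimal sketch in Python. It is a toy, not how Lucene is implemented: the function names are made up for illustration, tokenization is a plain whitespace split, and the index lives in memory rather than on disk.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercase term to the sorted list of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def search(index, query):
    """Return the ids of documents that contain at least one query term."""
    hits = set()
    for term in query.lower().split():
        hits.update(index.get(term, []))
    return sorted(hits)


# Hypothetical sample documents: id -> text.
docs = {
    1: "gaolujie yagao",
    2: "jiajieshi yagao",
    3: "zhonghua yagao",
}
index = build_inverted_index(docs)
print(index["yagao"])             # every document contains "yagao"
print(search(index, "zhonghua"))  # only document 3 contains "zhonghua"
```

A lookup goes from term to documents directly, so the cost depends on the number of query terms rather than on the number of rows scanned.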
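Elasticsearch is a distributed, RESTful search and analytics engine capable of addressing a growing number of use cases. As the heart of the Elastic Stack, it centrally stores your data and helps you discover the expected and uncover the unexpected.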
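Search: Baidu, site search on websites, retrieval in IT systems
Data analysis: on an e-commerce site, which 10 sellers sold the most toothpaste in the last 7 days; on a news site, which 3 sections had the most visits in the last month
Distributed, search, data analysis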
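Full-text search: find the products whose name contains "toothpaste", select * from products where product_name like '%toothpaste%'
Structured search: find all products in the household chemicals category, select * from products where category_id='household chemicals'
Partial matching, autocomplete, search-term correction, search suggestions
Data analysis: count how many products there are in each category, select category_id,count(*) from products group by category_id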
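The three SQL statements above can be tried out with Python's built-in sqlite3 module. This is only a sketch to show what each kind of query returns: the table contents and category names are invented sample data, and SQLite stands in for whatever database a real site would use.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (product_name TEXT, category_id TEXT)")
conn.executemany(
    "INSERT INTO products VALUES (?, ?)",
    [
        ("gaolujie toothpaste", "household"),  # made-up sample rows
        ("zhonghua toothpaste", "household"),
        ("cotton towel", "textiles"),
    ],
)

# Full-text style lookup: substring match on the product name.
by_name = conn.execute(
    "SELECT product_name FROM products "
    "WHERE product_name LIKE '%toothpaste%' ORDER BY product_name"
).fetchall()

# Structured lookup: exact match on the category.
by_category = conn.execute(
    "SELECT product_name FROM products WHERE category_id = 'textiles'"
).fetchall()

# Data analysis: number of products per category.
per_category = conn.execute(
    "SELECT category_id, COUNT(*) FROM products "
    "GROUP BY category_id ORDER BY category_id"
).fetchall()

print(by_name)
print(by_category)
print(per_category)
```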
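Distributed: ES can automatically spread massive amounts of data across multiple servers for storage and retrieval
Massive data processing: once the system is distributed, a large number of servers can store and search the data, so handling massive data follows naturally
Near real-time: if retrieving data takes an hour, that is not near real-time but offline batch processing; near real-time means searching and analyzing data within seconds
The opposite of distributed/massive data: Lucene is a single-machine library; it can only be used on one server and can handle at most the amount of data that one server can process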
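(1) Can run as a large distributed cluster (hundreds of servers) processing PB-scale data for large companies, or on a single machine for small companies
(2) Elasticsearch is not a brand-new technology; it mainly combines full-text search, data analysis, and distributed technology, and that combination is what makes ES unique. The separate pieces already existed: Lucene (full-text search), commercial data analysis software, and distributed databases (Mycat)
(3) Works out of the box for users and is very simple: for a small or medium application with modest data volume and uncomplicated operations, you can deploy ES in about 3 minutes and use it as a production system
(4) A database's features fall short in many areas (it focuses on transactions and other online transaction processing operations). Special capabilities such as full-text search, synonym handling, relevance ranking, complex data analysis, and near real-time processing of massive data are missing. Elasticsearch complements a traditional database by providing many features the database cannot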
(1) A distributed document store
(2) A distributed search engine and analytics engine
(3) Distributed, supporting PB-scale data
// name: node name
// cluster_name: cluster name (the default cluster name is elasticsearch)
// version.number: 5.2.0, the ES version
{
  "name": "1LdqLFq",
  "cluster_name": "elasticsearch",
  "cluster_uuid": "5pqT0Q_XQky6GKjSiFgilA",
  "version": {
    "number": "5.2.0",
    "build_hash": "24e05b9",
    "build_date": "2017-01-24T19:52:35.800Z",
    "build_snapshot": false,
    "lucene_version": "6.4.0"
  },
  "tagline": "You Know, for Search"
}
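Application data structures are object-oriented and complex
Suppose an e-commerce site needs an ES-based back-office system that provides the following functions: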
GET /_cat/health?v
epoch      timestamp cluster       status node.total node.data shards pri relo init unassign pending_tasks max_task_wait_time active_shards_percent
1566094709 10:18:29  elasticsearch yellow          1         1      1   1    0    0        1             0                  -                 50.0%
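How do you quickly check the health of the cluster? green, yellow, or red?
green: every index has all of its primary shards and replica shards active
yellow: every index has all of its primary shards active, but some replica shards are not active and are unavailable
red: not every index has all of its primary shards active; some indexes have lost data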
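The three rules above can be written down as a small decision function. This is a sketch of the rules as stated here, not Elasticsearch's own code; the function names and the list-of-booleans representation of shard state are assumptions made for illustration.

```python
def index_health(primaries_active, replicas_active):
    """Classify one index from two lists of booleans (True = shard is active)."""
    if not all(primaries_active):
        return "red"      # at least one primary shard is not active
    if not all(replicas_active):
        return "yellow"   # primaries are fine, some replica is not active
    return "green"


def cluster_health(indices):
    """Take the worst status across all indexes as the cluster status."""
    severity = {"green": 0, "yellow": 1, "red": 2}
    statuses = [index_health(p, r) for p, r in indices]
    return max(statuses, key=severity.get, default="green")


# One index with 1 active primary and 1 inactive replica.
print(cluster_health([([True], [False])]))
```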
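Why is the cluster currently yellow?
Right now we have a single laptop running a single es process, which means there is only one node.
es currently holds one index, the one Kibana creates for itself.
By default each index gets 5 primary shards, each with 1 replica (5 replica shards in total), and a replica shard is never allocated to the same node as its primary shard (for fault tolerance).
The index Kibana creates for itself has 1 primary shard and 1 replica shard.
With only one node, the 1 primary shard has been allocated and started, but there is no second node on which to start the replica shard.
A small experiment: start a second es process so that the cluster has 2 nodes; the replica shard is then allocated to the new node automatically, and the cluster status turns green.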
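The experiment can be simulated in a few lines. The sketch below uses a simple round-robin placement with the single constraint described above (a replica never shares a node with its primary); the real allocator in Elasticsearch weighs many more factors, and the node names and the `allocate` function are hypothetical.

```python
def allocate(nodes, primaries=1, replicas_per_primary=1):
    """Place shards on nodes; a replica never shares a node with its primary.

    Returns (assignments, unassigned_replica_count, status).
    """
    assignments = {node: [] for node in nodes}
    unassigned = 0
    for shard in range(primaries):
        primary_node = nodes[shard % len(nodes)]
        assignments[primary_node].append(f"P{shard}")
        # Candidate nodes for the replicas of this shard.
        others = [n for n in nodes if n != primary_node]
        for copy in range(replicas_per_primary):
            if copy < len(others):
                assignments[others[copy]].append(f"R{shard}")
            else:
                unassigned += 1  # no eligible node left for this replica
    status = "green" if unassigned == 0 else "yellow"
    return assignments, unassigned, status


print(allocate(["node-1"]))            # the replica has nowhere to go
print(allocate(["node-1", "node-2"]))  # the replica lands on node-2
```

With one node the single replica stays unassigned, which matches the unassign value of 1 and the 50.0% active shards in the health output above.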
GET /_cat/indices?v
health status index   uuid                   pri rep docs.count docs.deleted store.size pri.store.size
yellow open   .kibana xpiNHK4UQb2569AzgiveSw   1   1          1            0      3.1kb          3.1kb
PUT /test_index?pretty
DELETE /test_index?pretty
Syntax:
PUT /index/type/id
{ "JSON data" }
Example:
PUT /ecommerce/product/1
{
  "name": "gaolujie yagao",
  "desc": "gaoxiao meibai",
  "price": 30,
  "producer": "gaolujie producer",
  "tags": [ "meibai", "fangzhu" ]
}
Response:
{
  "_index": "ecommerce",
  "_type": "product",
  "_id": "1",
  "_version": 1,
  "result": "created",
  "_shards": { "total": 2, "successful": 1, "failed": 0 },
  "created": true
}
PUT /ecommerce/product/2
{
  "name": "jiajieshi yagao",
  "desc": "youxiao fangzhu",
  "price": 25,
  "producer": "jiajieshi producer",
  "tags": [ "fangzhu" ]
}
PUT /ecommerce/product/3
{
  "name": "zhonghua yagao",
  "desc": "caoben zhiwu",
  "price": 40,
  "producer": "zhonghua producer",
  "tags": [ "qingxin" ]
}
es creates the index and type automatically, so they do not need to be created in advance. By default es also builds an inverted index on every field of the document so that it can be searched
Syntax:
GET /index/type/id
Example:
GET /ecommerce/product/1
Response:
{
  "_index": "ecommerce",
  "_type": "product",
  "_id": "1",
  "_version": 1,
  "found": true,
  "_source": {
    "name": "gaolujie yagao",
    "desc": "gaoxiao meibai",
    "price": 30,
    "producer": "gaolujie producer",
    "tags": [ "meibai", "fangzhu" ]
  }
}
Syntax:
PUT /index/type/id
{ "JSON data" }
Example:
PUT /ecommerce/product/1
{
  "name": "jiaqiangban gaolujie yagao",
  "desc": "gaoxiao meibai",
  "price": 30,
  "producer": "gaolujie producer",
  "tags": [ "meibai", "fangzhu" ]
}
Response:
{
  "_index": "ecommerce",
  "_type": "product",
  "_id": "1",
  "_version": 2,
  "result": "updated",
  "_shards": { "total": 2, "successful": 1, "failed": 0 },
  "created": false
}
The drawback of replacing a document is that you must send every field in order to change anything, because the request overwrites the whole document
Syntax:
POST /index/type/id/_update
{ "JSON data" }
Example:
POST /ecommerce/product/1/_update
{
  "doc": {
    "name": "jiaqiangban gaolujie yagao"
  }
}
Response:
{
  "_index": "ecommerce",
  "_type": "product",
  "_id": "1",
  "_version": 8,
  "result": "updated",
  "_shards": { "total": 2, "successful": 1, "failed": 0 }
}
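The difference between replacing a document with PUT and updating it with _update can be modelled with plain dictionaries. This is a sketch of the semantics only, for flat documents like the ones in this walkthrough; the in-memory `store` and both function names are made up and no request is sent to es.

```python
import copy


def full_replace(store, doc_id, new_doc):
    """PUT semantics: the stored document becomes exactly new_doc."""
    store[doc_id] = copy.deepcopy(new_doc)


def partial_update(store, doc_id, partial_doc):
    """_update semantics with a "doc" body: merge the given fields into the stored document."""
    store[doc_id].update(copy.deepcopy(partial_doc))


original = {
    "name": "gaolujie yagao",
    "desc": "gaoxiao meibai",
    "price": 30,
    "producer": "gaolujie producer",
    "tags": ["meibai", "fangzhu"],
}
change = {"name": "jiaqiangban gaolujie yagao"}

replaced = {1: copy.deepcopy(original)}
full_replace(replaced, 1, change)      # every field except name is gone

updated = {1: copy.deepcopy(original)}
partial_update(updated, 1, change)     # only name changes, the rest is kept

print(replaced[1])
print(updated[1])
```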
Syntax:
DELETE /index/type/id
Example:
DELETE /ecommerce/product/1
Response:
{
  "found": true,
  "_index": "ecommerce",
  "_type": "product",
  "_id": "1",
  "_version": 9,
  "result": "deleted",
  "_shards": { "total": 2, "successful": 1, "failed": 0 }
}
Search all products:
GET /ecommerce/product/_search
{
  "took": 3,
  "timed_out": false,
  "_shards": { "total": 5, "successful": 5, "failed": 0 },
  "hits": {
    "total": 3,
    "max_score": 1,
    "hits": [
      {
        "_index": "ecommerce", "_type": "product", "_id": "2", "_score": 1,
        "_source": { "name": "jiajieshi yagao", "desc": "youxiao fangzhu", "price": 25, "producer": "jiajieshi producer", "tags": [ "fangzhu" ] }
      },
      {
        "_index": "ecommerce", "_type": "product", "_id": "1", "_score": 1,
        "_source": { "name": "gaolujie yagao", "desc": "gaoxiao meibai", "price": 30, "producer": "gaolujie producer", "tags": [ "meibai", "fangzhu" ] }
      },
      {
        "_index": "ecommerce", "_type": "product", "_id": "3", "_score": 1,
        "_source": { "name": "zhonghua yagao", "desc": "caoben zhiwu", "price": 40, "producer": "zhonghua producer", "tags": [ "qingxin" ] }
      }
    ]
  }
}
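took: how many milliseconds the request took
timed_out: whether the request timed out; here it did not
_shards: the data is split into 5 shards, so the search request is sent to every primary shard (or to one of its replica shards instead)
hits.total: the number of results, here 3 documents
hits.max_score: a score is the relevance score of a document for a search; the more relevant the document, the better the match and the higher the score
hits.hits: the full data of the documents that matched the search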
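Search for products whose name contains yagao, sorted by price in descending order
GET /ecommerce/product/_search?q=name:yagao&sort=price:desc
The name query string search comes from the fact that the search parameters are carried in the query string of the HTTP request
It suits ad hoc use from command-line tools such as curl, to fire off a quick request and retrieve what you need; but complex queries are hard to build this way
Query string search is rarely used in production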
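As a sketch of how such a URL is assembled programmatically, the snippet below builds the same request with Python's standard urllib.parse. The base URL http://localhost:9200 and the helper name are assumptions for illustration; note that urlencode percent-encodes the colons.

```python
from urllib.parse import urlencode


def query_string_search_url(base_url, index, doc_type, field, term, sort_field, order):
    """Build a query string search URL of the form /index/type/_search?q=...&sort=..."""
    params = urlencode({"q": f"{field}:{term}", "sort": f"{sort_field}:{order}"})
    return f"{base_url}/{index}/{doc_type}/_search?{params}"


url = query_string_search_url(
    "http://localhost:9200",  # assumed address of a local es node
    "ecommerce", "product",
    "name", "yagao",
    "price", "desc",
)
print(url)
```

Every extra condition has to be squeezed into the q and sort parameters, which is why this style becomes unwieldy for complex queries.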
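DSL: Domain Specific Language
Advantage: better suited to production use and able to express complex queries
HTTP request body: the query is written as JSON in the request body, which is convenient and can express all kinds of complex syntax; it is far more powerful than query string search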
GET /ecommerce/product/_search
{
  "query": { "match_all": {} }
}
GET /ecommerce/product/_search
{
  "query": {
    "match": { "name": "yagao" }
  },
  "sort": [
    { "price": { "order": "desc" } }
  ]
}
Paginated product query: there are 3 products in total; assuming each page shows 1 product and we want page 2, the query returns the 2nd product
GET /ecommerce/product/_search
{
  "query": { "match_all": {} },
  "from": 1,
  "size": 1
}
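The from value is an offset counted from 0, so it is derived from a page number rather than being the page number itself. A minimal sketch of that arithmetic, assuming 1-based page numbers; the helper name `page_request` is made up:

```python
def page_request(page, page_size):
    """Build a match_all request body for a 1-based page number."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be >= 1")
    return {
        "query": {"match_all": {}},
        "from": (page - 1) * page_size,  # number of documents to skip
        "size": page_size,               # number of documents to return
    }


print(page_request(2, 1))   # page 2 with 1 product per page
print(page_request(3, 10))  # page 3 with 10 products per page
```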
Return only the name and price of each product
GET /ecommerce/product/_search
{
  "query": { "match_all": {} },
  "_source": ["name", "price"]
}
Search for products whose name contains yagao and whose price is greater than 25
GET /ecommerce/product/_search
{
  "query": {
    "bool": {
      "must": {
        "match": { "name": "yagao" }
      },
      "filter": {
        "range": {
          "price": { "gt": 25 }
        }
      }
    }
  }
}
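To see which documents this query selects, the same two conditions can be applied to the three products indexed so far. This is a plain-Python imitation for illustration, with made-up helper names and whitespace tokenization; it does not reproduce scoring.

```python
# The three products indexed so far (only the fields the query looks at).
products = [
    {"id": 1, "name": "gaolujie yagao", "price": 30},
    {"id": 2, "name": "jiajieshi yagao", "price": 25},
    {"id": 3, "name": "zhonghua yagao", "price": 40},
]


def name_matches(doc, term):
    """Stand-in for the match clause: the term must be one of the name's tokens."""
    return term in doc["name"].split()


def search_products(docs, term, min_price_exclusive):
    """Apply the must clause and the range filter (gt) together."""
    return [
        d["id"]
        for d in docs
        if name_matches(d, term) and d["price"] > min_price_exclusive
    ]


result = search_products(products, "yagao", 25)
print(result)  # product 2 costs exactly 25, so "gt": 25 excludes it
```

In es the must clause contributes to the relevance score while the filter clause only includes or excludes documents.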
Add one more test document
PUT /ecommerce/product/4
{
  "name": "special yagao",
  "desc": "special meibai",
  "price": 50,
  "producer": "special yagao producer",
  "tags": [ "meibai" ]
}
Full-text search
GET /ecommerce/product/_search
{
  "query": {
    "match": { "producer": "yagao producer" }
  }
}
{
  "took": 4,
  "timed_out": false,
  "_shards": { "total": 5, "successful": 5, "failed": 0 },
  "hits": {
    "total": 4,
    "max_score": 0.70293105,
    "hits": [
      {
        "_index": "ecommerce", "_type": "product", "_id": "4", "_score": 0.70293105,
        "_source": { "name": "special yagao", "desc": "special meibai", "price": 50, "producer": "special yagao producer", "tags": [ "meibai" ] }
      },
      {
        "_index": "ecommerce", "_type": "product", "_id": "1", "_score": 0.25811607,
        "_source": { "name": "gaolujie yagao", "desc": "gaoxiao meibai", "price": 30, "producer": "gaolujie producer", "tags": [ "meibai", "fangzhu" ] }
      },
      {
        "_index": "ecommerce", "_type": "product", "_id": "3", "_score": 0.25811607,
        "_source": { "name": "zhonghua yagao", "desc": "caoben zhiwu", "price": 40, "producer": "zhonghua producer", "tags": [ "qingxin" ] }
      },
      {
        "_index": "ecommerce", "_type": "product", "_id": "2", "_score": 0.1805489,
        "_source": { "name": "jiajieshi yagao", "desc": "youxiao fangzhu", "price": 25, "producer": "jiajieshi producer", "tags": [ "fangzhu" ] }
      }
    ]
  }
}
Phrase search is the counterpart of full-text search. Full-text search splits the input string into terms and looks each one up in the inverted index; a document is returned as long as it matches any one of the terms
Phrase search requires the text of the specified field to contain the input string exactly as given, in full, before the document counts as a match and is returned
GET /ecommerce/product/_search
{
  "query": {
    "match_phrase": { "producer": "yagao producer" }
  }
}
{
  "took": 7,
  "timed_out": false,
  "_shards": { "total": 5, "successful": 5, "failed": 0 },
  "hits": {
    "total": 1,
    "max_score": 0.70293105,
    "hits": [
      {
        "_index": "ecommerce", "_type": "product", "_id": "4", "_score": 0.70293105,
        "_source": { "name": "special yagao", "desc": "special meibai", "price": 50, "producer": "special yagao producer", "tags": [ "meibai" ] }
      }
    ]
  }
}
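The contrast between the two queries can be reproduced on the producer values of the four products. This is a simplified sketch with whitespace tokenization and no scoring; the `match` and `match_phrase` functions here are toy imitations named after the es queries, not the es implementation.

```python
# producer field of each product, keyed by document id.
producers = {
    1: "gaolujie producer",
    2: "jiajieshi producer",
    3: "zhonghua producer",
    4: "special yagao producer",
}


def match(docs, query):
    """Full-text style: a document matches if it contains any query term."""
    terms = set(query.split())
    return sorted(i for i, text in docs.items() if terms & set(text.split()))


def match_phrase(docs, query):
    """Phrase style: the query terms must appear consecutively and in order."""
    q = query.split()
    hits = []
    for i, text in docs.items():
        tokens = text.split()
        windows = range(len(tokens) - len(q) + 1)
        if any(tokens[s:s + len(q)] == q for s in windows):
            hits.append(i)
    return sorted(hits)


print(match(producers, "yagao producer"))         # all four contain "producer"
print(match_phrase(producers, "yagao producer"))  # only document 4 has the phrase
```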
Highlight search: wrap the matched terms of the producer field in <em> tags
GET /ecommerce/product/_search
{
  "query": {
    "match": { "producer": "producer" }
  },
  "highlight": {
    "fields": { "producer": {} }
  }
}
{
  "took": 15,
  "timed_out": false,
  "_shards": { "total": 5, "successful": 5, "failed": 0 },
  "hits": {
    "total": 4,
    "max_score": 0.25811607,
    "hits": [
      {
        "_index": "ecommerce", "_type": "product", "_id": "1", "_score": 0.25811607,
        "_source": { "name": "gaolujie yagao", "desc": "gaoxiao meibai", "price": 30, "producer": "gaolujie producer", "tags": [ "meibai", "fangzhu" ] },
        "highlight": { "producer": [ "gaolujie <em>producer</em>" ] }
      },
      {
        "_index": "ecommerce", "_type": "product", "_id": "3", "_score": 0.25811607,
        "_source": { "name": "zhonghua yagao", "desc": "caoben zhiwu", "price": 40, "producer": "zhonghua producer", "tags": [ "qingxin" ] },
        "highlight": { "producer": [ "zhonghua <em>producer</em>" ] }
      },
      {
        "_index": "ecommerce", "_type": "product", "_id": "2", "_score": 0.1805489,
        "_source": { "name": "jiajieshi yagao", "desc": "youxiao fangzhu", "price": 25, "producer": "jiajieshi producer", "tags": [ "fangzhu" ] },
        "highlight": { "producer": [ "jiajieshi <em>producer</em>" ] }
      },
      {
        "_index": "ecommerce", "_type": "product", "_id": "4", "_score": 0.14638957,
        "_source": { "name": "special yagao", "desc": "special meibai", "price": 50, "producer": "special yagao producer", "tags": [ "meibai" ] },
        "highlight": { "producer": [ "special yagao <em>producer</em>" ] }
      }
    ]
  }
}
Count the number of products under each tag
// Set the fielddata property of the text field to true
PUT /ecommerce/_mapping/product
{
  "properties": {
    "tags": {
      "type": "text",
      "fielddata": true
    }
  }
}
// Run the aggregation
GET /ecommerce/product/_search
{
  "aggs": {
    "group_by_tags": {
      "terms": { "field": "tags" }
    }
  }
}
{
  "took": 20,
  "timed_out": false,
  "_shards": { "total": 5, "successful": 5, "failed": 0 },
  "hits": {
    "total": 4,
    "max_score": 1,
    "hits": [
      {
        "_index": "ecommerce", "_type": "product", "_id": "2", "_score": 1,
        "_source": { "name": "jiajieshi yagao", "desc": "youxiao fangzhu", "price": 25, "producer": "jiajieshi producer", "tags": [ "fangzhu" ] }
      },
      {
        "_index": "ecommerce", "_type": "product", "_id": "4", "_score": 1,
        "_source": { "name": "special yagao", "desc": "special meibai", "price": 50, "producer": "special yagao producer", "tags": [ "meibai" ] }
      },
      {
        "_index": "ecommerce", "_type": "product", "_id": "1", "_score": 1,
        "_source": { "name": "gaolujie yagao", "desc": "gaoxiao meibai", "price": 30, "producer": "gaolujie producer", "tags": [ "meibai", "fangzhu" ] }
      },
      {
        "_index": "ecommerce", "_type": "product", "_id": "3", "_score": 1,
        "_source": { "name": "zhonghua yagao", "desc": "caoben zhiwu", "price": 40, "producer": "zhonghua producer", "tags": [ "qingxin" ] }
      }
    ]
  },
  "aggregations": {
    "group_by_tags": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        { "key": "fangzhu", "doc_count": 2 },
        { "key": "meibai", "doc_count": 2 },
        { "key": "qingxin", "doc_count": 1 }
      ]
    }
  }
}
Do not return the hits, only the aggregation result
GET /ecommerce/product/_search
{
  "size": 0,
  "aggs": {
    "group_by_tags": {
      "terms": { "field": "tags" }
    }
  }
}
{
  "took": 20,
  "timed_out": false,
  "_shards": { "total": 5, "successful": 5, "failed": 0 },
  "hits": { "total": 4, "max_score": 0, "hits": [] },
  "aggregations": {
    "group_by_tags": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        { "key": "fangzhu", "doc_count": 2 },
        { "key": "meibai", "doc_count": 2 },
        { "key": "qingxin", "doc_count": 1 }
      ]
    }
  }
}
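What the terms aggregation computes here is a count of documents per distinct tag value. A minimal sketch of the same grouping in Python, using the tags of the four products; the function name is made up, and the sort order (count descending, then key ascending) is chosen simply to reproduce the bucket order shown in the response.

```python
from collections import Counter

# tags field of each product, keyed by document id.
tags_by_product = {
    1: ["meibai", "fangzhu"],
    2: ["fangzhu"],
    3: ["qingxin"],
    4: ["meibai"],
}


def terms_aggregation(tag_lists):
    """Count documents per tag and return buckets like the es response."""
    counts = Counter(tag for tags in tag_lists for tag in tags)
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [{"key": key, "doc_count": count} for key, count in ordered]


buckets = terms_aggregation(tags_by_product.values())
for bucket in buckets:
    print(bucket)
```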
For the products whose name contains yagao, count the number of products under each tag
GET /ecommerce/product/_search
{
  "query": {
    "match": { "name": "yagao" }
  },
  "size": 0,
  "aggs": {
    "group_by_tags": {
      "terms": { "field": "tags" }
    }
  }
}
{
  "took": 2,
  "timed_out": false,
  "_shards": { "total": 5, "successful": 5, "failed": 0 },
  "hits": { "total": 4, "max_score": 0, "hits": [] },
  "aggregations": {
    "group_by_tags": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        { "key": "fangzhu", "doc_count": 2 },
        { "key": "meibai", "doc_count": 2 },
        { "key": "qingxin", "doc_count": 1 }
      ]
    }
  }
}
Group by tag, then compute the average price of the products in each group
GET /ecommerce/product/_search
{
  "size": 0,
  "aggs": {
    "group_by_tags": {
      "terms": { "field": "tags" },
      "aggs": {
        "avg_price": {
          "avg": { "field": "price" }
        }
      }
    }
  }
}
{
  "took": 3,
  "timed_out": false,
  "_shards": { "total": 5, "successful": 5, "failed": 0 },
  "hits": { "total": 4, "max_score": 0, "hits": [] },
  "aggregations": {
    "group_by_tags": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        { "key": "fangzhu", "doc_count": 2, "avg_price": { "value": 27.5 } },
        { "key": "meibai", "doc_count": 2, "avg_price": { "value": 40 } },
        { "key": "qingxin", "doc_count": 1, "avg_price": { "value": 40 } }
      ]
    }
  }
}
Group by tag, compute the average price of each group, and sort the groups by average price in descending order
GET /ecommerce/product/_search
{
  "size": 0,
  "aggs": {
    "group_by_tags": {
      "terms": {
        "field": "tags",
        "order": { "avg_price": "desc" }
      },
      "aggs": {
        "avg_price": {
          "avg": { "field": "price" }
        }
      }
    }
  }
}
{
  "took": 1,
  "timed_out": false,
  "_shards": { "total": 5, "successful": 5, "failed": 0 },
  "hits": { "total": 4, "max_score": 0, "hits": [] },
  "aggregations": {
    "group_by_tags": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        { "key": "meibai", "doc_count": 2, "avg_price": { "value": 40 } },
        { "key": "qingxin", "doc_count": 1, "avg_price": { "value": 40 } },
        { "key": "fangzhu", "doc_count": 2, "avg_price": { "value": 27.5 } }
      ]
    }
  }
}
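The numbers in this response can be checked by hand with a small Python sketch that groups the four products by tag, averages the prices, and sorts by the average. The function name is made up, and ties are broken by key only to reproduce the order shown above.

```python
from collections import defaultdict

# price and tags of the four products.
products = [
    {"price": 30, "tags": ["meibai", "fangzhu"]},
    {"price": 25, "tags": ["fangzhu"]},
    {"price": 40, "tags": ["qingxin"]},
    {"price": 50, "tags": ["meibai"]},
]


def avg_price_by_tag(docs):
    """Group prices by tag, then return buckets sorted by average price descending."""
    prices = defaultdict(list)
    for doc in docs:
        for tag in doc["tags"]:
            prices[tag].append(doc["price"])
    buckets = [
        {"key": tag, "doc_count": len(p), "avg_price": sum(p) / len(p)}
        for tag, p in prices.items()
    ]
    # Ties on the average are broken by key so the result is deterministic.
    return sorted(buckets, key=lambda b: (-b["avg_price"], b["key"]))


for bucket in avg_price_by_tag(products):
    print(bucket)
```

A product with two tags, such as the one priced at 30, contributes its price to both of its groups.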
Group by price range, then by tag within each range, then compute the average price of each group
GET /ecommerce/product/_search
{
  "size": 0,
  "aggs": {
    "group_by_price": {
      "range": {
        "field": "price",
        "ranges": [
          { "from": 0, "to": 20 },
          { "from": 20, "to": 40 },
          { "from": 40, "to": 50 }
        ]
      },
      "aggs": {
        "group_by_tags": {
          "terms": { "field": "tags" },
          "aggs": {
            "average_price": {
              "avg": { "field": "price" }
            }
          }
        }
      }
    }
  }
}
{
  "took": 3,
  "timed_out": false,
  "_shards": { "total": 5, "successful": 5, "failed": 0 },
  "hits": { "total": 4, "max_score": 0, "hits": [] },
  "aggregations": {
    "group_by_price": {
      "buckets": [
        {
          "key": "0.0-20.0", "from": 0, "to": 20, "doc_count": 0,
          "group_by_tags": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": []
          }
        },
        {
          "key": "20.0-40.0", "from": 20, "to": 40, "doc_count": 2,
          "group_by_tags": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
              { "key": "fangzhu", "doc_count": 2, "average_price": { "value": 27.5 } },
              { "key": "meibai", "doc_count": 1, "average_price": { "value": 30 } }
            ]
          }
        },
        {
          "key": "40.0-50.0", "from": 40, "to": 50, "doc_count": 1,
          "group_by_tags": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
              { "key": "qingxin", "doc_count": 1, "average_price": { "value": 40 } }
            ]
          }
        }
      ]
    }
  }
}
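The bucket counts above add up to 3 although there are 4 products, because each range includes its from value and excludes its to value, so the product priced at 50 falls outside the last range. The sketch below reproduces the grouping in plain Python under that rule; the function name and the shape of the result are simplifications for illustration.

```python
# price and tags of the four products.
products = [
    {"price": 30, "tags": ["meibai", "fangzhu"]},
    {"price": 25, "tags": ["fangzhu"]},
    {"price": 40, "tags": ["qingxin"]},
    {"price": 50, "tags": ["meibai"]},
]

RANGES = [(0, 20), (20, 40), (40, 50)]


def range_then_tags(docs, ranges):
    """Bucket by price range (from inclusive, to exclusive), then average price per tag."""
    result = []
    for low, high in ranges:
        in_range = [d for d in docs if low <= d["price"] < high]
        prices_by_tag = {}
        for d in in_range:
            for tag in d["tags"]:
                prices_by_tag.setdefault(tag, []).append(d["price"])
        result.append({
            "key": f"{float(low)}-{float(high)}",
            "doc_count": len(in_range),
            "average_price_by_tag": {
                tag: sum(p) / len(p) for tag, p in prices_by_tag.items()
            },
        })
    return result


for bucket in range_then_tags(products, RANGES):
    print(bucket)
```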