Getting Started with Elasticsearch

Date: 2019-11-06


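Meaning of GET, POST, PUT, DELETE, and HEAD in the ES RESTful API:
1) GET: retrieve the current state of the requested object.
2) POST: change the current state of an object.
3) PUT: create an object.
4) DELETE: destroy an object.
5) HEAD: request basic information about an object.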

[Figure: comparison of core concepts in MySQL and Elasticsearch]

I. Insert

1. Insert with a specified ID using PUT

PUT /megacorp/employee/1
{
    "first_name" : "John",
    "last_name" :  "Smith",
    "age" :        25,
    "about" :      "I love to go rock climbing",
    "interests": [ "sports", "music" ]
}

2. Insert with an auto-generated ID using POST

POST /megacorp/employee
{
    "first_name" : "John",
    "last_name" :  "Smith",
    "age" :        25,
    "about" :      "I love to go rock climbing",
    "interests": [ "sports", "music" ]
}

3. Bulk insert

curl -XPOST localhost:9200/_bulk --data-binary @data.json

Contents of data.json:

{"index":{"_index":"meterdata","_type":"autoData"}}
{"Mfid":1,"TData":172170,"TMoney":209,"HTime":"2016-05-17T08:03:00"}
{"index":{"_index":"meterdata","_type":"autoData"}}
{"Mfid":1,"TData":172170,"TMoney":209,"HTime":"2016-05-17T08:04:00"}
{"index":{"_index":"meterdata","_type":"autoData"}}
{"Mfid":1,"TData":172170,"TMoney":209,"HTime":"2016-05-17T08:05:00"}
{"index":{"_index":"meterdata","_type":"autoData"}}
{"Mfid":1,"TData":172170,"TMoney":209,"HTime":"2016-05-17T08:06:00"}
{"index":{"_index":"meterdata","_type":"autoData"}}
{"Mfid":1,"TData":172170,"TMoney":209,"HTime":"2016-05-17T08:07:00"}
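
The bulk file alternates one action line with one source line per document. A minimal sketch of generating such a body in Python, assuming the same index and type names as above; `build_bulk_body` is a hypothetical helper, not part of any Elasticsearch client:

```python
import json

def build_bulk_body(index, doc_type, docs):
    """Return an NDJSON _bulk body: one action line plus one source line per document."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index, "_type": doc_type}}))
        lines.append(json.dumps(doc))
    # The bulk API expects the body to end with a newline.
    return "\n".join(lines) + "\n"

readings = [
    {"Mfid": 1, "TData": 172170, "TMoney": 209, "HTime": "2016-05-17T08:03:00"},
    {"Mfid": 1, "TData": 172170, "TMoney": 209, "HTime": "2016-05-17T08:04:00"},
]
body = build_bulk_body("meterdata", "autoData", readings)
print(body, end="")
```

Writing `body` to data.json gives a file that can be posted with the curl command above.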

4. Upsert

If the document exists, the script is executed; if it does not exist, the content of upsert is inserted as the new document.

curl -XPOST 'localhost:9200/test/type1/1/_update' -d '{
    "script" : {
        "inline": "ctx._source.counter += count",
        "params" : {
            "count" : 4
        }
    },
    "upsert" : {
        "counter" : 1
    }
}'
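
The upsert behaviour can be modelled in a few lines of Python. This is only a sketch of the semantics described above, using a plain dict as a stand-in for the index; `update_with_upsert` and `add_count` are made-up names:

```python
def update_with_upsert(store, doc_id, script, upsert):
    """Run the script if the document exists, otherwise insert the upsert content."""
    if doc_id in store:
        script(store[doc_id])
    else:
        store[doc_id] = dict(upsert)
    return store[doc_id]

def add_count(source, count=4):
    # Plays the role of "ctx._source.counter += count" with params.count = 4.
    source["counter"] += count

store = {}
first = dict(update_with_upsert(store, "1", add_count, {"counter": 1}))
second = dict(update_with_upsert(store, "1", add_count, {"counter": 1}))
print(first, second)
```

The first call inserts the upsert content unchanged; only the second call runs the script.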

II. Update

You can use a script to update a whole document, use doc to update part of a document, or use upsert to add a document that does not exist yet.

1. Full update

curl -XPUT localhost:9200/test/type1/1 -d '{
    "counter" : 1,
    "tags" : ["red"]
}'

2. Partial update

curl -XPOST "localhost:9200/gengxin/update/1/_update?pretty" -d '
{
   "doc": {"job": "奮鬥者"}
}'

3. Script update

(1) Update a field

curl -XPOST 'localhost:9200/test/type1/1/_update' -d '{
    "script" : {
        "inline": "ctx._source.counter += count",
        "params" : {
            "count" : 4
        }
    }
}'

(2) Add a new field

curl -XPOST 'localhost:9200/test/type1/1/_update' -d '{
    "script" : "ctx._source.name_of_new_field = \"value_of_new_field\""
}'

(3) Remove a field

curl -XPOST 'localhost:9200/test/type1/1/_update' -d '{
    "script" : "ctx._source.remove(\"name_of_field\")"
}'

A script can also change the operation itself. The following request deletes the document when its tags field contains blue and otherwise does nothing:

curl -XPOST 'localhost:9200/test/type1/1/_update' -d '{
    "script" : {
        "inline": "ctx._source.tags.contains(tag) ? ctx.op = \"delete\" : ctx.op = \"none\"",
        "params" : {
            "tag" : "blue"
        }
    }
}'

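What is the difference between a full update and a partial update?

A full update marks the old document as deleted and then adds a new, updated document.

A partial update sends only the fields to change, and ES merges them into the existing document.

References:

http://www.cnblogs.com/shihuc/p/5978078.html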
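
A small Python model may make the difference concrete. It assumes a dict as the store and a shallow merge for the partial update; the function names are invented for this sketch:

```python
def full_update(store, doc_id, new_source):
    """Replace the whole document and bump its version."""
    old_version = store.get(doc_id, {"_version": 0})["_version"]
    store[doc_id] = {"_source": dict(new_source), "_version": old_version + 1}

def partial_update(store, doc_id, partial_doc):
    """Merge the given fields into the existing document (shallow merge in this sketch)."""
    old = store[doc_id]
    merged = {**old["_source"], **partial_doc}
    store[doc_id] = {"_source": merged, "_version": old["_version"] + 1}

store = {}
full_update(store, "1", {"counter": 1, "tags": ["red"]})
partial_update(store, "1", {"job": "engineer"})
after_partial = dict(store["1"]["_source"])
full_update(store, "1", {"counter": 2})
after_full = dict(store["1"]["_source"])
print(after_partial, after_full, store["1"]["_version"])
```

After the partial update the old fields are still present; after the second full update only the fields in the new body remain.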

http://www.cnblogs.com/xing901022/p/5330778.html

III. Delete

curl -XDELETE 'http://localhost:9200/twitter/tweet/1'

Routing

If a routing value was supplied when the document was indexed, the same routing value must be supplied when deleting it:

$ curl -XDELETE 'http://localhost:9200/twitter/tweet/1?routing=kimchy'

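In the example above, the document with id 1 is looked up through the given routing value. If the routing value is wrong, the document may not be found. In cases where the _routing parameter is required but no value is given, the delete request is broadcast to every shard and executed there.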
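
Why the routing value matters can be sketched as follows. Elasticsearch picks the shard as hash(routing) modulo the number of primary shards, with the document id as the default routing value. In this sketch `zlib.crc32` is only a stand-in for the real hash function, and the list of dicts is a stand-in for the shards:

```python
import zlib

def shard_for(routing, number_of_primary_shards):
    # crc32 is a placeholder hash chosen for this sketch, not the one ES uses.
    return zlib.crc32(routing.encode("utf-8")) % number_of_primary_shards

def index_doc(shards, doc_id, source, routing=None):
    shards[shard_for(routing or doc_id, len(shards))][doc_id] = source

def delete_doc(shards, doc_id, routing=None):
    """Look only at the shard the routing value points to; return True if the doc was found."""
    shard = shards[shard_for(routing or doc_id, len(shards))]
    return shard.pop(doc_id, None) is not None

shards = [{} for _ in range(5)]
index_doc(shards, "1", {"user": "kimchy"}, routing="kimchy")
found = delete_doc(shards, "1", routing="kimchy")
print(found)
```

A delete without `routing="kimchy"` would hash the id instead and could land on a different shard, where the document does not exist.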

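Summary of deletion in ES

If the document exists, ES returns a 200 OK status code, found is true, and _version is incremented by 1.

If the document does not exist, ES returns a 404 Not Found status code and found is false, but _version is still incremented by 1. This is part of the internal bookkeeping that ensures the order of different operations across multiple nodes is recorded correctly.

Like an update, a delete in ES does not take effect immediately. The document is only marked as deleted, and ES removes it automatically later.

For example, deletions accumulate step by step; once a threshold is reached, say after a few dozen documents have been deleted, ES removes them in one pass, which saves disk I/O.

References: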

http://www.cnblogs.com/zlslch/p/6421648.html

http://www.cnblogs.com/xing901022/archive/2016/03/26/5321659.html

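IV. Query

1. Query and filter

(1) Query context: the query not only decides whether a document matches but also computes a score that determines relevance.
(2) Filter context: the query only checks whether a document satisfies the condition; no score is computed, and the results can be cached.
Reference: http://www.cnblogs.com/xing901022/p/4975931.html

Lightweight search with a query string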

GET /megacorp/employee/_search?q=last_name:Smith

2.Filter DSL

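(1) term

An exact match: the search term is not run through an analyzer, so the document must contain the whole term. (For Chinese, each character is indexed as a separate term by default, so only single characters can be found.)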

POST /megacorp/employee/_search
{
  "query": {
    "term": {
      "last_name": "Smith"
    }
  }
}

(2) terms filter

terms is similar to term, but it lets you specify several values. A document matches if the field contains any of the listed values:

POST /megacorp/employee/_search
{
  "query": {
    "terms": {
      "last_name": ["Bob","Smith"]
    }
  }
}

(3) range filter

Finds documents whose value falls within a specified range.

POST /megacorp/employee/_search
{
  "query": {
    "range": {
      "age": {
        "gt": 18
      }
    }
  }
}

(4) exists and missing filters

These find documents that contain a given field (exists) or lack it (missing), similar to the IS NOT NULL and IS NULL conditions in SQL.

POST /megacorp/employee/_search
{
  "query": {
    "exists":   {
        "field":    "title"
    }
  }
}

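(5) Combining queries with bool

Use a bool filter to combine multiple filters and implement AND, OR, and NOT logic. Every must clause has to match, and no must_not clause may match. Documents that match should clauses get a higher relevance score. By default none of the should clauses is required to match, with one exception: if the query has no must clause, at least one should clause has to match. The minimum_should_match parameter controls how many should clauses must match; it accepts either an absolute number or a percentage.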

GET /my_index/my_type/_search
{
  "query": {
    "bool": {
      "must":     { "match": { "title": "quick" }},
      "must_not": { "match": { "title": "lazy"  }},
      "should": [
                  { "match": { "title": "brown" }},
                  { "match": { "title": "dog"   }}
      ]
    }
  }
}
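
The matching rules for must, must_not, and should can be written down as a small Python function. This is a sketch of the rules as described above, with a set of terms standing in for a document; it ignores scoring and is not how Elasticsearch is implemented:

```python
def bool_matches(doc_terms, must=(), must_not=(), should=(), minimum_should_match=None):
    """Return True if a document (a set of terms) satisfies the bool clauses."""
    if any(term not in doc_terms for term in must):
        return False
    if any(term in doc_terms for term in must_not):
        return False
    if minimum_should_match is None:
        # Default: no should clause is required unless there is no must clause.
        minimum_should_match = 0 if must else (1 if should else 0)
    matched = sum(1 for term in should if term in doc_terms)
    return matched >= minimum_should_match

print(bool_matches({"quick", "brown", "fox"}, must=["quick"], must_not=["lazy"], should=["brown", "dog"]))
print(bool_matches({"quick", "lazy"}, must=["quick"], must_not=["lazy"]))
print(bool_matches({"fox"}, should=["brown", "dog"]))
print(bool_matches({"quick"}, must=["quick"], should=["brown", "dog"], minimum_should_match=1))
```

The third call fails because, without a must clause, at least one should clause has to match; the fourth fails because minimum_should_match raises the bar explicitly.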

(6) Filter

A filter gives the effect of a WHERE clause in SQL. For example, to find employees named Smith who are older than 30, you can search like this:

POST /megacorp/employee/_search
{
  "query" : {
      "filtered" : {
          "filter" : {
              "range" : {
                  "age" : { "gt" : 30 } 
              }
          },
          "query" : {
              "match" : {
                  "last_name" : "Smith" 
              }
          }
      }
  }
}
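
The filtered query shown above was deprecated in Elasticsearch 2.0 and removed in 5.0; on newer versions the same search is usually written as a bool query with a filter clause. A minimal sketch that builds such a request body in Python (the function name is made up for illustration, and sending the request is left out):

```python
import json

def smith_older_than_30():
    """Request body: match on last_name is scored, the age range is an unscored filter."""
    return {
        "query": {
            "bool": {
                "must": {"match": {"last_name": "Smith"}},
                "filter": {"range": {"age": {"gt": 30}}},
            }
        }
    }

body = smith_older_than_30()
print(json.dumps(body, indent=2))
```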

(7) Aggregations

Aggregations let you produce complex analytics over your data, similar to GROUP BY in SQL.

GET /megacorp/employee/_search
{
"aggs": {
  "all_interests": {
    "terms": { "field": "interests" }
  }
}
}

Aggregations can also be nested. For example, let's compute the average age of the employees for each interest:

GET /megacorp/employee/_search
{
  "aggs" : {
      "all_interests" : {
          "terms" : { "field" : "interests" },
          "aggs" : {
              "avg_age" : {
                  "avg" : { "field" : "age" }
              }
          }
      }
  }
}

3.Query DSL

(1) match_all query

Matches all documents. It is the default when no query is given.

POST /index/doc/_search
{
	"query" : {
		"match_all": {}
	}
}

(2) match query

The standard query: whether you need a full-text search or an exact-value search, this is usually the one to reach for.

POST /index/doc/_search
{
  "query" : {
      "match" : {
          "title" : "中國杭州"
      }
  }
}

The match query accepts an operator parameter, which defaults to "or". Change it to "and" to require that all terms match, which improves precision.

POST /index/doc/_search
{
    "query": {
        "match": {
            "title": {      
                "query":    "中國 杭州",
                "operator": "and"
            }
        }
    }
}

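Controlling precision: for a query with three terms, 75% is rounded down to 66.6%, that is, 2 of the 3 terms. Whatever value you enter, a document is included in the results only when at least 2 terms match.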

GET /index/doc/_search
{
  "query": {
    "match": {
      "title": {
        "query":                  "中國杭州",
        "minimum_should_match":   "75%"
      }
    }
  }
}
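
The rounding described above is simple integer arithmetic. A minimal sketch, assuming a positive percentage; `required_terms` is a hypothetical helper name:

```python
def required_terms(total_terms, percentage):
    """Number of terms that must match: the percentage of total_terms, rounded down."""
    return total_terms * percentage // 100

print(required_terms(3, 75))  # 2
print(required_terms(4, 75))  # 3
```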

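Score calculation

The bool query computes the relevance _score by adding up the _score of every matching must and should clause and then dividing by the total number of must and should clauses. must_not clauses do not affect the score; their only purpose is to exclude unwanted documents.

(3) multi_match query

Runs a match query on several fields at once, searching for the same text in each of them: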

POST /index/doc/_search
{
  "query" : {
  	"multi_match": {
		"query":	"中國",
		"fields":	[ "content", "title" ]
	}
  }
}

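(4) match_phrase (phrase search)

The difference between match_phrase and match: match_phrase only hits documents in which "rock" and "climbing" both appear, in that order and next to each other, while match also hits text such as "rock balabala climbing". With match_phrase, the slop parameter controls how far apart the terms are allowed to be.

GET /megacorp/employee/_search
{
  "query" : {
      "match_phrase" : {
          "about" : {
              "query" : "rock climbing",
              "slop" : 1
          }
      }
  }
}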
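
The effect of slop can be illustrated with a simplified Python model. It assumes whitespace tokenization and only counts extra words between in-order terms; the real slop also tolerates reordered terms, which this sketch leaves out:

```python
def phrase_matches(text, phrase, slop=0):
    """Simplified phrase match: terms must appear in order with at most `slop` extra words."""
    tokens = text.lower().split()
    terms = phrase.lower().split()
    for start, token in enumerate(tokens):
        if token != terms[0]:
            continue
        position, gaps, complete = start, 0, True
        for term in terms[1:]:
            try:
                next_position = tokens.index(term, position + 1)
            except ValueError:
                complete = False
                break
            gaps += next_position - position - 1
            position = next_position
        if complete and gaps <= slop:
            return True
    return False

print(phrase_matches("I love to go rock climbing", "rock climbing"))       # True
print(phrase_matches("rock balabala climbing", "rock climbing"))           # False
print(phrase_matches("rock balabala climbing", "rock climbing", slop=1))   # True
```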

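(5) bool query

Similar to the bool filter, it combines several query clauses. The difference is that a bool filter directly answers whether a document matches, while a bool query computes the _score (relevance score) of every query clause.
    must:: the document must match the clause to be included.
    must_not:: the document must not match the clause to be included.
    should:: if the document matches the clause, its relevance score is increased.

(6) wildcard query

Uses standard shell wildcards.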

POST /index/doc/_search
{
  "query": {
    "wildcard": {
      "content": "中*"
    }
  }
}

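(7) regexp query

The regexp query lets you write more complex patterns (for Chinese, it can only match on a single leading character, because each character is indexed as its own term by default).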

POST /index/doc/_search
{
  "query": {
    "regexp": {
      "content": "中.*"
    }
  }
}

(8) prefix query

To find terms that start with given characters, the simpler prefix query is enough.

POST /index/doc/_search
{
  "query": {
    "prefix": {
      "content": "中"
    }
  }
}
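
wildcard, regexp, and prefix all compare the pattern with individual indexed terms, not with the original field text. The following sketch imitates that in Python with `fnmatch`, `re`, and `str.startswith`; the two term lists are assumptions showing the same text indexed one character at a time and as a single term:

```python
import re
from fnmatch import fnmatchcase

def wildcard(terms, pattern):
    # fnmatchcase understands * and ?, the two wildcards the wildcard query uses.
    return [t for t in terms if fnmatchcase(t, pattern)]

def regexp(terms, pattern):
    # The pattern has to match the whole term, hence fullmatch.
    return [t for t in terms if re.fullmatch(pattern, t)]

def prefix(terms, value):
    return [t for t in terms if t.startswith(value)]

per_character_terms = ["中", "國", "杭", "州"]  # "中國杭州" split into single characters
whole_value_terms = ["中國杭州"]                 # the same text kept as one term

print(wildcard(per_character_terms, "中*"))  # ['中']
print(regexp(whole_value_terms, "中.*"))     # ['中國杭州']
print(prefix(per_character_terms, "中"))     # ['中']
```

With per-character terms the pattern can only ever hit the single character, which is the limitation noted for Chinese text.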

References:
http://blog.csdn.net/dm_vincent/article/details/41720193
http://www.cnblogs.com/ghj1976/p/5293250.html

4.Mapping

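What is a mapping

A mapping in ES is very much like a data type in a statically typed language: once a variable is declared as int, it can only hold int values. In the same way, a mapping field of type number can only store number values.

Compared with a language's data types, a mapping carries some additional meaning. It tells ES not only what type of value a field holds, but also how to index the data and whether the data can be searched.

Anatomy of a mapping

A mapping consists of one or more analyzers, and an analyzer consists of one or more filters. When ES indexes a document, it passes the content of each field to the corresponding analyzer, which in turn passes it to its filters.

What a filter does is easy to understand: a filter is a function that transforms data. It takes a string and returns another string; a function that converts a string to lowercase is a good example of a filter.

An analyzer is an ordered list of filters. Analysis means calling the filters one after another in that order, and ES stores and indexes the final result.

In short, a mapping executes a series of instructions that turn the input data into searchable index terms.

Default analyzer

Back to our example: ES guesses that the description field is a string, so by default it creates a string mapping that uses the default global analyzer. The default is the standard analyzer, which has three filters: a token filter, a lowercase filter, and a stop token filter.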
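
The idea of an analyzer as an ordered chain of filters can be sketched in Python. The tokenizer regex and the stop-word list below are assumptions chosen for illustration, not the actual definition of the standard analyzer:

```python
import re

STOP_WORDS = {"a", "an", "the", "to", "is"}  # illustrative list only

def tokenize(text):
    return re.findall(r"\w+", text)

def lowercase(tokens):
    return [t.lower() for t in tokens]

def remove_stop_words(tokens):
    return [t for t in tokens if t not in STOP_WORDS]

def analyze(text, filters=(lowercase, remove_stop_words)):
    """Split the text into tokens, then apply each filter in order."""
    tokens = tokenize(text)
    for token_filter in filters:
        tokens = token_filter(tokens)
    return tokens

print(analyze("I love to go rock climbing"))
# ['i', 'love', 'go', 'rock', 'climbing']
```

The resulting terms are what would be stored in the index and compared against at search time.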

(1) Create

PUT /libray
{
    "settings" : {
        "number_of_shards" : 2,
        "number_of_replicas" : 1
    },
    "mappings" : {
        "books" : {
            "properties" : {
                "name" : {
                    "type": "string",
                    "index": "not_analyzed"
                },
                "year" : {
                    "type" : "integer"
                },
                "detail" : {
                    "type" : "string"
                }
            }
        }
    }
}

(2) Delete all mappings in an index

DELETE  /libray/_mapping

(3) Delete a specific mapping

DELETE  /libray/_mapping/books

References

http://m.blog.csdn.net/lilongsheng1125/article/details/53862629

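5. More on queries

(1) Source filtering: limit the returned fields

Setting _source to false turns off retrieval of the _source field. It also accepts one or more wildcard patterns that select which fields are returned: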

GET /_search
{
    "_source": [ "obj1.*", "obj2.*" ],
    "query" : {
        "match_all" : {}
    }
}

For complete control, specify both includes and excludes patterns:

GET /_search
{
    "_source": {
        "includes": [ "obj1.*", "obj2.*" ],
        "excludes": [ "*.description" ]
    },
    "query" : {
        "term" : { "user" : "kimchy" }
    }
}
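
How includes and excludes interact can be imitated with `fnmatch` on dotted field paths. This is a sketch under the assumption that a field is kept when it matches some include pattern and no exclude pattern; the sample document is invented, and the result is returned as flat paths for simplicity:

```python
from fnmatch import fnmatchcase

def flatten(source, prefix=""):
    """Turn nested dicts into a flat {"obj1.name": value} mapping."""
    flat = {}
    for key, value in source.items():
        path = prefix + key
        if isinstance(value, dict):
            flat.update(flatten(value, path + "."))
        else:
            flat[path] = value
    return flat

def filter_source(source, includes=(), excludes=()):
    kept = {}
    for path, value in flatten(source).items():
        if includes and not any(fnmatchcase(path, p) for p in includes):
            continue
        if any(fnmatchcase(path, p) for p in excludes):
            continue
        kept[path] = value
    return kept

doc = {
    "obj1": {"name": "a", "description": "x"},
    "obj2": {"id": 2},
    "obj3": {"id": 3},
}
print(filter_source(doc, includes=["obj1.*", "obj2.*"], excludes=["*.description"]))
# {'obj1.name': 'a', 'obj2.id': 2}
```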

(2) Sorting

POST /bank/_search
{
    "query": {
        "match_all" : {} 
    },
    "sort" : [
        {
            "age" : "asc"
        }
    ]
}

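Sort mode option

Elasticsearch supports sorting by array or multi-valued fields. The mode option controls which array value is picked to sort the document it belongs to. The mode option can have the following values:

`min`	Pick the lowest value.
`max`	Pick the highest value.
`sum`	Use the sum of all values as the sort value. Only applicable to number-based array fields.
`avg`	Use the average of all values as the sort value. Only applicable to number-based array fields.
`median`	Use the median of all values as the sort value. Only applicable to number-based array fields.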
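
A short Python sketch of what each mode computes for a multi-valued field. The `price` field and the two documents are made up for illustration:

```python
import statistics

MODES = {
    "min": min,
    "max": max,
    "sum": sum,
    "avg": statistics.mean,
    "median": statistics.median,
}

def sort_value(values, mode):
    """Reduce the values of a multi-valued field to the single value used for sorting."""
    return MODES[mode](values)

docs = [
    {"id": "a", "price": [30, 5]},
    {"id": "b", "price": [10, 20]},
]
by_min = [d["id"] for d in sorted(docs, key=lambda d: sort_value(d["price"], "min"))]
by_avg = [d["id"] for d in sorted(docs, key=lambda d: sort_value(d["price"], "avg"))]
print(by_min, by_avg)  # ['a', 'b'] ['b', 'a']
```

The same two documents come out in a different order depending on the mode.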

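(3) Post filter

post_filter is a top-level element that filters only the search hits. It is applied after the aggregations have been computed, so the aggregations still cover everything the query matched.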

GET /cars/transactions/_search
{
    "query": {
        "match": {
            "make": "ford"
        }
    },
    "post_filter": {    
        "term" : {
            "color" : "green"
        }
    },
    "aggs" : {
        "all_colors": {
            "terms" : { "field" : "color" }
        }
    }
}
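
The order of operations can be modelled in plain Python: the query selects documents, aggregations are computed on that selection, and only then are the hits narrowed by the post filter. The sample cars are invented for this sketch:

```python
from collections import Counter

cars = [
    {"make": "ford", "color": "green"},
    {"make": "ford", "color": "blue"},
    {"make": "ford", "color": "green"},
    {"make": "honda", "color": "green"},
]

def search(docs, query, post_filter):
    matched = [d for d in docs if query(d)]
    # Aggregations see everything the query matched.
    all_colors = dict(Counter(d["color"] for d in matched))
    # The post filter narrows only the returned hits.
    hits = [d for d in matched if post_filter(d)]
    return hits, all_colors

hits, all_colors = search(
    cars,
    query=lambda d: d["make"] == "ford",
    post_filter=lambda d: d["color"] == "green",
)
print(len(hits), all_colors)  # 2 {'green': 2, 'blue': 1}
```

The blue Ford is absent from the hits but still counted in the colour buckets.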

(4) explain

Explains how the score of each hit was computed.

GET /bank/_search
{
    "explain" : true,
    "query": {
        "bool" : {
            "filter" : {
                "term" : {
                    "age" : 39
                }
            }
        }
    }
}

(5) version

Returns a version for each search hit.

GET /bank/_search
{
    "version": true,
    "query": {
    	"bool" : {
    		"filter" : {
    			"term" : {
    				"age" : 39
    			}
    		}
    	}
    }
}

(6) min_score

Excludes documents whose _score is lower than the minimum given in min_score.

GET /_search
{
    "min_score": 0.5,
    "query" : {
        "term" : { "user" : "kimchy" }
    }
}

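(7) inner_hits

Returns the parent documents together with the child documents that matched the has_child condition, which amounts to a join between parent and child.

Example: suppose parent documents store email content, and child documents store information about each owner of an email plus the status of that email for that user. When listing the emails of an account, we want both the email content and the email status. Without inner_hits we would need two queries, because content and status live in the parent and the child document respectively. With the inner_hits property, a single query is enough.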

curl -XGET  'http://localhost:9200/hermes/email/_search/?pretty=true' -d  '{
 "query": {
    "has_child": {
      "type": "email_owner",
      "query": {
        "bool": {
          "must": [
            { "term": { "owner": "13724100993@189.cn" } },
            {"term": {"labelId": "1"} }
          ]}
      },
       "inner_hits": {}
    }
  }
}'

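(8) mget: fetching documents in bulk

If you need to fetch many documents at once, be sure to use the batch API. It reduces the number of network round trips as far as possible and can improve performance several times over, or even tens of times.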

POST  http://localhost:9200/bank/_mget
{
	"docs" : [
	{
		"_type" : "accout",
		"_id" : 1
	},{
		"_type" : "accout",
		"_id" : 2
	}]
}

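V. Supplement

Highly recommended:

Elasticsearch 5.2 core knowledge series http://www.jianshu.com/nb/13767185

Elasticsearch 5.2 advanced series http://www.jianshu.com/nb/14337815

Analyzers

How the default ES analyzer works: Chinese text is split into single characters, and English text is split on whitespace and punctuation.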
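
The splitting rule just described can be approximated with one regular expression. This is a rough imitation for illustration, not the real analyzer; the character range only covers common CJK ideographs:

```python
import re

# One CJK ideograph at a time, or a run of ASCII letters and digits.
TOKEN = re.compile(r"[\u4e00-\u9fff]|[A-Za-z0-9]+")

def default_like_tokens(text):
    return [t.lower() for t in TOKEN.findall(text)]

print(default_like_tokens("中國杭州 Rock-climbing!"))
# ['中', '國', '杭', '州', 'rock', 'climbing']
```

This is why a term query for a multi-character Chinese word finds nothing with the default analyzer: the index only contains the individual characters.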

match vs. term http://blog.csdn.net/yangwenbo214/article/details/54142786

Inverted index

See http://blog.csdn.net/wang_zhenwei/article/details/52831992

http://www.jianshu.com/p/ed7e1ebb2fb7

http://www.infoq.com/cn/articles/database-timestamp-02?utm_source=infoq&utm_medium=related_content_link&utm_campaign=relatedContent_articles_clk

Filter features http://www.cnblogs.com/bmaker/p/5480006.html

Filtered queries and aggregations http://blog.csdn.net/dm_vincent/article/details/42757519

_all http://blog.csdn.net/jiao_fuyou/article/details/49800969

Elasticsearch field data types:

http://www.jianshu.com/p/ab99d2bcd63d

http://blog.csdn.net/ntc10095/article/details/73730772 (recommended)

An introduction to some ES internals: https://www.cnblogs.com/valor-xh/p/6095894.html
