Elasticsearch Study Notes: Installation, Data Import, and Queries

Date: 2019-12-05


Download the latest version of Elasticsearch (6.2.1 at the time of writing) from the Elasticsearch website:

https://www.elastic.co/downloads/elasticsearch

For the Chinese documentation, see:

https://www.elastic.co/guide/cn/elasticsearch/guide/current/index.html

For the English documentation and Java API usage, see the link below. The official documentation is more trustworthy than any blog.

https://www.elastic.co/guide/en/elasticsearch/client/java-api/current/index.html

For Python API usage, see:

http://elasticsearch-py.readthedocs.io/en/master/

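Download the tar package and extract it under /usr/local. After changing the owner and group of the extracted directory, Elasticsearch can be started as a non-root user with: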

./bin/elasticsearch

Then open http://127.0.0.1:9200/ to check that the node responds.

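If port 9200 needs to be reachable from other machines, bind the Elasticsearch host to a non-loopback address.

Edit config/elasticsearch.yml (relative to the installation directory) and add the following: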

network.host: 0.0.0.0
http.port: 9200

Then restart Elasticsearch. If startup fails with errors like the following:

[2018-01-28T23:51:35,204][INFO ][o.e.b.BootstrapChecks    ] [qR5cyzh] bound or publishing to a non-loopback address, enforcing bootstrap checks
ERROR: [2] bootstrap checks failed
[1]: max file descriptors [4096] for elasticsearch process is too low, increase to at least [65536]
[2]: max virtual memory areas vm.max_map_count [65530] is too low, increase to at least [262144]

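Solution

Run the command below as root. It fixes error [2]. Error [1] is fixed separately by raising the open-file limit (nofile) of the user that runs Elasticsearch to at least 65536, for example in /etc/security/limits.conf.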

sysctl -w vm.max_map_count=262144
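Before restarting, it can help to compare the current limits against the two minimums named in the error output. The following is a minimal sketch, assuming a Linux host; the function names are hypothetical and the thresholds are simply copied from the log above.

```python
from pathlib import Path

# Minimums copied from the bootstrap check messages in the log output.
MIN_NOFILE = 65536
MIN_MAP_COUNT = 262144


def bootstrap_problems(nofile_limit, max_map_count):
    """Return a list of problems; an empty list means both checks would pass."""
    problems = []
    if nofile_limit < MIN_NOFILE:
        problems.append(f"max file descriptors [{nofile_limit}] < {MIN_NOFILE}")
    if max_map_count < MIN_MAP_COUNT:
        problems.append(f"vm.max_map_count [{max_map_count}] < {MIN_MAP_COUNT}")
    return problems


def read_current_limits():
    """Read the soft open-file limit and vm.max_map_count (Linux only)."""
    import resource  # Unix-only module, imported here so the rest works anywhere

    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft == resource.RLIM_INFINITY:
        soft = float("inf")  # "unlimited" always satisfies the check
    map_count = int(Path("/proc/sys/vm/max_map_count").read_text())
    return soft, map_count


if __name__ == "__main__":
    for problem in bootstrap_problems(*read_current_limits()):
        print(problem)
```

Run it as the same user that starts Elasticsearch, since the open-file limit is per user and per session.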

Next, import some data in JSON format. The data looks like this:

{"index":{"_id":"1"}}
{"title":"許寶江","url":"7254863","chineseName":"許寶江","sex":"男","occupation":" 灤縣農業局局長","nationality":"中國"}
{"index":{"_id":"2"}}
{"title":"鮑志成","url":"2074015","chineseName":"鮑志成","occupation":"醫師","nationality":"中國","birthDate":"1901年","deathDate":"1973年","graduatedFrom":"香港大學"}

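Note that the action line {"index":{"_id":"1"}} before each document and a trailing newline at the end of the file are both required. The _id does not have to start at 1: it can start from 0, or even be a string such as abc.

If either of the two is missing, the request fails with a 400 status and one of the following errors, respectively: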

Malformed action/metadata line [1], expected START_OBJECT or END_OBJECT but found [VALUE_STRING]

The bulk request must be terminated by a newline [\n]
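Both errors can be avoided by generating the bulk file instead of writing it by hand. The sketch below is one way to do it, assuming the documents are already available as Python dicts; build_bulk_body and its start_id parameter are hypothetical names, not part of any Elasticsearch client.

```python
import json


def build_bulk_body(docs, start_id=1):
    """Build a bulk request body: one action line plus one source line per document."""
    lines = []
    for offset, doc in enumerate(docs):
        # Action line: tells Elasticsearch to index the next line under this _id.
        lines.append(json.dumps({"index": {"_id": str(start_id + offset)}}))
        # Source line: the document itself, kept on a single line.
        lines.append(json.dumps(doc, ensure_ascii=False))
    # The body must end with a newline, otherwise the request is rejected.
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    people = [
        {"title": "許寶江", "sex": "男", "nationality": "中國"},
        {"title": "鮑志成", "occupation": "醫師", "nationality": "中國"},
    ]
    with open("people.json", "w", encoding="utf-8") as handle:
        handle.write(build_bulk_body(people))
```

json.dumps never emits raw newlines inside a value, so each document stays on exactly one line, which is what the bulk format expects.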

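Use the command below to import the JSON file.

people.json is the path to the file; it could just as well be /home/common/下載/xxx.json.

Here es is the index and people is the type. In Elasticsearch 6.x, an index and a type can be thought of as roughly the database and the table of a relational database, and both must be given. (Mapping types were deprecated and later removed in newer major versions.)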

curl -H "Content-Type: application/json" -XPOST 'localhost:9200/es/people/_bulk?pretty&refresh' --data-binary "@people.json"

On success the HTTP status code is 200 and the response body looks like this:

{
  "took" : 233,
  "errors" : false,
  "items" : [
    {
      "index" : {
        "_index" : "es",
        "_type" : "people",
        "_id" : "1",
        "_version" : 1,
        "result" : "created",
        "forced_refresh" : true,
        "_shards" : {
          "total" : 2,
          "successful" : 1,
          "failed" : 0
        },
        "_seq_no" : 0,
        "_primary_term" : 1,
        "status" : 201
      }
    },
    {
      "index" : {
        "_index" : "es",
        "_type" : "people",
        "_id" : "2",
        "_version" : 1,
        "result" : "created",
        "forced_refresh" : true,
        "_shards" : {
          "total" : 2,
          "successful" : 1,
          "failed" : 0
        },
        "_seq_no" : 0,
        "_primary_term" : 1,
        "status" : 201
      }
    }
  ]
}
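A bulk request can return HTTP 200 even when some items failed, in which case the top-level "errors" field is true and each failed item carries its own status. The following is a small sketch for checking a parsed response; failed_items is a hypothetical helper name, and the failing sample is made up for illustration.

```python
def failed_items(bulk_response):
    """Return (action, _id, error) for every item whose status is not 2xx."""
    failures = []
    for item in bulk_response.get("items", []):
        # Each item is a one-key dict such as {"index": {...}}.
        for action, result in item.items():
            if not 200 <= result.get("status", 0) < 300:
                failures.append((action, result.get("_id"), result.get("error")))
    return failures


if __name__ == "__main__":
    # Hypothetical response: the first item succeeded, the second did not.
    sample = {
        "errors": True,
        "items": [
            {"index": {"_id": "1", "status": 201}},
            {"index": {"_id": "2", "status": 400, "error": {"type": "mapper_parsing_exception"}}},
        ],
    }
    for action, doc_id, error in failed_items(sample):
        print(action, doc_id, error)
```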

<0>View the field mappings

http://localhost:9200/es/people/_mapping

The data can now be queried with the query types below.

<1>Query by id

http://localhost:9200/es/people/1

<2>Simple match query: find documents in which a given field contains a keyword (GET)

http://localhost:9200/es/people/_search?q=_id:1

http://localhost:9200/es/people/_search?q=title:許
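A browser percent-encodes the Chinese character in the URL automatically, but a script usually has to do it explicitly. The following sketch builds the same kind of URL with the standard library; search_url is a hypothetical helper, and the base URL is the one used in these notes.

```python
from urllib.parse import urlencode


def search_url(field, keyword, base="http://localhost:9200/es/people/_search"):
    """Build a URI search URL of the form <base>?q=<field>:<keyword>, percent-encoded."""
    query_string = urlencode({"q": field + ":" + keyword})
    return base + "?" + query_string


if __name__ == "__main__":
    print(search_url("title", "許"))
    print(search_url("_id", "1"))
```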

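<3>Multi-field query: find documents that contain a keyword in any of several fields (POST)

The RESTer add-on for Firefox can be used to build the POST request. The Poster add-on I used before stopped working after the upgrade to Firefox Quantum.

Search the title and sex fields for documents containing the character 許: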

{
    "query": {
        "multi_match" : {
            "query" : "許",
            "fields": ["title", "sex"]
        }
    }
}

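The response can be controlled with a few extra parameters:

size sets the number of hits to return

from sets the offset of the first hit to return (a position in the result list, not a document id)

_source sets which fields are returned

highlight sets the fields for which highlighted match fragments are returned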

{
    "query": {
        "multi_match" : {
            "query" : "中國",
            "fields": ["nationality", "sex"]
        }
    },
    "size": 2,
    "from": 0,
    "_source": [ "title", "sex", "nationality" ],
    "highlight": {
        "fields" : {
            "title" : {}
        }
    }
}
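For readers without a REST add-on, the same request can be sent from a script. The following is a minimal sketch using only the Python standard library, assuming the node from these notes is reachable at localhost:9200; build_search and post_search are hypothetical helper names.

```python
import json
from urllib.request import Request, urlopen


def build_search(keyword, fields, size=10, offset=0, source=None, highlight=None):
    """Build a multi_match search body with optional paging, _source and highlight."""
    body = {
        "query": {"multi_match": {"query": keyword, "fields": list(fields)}},
        "size": size,
        "from": offset,
    }
    if source is not None:
        body["_source"] = list(source)
    if highlight:
        body["highlight"] = {"fields": {name: {} for name in highlight}}
    return body


def post_search(body, url="http://localhost:9200/es/people/_search"):
    """POST the body as JSON and return the parsed response (needs a running node)."""
    request = Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    body = build_search("中國", ["nationality", "sex"], size=2,
                        source=["title", "sex", "nationality"], highlight=["title"])
    print(json.dumps(body, ensure_ascii=False, indent=4))
```

Paging through results then amounts to calling build_search again with a larger offset.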

<4>Boosting

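Boosting raises the weight of a field: the score contributed by a match in that field is multiplied by a factor. In the example below, nationality^3 multiplies the score of matches in nationality by 3.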

{
    "query": {
        "multi_match" : {
            "query" : "中國",
            "fields": ["nationality^3", "sex"]
        }
    },
    "size": 2,
    "from": 0,
    "_source": [ "title", "sex", "nationality" ],
    "highlight": {
        "fields" : {
            "title" : {}
        }
    }
}
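The field^factor strings are easy to get wrong when typed by hand, so they can be generated from a mapping of field names to weights. This is only an illustrative sketch; boosted_fields is a hypothetical helper.

```python
def boosted_fields(weights):
    """Turn {"nationality": 3, "sex": 1} into ["nationality^3", "sex"]."""
    fields = []
    for name, weight in weights.items():
        # A weight of 1 is the default, so the plain field name is enough.
        fields.append(name if weight == 1 else f"{name}^{weight}")
    return fields


if __name__ == "__main__":
    print(boosted_fields({"nationality": 3, "sex": 1}))
```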

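<5>Compound (bool) queries, which can express more complex conditions

AND -> must

NOT -> must_not

OR -> should

Note that when a bool query also contains a must clause, its should clauses are optional by default and only influence the score. Set minimum_should_match to 1 to require that at least one of them matches.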

{
    "query": {
        "bool": {
            "must": {
                "bool" : { 
                    "should": [
                      { "match": { "title": "鮑" }},
                      { "match": { "title": "許" }} ],
                    "must": { "match": {"nationality": "中國" }}
                }
            },
            "must_not": { "match": {"sex": "女" }}
        }
    }
}
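Nested bool queries are easier to assemble programmatically. The sketch below builds a flattened variant of the query above and sets minimum_should_match so that at least one of the should clauses has to match; match and bool_query are hypothetical helper names, and the flattening is my own simplification rather than the query shown in these notes.

```python
import json


def match(field, text):
    """Build a single match clause."""
    return {"match": {field: text}}


def bool_query(must=(), should=(), must_not=(), minimum_should_match=None):
    """Build a bool query, leaving out any clause list that is empty."""
    clause = {}
    if must:
        clause["must"] = list(must)
    if should:
        clause["should"] = list(should)
    if must_not:
        clause["must_not"] = list(must_not)
    if minimum_should_match is not None:
        clause["minimum_should_match"] = minimum_should_match
    return {"bool": clause}


# nationality is 中國 AND (title matches 鮑 OR 許) AND NOT sex is 女
query = {
    "query": bool_query(
        must=[match("nationality", "中國")],
        should=[match("title", "鮑"), match("title", "許")],
        must_not=[match("sex", "女")],
        minimum_should_match=1,
    )
}

if __name__ == "__main__":
    print(json.dumps(query, ensure_ascii=False, indent=4))
```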

<6>Fuzzy query (POST)

{
    "query": {
        "multi_match" : {
            "query" : "廠長",
            "fields": ["title", "sex","occupation"],
            "fuzziness": "AUTO"
        }
    },
    "_source": ["title", "sex", "occupation"],
    "size": 1
}

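Here the query 廠長 is matched against 局長 in the occupation field.

With AUTO, the allowed edit distance depends on the length of the term: 0 for terms of 1 or 2 characters, 1 for terms of 3 to 5 characters, and 2 for terms longer than 5 characters.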
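To make the AUTO rule concrete, the sketch below reproduces the length thresholds and computes a plain Levenshtein distance between two strings. It is an illustration only: Elasticsearch applies fuzziness per analyzed term and by default also counts a transposition as a single edit, which this simplified edit_distance does not. Both function names are hypothetical.

```python
def auto_fuzziness(term):
    """Edit distance allowed by fuzziness AUTO for a term of this length."""
    length = len(term)
    if length <= 2:
        return 0
    if length <= 5:
        return 1
    return 2


def edit_distance(a, b):
    """Plain Levenshtein distance (insert, delete, substitute), row by row."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # delete from a
                               current[j - 1] + 1,       # insert into a
                               previous[j - 1] + cost))  # substitute
        previous = current
    return previous[-1]


if __name__ == "__main__":
    print(auto_fuzziness("廠長"), edit_distance("廠長", "局長"))
```

Under these thresholds a two-character term such as 廠長 is not granted any edits by AUTO. If the default analyzer indexes each Chinese character as its own term (an assumption worth checking with the _analyze API), the hit on 局長 would more likely come from the shared character 長 than from fuzziness.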

<7>Wildcard query (POST)

? matches any single character

* matches zero or more characters

{
    "query": {
        "wildcard" : {
            "title" : "*寶"
        }
    },
    "_source": ["title", "sex", "occupation"],
    "size": 1
}
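A wildcard pattern is matched against whole indexed terms, not against the original field text. The sketch below mimics the two wildcards with a regular expression to show the difference; it assumes the title field is analyzed into single-character terms, and wildcard_matches is a hypothetical helper.

```python
import re


def wildcard_to_regex(pattern):
    """Translate ? and * into a regular expression; everything else is literal."""
    parts = []
    for char in pattern:
        if char == "?":
            parts.append(".")    # exactly one character
        elif char == "*":
            parts.append(".*")   # zero or more characters
        else:
            parts.append(re.escape(char))
    return re.compile("".join(parts), re.DOTALL)


def wildcard_matches(pattern, term):
    """True if the pattern matches the entire term."""
    return wildcard_to_regex(pattern).fullmatch(term) is not None


if __name__ == "__main__":
    # Against the single-character term 寶 the pattern matches ...
    print(wildcard_matches("*寶", "寶"))
    # ... but not against the whole, unanalyzed title.
    print(wildcard_matches("*寶", "許寶江"))
```

This suggests why "*寶" can return 許寶江 on an analyzed text field while the same pattern would not match on a keyword field holding the full name.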

<8>Regexp query (POST)

{
    "query": {
        "regexp" : {
            "authors" : "t[a-z]*y"
        }
    },
    "_source": ["title", "sex", "occupation"],
    "size": 3
}
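Regexp patterns are anchored: the pattern has to match an entire term, not just part of it. The sketch below illustrates only that anchoring with Python's re.fullmatch; Elasticsearch uses Lucene's regular expression syntax, which differs from Python's in other respects, and regexp_matches is a hypothetical helper.

```python
import re


def regexp_matches(pattern, term):
    """True only if the pattern matches the whole term, as an anchored match does."""
    return re.fullmatch(pattern, term) is not None


if __name__ == "__main__":
    for term in ["tommy", "ty", "tommy2", "attorney"]:
        print(term, regexp_matches("t[a-z]*y", term))
```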

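<9>Match phrase query (POST)

A match phrase query requires that all terms of the query string are present in the document, in the same order as in the query string, and adjacent to each other.

By default the terms must be directly adjacent, but a slop value can be set to specify how far apart they may be while still counting as a match.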

{
    "query": {
        "multi_match" : {
            "query" : "許長江",
            "fields": ["title", "sex","occupation"],
            "type": "phrase"
        }
    },
    "_source": ["title", "sex", "occupation"],
    "size": 3
}

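Note that with slop the distances add up: 灤農局 differs from 灤縣農業局 by a total distance of 2 (one skipped character before 農 and one before 局).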

{
    "query": {
        "multi_match" : {
            "query" : "灤農局",
            "fields": ["title", "sex","occupation"],
            "type": "phrase",
            "slop":2
        }
    },
    "_source": ["title", "sex", "occupation"],
    "size": 3
}
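The cumulative distance can be worked out by hand from term positions. The following is a simplified sketch, assuming one term per character, terms that appear in the same order as in the query, and using only the first occurrence of each term; it does not cover reordered or repeated terms, and required_slop is a hypothetical helper.

```python
def required_slop(query_terms, doc_terms):
    """Smallest slop for an in-order phrase match, or None if a term is missing."""
    if not query_terms:
        return 0
    shifts = []
    for query_pos, term in enumerate(query_terms):
        if term not in doc_terms:
            return None
        # How far this term sits to the right of where the query expects it.
        shifts.append(doc_terms.index(term) - query_pos)
    # Terms that shift by the same amount are still adjacent to each other;
    # only the spread between the shifts has to be covered by slop.
    return max(shifts) - min(shifts)


if __name__ == "__main__":
    print(required_slop(list("灤農局"), list("灤縣農業局")))
    print(required_slop(list("農業局"), list("灤縣農業局")))
```

For 灤農局 the shifts are 0, 1 and 2, so a slop of 2 is needed, while 農業局 appears as an exact phrase and needs none.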

<10>Match phrase prefix query (POST)