Elasticsearch from the Ground Up (9) Search Engine: Query DSL, filter vs. query, and query searches in practice

Date: 2019-11-06


Basic syntax of the search API

Syntax overview:

GET /_search
{}

GET /index1,index2/type1,type2/_search
{}

GET /_search
{
  "from": 0,
  "size": 10
}

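Can a GET request carry a request body in HTTP?

The HTTP specification does not define any semantics for a body on a GET request, so sending one is generally discouraged. GET is nevertheless the better fit for describing a data query, which is why Elasticsearch uses it this way.

Many HTTP clients and servers support GET with a request body, but not all of them do.

If you run into a case where it is not supported, you can use POST /_search instead.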

GET /_search?from=0&size=10

POST /_search
{
  "from":0,
  "size":10
}
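
The two forms above carry the same paging parameters. As a rough sketch, the Python below maps a 1-based page number to from/size and renders it either as a query string or as a JSON body. The helper names page_to_from_size and search_url are made up for this illustration and are not part of any Elasticsearch client.

```python
import json
from urllib.parse import urlencode


def page_to_from_size(page, size=10):
    """Map a 1-based page number to the from/size pair used by _search."""
    if page < 1 or size < 1:
        raise ValueError("page and size must be positive")
    return {"from": (page - 1) * size, "size": size}


def search_url(params, indices=None):
    """Build the path plus query string form of a search request."""
    if indices:
        path = "/" + ",".join(indices) + "/_search"
    else:
        path = "/_search"
    return path + "?" + urlencode(params)


paging = page_to_from_size(page=1, size=10)

# Query-string form
print("GET " + search_url(paging))

# Request-body form, usable when GET with a body is not supported
print("POST /_search")
print(json.dumps(paging, indent=2))
```

Keeping the paging values in one dict means the same data can be sent either way, depending on what the HTTP client allows.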

query DSL

An example that shows what the query DSL is

GET /_search
{
    "query": {
        "match_all": {}
    }
}

Basic syntax of the Query DSL

GET /{index}/{type}/_search
{
    "various conditions"
}

Example:

GET /test_index/test_type/_search 
{
  "query": {
    "match": {
      "test_field": "test"
    }
  }
}


{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 3,
    "max_score": 0.843298,
    "hits": [
      {
        "_index": "test_index",
        "_type": "test_type",
        "_id": "6",
        "_score": 0.843298,
        "_source": {
          "test_field": "test test"
        }
      },
      {
        "_index": "test_index",
        "_type": "test_type",
        "_id": "8",
        "_score": 0.43445712,
        "_source": {
          "test_field": "test client 2"
        }
      },
      {
        "_index": "test_index",
        "_type": "test_type",
        "_id": "7",
        "_score": 0.25316024,
        "_source": {
          "test_field": "test client 1"
        }
      }
    ]
  }
}

Combining multiple search conditions

Search requirement: title must contain elasticsearch, content may or may not contain elasticsearch, and author_id must not be 111.

Sample data:

PUT /website/article/1
{
  "title":"my elasticsearch article",
  "content":"es is very bad",
  "author_id":110
}

PUT /website/article/2
{
  "title":"my hadoop article",
  "content":"hadoop is very bad",
  "author_id":111
}

PUT /website/article/3
{
  "title":"my hadoop article",
  "content":"hadoop is very good",
  "author_id":111
}

Combined query:

GET /website/article/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "title": "elasticsearch"
          }
        }
      ],
      "should": [
        {
          "match": {
            "content": "elasticsearch"
          }
        }
      ],
      "must_not": [
        {
          "match": {
            "author_id": 111
          }
        }
      ]
    }
  }
}

Query result:

{
  "took": 7,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 0.25316024,
    "hits": [
      {
        "_index": "website",
        "_type": "article",
        "_id": "1",
        "_score": 0.25316024,
        "_source": {
          "title": "my elasticsearch article",
          "content": "es is very bad",
          "author_id": 110
        }
      }
    ]
  }
}
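
To see why only document 1 comes back, the bool logic can be replayed locally. This is a simplified sketch: the tokens function is a crude stand-in for a real analyzer, no relevance score is computed, and all function names are invented for the illustration.

```python
def tokens(value):
    """Rough stand-in for an analyzer: lowercase and split on whitespace."""
    return set(str(value).lower().split())


def matches(doc, field, value):
    """True if any token of the search value appears in the document field."""
    return bool(tokens(value) & tokens(doc.get(field, "")))


def bool_match(doc, must=(), should=(), must_not=()):
    """Return (is_hit, number_of_should_clauses_matched) for one document."""
    if not all(matches(doc, f, v) for f, v in must):
        return False, 0
    if any(matches(doc, f, v) for f, v in must_not):
        return False, 0
    # With a must clause present, should clauses are optional:
    # they only make a matching document more relevant.
    return True, sum(matches(doc, f, v) for f, v in should)


articles = {
    1: {"title": "my elasticsearch article", "content": "es is very bad", "author_id": 110},
    2: {"title": "my hadoop article", "content": "hadoop is very bad", "author_id": 111},
    3: {"title": "my hadoop article", "content": "hadoop is very good", "author_id": 111},
}

hits = [
    doc_id
    for doc_id, doc in articles.items()
    if bool_match(
        doc,
        must=[("title", "elasticsearch")],
        should=[("content", "elasticsearch")],
        must_not=[("author_id", 111)],
    )[0]
]
print(hits)
```

Documents 2 and 3 fail the must clause on title, so the must_not clause on author_id never gets to decide anything for this data set.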

filter vs. query

Initial data:

PUT /company/employee/2
{
  "address": {
    "country": "china",
    "province": "jiangsu",
    "city": "nanjing"
  },
  "name": "tom",
  "age": 30,
  "join_date": "2016-01-01"
}

PUT /company/employee/3
{
  "address": {
    "country": "china",
    "province": "shanxi",
    "city": "xian"
  },
  "name": "marry",
  "age": 35,
  "join_date": "2015-01-01"
}

Search request: age must be greater than or equal to 30, and join_date must be 2016-01-01.

GET /company/employee/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "join_date": "2016-01-01"
          }
        }
      ],
      "filter": {
        "range": {
          "age": {
            "gte": 30
          }
        }
      }
    }
  }
}

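filter vs. query: the differences

filter only selects the documents that satisfy the search conditions. It computes no relevance score and has no effect on relevance.
query computes the relevance of every document with respect to the search conditions and sorts the results by that relevance.

In general, if you are running a search and want the best-matching documents returned first, use query; if you only need to select a subset of the data by some conditions and do not care about its order, use filter.

Put differently: conditions for which a closer match should rank a document higher belong in query; conditions that should not influence the ranking belong in filter.

Performance of filter vs. query

filter needs neither to compute relevance scores nor to sort by them, and Elasticsearch automatically caches the results of the most frequently used filters.
query, in contrast, has to compute relevance scores and sort by them, and its results cannot be cached.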
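
One way to keep this distinction visible in application code is to build the request body from two separate lists. The sketch below only assembles a plain Python dict and does not talk to a cluster; the function name build_bool_query and its parameter names are assumptions made for the illustration.

```python
import json


def build_bool_query(scoring=None, non_scoring=None):
    """Put scoring clauses under must and non-scoring clauses under filter."""
    bool_body = {}
    if scoring:
        bool_body["must"] = list(scoring)
    if non_scoring:
        bool_body["filter"] = list(non_scoring)
    return {"query": {"bool": bool_body}}


body = build_bool_query(
    # Should influence ranking, so it goes into query context
    scoring=[{"match": {"join_date": "2016-01-01"}}],
    # Only narrows the result set, so it goes into filter context
    non_scoring=[{"range": {"age": {"gte": 30}}}],
)
print(json.dumps(body, indent=2))
```

The resulting body has the same must and filter clauses as the employee search shown earlier, with filter written as a list rather than a single object.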

Elasticsearch in practice: the various query types

Syntax of the various queries

match_all

GET /_search
{
    "query": {
        "match_all": {}
    }
}

match

GET /{index}/_search
{
  "query": {
    "match": {
      "FIELD": "TEXT"
    }
  }
}

multi match

GET /{index}/_search
{
  "query": {
    "multi_match": {
      "query": "",
      "fields": []
    }
  }
}

Example

GET /test_index/test_type/_search
{
  "query": {
    "multi_match": {
      "query": "test",
      "fields": ["test_field", "test_field1"]
    }
  }
}

range query

GET /{index}/_search
{
  "query": {
    "range": {
      "FIELD": {
        "gte": 10,
        "lte": 20
      }
    }
  }
}

Example

GET /company/employee/_search 
{
  "query": {
    "range": {
      "age": {
        "gte": 30
      }
    }
  }
}

term query (unlike match, the search text is not analyzed)

GET /{index}/_search
{
  "query": {
    "term": {
      "FIELD": {
        "value": "VALUE"
      }
    }
  }
}

Example

GET /test_index/test_type/_search 
{
  "query": {
    "term": {
      "test_field": "test hello"
    }
  }
}
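
The difference between term and match is easiest to see on a toy inverted index. The sketch below is a simplification: analyze only lowercases and splits on whitespace, which approximates but does not reproduce the standard analyzer, and the function names are invented for the illustration. The documents are the three test_index documents from the earlier match example.

```python
def analyze(text):
    """Crude approximation of an analyzer: lowercase, split on whitespace."""
    return text.lower().split()


def build_inverted_index(docs, field):
    """Map every term of an analyzed field to the ids of documents containing it."""
    index = {}
    for doc_id, doc in docs.items():
        for term in analyze(doc[field]):
            index.setdefault(term, set()).add(doc_id)
    return index


def term_query(index, value):
    """Look the value up as one exact term, without analyzing it."""
    return index.get(value, set())


def match_query(index, text):
    """Analyze the text first, then OR the per-term lookups together."""
    result = set()
    for term in analyze(text):
        result |= index.get(term, set())
    return result


docs = {
    "6": {"test_field": "test test"},
    "7": {"test_field": "test client 1"},
    "8": {"test_field": "test client 2"},
}
index = build_inverted_index(docs, "test_field")

print(term_query(index, "test hello"))   # looks for the single term "test hello"
print(match_query(index, "test hello"))  # looks for "test" OR "hello"
```

Because the indexed field was split into individual terms, the unanalyzed value "test hello" has nothing to match, while match finds every document that contains "test".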

terms query

GET /{index}/_search
{
  "query": {
    "terms": {
      "FIELD": [
        "VALUE1",
        "VALUE2"
      ]
    }
  }
}

Example

GET /_search
{
    "query": { "terms": { "tag": [ "search", "full_text", "nosql" ] }}
}

exists query

GET /{index}/_search
{
  "query": {
    "exists": {
       "field": ""
    }
  }
}

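Combining multiple search conditions in one query

bool: must, must_not, should, filter

Each sub-query computes a relevance score for the document, and bool then combines all of those scores into a single score. Clauses under filter do not compute a score. To run a filter on its own, without any scoring query, wrap it in constant_score: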

GET /company/employee/_search
{
  "query": {
    "constant_score": {
      "filter": {
        "range": {
          "age": {
            "gte": 30
          }
        }
      }
    }
  }
}
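
What constant_score does can be imitated in a few lines: keep the documents that pass the filter and give every one of them the same score, which is the boost value (1.0 when no boost is given). This is a local sketch, not Elasticsearch code; the function names are hypothetical and the employee documents are reduced to the fields needed here.

```python
def in_range(value, gte=None, lte=None):
    """Evaluate a range condition with optional inclusive bounds."""
    if gte is not None and value < gte:
        return False
    if lte is not None and value > lte:
        return False
    return True


def constant_score(docs, predicate, boost=1.0):
    """Keep documents that pass the filter and give each the same score."""
    return [
        {"_id": doc_id, "_score": boost, "_source": doc}
        for doc_id, doc in docs.items()
        if predicate(doc)
    ]


employees = {
    "2": {"name": "tom", "age": 30, "join_date": "2016-01-01"},
    "3": {"name": "marry", "age": 35, "join_date": "2015-01-01"},
}

hits = constant_score(employees, lambda doc: in_range(doc["age"], gte=30))
for hit in hits:
    print(hit["_id"], hit["_score"])
```

Both employees satisfy age >= 30, and neither ranks above the other because the filter contributes no relevance information.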

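Locating invalid searches

This is mostly useful for very large, complex searches. If you have written a search that runs to a hundred lines or more, you can first pass it through the validate API to check whether it is valid. In the request below, match is deliberately misspelled as math: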

GET /test_index/test_type/_validate/query?explain
{
  "query": {
    "math": {
      "test_field": "test"
    }
  }
}

{
  "valid": false,
  "error": "org.elasticsearch.common.ParsingException: no [query] registered for [math]"
}

A valid query:

GET /test_index/test_type/_validate/query?explain
{
  "query":{
    "match":{
      "test_field":"test"
    }
  }
}


{
  "valid": true,
  "_shards": {
    "total": 1,
    "successful": 1,
    "failed": 0
  },
  "explanations": [
    {
      "index": "test_index",
      "valid": true,
      "explanation": "+test_field:test #(#_type:test_type)"
    }
  ]
}
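
When the validate call is made from a script, the two response shapes shown above can be reduced to a short message. The sketch below works on already-parsed JSON and relies only on the keys that appear in those two responses; the function name summarize_validation is an assumption made for the illustration.

```python
def summarize_validation(response):
    """Turn a parsed _validate/query response into a short message."""
    if not response.get("valid", False):
        return "invalid: " + response.get("error", "no error message returned")
    lines = [
        f'{item["index"]}: {item["explanation"]}'
        for item in response.get("explanations", [])
    ]
    if lines:
        return "valid\n" + "\n".join(lines)
    return "valid"


bad = {
    "valid": False,
    "error": "org.elasticsearch.common.ParsingException: no [query] registered for [math]",
}
good = {
    "valid": True,
    "_shards": {"total": 1, "successful": 1, "failed": 0},
    "explanations": [
        {
            "index": "test_index",
            "valid": True,
            "explanation": "+test_field:test #(#_type:test_type)",
        }
    ],
}

print(summarize_validation(bad))
print(summarize_validation(good))
```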

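Customizing the sort order of search results

By default, the returned documents are ordered by _score, descending. To define your own order, use sort.

Syntax:

# Main syntax
"sort": [
  {
    "FIELD": {
      "order": "desc"
    }
  }
]
# Where it goes in the request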
GET /{index}/_search
{
  "query": {
    "constant_score": {
      "filter": {
        "exists": {
          "field": ""
        }
      },
      "boost": 1.2
    }
  },
  "sort": [
    {
      "FIELD": {
        "order": "desc"
      }
    }
  ]
}

Example:

GET /company/employee/_search
{
  "query": {
    "constant_score": {
      "filter": {
        "range": {
          "age": {
            "gte": 30
          }
        }
      }
    }
  },
  "sort": [
    {
      "join_date": {
        "order": "asc"
      }
    }
  ]
}
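
The effect of this request on the two employee documents can be reproduced locally: filter first, then order by join_date instead of by score. This is only a sketch of the behaviour, with the documents reduced to the relevant fields and a hypothetical function name.

```python
from datetime import date

employees = [
    {"name": "tom", "age": 30, "join_date": "2016-01-01"},
    {"name": "marry", "age": 35, "join_date": "2015-01-01"},
]


def filter_and_sort(docs, min_age, order="asc"):
    """Keep docs with age >= min_age, then sort them by join_date."""
    kept = [doc for doc in docs if doc["age"] >= min_age]
    return sorted(
        kept,
        key=lambda doc: date.fromisoformat(doc["join_date"]),
        reverse=(order == "desc"),
    )


hits = filter_and_sort(employees, min_age=30, order="asc")
for hit in hits:
    print(hit["name"], hit["join_date"])
```

With ascending order the employee who joined in 2015 comes first, regardless of the order in which the documents were indexed.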

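Indexing a field twice to solve the string sorting problem

If a field is of type text, its content is analyzed (split into terms) for every document at index time. Because ES does not allow the type of an existing field to be changed, the field stays analyzed, and a later sort on it operates on the individual terms rather than on the whole string, so the result is usually not what you want. The fix is to index the field a second time as an un-analyzed sub-field (title.raw in the mapping below) and to sort on that sub-field.

# Delete the existing index
DELETE /website

# Create the index manually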
PUT /website
{
  "mappings": {
    "article": {
      "properties": {
        "title":{
          "type": "text",
          "fields": {
            "raw":{
              "type": "keyword"
            }
          },
          "fielddata": true
        },
        "content":{
          "type": "text"
        },
        "post_date":{
          "type": "date"
        },
        "author_id":{
          "type": "long"
        }
      }
    }
  }
}

Insert sample data

PUT /website/article/1
{
  "title": "second article",
  "content": "this is my second article",
  "post_date": "2017-01-01",
  "author_id": 110
}

PUT /website/article/2
{
  "title": "first article",
  "content": "this is my first article",
  "post_date": "2017-02-01",
  "author_id": 110
}

PUT /website/article/3
{
  "title": "third article",
  "content": "this is my third article",
  "post_date": "2017-03-01",
  "author_id": 110
}

Sort on the un-analyzed field

GET /website/article/_search
{
  "query": {
    "match_all": {}
  },
  "sort": [
    {
      "title.raw": {
        "order": "desc"
      }
    }
  ]
}
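
Why the un-analyzed sub-field matters can be shown with the three sample titles. The sketch below assumes that a sort on an analyzed field compares single terms, taking the smallest term of each document for ascending order and the largest for descending order; analyze is again only a rough approximation of a real analyzer, and the function names are invented for the illustration.

```python
def analyze(text):
    """Crude approximation of an analyzer: lowercase, split on whitespace."""
    return text.lower().split()


titles = {
    "1": "second article",
    "2": "first article",
    "3": "third article",
}


def analyzed_sort_key(title, order):
    """Sort key when the field is analyzed: one term stands in for the whole title."""
    terms = analyze(title)
    return min(terms) if order == "asc" else max(terms)


# Ascending sort on the analyzed field: every title is represented by the same term
analyzed_keys = {doc_id: analyzed_sort_key(t, "asc") for doc_id, t in titles.items()}
print(analyzed_keys)

# Sorting on the raw value compares the whole string
raw_asc = sorted(titles, key=lambda doc_id: titles[doc_id])
raw_desc = sorted(titles, key=lambda doc_id: titles[doc_id], reverse=True)
print(raw_asc)
print(raw_desc)
```

Under that assumption, an ascending sort on the analyzed title cannot tell the three documents apart, while the raw value orders them by their full title.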
