Elasticsearch (Queries in Detail)

Date: 2019-12-06


Elasticsearch Query Types

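Elasticsearch supports two kinds of queries: basic queries and compound queries. Basic queries, such as the term query, are used to query the actual data. Compound queries, such as the bool query, can combine several queries. That is not all, however. Besides these two kinds, you can also use filters to narrow the results according to certain conditions. Unlike other queries, filters do not affect the score and are usually very efficient. In more complex cases, queries can contain other queries. Moreover, some queries can contain filters, and others can contain both queries and filters. This is not everything, but it is enough to explain for now.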

1. Simple Query

This style of query is simple but rather limited. To find documents whose last_name field contains the word smith, write:

http://127.0.0.1:9200/megacorp/employee/_search

{
    "query" : {
        "query_string" : { 
            "query" : "last_name:smith" 
        }
    }
}

The response looks like this:

{
  "took": 15,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 0.30685282,
    "hits": [
      {
        "_index": "megacorp",
        "_type": "employee",
        "_id": "2",
        "_score": 0.30685282,
        "_source": {
          "first_name": "Jane",
          "last_name": "Smith",
          "age": 32,
          "about": "I like to collect rock albums",
          "interests": [
            "music"
          ]
        }
      },
      {
        "_index": "megacorp",
        "_type": "employee",
        "_id": "1",
        "_score": 0.30685282,
        "_source": {
          "first_name": "John",
          "last_name": "Smith",
          "age": 25,
          "about": "I love to go rock climbing",
          "interests": [
            "sports",
            "music"
          ]
        }
      }
    ]
  }
}

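Appending the pretty=true parameter to the URL (for example .../_search?pretty=true) makes Elasticsearch return the response in a more readable format.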
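Every request in this article is a plain HTTP call with a JSON body. Below is a minimal Python sketch, using only the standard library, of how the simple query above could be prepared with pretty=true. The helper name build_search_request is hypothetical. The request is only built, not sent, so no running node is needed.

```python
import json
import urllib.parse
import urllib.request


def build_search_request(base_url, index, doc_type, body, pretty=True):
    """Hypothetical helper: prepare a POST to /<index>/<type>/_search."""
    url = f"{base_url}/{index}/{doc_type}/_search"
    if pretty:
        # pretty=true goes in the URL query string, not in the body
        url += "?" + urllib.parse.urlencode({"pretty": "true"})
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


body = {"query": {"query_string": {"query": "last_name:smith"}}}
req = build_search_request("http://127.0.0.1:9200", "megacorp", "employee", body)

# Sending it requires a reachable Elasticsearch node:
# with urllib.request.urlopen(req) as resp:
#     hits = json.load(resp)["hits"]["hits"]
```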

2. Pagination and Result Size (from, size)

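Elasticsearch lets you control the maximum number of results you want and the result to start from. Two extra parameters can be added to the request body:
from: the document the results should start from. The default is 0, meaning results start from the first document.
size: the maximum number of documents returned by one query. The default is 10. If you only care about the facet results and not the documents themselves, you can set it to 0.
To have the query return 20 documents starting from the 2nd document, send the following query: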

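{
    "version" : true, // return the version number of each hit
    "from" : 1, // start from the 2nd document (0-based, like an array index)
    "size" : 20, // return at most 20 documents
    "query" : {
        "query_string" : { 
            "query" : "last_name:smith" 
        }
    }
}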
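from and size simply select a window over the ranked hit list. Below is a local Python sketch of that slicing over a hypothetical list of 30 ranked ids. It also shows the usual page-number conversion. from_for_page is an illustrative helper, not an Elasticsearch API.

```python
def page(ranked_hits, from_=0, size=10):
    """Return the window that from/size select; from_ is 0-based."""
    # `from` is a Python keyword, hence the trailing underscore
    return ranked_hits[from_:from_ + size]


def from_for_page(page_number, size):
    """Convert a 1-based page number into the 0-based `from` offset."""
    return (page_number - 1) * size


ranked = [f"doc{i}" for i in range(30)]  # hypothetical ranked results

window = page(ranked, from_=1, size=20)
print(window[0], len(window))  # doc1 20
```

With size=0 the window is empty, which matches the remark above: no documents come back, only the facet or aggregation part of the response.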

Choosing Returned Fields (fields)

Return only the age, about and last_name fields:

{
    "fields":[ "age", "about","last_name" ],
    "query" : {
        "query_string" : { 
            "query" : "last_name:Smith" 
        }
    }
}

The response looks like this:

{
  "took": 3,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 0.30685282,
    "hits": [
      {
        "_index": "megacorp",
        "_type": "employee",
        "_id": "2",
        "_score": 0.30685282,
        "fields": {
          "about": [
            "I like to collect rock albums"
          ],
          "last_name": [
            "Smith"
          ],
          "age": [
            32
          ]
        }
      },
      {
        "_index": "megacorp",
        "_type": "employee",
        "_id": "1",
        "_score": 0.30685282,
        "fields": {
          "about": [
            "I love to go rock climbing"
          ],
          "last_name": [
            "Smith"
          ],
          "age": [
            25
          ]
        }
      }
    ]
  }
}

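If the fields array is not defined, the default is used, which returns the _source field if it is available;
if the _source field is used and you request a field that is not stored, the field is extracted from _source (this requires extra processing);
to return all stored fields, just pass an asterisk (*) as the field name. From a performance standpoint, returning the _source field is better than returning many stored fields.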
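The response above wraps every requested value in a list under fields, even single values. Below is a small Python sketch that reproduces only that output shape from a document's _source. It does not model stored versus _source retrieval or the * wildcard. project_fields is a hypothetical name.

```python
def project_fields(source, fields):
    """Mimic the 'fields' section of a hit: each requested field as a list."""
    result = {}
    for name in fields:
        if name in source:  # fields absent from the document are omitted
            value = source[name]
            result[name] = value if isinstance(value, list) else [value]
    return result


# _source of document 2 from the first response in this article
jane = {
    "first_name": "Jane",
    "last_name": "Smith",
    "age": 32,
    "about": "I like to collect rock albums",
    "interests": ["music"],
}

print(project_fields(jane, ["age", "about", "last_name"]))
# {'age': [32], 'about': ['I like to collect rock albums'], 'last_name': ['Smith']}
```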

Partial Fields (include, exclude)

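Elasticsearch exposes include and exclude properties on the partial fields object, so you can include or exclude fields based on them. For example, to include fields whose names start with titl and exclude those starting with chara, issue the following query: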

{
    "partial_fields" : {
        "partial1" : {
            "include" : [ "titl*" ],
            "exclude" : [ "chara*" ]
        }
    },
    "query" : {
        "query_string" : { "query" : "title:crime" }
    }
}
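The include and exclude lists are wildcard patterns on field names: a field is kept if it matches some include pattern and no exclude pattern. Below is a local Python sketch of that rule, limited to top-level keys, using a hypothetical book document. In later Elasticsearch releases similar filtering is, as far as I know, done with _source includes/excludes rather than partial_fields.

```python
from fnmatch import fnmatchcase


def partial_fields(source, include, exclude):
    """Keep top-level keys that match an include pattern and no exclude pattern."""
    return {
        key: value
        for key, value in source.items()
        if any(fnmatchcase(key, pattern) for pattern in include)
        and not any(fnmatchcase(key, pattern) for pattern in exclude)
    }


# Hypothetical book document, for illustration only
book = {
    "title": "Crime and Punishment",
    "characters": ["Raskolnikov"],
    "year": 1886,
}

# The filtered object is labelled with the name given in the query (partial1)
print({"partial1": partial_fields(book, ["titl*"], ["chara*"])})
# {'partial1': {'title': 'Crime and Punishment'}}
```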

Script Fields (script_fields)

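Add a script_fields section to the JSON query object, containing the name of each script value you want returned. To return a value named correctYear, computed by subtracting 1800 from the year field, run the following query: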

{
	"script_fields" : {
		"correctYear" : {
			"script" : "doc['year'].value - 1800"
		}
	},
	"query" : {
		"query_string" : { "query" : "title:crime" }
	}
}

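The example above uses the doc notation, which lets the field values be cached so the script runs faster. The cost is higher memory consumption, and the notation is also limited to a single value of a single field. If memory usage is a concern, or you are working with more complex field values, you can use the _source field instead. A query using it looks like this: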

{
	"script_fields" : {
		"correctYear" : {
			"script" : "_source.year - 1800"
		}
	},
	"query" : {
		"query_string" : { "query" : "title:crime" }
	}
}

The response looks like this:

{
	"took" : 1,
	"timed_out" : false,
	"_shards" : {
		"total" : 5,
		"successful" : 5,
		"failed" : 0
	},
	"hits" : {
		"total" : 1,
		"max_score" : 0.15342641,
		"hits" : [ {
			"_index" : "library",
			"_type" : "book",
			"_id" : "4",
			"_score" : 0.15342641,
			"fields" : {
				"correctYear" : [ 86 ]
			}
		} ]
	}
}
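The script runs once per matching hit, and its result comes back under fields as a one-element list. Below is a Python sketch of that computation. The year 1886 is inferred from the response above (86 + 1800), and the hit structure is simplified.

```python
def correct_year(source, offset=1800):
    """What the correctYear script computes: year minus the offset."""
    return source["year"] - offset


def with_script_field(hits):
    """Attach correctYear to each hit the way the response above shows it."""
    return [
        {"_id": hit["_id"], "fields": {"correctYear": [correct_year(hit["_source"])]}}
        for hit in hits
    ]


hits = [{"_id": "4", "_source": {"year": 1886}}]  # year inferred from the response
print(with_script_field(hits))
# [{'_id': '4', 'fields': {'correctYear': [86]}}]
```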

Passing Parameters to Script Fields (script_fields)

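One feature of script fields is that they accept additional parameters. Instead of writing 1800 directly in the expression, you can use a variable name and pass its value in the params section. The query then looks like this: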

{
	"script_fields" : {
		"correctYear" : {
			"script" : "_source.year - paramYear",
			"params" : {
				"paramYear" : 1800
			}
		}
	},
	"query" : {
		"query_string" : { "query" : "title:crime" }
	}
}
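With params, the script text stays identical across requests and only the params value changes. This is generally recommended, because a changed script source has to be compiled again. Below is a Python sketch that builds the body above for different years. It also includes a local stand-in for evaluating the script. script_fields_body and evaluate are hypothetical names.

```python
SCRIPT = "_source.year - paramYear"  # the script text never changes


def script_fields_body(param_year, query="title:crime"):
    """Build the request above; only the params value varies between calls."""
    return {
        "script_fields": {
            "correctYear": {
                "script": SCRIPT,
                "params": {"paramYear": param_year},
            }
        },
        "query": {"query_string": {"query": query}},
    }


def evaluate(source, params):
    """Local stand-in for running the script with its params."""
    return source["year"] - params["paramYear"]


body_1800 = script_fields_body(1800)
body_1900 = script_fields_body(1900)
params = body_1900["script_fields"]["correctYear"]["params"]
print(evaluate({"year": 1886}, params))  # -14
```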

Basic Queries

Term query (single term):

The simplest term query looks like this:

{
    "query" : {
        "term" : {
            "last_name" : "smith"
        }
    }
}
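A term query is not analyzed, so its value must equal a token exactly as it was indexed. That is why the example uses lowercase smith. Below is a Python sketch that assumes last_name is a text field indexed by something like the standard analyzer, which lowercases and splits on non-word characters. The analyze function is only a rough stand-in for that analyzer.

```python
import re


def analyze(text):
    """Rough stand-in for the standard analyzer: lowercase, split on non-word chars."""
    return [token for token in re.split(r"\W+", text.lower()) if token]


def term_matches(field_text, term):
    """A term query is not analyzed: the term must equal an indexed token exactly."""
    return term in analyze(field_text)


print(term_matches("Smith", "smith"))  # True
print(term_matches("Smith", "Smith"))  # False: the index holds "smith"
```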

Terms query (multiple terms):

Suppose we want every document whose tags field contains novel or book. Run the following query:

{
    "query" : {
        "terms" : {
            "tags" : [ "novel", "book" ],
            "minimum_match" : 1
        }
    }
}

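The query above returns every document whose tags field contains one or both of the search terms. The minimum_match property is set to 1, which means at least one term must match. To find documents that match all of the terms, set minimum_match to 2.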
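The minimum_match rule amounts to counting how many of the query terms appear among the document's tags. Below is a Python sketch under the assumption that tags are stored as exact (unanalyzed) values. This parameter belongs to older Elasticsearch releases; for recent versions, the terms_set query is the place to look.

```python
def terms_match(doc_tags, query_terms, minimum_match=1):
    """True if at least minimum_match of the query terms appear in doc_tags."""
    matched = len(set(doc_tags) & set(query_terms))
    return matched >= minimum_match


query_terms = ["novel", "book"]
print(terms_match(["novel"], query_terms, 1))          # True
print(terms_match(["novel"], query_terms, 2))          # False
print(terms_match(["book", "novel"], query_terms, 2))  # True
```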

match_all query

To get every document in the index, just run:

{
    "query" : {
        "match_all" : {}
    }
}

match query

{
    "query" : {
        "match" : {
            "title" : "crime and punishment"
        }
    }
}

The query above matches every document whose title field contains the term crime, and, or punishment.

Types of match query

1 Boolean match query (operator)

{
    "query" : {
        "match" : {
            "title" : {
                "query" : "crime and punishment",
                "operator" : "and"
            }
        }
    }
}

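The operator parameter accepts the values or and and. It determines whether the terms produced from the query text are combined with or (the default) or with and.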
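A match query first analyzes the query text into terms, then combines them with the operator. Below is a Python sketch of that behavior. It assumes a standard-like analyzer (lowercase, split on non-word characters), and the title "Punishment Park" is hypothetical. Scoring is not modelled.

```python
import re


def analyze(text):
    """Rough stand-in for the standard analyzer: lowercase, split on non-word chars."""
    return [token for token in re.split(r"\W+", text.lower()) if token]


def match(field_text, query, operator="or"):
    """match query: analyze the query text, then OR (default) or AND its terms."""
    doc_tokens = set(analyze(field_text))
    query_tokens = analyze(query)
    if operator == "and":
        return all(token in doc_tokens for token in query_tokens)
    return any(token in doc_tokens for token in query_tokens)


print(match("Punishment Park", "crime and punishment"))              # True
print(match("Punishment Park", "crime and punishment", "and"))       # False
print(match("Crime and Punishment", "crime and punishment", "and"))  # True
```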

2 match_phrase query (slop)

This can match text of the form a+x+b, where x is unknown: if you know a and b, documents with an unknown x between them can still be found.

{
    "query" : {
        "match_phrase" : {
            "title" : {
                "query" : "crime punishment",
                "slop" : 1
            }
        }
    }
}

Note that we removed the word and from the query, but because slop is set to 1, the query still matches our document.

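slop: an integer that defines how many unknown terms may appear between the query's terms for the text to still count as a phrase match. The default is 0, meaning no extra terms are allowed; a larger value lets the x above span several terms.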
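For terms that appear in the same order as in the query, slop works like a budget of extra tokens allowed between them. Below is a Python sketch of that in-order case. It is a simplification: Lucene's real slop also permits reordered terms at a higher cost, and that is not modelled here. The analyzer stand-in is also an assumption.

```python
import re
from itertools import product


def analyze(text):
    """Rough stand-in for the standard analyzer: lowercase, split on non-word chars."""
    return [token for token in re.split(r"\W+", text.lower()) if token]


def phrase_match(field_text, query, slop=0):
    """In-order phrase match allowing at most `slop` extra tokens in total
    between consecutive query terms (reordering is not modelled)."""
    tokens = analyze(field_text)
    terms = analyze(query)
    # every position at which each query term occurs in the field
    positions = [[i for i, tok in enumerate(tokens) if tok == term] for term in terms]
    for combo in product(*positions):
        pairs = list(zip(combo, combo[1:]))
        if all(a < b for a, b in pairs):
            gaps = sum(b - a - 1 for a, b in pairs)
            if gaps <= slop:
                return True
    return False


title = "Crime and Punishment"
print(phrase_match(title, "crime punishment"))          # False: "and" sits in between
print(phrase_match(title, "crime punishment", slop=1))  # True
```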

3 match_phrase_prefix query

{
    "query" : {
        "match_phrase_prefix" : {
            "title" : {
                "query" : "crime and punishm",
                "slop" : 1,
                "max_expansions" : 20
            }
        }
    }
}

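Note that we did not supply the full phrase "crime and punishment", only "crime and punishm", and the query still matches our document. The last term is treated as a prefix and expanded to at most max_expansions matching terms.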
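Below is a Python sketch of the idea behind match_phrase_prefix. Every query term except the last must form a phrase, and the last term is replaced by up to max_expansions indexed terms that start with it. The vocabulary set is hypothetical, and taking expansions in sorted order is an assumption of this sketch. The default of 50 for max_expansions follows the Elasticsearch documentation.

```python
import re
from itertools import product


def analyze(text):
    return [token for token in re.split(r"\W+", text.lower()) if token]


def expand_prefix(prefix, vocabulary, max_expansions=50):
    """Indexed terms starting with prefix, in sorted order, capped at max_expansions."""
    return [term for term in sorted(vocabulary) if term.startswith(prefix)][:max_expansions]


def phrase_match(tokens, terms, slop=0):
    """In-order phrase check with at most `slop` extra tokens in total."""
    positions = [[i for i, tok in enumerate(tokens) if tok == term] for term in terms]
    for combo in product(*positions):
        pairs = list(zip(combo, combo[1:]))
        if all(a < b for a, b in pairs) and sum(b - a - 1 for a, b in pairs) <= slop:
            return True
    return False


def phrase_prefix_match(field_text, query, vocabulary, slop=0, max_expansions=50):
    """All query terms but the last form a phrase; the last one is a prefix."""
    *head, last = analyze(query)
    tokens = analyze(field_text)
    return any(
        phrase_match(tokens, head + [term], slop)
        for term in expand_prefix(last, vocabulary, max_expansions)
    )


# Hypothetical set of terms present in the title field's index
vocabulary = {"crime", "and", "punish", "punishment", "puppet"}
print(expand_prefix("punishm", vocabulary, 20))  # ['punishment']
print(phrase_prefix_match("Crime and Punishment", "crime and punishm",
                          vocabulary, slop=1, max_expansions=20))  # True
```

If max_expansions is small and many indexed terms share the prefix, the intended term may never be tried.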

multi_match query

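The multi_match query works like the match query, but it can run against several fields, listed in the fields parameter. All parameters available to the match query can also be used in multi_match. So, to make the match query run against the title and otitle fields, run the following query: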

{
    "query" : {
        "multi_match" : {
            "query" : "crime punishment",
            "fields" : [ "title", "otitle" ]
        }
    }
}
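Under the default multi_match type (best_fields), a document matches when the match query succeeds on at least one of the listed fields, with the operator applied per field. Below is a Python sketch of which documents match. Scoring is not modelled, the analyzer is a rough stand-in, and the document is hypothetical.

```python
import re


def analyze(text):
    return [token for token in re.split(r"\W+", text.lower()) if token]


def match(field_text, query, operator="or"):
    doc_tokens = set(analyze(field_text))
    check = all if operator == "and" else any
    return check(token in doc_tokens for token in analyze(query))


def multi_match(doc, query, fields, operator="or"):
    """A document matches if the match query succeeds on at least one field."""
    return any(match(doc.get(field, ""), query, operator) for field in fields)


# Hypothetical document whose only hit is in otitle
doc = {"title": "Notes from Underground", "otitle": "Crime and Punishment"}
print(multi_match(doc, "crime punishment", ["title", "otitle"]))  # True
print(multi_match(doc, "crime punishment", ["title"]))            # False
```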

Prefix query

To find every document whose title field starts with cri, run the following query:

{
    "query" : {
        "prefix" : {
            "title" : "cri"
        }
    }
}
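On an analyzed text field, the prefix is compared against each indexed token, and the prefix itself is not analyzed. This means "pun" matches "Crime and Punishment" even though the title does not start with it, and an uppercase prefix finds nothing. Below is a Python sketch assuming a standard-like lowercasing analyzer.

```python
import re


def analyze(text):
    return [token for token in re.split(r"\W+", text.lower()) if token]


def prefix_matches(field_text, prefix):
    """prefix query: the prefix itself is not analyzed, the field's tokens are."""
    return any(token.startswith(prefix) for token in analyze(field_text))


print(prefix_matches("Crime and Punishment", "cri"))  # True
print(prefix_matches("Crime and Punishment", "Cri"))  # False: tokens are lowercase
print(prefix_matches("Crime and Punishment", "pun"))  # True: any token can match
```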

Wildcard query

Here ? stands for any single character (* stands for any sequence of characters):

{
    "query" : {
        "wildcard" : {
            "title" : "cr?me"
        }
    }
}
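A wildcard pattern must match a whole indexed token, not merely part of it. Below is a Python sketch that translates ? and * into a regular expression and tests each token. It assumes a standard-like lowercasing analyzer on title.

```python
import re


def analyze(text):
    return [token for token in re.split(r"\W+", text.lower()) if token]


def wildcard_to_regex(pattern):
    """Translate ? (one character) and * (any run of characters) to a regex."""
    parts = []
    for ch in pattern:
        if ch == "?":
            parts.append(".")
        elif ch == "*":
            parts.append(".*")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)


def wildcard_matches(field_text, pattern):
    regex = wildcard_to_regex(pattern)
    # the whole token must match, not just a substring of it
    return any(regex.fullmatch(token) for token in analyze(field_text))


print(wildcard_matches("Crime and Punishment", "cr?me"))  # True
print(wildcard_matches("Crime and Punishment", "cr?m"))   # False
print(wildcard_matches("Crime and Punishment", "pun*"))   # True
```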

Range query

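gte: the range query matches documents whose field value is greater than or equal to this value.
gt: the range query matches documents whose field value is greater than this value.
lte: the range query matches documents whose field value is less than or equal to this value.
lt: the range query matches documents whose field value is less than this value.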

For example, to find every book whose year field is between 1700 and 1900, run the following query:

{
    "query" : {
        "range" : {
            "year" : {
                "gte" : 1700,
                "lte" : 1900
            }
        }
    }
}
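Each range parameter is one comparison, and a document must satisfy all the bounds given. Below is a Python sketch of the four parameters. in_range is a hypothetical helper name.

```python
import operator

# How each range parameter compares the field value with its bound
COMPARATORS = {
    "gte": operator.ge,
    "gt": operator.gt,
    "lte": operator.le,
    "lt": operator.lt,
}


def in_range(value, **bounds):
    """True if value satisfies every given bound, e.g. in_range(v, gte=1700, lte=1900)."""
    return all(COMPARATORS[name](value, bound) for name, bound in bounds.items())


print(in_range(1886, gte=1700, lte=1900))  # True
print(in_range(1900, gte=1700, lt=1900))   # False: lt excludes the bound
```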

Compound Queries

Bool query

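should: the queries it wraps may or may not match. How many should clauses must match is controlled by the minimum_should_match parameter.
must: the queries it wraps must match for a document to be returned.
must_not: the queries it wraps must not match for a document to be returned.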

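Suppose we want to find every document whose title field contains the term crime, whose year field may or may not fall in the range 1900 to 2000, and whose otitle field does not contain the term nothing. With the bool query, this looks like the following: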

{
    "query" : {
        "bool" : {
            "must" : {
                "term" : {
                    "title" : "crime"
                }
            },
            "should" : {
                "range" : {
                    "year" : {
                        "from" : 1900,
                        "to" : 2000
                    }
                }
            },
            "must_not" : {
                "term" : {
                    "otitle" : "nothing"
                }
            }    
        }
    }
}
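Below is a Python sketch of how the three clause types combine for one document. Each clause is modelled as a predicate, and the documents are hypothetical. It follows the rule that when a must clause is present and minimum_should_match is not set, should clauses become optional and only raise the score. The number of satisfied should clauses is returned as a stand-in for that score boost. The from/to bounds in the example are treated as inclusive, like gte/lte.

```python
def bool_query(doc, must=(), should=(), must_not=(), minimum_should_match=None):
    """Evaluate a bool query over one document.

    Clauses are predicates doc -> bool. Returns (matches, should_hits);
    in the real engine, should_hits only influences the score.
    """
    if minimum_should_match is None:
        # with must present, should clauses are optional; alone, one must match
        minimum_should_match = 1 if should and not must else 0
    should_hits = sum(1 for clause in should if clause(doc))
    matches = (
        all(clause(doc) for clause in must)
        and not any(clause(doc) for clause in must_not)
        and should_hits >= minimum_should_match
    )
    return matches, should_hits


clauses = {
    "must": [lambda d: "crime" in d["title"].split()],
    "should": [lambda d: 1900 <= d["year"] <= 2000],
    "must_not": [lambda d: "nothing" in d["otitle"].split()],
}

# Hypothetical documents
books = {
    "a": {"title": "crime and punishment", "otitle": "prestuplenie i nakazanie", "year": 1886},
    "b": {"title": "crime stories", "otitle": "nothing else", "year": 1950},
    "c": {"title": "crime novel", "otitle": "a novel", "year": 1950},
}
for key, doc in books.items():
    print(key, bool_query(doc, **clauses))
# a (True, 0)
# b (False, 1)
# c (True, 1)
```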

Filters (I don't fully understand what filters are for yet)

Return all documents with the given title, but narrow the results to books published in 1961. This uses the filtered query, as follows:

{
    "query": {
        "filtered" : {
            "query" : {
            "match" : { "title" : "Catch-22" }
            },
            "filter" : {
                "term" : { "year" : 1961 }
            }
        }
    }
}
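The filtered query was deprecated in Elasticsearch 2.0 and removed in 5.0. The same request is expressed as a bool query whose filter clause holds the condition. A filter clause only decides whether a document is kept: it does not contribute to the score and can be cached, which is the role the filter plays here. Below is a Python sketch of that rewrite. filtered_to_bool is a hypothetical helper.

```python
def filtered_to_bool(body):
    """Rewrite a legacy {"query": {"filtered": ...}} body into bool + filter."""
    filtered = body["query"]["filtered"]
    rewritten = {}
    if "query" in filtered:
        rewritten["must"] = filtered["query"]     # scored part
    if "filter" in filtered:
        rewritten["filter"] = filtered["filter"]  # unscored, cacheable part
    return {"query": {"bool": rewritten}}


legacy = {
    "query": {
        "filtered": {
            "query": {"match": {"title": "Catch-22"}},
            "filter": {"term": {"year": 1961}},
        }
    }
}
print(filtered_to_bool(legacy))
# {'query': {'bool': {'must': {'match': {'title': 'Catch-22'}}, 'filter': {'term': {'year': 1961}}}}}
```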

Demo

1. Query the people in wechat_customer whose mid equals $mid and whose subscribe is 1.

http://localhost:9200/wechat_v6_count/wechat_customer/_search?search_type=count

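// PHP code: build the query as a nested array, then json_encode it
$esjson = array();
// bool.must: both term conditions have to match
$esjson['query']['bool']['must'][] = array("term" => array("mid" => $mid));
$esjson['query']['bool']['must'][] = array("term" => array("subscribe" => 1));
// count the values of the id field among the matching documents
$esjson['aggs'] = array("type_count" => array("value_count" => array("field" => "id")));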

{
    "query":{
        "bool":{
            "must":[
            {    
                "term":{"mid":"55"}
            },{
                "term":{"subscribe":1}
            }]
        }
    },
    "aggs":{
        "type_count":{
            "value_count":{"field":"id"}
        }
    }
}
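Below is the same body built in Python, for readers not using PHP, with a helper that reads the count from a parsed response. Both function names are hypothetical. In later Elasticsearch releases the search_type=count URL parameter no longer exists; sending "size": 0 in the body is the usual way to get only aggregations. Note that value_count counts values of id, so it equals the number of documents only if each matching document has exactly one id.

```python
import json


def customer_count_body(mid):
    """Same structure as the PHP array above."""
    return {
        "query": {
            "bool": {
                # both conditions must hold (SQL: AND)
                "must": [
                    {"term": {"mid": mid}},
                    {"term": {"subscribe": 1}},
                ]
            }
        },
        # number of id values among the matching documents
        "aggs": {"type_count": {"value_count": {"field": "id"}}},
    }


def read_count(response):
    """Pull the count out of a parsed search response."""
    return response["aggregations"]["type_count"]["value"]


print(json.dumps(customer_count_body("55")))
```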

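2. Count the people in wechat_customer whose mid equals $mid, whose $rule is greater than or equal to $start, and whose subscribe equals 1. (Aggregations return 10 entries by default; adding a size parameter of 0 returns all of them.)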

$esjson = array();
// range condition on the dynamic field $rule: value >= $start
$esjson['query']['bool']['must'][] = array("range" => array($rule => array("gte"=>$start)));
$esjson['query']['bool']['must'][] = array("term" => array("mid" => $mid));
$esjson['query']['bool']['must'][] = array("term" => array("subscribe" => 1));
// value_count on id plays the role of SQL count(*)
$esjson['aggs'] = array("type_count" => array("value_count" => array("field" => "id")));
$esjson = json_encode($esjson);
// searchForCount appears to be an application-specific helper, not part of the official client
$esresult = ElasticsearchClient::searchForCount($esjson);
$result = $esresult['aggregations']['type_count']['value'];


// Original SQL
//$sql = "SELECT count(*) as 'cnt' from wechat_customer where mid =:mid AND " . $rule . ">=:start AND subscribe=1;";
//$params = array(':mid' => $mid, ':start' => $start);

The resulting esjson:

{
    "query":{
        "bool":{
            "must":[
                {
                    "range":{
                        "action_count":{"gte":"15"}
                    }
                },
                {
                    "term":{"mid":"55"}
                },
                {
                    "term":{"subscribe":"1"}
                }
            ]
        }
    },
    "aggs":{
        "type_count":{
            "value_count":{"field":"id"}
        }
    }
}
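Below is a Python sketch of the same body, with an in-memory version of the original SQL for comparison. The function names and rows are hypothetical. Each must clause corresponds to one condition of the SQL WHERE clause joined with AND.

```python
def rule_count_body(mid, rule, start):
    """Same clause order as the PHP code: range first, then the two terms."""
    return {
        "query": {
            "bool": {
                "must": [
                    {"range": {rule: {"gte": start}}},  # rule >= start
                    {"term": {"mid": mid}},             # mid = mid
                    {"term": {"subscribe": 1}},         # subscribe = 1
                ]
            }
        },
        "aggs": {"type_count": {"value_count": {"field": "id"}}},
    }


def sql_equivalent_count(rows, mid, rule, start):
    """What the original SQL computes, over in-memory rows (for comparison)."""
    return sum(
        1 for row in rows
        if row["mid"] == mid and row[rule] >= start and row["subscribe"] == 1
    )


# Hypothetical rows, for illustration only
rows = [
    {"id": 1, "mid": 55, "action_count": 20, "subscribe": 1},
    {"id": 2, "mid": 55, "action_count": 10, "subscribe": 1},
    {"id": 3, "mid": 55, "action_count": 30, "subscribe": 0},
    {"id": 4, "mid": 56, "action_count": 40, "subscribe": 1},
]
print(sql_equivalent_count(rows, 55, "action_count", 15))  # 1
```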
