Elasticsearch: Getting Started and DSL Usage

Date: 2019-12-06

Reference

6.4 (latest, English): https://www.elastic.co/guide/...
Chinese: https://www.elastic.co/guide/...
5.4 (Chinese): http://cwiki.apachecn.org/pag...

Basic Concepts

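Near Realtime (NRT): there is a slight delay (typically 1s) between writing data and that data becoming searchable.
Cluster: a collection of one or more nodes, uniquely identified by its cluster name.
Node: a single Elasticsearch instance. In most cases each node runs in its own environment or virtual machine.
Index: a collection of documents, comparable to a database in a relational system.
Type: a logical category/partition inside an index, comparable to a table in a relational system.
Document: the basic unit of information that can be indexed.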

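Shards & Replicas

Shards: every index has one or more shards, and the index data is distributed across them, like one bucket of water poured into N cups.
Replicas: backup copies of shards. When every primary shard has a replica on a different node, the cluster can lose any single node without losing data.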
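
The "N cups" picture can be made concrete with a small sketch. Elasticsearch chooses a shard by hashing a routing value (the document id by default) and reducing it modulo the number of primary shards. The sketch below assumes zlib.crc32 as a stand-in hash (Elasticsearch uses a different hash function), and the document ids are made up for illustration.

```python
import zlib
from collections import Counter


def shard_for(routing: str, num_primary_shards: int) -> int:
    """Pick a primary shard for a routing value.

    CRC32 is only a stand-in hash for this sketch; it is not the
    hash function Elasticsearch uses internally.
    """
    return zlib.crc32(routing.encode("utf-8")) % num_primary_shards


# Spread 10,000 hypothetical document ids over 12 primary shards
# and show how many land in each "cup".
counts = Counter(shard_for(f"doc-{i}", 12) for i in range(10_000))
for shard, n in sorted(counts.items()):
    print(f"shard {shard:2d}: {n} docs")
```

Because the shard depends on the modulus, changing the number of primary shards would move most documents, which is one way to see why that number is chosen when the index is created.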

Definition

DSL (Domain Specific Language): the query language defined by Elasticsearch.

ES field types: https://blog.csdn.net/chengyu...

Elasticsearch vs. Traditional Databases

Elasticsearch	Relational database	NoSQL database
Index	Database	Database
Document	Row	Document
Field	Column	Field

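Drawbacks of Elasticsearch compared with traditional databases
· No support for transactions
· Delay between a write and its visibility to reads (NRT)
· Poorly suited to frequent updates
· Security and reliability concerns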

API

Settings API: get index settings

GET es-index_*/_settings
{
  "es-index_*": {
    "settings": {
      "index": {
        "mapping": {
          "ignore_malformed": "true"
        },
        "refresh_interval": "10s",
        "translog": {
          "durability": "async"
        },
        "max_result_window": "10000",
        "creation_date": "1551295476399",
        "requests": {
          "cache": {
            "enable": "true"
          }
        },
        "unassigned": {
          "node_left": {
            "delayed_timeout": "6h"
          }
        },
        "priority": "5",
        "number_of_replicas": "1",
        "uuid": "-JvfCJ3-TCaMMxqnOiOfNA",
        "version": {
          "created": "2030399"
        },
        "codec": "best_compression",
        "routing": {},
        "search": {
          "slowlog": {
            "threshold": {
              "fetch": {
                "warn": "1s",
                "trace": "200ms",
                "debug": "500ms",
                "info": "800ms"
              },
              "query": {
                "warn": "10s",
                "trace": "500ms",
                "debug": "1s",
                "info": "5s"
              }
            }
          }
        },
        "number_of_shards": "12",
        "merge": {
          "scheduler": {
            "max_thread_count": "1"
          }
        }
      },
      "tribe": {
        "name": "olap"
      }
    }
  }
}
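
As a rough illustration of reading this response, the sketch below parses a trimmed copy of it and derives the total number of shard copies per index (primaries plus replicas). Note that the settings values arrive as strings. The helper name shard_copies is made up for this example.

```python
import json

# Trimmed copy of the settings response shown above.
response = json.loads("""
{
  "es-index_*": {
    "settings": {
      "index": {
        "refresh_interval": "10s",
        "number_of_shards": "12",
        "number_of_replicas": "1"
      }
    }
  }
}
""")


def shard_copies(index_settings: dict) -> int:
    """Total shard copies = primaries * (1 + replicas).

    Settings values are strings in the response, so convert first.
    """
    primaries = int(index_settings["number_of_shards"])
    replicas = int(index_settings["number_of_replicas"])
    return primaries * (1 + replicas)


for name, body in response.items():
    print(name, shard_copies(body["settings"]["index"]))
```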

Stats API: get index statistics (http://cwiki.apachecn.org/pag...)

GET es-index_*/_stats
{
  "_shards": {
    "total": 622,
    "successful": 622,
    "failed": 0
  },
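  // The returned statistics are aggregated at index level and come as primaries and total. primaries covers the primary shards only; total is the cumulative value over primary and replica shards.
  "_all": {
    "primaries": {
      "docs": {  // number of documents and of deleted documents (not yet merged away). Note that this value is affected by refreshing the index.
        "count": 2932357017,
        "deleted": 86610
      },
      "store": { // size of the index
        "size_in_bytes": 2573317479532
      },
      "indexing": {}, // indexing statistics; can be combined with a comma-separated list of types to get per-type statistics
      "get": {}, // get API call statistics
      "search": {} // search API call statistics
    },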
  
    "total": {
    }
  }
}
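
The raw counters become more useful once combined. The sketch below takes the primaries figures from the response above and derives two ratios: average stored bytes per document and the share of deleted documents still awaiting a merge. The function names are made up for this example.

```python
# Figures copied from the "primaries" section of the stats response above.
primaries = {
    "docs": {"count": 2932357017, "deleted": 86610},
    "store": {"size_in_bytes": 2573317479532},
}


def avg_doc_bytes(stats: dict) -> float:
    """Average on-disk size per live document, in bytes."""
    return stats["store"]["size_in_bytes"] / stats["docs"]["count"]


def deleted_ratio(stats: dict) -> float:
    """Fraction of documents that are marked deleted but not yet merged away."""
    docs = stats["docs"]
    return docs["deleted"] / (docs["count"] + docs["deleted"])


print(f"average bytes per document: {avg_doc_bytes(primaries):.1f}")
print(f"deleted share: {deleted_ratio(primaries):.6%}")
```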

Search API (two forms)

using a simple query string as a parameter

GET es-index_*/_search?q=eventid:OMGH5PageView

using a request body

GET es-index_*/_search
{
  "query": {
    "term": {
      "eventid": {
        "value": "OMGH5PageView"
      }
    }
  }
}
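
The two forms can be built side by side. The sketch below only constructs the request path and body and sends nothing, so it runs without a cluster. The function names are made up for this example. One thing to keep in mind is that the q= parameter is parsed with query-string syntax, whereas the term body matches the exact token, so the two are not guaranteed to behave identically on analyzed fields.

```python
import json
from urllib.parse import urlencode


def uri_search(index: str, field: str, value: str) -> str:
    """Form 1: the query travels as the q= URL parameter."""
    return f"/{index}/_search?" + urlencode({"q": f"{field}:{value}"})


def body_search(field: str, value: str) -> dict:
    """Form 2: the query travels as a JSON request body (a term query here)."""
    return {"query": {"term": {field: {"value": value}}}}


print("GET", uri_search("es-index_*", "eventid", "OMGH5PageView"))
print("GET /es-index_*/_search")
print(json.dumps(body_search("eventid", "OMGH5PageView"), indent=2))
```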

Query DSL

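Leaf Query Clause: looks for a particular value in a particular field (e.g. match, term, range)
Compound Query Clause: wraps other leaf or compound clauses and combines them (e.g. bool)

DSL query contexts

query context
In query context, the question answered is: How well does this document match this query clause?
Besides deciding whether a document matches, Elasticsearch also computes how well it matches relative to other documents and records the result in _score.
filter context**
In filter context, the question answered is: Does this document match this query clause?
Only decides whether the document matches; no _score is computed.
Typically used to filter structured data,
e.g. whether timestamp falls within 2017-2018, or whether status is published.
Frequently used filters are cached automatically by Elasticsearch, which improves performance.

** When querying, first narrow the data with filter clauses, then use query clauses to match and score what remains.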
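
A minimal sketch of the two contexts in one request, assuming a bool query with a filter clause: the status and timestamp conditions from the example above go into filter (no scoring), and the full-text condition goes into must (scored). The field name title and the function name are hypothetical.

```python
import json


def published_between(text: str, start: str, end: str) -> dict:
    """Build a search body that filters first and scores second."""
    return {
        "query": {
            "bool": {
                # Query context: contributes to _score.
                "must": [{"match": {"title": text}}],  # "title" is a hypothetical field
                # Filter context: yes/no only, no _score.
                "filter": [
                    {"term": {"status": "published"}},
                    {"range": {"timestamp": {"gte": start, "lte": end}}},
                ],
            }
        }
    }


print(json.dumps(published_between("elasticsearch", "2017", "2018"), indent=2))
```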

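Filtering fields in the result

fields: restricts which fields are returned
script_fields: computes values from the original data

"fields": ["eh"],  // return only the eh field
"script_fields": {
   "test": {
      "script": "doc['eh'].value*2"
   }
} // return the value of eh multiplied by 2 as a field named test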

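Query filtering: query

bool compound filter

{
   "bool" : {
      "must" :     [], // every clause must match; like AND in SQL
      "must_not" : [], // no clause may match; like NOT in SQL
      "should" :   [], // at least one clause should match; like OR in SQL
      "filter" :   [] || {
          "and": [],
          "or": [],
          "not": []
      } // clauses must match but do not affect _score; either a list of clauses or, in old versions only, an and/or/not filter
   }
}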
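
The matching rules above can be modelled locally. The sketch below evaluates must, must_not and should against a plain dict, with each clause represented as a Python predicate. It is only a model of the semantics described here, not of how Elasticsearch executes a bool query; all names and the sample document are made up. min_should defaults to 1 to follow the description above, while Elasticsearch itself exposes this as minimum_should_match and, in current versions, treats should as optional when must or filter clauses are present.

```python
from typing import Callable, Sequence

Doc = dict
Clause = Callable[[Doc], bool]


def bool_match(
    doc: Doc,
    must: Sequence[Clause] = (),
    must_not: Sequence[Clause] = (),
    should: Sequence[Clause] = (),
    min_should: int = 1,
) -> bool:
    """Return True if doc satisfies the bool clauses (simplified model)."""
    if not all(clause(doc) for clause in must):
        return False  # AND: every must clause has to match
    if any(clause(doc) for clause in must_not):
        return False  # NOT: no must_not clause may match
    if should and sum(1 for clause in should if clause(doc)) < min_should:
        return False  # OR: at least min_should of the should clauses
    return True


doc = {"status": "published", "lang": "en", "views": 5}  # hypothetical document
print(
    bool_match(
        doc,
        must=[lambda d: d["status"] == "published"],
        must_not=[lambda d: d["lang"] == "fr"],
        should=[lambda d: d["views"] > 100, lambda d: d["lang"] == "en"],
    )
)
```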

filtered query

{
    "filtered": {
          "query": {},
          "filter": {} // data is filtered by filter first, then matched by query
    }
}
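
The filtered query was deprecated in Elasticsearch 2.0 and removed in 5.0; the replacement is a bool query with the query part under must and the filter part under filter. A minimal sketch of that rewrite, with a made-up function name and sample clauses:

```python
import json


def filtered_to_bool(query: dict) -> dict:
    """Rewrite {"filtered": {...}} into the equivalent bool query."""
    filtered = query["filtered"]
    bool_body = {}
    if filtered.get("query"):
        bool_body["must"] = filtered["query"]    # scored part
    if filtered.get("filter"):
        bool_body["filter"] = filtered["filter"]  # unscored part
    return {"bool": bool_body}


old = {
    "filtered": {
        "query": {"match": {"eventid": "OMGH5PageView"}},
        "filter": {"range": {"timestamp": {"gte": "now-6d/d"}}},
    }
}
print(json.dumps(filtered_to_bool(old), indent=2))
```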

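match and term

match (full-text match): first checks whether the field is analyzed. If it is, the query text is analyzed (tokenized) and the resulting tokens are matched; if it is not, the value is matched against the token directly.
term (exact match): matches the token directly, without analysis.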
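
The difference is easiest to see on a toy index. The sketch below assumes a very simple analyzer (lowercase, split on anything that is not an ASCII letter or digit) as a stand-in for a real one, and all function names are made up. It shows why a term query for the original string misses on an analyzed field while match finds it.

```python
import re


def analyze(text: str) -> list:
    """Toy analyzer: lowercase, then split on non-alphanumeric characters."""
    return [tok for tok in re.split(r"[^0-9a-z]+", text.lower()) if tok]


def index_field(value: str, analyzed: bool = True) -> set:
    """Tokens stored for a field value: analyzed tokens, or the raw value."""
    return set(analyze(value)) if analyzed else {value}


def term_query(tokens: set, value: str) -> bool:
    """term: compare the given value with stored tokens as-is."""
    return value in tokens


def match_query(tokens: set, text: str, analyzed: bool = True) -> bool:
    """match: analyze the query text first (if the field is analyzed), then compare."""
    wanted = analyze(text) if analyzed else [text]
    return any(tok in tokens for tok in wanted)


analyzed_tokens = index_field("Quick Brown Fox")
raw_tokens = index_field("Quick Brown Fox", analyzed=False)

print(term_query(analyzed_tokens, "Quick Brown Fox"))  # whole string is not a token
print(term_query(analyzed_tokens, "quick"))            # a single stored token
print(match_query(analyzed_tokens, "QUICK dog"))       # query text is analyzed too
print(term_query(raw_tokens, "Quick Brown Fox"))       # not_analyzed field keeps the value
```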

terms: multi-value query

{ "terms" : { "user" : ["tony", "kitty"] } }

range filter

For range selection on date fields you can use Date Math.

{
     "range" : {
          "born" : {
              "gte": "01/01/2012",
              "lte": "2013",
              "format": "dd/MM/yyyy||yyyy" 
           }
       }
 }


{
     "range" : {
          "timestamp" : {
              "gte": "now-6d/d", // Date Math
              "lte": "now/d", // Date Math
              "time_zone": "+08:00"  // time zone
           }
       }
 }
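
To make the Date Math expressions concrete, the sketch below resolves "now-6d/d" and "now/d" by hand for a fixed clock. It relies on the documented rounding rules for range queries: with gte the /d rounding goes down to the start of the day, and with lte it goes up to the last millisecond of the day. The function name and the sample instant are made up.

```python
from datetime import datetime, timedelta, timezone


def day_range(now: datetime, days_back: int, tz: timezone):
    """Resolve gte="now-<days_back>d/d" and lte="now/d" in the given time zone."""
    local = now.astimezone(tz)
    start_of_today = local.replace(hour=0, minute=0, second=0, microsecond=0)
    gte = start_of_today - timedelta(days=days_back)  # rounded down
    lte = start_of_today + timedelta(days=1) - timedelta(milliseconds=1)  # rounded up
    return gte, lte


tz = timezone(timedelta(hours=8))  # the "+08:00" from the query above
now = datetime(2019, 12, 6, 3, 30, tzinfo=timezone.utc)  # sample instant
gte, lte = day_range(now, 6, tz)
print(gte.isoformat())
print(lte.isoformat())
```

With six days back plus today, the range covers seven whole calendar days in the chosen time zone.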

exists: whether the document contains a given field

{
     "exists" : { "field" : "user" }
}

wildcard: wildcard query (patterns are matched against individual tokens)

Note that this query can be slow, as it needs to iterate over many terms. In order to prevent extremely slow wildcard queries, a wildcard term should not start with one of the wildcards * or ?
In short: wildcard queries perform poorly, so avoid patterns that begin with * or ?.
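
Some intuition for the leading-wildcard advice, as a simplified model rather than a description of how Lucene actually implements wildcard queries: if terms are kept in sorted order, a pattern with a literal prefix only needs to visit the terms sharing that prefix, while a pattern starting with * or ? has no prefix to seek to and must visit every term. The function name and the term list are made up.

```python
import bisect
from fnmatch import fnmatchcase


def wildcard_scan(sorted_terms: list, pattern: str):
    """Return (matching terms, number of terms examined)."""
    # Literal prefix = everything before the first wildcard character.
    positions = [i for i in (pattern.find("*"), pattern.find("?")) if i >= 0]
    prefix = pattern[: min(positions)] if positions else pattern

    start = bisect.bisect_left(sorted_terms, prefix)  # seek to the prefix
    matches, examined = [], 0
    for term in sorted_terms[start:]:
        if not term.startswith(prefix):
            break  # left the prefix range; nothing further can match
        examined += 1
        if fnmatchcase(term, pattern):
            matches.append(term)
    return matches, examined


terms = sorted(["active", "omg", "page", "pre", "premium", "preview", "view"])
print(wildcard_scan(terms, "pre*"))   # visits only the terms starting with "pre"
print(wildcard_scan(terms, "*view"))  # no prefix: visits every term
```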

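prefix: prefix query
regexp: regular expression query

Tips

Handling values that contain -

If a value contains -, it is split into separate tokens by default, which makes search results inaccurate. One fix is to append .raw to the field name (this assumes the mapping defines a not_analyzed .raw sub-field):

{ "term": { "status": "pre-active" } } => { "term": { "status.raw": "pre-active" } }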

sort

GET es-index_*/_search
{
  "fields" : ["eventid", "logtime"],
  "query": {
    "term": {
      "eventid": {
        "value": "OMGH5PageView"
      }
    }
  },
  "sort": [
    {
      "logtime": {
        "order": "asc"
      }
    }
  ]
}

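Aggregations

date_histogram

Like histogram, date_histogram by default returns only buckets whose document count is non-zero. To also return buckets that contain no documents, set two additional parameters:

"min_doc_count" : 0,  // forces empty buckets to be returned
"extended_bounds" : {  // forces the whole year to be returned
    "min" : "2014-01-01",
    "max" : "2014-12-31"
}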
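
The effect of the two parameters can be reproduced locally. The sketch below buckets dates by month and, under the assumptions described above, fills in empty months when min_doc_count is 0 and stretches the range to extended_bounds. It is a model of the behaviour, not of the aggregation's implementation; the function name and sample dates are made up.

```python
from collections import Counter
from datetime import date


def month_buckets(dates, min_doc_count=1, extended_bounds=None):
    """Return [(month_key, doc_count)] for monthly buckets."""
    counts = Counter((d.year, d.month) for d in dates)
    edges = list(counts)
    if extended_bounds:
        edges += [(b.year, b.month) for b in extended_bounds]
    if not edges:
        return []

    (year, month), last = min(edges), max(edges)
    buckets = []
    while (year, month) <= last:
        n = counts[(year, month)]  # Counter yields 0 for months without documents
        if n >= min_doc_count:
            buckets.append((f"{year:04d}-{month:02d}", n))
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return buckets


docs = [date(2014, 2, 3), date(2014, 2, 20), date(2014, 5, 1)]
print(month_buckets(docs))
print(month_buckets(docs, min_doc_count=0,
                    extended_bounds=(date(2014, 1, 1), date(2014, 12, 31))))
```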

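Fields in the search response

took: time the query took (in milliseconds)
timed_out: whether the query timed out
_shards: information about the shards queried, including how many were queried, how many succeeded, and how many failed
hits: the search results
hits.total: number of documents that match the query
hits.max_score: highest _score among the matching documents
hits.hits: the matching documents
_score: how well a document matches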
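
A short sketch of reading these fields, assuming a made-up response whose values are invented for illustration. It treats hits.total as a plain integer, which is how older versions return it; newer versions return an object there instead. The function name summarize is hypothetical.

```python
import json

# Hypothetical response; all values are invented for illustration.
sample = json.loads("""
{
  "took": 12,
  "timed_out": false,
  "_shards": {"total": 12, "successful": 12, "failed": 0},
  "hits": {
    "total": 2,
    "max_score": 1.0,
    "hits": [
      {"_id": "1", "_score": 1.0, "_source": {"eventid": "OMGH5PageView"}},
      {"_id": "2", "_score": 0.5, "_source": {"eventid": "OMGH5PageView"}}
    ]
  }
}
""")


def summarize(resp: dict) -> dict:
    """Pull the fields described above into a flat summary."""
    hits = resp["hits"]
    return {
        "took_ms": resp["took"],
        # Complete only if nothing timed out and no shard failed.
        "complete": not resp["timed_out"] and resp["_shards"]["failed"] == 0,
        "total": hits["total"],
        "max_score": hits["max_score"],
        "ids_by_score": [h["_id"] for h in sorted(hits["hits"], key=lambda h: -h["_score"])],
    }


print(summarize(sample))
```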
