[Original] Getting Started with Elasticsearch

Date: 2019-11-11

Tags: original, elasticsearch, getting started. Category: log analysis

Introduction to ES

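Elasticsearch is a search server built on Lucene. It provides a distributed, multi-tenant full-text search engine behind a RESTful web interface. Elasticsearch is written in Java and released as open source under the Apache license, and it is a popular enterprise search engine.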

Installation

wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-6.4.0.tar.gz  
tar -xvzf elasticsearch-6.4.0.tar.gz  
cd elasticsearch-6.4.0/bin
./elasticsearch -d

Edit the configuration file (config/elasticsearch.yml)

# ---------------------------------- Cluster -----------------------------------
#
# Use a descriptive name for your cluster:
#
cluster.name: my-application
# ---------------------------------- Network -----------------------------------
#
# Set the bind address to a specific IP (IPv4 or IPv6):
#
network.host: 192.168.141.129
#
# Set a custom port for HTTP:
#
http.port: 9200
#
# For more information, consult the network module documentation.
#

Starting it again fails with an error:

[2018-09-13T09:29:43,060][INFO ][o.e.b.BootstrapChecks    ] [7hyiUY2] bound or publishing to a non-loopback address, enforcing bootstrap checks
ERROR: [2] bootstrap checks failed
[1]: max file descriptors [4096] for elasticsearch process is too low, increase to at least [65536]
[2]: max virtual memory areas vm.max_map_count [65530] is too low, increase to at least [262144]

Solution:

vi /etc/security/limits.conf # add the two lines below, then reconnect over SSH
elasticsearch soft nofile 65536
elasticsearch hard nofile 65537

vi /etc/sysctl.conf # add the line below
vm.max_map_count=262144
sysctl -p
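
The two bootstrap checks above compare system limits against fixed thresholds. The following is a minimal sketch of that comparison, not Elasticsearch's own code; the function names are made up for illustration, and `read_current_values` assumes a Linux host and reports the limits of the current process, which may differ from those of the user that runs Elasticsearch.

```python
from typing import List, Tuple

# Thresholds quoted in the bootstrap-check error messages above.
MIN_NOFILE = 65536
MIN_MAX_MAP_COUNT = 262144


def failed_checks(nofile: int, max_map_count: int) -> List[str]:
    """Return one message for each value that is below its threshold."""
    problems = []
    if nofile < MIN_NOFILE:
        problems.append(
            f"max file descriptors [{nofile}] is too low, need at least [{MIN_NOFILE}]"
        )
    if max_map_count < MIN_MAX_MAP_COUNT:
        problems.append(
            f"vm.max_map_count [{max_map_count}] is too low, need at least [{MIN_MAX_MAP_COUNT}]"
        )
    return problems


def read_current_values() -> Tuple[int, int]:
    """Read the live values for the current process (Linux only)."""
    import resource  # Unix-only module, imported lazily on purpose

    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    with open("/proc/sys/vm/max_map_count") as handle:
        return soft, int(handle.read().strip())


# Values from the error log above, then the values after the fix.
print(failed_checks(4096, 65530))
print(failed_checks(65536, 262144))
```

On a Linux machine, `failed_checks(*read_current_values())` gives a quick check before restarting the node.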

Accessing the HTTP endpoint

Address:
http://192.168.141.129:9200/?pretty
The response is a JSON document with the node name, cluster name, and version information.

ES Architecture

Basic Concepts

http://www.javashuo.com/article/p-mfnljgnt-gp.html

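Near real-time (NRT)
Elasticsearch is a near-real-time search platform: there is a short delay (normally about 1 second) between indexing a document and that document becoming searchable.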
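Cluster
A cluster consists of multiple nodes, one of which is the master node. The master is chosen by election, and the distinction between master and non-master nodes only matters inside the cluster. One of the ideas behind ES is decentralization: seen from outside there is no central node, because the cluster is logically a single whole, and talking to any one node is equivalent to talking to the entire cluster.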
Node
A node is a single running Elasticsearch instance that belongs to a cluster. By default, every node joins a cluster named "elasticsearch". Each node can use its own elasticsearch.yml, so nodes may have different memory and resource-allocation settings.
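Data node
Data nodes index documents and run searches against the indexed documents. Adding more data nodes is the recommended way to improve performance or scale the cluster out. A node becomes a data node through the corresponding properties in elasticsearch.yml.
Master node
The master node is responsible for managing the cluster. For large clusters, three dedicated master-eligible nodes are recommended (one active master and two standbys) that act only as masters and neither store indices nor run searches. A node is declared master-eligible in elasticsearch.yml.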
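Routing node, also called load-balancer node
These nodes take neither the master nor the data role; they only balance load, route search requests, or forward documents to the appropriate node for indexing. This is very useful for high-volume search or indexing workloads.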
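Index
An Elasticsearch index is a collection of documents that share common characteristics. Each index contains types, each type contains documents, and each document contains fields. An index is made up of JSON documents, and a cluster can hold multiple indices.
Type [Deprecated]
A type provides a logical partition inside an index and basically represents a class of similar documents. Before 6.0 an index could hold several types that were told apart by context; from 6.0 on, an index can contain only one type, and types are deprecated.
Document
An Elasticsearch document is a JSON document stored in an index. Each document has a type and a corresponding ID, which uniquely identifies it.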
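Mapping
A mapping defines each field of a document and its data type, such as string, integer, float, double, or date. When an index is created, Elasticsearch automatically creates a mapping for its fields, and these mappings can easily be queried or modified to fit specific needs (although the type of an existing field cannot be changed without reindexing).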
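Shard
ES can split a complete index into multiple shards, so that one large index is broken into pieces distributed across different nodes, which is what makes distributed search possible. The number of shards must be specified when the index is created and cannot be changed afterwards.
Replica
ES can keep multiple replicas of an index. Replicas serve two purposes: they improve fault tolerance, since a shard that is damaged or lost on one node can be recovered from a replica, and they improve query throughput, since ES automatically load-balances search requests across the copies.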
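River
A river is a data source for ES and a way to synchronize data from other stores (such as databases) into ES. It runs as a plugin service inside ES that reads data from the river and indexes it. Official rivers existed for CouchDB, RabbitMQ, Twitter, and Wikipedia. Rivers were removed in Elasticsearch 2.0, so this concept applies only to older versions.
Gateway
The gateway is how ES stores index snapshots. In the early versions this describes, ES first kept the index in memory and persisted it to local disk when memory filled up. The gateway stores the index snapshot, and when the cluster is shut down and restarted it reads the index backup data from the gateway. ES supported several gateway types: local file system (the default), distributed file system, Hadoop HDFS, and Amazon S3 cloud storage.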
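
Why the shard count is fixed at creation time becomes clearer from how a document is assigned to a shard. In simplified form, Elasticsearch computes `shard = hash(routing) % number_of_primary_shards`, where the routing value defaults to the document ID. The sketch below illustrates that formula only; it uses CRC32 as a stand-in hash (Elasticsearch itself uses Murmur3), and `pick_shard` is a hypothetical helper name.

```python
import zlib


def pick_shard(routing: str, num_primary_shards: int) -> int:
    """Return the shard number for a routing value (by default the document ID)."""
    # Stand-in hash for illustration: Elasticsearch uses Murmur3, not CRC32.
    return zlib.crc32(routing.encode("utf-8")) % num_primary_shards


doc_ids = [str(i) for i in range(1, 11)]
with_3_shards = {doc_id: pick_shard(doc_id, 3) for doc_id in doc_ids}
with_4_shards = {doc_id: pick_shard(doc_id, 4) for doc_id in doc_ids}

# Documents whose shard number changes when the divisor changes could no
# longer be found on the shard where they were originally stored.
moved = [d for d in doc_ids if with_3_shards[d] != with_4_shards[d]]
print(with_3_shards)
print(f"{len(moved)} of {len(doc_ids)} documents would map to a different shard")
```

Because the divisor is part of the lookup, changing it after documents have been indexed would send requests to the wrong shard, which is why a different shard count means building a new index.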

GET /_cat	Description
/_cluster/stats	Cluster statistics
/_cat/allocation
/_cat/shards
/_cat/shards/{index}
/_cat/master
/_cat/nodes	List the nodes in the cluster
/_cat/tasks
/_cat/indices	List all indices
/_cat/indices/{index}	Show a specific index
/_cat/segments
/_cat/segments/{index}
/_cat/count
/_cat/count/{index}
/_cat/recovery
/_cat/recovery/{index}
/_cat/health	Cluster health
/_cat/pending_tasks
/_cat/aliases
/_cat/aliases/{alias}
/_cat/thread_pool
/_cat/thread_pool/{thread_pools}
/_cat/plugins
/_cat/fielddata
/_cat/fielddata/{fields}
/_cat/nodeattrs
/_cat/repositories
/_cat/snapshots/{repository}
/_cat/templates
/_stats	Statistics for all indices

v: include column headers in the output
pretty: pretty-print the JSON response
help: show help for the command
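
As a small illustration of how these parameters attach to a `_cat` endpoint, the sketch below only builds the request URL and does not contact a cluster. `cat_url` is a hypothetical helper, and passing the flag as `v=true` is one accepted spelling of the bare `?v` form.

```python
from urllib.parse import urlencode


def cat_url(base: str, endpoint: str, headers: bool = True) -> str:
    """Build a _cat URL; `headers=True` adds the `v` parameter."""
    url = f"{base.rstrip('/')}/_cat/{endpoint.strip('/')}"
    params = {"v": "true"} if headers else {}
    query = urlencode(params)
    return f"{url}?{query}" if query else url


# Host and port taken from the configuration earlier in this post.
print(cat_url("http://192.168.141.129:9200", "indices"))
print(cat_url("http://192.168.141.129:9200/", "health", headers=False))
```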

Status values

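Green - everything is good (cluster is fully functional); this is the ideal state.
Yellow - all data is available but some replicas are not yet allocated (cluster is fully functional); data and cluster are usable, but some replica shards are unassigned.
Red - some data is not available for whatever reason (cluster is partially functional); at least one primary shard is unassigned, so part of the data cannot be searched or indexed.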
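
The three colors follow from the state of primary and replica shards. The sketch below is an illustration of that rule rather than Elasticsearch's implementation; `cluster_status` and the `(is_primary, is_assigned)` tuple layout are assumptions made for the example.

```python
from typing import Iterable, Tuple


def cluster_status(shards: Iterable[Tuple[bool, bool]]) -> str:
    """Derive a health color from shard copies given as (is_primary, is_assigned)."""
    status = "green"
    for is_primary, is_assigned in shards:
        if is_assigned:
            continue
        if is_primary:
            return "red"  # a missing primary means some data is unavailable
        status = "yellow"  # a missing replica only reduces redundancy
    return status


print(cluster_status([(True, True), (False, True)]))   # all copies assigned
print(cluster_status([(True, True), (False, False)]))  # one replica unassigned
print(cluster_status([(True, False), (False, True)]))  # one primary unassigned
```

A single-node cluster with `number_of_replicas` greater than 0 therefore stays yellow, since a replica is never placed on the same node as its primary.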

Index Management

Creating an index

Create directly

PUT twitter

settings

PUT twitter
{
    "settings" : {
        "index" : {
            "number_of_shards" : 3, 
            "number_of_replicas" : 2 
        }
    }
}
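
As a quick worked example of what these settings imply: every primary shard gets `number_of_replicas` extra copies, and copies of the same shard are never placed on the same node. The helper names below are made up for illustration.

```python
def total_shard_copies(primaries: int, replicas: int) -> int:
    """Primaries plus all of their replica copies."""
    return primaries * (1 + replicas)


def min_nodes_for_green(replicas: int) -> int:
    """Each copy of a shard needs its own node, so one node per copy."""
    return replicas + 1


# Settings from the request above: 3 primary shards, 2 replicas each.
print(total_shard_copies(3, 2))
print(min_nodes_for_green(2))
```

With the settings above the index has 9 shard copies in total and needs at least 3 nodes before every replica can be assigned.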

mappings

PUT twitter
{
    "settings" : {
        "index" : {
            "number_of_shards" : 3, 
            "number_of_replicas" : 2 
        }
    },
   "mappings" : {
        "_doc" : {
            "properties" : {
                "field1" : { "type" : "text" }
            }
        }
    }
}

Viewing an index

GET /twitter/
GET /twitter/_search

Deleting an index

DELETE /twitter

Mapping Management

Core datatypes
string
    text and keyword 
Numeric datatypes
    long, integer, short, byte, double, float, half_float, scaled_float 
Date datatype
    date 
Boolean datatype
    boolean 
Binary datatype
    binary 
Range datatypes
    integer_range, float_range, long_range, double_range, date_range

Complex datatypes
Array datatype
    Arrays are simply multiple values; no dedicated type is needed
Object datatype
    object: for a single JSON object
Nested datatype
    nested: for arrays of JSON objects

Geo datatypes
Geo-point datatype
    geo_point: for lat/lon points
Geo-Shape datatype
    geo_shape: for complex shapes like polygons

Specialised datatypes
IP datatype
    ip: for IPv4 and IPv6 addresses
Completion datatype
    completion: to provide auto-complete suggestions
Token count datatype
    token_count: to count the number of tokens in a string
mapper-murmur3
    murmur3: to compute hashes of values at index-time and store them in the index
Percolator type
    Accepts queries from the query-dsl 
join datatype
    Defines parent/child relation for documents within the same index
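
To show how a few of these datatypes appear together in a mapping, here is a sketch that assembles a request body in the 6.x format used elsewhere in this post (single `_doc` type). The field names are invented for the example, and the body is only printed, not sent to a cluster.

```python
import json


def build_mapping(properties: dict, doc_type: str = "_doc") -> dict:
    """Wrap field definitions in the 6.x-style mappings structure."""
    return {"mappings": {doc_type: {"properties": properties}}}


# Hypothetical fields, one per datatype family from the list above.
body = build_mapping({
    "title": {"type": "text"},
    "sku": {"type": "keyword"},
    "price": {"type": "scaled_float", "scaling_factor": 100},
    "created": {"type": "date"},
    "in_stock": {"type": "boolean"},
    "location": {"type": "geo_point"},
    "client_ip": {"type": "ip"},
})

print(json.dumps(body, indent=2))
```

The printed JSON could serve as the body of a `PUT <index>` request like the ones shown earlier.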

Document Management

Create

With an explicit ID
PUT twitter/_doc/1
{
    "id": 1,
    "user" : "kimchy",
    "post_date" : "2009-11-15T14:12:12",
    "message" : "trying out Elasticsearch"
}

With an auto-generated ID
POST twitter/_doc/
{
    "id": 1,
    "user" : "kimchy",
    "post_date" : "2009-11-15T14:12:12",
    "message" : "trying out Elasticsearch"
}

View

HEAD twitter/_doc/11
GET twitter/_doc/1

Update

PUT twitter/_doc/1
{
    "id": 1,
    "user" : "kimchy",
    "post_date" : "2009-11-15T14:12:12",
    "message" : "trying out Elasticsearch"
}

Delete

DELETE twitter/_doc/1

Bulk operations

POST _bulk
{ "index" : { "_index" : "test", "_type" : "_doc", "_id" : "1" } }
{ "field1" : "value1" }
{ "delete" : { "_index" : "test", "_type" : "_doc", "_id" : "2" } }
{ "create" : { "_index" : "test", "_type" : "_doc", "_id" : "3" } }
{ "field1" : "value3" }
{ "update" : {"_id" : "1", "_type" : "_doc", "_index" : "test"} }
{ "doc" : {"field2" : "value2"} }

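index: succeeds whether or not the document already exists (it is created or replaced)
create: reports an error if the document already exists
update: reports an error if the document does not exist
delete: reports not_found if the document does not exist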
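
The bulk body is newline-delimited JSON: one action line, followed by a source line for every action except `delete`, with a newline after the last line. The following sketch builds such a body from Python data; `bulk_body` and its tuple layout are assumptions made for this example, and nothing is sent to a cluster.

```python
import json
from typing import Iterable, Optional, Tuple

Operation = Tuple[str, dict, Optional[dict]]


def bulk_body(operations: Iterable[Operation]) -> str:
    """Serialize (action, metadata, source) tuples into a _bulk request body."""
    lines = []
    for action, meta, source in operations:
        lines.append(json.dumps({action: meta}))
        if action == "delete":
            continue  # delete has no source line
        if source is None:
            raise ValueError(f"{action} requires a source document")
        lines.append(json.dumps(source))
    # The bulk API expects the body to end with a newline.
    return "\n".join(lines) + "\n"


body = bulk_body([
    ("index", {"_index": "test", "_type": "_doc", "_id": "1"}, {"field1": "value1"}),
    ("delete", {"_index": "test", "_type": "_doc", "_id": "2"}, None),
    ("create", {"_index": "test", "_type": "_doc", "_id": "3"}, {"field1": "value3"}),
    ("update", {"_index": "test", "_type": "_doc", "_id": "1"}, {"doc": {"field2": "value2"}}),
])
print(body, end="")
```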

Structured Search

Exact-value lookup with term

POST /my_store/_doc/_bulk
{ "index": { "_id": 1 }}
{ "price" : 10, "productID" : "XHDK-A-1293-#fJ3" }
{ "index": { "_id": 2 }}
{ "price" : 20, "productID" : "KDKE-B-9947-#kL5" }
{ "index": { "_id": 3 }}
{ "price" : 30, "productID" : "JODL-X-1937-#pV7" }
{ "index": { "_id": 4 }}
{ "price" : 30, "productID" : "QQPX-R-3956-#aD8" }

Single-field query

GET my_store/_doc/_search
{
  "query": {
    "term": {
      "price": "30"
    }
  }
}

Combined filters

GET my_store/_doc/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "term": {
            "price": 20
          }
        },
        {
          "term": {
            "productID": "XHDK-A-1293-#fJ3"
          }
        }
      ],
      "must_not": {
        "term": {
          "price": 30
        }
      }
    }
  }
}

PUT my_store
{
    "mappings" : {
        "_doc" : {
            "properties" : {
                "productID" : {
                    "type" : "keyword"
                }
            }
        }
    }
}

GET /my_store/_analyze
{
  "field": "productID",
  "text": "XHDK-A-1293-#fJ3"
}
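
The `keyword` mapping matters because a `term` query compares the search value with the indexed tokens without analyzing it. The sketch below imitates this with a rough approximation of the standard analyzer (lowercase, split on non-alphanumeric characters); it is not the real analyzer, and the function names are made up.

```python
import re
from typing import List


def rough_standard_analyze(text: str) -> List[str]:
    """Approximate the standard analyzer: lowercase, split on non-alphanumerics."""
    return re.findall(r"[a-z0-9]+", text.lower())


def term_matches(indexed_tokens: List[str], term: str) -> bool:
    """A term query looks for the exact, unanalyzed term among the indexed tokens."""
    return term in indexed_tokens


product_id = "XHDK-A-1293-#fJ3"

as_text = rough_standard_analyze(product_id)  # analyzed `text` field
as_keyword = [product_id]                     # `keyword` field: one unmodified token

print(as_text)
print(term_matches(as_text, product_id))
print(term_matches(as_keyword, product_id))
```

With an analyzed field the full product ID is never stored as a single token, so the exact-value lookup finds nothing; with `keyword` it matches.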

Highlighting

GET my_store/_doc/_search
{
  "query": {
    "match": {
      "productID": "b"
    }
  },
  "highlight": {
    "pre_tags": ["<span class='hlt'>"],
    "post_tags": ["</span>"],
    "fields": {
      "productID": {}
    }
  }
}

Full-Text Search

POST /my_index/my_type/_bulk
{ "index": { "_id": 1 }}
{ "title": "The quick brown fox" }
{ "index": { "_id": 2 }}
{ "title": "The quick brown fox jumps over the lazy dog" }
{ "index": { "_id": 3 }}
{ "title": "The quick brown fox jumps over the quick dog" }
{ "index": { "_id": 4 }}
{ "title": "Brown fox brown dog" }

Match query

GET /my_index/my_type/_search
{
    "query": {
        "match": {
            "title": "QUICK!"
        }
    }
}

GET /my_index/_analyze
{
  "field": "title",
  "text": "QUICK!"
}

Combined query

GET /my_index/my_type/_search
{
  "query": {
    "bool": {
      "must":     { "match": { "title": "quick" }},
      "must_not": { "match": { "title": "lazy"  }},
      "should": [
                  { "match": { "title": "brown" }},
                  { "match": { "title": "dog"   }}
      ]
    }
  }
}
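
The clause semantics of this query can be traced by hand on the four sample titles. The sketch below is a toy model: `must` and `must_not` decide whether a document is returned, and the number of matching `should` clauses stands in for the relevance score (real Elasticsearch scoring is more involved). All names are illustrative.

```python
import re
from typing import Dict, List, Set, Tuple

DOCS: Dict[int, str] = {
    1: "The quick brown fox",
    2: "The quick brown fox jumps over the lazy dog",
    3: "The quick brown fox jumps over the quick dog",
    4: "Brown fox brown dog",
}


def tokens(text: str) -> Set[str]:
    """Lowercase word tokens, a rough stand-in for the analyzed title."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def bool_search(docs: Dict[int, str], must: List[str],
                must_not: List[str], should: List[str]) -> List[Tuple[int, int]]:
    """Return (doc_id, matched_should_clauses), best match first."""
    hits = []
    for doc_id, title in docs.items():
        terms = tokens(title)
        if not all(t in terms for t in must):
            continue  # every must clause has to match
        if any(t in terms for t in must_not):
            continue  # any must_not match excludes the document
        bonus = sum(1 for t in should if t in terms)
        hits.append((doc_id, bonus))
    return sorted(hits, key=lambda hit: -hit[1])


print(bool_search(DOCS, must=["quick"], must_not=["lazy"], should=["brown", "dog"]))
```

Documents 2 and 4 drop out (one contains "lazy", the other lacks "quick"), and document 3 ranks above document 1 because it matches both `should` clauses.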

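Analysis (tokenization)

character filter: pre-processes the raw text character by character, for example handling HTML tags, and then hands the result to the tokenizer. An analyzer may contain zero or more character filters, applied in the configured order.
tokenizer: splits the text into tokens. An analyzer must contain exactly one tokenizer.
token filter: post-processes the tokens produced by the tokenizer, for example lowercasing, stop-word removal, or synonym handling. An analyzer may contain zero or more token filters, applied in the configured order.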
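
The three stages compose into a simple pipeline. The sketch below mimics that flow with plain functions; each stage is a crude stand-in for the corresponding Elasticsearch component (for instance, the real `html_strip` filter handles far more cases), and all function names are made up for the example.

```python
import re
from html import unescape
from typing import Callable, List


def html_strip(text: str) -> str:
    """Character filter: drop tags and decode HTML entities (simplified)."""
    return unescape(re.sub(r"<[^>]+>", "", text))


def word_tokenizer(text: str) -> List[str]:
    """Tokenizer: split the text into word tokens."""
    return re.findall(r"\w+", text)


def lowercase(tokens: List[str]) -> List[str]:
    """Token filter: lowercase every token."""
    return [token.lower() for token in tokens]


def analyze(text: str,
            char_filters: List[Callable[[str], str]],
            tokenizer: Callable[[str], List[str]],
            token_filters: List[Callable[[List[str]], List[str]]]) -> List[str]:
    """Run zero or more char filters, exactly one tokenizer, zero or more token filters."""
    for char_filter in char_filters:
        text = char_filter(text)
    tokens = tokenizer(text)
    for token_filter in token_filters:
        tokens = token_filter(tokens)
    return tokens


print(analyze("<p>The QUICK Brown fox!</p>", [html_strip], word_tokenizer, [lowercase]))
```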

Testing an analyzer

POST _analyze
{
  "tokenizer": "standard",
  "char_filter":  [ "html_strip" ],
  "filter":  [ "lowercase", "asciifolding" ],
  "text":      "Is this déja vu?"
}

POST _analyze
{
  "analyzer": "ik_smart",
  "text": "微知"
}

Built-in analyzers

Standard Analyzer
Simple Analyzer
Whitespace Analyzer
Stop Analyzer
Keyword Analyzer
Pattern Analyzer
Language Analyzers
Fingerprint Analyzer
Custom analyzers

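Built-in character filters

HTML Strip Character Filter
    html_strip: strips HTML tags and decodes HTML entities such as &amp;.
Mapping Character Filter
    mapping: replaces occurrences of a given string in the text with another string.
Pattern Replace Character Filter
    pattern_replace: performs regular-expression replacement.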

Built-in tokenizers

Standard Tokenizer
Letter Tokenizer
Lowercase Tokenizer
Whitespace Tokenizer
UAX URL Email Tokenizer
Classic Tokenizer
Thai Tokenizer
NGram Tokenizer
Edge NGram Tokenizer
Keyword Tokenizer
Pattern Tokenizer
Simple Pattern Tokenizer
Simple Pattern Split Tokenizer
Path Hierarchy Tokenizer

Example

PUT customer
{
  "mappings": {
    "_doc": {
      "properties": {
        "customerName": {
          "type": "text",
          "analyzer": "ik_smart",
          "search_analyzer": "ik_smart"
        },
        "companyId": {
          "type": "text"
        }
      }
    }
  }
}


POST /customer/_doc/_bulk
{ "index": { "_id": 1 }}
{ "companyId": "55", "customerName": "微知（上海）服務外包有限公司" }
{ "index": { "_id": 2 }}
{ "companyId": "55", "customerName": "上海微盟" }
{ "index": { "_id": 3 }}
{ "companyId": "55", "customerName": "上海知道廣告有限公司" }
{ "index": { "_id": 4 }}
{ "companyId": "55", "customerName": "微鯨科技有限公司" }
{ "index": { "_id": 5}}
{ "companyId": "55", "customerName": "北京微塵大業電子商務" }
{ "index": { "_id": 6}}
{ "companyId": "55", "customerName": "福建微衝企業諮詢有限公司" }
{ "index": { "_id": 7}}
{ "companyId": "55", "customerName": "上海知盛企業管理諮詢有限公司" }

GET /customer/_doc/_search
{
  "query": {
    "match": {
      "customerName": "知道"
    }
  }
}

GET /customer/_doc/_search
{
  "query": {
    "match": {
      "customerName": "微知"
    }
  }
}

Title	Link
Elasticsearch series 1: Elasticsearch (introduction to ES, installation & configuration, integrating IK Analyzer)	http://www.javashuo.com/article/p-seoiweer-mh.html
Elasticsearch series 2: Indices in detail (quick start, index management, mappings, index aliases)	http://www.javashuo.com/article/p-mnkeqnhg-mh.html
Elasticsearch series 3: Indices in detail (analyzers, document management, routing in a cluster)	http://www.javashuo.com/article/p-hewbfjdl-mh.html
Elasticsearch series 4: Search in detail (search API, Query DSL)	http://www.javashuo.com/article/p-bxhmnxve-hr.html
Elasticsearch series 5: Search in detail (query suggestions, suggesters)	http://www.javashuo.com/article/p-hbibvwze-mh.html
Elasticsearch series 6: Aggregations (introduction, metric aggregations, bucket aggregations)	http://www.javashuo.com/article/p-pcwdcicw-mh.html
Elasticsearch series 7: ES Java client - Elasticsearch Java client	http://www.javashuo.com/article/p-hlgrwnon-ey.html
Elasticsearch series 8: ES cluster management (cluster planning, setup, and administration)	http://www.javashuo.com/article/p-hubkyqjz-mh.html