Using Elasticsearch

Date: 2019-12-14

Tags: elasticsearch, usage; Category: log analysis

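Elasticsearch inverted index

Background

Search engines usually store their data in an inverted index; it is the basic form of the underlying data storage.

Definition

The inverted index comes from the practical need to look up records by the value of an attribute. Each entry in such an index holds an attribute value together with the addresses of all records that have that value.

Because the attribute value determines the position of the records, rather than the record determining the attribute value, it is called an inverted index.

A file that carries an inverted index is called an inverted index file, or inverted file for short.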
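
The definition above can be sketched in a few lines of Python. This is only an illustration: the function name build_inverted_index and the sample documents are made up here, and tokens are simply split on whitespace and lowercased.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercased token to the sorted list of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.lower().split():
            index[token].add(doc_id)
    return {token: sorted(ids) for token, ids in index.items()}


# Hypothetical sample documents, keyed by document id.
docs = {
    1: "Python distributed crawler",
    2: "Django web development",
    3: "python web crawler",
}
index = build_inverted_index(docs)
print(index["python"])
print(index["web"])
```

Looking up a value now goes from the term to the documents, which is the reverse of reading a document to find its terms.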

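TF-IDF

A weighting technique commonly used in information retrieval and data mining. TF stands for Term Frequency; IDF stands for Inverse Document Frequency.

If a word or phrase occurs with a high frequency (TF) in one article and rarely in other articles, it is considered to discriminate well between categories and is suitable for classification.

Elasticsearch scores terms with this family of techniques: classic TF-IDF in older versions, and BM25, a refinement of TF-IDF, by default since 5.0.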
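
A minimal sketch of the idea, assuming a smoothed IDF so that unseen terms do not cause a division by zero. The function tf_idf and the tiny corpus are illustrative only; this is not the formula Lucene or Elasticsearch actually uses.

```python
import math


def tf_idf(term, doc, corpus):
    """Score `term` for `doc` (a token list) against `corpus` (a list of token lists)."""
    tf = doc.count(term) / len(doc)
    containing = sum(1 for d in corpus if term in d)
    # Smoothed IDF: an assumption of this sketch, not Elasticsearch's formula.
    idf = math.log((1 + len(corpus)) / (1 + containing)) + 1
    return tf * idf


corpus = [
    ["python", "crawler", "python"],
    ["django", "web"],
    ["linux", "web"],
]
rare = tf_idf("python", corpus[0], corpus)   # frequent here, absent elsewhere
common = tf_idf("web", corpus[1], corpus)    # appears in two of three documents
print(rare > common)
```

A term that is frequent in one document and rare in the rest ends up with the higher score.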

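Problems an inverted index has to solve

1. Case conversion: python and PYTHON should be treated as one term

2. Stemming: looking and look should be treated as one term

3. Tokenization: should 屏蔽系統 be split into 屏蔽 and 系統, or kept as the single term 屏蔽系統?

4. Inverted index files grow large: which compression encoding should be used to reduce storage overhead?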
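
For problem 4, one common approach is to store the gaps between sorted document ids and write each gap in as few bytes as possible. The sketch below shows delta encoding combined with a variable-length byte encoding; the function names are made up here, and real engines use more elaborate schemes.

```python
def encode_postings(doc_ids):
    """Delta-encode a sorted id list, then write each gap as a variable-length byte sequence."""
    out = bytearray()
    previous = 0
    for doc_id in doc_ids:
        gap = doc_id - previous
        previous = doc_id
        while gap >= 0x80:
            out.append((gap & 0x7F) | 0x80)  # low 7 bits, continuation flag set
            gap >>= 7
        out.append(gap)  # final byte, continuation flag clear
    return bytes(out)


def decode_postings(data):
    """Inverse of encode_postings."""
    doc_ids, current, gap, shift = [], 0, 0, 0
    for byte in data:
        gap |= (byte & 0x7F) << shift
        if byte & 0x80:
            shift += 7
        else:
            current += gap
            doc_ids.append(current)
            gap, shift = 0, 0
    return doc_ids


postings = [3, 7, 300, 305, 100000]
encoded = encode_postings(postings)
print(len(encoded), "bytes, versus", 4 * len(postings), "bytes as fixed 32-bit integers")
print(decode_postings(encoded) == postings)
```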

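Simple CRUD operations on Elasticsearch documents and indices

Built-in types

    String types

        text, keyword

    Numeric types

        long, integer, short, byte, double, float

    Date type

        date

    Boolean type

        boolean

    Binary type

        binary

    Complex types

        object, nested

    Geo types

        geo_point, geo_shape

    Specialized types

        ip, completion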

Creating an index

1. Create the index in the Kibana console

    # Once the index has been created, the number of shards cannot be changed

    PUT myindex
    {
        "settings": {
            "index": {
                "number_of_shards": 5,
                "number_of_replicas": 1
            }
        }
    }

2. Create it from the New Index menu on the head management page

Querying an index

1. Query index settings in the Kibana console

    GET myindex/_settings

    GET _all/_settings

    GET .kibana,myindex/_settings

    GET _settings

    GET myindex

Updating an index

1. Update the index settings in the Kibana console

    PUT myindex/_settings
    {
        "number_of_replicas": 2
    }

Saving a document to an index

# If no document ID is given after job, ES generates a UUID for us (send the request with POST in that case)

PUT myindex/job/1    
{
    "title": "python分佈式爬蟲開發",
    "salary_min": 15000,
    "city": "北京",
    "company": {
        "name": "百度",
        "company_addr": "北京市軟件園"
    },
    "publish_date": "2018-05-20",
    "comments": 15
}

Checking that the document was saved

On the head management page, click Data Browse and select myindex

Getting a document

GET myindex/job/1

GET myindex/job/1?_source=title,city

Modifying a document

1. Overwrite (full replacement)

    PUT myindex/job/1    
    {
        "title": "python分佈式爬蟲開發",
        "salary_min": 15000,
        "city": "北京",
        "company": {
            "name": "百度",
            "company_addr": "北京市軟件園"
        },
        "publish_date": "2018-05-20",
        "comments": 15
    } 

2. Incremental (partial) update

    POST myindex/job/1/_update
    {
        "doc": {
            "comments": 15
        }
    }
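
The difference between the two kinds of modification can be pictured with plain dictionaries: an overwrite replaces the whole document, while a partial update merges the doc body into the stored one, with nested objects merged recursively. The helper merge_partial is a made-up name for this sketch, not an Elasticsearch API.

```python
def merge_partial(existing, partial):
    """Return a copy of `existing` with `partial` merged in; nested dicts merge recursively."""
    merged = dict(existing)
    for key, value in partial.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_partial(merged[key], value)
        else:
            merged[key] = value  # scalars and lists replace the old value
    return merged


stored = {
    "title": "python分佈式爬蟲開發",
    "company": {"name": "百度", "company_addr": "北京市軟件園"},
    "comments": 15,
}
updated = merge_partial(stored, {"comments": 20, "company": {"employee_count": 50}})
print(updated["comments"])
print(updated["company"])
```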

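Deleting a document

DELETE myindex/job/1

Deleting a type

# Note: Elasticsearch 2.0 and later no longer support deleting a type on its own; delete the index or the matching documents instead
DELETE myindex/job

Deleting an index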

DELETE myindex

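Elasticsearch batch operations

Batch retrieval with mget

# Method 1: name the index, type, and ID of every document (they may come from different indices)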

    GET _mget
    {
        "docs": [
            {
                "_index": "test",
                "_type": "job1",
                "_id": "1"
            },
            {
                "_index": "test",
                "_type": "job2",
                "_id": "2" 
            }
        ]
    }

# Method 2: fetch documents from the same index

    GET test/_mget
    {
        "docs": [
            {
                "_type": "job1",
                "_id": "1"
            },
            {
                "_type": "job2",
                "_id": "2" 
            }
        ]
    }

# Method 3: fetch documents from the same index and type, one entry per ID

    GET test/job1/_mget
    {
        "docs": [
            {
                "_id": "1"
            },
            {
                "_id": "2" 
            }
        ]
    }

# Method 4: same index and type, passing only a list of IDs

    GET test/job1/_mget
    {
        "ids": [1, 2]
    }

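Batch operations with bulk

# A bulk request can combine several operations, such as index, delete, update, and create

# It can also be used to import one index into another

Example: several operations in one request, including updating the value of one key in a document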

    POST _bulk
    {"index": {"_index": "test", "_type": "type1", "_id": "1"}}
    {"field1": "value1"}
    {"index": {"_index": "test", "_type": "type1", "_id": "2"}}
    {"field2": "value2"}
    {"delete": {"_index": "test", "_type": "type1", "_id": "2"}}
    {"create": {"_index": "test", "_type": "type1", "_id": "3"}}
    {"field3": "value3"}
    {"update": {"_index": "test", "_type": "type1", "_id": "1"}}
    {"doc": {"field2": "value22222"}}
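
The bulk body is newline-delimited JSON: one action line, followed by a source line for every action except delete, and a final newline at the end. A small helper that assembles such a body might look like this; build_bulk_body is a hypothetical name, and the sketch only builds the string without sending it anywhere.

```python
import json


def build_bulk_body(operations):
    """Serialize (action, metadata, source) tuples into newline-delimited JSON."""
    lines = []
    for action, metadata, source in operations:
        lines.append(json.dumps({action: metadata}))
        if source is not None:  # delete has no source line
            lines.append(json.dumps(source))
    return "\n".join(lines) + "\n"  # the body must end with a newline


body = build_bulk_body([
    ("index", {"_index": "test", "_type": "type1", "_id": "1"}, {"field1": "value1"}),
    ("delete", {"_index": "test", "_type": "type1", "_id": "2"}, None),
    ("update", {"_index": "test", "_type": "type1", "_id": "1"}, {"doc": {"field2": "value2"}}),
])
print(body)
```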

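Elasticsearch mappings

Common properties

store

    true means the field value is stored; the default is false. Applies to all types

index

    true means the field is indexed and can be searched; the default is true. Applies to string types

null_value

    If the field is null, a default value such as NA can be set. Applies to all types

analyzer

    Sets the analyzer used at index time and search time. The default is the standard analyzer; whitespace, simple, and english are also available. Applies to text fields

include_in_all

    By default ES defines a special _all field for every document so that every field can be searched through it. Set this to false for a field that should not be searchable that way. Applies to all types

format

    The pattern string for the date format. Applies to the date type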

More mapping properties

https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-params.html

Creating a mapping

PUT myindex
{
    "mappings": {
        "job": {
            "properties": {
                "title": {
                    "store": true,
                    "type": "text",
                    "analyzer": "ik_max_word"
                },
                "salary_min": {
                    "type": "integer"
                },
                "city": {
                    "type": "keyword"
                },
                "company": {
                    "properties": {
                        "name":{
                            "store": true,
                            "type": "text"
                        },
                        "company_addr": {
                            "type": "text"
                        },
                        "employee_count": {
                            "type": "integer"
                        }
                    }
                },
                "publish_date": {
                    "type": "date",
                    "format": "yyyy-MM-dd"
                },
                "comments": {
                    "type": "integer"
                }
            }
        }
    }
}

Corresponding document

PUT myindex/job/1    
{
    "title": "python分佈式爬蟲開發",
    "salary_min": 15000,
    "city": "北京",
    "company": {
        "name": "百度",
        "company_addr": "北京市軟件園",
        "employee_count": 50
    },
    "publish_date": "2018-05-20",
    "comments": 15
}

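Viewing index information

On the head management page, click the index and choose Index Info from the drop-down menu.

Notes

If the value of a key does not match the type in the mapping, ES tries to convert it; if the conversion succeeds the document is stored, otherwise an error is returned.

Once the type of an indexed field is fixed it is hard to change. Changing it is expensive, especially for a large index, because rebuilding the index after a mapping change can take a very long time.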
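
A rough picture of the conversion described in the first note, for an integer field. The function name coerce_to_integer and the exact rules are assumptions of this sketch, not Elasticsearch's implementation.

```python
def coerce_to_integer(value):
    """Convert a JSON value to int the way a lenient integer field might, or raise ValueError."""
    if isinstance(value, bool):
        # Assumption of this sketch: booleans are rejected rather than treated as 0 or 1.
        raise ValueError(f"cannot store {value!r} in an integer field")
    if isinstance(value, int):
        return value
    if isinstance(value, float):
        return int(value)  # drop the fractional part
    if isinstance(value, str):
        try:
            return int(float(value))
        except ValueError:
            raise ValueError(f"cannot store {value!r} in an integer field") from None
    raise ValueError(f"cannot store {value!r} in an integer field")


print(coerce_to_integer("15000"))  # a numeric string converts
print(coerce_to_integer(15.0))     # a float converts
```

A value such as "abc" cannot be converted and raises an error, which corresponds to the request being rejected.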

Retrieving an existing mapping

GET myindex/_mapping

GET myindex/job/_mapping

GET _all/_mapping

GET _all/_mapping/job

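More Elasticsearch queries

Query categories

Basic queries: query with Elasticsearch's built-in query conditions

Compound queries: combine several queries into one

Filters: while querying, use filter conditions to narrow the data without affecting the score

match query

# The query text is analyzed (tokenized) before searching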

GET myindex/job/_search
{
    "query": {
        "match": {
            "title": "python"
        }
    }
}

term query

# The query value is not processed in any way; it is looked up as is

GET myindex/_search
{
    "query": {
        "term": {
            "title": "python"
        }
    }
}

terms query

# A document is returned if any one value in the list matches

GET myindex/_search
{
    "query": {
        "terms": {
            "title": ["工程師", "django", "系統"]
        }
    }
}

Controlling the number of results returned

# from is the offset of the first hit to return; size is how many hits to return

GET myindex/_search
{
    "query": {
        "terms": {
            "title": ["工程師", "django", "系統"]
        }
    },
    "from": 0,
    "size": 2
}
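
When paging through results, from and size are usually derived from a page number. A small helper, assuming pages are numbered from 1 (the function name is made up for this sketch):

```python
def page_to_from_size(page, page_size):
    """Translate a 1-based page number into the from/size pair of a search body."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be positive")
    return {"from": (page - 1) * page_size, "size": page_size}


search_body = {"query": {"match_all": {}}}
search_body.update(page_to_from_size(page=3, page_size=10))
print(search_body)
```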

match_all query

# Match every document

GET myindex/_search
{
    "query": {
        "match_all": {}
    }
}

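match_phrase query

# Phrase query: the query text is analyzed into a list of terms, and only documents that contain all of the terms are returned

# slop: the maximum distance allowed between the terms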

GET /myindex/_search
{
    "query": {
        "match_phrase": {
            "title": {
                "query": "python系統",
                "slop": 6
            }
        }
    }
}

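multi_match query

# Several fields can be given; a document matches if any of them contains the query value

# Appending ^3 to a field boosts its weight: a match on python in title counts for more, which affects the ordering of the results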

GET /myindex/_search
{
    "query": {
        "multi_match": {
            "query": "python",
            "fields": ["title^3", "desc"]
        }
    }
}

Specifying the returned fields

GET /myindex/_search
{
    "stored_fields": ["title", "company.name"],
    "query": {
        "match": {
            "title": "python"
        }
    }
}

Sorting results with sort

# asc is ascending, desc is descending

GET /myindex/_search
{
    "query": {
        "match_all": {}
    },
    "sort": [{
        "comments": {
            "order": "desc"
        }
    }]
}

Numeric range query

# boost sets the weight

GET /myindex/_search
{
    "query": {
        "range": {
            "comments": {
                "gte": 10,
                "lte": 20,
                "boost": 2.0
            }
        }
    }
}

Date range query

# now is resolved by ES to the current time

GET /myindex/_search
{
    "query": {
        "range": {
            "add_time": {
                "gte": "2018-05-20",
                "lte": "now"
            }
        }
    }
}

wildcard query

GET /myindex/_search
{
    "query": {
        "wildcard": {
            "title":{
                "value": "py*n",
                "boost": 2.0
            }
        }
    }
}
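
In a wildcard pattern, * stands for any run of characters and ? for exactly one character. The sketch below translates such a pattern into a regular expression to show which terms it would accept; wildcard_to_regex is a made-up helper, and in Elasticsearch the pattern is matched against indexed terms rather than the raw field text.

```python
import re


def wildcard_to_regex(pattern):
    """Compile a pattern where * matches any run of characters and ? matches one character."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")
        elif ch == "?":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts))


matcher = wildcard_to_regex("py*n")
for term in ["python", "pyn", "pycharm"]:
    print(term, bool(matcher.fullmatch(term)))
```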

fuzzy query

GET myindex/_search
{
    "query": {
        "fuzzy": {
            "title": "linux"
        }
    },
    "_source": ["title"]  
}

# fuzziness is the maximum edit distance; prefix_length is the number of leading characters that must match exactly

GET myindex/_search
{
    "query": {
        "fuzzy": {
            "title": {
                "value": "linu",
                "fuzziness": 1,
                "prefix_length": 0
            }
        }
    },
    "_source": ["title"]  
}
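
The edit distance behind a fuzzy match can be computed with the classic dynamic-programming algorithm. The sketch below implements plain Levenshtein distance (insert, delete, substitute); it is an approximation, since by default Elasticsearch also counts a swap of two adjacent characters as a single edit.

```python
def edit_distance(a, b):
    """Levenshtein distance between strings a and b."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute, or keep when equal
            ))
        previous = current
    return previous[-1]


print(edit_distance("linu", "linux"))
```

With a maximum distance of 1, the value linu is close enough to the indexed term linux to match.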

POST music/_search?pretty
{
    "suggest": {
        "song-suggest" : {
            "prefix" : "nor",
            "completion" : {
                "field" : "suggest",
                "fuzzy" : {
                    "fuzziness" : 2
                }
            }
        }
    }
}

regex query

POST music/_search?pretty
{
    "suggest": {
        "song-suggest" : {
            "regex" : "n[ever|i]r",
            "completion" : {
                "field" : "suggest"
            }
        }
    }
}

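Elasticsearch compound queries

bool query

The filtered query of older versions has been replaced by bool

bool query format

# filter: filters on fields and does not contribute to the score

# must: every condition in the array must be satisfied

# should: it is enough for any one condition in the array to be satisfied

# must_not: the opposite of must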

"bool": {
    "filter": [],
    "must": [],
    "should": [],
    "must_not": []
}
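
The matching logic of the four clauses can be modelled with plain predicates. This sketch ignores scoring entirely, and the names bool_matches and filters are made up here; it assumes the usual default that at least one should clause has to match when there is no must or filter clause.

```python
def bool_matches(doc, must=(), filters=(), should=(), must_not=()):
    """Decide whether doc matches; each clause is a predicate taking the document."""
    if not all(clause(doc) for clause in must):
        return False
    if not all(clause(doc) for clause in filters):
        return False
    if any(clause(doc) for clause in must_not):
        return False
    # Without must or filter clauses, at least one should clause has to match.
    if should and not must and not filters:
        return any(clause(doc) for clause in should)
    return True


def is_django(doc):
    return doc["title"] == "django"


def pays_20(doc):
    return doc["salary_min"] == 20


def pays_30(doc):
    return doc["salary_min"] == 30


doc = {"title": "django", "salary_min": 30}
print(bool_matches(doc, must=[is_django, pays_30]))
print(bool_matches(doc, should=[pays_20, is_django]))
print(bool_matches(doc, must=[is_django], must_not=[pays_30]))
```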

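Simple filtering with a bool query

# term could also be replaced with match; salary_min is an integer field, so the result is the same with or without analysis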

GET /myindex/_search
{
    "query": {
        "bool": {
            "must":{
                "match_all": {}
            },
            "filter": {
                "term":{
                    "salary_min": 20
                }
            }
        }
    }
}

Filtering on several values

GET /myindex/_search
{
    "query": {
        "bool": {
            "must":{
                "match_all": {}
            },
            "filter": {
                "terms":{
                    "salary_min": [20, 30]
                }
            }
        }
    }
}

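term queries on text fields

# We stored Python, but by default a text field is analyzed and the term is indexed as lowercase python, so a term query for Python returns no results

# Query with the lowercase form of the value, or use match instead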

GET /myindex/_search
{
    "query": {
        "bool": {
            "must":{
                "match_all": {}
            },
            "filter": {
                "term":{
                    "title": "Python"
                }
            }
        }
    }
}
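
Why the term query above finds nothing can be shown with a toy analyzer. The analyze function here just lowercases and splits on whitespace, which is an assumption standing in for the real analyzer; term_query and match_query are likewise made-up names.

```python
def analyze(text):
    """Toy analyzer: lowercase the text and split it on whitespace."""
    return text.lower().split()


# Terms produced at index time for a stored title.
indexed_terms = set(analyze("Python Web Development"))


def term_query(value):
    """Look the value up exactly as given, without analysis."""
    return value in indexed_terms


def match_query(value):
    """Analyze the value first, then look up the resulting terms."""
    return any(term in indexed_terms for term in analyze(value))


print(term_query("Python"))
print(term_query("python"))
print(match_query("Python"))
```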

Viewing the analyzer's output

# Tokens: python, 網絡, 絡

GET _analyze
{
    "analyzer": "ik_max_word",
    "text": "python網絡"
}

# Tokens: python, 網絡

GET _analyze
{
    "analyzer": "ik_smart",
    "text": "python網絡"
}

Combining several conditions

GET /myindex/_search
{
    "query": {
        "bool": {
            "should": [
                {
                    "term": {
                        "salary_min": 20
                    }
                },
                {
                    "term": {
                        "title": "python"
                    }
                }
            ],
            "must_not": [
                {
                    "term": {
                        "salary_min": 30
                    }
                }
            ]
        }
    }
}

Nesting several conditions

GET /myindex/_search
{
    "query": {
        "bool": {
            "should": [
                {
                    "term": {
                        "salary_min": 20
                    }
                },
                {
                    "bool": {
                        "must": [
                            {
                                "term": {
                                    "title": "django"
                                }
                            },
                            {
                                "term": {
                                    "salary_min": 30
                                }
                            }
                        ]
                    }
                }
            ]
        }
    }
}

Filtering null and non-null values

Creating test data

POST myindex/test2/_bulk
{"index":{"_id":1}}
{"tags":["search"]}
{"index":{"_id":2}}
{"tags":["search", "python"]}
{"index":{"_id":3}}
{"other_field":["data"]}
{"index":{"_id":4}}
{"tags":null}
{"index":{"_id":5}}
{"tags":["search", null]}

Getting documents with a non-null value

# Return documents that have a tags field with a non-null value

GET /myindex/_search
{
    "query": {
        "bool": {
            "filter":{
                "exists": {
                    "field": "tags"
                }
            }
        }
    }
}

Getting documents with a null value

# Return documents that have no tags field or whose tags value is null

GET /myindex/_search
{
    "query": {
        "bool": {
            "must_not":{
                "exists": {
                    "field": "tags"
                }
            }
        }
    }
}
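
Applied to the five test documents, the rule behind exists can be sketched as follows: a field counts as present only if it holds at least one non-null value. The function field_exists is a made-up name, and the documents are rewritten as Python dictionaries with None in place of null.

```python
def field_exists(doc, field):
    """True if the field is present and holds at least one non-null value."""
    if field not in doc:
        return False
    value = doc[field]
    values = value if isinstance(value, list) else [value]
    return any(item is not None for item in values)


docs = {
    1: {"tags": ["search"]},
    2: {"tags": ["search", "python"]},
    3: {"other_field": ["data"]},
    4: {"tags": None},
    5: {"tags": ["search", None]},
}
with_tags = [doc_id for doc_id, doc in docs.items() if field_exists(doc, "tags")]
without_tags = [doc_id for doc_id, doc in docs.items() if not field_exists(doc, "tags")]
print(with_tags)
print(without_tags)
```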
