Summary of Chapters 10-12

Date: 2019-11-08


10-1 Introduction to Elasticsearch

Large companies currently using ES: https://www.elastic.co/use-cases

When it comes to search, MongoDB and Redis are a joke next to Elasticsearch, haha!

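10-2 Installing Elasticsearch

10-3 Installing the elasticsearch-head plugin and Kibana

Download and install Java first.

Run java -version to check whether it is installed.

You can use the customized build maintained by a well-known Chinese developer: https://github.com/medcl/elasticsearch-rtf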

Errors encountered

Could not find any executable java binary. Please install java in your PATH or set JAVA_HOME

Solution:

Add the JDK to the Windows environment variables:

JAVA_HOME

C:\Program Files\Java\jdk1.8.0_131

\elasticsearch-rtf-master\bin\\..\config\jvm.options was unexpected at this time.

Solution:

This happens because the path contains parentheses; remove the parentheses from the path and the error goes away.

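Installing the head plugin:

Download https://github.com/mobz/elasticsearch-head

Download Node.js from https://nodejs.org/en/ and install it.

Installing Node.js also installs npm; run npm -v to verify.

There are plenty of installation tutorials online that you can follow.

cnpm is recommended; it uses the Taobao mirror.

npm install -g cnpm --registry=https://registry.npm.taobao.org

A quick Baidu search turns up plenty of material, so I won't go into detail here.

Complete Elasticsearch installation tutorials are easy to find on Baidu, and any of them will work.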

Startup commands

Start Elasticsearch: run elasticsearch.bat

Start head: run cnpm run start

Start Kibana: run kibana.bat

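10-4 Basic concepts of Elasticsearch

Within the same index, documents are required to use the same field names.

10-5 Inverted index

The values inside <> are the positions where the term occurs, and the last value is the number of occurrences.

All of the problems above are already handled for us by Elasticsearch.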
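As a rough illustration of the posting format described above (positions plus an occurrence count), here is a minimal Python sketch of an inverted index. The naive whitespace tokenizer, the sample documents, and the function name build_inverted_index are assumptions made for this example; Elasticsearch's real index structure is far more elaborate.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each term to {doc_id: (positions, count)}.

    docs: dict of doc_id -> text. Tokenization is a naive lowercase
    whitespace split, which is an assumption made for this sketch.
    """
    positions = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for pos, term in enumerate(text.lower().split()):
            positions[term][doc_id].append(pos)
    # Attach the occurrence count as the last value of each posting.
    return {
        term: {doc_id: (pos_list, len(pos_list)) for doc_id, pos_list in postings.items()}
        for term, postings in positions.items()
    }


docs = {
    1: "python web developer",
    2: "senior python engineer python",
}
index = build_inverted_index(docs)
print(index["python"])  # {1: ([0], 1), 2: ([1, 3], 2)}
```

Looking up a term is then a single dictionary access, which is what makes this structure fast for search.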

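10-6 Basic index and document CRUD operations in Elasticsearch

10-7 Batch operations in Elasticsearch with mget and bulk

10-8 Managing mappings in Elasticsearch

10-9 Simple queries in Elasticsearch - 1

10-10 Simple queries in Elasticsearch - 2

10-11 Compound bool queries in Elasticsearch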

GET _search

{

  "query": {

    "match_all": {}

  }

}

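# CRUD operations on ES documents and indices

# Initializing an index

# Specify the number of shards and replicas

# Note: once the number of shards is set, it cannot be changed

# Create 5 shards and 1 replica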

PUT lagou

{

  "settings": {

    "index":{

      "number_of_shards":5,

      "number_of_replicas":1

    }

  }

}
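One way to see why the shard count is fixed: a document is routed to a shard by hashing its routing value (the _id by default) modulo the number of primary shards. The sketch below uses zlib.crc32 as a stand-in hash (Elasticsearch uses a different hash function internally), and the function name shard_for is hypothetical. It only shows that changing the shard count would change where existing documents are looked up.

```python
import zlib


def shard_for(routing, number_of_shards):
    """Pick a shard as hash(routing) % number_of_shards.

    crc32 is only a stand-in for illustration; it is not the hash
    Elasticsearch actually uses.
    """
    return zlib.crc32(str(routing).encode("utf-8")) % number_of_shards


doc_ids = [str(i) for i in range(1, 21)]
with_5 = {doc_id: shard_for(doc_id, 5) for doc_id in doc_ids}
with_6 = {doc_id: shard_for(doc_id, 6) for doc_id in doc_ids}

# Documents whose shard would differ if the shard count changed from 5 to 6.
moved = [doc_id for doc_id in doc_ids if with_5[doc_id] != with_6[doc_id]]
print(f"{len(moved)} of {len(doc_ids)} documents would map to a different shard")
```

The number of replicas does not take part in this calculation, which is why it can still be changed later.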

GET lagou/_settings

# Get the settings of all indices

GET _all/_settings

GET _settings

# Get the settings of specific indices

GET .kibana,lagou/_settings

# Modify settings
PUT lagou/_settings
{
  "number_of_replicas": 2
}

# Get index information

GET _all

GET lagou

# PUT requires an _id when inserting a document; you can choose it yourself

PUT lagou/job/1

{

  "name":"ppp",

  "age":12

}

# POST can insert a document without an _id; a random _id is generated automatically

POST lagou/job/

{

  "name":"ppp",

  "age":13

}

# Get a single document

GET lagou/job/1

# Get only certain fields of a single document

GET lagou/job/1?_source=name,age

# PUT with an existing _id overwrites the whole document

PUT lagou/job/1

{

  "name":"ppp",

  "age":12,

  "gender":"male"

}

# POST _update modifies only the specified fields; the other fields do not need to be sent

# POST is what you normally use for updates

POST lagou/job/1/_update

{

  "doc": {

    "name2": "hello"

  }

}
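The difference between a full overwrite with PUT and a partial update with POST _update can be mimicked with plain dictionaries. This is only a sketch of the semantics: the in-memory store and the function names put_document and update_document are hypothetical, and the merge here is shallow.

```python
def put_document(store, doc_id, body):
    """PUT semantics: the stored document is replaced wholesale."""
    store[doc_id] = dict(body)


def update_document(store, doc_id, partial):
    """POST _update semantics: fields under "doc" are merged into the stored document.

    Shallow merge only; a simplification for this sketch.
    """
    store[doc_id].update(partial["doc"])


store = {}
put_document(store, 1, {"name": "ppp", "age": 12})
put_document(store, 1, {"name": "ppp", "gender": "male"})  # age is gone: full overwrite
update_document(store, 1, {"doc": {"name2": "hello"}})     # existing fields are kept
print(store[1])  # {'name': 'ppp', 'gender': 'male', 'name2': 'hello'}
```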

# Deleting

# Delete a single document

DELETE lagou/job/1

# Delete an index

DELETE lagou

# _mget: fetch multiple documents in one request

GET lagou/job/_mget

{

  "ids":[1,"AWOcnK0u_fyJAeHWwkP7",12]

}

# _bulk is not used much
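For reference, a _bulk request body is newline-delimited JSON: each index action is one line, followed by one line with the document source, and the body ends with a newline. Below is a minimal sketch of building such a body in Python. The helper name build_bulk_body and the second sample document are made up for illustration; the _type field matches the typed indices used in these notes and is not accepted by newer Elasticsearch versions.

```python
import json


def build_bulk_body(index, doc_type, docs):
    """Build an NDJSON body for the _bulk API from {doc_id: source} pairs."""
    lines = []
    for doc_id, source in docs.items():
        action = {"index": {"_index": index, "_type": doc_type, "_id": doc_id}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(source))
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


sample_docs = {1: {"name": "ppp", "age": 12}, 2: {"name": "qqq", "age": 13}}
body = build_bulk_body("lagou", "job", sample_docs)
print(body)
```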

# Create an index with an explicit mapping

PUT lagou22

{

  "mappings": {

    "job":{

      "properties": {

        "name":{

          "type": "keyword"

        },

        "age":{

          "type": "integer"

        },

        "gender":{

          "type": "text"

        }

      }

    }

  }

}

PUT lagou22/job/1

{

  "name":"pujinxiao",

  "age":34,

  "gender":"male"



}

# Queries

Difference between term and match

term does not analyze (tokenize) the query text, whereas match does.

{

  "query": {

    "term": {

      "title": "火石"

    }

  }

}

{

  "query": {

    "match": {

      "title": "火石"

    }

  }

}
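The term/match difference can be imitated in a few lines of Python. This is a toy model under stated assumptions: the analyzer here just lowercases and splits on whitespace (real analyzers such as ik behave very differently), and the function names are hypothetical.

```python
def analyze(text):
    """Toy analyzer: lowercase and split on whitespace (an assumption;
    real analyzers work very differently)."""
    return text.lower().split()


def term_query(indexed_tokens, value):
    """term: the query value is compared to indexed tokens as-is."""
    return value in indexed_tokens


def match_query(indexed_tokens, text):
    """match: the query text is analyzed first; any resulting token may match."""
    return any(token in indexed_tokens for token in analyze(text))


indexed = analyze("Python Web Developer")   # tokens stored in the index

print(term_query(indexed, "Python Web"))    # False: no single token equals "Python Web"
print(match_query(indexed, "Python Web"))   # True: analyzed into "python" and "web"
```

This is also why a term query against an analyzed text field often returns nothing: the indexed tokens no longer look like the original string.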

# A document is returned if it matches any one of the values in the list

GET map_news/index/_search

{

  "query": {

    "terms": {

      "title": ["火石","python"]

    }

  }

}

# Limit the number of results returned

GET map_news/index/_search

{

  "query": {

    "terms": {

      "title": ["火石","python"]

    }

  },

  "from":0,

  "size":2

}
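from and size are what a paginated search page would vary. A minimal sketch of turning a 1-based page number into these two values (the helper name page_params is hypothetical):

```python
def page_params(page, page_size):
    """Translate a 1-based page number into from/size for a search body."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be >= 1")
    return {"from": (page - 1) * page_size, "size": page_size}


body = {"query": {"match_all": {}}}
body.update(page_params(page=3, page_size=10))
print(body)  # {'query': {'match_all': {}}, 'from': 20, 'size': 10}
```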

# match_phrase query

# Phrase query

# The query text is tokenized; slop is the maximum distance allowed between the tokens, and you can set it yourself

GET map_news/index/_search

{

  "query": {

    "match_phrase": {

      "title": {

        "query": "火石創造",

        "slop": 6

      }

    }

  }

}

# multi_match

# Search across several fields at once

GET map_news/index/_search

{

  "query": {

    "multi_match": {

      "query": "火石",

      "fields": ["title","summary"]

      }

    }

}

# Weight title three times as heavily as summary when scoring

"fields": ["title^3","summary"]

# Choose which fields are returned

# _source specifies the fields to return; use excludes to leave fields out

GET map_news/index/_search

{

  "_source": {

    "includes": ["title"]

  },

  "query": {

    "match": {

      "title": "火石"

      }

    }

}

# Sort the results with sort

GET map_news/index/_search

{

  "query": {

    "match_all": {}

  },

  "sort": [

    {

      "update_time": {

        "order": "desc"

      }

    }

  ]

}

# Querying a range

# range query

GET map_news/index/_search

{

  "query": {

    "range": {

      "publish_timestamp": {

        "gte": "2016-08-24 18:58:00.0",

        "lte": "2016-08-24 19:58:00.0"

      }

    }

  }

}

# wildcard query

# Supports wildcard patterns

GET map_news/index/_search

{

  "_source": {

    "includes": ["title"]

  },

  "query": {

    "wildcard": {

      "title": {

        "value": "pyth*n"

      }

    }

  }

}

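# Compound queries

# bool query

# A bool query combines must, should, must_not, and filter clauses, in the following format:

bool:{

"filter":[],   filters on fields; does not contribute to scoring

"must":[],     every clause must match

"should":[],   at least one clause must match

"must_not":{}, none of the clauses may match

}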

# The simplest filter query

# select * from testjob where salary=20

# Jobs with a salary of 20

# Replace term with terms to match several values given in [ ]

GET lagou/testjob/_search

{

  "query": {

    "bool": {

      "filter": {

        "term": {

          "salary": 20

        }

      }

    }

  }

}

# Inspect the analyzer output; ik provides two analyzers, ik_max_word and ik_smart

# ik_max_word (finest-grained) splits this into: python 網絡開發工程師工程師

GET _analyze

{

  "analyzer": "ik_max_word",

  "text": "python網絡開發工程師"

}

# ik_smart (coarsest-grained) splits this into: python 網絡開發工程師

GET _analyze

{

  "analyzer": "ik_smart",

  "text": "python網絡開發工程師"

}

# bool filtering can express compound conditions

# select * from testjob where (salary=20 or title=python) and (salary!=30)

# Jobs with a salary of 20k or a title of python, excluding those with a salary of 30k

{

  "query": {

    "bool": {

      "should": [

        {"term": {

          "salary": {

            "value": "20"

          }

        }},

        {"term": {

          "title": {

            "value": "python"

          }

        }}

      ],

      "must_not": [

        {"term": {

          "salary": {

            "value": "30"

          }

        }}

      ]

    }

  }

}
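To make the should / must_not logic of the query above concrete, here is a toy Python evaluator that applies a bool query to plain dictionaries. It is a sketch under assumptions: only term clauses are handled, filter is left out, values are compared as strings, and the function names and sample jobs are made up for illustration.

```python
def term_matches(doc, clause):
    """Evaluate a {"term": {field: {"value": v}}} clause against a plain dict."""
    (field, spec), = clause["term"].items()
    return str(doc.get(field)) == str(spec["value"])


def bool_matches(doc, bool_query):
    """Toy evaluation of must / should / must_not with term clauses only."""
    must = bool_query.get("must", [])
    should = bool_query.get("should", [])
    must_not = bool_query.get("must_not", [])
    if not all(term_matches(doc, c) for c in must):
        return False
    if any(term_matches(doc, c) for c in must_not):
        return False
    # With no must clause, at least one should clause has to match.
    if should and not must:
        return any(term_matches(doc, c) for c in should)
    return True


query = {
    "should": [
        {"term": {"salary": {"value": "20"}}},
        {"term": {"title": {"value": "python"}}},
    ],
    "must_not": [{"term": {"salary": {"value": "30"}}}],
}

jobs = [
    {"title": "python", "salary": 20},
    {"title": "python", "salary": 30},
    {"title": "java", "salary": 10},
]
print([bool_matches(job, query) for job in jobs])  # [True, False, False]
```

The second job matches a should clause but is still rejected, because must_not always wins.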

# Nested bool query

{

  "query": {

    "bool": {

      "should": [

        {

          "term": {

            "salary": {

              "value": "20"

            }

          }

        },

        {"bool": {

          "must": [

            {"term": {

              "title": {

                "value": "python"

              }

            }},

            {

              "term": {

                "salary": {

                  "value": "30"

                }

              }

            }

          ]

        }}

      ],

      "must_not": [

        {

          "term": {

            "salary": {

              "value": "30"

            }

          }

        }

      ]

    }

  }

}

# Find documents where the field is null or missing

GET map_news/index/_search

{

  "query": {

    "bool": {

      "must_not": {

        "exists": {

          "field": "content"

        }

      }

    }

  }

}

10-12 Writing data from Scrapy into Elasticsearch - 1

10-13 Writing data from Scrapy into Elasticsearch - 2

elasticsearch-dsl

# Strip HTML tags

from w3lib.html import remove_tags

remove_tags(u'<span>1000</span>')

Chapter 11: Building a search site with Django

11-3 Implementing Elasticsearch search suggestions in Django - 1

ES fuzzy search provides a degree of typo tolerance.

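# fuzziness: maximum edit distance; here at most 2 edits are allowed

# prefix_length: number of leading characters excluded from the edit-distance calculation; with 2, "li" must match exactly

GET /_search

{

  "query": {

    "fuzzy": {

      "title": {

        "value": "linx",

        "fuzziness": 2,

        "prefix_length": 2

      }

    }

  }

}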

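Edit distance is a measure of how similar two strings are.

It is the minimum number of operations (insertion, deletion, substitution, or swapping two adjacent characters) needed to turn one string into the other.

For example: ed("linux", "linx") == 1, since deleting the u is enough.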
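The definition above can be sketched in Python with dynamic programming. This version is the optimal string alignment variant, chosen here because it covers the four operations listed (including swapping adjacent characters); it is an illustration and not necessarily the exact algorithm Elasticsearch runs internally. The function name edit_distance is hypothetical.

```python
def edit_distance(a, b):
    """Optimal string alignment distance: insert, delete, substitute,
    and swap of two adjacent characters each cost 1."""
    rows, cols = len(a) + 1, len(b) + 1
    dist = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        dist[i][0] = i  # delete all i characters of a
    for j in range(cols):
        dist[0][j] = j  # insert all j characters of b
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # deletion
                dist[i][j - 1] + 1,         # insertion
                dist[i - 1][j - 1] + cost,  # substitution (or match)
            )
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                dist[i][j] = min(dist[i][j], dist[i - 2][j - 2] + 1)  # adjacent swap
    return dist[-1][-1]


print(edit_distance("linux", "linx"))   # 1: delete "u"
print(edit_distance("linux", "lniux"))  # 1: swap "i" and "n"
```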