Elasticsearch: Lecture 3

A closer look at Elasticsearch features.

Search Template

In the examples below, tmdb is the template name and tmdb1 is the index being searched.

The query template is defined as a stored script:

## Create the template
POST _scripts/tmdb
{
  "script": {
    "lang": "mustache",
    "source": {
      "_source": ["title", "overview"],
      "size": 20,
      "query": {
        "multi_match": {
          "query": "{{q}}",
          "fields": ["title", "overview"]
        }
      }
    }
  }
}

## Run the templated search
POST tmdb1/_search/template
{
  "id": "tmdb",
  "params": {
    "q": "basketball with cartoon aliens"
  }
}
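
To inspect the query a stored template expands to without actually executing it, the render API can be used; a quick sanity check that shows the final query with {{q}} substituted:

POST _render/template
{
  "id": "tmdb",
  "params": {
    "q": "basketball with cartoon aliens"
  }
}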
Using aliases

An alias is effectively an alternate name for one or more indices. Several indices can be grouped into a single view, and a filter can be attached so that only matching documents are exposed. Searching the alias then returns data from every index behind it.

#### Add aliases
POST _aliases
{
  "actions": [
    {
      "add": {
        "index": "news",
        "alias": "new1"
      }
    },
    {
      "add": {
        "index": "blogs",
        "alias": "new1"
      }
    }
  ]
}

## Searching the alias returns documents from both news and blogs
POST new1/_search
{
  "query": {
    "match_all": {}
  }
}
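
The filter mentioned above is attached when the alias is added; a minimal sketch, assuming the news documents carry a hypothetical category keyword field:

POST _aliases
{
  "actions": [
    {
      "add": {
        "index": "news",
        "alias": "sports_news",
        "filter": {
          "term": {
            "category": "sports"
          }
        }
      }
    }
  ]
}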

### Remove aliases
POST _aliases
{
  "actions": [
    {
      "remove": {
        "index": "blogs",
        "alias": "new1"
      }
    },
    {
      "remove": {
        "index": "news",
        "alias": "new1"
      }
    }
  ]
}
function score query

function_score re-scores the documents matched by the query and re-sorts them by the new score; see the official docs for the full option list.

new score = old score * number of votes

With a modifier: new score = old score * log(1 + votes)

With a factor: new score = old score * log(1 + factor * votes)

boost_mode controls how the original score is combined with the function value; the default is multiply, and sum and others are also available (see the official docs).

max_boost caps the maximum value the function can contribute to the score.

DELETE blogs
PUT blogs/_doc/1
{
  "title": "About popularity",
  "content": "In this post we wil talk about...",
  "votes": 0 
}

PUT blogs/_doc/2
{
  "title": "About popularity",
  "content": "In this post we wil talk about...",
  "votes": 100 
}

PUT blogs/_doc/3
{
  "title": "About popularity",
  "content": "In this post we wil talk about...",
  "votes": 1000000 
}

POST blogs/_search
{
  "query": {
    "function_score": {
      "query": {
        "multi_match": {
          "query": "popularity",
          "fields": ["title", "content"]
        }
      },
      "field_value_factor": {
        "field": "votes",
        "modifier": "log1p",
        "factor": 0.1
      },
      "boost_mode": "sum",
      "max_boost": 3
    }
  }
}
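
To make the formulas concrete, a rough walk-through of the example above, assuming log1p takes the common (base-10) logarithm as in Lucene's field_value_factor: for document 2, the function value is log10(1 + 0.1 * 100) = log10(11), about 1.04, and with boost_mode sum the final score is roughly the original relevance score plus 1.04. For document 3, log10(1 + 0.1 * 1000000) is about 5, but max_boost 3 caps the function's contribution at 3.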
Term suggester

suggest_mode options: missing only suggests when the term is absent from the index (for the input lucen solid, only lucen gets suggestions, since solid already exists); popular suggests terms that occur more frequently than the input (rock suggests rocks, because two documents contain rocks); always suggests regardless of whether the term exists in the index.

POST articles/_bulk 
{"index": {}}
{"body": "lucene is very cool"}
{"index": {}}
{"body": "Elasticsearch builds on top of lucene"}
{"index": {}}
{"body": "Elasticsearch rocks"}
{"index": {}}
{"body": "elastic is the company behind ELK stack"}
{"index":{}}
{"body": "Elk stack rocks"}
{"index":{}}
{"body": "elasticsearch is rock solid"}


POST articles/_search
{
  "suggest": {
    "test1": {
      "text": "lucen solid",
      "term": {
        "field": "body",
        "suggest_mode": "missing"
      }
    }
  }
}
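
Following the note on popular mode above, a sketch of the same search with suggest_mode popular; rock should yield rocks, since rocks has the higher document frequency:

POST articles/_search
{
  "suggest": {
    "test2": {
      "text": "lucen rock",
      "term": {
        "field": "body",
        "suggest_mode": "popular"
      }
    }
  }
}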
completion suggester

Auto-complete suggestions are served from an in-memory FST, which is very fast; the trade-off is that matching is prefix-only, starting from the first characters.

DELETE articles

GET articles/_mapping

PUT articles
{
  "mappings": {
    "properties": {
      "title_completion": {
        "type": "completion"
      }
    }
  }
}
POST articles/_bulk 
{"index": {}}
{"title_completion": "lucene is very cool"}
{"index": {}}
{"title_completion": "Elasticsearch builds on top of lucene"}
{"index": {}}
{"title_completion": "Elasticsearch rocks"}
{"index": {}}
{"title_completion": "elastic is the company behind ELK stack"}
{"index":{}}
{"title_completion": "Elk stack rocks"}
{"index":{}}
{"title_completion": "elasticsearch is rock solid"}


POST articles/_search?pretty
{
  "suggest": {
    "articles_suggester": {
      "prefix": "e",
      "completion": {
        "field": "title_completion"
      }
    }
  }
}

A completion field can also carry contexts, so that different documents are suggested per category; context type category accepts an arbitrary string.

DELETE comments

PUT comments
{
  "mappings": {
    "properties":{
      "comment_autocomplete": {
        "type": "completion",
        "contexts": [
            {
              "type": "category",
              "name": "comment_category"
            }
          ]
      }
    }
  }
}

POST comments/_doc/1
{
  "comment": "I love the star war movies",
  "comment_autocomplete": {
    "input": ["star wars"],
    "contexts": {
      "comment_category": "movies"
    }
  }
}

POST comments/_doc/2
{
  "comment": "Where can I find a Starbucks",
  "comment_autocomplete": {
    "input": ["starbucks"],
    "completions": {
      "comment_category": "coffee"
    }
  }
}


POST comments/_search
{
  "suggest": {
    "YOUR_SUGGESTION": {
      "text": "star",
      "completion":{
        "field": "comment_autocomplete",
        "contexts":
            {
              "comment_category": "movies"
            }
          
      }
    }
  }
}
Configuring cross-cluster search

Start several single-node clusters from the command line, then register them as remotes so that data in all of them can be searched together.

elasticsearch-7.5.0/bin/elasticsearch -E node.name=cluster1_node -E cluster.name=cluster1 -E path.data=cluster1_data -E discovery.type=single-node -E http.port=9201 -E transport.port=9301
elasticsearch-7.5.0/bin/elasticsearch -E node.name=cluster2_node -E cluster.name=cluster2 -E path.data=cluster2_data -E discovery.type=single-node -E http.port=9202 -E transport.port=9302
elasticsearch-7.5.0/bin/elasticsearch -E node.name=cluster3_node -E cluster.name=cluster3 -E path.data=cluster3_data -E discovery.type=single-node -E http.port=9203 -E transport.port=9303


curl -XPUT "http://localhost:9201/_cluster/settings" -H "Content-Type:application/json" -d '{"persistent":{"cluster":{"remote":{"cluster1":{"seeds":["127.0.0.1:9301"], "transport.ping_schedule":"30s"},"cluster2":{"seeds":["127.0.0.1:9302"],"transport.ping_schedule":"30s","transport.compress": true, "skip_unavailable":true},"cluster3":{"seeds":["127.0.0.1:9303"]}}}}}'

curl -XPUT "http://localhost:9202/_cluster/settings" -H "Content-Type:application/json" -d '{"persistent":{"cluster":{"remote":{"cluster1":{"seeds":["127.0.0.1:9301"], "transport.ping_schedule":"30s"},"cluster2":{"seeds":["127.0.0.1:9302"],"transport.ping_schedule":"30s","transport.compress": true, "skip_unavailable":true},"cluster3":{"seeds":["127.0.0.1:9303"]}}}}}'

curl -XPUT "http://localhost:9203/_cluster/settings" -H "Content-Type:application/json" -d '{"persistent":{"cluster":{"remote":{"cluster1":{"seeds":["127.0.0.1:9301"], "transport.ping_schedule":"30s"},"cluster2":{"seeds":["127.0.0.1:9302"],"transport.ping_schedule":"30s","transport.compress": true, "skip_unavailable":true},"cluster3":{"seeds":["127.0.0.1:9303"]}}}}}'


curl -XPOST "http://localhost:9201/users/_doc" -H "Content-Type:application/json" -d '{"name":"user1", "age": 10}'

curl -XPOST "http://localhost:9202/users/_doc" -H "Content-Type:application/json" -d '{"name":"user2", "age": 20}'

curl -XPOST "http://localhost:9203/users/_doc" -H "Content-Type:application/json" -d '{"name":"user3", "age": 30}'



Access URL:
http://localhost:9201/cluster1:users,cluster2:users,cluster3:users/_search
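
The same cross-cluster search issued via curl, assuming the three nodes above are running:

curl -XPOST "http://localhost:9201/cluster1:users,cluster2:users,cluster3:users/_search?pretty" -H "Content-Type:application/json" -d '{"query":{"match_all":{}}}'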

Setting node.master=false prevents the node from ever becoming master.

If that is in effect and node.master is later flipped back to true, the restarted node can fail with "master not discovered or elected yet, an election requires ...". Deleting the node's data directory clears the error, but all data stored on that node is lost as well.

A safer sequence is to start the original master node first and only then start the node whose data directory was wiped, so its data re-syncs from the cluster.

With 3 primary shards and 1 replica distributed across machines: if a machine dies, replicas of its data are re-created on the surviving machines; if the dead machine held a primary shard, one of its replicas is promoted to primary.

The number of primary shards is fixed at index creation; changing it requires deleting the index and re-ingesting the data.

Document-to-shard routing

shard = hash(_routing) % number_of_primary_shards

The hash spreads documents evenly across shards; by default _routing is the document id.

A custom _routing value can be supplied. This formula is also why the primary shard count cannot change after creation: the modulo would route existing documents to the wrong shards.

PUT posts/_doc/100?routing=bigdata
{
	"title": "Master Elasticsearch",
	"body": "Let's Rock"
}
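
Once a custom routing value is used at index time, the same value must be passed to read the document back; otherwise the lookup may go to the wrong shard:

GET posts/_doc/100?routing=bigdata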
Shards and the segment lifecycle

A single inverted index is a segment, and segments are immutable; a set of segments makes up a Lucene index, which corresponds to a shard in Elasticsearch.

Each write produces new segments; a search queries all segments and merges the results. Deletions are recorded in a .del file rather than rewriting segments.

Refresh

Writing the index buffer out as a new segment is called a refresh.

Refresh runs every 1 second by default; once a refresh completes, the new documents become searchable.

Heavy write traffic therefore produces many segments.

A full index buffer also triggers a refresh; the buffer defaults to 10% of the JVM heap.


Transaction log

Writing segments all the way to disk is expensive, so a new segment first goes to the filesystem cache, where it is already open for search.

To protect against data loss, every write also goes into the transaction log, which is persisted to disk; each shard has its own transaction log.

After a power failure, the node replays the transaction log on startup, restoring any writes that never made it into an on-disk segment.


Flush

Flush runs every 30 minutes by default. It first calls refresh to empty the index buffer;

then calls fsync to write the cached segments to disk, so everything recorded in the transaction log is safely persisted;

then clears the transaction log.

A full transaction log also triggers a flush; the default size limit is 512 MB.


Merge

Segments accumulate over time and are merged periodically, reducing the segment count and physically dropping deleted documents.

A merge can be forced with POST my_index/_forcemerge.

Sorting on text fields

Sorting on a text field requires setting fielddata to true on it; other field types sort via doc values, but text fields have none, so fielddata must be switched on.


PUT /kibana_sample_data_ecommerce/_mapping
{
	"properties":{
		"customer_full_name" : {
          "type" : "text",
          "fielddata": true, 
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        }
	}
}
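
With fielddata enabled, the text field can then appear in sort; a sketch (in practice the keyword sub-field is usually the better choice, since it sorts on doc values without the fielddata memory cost):

POST kibana_sample_data_ecommerce/_search
{
  "size": 3,
  "sort": [
    { "customer_full_name": { "order": "asc" } }
  ]
}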
Pagination
Deep pagination with from and size

Documents live on multiple shards across multiple machines. For from=990, size=10, every shard returns its top 1,000 documents; the coordinating node merges them all, re-sorts, and keeps the final 10. The deeper the page, the more memory the query consumes, so Elasticsearch caps from + size at 10,000 documents by default, configurable via index.max_result_window.

POST /kibana_sample_data_ecommerce/_search
{
  "from": 1,
  "size": 2, 
  "query": {
    "match_all": {}
  }
}
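
The 10,000-document cap can be raised per index, at the cost of more memory on deep pages; a sketch:

PUT /kibana_sample_data_ecommerce/_settings
{
  "index.max_result_window": 20000
}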
search_after (from must be 0 or omitted)

search_after takes the sort values of the last hit from the previous page and continues from there, which avoids the deep-pagination cost.

Unlike a scroll, documents indexed after the first request are still visible while paging.

// First request:
POST /kibana_sample_data_ecommerce/_search
{
  "size": 2, 
  "query": {
    "match_all": {}
  },
  "sort": [
    {
      "order_date": {
        "order": "desc"
      },
      "_id":{
        "order": "desc"
      }
    }
  ]
}

Response (excerpt):
...
	"sort" : [
          1581808954000,
          "gTTfym8BtdKew7ex1Zsk"
        ]
// Second request: feed the sort values back via search_after
POST /kibana_sample_data_ecommerce/_search
{
  "from": 1,
  "size": 2, 
  "query": {
    "match_all": {}
  },
  "search_after":[
          1581808954000,
          "gTTfym8BtdKew7ex1Zsk"
        ],
  "sort": [
    {
      "order_date": {
        "order": "desc"
      },
      "_id":{
        "order": "desc"
      }
    }
  ]
}
Scroll API

A scroll searches against a snapshot of the index taken at the time of the first request;

only the data present when the snapshot was created can be found; documents added afterwards are invisible to the scroll;

each follow-up request passes the scroll_id returned by the previous one;

the page size is fixed by the first request.

// Keep the scroll context alive for 5 minutes
POST /kibana_sample_data_ecommerce/_search?scroll=5m
{
	"size": 1,
	"query": {
		"match_all":{}
	}
}
// Pass the scroll_id from the previous response; the context is extended by 1 minute
POST _search/scroll
{
	"scroll": "1m",
	"scroll_id": "DXF1ZXJ5QW5kRmV0Y2gBAAAAAAAADZUWSThxQkp4VUZTZy1ZZzE0OGI1OW02Zw=="
}
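
Scroll contexts hold resources on the cluster, so it is good practice to release them once done:

DELETE _search/scroll
{
  "scroll_id": "DXF1ZXJ5QW5kRmV0Y2gBAAAAAAAADZUWSThxQkp4VUZTZy1ZZzE0OGI1OW02Zw=="
}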
Comparing the three approaches:

from/size: not suited to deep pagination; best for the first few pages.
search_after: supports deep pagination, but can only page forward from the start; best for deep paging.
scroll: high retrieval efficiency; best for fetching or downloading the entire data set.
Concurrency control

Elasticsearch uses optimistic locking for concurrency control.

When updating with if_seq_no and if_primary_term, the document's current values must match the ones supplied, otherwise the update is rejected with a version conflict.

Alternatively, version together with version_type can act as the lock.

DELETE products
GET products/_search

// The response includes _seq_no and _primary_term; those are the values used in the request below
PUT products/_doc/1
{
  "title": "iphone",
  "count": 100
}

PUT products/_doc/1?if_seq_no=1&if_primary_term=1
{
  "title":"iphone1",
  "count": 100
}
// With version_type=external, the supplied version must be greater than the stored one, or a conflict is returned; this is Elasticsearch's optimistic locking
PUT products/_doc/1?version=6&version_type=external
{
  "title":"iphone2",
  "count": 1
}
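
For partial updates, the _update endpoint can retry automatically on version conflicts instead of failing the request; a sketch:

POST products/_update/1?retry_on_conflict=3
{
  "doc": {
    "count": 99
  }
}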
Aggregations: min, max, avg, stats, terms, range, histogram

The SQL analogy: select count(brand) from table corresponds to a metric aggregation, and group by corresponds to a bucket aggregation.

Aggregations cannot operate on text fields directly.

A terms aggregation cannot bucket a text field unless fielddata is enabled on it (see the earlier comparison of doc values vs. fielddata); keyword fields support bucketing out of the box.

aggs includes min, max, avg, stats, terms, range, histogram, and more.

To group first and then pull the top documents inside each group, use top_hits.

To count the number of distinct buckets, use cardinality.

DELETE employees

GET employees/_mapping

PUT employees
{
  "mappings": {
    "properties": {
      "age":{
        "type": "integer"
      },
      "gender": {
        "type": "keyword"
      },
      "name": {
        "type": "keyword"
      },
      "salary": {
        "type": "integer"
      },
      "job": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword",
            "ignore_above":22
          }
        }
      }
    }
  }
}


POST employees/_bulk
{"index":{"_id": "1"}}
{"name":"Emma","age":"32","job":"Product Manager", "gender": "female","salary": "35000"}
{"index":{"_id": "2"}}
{"name":"Underwood","age":"41","job":"Dev Manager", "gender": "male","salary": "50000"}
{"index":{"_id": "3"}}
{"name":"Tran","age":"25","job":"Web Designer", "gender": "male","salary": "18000"}
{"index":{"_id": "4"}}
{"name":"Rivera","age":"26","job":"Web Designer", "gender": "female","salary": "22000"}
{"index":{"_id": "5"}}
{"name":"Rose","age":"25","job":"QA", "gender": "female","salary": "18000"}
{"index":{"_id": "6"}}
{"name":"Lucy","age":"31","job":"QA", "gender": "female","salary": "25000"}
{"index":{"_id": "7"}}
{"name":"Byrd","age":"27","job":"QA", "gender": "male","salary": "20000"}
{"index":{"_id": "8"}}
{"name":"Foster","age":"27","job":"Java Programmer", "gender": "male","salary": "20000"}
{"index":{"_id": "9"}}
{"name":"Gregory","age":"32","job":"Java Programmer", "gender": "male","salary": "22000"}
{"index":{"_id": "10"}}
{"name":"Bryant","age":"20","job":"Java Programmer", "gender": "male","salary": "9000"}
{"index":{"_id": "11"}}
{"name":"Jenny","age":"36","job":"Java Programmer", "gender": "female","salary": "38000"}
{"index":{"_id": "12"}}
{"name":"Mcdonald","age":"31","job":"Java Programmer", "gender": "male","salary": "32000"}
{"index":{"_id": "13"}}
{"name":"Jonthna","age":"30","job":"Java Programmer", "gender": "female","salary": "30000"}
{"index":{"_id": "14"}}
{"name":"Marsha","age":"32","job":"Javascript Programmer", "gender": "male","salary": "25000"}
{"index":{"_id": "15"}}
{"name":"King","age":"33","job":"Java Programmer", "gender": "male","salary": "28000"}
{"index":{"_id": "16"}}
{"name":"Mccarthy","age":"21","job":"Javascript Programmer", "gender": "male","salary": "16000"}
{"index":{"_id": "17"}}
{"name":"Goodwid","age":"25","job":"Javascript Programmer", "gender": "male","salary": "16000"}
{"index":{"_id": "18"}}
{"name":"Catherine","age":"29","job":"Javascript Programmer", "gender": "female","salary": "20000"}
{"index":{"_id": "19"}}
{"name":"Boone","age":"30","job":"DBA", "gender": "male","salary": "30000"}
{"index":{"_id": "20"}}
{"name":"Kathy","age":"29","job":"DBA", "gender": "female","salary": "20000"}


POST employees/_search
{
  "size": 0,
  "aggs": {
    "min_salary": {
      "min": {
        "field": "salary"
      }
    }
  }
}

POST employees/_search
{
  "size": 0,
  "aggs": {
    "max_salary": {
      "max": {
        "field": "salary"
      }
    }
  }
}

POST employees/_search
{
  "size": 0,
  "aggs": {
    "min_salay": {
      "min": {
        "field": "salary"
      }
    },
    "max_salay": {
      "max": {
        "field": "salary"
      }
    },
    "avg_salay": {
      "avg": {
        "field": "salary"
      }
    }
  }
}



POST employees/_search
{
  "size": 20,
  "aggs": {
    "stats_salay":{
      "stats": {
        "field": "salary"
      }
    }
  }
}


POST employees/_search
{
  "size": 0,
  "aggs": {
    "jobs": {
      "terms": {
        "field": "job.keyword"
      }
    }
  }
}

// cardinality: the total number of distinct job values
POST employees/_search
{
  "size": 0,
  "aggs": {
    "cardinate": {
      "cardinality": {
        "field": "job.keyword"
      }
    }
  }
}

POST employees/_search
{
  "size": 0,
  "aggs": {
    "gender": {
      "terms": {
        "field": "age",
        "size": 20
      }
    }
  }
}
### For each job, the 3 oldest employees (top_hits)
POST employees/_search
{
  "size": 0,
  "aggs": {
    "result": {
      "terms": {
        "field": "job.keyword"
      },
      "aggs": {
        "old_employees": {
          "top_hits": {
            "size": 3
            , "sort": [
              {"age": {"order": "desc"}}
              ]
          }
        }
      }
    }
  }
}

#### range buckets, with custom keys
POST employees/_search
{
  "size": 0,
  "aggs": {
    "range_result": {
      "range": {
        "field": "salary",
        "ranges": [
          {
            "from": 0,
            "to": 10000
          },
          {
            "key": "1w-2w",
            "from": 10000,
            "to": 20000
          },
          {
            "key": ">2w",
            "from": 20000
          }
        ]
      }
    }
  }
}

#### histogram buckets: salary in steps of 5000
POST employees/_search
{
  "size": 0,
  "aggs": {
    "result1": {
      "histogram": {
        "field": "salary",
        "interval": 5000,
        "extended_bounds": {
          "min": 0,
          "max": 100000
        }
      }
    }
  }
}
Pipeline aggregations

A pipeline aggregation performs a second round of aggregation over the output of other aggregations.

#### Find the minimum among the per-bucket average salaries under term_job
#### term_job is the outer aggregation; avg_salary is the inner aggregation nested in it
POST employees/_search
{
  "size": 0,
  "aggs": {
    "term_job": {
      "terms": {
        "field": "job.keyword"
      },
      "aggs": {
        "avg_salary":{
          "avg": {
            "field": "salary"
          }
        }
      }
    },
    "result":{
      "min_bucket": {
        "buckets_path": "term_job>avg_salary"
      }
    }
  }
}
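
Other sibling pipeline aggregations follow the same buckets_path pattern; for instance, stats_bucket summarizes all the per-job averages at once (a sketch):

POST employees/_search
{
  "size": 0,
  "aggs": {
    "term_job": {
      "terms": {
        "field": "job.keyword"
      },
      "aggs": {
        "avg_salary": {
          "avg": {
            "field": "salary"
          }
        }
      }
    },
    "stats_avg_salary": {
      "stats_bucket": {
        "buckets_path": "term_job>avg_salary"
      }
    }
  }
}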
Aggregation scope and ordering

1. When a query is present, aggregations operate on the query results (eg1).

2. A filter can be used inside aggs; the aggregation nested under it runs on the filtered results, while an aggregation at the filter's parent level still operates on all documents (eg2).

3. post_filter filters the returned hits after aggregation has run, so you can view the documents matching one bucket without changing the aggregation results (eg3).

4. global is effectively the combination of 1 and 2: a global aggregation ignores the query entirely, so the query does not affect its statistics (eg4).

5. Buckets can be ordered by _key and _count (eg5).

6. Buckets can also be ordered by the result of another aggregation (eg6).

eg1

POST employees/_search
{
  "size": 0, 
  "query": {
    "range": {
      "age": {
        "gte": 20
      }
    }
  },
  "aggs": {
    "result": {
      "terms": {
        "field": "job.keyword"
      }
    }
  }
}

eg2

#### Two aggregations here: one over the filtered subset (age >= 40), one over everything; filtering via a top-level query instead would restrict both
POST employees/_search
{
  "size": 0, 
  "aggs": {
    "old_persion": {
      "filter": {
        "range": {
          "age": {
            "gte": 40
          }
        }
      },
      "aggs": {
        "jobs": {
          "terms": {
            "field": "job.keyword"
          }
        }
      }
    },
    "all_jobs":{
      "terms": {
        "field": "job.keyword"
      }
    }
  }
}

eg3

POST employees/_search
{
  "aggs": {
    "jobs": {
      "terms": {
        "field": "job.keyword"
      }
    }
  },
  "post_filter": {
    "match":{
      "job.keyword": "Web Designer"
    }
  }
}

eg4

##### global replaces the all-documents half of the previous example: instead of a filter, the global aggregation simply ignores the query's conditions
POST employees/_search
{
  "size": 0,
  "query": {
    "range": {
      "age": {
        "gte": 40
      }
    }
  },
  "aggs": {
    "jobs": {
      "terms": {
        "field": "job.keyword"
      }
    },
    "all":{
      "global": {},
      "aggs": {
        "all_result": {
          "terms": {
            "field": "job.keyword"
          }
        }
      }
    }
  }
}

eg5

#### Ordering: _key sorts by bucket key, _count by document count; the criterion written later takes precedence, so here buckets are ordered by _count first, then by _key
POST employees/_search
{
  "size": 0, 
  "query": {
    "range": {
      "age": {
        "gte": 20
      }
    }
  },
  "aggs": {
    "NAME": {
      "terms": {
        "field": "job.keyword",
        "order": {
          "_key": "desc",
          "_count": "asc"
        }
      }
    }
  }
}

eg6

#### Order buckets by a sub-aggregation result
POST employees/_search
{
  "size": 0,
  "aggs": {
    "jobs": {
      "terms": {
        "field": "job.keyword",
        "order": {
          "test1": "asc"
        }
      },
      "aggs": {
        "test1": {
          "avg": {
            "field": "salary"
          }
        }
      }
    }
  }
}
Approximate statistics in distributed systems

TODO: verify later


Nested objects

Nested objects deal with documents whose fields contain arrays of objects.

In eg1, searching for Keanu Hopper does return a hit: without nested, Elasticsearch flattens the array into actors.first_name=["Keanu","Dennis"] and actors.last_name=["Reeves","Hopper"]; as covered earlier, a match on any contained value counts, so the mixed combination still matches.

In eg2, the same search does not match: with type nested, each object in the array is indexed as its own hidden document (Keanu Reeves and Dennis Hopper), and a nested query only hits when a single one of those documents satisfies both conditions.

eg1

DELETE my_movie

PUT my_movie
{
  "mappings" : {
    "properties" : {
        "actors" : {
          "properties" : {
            "first_name" : {
              "type" : "keyword"
            },
            "last_name" : {
              "type" : "text"
            }
          }
        },
        "title" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        }
      }
    }
  }

POST my_movie/_doc/1
{
  "title": "speed",
  "actors": [
      {
        "first_name": "Keanu",
        "last_name": "Reeves"
      },
      {
        "first_name": "Dennis",
        "last_name": "Hopper"
      }
    ]
}


POST my_movie/_search
{
  
  "query": {
    "bool": {
      "must": [
        {"match": {
        "actors.first_name": "Keanu"
        }},
        {"match": {
        "actors.last_name": "Hopper"
        }}
        ]
    }
  }
}

eg2

DELETE my_movie

PUT my_movie
{
  "mappings" : {
    "properties" : {
        "actors" : {
          "type": "nested", 
          "properties" : {
            "first_name" : {
              "type" : "keyword"
            },
            "last_name" : {
              "type" : "text"
            }
          }
        },
        "title" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        }
      }
    }
  }

POST my_movie/_doc/1
{
  "title": "speed",
  "actors": [
      {
        "first_name": "Keanu",
        "last_name": "Reeves"
      },
      {
        "first_name": "Dennis",
        "last_name": "Hopper"
      }
    ]
}

POST my_movie/_search
{
  
  "query": {
    "bool": {
      "must": [
        {
          "nested": {
            "path": "actors",
            "query": {
              "bool": {
                "must": [
                  {"match": {
                    "actors.first_name": "Keanu"
                  }},
                  {"match": {
                    "actors.last_name": "Hopper"
                  }}
                ]
              }
            }
          }
        }
      ]
    }
  }
}
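
To see which nested object actually matched, inner_hits can be added to the nested query; a sketch:

POST my_movie/_search
{
  "query": {
    "nested": {
      "path": "actors",
      "query": {
        "match": {
          "actors.first_name": "Keanu"
        }
      },
      "inner_hits": {}
    }
  }
}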
Parent-child documents

Define the parent-child relation with a join field:

DELETE my_blogs

PUT my_blogs
{
  "settings": {
    "number_of_shards": 2
  },
  "mappings": {
    "properties": {
      "blog_comments_relation":{
        "type": "join",
        "relations": {
          "blog": "comment"
        }
      },
      "content":{
        "type": "text"
      },
      "title":{
        "type": "keyword"
      }
    }
  }
}

PUT my_blogs/_doc/blog1
{
  "title": "Learning Elasticsearch",
  "content": "learning ELK @ geektime",
  "blog_comments_relation": {
    "name": "blog"
  }
}

PUT my_blogs/_doc/blog2
{
  "title": "Learning Hadoop",
  "content": "learning Hadoop",
  "blog_comments_relation":{
    "name": "blog"
  }
}

PUT my_blogs/_doc/comment1?routing=blog1
{
  "comment": "I am learning ELK",
  "username": "Jack",
  "blog_comments_relation": {
    "name":"comment",
    "parent": "blog1"
  }
}

PUT my_blogs/_doc/comment2?routing=blog2
{
  "comment": "I like Hadoop!!!!!",
  "username": "Jack",
  "blog_comments_relation": {
    "name":"comment",
    "parent": "blog2"
  }
}

PUT my_blogs/_doc/comment3?routing=blog2
{
  "comment": "Hello Hadoop",
  "username": "Bob",
  "blog_comments_relation": {
    "name":"comment",
    "parent": "blog2"
  }
}


POST my_blogs/_search
{
  
}

GET my_blogs/_doc/blog2

POST my_blogs/_search
{
  "query": {
    "parent_id": {
      "type": "comment",
      "id": "blog2"
    }
  }
}

#### has_parent: return child documents whose parent matches
POST my_blogs/_search
{
  "query": {
    "has_parent": {
      "parent_type": "blog",
      "query": {
        "match": {
          "content": "Learning hadoop"
        }
      }
    }
  }
}

#### has_child: return parent documents that have a matching child
POST my_blogs/_search
{
  "query": {
    "has_child": {
      "type": "comment",
      "query": {
        "match": {
          "username": "Bob"
        }
      }
    }
  }
}
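
Child documents are routed to their parent's shard, so fetching one by id requires the same routing value used at index time:

GET my_blogs/_doc/comment3?routing=blog2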
Rebuilding an index

When a field's type needs to change, the index must be rebuilt.

Changing the number of primary shards also requires a rebuild.

update by query rebuilds documents in place on the existing index.

reindex rebuilds into a different index.

DELETE blogs

PUT blogs/_doc/1
{
  "content":"Hadoop is cool",
  "keyword": "hadoop"
}


GET blogs/_mapping

PUT blogs/_mapping
{
  "properties" : {
        "content" : {
          "type" : "text",
          "fields" : {
            "english" : {
              "type" : "text",
              "analyzer": "english"
            }
          }
        }
      }
}

PUT blogs/_doc/2
{
  "content": "Elasticsearch rocks",
  "keyword": "elasticsearch"
}

POST blogs/_search
{
  "query": {
    "match": {
      "content.english": "hadoop"
    }
  }
}

##### After adding a sub-field to the mapping, existing documents can be re-indexed in place with _update_by_query
POST blogs/_update_by_query
{}


PUT blogs/_mapping
{
  "properties": {
    "keyword" : {
          "type" : "keyword"
          
        }
  }
}

DELETE blog_fix

#### When a field type must change, create a new index with the desired mapping
PUT blog_fix
{
  "mappings": {
    "properties" : {
        "content" : {
          "type" : "text",
          "fields" : {
            "english" : {
              "type" : "text",
              "analyzer" : "english"
            },
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "keyword" : {
          "type" : "keyword"
        }
      }
  }
}

GET blog_fix/_mapping

###### Rebuild: copy documents from the old index into the new one
POST _reindex
{
  "source": {
    #### source index name
    "index": "blogs",
    #### copy only matching documents
    "query": {
      "match": {
        "content": "elasticsearch"
      }
    },
    "size": 1
  },
  "dest": {
    #### destination index
    "index": "blog_fix",
    #### op_type create: documents that already exist in the destination raise an error; only missing ones are added
    #### without it, matching documents are overwritten, while destination documents absent from the source are left untouched
    "op_type": "create"
  }
}

GET blog_fix/_doc/1


PUT blog_fix/_doc/3
{
  "content": "Elasticsearch rocks copy1",
  "keyword": "elasticsearch copy1"
}

DELETE blog_fix/_doc/1

POST blog_fix/_search
{
  "size": 0,
  "aggs": {
    "blog_keyword": {
      "terms": {
        "field": "keyword",
        "size": 10
      }
    }
  }
}

POST blog_fix/_search
{}
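
For a large index, reindex can run asynchronously; the request returns a task id that can be checked via the tasks API (a sketch):

POST _reindex?wait_for_completion=false
{
  "source": {
    "index": "blogs"
  },
  "dest": {
    "index": "blog_fix"
  }
}

GET _tasks?actions=*reindex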
Ingest pipeline

An ingest pipeline processes documents on their way into an index. For example, a tags field like "es,hadoop" can be split into an array automatically by attaching a split processor at index time; existing documents can likewise be pushed through a pipeline when rebuilding the index.

### Using a pipeline
DELETE tech_blogs

PUT tech_blogs/_doc/1
{
  "title": "Introducing big data...",
  "tags": "hadoop,elasticsearch,spark",
  "content": "You know, for big data"
}

GET tech_blogs/_doc/1

#### Simulate a pipeline against sample documents
POST _ingest/pipeline/_simulate
{
  "pipeline": {
    "description": "to split blog tags",
    "processors": [
      {
        // split the tags field on commas
        "split": {
          "field": "tags",
          "separator": ","
        }
      },
      {
        // add a view field
        "set": {
          "field": "view",
          "value": "0"
        }  
      }
    ]
  },
  "docs": [
      {
        "_source" : {
          "tags" : "hadoop,elasticsearch,spark"
        }
      }
    ]
}

// Define a stored pipeline (each processor must be its own object in the list)
PUT _ingest/pipeline/blog_pipleline
{
  "processors": [
      {
        "split": {
          "field": "tags",
          "separator": ","
        }
      },
      {
        "set": {
          "field": "view",
          "value": "0"
        }
      }
    ]
}

GET _ingest/pipeline/blog_pipleline

// Simulating by pipeline id runs blog_pipleline against the sample docs
POST _ingest/pipeline/blog_pipleline/_simulate
{
  "docs": [
      {
        "_source" : {
          "tags" : "hadoop,elasticsearch,spark"
        }
      }
    ]
}

POST tech_blogs/_doc/2?pipeline=blog_pipleline
{
  "title": "Introducing cloud computering",
  "tags": "openstacks, k8s",
  "content": "You know, for cloud"
}

POST tech_blogs/_doc/3
{
  "title": "Introducing cloud computering",
  "tags": "openstacks, k8s",
  "content": "You know, for cloud"
}


POST tech_blogs/_search
{}

// Documents already processed by blog_pipleline cause errors here (tags is already an array), but the remaining documents are still updated
POST tech_blogs/_update_by_query?pipeline=blog_pipleline
{}
// A safer approach: only update documents the pipeline has not touched yet
POST tech_blogs/_update_by_query?pipeline=blog_pipleline
{
  "query": {
    "bool": {
      "must_not": [
        {
          "exists": {
            "field": "views"
          }
        }
      ]
    }
  } 
}