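The author of this technical column series (Qin Kaixin) focuses on demystifying core big data and container cloud technologies, has 5 years of experience building industrial-grade IoT big data cloud platforms, and can offer full-stack big data plus cloud-native platform consulting. Please keep following this blog. QQ email: 1120746959@qq.com; feel free to get in touch for any academic exchange.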
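/_search: searches all data across all indices and all types
/index1/_search: specifies one index and searches the data of all types under it
/index1,index2/_search: searches the data under two indices at once
/*1,*2/_search: matches multiple indices by wildcard
/index1/type1/_search: searches the data of one specified type under one index
/index1/type1,type2/_search: searches the data of multiple types under one index
/index1,index2/type1,type2/_search: searches the data of multiple types under multiple indices
/_all/type1,type2/_search: _all stands for all indices, so this searches the specified types across every index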
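The URL patterns above follow one rule: an optional comma-separated index segment, an optional comma-separated type segment, then _search. A minimal sketch of that rule follows; the helper name search_path is hypothetical and only builds the path string, it does not talk to a cluster.

```python
def search_path(indices=None, types=None):
    """Build a _search path from optional lists of index and type names."""
    parts = []
    if indices:
        parts.append(",".join(indices))
    elif types:
        # A type segment needs an index segment in front of it,
        # so fall back to _all when no index is given.
        parts.append("_all")
    if types:
        parts.append(",".join(types))
    parts.append("_search")
    return "/" + "/".join(parts)


print(search_path())                                  # /_search
print(search_path(["index1", "index2"]))              # /index1,index2/_search
print(search_path(["*1", "*2"]))                      # /*1,*2/_search
print(search_path(types=["type1", "type2"]))          # /_all/type1,type2/_search
```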
GET /_search?size=10
GET /_search?size=10&from=0
GET /_search?size=10&from=20
GET /test_index/test_type/_search
"hits": {
"total": 9,
"max_score": 1,
GET /test_index/test_type/_search?from=0&size=3
{
"took": 2,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 9,
"max_score": 1,
"hits": [
{
"_index": "test_index",
"_type": "test_type",
"_id": "8",
"_score": 1,
"_source": {
"test_field": "test client 2"
}
},
{
"_index": "test_index",
"_type": "test_type",
"_id": "6",
"_score": 1,
"_source": {
"test_field": "tes test"
}
},
{
"_index": "test_index",
"_type": "test_type",
"_id": "4",
"_score": 1,
"_source": {
"test_field": "test4"
}
}
]
}
}
Page 1: id=8,6,4
GET /test_index/test_type/_search?from=3&size=3
Page 2: id=2, an auto-generated ID, 7
GET /test_index/test_type/_search?from=6&size=3
Page 3: id=1,11,3
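The three requests above step through 9 hits with size=3, so from advances by size on each page. A small sketch of that arithmetic, assuming 1-based page numbers; page_params and search_url are hypothetical helper names.

```python
def page_params(page, size):
    """Return the from/size parameters for a 1-based page number."""
    if page < 1 or size < 1:
        raise ValueError("page and size must be positive")
    return {"from": (page - 1) * size, "size": size}


def search_url(path, page, size):
    """Append the paging parameters to a search path."""
    params = page_params(page, size)
    return f"{path}?from={params['from']}&size={params['size']}"


for page in (1, 2, 3):
    print(search_url("/test_index/test_type/_search", page, 3))
# /test_index/test_type/_search?from=0&size=3
# /test_index/test_type/_search?from=3&size=3
# /test_index/test_type/_search?from=6&size=3
```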
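A mapping is the data structure and related configuration that is created, automatically or manually, for a type in an index.
With dynamic mapping, ES automatically creates the index, the type, and the mapping for that type; the mapping holds the data type of each field, how the field is analyzed, and similar settings. As covered later, you can also create the index, the type, and the type's mapping manually before inserting any data.
Insert a few documents and let ES build the index data structure for us automatically: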
PUT /website/article/1
{
"post_date": "2017-01-01",
"title": "my first article",
"content": "this is my first article in this website",
"author_id": 11400
}
PUT /website/article/2
{
"post_date": "2017-01-02",
"title": "my second article",
"content": "this is my second article in this website",
"author_id": 11400
}
PUT /website/article/3
{
"post_date": "2017-01-03",
"title": "my third article",
"content": "this is my third article in this website",
"author_id": 11400
}
GET /website/article/_search?q=2017 (3 results)
GET /website/article/_search?q=2017-01-01 (3 results)
GET /website/article/_search?q=post_date:2017-01-01 (1 result)
GET /website/article/_search?q=post_date:2017 (1 result)
GET /website/_mapping/article
{
"website": {
"mappings": {
"article": {
"properties": {
"author_id": {
"type": "long"
},
"content": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"post_date": {
"type": "date"
},
"title": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
}
}
}
}
}
doc1:I really liked my small dogs, and I think my mom also liked them.
doc2:He never liked any dogs, so I hope that my mom will not expect me to liked him.
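A search for mother like little dog cannot return any results, because none of those exact terms appears in either document.
Is that the search result we want? Definitely not. To a reader, mother and mom are no different: they are synonyms.
like and liked are no different either: both mean to like, one in the present tense and one in the past tense. The same goes for little and small:
they are synonyms. And dog and dogs differ only in number, singular versus plural.
Normalization covers tense conversion, singular/plural conversion, synonym conversion, and case conversion:
mom --> mother
liked --> like
small --> little
dogs --> dog
After rebuilding the inverted index with normalization, a search for mother liked little dog finds a match.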
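To make the effect of normalization concrete, here is a toy inverted index in Python. It is only a sketch: the NORMALIZE table is a hypothetical stand-in for real stemming and synonym filters, and the search requires every query term to match, which is a simplification and not how ES scores a query.

```python
import re

# Hypothetical normalization table built from the four rewrites listed above.
NORMALIZE = {"mom": "mother", "liked": "like", "small": "little", "dogs": "dog"}


def terms(text):
    """Lowercase, split on non-letters, then apply the normalization table."""
    words = re.findall(r"[a-z]+", text.lower())
    return [NORMALIZE.get(word, word) for word in words]


def build_index(docs):
    """Map each normalized term to the set of document ids containing it."""
    index = {}
    for doc_id, text in docs.items():
        for term in terms(text):
            index.setdefault(term, set()).add(doc_id)
    return index


def search(index, query):
    """Return ids of documents that contain every normalized query term."""
    postings = [index.get(term, set()) for term in terms(query)]
    return set.intersection(*postings) if postings else set()


docs = {
    "doc1": "I really liked my small dogs, and I think my mom also liked them.",
    "doc2": "He never liked any dogs, so I hope that my mom will not expect me to liked him.",
}
index = build_index(docs)
print(sorted(search(index, "mother liked little dog")))  # ['doc1']
```

Because both the documents and the query pass through the same terms function, mother lines up with mom and dog with dogs; doc2 is excluded here only because it has no term that normalizes to little.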
character filter: preprocesses a piece of text before it is tokenized; the most common cases are stripping
HTML tags (<span>hello</span> --> hello) and rewriting & --> and (I&you --> I and you)
tokenizer: splits the text into tokens, hello you and me --> hello, you, and, me
token filter: lowercase, stop word, synonym; dogs --> dog, liked --> like, Tom --> tom, a/the/an --> removed, mother --> mom, small --> little
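The three stages run in a fixed order: character filters, then the tokenizer, then token filters. The sketch below imitates that order in plain Python; the lookup tables and function names are illustrative assumptions, not the ES implementation.

```python
import re

# Hypothetical lookup tables taken from the examples in the text above.
STOP_WORDS = {"a", "an", "the"}
STEMS = {"dogs": "dog", "liked": "like"}
SYNONYMS = {"mother": "mom", "small": "little"}


def char_filter(text):
    """Stage 1: strip HTML tags and rewrite & as 'and' before tokenizing."""
    text = re.sub(r"<[^>]+>", "", text)
    return text.replace("&", " and ")


def tokenizer(text):
    """Stage 2: split the filtered text into word tokens."""
    return re.findall(r"\w+", text)


def token_filters(tokens):
    """Stage 3: lowercase, drop stop words, then apply stems and synonyms."""
    result = []
    for token in tokens:
        token = token.lower()
        if token in STOP_WORDS:
            continue
        token = STEMS.get(token, token)
        token = SYNONYMS.get(token, token)
        result.append(token)
    return result


def analyze(text):
    """Run the three stages in order."""
    return token_filters(tokenizer(char_filter(text)))


print(analyze("<span>Tom</span> liked the small dogs & a mother"))
# ['tom', 'like', 'little', 'dog', 'and', 'mom']
```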
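An analyzer matters a great deal: it applies these processing steps to a piece of text, and only the final processed result is used to build the inverted index.
Introduction to the built-in analyzers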
Set the shape to semi-transparent by calling set_trans(5)
standard analyzer: set, the, shape, to, semi, transparent, by, calling, set_trans, 5 (standard is the default)
simple analyzer: set, the, shape, to, semi, transparent, by, calling, set, trans
whitespace analyzer: Set, the, shape, to, semi-transparent, by, calling, set_trans(5)
language analyzer (an analyzer for a specific language, for example english): set, shape, semi, transpar, call, set_tran, 5
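Two of these analyzers are simple enough to approximate in a few lines, which helps explain why their output differs. This is a rough sketch for ASCII text only: whitespace splits on whitespace and changes nothing else, while simple splits on anything that is not a letter and lowercases. The standard and english analyzers involve more rules and are not reproduced here.

```python
import re


def whitespace_analyzer(text):
    """Split on whitespace only; case and punctuation are left untouched."""
    return text.split()


def simple_analyzer(text):
    """Split on every non-letter character and lowercase (ASCII-only sketch)."""
    return re.findall(r"[a-z]+", text.lower())


text = "Set the shape to semi-transparent by calling set_trans(5)"
print(whitespace_analyzer(text))
# ['Set', 'the', 'shape', 'to', 'semi-transparent', 'by', 'calling', 'set_trans(5)']
print(simple_analyzer(text))
# ['set', 'the', 'shape', 'to', 'semi', 'transparent', 'by', 'calling', 'set', 'trans']
```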
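The query string must be analyzed with the same analyzer that was used when the index was built.
The query string is treated differently for exact value and full text fields.
Key point: fields of different types behave differently; some are full text and some are exact value.
post_date, date: exact value
_all: full text, tokenized and normalized
GET /_search?q=2017
This searches the _all field: all fields of a document are concatenated into one large string, which is then tokenized. For the second document that string is:
2017-01-02 my second article this is my second article in this website 11400
        doc1  doc2  doc3
2017     *     *     *
01       *     *     *
02             *
03                   *
_all, 2017: naturally all 3 documents are found
GET /_search?q=2017-01-01
_all, 2017-01-01: the query string is tokenized with the same analyzer that built the inverted index, giving
2017
01
01
The term 2017 alone appears in every document, so all 3 documents match.
GET /_search?q=post_date:2017-01-01
A date field is indexed as an exact value
            doc1  doc2  doc3
2017-01-01   *
2017-01-02         *
2017-01-03               *
post_date:2017-01-01, 2017-01-01: only the one document, doc1, matches
GET /_search?q=post_date:2017 is not explained here, because it relies on an optimization introduced after ES 5.2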
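The difference between the _all search and the post_date search can be reproduced with a small simulation. It is a sketch under simplifying assumptions: the _all field is modelled as a plain concatenation of field values, tokens are runs of word characters, and a full text query matches when any token matches. It does not model the post_date:2017 case.

```python
import re

docs = {
    1: {"post_date": "2017-01-01", "title": "my first article",
        "content": "this is my first article in this website", "author_id": 11400},
    2: {"post_date": "2017-01-02", "title": "my second article",
        "content": "this is my second article in this website", "author_id": 11400},
    3: {"post_date": "2017-01-03", "title": "my third article",
        "content": "this is my third article in this website", "author_id": 11400},
}


def tokens(text):
    """Tokenize the way the text describes: 2017-01-01 becomes 2017, 01, 01."""
    return re.findall(r"\w+", str(text).lower())


def search_all(docs, query):
    """Full text search on a simulated _all field; any query token may match."""
    wanted = set(tokens(query))
    hits = []
    for doc_id, doc in docs.items():
        all_field = " ".join(str(value) for value in doc.values())
        if wanted & set(tokens(all_field)):
            hits.append(doc_id)
    return hits


def search_exact(docs, field, value):
    """Exact value search: the whole field value must equal the query value."""
    return [doc_id for doc_id, doc in docs.items() if str(doc[field]) == value]


print(search_all(docs, "2017"))                       # [1, 2, 3]
print(search_all(docs, "2017-01-01"))                 # [1, 2, 3]
print(search_exact(docs, "post_date", "2017-01-01"))  # [1]
```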
GET /_analyze
{
"analyzer": "standard",
"text": "Text to analyze"
}
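Outside the console, the same analyze call can be prepared with the Python standard library. The sketch below only builds the request object; the base URL http://localhost:9200 is an assumption about where a node is running, and build_analyze_request is a hypothetical helper name. The request is sent as POST, which the analyze endpoint accepts alongside GET.

```python
import json
import urllib.request


def build_analyze_request(text, analyzer="standard", base_url="http://localhost:9200"):
    """Prepare (but do not send) a request to the _analyze endpoint."""
    body = json.dumps({"analyzer": analyzer, "text": text}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/_analyze",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_analyze_request("Text to analyze")
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))

# To send it against a running node (assumed, not started by this script):
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```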
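When data is inserted directly into ES, ES automatically creates the index, the type, and the corresponding mapping.
The mapping automatically defines the data type of each field.
Different data types (for example text and date) behave differently: some are exact value and some are full text.
An exact value is added to the inverted index as a single term, with the whole value kept together; full text goes through tokenization and normalization (tense conversion, synonym conversion, case conversion) before it is added to the inverted index.
Likewise, whether a field is exact value or full text determines how a search against it behaves, and that behavior stays consistent with how the inverted index was built: an exact value search matches against the whole value, while a full text query string is also tokenized and normalized before it is looked up in the inverted index.
You can use ES dynamic mapping to create the mapping automatically, including the data types; or you can create the mapping for an index and type manually in advance and configure each field yourself, including its data type, indexing behavior, analyzer, and so on.
A mapping is the metadata of a type in an index; every type has its own mapping, which determines the data types, how the inverted index is built, and how searches behave.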
string
byte,short,integer,long
float,double
boolean
date
true or false --> boolean
123 --> long
123.45 --> double
2017-01-01 --> date
"hello world" --> string/text
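The inference rules listed above can be written as a small function. This is a simplified sketch, not the ES detection logic: guess_type is a hypothetical name, and date detection is reduced to whatever Python's date.fromisoformat accepts.

```python
from datetime import date


def guess_type(value):
    """Guess a field type from a JSON value, following the rules listed above."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        try:
            date.fromisoformat(value)  # simplified stand-in for date detection
            return "date"
        except ValueError:
            return "text"
    raise TypeError(f"unsupported value: {value!r}")


doc = {
    "post_date": "2017-01-01",
    "title": "my first article",
    "content": "this is my first article in this website",
    "author_id": 11400,
}
print({field: guess_type(value) for field, value in doc.items()})
# {'post_date': 'date', 'title': 'text', 'content': 'text', 'author_id': 'long'}
```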
GET /index/_mapping/type
analyzed
not_analyzed
no
PUT /website
{
"mappings": {
"article": {
"properties": {
"author_id": {
"type": "long"
},
"title": {
"type": "text",
"analyzer": "english"
},
"content": {
"type": "text"
},
"post_date": {
"type": "date"
},
"publisher_id": {
"type": "text",
"index": "not_analyzed"
}
}
}
}
}
PUT /website
{
"mappings": {
"article": {
"properties": {
"author_id": {
"type": "text"
}
}
}
}
}
{
"error": {
"root_cause": [
{
"type": "index_already_exists_exception",
"reason": "index [website/co1dgJ-uTYGBEEOOL8GsQQ] already exists",
"index_uuid": "co1dgJ-uTYGBEEOOL8GsQQ",
"index": "website"
}
],
"type": "index_already_exists_exception",
"reason": "index [website/co1dgJ-uTYGBEEOOL8GsQQ] already exists",
"index_uuid": "co1dgJ-uTYGBEEOOL8GsQQ",
"index": "website"
},
"status": 400
}
PUT /website/_mapping/article
{
"properties" : {
"new_field" : {
"type" : "string",
"index": "not_analyzed"
}
}
}
GET /website/_analyze
{
"field": "content",
"text": "my-dogs"
}
GET website/_analyze
{
"field": "new_field",
"text": "my dogs"
}
{
"error": {
"root_cause": [
{
"type": "remote_transport_exception",
"reason": "[4onsTYV][127.0.0.1:9300][indices:admin/analyze[s]]"
}
],
"type": "illegal_argument_exception",
"reason": "Can't process field [new_field], Analysis requests are only supported on tokenized fields"
},
"status": 400
}
{ "tags": [ "tag1", "tag2" ]}
null,[],[null]
PUT /company/employee/1
{
"address": {
"country": "china",
"province": "guangdong",
"city": "guangzhou"
},
"name": "jack",
"age": 27,
"join_date": "2017-01-01"
}
address: object type
GET /company/_mapping/employee
{
"company": {
"mappings": {
"employee": {
"properties": {
"address": {
"properties": {
"city": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"country": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"province": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
}
},
"age": {
"type": "long"
},
"join_date": {
"type": "date"
},
"name": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
}
}
}
}
}
{
"address": {
"country": "china",
"province": "guangdong",
"city": "guangzhou"
},
"name": "jack",
"age": 27,
"join_date": "2017-01-01"
}
{
"name": [jack],
"age": [27],
"join_date": [2017-01-01],
"address.country": [china],
"address.province": [guangdong],
"address.city": [guangzhou]
}
{
"authors": [
{ "age": 26, "name": "Jack White"},
{ "age": 55, "name": "Tom Jones"},
{ "age": 39, "name": "Kitty Smith"}
]
}
{
"authors.age": [26, 55, 39],
"authors.name": [jack, white, tom, jones, kitty, smith]
}
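Both transformations above, a nested object and an array of objects, come down to the same flattening step: inner field names are joined to the outer name with a dot, and values from every array element are collected under one key. A sketch of that step follows; the function name flatten is hypothetical, and text analysis of string values (for example splitting Jack White into jack and white) is left out.

```python
def flatten(doc, prefix=""):
    """Flatten nested objects and arrays of objects into dotted field paths."""
    flat = {}
    for key, value in doc.items():
        path = f"{prefix}{key}"
        items = value if isinstance(value, list) else [value]
        for item in items:
            if isinstance(item, dict):
                # Recurse into the inner object and merge its values by path.
                for sub_path, sub_values in flatten(item, path + ".").items():
                    flat.setdefault(sub_path, []).extend(sub_values)
            else:
                flat.setdefault(path, []).append(item)
    return flat


print(flatten({
    "authors": [
        {"age": 26, "name": "Jack White"},
        {"age": 55, "name": "Tom Jones"},
        {"age": 39, "name": "Kitty Smith"},
    ]
}))
# {'authors.age': [26, 55, 39], 'authors.name': ['Jack White', 'Tom Jones', 'Kitty Smith']}
```

Note that after flattening nothing records that 26 belonged to Jack White; the ages and names sit in two independent lists.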
GET /_search
{
"query": {
"match_all": {}
}
}
{
QUERY_NAME: {
ARGUMENT: VALUE,
ARGUMENT: VALUE,...
}
}
{
QUERY_NAME: {
FIELD_NAME: {
ARGUMENT: VALUE,
ARGUMENT: VALUE,...
}
}
}
GET /test_index/test_type/_search
{
"query": {
"match": {
"test_field": "test"
}
}
}
GET /website/article/_search
{
"query": {
"bool": {
"must": [
{
"match": {
"title": "elasticsearch"
}
}
],
"should": [
{
"match": {
"content": "elasticsearch"
}
}
],
"must_not": [
{
"match": {
"author_id": 111
}
}
]
}
}
}
GET /test_index/_search
{
"query": {
"bool": {
"must": { "match": { "name": "tom" }},
"should": [
{ "match": { "hired": true }},
{ "bool": {
"must": { "match": { "personality": "good" }},
"must_not": { "match": { "rude": true }}
}}
],
"minimum_should_match": 1
}
}
}
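Since a bool query is just nested JSON, it can be assembled from small helpers in client code. The sketch below rebuilds the first bool query above as a Python dict; match and bool_query are hypothetical helper names, not part of any ES client library.

```python
import json


def match(field, value):
    """Build a single match clause."""
    return {"match": {field: value}}


def bool_query(must=(), should=(), must_not=(), minimum_should_match=None):
    """Combine clauses into a bool query, omitting any empty sections."""
    clauses = {}
    if must:
        clauses["must"] = list(must)
    if should:
        clauses["should"] = list(should)
    if must_not:
        clauses["must_not"] = list(must_not)
    if minimum_should_match is not None:
        clauses["minimum_should_match"] = minimum_should_match
    return {"bool": clauses}


body = {
    "query": bool_query(
        must=[match("title", "elasticsearch")],
        should=[match("content", "elasticsearch")],
        must_not=[match("author_id", 111)],
    )
}
print(json.dumps(body, indent=2))
```

Because bool_query returns an ordinary clause, it can be passed inside another call's should list to express the nested form used in the second query.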
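Production deployment still takes a lot more work; this article starts from a beginner's perspective and consolidates the issues involved.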
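Qin Kaixin
Copyright belongs to the author. For commercial reprints, contact the author for authorization; for non-commercial reprints, credit the source.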