Elasticsearch Study Notes (25): Elasticsearch Mapping in Detail and How Indexing Works Internally

Date: 2019-11-10


First, a brief description of what a mapping is.
When we insert a few documents, ES automatically creates an index for us:

PUT /website/_doc/1
{
  "post_date": "2017-01-01",
  "title": "my first article",
  "content": "this is my first article in this website",
  "author_id": 11400
}
PUT /website/_doc/2
{
  "post_date": "2017-01-02",
  "title": "my second article",
  "content": "this is my second article in this website",
  "author_id": 11400
}
PUT /website/_doc/3
{
  "post_date": "2017-01-03",
  "title": "my third article",
  "content": "this is my third article in this website",
  "author_id": 11400
}

View the mapping:

GET /website/_mapping
{
  "website" : {
    "mappings" : {
      "properties" : {
        "author_id" : {
          "type" : "long"
        },
        "content" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "post_date" : {
          "type" : "date"
        },
        "title" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        }
      }
    }
  }
}

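The mapping above was generated automatically when the data was inserted; a mapping can also be created manually. A mapping is the data structure and related configuration that is created, automatically or manually, for the type in an index.
Below is a manually created mapping: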

PUT /test_mapping
{
  "mappings" : {
    "properties" : {
      "author_id" : {
        "type" : "long"
      },
      "content" : {
        "type" : "text",
        "fields" : {
          "keyword" : {
            "type" : "keyword",
            "ignore_above" : 256
          }
        }
      },
      "post_date" : {
        "type" : "date"
      },
      "title" : {
        "type" : "text",
        "fields" : {
          "keyword" : {
            "type" : "keyword",
            "ignore_above" : 256
          }
        }
      }
    }
  }
}

1. Exact value versus full-text search

(1) exact value

The whole value of a field must match for the corresponding document to be returned.
Example:

GET /website/_search?q=post_date:2017
{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 0,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  }
}

GET /website/_search?q=post_date:2017-01-01
{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "website",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 1.0,
        "_source" : {
          "post_date" : "2017-01-01",
          "title" : "my first article",
          "content" : "this is my first article in this website",
          "author_id" : 11400
        }
      }
    ]
  }
}

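(2) full text

Unlike exact value, full text does not simply match one complete value. The value is split into words (tokenized) before matching, and matching can also take abbreviations, tense, case, and synonyms into account.
Example: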

GET /website/_search?q=title:article
{
  "took" : 7,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 3,
      "relation" : "eq"
    },
    "max_score" : 0.087011375,
    "hits" : [
      {
        "_index" : "website",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 0.087011375,
        "_source" : {
          "post_date" : "2017-01-01",
          "title" : "my first article",
          "content" : "this is my first article in this website",
          "author_id" : 11400
        }
      },
      {
        "_index" : "website",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 0.087011375,
        "_source" : {
          "post_date" : "2017-01-02",
          "title" : "my second article",
          "content" : "this is my second in this website",
          "author_id" : 11400
        }
      },
      {
        "_index" : "website",
        "_type" : "_doc",
        "_id" : "3",
        "_score" : 0.087011375,
        "_source" : {
          "post_date" : "2017-01-03",
          "title" : "my third article",
          "content" : "this is my third in this website",
          "author_id" : 11400
        }
      }
    ]
  }
}

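2. Core principles of the inverted index

The following shows a simplified construction of an inverted index; building one in practice is far more involved.
doc1: I really liked my small dogs, and I think my mom also liked them.
doc2: He never liked any dogs, so I hope that my mom will not expect me to liked him.

Tokenization and a first version of the inverted index: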

word    doc1    doc2
I        *        *
really   *
liked    *        *
my       *        *
small    *
dogs     *        *
and      *
think    *
mom      *        *
also     *        
them     *
He                *
never             *
any               *
so                *
hope              *
that              *
will              *
not               *
expect            *
me                *
to                *
him               *
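
The table above can be reproduced with a few lines of Python. This is only a minimal sketch of the idea, not how ES or Lucene builds its index: the names tokenize, build_index, and search are made up for illustration, and the tokenizer simply splits on non-letters without any normalization.

```python
import re
from collections import defaultdict

docs = {
    "doc1": "I really liked my small dogs, and I think my mom also liked them.",
    "doc2": "He never liked any dogs, so I hope that my mom will not expect me to liked him.",
}

def tokenize(text):
    """Split text into words on anything that is not a letter (no normalization)."""
    return re.findall(r"[A-Za-z]+", text)

def build_index(documents):
    """Map every word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index

def search(index, query):
    """Return the ids of documents that contain at least one query word."""
    hits = set()
    for word in tokenize(query):
        hits |= index.get(word, set())
    return hits

naive_index = build_index(docs)
print(sorted(naive_index["dogs"]))                              # ['doc1', 'doc2']
print(sorted(search(naive_index, "mother like little dog")))    # []
```

Because the index stores the words exactly as they appear, a query word only matches when it is spelled identically to an indexed word.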

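Searching for mother like little dog returns no results, because none of the query terms is in the index:
mother
like
little
dog
This is clearly not what we want: mother and mom, for example, mean the same thing, yet the documents cannot be found. ES can find such documents, because when it builds the inverted index it also processes each token to raise the chance that related documents are found at search time: tense conversion, singular/plural conversion, synonym conversion, and case conversion. This process is called normalization. Which of these steps run depends on the analyzer configured for the field; the default standard analyzer only tokenizes and lowercases, while stemming and synonyms require additional token filters.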
mother-> mom
liked -> like
small -> little
dogs -> dog
Rebuilding the inverted index with these rules:

word    doc1    doc2
I        *        *
really   *
like     *        *
my       *        *
little   *
dog      *        *
and      *
think    *
mom      *        *
also     *        
them     *
He                *
never             *
any               *
so                *
hope              *
that              *
will              *
not               *
expect            *
me                *
to                *
him               *

Query: mother like little dog, after tokenization and normalization:
mother -> mom
like -> like
little -> little
dog -> dog
Both doc1 and doc2 are now returned:
doc1: I really liked my small dogs, and I think my mom also liked them.
doc2: He never liked any dogs, so I hope that my mom will not expect me to liked him.
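
The effect of normalization can be sketched in Python as follows. The NORMALIZE table is a hand-written stand-in that contains only the four rules listed above; a real analyzer would use stemmers and synonym filters instead, and the function names here are assumptions made for illustration. The key point is that the same analyze step is applied both when indexing and when searching.

```python
import re
from collections import defaultdict

# Hand-written rules taken from the example; not a real stemmer or synonym filter.
NORMALIZE = {"mother": "mom", "liked": "like", "small": "little", "dogs": "dog"}

documents = {
    "doc1": "I really liked my small dogs, and I think my mom also liked them.",
    "doc2": "He never liked any dogs, so I hope that my mom will not expect me to liked him.",
}

def analyze(text):
    """Tokenize on non-letters, then normalize each token with the rule table."""
    return [NORMALIZE.get(word, word) for word in re.findall(r"[A-Za-z]+", text)]

def build_normalized_index(docs):
    """Map every normalized term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in analyze(text):
            index[term].add(doc_id)
    return index

def search_normalized(index, query):
    """Analyze the query exactly like the documents, then look up each term."""
    hits = set()
    for term in analyze(query):
        hits |= index.get(term, set())
    return hits

normalized_index = build_normalized_index(documents)
print(sorted(search_normalized(normalized_index, "mother like little dog")))
# ['doc1', 'doc2']
```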

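3. A further summary of mapping

(1) When data is inserted directly into ES, ES automatically creates the index, along with the type and the corresponding mapping.
(2) The mapping automatically defines the data type of each field.
(3) Depending on its data type (for example text versus date), a field is treated either as an exact value or as full text.
(4) For an exact value, the whole value is stored as a single term in the inverted index without being split; full text goes through tokenization and normalization (tense, synonym, and case conversion) before it is added to the inverted index.
(5) At search time, whether a field is an exact value or full text also determines how it is searched, consistently with how its inverted index was built: an exact value field is matched against the whole value, while for a full text field the query is tokenized and normalized before the inverted index is searched.
(6) You can rely on ES dynamic mapping to create the mapping automatically, including the data types; or you can create the mapping for the index and type manually in advance and configure each field yourself, including its data type, indexing behavior, and analyzer.

In essence, a mapping is the metadata of the type in an index; it determines the data types, how the inverted index is built, and how searches behave.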
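
Points (4) and (5) can be illustrated with a small Python sketch. It uses keyword as a stand-in for an exact value field and text for a full text field, and it approximates analysis by lowercasing and splitting on non-word characters; the function names index_terms and matches are hypothetical and only serve the illustration.

```python
import re

def index_terms(value, field_type):
    """Return the terms a value contributes to the inverted index (simplified)."""
    if field_type == "keyword":
        # Exact value: the whole value becomes one term, unchanged.
        return [value]
    if field_type == "text":
        # Full text: lowercase and split into individual words.
        return re.findall(r"\w+", value.lower())
    raise ValueError(f"unsupported field type: {field_type}")

def matches(query, value, field_type):
    """The query is processed the same way as the field it searches."""
    indexed = set(index_terms(value, field_type))
    return any(term in indexed for term in index_terms(query, field_type))

print(matches("article", "my first article", "text"))              # True
print(matches("article", "my first article", "keyword"))           # False
print(matches("my first article", "my first article", "keyword"))  # True
```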

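4. Core mapping data types and dynamic mapping

(1) Core data types
text (formerly string): string type
byte: byte type
short: short integer
integer: integer
long: long integer
float: floating point
boolean: boolean
date: date/time
There are also more complex cases such as arrays and objects. ES has no dedicated array type: any field can hold several values of the same type, and an object is stored as a set of leaf fields that each have one of the types above.
(2) dynamic mapping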
true or false -> boolean
123 -> long
123.45 -> float
2017-01-01 -> date
"hello world" -> text
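
The rules above can be mimicked with a short Python function. This is only a sketch of the idea under simplifying assumptions: guess_type and guess_mapping are hypothetical names, the date check is a plain yyyy-MM-dd regular expression rather than the configurable date formats ES really uses (dynamic_date_formats), and the keyword sub-field that ES adds to dynamically mapped strings is left out.

```python
import re

# Simplified date detection: only the yyyy-MM-dd form used in this article.
DATE_PATTERN = re.compile(r"^\d{4}-\d{2}-\d{2}$")

def guess_type(value):
    """Guess a field type from a JSON value, following the rules listed above."""
    # bool must be checked before int because bool is a subclass of int in Python.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "float"
    if isinstance(value, str):
        return "date" if DATE_PATTERN.match(value) else "text"
    raise TypeError(f"unsupported value: {value!r}")

def guess_mapping(doc):
    """Build a mapping-like dict for a flat document."""
    return {"properties": {name: {"type": guess_type(v)} for name, v in doc.items()}}

doc = {
    "post_date": "2017-01-01",
    "title": "my first article",
    "author_id": 11400,
}
print(guess_mapping(doc))
```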
(3) Viewing the mapping

GET /{index}/_mapping


GET /test/_mapping
{
  "test" : {
    "mappings" : {
      "properties" : {
        "field1" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "field2" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        }
      }
    }
  }
}

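5. Creating and modifying a mapping manually, and controlling whether a string field is analyzed

Note: a mapping can only be defined manually when the index is created, or by adding mappings for new fields; the mapping of an existing field cannot be updated.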

# Create the index
PUT /website
{
  "mappings": {
    "properties": {
      "author_id": {
        "type": "long"
      },
      "title": {
        "type": "text",
        "analyzer": "standard"
      },
      "content": {
        "type": "text"
      },
      "post_date": {
        "type": "date"
      },
      "publisher_id": {
        "type": "keyword"
      }
    }
  }
}
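# Try to change the mapping of an existing field by sending the index definition again (rejected: the index already exists)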
PUT /website
{
  "mappings": {
    "properties": {
      "author_id": {
        "type": "text"
      }
    }
  }
}
{
  "error": {
    "root_cause": [
      {
        "type": "resource_already_exists_exception",
        "reason": "index [website/5xLohnJITHqCwRYInmBFmA] already exists",
        "index_uuid": "5xLohnJITHqCwRYInmBFmA",
        "index": "website"
      }
    ],
    "type": "resource_already_exists_exception",
    "reason": "index [website/5xLohnJITHqCwRYInmBFmA] already exists",
    "index_uuid": "5xLohnJITHqCwRYInmBFmA",
    "index": "website"
  },
  "status": 400
}
# Add a new field to the mapping
PUT /website/_mapping
{
  "properties": {
    "new_field": {
      "type": "text"
    }
  }
}
{
  "acknowledged" : true
}

6. Complex mapping types and the underlying structure of object data

(1) multivalue field

{
    "tags": ["tag1", "tag2"]
}

(2) empty field
null, []
(3) object field

PUT /test/_create/1
{
  "address": {
    "country": "china",
    "province": "guangdong",
    "city": "guangzhou"
  },
  "name": "jack",
  "age": 27,
  "join_date": "2017-01-01"
}
GET /test/_mapping
{
  "test" : {
    "mappings" : {
      "properties" : {
        "address" : {
          "properties" : {
            "city" : {
              "type" : "text",
              "fields" : {
                "keyword" : {
                  "type" : "keyword",
                  "ignore_above" : 256
                }
              }
            },
            "country" : {
              "type" : "text",
              "fields" : {
                "keyword" : {
                  "type" : "keyword",
                  "ignore_above" : 256
                }
              }
            },
            "province" : {
              "type" : "text",
              "fields" : {
                "keyword" : {
                  "type" : "keyword",
                  "ignore_above" : 256
                }
              }
            }
          }
        },
        "age" : {
          "type" : "long"
        },
        "join_date" : {
          "type" : "date"
        },
        "name" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        }
      }
    }
  }
}

GET /test/_doc/1

{
  "_index" : "test",
  "_type" : "_doc",
  "_id" : "1",
  "_version" : 1,
  "_seq_no" : 0,
  "_primary_term" : 1,
  "found" : true,
  "_source" : {
    "address" : {
      "country" : "china",
      "province" : "guangdong",
      "city" : "guangzhou"
    },
    "name" : "jack",
    "age" : 27,
    "join_date" : "2017-01-01"
  }
}

Note: indexing works the same way as for a plain string field, and the data types within a field must not be mixed.
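
As for the underlying structure of an object field: ES indexes the inner fields of an object under dotted paths such as address.country, so the document above is stored roughly as a flat set of key/value pairs. The Python sketch below illustrates that flattening; the flatten function is a hypothetical helper written for this example, not part of ES.

```python
def flatten(doc, prefix=""):
    """Flatten nested objects into dotted field paths, e.g. address.city."""
    flat = {}
    for key, value in doc.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            # Recurse into the inner object, extending the dotted prefix.
            flat.update(flatten(value, prefix=path + "."))
        else:
            flat[path] = value
    return flat

person = {
    "address": {
        "country": "china",
        "province": "guangdong",
        "city": "guangzhou",
    },
    "name": "jack",
    "age": 27,
    "join_date": "2017-01-01",
}

for field, value in flatten(person).items():
    print(field, "=", value)
```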