[Elasticsearch] 2. Data storage: documents and indices

Data storage: documents and indices

Series in progress…

Data in: Documents and indices

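Elasticsearch is a distributed document store. Data is not stored as rows and columns; it is serialized and stored as JSON documents. If an Elasticsearch cluster has multiple nodes, the documents are spread across them, and they can be accessed from any node. Pretty magical, isn't it?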

Elasticsearch is a distributed document store. Instead of storing information as rows of columnar data, Elasticsearch stores complex data structures that have been serialized as JSON documents. When you have multiple Elasticsearch nodes in a cluster, stored documents are distributed across the cluster and can be accessed immediately from any node.

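When a document is stored, it is indexed and becomes available for full-text search in near real time (within 1 second). How is it that fast? Because Elasticsearch internally uses a structure called an inverted index to support fast full-text search. So what is an inverted index? It lists every word (deduplicated) that appears in the documents together with the documents it appears in. Think of it as a map: the key is a word, and the value is the IDs of all documents that contain it.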

When a document is stored, it is indexed and fully searchable in near real-time, within 1 second. Elasticsearch uses a data structure called an inverted index that supports very fast full-text searches. An inverted index lists every unique word that appears in any document and identifies all of the documents each word occurs in.
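To make the "map from word to document IDs" picture concrete, here is a minimal sketch in Python. It is only a toy: the function name, the sample documents, and the whitespace tokenization are assumptions for illustration, and it says nothing about how Elasticsearch (or Lucene underneath it) actually lays the index out.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each unique word to the set of IDs of the documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        # Toy tokenization (an assumption): lowercase and split on whitespace.
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


# Hypothetical sample documents: ID -> text.
docs = {
    1: "quick brown fox",
    2: "lazy brown dog",
    3: "quick red fox",
}

inverted = build_inverted_index(docs)
print(sorted(inverted["brown"]))  # [1, 2]
print(sorted(inverted["quick"]))  # [1, 3]
```

Looking up a word is a single dictionary access, no matter how many documents there are, which is the intuition behind the fast full-text search described above.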

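So what is an index? Think of it as a collection optimized for storing documents. And a document? A document is a collection of fields. And a field? A field is a key-value pair holding a name and a value. By default, Elasticsearch indexes every field in a document and automatically detects each field's data type so it can optimize accordingly. For example, text fields use an inverted index, while numeric and geo fields use BKD trees. Applying type-specific optimizations at both index and search time is a big part of why Elasticsearch is fast. There is no free lunch: the speed comes from a lot of work behind the scenes.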

An index can be thought of as an optimized collection of documents and each document is a collection of fields, which are the key-value pairs that contain your data. By default, Elasticsearch indexes all data in every field and each indexed field has a dedicated, optimized data structure. For example, text fields are stored in inverted indices, and numeric and geo fields are stored in BKD trees. The ability to use the per-field data structures to assemble and return search results is what makes Elasticsearch so fast.

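Elasticsearch is also relaxed about schema. What is a schema? See the previous post's explanation of structured, semi-structured, and unstructured data; in short, it is the definition of the data's format. And what does relaxed mean here? As with the semi-structured data in the previous post, storing a document with a field that did not exist before does not raise an error. If dynamic mapping is enabled, Elasticsearch automatically detects the type of a newly submitted field and indexes it accordingly. That lets us store and explore data quickly: Elasticsearch works out whether a value is a boolean, a floating-point number, an integer, a date, or a string. Adding a field does not require altering a table first, as it does in MySQL. Hang on, what was the SQL statement for adding a column in MySQL again? So not only are queries fast, development is fast too!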

Elasticsearch also has the ability to be schema-less, which means that documents can be indexed without explicitly specifying how to handle each of the different fields that might occur in a document. When dynamic mapping is enabled, Elasticsearch automatically detects and adds new fields to the index. This default behavior makes it easy to index and explore your data: just start indexing documents and Elasticsearch will detect and map booleans, floating point and integer values, dates, and strings to the appropriate Elasticsearch data types.
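The idea of detecting a type from the JSON value can be sketched as below. This is a simplified imitation, not Elasticsearch's real detection logic: the function `guess_field_type`, the single date pattern it tries, and the sample document are all assumptions made for the example.

```python
from datetime import datetime


def guess_field_type(value):
    """Guess a field type name from a parsed JSON value (toy rules only)."""
    # bool must be tested before int, because bool is a subclass of int in Python.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "float"
    if isinstance(value, str):
        try:
            # Assumption: only one date pattern is tried in this sketch.
            datetime.strptime(value, "%Y-%m-%d")
            return "date"
        except ValueError:
            return "text"
    if isinstance(value, dict):
        return "object"
    return "unknown"


# Hypothetical document with one field of each kind.
doc = {
    "title": "Hello",
    "views": 42,
    "score": 4.5,
    "published": True,
    "created": "2020-03-01",
}

guessed = {name: guess_field_type(value) for name, value in doc.items()}
print(guessed)
```

A new field that shows up in a later document would simply get its own entry in `guessed`, which is the "no table to alter first" convenience described above.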

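In the end, though, you know (and should know) your data better than Elasticsearch does. Elasticsearch infers data types from a set of rules, whereas you are a state-of-the-art 21st-century intelligent human. If the automatically detected type does not fit your needs or leaves room for optimization, you can define dynamic mapping rules or explicitly set a field's type to control how it is stored and indexed.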

Ultimately, however, you know more about your data and how you want to use it than Elasticsearch can. You can define rules to control dynamic mapping and explicitly define mappings to take full control of how fields are stored and indexed.

With custom mappings we can:

Defining your own mappings enables you to:

  • Distinguish string fields used for full-text search from those that are not (Text versus Keyword)

  • Distinguish between full-text string fields and exact value string fields

  • Apply a language-specific analyzer (for example, the ik analyzer for Chinese)

  • Perform language-specific text analysis

  • Optimize fields for partial-match queries

  • Optimize fields for partial matching

  • Use your own date formats

  • Use custom date formats

  • Use types that cannot be detected automatically, such as geo_point and geo_shape

  • Use data types such as geo_point and geo_shape that cannot be automatically detected
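As a rough illustration of the list above, an explicit mapping might look like the sketch below, written as a Python dictionary and serialized to the JSON body that would accompany a create-index request. The field names (`title`, `status`, `created`, `location`) are hypothetical, and the `english` analyzer and the date pattern are just example choices.

```python
import json

# Hypothetical explicit mapping; every field name here is made up.
explicit_mapping = {
    "mappings": {
        "properties": {
            # Full-text field with language-specific analysis.
            "title": {"type": "text", "analyzer": "english"},
            # Exact-value string field.
            "status": {"type": "keyword"},
            # Date field with a custom format.
            "created": {"type": "date", "format": "yyyy/MM/dd"},
            # A type that dynamic mapping cannot detect on its own.
            "location": {"type": "geo_point"},
        }
    }
}

# JSON text that would be sent as the request body when creating the index.
mapping_body = json.dumps(explicit_mapping, indent=2)
print(mapping_body)
```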

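We often need to index the same field in more than one way. For example, a string field might be indexed as a Text field for full-text search and also as a Keyword field for sorting and aggregation. Or a single field might need both a Chinese and an English analyzer. After all, some of us mix Chinese and English in everyday speech: let's go make some Money together, a lot, a lot!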

It’s often useful to index the same field in different ways for different purposes. For example, you might want to index a string field as both a text field for full-text search and as a keyword field for sorting or aggregating your data. Or, you might choose to use more than one language analyzer to process the contents of a string field that contains user input.
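One way to express "the same field, indexed several ways" is a mapping with sub-fields under `fields`. The sketch below is an assumption-laden example: the field name `city` and the sub-field names `raw` and `en` are invented, and `english` stands in for whichever analyzers you actually need.

```python
import json

# Hypothetical multi-field mapping: one source value, three indexed forms.
multi_field_mapping = {
    "mappings": {
        "properties": {
            "city": {
                # Main form: analyzed text for full-text search.
                "type": "text",
                "fields": {
                    # Exact-value form for sorting and aggregations.
                    "raw": {"type": "keyword"},
                    # Same text run through a second analyzer.
                    "en": {"type": "text", "analyzer": "english"},
                },
            }
        }
    }
}

print(json.dumps(multi_field_mapping, indent=2))
```

With a mapping like this, a query would address the variants as `city`, `city.raw`, and `city.en`.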

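At search time, Elasticsearch analyzes the query text with the same analysis chain it applied to the full-text (Text) field when the document was stored. Only when indexing and searching analyze text the same way can a search find the right results.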

The analysis chain that is applied to a full-text field during indexing is also used at search time. When you query a full-text field, the query text undergoes the same analysis before the terms are looked up in the index.
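Why the two sides must match can be seen in a small sketch. Everything here is a toy under stated assumptions: `analyze` is an invented stand-in for an analysis chain (lowercase, drop punctuation, split on whitespace), and `search` requires every query term to match, which is a simplification chosen for the example.

```python
def analyze(text):
    """Toy analysis chain: drop punctuation, lowercase, split on whitespace."""
    cleaned = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in text)
    return cleaned.lower().split()


def index_docs(docs):
    """Build a term -> document IDs map using the toy analyzer."""
    index = {}
    for doc_id, text in docs.items():
        for term in analyze(text):
            index.setdefault(term, set()).add(doc_id)
    return index


def search(index, query, analyzer=analyze):
    """Return IDs of documents that contain every analyzed query term."""
    terms = analyzer(query)
    if not terms:
        return set()
    return set.intersection(*(index.get(term, set()) for term in terms))


# Hypothetical documents.
docs = {1: "Quick Brown Fox!", 2: "The lazy dog."}
idx = index_docs(docs)

# Query analyzed the same way as the documents: the terms line up.
same_analysis = search(idx, "QUICK fox")
# Query only split on whitespace: "QUICK" never appears in the index.
different_analysis = search(idx, "QUICK fox", analyzer=str.split)

print(same_analysis)       # {1}
print(different_analysis)  # set()
```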
