MongoDB Storage Engines: WiredTiger and In-Memory

Date: 2019-11-21

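The storage engine is the core component of MongoDB. It manages how data is stored on disk and in memory. As of MongoDB 3.2, MongoDB supports multiple storage engines: WiredTiger, MMAPv1, and In-Memory.

Starting in MongoDB 3.2, WiredTiger is MongoDB's default storage engine. It persists data to disk files and provides document-level concurrency control, checkpoints, data compression, and native encryption.

Besides persisting data to disk files, MongoDB can also keep data in memory only. The In-Memory storage engine stores data exclusively in memory and writes only a small amount of metadata and diagnostic data to disk files. The requested data can be retrieved without any disk I/O, so the In-Memory storage engine greatly reduces query latency.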

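I. Specifying the storage engine of a MongoDB instance

mongod option: --storageEngine wiredTiger | inMemory

Specifies the type of storage engine:

If the value is wiredTiger, MongoDB uses the WiredTiger storage engine and persists data in disk files;
If the value is inMemory, MongoDB uses the In-Memory storage engine and stores data in memory;
Starting in MongoDB 3.2, the default storage engine is WiredTiger.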
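Choosing an engine comes down to a single command-line option. The Python sketch below builds a mongod command line for each of the two values above. The helper name mongod_command and the data directories are hypothetical. The helper accepts only wiredTiger and inMemory, although mongod in these versions also accepts mmapv1.

```python
import shlex

# Only the two values discussed above; mongod of this era also accepts mmapv1.
VALID_ENGINES = ("wiredTiger", "inMemory")


def mongod_command(dbpath, engine="wiredTiger"):
    """Build a mongod argument list; engine defaults to wiredTiger, the 3.2+ default."""
    if engine not in VALID_ENGINES:
        raise ValueError(f"unsupported storage engine: {engine!r}")
    return ["mongod", "--storageEngine", engine, "--dbpath", dbpath]


# Hypothetical data directories.
print(shlex.join(mongod_command("/data/db")))
print(shlex.join(mongod_command("/data/inmem", engine="inMemory")))
```

The sketch only builds and prints the command. Passing the list to subprocess.run would actually start mongod.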

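II. The WiredTiger storage engine stores data in disk files

WiredTiger and MMAPv1 both persist data to disk. Of the two, WiredTiger is the newer and more capable engine.

1. Document-level concurrency control

When MongoDB performs write operations, WiredTiger applies concurrency control at the document level. Multiple write operations can modify different documents in the same collection at the same time, but writes that modify the same document must be serialized. While one write is modifying a document, other writes to that document must wait until it completes. The waiting writes then compete, and the winner applies its modification to the document.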

For most read and write operations, WiredTiger uses optimistic concurrency control and takes intent locks only at the global, database, and collection levels. When the storage engine detects a conflict between two operations, one of them incurs a write conflict, and MongoDB transparently retries that operation.
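The following Python sketch makes the optimistic scheme concrete with a hypothetical VersionedStore. It is a simplified model, not WiredTiger's implementation. A per-document version check stands in for conflict detection, a single short commit lock keeps the compare-and-swap atomic, and the retry loop stands in for MongoDB's transparent retry.

```python
import threading


class VersionedStore:
    """Toy document store with optimistic concurrency control (hypothetical model)."""

    def __init__(self):
        self._docs = {}                        # _id -> (version, value)
        self._commit_lock = threading.Lock()   # held only for the short compare-and-swap
        self.conflicts = 0

    def read(self, doc_id):
        # Lock-free read of the latest committed (version, value) pair.
        return self._docs.get(doc_id, (0, 0))

    def _try_commit(self, doc_id, expected_version, new_value):
        with self._commit_lock:
            current_version, _ = self._docs.get(doc_id, (0, 0))
            if current_version != expected_version:
                self.conflicts += 1            # write conflict: another writer committed first
                return False
            self._docs[doc_id] = (current_version + 1, new_value)
            return True

    def update(self, doc_id, fn):
        # Retry until the commit succeeds, as MongoDB does on a write conflict.
        while True:
            version, value = self.read(doc_id)
            if self._try_commit(doc_id, version, fn(value)):
                return


store = VersionedStore()


def worker(doc_id):
    for _ in range(1000):
        store.update(doc_id, lambda v: v + 1)


# Two writers per document: "a" and "b" are independent, same-document writes may conflict.
threads = [threading.Thread(target=worker, args=(d,)) for d in ("a", "a", "b", "b")]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(store.read("a"), store.read("b"), "conflicts:", store.conflicts)
```

Both documents always end at version 2000 and value 2000, because every increment eventually commits exactly once. The conflict count varies from run to run.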

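2. Checkpoints

When a checkpoint starts, WiredTiger takes a point-in-time snapshot of the database that presents a consistent view of the in-memory data. When writing to disk, WiredTiger writes all of the data in the snapshot to the data files consistently. Once the checkpoint completes, the data files are consistent up to and including that checkpoint. The checkpoint therefore serves as a recovery point, and checkpoints shorten the time MongoDB needs to recover data from the journal.

When WiredTiger creates a checkpoint, MongoDB flushes data to the data files. By default, WiredTiger creates a checkpoint every 60 seconds or after 2 GB of journal data has been written, whichever comes first. While a new checkpoint is being written, the previous checkpoint remains valid. If MongoDB terminates abnormally because of an error during that time, it can recover from the last valid checkpoint after a restart.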
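The default checkpoint triggers can be expressed as a small Python predicate. This is a sketch, not MongoDB code. The names are hypothetical, "2 GB" is read as 2 GiB, and the simulation checks the triggers only when a write arrives rather than running a timer.

```python
GIB = 1024 ** 3
CHECKPOINT_INTERVAL_S = 60              # default interval stated above
CHECKPOINT_JOURNAL_BYTES = 2 * GIB      # "2 GB" of journal data, read as 2 GiB (assumption)


def checkpoint_due(seconds_since_last, journal_bytes_since_last):
    """True when either default trigger has fired; whichever comes first wins."""
    return (seconds_since_last >= CHECKPOINT_INTERVAL_S
            or journal_bytes_since_last >= CHECKPOINT_JOURNAL_BYTES)


def simulate(write_events):
    """write_events: (timestamp_s, journal_bytes) pairs in time order.

    Returns the timestamps at which a checkpoint would start. Triggers are only
    evaluated when a write arrives, a simplification of the timer-driven behavior.
    """
    checkpoints, last_t, pending = [], 0.0, 0
    for t, nbytes in write_events:
        pending += nbytes
        if checkpoint_due(t - last_t, pending):
            checkpoints.append(t)
            last_t, pending = t, 0
    return checkpoints


# 2.5 GiB of journal by t=45s triggers early; afterwards the 60-second rule fires at t=106s.
print(simulate([(30, GIB), (45, GIB + GIB // 2), (100, 0), (106, 0)]))  # [45, 106]
```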

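A new checkpoint is complete once MongoDB atomically updates WiredTiger's metadata table to reference it. MongoDB then frees the disk space used by the old checkpoint. With the WiredTiger storage engine and no journal of data updates, MongoDB can recover only to the last checkpoint. Recovering modifications made after the last checkpoint requires the journal.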

3. Write-ahead transaction log

WiredTiger uses write-ahead logging. Every data update is first written to the journal, and the data files are brought up to date when a checkpoint writes the in-memory snapshot to disk. Together, the write-ahead log and checkpoints make updates durable and keep the data consistent. The WiredTiger journal persists all data modifications made between checkpoints. If MongoDB exits between checkpoints, it uses the journal to replay all data modified since the last checkpoint.
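A toy Python key-value store can illustrate how the journal and checkpoints work together. MiniWAL is a hypothetical model, not WiredTiger's file format. Dictionaries stand in for the data files and the cache, and a list stands in for the journal.

```python
class MiniWAL:
    """Toy key-value store illustrating write-ahead logging with checkpoints.

    disk_data and journal stand in for the on-disk data files and journal files;
    memory stands in for the WiredTiger cache. Hypothetical model only.
    """

    def __init__(self):
        self.disk_data = {}   # state as of the last checkpoint
        self.journal = []     # (key, value) records written since that checkpoint
        self.memory = {}      # current in-memory state

    def write(self, key, value):
        self.journal.append((key, value))   # 1. log the update first (write-ahead)
        self.memory[key] = value            # 2. then apply it in memory

    def checkpoint(self):
        self.disk_data = dict(self.memory)  # persist a consistent snapshot
        self.journal.clear()                # records before the checkpoint are no longer needed

    def crash_and_recover(self):
        self.memory = dict(self.disk_data)  # memory is lost; start from the checkpoint
        for key, value in self.journal:     # replay every update since the checkpoint
            self.memory[key] = value
        return self.memory


db = MiniWAL()
db.write("x", 1)
db.checkpoint()
db.write("x", 2)
db.write("y", 3)
recovered = db.crash_and_recover()
print(recovered)  # {'x': 2, 'y': 3}
```

Every update reaches the journal before it is applied, so the crash loses nothing that was logged. Recovery starts from the checkpoint and replays the two later writes.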

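4. Memory use

4.1 WiredTiger uses system memory to cache two kinds of data:

Internal cache
Filesystem cache

Starting in MongoDB 3.2, the WiredTiger internal cache by default uses the larger of 1 GB or 60% of RAM minus 1 GB. The filesystem cache has no fixed size. Through it, MongoDB automatically uses all free memory that the WiredTiger cache and other processes are not using. Data in the filesystem cache is stored compressed.

4.2 Adjusting the size of the WiredTiger internal cache

Use the mongod option --wiredTigerCacheSizeGB to change the size of the WiredTiger internal cache in a MongoDB instance. When the option is not set, the default size is determined as follows: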

Starting in MongoDB 3.2, the WiredTiger internal cache, by default, will use the larger of either: 60% of RAM minus 1 GB, or 1 GB.
For systems with up to 10 GB of RAM, the new default setting is less than or equal to the 3.0 default setting
For systems with more than 10 GB of RAM, the new default setting is greater than the 3.0 setting.
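A minimal Python sketch of the MongoDB 3.2 default formula quoted above. The function name is hypothetical, and RAM is given in GB.

```python
def default_cache_gb_32(ram_gb):
    """MongoDB 3.2 default WiredTiger internal cache: larger of (60% of RAM - 1 GB) and 1 GB."""
    return max(0.6 * ram_gb - 1, 1.0)


for ram in (2, 4, 10, 16, 64):
    # Passing --wiredTigerCacheSizeGB explicitly overrides this default.
    print(f"RAM {ram:>3} GB -> default cache {default_cache_gb_32(ram):.1f} GB")
```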

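5. Data compression

WiredTiger compresses collections and indexes on disk. Compression reduces disk usage but costs additional CPU time for compressing and decompressing data.

By default, WiredTiger uses block compression with the snappy library for collections and prefix compression for indexes, and the journal is compressed as well. For most workloads, the default compression settings balance storage efficiency against processing requirements, because compression and decompression are very fast.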
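A few lines of Python can illustrate the two compression styles. This is a simplified model, not WiredTiger's on-disk format. prefix_compress encodes each sorted index key as the length of the prefix it shares with the previous key plus the remaining suffix. zlib stands in for snappy because it ships with the standard library. WiredTiger also accepts zlib through --wiredTigerCollectionBlockCompressor.

```python
import json
import zlib


def prefix_compress(sorted_keys):
    """Encode each key as (length of prefix shared with the previous key, remaining suffix)."""
    encoded, prev = [], ""
    for key in sorted_keys:
        shared = 0
        while shared < min(len(prev), len(key)) and prev[shared] == key[shared]:
            shared += 1
        encoded.append((shared, key[shared:]))
        prev = key
    return encoded


def prefix_decompress(encoded):
    """Rebuild the keys by reusing the prefix of the previously decoded key."""
    keys, prev = [], ""
    for shared, suffix in encoded:
        key = prev[:shared] + suffix
        keys.append(key)
        prev = key
    return keys


# Index keys are kept sorted, so neighbors often share long prefixes.
keys = sorted(f"user:{i:05d}" for i in range(1000))
enc = prefix_compress(keys)

# Block compression of a batch of similar documents (zlib stands in for snappy).
block = json.dumps([{"status": "active", "user": k} for k in keys]).encode()
compressed = zlib.compress(block)
print(enc[:3])
print(len(block), "->", len(compressed), "bytes")
```

Sorted keys are what make prefix compression effective. After the first key, each entry here stores only a short suffix.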

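6. Reclaiming disk space

When documents are deleted from a collection, MongoDB does not return the disk space to the operating system. The freed space stays in the data files and is allocated to new documents when data is inserted later, so no new space has to be allocated. To reuse disk space more effectively, the data has to be defragmented.

With WiredTiger, the compact command removes fragmentation from a collection's data and indexes and releases unused space. Syntax: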

db.runCommand ( { compact: '<collection>' } )

While compact runs, MongoDB locks the current database and blocks all other operations on it.

On WiredTiger, compact will rewrite the collection and indexes to minimize disk space by releasing unused disk space to the operating system. This is useful if you have removed a large amount of data from the collection, and do not plan to replace it.
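A toy Python data file can model the difference between reusing freed space and returning it to the operating system. DataFile is hypothetical. WiredTiger manages free space in its own way, and here a slot list and a free list stand in for it.

```python
class DataFile:
    """Toy data file made of document slots plus a free list (hypothetical model)."""

    def __init__(self):
        self.slots = []   # each slot holds a document, or None once freed
        self.free = []    # indexes of freed slots available for reuse

    def insert(self, doc):
        if self.free:
            idx = self.free.pop()      # reuse space freed by an earlier delete
            self.slots[idx] = doc
        else:
            self.slots.append(doc)     # no free space: grow the file
            idx = len(self.slots) - 1
        return idx

    def delete(self, idx):
        self.slots[idx] = None
        self.free.append(idx)          # space stays in the file; the OS does not get it back

    def size(self):
        return len(self.slots)         # file size, measured in slots

    def compact(self):
        # Rewrite live documents contiguously; the freed space goes back to the OS.
        self.slots = [doc for doc in self.slots if doc is not None]
        self.free = []


f = DataFile()
ids = [f.insert({"n": i}) for i in range(10)]
for i in ids[:6]:
    f.delete(i)
size_after_delete = f.size()     # 10: deleting does not shrink the file
f.insert({"n": 99})              # lands in a freed slot
size_after_reuse = f.size()      # still 10
f.compact()
size_after_compact = f.size()    # 5 live documents remain
print(size_after_delete, size_after_reuse, size_after_compact)
```

As with compact, the rewrite in this model moves documents, so slot indexes returned before compact() are no longer valid.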

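III. The In-Memory storage engine stores data in memory

The In-Memory storage engine keeps data in memory. Apart from a small amount of metadata and diagnostic data, it does not maintain any on-disk data. This avoids disk I/O and lowers query latency.

1. Specifying the In-Memory storage engine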

mongod --storageEngine inMemory --dbpath <path>

To select the In-Memory storage engine, set two options:

Set the mongod option --storageEngine to inMemory;
Set the mongod option --dbpath to the data directory;
Disk is still used for metadata, diagnostic data, and temporary data: although the In-Memory storage engine does not write data to the filesystem, it keeps small metadata files and diagnostic data in --dbpath, as well as temporary files for building large indexes.

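2. Document-level concurrency

For write operations, the In-Memory storage engine uses document-level concurrency control. Multiple write operations can modify different documents in the same collection at the same time, but writes to the same document must be serialized. While a document is being modified, other writes to it must wait.

3. Memory use

The In-Memory storage engine must keep data, indexes, the oplog, and so on in memory. The amount of memory it uses is set with the mongod option --inMemorySizeGB, whose default is 50% of RAM minus 1 GB. To specify the amount of memory, in GB, for the In-Memory storage engine: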

mongod --storageEngine inMemory --dbpath <path> --inMemorySizeGB <newSize>
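This Python sketch computes the --inMemorySizeGB value as a quick check of the default above. The helper names are hypothetical. The formula is applied literally, so the sketch does not handle machines with very little RAM.

```python
def default_in_memory_size_gb(ram_gb):
    """Default In-Memory engine size as stated above: 50% of RAM minus 1 GB."""
    return 0.5 * ram_gb - 1


def in_memory_size_option(ram_gb, override_gb=None):
    # Use an explicit size when given; otherwise fall back to the default formula.
    size = default_in_memory_size_gb(ram_gb) if override_gb is None else override_gb
    return ["--inMemorySizeGB", f"{size:g}"]


print(in_memory_size_option(64))        # ['--inMemorySizeGB', '31']
print(in_memory_size_option(64, 40))    # ['--inMemorySizeGB', '40']
```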

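4. Durability

The In-Memory storage engine does not persist data. It keeps data only in memory, performs reads and writes there, and never writes data to disk files. As a result, it needs no separate journal and spends no time on journaling or on waiting for data to become durable. When the MongoDB instance shuts down or the system terminates abnormally, all data held in memory is lost.

5. Recording the oplog

The In-Memory storage engine does not write data updates to disk, but it does record the oplog, a collection held in memory. Through replication, MongoDB sends the primary's oplog to the other members of the same replica set. Suppose a replica set primary uses the In-Memory storage engine. Replication delivers its oplog to the other members, which redo the recorded operations. Data modifications made on the primary can therefore be persisted on members that store their data on disk.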

You can deploy mongod instances that use in-memory storage engine as part of a replica set. For example, as part of a three-member replica set, you could have:

two mongod instances run with in-memory storage engine.
one mongod instance run with WiredTiger storage engine. Configure the WiredTiger member as a hidden member (i.e. hidden: true and priority: 0).

With this deployment model, only the mongod instances running with the in-memory storage engine can become the primary. Clients connect only to the in-memory storage engine mongod instances. Even if both mongod instances running the in-memory storage engine crash and restart, they can sync from the member running WiredTiger. The hidden mongod instance running with WiredTiger persists the data to disk, including the user data, indexes, and replication configuration information.
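The three-member deployment described above can be written as a replica set configuration document, and the Python sketch below builds it. The helper name and host names are hypothetical. _id, host, hidden, and priority are the standard member fields, and priority defaults to 1 when omitted.

```python
def in_memory_replset_config(set_name, in_memory_hosts, wiredtiger_host):
    """Replica set config for the deployment above: in-memory members are electable,
    the WiredTiger member is hidden with priority 0 so it never becomes primary."""
    members = [{"_id": i, "host": host} for i, host in enumerate(in_memory_hosts)]
    members.append({
        "_id": len(members),
        "host": wiredtiger_host,
        "hidden": True,   # not visible to client applications
        "priority": 0,    # never eligible to become primary
    })
    return {"_id": set_name, "members": members}


# Hypothetical host names; every mongod must have been started with --replSet rs0.
config = in_memory_replset_config(
    "rs0",
    ["mem1.example.net:27017", "mem2.example.net:27017"],
    "wt1.example.net:27017",
)
print(config)

# Sending it with pymongo (requires a running member that has not been initiated yet):
#   from pymongo import MongoClient
#   client = MongoClient("mem1.example.net", 27017, directConnection=True)
#   client.admin.command("replSetInitiate", config)
```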

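IV. Journaling

Data is at the heart of MongoDB, and MongoDB must keep it safe from loss. The journal is a sequentially written log that records the data updates made since the last checkpoint. It lets the database be restored to a valid state after an abnormal termination. MongoDB makes data durable through write-ahead logging: when the WiredTiger storage engine performs a write, it first writes the data update to the journal. Journal files are log files on disk, each about 100 MB, stored in the journal subdirectory under --dbpath. At each checkpoint, the data updates are synced to the data files.

At regular intervals, the WiredTiger storage engine performs a checkpoint, which writes the cached data updates to the on-disk data files. MongoDB enables journaling by default. You can also enable it explicitly by starting mongod with the --journal option: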

mongod --journal

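1. Recovering with the journal

A checkpoint created by WiredTiger lets MongoDB restore the database to the consistent state it had when the last checkpoint was created. If MongoDB terminates abnormally after the last checkpoint, the journal must be used to redo the data updates made since that checkpoint. This brings the data to the consistent state recorded in the journal. Recovery from the journal works as follows:

Get the identifier of the last checkpoint: look up the identifier (ID) of the last checkpoint in the data files;
Match journal records to the identifier: search the journal files for the record that matches the last checkpoint's identifier;
Redo the journal records: apply all records in the journal files that were written after the last checkpoint.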
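The three recovery steps can be sketched in Python. This is a hypothetical model. Each journal record carries an integer ID, and the checkpoint identifier is assumed to equal the ID of the last record the checkpoint covers. That is a simplification of how WiredTiger tracks positions in its log.

```python
from dataclasses import dataclass


@dataclass
class JournalRecord:
    record_id: int   # uniquely identifies the record; increases in write order
    key: str
    value: object


def recover(data_files, last_checkpoint_id, journal):
    """Redo journal records written after the last checkpoint (hypothetical model).

    data_files: dict holding the state persisted by the last checkpoint (step 1 supplies its ID)
    journal: JournalRecord objects in write order
    """
    state = dict(data_files)
    # Step 2: find the journal record matching the checkpoint identifier.
    matches = [i for i, rec in enumerate(journal) if rec.record_id == last_checkpoint_id]
    if not matches:
        raise ValueError("no journal record matches the checkpoint identifier")
    # Step 3: redo every later record, in order.
    for rec in journal[matches[0] + 1:]:
        state[rec.key] = rec.value
    return state


journal = [JournalRecord(1, "a", 1), JournalRecord(2, "b", 2),
           JournalRecord(3, "a", 10), JournalRecord(4, "c", 3)]
# Suppose the last checkpoint covered everything up to record 2.
print(recover({"a": 1, "b": 2}, 2, journal))  # {'a': 10, 'b': 2, 'c': 3}
```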

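2. Journal buffering

MongoDB configures WiredTiger to use an in-memory buffer for journal records, and records of up to 128 KB are buffered. During a write, WiredTiger places the journal records in this buffer. If MongoDB shuts down abnormally, the records still in the buffer are lost, that is, the updates not yet synced to disk.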

WiredTiger syncs the buffered journal records to disk according to the following intervals or conditions:

New in version 3.2: Every 50 milliseconds.
MongoDB sets checkpoints to occur in WiredTiger on user data at an interval of 60 seconds or when 2 GB of journal data has been written, whichever occurs first.
If the write operation includes a write concern of j: true, WiredTiger forces a sync of the WiredTiger journal files.
Because MongoDB uses a journal file size limit of 100 MB, WiredTiger creates a new journal file approximately every 100 MB of data. When WiredTiger creates a new journal file, WiredTiger syncs the previous journal file.
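The sync conditions listed above can be summarized as a Python predicate. The function name and the MiB reading of "100 MB" are assumptions. The commented pymongo lines show how a client would request j: true, where coll stands for any hypothetical collection object.

```python
SYNC_INTERVAL_MS = 50                  # "Every 50 milliseconds" (new in 3.2)
JOURNAL_FILE_LIMIT = 100 * 1024 ** 2   # ~100 MB per journal file, read as MiB (assumption)


def journal_sync_reasons(ms_since_last_sync, checkpoint_started,
                         write_concern_j, current_file_bytes):
    """Return which of the listed conditions would force buffered journal records to disk."""
    reasons = []
    if ms_since_last_sync >= SYNC_INTERVAL_MS:
        reasons.append("interval")
    if checkpoint_started:
        reasons.append("checkpoint")
    if write_concern_j:
        reasons.append("j:true")
    if current_file_bytes >= JOURNAL_FILE_LIMIT:
        reasons.append("new journal file")   # rolling over syncs the previous file
    return reasons


print(journal_sync_reasons(10, False, False, 1024))   # []
print(journal_sync_reasons(60, False, True, 1024))    # ['interval', 'j:true']

# With pymongo, a write requests j: true through its write concern, for example:
#   from pymongo.write_concern import WriteConcern
#   coll.with_options(write_concern=WriteConcern(j=True)).insert_one({"x": 1})
```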

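3. Journal files

MongoDB creates a journal subdirectory under --dbpath, where WiredTiger stores its journal files. Each journal file is about 100 MB and is named WiredTigerLog.<sequence>. <sequence> is a zero-padded 10-digit number that starts at 0000000001 and increases with each new file.

For the WiredTiger storage engine, journal files have the following characteristics:

Record identification: each record in a journal file represents one write operation and has an ID that uniquely identifies it;
Compression: WiredTiger compresses the data stored in journal files;
Size limit: each journal file is capped at about 100 MB; once a file exceeds this limit, WiredTiger creates a new journal file;
Automatic removal: WiredTiger automatically removes old journal files and keeps only those needed to recover from the last checkpoint;
Preallocation: WiredTiger preallocates journal files.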
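The naming scheme and the removal rule can be sketched in Python. journal_file_name follows the documented WiredTigerLog.<sequence> format. The (sequence, last_record_id) bookkeeping and the removal rule in removable_files are a simplified, hypothetical model.

```python
import os


def journal_file_name(sequence):
    """WiredTigerLog.<sequence>, with the sequence zero-padded to 10 digits."""
    return f"WiredTigerLog.{sequence:010d}"


def journal_path(dbpath, sequence):
    # Journal files live in the journal subdirectory of --dbpath.
    return os.path.join(dbpath, "journal", journal_file_name(sequence))


def removable_files(files, checkpoint_record_id):
    """files: (sequence, last_record_id) pairs.

    Simplified rule: a file can be removed once all of its records come before
    the last checkpoint's record, because recovery never needs to redo them.
    """
    return [journal_file_name(seq) for seq, last_id in files
            if last_id < checkpoint_record_id]


print(journal_path("/data/db", 1))
print(removable_files([(1, 100), (2, 250), (3, 400)], checkpoint_record_id=250))
```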

4. Recovering data after a crash

After a MongoDB instance crashes, restart mongod, and MongoDB automatically redoes the journal files. The database is unavailable while the journal is being replayed.

V. mongod options related to storage engines

1. Options for WiredTiger

mongod \
--storageEngine wiredTiger \
--dbpath <path> \
--journal \
--wiredTigerCacheSizeGB <value> \
--wiredTigerJournalCompressor <compressor> \
--wiredTigerCollectionBlockCompressor <compressor> \
--wiredTigerIndexPrefixCompression <boolean>

2. Options for In-Memory

mongod \
--storageEngine inMemory \
--dbpath <path> \
--inMemorySizeGB <newSize> \
--replSet <setname> \
--oplogSize <value>

References (MongoDB documentation):

Storage Engines

WiredTiger Storage Engine

Journaling

In-Memory Storage Engine
