Investigating TokuDB write failures caused by low disk space

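Symptoms
Around 4 a.m. on January 1, 2016, Zabbix raised an alert for the database server: data writes were failing. After logging in, I found disk usage at 95%, so the disk was not full. Inside the zabbix database, running an INSERT failed with "ERROR 1030 (HY000): Got error 28 from storage engine". (Error 28 is the operating-system errno ENOSPC, "No space left on device".)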
 
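Background
Because the zabbix database receives a very large volume of writes, we store it with the TokuDB storage engine. TokuDB has a high compression ratio and very good write performance, which makes it a good fit for the Zabbix workload.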
 
Investigation
1) The error log contained the following:
Version: '5.6.22-72.0-log'  socket: '/tmp/mysql.socket'  port: 3306  Percona Server (GPL), Release 72.0, Revision 738
Sun Dec 27 06:18:58 2015 TokuFT file system space is low
Sun Dec 27 06:22:58 2015 TokuFT file system space is low
Sun Dec 27 06:26:43 2015 TokuFT file system space is low
Sun Dec 27 06:30:48 2015 TokuFT file system space is low
Sun Dec 27 06:34:48 2015 TokuFT file system space is low
Sun Dec 27 06:38:43 2015 TokuFT file system space is low
Fri Jan  1 03:57:56 2016 TokuFT file system space is really low and access is restricted
Fri Jan  1 04:25:56 2016 TokuFT file system space is really low and access is restricted
Fri Jan  1 05:52:07 2016 TokuFT file system space is really low and access is restricted
Fri Jan  1 07:33:47 2016 TokuFT file system space is really low and access is restricted
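The message "TokuFT file system space is really low and access is restricted" first appears at 03:57:56 on January 1. In other words, the file system is almost out of space and write access has been restricted. This timestamp matches the time the database writes started failing; the milder "space is low" warnings had already appeared on December 27.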
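Finding that first "access is restricted" line by eye works for a short log; for a long one, a small script helps. The sketch below is an illustrative helper, not part of any Percona tooling: `first_event` is a hypothetical name, and the regex assumes the timestamp format shown in the log excerpt above ("Fri Jan  1 03:57:56 2016").

```python
import re
from datetime import datetime

# Message texts copied from the error log excerpt above.
LOW = "TokuFT file system space is low"
RESTRICTED = "TokuFT file system space is really low and access is restricted"

# Timestamp prefix as printed in that log, e.g. "Fri Jan  1 03:57:56 2016".
TS_RE = re.compile(r"^(\w{3} \w{3} +\d{1,2} \d{2}:\d{2}:\d{2} \d{4}) (.*)$")


def first_event(lines, message):
    """Return the timestamp of the first line whose text equals `message`, else None."""
    for line in lines:
        m = TS_RE.match(line.strip())
        if m and m.group(2) == message:
            # Whitespace in a strptime format matches one or more spaces,
            # so the double space in "Jan  1" parses correctly.
            return datetime.strptime(m.group(1), "%a %b %d %H:%M:%S %Y")
    return None
```

Usage: `with open(path) as f: print(first_event(f, RESTRICTED))`, where `path` points at the MySQL error log. Lines that do not start with a timestamp (such as the "Version:" banner) are simply skipped.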
 
2) The Percona documentation describes a variable that controls the free-disk-space check:
variable tokudb_fs_reserve_percent

This variable controls the percentage of the file system that must be available for inserts to be allowed. By default, this is set to 5. We recommend that this reserve be at least half the size of your physical memory. See Full Disks for more information.
The default is 5, which means that once available free space drops below 5% of the file system, inserts are refused until more space is freed.
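The arithmetic behind the 5% reserve, and the documentation's advice that the reserve be at least half of physical memory, can be sketched as follows. This is an approximation under stated assumptions: "free space" is taken to be what Python's `shutil.disk_usage` reports (space available to unprivileged users), the boundary is assumed to be strictly "below" the reserve, and the helper names are hypothetical. TokuDB's internal accounting may differ.

```python
import shutil

# Default of tokudb_fs_reserve_percent per the Percona documentation quoted above.
DEFAULT_RESERVE_PERCENT = 5


def free_percent(total_bytes, free_bytes):
    """Free space as a percentage of the whole file system."""
    return 100.0 * free_bytes / total_bytes


def below_reserve(total_bytes, free_bytes, reserve_percent=DEFAULT_RESERVE_PERCENT):
    """True when free space is strictly below the reserve (assumed boundary)."""
    return free_percent(total_bytes, free_bytes) < reserve_percent


def min_reserve_percent_for_ram(total_bytes, ram_bytes):
    """Smallest whole percent whose reserve is at least half of physical memory."""
    # ceil(100 * (ram / 2) / total) using integer arithmetic only
    return -(-100 * ram_bytes // (2 * total_bytes))


def check_path(path):
    """Apply the default-reserve check to the file system holding `path`."""
    usage = shutil.disk_usage(path)
    return below_reserve(usage.total, usage.free)
```

For example, on a 2 TiB file system with 256 GiB of RAM, half the memory is 128 GiB, or 6.25% of the disk, so `min_reserve_percent_for_ram` returns 7: by the documentation's recommendation, the default of 5 would be too small on such a machine.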
 
3) Reading further in the Full Disks section turned up another detail: "TokuDB polls the file system every five seconds to determine how much free space is available", that is, free disk space is checked every 5 seconds.
Details about the disk system:
There is a free-space reserve requirement, which is a user-configurable parameter given as a percentage of the total space in the file system. The default reserve is five percent. This value is available in the global variable tokudb_fs_reserve_percent. We recommend that this reserve be at least half the size of your physical memory.
TokuDB polls the file system every five seconds to determine how much free space is available. If the free space dips below the reserve, then further table inserts are prohibited. Any transaction that attempts to insert rows will be aborted. Inserts are re-enabled when twice the reserve is available in the file system (so freeing a small amount of disk storage will not be sufficient to resume inserts). Warning messages are sent to the system error log when free space dips below twice the reserve and again when free space dips below the reserve.
Even with inserts prohibited it is still possible for the file system to become completely full. For example this can happen because another storage engine or another application consumes disk space.
If the file system becomes completely full, then TokuDB will freeze. It will not crash, but it will not respond to most SQL commands until some disk space is made available. When TokuDB is frozen in this state, it will still respond to the following command:
 
4) Trying to change the parameter at runtime shows that it is read-only; changing it requires a server restart:
mysql> set global tokudb_fs_reserve_percent=4;
ERROR 1238 (HY000): Variable 'tokudb_fs_reserve_percent' is a read only variable
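Conclusion
To protect the database service, TokuDB checks free disk space every 5 seconds. By default, once free space falls below 5%, it blocks writes, and according to the documentation it resumes only after free space is back to twice the reserve (10% with the default), so freeing a little space is not enough. The percentage is controlled by tokudb_fs_reserve_percent, which is read-only at runtime: it has to be set in the configuration file and takes effect after a restart. InnoDB, MyISAM and other engines have no such parameter and can fill the disk to 100%. When using TokuDB, keep this parameter in mind and plan to expand storage before the disk reaches 95% usage.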