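I suddenly received an alert saying that MySQL was down. The server in question is a replica (slave). I tried logging in to the server to see whether it was reachable, and it was; the mysqld process was also still running. But when I tried to log in to MySQL, I got: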
ERROR 1040 (HY000): Too many connections
max_connections was set to 3000, so how could the server have run out of connections? I used gdb to raise the limit on the running process:
gdb -p $(cat pid_mysql.pid) -ex "set max_connections=5000" -batch
After the change I could log in, so I ran show processlist to see what was going on.
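It turned out that every show slave status issued by the monitoring program was stuck. These stuck sessions piled up until they exhausted the connection limit, which caused the Too many connections error. Replication itself was stuck in the state Waiting for commit lock. After some research I found that I had hit a known bug: https://bugs.mysql.com/bug.php?id=70307. The bug was fixed in 5.6.23; my version is 5.6.17.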
mysql> SELECT a.trx_id, trx_state, trx_started, b.id AS thread_id, b.info, b.user, b.host, b.db, b.command, b.state
    -> FROM information_schema.`INNODB_TRX` a, information_schema.`PROCESSLIST` b
    -> WHERE a.trx_mysql_thread_id = b.id
    -> ORDER BY a.trx_started;
+----------+-----------+---------------------+-----------+------+-------------+------+------+---------+-------------------------+
| trx_id   | trx_state | trx_started         | thread_id | info | user        | host | db   | command | state                   |
+----------+-----------+---------------------+-----------+------+-------------+------+------+---------+-------------------------+
| 51455154 | RUNNING   | 2017-08-02 02:20:07 |      6404 | NULL | system user |      | NULL | Connect | Waiting for commit lock |
+----------+-----------+---------------------+-----------+------+-------------+------+------+---------+-------------------------+
1 row in set (0.03 sec)
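The transaction got stuck at around 2 a.m., and it suddenly occurred to me that 2 a.m. is exactly when xtrabackup runs its backup. During a backup, xtrabackup executes flush tables with read lock and then show slave status, and this combination can deadlock with the SQL thread, leaving the SQL thread stuck indefinitely. The cause is as follows: after the SQL thread finishes a DML operation it still holds the rli->data_lock lock, and at commit time it waits for MDL_COMMIT, which the session that ran flush tables with read lock is holding. Meanwhile, the show slave status executed after flush tables with read lock waits for rli->data_lock. Each side is waiting for a lock the other holds. The fix is to hold rli->data_lock only for the duration of the DML operation.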
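To catch this condition before the connections run out, something like the following could be run against the processlist. It is only a rough sketch, not what I actually had in place: the function name find_stuck_sql_thread and the default threshold of 60 seconds are my own assumptions, and it expects tab-separated SHOW PROCESSLIST rows (Id, User, Host, db, Command, Time, State, Info) such as those produced by the mysql client with -B -N.

```shell
# Sketch: report replication threads that have been waiting on the commit lock
# for longer than a threshold. Function name and threshold are hypothetical.
# Input: tab-separated SHOW PROCESSLIST rows on stdin
#        (columns: Id, User, Host, db, Command, Time, State, Info).
# Output: one line per match, "<thread id> <seconds in state>".
find_stuck_sql_thread() {
    threshold="${1:-60}"
    awk -F'\t' -v limit="$threshold" '
        $2 == "system user" && $7 == "Waiting for commit lock" && ($6 + 0) > limit {
            print $1, $6
        }'
}

# Example usage (requires a reachable server, so it is left commented out):
#   mysql -B -N -e 'SHOW PROCESSLIST' | find_stuck_sql_thread 300
```

If this prints anything during the backup window, the SQL thread is most likely in the deadlock described above, and the monitoring queries will start piling up behind it.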
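stop slave had no effect, and a normal shutdown did not work either, so in the end the only option was kill -9. This is a fairly serious problem, and the solution is to upgrade to a newer version.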
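To check which 5.6 instances still need the upgrade, a small version comparison like the one below may help. This is a minimal sketch under a few assumptions: the function name fix_status is made up for illustration, it relies on sort -V (available in GNU coreutils) for version-aware ordering, and the default of 5.6.23 is simply the fixed release mentioned in the bug report, so the comparison is only meaningful within the 5.6 series.

```shell
# Sketch: compare a MySQL version string against the release that contains the fix.
# fix_status is a hypothetical helper; 5.6.23 comes from the bug report above.
# Prints "needs-upgrade" if the version is older than the fixed release, else "ok".
fix_status() {
    version="${1%%-*}"        # drop suffixes such as "-log"
    fixed="${2:-5.6.23}"
    if [ "$version" = "$fixed" ]; then
        echo "ok"
        return 0
    fi
    # sort -V orders version numbers numerically per component;
    # the first line is the older of the two.
    oldest="$(printf '%s\n%s\n' "$version" "$fixed" | sort -V | head -n 1)"
    if [ "$oldest" = "$version" ]; then
        echo "needs-upgrade"
    else
        echo "ok"
    fi
}

# Example usage (requires a reachable server, so it is left commented out):
#   fix_status "$(mysql -B -N -e 'SELECT VERSION()')"
```

For my 5.6.17 instance this reports needs-upgrade.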