【MySQL】你覺得設置了並行複製就下降延遲了?這個你絕對想不到!

在MySQL官方版本中,爲了保證其的高可用性,通常狀況咱們會採用主從複製的方式來解決。固然,方法不少。而咱們今天所要處理的是採用GTID方式而且開了多線程複製後,仍然延遲的狀況,糟糕的是,延遲還在不斷擴大!mysql

  • 環境概要
  • 序號 清單 說明
    1 系統 Redhat 6.x(4c,32g)
    2 數據庫 MySQL-5.7.25
    3 同步方式 基於GTID主從同步

  • 環境排查
  • 1)已經配置的重要參數:sql

     
    

    relay for slave

    slaveparalleltype = LOGICAL_CLOCKslaveparallelworkers = 6masterinforepository = TABLErelayloginfo_repository = TABLErelaylogrecovery = onsyncrelaylog = 10000注:此時沒有設置slave_preserve_commit_order參數。數據庫

    2)從庫延遲狀態查詢session

    mysql> show slave statusG* 1. row *SlaveIOState: Queueing master event to the relay logMaster_Host: xxx.xxx.xxx.xxxMaster_User: replMaster_Port: 3306Connect_Retry: 60MasterLogFile: mysql-bin.008978ReadMasterLog_Pos: 696914605RelayLogFile: DB41-relay-bin.001259RelayLogPos: 207377582RelayMasterLog_File: mysql-bin.008970SlaveIORunning: YesSlaveSQLRunning: YesReplicateDoDB: neteagle3ReplicateIgnoreDB: mysqlReplicateDoTable:ReplicateIgnoreTable:ReplicateWildDo_Table:ReplicateWildIgnore_Table:Last_Errno: 0Last_Error:Skip_Counter: 0ExecMasterLog_Pos: 1068770059RelayLogSpace: 8425484286Until_Condition: NoneUntilLogFile:UntilLogPos: 0MasterSSLAllowed: NoMasterSSLCA_File:MasterSSLCA_Path:MasterSSLCert:MasterSSLCipher:MasterSSLKey:SecondsBehindMaster: 187358MasterSSLVerifyServerCert: NoLastIOErrno: 0LastIOError:LastSQLErrno: 0LastSQLError:ReplicateIgnoreServer_Ids:MasterServerId: 42Master_UUID: eab7fcac-3cda-11e6-ada8-fa163e648db2MasterInfoFile: mysql.slavemasterinfoSQL_Delay: 0SQLRemainingDelay: NULLSlaveSQLRunning_State: Waiting for dependent transaction to commitMasterRetryCount: 86400Master_Bind:LastIOError_Timestamp:LastSQLError_Timestamp:MasterSSLCrl:MasterSSLCrlpath:RetrievedGtidSet: eab7fcac-3cda-11e6-ada8-fa163e648db2:58031191-59927276ExecutedGtidSet: eab7fcac-3cda-11e6-ada8-fa163e648db2:1-58080239:58080241Auto_Position: 1ReplicateRewriteDB:Channel_Name:MasterTLSVersion:1 row in set (0.00 sec)
    簡單介紹一下幾個指標信息:

    1. MasterLogFile
    2. ReadMasterLog_Pos
    3. SecondsBehindMaster
    4. RelayLogFile
    5. RelayLogPos
    6. RelayMasterLog_File
    7. ExecMasterLog_Pos

    MasterLogFile,ReadMasterLog_Pos:這兩個參數是成對的,表示的是從庫IO thread傳輸主庫的binlog日誌號及具體位置。多線程

    RelayLogFile,RelayLogPos:這兩個參數也是成對的,表示的是從庫sql thread應用中繼日誌(relay log)號及具體位置。app

    RelayMasterLog_File,RelayLogPos:這兩個參數也是成對的,表示的是上一項中的中繼日誌對應的主庫binlog日誌及具體位置(有點繞)。性能

    SecondsBehindMaster:此參數可簡單理解爲主從延遲時間,單位爲秒。spa

    從上面這段MySQL從狀態信息中,咱們能夠看到,Seconds_Behind_Master: 187358這是從庫sql應用延遲主庫的時間爲187358秒,轉換整天,大概兩天多。這說明,咱們從庫複製的數據是兩天前的。線程

    3)驗證並行複製日誌

    mysql> show full processlist;+----+-------------+--------------+-----------+------------------+--------+---------------------------------------------------------------+-----------------------+| Id | User        | Host         | db        | Command          | Time   | State                                                         | Info                  |+----+-------------+--------------+-----------+------------------+--------+---------------------------------------------------------------+-----------------------+|  1 | system user |              | NULL      | Connect          |  18204 | Waiting for master to send event                              | NULL                  ||  2 | system user |              | NULL      | Connect          |      0 | Waiting for dependent transaction to commit                   | NULL                  ||  3 | system user |              | NULL      | Connect          | 154914 | System lock                                                   | NULL                  ||  4 | system user |              | NULL      | Connect          | 154914 | Waiting for an event from Coordinator                         | NULL                  ||  5 | system user |              | NULL      | Connect          | 154918 | Waiting for an event from Coordinator                         | NULL                  ||  6 | system user |              | NULL      | Connect          | 155525 | Waiting for an event from Coordinator                         | NULL                  ||  7 | system user |              | NULL      | Connect          | 180427 | Waiting for an event from Coordinator                         | NULL                  ||  8 | system user |              | NULL      | Connect          |  18204 | Waiting for an event from Coordinator                         | NULL                  || 10 | root        | localhost    | neteagle3 | Query            |      0 | starting                                                      | show full processlist || 11 | repl        | DBSlave:9683 | NULL      | Binlog Dump GTID |  18156 | Master has sent all binlog to slave; waiting for more updates | NULL                  || 13 | root        | localhost    | neteagle3 | Sleep            |   4962 |                                                               | NULL                  |+----+-------------+--------------+-----------+------------------+--------+---------------------------------------------------------------+-----------------------+

    mysql> select * from  performance_schema.replication_applier_status_by_worker ;+--------------+-----------+-----------+---------------+-----------------------------------------------+-------------------+--------------------+----------------------+| CHANNELNAME | WORKERID | THREADID | SERVICESTATE | LASTSEENTRANSACTION                         | LASTERRORNUMBER | LASTERRORMESSAGE | LASTERRORTIMESTAMP |+--------------+-----------+-----------+---------------+-----------------------------------------------+-------------------+--------------------+----------------------+|              |         1 |        51 | ON            | eab7fcac-3cda-11e6-ada8-fa163e648db2:80240805 |                 0 |                    | 0000-00-00 00:00:00  ||              |         2 |        52 | ON            | eab7fcac-3cda-11e6-ada8-fa163e648db2:80240210 |                 0 |                    | 0000-00-00 00:00:00  ||              |         3 |        53 | ON            | eab7fcac-3cda-11e6-ada8-fa163e648db2:80235089 |                 0 |                    | 0000-00-00 00:00:00  ||              |         4 |        54 | ON            | eab7fcac-3cda-11e6-ada8-fa163e648db2:80191268 |                 0 |                    | 0000-00-00 00:00:00  ||              |         5 |        55 | ON            | eab7fcac-3cda-11e6-ada8-fa163e648db2:75296683 |                 0 |                    | 0000-00-00 00:00:00  ||              |         6 |        56 | ON            |                                               |                 0 |                    | 0000-00-00 00:00:00  |+--------------+-----------+-----------+---------------+-----------------------------------------------+-------------------+--------------------+----------------------+6 rows in set (0.00 sec)經過本條查詢,能夠看到開啓了6個並行進行復制。

  • 細找瓶頸?
  • 經過以上來看,全部一切彷佛都正常,並行複製開了,而且查看到CPU,IO,內存均沒有達到瓶頸地步。主庫寫binlog日誌大概爲2MB/s,這樣的日質量並非很是高。

    從庫也一樣查看了是否存在鎖的狀況,也沒有發現。

    所以反覆的在查看slave狀態,看可否發現一些細節,結果還真看到了一些異常現象。Relay_Log_Pos這個參數在頻繁的刷slave狀態時,發現時常會卡着不動(此時已確認沒有看到鎖)。或許問題真正的緣由正在這裏!

  • 順藤摸瓜!
  • 分析binlog或者relay log日誌,看有啥線索:

    [mysql@xxx data]$ mysqlbinlog --no-defaults -v -v --base64-output=DECODE-ROWS  relay-bin.001384 --start-position=420090430|more

    /!50530 SET @@SESSION.PSEUDOSLAVEMODE=1/;/!50003 SET @OLDCOMPLETIONTYPE=@@COMPLETIONTYPE,COMPLETIONTYPE=0/;DELIMITER /!/;# at 420090430

    190923 9:24:28 server id 42 endlogpos 420090282 CRC32 0xd9097eaf GTID lastcommitted=57148 sequencenumber=57149 rbr_only=yes

    /!50718 SET TRANSACTION ISOLATION LEVEL READ COMMITTED//!/;SET @@SESSION.GTID_NEXT= 'eab7fcac-3cda-11e6-ada8-fa163e648db2:69415610'/*!*/;

    at 420090495

    190923 9:24:28 server id 42 endlogpos 420090364 CRC32 0x82b57dfd Query threadid=95 exectime=0 error_code=0

    SET TIMESTAMP=1569201868/!/;SET @@session.pseudothreadid=95/!/;SET @@session.foreignkeychecks=1, @@session.sqlautoisnull=0, @@session.uniquechecks=1, @@session.autocommit=1/!/;SET @@session.sql_mode=1075838976/*!*/;SET @@session.autoincrementincrement=2, @@session.autoincrementoffset=1/!/;/!C gbk //!/;SET @@session.charactersetclient=28,@@session.collationconnection=28,@@session.collationserver=8/!/;SET @@session.lctimenames=0/!/;SET @@session.collation_database=DEFAULT/*!*/;BEGIN/!/;

    at 420090577

    190923 9:24:28 server id 42 endlogpos 420090585 CRC32 0x752e27cf Tablemap: net.`fevent` mapped to number 108

    at 420090798

    190923 9:24:28 server id 42 endlogpos 420090812 CRC32 0x72b8e10d Tablemap: net.`feventstorage` mapped to number 245

    at 420091025

    190923 9:24:28 server id 42 endlogpos 420091039 CRC32 0x1797f9d8 Tablemap: net.`feventstorage` mapped to number 245

    at 420091252

    190923 9:24:28 server id 42 endlogpos 420091106 CRC32 0x8af14ad2 Tablemap: net.`feventdetail` mapped to number 243

    at 420091319

    190923 9:24:28 server id 42 endlogpos 420091177 CRC32 0xf1ce87c8 Tablemap: net.`feventoperation` mapped to number 244

    at 420091390

    190923 9:24:28 server id 42 endlogpos 420091244 CRC32 0x586c0b9d Tablemap: net.`feventaudit` mapped to number 242

    at 420091457

    190923 9:24:28 server id 42 endlogpos 420093382 CRC32 0x505e5408 Update_rows: table id 108

    at 420093595

    190923 9:24:28 server id 42 endlogpos 420098858 CRC32 0x0f404509 Update_rows: table id 245

    at 420099071

    190923 9:24:28 server id 42 endlogpos 420098910 CRC32 0xb8d9ed15 Write_rows: table id 243

    at 420099123

    190923 9:24:28 server id 42 endlogpos 420098966 CRC32 0x3c489a7f Writerows: table id 244 flags: STMTEND_F

    咱們查看了中繼日誌relay-bin.001384卡住的位置號420090430,爲設置GTID_NEXT,信息沒什麼用。

    繼續看在卡住時刻,數據庫open的是什麼表?

    mysql>   show open tables where In_use=1;+-----------+---------------------+--------+-------------+| Database  | Table               | Inuse | Namelocked |+-----------+---------------------+--------+-------------+| net       | f_currentxxx        |      1 |           0 |+-----------+---------------------+--------+-------------+
    這張表有什麼特殊的麼?查看其表結構
    mysql>         show create table net.f_currentxxxG* 1. row *Table: f_currentxxxCreate Table: CREATE TABLE  ( int(20) NOT NULL COMMENT 'xxx', bigint(20) NOT NULL COMMENT 'xxx', int(11) DEFAULT NULL COMMENT 'xxx', int(11) DEFAULT NULL COMMENT 'xxx,KEY  ()) ENGINE=MEMORY DEFAULT CHARSET=gbk COMMENT='xxx'1 row in set (0.00 sec)f_currentxxx複製代碼serial複製代碼audittime複製代碼type複製代碼severity複製代碼audittime複製代碼audittime複製代碼
    有沒有看到什麼不同凡響?

    沒錯,就是表的存儲引擎ENGINE=MEMORY。MEMORY的表進行主從複製,首先來講若是從庫不作查詢,一點意義沒有,另外對Memory表作複製,性能是很是堪憂的。若是必須進行同步,考慮將表的存儲引擎改成InnoDB

    mysql> select tablename from informationschema.tables where TABLE_SCHEMA='net' and ENGINE='memory';+----------------------+| table_name           |+----------------------+| f_currentxxx         |+----------------------+1 row in set, 6 warnings (0.01 sec)出於嚴謹,咱們將要複製的數據庫進行完全排查,確實只有這一張表是Memory存儲引擎。

  • 措施
  • 中止複製進程,將選項中添加replicate-ignore-table=net.f_currentxxx,從新啓動複製進程,觀察slave狀態。

    mysql> show slave statusG* 1. row *SlaveIOState: Waiting for master to send eventMaster_Host: xxx.xxx.xxx.xxxMaster_User: replMaster_Port: 3306Connect_Retry: 60MasterLogFile: mysql-bin.009194ReadMasterLog_Pos: 939698255RelayLogFile: relay-bin.001964RelayLogPos: 444060572RelayMasterLog_File: mysql-bin.009027SlaveIORunning: YesSlaveSQLRunning: YesReplicateDoDB: netReplicateIgnoreDB: mysqlReplicateDoTable:ReplicateIgnoreTable: net.f_currentxxxReplicateWildDo_Table:ReplicateWildIgnore_Table:Last_Errno: 0Last_Error:Skip_Counter: 0ExecMasterLog_Pos: 444060359RelayLogSpace: 180287882098Until_Condition: NoneUntilLogFile:UntilLogPos: 0MasterSSLAllowed: NoMasterSSLCA_File:MasterSSLCA_Path:MasterSSLCert:MasterSSLCipher:MasterSSLKey:SecondsBehindMaster: 179221MasterSSLVerifyServerCert: NoLastIOErrno: 0LastIOError:LastSQLErrno: 0LastSQLError:ReplicateIgnoreServer_Ids:MasterServerId: 42Master_UUID: eab7fcac-3cda-11e6-ada8-fa163e648db2MasterInfoFile: mysql.slavemasterinfoSQL_Delay: 0SQLRemainingDelay: NULLSlaveSQLRunning_State: Waiting for dependent transaction to commitMasterRetryCount: 86400Master_Bind:LastIOError_Timestamp:LastSQLError_Timestamp:MasterSSLCrl:MasterSSLCrlpath:RetrievedGtidSet: eab7fcac-3cda-11e6-ada8-fa163e648db2:69497322-107886661ExecutedGtidSet: 1264a536-da12-11e9-81ea-005056856ba5:1,eab7fcac-3cda-11e6-ada8-fa163e648db2:1-71980857Auto_Position: 1ReplicateRewriteDB:Channel_Name:MasterTLSVersion:1 row in set (0.00 sec)

    咱們能夠看到net.fcurrentxxx表已經被忽略複製。持續觀察一段時間後,SecondsBehind_Master在逐漸縮小。中繼日誌應用速度大約5分鐘一個(每一箇中繼日誌爲1GB大小),而主庫binlog日誌大約爲10分鐘一個(每一個binlog日誌爲1GB大小)。

  • 總結
  • 在梳理了整個處理過程後,其實難度不高,主要是要細心,細心去排查每個想到的點。 在非輕量級的數據庫中問題發生的機率也會隨着量級的增長而增多。而這偏偏是可以磨練我的的成長。

    同時,知識的儲備也要充足,這是進階高手的必要前提!

    相關文章
    相關標籤/搜索