mysql複製那點事 - Seconds_behind_Master參數調查筆記

mysql複製那點事 - Seconds_behind_Master參數調查筆記


0. 參考文獻

序號 文獻
1 MySQL 5.7 MTS源碼分析
2 MySQL · 答疑解惑 · 備庫Seconds_Behind_Master計算
3 MySQL · 答疑釋惑 · server_id爲0的Rotate
4 【案例】主從替換以後的複製風暴
5 MySQL 複製源碼解析
6 MySQL主庫02設置宕機致使的主庫數據丟失解決方法和緣由
7 Mysql rpl_slave.cc:handle_slave_io 源碼的一些我的分析
8 mysql slave 備庫延遲是怎麼獲得的
9 MySQL 編譯安裝而且開啓DEBUG模式
10 Bug #72376:Seconds_behind_master distorted because of previous_gtid event
11 MySQL Binlog解析(1)
12 理解MySQL——複製(Replication)

ps:如上的參考資料並不是徹底在本文中被引用,其餘沒有被引用的資料在本文的造成的工做中,提供了思路上和其餘方面的參考,所以一併列入參考文獻。感謝如上的文獻做者提供的參考。html


1. 問題背景

部門當前的數據庫架構是雙主模式,既線上由2臺互爲主從的數據庫搭建而成的集羣。高可用經過vip和headbeat來作保證。一般狀況下,vip掛在主(本文稱之爲da)上,當da發生了異常好比宕機等問題的時候,vip自動漂移至從(本文稱之爲dp)。架構以下圖所示:mysql

dadp同步圖

能夠看出,全部的寫入和讀取操做都在da上進行。da產生的binlog會被dp拉取重放。同時dp也會產生binlog並被da拉取重放(注:這裏不會產生循環複製(既da產生的binlog事件在dp執行過了以後,再被da讀取到再執行),由於binlog中每一個event都有個serverid標記,標記是哪一個server產生的事物。當da讀取到binlog的時候,發現serverid和本機的serverid一致,便會跳過當前的binlog event)。正常狀況下,da的Seconds_behind_Master參數,應該保持在0或者很小的數值纔對。可是監控系統發現,da的Seconds_behind_Master參數常常發生跳變,上一秒爲0下一秒就可能爲一個很大值(好比10000)。例以下面的數據,是經過腳本抓取到的線上Seconds_behind_Master狀況:git

2019-06-27 13:26:00
          Read_Master_Log_Pos: 472851480
               Relay_Log_File: da01-relay-bin.002323
                Relay_Log_Pos: 371
        Relay_Master_Log_File: dp01-bin.004267
          Exec_Master_Log_Pos: 472851480
        Seconds_Behind_Master: 0
2019-06-27 13:26:01
          Read_Master_Log_Pos: 473594815
               Relay_Log_File: da01-relay-bin.002323
                Relay_Log_Pos: 371
        Relay_Master_Log_File: dp01-bin.004267
          Exec_Master_Log_Pos: 473594649
        Seconds_Behind_Master: 622
2019-06-27 13:26:02
          Read_Master_Log_Pos: 474422859
               Relay_Log_File: da01-relay-bin.002323
                Relay_Log_Pos: 371
        Relay_Master_Log_File: dp01-bin.004267
          Exec_Master_Log_Pos: 474422859
        Seconds_Behind_Master: 0

2. 調查結論

前文講到了本文調查的問題背景,在這裏先給出下整個問題的調查結論。在sql/rpl_slave.cc 中計算Seconds_Behind_Master邏輯以下 (文獻2):github

if ((mi->get_master_log_pos() == mi->rli->get_group_master_log_pos()) && (!strcmp(mi->get_master_log_name(), mi->rli->get_group_master_log_name()))) { 
    if (mi->slave_running == MYSQL_SLAVE_RUN_CONNECT) 
    protocol->store(0LL); 
    else 
    protocol->store_null(); 
} else { 
  long time_diff= ((long)(time(0) - mi->rli->last_master_timestamp) - mi->clock_diff_with_master);

  protocol->store((longlong)(mi->rli->last_master_timestamp ? max(0L, time_diff) : 0)); 
}

當Exec_Master_Log_Pos <Read_Master_Log_Pos 的時候進入else裏面。在da的io進程拉取dp的binlog的時候,會發現dp的binlog的serverid是自身產生的。 因此不會去更新last_master_timestamp。若是dp的binlog發生過輪轉,會產生一個ROTATE_EVENT事件更新這個last_master_timestamp。所以當sql進程比io進程慢的時候, 計算出來的time_diff 即是當前時間與上次ROTATE_EVENT事件的差值。復現方法:sql

  1. dp上flush logs
  2. da上stop slave sql_thread;
  3. da上DML數據。此時觀察show slave status 能夠看到Read_Master_Log_Pos 很超前。
  4. da上start slave ;show slave status \G 能夠看到Seconds_Behind_Master變大。

3. 調查與分析過程

從前文的結果中能夠看出,跟Seconds_Behind_Master(下文簡稱SBM)值有關的mysql 變量就是last_master_timestamp。所以須要追蹤下在收到的binlog的serverid和mysql進程的serverid一致的狀況下,last_master_timestamp是如何更新的。根據前文的復現方法,須要去確認以下的2個點:shell

  1. dp上flush logs的時候,da獲取到dp的binglog作的操做。
  2. 收到的serverid和本機一致的binlog的時候,da的mysql進程的操做。

爲了便於觀察mysql的運行邏輯,本文參考文獻9的辦法,編譯了一個debug版本。在mysql運行的過程當中,tail -f mysqld.trace文件觀察mysql命令執行的過程。數據庫

3.1 輪轉binlog時的運行邏輯

3.1.1 寫入ROTATE_EVENT

在dp 執行flush logs 或者輪轉binlog的時候,會產生一個ROTATE_EVENT事件(具體的解析能夠參考文獻11)。在mysqld.trace文件中,能夠觀察到以下的信息,標誌着da此時接收到了dp的binlog輪轉事件的ROTATE_EVENT。緩存

handle_slave_io: info: IO thread received event of type Rotate

相應的代碼在sql/rpl_slave.cc文件的5819行中:架構

5817       THD_STAGE_INFO(thd, stage_queueing_master_event_to_the_relay_log);
 5818       event_buf= (const char*)mysql->net.read_pos + 1;
 5819       DBUG_PRINT("info", ("IO thread received event of type %s",
 5820                  Log_event::get_type_str(
 5821                             (Log_event_type)event_buf[EVENT_TYPE_OFFSET])));
          
          ... ...
              
 5831
 5832       /* XXX: 'synced' should be updated by queue_event to indicate
 5833          whether event has been synced to disk */
 5834       bool synced= 0;
 5835       if (queue_event(mi, event_buf, event_len))
 5836       {
 5837         mi->report(ERROR_LEVEL, ER_SLAVE_RELAY_LOG_WRITE_FAILURE,
 5838                    ER(ER_SLAVE_RELAY_LOG_WRITE_FAILURE),
 5839                    "could not queue event from master");
 5840         goto err;
 5841       }

能夠看到在sql進程接收到一個binlog以後,會調用queue_event進行寫入relaylog。queue_event處理ROTATE_EVENT關鍵代碼以下:

8236  switch (event_type) {
         
      ... ...
           
 8251  case binary_log::ROTATE_EVENT:
 8252  {
 8253     Rotate_log_event rev(buf, checksum_alg != binary_log::BINLOG_CHECKSUM_ALG_OFF ?
 8254                          event_len - BINLOG_CHECKSUM_LEN : event_len,
 8255                          mi->get_mi_description_event());
 8256 
 8257     if (unlikely(process_io_rotate(mi, &rev)))
 8258     {
 8259       mi->report(ERROR_LEVEL, ER_SLAVE_RELAY_LOG_WRITE_FAILURE,
 8260                  ER(ER_SLAVE_RELAY_LOG_WRITE_FAILURE),
 8261                  "could not queue event from master");
 8262       goto err;
 8263     }

在switch中判斷binlog若是是ROTATE_EVENT,則調用process_io_rotate進行處理。其中buf是sql進行拉取到的binlog的緩存,mi是表明了masterinfo。process_io_rotate主要的工做是更新master_info 的相關信息,例以下一個binglog的位置,master的binlog位置信息等。能夠在mysqld.trace 文件中觀察到以下的信息:

process_io_rotate: info: new (master_log_name, master_log_pos): ('dp-bin.000029', 4)

process_io_rotate函數註釋信息以下:

7767 /**
 7768   Used by the slave IO thread when it receives a rotate event from the
 7769   master.
 7770
 7771   Updates the master info with the place in the next binary log where
 7772   we should start reading.  Rotate the relay log to avoid mixed-format
 7773   relay logs.
 7774
 7775   @param mi master_info for the slave
 7776   @param rev The rotate log event read from the master
 7777
 7778   @note The caller must hold mi->data_lock before invoking this function.
 7779
 7780   @retval 0 ok
 7781   @retval 1 error
 <!--7782 */-->
 7783 static int process_io_rotate(Master_info *mi, Rotate_log_event *rev)
     
... ...
     
 7830 }

在完成相關的工做以後,來到queue_event函數的最後,寫入ROTATE_EVENT,在mysqld.trace中能夠看到以下的信息:

queue_event: info: master_log_pos: 4
harvest_bytes_written: info: counter: 769  bytes_written: 44
queue_event: info: error: 0

相關代碼以下:

8656   {
 8657     bool is_error= false;
 8658     /* write the event to the relay log */
 8659     if (likely(rli->relay_log.append_buffer(buf, event_len, mi) == 0))
 8660     {
 8661       mi->set_master_log_pos(mi->get_master_log_pos() + inc_pos);
 8662       DBUG_PRINT("info", ("master_log_pos: %lu", (ulong) mi->get_master_log_pos()));
 8663       rli->relay_log.harvest_bytes_written(rli, true/*need_log_space_lock=true*/);
     
          ... ...
              
          }
         }

到此,ROTATE_EVENT事件被徹底寫入到了relaylog中。借用一張經典mysql複製原理圖(文獻12)說明這個過程,即完成了圖中從BinaryLog 讀取到RelayLog的寫入過程。

mysql03-1

3.1.2 sql 進程讀取和執行ROTATE_EVENT

在前文的描述中,能夠觀察到在RelayLog的寫入過程當中並無去更新last_master_timestamp相關信息。所以更新last_master_timestamp的工做,只可能在sql進程中去完成。接下來本節將探索下ROTATE_EVENT在sql進程中的執行過程。

mysql複製代碼中,sql進程的入口函數是handle_slave_sql:

7158 /**
 7159   Slave SQL thread entry point.
 7160 
 7161   @param arg Pointer to Relay_log_info object that holds information
 7162   for the SQL thread.
 7163 
 7164   @return Always 0.
 7165 */
 7166 extern "C" void *handle_slave_sql(void *arg)

在handle_slave_sql中,關鍵的代碼是while (!sql_slave_killed(thd,rli))這個循環:

7438   while (!sql_slave_killed(thd,rli))
 7439   {
 7440     THD_STAGE_INFO(thd, stage_reading_event_from_the_relay_log);
 7441     DBUG_ASSERT(rli->info_thd == thd);
 7442     THD_CHECK_SENTRY(thd);
 7443
 7444     if (saved_skip && rli->slave_skip_counter == 0)
 7445     {
     
             ... ...
                 
 7456     }
 7457
 7458     if (exec_relay_log_event(thd,rli))
 7459     {
 7460        ... ...
 7525       }
 7526       goto err;
 7527     }
 7528   }

能夠發現,每次循環的時候,都會去執行exec_relay_log_event這個函數:

5098   Top-level function for executing the next event in the relay log.
 5099   This is called from the SQL thread.
 5100
 5101   This function reads the event from the relay log, executes it, and
 5102   advances the relay log position.  It also handles errors, etc.
 5103
 5104   This function may fail to apply the event for the following reasons:
 5105
 5106    - The position specfied by the UNTIL condition of the START SLAVE
 5107      command is reached.
 5108
 5109    - It was not possible to read the event from the log.
 5110
 5111    - The slave is killed.
 5112
 5113    - An error occurred when applying the event, and the event has been
 5114      tried slave_trans_retries times.  If the event has been retried
 5115      fewer times, 0 is returned.
 5116
 5117    - init_info or init_relay_log_pos failed. (These are called
 5118      if a failure occurs when applying the event.)
 5119
 5120    - An error occurred when updating the binlog position.
 5121
 5122   @retval 0 The event was applied.
 5123
 5124   @retval 1 The event was not applied.
 5125 */
 5126 static int exec_relay_log_event(THD* thd, Relay_log_info* rli)

在exec_relay_log_event中,執行對應的relaylog以後,會相應的更新last_master_timestamp。在這裏並行複製模式和非並行複製模式下,更新last_master_timestamp的方式是不一樣的。在這裏先介紹下非並行複製模式下更新last_master_timestamp的步驟。

3.1.2.1 非並行複製模式下更新last_master_timestamp

在exec_relay_log_event中判斷是不是並行複製是經過is_parallel_exec函數實現的。若是是並行複製模式則返回True,不然返回False。在非並行複製的模式下,會進入以下的代碼執行:

5175     /*
 5176       Even if we don't execute this event, we keep the master timestamp,
 5177       so that seconds behind master shows correct delta (there are events
 5178       that are not replayed, so we keep falling behind).
 5179
 5180       If it is an artificial event, or a relay log event (IO thread generated
 5181       event) or ev->when is set to 0, or a FD from master, or a heartbeat
 5182       event with server_id '0' then  we don't update the last_master_timestamp.
 5183
 5184       In case of parallel execution last_master_timestamp is only updated when
 5185       a job is taken out of GAQ. Thus when last_master_timestamp is 0 (which
 5186       indicates that GAQ is empty, all slave workers are waiting for events from
 5187       the Coordinator), we need to initialize it with a timestamp from the first
 5188       event to be executed in parallel.
 5189     */
 5190     if ((!rli->is_parallel_exec() || rli->last_master_timestamp == 0) &&
 5191          !(ev->is_artificial_event() || ev->is_relay_log_event() ||
 5192           (ev->common_header->when.tv_sec == 0) ||
 5193           ev->get_type_code() == binary_log::FORMAT_DESCRIPTION_EVENT ||
 5194           ev->server_id == 0))
 5195     {
 5196       rli->last_master_timestamp= ev->common_header->when.tv_sec +
 5197                                   (time_t) ev->exec_time;
 5198       DBUG_ASSERT(rli->last_master_timestamp >= 0);
 5199     }

能夠看到,last_master_timestamp的值即是ev->common_header->when.tv_sec和(time_t) ev->exec_time的和。在代碼中加上點調試信息,從新編譯:

DBUG_PRINT("info", ("before rli->last_master_timestamp = %lu", rli->last_master_timestamp));
      DBUG_PRINT("info", ("before rli->is_parallel_exec() = %d", int(rli->is_parallel_exec())));
    if ((!rli->is_parallel_exec() || rli->last_master_timestamp == 0) &&
         !(ev->is_artificial_event() || ev->is_relay_log_event() ||
          (ev->common_header->when.tv_sec == 0) ||
          ev->get_type_code() == binary_log::FORMAT_DESCRIPTION_EVENT ||
          ev->server_id == 0))
    {
      rli->last_master_timestamp= ev->common_header->when.tv_sec +
                                  (time_t) ev->exec_time;

      DBUG_PRINT("info", ("after rli->last_master_timestamp = %lu", rli->last_master_timestamp));
      DBUG_PRINT("info", ("after rli->is_parallel_exec() = %d", int(rli->is_parallel_exec())));
      DBUG_ASSERT(rli->last_master_timestamp >= 0);
    }

能夠在mysqld.trace中觀察到以下的調試信息,印證了前文關於非並行複製的結論。同時也能夠看出,非並行複製下last_master_timestamp的更新是在binlogevent執行以前。

exec_relay_log_event: info: before rli->last_master_timestamp = 0
exec_relay_log_event: info: before rli->is_parallel_exec() = 0
exec_relay_log_event: info: after rli->last_master_timestamp = 1562201989
exec_relay_log_event: info: after rli->is_parallel_exec() = 0

非並行複製更新

3.1.2.2 並行複製模式下更新last_master_timestamp

前文介紹了非並行複製模式下的last_master_timestamp的更新邏輯。本節將介紹下在並行複製模式下,last_master_timestamp是如何更新的。

在並行複製模式下last_master_timestamp的更新邏輯更爲複雜。同時對於ROTATE_EVENT事件和普通的binlog事件更新模式也有所不一樣。接下來首先介紹下普通的binlog事件下last_master_timestamp的更新邏輯:

3.1.2.2.1 普通的binlog事件下last_master_timestamp的更新邏輯

文獻1文獻2中對於並行複製邏輯的分析以下:

​ 並行複製有一個分發隊列gaq,sql線程將binlog事務讀取到gaq,而後再分發給worker線程執行。並行複製時,binlog事件是併發穿插執行的,gaq中有一個checkpoint點稱爲lwm, lwm以前的binlog都已經執行,而lwm以後的binlog有些執行有些沒有執行。 假設worker線程數爲2,gap有1,2,3,4,5,6,7,8個事務。worker 1已執行的事務爲1 4 6, woker 2執行的事務爲2 3 ,那麼lwm爲4。

​ 並行複製更新gap checkpiont時,會推動lwm點,同時更新last_master_timestamp爲lwm所在事務結束的event的時間。所以,並行複製是在事務執行完成後才更新last_master_timestamp,更新是以事務爲單位。同時更新gap checkpiont還受slave_checkpoint_period參數的影響。

​ coordinator會調用函數mts_checkpoint_routune按期作「checkpoint」操做,將Relay_log_info.gaq中已經執行結束的事務移出隊列,移除方式爲從隊列頭開始檢查,若是done爲真則移出,不然中止掃描,並標記Low Water Mark爲移出的事務中最大的sequence_number(lock-interval終止點)

能夠看出,並行複製下last_master_timestamp的更新邏輯與lwm和mts_checkpoint_routune函數有關。mts_checkpoint_routune函數頭以下:

/**
   Processing rli->gaq to find out the low-water-mark (lwm) coordinates
   which is stored into the cental recovery table.

   @param rli            pointer to Relay-log-info of Coordinator
   @param period         period of processing GAQ, normally derived from
                         @c mts_checkpoint_period
   @param force          if TRUE then hang in a loop till some progress
   @param need_data_lock False if rli->data_lock mutex is aquired by
                         the caller.

   @return FALSE success, TRUE otherwise
*/
bool mts_checkpoint_routine(Relay_log_info *rli, ulonglong period,
                            bool force, bool need_data_lock)

在coordinator中會按期調用mts_checkpoint_routine將Relay_log_info.gaq中已經執行結束的事務移出隊列,而且更新Low Water Mark爲移出的事務中最大的sequence_number(參見mts_checkpoint_routine中以下):

do
  {
    if (!is_mts_db_partitioned(rli))
      mysql_mutex_lock(&rli->mts_gaq_LOCK);

    cnt= rli->gaq->move_queue_head(&rli->workers);

    if (!is_mts_db_partitioned(rli))
      mysql_mutex_unlock(&rli->mts_gaq_LOCK);
#ifndef DBUG_OFF
    if (DBUG_EVALUATE_IF("check_slave_debug_group", 1, 0) &&
        cnt != opt_mts_checkpoint_period)
      sql_print_error("This an error cnt != mts_checkpoint_period");
#endif
  } while (!sql_slave_killed(rli->info_thd, rli) &&
           cnt == 0 && force &&
           !DBUG_EVALUATE_IF("check_slave_debug_group", 1, 0) &&
           (my_sleep(rli->mts_coordinator_basic_nap), 1));

在函數的最後,會執行last_master_timestamp的更新邏輯:

/*
    Update the rli->last_master_timestamp for reporting correct Seconds_behind_master.

    If GAQ is empty, set it to zero.
    Else, update it with the timestamp of the first job of the Slave_job_queue
    which was assigned in the Log_event::get_slave_worker() function.
  */
  ts= rli->gaq->empty()
    ? 0
    : reinterpret_cast<Slave_job_group*>(rli->gaq->head_queue())->ts;
  rli->reset_notified_checkpoint(cnt, ts, need_data_lock, true);

能夠看出若是GAP爲空(既work的隊列爲空)則更新last_master_timestamp爲0,不然更新爲隊列中第一個是元素的timestamp。

3.1.2.2.2 ROTATE_EVENT事件下last_master_timestamp的更新模式

前文提到在並行複製模式下,普通的binlog事件會被coordinator進程分發到work中去執行。而且在mts_checkpoint_routine中去推動Low Water Mark和更新last_master_timestamp。可是若是執行的binlog是一個ROTATE_EVENT事件,則coordinator進程不會將事件分發到work中,而是在coordinator進程中本身執行並更新last_master_timestamp。

對於ROTATE_EVENT事件的執行邏輯,入口依然是exec_relay_log_event函數。在函數中調用了apply_event_and_update_pos函數執行binlog事件。

5247     /* ptr_ev can change to NULL indicating MTS coorinator passed to a Worker */
 5248     exec_res= apply_event_and_update_pos(ptr_ev, thd, rli);
 5249     /*
 5250       Note: the above call to apply_event_and_update_pos executes
 5251       mysql_mutex_unlock(&rli->data_lock);
 5252     */
 5253
 5254     /* For deferred events, the ptr_ev is set to NULL
 5255         in Deferred_log_events::add() function.
 5256         Hence deferred events wont be deleted here.
 5257         They will be deleted in Deferred_log_events::rewind() funciton.
 5258     */

在apply_event_and_update_pos函數中,會首先調用apply_event去判斷當前的binlog事件是否能夠被分發到work進程執行。在這裏收到的binlog事件是ROTATE_EVENT,所以apply_event返回的exec_res爲0,ev->worker==rli(rli 表明了coordinator進程),所以不會進入下面的if邏輯中:

4709   if (reason == Log_event::EVENT_SKIP_NOT)
 4710   {
 4711     // Sleeps if needed, and unlocks rli->data_lock.
 4712     if (sql_delay_event(ev, thd, rli))
 4713       DBUG_RETURN(SLAVE_APPLY_EVENT_AND_UPDATE_POS_OK);
 4714
 4715     exec_res= ev->apply_event(rli);
 4717     if (!exec_res && (ev->worker != rli))
 4718     {
                        ... ...
          }

在mysqld.trace 中能夠觀察到以下的輸出:

Log_event::shall_skip: info: skip reason=0=NOT
LOG_EVENT:apply_event: info: event_type=Rotate
apply_event_and_update_pos: info: apply_event error = 0

apply_event函數定義和註釋以下:

/**
   Scheduling event to execute in parallel or execute it directly.
   In MTS case the event gets associated with either Coordinator or a
   Worker.  A special case of the association is NULL when the Worker
   can't be decided yet.  In the single threaded sequential mode the
   event maps to SQL thread rli.

   @note in case of MTS failure Coordinator destroys all gathered
         deferred events.

   @return 0 as success, otherwise a failure.
*/
int Log_event::apply_event(Relay_log_info *rli)

由於沒法被分發到work執行,所以會進入以下的執行邏輯中:

4832   DBUG_PRINT("info", ("apply_event error = %d", exec_res));
 4833   if (exec_res == 0)
 4834   {
 4835     /*
 4836       Positions are not updated here when an XID is processed. To make
 4837       a slave crash-safe, positions must be updated while processing a
 4838       XID event and as such do not need to be updated here again.
 4839
 4840       However, if the event needs to be skipped, this means that it
 4841       will not be processed and then positions need to be updated here.
 4842
 4843       See sql/rpl_rli.h for further details.
 4844     */
 4845     int error= 0;
 4846     if (*ptr_ev &&
 4847         (ev->get_type_code() != binary_log::XID_EVENT ||
 4848          skip_event || (rli->is_mts_recovery() && !is_gtid_event(ev) &&
 4849          (ev->ends_group() || !rli->mts_recovery_group_seen_begin) &&
 4850           bitmap_is_set(&rli->recovery_groups, rli->mts_recovery_index))))
 4851     {
 4852 #ifndef DBUG_OFF
 4853       /*
 4854         This only prints information to the debug trace.
 4855
 4856         TODO: Print an informational message to the error log?
 4857       */
            ... ...
   
 4873       error= ev->update_pos(rli);
                    
                    ... ...
           }
         }

能夠發現,在此調用了update_pos進行更新。由於當前的binlog事件是ROTATE_EVENT,所以會調用以下的代碼:

/*
  Got a rotate log event from the master.

  This is mainly used so that we can later figure out the logname and
  position for the master.

  We can't rotate the slave's BINlog as this will cause infinitive rotations
  in a A -> B -> A setup.
  The NOTES below is a wrong comment which will disappear when 4.1 is merged.

  This must only be called from the Slave SQL thread, since it calls
  flush_relay_log_info().

  @retval
    0   ok
*/
int Rotate_log_event::do_update_pos(Relay_log_info *rli)

在do_update_pos 更新last_master_timestamp 的關鍵邏輯以下:

if (rli->is_parallel_exec())
{
      bool real_event= server_id && !is_artificial_event();
      rli->reset_notified_checkpoint(0,
                           real_event ?
                           common_header->when.tv_sec +
                           (time_t) exec_time : 0,
                           true/*need_data_lock=true*/,
                           real_event? true : false);
}

能夠在mysqld.trace 中觀察到以下的輸出:

Rotate_log_event::do_update_pos: info: server_id=248; ::server_id=236
Rotate_log_event::do_update_pos: info: new_log_ident: dp-bin.000063
Rotate_log_event::do_update_pos: info: pos: 4
Rotate_log_event::do_update_pos: info: old group_master_log_name: 'dp-bin.000062'  old group_master_log_pos: 154
Relay_log_info::inc_group_relay_log_pos: info: log_pos: 4  group_master_log_pos: 154
Rotate_log_event::do_update_pos: info: new group_master_log_name: 'dp-bin.000063'  new group_master_log_pos: 4
Rotate_log_event::do_update_pos: info: ------> 1562684581(此處爲本文加入的調試信息,在reset_notified_checkpoint更新last_master_timestamp的時候打印出。)

ROTATE_EVENT事件下last_master_timestamp的更新模式

至此,對於並行複製和非並行複製下當收到ROTATE_EVENT事件時,更新last_master_timestamp的邏輯分析所有完成。

3.2 da執行與本實例serverid一致的binlog時的運行邏輯

前文分析了當mysql收到一個ROTATE_EVENT事件的時候所運行的邏輯。本節將分析下在mysql收到與本身的serverid一致的binlog事件的時候的運行邏輯。

3.2.1 io thread 處理與本實例serverid一致的binlog

首先來看下io thread 接收到與自己serverid一致的binlog的時候所作的操做。在handle_slave_io函數中,會在while循環中不斷的調用queue_even函數。

5740     while (!io_slave_killed(thd,mi))
 5741     {
 5742       ulong event_len;
           ... ...
 5750       event_len= read_event(mysql, mi, &suppress_warnings);
          
           ... ...
             
 5813       /* XXX: 'synced' should be updated by queue_event to indicate
 5814          whether event has been synced to disk */
 5815       bool synced= 0;
 5816       if (queue_event(mi, event_buf, event_len))
 5817       {
 5818         mi->report(ERROR_LEVEL, ER_SLAVE_RELAY_LOG_WRITE_FAILURE,
 5819                    ER(ER_SLAVE_RELAY_LOG_WRITE_FAILURE),
 5820                    "could not queue event from master");
 5821         goto err;
 5822       }   
           ... ...
          }

在queue_even函數中,對於收到與本身serverid一致的binlog的處理邏輯以下:

8534   /*
 8535      If this event is originating from this server, don't queue it.
 8536      We don't check this for 3.23 events because it's simpler like this; 3.23
 8537      will be filtered anyway by the SQL slave thread which also tests the
 8538      server id (we must also keep this test in the SQL thread, in case somebody
 8539      upgrades a 4.0 slave which has a not-filtered relay log).
 8540
 8541      ANY event coming from ourselves can be ignored: it is obvious for queries;
 8542      for STOP_EVENT/ROTATE_EVENT/START_EVENT: these cannot come from ourselves
 8543      (--log-slave-updates would not log that) unless this slave is also its
 8544      direct master (an unsupported, useless setup!).
 8545   */
   
        ... ...
   
 8560   if ((s_id == ::server_id && !mi->rli->replicate_same_server_id) ||
 8561       /*
 8562         the following conjunction deals with IGNORE_SERVER_IDS, if set
 8563         If the master is on the ignore list, execution of
 8564         format description log events and rotate events is necessary.
 8565       */
 8566       (mi->ignore_server_ids->dynamic_ids.size() > 0 &&
 8567        mi->shall_ignore_server_id(s_id) &&
 8568        /* everything is filtered out from non-master */
 8569        (s_id != mi->master_id ||
 8570         /* for the master meta information is necessary */
 8571         (event_type != binary_log::FORMAT_DESCRIPTION_EVENT &&
 8572          event_type != binary_log::ROTATE_EVENT))))
 8573   {
 8574     /*
 8575       Do not write it to the relay log.
 8576       a) We still want to increment mi->get_master_log_pos(), so that we won't
 8577       re-read this event from the master if the slave IO thread is now
 8578       stopped/restarted (more efficient if the events we are ignoring are big
 8579       LOAD DATA INFILE).
 8580       b) We want to record that we are skipping events, for the information of
 8581       the slave SQL thread, otherwise that thread may let
 8582       rli->group_relay_log_pos stay too small if the last binlog's event is
 8583       ignored.
 8584       But events which were generated by this slave and which do not exist in
 8585       the master's binlog (i.e. Format_desc, Rotate & Stop) should not increment
 8586       mi->get_master_log_pos().
 8587       If the event is originated remotely and is being filtered out by
 8588       IGNORE_SERVER_IDS it increments mi->get_master_log_pos()
 8589       as well as rli->group_relay_log_pos.
 8590     */
 8591     if (!(s_id == ::server_id && !mi->rli->replicate_same_server_id) ||
 8592         (event_type != binary_log::FORMAT_DESCRIPTION_EVENT &&
 8593          event_type != binary_log::ROTATE_EVENT &&
 8594          event_type != binary_log::STOP_EVENT))
 8595     {
 8596       mi->set_master_log_pos(mi->get_master_log_pos() + inc_pos);
 8597       memcpy(rli->ign_master_log_name_end, mi->get_master_log_name(), FN_REFLEN);
 8598       DBUG_ASSERT(rli->ign_master_log_name_end[0]);
 8599       rli->ign_master_log_pos_end= mi->get_master_log_pos();
 8600     }
 8601     rli->relay_log.signal_update(); // the slave SQL thread needs to re-check
 8602     DBUG_PRINT("info", ("master_log_pos: %lu, event originating from %u server, ignored",
 8603                         (ulong) mi->get_master_log_pos(), uint4korr(buf + SERVER_ID_OFFSET)));
 8604   }

結合註釋,能夠看出當mysql收到與本實例serverid一致的binlog的時候,不會將當前的binlog事件寫入relaylog中。同時會完成以下的2個事情:(ps :這是一個關鍵的地方,在下文中將提到它的做用)

  1. 將mi->get_master_log_name() 拷貝到rli->ign_master_log_name_end中。
  2. rli->relay_log.signal_update()。此處會新signal_cnt這個變量的值。

在mysqld.trace 中能夠觀察到以下的日誌信息:

queue_event: info: master_log_pos: 219, event originating from 236 server, ignored
queue_event: info: error: 0

3.2.2 sql thread 處理與本實例serverid一致的binlog

接下里看看sql進程對binlog的處理過程。入口的函數仍是exec_relay_log_event,在exec_relay_log_event函數中會調用next_event函數獲取一個可執行的binlog事件。在這裏next_event對於非並行複製會有一個特殊的處理:

9175         if (!rli->is_parallel_exec())
 9176           rli->last_master_timestamp= 0;

若是是非並行複製,則當讀取一個binlog的時候,都會把last_master_timestamp設置成0 。所以在非並行複製下,收到與本實例serverid一致的binlog的時候,mysqld.trace中能夠觀察到rli->last_master_timestamp的值會一直爲0:

next_event: info: seeing an ignored end segment
handle_slave_io: info: IO thread received event of type Query
exec_relay_log_event: info: ================================before rli->last_master_timestamp = 0
exec_relay_log_event: info: ================================before rli->is_parallel_exec() = 0
apply_event_and_update_pos: info: thd->options: ; rli->last_event_start_time: 0
Log_event::shall_skip: info: ev->server_id=0, ::server_id=236, rli->replicate_same_server_id=0, rli->slave_skip_counter=0
Log_event::shall_skip: info: skip reason=0=NOT
LOG_EVENT:apply_event: info: event_type=Rotate

... ...
... ...

exec_relay_log_event: info: ================================before rli->last_master_timestamp = 0
exec_relay_log_event: info: ================================before rli->is_parallel_exec() = 0
apply_event_and_update_pos: info: thd->options: ; rli->last_event_start_time: 0
Log_event::shall_skip: info: ev->server_id=0, ::server_id=236, rli->replicate_same_server_id=0, rli->slave_skip_counter=0
Log_event::shall_skip: info: skip reason=0=NOT
LOG_EVENT:apply_event: info: event_type=Rotate
apply_event_and_update_pos: info: apply_event error = 0
apply_event_and_update_pos: info: OPTION_BEGIN: 0; IN_STMT: 0
Rotate_log_event::do_update_pos: info: server_id=0; ::server_id=236
Rotate_log_event::do_update_pos: info: new_log_ident: dp-bin.000069
Rotate_log_event::do_update_pos: info: pos: 282
MYSQL_BIN_LOG::signal_update: info: signal_cnt : 13
queue_event: info: master_log_pos: 326, event originating from 236 server, ignored
queue_event: info: error: 0

以後的邏輯中,若是發現rli->ign_master_log_name_end[0]不爲空(對應了上一個小節中的第一項:將mi->get_master_log_name() 拷貝到rli->ign_master_log_name_end中),則構造一個serverid爲0的Rotate event並返回:

9181         if (rli->ign_master_log_name_end[0])
 9182         {
 9183           /* We generate and return a Rotate, to make our positions advance */
 9184           DBUG_PRINT("info",("seeing an ignored end segment"));
 9185           ev= new Rotate_log_event(rli->ign_master_log_name_end,
 9186                                    0, rli->ign_master_log_pos_end,
 9187                                    Rotate_log_event::DUP_NAME);
 9188           rli->ign_master_log_name_end[0]= 0;
 9189           mysql_mutex_unlock(log_lock);
 9190           if (unlikely(!ev))
 9191           {
 9192             errmsg= "Slave SQL thread failed to create a Rotate event "
 9193               "(out of memory?), SHOW SLAVE STATUS may be inaccurate";
 9194             goto err;
 9195           }
 9196           ev->server_id= 0; // don't be ignored by slave SQL thread
 9197           DBUG_RETURN(ev);
 9198         }

緊接着調用 apply_event_and_update_pos函數。在apply_event_and_update_pos函數中,如前所述若是是ROTATE EVENT 則返回0,不會被work進程並行執行,並進入update_pos邏輯中。在update_pos邏輯中關鍵的代碼以下(sql/log_event.cc 文件):

6649     if (rli->is_parallel_exec())
 6650     {
 6651       bool real_event= server_id && !is_artificial_event();
 6652       rli->reset_notified_checkpoint(0,
 6653                            real_event ?
 6654                            common_header->when.tv_sec +
 6655                            (time_t) exec_time : 0,
 6656                            true/*need_data_lock=true*/,
 6657                            real_event? true : false);
 6658     }

能夠見得real_event在server_id是0 的時候爲false。所以當進入reset_notified_checkpoint函數後,由於update_timestamp條件(傳入的值即是real_event的值)爲false便不會更新last_master_timestamp。

262 /**
 263    This method is called in mts_checkpoint_routine() to mark that each
 264    worker is required to adapt to a new checkpoint data whose coordinates
 265    are passed to it through GAQ index.
 266
 267    Worker notices the new checkpoint value at the group commit to reset
 268    the current bitmap and starts using the clean bitmap indexed from zero
 269    of being reset checkpoint_seqno.
 270
 271     New seconds_behind_master timestamp is installed.
 272
 273    @param shift            number of bits to shift by Worker due to the
 274                            current checkpoint change.
 275    @param new_ts           new seconds_behind_master timestamp value
 276                            unless zero. Zero could be due to FD event
 277                            or fake rotate event.
 278    @param need_data_lock   False if caller has locked @c data_lock
 279    @param update_timestamp if true, this function will update the
 280                            rli->last_master_timestamp.
 281 */
 282 void Relay_log_info::reset_notified_checkpoint(ulong shift, time_t new_ts,
 283                                                bool need_data_lock,
 284                                                bool update_timestamp)
 285 {
 
       ... ...
       
 330   if (update_timestamp)
 331   {
 332     if (need_data_lock)
 333       mysql_mutex_lock(&data_lock);
 334     else
 335       mysql_mutex_assert_owner(&data_lock);
 336     last_master_timestamp= new_ts;
 337     if (need_data_lock)
 338       mysql_mutex_unlock(&data_lock);
 339   }
 340 }

sql thread 處理與本實例serverid一致的binlog

3.3 小結

從上文的分析能夠得出以下的2個結論 :

  1. 並行複製下會發現last_master_timestamp的值,一直是ROTATE EVENT發生的時間。當Exec_Master_Log_Pos < Read_Master_Log_Pos的時候,便會發現Seconds_Behind_Master的值突變(突變的值是當前時間和ROTATE EVENT發生的時間差的秒數)
  2. 在非並行複製下,由於 rli->last_master_timestamp會在next_event被置爲0。同時以後的代碼也不會去更新這個參數的值,所以rli->last_master_timestamp一直爲0 。show slave status的時候,當Exec_Master_Log_Pos < Read_Master_Log_Pos的時候,Seconds_Behind_Master的值不會突變。

4. 總結

本文分析了並行複製和非並行複製下,Seconds_Behind_Master參數值更新的相關邏輯。限於本文的做者水平有限,文中的錯誤在所不免,懇請你們批評指正。

相關文章
相關標籤/搜索