目錄php
ps:如上的參考資料並不是徹底在本文中被引用,其餘沒有被引用的資料在本文的造成的工做中,提供了思路上和其餘方面的參考,所以一併列入參考文獻。感謝如上的文獻做者提供的參考。html
部門當前的數據庫架構是雙主模式,既線上由2臺互爲主從的數據庫搭建而成的集羣。高可用經過vip和headbeat來作保證。一般狀況下,vip掛在主(本文稱之爲da)上,當da發生了異常好比宕機等問題的時候,vip自動漂移至從(本文稱之爲dp)。架構以下圖所示:mysql
能夠看出,全部的寫入和讀取操做都在da上進行。da產生的binlog會被dp拉取重放。同時dp也會產生binlog並被da拉取重放(注:這裏不會產生循環複製(既da產生的binlog事件在dp執行過了以後,再被da讀取到再執行),由於binlog中每一個event都有個serverid標記,標記是哪一個server產生的事物。當da讀取到binlog的時候,發現serverid和本機的serverid一致,便會跳過當前的binlog event)。正常狀況下,da的Seconds_behind_Master參數,應該保持在0或者很小的數值纔對。可是監控系統發現,da的Seconds_behind_Master參數常常發生跳變,上一秒爲0下一秒就可能爲一個很大值(好比10000)。例以下面的數據,是經過腳本抓取到的線上Seconds_behind_Master狀況:git
2019-06-27 13:26:00 Read_Master_Log_Pos: 472851480 Relay_Log_File: da01-relay-bin.002323 Relay_Log_Pos: 371 Relay_Master_Log_File: dp01-bin.004267 Exec_Master_Log_Pos: 472851480 Seconds_Behind_Master: 0 2019-06-27 13:26:01 Read_Master_Log_Pos: 473594815 Relay_Log_File: da01-relay-bin.002323 Relay_Log_Pos: 371 Relay_Master_Log_File: dp01-bin.004267 Exec_Master_Log_Pos: 473594649 Seconds_Behind_Master: 622 2019-06-27 13:26:02 Read_Master_Log_Pos: 474422859 Relay_Log_File: da01-relay-bin.002323 Relay_Log_Pos: 371 Relay_Master_Log_File: dp01-bin.004267 Exec_Master_Log_Pos: 474422859 Seconds_Behind_Master: 0
前文講到了本文調查的問題背景,在這裏先給出下整個問題的調查結論。在sql/rpl_slave.cc 中計算Seconds_Behind_Master邏輯以下 (文獻2):github
if ((mi->get_master_log_pos() == mi->rli->get_group_master_log_pos()) && (!strcmp(mi->get_master_log_name(), mi->rli->get_group_master_log_name()))) { if (mi->slave_running == MYSQL_SLAVE_RUN_CONNECT) protocol->store(0LL); else protocol->store_null(); } else { long time_diff= ((long)(time(0) - mi->rli->last_master_timestamp) - mi->clock_diff_with_master); protocol->store((longlong)(mi->rli->last_master_timestamp ? max(0L, time_diff) : 0)); }
當Exec_Master_Log_Pos <Read_Master_Log_Pos 的時候進入else裏面。在da的io進程拉取dp的binlog的時候,會發現dp的binlog的serverid是自身產生的。 因此不會去更新last_master_timestamp。若是dp的binlog發生過輪轉,會產生一個ROTATE_EVENT事件更新這個last_master_timestamp。所以當sql進程比io進程慢的時候, 計算出來的time_diff 即是當前時間與上次ROTATE_EVENT事件的差值。復現方法:sql
- dp上flush logs
- da上stop slave sql_thread;
- da上DML數據。此時觀察show slave status 能夠看到Read_Master_Log_Pos 很超前。
- da上start slave ;show slave status \G 能夠看到Seconds_Behind_Master變大。
從前文的結果中能夠看出,跟Seconds_Behind_Master(下文簡稱SBM)值有關的mysql 變量就是last_master_timestamp。所以須要追蹤下在收到的binlog的serverid和mysql進程的serverid一致的狀況下,last_master_timestamp是如何更新的。根據前文的復現方法,須要去確認以下的2個點:shell
爲了便於觀察mysql的運行邏輯,本文參考文獻9的辦法,編譯了一個debug版本。在mysql運行的過程當中,tail -f mysqld.trace文件觀察mysql命令執行的過程。數據庫
在dp 執行flush logs 或者輪轉binlog的時候,會產生一個ROTATE_EVENT事件(具體的解析能夠參考文獻11)。在mysqld.trace文件中,能夠觀察到以下的信息,標誌着da此時接收到了dp的binlog輪轉事件的ROTATE_EVENT。緩存
handle_slave_io: info: IO thread received event of type Rotate
相應的代碼在sql/rpl_slave.cc文件的5819行中:架構
5817 THD_STAGE_INFO(thd, stage_queueing_master_event_to_the_relay_log); 5818 event_buf= (const char*)mysql->net.read_pos + 1; 5819 DBUG_PRINT("info", ("IO thread received event of type %s", 5820 Log_event::get_type_str( 5821 (Log_event_type)event_buf[EVENT_TYPE_OFFSET]))); ... ... 5831 5832 /* XXX: 'synced' should be updated by queue_event to indicate 5833 whether event has been synced to disk */ 5834 bool synced= 0; 5835 if (queue_event(mi, event_buf, event_len)) 5836 { 5837 mi->report(ERROR_LEVEL, ER_SLAVE_RELAY_LOG_WRITE_FAILURE, 5838 ER(ER_SLAVE_RELAY_LOG_WRITE_FAILURE), 5839 "could not queue event from master"); 5840 goto err; 5841 }
能夠看到在sql進程接收到一個binlog以後,會調用queue_event進行寫入relaylog。queue_event處理ROTATE_EVENT關鍵代碼以下:
8236 switch (event_type) { ... ... 8251 case binary_log::ROTATE_EVENT: 8252 { 8253 Rotate_log_event rev(buf, checksum_alg != binary_log::BINLOG_CHECKSUM_ALG_OFF ? 8254 event_len - BINLOG_CHECKSUM_LEN : event_len, 8255 mi->get_mi_description_event()); 8256 8257 if (unlikely(process_io_rotate(mi, &rev))) 8258 { 8259 mi->report(ERROR_LEVEL, ER_SLAVE_RELAY_LOG_WRITE_FAILURE, 8260 ER(ER_SLAVE_RELAY_LOG_WRITE_FAILURE), 8261 "could not queue event from master"); 8262 goto err; 8263 }
在switch中判斷binlog若是是ROTATE_EVENT,則調用process_io_rotate進行處理。其中buf是sql進行拉取到的binlog的緩存,mi是表明了masterinfo。process_io_rotate主要的工做是更新master_info 的相關信息,例以下一個binglog的位置,master的binlog位置信息等。能夠在mysqld.trace 文件中觀察到以下的信息:
process_io_rotate: info: new (master_log_name, master_log_pos): ('dp-bin.000029', 4)
process_io_rotate函數註釋信息以下:
7767 /** 7768 Used by the slave IO thread when it receives a rotate event from the 7769 master. 7770 7771 Updates the master info with the place in the next binary log where 7772 we should start reading. Rotate the relay log to avoid mixed-format 7773 relay logs. 7774 7775 @param mi master_info for the slave 7776 @param rev The rotate log event read from the master 7777 7778 @note The caller must hold mi->data_lock before invoking this function. 7779 7780 @retval 0 ok 7781 @retval 1 error <!--7782 */--> 7783 static int process_io_rotate(Master_info *mi, Rotate_log_event *rev) ... ... 7830 }
在完成相關的工做以後,來到queue_event函數的最後,寫入ROTATE_EVENT,在mysqld.trace中能夠看到以下的信息:
queue_event: info: master_log_pos: 4 harvest_bytes_written: info: counter: 769 bytes_written: 44 queue_event: info: error: 0
相關代碼以下:
8656 { 8657 bool is_error= false; 8658 /* write the event to the relay log */ 8659 if (likely(rli->relay_log.append_buffer(buf, event_len, mi) == 0)) 8660 { 8661 mi->set_master_log_pos(mi->get_master_log_pos() + inc_pos); 8662 DBUG_PRINT("info", ("master_log_pos: %lu", (ulong) mi->get_master_log_pos())); 8663 rli->relay_log.harvest_bytes_written(rli, true/*need_log_space_lock=true*/); ... ... } }
到此,ROTATE_EVENT事件被徹底寫入到了relaylog中。借用一張經典mysql複製原理圖(文獻12)說明這個過程,即完成了圖中從BinaryLog 讀取到RelayLog的寫入過程。
在前文的描述中,能夠觀察到在RelayLog的寫入過程當中並無去更新last_master_timestamp相關信息。所以更新last_master_timestamp的工做,只可能在sql進程中去完成。接下來本節將探索下ROTATE_EVENT在sql進程中的執行過程。
mysql複製代碼中,sql進程的入口函數是handle_slave_sql:
7158 /** 7159 Slave SQL thread entry point. 7160 7161 @param arg Pointer to Relay_log_info object that holds information 7162 for the SQL thread. 7163 7164 @return Always 0. 7165 */ 7166 extern "C" void *handle_slave_sql(void *arg)
在handle_slave_sql中,關鍵的代碼是while (!sql_slave_killed(thd,rli))這個循環:
7438 while (!sql_slave_killed(thd,rli)) 7439 { 7440 THD_STAGE_INFO(thd, stage_reading_event_from_the_relay_log); 7441 DBUG_ASSERT(rli->info_thd == thd); 7442 THD_CHECK_SENTRY(thd); 7443 7444 if (saved_skip && rli->slave_skip_counter == 0) 7445 { ... ... 7456 } 7457 7458 if (exec_relay_log_event(thd,rli)) 7459 { 7460 ... ... 7525 } 7526 goto err; 7527 } 7528 }
能夠發現,每次循環的時候,都會去執行exec_relay_log_event這個函數:
5098 Top-level function for executing the next event in the relay log. 5099 This is called from the SQL thread. 5100 5101 This function reads the event from the relay log, executes it, and 5102 advances the relay log position. It also handles errors, etc. 5103 5104 This function may fail to apply the event for the following reasons: 5105 5106 - The position specfied by the UNTIL condition of the START SLAVE 5107 command is reached. 5108 5109 - It was not possible to read the event from the log. 5110 5111 - The slave is killed. 5112 5113 - An error occurred when applying the event, and the event has been 5114 tried slave_trans_retries times. If the event has been retried 5115 fewer times, 0 is returned. 5116 5117 - init_info or init_relay_log_pos failed. (These are called 5118 if a failure occurs when applying the event.) 5119 5120 - An error occurred when updating the binlog position. 5121 5122 @retval 0 The event was applied. 5123 5124 @retval 1 The event was not applied. 5125 */ 5126 static int exec_relay_log_event(THD* thd, Relay_log_info* rli)
在exec_relay_log_event中,執行對應的relaylog以後,會相應的更新last_master_timestamp。在這裏並行複製模式和非並行複製模式下,更新last_master_timestamp的方式是不一樣的。在這裏先介紹下非並行複製模式下更新last_master_timestamp的步驟。
在exec_relay_log_event中判斷是不是並行複製是經過is_parallel_exec函數實現的。若是是並行複製模式則返回True,不然返回False。在非並行複製的模式下,會進入以下的代碼執行:
5175 /* 5176 Even if we don't execute this event, we keep the master timestamp, 5177 so that seconds behind master shows correct delta (there are events 5178 that are not replayed, so we keep falling behind). 5179 5180 If it is an artificial event, or a relay log event (IO thread generated 5181 event) or ev->when is set to 0, or a FD from master, or a heartbeat 5182 event with server_id '0' then we don't update the last_master_timestamp. 5183 5184 In case of parallel execution last_master_timestamp is only updated when 5185 a job is taken out of GAQ. Thus when last_master_timestamp is 0 (which 5186 indicates that GAQ is empty, all slave workers are waiting for events from 5187 the Coordinator), we need to initialize it with a timestamp from the first 5188 event to be executed in parallel. 5189 */ 5190 if ((!rli->is_parallel_exec() || rli->last_master_timestamp == 0) && 5191 !(ev->is_artificial_event() || ev->is_relay_log_event() || 5192 (ev->common_header->when.tv_sec == 0) || 5193 ev->get_type_code() == binary_log::FORMAT_DESCRIPTION_EVENT || 5194 ev->server_id == 0)) 5195 { 5196 rli->last_master_timestamp= ev->common_header->when.tv_sec + 5197 (time_t) ev->exec_time; 5198 DBUG_ASSERT(rli->last_master_timestamp >= 0); 5199 }
能夠看到,last_master_timestamp的值即是ev->common_header->when.tv_sec和(time_t) ev->exec_time的和。在代碼中加上點調試信息,從新編譯:
DBUG_PRINT("info", ("before rli->last_master_timestamp = %lu", rli->last_master_timestamp)); DBUG_PRINT("info", ("before rli->is_parallel_exec() = %d", int(rli->is_parallel_exec()))); if ((!rli->is_parallel_exec() || rli->last_master_timestamp == 0) && !(ev->is_artificial_event() || ev->is_relay_log_event() || (ev->common_header->when.tv_sec == 0) || ev->get_type_code() == binary_log::FORMAT_DESCRIPTION_EVENT || ev->server_id == 0)) { rli->last_master_timestamp= ev->common_header->when.tv_sec + (time_t) ev->exec_time; DBUG_PRINT("info", ("after rli->last_master_timestamp = %lu", rli->last_master_timestamp)); DBUG_PRINT("info", ("after rli->is_parallel_exec() = %d", int(rli->is_parallel_exec()))); DBUG_ASSERT(rli->last_master_timestamp >= 0); }
能夠在mysqld.trace中觀察到以下的調試信息,印證了前文關於非並行複製的結論。同時也能夠看出,非並行複製下last_master_timestamp的更新是在binlogevent執行以前。
exec_relay_log_event: info: before rli->last_master_timestamp = 0 exec_relay_log_event: info: before rli->is_parallel_exec() = 0 exec_relay_log_event: info: after rli->last_master_timestamp = 1562201989 exec_relay_log_event: info: after rli->is_parallel_exec() = 0
前文介紹了非並行複製模式下的last_master_timestamp的更新邏輯。本節將介紹下在並行複製模式下,last_master_timestamp是如何更新的。
在並行複製模式下last_master_timestamp的更新邏輯更爲複雜。同時對於ROTATE_EVENT事件和普通的binlog事件更新模式也有所不一樣。接下來首先介紹下普通的binlog事件下last_master_timestamp的更新邏輯:
並行複製有一個分發隊列gaq,sql線程將binlog事務讀取到gaq,而後再分發給worker線程執行。並行複製時,binlog事件是併發穿插執行的,gaq中有一個checkpoint點稱爲lwm, lwm以前的binlog都已經執行,而lwm以後的binlog有些執行有些沒有執行。 假設worker線程數爲2,gap有1,2,3,4,5,6,7,8個事務。worker 1已執行的事務爲1 4 6, woker 2執行的事務爲2 3 ,那麼lwm爲4。
並行複製更新gap checkpiont時,會推動lwm點,同時更新
last_master_timestamp
爲lwm所在事務結束的event的時間。所以,並行複製是在事務執行完成後才更新last_master_timestamp
,更新是以事務爲單位。同時更新gap checkpiont還受slave_checkpoint_period
參數的影響。
coordinator會調用函數mts_checkpoint_routune按期作「checkpoint」操做,將Relay_log_info.gaq中已經執行結束的事務移出隊列,移除方式爲從隊列頭開始檢查,若是done爲真則移出,不然中止掃描,並標記Low Water Mark爲移出的事務中最大的sequence_number(lock-interval終止點)
能夠看出,並行複製下last_master_timestamp的更新邏輯與lwm和mts_checkpoint_routune函數有關。mts_checkpoint_routune函數頭以下:
/** Processing rli->gaq to find out the low-water-mark (lwm) coordinates which is stored into the cental recovery table. @param rli pointer to Relay-log-info of Coordinator @param period period of processing GAQ, normally derived from @c mts_checkpoint_period @param force if TRUE then hang in a loop till some progress @param need_data_lock False if rli->data_lock mutex is aquired by the caller. @return FALSE success, TRUE otherwise */ bool mts_checkpoint_routine(Relay_log_info *rli, ulonglong period, bool force, bool need_data_lock)
在coordinator中會按期調用mts_checkpoint_routine將Relay_log_info.gaq中已經執行結束的事務移出隊列,而且更新Low Water Mark爲移出的事務中最大的sequence_number(參見mts_checkpoint_routine中以下):
do { if (!is_mts_db_partitioned(rli)) mysql_mutex_lock(&rli->mts_gaq_LOCK); cnt= rli->gaq->move_queue_head(&rli->workers); if (!is_mts_db_partitioned(rli)) mysql_mutex_unlock(&rli->mts_gaq_LOCK); #ifndef DBUG_OFF if (DBUG_EVALUATE_IF("check_slave_debug_group", 1, 0) && cnt != opt_mts_checkpoint_period) sql_print_error("This an error cnt != mts_checkpoint_period"); #endif } while (!sql_slave_killed(rli->info_thd, rli) && cnt == 0 && force && !DBUG_EVALUATE_IF("check_slave_debug_group", 1, 0) && (my_sleep(rli->mts_coordinator_basic_nap), 1));
在函數的最後,會執行last_master_timestamp的更新邏輯:
/* Update the rli->last_master_timestamp for reporting correct Seconds_behind_master. If GAQ is empty, set it to zero. Else, update it with the timestamp of the first job of the Slave_job_queue which was assigned in the Log_event::get_slave_worker() function. */ ts= rli->gaq->empty() ? 0 : reinterpret_cast<Slave_job_group*>(rli->gaq->head_queue())->ts; rli->reset_notified_checkpoint(cnt, ts, need_data_lock, true);
能夠看出若是GAP爲空(既work的隊列爲空)則更新last_master_timestamp爲0,不然更新爲隊列中第一個是元素的timestamp。
前文提到在並行複製模式下,普通的binlog事件會被coordinator進程分發到work中去執行。而且在mts_checkpoint_routine中去推動Low Water Mark和更新last_master_timestamp。可是若是執行的binlog是一個ROTATE_EVENT事件,則coordinator進程不會將事件分發到work中,而是在coordinator進程中本身執行並更新last_master_timestamp。
對於ROTATE_EVENT事件的執行邏輯,入口依然是exec_relay_log_event函數。在函數中調用了apply_event_and_update_pos函數執行binlog事件。
5247 /* ptr_ev can change to NULL indicating MTS coorinator passed to a Worker */ 5248 exec_res= apply_event_and_update_pos(ptr_ev, thd, rli); 5249 /* 5250 Note: the above call to apply_event_and_update_pos executes 5251 mysql_mutex_unlock(&rli->data_lock); 5252 */ 5253 5254 /* For deferred events, the ptr_ev is set to NULL 5255 in Deferred_log_events::add() function. 5256 Hence deferred events wont be deleted here. 5257 They will be deleted in Deferred_log_events::rewind() funciton. 5258 */
在apply_event_and_update_pos函數中,會首先調用apply_event去判斷當前的binlog事件是否能夠被分發到work進程執行。在這裏收到的binlog事件是ROTATE_EVENT,所以apply_event返回的exec_res爲0,ev->worker==rli(rli 表明了coordinator進程),所以不會進入下面的if邏輯中:
4709 if (reason == Log_event::EVENT_SKIP_NOT) 4710 { 4711 // Sleeps if needed, and unlocks rli->data_lock. 4712 if (sql_delay_event(ev, thd, rli)) 4713 DBUG_RETURN(SLAVE_APPLY_EVENT_AND_UPDATE_POS_OK); 4714 4715 exec_res= ev->apply_event(rli); 4717 if (!exec_res && (ev->worker != rli)) 4718 { ... ... }
在mysqld.trace 中能夠觀察到以下的輸出:
Log_event::shall_skip: info: skip reason=0=NOT LOG_EVENT:apply_event: info: event_type=Rotate apply_event_and_update_pos: info: apply_event error = 0
apply_event函數定義和註釋以下:
/** Scheduling event to execute in parallel or execute it directly. In MTS case the event gets associated with either Coordinator or a Worker. A special case of the association is NULL when the Worker can't be decided yet. In the single threaded sequential mode the event maps to SQL thread rli. @note in case of MTS failure Coordinator destroys all gathered deferred events. @return 0 as success, otherwise a failure. */ int Log_event::apply_event(Relay_log_info *rli)
由於沒法被分發到work執行,所以會進入以下的執行邏輯中:
4832 DBUG_PRINT("info", ("apply_event error = %d", exec_res)); 4833 if (exec_res == 0) 4834 { 4835 /* 4836 Positions are not updated here when an XID is processed. To make 4837 a slave crash-safe, positions must be updated while processing a 4838 XID event and as such do not need to be updated here again. 4839 4840 However, if the event needs to be skipped, this means that it 4841 will not be processed and then positions need to be updated here. 4842 4843 See sql/rpl_rli.h for further details. 4844 */ 4845 int error= 0; 4846 if (*ptr_ev && 4847 (ev->get_type_code() != binary_log::XID_EVENT || 4848 skip_event || (rli->is_mts_recovery() && !is_gtid_event(ev) && 4849 (ev->ends_group() || !rli->mts_recovery_group_seen_begin) && 4850 bitmap_is_set(&rli->recovery_groups, rli->mts_recovery_index)))) 4851 { 4852 #ifndef DBUG_OFF 4853 /* 4854 This only prints information to the debug trace. 4855 4856 TODO: Print an informational message to the error log? 4857 */ ... ... 4873 error= ev->update_pos(rli); ... ... } }
能夠發現,在此調用了update_pos進行更新。由於當前的binlog事件是ROTATE_EVENT,所以會調用以下的代碼:
/* Got a rotate log event from the master. This is mainly used so that we can later figure out the logname and position for the master. We can't rotate the slave's BINlog as this will cause infinitive rotations in a A -> B -> A setup. The NOTES below is a wrong comment which will disappear when 4.1 is merged. This must only be called from the Slave SQL thread, since it calls flush_relay_log_info(). @retval 0 ok */ int Rotate_log_event::do_update_pos(Relay_log_info *rli)
在do_update_pos 更新last_master_timestamp 的關鍵邏輯以下:
if (rli->is_parallel_exec()) { bool real_event= server_id && !is_artificial_event(); rli->reset_notified_checkpoint(0, real_event ? common_header->when.tv_sec + (time_t) exec_time : 0, true/*need_data_lock=true*/, real_event? true : false); }
能夠在mysqld.trace 中觀察到以下的輸出:
Rotate_log_event::do_update_pos: info: server_id=248; ::server_id=236 Rotate_log_event::do_update_pos: info: new_log_ident: dp-bin.000063 Rotate_log_event::do_update_pos: info: pos: 4 Rotate_log_event::do_update_pos: info: old group_master_log_name: 'dp-bin.000062' old group_master_log_pos: 154 Relay_log_info::inc_group_relay_log_pos: info: log_pos: 4 group_master_log_pos: 154 Rotate_log_event::do_update_pos: info: new group_master_log_name: 'dp-bin.000063' new group_master_log_pos: 4 Rotate_log_event::do_update_pos: info: ------> 1562684581(此處爲本文加入的調試信息,在reset_notified_checkpoint更新last_master_timestamp的時候打印出。)
至此,對於並行複製和非並行複製下當收到ROTATE_EVENT事件時,更新last_master_timestamp的邏輯分析所有完成。
前文分析了當mysql收到一個ROTATE_EVENT事件的時候所運行的邏輯。本節將分析下在mysql收到與本身的serverid一致的binlog事件的時候的運行邏輯。
首先來看下io thread 接收到與自己serverid一致的binlog的時候所作的操做。在handle_slave_io函數中,會在while循環中不斷的調用queue_even函數。
5740 while (!io_slave_killed(thd,mi)) 5741 { 5742 ulong event_len; ... ... 5750 event_len= read_event(mysql, mi, &suppress_warnings); ... ... 5813 /* XXX: 'synced' should be updated by queue_event to indicate 5814 whether event has been synced to disk */ 5815 bool synced= 0; 5816 if (queue_event(mi, event_buf, event_len)) 5817 { 5818 mi->report(ERROR_LEVEL, ER_SLAVE_RELAY_LOG_WRITE_FAILURE, 5819 ER(ER_SLAVE_RELAY_LOG_WRITE_FAILURE), 5820 "could not queue event from master"); 5821 goto err; 5822 } ... ... }
在queue_even函數中,對於收到與本身serverid一致的binlog的處理邏輯以下:
8534 /* 8535 If this event is originating from this server, don't queue it. 8536 We don't check this for 3.23 events because it's simpler like this; 3.23 8537 will be filtered anyway by the SQL slave thread which also tests the 8538 server id (we must also keep this test in the SQL thread, in case somebody 8539 upgrades a 4.0 slave which has a not-filtered relay log). 8540 8541 ANY event coming from ourselves can be ignored: it is obvious for queries; 8542 for STOP_EVENT/ROTATE_EVENT/START_EVENT: these cannot come from ourselves 8543 (--log-slave-updates would not log that) unless this slave is also its 8544 direct master (an unsupported, useless setup!). 8545 */ ... ... 8560 if ((s_id == ::server_id && !mi->rli->replicate_same_server_id) || 8561 /* 8562 the following conjunction deals with IGNORE_SERVER_IDS, if set 8563 If the master is on the ignore list, execution of 8564 format description log events and rotate events is necessary. 8565 */ 8566 (mi->ignore_server_ids->dynamic_ids.size() > 0 && 8567 mi->shall_ignore_server_id(s_id) && 8568 /* everything is filtered out from non-master */ 8569 (s_id != mi->master_id || 8570 /* for the master meta information is necessary */ 8571 (event_type != binary_log::FORMAT_DESCRIPTION_EVENT && 8572 event_type != binary_log::ROTATE_EVENT)))) 8573 { 8574 /* 8575 Do not write it to the relay log. 8576 a) We still want to increment mi->get_master_log_pos(), so that we won't 8577 re-read this event from the master if the slave IO thread is now 8578 stopped/restarted (more efficient if the events we are ignoring are big 8579 LOAD DATA INFILE). 8580 b) We want to record that we are skipping events, for the information of 8581 the slave SQL thread, otherwise that thread may let 8582 rli->group_relay_log_pos stay too small if the last binlog's event is 8583 ignored. 8584 But events which were generated by this slave and which do not exist in 8585 the master's binlog (i.e. Format_desc, Rotate & Stop) should not increment 8586 mi->get_master_log_pos(). 8587 If the event is originated remotely and is being filtered out by 8588 IGNORE_SERVER_IDS it increments mi->get_master_log_pos() 8589 as well as rli->group_relay_log_pos. 8590 */ 8591 if (!(s_id == ::server_id && !mi->rli->replicate_same_server_id) || 8592 (event_type != binary_log::FORMAT_DESCRIPTION_EVENT && 8593 event_type != binary_log::ROTATE_EVENT && 8594 event_type != binary_log::STOP_EVENT)) 8595 { 8596 mi->set_master_log_pos(mi->get_master_log_pos() + inc_pos); 8597 memcpy(rli->ign_master_log_name_end, mi->get_master_log_name(), FN_REFLEN); 8598 DBUG_ASSERT(rli->ign_master_log_name_end[0]); 8599 rli->ign_master_log_pos_end= mi->get_master_log_pos(); 8600 } 8601 rli->relay_log.signal_update(); // the slave SQL thread needs to re-check 8602 DBUG_PRINT("info", ("master_log_pos: %lu, event originating from %u server, ignored", 8603 (ulong) mi->get_master_log_pos(), uint4korr(buf + SERVER_ID_OFFSET))); 8604 }
結合註釋,能夠看出當mysql收到與本實例serverid一致的binlog的時候,不會將當前的binlog事件寫入relaylog中。同時會完成以下的2個事情:(ps :這是一個關鍵的地方,在下文中將提到它的做用)
在mysqld.trace 中能夠觀察到以下的日誌信息:
queue_event: info: master_log_pos: 219, event originating from 236 server, ignored queue_event: info: error: 0
接下里看看sql進程對binlog的處理過程。入口的函數仍是exec_relay_log_event,在exec_relay_log_event函數中會調用next_event函數獲取一個可執行的binlog事件。在這裏next_event對於非並行複製會有一個特殊的處理:
9175 if (!rli->is_parallel_exec()) 9176 rli->last_master_timestamp= 0;
若是是非並行複製,則當讀取一個binlog的時候,都會把last_master_timestamp設置成0 。所以在非並行複製下,收到與本實例serverid一致的binlog的時候,mysqld.trace中能夠觀察到rli->last_master_timestamp的值會一直爲0:
next_event: info: seeing an ignored end segment handle_slave_io: info: IO thread received event of type Query exec_relay_log_event: info: ================================before rli->last_master_timestamp = 0 exec_relay_log_event: info: ================================before rli->is_parallel_exec() = 0 apply_event_and_update_pos: info: thd->options: ; rli->last_event_start_time: 0 Log_event::shall_skip: info: ev->server_id=0, ::server_id=236, rli->replicate_same_server_id=0, rli->slave_skip_counter=0 Log_event::shall_skip: info: skip reason=0=NOT LOG_EVENT:apply_event: info: event_type=Rotate ... ... ... ... exec_relay_log_event: info: ================================before rli->last_master_timestamp = 0 exec_relay_log_event: info: ================================before rli->is_parallel_exec() = 0 apply_event_and_update_pos: info: thd->options: ; rli->last_event_start_time: 0 Log_event::shall_skip: info: ev->server_id=0, ::server_id=236, rli->replicate_same_server_id=0, rli->slave_skip_counter=0 Log_event::shall_skip: info: skip reason=0=NOT LOG_EVENT:apply_event: info: event_type=Rotate apply_event_and_update_pos: info: apply_event error = 0 apply_event_and_update_pos: info: OPTION_BEGIN: 0; IN_STMT: 0 Rotate_log_event::do_update_pos: info: server_id=0; ::server_id=236 Rotate_log_event::do_update_pos: info: new_log_ident: dp-bin.000069 Rotate_log_event::do_update_pos: info: pos: 282 MYSQL_BIN_LOG::signal_update: info: signal_cnt : 13 queue_event: info: master_log_pos: 326, event originating from 236 server, ignored queue_event: info: error: 0
以後的邏輯中,若是發現rli->ign_master_log_name_end[0]不爲空(對應了上一個小節中的第一項:將mi->get_master_log_name() 拷貝到rli->ign_master_log_name_end中),則構造一個serverid爲0的Rotate event並返回:
9181 if (rli->ign_master_log_name_end[0]) 9182 { 9183 /* We generate and return a Rotate, to make our positions advance */ 9184 DBUG_PRINT("info",("seeing an ignored end segment")); 9185 ev= new Rotate_log_event(rli->ign_master_log_name_end, 9186 0, rli->ign_master_log_pos_end, 9187 Rotate_log_event::DUP_NAME); 9188 rli->ign_master_log_name_end[0]= 0; 9189 mysql_mutex_unlock(log_lock); 9190 if (unlikely(!ev)) 9191 { 9192 errmsg= "Slave SQL thread failed to create a Rotate event " 9193 "(out of memory?), SHOW SLAVE STATUS may be inaccurate"; 9194 goto err; 9195 } 9196 ev->server_id= 0; // don't be ignored by slave SQL thread 9197 DBUG_RETURN(ev); 9198 }
緊接着調用 apply_event_and_update_pos函數。在apply_event_and_update_pos函數中,如前所述若是是ROTATE EVENT 則返回0,不會被work進程並行執行,並進入update_pos邏輯中。在update_pos邏輯中關鍵的代碼以下(sql/log_event.cc 文件):
6649 if (rli->is_parallel_exec()) 6650 { 6651 bool real_event= server_id && !is_artificial_event(); 6652 rli->reset_notified_checkpoint(0, 6653 real_event ? 6654 common_header->when.tv_sec + 6655 (time_t) exec_time : 0, 6656 true/*need_data_lock=true*/, 6657 real_event? true : false); 6658 }
能夠見得real_event在server_id是0 的時候爲false。所以當進入reset_notified_checkpoint函數後,由於update_timestamp條件(傳入的值即是real_event的值)爲false便不會更新last_master_timestamp。
262 /** 263 This method is called in mts_checkpoint_routine() to mark that each 264 worker is required to adapt to a new checkpoint data whose coordinates 265 are passed to it through GAQ index. 266 267 Worker notices the new checkpoint value at the group commit to reset 268 the current bitmap and starts using the clean bitmap indexed from zero 269 of being reset checkpoint_seqno. 270 271 New seconds_behind_master timestamp is installed. 272 273 @param shift number of bits to shift by Worker due to the 274 current checkpoint change. 275 @param new_ts new seconds_behind_master timestamp value 276 unless zero. Zero could be due to FD event 277 or fake rotate event. 278 @param need_data_lock False if caller has locked @c data_lock 279 @param update_timestamp if true, this function will update the 280 rli->last_master_timestamp. 281 */ 282 void Relay_log_info::reset_notified_checkpoint(ulong shift, time_t new_ts, 283 bool need_data_lock, 284 bool update_timestamp) 285 { ... ... 330 if (update_timestamp) 331 { 332 if (need_data_lock) 333 mysql_mutex_lock(&data_lock); 334 else 335 mysql_mutex_assert_owner(&data_lock); 336 last_master_timestamp= new_ts; 337 if (need_data_lock) 338 mysql_mutex_unlock(&data_lock); 339 } 340 }
從上文的分析能夠得出以下的2個結論 :
本文分析了並行複製和非並行複製下,Seconds_Behind_Master參數值更新的相關邏輯。限於本文的做者水平有限,文中的錯誤在所不免,懇請你們批評指正。