TiDB Cluster QPS Jitter Follow-up: GC Stops Working

1. Background

In the previous incident, plain INSERTs caused massive write conflicts and a sharp QPS drop. The application switched to INSERT IGNORE, which fixed the QPS drop and the elevated duration, but the very next day we received a GC_can_not_work alert. The cluster had been upgraded from 3.0.2 to 3.0.5.

Cluster configuration

Cluster version: v3.0.5
Cluster spec: standard SSD disks, 128 GB RAM, 40-core CPU
tidb21 TiDB/PD/pump/prometheus/grafana/CCS
tidb22 TiDB/PD/pump
tidb23 TiDB/PD/pump
tidb01 TiKV
tidb02 TiKV
tidb03 TiKV
tidb04 TiKV
tidb05 TiKV
tidb06 TiKV
tidb07 TiKV
tidb08 TiKV
tidb09 TiKV
tidb10 TiKV
tidb11 TiKV
tidb12 TiKV
tidb13 TiKV
tidb14 TiKV
tidb15 TiKV
tidb16 TiKV
tidb17 TiKV
tidb18 TiKV
tidb19 TiKV
tidb20 TiKV
wtidb29 TiKV
wtidb30 TiKV

2. Symptoms and Attempted Fixes

We first received the alerts below; the other alerts in the batch recovered one after another, leaving only GC_can_not_work.
[Figure: alert details]

Fixes we attempted:

1. Lowering gc_life_time from 24h to 10m (no effect; note that restarting makes the gc leader move) — see the sketch after this list for how the value is changed
2. Restarting tidb (no effect)
3. Restarting pd (no effect)
4. Setting gc_life_time back from 10m to 24h (no effect)
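
In TiDB 3.0, gc_life_time lives in the mysql.tidb table as the tikv_gc_life_time row. A minimal sketch of the adjustment, assuming a tidb-server reachable at 192.168.1.1 port 4000 (host, port, and user are placeholders):

# shorten the GC life time so the next round is triggered sooner (placeholder endpoint)
mysql -h 192.168.1.1 -P 4000 -u root -e \
  "UPDATE mysql.tidb SET VARIABLE_VALUE = '10m' WHERE VARIABLE_NAME = 'tikv_gc_life_time';"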

The official manual describes TiKV_GC_can_not_work as follows:

TiKV_GC_can_not_work
• Alert rule:
sum(increase(tidb_tikvclient_gc_action_result{type="success"}[6h])) < 1
Note:
Because distributed GC was introduced in 3.0 and GC no longer executes inside TiDB, the tidb_tikvclient_gc_action_result metric still exists in 3.* and later versions but carries no value.
• Rule description:
No Region has successfully run GC within 6 hours, which means GC is not working. GC being down briefly does little harm, but if it stays down, MVCC versions pile up and queries get slower and slower.
• Handling:

  1. Run select VARIABLE_VALUE from mysql.tidb where VARIABLE_NAME="tikv_gc_leader_desc" to find the tidb-server hosting the gc leader;
  2. Check that tidb-server's log: grep gc_worker tidb.log;
  3. If it has been doing resolve locks the whole time (last log line is start resolve locks) or delete ranges (last log line is start delete {number} ranges), the GC process is normal. Otherwise, report the issue to support@pingcap.com.
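
Steps 1 and 2 combined look roughly like this (the endpoint and log path are placeholders for your own deployment):

# 1. find which tidb-server currently holds the gc leader
mysql -h 192.168.1.1 -P 4000 -u root -e \
  "SELECT VARIABLE_VALUE FROM mysql.tidb WHERE VARIABLE_NAME = 'tikv_gc_leader_desc';"
# 2. on that tidb-server, check what the gc worker has been doing recently
grep 'gc_worker' tidb.log | tail -n 20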

This situation usually means the previous GC round has not finished, so the next round will not start.

At this point we should look at the gc leader's log first; the gc leader can be found by querying the tikv_gc_leader_desc row in mysql.tidb.
In this case it was the tidb21 machine.
[Figure: gc leader log on tidb21]
The screenshot above shows that the round started at 16:44 on the 21st and was never reported as finished; because it never completed, the log kept printing still running.
The tikv_gc_last_run_time and tikv_gc_safe_point rows mean, respectively:

• tikv_gc_last_run_time: the time of the most recent GC run (updated at the start of each GC round)
• tikv_gc_safe_point: the current safe point (updated at the start of each GC round)
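
Both rows can be read straight out of mysql.tidb, together with the leader description (same placeholder endpoint as above):

mysql -h 192.168.1.1 -P 4000 -u root -e \
  "SELECT VARIABLE_NAME, VARIABLE_VALUE FROM mysql.tidb WHERE VARIABLE_NAME IN ('tikv_gc_leader_desc', 'tikv_gc_last_run_time', 'tikv_gc_safe_point');"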

GC consists of three stages:

1. Resolve Locks
2. Delete Ranges
3. Do GC

In detail, the stages work as follows:

Resolve Locks
TiDB's transactions are implemented on the Google Percolator model, in which commit is a two-phase process. When the first phase finishes, every key involved is locked; one lock is chosen as the Primary, and the remaining (Secondary) locks each store a pointer to the Primary. In the second phase, a Write record is added to the key holding the Primary lock and the lock is removed. A Write record is the history of writes, deletes, or transaction rollbacks on that key; which kind of Write record replaces the Primary lock marks whether the transaction committed successfully. All the Secondary locks are then replaced in turn. If for some reason (a failure, say) some Secondary locks are never replaced and are left behind, the information inside each lock can still be used to find its Primary, and whether the Primary committed tells us whether the whole transaction committed. But if the Primary's information has been deleted by GC while the transaction still has uncommitted Secondary locks, there is no longer any way to tell whether those locks may be committed, and data correctness can no longer be guaranteed.
The Resolve Locks step cleans up locks older than the safe point: if a lock's Primary has committed, the lock should be committed as well; otherwise it should be rolled back. If the Primary itself is still locked (neither committed nor rolled back), the transaction is treated as timed out and rolled back.
Resolve Locks is carried out by the GC leader sending requests to all Regions to scan for expired locks, querying the Primary's status for each lock found, and then sending requests to commit or roll them back. By default this runs in parallel, with concurrency equal to the number of TiKV nodes.
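
That default concurrency explains the [concurrency=22] seen in the logs later in this post: this cluster has 22 TiKV nodes. In 3.0 the behavior is governed by the tikv_gc_auto_concurrency and tikv_gc_concurrency rows in mysql.tidb; a quick way to inspect them (placeholder endpoint again):

mysql -h 192.168.1.1 -P 4000 -u root -e \
  "SELECT VARIABLE_NAME, VARIABLE_VALUE FROM mysql.tidb WHERE VARIABLE_NAME IN ('tikv_gc_auto_concurrency', 'tikv_gc_concurrency');"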
Delete Ranges
Operations such as DROP TABLE/INDEX delete large contiguous ranges of data. Deleting each key individually and then GC-ing each key would make both execution and space reclamation extremely slow. So TiDB does not delete key by key; instead it records the ranges to be deleted along with the timestamp of the deletion. Delete Ranges then physically removes, in bulk, the ranges whose timestamps fall before the safe point.
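
The pending ranges are recorded in the mysql.gc_delete_range table, which is handy for checking whether a DROP is still waiting on GC (a sketch, same placeholder endpoint):

# each row is one range waiting for physical deletion; ts is the deletion timestamp
mysql -h 192.168.1.1 -P 4000 -u root -e \
  "SELECT job_id, element_id, ts FROM mysql.gc_delete_range LIMIT 10;"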
Do GC
This step deletes the expired versions of all keys. To guarantee that every timestamp after the safe point still sees a consistent snapshot, it removes data committed before the safe point but keeps, for each key, the last write before the safe point (unless that last write was a delete).
In this step TiDB only needs to push the safe point to PD to finish the whole round. TiKV detects the safe point update on its own and runs GC on every Region it leads on the local node, while the GC leader is free to trigger the next round.
    Note:
    In TiDB v2.1 and earlier, Do GC was implemented by TiDB sending a request to every Region. In v3.0 and later, the old GC mode can still be used by changing the configuration; see the GC configuration docs for details.
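
In 3.0 that switch is the tikv_gc_mode row in mysql.tidb, which takes 'distributed' (the default) or 'central' (the pre-3.0 behavior). Checking which mode a cluster runs in (placeholder endpoint):

mysql -h 192.168.1.1 -P 4000 -u root -e \
  "SELECT VARIABLE_VALUE FROM mysql.tidb WHERE VARIABLE_NAME = 'tikv_gc_mode';"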

3. Troubleshooting Steps

The tidb logs contained the following two kinds of entries:

[2020/06/21 16:45:00.745 +08:00] [INFO] [gc_worker.go:246] ["[gc worker] there's already a gc job running, skipped"] ["leaderTick on"=5cb582e33dc0004]
[2020/06/21 16:44:00.749 +08:00] [INFO] [gc_worker.go:277] ["[gc worker] starts the whole job"] [uuid=5cb582e33dc0004] [safePoint=417524204365938688] [concurrency=22]
[2020/06/21 16:44:00.749 +08:00] [INFO] [gc_worker.go:773] ["[gc worker] start resolve locks"] [uuid=5cb582e33dc0004] [safePoint=417524204365938688] [concurrency=22]

GC not running:
[Figure: GC panel showing no runs]

The Region count kept climbing:
[Figure: Region number rising steadily]
GC speed dropped to zero; even so, the cluster's overall QPS and duration remained stable:
[Figure: GC speed at zero]
[Figure: QPS and duration stable]
Zooming in, the exact time was 8:44 on the 21st:
[Figure: zoomed-in GC speed panel]

Meanwhile, for the earlier INSERT-conflict QPS jitter, the application had switched to INSERT IGNORE at around 14:00 on the 21st.
At this point we still suspected that the earlier severe write-write conflicts had left a very large number of residual locks on these Regions, making them slow to resolve.
When GC speed hit zero after 8 a.m., the application had not yet switched INSERT to INSERT IGNORE; the tidb gc log at that time read:

[2020/06/21 16:42:41.938 +08:00] [ERROR] [gc_worker.go:787] ["[gc worker] resolve locks failed"] [uuid=5cb549336b40001] [safePoint=417520979457343488] [error="loadRegion from PD failed, key: \"t\\x80\\x00\\x00\\x00\\x00\\x01m\\xcb_r\\xf8\\x00\\x00\\x00\\x01\\x8f\\xd7;\", err: rpc error: code = Canceled desc = context canceled"] [errorVerbose="loadRegion from PD failed, key: \"t\\x80\\x00\\x00\\x00\\x00\\x01m\\xcb_r\\xf8\\x00\\x00\\x00\\x01\\x8f\\xd7;\", err: rpc error: code = Canceled desc = context canceled\ngithub.com/pingcap/tidb/store/tikv.(*RegionCache).loadRegion\n\tgithub.com/pingcap/tidb@/store/tikv/region_cache.go:621\ngithub.com/pingcap/tidb/store/tikv.(*RegionCache).findRegionByKey\n\tgithub.com/pingcap/tidb@/store/tikv/region_cache.go:358\ngithub.com/pingcap/tidb/store/tikv.(*RegionCache).LocateKey\n\tgithub.com/pingcap/tidb@/store/tikv/region_cache.go:318\ngithub.com/pingcap/tidb/store/tikv.(*RangeTaskRunner).RunOnRange\n\tgithub.com/pingcap/tidb@/store/tikv/range_task.go:147\ngithub.com/pingcap/tidb/store/tikv/gcworker.(*GCWorker).resolveLocks\n\tgithub.com/pingcap/tidb@/store/tikv/gcworker/gc_worker.go:785\ngithub.com/pingcap/tidb/store/tikv/gcworker.(*GCWorker).runGCJob\n\tgithub.com/pingcap/tidb@/store/tikv/gcworker/gc_worker.go:492\nruntime.goexit\n\truntime/asm_amd64.s:1357"] [stack="github.com/pingcap/tidb/store/tikv/gcworker.(*GCWorker).resolveLocks\n\tgithub.com/pingcap/tidb@/store/tikv/gcworker/gc_worker.go:787\ngithub.com/pingcap/tidb/store/tikv/gcworker.(*GCWorker).runGCJob\n\tgithub.com/pingcap/tidb@/store/tikv/gcworker/gc_worker.go:492"]

The log above shows that GC needs to load Region information, but the request to PD timed out (context canceled).

After restarting the tidb nodes, the logs were the same as before:
[Figure: identical gc_worker log after restart]
The completed count in the resolveLocks Progress panel stayed stuck at a single value; normally it should fluctuate:
[Figure: resolveLocks Progress flat-lining]

[Figure: cluster disk usage at 70%]
The cluster was already 70% full, and the original plan to drop some tables to free space was now stalled.

Was something wrong with the Regions themselves? We used the following command to check for Regions with missing replicas:

Run this inside pd-ctl:
 region --jq=".regions[] | {id: .id, peer_stores: [.peers[].store_id] | select(length as $total | length>=$total-length) }" 
The check found nothing missing, so the Region replicas were fine.
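
For reference, when the cluster uses the default 3 replicas, the pd-ctl documentation offers a simpler form of the same check, listing every Region whose peer count differs from 3:

region --jq=".regions[] | {id: .id, peer_stores: [.peers[].store_id] | select(length != 3)}"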

On the PD monitoring dashboard, region health never changed:
[Figure: region health panel]
abnormal stores all looked normal too:
[Figure: abnormal stores panel]
Analysis:

scan lock and get txn status both succeed, but together they take more than 40 minutes; because more than 40 minutes have passed, by the time resolve lock runs the region cache entry has expired (no access for 10 minutes), and the failed retry starts over from scan lock.

GC thus went into a loop, producing the symptoms above.
A region cache entry that has not been accessed for 10 minutes has to be reloaded from PD.

Using a diagnostic patch provided by PingCAP, we found that region 8026632 was abnormal; its replicas were on tidb05, tidb06, and tidb09.

We therefore suspected that GC was failing on a few individual Regions, leaving GC stuck in the Resolve Locks stage.

[sync360@tidb21 bin]$ curl http://192.168.1.1:10080/regions/8026632
{
 "region_id": 8026632,
 "start_key": "dIAAAAAAAW3LX2mAAAAAAAAACQQZpqoETwAAAAPQAAAAAlq4bA==",
 "end_key": "dIAAAAAAAW3LX2mAAAAAAAAACQQZpqpxnwAAAAOgAAAAAmnitw==",
 "frames": [
  {
   "db_name": "snapshot",
   "table_name": "table_info_202006",
   "table_id": 93643,
   "is_record": false,
   "index_name": "idx_check_time",
   "index_id": 9,
   "index_values": [
    "1848351632564158464",
    "5764607523073734764"
   ]
  }
 ]
}

[sync360@tidb21 bin]$ ./tikv-ctl --host 192.168.1.1:20160 raft region -r 8026632              
region id: 8026632
region state key: \001\003\000\000\000\000\000zz\010\001
region state: Some(region { id: 8026632 start_key: 748000000000016DFFCB5F698000000000FF0000090419A6AA04FF4F00000003D00000FF00025AB86C000000FC end_key: 748000000000016DFFCB5F698000000000FF0000090419A6AA71FF9F00000003A00000FF000269E2B7000000FC region_epoch { conf_ver: 740 version: 61000 } peers { id: 8026633 store_id: 7 } peers { id: 8026634 store_id: 11 } peers { id: 8026635 store_id: 8 } })
raft state key: \001\002\000\000\000\000\000zz\010\002
raft state: Some(hard_state {term: 6 vote: 8026633 commit: 6 } last_index: 6)
apply state key: \001\002\000\000\000\000\000zz\010\003
apply state: Some(applied_index: 6 truncated_state { index: 5 term: 5 })

[sync360@tidb21 bin]$ ./tikv-ctl --host 192.168.1.1:20160 size -r 8026632              
region id: 8026632
cf default region size: 0 B
cf write region size: 60.042 MB
cf lock region size: 14.039 MB

We pinpointed this Region to an index (idx_check_time on snapshot.table_info_202006, per the curl output above):
[Figure: region-to-index mapping]

Since 3.0.5, the Region Cache TTL can be changed via the region-cache-ttl configuration (#12683):

tikv_client:
region-cache-ttl: 86400

If region-cache-ttl is increased, cached Region information simply stays valid longer when not accessed; if a Region that is actually accessed turns out to have stale cached info, that only adds backoff time. In principle there is no other impact.
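
We rolled the change out through tidb-ansible; a sketch of the procedure, assuming a tidb-ansible deployment (playbook name as shipped with tidb-ansible):

# after editing the tikv_client section in conf/tidb.yml, restart only the tidb-servers
ansible-playbook rolling_update.yml --tags=tidb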

The stock logging was incomplete, so PingCAP later supplied a tool that prints extra detail.
It shows resolveLocks taking about an hour, immediately followed by region cache TTL fail, confirming the problem was in the cache TTL path:

[2020/06/28 20:02:39.214 +08:00] [INFO] [lock_resolver.go:215] ["BatchResolveLocks: lookup txn status"] ["cost time"=59m56.324274575s] ["num of txn"=1024]
[2020/06/28 20:02:39.214 +08:00] [WARN] [region_cache.go:269] ["region cache TTL fail"] [region=8026632]
[2020/06/28 20:02:39.214 +08:00] [WARN] [region_request.go:92] ["SendReqCtx fail because can't get ctx"] [region=8026632]
[2020/06/28 20:02:39.219 +08:00] [WARN] [lock_resolver.go:245] ["BatchResolveLocks: region error"] [regionID=8026632] [regionVer=61000] [regionConfVer=740] [regionErr="epoch_not_match:<> "]
[2020/06/28 20:05:12.827 +08:00] [INFO] [lock_resolver.go:215] ["BatchResolveLocks: lookup txn status"] ["cost time"=1h2m29.730427002s] ["num of txn"=1024]
[2020/06/28 20:05:12.828 +08:00] [WARN] [region_cache.go:269] ["region cache TTL fail"] [region=8019447]
[2020/06/28 20:05:12.828 +08:00] [WARN] [region_request.go:92] ["SendReqCtx fail because can't get ctx"] [region=8019447]
[2020/06/28 20:05:12.828 +08:00] [WARN] [lock_resolver.go:245] ["BatchResolveLocks: region error"] [regionID=8019447] [regionVer=60980] [regionConfVer=683] [regionErr="epoch_not_match:<> "]
[2020/06/28 20:05:13.439 +08:00] [INFO] [lock_resolver.go:215] ["BatchResolveLocks: lookup txn status"] ["cost time"=1h2m30.135826553s] ["num of txn"=1024]
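
This pattern is easy to pull out of the gc leader's log with a filter along these lines (log path depends on your deployment):

grep -e 'BatchResolveLocks' -e 'region cache TTL fail' tidb.log | tail -n 20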

So we decided to raise region-cache-ttl in tidb.yml from the default 600 to 86400, so that scan_lock would not fail and start over after running, thereby working around the problem; we also set gc_life_time to 10 minutes so GC would trigger as soon as possible.
Right after the change, GC still showed no speed at first, and resolveLocks looked as flat as before.

After the cache change, completed counts occasionally started to appear:
cat tidb.log | grep '\[2020/06/29' | grep -e 'gc_worker' -e 'range_task' -e 'lock_resolver'

[2020/06/29 15:13:23.698 +08:00] [INFO] [gc_worker.go:156] ["[gc worker] start"] [uuid=5cbfbb183740041]
[2020/06/29 15:18:44.482 +08:00] [INFO] [gc_worker.go:187] ["[gc worker] quit"] [uuid=5cbfbb183740041]
[2020/06/29 15:19:03.159 +08:00] [INFO] [gc_worker.go:156] ["[gc worker] start"] [uuid=5cbfbc63b5c00cc]
[2020/06/29 15:25:03.395 +08:00] [INFO] [gc_worker.go:277] ["[gc worker] starts the whole job"] [uuid=5cbfbc63b5c00cc] [safePoint=417703369985753088] [concurrency=22]
[2020/06/29 15:25:03.395 +08:00] [INFO] [gc_worker.go:773] ["[gc worker] start resolve locks"] [uuid=5cbfbc63b5c00cc] [safePoint=417703369985753088] [concurrency=22]
[2020/06/29 15:25:03.395 +08:00] [INFO] [range_task.go:90] ["range task started"] [name=resolve-locks-runner] [startKey=] [endKey=] [concurrency=22]
[2020/06/29 15:26:03.445 +08:00] [INFO] [gc_worker.go:246] ["[gc worker] there's already a gc job running, skipped"] ["leaderTick on"=5cbfbc63b5c00cc]
[2020/06/29 15:27:03.307 +08:00] [INFO] [gc_worker.go:246] ["[gc worker] there's already a gc job running, skipped"] ["leaderTick on"=5cbfbc63b5c00cc]
[2020/06/29 15:28:03.172 +08:00] [INFO] [gc_worker.go:246] ["[gc worker] there's already a gc job running, skipped"] ["leaderTick on"=5cbfbc63b5c00cc]
[2020/06/29 15:29:03.173 +08:00] [INFO] [gc_worker.go:246] ["[gc worker] there's already a gc job running, skipped"] ["leaderTick on"=5cbfbc63b5c00cc]
[2020/06/29 15:30:03.434 +08:00] [INFO] [gc_worker.go:246] ["[gc worker] there's already a gc job running, skipped"] ["leaderTick on"=5cbfbc63b5c00cc]
[2020/06/29 15:31:03.677 +08:00] [INFO] [gc_worker.go:246] ["[gc worker] there's already a gc job running, skipped"] ["leaderTick on"=5cbfbc63b5c00cc]
[2020/06/29 15:32:03.173 +08:00] [INFO] [gc_worker.go:246] ["[gc worker] there's already a gc job running, skipped"] ["leaderTick on"=5cbfbc63b5c00cc]
[2020/06/29 15:33:03.186 +08:00] [INFO] [gc_worker.go:246] ["[gc worker] there's already a gc job running, skipped"] ["leaderTick on"=5cbfbc63b5c00cc]
[2020/06/29 15:34:03.174 +08:00] [INFO] [gc_worker.go:246] ["[gc worker] there's already a gc job running, skipped"] ["leaderTick on"=5cbfbc63b5c00cc]
[2020/06/29 15:35:03.220 +08:00] [INFO] [gc_worker.go:246] ["[gc worker] there's already a gc job running, skipped"] ["leaderTick on"=5cbfbc63b5c00cc]
[2020/06/29 15:35:03.395 +08:00] [INFO] [range_task.go:133] ["range task in progress"] [name=resolve-locks-runner] [startKey=] [endKey=] [concurrency=22] ["cost time"=10m0.000165232s] ["completed regions"=496646]
[2020/06/29 15:36:03.498 +08:00] [INFO] [gc_worker.go:246] ["[gc worker] there's already a gc job running, skipped"] ["leaderTick on"=5cbfbc63b5c00cc]
[2020/06/29 15:37:03.173 +08:00] [INFO] [gc_worker.go:246] ["[gc worker] there's already a gc job running, skipped"] ["leaderTick on"=5cbfbc63b5c00cc]
[2020/06/29 15:38:03.175 +08:00] [INFO] [gc_worker.go:246] ["[gc worker] there's already a gc job running, skipped"] ["leaderTick on"=5cbfbc63b5c00cc]
[2020/06/29 15:39:03.173 +08:00] [INFO] [gc_worker.go:246] ["[gc worker] there's already a gc job running, skipped"] ["leaderTick on"=5cbfbc63b5c00cc]
[2020/06/29 15:40:03.200 +08:00] [INFO] [gc_worker.go:246] ["[gc worker] there's already a gc job running, skipped"] ["leaderTick on"=5cbfbc63b5c00cc]
[2020/06/29 15:41:03.174 +08:00] [INFO] [gc_worker.go:246] ["[gc worker] there's already a gc job running, skipped"] ["leaderTick on"=5cbfbc63b5c00cc]
[2020/06/29 15:42:03.181 +08:00] [INFO] [gc_worker.go:246] ["[gc worker] there's already a gc job running, skipped"] ["leaderTick on"=5cbfbc63b5c00cc]
[2020/06/29 15:43:03.209 +08:00] [INFO] [gc_worker.go:246] ["[gc worker] there's already a gc job running, skipped"] ["leaderTick on"=5cbfbc63b5c00cc]
[2020/06/29 15:44:03.359 +08:00] [INFO] [gc_worker.go:246] ["[gc worker] there's already a gc job running, skipped"] ["leaderTick on"=5cbfbc63b5c00cc]
[2020/06/29 15:45:03.179 +08:00] [INFO] [gc_worker.go:246] ["[gc worker] there's already a gc job running, skipped"] ["leaderTick on"=5cbfbc63b5c00cc]
[2020/06/29 15:45:03.407 +08:00] [INFO] [range_task.go:133] ["range task in progress"] [name=resolve-locks-runner] [startKey=] [endKey=] [concurrency=22] ["cost time"=20m0.011628551s] ["completed regions"=987799]
[2020/06/29 15:46:03.173 +08:00] [INFO] [gc_worker.go:246] ["[gc worker] there's already a gc job running, skipped"] ["leaderTick on"=5cbfbc63b5c00cc]
[2020/06/29 15:47:03.170 +08:00] [INFO] [gc_worker.go:246] ["[gc worker] there's already a gc job running, skipped"] ["leaderTick on"=5cbfbc63b5c00cc]
[2020/06/29 15:48:03.291 +08:00] [INFO] [gc_worker.go:246] ["[gc worker] there's already a gc job running, skipped"] ["leaderTick on"=5cbfbc63b5c00cc]
[2020/06/29 15:49:03.188 +08:00] [INFO] [gc_worker.go:246] ["[gc worker] there's already a gc job running, skipped"] ["leaderTick on"=5cbfbc63b5c00cc]
[2020/06/29 15:50:03.239 +08:00] [INFO] [gc_worker.go:246] ["[gc worker] there's already a gc job running, skipped"] ["leaderTick on"=5cbfbc63b5c00cc]
[2020/06/29 15:51:03.175 +08:00] [INFO] [gc_worker.go:246] ["[gc worker] there's already a gc job running, skipped"] ["leaderTick on"=5cbfbc63b5c00cc]
[2020/06/29 15:52:03.437 +08:00] [INFO] [gc_worker.go:246] ["[gc worker] there's already a gc job running, skipped"] ["leaderTick on"=5cbfbc63b5c00cc]
[2020/06/29 15:53:03.574 +08:00] [INFO] [gc_worker.go:246] ["[gc worker] there's already a gc job running, skipped"] ["leaderTick on"=5cbfbc63b5c00cc]
[2020/06/29 15:54:03.525 +08:00] [INFO] [gc_worker.go:246] ["[gc worker] there's already a gc job running, skipped"] ["leaderTick on"=5cbfbc63b5c00cc]
[2020/06/29 15:55:03.213 +08:00] [INFO] [gc_worker.go:246] ["[gc worker] there's already a gc job running, skipped"] ["leaderTick on"=5cbfbc63b5c00cc]
[2020/06/29 15:55:03.401 +08:00] [INFO] [range_task.go:133] ["range task in progress"] [name=resolve-locks-runner] [startKey=] [endKey=] [concurrency=22] ["cost time"=30m0.005813553s] ["completed regions"=1579411]
[2020/06/29 15:56:03.190 +08:00] [INFO] [gc_worker.go:246] ["[gc worker] there's already a gc job running, skipped"] ["leaderTick on"=5cbfbc63b5c00cc]

[Figure: resolveLocks progress showing completed regions]

But it was still slow; GC speed did not fully return to normal until about 1 a.m. on the 30th:
[Figure: GC speed back to normal]

Once GC took effect, disk usage showed a clear inflection point as space was reclaimed:
[Figure: disk usage inflection point]

4. Summary

Without a deep understanding of how GC works, and without familiarity with the relevant parameters, a case like this takes a long time to work through. We hope this article has deepened your understanding of GC. This case went on to trigger yet another one, which we will share in a follow-up post.
