Does Google's Git v2 bring a disruptive performance boost? Probably not.

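About the author

Wang Zhenwei is a member of CODING's founding team with many years of experience in systems software development. He specializes in Linux, Golang, Java, Ruby, Docker, and related technologies, and has spent the past two years working on system architecture and operations at CODING.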

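Preface

Google recently published an article describing an update to one of Git's transfer protocols, and it caused quite a stir in the Chinese tech community (search Baidu for "Git v2 performance improvement" to find the related articles).
Many friends in tech circles have been reposting the news. But how large is the improvement, and what are the details behind it? In fact, this change improves performance only in extreme cases; in the vast majority of cases users will not notice any difference. Much of the uncritical reposting is probably down to Google's brand effect :)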

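What is Git?

To explain why, let us first briefly introduce Git's protocols. If you are not yet familiar with Git and want to learn more, see the official website: http://git-scm.com/ . You can also visit https://coding.net/help/doc/git to learn how to use a fast, high-quality Git hosting service in China.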

Git transfer protocols

Git commonly uses three protocols: SSH, HTTP(S), and Git. The first two are the most widely used.

Let us look at usage examples of the HTTP(S) and SSH protocols:

git clone https://git.coding.net/wzw/coding-demo.git
Cloning into 'coding-demo'...
remote: Counting objects: 3, done.
remote: Total 3 (delta 0), reused 0 (delta 0)
Unpacking objects: 100% (3/3), done.
git clone git@git.coding.net:wzw/coding-demo.git
Cloning into 'coding-demo'...
remote: Counting objects: 3, done.
remote: Total 3 (delta 0), reused 0 (delta 0)
Receiving objects: 100% (3/3), done.

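As you can see, for a fresh clone the two follow essentially the same process.

In fact, Git's underlying handling is the same regardless of the application-layer protocol, whether HTTP(S), SSH, or Git.

Let us look more closely at what Git does during the transfer.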

GIT_TRACE=1 GIT_TRACE_PACKET=1 git clone https://git.coding.net/wzw/coding-demo.git
17:48:21.767799 git.c:344               trace: built-in: git 'clone' 'https://git.coding.net/wzw/coding-demo.git'
Cloning into 'coding-demo'...
17:48:21.797959 run-command.c:626       trace: run_command: 'git-remote-https' 'origin' 'https://git.coding.net/wzw/coding-demo.git'
17:48:22.278880 pkt-line.c:80           packet:          git< # service=git-upload-pack
17:48:22.279390 pkt-line.c:80           packet:          git< 0000
17:48:22.279405 pkt-line.c:80           packet:          git< fdacba1d541c75bd48f2cd742ee18f77ea3517a1 HEAD\0multi_ack thin-pack side-band side-band-64k ofs-delta shallow deepen-since deepen-not deepen-relative no-progress include-tag multi_ack_detailed no-done symref=HEAD:refs/heads/master agent=git/2.15.0
17:48:22.279419 pkt-line.c:80           packet:          git< fdacba1d541c75bd48f2cd742ee18f77ea3517a1 refs/heads/master
17:48:22.279431 pkt-line.c:80           packet:          git< 1536ad10fc0a188c50680932ca191c8da46938c4 refs/heads/test-abc
17:48:22.279442 pkt-line.c:80           packet:          git< 1536ad10fc0a188c50680932ca191c8da46938c4 refs/heads/test-bcd
17:48:22.279453 pkt-line.c:80           packet:          git< 30eb4b0d813c662c4d7e87c4d3b4cf561e544f8e refs/tags/v1.0
17:48:22.279472 pkt-line.c:80           packet:          git< 1536ad10fc0a188c50680932ca191c8da46938c4 refs/tags/v1.0^{}
17:48:22.279483 pkt-line.c:80           packet:          git< 0000
17:48:22.280959 pkt-line.c:80           packet:          git> fdacba1d541c75bd48f2cd742ee18f77ea3517a1 refs/heads/master
17:48:22.280986 pkt-line.c:80           packet:          git> fdacba1d541c75bd48f2cd742ee18f77ea3517a1 refs/heads/master
17:48:22.280999 pkt-line.c:80           packet:          git> 1536ad10fc0a188c50680932ca191c8da46938c4 refs/heads/test-abc
17:48:22.281011 pkt-line.c:80           packet:          git> 1536ad10fc0a188c50680932ca191c8da46938c4 refs/heads/test-bcd
17:48:22.281023 pkt-line.c:80           packet:          git> 30eb4b0d813c662c4d7e87c4d3b4cf561e544f8e refs/tags/v1.0
17:48:22.281033 pkt-line.c:80           packet:          git> 0000
17:48:22.281089 run-command.c:626       trace: run_command: 'fetch-pack' '--stateless-rpc' '--stdin' '--lock-pack' '--thin' '--check-self-contained-and-connected' '--cloning' 'https://git.coding.net/wzw/coding-demo.git/'
17:48:22.287860 git.c:344               trace: built-in: git 'fetch-pack' '--stateless-rpc' '--stdin' '--lock-pack' '--thin' '--check-self-contained-and-connected' '--cloning' 'https://git.coding.net/wzw/coding-demo.git/'
17:48:22.288761 pkt-line.c:80           packet:   fetch-pack< fdacba1d541c75bd48f2cd742ee18f77ea3517a1 refs/heads/master
17:48:22.288799 pkt-line.c:80           packet:   fetch-pack< fdacba1d541c75bd48f2cd742ee18f77ea3517a1 refs/heads/master
17:48:22.288824 pkt-line.c:80           packet:   fetch-pack< 1536ad10fc0a188c50680932ca191c8da46938c4 refs/heads/test-abc
17:48:22.288838 pkt-line.c:80           packet:   fetch-pack< 1536ad10fc0a188c50680932ca191c8da46938c4 refs/heads/test-bcd
17:48:22.288851 pkt-line.c:80           packet:   fetch-pack< 30eb4b0d813c662c4d7e87c4d3b4cf561e544f8e refs/tags/v1.0
17:48:22.288863 pkt-line.c:80           packet:   fetch-pack< 0000
17:48:22.288876 pkt-line.c:80           packet:   fetch-pack< fdacba1d541c75bd48f2cd742ee18f77ea3517a1 HEAD\0multi_ack thin-pack side-band side-band-64k ofs-delta shallow deepen-since deepen-not deepen-relative no-progress include-tag multi_ack_detailed no-done symref=HEAD:refs/heads/master agent=git/2.15.0
17:48:22.288901 pkt-line.c:80           packet:   fetch-pack< fdacba1d541c75bd48f2cd742ee18f77ea3517a1 refs/heads/master
17:48:22.288914 pkt-line.c:80           packet:   fetch-pack< 1536ad10fc0a188c50680932ca191c8da46938c4 refs/heads/test-abc
17:48:22.288927 pkt-line.c:80           packet:   fetch-pack< 1536ad10fc0a188c50680932ca191c8da46938c4 refs/heads/test-bcd
17:48:22.288941 pkt-line.c:80           packet:   fetch-pack< 30eb4b0d813c662c4d7e87c4d3b4cf561e544f8e refs/tags/v1.0
17:48:22.288955 pkt-line.c:80           packet:   fetch-pack< 1536ad10fc0a188c50680932ca191c8da46938c4 refs/tags/v1.0^{}
17:48:22.288967 pkt-line.c:80           packet:   fetch-pack< 0000
17:48:22.289909 pkt-line.c:80           packet:   fetch-pack> want fdacba1d541c75bd48f2cd742ee18f77ea3517a1 multi_ack_detailed no-done side-band-64k thin-pack ofs-delta deepen-since deepen-not agent=git/2.15.1.(Apple.Git-101)
17:48:22.289924 pkt-line.c:80           packet:   fetch-pack> want 1536ad10fc0a188c50680932ca191c8da46938c4
17:48:22.290081 pkt-line.c:80           packet:   fetch-pack> want 1536ad10fc0a188c50680932ca191c8da46938c4
17:48:22.290094 pkt-line.c:80           packet:   fetch-pack> want 30eb4b0d813c662c4d7e87c4d3b4cf561e544f8e
17:48:22.290103 pkt-line.c:80           packet:   fetch-pack> 0000
17:48:22.290127 pkt-line.c:80           packet:   fetch-pack> done
17:48:22.290257 pkt-line.c:80           packet:   fetch-pack> 0000
17:48:22.290290 pkt-line.c:80           packet:          git< 00a8want fdacba1d541c75bd48f2cd742ee18f77ea3517a1 multi_ack_detailed no-done side-band-64k thin-pack ofs-delta deepen-since deepen-not agent=git/2.15.1.(Apple.Git-101)0032want 1536ad10fc0a188c50680932ca191c8da46938c40032want 1536ad10fc0a188c50680932ca191c8da46938c40032want 30eb4b0d813c662c4d7e87c4d3b4cf561e544f8e00000009done
17:48:22.290375 pkt-line.c:80           packet:          git< 0000
17:48:22.436811 pkt-line.c:80           packet:   fetch-pack< NAK
17:48:22.436844 pkt-line.c:80           packet:   fetch-pack> 0000
17:48:22.437152 pkt-line.c:80           packet:     sideband< \2Counting objects: 7, done.
remote: Counting objects: 7, done.
17:48:22.437185 pkt-line.c:80           packet:     sideband< \2Compressing objects:  25% (1/4)   \15
17:48:22.437200 pkt-line.c:80           packet:     sideband< \2Compressing objects:  50% (2/4)   \15
17:48:22.437250 pkt-line.c:80           packet:     sideband< \2Compressing objects:  75% (3/4)   \15
17:48:22.437279 pkt-line.c:80           packet:     sideband< \2Compressing objects: 100% (4/4)   \15
17:48:22.437302 pkt-line.c:80           packet:     sideband< \2Compressing objects: 100% (4/4), done.
remote: Compressing objects: 100% (4/4), done.
17:48:22.447214 pkt-line.c:80           packet:          git< 0000
17:48:22.447201 pkt-line.c:80           packet:     sideband< PACK ...
17:48:22.447316 pkt-line.c:80           packet:     sideband< \2Total 7 (delta 0), reused 0 (delta 0)
remote: Total 7 (delta 0), reused 0 (delta 0)
17:48:22.447363 pkt-line.c:80           packet:     sideband< 0000
17:48:22.447372 run-command.c:626       trace: run_command: 'unpack-objects' '--pack_header=2,7'
17:48:22.453090 git.c:344               trace: built-in: git 'unpack-objects' '--pack_header=2,7'
Unpacking objects: 100% (7/7), done.
17:48:22.460604 run-command.c:626       trace: run_command: 'rev-list' '--objects' '--stdin' '--not' '--all' '--quiet' '--progress=Checking connectivity'
17:48:22.464831 git.c:344               trace: built-in: git 'rev-list' '--objects' '--stdin' '--not' '--all' '--quiet' '--progress=Checking connectivity'
GIT_TRACE=1 GIT_TRACE_PACKET=1 git clone git@git.coding.net:wzw/coding-demo.git
17:49:18.654786 git.c:344               trace: built-in: git 'clone' 'git@git.coding.net:wzw/coding-demo.git'
Cloning into 'coding-demo'...
17:49:18.669187 run-command.c:626       trace: run_command: 'ssh' 'git@git.coding.net' 'git-upload-pack '\''wzw/coding-demo.git'\'''
17:49:19.768942 pkt-line.c:80           packet:        clone< fdacba1d541c75bd48f2cd742ee18f77ea3517a1 HEAD\0multi_ack thin-pack side-band side-band-64k ofs-delta shallow deepen-since deepen-not deepen-relative no-progress include-tag multi_ack_detailed symref=HEAD:refs/heads/master agent=git/2.15.0
17:49:19.772436 pkt-line.c:80           packet:        clone< fdacba1d541c75bd48f2cd742ee18f77ea3517a1 refs/heads/master
17:49:19.772527 pkt-line.c:80           packet:        clone< 1536ad10fc0a188c50680932ca191c8da46938c4 refs/heads/test-abc
17:49:19.772549 pkt-line.c:80           packet:        clone< 1536ad10fc0a188c50680932ca191c8da46938c4 refs/heads/test-bcd
17:49:19.772566 pkt-line.c:80           packet:        clone< 30eb4b0d813c662c4d7e87c4d3b4cf561e544f8e refs/tags/v1.0
17:49:19.772863 pkt-line.c:80           packet:        clone< 1536ad10fc0a188c50680932ca191c8da46938c4 refs/tags/v1.0^{}
17:49:19.772910 pkt-line.c:80           packet:        clone< 0000
17:49:19.776185 pkt-line.c:80           packet:        clone> want fdacba1d541c75bd48f2cd742ee18f77ea3517a1 multi_ack_detailed side-band-64k thin-pack ofs-delta deepen-since deepen-not agent=git/2.15.1.(Apple.Git-101)
17:49:19.776215 pkt-line.c:80           packet:        clone> want fdacba1d541c75bd48f2cd742ee18f77ea3517a1
17:49:19.776224 pkt-line.c:80           packet:        clone> want 1536ad10fc0a188c50680932ca191c8da46938c4
17:49:19.776232 pkt-line.c:80           packet:        clone> want 1536ad10fc0a188c50680932ca191c8da46938c4
17:49:19.776239 pkt-line.c:80           packet:        clone> want 30eb4b0d813c662c4d7e87c4d3b4cf561e544f8e
17:49:19.776246 pkt-line.c:80           packet:        clone> 0000
17:49:19.776262 pkt-line.c:80           packet:        clone> done
17:49:19.879841 pkt-line.c:80           packet:        clone< NAK
17:49:19.880083 run-command.c:626       trace: run_command: 'index-pack' '--stdin' '-v' '--fix-thin' '--keep=fetch-pack 75332 on wangzheweideMBP' '--check-self-contained-and-connected'
17:49:19.885280 git.c:344               trace: built-in: git 'index-pack' '--stdin' '-v' '--fix-thin' '--keep=fetch-pack 75332 on wangzheweideMBP' '--check-self-contained-and-connected'
17:49:19.889021 pkt-line.c:80           packet:     sideband< \2Counting objects: 7, done.
remote: Counting objects: 7, done.
17:49:19.895119 pkt-line.c:80           packet:     sideband< \2Compressing objects:  25% (1/4)   \15Compressing objects:  50% (2/4)   \15Compressing objects:  75% (3/4)   \15Compressing objects: 10
17:49:19.895170 pkt-line.c:80           packet:     sideband< \20% (4/4)   \15
17:49:19.897621 pkt-line.c:80           packet:     sideband< \2Compressing objects: 100% (4/4), done.
remote: Compressing objects: 100% (4/4), done.
17:49:19.914866 pkt-line.c:80           packet:     sideband< PACK ...
17:49:19.914916 pkt-line.c:80           packet:     sideband< \2Total 7 (delta 0), reused 0 (delta 0)
remote: Total 7 (delta 0), reused 0 (delta 0)
17:49:19.914936 pkt-line.c:80           packet:     sideband< 0000
Receiving objects: 100% (7/7), done.
17:49:20.088640 run-command.c:626       trace: run_command: 'rev-list' '--objects' '--stdin' '--not' '--all' '--quiet' '--progress=Checking connectivity'
17:49:20.093965 git.c:344               trace: built-in: git 'rev-list' '--objects' '--stdin' '--not' '--all' '--quiet' '--progress=Checking connectivity'

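I set the GIT_TRACE=1 and GIT_TRACE_PACKET=1 environment variables so that Git prints more information during the clone, which helps with debugging. We can also see that Git invokes different commands underneath for HTTPS and SSH, yet the content exchanged is extremely similar.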
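When reading traces like the ones above, it can help to separate the packet lines from the progress output. The following is a minimal Python sketch, assuming the line layout seen in these particular traces (timestamp, pkt-line.c location, then the endpoint name followed by `<` for data read or `>` for data written); the function name parse_packet_line is made up for illustration.

```python
import re

# Matches GIT_TRACE_PACKET lines shaped like the ones in the traces above:
#   17:48:22.279419 pkt-line.c:80   packet:   git< <payload>
PACKET_RE = re.compile(
    r"^(?P<time>\d{2}:\d{2}:\d{2}\.\d+)\s+pkt-line\.c:\d+\s+"
    r"packet:\s+(?P<who>\S+?)(?P<dir>[<>])\s(?P<payload>.*)$"
)


def parse_packet_line(line):
    """Return time, endpoint, direction and payload, or None for other lines."""
    match = PACKET_RE.match(line)
    if match is None:
        return None  # "trace:" lines and "remote:" progress are not packets
    return {
        "time": match.group("time"),
        "who": match.group("who"),
        # "<" marks data the process read, ">" data it wrote
        "direction": "received" if match.group("dir") == "<" else "sent",
        "payload": match.group("payload"),
    }


if __name__ == "__main__":
    sample = (
        "17:48:22.289924 pkt-line.c:80           packet:   "
        "fetch-pack> want 1536ad10fc0a188c50680932ca191c8da46938c4"
    )
    print(parse_packet_line(sample))
```

Filtering a saved trace for entries whose payload starts with "want" or "have" gives a quick view of the negotiation without the surrounding noise.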

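In short, the protocol exchange for a clone is roughly as follows:

  • The client tells the remote which operation it wants to perform: git-upload-pack (used by every read-type operation)
  • The server returns the capabilities it supports and the list of refs it advertises
  • The client declares the list of objects it wants to receive
  • The server computes all the objects that need to be transferred, compresses them, and sends them to the client
  • The client unpacks and verifies the objects
  • The client updates its local refs (this step is not visible in the detailed trace above; see the ref update in the fetch trace at the end of this article)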

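To understand this exchange, you need a basic understanding of how Git stores data underneath, so here is a short primer.

One common description is that Git is a content-addressable system with history tracking. That sounds abstract, but it is easy to understand: underneath, Git stores all version-controlled content as objects and references (refs). Objects (files, commits, directories, and so on) hold the actual data; references (branches, tags, and so on) are pointers.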
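"Content-addressable" means that an object's ID is derived from its content. As a minimal Python sketch, assuming the classic SHA-1 object format (a header of type, space, byte length and a NUL byte, followed by the content), the ID can be computed like this; the function name git_object_id is chosen here for illustration.

```python
import hashlib


def git_object_id(obj_type: str, content: bytes) -> str:
    """Compute a SHA-1 object ID the way `git hash-object` does."""
    # Header: "<type> <size in bytes>" followed by a NUL byte
    header = f"{obj_type} {len(content)}".encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


if __name__ == "__main__":
    # The empty blob has a well-known ID in every SHA-1 repository
    print(git_object_id("blob", b""))
    # -> e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
```

Because the ID depends only on the type and the bytes, two identical files share one blob, and any change to the content produces a different ID.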

Objects at a glance:

We can use git cat-file -p to view the basic information of an object.

git cat-file -p fdacba1d541c75bd48f2cd742ee18f77ea3517a1
tree ae0532862af27ecd131a7f792c9156624783d562
parent 1536ad10fc0a188c50680932ca191c8da46938c4
author wzw <wangzhenwei@coding.net> 1526896089 +0800
committer wzw <wangzhenwei@coding.net> 1526896089 +0800

update README.md

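As you can see, fdacba1d541c75bd48f2cd742ee18f77ea3517a1 is a commit object. The output lists the parent commit it depends on, 1536ad10fc0a188c50680932ca191c8da46938c4, its directory tree, ae0532862af27ecd131a7f792c9156624783d562, and the commit's author information and message.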
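The text form printed by git cat-file -p is simple enough to parse by hand. Here is a minimal Python sketch, assuming the layout shown above (header lines, a blank line, then the message); parse_commit is a hypothetical helper, not part of Git.

```python
def parse_commit(text: str) -> dict:
    """Split `git cat-file -p <commit>` output into its fields."""
    headers, _, message = text.partition("\n\n")
    commit = {"tree": None, "parents": [], "message": message.strip()}
    for line in headers.splitlines():
        key, _, value = line.partition(" ")
        if key == "tree":
            commit["tree"] = value
        elif key == "parent":
            # A merge commit carries several parent lines
            commit["parents"].append(value)
        else:
            commit[key] = value  # author, committer, ...
    return commit


if __name__ == "__main__":
    sample = (
        "tree ae0532862af27ecd131a7f792c9156624783d562\n"
        "parent 1536ad10fc0a188c50680932ca191c8da46938c4\n"
        "author wzw <wangzhenwei@coding.net> 1526896089 +0800\n"
        "committer wzw <wangzhenwei@coding.net> 1526896089 +0800\n"
        "\n"
        "update README.md\n"
    )
    print(parse_commit(sample))
```

The tree and parents fields are exactly the edges that get followed when deciding which objects a commit depends on.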

We can follow the reference and look at its parent commit:

git cat-file -p 1536ad10fc0a188c50680932ca191c8da46938c4
tree f7aa6821aa977f65dc987fe6d6838790371f3d90
author wzw <wangzhenwei@coding.net> 1526895383 +0800
committer wzw <wangzhenwei@coding.net> 1526895383 +0800

Initial commit

The parent commit in turn depends on the directory tree f7aa6821aa977f65dc987fe6d6838790371f3d90.

Let us look at that tree:

git cat-file -p f7aa6821aa977f65dc987fe6d6838790371f3d90
100644 blob 3aed7e951e0457a2784ff6cd009412e07a09e362    README.md

The directory contains one blob object with ID 3aed7e951e0457a2784ff6cd009412e07a09e362. Let us look at it:

git cat-file -p 3aed7e951e0457a2784ff6cd009412e07a09e362
#coding-demo

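This is the first version of README.md, that is, its content as of commit 1536ad10fc0a188c50680932ca191c8da46938c4.

Overall, Git's internal storage structure looks like this:

[Figure: Git's internal storage structure]

That completes the background. Have you noticed that the much-hyped blockchain is, at the technical level, rather similar to Git's storage? :)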

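During a clone, the server first advertises a list of refs to the client. This is also where the Git v2 protocol claims its performance improvement, as explained later.

Like this: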

17:49:19.772436 pkt-line.c:80           packet:        clone< fdacba1d541c75bd48f2cd742ee18f77ea3517a1 refs/heads/master
17:49:19.772527 pkt-line.c:80           packet:        clone< 1536ad10fc0a188c50680932ca191c8da46938c4 refs/heads/test-abc
17:49:19.772549 pkt-line.c:80           packet:        clone< 1536ad10fc0a188c50680932ca191c8da46938c4 refs/heads/test-bcd
17:49:19.772566 pkt-line.c:80           packet:        clone< 30eb4b0d813c662c4d7e87c4d3b4cf561e544f8e refs/tags/v1.0
17:49:19.772863 pkt-line.c:80           packet:        clone< 1536ad10fc0a188c50680932ca191c8da46938c4 refs/tags/v1.0^{}

Clearly, the 40-digit hexadecimal number on each line is the ID of the object that the ref after it points to.
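To make the structure of the advertisement concrete, here is a minimal Python sketch that turns the payload lines into a ref table. It assumes the payloads have already been extracted from their packets and that, as in the trace, only the first line carries the capability list after a NUL byte (printed as \0 by the trace); parse_advertisement is a name invented for this example.

```python
def parse_advertisement(payloads):
    """Turn ref advertisement payloads into ({ref name: object ID}, capabilities)."""
    refs, capabilities = {}, []
    for position, payload in enumerate(payloads):
        if position == 0 and "\0" in payload:
            # Only the first line carries the capability list after a NUL
            payload, caps = payload.split("\0", 1)
            capabilities = caps.split()
        object_id, name = payload.split(" ", 1)
        refs[name.strip()] = object_id
    return refs, capabilities


if __name__ == "__main__":
    lines = [
        "fdacba1d541c75bd48f2cd742ee18f77ea3517a1 HEAD\0multi_ack thin-pack side-band-64k",
        "fdacba1d541c75bd48f2cd742ee18f77ea3517a1 refs/heads/master",
        "30eb4b0d813c662c4d7e87c4d3b4cf561e544f8e refs/tags/v1.0",
        "1536ad10fc0a188c50680932ca191c8da46938c4 refs/tags/v1.0^{}",
    ]
    refs, caps = parse_advertisement(lines)
    print(refs["refs/heads/master"], caps)
```

The refs/tags/v1.0^{} entry is the commit that the tag object 30eb4b0 points to once it is peeled, which is why the tag shows up with two different IDs.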

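The client only needs to work out which objects it wants, based on the refs it is interested in and its local object database. (For pull and fetch a local object database already exists; for clone there is none yet, so the client needs every object it is interested in.)

Once the client has computed the list of objects it wants, it tells the remote server with want commands.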

17:49:19.776185 pkt-line.c:80           packet:        clone> want fdacba1d541c75bd48f2cd742ee18f77ea3517a1 multi_ack_detailed side-band-64k thin-pack ofs-delta deepen-since deepen-not agent=git/2.15.1.(Apple.Git-101)
17:49:19.776215 pkt-line.c:80           packet:        clone> want fdacba1d541c75bd48f2cd742ee18f77ea3517a1
17:49:19.776224 pkt-line.c:80           packet:        clone> want 1536ad10fc0a188c50680932ca191c8da46938c4
17:49:19.776232 pkt-line.c:80           packet:        clone> want 1536ad10fc0a188c50680932ca191c8da46938c4
17:49:19.776239 pkt-line.c:80           packet:        clone> want 30eb4b0d813c662c4d7e87c4d3b4cf561e544f8e
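On the wire, each of these lines is framed as a pkt-line: four hexadecimal digits giving the total length (including the four digits themselves), followed by the payload, with 0000 acting as a flush packet. The raw HTTPS request body in the earlier trace (00a8want ..., 0032want ..., 00000009done) shows this framing. Below is a minimal Python sketch of the encoding and decoding; the function names are made up for illustration, and the special packets added by protocol v2 are deliberately not handled.

```python
FLUSH = b"0000"


def pkt_line(payload: bytes) -> bytes:
    """Frame a payload as a pkt-line: 4 hex digits of total length + payload."""
    length = len(payload) + 4  # the length prefix counts itself
    if length > 65520:
        raise ValueError("payload too large for a single pkt-line")
    return b"%04x" % length + payload


def read_pkt_lines(data: bytes):
    """Yield payloads from a byte string; None stands for a flush packet."""
    pos = 0
    while pos < len(data):
        length = int(data[pos:pos + 4].decode("ascii"), 16)
        if length == 0:
            yield None
            pos += 4
        elif length < 4:
            raise ValueError("special packets are not handled in this sketch")
        else:
            yield data[pos + 4:pos + length]
            pos += length


if __name__ == "__main__":
    body = (
        pkt_line(b"want 1536ad10fc0a188c50680932ca191c8da46938c4\n")
        + FLUSH
        + pkt_line(b"done\n")
    )
    print(body)
    print(list(read_pkt_lines(body)))
```

A want line without capabilities is 45 characters plus a newline, so its prefix is 0032 (50 in decimal), and done plus a newline becomes 0009done, matching the raw request in the trace.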

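If the client is running pull or fetch, it also tells the remote which objects it already has (a section at the end of this article illustrates this).

Based on the objects the client wants and the objects it already has, the remote server consults its own object database and the dependencies between objects, collects the objects the client needs, and sends them to the client as a compressed pack.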
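Conceptually, the server sends everything reachable from the wanted objects minus everything reachable from what the client already has. The Python sketch below models this on the demo repository using abbreviated IDs from the traces. The IDs prefixed with "blob-" or "tree-" are placeholders, since the article never shows those objects, and the assumption that each later commit changed exactly one file is mine. Real Git walks history far more cleverly than this plain graph traversal.

```python
# object -> objects it references directly (abbreviated IDs from the article;
# names starting with "blob-" or "tree-" are placeholders, not real IDs)
GRAPH = {
    "1536ad1": ["f7aa682"],                  # initial commit -> its tree
    "f7aa682": ["3aed7e9"],                  # tree -> README.md blob
    "3aed7e9": [],
    "fdacba1": ["ae05328", "1536ad1"],       # second commit -> tree, parent
    "ae05328": ["blob-readme-v2"],
    "blob-readme-v2": [],
    "30eb4b0": ["1536ad1"],                  # annotated tag v1.0 -> commit
    "8dccad2": ["tree-8dccad2", "fdacba1"],  # commit fetched later
    "tree-8dccad2": ["blob-readme-v3"],
    "blob-readme-v3": [],
}


def reachable(graph, roots):
    """Collect every object reachable from the given roots."""
    seen, stack = set(), list(roots)
    while stack:
        oid = stack.pop()
        if oid not in seen:
            seen.add(oid)
            stack.extend(graph[oid])
    return seen


def objects_to_send(graph, wants, haves):
    """Objects the client needs: reachable from wants but not from haves."""
    return reachable(graph, wants) - reachable(graph, haves)


if __name__ == "__main__":
    clone = objects_to_send(GRAPH, ["fdacba1", "1536ad1", "30eb4b0"], [])
    fetch = objects_to_send(GRAPH, ["8dccad2"], ["fdacba1", "1536ad1"])
    print(len(clone), len(fetch))  # prints: 7 3
```

Under these assumptions the counts line up with the "Counting objects: 7" line in the clone traces and the "Counting objects: 3" line in the fetch trace at the end of the article.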

After receiving the pack, the client unpacks and verifies the objects and updates its refs to point to them.

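What Google did in protocol version 2

The complete protocol version 2 specification is here: https://www.kernel.org/pub/so...

Here we describe its main changes. There are three:

  • Server-side ref filtering
  • Easier extensibility for new features (for example, the ability to declare which refs are wanted)
  • Simplified client handling of the HTTP transport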
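To give a feel for the first item, here is a minimal Python sketch of the request a v2 client might send to list only the refs it cares about. It reflects my reading of the protocol v2 documentation (an ls-refs command, a 0001 delimiter packet, then arguments such as peel, symrefs and ref-prefix); a real client also sends capability lines such as its agent string before the delimiter, which are omitted here. The helper names are invented for this example.

```python
DELIM = b"0001"  # delimiter packet: separates the command from its arguments
FLUSH = b"0000"  # flush packet: ends the request


def pkt_line(payload: bytes) -> bytes:
    """Frame a payload as a pkt-line (4 hex digits of total length + payload)."""
    return b"%04x" % (len(payload) + 4) + payload


def ls_refs_request(prefixes):
    """Build a sketch of a protocol v2 ls-refs request for the given prefixes."""
    parts = [pkt_line(b"command=ls-refs\n"), DELIM,
             pkt_line(b"peel\n"), pkt_line(b"symrefs\n")]
    for prefix in prefixes:
        # The server only advertises refs whose names start with a prefix
        parts.append(pkt_line(b"ref-prefix " + prefix.encode("utf-8") + b"\n"))
    parts.append(FLUSH)
    return b"".join(parts)


if __name__ == "__main__":
    print(ls_refs_request(["refs/heads/master"]))
```

The design point is that the client speaks first, so the server can restrict its answer instead of unconditionally dumping every ref.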

What many clickbait headlines exaggerated is mainly the first item: server-side ref filtering.

Google's official blog describes it as follows:

The main motivation for the new protocol was to enable server side filtering of references (branches and tags). Prior to protocol v2, servers responded to all fetch commands with an initial reference advertisement, listing all references in the repository. This complete listing is sent even when a client only cares about updating a single branch, e.g.: git fetch origin master. For repositories that contain 100s of thousands of references (the Chromium repository has over 500k branches and tags) the server could end up sending 10s of megabytes of data that get ignored. This typically dominates both time and bandwidth during a fetch, especially when you are updating a branch that's only a few commits behind the remote, or even when you are only checking if you are up-to-date, resulting in a no-op fetch.

We recently rolled out support for protocol version 2 at Google and have seen a performance improvement of 3x for no-op fetches of a single branch on repositories containing 500k references. Protocol v2 has also enabled a reduction of 8x of the overhead bytes (non-packfile) sent from googlesource.com servers. A majority of this improvement is due to filtering references advertised by the server to the refs the client has expressed interest in.

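In the spirit of sticking to the facts and making things easy for readers, here are the key points of that passage:

The main motivation for the new protocol was server-side filtering of references (branches and tags). Before v2, the server answered every fetch with a full ref advertisement, even when the client only cared about a single branch (for example, git fetch origin master). For a repository with hundreds of thousands of refs (Chromium has more than 500,000 branches and tags), that can mean tens of megabytes that the client simply ignores, and this often dominates the time and bandwidth of a fetch, especially when the branch is only a few commits behind or already up to date.
After rolling out v2 support, Google saw a 3x improvement for no-op fetches of a single branch on repositories with 500,000 refs, and an 8x reduction in the non-packfile overhead bytes sent from googlesource.com. Most of that gain comes from the server filtering the advertised refs down to those the client has expressed interest in.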

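By now many readers will have understood. The original text is clear: the improvement applies only to the first step of the client/server exchange, where the server no longer has to send the full ref list. In some extreme scenarios (repositories with hundreds of thousands of branches and tags), this step gets a significant performance boost.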
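A rough size estimate shows why only extreme repositories benefit. The Python sketch below assumes each advertised ref costs one pkt-line (4 length digits, a 40-character ID, a space, the ref name and a newline) and ignores the capability list; the 500,000 ref names are synthetic and only meant to have a plausible length.

```python
def advertisement_bytes(ref_names):
    """Approximate size of a ref advertisement, ignoring capabilities."""
    # per ref: 4 length digits + 40 hex digits + space + name + newline
    per_ref = sum(4 + 40 + 1 + len(name.encode("utf-8")) + 1 for name in ref_names)
    return per_ref + 4  # trailing flush packet


def filter_refs(ref_names, prefixes):
    """Keep only refs matching one of the prefixes (server-side filtering)."""
    return [n for n in ref_names if any(n.startswith(p) for p in prefixes)]


if __name__ == "__main__":
    # Synthetic ref names, purely for illustration
    refs = ["refs/heads/master"] + [
        f"refs/changes/{i:06d}/head" for i in range(500_000)
    ]
    full = advertisement_bytes(refs)
    filtered = advertisement_bytes(filter_refs(refs, ["refs/heads/master"]))
    print(f"full: {full / 1e6:.1f} MB, filtered: {filtered} bytes")
```

With half a million refs the full listing lands in the tens of megabytes, consistent with Google's description, while a repository with a few dozen refs sends only a few kilobytes either way.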

In practice, most Git repositories have nowhere near that many refs. Take the sample project git@git.coding.net:wzw/coding-demo.git, where this step runs very quickly:

time git ls-remote git@git.coding.net:wzw/coding-demo.git
fdacba1d541c75bd48f2cd742ee18f77ea3517a1    HEAD
fdacba1d541c75bd48f2cd742ee18f77ea3517a1    refs/heads/master
1536ad10fc0a188c50680932ca191c8da46938c4    refs/heads/test-abc
1536ad10fc0a188c50680932ca191c8da46938c4    refs/heads/test-bcd
30eb4b0d813c662c4d7e87c4d3b4cf561e544f8e    refs/tags/v1.0
1536ad10fc0a188c50680932ca191c8da46938c4    refs/tags/v1.0^{}

real    0m0.103s
user    0m0.020s
sys    0m0.004s

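It runs quickly, taking about 100 ms, and that includes establishing the SSH connection, authentication, and data transfer. For this step most of the time goes to the network connection and authentication; listing the refs is not the performance bottleneck.

Take CODING's own main development repository: it currently has 2,000+ tags, 500+ branches, and about 5,000 hidden refs created for merge requests. Even though CODING runs gc on its repositories regularly, so a packed-refs file exists, reading and sending the refs does start to get slower here, but it is still within an acceptable range.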

time git ls-remote git@e.coding.net:codingcorp/coding-dev.git
// several thousand lines omitted here
5708bacfe2c2510efd0bbb0b4be8268f2a171747    refs/tags/private-1.2
dec93b8774f90c4660bbe8b3759b6d59db30ee45    refs/tags/private-1.2^{}
5ddcbab95eedc1664ac131cddfc51a5d265446ce    refs/tags/release/20160927.1
a91709b7bf08c00fb0b2319aedf999ca7e636109    refs/tags/release/20160927.1^{}
476075fd8442e76d02264f0b109bb2afcb6d39a1    refs/tags/repo-manager-20161118.1
ce29fb126a27f58de555badeb33838d6a3dde8eb    refs/tags/repo-manager-20161118.1^{}
30739025962d6e788f1542841aa509422810853e    refs/tags/test-tag-20180308.1
1a7b7474257badeca9fa0c15204bf5769f42b33a    refs/tags/test-tag-20180308.1^{}

real    0m1.677s
user    0m0.032s
sys    0m0.052s

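A single fresh clone of CODING's main development repository transfers about 550 MB in total. Even on a good connection where the clone reaches 5 MB/s, the transfer takes 110 seconds, so the 1.677 seconds spent up front is negligible.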
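The arithmetic behind that comparison, as a short Python sketch using only the figures quoted above. Note that the 1.677 seconds also covers SSH connection setup and authentication, so it is an upper bound on what filtering the ref list could save.

```python
pack_size_mb = 550     # total data for a fresh clone
bandwidth_mb_s = 5     # assumed sustained clone speed
ref_listing_s = 1.677  # measured `git ls-remote` wall time

transfer_s = pack_size_mb / bandwidth_mb_s
share = ref_listing_s / (ref_listing_s + transfer_s)

print(f"pack transfer: {transfer_s:.0f} s")
print(f"ref listing share of total: {share:.1%}")
```

The ref listing accounts for roughly 1.5% of the total time, so even eliminating it entirely would barely change how long a full clone takes.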

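By this reckoning, Google's change does optimize certain scenarios for some large repositories (especially those with a very large number of refs), but it is not the dramatic performance boost that some Chinese media made it out to be. Looking at the transfer as a whole, Git's core work has not changed: computing object dependencies, the format in which objects are declared, and the pack transfer itself. The non-pack data, whose volume was reportedly reduced 8x, makes up only a small fraction of the total traffic. Overall the change does save bandwidth, but it falls far short of a dramatic improvement.
Of course, the Googlers' contributions to open source still deserve our thanks and praise. Having read this article, I hope everyone will approach technology rigorously and not simply repeat what others say. Talk is cheap, show me your code!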

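A few more words

PS: Speaking of dramatic Git performance improvements, Google engineers, while developing JGit, contributed the idea of a bitmap index to Git. It lets Git spend a small amount of space to avoid a large amount of tree traversal when resolving object dependencies. That was a genuinely large performance improvement. The bitmap index now ships by default with recent versions of Git; I will share how it works when I get another chance.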
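As a very rough illustration of the bitmap idea only (not of Git's actual on-disk format, which as far as I know stores compressed bitmaps for selected commits next to the pack), the Python sketch below gives each object a bit position and precomputes, per object, the set of everything reachable from it. The graph reuses abbreviated IDs from this article, with one placeholder blob name.

```python
# object -> objects it references directly ("blob-readme-v2" is a placeholder)
GRAPH = {
    "3aed7e9": [],
    "f7aa682": ["3aed7e9"],
    "1536ad1": ["f7aa682"],
    "blob-readme-v2": [],
    "ae05328": ["blob-readme-v2"],
    "fdacba1": ["ae05328", "1536ad1"],
}


def build_bitmaps(graph):
    """Map each object to an int whose set bits mark all reachable objects."""
    index = {oid: bit for bit, oid in enumerate(graph)}
    bitmaps = {}

    def bitmap(oid):
        if oid not in bitmaps:
            bits = 1 << index[oid]
            for child in graph[oid]:
                bits |= bitmap(child)  # union of the children's bitmaps
            bitmaps[oid] = bits
        return bitmaps[oid]

    for oid in graph:
        bitmap(oid)
    return index, bitmaps


if __name__ == "__main__":
    index, bitmaps = build_bitmaps(GRAPH)
    # want fdacba1, have 1536ad1: one AND-NOT instead of a graph walk
    to_send = bitmaps["fdacba1"] & ~bitmaps["1536ad1"]
    names = sorted(oid for oid, bit in index.items() if to_send >> bit & 1)
    print(names)
```

Once the bitmaps exist, answering "what does this client lack" becomes a couple of bitwise operations, which is the space-for-traversal trade described above.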

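PS2: The Git protocol has many other features; to keep this article focused, they are not covered here.

PS3: How the Git transfer protocol declares objects that already exist locally (the have command)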

GIT_TRACE=1 GIT_TRACE_PACKET=1 git fetch origin master
19:58:08.432172 git.c:344               trace: built-in: git 'fetch' 'origin' 'master'
19:58:08.438917 run-command.c:626       trace: run_command: 'ssh' 'git@git.coding.net' 'git-upload-pack '\''wzw/coding-demo.git'\'''
Warning: Permanently added the RSA host key for IP address '123.59.85.127' to the list of known hosts.
19:58:09.634163 pkt-line.c:80           packet:        fetch< 8dccad22648e94c52335a7266c7cff5d947c9532 HEAD\0multi_ack thin-pack side-band side-band-64k ofs-delta shallow deepen-since deepen-not deepen-relative no-progress include-tag multi_ack_detailed symref=HEAD:refs/heads/master agent=git/2.15.0
19:58:09.641777 pkt-line.c:80           packet:        fetch< 8dccad22648e94c52335a7266c7cff5d947c9532 refs/heads/master
19:58:09.641846 pkt-line.c:80           packet:        fetch< 1536ad10fc0a188c50680932ca191c8da46938c4 refs/heads/test-abc
19:58:09.641872 pkt-line.c:80           packet:        fetch< 1536ad10fc0a188c50680932ca191c8da46938c4 refs/heads/test-bcd
19:58:09.641891 pkt-line.c:80           packet:        fetch< 30eb4b0d813c662c4d7e87c4d3b4cf561e544f8e refs/tags/v1.0
19:58:09.641903 pkt-line.c:80           packet:        fetch< 1536ad10fc0a188c50680932ca191c8da46938c4 refs/tags/v1.0^{}
19:58:09.641913 pkt-line.c:80           packet:        fetch< 0000
19:58:09.642105 run-command.c:626       trace: run_command: 'rev-list' '--objects' '--stdin' '--not' '--all' '--quiet'
19:58:09.655120 pkt-line.c:80           packet:        fetch> want 8dccad22648e94c52335a7266c7cff5d947c9532 multi_ack_detailed side-band-64k thin-pack ofs-delta deepen-since deepen-not agent=git/2.15.1.(Apple.Git-101)
19:58:09.655157 pkt-line.c:80           packet:        fetch> 0000
19:58:09.655190 pkt-line.c:80           packet:        fetch> have fdacba1d541c75bd48f2cd742ee18f77ea3517a1
19:58:09.655207 pkt-line.c:80           packet:        fetch> have 1536ad10fc0a188c50680932ca191c8da46938c4
19:58:09.655221 pkt-line.c:80           packet:        fetch> done
19:58:09.975282 pkt-line.c:80           packet:        fetch< ACK fdacba1d541c75bd48f2cd742ee18f77ea3517a1 common
19:58:09.975382 pkt-line.c:80           packet:        fetch< ACK 1536ad10fc0a188c50680932ca191c8da46938c4 common
19:58:09.975404 pkt-line.c:80           packet:        fetch< ACK 1536ad10fc0a188c50680932ca191c8da46938c4
19:58:09.975728 pkt-line.c:80           packet:     sideband< \2Counting objects: 3, done.
remote: Counting objects: 3, done.
19:58:09.975763 pkt-line.c:80           packet:     sideband< \2Compressing objects:  50% (1/2)   \15Compressing objects: 100% (2/2)   \15
19:58:09.975798 pkt-line.c:80           packet:     sideband< \2Compressing objects: 100% (2/2), done.
remote: Compressing objects: 100% (2/2), done.
19:58:10.065650 pkt-line.c:80           packet:     sideband< PACK ...
19:58:10.065707 pkt-line.c:80           packet:     sideband< \2Total 3 (delta 0), reused 0 (delta 0)
remote: Total 3 (delta 0), reused 0 (delta 0)
19:58:10.065714 run-command.c:626       trace: run_command: 'unpack-objects' '--pack_header=2,3'
19:58:10.065741 pkt-line.c:80           packet:     sideband< 0000
19:58:10.071004 git.c:344               trace: built-in: git 'unpack-objects' '--pack_header=2,3'
Unpacking objects: 100% (3/3), done.
19:58:10.317201 run-command.c:626       trace: run_command: 'rev-list' '--objects' '--stdin' '--not' '--all' '--quiet'
19:58:10.322159 git.c:344               trace: built-in: git 'rev-list' '--objects' '--stdin' '--not' '--all' '--quiet'
From git.coding.net:wzw/coding-demo
 * branch            master     -> FETCH_HEAD
   fdacba1..8dccad2  master     -> origin/master
19:58:10.328515 run-command.c:1452      run_processes_parallel: preparing to run up to 1 tasks
19:58:10.328564 run-command.c:1484      run_processes_parallel: done
19:58:10.328621 run-command.c:626       trace: run_command: 'gc' '--auto'
19:58:10.333115 git.c:344               trace: built-in: git 'gc' '--auto'
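The ACK lines in this trace can be mimicked with a few lines of Python. This is a simplified sketch of how a server using multi_ack_detailed appears to answer, judging from the traces in this article: one "ACK <id> common" per have it recognizes, then a final ACK for the last common object, or NAK when nothing is shared. Real negotiation happens in rounds and has more states; the function name negotiate is made up.

```python
def negotiate(haves, server_objects):
    """Return the ACK/NAK lines for a list of `have` IDs (simplified)."""
    lines, last_common = [], None
    for oid in haves:
        if oid in server_objects:
            lines.append(f"ACK {oid} common")
            last_common = oid
    # After `done`: confirm the last common object, or NAK if none was found
    lines.append(f"ACK {last_common}" if last_common else "NAK")
    return lines


if __name__ == "__main__":
    server = {
        "8dccad22648e94c52335a7266c7cff5d947c9532",
        "fdacba1d541c75bd48f2cd742ee18f77ea3517a1",
        "1536ad10fc0a188c50680932ca191c8da46938c4",
    }
    haves = [
        "fdacba1d541c75bd48f2cd742ee18f77ea3517a1",
        "1536ad10fc0a188c50680932ca191c8da46938c4",
    ]
    print("\n".join(negotiate(haves, server)))
    print(negotiate([], server))  # a fresh clone has nothing to offer
```

With the two have lines from the fetch trace this reproduces its three ACK lines, and with no haves it yields the single NAK seen in the clone traces.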
