Syncing and backing up files locally with rsync

When reposting, please credit the source: https://tlanyan.me/use-rsync-...

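rsync is an outstanding tool for syncing and backing up files between hosts. Compared with tools such as ftp and scp, rsync is more powerful and transfers data more efficiently, which makes it a must-have on any server.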

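I recently ran into a problem while using rsync: when syncing between a PC and an external hard drive, a large binary file that had been modified was retransmitted in full, which is very inefficient. Wasn't rsync supposed to transfer only the differences? Or is this a problem specific to binary files? Yet the rsync man page clearly says: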

Rsync is a fast and extraordinarily versatile file copying tool. It can copy locally, to/from another host over any remote shell, or to/from a remote rsync daemon. It offers a large number of options that control every aspect of its behavior and permit very flexible specification of the set of files to be copied. It is famous for its delta-transfer algorithm, which reduces the amount of data sent over the network by sending only the differences between the source files and the existing files in the destination. Rsync is widely used for backups and mirroring and as an improved copy command for everyday use.

With this question in mind I searched online and found someone with exactly the same confusion: Smarter filetransfers than rsync?

Fortunately, someone gave a perfect answer to the question:

Rsync will not use deltas but will transmit the full file in its entirety if it - as a single process - is responsible for the source and destination files. It can transmit deltas when there is a separate client and server process running on the source and destination machines.

The reason that rsync will not send deltas when it is the only process is that in order to determine whether it needs to send a delta it needs to read the source and destination files. By the time it's done that it might as well have just copied the file directly.

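In other words: when files are synced between hosts over the network, each host runs its own rsync process that checksums the files on its side, and only the differing parts are then sent over the network. When syncing within a single host there is only one process, and rsync reckons that comparing the files first and then copying the differences is no faster than simply copying, so it sends the whole file.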

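On reflection, rsync's behavior is reasonable. Between hosts the bottleneck is network bandwidth, so computing the differences first and sending only those is more efficient. Within a single host the copy goes from disk to disk, at roughly ten times network speed, so a direct copy is generally faster than comparing first and then transferring, and copying the whole file is a good choice.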
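To make the "checksum both sides, then send only what differs" idea concrete, here is a deliberately simplified sketch. It is not how rsync is implemented: real rsync uses rolling checksums and can match blocks at any offset, whereas this toy compares fixed-offset blocks with `cksum`. The block size, file names, and scratch directory are assumptions made up for the illustration.

```shell
#!/bin/bash
# Toy illustration only: fixed-offset block comparison, not rsync's algorithm.
set -eu

workdir=$(mktemp -d)          # hypothetical scratch directory
block=1024                    # hypothetical block size in bytes

# Build an 8-block "old" file and a "new" copy with one block overwritten.
dd if=/dev/zero of="$workdir/old" bs="$block" count=8 2>/dev/null
cp "$workdir/old" "$workdir/new"
printf 'changed' | dd of="$workdir/new" bs="$block" seek=3 conv=notrunc 2>/dev/null

# Checksum each block on both sides and count the blocks that differ.
changed=0
for i in 0 1 2 3 4 5 6 7; do
    a=$(dd if="$workdir/old" bs="$block" skip="$i" count=1 2>/dev/null | cksum)
    b=$(dd if="$workdir/new" bs="$block" skip="$i" count=1 2>/dev/null | cksum)
    if [ "$a" != "$b" ]; then
        changed=$((changed + 1))
    fi
done

echo "blocks that would need to be sent: $changed of 8"
rm -rf "$workdir"
```

Note that both files had to be read in full just to find the changed block. That is cheap when each side reads its own disk and only checksums cross the network, but on a single host it costs about as much as copying the file outright.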

I wrote a script to test rsync's behavior:

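<pre>#!/bin/bash
# Create a 512 MiB test file and an identical copy to sync against.
echo "make test file"
dd if=/dev/zero of=testfile bs=1024k count=512
echo "cp test file"
cp testfile syncfile
# Append a few bytes so the two files differ only at the very end.
echo "make changes to test file"
echo '1234567890' >> testfile
# Local paths on both sides: a single rsync process handles the copy.
echo "rsync file in local..."
rsync -avh -P testfile syncfile

echo ""
# Reset the copy so the second run starts from the same state.
# Run from the home directory so that ~/syncfile below is this file.
echo "restore sync file"
dd if=/dev/zero of=syncfile bs=1024k count=512
# host:path form: rsync goes through SSH with a process on each end.
echo "rsync file via network"
rsync -avh -P testfile localhost:~/syncfile
</pre>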

The test script produces the following output:

<img src="https://tlanyan.me/wp-content...; alt="" width="665" height="444" class="aligncenter size-large wp-image-3233" />

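The result matches expectations: when syncing within the local machine, the file is copied in full; once the transfer goes over SSH, only the differences are sent, which is significantly more efficient.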

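There is nothing wrong with rsync's approach, but for a large file with only a small modification, a full copy within the same host is still painful. The fix is to simulate a network transfer, as the test script does. Nearly every Linux host ships with sshd, so you only need to add localhost and the colon that marks a remote path when writing the command. As of Windows 10 version 1809, OpenSSH is a built-in system component, so installing and using it is painless. Cygwin, Bitvise SSH Server, and others are also available; once one of them is installed, syncing large files is no longer a problem.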
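As a rough way to check what actually gets sent, rsync's `--stats` output reports "Literal data" (bytes sent as-is) and "Matched data" (bytes reused from the existing destination file). The sketch below also uses `--no-whole-file`, which the rsync man page documents as the way to turn off the whole-file default for local paths. The original article did not test that option, so treat it as an untested alternative to the localhost approach. The temporary directory and the 16 MiB file size are assumptions chosen to keep the example quick.

```shell
#!/bin/bash
# Sketch: force a delta transfer between two local paths and read the stats.
# The article's own approach is the SSH form:
#   rsync -avh -P testfile localhost:~/syncfile
set -u

status="rsync not found"
if command -v rsync >/dev/null 2>&1; then
    workdir=$(mktemp -d)      # hypothetical scratch directory
    dd if=/dev/zero of="$workdir/testfile" bs=1024k count=16 2>/dev/null
    cp "$workdir/testfile" "$workdir/syncfile"
    echo '1234567890' >> "$workdir/testfile"

    # With delta transfer active, "Matched data" should cover most of the
    # file and "Literal data" only the appended tail.
    rsync -a --no-whole-file --stats "$workdir/testfile" "$workdir/syncfile" \
        | grep -E 'Literal data|Matched data' || true

    if cmp -s "$workdir/testfile" "$workdir/syncfile"; then
        status="in sync"
    else
        status="differs"
    fi
    rm -rf "$workdir"
fi
echo "result: $status"
```

Running the same command without `--no-whole-file` should report the whole file as literal data, which is the behavior described above.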

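Another thing to watch out for: when syncing across partitions or devices, the file systems should be compatible with each other, otherwise problems can arise. For example, when syncing from an NTFS file system to an (ex)FAT USB drive with the usual -avhP options, every run copies all the files again. The cause is that the two file systems support different features: FAT cannot store what options such as -l and -p ask rsync to preserve, so with those options rsync judges the source and destination to be different files and copies them again. In this situation, use the -cvrhP options instead.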
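The difference between the two option sets can be sketched as follows. Per the rsync man page, `-a` is shorthand for `-rlptgoD`, so `-cvrhP` keeps only the recursion (`-r`) and switches the "has this file changed?" test to a checksum comparison (`-c`). The two temporary directories below are stand-ins for the NTFS source and the FAT drive, not real mount points, so this only demonstrates the option behavior, not the file system quirk itself.

```shell
#!/bin/bash
# Sketch with hypothetical paths: temp directories stand in for the two drives.
set -u

result="rsync not found"
if command -v rsync >/dev/null 2>&1; then
    src=$(mktemp -d)    # stands in for the NTFS directory
    dst=$(mktemp -d)    # stands in for the FAT drive's mount point
    echo "hello" > "$src/a.txt"

    # First run: the destination is empty, so the file is copied.
    rsync -cvrhP "$src/" "$dst/"

    # Second run: contents are identical, so -c should find nothing to send
    # even though no timestamps or permissions were preserved.
    second=$(rsync -crh --stats "$src/" "$dst/" | grep 'files transferred' || true)
    echo "second run: $second"

    if cmp -s "$src/a.txt" "$dst/a.txt"; then
        result="in sync"
    else
        result="differs"
    fi
    rm -rf "$src" "$dst"
fi
echo "result: $result"
```

The trade-off is that `-c` reads every file on both sides on each run, so it is slower than the default size-and-time check on large trees.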

References

  1. Smarter filetransfers than rsync?
  2. OpenSSH in Windows
  3. Installing CYGWIN + SSHD for remote access through SSH on windows
  4. Installing SFTP/SSH Server on Windows using OpenSSH
  5. Bitvise SSH Server
  6. rsync not working between NTFS/FAT and EXT