Original article: https://mercurial.selenic.com/wiki/LargefilesExtension
#Largefiles extension
<!> This is considered a feature of last resort. Large binary files tend to be not very compressible, not very "diffable", and not at all mergeable. Such files are not handled well by Mercurial's storage format (Revlog), which is based on compressed binary deltas. largefiles solves this problem by adding a centralized client-server layer on top of Mercurial: largefiles live in a central store out on the network somewhere, and you only fetch the ones that you need when you need them.
##1. Status
This extension is distributed with Mercurial 2.0 and later.
Author: Various
##2. Overview
The largefiles extension allows for tracking large, incompressible binary files in Mercurial without requiring excessive bandwidth for clones and pulls. Files added as largefiles are not tracked directly by Mercurial; rather, their revisions are identified by a checksum, and Mercurial tracks these checksums. This way, when you clone a repository or pull in changesets, only the largefiles needed to update to the current version are downloaded. This saves both disk space and bandwidth.
If you are starting a new repository or adding new large binary files, using largefiles for them is as easy as adding '--large' to your hg add command. For example:
$ dd if=/dev/urandom of=thisfileislarge count=2000
$ hg add --large thisfileislarge
$ hg commit -m 'add thisfileislarge, which is large, as a largefile'
When you push a changeset that affects largefiles to a remote repository, its largefile revisions will be uploaded along with the changeset. This ensures that the central store gets a copy of every revision of every largefile. Note that the remote Mercurial must also have the largefiles extension enabled for this to work.
When you pull a changeset that affects largefiles from a remote repository, nothing different from Mercurial's normal behavior happens. However, when you update to such a revision, any largefiles needed by that revision are downloaded if they have never been downloaded before. This means that network access is required to update to a revision you have not yet updated to.
If you already have large files tracked by Mercurial without the largefiles extension, you will need to convert your repository in order to benefit from largefiles. This is done with the 'hg lfconvert' command:
$ hg lfconvert --size 10 oldrepo newrepo
By default, in repositories that already have largefiles in them, any new file over 10 MB will automatically be added as a largefile. To change this threshold, set largefiles.minsize in your Mercurial config file to the minimum size in megabytes to track as a largefile:
[largefiles]
minsize = 2
or use the --lfsize option to the add command (also in megabytes):
$ hg add --lfsize 2
The largefiles.patterns config option allows you to specify space-separated filename patterns (in shell glob syntax) that should always be tracked as largefiles:
[largefiles]
patterns = *.jpg *.{png,bmp} library.zip content/audio/*
Note: the patterns syntax shown here is probably incorrect; see hg help patterns for the supported forms. In particular, '*.{png,bmp}' does not seem to work, whereas 're:.*\.(png|bmp)' works as expected.
##3. Configuration
Enable the largefiles extension by adding the following lines to your config file:
[extensions]
largefiles =
##4. Design
This section explains how largefiles works behind the scenes. If you're just adding/modifying/committing/pushing/pulling in a largefiles repo, you shouldn't have to read this section (although it can't hurt). But if you are setting up or administering Mercurial with largefiles, this is essential reading.
###4.1. The local store
Each local repository has a local largefiles store in '.hg/largefiles'. When you add a new largefile to a repository, it is first stored here. When largefiles are downloaded from the central store (see below), a copy is saved there. Files in the local store are also hard-linked to the user cache.
###4.2. The user cache
The user cache helps to avoid downloading and storing multiple copies of largefiles. When a largefile is needed but does not exist in the local store, Mercurial checks the user cache. If the needed largefile exists, a hard-link is created in the local store.
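The lookup order described above can be sketched in plain shell. This is only an illustration of the idea, not Mercurial's actual code: the function name `fetch_from_cache`, its argument order, and the flat "one file per hash" directory layout are assumptions made for the example.

```shell
# Hypothetical helper (not part of Mercurial): make sure the largefile whose
# hash is $1 is present in the local store $2, using the user cache $3.
# Returns 0 if the file is available locally afterwards, 1 otherwise.
fetch_from_cache() {
    lf_hash=$1
    lf_store=$2
    lf_cache=$3

    # 1. Already in the local store: nothing to do.
    if [ -e "$lf_store/$lf_hash" ]; then
        return 0
    fi

    # 2. Present in the user cache: hard-link it into the local store,
    #    so no second copy of the data is written to disk.
    if [ -e "$lf_cache/$lf_hash" ]; then
        mkdir -p "$lf_store"
        ln "$lf_cache/$lf_hash" "$lf_store/$lf_hash"
        return 0
    fi

    # 3. Not available locally: the caller would have to download it.
    return 1
}
```

A call such as `fetch_from_cache "$hash" .hg/largefiles "$HOME/.cache/largefiles"` would try both locations in turn; the cache path in that call is likewise only an example. Note that hard links only work when the store and the cache are on the same filesystem.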
The cache location is OS dependent:
You can set your user cache to a non-default location by setting largefiles.usercache in your Mercurial config:
[largefiles]
usercache = /shared/myusercachedir
The user cache can be deleted at any time to reclaim disk space, but doing so may also result in downloading and storing additional copies of largefiles.
####4.2.1. The central store
In a typical setup with a central Mercurial server, the user who serves the central repositories will get a user cache that acts as a central store for all the repositories. This central largefiles store has every past revision of every largefile.
<!> Unlike other user caches, the central store should not be deleted! It may be the only cache that holds a largefile used by an old revision.
<!> When a client repository needs to download a largefile, it will try to get it from the repository specified as default in the hgrc file. If no default is specified, or the wrong repository is specified, the download will fail. As an alternative, a default path can be set for a specific hg update command:
$ hg --config paths.default=path-to-repo-with-the-file update
###4.3. Implementation details
Each largefile has a standin file in '.hglf/', which is tracked by Mercurial like any other file. The standin contains the SHA-1 hash of the largefile contents. When a largefile is added/removed/copied/renamed/etc the same operation is applied to the standin. Thus the history of the standin is the history of the largefile.
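To make the standin idea concrete, here is a minimal shell sketch that writes the SHA-1 of a file into a matching path under '.hglf/'. It is an approximation for illustration only: the helper name `write_standin` is hypothetical, it assumes GNU coreutils `sha1sum` is installed, and it does not claim to reproduce exactly what Mercurial writes.

```shell
# Hypothetical helper (not part of Mercurial): write a standin for the
# largefile $1, given as a path relative to the repository root.
# Must be run from the repository root. Assumes GNU coreutils `sha1sum`.
write_standin() {
    lf_path=$1

    # Mirror the largefile's directory structure under .hglf/.
    mkdir -p ".hglf/$(dirname "$lf_path")"

    # sha1sum prints "<hash>  -" when reading stdin; keep only the
    # 40-character hexadecimal hash.
    sha1sum < "$lf_path" | cut -d' ' -f1 > ".hglf/$lf_path"
}
```

For example, `write_standin content/audio/theme.wav` (an illustrative file name) would create `.hglf/content/audio/theme.wav` containing only the hash. Because the standin is a small text file, Mercurial can track, diff, and merge it like any other file while the large content lives in the store.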
For performance reasons, the contents of a standin are only updated just before a commit. The add/remove/copy/rename Mercurial commands add, remove, copy, or rename the corresponding standins, but do not update their contents. The contents of a standin will therefore always be the hash of the largefile as of the last commit. To support some commands (such as revert), some standins are temporarily updated, but they are changed back after the command finishes.
A Mercurial dirstate object tracks the state of the largefiles. The dirstate uses the last modified time and current size to detect if a file has changed without reading the entire contents of the file.
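The size-and-mtime check can be illustrated with a short shell sketch. This is a simplification and not the real dirstate logic (which, for instance, also has to deal with files modified within the same second as the recorded timestamp). The helper names are hypothetical, and the `-c` option assumes GNU `stat`; BSD and macOS `stat` use a different syntax.

```shell
# Hypothetical helpers (not part of Mercurial). Assumes GNU `stat`.

# record_state FILE: print "<size in bytes> <mtime in seconds>" for FILE.
record_state() {
    stat -c '%s %Y' "$1"
}

# maybe_changed FILE SAVED_STATE: succeed (exit 0) if FILE's current size
# or mtime differs from SAVED_STATE, without reading the file's contents.
maybe_changed() {
    [ "$(record_state "$1")" != "$2" ]
}
```

A caller would store the output of `record_state` at commit time and later run `maybe_changed` to decide whether the expensive step of hashing the full file is needed at all.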
##5. See also
There are a number of older extensions for managing large files. This extension is a descendant of the BfilesExtension and is now the recommended way to handle such files. Alternatives are BigfilesExtension and SnapExtension.