A few times I have checked the size of a file with both ls and du and found that the two report different sizes, for example:
bl@d3:~/test/sparse_file$ ls -l fs.img
-rw-r--r-- 1 bl bl 1073741824 2012-02-17 05:09 fs.img
bl@d3:~/test/sparse_file$ du -sh fs.img
0 fs.img
Here ls reports the size of fs.img as 1073741824 bytes (1 GiB), while du reports it as 0.
I never looked into this closely before, so today I am making up for that.
There are two main reasons for the difference: sparse files, and the fact that ls and du measure different quantities.
Let's look at sparse files first. A sparse file is a file that contains "holes". For example, the following C program creates a file with a hole:
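#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>

int main(int argc, char *argv[])
{
    /* O_CREAT requires a mode argument; without it the permissions are undefined. */
    int fd = open("sparse.file", O_RDWR | O_CREAT, 0644);
    if (fd < 0)
        return 1;
    /* The offset is 0 right after open, so this moves it to byte 1024. */
    lseek(fd, 1024, SEEK_CUR);
    /* Writing one byte here leaves bytes 0..1023 as a hole (for a new file). */
    write(fd, "\0", 1);
    close(fd);
    return 0;
}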
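As the program shows, creating a file with a hole comes down to using lseek to move the file offset past the end of the file and then calling write. The range that was skipped becomes the hole.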
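To make the effect of the hole visible, here is a minimal sketch that writes one byte past a gap and then reads the gap back. The function name count_zero_bytes_in_hole and its parameters are made up for this illustration; the only thing it relies on is the POSIX rule that bytes in a gap read back as zeros.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: writes a single byte at offset `gap` of a freshly
 * truncated file, then counts how many of the first `gap` bytes read back
 * as zero. Returns that count, or -1 on error. */
long count_zero_bytes_in_hole(const char *path, off_t gap)
{
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    /* Seek past the (empty) end of the file and write one byte there. */
    if (lseek(fd, gap, SEEK_SET) == (off_t)-1 || write(fd, "x", 1) != 1) {
        close(fd);
        return -1;
    }

    /* Rewind and read the skipped range one byte at a time. */
    if (lseek(fd, 0, SEEK_SET) == (off_t)-1) {
        close(fd);
        return -1;
    }
    long zeros = 0;
    char c;
    for (off_t i = 0; i < gap; i++) {
        if (read(fd, &c, 1) != 1) {
            close(fd);
            return -1;
        }
        if (c == 0)
            zeros++;
    }
    close(fd);
    return zeros;
}
```

Calling count_zero_bytes_in_hole("hole.bin", 1024) should report that all 1024 skipped bytes read as zero, even though the program never wrote them.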
A sparse file can also be created from the shell:
$ dd if=/dev/zero of=sparse_file.img bs=1M seek=1024 count=0
0+0 records in
0+0 records out
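With count=0 the dd command copies no data at all; it only extends the output file to the seek offset (1024 blocks of 1 MiB). A similar effect can be had from C with ftruncate. The sketch below is only an illustration under that assumption, and make_sparse is a hypothetical name, not something dd or the C library provides.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: creates (or resizes) `path` so that its apparent
 * size is exactly `size` bytes, without writing any data.
 * Returns 0 on success, -1 on error. */
int make_sparse(const char *path, off_t size)
{
    int fd = open(path, O_WRONLY | O_CREAT, 0644);
    if (fd < 0)
        return -1;

    /* Extending a file with ftruncate makes the new range read as zeros.
     * On filesystems that support holes, no blocks are allocated for it. */
    int rc = ftruncate(fd, size);
    close(fd);
    return rc;
}
```

For instance, make_sparse("sparse_file.img", (off_t)1024 * 1024 * 1024) would produce a file with an apparent size of 1 GiB, comparable to the one dd created above.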
The advantage of using sparse files is described as follows (quoted from Wikipedia):
The advantage of sparse files is that storage is only allocated when actually needed: disk space is saved, and large files can be created even if there is insufficient free space on the file system.
In other words, the holes in a sparse file need not occupy any storage space.
The du command prints the occupied space, while ls prints the apparent size.
bl@d3:~/test/sparse_file$ echo -n 1 > 1B.txt
bl@d3:~/test/sparse_file$ ls -l 1B.txt
-rw-r--r-- 1 bl bl 1 2012-02-19 05:17 1B.txt
bl@dl3:~/test/sparse_file$ du -h 1B.txt
4.0K 1B.txt
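Here we first create a file 1B.txt whose size is one byte. The size shown by ls is that 1 byte. The size shown by du is different: 1B.txt occupies N blocks on disk, and du computes its figure from those N blocks and the size of each block. I write N rather than a concrete number because many details are hidden behind the scenes, such as the fragment size; we will discuss those another time.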
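Both figures are available from a single stat call, which makes the distinction easy to check. The sketch below uses hypothetical helper names (file_sizes, print_sizes) and assumes Linux, where st_blocks is counted in 512-byte units; POSIX itself does not fix that unit.

```c
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical helper: stores the apparent size (what ls -l shows) and the
 * allocated size (what du reports by default) of `path`.
 * Returns 0 on success, -1 if stat fails. */
int file_sizes(const char *path, long long *apparent, long long *allocated)
{
    struct stat st;
    if (stat(path, &st) != 0)
        return -1;
    *apparent = (long long)st.st_size;
    /* Assumption: st_blocks is in 512-byte units, as on Linux. */
    *allocated = (long long)st.st_blocks * 512;
    return 0;
}

/* Prints both sizes for one file, or an error message. */
void print_sizes(const char *path)
{
    long long apparent, allocated;
    if (file_sizes(path, &apparent, &allocated) != 0) {
        perror(path);
        return;
    }
    printf("%s: apparent %lld bytes, allocated %lld bytes\n",
           path, apparent, allocated);
}
```

For a sparse file such as fs.img the allocated figure can be far smaller than the apparent one, while for a tiny file such as 1B.txt it is typically larger.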
Of course, everything above describes only the default behavior of ls and du; both provide options that change it. For example, ls has the -s option (print the allocated size of each file, in blocks) and du has the --apparent-size option (print apparent sizes, rather than disk usage; although the apparent size is usually smaller, it may be larger due to holes in (`sparse') files, internal fragmentation, indirect blocks, and the like).
Finally, consider how cp treats a sparse file. Tracing a copy of fs.img with strace:
$ strace cp fs.img fs.img.copy >log 2>&1
stat("fs.img.copy", {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
stat("fs.img", {st_mode=S_IFREG|0644, st_size=1073741824, ...}) = 0
stat("fs.img.copy", {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
open("fs.img", O_RDONLY) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=1073741824, ...}) = 0
open("fs.img.copy", O_WRONLY|O_TRUNC) = 4
fstat(4, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
mmap(NULL, 532480, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f90df965000
read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 524288) = 524288
lseek(4, 524288, SEEK_CUR) = 524288
read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 524288) = 524288
lseek(4, 524288, SEEK_CUR) = 1048576
read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 524288) = 524288
lseek(4, 524288, SEEK_CUR) = 1572864
By default, sparse SOURCE files are detected by a crude heuristic and the corresponding DEST file is made sparse as well. That is the behavior selected by --sparse=auto. Specify --sparse=always to create a sparse DEST file whenever the SOURCE file contains a long enough sequence of zero bytes. Use --sparse=never to inhibit creation of sparse files.
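The read/lseek pattern in the trace suggests the general idea: read a chunk, and if it contains only zero bytes, advance the destination offset instead of writing. The following is a rough sketch of that idea, not GNU cp's actual implementation; sparse_copy, all_zero and the 64 KiB CHUNK size are all assumptions chosen for illustration.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

enum { CHUNK = 65536 };  /* assumed chunk size; the cp in the trace used 524288 */

/* Returns 1 if the first n bytes of buf are all zero, 0 otherwise. */
static int all_zero(const char *buf, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (buf[i] != 0)
            return 0;
    return 1;
}

/* Hypothetical helper: copies src to dst, skipping all-zero chunks with
 * lseek so that they can become holes. Returns 0 on success, -1 on error. */
int sparse_copy(const char *src, const char *dst)
{
    char buf[CHUNK];
    int in = open(src, O_RDONLY);
    if (in < 0)
        return -1;
    int out = open(dst, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (out < 0) {
        close(in);
        return -1;
    }

    int rc = 0;
    off_t total = 0;
    ssize_t n;
    while (rc == 0 && (n = read(in, buf, sizeof buf)) > 0) {
        if (all_zero(buf, (size_t)n)) {
            /* Only zeros: move the output offset forward, write nothing. */
            if (lseek(out, n, SEEK_CUR) == (off_t)-1)
                rc = -1;
        } else {
            /* Real data: write it all, allowing for partial writes. */
            ssize_t done = 0;
            while (done < n) {
                ssize_t w = write(out, buf + done, (size_t)(n - done));
                if (w <= 0) {
                    rc = -1;
                    break;
                }
                done += w;
            }
        }
        if (rc == 0)
            total += n;
    }
    if (rc == 0 && n < 0)
        rc = -1;  /* read failed */

    /* A trailing seek does not extend the file by itself, so set the
     * final length explicitly in case the source ends in zeros. */
    if (rc == 0 && ftruncate(out, total) != 0)
        rc = -1;

    close(in);
    close(out);
    return rc;
}
```

This behaves roughly like --sparse=always at chunk granularity: any chunk made up entirely of zeros is skipped, whether or not it was a hole in the source.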