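Every month there are a few days when I just feel like slacking off, and today is one of them. So today I'm sharing the method I just used while working on directory traversal.
The parameters of os.walk are as follows: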
os.walk(top, topdown=True, onerror=None, followlinks=False)
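where:
top is the directory to traverse.
topdown controls whether the traversal runs top-down or bottom-up.
onerror sets a handler function that is called when an error occurs during traversal (the function receives an OSError instance as its argument). If it is left as None, errors are ignored.
followlinks controls whether links inside the directory are followed during traversal. Note that os.walk does not keep track of the directories it has already visited, so following links can end up looping forever.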
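To make the onerror parameter a bit more concrete, here is a minimal sketch. The helper name collect_walk_errors and the directory name in the demo are made up for illustration. It simply records every OSError that os.walk hands to the handler instead of letting it be ignored.

```python
import os


def collect_walk_errors(top):
    """Walk `top` and return (visited_roots, errors).

    Hypothetical helper: errors that os.walk would silently skip
    are collected in a list instead.
    """
    errors = []

    def remember(err):
        # os.walk calls this with the OSError instance;
        # err.filename holds the path that could not be listed.
        errors.append(err)

    visited = [root for root, dirs, files in os.walk(top, onerror=remember)]
    return visited, errors


# Demo: walking a directory that (presumably) does not exist
visited, errors = collect_walk_errors("no_such_directory_for_demo")
for err in errors:
    print("could not list %s: %s" % (err.filename, err.strerror))
```

Raising the error inside the handler instead of storing it would abort the walk, which may be what you want in a script that must not skip anything.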
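os.walk yields a 3-element tuple (root, dirs, files) for each directory it visits: the path of the directory being visited, the list of subdirectories in it, and the list of files in it. Note that the entries in dirs and files are bare names, not full paths. If you need a full path (one that starts from root), use os.path.join(root, dir) for a directory and os.path.join(root, file) for a file.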
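The point about joining root with the bare names can be sketched like this. The function list_full_paths and the small throwaway tree are my own illustration, not part of any library.

```python
import os
import tempfile


def list_full_paths(top):
    """Return (dir_paths, file_paths) with every entry joined to its root."""
    dir_paths, file_paths = [], []
    for root, dirs, files in os.walk(top):
        for d in dirs:
            dir_paths.append(os.path.join(root, d))   # bare name -> full path
        for f in files:
            file_paths.append(os.path.join(root, f))
    return sorted(dir_paths), sorted(file_paths)


# Demo on a temporary tree so nothing depends on the local disk layout
with tempfile.TemporaryDirectory() as top:
    os.makedirs(os.path.join(top, "dir1", "dir4"))
    open(os.path.join(top, "a.py"), "w").close()
    open(os.path.join(top, "dir1", "dir4", "g.py"), "w").close()
    dir_paths, file_paths = list_full_paths(top)
    print([os.path.relpath(p, top) for p in dir_paths])
    print([os.path.relpath(p, top) for p in file_paths])
```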
Suppose we have the following files and directories:
    ➜ test_os_walk git:(master) ✗ tree
    .
    ├── a.py
    ├── b.py
    ├── c.py
    ├── dir1
    │   ├── dir4
    │   │   ├── g.py
    │   │   └── h.py
    │   ├── dirx
    │   │   ├── diry
    │   │   │   └── k.py
    │   │   └── z.py
    │   ├── e.py
    │   ├── f.py
    │   └── g.py
    ├── dir2
    │   ├── dira
    │   │   └── dirb
    │   │       └── dirc
    │   │           └── aha.py
    │   ├── k.py
    │   ├── l.py
    │   └── m.py
    └── dir3
        ├── dir5
        │   └── z.py
        ├── x.py
        └── y.py

    10 directories, 17 files
When I traverse this directory with os.walk, the program and its output look like this:
    import os

    path = '/Users/nisen/Projects/python_advanced_class/test/test_os_walk'

    # The second positional argument is topdown
    for root, dirs, files in os.walk(path, True):
        print('root: %s' % root)
        print('dirs: %s' % dirs)
        print('files: %s' % files)
        print('')
The result is below. From the root paths you can see that the traversal runs top-down:
    ➜ test git:(master) ✗ python test11.py
    root: /Users/nisen/Projects/python_advanced_class/test/test_os_walk
    dirs: ['dir1', 'dir2', 'dir3']
    files: ['a.py', 'b.py', 'c.py']

    root: /Users/nisen/Projects/python_advanced_class/test/test_os_walk/dir1
    dirs: ['dir4', 'dirx']
    files: ['e.py', 'f.py', 'g.py']

    root: /Users/nisen/Projects/python_advanced_class/test/test_os_walk/dir1/dir4
    dirs: []
    files: ['g.py', 'h.py']

    root: /Users/nisen/Projects/python_advanced_class/test/test_os_walk/dir1/dirx
    dirs: ['diry']
    files: ['z.py']

    root: /Users/nisen/Projects/python_advanced_class/test/test_os_walk/dir1/dirx/diry
    dirs: []
    files: ['k.py']

    root: /Users/nisen/Projects/python_advanced_class/test/test_os_walk/dir2
    dirs: ['dira']
    files: ['k.py', 'l.py', 'm.py']

    root: /Users/nisen/Projects/python_advanced_class/test/test_os_walk/dir2/dira
    dirs: ['dirb']
    files: []

    root: /Users/nisen/Projects/python_advanced_class/test/test_os_walk/dir2/dira/dirb
    dirs: ['dirc']
    files: []

    root: /Users/nisen/Projects/python_advanced_class/test/test_os_walk/dir2/dira/dirb/dirc
    dirs: []
    files: ['aha.py']

    root: /Users/nisen/Projects/python_advanced_class/test/test_os_walk/dir3
    dirs: ['dir5']
    files: ['x.py', 'y.py']

    root: /Users/nisen/Projects/python_advanced_class/test/test_os_walk/dir3/dir5
    dirs: []
    files: ['z.py']
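When topdown is set to False, the result is as follows. You can see that the traversal now runs bottom-up, with each directory reported only after all of its subdirectories: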
    ➜ test git:(master) ✗ python test11.py
    root: /Users/nisen/Projects/python_advanced_class/test/test_os_walk/dir1/dir4
    dirs: []
    files: ['g.py', 'h.py']

    root: /Users/nisen/Projects/python_advanced_class/test/test_os_walk/dir1/dirx/diry
    dirs: []
    files: ['k.py']

    root: /Users/nisen/Projects/python_advanced_class/test/test_os_walk/dir1/dirx
    dirs: ['diry']
    files: ['z.py']

    root: /Users/nisen/Projects/python_advanced_class/test/test_os_walk/dir1
    dirs: ['dir4', 'dirx']
    files: ['e.py', 'f.py', 'g.py']

    root: /Users/nisen/Projects/python_advanced_class/test/test_os_walk/dir2/dira/dirb/dirc
    dirs: []
    files: ['aha.py']

    root: /Users/nisen/Projects/python_advanced_class/test/test_os_walk/dir2/dira/dirb
    dirs: ['dirc']
    files: []

    root: /Users/nisen/Projects/python_advanced_class/test/test_os_walk/dir2/dira
    dirs: ['dirb']
    files: []

    root: /Users/nisen/Projects/python_advanced_class/test/test_os_walk/dir2
    dirs: ['dira']
    files: ['k.py', 'l.py', 'm.py']

    root: /Users/nisen/Projects/python_advanced_class/test/test_os_walk/dir3/dir5
    dirs: []
    files: ['z.py']

    root: /Users/nisen/Projects/python_advanced_class/test/test_os_walk/dir3
    dirs: ['dir5']
    files: ['x.py', 'y.py']

    root: /Users/nisen/Projects/python_advanced_class/test/test_os_walk
    dirs: ['dir1', 'dir2', 'dir3']
    files: ['a.py', 'b.py', 'c.py']
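One place where the bottom-up order seems handy is any calculation in which a directory needs the results of its subdirectories first. The sketch below is my own example, with the hypothetical helper name count_files_recursive. It counts the files under each directory, subdirectories included, and relies on children being reported before their parent.

```python
import os
import tempfile


def count_files_recursive(top):
    """Map each directory under `top` to the number of files beneath it."""
    totals = {}
    for root, dirs, files in os.walk(top, topdown=False):
        # Children were reported earlier, so their totals already exist.
        # .get(..., 0) covers entries os.walk lists but does not descend
        # into, e.g. symlinked directories when followlinks is False.
        child_total = sum(totals.get(os.path.join(root, d), 0) for d in dirs)
        totals[root] = len(files) + child_total
    return totals


# Demo on a cut-down version of the tree from this post
with tempfile.TemporaryDirectory() as top:
    os.makedirs(os.path.join(top, "dir1", "dir4"))
    for rel in ("a.py", "dir1/e.py", "dir1/dir4/g.py", "dir1/dir4/h.py"):
        open(os.path.join(top, *rel.split("/")), "w").close()
    totals = count_files_recursive(top)
    print(totals[top])  # 4
```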
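When topdown is True, you can modify the yielded dirs list in place while processing it, and os.walk will then descend only into the directories still left in dirs. With topdown set to False this has no effect, because the subdirectories have already been visited by the time their parent is reported. For example, the code below leaves the "CVS" directories out of the traversal: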
    import os
    from os.path import join, getsize

    for root, dirs, files in os.walk('python/Lib/email'):
        print(root, "consumes", end=" ")
        print(sum(getsize(join(root, name)) for name in files), end=" ")
        print("bytes in", len(files), "non-directory files")
        if 'CVS' in dirs:
            dirs.remove('CVS')  # don't visit CVS directories
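If you want to skip several directory names at once, a slice assignment does the same in-place edit. The sketch below uses my own hypothetical helper walk_skipping and a temporary tree. Note that a plain dirs = [...] would only rebind the local name and os.walk would never see the change.

```python
import os
import tempfile


def walk_skipping(top, skip_names):
    """Return the roots visited top-down, never entering `skip_names`."""
    visited = []
    for root, dirs, files in os.walk(top, topdown=True):
        # dirs[:] = ... changes the very list object os.walk holds on to
        dirs[:] = [d for d in dirs if d not in skip_names]
        visited.append(root)
    return visited


# Demo: CVS and everything below it is never visited
with tempfile.TemporaryDirectory() as top:
    os.makedirs(os.path.join(top, "CVS", "inner"))
    os.makedirs(os.path.join(top, "src"))
    print([os.path.relpath(r, top) for r in walk_skipping(top, {"CVS"})])
    # ['.', 'src']
```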