Definition: a module is a collection of code, a chunk of code that implements a certain kind of functionality.
To keep code maintainable, we group related functions and put them in separate files, which also makes the code reusable. In Python, a single .py file is called a module (Module).
Note: modules come in three kinds — built-in modules, third-party modules, and custom (self-written) modules.
What are the benefits of using modules?
First: the biggest benefit is that modules greatly improve code maintainability.
Second: you don't have to write code from scratch. Once a module is finished, it can be reused anywhere. When writing programs we routinely import other modules, including Python's built-in modules and third-party modules.
Third: modules also help avoid collisions between function and variable names. Functions and variables with the same name can perfectly well live in different modules, so when writing your own module you don't need to worry about clashing with names in other modules.
That said, try not to shadow the names of built-in functions.
Note: never give a custom module the same name as a built-in module!
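A minimal sketch of what goes wrong, assuming a project that (badly) names one of its own files random.py:

# Hypothetical layout:
#   project/
#   ├── random.py   <- a custom module with a very bad name
#   └── main.py
#
# main.py:
import random                  # resolves to the local random.py, because the
                               # script's own directory is searched before the stdlib
print(random.randint(1, 10))   # AttributeError: module 'random' has no attribute 'randint'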
2. Module basics:
Python's ever-growing popularity owes a lot to the large number of modules it makes available to programmers. To use a module you must first import it, with the import statement:
import syntax:

import module1[, module2[, ... moduleN]]
Python ships with a standard library of modules. For example:
#!/usr/bin/python3
# Filename: using_sys.py
import sys

print('The command-line arguments are:')
for i in sys.argv:
    print(i)

print('\n\nThe Python path is:', sys.path, '\n')
Note: a module can locate its own position on disk via __file__:
import os

# Get the directory the current file lives in
current_path = os.path.dirname(__file__)
# Get its parent directory
pre_path = os.path.dirname(current_path)
print(pre_path)
More import forms:
import module                                           # Import the whole module. Its variables and functions are not dumped into the current namespace; after the import, access them as module.name
from module.xx.xx import xx[, yy]                       # Import specific variables or functions from a module
from module.xx.xx import xx as rename[, yy as rename]   # Import under an alias
from module.xx.xx import *                              # Import every name that does not start with an underscore (_). Single-underscore names are private to the module and should not be used from other modules
通常來講,應該避免使用from … import 而使用import語句,由於這樣可使你的程序更加易讀,也能夠避免名稱衝突
Importing a module essentially tells the Python interpreter to go and execute that .py file.
So which base paths does the import machinery search? The ones in sys.path. If the directory you need is not in the sys.path list, add it with sys.path.append('path').
sys.path.insert(0, '/x/y/z')   # directories earlier in the list are searched first
sys.path.remove('/x/y/z')      # remove a directory from the search list
To sum up: when we import a module, Python first looks among the built-in modules (such as sys); if it is not found there, it searches the directories in sys.path (which includes the current directory, the empty-string entry).
Within a module we may define many functions and variables: some we want others to use, some we want only for internal use. In Python this distinction is expressed with underscore (_) prefixes.

Normal function and variable names are public and may be referenced directly, e.g. abc, x123, PI.

Names of the form __xxx__ are special variables. They can be referenced directly but have special purposes: for example __author__ (the author) and __doc__ (the docstring) are special variables, and those defined by the hello module can be accessed the same way. Don't use this naming style for your own ordinary variables.

Names of the form _xxx or __xxx are non-public (private) functions or variables and should not be referenced directly, e.g. _abc, __abc.
We say private functions and variables "should not" be referenced directly, rather than "cannot", because Python has no mechanism that completely prevents access to them; not referencing them is purely a programming convention.
If private functions and variables shouldn't be referenced by others, what are they for? Consider this example:
def _private_1(name):
    return 'Hello, %s' % name

def _private_2(name):
    return 'Hi, %s' % name

def greeting(name):
    if len(name) > 3:
        return _private_1(name)
    else:
        return _private_2(name)
Here the module exposes the greeting() function and hides the internal logic in private functions. Callers of greeting() need not care about those private details. This is a very useful form of code encapsulation and abstraction: define everything that outside code does not need as private, and expose as public only what outside code does need.
The __name__ attribute:
When a module is first imported by another program, the module's top-level code runs. If we want some block of the module not to run on import, we can use the __name__ attribute to make that block execute only when the module itself is run directly.
if __name__ == '__main__':
    print('Running as the main program')
else:
    print('Imported from another module')
Every module has a __name__ attribute; when its value is '__main__', the module itself is being run, otherwise it has been imported.
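To see both branches, assuming the snippet above is saved as using_name.py:

# Run directly:
#   $ python using_name.py
#   Running as the main program
#
# Import it from another file (or a REPL) in the same directory:
import using_name   # prints: Imported from another module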
The dir() function:
The built-in dir() function finds all the attributes and methods defined inside a module (or any callable object); those names can then be accessed as module.name. It returns them as a list of strings:
import sys

print(sys.__name__)   # note: after a plain import you can access all of the module's attributes and methods
print(dir(sys))
Called without arguments, dir() lists the names defined in the current scope.
dir() does not list the names of built-in functions and variables; those are defined in the standard module builtins, which you can list explicitly:
import builtins
dir(builtins)
To speed up module loading, Python caches compiled versions of each module in the __pycache__ directory under the name module.version.pyc, where the version usually encodes the Python release. For example, in CPython 3.3 the compiled version of spam.py is cached as __pycache__/spam.cpython-33.pyc. This naming convention lets modules compiled by different Python versions coexist.
Python compares the source file's modification time against the compiled version and recompiles when the cache is stale; this is completely automatic. Compiled modules are also platform-independent, so the same library can be shared among systems with different architectures: a .pyc file is cross-platform bytecode, similar to Java's or .NET's, executed by the Python virtual machine. However, .pyc content is tied to the Python version: different versions produce different bytecode, so a .pyc compiled by 2.5 cannot run on 3.5. And since .pyc files can be decompiled, their only purpose is to speed up module loading.
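As a quick illustration, the standard py_compile module can produce the cache by hand (spam.py is a hypothetical file):

import py_compile

# Compiles spam.py and writes the bytecode under __pycache__/,
# e.g. __pycache__/spam.cpython-311.pyc on CPython 3.11
py_compile.compile('spam.py')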
Tips:
1. Module names are case-sensitive: foo.py and FOO.py are two different modules.
2. You can pass the -O or -OO switch to the python command to reduce the size of compiled modules:
1. -O strips assert statements
2. -OO strips assert statements and __doc__ docstrings
3. Since some programs may rely on asserts or docstrings, use these options only when you are sure you need them.
3. Reading instructions from a .pyc file is no faster than reading them from a .py file; the .pyc is only faster at module load time.
4. Files are compiled to .pyc automatically only via the import statement; running a script from the command line or standard input does not produce one. To create .pyc files for every module in a directory, use the compileall module:
compileall can be run as a script to compile Python sources:

python -m compileall /module_directory        # compiles the directory recursively
python -O -m compileall /module_directory -l  # -l limits it to a single level
# compile() used from the command line invokes python -O -m compileall automatically

See: https://docs.python.org/3/library/compileall.html#module-compileall
In short: Python is not a purely interpreted language. It does compile — source .py files are first compiled to .pyc bytecode, which the Python virtual machine then executes!
You might also wonder: what if different people write modules with the same name? To avoid module-name conflicts, Python organizes modules by directory, a structure called a package (Package).
If a function is a tool, then a module is a toolbox, and a package is a collection of toolboxes!
For example, a file init.py is a module named init, and a file Demo.py is a module named Demo.
Now suppose our two module names init and Demo conflict with other modules. We can organize the modules into a package to avoid the conflict: choose a top-level package name, say oop, and lay the files out in a directory like this:
With a package, as long as the top-level package name doesn't clash with anyone else's, none of the modules inside can clash either. The init.py module's name is now oop.init, and likewise Demo.py's module name becomes oop.Demo.
Note that every package directory must contain a file named __init__.py. Without it, Python treats the directory as an ordinary directory rather than a package.
__init__.py can be an empty file or contain Python code, because __init__.py is itself a module, and its module name is oop.
Similarly, directories can nest to form a multi-level package hierarchy. For example, with the following directory structure:
the two Demo.py files have the module names oop.Demo and oop.conf.Demo respectively.
Whether you use the import form or the from...import form, whenever a dotted name appears in an import statement (as opposed to at a use site), be on alert: that dotted syntax exists only for packages.
A package is essentially just a directory containing an __init__.py file.
Modules of the same name under packages A and B do not conflict; A.a and B.a come from two different namespaces:
glance/                     # Top-level package
├── __init__.py             # Initialize the glance package
├── api                     # Subpackage for api
│   ├── __init__.py
│   ├── policy.py
│   └── versions.py
├── cmd                     # Subpackage for cmd
│   ├── __init__.py
│   └── manage.py
└── db                      # Subpackage for db
    ├── __init__.py
    └── models.py
# File contents

# policy.py
def get():
    print('from policy.py')

# versions.py
def create_resource(conf):
    print('from version.py: ', conf)

# manage.py
def main():
    print('from manage.py')

# models.py
def register_models(engine):
    print('from models.py: ', engine)
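Before the details, a tiny sketch of the earlier claim that A.a and B.a never collide (packages A and B are hypothetical):

# Hypothetical layout:
#   A/__init__.py    A/a.py   (defines f() printing 'A.a')
#   B/__init__.py    B/a.py   (defines f() printing 'B.a')

import A.a
import B.a

A.a.f()   # -> A.a
B.a.f()   # -> B.a  (same module name 'a', two separate namespaces)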
2.1 Notes
1. Package-related import statements also come in the import and from ... import ... forms, but either way, wherever they appear, one principle always holds: in an import statement, whatever stands to the left of a dot must be a package, otherwise the import is illegal. A chain of dots is fine, as in item.subitem.subsubitem, but every link must obey this principle.
2. After the import, use sites have no such restriction: the left side of a dot may be a package, a module, a function, or a class (all of them can access their own attributes with dots).
3. Comparing the use cases of import item and from item import name:
If we want to use name directly, we must use the latter.
import
Testing from a file at the same level as the glance package:
import glance.db.models
glance.db.models.register_models('mysql')
from ... import ...
Note that whatever follows import in a from ... import statement must be a single bare name without dots, otherwise it is a syntax error; e.g. from a import b.c is invalid syntax.
Testing from a file at the same level as the glance package:
from glance.db import models
models.register_models('mysql')

from glance.db.models import register_models
register_models('mysql')
The __init__.py file
Whichever form you use, the first time a package or any part of it is imported, the __init__.py files along the path are executed in order (you can verify this by printing a line from each one). The file may be empty, but it may also hold package-initialization code.
from glance.api import *
When covering modules we already discussed importing * from a module; here we look at importing * from a package.
The intent here is to import everything from the api package. In reality the statement only executes the code in api's __init__.py file (after which the importing side can reference the variables and methods defined in that __init__ file); it does not import the package's other modules. To make some or all of those modules importable this way, define __all__ in that file and list the module names in it:
# defined in __init__.py
x = 10

def func():
    print('from api.__init.py')

__all__ = ['x', 'func', 'policy']
Now, executing from glance.api import * in a file at the same level as glance imports exactly what's listed in __all__ (versions still cannot be imported). Conversely, variables or functions in the __init__ file that are not added to __all__ cannot be used directly at the import site. In short, __all__ restricts what from glance.api import * brings in to the names in __init__'s __all__. (Note: __all__ only affects from module import *.)
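A sketch of the effect, run from a test file sitting next to the glance package:

from glance.api import *

print(x)       # 10 -- listed in __all__
func()         # from api.__init.py
policy.get()   # from policy.py -- the module name 'policy' was listed in __all__
# versions.create_resource('mysql')   # NameError: 'versions' is not in __all__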
Note: the from glance.api import * behavior above applies specifically to importing all of a package's modules at once. Other forms still work as usual:
Importing a specific file from a package, as in from glance.db import models (import the models module from the glance.db package), still works, and calling models.register_models() afterwards works too.
You can also import every attribute and method not starting with _ from one specific file inside a package, like this:
from test.lib.aa import *   # import the aa module from the lib package under the test package
af()                        # aa's af() function can now be used directly
This form exists, but it is not a recommended style!
Our top-level package glance is written for others to use, but modules inside glance also need to import each other. For that there are two approaches, absolute imports and relative imports:
Absolute import: the path starts from glance.
Relative import: the path starts with . or .. (usable only within a package, not across unrelated directories).
For example: suppose glance/api/versions.py wants to import glance/api/policy.py, and we then import glance/api/versions.py from a .py file at the same level as the glance package. That combination breaks, as shown below:
# versions.py
import policy
policy.get()
# Note: importing policy and calling policy.get() here works when this file itself is run,
# because the import is resolved relative to this file's own directory.

def create_resource(conf):
    print('from version.py: ', conf)
# Demo.py sits at the same level as the glance package -- this breaks.
import glance.api.versions
# Execution uses Demo.py's path as the base. Importing glance.api.versions runs versions.py,
# which in turn does `import policy`; but policy is looked up relative to Demo.py's path,
# so it cannot be found and an error is raised.
which produces the following error (for the reason noted above):
ImportError: No module named 'policy'
Pay special attention here: import is fine for built-in or third-party open-source modules, but absolutely avoid a bare import for the submodules of your own package. Use from ... import ... with an absolute or relative path instead, and note that relative imports within a package can only use the from form.
In that case, to import glance/api/policy.py from glance/api/versions.py we can use either the absolute or the relative form, as shown below:
from glance.api import policy   # absolute import
# from . import policy          # relative import
# from ..cmd import manage

policy.get()

def create_resource(conf):
    print('from version.py: ', conf)
Especially once packages are involved, a project is organized as several packages, each containing subpackages or .py files. Files inside a package should always import each other with from ... import ...; that keeps them usable from other packages, whether by you or by anyone you give the code to.
Importing a bare package name does not import all the submodules the package contains, e.g.:
# in test.py, at the same level as glance
import glance
glance.cmd.manage.main()
'''
Result:
AttributeError: module 'glance' has no attribute 'cmd'
'''
The fix:
# glance/__init__.py
from . import cmd

# glance/cmd/__init__.py
from . import manage
And please don't ask whether __all__ could fix this — __all__ only controls from ... import *!
To sum up: inside a package, a .py file that needs another .py file you wrote should import it with from module import ... (a bare import is fine for built-in or third-party packages). To make `import package` usable from outside the package (so you need not import each file), add from . import module lines to the package's __init__.py as shown above.
1. Download and install

There are two ways to download and install a third-party module:
Via a package manager: yum, pip, apt-get, ...
Or from source:

Download the source
Unpack the source
Enter the directory
Build the source: python setup.py build
Install the source: python setup.py install
Note: installing from source requires gcc and the Python development headers, so first run:
yum install gcc
yum install python-devel
or
apt-get install python-dev
After a successful install, the module lands automatically in one of the directories on sys.path, e.g.:
/usr/lib/python2.7/site-packages/
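A quick way to confirm where a module landed (a sketch using paramiko, covered below, as the example, and assuming it is already installed):

import sys
print(sys.path)            # the site-packages directory should appear in this list

import paramiko
print(paramiko.__file__)   # e.g. /usr/lib/python2.7/site-packages/paramiko/__init__.py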
2. Importing the module
Imported the same way as a custom module.
3. The paramiko module
paramiko is a module for remote control: with it you can run commands and operate on files on a remote server. Notably, the internal remote management of fabric and ansible is implemented with paramiko.
1. Download and install
pip3 install paramiko
or
# pycrypto: paramiko depends on pycrypto internally, so install pycrypto first

# Download and install pycrypto
wget http://files.cnblogs.com/files/wupeiqi/pycrypto-2.6.1.tar.gz
tar -xvf pycrypto-2.6.1.tar.gz
cd pycrypto-2.6.1
python setup.py build
python setup.py install

# Enter the python shell and `import Crypto` to check the installation

# Download and install paramiko
wget http://files.cnblogs.com/files/wupeiqi/paramiko-1.10.1.tar.gz
tar -xvf paramiko-1.10.1.tar.gz
cd paramiko-1.10.1
python setup.py build
python setup.py install

# Enter the python shell and `import paramiko` to check the installation
2. Using the module
#!/usr/bin/env python
# coding: utf-8
import paramiko

ssh = paramiko.SSHClient()
ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy())
ssh.connect('192.168.1.108', 22, 'alex', '123')
stdin, stdout, stderr = ssh.exec_command('df')
print(stdout.read())
ssh.close()
# Upload a file over SFTP
import paramiko

t = paramiko.Transport(('182.92.219.86', 22))
t.connect(username='wupeiqi', password='123')
sftp = paramiko.SFTPClient.from_transport(t)
sftp.put('/tmp/test.py', '/tmp/test.py')
t.close()

# Download a file over SFTP
import paramiko

t = paramiko.Transport(('182.92.219.86', 22))
t.connect(username='wupeiqi', password='123')
sftp = paramiko.SFTPClient.from_transport(t)
sftp.get('/tmp/test.py', '/tmp/test2.py')
t.close()
For some of Python's built-in modules you will not find a .py file in the Python installation — sys, for example. At the same time, the interpreter ships with the Lib directory and friends, which do contain .py files. Both kinds — the bundled .py files and the modules with no .py source — are collectively called built-in modules.
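A small sketch of the difference, assuming CPython (sys is compiled into the interpreter, os ships as a .py file):

import sys
import os

print(hasattr(sys, '__file__'))       # False on CPython: sys has no source file
print(os.__file__)                    # e.g. .../Lib/os.py -- shipped in the Lib directory
print(sys.builtin_module_names[:5])   # names of modules compiled into this interpreter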
The os module provides system-level operations:
os.getcwd()                     # current working directory, i.e. the directory the Python script runs in
os.chdir("dirname")             # change the script's working directory; like shell cd
os.curdir                       # the current directory: '.'
os.pardir                       # the parent-directory string: '..'
os.makedirs('dirname1/dirname2')  # create a multi-level directory tree recursively
os.removedirs('dirname1')       # remove the directory if empty, then recurse upward removing empty parents, and so on
os.mkdir('dirname')             # create a single directory; like shell mkdir dirname
os.rmdir('dirname')             # remove a single empty directory; errors if non-empty; like shell rmdir dirname
os.listdir('dirname')           # list all files and subdirectories under the directory, hidden ones included, as a list
os.remove()                     # delete a file
os.rename("oldname","newname")  # rename a file/directory
os.stat('path/filename')        # get file/directory metadata
os.sep                          # the platform-specific path separator: '\\' on Windows, '/' on Linux
os.linesep                      # the platform's line terminator: '\r\n' on Windows, '\n' on Linux
os.pathsep                      # the separator used in path lists (e.g. PATH)
os.name                         # platform indicator string: Windows -> 'nt', Linux -> 'posix'
os.system("bash command")       # run a shell command, output goes straight to the terminal
os.environ                      # the environment variables
os.path.abspath(path)           # the normalized absolute version of path
os.path.split(path)             # split path into a (directory, filename) tuple
os.path.dirname(path)           # the directory part of path; the first element of os.path.split(path)
os.path.basename(path)          # the final component of path; empty if path ends with / or \; the second element of os.path.split(path)
os.path.exists(path)            # True if path exists, False otherwise
os.path.isabs(path)             # True if path is an absolute path
os.path.isfile(path)            # True if path is an existing file, else False
os.path.isdir(path)             # True if path is an existing directory, else False
os.path.join(path1[, path2[, ...]])  # join paths; parameters before the first absolute path are ignored
os.path.getatime(path)          # last access time of the file or directory at path
os.path.getmtime(path)          # last modification time of the file or directory at path
The sys module provides interpreter-related operations:
sys.argv        # list of command-line arguments; the first element is the program's own path
sys.exit(n)     # exit the program; exit(0) is a normal exit
sys.version     # version info of the Python interpreter
sys.maxint      # largest int value (Python 2; in Python 3 use sys.maxsize)
sys.path        # the module search path, initialized from the PYTHONPATH environment variable
sys.platform    # name of the operating-system platform
sys.stdout.write('please:')
val = sys.stdin.readline()[:-1]
用於加密相關的操做,代替了md5模塊和sha模塊,主要提供 SHA1, SHA224, SHA256, SHA384, SHA512 ,MD5 算法
# Python 2 only: the legacy md5 module
import md5
hash = md5.new()
hash.update('admin')
print hash.hexdigest()
# Python 2 only: the legacy sha module
import sha
hash = sha.new()
hash.update('admin')
print hash.hexdigest()
import hashlib

# ######## md5 ########
hash = hashlib.md5()
hash.update(b'admin')
print(hash.hexdigest())

# ######## sha1 ########
hash = hashlib.sha1()
hash.update(b'admin')
print(hash.hexdigest())

# ######## sha256 ########
hash = hashlib.sha256()
hash.update(b'admin')
print(hash.hexdigest())

# ######## sha384 ########
hash = hashlib.sha384()
hash.update(b'admin')
print(hash.hexdigest())

# ######## sha512 ########
hash = hashlib.sha512()
hash.update(b'admin')
print(hash.hexdigest())
These algorithms are still very strong, but they have a flaw: common inputs can be reversed via dictionary (rainbow-table) attacks. It is therefore worth adding a custom key (a salt) to the hash:
import hashlib

# ######## md5, salted with a custom key ########
hash = hashlib.md5(b'898oaFs09f')
hash.update(b'admin')
print(hash.hexdigest())
Still not hardcore enough? Python also has an hmac module, which internally processes our key and message together before hashing:
HMAC (Hash-based Message Authentication Code) is an authentication mechanism based on MACs (Message Authentication Codes). With HMAC, the two communicating parties verify a message's authenticity by means of an agreed authentication key K mixed into the message.
It is generally used to authenticate messages in network communication. The two sides first agree on a key, like a secret handshake. The sender hashes the message with the key; the receiver hashes the received plaintext with the same key and compares the result with the sender's value. A match verifies the message's authenticity and the sender's legitimacy.
import hmac

h = hmac.new('天王蓋地虎'.encode('utf-8'), '寶塔鎮河妖'.encode('utf-8'), digestmod='md5')
print(h.hexdigest())
More on md5, sha1, sha256 and friends: https://www.tbs-certificates.co.uk/FAQ/en/sha256.html
The random module generates random numbers:
import random

print(random.random())          # random float in [0, 1)
print(random.randint(1, 2))     # random integer in [1, 2]
print(random.randrange(1, 10))  # random integer in [1, 10)
Generating a random verification code:
import random

checkcode = ''
for i in range(4):
    current = random.randrange(0, 4)
    if current != i:
        temp = chr(random.randint(65, 90))   # a random uppercase letter
    else:
        temp = random.randint(0, 9)          # a random digit
    checkcode += str(temp)
print(checkcode)
Two modules are used for serialization:
The json module provides four functions: dumps, dump, loads, load.
The pickle module provides the same four functions: dumps, dump, loads, load.
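A minimal sketch contrasting the two (json produces portable text, pickle produces Python-specific bytes; data.json is a throwaway file):

import json
import pickle

data = {'name': 'alex', 'age': 18}

s = json.dumps(data)       # object -> JSON text
print(json.loads(s))       # JSON text -> object

b = pickle.dumps(data)     # object -> Python-specific bytes
print(pickle.loads(b))     # bytes -> object

# dump/load are the same operations against a file object:
with open('data.json', 'w') as f:
    json.dump(data, f)
with open('data.json') as f:
    print(json.load(f))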
# _*_ coding: utf-8 _*_
__author__ = 'Alex Li'
import time

# print(time.clock())      # processor time; deprecated since 3.3 in favor of time.process_time(), which measures CPU time excluding sleep; unstable, not measurable on mac
# print(time.altzone)      # offset from UTC, in seconds
# print(time.asctime())    # returns "Fri Aug 19 11:14:16 2016"
# print(time.localtime())  # local time as a struct_time object
# print(time.gmtime(time.time()-800000))  # UTC time as a struct_time object
# print(time.asctime(time.localtime()))   # returns "Fri Aug 19 11:14:16 2016"
# print(time.ctime())      # returns "Fri Aug 19 12:38:29 2016", same as above

# date string -> timestamp
# string_2_struct = time.strptime("2016/05/22", "%Y/%m/%d")  # parse a date string into a struct_time object
# print(string_2_struct)
# struct_2_stamp = time.mktime(string_2_struct)              # struct_time object -> timestamp
# print(struct_2_stamp)

# timestamp -> string
# print(time.gmtime(time.time()-86640))                       # timestamp -> UTC struct_time
# print(time.strftime("%Y-%m-%d %H:%M:%S", time.gmtime()))    # UTC struct_time -> the given string format

# date arithmetic
import datetime

# print(datetime.datetime.now())    # returns 2016-08-19 12:47:03.941925
# print(datetime.date.fromtimestamp(time.time()))  # timestamp straight to a date: 2016-08-19
# print(datetime.datetime.now() + datetime.timedelta(3))           # now + 3 days
# print(datetime.datetime.now() + datetime.timedelta(-3))          # now - 3 days
# print(datetime.datetime.now() + datetime.timedelta(hours=3))     # now + 3 hours
# print(datetime.datetime.now() + datetime.timedelta(minutes=30))  # now + 30 minutes

# c_time = datetime.datetime.now()
# print(c_time.replace(minute=3, hour=2))  # replace time components
Directive | Meaning | Notes
---|---|---
%a | Locale's abbreviated weekday name. |
%A | Locale's full weekday name. |
%b | Locale's abbreviated month name. |
%B | Locale's full month name. |
%c | Locale's appropriate date and time representation. |
%d | Day of the month as a decimal number [01,31]. |
%H | Hour (24-hour clock) as a decimal number [00,23]. |
%I | Hour (12-hour clock) as a decimal number [01,12]. |
%j | Day of the year as a decimal number [001,366]. |
%m | Month as a decimal number [01,12]. |
%M | Minute as a decimal number [00,59]. |
%p | Locale's equivalent of either AM or PM. | (1)
%S | Second as a decimal number [00,61]. | (2)
%U | Week number of the year (Sunday as the first day of the week) as a decimal number [00,53]. All days in a new year preceding the first Sunday are considered to be in week 0. | (3)
%w | Weekday as a decimal number [0(Sunday),6]. |
%W | Week number of the year (Monday as the first day of the week) as a decimal number [00,53]. All days in a new year preceding the first Monday are considered to be in week 0. | (3)
%x | Locale's appropriate date representation. |
%X | Locale's appropriate time representation. |
%y | Year without century as a decimal number [00,99]. |
%Y | Year with century as a decimal number. |
%z | Time zone offset indicating a positive or negative time difference from UTC/GMT of the form +HHMM or -HHMM, where H represents decimal hour digits and M represents decimal minute digits [-23:59, +23:59]. |
%Z | Time zone name (no characters if no time zone exists). |
%% | A literal '%' character. |
shutil is a high-level module for handling files, directories and archives.
shutil.copyfileobj(fsrc, fdst[, length])
Copy the contents of one file-like object into another; length lets you copy partial content:
def copyfileobj(fsrc, fdst, length=16*1024):
    """copy data from file-like object fsrc to file-like object fdst"""
    while 1:
        buf = fsrc.read(length)
        if not buf:
            break
        fdst.write(buf)
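A usage sketch of the public function (the file names are placeholders):

import shutil

# Stream src.txt into dst.txt in the default 16 KB chunks
with open('src.txt', 'rb') as fsrc, open('dst.txt', 'wb') as fdst:
    shutil.copyfileobj(fsrc, fdst)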
shutil.copyfile(src, dst)
Copy a file:
def copyfile(src, dst):
    """Copy data from src to dst"""
    if _samefile(src, dst):
        raise Error("`%s` and `%s` are the same file" % (src, dst))

    for fn in [src, dst]:
        try:
            st = os.stat(fn)
        except OSError:
            # File most likely does not exist
            pass
        else:
            # XXX What about other special files? (sockets, devices...)
            if stat.S_ISFIFO(st.st_mode):
                raise SpecialFileError("`%s` is a named pipe" % fn)

    with open(src, 'rb') as fsrc:
        with open(dst, 'wb') as fdst:
            copyfileobj(fsrc, fdst)
shutil.copymode(src, dst)
Copy only the permission bits; contents, group and owner are unchanged:
def copymode(src, dst):
    """Copy mode bits from src to dst"""
    if hasattr(os, 'chmod'):
        st = os.stat(src)
        mode = stat.S_IMODE(st.st_mode)
        os.chmod(dst, mode)
shutil.copystat(src, dst)
Copy the stat info: mode bits, atime, mtime, flags:
def copystat(src, dst):
    """Copy all stat info (mode bits, atime, mtime, flags) from src to dst"""
    st = os.stat(src)
    mode = stat.S_IMODE(st.st_mode)
    if hasattr(os, 'utime'):
        os.utime(dst, (st.st_atime, st.st_mtime))
    if hasattr(os, 'chmod'):
        os.chmod(dst, mode)
    if hasattr(os, 'chflags') and hasattr(st, 'st_flags'):
        try:
            os.chflags(dst, st.st_flags)
        except OSError, why:
            for err in 'EOPNOTSUPP', 'ENOTSUP':
                if hasattr(errno, err) and why.errno == getattr(errno, err):
                    break
            else:
                raise
shutil.copy(src, dst)
Copy the file and its permission bits:
def copy(src, dst):
    """Copy data and mode bits ("cp src dst").

    The destination may be a directory.
    """
    if os.path.isdir(dst):
        dst = os.path.join(dst, os.path.basename(src))
    copyfile(src, dst)
    copymode(src, dst)
shutil.copy2(src, dst)
Copy the file and its stat info:
def copy2(src, dst):
    """Copy data and all stat info ("cp -p src dst").

    The destination may be a directory.
    """
    if os.path.isdir(dst):
        dst = os.path.join(dst, os.path.basename(src))
    copyfile(src, dst)
    copystat(src, dst)
shutil.ignore_patterns(*patterns)
shutil.copytree(src, dst, symlinks=False, ignore=None)
Recursively copy a directory tree:
For example: copytree(source, destination, ignore=ignore_patterns('*.pyc', 'tmp*'))
def ignore_patterns(*patterns):
    """Function that can be used as copytree() ignore parameter.

    Patterns is a sequence of glob-style patterns
    that are used to exclude files"""
    def _ignore_patterns(path, names):
        ignored_names = []
        for pattern in patterns:
            ignored_names.extend(fnmatch.filter(names, pattern))
        return set(ignored_names)
    return _ignore_patterns

def copytree(src, dst, symlinks=False, ignore=None):
    """Recursively copy a directory tree using copy2().

    The destination directory must not already exist.
    If exception(s) occur, an Error is raised with a list of reasons.

    If the optional symlinks flag is true, symbolic links in the
    source tree result in symbolic links in the destination tree; if
    it is false, the contents of the files pointed to by symbolic
    links are copied.

    The optional ignore argument is a callable. If given, it
    is called with the `src` parameter, which is the directory
    being visited by copytree(), and `names` which is the list of
    `src` contents, as returned by os.listdir():

        callable(src, names) -> ignored_names

    Since copytree() is called recursively, the callable will be
    called once for each directory that is copied. It returns a
    list of names relative to the `src` directory that should
    not be copied.

    XXX Consider this example code rather than the ultimate tool.
    """
    names = os.listdir(src)
    if ignore is not None:
        ignored_names = ignore(src, names)
    else:
        ignored_names = set()

    os.makedirs(dst)
    errors = []
    for name in names:
        if name in ignored_names:
            continue
        srcname = os.path.join(src, name)
        dstname = os.path.join(dst, name)
        try:
            if symlinks and os.path.islink(srcname):
                linkto = os.readlink(srcname)
                os.symlink(linkto, dstname)
            elif os.path.isdir(srcname):
                copytree(srcname, dstname, symlinks, ignore)
            else:
                # Will raise a SpecialFileError for unsupported file types
                copy2(srcname, dstname)
        # catch the Error from the recursive copytree so that we can
        # continue with other files
        except Error, err:
            errors.extend(err.args[0])
        except EnvironmentError, why:
            errors.append((srcname, dstname, str(why)))
    try:
        copystat(src, dst)
    except OSError, why:
        if WindowsError is not None and isinstance(why, WindowsError):
            # Copying file access times may fail on Windows
            pass
        else:
            errors.append((src, dst, str(why)))
    if errors:
        raise Error, errors
shutil.rmtree(path[, ignore_errors[, onerror]])
Recursively delete a directory tree:
def rmtree(path, ignore_errors=False, onerror=None):
    """Recursively delete a directory tree.

    If ignore_errors is set, errors are ignored; otherwise, if onerror
    is set, it is called to handle the error with arguments (func,
    path, exc_info) where func is os.listdir, os.remove, or os.rmdir;
    path is the argument to that function that caused it to fail; and
    exc_info is a tuple returned by sys.exc_info().  If ignore_errors
    is false and onerror is None, an exception is raised.
    """
    if ignore_errors:
        def onerror(*args):
            pass
    elif onerror is None:
        def onerror(*args):
            raise
    try:
        if os.path.islink(path):
            # symlinks to directories are forbidden, see bug #1669
            raise OSError("Cannot call rmtree on a symbolic link")
    except OSError:
        onerror(os.path.islink, path, sys.exc_info())
        # can't continue even if onerror hook returns
        return
    names = []
    try:
        names = os.listdir(path)
    except os.error, err:
        onerror(os.listdir, path, sys.exc_info())
    for name in names:
        fullname = os.path.join(path, name)
        try:
            mode = os.lstat(fullname).st_mode
        except os.error:
            mode = 0
        if stat.S_ISDIR(mode):
            rmtree(fullname, ignore_errors, onerror)
        else:
            try:
                os.remove(fullname)
            except os.error, err:
                onerror(os.remove, fullname, sys.exc_info())
    try:
        os.rmdir(path)
    except os.error:
        onerror(os.rmdir, path, sys.exc_info())
shutil.move(src, dst)
Recursively move a file or directory:
def move(src, dst):
    """Recursively move a file or directory to another location. This is
    similar to the Unix "mv" command.

    If the destination is a directory or a symlink to a directory, the source
    is moved inside the directory. The destination path must not already
    exist.

    If the destination already exists but is not a directory, it may be
    overwritten depending on os.rename() semantics.

    If the destination is on our current filesystem, then rename() is used.
    Otherwise, src is copied to the destination and then removed.
    A lot more could be done here...  A look at a mv.c shows a lot of
    the issues this implementation glosses over.
    """
    real_dst = dst
    if os.path.isdir(dst):
        if _samefile(src, dst):
            # We might be on a case insensitive filesystem,
            # perform the rename anyway.
            os.rename(src, dst)
            return

        real_dst = os.path.join(dst, _basename(src))
        if os.path.exists(real_dst):
            raise Error, "Destination path '%s' already exists" % real_dst
    try:
        os.rename(src, real_dst)
    except OSError:
        if os.path.isdir(src):
            if _destinsrc(src, dst):
                raise Error, "Cannot move a directory '%s' into itself '%s'." % (src, dst)
            copytree(src, real_dst, symlinks=True)
            rmtree(src)
        else:
            copy2(src, real_dst)
            os.unlink(src)
shutil.make_archive(base_name, format,...)
Create an archive (e.g. zip or tar) and return its file path:
# Archive the files under /Users/wupeiqi/Downloads/test into the current program directory
import shutil
ret = shutil.make_archive("wwwwwwwwww", 'gztar', root_dir='/Users/wupeiqi/Downloads/test')

# Archive the files under /Users/wupeiqi/Downloads/test into /Users/wupeiqi/
import shutil
ret = shutil.make_archive("/Users/wupeiqi/wwwwwwwwww", 'gztar', root_dir='/Users/wupeiqi/Downloads/test')
def make_archive(base_name, format, root_dir=None, base_dir=None, verbose=0,
                 dry_run=0, owner=None, group=None, logger=None):
    """Create an archive file (eg. zip or tar).

    'base_name' is the name of the file to create, minus any format-specific
    extension; 'format' is the archive format: one of "zip", "tar", "bztar"
    or "gztar".

    'root_dir' is a directory that will be the root directory of the
    archive; ie. we typically chdir into 'root_dir' before creating the
    archive.  'base_dir' is the directory where we start archiving from;
    ie. 'base_dir' will be the common prefix of all files and
    directories in the archive.  'root_dir' and 'base_dir' both default
    to the current directory.  Returns the name of the archive file.

    'owner' and 'group' are used when creating a tar archive. By default,
    uses the current owner and group.
    """
    save_cwd = os.getcwd()
    if root_dir is not None:
        if logger is not None:
            logger.debug("changing into '%s'", root_dir)
        base_name = os.path.abspath(base_name)
        if not dry_run:
            os.chdir(root_dir)

    if base_dir is None:
        base_dir = os.curdir

    kwargs = {'dry_run': dry_run, 'logger': logger}

    try:
        format_info = _ARCHIVE_FORMATS[format]
    except KeyError:
        raise ValueError, "unknown archive format '%s'" % format

    func = format_info[0]
    for arg, val in format_info[1]:
        kwargs[arg] = val

    if format != 'zip':
        kwargs['owner'] = owner
        kwargs['group'] = group

    try:
        filename = func(base_name, base_dir, **kwargs)
    finally:
        if root_dir is not None:
            if logger is not None:
                logger.debug("changing back to '%s'", save_cwd)
            os.chdir(save_cwd)

    return filename
shutil implements its archive handling on top of the ZipFile and TarFile modules. In detail:
import zipfile

# Compress
z = zipfile.ZipFile('laxi.zip', 'w')
z.write('a.log')
z.write('data.data')
z.close()

# Extract
z = zipfile.ZipFile('laxi.zip', 'r')
z.extractall()
z.close()
import tarfile

# Compress
tar = tarfile.open('your.tar', 'w')
tar.add('/Users/wupeiqi/PycharmProjects/bbs2.zip', arcname='bbs2.zip')
tar.add('/Users/wupeiqi/PycharmProjects/cmdb.zip', arcname='cmdb.zip')
tar.close()

# Extract
tar = tarfile.open('your.tar', 'r')
tar.extractall()   # an extraction path can be passed
tar.close()
class ZipFile(object): """ Class with methods to open, read, write, close, list zip files. z = ZipFile(file, mode="r", compression=ZIP_STORED, allowZip64=False) file: Either the path to the file, or a file-like object. If it is a path, the file will be opened and closed by ZipFile. mode: The mode can be either read "r", write "w" or append "a". compression: ZIP_STORED (no compression) or ZIP_DEFLATED (requires zlib). allowZip64: if True ZipFile will create files with ZIP64 extensions when needed, otherwise it will raise an exception when this would be necessary. """ fp = None # Set here since __del__ checks it def __init__(self, file, mode="r", compression=ZIP_STORED, allowZip64=False): """Open the ZIP file with mode read "r", write "w" or append "a".""" if mode not in ("r", "w", "a"): raise RuntimeError('ZipFile() requires mode "r", "w", or "a"') if compression == ZIP_STORED: pass elif compression == ZIP_DEFLATED: if not zlib: raise RuntimeError,\ "Compression requires the (missing) zlib module" else: raise RuntimeError, "That compression method is not supported" self._allowZip64 = allowZip64 self._didModify = False self.debug = 0 # Level of printing: 0 through 3 self.NameToInfo = {} # Find file info given name self.filelist = [] # List of ZipInfo instances for archive self.compression = compression # Method of compression self.mode = key = mode.replace('b', '')[0] self.pwd = None self._comment = '' # Check if we were passed a file-like object if isinstance(file, basestring): self._filePassed = 0 self.filename = file modeDict = {'r' : 'rb', 'w': 'wb', 'a' : 'r+b'} try: self.fp = open(file, modeDict[mode]) except IOError: if mode == 'a': mode = key = 'w' self.fp = open(file, modeDict[mode]) else: raise else: self._filePassed = 1 self.fp = file self.filename = getattr(file, 'name', None) try: if key == 'r': self._RealGetContents() elif key == 'w': # set the modified flag so central directory gets written # even if no files are added to the archive self._didModify = True elif key == 'a': try: # See if file is a zip file self._RealGetContents() # seek to start of directory and overwrite self.fp.seek(self.start_dir, 0) except BadZipfile: # file is not a zip file, just append self.fp.seek(0, 2) # set the modified flag so central directory gets written # even if no files are added to the archive self._didModify = True else: raise RuntimeError('Mode must be "r", "w" or "a"') except: fp = self.fp self.fp = None if not self._filePassed: fp.close() raise def __enter__(self): return self def __exit__(self, type, value, traceback): self.close() def _RealGetContents(self): """Read in the table of contents for the ZIP file.""" fp = self.fp try: endrec = _EndRecData(fp) except IOError: raise BadZipfile("File is not a zip file") if not endrec: raise BadZipfile, "File is not a zip file" if self.debug > 1: print endrec size_cd = endrec[_ECD_SIZE] # bytes in central directory offset_cd = endrec[_ECD_OFFSET] # offset of central directory self._comment = endrec[_ECD_COMMENT] # archive comment # "concat" is zero, unless zip was concatenated to another file concat = endrec[_ECD_LOCATION] - size_cd - offset_cd if endrec[_ECD_SIGNATURE] == stringEndArchive64: # If Zip64 extension structures are present, account for them concat -= (sizeEndCentDir64 + sizeEndCentDir64Locator) if self.debug > 2: inferred = concat + offset_cd print "given, inferred, offset", offset_cd, inferred, concat # self.start_dir: Position of start of central directory self.start_dir = offset_cd + concat fp.seek(self.start_dir, 0) data = 
fp.read(size_cd) fp = cStringIO.StringIO(data) total = 0 while total < size_cd: centdir = fp.read(sizeCentralDir) if len(centdir) != sizeCentralDir: raise BadZipfile("Truncated central directory") centdir = struct.unpack(structCentralDir, centdir) if centdir[_CD_SIGNATURE] != stringCentralDir: raise BadZipfile("Bad magic number for central directory") if self.debug > 2: print centdir filename = fp.read(centdir[_CD_FILENAME_LENGTH]) # Create ZipInfo instance to store file information x = ZipInfo(filename) x.extra = fp.read(centdir[_CD_EXTRA_FIELD_LENGTH]) x.comment = fp.read(centdir[_CD_COMMENT_LENGTH]) x.header_offset = centdir[_CD_LOCAL_HEADER_OFFSET] (x.create_version, x.create_system, x.extract_version, x.reserved, x.flag_bits, x.compress_type, t, d, x.CRC, x.compress_size, x.file_size) = centdir[1:12] x.volume, x.internal_attr, x.external_attr = centdir[15:18] # Convert date/time code to (year, month, day, hour, min, sec) x._raw_time = t x.date_time = ( (d>>9)+1980, (d>>5)&0xF, d&0x1F, t>>11, (t>>5)&0x3F, (t&0x1F) * 2 ) x._decodeExtra() x.header_offset = x.header_offset + concat x.filename = x._decodeFilename() self.filelist.append(x) self.NameToInfo[x.filename] = x # update total bytes read from central directory total = (total + sizeCentralDir + centdir[_CD_FILENAME_LENGTH] + centdir[_CD_EXTRA_FIELD_LENGTH] + centdir[_CD_COMMENT_LENGTH]) if self.debug > 2: print "total", total def namelist(self): """Return a list of file names in the archive.""" l = [] for data in self.filelist: l.append(data.filename) return l def infolist(self): """Return a list of class ZipInfo instances for files in the archive.""" return self.filelist def printdir(self): """Print a table of contents for the zip file.""" print "%-46s %19s %12s" % ("File Name", "Modified ", "Size") for zinfo in self.filelist: date = "%d-%02d-%02d %02d:%02d:%02d" % zinfo.date_time[:6] print "%-46s %s %12d" % (zinfo.filename, date, zinfo.file_size) def testzip(self): """Read all the files and check the CRC.""" chunk_size = 2 ** 20 for zinfo in self.filelist: try: # Read by chunks, to avoid an OverflowError or a # MemoryError with very large embedded files. 
with self.open(zinfo.filename, "r") as f: while f.read(chunk_size): # Check CRC-32 pass except BadZipfile: return zinfo.filename def getinfo(self, name): """Return the instance of ZipInfo given 'name'.""" info = self.NameToInfo.get(name) if info is None: raise KeyError( 'There is no item named %r in the archive' % name) return info def setpassword(self, pwd): """Set default password for encrypted files.""" self.pwd = pwd @property def comment(self): """The comment text associated with the ZIP file.""" return self._comment @comment.setter def comment(self, comment): # check for valid comment length if len(comment) > ZIP_MAX_COMMENT: import warnings warnings.warn('Archive comment is too long; truncating to %d bytes' % ZIP_MAX_COMMENT, stacklevel=2) comment = comment[:ZIP_MAX_COMMENT] self._comment = comment self._didModify = True def read(self, name, pwd=None): """Return file bytes (as a string) for name.""" return self.open(name, "r", pwd).read() def open(self, name, mode="r", pwd=None): """Return file-like object for 'name'.""" if mode not in ("r", "U", "rU"): raise RuntimeError, 'open() requires mode "r", "U", or "rU"' if not self.fp: raise RuntimeError, \ "Attempt to read ZIP archive that was already closed" # Only open a new file for instances where we were not # given a file object in the constructor if self._filePassed: zef_file = self.fp should_close = False else: zef_file = open(self.filename, 'rb') should_close = True try: # Make sure we have an info object if isinstance(name, ZipInfo): # 'name' is already an info object zinfo = name else: # Get info object for name zinfo = self.getinfo(name) zef_file.seek(zinfo.header_offset, 0) # Skip the file header: fheader = zef_file.read(sizeFileHeader) if len(fheader) != sizeFileHeader: raise BadZipfile("Truncated file header") fheader = struct.unpack(structFileHeader, fheader) if fheader[_FH_SIGNATURE] != stringFileHeader: raise BadZipfile("Bad magic number for file header") fname = zef_file.read(fheader[_FH_FILENAME_LENGTH]) if fheader[_FH_EXTRA_FIELD_LENGTH]: zef_file.read(fheader[_FH_EXTRA_FIELD_LENGTH]) if fname != zinfo.orig_filename: raise BadZipfile, \ 'File name in directory "%s" and header "%s" differ.' % ( zinfo.orig_filename, fname) # check for encrypted flag & handle password is_encrypted = zinfo.flag_bits & 0x1 zd = None if is_encrypted: if not pwd: pwd = self.pwd if not pwd: raise RuntimeError, "File %s is encrypted, " \ "password required for extraction" % name zd = _ZipDecrypter(pwd) # The first 12 bytes in the cypher stream is an encryption header # used to strengthen the algorithm. The first 11 bytes are # completely random, while the 12th contains the MSB of the CRC, # or the MSB of the file time depending on the header type # and is used to check the correctness of the password. bytes = zef_file.read(12) h = map(zd, bytes[0:12]) if zinfo.flag_bits & 0x8: # compare against the file type from extended local headers check_byte = (zinfo._raw_time >> 8) & 0xff else: # compare against the CRC otherwise check_byte = (zinfo.CRC >> 24) & 0xff if ord(h[11]) != check_byte: raise RuntimeError("Bad password for file", name) return ZipExtFile(zef_file, mode, zinfo, zd, close_fileobj=should_close) except: if should_close: zef_file.close() raise def extract(self, member, path=None, pwd=None): """Extract a member from the archive to the current working directory, using its full name. Its file information is extracted as accurately as possible. `member' may be a filename or a ZipInfo object. 
You can specify a different directory using `path'. """ if not isinstance(member, ZipInfo): member = self.getinfo(member) if path is None: path = os.getcwd() return self._extract_member(member, path, pwd) def extractall(self, path=None, members=None, pwd=None): """Extract all members from the archive to the current working directory. `path' specifies a different directory to extract to. `members' is optional and must be a subset of the list returned by namelist(). """ if members is None: members = self.namelist() for zipinfo in members: self.extract(zipinfo, path, pwd) def _extract_member(self, member, targetpath, pwd): """Extract the ZipInfo object 'member' to a physical file on the path targetpath. """ # build the destination pathname, replacing # forward slashes to platform specific separators. arcname = member.filename.replace('/', os.path.sep) if os.path.altsep: arcname = arcname.replace(os.path.altsep, os.path.sep) # interpret absolute pathname as relative, remove drive letter or # UNC path, redundant separators, "." and ".." components. arcname = os.path.splitdrive(arcname)[1] arcname = os.path.sep.join(x for x in arcname.split(os.path.sep) if x not in ('', os.path.curdir, os.path.pardir)) if os.path.sep == '\\': # filter illegal characters on Windows illegal = ':<>|"?*' if isinstance(arcname, unicode): table = {ord(c): ord('_') for c in illegal} else: table = string.maketrans(illegal, '_' * len(illegal)) arcname = arcname.translate(table) # remove trailing dots arcname = (x.rstrip('.') for x in arcname.split(os.path.sep)) arcname = os.path.sep.join(x for x in arcname if x) targetpath = os.path.join(targetpath, arcname) targetpath = os.path.normpath(targetpath) # Create all upper directories if necessary. upperdirs = os.path.dirname(targetpath) if upperdirs and not os.path.exists(upperdirs): os.makedirs(upperdirs) if member.filename[-1] == '/': if not os.path.isdir(targetpath): os.mkdir(targetpath) return targetpath with self.open(member, pwd=pwd) as source, \ file(targetpath, "wb") as target: shutil.copyfileobj(source, target) return targetpath def _writecheck(self, zinfo): """Check for errors before writing a file to the archive.""" if zinfo.filename in self.NameToInfo: import warnings warnings.warn('Duplicate name: %r' % zinfo.filename, stacklevel=3) if self.mode not in ("w", "a"): raise RuntimeError, 'write() requires mode "w" or "a"' if not self.fp: raise RuntimeError, \ "Attempt to write ZIP archive that was already closed" if zinfo.compress_type == ZIP_DEFLATED and not zlib: raise RuntimeError, \ "Compression requires the (missing) zlib module" if zinfo.compress_type not in (ZIP_STORED, ZIP_DEFLATED): raise RuntimeError, \ "That compression method is not supported" if not self._allowZip64: requires_zip64 = None if len(self.filelist) >= ZIP_FILECOUNT_LIMIT: requires_zip64 = "Files count" elif zinfo.file_size > ZIP64_LIMIT: requires_zip64 = "Filesize" elif zinfo.header_offset > ZIP64_LIMIT: requires_zip64 = "Zipfile size" if requires_zip64: raise LargeZipFile(requires_zip64 + " would require ZIP64 extensions") def write(self, filename, arcname=None, compress_type=None): """Put the bytes from filename into the archive under the name arcname.""" if not self.fp: raise RuntimeError( "Attempt to write to ZIP archive that was already closed") st = os.stat(filename) isdir = stat.S_ISDIR(st.st_mode) mtime = time.localtime(st.st_mtime) date_time = mtime[0:6] # Create ZipInfo instance to store file information if arcname is None: arcname = filename arcname = 
os.path.normpath(os.path.splitdrive(arcname)[1]) while arcname[0] in (os.sep, os.altsep): arcname = arcname[1:] if isdir: arcname += '/' zinfo = ZipInfo(arcname, date_time) zinfo.external_attr = (st[0] & 0xFFFF) << 16L # Unix attributes if compress_type is None: zinfo.compress_type = self.compression else: zinfo.compress_type = compress_type zinfo.file_size = st.st_size zinfo.flag_bits = 0x00 zinfo.header_offset = self.fp.tell() # Start of header bytes self._writecheck(zinfo) self._didModify = True if isdir: zinfo.file_size = 0 zinfo.compress_size = 0 zinfo.CRC = 0 zinfo.external_attr |= 0x10 # MS-DOS directory flag self.filelist.append(zinfo) self.NameToInfo[zinfo.filename] = zinfo self.fp.write(zinfo.FileHeader(False)) return with open(filename, "rb") as fp: # Must overwrite CRC and sizes with correct data later zinfo.CRC = CRC = 0 zinfo.compress_size = compress_size = 0 # Compressed size can be larger than uncompressed size zip64 = self._allowZip64 and \ zinfo.file_size * 1.05 > ZIP64_LIMIT self.fp.write(zinfo.FileHeader(zip64)) if zinfo.compress_type == ZIP_DEFLATED: cmpr = zlib.compressobj(zlib.Z_DEFAULT_COMPRESSION, zlib.DEFLATED, -15) else: cmpr = None file_size = 0 while 1: buf = fp.read(1024 * 8) if not buf: break file_size = file_size + len(buf) CRC = crc32(buf, CRC) & 0xffffffff if cmpr: buf = cmpr.compress(buf) compress_size = compress_size + len(buf) self.fp.write(buf) if cmpr: buf = cmpr.flush() compress_size = compress_size + len(buf) self.fp.write(buf) zinfo.compress_size = compress_size else: zinfo.compress_size = file_size zinfo.CRC = CRC zinfo.file_size = file_size if not zip64 and self._allowZip64: if file_size > ZIP64_LIMIT: raise RuntimeError('File size has increased during compressing') if compress_size > ZIP64_LIMIT: raise RuntimeError('Compressed size larger than uncompressed size') # Seek backwards and write file header (which will now include # correct CRC and file sizes) position = self.fp.tell() # Preserve current position in file self.fp.seek(zinfo.header_offset, 0) self.fp.write(zinfo.FileHeader(zip64)) self.fp.seek(position, 0) self.filelist.append(zinfo) self.NameToInfo[zinfo.filename] = zinfo def writestr(self, zinfo_or_arcname, bytes, compress_type=None): """Write a file into the archive. The contents is the string 'bytes'. 
'zinfo_or_arcname' is either a ZipInfo instance or the name of the file in the archive.""" if not isinstance(zinfo_or_arcname, ZipInfo): zinfo = ZipInfo(filename=zinfo_or_arcname, date_time=time.localtime(time.time())[:6]) zinfo.compress_type = self.compression if zinfo.filename[-1] == '/': zinfo.external_attr = 0o40775 << 16 # drwxrwxr-x zinfo.external_attr |= 0x10 # MS-DOS directory flag else: zinfo.external_attr = 0o600 << 16 # ?rw------- else: zinfo = zinfo_or_arcname if not self.fp: raise RuntimeError( "Attempt to write to ZIP archive that was already closed") if compress_type is not None: zinfo.compress_type = compress_type zinfo.file_size = len(bytes) # Uncompressed size zinfo.header_offset = self.fp.tell() # Start of header bytes self._writecheck(zinfo) self._didModify = True zinfo.CRC = crc32(bytes) & 0xffffffff # CRC-32 checksum if zinfo.compress_type == ZIP_DEFLATED: co = zlib.compressobj(zlib.Z_DEFAULT_COMPRESSION, zlib.DEFLATED, -15) bytes = co.compress(bytes) + co.flush() zinfo.compress_size = len(bytes) # Compressed size else: zinfo.compress_size = zinfo.file_size zip64 = zinfo.file_size > ZIP64_LIMIT or \ zinfo.compress_size > ZIP64_LIMIT if zip64 and not self._allowZip64: raise LargeZipFile("Filesize would require ZIP64 extensions") self.fp.write(zinfo.FileHeader(zip64)) self.fp.write(bytes) if zinfo.flag_bits & 0x08: # Write CRC and file sizes after the file data fmt = '<LQQ' if zip64 else '<LLL' self.fp.write(struct.pack(fmt, zinfo.CRC, zinfo.compress_size, zinfo.file_size)) self.fp.flush() self.filelist.append(zinfo) self.NameToInfo[zinfo.filename] = zinfo def __del__(self): """Call the "close()" method in case the user forgot.""" self.close() def close(self): """Close the file, and for mode "w" and "a" write the ending records.""" if self.fp is None: return try: if self.mode in ("w", "a") and self._didModify: # write ending records pos1 = self.fp.tell() for zinfo in self.filelist: # write central directory dt = zinfo.date_time dosdate = (dt[0] - 1980) << 9 | dt[1] << 5 | dt[2] dostime = dt[3] << 11 | dt[4] << 5 | (dt[5] // 2) extra = [] if zinfo.file_size > ZIP64_LIMIT \ or zinfo.compress_size > ZIP64_LIMIT: extra.append(zinfo.file_size) extra.append(zinfo.compress_size) file_size = 0xffffffff compress_size = 0xffffffff else: file_size = zinfo.file_size compress_size = zinfo.compress_size if zinfo.header_offset > ZIP64_LIMIT: extra.append(zinfo.header_offset) header_offset = 0xffffffffL else: header_offset = zinfo.header_offset extra_data = zinfo.extra if extra: # Append a ZIP64 field to the extra's extra_data = struct.pack( '<HH' + 'Q'*len(extra), 1, 8*len(extra), *extra) + extra_data extract_version = max(45, zinfo.extract_version) create_version = max(45, zinfo.create_version) else: extract_version = zinfo.extract_version create_version = zinfo.create_version try: filename, flag_bits = zinfo._encodeFilenameFlags() centdir = struct.pack(structCentralDir, stringCentralDir, create_version, zinfo.create_system, extract_version, zinfo.reserved, flag_bits, zinfo.compress_type, dostime, dosdate, zinfo.CRC, compress_size, file_size, len(filename), len(extra_data), len(zinfo.comment), 0, zinfo.internal_attr, zinfo.external_attr, header_offset) except DeprecationWarning: print >>sys.stderr, (structCentralDir, stringCentralDir, create_version, zinfo.create_system, extract_version, zinfo.reserved, zinfo.flag_bits, zinfo.compress_type, dostime, dosdate, zinfo.CRC, compress_size, file_size, len(zinfo.filename), len(extra_data), len(zinfo.comment), 0, zinfo.internal_attr, 
zinfo.external_attr, header_offset) raise self.fp.write(centdir) self.fp.write(filename) self.fp.write(extra_data) self.fp.write(zinfo.comment) pos2 = self.fp.tell() # Write end-of-zip-archive record centDirCount = len(self.filelist) centDirSize = pos2 - pos1 centDirOffset = pos1 requires_zip64 = None if centDirCount > ZIP_FILECOUNT_LIMIT: requires_zip64 = "Files count" elif centDirOffset > ZIP64_LIMIT: requires_zip64 = "Central directory offset" elif centDirSize > ZIP64_LIMIT: requires_zip64 = "Central directory size" if requires_zip64: # Need to write the ZIP64 end-of-archive records if not self._allowZip64: raise LargeZipFile(requires_zip64 + " would require ZIP64 extensions") zip64endrec = struct.pack( structEndArchive64, stringEndArchive64, 44, 45, 45, 0, 0, centDirCount, centDirCount, centDirSize, centDirOffset) self.fp.write(zip64endrec) zip64locrec = struct.pack( structEndArchive64Locator, stringEndArchive64Locator, 0, pos2, 1) self.fp.write(zip64locrec) centDirCount = min(centDirCount, 0xFFFF) centDirSize = min(centDirSize, 0xFFFFFFFF) centDirOffset = min(centDirOffset, 0xFFFFFFFF) endrec = struct.pack(structEndArchive, stringEndArchive, 0, 0, centDirCount, centDirCount, centDirSize, centDirOffset, len(self._comment)) self.fp.write(endrec) self.fp.write(self._comment) self.fp.flush() finally: fp = self.fp self.fp = None if not self._filePassed: fp.close()
class TarFile(object): """The TarFile Class provides an interface to tar archives. """ debug = 0 # May be set from 0 (no msgs) to 3 (all msgs) dereference = False # If true, add content of linked file to the # tar file, else the link. ignore_zeros = False # If true, skips empty or invalid blocks and # continues processing. errorlevel = 1 # If 0, fatal errors only appear in debug # messages (if debug >= 0). If > 0, errors # are passed to the caller as exceptions. format = DEFAULT_FORMAT # The format to use when creating an archive. encoding = ENCODING # Encoding for 8-bit character strings. errors = None # Error handler for unicode conversion. tarinfo = TarInfo # The default TarInfo class to use. fileobject = ExFileObject # The default ExFileObject class to use. def __init__(self, name=None, mode="r", fileobj=None, format=None, tarinfo=None, dereference=None, ignore_zeros=None, encoding=None, errors=None, pax_headers=None, debug=None, errorlevel=None): """Open an (uncompressed) tar archive `name'. `mode' is either 'r' to read from an existing archive, 'a' to append data to an existing file or 'w' to create a new file overwriting an existing one. `mode' defaults to 'r'. If `fileobj' is given, it is used for reading or writing data. If it can be determined, `mode' is overridden by `fileobj's mode. `fileobj' is not closed, when TarFile is closed. """ modes = {"r": "rb", "a": "r+b", "w": "wb"} if mode not in modes: raise ValueError("mode must be 'r', 'a' or 'w'") self.mode = mode self._mode = modes[mode] if not fileobj: if self.mode == "a" and not os.path.exists(name): # Create nonexistent files in append mode. self.mode = "w" self._mode = "wb" fileobj = bltn_open(name, self._mode) self._extfileobj = False else: if name is None and hasattr(fileobj, "name"): name = fileobj.name if hasattr(fileobj, "mode"): self._mode = fileobj.mode self._extfileobj = True self.name = os.path.abspath(name) if name else None self.fileobj = fileobj # Init attributes. if format is not None: self.format = format if tarinfo is not None: self.tarinfo = tarinfo if dereference is not None: self.dereference = dereference if ignore_zeros is not None: self.ignore_zeros = ignore_zeros if encoding is not None: self.encoding = encoding if errors is not None: self.errors = errors elif mode == "r": self.errors = "utf-8" else: self.errors = "strict" if pax_headers is not None and self.format == PAX_FORMAT: self.pax_headers = pax_headers else: self.pax_headers = {} if debug is not None: self.debug = debug if errorlevel is not None: self.errorlevel = errorlevel # Init datastructures. self.closed = False self.members = [] # list of members as TarInfo objects self._loaded = False # flag if all members have been read self.offset = self.fileobj.tell() # current position in the archive file self.inodes = {} # dictionary caching the inodes of # archive members already added try: if self.mode == "r": self.firstmember = None self.firstmember = self.next() if self.mode == "a": # Move to the end of the archive, # before the first empty block. 
while True: self.fileobj.seek(self.offset) try: tarinfo = self.tarinfo.fromtarfile(self) self.members.append(tarinfo) except EOFHeaderError: self.fileobj.seek(self.offset) break except HeaderError, e: raise ReadError(str(e)) if self.mode in "aw": self._loaded = True if self.pax_headers: buf = self.tarinfo.create_pax_global_header(self.pax_headers.copy()) self.fileobj.write(buf) self.offset += len(buf) except: if not self._extfileobj: self.fileobj.close() self.closed = True raise def _getposix(self): return self.format == USTAR_FORMAT def _setposix(self, value): import warnings warnings.warn("use the format attribute instead", DeprecationWarning, 2) if value: self.format = USTAR_FORMAT else: self.format = GNU_FORMAT posix = property(_getposix, _setposix) #-------------------------------------------------------------------------- # Below are the classmethods which act as alternate constructors to the # TarFile class. The open() method is the only one that is needed for # public use; it is the "super"-constructor and is able to select an # adequate "sub"-constructor for a particular compression using the mapping # from OPEN_METH. # # This concept allows one to subclass TarFile without losing the comfort of # the super-constructor. A sub-constructor is registered and made available # by adding it to the mapping in OPEN_METH. @classmethod def open(cls, name=None, mode="r", fileobj=None, bufsize=RECORDSIZE, **kwargs): """Open a tar archive for reading, writing or appending. Return an appropriate TarFile class. mode: 'r' or 'r:*' open for reading with transparent compression 'r:' open for reading exclusively uncompressed 'r:gz' open for reading with gzip compression 'r:bz2' open for reading with bzip2 compression 'a' or 'a:' open for appending, creating the file if necessary 'w' or 'w:' open for writing without compression 'w:gz' open for writing with gzip compression 'w:bz2' open for writing with bzip2 compression 'r|*' open a stream of tar blocks with transparent compression 'r|' open an uncompressed stream of tar blocks for reading 'r|gz' open a gzip compressed stream of tar blocks 'r|bz2' open a bzip2 compressed stream of tar blocks 'w|' open an uncompressed stream for writing 'w|gz' open a gzip compressed stream for writing 'w|bz2' open a bzip2 compressed stream for writing """ if not name and not fileobj: raise ValueError("nothing to open") if mode in ("r", "r:*"): # Find out which *open() is appropriate for opening the file. for comptype in cls.OPEN_METH: func = getattr(cls, cls.OPEN_METH[comptype]) if fileobj is not None: saved_pos = fileobj.tell() try: return func(name, "r", fileobj, **kwargs) except (ReadError, CompressionError), e: if fileobj is not None: fileobj.seek(saved_pos) continue raise ReadError("file could not be opened successfully") elif ":" in mode: filemode, comptype = mode.split(":", 1) filemode = filemode or "r" comptype = comptype or "tar" # Select the *open() function according to # given compression. 
if comptype in cls.OPEN_METH: func = getattr(cls, cls.OPEN_METH[comptype]) else: raise CompressionError("unknown compression type %r" % comptype) return func(name, filemode, fileobj, **kwargs) elif "|" in mode: filemode, comptype = mode.split("|", 1) filemode = filemode or "r" comptype = comptype or "tar" if filemode not in ("r", "w"): raise ValueError("mode must be 'r' or 'w'") stream = _Stream(name, filemode, comptype, fileobj, bufsize) try: t = cls(name, filemode, stream, **kwargs) except: stream.close() raise t._extfileobj = False return t elif mode in ("a", "w"): return cls.taropen(name, mode, fileobj, **kwargs) raise ValueError("undiscernible mode") @classmethod def taropen(cls, name, mode="r", fileobj=None, **kwargs): """Open uncompressed tar archive name for reading or writing. """ if mode not in ("r", "a", "w"): raise ValueError("mode must be 'r', 'a' or 'w'") return cls(name, mode, fileobj, **kwargs) @classmethod def gzopen(cls, name, mode="r", fileobj=None, compresslevel=9, **kwargs): """Open gzip compressed tar archive name for reading or writing. Appending is not allowed. """ if mode not in ("r", "w"): raise ValueError("mode must be 'r' or 'w'") try: import gzip gzip.GzipFile except (ImportError, AttributeError): raise CompressionError("gzip module is not available") try: fileobj = gzip.GzipFile(name, mode, compresslevel, fileobj) except OSError: if fileobj is not None and mode == 'r': raise ReadError("not a gzip file") raise try: t = cls.taropen(name, mode, fileobj, **kwargs) except IOError: fileobj.close() if mode == 'r': raise ReadError("not a gzip file") raise except: fileobj.close() raise t._extfileobj = False return t @classmethod def bz2open(cls, name, mode="r", fileobj=None, compresslevel=9, **kwargs): """Open bzip2 compressed tar archive name for reading or writing. Appending is not allowed. """ if mode not in ("r", "w"): raise ValueError("mode must be 'r' or 'w'.") try: import bz2 except ImportError: raise CompressionError("bz2 module is not available") if fileobj is not None: fileobj = _BZ2Proxy(fileobj, mode) else: fileobj = bz2.BZ2File(name, mode, compresslevel=compresslevel) try: t = cls.taropen(name, mode, fileobj, **kwargs) except (IOError, EOFError): fileobj.close() if mode == 'r': raise ReadError("not a bzip2 file") raise except: fileobj.close() raise t._extfileobj = False return t # All *open() methods are registered here. OPEN_METH = { "tar": "taropen", # uncompressed tar "gz": "gzopen", # gzip compressed tar "bz2": "bz2open" # bzip2 compressed tar } #-------------------------------------------------------------------------- # The public methods which TarFile provides: def close(self): """Close the TarFile. In write-mode, two finishing zero blocks are appended to the archive. """ if self.closed: return if self.mode in "aw": self.fileobj.write(NUL * (BLOCKSIZE * 2)) self.offset += (BLOCKSIZE * 2) # fill up the end with zero-blocks # (like option -b20 for tar does) blocks, remainder = divmod(self.offset, RECORDSIZE) if remainder > 0: self.fileobj.write(NUL * (RECORDSIZE - remainder)) if not self._extfileobj: self.fileobj.close() self.closed = True def getmember(self, name): """Return a TarInfo object for member `name'. If `name' can not be found in the archive, KeyError is raised. If a member occurs more than once in the archive, its last occurrence is assumed to be the most up-to-date version. 
""" tarinfo = self._getmember(name) if tarinfo is None: raise KeyError("filename %r not found" % name) return tarinfo def getmembers(self): """Return the members of the archive as a list of TarInfo objects. The list has the same order as the members in the archive. """ self._check() if not self._loaded: # if we want to obtain a list of self._load() # all members, we first have to # scan the whole archive. return self.members def getnames(self): """Return the members of the archive as a list of their names. It has the same order as the list returned by getmembers(). """ return [tarinfo.name for tarinfo in self.getmembers()] def gettarinfo(self, name=None, arcname=None, fileobj=None): """Create a TarInfo object for either the file `name' or the file object `fileobj' (using os.fstat on its file descriptor). You can modify some of the TarInfo's attributes before you add it using addfile(). If given, `arcname' specifies an alternative name for the file in the archive. """ self._check("aw") # When fileobj is given, replace name by # fileobj's real name. if fileobj is not None: name = fileobj.name # Building the name of the member in the archive. # Backward slashes are converted to forward slashes, # Absolute paths are turned to relative paths. if arcname is None: arcname = name drv, arcname = os.path.splitdrive(arcname) arcname = arcname.replace(os.sep, "/") arcname = arcname.lstrip("/") # Now, fill the TarInfo object with # information specific for the file. tarinfo = self.tarinfo() tarinfo.tarfile = self # Use os.stat or os.lstat, depending on platform # and if symlinks shall be resolved. if fileobj is None: if hasattr(os, "lstat") and not self.dereference: statres = os.lstat(name) else: statres = os.stat(name) else: statres = os.fstat(fileobj.fileno()) linkname = "" stmd = statres.st_mode if stat.S_ISREG(stmd): inode = (statres.st_ino, statres.st_dev) if not self.dereference and statres.st_nlink > 1 and \ inode in self.inodes and arcname != self.inodes[inode]: # Is it a hardlink to an already # archived file? type = LNKTYPE linkname = self.inodes[inode] else: # The inode is added only if its valid. # For win32 it is always 0. type = REGTYPE if inode[0]: self.inodes[inode] = arcname elif stat.S_ISDIR(stmd): type = DIRTYPE elif stat.S_ISFIFO(stmd): type = FIFOTYPE elif stat.S_ISLNK(stmd): type = SYMTYPE linkname = os.readlink(name) elif stat.S_ISCHR(stmd): type = CHRTYPE elif stat.S_ISBLK(stmd): type = BLKTYPE else: return None # Fill the TarInfo object with all # information we can get. tarinfo.name = arcname tarinfo.mode = stmd tarinfo.uid = statres.st_uid tarinfo.gid = statres.st_gid if type == REGTYPE: tarinfo.size = statres.st_size else: tarinfo.size = 0L tarinfo.mtime = statres.st_mtime tarinfo.type = type tarinfo.linkname = linkname if pwd: try: tarinfo.uname = pwd.getpwuid(tarinfo.uid)[0] except KeyError: pass if grp: try: tarinfo.gname = grp.getgrgid(tarinfo.gid)[0] except KeyError: pass if type in (CHRTYPE, BLKTYPE): if hasattr(os, "major") and hasattr(os, "minor"): tarinfo.devmajor = os.major(statres.st_rdev) tarinfo.devminor = os.minor(statres.st_rdev) return tarinfo def list(self, verbose=True): """Print a table of contents to sys.stdout. If `verbose' is False, only the names of the members are printed. If it is True, an `ls -l'-like output is produced. 
""" self._check() for tarinfo in self: if verbose: print filemode(tarinfo.mode), print "%s/%s" % (tarinfo.uname or tarinfo.uid, tarinfo.gname or tarinfo.gid), if tarinfo.ischr() or tarinfo.isblk(): print "%10s" % ("%d,%d" \ % (tarinfo.devmajor, tarinfo.devminor)), else: print "%10d" % tarinfo.size, print "%d-%02d-%02d %02d:%02d:%02d" \ % time.localtime(tarinfo.mtime)[:6], print tarinfo.name + ("/" if tarinfo.isdir() else ""), if verbose: if tarinfo.issym(): print "->", tarinfo.linkname, if tarinfo.islnk(): print "link to", tarinfo.linkname, print def add(self, name, arcname=None, recursive=True, exclude=None, filter=None): """Add the file `name' to the archive. `name' may be any type of file (directory, fifo, symbolic link, etc.). If given, `arcname' specifies an alternative name for the file in the archive. Directories are added recursively by default. This can be avoided by setting `recursive' to False. `exclude' is a function that should return True for each filename to be excluded. `filter' is a function that expects a TarInfo object argument and returns the changed TarInfo object, if it returns None the TarInfo object will be excluded from the archive. """ self._check("aw") if arcname is None: arcname = name # Exclude pathnames. if exclude is not None: import warnings warnings.warn("use the filter argument instead", DeprecationWarning, 2) if exclude(name): self._dbg(2, "tarfile: Excluded %r" % name) return # Skip if somebody tries to archive the archive... if self.name is not None and os.path.abspath(name) == self.name: self._dbg(2, "tarfile: Skipped %r" % name) return self._dbg(1, name) # Create a TarInfo object from the file. tarinfo = self.gettarinfo(name, arcname) if tarinfo is None: self._dbg(1, "tarfile: Unsupported type %r" % name) return # Change or exclude the TarInfo object. if filter is not None: tarinfo = filter(tarinfo) if tarinfo is None: self._dbg(2, "tarfile: Excluded %r" % name) return # Append the tar header and data to the archive. if tarinfo.isreg(): with bltn_open(name, "rb") as f: self.addfile(tarinfo, f) elif tarinfo.isdir(): self.addfile(tarinfo) if recursive: for f in os.listdir(name): self.add(os.path.join(name, f), os.path.join(arcname, f), recursive, exclude, filter) else: self.addfile(tarinfo) def addfile(self, tarinfo, fileobj=None): """Add the TarInfo object `tarinfo' to the archive. If `fileobj' is given, tarinfo.size bytes are read from it and added to the archive. You can create TarInfo objects using gettarinfo(). On Windows platforms, `fileobj' should always be opened with mode 'rb' to avoid irritation about the file size. """ self._check("aw") tarinfo = copy.copy(tarinfo) buf = tarinfo.tobuf(self.format, self.encoding, self.errors) self.fileobj.write(buf) self.offset += len(buf) # If there's data to follow, append it. if fileobj is not None: copyfileobj(fileobj, self.fileobj, tarinfo.size) blocks, remainder = divmod(tarinfo.size, BLOCKSIZE) if remainder > 0: self.fileobj.write(NUL * (BLOCKSIZE - remainder)) blocks += 1 self.offset += blocks * BLOCKSIZE self.members.append(tarinfo) def extractall(self, path=".", members=None): """Extract all members from the archive to the current working directory and set owner, modification time and permissions on directories afterwards. `path' specifies a different directory to extract to. `members' is optional and must be a subset of the list returned by getmembers(). """ directories = [] if members is None: members = self for tarinfo in members: if tarinfo.isdir(): # Extract directories with a safe mode. 
directories.append(tarinfo) tarinfo = copy.copy(tarinfo) tarinfo.mode = 0700 self.extract(tarinfo, path) # Reverse sort directories. directories.sort(key=operator.attrgetter('name')) directories.reverse() # Set correct owner, mtime and filemode on directories. for tarinfo in directories: dirpath = os.path.join(path, tarinfo.name) try: self.chown(tarinfo, dirpath) self.utime(tarinfo, dirpath) self.chmod(tarinfo, dirpath) except ExtractError, e: if self.errorlevel > 1: raise else: self._dbg(1, "tarfile: %s" % e) def extract(self, member, path=""): """Extract a member from the archive to the current working directory, using its full name. Its file information is extracted as accurately as possible. `member' may be a filename or a TarInfo object. You can specify a different directory using `path'. """ self._check("r") if isinstance(member, basestring): tarinfo = self.getmember(member) else: tarinfo = member # Prepare the link target for makelink(). if tarinfo.islnk(): tarinfo._link_target = os.path.join(path, tarinfo.linkname) try: self._extract_member(tarinfo, os.path.join(path, tarinfo.name)) except EnvironmentError, e: if self.errorlevel > 0: raise else: if e.filename is None: self._dbg(1, "tarfile: %s" % e.strerror) else: self._dbg(1, "tarfile: %s %r" % (e.strerror, e.filename)) except ExtractError, e: if self.errorlevel > 1: raise else: self._dbg(1, "tarfile: %s" % e) def extractfile(self, member): """Extract a member from the archive as a file object. `member' may be a filename or a TarInfo object. If `member' is a regular file, a file-like object is returned. If `member' is a link, a file-like object is constructed from the link's target. If `member' is none of the above, None is returned. The file-like object is read-only and provides the following methods: read(), readline(), readlines(), seek() and tell() """ self._check("r") if isinstance(member, basestring): tarinfo = self.getmember(member) else: tarinfo = member if tarinfo.isreg(): return self.fileobject(self, tarinfo) elif tarinfo.type not in SUPPORTED_TYPES: # If a member's type is unknown, it is treated as a # regular file. return self.fileobject(self, tarinfo) elif tarinfo.islnk() or tarinfo.issym(): if isinstance(self.fileobj, _Stream): # A small but ugly workaround for the case that someone tries # to extract a (sym)link as a file-object from a non-seekable # stream of tar blocks. raise StreamError("cannot extract (sym)link as file object") else: # A (sym)link's file object is its target's file object. return self.extractfile(self._find_link_target(tarinfo)) else: # If there's no data associated with the member (directory, chrdev, # blkdev, etc.), return None instead of a file object. return None def _extract_member(self, tarinfo, targetpath): """Extract the TarInfo object tarinfo to a physical file called targetpath. """ # Fetch the TarInfo object for the given name # and build the destination pathname, replacing # forward slashes to platform specific separators. targetpath = targetpath.rstrip("/") targetpath = targetpath.replace("/", os.sep) # Create all upper directories. upperdirs = os.path.dirname(targetpath) if upperdirs and not os.path.exists(upperdirs): # Create directories that are not part of the archive with # default permissions. 
os.makedirs(upperdirs) if tarinfo.islnk() or tarinfo.issym(): self._dbg(1, "%s -> %s" % (tarinfo.name, tarinfo.linkname)) else: self._dbg(1, tarinfo.name) if tarinfo.isreg(): self.makefile(tarinfo, targetpath) elif tarinfo.isdir(): self.makedir(tarinfo, targetpath) elif tarinfo.isfifo(): self.makefifo(tarinfo, targetpath) elif tarinfo.ischr() or tarinfo.isblk(): self.makedev(tarinfo, targetpath) elif tarinfo.islnk() or tarinfo.issym(): self.makelink(tarinfo, targetpath) elif tarinfo.type not in SUPPORTED_TYPES: self.makeunknown(tarinfo, targetpath) else: self.makefile(tarinfo, targetpath) self.chown(tarinfo, targetpath) if not tarinfo.issym(): self.chmod(tarinfo, targetpath) self.utime(tarinfo, targetpath) #-------------------------------------------------------------------------- # Below are the different file methods. They are called via # _extract_member() when extract() is called. They can be replaced in a # subclass to implement other functionality. def makedir(self, tarinfo, targetpath): """Make a directory called targetpath. """ try: # Use a safe mode for the directory, the real mode is set # later in _extract_member(). os.mkdir(targetpath, 0700) except EnvironmentError, e: if e.errno != errno.EEXIST: raise def makefile(self, tarinfo, targetpath): """Make a file called targetpath. """ source = self.extractfile(tarinfo) try: with bltn_open(targetpath, "wb") as target: copyfileobj(source, target) finally: source.close() def makeunknown(self, tarinfo, targetpath): """Make a file from a TarInfo object with an unknown type at targetpath. """ self.makefile(tarinfo, targetpath) self._dbg(1, "tarfile: Unknown file type %r, " \ "extracted as regular file." % tarinfo.type) def makefifo(self, tarinfo, targetpath): """Make a fifo called targetpath. """ if hasattr(os, "mkfifo"): os.mkfifo(targetpath) else: raise ExtractError("fifo not supported by system") def makedev(self, tarinfo, targetpath): """Make a character or block device called targetpath. """ if not hasattr(os, "mknod") or not hasattr(os, "makedev"): raise ExtractError("special devices not supported by system") mode = tarinfo.mode if tarinfo.isblk(): mode |= stat.S_IFBLK else: mode |= stat.S_IFCHR os.mknod(targetpath, mode, os.makedev(tarinfo.devmajor, tarinfo.devminor)) def makelink(self, tarinfo, targetpath): """Make a (symbolic) link called targetpath. If it cannot be created (platform limitation), we try to make a copy of the referenced file instead of a link. """ if hasattr(os, "symlink") and hasattr(os, "link"): # For systems that support symbolic and hard links. if tarinfo.issym(): if os.path.lexists(targetpath): os.unlink(targetpath) os.symlink(tarinfo.linkname, targetpath) else: # See extract(). if os.path.exists(tarinfo._link_target): if os.path.lexists(targetpath): os.unlink(targetpath) os.link(tarinfo._link_target, targetpath) else: self._extract_member(self._find_link_target(tarinfo), targetpath) else: try: self._extract_member(self._find_link_target(tarinfo), targetpath) except KeyError: raise ExtractError("unable to resolve link inside archive") def chown(self, tarinfo, targetpath): """Set owner of targetpath according to tarinfo. """ if pwd and hasattr(os, "geteuid") and os.geteuid() == 0: # We have to be root to do so. 
try: g = grp.getgrnam(tarinfo.gname)[2] except KeyError: g = tarinfo.gid try: u = pwd.getpwnam(tarinfo.uname)[2] except KeyError: u = tarinfo.uid try: if tarinfo.issym() and hasattr(os, "lchown"): os.lchown(targetpath, u, g) else: if sys.platform != "os2emx": os.chown(targetpath, u, g) except EnvironmentError, e: raise ExtractError("could not change owner") def chmod(self, tarinfo, targetpath): """Set file permissions of targetpath according to tarinfo. """ if hasattr(os, 'chmod'): try: os.chmod(targetpath, tarinfo.mode) except EnvironmentError, e: raise ExtractError("could not change mode") def utime(self, tarinfo, targetpath): """Set modification time of targetpath according to tarinfo. """ if not hasattr(os, 'utime'): return try: os.utime(targetpath, (tarinfo.mtime, tarinfo.mtime)) except EnvironmentError, e: raise ExtractError("could not change modification time") #-------------------------------------------------------------------------- def next(self): """Return the next member of the archive as a TarInfo object, when TarFile is opened for reading. Return None if there is no more available. """ self._check("ra") if self.firstmember is not None: m = self.firstmember self.firstmember = None return m # Read the next block. self.fileobj.seek(self.offset) tarinfo = None while True: try: tarinfo = self.tarinfo.fromtarfile(self) except EOFHeaderError, e: if self.ignore_zeros: self._dbg(2, "0x%X: %s" % (self.offset, e)) self.offset += BLOCKSIZE continue except InvalidHeaderError, e: if self.ignore_zeros: self._dbg(2, "0x%X: %s" % (self.offset, e)) self.offset += BLOCKSIZE continue elif self.offset == 0: raise ReadError(str(e)) except EmptyHeaderError: if self.offset == 0: raise ReadError("empty file") except TruncatedHeaderError, e: if self.offset == 0: raise ReadError(str(e)) except SubsequentHeaderError, e: raise ReadError(str(e)) break if tarinfo is not None: self.members.append(tarinfo) else: self._loaded = True return tarinfo #-------------------------------------------------------------------------- # Little helper methods: def _getmember(self, name, tarinfo=None, normalize=False): """Find an archive member by name from bottom to top. If tarinfo is given, it is used as the starting point. """ # Ensure that all members have been loaded. members = self.getmembers() # Limit the member search list up to tarinfo. if tarinfo is not None: members = members[:members.index(tarinfo)] if normalize: name = os.path.normpath(name) for member in reversed(members): if normalize: member_name = os.path.normpath(member.name) else: member_name = member.name if name == member_name: return member def _load(self): """Read through the entire archive file and look for readable members. """ while True: tarinfo = self.next() if tarinfo is None: break self._loaded = True def _check(self, mode=None): """Check if TarFile is still open, and if the operation's mode corresponds to TarFile's mode. """ if self.closed: raise IOError("%s is closed" % self.__class__.__name__) if mode is not None and self.mode not in mode: raise IOError("bad operation for mode %r" % self.mode) def _find_link_target(self, tarinfo): """Find the target member of a symlink or hardlink member in the archive. """ if tarinfo.issym(): # Always search the entire archive. linkname = "/".join(filter(None, (os.path.dirname(tarinfo.name), tarinfo.linkname))) limit = None else: # Search the archive before the link, because a hard link is # just a reference to an already archived file. 
linkname = tarinfo.linkname limit = tarinfo member = self._getmember(linkname, tarinfo=limit, normalize=True) if member is None: raise KeyError("linkname %r not found" % linkname) return member def __iter__(self): """Provide an iterator object. """ if self._loaded: return iter(self.members) else: return TarIter(self) def _dbg(self, level, msg): """Write debugging output to sys.stderr. """ if level <= self.debug: print >> sys.stderr, msg def __enter__(self): self._check() return self def __exit__(self, type, value, traceback): if type is None: self.close() else: # An exception occurred. We must not call close() because # it would try to write end-of-archive blocks and padding. if not self._extfileobj: self.fileobj.close() self.closed = True # class TarFile
XML is a protocol for exchanging data between different languages or programs, much like JSON, except that JSON is simpler to use. Back in the dark ages before JSON was born, XML was the only choice, and to this day the interfaces of many systems in traditional industries such as finance are still primarily XML.
XML looks like the following; the data structure is delimited with <> tag nodes:
<?xml version="1.0"?>
<data>
    <country name="Liechtenstein">
        <rank updated="yes">2</rank>
        <year>2008</year>
        <gdppc>141100</gdppc>
        <neighbor name="Austria" direction="E"/>
        <neighbor name="Switzerland" direction="W"/>
    </country>
    <country name="Singapore">
        <rank updated="yes">5</rank>
        <year>2011</year>
        <gdppc>59900</gdppc>
        <neighbor name="Malaysia" direction="N"/>
    </country>
    <country name="Panama">
        <rank updated="yes">69</rank>
        <year>2011</year>
        <gdppc>13600</gdppc>
        <neighbor name="Costa Rica" direction="W"/>
        <neighbor name="Colombia" direction="E"/>
    </country>
</data>
The XML protocol is supported in every major language; in Python you can work with XML using the following module:
import xml.etree.ElementTree as ET

tree = ET.parse("xmltest.xml")
root = tree.getroot()
print(root.tag)

# Walk the whole XML document
for child in root:
    print(child.tag, child.attrib)
    for i in child:
        print(i.tag, i.text)

# Visit only the 'year' nodes
for node in root.iter('year'):
    print(node.tag, node.text)
Modifying and deleting XML document content
import xml.etree.ElementTree as ET

tree = ET.parse("xmltest.xml")
root = tree.getroot()

# Modify: bump every year by one and mark it updated
for node in root.iter('year'):
    new_year = int(node.text) + 1
    node.text = str(new_year)
    node.set("updated", "yes")
tree.write("xmltest.xml")

# Delete nodes: drop every country whose rank is greater than 50
for country in root.findall('country'):
    rank = int(country.find('rank').text)
    if rank > 50:
        root.remove(country)
tree.write('output.xml')
Creating an XML document from scratch
import xml.etree.ElementTree as ET

new_xml = ET.Element("namelist")
name = ET.SubElement(new_xml, "name", attrib={"enrolled": "yes"})
age = ET.SubElement(name, "age", attrib={"checked": "no"})
sex = ET.SubElement(name, "sex")
sex.text = '33'
name2 = ET.SubElement(new_xml, "name", attrib={"enrolled": "no"})
age = ET.SubElement(name2, "age")
age.text = '19'

et = ET.ElementTree(new_xml)    # build the document object
et.write("test.xml", encoding="utf-8", xml_declaration=True)

ET.dump(new_xml)    # print the generated document
configparser is used to generate and modify common configuration files; in Python 3.x the module was renamed from ConfigParser to configparser.
First, a configuration format that plenty of software uses:
[DEFAULT]
ServerAliveInterval = 45
Compression = yes
CompressionLevel = 9
ForwardX11 = yes

[bitbucket.org]
User = hg

[topsecret.server.com]
Port = 50022
ForwardX11 = no
How do you generate a document like this with Python?
import configparser

config = configparser.ConfigParser()
config["DEFAULT"] = {'ServerAliveInterval': '45',
                     'Compression': 'yes',
                     'CompressionLevel': '9'}

config['bitbucket.org'] = {}
config['bitbucket.org']['User'] = 'hg'

config['topsecret.server.com'] = {}
topsecret = config['topsecret.server.com']
topsecret['Port'] = '50022'        # mutates the parser
topsecret['ForwardX11'] = 'no'     # same here

config['DEFAULT']['ForwardX11'] = 'yes'

with open('example.ini', 'w') as configfile:
    config.write(configfile)
Once written, it can be read back out again.
>>> import configparser
>>> config = configparser.ConfigParser()
>>> config.sections()
[]
>>> config.read('example.ini')
['example.ini']
>>> config.sections()
['bitbucket.org', 'topsecret.server.com']
>>> 'bitbucket.org' in config
True
>>> 'bytebong.com' in config
False
>>> config['bitbucket.org']['User']
'hg'
>>> config['DEFAULT']['Compression']
'yes'
>>> topsecret = config['topsecret.server.com']
>>> topsecret['ForwardX11']
'no'
>>> topsecret['Port']
'50022'
>>> for key in config['bitbucket.org']: print(key)
...
user
compressionlevel
serveraliveinterval
compression
forwardx11
>>> config['bitbucket.org']['ForwardX11']
'yes'
configparser syntax for create, read, update, and delete
# sample i.cfg:
[section1]
k1 = v1
k2:v2

[section2]
k1 = v1

import configparser

config = configparser.ConfigParser()
config.read('i.cfg')

# ########## read ##########
# secs = config.sections()
# print(secs)
# options = config.options('group2')
# print(options)

# item_list = config.items('group2')
# print(item_list)

# val = config.get('group1', 'key')
# val = config.getint('group1', 'key')

# ########## modify ##########
# sec = config.remove_section('group1')
# config.write(open('i.cfg', "w"))

# sec = config.has_section('wupeiqi')
# sec = config.add_section('wupeiqi')
# config.write(open('i.cfg', "w"))

# config.set('group2', 'k1', '11111')   # values must be strings
# config.write(open('i.cfg', "w"))

# config.remove_option('group2', 'age')
# config.write(open('i.cfg', "w"))
Modules and functions that can execute shell commands include (the commands module below is Python 2 only; it was removed in Python 3):
import commands

result = commands.getoutput('cmd')
result = commands.getstatus('cmd')
result = commands.getstatusoutput('cmd')
Everything the modules and functions above can do is implemented in the subprocess module, which also provides far richer functionality.
call
Runs a command and returns the exit status code.
ret = subprocess.call(["ls", "-l"], shell=False)
ret = subprocess.call("ls -l", shell=True)
shell=True allows the shell command to be given as a single string.
check_call
Runs a command; returns 0 if the exit status is 0, otherwise raises an exception.
subprocess.check_call(["ls", "-l"])
subprocess.check_call("exit 1", shell=True)
check_output
Runs a command; returns its output if the exit status is 0, otherwise raises an exception.
subprocess.check_output(["echo", "Hello World!"])
subprocess.check_output("exit 1", shell=True)
subprocess.Popen(...)
Used to execute more complex system commands.
Parameters (the common ones): args, the command as a string or sequence; shell, whether to run args through the shell; cwd, the working directory for the child process; env, the child's environment variables; stdin, stdout, stderr, the child's standard streams (e.g. subprocess.PIPE); bufsize, the buffering policy; preexec_fn, a callable run in the child just before the command executes (Unix only).
import subprocess

ret1 = subprocess.Popen(["mkdir", "t1"])
ret2 = subprocess.Popen("mkdir t2", shell=True)
Commands typed into a terminal come in two kinds: ones that simply print a result (e.g. ifconfig), and ones that enter an interactive environment waiting for further input (e.g. python).
import subprocess

obj = subprocess.Popen("mkdir t3", shell=True, cwd='/home/dev')
import subprocess

obj = subprocess.Popen(["python"], stdin=subprocess.PIPE,
                       stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                       universal_newlines=True)   # text mode, so we can write str

obj.stdin.write('print(1)\n')
obj.stdin.write('print(2)\n')
obj.stdin.write('print(3)\n')
obj.stdin.write('print(4)\n')
obj.stdin.close()

cmd_out = obj.stdout.read()
obj.stdout.close()
cmd_error = obj.stderr.read()
obj.stderr.close()

print(cmd_out)
print(cmd_error)
import subprocess

obj = subprocess.Popen(["python"], stdin=subprocess.PIPE,
                       stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                       universal_newlines=True)

obj.stdin.write('print(1)\n')
obj.stdin.write('print(2)\n')
obj.stdin.write('print(3)\n')
obj.stdin.write('print(4)\n')

out_error_list = obj.communicate()   # closes stdin and waits for the child
print(out_error_list)
import subprocess

obj = subprocess.Popen(["python"], stdin=subprocess.PIPE,
                       stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                       universal_newlines=True)

out_error_list = obj.communicate('print("hello")')   # send input, then wait
print(out_error_list)
For more, see the official subprocess documentation.
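As a supplement: on Python 3.5+ the subprocess.run() function wraps the call/check_call/check_output patterns above in a single interface. A minimal sketch (capture_output and text require Python 3.7+):

import subprocess

# Run a command, capture its output, and raise on a non-zero exit status.
# check=True makes run() behave like check_call/check_output.
result = subprocess.run(["ls", "-l"], capture_output=True, text=True, check=True)
print(result.returncode)   # 0 on success
print(result.stdout)       # the command's standard output as str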
Lots of programs need to record logs, and those logs contain not only normal access information but possibly also errors, warnings, and so on. Python's logging module provides a standard logging interface through which you can store logs in all kinds of formats. logging has five levels: debug(), info(), warning(), error() and critical(). Let's see how to use it.
The simplest usage
import logging

logging.warning("user [alex] attempted wrong password more than 3 times")
logging.critical("server is down")

# Output:
# WARNING:root:user [alex] attempted wrong password more than 3 times
# CRITICAL:root:server is down
Let's look at what each of these log levels means:

DEBUG: detailed information, typically of interest only when diagnosing problems.
INFO: confirmation that things are working as expected.
WARNING: an indication that something unexpected happened, or of some problem in the near future (e.g. "disk space low"); the software is still working as expected.
ERROR: due to a more serious problem, the software has not been able to perform some function.
CRITICAL: a serious error, indicating that the program itself may be unable to continue running.
Writing the log to a file is just as simple:
import logging

logging.basicConfig(filename='example.log', level=logging.INFO)
logging.debug('This message should go to the log file')
logging.info('So should this')
logging.warning('And this, too')
The level=logging.INFO in the line below sets the logging threshold to INFO, meaning only messages at INFO level or above get written to the file. In this example the first (debug) message is therefore not recorded; if you want debug messages logged, change the level to DEBUG.
logging.basicConfig(filename='example.log',level=logging.INFO)
The log format above is missing a timestamp, and what good is a log without times? Let's add one!
import logging

logging.basicConfig(format='%(asctime)s %(message)s', datefmt='%m/%d/%Y %I:%M:%S %p')
logging.warning('is when this event was logged.')

# Output:
# 12/12/2010 11:46:36 AM is when this event was logged.
If you want logs to go to both the screen and a log file at the same time, you need to learn something slightly more complex.
Logging with Python's logging module involves four main classes, best summarized by the official documentation:
logger provides the interface that application code uses directly;
handler sends the log records (created by loggers) to the appropriate destination;
filter provides fine-grained control over which log records to output;
formatter determines the final output format of a log record.
logger
Every program obtains a Logger before emitting any output. The Logger usually corresponds to the program's module name; for example, a chat tool's GUI module can get its Logger like this:
LOG = logging.getLogger("chat.gui")
while the core module can do:
LOG = logging.getLogger("chat.kernel")
Logger.setLevel(lel): sets the minimum log level; anything below lel is ignored. DEBUG is the lowest built-in level, CRITICAL the highest.
Logger.addFilter(filt), Logger.removeFilter(filt): add or remove the given filter.
Logger.addHandler(hdlr), Logger.removeHandler(hdlr): add or remove the given handler.
Logger.debug(), Logger.info(), Logger.warning(), Logger.error(), Logger.critical(): emit a record at the corresponding level.
handler
A handler object is responsible for delivering the relevant information to the specified destination. Python's logging system offers many kinds of Handler: some write to the console, some write to files, and some send the information over the network. If none of them fits, you can also write your own Handler. Multiple handlers can be attached via the addHandler() method.
Handler.setLevel(lel): sets which records this handler processes; records below level lel are ignored.
Handler.setFormatter(): picks a formatter for this handler.
Handler.addFilter(filt), Handler.removeFilter(filt): add or remove a filter object.
Each Logger can have multiple Handlers attached. Next, a few commonly used Handlers:
1) logging.StreamHandler
This Handler writes to any file-like object such as sys.stdout or sys.stderr. Its constructor is:
StreamHandler([strm])
where the strm parameter is a file object, defaulting to sys.stderr.
2) logging.FileHandler
Similar to StreamHandler, but writes log messages to a file, and FileHandler opens the file for you. Its constructor is:
FileHandler(filename[,mode])
filename is the file name; it must be given.
mode is the mode the file is opened with; see Python's built-in open(). The default is 'a', i.e. append to the end of the file.
3) logging.handlers.RotatingFileHandler
This Handler is similar to FileHandler above, except that it manages the file size. Once the file reaches a certain size, it automatically renames the current log file and then creates a new one with the same name to keep writing to. Say the log file is chat.log: when chat.log reaches the specified size, RotatingFileHandler renames it to chat.log.1; if chat.log.1 already exists, chat.log.1 is first renamed to chat.log.2, and so on; finally chat.log is recreated and logging continues. Its constructor is:
RotatingFileHandler( filename[, mode[, maxBytes[, backupCount]]])
where the filename and mode parameters mean the same as in FileHandler.
maxBytes is the maximum size of the log file. If maxBytes is 0, the file may grow without limit and the renaming described above never happens.
backupCount is the number of backup files to keep. For example, if it is 2, then when the renaming above happens, the existing chat.log.2 is not renamed but deleted.
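To make the rotation behaviour concrete, here is a minimal sketch of size-based rotation; the file name chat.log and the 10 KB limit are just placeholders:

import logging
from logging import handlers

logger = logging.getLogger("chat")
# Roll over once chat.log exceeds 10 KB; keep at most 3 old files
# (chat.log.1, chat.log.2, chat.log.3).
fh = handlers.RotatingFileHandler("chat.log", maxBytes=10 * 1024, backupCount=3)
logger.addHandler(fh)

logger.warning("this line goes to chat.log")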
4) logging.handlers.TimedRotatingFileHandler
This Handler is similar to RotatingFileHandler, except that instead of checking file size it automatically creates a new log file at fixed time intervals. The renaming works like RotatingFileHandler, but the new file's suffix is the current time rather than a number. Its constructor is:
TimedRotatingFileHandler( filename [,when [,interval [,backupCount]]])
where the filename and backupCount parameters mean the same as in RotatingFileHandler.
interval is the time interval.
The when parameter is a string giving the unit of the interval; it is case-insensitive and takes these values:
S - seconds
M - minutes
H - hours
D - days
W - week (interval==0 means Monday)
midnight - roll over at midnight
import logging

# create logger
logger = logging.getLogger('TEST-LOG')
logger.setLevel(logging.DEBUG)

# create console handler and set level to debug
ch = logging.StreamHandler()
ch.setLevel(logging.DEBUG)

# create file handler and set level to warning
fh = logging.FileHandler("access.log")
fh.setLevel(logging.WARNING)

# create formatter
formatter = logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s')

# add formatter to ch and fh
ch.setFormatter(formatter)
fh.setFormatter(formatter)

# add ch and fh to logger
logger.addHandler(ch)
logger.addHandler(fh)

# 'application' code
logger.debug('debug message')
logger.info('info message')
logger.warning('warning message')   # warn() is deprecated; use warning()
logger.error('error message')
logger.critical('critical message')
Automatic log rotation example
import logging
from logging import handlers

logger = logging.getLogger(__name__)

log_file = "timelog.log"
# fh = handlers.RotatingFileHandler(filename=log_file, maxBytes=10, backupCount=3)
fh = handlers.TimedRotatingFileHandler(filename=log_file, when="S", interval=5, backupCount=3)

formatter = logging.Formatter('%(asctime)s %(module)s:%(lineno)d %(message)s')
fh.setFormatter(formatter)
logger.addHandler(fh)

logger.warning("test1")
logger.warning("test12")
logger.warning("test13")
logger.warning("test14")
Common regular expression symbols
'.'      matches any character except \n by default; with flag DOTALL it matches any character, newline included
'^'      matches the start of the string; with flags=re.MULTILINE it also matches after each newline, e.g. re.search(r"^a", "\nabc\neee", flags=re.MULTILINE)
'$'      matches the end of the string; re.search("foo$", "bfoo\nsdfsf", flags=re.MULTILINE).group() also works
'*'      matches the preceding character 0 or more times; re.findall("ab*", "cabb3abcbbac") gives ['abb', 'ab', 'a']
'+'      matches the preceding character 1 or more times; re.findall("ab+", "ab+cd+abb+bba") gives ['ab', 'abb']
'?'      matches the preceding character 0 or 1 times
'{m}'    matches the preceding character exactly m times
'{n,m}'  matches the preceding character n to m times; re.findall("ab{1,3}", "abb abc abbcbbb") gives ['abb', 'ab', 'abb']
'|'      matches the pattern on either side of the |; re.search("abc|ABC", "ABCBabcCD").group() gives 'ABC'
'(...)'  group matching; re.search("(abc){2}a(123|456)c", "abcabca456c").group() gives 'abcabca456c'
'\A'     matches only at the start of the string; re.search("\Aabc", "alexabc") matches nothing
'\Z'     matches the end of the string, same as $
'\d'     matches a digit 0-9
'\D'     matches a non-digit
'\w'     matches a word character [a-zA-Z0-9_]
'\W'     matches a non-word character
'\s'     matches whitespace, including \t, \n, \r; re.search(r"\s+", "ab\tc1\n3").group() gives '\t'
'(?P<name>...)' named group matching; re.search("(?P<province>[0-9]{4})(?P<city>[0-9]{2})(?P<birthday>[0-9]{4})", "371481199306143242").groupdict() gives {'province': '3714', 'city': '81', 'birthday': '1993'}
The most commonly used matching functions
re.match    matches from the beginning of the string
re.search   matches anywhere in the string
re.findall  returns all matches as a list
re.split    splits the string on the matches, returning a list
re.sub      replaces the matches
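A quick demonstration of the differences between these functions:

import re

print(re.match("ab", "abcabc"))              # matches only at the start of the string
print(re.search("bc", "abcabc").group())     # 'bc', the first match anywhere
print(re.findall("ab", "abcabc"))            # ['ab', 'ab'], all matches
print(re.split("[0-9]", "a1b2c"))            # ['a', 'b', 'c'], split on digits
print(re.sub("[0-9]", "-", "a1b2c"))         # 'a-b-c', replace every match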
The trouble with backslashes
Like most programming languages, regular expressions use "\" as the escape character, which can cause backslash trouble. Suppose you need to match the character "\" in some text: the regular expression written as a string literal will need four backslashes "\\\\". The first pair and the second pair are each reduced to one backslash by the language's string escaping, and the resulting two backslashes are then reduced to a single backslash by the regex engine. Python's raw strings solve this nicely: the regex in this example can be written r"\\". Likewise, "\\d" for matching a digit can be written r"\d". With raw strings you no longer have to worry about missing a backslash, and the expressions you write are more intuitive.
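A small demonstration of the raw-string point above:

import re

s = "C:\\temp"                          # the string actually contains one backslash: C:\temp
print(re.search("\\\\", s).group())     # pattern written with 4 backslashes matches '\'
print(re.search(r"\\", s).group())      # raw string: the same pattern, far more readable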
A few match flags you only need to know in passing
re.I (re.IGNORECASE): ignore case (the full spelling is in parentheses, likewise below)
M (MULTILINE): multi-line mode, changes the behaviour of '^' and '$'
S (DOTALL): "dot matches all" mode, changes the behaviour of '.'
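A quick demonstration of the three flags:

import re

print(re.search("abc", "ABC", flags=re.I).group())    # 'ABC', case ignored
print(re.findall("^a", "abc\nacc", flags=re.M))       # ['a', 'a'], '^' matches per line
print(re.search("a.c", "a\nc", flags=re.S).group())   # 'a\nc', '.' also matches '\n'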
Exercise: develop a simple Python calculator
hint:
re.search(r'\([^()]+\)', s).group()
'(-40/5)'
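One possible approach (a rough sketch, not a full solution; for instance it does not handle a "--" produced by subtracting a negative intermediate result): repeatedly pull out the innermost parentheses with the hint's pattern, evaluate that flat expression yourself with * and / first and + and - second, splice the result back in, and finish when no parentheses remain:

import re

def calc_flat(expr):
    # Evaluate an expression with no parentheses: * and / first, then + and -.
    while re.search(r"[*/]", expr):
        a, op, b = re.search(r"(-?\d+\.?\d*)([*/])(-?\d+\.?\d*)", expr).groups()
        val = float(a) * float(b) if op == "*" else float(a) / float(b)
        expr = re.sub(r"-?\d+\.?\d*[*/]-?\d+\.?\d*", str(val), expr, count=1)
    # Only + and - are left: sum all the signed numbers.
    return sum(float(n) for n in re.findall(r"[+-]?\d+\.?\d*", expr))

def calc(expr):
    expr = expr.replace(" ", "")
    while "(" in expr:
        inner = re.search(r"\([^()]+\)", expr).group()   # the hint: innermost parens
        expr = expr.replace(inner, str(calc_flat(inner[1:-1])), 1)
    return calc_flat(expr)

print(calc("1 - 2 * ((60 - 30 + 9) * 2)"))   # -155.0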
Why design a good directory structure?
"Designing the project's directory structure", like "code style", is a matter of personal taste, and toward this kind of stylistic convention there have always been two attitudes: one camp holds that such conventions don't matter, since what counts is making the program work; the other holds that conventions help control program structure and make a project far more readable and maintainable.
I lean toward the latter, because I have been a direct victim of the former camp's way of working. I once maintained a very hard-to-read project whose logic was not complicated, yet it cost me a very long time to understand what it was trying to express. Since then my personal bar for a project's readability and maintainability has been high. "Project directory structure" is itself part of "readability and maintainability": we design a clearly layered directory structure so that someone unfamiliar with the code can tell at a glance where the entry script, tests, and configuration live, and so that maintainers always know where new files and code belong.
So I believe keeping a clearly layered directory structure is necessary; what's more, organizing a good project directory is actually a very simple thing to do.
How to organize the directories
On how to organize a good Python project directory structure, some consensus layouts already exist. In this Stackoverflow question you can see the community's discussion of Python directory structures.
What is said there is already quite good; I won't reinvent the wheel by listing all the variants, but I will share my own understanding and experience.
Assume your project is named foo. The most convenient, sufficient directory structure I would suggest looks like this:
Foo/
|-- bin/
|   |-- foo
|
|-- foo/
|   |-- tests/
|   |   |-- __init__.py
|   |   |-- test_main.py
|   |
|   |-- __init__.py
|   |-- main.py
|
|-- docs/
|   |-- conf.py
|   |-- abc.rst
|
|-- setup.py
|-- requirements.txt
|-- README
A brief explanation:
bin/: the project's executables; naming it script/ or similar is fine too.
foo/: all of the project's source code. (1) Every module and package in the source belongs under this directory, not at the top level. (2) Its subdirectory tests/ holds the unit-test code. (3) The program's entry point is best named main.py.
docs/: documentation.
setup.py: the install / deploy / packaging script.
requirements.txt: the list of external Python packages the software depends on.
README: the project description file.
Beyond these, some layouts add even more, such as LICENSE.txt and ChangeLog.txt. I haven't listed them here because they mostly matter when open-sourcing the project; if you want to write an open-source package and wonder how to organize its directories, you can refer to this article.
Next, let me briefly explain my understanding of, and personal requirements for, these directories.
About the contents of README
I think every project should have this file; its purpose is to describe the project concisely so that readers can quickly understand it.
It should explain the following: what the software is for and what its basic features are; how to run the code, i.e. environment setup, launch commands, and so on; brief usage instructions; an outline of the code's directory structure; and notes on frequently encountered problems.
A README covering the points above is a pretty good one. Early in development, the information above may be unclear or subject to change, so you don't have to fill everything in from day one; but by the time the project wraps up, such a document does need to be written.
You can refer to the README in the Redis source code, which concisely yet clearly describes Redis's features and source layout.
About requirements.txt and setup.py
setup.py
通常來講,用setup.py
來管理代碼的打包、安裝、部署問題。業界標準的寫法是用Python流行的打包工具setuptools來管理這些事情。這種方式廣泛應用於開源項目中。不過這裏的核心思想不是用標準化的工具來解決這些問題,而是說,一個項目必定要有一個安裝部署工具,能快速便捷的在一臺新機器上將環境裝好、代碼部署好和將程序運行起來。
I've stepped on this landmine myself.
When I first started writing Python projects, installing the environment, deploying code, and running the program were all done by hand, and I ran into problems like these: forgetting that a new Python package had recently been added, so the program broke as soon as it ran in production; package version conflicts, where installing by hand pulled in a newer release than the one the code was written against; and when there were many dependencies, installing them one by one wasted a great deal of time.
setup.py can automate all of this, raising efficiency and reducing the chance of mistakes. "Automate the complex things, and always automate whatever can be automated" is a very good habit.
The setuptools documentation is fairly large, and when you first encounter it the entry point may be hard to find. A good way to learn a technology is to look at how others use it; you can refer to how flask, a Python web framework, writes its setup.py.
Of course, for something simple, writing your own install script (deploy.sh) instead of setup.py is perfectly fine too.
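For reference, a minimal sketch of what such a setup.py might contain, assuming the Foo/ layout above (the package name, version, and dependency below are placeholders):

from setuptools import setup, find_packages

setup(
    name="foo",                          # placeholder project name
    version="0.1.0",                     # placeholder version
    packages=find_packages(),            # picks up foo/ and foo/tests/
    install_requires=["flask>=0.10"],    # keep in sync with requirements.txt
    entry_points={
        # assumes foo/main.py defines a main() function
        "console_scripts": ["foo = foo.main:main"],
    },
)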
requirements.txt
This file exists for two reasons: to make it easy for developers to maintain the package dependencies, since new packages added during development go into this list and nothing gets missed when setup.py installs the dependencies; and to make it easy for readers to see exactly which Python packages the project uses.
The file's format is one dependency specification per line, usually in the form flask>=0.10. The only requirement is that the format be understood by pip, because then all the Python dependencies can be installed simply with pip install -r requirements.txt. For the detailed format specification, see here.
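For example, a requirements.txt might contain nothing more than lines like these (the package names and versions are placeholders):

flask>=0.10
requests>=2.7,<3.0
redis==2.10.3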
About how configuration files are used
Note that in the directory structure above, conf.py is not placed in the source directory but under docs/.
Many projects handle configuration files like this: the configuration is written in one or more Python files, such as the conf.py here, and every module that needs it just does import conf to pull the configuration into the code. I don't much agree with this approach:
It makes unit testing difficult, because modules depend on external configuration; a configuration file is the user's interface for controlling the program, so the user should be free to specify its path rather than have it fixed in the code; and it hurts component reusability, because this hard-coded import runs through all the modules and makes most of them depend on the conf.py file. So I think the better way to use configuration is to let each module be configured flexibly through what is passed to it, untouched by an external config file, and to let the program's configuration be freely controllable by the user.
Supporting this idea: anyone who has used nginx or mysql knows that both programs let the user freely specify their configuration.
So you should not directly import conf in code to use the configuration. The conf.py in the directory structure above is a sample configuration, not a config file hard-wired into the program by direct import. The program can read its configuration by having the config path passed to main.py as a startup argument. Naturally, you can give conf.py a similar name instead, such as settings.py, or write the configuration in another format, such as settings.yaml.
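A minimal sketch of what that could look like in main.py; argparse plus configparser is just one possibility, and the default path docs/conf.ini is a placeholder:

import argparse
import configparser

def main():
    parser = argparse.ArgumentParser()
    # Let the user point the program at any config file,
    # the way nginx -c or mysql --defaults-file does.
    parser.add_argument("-c", "--config", default="docs/conf.ini",
                        help="path to the configuration file")
    args = parser.parse_args()

    config = configparser.ConfigParser()
    config.read(args.config)
    # ... pass values from `config` into the modules that need them,
    # instead of having every module `import conf` directly.

if __name__ == "__main__":
    main()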