"We'd like to pretend that 'Fredrik' is a role, but even hundreds of volunteers couldn't possibly keep up. No, 'Fredrik' is the result of crossing an http server with a spam filter with an emacs whatsit and some other stuff besides."
-Gordon McMillan, June 1998
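Python 2.0 ships with an extensive standard library of more than 200 modules. This book briefly describes each module and provides at least one example showing how to use it. The book contains 360 examples in total.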
"Those people who have nothing better to do than post on the Internet all day long are rarely the ones who have the most insights."
- Jakob Nielsen, December 1998
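Since I first stumbled upon Python five years ago, I have spent a great deal of time answering questions on the comp.lang.python newsgroup. Maybe someone had found a module that was exactly what he wanted, but couldn't figure out how to use it. Maybe someone had picked the wrong module for the task. Maybe someone was tired of reinventing the wheel. Most of the time, a short example is more helpful than a pointer to the reference manual.
This book distills more than 3,000 newsgroup postings, together with many new scripts written to cover every corner of the standard library.
I have tried to make every script easy to understand and easy to reuse. I have deliberately kept the commentary short; if you want more background, consult the reference manual that comes with every Python distribution. The emphasis of this book is on the example code.
Comments, suggestions, and bug reports are welcome. Please send them to fredrik@pythonware.com. I read all the mail I can, but replies may be slow.
For updates and other information about this book, see http://www.pythonware.com/people/fredrik/librarybook.htm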
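Why no Tkinter?
This book covers the entire standard library, except for the (optional) Tkinter user-interface library. There are several reasons for this, mostly to do with time, the space available in the book, and the fact that I am writing another book about Tkinter.
For information about these books, see http://www.pythonware.com/people/fredrik/tkinterbook.htm. (Translator's note: don't bother, it's another 404.)
Production details
This book was written in DocBook SGML. I used a range of tools, including Secret Labs' PythonWorks, Excosoft Documentor, James Clark's Jade DSSSL processor, Norm Walsh's DocBook stylesheets, and, of course, a few Python scripts.
Thanks to the people who helped with proofreading: Tim Peters, Guido van Rossum, David Ascher, Mark Lutz, and Rael Dornfest, and to the PythonWare crew: Matthew Ellis, Håkan Karlsson, and Rune Uhlin.
Thanks to Lenny Muellner, who helped turn the SGML files into the book you see now, and to Christien Shangraw, who gathered the code files for the accompanying CD (available at http://examples.oreilly.com/pythonsl; translator's note: amazingly, not a 404).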
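This book uses the following typographical conventions:
Italic
Used for file names and commands. Also used to define terms.
Constant width, e.g. Python
Used for code and for the names of methods, modules, operators, functions, statements, attributes, and so on.
Constant width bold
Used for the output of code.
Unless otherwise noted, all examples run under Python 1.5.2 and Python 2.0. Whether they run under Python 2.4/2.5 is up to those taking part in this translation.
Except for a few scripts that use platform-specific modules, all examples run on Windows, Solaris, and Linux.
All code is copyrighted. Of course, you are free to use these modules; just don't forget where you got (or learned) them.
Most example file names consist of the name of the module being used, followed by "-example-" and a unique sequence number. Note that some examples appear out of order; this matches an earlier edition of this book, (the eff-bot guide to) The Standard Python Library.
The contents of the accompanying CD are available online (see http://examples.oreilly.com/pythonsl). For more information and updates, see http://www.pythonware.com/people/fredrik/librarybook.htm. (Translator's note: another 404; don't bother.)
Python Jianghu QQ group: 43680167
Feather (proofreader) QQ: 85660100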
"Since the functions in the C runtime library are not part of the Win32 API, we believe the number of applications that will be affected by this bug to be very limited."
- Microsoft, January 1999
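Python's standard library covers a wide range of modules, from modules that define types and declarations specific to the Python language itself to obscure modules used by only a handful of programs.
This chapter describes some fundamental standard library modules. Any larger Python program is likely to use most of these modules, directly or indirectly.
Two modules are more important than all the others combined: the __builtin__ module, which defines the built-in functions (such as len, int, and range), and the exceptions module, which defines all built-in exceptions.
Python imports both modules when it starts, and makes their contents available to every program.
Python has a number of modules modeled on the POSIX standard API and the standard C library. They provide platform-independent interfaces to the underlying operating system.
These modules include the os module, which provides file and process operations; the os.path module, which handles file names in a platform-independent way (splitting off directory names, file names, extensions, and so on); and the time/datetime modules, which handle dates and times.
[Feather's note: datetime was added in Python 2.3 and provides enhanced date and time handling.]
By extension, the networking and threading modules also belong to this group, although Python does not implement them on every platform and in every version.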
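The standard library contains many modules that support operations on the built-in types. The string module implements commonly used string operations. The math module provides mathematical operations and constants (pi and e, for example), and the cmath module does the same for complex numbers.
The re module provides regular expression support for Python. Regular expressions are string patterns, written in a special syntax, that are used to match strings or substrings.
The sys module gives you access to interpreter-related parameters, such as the module search path and the interpreter version. The operator module provides functions equivalent to the built-in operators. The copy module lets you copy objects, and the gc module, new in Python 2.0, gives you control over garbage collection.
The __builtin__ module contains the built-in functions used in Python. You generally don't need to import this module yourself; Python does it for you.
Python lets you build a function's argument list on the fly. Put all the arguments in a tuple, and call the function through the built-in apply function, as Example 1-1 shows.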
File: builtin-apply-example-1.py
def function(a, b):
    print a, b

apply(function, ("whither", "canada?"))
apply(function, (1, 2 + 3))
whither canada?
1 5
To pass keyword arguments to a function, give apply a dictionary as its third argument, as Example 1-2 shows.
File: builtin-apply-example-2.py
def function(a, b):
    print a, b

apply(function, ("crunchy", "frog"))
apply(function, ("crunchy",), {"b": "frog"})
apply(function, (), {"a": "crunchy", "b": "frog"})
crunchy frog
crunchy frog
crunchy frog
A common use of apply is to pass constructor arguments from a subclass on to the base class, especially when the constructor takes many arguments, as Example 1-3 shows.
File: builtin-apply-example-3.py
class Rectangle:
    def __init__(self, color="white", width=10, height=10):
        print "create a", color, self, "sized", width, "x", height

class RoundedRectangle(Rectangle):
    def __init__(self, **kw):
        apply(Rectangle.__init__, (self,), kw)

rect = Rectangle(color="green", height=100, width=100)
rect = RoundedRectangle(color="blue", height=20)
create a green <Rectangle instance at 8c8260> sized 100 x 100
create a blue <RoundedRectangle instance at 8c84c0> sized 10 x 20
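Python 2.0 provides another way to do the same thing. Use an ordinary function call, with * marking the tuple and ** marking the dictionary.
The following two statements are equivalent: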
result = function(*args, **kwargs)
result = apply(function, args, kwargs)
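For readers on a current interpreter, here is a minimal Python 3 sketch of the same idea. It assumes Python 3, where apply no longer exists and the * / ** call syntax is the only spelling; the function and argument values are simply reused from Example 1-2.

```python
# Python 3 sketch: argument unpacking replaces the old apply() built-in.
def function(a, b):
    return "%s %s" % (a, b)

args = ("crunchy",)      # positional arguments, collected in a tuple
kwargs = {"b": "frog"}   # keyword arguments, collected in a dictionary

# Same effect as the historical apply(function, args, kwargs)
result = function(*args, **kwargs)
print(result)
```

The tuple and the dictionary can be built at run time, which is what makes this form useful for forwarding arguments.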
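If you have written a larger Python program, you know that the import statement is used to bring in external modules (the from-import form works too). What you may not know is that import does its work by calling the built-in __import__ function.
The trick is that you can call this function directly, which is handy when you only have the module name as a string. Example 1-4 shows this usage by importing every module whose name ends with "-plugin".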
File: builtin-import-example-1.py
import glob, os

modules = []

for module_file in glob.glob("*-plugin.py"):
    try:
        module_name, ext = os.path.splitext(os.path.basename(module_file))
        module = __import__(module_name)
        modules.append(module)
    except ImportError:
        pass # ignore broken modules

# say hello to all modules
for module in modules:
    module.hello()
example-plugin says hello
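Note that the plug-in module file names contain a hyphen ("-"). This means you can't import them with an ordinary import statement, because Python identifiers can't contain hyphens.
Example 1-5 shows the plug-in used in Example 1-4.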
File: example-plugin.py
def hello():
    print "example-plugin says hello"
Example 1-6 shows how to get a function object, given the module name and the function name.
File: builtin-import-example-2.py
def getfunctionbyname(module_name, function_name):
    module = __import__(module_name)
    return getattr(module, function_name)

print repr(getfunctionbyname("dumbdbm", "open"))
<function open at 794fa0>
You can also use this function to implement lazy module loading. In Example 1-7, for instance, the string module is imported only when it is first used.
File: builtin-import-example-3.py
class LazyImport:
    def __init__(self, module_name):
        self.module_name = module_name
        self.module = None
    def __getattr__(self, name):
        if self.module is None:
            self.module = __import__(self.module_name)
        return getattr(self.module, name)

string = LazyImport("string")

print string.lowercase
abcdefghijklmnopqrstuvwxyz
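The lazy-loading pattern carries over to current Python almost unchanged. The sketch below is a Python 3 adaptation, not the book's code: it assumes importlib.import_module as the import hook (the usual replacement for calling __import__ directly) and uses string.ascii_lowercase, since string.lowercase is not available in Python 3.

```python
import importlib

class LazyImport:
    """Defer importing a module until one of its attributes is first used."""

    def __init__(self, module_name):
        self.module_name = module_name
        self.module = None

    def __getattr__(self, name):
        # Called only for names not found on the instance itself,
        # that is, for attributes that belong to the wrapped module.
        if self.module is None:
            self.module = importlib.import_module(self.module_name)
        return getattr(self.module, name)

string = LazyImport("string")
loaded_before = string.module is not None   # nothing imported yet
print(string.ascii_lowercase)               # first use triggers the import
loaded_after = string.module is not None
```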
Python also provides basic support for reloading modules that have already been imported. Example 1-8 loads the hello.py file three times.
File: builtin-reload-example-1.py
import hello
reload(hello)
reload(hello)
hello again, and welcome to the show
hello again, and welcome to the show
hello again, and welcome to the show
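reload takes the module object itself as its argument.
[Feather's note: the original sentence could not be understood; to be discussed later.]
Note that when you reload a module, it is recompiled, and the new module replaces the old one in the module dictionary. However, instances already created from classes in the original module keep using the old code; they are not updated.
Likewise, references to module contents created with from-import are not updated.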
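The point about stale instances is easy to check. The following is a hedged Python 3 sketch, assuming importlib.reload (the Python 3 home of reload) and a throwaway module written to a temporary directory; the module name reload_demo and the Greeter class are hypothetical, chosen only for this illustration.

```python
import importlib
import os
import sys
import tempfile

sys.dont_write_bytecode = True            # keep cached bytecode out of this demo
workdir = tempfile.mkdtemp()
path = os.path.join(workdir, "reload_demo.py")   # hypothetical module name

def write_module(version):
    # (re)write the module source with a given class attribute
    with open(path, "w") as f:
        f.write("class Greeter:\n    version = %d\n" % version)

write_module(1)
sys.path.insert(0, workdir)
importlib.invalidate_caches()
reload_demo = importlib.import_module("reload_demo")

old_instance = reload_demo.Greeter()      # created from the first version
write_module(22)
importlib.reload(reload_demo)             # recompiles and rebinds module names
new_instance = reload_demo.Greeter()

print(old_instance.version, new_instance.version)
print(type(old_instance) is reload_demo.Greeter)
```

The old instance still reports the old value because it refers to the old class object, which the reload did not touch.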
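dir returns a list of all members of a given module, class, instance, or other type. It is probably most useful at the interactive Python prompt, but it can come in handy elsewhere too. Example 1-9 shows the dir function in use.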
File: builtin-dir-example-1.py
def dump(value):
    print value, "=>", dir(value)

import sys

dump(0)
dump(1.0)
dump(0.0j) # complex number
dump([]) # list
dump({}) # dictionary
dump("string")
dump(len) # function
dump(sys) # module
0 => []
1.0 => []
0j => ['conjugate', 'imag', 'real']
[] => ['append', 'count', 'extend', 'index', 'insert',
'pop', 'remove', 'reverse', 'sort']
{} => ['clear', 'copy', 'get', 'has_key', 'items',
'keys', 'update', 'values']
string => []
<built-in function len> => ['__doc__', '__name__', '__self__']
<module 'sys' (built-in)> => ['__doc__', '__name__',
'__stderr__', '__stdin__', '__stdout__', 'argv',
'builtin_module_names', 'copyright', 'dllhandle',
'exc_info', 'exc_type', 'exec_prefix', 'executable',
...
The getmembers function defined in Example 1-10 returns all class-level attributes and methods defined by a given class.
File: builtin-dir-example-2.py
class A:
    def a(self):
        pass
    def b(self):
        pass

class B(A):
    def c(self):
        pass
    def d(self):
        pass

def getmembers(klass, members=None):
    # get a list of all class members, ordered by class
    if members is None:
        members = []
    for k in klass.__bases__:
        getmembers(k, members)
    for m in dir(klass):
        if m not in members:
            members.append(m)
    return members

print getmembers(A)
print getmembers(B)
print getmembers(IOError)
['__doc__', '__module__', 'a', 'b']
['__doc__', '__module__', 'a', 'b', 'c', 'd']
['__doc__', '__getitem__', '__init__', '__module__', '__str__']
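The getmembers function returns an ordered list. The earlier a name appears in the list, the higher up in the class hierarchy it is defined. If the order doesn't matter, you can use a dictionary instead of a list.
[Feather's note: dictionaries are unordered, whereas lists and tuples are ordered. Ordered dictionaries are discussed online.]
The vars function is similar, but it returns a dictionary containing the current value of each member. Called without an argument, vars returns the names visible in the current local namespace (the same as the locals() function), as Example 1-11 shows.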
File: builtin-vars-example-1.py
book = "library2"
pages = 250
scripts = 350
print "the %(book)s book contains more than %(scripts)s scripts" % vars()
the library2 book contains more than 350 scripts
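The same idiom still works on a current interpreter. The sketch below assumes Python 3 and adds one thing the example above does not show: passing an object to vars to read its attribute dictionary. The Book class is hypothetical and exists only for this illustration.

```python
book = "library2"
scripts = 350

# With no argument, vars() returns the current namespace as a dictionary,
# so it can feed %(name)s placeholders directly.
line = "the %(book)s book contains more than %(scripts)s scripts" % vars()
print(line)

class Book:                      # hypothetical class, for illustration only
    def __init__(self, title, pages):
        self.title = title
        self.pages = pages

# With an argument, vars() returns that object's attribute dictionary.
attributes = vars(Book("library2", 250))
print(sorted(attributes))
```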
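Python is a dynamically typed language, which means that a given variable name can be bound to values of different types on different occasions. In the following example, the same function is called with an integer, a floating-point value, and a string: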
def function(value):
    print value

function(1)
function(1.0)
function("one")
The type function (shown in Example 1-12) lets you check the type of a variable. It returns a type descriptor, which is a unique object for each type provided by the Python interpreter.
File: builtin-type-example-1.py
def dump(value):
    print type(value), value

dump(1)
dump(1.0)
dump("one")
<type 'int'> 1
<type 'float'> 1.0
<type 'string'> one
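Each type has a single corresponding type object, so you can use the is operator (object identity) to test for a type. Example 1-13 makes the equivalent check with isinstance.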
File: builtin-type-example-2.py
def load(file):
    if isinstance(file, type("")):
        file = open(file, "rb")
    return file.read()

print len(load("samples/sample.jpg")), "bytes"
print len(load(open("samples/sample.jpg", "rb"))), "bytes"
4672 bytes
4672 bytes
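The callable function, shown in Example 1-14, checks whether an object can be called (either directly or via apply). It returns true for functions, methods, lambda expressions, classes, and instances of classes that implement the __call__ method.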
File: builtin-callable-example-1.py
def dump(function):
    if callable(function):
        print function, "is callable"
    else:
        print function, "is *not* callable"

class A:
    def method(self, value):
        return value

class B(A):
    def __call__(self, value):
        return value

a = A()
b = B()

dump(0) # simple objects
dump("string")
dump(callable)
dump(dump) # function
dump(A) # classes
dump(B)
dump(B.method)
dump(a) # instances
dump(b)
dump(b.method)
0 is *not* callable
string is *not* callable
<built-in function callable> is callable
<function dump at 8ca320> is callable
A is callable
B is callable
<unbound method A.method> is callable
<A instance at 8caa10> is *not* callable
<B instance at 8cab00> is callable
<method A.method of B instance at 8cab00> is callable
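Note that the class objects (A and B) are both callable; calling them creates new objects (class instances). Instances of class A are not callable, however, because the class doesn't implement a __call__ method.
The operator module contains functions that check whether an object is of a given built-in kind (number, sequence, dictionary, and so on). But because it is easy to write a class that behaves like one of them (a class that implements the basic sequence methods, say), explicit type tests on such objects are usually not a good idea.
Things get more complicated with classes and instances. Python doesn't treat classes as types in their own right; instead, all classes belong to a special class type, and all class instances belong to a special instance type.
This means that you can't use the type function to test whether an instance belongs to a given class; all instances have the same type! To solve this, use the isinstance function, which checks whether an object is an instance of a given class (or of one of its subclasses). Example 1-15 shows the isinstance function in use.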
File: builtin-isinstance-example-1.py
class A:
    pass

class B:
    pass

class C(A):
    pass

class D(A, B):
    pass

def dump(object):
    print object, "=>",
    if isinstance(object, A):
        print "A",
    if isinstance(object, B):
        print "B",
    if isinstance(object, C):
        print "C",
    if isinstance(object, D):
        print "D",
    print # end the output line

a = A()
b = B()
c = C()
d = D()

dump(a)
dump(b)
dump(c)
dump(d)
dump(0)
dump("string")
<A instance at 8ca6d0> => A
<B instance at 8ca750> => B
<C instance at 8ca780> => A C
<D instance at 8ca7b0> => A B D
0 =>
string =>
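The issubclass function is similar, but it checks whether a class object is the same as a given class or is a subclass of it, as Example 1-16 shows.
Note that isinstance accepts any kind of object, whereas issubclass raises a TypeError exception if you pass it something that is not a class object.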
File: builtin-issubclass-example-1.py
class A:
    pass

class B:
    pass

class C(A):
    pass

class D(A, B):
    pass

def dump(object):
    print object, "=>",
    if issubclass(object, A):
        print "A",
    if issubclass(object, B):
        print "B",
    if issubclass(object, C):
        print "C",
    if issubclass(object, D):
        print "D",
    print # end the output line

dump(A)
dump(B)
dump(C)
dump(D)
dump(0)
dump("string")
A => A
B => B
C => A C
D => A B D
0 =>
Traceback (innermost last):
File "builtin-issubclass-example-1.py", line 29, in ?
File "builtin-issubclass-example-1.py", line 15, in dump
TypeError: arguments must be classes
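The difference between the two functions can be wrapped in a small helper. This is a minimal Python 3 sketch; safe_issubclass is a hypothetical name introduced here, not a standard function.

```python
class A:
    pass

class C(A):
    pass

def safe_issubclass(obj, cls):
    """Return issubclass(obj, cls), or False when obj is not a class at all."""
    try:
        return issubclass(obj, cls)
    except TypeError:
        # issubclass refuses non-class first arguments such as 0 or "string"
        return False

print(isinstance(C(), A))       # instances of a subclass count
print(isinstance(0, A))         # any object is accepted; this one is no A
print(safe_issubclass(C, A))
print(safe_issubclass(0, A))    # would raise TypeError without the wrapper
```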
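Python provides several ways to interact with the interpreter from within a program. For example, the eval function evaluates a string as a Python expression. You can pass it a literal, a simple expression, or a call to a built-in Python function, as Example 1-17 shows.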
File: builtin-eval-example-1.py
def dump(expression):
    result = eval(expression)
    print expression, "=>", result, type(result)

dump("1")
dump("1.0")
dump("'string'")
dump("1.0 + 2.0")
dump("'*' * 10")
dump("len('world')")
1 => 1 <type 'int'>
1.0 => 1.0 <type 'float'>
'string' => string <type 'string'>
1.0 + 2.0 => 3.0 <type 'float'>
'*' * 10 => ********** <type 'string'>
len('world') => 5 <type 'int'>
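If you aren't sure the string comes from a trusted source, using eval can get you into trouble. For example, a user could use the __import__ function to load the os module and then delete files from your disk (as Example 1-18 shows).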
File: builtin-eval-example-2.py
print eval("__import__('os').getcwd()")
print eval("__import__('os').remove('file')")
/home/fredrik/librarybook
Traceback (innermost last):
File "builtin-eval-example-2", line 2, in ?
File "<string>", line 0, in ?
os.error: (2, 'No such file or directory')
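Here we got an os.error exception, which means that Python really did try to remove the file!
Fortunately, this problem is easy to solve. You can pass eval a second argument: a dictionary that defines the namespace in which the expression is evaluated. Let's try it with an empty dictionary:
>>> print eval("__import__('os').remove('file')", {})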
Traceback (innermost last):
File "<stdin>", line 1, in ?
File "<string>", line 0, in ?
os.error: (2, 'No such file or directory')
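Hmm. We still got an os.error exception.
The reason is that Python inspects the dictionary before evaluating the code, and if it doesn't find a variable named __builtins__ (note the plural form), it adds one:
>>> namespace = {}
>>> print eval("__import__('os').remove('file')", namespace)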
Traceback (innermost last):
File "<stdin>", line 1, in ?
File "<string>", line 0, in ?
os.error: (2, 'No such file or directory')
>>> namespace.keys()
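['__builtins__']
If you print the contents of namespace, you'll find that it contains all the built-in functions.
[Feather's note: if I'm lucky, the __builtins__ added here is simply the current __builtins__.]
Since Python doesn't add the default item when the variable already exists, the solution is straightforward: add a __builtins__ item to the dictionary you pass in, as Example 1-19 shows.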
File: builtin-eval-example-3.py
print eval("__import__('os').getcwd()", {})
print eval("__import__('os').remove('file')", {"__builtins__": {}})
/home/fredrik/librarybook
Traceback (innermost last):
File "builtin-eval-example-3.py", line 2, in ?
File "<string>", line 0, in ?
NameError: __import__
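Even so, you still can't protect yourself against attacks on CPU and memory resources. (For example, evaluating something like eval("'*'*1000000*2*2*2*2*2*2*2*2*2") will make your program exhaust system resources.)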
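When the untrusted text is only supposed to contain literals, a different tool avoids eval altogether. The sketch below is a Python 3 illustration that swaps in ast.literal_eval, which is not covered in this chapter; parse_literal is a hypothetical helper name. It also repeats the empty __builtins__ trick to show that the name lookup fails.

```python
import ast

def parse_literal(text):
    """Evaluate text only if it is a plain Python literal; otherwise return None."""
    try:
        return ast.literal_eval(text)
    except (ValueError, SyntaxError):
        # function calls, attribute access, names and the like end up here
        return None

print(parse_literal("[1, 2.0, 'three']"))
print(parse_literal("__import__('os').getcwd()"))

# The empty __builtins__ trick from above: __import__ cannot be found.
try:
    eval("__import__('os').getcwd()", {"__builtins__": {}})
    blocked = False
except NameError:
    blocked = True
print(blocked)
```

Neither approach limits CPU or memory use, so the caveat about resource attacks still applies.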
The eval function only works for simple expressions. To handle larger blocks of code, use the compile function and the exec statement (as Example 1-20 shows).
File: builtin-compile-example-1.py
NAME = "script.py"

BODY = """
prnt 'owl-stretching time'
"""

try:
    compile(BODY, NAME, "exec")
except SyntaxError, v:
    print "syntax error:", v, "in", NAME

# syntax error: invalid syntax in script.py
On success, the compile function returns a code object, which you can execute with the exec statement, as Example 1-21 shows.
File: builtin-compile-example-2.py
BODY = """
print 'the ant, an introduction'
"""
code = compile(BODY, "<script>", "exec")
print code
exec code
<code object ? at 8c6be0, file "<script>", line 0>
the ant, an introduction
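In Python 3, exec is a function rather than a statement, and it is common to run compiled code in an explicit dictionary so that the names it defines can be inspected afterwards. A minimal sketch under that assumption (the script body and the names greet and answer are made up for illustration):

```python
BODY = """
def greet(name):
    return 'hello, ' + name
answer = 6 * 7
"""

code = compile(BODY, "<script>", "exec")   # raises SyntaxError for bad source

namespace = {}                 # the compiled code runs in this dictionary
exec(code, namespace)

print(namespace["answer"])
print(namespace["greet"]("ant"))

# A syntax error is reported by compile, before anything is executed.
try:
    compile("prnt 'owl-stretching time'", "script.py", "exec")
    compiled_ok = True
except SyntaxError:
    compiled_ok = False
```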
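The class in Example 1-22 lets you generate code on the fly while the program runs. The write method adds statements, and the indent and dedent methods control the indentation structure. The class takes care of the rest.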
File: builtin-compile-example-3.py
import sys, string

class CodeGeneratorBackend:
    "Simple code generator for Python"

    def begin(self, tab="\t"):
        self.code = []
        self.tab = tab
        self.level = 0

    def end(self):
        self.code.append("") # make sure there's a newline at the end
        return compile(string.join(self.code, "\n"), "<code>", "exec")

    def write(self, string):
        self.code.append(self.tab * self.level + string)

    def indent(self):
        self.level = self.level + 1
        # in 2.0 and later, this can be written as: self.level += 1

    def dedent(self):
        if self.level == 0:
            raise SyntaxError, "internal error in code generator"
        self.level = self.level - 1
        # or: self.level -= 1

#
# try it out!

c = CodeGeneratorBackend()
c.begin()
c.write("for i in range(5):")
c.indent()
c.write("print 'code generation made easy!'")
c.dedent()
exec c.end()
code generation made easy!
code generation made easy!
code generation made easy!
code generation made easy!
code generation made easy!
Python also provides the execfile function, a shortcut for loading code from a file, compiling it, and executing it. Example 1-23 shows how to use this function.
File: builtin-execfile-example-1.py
execfile("hello.py")

def EXECFILE(filename, locals=None, globals=None):
    exec compile(open(filename).read(), filename, "exec") in locals, globals

EXECFILE("hello.py")
hello again, and welcome to the show
hello again, and welcome to the show
Example 1-24 shows the hello.py file used in Example 1-23.
File: hello.py
print "hello again, and welcome to the show"
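Because Python looks in the local and module namespaces before it looks at the built-in functions, you may sometimes need to refer to the __builtin__ module explicitly. For instance, Example 1-25 overrides the built-in open function. To reach the original open function from inside the new one, the script has to name the module explicitly.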
File: builtin-open-example-1.py
def open(filename, mode="rb"):
    import __builtin__
    file = __builtin__.open(filename, mode)
    if file.read(5) not in("GIF87", "GIF89"):
        raise IOError, "not a GIF file"
    file.seek(0)
    return file

fp = open("samples/sample.gif")
print len(fp.read()), "bytes"

fp = open("samples/sample.jpg")
print len(fp.read()), "bytes"
3565 bytes
Traceback (innermost last):
File "builtin-open-example-1.py", line 12, in ?
File "builtin-open-example-1.py", line 5, in open
IOError: not a GIF file
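[Feather's note: see what this open() function does? It checks whether a file is a GIF file; image formats like this one generally start with a fixed signature at the beginning of the file. Also, file() is recommended over open() for opening files, although for now there is no difference.]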
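The exceptions module provides the standard exception hierarchy. Python imports it automatically when it starts, and adds its contents to the __builtin__ module. In other words, you normally don't need to import this module yourself.
In version 1.5.2 it is an ordinary module; in 2.0 and later it is a built-in module.
The module defines the following standard exceptions: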
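SystemExit
Raised by the sys.exit function. If it is not caught by a try-except statement at the top level, the interpreter shuts down without printing a traceback.
KeyboardInterrupt
[...] can cause strange problems when combined with catch-all try-except statements.
OSError
Errors raised by the os module.
WindowsError
Windows-specific errors raised by the os module.
TabError
May be raised when the -tt option detects inconsistent indentation. This exception exists only in 2.0 and later; earlier versions raise a SyntaxError exception instead.
AssertionError
Raised when an assert statement fails (that is, when the expression is false).
SystemError
[...] "eval_code2: NULL globals"). In five years of programming, the author of this book has never seen this error (presumably by never writing raise SystemError).
You can create your own exception classes. Just inherit from the built-in Exception class (or from a suitable subclass of it), and override its __str__ method if needed. Example 1-26 shows how to use the exceptions module.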
File: exceptions-example-1.py
# python imports this module by itself, so the following
# line isn't really needed
# import exceptions

class HTTPError(Exception):
    # indicates an HTTP protocol error
    def __init__(self, url, errcode, errmsg):
        self.url = url
        self.errcode = errcode
        self.errmsg = errmsg
    def __str__(self):
        return (
            "<HTTPError for %s: %s %s>" %
            (self.url, self.errcode, self.errmsg)
            )

try:
    raise HTTPError("http://www.python.org/foo", 200, "Not Found")
except HTTPError, error:
    print "url", "=>", error.url
    print "errcode", "=>", error.errcode
    print "errmsg", "=>", error.errmsg
    raise # reraise exception
url => http://www.python.org/foo
errcode => 200
errmsg => Not Found
Traceback (innermost last):
File "exceptions-example-1", line 16, in ?
HTTPError: <HTTPError for http://www.python.org/foo: 200 Not Found>
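For comparison, here is a minimal Python 3 sketch of the same custom exception. It assumes Python 3 syntax (except ... as, and no separate exceptions module); the 404 status code is an illustrative choice made here, not a value taken from the example above.

```python
class HTTPError(Exception):
    """Indicates an HTTP protocol error."""

    def __init__(self, url, errcode, errmsg):
        # hand the values to Exception too, so that .args is populated
        super().__init__(url, errcode, errmsg)
        self.url = url
        self.errcode = errcode
        self.errmsg = errmsg

    def __str__(self):
        return "<HTTPError for %s: %s %s>" % (self.url, self.errcode, self.errmsg)

try:
    raise HTTPError("http://www.python.org/foo", 404, "Not Found")
except HTTPError as error:
    caught = error              # keep a reference; "error" is unbound after the block
    message = str(error)

print(message)
```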
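Most functions in the os module are implemented by platform-specific modules, such as posix and nt. The os module automatically loads the right implementation module when it is first imported.
The built-in open / file function is used to create, open, and edit files, as Example 1-27 shows. The os module provides the additional functions you need to rename and remove files.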
File: os-example-3.py
import os
import string

def replace(file, search_for, replace_with):
    # replace strings in a text file

    back = os.path.splitext(file)[0] + ".bak"
    temp = os.path.splitext(file)[0] + ".tmp"

    try:
        # remove old temp file, if any
        os.remove(temp)
    except os.error:
        pass

    fi = open(file)
    fo = open(temp, "w")

    for s in fi.readlines():
        fo.write(string.replace(s, search_for, replace_with))

    fi.close()
    fo.close()

    try:
        # remove old backup file, if any
        os.remove(back)
    except os.error:
        pass

    # rename original to backup...
    os.rename(file, back)

    # ...and temporary to original
    os.rename(temp, file)

#
# try it out!

file = "samples/sample.txt"

replace(file, "hello", "tjena")
replace(file, "tjena", "hello")
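The os module also contains a number of functions that work on directories.
The listdir function returns a list of all file names (including directory names) in a given directory, as Example 1-28 shows. The current-directory and parent-directory markers used on Unix and Windows (. and ..) are not included in the list.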
File: os-example-5.py
import os

for file in os.listdir("samples"):
    print file
sample.au
sample.jpg
sample.wav
...
The getcwd and chdir functions get and set the current working directory, respectively, as Example 1-29 shows.
File: os-example-4.py
import os
# where are we?
cwd = os.getcwd()
print "1", cwd
# go down
os.chdir("samples")
print "2", os.getcwd()
# go back up
os.chdir(os.pardir)
print "3", os.getcwd()
1 /ematter/librarybook
2 /ematter/librarybook/samples
3 /ematter/librarybook
The makedirs and removedirs functions create and remove directory hierarchies, as Example 1-30 shows.
File: os-example-6.py
import os
os.makedirs("test/multiple/levels")
fp = open("test/multiple/levels/file", "w")
fp.write("inspector praline")
fp.close()
# remove the file
os.remove("test/multiple/levels/file")
# and all empty directories above it
os.removedirs("test/multiple/levels")
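The removedirs function removes all empty directories along the given path, starting with the last directory in the path. The mkdir and rmdir functions, in contrast, handle only a single directory level, as Example 1-31 shows.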
File: os-example-7.py
import os
os.mkdir("test")
os.rmdir("test")
os.rmdir("samples") # this will fail
Traceback (innermost last):
File "os-example-7", line 6, in ?
OSError: [Errno 41] Directory not empty: 'samples'
To remove a non-empty directory, you can use the rmtree function in the shutil module.
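The contrast between rmdir and rmtree can be sketched as follows. This is a Python 3 illustration that works inside a temporary directory (an assumption made here so that nothing in the current directory is touched); the directory and file names are borrowed from Example 1-30.

```python
import os
import shutil
import tempfile

base = tempfile.mkdtemp()                       # scratch area for the demo
nested = os.path.join(base, "test", "multiple", "levels")
os.makedirs(nested)
with open(os.path.join(nested, "file"), "w") as fp:
    fp.write("inspector praline")

top = os.path.join(base, "test")

# rmdir only removes empty directories, so this attempt fails.
try:
    os.rmdir(top)
    rmdir_failed = False
except OSError:
    rmdir_failed = True

# rmtree removes the directory together with everything below it.
shutil.rmtree(top)
print(rmdir_failed, os.path.exists(top))

os.rmdir(base)                                  # the scratch area is empty again
```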
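The stat function fetches information about an existing file, as Example 1-32 shows. It returns a tuple-like object (a stat_result object with 10 elements) containing, in order: st_mode (permission mode), st_ino (inode number), st_dev (device), st_nlink (number of hard links), st_uid (owner's user ID), st_gid (owner's group ID), st_size (file size in bytes), st_atime (time of last access), st_mtime (time of last modification), and st_ctime (platform dependent: time of the last metadata change on Unix, creation time on Windows). These items can also be accessed as attributes.
[Feather's note: the original text says a 9-tuple. Also, the returned object is not a real tuple but a struct.]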
File: os-example-1.py
import os
import time

file = "samples/sample.jpg"

def dump(st):
    mode, ino, dev, nlink, uid, gid, size, atime, mtime, ctime = st
    print "- size:", size, "bytes"
    print "- owner:", uid, gid
    print "- created:", time.ctime(ctime)
    print "- last accessed:", time.ctime(atime)
    print "- last modified:", time.ctime(mtime)
    print "- mode:", oct(mode)
    print "- inode/dev:", ino, dev

#
# get stats for a filename

st = os.stat(file)

print "stat", file
dump(st)

#
# get stats for an open file

fp = open(file)

st = os.fstat(fp.fileno())

print "fstat", file
dump(st)
stat samples/sample.jpg
- size: 4762 bytes
- owner: 0 0
- created: Tue Sep 07 22:45:58 1999
- last accessed: Sun Sep 19 00:00:00 1999
- last modified: Sun May 19 01:42:16 1996
- mode: 0100666
- inode/dev: 0 2
fstat samples/sample.jpg
- size: 4762 bytes
- owner: 0 0
- created: Tue Sep 07 22:45:58 1999
- last accessed: Sun Sep 19 00:00:00 1999
- last modified: Sun May 19 01:42:16 1996
- mode: 0100666
- inode/dev: 0 0
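Some attributes of the returned object are meaningless on non-Unix platforms. For example, (st_ino, st_dev) uniquely identifies each file on Unix, but may contain arbitrary, meaningless data on other platforms.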
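The two ways of reading a stat result, by index and by attribute, give the same values. A minimal Python 3 sketch, assuming a temporary file created just for the demonstration:

```python
import os
import stat
import tempfile

# create a small scratch file to inspect
with tempfile.NamedTemporaryFile(delete=False) as fp:
    fp.write(b"inspector praline")
    name = fp.name

st = os.stat(name)

size_by_attribute = st.st_size          # attribute access
size_by_index = st[stat.ST_SIZE]        # tuple-style access with a stat constant
print(size_by_attribute, size_by_index)

os.remove(name)
```

Attribute access is usually easier to read; the index constants from the stat module are what the examples in this chapter use.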
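The stat module contains many constants and functions for working with this return value. The code below shows some of them.
You can use the chmod and utime functions to change a file's permission mode and time attributes, as Example 1-33 shows.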
File: os-example-2.py
import os
import stat, time

infile = "samples/sample.jpg"
outfile = "out.jpg"

# copy contents
fi = open(infile, "rb")
fo = open(outfile, "wb")

while 1:
    s = fi.read(10000)
    if not s:
        break
    fo.write(s)

fi.close()
fo.close()

# copy mode and timestamp
st = os.stat(infile)
os.chmod(outfile, stat.S_IMODE(st[stat.ST_MODE]))
os.utime(outfile, (st[stat.ST_ATIME], st[stat.ST_MTIME]))

print "original", "=>"
print "mode", oct(stat.S_IMODE(st[stat.ST_MODE]))
print "atime", time.ctime(st[stat.ST_ATIME])
print "mtime", time.ctime(st[stat.ST_MTIME])

print "copy", "=>"
st = os.stat(outfile)
print "mode", oct(stat.S_IMODE(st[stat.ST_MODE]))
print "atime", time.ctime(st[stat.ST_ATIME])
print "mtime", time.ctime(st[stat.ST_MTIME])
original =>
mode 0666
atime Thu Oct 14 15:15:50 1999
mtime Mon Nov 13 15:42:36 1995
copy =>
mode 0666
atime Thu Oct 14 15:15:50 1999
mtime Mon Nov 13 15:42:36 1995
The system function runs a new command under the current process and waits for it to finish, as Example 1-34 shows.
File: os-example-8.py
import os

if os.name == "nt":
    command = "dir"
else:
    command = "ls -l"

os.system(command)
-rwxrw-r-- 1 effbot effbot 76 Oct 9 14:17 README
-rwxrw-r-- 1 effbot effbot 1727 Oct 7 19:00 SimpleAsyncHTTP.py
-rwxrw-r-- 1 effbot effbot 314 Oct 7 20:29 aifc-example-1.py
-rwxrw-r-- 1 effbot effbot 259 Oct 7 20:38 anydbm-example-1.py
...
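The command is run via the operating system's standard shell, and system returns the shell's exit status. Note that under Windows 95/98 the shell is usually command.com, whose exit status is always 0.
Since os.system passes the command straight to the shell, it is dangerous to use if you don't check the arguments (consider the command os.system("viewer %s" % file) with the file variable set to "sample.jpg; rm -rf $HOME"). If you aren't sure the arguments are safe, it is better to use exec or spawn instead (described later).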
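On current Python versions, the usual way to avoid the shell is the subprocess module, which is not covered in this chapter and is swapped in here purely as an illustration. The sketch assumes Python 3.7 or later and uses the running interpreter as a stand-in for the "viewer" program; arguments are passed as a list, so a hostile-looking file name is delivered as one plain argument and never interpreted by a shell.

```python
import subprocess
import sys

# A file name that would be dangerous if pasted into a shell command line
filename = "sample.jpg; echo INJECTED"

# Each list item becomes exactly one argument of the child process.
completed = subprocess.run(
    [sys.executable, "-c", "import sys; print(sys.argv[1])", filename],
    capture_output=True,
    text=True,
)

print(completed.returncode)
print(completed.stdout.strip())   # the child saw the name as a single argument
```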
The exec functions replace the current process with a new one (they "go to" the new process, so to speak). In Example 1-35, the string "goodbye" is never printed.
File: os-exec-example-1.py
import os
import sys
program = "python"
arguments = ["hello.py"]
print os.execvp(program, (program,) + tuple(arguments))
print "goodbye"
hello again, and welcome to the show
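Python provides a whole family of exec functions with slightly different behavior. Example 1-35 uses the execvp function, which searches for the program along the standard path, passes the contents of the second argument (a tuple) as individual arguments to that program, and runs it with the current set of environment variables. See the Python Library Reference for the other seven variants.
On Unix, you can call another program from the current one by combining the exec, fork, and wait functions, as Example 1-36 shows. The fork function makes a copy of the current process, and the wait function waits for a child process to finish.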
File: os-exec-example-2.py
import os
import sys

def run(program, *args):
    pid = os.fork()
    if not pid:
        os.execvp(program, (program,) + args)
    return os.wait()[0]

run("python", "hello.py")

print "goodbye"
hello again, and welcome to the show
goodbye
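The fork function returns 0 in the child process (returning from fork is the first thing that happens in that process) and a non-zero process identifier (the child's PID) in the parent process. In other words, "not pid" is true only when we are in the child process.
The fork and wait functions are not available on Windows, but you can use the spawn function instead, as Example 1-37 shows. However, spawn does not search the path for the executable, so you have to handle that yourself.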
File: os-spawn-example-1.py
import os
import string

def run(program, *args):
    # find executable
    for path in string.split(os.environ["PATH"], os.pathsep):
        file = os.path.join(path, program) + ".exe"
        try:
            return os.spawnv(os.P_WAIT, file, (file,) + args)
        except os.error:
            pass
    raise os.error, "cannot find executable"

run("python", "hello.py")

print "goodbye"
hello again, and welcome to the show
goodbye
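The spawn function can also be used to run a program in the background. Example 1-38 adds an optional mode keyword argument to the run function; when it is set to os.P_NOWAIT, the script doesn't wait for the child program to finish. With the default value, os.P_WAIT, spawn waits until the child process has finished.
Other mode flags are os.P_OVERLAY, which makes spawn behave like exec, and os.P_DETACH, which runs the child process in the background, detached from the current console and keyboard focus.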
File: os-spawn-example-2.py
import os
import string

def run(program, *args, **kw):
    # find executable
    mode = kw.get("mode", os.P_WAIT)
    for path in string.split(os.environ["PATH"], os.pathsep):
        file = os.path.join(path, program) + ".exe"
        try:
            return os.spawnv(mode, file, (file,) + args)
        except os.error:
            pass
    raise os.error, "cannot find executable"

run("python", "hello.py", mode=os.P_NOWAIT)
print "goodbye"
goodbye
hello again, and welcome to the show
Example 1-39 provides a spawn function that works on both Unix and Windows.
File: os-spawn-example-3.py
import os
import string

if os.name in ("nt", "dos"):
    exefile = ".exe"
else:
    exefile = ""

def spawn(program, *args):
    try:
        # possible 2.0 shortcut! (spawnvp takes the mode as its first argument)
        return os.spawnvp(os.P_WAIT, program, (program,) + args)
    except AttributeError:
        pass
    try:
        spawnv = os.spawnv
    except AttributeError:
        # assume it's unix
        pid = os.fork()
        if not pid:
            os.execvp(program, (program,) + args)
        return os.wait()[0]
    else:
        # got spawnv but no spawnvp: go look for an executable
        for path in string.split(os.environ["PATH"], os.pathsep):
            file = os.path.join(path, program) + exefile
            try:
                return spawnv(os.P_WAIT, file, (file,) + args)
            except os.error:
                pass
        raise IOError, "cannot find executable"

#
# try it out!

spawn("python", "hello.py")

print "goodbye"
hello again, and welcome to the show
goodbye
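Example 1-39 first tries to call the spawnvp function. If it doesn't exist (it is missing from some versions and platforms), the function looks for a function named spawnv and searches the path for the program itself. As a last resort, it falls back on exec and fork.
On Unix, you can use the fork function to turn the current process into a background process (a "daemon"). Basically, you fork off a copy of the current process and terminate the original process, as Example 1-40 shows.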
File: os-example-14.py
import os
import time

pid = os.fork()
if pid:
    os._exit(0) # kill original

print "daemon started"
time.sleep(10)
print "daemon terminated"
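Creating a real daemon is a bit more involved. First, call setpgrp to make the new process a "process group leader". Otherwise, signals sent to an unrelated process group could cause problems in your daemon:
os.setpgrp()
To make sure that files created by the daemon really get the mode flags specified by the program, it is also a good idea to clear the user mode mask:
os.umask(0)
Then, redirect the stdout/stderr files rather than just closing them (if you simply close them, you may get unexpected problems when some part of your program writes to stdout or stderr).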
class NullDevice:
    def write(self, s):
        pass

sys.stdin.close()
sys.stdout = NullDevice()
sys.stderr = NullDevice()
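In other words, Python's print and C's printf/fprintf won't crash your program when the devices have been disconnected, but sys.stdout.write() raises an IOError exception when the program runs as a daemon, even though the same program works fine in the foreground.
Also, the _exit function used in the earlier example terminates the current process. sys.exit is different: if the caller catches the SystemExit exception, the program keeps running, as Example 1-41 shows.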
File: os-example-9.py
import os
import sys

try:
    sys.exit(1)
except SystemExit, value:
    print "caught exit(%s)" % value

try:
    os._exit(2)
except SystemExit, value:
    print "caught exit(%s)" % value

print "bye!"
caught exit(1)
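The catchable half of this behaviour can be sketched in Python 3 as follows. The run helper is a hypothetical name introduced here; os._exit is deliberately not called, because it would end the interpreter on the spot.

```python
import sys

def run(func):
    """Call func and report the exit code if it tried to exit via sys.exit."""
    try:
        func()
    except SystemExit as exc:
        # sys.exit only raises an exception, so the caller can intercept it
        return exc.code
    return None

print(run(lambda: sys.exit(1)))   # the exit request is caught, not obeyed
print(run(lambda: None))          # nothing tried to exit
print("bye!")
```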
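The os.path module contains functions that deal with long file names (path names) in various ways. Import the os module first, and then access this module as os.path.
The os.path module contains a number of functions that handle long file names in a platform-independent way. In other words, you don't have to deal with forward slashes, backslashes, colons, and the like yourself. Example 1-42 shows some sample code.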
File: os-path-example-1.py
import os

filename = "my/little/pony"

print "using", os.name, "..."
print "split", "=>", os.path.split(filename)
print "splitext", "=>", os.path.splitext(filename)
print "dirname", "=>", os.path.dirname(filename)
print "basename", "=>", os.path.basename(filename)
print "join", "=>", os.path.join(os.path.dirname(filename),
                                 os.path.basename(filename))
using nt ...
split => ('my/little', 'pony')
splitext => ('my/little/pony', '')
dirname => my/little
basename => pony
join => my/little/pony
Note that split only splits off the last item (without the slash).
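To see what each function returns for a name that does have an extension, here is a short Python 3 sketch. It assumes the posixpath module (the implementation behind os.path on Unix) so that the results are the same on every platform; the file name is made up for illustration.

```python
import posixpath

filename = "my/little/pony.txt"

head, tail = posixpath.split(filename)      # everything before / after the last slash
root, ext = posixpath.splitext(filename)    # name without / with the extension

print(head, tail)
print(root, ext)

# dirname and basename are the two halves of split; join puts them back together
rebuilt = posixpath.join(posixpath.dirname(filename), posixpath.basename(filename))
print(rebuilt)
```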
The os.path module also contains a number of functions that let you quickly find out what a file name represents, as Example 1-43 shows.
File: os-path-example-2.py
import os

FILES = (
    os.curdir,
    "/",
    "file",
    "/file",
    "samples",
    "samples/sample.jpg",
    "directory/file",
    "../directory/file",
    "/directory/file"
    )

for file in FILES:
    print file, "=>",
    if os.path.exists(file):
        print "EXISTS",
    if os.path.isabs(file):
        print "ISABS",
    if os.path.isdir(file):
        print "ISDIR",
    if os.path.isfile(file):
        print "ISFILE",
    if os.path.islink(file):
        print "ISLINK",
    if os.path.ismount(file):
        print "ISMOUNT",
    print # end the output line
. => EXISTS ISDIR
/ => EXISTS ISABS ISDIR ISMOUNT
file =>
/file => ISABS
samples => EXISTS ISDIR
samples/sample.jpg => EXISTS ISFILE
directory/file =>
../directory/file =>
/directory/file => ISABS
The expanduser function treats a user-name shortcut (~) the same way most Unix shells do (it doesn't work well on Windows), as Example 1-44 shows.
File: os-path-expanduser-example-1.py
import os
print os.path.expanduser("~/.pythonrc")
# /home/effbot/.pythonrc
The expandvars function replaces environment variables in a file name with their values, as Example 1-45 shows.
File: os-path-expandvars-example-1.py
import os
os.environ["USER"] = "user"
print os.path.expandvars("/home/$USER/config")
print os.path.expandvars("$USER/folders")
/home/user/config
user/folders
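One detail worth knowing is what happens when a variable is not set. The Python 3 sketch below assumes two made-up variable names, LIBRARYBOOK_USER and LIBRARYBOOK_UNSET, chosen so that they are unlikely to clash with anything in a real environment.

```python
import os

os.environ["LIBRARYBOOK_USER"] = "user"      # hypothetical variable, set for the demo
os.environ.pop("LIBRARYBOOK_UNSET", None)    # make sure this one is not defined

expanded = os.path.expandvars("/home/$LIBRARYBOOK_USER/config")
untouched = os.path.expandvars("/home/$LIBRARYBOOK_UNSET/config")

print(expanded)
print(untouched)     # references to unknown variables are left as they are
```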
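The walk function helps you find all files in a directory tree (as Example 1-46 shows). It takes, in order, a directory name, a callback function, and a data object that is passed on to the callback.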
File: os-path-walk-example-1.py
import os

def callback(arg, directory, files):
    for file in files:
        print os.path.join(directory, file), repr(arg)

os.path.walk(".", callback, "secret message")
./aifc-example-1.py 'secret message'
./anydbm-example-1.py 'secret message'
./array-example-1.py 'secret message'
...
./samples 'secret message'
./samples/sample.jpg 'secret message'
./samples/sample.txt 'secret message'
./samples/sample.zip 'secret message'
./samples/articles 'secret message'
./samples/articles/article-1.txt 'secret message'
./samples/articles/article-2.txt 'secret message'
...
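The walk function has a somewhat obscure interface (maybe it's just me, but I can never remember the order of the arguments). The index function in Example 1-47 returns a list of file names instead, so you can process the files with a plain for-in loop.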
File: os-path-walk-example-2.py
import os

def index(directory):
    # like os.listdir, but traverses directory trees
    stack = [directory]
    files = []
    while stack:
        directory = stack.pop()
        for file in os.listdir(directory):
            fullname = os.path.join(directory, file)
            files.append(fullname)
            if os.path.isdir(fullname) and not os.path.islink(fullname):
                stack.append(fullname)
    return files

for file in index("."):
    print file
./aifc-example-1.py
./anydbm-example-1.py
./array-example-1.py
...
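If you'd rather not build a list of all the files (for performance or memory reasons), Example 1-48 shows a different approach. Here, the DirectoryWalker class behaves like a sequence object, returning one file at a time. (Translator's aside: a generator?)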
File: os-path-walk-example-3.py
import os

class DirectoryWalker:
    # a forward iterator that traverses a directory tree

    def __init__(self, directory):
        self.stack = [directory]
        self.files = []
        self.index = 0

    def __getitem__(self, index):
        while 1:
            try:
                file = self.files[self.index]
                self.index = self.index + 1
            except IndexError:
                # pop next directory from stack
                self.directory = self.stack.pop()
                self.files = os.listdir(self.directory)
                self.index = 0
            else:
                # got a filename
                fullname = os.path.join(self.directory, file)
                if os.path.isdir(fullname) and not os.path.islink(fullname):
                    self.stack.append(fullname)
                return fullname

for file in DirectoryWalker("."):
    print file
./aifc-example-1.py
./anydbm-example-1.py
./array-example-1.py
...
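Note that the DirectoryWalker class doesn't check the index passed to the __getitem__ method. This means that it won't work properly if you access the sequence members out of order (for example, by jumping ahead to a larger index).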
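In Python versions that have generators (added after the releases this book targets), the same one-file-at-a-time traversal can be written without the index bookkeeping. The following is a Python 3 sketch, not the book's code; walk_files is a hypothetical name, and the small directory tree is created in a temporary location just for the demonstration.

```python
import os
import tempfile

def walk_files(directory):
    """Yield every path below directory, one at a time, without building a list."""
    stack = [directory]
    while stack:
        current = stack.pop()
        for name in os.listdir(current):
            fullname = os.path.join(current, name)
            if os.path.isdir(fullname) and not os.path.islink(fullname):
                stack.append(fullname)     # visit this subdirectory later
            yield fullname

# build a tiny tree to walk: demo_root/samples/sample.txt
demo_root = tempfile.mkdtemp()
os.makedirs(os.path.join(demo_root, "samples"))
with open(os.path.join(demo_root, "samples", "sample.txt"), "w") as fp:
    fp.write("hello")

for path in walk_files(demo_root):
    print(path)
```

Because a generator can only be consumed in order, the out-of-order access problem described above cannot arise.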
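Finally, if you're interested in file sizes or timestamps, Example 1-49 shows a version of the class that returns both the file name and the tuple returned by os.stat. This version saves one or two stat calls for each file (both os.path.isdir and os.path.islink use stat internally), and runs noticeably faster on some platforms.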
File: os-path-walk-example-4.py
import os, stat

class DirectoryStatWalker:
    # a forward iterator that traverses a directory tree, and
    # returns the filename and additional file information

    def __init__(self, directory):
        self.stack = [directory]
        self.files = []
        self.index = 0

    def __getitem__(self, index):
        while 1:
            try:
                file = self.files[self.index]
                self.index = self.index + 1
            except IndexError:
                # pop next directory from stack
                self.directory = self.stack.pop()
                self.files = os.listdir(self.directory)
                self.index = 0
            else:
                # got a filename
                fullname = os.path.join(self.directory, file)
                st = os.stat(fullname)
                mode = st[stat.ST_MODE]
                if stat.S_ISDIR(mode) and not stat.S_ISLNK(mode):
                    self.stack.append(fullname)
                return fullname, st

for file, st in DirectoryStatWalker("."):
    print file, st[stat.ST_SIZE]
./aifc-example-1.py 336
./anydbm-example-1.py 244
./array-example-1.py 526
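The three walkers above were written for Python 1.5.2 and 2.0. As a hedged point of comparison, the same "one name at a time" traversal can be sketched in modern Python 3 with a generator built on os.walk (added to the standard library in Python 2.3). The names iter_files and iter_files_with_stat are made up for this sketch and are not part of the original examples.

```python
import os

def iter_files(directory):
    """Yield the full name of every file and subdirectory below `directory`."""
    # os.walk does not descend into symbolic links to directories by default,
    # which roughly matches the islink() check in the examples above.
    for dirpath, dirnames, filenames in os.walk(directory):
        for name in dirnames + filenames:
            yield os.path.join(dirpath, name)

def iter_files_with_stat(directory):
    """Yield (fullname, stat_result) pairs, like DirectoryStatWalker."""
    for fullname in iter_files(directory):
        # lstat reports on a link itself instead of following it
        yield fullname, os.lstat(fullname)

if __name__ == "__main__":
    for fullname, st in iter_files_with_stat("."):
        print(fullname, st.st_size)
```

Because both functions are generators, nothing is listed until the loop asks for the next name, which is the same memory-saving idea as the DirectoryWalker class.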
Example 1-50 shows the basic usage of the stat module, which contains a number of constants and test functions for use with the os.stat function.
File: stat-example-1.py
import stat
import os, time

st = os.stat("samples/sample.txt")

print "mode", "=>", oct(stat.S_IMODE(st[stat.ST_MODE]))

print "type", "=>",
if stat.S_ISDIR(st[stat.ST_MODE]):
    print "DIRECTORY",
if stat.S_ISREG(st[stat.ST_MODE]):
    print "REGULAR",
if stat.S_ISLNK(st[stat.ST_MODE]):
    print "LINK",
print

print "size", "=>", st[stat.ST_SIZE]

print "last accessed", "=>", time.ctime(st[stat.ST_ATIME])
print "last modified", "=>", time.ctime(st[stat.ST_MTIME])
print "inode changed", "=>", time.ctime(st[stat.ST_CTIME])
mode => 0664
type => REGULAR
size => 305
last accessed => Sun Oct 10 22:12:30 1999
last modified => Sun Oct 10 18:39:37 1999
inode changed => Sun Oct 10 15:26:38 1999
The string module contains a number of functions for processing standard Python strings, as shown in Example 1-51.
File: string-example-1.py
import string
text = "Monty Python's Flying Circus"
print "upper", "=>", string.upper(text)
print "lower", "=>", string.lower(text)
print "split", "=>", string.split(text)
print "join", "=>", string.join(string.split(text), "+")
print "replace", "=>", string.replace(text, "Python", "Java")
print "find", "=>", string.find(text, "Python"), string.find(text, "Java")
print "count", "=>", string.count(text, "n")
upper => MONTY PYTHON'S FLYING CIRCUS
lower => monty python's flying circus
split => ['Monty', "Python's", 'Flying', 'Circus']
join => Monty+Python's+Flying+Circus
replace => Monty Java's Flying Circus
find => 6 -1
count => 3
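In Python 1.5.2 and earlier, the string module uses functions from the strop module to implement its functionality.
In Python 1.6 and later, most string operations are also available as string methods, as shown in Example 1-52. Many functions in the string module are simply wrappers around the corresponding string methods.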
File: string-example-2.py
text = "Monty Python's Flying Circus"
print "upper", "=>", text.upper()
print "lower", "=>", text.lower()
print "split", "=>", text.split()
print "join", "=>", "+".join(text.split())
print "replace", "=>", text.replace("Python", "Perl")
print "find", "=>", text.find("Python"), text.find("Perl")
print "count", "=>", text.count("n")
upper => MONTY PYTHON'S FLYING CIRCUS
lower => monty python's flying circus
split => ['Monty', "Python's", 'Flying', 'Circus']
join => Monty+Python's+Flying+Circus
replace => Monty Perl's Flying Circus
find => 6 -1
count => 3
In addition to the string-manipulation functions, the string module also contains a number of functions that convert strings to other types, as shown in Example 1-53.
File: string-example-3.py
import string

print int("4711"),
print string.atoi("4711"),
print string.atoi("11147", 8), # octal
print string.atoi("1267", 16), # hexadecimal
print string.atoi("3mv", 36) # whatever...

print string.atoi("4711", 0),
print string.atoi("04711", 0),
print string.atoi("0x4711", 0)

print float("4711"),
print string.atof("1"),
print string.atof("1.23e5")
4711 4711 4711 4711 4711
4711 2505 18193
4711.0 1.0 123000.0
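In most cases (especially if you're using 1.6 or later), you can use the int and float functions instead of their string module counterparts.
The atoi function takes an optional second argument, which specifies the number base. If the base is 0, the function looks at the first few characters of the string to decide which base to use: if the string starts with "0x", the base is 16 (hexadecimal), and if it starts with "0", the base is 8 (octal). The default is base 10 (decimal), which is used when you omit the argument.
In 1.6 and later, the int function also accepts a second argument, just like atoi. Unlike their string module counterparts, int and float also accept Unicode strings.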
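For readers on a current interpreter, here is a small Python 3 sketch of the same base handling using only the built-in int. The helper name parse_int is invented for illustration. One difference worth knowing: with base 0, Python 3 expects the "0o" prefix for octal and rejects a bare leading zero such as "04711".

```python
def parse_int(text, base=10):
    """Thin wrapper around int(), only here to make the base argument explicit."""
    return int(text, base)

print(parse_int("4711"))        # decimal
print(parse_int("11147", 8))    # octal digits, explicit base
print(parse_int("1267", 16))    # hexadecimal digits, explicit base
print(parse_int("3mv", 36))     # base 36 uses the digits 0-9 and a-z
print(parse_int("0x4711", 0))   # base 0: the prefix selects the base
print(parse_int("0o4711", 0))   # Python 3 spelling of an octal literal
```

The first four calls all describe the same number in different bases, which mirrors the first output line of Example 1-53.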
"Some people, when confronted with a problem, think 'I know, I'll use regular expressions.' Now they have two problems."
- Jamie Zawinski, on comp.lang.emacs
The re module provides a set of powerful regular expression tools that let you quickly check whether a given string matches a given pattern (using the match function), or contains such a pattern (using the search function). A regular expression is a string pattern written in a compact (and quite cryptic) syntax.
The match function attempts to match a pattern against the beginning of the given string, as shown in Example 1-54. If the pattern matches anything at all (including an empty string, if the pattern allows that!), match returns a match object. The group method can be used to find out what matched.
File: re-example-1.py
import re

text = "The Attila the Hun Show"

# a single character
m = re.match(".", text)
if m: print repr("."), "=>", repr(m.group(0))

# any string of characters
m = re.match(".*", text)
if m: print repr(".*"), "=>", repr(m.group(0))

# a string of letters (at least one)
m = re.match("\w+", text)
if m: print repr("\w+"), "=>", repr(m.group(0))

# a string of digits
m = re.match("\d+", text)
if m: print repr("\d+"), "=>", repr(m.group(0))
'.' => 'T'
'.*' => 'The Attila the Hun Show'
'\\w+' => 'The'
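You can use parentheses to mark regions in the pattern. If the pattern matches, the group method can be used to extract the contents of these regions, as shown in Example 1-55. group(1) returns the contents of the first group, group(2) returns the contents of the second, and so on. If you pass several group numbers to the group method, it returns a tuple.
File: re-example-2.py
import re

text ="10/15/99"

m = re.match("(\d{2})/(\d{2})/(\d{2,4})", text)
if m:
    print m.group(1, 2, 3)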
('10', '15', '99')
The search function looks for the pattern inside the string, as shown in Example 1-56. It tries to match the pattern at every possible character position, starting from the left, and returns a match object as soon as it has found a match. If the pattern doesn't match anywhere, it returns None.
File: re-example-3.py
import re

text = "Example 3: There is 1 date 10/25/95 in here!"

m = re.search("(\d{1,2})/(\d{1,2})/(\d{2,4})", text)

print m.group(1), m.group(2), m.group(3)

month, day, year = m.group(1, 2, 3)
print month, day, year

date = m.group(0)
print date
10 25 95
10 25 95
10/25/95
Example 1-57 shows the sub function, which replaces every match of a pattern with another string.
File: re-example-4.py
import re

text = "you're no fun anymore..."

# literal replace (string.replace is faster)
print re.sub("fun", "entertaining", text)

# collapse all non-letter sequences to a single dash
print re.sub("[^\w]+", "-", text)

# convert all words to beeps
print re.sub("\S+", "-BEEP-", text)
you're no entertaining anymore...
you-re-no-fun-anymore-
-BEEP- -BEEP- -BEEP- -BEEP-
You can also use sub to replace patterns via a callback function. Example 1-58 also shows how to precompile patterns.
File: re-example-5.py
import re
import string

text = "a line of text\\012another line of text\\012etc..."

def octal(match):
    # replace octal code with corresponding ASCII character
    return chr(string.atoi(match.group(1), 8))

octal_pattern = re.compile(r"\\(\d\d\d)")

print text
print octal_pattern.sub(octal, text)
a line of text\012another line of text\012etc...
a line of text
another line of text
etc...
If you don't compile, the re module caches compiled versions for you, so you usually don't have to compile regular expressions in small scripts. In Python 1.5.2, the cache holds 20 patterns; in 2.0, the cache size has been increased to 100 patterns.
Finally, Example 1-59 matches a string against a list of patterns. The list of patterns is combined into a single pattern, and precompiled to save time.
File: re-example-6.py
import re, string

def combined_pattern(patterns):
    p = re.compile(
        string.join(map(lambda x: "("+x+")", patterns), "|")
        )
    def fixup(v, m=p.match, r=range(0,len(patterns))):
        try:
            regs = m(v).regs
        except AttributeError:
            return None # no match, so m.regs will fail
        else:
            for i in r:
                if regs[i+1] != (-1, -1):
                    return i
    return fixup

#
# try it out!

patterns = [
    r"\d+",
    r"abc\d{2,4}",
    r"p\w+"
]

p = combined_pattern(patterns)

print p("129391")
print p("abc800")
print p("abc1600")
print p("python")
print p("perl")
print p("tcl")
0
1
1
2
2
None
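The same "which alternative matched?" idea can be sketched in Python 3 without inspecting the regs attribute, by reading the match object's lastindex attribute. This is an alternative formulation, not the book's method, and it rests on a stated assumption: the individual patterns contain no capturing groups of their own, so each alternative corresponds to exactly one group.

```python
import re

def combined_pattern(patterns):
    """Return a function mapping a string to the index of the pattern it matches."""
    # Wrap every alternative in its own group: (p0)|(p1)|(p2)...
    # Assumption: the patterns themselves contain no capturing groups.
    compiled = re.compile("|".join("(%s)" % p for p in patterns))

    def classify(text):
        m = compiled.match(text)
        if m is None:
            return None          # nothing matched at the start of the string
        # lastindex is the 1-based number of the group that took part in the match
        return m.lastindex - 1

    return classify

classify = combined_pattern([r"\d+", r"abc\d{2,4}", r"p\w+"])
for sample in ["129391", "abc800", "abc1600", "python", "perl", "tcl"]:
    print(sample, "=>", classify(sample))
```

If a pattern needs internal grouping, non-capturing groups of the form (?:...) keep the group numbering intact.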
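The math module implements a number of mathematical operations for floating-point numbers. The functions are generally thin wrappers around the platform C library functions of the same name, so results may vary slightly across platforms in normal cases, and a lot in exceptional cases. Example 1-60 demonstrates the use of the math module.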
File: math-example-1.py
import math
print "e", "=>", math.e
print "pi", "=>", math.pi
print "hypot", "=>", math.hypot(3.0, 4.0)
# and many others...
e => 2.71828182846
pi => 3.14159265359
hypot => 5.0
See the Python Library Reference for a full list of functions.
The cmath module shown in Example 1-61 contains a number of mathematical operations for complex numbers.
File: cmath-example-1.py
import cmath
print "pi", "=>", cmath.pi
print "sqrt(-1)", "=>", cmath.sqrt(-1)
pi => 3.14159265359
sqrt(-1) => 1j
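See the Python Library Reference for a full list of functions.
The operator module provides a "functional" interface to the standard operators in Python. The functions in this module can be used instead of some lambda constructs when processing data with functions like map and filter. They are also quite popular among programmers who like to write obscure code. Example 1-62 shows the general use of the operator module.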
File: operator-example-1.py
import operator
sequence = 1, 2, 4
print "add", "=>", reduce(operator.add, sequence)
print "sub", "=>", reduce(operator.sub, sequence)
print "mul", "=>", reduce(operator.mul, sequence)
print "concat", "=>", operator.concat("spam", "egg")
print "repeat", "=>", operator.repeat("spam", 5)
print "getitem", "=>", operator.getitem(sequence, 2)
print "indexOf", "=>", operator.indexOf(sequence, 2)
print "sequenceIncludes", "=>", operator.sequenceIncludes(sequence, 3)
add => 7
sub => -5
mul => 8
concat => spamegg
repeat => spamspamspamspamspam
getitem => 4
indexOf => 1
sequenceIncludes => 0
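As a rough Python 3 counterpart to Example 1-62 (a sketch, not the book's code): reduce now lives in functools, and the sequenceIncludes and repeat functions are gone, so the sketch uses operator.contains and operator.mul in their place. The results dictionary is only a convenience for this illustration.

```python
import operator
from functools import reduce

sequence = 1, 2, 4

results = {
    "add": reduce(operator.add, sequence),       # ((1 + 2) + 4)
    "sub": reduce(operator.sub, sequence),       # ((1 - 2) - 4)
    "mul": reduce(operator.mul, sequence),       # ((1 * 2) * 4)
    "concat": operator.concat("spam", "egg"),
    "repeat": operator.mul("spam", 5),           # stands in for operator.repeat
    "getitem": operator.getitem(sequence, 2),
    "indexOf": operator.indexOf(sequence, 2),
    "contains": operator.contains(sequence, 3),  # stands in for sequenceIncludes
}

for name, value in results.items():
    print(name, "=>", value)
```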
Example 1-63 shows some operator functions that can be used to check object types.
File: operator-example-2.py
import operator
import UserList

def dump(data):
    print type(data), "=>",
    if operator.isCallable(data):
        print "CALLABLE",
    if operator.isMappingType(data):
        print "MAPPING",
    if operator.isNumberType(data):
        print "NUMBER",
    if operator.isSequenceType(data):
        print "SEQUENCE",
    print

dump(0)
dump("string")
dump("string"[0])
dump([1, 2, 3])
dump((1, 2, 3))
dump({"a": 1})
dump(len) # function
dump(UserList) # module
dump(UserList.UserList) # class
dump(UserList.UserList()) # instance
<type 'int'> => NUMBER
<type 'string'> => SEQUENCE
<type 'string'> => SEQUENCE
<type 'list'> => SEQUENCE
<type 'tuple'> => SEQUENCE
<type 'dictionary'> => MAPPING
<type 'builtin_function_or_method'> => CALLABLE
<type 'module'> =>
<type 'class'> => CALLABLE
<type 'instance'> => MAPPING NUMBER SEQUENCE
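Note that the operator module doesn't handle object instances in a sane fashion: as the last output line shows, an instance is reported as a mapping, a number, and a sequence all at once. So be careful when you use the isNumberType, isMappingType, and isSequenceType functions; it's easy to make your code less flexible than it has to be.
Also note that a string sequence member (a single character) is itself a sequence. If you're writing a recursive function that uses isSequenceType to traverse an object tree, don't pass it an ordinary string (or any sequence that contains one).
The copy module contains two functions that are used to copy objects, as shown in Example 1-64.
copy(object) => object creates a "shallow" copy of the given object. In this context, shallow means that the object itself is copied, but if the object is a container, its members still refer to the original member objects.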
File: copy-example-1.py
import copy
a = [[1],[2],[3]]
b = copy.copy(a)
print "before", "=>"
print a
print b
# modify original
a[0][0] = 0
a[1] = None
print "after", "=>"
print a
print b
before =>
[[1], [2], [3]]
[[1], [2], [3]]
after =>
[[0], None, [3]]
[[0], [2], [3]]
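You can also make shallow copies of lists using the [:] syntax (a full slice), and you can copy dictionaries using the copy method.
In contrast, deepcopy(object) => object creates a "deep" copy of the given object, as shown in Example 1-65. If the object is a container, all members are copied as well, recursively.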
File: copy-example-2.py
import copy
a = [[1],[2],[3]]
b = copy.deepcopy(a)
print "before", "=>"
print a
print b
# modify original
a[0][0] = 0
a[1] = None
print "after", "=>"
print a
print b
before =>
[[1], [2], [3]]
[[1], [2], [3]]
after =>
[[0], None, [3]]
[[1], [2], [3]]
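The sys module provides a number of functions and variables that can be used to manipulate different parts of the Python runtime environment.
The argv list contains the arguments that were passed to the script when the interpreter was started, as shown in Example 1-66. The first item contains the name of the script itself.
File: sys-argv-example-1.py
import sys

print "script name is", sys.argv[0]

if len(sys.argv) > 1:
    print "there are", len(sys.argv)-1, "arguments:"
    for arg in sys.argv[1:]:
        print arg
else:
    print "there are no arguments!"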
script name is sys-argv-example-1.py
there are no arguments!
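If you read the script from standard input (like "python < sys-argv-example-1.py"), the script name is set to an empty string. If you pass in the program as a string (using the -c option), the script name is set to "-c".
The path list contains a list of directory names in which Python looks for extension modules (Python source modules, compiled modules, or binary extensions). When you start Python, this list is initialized from a mixture of built-in rules, the contents of the PYTHONPATH environment variable, and the registry contents (on Windows). Since it's an ordinary list, you can also manipulate it from within the program, as shown in Example 1-67.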
File: sys-path-example-1.py
import sys
print "path has", len(sys.path), "members"
# add the sample directory to the path
sys.path.insert(0, "samples")
import sample
# nuke the path
sys.path = []
import random # oops!
path has 7 members
this is the sample module!
Traceback (innermost last):
File "sys-path-example-1.py", line 11, in ?
import random # oops!
ImportError: No module named random
The builtin_module_names list contains the names of all modules built into the Python interpreter, as shown in Example 1-68.
File: sys-builtin-module-names-example-1.py
import sys

def dump(module):
    print module, "=>",
    if module in sys.builtin_module_names:
        print "<BUILTIN>"
    else:
        module = __import__(module)
        print module.__file__

dump("os")
dump("sys")
dump("string")
dump("strop")
dump("zlib")
os => C:\python\lib\os.pyc
sys => <BUILTIN>
string => C:\python\lib\string.pyc
strop => <BUILTIN>
zlib => C:\python\zlib.pyd
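The modules dictionary contains all loaded modules. The import statement checks this dictionary before it actually loads something from disk.
As you can see from Example 1-69, Python loads quite a few modules before it hands control over to your script.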
File: sys-modules-example-1.py
import sys
print sys.modules.keys()
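['os.path', 'os', 'exceptions', '__main__', 'ntpath', 'strop', 'nt',
'sys', '__builtin__', 'site', 'signal', 'UserDict', 'string', 'stat']
The getrefcount function (shown in Example 1-70) returns the reference count for a given object, that is, the number of places where the object is used. Python keeps track of this value, and when it drops to 0, the object is destroyed.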
File: sys-getrefcount-example-1.py
import sys
variable = 1234
print sys.getrefcount(0)
print sys.getrefcount(variable)
print sys.getrefcount(None)
50
3
192
Note that this value is always larger than the actual count, since the function itself hangs on to the object while determining the value.
Checking the Host Platform
Example 1-71 shows the platform variable, which contains the name of the host platform.
File: sys-platform-example-1.py
import sys

#
# emulate "import os.path" (sort of)...

if sys.platform == "win32":
    import ntpath
    pathmodule = ntpath
elif sys.platform == "mac":
    import macpath
    pathmodule = macpath
else:
    # assume it's a posix platform
    import posixpath
    pathmodule = posixpath

print pathmodule
Typical platform names are win32 for Windows 9X/NT and mac for Macintosh. On Unix systems, the platform name is usually derived from the output of the "uname -r" command, such as irix6, linux2, or sunos5 (Solaris).
The setprofile function allows you to install a profiling function. This function is called every time a function or method is called, at every return (explicit or implied), and for each exception. Let's look at Example 1-72.
File: sys-setprofiler-example-1.py
import sys

def test(n):
    j = 0
    for i in range(n):
        j = j + i
    return n

def profiler(frame, event, arg):
    print event, frame.f_code.co_name, frame.f_lineno, "->", arg

# profiler is activated on the next call, return, or exception
sys.setprofile(profiler)

# profile this function call
test(1)

# disable profiler
sys.setprofile(None)

# don't profile this call
test(2)
call test 3 -> None
return test 7 -> 1
The profile module provides a complete profiler framework based on this function.
The settrace function in Example 1-73 is similar, but the trace function is called for each new line the interpreter executes.
File: sys-settrace-example-1.py
import sys

def test(n):
    j = 0
    for i in range(n):
        j = j + i
    return n

def tracer(frame, event, arg):
    print event, frame.f_code.co_name, frame.f_lineno, "->", arg
    return tracer

# tracer is activated on the next call, return, or exception
sys.settrace(tracer)

# trace this function call
test(1)

# disable tracing
sys.settrace(None)

# don't trace this call
test(2)
call test 3 -> None
line test 3 -> None
line test 4 -> None
line test 5 -> None
line test 5 -> None
line test 6 -> None
line test 5 -> None
line test 7 -> None
return test 7 -> 1
The pdb module provides a complete debugger framework based on the tracing facilities offered by this function.
The stdin, stdout, and stderr variables contain stream objects corresponding to the standard I/O streams. You can access them directly if you need better control over the output than print can give you. You can also replace them, if you want to redirect output and input to some other device, or process them in some nonstandard way, as shown in Example 1-74.
File: sys-stdout-example-1.py
import sys
import string

class Redirect:
    def __init__(self, stdout):
        self.stdout = stdout
    def write(self, s):
        self.stdout.write(string.lower(s))

# redirect standard output (including the print statement)
old_stdout = sys.stdout
sys.stdout = Redirect(sys.stdout)

print "HEJA SVERIGE",
print "FRISKT HUMÖR"

# restore standard output
sys.stdout = old_stdout

print "MÅÅÅÅL!"
heja sverige friskt humör
MÅÅÅÅL!
All it takes to redirect output is an object that implements the write method.
(Unless it's a C type instance, that is: Python uses an integer attribute called softspace to control spacing in the output, and adds it to the object if it isn't there. You don't have to worry about this if you're using Python objects, but if you need to redirect to a C type, you should make sure that the type supports the softspace attribute.)
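The "any object with a write method" rule still holds in Python 3. The following sketch is an assumption-laden modern variant, not the book's code: it uses contextlib.redirect_stdout so the original stream is restored automatically, and the names LowercaseWriter and capture_lowercase are invented for the illustration.

```python
import contextlib
import io

class LowercaseWriter:
    """File-like wrapper that lowercases everything written through it."""
    def __init__(self, stream):
        self.stream = stream
    def write(self, text):
        return self.stream.write(text.lower())
    def flush(self):
        self.stream.flush()

def capture_lowercase(message):
    """Print `message` through the wrapper and return what was written."""
    buffer = io.StringIO()
    # redirect_stdout swaps sys.stdout for the duration of the with block
    with contextlib.redirect_stdout(LowercaseWriter(buffer)):
        print(message)
    return buffer.getvalue()

if __name__ == "__main__":
    print(repr(capture_lowercase("HEJA SVERIGE")))
```

Using a context manager avoids the need to remember the old_stdout assignment when an exception interrupts the redirected code.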
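When you reach the end of the main program, the interpreter is automatically terminated. If you need to exit in midflight, you can call the sys.exit function, which takes an optional integer value that is returned to the calling program. Example 1-75 shows this.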
File: sys-exit-example-1.py
import sys
print "hello"
sys.exit(1)
print "there"
hello
Note that sys.exit doesn't actually exit at once. Instead, it raises a SystemExit exception. This means that you can trap calls to sys.exit in your main program, as shown in Example 1-76.
File: sys-exit-example-2.py
import sys

print "hello"

try:
    sys.exit(1)
except SystemExit:
    pass

print "there"
hello
there
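A related detail that can be handy when trapping the exception: the value passed to sys.exit is available on the exception object as its code attribute. A minimal Python 3 sketch follows; the helper name run_and_report is hypothetical.

```python
import sys

def run_and_report(func):
    """Call func() and return the exit status it requested, or None."""
    try:
        func()
    except SystemExit as exc:
        # exc.code holds the argument given to sys.exit (None if omitted)
        return exc.code
    return None

if __name__ == "__main__":
    print(run_and_report(lambda: sys.exit(1)))
```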
If you want to clean things up after yourself (for example, by removing temporary files), you can install an "exit handler", a function that is automatically called on the way out, as shown in Example 1-77.
File: sys-exitfunc-example-1.py
import sys

def exitfunc():
    print "world"

sys.exitfunc = exitfunc

print "hello"
sys.exit(1)
print "there" # never printed
hello
world
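In Python 2.0 and later, you can use the atexit module to register more than one exit handler.
(New in 2.0) The atexit module allows you to register one or more functions that are called automatically when the interpreter is terminated.
To register a function, simply call the register function, as shown in Example 1-78. You can also add one or more extra arguments, which are passed as arguments to the exit function.
File: atexit-example-1.py
import atexit

def exit(*args):
    print "exit", args

# register three exit handlers
atexit.register(exit)
atexit.register(exit, 1)
atexit.register(exit, "hello", "world")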
exit ('hello', 'world')
exit (1,)
exit ()
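This module is a straightforward wrapper for the sys.exitfunc hook.
The time module provides a number of functions that deal with dates and the time within a day. It's a thin layer on top of the C runtime library.
A given date and time can be represented either as a floating-point value (the number of seconds since a reference date, usually January 1, 1970, as on Unix), or as a time tuple.
Example 1-79 shows how to use the time module to get the current time.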
File: time-example-1.py
import time
now = time.time()
print now, "seconds since", time.gmtime(0)[:6]
print "or in other words:"
print "- local time:", time.localtime(now)
print "- utc:", time.gmtime(now)
937758359.77 seconds since (1970, 1, 1, 0, 0, 0)
or in other words:
- local time: (1999, 9, 19, 18, 25, 59, 6, 262, 1)
- utc: (1999, 9, 19, 16, 25, 59, 6, 262, 0)
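The tuple returned by localtime and gmtime contains the year, month, day, hour, minute, second, day of the week, day of the year, and daylight saving flag. The year is a four-digit number (platforms with year-2000 problems may behave differently, but the value still has four digits), the day of the week begins with 0 for Monday, and January 1 is day number 1 of the year.
You can of course use standard string formatting to convert a time tuple to a string, but the time module also provides a number of standard conversion functions, as shown in Example 1-80.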
File: time-example-2.py
import time
now = time.localtime(time.time())
print time.asctime(now)
print time.strftime("%y/%m/%d %H:%M", now)
print time.strftime("%a %b %d", now)
print time.strftime("%c", now)
print time.strftime("%I %p", now)
print time.strftime("%Y-%m-%d %H:%M:%S %Z", now)
# do it by hand...
year, month, day, hour, minute, second, weekday, yearday, daylight = now
print "%04d-%02d-%02d" % (year, month, day)
print "%02d:%02d:%02d" % (hour, minute, second)
print ("MON", "TUE", "WED", "THU", "FRI", "SAT", "SUN")[weekday], yearday
Sun Oct 10 21:39:24 1999
99/10/10 21:39
Sun Oct 10
Sun Oct 10 21:39:24 1999
09 PM
1999-10-10 21:39:24 CEST
1999-10-10
21:39:24
SUN 283
On some platforms, the time module contains a strptime function, which is pretty much the opposite of strftime. Given a string and a pattern, it returns the corresponding time tuple, as shown in Example 1-81.
File: time-example-6.py
import time

# make sure we have a strptime function!
try:
    strptime = time.strptime
except AttributeError:
    from strptime import strptime

print strptime("31 Nov 00", "%d %b %y")
print strptime("1 Jan 70 1:30pm", "%d %b %y %I:%M%p")
The time.strptime function is only available if the system's C library provides the corresponding function. For platforms that don't have a standard implementation, Example 1-82 offers a partial replacement.
File: strptime.py
import re
import string

MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug",
          "Sep", "Oct", "Nov", "Dec"]

SPEC = {
    # map formatting code to a regular expression fragment
    "%a": "(?P<weekday>[a-z]+)",
    "%A": "(?P<weekday>[a-z]+)",
    "%b": "(?P<month>[a-z]+)",
    "%B": "(?P<month>[a-z]+)",
    "%C": "(?P<century>\d\d?)",
    "%d": "(?P<day>\d\d?)",
    "%D": "(?P<month>\d\d?)/(?P<day>\d\d?)/(?P<year>\d\d)",
    "%e": "(?P<day>\d\d?)",
    "%h": "(?P<month>[a-z]+)",
    "%H": "(?P<hour>\d\d?)",
    "%I": "(?P<hour12>\d\d?)",
    "%j": "(?P<yearday>\d\d?\d?)",
    "%m": "(?P<month>\d\d?)",
    "%M": "(?P<minute>\d\d?)",
    "%p": "(?P<ampm12>am|pm)",
    "%R": "(?P<hour>\d\d?):(?P<minute>\d\d?)",
    "%S": "(?P<second>\d\d?)",
    "%T": "(?P<hour>\d\d?):(?P<minute>\d\d?):(?P<second>\d\d?)",
    "%U": "(?P<week>\d\d)",
    "%w": "(?P<weekday>\d)",
    "%W": "(?P<weekday>\d\d)",
    "%y": "(?P<year>\d\d)",
    "%Y": "(?P<year>\d\d\d\d)",
    "%%": "%"
}

class TimeParser:
    def __init__(self, format):
        # convert strptime format string to regular expression
        format = string.join(re.split("(?:\s|%t|%n)+", format))
        pattern = []
        try:
            for spec in re.findall("%\w|%%|.", format):
                if spec[0] == "%":
                    spec = SPEC[spec]
                pattern.append(spec)
        except KeyError:
            raise ValueError, "unknown specifier: %s" % spec
        self.pattern = re.compile("(?i)" + string.join(pattern, ""))
    def match(self, daytime):
        # match time string
        match = self.pattern.match(daytime)
        if not match:
            raise ValueError, "format mismatch"
        get = match.groupdict().get
        tm = [0] * 9
        # extract date elements
        y = get("year")
        if y:
            y = int(y)
            if y < 68:
                y = 2000 + y
            elif y < 100:
                y = 1900 + y
            tm[0] = y
        m = get("month")
        if m:
            if m in MONTHS:
                m = MONTHS.index(m) + 1
            tm[1] = int(m)
        d = get("day")
        if d: tm[2] = int(d)
        # extract time elements
        h = get("hour")
        if h:
            tm[3] = int(h)
        else:
            h = get("hour12")
            if h:
                h = int(h)
                if string.lower(get("ampm12", "")) == "pm":
                    h = h + 12
                tm[3] = h
        m = get("minute")
        if m: tm[4] = int(m)
        s = get("second")
        if s: tm[5] = int(s)
        # ignore weekday/yearday for now
        return tuple(tm)

def strptime(string, format="%a %b %d %H:%M:%S %Y"):
    return TimeParser(format).match(string)

if __name__ == "__main__":
    # try it out
    import time
    print strptime("2000-12-20 01:02:03", "%Y-%m-%d %H:%M:%S")
    print strptime(time.ctime(time.time()))
(2000, 12, 20, 1, 2, 3, 0, 0, 0)
(2000, 11, 15, 12, 30, 45, 0, 0, 0)
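Converting a time tuple back to a time value is pretty easy, at least as long as we're talking about local time. Just pass the time tuple to the mktime function, as shown in Example 1-83.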
File: time-example-3.py
import time
t0 = time.time()
tm = time.localtime(t0)
print tm
print t0
print time.mktime(tm)
(1999, 9, 9, 0, 11, 8, 3, 252, 1)
936828668.16
936828668.0
Unfortunately, the 1.5.2 standard library has no function that converts UTC (Coordinated Universal Time) tuples back to time values; neither Python nor the underlying C libraries provide one. Example 1-84 provides a Python implementation of such a function, called timegm.
File: time-example-4.py
import time

def _d(y, m, d, days=(0,31,59,90,120,151,181,212,243,273,304,334,365)):
    # map a date to the number of days from a reference point
    return (((y - 1901)*1461)/4 + days[m-1] + d +
        ((m > 2 and not y % 4 and (y % 100 or not y % 400)) and 1))

def timegm(tm, epoch=_d(1970,1,1)):
    year, month, day, h, m, s = tm[:6]
    assert year >= 1970
    assert 1 <= month <= 12
    return (_d(year, month, day) - epoch)*86400 + h*3600 + m*60 + s

t0 = time.time()
tm = time.gmtime(t0)

print tm

print t0
print timegm(tm)
(1999, 9, 8, 22, 12, 12, 2, 251, 0)
936828732.48
936828732
From 1.6 onward, a similar function is available in the calendar module, as calendar.timegm.
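For comparison, here is a short Python 3 sketch of the library version. It reuses the UTC tuple printed by Example 1-84; the wrapper name utc_tuple_to_timestamp is made up for the sketch.

```python
import calendar
import time

def utc_tuple_to_timestamp(tm):
    """Inverse of time.gmtime: convert a UTC time tuple to seconds since the epoch."""
    return calendar.timegm(tm)

# the UTC tuple shown in the output of Example 1-84
sample = (1999, 9, 8, 22, 12, 12, 2, 251, 0)
print(utc_tuple_to_timestamp(sample))

# round trip through gmtime for the current time
now = int(time.time())
print(utc_tuple_to_timestamp(time.gmtime(now)) == now)
```

Only the first six fields of the tuple matter for the conversion, just as in the hand-written timegm above.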
The time module can be used to time the execution of a Python program, as shown in Example 1-85. You can measure either "wall time" (real-world time) or "process time" (the amount of CPU time the process has consumed).
File: time-example-5.py
import time

def procedure():
    time.sleep(2.5)

# measure process time
t0 = time.clock()
procedure()
print time.clock() - t0, "seconds process time"

# measure wall time
t0 = time.time()
procedure()
print time.time() - t0, "seconds wall time"
0.0 seconds process time
2.50903499126 seconds wall time
Not all systems can measure the true process time. On such systems (including Windows), clock usually measures the wall time since the program was started.
The process time has limited precision. On many systems, it wraps around after just over 30 minutes.
Also see the timing module (not available on Windows), which measures the wall time between two events.
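On current interpreters the picture is different: time.clock was removed in Python 3.8, and the usual replacements are time.perf_counter for wall time and time.process_time for CPU time. The sketch below shows that pairing in Python 3; the helper name measure is an invention for this example.

```python
import time

def measure(func):
    """Return (wall_seconds, cpu_seconds) spent while running func()."""
    wall0 = time.perf_counter()   # high-resolution wall clock
    cpu0 = time.process_time()    # CPU time of this process; sleeping is not counted
    func()
    return time.perf_counter() - wall0, time.process_time() - cpu0

if __name__ == "__main__":
    wall, cpu = measure(lambda: time.sleep(0.5))
    print(cpu, "seconds process time")
    print(wall, "seconds wall time")
```

As in Example 1-85, a sleeping procedure accumulates wall time but almost no process time.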
The types module contains type objects for all object types defined by the standard interpreter, as shown in Example 1-86. All objects of the same type share a single type object. You can use is to test whether an object has a given type.
File: types-example-1.py
import types

def check(object):
    print object,
    if type(object) is types.IntType:
        print "INTEGER",
    if type(object) is types.FloatType:
        print "FLOAT",
    if type(object) is types.StringType:
        print "STRING",
    if type(object) is types.ClassType:
        print "CLASS",
    if type(object) is types.InstanceType:
        print "INSTANCE",
    print

check(0)
check(0.0)
check("0")

class A:
    pass

class B:
    pass

check(A)
check(B)

a = A()
b = B()

check(a)
check(b)
0 INTEGER
0.0 FLOAT
0 STRING
A CLASS
B CLASS
<A instance at 796960> INSTANCE
<B instance at 796990> INSTANCE
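Note that all classes have the same type, and so do all instances. To test what class hierarchy a class or an instance belongs to, use the built-in issubclass and isinstance functions.
The types module destroys the current exception state when it is first imported. In other words, don't import it (or any module that imports it) from within an exception handler.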
(Optional, 2.0 and later) The gc module provides an interface to the built-in cyclic garbage collector.
Python uses reference counting to keep track of when to get rid of objects; as soon as the last reference to an object goes away, the object is destroyed.
Starting with version 2.0, Python also provides a cyclic garbage collector, which runs at regular intervals. This collector looks for data structures that point to themselves, and attempts to break the cycles, as shown in Example 1-87.
You can use the gc.collect function to force a full collection. This function returns the number of objects destroyed by the collector.
File: gc-example-1.py
import gc

# create a simple object that links to itself
class Node:
    def __init__(self, name):
        self.name = name
        self.parent = None
        self.children = []
    def addchild(self, node):
        node.parent = self
        self.children.append(node)
    def __repr__(self):
        return "<Node %s at %x>" % (repr(self.name), id(self))

# set up a self-referencing structure
root = Node("monty")

root.addchild(Node("eric"))
root.addchild(Node("john"))
root.addchild(Node("michael"))

# remove our only reference
del root

print gc.collect(), "unreachable objects"
print gc.collect(), "unreachable objects"
12 unreachable objects
0 unreachable objects
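If you're sure that your program doesn't create any self-referencing data structures, you can use the gc.disable function to disable collection. After calling this function, Python works exactly like 1.5.2 and earlier versions.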
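One way to convince yourself that the collector really reclaimed a cycle, rather than trusting the returned count, is to watch a weak reference. The following Python 3 sketch assumes CPython's reference-counting behaviour; the function name cycle_is_collected is hypothetical, and the Node class is a trimmed copy of the one in Example 1-87.

```python
import gc
import weakref

class Node:
    def __init__(self, name):
        self.name = name
        self.parent = None
        self.children = []
    def addchild(self, node):
        node.parent = self           # the child points back at its parent...
        self.children.append(node)   # ...and the parent points at the child

def cycle_is_collected():
    """Build a parent/child cycle, drop it, and report whether gc reclaimed it."""
    root = Node("monty")
    root.addchild(Node("eric"))
    probe = weakref.ref(root)        # a weak reference does not keep root alive
    del root                         # the cycle keeps the reference counts above zero
    found = gc.collect()             # force a full collection
    return probe() is None, found

if __name__ == "__main__":
    print(cycle_is_collected())
```

The exact number returned by gc.collect depends on the interpreter version, so the sketch treats it as informational only.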
"Now, imagine that your friend kept complaining that she didn't want to visit you since she found it too hard to climb up the drain pipe, and you kept telling her to use the friggin' stairs like everyone else..."
- eff-bot, June 1998
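This chapter describes a number of modules that are used in many Python programs. It's perfectly possible to write large Python programs without using them, but they can save you a lot of time.
The fileinput module makes it easy to write different kinds of text filters. It provides a wrapper class that lets you use a simple for-in statement to loop over the contents of one or more text files.
The StringIO module (and its faster variant, cStringIO) implements an in-memory file object. You can use StringIO objects in many places where Python expects an ordinary file object.
UserDict, UserList, and UserString are thin wrappers on top of the corresponding built-in types. Unlike the built-in types, these wrappers can be subclassed. This comes in handy if you need a class that works almost like a built-in type, but has one or more extra methods.
The random module provides a number of different random number generators. The whrandom module is similar, but it also allows you to create multiple generator objects.
[Translator's note (Feather): whrandom has been deprecated since version 2.1. Use random instead.]
The md5 and sha modules are used to calculate cryptographically strong message signatures (so-called "message digests").
The crypt module implements DES-style one-way encryption. It is available only on Unix systems.
The rotor module provides simple two-way encryption. It is no longer available in Python 2.4 and later.
[Translator's note (Feather): rotor has been deprecated since version 2.3 because its encryption algorithm is not secure.]
The fileinput module allows you to loop over the contents of one or more text files, as shown in Example 2-1.
File: fileinput-example-1.py
import fileinput
import sys

for line in fileinput.input("samples/sample.txt"):
    sys.stdout.write("-> ")
    sys.stdout.write(line)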
-> We will perhaps eventually be writing only small
-> modules which are identified by name as they are
-> used to build larger ones, so that devices like
-> indentation, rather than delimiters, might become
-> feasible for expressing local structure in the
-> source language.
-> -- Donald E. Knuth, December 1974
The fileinput module also allows you to get metainformation about the current line. This includes isfirstline, filename, and lineno, as shown in Example 2-2.
File: fileinput-example-2.py
import fileinput
import glob
import string, sys

for line in fileinput.input(glob.glob("samples/*.txt")):
    if fileinput.isfirstline(): # first in a file?
        sys.stderr.write("-- reading %s --\n" % fileinput.filename())
    sys.stdout.write(str(fileinput.lineno()) + " " + string.upper(line))
-- reading samples/sample.txt --
1 WE WILL PERHAPS EVENTUALLY BE WRITING ONLY SMALL
2 MODULES WHICH ARE IDENTIFIED BY NAME AS THEY ARE
3 USED TO BUILD LARGER ONES, SO THAT DEVICES LIKE
4 INDENTATION, RATHER THAN DELIMITERS, MIGHT BECOME
5 FEASIBLE FOR EXPRESSING LOCAL STRUCTURE IN THE
6 SOURCE LANGUAGE.
7 -- DONALD E. KNUTH, DECEMBER 1974
Processing text files in place is also easy. Just call the input function with the inplace keyword argument set to 1, and the module takes care of the rest. Example 2-3 demonstrates this.
File: fileinput-example-3.py
import fileinput, sys

for line in fileinput.input(inplace=1):
    # convert Windows/DOS text files to Unix files
    if line[-2:] == "\r\n":
        line = line[:-2] + "\n"
    sys.stdout.write(line)
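The in-place mechanism works the same way in Python 3: while the loop runs, anything printed to standard output is written back into the file being read. The sketch below uses a deliberately different and simpler filter (stripping trailing whitespace) and an explicit file argument instead of sys.argv; the function name strip_trailing_whitespace is an assumption of this example.

```python
import fileinput

def strip_trailing_whitespace(path):
    """Rewrite `path` in place, removing trailing spaces and tabs from each line."""
    # With inplace=True, standard output is redirected into the file
    # until the FileInput object is closed at the end of the with block.
    with fileinput.input(files=[path], inplace=True) as stream:
        for line in stream:
            print(line.rstrip())   # rstrip drops the newline too; print adds it back
```

Calling strip_trailing_whitespace("notes.txt") would rewrite that (hypothetical) file; nothing appears on the console, because the printed lines go to the file.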
The shutil utility module contains some functions for copying files and directories. The copy function used in Example 2-4 copies a file in pretty much the same way as the Unix cp command.
File: shutil-example-1.py
import shutil
import os

for file in os.listdir("."):
    if os.path.splitext(file)[1] == ".py":
        print file
        shutil.copy(file, os.path.join("backup", file))
aifc-example-1.py
anydbm-example-1.py
array-example-1.py
...
The copytree function copies an entire directory tree (same as cp -r), and rmtree removes an entire tree (same as rm -r). These functions are illustrated in Example 2-5.
File: shutil-example-2.py
import shutil
import os
SOURCE = "samples"
BACKUP = "samples-bak"
# create a backup directory
shutil.copytree(SOURCE, BACKUP)
print os.listdir(BACKUP)
# remove it
shutil.rmtree(BACKUP)
print os.listdir(BACKUP)
['sample.wav', 'sample.jpg', 'sample.au', 'sample.msg', 'sample.tgz',
...
Traceback (most recent call last):
File "shutil-example-2.py", line 17, in ?
print os.listdir(BACKUP)
os.error: No such file or directory
The tempfile module shown in Example 2-6 allows you to quickly come up with unique names to use for temporary files.
File: tempfile-example-1.py
import tempfile
import os

tempfile = tempfile.mktemp()

print "tempfile", "=>", tempfile

file = open(tempfile, "w+b")
file.write("*" * 1000)
file.seek(0)
print len(file.read()), "bytes"
file.close()

try:
    # must remove file when done
    os.remove(tempfile)
except OSError:
    pass
tempfile => C:\TEMP\~160-1
1000 bytes
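The TemporaryFile function picks a suitable name and opens the file, as shown in Example 2-7. It also makes sure that the file is removed when it's closed. (On Unix, you can remove an open file and have it disappear when the file is closed. On other platforms, this is done via a special wrapper class.)
File: tempfile-example-2.py
import tempfile

file = tempfile.TemporaryFile()

for i in range(100):
    file.write("*" * 100)

file.close() # removes the file!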
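In Python 3 the same two needs (an anonymous scratch file, and a temporary file with a usable name) are usually covered by TemporaryFile and NamedTemporaryFile; mktemp has been deprecated since Python 2.3 because another process can claim the name before you open it. A minimal sketch follows, with the helper names roundtrip and named_scratch_file invented for illustration.

```python
import os
import tempfile

def roundtrip(payload):
    """Write bytes to an anonymous temporary file and read them back."""
    with tempfile.TemporaryFile() as handle:   # opened in binary "w+b" mode by default
        handle.write(payload)
        handle.seek(0)
        return handle.read()                   # the file is removed when the block ends

def named_scratch_file(payload):
    """Create a temporary file that survives closing and return its name."""
    with tempfile.NamedTemporaryFile(delete=False) as handle:
        handle.write(payload)
        return handle.name                     # the caller is responsible for os.remove

if __name__ == "__main__":
    print(len(roundtrip(b"*" * 1000)), "bytes")
    name = named_scratch_file(b"scratch")
    print("tempfile", "=>", name)
    os.remove(name)
```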
Example 2-8 shows the StringIO module, which implements an in-memory file object. This object can be used as input or output to most functions that expect a standard file object.
File: stringio-example-1.py
import StringIO
MESSAGE = "That man is depriving a village somewhere of a computer scientist."
file = StringIO.StringIO(MESSAGE)
print file.read()
That man is depriving a village somewhere of a computer scientist.
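The StringIO class implements memory-file versions of all methods available for built-in file objects, plus a getvalue method that returns the internal string value. Example 2-9 demonstrates this method.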
File: stringio-example-2.py
import StringIO
file = StringIO.StringIO()
file.write("This man is no ordinary man. ")
file.write("This is Mr. F. G. Superman.")
print file.getvalue()
This man is no ordinary man. This is Mr. F. G. Superman.
StringIO can be used to capture redirected output from the Python interpreter, as shown in Example 2-10.
File: stringio-example-3.py
import StringIO
import string, sys
stdout = sys.stdout
sys.stdout = file = StringIO.StringIO()
print """
According to Gbaya folktales, trickery and guile
are the best ways to defeat the python, king of
snakes, which was hatched from a dragon at the
world's start. -- National Geographic, May 1997
"""
sys.stdout = stdout
print string.upper(file.getvalue())
ACCORDING TO GBAYA FOLKTALES, TRICKERY AND GUILE
ARE THE BEST WAYS TO DEFEAT THE PYTHON, KING OF
SNAKES, WHICH WAS HATCHED FROM A DRAGON AT THE
WORLD'S START. -- NATIONAL GEOGRAPHIC, MAY 1997
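cStringIO is an optional module that contains a faster implementation of StringIO. It works in much the same way as StringIO, but it cannot be subclassed. Example 2-11 shows how cStringIO is used; also see the previous section.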
File: cstringio-example-1.py
import cStringIO
MESSAGE = "That man is depriving a village somewhere of a computer scientist."
file = cStringIO.StringIO(MESSAGE)
print file.read()
That man is depriving a village somewhere of a computer scientist.
To make your code as fast as possible, but also robust enough to run on older Python installations, you can fall back on the StringIO module if cStringIO is not available, as shown in Example 2-12.
File: cstringio-example-2.py
try:
    import cStringIO
    StringIO = cStringIO
except ImportError:
    import StringIO

print StringIO
<module 'cStringIO' (built-in)>
(New in 2.0) The mmap module provides an interface to the operating system's memory mapping functions, as shown in Example 2-13. The mapped region behaves like a string object, but the data is read directly from the file.
File: mmap-example-1.py
import mmap
import os

filename = "samples/sample.txt"

file = open(filename, "r+")
size = os.path.getsize(filename)

data = mmap.mmap(file.fileno(), size)

# basics
print data
print len(data), size

# use slicing to read from the file
print repr(data[:10]), repr(data[:10])

# or use the standard file interface
print repr(data.read(10)), repr(data.read(10))
<mmap object at 008A2A10>
302 302
'We will pe' 'We will pe'
'We will pe' 'rhaps even'
Under Windows, the file must be opened for both reading and writing (`r+`, `w+`, or `a+`), or the mmap call will fail.
[Translator's note (Feather): in my own tests the a+ mode works fine; the original text lists only r+ and w+.]
Example 2-14 shows that memory-mapped regions can be used instead of ordinary strings in many places, including regular expressions and many string operations.
File: mmap-example-2.py
import mmap
import os, string, re

def mapfile(filename):
    file = open(filename, "r+")
    size = os.path.getsize(filename)
    return mmap.mmap(file.fileno(), size)

data = mapfile("samples/sample.txt")

# search
index = data.find("small")
print index, repr(data[index-5:index+15])

# regular expressions work too!
m = re.search("small", data)
print m.start(), m.group()
43 'only small\015\012modules '
43 small
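In Python 3 a mapped region behaves like a bytes object rather than a text string, so both the search term and the regular expression must be bytes. The sketch below makes that explicit; the function name find_in_file is hypothetical, and passing 0 as the length (meaning "map the whole file") is a convenience that replaces the os.path.getsize call.

```python
import mmap
import re

def find_in_file(filename, needle):
    """Return the offset of `needle` (bytes) found via find() and via re.search()."""
    with open(filename, "r+b") as handle:
        # length 0 maps the entire file; the file must not be empty
        with mmap.mmap(handle.fileno(), 0) as data:
            index = data.find(needle)
            match = re.search(re.escape(needle), data)
            regex_offset = match.start() if match else -1
            del match   # drop the match object before the mapping is closed
            return index, regex_offset
```

For a file that begins with the first line of the sample text, find_in_file(path, b"small") reports the same offset from both search methods.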
The UserDict module contains a dictionary class that can be subclassed (it's actually a Python wrapper for the built-in dictionary type).
Example 2-15 shows an enhanced dictionary class that allows dictionaries to be "added" to each other with the + operator, and provides a constructor that accepts keyword arguments.
File: userdict-example-1.py
import UserDict

class FancyDict(UserDict.UserDict):
    def __init__(self, data = {}, **kw):
        UserDict.UserDict.__init__(self)
        self.update(data)
        self.update(kw)
    def __add__(self, other):
        dict = FancyDict(self.data)
        dict.update(other)
        return dict

a = FancyDict(a = 1)
b = FancyDict(b = 2)

print a + b
{'b': 2, 'a': 1}
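The wrapper class survives in Python 3 as collections.UserDict, whose constructor already accepts keyword arguments. A hedged sketch of the same "addable dictionary" idea follows; it is an adaptation for illustration, not the book's listing.

```python
from collections import UserDict

class FancyDict(UserDict):
    """Dictionary that supports a + b, returning a new merged FancyDict."""
    def __add__(self, other):
        result = FancyDict(self.data)   # copies the items of the left operand
        result.update(other)            # items of the right operand win on conflict
        return result

a = FancyDict(a=1)
b = FancyDict(b=2)
print(a + b)
```

Neither operand is modified, because the merge happens on a fresh copy.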
The UserList module contains a list class that can be subclassed (it's actually a Python wrapper for the built-in list type).
In Example 2-16, AutoList instances work just like ordinary lists, except that they allow you to append items to the end by assigning to them.
File: userlist-example-1.py
import UserList

class AutoList(UserList.UserList):
    def __setitem__(self, i, item):
        if i == len(self.data):
            self.data.append(item)
        else:
            self.data[i] = item

list = AutoList()

for i in range(10):
    list[i] = i

print list
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
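(New in 2.0) The UserString module contains two classes, UserString and MutableString. The former is a wrapper around the standard string type that can be subclassed; the latter is a variation that lets you modify the string in place, much as you would a list.
Note that MutableString is not very efficient: most operations are implemented using slicing and string concatenation. If performance matters to your script, use a list of string fragments or the array module instead. Example 2-17 shows the UserString module.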
File: userstring-example-1.py
import UserString
class MyString(UserString.MutableString):
    def append(self, s):
        self.data = self.data + s
    def insert(self, index, s):
        self.data = self.data[:index] + s + self.data[index:]
    def remove(self, s):
        self.data = self.data.replace(s, "")
file = open("samples/book.txt")
text = file.read()
file.close()
book = MyString(text)
for bird in ["gannet", "robin", "nuthatch"]:
    book.remove(bird)
print book
...
C: The one without the !
P: The one without the -!!! They've ALL got the !! It's a
Standard British Bird, the , it's in all the books!!!
...
The traceback module lets you print exception tracebacks from inside your program, just as the interpreter does when an exception goes uncaught. Example 2-18 shows how.
File: traceback-example-1.py
# note! importing the traceback module messes up the
# exception state, so you better do that here and not
# in the exception handler
import traceback
try:
    raise SyntaxError, "example"
except:
    traceback.print_exc()
Traceback (innermost last):
File "traceback-example-1.py", line 7, in ?
SyntaxError: example
Example 2-19 uses the StringIO module to put the traceback in a string.
File: traceback-example-2.py
import traceback
import StringIO
try:
    raise IOError, "an i/o error occurred"
except:
    fp = StringIO.StringIO()
    traceback.print_exc(file=fp)
    message = fp.getvalue()
    print "failure! the error was:", repr(message)
failure! the error was: 'Traceback (innermost last):\012 File
"traceback-example-2.py", line 5, in ?\012IOError: an i/o error
occurred\012'
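On a current interpreter the same result can be had more directly. The sketch below assumes Python 3, where traceback.format_exc returns the formatted traceback as a string, so no file-like object is needed.

```python
import traceback

try:
    raise IOError("an i/o error occurred")
except IOError:
    # format_exc() returns what print_exc() would have printed
    message = traceback.format_exc()

print("failure! the error was:", repr(message))
```

In Python 3, IOError is an alias for OSError, so the last line of the captured traceback names OSError.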
To format the traceback yourself, use the extract_tb function to convert a traceback object into a list of stack entries, as Example 2-20 demonstrates.
File: traceback-example-3.py
import traceback
import sys
def function():
    raise IOError, "an i/o error occurred"
try:
    function()
except:
    info = sys.exc_info()
    for file, lineno, function, text in traceback.extract_tb(info[2]):
        print file, "line", lineno, "in", function
        print "=>", repr(text)
    print "** %s: %s" % info[:2]
traceback-example-3.py line 8 in ?
=> 'function()'
traceback-example-3.py line 5 in function
=> 'raise IOError, "an i/o error occurred"'
** exceptions.IOError: an i/o error occurred
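The errno module defines a number of symbolic error codes, such as ENOENT ("no such directory entry") and EPERM ("permission denied"). It also provides a dictionary that maps platform-dependent numerical error codes to symbolic names. Example 2-21 shows how to use errno.
In most cases, the IOError exception provides a 2-tuple containing the numerical error code and an explanatory string. If you need to distinguish between different error codes, use the symbolic names where possible.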
File: errno-example-1.py
import errno
try:
    fp = open("no.such.file")
except IOError, (error, message):
    if error == errno.ENOENT:
        print "no such file"
    elif error == errno.EPERM:
        print "permission denied"
    else:
        print message
no such file
Example 2-22 is a bit contrived, but it shows how to use the errorcode dictionary to map a numerical error code to its symbolic name.
File: errno-example-2.py
import errno
try:
    fp = open("no.such.file")
except IOError, (error, message):
    print error, repr(message)
    print errno.errorcode[error]
# 2 'No such file or directory'
# ENOENT
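The tuple-unpacking except clause used above is Python 2 syntax. As a hedged sketch of the same lookup under Python 3, the exception object carries the code in its errno attribute. This assumes that no file named no.such.file exists in the current directory.

```python
import errno
import os

try:
    open("no.such.file")
except OSError as exc:
    code = exc.errno                    # numerical, platform-dependent
    name = errno.errorcode[code]        # symbolic name, e.g. 'ENOENT'
    print(code, name, os.strerror(code))
```

Comparing against errno.ENOENT rather than a literal number keeps the check portable across platforms.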
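The getopt module contains functions for extracting command-line options and arguments. It can handle both short and long option formats, as shown in Example 2-23.
The second argument specifies the short options that should be allowed. A colon (:) after an option name means that the option must have an additional argument.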
File: getopt-example-1.py
import getopt
import sys
# simulate command-line invocation
sys.argv = ["myscript.py", "-l", "-d", "directory", "filename"]
# process options
opts, args = getopt.getopt(sys.argv[1:], "ld:")
long = 0
directory = None
for o, v in opts:
    if o == "-l":
        long = 1
    elif o == "-d":
        directory = v
print "long", "=", long
print "directory", "=", directory
print "arguments", "=", args
long = 1
directory = directory
arguments = ['filename']
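To make getopt look for long options, as in Example 2-24, pass a list of option descriptors as the third argument. If an option name ends with an equals sign (=), that option must have an additional argument.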
File: getopt-example-2.py
import getopt
import sys
# simulate command-line invocation
sys.argv = ["myscript.py", "--echo", "--printer", "lp01", "message"]
opts, args = getopt.getopt(sys.argv[1:], "ep:", ["echo", "printer="])
# process options
echo = 0
printer = None
for o, v in opts:
    if o in ("-e", "--echo"):
        echo = 1
    elif o in ("-p", "--printer"):
        printer = v
print "echo", "=", echo
print "printer", "=", printer
print "arguments", "=", args
echo = 1
printer = lp01
arguments = ['message']
[Feather's note: in case this is not clear, try both forms yourself:
myscript.py -e -p lp01 message
myscript.py --echo --printer=lp01 message
]
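To see that the short and long spellings are equivalent without touching sys.argv, the parsing can be wrapped in a small function. This is only a sketch, written for Python 3; the helper name `parse` is made up for the illustration.

```python
import getopt

def parse(argv):
    # "ep:" allows -e (no argument) and -p (argument required);
    # the list allows --echo and --printer=VALUE
    opts, args = getopt.getopt(argv, "ep:", ["echo", "printer="])
    echo, printer = False, None
    for o, v in opts:
        if o in ("-e", "--echo"):
            echo = True
        elif o in ("-p", "--printer"):
            printer = v
    return echo, printer, args

short_form = parse(["-e", "-p", "lp01", "message"])
long_form = parse(["--echo", "--printer=lp01", "message"])
print(short_form, long_form)
```

Both calls yield the same option values and leave the same positional argument behind.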
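The getpass module provides a platform-independent way to enter a password in a command-line program, as shown in Example 2-25.
getpass(prompt) prints the prompt string, switches off keyboard echo, and reads a password. If the prompt argument is omitted, it prints "Password: ".
getuser() gets the current username, if possible.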
File: getpass-example-1.py
import getpass
usr = getpass.getuser()
pwd = getpass.getpass("enter password for user %s: " % usr)
print usr, pwd
enter password for user mulder:
mulder trustno1
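The glob module generates lists of files matching given patterns, just like the Unix shell.
File patterns are similar to regular expressions, but simpler. An asterisk (*) matches zero or more characters, and a question mark (?) matches exactly one character. You can also use brackets to indicate character ranges, such as [0-9] for a single digit. All other characters match themselves.
glob(pattern) returns a list of all files matching a given pattern. Example 2-26 shows its use.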
File: glob-example-1.py
import glob
for file in glob.glob("samples/*.jpg"):
    print file
samples/sample.jpg
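Note that glob returns path names that include the directory part of the pattern, unlike the os.listdir function, which returns bare filenames. glob uses the fnmatch module to do the actual pattern matching.
The fnmatch module matches filenames against a pattern, as Example 2-27 shows.
The pattern syntax is the same as that used in Unix shells. An asterisk (*) matches zero or more characters, and a question mark (?) matches exactly one character. You can also use brackets to indicate character ranges, such as [0-9] for a single digit. All other characters match themselves.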
File: fnmatch-example-1.py
import fnmatch
import os
for file in os.listdir("samples"):
    if fnmatch.fnmatch(file, "*.jpg"):
        print file
sample.jpg
In Example 2-28, the translate function converts a file pattern to a regular expression.
File: fnmatch-example-2.py
import fnmatch
import os, re
pattern = fnmatch.translate("*.jpg")
for file in os.listdir("samples"):
    if re.match(pattern, file):
        print file
print "(pattern was %s)" % pattern
sample.jpg
(pattern was .*\.jpg$)
The glob and find modules use the fnmatch module internally.
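The pattern rules can be tried out on a plain list of names, with no directory involved. The sketch below assumes Python 3; the file names are invented for the illustration, and the regular expression produced by translate differs in detail between Python versions, so it is compiled rather than printed.

```python
import fnmatch
import re

names = ["sample.jpg", "sample.txt", "image1.jpg", "notes.JPG.bak"]

# filter() applies one shell-style pattern to a whole list
jpgs = fnmatch.filter(names, "*.jpg")

# translate() turns a shell-style pattern into a regular expression
pattern = re.compile(fnmatch.translate("image[0-9].*"))
numbered = [n for n in names if pattern.match(n)]

print(jpgs, numbered)
```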
"Anyone who considers arithmetical methods of producing random digits is, of course, in a state of sin."
- John von Neumann, 1951
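The random module contains a number of random number generators.
The basic random number generator (based on an algorithm by Wichmann and Hill, 1982) can be accessed in several ways, as Example 2-29 shows.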
File: random-example-1.py
import random
for i in range(5):
    # random float: 0.0 <= number < 1.0
    print random.random(),
    # random float: 10 <= number < 20
    print random.uniform(10, 20),
    # random integer: 100 <= number <= 1000
    print random.randint(100, 1000),
    # random integer: even numbers in 100 <= number < 1000
    print random.randrange(100, 1000, 2)
0.946842713956 19.5910069381 709 172
0.573613195398 16.2758417025 407 120
0.363241598013 16.8079747714 916 580
0.602115173978 18.386796935 531 774
0.526767588533 18.0783794596 223 344
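Note that randint can return the upper limit, while the other functions always return values smaller than the upper limit. All of the functions can return the lower limit.
Example 2-30 shows the choice function, which picks a random item from a sequence. It can be used with lists, tuples, or any other sequence (provided it is not empty, of course).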
File: random-example-2.py
import random
# random choice from a list
for i in range(5):
    print random.choice([1, 2, 3, 5, 9])
2
3
1
9
1
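In 2.0 and later, the shuffle function can be used to shuffle the contents of a list (that is, to generate a random permutation of the list in place). Example 2-31 also shows how to implement that function under older versions.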
File: random-example-4.py
import random
try:
    # available in 2.0 and later
    shuffle = random.shuffle
except AttributeError:
    def shuffle(x):
        for i in xrange(len(x)-1, 0, -1):
            # pick an element in x[:i+1] with which to exchange x[i]
            j = int(random.random() * (i+1))
            x[i], x[j] = x[j], x[i]
cards = range(52)
shuffle(cards)
myhand = cards[:5]
print myhand
[4, 8, 40, 12, 30]
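The random module also contains random generators with non-uniform distributions. Example 2-32 uses the gauss function to generate random numbers with a Gaussian distribution.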
File: random-example-3.py
import random
histogram = [0] * 20
# calculate histogram for gaussian
# noise, using average=5, stddev=1
for i in range(1000):
    i = int(random.gauss(5, 1) * 2)
    histogram[i] = histogram[i] + 1
# print the histogram
m = max(histogram)
for v in histogram:
    print "*" * (v * 50 / m)
****
**********
*************************
***********************************
************************************************
**************************************************
*************************************
***************************
*************
***
*
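You can find more information about the non-uniform generators in the Python Library Reference.
The random number generators in the standard library are pseudo-random generators. That is good enough for many purposes, including simulations, numerical analysis, and games, but it is definitely not suitable for cryptographic use.
[Feather's note: the whrandom module described next was deprecated as early as Python 2.1 and has long since been removed. Use random instead.]
The whrandom module, shown in Example 2-33, provides a pseudo-random number generator (based on an algorithm by Wichmann and Hill, 1982). Unless you need several generators that do not share internal state (for example, in a multithreaded application), use the random module instead.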
File: whrandom-example-1.py
import whrandom
# same as random
print whrandom.random()
print whrandom.choice([1, 2, 3, 5, 9])
print whrandom.uniform(10, 20)
print whrandom.randint(100, 1000)
0.113412062346
1
16.8778954689
799
Example 2-34 shows how to create multiple generators by creating instances of the whrandom class.
File: whrandom-example-2.py
import whrandom
# initialize all generators with the same seed
rand1 = whrandom.whrandom(4,7,11)
rand2 = whrandom.whrandom(4,7,11)
rand3 = whrandom.whrandom(4,7,11)
for i in range(5):
    print rand1.random(), rand2.random(), rand3.random()
0.123993532536 0.123993532536 0.123993532536
0.180951499518 0.180951499518 0.180951499518
0.291924111809 0.291924111809 0.291924111809
0.952048889363 0.952048889363 0.952048889363
0.969794283643 0.969794283643 0.969794283643
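Since whrandom is gone from current interpreters, here is a sketch of the same experiment using random.Random instances, assuming Python 3. The seed value 4711 is arbitrary. Modern versions of random use the Mersenne Twister rather than the Wichmann-Hill algorithm, so the numbers differ from those shown above.

```python
import random

# two independent generators, initialized with the same seed
gen1 = random.Random(4711)
gen2 = random.Random(4711)

seq1 = [gen1.random() for _ in range(5)]
seq2 = [gen2.random() for _ in range(5)]

for x, y in zip(seq1, seq2):
    print(x, y)
```

Each instance keeps its own state, so drawing from one generator does not disturb the sequence produced by the other.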
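The md5 (Message-Digest Algorithm 5) module is used to calculate message signatures (message digests).
The md5 algorithm calculates a strong 128-bit signature. This means that if two strings are different, it is highly likely that their md5 signatures are different as well. Put another way, given an md5 digest, it is supposed to be nearly impossible to come up with a different string that generates the same digest. Example 2-35 shows how to use the md5 module.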
File: md5-example-1.py
import md5
hash = md5.new()
hash.update("spam, spam, and eggs")
print repr(hash.digest())
'L\005J\243\266\355\243u`\305r\203\267\020F\303'
Note that the checksum is returned as a binary string. Example 2-36 shows how to get a hexadecimal or base64-encoded string instead.
File: md5-example-2.py
import md5
import string
import base64
hash = md5.new()
hash.update("spam, spam, and eggs")
value = hash.digest()
print hash.hexdigest()
# before 2.0, the above can be written as
# print string.join(map(lambda v: "%02x" % ord(v), value), "")
print base64.encodestring(value)
4c054aa3b6eda37560c57283b71046c3
TAVKo7bto3VgxXKDtxBGww==
Example 2-37 shows how to use md5 checksums to handle challenge-response authentication (but see the note on random numbers later in this section).
File: md5-example-3.py
import md5
import string, random
def getchallenge():
    # generate a 16-byte long random string. (note that the built-
    # in pseudo-random generator uses a 24-bit seed, so this is not
    # as good as it may seem...)
    challenge = map(lambda i: chr(random.randint(0, 255)), range(16))
    return string.join(challenge, "")
def getresponse(password, challenge):
    # calculate combined digest for password and challenge
    m = md5.new()
    m.update(password)
    m.update(challenge)
    return m.digest()
#
# server/client communication
# 1. client connects. server issues challenge.
print "client:", "connect"
challenge = getchallenge()
print "server:", repr(challenge)
# 2. client combines password and challenge, and calculates
# the response.
client_response = getresponse("trustno1", challenge)
print "client:", repr(client_response)
# 3. server does the same, and compares the result with the
# client response. the result is a safe login in which the
# password is never sent across the communication channel.
server_response = getresponse("trustno1", challenge)
if server_response == client_response:
    print "server:", "login ok"
client: connect
server: '\334\352\227Z#\272\273\212KG\330\265\032>\311o'
client: "l'\305\240-x\245\237\035\225A\254\233\337\225\001"
server: login ok
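Example 2-38 offers a variation of md5 that can be used to sign messages sent over a public network, so that the receiver can verify that the message has not been modified in transit.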
File: md5-example-4.py
import md5
import array
class HMAC_MD5:
    # keyed md5 message authentication
    def __init__(self, key):
        if len(key) > 64:
            key = md5.new(key).digest()
        ipad = array.array("B", [0x36] * 64)
        opad = array.array("B", [0x5C] * 64)
        for i in range(len(key)):
            ipad[i] = ipad[i] ^ ord(key[i])
            opad[i] = opad[i] ^ ord(key[i])
        self.ipad = md5.md5(ipad.tostring())
        self.opad = md5.md5(opad.tostring())
    def digest(self, data):
        ipad = self.ipad.copy()
        opad = self.opad.copy()
        ipad.update(data)
        opad.update(ipad.digest())
        return opad.digest()
#
# simulate server end
key = "this should be a well-kept secret"
message = open("samples/sample.txt").read()
signature = HMAC_MD5(key).digest(message)
# (send message and signature across a public network)
#
# simulate client end
key = "this should be a well-kept secret"
client_signature = HMAC_MD5(key).digest(message)
if client_signature == signature:
    print "this is the original message:"
    print message
else:
    print "someone has modified the message!!!"
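The copy method takes a snapshot of the internal object state. This lets you precalculate partial digests (such as the padded key in Example 2-38).
For details about this algorithm, see HMAC-MD5: Keyed-MD5 for Message Authentication ( http://www.research.ibm.com/security/draft-ietf-ipsec-hmac-md5-00.txt ) by Krawczyk et al.
Don't forget that the built-in pseudo-random generator isn't really good enough for encryption purposes. Be careful.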
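The padded-key construction can be checked against the hmac module that ships with current interpreters. The following is a sketch assuming Python 3 with hashlib and hmac available; the function name `hmac_md5` and the short test message are chosen for this illustration only.

```python
import hashlib
import hmac

def hmac_md5(key, message):
    # manual construction, following the ipad/opad scheme described above
    if len(key) > 64:
        key = hashlib.md5(key).digest()
    key = key.ljust(64, b"\0")               # pad the key to the block size
    ipad = bytes(b ^ 0x36 for b in key)
    opad = bytes(b ^ 0x5C for b in key)
    inner = hashlib.md5(ipad + message).digest()
    return hashlib.md5(opad + inner).digest()

key = b"this should be a well-kept secret"
message = b"spam, spam, and eggs"

manual = hmac_md5(key, message)
library = hmac.new(key, message, hashlib.md5).digest()

# compare_digest avoids leaking timing information during the comparison
same = hmac.compare_digest(manual, library)
print(same)
```

In new code the library call is the safer choice; the manual version is here only to show that the two agree.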
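The sha module provides an alternative way to calculate message signatures, as shown in Example 2-39. It is similar to the md5 module, but generates 160-bit signatures instead.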
File: sha-example-1.py
import sha
hash = sha.new()
hash.update("spam, spam, and eggs")
print repr(hash.digest())
print hash.hexdigest()
'\321\333\003\026I\331\272-j\303\247\240\345\343Tvq\364\346\311'
d1db031649d9ba2d6ac3a7a0e5e3547671f4e6c9
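For more examples of how to use sha signatures, see the md5 examples.
(Optional, Unix only) The crypt module implements one-way DES encryption. Unix systems use this algorithm to store passwords, and the module is really only useful for generating or checking such passwords.
Example 2-40 shows how to encrypt a password by calling crypt.crypt with the password and a salt, which should consist of two random characters. You can then throw away the actual password and store only the encrypted string.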
File: crypt-example-1.py
import crypt
import random, string
def getsalt(chars = string.letters + string.digits):
    # generate a random 2-character 'salt'
    return random.choice(chars) + random.choice(chars)
print crypt.crypt("bananas", getsalt())
'py8UGrijma1j6'
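To verify a given password, encrypt the new password using the first two characters of the stored string as the salt. If the result matches the stored string, the password is correct. Example 2-41 uses the pwd module to fetch the encrypted password for a given user.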
File: crypt-example-2.py
import pwd, crypt
def login(user, password):
    "Check if user would be able to log in using password"
    try:
        pw1 = pwd.getpwnam(user)[1]
        pw2 = crypt.crypt(password, pw1[:2])
        return pw1 == pw2
    except KeyError:
        return 0 # no such user
user = raw_input("username:")
password = raw_input("password:")
if login(user, password):
    print "welcome", user
else:
    print "login failed"
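For other ways to implement authentication, see the description of the md5 module.
[Feather's note: the rotor module described next was deprecated in Python 2.3 and removed in 2.4 because its encryption algorithm is not secure.]
(Optional) The rotor module implements a simple encryption algorithm, shown in Example 2-42. The algorithm is based on the WWII Enigma engine.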
File: rotor-example-1.py
import rotor
SECRET_KEY = "spam"
MESSAGE = "the holy grail"
r = rotor.newrotor(SECRET_KEY)
encoded_message = r.encrypt(MESSAGE)
decoded_message = r.decrypt(encoded_message)
print "original:", repr(MESSAGE)
print "encoded message:", repr(encoded_message)
print "decoded message:", repr(decoded_message)
original: 'the holy grail'
encoded message: '\227\271\244\015\305sw\3340\337\252\237\340U'
decoded message: 'the holy grail'
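(Optional) The zlib module provides support for "zlib" compression. (This compression method is also known as "deflate.")
The compress and decompress functions take string arguments, as Example 2-43 shows.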
File: zlib-example-1.py
import zlib
MESSAGE = "life of brian"
compressed_message = zlib.compress(MESSAGE)
decompressed_message = zlib.decompress(compressed_message)
print "original:", repr(MESSAGE)
print "compressed message:", repr(compressed_message)
print "decompressed message:", repr(decompressed_message)
original: 'life of brian'
compressed message: 'x\234\313\311LKU\310OSH*\312L\314\003\000!\010\004\302'
decompressed message: 'life of brian'
The compression rate varies a lot, depending on the contents of the file, as Example 2-44 shows.
File: zlib-example-2.py
import zlib
import glob
for file in glob.glob("samples/*"):
    indata = open(file, "rb").read()
    outdata = zlib.compress(indata, zlib.Z_BEST_COMPRESSION)
    print file, len(indata), "=>", len(outdata),
    print "%d%%" % (len(outdata) * 100 / len(indata))
samples/sample.au 1676 => 1109 66%
samples/sample.gz 42 => 51 121%
samples/sample.htm 186 => 135 72%
samples/sample.ini 246 => 190 77%
samples/sample.jpg 4762 => 4632 97%
samples/sample.msg 450 => 275 61%
samples/sample.sgm 430 => 321 74%
samples/sample.tar 10240 => 125 1%
samples/sample.tgz 155 => 159 102%
samples/sample.txt 302 => 220 72%
samples/sample.wav 13260 => 10992 82%
You can also compress or decompress data on the fly, as Example 2-45 demonstrates.
File: zlib-example-3.py
import zlib
encoder = zlib.compressobj()
data = encoder.compress("life")
data = data + encoder.compress(" of ")
data = data + encoder.compress("brian")
data = data + encoder.flush()
print repr(data)
print repr(zlib.decompress(data))
'x\234\313\311LKU\310OSH*\312L\314\003\000!\010\004\302'
'life of brian'
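Decompression can be done incrementally in the same way. The sketch below assumes Python 3, where zlib works on bytes; the 4-byte slice size is arbitrary and only there to show that the stream may arrive in pieces.

```python
import zlib

# compress in several steps
encoder = zlib.compressobj()
chunks = [encoder.compress(part) for part in (b"life", b" of ", b"brian")]
chunks.append(encoder.flush())        # flush() emits whatever is still buffered
data = b"".join(chunks)

# decompress the stream a few bytes at a time
decoder = zlib.decompressobj()
out = b"".join(decoder.decompress(data[i:i + 4])
               for i in range(0, len(data), 4))
out += decoder.flush()

print(repr(out))
```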
Example 2-46 wraps a decoder object in a file-like class that implements a handful of file methods, which makes it more convenient to read a compressed file.
File: zlib-example-4.py
import zlib
import string, StringIO
class ZipInputStream:
    def __init__(self, file):
        self.file = file
        self.__rewind()
    def __rewind(self):
        self.zip = zlib.decompressobj()
        self.pos = 0 # position in zipped stream
        self.offset = 0 # position in unzipped stream
        self.data = ""
    def __fill(self, bytes):
        if self.zip:
            # read until we have enough bytes in the buffer
            while not bytes or len(self.data) < bytes:
                self.file.seek(self.pos)
                data = self.file.read(16384)
                if not data:
                    self.data = self.data + self.zip.flush()
                    self.zip = None # no more data
                    break
                self.pos = self.pos + len(data)
                self.data = self.data + self.zip.decompress(data)
    def seek(self, offset, whence=0):
        if whence == 0:
            position = offset
        elif whence == 1:
            position = self.offset + offset
        else:
            raise IOError, "Illegal argument"
        if position < self.offset:
            raise IOError, "Cannot seek backwards"
        # skip forward, in 16k blocks
        while position > self.offset:
            if not self.read(min(position - self.offset, 16384)):
                break
    def tell(self):
        return self.offset
    def read(self, bytes = 0):
        self.__fill(bytes)
        if bytes:
            data = self.data[:bytes]
            self.data = self.data[bytes:]
        else:
            data = self.data
            self.data = ""
        self.offset = self.offset + len(data)
        return data
    def readline(self):
        # make sure we have an entire line
        while self.zip and "\n" not in self.data:
            self.__fill(len(self.data) + 512)
        i = string.find(self.data, "\n") + 1
        if i <= 0:
            return self.read()
        return self.read(i)
    def readlines(self):
        lines = []
        while 1:
            s = self.readline()
            if not s:
                break
            lines.append(s)
        return lines
#
# try it out
data = open("samples/sample.txt").read()
data = zlib.compress(data)
file = ZipInputStream(StringIO.StringIO(data))
for line in file.readlines():
    print line[:-1]
We will perhaps eventually be writing only small
modules which are identified by name as they are
used to build larger ones, so that devices like
indentation, rather than delimiters, might become
feasible for expressing local structure in the
source language.
-- Donald E. Knuth, December 1974
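The code module provides a number of functions that can be used to emulate the behavior of the standard interpreter's interactive mode.
The compile_command function behaves like the built-in compile function, but does some additional tests to make sure you pass it a complete Python statement.
In Example 2-47, we compile a program line by line, executing the resulting code objects as soon as we manage to compile them. The program looks like this: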
a = (
1,
2,
3
)
print a
Note that the tuple assignment cannot be compiled until we have reached the closing parenthesis.
File: code-example-1.py
import code
import string
#
SCRIPT = [
"a = (",
" 1,",
" 2,",
" 3 ",
")",
"print a"
]
script = ""
for line in SCRIPT:
    script = script + line + "\n"
    co = code.compile_command(script, "<stdin>", "exec")
    if co:
        # got a complete statement. execute it!
        print "-"*40
        print script,
        print "-"*40
        exec co
        script = ""
----------------------------------------
a = (
1,
2,
3
)
----------------------------------------
----------------------------------------
print a
----------------------------------------
(1, 2, 3)
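The same line-by-line technique still works on current interpreters. The sketch below assumes Python 3 and relies on compile_command's default "single" mode, which is what the interactive prompt uses; the variable names are illustrative. compile_command returns None for as long as the accumulated source is incomplete.

```python
import code

lines = ["a = (", "  1,", "  2,", ")", "total = sum(a)"]

namespace = {}      # the executed statements store their variables here
completed = []      # source text of each statement that compiled
source = ""

for line in lines:
    source += line + "\n"
    co = code.compile_command(source)
    if co is not None:
        # got a complete statement: run it and start collecting the next one
        exec(co, namespace)
        completed.append(source)
        source = ""

print(len(completed), namespace["a"], namespace["total"])
```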
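The InteractiveConsole class implements an interactive console, much like the one you get when you start the Python interpreter in interactive mode.
The console can be either active (it calls a function to get the next line) or passive (you call the push method when you have new data). The default is to use the built-in raw_input function. If you prefer another input function, override the method of the same name. Example 2-48 shows how to use the code module to emulate the interactive interpreter.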
File: code-example-2.py
import code
console = code.InteractiveConsole()
console.interact()
Python 1.5.2
Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam
(InteractiveConsole)
>>> a = (
... 1,
... 2,
... 3
... )
>>> print a
(1, 2, 3)
The script in Example 2-49 defines a function called keyboard. It lets you hand control over to the interactive interpreter at any point in your program.
File: code-example-3.py
def keyboard(banner=None):
    import code, sys
    # use exception trick to pick up the current frame
    try:
        raise None
    except:
        frame = sys.exc_info()[2].tb_frame.f_back
    # evaluate commands in current namespace
    namespace = frame.f_globals.copy()
    namespace.update(frame.f_locals)
    code.interact(banner=banner, local=namespace)
def func():
    print "START"
    a = 10
    keyboard()
    print "END"
func()
START
Python 1.5.2
Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam
(InteractiveConsole)
>>> print a
10
>>> print keyboard
<function keyboard at 9032c8>
^Z
END
"Well, since you last asked us to stop, this thread has moved from discussing languages suitable for professional programmers via accidental users to computer-phobic users. A few more iterations can make this thread really interesting..."
- eff-bot, June 1996
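This chapter describes the thread support modules provided with the standard Python interpreter. Note that thread support is optional and may not be available in your Python interpreter.
This chapter also covers some modules that allow you to run external processes on Unix and Windows systems.
When you run a Python program, execution starts at the top of the main module and proceeds downward. Loops can be used to repeat portions of the program, and function and method calls transfer control to a different part of the program (but only temporarily).
With threads, your program can do several things at once. Each thread has its own flow of control, so one thread can read data from a file while another keeps the screen updated.
To keep two threads from accessing the same internal data structure at the same time, Python uses a global interpreter lock. Only one thread can execute Python code at any given time; Python automatically switches to the next thread after a short period of time, or when a thread does something that may take a while (such as waiting for data to arrive over a socket, or reading data from a file).
The global lock is not enough to avoid problems in your own programs, though. If multiple threads attempt to access the same data object, it may end up in an inconsistent state. Consider the following code: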
def getitem(key):
    item = cache.get(key)
    if item is None:
        # not in cache; create a new one
        item = create_new_item(key)
        cache[key] = item
    return item
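If two threads call getitem with the same missing key right after each other, they are likely to end up calling create_new_item twice with the same argument. That may be harmless in many cases, but it can cause serious problems in others.
To avoid such problems, you can use lock objects to synchronize threads. A lock object can be owned by only one thread at a time, so it can be used to make sure that only one thread is executing the body of getitem at any given time.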
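A minimal sketch of that idea follows, assuming Python 3 and its threading module. The `created` list and the body of `create_new_item` are stand-ins invented for the illustration, so that the number of calls can be observed.

```python
import threading

cache = {}
cache_lock = threading.Lock()
created = []                      # records every call to create_new_item

def create_new_item(key):
    created.append(key)
    return "item-%s" % key

def getitem(key):
    with cache_lock:              # only one thread at a time runs this body
        item = cache.get(key)
        if item is None:
            # not in cache; create a new one
            item = create_new_item(key)
            cache[key] = item
        return item

threads = [threading.Thread(target=getitem, args=("spam",)) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(created)
```

Holding the lock for the whole lookup keeps the code simple, at the cost of making all callers wait while one item is being created.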
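On most modern operating systems, each program runs in its own process. You usually start a new program/process by entering a command in the shell or by selecting it from a menu. Python also lets you start new programs from inside a Python program.
Most process-related functions are provided by the os module. See Section 1.4.4 for more information.
(Optional) The threading module is a higher-level interface for threading, as shown in Example 3-1. It is modeled after the Java thread facilities. Like the lower-level thread module, it is available only if your interpreter was built with thread support.
To create a new thread, subclass the Thread class and define the run method. To run such threads, create one or more instances of that class and call the start method. Each instance's run method then executes in its own thread.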
File: threading-example-1.py
import threading
import time, random
class Counter:
    def __init__(self):
        self.lock = threading.Lock()
        self.value = 0
    def increment(self):
        self.lock.acquire() # critical section
        self.value = value = self.value + 1
        self.lock.release()
        return value
counter = Counter()
class Worker(threading.Thread):
    def run(self):
        for i in range(10):
            # pretend we're doing something that takes 10-100 ms
            value = counter.increment() # increment global counter
            time.sleep(random.randint(10, 100) / 1000.0)
            print self.getName(), "-- task", i, "finished", value
#
# try it
for i in range(10):
    Worker().start() # start a worker
Thread-1 -- task 0 finished 1
Thread-3 -- task 0 finished 3
Thread-7 -- task 0 finished 8
Thread-1 -- task 1 finished 7
Thread-4 -- task 0 Thread-5 -- task 0 finished 4
finished 5
Thread-8 -- task 0 Thread-6 -- task 0 finished 9
finished 6
...
Thread-6 -- task 9 finished 98
Thread-4 -- task 9 finished 99
Thread-9 -- task 9 finished 100
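Example 3-1 also uses a Lock object to create a critical section inside the global Counter object. If you remove the calls to acquire and release, the counter will most likely not reach 100.
The Queue module provides a thread-safe queue implementation, as shown in Example 3-2. It provides a convenient way of moving Python objects between different threads.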
File: queue-example-1.py
import threading
import Queue
import time, random
WORKERS = 2
class Worker(threading.Thread):
    def __init__(self, queue):
        self.__queue = queue
        threading.Thread.__init__(self)
    def run(self):
        while 1:
            item = self.__queue.get()
            if item is None:
                break # reached end of queue
            # pretend we're doing something that takes 10-100 ms
            time.sleep(random.randint(10, 100) / 1000.0)
            print "task", item, "finished"
#
# try it
queue = Queue.Queue(0)
for i in range(WORKERS):
    Worker(queue).start() # start a worker
for i in range(10):
    queue.put(i)
for i in range(WORKERS):
    queue.put(None) # add end-of-queue markers
task 1 finished
task 0 finished
task 3 finished
task 2 finished
task 4 finished
task 5 finished
task 7 finished
task 6 finished
task 9 finished
task 8 finished
Example 3-3 shows how to limit the size of the queue. If the producer threads fill the queue, they block until items are popped off the queue.
File: queue-example-2.py
import threading
import Queue
import time, random
WORKERS = 2
class Worker(threading.Thread):
    def __init__(self, queue):
        self.__queue = queue
        threading.Thread.__init__(self)
    def run(self):
        while 1:
            item = self.__queue.get()
            if item is None:
                break # reached end of queue
            # pretend we're doing something that takes 10-100 ms
            time.sleep(random.randint(10, 100) / 1000.0)
            print "task", item, "finished"
#
# run with limited queue
queue = Queue.Queue(3)
for i in range(WORKERS):
    Worker(queue).start() # start a worker
for item in range(10):
    print "push", item
    queue.put(item)
for i in range(WORKERS):
    queue.put(None) # add end-of-queue markers
push 0
push 1
push 2
push 3
push 4
push 5
task 0 finished
push 6
task 1 finished
push 7
task 2 finished
push 8
task 3 finished
push 9
task 4 finished
task 6 finished
task 5 finished
task 7 finished
task 9 finished
task 8 finished
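The size limit can also be observed without any threads by using the non-blocking calls. This is a sketch assuming Python 3, where the module is named queue in lowercase.

```python
import queue

q = queue.Queue(maxsize=3)
for item in range(3):
    q.put(item)                   # fills the queue without blocking

try:
    q.put_nowait(3)               # a plain put() would block here
    overflowed = False
except queue.Full:
    overflowed = True

first = q.get()                   # popping an item frees a slot
q.put_nowait(3)                   # so this one succeeds
remaining = [q.get_nowait() for _ in range(q.qsize())]

print(overflowed, first, remaining)
```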
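You can modify the behavior of the Queue class through subclassing. Example 3-4 shows a simple priority queue. It expects all items added to the queue to be tuples, where the first member contains the priority (lower value means higher priority).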
File: queue-example-3.py
import Queue
import bisect
Empty = Queue.Empty
class PriorityQueue(Queue.Queue):
    "Thread-safe priority queue"
    def _put(self, item):
        # insert in order
        bisect.insort(self.queue, item)
#
# try it
queue = PriorityQueue(0)
# add items out of order
queue.put((20, "second"))
queue.put((10, "first"))
queue.put((30, "third"))
# print queue contents
try:
    while 1:
        print queue.get_nowait()
except Empty:
    pass
(10, 'first')
(20, 'second')
(30, 'third')
Example 3-5 shows a simple stack implementation (last-in, first-out, instead of first-in, first-out).
File: queue-example-4.py
import Queue
Empty = Queue.Empty
class Stack(Queue.Queue):
    "Thread-safe stack"
    def _put(self, item):
        # insert at the beginning of queue, not at the end
        self.queue.insert(0, item)
    # method aliases
    push = Queue.Queue.put
    pop = Queue.Queue.get
    pop_nowait = Queue.Queue.get_nowait
#
# try it
stack = Stack(0)
# push items on stack
stack.push("first")
stack.push("second")
stack.push("third")
# print stack contents
try:
    while 1:
        print stack.pop_nowait()
except Empty:
    pass
third
second
first
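(Optional) The thread module provides a low-level interface for threading, as shown in Example 3-6. It is available only if your interpreter was built with thread support. Unless you have a special need for it, use the higher-level threading module instead.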
File: thread-example-1.py
import thread
import time, random
def worker():
    for i in range(50):
        # pretend we're doing something that takes 10-100 ms
        time.sleep(random.randint(10, 100) / 1000.0)
        print thread.get_ident(), "-- task", i, "finished"
#
# try it out!
for i in range(2):
    thread.start_new_thread(worker, ())
time.sleep(1)
print "goodbye!"
311 -- task 0 finished
265 -- task 0 finished
265 -- task 1 finished
311 -- task 1 finished
...
265 -- task 17 finished
311 -- task 13 finished
265 -- task 18 finished
goodbye!
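Note that when the main program exits, all of these threads are killed along with it. The threading module does not have this problem. (This behavior can be changed.)
(Unix only) The commands module contains a few convenience functions for executing external commands. Example 3-7 demonstrates this module.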
File: commands-example-1.py
import commands
stat, output = commands.getstatusoutput("ls -lR")
print "status", "=>", stat
print "output", "=>", len(output), "bytes"
status => 0
output => 171046 bytes
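(Unix only) The pipes module contains support functions for creating "conversion pipelines." You can create a pipeline consisting of a number of external utilities and run it on one or more files, as shown in Example 3-8.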
File: pipes-example-1.py
import pipes
t = pipes.Template()
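# create a pipeline
# ("--" means the command reads standard input and writes standard output)
t.append("sort", "--")
t.append("uniq", "--")
# filter some text
# (an empty output filename sends the result to standard output)
t.copy("samples/sample.txt", "")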
Alan Jones (sensible party)
Kevin Phillips-Bong (slightly silly)
Tarquin Fin-tim-lin-bin-whin-bim-lin-bus-stop-F'tang-F'tang-Olé-Biscuitbarrel
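The popen2 module lets you run an external command and access its stdin and stdout (and possibly also stderr) as individual streams.
In Python 1.5.2 and earlier, this module is supported only on Unix. In 2.0 and later, the functions are also implemented on Windows. Example 3-9 shows how to use the module to sort strings.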
File: popen2-example-1.py
import popen2, string
fin, fout = popen2.popen2("sort")
fout.write("foo\n")
fout.write("bar\n")
fout.close()
print fin.readline(),
print fin.readline(),
fin.close()
bar
foo
Example 3-10 shows how to use this module to control an existing application.
File: popen2-example-2.py
import popen2
import string
class Chess:
    "Interface class for chesstool-compatible programs"
    def __init__(self, engine = "gnuchessc"):
        self.fin, self.fout = popen2.popen2(engine)
        s = self.fin.readline()
        if s != "Chess\n":
            raise IOError, "incompatible chess program"
    def move(self, move):
        self.fout.write(move + "\n")
        self.fout.flush()
        my = self.fin.readline()
        if my == "Illegal move":
            raise ValueError, "illegal move"
        his = self.fin.readline()
        return string.split(his)[2]
    def quit(self):
        self.fout.write("quit\n")
        self.fout.flush()
#
# play a few moves
g = Chess()
print g.move("a2a4")
print g.move("b2b3")
g.quit()
b8c6
e7e5
The signal module is used to install your own signal handlers, as shown in Example 3-11. When the interpreter sees a signal, the signal handler is executed as soon as possible.
File: signal-example-1.py
import signal
import time
def handler(signo, frame):
    print "got signal", signo
signal.signal(signal.SIGALRM, handler)
# wake me up in two seconds
signal.alarm(2)
now = time.time()
time.sleep(200)
print "slept for", time.time() - now, "seconds"
got signal 14
slept for 1.99262607098 seconds
"PALO ALTO, Calif. - Intel says its Pentium Pro and new Pentium II chips have a flaw that can cause computers to sometimes make mistakes but said the problems could be fixed easily with rewritten software."
- Reuters telegram
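This chapter describes a number of modules that can be used to convert between Python objects and other data representations. These modules are often used to read and write foreign file formats and to store or transfer Python variables.
Python provides several support modules that help you decode and encode binary data formats. The struct module converts between binary data structures (like C structs) and Python tuples. The array module wraps binary arrays of data (C arrays) into a Python sequence object.
The marshal and pickle modules are used to share and transfer data between different Python programs.
The marshal module uses a simple self-describing format that supports most built-in data types, including code objects. Python uses this format itself to store compiled code on disk (in .pyc files).
The pickle module provides a more sophisticated format, which supports user-defined classes, self-referencing data structures, and more. pickle is written in Python and is relatively slow, but there is also a cPickle module that implements the same functionality in C and is about as fast as marshal.
A few modules provide enhanced formatting of output, to supplement the built-in repr function and the % string formatting operator.
The pprint module can print almost any Python data structure in a nice, readable way.
The repr module provides a replacement for the built-in function of the same name. Unlike the built-in function, it limits the output in various ways: it prints only the first 30 characters of a string, only a few levels of a nested data structure, and so on.
Python supports most common binary encodings, such as base64, binhex (a Macintosh format), quoted printable, and uu encoding.
The array module implements an efficient array storage type. Arrays are similar to lists, but all items must be of the same primitive type. The type is defined when the array is created.
Examples 4-1 through 4-5 are simple ones. Example 4-1 creates an array object and then copies the internal buffer to a string using the tostring method.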
File: array-example-1.py
import array
a = array.array("B", range(16)) # unsigned char
b = array.array("h", range(16)) # signed short
print a
print repr(a.tostring())
print b
print repr(b.tostring())
array('B', [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15])
'\000\001\002\003\004\005\006\007\010\011\012\013\014\015\016\017'
array('h', [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15])
'\000\000\001\000\002\000\003\000\004\000\005\000\006\000\007\000
\010\000\011\000\012\000\013\000\014\000\015\000\016\000\017\000'
Array objects can be treated as ordinary lists, as Example 4-2 shows. You cannot, however, concatenate two arrays of different types.
File: array-example-2.py
import array
a = array.array("B", [1, 2, 3])
a.append(4)
a = a + a
a = a[2:-2]
print a
print repr(a.tostring())
for i in a:
    print i,
array('B', [3, 4, 1, 2])
'\003\004\001\002'
3 4 1 2
This module also provides a way to turn raw binary data into a sequence of integers (or floating-point values, for that matter), as Example 4-3 demonstrates.
File: array-example-3.py
import array
a = array.array("i", "fish license") # signed integer
print a
print repr(a.tostring())
print a.tolist()
array('i', [1752394086, 1667853344, 1702063717])
'fish license'
[1752394086, 1667853344, 1702063717]
Finally, Example 4-4 shows how to use this module to determine the endianness of the current platform.
File: array-example-4.py
import array
def little_endian():
    return ord(array.array("i",[1]).tostring()[0])
if little_endian():
    print "little-endian platform (intel, alpha)"
else:
    print "big-endian platform (motorola, sparc)"
big-endian platform (motorola, sparc)
Python 2.0 and later provides a sys.byteorder attribute, which is set to either "little" or "big", so you can check the byte order more easily, as Example 4-5 shows.
File: sys-byteorder-example-1.py
import sys
# 2.0 and later
if sys.byteorder == "little":
    print "little-endian platform (intel, alpha)"
else:
    print "big-endian platform (motorola, sparc)"
big-endian platform (motorola, sparc)
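The struct module converts between binary strings and Python tuples. The pack function takes a format string and one or more arguments, and returns a binary string built from those arguments according to the format. The unpack function takes a format string and a binary string, and returns a tuple, as shown in Example 4-6.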
File: struct-example-1.py
import struct
# native byteorder
buffer = struct.pack("ihb", 1, 2, 3)
print repr(buffer)
print struct.unpack("ihb", buffer)
# data from a sequence, network byteorder
data = [1, 2, 3]
buffer = apply(struct.pack, ("!ihb",) + tuple(data))
print repr(buffer)
print struct.unpack("!ihb", buffer)
# in 2.0, the apply statement can also be written as:
# buffer = struct.pack("!ihb", *data)
'\001\000\000\000\002\000\003'
(1, 2, 3)
'\000\000\000\001\000\002\003'
(1, 2, 3)
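The effect of the byte-order prefix is easier to see when both orders are requested explicitly. The following sketch assumes Python 3, where pack returns bytes. With a "<" or "!" prefix the field sizes are standardized and no padding is inserted, whereas the size of the native layout depends on the platform.

```python
import struct

# native layout: size and padding depend on the platform and compiler
native_size = struct.calcsize("ihb")

# explicit byte orders: 4-byte int, 2-byte short, 1-byte char, no padding
little = struct.pack("<ihb", 1, 2, 3)
network = struct.pack("!ihb", 1, 2, 3)    # "!" means network (big-endian) order

values = struct.unpack("!ihb", network)

print(native_size, little, network, values)
```

Spelling out the byte order is the usual choice for data that is written to a file or sent to another machine.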
The xdrlib module converts between Python data types and Sun's external data representation (XDR), as Example 4-7 illustrates.
File: xdrlib-example-1.py
import xdrlib
#
# create a packer and add some data to it
p = xdrlib.Packer()
p.pack_uint(1)
p.pack_string("spam")
data = p.get_buffer()
print "packed:", repr(data)
#
# create an unpacker and use it to decode the data
u = xdrlib.Unpacker(data)
print "unpacked:", u.unpack_uint(), repr(u.unpack_string())
u.done()
packed: '\000\000\000\001\000\000\000\004spam'
unpacked: 1 'spam'
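The packed output above follows directly from the XDR rules: unsigned integers are 4 bytes in big-endian order, and strings are a 4-byte length followed by the data, padded with zero bytes to a multiple of 4. The sketch below reproduces that layout by hand with struct in Python 3. It is an illustration only, not xdrlib itself, and the four helper functions are hypothetical names.

```python
import struct

def pack_uint(value):
    # XDR unsigned int: 4 bytes, big-endian
    return struct.pack(">I", value)

def pack_string(data):
    # XDR string: length, then data padded to a multiple of 4 bytes
    padding = (4 - len(data) % 4) % 4
    return pack_uint(len(data)) + data + b"\0" * padding

def unpack_uint(buf, offset):
    (value,) = struct.unpack_from(">I", buf, offset)
    return value, offset + 4

def unpack_string(buf, offset):
    length, offset = unpack_uint(buf, offset)
    data = buf[offset:offset + length]
    padding = (4 - length % 4) % 4
    return data, offset + length + padding

packed = pack_uint(1) + pack_string(b"spam")
number, pos = unpack_uint(packed, 0)
text, pos = unpack_string(packed, pos)
```

Because "spam" is exactly four bytes long, no padding is needed here; a five-byte string would be followed by three zero bytes.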
Sun uses the XDR format in its remote procedure call (RPC) protocol. Example 4-8 is incomplete, but it shows how to build an RPC request package.
File: xdrlib-example-2.py
import xdrlib
# some constants (see the RPC specs for details)
RPC_CALL = 1
RPC_VERSION = 2
MY_PROGRAM_ID = 1234 # assigned by Sun
MY_VERSION_ID = 1000
MY_TIME_PROCEDURE_ID = 9999
AUTH_NULL = 0
transaction = 1
p = xdrlib.Packer()
# send a Sun RPC call package
p.pack_uint(transaction)
p.pack_enum(RPC_CALL)
p.pack_uint(RPC_VERSION)
p.pack_uint(MY_PROGRAM_ID)
p.pack_uint(MY_VERSION_ID)
p.pack_uint(MY_TIME_PROCEDURE_ID)
p.pack_enum(AUTH_NULL)
p.pack_uint(0)
p.pack_enum(AUTH_NULL)
p.pack_uint(0)
print repr(p.get_buffer())
'\000\000\000\001\000\000\000\001\000\000\000\002\000\000\004\322
\000\000\003\350\000\000\'\017\000\000\000\000\000\000\000\000\000
\000\000\000\000\000\000\000'
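The marshal module serializes data, that is, it converts data to and from strings so that it can be written to a file or sent over a network. See Example 4-9.
The marshal module uses a simple self-describing format. For each data item, the marshalled string contains a type code followed by one or more type-specific fields. Integers are stored in little-endian order, strings are stored as a length field followed by the string's contents (which may include null bytes), and tuples are stored as a length field followed by the objects that make up the tuple.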
File: marshal-example-1.py
import marshal
value = (
"this is a string",
[1, 2, 3, 4],
("more tuples", 1.0, 2.3, 4.5),
"this is yet another string"
)
data = marshal.dumps(value)
# intermediate format
print type(data), len(data)
print "-"*50
print repr(data)
print "-"*50
print marshal.loads(data)
<type 'string'> 118
--------------------------------------------------
'(\004\000\000\000s\020\000\000\000this is a string
[\004\000\000\000i\001\000\000\000i\002\000\000\000
i\003\000\000\000i\004\000\000\000(\004\000\000\000
s\013\000\000\000more tuplesf\0031.0f\0032.3f\0034.
5s\032\000\000\000this is yet another string'
--------------------------------------------------
('this is a string', [1, 2, 3, 4], ('more tuples',
1.0, 2.3, 4.5), 'this is yet another string')
The marshal module can also handle code objects (it is used to store precompiled Python modules). See Example 4-10.
File: marshal-example-2.py
import marshal
script = """
print 'hello'
"""
code = compile(script, "<script>", "exec")
data = marshal.dumps(code)
# intermediate format
print type(data), len(data)
print "-"*50
print repr(data)
print "-"*50
exec marshal.loads(data)
<type 'string'> 81
--------------------------------------------------
'c\000\000\000\000\001\000\000\000s\017\000\000\00
0\177\000\000\177\002\000d\000\000GHd\001\000S(\00
2\000\000\000s\005\000\000\000helloN(\000\000\000\
000(\000\000\000\000s\010\000\000\000<script>s\001
\000\000\000?\002\000s\000\000\000\000'
--------------------------------------------------
hello
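Both uses of marshal shown above, plain data and code objects, can be sketched in Python 3 as follows. This assumes Python 3, where dumps returns bytes and exec is a function; the exact byte format differs between Python versions, so the sketch only checks that the data survives the round trip.

```python
import marshal

# Round-trip a nested data structure.
value = ("this is a string", [1, 2, 3, 4], ("more tuples", 1.0, 2.3, 4.5))
data = marshal.dumps(value)
restored = marshal.loads(data)

# Round-trip a code object and run the restored copy.
code = compile("greeting = 'hello'", "<script>", "exec")
code_data = marshal.dumps(code)
namespace = {}
exec(marshal.loads(code_data), namespace)
```

Marshalled code is tied to the interpreter version that produced it, so this technique suits caching more than long-term storage.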
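Like marshal, the pickle module serializes data so that it can be stored or transmitted. It is somewhat slower than marshal, but it can handle class instances, shared elements, and recursive data structures, among other things.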
File: pickle-example-1.py
import pickle
value = (
"this is a string",
[1, 2, 3, 4],
("more tuples", 1.0, 2.3, 4.5),
"this is yet another string"
)
data = pickle.dumps(value)
# intermediate format
print type(data), len(data)
print "-"*50
print data
print "-"*50
print pickle.loads(data)
<type 'string'> 121
--------------------------------------------------
(S'this is a string'
p0
(lp1
I1
aI2
aI3
aI4
a(S'more tuples'
p2
F1.0
F2.3
F4.5
tp3
S'this is yet another string'
p4
tp5
.
--------------------------------------------------
('this is a string', [1, 2, 3, 4], ('more tuples',
1.0, 2.3, 4.5), 'this is yet another string')
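On the other hand, pickle cannot handle code objects (but see the copy_reg module for a way around this).
By default, pickle uses a text-based format. You can also use a binary format, in which numbers and binary strings are stored compactly, so that the result is somewhat smaller. See Example 4-12.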
File: pickle-example-2.py
import pickle
import math
value = (
"this is a long string" * 100,
[1.2345678, 2.3456789, 3.4567890] * 100
)
# text mode
data = pickle.dumps(value)
print type(data), len(data), pickle.loads(data) == value
# binary mode
data = pickle.dumps(value, 1)
print type(data), len(data), pickle.loads(data) == value
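The claims about shared elements, recursive structures, and the compact binary format can be checked with a short Python 3 sketch (an assumption; in Python 3 the text format is protocol 0 and pickle.HIGHEST_PROTOCOL selects a binary one).

```python
import pickle

# A structure with a shared element and a reference to itself.
shared = [1, 2, 3]
value = [shared, shared]
value.append(value)

copy = pickle.loads(pickle.dumps(value))
shared_preserved = copy[0] is copy[1]   # still one shared list
recursion_preserved = copy[2] is copy   # still refers to itself

# Compare the text-based format with a binary format.
big = ("this is a long string" * 100, [1.2345678, 2.3456789, 3.4567890] * 100)
text_size = len(pickle.dumps(big, 0))
binary_size = len(pickle.dumps(big, pickle.HIGHEST_PROTOCOL))
```

Identity, not just equality, is preserved inside a single pickle, which is what sets pickle apart from marshal for structured data.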
The (optional) cPickle module is a faster implementation of the pickle module; note the capitalization. See Example 4-13.
File: cpickle-example-1.py
try:
    import cPickle
    pickle = cPickle
except ImportError:
    import pickle
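You can use the copy_reg module to register your own extension types, so that the pickle and copy modules know how to handle non-standard types.
For example, the standard pickle implementation cannot handle Python code objects, as shown below: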
File: copy-reg-example-1.py
import pickle
CODE = """
print 'good evening'
"""
code = compile(CODE, "<string>", "exec")
exec code
exec pickle.loads(pickle.dumps(code))
good evening
Traceback (innermost last):
...
pickle.PicklingError: can't pickle 'code' objects
We can work around this by registering a handler for code objects. A handler consists of two parts: a pickler, which takes a code object and returns a tuple that contains only simple data types, and an unpickler, which does the opposite and takes the contents of such a tuple as its arguments. See Example 4-14.
File: copy-reg-example-2.py
import copy_reg
import pickle, marshal, types
#
# register a pickle handler for code objects
def code_unpickler(data):
    return marshal.loads(data)
def code_pickler(code):
    return code_unpickler, (marshal.dumps(code),)
copy_reg.pickle(types.CodeType, code_pickler, code_unpickler)
#
# try it out
CODE = """
print "suppose he's got a pointed stick"
"""
code = compile(CODE, "<string>", "exec")
exec code
exec pickle.loads(pickle.dumps(code))
suppose he's got a pointed stick
suppose he's got a pointed stick
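The same registration can be sketched for Python 3, where the module is named copyreg. This is a hedged illustration rather than the book's code: it assumes Python 3, and it uses marshal.loads itself as the unpickler so that no helper function has to be importable on the receiving side.

```python
import copyreg
import marshal
import pickle
import types

def code_pickler(code):
    # Return (callable, args): pickle stores a reference to the callable
    # and the arguments, and calls callable(*args) when loading.
    return marshal.loads, (marshal.dumps(code),)

# Register the handler for code objects.
copyreg.pickle(types.CodeType, code_pickler)

code = compile("result = 6 * 7", "<string>", "exec")
restored = pickle.loads(pickle.dumps(code))

namespace = {}
exec(restored, namespace)
```

The pickled data contains marshalled bytecode, so it is only meaningful to an interpreter of the same version.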
If you transfer pickled data over a network, make sure that the custom unpickler is also available on the receiving end.
Example 4-15 shows how to pickle an open file object.
File: copy-reg-example-3.py
import copy_reg
import pickle, types
import StringIO
#
# register a pickle handler for file objects
def file_unpickler(position, data):
    file = StringIO.StringIO(data)
    file.seek(position)
    return file
def file_pickler(file):
    # remember the current position, read everything, then restore it
    position = file.tell()
    file.seek(0)
    data = file.read()
    file.seek(position)
    return file_unpickler, (position, data)
copy_reg.pickle(types.FileType, file_pickler, file_unpickler)
#
# try it out
file = open("samples/sample.txt", "rb")
print file.read(120),
print "<here>",
print pickle.loads(pickle.dumps(file)).read()
We will perhaps eventually be writing only small
modules, which are identified by name as they are
used to build larger <here> ones, so that devices like
indentation, rather than delimiters, might become
feasible for expressing local structure in the
source language.
-- Donald E. Knuth, December 1974
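The pprint module (the "pretty printer") prints Python data structures. It is useful when you need to print a non-trivial data structure at the command line, because the output is neatly formatted and easier to read.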
File: pprint-example-1.py
import pprint
data = (
"this is a string", [1, 2, 3, 4], ("more tuples",
1.0, 2.3, 4.5), "this is yet another string"
)
pprint.pprint(data)
('this is a string',
[1, 2, 3, 4],
('more tuples', 1.0, 2.3, 4.5),
'this is yet another string')
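The repr module provides an alternative version of the built-in repr function, with limits on most sizes (string lengths, recursion depth, and so on). Example 4-17 shows how to use the module.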
File: repr-example-1.py
# note: this overrides the built-in 'repr' function
from repr import repr
# an annoyingly recursive data structure
data = (
"X" * 100000,
)
data = [data]
data.append(data)
print repr(data)
[('XXXXXXXXXXXX...XXXXXXXXXXXXX',), [('XXXXXXXXXXXX...XXXXXXXXXX
XXX',), [('XXXXXXXXXXXX...XXXXXXXXXXXXX',), [('XXXXXXXXXXXX...XX
XXXXXXXXXXX',), [('XXXXXXXXXXXX...XXXXXXXXXXXXX',), [(...), [...
]]]]]]]
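In Python 3 the same functionality lives in the reprlib module. The following sketch assumes Python 3 and simply confirms that the output stays short for a huge, self-referencing structure.

```python
import reprlib

# A very long string inside a list that contains itself.
data = ["X" * 100000]
data.append(data)

short = reprlib.repr(data)
```

Long strings are abbreviated with "..." and nesting is cut off after a fixed depth, so the result is safe to print or log.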
The base64 encoding scheme converts arbitrary binary data to plain text. It converts each group of 3 bytes to 4 text characters, and only uses characters from the following set:
ABCDEFGHIJKLMNOPQRSTUVWXYZ
abcdefghijklmnopqrstuvwxyz
0123456789+/
In addition, = is used for padding at the end of the data stream.
Example 4-18 shows how to use the encode and decode functions to work on file objects.
File: base64-example-1.py
import base64
MESSAGE = "life of brian"
file = open("out.txt", "w")
file.write(MESSAGE)
file.close()
base64.encode(open("out.txt"), open("out.b64", "w"))
base64.decode(open("out.b64"), open("out.txt", "w"))
print "original:", repr(MESSAGE)
print "encoded message:", repr(open("out.b64").read())
print "decoded message:", repr(open("out.txt").read())
original: 'life of brian'
encoded message: 'bGlmZSBvZiBicmlhbg==\012'
decoded message: 'life of brian'
Example 4-19 shows how to use the encodestring and decodestring functions to convert between strings. They are thin wrappers on top of encode and decode that use StringIO objects for input and output.
File: base64-example-2.py
import base64
MESSAGE = "life of brian"
data = base64.encodestring(MESSAGE)
original_data = base64.decodestring(data)
print "original:", repr(MESSAGE)
print "encoded data:", repr(data)
print "decoded data:", repr(original_data)
original: 'life of brian'
encoded data: 'bGlmZSBvZiBicmlhbg==\012'
decoded data: 'life of brian'
Example 4-20 shows how to convert a user name and a password to an HTTP basic authentication string.
File: base64-example-3.py
import base64
def getbasic(user, password):
    # basic authentication (according to HTTP)
    return base64.encodestring(user + ":" + password)
print getbasic("Aladdin", "open sesame")
'QWxhZGRpbjpvcGVuIHNlc2FtZQ=='
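For comparison, here is a Python 3 sketch of the same conversions (an assumption; Python 3 works on bytes and offers b64encode and b64decode, which do not append a newline). The basic_auth helper is a hypothetical name, and the "Basic " prefix is the scheme name used in the HTTP Authorization header.

```python
import base64

message = b"life of brian"
encoded = base64.b64encode(message)    # 13 bytes in, 20 characters out
decoded = base64.b64decode(encoded)

def basic_auth(user, password):
    # Build the value of an HTTP "Authorization" header.
    token = base64.b64encode(("%s:%s" % (user, password)).encode("utf-8"))
    return "Basic " + token.decode("ascii")

header = basic_auth("Aladdin", "open sesame")
```

The encoded length is always a multiple of 4, because each group of 3 input bytes becomes 4 characters and the final group is padded with =.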
Finally, Example 4-21 shows a small utility that converts a GIF image to a Python script, for use with the Tkinter library.
File: base64-example-4.py
import base64, sys
if not sys.argv[1:]:
    print "Usage: gif2tk.py giffile >pyfile"
    sys.exit(1)
data = open(sys.argv[1], "rb").read()
if data[:4] != "GIF8":
    print sys.argv[1], "is not a GIF file"
    sys.exit(1)
print '# generated from', sys.argv[1], 'by gif2tk.py'
print 'from Tkinter import PhotoImage'
print 'image = PhotoImage(data="""'
print base64.encodestring(data),
print '""")'
# generated from samples/sample.gif by gif2tk.py
from Tkinter import PhotoImage
image = PhotoImage(data="""
R0lGODlhoAB4APcAAAAAAIAAAACAAICAAAAAgIAAgACAgICAgAQEBIwEBIyMBJRUlISE/LRUBAQE
...
AjmQBFmQBnmQCJmQCrmQDNmQDvmQEBmREnkRAQEAOw==
""")
The binhex module converts to and from the Macintosh BinHex format, as shown in Example 4-22.
File: binhex-example-1.py
import binhex
import sys
infile = "samples/sample.jpg"
binhex.binhex(infile, sys.stdout)
(This file must be converted with BinHex 4.0)
:#R0KEA"XC5jUF'F!2j!)!*!%%TS!N!4RdrrBrq!!%%T'58B!!3%!!!%!!3!!rpX
!3`!)"JB("J8)"`F(#3N)#J`8$3`,#``C%K-2&"dD(aiG'K`F)#3Z*b!L,#-F(#J
h+5``-63d0"mR16di-M`Z-c3brpX!3`%*#3N-#``B$3dB-L%F)6+3-[r!!"%)!)!
!J!-")J!#%3%$%3(ra!!I!!!""3'3"J#3#!%#!`3&"JF)#3S,rm3!Y4!!!J%$!`)
%!`8&"!3!!!&p!3)$!!34"4)K-8%'%e&K"b*a&$+"ND%))d+a`495dI!N-f*bJJN
The module provides two functions, binhex and hexbin.
The quopri module implements quoted-printable encoding, as defined by the MIME standard.
This encoding converts text that is not entirely U.S. ASCII, such as text written in most European languages or in Chinese, to messages that contain only U.S. ASCII characters. This can be useful with older mail agents, which generally don't support non-ASCII characters. See Example 4-23.
File: quopri-example-1.py
import quopri
import StringIO
# helpers (the quopri module only supports file-to-file conversion)
def encodestring(instring, tabs=0):
    outfile = StringIO.StringIO()
    quopri.encode(StringIO.StringIO(instring), outfile, tabs)
    return outfile.getvalue()
def decodestring(instring):
    outfile = StringIO.StringIO()
    quopri.decode(StringIO.StringIO(instring), outfile)
    return outfile.getvalue()
#
# try it out
MESSAGE = "å i åa ä e ö!"
encoded_message = encodestring(MESSAGE)
decoded_message = decodestring(encoded_message)
print "original:", MESSAGE
print "encoded message:", repr(encoded_message)
print "decoded message:", decoded_message
original: å i åa ä e ö!
encoded message: '=E5 i =E5a =E4 e =F6!\012'
decoded message: å i åa ä e ö!
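As Example 4-23 shows, non-U.S. characters are encoded as an equals sign (=) followed by two hexadecimal digits. Note that the equals sign itself is encoded the same way ("=3D"), as is whitespace at the end of a line ("=20"). All other characters are left unchanged, so as long as you don't use too many unusual characters, the encoded string remains quite readable.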
(Europeans generally hate this encoding and strongly believe that certain U.S. programmers deserve to be slapped in the head with a huge great fish to the jolly music of Edward German....)
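The encoding rules can be tried out with a Python 3 sketch. It assumes Python 3, where quopri provides encodestring and decodestring that operate on bytes, and it encodes the sample text as ISO Latin-1 so that each accented letter is a single byte.

```python
import quopri

# "å i åa ä e ö!" as ISO Latin-1 bytes
message = "\u00e5 i \u00e5a \u00e4 e \u00f6!".encode("iso-8859-1")

encoded = quopri.encodestring(message)
decoded = quopri.decodestring(encoded)

# The equals sign itself must also be escaped.
equals = quopri.encodestring(b"a=b")
```

Plain ASCII letters and the spaces between words pass through untouched, which is why the encoded form stays readable.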
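The uu encoding scheme converts arbitrary binary data to plain text. The format was popular on Usenet, but it is gradually being replaced by base64 encoding.
A uu encoder converts each group of 3 bytes (24 bits) to 4 printable characters (6 bits per character), using the characters from chr(32) (space) to chr(95). uu encoding typically increases the size of the data by about 40%.
An encoded data stream starts with a begin line, which contains the file permissions (as a Unix mode value) and the filename, and ends with an end line: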
begin 666 sample.jpg
M_]C_X 02D9)1@ ! 0 0 ! #_VP!# @&!@<&!0@'!P<)'0@*#!0-# L+
...more lines like this...
end
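The uu module provides two functions: encode and decode.
The encode(infile, outfile, filename) function encodes the data from the input file and writes it to the output file, as shown in Example 4-24. infile and outfile can be either filenames or file objects. The filename argument is written as the filename on the begin line.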
File: uu-example-1.py
import uu
import os, sys
infile = "samples/sample.jpg"
uu.encode(infile, sys.stdout, os.path.basename(infile))
begin 666 sample.jpg
M_]C_X 02D9)1@ ! 0 0 ! #_VP!# @&!@<&!0@'!P<)"0@*#!0-# L+
M#!D2$P/4'1H?'AT:'!P@)"XG("(L(QP<*#<I+# Q-#0T'R<Y/3@R/"XS-#+_
MVP!# 0D)"0P+#!@-#1@R(1PA,C(R,C(R,C(R,C(R,C(R,C(R,C(R,C(R,C(R
M,C(R,C(R,C(R,C(R,C(R,C(R,C(R,C+_P 1" " ( # 2( A$! Q$!_/0
M'P 04! 0$! 0$ $" P0%!@<("0H+_/0 M1 @$# P($ P4%
The decode(infile, outfile) function decodes uu-encoded data. Again, both arguments can be either filenames or file objects. See Example 4-25.
File: uu-example-2.py
import uu
import StringIO
infile = "samples/sample.uue"
outfile = "samples/sample.jpg"
#
# decode
fi = open(infile)
fo = StringIO.StringIO()
uu.decode(fi, fo)
#
# compare with original data file
data = open(outfile, "rb").read()
if fo.getvalue() == data:
    print len(data), "bytes ok"
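The binascii module provides support functions for a number of encoding modules, including base64, binhex, and uu, as shown in Example 4-26.
In 2.0 and later, you can also use it to convert between binary data and hexadecimal strings.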
File: binascii-example-1.py
import binascii
text = "hello, mrs teal"
data = binascii.b2a_base64(text)
text = binascii.a2b_base64(data)
print text, "<=>", repr(data)
data = binascii.b2a_uu(text)
text = binascii.a2b_uu(data)
print text, "<=>", repr(data)
data = binascii.b2a_hqx(text)
text = binascii.a2b_hqx(data)[0]
print text, "<=>", repr(data)
# 2.0 and newer
data = binascii.b2a_hex(text)
text = binascii.a2b_hex(data)
print text, "<=>", repr(data)
hello, mrs teal <=> 'aGVsbG8sIG1ycyB0ZWFs\012'
hello, mrs teal <=> '/:&5L;&\\L(&UR<R!T96%L\012'
hello, mrs teal <=> 'D\'9XE\'mX)\'ebFb"dC@&X'
hello, mrs teal <=> '68656c6c6f2c206d7273207465616c'
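The base64, uu, and hex conversions above can be reproduced with a Python 3 sketch (an assumption; the binhex functions are left out because they are no longer available in recent Python 3 releases).

```python
import binascii

text = b"hello, mrs teal"

b64 = binascii.b2a_base64(text)     # one base64 line, newline included
uu_line = binascii.b2a_uu(text)     # one uu line: length char + data
hexed = binascii.b2a_hex(text)      # two hex digits per byte

from_b64 = binascii.a2b_base64(b64)
from_uu = binascii.a2b_uu(uu_line)
from_hex = binascii.a2b_hex(hexed)
```

These functions work on one line or block at a time; the higher-level modules add the framing, such as the begin and end lines of the uu format.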
This chapter describes modules that are used to process different file formats.
Python comes with extensions for working with the Extensible Markup Language (XML) and the Hypertext Markup Language (HTML). Python also provides support for the Standard Generalized Markup Language (SGML).
All of these formats share the same basic structure, because both HTML and XML are derived from SGML. Each document is a mix of start tags, end tags, plain text (also called character data), and entity references:
<document name="sample.xml">
<header>This is a header</header>
<body>This is the body text. The text can contain
plain text (&quot;character data&quot;), tags, and
entities.
</body>
</document>
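In this example, <document>, <header>, and <body> are start tags. Each start tag has a corresponding end tag, marked with a slash ("/"). A start tag can carry one or more attributes, such as the name attribute here.
Everything between a start tag and its corresponding end tag is called an element. Here, the document element contains two other elements, header and body.
&quot; is a character entity. Character entities are used to represent reserved characters in text sections, and are introduced by an ampersand (&). Here the entity stands for a quotation mark; other common character entities include &lt; (<) and &gt; (>).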
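Although XML, HTML, and SGML share the same building blocks, there are important differences. In XML, every element must have both a start tag and an end tag, and the tags must be properly nested (the document must be well-formed). XML is also case-sensitive, so <document> and <Document> are two different element types.
HTML is much more flexible. An HTML parser will usually fill in missing tags; for example, when it finds a new paragraph that begins with a <P> tag while the previous paragraph has no end tag, it automatically adds the missing </P> tag. HTML is also case-insensitive. On the other hand, XML lets you define any elements you like, while HTML uses a fixed set of elements defined by the HTML specifications.
SGML is more flexible still. You can use your own declaration to define how the source text is turned into an element structure, and a DTD (document type definition) can be used to check the structure and fill in missing tags. Technically, both HTML and XML are SGML applications; each has its own SGML declaration, and HTML also has a standard DTD.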
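Python comes with parsers for all of these markup languages. Although SGML is the most flexible of the formats, Python's sgmllib parser is actually quite simple. It does not handle DTDs, but you can subclass it to provide more advanced behavior.
Python's HTML support is built on top of the SGML parser. The htmllib parser delegates the actual rendering to a formatter object. The formatter module contains a couple of standard formatters.
Python's XML support is more complicated. At first there was only the xmllib parser, which is similar to sgmllib; a more advanced (optional) expat module was added later. In the most recent versions, xmllib is being phased out in favor of the xml package.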
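The ConfigParser module reads simple configuration files, similar to Windows INI files.
The netrc module reads .netrc configuration files, and the shlex module can be used to read configuration files that use a shell-like syntax.
Python's standard library supports the GZIP and ZIP (2.0 and later) formats. The gzip and zipfile modules, both built on the zlib module, handle these kinds of files.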
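The xmllib module is deprecated in current versions.
The xmllib module provides a simple XML parser that uses regular expressions to pull the XML data apart, as shown in Example 5-1. The parser performs only basic checks on the document, such as making sure that there is only one top-level element and that all tags are balanced.
You feed XML data to the xmllib parser a piece at a time (as the data arrives over a network, for example). The parser calls different methods for start tags, data sections, end tags, and entities.
If you are only interested in a few tags, you can define special start_tag and end_tag methods, where tag is the tag name. The start_tag methods are called with the attributes of the corresponding tag, passed as a dictionary.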
File: xmllib-example-1.py
import xmllib
class Parser(xmllib.XMLParser):
    # get quotation number
    def __init__(self, file=None):
        xmllib.XMLParser.__init__(self)
        if file:
            self.load(file)
    def load(self, file):
        while 1:
            s = file.read(512)
            if not s:
                break
            self.feed(s)
        self.close()
    def start_quotation(self, attrs):
        print "id =>", attrs.get("id")
        raise EOFError
try:
    c = Parser()
    c.load(open("samples/sample.xml"))
except EOFError:
    pass
id => 031
Example 5-2 contains a simple (and incomplete) rendering engine. The parser maintains an element stack (__tags), which it passes to the renderer together with text fragments. The renderer looks up the current tag hierarchy in a style dictionary; if it isn't already there, it creates a new style descriptor by combining entries from the stylesheet.
File: xmllib-example-2.py
import xmllib
import string, sys
STYLESHEET = {
    # each element can contribute one or more style elements
    "quotation": {"style": "italic"},
    "lang": {"weight": "bold"},
    "name": {"weight": "medium"},
}
class Parser(xmllib.XMLParser):
    # a simple styling engine
    def __init__(self, renderer):
        xmllib.XMLParser.__init__(self)
        self.__data = []
        self.__tags = []
        self.__renderer = renderer
    def load(self, file):
        while 1:
            s = file.read(8192)
            if not s:
                break
            self.feed(s)
        self.close()
    def handle_data(self, data):
        self.__data.append(data)
    def unknown_starttag(self, tag, attrs):
        if self.__data:
            text = string.join(self.__data, "")
            self.__renderer.text(self.__tags, text)
        self.__tags.append(tag)
        self.__data = []
    def unknown_endtag(self, tag):
        self.__tags.pop()
        if self.__data:
            text = string.join(self.__data, "")
            self.__renderer.text(self.__tags, text)
        self.__data = []
class DumbRenderer:
    def __init__(self):
        self.cache = {}
    def text(self, tags, text):
        # render text in the style given by the tag stack
        tags = tuple(tags)
        style = self.cache.get(tags)
        if style is None:
            # figure out a combined style
            style = {}
            for tag in tags:
                s = STYLESHEET.get(tag)
                if s:
                    style.update(s)
            self.cache[tags] = style # update cache
        # write to standard output
        sys.stdout.write("%s =>\n" % style)
        sys.stdout.write(" " + repr(text) + "\n")
#
# try it out
r = DumbRenderer()
c = Parser(r)
c.load(open("samples/sample.xml"))
{'style': 'italic'} =>
'I\'ve had a lot of developers come up to me and\012say,
"I haven\'t had this much fun in a long time. It sure
beats\012writing '
{'style': 'italic', 'weight': 'bold'} =>
'Cobol'
{'style': 'italic'} =>
'" -- '
{'style': 'italic', 'weight': 'medium'} =>
'James Gosling'
{'style': 'italic'} =>
', on\012'
{'weight': 'bold'} =>
'Java'
{'style': 'italic'} =>
'.'
The (optional) xml.parsers.expat module is an interface to James Clark's Expat XML parser. Example 5-3 demonstrates this full-featured and fast parser.
File: xml-parsers-expat-example-1.py
from xml.parsers import expat
class Parser:
    def __init__(self):
        self._parser = expat.ParserCreate()
        self._parser.StartElementHandler = self.start
        self._parser.EndElementHandler = self.end
        self._parser.CharacterDataHandler = self.data
    def feed(self, data):
        self._parser.Parse(data, 0)
    def close(self):
        self._parser.Parse("", 1) # end of data
        del self._parser # get rid of circular references
    def start(self, tag, attrs):
        print "START", repr(tag), attrs
    def end(self, tag):
        print "END", repr(tag)
    def data(self, data):
        print "DATA", repr(data)
p = Parser()
p.feed("<tag>data</tag>")
p.close()
START u'tag' {}
DATA u'data'
END u'tag'
Note that the parser returns Unicode strings even if you pass it ordinary text. By default, the parser interprets the source text as UTF-8. To use other encodings, make sure the XML file contains an encoding directive, as shown in Example 5-4.
File: xml-parsers-expat-example-2.py
from xml.parsers import expat
class Parser:
    def __init__(self):
        self._parser = expat.ParserCreate()
        self._parser.StartElementHandler = self.start
        self._parser.EndElementHandler = self.end
        self._parser.CharacterDataHandler = self.data
    def feed(self, data):
        self._parser.Parse(data, 0)
    def close(self):
        self._parser.Parse("", 1) # end of data
        del self._parser # get rid of circular references
    def start(self, tag, attrs):
        print "START", repr(tag), attrs
    def end(self, tag):
        print "END", repr(tag)
    def data(self, data):
        print "DATA", repr(data)
p = Parser()
p.feed("""\
<?xml version='1.0' encoding='iso-8859-1'?>
<author>
<name>fredrik lundh</name>
<city>linköping</city>
</author>
"""
)
p.close()
START u'author' {}
DATA u'\012'
START u'name' {}
DATA u'fredrik lundh'
END u'name'
DATA u'\012'
START u'city' {}
DATA u'link\366ping'
END u'city'
DATA u'\012'
END u'author'
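The role of the encoding directive can be sketched in Python 3 as well (an assumption; there every string is Unicode, so the raw document is passed to the parser as bytes). The events list and the lambda handlers are illustrative choices, not part of the examples above.

```python
from xml.parsers import expat

events = []

parser = expat.ParserCreate()
parser.StartElementHandler = lambda tag, attrs: events.append(("start", tag, attrs))
parser.EndElementHandler = lambda tag: events.append(("end", tag))
parser.CharacterDataHandler = lambda data: events.append(("data", data))

# An ISO Latin-1 document; the encoding directive tells expat how to decode it.
document = (
    "<?xml version='1.0' encoding='iso-8859-1'?>"
    "<city>link\u00f6ping</city>"
).encode("iso-8859-1")

parser.Parse(document, True)   # True marks the end of the data

# Character data may arrive in several pieces, so join them.
city = "".join(item[1] for item in events if item[0] == "data")
```

Without the directive the parser would try to read the Latin-1 byte for ö as UTF-8 and report an error.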
The sgmllib module provides a basic SGML parser. It works much like the xmllib parser, but is less restrictive (and less complete). See Example 5-5.
As with xmllib, this parser calls methods on itself when it finds start tags, data sections, end tags, and entities. If you are only interested in a few tags, you can define special start and end methods for them.
File: sgmllib-example-1.py
import sgmllib
import string
class FoundTitle(Exception):
    pass
class ExtractTitle(sgmllib.SGMLParser):
    def __init__(self, verbose=0):
        sgmllib.SGMLParser.__init__(self, verbose)
        self.title = self.data = None
    def handle_data(self, data):
        if self.data is not None:
            self.data.append(data)
    def start_title(self, attrs):
        self.data = []
    def end_title(self):
        self.title = string.join(self.data, "")
        raise FoundTitle # abort parsing!
def extract(file):
    # extract title from an HTML/SGML stream
    p = ExtractTitle()
    try:
        while 1:
            # read small chunks
            s = file.read(512)
            if not s:
                break
            p.feed(s)
        p.close()
    except FoundTitle:
        return p.title
    return None
#
# try it out
print "html", "=>", extract(open("samples/sample.htm"))
print "sgml", "=>", extract(open("samples/sample.sgm"))
html => A Title.
sgml => Quotations
To handle all tags, override the unknown_starttag and unknown_endtag methods instead. See Example 5-6.
File: sgmllib-example-2.py
import sgmllib
import cgi, sys
class PrettyPrinter(sgmllib.SGMLParser):
    # A simple SGML pretty printer
    def __init__(self):
        # initialize base class
        sgmllib.SGMLParser.__init__(self)
        self.flag = 0
    def newline(self):
        # force newline, if necessary
        if self.flag:
            sys.stdout.write("\n")
        self.flag = 0
    def unknown_starttag(self, tag, attrs):
        # called for each start tag
        # the attrs argument is a list of (attr, value)
        # tuples. convert it to a string.
        text = ""
        for attr, value in attrs:
            text = text + " %s='%s'" % (attr, cgi.escape(value))
        self.newline()
        sys.stdout.write("<%s%s>\n" % (tag, text))
    def handle_data(self, text):
        # called for each text section
        sys.stdout.write(text)
        self.flag = (text[-1:] != "\n")
    def handle_entityref(self, text):
        # called for each entity
        sys.stdout.write("&%s;" % text)
    def unknown_endtag(self, tag):
        # called for each end tag
        self.newline()
        sys.stdout.write("</%s>" % tag)
#
# try it out
file = open("samples/sample.sgm")
p = PrettyPrinter()
p.feed(file.read())
p.close()
<chapter>
<title>
Quotations
</title>
<epigraph>
<attribution>
eff-bot, June 1997
</attribution>
<para>
<quote>
Nobody expects the Spanish Inquisition! Amongst
our weaponry are such diverse elements as fear, surprise,
ruthless efficiency, and an almost fanatical devotion to
Guido, and nice red uniforms &mdash; oh, damn!
</quote>
</para>
</epigraph>
</chapter>
Example 5-7 checks whether an SGML document is "well-formed" in the XML sense, that is, whether all elements are properly nested and every start tag has a matching end tag.
We keep a list of open start tags, and check that each end tag matches the most recent start tag. Finally, we make sure that no tags are left open when we reach the end of the file.
File: sgmllib-example-3.py
import sgmllib
class WellFormednessChecker(sgmllib.SGMLParser):
    # check that an SGML document is 'well-formed'
    # (in the XML sense).
    def __init__(self, file=None):
        sgmllib.SGMLParser.__init__(self)
        self.tags = []
        if file:
            self.load(file)
    def load(self, file):
        while 1:
            s = file.read(8192)
            if not s:
                break
            self.feed(s)
        self.close()
    def close(self):
        sgmllib.SGMLParser.close(self)
        if self.tags:
            raise SyntaxError, "start tag %s not closed" % self.tags[-1]
    def unknown_starttag(self, start, attrs):
        self.tags.append(start)
    def unknown_endtag(self, end):
        start = self.tags.pop()
        if end != start:
            raise SyntaxError, "end tag %s doesn't match start tag %s" %\
                  (end, start)
try:
    c = WellFormednessChecker()
    c.load(open("samples/sample.htm"))
except SyntaxError:
    raise # report error
else:
    print "document is well-formed"
Traceback (innermost last):
...
SyntaxError: end tag head doesn't match start tag meta
Finally, Example 5-8 contains a class that can be used to filter HTML and SGML documents. To use it, subclass it and implement the start and end methods.
File: sgmllib-example-4.py
import sgmllib
import cgi, string, sys
class SGMLFilter(sgmllib.SGMLParser):
    # sgml filter. override start/end to manipulate
    # document elements
    def __init__(self, outfile=None, infile=None):
        sgmllib.SGMLParser.__init__(self)
        if not outfile:
            outfile = sys.stdout
        self.write = outfile.write
        if infile:
            self.load(infile)
    def load(self, file):
        while 1:
            s = file.read(8192)
            if not s:
                break
            self.feed(s)
        self.close()
    def handle_entityref(self, name):
        self.write("&%s;" % name)
    def handle_data(self, data):
        self.write(cgi.escape(data))
    def unknown_starttag(self, tag, attrs):
        tag, attrs = self.start(tag, attrs)
        if tag:
            if not attrs:
                self.write("<%s>" % tag)
            else:
                self.write("<%s" % tag)
                for k, v in attrs:
                    self.write(" %s=%s" % (k, repr(v)))
                self.write(">")
    def unknown_endtag(self, tag):
        tag = self.end(tag)
        if tag:
            self.write("</%s>" % tag)
    def start(self, tag, attrs):
        return tag, attrs # override
    def end(self, tag):
        return tag # override
class Filter(SGMLFilter):
    def fixtag(self, tag):
        # map logical markup to physical markup, in upper case
        if tag == "em":
            tag = "i"
        if tag == "strong":
            tag = "b"
        return string.upper(tag)
    def start(self, tag, attrs):
        return self.fixtag(tag), attrs
    def end(self, tag):
        return self.fixtag(tag)
c = Filter()
c.load(open("samples/sample.htm"))
The htmllib module contains a tag-driven HTML parser that sends data to a formatter object, as shown in Example 5-9. For more examples of how to parse HTML, see the description of the formatter module.
File: htmllib-example-1.py
import htmllib
import formatter
import string
class Parser(htmllib.HTMLParser):
    # return a dictionary mapping anchor texts to lists
    # of associated hyperlinks
    def __init__(self, verbose=0):
        self.anchors = {}
        f = formatter.NullFormatter()
        htmllib.HTMLParser.__init__(self, f, verbose)
    def anchor_bgn(self, href, name, type):
        self.save_bgn()
        self.anchor = href
    def anchor_end(self):
        text = string.strip(self.save_end())
        if self.anchor and text:
            self.anchors[text] = self.anchors.get(text, []) + [self.anchor]
file = open("samples/sample.htm")
html = file.read()
file.close()
p = Parser()
p.feed(html)
p.close()
for k, v in p.anchors.items():
    print k, "=>", v
link => ['http://www.python.org']
If you only want to parse an HTML file, rather than render it to an output device, the sgmllib module is usually a better choice.
The htmlentitydefs module contains a dictionary of the ISO Latin-1 character entities defined by HTML. See Example 5-10.
File: htmlentitydefs-example-1.py
import htmlentitydefs
entities = htmlentitydefs.entitydefs
for entity in "amp", "quot", "copy", "yen":
    print entity, "=", entities[entity]
amp = &
quot = "
copy = ©
yen = ¥
Example 5-11 shows how to combine regular expressions with this dictionary to translate entities in a string (the opposite of cgi.escape).
File: htmlentitydefs-example-2.py
import htmlentitydefs
import re
import cgi
pattern = re.compile("&(\w+?);")
def descape_entity(m, defs=htmlentitydefs.entitydefs):
    # callback: translate one entity to its ISO Latin value
    try:
        return defs[m.group(1)]
    except KeyError:
        return m.group(0) # use as is
def descape(string):
    return pattern.sub(descape_entity, string)
print descape("&lt;spam&amp;eggs&gt;")
print descape(cgi.escape("<spam&eggs>"))
<spam&eggs>
<spam&eggs>
Finally, Example 5-12 shows how to convert XML reserved characters and ISO Latin-1 characters to an XML string. It is similar to cgi.escape, but it also replaces non-ASCII characters.
File: htmlentitydefs-example-3.py
import htmlentitydefs
import re, string
# this pattern matches substrings of reserved and non-ASCII characters
pattern = re.compile(r"[&<>\"\x80-\xff]+")
# create character map
entity_map = {}
for i in range(256):
    entity_map[chr(i)] = "&#%d;" % i
for entity, char in htmlentitydefs.entitydefs.items():
    if entity_map.has_key(char):
        entity_map[char] = "&%s;" % entity
def escape_entity(m, get=entity_map.get):
    return string.join(map(get, m.group()), "")
def escape(string):
    return pattern.sub(escape_entity, string)
print escape("<spam&eggs>")
print escape("\345 i \345a \344 e \366")
&lt;spam&amp;eggs&gt;
&aring; i &aring;a &auml; e &ouml;
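The same escaping idea can be sketched in Python 3, where the entity tables live in html.entities and html.unescape does the reverse translation. This is an illustration under those assumptions; the escape function below is a hypothetical helper, not a standard library function.

```python
import html
from html.entities import codepoint2name

def escape(text):
    # Replace reserved and non-ASCII characters with entity references,
    # using a named entity when one exists and a numeric one otherwise.
    out = []
    for ch in text:
        code = ord(ch)
        if ch in '&<>"' or code > 127:
            name = codepoint2name.get(code)
            if name:
                out.append("&%s;" % name)
            else:
                out.append("&#%d;" % code)
        else:
            out.append(ch)
    return "".join(out)

original = "\u00e5 i \u00e5a \u00e4 e \u00f6"   # "å i åa ä e ö"
escaped = escape("<spam&eggs>")
latin = escape(original)
roundtrip = html.unescape(latin)
```

A character-by-character loop is used here instead of a regular expression to keep the mapping easy to follow.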
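The formatter module provides formatter classes that can be used together with the htmllib module.
The module provides two kinds of classes, formatters and writers. A formatter converts the HTML parser's stream of tags and data into an event stream suitable for an output device, and a writer renders that event stream on the device. See Example 5-13.
In most cases, you can use the AbstractFormatter class to do the formatting. It calls methods on the writer object for the various formatting events. The AbstractWriter class simply prints a message for each method call.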
File: formatter-example-1.py
import formatter
import htmllib
w = formatter.AbstractWriter()
f = formatter.AbstractFormatter(w)
file = open("samples/sample.htm")
p = htmllib.HTMLParser(f)
p.feed(file.read())
p.close()
file.close()
send_paragraph(1)
new_font(('h1', 0, 1, 0))
send_flowing_data('A Chapter.')
send_line_break()
send_paragraph(1)
new_font(None)
send_flowing_data('Some text. Some more text. Some')
send_flowing_data(' ')
new_font((None, 1, None, None))
send_flowing_data('emphasized')
new_font(None)
send_flowing_data(' text. A')
send_flowing_data(' link')
send_flowing_data('[1]')
send_flowing_data('.')
The formatter module also provides a NullWriter class, which ignores all events passed to it, and a DumbWriter class, which converts the event stream to a plain text document. See Example 5-14.
File: formatter-example-2.py
import formatter
import htmllib
w = formatter.DumbWriter() # plain text
f = formatter.AbstractFormatter(w)
file = open("samples/sample.htm")
# print html body as plain text
p = htmllib.HTMLParser(f)
p.feed(file.read())
p.close()
file.close()
# print links
i = 1
for link in p.anchorlist:
    print i, "=>", link
    i = i + 1
A Chapter.
Some text. Some more text. Some emphasized text. A link[1].
1 => http://www.python.org
Example 5-15 provides a custom writer. It subclasses the DumbWriter class, keeps track of the current font style, and tweaks the output according to the font.
File: formatter-example-3.py
import formatter
import htmllib, string
class Writer(formatter.DumbWriter):
    def __init__(self):
        formatter.DumbWriter.__init__(self)
        self.tag = ""
        self.bold = self.italic = 0
        self.fonts = []
    def new_font(self, font):
        if font is None:
            font = self.fonts.pop()
            self.tag, self.bold, self.italic = font
        else:
            self.fonts.append((self.tag, self.bold, self.italic))
            tag, bold, italic, typewriter = font
            if tag is not None:
                self.tag = tag
            if bold is not None:
                self.bold = bold
            if italic is not None:
                self.italic = italic
    def send_flowing_data(self, data):
        if not data:
            return
        atbreak = self.atbreak or data[0] in string.whitespace
        for word in string.split(data):
            if atbreak:
                self.file.write(" ")
            if self.tag in ("h1", "h2", "h3"):
                word = string.upper(word)
            if self.bold:
                word = "*" + word + "*"
            if self.italic:
                word = "_" + word + "_"
            self.file.write(word)
            atbreak = 1
        self.atbreak = data[-1] in string.whitespace
w = Writer()
f = formatter.AbstractFormatter(w)
file = open("samples/sample.htm")
# print html body as plain text
p = htmllib.HTMLParser(f)
p.feed(file.read())
p.close()
_A_ _CHAPTER._
Some text. Some more text. Some *emphasized* text. A link[1].
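The ConfigParser module reads configuration files.
The file format is similar to that of Windows INI files: a file contains one or more sections, and each section can contain any number of configuration items.
Here is a sample configuration file, which is used by Example 5-16: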
[book]
title: The Python Standard Library
author: Fredrik Lundh
email: fredrik@pythonware.com
version: 2.0-001115
[ematter]
pages: 250
[hardcopy]
pages: 350
Example 5-16 uses the ConfigParser module to read this configuration file.
File: configparser-example-1.py
import ConfigParser
import string
config = ConfigParser.ConfigParser()
config.read("samples/sample.ini")
# print summary
print string.upper(config.get("book", "title"))
print "by", config.get("book", "author"),
print "(" + config.get("book", "email") + ")"
print config.get("ematter", "pages"), "pages"
# dump entire config file
for section in config.sections():
    print section
    for option in config.options(section):
        print " ", option, "=", config.get(section, option)
THE PYTHON STANDARD LIBRARY
by Fredrik Lundh (fredrik@pythonware.com)
250 pages
book
  title = The Python Standard Library
  email = fredrik@pythonware.com
  author = Fredrik Lundh
  version = 2.0-001115
  __name__ = book
ematter
  __name__ = ematter
  pages = 250
hardcopy
  __name__ = hardcopy
  pages = 350
In Python 2.0 and later, the ConfigParser module can also write configuration data to a file, as shown in Example 5-17.
File: configparser-example-2.py
import ConfigParser
import sys
config = ConfigParser.ConfigParser()
# set a number of parameters
config.add_section("book")
config.set("book", "title", "the python standard library")
config.set("book", "author", "fredrik lundh")
config.add_section("ematter")
config.set("ematter", "pages", 250)
# write to screen
config.write(sys.stdout)
[book]
title = the python standard library
author = fredrik lundh
[ematter]
pages = 250
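Reading and writing can be combined in one short Python 3 sketch. It assumes Python 3, where the module is named configparser, and it reads the sample text from a string with read_string instead of a file so that the example is self-contained.

```python
import configparser
import io

SAMPLE = """\
[book]
title: The Python Standard Library
author: Fredrik Lundh

[ematter]
pages: 250
"""

config = configparser.ConfigParser()
config.read_string(SAMPLE)

sections = config.sections()
title = config.get("book", "title")
pages = config.getint("ematter", "pages")   # convert the value to an int

# Change a value and write the configuration back out.
config.set("ematter", "pages", "300")
out = io.StringIO()
config.write(out)
written = out.getvalue()
```

Values are stored as strings; helpers such as getint convert them on the way out.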
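The netrc module can be used to parse .netrc configuration files, as shown in Example 5-18. Such files are stored in the user's home directory and hold FTP user names and passwords. (Remember to restrict access to the file with "chmod 0600 ~/.netrc", so that only the current user can read it.)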
File: netrc-example-1.py
import netrc
# default is $HOME/.netrc
info = netrc.netrc("samples/sample.netrc")
login, account, password = info.authenticators("secret.fbi")
print "login", "=>", repr(login)
print "account", "=>", repr(account)
print "password", "=>", repr(password)
login => 'mulder'
account => None
password => 'trustno1'
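A self-contained variant for Python 3 is sketched below. It assumes Python 3 and writes a throwaway .netrc-style file to a temporary directory, since the sample file used above is not available here; the file name and contents are made up for the illustration.

```python
import netrc
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.netrc")
    with open(path, "w") as f:
        f.write("machine secret.fbi login mulder password trustno1\n")

    info = netrc.netrc(path)
    # authenticators returns (login, account, password) for a host
    login, account, password = info.authenticators("secret.fbi")
    missing = info.authenticators("unknown.host")
```

A host that is not listed, and for which there is no default entry, yields None.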
The shlex module provides a simple lexer (also known as a tokenizer) for languages based on the Unix shell syntax. See Example 5-19.
File: shlex-example-1.py
import shlex
lexer = shlex.shlex(open("samples/sample.netrc", "r"))
lexer.wordchars = lexer.wordchars + "._"
while 1:
    token = lexer.get_token()
    if not token:
        break
    print repr(token)
'machine'
'secret.fbi'
'login'
'mulder'
'password'
'trustno1'
'machine'
'non.secret.fbi'
'login'
'scully'
'password'
'noway'
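The effect of extending wordchars can be seen in a small Python 3 sketch. It assumes Python 3, where a shlex object can be iterated over directly, and it reads from an in-memory string instead of the sample file.

```python
import io
import shlex

source = io.StringIO("machine secret.fbi login mulder password trustno1")

# Default lexer: "." is not a word character, so the host name is split.
plain_tokens = list(shlex.shlex(io.StringIO("secret.fbi")))

# Treat "." as part of a word, as in the example above.
lexer = shlex.shlex(source)
lexer.wordchars += "."
tokens = list(lexer)
```

Adjusting wordchars is the usual way to adapt the lexer to formats where punctuation belongs inside tokens.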
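(New in 2.0) The zipfile module reads and writes files in the ZIP format.
The namelist and infolist methods list the contents of an archive. The former returns a list of filenames, the latter a list of ZipInfo instances, as Example 5-20 shows.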
File: zipfile-example-1.py
import zipfile
file = zipfile.ZipFile("samples/sample.zip", "r")
# list filenames
for name in file.namelist():
    print name,
print
# list file information
for info in file.infolist():
    print info.filename, info.date_time, info.file_size
sample.txt sample.jpg
sample.txt (1999, 9, 11, 20, 11, 8) 302
sample.jpg (1999, 9, 18, 16, 9, 44) 4762
To read data from a ZIP archive, call the read method. It takes a filename as its argument and returns the data as a string, as Example 5-21 shows.
File: zipfile-example-2.py
import zipfile
file = zipfile.ZipFile("samples/sample.zip", "r")
for name in file.namelist():
    data = file.read(name)
    print name, len(data), repr(data[:10])
sample.txt 302 'We will pe'
sample.jpg 4762 '\377\330\377\340\000\020JFIF'
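Adding files to an archive is easy: pass the filename, and the name the file should have inside the ZIP archive, to the write method.
Example 5-22 packs all files in the samples directory into a single ZIP file.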
File: zipfile-example-3.py
import zipfile
import glob, os
# open the zip file for writing, and write stuff to it
file = zipfile.ZipFile("test.zip", "w")
for name in glob.glob("samples/*"):
    file.write(name, os.path.basename(name), zipfile.ZIP_DEFLATED)
file.close()
# open the file again, to see what's in it
file = zipfile.ZipFile("test.zip", "r")
for info in file.infolist():
    print info.filename, info.date_time, info.file_size, info.compress_size
sample.wav (1999, 8, 15, 21, 26, 46) 13260 10985
sample.jpg (1999, 9, 18, 16, 9, 44) 4762 4626
sample.au (1999, 7, 18, 20, 57, 34) 1676 1103
...
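The third, optional argument to the write method controls whether compression is used. The default is zipfile.ZIP_STORED, which stores the data in the archive without any compression. If the zlib module is installed, you can use zipfile.ZIP_DEFLATED to compress the data.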
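The zipfile module can also add string data to an archive. This takes a little care, though: you need to create a ZipInfo instance and configure it correctly. Example 5-23 offers a simple solution.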
File: zipfile-example-4.py
import zipfile
import glob, os, time
file = zipfile.ZipFile("test.zip", "w")
now = time.localtime(time.time())[:6]
for name in ("life", "of", "brian"):
    info = zipfile.ZipInfo(name)
    info.date_time = now
    info.compress_type = zipfile.ZIP_DEFLATED
    file.writestr(info, name*1000)
file.close()
# open the file again, to see what's in it
file = zipfile.ZipFile("test.zip", "r")
for info in file.infolist():
    print info.filename, info.date_time, info.file_size, info.compress_size
life (2000, 12, 1, 0, 12, 1) 4000 26
of (2000, 12, 1, 0, 12, 1) 2000 18
brian (2000, 12, 1, 0, 12, 1) 5000 31
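The ZipInfo approach can also be tried entirely in memory. This is a sketch under two assumptions: Python 3 (where ZipFile works as a context manager and accepts a file-like object) and an installed zlib module. The BytesIO buffer replaces test.zip purely for illustration.

```python
import io
import time
import zipfile

buffer = io.BytesIO()
now = time.localtime(time.time())[:6]

# write three string members, each with its own ZipInfo record
with zipfile.ZipFile(buffer, "w") as archive:
    for name in ("life", "of", "brian"):
        info = zipfile.ZipInfo(name, date_time=now)
        info.compress_type = zipfile.ZIP_DEFLATED
        archive.writestr(info, name * 1000)

# open the archive again, to see what's in it
with zipfile.ZipFile(buffer, "r") as archive:
    sizes = {info.filename: info.file_size for info in archive.infolist()}
    data = archive.read("brian")

print(sizes)
```

Setting compress_type on each ZipInfo is what selects deflate for that member; members without it would be stored uncompressed.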
The gzip module reads and writes gzip-compressed files, as Example 5-24 shows.
File: gzip-example-1.py
import gzip
file = gzip.GzipFile("samples/sample.gz")
print file.read()
Well it certainly looks as though we're in for
a splendid afternoon's sport in this the 127th
Upperclass Twit of the Year Show.
The standard implementation doesn't support the seek and tell methods. Example 5-25 shows how to work around this.
File: gzip-example-2.py
import gzip
class gzipFile(gzip.GzipFile):
    # adds seek/tell support to GzipFile
    offset = 0
    def read(self, size=None):
        data = gzip.GzipFile.read(self, size)
        self.offset = self.offset + len(data)
        return data
    def seek(self, offset, whence=0):
        # figure out new position (we can only seek forwards)
        if whence == 0:
            position = offset
        elif whence == 1:
            position = self.offset + offset
        else:
            raise IOError, "Illegal argument"
        if position < self.offset:
            raise IOError, "Cannot seek backwards"
        # skip forward, in 16k blocks
        while position > self.offset:
            if not self.read(min(position - self.offset, 16384)):
                break
    def tell(self):
        return self.offset
#
# try it
file = gzipFile("samples/sample.gz")
file.seek(80)
print file.read()
this the 127th
Upperclass Twit of the Year Show.
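The forward-only seek above boils down to reading and discarding data. Here is a small sketch of that idea as a standalone helper, assuming Python 3 (where GzipFile has its own seek method, so this is for illustration only). The skip_forward name and the in-memory test data are hypothetical.

```python
import gzip
import io

def skip_forward(stream, position, offset=0, blocksize=16384):
    """Read and discard data until offset reaches position."""
    if position < offset:
        raise IOError("cannot seek backwards")
    while position > offset:
        data = stream.read(min(position - offset, blocksize))
        if not data:
            break  # reached the end of the stream early
        offset += len(data)
    return offset

# compress 100 bytes in memory, then skip the first 80
text = b"0123456789" * 10
stream = gzip.GzipFile(fileobj=io.BytesIO(gzip.compress(text)))
offset = skip_forward(stream, 80)
rest = stream.read()
```

Reading in bounded blocks keeps memory use flat even when the skipped region is large.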
"To be removed from our list of future commercial postings by [SOME] PUBLISHING COMPANY an Annual Charge of Ninety Five dollars is required. Just send $95.00 with your Name, Address and Name of the Newsgroup to be removed from our list."
- Newsgroup spammer, July 1996
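Python comes with a large number of modules for processing mail and newsgroup messages, including support for many common mail formats.
The rfc822 module contains a parser for mail and news messages (and any other messages that conform to the RFC 822 standard, such as HTTP headers).
Typically, an RFC 822-style message consists of a number of header fields, followed by at least one blank line, and then the message body.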
For example, here's a short mail message. The first five lines make up the message header, and the actual message (a single line, in this case) follows after an empty line:
Message-Id: <20001114144603.00abb310@oreilly.com>
Date: Tue, 14 Nov 2000 14:55:07 -0500
To: "Fredrik Lundh" <fredrik@effbot.org>
From: "Frank" <your@editor>
Subject: Re: python library book!

Where is it?
The message parser reads the header fields and returns a dictionary-like object keyed on the header names, as Example 6-1 shows.
File: rfc822-example-1.py
import rfc822
file = open("samples/sample.eml")
message = rfc822.Message(file)
for k, v in message.items():
    print k, "=", v
print len(file.read()), "bytes in body"
subject = Re: python library book!
from = "Frank" <your@editor>
message-id = <20001114144603.00abb310@oreilly.com>
to = "Fredrik Lundh" <fredrik@effbot.org>
date = Tue, 14 Nov 2000 14:55:07 -0500
25 bytes in body
The message object also provides a few convenience methods that parse address fields and dates, as Example 6-2 shows.
File: rfc822-example-2.py
import rfc822
file = open("samples/sample.eml")
message = rfc822.Message(file)
print message.getdate("date")
print message.getaddr("from")
print message.getaddrlist("to")
(2000, 11, 14, 14, 55, 7, 0, 0, 0)
('Frank', 'your@editor')
[('Fredrik Lundh', 'fredrik@effbot.org')]
Address fields are parsed into (real name, email address) tuples. Date fields are parsed into 9-element time tuples, which can be passed to the time module.
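The rfc822 module does not exist in Python 3. As a rough sketch of the same two parsing steps there, the email.utils helpers can be applied directly to header values; the header strings below are copied from the sample message, and using email.utils is my substitution rather than anything this chapter covers.

```python
import time
from email.utils import getaddresses, parseaddr, parsedate

# a single address yields a (real name, email address) tuple
realname, address = parseaddr('"Fredrik Lundh" <fredrik@effbot.org>')

# a list of header values yields a list of such tuples
recipients = getaddresses(['"Fredrik Lundh" <fredrik@effbot.org>'])

# a date header yields a 9-element time tuple
timetuple = parsedate("Tue, 14 Nov 2000 14:55:07 -0500")

# the tuple can be handed straight to the time module
formatted = time.strftime("%Y-%m-%d %H:%M:%S", timetuple)
print(realname, address, formatted)
```

Note that parsedate ignores the time zone offset; the tuple holds the clock time exactly as written in the header.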
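The Multipurpose Internet Mail Extensions (MIME) standard defines how to store non-ASCII text, images, and other data in RFC 822-style messages.
The mimetools module contains a number of tools for reading and writing MIME messages. It also provides a class similar to the Message class in the rfc822 module, for handling MIME-encoded messages, as Example 6-3 shows.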
File: mimetools-example-1.py
import mimetools
file = open("samples/sample.msg")
msg = mimetools.Message(file)
print "type", "=>", msg.gettype()
print "encoding", "=>", msg.getencoding()
print "plist", "=>", msg.getplist()
print "header", "=>"
for k, v in msg.items():
    print " ", k, "=", v
type => text/plain
encoding => 7bit
plist => ['charset="iso-8859-1"']
header =>
mime-version = 1.0
content-type = text/plain;
charset="iso-8859-1"
to = effbot@spam.egg
date = Fri, 15 Oct 1999 03:21:15 -0400
content-transfer-encoding = 7bit
from = "Fredrik Lundh" <fredrik@pythonware.com>
subject = By the way...
...
The MimeWriter module generates "multipart" messages that conform to the MIME mail standard, as Example 6-4 shows.
File: mimewriter-example-1.py
import MimeWriter
# data encoders
import quopri
import base64
import StringIO
import sys
TEXT = """
here comes the image you asked for. hope
it's what you expected.
</F>"""
FILE = "samples/sample.jpg"
file = sys.stdout
#
# create a mime multipart writer instance
mime = MimeWriter.MimeWriter(file)
mime.addheader("Mime-Version", "1.0")
mime.startmultipartbody("mixed")
# add a text message
part = mime.nextpart()
part.addheader("Content-Transfer-Encoding", "quoted-printable")
part.startbody("text/plain")
quopri.encode(StringIO.StringIO(TEXT), file, 0)
# add an image
part = mime.nextpart()
part.addheader("Content-Transfer-Encoding", "base64")
part.startbody("image/jpeg")
base64.encode(open(FILE, "rb"), file)
mime.lastpart()
The output looks like this:
Content-Type: multipart/mixed;
boundary='host.1.-852461.936831373.130.24813'
--host.1.-852461.936831373.130.24813
Content-Type: text/plain
Content-Transfer-Encoding: quoted-printable
here comes the image you asked for. hope
it's what you expected.
</F>
--host.1.-852461.936831373.130.24813
Content-Type: image/jpeg
Content-Transfer-Encoding: base64
/9j/4AAQSkZJRgABAQAAAQABAAD/2wBDAAgGBgcGBQgHBwcJCQgKDBQNDAsLDBkSEw8UHRof
HBwgJC4nICIsIxwcKDcpLDAxNDQ0Hyc5PTgyPC4zNDL/2wBDAQkJCQwLDBgNDRgyIRwhMjIy
...
1e5vLrSYbJnEVpEgjCLx5mPU0qsVK0UaxjdNlS+1U6pfzTR8IzEhj2HrVG6m8m18xc8cIKSC
tCuFyC746j/Cq2pTia4WztfmKjGBXTCmo6IUpt==
--host.1.-852461.936831373.130.24813--
Example 6-5 uses a helper class to store each subpart.
File: mimewriter-example-2.py
import MimeWriter
import string, StringIO, sys
import re, quopri, base64
# check if string contains non-ascii characters
must_quote = re.compile("[\177-\377]").search
#
# encoders
def encode_quoted_printable(infile, outfile):
    quopri.encode(infile, outfile, 0)
class Writer:
    def __init__(self, file=None, blurb=None):
        if file is None:
            file = sys.stdout
        self.file = file
        self.mime = MimeWriter.MimeWriter(file)
        self.mime.addheader("Mime-Version", "1.0")
        file = self.mime.startmultipartbody("mixed")
        if blurb:
            file.write(blurb)
    def close(self):
        "End of message"
        self.mime.lastpart()
        self.mime = self.file = None
    def write(self, data, mimetype="text/plain"):
        "Write data from string or file to message"
        # data is either an opened file or a string
        if type(data) is type(""):
            file = StringIO.StringIO(data)
        else:
            file = data
            data = None
        part = self.mime.nextpart()
        typ, subtyp = string.split(mimetype, "/", 1)
        if typ == "text":
            # text data
            encoding = "quoted-printable"
            encoder = lambda i, o: quopri.encode(i, o, 0)
            if data and not must_quote(data):
                # copy, don't encode
                encoding = "7bit"
                encoder = None
        else:
            # binary data (image, audio, application, ...)
            encoding = "base64"
            encoder = base64.encode
        #
        # write part headers
        if encoding:
            part.addheader("Content-Transfer-Encoding", encoding)
        part.startbody(mimetype)
        #
        # write part body
        if encoder:
            encoder(file, self.file)
        elif data:
            self.file.write(data)
        else:
            # copy raw data from the input file, in 16k blocks
            while 1:
                data = file.read(16384)
                if not data:
                    break
                self.file.write(data)
#
# try it out
BLURB = "if you can read this, your mailer is not MIME-aware\n"
mime = Writer(sys.stdout, BLURB)
# add a text message
mime.write("""\
here comes the image you asked for. hope
it's what you expected.
""", "text/plain")
# add an image
mime.write(open("samples/sample.jpg", "rb"), "image/jpeg")
mime.close()
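The mailbox module contains code to deal with a number of different mailbox formats, as Example 6-6 shows. Most mailbox formats store plain RFC 822-style messages in a text file, using separator lines to tell one message from the next.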
File: mailbox-example-1.py
import mailbox
mb = mailbox.UnixMailbox(open("/var/spool/mail/effbot"))
while 1:
    msg = mb.next()
    if not msg:
        break
    for k, v in msg.items():
        print k, "=", v
    body = msg.fp.read()
    print len(body), "bytes in body"
subject = for he's a ...
message-id = <199910150027.CAA03202@spam.egg>
received = (from fredrik@pythonware.com)
by spam.egg (8.8.7/8.8.5) id CAA03202
for effbot; Fri, 15 Oct 1999 02:27:36 +0200
from = Fredrik Lundh <fredrik@pythonware.com>
date = Fri, 15 Oct 1999 12:35:36 +0200
to = effbot@spam.egg
1295 bytes in body
The mailcap module reads mailcap files, which specify how to handle different document formats (on Unix systems), as Example 6-7 shows.
File: mailcap-example-1.py
import mailcap
caps = mailcap.getcaps()
for k, v in caps.items():
    print k, "=", v
image/* = [{'view': 'pilview'}]
application/postscript = [{'view': 'ghostview'}]
In Example 6-7, the system uses pilview to view all kinds of images, and a ghostscript viewer to view PostScript documents. Example 6-8 shows how to use the mailcap module to find the command for a specific operation.
File: mailcap-example-2.py
import mailcap
caps = mailcap.getcaps()
command, info = mailcap.findmatch(
caps, "image/jpeg", "view", "samples/sample.jpg"
)
print command
pilview samples/sample.jpg
The mimetypes module determines the MIME type of a given uniform resource locator (URL). It is based on a built-in table, and may also read the Apache and Netscape configuration files, as Example 6-9 shows.
File: mimetypes-example-1.py
import mimetypes
import glob, urllib
for file in glob.glob("samples/*"):
    url = urllib.pathname2url(file)
    print file, mimetypes.guess_type(url)
samples/sample.au ('audio/basic', None)
samples/sample.ini (None, None)
samples/sample.jpg ('image/jpeg', None)
samples/sample.msg (None, None)
samples/sample.tar ('application/x-tar', None)
samples/sample.tgz ('application/x-tar', 'gzip')
samples/sample.txt ('text/plain', None)
samples/sample.wav ('audio/x-wav', None)
samples/sample.zip ('application/zip', None)
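The lookup only needs a name, not a real file. A minimal sketch, assuming Python 3 and names borrowed from the listing above:

```python
import mimetypes

# guess_type returns a (type, encoding) tuple for each name
results = {}
for name in ("sample.jpg", "sample.txt", "sample.tgz"):
    results[name] = mimetypes.guess_type(name)
    print(name, results[name])
```

The second element of the tuple is the encoding layered on top of the type, which is how a .tgz file is reported as a gzip-encoded tar archive.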
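(Obsolete) The packmail module creates Unix shell archives. If you have the right tools installed, you can unpack such an archive simply by running it. Example 6-10 shows how to pack a single file, while Example 6-11 packs an entire directory tree.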
File: packmail-example-1.py
import packmail
import sys
packmail.pack(sys.stdout, "samples/sample.txt", "sample.txt")
echo sample.txt
sed "s/^X//" >sample.txt <<"!"
XWe will perhaps eventually be writing only small
Xmodules, which are identified by name as they are
Xused to build larger ones, so that devices like
Xindentation, rather than delimiters, might become
Xfeasible for expressing local structure in the
Xsource language.
X -- Donald E. Knuth, December 1974
!
Example 6-11. Using the packmail Module to Pack an Entire Directory Tree
File: packmail-example-2.py
import packmail
import sys
packmail.packtree(sys.stdout, "samples")
Note that this module cannot handle binary files, such as sound or image files.
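The mimify module converts between MIME-encoded text messages and plain text messages (such as ISO Latin 1 text). It can be used as a command-line tool, or as a conversion filter for certain mail agents: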
$ mimify.py -e raw-message mime-message
$ mimify.py -d mime-message raw-message
It can also be used as a module, as Example 6-12 shows.
File: mimify-example-1.py
import mimify
import sys
mimify.unmimify("samples/sample.msg", sys.stdout, 1)
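Here is a MIME message with two parts, one encoded as quoted-printable and the other as base64. The third argument to unmimify controls whether base64-encoded parts are decoded automatically: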
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary='boundary'
this is a multipart sample file. the two
parts both contain ISO Latin 1 text, with
different encoding techniques.
--boundary
Content-Type: text/plain
Content-Transfer-Encoding: quoted-printable
sillmj=F6lke! blindstyre! medisterkorv!
--boundary
Content-Type: text/plain
Content-Transfer-Encoding: base64
a29tIG5lciBiYXJhLCBvbSBkdSB09nJzIQ==
--boundary--
And here is the decoded result (which is, relatively speaking, more readable):
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary= 'boundary'
this is a multipart sample file. the two
parts both contain ISO Latin 1 text, with
different encoding techniques.
--boundary
Content-Type: text/plain
sillmjölke! blindstyre! medisterkorv!
--boundary
Content-Type: text/plain
kom ner bara, om du törs!
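To see what happens to each part, the two bodies can be decoded by hand. This is a sketch assuming Python 3, using the quopri and base64 modules rather than mimify itself; the encoded strings are copied from the sample message above.

```python
import base64
import quopri

# the two encoded bodies from the sample message
qp_part = b"sillmj=F6lke! blindstyre! medisterkorv!"
b64_part = b"a29tIG5lciBiYXJhLCBvbSBkdSB09nJzIQ=="

# decode to bytes, then interpret the bytes as ISO Latin 1 text
text1 = quopri.decodestring(qp_part).decode("iso-8859-1")
text2 = base64.b64decode(b64_part).decode("iso-8859-1")

print(text1)
print(text2)
```

Both encodings carry the same kind of 8-bit text; quoted-printable only escapes the non-ASCII bytes, while base64 re-encodes everything.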
Example 6-13 shows how to encode messages.
File: mimify-example-2.py
import mimify
import StringIO, sys
#
# decode message into a string buffer
file = StringIO.StringIO()
mimify.unmimify("samples/sample.msg", file, 1)
#
# encode message from string buffer
file.seek(0) # rewind
mimify.mimify(file, sys.stdout)
The multifile module lets you treat each part of a multipart MIME message as a separate file, as Example 6-14 shows.
File: multifile-example-1.py
import multifile
import cgi, rfc822
infile = open("samples/sample.msg")
message = rfc822.Message(infile)
# print parsed header
for k, v in message.items():
    print k, "=", v
# use cgi support function to parse content-type header
type, params = cgi.parse_header(message["content-type"])
if type[:10] == "multipart/":
    # multipart message
    boundary = params["boundary"]
    file = multifile.MultiFile(infile)
    file.push(boundary)
    while file.next():
        submessage = rfc822.Message(file)
        # print submessage
        print "-" * 68
        for k, v in submessage.items():
            print k, "=", v
        print file.read()
    file.pop()
else:
    # plain message
    print infile.read()
"Increasingly, people seem to misinterpret complexity as sophistication, which is baffling - the incomprehensible should cause suspicion rather than admiration. Possibly this trend results from a mistaken belief that using a somewhat mysterious device confers an aura of power on the user."
- Niklaus Wirth
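This chapter describes Python's socket protocol support, and the networking modules built on top of the socket module. These include client handlers for most popular Internet protocols, as well as several frameworks that can be used to implement Internet servers.
For the low-level examples in this chapter, I'll use two protocols as illustrations: the Internet Time Protocol and the Hypertext Transfer Protocol (HTTP).
The Internet Time Protocol (RFC 868, Postel and Harrenstien, 1983) lets a network client get the current time from a server.
Since this protocol is lightweight, many (but not all) Unix systems provide this service. It is probably about as simple as a network protocol can get. The server waits for a connection request and, once connected, returns the current time as a 4-byte integer containing the number of seconds since January 1, 1900.
The protocol is simple enough that the full specification can be included here: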
File: rfc868.txt
Network Working Group J. Postel - ISI
Request for Comments: 868 K. Harrenstien - SRI
May 1983
Time Protocol
This RFC specifies a standard for the ARPA Internet community. Hosts on
the ARPA Internet that choose to implement a Time Protocol are expected
to adopt and implement this standard.
This protocol provides a site-independent, machine readable date and
time. The Time service sends back to the originating source the time in
seconds since midnight on January first 1900.
One motivation arises from the fact that not all systems have a
date/time clock, and all are subject to occasional human or machine
error. The use of time-servers makes it possible to quickly confirm or
correct a system's idea of the time, by making a brief poll of several
independent sites on the network.
This protocol may be used either above the Transmission Control Protocol
(TCP) or above the User Datagram Protocol (UDP).
When used via TCP the time service works as follows:
* S: Listen on port 37 (45 octal).
* U: Connect to port 37.
* S: Send the time as a 32 bit binary number.
* U: Receive the time.
* U: Close the connection.
* S: Close the connection.
The server listens for a connection on port 37. When the connection
is established, the server returns a 32-bit time value and closes the
connection. If the server is unable to determine the time at its
site, it should either refuse the connection or close it without
sending anything.
When used via UDP the time service works as follows:
S: Listen on port 37 (45 octal).
U: Send an empty datagram to port 37.
S: Receive the empty datagram.
S: Send a datagram containing the time as a 32 bit binary number.
U: Receive the time datagram.
The server listens for a datagram on port 37. When a datagram
arrives, the server returns a datagram containing the 32-bit time
value. If the server is unable to determine the time at its site, it
should discard the arriving datagram and make no reply.
The Time
The time is the number of seconds since 00:00 (midnight) 1 January 1900
GMT, such that the time 1 is 12:00:01 am on 1 January 1900 GMT; this
base will serve until the year 2036.
For example:
the time 2,208,988,800 corresponds to 00:00 1 Jan 1970 GMT,
2,398,291,200 corresponds to 00:00 1 Jan 1976 GMT,
2,524,521,600 corresponds to 00:00 1 Jan 1980 GMT,
2,629,584,000 corresponds to 00:00 1 May 1983 GMT,
and -1,297,728,000 corresponds to 00:00 17 Nov 1858 GMT.
RFC868.txt Translated By Andelf(gt: andelf@gmail.com )
For non-commercial use only; please keep the author information when reproducing. Thx.
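The Hypertext Transfer Protocol (HTTP, RFC 2616) is something completely different. The most recent specification (version 1.1) is over 100 pages long.
In its simplest form, though, the protocol is very straightforward. To fetch a document, the client sends a request like the following to the server: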
GET /hello.txt HTTP/1.0
Host: hostname
User-Agent: name
[optional request body]
The server returns a response like this:
HTTP/1.0 200 OK
Content-Type: text/plain
Content-Length: 7
Hello
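Both the request and response headers usually contain more fields, but the Host field in the request header is the only one that must always be present.
The header lines are separated by "\r\n", and the header must be followed by an empty line, even if there is no body (this applies to both the request and the response).
The rest of the HTTP specification deals with details such as content negotiation, cache mechanics, and persistent connections. For the full story, see Hypertext Transfer Protocol - HTTP/1.1 (http://www.w3.org/Protocols).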
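The framing rules above (CRLF between header lines, an empty line before the body) can be sketched in a few lines. This assumes Python 3; build_request and split_response are hypothetical helper names, and the raw response is the sample response from above written out as bytes.

```python
def build_request(path, host):
    # header lines are separated by CRLF and followed by an empty line
    lines = ["GET %s HTTP/1.0" % path, "Host: %s" % host]
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")

def split_response(raw):
    # the first empty line separates the header from the body
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    status = lines[0].split(" ", 2)  # version, status code, message
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return status, headers, body

request = build_request("/hello.txt", "hostname")
raw = (b"HTTP/1.0 200 OK\r\n"
       b"Content-Type: text/plain\r\n"
       b"Content-Length: 7\r\n"
       b"\r\n"
       b"Hello\r\n")
status, headers, body = split_response(raw)
```

The sketch ignores continuation lines and repeated header fields, which a real client would have to handle.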
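The socket module implements an interface to the socket communication layer. You can use this module to create both client and server sockets.
Let's start with a client. The client in Example 7-1 connects to a time protocol server, reads the 4-byte response, and converts it to a time value.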
File: socket-example-1.py
import socket
import struct, time
# server
HOST = "www.python.org"
PORT = 37
# reference time (in seconds since 1900-01-01 00:00:00)
TIME1970 = 2208988800L # 1970-01-01 00:00:00
# connect to server
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect((HOST, PORT))
# read 4 bytes, and convert to time value
t = s.recv(4)
t = struct.unpack("!I", t)[0]
t = int(t - TIME1970)
s.close()
# print results
print "server time is", time.ctime(t)
print "local clock is", int(time.time()) - t, "seconds off"
server time is Sat Oct 09 16:42:36 1999
local clock is 8 seconds off
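The socket factory function creates a new socket of the given type (in this case, an Internet stream socket, also known as a TCP socket). The connect method attempts to connect this socket to the given server. Once that has succeeded, the recv method is used to read data.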
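The conversion between the 4-byte wire value and an ordinary Unix timestamp can be checked without a network connection. A minimal sketch, assuming Python 3; to_wire and from_wire are hypothetical helper names, and the sample values come from RFC 868.

```python
import struct
import time

# seconds between 1900-01-01 and 1970-01-01 (from RFC 868)
TIME1970 = 2208988800

def to_wire(unix_seconds):
    # pack as an unsigned 32-bit integer in network byte order
    return struct.pack("!I", int(unix_seconds) + TIME1970)

def from_wire(data):
    # unpack the 4 bytes and shift back to the Unix epoch
    return struct.unpack("!I", data)[0] - TIME1970

wire = to_wire(0)
roundtrip = from_wire(wire)

# RFC 868 lists 2,524,521,600 as 00:00 1 Jan 1980 GMT
unix_1980 = from_wire(struct.pack("!I", 2524521600))
date_1980 = time.gmtime(unix_1980)[:3]
```

The "!" prefix selects network (big-endian) byte order, which is what the protocol's 32-bit binary number uses on the wire.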
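Creating a server socket is done in much the same way. But instead of connecting to a server, you bind the socket to a port on the local machine, tell it to listen for incoming connection requests, and process each request as quickly as possible.
Example 7-2 creates a time server, bound to port 8037 on the local machine (port numbers below 1024 are reserved for system services, and you need root privileges to use them on a Unix system).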
File: socket-example-2.py
import socket
import struct, time
# user-accessible port
PORT = 8037
# reference time
TIME1970 = 2208988800L
# establish server
service = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
service.bind(("", PORT))
service.listen(1)
print "listening on port", PORT
while 1:
    # serve forever
    channel, info = service.accept()
    print "connection from", info
    t = int(time.time()) + TIME1970
    t = struct.pack("!I", t)
    channel.send(t) # send timestamp
    channel.close() # disconnect
listening on port 8037
connection from ('127.0.0.1', 1469)
connection from ('127.0.0.1', 1470)
...
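The listen call tells the socket that we're willing to accept incoming connections. The argument gives the size of the connection queue (which holds connection requests that the program hasn't gotten around to processing yet). Finally, the accept loop returns the current time to every client that connects.
Note that the accept function returns a new socket object, which is directly connected to the client. The original socket is only used to establish the connection; all further traffic goes via the new socket.
To test this server, we can use Example 7-3, a generalized version of Example 7-1.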
File: timeclient.py
import socket
import struct, sys, time
# default server
host = "localhost"
port = 8037
# reference time (in seconds since 1900-01-01 00:00:00)
TIME1970 = 2208988800L # 1970-01-01 00:00:00
def gettime(host, port):
    # fetch time buffer from stream server
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.connect((host, port))
    t = s.recv(4)
    s.close()
    t = struct.unpack("!I", t)[0]
    return int(t - TIME1970)
if __name__ == "__main__":
    # command-line utility
    if sys.argv[1:]:
        host = sys.argv[1]
        if sys.argv[2:]:
            port = int(sys.argv[2])
        else:
            port = 37 # default for public servers
    t = gettime(host, port)
    print "server time is", time.ctime(t)
    print "local clock is", int(time.time()) - t, "seconds off"
server time is Sat Oct 09 16:58:50 1999
local clock is 0 seconds off
The script in Example 7-3 can also be used as a module; to get the current time from a server, import the timeclient module and call its gettime function.
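One caveat when reusing gettime: on a stream socket, a single recv(4) call may return fewer than 4 bytes. The sketch below shows one way to guard against that, assuming Python 3. The recv_exact helper is hypothetical, and socket.socketpair stands in for a real server connection so the example runs on its own.

```python
import socket
import struct

TIME1970 = 2208988800

def recv_exact(sock, count):
    # keep calling recv until exactly count bytes have arrived
    chunks = []
    while count > 0:
        chunk = sock.recv(count)
        if not chunk:
            raise EOFError("connection closed with %d bytes missing" % count)
        chunks.append(chunk)
        count -= len(chunk)
    return b"".join(chunks)

# a connected pair of sockets plays the roles of server and client
server, client = socket.socketpair()

# send the timestamp in two pieces, to mimic a split delivery
payload = struct.pack("!I", 0 + TIME1970)
server.sendall(payload[:2])
server.sendall(payload[2:])
server.close()

value = struct.unpack("!I", recv_exact(client, 4))[0]
client.close()
unix_time = value - TIME1970
```

For a 4-byte reply the split is rare in practice, but the loop costs little and makes the unpack call safe.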
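So far, we've used stream (TCP) sockets. The time protocol specification also mentions UDP sockets, or datagrams. Stream sockets work pretty much like a phone line; you'll know if someone at the remote end picks up the receiver, and you'll notice when they hang up. In contrast, sending datagrams is more like shouting into a dark room. There might be someone there, but you won't know unless they reply.
As Example 7-4 shows, you don't need to connect to the remote machine to send data over a datagram socket. Instead, you use the sendto method, which takes both the data and the address of the receiver. To read incoming datagrams, use the recvfrom method.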
File: socket-example-4.py
import socket
import struct, time
# server
HOST = "localhost"
PORT = 8037
# reference time (in seconds since 1900-01-01 00:00:00)
TIME1970 = 2208988800L # 1970-01-01 00:00:00
# connect to server
s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
# send empty packet
s.sendto("", (HOST, PORT))
# read 4 bytes from server, and convert to time value
t, server = s.recvfrom(4)
t = struct.unpack("!I", t)[0]
t = int(t - TIME1970)
s.close()
print "server time is", time.ctime(t)
print "local clock is", int(time.time()) - t, "seconds off"
server time is Sat Oct 09 16:42:36 1999
local clock is 8 seconds off
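Note that recvfrom returns two values: the data itself, and the address of the sender. The latter is used to send the reply.
Example 7-5 shows the corresponding server.
Example 7-5. Using the socket Module to Implement a Datagram Time Server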
File: socket-example-5.py
import socket
import struct, time
# user-accessible port
PORT = 8037
# reference time
TIME1970 = 2208988800L
# establish server
service = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
service.bind(("", PORT))
print "listening on port", PORT
while 1:
    # serve forever
    data, client = service.recvfrom(0)
    print "connection from", client
    t = int(time.time()) + TIME1970
    t = struct.pack("!I", t)
    service.sendto(t, client) # send timestamp
listening on port 8037
connection from ('127.0.0.1', 1469)
connection from ('127.0.0.1', 1470)
...
The main difference is that the server uses bind to assign a known port number to the socket, and sends data back to the client using the address returned by recvfrom.
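The whole datagram exchange fits in one short script if both ends live in the same process. This is a sketch assuming Python 3 and a working loopback interface; binding to port 0 (so the system picks a free port) and the 5-second timeouts are my additions.

```python
import socket
import struct
import time

TIME1970 = 2208988800

# server side: bind to a free port on the loopback interface
server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
server.bind(("127.0.0.1", 0))
server.settimeout(5.0)
address = server.getsockname()

# client side: send an empty datagram, no connect needed
client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
client.settimeout(5.0)
client.sendto(b"", address)

# server: receive the request and reply to whoever sent it
data, sender = server.recvfrom(16)
server.sendto(struct.pack("!I", int(time.time()) + TIME1970), sender)

# client: read the 4-byte reply and convert it to a time value
reply, _ = client.recvfrom(4)
server_time = struct.unpack("!I", reply)[0] - TIME1970

client.close()
server.close()
print("server time is", time.ctime(server_time))
```

The timeouts matter more here than with stream sockets: a lost datagram produces no error, so without them a recvfrom call could wait forever.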
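The select module lets you check for incoming data on one or more sockets, pipes, or other compatible stream objects, as Example 7-6 shows.
You can pass one or more sockets to the select function and wait for them to change state (become readable or writable, or signal an error):
* A socket becomes readable when someone connects after a call to listen (which means that accept won't block), when data arrives from the remote end, or when the socket is closed or reset (in which case recv returns an empty string).
* A socket becomes writable when the connection is established after a call to connect, or when data can be written to the socket.
* A socket signals an error condition when the connection fails after a call to connect.
File: select-example-1.py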
import select
import socket
import time
PORT = 8037
TIME1970 = 2208988800L
service = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
service.bind(("", PORT))
service.listen(1)
print "listening on port", PORT
while 1:
    is_readable = [service]
    is_writable = []
    is_error = []
    r, w, e = select.select(is_readable, is_writable, is_error, 1.0)
    if r:
        channel, info = service.accept()
        print "connection from", info
        t = int(time.time()) + TIME1970
        t = chr(t>>24&255) + chr(t>>16&255) + chr(t>>8&255) + chr(t&255)
        channel.send(t) # send timestamp
        channel.close() # disconnect
    else:
        print "still waiting"
listening on port 8037
still waiting
still waiting
connection from ('127.0.0.1', 1469)
still waiting
connection from ('127.0.0.1', 1470)
...
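In Example 7-6, we wait for the listening socket to become readable, which indicates that a connection request has arrived. We treat the channel socket as usual, since it's not very likely that writing 4 bytes will fill the network buffers. If you need to send larger amounts of data to the client, you should add the channel socket to the is_writable list at the top of the loop, and write only when select tells you to.
If you set the socket in non-blocking mode (by calling the setblocking method), you can use select to wait for a socket to connect as well. But the asyncore module (see the next section) provides a powerful framework that handles all this for you, so I won't go into further detail here.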
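The readable and writable states are easy to observe on a connected pair of sockets. A small sketch, assuming Python 3; socket.socketpair is used here only to get two connected sockets without setting up a server.

```python
import select
import socket

a, b = socket.socketpair()

# nothing has been sent yet, so a is not readable (timeout 0 = poll)
before, _, _ = select.select([a], [], [], 0)

# once b sends something, a shows up in the readable list
b.send(b"ping")
after, _, _ = select.select([a], [], [], 1.0)
data = a.recv(4)

# an idle socket with free buffer space is reported as writable
_, writable, _ = select.select([], [b], [], 0)

a.close()
b.close()
```

A timeout of zero turns select into a non-blocking poll, which is handy for checks like the first one.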
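The asyncore module provides a "reactive" socket implementation. Instead of creating socket objects and calling methods on them, this module lets you define code that is called when something can be done. To implement an asynchronous socket handler, subclass the dispatcher class and override one or more of the following methods:
* handle_connect is called when a connection has been successfully established.
* handle_expt is called when a connection fails.
* handle_accept is called when a connection request is made to a listening socket. The callback should call the accept method to get the client socket.
* handle_read is called when there is data waiting to be read from the socket. The callback should call the recv method to get the data.
* handle_write is called when data can be written to the socket. Use the send method to write data.
* handle_close is called when the socket is closed or reset.
* handle_error(type, value, traceback) is called if a Python error occurs in any of the other callbacks. The default implementation prints a traceback message to sys.stdout.
Example 7-7 shows a time client, similar to the one for the socket module.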
File: asyncore-example-1.py
import asyncore
import socket, time
# reference time (in seconds since 1900-01-01 00:00:00)
TIME1970 = 2208988800L # 1970-01-01 00:00:00
class TimeRequest(asyncore.dispatcher):
    # time requestor (as defined in RFC 868)
    def __init__(self, host, port=37):
        asyncore.dispatcher.__init__(self)
        self.create_socket(socket.AF_INET, socket.SOCK_STREAM)
        self.connect((host, port))
    def writable(self):
        return 0 # don't have anything to write
    def handle_connect(self):
        pass # connection succeeded
    def handle_expt(self):
        self.close() # connection failed, shutdown
    def handle_read(self):
        # get local time
        here = int(time.time()) + TIME1970
        # get and unpack server time
        s = self.recv(4)
        there = ord(s[3]) + (ord(s[2])<<8) + (ord(s[1])<<16) + (ord(s[0])<<24L)
        self.adjust_time(int(here - there))
        self.handle_close() # we don't expect more data
    def handle_close(self):
        self.close()
    def adjust_time(self, delta):
        # override this method!
        print "time difference is", delta
#
# try it out
request = TimeRequest("www.python.org")
asyncore.loop()
log: adding channel <TimeRequest at 8cbe90>
time difference is 28
log: closing channel 192:<TimeRequest connected at 8cbe90>
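If you don't want the log messages, override the log method in your dispatcher subclass.
Example 7-8 shows the corresponding time server. Note that it uses two dispatcher subclasses, one for the listening socket and one for the client channel.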
File: asyncore-example-2.py
import asyncore
import socket, time
# reference time
TIME1970 = 2208988800L
class TimeChannel(asyncore.dispatcher):
    def handle_write(self):
        t = int(time.time()) + TIME1970
        t = chr(t>>24&255) + chr(t>>16&255) + chr(t>>8&255) + chr(t&255)
        self.send(t)
        self.close()
class TimeServer(asyncore.dispatcher):
    def __init__(self, port=37):
        asyncore.dispatcher.__init__(self)
        self.port = port
        self.create_socket(socket.AF_INET, socket.SOCK_STREAM)
        self.bind(("", port))
        self.listen(5)
        print "listening on port", self.port
    def handle_accept(self):
        channel, addr = self.accept()
        TimeChannel(channel)
server = TimeServer(8037)
asyncore.loop()
log: adding channel <TimeServer at 8cb940>
listening on port 8037
log: adding channel <TimeChannel at 8b2fd0>
log: closing channel 52:<TimeChannel connected at 8b2fd0>
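In addition to the plain dispatcher, this module also includes a dispatcher_with_send class. This class lets you send larger amounts of data without clogging up the network transport buffers.
The module in Example 7-9 defines an AsyncHTTP class based on the dispatcher_with_send class. When you create an instance of this class, it issues an HTTP GET request and sends the incoming data to a "consumer" target object.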
File: SimpleAsyncHTTP.py
import asyncore
import string, socket
import StringIO
import mimetools, urlparse
class AsyncHTTP(asyncore.dispatcher_with_send):
    # HTTP requester
    def __init__(self, uri, consumer):
        asyncore.dispatcher_with_send.__init__(self)
        self.uri = uri
        self.consumer = consumer
        # turn the uri into a valid request
        scheme, host, path, params, query, fragment = urlparse.urlparse(uri)
        assert scheme == "http", "only supports HTTP requests"
        try:
            host, port = string.split(host, ":", 1)
            port = int(port)
        except (TypeError, ValueError):
            port = 80 # default port
        if not path:
            path = "/"
        if params:
            path = path + ";" + params
        if query:
            path = path + "?" + query
        self.request = "GET %s HTTP/1.0\r\nHost: %s\r\n\r\n" % (path, host)
        self.host = host
        self.port = port
        self.status = None
        self.header = None
        self.data = ""
        # get things going!
        self.create_socket(socket.AF_INET, socket.SOCK_STREAM)
        self.connect((host, port))
    def handle_connect(self):
        # connection succeeded
        self.send(self.request)
    def handle_expt(self):
        # connection failed; notify consumer (status is None)
        self.close()
        try:
            http_header = self.consumer.http_header
        except AttributeError:
            pass
        else:
            http_header(self)
    def handle_read(self):
        data = self.recv(2048)
        if not self.header:
            self.data = self.data + data
            try:
                i = string.index(self.data, "\r\n\r\n")
            except ValueError:
                return # continue
            else:
                # parse header
                fp = StringIO.StringIO(self.data[:i+4])
                # status line is "HTTP/version status message"
                status = fp.readline()
                self.status = string.split(status, " ", 2)
                # followed by a rfc822-style message header
                self.header = mimetools.Message(fp)
                # followed by a newline, and the payload (if any)
                data = self.data[i+4:]
                self.data = ""
                # notify consumer (status is non-zero)
                try:
                    http_header = self.consumer.http_header
                except AttributeError:
                    pass
                else:
                    http_header(self)
                if not self.connected:
                    return # channel was closed by consumer
        self.consumer.feed(data)
    def handle_close(self):
        self.consumer.close()
        self.close()
The small script in Example 7-10 shows how to use this class.
File: asyncore-example-3.py
import SimpleAsyncHTTP
import asyncore
class DummyConsumer:
    size = 0
    def http_header(self, request):
        # handle header
        if request.status is None:
            print "connection failed"
        else:
            print "status", "=>", request.status
            for key, value in request.header.items():
                print key, "=", value
    def feed(self, data):
        # handle incoming data
        self.size = self.size + len(data)
    def close(self):
        # end of data
        print self.size, "bytes in body"
#
# try it out
consumer = DummyConsumer()
request = SimpleAsyncHTTP.AsyncHTTP(
"http://www.pythonware.com",
consumer
)
asyncore.loop()
log: adding channel <AsyncHTTP at 8e2850>
status => ['HTTP/1.1', '200', 'OK\015\012']
server = Apache/Unix (Unix)
content-type = text/html
content-length = 3730
...
3730 bytes in body
log: closing channel 156:<AsyncHTTP connected at 8e2850>
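The consumer interface is designed to be compatible with the htmllib and xmllib parsers, which lets you parse HTML or XML data directly. Note that the http_header method is optional; if it isn't defined, it is simply ignored.
One problem with Example 7-10 is that it doesn't handle redirected resources properly. Example 7-11 adds an extra consumer layer, which handles redirection.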
File: asyncore-example-4.py
import SimpleAsyncHTTP
import asyncore
class DummyConsumer:
    size = 0
    def http_header(self, request):
        # handle header
        if request.status is None:
            print "connection failed"
        else:
            print "status", "=>", request.status
            for key, value in request.header.items():
                print key, "=", value
    def feed(self, data):
        # handle incoming data
        self.size = self.size + len(data)
    def close(self):
        # end of data
        print self.size, "bytes in body"
class RedirectingConsumer:
    def __init__(self, consumer):
        self.consumer = consumer
    def http_header(self, request):
        # handle header
        if request.status is None or \
           request.status[1] not in ("301", "302"):
            try:
                http_header = self.consumer.http_header
            except AttributeError:
                pass
            else:
                return http_header(request)
        else:
            # redirect!
            uri = request.header["location"]
            print "redirecting to", uri, "..."
            request.close()
            SimpleAsyncHTTP.AsyncHTTP(uri, self)
    def feed(self, data):
        self.consumer.feed(data)
    def close(self):
        self.consumer.close()
#
# try it out
consumer = RedirectingConsumer(DummyConsumer())
request = SimpleAsyncHTTP.AsyncHTTP(
"http://www.pythonware.com/library",
consumer
)
asyncore.loop()
log: adding channel <AsyncHTTP at 8e64b0>
redirecting to http://www.pythonware.com/library/ ...
log: closing channel 48:<AsyncHTTP connected at 8e64b0>
log: adding channel <AsyncHTTP at 8ea790>
status => ['HTTP/1.1', '200', 'OK\015\012']
server = Apache/Unix (Unix)
content-type = text/html
content-length = 387
...
387 bytes in body
log: closing channel 236:<AsyncHTTP connected at 8ea790>
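If the server returns status 301 (permanent redirect) or 302 (temporary redirect), the redirecting consumer closes the current request and issues a new one for the new address. All other calls to the consumer are passed on to the original consumer.
The asynchat module is an extension to asyncore. It provides additional support for line-oriented protocols. It also provides improved buffering support, via the push methods and the "producer" mechanism.
Example 7-12 implements a very small HTTP server. It simply returns an HTML document containing information from the HTTP request (the output appears in the browser window).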
File: asynchat-example-1.py
import asyncore, asynchat
import os, socket, string

PORT = 8000

class HTTPChannel(asynchat.async_chat):

    def __init__(self, server, sock, addr):
        asynchat.async_chat.__init__(self, sock)
        self.set_terminator("\r\n")
        self.request = None
        self.data = ""
        self.shutdown = 0

    def collect_incoming_data(self, data):
        self.data = self.data + data

    def found_terminator(self):
        if not self.request:
            # got the request line
            self.request = string.split(self.data, None, 2)
            if len(self.request) != 3:
                self.shutdown = 1
            else:
                self.push("HTTP/1.0 200 OK\r\n")
                self.push("Content-type: text/html\r\n")
                self.push("\r\n")
            self.data = self.data + "\r\n"
            self.set_terminator("\r\n\r\n") # look for end of headers
        else:
            # return payload.
            self.push("<html><body><pre>\r\n")
            self.push(self.data)
            self.push("</pre></body></html>\r\n")
            self.close_when_done()

class HTTPServer(asyncore.dispatcher):

    def __init__(self, port):
        self.create_socket(socket.AF_INET, socket.SOCK_STREAM)
        self.bind(("", port))
        self.listen(5)

    def handle_accept(self):
        conn, addr = self.accept()
        HTTPChannel(self, conn, addr)

#
# try it out
s = HTTPServer(PORT)
print "serving at port", PORT, "..."
asyncore.loop()
GET / HTTP/1.1
Accept: */*
Accept-Language: en, sv
Accept-Encoding: gzip, deflate
User-Agent: Mozilla/4.0 (compatible; Bruce/1.0)
Host: localhost:8000
Connection: Keep-Alive
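The producer interface allows you to "push" objects that are too large to keep in memory. asyncore calls the producer's more method whenever it needs more data. The producer signals the end of the data by returning an empty string.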
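The producer contract described above can be sketched on its own, without any sockets. This is a minimal illustration written in Python 3 syntax (an assumption; the book's examples target Python 1.5.2 and 2.0). The names ChunkProducer and drain are hypothetical and not part of asyncore or asynchat.

```python
class ChunkProducer:
    """Hypothetical producer: hands out a bytes buffer in small pieces."""

    def __init__(self, data, chunk_size=4):
        self.data = data
        self.chunk_size = chunk_size

    def more(self):
        # return the next chunk; an empty result signals the end of the data
        chunk = self.data[:self.chunk_size]
        self.data = self.data[self.chunk_size:]
        return chunk


def drain(producer):
    # mimic what a channel does: keep calling more() until it returns nothing
    chunks = []
    while True:
        chunk = producer.more()
        if not chunk:
            break
        chunks.append(chunk)
    return chunks


chunks = drain(ChunkProducer(b"hello, world"))
print(chunks)
```

The point of the design is that the caller never needs to know how much data there is in total; it only asks for the next piece.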
Example 7-13 implements a very simple file-based HTTP server. It uses a simple FileProducer class that reads data from a file, a few kilobytes at a time.
File: asynchat-example-2.py
import asyncore, asynchat
import os, socket, string, sys
import StringIO, mimetools

ROOT = "."
PORT = 8000

class HTTPChannel(asynchat.async_chat):

    def __init__(self, server, sock, addr):
        asynchat.async_chat.__init__(self, sock)
        self.server = server
        self.set_terminator("\r\n\r\n")
        self.header = None
        self.data = ""
        self.shutdown = 0

    def collect_incoming_data(self, data):
        self.data = self.data + data
        if len(self.data) > 16384:
            # limit the header size to prevent attacks
            self.shutdown = 1

    def found_terminator(self):
        if not self.header:
            # parse http header
            fp = StringIO.StringIO(self.data)
            request = string.split(fp.readline(), None, 2)
            if len(request) != 3:
                # badly formed request; just shut down
                self.shutdown = 1
            else:
                # parse message header
                self.header = mimetools.Message(fp)
                self.set_terminator("\r\n")
                self.server.handle_request(
                    self, request[0], request[1], self.header
                    )
                self.close_when_done()
            self.data = ""
        else:
            pass # ignore body data, for now

    def pushstatus(self, status, explanation="OK"):
        self.push("HTTP/1.0 %d %s\r\n" % (status, explanation))

class FileProducer:
    # a producer that reads data from a file object

    def __init__(self, file):
        self.file = file

    def more(self):
        if self.file:
            data = self.file.read(2048)
            if data:
                return data
            self.file = None
        return ""

class HTTPServer(asyncore.dispatcher):

    def __init__(self, port=None, request=None):
        if not port:
            port = 80
        self.port = port
        if request:
            self.handle_request = request # external request handler
        self.create_socket(socket.AF_INET, socket.SOCK_STREAM)
        self.bind(("", port))
        self.listen(5)

    def handle_accept(self):
        conn, addr = self.accept()
        HTTPChannel(self, conn, addr)

    def handle_request(self, channel, method, path, header):
        try:
            # this is not safe!
            while path[:1] == "/":
                path = path[1:]
            filename = os.path.join(ROOT, path)
            print path, "=>", filename
            file = open(filename, "r")
        except IOError:
            channel.pushstatus(404, "Not found")
            channel.push("Content-type: text/html\r\n")
            channel.push("\r\n")
            channel.push("<html><body>File not found.</body></html>\r\n")
        else:
            channel.pushstatus(200, "OK")
            channel.push("Content-type: text/html\r\n")
            channel.push("\r\n")
            channel.push_with_producer(FileProducer(file))

#
# try it out
s = HTTPServer(PORT)
print "serving at port", PORT
asyncore.loop()
serving at port 8000
log: adding channel <HTTPServer at 8e54d0>
log: adding channel <HTTPChannel at 8e64a0>
samples/sample.htm => ./samples/sample.htm
log: closing channel 96:<HTTPChannel connected at 8e64a0>
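The "this is not safe!" comment in the request handler refers to the fact that a path such as /../secret can escape the ROOT directory. One possible way to guard against that is sketched below in Python 3 syntax (an assumption, since the example itself is Python 2). The helper name safe_join is hypothetical; it is not something the book or the standard library provides.

```python
import os


def safe_join(root, url_path):
    """Hypothetical helper: map url_path to a file below root.

    Returns None if the resolved name would end up outside root.
    """
    root = os.path.realpath(root)
    # strip leading slashes so that os.path.join cannot discard root
    candidate = os.path.realpath(os.path.join(root, url_path.lstrip("/")))
    # the resolved name must still have root as its common prefix
    if os.path.commonpath([root, candidate]) != root:
        return None
    return candidate


print(safe_join(".", "/samples/sample.htm"))
print(safe_join(".", "/../etc/passwd"))
```

A handler could call such a helper before open and answer with a 404 status when it returns None.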
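The urllib module provides a unified client interface for HTTP, FTP, and gopher. It automatically picks the right protocol handler based on the URL.
Fetching data from a URL is very simple. Just call the urlopen method, and read from the returned stream object, as shown in Example 7-14.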
File: urllib-example-1.py
import urllib
fp = urllib.urlopen("http://www.python.org")
op = open("out.html", "wb")
n = 0
while 1:
    s = fp.read(8192)
    if not s:
        break
    op.write(s)
    n = n + len(s)
fp.close()
op.close()
for k, v in fp.headers.items():
    print k, "=", v
print "copied", n, "bytes from", fp.url
server = Apache/1.3.6 (Unix)
content-type = text/html
accept-ranges = bytes
date = Mon, 11 Oct 1999 20:11:40 GMT
connection = close
etag = "741e9-7870-37f356bf"
content-length = 30832
last-modified = Thu, 30 Sep 1999 12:25:35 GMT
copied 30832 bytes from http://www.python.org
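The stream object provides a few non-standard attributes. headers is a Message object (as defined by the mimetools module), and url contains the actual URL. The latter is updated if the server redirects the client to a new URL.
The urlopen function is actually a helper function, which creates an instance of the FancyURLopener class and calls its open method. To get special behavior, you can subclass that class. For instance, the class in Example 7-15 automatically logs in to the server when necessary.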
File: urllib-example-3.py
import urllib
class myURLOpener(urllib.FancyURLopener):
    # read an URL, with automatic HTTP authentication

    def setpasswd(self, user, passwd):
        self.__user = user
        self.__passwd = passwd

    def prompt_user_passwd(self, host, realm):
        return self.__user, self.__passwd

urlopener = myURLOpener()
urlopener.setpasswd("mulder", "trustno1")
fp = urlopener.open("http://www.secretlabs.com")
print fp.read()
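The urlparse module contains functions to process URLs, and to convert between URLs and platform-specific filenames. Example 7-16 shows how to split a URL into its components.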
File: urlparse-example-1.py
import urlparse
print urlparse.urlparse("http://host/path;params?query#fragment")
('http', 'host', '/path', 'params', 'query', 'fragment')
A common use is to split an HTTP URL into host and path components (an HTTP request involves asking the host to return data identified by the path), as shown in Example 7-17.
File: urlparse-example-2.py
import urlparse
scheme, host, path, params, query, fragment =\
    urlparse.urlparse("http://host/path;params?query#fragment")
if scheme == "http":
    print "host", "=>", host
    if params:
        path = path + ";" + params
    if query:
        path = path + "?" + query
    print "path", "=>", path
host => host
path => /path;params?query
Example 7-18 shows how to use the urlunparse function to put the components back together into a URL.
File: urlparse-example-3.py
import urlparse
scheme, host, path, params, query, fragment =\
    urlparse.urlparse("http://host/path;params?query#fragment")
if scheme == "http":
    print "host", "=>", host
    print "path", "=>", urlparse.urlunparse(
        (None, None, path, params, query, None)
        )
host => host
path => /path;params?query
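For readers on Python 3, the same split-and-rebuild step can be sketched with urllib.parse, which took over the urlparse functions. This is an illustrative equivalent under that assumption, not the book's code; the variable names are made up for the example.

```python
from urllib.parse import urlparse, urlunparse

parts = urlparse("http://host/path;params?query#fragment")

# the result behaves like the six-element tuple used in the examples above
scheme, host, path, params, query, fragment = parts

# rebuild only the part that goes into the HTTP request line;
# empty strings stand in for the components that are left out
request_path = urlunparse(("", "", path, params, query, ""))

print("host", "=>", host)
print("path", "=>", request_path)
```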
Example 7-19 uses the urljoin function to combine an absolute URL with a second, possibly relative URL.
File: urlparse-example-4.py
import urlparse
base = "http://spam.egg/my/little/pony"
for path in "/index", "goldfish", "../black/cat":
    print path, "=>", urlparse.urljoin(base, path)
/index => http://spam.egg/index
goldfish => http://spam.egg/my/little/goldfish
../black/cat => http://spam.egg/my/black/cat
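The joining rules are easiest to see side by side. The sketch below repeats the experiment with urllib.parse.urljoin, the Python 3 home of this function (an assumption about the reader's interpreter; the book's example is Python 2), and collects the results in a dictionary.

```python
from urllib.parse import urljoin

base = "http://spam.egg/my/little/pony"

# an absolute path replaces the whole path, a bare name replaces the
# last segment, and ".." climbs one directory before appending
joined = {}
for path in ("/index", "goldfish", "../black/cat"):
    joined[path] = urljoin(base, path)

for path, url in joined.items():
    print(path, "=>", url)
```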
(New in 2.0) The Cookie module provides basic cookie support for HTTP clients and servers. Example 7-20 shows how to use it.
File: cookie-example-1.py
import Cookie
import os, time
cookie = Cookie.SimpleCookie()
cookie["user"] = "Mimi"
cookie["timestamp"] = time.time()
print cookie
# simulate CGI roundtrip
os.environ["HTTP_COOKIE"] = str(cookie)
cookie = Cookie.SmartCookie()
cookie.load(os.environ["HTTP_COOKIE"])
for key, item in cookie.items():
    # dictionary items are "Morsel" instances
    # use value attribute to get actual value
    print key, repr(item.value)
Set-Cookie: timestamp=736513200;
Set-Cookie: user=Mimi;
user 'Mimi'
timestamp '736513200'
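In Python 3 the same round trip can be sketched with http.cookies, which replaced the Cookie module. The snippet below is an illustration under that assumption. The header string passed to load is hand-written sample data standing in for the HTTP_COOKIE environment variable.

```python
from http.cookies import SimpleCookie

# server side: build a cookie and render a single name=value pair
cookie = SimpleCookie()
cookie["user"] = "Mimi"
rendered = cookie["user"].OutputString()

# client side: parse a Cookie header as it might arrive via HTTP_COOKIE
received = SimpleCookie()
received.load("user=Mimi; timestamp=736513200")

# the items are Morsel instances; use .value to get the actual value
values = {key: morsel.value for key, morsel in received.items()}
print(rendered)
print(values)
```

Note that values come back as strings, just as in the output above, so numeric data has to be converted by the caller.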
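(New in 2.0) The robotparser module reads robots.txt files, which are used to implement the Robot Exclusion Protocol (http://info.webcrawler.com/mak/projects/robots/robots.html).
If you are implementing an HTTP robot that will visit arbitrary sites on the Net (not just your own sites), it is a good idea to use this module to check that you really are welcome. Example 7-21 demonstrates the module.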
File: robotparser-example-1.py
import robotparser
r = robotparser.RobotFileParser()
r.set_url("http://www.python.org/robots.txt")
r.read()
if r.can_fetch("*", "/index.html"):
    print "may fetch the home page"
if r.can_fetch("*", "/tim_one/index.html"):
    print "may fetch the tim peters archive"
may fetch the home page
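The check can also be tried without network access by feeding the parser a robots.txt body directly. The sketch below uses urllib.robotparser, the Python 3 name of this module (an assumption about the reader's interpreter). The rules are invented sample data, not the real python.org file.

```python
from urllib.robotparser import RobotFileParser

# invented robots.txt content, for illustration only
rules = [
    "User-agent: *",
    "Disallow: /private/",
]

r = RobotFileParser()
r.parse(rules)  # parse lines directly instead of set_url() plus read()

home_ok = r.can_fetch("*", "/index.html")
private_ok = r.can_fetch("*", "/private/notes.html")
print(home_ok, private_ok)
```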
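The ftplib module contains a File Transfer Protocol (FTP) client implementation.
Example 7-22 shows how to log in and get a listing of the login directory. Note that the format of the directory listing is server dependent (it is usually the same as the format used by the directory listing utility on the server host platform, such as ls on Unix and dir on Windows/DOS).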
File: ftplib-example-1.py
import ftplib
ftp = ftplib.FTP("www.python.org")
ftp.login("anonymous", "ftplib-example-1")
print ftp.dir()
ftp.quit()
total 34
drwxrwxr-x 11 root 4127 512 Sep 14 14:18 .
drwxrwxr-x 11 root 4127 512 Sep 14 14:18 ..
drwxrwxr-x 2 root 4127 512 Sep 13 15:18 RCS
lrwxrwxrwx 1 root bin 11 Jun 29 14:34 README -> welcome.msg
drwxr-xr-x 3 root wheel 512 May 19 1998 bin
drwxr-sr-x 3 root 1400 512 Jun 9 1997 dev
drwxrwxr-- 2 root 4127 512 Feb 8 1998 dup
drwxr-xr-x 3 root wheel 512 May 19 1998 etc
...
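Downloading files is easy; just use the appropriate retr function. Note that when you download a text file, you have to add the line endings yourself. Example 7-23 uses a lambda expression to do that on the fly.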
File: ftplib-example-2.py
import ftplib
import sys
def gettext(ftp, filename, outfile=None):
    # fetch a text file
    if outfile is None:
        outfile = sys.stdout
    # use a lambda to add newlines to the lines read from the server
    ftp.retrlines("RETR " + filename, lambda s, w=outfile.write: w(s+"\n"))

def getbinary(ftp, filename, outfile=None):
    # fetch a binary file
    if outfile is None:
        outfile = sys.stdout
    ftp.retrbinary("RETR " + filename, outfile.write)

ftp = ftplib.FTP("www.python.org")
ftp.login("anonymous", "ftplib-example-2")
gettext(ftp, "README")
getbinary(ftp, "welcome.msg")
WELCOME to python.org, the Python programming language home site.
You are number %N of %M allowed users. Ni!
Python Web site: http://www.python.org/
CONFUSED FTP CLIENT? Try begining your login password with '-' dash.
This turns off continuation messages that may be confusing your client.
...
Finally, Example 7-24 copies files to the FTP server. This script uses the file extension to figure out whether a file is a text file or a binary file.
File: ftplib-example-3.py
import ftplib
import os
def upload(ftp, file):
    ext = os.path.splitext(file)[1]
    if ext in (".txt", ".htm", ".html"):
        ftp.storlines("STOR " + file, open(file))
    else:
        ftp.storbinary("STOR " + file, open(file, "rb"), 1024)

ftp = ftplib.FTP("ftp.fbi.gov")
ftp.login("mulder", "trustno1")
upload(ftp, "trixie.zip")
upload(ftp, "file.txt")
upload(ftp, "sightings.jpg")
The gopherlib module contains a gopher client implementation, as shown in Example 7-25.
File: gopherlib-example-1.py
import gopherlib
host = "gopher.spam.egg"
f = gopherlib.send_selector("1/", host)
for item in gopherlib.get_directory(f):
    print item
['0', "About Spam.Egg's Gopher Server", "0/About's Spam.Egg's
Gopher Server", 'gopher.spam.egg', '70', '+']
['1', 'About Spam.Egg', '1/Spam.Egg', 'gopher.spam.egg', '70', '+']
['1', 'Misc', '1/Misc', 'gopher.spam.egg', '70', '+']
...
The httplib module provides an HTTP client interface, as shown in Example 7-26.
File: httplib-example-1.py
import httplib
USER_AGENT = "httplib-example-1.py"
class Error:
    # indicates an HTTP error

    def __init__(self, url, errcode, errmsg, headers):
        self.url = url
        self.errcode = errcode
        self.errmsg = errmsg
        self.headers = headers

    def __repr__(self):
        return (
            "<Error for %s: %s %s>" %
            (self.url, self.errcode, self.errmsg)
            )

class Server:

    def __init__(self, host):
        self.host = host

    def fetch(self, path):
        http = httplib.HTTP(self.host)
        # write header
        http.putrequest("GET", path)
        http.putheader("User-Agent", USER_AGENT)
        http.putheader("Host", self.host)
        http.putheader("Accept", "*/*")
        http.endheaders()
        # get response
        errcode, errmsg, headers = http.getreply()
        if errcode != 200:
            # pass the path along; Error expects the URL as its first argument
            raise Error(path, errcode, errmsg, headers)
        file = http.getfile()
        return file.read()

if __name__ == "__main__":
    server = Server("www.pythonware.com")
    print server.fetch("/index.htm")
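Note that the HTTP client provided by httplib blocks while waiting for the server to respond. For an asynchronous solution, see the examples for the asyncore module.
httplib can also be used to send other HTTP commands, such as POST, as shown in Example 7-27.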
File: httplib-example-2.py
import httplib
USER_AGENT = "httplib-example-2.py"
class Error(Exception):
    # minimal stand-in so the example runs on its own;
    # raised when the server replies with anything but 200
    pass

def post(host, path, data, type=None):
    http = httplib.HTTP(host)
    # write header
    http.putrequest("POST", path)
    http.putheader("User-Agent", USER_AGENT)
    http.putheader("Host", host)
    if type:
        http.putheader("Content-Type", type)
    http.putheader("Content-Length", str(len(data)))
    http.endheaders()
    # write body
    http.send(data)
    # get response
    errcode, errmsg, headers = http.getreply()
    if errcode != 200:
        raise Error(errcode, errmsg, headers)
    file = http.getfile()
    return file.read()

if __name__ == "__main__":
    post("www.spam.egg", "/bacon.htm", "a piece of data", "text/plain")
The poplib module (shown in Example 7-28) provides a Post Office Protocol (POP3) client implementation. This protocol is used to "pop" (copy) messages from a central mail server to your local computer.
File: poplib-example-1.py
import poplib
import string, random
import StringIO, rfc822
SERVER = "pop.spam.egg"
USER = "mulder"
PASSWORD = "trustno1"
# connect to server
server = poplib.POP3(SERVER)
# login
server.user(USER)
server.pass_(PASSWORD)
# list items on server
resp, items, octets = server.list()
# download a random message
id, size = string.split(random.choice(items))
resp, text, octets = server.retr(id)
text = string.join(text, "\n")
file = StringIO.StringIO(text)
message = rfc822.Message(file)
for k, v in message.items():
    print k, "=", v
print message.fp.read()
subject = ANN: (the eff-bot guide to) The Standard Python Library
message-id = <199910120808.KAA09206@spam.egg>
received = (from fredrik@spam.egg)
by spam.egg (8.8.7/8.8.5) id KAA09206
for mulder; Tue, 12 Oct 1999 10:08:47 +0200
from = Fredrik Lundh <fredrik@spam.egg>
date = Tue, 12 Oct 1999 10:08:47 +0200
to = mulder@spam.egg
...
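The last step of the script, joining the lines returned by retr and splitting them into headers and body, can be tried without a mail server. The sketch below uses the email package, which superseded rfc822 in later Python versions (an assumption; the book predates it), on a small hand-written message. The message text is invented sample data.

```python
from email import message_from_string

# stand-in for the list of lines that server.retr() returns
lines = [
    "Subject: test message",
    "From: fredrik@spam.egg",
    "To: mulder@spam.egg",
    "",
    "body text",
]

message = message_from_string("\n".join(lines))

headers = dict(message.items())   # header names keep their original case
body = message.get_payload()      # everything after the first blank line

for k, v in headers.items():
    print(k, "=", v)
print(body)
```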
The imaplib module provides an Internet Message Access Protocol (IMAP) client implementation. This protocol lets you access mail folders on a central mail server as if they were local. Example 7-29 shows how to use it.
File: imaplib-example-1.py
import imaplib
import string, random
import StringIO, rfc822
SERVER = "imap.spam.egg"
USER = "mulder"
PASSWORD = "trustno1"
# connect to server
server = imaplib.IMAP4(SERVER)
# login
server.login(USER, PASSWORD)
server.select()
# list items on server
resp, items = server.search(None, "ALL")
items = string.split(items[0])
# fetch a random item
id = random.choice(items)
resp, data = server.fetch(id, "(RFC822)")
text = data[0][1]
file = StringIO.StringIO(text)
message = rfc822.Message(file)
for k, v in message.items():
    print k, "=", v
print message.fp.read()
server.logout()
subject = ANN: (the eff-bot guide to) The Standard Python Library
message-id = <199910120816.KAA12177@larch.spam.egg>
to = mulder@spam.egg
date = Tue, 12 Oct 1999 10:16:19 +0200 (MET DST)
from = <effbot@spam.egg>
received = (effbot@spam.egg) by imap.algonet.se (8.8.8+Sun/8.6.12)
id KAA12177 for effbot@spam.egg; Tue, 12 Oct 1999 10:16:19 +0200
(MET DST)
body text for test 5
The smtplib module provides a Simple Mail Transfer Protocol (SMTP) client implementation. This protocol is used to send mail through Unix mail servers, as shown in Example 7-30.
To read mail, use the poplib or imaplib modules.
File: smtplib-example-1.py
import smtplib
import string, sys
HOST = "localhost"
FROM = "effbot@spam.egg"
TO = "fredrik@spam.egg"
SUBJECT = "for your information!"
BODY = "next week: how to fling an otter"
body = string.join((
    "From: %s" % FROM,
    "To: %s" % TO,
    "Subject: %s" % SUBJECT,
    "",
    BODY), "\r\n")
print body
server = smtplib.SMTP(HOST)
server.sendmail(FROM, [TO], body)
server.quit()
From: effbot@spam.egg
To: fredrik@spam.egg
Subject: for your information!
next week: how to fling an otter
The telnetlib module provides a telnet client implementation.
Example 7-31 connects to a Unix computer, logs in, and then retrieves a directory listing.
File: telnetlib-example-1.py
import telnetlib
import sys
HOST = "spam.egg"
USER = "mulder"
PASSWORD = "trustno1"
telnet = telnetlib.Telnet(HOST)
telnet.read_until("login: ")
telnet.write(USER + "\n")
telnet.read_until("Password: ")
telnet.write(PASSWORD + "\n")
telnet.write("ls librarybook\n")
telnet.write("exit\n")
print telnet.read_all()
[spam.egg mulder]$ ls
README os-path-isabs-example-1.py
SimpleAsyncHTTP.py os-path-isdir-example-1.py
aifc-example-1.py os-path-isfile-example-1.py
anydbm-example-1.py os-path-islink-example-1.py
array-example-1.py os-path-ismount-example-1.py
...
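The nntplib module provides a Network News Transfer Protocol (NNTP) client implementation.
Before you can read messages from a news server, you have to connect to the server and select a newsgroup. The script in Example 7-32 downloads a complete list of all messages on the server and extracts some simple statistics from that list.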
File: nntplib-example-1.py
import nntplib
import string
SERVER = "news.spam.egg"
GROUP = "comp.lang.python"
AUTHOR = "fredrik@pythonware.com" # eff-bots human alias
# connect to server
server = nntplib.NNTP(SERVER)
# choose a newsgroup
resp, count, first, last, name = server.group(GROUP)
print "count", "=>", count
print "range", "=>", first, last
# list all items on the server
resp, items = server.xover(first, last)
# extract some statistics
authors = {}
subjects = {}
for id, subject, author, date, message_id, references, size, lines in items:
    authors[author] = None
    if subject[:4] == "Re: ":
        subject = subject[4:]
    subjects[subject] = None
    if string.find(author, AUTHOR) >= 0:
        print id, subject
print "authors", "=>", len(authors)
print "subjects", "=>", len(subjects)
count => 607
range => 57179 57971
57474 Three decades of Python!
...
57477 More Python books coming...
authors => 257
subjects => 200
Downloading messages is easy. Just call the article method, as shown in Example 7-33.
File: nntplib-example-2.py
import nntplib
import string
SERVER = "news.spam.egg"
GROUP = "comp.lang.python"
KEYWORD = "tkinter"
# connect to server
server = nntplib.NNTP(SERVER)
resp, count, first, last, name = server.group(GROUP)
resp, items = server.xover(first, last)
for id, subject, author, date, message_id, references, size, lines in items:
    if string.find(string.lower(subject), KEYWORD) >= 0:
        resp, id, message_id, text = server.article(id)
        print author
        print subject
        print len(text), "lines in article"
"Fredrik Lundh" <fredrik@pythonware.com>
Re: Programming Tkinter (In Python)
110 lines in article
...
Example 7-34 shows how you can process the messages further by wrapping them in a Message object (using the rfc822 module).
File: nntplib-example-3.py
import nntplib
import string, random
import StringIO, rfc822
SERVER = "news.spam.egg"
GROUP = "comp.lang.python"
# connect to server
server = nntplib.NNTP(SERVER)
resp, count, first, last, name = server.group(GROUP)
for i in range(10):
    try:
        id = random.randint(int(first), int(last))
        resp, id, message_id, text = server.article(str(id))
    except (nntplib.error_temp, nntplib.error_perm):
        pass # no such message (maybe it was deleted?)
    else:
        break # found a message!
else:
    raise SystemExit

text = string.join(text, "\n")
file = StringIO.StringIO(text)
message = rfc822.Message(file)
for k, v in message.items():
    print k, "=", v
print message.fp.read()
mime-version = 1.0
content-type = text/plain; charset="iso-8859-1"
message-id = <008501bf1417$1cf90b70$f29b12c2@sausage.spam.egg>
lines = 22
...
from = "Fredrik Lundh" <fredrik@pythonware.com>
nntp-posting-host = parrot.python.org
subject = ANN: (the eff-bot guide to) The Standard Python Library
...
</F>
Once you have come this far, you can use modules such as htmllib, uu, and base64 to process the messages further.
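The SocketServer module provides a framework for various kinds of socket-based servers. The module provides a number of classes that can be mixed and matched to create servers for different purposes.
Example 7-35 uses this module to implement an Internet Time Protocol server. You can connect to it with the timeclient script shown earlier.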
File: socketserver-example-1.py
import SocketServer
import time
# user-accessible port
PORT = 8037
# reference time
TIME1970 = 2208988800L
class TimeRequestHandler(SocketServer.StreamRequestHandler):
    def handle(self):
        print "connection from", self.client_address
        t = int(time.time()) + TIME1970
        b = chr(t>>24&255) + chr(t>>16&255) + chr(t>>8&255) + chr(t&255)
        self.wfile.write(b)

server = SocketServer.TCPServer(("", PORT), TimeRequestHandler)
print "listening on port", PORT
server.serve_forever()
connection from ('127.0.0.1', 1488)
connection from ('127.0.0.1', 1489)
...
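The handler builds its 4-byte reply by shifting and masking the time value by hand. The same encoding can be sketched with the struct module, shown here in Python 3 syntax (an assumption; the server above is Python 2). The helper names encode_time and decode_time are hypothetical.

```python
import struct

# seconds between 1900-01-01 (the protocol's epoch) and 1970-01-01
TIME1970 = 2208988800


def encode_time(unix_seconds):
    # 32-bit unsigned integer in network (big-endian) byte order
    return struct.pack("!I", (int(unix_seconds) + TIME1970) & 0xFFFFFFFF)


def decode_time(payload):
    # inverse operation, as a time client would perform it
    (t,) = struct.unpack("!I", payload)
    return t - TIME1970


packet = encode_time(0)
print(packet.hex())
print(decode_time(encode_time(1000000000)))
```

The "!I" format does the same work as the four chr() calls: most significant byte first, exactly four bytes.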
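The BaseHTTPServer module is a basic framework for HTTP servers, built on top of the SocketServer framework.
Example 7-36 generates a random message each time you reload the page. The path variable contains the current URL, which you can use to generate different content for different URLs (as it stands, the script returns an error page for anything but the root path).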
File: basehttpserver-example-1.py
import BaseHTTPServer
import cgi, random, sys
MESSAGES = [
    "That's as maybe, it's still a frog.",
    "Albatross! Albatross! Albatross!",
    "It's Wolfgang Amadeus Mozart.",
    "A pink form from Reading.",
    "Hello people, and welcome to 'It's a Tree.'",
    "I simply stare at the brick and it goes to sleep.",
]

class Handler(BaseHTTPServer.BaseHTTPRequestHandler):

    def do_GET(self):
        if self.path != "/":
            self.send_error(404, "File not found")
            return
        self.send_response(200)
        self.send_header("Content-type", "text/html")
        self.end_headers()
        try:
            # redirect stdout to client
            stdout = sys.stdout
            sys.stdout = self.wfile
            self.makepage()
        finally:
            sys.stdout = stdout # restore

    def makepage(self):
        # generate a random message
        tagline = random.choice(MESSAGES)
        print "<html>"
        print "<body>"
        print "<p>Today's quote: "
        print "<i>%s</i>" % cgi.escape(tagline)
        print "</body>"
        print "</html>"

PORT = 8000
httpd = BaseHTTPServer.HTTPServer(("", PORT), Handler)
print "serving at port", PORT
httpd.serve_forever()
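See the SimpleHTTPServer and CGIHTTPServer modules for more extensive HTTP frameworks.
The SimpleHTTPServer module is a simple HTTP server that provides standard GET and HEAD request handlers. The pathname given by the client is interpreted as a relative filename (relative to the current directory when the server was started). Example 7-37 shows how to use the module.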
File: simplehttpserver-example-1.py
import SimpleHTTPServer
import SocketServer
# minimal web server. serves files relative to the
# current directory.
PORT = 8000
Handler = SimpleHTTPServer.SimpleHTTPRequestHandler
httpd = SocketServer.TCPServer(("", PORT), Handler)
print "serving at port", PORT
httpd.serve_forever()
serving at port 8000
localhost - - [11/Oct/1999 15:07:44] code 403, message Directory listing not supported
localhost - - [11/Oct/1999 15:07:44] "GET / HTTP/1.1" 403 -
localhost - - [11/Oct/1999 15:07:56] "GET /samples/sample.htm HTTP/1.1" 200 -
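The server ignores drive specifiers and relative pathnames (such as `..`). However, it does not implement any other access control mechanisms, so be careful how you use it.
Example 7-38 implements a truly minimal web proxy. HTTP requests sent to the proxy must include the full URI of the target server. The proxy uses urllib to fetch data from the target server.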
File: simplehttpserver-example-2.py
# a truly minimal HTTP proxy
import SocketServer
import SimpleHTTPServer
import urllib
PORT = 1234
class Proxy(SimpleHTTPServer.SimpleHTTPRequestHandler):
    def do_GET(self):
        self.copyfile(urllib.urlopen(self.path), self.wfile)

httpd = SocketServer.ForkingTCPServer(('', PORT), Proxy)
print "serving at port", PORT
httpd.serve_forever()
The CGIHTTPServer module is an HTTP server that can call external scripts through the common gateway interface (CGI), as shown in Example 7-39.
File: cgihttpserver-example-1.py
import CGIHTTPServer
import BaseHTTPServer
class Handler(CGIHTTPServer.CGIHTTPRequestHandler):
    cgi_directories = ["/cgi"]

PORT = 8000
httpd = BaseHTTPServer.HTTPServer(("", PORT), Handler)
print "serving at port", PORT
httpd.serve_forever()
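The cgi module provides a number of support functions and classes for CGI scripts. Among other things, it can parse CGI form data.
Example 7-40 shows a simple CGI script that returns a list of the files in a given directory (relative to the root directory specified in the script).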
File: cgi-example-1.py
import cgi
import os, urllib
ROOT = "samples"
# header (a CGI script must send a header line followed by an empty line)
print "Content-type: text/html"
print

query = os.environ.get("QUERY_STRING")
if not query:
    query = "."

script = os.environ.get("SCRIPT_NAME", "")
if not script:
    script = "cgi-example-1.py"

print "<html>"
print "<head>"
print "<title>file listing</title>"
print "</head>"
print "<body>"

try:
    files = os.listdir(os.path.join(ROOT, query))
except os.error:
    files = []

for file in files:
    link = cgi.escape(file)
    if os.path.isdir(os.path.join(ROOT, query, file)):
        href = script + "?" + os.path.join(query, file)
        print "<p><a href='%s'>%s</a>" % (href, link)
    else:
        print "<p>%s" % link

print "</body>"
print "</html>"
Content-type: text/html

<html>
<head>
<title>file listing</title>
</head>
<body>
<p>sample.gif
<p>sample.gz
<p>sample.netrc
...
<p>sample.txt
<p>sample.xml
<p>sample~
<p><a href='cgi-example-1.py?web'>web</a>
</body>
</html>
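(New in 2.0) The webbrowser module provides a basic interface to the system's standard web browser. It provides an open function, which takes a filename or a URL and displays it in the browser. If you call open again, it tries to display the new page in the same browser window. Example 7-41 shows how to use it.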
File: webbrowser-example-1.py
import webbrowser
import time
webbrowser.open("http://www.pythonware.com")
# wait a while, and then go to another page
time.sleep(5)
webbrowser.open(
"http://www.pythonware.com/people/fredrik/librarybook.htm"
)
On Unix, this module supports lynx, Netscape, Mosaic, Konqueror, and Grail. On Windows and Macintosh, it uses the standard browser (as defined in the registry or the Internet configuration panel).
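The locale module provides an interface to C's localization functions, as shown in Example 8-1. It also provides functions to convert between numbers and strings based on the current locale. (Functions such as int and float, as well as the numeric conversion functions in the string module, are not affected by the current locale.)
Example 8-1. Using the locale module to format data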
File: locale-example-1.py
import locale
print "locale", "=>", locale.setlocale(locale.LC_ALL, "")
# integer formatting
value = 4711
print locale.format("%d", value, 1), "==",
print locale.atoi(locale.format("%d", value, 1))
# floating point
value = 47.11
print locale.format("%f", value, 1), "==",
print locale.atof(locale.format("%f", value, 1))
info = locale.localeconv()
print info["int_curr_symbol"]
locale => Swedish_Sweden.1252
4,711 == 4711
47,110000 == 47.11
SEK
Example 8-2 shows how to use the locale module to get the platform's locale settings.
File: locale-example-2.py
import locale
language, encoding = locale.getdefaultlocale()
print "language", language
print "encoding", encoding
language sv_SE
encoding cp1252
(New in 2.0) The unicodedata module contains Unicode character properties, such as character categories, decomposition data, and numerical values. Example 8-3 shows how to use it.
File: unicodedata-example-1.py
import unicodedata
for char in [u"A", u"-", u"1", u"\N{LATIN CAPITAL LETTER O WITH DIAERESIS}"]:
    print repr(char),
    print unicodedata.category(char),
    print repr(unicodedata.decomposition(char)),
    print unicodedata.decimal(char, None),
    print unicodedata.numeric(char, None)
u'A' Lu '' None None
u'-' Pd '' None None
u'1' Nd '' 1 1.0
u'\303\226' Lu '004F 0308' None None
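The same lookups work essentially unchanged on Python 3, where every string is a Unicode string. The sketch below (Python 3 syntax is an assumption here, since the book's example uses u"" literals and print statements) collects the properties in a list so they can be inspected or compared.

```python
import unicodedata

rows = []
for char in ["A", "-", "1", "\N{LATIN CAPITAL LETTER O WITH DIAERESIS}"]:
    rows.append((
        char,
        unicodedata.category(char),        # two-letter general category
        unicodedata.decomposition(char),   # empty string if there is none
        unicodedata.decimal(char, None),   # None unless it is a decimal digit
        unicodedata.numeric(char, None),   # None unless it has a numeric value
    ))

for row in rows:
    print(row)
```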
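In Python 2.0, the properties for CJK ideographs and Hangul syllables are missing. This affects characters in the ranges 0x3400-0x4DB5, 0x4E00-0x9FA5, and 0xAC00-0xD7A3. The first character in each range has correct properties, however, so you can work around the problem by mapping each character to the start of its range: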
def remap(char):
    # fix for broken unicode property database in Python 2.0
    c = ord(char)
    if 0x3400 <= c <= 0x4DB5:
        return unichr(0x3400)
    if 0x4E00 <= c <= 0x9FA5:
        return unichr(0x4E00)
    if 0xAC00 <= c <= 0xD7A3:
        return unichr(0xAC00)
    return char
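This bug has been fixed in Python 2.1.
(Python 2.0 only) The ucnhash module provides the names for a number of Unicode character codes. You can use the \N{} escape directly to map a Unicode character name to a character code, as shown in Example 8-4.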
File: ucnhash-example-1.py
# Python imports this module automatically, when it sees
# the first \N{} escape
# import ucnhash
print repr(u"\N{FROWN}")
print repr(u"\N{SMILE}")
print repr(u"\N{SKULL AND CROSSBONES}")
u'\u2322'
u'\u2323'
u'\u2620'
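Later Python versions expose the name table through the unicodedata module, so the mapping can also be queried at run time in both directions. The following is a small Python 3 sketch (an assumption; ucnhash itself only exists in 2.0).

```python
import unicodedata

# name -> character, the run-time counterpart of the \N{...} escape
skull = unicodedata.lookup("SKULL AND CROSSBONES")

# character -> name
names = [unicodedata.name(ch) for ch in "\u2322\u2323"]

print(repr(skull), names)
```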
"Wot? No quote?"
- Guido van Rossum
Python comes with a few modules for dealing with image and audio files.
See also the Pythonware Image Library (PIL, http://www.pythonware.com/products/pil/), and the PythonWare Sound Toolkit (PST, http://www.pythonware.com/products/pst/).
Translator's note: don't bother with PST, it has been abandoned; use pymedia instead.
The imghdr module identifies different image file formats. The current version can identify bmp, gif, jpeg, pbm, pgm, png, ppm, rast (Sun raster), rgb (SGI), tiff, and xbm images. Example 9-1 shows how to use it.
File: imghdr-example-1.py
import imghdr
result = imghdr.what("samples/sample.jpg")
if result:
    print "file format:", result
else:
    print "cannot identify file"
file format: jpeg
# using PIL
import Image
im = Image.open("samples/sample.jpg")
print im.format, im.mode, im.size
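The sndhdr module can be used to identify different audio file formats and to extract basic information about a file's contents, as shown in Example 9-2.
If successful, the what function returns a tuple containing the file type, the sampling rate, the number of channels, the number of frames in the file (-1 means unknown), and the number of bits per sample. See help(sndhdr) for details.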
File: sndhdr-example-1.py
import sndhdr
result = sndhdr.what("samples/sample.wav")
if result:
    print "file format:", result
else:
    print "cannot identify file"
file format: ('wav', 44100, 1, -1, 16)
(Obsolete) The whatsound module is an alias for sndhdr, as shown in Example 9-3.
File: whatsound-example-1.py
import whatsound # same as sndhdr
result = whatsound.what("samples/sample.wav")
if result:
    print "file format:", result
else:
    print "cannot identify file"
file format: ('wav', 44100, 1, -1, 16)
The aifc module reads and writes AIFF and AIFC audio files (as used on SGI and Macintosh computers). Example 9-4 shows how to use it.
File: SimpleAsyncHTTP.py
import asyncore
import string, socket
import StringIO
import mimetools, urlparse

class AsyncHTTP(asyncore.dispatcher_with_send):
    # HTTP requestor

    def __init__(self, uri, consumer):
        asyncore.dispatcher_with_send.__init__(self)
        self.uri = uri
        self.consumer = consumer
        # turn the uri into a valid request
        scheme, host, path, params, query, fragment = urlparse.urlparse(uri)
        assert scheme == "http", "only supports HTTP requests"
        try:
            host, port = string.split(host, ":", 1)
            port = int(port)
        except (TypeError, ValueError):
            port = 80 # default port
        if not path:
            path = "/"
        if params:
            path = path + ";" + params
        if query:
            path = path + "?" + query
        self.request = "GET %s HTTP/1.0\r\nHost: %s\r\n\r\n" % (path, host)
        self.host = host
        self.port = port
        self.status = None
        self.header = None
        self.data = ""
        # get things going!
        self.create_socket(socket.AF_INET, socket.SOCK_STREAM)
        self.connect((host, port))

    def handle_connect(self):
        # connection succeeded
        self.send(self.request)

    def handle_expt(self):
        # connection failed; notify consumer (status is None)
        self.close()
        try:
            http_header = self.consumer.http_header
        except AttributeError:
            pass
        else:
            http_header(self)

    def handle_read(self):
        data = self.recv(2048)
        if not self.header:
            self.data = self.data + data
            try:
                i = string.index(self.data, "\r\n\r\n")
            except ValueError:
                return # continue
            else:
                # parse header
                fp = StringIO.StringIO(self.data[:i+4])
                # status line is "HTTP/version status message"
                status = fp.readline()
                self.status = string.split(status, " ", 2)
                # followed by a rfc822-style message header
                self.header = mimetools.Message(fp)
                # followed by a newline, and the payload (if any)
                data = self.data[i+4:]
                self.data = ""
                # notify consumer (status is non-zero)
                try:
                    http_header = self.consumer.http_header
                except AttributeError:
                    pass
                else:
                    http_header(self)
                if not self.connected:
                    return # channel was closed by consumer
        self.consumer.feed(data)

    def handle_close(self):
        self.consumer.close()
        self.close()
The sunau module reads and writes Sun AU audio files, as shown in Example 9-5.
File: sunau-example-1.py
import sunau
w = sunau.open("samples/sample.au", "r")
if w.getnchannels() == 1:
    print "mono,",
else:
    print "stereo,",
print w.getsampwidth()*8, "bits,",
print w.getframerate(), "Hz sampling rate"
mono, 16 bits, 8012 Hz sampling rate
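The sunaudio module identifies Sun AU audio files and extracts basic information about their contents. The sunau module provides more complete support for Sun AU files. Example 9-6 shows how to use sunaudio.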
File: sunaudio-example-1.py
import sunaudio
file = "samples/sample.au"
print sunaudio.gethdr(open(file, "rb"))
(6761, 1, 8012, 1, 'sample.au')
The wave module reads and writes Microsoft WAV audio files, as shown in Example 9-7.
File: wave-example-1.py
import wave
w = wave.open("samples/sample.wav", "r")
if w.getnchannels() == 1:
    print "mono,",
else:
    print "stereo,",
print w.getsampwidth()*8, "bits,",
print w.getframerate(), "Hz sampling rate"
mono, 16 bits, 44100 Hz sampling rate
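The example only reads a file. To try the writing side without a sample file on disk, one option is to write a short stretch of silence into an in-memory buffer and read the parameters back. This is a Python 3 sketch (an assumption; the book's code is Python 2), and the buffer contents are generated on the spot rather than taken from samples/sample.wav.

```python
import io
import wave

# write 441 frames of 16-bit mono silence at 44100 Hz into memory
buf = io.BytesIO()
w = wave.open(buf, "wb")
w.setnchannels(1)
w.setsampwidth(2)          # bytes per sample
w.setframerate(44100)
w.writeframes(b"\x00\x00" * 441)
w.close()

# read the header back, as the example above does for a file on disk
buf.seek(0)
r = wave.open(buf, "rb")
info = (r.getnchannels(), r.getsampwidth() * 8, r.getframerate(), r.getnframes())
r.close()

print(info)
```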
(Unix only) The audiodev module provides sound-playing support for Sun and SGI computers, as shown in Example 9-8.
File: audiodev-example-1.py
import audiodev
import aifc
sound = aifc.open("samples/sample.aiff", "r")
player = audiodev.AudioDev()
player.setoutrate(sound.getframerate())
player.setsampwidth(sound.getsampwidth())
player.setnchannels(sound.getnchannels())
bytes_per_frame = sound.getsampwidth() * sound.getnchannels()
bytes_per_second = sound.getframerate() * bytes_per_frame
while 1:
    data = sound.readframes(bytes_per_second)
    if not data:
        break
    player.writeframes(data)
player.wait()
(Windows only) The winsound module allows you to play Wave files on Windows machines, as shown in Example 9-9.
File: winsound-example-1.py
import winsound
file = "samples/sample.wav"
winsound.PlaySound(
file,
winsound.SND_FILENAME|winsound.SND_NOWAIT,
)
Description of the flag values:
"Unlike mainstream component programming, scripts usually do not introduce new components but simply 'wire' existing ones. Scripts can be seen as introducing behavior but no new state ... Of course, there is nothing to stop a 'scripting' language from introducing persistent state — it then simply turns into a normal programming language."
- Clemens Szyperski, in Component Software
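Python comes with drivers for a number of very similar database managers, all modeled after Unix's dbm library. These databases behave like ordinary dictionaries, except that you can only use strings for keys and values. (The shelve module can handle any kind of value.)
The anydbm module provides a unified interface to the simple database drivers.
When it is first imported, the anydbm module looks for a suitable database driver, trying dbhash, gdbm, dbm, and dumbdbm, in that order. If no such module is found, it raises an ImportError exception.
The open function is used to open or create a database (using the database driver found at import time), as shown in Example 10-1.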
File: anydbm-example-1.py
import anydbm
db = anydbm.open("database", "c")
db["1"] = "one"
db["2"] = "two"
db["3"] = "three"
db.close()
db = anydbm.open("database", "r")
for key in db.keys():
print repr(key), repr(db[key])
'2' 'two'
'3' 'three'
'1' 'one'
The whichdb module can figure out the format of a given database file, as shown in Example 10-2.
File: whichdb-example-1.py
import whichdb
filename = "database"
result = whichdb.whichdb(filename)
if result:
    print "file created by", result
    handler = __import__(result)
    db = handler.open(filename, "r")
    print db.keys()
else:
    # cannot identify data base
    if result is None:
        print "cannot read database file", filename
    else:
        print "cannot identify database file", filename
    db = None
This example uses the __import__ function to import the matching module (remember the examples in Chapter 1?).
The shelve module uses the database drivers to implement persistent dictionaries. A shelve object uses strings as keys, but the values can be of any type, as long as they can be handled by the pickle module. See Example 10-3.
File: shelve-example-1.py
import shelve
db = shelve.open("database", "c")
db["one"] = 1
db["two"] = 2
db["three"] = 3
db.close()
db = shelve.open("database", "r")
for key in db.keys():
    print repr(key), repr(db[key])
'one' 1
'three' 3
'two' 2
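To make the "any picklable value" point concrete, here is a small Python 3 sketch (an assumption on my part; save_and_load is a made-up helper) that stores tuples and lists and reads them back unchanged:

```python
import os
import shelve
import tempfile

def save_and_load(records):
    """Write records to a shelf in a temporary directory and read them back."""
    with tempfile.TemporaryDirectory() as folder:
        path = os.path.join(folder, "shelf")
        with shelve.open(path, "c") as db:
            for key, value in records.items():
                db[key] = value        # keys are strings, values are pickled
        with shelve.open(path, "r") as db:
            return dict(db)
```

The values survive the round trip with their types intact, which plain dbm databases cannot do.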
Example 10-4 shows how to use shelve with a given database driver.
File: shelve-example-3.py
import shelve
import gdbm
def gdbm_shelve(filename, flag="c"):
    return shelve.Shelf(gdbm.open(filename, flag))
db = gdbm_shelve("dbfile")
(Optional) The dbhash module provides a dbm-compatible interface to the bsddb database driver, as shown in Example 10-5.
File: dbhash-example-1.py
import dbhash
db = dbhash.open("dbhash", "c")
db["one"] = "the foot"
db["two"] = "the shoulder"
db["three"] = "the other foot"
db["four"] = "the bridge of the nose"
db["five"] = "the naughty bits"
db["six"] = "just above the elbow"
db["seven"] = "two inches to the right of a very naughty bit indeed"
db["eight"] = "the kneecap"
db.close()
db = dbhash.open("dbhash", "r")
for key in db.keys():
    print repr(key), repr(db[key])
(Optional) The dbm module provides an interface to the dbm database driver (available on many Unix platforms), as shown in Example 10-6.
File: dbm-example-1.py
import dbm
db = dbm.open("dbm", "c")
db["first"] = "bruce"
db["second"] = "bruce"
db["third"] = "bruce"
db["fourth"] = "bruce"
db["fifth"] = "michael"
db["fifth"] = "bruce" # overwrite
db.close()
db = dbm.open("dbm", "r")
for key in db.keys():
    print repr(key), repr(db[key])
'first' 'bruce'
'second' 'bruce'
'fourth' 'bruce'
'third' 'bruce'
'fifth' 'bruce'
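The dumbdbm module is a simple database implementation, similar to dbm and friends, but written in pure Python. It uses two files: a binary file (.dat) that holds the data, and a text file (.dir) that describes where each entry is stored.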
File: dumbdbm-example-1.py
import dumbdbm
db = dumbdbm.open("dumbdbm", "c")
db["first"] = "fear"
db["second"] = "surprise"
db["third"] = "ruthless efficiency"
db["fourth"] = "an almost fanatical devotion to the Pope"
db["fifth"] = "nice red uniforms"
db.close()
db = dumbdbm.open("dumbdbm", "r")
for key in db.keys():
    print repr(key), repr(db[key])
'first' 'fear'
'third' 'ruthless efficiency'
'fifth' 'nice red uniforms'
'second' 'surprise'
'fourth' 'an almost fanatical devotion to the Pope'
(Optional) The gdbm module provides an interface to the GNU dbm database driver, as shown in Example 10-8.
File: gdbm-example-1.py
import gdbm
db = gdbm.open("gdbm", "c")
db["1"] = "call"
db["2"] = "the"
db["3"] = "next"
db["4"] = "defendant"
db.close()
db = gdbm.open("gdbm", "r")
keys = db.keys()
keys.sort()
for key in keys:
    print db[key],
call the next defendant
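The standard library contains a number of modules that can be used both as modules and as command-line utilities.
The dis module is the Python disassembler. It converts byte code into a format that is easier for humans to read.
You can run the disassembler from the command line. It compiles the given script and prints the disassembled byte code to the terminal: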
$ dis.py hello.py
0 SET_LINENO 0
3 SET_LINENO 1
6 LOAD_CONST 0 ('hello again, and welcome to the show')
9 PRINT_ITEM
10 PRINT_NEWLINE
11 LOAD_CONST 1 (None)
14 RETURN_VALUE
Of course, dis can also be used as a module. The dis function takes a class, method, function, or code object as its single argument, as shown in Example 11-1.
File: dis-example-1.py
import dis
def procedure():
    print 'hello'
dis.dis(procedure)
0 SET_LINENO 3
3 SET_LINENO 4
6 LOAD_CONST 1 ('hello')
9 PRINT_ITEM
10 PRINT_NEWLINE
11 LOAD_CONST 0 (None)
14 RETURN_VALUE
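The listings above show what the 1.5.2/2.0 interpreter prints. On Python 3 the opcodes differ from release to release, so the following sketch (my assumption; opnames and add are illustrative names) collects the opcode names programmatically with dis.get_instructions instead of relying on printed output:

```python
import dis

def add(a, b):
    return a + b

def opnames(func):
    """Return the opcode names of func's byte code, in order."""
    return [ins.opname for ins in dis.get_instructions(func)]
```

The exact names returned by opnames(add) depend on the interpreter version, which is why no expected listing is given here.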
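The pdb module is the standard Python debugger. It is based on the bdb debugger framework.
You can run the debugger from the command line (type n, or next, to go to the next line, and help to get a list of available commands):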
$ pdb.py hello.py
> hello.py(0)?()
(Pdb) n
> hello.py()
(Pdb) n
hello again, and welcome to the show
--Return--
> hello.py(1)?()->None
(Pdb)
Example 11-2 shows how to start the debugger from inside a program.
File: pdb-example-1.py
import pdb
def test(n):
    j = 0
    for i in range(n):
        j = j + i
    return n
db = pdb.Pdb()
db.runcall(test, 1)
> pdb-example-1.py(3)test()
-> def test(n):
(Pdb) s
> pdb-example-1.py(4)test()
-> j = 0
(Pdb) s
> pdb-example-1.py(5)test()
-> for i in range(n):
...
The bdb module provides a framework for debuggers. You can use it to create your own custom debuggers, as shown in Example 11-3.
All you need to do is subclass the Bdb class and override its user methods (user_call, user_line, and so on), which are called whenever the debugger stops. The various set methods can be used to control the debugger.
File: bdb-example-1.py
import bdb
import time
def spam(n):
    j = 0
    for i in range(n):
        j = j + i
    return n
def egg(n):
    spam(n)
    spam(n)
    spam(n)
    spam(n)
def test(n):
    egg(n)
class myDebugger(bdb.Bdb):
    run = 0
    def user_call(self, frame, args):
        name = frame.f_code.co_name or "<unknown>"
        print "call", name, args
        self.set_continue() # continue
    def user_line(self, frame):
        if self.run:
            self.run = 0
            self.set_trace() # start tracing
        else:
            # arrived at breakpoint
            name = frame.f_code.co_name or "<unknown>"
            filename = self.canonic(frame.f_code.co_filename)
            print "break at", filename, frame.f_lineno, "in", name
        print "continue..."
        self.set_continue() # continue to next breakpoint
    def user_return(self, frame, value):
        name = frame.f_code.co_name or "<unknown>"
        print "return from", name, value
        print "continue..."
        self.set_continue() # continue
    def user_exception(self, frame, exception):
        name = frame.f_code.co_name or "<unknown>"
        print "exception in", name, exception
        print "continue..."
        self.set_continue() # continue
db = myDebugger()
db.run = 1
db.set_break("bdb-example-1.py", 7)
db.runcall(test, 1)
continue...
call egg None
call spam None
break at C:/ematter/librarybook/bdb-example-1.py 7 in spam
continue...
call spam None
break at C:/ematter/librarybook/bdb-example-1.py 7 in spam
continue...
call spam None
break at C:/ematter/librarybook/bdb-example-1.py 7 in spam
continue...
call spam None
break at C:/ematter/librarybook/bdb-example-1.py 7 in spam
continue...
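The same subclass-and-override idea still works on Python 3. The sketch below is an assumption of mine rather than the book's code (CallLogger, inner, outer, and trace_calls are made-up names): it overrides only user_call and records names instead of printing them.

```python
import bdb

class CallLogger(bdb.Bdb):
    """Record the name of every function call the debugger stops at."""
    def __init__(self):
        bdb.Bdb.__init__(self)
        self.calls = []
    def user_call(self, frame, argument_list):
        self.calls.append(frame.f_code.co_name)
        # no set_continue() here, so the debugger keeps single-stepping

def inner(n):
    return n + 1

def outer(n):
    return inner(n) + inner(n)

def trace_calls(func, *args):
    """Run func under the logger; return its result and the logged calls."""
    logger = CallLogger()
    result = logger.runcall(func, *args)
    return result, logger.calls
```

As in the output above, where test itself is never reported, the function passed to runcall is not logged; only the calls made inside it are.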
The profile module is the standard Python profiler.
Like the disassembler and the debugger, you can run the profiler from the command line:
$ profile.py hello.py
hello again, and welcome to the show
3 function calls in 0.785 CPU seconds
Ordered by: standard name
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.001 0.001 0.002 0.002 <string>:1(?)
1 0.001 0.001 0.001 0.001 hello.py:1(?)
1 0.783 0.783 0.785 0.785 profile:0(execfile('hello.py'))
0 0.000 0.000 profile:0(profiler)
As Example 11-4 shows, you can also call profile from inside a program to analyze its performance.
File: profile-example-1.py
import profile
def func1():
    for i in range(1000):
        pass
def func2():
    for i in range(1000):
        func1()
profile.run("func2()")
1003 function calls in 2.380 CPU seconds
Ordered by: standard name
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000 0.000 2.040 2.040 <string>:1(?)
1000 1.950 0.002 1.950 0.002 profile-example-1.py:3(func1)
1 0.090 0.090 2.040 2.040 profile-example-1.py:7(func2)
1 0.340 0.340 2.380 2.380 profile:0(func2())
0 0.000 0.000 profile:0(profiler)
You can use the pstats module to change how the results are reported.
The pstats module analyzes the data collected by the Python profiler, as shown in Example 11-5.
File: pstats-example-1.py
import pstats
import profile
def func1():
    for i in range(1000):
        pass
def func2():
    for i in range(1000):
        func1()
p = profile.Profile()
p.run("func2()")
s = pstats.Stats(p)
s.sort_stats("time", "name").print_stats()
1003 function calls in 1.574 CPU seconds
Ordered by: internal time, function name
ncalls tottime percall cumtime percall filename:lineno(function)
1000 1.522 0.002 1.522 0.002 pstats-example-1.py:4(func1)
1 0.051 0.051 1.573 1.573 pstats-example-1.py:8(func2)
1 0.001 0.001 1.574 1.574 profile:0(func2())
1 0.000 0.000 1.573 1.573 <string>:1(?)
0 0.000 0.000 profile:0(profiler)
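If you want the report as a string rather than on the terminal, pstats can write to any file-like object. A minimal Python 3 sketch, assuming the cProfile module (the C implementation of the same profiler interface) and a made-up helper named profile_report:

```python
import cProfile
import io
import pstats

def func1():
    for i in range(100):
        pass

def func2():
    for i in range(100):
        func1()

def profile_report(func, limit=5):
    """Profile one call to func and return the formatted report as text."""
    profiler = cProfile.Profile()
    profiler.runcall(func)
    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)   # report goes to out
    stats.sort_stats("cumulative").print_stats(limit)
    return out.getvalue()
```

The timings in the report vary from run to run, so only the structure of the text is predictable.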
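(New in 2.0) The tabnanny module checks Python source files for ambiguous indentation. If a file mixes tabs and spaces for indentation, the nanny complains right away.
In the badtabs.py file used below, the first line after the if statement is indented with four spaces followed by a tab. The second line uses spaces only.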
$ tabnanny.py -v samples/badtabs.py
'samples/badtabs.py': *** Line 3: trouble in tab city! ***
offending line: print "world"
indent not equal e.g. at tab sizes 1, 2, 3, 5, 6, 7, 9
Since the Python interpreter treats a tab as eight spaces, the script runs fine. It also displays correctly in any editor that follows the standard (one tab is eight spaces). Of course, none of this fools the nanny.
Example 11-6 shows how to use tabnanny from your own programs.
File: tabnanny-example-1.py
import tabnanny
FILE = "samples/badtabs.py"
file = open(FILE)
for line in file.readlines():
    print repr(line)
# let tabnanny look at it
tabnanny.check(FILE)
'if 1:\012'
'    \011print "hello"\012'
'        print "world"\012'
samples/badtabs.py 3 '        print "world"\012'
To capture the output, redirect sys.stdout to a StringIO object.
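This chapter describes some platform-specific modules. The emphasis is on modules that are available on entire families of platforms (such as Unix or the Windows family).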
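(Unix only) The fcntl module provides an interface to the ioctl and fcntl functions on Unix. They are used for "out of band" operations on file handles and I/O device handles, including reading extended attributes, controlling blocking, changing terminal behavior, and so on. (Translator's note: "out of band" here means operations that travel outside the normal read/write data stream. The term is borrowed from out-of-band management, where devices are administered over a separate channel; see http://en.wikipedia.org/wiki/Out-of-band_management )
For details on how to use these functions on your platform, see the corresponding Unix man pages.
This module also provides an interface to Unix's file locking mechanisms. Example 12-1 uses the flock function to place an advisory lock on the file while it is being updated.
The output shown below was obtained by running three copies of the script at the same time, like this (all on one command line):
python fcntl-example-1.py& python fcntl-example-1.py& python fcntl-example-1.py&
If you comment out the call to flock, the counter file will not be updated correctly.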
File: fcntl-example-1.py
import fcntl, FCNTL
import os, time
FILE = "counter.txt"
if not os.path.exists(FILE):
    # create the counter file if it doesn't exist
    file = open(FILE, "w")
    file.write("0")
    file.close()
for i in range(20):
    # increment the counter
    file = open(FILE, "r+")
    fcntl.flock(file.fileno(), FCNTL.LOCK_EX)
    counter = int(file.readline()) + 1
    file.seek(0)
    file.write(str(counter))
    file.close() # unlocks the file
    print os.getpid(), "=>", counter
    time.sleep(0.1)
30940 => 1
30942 => 2
30941 => 3
30940 => 4
30941 => 5
30942 => 6
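In later Python versions the FCNTL constants moved into the fcntl module itself. The following Python 3 sketch of the same locked counter update is an assumption on my part (bump and demo are made-up names, and the counter lives in a temporary file rather than counter.txt):

```python
import fcntl
import os
import tempfile

def bump(path):
    """Increment the integer stored in path while holding an exclusive lock."""
    with open(path, "r+") as f:
        fcntl.flock(f.fileno(), fcntl.LOCK_EX)   # blocks until the lock is free
        try:
            counter = int(f.read() or "0") + 1
            f.seek(0)
            f.truncate()
            f.write(str(counter))
            f.flush()                            # write before releasing the lock
        finally:
            fcntl.flock(f.fileno(), fcntl.LOCK_UN)
    return counter

def demo(times=5):
    """Bump a throwaway counter file a few times and return the values seen."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        return [bump(path) for _ in range(times)]
    finally:
        os.remove(path)
```

The lock is advisory: it only protects against other processes that also call flock on the same file.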
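(Unix only) The pwd module provides an interface to the Unix password "database" (/etc/passwd and related files). This database (usually a plain text file) contains information about the user accounts on the local machine. See Example 12-2.
File: pwd-example-1.py
import pwd
import os
print pwd.getpwuid(os.getuid())
print pwd.getpwnam("root")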
('effbot', 'dsWjk8', 4711, 4711, 'eff-bot', '/home/effbot', '/bin/bosh')
('root', 'hs2giiw', 0, 0, 'root', '/root', '/bin/bash')
The getpwall function returns a list of database entries for all available users. This can be useful if you want to search for a user.
If you have to look up many names, you can use getpwall to preload a dictionary, as shown in Example 12-3.
File: pwd-example-2.py
import pwd
import os
# preload password dictionary
_pwd = {}
for info in pwd.getpwall():
    _pwd[info[0]] = _pwd[info[2]] = info
def userinfo(uid):
    # name or uid integer
    return _pwd[uid]
print userinfo(os.getuid())
print userinfo("root")
('effbot', 'dsWjk8', 4711, 4711, 'eff-bot', '/home/effbot', '/bin/bosh')
('root', 'hs2giiw', 0, 0, 'root', '/root', '/bin/bash')
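On Python 3 the entries returned by pwd are still tuples, but they also expose named attributes such as pw_name and pw_uid. A small sketch of the preloading trick using those attributes (load_users is a made-up name, and keeping the first entry for a shared uid is my own choice):

```python
import pwd

def load_users():
    """Return a dict mapping both user names and numeric ids to entries."""
    table = {}
    for entry in pwd.getpwall():
        table[entry.pw_name] = entry
        # several names may share one uid; keep the first entry seen
        table.setdefault(entry.pw_uid, entry)
    return table
```

Lookups such as load_users()["root"].pw_dir then avoid the index arithmetic used in the tuple-based example.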
(Unix only) The grp module provides an interface to the Unix group database (/etc/group). The getgrgid function returns data for a given group id (see Example 12-4), and getgrnam returns data for a given group name.
File: grp-example-1.py
import grp
import os
print grp.getgrgid(os.getgid())
print grp.getgrnam("wheel")
('effbot', '', 4711, ['effbot'])
('wheel', '', 10, ['root', 'effbot', 'gorbot', 'timbot'])
The getgrall function returns a list of database entries for all available groups.
If you are going to do a lot of group queries, you can save some time by using getgrall to copy all the current groups into a dictionary. The groupinfo function in Example 12-5 returns the information for either a group id (an int) or a group name (a str).
File: grp-example-2.py
import grp
import os
# preload group dictionary
_grp = {}
for info in grp.getgrall():
    _grp[info[0]] = _grp[info[2]] = info
def groupinfo(gid):
    # name or gid integer
    return _grp[gid]
print groupinfo(os.getgid())
print groupinfo("wheel")
('effbot', '', 4711, ['effbot'])
('wheel', '', 10, ['root', 'effbot', 'gorbot', 'timbot'])
(Unix only, optional) The nis module provides an interface to the NIS (Network Information Services, also known as yellow pages) services, as shown in Example 12-6. It can be used to fetch values from a NIS database, if one is available.
File: nis-example-1.py
import nis
import string
print nis.cat("ypservers")
print string.split(nis.match("bacon", "hosts.byname"))
{'bacon.spam.egg': 'bacon.spam.egg'}
['194.18.155.250', 'bacon.spam.egg', 'bacon', 'spam-010']
(Unix only, optional) The curses module gives you better control of the text terminal window, in a terminal-independent way. See Example 12-7.
File: curses-example-1.py
import curses
text = [
"a very simple curses demo",
"",
"(press any key to exit)"
]
# connect to the screen
screen = curses.initscr()
# setup keyboard
curses.noecho() # no keyboard echo
curses.cbreak() # don't wait for newline
# screen size
rows, columns = screen.getmaxyx()
# draw a border around the screen
screen.border()
# display centered text
y = (rows - len(text)) / 2
for line in text:
    screen.addstr(y, (columns-len(line))/2, line)
    y = y + 1
screen.getch()
curses.endwin()
(Unix only, optional) The termios module provides an interface to the Unix terminal control facilities. It can be used to control most aspects of the terminal communication ports.
In Example 12-8, this module is used to temporarily disable keyboard echo (which is controlled by the ECHO flag in the local flags field, attr[3]).
File: termios-example-1.py
import termios, TERMIOS
import sys
fileno = sys.stdin.fileno()
attr = termios.tcgetattr(fileno)
orig = attr[:]
print "attr =>", attr[:4] # flags
# disable echo flag
attr[3] = attr[3] & ~TERMIOS.ECHO
try:
    termios.tcsetattr(fileno, TERMIOS.TCSADRAIN, attr)
    message = raw_input("enter secret message: ")
finally:
    # restore terminal settings
    termios.tcsetattr(fileno, TERMIOS.TCSADRAIN, orig)
print "secret =>", repr(message)
attr => [1280, 5, 189, 35387]
enter secret message:
secret => 'and now for something completely different'
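To try the echo flag without touching your real terminal, you can practice on a pseudo-terminal. This Python 3 sketch is my own assumption, not part of the book (set_echo, echo_enabled, and demo are made-up names, and it assumes the platform supports pty.openpty); the constants now live in termios itself:

```python
import os
import pty
import termios

def set_echo(fd, enabled):
    """Turn the ECHO flag of the terminal behind fd on or off."""
    attr = termios.tcgetattr(fd)
    if enabled:
        attr[3] = attr[3] | termios.ECHO     # attr[3] holds the local flags
    else:
        attr[3] = attr[3] & ~termios.ECHO
    termios.tcsetattr(fd, termios.TCSADRAIN, attr)

def echo_enabled(fd):
    """Return True if the terminal behind fd currently echoes input."""
    return bool(termios.tcgetattr(fd)[3] & termios.ECHO)

def demo():
    """Toggle echo on a throwaway pseudo-terminal instead of the real one."""
    master, slave = pty.openpty()
    try:
        set_echo(slave, False)
        off = echo_enabled(slave)
        set_echo(slave, True)
        on = echo_enabled(slave)
        return off, on
    finally:
        os.close(master)
        os.close(slave)
```

With a real terminal you would pass sys.stdin.fileno() and restore the original setting in a finally clause, as the example above does.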
(Unix only) The tty module contains some utility functions for dealing with tty devices. Example 12-9 switches the terminal window to "raw" mode.
File: tty-example-1.py
import tty
import os, sys
fileno = sys.stdin.fileno()
tty.setraw(fileno)
print raw_input("raw input: ")
tty.setcbreak(fileno)
print raw_input("cbreak input: ")
os.system("stty sane") # ...
raw input: this is raw input
cbreak input: this is cbreak input
(Unix only, optional) The resource module is used to query or modify the current system resource limits. Example 12-10 shows how to query them, and Example 12-11 shows how to modify them.
File: resource-example-1.py
import resource
print "usage stats", "=>", resource.getrusage(resource.RUSAGE_SELF)
print "max cpu", "=>", resource.getrlimit(resource.RLIMIT_CPU)
print "max data", "=>", resource.getrlimit(resource.RLIMIT_DATA)
print "max processes", "=>", resource.getrlimit(resource.RLIMIT_NPROC)
print "page size", "=>", resource.getpagesize()
usage stats => (0.03, 0.02, 0, 0, 0, 0, 75, 168, 0, 0, 0, 0, 0, 0, 0, 0)
max cpu => (2147483647, 2147483647)
max data => (2147483647, 2147483647)
max processes => (256, 256)
page size => 4096
File: resource-example-2.py
import resource
resource.setrlimit(resource.RLIMIT_CPU, (0, 1))
# pretend we're busy
for i in range(1000):
    for j in range(1000):
        for k in range(1000):
            pass
CPU time limit exceeded
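(Unix only, optional) The syslog module sends messages to the system logger facility (syslogd). Exactly what happens to these messages is system-dependent, but they usually end up in a log file such as /var/log/messages, /var/adm/syslog, or some variation thereof. (If you cannot find the file, ask your system administrator.) Example 12-12 shows the module in use.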
File: syslog-example-1.py
import syslog
import sys
syslog.openlog(sys.argv[0])
syslog.syslog(syslog.LOG_NOTICE, "a log notice")
syslog.syslog(syslog.LOG_NOTICE, "another log notice: %s" % "watch out!")
syslog.closelog()
(Windows/DOS only) The msvcrt module gives you access to a number of functions in the Microsoft Visual C/C++ Runtime Library (MSVCRT).
Example 12-13 demonstrates the getch function, which reads a single keypress from the console.
File: msvcrt-example-1.py
import msvcrt
print "press 'escape' to quit..."
while 1:
    char = msvcrt.getch()
    if char == chr(27):
        break
    print char,
    if char == chr(13):
        print
press 'escape' to quit...
h e l l o
The kbhit function returns true if a key has been pressed (which means that getch will not block), as shown in Example 12-14.
File: msvcrt-example-2.py
import msvcrt
import time
print "press SPACE to enter the serial number"
while not msvcrt.kbhit() or msvcrt.getch() != " ":
    # do something else
    print ".",
    time.sleep(0.1)
# clear the keyboard buffer
while msvcrt.kbhit():
    msvcrt.getch()
serial = raw_input("enter your serial number: ")
print "serial number is", serial
press SPACE to enter the serial number
. . . . . . . . . . . . . . . . . . . . . . . .
enter your serial number: 10
serial number is 10
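Translator's note: one translator commented here that the script runs fine under cmd, but in IDLE it either hangs or does not accept keyboard input. This is because IDLE runs your program in a separate Python process and exchanges data with it over a socket, so the program has no console for msvcrt to read from.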
The locking function implements cross-process file locking on Windows, as shown in Example 12-15.
File: msvcrt-example-3.py
import msvcrt
import os
import time
LK_UNLCK = 0 # unlock the file region
LK_LOCK = 1 # lock the file region
LK_NBLCK = 2 # non-blocking lock
LK_RLCK = 3 # lock for writing
LK_NBRLCK = 4 # non-blocking lock for writing
FILE = "counter.txt"
if not os.path.exists(FILE):
    file = open(FILE, "w")
    file.write("0")
    file.close()
for i in range(20):
    file = open(FILE, "r+")
    # lock from current position (0) to end of file
    msvcrt.locking(file.fileno(), LK_LOCK, os.path.getsize(FILE))
    counter = int(file.readline()) + 1
    file.seek(0)
    file.write(str(counter))
    file.close() # unlocks the file
    print os.getpid(), "=>", counter
    time.sleep(0.1)
208 => 21
208 => 22
208 => 23
208 => 24
208 => 25
208 => 26
(Implementation module, Windows only) The nt module is the implementation module used by the os module on Windows platforms. There is hardly any reason to use this module directly; use os instead. Example 12-16 shows its use.
File: nt-example-1.py
import nt
# in real life, use os.listdir and os.stat instead!
for file in nt.listdir("."):
    print file, nt.stat(file)[6]
aifc-example-1.py 314
anydbm-example-1.py 259
array-example-1.py 48
(Windows only, new in 2.0) The _winreg module provides a basic interface to the Windows registry database. Example 12-17 shows its use.
File: winreg-example-1.py
import _winreg
explorer = _winreg.OpenKey(
    _winreg.HKEY_CURRENT_USER,
    "Software\\Microsoft\\Windows\\CurrentVersion\\Explorer"
    )
# list values owned by this registry key
try:
    i = 0
    while 1:
        name, value, type = _winreg.EnumValue(explorer, i)
        print repr(name),
        i += 1
except WindowsError:
    print
value, type = _winreg.QueryValueEx(explorer, "Logon User Name")
print "user is", repr(value)
'Logon User Name' 'CleanShutdown' 'ShellState' 'Shutdown Setting'
'Reason Setting' 'FaultCount' 'FaultTime' 'IconUnderline'...
user is u'Effbot'
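(Implementation module, Unix/POSIX only) The posix module is the implementation module used by the os module on Unix and other POSIX systems. You should normally access it only through the os module. See Example 12-18.
File: posix-example-1.py
import posix
for file in posix.listdir("."):
    print file, posix.stat(file)[6]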
aifc-example-1.py 314
anydbm-example-1.py 259
array-example-1.py 48
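This chapter covers modules that are used to implement other modules.
The dospath module (see Example 13-1) provides os.path functionality on DOS platforms. You can also use it to handle DOS paths on other platforms.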
File: dospath-example-1.py
import dospath
file = "/my/little/pony"
print "isabs", "=>", dospath.isabs(file)
print "dirname", "=>", dospath.dirname(file)
print "basename", "=>", dospath.basename(file)
print "normpath", "=>", dospath.normpath(file)
print "split", "=>", dospath.split(file)
print "join", "=>", dospath.join(file, "zorba")
isabs => 1
dirname => /my/little
basename => pony
normpath => /my/little/pony
split => ('/my/little', 'pony')
join => /my/little/pony/zorba
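Note that Python's DOS support accepts both forward slashes and backslashes as directory separators.
The macpath module (see Example 13-2) provides os.path functionality on Macintosh platforms. You can also use it to handle Macintosh paths on other platforms.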
File: macpath-example-1.py
import macpath
file = "my:little:pony"
print "isabs", "=>", macpath.isabs(file)
print "dirname", "=>", macpath.dirname(file)
print "basename", "=>", macpath.basename(file)
print "normpath", "=>", macpath.normpath(file)
print "split", "=>", macpath.split(file)
print "join", "=>", macpath.join(file, "zorba")
isabs => 1
dirname => my:little
basename => pony
normpath => my:little:pony
split => ('my:little', 'pony')
join => my:little:pony:zorba
The ntpath module (see Example 13-3) provides os.path functionality on Windows platforms. You can also use it to handle Windows paths on other platforms.
File: ntpath-example-1.py
import ntpath
file = "/my/little/pony"
print "isabs", "=>", ntpath.isabs(file)
print "dirname", "=>", ntpath.dirname(file)
print "basename", "=>", ntpath.basename(file)
print "normpath", "=>", ntpath.normpath(file)
print "split", "=>", ntpath.split(file)
print "join", "=>", ntpath.join(file, "zorba")
isabs => 1
dirname => /my/little
basename => pony
normpath => /my/little/pony
split => ('/my/little', 'pony')
join => /my/little/pony/zorba
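Note that this module accepts both forward slashes and backslashes as directory separators.
The posixpath module (see Example 13-4) provides os.path functionality on Unix and other POSIX-compatible platforms. You can also use it to handle POSIX paths on other platforms. It can also be used to process URLs.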
File: posixpath-example-1.py
import posixpath
file = "/my/little/pony"
print "isabs", "=>", posixpath.isabs(file)
print "dirname", "=>", posixpath.dirname(file)
print "basename", "=>", posixpath.basename(file)
print "normpath", "=>", posixpath.normpath(file)
print "split", "=>", posixpath.split(file)
print "join", "=>", posixpath.join(file, "zorba")
isabs => 1
dirname => /my/little
basename => pony
normpath => /my/little/pony
split => ('/my/little', 'pony')
join => /my/little/pony/zorba
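The practical upshot is that you can pick the path flavor explicitly, whatever platform the script runs on. A short Python 3 sketch (describe is a made-up helper) that applies the same three operations with either module:

```python
import ntpath
import posixpath

def describe(path_module, path):
    """Take path apart with the given flavor of os.path."""
    return {
        "isabs": path_module.isabs(path),
        "dirname": path_module.dirname(path),
        "basename": path_module.basename(path),
    }

# the same string means different things to different flavors
windows_view = describe(ntpath, r"C:\my\little\pony")
posix_view = describe(posixpath, r"C:\my\little\pony")
```

Here windows_view["basename"] is "pony", while posixpath treats the backslashes as ordinary characters and returns the whole string as the basename.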
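(Obsolete) The strop module is a low-level C implementation of most of the functions in the string module. The string module uses it automatically, so you rarely need to access it directly.
One reason to use it is when you need to fix up the path before importing any Python modules, as shown in Example 13-5.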
File: strop-example-1.py
import strop
import sys
# assuming we have an executable named ".../executable", add a
# directory named ".../executable-extra" to the path
if strop.lower(sys.executable)[-4:] == ".exe":
    extra = sys.executable[:-4] # windows
else:
    extra = sys.executable
sys.path.insert(0, extra + "-extra")
import mymodule
In Python 2.0 and later, you should use string methods instead of strop. In the example above, replace "strop.lower(sys.executable)" with "sys.executable.lower()".
The imp module contains functions that can be used to implement your own import behavior. Example 13-6 overloads the import statement with a version that logs where each module comes from.
File: imp-example-1.py
import imp
import sys
def my_import(name, globals=None, locals=None, fromlist=None):
    try:
        module = sys.modules[name] # already imported?
    except KeyError:
        file, pathname, description = imp.find_module(name)
        print "import", name, "from", pathname, description
        module = imp.load_module(name, file, pathname, description)
    return module
import __builtin__
__builtin__.__import__ = my_import
import xmllib
import xmllib from /python/lib/xmllib.py ('.py', 'r', 1)
import re from /python/lib/re.py ('.py', 'r', 1)
import sre from /python/lib/sre.py ('.py', 'r', 1)
import sre_compile from /python/lib/sre_compile.py ('.py', 'r', 1)
import _sre from /python/_sre.pyd ('.pyd', 'rb', 3)
Note that this import hook does not support packages. For a complete implementation, see the source code of the knee module.
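On current Python 3 releases the imp module is gone and importlib takes its place. If all you want is the "where does this module come from" part of the example, a sketch along these lines may do (locate is a made-up name; this only looks a module up, it does not hook the import statement):

```python
import importlib.util

def locate(name):
    """Return the file a top-level module would be loaded from, or None."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None          # no such module on the path
    return spec.origin
```

For built-in modules the origin is a marker such as "built-in" rather than a file name.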
The new module is a low-level module that lets you create various kinds of internal objects, such as class objects, function objects, and other types that are usually created by the Python runtime system. Example 13-7 shows it in use.
If you are using 1.5.2, you may have to rebuild Python to use this module; it is not enabled by default on all platforms. In 2.0 and later, it is always available.
File: new-example-1.py
import new
class Sample:
    a = "default"
    def __init__(self):
        self.a = "initialised"
    def __repr__(self):
        return self.a
#
# create instances
a = Sample()
print "normal", "=>", a
b = new.instance(Sample, {})
print "new.instance", "=>", b
b.__init__()
print "after __init__", "=>", b
c = new.instance(Sample, {"a": "assigned"})
print "new.instance w. dictionary", "=>", c
normal => initialised
new.instance => default
after __init__ => initialised
new.instance w. dictionary => assigned
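Python 3 has no new module, but you can get the same effect as new.instance by calling the class's __new__ method directly, which creates the object without running __init__. A sketch under that assumption (bare_instance is a made-up helper):

```python
class Sample:
    a = "default"
    def __init__(self):
        self.a = "initialised"
    def __repr__(self):
        return self.a

def bare_instance(cls, attributes=None):
    """Create an instance without running __init__, then set attributes."""
    obj = cls.__new__(cls)
    if attributes:
        obj.__dict__.update(attributes)
    return obj
```

An object created this way falls back to the class attribute until something assigns to the instance.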
(Obsolete) The pre module is the implementation module used by the 1.5.2 re module. It is obsolete in current versions. Example 13-8 shows its use.
File: pre-example-1.py
import pre
p = pre.compile("[Python]+")
print p.findall("Python is not that bad")
['Python', 'not', 'th', 't']
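(Implementation module, declared unsupported) The sre module is the low-level implementation of the re module. There is usually no need to use it directly, and future versions may not support it. Example 13-9 shows its use.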
File: sre-example-1.py
import sre
text = "The Bookshop Sketch"
# a single character
m = sre.match(".", text)
if m: print repr("."), "=>", repr(m.group(0))
# and so on, for all 're' examples...
'.' => 'T'
The py_compile module compiles a Python module into byte code. It behaves like Python's import statement, but takes a file name rather than a module name. See Example 13-10.
File: py-compile-example-1.py
import py_compile
# explicitly compile this module
py_compile.compile("py-compile-example-1.py")
The compileall module compiles all Python scripts in a given directory tree (or along the Python path) into byte code. It can also be used as a script (on Unix platforms, it is run automatically when Python is installed). See Example 13-11.
File: compileall-example-1.py
import compileall
print "This may take a while!"
compileall.compile_dir(".", force=1)
This may take a while!
Listing . ...
Compiling ./SimpleAsyncHTTP.py ...
Compiling ./aifc-example-1.py ...
Compiling ./anydbm-example-1.py ...
...
The ihooks module provides a framework for import replacements. The idea is to allow several alternative import mechanisms to coexist. See Example 13-12.
File: ihooks-example-1.py
import ihooks, imp, os
def import_from(filename):
    "Import module from a named file"
    loader = ihooks.BasicModuleLoader()
    path, file = os.path.split(filename)
    name, ext = os.path.splitext(file)
    m = loader.find_module_in_dir(name, path)
    if not m:
        raise ImportError, name
    m = loader.load_module(name, m)
    return m
colorsys = import_from("/python/lib/colorsys.py")
print colorsys
colorsys = import_from("/python/lib/colorsys.py")
print colorsys
<module 'colorsys' from '/python/lib/colorsys.py'>
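The ihooks module did not survive into Python 3, but importing a module from a named file is still possible with importlib. The following is a sketch of one way to do it, not a drop-in replacement (demo and the sample_mod file are made up for illustration):

```python
import importlib.util
import os
import tempfile

def import_from(filename):
    """Import a module from a named file, without touching sys.path."""
    name = os.path.splitext(os.path.basename(filename))[0]
    spec = importlib.util.spec_from_file_location(name, filename)
    if spec is None:
        raise ImportError(name)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)      # runs the module's code
    return module

def demo():
    """Write a tiny module to a temporary file and import it by path."""
    with tempfile.TemporaryDirectory() as folder:
        path = os.path.join(folder, "sample_mod.py")
        with open(path, "w") as f:
            f.write("VALUE = 42\n")
        module = import_from(path)
        return module.__name__, module.VALUE
```

Unlike a regular import, this does not register the module in sys.modules, so each call loads a fresh copy.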
The linecache module reads lines of code from module source files. It caches recently visited modules (the entire source file). See Example 13-13.
File: linecache-example-1.py
import linecache
print linecache.getline("linecache-example-1.py", 5)
print linecache.getline("linecache-example-1.py", 5)
This module is used by the traceback module.
(Implementation module) The macurl2path module maps between URLs and Macintosh filenames. There is usually no need to use it directly; use the mechanisms in urllib instead. See Example 13-14.
File: macurl2path-example-1.py
import macurl2path
file = ":my:little:pony"
print macurl2path.pathname2url(file)
print macurl2path.url2pathname(macurl2path.pathname2url(file))
my/little/pony
:my:little:pony
(Implementation module) The nturl2path module maps between URLs and Windows filenames. See Example 13-15.
File: nturl2path-example-1.py
import nturl2path
file = r"c:\my\little\pony"
print nturl2path.pathname2url(file)
print nturl2path.url2pathname(nturl2path.pathname2url(file))
///C|/my/little/pony
C:\my\little\pony
Again, you should access these functions through the urllib module, as shown in Example 13-16.
File: nturl2path-example-2.py
import urllib
file = r"c:\my\little\pony"
print urllib.pathname2url(file)
print urllib.url2pathname(urllib.pathname2url(file))
///C|/my/little/pony
C:\my\little\pony
The tokenize module splits a Python source file into individual tokens. It can be used for syntax highlighting, among other things.
In Example 13-17, we simply print the tokens.
File: tokenize-example-1.py
import tokenize
file = open("tokenize-example-1.py")
def handle_token(type, token, (srow, scol), (erow, ecol), line):
    print "%d,%d-%d,%d:\t%s\t%s" % \
        (srow, scol, erow, ecol, tokenize.tok_name[type], repr(token))
tokenize.tokenize(
    file.readline,
    handle_token
    )
1,0-1,6: NAME 'import'
1,7-1,15: NAME 'tokenize'
1,15-1,16: NEWLINE '\012'
2,0-2,1: NL '\012'
3,0-3,4: NAME 'file'
3,5-3,6: OP '='
3,7-3,11: NAME 'open'
3,11-3,12: OP '('
3,12-3,35: STRING '"tokenize-example-1.py"'
3,35-3,36: OP ')'
3,36-3,37: NEWLINE '\012'
...
Note that the tokenize function takes two callable objects: the first is used to fetch new lines of code, and the second is called for each token.
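On Python 3 the callback-style interface has been replaced by a generator. A minimal sketch, assuming tokenize.generate_tokens and a made-up helper named list_tokens, that tokenizes a string instead of a file:

```python
import io
import tokenize

def list_tokens(source):
    """Return (token name, token text) pairs for a string of Python source."""
    readline = io.StringIO(source).readline    # same role as file.readline
    return [
        (tokenize.tok_name[tok.type], tok.string)
        for tok in tokenize.generate_tokens(readline)
    ]
```

For the source "x = 1\n", the first three pairs are ("NAME", "x"), ("OP", "="), and ("NUMBER", "1").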
The keyword module (see Example 13-18) contains a list of the keywords used in the current version of Python. It also provides a dictionary with the keywords as keys, and a predicate function that can be used to check whether a given word is a Python keyword.
File: keyword-example-1.py
import keyword
name = raw_input("Enter module name: ")
if keyword.iskeyword(name):
    print name, "is a reserved word."
    print "here's a complete list of reserved words:"
    print keyword.kwlist
Enter module name: assert
assert is a reserved word.
here's a complete list of reserved words:
['and', 'assert', 'break', 'class', 'continue', 'def', 'del',
'elif', 'else', 'except', 'exec', 'finally', 'for', 'from',
'global', 'if', 'import', 'in', 'is', 'lambda', 'not', 'or',
'pass', 'print', 'raise', 'return', 'try', 'while']
(Optional) The parser module provides an interface to Python's built-in parser and compiler.
Example 13-19 compiles a simple expression into an abstract syntax tree (AST), turns the AST into a nested list, dumps the contents of the tree (where each node contains either a grammar symbol or a token), increments all numbers by one, and finally turns the list back into a code object. At least that's what I think it does.
File: parser-example-1.py
import parser
import symbol, token
def dump_and_modify(node):
    name = symbol.sym_name.get(node[0])
    if name is None:
        name = token.tok_name.get(node[0])
    print name,
    for i in range(1, len(node)):
        item = node[i]
        if type(item) is type([]):
            dump_and_modify(item)
        else:
            print repr(item)
            if name == "NUMBER":
                # increment all numbers!
                node[i] = repr(int(item)+1)
ast = parser.expr("1 + 3")
list = ast.tolist()
dump_and_modify(list)
ast = parser.sequence2ast(list)
print eval(parser.compileast(ast))
eval_input testlist test and_test not_test comparison
expr xor_expr and_expr shift_expr arith_expr term factor
power atom NUMBER '1'
PLUS '+'
term factor power atom NUMBER '3'
NEWLINE ''
ENDMARKER ''
6
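The parser module was later removed from Python; on Python 3 the ast module plays a similar role. As a sketch of the same "increment every number" trick under that assumption (Increment and eval_incremented are made-up names):

```python
import ast

class Increment(ast.NodeTransformer):
    """Add one to every integer literal in the tree."""
    def visit_Constant(self, node):
        # bool is a subclass of int, so leave True/False alone
        if isinstance(node.value, int) and not isinstance(node.value, bool):
            return ast.copy_location(ast.Constant(node.value + 1), node)
        return node

def eval_incremented(expression):
    """Parse an expression, bump its integer literals, and evaluate it."""
    tree = ast.parse(expression, mode="eval")
    tree = ast.fix_missing_locations(Increment().visit(tree))
    return eval(compile(tree, "<ast>", "eval"))
```

As in the original example, eval_incremented("1 + 3") evaluates 2 + 4 and returns 6.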
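The symbol module contains a listing of the non-terminal symbols in the Python grammar. It is probably only useful if you are working with the parser module. See Example 13-20.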
File: symbol-example-1.py
import symbol
print "print", symbol.print_stmt
print "return", symbol.return_stmt
print 268
return 274
The token module contains a list of all tokens used by the standard Python tokenizer, as shown in Example 13-21.
File: token-example-1.py
import token
print "NUMBER", token.NUMBER
print "STAR", token.STAR
print "STRING", token.STRING
NUMBER 2
STAR 16
STRING 3
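This chapter describes a number of less common modules. Some are useful, others are obsolete.
The pyclbr module contains a basic Python class parser, as shown in Example 14-1.
In 1.5.2, the module exports a single function, readmodule, which parses a given module and returns a dictionary of all classes defined at the module's top level.
File: pyclbr-example-1.py
import pyclbr
mod = pyclbr.readmodule("cgi")
for k, v in mod.items():
    print k, v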
MiniFieldStorage <pyclbr.Class instance at 7873b0>
InterpFormContentDict <pyclbr.Class instance at 79bd00>
FieldStorage <pyclbr.Class instance at 790e20>
SvFormContentDict <pyclbr.Class instance at 79b5e0>
StringIO <pyclbr.Class instance at 77dd90>
FormContent <pyclbr.Class instance at 79bd60>
FormContentDict <pyclbr.Class instance at 79a9c0>
In 2.0 and later, there is also an alternative interface, readmodule_ex, which returns global functions as well. See Example 14-2.
File: pyclbr-example-3.py
import pyclbr
# 2.0 and later
mod = pyclbr.readmodule_ex("cgi")
for k, v in mod.items():
    print k, v
MiniFieldStorage <pyclbr.Class instance at 00905D2C>
parse_header <pyclbr.Function instance at 00905BD4>
test <pyclbr.Function instance at 00906FBC>
print_environ_usage <pyclbr.Function instance at 00907C94>
parse_multipart <pyclbr.Function instance at 00905294>
FormContentDict <pyclbr.Class instance at 008D3494>
initlog <pyclbr.Function instance at 00904AAC>
parse <pyclbr.Function instance at 00904EFC>
StringIO <pyclbr.Class instance at 00903EAC>
SvFormContentDict <pyclbr.Class instance at 00906824>
...
To get more information about each class, use the attributes of the class instances, as shown in Example 14-3.
File: pyclbr-example-2.py
import pyclbr
import string
mod = pyclbr.readmodule("cgi")
def dump(c):
    # print class header
    s = "class " + c.name
    if c.super:
        s = s + "(" + string.join(map(lambda v: v.name, c.super), ", ") + ")"
    print s + ":"
    # print method names, sorted by line number
    methods = c.methods.items()
    methods.sort(lambda a, b: cmp(a[1], b[1]))
    for method, lineno in methods:
        print "  def " + method
for k, v in mod.items():
    dump(v)
class MiniFieldStorage:
  def __init__
  def __repr__
class InterpFormContentDict(SvFormContentDict):
  def __getitem__
  def values
  def items
...
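The pyclbr interface is still available on Python 3. The sketch below is an assumption of mine rather than the book's code (outline, SOURCE, and the outline_demo module name are made up): it writes a small module to a temporary directory and summarizes what readmodule_ex finds in it.

```python
import os
import pyclbr
import tempfile

SOURCE = '''
class Base:
    def ping(self):
        pass

class Child(Base):
    def pong(self):
        pass

def helper():
    pass
'''

def outline(source, name="outline_demo"):
    """Return ({class name: sorted method names}, sorted function names)."""
    with tempfile.TemporaryDirectory() as folder:
        with open(os.path.join(folder, name + ".py"), "w") as f:
            f.write(source)
        # path lists extra directories to search before sys.path
        items = pyclbr.readmodule_ex(name, path=[folder])
    classes = {}
    functions = []
    for key, value in items.items():
        if isinstance(value, pyclbr.Class):
            classes[key] = sorted(value.methods)
        elif isinstance(value, pyclbr.Function):
            functions.append(key)
    return classes, sorted(functions)
```

Note that pyclbr reads the source text without importing the module, so the code in SOURCE is never executed.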
(New in 2.0) The filecmp module compares files and directories, as shown in Example 14-4.
File: filecmp-example-1.py
import filecmp
if filecmp.cmp("samples/sample.au", "samples/sample.wav"):
    print "files are identical"
else:
    print "files differ!"
# files differ!
In 1.5.2 and earlier, you can use the cmp and dircmp modules instead.
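If you do not have the sample files at hand, you can try filecmp on temporary files. A Python 3 sketch (same_content is a made-up helper; passing shallow=False is my choice, to force a comparison of the actual contents):

```python
import filecmp
import os
import tempfile

def same_content(data_a, data_b):
    """Write two byte strings to temporary files and compare the files."""
    with tempfile.TemporaryDirectory() as folder:
        path_a = os.path.join(folder, "a.bin")
        path_b = os.path.join(folder, "b.bin")
        for path, data in ((path_a, data_a), (path_b, data_b)):
            with open(path, "wb") as f:
                f.write(data)
        # shallow=False compares file contents, not just os.stat signatures
        return filecmp.cmp(path_a, path_b, shallow=False)
```

With the default shallow comparison, files whose type, size, and modification time match are reported as equal without their contents being read.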
The cmd module provides a simple framework for command-line interfaces (CLI). It is used by the pdb module, but you can also use it in your own programs, as shown in Example 14-5.
All you have to do is subclass the Cmd class and define do and help methods. The base class automatically turns these methods into the corresponding commands.
File: cmd-example-1.py
import cmd
import string, sys
class CLI(cmd.Cmd):
    def __init__(self):
        cmd.Cmd.__init__(self)
        self.prompt = '> '
    def do_hello(self, arg):
        print "hello again", arg, "!"
    def help_hello(self):
        print "syntax: hello [message]",
        print "-- prints a hello message"
    def do_quit(self, arg):
        sys.exit(1)
    def help_quit(self):
        print "syntax: quit",
        print "-- terminates the application"
    # shortcuts
    do_q = do_quit
#
# try it out
cli = CLI()
cli.cmdloop()
> help
Documented commands (type help <topic>):
========================================
hello quit
Undocumented commands:
======================
help q
> hello world
hello again world !
> q
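For testing, it can be handy to drive such a command interpreter from code instead of the keyboard. The following Python 3 sketch is my own variation on the example (run_commands is a made-up helper): it feeds lines to onecmd and collects the output through the stdout argument of Cmd.

```python
import cmd
import io

class CLI(cmd.Cmd):
    prompt = "> "

    def do_hello(self, arg):
        "hello [message] -- prints a hello message"
        self.stdout.write("hello again %s !\n" % arg)

    def do_quit(self, arg):
        "quit -- terminates the command loop"
        return True          # a true return value stops cmdloop()

    do_q = do_quit           # shortcut

def run_commands(lines):
    """Feed commands to the CLI one at a time and return what it printed."""
    out = io.StringIO()
    cli = CLI(stdout=out)
    for line in lines:
        if cli.onecmd(line):     # onecmd returns the do_ method's result
            break
    return out.getvalue()
```

Here the docstrings double as help texts, and returning a true value from do_quit replaces the sys.exit call used above.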
Feather's note: support for this module was dropped in version 2.3. For the reasons, see http://www.amk.ca/python/howto/rexec/ and http://mail.python.org/pipermail/python-dev/2002-December/031160.html
For a workaround, see http://mail.python.org/pipermail/python-list/2003-November/234581.html
The rexec module provides versions of exec, eval, and import that run code in a restricted execution environment, as shown in Example 14-6. In this environment, functions that could damage the local machine are not available.
File: rexec-example-1.py
import rexec
r = rexec.RExec()
print r.r_eval("1+2+3")
print r.r_eval("__import__('os').remove('file')")
6
Traceback (innermost last):
  File "rexec-example-1.py", line 5, in ?
    print r.r_eval("__import__('os').remove('file')")
  File "/usr/local/lib/python1.5/rexec.py", line 257, in r_eval
    return eval(code, m.__dict__)
  File "<string>", line 0, in ?
AttributeError: remove
Feather's note: support for this module was dropped in version 2.3. For the reasons, see http://www.amk.ca/python/howto/rexec/ and http://mail.python.org/pipermail/python-dev/2003-January/031848.html
The Bastion module lets you control how a given object is used, as shown in Example 14-7. You can use it to pass objects from the unrestricted part of your application to code running in restricted mode.
By default, all instance variables are hidden, as are all methods whose names start with an underscore.
File: bastion-example-1.py
import Bastion
class Sample:
    value = 0
    def _set(self, value):
        self.value = value
    def setvalue(self, value):
        if 10 < value <= 20:
            self._set(value)
        else:
            raise ValueError, "illegal value"
    def getvalue(self):
        return self.value
#
# try it
s = Sample()
s._set(100) # cheat
print s.getvalue()
s = Bastion.Bastion(Sample())
s._set(100) # attempt to cheat
print s.getvalue()
100
Traceback (innermost last):
...
AttributeError: _set
You can control which functions to publish. In the example below (bastion-example-2.py), the internal method can be called from outside, but the getvalue method no longer works.
File: bastion-example-2.py
import Bastion
class Sample:
    value = 0
    def _set(self, value):
        self.value = value
    def setvalue(self, value):
        if 10 < value <= 20:
            self._set(value)
        else:
            raise ValueError, "illegal value"
    def getvalue(self):
        return self.value
#
# try it
def is_public(name):
    return name[:3] != "get"
s = Bastion.Bastion(Sample(), is_public)
s._set(100) # this works
print s.getvalue() # but not this
100
Traceback (innermost last):
...
AttributeError: getvalue
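(Optional) The readline module uses the GNU readline library (or a compatible library) to provide enhanced input editing on Unix, as shown in Example 14-9.
The module provides improved command-line editing, such as command-line history. It also enhances the input and raw_input functions.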
File: readline-example-1.py
import readline # activate readline editing
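(Optional, Unix only) The rlcompleter module provides word completion for the readline module.
To enable completion, simply import the module. By default, the completion function is bound to the Esc key; press Esc twice to complete the current word. You can change the key binding with the following code: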
import readline
readline.parse_and_bind("tab: complete")
Example 14-10 shows how to use the completion functions from within a program.
File: rlcompleter-example-1.py
import rlcompleter
import sys
completer = rlcompleter.Completer()
for phrase in "co", "sys.p", "is":
    print phrase, "=>",
    # emulate readline completion handler
    try:
        for index in xrange(sys.maxint):
            term = completer.complete(phrase, index)
            if term is None:
                break
            print term,
    except:
        pass
    print # end the output line for this phrase
co => continue compile complex coerce completer
sys.p => sys.path sys.platform sys.prefix
is => is isinstance issubclass
The statvfs module contains constants and functions for use with the (optional) os.statvfs function, which returns information about a file system, as shown in Example 14-11.
File: statvfs-example-1.py
import statvfs
import os
st = os.statvfs(".")
print "preferred block size", "=>", st[statvfs.F_BSIZE]
print "fundamental block size", "=>", st[statvfs.F_FRSIZE]
print "total blocks", "=>", st[statvfs.F_BLOCKS]
print "total free blocks", "=>", st[statvfs.F_BFREE]
print "available blocks", "=>", st[statvfs.F_BAVAIL]
print "total file nodes", "=>", st[statvfs.F_FILES]
print "total free nodes", "=>", st[statvfs.F_FFREE]
print "available nodes", "=>", st[statvfs.F_FAVAIL]
print "max file name length", "=>", st[statvfs.F_NAMEMAX]
preferred block size => 8192
fundamental block size => 1024
total blocks => 749443
total free blocks => 110442
available blocks => 35497
total file nodes => 92158
total free nodes => 68164
available nodes => 68164
max file name length => 255
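In later Python versions, the result of os.statvfs also exposes its fields as named attributes, so the index constants are not needed. The following is a minimal sketch, assuming Python 3 on a Unix-like system; describe_filesystem is a hypothetical helper name, not part of any module.

```python
import os

def describe_filesystem(path="."):
    """Return a few file system statistics for path as a dict."""
    st = os.statvfs(path)  # only available on Unix-like systems
    return {
        "fragment size": st.f_frsize,
        "total blocks": st.f_blocks,
        "available blocks": st.f_bavail,
        # bytes available to unprivileged users
        "available bytes": st.f_bavail * st.f_frsize,
        "max file name length": st.f_namemax,
    }

for key, value in describe_filesystem().items():
    print(key, "=>", value)
```

The numbers printed depend on the machine, so no sample output is shown here.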
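The calendar module is a Python implementation of the Unix cal command. It can print a calendar for a given year or month to standard output.
prmonth(year, month) prints the calendar for a given month, as shown in Example 14-12.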
File: calendar-example-1.py
import calendar
calendar.prmonth(1999, 12)
   December 1999
Mo Tu We Th Fr Sa Su
       1  2  3  4  5
 6  7  8  9 10 11 12
13 14 15 16 17 18 19
20 21 22 23 24 25 26
27 28 29 30 31
prcal(year) prints the calendar for a given year, as shown in Example 14-13.
File: calendar-example-2.py
import calendar
calendar.prcal(2000)
                               2000
      January                   February                   March
Mo Tu We Th Fr Sa Su      Mo Tu We Th Fr Sa Su      Mo Tu We Th Fr Sa Su
                1  2          1  2  3  4  5  6             1  2  3  4  5
 3  4  5  6  7  8  9       7  8  9 10 11 12 13       6  7  8  9 10 11 12
10 11 12 13 14 15 16      14 15 16 17 18 19 20      13 14 15 16 17 18 19
17 18 19 20 21 22 23      21 22 23 24 25 26 27      20 21 22 23 24 25 26
24 25 26 27 28 29 30      28 29                     27 28 29 30 31
31
       April                      May                       June
Mo Tu We Th Fr Sa Su      Mo Tu We Th Fr Sa Su      Mo Tu We Th Fr Sa Su
                1  2       1  2  3  4  5  6  7                1  2  3  4
 3  4  5  6  7  8  9       8  9 10 11 12 13 14       5  6  7  8  9 10 11
10 11 12 13 14 15 16      15 16 17 18 19 20 21      12 13 14 15 16 17 18
17 18 19 20 21 22 23      22 23 24 25 26 27 28      19 20 21 22 23 24 25
24 25 26 27 28 29 30      29 30 31                  26 27 28 29 30
        July                     August                  September
Mo Tu We Th Fr Sa Su      Mo Tu We Th Fr Sa Su      Mo Tu We Th Fr Sa Su
                1  2          1  2  3  4  5  6                   1  2  3
 3  4  5  6  7  8  9       7  8  9 10 11 12 13       4  5  6  7  8  9 10
10 11 12 13 14 15 16      14 15 16 17 18 19 20      11 12 13 14 15 16 17
17 18 19 20 21 22 23      21 22 23 24 25 26 27      18 19 20 21 22 23 24
24 25 26 27 28 29 30      28 29 30 31               25 26 27 28 29 30
31
      October                   November                  December
Mo Tu We Th Fr Sa Su      Mo Tu We Th Fr Sa Su      Mo Tu We Th Fr Sa Su
                   1             1  2  3  4  5                   1  2  3
 2  3  4  5  6  7  8       6  7  8  9 10 11 12       4  5  6  7  8  9 10
 9 10 11 12 13 14 15      13 14 15 16 17 18 19      11 12 13 14 15 16 17
16 17 18 19 20 21 22      20 21 22 23 24 25 26      18 19 20 21 22 23 24
23 24 25 26 27 28 29      27 28 29 30               25 26 27 28 29 30 31
30 31
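Note that the calendars are printed using European conventions; that is, Monday is the first day of the week. For other conventions, see the classes in the module. (Translator's note: this matches the convention in China, so nothing needs to change.)
Other classes and functions in this module can help you produce output in the format you need.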
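For example, later Python versions provide the calendar.TextCalendar class, which can start the week on a different day. The following is a minimal sketch, assuming Python 2.5 or later and written in Python 3 syntax; it prints the same month as above with Sunday as the first day of the week.

```python
import calendar

# a text calendar whose weeks start on Sunday instead of Monday
cal = calendar.TextCalendar(firstweekday=calendar.SUNDAY)
text = cal.formatmonth(1999, 12)
print(text)
```

formatmonth returns the calendar as a string instead of printing it, which makes it easy to embed in other output.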
The sched module provides a simple task scheduler for non-threaded environments, as shown in Example 14-14.
File: sched-example-1.py
import sched
import time, sys
scheduler = sched.scheduler(time.time, time.sleep)
# add a few operations to the queue
scheduler.enter(0.5, 100, sys.stdout.write, ("one\n",))
scheduler.enter(1.0, 300, sys.stdout.write, ("three\n",))
scheduler.enter(1.0, 200, sys.stdout.write, ("two\n",))
scheduler.run()
one
two
three
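The ordering above comes from the priority argument: events due at the same time run in priority order, lowest number first. The sketch below reproduces this in Python 3 syntax. The FakeClock class is an assumption made only so that the run finishes instantly; passing time.time and time.sleep instead gives the same order.

```python
import sched

class FakeClock:
    """Simulated clock, so the example does not really sleep."""
    def __init__(self):
        self.now = 0.0
    def time(self):
        return self.now
    def sleep(self, seconds):
        self.now += seconds

clock = FakeClock()
scheduler = sched.scheduler(clock.time, clock.sleep)

order = []
# enter(delay, priority, action, argument)
scheduler.enter(0.5, 100, order.append, ("one",))
scheduler.enter(1.0, 300, order.append, ("three",))
scheduler.enter(1.0, 200, order.append, ("two",))
scheduler.run()
print(order)
```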
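The statcache module provides functions for accessing file information. It is an extension of os.stat that caches the information it collects, as shown in Example 14-15.
This module has been deprecated since version 2.2; use the os.stat() function instead. The reason is simple: the cache made cache management more complicated and ended up reducing performance.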
File: statcache-example-1.py
import statcache
import os, stat, time
now = time.time()
for i in range(1000):
    st = os.stat("samples/sample.txt")
print "os.stat", "=>", time.time() - now
now = time.time()
for i in range(1000):
    st = statcache.stat("samples/sample.txt")
print "statcache.stat", "=>", time.time() - now
print "mode", "=>", oct(stat.S_IMODE(st[stat.ST_MODE]))
print "size", "=>", st[stat.ST_SIZE]
print "last modified", "=>", time.ctime(st[stat.ST_MTIME])
os.stat => 0.371000051498
statcache.stat => 0.0199999809265
mode => 0666
size => 305
last modified => Sun Oct 10 18:39:37 1999
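The grep module provides another way to search for strings in text files, as shown in Example 14-16.
It was declared obsolete in version 2.1, which means that it can no longer be used in current versions.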
File: grep-example-1.py
import grep
import glob
grep.grep("\<rather\>", glob.glob("samples/*.txt"))
# 4: indentation, rather than delimiters, might become
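Since the grep module is gone, the same kind of search can be written with the re module. The following is a rough sketch in Python 3 syntax. The grep_files helper is a hypothetical name, not the old module's API, and \b stands in for the word boundaries that the old syntax wrote as \< and \>.

```python
import glob
import re

def grep_files(pattern, filenames):
    """Yield (filename, line number, line) for each matching line."""
    regex = re.compile(pattern)
    for filename in filenames:
        with open(filename, errors="replace") as file:
            for number, line in enumerate(file, start=1):
                if regex.search(line):
                    yield filename, number, line.rstrip("\n")

# search the sample files used above, if they exist
for filename, number, line in grep_files(r"\brather\b", glob.glob("samples/*.txt")):
    print("%s:%d: %s" % (filename, number, line))
```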
(Deprecated) The dircache module, like statcache, is an extension of the os.listdir function that adds caching, as shown in Example 14-17. It was presumably deprecated for the same reason. Use os.listdir instead.
File: dircache-example-1.py
import dircache
import os, time
#
# test cached version
t0 = time.clock()
for i in range(100):
    dircache.listdir(os.sep)
print "cached", time.clock() - t0
#
# test standard version
t0 = time.clock()
for i in range(100):
    os.listdir(os.sep)
print "standard", time.clock() - t0
cached 0.0664509964968
standard 0.5560845807
(Deprecated, 1.5.2 only) The dircmp module compares the contents of two directories, as shown in Example 14-18.
File: dircmp-example-1.py
import dircmp
d = dircmp.dircmp()
d.new("samples", "oldsamples")
d.run()
d.report()
diff samples oldsamples
Only in samples : ['sample.aiff', 'sample.au', 'sample.wav']
Identical files : ['sample.gif', 'sample.gz', 'sample.jpg', ...]
In Python 2.0 and later, this module has been replaced by filecmp.
(Deprecated, 1.5.2 only) The cmp module compares two files, as shown in Example 14-19.
File: cmp-example-1.py
import cmp
if cmp.cmp("samples/sample.au", "samples/sample.wav"):
    print "files are identical"
else:
    print "files differ!"
files differ!
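In Python 2.0 and later, this module has been replaced by filecmp.
(Deprecated, 1.5.2 only) The cmpcache module compares two files. It is an extension of the cmp module that adds caching, as shown in Example 14-20.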
File: cmpcache-example-1.py
import cmpcache
if cmpcache.cmp("samples/sample.au", "samples/sample.wav"):
    print "files are identical"
else:
    print "files differ!"
files differ!
In Python 2.0 and later, this module has been replaced by filecmp.
Note, however, that filecmp does not provide the same caching support.
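The following is a minimal sketch of the filecmp replacement, written in Python 3 syntax. The files_match helper is a hypothetical wrapper added for illustration, and the sample file names are the ones used in the examples above.

```python
import filecmp
import os

def files_match(a, b):
    """Compare two files by content; missing files count as different."""
    if not (os.path.isfile(a) and os.path.isfile(b)):
        return False
    # shallow=False compares the contents, not just the os.stat signatures
    return filecmp.cmp(a, b, shallow=False)

if files_match("samples/sample.au", "samples/sample.wav"):
    print("files are identical")
else:
    print("files differ!")
```

For whole directories, the same module provides a filecmp.dircmp class that takes over the role of the old dircmp module.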
(Deprecated, 1.5.2 only) The util module provides wrapper functions for a few common operations. New code can use implementations like those in Examples 14-21 through 14-23.
Example 14-21 shows the remove(sequence, item) function.
File: util-example-1.py
def remove(sequence, item):
    if item in sequence:
        sequence.remove(item)
Example 14-22 shows the readfile(filename) => string function.
File: util-example-2.py
def readfile(filename):
    file = open(filename, "r")
    return file.read()
Example 14-23 shows the readopenfile(file) => string function.
File: util-example-3.py
def readopenfile(file):
    return file.read()
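(Deprecated, 1.5.2 only) The soundex module implements a simple hash algorithm that converts a word to a six-character string based on its English pronunciation.
The module was removed from the standard library after version 2.0.
get_soundex(word) returns the soundex string for the given word. sound_similar(word1, word2) checks whether two words have the same soundex string. Words that sound alike generally have the same soundex string, as shown in Example 14-24.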
File: soundex-example-1.py
import soundex
a = "fredrik"
b = "friedrich"
print soundex.get_soundex(a), soundex.get_soundex(b)
print soundex.sound_similar(a, b)
F63620 F63620
1
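Because the module is no longer available, here is a rough sketch of a soundex-style hash in Python 3 syntax. It uses the common Soundex letter groups and pads the result to six characters. It is an approximation for illustration only, not the removed module's exact algorithm; the function names are simply borrowed from the example above.

```python
# map each consonant to its Soundex digit; vowels, h, w, and y have no code
CODES = {}
for digit, letters in enumerate(["bfpv", "cgjkqsxz", "dt", "l", "mn", "r"], start=1):
    for letter in letters:
        CODES[letter] = str(digit)

def get_soundex(word, length=6):
    """Return a soundex-style code: first letter plus digits, zero padded."""
    letters = [c for c in word.lower() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0].upper()
    previous = CODES.get(letters[0], "")
    for c in letters[1:]:
        code = CODES.get(c, "")
        # skip uncoded letters and repeats of the previous code
        if code and code != previous:
            result += code
        # h and w do not separate two letters with the same code
        if c not in "hw":
            previous = code
    return (result + "0" * length)[:length]

def sound_similar(word1, word2):
    """Return True if both words have the same code."""
    return get_soundex(word1) == get_soundex(word2)

print(get_soundex("fredrik"), get_soundex("friedrich"))
print(sound_similar("fredrik", "friedrich"))
```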
(Deprecated, Unix only) The timing module can be used to time the execution of a Python program, as shown in Example 14-25.
File: timing-example-1.py
import timing
import time
def procedure():
    time.sleep(1.234)
timing.start()
procedure()
timing.finish()
print "seconds:", timing.seconds()
print "milliseconds:", timing.milli()
print "microseconds:", timing.micro()
seconds: 1
milliseconds: 1239
microseconds: 1239999
You can use the time module to implement the functionality of the timing module, as shown in Example 14-26.
File: timing-example-2.py
import time
t0 = t1 = 0
def start():
    global t0
    t0 = time.time()
def finish():
    global t1
    t1 = time.time()
def seconds():
    return int(t1 - t0)
def milli():
    return int((t1 - t0) * 1000)
def micro():
    return int((t1 - t0) * 1000000)
To measure CPU time instead, replace time.time() with time.clock().
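Note that time.clock was removed in Python 3.8. In current versions, time.perf_counter measures elapsed time and time.process_time measures CPU time. The following is a minimal sketch, assuming Python 3.3 or later; measure is a hypothetical helper name.

```python
import time

def measure(function, *args):
    """Run function(*args); return (result, elapsed seconds, CPU seconds)."""
    t0 = time.perf_counter()
    c0 = time.process_time()
    result = function(*args)
    elapsed = time.perf_counter() - t0
    cpu = time.process_time() - c0
    return result, elapsed, cpu

result, elapsed, cpu = measure(time.sleep, 0.05)
print("elapsed milliseconds:", int(elapsed * 1000))
print("cpu milliseconds:", int(cpu * 1000))
```

Sleeping uses almost no CPU, so the two figures differ noticeably for this call.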
(Deprecated, Unix only) The posixfile module provides a file-like object with support for file locking, as shown in Example 14-27. New programs should use the fcntl module instead.
File: posixfile-example-1.py
import posixfile
import string
filename = "counter.txt"
try:
    # open for update
    file = posixfile.open(filename, "r+")
    counter = int(file.read(6)) + 1
except IOError:
    # create it
    file = posixfile.open(filename, "w")
    counter = 0
file.lock("w|", 6)
file.seek(0) # rewind
file.write("%06d" % counter)
file.close() # releases lock
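The same counter can be written with the fcntl module. The following is a minimal sketch, assuming Python 3 on a Unix-like system; bump_counter is a hypothetical helper name. Unlike the example above, it takes the lock before reading the old value, so that two processes cannot read the same count.

```python
import fcntl
import os

def bump_counter(filename="counter.txt"):
    """Increment a six-digit counter stored in filename, under a lock."""
    # open for update, creating the file if it does not exist
    fd = os.open(filename, os.O_RDWR | os.O_CREAT, 0o644)
    with os.fdopen(fd, "r+") as file:
        fcntl.lockf(file, fcntl.LOCK_EX)  # blocks until the lock is granted
        text = file.read(6)
        counter = int(text) + 1 if text else 0
        file.seek(0)  # rewind
        file.write("%06d" % counter)
        file.flush()
        fcntl.lockf(file, fcntl.LOCK_UN)  # closing the file would also release it
    return counter
```

Calling bump_counter() repeatedly returns 0, 1, 2, and so on, and leaves the zero-padded value in the file.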
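The bisect module is used to insert items into a sorted sequence.
insort(sequence, item) inserts the item into the sequence while keeping it sorted. The sequence can be any sequence object that implements the __getitem__ and insert methods, as shown in Example 14-28.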
File: bisect-example-1.py
import bisect
list = [10, 20, 30]
bisect.insort(list, 25)
bisect.insort(list, 15)
print list
[10, 15, 20, 25, 30]
bisect(sequence, item) => index returns the index at which the item would be inserted, without modifying the sequence, as shown in Example 14-29.
File: bisect-example-2.py
import bisect
list = [10, 20, 30]
print list
print bisect.bisect(list, 25)
print bisect.bisect(list, 15)
[10, 20, 30]
2
1
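Because bisect returns an index, it is also handy for table lookups. The sketch below is written in Python 3 syntax; the grade helper and its breakpoints are made-up illustration values. It maps a score to a letter grade, and also shows how bisect_left differs from bisect when the item is already in the sequence.

```python
import bisect

def grade(score, breakpoints=(60, 70, 80, 90), grades="FDCBA"):
    """Map a score to a letter grade using the sorted breakpoints."""
    return grades[bisect.bisect(breakpoints, score)]

print([grade(score) for score in (33, 60, 77, 90, 100)])

items = [10, 20, 30]
print(bisect.bisect(items, 20))       # index just after the existing 20
print(bisect.bisect_left(items, 20))  # index of the existing 20
```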
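The knee module is a Python implementation of the package import mechanism introduced in Python 1.5. Since the Python interpreter already supports packages, the module is of little practical use, but you can read its code to see how it all works.
For the code, see Python-X.tgz/Python-2.4.4/Demo/imputil/knee.py
You can, of course, import the module, as shown in Example 14-30.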
File: knee-example-1.py
import knee
# that's all, folks!
(Deprecated) The tzparse module parses time zone specifications. When imported, it automatically parses the TZ environment variable, as shown in Example 14-31.
File: tzparse-example-1.py
import os
if not os.environ.has_key("TZ"):
    # set it to something...
    os.environ["TZ"] = "EST+5EDT;100/2,300/2"
# importing this module will parse the TZ variable
import tzparse
print "tzparams", "=>", tzparse.tzparams
print "timezone", "=>", tzparse.timezone
print "altzone", "=>", tzparse.altzone
print "daylight", "=>", tzparse.daylight
print "tzname", "=>", tzparse.tzname
tzparams => ('EST', 5, 'EDT', 100, 2, 300, 2)
timezone => 18000
altzone => 14400
daylight => 1
tzname => ('EST', 'EDT')
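In addition to these variables, the module provides a few functions for time calculations.
(Deprecated) The regex module is the old (pre-1.5) regular expression module, used as shown in Example 14-32. New code should use the re module instead.
Note that regex is faster than re in Python 1.5.2, but re is faster in later versions.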
File: regex-example-1.py
import regex
text = "Man's crisis of identity in the latter half of the 20th century"
p = regex.compile("latter") # literal
print p.match(text)
print p.search(text), repr(p.group(0))
p = regex.compile("[0-9]+") # number
print p.search(text), repr(p.group(0))
p = regex.compile("\<\w\w\>") # two-letter word
print p.search(text), repr(p.group(0))
p = regex.compile("\w+$") # word at the end
print p.search(text), repr(p.group(0))
-1
32 'latter'
51 '20'
13 'of'
56 'century'
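For comparison, here is a sketch of the same searches using the re module, written in Python 3 syntax. The re functions return match objects (or None) instead of positions, and \b replaces the \< and \> word boundaries of the old syntax.

```python
import re

text = "Man's crisis of identity in the latter half of the 20th century"

print(re.match("latter", text))  # None: match only looks at the start

for pattern in ("latter", "[0-9]+", r"\b\w\w\b", r"\w+$"):
    m = re.search(pattern, text)
    print(m.start(), repr(m.group(0)))
```

The positions and matched strings are the same as in the regex output above.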
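(Deprecated) The regsub module provides string replacement operations based on regular expressions, as shown in Example 14-33. New code should use the sub function in the re module instead.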
File: regsub-example-1.py
import regsub
text = "Well, there's spam, egg, sausage, and spam."
print regsub.sub("spam", "ham", text) # just the first
print regsub.gsub("spam", "bacon", text) # all of them
Well, there's ham, egg, sausage, and spam.
Well, there's bacon, egg, sausage, and bacon.
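With the re module, both cases are handled by re.sub; the count argument limits the number of replacements. A minimal sketch in Python 3 syntax:

```python
import re

text = "Well, there's spam, egg, sausage, and spam."

first_only = re.sub("spam", "ham", text, count=1)  # just the first
all_of_them = re.sub("spam", "bacon", text)        # all of them

print(first_only)
print(all_of_them)
```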
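(Deprecated) The reconvert module converts old-style regular expressions (as used by the regex module) to the new style (as used by the re module), as shown in Example 14-34. It can also be used as a command-line tool.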
File: reconvert-example-1.py
import reconvert
for pattern in "abcd", "a\(b*c\)d", "\<\w+\>":
    print pattern, "=>", reconvert.convert(pattern)
abcd => abcd
a\(b*c\)d => a(b*c)d
\<\w+\> => \b\w+\b
(Deprecated) The regex_syntax module is used to change the syntax that the regex module accepts, as shown in Example 14-35.
File: regex-syntax-example-1.py
import regex_syntax
import regex
def compile(pattern, syntax):
    syntax = regex.set_syntax(syntax)
    try:
        pattern = regex.compile(pattern)
    finally:
        # restore original syntax
        regex.set_syntax(syntax)
    return pattern
def compile_awk(pattern):
    return compile(pattern, regex_syntax.RE_SYNTAX_AWK)
def compile_grep(pattern):
    return compile(pattern, regex_syntax.RE_SYNTAX_GREP)
def compile_emacs(pattern):
    return compile(pattern, regex_syntax.RE_SYNTAX_EMACS)
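(Deprecated, 1.5.2 only) The find module finds files that match a given pattern in a given directory and its subdirectories, as shown in Example 14-36.
The pattern syntax is the same as in fnmatch.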
File: find-example-1.py
import find
# find all JPEG files in or beneath the current directory
for file in find.find("*.jpg", "."):
    print file
./samples/sample.jpg
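In later Python versions, the same search can be written with os.walk and fnmatch. The following is a minimal sketch in Python 3 syntax; the find function here is a hypothetical stand-in that mimics the old call signature, not the original implementation.

```python
import fnmatch
import os

def find(pattern, directory=os.curdir):
    """Return files in or beneath directory whose names match pattern."""
    matches = []
    for dirpath, dirnames, filenames in os.walk(directory):
        for name in fnmatch.filter(filenames, pattern):
            matches.append(os.path.join(dirpath, name))
    return sorted(matches)

# find all JPEG files in or beneath the current directory
for file in find("*.jpg", "."):
    print(file)
```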
This chapter will be completed and updated gradually over time.