Python Learning Journey - Day 5

Date: 2019-11-09

Tags: python, learning journey, day5. Category: Python

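Topics covered:

1. Introduction to modules

2. The time and datetime modules

3. The random module

4. The os module

5. The sys module

6. The shutil module

7. The json and pickle modules

8. The shelve module

9. XML processing

10. YAML processing

11. The configparser module

12. The hashlib module

13. The subprocess module

14. The logging module

15. Regular expressions with re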

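1. Introduction to modules

Definition:

A module is a way to organize Python code logically (variables, functions, classes and the logic that together implement some functionality). In essence it is a Python file ending in .py; for example, the file test.py corresponds to the module named test.
Definition of a package: a package is a way to organize modules logically. In essence it is a directory, which for a regular package must contain an __init__.py file.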

Ways to import:

import module_name
import module1_name, module2_name
from module_name import *
from module_name import m1, m2, m3

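What import really does (path search and the search path)

Importing a module essentially means executing its Python file once, for example:

import module_name ---> module_name.py ---> location of module_name.py ---> sys.path

Importing a package essentially means executing the __init__.py file in that package.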
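
To make this concrete, here is a minimal sketch. It assumes a throwaway module called demo_mod (a hypothetical name used only for illustration) written into a temporary directory. It suggests that import executes the file once, locates it through sys.path, and afterwards reuses the cached module object.

```python
import importlib
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # demo_mod is a hypothetical module created only for this demonstration.
    Path(tmp, "demo_mod.py").write_text("print('demo_mod executed')\nvalue = 42\n")
    sys.path.insert(0, tmp)          # make the directory part of the search path
    importlib.invalidate_caches()    # make sure the new file is noticed
    try:
        mod = importlib.import_module("demo_mod")    # runs demo_mod.py once
        again = importlib.import_module("demo_mod")  # served from sys.modules, not re-run
        same_object = mod is again
    finally:
        sys.path.remove(tmp)
        sys.modules.pop("demo_mod", None)

print(mod.value, same_object)
```

The message from demo_mod.py is printed only once even though the module is imported twice, because the second import finds it in sys.modules.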

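Import optimization

from module_test import test

Binding the name directly means each call is a plain test() rather than a module_test.test attribute lookup, which saves a little work when the function is called repeatedly.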

Kinds of modules

A. The standard library

B. Open-source (third-party) modules

C. Your own modules

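2. The time and datetime modules

In Python, time is usually represented in one of these ways:
1) a timestamp
2) a formatted time string
3) a tuple (struct_time) with nine elements.
Because Python's time module is implemented mainly by calling the C library, behaviour can differ slightly between platforms.
UTC (Coordinated Universal Time) is the world time standard; in everyday use it is commonly treated as equivalent to Greenwich Mean Time.
China is at UTC+8.
DST (Daylight Saving Time) is summer time.
Timestamps:
Generally speaking, a timestamp is the offset in seconds from 00:00:00 UTC on 1 January 1970.
Running type(time.time()) returns float. The main function that returns a timestamp is time(). Older tutorials also list clock(), but it measured processor or elapsed time rather than time since the epoch, and it was removed in Python 3.8.
Tuples (struct_time): a struct_time tuple has nine elements. The main functions that return a struct_time are gmtime(), localtime() and strptime().
The nine fields are tm_year, tm_mon, tm_mday, tm_hour, tm_min, tm_sec, tm_wday, tm_yday and tm_isdst.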

The time module:

>>> import time
>>> time.time()  # return the current timestamp
1522035652.215034

>>> time.localtime()  # return the local time as a struct_time object
time.struct_time(tm_year=2018, tm_mon=3, tm_mday=26, tm_hour=11, tm_min=45, tm_sec=8, tm_wday=0, tm_yday=85, tm_isdst=0)
>>>

>>> time.gmtime()     # convert the current timestamp to UTC
time.struct_time(tm_year=2018, tm_mon=3, tm_mday=26, tm_hour=3, tm_min=45, tm_sec=54, tm_wday=0, tm_yday=85, tm_isdst=0)
>>> time.localtime()  # current local time (UTC+8 on this machine)
time.struct_time(tm_year=2018, tm_mon=3, tm_mday=26, tm_hour=11, tm_min=46, tm_sec=2, tm_wday=0, tm_yday=85, tm_isdst=0)

>>> x = time.localtime()
>>> time.mktime(x)  # struct_time to timestamp
1522036050.0
>>>

>>> time.strftime('%Y-%m-%d %H:%M:%S',time.localtime())  # struct_time to formatted string
'2018-03-26 11:48:14'
>>>

>>> time.strptime('2016-08-23 16:06:54','%Y-%m-%d %H:%M:%S')  # formatted string to struct_time
time.struct_time(tm_year=2016, tm_mon=8, tm_mday=23, tm_hour=16, tm_min=6, tm_sec=54, tm_wday=1, tm_yday=236, tm_isdst=-1)
>>>

strftime("format", struct_time) ---> "formatted string"
strptime("formatted string", "format") ---> struct_time

>>> time.asctime()
'Mon Mar 26 11:50:02 2018'
>>> time.ctime()
'Mon Mar 26 11:50:10 2018'
>>> time.ctime(1522035652.215034)  # timestamp to the fixed 'Mon Mar 26 ...' style string
'Mon Mar 26 11:40:52 2018'
>>>

The datetime module:

>>> import datetime
>>> datetime.datetime.now()
datetime.datetime(2018, 3, 26, 12, 22, 37, 766518)
>>> print(datetime.datetime.now())
2018-03-26 12:22:45.683983
>>> print(datetime.datetime.now()+datetime.timedelta(+3))      # three days from now
2018-03-29 12:23:08.399578
>>> print(datetime.datetime.now()+datetime.timedelta(-3))      # three days ago
2018-03-23 12:23:11.624363
>>> print(datetime.datetime.now()+datetime.timedelta(hours=3)) # three hours from now
2018-03-26 15:23:20.175275
>>> print(datetime.datetime.now()+datetime.timedelta(hours=-3))# three hours ago
2018-03-26 09:23:27.564384
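
The session above depends on the moment it was run. As a repeatable sketch of the same timedelta arithmetic, the example below starts from a fixed datetime (an arbitrary value chosen for illustration) instead of now().

```python
from datetime import datetime, timedelta

start = datetime(2018, 3, 26, 12, 22, 37)         # fixed moment, so results are repeatable
three_days_later = start + timedelta(days=3)
three_hours_earlier = start - timedelta(hours=3)
elapsed = three_days_later - three_hours_earlier  # subtracting datetimes gives a timedelta

print(three_days_later)                                      # 2018-03-29 12:22:37
print(three_hours_earlier.strftime('%Y-%m-%d %H:%M:%S'))     # 2018-03-26 09:22:37
print(elapsed.total_seconds())                               # 270000.0
```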

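Format reference

%a    Locale's abbreviated weekday name
%A    Locale's full weekday name
%b    Locale's abbreviated month name
%B    Locale's full month name
%c    Locale's appropriate date and time representation
%d    Day of the month (01 - 31)
%H    Hour (24-hour clock, 00 - 23)
%I    Hour (12-hour clock, 01 - 12)
%j    Day of the year (001 - 366)
%m    Month (01 - 12)
%M    Minute (00 - 59)
%p    Locale's equivalent of AM or PM
%S    Second (00 - 61)
%U    Week number of the year (00 - 53), with Sunday as the first day of the week. All days before the first Sunday are in week 0.
%w    Weekday (0 - 6, 0 is Sunday)
%W    Same as %U, except that Monday is the first day of the week.
%x    Locale's appropriate date representation
%X    Locale's appropriate time representation
%y    Year without century (00 - 99)
%Y    Year with century
%Z    Time zone name (empty string if no time zone exists)
%%    A literal '%' character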

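Time conversion relationships

timestamp ---> struct_time: time.localtime(), time.gmtime()
struct_time ---> timestamp: time.mktime()
struct_time ---> formatted string: time.strftime()
formatted string ---> struct_time: time.strptime()
timestamp ---> 'Mon Mar 26 11:40:52 2018' style string: time.ctime()
struct_time ---> the same style of string: time.asctime()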
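
The conversions between the three representations can be sketched as a round trip. The timestamp below is the one from the earlier session, truncated to whole seconds; the printed string depends on the local time zone, so no expected output is shown for it.

```python
import time

ts = 1522035652                           # timestamp (seconds since the epoch)
st = time.localtime(ts)                   # timestamp -> struct_time
back = time.mktime(st)                    # struct_time -> timestamp
text = time.strftime('%Y-%m-%d %H:%M:%S', st)          # struct_time -> string
parsed = time.strptime(text, '%Y-%m-%d %H:%M:%S')      # string -> struct_time

print(text)
print(back == ts, tuple(parsed)[:6] == tuple(st)[:6])
```

Only the first six fields are compared at the end, because strptime() cannot recover tm_isdst from this format and reports it as -1.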

3. The random module

Random floats

>>> import random
>>> random.random()
0.38741916300777435
>>> random.random()
0.2726009482506605
>>> random.random()
0.8928518510787847
>>> random.random()
0.12703455294635024
>>> random.random()
0.054001403811667514

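Random integers and ranges

>>> random.randint(1,9) # random integer from 1 to 9, both endpoints included
3
>>> random.uniform(1,9) # random float within the given interval
6.532363738442411
>>> random.randrange(0,5) # random integer from range(0, 5), i.e. 0 to 4; 5 is excluded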

Picking a random element

>>> random.choice('csqzyy')
'y'

Taking a random sample of a given length from a sequence

>>> random.sample('csqzyy', 5)
['c', 'z', 'y', 's', 'y']

Shuffling

import random
items = [1,2,3,4,5,6,7]
print(items) # [1, 2, 3, 4, 5, 6, 7]
random.shuffle(items)  # shuffles the list in place
print(items) # e.g. [1, 4, 7, 2, 5, 3, 6]

Exercise: generating a random code

#!/usr/bin/env python
# -*- coding: UTF-8 -*-
# Author: cs
# Generate a 4-character random verification code
import random
checkcode = ""
for i in range(4):
    current = random.randrange(0, 4)  # random numbers compared against the loop counter
    current1 = random.randrange(0, 4)
    if current == i:
        tmp = chr(random.randint(65, 90))  # 65-90 are the ASCII codes for A-Z
    elif current1 == i:
        tmp = chr(random.randint(97, 122))   # 97-122 are the ASCII codes for a-z
    else:
        tmp = random.randint(0, 9)
    checkcode += str(tmp)
print(checkcode)

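Note: if this Python file is itself named random.py, running it may fail with AttributeError: module 'random' has no attribute 'randrange', because the script shadows the standard library module of the same name. Renaming the file to random1.py fixed it.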
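
A more compact variant of the exercise is sketched below. It assumes that any mix of upper-case letters, lower-case letters and digits is acceptable; the helper name make_checkcode and its rng parameter are made up for this illustration.

```python
import random
import string

ALPHABET = string.ascii_letters + string.digits   # a-z, A-Z, 0-9

def make_checkcode(length=4, rng=random):
    """Return `length` characters drawn independently from ALPHABET."""
    return "".join(rng.choice(ALPHABET) for _ in range(length))

seeded = random.Random(2019)        # a private, seeded generator gives repeatable output
code = make_checkcode(rng=seeded)
print(code)
```

Passing a seeded random.Random instance makes the result reproducible for testing. The random module is not intended for security purposes, so codes that protect something real are better generated with the secrets module.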

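4. The os module

Provides an interface for calling into the operating system.

os.getcwd()  Get the current working directory, i.e. the directory the Python script is running in
os.chdir("dirname")  Change the current working directory; equivalent to cd in a shell
os.curdir  The string for the current directory: '.'
os.pardir  The string for the parent directory: '..'
os.makedirs('dirname1/dirname2')  Create nested directories recursively
os.removedirs('dirname1')  Remove the directory if it is empty, then move up to the parent and remove it too if it is empty, and so on
os.mkdir('dirname')  Create a single directory; equivalent to mkdir dirname in a shell
os.rmdir('dirname')  Remove a single empty directory; raises an error if it is not empty; equivalent to rmdir dirname in a shell
os.listdir('dirname')  Return a list of all files and subdirectories in the given directory, including hidden files
os.remove()  Delete a file
os.rename("oldname","newname")  Rename a file or directory
os.stat('path/filename')  Get information about a file or directory
os.sep  The platform's path separator: "\\" on Windows, "/" on Linux
os.linesep  The platform's line terminator: "\r\n" on Windows, "\n" on Linux
os.pathsep  The string that separates entries in a search path such as PATH: ";" on Windows, ":" on Linux
os.name  A string identifying the current platform: 'nt' on Windows, 'posix' on Linux
os.system("bash command")  Run a shell command; its output is shown directly
os.environ  The system environment variables
os.path.abspath(path)  Return the normalized absolute version of path
os.path.split(path)  Split path into a (directory, filename) 2-tuple
os.path.dirname(path)  Return the directory part of path, i.e. the first element of os.path.split(path)
os.path.basename(path)  Return the final file name of path, i.e. the second element of os.path.split(path). If path ends with a path separator, an empty string is returned
os.path.exists(path)  Return True if path exists, otherwise False
os.path.isabs(path)  Return True if path is absolute
os.path.isfile(path)  Return True if path is an existing file, otherwise False
os.path.isdir(path)  Return True if path is an existing directory, otherwise False
os.path.join(path1[, path2[, ...]])  Join the paths and return the result; any components before the last absolute path are discarded
os.path.getatime(path)  Return the last access time of the file or directory that path points to
os.path.getmtime(path)  Return the last modification time of the file or directory that path points to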
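
A few of these calls can be tried safely inside a temporary directory. The sketch below uses made-up names (dirname1, dirname2, note.txt) and cleans up after itself.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    nested = os.path.join(root, "dirname1", "dirname2")
    os.makedirs(nested)                        # create both levels at once
    file_path = os.path.join(nested, "note.txt")
    with open(file_path, "w") as fh:
        fh.write("hello")

    head, tail = os.path.split(file_path)      # (directory, file name)
    listing = os.listdir(nested)
    is_file = os.path.isfile(file_path)
    same_dir = os.path.dirname(file_path) == head

    os.remove(file_path)                       # delete the file again
    still_there = os.path.exists(file_path)

print(tail, listing, is_file, same_dir, still_there)
```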

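5. The sys module

sys.argv           List of command-line arguments; the first element is the path of the program itself
sys.exit(n)        Exit the program; use exit(0) for a normal exit
sys.version        Version information for the Python interpreter
sys.maxsize        The largest value a container index can take (Python 2's sys.maxint, the largest int, no longer exists in Python 3)
sys.path           The module search path; initialized from the PYTHONPATH environment variable plus installation defaults
sys.platform       The name of the operating system platform
sys.stdout.write('please:')
val = sys.stdin.readline()[:-1]    # read one line from standard input and drop the trailing newline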
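
As a small sketch of how sys.argv, sys.stdout and exit codes fit together, the hypothetical main() function below takes the argument list as a parameter so that it can be called directly without touching the real command line.

```python
import sys

def main(argv):
    """Greet the name given as the first argument; return a process exit code."""
    if len(argv) < 2:
        sys.stderr.write("usage: %s NAME\n" % argv[0])
        return 1                      # non-zero signals an error
    sys.stdout.write("hello, %s\n" % argv[1])
    return 0                          # zero signals a normal exit

status_missing = main(["prog"])
status_ok = main(["prog", "cs"])
```

In a real script the last step would typically be sys.exit(main(sys.argv)), so that the return value becomes the exit status of the process.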

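6. The shutil module

A high-level module for working with files, folders and archives.

shutil.copyfileobj(fsrc, fdst[, length])
Copy the contents of one file object into another. The copy starts at the current position of fsrc, so it may cover only part of a file; length is the size of the buffer used for each read.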

def copyfileobj(fsrc, fdst, length=16*1024):
    """copy data from file-like object fsrc to file-like object fdst"""
    while True:
        buf = fsrc.read(length)
        if not buf:
            break
        fdst.write(buf)

shutil.copyfile(src, dst)
Copy a file

def copyfile(src, dst):
    """Copy data from src to dst"""
    if _samefile(src, dst):
        raise Error("`%s` and `%s` are the same file" % (src, dst))

    for fn in [src, dst]:
        try:
            st = os.stat(fn)
        except OSError:
            # File most likely does not exist
            pass
        else:
            # XXX What about other special files? (sockets, devices...)
            if stat.S_ISFIFO(st.st_mode):
                raise SpecialFileError("`%s` is a named pipe" % fn)

    with open(src, 'rb') as fsrc:
        with open(dst, 'wb') as fdst:
            copyfileobj(fsrc, fdst)

shutil.copymode(src, dst)
Copy only the permission bits. The contents, group and owner are unchanged.

shutil.copystat(src, dst)
Copy the status information: mode bits, atime, mtime and flags.

def copystat(src, dst):
    """Copy all stat info (mode bits, atime, mtime, flags) from src to dst"""
    st = os.stat(src)
    mode = stat.S_IMODE(st.st_mode)
    if hasattr(os, 'utime'):
        os.utime(dst, (st.st_atime, st.st_mtime))
    if hasattr(os, 'chmod'):
        os.chmod(dst, mode)
    if hasattr(os, 'chflags') and hasattr(st, 'st_flags'):
        try:
            os.chflags(dst, st.st_flags)
        except OSError as why:
            for err in 'EOPNOTSUPP', 'ENOTSUP':
                if hasattr(errno, err) and why.errno == getattr(errno, err):
                    break
            else:
                raise

shutil.copy(src, dst)
Copy a file together with its permission bits

def copy(src, dst):
    """Copy data and mode bits ("cp src dst").

    The destination may be a directory.

    """
    if os.path.isdir(dst):
        dst = os.path.join(dst, os.path.basename(src))
    copyfile(src, dst)
    copymode(src, dst)

shutil.copy2(src, dst)
Copy a file together with its status information

def copy2(src, dst):
    """Copy data and all stat info ("cp -p src dst").

    The destination may be a directory.

    """
    if os.path.isdir(dst):
        dst = os.path.join(dst, os.path.basename(src))
    copyfile(src, dst)
    copystat(src, dst)

shutil.ignore_patterns(*patterns)
shutil.copytree(src, dst, symlinks=False, ignore=None)
Recursively copy a directory tree

For example: copytree(source, destination, ignore=ignore_patterns('*.pyc', 'tmp*'))

def ignore_patterns(*patterns):
    """Function that can be used as copytree() ignore parameter.

    Patterns is a sequence of glob-style patterns
    that are used to exclude files"""
    def _ignore_patterns(path, names):
        ignored_names = []
        for pattern in patterns:
            ignored_names.extend(fnmatch.filter(names, pattern))
        return set(ignored_names)
    return _ignore_patterns

def copytree(src, dst, symlinks=False, ignore=None):
    """Recursively copy a directory tree using copy2().

    The destination directory must not already exist.
    If exception(s) occur, an Error is raised with a list of reasons.

    If the optional symlinks flag is true, symbolic links in the
    source tree result in symbolic links in the destination tree; if
    it is false, the contents of the files pointed to by symbolic
    links are copied.

    The optional ignore argument is a callable. If given, it
    is called with the `src` parameter, which is the directory
    being visited by copytree(), and `names` which is the list of
    `src` contents, as returned by os.listdir():

        callable(src, names) -> ignored_names

    Since copytree() is called recursively, the callable will be
    called once for each directory that is copied. It returns a
    list of names relative to the `src` directory that should
    not be copied.

    XXX Consider this example code rather than the ultimate tool.

    """
    names = os.listdir(src)
    if ignore is not None:
        ignored_names = ignore(src, names)
    else:
        ignored_names = set()

    os.makedirs(dst)
    errors = []
    for name in names:
        if name in ignored_names:
            continue
        srcname = os.path.join(src, name)
        dstname = os.path.join(dst, name)
        try:
            if symlinks and os.path.islink(srcname):
                linkto = os.readlink(srcname)
                os.symlink(linkto, dstname)
            elif os.path.isdir(srcname):
                copytree(srcname, dstname, symlinks, ignore)
            else:
                # Will raise a SpecialFileError for unsupported file types
                copy2(srcname, dstname)
        # catch the Error from the recursive copytree so that we can
        # continue with other files
        except Error as err:
            errors.extend(err.args[0])
        except EnvironmentError as why:
            errors.append((srcname, dstname, str(why)))
    try:
        copystat(src, dst)
    except OSError as why:
        if WindowsError is not None and isinstance(why, WindowsError):
            # Copying file access times may fail on Windows
            pass
        else:
            errors.append((src, dst, str(why)))
    if errors:
        raise Error(errors)
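
To see copytree() and ignore_patterns() working together, here is a minimal sketch run inside a temporary directory. The file names (keep.py, skip.pyc, tmp_cache, mod.py) are invented for the demonstration.

```python
import os
import shutil
import tempfile

with tempfile.TemporaryDirectory() as root:
    src = os.path.join(root, "src")
    dst = os.path.join(root, "dst")          # must not exist before copytree()
    os.makedirs(os.path.join(src, "pkg"))
    for rel in ("keep.py", "skip.pyc",
                os.path.join("pkg", "tmp_cache"), os.path.join("pkg", "mod.py")):
        with open(os.path.join(src, rel), "w") as fh:
            fh.write("x")

    # Skip compiled files and anything whose name starts with "tmp".
    shutil.copytree(src, dst, ignore=shutil.ignore_patterns("*.pyc", "tmp*"))

    copied = sorted(
        os.path.relpath(os.path.join(folder, name), dst)
        for folder, _dirs, files in os.walk(dst)
        for name in files
    )
    shutil.rmtree(dst)                       # recursively delete the copy again
    dst_exists = os.path.exists(dst)

print(copied, dst_exists)
```

The ignore callable is consulted once per directory, which is why the tmp_cache file inside pkg is skipped as well.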

shutil.rmtree(path[, ignore_errors[, onerror]])
Recursively delete a directory tree

def rmtree(path, ignore_errors=False, onerror=None):
    """Recursively delete a directory tree.

    If ignore_errors is set, errors are ignored; otherwise, if onerror
    is set, it is called to handle the error with arguments (func,
    path, exc_info) where func is os.listdir, os.remove, or os.rmdir;
    path is the argument to that function that caused it to fail; and
    exc_info is a tuple returned by sys.exc_info().  If ignore_errors
    is false and onerror is None, an exception is raised.

    """
    if ignore_errors:
        def onerror(*args):
            pass
    elif onerror is None:
        def onerror(*args):
            raise
    try:
        if os.path.islink(path):
            # symlinks to directories are forbidden, see bug #1669
            raise OSError("Cannot call rmtree on a symbolic link")
    except OSError:
        onerror(os.path.islink, path, sys.exc_info())
        # can't continue even if onerror hook returns
        return
    names = []
    try:
        names = os.listdir(path)
    except os.error as err:
        onerror(os.listdir, path, sys.exc_info())
    for name in names:
        fullname = os.path.join(path, name)
        try:
            mode = os.lstat(fullname).st_mode
        except os.error:
            mode = 0
        if stat.S_ISDIR(mode):
            rmtree(fullname, ignore_errors, onerror)
        else:
            try:
                os.remove(fullname)
            except os.error as err:
                onerror(os.remove, fullname, sys.exc_info())
    try:
        os.rmdir(path)
    except os.error:
        onerror(os.rmdir, path, sys.exc_info())

shutil.move(src, dst)
Recursively move a file or directory

def move(src, dst):
    """Recursively move a file or directory to another location. This is
    similar to the Unix "mv" command.

    If the destination is a directory or a symlink to a directory, the source
    is moved inside the directory. The destination path must not already
    exist.

    If the destination already exists but is not a directory, it may be
    overwritten depending on os.rename() semantics.

    If the destination is on our current filesystem, then rename() is used.
    Otherwise, src is copied to the destination and then removed.
    A lot more could be done here...  A look at a mv.c shows a lot of
    the issues this implementation glosses over.

    """
    real_dst = dst
    if os.path.isdir(dst):
        if _samefile(src, dst):
            # We might be on a case insensitive filesystem,
            # perform the rename anyway.
            os.rename(src, dst)
            return

        real_dst = os.path.join(dst, _basename(src))
        if os.path.exists(real_dst):
            raise Error("Destination path '%s' already exists" % real_dst)
    try:
        os.rename(src, real_dst)
    except OSError:
        if os.path.isdir(src):
            if _destinsrc(src, dst):
                raise Error("Cannot move a directory '%s' into itself '%s'." % (src, dst))
            copytree(src, real_dst, symlinks=True)
            rmtree(src)
        else:
            copy2(src, real_dst)
            os.unlink(src)

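shutil.make_archive(base_name, format,...)

Create an archive (for example zip or tar) and return its file path.

base_name: the name of the archive file, optionally with a path. If it is only a file name, the archive is saved in the current directory; otherwise it is saved at the given path.
e.g. www => saved in the current directory
e.g. /Users/wupeiqi/www => saved in /Users/wupeiqi/
format: the archive type: "zip", "tar", "bztar" or "gztar"
root_dir: the path of the folder to archive (defaults to the current directory)
owner: the owner; defaults to the current user
group: the group; defaults to the current group
logger: used for logging; usually a logging.Logger object

# Archive the files under /Users/wupeiqi/Downloads/test and place the archive in the current program directory
import shutil
ret = shutil.make_archive("wwwwwwwwww", 'gztar', root_dir='/Users/wupeiqi/Downloads/test')

# Archive the files under /Users/wupeiqi/Downloads/test and place the archive in /Users/wupeiqi/
ret = shutil.make_archive("/Users/wupeiqi/wwwwwwwwww", 'gztar', root_dir='/Users/wupeiqi/Downloads/test')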
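
The paths above exist only on the original author's machine. As a self-contained sketch, the example below builds a small folder in a temporary directory, archives it with make_archive(), and restores it with shutil.unpack_archive(); the names test, backup, restored and a.log are placeholders.

```python
import os
import shutil
import tempfile

with tempfile.TemporaryDirectory() as root:
    data_dir = os.path.join(root, "test")
    os.makedirs(data_dir)
    with open(os.path.join(data_dir, "a.log"), "w") as fh:
        fh.write("log line\n")

    # base_name has no extension; make_archive() adds ".tar.gz" for "gztar".
    archive_path = shutil.make_archive(os.path.join(root, "backup"), "gztar",
                                       root_dir=data_dir)
    archive_name = os.path.basename(archive_path)

    out_dir = os.path.join(root, "restored")
    shutil.unpack_archive(archive_path, out_dir)   # format is inferred from the extension
    restored = os.listdir(out_dir)

print(archive_name, restored)
```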

def make_archive(base_name, format, root_dir=None, base_dir=None, verbose=0,
                 dry_run=0, owner=None, group=None, logger=None):
    """Create an archive file (eg. zip or tar).

    'base_name' is the name of the file to create, minus any format-specific
    extension; 'format' is the archive format: one of "zip", "tar", "bztar"
    or "gztar".

    'root_dir' is a directory that will be the root directory of the
    archive; ie. we typically chdir into 'root_dir' before creating the
    archive.  'base_dir' is the directory where we start archiving from;
    ie. 'base_dir' will be the common prefix of all files and
    directories in the archive.  'root_dir' and 'base_dir' both default
    to the current directory.  Returns the name of the archive file.

    'owner' and 'group' are used when creating a tar archive. By default,
    uses the current owner and group.
    """
    save_cwd = os.getcwd()
    if root_dir is not None:
        if logger is not None:
            logger.debug("changing into '%s'", root_dir)
        base_name = os.path.abspath(base_name)
        if not dry_run:
            os.chdir(root_dir)

    if base_dir is None:
        base_dir = os.curdir

    kwargs = {'dry_run': dry_run, 'logger': logger}

    try:
        format_info = _ARCHIVE_FORMATS[format]
    except KeyError:
        raise ValueError("unknown archive format '%s'" % format)

    func = format_info[0]
    for arg, val in format_info[1]:
        kwargs[arg] = val

    if format != 'zip':
        kwargs['owner'] = owner
        kwargs['group'] = group

    try:
        filename = func(base_name, base_dir, **kwargs)
    finally:
        if root_dir is not None:
            if logger is not None:
                logger.debug("changing back to '%s'", save_cwd)
            os.chdir(save_cwd)

    return filename

shutil handles archives by calling the zipfile and tarfile modules. In detail:

import zipfile

# Compress
z = zipfile.ZipFile('laxi.zip', 'w')
z.write('a.log')
z.write('data.data')
z.close()

# Extract
z = zipfile.ZipFile('laxi.zip', 'r')
z.extractall()
z.close()

Compressing and extracting with zipfile

import tarfile

# Compress
tar = tarfile.open('your.tar','w')
tar.add('/Users/wupeiqi/PycharmProjects/bbs2.zip', arcname='bbs2.zip')
tar.add('/Users/wupeiqi/PycharmProjects/cmdb.zip', arcname='cmdb.zip')
tar.close()

# Extract
tar = tarfile.open('your.tar','r')
tar.extractall()  # a target directory can be given
tar.close()

Compressing and extracting with tar
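
The zipfile calls above can also be written with context managers, which close the archive even if an error occurs part-way through. This is a sketch in a temporary directory; laxi.zip and a.log reuse the names from the example, and the file content is made up.

```python
import os
import tempfile
import zipfile

with tempfile.TemporaryDirectory() as root:
    log_path = os.path.join(root, "a.log")
    with open(log_path, "wb") as fh:
        fh.write(b"hello\n")

    zip_path = os.path.join(root, "laxi.zip")
    # ZIP_DEFLATED compresses the data; the default ZIP_STORED only stores it.
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as z:
        z.write(log_path, arcname="a.log")   # arcname is the name inside the archive

    with zipfile.ZipFile(zip_path) as z:     # mode defaults to "r"
        names = z.namelist()
        content = z.read("a.log")            # bytes of the archived file

print(names, content)
```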

  1 class ZipFile(object):
  2     """ Class with methods to open, read, write, close, list zip files.
  3 
  4     z = ZipFile(file, mode="r", compression=ZIP_STORED, allowZip64=False)
  5 
  6     file: Either the path to the file, or a file-like object.
  7           If it is a path, the file will be opened and closed by ZipFile.
  8     mode: The mode can be either read "r", write "w" or append "a".
  9     compression: ZIP_STORED (no compression) or ZIP_DEFLATED (requires zlib).
 10     allowZip64: if True ZipFile will create files with ZIP64 extensions when
 11                 needed, otherwise it will raise an exception when this would
 12                 be necessary.
 13 
 14     """
 15 
 16     fp = None                   # Set here since __del__ checks it
 17 
 18     def __init__(self, file, mode="r", compression=ZIP_STORED, allowZip64=False):
 19         """Open the ZIP file with mode read "r", write "w" or append "a"."""
 20         if mode not in ("r", "w", "a"):
 21             raise RuntimeError('ZipFile() requires mode "r", "w", or "a"')
 22 
 23         if compression == ZIP_STORED:
 24             pass
 25         elif compression == ZIP_DEFLATED:
 26             if not zlib:
 27                 raise RuntimeError,\
 28                       "Compression requires the (missing) zlib module"
 29         else:
 30             raise RuntimeError, "That compression method is not supported"
 31 
 32         self._allowZip64 = allowZip64
 33         self._didModify = False
 34         self.debug = 0  # Level of printing: 0 through 3
 35         self.NameToInfo = {}    # Find file info given name
 36         self.filelist = []      # List of ZipInfo instances for archive
 37         self.compression = compression  # Method of compression
 38         self.mode = key = mode.replace('b', '')[0]
 39         self.pwd = None
 40         self._comment = ''
 41 
 42         # Check if we were passed a file-like object
 43         if isinstance(file, basestring):
 44             self._filePassed = 0
 45             self.filename = file
 46             modeDict = {'r' : 'rb', 'w': 'wb', 'a' : 'r+b'}
 47             try:
 48                 self.fp = open(file, modeDict[mode])
 49             except IOError:
 50                 if mode == 'a':
 51                     mode = key = 'w'
 52                     self.fp = open(file, modeDict[mode])
 53                 else:
 54                     raise
 55         else:
 56             self._filePassed = 1
 57             self.fp = file
 58             self.filename = getattr(file, 'name', None)
 59 
 60         try:
 61             if key == 'r':
 62                 self._RealGetContents()
 63             elif key == 'w':
 64                 # set the modified flag so central directory gets written
 65                 # even if no files are added to the archive
 66                 self._didModify = True
 67             elif key == 'a':
 68                 try:
 69                     # See if file is a zip file
 70                     self._RealGetContents()
 71                     # seek to start of directory and overwrite
 72                     self.fp.seek(self.start_dir, 0)
 73                 except BadZipfile:
 74                     # file is not a zip file, just append
 75                     self.fp.seek(0, 2)
 76 
 77                     # set the modified flag so central directory gets written
 78                     # even if no files are added to the archive
 79                     self._didModify = True
 80             else:
 81                 raise RuntimeError('Mode must be "r", "w" or "a"')
 82         except:
 83             fp = self.fp
 84             self.fp = None
 85             if not self._filePassed:
 86                 fp.close()
 87             raise
 88 
 89     def __enter__(self):
 90         return self
 91 
 92     def __exit__(self, type, value, traceback):
 93         self.close()
 94 
 95     def _RealGetContents(self):
 96         """Read in the table of contents for the ZIP file."""
 97         fp = self.fp
 98         try:
 99             endrec = _EndRecData(fp)
100         except IOError:
101             raise BadZipfile("File is not a zip file")
102         if not endrec:
103             raise BadZipfile, "File is not a zip file"
104         if self.debug > 1:
105             print endrec
106         size_cd = endrec[_ECD_SIZE]             # bytes in central directory
107         offset_cd = endrec[_ECD_OFFSET]         # offset of central directory
108         self._comment = endrec[_ECD_COMMENT]    # archive comment
109 
110         # "concat" is zero, unless zip was concatenated to another file
111         concat = endrec[_ECD_LOCATION] - size_cd - offset_cd
112         if endrec[_ECD_SIGNATURE] == stringEndArchive64:
113             # If Zip64 extension structures are present, account for them
114             concat -= (sizeEndCentDir64 + sizeEndCentDir64Locator)
115 
116         if self.debug > 2:
117             inferred = concat + offset_cd
118             print "given, inferred, offset", offset_cd, inferred, concat
119         # self.start_dir:  Position of start of central directory
120         self.start_dir = offset_cd + concat
121         fp.seek(self.start_dir, 0)
122         data = fp.read(size_cd)
123         fp = cStringIO.StringIO(data)
124         total = 0
125         while total < size_cd:
126             centdir = fp.read(sizeCentralDir)
127             if len(centdir) != sizeCentralDir:
128                 raise BadZipfile("Truncated central directory")
129             centdir = struct.unpack(structCentralDir, centdir)
130             if centdir[_CD_SIGNATURE] != stringCentralDir:
131                 raise BadZipfile("Bad magic number for central directory")
132             if self.debug > 2:
133                 print centdir
134             filename = fp.read(centdir[_CD_FILENAME_LENGTH])
135             # Create ZipInfo instance to store file information
136             x = ZipInfo(filename)
137             x.extra = fp.read(centdir[_CD_EXTRA_FIELD_LENGTH])
138             x.comment = fp.read(centdir[_CD_COMMENT_LENGTH])
139             x.header_offset = centdir[_CD_LOCAL_HEADER_OFFSET]
140             (x.create_version, x.create_system, x.extract_version, x.reserved,
141                 x.flag_bits, x.compress_type, t, d,
142                 x.CRC, x.compress_size, x.file_size) = centdir[1:12]
143             x.volume, x.internal_attr, x.external_attr = centdir[15:18]
144             # Convert date/time code to (year, month, day, hour, min, sec)
145             x._raw_time = t
146             x.date_time = ( (d>>9)+1980, (d>>5)&0xF, d&0x1F,
147                                      t>>11, (t>>5)&0x3F, (t&0x1F) * 2 )
148 
149             x._decodeExtra()
150             x.header_offset = x.header_offset + concat
151             x.filename = x._decodeFilename()
152             self.filelist.append(x)
153             self.NameToInfo[x.filename] = x
154 
155             # update total bytes read from central directory
156             total = (total + sizeCentralDir + centdir[_CD_FILENAME_LENGTH]
157                      + centdir[_CD_EXTRA_FIELD_LENGTH]
158                      + centdir[_CD_COMMENT_LENGTH])
159 
160             if self.debug > 2:
161                 print "total", total
162 
163 
164     def namelist(self):
165         """Return a list of file names in the archive."""
166         l = []
167         for data in self.filelist:
168             l.append(data.filename)
169         return l
170 
171     def infolist(self):
172         """Return a list of class ZipInfo instances for files in the
173         archive."""
174         return self.filelist
175 
176     def printdir(self):
177         """Print a table of contents for the zip file."""
178         print "%-46s %19s %12s" % ("File Name", "Modified    ", "Size")
179         for zinfo in self.filelist:
180             date = "%d-%02d-%02d %02d:%02d:%02d" % zinfo.date_time[:6]
181             print "%-46s %s %12d" % (zinfo.filename, date, zinfo.file_size)
182 
183     def testzip(self):
184         """Read all the files and check the CRC."""
185         chunk_size = 2 ** 20
186         for zinfo in self.filelist:
187             try:
188                 # Read by chunks, to avoid an OverflowError or a
189                 # MemoryError with very large embedded files.
190                 with self.open(zinfo.filename, "r") as f:
191                     while f.read(chunk_size):     # Check CRC-32
192                         pass
193             except BadZipfile:
194                 return zinfo.filename
195 
196     def getinfo(self, name):
197         """Return the instance of ZipInfo given 'name'."""
198         info = self.NameToInfo.get(name)
199         if info is None:
200             raise KeyError(
201                 'There is no item named %r in the archive' % name)
202 
203         return info
204 
205     def setpassword(self, pwd):
206         """Set default password for encrypted files."""
207         self.pwd = pwd
208 
209     @property
210     def comment(self):
211         """The comment text associated with the ZIP file."""
212         return self._comment
213 
214     @comment.setter
215     def comment(self, comment):
216         # check for valid comment length
217         if len(comment) > ZIP_MAX_COMMENT:
218             import warnings
219             warnings.warn('Archive comment is too long; truncating to %d bytes'
220                           % ZIP_MAX_COMMENT, stacklevel=2)
221             comment = comment[:ZIP_MAX_COMMENT]
222         self._comment = comment
223         self._didModify = True
224 
225     def read(self, name, pwd=None):
226         """Return file bytes (as a string) for name."""
227         return self.open(name, "r", pwd).read()
228 
229     def open(self, name, mode="r", pwd=None):
230         """Return file-like object for 'name'."""
231         if mode not in ("r", "U", "rU"):
232             raise RuntimeError, 'open() requires mode "r", "U", or "rU"'
233         if not self.fp:
234             raise RuntimeError, \
235                   "Attempt to read ZIP archive that was already closed"
236 
237         # Only open a new file for instances where we were not
238         # given a file object in the constructor
239         if self._filePassed:
240             zef_file = self.fp
241             should_close = False
242         else:
243             zef_file = open(self.filename, 'rb')
244             should_close = True
245 
246         try:
247             # Make sure we have an info object
248             if isinstance(name, ZipInfo):
                # 'name' is already an info object
                zinfo = name
            else:
                # Get info object for name
                zinfo = self.getinfo(name)

            zef_file.seek(zinfo.header_offset, 0)

            # Skip the file header:
            fheader = zef_file.read(sizeFileHeader)
            if len(fheader) != sizeFileHeader:
                raise BadZipfile("Truncated file header")
            fheader = struct.unpack(structFileHeader, fheader)
            if fheader[_FH_SIGNATURE] != stringFileHeader:
                raise BadZipfile("Bad magic number for file header")

            fname = zef_file.read(fheader[_FH_FILENAME_LENGTH])
            if fheader[_FH_EXTRA_FIELD_LENGTH]:
                zef_file.read(fheader[_FH_EXTRA_FIELD_LENGTH])

            if fname != zinfo.orig_filename:
                raise BadZipfile, \
                        'File name in directory "%s" and header "%s" differ.' % (
                            zinfo.orig_filename, fname)

            # check for encrypted flag & handle password
            is_encrypted = zinfo.flag_bits & 0x1
            zd = None
            if is_encrypted:
                if not pwd:
                    pwd = self.pwd
                if not pwd:
                    raise RuntimeError, "File %s is encrypted, " \
                        "password required for extraction" % name

                zd = _ZipDecrypter(pwd)
                # The first 12 bytes in the cypher stream is an encryption header
                #  used to strengthen the algorithm. The first 11 bytes are
                #  completely random, while the 12th contains the MSB of the CRC,
                #  or the MSB of the file time depending on the header type
                #  and is used to check the correctness of the password.
                bytes = zef_file.read(12)
                h = map(zd, bytes[0:12])
                if zinfo.flag_bits & 0x8:
                    # compare against the file type from extended local headers
                    check_byte = (zinfo._raw_time >> 8) & 0xff
                else:
                    # compare against the CRC otherwise
                    check_byte = (zinfo.CRC >> 24) & 0xff
                if ord(h[11]) != check_byte:
                    raise RuntimeError("Bad password for file", name)

            return ZipExtFile(zef_file, mode, zinfo, zd,
                    close_fileobj=should_close)
        except:
            if should_close:
                zef_file.close()
            raise

    def extract(self, member, path=None, pwd=None):
        """Extract a member from the archive to the current working directory,
           using its full name. Its file information is extracted as accurately
           as possible. `member' may be a filename or a ZipInfo object. You can
           specify a different directory using `path'.
        """
        if not isinstance(member, ZipInfo):
            member = self.getinfo(member)

        if path is None:
            path = os.getcwd()

        return self._extract_member(member, path, pwd)

    def extractall(self, path=None, members=None, pwd=None):
        """Extract all members from the archive to the current working
           directory. `path' specifies a different directory to extract to.
           `members' is optional and must be a subset of the list returned
           by namelist().
        """
        if members is None:
            members = self.namelist()

        for zipinfo in members:
            self.extract(zipinfo, path, pwd)

    def _extract_member(self, member, targetpath, pwd):
        """Extract the ZipInfo object 'member' to a physical
           file on the path targetpath.
        """
        # build the destination pathname, replacing
        # forward slashes to platform specific separators.
        arcname = member.filename.replace('/', os.path.sep)

        if os.path.altsep:
            arcname = arcname.replace(os.path.altsep, os.path.sep)
        # interpret absolute pathname as relative, remove drive letter or
        # UNC path, redundant separators, "." and ".." components.
        arcname = os.path.splitdrive(arcname)[1]
        arcname = os.path.sep.join(x for x in arcname.split(os.path.sep)
                    if x not in ('', os.path.curdir, os.path.pardir))
        if os.path.sep == '\\':
            # filter illegal characters on Windows
            illegal = ':<>|"?*'
            if isinstance(arcname, unicode):
                table = {ord(c): ord('_') for c in illegal}
            else:
                table = string.maketrans(illegal, '_' * len(illegal))
            arcname = arcname.translate(table)
            # remove trailing dots
            arcname = (x.rstrip('.') for x in arcname.split(os.path.sep))
            arcname = os.path.sep.join(x for x in arcname if x)

        targetpath = os.path.join(targetpath, arcname)
        targetpath = os.path.normpath(targetpath)

        # Create all upper directories if necessary.
        upperdirs = os.path.dirname(targetpath)
        if upperdirs and not os.path.exists(upperdirs):
            os.makedirs(upperdirs)

        if member.filename[-1] == '/':
            if not os.path.isdir(targetpath):
                os.mkdir(targetpath)
            return targetpath

        with self.open(member, pwd=pwd) as source, \
             file(targetpath, "wb") as target:
            shutil.copyfileobj(source, target)

        return targetpath

    def _writecheck(self, zinfo):
        """Check for errors before writing a file to the archive."""
        if zinfo.filename in self.NameToInfo:
            import warnings
            warnings.warn('Duplicate name: %r' % zinfo.filename, stacklevel=3)
        if self.mode not in ("w", "a"):
            raise RuntimeError, 'write() requires mode "w" or "a"'
        if not self.fp:
            raise RuntimeError, \
                  "Attempt to write ZIP archive that was already closed"
        if zinfo.compress_type == ZIP_DEFLATED and not zlib:
            raise RuntimeError, \
                  "Compression requires the (missing) zlib module"
        if zinfo.compress_type not in (ZIP_STORED, ZIP_DEFLATED):
            raise RuntimeError, \
                  "That compression method is not supported"
        if not self._allowZip64:
            requires_zip64 = None
            if len(self.filelist) >= ZIP_FILECOUNT_LIMIT:
                requires_zip64 = "Files count"
            elif zinfo.file_size > ZIP64_LIMIT:
                requires_zip64 = "Filesize"
            elif zinfo.header_offset > ZIP64_LIMIT:
                requires_zip64 = "Zipfile size"
            if requires_zip64:
                raise LargeZipFile(requires_zip64 +
                                   " would require ZIP64 extensions")

    def write(self, filename, arcname=None, compress_type=None):
        """Put the bytes from filename into the archive under the name
        arcname."""
        if not self.fp:
            raise RuntimeError(
                  "Attempt to write to ZIP archive that was already closed")

        st = os.stat(filename)
        isdir = stat.S_ISDIR(st.st_mode)
        mtime = time.localtime(st.st_mtime)
        date_time = mtime[0:6]
        # Create ZipInfo instance to store file information
        if arcname is None:
            arcname = filename
        arcname = os.path.normpath(os.path.splitdrive(arcname)[1])
        while arcname[0] in (os.sep, os.altsep):
            arcname = arcname[1:]
        if isdir:
            arcname += '/'
        zinfo = ZipInfo(arcname, date_time)
        zinfo.external_attr = (st[0] & 0xFFFF) << 16L      # Unix attributes
        if compress_type is None:
            zinfo.compress_type = self.compression
        else:
            zinfo.compress_type = compress_type

        zinfo.file_size = st.st_size
        zinfo.flag_bits = 0x00
        zinfo.header_offset = self.fp.tell()    # Start of header bytes

        self._writecheck(zinfo)
        self._didModify = True

        if isdir:
            zinfo.file_size = 0
            zinfo.compress_size = 0
            zinfo.CRC = 0
            zinfo.external_attr |= 0x10  # MS-DOS directory flag
            self.filelist.append(zinfo)
            self.NameToInfo[zinfo.filename] = zinfo
            self.fp.write(zinfo.FileHeader(False))
            return

        with open(filename, "rb") as fp:
            # Must overwrite CRC and sizes with correct data later
            zinfo.CRC = CRC = 0
            zinfo.compress_size = compress_size = 0
            # Compressed size can be larger than uncompressed size
            zip64 = self._allowZip64 and \
                    zinfo.file_size * 1.05 > ZIP64_LIMIT
            self.fp.write(zinfo.FileHeader(zip64))
            if zinfo.compress_type == ZIP_DEFLATED:
                cmpr = zlib.compressobj(zlib.Z_DEFAULT_COMPRESSION,
                     zlib.DEFLATED, -15)
            else:
                cmpr = None
            file_size = 0
            while 1:
                buf = fp.read(1024 * 8)
                if not buf:
                    break
                file_size = file_size + len(buf)
                CRC = crc32(buf, CRC) & 0xffffffff
                if cmpr:
                    buf = cmpr.compress(buf)
                    compress_size = compress_size + len(buf)
                self.fp.write(buf)
        if cmpr:
            buf = cmpr.flush()
            compress_size = compress_size + len(buf)
            self.fp.write(buf)
            zinfo.compress_size = compress_size
        else:
            zinfo.compress_size = file_size
        zinfo.CRC = CRC
        zinfo.file_size = file_size
        if not zip64 and self._allowZip64:
            if file_size > ZIP64_LIMIT:
                raise RuntimeError('File size has increased during compressing')
            if compress_size > ZIP64_LIMIT:
                raise RuntimeError('Compressed size larger than uncompressed size')
        # Seek backwards and write file header (which will now include
        # correct CRC and file sizes)
        position = self.fp.tell()       # Preserve current position in file
        self.fp.seek(zinfo.header_offset, 0)
        self.fp.write(zinfo.FileHeader(zip64))
        self.fp.seek(position, 0)
        self.filelist.append(zinfo)
        self.NameToInfo[zinfo.filename] = zinfo

    def writestr(self, zinfo_or_arcname, bytes, compress_type=None):
        """Write a file into the archive.  The contents is the string
        'bytes'.  'zinfo_or_arcname' is either a ZipInfo instance or
        the name of the file in the archive."""
        if not isinstance(zinfo_or_arcname, ZipInfo):
            zinfo = ZipInfo(filename=zinfo_or_arcname,
                            date_time=time.localtime(time.time())[:6])

            zinfo.compress_type = self.compression
            if zinfo.filename[-1] == '/':
                zinfo.external_attr = 0o40775 << 16   # drwxrwxr-x
                zinfo.external_attr |= 0x10           # MS-DOS directory flag
            else:
                zinfo.external_attr = 0o600 << 16     # ?rw-------
        else:
            zinfo = zinfo_or_arcname

        if not self.fp:
            raise RuntimeError(
                  "Attempt to write to ZIP archive that was already closed")

        if compress_type is not None:
            zinfo.compress_type = compress_type

        zinfo.file_size = len(bytes)            # Uncompressed size
        zinfo.header_offset = self.fp.tell()    # Start of header bytes
        self._writecheck(zinfo)
        self._didModify = True
        zinfo.CRC = crc32(bytes) & 0xffffffff       # CRC-32 checksum
        if zinfo.compress_type == ZIP_DEFLATED:
            co = zlib.compressobj(zlib.Z_DEFAULT_COMPRESSION,
                 zlib.DEFLATED, -15)
            bytes = co.compress(bytes) + co.flush()
            zinfo.compress_size = len(bytes)    # Compressed size
        else:
            zinfo.compress_size = zinfo.file_size
        zip64 = zinfo.file_size > ZIP64_LIMIT or \
                zinfo.compress_size > ZIP64_LIMIT
        if zip64 and not self._allowZip64:
            raise LargeZipFile("Filesize would require ZIP64 extensions")
        self.fp.write(zinfo.FileHeader(zip64))
        self.fp.write(bytes)
        if zinfo.flag_bits & 0x08:
            # Write CRC and file sizes after the file data
            fmt = '<LQQ' if zip64 else '<LLL'
            self.fp.write(struct.pack(fmt, zinfo.CRC, zinfo.compress_size,
                  zinfo.file_size))
        self.fp.flush()
        self.filelist.append(zinfo)
        self.NameToInfo[zinfo.filename] = zinfo

    def __del__(self):
        """Call the "close()" method in case the user forgot."""
        self.close()

    def close(self):
        """Close the file, and for mode "w" and "a" write the ending
        records."""
        if self.fp is None:
            return

        try:
            if self.mode in ("w", "a") and self._didModify: # write ending records
                pos1 = self.fp.tell()
                for zinfo in self.filelist:         # write central directory
                    dt = zinfo.date_time
                    dosdate = (dt[0] - 1980) << 9 | dt[1] << 5 | dt[2]
                    dostime = dt[3] << 11 | dt[4] << 5 | (dt[5] // 2)
                    extra = []
                    if zinfo.file_size > ZIP64_LIMIT \
                            or zinfo.compress_size > ZIP64_LIMIT:
                        extra.append(zinfo.file_size)
                        extra.append(zinfo.compress_size)
                        file_size = 0xffffffff
                        compress_size = 0xffffffff
                    else:
                        file_size = zinfo.file_size
                        compress_size = zinfo.compress_size

                    if zinfo.header_offset > ZIP64_LIMIT:
                        extra.append(zinfo.header_offset)
                        header_offset = 0xffffffffL
                    else:
                        header_offset = zinfo.header_offset

                    extra_data = zinfo.extra
                    if extra:
                        # Append a ZIP64 field to the extra's
                        extra_data = struct.pack(
                                '<HH' + 'Q'*len(extra),
                                1, 8*len(extra), *extra) + extra_data

                        extract_version = max(45, zinfo.extract_version)
                        create_version = max(45, zinfo.create_version)
                    else:
                        extract_version = zinfo.extract_version
                        create_version = zinfo.create_version

                    try:
                        filename, flag_bits = zinfo._encodeFilenameFlags()
                        centdir = struct.pack(structCentralDir,
                        stringCentralDir, create_version,
                        zinfo.create_system, extract_version, zinfo.reserved,
                        flag_bits, zinfo.compress_type, dostime, dosdate,
                        zinfo.CRC, compress_size, file_size,
                        len(filename), len(extra_data), len(zinfo.comment),
                        0, zinfo.internal_attr, zinfo.external_attr,
                        header_offset)
                    except DeprecationWarning:
                        print >>sys.stderr, (structCentralDir,
                        stringCentralDir, create_version,
                        zinfo.create_system, extract_version, zinfo.reserved,
                        zinfo.flag_bits, zinfo.compress_type, dostime, dosdate,
                        zinfo.CRC, compress_size, file_size,
                        len(zinfo.filename), len(extra_data), len(zinfo.comment),
                        0, zinfo.internal_attr, zinfo.external_attr,
                        header_offset)
                        raise
                    self.fp.write(centdir)
                    self.fp.write(filename)
                    self.fp.write(extra_data)
                    self.fp.write(zinfo.comment)

                pos2 = self.fp.tell()
                # Write end-of-zip-archive record
                centDirCount = len(self.filelist)
                centDirSize = pos2 - pos1
                centDirOffset = pos1
                requires_zip64 = None
                if centDirCount > ZIP_FILECOUNT_LIMIT:
                    requires_zip64 = "Files count"
                elif centDirOffset > ZIP64_LIMIT:
                    requires_zip64 = "Central directory offset"
                elif centDirSize > ZIP64_LIMIT:
                    requires_zip64 = "Central directory size"
                if requires_zip64:
                    # Need to write the ZIP64 end-of-archive records
                    if not self._allowZip64:
                        raise LargeZipFile(requires_zip64 +
                                           " would require ZIP64 extensions")
                    zip64endrec = struct.pack(
                            structEndArchive64, stringEndArchive64,
                            44, 45, 45, 0, 0, centDirCount, centDirCount,
                            centDirSize, centDirOffset)
                    self.fp.write(zip64endrec)

                    zip64locrec = struct.pack(
                            structEndArchive64Locator,
                            stringEndArchive64Locator, 0, pos2, 1)
                    self.fp.write(zip64locrec)
                    centDirCount = min(centDirCount, 0xFFFF)
                    centDirSize = min(centDirSize, 0xFFFFFFFF)
                    centDirOffset = min(centDirOffset, 0xFFFFFFFF)

                endrec = struct.pack(structEndArchive, stringEndArchive,
                                    0, 0, centDirCount, centDirCount,
                                    centDirSize, centDirOffset, len(self._comment))
                self.fp.write(endrec)
                self.fp.write(self._comment)
                self.fp.flush()
        finally:
            fp = self.fp
            self.fp = None
            if not self._filePassed:
                fp.close()
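
The listing above is the Python 2 source of `zipfile.ZipFile`. As a rough illustration of how its public methods (`writestr`, `namelist`, `testzip`, `extractall`, `close`) fit together, here is a minimal sketch. It assumes Python 3, whereas the listing is Python 2. The archive name `demo.zip`, the member names and the helper `zip_round_trip` are made up for this example.

```python
import os
import tempfile
import zipfile


def zip_round_trip(workdir):
    """Write two members, verify them, then extract them under workdir."""
    archive = os.path.join(workdir, "demo.zip")  # hypothetical file name

    # Mode "w" creates the archive. Leaving the with-block calls close(),
    # which writes the central directory and the end-of-archive record.
    with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as z:
        z.writestr("a.txt", "hello")
        z.writestr("sub/b.txt", "world")

    out_dir = os.path.join(workdir, "out")
    with zipfile.ZipFile(archive, "r") as z:
        names = z.namelist()        # member names in archive order
        bad = z.testzip()           # None when every CRC checks out
        z.extractall(path=out_dir)  # creates sub/ as needed

    with open(os.path.join(out_dir, "sub", "b.txt")) as f:
        content = f.read()
    return names, bad, content


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        print(zip_round_trip(d))
```

Using the archive as a context manager relies on the `__enter__`/`__exit__` pair of the class, so `close()` runs even if an exception is raised while writing.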

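In the `ZipFile` source, `close()` packs each member's `date_time` tuple into two 16-bit MS-DOS words (`dosdate`, `dostime`), and `_RealGetContents()` reverses the packing when reading the central directory. The sketch below isolates that bit arithmetic. The function names `pack_dos_datetime` and `unpack_dos_datetime` are invented here and are not part of the `zipfile` module.

```python
def pack_dos_datetime(dt):
    """Pack (year, month, day, hour, minute, second) into DOS date/time words."""
    year, month, day, hour, minute, second = dt
    # Date word: 7 bits of year offset from 1980, 4 bits of month, 5 bits of day.
    dosdate = (year - 1980) << 9 | month << 5 | day
    # Time word: 5 bits of hour, 6 bits of minute, 5 bits of second // 2.
    dostime = hour << 11 | minute << 5 | (second // 2)
    return dosdate, dostime


def unpack_dos_datetime(dosdate, dostime):
    """Reverse pack_dos_datetime; seconds come back rounded down to even."""
    return (
        (dosdate >> 9) + 1980, (dosdate >> 5) & 0xF, dosdate & 0x1F,
        dostime >> 11, (dostime >> 5) & 0x3F, (dostime & 0x1F) * 2,
    )


if __name__ == "__main__":
    words = pack_dos_datetime((2018, 3, 26, 11, 48, 15))
    print(unpack_dos_datetime(*words))
```

Because seconds are stored divided by two, timestamps inside a ZIP archive have two-second resolution, and years before 1980 cannot be represented.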
ZipFile

class ZipFile(object):
    """ Class with methods to open, read, write, close, list zip files.

    z = ZipFile(file, mode="r", compression=ZIP_STORED, allowZip64=False)

    file: Either the path to the file, or a file-like object.
          If it is a path, the file will be opened and closed by ZipFile.
    mode: The mode can be either read "r", write "w" or append "a".
    compression: ZIP_STORED (no compression) or ZIP_DEFLATED (requires zlib).
    allowZip64: if True ZipFile will create files with ZIP64 extensions when
                needed, otherwise it will raise an exception when this would
                be necessary.

    """

    fp = None                   # Set here since __del__ checks it

    def __init__(self, file, mode="r", compression=ZIP_STORED, allowZip64=False):
        """Open the ZIP file with mode read "r", write "w" or append "a"."""
        if mode not in ("r", "w", "a"):
            raise RuntimeError('ZipFile() requires mode "r", "w", or "a"')

        if compression == ZIP_STORED:
            pass
        elif compression == ZIP_DEFLATED:
            if not zlib:
                raise RuntimeError,\
                      "Compression requires the (missing) zlib module"
        else:
            raise RuntimeError, "That compression method is not supported"

        self._allowZip64 = allowZip64
        self._didModify = False
        self.debug = 0  # Level of printing: 0 through 3
        self.NameToInfo = {}    # Find file info given name
        self.filelist = []      # List of ZipInfo instances for archive
        self.compression = compression  # Method of compression
        self.mode = key = mode.replace('b', '')[0]
        self.pwd = None
        self._comment = ''

        # Check if we were passed a file-like object
        if isinstance(file, basestring):
            self._filePassed = 0
            self.filename = file
            modeDict = {'r' : 'rb', 'w': 'wb', 'a' : 'r+b'}
            try:
                self.fp = open(file, modeDict[mode])
            except IOError:
                if mode == 'a':
                    mode = key = 'w'
                    self.fp = open(file, modeDict[mode])
                else:
                    raise
        else:
            self._filePassed = 1
            self.fp = file
            self.filename = getattr(file, 'name', None)

        try:
            if key == 'r':
                self._RealGetContents()
            elif key == 'w':
                # set the modified flag so central directory gets written
                # even if no files are added to the archive
                self._didModify = True
            elif key == 'a':
                try:
                    # See if file is a zip file
                    self._RealGetContents()
                    # seek to start of directory and overwrite
                    self.fp.seek(self.start_dir, 0)
                except BadZipfile:
                    # file is not a zip file, just append
                    self.fp.seek(0, 2)

                    # set the modified flag so central directory gets written
                    # even if no files are added to the archive
                    self._didModify = True
            else:
                raise RuntimeError('Mode must be "r", "w" or "a"')
        except:
            fp = self.fp
            self.fp = None
            if not self._filePassed:
                fp.close()
            raise

    def __enter__(self):
        return self

    def __exit__(self, type, value, traceback):
        self.close()

    def _RealGetContents(self):
        """Read in the table of contents for the ZIP file."""
        fp = self.fp
        try:
            endrec = _EndRecData(fp)
        except IOError:
            raise BadZipfile("File is not a zip file")
        if not endrec:
            raise BadZipfile, "File is not a zip file"
        if self.debug > 1:
            print endrec
        size_cd = endrec[_ECD_SIZE]             # bytes in central directory
        offset_cd = endrec[_ECD_OFFSET]         # offset of central directory
        self._comment = endrec[_ECD_COMMENT]    # archive comment

        # "concat" is zero, unless zip was concatenated to another file
        concat = endrec[_ECD_LOCATION] - size_cd - offset_cd
        if endrec[_ECD_SIGNATURE] == stringEndArchive64:
            # If Zip64 extension structures are present, account for them
            concat -= (sizeEndCentDir64 + sizeEndCentDir64Locator)

        if self.debug > 2:
            inferred = concat + offset_cd
            print "given, inferred, offset", offset_cd, inferred, concat
        # self.start_dir:  Position of start of central directory
        self.start_dir = offset_cd + concat
        fp.seek(self.start_dir, 0)
        data = fp.read(size_cd)
        fp = cStringIO.StringIO(data)
        total = 0
        while total < size_cd:
            centdir = fp.read(sizeCentralDir)
            if len(centdir) != sizeCentralDir:
                raise BadZipfile("Truncated central directory")
            centdir = struct.unpack(structCentralDir, centdir)
            if centdir[_CD_SIGNATURE] != stringCentralDir:
                raise BadZipfile("Bad magic number for central directory")
            if self.debug > 2:
                print centdir
            filename = fp.read(centdir[_CD_FILENAME_LENGTH])
            # Create ZipInfo instance to store file information
            x = ZipInfo(filename)
            x.extra = fp.read(centdir[_CD_EXTRA_FIELD_LENGTH])
            x.comment = fp.read(centdir[_CD_COMMENT_LENGTH])
            x.header_offset = centdir[_CD_LOCAL_HEADER_OFFSET]
            (x.create_version, x.create_system, x.extract_version, x.reserved,
                x.flag_bits, x.compress_type, t, d,
                x.CRC, x.compress_size, x.file_size) = centdir[1:12]
            x.volume, x.internal_attr, x.external_attr = centdir[15:18]
            # Convert date/time code to (year, month, day, hour, min, sec)
            x._raw_time = t
            x.date_time = ( (d>>9)+1980, (d>>5)&0xF, d&0x1F,
                                     t>>11, (t>>5)&0x3F, (t&0x1F) * 2 )

            x._decodeExtra()
            x.header_offset = x.header_offset + concat
            x.filename = x._decodeFilename()
            self.filelist.append(x)
            self.NameToInfo[x.filename] = x

            # update total bytes read from central directory
            total = (total + sizeCentralDir + centdir[_CD_FILENAME_LENGTH]
                     + centdir[_CD_EXTRA_FIELD_LENGTH]
                     + centdir[_CD_COMMENT_LENGTH])

            if self.debug > 2:
                print "total", total


    def namelist(self):
        """Return a list of file names in the archive."""
        l = []
        for data in self.filelist:
            l.append(data.filename)
        return l

    def infolist(self):
        """Return a list of class ZipInfo instances for files in the
        archive."""
        return self.filelist

    def printdir(self):
        """Print a table of contents for the zip file."""
        print "%-46s %19s %12s" % ("File Name", "Modified    ", "Size")
        for zinfo in self.filelist:
            date = "%d-%02d-%02d %02d:%02d:%02d" % zinfo.date_time[:6]
            print "%-46s %s %12d" % (zinfo.filename, date, zinfo.file_size)

    def testzip(self):
        """Read all the files and check the CRC."""
        chunk_size = 2 ** 20
        for zinfo in self.filelist:
            try:
                # Read by chunks, to avoid an OverflowError or a
                # MemoryError with very large embedded files.
                with self.open(zinfo.filename, "r") as f:
                    while f.read(chunk_size):     # Check CRC-32
                        pass
            except BadZipfile:
                return zinfo.filename

    def getinfo(self, name):
        """Return the instance of ZipInfo given 'name'."""
        info = self.NameToInfo.get(name)
        if info is None:
            raise KeyError(
                'There is no item named %r in the archive' % name)

        return info

    def setpassword(self, pwd):
        """Set default password for encrypted files."""
        self.pwd = pwd

    @property
    def comment(self):
        """The comment text associated with the ZIP file."""
        return self._comment

    @comment.setter
    def comment(self, comment):
        # check for valid comment length
        if len(comment) > ZIP_MAX_COMMENT:
            import warnings
            warnings.warn('Archive comment is too long; truncating to %d bytes'
                          % ZIP_MAX_COMMENT, stacklevel=2)
            comment = comment[:ZIP_MAX_COMMENT]
        self._comment = comment
        self._didModify = True

    def read(self, name, pwd=None):
        """Return file bytes (as a string) for name."""
        return self.open(name, "r", pwd).read()

    def open(self, name, mode="r", pwd=None):
        """Return file-like object for 'name'."""
        if mode not in ("r", "U", "rU"):
            raise RuntimeError, 'open() requires mode "r", "U", or "rU"'
        if not self.fp:
            raise RuntimeError, \
                  "Attempt to read ZIP archive that was already closed"

        # Only open a new file for instances where we were not
        # given a file object in the constructor
        if self._filePassed:
            zef_file = self.fp
            should_close = False
        else:
            zef_file = open(self.filename, 'rb')
            should_close = True

        try:
            # Make sure we have an info object
            if isinstance(name, ZipInfo):
                # 'name' is already an info object
                zinfo = name
            else:
                # Get info object for name
                zinfo = self.getinfo(name)

            zef_file.seek(zinfo.header_offset, 0)

            # Skip the file header:
            fheader = zef_file.read(sizeFileHeader)
            if len(fheader) != sizeFileHeader:
                raise BadZipfile("Truncated file header")
            fheader = struct.unpack(structFileHeader, fheader)
            if fheader[_FH_SIGNATURE] != stringFileHeader:
                raise BadZipfile("Bad magic number for file header")

            fname = zef_file.read(fheader[_FH_FILENAME_LENGTH])
            if fheader[_FH_EXTRA_FIELD_LENGTH]:
                zef_file.read(fheader[_FH_EXTRA_FIELD_LENGTH])

            if fname != zinfo.orig_filename:
                raise BadZipfile, \
                        'File name in directory "%s" and header "%s" differ.' % (
                            zinfo.orig_filename, fname)

            # check for encrypted flag & handle password
            is_encrypted = zinfo.flag_bits & 0x1
            zd = None
            if is_encrypted:
                if not pwd:
                    pwd = self.pwd
                if not pwd:
                    raise RuntimeError, "File %s is encrypted, " \
                        "password required for extraction" % name

                zd = _ZipDecrypter(pwd)
                # The first 12 bytes in the cypher stream is an encryption header
                #  used to strengthen the algorithm. The first 11 bytes are
                #  completely random, while the 12th contains the MSB of the CRC,
                #  or the MSB of the file time depending on the header type
                #  and is used to check the correctness of the password.
                bytes = zef_file.read(12)
                h = map(zd, bytes[0:12])
                if zinfo.flag_bits & 0x8:
                    # compare against the file type from extended local headers
                    check_byte = (zinfo._raw_time >> 8) & 0xff
                else:
                    # compare against the CRC otherwise
                    check_byte = (zinfo.CRC >> 24) & 0xff
                if ord(h[11]) != check_byte:
                    raise RuntimeError("Bad password for file", name)

            return ZipExtFile(zef_file, mode, zinfo, zd,
                    close_fileobj=should_close)
        except:
            if should_close:
                zef_file.close()
            raise

    def extract(self, member, path=None, pwd=None):
        """Extract a member from the archive to the current working directory,
           using its full name. Its file information is extracted as accurately
           as possible. `member' may be a filename or a ZipInfo object. You can
           specify a different directory using `path'.
        """
        if not isinstance(member, ZipInfo):
            member = self.getinfo(member)

        if path is None:
            path = os.getcwd()

        return self._extract_member(member, path, pwd)

    def extractall(self, path=None, members=None, pwd=None):
        """Extract all members from the archive to the current working
           directory. `path' specifies a different directory to extract to.
           `members' is optional and must be a subset of the list returned
           by namelist().
        """
        if members is None:
            members = self.namelist()

        for zipinfo in members:
            self.extract(zipinfo, path, pwd)

    def _extract_member(self, member, targetpath, pwd):
        """Extract the ZipInfo object 'member' to a physical
           file on the path targetpath.
        """
        # build the destination pathname, replacing
        # forward slashes to platform specific separators.
        arcname = member.filename.replace('/', os.path.sep)

        if os.path.altsep:
            arcname = arcname.replace(os.path.altsep, os.path.sep)
        # interpret absolute pathname as relative, remove drive letter or
        # UNC path, redundant separators, "." and ".." components.
        arcname = os.path.splitdrive(arcname)[1]
        arcname = os.path.sep.join(x for x in arcname.split(os.path.sep)
                    if x not in ('', os.path.curdir, os.path.pardir))
        if os.path.sep == '\\':
            # filter illegal characters on Windows
            illegal = ':<>|"?*'
            if isinstance(arcname, unicode):
                table = {ord(c): ord('_') for c in illegal}
            else:
                table = string.maketrans(illegal, '_' * len(illegal))
            arcname = arcname.translate(table)
            # remove trailing dots
            arcname = (x.rstrip('.') for x in arcname.split(os.path.sep))
            arcname = os.path.sep.join(x for x in arcname if x)

        targetpath = os.path.join(targetpath, arcname)
        targetpath = os.path.normpath(targetpath)

        # Create all upper directories if necessary.
        upperdirs = os.path.dirname(targetpath)
366         if upperdirs and not os.path.exists(upperdirs):
367             os.makedirs(upperdirs)
368 
369         if member.filename[-1] == '/':
370             if not os.path.isdir(targetpath):
371                 os.mkdir(targetpath)
372             return targetpath
373 
374         with self.open(member, pwd=pwd) as source, \
375              file(targetpath, "wb") as target:
376             shutil.copyfileobj(source, target)
377 
378         return targetpath
379 
380     def _writecheck(self, zinfo):
381         """Check for errors before writing a file to the archive."""
382         if zinfo.filename in self.NameToInfo:
383             import warnings
384             warnings.warn('Duplicate name: %r' % zinfo.filename, stacklevel=3)
385         if self.mode not in ("w", "a"):
386             raise RuntimeError, 'write() requires mode "w" or "a"'
387         if not self.fp:
388             raise RuntimeError, \
389                   "Attempt to write ZIP archive that was already closed"
390         if zinfo.compress_type == ZIP_DEFLATED and not zlib:
391             raise RuntimeError, \
392                   "Compression requires the (missing) zlib module"
393         if zinfo.compress_type not in (ZIP_STORED, ZIP_DEFLATED):
394             raise RuntimeError, \
395                   "That compression method is not supported"
396         if not self._allowZip64:
397             requires_zip64 = None
398             if len(self.filelist) >= ZIP_FILECOUNT_LIMIT:
399                 requires_zip64 = "Files count"
400             elif zinfo.file_size > ZIP64_LIMIT:
401                 requires_zip64 = "Filesize"
402             elif zinfo.header_offset > ZIP64_LIMIT:
403                 requires_zip64 = "Zipfile size"
404             if requires_zip64:
405                 raise LargeZipFile(requires_zip64 +
406                                    " would require ZIP64 extensions")
407 
408     def write(self, filename, arcname=None, compress_type=None):
409         """Put the bytes from filename into the archive under the name
410         arcname."""
411         if not self.fp:
412             raise RuntimeError(
413                   "Attempt to write to ZIP archive that was already closed")
414 
415         st = os.stat(filename)
416         isdir = stat.S_ISDIR(st.st_mode)
417         mtime = time.localtime(st.st_mtime)
418         date_time = mtime[0:6]
419         # Create ZipInfo instance to store file information
420         if arcname is None:
421             arcname = filename
422         arcname = os.path.normpath(os.path.splitdrive(arcname)[1])
423         while arcname[0] in (os.sep, os.altsep):
424             arcname = arcname[1:]
425         if isdir:
426             arcname += '/'
427         zinfo = ZipInfo(arcname, date_time)
428         zinfo.external_attr = (st[0] & 0xFFFF) << 16L      # Unix attributes
429         if compress_type is None:
430             zinfo.compress_type = self.compression
431         else:
432             zinfo.compress_type = compress_type
433 
434         zinfo.file_size = st.st_size
435         zinfo.flag_bits = 0x00
436         zinfo.header_offset = self.fp.tell()    # Start of header bytes
437 
438         self._writecheck(zinfo)
439         self._didModify = True
440 
441         if isdir:
442             zinfo.file_size = 0
443             zinfo.compress_size = 0
444             zinfo.CRC = 0
445             zinfo.external_attr |= 0x10  # MS-DOS directory flag
446             self.filelist.append(zinfo)
447             self.NameToInfo[zinfo.filename] = zinfo
448             self.fp.write(zinfo.FileHeader(False))
449             return
450 
451         with open(filename, "rb") as fp:
452             # Must overwrite CRC and sizes with correct data later
453             zinfo.CRC = CRC = 0
454             zinfo.compress_size = compress_size = 0
455             # Compressed size can be larger than uncompressed size
456             zip64 = self._allowZip64 and \
457                     zinfo.file_size * 1.05 > ZIP64_LIMIT
458             self.fp.write(zinfo.FileHeader(zip64))
459             if zinfo.compress_type == ZIP_DEFLATED:
460                 cmpr = zlib.compressobj(zlib.Z_DEFAULT_COMPRESSION,
461                      zlib.DEFLATED, -15)
462             else:
463                 cmpr = None
464             file_size = 0
465             while 1:
466                 buf = fp.read(1024 * 8)
467                 if not buf:
468                     break
469                 file_size = file_size + len(buf)
470                 CRC = crc32(buf, CRC) & 0xffffffff
471                 if cmpr:
472                     buf = cmpr.compress(buf)
473                     compress_size = compress_size + len(buf)
474                 self.fp.write(buf)
475         if cmpr:
476             buf = cmpr.flush()
477             compress_size = compress_size + len(buf)
478             self.fp.write(buf)
479             zinfo.compress_size = compress_size
480         else:
481             zinfo.compress_size = file_size
482         zinfo.CRC = CRC
483         zinfo.file_size = file_size
484         if not zip64 and self._allowZip64:
485             if file_size > ZIP64_LIMIT:
486                 raise RuntimeError('File size has increased during compressing')
487             if compress_size > ZIP64_LIMIT:
488                 raise RuntimeError('Compressed size larger than uncompressed size')
489         # Seek backwards and write file header (which will now include
490         # correct CRC and file sizes)
491         position = self.fp.tell()       # Preserve current position in file
492         self.fp.seek(zinfo.header_offset, 0)
493         self.fp.write(zinfo.FileHeader(zip64))
494         self.fp.seek(position, 0)
495         self.filelist.append(zinfo)
496         self.NameToInfo[zinfo.filename] = zinfo
497 
498     def writestr(self, zinfo_or_arcname, bytes, compress_type=None):
499         """Write a file into the archive.  The contents is the string
500         'bytes'.  'zinfo_or_arcname' is either a ZipInfo instance or
501         the name of the file in the archive."""
502         if not isinstance(zinfo_or_arcname, ZipInfo):
503             zinfo = ZipInfo(filename=zinfo_or_arcname,
504                             date_time=time.localtime(time.time())[:6])
505 
506             zinfo.compress_type = self.compression
507             if zinfo.filename[-1] == '/':
508                 zinfo.external_attr = 0o40775 << 16   # drwxrwxr-x
509                 zinfo.external_attr |= 0x10           # MS-DOS directory flag
510             else:
511                 zinfo.external_attr = 0o600 << 16     # ?rw-------
512         else:
513             zinfo = zinfo_or_arcname
514 
515         if not self.fp:
516             raise RuntimeError(
517                   "Attempt to write to ZIP archive that was already closed")
518 
519         if compress_type is not None:
520             zinfo.compress_type = compress_type
521 
522         zinfo.file_size = len(bytes)            # Uncompressed size
523         zinfo.header_offset = self.fp.tell()    # Start of header bytes
524         self._writecheck(zinfo)
525         self._didModify = True
526         zinfo.CRC = crc32(bytes) & 0xffffffff       # CRC-32 checksum
527         if zinfo.compress_type == ZIP_DEFLATED:
528             co = zlib.compressobj(zlib.Z_DEFAULT_COMPRESSION,
529                  zlib.DEFLATED, -15)
530             bytes = co.compress(bytes) + co.flush()
531             zinfo.compress_size = len(bytes)    # Compressed size
532         else:
533             zinfo.compress_size = zinfo.file_size
534         zip64 = zinfo.file_size > ZIP64_LIMIT or \
535                 zinfo.compress_size > ZIP64_LIMIT
536         if zip64 and not self._allowZip64:
537             raise LargeZipFile("Filesize would require ZIP64 extensions")
538         self.fp.write(zinfo.FileHeader(zip64))
539         self.fp.write(bytes)
540         if zinfo.flag_bits & 0x08:
541             # Write CRC and file sizes after the file data
542             fmt = '<LQQ' if zip64 else '<LLL'
543             self.fp.write(struct.pack(fmt, zinfo.CRC, zinfo.compress_size,
544                   zinfo.file_size))
545         self.fp.flush()
546         self.filelist.append(zinfo)
547         self.NameToInfo[zinfo.filename] = zinfo
548 
549     def __del__(self):
550         """Call the "close()" method in case the user forgot."""
551         self.close()
552 
553     def close(self):
554         """Close the file, and for mode "w" and "a" write the ending
555         records."""
556         if self.fp is None:
557             return
558 
559         try:
560             if self.mode in ("w", "a") and self._didModify: # write ending records
561                 pos1 = self.fp.tell()
562                 for zinfo in self.filelist:         # write central directory
563                     dt = zinfo.date_time
564                     dosdate = (dt[0] - 1980) << 9 | dt[1] << 5 | dt[2]
565                     dostime = dt[3] << 11 | dt[4] << 5 | (dt[5] // 2)
566                     extra = []
567                     if zinfo.file_size > ZIP64_LIMIT \
568                             or zinfo.compress_size > ZIP64_LIMIT:
569                         extra.append(zinfo.file_size)
570                         extra.append(zinfo.compress_size)
571                         file_size = 0xffffffff
572                         compress_size = 0xffffffff
573                     else:
574                         file_size = zinfo.file_size
575                         compress_size = zinfo.compress_size
576 
577                     if zinfo.header_offset > ZIP64_LIMIT:
578                         extra.append(zinfo.header_offset)
579                         header_offset = 0xffffffffL
580                     else:
581                         header_offset = zinfo.header_offset
582 
583                     extra_data = zinfo.extra
584                     if extra:
585                         # Append a ZIP64 field to the extra's
586                         extra_data = struct.pack(
587                                 '<HH' + 'Q'*len(extra),
588                                 1, 8*len(extra), *extra) + extra_data
589 
590                         extract_version = max(45, zinfo.extract_version)
591                         create_version = max(45, zinfo.create_version)
592                     else:
593                         extract_version = zinfo.extract_version
594                         create_version = zinfo.create_version
595 
596                     try:
597                         filename, flag_bits = zinfo._encodeFilenameFlags()
598                         centdir = struct.pack(structCentralDir,
599                         stringCentralDir, create_version,
600                         zinfo.create_system, extract_version, zinfo.reserved,
601                         flag_bits, zinfo.compress_type, dostime, dosdate,
602                         zinfo.CRC, compress_size, file_size,
603                         len(filename), len(extra_data), len(zinfo.comment),
604                         0, zinfo.internal_attr, zinfo.external_attr,
605                         header_offset)
606                     except DeprecationWarning:
607                         print >>sys.stderr, (structCentralDir,
608                         stringCentralDir, create_version,
609                         zinfo.create_system, extract_version, zinfo.reserved,
610                         zinfo.flag_bits, zinfo.compress_type, dostime, dosdate,
611                         zinfo.CRC, compress_size, file_size,
612                         len(zinfo.filename), len(extra_data), len(zinfo.comment),
613                         0, zinfo.internal_attr, zinfo.external_attr,
614                         header_offset)
615                         raise
616                     self.fp.write(centdir)
617                     self.fp.write(filename)
618                     self.fp.write(extra_data)
619                     self.fp.write(zinfo.comment)
620 
621                 pos2 = self.fp.tell()
622                 # Write end-of-zip-archive record
623                 centDirCount = len(self.filelist)
624                 centDirSize = pos2 - pos1
625                 centDirOffset = pos1
626                 requires_zip64 = None
627                 if centDirCount > ZIP_FILECOUNT_LIMIT:
628                     requires_zip64 = "Files count"
629                 elif centDirOffset > ZIP64_LIMIT:
630                     requires_zip64 = "Central directory offset"
631                 elif centDirSize > ZIP64_LIMIT:
632                     requires_zip64 = "Central directory size"
633                 if requires_zip64:
634                     # Need to write the ZIP64 end-of-archive records
635                     if not self._allowZip64:
636                         raise LargeZipFile(requires_zip64 +
637                                            " would require ZIP64 extensions")
638                     zip64endrec = struct.pack(
639                             structEndArchive64, stringEndArchive64,
640                             44, 45, 45, 0, 0, centDirCount, centDirCount,
641                             centDirSize, centDirOffset)
642                     self.fp.write(zip64endrec)
643 
644                     zip64locrec = struct.pack(
645                             structEndArchive64Locator,
646                             stringEndArchive64Locator, 0, pos2, 1)
647                     self.fp.write(zip64locrec)
648                     centDirCount = min(centDirCount, 0xFFFF)
649                     centDirSize = min(centDirSize, 0xFFFFFFFF)
650                     centDirOffset = min(centDirOffset, 0xFFFFFFFF)
651 
652                 endrec = struct.pack(structEndArchive, stringEndArchive,
653                                     0, 0, centDirCount, centDirCount,
654                                     centDirSize, centDirOffset, len(self._comment))
655                 self.fp.write(endrec)
656                 self.fp.write(self._comment)
657                 self.fp.flush()
658         finally:
659             fp = self.fp
660             self.fp = None
661             if not self._filePassed:
662                 fp.close()
663 
664 ZipFile
665 
666 tarfile

TarFile

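7. json & pickle modules

Two modules used for serialization:

json: converts between strings and Python data types
pickle: converts between Python-specific objects and a byte stream

The json module provides four functions: dumps, dump, loads, load

The pickle module provides four functions: dumps, dump, loads, load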
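As a minimal sketch of the four functions (the sample dictionary and the temporary file names are made up for illustration): dumps/loads work on in-memory values, while dump/load work on file objects.

```python
import json
import os
import pickle
import tempfile

info = {"name": "alex", "age": 22, "tags": ["a", "b"]}  # hypothetical sample data

# dumps/loads: serialize to and from an in-memory value
json_text = json.dumps(info)          # str
pickle_bytes = pickle.dumps(info)     # bytes
json_copy = json.loads(json_text)
pickle_copy = pickle.loads(pickle_bytes)

# dump/load: serialize to and from a file object
tmp_dir = tempfile.mkdtemp()
json_path = os.path.join(tmp_dir, "info.json")
with open(json_path, "w") as f:       # json writes text
    json.dump(info, f)
with open(json_path) as f:
    json_from_file = json.load(f)

pickle_path = os.path.join(tmp_dir, "info.pkl")
with open(pickle_path, "wb") as f:    # pickle writes bytes
    pickle.dump(info, f)
with open(pickle_path, "rb") as f:
    pickle_from_file = pickle.load(f)

print(json_copy == info, pickle_copy == info)
```

json output can be read by other languages but only covers basic types; pickle handles most Python objects, and pickled data should only be loaded from a trusted source.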

8. shelve module

shelve is a simple key-value module that persists in-memory data to a file. It can persist any Python data that pickle supports.

import shelve

d = shelve.open('shelve_test')  # open (or create) the shelf file

class Test(object):
    def __init__(self, n):
        self.n = n

t = Test(123)
t2 = Test(123334)

name = ["alex", "rain", "test"]
d["test"] = name  # persist a list
d["t1"] = t       # persist a class instance
d["t2"] = t2

d.close()
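The listing above only writes. A small sketch of the other half, reopening the shelf and reading values back by key, might look like this (the temporary path and the key names are chosen for illustration only):

```python
import os
import shelve
import tempfile

# Write to a shelf in a temporary directory (path chosen for illustration only)
shelf_path = os.path.join(tempfile.mkdtemp(), "shelve_demo")

with shelve.open(shelf_path) as d:
    d["names"] = ["alex", "rain", "test"]
    d["count"] = 3

# Reopen the shelf later and read the values back by key
with shelve.open(shelf_path) as d:
    names = d["names"]
    keys = sorted(d.keys())
    missing = d.get("nope", "default")   # a shelf supports the usual dict methods

print(names, keys, missing)
```

When instances of your own classes are stored, the class definition has to be importable at the time the shelf is read, because shelve relies on pickle.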

9. XML processing

XML is a protocol for exchanging data between different languages or programs. It is similar to json, but json is simpler to use. In the dark ages before json existed, XML was the only choice, and even today the interfaces of many systems in traditional industries such as finance are still mainly XML.

The XML format looks like this; the data structure is expressed through <> nodes:

<?xml version="1.0"?>
<data>
    <country name="Liechtenstein">
        <rank updated="yes">2</rank>
        <year>2008</year>
        <gdppc>141100</gdppc>
        <neighbor name="Austria" direction="E"/>
        <neighbor name="Switzerland" direction="W"/>
    </country>
    <country name="Singapore">
        <rank updated="yes">5</rank>
        <year>2011</year>
        <gdppc>59900</gdppc>
        <neighbor name="Malaysia" direction="N"/>
    </country>
    <country name="Panama">
        <rank updated="yes">69</rank>
        <year>2011</year>
        <gdppc>13600</gdppc>
        <neighbor name="Costa Rica" direction="W"/>
        <neighbor name="Colombia" direction="E"/>
    </country>
</data>

XML is supported in every language. In Python, the following module can be used to work with XML:

import xml.etree.ElementTree as ET

tree = ET.parse("xmltest.xml")
root = tree.getroot()
print(root.tag)

# walk the whole XML document
for child in root:
    print(child.tag, child.attrib)
    for i in child:
        print(i.tag, i.text)

# iterate over the year nodes only
for node in root.iter('year'):
    print(node.tag, node.text)

Modifying and deleting XML content

import xml.etree.ElementTree as ET

tree = ET.parse("xmltest.xml")
root = tree.getroot()

# modify
for node in root.iter('year'):
    new_year = int(node.text) + 1
    node.text = str(new_year)
    node.set("updated", "yes")

tree.write("xmltest.xml")


# delete nodes
for country in root.findall('country'):
    rank = int(country.find('rank').text)
    if rank > 50:
        root.remove(country)

tree.write('output.xml')

Creating an XML document yourself

import xml.etree.ElementTree as ET


new_xml = ET.Element("namelist")
name = ET.SubElement(new_xml, "name", attrib={"enrolled": "yes"})
age = ET.SubElement(name, "age", attrib={"checked": "no"})
sex = ET.SubElement(name, "sex")
sex.text = '33'
name2 = ET.SubElement(new_xml, "name", attrib={"enrolled": "no"})
age = ET.SubElement(name2, "age")
age.text = '19'

et = ET.ElementTree(new_xml)  # build the document object
et.write("test.xml", encoding="utf-8", xml_declaration=True)

ET.dump(new_xml)  # print the generated XML
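The read, modify and delete steps can also be tried without any file on disk. The sketch below parses a shortened copy of the sample document from a string with ET.fromstring; the trimmed XML text is an assumption made to keep the example small.

```python
import xml.etree.ElementTree as ET

# A shortened copy of the sample document, embedded as a string for illustration
xml_text = """<data>
    <country name="Liechtenstein"><rank>2</rank><year>2008</year></country>
    <country name="Singapore"><rank>5</rank><year>2011</year></country>
    <country name="Panama"><rank>69</rank><year>2011</year></country>
</data>"""

root = ET.fromstring(xml_text)

# read: collect (name, year) pairs
years = [(c.get("name"), c.find("year").text) for c in root.findall("country")]

# modify: bump every year by one
for node in root.iter("year"):
    node.text = str(int(node.text) + 1)

# delete: drop every country whose rank is greater than 50
for country in root.findall("country"):
    if int(country.find("rank").text) > 50:
        root.remove(country)

remaining = [c.get("name") for c in root.findall("country")]
new_years = [n.text for n in root.iter("year")]
print(remaining, new_years)
```

findall returns a list, so removing elements from root inside that loop does not disturb the iteration.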

10. PyYAML module

Python can also handle the YAML document format easily, but a module has to be installed first. Reference documentation: http://pyyaml.org/wiki/PyYAMLDocumentation

11. ConfigParser module

Used to generate and modify common configuration files. In Python 3.x the module was renamed to configparser.

Here is a configuration file format that many programs use:

[DEFAULT]
ServerAliveInterval = 45
Compression = yes
CompressionLevel = 9
ForwardX11 = yes

[bitbucket.org]
User = hg

[topsecret.server.com]
Port = 50022
ForwardX11 = no

How do you generate a file like this with Python?

import configparser

config = configparser.ConfigParser()
config["DEFAULT"] = {'ServerAliveInterval': '45',
                     'Compression': 'yes',
                     'CompressionLevel': '9'}

config['bitbucket.org'] = {}
config['bitbucket.org']['User'] = 'hg'
config['topsecret.server.com'] = {}
topsecret = config['topsecret.server.com']
topsecret['Port'] = '50022'     # mutates the parser
topsecret['ForwardX11'] = 'no'  # same here
config['DEFAULT']['ForwardX11'] = 'yes'
with open('example.ini', 'w') as configfile:
    config.write(configfile)

Once written, the file can be read back again.

>>> import configparser
>>> config = configparser.ConfigParser()
>>> config.sections()
[]
>>> config.read('example.ini')
['example.ini']
>>> config.sections()
['bitbucket.org', 'topsecret.server.com']
>>> 'bitbucket.org' in config
True
>>> 'bytebong.com' in config
False
>>> config['bitbucket.org']['User']
'hg'
>>> config['DEFAULT']['Compression']
'yes'
>>> topsecret = config['topsecret.server.com']
>>> topsecret['ForwardX11']
'no'
>>> topsecret['Port']
'50022'
>>> for key in config['bitbucket.org']: print(key)
...
user
compressionlevel
serveraliveinterval
compression
forwardx11
>>> config['bitbucket.org']['ForwardX11']
'yes'

configparser syntax for adding, deleting, modifying and reading

Example file i.cfg:

[section1]
k1 = v1
k2:v2
[section2]
k1 = v1

Each commented-out line below shows one operation; the group and key names are placeholders for whatever the file contains.

import configparser
config = configparser.ConfigParser()
config.read('i.cfg')
# ########## read ##########
#secs = config.sections()
#print(secs)
#options = config.options('group2')
#print(options)
#item_list = config.items('group2')
#print(item_list)
#val = config.get('group1','key')
#val = config.getint('group1','key')
# ########## modify ##########
#sec = config.remove_section('group1')
#config.write(open('i.cfg', "w"))
#sec = config.has_section('wupeiqi')
#sec = config.add_section('wupeiqi')
#config.write(open('i.cfg', "w"))
#config.set('group2','k1','11111')   # values must be strings in Python 3
#config.write(open('i.cfg', "w"))
#config.remove_option('group2','age')
#config.write(open('i.cfg', "w"))
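The operations listed above can be run end to end as follows. This is a sketch that loads the sample file contents from a string and writes to an in-memory buffer, so nothing touches the disk; the name section3 is made up for the example.

```python
import configparser
import io

# The sample file from above, loaded from a string so the sketch is self-contained
cfg_text = """
[section1]
k1 = v1
k2:v2
[section2]
k1 = v1
"""

config = configparser.ConfigParser()
config.read_string(cfg_text)

# read
sections = config.sections()                 # ['section1', 'section2']
options = config.options('section1')         # ['k1', 'k2']
value = config.get('section1', 'k2')         # 'v2'

# update: values must be strings
config.set('section2', 'k1', '11111')
number = config.getint('section2', 'k1')     # 11111

# add and delete
config.add_section('section3')
config.remove_option('section1', 'k2')
config.remove_section('section2')

# write the result to an in-memory buffer instead of a real file
buffer = io.StringIO()
config.write(buffer)
result = buffer.getvalue()
print(result)
```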

12. hashlib module

Used for hashing (message digest) operations. In 3.x it replaces the md5 and sha modules, and mainly provides the SHA1, SHA224, SHA256, SHA384, SHA512 and MD5 algorithms.

import hashlib

m = hashlib.md5()
m.update(b"Hello")
m.update(b"It's me")
print(m.hexdigest())
m.update(b"It's been a long time since last time we ...")

print(m.digest())          # hash in binary format
print(m.hexdigest())       # hash in hexadecimal format
print(len(m.hexdigest()))  # length of the hexadecimal hash
'''
def digest(self, *args, **kwargs): # real signature unknown
    """ Return the digest value as a string of binary data. """
    pass

def hexdigest(self, *args, **kwargs): # real signature unknown
    """ Return the digest value as a string of hexadecimal digits. """
    pass

'''
import hashlib

# update() takes bytes in Python 3, hence b'admin'

# ######## md5 ########

hash = hashlib.md5()
hash.update(b'admin')
print(hash.hexdigest())

# ######## sha1 ########

hash = hashlib.sha1()
hash.update(b'admin')
print(hash.hexdigest())

# ######## sha256 ########

hash = hashlib.sha256()
hash.update(b'admin')
print(hash.hexdigest())


# ######## sha384 ########

hash = hashlib.sha384()
hash.update(b'admin')
print(hash.hexdigest())

# ######## sha512 ########

hash = hashlib.sha512()
hash.update(b'admin')
print(hash.hexdigest())
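One point the repeated update() calls above rely on: feeding data in pieces produces the same digest as hashing everything at once. A short sketch of that property follows; the helper name sha256_of_chunks is made up for this example.

```python
import hashlib

# Feeding data in pieces gives the same digest as hashing it in one call
m = hashlib.md5()
m.update(b"Hello")
m.update(b"It's me")
one_shot = hashlib.md5(b"HelloIt's me").hexdigest()
same = (m.hexdigest() == one_shot)

def sha256_of_chunks(chunks):
    """Hash an iterable of bytes objects without joining them in memory."""
    h = hashlib.sha256()
    for chunk in chunks:
        h.update(chunk)
    return h.hexdigest()

digest = sha256_of_chunks([b"ad", b"min"])
print(same, len(one_shot), len(digest))   # hex lengths: 32 for md5, 64 for sha256
```

This is why large files are usually hashed by reading a block at a time and calling update() on each block.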

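Not enough? Python also has an hmac module, which internally combines the key and the message we supply and then hashes the result.

A hash-based message authentication code, HMAC for short, is an authentication mechanism built on the message authentication code (MAC). With HMAC, the two communicating parties verify that a message is genuine by means of a secret key K that both of them hold.

It is generally used to authenticate messages in network communication. The two parties first agree on a key, like a secret handshake. The sender computes a digest of the message with the key and sends it along with the message; the receiver computes the digest again from the key plus the plaintext message and compares it with the value from the sender. If the two are equal, the message is authentic and the sender is legitimate.

import hmac
h = hmac.new(b'wueiqi', digestmod='md5')  # key must be bytes; digestmod is required since Python 3.8
h.update(b'hellowo')
print(h.hexdigest())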
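The sender and receiver roles described above can be sketched like this. The key, the messages and the helper names sign and verify are all hypothetical; SHA-256 is used here as the digest, which is a choice made for the example rather than something the text prescribes.

```python
import hashlib
import hmac

key = b"shared-secret"          # hypothetical key agreed on in advance
message = b"transfer 100 to bob"

def sign(key, message):
    """Return the hex HMAC-SHA256 tag for message under key."""
    return hmac.new(key, message, digestmod=hashlib.sha256).hexdigest()

def verify(key, message, tag):
    """Recompute the tag and compare in constant time."""
    return hmac.compare_digest(sign(key, message), tag)

tag = sign(key, message)                          # sender side
ok = verify(key, message, tag)                    # receiver side, untouched message
tampered = verify(key, b"transfer 900 to bob", tag)
print(ok, tampered)
```

hmac.compare_digest is used instead of == so that the comparison time does not reveal how many leading characters matched.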

13. subprocess module

Examples of commonly used subprocess functions

# run a command and return its exit status, 0 or non-zero
>>> retcode = subprocess.call(["ls", "-l"])

# run a command; return normally if the exit status is 0, otherwise raise an exception
>>> subprocess.check_call(["ls", "-l"])
0

# take a command string and return a tuple: the first element is the exit status, the second is the command output
>>> subprocess.getstatusoutput('ls /bin/ls')
(0, '/bin/ls')

# take a command string and return its output
>>> subprocess.getoutput('ls /bin/ls')
'/bin/ls'

# run a command and return its output; note the output is returned, not printed. Here it is assigned to res
>>> res=subprocess.check_output(['ls','-l'])
>>> res
b'total 0\ndrwxr-xr-x 12 alex staff 408 Nov 2 11:05 OldBoyCRM\n'

# all of the functions above are wrappers around subprocess.Popen, whose objects provide:
poll()
 Check if child process has terminated. Returns returncode

wait()
 Wait for child process to terminate. Returns returncode attribute.


terminate() kill the process that was started
communicate() wait for the task to finish

stdin standard input

stdout standard output

stderr standard error

pid
 The process ID of the child process.

# example
>>> p = subprocess.Popen("df -h|grep disk",stdin=subprocess.PIPE,stdout=subprocess.PIPE,shell=True)
>>> p.stdout.read()
b'/dev/disk1 465Gi 64Gi 400Gi 14% 16901472 104938142 14% /\n'

>>> subprocess.run(["ls", "-l"])  # doesn't capture output
CompletedProcess(args=['ls', '-l'], returncode=0)
>>> subprocess.run("exit 1", shell=True, check=True)
Traceback (most recent call last):
  ...
subprocess.CalledProcessError: Command 'exit 1' returned non-zero exit status 1
>>> subprocess.run(["ls", "-l", "/dev/null"], stdout=subprocess.PIPE)
CompletedProcess(args=['ls', '-l', '/dev/null'], returncode=0,
stdout=b'crw-rw-rw- 1 root root 1, 3 Jan 23 16:23 /dev/null\n')

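The run() session above uses ls, which is not available everywhere. As a portable sketch of the same two behaviours (capturing output, and check=True raising on failure), the child command below is the current Python interpreter itself, which is an assumption made so the example runs on any platform.

```python
import subprocess
import sys

# Run a child Python process instead of ls so the sketch works on any platform
done = subprocess.run([sys.executable, "-c", "print('hello')"],
                      stdout=subprocess.PIPE, universal_newlines=True)
output = done.stdout.strip()      # text printed by the child

# check=True turns a non-zero exit status into an exception
try:
    subprocess.run([sys.executable, "-c", "import sys; sys.exit(1)"], check=True)
    failed = False
except subprocess.CalledProcessError as exc:
    failed = True
    exit_status = exc.returncode

print(done.returncode, output, failed)
```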

Calling subprocess.run(...) is the recommended approach and covers most needs. If you need more complex interaction with the system, you can use subprocess.Popen() instead. The syntax is as follows:

p = subprocess.Popen(r"find / -size +1000000 -exec ls -shl {} \;", shell=True, stdout=subprocess.PIPE)
print(p.stdout.read())

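Available parameters:

args: the shell command, either a string or a sequence type (e.g. list, tuple)
bufsize: buffering. 0 = unbuffered, 1 = line buffered, any other positive value = buffer size, negative = system default
stdin, stdout, stderr: the program's standard input, output and error handles
preexec_fn: Unix only. A callable object that will be called in the child process just before it runs
close_fds: on Windows, if close_fds is set to True, the newly created child process does not inherit the parent's input, output and error pipes.
So (before Python 3.7) close_fds cannot be set to True while also redirecting the child's standard input, output and error (stdin, stdout, stderr).
shell: same as above; if True, the command is executed through the shell
cwd: sets the current directory of the child process
env: specifies the child's environment variables. If env = None, the child inherits the parent's environment.
universal_newlines: newline characters differ between systems; True -> use \n uniformly
startupinfo and creationflags: Windows only.
They are passed to the underlying CreateProcess() function to set attributes of the child process, such as the appearance of the main window and the process priority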
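To see a few of these parameters working together, the sketch below starts a child Python process with its own cwd and env and collects its output with communicate(). The variable name DEMO_VAR and the temporary directory are made up for the example.

```python
import os
import subprocess
import sys
import tempfile

work_dir = tempfile.mkdtemp()                   # directory the child will start in
child_env = dict(os.environ, DEMO_VAR="hello")  # DEMO_VAR is a made-up variable name

# The child prints its working directory and the variable it received
child_code = "import os; print(os.getcwd()); print(os.environ['DEMO_VAR'])"

p = subprocess.Popen(
    [sys.executable, "-c", child_code],   # args as a list, so no shell is needed
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE,
    cwd=work_dir,
    env=child_env,
    universal_newlines=True,              # decode output to str with \n newlines
)
out, err = p.communicate(timeout=30)      # wait for the child and collect output
lines = out.splitlines()
print(p.returncode, lines)
```

The environment is built by copying os.environ and adding one entry; passing a dictionary with only DEMO_VAR would strip everything else from the child's environment.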

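Commands typed into a terminal fall into two kinds:

Those that produce output as soon as they are entered, e.g. ifconfig
Those that enter an environment and then depend on further input, e.g. python

Example of a command that needs interaction

import subprocess
obj = subprocess.Popen(["python"], stdin=subprocess.PIPE, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
# the pipes carry bytes; no space after \n, or the next line would be indented
obj.stdin.write(b'print(1)\n')
obj.stdin.write(b'print(2)\n')
obj.stdin.write(b'print(3)\n')
obj.stdin.write(b'print(4)\n')
out_error_list = obj.communicate(timeout=10)
print(out_error_list)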

Using subprocess to enter the sudo password automatically

import subprocess
def mypass():
    mypass = '123' #or get the password from anywhere
    return mypass
echo = subprocess.Popen(['echo',mypass()],
                        stdout=subprocess.PIPE,
                        )
sudo = subprocess.Popen(['sudo','-S','iptables','-L'],
                        stdin=echo.stdout,
                        stdout=subprocess.PIPE,
                        )
end_of_pipe = sudo.stdout
print("Password ok \n Iptables Chains %s" % end_of_pipe.read().decode())

14. logging module

Many programs need to record logs, and the logs contain both normal access records and output such as errors and warnings. Python's logging module provides a standard logging interface through which you can store logs in various formats. logging has five levels: debug(), info(), warning(), error() and critical(). Let's see how to use it.

Simplest usage

import logging
logging.warning("user [alex] attempted wrong password more than 3 times")
logging.critical("server is down")
# output
WARNING:root:user [alex] attempted wrong password more than 3 times
CRITICAL:root:server is down

Writing the log to a file is just as simple

import logging
logging.basicConfig(filename='example.log',level=logging.INFO)
logging.debug('This message should go to the log file')
logging.info('So should this')
logging.warning('And this, too')

In the line below, level=logging.INFO sets the logging level to INFO, which means only messages at INFO level or higher are written to the file. In this example the first message is not recorded. To record debug messages as well, change the level to DEBUG.

logging.basicConfig(filename='example.log',level=logging.INFO)

The log format above is missing the time. A log is not much use without timestamps, so let's add them!

import logging
logging.basicConfig(format='%(asctime)s %(message)s', datefmt='%m/%d/%Y %I:%M:%S %p')
logging.warning('is when this event was logged.')
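The level threshold and the time format can be checked together in one place. The sketch below sends records to an in-memory stream instead of a file so the result can be inspected; the logger name "demo" and the messages are arbitrary, and it uses a named logger with its own handler rather than basicConfig.

```python
import io
import logging

stream = io.StringIO()                       # in-memory target so the output can be inspected

logger = logging.getLogger("demo")           # "demo" is an arbitrary logger name
logger.setLevel(logging.INFO)                # drop anything below INFO
logger.propagate = False                     # keep messages out of the root logger

handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s",
                                       datefmt="%m/%d/%Y %I:%M:%S %p"))
logger.addHandler(handler)

logger.debug("not recorded")                 # below the INFO threshold
logger.info("recorded")
logger.warning("also recorded")

log_lines = stream.getvalue().splitlines()
print(log_lines)
```

Only two lines reach the stream, each prefixed with a timestamp in the datefmt layout.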

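15. re regular expressions

Basic regular expression metacharacters and syntax

Commonly used regular expression symbols

'.'     by default matches any single character except \n; with the DOTALL flag it matches any character, including newline
'^'     matches the start of the string; with the MULTILINE flag this also matches: re.search(r"^a","\nabc\neee",flags=re.MULTILINE)
'$'     matches the end of the string; re.search("foo$","bfoo\nsdfsf",flags=re.MULTILINE).group() also works
'*'     matches the character before * 0 or more times, re.findall("ab*","cabb3abcbbac") gives ['abb', 'ab', 'a']
'+'     matches the preceding character 1 or more times, re.findall("ab+","ab+cd+abb+bba") gives ['ab', 'abb']
'?'     matches the preceding character 1 or 0 times
'{m}'   matches the preceding character m times
'{n,m}' matches the preceding character n to m times, re.findall("ab{1,3}","abb abc abbcbbb") gives ['abb', 'ab', 'abb']
'|'     matches the pattern to the left or right of |, re.search("abc|ABC","ABCBabcCD").group() gives 'ABC'
'(...)' group matching, re.search("(abc){2}a(123|456)c", "abcabca456c").group() gives abcabca456c
'\A'    matches only at the start of the string, re.search("\Aabc","alexabc") finds no match
'\Z'    matches only at the very end of the string (unlike $, not before a trailing newline)
'\d'    matches digits 0-9
'\D'    matches non-digits
'\w'    matches [A-Za-z0-9_]
'\W'    matches anything not in [A-Za-z0-9_]
'\s'    matches whitespace such as space, \t, \n, \r; re.search("\s+","ab\tc1\n3").group() gives '\t'
'(?P<name>...)' named group matching, re.search("(?P<province>[0-9]{4})(?P<city>[0-9]{2})(?P<birthday>[0-9]{4})","371481199306143242").groupdict() gives {'province': '3714', 'city': '81', 'birthday': '1993'}

Most commonly used matching functions

re.match    match from the beginning of the string
re.search   match anywhere in the string
re.findall  return all matches as the elements of a list
re.split    split the string, using the matches as separators
re.sub      match and replace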
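The examples quoted in the table can be run directly. The following sketch collects them in one script, with raw strings for the patterns; the DOTALL check on "a\nb" is an extra example that is not in the table.

```python
import re

# Quantifiers
star = re.findall(r"ab*", "cabb3abcbbac")           # ['abb', 'ab', 'a']
plus = re.findall(r"ab+", "ab+cd+abb+bba")          # ['ab', 'abb']
ranged = re.findall(r"ab{1,3}", "abb abc abbcbbb")  # ['abb', 'ab', 'abb']

# Anchors: MULTILINE lets ^ and $ match at line boundaries
caret = re.search(r"^a", "\nabc\neee", flags=re.MULTILINE).group()      # 'a'
dollar = re.search(r"foo$", "bfoo\nsdfsf", flags=re.MULTILINE).group()  # 'foo'
start_only = re.search(r"\Aabc", "alexabc")         # None

# DOTALL lets '.' match a newline too
dot_default = re.search(r"a.b", "a\nb")             # None
dot_all = re.search(r"a.b", "a\nb", flags=re.DOTALL).group()

# Named groups
m = re.search(r"(?P<province>[0-9]{4})(?P<city>[0-9]{2})(?P<birthday>[0-9]{4})",
              "371481199306143242")
parts = m.groupdict()
print(star, plus, ranged, parts)
```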

1) match(pattern, string, flags=0)

Matches the pattern against the string starting at the beginning; returns a single match

pattern: the regular expression
string: the string to match against
flags: flags that control how the regular expression is matched

import re

obj = re.match(r'\d+', '123uuasf')
if obj:
    print(obj.group())

Flags

re.I / re.IGNORECASE: ignore case
re.L / re.LOCALE: assume current 8-bit locale
re.U / re.UNICODE: assume unicode locale
re.M / re.MULTILINE: make anchors look for newline
re.S / re.DOTALL: make dot match newline
re.X / re.VERBOSE: ignore whitespace and comments

2) search(pattern, string, flags=0)

Searches the string for the pattern; returns a single match, the first one found

import re

obj = re.search(r'\d+', 'u123uu888asf')
if obj:
    print(obj.group())

3) group and groups

a = "123abc456"
print(re.search("([0-9]*)([a-z]*)([0-9]*)", a).group())

print(re.search("([0-9]*)([a-z]*)([0-9]*)", a).group(0))
print(re.search("([0-9]*)([a-z]*)([0-9]*)", a).group(1))
print(re.search("([0-9]*)([a-z]*)([0-9]*)", a).group(2))

print(re.search("([0-9]*)([a-z]*)([0-9]*)", a).groups())
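To make the difference between group and groups concrete, here is the same search performed once, with each result kept in a variable (the variable names are only for illustration):

```python
import re

a = "123abc456"
m = re.search(r"([0-9]*)([a-z]*)([0-9]*)", a)

whole = m.group()        # same as m.group(0): the entire match
first = m.group(1)       # text captured by the first pair of parentheses
second = m.group(2)      # text captured by the second pair
all_groups = m.groups()  # tuple of every captured group

print(whole, first, second, all_groups)
# 123abc456 123 abc ('123', 'abc', '456')
```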

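4) findall(pattern, string, flags=0)

Both of the approaches above match a single value only, that is, just one occurrence in the string. To get every element of the string that matches, use findall.

import re

obj = re.findall(r'\d+', 'fa123uu888asf')
print(obj)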

5) sub(pattern, repl, string, count=0, flags=0)

Replaces the matched strings

content = "123abc456"
new_content = re.sub(r'\d+', 'sb', content)
# new_content = re.sub(r'\d+', 'sb', content, 1)
print(new_content)

More powerful than str.replace
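A few cases that show where re.sub goes beyond str.replace: limiting the number of replacements, reusing captured groups, and computing the replacement with a function. This is a sketch; the patterns and the lambda are illustrative choices.

```python
import re

content = "123abc456"

replaced_all = re.sub(r"\d+", "sb", content)           # 'sbabcsb'
replaced_one = re.sub(r"\d+", "sb", content, count=1)  # 'sbabc456'

# The replacement can refer to captured groups ...
swapped = re.sub(r"(\d+)([a-z]+)", r"\2\1", content)   # 'abc123456'

# ... or be a function that receives each match object
doubled = re.sub(r"\d+", lambda m: str(int(m.group()) * 2), content)  # '246abc912'

print(replaced_all, replaced_one, swapped, doubled)
```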

6) split(pattern, string, maxsplit=0, flags=0)

Splits the string at each match of the pattern

content = "'1 - 2 * ((60-30+1*(9-2*5/3+7/3*99/4*2998+10*568/14))-(-4*3)/(16-3*2) )'"
new_content = re.split(r'\*', content)
# new_content = re.split(r'\*', content, 1)
print(new_content)

content = "'1 - 2 * ((60-30+1*(9-2*5/3+7/3*99/4*2998+10*568/14))-(-4*3)/(16-3*2) )'"
new_content = re.split(r'[\+\-\*\/]+', content)
# new_content = re.split(r'\*', content, 1)
print(new_content)

inpp = '1-2*((60-30 +(-40-5)*(9-2*5/3 + 7 /3*99/4*2998 +10 * 568/14 )) - (-4*3)/ (16-3*2))'
inpp = re.sub(r'\s*', '', inpp)
new_content = re.split(r'\(([\+\-\*\/]?\d+[\+\-\*\/]?\d+){1}\)', inpp, 1)
print(new_content)

More powerful than str.split
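On a shorter input the behaviour of re.split is easier to follow. The expression below is made up for illustration; the last call shows that a capturing group in the pattern keeps the separators in the result, which is what the third example above depends on.

```python
import re

expr = "1-2*3+40/5"   # made-up expression for illustration

numbers = re.split(r"[\+\-\*\/]+", expr)        # ['1', '2', '3', '40', '5']
first_only = re.split(r"\*", expr, maxsplit=1)  # ['1-2', '3+40/5']

# With a capturing group, the separators are kept in the result
tokens = re.split(r"([\+\-\*\/])", expr)
# ['1', '-', '2', '*', '3', '+', '40', '/', '5']

print(numbers, first_only, tokens)
```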

Calculator source code