Commonly Used Python Modules

Date: 2019-11-12


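1. The time module

Representations of time

The time module works with three representations of time:

Timestamp: the number of seconds elapsed since January 1, 1970 (UTC), i.e. time.time()
Formatted string: for example 2014-11-11 11:11, i.e. time.strftime('%Y-%m-%d')
Structured time: a time.struct_time tuple holding the year, month, day, weekday, and so on, i.e. time.localtime()

Timestamp: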

import time
print(time.time())

>>> 1526703163.746542

Formatted string:

import time
print(time.strftime("%Y-%m-%d,%H-%M-%S"))

>>> 2018-05-19,12-18-12

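%y Two-digit year (00-99)
%Y Four-digit year (0001-9999)
%m Month (01-12)
%d Day of the month (01-31)
%H Hour, 24-hour clock (00-23)
%I Hour, 12-hour clock (01-12)
%M Minute (00-59)
%S Second (00-59)
%a Locale's abbreviated weekday name
%A Locale's full weekday name
%b Locale's abbreviated month name
%B Locale's full month name
%c Locale's date and time representation
%j Day of the year (001-366)
%p Locale's equivalent of AM or PM
%U Week number of the year (00-53), with Sunday as the first day of the week
%w Weekday (0-6), with Sunday as 0
%W Week number of the year (00-53), with Monday as the first day of the week
%x Locale's date representation
%X Locale's time representation
%Z Name of the current time zone
%% A literal % character

Structured time: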

import time
print(time.localtime())

>>> time.struct_time(tm_year=2018, tm_mon=5, tm_mday=19, tm_hour=12, tm_min=19, tm_sec=8, tm_wday=5, tm_yday=139, tm_isdst=0)

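Summary: a timestamp is the form the computer works with, a formatted string is the form people read, and a struct_time tuple is the form used to manipulate individual fields.

Converting between the representations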

import time

print(time.time())
print(time.mktime(time.localtime()))

print(time.gmtime())     # accepts an optional timestamp argument
print(time.localtime())  # accepts an optional timestamp argument
print(time.strptime('2014-11-11', '%Y-%m-%d'))

print(time.strftime('%Y-%m-%d'))                    # defaults to the current time
print(time.strftime('%Y-%m-%d', time.localtime()))  # same result with an explicit struct_time
print(time.asctime())
print(time.asctime(time.localtime()))
print(time.ctime(time.time()))
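
The conversions above can be chained into a round trip. The following is a minimal sketch rather than part of the original article: the helper name roundtrip is hypothetical, and it works in UTC throughout (time.gmtime paired with calendar.timegm, a function the article does not mention) so that the result does not depend on the machine's time zone.

```python
import calendar
import time

FMT = "%Y-%m-%d %H:%M:%S"

def roundtrip(ts):
    """timestamp -> struct_time -> string -> struct_time -> timestamp (all in UTC)."""
    st = time.gmtime(ts)                 # timestamp to structured time
    text = time.strftime(FMT, st)        # structured time to formatted string
    parsed = time.strptime(text, FMT)    # formatted string back to structured time
    back = calendar.timegm(parsed)       # structured time (UTC) back to timestamp
    return text, back

text, back = roundtrip(0)
print(text, back)   # 1970-01-01 00:00:00 0
```

For local time, the matching pair is time.localtime() and time.mktime().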

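2. The datetime module

Classes in the datetime module

object:
timedelta # a span of time, mainly used for date arithmetic
tzinfo # time zone information
time # time of day only
date # date only
datetime # both date and time

Of these classes, datetime and timedelta are used most often; time, date, and datetime are used in much the same way.

Getting the current date and time

from datetime import datetime
now = datetime.now() # get the current datetime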
print(now)

>>> 2018-05-19 12:29:09.400586

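Note that datetime is a module, and that module also contains a class named datetime. The statement from datetime import datetime imports the class.

If you only write import datetime, you must use the full name datetime.datetime.

datetime.now() returns the current date and time as a datetime object.

Creating a specific date and time

from datetime import datetime
dt = datetime(2015, 4, 19, 12, 20) # build a datetime from the given date and time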
print(dt)

>>> 2015-04-19 12:20:00

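Converting a datetime to a timestamp

Computers represent time as a number. The instant 1970-01-01 00:00:00 UTC+00:00 is called the epoch and is assigned the value 0 (times before 1970 have negative timestamps). The current time, expressed as the number of seconds since the epoch, is called a timestamp.

You can think of it this way:

timestamp = 0 = 1970-1-1 00:00:00 UTC+0:00

The corresponding Beijing time is:

timestamp = 0 = 1970-1-1 08:00:00 UTC+8:00

A timestamp is therefore independent of time zone: once the timestamp is fixed, the UTC time is fixed, and so is the time in every other zone. This is why computers store the current time as a timestamp: at any given instant, every computer in the world has the same timestamp (assuming the clocks are synchronized).

To convert a datetime to a timestamp, call its timestamp() method:

from datetime import datetime
dt = datetime(2015, 4, 19, 12, 20) # build a datetime from the given date and time
print(dt.timestamp()) # convert the datetime to a timestamp

Converting a timestamp to a datetime

To convert a timestamp to a datetime, use the fromtimestamp() method provided by datetime: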

from datetime import datetime
t = 1429417200.0
print(datetime.fromtimestamp(t))

>>> 2015-04-19 12:20:00

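Note that a timestamp is a floating-point number with no notion of a time zone, whereas a datetime is always read relative to some time zone. The conversion above is between the timestamp and local time (UTC+8 in this example).

A timestamp can also be converted directly to UTC:

from datetime import datetime
t = 1429417200.0
print(datetime.fromtimestamp(t)) # local time

>>> 2015-04-19 12:20:00

print(datetime.utcfromtimestamp(t)) # UTC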

>>> 2015-04-19 04:20:00

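Converting a str to a datetime

Dates and times entered by users usually arrive as strings, which must be converted to datetime before they can be processed. datetime.strptime() does the conversion and takes a format string describing the date and time: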

from datetime import datetime
cday = datetime.strptime('2015-6-1 18:19:59', '%Y-%m-%d %H:%M:%S')
print(cday)

>>> 2015-06-01 18:19:59

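Note that the resulting datetime carries no time zone information.

Converting a datetime to a str

To show a datetime object to the user, format it as a string with strftime(), which also takes a date and time format string: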

from datetime import datetime
now = datetime.now()
print(now.strftime('%a, %b %d %H:%M'))

>>> Sat, May 19 12:41

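datetime arithmetic

Adding to or subtracting from a datetime moves it forward or backward in time and yields a new datetime. Use the + and - operators directly, after importing the timedelta class: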

>>> from datetime import datetime, timedelta
>>> now = datetime.now()
>>> now
datetime.datetime(2015, 5, 18, 16, 57, 3, 540997)
>>> now + timedelta(hours=10)
datetime.datetime(2015, 5, 19, 2, 57, 3, 540997)
>>> now - timedelta(days=1)
datetime.datetime(2015, 5, 17, 16, 57, 3, 540997)
>>> now + timedelta(days=2, hours=12)
datetime.datetime(2015, 5, 21, 4, 57, 3, 540997)

As you can see, timedelta makes it easy to compute the moment a few days earlier or later.
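
Subtraction also works between two datetime objects, and the result is a timedelta. A small sketch with made-up dates (the variable names are illustrative only):

```python
from datetime import datetime, timedelta

start = datetime(2015, 5, 18, 16, 57, 3)
end = datetime(2015, 5, 21, 4, 57, 3)

delta = end - start                           # subtracting two datetimes gives a timedelta
print(delta)                                  # 2 days, 12:00:00
print(delta.days, delta.seconds)              # 2 43200
print(delta.total_seconds() / 3600)           # 60.0
print(delta == timedelta(days=2, hours=12))   # True
```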

Converting local time to UTC

Time zone conversion

>>> See Liao Xuefeng's tutorial for both topics.
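
The article leaves these two topics to the tutorial. As a rough sketch of the idea, assuming fixed UTC offsets built with datetime.timezone (the variable names are hypothetical, and real applications often need named zones with daylight-saving rules instead):

```python
from datetime import datetime, timedelta, timezone

tz_beijing = timezone(timedelta(hours=8))     # fixed offset UTC+8:00
tz_tokyo = timezone(timedelta(hours=9))       # fixed offset UTC+9:00

bj_dt = datetime(2015, 4, 19, 12, 20, tzinfo=tz_beijing)
utc_dt = bj_dt.astimezone(timezone.utc)       # same instant, expressed in UTC
tokyo_dt = utc_dt.astimezone(tz_tokyo)        # same instant, expressed in UTC+9

print(utc_dt)              # 2015-04-19 04:20:00+00:00
print(tokyo_dt)            # 2015-04-19 13:20:00+09:00
print(bj_dt.timestamp())   # 1429417200.0
```

All three objects describe the same instant, so they compare equal and share one timestamp.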

3. The random module

>>> import random
>>> random.random()      # a float in the range [0.0, 1.0)
0.7664338663654585

>>> random.randint(1,5)  # an integer from 1 to 5, both ends included
2

>>> random.randrange(1,3) # an integer >= 1 and < 3
1

>>> random.choice([1,'23',[4,5]])  # one of 1, '23', or [4,5]
1

>>> random.sample([1,'23',[4,5]],2) # two distinct elements picked from the list
[[4, 5], '23']

>>> random.uniform(1,3) # a float between 1 and 3
1.6270147180533838

>>> item=[1,3,5,7,9]
>>> random.shuffle(item) # shuffle the list in place
>>> item
[5, 1, 3, 7, 9]
>>> random.shuffle(item)
>>> item
[5, 9, 7, 1, 3]

Exercise: generating a verification code

import random

def v_code():

    code = ''
    for i in range(5):

        num=random.randint(0,9)
        alf=chr(random.randint(65,90))
        add=random.choice([num,alf])
        code="".join([code,str(add)])

    return code

print(v_code())
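
The random module is a pseudo-random generator and is not intended for security-sensitive values. If the code guards something that matters, one option is the standard-library secrets module. The sketch below is an assumption-laden variant of the exercise, not the article's method; secure_code and ALPHABET are hypothetical names.

```python
import secrets
import string

ALPHABET = string.ascii_uppercase + string.digits   # A-Z and 0-9, as in the exercise

def secure_code(length=5):
    """Return a verification code drawn from a cryptographically strong source."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))

print(secure_code())
```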

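4. hashlib

4.1 About the algorithms

Python's hashlib provides common digest algorithms such as MD5 and SHA1.

What is a digest algorithm? Also called a hash algorithm, it uses a function to turn data of any length into a fixed-length string (usually written in hexadecimal).

A digest algorithm applies a digest function f() to data of any length and computes a fixed-length digest. Its purpose is to detect whether the original data has been tampered with.

It can detect tampering because the digest function is one-way: computing f(data) is easy, but recovering data from the digest is very hard. In addition, changing a single bit of the original data produces a completely different digest.

Taking the common MD5 algorithm as an example, compute the MD5 value of a string: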

import hashlib
 
md5 = hashlib.md5()
md5.update(b'how to use md5 in python hashlib?')
  # md5.update(bytes("how to use md5 in python hashlib?","utf-8"))

print(md5.hexdigest()) 
Result: d26a53750bc40b38b65a520292f69306

For large inputs, call update() several times with successive chunks; the final result is the same:

md5 = hashlib.md5()
md5.update(b'how to use md5 in ')
md5.update(b'python hashlib?')
print(md5.hexdigest())

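MD5 is the most common digest algorithm. It is fast and produces a fixed 128-bit result, usually written as a 32-character hexadecimal string. Another common digest algorithm is SHA1, which is called in exactly the same way as MD5:

import hashlib

sha1 = hashlib.sha1()
sha1.update(b'how to use sha1 in ')
sha1.update(b'python hashlib?')
print(sha1.hexdigest())

SHA1 produces a 160-bit result, usually written as a 40-character hexadecimal string. SHA256 and SHA512 are more secure than SHA1, at the cost of being slower and producing longer digests.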
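
SHA256 is used the same way. The sketch below combines it with the chunked update() pattern described earlier; sha256_of_stream is a hypothetical helper, and io.BytesIO stands in for a large file opened in binary mode.

```python
import hashlib
import io

def sha256_of_stream(stream, chunk_size=4096):
    """Hash a binary stream chunk by chunk, without loading it all into memory."""
    digest = hashlib.sha256()
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:            # an empty read means end of stream
            break
        digest.update(chunk)
    return digest.hexdigest()

data = b'how to use sha256 in python hashlib?'
streamed = sha256_of_stream(io.BytesIO(data), chunk_size=8)
print(streamed == hashlib.sha256(data).hexdigest())   # True
print(len(streamed))                                   # 64
```

The 64 hexadecimal characters correspond to the 256-bit digest.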

4.2 Applications of digest algorithms

Any website that lets users log in stores their usernames and passwords. How? The simplest way is a database table:

name    | password
--------+----------
michael | 123456
bob     | abc999
alice   | alice2008

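If passwords are stored in plain text, a database leak hands every user's password to the attacker. The site's operations staff can also read the database, and with it every password. The right approach is to store a digest of the password, such as its MD5, instead of the plain text: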

username | password
---------+---------------------------------
michael  | e10adc3949ba59abbe56e057f20f883e
bob      | 878ef96e86145580c38c87f0410ad153
alice    | 99b1c2188db85afee403b1536010c2c9

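Consider this situation: many users choose simple passwords such as 123456, 888888, or password. An attacker can compute the MD5 of these common passwords in advance and build a reverse lookup table: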

'e10adc3949ba59abbe56e057f20f883e': '123456'
'21218cca77804d2ba1922c33e0151105': '888888'
'5f4dcc3b5aa765d61d8327deb882cf99': 'password'

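With this table no cracking is needed: comparing it against the MD5 values in the database reveals every account that uses a common password.

Users should of course avoid overly simple passwords. But can the program itself give simple passwords better protection?

Because the MD5 of a common password is easy to precompute, the stored value must not be one of those precomputed digests. This is done by combining the original password with a complex string before hashing, commonly known as "salting":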

hashlib.md5("salt".encode("utf8"))

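As long as the salt is unknown to the attacker, a salted MD5 is hard to reverse into the plain-text password even when the password is simple.

However, if two users choose the same simple password, such as 123456, the database stores two identical MD5 values, which reveals that the two passwords are the same. Can users with the same password be given different stored digests?

If we assume that a user cannot change their login name, the login name can be made part of the salt, so that users with the same password still get different MD5 values.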
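
A minimal sketch of that idea follows. The names SALT, stored_digest, login, and db are hypothetical, and the order in which password, username, and salt are concatenated is an assumption. MD5 is kept only for consistency with the text; deliberately slow functions such as hashlib.pbkdf2_hmac are generally preferred for real password storage.

```python
import hashlib

SALT = "the-Salt"   # hypothetical site-wide secret, not taken from the article

def stored_digest(username, password):
    """Digest of password + username + salt, so equal passwords differ per user."""
    data = (password + username + SALT).encode("utf-8")
    return hashlib.md5(data).hexdigest()

# Simulated user table: both users picked the same weak password
db = {name: stored_digest(name, "123456") for name in ("michael", "bob")}

def login(username, password):
    """Recompute the digest from the submitted password and compare with the stored one."""
    return db.get(username) == stored_digest(username, password)

print(db["michael"] != db["bob"])   # True: same password, different digests
print(login("bob", "123456"))       # True
print(login("bob", "abc999"))       # False
```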

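Digest algorithms are used widely. Note that a digest algorithm is not an encryption algorithm and cannot be used to encrypt (the plain text cannot be recovered from the digest); it can only detect tampering. Its one-way nature, however, makes it possible to verify a user's password without storing the plain text.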

5. hmac

Python's built-in hmac module implements the standard HMAC algorithm.

First prepare the message to be authenticated, a random key, and a hash algorithm (MD5 here). The code is:

import hmac
message = b'Hello world'
key = b'secret'

h = hmac.new(key,message,digestmod='MD5')
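# For a long message, call h.update(msg) repeatedly

print(h.hexdigest())

As you can see, hmac is used much like an ordinary hash algorithm, and its output has the same length as that of the underlying hash. Note that both key and message must be bytes; a str must first be encoded to bytes.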
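
A typical use of HMAC is checking that a message really comes from someone who holds the key. The following sketch uses SHA256 instead of MD5 and hmac.compare_digest for the comparison; sign and verify are hypothetical helper names.

```python
import hmac

def sign(key, message):
    """Return the hex HMAC-SHA256 signature of message under key (both bytes)."""
    return hmac.new(key, message, digestmod='sha256').hexdigest()

def verify(key, message, signature):
    """Recompute the signature and compare it without leaking timing information."""
    return hmac.compare_digest(sign(key, message), signature)

key = b'secret'
sig = sign(key, b'Hello world')
print(verify(key, b'Hello world', sig))        # True
print(verify(key, b'Hello w0rld', sig))        # False: the message was altered
print(verify(b'other', b'Hello world', sig))   # False: wrong key
```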

For comparison, hashlib without a salt: feeding the data in pieces gives the same digest as feeding it all at once.

import hashlib

md5_obj = hashlib.md5(b'hello')
md5_obj.update(b'world')
print(md5_obj.hexdigest())  # fc5e038d38a57032085441e7fe7010b0

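print(hashlib.md5(b'helloworld').hexdigest())  # fc5e038d38a57032085441e7fe7010b0

Both hashlib and hmac can compute a keyed ("salted") MD5 digest, but even with the same salt and data the two results differ, because HMAC mixes the key into the hash differently from simple concatenation.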

# coding:utf-8
import hmac
import hashlib

content = "hello world"
salt_str = "誰言寸草心,報得三春暉."

obj_md5 = hashlib.md5(salt_str.encode("utf-8"))
obj_md5.update(content.encode("utf-8"))
hashlib_md5 = obj_md5.hexdigest()
print(hashlib_md5)  # 051f2913990c618c0757118687f02354

hmac_md5 = hmac.new(salt_str.encode("utf-8"), content.encode("utf-8"), "md5").hexdigest()
print(hmac_md5)  # 9c1c1559002fd870a4fca899598ba408

5. The os module

The os module is an interface to the operating system.

'''
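os.getcwd()  get the current working directory, i.e. the directory the Python script is running in
os.chdir("dirname")  change the current working directory; equivalent to cd in a shell
os.curdir  the current directory string: ('.')
os.pardir  the parent directory string: ('..')
os.makedirs('dirname1/dirname2')    create nested directories recursively
os.removedirs('dirname1')    remove the directory if it is empty, then move up to the parent and remove it too if empty, and so on
os.mkdir('dirname')    create a single directory; equivalent to mkdir dirname in a shell
os.rmdir('dirname')    remove a single empty directory; raises an error if it is not empty; equivalent to rmdir dirname in a shell
os.listdir('dirname')    return a list of all files and subdirectories in the directory, including hidden files
os.remove()  delete a file
os.rename("oldname","newname")  rename a file or directory
os.stat('path/filename')  get information about a file or directory
os.sep    the path separator of the operating system: "\\" on Windows, "/" on Linux
os.linesep    the line terminator of the current platform: "\r\n" on Windows, "\n" on Linux
os.pathsep    the string that separates entries in a search path: ; on Windows, : on Linux
os.name    a string naming the current platform: 'nt' on Windows, 'posix' on Linux
os.system("bash command")  run a shell command; its output goes straight to the terminal
os.environ  the system environment variables
os.path.abspath(path)  return the normalized absolute form of path
os.path.split(path)  split path into a (directory, file name) pair
os.path.dirname(path)  return the directory part of path, i.e. the first element of os.path.split(path)
os.path.basename(path)  return the final file name of path, i.e. the second element of os.path.split(path); empty if path ends with / or \
os.path.exists(path)  True if path exists, False otherwise
os.path.isabs(path)  True if path is an absolute path
os.path.isfile(path)  True if path is an existing file, False otherwise
os.path.isdir(path)  True if path is an existing directory, False otherwise
os.path.join(path1[, path2[, ...]])  join several paths; components before an absolute path component are discarded
os.path.getatime(path)  last access time of the file or directory at path
os.path.getmtime(path)  last modification time of the file or directory at path
os.path.getsize(path)  size of path in bytes

os.getpid()  get the process ID
os.getppid()  get the parent process ID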
'''
os.urandom(20)  generate 20 random bytes
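
Several of the functions listed above can be tried safely inside a temporary directory. This is only an illustrative sketch: tempfile is used so that nothing outside the scratch directory is touched, and the names nested and data.txt are made up.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    nested = os.path.join(root, 'dirname1', 'dirname2')
    os.makedirs(nested)                         # create nested directories
    file_path = os.path.join(nested, 'data.txt')
    with open(file_path, 'w') as f:
        f.write('hello')

    head, tail = os.path.split(file_path)       # (directory, file name)
    listing = os.listdir(nested)
    size = os.path.getsize(file_path)
    is_file = os.path.isfile(file_path)
    is_dir = os.path.isdir(head)

    os.remove(file_path)                        # delete the file first
    os.rmdir(nested)                            # rmdir only works on an empty directory
    still_there = os.path.exists(nested)

print(tail, listing, size, is_file, is_dir, still_there)
# data.txt ['data.txt'] 5 True True False
```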

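Notes:

1. os.stat('path/filename') returns information about a file or directory.

Fields of the stat result:

st_mode: inode protection mode.
st_ino: inode number.
st_dev: device the inode resides on.
st_nlink: number of links to the inode.
st_uid: user ID of the owner.
st_gid: group ID of the owner.
st_size: size of a regular file in bytes; for some special files, the amount of data waiting.
st_atime: time of last access.
st_mtime: time of last modification.
st_ctime: the "ctime" reported by the operating system. On some systems (such as Unix) it is the time of the last metadata change; on others (such as Windows) it is the creation time (see the platform documentation for details).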

import os
a = os.stat(r"D:\WeChat\WeChat.exe").st_size
print(a)

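os.popen(cmd, mode='r', buffering=-1)

cmd: the command to run.
mode: the mode of the returned file object, 'r' by default, used as with open().
buffering: 1 means line-buffered; other positive values give the buffer size. A negative value uses the system default, which is usually line-buffered for tty devices and fully buffered for other files. In Python 2 (where the parameter was called bufsize), 0 meant unbuffered; Python 3's os.popen does not accept 0.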

#!/usr/bin/python
# -*- coding: UTF-8 -*-

import os, sys

# run the mkdir command
a = 'mkdir nwdir'

b = os.popen(a,'r',1)

print(b)

------------------------------
open file 'mkdir nwdir', mode 'r' at 0x81614d0
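
Printing the object returned by os.popen only shows the file object itself. To get what the command printed, read from it. A small sketch, assuming a shell in which echo is available:

```python
import os

with os.popen('echo hello') as pipe:   # the returned object behaves like a text file
    output = pipe.read()               # everything the command wrote to stdout

print(output.strip())                  # hello
```

For anything beyond a quick one-liner, the subprocess module gives finer control over arguments, errors, and exit codes.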

Recursively deleting a non-empty directory

import os
def remove_dir(dir):
    dir = dir.replace('\\', '/')
    if(os.path.isdir(dir)):
        for p in os.listdir(dir):
            remove_dir(os.path.join(dir,p))
        if(os.path.exists(dir)):
            os.rmdir(dir)
    else:
        if(os.path.exists(dir)):
            os.remove(dir)
if __name__ == '__main__':
    remove_dir(r'D:/python/practice/') # example call



The standard library function shutil.rmtree removes a whole directory tree in one call:

import shutil

path = r'g:\zhidao'
shutil.rmtree(path)

os.urandom

import os
import base64

# generate 32 random bytes
a = os.urandom(32)
# encode them as Base64
base64.b64encode(a)

>>>
b'2QDq4HSpT8U4W6iZ2xDzGW3CcY2WVsJXVEwYv0qludY='

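6. The sys module

The sys module provides functions and variables for working with different parts of the Python runtime environment.

sys.argv           list of command-line arguments; the first element is the path of the program itself
sys.exit(n)        exit the program; use exit(0) for a normal exit
sys.version        version information of the Python interpreter
sys.maxsize        largest value a Py_ssize_t can hold (sys.maxint exists only in Python 2)
sys.path           module search path, initialized from the PYTHONPATH environment variable
sys.platform       name of the operating system platform
sys.stdout.write("sdads")    write a string to standard output

sys.modules   a dictionary mapping module names to module objects; it determines whether an import has to load the module again.
sys.setrecursionlimit()   set the maximum recursion depth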

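7. The logging module

7.1 Simple function-based configuration

import logging
logging.debug('debug message')
logging.info('info message')
logging.warning('warning message')
logging.error('error message')
logging.critical('critical message')

By default the logging module prints to standard error (the console) and shows only messages at WARNING level or above. In other words the default level is WARNING (levels: CRITICAL > ERROR > WARNING > INFO > DEBUG), and the default format is level:logger name:message.

Flexible configuration of the level, format, and destination: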

import logging  
logging.basicConfig(level=logging.DEBUG,  
                    format='%(asctime)s %(filename)s[line:%(lineno)d] %(levelname)s %(message)s',  
                    datefmt='%Y-%m-%d,%H:%M:%S',
                    filename='/tmp/test.log',  
                    filemode='w')  
  
logging.debug('debug message')  
logging.info('info message')  
logging.warning('warning message')  
logging.error('error message')  
logging.critical('critical message')


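Configuration parameters:

The arguments to logging.basicConfig() change the default behavior of the logging module. The available arguments include:

filename: creates a FileHandler with the given file name, so that logs are stored in that file.
filemode: the file open mode, used when filename is given; the default is 'a', and 'w' may also be specified.
format: the log format used by the handler.
datefmt: the date and time format.
level: the level of the root logger (explained later).
stream: creates a StreamHandler with the given stream, such as sys.stderr, sys.stdout, or an open file (f=open('test.log','w')); the default is sys.stderr. filename and stream are incompatible: Python 2 ignored stream when both were given, while Python 3 raises ValueError.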

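Fields that can be used in the format argument:
%(name)s Name of the logger
%(levelno)s Numeric log level
%(levelname)s Text log level
%(pathname)s Full path of the module that issued the logging call, if available
%(filename)s File name of the module that issued the logging call
%(module)s Name of the module that issued the logging call
%(funcName)s Name of the function that issued the logging call
%(lineno)d Line number of the statement that issued the logging call
%(created)f Time the record was created, as a UNIX timestamp (floating-point)
%(relativeCreated)d Time the record was created, in milliseconds since the logging module was loaded
%(asctime)s Time the record was created, as a string. The default format looks like "2003-07-08 16:49:45,896"; the number after the comma is milliseconds
%(thread)d Thread ID, if available
%(threadName)s Thread name, if available
%(process)d Process ID, if available
%(message)s The message supplied by the user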

Disabling logging

Once a program is debugged, you may not want all these log messages on the screen. logging.disable() suppresses them, so you do not have to go through the program and remove the logging calls by hand. Pass a level to logging.disable() and it suppresses all messages at that level and below. To disable logging entirely, add logging.disable(logging.CRITICAL) to the program.

import logging
logging.basicConfig(level=logging.INFO, format=' %(asctime)s - %(levelname)s - %(message)s')
logging.critical('Critical error! Critical error!')
>>>2015-05-22 11:10:48,054 - CRITICAL - Critical error! Critical error!
logging.disable(logging.CRITICAL)
logging.critical('Critical error! Critical error!')
logging.error('Error! Error!')

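Because logging.disable() suppresses every message logged after it is called, you may want to put it near the import logging line. That makes it easy to find, and to comment out or restore, whenever you want to turn log messages off or on.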
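
The effect can be observed without touching the console by logging into an in-memory stream. In this sketch the logger name disable_demo is hypothetical, and logging.disable(logging.NOTSET) is used to switch logging back on.

```python
import io
import logging

stream = io.StringIO()
demo_logger = logging.getLogger('disable_demo')    # hypothetical logger name
demo_logger.setLevel(logging.DEBUG)
demo_logger.propagate = False                      # keep demo output away from the root logger
demo_logger.addHandler(logging.StreamHandler(stream))

demo_logger.error('before disable')
logging.disable(logging.CRITICAL)                  # suppress CRITICAL and everything below
demo_logger.critical('while disabled')
logging.disable(logging.NOTSET)                    # re-enable logging
demo_logger.error('after re-enable')

captured = stream.getvalue().splitlines()
print(captured)   # ['before disable', 'after re-enable']
```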

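# basicConfig is simple but can do relatively little.

# Garbled Chinese output: the author had not found a fix at the time of writing (since Python 3.9, basicConfig accepts an encoding argument).

# It cannot write to a file and to the screen at the same time.

# Configuring logger objects is slightly more complex but can do much more.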

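7.2 Configuring logger objects

Printing logs to both the screen and a log file requires a little more background.

Logging in Python involves four main classes. The summary in the official documentation puts it best:

logger: exposes the interface that application code uses directly;
handler: sends the log records (created by loggers) to the appropriate destination;
filter: provides a finer-grained facility for deciding which log records to output;
formatter: specifies the final layout of log records.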

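logger:

Every program obtains a Logger before it emits any messages. A Logger usually corresponds to a module of the program.

For example, the GUI module of a chat tool could obtain its Logger like this:

LOG=logging.getLogger("chat.gui")

and the core module like this:
LOG=logging.getLogger("chat.kernel")

Logger.setLevel(lel): sets the lowest level handled; messages below lel are ignored. debug is the lowest built-in level and critical the highest
Logger.addFilter(filt), Logger.removeFilter(filt): add or remove the given filter
Logger.addHandler(hdlr), Logger.removeHandler(hdlr): add or remove the given handler
Logger.debug(), Logger.info(), Logger.warning(), Logger.error(), Logger.critical(): log a message at the corresponding level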

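handler

A handler is responsible for sending log records to a particular destination. Python's logging system offers many handlers: some write to the console, some write to a file, and some send records over the network. If none of them is enough, you can write your own. Several handlers can be attached to a logger with addHandler().

Handler.setLevel(lel): sets the level of records this handler processes; records below lel are ignored
Handler.setFormatter(): selects the formatter for this handler
Handler.addFilter(filt), Handler.removeFilter(filt): add or remove a filter object

Each Logger can have several handlers attached. Some commonly used handlers:

1) logging.StreamHandler
This handler writes to any file-like object such as sys.stdout or sys.stderr. Its constructor is:

StreamHandler([strm])
strm is a file object; the default is sys.stderr.

2) logging.FileHandler
Like StreamHandler, but it writes log records to a file, which FileHandler opens for you. Its constructor is:

FileHandler(filename[,mode])
filename is the file name and is required.
mode is the file open mode; see the built-in open() function. The default is 'a', which appends to the end of the file.

3) logging.handlers.RotatingFileHandler
This handler is like FileHandler but also manages the file size. When the file reaches a certain size, the handler renames the current log file and creates a new file with the original name to continue logging. For example, with a log file named chat.log, once chat.log reaches the size limit RotatingFileHandler renames it to chat.log.1. If chat.log.1 already exists, it is first renamed to chat.log.2, and so on. Finally chat.log is created again and logging continues. Its constructor is:

RotatingFileHandler( filename[, mode[, maxBytes[, backupCount]]])
filename and mode are the same as for FileHandler.
maxBytes is the maximum size of the log file. If maxBytes is 0, the file may grow without limit and the renaming described above never happens.
backupCount is the number of backup files to keep. With a value of 2, for example, an existing chat.log.2 is not renamed during rollover but deleted.

4) logging.handlers.TimedRotatingFileHandler
This handler is like RotatingFileHandler, except that it starts a new log file at fixed time intervals rather than by file size. Renaming works in a similar way, but the suffix added to old files is a timestamp rather than a number. Its constructor is:

TimedRotatingFileHandler( filename [,when [,interval [,backupCount]]])
filename and backupCount have the same meaning as for RotatingFileHandler.
interval is the length of the interval.

backupCount is the number of files to keep.

when is a string giving the unit of the interval; it is case-insensitive. Possible values:
S seconds
M minutes
H hours
D days
W0-W6 a day of the week (W0 is Monday)
midnight roll over at midnight every day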
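
The rollover behavior of RotatingFileHandler can be observed with a deliberately tiny maxBytes. This is a sketch under assumed values: the logger name rotation_demo, the 200-byte limit, and the temporary directory are all illustrative.

```python
import logging
import os
import shutil
import tempfile
from logging.handlers import RotatingFileHandler

log_dir = tempfile.mkdtemp()                      # scratch directory for the demo
log_path = os.path.join(log_dir, 'chat.log')

rot_logger = logging.getLogger('rotation_demo')   # hypothetical logger name
rot_logger.setLevel(logging.INFO)
rot_logger.propagate = False
handler = RotatingFileHandler(log_path, maxBytes=200, backupCount=2)
rot_logger.addHandler(handler)

for i in range(50):                               # about 900 bytes in total, so several rollovers
    rot_logger.info('message number %02d', i)

handler.close()
rot_logger.removeHandler(handler)
files = sorted(os.listdir(log_dir))
shutil.rmtree(log_dir)                            # clean up the scratch directory
print(files)   # ['chat.log', 'chat.log.1', 'chat.log.2']
```

Although more than two rollovers take place, backupCount=2 means no chat.log.3 is ever kept.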

Example 1:

import logging

logger = logging.getLogger()
# create a handler that writes to a log file
fh = logging.FileHandler('test.log', encoding='utf-8')

# create another handler that writes to the console
ch = logging.StreamHandler()
formatter = logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s')
fh.setLevel(logging.DEBUG)

fh.setFormatter(formatter)
ch.setFormatter(formatter)
logger.addHandler(fh) # a logger can have several handlers, such as fh and ch
logger.addHandler(ch)

logger.debug('logger debug message')
logger.info('logger info message')
logger.warning('logger warning message')
logger.error('logger error message')
logger.critical('logger critical message')

Example 2:

import logging
 
#create logger
logger = logging.getLogger('TEST-LOG')
logger.setLevel(logging.DEBUG)
 
 
# create console handler and set level to debug
ch = logging.StreamHandler()
ch.setLevel(logging.DEBUG)
 
# create file handler and set level to warning
fh = logging.FileHandler("access.log")
fh.setLevel(logging.WARNING)
# create formatter
formatter = logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s')
 
# add formatter to ch and fh
ch.setFormatter(formatter)
fh.setFormatter(formatter)
 
# add ch and fh to logger
logger.addHandler(ch)
logger.addHandler(fh)
 
# 'application' code
logger.debug('debug message')
logger.info('info message')
logger.warning('warn message')
logger.error('error message')
logger.critical('critical message')

See also: log rotation and deleting expired logs

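8. Serialization modules

8.1 What is serialization?

Serialization is the process of converting a dictionary, list, or similar object into a string.

More generally, turning an object (variable) in memory into a form that can be stored or transmitted is called serialization. The serialized content can then be written to disk or sent over the network to another machine. The reverse, reading serialized content back into memory as an object, is called deserialization (unpickling, in pickle's terms).

8.2 Why have serialization modules?

Suppose data computed in one piece of Python code is needed by another program. How do we hand it over?
The obvious approach is to save it in a file and have the other Python program read it back.
But a file has no concept of a dictionary, so the data must be converted to a string before it is written.
You may ask: converting a dictionary to a string is easy, str(dic) does it, so why learn the serialization modules?
True, going from dic to str(dic) is a form of serialization, and str(dic) turns a dictionary named dic into a string.
But how do you turn that string back into a dictionary?
You have probably thought of eval(): pass it the string str_dic and it returns a dictionary.
eval() is very powerful. The official description says it evaluates a string as a Python expression and returns the result.
But that power has a price: security is its biggest weakness.
Imagine that what we read from the file is not a data structure but a destructive statement such as "delete files"; the consequences would be disastrous.
Using eval means accepting that risk.
So eval is not recommended for deserialization (converting a str to a Python data structure).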
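
For the narrow case of reading back the output of str(dic), the standard library offers ast.literal_eval, which accepts only Python literals. It is shown here as an aside that the article itself does not mention; the variable names are illustrative.

```python
import ast

str_dic = str({'k1': 'v1', 'k2': [1, 2, 3]})
dic = ast.literal_eval(str_dic)    # parses literals only, never runs code
print(dic)                         # {'k1': 'v1', 'k2': [1, 2, 3]}

try:
    ast.literal_eval("__import__('os').getcwd()")   # a function call is not a literal
    rejected = False
except ValueError:
    rejected = True
print(rejected)                    # True
```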

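8.3 Purposes of serialization

1. To persist custom objects in some storage format;

2. To pass an object from one place to another;

3. To make programs easier to maintain.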

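8.4 The json module

To pass objects between different programming languages, they must be serialized into a standard format such as XML. A better choice is JSON: it is just a string, it can be read by every language, and it is easy to store on disk or send over a network. JSON is not only a standard format but is also faster to process than XML and can be read directly in web pages.

The objects JSON can represent are a subset of standard JavaScript objects. JSON types correspond to Python built-in types as follows: object and dict, array and list, string and str, number and int or float, true/false and True/False, null and None.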
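
The correspondence between JSON and Python types can be checked with a short round trip. The dictionary below is made-up sample data.

```python
import json

py_obj = {"s": "text", "n": 1, "f": 1.5, "t": True, "none": None,
          "lst": [1, 2], "tup": (1, 2)}

text = json.dumps(py_obj)
print(text)
# {"s": "text", "n": 1, "f": 1.5, "t": true, "none": null, "lst": [1, 2], "tup": [1, 2]}

back = json.loads(text)
print(type(back["tup"]))                         # <class 'list'>
print(back["none"] is None, back["t"] is True)   # True True
```

Note that the tuple comes back as a list: JSON has only one array type.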

The json module provides four functions: dumps, dump, loads, load

loads and dumps

import json
dic = {'k1':'v1','k2':'v2','k3':'v3'}
str_dic = json.dumps(dic)  # serialize: convert a dictionary to a string
print(type(str_dic),str_dic)  #<class 'str'> {"k1": "v1", "k2": "v2", "k3": "v3"}
# note: strings inside the JSON text are always written with double quotes

dic2 = json.loads(str_dic)  # deserialize: convert a JSON string back to a dictionary
# note: strings in the text passed to json.loads must use double quotes
print(type(dic2),dic2)  #<class 'dict'> {'k1': 'v1', 'k2': 'v2', 'k3': 'v3'}


list_dic = [1,['a','b','c'],3,{'k1':'v1','k2':'v2'}]
str_dic = json.dumps(list_dic) # nested data types work too
print(type(str_dic),str_dic) #<class 'str'> [1, ["a", "b", "c"], 3, {"k1": "v1", "k2": "v2"}]
list_dic2 = json.loads(str_dic)
print(type(list_dic2),list_dic2) #<class 'list'> [1, ['a', 'b', 'c'], 3, {'k1': 'v1', 'k2': 'v2'}]

load and dump

import json
f = open('json_file','w')
dic = {'k1':'v1','k2':'v2','k3':'v3'}
json.dump(dic,f)  # dump takes a file object and writes the dictionary to it as a JSON string
f.close()

f = open('json_file')
dic2 = json.load(f)  # load takes a file object and returns the data structure parsed from its JSON text
f.close()
print(type(dic2),dic2)

The ensure_ascii keyword argument

import json
f = open('file','w',encoding='utf-8')
json.dump({'國籍':'中國'},f)
ret = json.dumps({'國籍':'中國'})
f.write(ret+'\n')
json.dump({'國籍':'美國'},f,ensure_ascii=False)
ret = json.dumps({'國籍':'美國'},ensure_ascii=False)
f.write(ret+'\n')
f.close()

Other parameters

json.dumps serializes obj to a JSON-formatted str.
skipkeys: False by default. If a dict key is not a basic Python type (str, int, float, bool, None), dumps raises TypeError; with skipkeys=True such keys are skipped instead.
ensure_ascii: True by default, in which case every non-ASCII character is written as a \uXXXX escape. Pass ensure_ascii=False when dumping to store Chinese text in readable form.
check_circular: if false, the circular reference check for container types is skipped, and a circular reference results in a RecursionError (or worse).
allow_nan: if false, serializing out-of-range float values (nan, inf, -inf) raises ValueError in strict compliance with the JSON specification, instead of using the JavaScript equivalents (NaN, Infinity, -Infinity).
indent: a non-negative integer. With 0, each item starts on a new line with no indentation; with None (the default), the output is a single compact line; otherwise items are placed on separate lines, indented by that many spaces per level. Output of this kind is called pretty-printed JSON.
separators: a tuple (item_separator, key_separator). The default is (', ', ': ') when indent is None; items are separated by the first string and keys from values by the second.
default: a function default(obj) that returns a serializable version of obj or raises TypeError. The default implementation simply raises TypeError.
sort_keys: sort the output by dictionary key.
cls: to use a custom JSONEncoder subclass (for example, one that overrides .default() to serialize additional types), pass it as cls; otherwise JSONEncoder is used.
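
As an illustration of the default parameter, the sketch below serializes a datetime, which json cannot handle on its own. The function name encode_extra and the choice of ISO 8601 text are assumptions made for the example.

```python
import json
from datetime import datetime

def encode_extra(obj):
    """Hypothetical fallback, called by json for objects it cannot serialize itself."""
    if isinstance(obj, datetime):
        return obj.isoformat()
    raise TypeError(f"{type(obj).__name__} is not JSON serializable")

record = {"name": "michael", "created": datetime(2015, 4, 19, 12, 20)}
text = json.dumps(record, default=encode_extra, sort_keys=True, indent=2)
print(text)
# {
#   "created": "2015-04-19T12:20:00",
#   "name": "michael"
# }
```

Types the fallback does not recognize still raise TypeError, which keeps unexpected objects from being written silently.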


Pretty-printing JSON

import json
data = {'username':['李華','二愣子'],'sex':'male','age':16}
json_dic2 = json.dumps(data,sort_keys=True,indent=2,separators=(',',':'),ensure_ascii=False)
print(json_dic2)

Writing to a JSON file more than once

import json
# json dump / load
dic = {1:"中國",2:'b'}
f = open('fff','w',encoding='utf-8')
json.dump(dic,f,ensure_ascii=False)
json.dump(dic,f,ensure_ascii=False)
f.close()
f = open('fff',encoding='utf-8')
res1 = json.load(f)
res2 = json.load(f)
f.close()
print(type(res1),res1)
print(type(res2),res2)

----------------------------------------------------------------
Traceback (most recent call last):
  File "C:/Users/Administrator/Desktop/py/sss.py", line 13, in <module>
    res1 = json.load(f)
  File "C:\Users\Administrator\AppData\Local\Programs\Python\Python37\lib\json\__init__.py", line 296, in load
    parse_constant=parse_constant, object_pairs_hook=object_pairs_hook, **kw)
  File "C:\Users\Administrator\AppData\Local\Programs\Python\Python37\lib\json\__init__.py", line 348, in loads
    return _default_decoder.decode(s)
  File "C:\Users\Administrator\AppData\Local\Programs\Python\Python37\lib\json\decoder.py", line 340, in decode
    raise JSONDecodeError("Extra data", s, end)
json.decoder.JSONDecodeError: Extra data: line 1 column 22 (char 21)

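Writing works and can be repeated, but afterwards the file contains {"1": "中國", "2": "b"}{"1": "中國", "2": "b"}, two documents back to back, which json.load cannot parse.

To write and read in several steps, do the following: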

import json
l = [{'k':'111'},{'k2':'111'},{'k3':'111'}]
f = open('file','w')
for dic in l:
    str_dic = json.dumps(dic)
    f.write(str_dic+'\n')  # one JSON document per line
f.close()

f = open('file')
l = []
for line in f:
    dic = json.loads(line.strip())  # parse each line separately
    l.append(dic)
f.close()
print(l)

Multi-step read and write, example 2

import json

dic = {1:"中國",2:'b'}
f = open('fff','w',encoding='utf-8')
json.dump(dic,f,ensure_ascii=False)
f.write("\n") # a newline separator is enough; read the file back line by line
json.dump(dic,f,ensure_ascii=False)
f.close()
f = open('fff',encoding='utf-8')
results = []
for line in f:
    results.append(json.loads(line))  # one document per line
f.close()
res1, res2 = results
print(type(res1),res1)
print(type(res2),res2)

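8.5 The pickle module

Two modules are used for serialization:

json converts between strings and Python data types
pickle converts between bytes in a Python-specific format and Python data types

pickle provides four functions: dumps and dump (serialize, write), loads and load (deserialize, read). It can serialize not only dictionaries and lists but almost any Python data type.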

##---------------------------- serialize
import pickle

dic={'name':'alvin','age':23,'sex':'male'}

print(type(dic))#<class 'dict'>

j=pickle.dumps(dic)
print(type(j))#<class 'bytes'>


f=open('序列化對象_pickle','wb')# 'w' writes str and 'wb' writes bytes; j is bytes
f.write(j)  #------------------- equivalent to pickle.dump(dic,f)

f.close()
#------------------------- deserialize
import pickle
f=open('序列化對象_pickle','rb')

data=pickle.loads(f.read())# equivalent to data=pickle.load(f)

print(data['age'])

pickle supports multiple writes and reads on one file; objects are read back in the order they were written

import pickle
import time
struct_time1  = time.localtime(1000000000)
struct_time2  = time.localtime(2000000000)
f = open('pickle_file','wb')
pickle.dump(struct_time1,f)
pickle.dump(struct_time2,f)
f.close()
f = open('pickle_file','rb')
struct_time1 = pickle.load(f)
struct_time2 = pickle.load(f)
print(struct_time1.tm_year)
print(struct_time2.tm_year)
f.close()
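
When the number of pickled objects in a file is not known in advance, one common pattern is to keep calling pickle.load until it raises EOFError. The helper name load_all is hypothetical, and io.BytesIO stands in for a real file here.

```python
import io
import pickle

def load_all(f):
    """Yield every object stored in a pickle file, in the order it was dumped."""
    while True:
        try:
            yield pickle.load(f)
        except EOFError:        # raised when no more pickled objects remain
            return

buffer = io.BytesIO()           # stands in for a file opened in 'wb' / 'rb' mode
for obj in ({'k': 1}, [1, 2, 3], 'text'):
    pickle.dump(obj, buffer)

buffer.seek(0)                  # rewind before reading
objects = list(load_all(buffer))
print(objects)   # [{'k': 1}, [1, 2, 3], 'text']
```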

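8.6 The shelve module

8.6.1 Why use shelve when json and pickle already exist?

With json or pickle you can dump to a file several times, but neither gives keyed access to what was stored: json cannot load a file holding several concatenated documents, pickle returns objects only in the order they were written, and reopening the file in 'w' mode overwrites the earlier data. To store several objects and fetch each one by name, use the shelve module.

8.6.2 Characteristics of shelve

shelve is a simple storage scheme, similar to a key-value database, that makes it easy to save Python objects. Internally it serializes data with the pickle protocol. shelve has a single open() function, which opens the given file (a persistent dictionary) and returns a shelf object. A shelf is a persistent, dictionary-like object.

shelve can be seen as an upgrade of pickle: it persists every data type pickle supports, and its interface is simpler and more convenient;
in shelve, keys must be strings, while values can be any Python object that pickle can handle.
shelve gives us only the open function; data is accessed by key, much like a dictionary, including methods such as get.
shelve uses the dbm module (anydbm in Python 2) to create the database and manage the persisted objects.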

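8.6.3 Using shelve

Storing and retrieving content

import shelve
f = shelve.open('shelve_file')
f['key'] = {'int':10, 'float':9.5, 'string':'Sample data'}  # assign through the shelf object to store data
f.close()

import shelve
f1 = shelve.open('shelve_file')
existing = f1['key']  # read data back by key; a missing key raises KeyError
f1.close()
print(existing)

shelve has one limitation: it does not support several applications writing to the same DB (file) at the same time.
If you only need to read, pass flag='r' instead of the default to open the DB (file) read-only.

Note: in the author's tests, the 'r' mode took effect under Python 2.7.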

import shelve
f = shelve.open('shelve_file', flag='r')
existing = f['key']
print(existing)

f.close()

f = shelve.open('shelve_file', flag='r')
existing2 = f['key']
f.close()
print(existing2)

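Normally, after opening a shelf, a value stored under a key can only be replaced as a whole; modifying it in place has no effect.

The reason: reading a key from the shelf returns a copy of the stored value. Changes made to that copy do not affect the stored value unless the copy is written back, whereas assigning a new value to the key does overwrite what was stored.

Because shelve by default does not track changes to persisted objects, the default arguments of shelve.open() must be changed; otherwise modifications to an object are not saved.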

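import shelve
# Without writeback, the in-place change below is made on a copy and is lost:
# f1 = shelve.open('shelve_file')
# print(f1['key'])
# f1['key']['new_value'] = 'this was not here before'
# f1.close()

# With writeback=True, the change is written back when the shelf is closed
f2 = shelve.open('shelve_file', writeback=True)
print(f2['key'])
f2['key']['new_value'] = 'this was not here before'
f2.close()

writeback has advantages and drawbacks. The advantage is that it reduces the chance of mistakes and makes persistence more transparent to the user. It is not needed in every case, however: with writeback enabled, the shelf keeps every accessed entry in an in-memory cache, which costs extra memory, and on close() every cached object is written to the DB, which costs extra time. shelve has no way of knowing which cached objects were modified and which were not, so all of them are written.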
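
The difference between the two modes can be checked in a temporary directory. This sketch uses a hypothetical file name, shelve_demo, and relies on shelf objects working as context managers (Python 3.4 and later).

```python
import os
import shelve
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'shelve_demo')      # hypothetical file name

    with shelve.open(path) as db:
        db['key'] = {'int': 10}

    with shelve.open(path) as db:                # default: writeback=False
        db['key']['lost'] = True                 # modifies a temporary copy only

    with shelve.open(path) as db:
        without_writeback = dict(db['key'])

    with shelve.open(path, writeback=True) as db:
        db['key']['kept'] = True                 # cached, then written back on close

    with shelve.open(path) as db:
        with_writeback = dict(db['key'])

print(without_writeback)   # {'int': 10}
print(with_writeback)      # {'int': 10, 'kept': True}
```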

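9. The collections module

In addition to the built-in data types (dict, list, set, tuple), the collections module provides several more: Counter, deque, defaultdict, namedtuple, OrderedDict, and others.

1. namedtuple: creates tuples whose elements can be accessed by name

2. deque: a double-ended queue that supports fast appends and pops at both ends

3. Counter: a counter, mainly used for counting

4. OrderedDict: an ordered dictionary

5. defaultdict: a dictionary with default values

9.1 namedtuple

A tuple represents an immutable collection. For example, the two-dimensional coordinates of a point can be written as:

>>> p = (1, 2)

But looking at (1, 2), it is hard to tell that the tuple represents a coordinate.

This is where namedtuple comes in: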

>>> from collections import namedtuple
>>> Point = namedtuple('Point', ['x', 'y'])
>>> p = Point(1, 2)
>>> p.x
1
>>> p.y
2

Similarly, a circle described by its center coordinates and radius can be defined with namedtuple:

# namedtuple('name', [list of field names]):
Circle = namedtuple('Circle', ['x', 'y', 'r'])
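
A brief sketch of working with the resulting class; the sample values are made up.

```python
from collections import namedtuple

Circle = namedtuple('Circle', ['x', 'y', 'r'])

c = Circle(0, 0, r=2)
x, y, r = c                      # unpacks like an ordinary tuple
bigger = c._replace(r=5)         # tuples are immutable, so _replace returns a new one

print(c)                         # Circle(x=0, y=0, r=2)
print(bigger.r, c.r)             # 5 2
print(dict(c._asdict()))         # {'x': 0, 'y': 0, 'r': 2}
print(isinstance(c, tuple))      # True
```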

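9.2 deque

A list gives fast access by index, but inserting or deleting elements is slow: a list is stored contiguously, so inserting or removing anywhere but the end shifts the elements after it, which becomes costly for large lists.

deque is a double-ended list built for efficient insertion and deletion at both ends, which makes it suitable for queues and stacks:

Queue
import queue
q = queue.Queue()
q.put([1,2,3])
q.put(5)
q.put(6)
print(q)
print(q.get())
print(q.get())
print(q.get())
print(q.qsize())   # 0: the queue is now empty
# print(q.get())   # a fourth get() would block, because the queue is empty

from collections import deque
dq = deque([1,2])
dq.append('a')   # add at the right end: [1,2,'a']
dq.appendleft('b') # add at the left end: ['b',1,2,'a']
dq.insert(2,3)    # ['b',1,3,2,'a']
print(dq.pop())      # take from the right end
print(dq.pop())      # take from the right end
print(dq.popleft())  # take from the left end
print(dq)

Besides list's append() and pop(), deque supports appendleft() and popleft(), so elements can be added to or removed from the head very efficiently.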
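
Two further deque features that are often useful are a bounded length and rotation. A small sketch with made-up data:

```python
from collections import deque

recent = deque(maxlen=3)         # keeps only the 3 most recent items
for n in range(1, 6):
    recent.append(n)             # once full, the oldest item falls off the left end
print(list(recent))              # [3, 4, 5]

ring = deque(['a', 'b', 'c', 'd'])
ring.rotate(1)                   # shift every element one step to the right
print(list(ring))                # ['d', 'a', 'b', 'c']

stack = deque()
stack.append(1)
stack.append(2)
top = stack.pop()                # append + pop gives last-in, first-out
print(top)                       # 2
```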

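9.3 OrderedDict

Before Python 3.7, the keys of a dict were unordered, so the order of keys during iteration could not be relied on. (From Python 3.7 on, a plain dict also preserves insertion order.)

To keep the keys in order, use OrderedDict:

>>> from collections import OrderedDict
>>> d = dict([('a', 1), ('b', 2), ('c', 3)])
>>> d # on older Python versions, dict keys are unordered
{'a': 1, 'c': 3, 'b': 2}
>>> od = OrderedDict([('a', 1), ('b', 2), ('c', 3)])
>>> od # OrderedDict keys are ordered
OrderedDict([('a', 1), ('b', 2), ('c', 3)])

Note that OrderedDict keys are kept in insertion order, not sorted by key:

>>> od = OrderedDict()
>>> od['z'] = 1
>>> od['y'] = 2
>>> od['x'] = 3
>>> list(od.keys()) # keys are returned in insertion order
['z', 'y', 'x']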

9.4 defaultdict

Given the collection of values [11,22,33,44,55,66,77,88,99,90...], store every value greater than 66 under the first key of a dictionary and the remaining values under the second key.

That is: { 'k1' : values greater than 66 , 'k2' : values of 66 or less }

Plain dictionary
values = [11, 22, 33,44,55,66,77,88,99,90]

my_dict = {}

for value in  values:
    if value>66:
        if 'k1' in my_dict:
            my_dict['k1'].append(value)
        else:
            my_dict['k1'] = [value]
    else:
        if 'k2' in my_dict:
            my_dict['k2'].append(value)
        else:
            my_dict['k2'] = [value]

defaultdict

from collections import defaultdict

values = [11, 22, 33,44,55,66,77,88,99,90]

my_dict = defaultdict(list)

for value in  values:
    if value>66:
        my_dict['k1'].append(value)
    else:
        my_dict['k2'].append(value)

With a dict, referencing a key that does not exist raises KeyError. To get a default value for missing keys instead, use defaultdict:

>>> from collections import defaultdict
>>> dd = defaultdict(lambda: 'N/A')
>>> dd['key1'] = 'abc'
>>> dd['key1'] # key1存在
'abc'
>>> dd['key2'] # key2不存在，返回默認值
'N/A'

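9.5 Counter

The Counter class tracks how many times each value occurs. It is a container that stores elements as dictionary keys and their counts as values. Counts can be any integer, including zero and negative numbers. Counter is similar to bags or multisets in other languages.

from collections import Counter
c = Counter('abcdeabcdabcaba')
print(c)
Output: Counter({'a': 5, 'b': 4, 'c': 3, 'd': 2, 'e': 1})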
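
A few of the operations that make Counter convenient, shown on the same string; the second example sentence is made up.

```python
from collections import Counter

c = Counter('abcdeabcdabcaba')
print(c.most_common(2))          # [('a', 5), ('b', 4)]
print(c['z'])                    # 0: a missing element counts as zero, no KeyError

c.update('zz')                   # add more observations
c.subtract('aaaaaa')             # counts may drop to zero or below
print(c['z'], c['a'])            # 2 -1

words = Counter('to be or not to be'.split())
print(words['to'], words['be'], words['or'])   # 2 2 1
```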

For more details, see http://www.cnblogs.com/Eva-J/articles/7291842.html

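10. The re module

import re

ret = re.findall('a', 'eva egon yuan')  # return every match as a list
print(ret) # result: ['a', 'a']

ret = re.search('a', 'eva egon yuan').group()
print(ret) # result: 'a'
# search scans the string and stops at the first match, returning a match object;
# call group() on it to get the matched text. If nothing matches, search returns None.

ret = re.match('a', 'abc').group()
print(ret)
# match only matches at the beginning of the string. If the pattern matches there,
# it returns a match object whose text is read with group(); otherwise it returns None,
# and calling group() on None raises an error
# result: 'a'

ret = re.split('[ab]', 'abcd')  # split on 'a' to get '' and 'bcd', then split each on 'b'
print(ret)  # ['', '', 'cd']

ret = re.sub(r'\d', 'H', 'eva3egon4yuan4', 1)  # replace digits with 'H'; the argument 1 limits it to one replacement
print(ret) #evaHegon4yuan4

ret = re.subn(r'\d', 'H', 'eva3egon4yuan4')  # replace digits with 'H'; returns a tuple (result, number of replacements)
print(ret)  # result: ('evaHegonHyuanH', 3)

obj = re.compile(r'\d{3}')  # compile the regular expression into a pattern object; it matches three digits
ret = obj.search('abc123eeee') # call search on the pattern object with the string to search
print(ret.group())  # result: 123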

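flags has many possible values:

re.I (IGNORECASE): ignore case; the full name is given in parentheses
re.M (MULTILINE): multi-line mode; changes the behavior of ^ and $
re.S (DOTALL): the dot matches any character, including newline
re.L (LOCALE): locale-dependent matching; \w, \W, \b, \B, \s, \S depend on the current locale. Not recommended
re.U (UNICODE): \w, \W, \s, \S, \d, \D follow the Unicode character properties. This is the default in Python 3
re.X (VERBOSE): verbose mode; the pattern string may span several lines, whitespace is ignored, and comments can be added

phoneRegex = re.compile(r"""(
    (\d{3}|\(\d{3}\))?  # comment 1
    (\s|-|\.)?          # comment 2
    ......              # comment 3
    )""",re.VERBOSE)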
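
Since the pattern above is abbreviated, here is a complete verbose-mode pattern as a separate illustration. The phone-number layout (three digits, three digits, four digits) and the name phone_regex are assumptions made for this example, not a reconstruction of the elided part.

```python
import re

# Hypothetical pattern for numbers such as 415-555-1234 or (415) 555-9999
phone_regex = re.compile(r"""
    (\d{3}|\(\d{3}\))?   # optional area code, with or without parentheses
    (\s|-|\.)?           # optional separator
    \d{3}                # first three digits
    (\s|-|\.)            # separator
    \d{4}                # last four digits
    """, re.VERBOSE)

text = 'Call 415-555-1234 or (415) 555-9999.'
found = [m.group() for m in phone_regex.finditer(text)]
print(found)   # ['415-555-1234', '(415) 555-9999']
```

finditer with group() is used rather than findall because, as the notes below explain, findall returns only the group contents when the pattern contains groups.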

import re
ret = re.finditer(r'\d', 'ds3sy4784a')   # finditer returns an iterator over the matches
print(ret)  # <callable_iterator object at 0x10195f940>
print(next(ret).group())  # the first result
print(next(ret).group())  # the second result
print([i.group() for i in ret])  # all remaining results

Notes:

1. Group priority in findall:

import re

ret = re.findall('www.(baidu|oldboy).com', 'www.oldboy.com')
print(ret)  # ['oldboy']     findall returns the contents of the group when the pattern has one; to get the whole match, make the group non-capturing with ?:

ret = re.findall('www.(?:baidu|oldboy).com', 'www.oldboy.com')
print(ret)  # ['www.oldboy.com']

2. Group priority in split

ret=re.split(r"\d+","eva3egon4yuan")
print(ret) # result: ['eva', 'egon', 'yuan']

ret=re.split(r"(\d+)","eva3egon4yuan")
print(ret) # result: ['eva', '3', 'egon', '4', 'yuan']

# Wrapping the pattern in () changes the result of split:
# without (), the matched separators are dropped; with (), they are kept in the result.
# This matters whenever the matched parts need to be preserved.

Exercises

1. Matching tags

import re


ret = re.search(r"<(?P<tag_name>\w+)>\w+</(?P=tag_name)>","<h1>hello</h1>")
# a group can be named with the ?P<name> syntax
# the matched value is then available as group('name')
print(ret.group('tag_name'))  # result: h1
print(ret.group())  # result: <h1>hello</h1>

ret = re.search(r"<(\w+)>\w+</\1>","<h1>hello</h1>")
# an unnamed group can be referenced as \number, meaning the text must equal what that group matched
# the matched value is then available as group(number)
print(ret.group(1))
print(ret.group())  # result: <h1>hello</h1>

2. Matching integers

import re

ret=re.findall(r"\d+","1-2*(60+(-40.35/5)-(-4*3))")
print(ret) #['1', '2', '60', '40', '35', '5', '4', '3']
ret=re.findall(r"-?\d+\.\d*|(-?\d+)","1-2*(60+(-40.35/5)-(-4*3))")
print(ret) #['1', '-2', '60', '', '5', '-4', '3']
ret.remove("")
print(ret) #['1', '-2', '60', '5', '-4', '3']

3. Number matching

# 1. Match every e-mail address in a piece of text
y='123@qq.comaaa@163.combbb@126.comasdfasfs33333@adfcom'
import re
ret=re.findall(r'\w+@(?:qq|163|126).com',y)
print(ret)

# 2. Match the date string in a piece of text, for example '1990-07-12'
time='asfasf1990-07-12asdfAAAbbbb434241'
import re
ret=re.search(r'(?P<year>19[09]\d)-(?P<month>\d+)-(?P<days>\d+)',time)
print(ret.group('year'))
print(ret.group('month'))
print(ret.group('days'))

# 3. Match every ID card number (18 digits) in a piece of text
a='sfafsf,34234234234,1231313132,154785625475896587,sdefgr54184785ds85,4864465asf86845'
import re
ret=re.findall(r'\d{18}',a)
print(ret)

# 4. Match QQ numbers (Tencent QQ numbers start from 10000): [1-9][0-9]{4,}
q='3344,88888,7778957,10000,99999,414,4,867287672'
import re
ret=re.findall('[1-9][0-9]{4,}',q)
print(ret)

# 5. Match floating-point numbers
import re
ret=re.findall(r'-?\d+\.?\d*','-1,-2.5,8.8,1,0')
print(ret)

# 6. Match Chinese characters: ^[\u4e00-\u9fa5]{0,}$
import re
ret=re.findall('[\u4e00-\u9fa5]{0,}','的沙發斯蒂芬')
print(ret)

# 7. Match all integers
a='1,-3,a,-2.5,7.7,asdf'
import re
ret=re.findall(r"'(-?\d+)'",str(re.split(',',a)))
print(ret)

4. Crawler exercise

import re
from urllib.request import urlopen

def getPage(url):
    response = urlopen(url)
    return response.read().decode('utf-8')

def parsePage(s):
    com = re.compile(
        r'<div class="item">.*?<div class="pic">.*?<em .*?>(?P<id>\d+).*?<span class="title">(?P<title>.*?)</span>'
        r'.*?<span class="rating_num" .*?>(?P<rating_num>.*?)</span>.*?<span>(?P<comment_num>.*?)評價</span>', re.S)

    ret = com.finditer(s)
    for i in ret:
        yield {
            "id": i.group("id"),
            "title": i.group("title"),
            "rating_num": i.group("rating_num"),
            "comment_num": i.group("comment_num"),
        }


def main(num):
    url = 'https://movie.douban.com/top250?start=%s&filter=' % num
    response_html = getPage(url)
    ret = parsePage(response_html)  # a generator yielding one dict per movie
    # the with block closes the file even if parsing fails part-way
    with open("move_info7", "a", encoding="utf8") as f:
        for obj in ret:
            print(obj)
            data = str(obj)
            f.write(data + "\n")

count = 0
for i in range(10):
    main(count)
    count += 25

Simplified version

import requests

import re
import json

def getPage(url):
    response = requests.get(url)
    return response.text

def parsePage(s):
    com = re.compile(r'<div class="item">.*?<div class="pic">.*?<em .*?>(?P<id>\d+).*?<span class="title">(?P<title>.*?)</span>'
                     r'.*?<span class="rating_num" .*?>(?P<rating_num>.*?)</span>.*?<span>(?P<comment_num>.*?)評價</span>', re.S)

    ret = com.finditer(s)
    for i in ret:
        yield {
            "id": i.group("id"),
            "title": i.group("title"),
            "rating_num": i.group("rating_num"),
            "comment_num": i.group("comment_num"),
        }

def main(num):
    url = 'https://movie.douban.com/top250?start=%s&filter=' % num
    response_html = getPage(url)
    ret = parsePage(response_html)  # a generator yielding one dict per movie
    # the with block closes the file even if parsing fails part-way
    with open("move_info7", "a", encoding="utf8") as f:
        for obj in ret:
            print(obj)
            # ensure_ascii=False keeps Chinese titles readable in the output file
            data = json.dumps(obj, ensure_ascii=False)
            f.write(data + "\n")

if __name__ == '__main__':
    count = 0
    for i in range(10):
        main(count)
        count += 25

Complex version


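11. configparser module

This module handles configuration files in a format similar to Windows ini files: a file contains one or more sections, and each section can hold several parameters (key = value).

Creating a file

Here is a configuration layout that many programs use: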

[DEFAULT]
ServerAliveInterval = 45
Compression = yes
CompressionLevel = 9
ForwardX11 = yes
  
[bitbucket.org]
User = hg
  
[topsecret.server.com]
Port = 50022
ForwardX11 = no

How can a file like this be generated from Python?

import configparser

config = configparser.ConfigParser()

config["DEFAULT"] = {'ServerAliveInterval': '45',
                     'Compression': 'yes',
                     'CompressionLevel': '9',
                     'ForwardX11': 'yes'
                     }

config['bitbucket.org'] = {'User': 'hg'}

# the key is 'Port', matching the target file shown above
config['topsecret.server.com'] = {'Port': '50022', 'ForwardX11': 'no'}

with open('example.ini', 'w') as configfile:
    config.write(configfile)

Looking up values

import configparser

config = configparser.ConfigParser()

# --------------------------- look up file contents, dictionary style

print(config.sections())        #  []

config.read('example.ini')

print(config.sections())        #   ['bitbucket.org', 'topsecret.server.com']

print('bytebong.com' in config) # False
print('bitbucket.org' in config) # True


print(config['bitbucket.org']["user"])  # hg

print(config['DEFAULT']['Compression']) # yes

print(config['topsecret.server.com']['ForwardX11'])  # no


print(config['bitbucket.org'])          # <Section: bitbucket.org>

for key in config['bitbucket.org']:     # note: the keys of [DEFAULT] are included as well
    print(key)

print(config.options('bitbucket.org'))  # same as the for loop: every key visible in 'bitbucket.org'

print(config.items('bitbucket.org'))    # every key/value pair visible in 'bitbucket.org'

print(config.get('bitbucket.org','compression')) # yes       get(section, option) reads one value; here it falls back to [DEFAULT]
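Every value read this way is a string. The sketch below shows the typed getters and the `fallback` argument on a section, together with the inheritance from `[DEFAULT]`. The INI text is a shortened copy of the example above, loaded from a string so the snippet is self-contained; the `Timeout` key is hypothetical and deliberately absent.

```python
import configparser

INI_TEXT = """
[DEFAULT]
ServerAliveInterval = 45
Compression = yes

[topsecret.server.com]
Port = 50022
ForwardX11 = no
"""

config = configparser.ConfigParser()
config.read_string(INI_TEXT)

section = config["topsecret.server.com"]
port = section.getint("Port")                       # '50022' -> 50022
forward_x11 = section.getboolean("ForwardX11")      # 'no' -> False
interval = section.getint("ServerAliveInterval")    # inherited from [DEFAULT]
timeout = section.getint("Timeout", fallback=30)    # key is missing, so the fallback is used

print(port, forward_x11, interval, timeout)
```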

Adding, removing and changing entries

import configparser

config = configparser.ConfigParser()

config.read('example.ini')

config.add_section('yuan')

config.remove_section('bitbucket.org')
config.remove_option('topsecret.server.com',"forwardx11")

config.set('topsecret.server.com','k1','11111')
config.set('yuan','k2','22222')

# changes live in memory until written; the with block makes sure the file is closed
with open('new2.ini', "w") as f:
    config.write(f)

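12. subprocess module

When a system command has to be called, the os module is usually the first thing that comes to mind, through os.system() and os.popen(). Both are too simple for more involved tasks, such as feeding input to the command, reading its output, checking its state, or managing several commands running in parallel. Popen in the subprocess module handles all of these.

The subprocess module lets a process create a new child process, connect to the child's stdin/stdout/stderr through pipes, and obtain the child's return code.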

The subprocess module allows you to spawn new processes, connect to their input/output/error pipes, and obtain their return codes.

This module intends to replace several other, older modules and functions, such as: os.system、os.spawn*、os.popen*、popen2.*、commands.*

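The module is built around a single class: Popen.

Simple commands

import subprocess

# Starts a new process that runs independently of the parent.  On Windows: s=subprocess.Popen('dir',shell=True)
s=subprocess.Popen('ls')
s.wait()                  # s is a Popen instance; wait for ls to finish before printing 'ending...'

print('ending...')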

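Commands with arguments

Linux:

import subprocess

subprocess.Popen('ls -l',shell=True)

#subprocess.Popen(['ls','-l'])

shell = True (required on Windows for shell built-ins such as dir)

With shell=True, subprocess.call accepts the command as a single string and has the shell execute it. With shell=False, subprocess.call accepts only a list: the first element is the command and the remaining elements are its arguments.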

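An example:

from subprocess import call  
import shlex  
cmd = "cat test.txt; rm test.txt"  
call(cmd, shell=True)

With shell=True, the script above ends up running two commands:

cat test.txt and rm test.txt

Now change shell=True to False:

from subprocess import call  
import shlex  
cmd = "cat test.txt; rm test.txt"  
cmd = shlex.split(cmd)  # ['cat', 'test.txt;', 'rm', 'test.txt']
call(cmd, shell=False)

This time call runs only cat, and the three strings "test.txt;", "rm" and "test.txt" are passed to it as arguments. What looks like two shell commands is therefore not run as two commands.

You might say that shell=True is just fine, since running both commands is what was wanted. The approach is unsafe, however: commands can be chained with semicolons, and if the input is not checked carefully a dangerous command such as rm -rf / may be executed, with very serious consequences. Using shell=False avoids this risk.

Overall it depends on the actual need; the official recommendation is to avoid shell=True wherever possible.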
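The difference can be sketched without touching any files. In the snippet below the "command" is a tiny Python child process started through `sys.executable` (an assumption made so that it runs on any platform), and `user_input` is a made-up string containing a semicolon. With a list and the default shell=False, the whole string arrives as one literal argument and nothing after the semicolon is executed.

```python
import shlex
import subprocess
import sys

user_input = "hello; echo injected"

# shlex.split only tokenises a command line; it does not run a shell
args = shlex.split("echo " + user_input)
print(args)  # ['echo', 'hello;', 'echo', 'injected']

# The child simply prints the arguments it received
child_code = "import sys; print(sys.argv[1:])"
result = subprocess.run(
    [sys.executable, "-c", child_code, user_input],
    capture_output=True,
    text=True,
    check=True,
)
print(result.stdout.strip())  # ['hello; echo injected']
```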

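Controlling the child process

For finer control, turn to the Popen class; the objects it creates represent child processes. The wait method used above is one example.

The parent process can also perform other operations on the child:

s.poll() # check the child's state (returns None while it is still running)
s.kill() # kill the child
s.send_signal(sig) # send the signal sig to the child
s.terminate() # terminate the child

s.pid: the child's process ID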

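Controlling the child's standard streams

Standard input, standard output and standard error can be redirected when Popen() creates the child, and subprocess.PIPE can connect the output of one child to the input of another, forming a pipeline:

import subprocess

# A single command whose output is captured through a pipe:
# s1 = subprocess.Popen(["ls","-l"], stdout=subprocess.PIPE)
# print(s1.stdout.read())

s1 = subprocess.Popen(["cat","/etc/passwd"], stdout=subprocess.PIPE)
s2 = subprocess.Popen(["grep","0:0"],stdin=s1.stdout, stdout=subprocess.PIPE)
out = s2.communicate()  # (stdout_bytes, None): stderr was not redirected

print(out)

subprocess.PIPE in effect provides a buffer for the stream. s1 writes its stdout into the buffer, and s2 reads its stdin from that pipe. The output of s2 is held in a pipe as well, until communicate() reads it out.
Note: communicate() is a method of the Popen object; it blocks the parent process until the child has finished.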
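A portable sketch of the same pipeline idea, assuming nothing about the system: two small Python child processes (started through `sys.executable`) stand in for cat and grep, and the two sample lines are made up.

```python
import subprocess
import sys

producer_code = "print('root:x:0:0'); print('daemon:x:1:1')"
filter_code = (
    "import sys\n"
    "for line in sys.stdin:\n"
    "    if '0:0' in line:\n"
    "        sys.stdout.write(line)\n"
)

p1 = subprocess.Popen([sys.executable, "-c", producer_code], stdout=subprocess.PIPE)
p2 = subprocess.Popen(
    [sys.executable, "-c", filter_code],
    stdin=p1.stdout,          # p2 reads what p1 writes
    stdout=subprocess.PIPE,
    text=True,
)
p1.stdout.close()             # the parent no longer needs its copy of the pipe
out, err = p2.communicate()   # blocks until p2 finishes; err is None because stderr is not piped
p1.wait()

print(out)
```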

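Convenience API

'''
subprocess.call()

The parent waits for the child to finish.
Returns the exit status (returncode, the equivalent of a Linux exit code).


subprocess.check_call()
The parent waits for the child to finish.
Returns 0. If the returncode is not 0, raises subprocess.CalledProcessError, which carries
a returncode attribute and can be handled with try...except.


subprocess.check_output()
The parent waits for the child to finish.
Returns what the child wrote to standard output.
If the returncode is not 0, raises subprocess.CalledProcessError, which carries
a returncode attribute and an output attribute (the captured standard output); handle it with try...except.


'''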
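A minimal sketch of the difference between call() and check_output(). The failing command is simulated with a Python child process (via `sys.executable`) that prints one line and exits with status 3; both the message and the status are made up for the illustration.

```python
import subprocess
import sys

failing = [sys.executable, "-c", "import sys; print('partial output'); sys.exit(3)"]

# call(): waits for the child and returns its exit status; a non-zero status is not an error
status = subprocess.call(failing, stdout=subprocess.DEVNULL)

# check_output(): returns stdout, but raises CalledProcessError on a non-zero status
try:
    subprocess.check_output(failing, text=True)
    caught = None
except subprocess.CalledProcessError as exc:
    caught = (exc.returncode, exc.output.strip())

print(status, caught)
```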

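Another reason to use subprocess: stdout and stderr can be read separately.

import subprocess
res = subprocess.Popen('dir',shell=True,stdout=subprocess.PIPE,stderr=subprocess.PIPE)
# communicate() reads both pipes together, which avoids blocking when one of them fills up
out, err = res.communicate()
print(out.decode('gbk'))  # 'gbk' is the console encoding on Chinese-language Windows
print(err.decode('gbk'))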

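13. base64

The base64 module encodes and decodes Base64 and is commonly used to transfer small pieces of data. The encoded data is a string drawn from 64 characters: a-z, A-Z, 0-9, / and +. Each character can therefore be represented with 6 bits, with values 0-63, so three bytes of input become four encoded characters. If the number of input bytes is not a multiple of 3, the bits cannot be divided exactly into 6-bit blocks. In that case 1 or 2 zero bytes are appended to the input to make its length a multiple of 3, and 1 or 2 '=' characters are placed at the end of the encoded string to mark the added zero bytes. In practice the output therefore draws on 65 characters.

['A', 'B', 'C', ... 'a', 'b', 'c', ... '0', '1', ... '+', '/'] + '='

The binary data is processed in groups of 3 bytes, 3x8 = 24 bits in total, split into 4 groups of exactly 6 bits each: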

Base64-encoding the 3 bytes 'Xue':

Base64-encoding the 2 bytes 'Xu':

Base64-encoding the 1 byte 'X':
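The three cases can be checked directly. The helper `six_bit_groups` below is a hypothetical illustration (not part of the base64 module) that prints the 6-bit groups the text describes, next to the result of the real `base64.b64encode`.

```python
import base64

def six_bit_groups(raw):
    """Split the input bits into 6-bit groups, zero-padding the last group."""
    bits = "".join(f"{byte:08b}" for byte in raw)   # 8 bits per input byte
    bits += "0" * (-len(bits) % 6)                   # pad to a multiple of 6 bits
    return [bits[i:i + 6] for i in range(0, len(bits), 6)]

encoded = {raw: base64.b64encode(raw) for raw in (b"Xue", b"Xu", b"X")}
for raw, enc in encoded.items():
    print(raw, six_bit_groups(raw), enc)
# b'Xue' -> b'WHVl'   (4 groups, no padding)
# b'Xu'  -> b'WHU='   (3 groups, one '=')
# b'X'   -> b'WA=='   (2 groups, two '=')
```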

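　　Eight functions of the base64 module see real use: encode, decode, encodestring, decodestring, b64encode, b64decode, urlsafe_b64decode and urlsafe_b64encode. They pair up into four groups. encode and decode work on file objects, including in-memory ones such as io.BytesIO. encodestring and decodestring work on byte strings; they were deprecated aliases of encodebytes and decodebytes and were removed in Python 3.9, so current code should use encodebytes and decodebytes. b64encode and b64decode also work on byte strings and can substitute the two symbol characters through the altchars argument. urlsafe_b64encode and urlsafe_b64decode use an alphabet that is safe in URLs and file names, with - and _ in place of + and /.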
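A short sketch of the symbol substitution mentioned above. The two input bytes are chosen only because they happen to encode to both + and /.

```python
import base64

data = b"\xfb\xff"

standard = base64.b64encode(data)                    # standard alphabet
url_safe = base64.urlsafe_b64encode(data)            # '-' and '_' replace '+' and '/'
custom = base64.b64encode(data, altchars=b"-_")      # the same substitution via altchars

print(standard, url_safe, custom)   # b'+/8=' b'-_8=' b'-_8='

# Decoding must use the matching function
restored = base64.urlsafe_b64decode(url_safe)
```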

13.1 Code examples

13.1.1 b64encode and b64decode: working on strings

import base64
 
st = 'hello world!'.encode()  # encodes as utf8 by default
res = base64.b64encode(st)
print(res.decode())  # decodes as utf8 by default
res = base64.b64decode(res)
print(res.decode())  # decodes as utf8 by default


>>>
aGVsbG8gd29ybGQh
hello world!

import base64

# Converting an image
with open("./robot.png", "rb") as f:
    # encode the binary file contents as base64
    bs64_str = base64.b64encode(f.read())
    # print the base64 form of the image; its type is <class 'bytes'>
    print(bs64_str, type(bs64_str))
    # decode the base64 data back into binary data
    imgdata = base64.b64decode(bs64_str)
    # write the binary data back out as an image
    with open("./robot2.png", "wb") as f2:
        f2.write(imgdata)

13.1.2 encode and decode

These work on files and take two arguments: an input and an output. Both must be binary file objects.

import base64
import io
 
st = b"hello world!"
f = io.BytesIO(st)   # in-memory binary file used as input (io.StringIO raises TypeError here)
out1 = io.BytesIO()
out2 = io.BytesIO()
base64.encode(f,out1)
print(out1.getvalue())
out1.seek(0)
base64.decode(out1,out2)
print(out2.getvalue())

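14. csv

1. About CSV

　　CSV stands for Comma-Separated Values. A CSV file stores tabular data as plain text: it is a sequence of characters made up of any number of records, each record consists of fields, and the fields are separated by commas or tabs. It amounts to structured plain text, is simpler than an Excel file, and is convenient for storing data.

2. Common CSV classes and functions

csv.reader(csvfile,dialect='excel',**fmtparams)

　　Returns a reader object that iterates over the lines of csvfile. csvfile can be any object that supports the iterator protocol; if it is a file object, it should be opened with newline=''.

csv.writer(csvfile,dialect='excel',**fmtparams)

　　Writes data to a CSV file. csvfile can be any object with a write method; if it is a file object, it should be opened with newline='' so that the line endings written by the csv module are not translated a second time by the platform ('\n' on Unix, '\r\n' on Windows).

#!/usr/bin/env python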

# -*- coding: utf-8 -*-
# @Time    : 2018/6/27 11:44
# @Author  : Py.qi
# @File    : csv_file1.py
# @Software: PyCharm
import csv
iterable=[['1','zs',20,8998,20180627],['1','zs',20,8998,20180627],['1','zs',20,8998,20180627]]

with open('csvfile.csv','w',newline='') as csvf:
    spamwriter=csv.writer(csvf,dialect='excel')   # create the writer object
    spamwriter.writerow(['id','name','age','salary','date'])  # write one row with writerow
    spamwriter.writerows(iterable)  # write every row of the iterable

with open('csvfile.csv','r',newline='') as csvf:
    spamreader=csv.reader(csvf)  # create the reader object
    for i in spamreader:
        print('\t'.join(i))   # join the fields of each row with a tab
        # for j in i:
        #     print("%-10s"%j,end="")
        # print("\n")



#
id    name    age    salary    date
1    zs    20    8998    20180627
1    zs    20    8998    20180627
1    zs    20    8998    20180627

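class csv.DictReader(f, fieldnames=None, restkey=None, restval=None, dialect='excel', *args, **kwds)

　　Reads the rows of a CSV file as dicts. fieldnames gives the keys (if omitted, the first row of the file is used), restkey is the key under which the surplus values of an over-long row are stored, restval is the value used for fields missing from a short row, and dialect selects the dialect.

class csv.DictWriter(f, fieldnames, restval='', extrasaction='raise', dialect='excel', *args, **kwds)

　　Creates a writer that maps dicts onto output rows. f is the file to write to, fieldnames lists the keys and their order, and restval is the value written when a dict lacks one of those keys. extrasaction decides what happens when the dict passed to writerow() contains a key that is not in fieldnames: 'raise' raises ValueError, 'ignore' drops the extra values.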
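A minimal sketch of restval, extrasaction and restkey in action. It writes to and reads from `io.StringIO` buffers instead of real files; the rows and the key names "rest" and "x" are made up.

```python
import csv
import io

buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["id", "name", "age"],
                        restval="N/A", extrasaction="ignore")
writer.writeheader()
writer.writerow({"id": 1, "name": "zs"})                      # 'age' is missing, restval is written
writer.writerow({"id": 2, "name": "ls", "age": 30, "x": 1})   # extra key 'x' is ignored
written = buffer.getvalue()

# Reading: surplus values go under restkey, missing fields get restval
reader = csv.DictReader(io.StringIO("id,name\n1,zs,extra\n2\n"),
                        restkey="rest", restval="missing")
rows = [dict(r) for r in reader]

print(written)
print(rows)
```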

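Reader objects (DictReader instances and the objects returned by reader()) provide these methods and attributes:

　　csvreader.__next__(): return the next row of the reader

　　csvreader.dialect: the dialect used by the parser

　　csvreader.line_num: the number of lines read from the source iterator

　　csvreader.fieldnames: if not passed as an argument when the object was created, this attribute is initialised on first access or when the first record is read; it exists only on DictReader objects

Writer objects (DictWriter instances and the objects returned by writer()) provide these methods and attributes:

　　csvwriter.writerow(row): write the row to the file object, formatted according to the current dialect

　　csvwriter.writerows(rows): write every element of rows (an iterable of row objects) to the file object

　　csvwriter.dialect: the dialect used by the writer

　　DictWriter.writeheader(): write a row of field names; available only on DictWriter objects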
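A small sketch of the reader attributes listed above, using an in-memory buffer with made-up rows. It shows that fieldnames is filled from the first row on first access and that line_num counts the header line too.

```python
import csv
import io

source = io.StringIO("id,name\n1,zs\n2,ls\n")
reader = csv.DictReader(source)

header = reader.fieldnames          # reading this attribute consumes the header row
first = next(reader)                # same as reader.__next__()
lines_read = reader.line_num        # header line + one data line

print(header, dict(first), lines_read)
```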

#!/usr/bin/env python
# -*- coding: utf-8 -*-
# @Time    : 2018/6/27 12:13
# @Author  : Py.qi
# @File    : csv_file2.py
# @Software: PyCharm
import csv
import pandas

rows=[
    {'id':2,'name':'wanwu','age':23,'date':20180627},
    {'id':3,'name':'zhaoliu','age':24,'date':20180627},
    {'id':4,'name':'tianqi','age':25,'date':20180627}
]
# write the file
with open('names.csv','w',newline='') as csvf:
    fieldnames=['id','name','age','date']
    writer=csv.DictWriter(csvf,fieldnames=fieldnames)
    writer.writeheader()
    writer.writerow({'id':1,'name':'lisii','age':22,'date':20180627})
    writer.writerows(rows)

# read the file; the header row supplies the keys, so fieldnames is not passed
# (passing it would return the header row as the first data row)
with open('names.csv','r',newline='') as csvf:
    reader=csv.DictReader(csvf)
    for i in reader:
        print(i['id'],i['name'],i['age'],i['date'])

# the pandas module can also read the csv file
df=pandas.read_csv('names.csv')
print(df)

#
1 lisii 22 20180627
2 wanwu 23 20180627
3 zhaoliu 24 20180627
4 tianqi 25 20180627

   id     name  age      date
0   1    lisii   22  20180627
1   2    wanwu   23  20180627
2   3  zhaoliu   24  20180627
3   4   tianqi   25  20180627

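15. paramiko

　　paramiko implements the SSHv2 protocol (using cryptography underneath). With Paramiko, Python code can operate on a remote server over SSH directly instead of going through the ssh command.

pip3 install paramiko

15.1 About Paramiko

paramiko has two core components: SSHClient and SFTPClient.

SSHClient plays the role of the Linux ssh command. It wraps an SSH session, including the transport (Transport), the channel (Channel) and the method that opens an SFTPClient (open_sftp), and is normally used to run remote commands.
SFTPClient plays the role of the Linux sftp command. It wraps an SFTP client and is used for remote file operations such as uploading, downloading and changing file permissions.

# A few basic terms in Paramiko:
 
1. Channel: a socket-like object, a secure SSH transport channel;
2. Transport: an encrypted session; using it creates an encrypted tunnel, and that tunnel is called a Channel;
3. Session: the object through which client and server stay connected; a session is started with connect()/start_client()/start_server().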

15.2 Basic use of Paramiko

Logging in with SSHClient, user name and password

import paramiko
# create the SSH object (instantiate SSHClient)
ssh = paramiko.SSHClient()

# load the host keys from the system known_hosts file
ssh.load_system_host_keys() 

# allow connections to hosts that are not in the known_hosts file
# auto-add policy: stores the server's host name and key; without it, hosts not recorded in the local known_hosts file cannot be connected to
ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy())

# connect to the server
ssh.connect(hostname='192.168.199.146', port=22, username='fishman', password='9')

# run a command
stdin, stdout, stderr = ssh.exec_command('df')

# fetch the command's result
res,err = stdout.read(),stderr.read()
result = res if res else err
print(result.decode())
 
# close the connection
ssh.close()

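Logging in with a key

import paramiko

# location of the private key file
private = paramiko.RSAKey.from_private_key_file('/Users/ch/.ssh/id_rsa')
 
# instantiate SSHClient
client = paramiko.SSHClient()
 
# auto-add policy: stores the server's host name and key; without it, hosts not recorded in the local known_hosts file cannot be connected to
client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
 
# connect to the SSH server, authenticating with the private key
client.connect(hostname='10.0.0.1',port=22,username='root',pkey=private)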

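Logging in with Transport, user name and password

　　SSHClient follows the classic pattern of connecting, running a command and closing. Sometimes several operations have to be performed after logging in, for example running commands and uploading or downloading files; the approach above cannot do that, but the following one can.

　　SSHClient() holds a transport object that carries the connection, so the transport can also be created on its own and used to connect.

# SSHClient wrapping a Transport

import paramiko

# instantiate a transport object
transport = paramiko.Transport(('192.168.199.146', 22))

# establish the connection
transport.connect(username='fishman', password='9')

# point the SSHClient object at the transport created above
# (_transport is a private attribute of SSHClient)
ssh = paramiko.SSHClient()
ssh._transport = transport

# run a command, as before
stdin, stdout, stderr = ssh.exec_command('df')
print (stdout.read().decode())

# close the connection
transport.close()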

Logging in with Transport and a key

import paramiko

private_key = paramiko.RSAKey.from_private_key_file('/Users/ljf/.ssh/id_rsa')

# instantiate a transport object
transport = paramiko.Transport(('192.168.199.146', 22))

# establish the connection
transport.connect(username='fishman', pkey=private_key)
ssh = paramiko.SSHClient()
ssh._transport = transport
 
# run a command and fetch the result
stdin, stdout, stderr = ssh.exec_command('df')
res,err = stdout.read(),stderr.read()
result = res if res else err
print(result.decode())

# close the connection
transport.close()

SFTPClient for uploading to and downloading from a remote server, with user name and password

import paramiko

# instantiate a transport object
transport = paramiko.Transport(('192.168.199.146', 22))

# establish the connection
transport.connect(username='fishman', password='9')
# instantiate an sftp object on top of that transport
sftp = paramiko.SFTPClient.from_transport(transport)
 
# upload LocalFile.txt to /home/fishman/test/remote.txt on the server
sftp.put('LocalFile.txt', '/home/fishman/test/remote.txt')

# download LinuxFile.txt into the local file fromlinux.txt
sftp.get('/home/fishman/test/LinuxFile.txt', 'fromlinux.txt')
transport.close()

SFTPClient for uploading to and downloading from a remote server, with a key

import paramiko

private_key = paramiko.RSAKey.from_private_key_file('/Users/ljf/.ssh/id_rsa')
transport = paramiko.Transport(('192.168.199.146', 22))
# authenticate with the private key instead of a password
transport.connect(username='fishman', pkey=private_key)
sftp = paramiko.SFTPClient.from_transport(transport)
 
# upload LocalFile.txt to /home/fishman/test/remote.txt on the server
sftp.put('LocalFile.txt', '/home/fishman/test/remote.txt')
# download LinuxFile.txt into the local file fromlinux.txt
sftp.get('/home/fishman/test/LinuxFile.txt', 'fromlinux.txt')
 
transport.close()

A combined example

import paramiko
import unicode_utils  # the helper module shown after the class


class SSHConnection(object):
 
    def __init__(self, host_dict):
        self.host = host_dict['host']
        self.port = host_dict['port']
        self.username = host_dict['username']
        self.pwd = host_dict['pwd']
        self.__transport = None
 
    def connect(self):
        transport = paramiko.Transport((self.host,self.port))
        transport.connect(username=self.username,password=self.pwd)
        self.__transport = transport
 
    def close(self):
        # safe to call even if connect() was never called or close() already ran
        if self.__transport is not None:
            self.__transport.close()
            self.__transport = None
 
    def run_cmd(self, command):
        """
         Run a shell command and return a dict:
         {'color': 'red', 'res': error} on failure, or
         {'color': 'green', 'res': res} on success
        :param command:
        :return:
        """
        ssh = paramiko.SSHClient()
        ssh._transport = self.__transport
        # run the command
        stdin, stdout, stderr = ssh.exec_command(command)
        # fetch the command's output
        res = unicode_utils.to_str(stdout.read())
        # fetch the error output
        error = unicode_utils.to_str(stderr.read())
        # if there is error output, return it
        # otherwise return res
        if error.strip():
            return {'color':'red','res':error}
        else:
            return {'color': 'green', 'res':res}
 
    def upload(self,local_path, target_path):
        # connect and upload
        sftp = paramiko.SFTPClient.from_transport(self.__transport)
        # upload local_path to target_path on the server
        sftp.put(local_path, target_path, confirm=True)
        # print(os.stat(local_path).st_mode)
        # set permissions
        # sftp.chmod(target_path, os.stat(local_path).st_mode)
        sftp.chmod(target_path, 0o755)  # the mode is octal; octal literals need the 0o prefix
 
    def download(self,target_path, local_path):
        # connect and download
        sftp = paramiko.SFTPClient.from_transport(self.__transport)
        # download target_path from the server to local_path
        sftp.get(target_path, local_path)
 
    # destructor
    def __del__(self):
        self.close()
 
　　
#unicode_utils.py
def to_str(bytes_or_str):
    """
    Convert bytes to str
    :param bytes_or_str:
    :return:
    """
    if isinstance(bytes_or_str, bytes):
        value = bytes_or_str.decode('utf-8')
    else:
        value = bytes_or_str
    return value

Returning results as soon as a command is entered

　　Everything so far is a basic connection. To build something like the xshell tool, where after logging in a command is typed, Enter is pressed and the result comes straight back:

import paramiko
import os
import select
import sys
 
# create a socket
trans = paramiko.Transport(('192.168.2.129', 22))
# start a client
trans.start_client()
 
# to log in with an RSA key instead
'''
default_key_file = os.path.join(os.environ['HOME'], '.ssh', 'id_rsa')
prikey = paramiko.RSAKey.from_private_key_file(default_key_file)
trans.auth_publickey(username='super', key=prikey)
'''
# to log in with user name and password
trans.auth_password(username='super', password='super')
# open a channel
channel = trans.open_session()
# request a terminal
channel.get_pty()
# start the shell; from here on it behaves like a login through a tool such as xshell
channel.invoke_shell()
# everything below is driven by select,
# which watches both the terminal input sys.stdin and the channel.
# When the user types a command, sys.stdin becomes readable, select notices, and the command is handed to the channel.
# Sending commands and receiving results through the channel is simply socket send and receive.
while True:
    readlist, writelist, errlist = select.select([channel, sys.stdin,], [], [])
    # the user typed something: sys.stdin is readable
    if sys.stdin in readlist:
        # read the input
        input_cmd = sys.stdin.read(1)
        # send it to the server
        channel.sendall(input_cmd)
 
    # the server returned a result: the channel is readable and select notices
    if channel in readlist:
        # fetch the result
        result = channel.recv(1024)
        # exit once the connection is closed
        if len(result) == 0:
            print("\r\n**** EOF **** \r\n")
            break
        # print to the screen
        sys.stdout.write(result.decode())
        sys.stdout.flush()
 
# close the channel
channel.close()
# close the connection
trans.close()

Supporting tab completion

import paramiko
import os
import select
import sys
import tty
import termios
 
'''
Emulates an xshell login: once logged in, commands are typed continuously and results returned.
Supports completion by using the server's terminal directly.
'''
# create a socket
trans = paramiko.Transport(('192.168.2.129', 22))
# start a client
trans.start_client()
 
# to log in with an RSA key instead
'''
default_key_file = os.path.join(os.environ['HOME'], '.ssh', 'id_rsa')
prikey = paramiko.RSAKey.from_private_key_file(default_key_file)
trans.auth_publickey(username='super', key=prikey)
'''
# to log in with user name and password
trans.auth_password(username='super', password='super')
# open a channel
channel = trans.open_session()
# request a terminal
channel.get_pty()
# start the shell; from here on it behaves like a login through a tool such as xshell
channel.invoke_shell()
 
# save the attributes of the local terminal
oldtty = termios.tcgetattr(sys.stdin)
try:
    # switch the local terminal to raw mode so every key, including tab, goes to the server's terminal
    tty.setraw(sys.stdin)
    channel.settimeout(0)
 
    while True:
        readlist, writelist, errlist = select.select([channel, sys.stdin,], [], [])
        # the user typed something: sys.stdin is readable
        if sys.stdin in readlist:
            # read the input one character at a time and send each character as it arrives
            input_cmd = sys.stdin.read(1)
            # send it to the server
            channel.sendall(input_cmd)
 
        # the server returned a result: the channel is readable and select notices
        if channel in readlist:
            # fetch the result
            result = channel.recv(1024)
            # exit once the connection is closed
            if len(result) == 0:
                print("\r\n**** EOF **** \r\n")
                break
            # print to the screen
            sys.stdout.write(result.decode())
            sys.stdout.flush()
finally:
    # restore the original terminal attributes when done
    termios.tcsetattr(sys.stdin, termios.TCSADRAIN, oldtty)
 
# close the channel
channel.close()
# close the connection
trans.close()

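16. XML

　　For XML parsing, Python follows its own "batteries included" principle. The standard library ships a large number of packages and tools for handling XML, so many that newcomers to Python may not know which to choose.

16.1 Which Python packages can parse XML?

　　The standard library provides six packages for processing XML.

1. xml.dom

　　xml.dom implements the DOM API defined by the W3C. Use this package if you are used to the DOM API or are required to use it. Note that it contains several different modules whose performance differs.

　　A DOM parser must hold the whole tree built from the XML file in memory before any processing starts, so its memory use depends entirely on the size of the input.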

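2. xml.dom.minidom

　　xml.dom.minidom is a minimal implementation of the DOM API, much simpler than the full DOM, and the package is much smaller. Those not familiar with the DOM should consider using the xml.etree.ElementTree module instead. According to the author of lxml, this module is inconvenient to use, inefficient, and prone to problems.

3. xml.dom.pulldom

　　Unlike the other modules, xml.dom.pulldom provides a "pull parser". The underlying idea is to pull events from the XML stream and then process them. It uses an event-driven processing model like SAX, but with a pull parser the user explicitly pulls events from the XML stream and iterates over them until processing is finished or an error occurs.

Pull parsing is a more recent trend in XML processing. Earlier popular parsing frameworks such as SAX and DOM are push-based, meaning that control over the parsing work lies with the parser.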
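A minimal sketch of pulling events with xml.dom.pulldom. The XML string is a made-up, shortened variant of the country data used later in this article; only the country elements of interest are expanded into DOM nodes.

```python
from xml.dom import pulldom

XML_TEXT = (
    "<data>"
    "<country name='A'><rank>1</rank></country>"
    "<country name='B'><rank>69</rank></country>"
    "</data>"
)

names = []
events = pulldom.parseString(XML_TEXT)
for event, node in events:                       # the caller pulls one event at a time
    if event == pulldom.START_ELEMENT and node.tagName == "country":
        events.expandNode(node)                  # build the DOM subtree for this element only
        rank = int(node.getElementsByTagName("rank")[0].firstChild.data)
        if rank > 50:
            names.append(node.getAttribute("name"))

print(names)  # ['B']
```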

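4. xml.sax

　　The xml.sax module implements the SAX API; it trades convenience for speed and low memory use. SAX is short for Simple API for XML and is not a standard issued by the W3C. It is event-driven and does not need to read the whole document at once: reading the document is the parsing process. Event-driven here means a way of running a program that is based on callbacks.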
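The callback mechanism can be sketched as follows. `TagCounter` is a hypothetical handler written for this illustration, and the XML string is made up; the parser calls `startElement` once for every opening tag it meets.

```python
import xml.sax

class TagCounter(xml.sax.ContentHandler):
    """Counts how often each element name occurs."""

    def __init__(self):
        super().__init__()
        self.counts = {}

    def startElement(self, name, attrs):
        # called by the parser for each opening tag
        self.counts[name] = self.counts.get(name, 0) + 1

XML_TEXT = (
    "<data>"
    "<country name='A'><rank>1</rank></country>"
    "<country name='B'><rank>2</rank></country>"
    "</data>"
)

handler = TagCounter()
xml.sax.parseString(XML_TEXT, handler)
print(handler.counts)  # {'data': 1, 'country': 2, 'rank': 2}
```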

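5. xml.parsers.expat

　　xml.parsers.expat provides a direct, low-level interface to the expat parser, which is written in C. The expat interface resembles SAX and is also based on event callbacks, but it is not standardised and applies only to the expat library.

expat is a stream-oriented parser. You register callback (handler) functions with the parser and then start feeding it the document. When the parser recognises a given construct in the document, it calls the handler registered for that construct, if there is one. The document is fed to the parser in chunks and loaded into memory piece by piece, which is why expat can parse very large files.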
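A minimal sketch of registering handlers and feeding the document in pieces. The XML fragment is made up and is deliberately split in the middle of a tag to show that chunk boundaries do not matter.

```python
import xml.parsers.expat

seen = []

def start_element(name, attrs):
    seen.append(("start", name, attrs))

def char_data(data):
    if data.strip():
        seen.append(("text", data))

parser = xml.parsers.expat.ParserCreate()
parser.StartElementHandler = start_element
parser.CharacterDataHandler = char_data

# Feed the document in two chunks; the final call passes True to mark the end of input
parser.Parse("<country name='Panama'><ra", False)
parser.Parse("nk>69</rank></country>", True)

print(seen)
```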

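6. xml.etree.ElementTree (ET for short below)

　　The xml.etree.ElementTree module provides a lightweight, Pythonic API together with an efficient C implementation, historically exposed as xml.etree.cElementTree. Compared with DOM, ET is faster and its API is more direct and convenient. Compared with SAX, the ET.iterparse function likewise parses on demand and does not read the whole document into memory at once. ET performs roughly on a par with the SAX module, but its API is higher-level and more convenient to use.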
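A small sketch of on-demand parsing with ET.iterparse, assuming a made-up document held in a BytesIO buffer in place of a large file. Each country element is processed as soon as its closing tag has been read and is then cleared.

```python
import io
import xml.etree.ElementTree as ET

XML_BYTES = (
    b"<data>"
    b"<country name='A'><rank>2</rank></country>"
    b"<country name='B'><rank>69</rank></country>"
    b"</data>"
)

ranks = {}
for event, elem in ET.iterparse(io.BytesIO(XML_BYTES), events=("end",)):
    if elem.tag == "country":
        ranks[elem.get("name")] = int(elem.findtext("rank"))
        elem.clear()   # drop the processed subtree to keep memory use low

print(ranks)  # {'A': 2, 'B': 69}
```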

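16.2 Parsing XML with ElementTree

　　The standard library has offered two implementations of ET: the pure-Python xml.etree.ElementTree and the faster C implementation xml.etree.cElementTree. On old Python versions, prefer the C implementation, since it is much faster and uses much less memory. If the accelerator module needed by cElementTree might be missing, the module can be imported like this:

try:
    import xml.etree.cElementTree as ET
except ImportError:
    import xml.etree.ElementTree as ET

　　This is the usual import pattern when an API has several implementations. Quite possibly the first import works without trouble. Note that since Python 3.3 this pattern is no longer needed, because the ElementTree module automatically prefers the C accelerator and falls back to the Python implementation when no C implementation is available. Users of Python 3.3+ therefore only need import xml.etree.ElementTree (xml.etree.cElementTree itself was removed in Python 3.9).

Take country.xml as an example, with the following content: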

<?xml version="1.0"?>
<data>
    <country name="Liechtenstein">
        <rank updated="yes">2</rank>
        <year>2008</year>
        <gdppc>141100</gdppc>
        <neighbor name="Austria" direction="E"/>
        <neighbor name="Switzerland" direction="W"/>
    </country>
    <country name="Singapore">
        <rank updated="yes">5</rank>
        <year>2011</year>
        <gdppc>59900</gdppc>
        <neighbor name="Malaysia" direction="N"/>
    </country>
    <country name="Panama">
        <rank updated="yes">69</rank>
        <year>2011</year>
        <gdppc>13600</gdppc>
        <neighbor name="Costa Rica" direction="W"/>
        <neighbor name="Colombia" direction="E"/>
    </country>
</data>

1. Parsing

1) Call parse(), which returns the parsed tree

import xml.etree.ElementTree as ET

tree = ET.parse("country.xml")  # <class 'xml.etree.ElementTree.ElementTree'>
root = tree.getroot()           # get the root node <Element 'data' at 0x02BF6A80>

This is essentially the same as method 3 below; the source of parse() is:

def parse(source, parser=None):
    """Parse XML document into element tree.

    *source* is a filename or file object containing XML data,
    *parser* is an optional parser instance defaulting to XMLParser.

    Return an ElementTree instance.

    """
    tree = ElementTree()
    tree.parse(source, parser)
    return tree

2) Call fromstring(), which returns the root element of the parsed tree

import xml.etree.ElementTree as ET
with open("country.xml") as f:   # the with block closes the file after reading
    data = f.read()
root = ET.fromstring(data)   # <Element 'data' at 0x036168A0>

3) Use the ElementTree(element=None, file=None) class of the ElementTree module; element, if given, becomes the root node

import xml.etree.ElementTree as ET
tree = ET.ElementTree(file="country.xml")  # <xml.etree.ElementTree.ElementTree object at 0x03031390>
root = tree.getroot()  # <Element 'data' at 0x030EA600>

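2. Traversal

1) Simple traversal

import xml.etree.ElementTree as ET

tree = ET.parse("country.xml")
root = tree.getroot()
print(root.tag, ":", root.attrib)  # print the tag and attributes of the root element

# walk the second level of the xml document
for child in root:
    # tag name and attributes of the second-level nodes
    print("\t" + child.tag,":", child.attrib)
    # walk the third level of the xml document
    for children in child:
        # tag name and attributes of the third-level nodes
        print("\t\t" + children.tag, ":", children.attrib)

Nodes can also be accessed directly by index

# the second child (year) of the first country under the root; read its text
year = root[0][1].text    # 2008

2) Methods provided by ElementTree

find(match) 　　 # find the first matching subelement; match can be a tag or an XPath expression
findall(match) # return a list of all matching subelements
findtext(match, default=None) # return the text of the first matching subelement, or default if none matches
iter(tag=None) # create a tree iterator rooted at the current element; if tag is not None, only elements with that tag are returned
iterfind(match) # like findall, but returns an iterator instead of a list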
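The methods listed above can be tried on a small in-memory document. The XML string below is a shortened, made-up version of country.xml so that the sketch needs no file.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    "<data>"
    "<country name='Liechtenstein'><rank>2</rank><year>2008</year></country>"
    "<country name='Panama'><rank>69</rank><year>2011</year></country>"
    "</data>"
)

first = root.find("country")                                   # first matching child
all_names = [c.get("name") for c in root.findall("country")]   # every matching child
first_year = root.findtext("country/year")                     # text of the first match
missing = root.findtext("country/gdppc", default="n/a")        # no match, default is returned
ranks = [r.text for r in root.iterfind("country/rank")]        # lazy variant of findall
panama = root.find("country[@name='Panama']")                  # XPath attribute predicate

print(first.get("name"), all_names, first_year, missing, ranks, panama.findtext("rank"))
```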

Examples:

# pick out every neighbor tag
for neighbor in root.iter("neighbor"):
    print(neighbor.tag, ":", neighbor.attrib)

---

# walk every country tag
for country in root.findall("country"):
    # find the first rank tag under the country tag
    rank = country.find("rank").text
    # read the name attribute of the country tag
    name = country.get("name")
    print(name, rank)

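3. Modifying the xml structure

1) Attributes

# add 1 to every rank value and set the attribute updated to yes
for rank in root.iter("rank"):
    new_rank = int(rank.text) + 1
    rank.text = str(new_rank)  # the int must be converted to str
    rank.set("updated", "yes") # add the attribute

# show the whole xml in the terminal
ET.dump(root)
# note: the changes exist only in memory and have not been saved to a file yet
# save the modified content
tree.write("output.xml")

---

import xml.etree.ElementTree as ET

tree = ET.parse("output.xml")
root = tree.getroot()

for rank in root.iter("rank"):
    # attrib is the attribute dict
    # delete the attribute updated
    del rank.attrib['updated']  

ET.dump(root)

Summary: attribute-related members of class xml.etree.ElementTree.Element

attrib 　　 # dict holding the element's attributes
keys() # return the list of attribute names
items() # return a list of (name, value) pairs
get(key, default=None) # read an attribute
set(key, value) # update or add an attribute
del xxx.attrib[key] # delete the attribute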
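The attribute members summarised above, exercised on a single made-up element. The attribute name "source" is hypothetical.

```python
import xml.etree.ElementTree as ET

rank = ET.fromstring('<rank updated="yes">2</rank>')

rank.set("source", "manual")               # add a new attribute
keys = sorted(rank.keys())
items = sorted(rank.items())
updated = rank.get("updated")
missing = rank.get("nope", "default")      # absent attribute, default is returned

del rank.attrib["updated"]                 # remove an attribute
serialised = ET.tostring(rank, encoding="unicode")

print(keys, items, updated, missing, serialised)
```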

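2) Nodes and elements

Removing a child element with remove()

import xml.etree.ElementTree as ET

tree = ET.parse("country.xml")
root = tree.getroot()

# remove the countries whose rank is greater than 50
# findall() returns a list, so removing elements does not disturb the loop
# (removing while iterating with root.iter() can skip elements)
for country in root.findall("country"):
    rank = int(country.find("rank").text)
    if rank > 50:
        # remove() deletes a child element
        root.remove(country)

ET.dump(root)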

Adding child elements

import xml.etree.ElementTree as ET

tree = ET.parse("country.xml")
root = tree.getroot()

country = root[0]
last_ele = country[-1]   # the last child of country
last_ele.tail = '\n\t\t'
# create a new element with the tag test_append
elem1 = ET.Element("test_append")
elem1.text = "elem 1"
# elem.tail = '\n\t'
country.append(elem1)

# SubElement() creates the element and appends it to the parent in one step
elem2 = ET.SubElement(country, "test_subelement")
elem2.text = "elem 2"

# extend()
elem3 = ET.Element("test_extend")
elem3.text = "elem 3"
elem4 = ET.Element("test_extend")
elem4.text = "elem 4"
country.extend([elem3, elem4])

# insert()
elem5 = ET.Element("test_insert")
elem5.text = "elem 5"
country.insert(5, elem5)

ET.dump(country)

Summary of the methods for adding child elements:

append(subelement)
extend(subelements)
insert(index, element)

4. Creating an xml document

　　First create the root Element, then create the SubElements, and finally pass the root element to ElementTree(element) to create the tree and call tree.write() to write it to a file.

　　There are three ways to create an element: ET.Element, the makeelement() method of an Element object, and ET.SubElement.

import xml.etree.ElementTree as ET


def subElement(root, tag, text):
    ele = ET.SubElement(root, tag)
    ele.text = text
    ele.tail = '\n'


root = ET.Element("note")

to = root.makeelement("to", {})
to.text = "peter"
to.tail = '\n'
root.append(to)

subElement(root, "from", "marry")
subElement(root, "heading", "Reminder")
subElement(root, "body", "Don't forget the meeting!")

tree = ET.ElementTree(root)
tree.write("note.xml", encoding="utf-8", xml_declaration=True)

　　XML saved this way has no indentation by default; to get indentation, the way the file is saved has to change:

import xml.etree.ElementTree as ET
from xml.dom import minidom


def subElement(root, tag, text):
    ele = ET.SubElement(root, tag)
    ele.text = text


def saveXML(root, filename, indent="\t", newl="\n", encoding="utf-8"):
    rawText = ET.tostring(root)
    dom = minidom.parseString(rawText)
    # open the file with the same encoding that is declared in the XML header
    with open(filename, 'w', encoding=encoding) as f:
        dom.writexml(f, "", indent, newl, encoding)


root = ET.Element("note")

to = root.makeelement("to", {})
to.text = "peter"
root.append(to)

subElement(root, "from", "marry")
subElement(root, "heading", "Reminder")
subElement(root, "body", "Don't forget the meeting!")

# save the xml file
saveXML(root, "note.xml")
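As an alternative to the minidom round trip, and assuming Python 3.9 or later, ElementTree can indent a tree in place with ET.indent(). A minimal sketch with a shortened note element:

```python
import xml.etree.ElementTree as ET

root = ET.Element("note")
ET.SubElement(root, "to").text = "peter"
ET.SubElement(root, "from").text = "marry"

# ET.indent() was added in Python 3.9; it inserts whitespace into text/tail in place
ET.indent(root, space="\t")
pretty = ET.tostring(root, encoding="unicode")
print(pretty)
```

The indented tree can then be written with tree.write() as before.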

17. gzip

Creating a gzip file

import gzip
content = "Lots of content here.\n第二行"
with gzip.open('file.txt.gz', 'wb') as f:
    f.write(content.encode("utf-8"))

Decompressing a gzip file

import gzip
with gzip.open("file.txt.gz", 'rb') as f:   # open the compressed file
    file_content = f.read()                 # read the decompressed content (bytes)
# write the content to a new file, with an explicit encoding
with open("file", "w", encoding="utf-8") as f_out:
    f_out.write(file_content.decode("utf-8"))
print(file_content.decode("utf-8"))  # print what was read
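gzip.open() also accepts text modes, which removes the manual encode/decode steps. A minimal sketch, writing to a temporary directory so that no existing file is touched:

```python
import gzip
import os
import tempfile

content = "Lots of content here.\n第二行"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "file.txt.gz")
    # 'wt' / 'rt' open the compressed file in text mode with the given encoding
    with gzip.open(path, "wt", encoding="utf-8") as f:
        f.write(content)
    with gzip.open(path, "rt", encoding="utf-8") as f:
        text = f.read()

print(text == content)
```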

Compressing an existing file

import gzip
with open('test.png', 'rb') as f_in, gzip.open('test.png.gz', 'wb') as f_out:
    f_out.writelines(f_in)
    # f_out.write(f_in.read())
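For large files, shutil.copyfileobj copies in fixed-size chunks instead of line by line. A self-contained sketch, assuming a made-up payload written to a temporary directory in place of test.png:

```python
import gzip
import os
import shutil
import tempfile

payload = b"Lots of content here.\n" * 1000

with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "test.bin")
    dst = src + ".gz"
    with open(src, "wb") as f:
        f.write(payload)

    # stream the source file into the gzip file chunk by chunk
    with open(src, "rb") as f_in, gzip.open(dst, "wb") as f_out:
        shutil.copyfileobj(f_in, f_out)

    with gzip.open(dst, "rb") as f:
        restored = f.read()
    compressed_size = os.path.getsize(dst)

print(len(payload), compressed_size)
```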

Compressing data

import gzip
data = "這是一串字符串"
bytes_data = gzip.compress(data.encode("utf-8"))
print("compressed bytes: ", bytes_data)

>>>
compressed bytes:  b'\x1f\x8b\x08\x00\xbb5Y]\x02\xff{\xb1\x7f\xe6\xb3\x19\xeb\x9f\xechx\xb2c\xd3\xd3\xb5\xd3\x9f\xafY\x06d\x00\x00u\xa1\x12E\x15\x00\x00\x00'

Decompressing data

import gzip
data = "這是一串字符串"
bytes_data = gzip.compress(data.encode("utf-8"))
print("compressed bytes: ", bytes_data)

bytes_dedata = gzip.decompress(bytes_data)
print("decompressed bytes: ", bytes_dedata.decode("utf-8"))