Python Basics --- Commonly Used Modules (to be continued)

Date: 2019-11-06


re module (regular expression module)

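A regular expression describes characters or strings by combining symbols that carry special meanings. Put differently, a regular expression is a rule that describes a whole class of strings. Regular expressions are built into Python and provided through the re module. A pattern is compiled into a series of bytecodes, which are then executed by a matching engine written in C.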

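\w matches a letter, digit, or underscore (in Python 3 str patterns this includes Unicode letters and digits)

\W matches any character that is not a letter, digit, or underscore

\s matches any whitespace character, equivalent to [ \t\n\r\f\v] for ASCII text

\S matches any non-whitespace character

\d matches any digit, equivalent to [0-9] for ASCII text (in Python 3 str patterns it also matches other Unicode decimal digits)

\D matches any non-digit character

\A matches only at the start of the string

\Z matches only at the very end of the string (unlike $, it does not match just before a trailing newline)

\z and \G come from Perl/PCRE syntax: Python's re does not support \G, and most Python 3 versions raise an error for \z

\n matches a newline character

\t matches a tab character

^ matches the start of the string (with re.MULTILINE, also the start of each line)

$ matches the end of the string, or just before a newline at the end of the string (with re.MULTILINE, also the end of each line)

. matches any character except a newline; when the re.DOTALL flag is given, it matches any character including a newline

[…] denotes a set of characters listed individually: [amk] matches 'a', 'm', or 'k'

[^…] matches any character not listed in the brackets, e.g. [^amk]

* matches 0 or more repetitions of the preceding expression (greedy)

+ matches 1 or more repetitions of the preceding expression (greedy)

? matches 0 or 1 repetition of the preceding expression (greedy); placed after another quantifier (*?, +?, ??, {n,m}?) it makes that quantifier non-greedy

{n} matches exactly n repetitions of the preceding expression

{n,m} matches n to m repetitions of the preceding expression, greedy

a|b matches a or b

() matches the expression inside the parentheses and also defines a group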

import re
# raw strings (r'...') pass backslashes through to the regex engine unchanged
print(re.findall(r'\w','hello_ | egon 123'))
print(re.findall(r'\W','hello_ | egon 123'))
print(re.findall(r'\s','hello_ | egon 123 \n \t'))
print(re.findall(r'\S','hello_ | egon 123 \n \t'))
print(re.findall(r'\d','hello_ | egon 123 \n \t'))
print(re.findall(r'\D','hello_ | egon 123 \n \t'))
print(re.findall('h','hello_ | hello h egon 123 \n \t'))
print(re.findall(r'\Ahe','hello_ | hello h egon 123 \n \t'))     # \A and ^ both anchor at the start
print(re.findall('^he','hello_ | hello h egon 123 \n \t'))
print(re.findall(r'123\Z','hello_ | hello h egon 123 \n \t123'))  # \Z and $ both anchor at the end
print(re.findall('123$','hello_ | hello h egon 123 \n \t123'))
print(re.findall(r'\n','hello_ | hello h egon 123 \n \t123'))
print(re.findall(r'\t','hello_ | hello h egon 123 \n \t123'))

Output:
['h', 'e', 'l', 'l', 'o', '_', 'e', 'g', 'o', 'n', '1', '2', '3']
[' ', '|', ' ', ' ']
[' ', ' ', ' ', ' ', '\n', ' ', '\t']
['h', 'e', 'l', 'l', 'o', '_', '|', 'e', 'g', 'o', 'n', '1', '2', '3']
['1', '2', '3']
['h', 'e', 'l', 'l', 'o', '_', ' ', '|', ' ', 'e', 'g', 'o', 'n', ' ', ' ', '\n', ' ', '\t']
['h', 'h', 'h']
['he']
['he']
['123']
['123']
['\n']
['\t']
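The examples above exercise only the single-character escapes and anchors. The following sketch, using made-up sample strings, shows the quantifiers, greedy versus non-greedy matching, character sets, alternation, groups, and the re.DOTALL flag from the table:

```python
import re

s = '<a><b>'
greedy = re.findall(r'<.*>', s)    # .* grabs as much as it can
lazy = re.findall(r'<.*?>', s)     # .*? stops at the first possible '>'
print(greedy)  # ['<a><b>']
print(lazy)    # ['<a>', '<b>']

# ? on its own: zero or one of the preceding character
print(re.findall(r'colou?r', 'color colour'))   # ['color', 'colour']

# {n} and {n,m}
print(re.findall(r'\d{3}', '12 123 1234'))      # ['123', '123']
print(re.findall(r'a{2,3}', 'a aa aaa aaaa'))   # ['aa', 'aaa', 'aaa']

# character sets and negated sets
print(re.findall(r'[amk]', 'make'))    # ['m', 'a', 'k']
print(re.findall(r'[^amk]', 'make'))   # ['e']

# alternation, and groups captured by ()
print(re.findall(r'egon|alex', 'egon and alex'))   # ['egon', 'alex']
m = re.search(r'(\w+)@(\w+)\.com', 'mail: egon@example.com')
print(m.groups())   # ('egon', 'example')

# . skips newlines unless re.DOTALL is given
print(re.findall(r'a.b', 'a\nb'))              # []
print(re.findall(r'a.b', 'a\nb', re.DOTALL))   # ['a\nb']
```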

Functions provided by the re module:

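re.findall()    finds all matches and returns them in a list

re.search()     finds only the first match and returns a Match object holding information about it; call group() on that object to get the matched string. If nothing matches, it returns None

re.match()      like search(), but only matches at the beginning of the string; search() with a ^ anchor can do the same job

re.split()      splits the string wherever the pattern matches

re.sub()        replaces matches: re.sub(pattern, repl, string, count=0); if count is not given, every match is replaced

re.subn()       like sub(), but returns a tuple (new_string, number_of_replacements)

re.compile()    compiles a pattern into a regex object so it can be reused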
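A quick sketch of these functions on a made-up sample string:

```python
import re

text = 'egon 18, alex 73, wupeiqi 84'

# search(): first match anywhere, returned as a Match object (or None)
m = re.search(r'\d+', text)
print(m.group())                          # 18

# match(): only tries at position 0
print(re.match(r'\d+', text))             # None
print(re.match(r'egon', text).group())    # egon

# split(): split on a comma followed by optional whitespace
parts = re.split(r',\s*', text)
print(parts)    # ['egon 18', 'alex 73', 'wupeiqi 84']

# sub()/subn(): replace digits; count limits the number of replacements
print(re.sub(r'\d+', 'N', text))             # egon N, alex N, wupeiqi N
print(re.sub(r'\d+', 'N', text, count=1))    # egon N, alex 73, wupeiqi 84
print(re.subn(r'\d+', 'N', text))            # ('egon N, alex N, wupeiqi N', 3)

# compile(): build the pattern once and reuse it
pattern = re.compile(r'(\w+) (\d+)')
pairs = pattern.findall(text)   # with two groups, findall returns tuples
print(pairs)    # [('egon', '18'), ('alex', '73'), ('wupeiqi', '84')]
```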

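3. time module

In Python, time is usually represented in three ways:

a. Timestamp

A timestamp is the offset in seconds from 00:00:00 on January 1, 1970 (UTC). Running type(time.time()) returns float.

b. Formatted time string

c. Structured time

A struct_time tuple has 9 elements: (year, month, day, hour, minute, second, weekday (0-6, Monday is 0), day of the year, daylight saving time flag)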
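The three representations convert into one another. A short sketch using only the time module; the date string is an arbitrary sample:

```python
import time

# a. timestamp: seconds since the epoch, as a float
ts = time.time()
print(type(ts))   # <class 'float'>

# timestamp -> c. struct_time (gmtime gives UTC, localtime gives local time)
epoch = time.gmtime(0)
print(epoch.tm_year, epoch.tm_mon, epoch.tm_mday, epoch.tm_hour)   # 1970 1 1 0

# c. struct_time -> b. formatted string
print(time.strftime('%Y-%m-%d %H:%M:%S', epoch))   # 1970-01-01 00:00:00

# b. formatted string -> c. struct_time (weekday and day of year are filled in)
st = time.strptime('2019-11-06 08:30:00', '%Y-%m-%d %H:%M:%S')
print(st.tm_wday, st.tm_yday)   # 2 310  (Wednesday, 310th day of the year)

# c. struct_time (local time) -> a. timestamp
local = time.localtime(ts)
back = time.mktime(local)       # whole seconds: the fractional part of ts is lost
print(abs(back - ts) < 1)       # True
```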

4. random module

5. os module

6. sys module

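7. json and pickle modules (serialization modules)

The process of turning an object (variable) in memory into something that can be stored or transmitted is called serialization.

In Python this is called pickling; in other languages it is also known as serialization, marshalling, flattening, and so on.

What serialization is for:

a. Persisting state

Before a power loss or a program restart, all the data currently held in the program's memory is saved (to a file), so that the next run can load the earlier data from that file and carry on. This is what serialization makes possible.

b. Cross-platform data exchange

Once serialized, data can be written to disk and can also be sent over the network to another machine. If the sender and receiver agree on a common serialization format, the limits imposed by differences between platforms and languages disappear, enabling cross-platform data exchange. Conversely, reading serialized content back into memory as variables is called deserialization, i.e. unpickling.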

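json module

To pass objects between different programming languages, we must serialize them into a standard format such as XML. A better choice is usually JSON: a JSON document is just a string that every language can read, and it is easy to store on disk or send over the network. JSON is a standard format, is usually more compact and faster to parse than XML, and can be read directly in web pages, so it is well suited to cross-platform data exchange. The trade-off is that a cross-platform format cannot support every data type of a particular language; for example, Python functions cannot be serialized to JSON.

structured data in memory <---> JSON format <---> string <---> saved to a file or sent over the network

Usage:

dumps: serialize an object to a JSON string

loads: deserialize a JSON string back into an object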

import json
dic={'name':'egon','age':18}
with open('a.json','w') as f:   # serialize the dict to a string and write it to the file
    f.write(json.dumps(dic))
with open('a.json','r') as f:   # read the string back and deserialize it
    data=f.read()
    dic=json.loads(data)
    print(dic['name'])

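dump: serialize an object directly into an open file

load: deserialize directly from an open file

import json
dic={'name':'egon','age':18}
with open('b.json','w') as f:   # serialize the dict into the file
    json.dump(dic,f)
with open('b.json','r') as f:   # deserialize from the file and print a field
    print(json.load(f)['name'])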
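A JSON round trip is not always lossless, which is the cost of the cross-platform format described above. In this sketch (sample data made up, greet is a hypothetical function), a tuple comes back as a list, a non-string dict key comes back as a string, and a function is rejected with TypeError:

```python
import json

def greet():
    return 'hi'

data = {'name': 'egon', 'tags': ('a', 'b'), 1: None, 'ok': True}
s = json.dumps(data)
print(s)      # {"name": "egon", "tags": ["a", "b"], "1": null, "ok": true}

back = json.loads(s)
print(back)           # {'name': 'egon', 'tags': ['a', 'b'], '1': None, 'ok': True}
print(back == data)   # False: the tuple became a list and the key 1 became "1"

try:
    json.dumps({'f': greet})   # functions have no JSON representation
except TypeError as e:
    print('TypeError:', e)
```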

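pickle module

pickle works only with Python, but it supports almost all Python data types. Pickles written by one Python version may not be readable by another (notably between Python 2 and Python 3), so pickle is best used for data that is not critical, where a failed deserialization does no harm. Never unpickle data from an untrusted source: unpickling can execute arbitrary code.

structured data in memory <---> pickle format <---> bytes <---> saved to a file or sent over the network

dumps: serialize an object to bytes

loads: deserialize bytes back into an object

dump: serialize an object directly into a file opened in binary mode

load: deserialize directly from a file opened in binary mode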

import pickle
dic={'name':'egon','age':18}
with open('d.pkl','wb') as f:        # serialize the dict to bytes and write them to the file
    f.write(pickle.dumps(dic))
with open('d.pkl','rb') as f:        # read the bytes back, deserialize, and print a field
    dic=pickle.loads(f.read())
    print(dic['name'])

import pickle
dic={'name':'egon','age':18}
with open('e.pkl','wb') as f:        # serialize the dict into the file
    pickle.dump(dic,f)
with open('e.pkl','rb') as f:        # deserialize from the file and print a field
    print(pickle.load(f)['name'])

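pickle does not restore objects from memory addresses. Classes and functions are stored by reference (their module and qualified name), not by their code, so when unpickling, the class or function the data refers to must be defined (importable) under the same name.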
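A minimal sketch of this by-reference behavior, using a hypothetical Point class. It assumes the code runs as a script, so Point lives in __main__: the pickled bytes contain the class name rather than its code, loading works while Point is defined, and fails with AttributeError once the name is deleted.

```python
import pickle

class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y

p = Point(1, 2)
blob = pickle.dumps(p)     # bytes
print(type(blob))          # <class 'bytes'>
print(b'Point' in blob)    # True: the class is recorded by name, not by code

q = pickle.loads(blob)     # works because Point is defined in this module
print(q.x, q.y)            # 1 2

# a lambda has no importable name, so pickle refuses it
try:
    pickle.dumps(lambda v: v)
except (pickle.PicklingError, AttributeError) as e:
    print('cannot pickle lambda:', type(e).__name__)

# once the name Point is gone, the same bytes can no longer be loaded
del Point
try:
    pickle.loads(blob)
except AttributeError as e:
    print('AttributeError:', e)
```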

8. shelve module

9. shutil module

A high-level module for working with files, directories, and archives.

Common functions:

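Copy the contents of one file object into another:

shutil.copyfileobj(fsrc, fdst[, length])   # fsrc and fdst are open file objects

Copy a file:

shutil.copyfile(src, dst) # the destination file does not need to exist

Copy only the permission bits; the contents, group, and owner are unchanged:

shutil.copymode(src, dst) # the destination file must exist

Copy only the status information, including mode bits, atime, mtime, and flags:

shutil.copystat(src, dst) # the destination file must exist

Copy a file together with its permissions:

shutil.copy(src, dst)

Copy a file together with its status information:

shutil.copy2(src, dst)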

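Recursively copy a directory:

shutil.ignore_patterns(*patterns)
shutil.copytree(src, dst, symlinks=False, ignore=None) # by default the destination directory must not exist yet (Python 3.8+ adds dirs_exist_ok=True to relax this); you need write permission on the parent directory of dst; ignore specifies what to exclude

Copying symbolic links:

import shutil

# copy f1 to f2, keeping symlinks as symlinks and skipping *.pyc files and entries whose names start with tmp
shutil.copytree('f1', 'f2', symlinks=True,ignore=shutil.ignore_patterns('*.pyc', 'tmp*'))

With the default symlinks=False, copying follows symbolic links and copies the files they point to, i.e. a new regular file is created in place of each symlink.

Recursively delete a directory tree:

shutil.rmtree(path[, ignore_errors[, onerror]])   # Python 3.12+ prefers the onexc parameter over onerror

Recursively move a file or directory, similar to the mv command; within the same filesystem it is essentially a rename, otherwise the source is copied and then removed:

shutil.move(src, dst)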

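Create an archive (e.g. zip or tar) and return its path:

shutil.make_archive(base_name, format,...)

base_name: the file name of the archive, or a path to it. If only a file name is given, the archive is saved in the current directory; otherwise it is saved at the given path.
e.g. data_bak => saved in the current directory
e.g. /tmp/data_bak => saved in /tmp/

format: the archive type: "zip", "tar", "bztar", "gztar", "xztar"

root_dir: the directory to archive (defaults to the current directory)

owner: user, defaults to the current user (tar formats only)

group: group, defaults to the current group (tar formats only)

logger: used for logging, usually a logging.Logger object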

Examples:

# archive the files under /data into the current program directory
import shutil
ret = shutil.make_archive("data_bak", 'gztar', root_dir='/data')

# archive the files under /data into the /tmp/ directory
import shutil
ret = shutil.make_archive("/tmp/data_bak", 'gztar', root_dir='/data')
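The functions above can be tried safely inside a temporary directory. In this sketch the tree f1/ (a.py, a.pyc, tmp_cache/) is a made-up example; it copies a file object, copies a file with its metadata, copies a tree with ignore_patterns, archives it with make_archive, and removes it with rmtree:

```python
import os
import shutil
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # hypothetical source tree: f1/a.py, f1/a.pyc, f1/tmp_cache/
    src_dir = os.path.join(tmp, 'f1')
    os.makedirs(os.path.join(src_dir, 'tmp_cache'))
    for name in ('a.py', 'a.pyc'):
        with open(os.path.join(src_dir, name), 'w') as f:
            f.write('print("hi")\n')

    # copyfileobj works on file objects that are already open
    with open(os.path.join(src_dir, 'a.py'), 'rb') as fsrc, \
            open(os.path.join(tmp, 'copy_of_a.py'), 'wb') as fdst:
        shutil.copyfileobj(fsrc, fdst)

    # copy2 copies the contents plus metadata such as the modification time
    shutil.copy2(os.path.join(src_dir, 'a.py'), os.path.join(tmp, 'a2.py'))

    # copytree: f2 must not exist yet; ignore_patterns filters entries by name
    dst_dir = os.path.join(tmp, 'f2')
    shutil.copytree(src_dir, dst_dir,
                    ignore=shutil.ignore_patterns('*.pyc', 'tmp*'))
    copied = sorted(os.listdir(dst_dir))

    # make_archive returns the full path of the archive it created
    archive = shutil.make_archive(os.path.join(tmp, 'f2_bak'), 'gztar',
                                  root_dir=dst_dir)
    archive_name = os.path.basename(archive)

    shutil.rmtree(dst_dir)
    removed = not os.path.exists(dst_dir)

print(copied, archive_name, removed)   # ['a.py'] f2_bak.tar.gz True
```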

Under the hood, shutil handles archives with the zipfile and tarfile modules (their ZipFile and TarFile classes), which can also be used directly:

import zipfile

# compress
with zipfile.ZipFile('laxi.zip', 'w') as z:
    z.write('a.log')
    z.write('data.data')

# decompress into the current directory
with zipfile.ZipFile('laxi.zip', 'r') as z:
    z.extractall(path='.')

import tarfile

# compress; arcname sets the name stored inside the archive
with tarfile.open('/tmp/egon.tar', 'w') as t:
    t.add('/test1/a.py', arcname='a.bak')
    t.add('/test1/b.py', arcname='b.bak')

# decompress into /egon
with tarfile.open('/tmp/egon.tar', 'r') as t:
    t.extractall('/egon')
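The snippet above needs existing files and absolute paths. Here is a self-contained variant that builds a sample file in a temporary directory and inspects the archives with ZipFile.namelist() and TarFile.getnames(). The 'w:gz' mode, which gzip-compresses the tar, is an addition not used above:

```python
import os
import tarfile
import tempfile
import zipfile

with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, 'a.log')          # hypothetical sample file
    with open(src, 'w') as f:
        f.write('log line\n')

    zip_path = os.path.join(tmp, 'laxi.zip')
    with zipfile.ZipFile(zip_path, 'w') as z:
        z.write(src, arcname='a.log')         # store under a short name
    with zipfile.ZipFile(zip_path) as z:
        zip_names = z.namelist()
        z.extractall(os.path.join(tmp, 'unzipped'))

    tar_path = os.path.join(tmp, 'egon.tar.gz')
    with tarfile.open(tar_path, 'w:gz') as t:   # gzip-compressed tar
        t.add(src, arcname='a.bak')
    with tarfile.open(tar_path, 'r:gz') as t:
        tar_names = t.getnames()

    with open(os.path.join(tmp, 'unzipped', 'a.log')) as f:
        restored = f.read()

print(zip_names, tar_names, repr(restored))   # ['a.log'] ['a.bak'] 'log line\n'
```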

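10. xml module

XML is a format for exchanging data between different languages or programs, similar in purpose to JSON, though JSON is simpler to use. Because XML appeared before JSON, many systems at traditional companies, such as those in the finance industry, still expose mainly XML interfaces.

XML uses <> nodes (tags) to delimit its data structure; the code below reads this sample from a file named xmltest.xml: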

<?xml version="1.0"?>
<data>
   <country name="Liechtenstein">
       <rank updated="yes">2</rank>
       <year>2008</year>
       <gdppc>141100</gdppc>
       <neighbor name="Austria" direction="E"/>
       <neighbor name="Switzerland" direction="W"/>
   </country>
   <country name="Singapore">
       <rank updated="yes">5</rank>
       <year>2011</year>
       <gdppc>59900</gdppc>
       <neighbor name="Malaysia" direction="N"/>
   </country>
   <country name="Panama">
       <rank updated="yes">69</rank>
       <year>2011</year>
       <gdppc>13600</gdppc>
       <neighbor name="Costa Rica" direction="W"/>
       <neighbor name="Colombia" direction="E"/>
   </country>
</data>

Working with XML:

import xml.etree.ElementTree as ET    # import the module

tree = ET.parse("xmltest.xml")
root = tree.getroot()
print(root.tag)

# traverse the XML document
for child in root:
    print('========>',child.tag,child.attrib,child.attrib['name'])
    for i in child:
        print(i.tag,i.attrib,i.text)

# traverse only the year nodes
for node in root.iter('year'):
    print(node.tag,node.text)
#---------------------------------------

import xml.etree.ElementTree as ET

tree = ET.parse("xmltest.xml")
root = tree.getroot()

# modify: increment every year and add two attributes
for node in root.iter('year'):
    new_year=int(node.text)+1
    node.text=str(new_year)
    node.set('updated','yes')
    node.set('version','1.0')
tree.write('test.xml')

# delete: remove every country whose rank is above 50
for country in root.findall('country'):   # findall() returns a list, so removing while looping is safe
    rank = int(country.find('rank').text)
    if rank > 50:
        root.remove(country)

tree.write('output.xml')
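ElementTree can also parse XML from a string and build new documents in memory. This sketch uses a trimmed copy of the sample above with ET.fromstring, findall/get/findtext, and Element/SubElement/tostring:

```python
import xml.etree.ElementTree as ET

xml_text = '''<data>
  <country name="Singapore"><rank>5</rank><year>2011</year></country>
  <country name="Panama"><rank>69</rank><year>2011</year></country>
</data>'''

root = ET.fromstring(xml_text)   # parse from a string; returns the root Element

# findall() finds direct children by tag; get() reads an attribute; findtext() reads a child's text
rows = [(c.get('name'), c.findtext('rank')) for c in root.findall('country')]
print(rows)   # [('Singapore', '5'), ('Panama', '69')]

# build a new document in memory
new_root = ET.Element('data')
country = ET.SubElement(new_root, 'country', name='Liechtenstein')
ET.SubElement(country, 'rank').text = '2'
out = ET.tostring(new_root, encoding='unicode')   # 'unicode' returns a str
print(out)    # <data><country name="Liechtenstein"><rank>2</rank></country></data>
```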

11. configparser module

Mainly used to parse configuration files.

A configuration file has the following format:

[section1]
k1 = v1
k2:v2
user=egon
age=18
is_admin=true
salary=31

[section2]
k1 = v1

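The operations are as follows:

import configparser # import the module

config=configparser.ConfigParser() # create a ConfigParser object and assign it to config

config.read('a.cfg') # load the configuration file shown above (assumed saved as a.cfg, the same file written at the end)

List the section headers:

config.sections()

List the keys of all key=value pairs under section1:

config.options('section1')

List all key=value pairs under section1 as (key, value) tuples:

config.items('section1')

Get the value of user under section1, as a string:

config.get('section1','user')

Get the value of age under section1, as an integer:

val1=config.getint('section1','age')

Get the value of is_admin under section1, as a boolean:

config.getboolean('section1','is_admin')

Get the value of salary under section1, as a float:

config.getfloat('section1','salary')

Remove the entire section2:

config.remove_section('section2')

Remove k1 and k2 under section1:

config.remove_option('section1','k1')

config.remove_option('section1','k2')

Check whether a section exists:

config.has_section('section1')

Check whether section1 has a user option:

config.has_option('section1','user')

Add a section:

config.add_section('egon')

Under section egon, add name=egon and age=18:

config.set('egon','name','egon')

config.set('egon','age','18') # values must be strings; config.set('egon','age',18) raises TypeError

Finally, write the changes back to the file to complete the modification:

with open('a.cfg','w') as f:
    config.write(f)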
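To try these methods without creating a.cfg, the sketch below loads the same sample configuration from a string with read_string() and writes the result to an in-memory io.StringIO instead of a file. Both are standard-library substitutes for the file operations above:

```python
import configparser
import io

cfg_text = """
[section1]
k1 = v1
k2:v2
user=egon
age=18
is_admin=true
salary=31

[section2]
k1 = v1
"""

config = configparser.ConfigParser()
config.read_string(cfg_text)    # same parsing as read(), but from a string

print(config.sections())                           # ['section1', 'section2']
print(config.get('section1', 'k2'))                # v2  (':' works like '=')
print(config.getint('section1', 'age'))            # 18
print(config.getboolean('section1', 'is_admin'))   # True
print(config.getfloat('section1', 'salary'))       # 31.0

try:
    config.set('section1', 'age', 19)   # non-string value
except TypeError as e:
    print('TypeError:', e)

config.set('section1', 'age', '19')
config.remove_section('section2')

buf = io.StringIO()             # write to memory instead of a file
config.write(buf)
written = buf.getvalue()
print(written)
```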

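12. hashlib module

hash: a kind of algorithm. In Python 3.x, hashlib replaces the old md5 and sha modules and mainly provides the SHA1, SHA224, SHA256, SHA384, SHA512, and MD5 algorithms.
Three properties:
1. The same content always produces the same hash value; even a slight change in the content changes the hash value
2. The hash cannot be reversed to recover the content
3. For a given algorithm, the hash value has a fixed length no matter how long the input is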

import hashlib

m=hashlib.md5()  # or m=hashlib.sha256()
m.update('hello'.encode('utf8'))
print(m.hexdigest())  #5d41402abc4b2a76b9719d911017c592
m.update('alvin'.encode('utf8'))
print(m.hexdigest())  #92a7e713c30abbb0319fa07da2a5c4af
m2=hashlib.md5()
m2.update('helloalvin'.encode('utf8'))
print(m2.hexdigest()) #92a7e713c30abbb0319fa07da2a5c4af
'''
Note: calling update() several times on consecutive pieces of a long piece of data gives the same result as a single update() on all of it,
and updating piece by piece is what makes it possible to hash large files.
'''
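The note about update() is what makes hashing large files practical: read the file in chunks and feed each chunk to update(). In this sketch, file_md5 and the 8192-byte chunk size are illustrative choices, and the sample data is written to a temporary file:

```python
import hashlib
import os
import tempfile

def file_md5(path, chunk_size=8192):
    """Hash a file piece by piece so it never has to fit in memory at once."""
    m = hashlib.md5()
    with open(path, 'rb') as f:
        # iter() keeps calling f.read(chunk_size) until it returns b'' (end of file)
        for chunk in iter(lambda: f.read(chunk_size), b''):
            m.update(chunk)
    return m.hexdigest()

payload = b'hello' * 100000     # about 500 KB of sample data
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(payload)
    path = tmp.name

try:
    chunked = file_md5(path)
    whole = hashlib.md5(payload).hexdigest()
    print(chunked == whole)     # True
finally:
    os.remove(path)

# property 3: the digest length is fixed regardless of input size
print(len(hashlib.md5(b'').hexdigest()), len(hashlib.md5(payload).hexdigest()))   # 32 32
print(len(hashlib.sha256(payload).hexdigest()))   # 64
```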

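Although these hash algorithms are still very strong, they have a weakness: the hash of a common input can be reversed by looking it up in a table of precomputed hashes (a dictionary attack). It is therefore worth mixing a custom key (a salt) into the data before hashing.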

import hashlib

# ######## 256 ########
# the custom key '898oaFs09f' is fed in first, then the actual content
h = hashlib.sha256('898oaFs09f'.encode('utf8'))
h.update('alvin'.encode('utf8'))
print(h.hexdigest())  #e79e68f070cdedcfe63eaf1a2e92c83b4cfb1b5c6bc452d214c1b7e77cdfd1c7


# demonstration of a dictionary attack on unsalted MD5 hashes
import hashlib
passwds=[
    'alex3714',
    'alex1313',
    'alex94139413',
    'alex123456',
    '123456alex',
    'a123lex',
    ]
def make_passwd_dic(passwds):
    # precompute the MD5 hex digest of every candidate password
    dic={}
    for passwd in passwds:
        m=hashlib.md5()
        m.update(passwd.encode('utf-8'))
        dic[passwd]=m.hexdigest()
    return dic

def break_code(cryptograph,passwd_dic):
    # look the target hash up in the precomputed table
    for k,v in passwd_dic.items():
        if v == cryptograph:
            print('password is ===>\033[46m%s\033[0m'%k)

cryptograph='aee949757a2e698417463d47acac93df'
break_code(cryptograph,make_passwd_dic(passwds))
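The salt in the SHA-256 example is one fixed string shared by every input. A common stronger variant, swapped in here and not part of the original text, uses a random per-password salt and a deliberately slow key-derivation function, hashlib.pbkdf2_hmac from the standard library. A minimal sketch; hash_password, verify_password, the 16-byte salt, and the iteration count are illustrative choices:

```python
import hashlib
import hmac
import os

def hash_password(password, salt=None, iterations=100_000):
    """Return (salt, digest). A random salt makes precomputed tables useless."""
    if salt is None:
        salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac('sha256', password.encode('utf-8'), salt, iterations)
    return salt, digest

def verify_password(password, salt, expected, iterations=100_000):
    _, digest = hash_password(password, salt, iterations)
    return hmac.compare_digest(digest, expected)   # constant-time comparison

salt, stored = hash_password('alex3714')
print(verify_password('alex3714', salt, stored))   # True
print(verify_password('alex1313', salt, stored))   # False

# the same password with a fresh random salt gives a different stored digest
print(hash_password('alex3714')[1] != stored)      # True
```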
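Python also has an hmac module, which further processes the key and the content we provide before hashing them:
import hmac
# Python 3.8+ requires digestmod; 'md5' reproduces the results shown here
h = hmac.new('alvin'.encode('utf8'), digestmod='md5')
h.update('hello'.encode('utf8'))
print(h.hexdigest())  #320df9832eab4c038b6c1d7ed73a5940

# For hmac results to be identical, you must ensure:
# 1: the initial key passed to hmac.new() is the same
# 2: no matter how many times update() is called, the content fed in adds up to the same bytes

import hmac

h1=hmac.new(b'egon', digestmod='md5')
h1.update(b'hello')
h1.update(b'world')
print(h1.hexdigest())

h2=hmac.new(b'egon', digestmod='md5')
h2.update(b'helloworld')
print(h2.hexdigest())

h3=hmac.new(b'egonhelloworld', digestmod='md5')
print(h3.hexdigest())

'''
f1bf38d054691688f89dcd34ac3c27f2
f1bf38d054691688f89dcd34ac3c27f2
bcca84edd9eeb86f30539922b28f3981
'''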
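A typical use of HMAC is checking that a message was not altered in transit. The sender and receiver share a secret key; the sender attaches the HMAC, and the receiver recomputes and compares it. A minimal sketch, assuming a hypothetical shared key b'egon' and SHA-256 passed explicitly as digestmod (required since Python 3.8); sign and verify are illustrative names, and hmac.compare_digest performs a constant-time comparison:

```python
import hashlib
import hmac

KEY = b'egon'   # hypothetical shared secret

def sign(message: bytes) -> str:
    return hmac.new(KEY, message, digestmod=hashlib.sha256).hexdigest()

def verify(message: bytes, signature: str) -> bool:
    return hmac.compare_digest(sign(message), signature)

sig = sign(b'helloworld')
print(len(sig))                     # 64 hex characters for SHA-256
print(verify(b'helloworld', sig))   # True
print(verify(b'hellow0rld', sig))   # False: the message was tampered with
```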

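13. subprocess module

Runs shell commands in a child process started from the Python interpreter.

stdout: standard output. The output is bytes; decode it with the system's encoding, e.g. decode('gbk') on a Chinese-locale Windows system and decode('utf-8') on Linux

stderr: standard error output

stdin: standard input

shell=True: run the command through the shell

subprocess.PIPE: send the output into a pipe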

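import subprocess

# first, list the files on the desktop
res1=subprocess.Popen('ls /Users/jieli/Desktop',shell=True,stdout=subprocess.PIPE)

# pass that output to this command as its input, keeping only file names that end in txt
res2=subprocess.Popen('grep txt$',shell=True,stdin=res1.stdout,stdout=subprocess.PIPE)

print(res2.stdout.read().decode('utf-8'))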
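The same two-step pipeline can also be written with subprocess.run (Python 3.7+ for capture_output and text) and without shell=True. To keep the sketch runnable anywhere, the two commands are stand-in Python one-liners run via sys.executable rather than the real ls and grep, and the file names are made up:

```python
import subprocess
import sys

# step 1: a stand-in for "ls" that prints three file names
result = subprocess.run(
    [sys.executable, '-c', 'print("a.txt"); print("b.py"); print("c.txt")'],
    capture_output=True, text=True,   # text=True decodes the bytes to str
)
print(result.returncode)   # 0

# step 2: a stand-in for "grep txt$" that reads step 1's output from stdin
filter_code = ('import sys; print("".join(line for line in sys.stdin '
               'if line.rstrip().endswith("txt")), end="")')
p2 = subprocess.run(
    [sys.executable, '-c', filter_code],
    input=result.stdout, capture_output=True, text=True,
)
print(p2.stdout.splitlines())   # ['a.txt', 'c.txt']
```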