The Python Newbie's Road: Python Basics - Modules

Date: 2019-12-01


What is a module?

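As a program grows during development, the code in a single file keeps getting longer and harder to maintain. To write maintainable code, we split functions into groups and put each group into its own file. The grouping rule is to collect the code that implements one piece of functionality into a single module, so each file contains relatively little code; many programming languages organize code this way. In Python, a single .py file is called a module. Modules are also often referred to as libraries.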

What are modules for?

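1. A module contains many functions and methods, and using them makes many tasks much simpler.
2. A module stores code permanently in a file. Code typed into the interactive Python interpreter is lost when you exit Python, whereas the code in a module file persists.
3. In practical terms, a module can be used across platforms simply by copying the code. For example, if a global object is used by many files, the best approach is to put it into a module and import it wherever it is needed.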

Types of modules

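Built-in modules: modules that ship with Python itself, for example random, json, string, base64, pickle, sys, os.

Custom modules: .py files, groups of modules, or packages that you write yourself for your own needs.

Third-party modules: modules, or even whole frameworks, that are not part of Python itself, for example requests, Image (from Pillow), Flask, Django, Scrapy.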

How to use modules

1. Importing

Modules are imported with the import statement: import module1[, module2[, ... moduleN]]. If the module is in the same directory as the main program, a plain import is enough.

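If the module is in a subdirectory of the main program's directory, add an empty __init__.py file to that subdirectory. This file makes the Python interpreter treat the whole subdirectory as a package, and the module can then be imported directly with "import subdirectory.module". (Since Python 3.3 this also works without __init__.py, as a namespace package, but adding the file is still the usual practice.)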

If the module is in the parent directory of the main program's directory, you have to modify the module search path. There are two ways:

(1) Run "import sys; sys.path.append('path of the parent directory')". This change is temporary: it only applies to the current Python interpreter process and is gone once Python is restarted.

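(2) Set the environment variable. On Windows the syntax is "set VARIABLE=path", for example set PYTHONPATH=C:\test\... . Check it with echo %PYTHONPATH%; inside the Python interpreter, sys.path will then contain the newly added path. Note that set only lasts for the current command-prompt session; to make it permanent, define PYTHONPATH in the system environment variables (or use setx).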

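Note: modifying the path is the general-purpose approach, because the Python interpreter looks for a module by going through the entries of sys.path one by one, and the directory of the running script comes first.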
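A minimal sketch of method (1), assuming a throwaway directory created with tempfile stands in for the parent directory; the module name greeting_mod and its hello() function are hypothetical. Once the directory is on sys.path, a plain import finds the module in it.

```python
import importlib
import os
import sys
import tempfile

# A throwaway directory standing in for "the parent directory";
# the module name greeting_mod is hypothetical.
lib_dir = tempfile.mkdtemp()
with open(os.path.join(lib_dir, "greeting_mod.py"), "w") as f:
    f.write("def hello(name):\n    return 'hello, ' + name\n")

# Method (1): only affects the current interpreter process.
sys.path.append(lib_dir)
importlib.invalidate_caches()  # let the import system notice the new file

import greeting_mod  # found by searching the sys.path entries in order

print(greeting_mod.hello("world"))  # hello, world
print(greeting_mod.__file__)        # a path inside lib_dir
```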

Further notes:

There is also a way to import a module dynamically, by name as a string, so that the import can be driven by user input, a URL, and so on. Example:

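inp = input("Enter the URL to visit: ")  # e.g. "commons/login" (hypothetical)
m, f = inp.split("/")
obj = __import__(m)
# The module named m is now imported and bound to obj.


# Importing a module by dotted path (more on __import__):
# __import__("lib.xxx.xxx.xx"+ m) by default returns only the top-level package lib
# __import__("lib.xxx.xxx.xx"+ m, fromlist=True) returns the innermost module instead, so the multi-level import takes effect; the default is the top level only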
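The difference between the two __import__ calls can be checked with a standard-library package. This sketch uses xml.etree.ElementTree only because it is always available. importlib.import_module, which the Python documentation recommends over calling __import__ directly, returns the innermost module without needing fromlist.

```python
import importlib

# With a dotted name, __import__ returns the TOP-LEVEL package...
top = __import__("xml.etree.ElementTree")
print(top.__name__)    # xml

# ...while a non-empty fromlist makes it return the innermost module.
inner = __import__("xml.etree.ElementTree", fromlist=["parse"])
print(inner.__name__)  # xml.etree.ElementTree

# importlib.import_module is the clearer way to get the same module.
mod = importlib.import_module("xml.etree.ElementTree")
print(mod is inner)    # True
```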

Example code:

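import commons  # commons is assumed to be your own module holding the functions


def run():
    inp = input("Enter the URL to visit: ")
    if hasattr(commons, inp):
        func = getattr(commons, inp)  # commons is the module, inp names a function in it
        func()
    else:
        print("Not found")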
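The run() example reads from the keyboard and depends on a commons module that is not shown. The sketch below is a self-contained variant under stated assumptions: commons is built in memory with types.ModuleType, login and logout are hypothetical functions, and the name is passed in as an argument so the function can be tested.

```python
import types

# In-memory stand-in for the commons module; login/logout are hypothetical.
commons = types.ModuleType("commons")
commons.login = lambda: "login page"
commons.logout = lambda: "logout page"


def run(inp):
    """Call the function in commons named by inp, or report that it is missing."""
    func = getattr(commons, inp, None)  # a default value replaces hasattr + getattr
    if callable(func):                  # skip plain attributes such as __name__
        return func()
    return "Not found"


print(run("login"))   # login page
print(run("delete"))  # Not found
```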

2. Naming

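When importing, Python searches the sys.path entries in order and stops as soon as it finds a match, and the current directory comes first. Module names should therefore not clash with standard-library or third-party modules; for example, a file named random.py in your project shadows the standard random module.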

3. Writing

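Writing a module is no different from writing ordinary functions. The main point is to keep the code for one kind of functionality together in the same module. This improves cohesion, makes the module easier for others to use, and keeps the packages of a larger project orderly and maintainable.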

4. Some important built-in variables

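__doc__      the docstring of a module, class or function
__file__     the path of the current .py file (os.path.dirname(__file__) gives its directory)
__cached__   the path of the compiled file in __pycache__; just good to know
__name__     1. the name of a function or module; 2. equals "__main__" only in the file that is being run directly
__package__  the package a module belongs to, e.g. admin.__package__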
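A short sketch of these variables in a single file; greet is a hypothetical function used only for illustration.

```python
import os


def greet(name):
    """Return a greeting for name."""
    return "hello, " + name


print(greet.__doc__)   # Return a greeting for name.
print(greet.__name__)  # greet
print(__name__)        # "__main__" when this file is run directly

# __file__ is not defined in an interactive session, hence the guard.
if "__file__" in globals():
    print(os.path.dirname(os.path.abspath(__file__)))  # directory of this file

if __name__ == "__main__":
    # Runs only when the file is executed, not when it is imported.
    print(greet("world"))
```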

Usage of common modules

1. sys

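sys provides a set of very useful services: many functions and variables for handling the Python runtime configuration and resources, so that a program can interact with the system environment outside it.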

1) sys.argv: the list of command-line arguments. The first element is the script name; the remaining elements correspond to $1, $2, $3 ... $n in a shell script.

2) sys.path: the list of directories searched for modules. It is often used to add the directories of other packages or modules.

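import sys, os
# main directory of the program
BASE_DIR = os.path.dirname(os.path.abspath(__file__))
# add the main directory to the module search path; usually done at the top of the file
sys.path.append(BASE_DIR)

# sys.path[0] is the directory of the script being run
print(sys.path[0], type(sys.path[0]))

#out: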
E:\學習經歷\python勃起\SVN目錄\S13-Day05\class <class 'str'>


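3) sys.exit(n): exits the program; exit(0) means a normal exit.

4) sys.platform: returns the name of the operating-system platform

5) sys.stdin: standard input

6) sys.stdout: standard output. This is in fact what print does: it appends a newline to the string you print and then calls sys.stdout.write.

7) sys.stderr: standard error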
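A minimal sketch of a command-line entry point built on sys.argv, sys.stdout and sys.stderr; main() and its greeting behaviour are hypothetical. Returning an exit code from main() instead of calling sys.exit() inside it keeps the function testable.

```python
import sys


def main(argv):
    """Hypothetical entry point: greet every name given on the command line."""
    if not argv:
        # Errors go to standard error, not standard output.
        print("usage: greet.py NAME [NAME ...]", file=sys.stderr)
        return 1
    for name in argv:
        sys.stdout.write("hello, " + name + "\n")  # what print() does under the hood
    return 0


print(sys.argv[0])   # the script name; the exact value depends on how Python was started
print(sys.platform)  # e.g. 'linux', 'win32', 'darwin'
```

When the file is run as a script, for example python greet.py alice bob, the usual last line is sys.exit(main(sys.argv[1:])).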

2. os

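This module contains general operating-system functionality. It is especially important if you want your program to be platform-independent, i.e. to run on both Linux and Windows without any changes and without problems.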

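os.getcwd()                 Get the current working directory, i.e. the directory the Python script is working in
os.chdir("dirname")         Change the current working directory; equivalent to cd in a shell
os.makedirs('dir1/dir2')    Create nested directories recursively; equivalent to mkdir -p on Linux
os.removedirs('dirname1')   Remove the directory if it is empty, then move up to the parent and remove it too if it is empty, and so on
os.mkdir('dirname')         Create a single directory; equivalent to mkdir dirname in a shell
os.rmdir('dirname')         Remove a single empty directory; raises an error if it is not empty; equivalent to rmdir dirname in a shell
os.listdir('dirname')       Return a list of all files and subdirectories in the given directory, including hidden files
os.remove()                 Delete a file
os.rename("oldname","new")  Rename a file or directory
os.stat('path/filename')    Get file/directory information
os.sep                      OS-specific path separator: "\\" on Windows, "/" on Linux
os.linesep                  Line terminator of the current platform: "\r\n" on Windows, "\n" on Linux
os.pathsep                  Separator between entries of a path list such as PATH: ";" on Windows, ":" on Linux
os.name                     String identifying the current platform: 'nt' on Windows, 'posix' on Linux
os.system("bash command")   Run a shell command; its output goes straight to the console and the exit status is returned
os.environ                  Mapping of the system environment variables
os.path.abspath(path)       Return the normalized absolute version of path
os.path.split(path)         Split path into a (directory, filename) tuple
os.path.dirname(path)       Return the directory part of path, i.e. the first element of os.path.split(path)
os.path.basename(path)      Return the last component of path; if path ends with a separator (/, or also \ on Windows), an empty string is returned. This is the second element of os.path.split(path)
os.path.exists(path)        Return True if path exists, otherwise False
os.path.isabs(path)         Return True if path is an absolute path
os.path.isfile(path)        Return True if path is an existing file, otherwise False
os.path.isdir(path)         Return True if path is an existing directory, otherwise False
os.path.join(path1[, path2[, ...]])  Join several path components; if a component is an absolute path, all components before it are discarded
os.path.getatime(path)      Return the last access time of the file or directory path points to
os.path.getmtime(path)      Return the last modification time of the file or directory path points to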


Key point: os.path.join joins several strings into a path and uses the separator appropriate to the current operating system ('/' or '\').
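The functions listed above can be tried together in a scratch directory. This sketch assumes a temporary directory from tempfile; the directory and file names are arbitrary.

```python
import os
import tempfile

base = tempfile.mkdtemp()                    # scratch directory for the demo
nested = os.path.join(base, "dir1", "dir2")  # joined with the OS-specific separator
os.makedirs(nested)                          # like mkdir -p
with open(os.path.join(base, "keep.txt"), "w") as f:
    f.write("keeps base non-empty")

file_path = os.path.join(nested, "notes.txt")
with open(file_path, "w") as f:
    f.write("hello")

print(os.path.split(file_path))      # (directory, 'notes.txt')
print(os.path.basename(file_path))   # notes.txt
print(os.listdir(nested))            # ['notes.txt']
print(os.path.isfile(file_path), os.path.isdir(nested))  # True True

# An absolute component discards everything before it.
print(os.path.join("a", "b", os.path.abspath("c")) == os.path.abspath("c"))  # True

os.remove(file_path)
# base still holds keep.txt, so removedirs stops there: dir2 and dir1 go, base stays.
os.removedirs(nested)
print(os.path.exists(os.path.join(base, "dir1")), os.path.exists(base))  # False True
```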

3. random

Python's random-number module. The commonly used functions are:

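random.random()                          Generate a random float n with 0 <= n < 1.0
random.uniform(a, b)                     Generate a random float within the given range; one argument is the lower bound and the other the upper bound
random.randint(a, b)                     Generate a random integer in the given range, where a is the lower bound and b the upper bound: a <= n <= b
random.randrange([start], stop[, step])  Return a random element from the numbers that start at start, increase by step and stop before stop
random.choice(sequence)                  Return a random element from a sequence; sequence must be an ordered (indexable) sequence type
random.sample(sequence, k)               Return a list of k distinct elements chosen at random from the sequence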
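A small sketch of these functions, using a separately seeded random.Random instance (an addition not in the list above) so that the results are reproducible; the die, card and rock-paper-scissors examples are illustrative only.

```python
import random

rng = random.Random(42)  # a separately seeded generator, so runs are repeatable

x = rng.random()                    # 0 <= x < 1.0
u = rng.uniform(5, 1)               # bounds may come in either order: 1 <= u <= 5
n = rng.randint(1, 6)               # a die roll: both ends inclusive
even = rng.randrange(0, 10, 2)      # one of 0, 2, 4, 6, 8 (stop is excluded)
pick = rng.choice(["rock", "paper", "scissors"])
hand = rng.sample(range(1, 53), 5)  # 5 distinct numbers out of 1..52

print(x, u, n, even, pick, hand)
print(random.Random(42).random() == x)  # True: same seed, same sequence
```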


4. time and datetime

import time
import datetime

print(time.time())  # current system time as a timestamp
print(time.ctime())  # e.g. Tue Jan 26 18:23:48 2016, the current system time
print(time.ctime(time.time()-86640))  # convert a timestamp to a string
print(time.gmtime(time.time()-86640))  # convert a timestamp to struct_time (UTC)
print(time.localtime(time.time()-86640))  # convert a timestamp to struct_time, in local time
print(time.mktime(time.localtime()))  # inverse of time.localtime(): struct_time back to a timestamp
# time.sleep(4)  # pause for 4 seconds
print(time.strftime("%Y-%m-%d %H:%M:%S", time.gmtime()))  # format struct_time as a string
print(time.strptime("2016-01-28", "%Y-%m-%d"))  # parse a string into struct_time

# datetime module

print(datetime.date.today())  # e.g. 2016-01-26
print(datetime.date.fromtimestamp(time.time()-864400))  # e.g. 2016-01-16: convert a timestamp to a date
current_time = datetime.datetime.now()
print(current_time)  # e.g. 2016-01-26 19:04:30.335935
print(current_time.timetuple())  # returns struct_time

# datetime.replace([year[, month[, day[, hour[, minute[, second[, microsecond[, tzinfo]]]]]]]])
print(current_time.replace(2014, 9, 12))  # e.g. 2014-09-12 19:06:24.074900: current time with the given fields replaced

str_to_date = datetime.datetime.strptime("21/11/06 16:30", "%d/%m/%y %H:%M")  # parse a string into a datetime
new_date = datetime.datetime.now() + datetime.timedelta(days=10)  # 10 days from now
new_date = datetime.datetime.now() + datetime.timedelta(days=-10)  # 10 days ago
new_date = datetime.datetime.now() + datetime.timedelta(hours=-10)  # 10 hours ago
new_date = datetime.datetime.now() + datetime.timedelta(seconds=120)  # 120 seconds from now
print(new_date)
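A sketch of date arithmetic with the same strptime/timedelta tools, on a fixed date so that the result does not depend on when it is run:

```python
import datetime

start = datetime.datetime.strptime("2016-01-28", "%Y-%m-%d")
later = start + datetime.timedelta(days=10)
print(later.strftime("%Y-%m-%d"))     # 2016-02-07 (the month rolls over)
print(later.isoweekday())             # 7, i.e. Sunday

gap = later - start                   # subtracting two datetimes gives a timedelta
print(gap.days, gap.total_seconds())  # 10 864000.0
```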


5. Serialization modules: json and pickle

json converts between strings and Python data types. It is better suited to cross-language exchange (the data usually travels as a string).

json.loads converts a string into Python data types
json.dumps converts basic Python data types into a string

import json
dic = '{"k1":1, "k2":2}'  # JSON strings must use double quotes
print(json.loads(dic), type(json.loads(dic)))

out: {'k1': 1, 'k2': 2} <class 'dict'>


dic = {'k1':1}
s = json.dumps(dic)
print(s, type(s))

out: {"k1": 1} <class 'str'>


json.load reads a JSON-formatted string from a file and converts it into Python data

json.dump writes data to a file in JSON format

import json, os
li = [11, 22, 33]
with open('write.txt', 'w') as f:
    json.dump(li, f)
os.system("type write.txt")  # Windows; use "cat write.txt" on Linux

out: [11, 22, 33]


with open('write.txt', 'r') as f:
    LI = json.load(f)
print(LI, type(LI))

out: [11, 22, 33] <class 'list'>
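A sketch of json with files opened through with statements, plus two options not mentioned above: ensure_ascii=False keeps non-ASCII characters as they are, and indent pretty-prints the file. The file name data.json is arbitrary.

```python
import json
import os
import tempfile

data = {"name": "中文", "scores": [90, 85], "active": True, "note": None}

# ensure_ascii=False keeps non-ASCII text readable instead of \uXXXX escapes
text = json.dumps(data, ensure_ascii=False)
print(text)  # {"name": "中文", "scores": [90, 85], "active": true, "note": null}

path = os.path.join(tempfile.mkdtemp(), "data.json")
with open(path, "w", encoding="utf-8") as f:  # explicit encoding avoids platform defaults
    json.dump(data, f, ensure_ascii=False, indent=2)

with open(path, encoding="utf-8") as f:
    loaded = json.load(f)

print(loaded == data)  # True: True/None round-trip through true/null
```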


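pickle converts Python objects, including Python-specific types, to and from a byte stream; it can handle complex Python objects and is a way of storing them persistently. Drawback: because of differences between Python versions, data or files pickled by one version may not be loadable by another.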

pickle.loads converts bytes data back into the corresponding Python data

pickle.dumps converts Python data into a bytes object

import pickle
li = [11,22,33]
r = pickle.dumps(li)
print(r, type(r))

# the exact bytes depend on the pickle protocol (protocol 3 here, the default in Python 3.0-3.7)
out: b'\x80\x03]q\x00(K\x0bK\x16K!e.' <class 'bytes'>


s = pickle.loads(r)
print(s, type(s))

out: [11, 22, 33] <class 'list'>


pickle.load reads data from a pickle-format file and converts it into Python data.

pickle.dump stores Python data in a file and returns None

import pickle
li = [11,22,33]
with open("write.txt", 'wb') as f:  # pickle files must be opened in binary mode
    r = pickle.dump(li, f)
print(r, type(r))

out: None <class 'NoneType'>

with open("write.txt", 'rb') as f:
    s = pickle.load(f)
print(s, type(s))

out: [11, 22, 33] <class 'list'>
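The practical difference from json is the range of types. This sketch round-trips a date, a set, bytes and a tuple, none of which json can represent as-is; the file name state.pkl is arbitrary. Only unpickle data you trust: the pickle documentation warns that loading a pickle can execute arbitrary code.

```python
import datetime
import json
import os
import pickle
import tempfile

obj = {
    "when": datetime.date(2016, 1, 26),
    "tags": {"a", "b"},
    "raw": b"\x00\x01",
    "pair": (1, 2),
}

try:
    json.dumps(obj)
except TypeError as e:
    print("json cannot do this:", e)

path = os.path.join(tempfile.mkdtemp(), "state.pkl")
with open(path, "wb") as f:  # pickle always needs binary mode
    pickle.dump(obj, f)

with open(path, "rb") as f:
    restored = pickle.load(f)

print(restored == obj)  # True; the tuple is still a tuple
```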


6. logging: a convenient, thread-safe logging module

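Logging is used in almost every program, and logging is a built-in Python module (note: built in, so it works across platforms, across platforms, across platforms; important things are worth saying three times). Writing simple log messages to a file is very easy and takes two steps: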

1) define the file 2) write the messages (if you only want output on the screen, step 1 can be omitted)

import logging, os

logging.basicConfig(filename='example.log',level=logging.INFO)
logging.debug('This message should go to the log file')
logging.info('So should this')
logging.warning('And this, too')
os.system("type example.log")  # Windows; use "cat example.log" on Linux

out: 
INFO:root:So should this
WARNING:root:And this, too


In the example above, level is not actually required; the default is level=WARNING. Only messages whose level is greater than or equal to the configured level are output. All log levels are:

Level	When it’s used
`DEBUG`	Detailed information, typically of interest only when diagnosing problems.
`INFO`	Confirmation that things are working as expected.
`WARNING`	An indication that something unexpected happened, or indicative of some problem in the near future (e.g. ‘disk space low’). The software is still working as expected.
`ERROR`	Due to a more serious problem, the software has not been able to perform some function.
`CRITICAL`	A serious error, indicating that the program itself may be unable to continue running.

Other parameters of logging.basicConfig:

    filename  Specifies that a FileHandler be created, using the specified
              filename, rather than a StreamHandler.  # name of the output file
    filemode  Specifies the mode to open the file, if filename is specified
              (if filemode is unspecified, it defaults to 'a').  # mode for opening the log file; defaults to 'a' (append)
    format    Use the specified format string for the handler.  # log record format
    datefmt   Use the specified date/time format.  # time format, i.e. the format of %(asctime)s
    style     If a format string is specified, use this to specify the
              type of format string (possible values '%', '{', '$', for
              %-formatting, :meth:`str.format` and :class:`string.Template`
              - defaults to '%').
    level     Set the root logger level to the specified level.  # logging level
    stream    Use the specified stream to initialize the StreamHandler. Note
              that this argument is incompatible with 'filename' - if both
              are present, 'stream' is ignored.  # conflicts with filename; in Python 3 passing both raises ValueError
    handlers  If specified, this should be an iterable of already created
              handlers, which will be added to the root handler. Any handler
              in the list which does not have a formatter assigned will be
              assigned the formatter created in this function.

format is the most frequently used parameter; it defines the log format, for example: format='%(asctime)s - %(name)s - %(levelname)s -%(module)s: %(message)s'

What does each %()s placeholder mean? See the table below (pay particular attention to levelname, filename, module, lineno, funcName, asctime and message):

    %(name)s            Name of the logger (logging channel)
    %(levelno)s         Numeric logging level for the message (DEBUG, INFO,
                        WARNING, ERROR, CRITICAL)
    %(levelname)s       Text logging level for the message ("DEBUG", "INFO",
                        "WARNING", "ERROR", "CRITICAL")
    %(pathname)s        Full pathname of the source file where the logging
                        call was issued (if available)
    %(filename)s        Filename portion of pathname
    %(module)s          Module (name portion of filename)
    %(lineno)d          Source line number where the logging call was issued
                        (if available)
    %(funcName)s        Function name
    %(created)f         Time when the LogRecord was created (time.time()
                        return value)
    %(asctime)s         Textual time when the LogRecord was created
    %(msecs)d           Millisecond portion of the creation time
    %(relativeCreated)d Time in milliseconds when the LogRecord was created,
                        relative to the time the logging module was loaded
                        (typically at application startup time)
    %(thread)d          Thread ID (if available)
    %(threadName)s      Thread name (if available)
    %(process)d         Process ID (if available)
    %(message)s         The result of record.getMessage(), computed just as
                        the record is emitted
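To see how a format string maps to output, this sketch sends records to an in-memory io.StringIO stream instead of the screen or a file; the logger name demo and the function do_work are hypothetical.

```python
import io
import logging

stream = io.StringIO()  # capture output in memory for the demo
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(levelname)s:%(name)s:%(funcName)s:%(message)s"))

logger = logging.getLogger("demo")  # arbitrary logger name
logger.setLevel(logging.INFO)
logger.addHandler(handler)
logger.propagate = False            # keep records away from the root logger


def do_work():
    logger.debug("not shown: below INFO")
    logger.warning("disk space low")


do_work()
print(stream.getvalue(), end="")  # WARNING:demo:do_work:disk space low
```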

Example 1: print log messages to the screen

import logging

logging.debug('This is debug message')
logging.info('This is info message')
logging.warning('This is warning message')
logging.critical('This is critical message')
logging.error('This is error message')

out:
WARNING:root:This is warning message
CRITICAL:root:This is critical message
ERROR:root:This is error message


In the example above, only messages of level WARNING and above reach the screen, because the default level is WARNING, as mentioned earlier.

Example 2: send log output to both the screen and a log file

import logging

#define logfile/logformat/loglevel for file log
logging.basicConfig(level=logging.DEBUG,
                format='%(asctime)s %(filename)s[line:%(lineno)d] %(levelname)s %(message)s',
                datefmt='%a, %d %b %Y %H:%M:%S',
                filename='example.log',
                filemode='w')

#create logger obj
logger = logging.getLogger('CURRENT-USER')
logger.setLevel(logging.DEBUG)

#create console handler and set level to INFO
ch = logging.StreamHandler()
ch.setLevel(logging.INFO)

# define log format for console log
formatter = logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s')
ch.setFormatter(formatter)

# add console handler to logger obj
logger.addHandler(ch)

logger.debug('This is debug message')
logger.info('This is info message')
logger.warning('This is warning message')
logger.critical('This is critical message')
logger.error('This is error message')


Resulting terminal output:

2016-06-08 16:26:50,007 - CURRENT-USER - INFO - This is info message
2016-06-08 16:26:50,007 - CURRENT-USER - WARNING - This is warning message
2016-06-08 16:26:50,007 - CURRENT-USER - CRITICAL - This is critical message
2016-06-08 16:26:50,010 - CURRENT-USER - ERROR - This is error message

Resulting file contents:

Wed, 08 Jun 2016 16:30:35 practice3.py[line:183] DEBUG This is debug message
Wed, 08 Jun 2016 16:30:35 practice3.py[line:184] INFO This is info message
Wed, 08 Jun 2016 16:30:35 practice3.py[line:185] WARNING This is warning message
Wed, 08 Jun 2016 16:30:35 practice3.py[line:186] CRITICAL This is critical message
Wed, 08 Jun 2016 16:30:35 practice3.py[line:187] ERROR This is error message

Example 3: log rotation (TimedRotatingFileHandler and RotatingFileHandler)

Both TimedRotatingFileHandler and RotatingFileHandler live in logging.handlers and ultimately inherit from logging.FileHandler.

import logging
from logging.handlers import RotatingFileHandler

# Define a RotatingFileHandler: keep at most 5 backup log files, each at most 238 bytes
Rthandler = RotatingFileHandler('example.log', maxBytes=238,backupCount=5)
Rthandler.setLevel(logging.INFO)
formatter = logging.Formatter('%(asctime)s %(filename)s[line:%(lineno)d] %(levelname)s %(message)s')
Rthandler.setFormatter(formatter)
logging.getLogger('').addHandler(Rthandler)
logging.getLogger('').setLevel(logging.INFO)  # the root logger defaults to WARNING

logging.debug('This is debug message')
logging.info('This is info message')
logging.warning('This is warning message')
logging.critical('This is critical message')
logging.error('This is error message')
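A self-contained version of the rotation example, assuming a temporary directory and a dedicated logger (rotate-demo is an arbitrary name) so the root logger is left alone. Writing 100 short records forces several rollovers, after which at most five backups remain next to the current file.

```python
import logging
import os
import tempfile
from logging.handlers import RotatingFileHandler

log_dir = tempfile.mkdtemp()
log_path = os.path.join(log_dir, "example.log")

# Same limits as above: at most 238 bytes per file and 5 backup files.
handler = RotatingFileHandler(log_path, maxBytes=238, backupCount=5)
handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))

logger = logging.getLogger("rotate-demo")  # arbitrary name, not the root logger
logger.setLevel(logging.INFO)
logger.addHandler(handler)
logger.propagate = False

for i in range(100):
    logger.info("message number %d", i)

handler.close()
files = sorted(os.listdir(log_dir))
print(files)  # example.log plus example.log.1 ... example.log.5
```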

For more usage, see http://www.cnblogs.com/dkblog/archive/2011/08/26/2155018.html

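PS: There is also the syslog module, but it is less convenient than logging and, crucially, it is not cross-platform (it is a Unix-only standard-library module and does not exist on Windows), so it is not covered here.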

7. Hashing module: hashlib

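If you only use hashlib.md5(), hashlib.sha1(), hashlib.sha256(), hashlib.sha384() or hashlib.sha512() directly, a hash can be reversed by looking it up in a precomputed database of known hashes. It is therefore worth mixing a custom key into the hash, which is called salting. Salted sha512 is used as the example here; the other algorithms work the same way.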

import hashlib

hash = hashlib.sha512('nihao'.encode('utf-8'))
hash.update('123'.encode("utf-8"))
print(hash.hexdigest())

out: 480ad41a6a159cba1811ccac4561845816e9a488cc992b0979a73065560e6a30f34a1f1a051c7044ae7d636df0327cc4f3bb7f54e129e4d76688f389394c257c
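Two points are worth checking in code. First, update() only appends data, so the salted example above equals hashing the concatenated string. Second, for storing passwords a random per-user salt and a deliberately slow function are preferable; this sketch swaps in hashlib.pbkdf2_hmac for that, and the iteration count of 100_000 is an assumption to tune, not something from the original text. hash_password is a hypothetical helper.

```python
import hashlib
import os

# update() appends, so this equals sha512(b"nihao" + b"123").
h1 = hashlib.sha512("nihao".encode("utf-8"))
h1.update("123".encode("utf-8"))
h2 = hashlib.sha512("nihao123".encode("utf-8"))
print(h1.hexdigest() == h2.hexdigest())  # True


def hash_password(password, salt=None):
    """Hypothetical helper: salted PBKDF2-HMAC-SHA256; store the salt with the hash."""
    if salt is None:
        salt = os.urandom(16)  # random per-user salt
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)
    return salt, digest


salt, stored = hash_password("123")
print(hash_password("123", salt)[1] == stored)  # True: same password and salt
print(hash_password("124", salt)[1] == stored)  # False
```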

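PS: Note that the hash algorithms Python guarantees on all platforms are listed below (output from an older Python 3; Python 3.6 and later also guarantee the sha3_*, shake_* and blake2* algorithms):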

>>> hashlib.algorithms_guaranteed

{'sha224', 'sha512', 'sha256', 'sha384', 'sha1', 'md5'}

8. Keyed-hash message authentication: hmac

hmac is mainly used for authentication. It is typically used like this:

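1. The client sends a login request (say, a GET request from a browser).

2. The server returns a random value and records it in the session.

3. The client uses the random value as the key, computes the HMAC of the user's password, and submits the result to the server.

4. The server computes the same HMAC from the user's password stored in its database and the random value from step 2, then compares it with the result sent by the user; if they match, the user is authenticated.

In this process, what an attacker could intercept is the random value sent by the server and the HMAC result sent by the user. These two values do not directly reveal the password (although a weak password could still be guessed offline from them), and because of the random value the HMAC result is only valid for the current session, which greatly improves security and practicality.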

import hmac
myhmac = hmac.new(b'suijizhi', digestmod='md5')  # digestmod is required since Python 3.8 (it used to default to MD5)
myhmac.update(b'mypassword')
print(myhmac.hexdigest())

out: 7b6a9485f5b1f513d6d55b24642db70c
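The four steps can be simulated in one process. This is a sketch under assumptions: the USER_DB dictionary is hypothetical and stores the password in plain text only to keep the example short, SHA-256 replaces the MD5 used above, and secrets.token_bytes plays the server's random value. hmac.compare_digest is used for the final comparison because it runs in constant time.

```python
import hashlib
import hmac
import secrets

# Hypothetical server-side user store (plain text only to keep the sketch short).
USER_DB = {"alice": b"mypassword"}


def server_issue_challenge():
    """Step 2: the server sends a random value and remembers it in the session."""
    return secrets.token_bytes(16)


def client_response(challenge, password):
    """Step 3: HMAC of the password, keyed with the server's random value."""
    return hmac.new(challenge, password, hashlib.sha256).hexdigest()


def server_verify(user, challenge, response):
    """Step 4: recompute with the stored password and compare in constant time."""
    expected = hmac.new(challenge, USER_DB[user], hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, response)


challenge = server_issue_challenge()
print(server_verify("alice", challenge, client_response(challenge, b"mypassword")))  # True
print(server_verify("alice", challenge, client_response(challenge, b"wrong")))       # False
```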

Further reading: "An analysis of hash length extension attacks"
                 "An introduction to Hash Length Extension Attacks" (Baidu Security Forum)

9. The re module

1) re.match(pattern, string, flags=0) tries to match the pattern at the start of the string and returns a single match; if the start of the string does not match, it returns None.

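The first argument is the regular expression; if the match succeeds a Match object is returned, otherwise None.
The second argument is the string to match.
The third argument is the flags, which control how the expression is matched, e.g. case-insensitive matching, multi-line matching, and so on.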

# The flags (as defined in the re module source)

I = IGNORECASE = sre_compile.SRE_FLAG_IGNORECASE # ignore case
L = LOCALE = sre_compile.SRE_FLAG_LOCALE # assume current 8-bit locale
U = UNICODE = sre_compile.SRE_FLAG_UNICODE # assume unicode locale
M = MULTILINE = sre_compile.SRE_FLAG_MULTILINE # make anchors look for newline
S = DOTALL = sre_compile.SRE_FLAG_DOTALL # make dot match newline
X = VERBOSE = sre_compile.SRE_FLAG_VERBOSE # ignore whitespace and comments
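A quick illustration of the most common flags in use; the sample strings are arbitrary, and flags can be combined with |.

```python
import re

print(re.match(r"jgood", "JGood is cool", re.I))             # matches despite the case
print(re.findall(r"^\w+", "first line\nsecond line"))        # ['first']
print(re.findall(r"^\w+", "first line\nsecond line", re.M))  # ['first', 'second']
print(re.findall(r"a.b", "a\nb", re.S))                      # ['a\nb']
print(re.findall(r"^A", "a1\nA2", re.I | re.M))              # ['a', 'A']
```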

Example:

import re
# match the first word
text = "JGood is a handsome boy, he is cool, clever, and so on..."
m = re.match(r"(\w+)\s", text)
print(m)
print(m.group())
print(m.group(0))
print(m.group(1))

out:
<_sre.SRE_Match object; span=(0, 6), match='JGood '>   # Python 3.7+ prints <re.Match object; ...>
JGood_   # _ stands for a space
JGood_
JGood

From the output you can see that m.group() == m.group(0) == m.group(1) + ' ' (the whitespace matched by \s).

2) re.search(pattern, string, flags=0) scans the string for the pattern and returns only the first match found; if nothing matches, it returns None.

text = "JGood is a handsome boy, he is cool, clever, and so on..."
m = re.search(r"\w{8}\s", text)
print(m)
print(m.group())
print(m.group(0))

out:
<_sre.SRE_Match object; span=(11, 20), match='handsome '>
handsome_    # _ stands for a space
handsome_

3) As print(m) above shows, the result is an SRE_Match object (re.Match in Python 3.7+). Its most commonly used methods are:

　　group([group1,…])

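Returns one or more subgroups of the match. With a single argument the result is a string; with several arguments the result is a tuple with one item per argument. Without arguments, group1 defaults to 0, i.e. the whole match. If a groupN argument is 0, the corresponding return value is the entire matching string; if it is in the inclusive range [1..99], it is the string matched by the corresponding parenthesized group. If a group number is negative or larger than the number of groups defined in the pattern, an IndexError is raised. If a group is contained in a part of the pattern that did not match, the corresponding result is None. If a group is contained in a part of the pattern that matched multiple times, the last match is returned. Subgroups are numbered by their opening parentheses from left to right.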

　　groups([default])

Returns a tuple containing all the subgroups. default is the value used for groups that did not take part in the match; it defaults to None.

　　groupdict([default])

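Returns a dictionary of all the named subgroups of the match, keyed by group name, with the matched strings as values. default is used for groups that did not take part in the match, exactly as in groups(); it defaults to None.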
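A sketch of group(), groups() and groupdict() with named groups, using a hypothetical date pattern in which the day part is optional:

```python
import re

m = re.match(r"(?P<year>\d{4})-(?P<month>\d{2})(?:-(?P<day>\d{2}))?", "2016-01")

print(m.group(1, 2))     # ('2016', '01'): several arguments give a tuple
print(m.groups())        # ('2016', '01', None): day took no part in the match
print(m.groups("--"))    # ('2016', '01', '--'): default for unmatched groups
print(m.groupdict())     # {'year': '2016', 'month': '01', 'day': None}
print(m.group("month"))  # 01: named groups can also be fetched by name
```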

4) re.findall(pattern, string, flags=0) returns all non-overlapping matches in the string

import re
text = "JGood is a handsome boy, he is cool, clever, and so on..."
obj = re.findall(r'\wo{2}\w', text)
print(obj)

out: ['Good', 'cool']

5) re.sub(pattern, repl, string, count=0, flags=0) replaces the matches in a string.

text = "JGood is a handsome boy, he is cool, clever, and so on..."
obj = re.sub(r'\s+', '-', text)  # replace runs of whitespace with "-"
print(obj)

out: 
JGood-is-a-handsome-boy,-he-is-cool,-clever,-and-so-on...

6) re.split(pattern, string, maxsplit=0, flags=0) splits the string at the matches

text = "JGood is a handsome boy, he is cool, clever, and so on..."
obj = re.split(r'\s+', text)  # split on whitespace
print(obj)

out: 
['JGood', 'is', 'a', 'handsome', 'boy,', 'he', 'is', 'cool,', 'clever,', 'and', 'so', 'on...']

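7) re.compile(pattern, flags=0) compiles a regular expression into a regular-expression object. Compiling frequently used expressions this way can improve efficiency somewhat (the re module also caches recently used patterns, so the gain is modest).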

import re

text = "JGood is a handsome boy, he is cool, clever, and so on..."
regex = re.compile(r'\w*oo\w*')
print(regex.findall(text))  # find all words containing 'oo'
print(regex.sub(lambda m: '[' + m.group(0) + ']', text))  # wrap every word containing 'oo' in []

out:
['JGood', 'cool']
[JGood] is a handsome boy, he is [cool], clever, and so on...

10. configparser handles files in a specific (INI-style) format; under the hood it simply uses open to work with the file

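1) Note 1: the file format is as follows (comments go on their own lines, because by default configparser does not strip a comment written after a value):

# section 1
[section1]
# value 1
k1 = v1
# value 2 (":" also works as a delimiter)
k2:v2

# section 2
[section2]
# value 1
k1 = v1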

Note 2: values read by configparser are str by default, so values you store must also be passed as str. To get values of another type, convert them like this:

config.getint(section_name, key_name), config.getfloat(section_name, key_name), config.getboolean(section_name, key_name)

2) Get all sections: config.sections() returns a list

3) Get the key-value pairs of a section: config.items(section_name)

4) Get all keys of a section: config.options(section_name)

5) Get the value of a given key in a section: config.get(section_name, key_name)

6) Check for, remove and add a section:

config.has_section(section_name)

config.remove_section(section_name)

config.add_section(section_name)

7) Check for, remove and set a key in a given section:

config.has_option(section_name, key_name)

config.remove_option(section_name, key_name)

config.set(section_name, key_name, value)
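Putting the configparser calls together: this sketch reads an inline INI string with read_string() instead of a file so that it runs on its own, then writes the result to an in-memory buffer. The section and key names mirror the format above; section3, k3 and enabled are made up for the example.

```python
import configparser
import io

ini_text = """
[section1]
k1 = v1
k2: 20

[section2]
enabled = yes
"""

config = configparser.ConfigParser()
config.read_string(ini_text)  # config.read("file.ini") reads from a file instead

print(config.sections())                         # ['section1', 'section2']
print(config.options("section1"))                # ['k1', 'k2']
print(config.get("section1", "k1"))              # v1
print(config.getint("section1", "k2") + 1)       # 21
print(config.getboolean("section2", "enabled"))  # True

if not config.has_section("section3"):
    config.add_section("section3")
config.set("section3", "k3", "3")  # values must be passed as str
config.remove_option("section1", "k1")

out = io.StringIO()
config.write(out)  # with open("file.ini", "w") as f: config.write(f)
print(out.getvalue())
```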

11. XML processing module: xml

Uses: 1. display on web pages 2. configuration files

Storage: 1. in files 2. as XML-formatted data inside the program

1) Parsing XML

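There are two ways to parse XML. The first is direct parsing: the XML file is loaded into memory and parsed into an XML object.

The second is indirect parsing: the XML is read into memory with open, and the resulting str is parsed into an XML object.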

Test data:

# filename : example.xml
<data>
    <country name="Liechtenstein">
        <rank updated="yes">2</rank>
        <year>2023</year>
        <gdppc>141100</gdppc>
        <neighbor direction="E" name="Austria" />
        <neighbor direction="W" name="Switzerland" />
    </country>
    <country name="Singapore">
        <rank updated="yes">5</rank>
        <year>2026</year>
        <gdppc>59900</gdppc>
        <neighbor direction="N" name="Malaysia" />
    </country>
    <country name="Panama">
        <rank updated="yes">69</rank>
        <year>2026</year>
        <gdppc>13600</gdppc>
        <neighbor direction="W" name="Costa Rica" />
        <neighbor direction="E" name="Colombia" />
    </country>
</data>


Direct parsing

from xml.etree import ElementTree as ET

# parse the XML directly
# An ElementTree object can write in-memory XML data to a file; a plain Element cannot
tree = ET.parse("example.xml")
root = tree.getroot()
print(root)

out: <Element 'data' at 0x0000000000A56138>


Indirect parsing

from xml.etree import ElementTree as ET

str_xml = open('example.xml', 'r').read()
root = ET.XML(str_xml)
print(root) 

out: <Element 'data' at 0x0000000000C37818>


2) Traverse all the content of an XML document

from xml.etree import ElementTree as ET
tree = ET.parse("example.xml")
root = tree.getroot()

for child in root:
    print(child, child.tag, child.attrib)
    for gradechild in child:
        print(gradechild, gradechild.tag, gradechild.text, gradechild.attrib, )

out: <Element 'country' at 0x0000000000E03AE8> country {'name': 'Liechtenstein'}
<Element 'rank' at 0x0000000000E18318> rank 2 {'updated': 'yes'}
<Element 'year' at 0x0000000000E18368> year 2023 {}
<Element 'gdppc' at 0x0000000000E183B8> gdppc 141100 {}
<Element 'neighbor' at 0x0000000000E18408> neighbor None {'direction': 'E', 'name': 'Austria'}
<Element 'neighbor' at 0x0000000000E18458> neighbor None {'direction': 'W', 'name': 'Switzerland'}
<Element 'country' at 0x0000000000E184A8> country {'name': 'Singapore'}
<Element 'rank' at 0x0000000000E184F8> rank 5 {'updated': 'yes'}
<Element 'year' at 0x0000000000E18548> year 2026 {}
<Element 'gdppc' at 0x0000000000E18598> gdppc 59900 {}
<Element 'neighbor' at 0x0000000000E185E8> neighbor None {'direction': 'N', 'name': 'Malaysia'}
<Element 'country' at 0x0000000000E18638> country {'name': 'Panama'}
<Element 'rank' at 0x0000000000E18688> rank 69 {'updated': 'yes'}
<Element 'year' at 0x0000000000E186D8> year 2026 {}
<Element 'gdppc' at 0x0000000000E18728> gdppc 13600 {}
<Element 'neighbor' at 0x0000000000E18778> neighbor None {'direction': 'W', 'name': 'Costa Rica'}
<Element 'neighbor' at 0x0000000000E187C8> neighbor None {'direction': 'E', 'name': 'Colombia'}


Traverse every occurrence of a given node

from xml.etree import ElementTree as ET

str_xml = open('example.xml', 'r').read()
root = ET.XML(str_xml)

for node in root.iter('year'): # search all children and descendants for year nodes
    print(node.tag, node.text)

out: 
year 2023
year 2026
year 2026
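Besides iter(), Element.find and Element.findall accept a limited XPath subset. A sketch on a trimmed copy of example.xml, inlined so that it runs without the file:

```python
from xml.etree import ElementTree as ET

# A trimmed copy of example.xml, inlined so the sketch runs on its own
xml_text = """<data>
    <country name="Liechtenstein">
        <rank updated="yes">2</rank>
        <neighbor direction="E" name="Austria" />
        <neighbor direction="W" name="Switzerland" />
    </country>
    <country name="Panama">
        <rank updated="yes">69</rank>
        <neighbor direction="W" name="Costa Rica" />
        <neighbor direction="E" name="Colombia" />
    </country>
</data>"""

root = ET.fromstring(xml_text)

# .// searches all descendants; [@attr='value'] filters on an attribute
east = [n.get("name") for n in root.findall(".//neighbor[@direction='E']")]
print(east)  # ['Austria', 'Colombia']

# find() returns the first match; findtext() returns its text directly
print(root.find("country[@name='Panama']").findtext("rank"))  # 69
```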


3) Modify node content

from xml.etree import ElementTree as ET
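# open the file and read the XML content
str_xml = open('example.xml', 'r').read()

# parse the string into an XML object; root refers to the root node of the XML file
root = ET.XML(str_xml)

############ Operations ############

# top-level tag
print(root.tag)

# loop over all year nodes
for node in root.iter('year'):
    # increment the content of the year node by one
    new_year = int(node.text) + 1
    node.text = str(new_year)

    # set attributes
    node.set('name', 'alex')
    node.set('age', '18')
    # delete an attribute
    del node.attrib['name']


############ Save the file ############
# ET.XML returns an Element, so wrap it in an ElementTree in order to write it
tree = ET.ElementTree(root)
tree.write("test3.xml", encoding='utf-8')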


Delete nodes

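from xml.etree import ElementTree as ET

# parse the XML file directly
tree = ET.parse("example.xml")

# get the root node of the XML file
root = tree.getroot()

############ Operations ############

# top-level tag
print(root.tag)

# iterate over all country nodes under data (findall returns a list, so removing is safe)
for country in root.findall('country'):
    # read the content of the rank node under each country node
    rank = int(country.find('rank').text)

    if rank > 50:
        # remove this country node
        root.remove(country)

############ Save the file ############
tree.write("test-delnode.xml", encoding='utf-8')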


Create a node in the existing XML

from xml.etree import ElementTree as ET
tree = ET.parse("example.xml")
root = tree.getroot()

# ele = ET.Element()
ele = ET.Element('test', {'k1': 'v1'})
ele.text = "content"
# without text content, a self-closing tag is written, i.e. <test k1="v1" />
# def __init__(self, tag, attrib={}, **extra):
root.append(ele)
tree.write('createxml.xml', encoding='utf-8')


4) Create an XML document

4.1) Method 1: build the document by grafting. First create the descendants, then graft them onto the root, and finally save the result.

from xml.etree import ElementTree as ET
# create the root node
root = ET.Element("family")

# create the elder son node
son1 = ET.Element('son', {'name': 'lisi'})
# create the younger son node
son2 = ET.Element('son', {'name': 'zhangsan'})

# create two grandsons, one for each son
grandson1 = ET.Element('grandson', {'name': 'wangwu'})
grandson2 = ET.Element('grandson', {'name': 'maliu'})

# add the grandsons to their fathers
son1.append(grandson1)
son2.append(grandson2)
# add the fathers to the grandfather (root) node
root.append(son1)
root.append(son2)

# wrap the root node in an ElementTree
tree = ET.ElementTree(root)
# by default write saves everything on one line, without indentation
# tree.write("create_new_xml.xml", encoding='utf-8')
tree.write("create_new_xml.xml", encoding='GBK', xml_declaration=True, short_empty_elements=False)
# short_empty_elements=True writes empty elements as self-closing tags; False writes start/end tag pairs
# xml_declaration=None adds a declaration only for encodings other than US-ASCII or UTF-8; True always adds it; False never does


The result is:

<?xml version='1.0' encoding='GBK'?>
<family><son name="lisi"><grandson name="wangwu"></grandson></son><son name="zhangsan"><grandson name="maliu"></grandson></son></family>

As you can see, the native XML module saves the file without indentation by default, so the way of saving needs to change:

from xml.dom import minidom
from xml.etree import ElementTree as ET


def prettify(elem):
    """Convert an element to a string and add indentation."""
    rough_string = ET.tostring(elem, 'utf-8')
    reparsed = minidom.parseString(rough_string)
    return reparsed.toprettyxml(indent="\t")


# create the root node
root = ET.Element("family")

# create the elder son node
son1 = ET.Element('son', {'name': 'lisi'})
# create the younger son node
son2 = ET.Element('son', {'name': 'zhangsan'})

# create two grandsons, one for each son
grandson1 = ET.Element('grandson', {'name': 'wangwu'})
grandson2 = ET.Element('grandson', {'name': 'maliu'})

# add the grandsons to their fathers
son1.append(grandson1)
son2.append(grandson2)
# add the fathers to the grandfather (root) node
root.append(son1)
root.append(son2)

raw_str = prettify(root)
with open("create_new_xml.xml", 'w', encoding='utf-8') as f:
    f.write(raw_str)
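On Python 3.9 and later there is an alternative to the minidom round trip: xml.etree.ElementTree.indent adds the whitespace in place. A sketch, assuming Python 3.9+ and a shortened version of the family tree:

```python
from xml.etree import ElementTree as ET

root = ET.Element("family")
son1 = ET.SubElement(root, "son", {"name": "lisi"})
ET.SubElement(son1, "grandson", {"name": "wangwu"})

ET.indent(root, space="\t")  # Python 3.9+: rewrites text/tail whitespace in place
text = ET.tostring(root, encoding="unicode")
print(text)

# Writing to a file works the same way once the tree is indented:
# ET.ElementTree(root).write("create_new_xml.xml", encoding="utf-8")
```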


4.2) Method 2: build the document by branching out. Starting from root, grow the sons, then grow the grandsons from the sons, and so on, and finally save the result.

from xml.etree import ElementTree as ET

# create the root node
root = ET.Element("family")


# create the elder son
# son1 = ET.Element('son', {'name': 'son1'})
son1 = root.makeelement('son', {'name': 'son1'})
# create the younger son
# son2 = ET.Element('son', {"name": 'son2'})
son2 = root.makeelement('son', {"name": 'son2'})

# create two grandsons under the elder son
# grandson1 = ET.Element('grandson', {'name': 'grandson11'})
grandson1 = son1.makeelement('grandson', {'name': 'grandson11'})
# grandson2 = ET.Element('grandson', {'name': 'grandson12'})
grandson2 = son1.makeelement('grandson', {'name': 'grandson12'})

# makeelement only creates an element; it still has to be appended
son1.append(grandson1)
son1.append(grandson2)


# add the sons to the root node
root.append(son1)
root.append(son2)

tree = ET.ElementTree(root)
tree.write('oooo.xml',encoding='utf-8', short_empty_elements=False)


4.3) Method 3: build the document piece by piece. Take a given node and insert child nodes directly at their position under it.

from xml.etree import ElementTree as ET


# create the root node
root = ET.Element("family")


# create the elder son node (SubElement creates it and attaches it to root)
son1 = ET.SubElement(root, "son", attrib={'name': 'son1'})
# create the younger son
son2 = ET.SubElement(root, "son", attrib={"name": "son2"})

# create a grandson under the elder son
grandson1 = ET.SubElement(son1, "age", attrib={'name': 'grandson11'})
grandson1.text = 'grandson'


et = ET.ElementTree(root)  # create the document object
et.write("test.xml", encoding="utf-8", xml_declaration=True, short_empty_elements=False)


5) Namespaces: not needed so far; to be covered once I need them!

Reference: http://www.w3school.com.cn/xml/xml_namespaces.asp

12. The shutil module and archive handling

1) Copy the contents of one file object into another

import shutil
with open('old.txt', 'r') as fsrc, open('new.txt', 'w') as fdst:
    shutil.copyfileobj(fsrc, fdst)

2) Copy a file

shutil.copyfile('old.txt', 'new.txt')

3) Copy only the permission bits; the content, group and owner are unchanged

shutil.copymode('old.txt', 'new.txt')

4) Copy only the status information: mode bits, atime, mtime, flags

shutil.copystat('old.txt', 'new.txt')

5) Copy the file and its permissions

shutil.copy('old.txt', 'new.txt')

6) Copy the file and its status information

shutil.copy2('old.txt', 'new.txt')

7) Recursively copy a directory

shutil.copytree('folder1', 'folder2', ignore=shutil.ignore_patterns('*.pyc', 'tmp*'))

8) Recursively delete a directory

shutil.rmtree('folder1')
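A sketch of copytree with ignore_patterns, copy2 and rmtree in a scratch directory; the file names are made up so that the ignore patterns have something to skip.

```python
import os
import shutil
import tempfile

src = tempfile.mkdtemp()
for name in ("main.py", "main.pyc", "tmp_cache.txt", "readme.txt"):
    with open(os.path.join(src, name), "w") as f:
        f.write(name)

# copytree creates dst itself, so it must not exist yet.
dst = os.path.join(tempfile.mkdtemp(), "backup")
shutil.copytree(src, dst, ignore=shutil.ignore_patterns("*.pyc", "tmp*"))
print(sorted(os.listdir(dst)))  # ['main.py', 'readme.txt']

# copy2 copies the content plus metadata such as the modification time.
shutil.copy2(os.path.join(src, "readme.txt"), os.path.join(dst, "readme_copy.txt"))

shutil.rmtree(src)          # delete the source tree recursively
print(os.path.exists(src))  # False
```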

shutil's support for archives is limited, so other modules are used to handle them. This section introduces zipfile and tarfile.

13. zipfile and tarfile

import zipfile

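# compress; the source files remain after compression
z = zipfile.ZipFile('test.zip', 'w')
# 'w' creates a new archive, 'a' appends to an existing one
z.write('file_1.log') # the file must exist, otherwise FileNotFoundError is raised
z.write('file_2.txt')  
z.close()

# extract
z = zipfile.ZipFile('test.zip', 'r')
# list the file names in the archive
print(z.namelist())
# extract a single file
z.extract('file_1.log')
# extract all files
z.extractall()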
z.close()


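import tarfile

# create an archive
tar = tarfile.open('test.tar','w')
tar.add('file_1.log', arcname='bbs2.log')  # arcname changes the name stored in the archive
tar.add('file_2.txt')  # without arcname, the file name stays the same
tar.close()

# extract
tar = tarfile.open('test.tar','r')
# list the file names in the archive
print(tar.getnames())
# extract a single file
tar.extract("file_2.txt")
# extract all files
tar.extractall()  # a target directory can be given; defaults to the current directory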
tar.close()
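The same operations with with statements, which close the archives even if an error occurs. The sketch assumes a temporary working directory and a small generated log file; it also uses ZIP_DEFLATED compression and a gzip-compressed tar ("w:gz"), options the examples above do not use.

```python
import os
import tarfile
import tempfile
import zipfile

work = tempfile.mkdtemp()
src = os.path.join(work, "file_1.log")
with open(src, "w") as f:
    f.write("log line\n")

zip_path = os.path.join(work, "test.zip")
with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as z:
    z.write(src, arcname="file_1.log")  # arcname avoids storing the full temp path

with zipfile.ZipFile(zip_path) as z:    # mode defaults to "r"
    zip_names = z.namelist()
    z.extractall(os.path.join(work, "unzipped"))

tar_path = os.path.join(work, "test.tar.gz")
with tarfile.open(tar_path, "w:gz") as tar:  # "w:gz" writes a gzip-compressed tar
    tar.add(src, arcname="bbs2.log")

with tarfile.open(tar_path, "r:gz") as tar:
    tar_names = tar.getnames()

print(zip_names, tar_names)  # ['file_1.log'] ['bbs2.log']
```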


14. subprocess: running commands

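There are several ways to run Linux system commands, for example os.system(command), os.popen(command).read() and commands.getstatusoutput(command) (the commands module exists only in Python 2). The functionality of all these shell-command modules and functions is implemented in the subprocess module, which offers richer features.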

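The subprocess module defines several functions that create child processes in different ways, so we can pick whichever one we need. subprocess also provides tools for managing the standard streams and pipes, so processes can communicate using text.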

1) subprocess.call() runs a command and returns its status code, i.e. the exit code

retcode = subprocess.call(["ls", "-l"], shell=False)

retcode = subprocess.call("ls -l", shell=True)

Why use shell=True:

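With shell=False, the command is executed like os.execvp(file, args): if a list or tuple is passed, its first element is the command and the rest are its arguments. If a string is passed, the whole string is treated as the name of an executable to run; if no such file exists, OSError: [Errno 2] No such file or directory is raised (FileNotFoundError in Python 3).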

With shell=True, Python first starts a shell and then lets that shell interpret the whole string. Some commands, such as cd, are shell built-ins and must be run through a shell; shell=True allows us to run them.

2) subprocess.check_call() runs a command; if the exit status is 0 it returns that status code, otherwise it raises subprocess.CalledProcessError(returncode, cmd, output=None, stderr=None), which has a returncode attribute

subprocess.check_call(["ls", "-l"], shell=False)

subprocess.check_call("ls -l", shell=True)

import subprocess

try:
    subprocess.check_call('fff', shell=True)
except subprocess.CalledProcessError as e:
    print(e)

out:

/bin/sh: fff: command not found
Command 'fff' returned non-zero exit status 127

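3) subprocess.check_output() runs a command; if the exit status is 0 it returns the command's output (bytes by default). If the return code is not 0 it raises subprocess.CalledProcessError, which has a returncode attribute and an output attribute holding the standard output.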

retinfo = subprocess.check_output(["ls", "-l"], shell=False)

retinfo = subprocess.check_output("ls -l", shell=True)
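Since Python 3.5, subprocess.run() covers the three helpers above in one call. A sketch, assuming Python 3.7+ for capture_output and text; the child is the current interpreter (sys.executable), so it behaves the same on Windows and Linux.

```python
import subprocess
import sys

result = subprocess.run(
    [sys.executable, "-c", "print('hello')"],
    capture_output=True,  # collect stdout and stderr (Python 3.7+)
    text=True,            # decode to str, same as universal_newlines=True (3.7+)
)
print(result.returncode, result.stdout.strip())  # 0 hello

# check=True raises CalledProcessError on a non-zero exit status,
# the same behaviour as check_call / check_output.
failed_code = None
try:
    subprocess.run([sys.executable, "-c", "import sys; sys.exit(3)"], check=True)
except subprocess.CalledProcessError as e:
    failed_code = e.returncode
print(failed_code)  # 3

out = subprocess.check_output([sys.executable, "-c", "print(1 + 1)"], text=True)
print(out.strip())  # 2
```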

All three functions above are built on subprocess.Popen().

4) subprocess.Popen(args, bufsize=-1, executable=None, stdin=None, stdout=None, stderr=None, preexec_fn=None, close_fds=_PLATFORM_DEFAULT_CLOSE_FDS, shell=False, cwd=None, env=None, universal_newlines=False, startupinfo=None, creationflags=0, restore_signals=True, start_new_session=False, pass_fds=()) is used to run more complex commands

Parameters:

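args: the shell command; either a string or a sequence (e.g. list, tuple)
bufsize: buffering. 0 = unbuffered, 1 = line buffered, other positive values = buffer size, negative = system default buffering
stdin, stdout, stderr: the program's standard input, output and error handles
preexec_fn: Unix only; a callable object that is called in the child process just before the child program is executed
close_fds: on Windows, if close_fds is True, the new child process does not inherit the parent's input, output and error pipes.
So before Python 3.7, close_fds could not be set to True while also redirecting the child's standard input, output and error (stdin, stdout, stderr).
shell: as above
cwd: sets the current directory of the child process
env: the environment variables of the child process. If env = None, the child inherits them from the parent process.
universal_newlines: different systems use different newline characters; True -> \n is used uniformly (and the streams are opened in text mode)
startupinfo and creationflags are Windows only. They are passed to the underlying CreateProcess() function to set attributes of the child process, such as the appearance of the main window and the process priority.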

import subprocess
ret1 = subprocess.Popen(["mkdir","t1"])
ret2 = subprocess.Popen("mkdir t2", shell=True)

Commands typed into a terminal fall into two kinds:

commands that produce output right away, e.g. ifconfig
commands that enter an environment and then wait for further input, e.g. python

Scenario 1: output is produced right after the command is entered

import subprocess

obj = subprocess.Popen("mkdir t3", shell=True, cwd='/home/dev',)

Scenario 2: the command enters an environment and waits for further input

import subprocess

obj = subprocess.Popen(["python3"],
                        stdin=subprocess.PIPE,
                        stdout=subprocess.PIPE,
                        stderr=subprocess.PIPE,
                        universal_newlines=True)
obj.stdin.write("print(1)\n")
obj.stdin.write("print(2)")
obj.stdin.close()

cmd_out = obj.stdout.read()
obj.stdout.close()
cmd_error = obj.stderr.read()
obj.stderr.close()

print(cmd_out)
print(cmd_error)


import subprocess

obj = subprocess.Popen(["python"], stdin=subprocess.PIPE, stdout=subprocess.PIPE, stderr=subprocess.PIPE, universal_newlines=True)
obj.stdin.write("print(1)\n")
obj.stdin.write("print(2)")

out_error_list = obj.communicate()
print(out_error_list)
# out_error_list = (stdout, stderr)


import subprocess

obj = subprocess.Popen(["python"], stdin=subprocess.PIPE, stdout=subprocess.PIPE, stderr=subprocess.PIPE, universal_newlines=True)
out_error_list = obj.communicate('print("hello")')
print(out_error_list)
# if self.universal_newlines is True, this should be a string; if it is False, "input" should be bytes.


# universal_newlines=True opens stdin, stdout and stderr in text mode (Python 3.7+ also accepts the clearer alias text=True).
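A sketch combining the pieces above: the input is passed directly to communicate(), and a timeout guards against a child that never exits. sys.executable replaces the literal python command so the sketch does not depend on PATH; the 10-second timeout is arbitrary.

```python
import subprocess
import sys

proc = subprocess.Popen(
    [sys.executable],  # the interactive "python" from the examples above
    stdin=subprocess.PIPE,
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE,
    text=True,         # same as universal_newlines=True (Python 3.7+)
)
try:
    # communicate() writes the input, closes stdin, reads both pipes to EOF and
    # waits for the process, avoiding the deadlocks that manual read() calls can cause.
    out, err = proc.communicate("print(1)\nprint(2)\n", timeout=10)
except subprocess.TimeoutExpired:
    proc.kill()
    out, err = proc.communicate()

print(repr(out), proc.returncode)  # '1\n2\n' 0
```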