Reading and Writing Text Data in Python

Date: 2019-11-08

There are only a few key points to cover.

1: Basic operations

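# open() in "rt" mode reads a text file.
# open() in "wt" mode truncates an existing file (or creates a new one) and writes to it.
# open() in "at" mode appends to the end of an existing file.
with open("../../testData", "rt", encoding="utf-8") as f:
    for line in f:
        # Each line keeps its trailing newline, so suppress the one print() adds.
        print(line, end="")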
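
To make the difference between the three modes concrete, here is a minimal sketch. The file name demo.txt and the temporary directory are assumptions made for illustration only; they are not part of the original example.

```python
import os
import tempfile

# Hypothetical scratch file inside a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.txt")

    with open(path, "wt", encoding="utf-8") as f:   # creates the file
        f.write("first\n")
    with open(path, "wt", encoding="utf-8") as f:   # truncates it, "first" is lost
        f.write("second\n")
    with open(path, "at", encoding="utf-8") as f:   # appends at the end
        f.write("third\n")

    with open(path, "rt", encoding="utf-8") as f:
        lines = [line.rstrip("\n") for line in f]

print(lines)  # ['second', 'third']
```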


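# By default, text files are read and written using the platform's locale encoding,
# which you can check with locale.getpreferredencoding(False).
# (sys.getdefaultencoding() reports the default for str.encode() and bytes.decode(),
# which is always utf-8 in Python 3; it is not the encoding that open() uses.)
# You can also pass an optional encoding argument to open().
# Common encodings include ascii, latin-1, utf-8 and utf-16.
# Newlines differ between Unix and Windows (\n and \r\n respectively).
# By default Python works in universal newline mode: when reading, the common newline
# conventions are recognized and converted to a single \n character; when writing,
# \n is converted to the system default line separator.
# If you do not want this translation, pass newline='' to open().
f = open('somefile.txt', 'rt', newline='')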
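
The effect of newline='' is easiest to see on a file that really contains \r\n. The sketch below writes such a file in binary mode first; the file name crlf.txt is a made-up example.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "crlf.txt")
    # Write raw bytes so the Windows-style line endings are stored exactly.
    with open(path, "wb") as f:
        f.write(b"hello\r\nworld\r\n")

    # Default: universal newline mode translates \r\n to \n while reading.
    with open(path, "rt", encoding="ascii") as f:
        translated = f.read()
    # newline='' disables the translation.
    with open(path, "rt", encoding="ascii", newline="") as f:
        untranslated = f.read()

print(repr(translated))    # 'hello\nworld\n'
print(repr(untranslated))  # 'hello\r\nworld\r\n'
```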


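# One last issue is encoding errors that may appear in text files.
# If errors persist even after you have chosen what should be the right encoding,
# you can pass the optional errors argument to open() to control how they are handled.
m = open('sample.txt', 'rt', encoding='ascii', errors='replace')

# Relying on the errors argument all the time is not a good habit, however.
# The rule is to make sure you always use the correct encoding. When in doubt, use the default setting (usually UTF-8).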
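
As an illustration of what the errors argument does, the following sketch reads the same UTF-8 file with the wrong encoding (ascii) under two error policies, and then with the right encoding. The sample string and file name are assumptions chosen for the demo.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.txt")
    # "Spicy Jalapeño" encoded as UTF-8: the ñ takes two non-ASCII bytes.
    with open(path, "wb") as f:
        f.write("Spicy Jalapeño".encode("utf-8"))

    # 'replace' substitutes U+FFFD for every byte that cannot be decoded.
    with open(path, "rt", encoding="ascii", errors="replace") as f:
        replaced = f.read()
    # 'ignore' silently drops the undecodable bytes.
    with open(path, "rt", encoding="ascii", errors="ignore") as f:
        ignored = f.read()
    # The correct encoding needs no error handling at all.
    with open(path, "rt", encoding="utf-8") as f:
        correct = f.read()

print(ignored)  # Spicy Jalapeo
print(correct)  # Spicy Jalapeño
```

Both error policies lose information, which is why choosing the correct encoding is the better fix.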

2: Printing output to a file

with open('../../testData', 'at') as f:
    print('Hello python!', file=f)

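# Printing with a different separator or line ending
# To change the default separator or line ending of print(), use the sep and end keyword arguments.
print("hello", "cool", "zzy", sep="***", end="!")   # hello***cool***zzy!

# Using str.join() (works only when every item is a string)
print("***".join(("hello", "cool", "zzy")))  # hello***cool***zzy

# For comparison: unpack the sequence and let print() add the separator
row = ("hello", "cool", "zzy")
print(*row, sep="***")   # hello***cool***zzy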
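
The two approaches diverge once the row contains non-string values. A small sketch follows, using a hypothetical mixed-type record and an in-memory io.StringIO buffer in place of a real file, so the output can be inspected.

```python
import io

row = ("ACME", 50, 91.5)   # hypothetical record mixing str, int and float

# str.join() accepts only strings, so the raw row raises TypeError.
try:
    ",".join(row)
    join_error = None
except TypeError as exc:
    join_error = str(exc)

# Converting each item first makes join() work.
joined = ",".join(str(x) for x in row)

# print() converts every argument itself; file= redirects the output.
buffer = io.StringIO()
print(*row, sep=",", file=buffer)
printed = buffer.getvalue()

print(joined)         # ACME,50,91.5
print(repr(printed))  # 'ACME,50,91.5\n'
```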

3: Reading and writing binary data

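# To read or write binary files such as images or audio, use open() with mode rb or wb.

# When reading binary data, the semantic difference between byte strings and text strings
# is a potential trap: indexing and iteration return integer byte values rather than byte strings.
a = 'Hello World'
print(a[0])   # H

b = b'Hello World'  # Byte string
print(b[0])  # 72: indexing returns the integer byte value, not a byte string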
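
If you need one-byte bytes objects rather than integers, slicing is one way to get them. A short sketch of the behaviours described above:

```python
b = b"Hello World"

first_value = b[0]     # 72, an int
first_byte = b[0:1]    # b'H', slicing keeps the bytes type
values = list(b[:5])   # iteration also yields ints
singles = [bytes([v]) for v in b[:5]]  # rebuild one-byte bytes objects

print(first_value, first_byte)  # 72 b'H'
print(values)                   # [72, 101, 108, 108, 111]
print(singles)                  # [b'H', b'e', b'l', b'l', b'o']
```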


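# To read or write text through a file opened in binary mode, you must decode and encode explicitly.
# Decoding
with open("../../testDta", "rb") as f:
    data = f.read(16)
    text = data.decode("utf-8")

# Encoding
with open("../../testDta", "wb") as f:   # was bound to "d" while the body used "f"
    text = "hello"
    f.write(text.encode("utf-8"))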
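
One caveat when decoding a fixed number of bytes: a multi-byte UTF-8 character can be cut in half at the chunk boundary, and a plain decode() then raises UnicodeDecodeError. This goes slightly beyond the original example. The sketch below shows one possible way to handle it with the standard library's incremental decoder. The sample text and the 4-byte chunk size are assumptions, and io.BytesIO stands in for a file opened with "rb".

```python
import codecs
import io

raw = "你好, Python".encode("utf-8")   # each Chinese character takes 3 bytes
stream = io.BytesIO(raw)               # stands in for a file opened with "rb"

# Decoding the first 4 bytes directly fails: the second character is incomplete.
try:
    raw[:4].decode("utf-8")
    naive_ok = True
except UnicodeDecodeError:
    naive_ok = False

# An incremental decoder buffers incomplete sequences between calls.
decoder = codecs.getincrementaldecoder("utf-8")()
pieces = []
while True:
    chunk = stream.read(4)
    if not chunk:
        break
    pieces.append(decoder.decode(chunk))
pieces.append(decoder.decode(b"", final=True))  # flush; raises if data is truncated
text = "".join(pieces)

print(naive_ok)  # False
print(text)      # 你好, Python
```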

4: Writing to a file only if it does not already exist

One approach is to check whether the file exists first, like this:

import os
if not os.path.exists("../../testData"):
    with open("../../testData", "wt") as f:
        f.write("")
else:
    print('File already exists!')

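The best solution

# To write to a file without overwriting existing data, use mode "xt".
# It checks whether the file already exists (regardless of whether it has any content,
# or what that content is) and raises FileExistsError if it does.
# Unlike the os.path.exists() check, the test and the creation happen in one step,
# so no other process can create the file in between.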
with open("../../testData","wt") as f :
    f.write("")
with open("../../testData","xt") as f :
    f.write("你好")    # FileExistsError: [Errno 17] File exists: '../../testData'
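
In practice you will usually want to catch the exception rather than let it propagate. A minimal sketch, where write_new is a hypothetical helper name and the target lives in a temporary directory:

```python
import os
import tempfile


def write_new(path, text):
    """Write text to path only if the file does not exist yet; report success."""
    try:
        with open(path, "xt", encoding="utf-8") as f:
            f.write(text)
        return True
    except FileExistsError:
        return False


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "testData")
    first = write_new(target, "hello")          # file is created
    second = write_new(target, "overwritten?")  # refused, file already exists
    with open(target, "rt", encoding="utf-8") as f:
        content = f.read()

print(first, second, content)  # True False hello
```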

5: Reading and writing compressed files

import gzip
with gzip.open('somefile.gz', 'rt') as f:
    text = f.read()

# bz2 compression
import bz2
with bz2.open('somefile.bz2', 'rt') as f:
    text = f.read()

# Writing compressed data
import gzip
with gzip.open('somefile.gz', 'wt') as f:
    f.write(text)

# bz2 compression
import bz2
with bz2.open('somefile.bz2', 'wt') as f:
    f.write(text)

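When reading or writing compressed data, the default mode is binary if you do not specify one, so a program that expects text data will fail.
gzip.open() and bz2.open() accept the same text-mode arguments as the built-in open() function, including encoding, errors and newline.
You can use the optional compresslevel keyword argument to choose a compression level. The default is 9, which is also the highest level. Lower levels run faster but compress the data less.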

with gzip.open('somefile.gz', 'wt', compresslevel=5) as f:
    f.write(text)
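
The following sketch ties these points together: it writes the same text at two compression levels, reads it back in text mode, and shows that omitting the mode yields bytes. The sample text and file names are assumptions; run it to compare the resulting sizes on your own data.

```python
import gzip
import os
import tempfile

text = "hello python\n" * 1000   # hypothetical, highly repetitive sample data

with tempfile.TemporaryDirectory() as tmp:
    sizes = {}
    for level in (1, 9):
        path = os.path.join(tmp, f"level{level}.gz")
        with gzip.open(path, "wt", compresslevel=level, encoding="utf-8") as f:
            f.write(text)
        sizes[level] = os.path.getsize(path)

    # Text mode: decoding is handled for you.
    with gzip.open(path, "rt", encoding="utf-8") as f:
        restored = f.read()
    # No mode given: binary mode, so read() returns bytes.
    with gzip.open(path) as f:
        raw = f.read()

print(len(text), sizes)  # original length in characters, compressed sizes in bytes
print(type(raw))         # <class 'bytes'>
```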

6: Iterating over fixed-size records

from functools import partial
RECORD_SIZE= 32
# somefile.data is a binary file
with open('somefile.data', 'rb') as f:
    records = iter(partial(f.read, RECORD_SIZE), b'')
    for r in records:
        ...
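
The trick here is the two-argument form of iter(): it calls the given callable repeatedly until the sentinel value (b'' at end of file) is returned. A self-contained sketch, using io.BytesIO in place of a real binary file and a made-up record size of 4:

```python
from functools import partial
import io

RECORD_SIZE = 4
# io.BytesIO stands in for a file opened in "rb" mode; 10 bytes of sample data.
stream = io.BytesIO(b"aaaabbbbcc")

# Call stream.read(RECORD_SIZE) until it returns b"" (end of data).
records = list(iter(partial(stream.read, RECORD_SIZE), b""))

print(records)  # [b'aaaa', b'bbbb', b'cc']
```

Note that the last record can be shorter than RECORD_SIZE when the data length is not an exact multiple of it.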