Day 3 - Python File Operations (Part 1)

Date: 2019-11-13


Contents

1. File operations
    1.1 Basic file operation workflow
    1.2 File encoding
    1.3 File open modes
    1.4 Context management
    1.5 Modifying a file
    1.6 File methods

2. Summary






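1. File operations
1.1 Basic file operation workflow
1. Open the file, obtain a file handle, and assign it to a variable.
2. Operate on the file through the handle.
3. Close the file.
Example 1: reading a file via a relative path

    # 1. Open the file and assign the file handle to a variable
    #    (common names: file, f_handle, file_handle, f_obj, f1)
    f1 = open('a.txt', encoding='utf-8', mode='r')
    # 2. Operate on the file through the handle
    content = f1.read()
    # 3. Close the file
    f1.close()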

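# Note: open() is a Python built-in that asks the operating system to open the file; it is not a Windows command. When encoding is omitted, the default depends on the platform: usually GBK on Chinese-locale Windows and UTF-8 on Linux.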
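The platform default mentioned in the note can be inspected directly. The following is a minimal sketch, assuming a scratch file in a temporary directory (the path and the variable names are illustrative, not from the original post); it prints the encoding that open() falls back to and shows that passing encoding explicitly avoids depending on it.

```python
import locale
import os
import tempfile

# The encoding open() falls back to when `encoding=` is omitted.
# Commonly 'cp936' (GBK) on Chinese-locale Windows and 'UTF-8' on Linux.
default_encoding = locale.getpreferredencoding(False)
print(default_encoding)

# Hypothetical scratch file used only for this demonstration.
path = os.path.join(tempfile.mkdtemp(), 'a.txt')

# Passing encoding explicitly gives the same result on every platform.
with open(path, mode='w', encoding='utf-8') as f:
    f.write('中国')
with open(path, mode='r', encoding='utf-8') as f:
    content = f.read()
print(content)
```

Writing and reading with the same explicit encoding is the simplest way to keep a script portable between Windows and Linux.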

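Example 2: reading a file via an absolute path

    f1 = open('D:\a.txt', encoding='utf-8', mode='r')
    content = f1.read()
    print(content)

Output:

    Traceback (most recent call last):
      File "C:/Users/benjamin/python自動化21期/day3/筆記文本.py", line 1, in <module>
        f1 = open('D:\a.txt', encoding='utf-8')
    OSError: [Errno 22] Invalid argument: 'D:\x07.txt'

The call fails because '\a' inside an ordinary string literal is the escape sequence for the BEL character (\x07), so the path no longer contains a backslash followed by the letter a.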

Solution 1 (recommended): use a raw string

    f1 = open(r'D:\a.txt', encoding='utf-8')
    content = f1.read()
    print(content)

Solution 2 (not recommended): escape the backslash

    f1 = open('D:\\a.txt', encoding='utf-8')
    content = f1.read()
    print(content)
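To see why the plain literal breaks and why both fixes work, the three spellings of the path can be compared as strings without touching the file system. This is a small illustrative sketch; the variable names are made up for the example.

```python
plain = 'D:\a.txt'      # '\a' is the BEL escape: one character, code 0x07
raw = r'D:\a.txt'       # raw string: the backslash stays a backslash
escaped = 'D:\\a.txt'   # escaped backslash: same string as the raw form

print(repr(plain))      # shows the \x07 that ends up in the path
print(len(plain), len(raw))
print(raw == escaped)
```

The plain literal is one character shorter than the other two, because the backslash and the letter a collapsed into a single control character.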

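# Note: the default encoding is GBK on Chinese-locale Windows and UTF-8 on Linux. Reading a file with an encoding different from the one it was saved in can also raise an error.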

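1.2 File encoding

Unicode in a fixed-width form (described here as 2 bytes per character, which matches UCS-2; covering every code point at a fixed width takes 4 bytes, as in UTF-32): simple and direct. Advantage: converting between characters and numbers is fast. Drawback: it takes more space.
UTF-8: precise and variable-length. Advantage: it saves space. Drawback: conversion is slower, because each conversion has to work out how many bytes are needed to represent the character.

1. In memory the encoding used is Unicode, trading space for time (every program must be loaded into memory to run, so memory access should be as fast as possible).
2. On disk and over the network UTF-8 is used, to keep data transfer stable.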

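Every program is eventually loaded into memory. On disk, programs from different countries may be saved in different encodings, but in memory a single fixed representation, Unicode, is used so that text from any country can be handled (this is why a computer can run programs from anywhere). You might ask why UTF-8 is not used in memory instead. It would work, but a fixed-width Unicode representation is cheaper to process (each character has a fixed size, whereas UTF-8 requires a length calculation), at the cost of more space. This is a space-for-time trade-off. When data is stored on disk or sent over a network, it is converted from Unicode to UTF-8, because transfer favours stability and efficiency, and the smaller the data, the more reliable the transfer.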

Unicode (str) ------> encode ------> UTF-8 (bytes)
UTF-8 (bytes) ------> decode ------> Unicode (str)
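The space trade-off between a variable-length and a fixed-width encoding can be checked with str.encode. The sketch below uses UTF-32 as the fixed-width example, which is an assumption made for illustration (the text above speaks of a 2-byte fixed width); the variable names are also illustrative.

```python
text = 'a中'

utf8 = text.encode('utf-8')        # str -> bytes, variable length
utf32 = text.encode('utf-32-le')   # str -> bytes, 4 bytes per character

print(len(utf8))    # 1 byte for 'a' plus 3 bytes for '中'
print(len(utf32))   # 4 bytes for each of the 2 characters

# bytes -> str: decoding with the same encoding restores the text
round_trip = utf8.decode('utf-8')
print(round_trip == text)
```

For mostly-ASCII text the UTF-8 form is much smaller, which is the reason it is preferred for storage and transfer.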

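Flushing a file from memory to disk is called saving the file.
Reading a file from disk into memory is called reading the file.
Garbled text arises in one of two ways: the file was already garbled when it was saved, or it was saved correctly but read with the wrong encoding.
Summary:
Whatever editor you use, to keep a file from turning into garbled text (note that a file holding source code is still just an ordinary file; this refers to garbling seen when the file is opened, before it is executed),
the core rule is: open a file with the same encoding it was saved in.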
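The core rule can be demonstrated by saving a file in one encoding and reading it back in another. This is a minimal sketch under the assumption of a scratch file in a temporary directory; the file name and flag variable are hypothetical.

```python
import os
import tempfile

# Hypothetical scratch file used only for this demonstration.
path = os.path.join(tempfile.mkdtemp(), 'note.txt')

# Save the text as GBK.
with open(path, 'w', encoding='gbk') as f:
    f.write('中国')

# Reading it back as UTF-8 fails, because the GBK bytes are not valid UTF-8.
try:
    with open(path, encoding='utf-8') as f:
        f.read()
    mismatch_failed = False
except UnicodeDecodeError:
    mismatch_failed = True

# Reading with the encoding used for saving restores the text.
with open(path, encoding='gbk') as f:
    restored = f.read()

print(mismatch_failed, restored)
```

A mismatch does not always raise an exception; with other byte sequences it can silently produce wrong characters, which is the garbled text described above.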

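1.3 File open modes
file_handle = open('file path', 'mode')
1. When opening a file, specify the file path and the mode to open it in.

r: read-only [default mode; the file must exist, otherwise an exception is raised]
w: write-only [not readable; created if missing; contents cleared if it exists]
x: write-only [not readable; created if missing; raises an error if it exists]
a: append [not readable; created if missing; if it exists, content is only appended]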
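The four basic modes can be compared side by side on one scratch file. The following sketch assumes a temporary directory and illustrative variable names; it only uses the behaviours listed above.

```python
import os
import tempfile

# Hypothetical scratch file used only for this demonstration.
path = os.path.join(tempfile.mkdtemp(), 'log_demo')

# r: the file must already exist.
try:
    open(path, 'r', encoding='utf-8')
    r_raised = False
except FileNotFoundError:
    r_raised = True

# x: creates the file, and refuses to touch an existing one.
with open(path, 'x', encoding='utf-8') as f:
    f.write('first')
try:
    open(path, 'x', encoding='utf-8')
    x_raised = False
except FileExistsError:
    x_raised = True

# a: keeps the existing content and adds to the end.
with open(path, 'a', encoding='utf-8') as f:
    f.write(' second')
with open(path, encoding='utf-8') as f:
    after_append = f.read()

# w: clears the existing content before writing.
with open(path, 'w', encoding='utf-8') as f:
    f.write('third')
with open(path, encoding='utf-8') as f:
    after_write = f.read()

print(r_raised, x_raised, after_append, after_write)
```

Mode x is a convenient guard when a script must never overwrite a file that is already there.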

r mode:

    ## r mode
    # read(): read everything
    f1 = open('log1', encoding='utf-8')
    content = f1.read()
    print(content)
    f1.close()

    # read(n) in r mode: read n characters
    f1 = open('log1', encoding='utf-8')
    content = f1.read(5)
    print(content)
    f1.close()

    # read(n) in rb mode: read n bytes. In UTF-8 one Chinese character takes
    # 3 bytes, so reading 4 bytes splits a character and decode() raises an error.
    f1 = open('log1', mode='rb')
    content = f1.read(3)
    print(content.decode('utf-8'))
    f1.close()

    # readline(): read one line per call. Each line keeps its '\n',
    # so print() shows a blank line after it.
    f1 = open('log1', encoding='utf-8')
    print(f1.readline())
    print(f1.readline())
    f1.close()

    # readlines(): return a list in which each line of the file is one element
    f1 = open('log1', encoding='utf-8')
    print(f1.readlines())
    f1.close()

    # for loop: iterating over the file handle keeps only one line in memory at a time
    f1 = open('log1', encoding='utf-8')
    for i in f1:
        print(i)
    f1.close()

    # More on encoding
    s1 = '中国'
    s2 = s1.encode('gbk')
    print(s2)
    # Output: b'\xd6\xd0\xb9\xfa'

    s1 = b'\xd6\xd0\xb9\xfa'
    s2 = s1.decode('gbk')
    s3 = s2.encode('utf-8')
    print(s3)
    # Output: b'\xe4\xb8\xad\xe5\x9b\xbd'

    # Shortened form
    s1 = b'\xd6\xd0\xb9\xfa'.decode('gbk').encode('utf-8')
    print(s1)
    # Output: b'\xe4\xb8\xad\xe5\x9b\xbd'
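The decode-then-encode chain at the end of the block above also works at file level. The helper below is a sketch, not part of the original post: the function name transcode, its parameters, and the scratch file names are all hypothetical. It reads a GBK file line by line and writes the same text out as UTF-8.

```python
import os
import tempfile


def transcode(src, dst, src_encoding='gbk', dst_encoding='utf-8'):
    """Copy a text file, decoding with src_encoding and encoding with dst_encoding."""
    with open(src, encoding=src_encoding) as fin, \
            open(dst, 'w', encoding=dst_encoding) as fout:
        for line in fin:          # one line in memory at a time
            fout.write(line)


# Demonstration on hypothetical scratch files.
folder = tempfile.mkdtemp()
gbk_path = os.path.join(folder, 'gbk.txt')
utf8_path = os.path.join(folder, 'utf8.txt')

with open(gbk_path, 'w', encoding='gbk') as f:
    f.write('中国')

transcode(gbk_path, utf8_path)

with open(utf8_path, 'rb') as f:
    utf8_bytes = f.read()
print(utf8_bytes)
```

The bytes written to the second file are the same UTF-8 bytes that the in-memory example produces.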

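w mode:

    ## w mode
    # Not readable. The file is created if missing; if it exists,
    # its contents are cleared before writing.
    f1 = open('log2', encoding='utf-8', mode='w')
    f1.write('python是一門高級語言')
    f1.close()

a mode:

    ## a mode
    # Not readable. The file is created if missing; if it exists,
    # content is only appended.
    f1 = open('log2', encoding='utf-8', mode='a')
    f1.write('\npython學習')
    f1.close()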

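2. "+" means the file can be read and written at the same time (it adds one capability)

r+: read and write [readable, writable]
w+: write and read [readable, writable]
x+: write and read [readable, writable]
a+: write and read [readable, writable]

r+ mode:

    # r+ mode: read the existing content first, then append
    f1 = open('log1', encoding='utf-8', mode='r+')
    print(f1.read())
    f1.write('666')
    f1.close()

    # r+ mode: write first, then read; this usually gives an unexpected result
    f1 = open('log1', encoding='utf-8', mode='r+')
    f1.write('666')
    print(f1.read())
    f1.close()
    # Original content: 快快樂樂
    # Printed output:   快樂樂
    # File content:     666快樂樂
    # The cursor moves in bytes: '666' (3 bytes) overwrites the first
    # 3-byte character.

    # r+ mode: write first, then read, adjusting the cursor position
    f1 = open('log1', encoding='utf-8', mode='r+')
    f1.seek(0, 2)
    f1.write('666')
    f1.seek(0)
    print(f1.read())
    f1.close()
    # Printed output: 快快樂樂666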

w+ mode:

    # w+ mode: write first, then read; the existing content is deleted before writing
    f1 = open('log2', encoding='utf-8', mode='w+')
    f1.write('老男孩')
    f1.seek(0)
    print(f1.read())
    f1.close()

a+ mode:

    # a+ mode
    f1 = open('log2', encoding='utf-8', mode='a+')
    f1.write('ababababab')
    f1.seek(0)
    print(f1.read())
    f1.close()
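One difference between r+ and a+ is easy to miss: in a+ the cursor can be moved for reading, but on common platforms writes still go to the end of the file. The sketch below illustrates this under the assumption of a scratch file in a temporary directory; the names are illustrative only.

```python
import os
import tempfile

# Hypothetical scratch file used only for this demonstration.
path = os.path.join(tempfile.mkdtemp(), 'log_demo')

with open(path, 'w', encoding='utf-8') as f:
    f.write('abc')

with open(path, 'a+', encoding='utf-8') as f:
    f.seek(0)            # a+ starts at the end, so rewind before reading
    before = f.read()
    f.seek(0)            # moving the cursor does not change where writes land
    f.write('XYZ')

with open(path, encoding='utf-8') as f:
    after = f.read()

print(before, after)
```

With r+ the same sequence would overwrite the beginning of the file instead, as in the r+ example earlier.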

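3. "b" means operating in bytes

Non-text files can only be handled in b mode. "b" means operating on bytes (every file is stored as bytes anyway, so in this mode there is no need to consider the character encoding of a text file, the jpg format of an image, or the avi format of a video).

Note: when a file is opened in b mode, what is read is of type bytes, what is written must also be bytes, and an encoding cannot be specified.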
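The three points in the note can be verified directly. This is a minimal sketch, assuming a scratch file in a temporary directory and illustrative flag names.

```python
import os
import tempfile

# Hypothetical scratch file used only for this demonstration.
path = os.path.join(tempfile.mkdtemp(), 'log_demo')

str_rejected = False
encoding_rejected = False

with open(path, 'wb') as f:
    f.write('中'.encode('utf-8'))   # bytes are accepted
    try:
        f.write('中')               # str is not
    except TypeError:
        str_rejected = True

with open(path, 'rb') as f:
    data = f.read()                 # the result is a bytes object

try:
    open(path, 'rb', encoding='utf-8')   # b mode takes no encoding
except ValueError:
    encoding_rejected = True

print(type(data), data, str_rejected, encoding_rejected)
```

Encoding and decoding become the caller's job in b mode, which is why the examples in this section call encode() and decode() explicitly.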

rb mode:

    # rb mode: read by bytes
    f1 = open('log1', mode='rb')
    content = f1.read(3)
    print(content.decode('utf-8'))
    f1.close()

wb mode:

    # wb mode
    f1 = open('log2', mode='wb')
    f1.write('python語言'.encode('utf-8'))
    f1.close()

ab mode:

    # ab mode
    f1 = open('log2', mode='ab')
    f1.write('\npython語言'.encode('utf-8'))
    f1.close()

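4. Read-write modes that operate on bytes

r+b: read and write [readable, writable]
w+b: write and read [writable, readable]
x+b: write and read [writable, readable]
a+b: write and read [writable, readable]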
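A typical use of r+b is to overwrite a few bytes in place without rewriting the whole file. The sketch below is illustrative; the scratch file and the variable names are assumptions made for the example.

```python
import os
import tempfile

# Hypothetical scratch file used only for this demonstration.
path = os.path.join(tempfile.mkdtemp(), 'data.bin')

with open(path, 'wb') as f:
    f.write(b'hello world')

with open(path, 'r+b') as f:
    f.seek(6)              # byte offset of 'world'
    f.write(b'WORLD')      # overwrites 5 bytes, the length is unchanged
    f.seek(0)
    patched = f.read()

print(patched)
```

Because r+b does not clear the file on open, the bytes before the cursor stay untouched.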
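1.4 Context management

    # with open() as ...: the file is closed automatically when the block ends,
    # so the handle cannot be used after the block (for example in a later loop)
    with open('log1', encoding='utf-8') as f1:
        print(f1.read())

    # with open() as ...: working with several file handles at once
    with open('log1', encoding='utf-8') as f1, \
            open('log2', encoding='utf-8', mode='w') as f2:
        print(f1.read())
        f2.write('777')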
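What the with statement adds over a manual close() is that the file is closed even when the block raises. The following sketch illustrates this with a scratch file; the file name, the simulated error, and the flag names are all made up for the example.

```python
import os
import tempfile

# Hypothetical scratch file used only for this demonstration.
path = os.path.join(tempfile.mkdtemp(), 'log_demo')

with open(path, 'w', encoding='utf-8') as f:
    f.write('777')
closed_after_with = f.closed        # the block has ended, so the file is closed

try:
    with open(path, encoding='utf-8') as f2:
        raise RuntimeError('simulated failure')
except RuntimeError:
    pass
closed_after_error = f2.closed      # closed even though the block raised

try:
    f.read()
    read_after_close_failed = False
except ValueError:                  # I/O operation on closed file
    read_after_close_failed = True

print(closed_after_with, closed_after_error, read_after_close_failed)
```

With a bare open() and close() pair, the exception in the second block would have skipped the close() call.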

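1.5 Modifying a file

1. Open the original file to get a file handle.
2. Create a new file to get a second file handle.
3. Read the original file, make the changes, and write the result to the new file.
4. Delete the original file.
5. Rename the new file to the original file's name.

File data lives on disk, where content can only be overwritten, not modified in place. What we normally see as "modifying a file" is a simulated effect, implemented in one of two ways:
Method 1: load the whole content of the file from disk into memory, modify it there, then overwrite the file on disk from memory (Word, vim, Notepad++ and other editors work this way).
Method 2: read the file from disk into memory line by line, write each modified line to a new file, and finally replace the source file with the new file.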

    # Method 1
    import os
    with open('file_test', encoding='utf-8') as f1, \
            open('file_test.bak', encoding='utf-8', mode='w') as f2:
        old_content = f1.read()
        new_content = old_content.replace('alex', 'SB')
        f2.write(new_content)
    os.remove('file_test')
    os.rename('file_test.bak', 'file_test')

    # Method 2
    import os
    with open('file_test', encoding='utf-8') as f1, \
            open('file_test.bak', encoding='utf-8', mode='w') as f2:
        for line in f1:
            new_line = line.replace('SB', 'alex')
            f2.write(new_line)
    os.remove('file_test')
    os.rename('file_test.bak', 'file_test')
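Method 2 can be wrapped in a small reusable function. The sketch below is one possible packaging, not the author's code: the name replace_in_file and its parameters are hypothetical, and it swaps the os.remove plus os.rename pair for a single os.replace call, which overwrites the target in one step.

```python
import os
import tempfile


def replace_in_file(path, old, new, encoding='utf-8'):
    """Rewrite the file at `path`, replacing `old` with `new` line by line."""
    tmp_path = path + '.bak'
    with open(path, encoding=encoding) as src, \
            open(tmp_path, 'w', encoding=encoding) as dst:
        for line in src:                 # only one line in memory at a time
            dst.write(line.replace(old, new))
    os.replace(tmp_path, path)           # the new file takes the original name


# Demonstration on a hypothetical scratch file.
path = os.path.join(tempfile.mkdtemp(), 'file_test')
with open(path, 'w', encoding='utf-8') as f:
    f.write('alex said hi\nbye alex\n')

replace_in_file(path, 'alex', 'ALEX')

with open(path, encoding='utf-8') as f:
    result = f.read()
print(result)
```

Both files are closed before os.replace runs, which matters on Windows, where an open file cannot be renamed over.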

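1.6 File methods

1. Common methods

read(3):
1. When the file is opened in text mode, it reads 3 characters.
2. When the file is opened in b mode, it reads 3 bytes.
All other cursor movement inside a file is measured in bytes, for example seek, tell and truncate.
Note:
1. seek has three ways of moving, 0, 1 and 2 (the whence argument). A nonzero offset with 1 or 2 requires b mode (text mode accepts only seek(0, 1) and seek(0, 2)); in every mode the offset is measured in bytes.

With the default whence of 0, seek moves the cursor relative to the start of the file.

tell returns the current cursor position.
2. truncate cuts the file off, so the file must be opened in a writable mode, but not w or w+, because those clear the file on open. Test truncate in r+, a or a+ mode.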

    # readable(): check whether the file can be read
    f1 = open('log2', encoding='utf-8', mode='w')
    print(f1.readable())
    f1.write('ababababab')
    f1.close()
    # Output: False

    # writable(): check whether the file can be written
    f1 = open('log2', encoding='utf-8', mode='w')
    print(f1.writable())
    f1.write('ababababab')
    f1.close()
    # Output: True

    # tell(): report the position of the pointer
    f1 = open('log2', encoding='utf-8', mode='w')
    f1.write('ababababab')
    print(f1.tell())
    f1.close()
    # Output: 10

    # seek(offset): move the pointer, measured in bytes
    # seek(0, 2):   move to the end of the file
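The byte-based behaviour of tell, seek and truncate is easier to see with multi-byte characters than with ASCII. The sketch below assumes a UTF-8 scratch file in a temporary directory; the names are illustrative.

```python
import io
import os
import tempfile

# Hypothetical scratch file used only for this demonstration.
path = os.path.join(tempfile.mkdtemp(), 'log_demo')

with open(path, 'w', encoding='utf-8') as f:
    f.write('快快樂樂')
    pos_after_write = f.tell()    # byte offset: 4 characters, 3 bytes each

# Text mode rejects a nonzero offset relative to the end of the file.
with open(path, encoding='utf-8') as f:
    try:
        f.seek(-3, 2)
        text_seek_failed = False
    except io.UnsupportedOperation:
        text_seek_failed = True

# Binary mode accepts it: 3 bytes back from the end is one character.
with open(path, 'rb') as f:
    f.seek(-3, 2)
    last_char = f.read().decode('utf-8')

# truncate() needs a writable mode that does not clear the file on open.
with open(path, 'r+', encoding='utf-8') as f:
    f.seek(6)        # byte offset 6 is just after the second character
    f.truncate()     # cut the file at the current position

with open(path, encoding='utf-8') as f:
    remaining = f.read()

print(pos_after_write, text_seek_failed, last_char, remaining)
```

Seeking to an offset that falls in the middle of a multi-byte character would leave the next read unable to decode, so offsets in text files should land on character boundaries.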

2. All methods

Python 2.x:

    # IDE-generated stub of the Python 2 built-in file type.
    # Real signatures are unknown; they were restored from __doc__.
    class file(object):
        def close(self):
            # Close the file
            """
            close() -> None or (perhaps) an integer.  Close the file.

            Sets data attribute .closed to True.  A closed file cannot be used for
            further I/O operations.  close() may be called more than once without
            error.  Some kinds of file objects (for example, opened by popen())
            may return an exit status upon closing.
            """

        def fileno(self):
            # File descriptor
            """
            fileno() -> integer "file descriptor".

            This is needed for lower-level file interfaces, such os.read().
            """
            return 0

        def flush(self):
            # Flush the file's internal buffer
            """ flush() -> None. Flush the internal I/O buffer. """
            pass

        def isatty(self):
            # Check whether the file is connected to a tty device
            """ isatty() -> true or false. True if the file is connected to a tty device. """
            return False

        def next(self):
            # Get the next line; raises an error if there is none
            """ x.next() -> the next value, or raise StopIteration """
            pass

        def read(self, size=None):
            # Read the specified number of bytes
            """
            read([size]) -> read at most size bytes, returned as a string.

            If the size argument is negative or omitted, read until EOF is reached.
            Notice that when in non-blocking mode, less data than what was requested
            may be returned, even if no size parameter was given.
            """
            pass

        def readinto(self):
            # Read into a buffer; do not use, it will be removed
            """ readinto() -> Undocumented. Don't use this; it may go away. """
            pass

        def readline(self, size=None):
            # Read a single line
            """
            readline([size]) -> next line from the file, as a string.

            Retain newline.  A non-negative size argument limits the maximum
            number of bytes to return (an incomplete line may be returned then).
            Return an empty string at EOF.
            """
            pass

        def readlines(self, size=None):
            # Read all data and return it as a list split at line breaks
            """
            readlines([size]) -> list of strings, each a line from the file.

            Call readline() repeatedly and return a list of the lines so read.
            The optional size argument, if given, is an approximate bound on the
            total number of bytes in the lines returned.
            """
            return []

        def seek(self, offset, whence=None):
            # Set the pointer position in the file
            """
            seek(offset[, whence]) -> None.  Move to new file position.

            Argument offset is a byte count. Optional argument whence defaults to
            0 (offset from start of file, offset should be >= 0); other values are 1
            (move relative to current position, positive or negative), and 2 (move
            relative to end of file, usually negative, although many platforms allow
            seeking beyond the end of a file).  If the file is opened in text mode,
            only offsets returned by tell() are legal. Use of other offsets causes
            undefined behavior.
            Note that not all file objects are seekable.
            """
            pass

        def tell(self):
            # Get the current pointer position
            """ tell() -> current file position, an integer (may be a long integer). """
            pass

        def truncate(self, size=None):
            # Truncate the data, keeping only what precedes the given position
            """
            truncate([size]) -> None.  Truncate the file to at most size bytes.

            Size defaults to the current file position, as returned by tell().
            """
            pass

        def write(self, p_str):
            # Write content
            """
            write(str) -> None.  Write string str to file.

            Note that due to buffering, flush() or close() may be needed before
            the file on disk reflects the data written.
            """
            pass

        def writelines(self, sequence_of_strings):
            # Write a list of strings to the file
            """
            writelines(sequence_of_strings) -> None.  Write the strings to the file.

            Note that newlines are not added.  The sequence can be any iterable object
            producing strings. This is equivalent to calling write() for each string.
            """
            pass

        def xreadlines(self):
            # Can be used to read the file line by line instead of all at once
            """
            xreadlines() -> returns self.

            For backward compatibility. File objects now include the performance
            optimizations previously implemented in the xreadlines module.
            """
            pass


Python 3.x:

    # IDE-generated stub of io.TextIOWrapper; real signatures are unknown.
    class TextIOWrapper(_TextIOBase):
        """
        Character and line based layer over a BufferedIOBase object, buffer.

        encoding gives the name of the encoding that the stream will be
        decoded or encoded with. It defaults to locale.getpreferredencoding(False).

        errors determines the strictness of encoding and decoding (see
        help(codecs.Codec) or the documentation for codecs.register) and
        defaults to "strict".

        newline controls how line endings are handled. It can be None, '',
        '\n', '\r', and '\r\n'. It works as follows:

        * On input, if newline is None, universal newlines mode is
          enabled. Lines in the input can end in '\n', '\r', or '\r\n', and
          these are translated into '\n' before being returned to the
          caller. If it is '', universal newline mode is enabled, but line
          endings are returned to the caller untranslated. If it has any of
          the other legal values, input lines are only terminated by the given
          string, and the line ending is returned to the caller untranslated.

        * On output, if newline is None, any '\n' characters written are
          translated to the system default line separator, os.linesep. If
          newline is '' or '\n', no translation takes place. If newline is any
          of the other legal values, any '\n' characters written are translated
          to the given string.

        If line_buffering is True, a call to flush is implied when a call to
        write contains a newline character.
        """
        def close(self, *args, **kwargs):
            # Close the file
            pass

        def fileno(self, *args, **kwargs):
            # File descriptor
            pass

        def flush(self, *args, **kwargs):
            # Flush the file's internal buffer
            pass

        def isatty(self, *args, **kwargs):
            # Check whether the file is connected to a tty device
            pass

        def read(self, *args, **kwargs):
            # Read the specified amount of data
            pass

        def readable(self, *args, **kwargs):
            # Whether the file can be read
            pass

        def readline(self, *args, **kwargs):
            # Read a single line
            pass

        def seek(self, *args, **kwargs):
            # Set the pointer position in the file
            pass

        def seekable(self, *args, **kwargs):
            # Whether the pointer can be moved
            pass

        def tell(self, *args, **kwargs):
            # Get the pointer position
            pass

        def truncate(self, *args, **kwargs):
            # Truncate the data, keeping only what precedes the given position
            pass

        def writable(self, *args, **kwargs):
            # Whether the file can be written
            pass

        def write(self, *args, **kwargs):
            # Write content
            pass

        def __getstate__(self, *args, **kwargs):
            pass

        def __init__(self, *args, **kwargs):
            pass

        @staticmethod  # known case of __new__
        def __new__(*args, **kwargs):
            """ Create and return a new object. See help(type) for accurate signature. """
            pass

        def __next__(self, *args, **kwargs):
            """ Implement next(self). """
            pass

        def __repr__(self, *args, **kwargs):
            """ Return repr(self). """
            pass

        buffer = property(lambda self: object(), lambda self, v: None, lambda self: None)  # default
        closed = property(lambda self: object(), lambda self, v: None, lambda self: None)  # default
        encoding = property(lambda self: object(), lambda self, v: None, lambda self: None)  # default
        errors = property(lambda self: object(), lambda self, v: None, lambda self: None)  # default
        line_buffering = property(lambda self: object(), lambda self, v: None, lambda self: None)  # default
        name = property(lambda self: object(), lambda self, v: None, lambda self: None)  # default
        newlines = property(lambda self: object(), lambda self, v: None, lambda self: None)  # default
        _CHUNK_SIZE = property(lambda self: object(), lambda self, v: None, lambda self: None)  # default
        _finalizing = property(lambda self: object(), lambda self, v: None, lambda self: None)  # default


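2. Summary

# Opening a file
    # f = open('file path'): the default mode is r, and the default encoding is the operating system's default encoding
    # r w a (r+ w+ a+), plus the same six with b added. In a b mode no encoding is specified. Avoid r+ w+ a+ at work; mainly use r, w and a.
    # Common encodings: UTF-8, GBK
# Operating on a file
    # Reading
        # read: with no argument, reads everything
            # with an argument in r mode, the argument is the number of characters to read
            # with an argument in rb mode, the argument is the number of bytes to read
        # readline
            # reads line by line, one line per call; it does not stop by itself (at end of file it returns an empty string)
        # for loop
            # reads line by line from the first line and stops when nothing is left
        # readlines: rarely used
    # Writing
        # write: writes content (it does not add a line break; add \n manually)
# Closing a file
    # f.close()
    # with open() as f:
# Modifying a file:
    # import os
    # os.remove
    # os.rename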
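The difference between readline and the for loop noted in the summary comes down to who checks for the end of the file. This short sketch uses a scratch file and illustrative names to show both forms producing the same lines.

```python
import os
import tempfile

# Hypothetical scratch file used only for this demonstration.
path = os.path.join(tempfile.mkdtemp(), 'log_demo')
with open(path, 'w', encoding='utf-8') as f:
    f.write('line 1\nline 2\n')

# readline(): the caller has to notice the end of the file.
lines_via_readline = []
with open(path, encoding='utf-8') as f:
    while True:
        line = f.readline()
        if line == '':               # an empty string means end of file
            break
        lines_via_readline.append(line.rstrip('\n'))

# for loop: iteration stops by itself at the end of the file.
with open(path, encoding='utf-8') as f:
    lines_via_for = [line.rstrip('\n') for line in f]

print(lines_via_readline, lines_via_for)
```

A blank line inside the file is returned as '\n', not as an empty string, so the end-of-file check does not stop early.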
