Reading and Writing Word Documents in Python with python-docx

Date: 2020-03-26


The python-docx library can be used to create and edit Microsoft Word (.docx) files. Official documentation: https://python-docx.readthedocs.io/en/latest/index.html

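Note: .doc is Microsoft's proprietary file format, while .docx is the format used by Microsoft Office 2007 and later. It is a compressed format based on the Office Open XML standard and usually takes up less space than the equivalent .doc file. A .docx file is essentially a ZIP archive, so you can rename it to .zip and extract it: word/document.xml holds most of the document's content, and images are stored under word/media. python-docx does not support .doc files; an indirect workaround is to convert the .doc file to .docx in your code first.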
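To make the ZIP structure described above concrete, here is a minimal sketch that inspects a .docx container using only the standard library's zipfile module. The helper names (`list_docx_parts`, `read_document_xml`, `make_stand_in_docx`) are made up for this illustration, and the stand-in archive only mimics the folder layout; it is not a valid Word file.

```python
import io
import zipfile


def list_docx_parts(docx_file):
    """Return the names of the files stored inside a .docx (ZIP) container."""
    with zipfile.ZipFile(docx_file) as archive:
        return archive.namelist()


def read_document_xml(docx_file):
    """Return the raw XML of the main document part as a string."""
    with zipfile.ZipFile(docx_file) as archive:
        return archive.read("word/document.xml").decode("utf-8")


def make_stand_in_docx():
    """Build a tiny in-memory ZIP that only mimics the .docx layout.

    This is NOT a valid Word file; it exists so the helpers above can be
    tried without a real document on disk.
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("word/document.xml", "<w:document/>")
        archive.writestr("word/media/image1.png", b"")
    buffer.seek(0)
    return buffer


stand_in = make_stand_in_docx()
print(list_docx_parts(stand_in))
```

Both helpers accept either a file path or a file-like object, so calling `list_docx_parts('demo.docx')` on a real document should work the same way and list the parts Word actually wrote.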

1. Installing the package

`pip3 install python-docx`
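One detail that is easy to trip over: the package is installed under the name python-docx but imported as docx. The following sketch checks whether the package is present by using the standard library's importlib.metadata (Python 3.8 or later); the function name `python_docx_version` is an assumption made for this example.

```python
from importlib import metadata


def python_docx_version():
    """Return the installed python-docx version string, or None if absent."""
    try:
        return metadata.version("python-docx")
    except metadata.PackageNotFoundError:
        return None


version = python_docx_version()
if version is None:
    print("python-docx is not installed; run: pip3 install python-docx")
else:
    print(f"python-docx {version} is installed; import it with: from docx import Document")
```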
2. Creating a Word document

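The code below is based on the example in the official documentation, with a few small changes and added comments explaining how each function is used.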
from docx import Document
from docx.shared import Inches

document = Document()

# Add a heading and set its level (0 to 9, default 1); level 0 uses the "Title" style
document.add_heading('Document Title', 0)

# Add a paragraph; the text may contain tab (\t), newline (\n), or carriage return (\r) characters
p = document.add_paragraph('A plain paragraph having some ')

# Append runs of text to the paragraph; each run can have its own formatting
p.add_run('bold').bold = True
p.add_run(' and some ')
p.add_run('italic.').italic = True

document.add_heading('Heading, level 1', level=1)
document.add_paragraph('Intense quote', style='Intense Quote')

# Add bulleted list items (a small dot in front of each item)
document.add_paragraph('first item in unordered list', style='List Bullet')
document.add_paragraph('second item in unordered list', style='List Bullet')

# Add numbered list items (a number in front of each item)
document.add_paragraph('first item in ordered list', style='List Number')
document.add_paragraph('second item in ordered list', style='List Number')

# Add a picture (monty-truth.png must exist in the working directory)
document.add_picture('monty-truth.png', width=Inches(1.25))

records = (
    (3, '101', 'Spam'),
    (7, '422', 'Eggs'),
    (4, '631', 'Spam, spam, eggs, and spam'),
)

# Add a table with one row and three columns.
# Some of the available table styles:
#   Normal Table
#   Table Grid
#   Light Shading, Light Shading Accent 1 to Light Shading Accent 6
#   Light List, Light List Accent 1 to Light List Accent 6
#   Light Grid, Light Grid Accent 1 to Light Grid Accent 6
#   (there are many more; the rest are omitted here)
table = document.add_table(rows=1, cols=3, style='Light Shading Accent 2')

# Get the cells of the first row
hdr_cells = table.rows[0].cells

# Set the text of the three header cells
hdr_cells[0].text = 'Qty'
hdr_cells[1].text = 'Id'
hdr_cells[2].text = 'Desc'

# item_id is used instead of id to avoid shadowing the built-in id()
for qty, item_id, desc in records:
    # Append a row to the table and get the cells of the new row
    row_cells = table.add_row().cells
    row_cells[0].text = str(qty)
    row_cells[1].text = item_id
    row_cells[2].text = desc

document.add_page_break()

# Save the .docx file
document.save('demo.docx')

The generated demo.docx looks like this: [screenshot of demo.docx]
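The table-filling steps in the script above (set the header cells, then add one row per record) can be wrapped in a small reusable helper. This is only a sketch: `fill_table` and `build_demo` are hypothetical names introduced here, not part of python-docx. `fill_table` relies only on the `rows`, `cells`, `text`, and `add_row()` members already used in the script, and it converts every value with `str()` because the script assigns strings to `cell.text`.

```python
def fill_table(table, header, records):
    """Write `header` into the first row of `table`, then append one row per record.

    `table` is expected to behave like the object returned by
    document.add_table(rows=1, cols=len(header)).
    """
    for cell, title in zip(table.rows[0].cells, header):
        cell.text = str(title)
    for record in records:
        row_cells = table.add_row().cells
        for cell, value in zip(row_cells, record):
            cell.text = str(value)  # convert numbers such as qty to text
    return table


def build_demo(path):
    """Create a small document containing one table and save it to `path`."""
    # Imported here so that fill_table itself has no hard dependency on python-docx.
    from docx import Document

    document = Document()
    table = document.add_table(rows=1, cols=3, style='Table Grid')
    fill_table(
        table,
        ('Qty', 'Id', 'Desc'),
        [(3, '101', 'Spam'), (7, '422', 'Eggs')],
    )
    document.save(path)
```

Calling `build_demo('table-demo.docx')` (the file name is arbitrary) should produce a one-table document, assuming python-docx is installed.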

3. Reading a Word document

from docx import Document

doc = Document('demo.docx')

# Print the text of every paragraph
for para in doc.paragraphs:
    print(para.text)

# Print the index and text of every paragraph
for i, para in enumerate(doc.paragraphs):
    print(i, para.text)

# Tables
for tb in doc.tables:
    # Rows
    for row in tb.rows:
        # Cells in the row (one per column)
        for cell in row.cells:
            print(cell.text)
            # Alternatively, build the text from the cell's paragraphs:
            # text = ''
            # for p in cell.paragraphs:
            #     text += p.text
            # print(text)
Output:

Document Title
A plain paragraph having some bold and some italic.
Heading, level 1
Intense quote
first item in unordered list
second item in unordered list
first item in ordered list
second item in ordered list
Document Title
A plain paragraph having some bold and some italic.
Heading, level 1
Intense quote
first item in unordered list
second item in unordered list
first item in ordered list
second item in ordered list
Qty
Id
Desc
101
Spam
422
Eggs
631
Spam, spam, eggs, and spam
[Finished in 0.2s]
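Printing cell by cell loses the row structure of the table. As a sketch of one alternative, the helpers below collect each table into a list of rows and skip paragraphs that contain no visible text (for example, the paragraph that holds only a picture). The names `table_to_rows`, `non_empty_paragraphs`, and `summarize` are hypothetical; they use only the `paragraphs`, `tables`, `rows`, `cells`, and `text` attributes shown in the reading code above.

```python
def table_to_rows(table):
    """Return the table as a list of rows, each a list of cell strings."""
    return [[cell.text for cell in row.cells] for row in table.rows]


def non_empty_paragraphs(document):
    """Return (index, text) pairs for paragraphs that contain visible text."""
    return [
        (index, paragraph.text)
        for index, paragraph in enumerate(document.paragraphs)
        if paragraph.text.strip()
    ]


def summarize(path):
    """Load a .docx file and return its paragraphs and tables as plain Python data."""
    # Imported here so the two helpers above work with any compatible objects.
    from docx import Document

    document = Document(path)
    tables = [table_to_rows(table) for table in document.tables]
    return non_empty_paragraphs(document), tables
```

With the demo.docx created earlier, `summarize('demo.docx')` should return the paragraph texts together with one table whose first row is the Qty, Id, Desc header.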
