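In many XML documents, the root node has several levels of child nodes beneath it, and the text usually sits in the lowest-level (leaf) nodes.
To read that text, you therefore have to reach the leaf node that holds it.
In some XML documents, however, text also appears in intermediate nodes, as in HTML.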
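As a minimal sketch of what text in an intermediate node looks like, the snippet below parses a small mixed-content fragment and reads the three places where its text ends up. The fragment and the variable names `p` and `b` are made up for illustration; the fallback import is an assumption for environments without lxml, since the standard library's `xml.etree.ElementTree` exposes the same `text` and `tail` attributes.

```python
try:
    from lxml import etree
except ImportError:
    # Assumption: fall back to the standard library, which shares the text/tail model.
    import xml.etree.ElementTree as etree

# Mixed content: <p> is not a leaf node, yet it carries text of its own.
p = etree.fromstring("<p>Hello <b>world</b>, again</p>")
b = p[0]  # the first (and only) child element

print(repr(p.text))  # text inside <p> before its first child
print(repr(b.text))  # text inside <b>
print(repr(b.tail))  # text after </b>, stored on <b> as its tail
```

The text that follows a closing tag is stored on the element that was just closed, which is why `", again"` is found on `b.tail` rather than on `p`.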
Creating a node with text
from lxml import etree

root = etree.Element("root")
root.text = "TEXT"
print(root.text)       # prints: TEXT
etree.tostring(root)   # b'<root>TEXT</root>'
Operations on node text
from lxml import etree

html = etree.Element("html")
body = etree.SubElement(html, "body")
body.text = "TEXT"
etree.tostring(html)                    # b'<html><body>TEXT</body></html>'

br = etree.SubElement(body, "br")
etree.tostring(html)                    # b'<html><body>TEXT<br/></body></html>'

# tail is the text that follows an element's closing tag
br.tail = "TAIL"
etree.tostring(html)                    # b'<html><body>TEXT<br/>TAIL</body></html>'
etree.tostring(br)                      # b'<br/>TAIL'
etree.tostring(br, with_tail=False)     # b'<br/>'
etree.tostring(html, method="text")     # b'TEXTTAIL'

# ----- Extracting text with XPath -----
print(html.xpath("string()"))           # prints: TEXTTAIL
print(html.xpath("//text()"))           # prints: ['TEXT', 'TAIL']

# Compile the XPath expression once, then apply it to the html node
build_text_list = etree.XPath("//text()")
print(build_text_list(html))            # prints: ['TEXT', 'TAIL']

texts = build_text_list(html)
print(texts[0])                         # prints: TEXT
parent = texts[0].getparent()
print(parent.tag)                       # prints: body
print(texts[1])                         # prints: TAIL
print(texts[1].getparent().tag)         # prints: br

stringify = etree.XPath("string()")
print(stringify(html))                  # prints: TEXTTAIL
# string() concatenates the text of several nodes, so the result has no parent
print(stringify(html).getparent())      # prints: None

# ----- Telling ordinary text apart from tail text -----
print(texts[0].is_text)                 # prints: True
print(texts[1].is_text)                 # prints: False
print(texts[1].is_tail)                 # prints: True
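The same text/tail distinction can also be recovered without XPath. The following is only a sketch: `text_chunks` is a hypothetical helper written for this example, not part of lxml. It walks the tree in document order and labels each string with the element it is stored on and whether it is that element's `text` or `tail`. The fallback import is an assumption for environments without lxml. The sketch also assumes the tree contains only elements, with no comments or processing instructions.

```python
try:
    from lxml import etree
except ImportError:
    # Assumption: the standard library offers the same Element API used here.
    import xml.etree.ElementTree as etree


def text_chunks(elem):
    """Yield (tag, kind, value) for each text/tail string under elem, in document order."""
    if elem.text:
        yield (elem.tag, "text", elem.text)
    for child in elem:
        # Everything inside the child comes before the child's own tail.
        yield from text_chunks(child)
        if child.tail:
            yield (child.tag, "tail", child.tail)


# Rebuild the small tree from this section.
html = etree.Element("html")
body = etree.SubElement(html, "body")
body.text = "TEXT"
br = etree.SubElement(body, "br")
br.tail = "TAIL"

chunks = list(text_chunks(html))
joined = "".join(html.itertext())  # itertext() yields text and tail strings in order

print(chunks)
print(joined)
```

Each tuple names the element that owns the string, which matches what `getparent()` reports for the XPath results: `body` for the ordinary text and `br` for the tail.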