XML is a data format for exchanging data between different languages or programs. The sample XML file used throughout this article is shown below.

Reading XML files

<data>
    <country name="Liechtenstein">
        <rank updated="yes">2</rank>
        <year>2023</year>
        <gdppc>141100</gdppc>
        <neighbor direction="E" name="Austria" />
        <neighbor direction="W" name="Switzerland" />
    </country>
    <country name="Singapore">
        <rank updated="yes">5</rank>
        <year>2026</year>
        <gdppc>59900</gdppc>
        <neighbor direction="N" name="Malaysia" />
    </country>
    <country name="Panama">
        <rank updated="yes">69</rank>
        <year>2026</year>
        <gdppc>13600</gdppc>
        <neighbor direction="W" name="Costa Rica" />
        <neighbor direction="E" name="Colombia" />
    </country>
</data>

from xml.etree import ElementTree  # import the XML processing module
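
The XML() function
Purpose: parses XML supplied as a string and returns the outermost tag node, i.e. the root (first-level) node. [takes arguments]
Usage: module_name.XML(xml_string)
Example: a = ElementTree.XML(neir)

The text attribute
Purpose: gets the text inside a tag. It is None when the tag contains no text.
Usage: node.text
Example: b = a.text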
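
The two items above can be tried without a network connection. This is a minimal sketch in which the `sample` string is made up for illustration and stands in for a downloaded response:

```python
from xml.etree import ElementTree

# Hypothetical inline document, standing in for an XML string fetched elsewhere.
sample = "<string>V</string>"

root = ElementTree.XML(sample)  # parse the string and get the root node
print(root.tag)   # string
print(root.text)  # V
```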

Checking QQ online status from the XML returned by an HTTP request

#!/usr/bin/env python
# -*- coding:utf8 -*-
"""Check QQ online status by parsing the XML returned from an HTTP request"""
import requests  # HTTP request module
from xml.etree import ElementTree  # XML processing module

http = requests.get("http://www.webxml.com.cn//webservices/qqOnlineWebService.asmx/qqCheckOnline?qqCode=729088188")  # send the HTTP request
http.encoding = "utf-8"  # encoding of the response
neir = http.text  # the XML returned by the request, as a string
print(neir)  # print the raw XML string
print("\n")
a = ElementTree.XML(neir)  # parse the XML; returns the root node
print(a)  # print the root node
print("\n")
b = a.text  # text inside the root tag
print(b)  # this string tells whether the QQ account is online

# Output
# <?xml version="1.0" encoding="utf-8"?>
# <string xmlns="http://WebXml.com.cn/">V</string>
# <Element '{http://WebXml.com.cn/}string' at 0x0000005692820548>
# V
Note: anything printed in the form <Element ...> is a tag node, for example: <Element '{http://WebXml.com.cn/}DataSet' at 0x0000008C179C0548>
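
The part in braces in that printout is the XML namespace of the tag. A small sketch of how the tag name is stored, assuming an inline string shaped like the service response shown in the output above:

```python
from xml.etree import ElementTree

# Inline stand-in for the response; the default namespace comes from xmlns="...".
sample = '<string xmlns="http://WebXml.com.cn/">V</string>'
root = ElementTree.XML(sample)

print(root.tag)  # {http://WebXml.com.cn/}string

# Split the "{namespace}name" form into its two parts.
namespace, _, local_name = root.tag[1:].partition("}")
print(namespace)   # http://WebXml.com.cn/
print(local_name)  # string
```

Because the namespace is part of the tag, a lookup such as iter("string") would not match this node; the full "{http://WebXml.com.cn/}string" form is needed.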

The iter() method
Purpose: finds every node with the given tag name beneath a node, at any depth. It returns an iterator, so use a for loop to get the nodes. [takes arguments]
Usage: parsed_node.iter("tag_name")
Example: b = a.iter("TrainDetailInfo")

#!/usr/bin/env python
# -*- coding:utf8 -*-
"""Train timetable: parse the XML returned from an HTTP request"""
import requests  # HTTP request module
from xml.etree import ElementTree  # XML processing module

http = requests.get("http://www.webxml.com.cn/WebServices/TrainTimeWebService.asmx/getDetailInfoByTrainCode?TrainCode=k234&UserID=")  # send the HTTP request
http.encoding = "utf-8"  # encoding of the response
neir = http.text  # the XML returned by the request, as a string
a = ElementTree.XML(neir)  # parse the XML; returns the root node
print(a)  # print the root node
print("\n")
b = a.iter("TrainDetailInfo")  # iterator over every TrainDetailInfo node under the root
for i in b:
    print(i)  # print each TrainDetailInfo node

# Output
# <Element '{http://WebXml.com.cn/}DataSet' at 0x000000B52FC7F548>
#
#
# <Element 'TrainDetailInfo' at 0x000000B52FC7FB88>
# <Element 'TrainDetailInfo' at 0x000000B52FC7FD18>
# <Element 'TrainDetailInfo' at 0x000000B52FC7FEA8>
# <Element 'TrainDetailInfo' at 0x000000B52FC98098>
# <Element 'TrainDetailInfo' at 0x000000B52FC98228>
# <Element 'TrainDetailInfo' at 0x000000B52FC983B8>
# <Element 'TrainDetailInfo' at 0x000000B52FC98548>
# <Element 'TrainDetailInfo' at 0x000000B52FC986D8>
# <Element 'TrainDetailInfo' at 0x000000B52FC98868>
# <Element 'TrainDetailInfo' at 0x000000B52FC989F8>
# <Element 'TrainDetailInfo' at 0x000000B52FC98B88>
# <Element 'TrainDetailInfo' at 0x000000B52FC98D18>
# <Element 'TrainDetailInfo' at 0x000000B52FC98EA8>
# <Element 'TrainDetailInfo' at 0x000000B52FC9B098>
# <Element 'TrainDetailInfo' at 0x000000B52FC9B228>
# <Element 'TrainDetailInfo' at 0x000000B52FC9B3B8>
# <Element 'TrainDetailInfo' at 0x000000B52FC9B548>
# <Element 'TrainDetailInfo' at 0x000000B52FC9B6D8>
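
To see what "at any depth" means without calling the web service, here is a sketch on an inline document. The nesting (`diffgram`, `getDetailInfo`) and the station values are assumptions made up for illustration, not the service's documented layout:

```python
from xml.etree import ElementTree

# Hypothetical nesting; only the tag name TrainDetailInfo comes from the article.
sample = """
<DataSet>
  <diffgram>
    <getDetailInfo>
      <TrainDetailInfo><TrainStation>A</TrainStation></TrainDetailInfo>
      <TrainDetailInfo><TrainStation>B</TrainStation></TrainDetailInfo>
    </getDetailInfo>
  </diffgram>
</DataSet>
"""
root = ElementTree.XML(sample)

# iter() descends through every level, so nodes three levels down are found.
found = list(root.iter("TrainDetailInfo"))
print(len(found))  # 2

# findall() with a bare tag name only looks at direct children of root.
direct = root.findall("TrainDetailInfo")
print(direct)  # []
```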

The tag attribute
Purpose: gets the name of a tag
Usage: node.tag
Example: i.tag

The attrib attribute
Purpose: gets the attributes of a tag, returned as a dictionary
Usage: node.attrib
Example: i.attrib

#!/usr/bin/env python
# -*- coding:utf8 -*-
"""Train timetable: parse the XML returned from an HTTP request"""
import requests  # HTTP request module
from xml.etree import ElementTree  # XML processing module

http = requests.get("http://www.webxml.com.cn/WebServices/TrainTimeWebService.asmx/getDetailInfoByTrainCode?TrainCode=k234&UserID=")  # send the HTTP request
http.encoding = "utf-8"  # encoding of the response
neir = http.text  # the XML returned by the request, as a string
a = ElementTree.XML(neir)  # parse the XML; returns the root node
print(a)  # print the root node
print("\n")
b = a.iter("TrainDetailInfo")  # iterator over every TrainDetailInfo node under the root
for i in b:
    print(i.tag, i.attrib)  # tag is the tag name, attrib is the attribute dictionary

# Output
# <Element '{http://WebXml.com.cn/}DataSet' at 0x0000008C179C0548>
#
#
# TrainDetailInfo {'{urn:schemas-microsoft-com:xml-msdata}rowOrder': '0', '{urn:schemas-microsoft-com:xml-diffgram-v1}id': 'TrainDetailInfo1', '{urn:schemas-microsoft-com:xml-diffgram-v1}hasChanges': 'inserted'}
# TrainDetailInfo {'{urn:schemas-microsoft-com:xml-msdata}rowOrder': '1', '{urn:schemas-microsoft-com:xml-diffgram-v1}id': 'TrainDetailInfo2', '{urn:schemas-microsoft-com:xml-diffgram-v1}hasChanges': 'inserted'}
# TrainDetailInfo {'{urn:schemas-microsoft-com:xml-msdata}rowOrder': '2', '{urn:schemas-microsoft-com:xml-diffgram-v1}id': 'TrainDetailInfo3', '{urn:schemas-microsoft-com:xml-diffgram-v1}hasChanges': 'inserted'}
# TrainDetailInfo {'{urn:schemas-microsoft-com:xml-msdata}rowOrder': '3', '{urn:schemas-microsoft-com:xml-diffgram-v1}id': 'TrainDetailInfo4', '{urn:schemas-microsoft-com:xml-diffgram-v1}hasChanges': 'inserted'}
# TrainDetailInfo {'{urn:schemas-microsoft-com:xml-msdata}rowOrder': '4', '{urn:schemas-microsoft-com:xml-diffgram-v1}id': 'TrainDetailInfo5', '{urn:schemas-microsoft-com:xml-diffgram-v1}hasChanges': 'inserted'}
# TrainDetailInfo {'{urn:schemas-microsoft-com:xml-msdata}rowOrder': '5', '{urn:schemas-microsoft-com:xml-diffgram-v1}id': 'TrainDetailInfo6', '{urn:schemas-microsoft-com:xml-diffgram-v1}hasChanges': 'inserted'}
# TrainDetailInfo {'{urn:schemas-microsoft-com:xml-msdata}rowOrder': '6', '{urn:schemas-microsoft-com:xml-diffgram-v1}id': 'TrainDetailInfo7', '{urn:schemas-microsoft-com:xml-diffgram-v1}hasChanges': 'inserted'}
# TrainDetailInfo {'{urn:schemas-microsoft-com:xml-msdata}rowOrder': '7', '{urn:schemas-microsoft-com:xml-diffgram-v1}id': 'TrainDetailInfo8', '{urn:schemas-microsoft-com:xml-diffgram-v1}hasChanges': 'inserted'}
# TrainDetailInfo {'{urn:schemas-microsoft-com:xml-msdata}rowOrder': '8', '{urn:schemas-microsoft-com:xml-diffgram-v1}id': 'TrainDetailInfo9', '{urn:schemas-microsoft-com:xml-diffgram-v1}hasChanges': 'inserted'}
# TrainDetailInfo {'{urn:schemas-microsoft-com:xml-msdata}rowOrder': '9', '{urn:schemas-microsoft-com:xml-diffgram-v1}id': 'TrainDetailInfo10', '{urn:schemas-microsoft-com:xml-diffgram-v1}hasChanges': 'inserted'}

The find() method
Purpose: looks for a child node under a node and returns the first match, or None if there is no match. [takes arguments]
Usage: parent_node.find("child_tag_name")
Example: i.find("TrainStation")

#!/usr/bin/env python
# -*- coding:utf8 -*-
"""Train timetable: parse the XML returned from an HTTP request"""
import requests  # HTTP request module
from xml.etree import ElementTree  # XML processing module

http = requests.get("http://www.webxml.com.cn/WebServices/TrainTimeWebService.asmx/getDetailInfoByTrainCode?TrainCode=k234&UserID=")  # send the HTTP request
http.encoding = "utf-8"  # encoding of the response
neir = http.text  # the XML returned by the request, as a string
a = ElementTree.XML(neir)  # parse the XML; returns the root node
print(a)  # print the root node
print("\n")
b = a.iter("TrainDetailInfo")  # iterator over every TrainDetailInfo node under the root
for i in b:
    print(i.find("TrainStation"))  # find() looks up a child node of the current node

# Output
# <Element '{http://WebXml.com.cn/}DataSet' at 0x000000BF7CFC0548>
#
#
# <Element 'TrainStation' at 0x000000BF7CFC0BD8>
# <Element 'TrainStation' at 0x000000BF7CFC0D68>
# <Element 'TrainStation' at 0x000000BF7CFC0EF8>
# <Element 'TrainStation' at 0x000000BF7CFD90E8>
# <Element 'TrainStation' at 0x000000BF7CFD9278>
# <Element 'TrainStation' at 0x000000BF7CFD9408>
# <Element 'TrainStation' at 0x000000BF7CFD9598>
# <Element 'TrainStation' at 0x000000BF7CFD9728>
# <Element 'TrainStation' at 0x000000BF7CFD98B8>
# <Element 'TrainStation' at 0x000000BF7CFD9A48>
# <Element 'TrainStation' at 0x000000BF7CFD9BD8>
# <Element 'TrainStation' at 0x000000BF7CFD9D68>
# <Element 'TrainStation' at 0x000000BF7CFD9EF8>
# <Element 'TrainStation' at 0x000000BF7CFDC0E8>
# <Element 'TrainStation' at 0x000000BF7CFDC278>
# <Element 'TrainStation' at 0x000000BF7CFDC408>
# <Element 'TrainStation' at 0x000000BF7CFDC598>
# <Element 'TrainStation' at 0x000000BF7CFDC728>

Extracting the data you need from the tags

#!/usr/bin/env python
# -*- coding:utf8 -*-
"""Train timetable: parse the XML returned from an HTTP request"""
import requests  # HTTP request module
from xml.etree import ElementTree  # XML processing module

http = requests.get("http://www.webxml.com.cn/WebServices/TrainTimeWebService.asmx/getDetailInfoByTrainCode?TrainCode=k567&UserID=")  # send the HTTP request
http.encoding = "utf-8"  # encoding of the response
neir = http.text  # the XML returned by the request, as a string
a = ElementTree.XML(neir)  # parse the XML; returns the root node
print(a)  # print the root node
print("\n")
b = a.iter("TrainDetailInfo")  # iterator over every TrainDetailInfo node under the root
for i in b:
    print(i.find("TrainStation").text, i.find("ArriveTime").text, i.find("StartTime").text, i.find("KM").text)  # text of each child tag

# Output
# <Element '{http://WebXml.com.cn/}DataSet' at 0x000000B9C3775408>
#
#
# 天津(車次:k567) None 15:30:00 0
# 唐山 17:06:00 17:11:00 114
# 昌黎 18:19:00 18:22:00 226
# 北戴河 18:43:00 18:46:00 254
# 秦皇島 19:04:00 19:09:00 275
# 山海關 19:34:00 19:50:00 292
# 綏中 20:54:00 20:58:00 357
# 興城 21:34:00 21:38:00 405
# 葫蘆島 21:58:00 22:02:00 426
# 錦州 22:47:00 22:53:00 476
# 溝幫子 23:42:00 23:46:00 540
# 瀋陽 02:19:00 02:31:00 718
# 四平 05:16:00 05:19:00 906
# 八面城 05:40:00 05:42:00 934
# 雙遼 06:28:00 06:33:00 996
# 保康 07:28:00 07:30:00 1076
# 太平川 07:56:00 08:00:00 1110
# 開通 08:33:00 08:36:00 1159
# 洮南 09:21:00 09:24:00 1227
# 白城 09:52:00 10:04:00 1259
# 鎮賚 10:31:00 10:34:00 1297
# 泰來 11:22:00 11:24:00 1361
# 江橋 12:02:00 12:06:00 1409
# 三間房 12:51:00 12:53:00 1447
# 齊齊哈爾 13:26:00 None 1477
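
The first and last rows print None because those tags are empty. If a tag were missing altogether, find() would return None and `.text` would raise AttributeError. One possible guard is findtext() with a default, sketched here on a made-up inline document (the station names, times, and the "n/a" placeholder are illustrative only):

```python
from xml.etree import ElementTree

# Hypothetical rows: the first has an empty ArriveTime, the second lacks StartTime.
sample = """
<rows>
  <TrainDetailInfo>
    <TrainStation>Tianjin</TrainStation>
    <ArriveTime/>
    <StartTime>15:30:00</StartTime>
  </TrainDetailInfo>
  <TrainDetailInfo>
    <TrainStation>Tangshan</TrainStation>
    <ArriveTime>17:06:00</ArriveTime>
  </TrainDetailInfo>
</rows>
"""
root = ElementTree.XML(sample)

rows = []
for info in root.iter("TrainDetailInfo"):
    station = info.findtext("TrainStation", default="")
    # findtext() gives "" for an empty tag and the default for a missing tag,
    # so "or" maps both cases to the same placeholder.
    arrive = info.findtext("ArriveTime", default="n/a") or "n/a"
    start = info.findtext("StartTime", default="n/a") or "n/a"
    rows.append((station, arrive, start))

print(rows)
```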

Note: once XML() has parsed the document and returned the root node, you can loop over a node with for, and nest loops, to reach the nodes you want. [important]
For example:

#!/usr/bin/env python
# -*- coding:utf8 -*-
"""
Note: once ElementTree.XML() has parsed the document and returned the root node,
you can loop over a node with for, and nest loops, to reach the nodes you want.
For example:
<data>
    <country name="Liechtenstein">
        <rank updated="yes">2</rank>
        <year>2023</year>
        <gdppc>141100</gdppc>
        <neighbor direction="E" name="Austria" />
        <neighbor direction="W" name="Switzerland" />
    </country>
    <country name="Singapore">
        <rank updated="yes">5</rank>
        <year>2026</year>
        <gdppc>59900</gdppc>
        <neighbor direction="N" name="Malaysia" />
    </country>
    <country name="Panama">
        <rank updated="yes">69</rank>
        <year>2026</year>
        <gdppc>13600</gdppc>
        <neighbor direction="W" name="Costa Rica" />
        <neighbor direction="E" name="Colombia" />
    </country>
</data>
"""
from xml.etree import ElementTree  # XML processing module

with open("xml.xml", "r", encoding="utf-8") as a:  # open a local XML file
    b = a.read()  # read the file contents
c = ElementTree.XML(b)  # parse the XML; returns the root (first-level) node
for i in c:  # loop over the root's children: the second-level nodes
    for s in i:  # loop over each second-level node's children: the third-level nodes
        print(s)  # print each third-level node

# Output
# <Element 'rank' at 0x000000A4C3B083B8>
# <Element 'year' at 0x000000A4C3B08408>
# <Element 'gdppc' at 0x000000A4C3B08458>
# <Element 'neighbor' at 0x000000A4C3B084A8>
# <Element 'neighbor' at 0x000000A4C3B084F8>
# <Element 'rank' at 0x000000A4C3B08598>
# <Element 'year' at 0x000000A4C3B085E8>
# <Element 'gdppc' at 0x000000A4C3B08638>
# <Element 'neighbor' at 0x000000A4C3B08688>
# <Element 'rank' at 0x000000A4C3B08728>
# <Element 'year' at 0x000000A4C3B08778>
# <Element 'gdppc' at 0x000000A4C3B087C8>
# <Element 'neighbor' at 0x000000A4C3B08818>
# <Element 'neighbor' at 0x000000A4C3B08868>
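
Key point: prefer iter() for reaching nodes across levels. If iter() alone does not meet the need, combine it with for loops over nodes and with find() for child nodes. Together these three can reach every node you need.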
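
A sketch of that combination on a shortened copy of the sample file: a for loop over the second-level nodes, find() for one known child, and iter() for repeated nodes. The `summary` dictionary is just an illustrative way to collect the result:

```python
from xml.etree import ElementTree

# Shortened copy of the sample document used in this article.
sample = """
<data>
  <country name="Liechtenstein">
    <rank updated="yes">2</rank>
    <neighbor direction="E" name="Austria" />
    <neighbor direction="W" name="Switzerland" />
  </country>
  <country name="Singapore">
    <rank updated="yes">5</rank>
    <neighbor direction="N" name="Malaysia" />
  </country>
</data>
"""
root = ElementTree.XML(sample)

summary = {}
for country in root:                   # for loop: second-level nodes
    rank = country.find("rank").text   # find(): one known child
    neighbors = [n.attrib["name"] for n in country.iter("neighbor")]  # iter(): repeated nodes
    summary[country.attrib["name"]] = (rank, neighbors)

print(summary)
```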
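
Parsing XML
There are two ways to parse XML: XML() parses a string, and parse() parses an XML file directly.

Parsing a string with XML()

#!/usr/bin/env python
# -*- coding:utf8 -*-
from xml.etree import ElementTree  # XML parsing module

with open("xml.xml", "r", encoding="utf-8") as a:  # open read-only with utf-8 encoding
    b = a.read()  # read the file contents as a string
c = ElementTree.XML(b)  # parse the XML string; returns the root (first-level) node
print(c)

# Output
# <Element 'data' at 0x000000D2756B3228>

Parsing a file directly with parse()
Note: a tree parsed with parse() can be read and also written back to an XML file, including additions, deletions and modifications.

#!/usr/bin/env python
# -*- coding:utf8 -*-
from xml.etree import ElementTree  # XML parsing module

c = ElementTree.parse("xml.xml")  # open and parse an XML file
b = c.getroot()  # get the root (first-level) node of the file
print(b)

# Output
# <Element 'data' at 0x0000006C681C3228>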
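
The difference between the two return types can be checked side by side. This sketch writes a small made-up document to a temporary directory so that parse() has a file to open (the temporary path is an assumption for the demo, not part of the article's setup):

```python
import os
import tempfile
from xml.etree import ElementTree

text = "<data><country name='Panama'/></data>"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "xml.xml")
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)

    from_string = ElementTree.XML(text)   # returns an Element (the root)
    from_file = ElementTree.parse(path)   # returns an ElementTree
    root = from_file.getroot()            # the Element inside the tree

print(type(from_string).__name__)   # Element
print(type(from_file).__name__)     # ElementTree
print(root.tag == from_string.tag)  # True
```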
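
Writing XML files (add, delete, modify)

The parse() function
Purpose: opens and parses an XML file directly. A tree parsed this way can be read and also written back to a file, including additions, deletions and modifications. [takes arguments]
Usage: module_name.parse("path or name of the XML file")
Example: c = ElementTree.parse("xml.xml")

The getroot() method
Purpose: returns the root (outermost, first-level) node of a tree parsed with parse()
Usage: parsed_tree.getroot()
Example: b = c.getroot()

#!/usr/bin/env python
# -*- coding:utf8 -*-
from xml.etree import ElementTree  # XML parsing module

c = ElementTree.parse("xml.xml")  # open and parse an XML file
b = c.getroot()  # get the root (first-level) node of the file
print(b)

# Output
# <Element 'data' at 0x000000F9AF0D3228>

Note: parse() differs from XML() only in how the document is loaded and in that the result can be written back. Getting nodes and reading the text inside tags work exactly as they do after XML().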

The write() method
Purpose: saves the modified XML to a file. [takes arguments]
Usage: parsed_tree.write("output file name", encoding='character encoding', xml_declaration=True, short_empty_elements=False)
Parameters
encoding="character encoding"
xml_declaration=True : whether to write the XML declaration (version and encoding), e.g. <?xml version='1.0' encoding='utf-8'?>. True writes it, False does not.
short_empty_elements=False : whether tags with no content are saved as a single self-closing tag. True does, False does not.
Example: c.write("xml.xml", encoding='utf-8')

Modifying the text inside a tag

#!/usr/bin/env python
# -*- coding:utf8 -*-
from xml.etree import ElementTree  # XML parsing module

c = ElementTree.parse("xml.xml")  # open and parse an XML file
b = c.getroot()  # get the root (first-level) node of the file
print(b, "\n")  # print the root node
d = b.iter("rank")  # every node named rank under the root
for i in d:  # loop over all rank nodes
    f1 = int(i.text) + 1  # read the text of the rank tag, convert to int, add 1
    i.text = str(f1)  # replace the text of the rank tag with the new value
c.write("xml.xml", encoding='utf-8', xml_declaration=True, short_empty_elements=False)  # save the changes
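
The effect of the two keyword arguments is easier to see on a tiny file. The sketch below uses a temporary file and a made-up document (including the `empty` tag) so that the article's xml.xml is not touched:

```python
import os
import tempfile
from xml.etree import ElementTree

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "xml.xml")
    with open(path, "w", encoding="utf-8") as f:
        f.write("<data><rank>2</rank><rank>5</rank><empty></empty></data>")

    tree = ElementTree.parse(path)
    for rank in tree.getroot().iter("rank"):
        rank.text = str(int(rank.text) + 1)  # text is a string, so convert both ways

    tree.write(path, encoding="utf-8", xml_declaration=True,
               short_empty_elements=False)

    with open(path, encoding="utf-8") as f:
        saved = f.read()

print(saved)
```

With short_empty_elements=False the empty tag is kept as `<empty></empty>`; with the default of True it would be written as `<empty />`.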
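
The set() method
Purpose: adds an attribute to a tag node. [takes arguments]
Usage: node.set("attribute name", "attribute value")
Example: i.set("linguixiou2","yes2")

Adding attributes to a tag

#!/usr/bin/env python
# -*- coding:utf8 -*-
from xml.etree import ElementTree  # XML parsing module

c = ElementTree.parse("xml.xml")  # open and parse an XML file
b = c.getroot()  # get the root (first-level) node of the file
print(b, "\n")  # print the root node
d = b.iter("year")  # every node named year under the root
for i in d:  # loop over all year nodes
    i.set("linguixiou2", "yes2")  # add an attribute to the current tag
    i.set("linguixiou", "yes")  # add an attribute to the current tag
c.write("xml.xml", encoding='utf-8')  # save the changes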
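
The del statement
Purpose: deletes a tag attribute
Usage: del node.attrib["attribute name to delete"]
Example: del i.attrib["linguixiou"]

Deleting a tag attribute

#!/usr/bin/env python
# -*- coding:utf8 -*-
from xml.etree import ElementTree  # XML parsing module

c = ElementTree.parse("xml.xml")  # open and parse an XML file
b = c.getroot()  # get the root (first-level) node of the file
print(b, "\n")  # print the root node
d = b.iter("year")  # every node named year under the root
for i in d:  # loop over all year nodes
    i.set("linguixiou2", "yes2")  # add an attribute to the current tag
    i.set("linguixiou", "yes")  # add an attribute to the current tag
    del i.attrib["linguixiou"]  # delete an attribute: i is the current node, attrib is its attribute dictionary, ["linguixiou"] is the attribute name
c.write("xml.xml", encoding='utf-8')  # save the changes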
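
Since attrib is an ordinary dictionary, del raises KeyError when the attribute does not exist. A small sketch of that, and of dict.pop() as a tolerant alternative (the attribute name `not_there` is made up):

```python
from xml.etree import ElementTree

year = ElementTree.XML("<year>2026</year>")
year.set("linguixiou", "yes")
year.set("linguixiou2", "yes2")

del year.attrib["linguixiou"]       # fine: the attribute exists
year.attrib.pop("not_there", None)  # pop() with a default never raises

try:
    del year.attrib["not_there"]    # del on a missing attribute raises KeyError
    raised = False
except KeyError:
    raised = True

print(year.attrib)  # {'linguixiou2': 'yes2'}
print(raised)       # True
```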

The findall() method
Purpose: gets all nodes with the same name directly under a node and returns them as a list, one node per element
Usage: node.findall("tag name to look for")
Example: d = b.findall("country")

Picking one node out of several siblings with the same name

#!/usr/bin/env python
# -*- coding:utf8 -*-
from xml.etree import ElementTree  # XML parsing module

c = ElementTree.parse("xml.xml")  # open and parse an XML file
b = c.getroot()  # get the root (first-level) node of the file
print(b, "\n")  # print the root node
d = b.findall("country")  # all country nodes directly under the root, as a list
print(d, "\n")
e = d[0].find("rank")  # take element 0 of the list and find its rank child node
print(e)

# Output
# <Element 'data' at 0x000000278AB63228>
#
# [<Element 'country' at 0x000000278AD12B88>, <Element 'country' at 0x000000278AD27548>, <Element 'country' at 0x000000278AD276D8>]
#
# <Element 'rank' at 0x000000278AD273B8>

The remove() method
Purpose: deletes a child node from its parent node
Usage: parent_node.remove(child node found under the parent)
Example: e.remove(e.find("rank"))

#!/usr/bin/env python
# -*- coding:utf8 -*-
from xml.etree import ElementTree as xml  # XML parsing module

c = xml.parse("xml.xml")  # open and parse an XML file
b = c.getroot()  # get the root (first-level) node of the file
e = b.find("country")  # first country node under the root
e.remove(e.find("rank"))  # delete the rank node under that country node
c.write("xml.xml", encoding='utf-8', xml_declaration=True, short_empty_elements=False)  # save the changes
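
When several children need to be removed, looping over the parent while calling remove() on it can skip nodes. A common pattern is to loop over the list returned by findall() instead. The sketch below uses a made-up document and a made-up rank threshold of 50:

```python
from xml.etree import ElementTree

# Hypothetical data: three countries, one of which ranks above 50.
root = ElementTree.XML(
    "<data>"
    "<country name='A'><rank>2</rank></country>"
    "<country name='B'><rank>69</rank></country>"
    "<country name='C'><rank>5</rank></country>"
    "</data>"
)

# findall() returns a separate list, so removals do not disturb the loop.
for country in root.findall("country"):
    if int(country.find("rank").text) > 50:
        root.remove(country)

remaining = [c.get("name") for c in root]
print(remaining)  # ['A', 'C']
```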
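
The Element() function
Purpose: creates a tag node. It only creates the node; use append() to attach it to a parent node.
Usage: module_name.Element("tag name", attrib={dictionary of tag attributes})
Example: xml.Element('er', attrib={'name': '2'})

The append() method
Purpose: attaches a created node to a parent node as its last child
Usage: parent_node.append(created_node)
Example: geng.append(b1)

The ElementTree() class
Purpose: creates an ElementTree object, which is what produces the XML output after nodes have been created and appended
Note: an Element obtained from XML() can also be wrapped with ElementTree() so that it can be modified and saved
Usage: module_name.ElementTree(root_node)
Example: tree = xml.ElementTree(geng)

Creating a new XML file, method 1. Note that the file produced this way is not indented.

#!/usr/bin/env python
# -*- coding:utf8 -*-
from xml.etree import ElementTree as xml  # "as" renames the module to xml

geng = xml.Element("geng")  # create the first-level (root) node
b1 = xml.Element('er', attrib={'name': '2'})  # create a second-level node
b2 = xml.Element('er', attrib={'name': '2'})  # create a second-level node
geng.append(b1)  # attach the second-level node to the root
geng.append(b2)  # attach the second-level node to the root
c1 = xml.Element('san', attrib={'name': '3'})  # create a third-level node
c2 = xml.Element('san', attrib={'name': '3'})  # create a third-level node
b1.append(c1)  # attach the third-level node to a second-level node
b2.append(c2)  # attach the third-level node to a second-level node
tree = xml.ElementTree(geng)  # build the tree
tree.write('oooo.xml', encoding='utf-8', xml_declaration=True, short_empty_elements=False)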
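
SubElement() [recommended]
Purpose: creates a node and attaches it to a parent in one step. The tag name and attributes are passed in, and the text can then be set on the returned node.
Usage: variable = module_name.SubElement(parent_node, "tag name", attrib={dictionary of tag attributes})
variable.text = "text of the tag"
Example: ch1 = xml.SubElement(geng,"er",attrib={"name":"zhi"})
ch1.text = "second-level tag"

Creating a new XML file, method 2. Note that the file produced this way is not indented.

#!/usr/bin/env python
# -*- coding:utf8 -*-
from xml.etree import ElementTree as xml  # "as" renames the module to xml

geng = xml.Element("geng")  # create the first-level (root) node
ch1 = xml.SubElement(geng, "er", attrib={"name": "zhi"})
ch1.text = "second-level tag"
ch2 = xml.SubElement(ch1, "er3", attrib={"name": "zhi3"})
ch2.text = "third-level tag"
tree = xml.ElementTree(geng)  # build the tree
tree.write('oooo.xml', encoding='utf-8', xml_declaration=True, short_empty_elements=False)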
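
To inspect what such a tree serializes to without writing a file, tostring() can be used. A minimal sketch with the same tag names and illustrative text values:

```python
from xml.etree import ElementTree as xml

geng = xml.Element("geng")
er = xml.SubElement(geng, "er", attrib={"name": "zhi"})
er.text = "level two"
# Attributes may also be given as keyword arguments instead of attrib={...}.
xml.SubElement(er, "er3", name="zhi3").text = "level three"

serialized = xml.tostring(geng, encoding="unicode")  # "unicode" returns str, not bytes
print(serialized)
# <geng><er name="zhi">level two<er3 name="zhi3">level three</er3></er></geng>
```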

Creating a new XML file, method 3 [recommended], with automatic indentation
Indentation requires the minidom module from the xml.dom package.
from xml.etree import ElementTree as ET  # import the ElementTree module
from xml.dom import minidom  # import the minidom module
ElementTree is used to create nodes and so on; minidom is only used for indentation.

#!/usr/bin/env python
# -*- coding:utf8 -*-
from xml.etree import ElementTree as ET
from xml.dom import minidom


def prettify(elem):
    """Convert a node to a string and add indentation."""
    rough_string = ET.tostring(elem, 'utf-8')
    reparsed = minidom.parseString(rough_string)
    return reparsed.toprettyxml(indent="\t")


# create the root node
root = ET.Element("famliy")  # create the first-level node
a = ET.SubElement(root, "erji", attrib={"mna": "zhi"})  # create a second-level node
b1 = ET.SubElement(a, "mingzi")  # create a third-level node
b1.text = "林貴秀"
b2 = ET.SubElement(a, "xingbie")  # create a third-level node
b2.text = "male"
b3 = ET.SubElement(a, "nianl")  # create a third-level node
b3.text = "32"

raw_str = prettify(root)  # pass the whole root node to the function for indentation
with open("xxxoo.xml", 'w', encoding='utf-8') as f:  # open xxxoo.xml; closed automatically
    f.write(raw_str)  # write the indented XML to the file
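
As an alternative that stays inside ElementTree, Python 3.9 and later provide ET.indent(), which adds the whitespace to the tree in place. This is a sketch assuming Python 3.9+, with a shortened version of the tree above (root tag spelled `family` here):

```python
from xml.etree import ElementTree as ET

root = ET.Element("family")
child = ET.SubElement(root, "erji", attrib={"mna": "zhi"})
ET.SubElement(child, "nianl").text = "32"

ET.indent(root, space="\t")  # modifies the tree in place; requires Python 3.9+
pretty = ET.tostring(root, encoding="unicode")
print(pretty)
```

After indent(), a normal tree.write() call also produces indented output, so the minidom round trip is not needed on those Python versions.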
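
XML namespaces provide a way to avoid element name conflicts.
In XML, element names are defined by the developer, so a naming conflict occurs when two different documents use the same element name.
This XML document carries information about an HTML table:

<table>
    <tr>
        <td>Apples</td>
        <td>Bananas</td>
    </tr>
</table>

This XML document carries information about a table (a piece of furniture):

<table>
    <name>African Coffee Table</name>
    <width>80</width>
    <length>120</length>
</table>

If these two XML documents are used together, a naming conflict occurs, because both contain a <table> element with different content and a different definition.
An XML parser cannot tell how to handle such a conflict.
Adding a prefix to the names avoids it. This document carries information about an HTML table:

<h:table>
    <h:tr>
        <h:td>Apples</h:td>
        <h:td>Bananas</h:td>
    </h:tr>
</h:table>

This XML document carries information about a piece of furniture:

<f:table>
    <f:name>African Coffee Table</f:name>
    <f:width>80</f:width>
    <f:length>120</f:length>
</f:table>

Now there is no naming conflict, because the two documents use different names for their <table> elements (<h:table> and <f:table>).
By using prefixes, we have created two different types of <table> element.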

The register_namespace() function
Purpose: registers a prefix for a namespace, to be used when the XML is written out
Usage: module_name.register_namespace("prefix", "namespace URI")
Example: ET.register_namespace('com',"http://www.company.com")

#!/usr/bin/env python
# -*- coding:utf8 -*-
from xml.etree import ElementTree as ET

ET.register_namespace('com', "http://www.company.com")  # register_namespace("prefix", "namespace URI")
"""
After the namespace is registered, include the namespace URI when defining tag names
and attribute names, in the form "{namespace URI}tag name". On output the URI is
replaced by the registered prefix automatically.
Put simply, it marks tags and attributes with an identifier to prevent name conflicts.
"""
root = ET.Element("{http://www.company.com}STUFF")
body = ET.SubElement(root, "{http://www.company.com}MORE_STUFF", attrib={"{http://www.company.com}hhh": "123"})
body.text = "STUFF EVERYWHERE!"
tree = ET.ElementTree(root)
tree.write("page.xml", xml_declaration=True, encoding='utf-8', method="xml")

# Generated file
# <?xml version='1.0' encoding='utf-8'?>
# <com:STUFF xmlns:com="http://www.company.com">
# <com:MORE_STUFF com:hhh="123">STUFF EVERYWHERE!</com:MORE_STUFF>
# </com:STUFF>
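
Reading such a file back, the nodes carry the full "{URI}name" form. find() and findall() accept an optional prefix-to-URI mapping so that lookups can use the short prefix. A sketch on an inline copy of the generated document (the `ns` dictionary name is arbitrary):

```python
from xml.etree import ElementTree as ET

doc = (
    '<com:STUFF xmlns:com="http://www.company.com">'
    '<com:MORE_STUFF com:hhh="123">STUFF EVERYWHERE!</com:MORE_STUFF>'
    '</com:STUFF>'
)
root = ET.XML(doc)

ns = {"com": "http://www.company.com"}  # prefix -> URI, used only for lookups
body = root.find("com:MORE_STUFF", ns)

print(body.tag)   # {http://www.company.com}MORE_STUFF
print(body.text)  # STUFF EVERYWHERE!
print(body.get("{http://www.company.com}hhh"))  # 123
```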
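
Summary of key points
1. Parsing XML
There are two ways:
From a string: XML() returns an Element object, which is the root node itself
From a file: parse() returns an ElementTree object; call getroot() to get the Element object, i.e. the root node
Key point: only an ElementTree object can be written to a file; an Element cannot. If the document was parsed into an Element, wrap it with ElementTree(root_node) to create an ElementTree object, and then it can be written.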
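
That key point can be sketched as follows. An in-memory io.BytesIO buffer is used here in place of a file on disk, which is an assumption made only to keep the example self-contained:

```python
import io
from xml.etree import ElementTree

root = ElementTree.XML("<data><year>2026</year></data>")  # an Element
has_write = hasattr(root, "write")
print(has_write)  # False: Element has no write() method

tree = ElementTree.ElementTree(root)  # wrap the Element in an ElementTree
buffer = io.BytesIO()                 # in-memory stand-in for a file
tree.write(buffer, encoding="utf-8", xml_declaration=True)

written = buffer.getvalue().decode("utf-8")
print(written)
```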
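2. The ElementTree object
1. An ElementTree object can be created with ElementTree(xxxx); under the hood parse() also creates one with ElementTree()
2. Call getroot() on an ElementTree object to get the root node
ElementTree() creates an ElementTree object, used to produce XML
write() writes the in-memory XML to a file
3. The Element object
iter() finds every node with the given tag name beneath a node, at any depth; returns an iterator to loop over with for [takes arguments]
findall() gets all nodes with the same name directly under a node; returns a list, one node per element
find() looks for a child node under a node and returns the first match [takes arguments]
set() adds an attribute to a tag node [takes arguments]
remove() deletes a child node from its parent node
text the text inside a tag
tag the name of a tag
attrib the attributes of a tag, as a dictionary
del deletes a tag attribute
Element() creates a tag node; it only creates the node, use append() to attach it to a parent
append() attaches a created node to a parent node
SubElement() creates a node and attaches it to a parent, with tag name and attributes; the text is set on the returned node
register_namespace() registers a namespace prefix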

Element: annotated source code

class Element:
    """An XML element.

    This class is the reference implementation of the Element interface.

    An element's length is its number of subelements.  That means if you
    want to check if an element is truly empty, you should check BOTH
    its length AND its text attribute.

    The element tag, attribute names, and attribute values can be either
    bytes or strings.

    *tag* is the element name.  *attrib* is an optional dictionary containing
    element attributes. *extra* are additional element attributes given as
    keyword arguments.

    Example form:
        <tag attrib>text<child/>...</tag>tail

    """

    # tag name of the current node
    tag = None
    """The element's name."""

    # attributes of the current node
    attrib = None
    """Dictionary of the element's attributes."""

    # content of the current node
    text = None
    """
    Text before first subelement.  This is either a string or the value None.
    Note that if there is no text, this attribute may be either
    None or the empty string, depending on the parser.

    """

    tail = None
    """
    Text after this element's end tag, but before the next sibling element's
    start tag.  This is either a string or the value None.  Note that if there
    was no text, this attribute may be either None or an empty string,
    depending on the parser.

    """

    def __init__(self, tag, attrib={}, **extra):
        if not isinstance(attrib, dict):
            raise TypeError("attrib must be dict, not %s" % (
                attrib.__class__.__name__,))
        attrib = attrib.copy()
        attrib.update(extra)
        self.tag = tag
        self.attrib = attrib
        self._children = []

    def __repr__(self):
        return "<%s %r at %#x>" % (self.__class__.__name__, self.tag, id(self))

    def makeelement(self, tag, attrib):
        # create a new node
        """Create a new element with the same type.

        *tag* is a string containing the element name.
        *attrib* is a dictionary containing the element attributes.

        Do not call this method, use the SubElement factory function instead.

        """
        return self.__class__(tag, attrib)

    def copy(self):
        """Return copy of current element.

        This creates a shallow copy. Subelements will be shared with the
        original tree.

        """
        elem = self.makeelement(self.tag, self.attrib)
        elem.text = self.text
        elem.tail = self.tail
        elem[:] = self
        return elem

    def __len__(self):
        return len(self._children)

    def __bool__(self):
        warnings.warn(
            "The behavior of this method will change in future versions.  "
            "Use specific 'len(elem)' or 'elem is not None' test instead.",
            FutureWarning, stacklevel=2
            )
        return len(self._children) != 0 # emulate old behaviour, for now

    def __getitem__(self, index):
        return self._children[index]

    def __setitem__(self, index, element):
        # if isinstance(index, slice):
        #     for elt in element:
        #         assert iselement(elt)
        # else:
        #     assert iselement(element)
        self._children[index] = element

    def __delitem__(self, index):
        del self._children[index]

    def append(self, subelement):
        # append one child node to the current node
        """Add *subelement* to the end of this element.

        The new element will appear in document order after the last existing
        subelement (or directly after the text, if it's the first subelement),
        but before the end tag for this element.

        """
        self._assert_is_element(subelement)
        self._children.append(subelement)

    def extend(self, elements):
        # extend the current node with n child nodes
        """Append subelements from a sequence.

        *elements* is a sequence with zero or more elements.

        """
        for element in elements:
            self._assert_is_element(element)
        self._children.extend(elements)

    def insert(self, index, subelement):
        # insert a node among the children of the current node at the given position
        """Insert *subelement* at position *index*."""
        self._assert_is_element(subelement)
        self._children.insert(index, subelement)

    def _assert_is_element(self, e):
        # Need to refer to the actual Python implementation, not the
        # shadowing C implementation.
        if not isinstance(e, _Element_Py):
            raise TypeError('expected an Element, not %s' % type(e).__name__)

    def remove(self, subelement):
        # delete a node from the children of the current node
        """Remove matching subelement.

        Unlike the find methods, this method compares elements based on
        identity, NOT ON tag value or contents.  To remove subelements by
        other means, the easiest way is to use a list comprehension to
        select what elements to keep, and then use slice assignment to update
        the parent element.

        ValueError is raised if a matching element could not be found.

        """
        # assert iselement(element)
        self._children.remove(subelement)

    def getchildren(self):
        # get all child nodes (deprecated)
        """(Deprecated) Return all subelements.

        Elements are returned in document order.

        """
        warnings.warn(
            "This method will be removed in future versions.  "
            "Use 'list(elem)' or iteration over elem instead.",
            DeprecationWarning, stacklevel=2
            )
        return self._children

    def find(self, path, namespaces=None):
        # get the first matching child node
        """Find first matching element by tag name or path.

        *path* is a string having either an element tag or an XPath,
        *namespaces* is an optional mapping from namespace prefix to full name.

        Return the first matching element, or None if no element was found.

        """
        return ElementPath.find(self, path, namespaces)

    def findtext(self, path, default=None, namespaces=None):
        # get the content of the first matching child node
        """Find text for first matching element by tag name or path.

        *path* is a string having either an element tag or an XPath,
        *default* is the value to return if the element was not found,
        *namespaces* is an optional mapping from namespace prefix to full name.

        Return text content of first matching element, or default value if
        none was found.  Note that if an element is found having no text
        content, the empty string is returned.

        """
        return ElementPath.findtext(self, path, default, namespaces)

    def findall(self, path, namespaces=None):
        # get all matching child nodes
        """Find all matching subelements by tag name or path.

        *path* is a string having either an element tag or an XPath,
        *namespaces* is an optional mapping from namespace prefix to full name.

        Returns list containing all matching elements in document order.

        """
        return ElementPath.findall(self, path, namespaces)

    def iterfind(self, path, namespaces=None):
        # get all matching nodes as an iterator (usable in a for loop)
        """Find all matching subelements by tag name or path.

        *path* is a string having either an element tag or an XPath,
        *namespaces* is an optional mapping from namespace prefix to full name.

        Return an iterable yielding all matching elements in document order.

        """
        return ElementPath.iterfind(self, path, namespaces)

    def clear(self):
        # empty the node
        """Reset element.

        This function removes all subelements, clears all attributes, and sets
        the text and tail attributes to None.

        """
        self.attrib.clear()
        self._children = []
        self.text = self.tail = None

    def get(self, key, default=None):
        # get an attribute value of the current node
        """Get element attribute.

        Equivalent to attrib.get, but some implementations may handle this a
        bit more efficiently.  *key* is what attribute to look for, and
        *default* is what to return if the attribute was not found.

        Returns a string containing the attribute value, or the default if
        attribute was not found.

        """
        return self.attrib.get(key, default)

    def set(self, key, value):
        # set an attribute value on the current node
        """Set element attribute.

        Equivalent to attrib[key] = value, but some implementations may handle
        this a bit more efficiently.  *key* is what attribute to set, and
        *value* is the attribute value to set it to.

        """
        self.attrib[key] = value

    def keys(self):
        # get the keys of all attributes of the current node
        """Get list of attribute names.

        Names are returned in an arbitrary order, just like an ordinary
        Python dict.  Equivalent to attrib.keys()

        """
        return self.attrib.keys()

    def items(self):
        # get all attributes of the current node as key/value pairs
        """Get element attributes as a sequence.

        The attributes are returned in arbitrary order.  Equivalent to
        attrib.items().

        Return a list of (name, value) tuples.

        """
        return self.attrib.items()

    def iter(self, tag=None):
        # find all nodes with the given name among the current node and its
        # descendants; returns an iterator (usable in a for loop)
        """Create tree iterator.

        The iterator loops over the element and all subelements in document
        order, returning all elements with a matching tag.

        If the tree structure is modified during iteration, new or removed
        elements may or may not be included.  To get a stable set, use the
        list() function on the iterator, and loop over the resulting list.

        *tag* is what tags to look for (default is to return all elements)

        Return an iterator containing all the matching elements.

        """
        if tag == "*":
            tag = None
        if tag is None or self.tag == tag:
            yield self
        for e in self._children:
            yield from e.iter(tag)

    # compatibility
    def getiterator(self, tag=None):
        # Change for a DeprecationWarning in 1.4
        warnings.warn(
            "This method will be removed in future versions.  "
            "Use 'elem.iter()' or 'list(elem.iter())' instead.",
            PendingDeprecationWarning, stacklevel=2
        )
        return list(self.iter(tag))

    def itertext(self):
        # iterate over the text content of the current node and all its
        # descendants; returns an iterator (usable in a for loop)
        """Create text iterator.

        The iterator loops over the element and all subelements in document
        order, returning all inner text.

        """
        tag = self.tag
        if not isinstance(tag, str) and tag is not None:
            return
        if self.text:
            yield self.text
        for e in self:
            yield from e.itertext()
            if e.tail:
                yield e.tail