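I recently needed to process a simple XML file with Python. Its formatting was fairly messy, and I happened to want to try out BeautifulSoup anyway, so I searched Baidu to learn how. Most of the articles I found were about parsing HTML, so I went through the documentation and took some rough notes. The functionality works, but there are still quite a few issues, which I will deal with later.
Test XML: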
<?xml version="1.0" encoding="utf-8"?>
<web-app>
    <context-param>
        <param-name>地址</param-name>
        <param-value>北京西街</param-value>
    </context-param>
    <listener>
        <listener-class>
            寡婦牆.....
        </listener-class>
    </listener>
    <servlet>
        <servlet-name>姓名</servlet-name>
        <servlet-class>小強</servlet-class>
        <init-param>
            <param-name>動物</param-name>
            <param-value>人類</param-value>
        </init-param>
        <load-on-startup>1</load-on-startup>
    </servlet>
</web-app>
Test Python code:
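#coding=utf-8
'''
Simple test of parsing XML with BeautifulSoup (Python 3 syntax)
'''
from bs4 import BeautifulSoup
import re

# Open test.xml and parse it as XML (the 'xml' parser requires lxml).
# Reading in binary mode lets the parser honour the declared encoding.
with open('test.xml', 'rb') as f:
    soup = BeautifulSoup(f, 'xml')

# Pretty-print the XML
print(soup.prettify())

# Find all tags named param-value
print("\n" + "*"*20 + "\n")
print(soup.find_all('param-value'))
print("\n" + "*"*20 + "\n")

# Print the text content of every matching tag
for tag in soup.find_all('param-value'):
    print(tag.text.strip())
print("\n" + "*"*20 + "\n")

# Find the matching tags with a regular expression instead
for tag1 in soup.find_all(re.compile('param-value')):
    print(tag1.text.strip())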
Output:
<?xml version="1.0" encoding="utf-8"?>
<web-app>
 <context-param>
  <param-name>
   地址
  </param-name>
  <param-value>
   北京西街
  </param-value>
 </context-param>
 <listener>
  <listener-class>
   寡婦牆.....
  </listener-class>
 </listener>
 <servlet>
  <servlet-name>
   姓名
  </servlet-name>
  <servlet-class>
   小強
  </servlet-class>
  <init-param>
   <param-name>
    動物
   </param-name>
   <param-value>
    人類
   </param-value>
  </init-param>
  <load-on-startup>
   1
  </load-on-startup>
 </servlet>
</web-app>

********************

[<param-value>北京西街</param-value>, <param-value>人類</param-value>]

********************

北京西街
人類

********************

北京西街
人類
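A possible next step, as a rough sketch rather than a finished solution: pair each param-name with its param-value, and see how the regular-expression lookup behaves. The sketch assumes Python 3 with bs4 and lxml installed. It parses an inline, shortened copy of the test XML so that it runs without test.xml. The names SAMPLE_XML and collect_params are made up for this illustration. Note that BeautifulSoup matches a compiled pattern against tag names with search(), so a pattern matches any tag whose name merely contains it.

```python
import re
from bs4 import BeautifulSoup

# Shortened inline copy of the test XML (hypothetical constant name),
# so the sketch does not depend on test.xml being present.
SAMPLE_XML = """<web-app>
  <context-param>
    <param-name>地址</param-name>
    <param-value>北京西街</param-value>
  </context-param>
  <servlet>
    <servlet-name>姓名</servlet-name>
    <init-param>
      <param-name>動物</param-name>
      <param-value>人類</param-value>
    </init-param>
  </servlet>
</web-app>"""

# The 'xml' parser is provided by lxml.
soup = BeautifulSoup(SAMPLE_XML, 'xml')


def collect_params(soup):
    """Map each <param-name> text to the text of its following <param-value> sibling."""
    params = {}
    for name_tag in soup.find_all('param-name'):
        # Skips whitespace text nodes and returns the next sibling tag with this name.
        value_tag = name_tag.find_next_sibling('param-value')
        if value_tag is not None:
            params[name_tag.get_text(strip=True)] = value_tag.get_text(strip=True)
    return params


print(collect_params(soup))

# A compiled pattern is applied with search(), so it matches anywhere in the tag name.
# Anchoring with ^ or $ narrows the match.
starts_with_param = [t.name for t in soup.find_all(re.compile('^param-'))]
ends_with_param = [t.name for t in soup.find_all(re.compile('param$'))]
print(starts_with_param)
print(ends_with_param)
```

With an unanchored pattern such as re.compile('param'), tags like context-param and init-param would be returned as well, so anchoring the pattern is worth considering when only specific tag names are wanted.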