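1. Problem description
The attributes are written out in a different order than in the source, and the XML declaration is not on a line of its own.
# cat HKEX-EPS_20180830_003249795.xml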
<?xml version="1.0" encoding="UTF-8"?><ETCML><IISHeadline><News Encoding="UTF-8" Language="en-us" TimeStamp="20180830194015"><NewsID>2468438</NewsID><NewsDate>20180830194015</NewsDate><ProviderID>HKEX-EPS</ProviderID><Type>AMENDED</Type><Language>en-us</Language><HeadlineTChi></HeadlineTChi><HeadlineSChi></HeadlineSChi><HeadlineEng>CHANGE OF COMPANY NAME,STOCK SHORT NAME AND COMPANY LOGO</HeadlineEng><ExpiryDate>20180831</ExpiryDate><MktCode>MAIN</MktCode><Cancel>false</Cancel><AttachmentList Total="1"><Attachement><FilePath>HKEX-EPS_20180830_003249795_0.PDF</FilePath><FileContentType>APPLICATION/PDF</FileContentType><FileSize>521386</FileSize></Attachement></AttachmentList><AnnouncementTypeList Total="4"><AnnTypeCd>12700</AnnTypeCd><AnnTypeCd>19790</AnnTypeCd><AnnTypeCd>10000</AnnTypeCd><AnnTypeCd>18540</AnnTypeCd></AnnouncementTypeList><RelatedStockList Total="1"><RelatedStock><Code>1400</Code><NameTChi>?地科技股份</NameTChi><NameSChi>滿地科技股份</NameSChi><NameEng>MOODY TECH HLDG</NameEng></RelatedStock></RelatedStockList></News></IISHeadline><Product>ET Net IIS Category List</Product><Provider>ET Net Ltd</Provider><Copyright>?2018 ET Net Limited. All rights reserved.</Copyright></ETCML>
Desired result:
# cat HKEX-EPS_20180830_003249795.xml
<?xml version="1.0" encoding="UTF-8"?>
<ETCML><IISHeadline><News TimeStamp="20180830194015" Encoding="UTF-8" Language="en-us"><NewsID>2468438</NewsID><NewsDate>20180830194015</NewsDate><ProviderID>HKEX-EPS</ProviderID><Type>AMENDED</Type><Language>en-us</Language><HeadlineTChi></HeadlineTChi><HeadlineSChi></HeadlineSChi><HeadlineEng>CHANGE OF COMPANY NAME,STOCK SHORT NAME AND COMPANY LOGO</HeadlineEng><ExpiryDate>20180831</ExpiryDate><MktCode>MAIN</MktCode><Cancel>false</Cancel><AttachmentList Total="1"><Attachement><FilePath>HKEX-EPS_20180830_003249795_0.PDF</FilePath><FileContentType>APPLICATION/PDF</FileContentType><FileSize>521386</FileSize></Attachement></AttachmentList><AnnouncementTypeList Total="4"><AnnTypeCd>12700</AnnTypeCd><AnnTypeCd>19790</AnnTypeCd><AnnTypeCd>10000</AnnTypeCd><AnnTypeCd>18540</AnnTypeCd></AnnouncementTypeList><RelatedStockList Total="1"><RelatedStock><Code>1400</Code><NameTChi>?地科技股份</NameTChi><NameSChi>滿地科技股份</NameSChi><NameEng>MOODY TECH HLDG</NameEng></RelatedStock></RelatedStockList></News></IISHeadline><Product>ET Net IIS Category List</Product><Provider>ET Net Ltd</Provider><Copyright>?2018 ET Net Limited. All rights reserved.</Copyright></ETCML>
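2. Steps
2.1 Environment
The system ships with Python 2.6.6; it was upgraded to Python 2.7.10.
If Python has not been upgraded to 2.7, check the module search path:
>>> import sys
>>> sys.path
In that case the module directory is /usr/lib64/python2.6/xml/dom
The module used is
import xml.dom.minidom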
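Before editing anything, it may help to confirm which copy of minidom the interpreter actually loads, since a system with both 2.6 and 2.7 installed has two. A minimal sketch; the helper name `minidom_source_path` is hypothetical and not part of the original steps.

```python
import xml.dom.minidom


def minidom_source_path():
    """Return the file the running interpreter loaded minidom from."""
    # On Python 2 this may point at the compiled minidom.pyc;
    # the file to edit is the minidom.py next to it.
    return xml.dom.minidom.__file__


print(minidom_source_path())
```

Run it with the same interpreter that produces the XML files, so the path matches the file that needs patching.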
2.2 Newline handling
# cd /usr/local/lib/python2.7/xml/dom/
Original code (Document.writexml in minidom.py):
def writexml(self, writer, indent="", addindent="", newl="",
             encoding = None):
    if encoding is None:
        writer.write('<?xml version="1.0" ?>'+newl)
    else:
        writer.write('<?xml version="1.0" encoding="%s"?>%s' % (encoding, newl))
    for node in self.childNodes:
        node.writexml(writer, indent, addindent, newl)
Modified code (the declaration is always followed by '\n', regardless of newl):
def writexml(self, writer, indent="", addindent="", newl="",
             encoding = None):
    if encoding is None:
        writer.write('<?xml version="1.0" ?>'+'\n')
    else:
        writer.write('<?xml version="1.0" encoding="%s"?>%s' % (encoding, '\n'))
    for node in self.childNodes:
        node.writexml(writer, indent, addindent, newl)
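If patching the standard library is not an option, the same effect can probably be had by post-processing the serialized output. The sketch below is an alternative to the patch above, not the method used in this post; the function name `to_xml_with_declaration_line` is hypothetical.

```python
import xml.dom.minidom


def to_xml_with_declaration_line(doc, encoding="UTF-8"):
    """Serialize doc and put the XML declaration on a line of its own."""
    data = doc.toxml(encoding=encoding)  # byte string starting with the declaration
    head, sep, rest = data.partition(b"?>")
    if data.startswith(b"<?xml") and sep:
        # Re-join with exactly one newline after the declaration.
        return head + sep + b"\n" + rest.lstrip(b"\r\n")
    return data


doc = xml.dom.minidom.parseString('<ETCML><Product>demo</Product></ETCML>')
print(to_xml_with_declaration_line(doc).decode("utf-8"))
```

This leaves minidom.py untouched, at the cost of one extra pass over the output in every script that writes XML.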
2.3 Preserving attribute order
Original code (Element class in minidom.py):
def __init__(self, tagName, namespaceURI=EMPTY_NAMESPACE, prefix=None,
             localName=None):
    self.tagName = self.nodeName = tagName
    self.prefix = prefix
    self.namespaceURI = namespaceURI
    self.childNodes = NodeList()
    self._attrs = {}   # attributes are double-indexed:
    self._attrsNS = {} #    tagName -> Attribute
                       #    URI,localName -> Attribute
                       # in the future: consider lazy generation
                       # of attribute objects this is too tricky
                       # for now because of headaches with
                       # namespaces.
......
def writexml(self, writer, indent="", addindent="", newl=""):
    # indent = current indentation
    # addindent = indentation to add to higher levels
    # newl = newline string
    writer.write(indent+"<" + self.tagName)
    attrs = self._get_attributes()
    a_names = attrs.keys()
    a_names.sort()
Modified code (store attributes in an OrderedDict and stop sorting the names on output):
def __init__(self, tagName, namespaceURI=EMPTY_NAMESPACE, prefix=None,
             localName=None):
    self.tagName = self.nodeName = tagName
    self.prefix = prefix
    self.namespaceURI = namespaceURI
    self.childNodes = NodeList()
    #self._attrs = {}   # attributes are double-indexed:
    # needs "from collections import OrderedDict" at the top of minidom.py
    self._attrs = OrderedDict()   # attributes are double-indexed:
    self._attrsNS = {} #    tagName -> Attribute
                       #    URI,localName -> Attribute
                       # in the future: consider lazy generation
                       # of attribute objects this is too tricky
                       # for now because of headaches with
                       # namespaces.
......
def writexml(self, writer, indent="", addindent="", newl=""):
    # indent = current indentation
    # addindent = indentation to add to higher levels
    # newl = newline string
    writer.write(indent+"<" + self.tagName)
    attrs = self._get_attributes()
    a_names = attrs.keys()
    #a_names.sort()
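To check whether the patch took effect, one option is to round-trip a small element through minidom and compare the attribute order. This is only a sketch: the helper name `attribute_order_preserved` and the `SAMPLE` string are made up for illustration, with the attributes borrowed from the News element above.

```python
import xml.dom.minidom

# Attributes in the order they appear in the desired output.
SAMPLE = '<News TimeStamp="20180830194015" Encoding="UTF-8" Language="en-us"/>'


def attribute_order_preserved(sample=SAMPLE):
    """Return True if minidom writes TimeStamp, Encoding, Language in that order."""
    doc = xml.dom.minidom.parseString(sample)
    written = doc.documentElement.toxml()
    positions = [written.index(name + "=")
                 for name in ("TimeStamp", "Encoding", "Language")]
    return positions == sorted(positions)


print(attribute_order_preserved())
```

On an unpatched Python 2.7 the names are sorted alphabetically, so the check should report False; after the patch it should report True. Python 3.8 and later preserve attribute order in writexml without any patch.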
3. Summary
Tested and confirmed to work.