Parsing XML in Java with dom4j

Date: 2019-11-13

Goal: parse a given XML document and return the data under a specified node as key=value pairs.

Required jars: dom4j-1.6.1.jar and jaxen-1.1.jar (jaxen is the XPath engine dom4j relies on for methods such as selectNodes).

<?xml version="1.0" encoding="UTF-8"?>
<SOAP-ENV:Envelope xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/"
    xmlns:SOAP-ENC="http://schemas.xmlsoap.org/soap/encoding/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema">
    <SOAP-ENV:Body>
        <m:Main xmlns:m="http://webservice.xnhdbbx.org/">
            <arg0>
                <![CDATA[
                    <business>
                        <functioncode>400105</functioncode>
                        <d401_19>411102</d401_19>
                        <t104_03>1</t104_03>
                        <list>
                            <item id='1'>
                                <i201_00>FDAC4FC422F5E339E04000</i201_00>
                            </item>
                        </list>
                        <remark1 />
                        <remark2 />
                        <remark3 />
                        <remark4 />
                        <remark5 />
                    </business>
                ]]>
            </arg0>
        </m:Main>
    </SOAP-ENV:Body>
</SOAP-ENV:Envelope>

For the XML above, what I want is the data of the child nodes under item, in the form: i201_00=FDAC4FC422F5E339E04000

Here is the code:

package cn.code;

import java.io.File;
import java.util.Iterator;
import java.util.List;

import org.dom4j.Document;
import org.dom4j.DocumentException;
import org.dom4j.DocumentHelper;
import org.dom4j.Element;
import org.dom4j.Node;
import org.dom4j.io.SAXReader;

public class Dom4JParseXml {
    public static void main(String[] args) {
        SAXReader reader = new SAXReader();
        // The SOAP envelope shown above, saved as in.xml in the working directory
        File file = new File("in.xml");
        try {
            Document doc = reader.read(file);
            Element root = doc.getRootElement();
            // XPath evaluated against the document embedded in the CDATA section
            String nodePath = "/business/list/item";
            System.out.println(bar(root, nodePath));
        } catch (DocumentException e) {
            e.printStackTrace();
        }
    }

    // Walk down from root until reaching an element with no child elements
    // (here <arg0>, whose only content is the CDATA text), then hand it to barNode.
    // Siblings are tried in order until one of them yields a non-empty result,
    // so e.g. an empty SOAP-ENV:Header before the Body does not stop the search.
    public static String bar(Element root, String nodePath) {
        Iterator i = root.elementIterator();
        while (i.hasNext()) {
            Element element = (Element) i.next();
            String result;
            if (element.elementIterator().hasNext()) {
                result = bar(element, nodePath);
            } else {
                result = barNode(element, nodePath);
            }
            if (result != null && result.length() > 0) {
                return result;
            }
        }
        return null;
    }

    // Parse the node's text as an XML document, select the elements matched by
    // nodePath, and output each of their children as name=value
    public static String barNode(Node node, String nodePath) {
        StringBuffer buf = new StringBuffer();
        try {
            Document document = DocumentHelper.parseText(node.getStringValue());
            List list1 = document.selectNodes(nodePath);
            for (Object object : list1) {
                Element n = (Element) object;
                List i201_ = n.elements();
                for (Object object2 : i201_) {
                    Node i_node = (Node) object2;
                    if (buf.length() > 0) {
                        buf.append(", "); // keep pairs apart when there are several
                    }
                    buf.append(i_node.getName() + "="
                            + i_node.getStringValue().trim());
                }
            }
        } catch (Exception e) {
            System.out.println("node.getStringValue() parseText Exception: " + e.getMessage());
        }
        return buf.toString();
    }
}

That is the complete code; it reads the XML above from a file named in.xml in the working directory.

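Note that in the XML above, the content of the element arg0 is wrapped in <![CDATA[..]]>. The parser does not interpret the text inside <![CDATA[..]]> as markup; it keeps it as plain character data. So arg0 is an element, while everything inside <![CDATA[..]]> is just text. That is why the bar method recurses down the tree: its job is to reach arg0 and get hold of that plain text.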
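To make the CDATA point concrete, here is a minimal sketch, assuming dom4j 1.6.1 plus jaxen on the classpath, that reaches arg0 directly with the XPath //arg0 instead of recursing the way bar does (arg0 is declared in no namespace, so an unprefixed name test matches it). The class name Arg0Payload, the helpers findArg0 and payload, and the shortened envelope string are illustrative and not part of the original code. It shows that arg0 has no child elements and that a second parse of its text is needed to reach business.

```java
import org.dom4j.Document;
import org.dom4j.DocumentException;
import org.dom4j.DocumentHelper;
import org.dom4j.Element;

public class Arg0Payload {
    // A shortened copy of the SOAP envelope from the article (namespaces kept).
    static final String ENVELOPE =
        "<SOAP-ENV:Envelope xmlns:SOAP-ENV=\"http://schemas.xmlsoap.org/soap/envelope/\">"
        + "<SOAP-ENV:Body><m:Main xmlns:m=\"http://webservice.xnhdbbx.org/\">"
        + "<arg0><![CDATA[<business><functioncode>400105</functioncode>"
        + "<list><item id='1'><i201_00>FDAC4FC422F5E339E04000</i201_00></item></list>"
        + "</business>]]></arg0></m:Main></SOAP-ENV:Body></SOAP-ENV:Envelope>";

    // Locates <arg0> with XPath (evaluated by jaxen); null if it is absent.
    static Element findArg0(String envelopeXml) {
        try {
            Document doc = DocumentHelper.parseText(envelopeXml);
            return (Element) doc.selectSingleNode("//arg0");
        } catch (DocumentException e) {
            throw new IllegalArgumentException("not well-formed XML", e);
        }
    }

    // The CDATA body of <arg0> as plain text, or null if <arg0> is missing.
    static String payload(String envelopeXml) {
        Element arg0 = findArg0(envelopeXml);
        return arg0 == null ? null : arg0.getText().trim();
    }

    public static void main(String[] args) throws DocumentException {
        // No child elements: the parser kept the CDATA body as character data.
        System.out.println(findArg0(ENVELOPE).elements().isEmpty()); // true
        // Only a second parse turns that text into real elements.
        Element business = DocumentHelper.parseText(payload(ENVELOPE)).getRootElement();
        System.out.println(business.elementText("functioncode")); // 400105
    }
}
```

Going through XPath here trades the hand-written descent for a dependency on jaxen, which the article already requires for selectNodes.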

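Second, the structure of that text shows that it is itself an XML document, so barNode first parses it into a Document with DocumentHelper.parseText, then calls document.selectNodes(...) to get a List of the matching Nodes and iterates over it. Document also offers selectSingleNode(...), which returns a single Node (the first match) directly.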
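A small sketch contrasting the two selection methods on the payload document, under the same dom4j 1.6.1 + jaxen assumption. The second item (id='2', value HYPOTHETICAL-SECOND-ID) is made up purely to show that selectNodes returns every match while selectSingleNode returns only the first; SelectDemo, itemPairs and singleValue are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

import org.dom4j.Document;
import org.dom4j.DocumentException;
import org.dom4j.DocumentHelper;
import org.dom4j.Element;
import org.dom4j.Node;

public class SelectDemo {
    // The CDATA payload from the article plus a hypothetical second <item>.
    static final String BUSINESS =
        "<business><functioncode>400105</functioncode><list>"
        + "<item id='1'><i201_00>FDAC4FC422F5E339E04000</i201_00></item>"
        + "<item id='2'><i201_00>HYPOTHETICAL-SECOND-ID</i201_00></item>"
        + "</list></business>";

    // selectNodes: every <item>, each child flattened to "name=value".
    static List<String> itemPairs(String xml) {
        List<String> pairs = new ArrayList<String>();
        for (Object o : parse(xml).selectNodes("/business/list/item")) {
            Element item = (Element) o;
            for (Object c : item.elements()) {
                Element child = (Element) c;
                pairs.add(child.getName() + "=" + child.getTextTrim());
            }
        }
        return pairs;
    }

    // selectSingleNode: the first match in document order, or null if none.
    static String singleValue(String xml, String xpath) {
        Node node = parse(xml).selectSingleNode(xpath);
        return node == null ? null : node.getText().trim();
    }

    private static Document parse(String xml) {
        try {
            return DocumentHelper.parseText(xml);
        } catch (DocumentException e) {
            throw new IllegalArgumentException("not well-formed XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(itemPairs(BUSINESS));
        // [i201_00=FDAC4FC422F5E339E04000, i201_00=HYPOTHETICAL-SECOND-ID]
        System.out.println(singleValue(BUSINESS, "/business/functioncode")); // 400105
        System.out.println(singleValue(BUSINESS, "/business/list/item/i201_00"));
        // FDAC4FC422F5E339E04000 (the first item only)
    }
}
```

Iterating the returned lists as Object keeps the sketch compiling against both dom4j 1.6.1, where selectNodes and elements return raw Lists, and dom4j 2.x, where they are generic.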

Reference: http://dom4j.sourceforge.net/dom4j-1.6.1/guide.html