[Qt Notes] Processing XML with DOM

Date: 2019-11-11

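DOM is a standard interface for processing XML documents, proposed by the W3C. Qt provides a non-validating DOM Level 2 implementation for reading and writing XML documents.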

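Unlike the stream-based approach discussed earlier, DOM reads the entire XML document in one go and builds a tree in memory (called the DOM tree). We can navigate this tree, for example moving to the next node or back to the previous one. We can also modify the tree, or save it straight to disk as an XML file. Consider the following XML fragment: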

<doc>
    <quote>Scio me nihil scire</quote>
    <translation>I know that I know nothing</translation>
</doc>

We can think of it as the following DOM tree:

Document
  |--Element(doc)
       |--Element(quote)
       |    |--Text("Scio me nihil scire")
       |--Element(translation)
            |--Text("I know that I know nothing")

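The DOM tree above contains nodes of different types. An Element node, for example, has a start tag and a matching end tag, and whatever lies between them becomes the children of that Element node. In Qt, every DOM node type name starts with QDom, so QDomElement is an Element node and QDomText is a Text node. Different node types allow different kinds of children. An Element node, for instance, may contain other Element nodes as well as nodes of other types, such as EntityReference, Text, CDATASection, ProcessingInstruction and Comment. According to the W3C specification, the containment rules are as follows: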

[Document]
  <- [Element]
  <- DocumentType
  <- ProcessingInstruction
  <- Comment
[Attr]
  <- [EntityReference]
  <- Text
[DocumentFragment] | [Element] | [EntityReference] | [Entity]
  <- [Element]
  <- [EntityReference]
  <- Text
  <- CDATASection
  <- ProcessingInstruction
  <- Comment

In the table above, node types in [] may have child nodes; the others may not.
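
The containment rules can also be written down as a small lookup function. The sketch below is plain C++ and independent of Qt; the `NodeKind` enum and the `canContain` function are hypothetical names introduced for illustration, and only the rules listed in the table are encoded.

```cpp
// Node kinds named in the containment table (illustrative, not a Qt type).
enum class NodeKind {
    Document, DocumentType, DocumentFragment, Element, Attr, Text,
    CDATASection, EntityReference, Entity, ProcessingInstruction, Comment
};

// Returns true if a node of kind `parent` may have a child of kind `child`
// according to the table.
bool canContain(NodeKind parent, NodeKind child)
{
    using K = NodeKind;
    switch (parent) {
    case K::Document:
        return child == K::Element || child == K::DocumentType
            || child == K::ProcessingInstruction || child == K::Comment;
    case K::Attr:
        return child == K::EntityReference || child == K::Text;
    case K::DocumentFragment:
    case K::Element:
    case K::EntityReference:
    case K::Entity:
        return child == K::Element || child == K::EntityReference
            || child == K::Text || child == K::CDATASection
            || child == K::ProcessingInstruction || child == K::Comment;
    default:
        // Kinds without [] in the table cannot have children.
        return false;
    }
}
```

For example, `canContain(NodeKind::Document, NodeKind::Text)` is false, which matches the diagram earlier: text only ever appears below an element, never directly below the document.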

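Below we again use the books.xml file from the previous chapter as the example. The goal of the program is also the same: display the structure of that file in a QTreeWidget. Note that because we are processing XML with DOM, both Qt 4 and Qt 5 require the following line in the .pro file: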

QT += xml

The header file is similar as well:

#include <QMainWindow>
#include <QDomElement>

class QTreeWidget;
class QTreeWidgetItem;

class MainWindow : public QMainWindow
{
    Q_OBJECT
public:
    MainWindow(QWidget *parent = 0);
    ~MainWindow();

    bool readFile(const QString &fileName);
private:
    void parseBookindexElement(const QDomElement &element);
    void parseEntryElement(const QDomElement &element, QTreeWidgetItem *parent);
    void parsePageElement(const QDomElement &element, QTreeWidgetItem *parent);
    QTreeWidget *treeWidget;
};

The MainWindow constructor and destructor are exactly the same as in the previous chapter:

MainWindow::MainWindow(QWidget *parent)
    : QMainWindow(parent)
{
    setWindowTitle(tr("XML DOM Reader"));

    treeWidget = new QTreeWidget(this);
    QStringList headers;
    headers << "Items" << "Pages";
    treeWidget->setHeaderLabels(headers);
    setCentralWidget(treeWidget);
}

MainWindow::~MainWindow()
{
}

The readFile() function, however, has changed:

bool MainWindow::readFile(const QString &fileName)
{
    QFile file(fileName);
    if (!file.open(QFile::ReadOnly | QFile::Text)) {
        QMessageBox::critical(this, tr("Error"),
                              tr("Cannot read file %1").arg(fileName));
        return false;
    }

    QString errorStr;
    int errorLine;
    int errorColumn;

    QDomDocument doc;
    if (!doc.setContent(&file, false, &errorStr, &errorLine,
                        &errorColumn)) {
        QMessageBox::critical(this, tr("Error"),
                              tr("Parse error at line %1, column %2: %3")
                                .arg(errorLine).arg(errorColumn).arg(errorStr));
        return false;
    }

    QDomElement root = doc.documentElement();
    if (root.tagName() != "bookindex") {
        QMessageBox::critical(this, tr("Error"),
                              tr("Not a bookindex file"));
        return false;
    }

    parseBookindexElement(root);
    return true;
}

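readFile() is clearly longer and more involved. First we open a file with QFile, which is no different from before. Then we create a QDomDocument object representing the whole document. Recall the structure diagram shown earlier: Document is the root node of the DOM tree, and here that is the QDomDocument. Its setContent() function populates the DOM tree. In Qt 4 and Qt 5, setContent() has eight overloads; we use this one: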

bool QDomDocument::setContent ( QIODevice * dev,
                                bool namespaceProcessing,
                                QString * errorMsg = 0,
                                int * errorLine = 0,
                                int * errorColumn = 0 )

However, these overloads all behave like the following form:

bool QDomDocument::setContent ( const QByteArray & data,
                                bool namespaceProcessing,
                                QString * errorMsg = 0,
                                int * errorLine = 0,
                                int * errorColumn = 0 )

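The two functions take essentially the same parameters. The second one has five. The first is a QByteArray holding the actual data that was read; this data can be obtained from a QIODevice, and QFile is a subclass of QIODevice. The second parameter determines whether namespaces are processed: if it is true, the parser automatically sets things such as the prefix of each tag. Our XML document has no namespaces, so we simply pass false. The remaining three parameters deal with error handling. All three are output parameters: we pass in a pointer, and the function writes the actual value through it so that we can pick it up afterwards and handle it further.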

Once QDomDocument::setContent() has returned without errors, we call QDomDocument::documentElement() to obtain the document's root element. If the tag of this root element is bookindex, we carry on; otherwise we report an error.

void MainWindow::parseBookindexElement(const QDomElement &element)
{
    QDomNode child = element.firstChild();
    while (!child.isNull()) {
        if (child.toElement().tagName() == "entry") {
            parseEntryElement(child.toElement(),
                              treeWidget->invisibleRootItem());
        }
        child = child.nextSibling();
    }
}

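If the root tag is correct, we take its first child, check that it is not null (that is, a child exists), and then check whether its name is entry. If so, we are looking at an entry tag and call the function that handles it. Either way, we then move on to the next node (the return value of nextSibling()) and repeat the check. Note that this if selects only entry tags for processing; every other tag is simply ignored. Also, firstChild() and nextSibling() both return QDomNode, the base class of all node classes. When we need to operate on a node, we have to convert it to the right subclass. In this example we use toElement() to convert a QDomNode into a QDomElement. If the conversion fails, the result is a null QDomElement whose tagName() returns an empty string, so the if test fails, which is exactly what we want.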

void MainWindow::parseEntryElement(const QDomElement &element,
                                   QTreeWidgetItem *parent)
{
    QTreeWidgetItem *item = new QTreeWidgetItem(parent);
    item->setText(0, element.attribute("term"));

    QDomNode child = element.firstChild();
    while (!child.isNull()) {
        if (child.toElement().tagName() == "entry") {
            parseEntryElement(child.toElement(), item);
        } else if (child.toElement().tagName() == "page") {
            parsePageElement(child.toElement(), item);
        }
        child = child.nextSibling();
    }
}

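In parseEntryElement() we create a tree widget item whose parent is either the root item or another entry item. We then iterate over the children of this entry tag. If a child is an entry tag, the function calls itself recursively with the current item as the parent; if it is a page tag, parsePageElement() is called instead. Any other child is ignored.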

void MainWindow::parsePageElement(const QDomElement &element,
                                  QTreeWidgetItem *parent)
{
    QString page = element.text();
    QString allPages = parent->text(1);
    if (!allPages.isEmpty()) {
         allPages += ", ";
    }
    allPages += page;
    parent->setText(1, allPages);
}

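parsePageElement() is simpler: as before, we build the text by string concatenation, appending each page number, separated by a comma, to the second column of the parent item. This is much the same as the corresponding step in the previous chapter.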

The program produces exactly the same result as in the previous chapter, so no screenshot is included here.

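This example shows that when XML is processed the DOM way, nothing after the initial setContent() call has anything to do with the original document any more. In other words, once setContent() has returned, a complete DOM tree has been built in memory, and we can move around in it, for example by fetching a neighboring node with nextSibling(). Compare this with the streaming approach of the previous chapter: there we could only ever move forward with readNext(), and no function such as readPrevious() exists.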
