(Original) A helper class for parsing XML with boost.property_tree, and a fix for the Chinese-text parsing problem

Date: 2019-12-01


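boost.property_tree can parse XML and JSON files. I mainly use it for XML. Internally it wraps RapidXml, which is billed as the fastest XML parser, so parsing performance is good. In practice, though, I found it awkward to use in several ways: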

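Fetching a node that does not exist throws an exception.
When reading attribute values you have to skip the attribute and comment nodes yourself; overlook this and you get an exception that is hard to make sense of.
The memory model is somewhat unusual.
Chinese text is not supported out of the box: parsed Chinese comes out garbled.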

Getting child nodes with ptree

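The interface for getting a child node is get_child(node_path), where node_path is the full path starting from the current node, with parent and child keys joined by ".", for example "root.sub.child". Note that get_child returns only the first child that matches the path. To get the list of child nodes, use the parent path "root.sub" and iterate over the result, which yields every child. If the path does not exist, get_child throws an exception (ptree_bad_path). To avoid the exception, use the optional interfaces (get_child_optional, get_optional<T>), which return an optional result and leave it to the caller to check whether anything was found.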

//ptree's optional interface
auto item = root.get_child_optional("Root.Scenes");

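This call returns an optional<ptree&>, so the caller still has to check whether the node exists. An optional object is tested for validity through its bool conversion, and its actual content is accessed with the dereference operator "*". I recommend using the optional interfaces to access XML nodes.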

//ptree's optional interface
auto item = root.get_child_optional("Root.Scenes");
if(item)
    cout<<"the node exists"<<endl;

The ptree memory model

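A ptree maintains a list of child nodes of type pair<string, ptree>: first is the node's tag name and second is the ptree node itself, so keep the meaning of the iterator in mind when traversing children.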

for (auto& data : root)
{
    for (auto& item : data.second) //list elements are pair<string, ptree>; iterate further through second
    {
        cout<<item.first<<endl;
    }
}

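Note that first may also be the attribute node ("<xmlattr>") or a comment ("<xmlcomment>"); only nodes that are not comments support the usual operations such as reading attribute values and child nodes.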

Getting attribute values with ptree

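get<T>(attr_name) returns the value of an attribute when it is called on the "<xmlattr>" node itself; to get the value as an integer, use get<int>("Id"), which returns an int. One caveat: when first is "<xmlcomment>" the node has no attributes, and its comment text can be read with data(). When first is not "<xmlattr>", that is, when the node is an ordinary element, the attribute name must be prefixed with "<xmlattr>.", so the correct call is get<int>("<xmlattr>.Id"). As you can see, reading attributes is rather tedious; the helper class introduced later simplifies it. To read a node's value, use get_value(): for the node <Field>2</Field>, get_value() returns "2".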

The problem with parsing Chinese

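ptree expects XML files encoded in UTF-8 and stores the text as narrow (byte) strings, so if the file contains non-ASCII characters such as Chinese, the parsed strings show up garbled. To parse such text, use wptree, whose interfaces all take wide characters and otherwise match ptree. wptree alone is not enough, however: you also need a converter between wide and narrow strings. There are many implementations of such conversions, but C++11 offers a simpler, uniform way.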

Wide/narrow conversion in C++11:

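//Note: passing a locale name ("CHS") to std::codecvt relies on the MSVC implementation; it is not standard C++
std::wstring_convert<std::codecvt<wchar_t,char,std::mbstate_t>> conv(new std::codecvt<wchar_t,char,std::mbstate_t>("CHS"));
//wide string to narrow string
string str = conv.to_bytes(L"你好");
//narrow string to wide string
wstring wstr = conv.from_bytes(str);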
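For readers who are not on MSVC, here is a portable sketch of the same idea. It swaps the locale-specific "CHS" facet for std::codecvt_utf8<wchar_t>, so it converts between UTF-8 bytes and wide strings rather than the system code page. The function names Utf8ToWide and WideToUtf8 are hypothetical. Note that <codecvt> is deprecated since C++17, so newer compilers may emit warnings.

```cpp
#include <codecvt>
#include <locale>
#include <string>

// Decode a UTF-8 byte string into a wide string.
inline std::wstring Utf8ToWide(const std::string& utf8)
{
    std::wstring_convert<std::codecvt_utf8<wchar_t>> conv;
    return conv.from_bytes(utf8);
}

// Encode a wide string as UTF-8 bytes.
inline std::string WideToUtf8(const std::wstring& wide)
{
    std::wstring_convert<std::codecvt_utf8<wchar_t>> conv;
    return conv.to_bytes(wide);
}
```

Both functions throw std::range_error on malformed input, which is the default behaviour of std::wstring_convert when no error string is supplied.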

When boost.property_tree parses an XML file containing Chinese, the file stream has to be converted first.

The Boost solution:

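#include <fstream>
#include <locale>
#include <string>
#include <boost/property_tree/ptree.hpp>
#include <boost/property_tree/xml_parser.hpp>
#include "boost/program_options/detail/utf8_codecvt_facet.hpp"
void ParseChn(const std::string& fileName)
{
    std::wifstream f(fileName);
    std::locale utf8Locale(std::locale(), new boost::program_options::detail::utf8_codecvt_facet());
    f.imbue(utf8Locale); //decode the UTF-8 file into wide characters

    //parse with wptree
    boost::property_tree::wptree ptree;
    boost::property_tree::read_xml(f, ptree);
}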

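The drawback of this approach is that it pulls in Boost's libboost_program_options library, which is over 20 MB. Going to that much trouble just to handle Chinese text is hardly worth it. Fortunately C++11 offers a simpler way: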

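    void Init(const wstring& fileName, wptree& ptree)
    {
        std::wifstream f(fileName); //a wide file name is accepted by MSVC; other toolchains may need a narrow path
        std::locale utf8Locale(std::locale(), new std::codecvt_utf8<wchar_t>);
        f.imbue(utf8Locale); //decode the UTF-8 file into wide characters

        //parse with wptree
        property_tree::read_xml(f, ptree);
    }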

With C++11 there is no need to bring in Boost's libboost_program_options library, which keeps things simple.

Another approach is to keep using ptree and string, and convert each string to a wide (Unicode) string after reading it, which yields the Chinese text. For example:

auto child = item.second.get_child("Scenes.Scene");
auto oname = child.get_optional<string>("<xmlattr>.Name");

//oname holds UTF-8 bytes; convert them to a wide string to get the Chinese text
std::wstring_convert<std::codecvt_utf8_utf16<wchar_t>> converter;
std::wstring wide;
if (oname) //check before dereferencing the optional
    wide = converter.from_bytes(*oname);

//wide string back to narrow string
//std::string narrow = converter.to_bytes(L"foo");

The property_tree helper class

The helper class addresses the problems described above:

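Solves Chinese parsing with C++11
Simplifies attribute access
Adds some convenience operations, such as lookup functions
Avoids exceptions by returning optional<T> objects throughout
Hides the tedious low-level interfaces behind a unified, concise high-level interface that is easier to use.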

Here is how the helper class is implemented:

#include<boost/property_tree/ptree.hpp>
#include<boost/property_tree/xml_parser.hpp>
using namespace boost;
using namespace boost::property_tree;

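#include <map>
#include <vector>
#include <string>
#include <fstream>
#include <iostream>
#include <functional>
#include <codecvt>
#include <locale>
//with C++17 and later, unqualified optional may be ambiguous between boost and std; write boost::optional if so
using namespace std;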

const wstring XMLATTR = L"<xmlattr>";
const wstring XMLCOMMENT = L"<xmlcomment>";
const wstring XMLATTR_DOT = L"<xmlattr>.";
const wstring XMLCOMMENT_DOT = L"<xmlcomment>.";

class ConfigParser
{
public:

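    //"CHS" selects the Simplified Chinese locale; constructing std::codecvt from a locale name relies on the MSVC implementation
    ConfigParser() : m_conv(new code_type("CHS"))
    {
    }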

    ~ConfigParser()
    {
    }

    void Init(const wstring& fileName, wptree& ptree)
    {
        std::wifstream f(fileName);
        std::locale utf8Locale(std::locale(), new std::codecvt_utf8<wchar_t>);
        f.imbue(utf8Locale); //decode the UTF-8 file into wide characters
        wcout.imbue(std::locale("chs")); //set wcout to Chinese output ("chs" is a Windows locale name)

        //parse with wptree
        property_tree::read_xml(f, ptree);
    }

    // convert a narrow string in the "CHS" locale encoding (not UTF-8) to wstring
    std::wstring to_wstr(const std::string& str)
    {
        return m_conv.from_bytes(str);
    }

    // convert wstring to a narrow string in the "CHS" locale encoding
    std::string to_str(const std::wstring& str)
    {
        return m_conv.to_bytes(str);
    }

    //get the list of child nodes
    auto Descendants(const wptree& root, const wstring& key)->decltype(root.get_child_optional(key))
    {
        return root.get_child_optional(key);
    }

    //get the child nodes whose attribute has the given value
    template<typename T>
    vector<wptree> GetChildsByAttr(const wptree& parent, const wstring& tagName, const wstring& attrName, const T& attrVal)
    {
        vector<wptree> v;

        for (auto& child : parent)
        {
            if (child.first != tagName)
                continue;

            auto attr = Attribute<T>(child, attrName);

            if (attr&&*attr == attrVal)
                v.push_back(child.second);
        }

        return v;
    }

    //get an attribute value of a node
    template<typename R>
    optional<R> Attribute(const wptree& node, const wstring& attrName)
    {
        return node.get_optional<R>(XMLATTR_DOT + attrName);
    }

    //get an attribute value of a node, as a string by default
    optional<wstring> Attribute(const wptree& node, const wstring& attrName)
    {
        return Attribute<wstring>(node, attrName);
    }

    //get an attribute value from a value_type (key/subtree pair)
    template<typename R>
    optional<R> Attribute(const wptree::value_type& pair, const wstring& attrName)
    {
        if (pair.first == XMLATTR)
            return pair.second.get_optional<R>(attrName);
        else if (pair.first == XMLCOMMENT)
            return optional<R>();
        else
            return pair.second.get_optional<R>(XMLATTR_DOT + attrName);
    }

    //get an attribute value from a value_type, as a string by default
    optional<wstring> Attribute(const wptree::value_type& pair, const wstring& attrName)
    {
        return Attribute<wstring>(pair, attrName);
    }

    //build a multimap<wstring, wptree> keyed by the value of an attribute
    template<class F = std::function<bool(wstring&)>>
    multimap<wstring, wptree> MakeMapByAttr(const wptree& root, const wstring& key, const wstring& attrName, F predict = [](wstring& str){return true; })
    {
        multimap<wstring, wptree> resultMap;
        auto list = Descendants(root, key);
        if (!list)
            return resultMap;
        
        for (auto& item : *list)
        {
            auto attr = Attribute(item, attrName);
            if (attr&&predict(*attr))
                resultMap.insert(std::make_pair(*attr, item.second));
        }

        return resultMap;
    }

private:
    using code_type = std::codecvt<wchar_t, char, std::mbstate_t>;
    std::wstring_convert<code_type> m_conv;
};


The test file test.xml and the test code:

<?xml version="1.0" encoding="UTF-8"?>
<Root Id="123456">
    <Scenes>
        <!--註釋說明1-->
        <Scene Name="測試1">
            <!--註釋說明11-->
            <DataSource>
                <!--註釋說明111-->
                <Data>
                    <!--註釋說明111-->
                    <Item Id="1" FileName="測試文件1" />
                </Data>
                <Data>
                    <Item Id="2" FileName="測試文件2" />
                    <Item Id="3" FileName="測試文件3" />
                </Data>
            </DataSource>
        </Scene>
        <!--註釋說明1-->
        <Scene Name="測試2">
            <DataSource>
                <Data>
                    <Item Id="4" FileName="測試文件4" />
                </Data>
                <Data>
                    <Item Id="5" FileName="測試文件5" />
                </Data>
            </DataSource>
        </Scene>
    </Scenes>
</Root>

void Test()
{
    wptree pt; 
    ConfigParser parser;
    parser.Init(L"test.xml", pt); //to handle Chinese, the file is parsed as Unicode

    auto scenes = parser.Descendants(pt, L"Root.Scenes"); //returns an optional<const wptree&>
    if (!scenes)
        return;

    for (auto& scene : *scenes)
    {
        auto s = parser.Attribute(scene, L"Name"); //get the Name attribute; returns optional<wstring>
        if (s)
        {
            wcout << *s << endl;
        }

        auto dataList = parser.Descendants(scene.second, L"DataSource"); //get the first DataSource child node
        if (!dataList)
            continue;

        for (auto& data : *dataList)
        {
            for (auto& item : data.second)
            {
                auto id = parser.Attribute<int>(item, L"Id");
                auto fileName = parser.Attribute(item, L"FileName");

                if (id && fileName) //check both optionals before dereferencing
                {
                    wcout << *id << L" " << *fileName << endl; //print id and filename
                }
            }
        }
    }
}

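The test shows that with the helper class, nodes can be accessed and manipulated conveniently without touching the native interfaces. Users need not care about the internal details; the unified, concise interface is enough to work with XML files.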

As an aside, combining this helper class with LINQ to Objects makes it easy to implement LINQ to XML:

//find the SubNode children whose ID attribute equals 0x10000D and print their Type attribute
from(node.Descendants("Root.SubNode")).where([](XNode& node)
{
    auto s = node.Attribute("ID");
    return s&&*s == "0x10000D";
}).for_each([](XNode& node)
{
    auto s = node.Attribute("Type");
    if (s)
        cout << *s << endl;
});