Using Boost.Spirit.X3 with MSVC

Date: 2019-11-13

Preface

"Examples of designs that meet most of the criteria for "goodness" (easy to understand, flexible, efficient) are a recursive-descent parser, which is traditional procedural code. Another example is the STL, which is a generic library of containers and algorithms depending crucially on both traditional procedural code and on parametric polymorphism." --Bjarne Stroustrup

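Let me open with the Bjarne Stroustrup quote cited in the Boost documentation, and then venture a sentence of my own: Boost.Spirit is a recursive-descent parser that depends on traditional procedural code, static (parametric) polymorphism, and expression templates. Procedural code drives the control flow, static polymorphism provides pattern matching and dispatch, and expression templates manage the grammar productions. Together they give Spirit its magic.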

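This article does not discuss Spirit's performance. It only introduces some basic concepts of Spirit.X3 and simple ways of using it, and gives a few simple examples. The next one or two articles will cover how to extend X3. I also assume the reader has some basic compiler background: lexical analysis, syntax analysis, abstract syntax trees (AST), synthesized and inherited attributes, and terminals and nonterminals.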

Terminals & Nonterminals

namespace x3 = boost::spirit::x3;

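In X3, terminals are a set of primitive parsers. They are generally unary parsers; a later article will walk through the Spirit source code to explain this in detail. Terminals are the most basic units when a grammar production is expanded. For example, x3::char_ matches a single character, x3::ascii::alpha matches a single ASCII letter, and x3::float_ matches a single-precision floating-point number. Literal strings are matched by string parsers such as x3::lit and x3::string. For details, see the X3 documentation on character, numeric, and string parsers.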

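Nonterminals are generally built from terminals combined according to some logical relationship; by composing terminals they define more complex grammar productions. For example, x3::float_ >> x3::float_ matches "16.0 1.2", where >> expresses sequencing. *x3::char_ matches "asbcdf234", and it also matches "assd s s ddd": when a skipper is in use (as with phrase_parse), whitespace and any custom skipped input such as comments are skipped between lexical units. See the X3 documentation on nonterminals for details.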

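We have seen that X3 builds nonterminals from terminals and C++ operators, so what is the type of a nonterminal? It is built with expression templates, which create a static tree structure representing the grammar production. Expanding the production is then a top-down, depth-first traversal: when X3 reaches a nonterminal, it tries to match its child parsers, all the way down to the terminals.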
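To make the idea of a static tree concrete, here is a toy sketch that is not X3 code. Everything in namespace mini (ch, sequence, kleene, parser_tag, matches) is a name I made up for the illustration. The operators do no parsing; they only build a type that mirrors the grammar, and parse() walks that tree depth first.

```cpp
#include <string>
#include <type_traits>

namespace mini {  // hypothetical toy library, not part of X3

using iter = std::string::const_iterator;

// Terminal: matches exactly one given character.
struct ch
{
    using parser_tag = void;
    char c;
    bool parse(iter& it, iter end) const
    {
        if (it != end && *it == c) { ++it; return true; }
        return false;
    }
};

// Nonterminal node: left followed by right.
template <typename L, typename R>
struct sequence
{
    using parser_tag = void;
    L left;
    R right;
    bool parse(iter& it, iter end) const
    {
        iter const save = it;
        if (left.parse(it, end) && right.parse(it, end)) return true;
        it = save;  // backtrack on failure
        return false;
    }
};

// Nonterminal node: zero or more repetitions of the subject.
template <typename P>
struct kleene
{
    using parser_tag = void;
    P subject;
    bool parse(iter& it, iter end) const
    {
        while (subject.parse(it, end)) {}
        return true;
    }
};

// Operators are enabled only for types that carry parser_tag.
template <typename L, typename R,
          typename = typename L::parser_tag, typename = typename R::parser_tag>
sequence<L, R> operator>>(L l, R r) { return {l, r}; }

template <typename P, typename = typename P::parser_tag>
kleene<P> operator*(P p) { return {p}; }

// The static type of the expression is the grammar tree.
static_assert(
    std::is_same<decltype(ch{'a'} >> *ch{'b'}), sequence<ch, kleene<ch>>>::value,
    "the expression's type mirrors the grammar");

// Grammar: 'a', then any number of 'b', then 'c'; must consume all input.
inline bool matches(std::string const& s)
{
    auto const grammar = ch{'a'} >> *ch{'b'} >> ch{'c'};
    iter it = s.begin();
    return grammar.parse(it, s.end()) && it == s.end();
}

}  // namespace mini
```

X3's real node types carry attributes, contexts and skippers as well, so this only shows the shape of the technique.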

Synthesized Attribute

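Whether terminal or nonterminal, a parser that successfully matches its input always outputs a value of some type. That value is the parser's synthesized attribute. For example, the synthesized attribute of x3::char_ is a char, and that of x3::float_ is a float. The attributes of nonterminals are more involved; see the documentation on compound attribute rules.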

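Besides synthesized attributes there are inherited attributes. Like a synthesized attribute, an inherited attribute is a value of some type, but it may come from the synthesized attribute of another node in the production. Take the XML node <Node></Node>: when parsing </Node>, the name has to match the opening tag seen earlier, which is a typical use of an inherited attribute. Unfortunately X3 does not implement inherited attributes, whereas boost::spirit::qi does. I am experimenting with implementing them, but this article does not discuss them further.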

Start Rule

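When parsing of the source text begins, X3 needs to know the start rule of the grammar, that is, the root node of the static tree of productions. The whole analysis proceeds recursively downward from the root, and the synthesized attribute of the root can be the abstract syntax tree of the source. Note that X3 merges lexical analysis and syntax analysis into a single pass. Of course, you can also do only lexical analysis in a first pass, keeping the root's synthesized attribute a string, and then do the syntax analysis in a second pass.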

Simple Examples

1. Parsing "1.2 , 1.3 , 1.4 , 1.5"

#include <boost/spirit/home/x3.hpp>   // x3 core
#include <boost/fusion/adapted.hpp>   // adapt fusion.vector with std::vector
// ......
std::string source = "1.2 , 1.3 , 1.4 , 1.5";
auto itr = source.cbegin();
auto end = source.cend();
std::vector<float> result;
auto r = phrase_parse(itr, end, x3::float_ >> *(',' >> x3::float_), x3::ascii::space, result);

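x3::float_ >> *(',' >> x3::float_) means one float followed by any number of (',' >> x3::float_) groups. When writing a compound production, think about the grammar first and the synthesized attribute second. So what is the synthesized attribute of this production? ',' is a character literal, and the X3 documentation states that the synthesized attribute of a literal (x3::lit) is x3::unused. This means it only consumes input and does not occupy a slot in the synthesized attribute. In short, the ',' in ',' >> x3::float_ can be ignored, so that expression's synthesized attribute is simply a float. The synthesized attribute of the whole production is therefore a std::vector<float>, or a type compatible with std::vector<float> (through Fusion adaptation).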

auto r = phrase_parse(itr, end, x3::float_ % ',', x3::ascii::space, result);

x3::float_ >> *(',' >> x3::float_) can be simplified to x3::float_ % ','.

2. Parsing "1.2, Hello World" into a user-defined synthesized attribute

struct user_defined
{
    float value;
    std::string name;
};

BOOST_FUSION_ADAPT_STRUCT(user_defined, value, name)
// .....
std::string source = "1.2, Hello World";
auto itr = source.cbegin();
auto end = source.cend();
user_defined data;
auto r = phrase_parse(itr, end, x3::float_ >> ',' >> x3::lexeme[*x3::char_], x3::ascii::space, data);

With the help of Boost.Fusion, a struct can be adapted into a tuple. The BOOST_FUSION_ADAPT_STRUCT macro adapts struct user_defined so that it behaves like boost::fusion::vector<float, std::string>.

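x3::lexeme is a directive. A directive is itself a parser and has a synthesized attribute. The attribute of lexeme is that of its subject, here a string, but it changes how the input is scanned: no whitespace is skipped while its subject is matching. With the default skipping behavior, *x3::char_ would skip the space between the words and yield "HelloWorld", which is the wrong result, whereas x3::lexeme[*x3::char_] yields "Hello World".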

phrase_parse is defined in namespace boost::spirit::x3. Here it is called as an unqualified name, and argument-dependent lookup (ADL) finds the right function.

3. Parsing a C++ identifier

The first character of a C++ identifier must be a letter or an underscore; the characters that follow may be letters, digits, or underscores.

auto const identifier_def = x3::lexeme[x3::char_("_a-zA-Z") >> *x3::char_("_0-9a-zA-Z")];

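The first approach is the most direct. x3::char_ matches a single character, and its overloaded operator() lets you list all the characters it may match. Don't forget lexeme, so that whitespace is not skipped.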

auto const identifier_def = x3::lexeme[(x3::alpha | x3::char_('_')) >> *(x3::alnum | x3::char_('_'))];

The second approach uses X3's built-in character parsers: x3::alpha parses a letter, and x3::alnum parses a letter or a digit.

auto const identifier_def = x3::lexeme[(x3::alpha | '_') >> *(x3::alnum | '_')];

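This one looks more concise, but it is actually wrong. '_' is a character literal, and x3::lit has no synthesized attribute, so when this parser is used to parse an identifier, the underscores are missing from the result.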

auto const identifier_def = x3::raw[x3::lexeme[(x3::alpha | '_') >> *(x3::alnum | '_')]];

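This example gives a deeper understanding of the relationship between the matched input and the synthesized attribute. The synthesized attribute of the expression inside x3::raw's operator[] ignores the underscores, but the input it matched does not! x3::raw is a directive and a unary parser whose synthesized attribute is the matched range of input, which can be stored in a string. It discards the synthesized attribute of the parser inside its operator[] and replaces it with the matched input. For example, on "_foo_1", x3::lexeme[(x3::alpha | '_') >> *(x3::alnum | '_')] matches "_foo_1" while its synthesized attribute is "foo1"; identifier_def replaces "foo1" with the matched input "_foo_1".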

4. Parsing C++ comments

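C++ has two kinds of comments, "//" and "/* */". After "//", everything up to the end of the line is a comment; everything between "/*" and the next "*/" is a comment.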

auto const annotation_def = (x3::lit("//") > x3::seek[x3::eol | x3::eoi]) | (x3::lit("/*") > x3::seek[x3::lit("*/")]);

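operator> and operator>> both express sequencing, but the former is stricter. With operator>>, if a later parser does not match, the sequence simply fails and the enclosing parser can backtrack and try an alternative. operator> adds an expectation: once its left operand has matched, its right operand must match too, otherwise an x3::expectation_failure is thrown. x3::eol and x3::eoi match an end of line and the end of input respectively. We only care about the input a comment matches, which is skipped in real parsing; we do not care about the comment parser's synthesized attribute. x3::seek is another directive: it advances through the input until its subject parser matches.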

Using X3 with MSVC

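X3 uses C++14 features such as expression SFINAE (which is to blame for most of the trouble) and generic lambdas. The VS2015 compiler currently implements most of the C++14 features X3 uses, with the exception of expression SFINAE. I have only gone through the official X3 examples, and found that it is enough to rewrite the code that relies on expression SFINAE using traditional SFINAE. In addition, the msvc14.0 compiler has a bug when the Boost.Preprocessor library is used together with decltype. A side rant at Microsoft: MSVC has started implementing C++17 proposals while the C++11 standard is still not fully supported!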

1. Modify the code in <boost/spirit/home/x3/nonterminal/detail/rule.hpp>

//template <typename ID, typename Iterator, typename Context, typename Enable = void>
    //struct has_on_error : mpl::false_ {};
    //
    //template <typename ID, typename Iterator, typename Context>
    //struct has_on_error<ID, Iterator, Context,
    //    typename disable_if_substitution_failure<
    //        decltype(
    //            std::declval<ID>().on_error(
    //                std::declval<Iterator&>()
    //              , std::declval<Iterator>()
    //              , std::declval<expectation_failure<Iterator>>()
    //              , std::declval<Context>()
    //            )
    //        )>::type
    //    >
    //  : mpl::true_
    //{};

template <typename ID, typename Iterator, typename Context>
struct has_on_error_impl
{
    template <typename U, typename = decltype(
        std::declval<U>().on_error(
            std::declval<Iterator&>()
          , std::declval<Iterator>()
          , std::declval<expectation_failure<Iterator>>()
          , std::declval<Context>()
        ))>
    static mpl::true_ test(int);

    template <typename>
    static mpl::false_ test(...);

    using type = decltype(test<ID>(0));
};

template <typename ID, typename Iterator, typename Context>
using has_on_error = typename has_on_error_impl<ID, Iterator, Context>::type;

//template <typename ID, typename Iterator, typename Attribute, typename Context, typename Enable = void>
//struct has_on_success : mpl::false_ {};
//
//template <typename ID, typename Iterator, typename Attribute, typename Context>
//struct has_on_success<ID, Iterator, Context, Attribute,
//    typename disable_if_substitution_failure<
//        decltype(
//            std::declval<ID>().on_success(
//                std::declval<Iterator&>()
//              , std::declval<Iterator>()
//              , std::declval<Attribute&>()
//              , std::declval<Context>()
//            )
//        )>::type
//    >
//  : mpl::true_
//{};

template <typename ID, typename Iterator, typename Attribute, typename Context>
struct has_on_success_impl
{
    template <typename U, typename = decltype(
        std::declval<U>().on_success(
            std::declval<Iterator&>()
          , std::declval<Iterator>()
          , std::declval<Attribute&>()
          , std::declval<Context>()
        ))>
    static mpl::true_ test(int);

    template <typename>
    static mpl::false_ test(...);

    using type = decltype(test<ID>(0));
};

template <typename ID, typename Iterator, typename Attribute, typename Context>
using has_on_success = typename has_on_success_impl<ID, Iterator, Attribute, Context>::type;

2. Modify the code in <boost/spirit/home/x3/support/utility/is_callable.hpp>

    //template <typename Sig, typename Enable = void>
    //struct is_callable_impl : mpl::false_ {};
    
    //template <typename F, typename... A>
    //struct is_callable_impl<F(A...), typename disable_if_substitution_failure<
    //    decltype(std::declval<F>()(std::declval<A>()...))>::type>
    //  : mpl::true_
    //{};

    template <typename Sig>
    struct is_callable_impl : mpl::false_ {};

    template <typename F, typename... A>
    struct is_callable_impl<F(A...)>
    {
        // The expression depends on T so that a failure is a substitution failure.
        template <typename T, typename = decltype(std::declval<T>()(std::declval<A>()...))>
        static mpl::true_ test(int);

        template <typename T>
        static mpl::false_ test(...);

        using type = decltype(test<F>(0));
    };

3. Change BOOST_SPIRIT_DEFINE in <boost/spirit/home/x3/nonterminal/rule.hpp> to the following code

#define BOOST_SPIRIT_DEFINE_(r, data, rule_name)                                \
    using BOOST_PP_CAT(rule_name, _t) = decltype(rule_name);                    \
    template <typename Iterator, typename Context, typename Attribute>          \
    inline bool parse_rule(                                                     \
        BOOST_PP_CAT(rule_name, _t) rule_                                       \
      , Iterator& first, Iterator const& last                                   \
      , Context const& context, Attribute& attr)                                \
    {                                                                           \
        using boost::spirit::x3::unused;                                        \
        static auto const def_ = (rule_name = BOOST_PP_CAT(rule_name, _def));   \
        return def_.parse(first, last, context, unused, attr);                  \
    }                                                                           \
    /***/

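Changes 1 and 2 are both needed because expression SFINAE is not yet implemented in MSVC. Change 3 is needed because BOOST_SPIRIT_DEFINE seems to conflict with decltype: I wrote some test code and eventually narrowed the problem down to the use of decltype(rule_name) as a function parameter type. This compiles without problems on gcc, so MSVC's support for decltype is presumably still incomplete. BOOST_SPIRIT_DEFINE involves x3::rule, whose usage will be explained in detail in the next article.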
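The overload-based detection pattern behind changes 1 and 2 can be tried in isolation, without Boost. The sketch below uses a simplified, hypothetical on_error(int) member and std::true_type / std::false_type in place of the mpl types. I have not verified it on the VS2015 compiler, so take it as an illustration of the pattern only.

```cpp
#include <type_traits>
#include <utility>

// Hypothetical trait: detects a member function callable as on_error(int).
template <typename T>
struct has_on_error_impl
{
    // Preferred overload (int matches 0 exactly); it only takes part in
    // overload resolution when the call expression is well-formed.
    template <typename U,
              typename = decltype(std::declval<U&>().on_error(std::declval<int>()))>
    static std::true_type test(int);

    // Fallback overload, chosen when the one above is discarded.
    template <typename>
    static std::false_type test(...);

    using type = decltype(test<T>(0));
};

template <typename T>
using has_on_error = typename has_on_error_impl<T>::type;

// Two hypothetical handler types to exercise the trait.
struct with_handler { void on_error(int) {} };
struct without_handler {};

static_assert(has_on_error<with_handler>::value,
              "with_handler has on_error(int)");
static_assert(!has_on_error<without_handler>::value,
              "without_handler has no on_error");
```

The result is a type rather than a partial specialization, which is why the changes above expose has_on_error and has_on_success as alias templates.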

Ending

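At first glance Boost.Spirit twists C++ syntax beyond recognition, but overloading operators is in fact the most elegant way to work with expression templates. The same technique is used heavily in UE4's UI framework and in some expression-template-based math libraries. Recursive descent: to iterate is human, to recurse divine. Static polymorphism: loose in form, unified in spirit. Expression templates act as the skeleton that holds the two together. However, building particularly complex grammar productions with expression templates puts a heavy load on the compiler, slows compilation, and can even push type identifiers past 4K in length! These problems will be discussed in later articles together with Spirit's runtime efficiency. Overall, I still find Spirit elegant.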
