An introduction to the ATL CAtlRegExp, GRETA, and Boost::regex regular expression libraries

Date: 2019-11-12



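This article summarizes and translates material from several articles to give a brief introduction to regular expression libraries such as ATL CAtlRegExp, GRETA, and Boost::regex. These libraries make the considerable power of regular expressions easy to use and simplify everyday work.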

　　Regular expression syntax

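Metacharacter    Meaning
.            Matches any single character.
[ ]          Specifies a character class; matches any character inside the brackets. Example: [abc] matches "a", "b", or "c".
^            At the start of a character class, negates the class: the negated class matches any character except those inside the brackets. Example: [^abc] matches any character except "a", "b", and "c". At the start of the regular expression, matches the beginning of the input. Example: ^[abc] matches input that begins with "a", "b", or "c".
-            Inside a character class, specifies a range of characters. Example: [0-9] matches the digits "0" through "9".
?            Makes the preceding expression optional: it matches once or not at all. Example: [0-9][0-9]? matches "2" or "12".
+            Matches the preceding expression one or more times. Example: [0-9]+ matches "1", "13", "666", and so on.
*            Matches the preceding expression zero or more times.
??, +?, *?   Non-greedy versions of ?, +, and *, which match as few characters as possible; the plain ?, +, and * are greedy and match as many characters as possible. Example: given the input "<abc><def>", <.*?> matches "<abc>", while <.*> matches "<abc><def>".
( )          Grouping operator. Example: (\d+,)*\d+ matches a list of numbers separated by commas, such as "1" or "1,23,456".
\            Escape character: the character that follows is taken literally. Example: [0-9]+ matches one or more digits, while [0-9]\+ matches a digit followed by a plus sign. The backslash also introduces abbreviations; \a, for instance, stands for any alphanumeric character. If \ is followed by a number n, it matches the nth match group (counting from 0; in this syntax a match group is written with braces). Example: <{.*?}>.*?</\0> matches "<head>Contents</head>". Note that in C++ string literals each backslash must be doubled: "\\+", "\\a", "<{.*?}>.*?</\\0>".
$            At the end of the regular expression, matches the end of the input. Example: [0-9]$ matches a digit at the end of the input.
|            Alternation operator: separates two expressions, exactly one of which is matched. Example: T|the matches "The" or "the".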
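The difference between greedy and non-greedy quantifiers is easy to check in code. The sketch below is not ATL code: it assumes the standard `std::regex` library (ECMAScript grammar), where these particular quantifiers behave the same way, and the helper name `firstMatch` is made up for illustration.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: returns the first substring of `input` matched by
// `pattern`, or an empty string when nothing matches.
std::string firstMatch(const std::string& input, const std::string& pattern) {
    const std::regex re(pattern);  // std::regex uses the ECMAScript grammar by default
    std::smatch m;
    return std::regex_search(input, m, re) ? m.str(0) : std::string();
}
```

Called with the input "<abc><def>", the pattern `<.*?>` stops at the first closing bracket, while `<.*>` runs to the last one.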

　　Abbreviations

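Abbreviation    Matches
\a           Any alphanumeric character: ([a-zA-Z0-9])
\b           White space (blank): ([ \t])
\c           Any alphabetic character: ([a-zA-Z])
\d           Any decimal digit: ([0-9])
\h           Any hexadecimal digit: ([0-9a-fA-F])
\n           Newline: (\r|(\r?\n))
\q           A quoted string: (\"[^\"]*\")|(\'[^\']*\')
\w           A simple word: ([a-zA-Z]+)
\z           An integer: ([0-9]+)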
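Several of these abbreviations mean something else in other grammars: in the ECMAScript grammar used by `std::regex`, for example, `\w` is a single word character and `\b` is a word boundary. The following sketch shows one way the abbreviations could be expanded into the character classes listed above, so that a pattern written with them can be tried with `std::regex`. The function `expandAbbreviations` is hypothetical and is not part of ATL; it omits `\q`, does not translate ATL's `{ }` match groups, and does not handle abbreviations that appear inside a character class.

```cpp
#include <cstddef>
#include <map>
#include <regex>
#include <string>

// Hypothetical helper (not an ATL API): replaces the abbreviations from the
// table above with equivalent ECMAScript sub-expressions. Non-capturing
// groups (?:...) are used so that group numbering is not disturbed.
std::string expandAbbreviations(const std::string& pattern) {
    static const std::map<char, std::string> table = {
        {'a', "(?:[a-zA-Z0-9])"},
        {'b', "(?:[ \\t])"},
        {'c', "(?:[a-zA-Z])"},
        {'d', "(?:[0-9])"},
        {'h', "(?:[0-9a-fA-F])"},
        {'n', "(?:\\r|(?:\\r?\\n))"},
        {'w', "(?:[a-zA-Z]+)"},
        {'z', "(?:[0-9]+)"},
    };
    std::string out;
    for (std::size_t i = 0; i < pattern.size(); ++i) {
        if (pattern[i] == '\\' && i + 1 < pattern.size()) {
            const auto it = table.find(pattern[i + 1]);
            if (it != table.end()) {
                out += it->second;       // known abbreviation: expand it
            } else {
                out += pattern[i];       // any other escape: copy both characters
                out += pattern[i + 1];
            }
            ++i;                         // skip the character after the backslash
        } else {
            out += pattern[i];
        }
    }
    return out;
}
```

With this helper, the pattern `\z-\h` (an integer, a hyphen, one hexadecimal digit) expands to `(?:[0-9]+)-(?:[0-9a-fA-F])`.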

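　　ATL CAtlRegExp

　　ATL Server often needs to decode complex text fields such as addresses and commands. Regular expressions are a powerful text-parsing tool, so ATL provides a regular expression interpreter.

　　Example: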

#include "stdafx.h"
#include <stdio.h>
#include <atlrx.h>

int main(int argc, char* argv[])
{
    CAtlRegExp<> reUrl;
    // Five match groups: scheme, authority, path, query, fragment
    REParseError status = reUrl.Parse(
        "({[^:/?#]+}:)?(//{[^/?#]*})?{[^?#]*}(?{[^#]*})?(#{.*})?" );
    if (REPARSE_ERROR_OK != status)
    {
        // Unexpected error.
        return 0;
    }

    CAtlREMatchContext<> mcUrl;
    if (!reUrl.Match(
        "http://search.microsoft.com/us/Search.asp?qu=atl&boolean=ALL#results",
        &mcUrl))
    {
        // Unexpected error.
        return 0;
    }

    for (UINT nGroupIndex = 0; nGroupIndex < mcUrl.m_uNumGroups;
         ++nGroupIndex)
    {
        const CAtlREMatchContext<>::RECHAR* szStart = 0;
        const CAtlREMatchContext<>::RECHAR* szEnd = 0;
        // GetMatch returns pointers to the start and end of the group's text.
        mcUrl.GetMatch(nGroupIndex, &szStart, &szEnd);

        ptrdiff_t nLength = szEnd - szStart;
        // %.*s expects an int precision, so the pointer difference is cast.
        printf("%u: \"%.*s\"\n", nGroupIndex, static_cast<int>(nLength), szStart);
    }

    return 0;
}

Output:

0: "http"
1: "search.microsoft.com"
2: "/us/Search.asp"
3: "qu=atl&boolean=ALL"
4: "results"

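　　The result of Match is returned through the CAtlREMatchContext object that the second parameter, pContext, points to. The match result and its related information are stored in that object and can be read through its methods and members. CAtlREMatchContext exposes the result through the m_uNumGroups member and the GetMatch() method: m_uNumGroups is the number of match groups, and GetMatch() takes a group index and returns the pStart and pEnd pointers that delimit the matched text, from which the caller can easily extract the matched string.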
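For readers without ATL at hand, the same URL split can be sketched with the standard library. This is a stand-in and not the ATL API: it assumes `std::regex` and the URI pattern from RFC 3986, Appendix B, and the function name `splitUrl` is invented for this example. The idea carries over, though: where GetMatch() hands back a start and an end pointer, each `std::sub_match` holds a pair of iterators that delimit the group's text.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: splits a URL into scheme, authority, path, query and
// fragment. Returns an empty vector when the input does not match.
std::vector<std::string> splitUrl(const std::string& url) {
    // Pattern from RFC 3986, Appendix B. The five parts of interest are
    // capture groups 2, 4, 5, 7 and 9.
    static const std::regex re(
        R"(^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?)");
    std::vector<std::string> parts;
    std::smatch m;
    if (std::regex_match(url, m, re)) {
        for (int group : {2, 4, 5, 7, 9}) {
            // An unmatched optional group yields an empty string.
            parts.push_back(m[group].str());
        }
    }
    return parts;
}
```

Applied to the URL from the ATL example, the returned vector holds the same five strings as the output listed above.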

　　For more information, see: CAtlRegExp Class

　　GRETA

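　　GRETA is a regular expression template library from Microsoft Research. It contains C++ objects and functions that make string pattern matching and substitution easy. They are:

- rpattern: the pattern to search for

- match_results/subst_results: containers that hold the results of a match or a substitution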

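　　To perform a search or a substitution, first explicitly initialize an rpattern object with a string that describes the matching rule, then pass the string to be matched to one of rpattern's member functions, such as match() or substitute(), to obtain the result. If the match()/substitute() call fails, the function returns false; if it succeeds, the function returns true and the match_results object holds the match result. See the example code: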

#include <iostream>
#include <string>
#include "regexpr2.h"
using namespace std;
using namespace regex;

int main() {
    match_results results;
    string str( "The book cost $12.34" );
    rpattern pat( "\\$(\\d+)(\\.(\\d\\d))?" );
    // Match a dollar sign followed by one or more digits,
    // optionally followed by a period and two more digits.
    // The double-escapes are necessary to satisfy the compiler.
    match_results::backref_type br = pat.match( str, results );
    if( br.matched ) {
        cout << "match success!" << endl;
        cout << "price: " << br << endl;
    } else {
        cout << "match failed!" << endl;
    }
    return 0;
}

The program output will be:

match success!
price: $12.34

　　The GRETA documentation describes the rpattern object in detail and explains how to customize the search strategy for better efficiency.

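　　Note: all declarations in the header file regexpr2.h are in the namespace regex. To use its objects and functions, either add the prefix "regex::" or write "using namespace regex;" beforehand. For simplicity, the sample code in this article omits the "regex::" prefix. The author built greta.lib and regexpr2.h; these two files are all that is needed to parse regular expressions with GRETA.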
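The text above mentions substitute() but only shows match(). The sketch below illustrates the substitution idea with the standard library's `std::regex_replace` as a stand-in. It is not GRETA's substitute() interface, whose exact signature should be taken from the GRETA documentation. The function name `rewritePrice` and the "USD" replacement text are made up for this example.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: rewrites every price such as "$12.34" as "USD 12.34".
std::string rewritePrice(const std::string& text) {
    // Same pattern as the GRETA example: a dollar sign, one or more digits,
    // then optionally a period and two more digits.
    static const std::regex price(R"(\$(\d+)(\.(\d\d))?)");
    // In the replacement text, $1 is the integer part and $2 is the optional
    // ".dd" part (empty when that group did not take part in the match).
    return std::regex_replace(text, price, "USD $1$2");
}
```

For the string from the GRETA example, `rewritePrice("The book cost $12.34")` yields "The book cost USD 12.34"; text without a price is returned unchanged.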

　　A note on matching speed

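　　Different regular expression engines are good at different kinds of patterns. As one benchmark, when the pattern "^([0-9]+)(-| |$)(.*)$" is matched against the string "100- this is a line of ftp response which contains a message string", GRETA is about 7 times faster than the Boost (http://www.boost.org) regular expression library and more than 10 times faster than ATL7's CAtlRegExp. The Boost Regex documentation includes performance test results for many patterns. Comparing those results, I found that GRETA performs about the same as Boost Regex in most cases, and comes out slightly ahead when compiled with Visual Studio .NET 2003.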
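A comparison like this can be approached with a small timing harness. The sketch below is an assumption-laden illustration, not the benchmark behind the figures above: it times `std::regex` only, the function name `averageMatchMicros` is invented, and the numbers it produces depend on the compiler, the library, and the machine. The same loop could be wrapped around each library under test.

```cpp
#include <chrono>
#include <regex>
#include <string>

// Hypothetical helper: runs `iterations` searches of `pattern` in `input` and
// returns the average time per search in microseconds. Returns -1.0 when
// `iterations` is not positive or when the pattern does not match.
double averageMatchMicros(const std::string& pattern,
                          const std::string& input,
                          int iterations) {
    if (iterations <= 0) {
        return -1.0;
    }
    const std::regex re(pattern);  // compiled once, outside the timed loop
    std::smatch m;
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i) {
        if (!std::regex_search(input, m, re)) {
            return -1.0;
        }
    }
    const auto stop = std::chrono::steady_clock::now();
    const std::chrono::duration<double, std::micro> elapsed = stop - start;
    return elapsed.count() / iterations;
}
```

Compiling the pattern outside the loop keeps construction cost out of the measurement, so only matching is timed.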

　　Boost.Regex

　　Boost provides boost::basic_regex to support regular expressions. Its design closely resembles that of std::basic_string:

namespace boost{
    template <class charT,
              class traits = regex_traits<charT>,
              class Allocator = std::allocator<charT> > class basic_regex;
    typedef basic_regex<char> regex;
    typedef basic_regex<wchar_t> wregex;
}
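The practical consequence of this design is that code can be written once for any character type, just as with std::basic_string. The sketch below assumes the standard library's `std::basic_regex`, which was modelled on Boost.Regex and has matching `std::regex` and `std::wregex` typedefs; the function name `matchesWhole` is made up for illustration.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: returns true when the whole of `text` matches
// `pattern`. Works for any character type the regex traits support,
// for example char and wchar_t.
template <class CharT>
bool matchesWhole(const std::basic_string<CharT>& text,
                  const std::basic_string<CharT>& pattern) {
    const std::basic_regex<CharT> re(pattern);
    return std::regex_match(text, re);
}
```

The same template accepts both `std::string` and `std::wstring` arguments, for example `matchesWhole(std::wstring(L"12345"), std::wstring(L"[0-9]+"))`.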

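　　The documentation that ships with the Boost Regex library is extensive, and its examples are especially good. Two of the example programs, for instance, use very little code to apply syntax highlighting to a C++ file and generate the corresponding HTML (converts a C++ file to syntax highlighted HTML). The example below splits a string into a sequence of tokens (split a string into tokens).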

#include <iostream>
#include <iterator>
#include <list>
#include <string>
#include <boost/regex.hpp>

// Splits s on whitespace and appends each token to l; returns the token count.
// regex_split removes each token it finds from s.
unsigned tokenise(std::list<std::string>& l, std::string& s)
{
    return boost::regex_split(std::back_inserter(l), s);
}

using namespace std;

#if defined(BOOST_MSVC) || (defined(__BORLANDC__) && (__BORLANDC__ == 0x550))
// problem with std::getline under MSVC6sp3
istream& getline(istream& is, std::string& s)
{
    s.erase();
    char c = is.get();
    while(c != '\n')
    {
        s.append(1, c);
        c = is.get();
    }
    return is;
}
#endif

int main(int argc, char*[])
{
    string s;
    list<string> l;
    do{
        if(argc == 1)
        {
            cout << "Enter text to split (or \"quit\" to exit): ";
            getline(cin, s);
            if(s == "quit") break;
        }
        else
            s = "This is a string of tokens";
        unsigned result = tokenise(l, s);
        cout << result << " tokens found" << endl;
        cout << "The remaining text is: \"" << s << "\"" << endl;
        while(l.size())
        {
            s = *(l.begin());
            l.pop_front();
            cout << s << endl;
        }
    }while(argc == 1);
    return 0;
}
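The same kind of split can also be sketched without Boost. The version below assumes the standard library's `std::sregex_token_iterator` instead of boost::regex_split; unlike the Boost example it leaves the input string untouched and returns the tokens in a new list. It reuses the name `tokenise` purely for illustration.

```cpp
#include <list>
#include <regex>
#include <string>

// Splits `s` on runs of whitespace and returns the tokens in order.
std::list<std::string> tokenise(const std::string& s) {
    static const std::regex whitespace(R"(\s+)");
    std::list<std::string> tokens;
    // Submatch index -1 selects the text between matches, that is, the tokens
    // that the whitespace separates.
    std::sregex_token_iterator it(s.begin(), s.end(), whitespace, -1), end;
    for (; it != end; ++it) {
        // Leading whitespace produces an empty first piece; skip empty pieces.
        if (it->length() > 0) {
            tokens.push_back(it->str());
        }
    }
    return tokens;
}
```

For the string "This is a string of tokens" this returns six tokens, from "This" to "tokens".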