A roundup of C++ string splitting methods

Date: 2019-11-07

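The C++ standard library's std::string lacks quite a few things, including but not limited to split/join/slice, even though they are needed fairly often. There are countless ways to implement split with the standard library or third-party libraries; this post walks through the approaches from "How to split a string in C++", which already cover everyday use cases.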

solution 1.1: stream iterators

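Use case: only the standard library is available; splitting on whitespace only.
How it works: an input stream iterator, std::istream_iterator, breaks an input stream (a file, a string, etc.) into elements. Each increment of the iterator extracts the next element from the stream until the end is reached. The end iterator needs no stream: a default-constructed one is the end-of-stream marker. The default extraction for std::string skips leading whitespace and stops at the next whitespace character, so this behaviour can be used to split a string.
Code: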

    std::istringstream iss(text);
    std::istream_iterator<std::string> Itbegin = std::istream_iterator<std::string>(iss);
    std::istream_iterator<std::string> ItEnd   = std::istream_iterator<std::string>();
    std::vector<std::string> results1(Itbegin, ItEnd);

solution 1.2: overloading operator>> for the input stream

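Use case: only the standard library is available; delimiters other than whitespace.
How it works: incrementing an input stream iterator really calls the non-member operator>> for the element type, and for std::string that operator has the default rule (split on whitespace) built in. For other rules, derive a class from std::string and overload the non-member std::istream &operator>>(std::istream &is, WordDelimitedByCommas &output) for the derived type, implementing the splitting rule in the function body (or parameterising it with a class template).
Code: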

    class WordDelimitedByCommas : public std::string
    {
    };

    std::istream & operator>>(std::istream &is, WordDelimitedByCommas &output)
    {
        return std::getline(is, output, ',');
    }
    ...
    std::string text2 = "Let,me,split,this,into,words";
    std::istringstream iss2(text2);
    auto Itbegin2 = std::istream_iterator<WordDelimitedByCommas>(iss2);
    auto ItEnd2   = std::istream_iterator<WordDelimitedByCommas>();
    std::vector<std::string> results2(Itbegin2, ItEnd2);

solution 2: boost::split

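Use case: Boost is available; the caller specifies the delimiters.
How it works: the caller supplies a container such as std::vector<std::string>, and boost::split stores the pieces in it.
Code: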

    // requires #include <boost/algorithm/string.hpp>
    std::vector<std::string> results3;
    boost::split(results3, text, boost::is_any_of(","));

solution 3: using ranges

Use case: a third-party library can be used, here the header-only range-v3.
How it works: range-v3 provides a pipeline-style syntax.
Code:

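    std::string text = "Let me split this into words";
    // view::split lives in namespace ranges (spelled views::split in newer range-v3 releases);
    // the result is a lazy range of sub-ranges, not a std::vector<std::string>
    auto splitText = text | view::split(' ');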

solution 4: std::regex_token_iterator

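Use case: the compiler supports C++11 or later (<regex> was added in C++11); the delimiters follow a pattern that can be expressed as a regular expression.
How it works: the regex iterator std::regex_token_iterator splits the string according to the regular expression; passing -1 as the submatch index makes it return the text between matches rather than the matches themselves.
Code: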

    std::regex sep ("[ ,.]+");
    std::sregex_token_iterator tokens(text.cbegin(), text.cend(), sep, -1);
    std::sregex_token_iterator end;
    std::cout<<"[solution:regex]"<<std::endl;
    for(; tokens != end; ++tokens){
        std::cout << "token found: " << *tokens << "\n";
    }