Basic Usage of the C++ Parallel Computing Libraries TBB and PPL

Date: 2019-11-11


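Parallel libraries exploit multi-core processors to speed up programs by running computations in parallel. This article introduces two well-known C++ parallel libraries: TBB, developed by Intel, and PPL, developed by Microsoft. It covers only their most common basics: parallel algorithms and tasks.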

TBB (Intel® Threading Building Blocks)

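TBB is an open-source parallel computing library that Intel wrote in standard C++. Its goal is to make data-parallel computation easier, and the latest library and documentation can be downloaded from its official website. The main features of TBB are: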

1. Parallel algorithms
2. Task scheduling
3. Concurrent containers
4. Synchronization primitives
5. Memory allocators

TBB parallel algorithms

parallel_for: iterates over a range in parallel.

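// Index form: the lambda is called once per index in [1, 20000).
// Output from different threads may interleave on cout.
parallel_for(1, 20000, [](int i){ cout << i << endl; });
// Range form: TBB splits the range and passes each sub-range to the lambda.
parallel_for(blocked_range<size_t>(0, 20000), [](const blocked_range<size_t>& r)
{
    for (size_t i = r.begin(); i != r.end(); ++i)
        cout << i << endl;
});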
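
To make the range form more concrete, here is a minimal standard-library sketch of the idea behind it: cut an index range into contiguous sub-ranges and hand each one to a worker. It is not how TBB is implemented (TBB splits ranges recursively and balances them across a worker pool). The names chunked_for_sketch and squares_sketch, and the fixed chunk count, are hypothetical choices made for illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical helper (not part of TBB): call body(i) for every i in [first, last),
// splitting the index range into contiguous chunks, one thread per chunk.
template <typename Body>
void chunked_for_sketch(std::size_t first, std::size_t last,
                        std::size_t num_chunks, Body body) {
    if (first >= last) return;
    const std::size_t n = last - first;
    num_chunks = std::max<std::size_t>(1, std::min(num_chunks, n));
    const std::size_t chunk = (n + num_chunks - 1) / num_chunks;  // ceil(n / num_chunks)

    std::vector<std::thread> workers;
    for (std::size_t begin = first; begin < last; begin += chunk) {
        const std::size_t end = std::min(begin + chunk, last);
        // Each thread owns the half-open sub-range [begin, end).
        workers.emplace_back([begin, end, &body] {
            for (std::size_t i = begin; i != end; ++i) body(i);
        });
    }
    for (auto& w : workers) w.join();
}

// Fill out[i] = i * i. Every index is written by exactly one thread,
// so no lock is needed.
std::vector<std::size_t> squares_sketch(std::size_t n) {
    std::vector<std::size_t> out(n);
    chunked_for_sketch(0, n, 4, [&out](std::size_t i) { out[i] = i * i; });
    return out;
}
```

Calling squares_sketch(10) returns the squares of 0 through 9. Because each index is written by exactly one thread, the result does not depend on scheduling, unlike the cout examples, where lines from different threads can interleave.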

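parallel_do and parallel_for_each: apply a function to every element of a sequence. (parallel_do was removed in oneTBB, where parallel_for_each covers its use.)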

vector<size_t> v;
parallel_do(v.begin(), v.end(), [](size_t i){cout << i << endl; });
parallel_for_each(v.begin(), v.end(), [](size_t i){cout << i << endl; });

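parallel_reduce

Similar to map-reduce, but not the same. It automatically splits the range into sub-ranges, accumulates over each sub-range to get one partial result per sub-range, and finally combines (reduces) the partial results into one. The call is slightly more involved: parallel_reduce(range, identity, func, reduction). The first argument is the range, the second is the identity value that each accumulation starts from, the third is the function that accumulates over a sub-range, and the fourth is the function that combines two partial results.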

float ParallelSum(float array [], size_t n) {
    return parallel_reduce(
        blocked_range<float*>(array, array + n),
        0.f,  // identity value for addition
        // Accumulate one sub-range, starting from the value passed in.
        [](const blocked_range<float*>& r, float value)->float {
            return std::accumulate(r.begin(), r.end(), value);
        },
        // Combine two partial sums.
        std::plus<float>()
    );
}

This array-sum example splits the array automatically, accumulates the elements of each sub-range, and then adds the partial sums together.
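
The split, accumulate, and combine steps can be sketched with the standard library alone. The code below is an illustration under stated assumptions, not TBB's implementation: chunked_sum_sketch and its num_chunks parameter are hypothetical, and std::async stands in for TBB's scheduler.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <future>
#include <numeric>
#include <vector>

// Hypothetical helper (not TBB's implementation): split [first, first + n) into
// chunks, fold each chunk starting from the identity, then combine the partials.
float chunked_sum_sketch(const float* first, std::size_t n, std::size_t num_chunks) {
    const float identity = 0.f;  // identity value for addition
    if (n == 0) return identity;
    num_chunks = std::max<std::size_t>(1, std::min(num_chunks, n));
    const std::size_t chunk = (n + num_chunks - 1) / num_chunks;  // ceil(n / num_chunks)

    // Accumulate step: one asynchronous fold per chunk.
    std::vector<std::future<float>> partials;
    for (std::size_t begin = 0; begin < n; begin += chunk) {
        const std::size_t end = std::min(begin + chunk, n);
        partials.push_back(std::async(std::launch::async, [=] {
            return std::accumulate(first + begin, first + end, identity);
        }));
    }

    // Reduce step: combine the partial results with the reduction function.
    float total = identity;
    for (auto& p : partials) total = std::plus<float>()(total, p.get());
    return total;
}
```

The identity value is used once per chunk, which is why it has to be a true identity for the reduction (0 for addition) rather than an arbitrary starting value: a non-zero start would be added in once for every chunk.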

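parallel_pipeline: a parallel pipeline of filters

A data stream flows through a pipeline and is processed by a series of filters along the way. Some of those filters can process items in parallel, and that is where parallel_pipeline fits. For example, suppose we read a file, extract the numbers from it, transform each number, and write the transformed numbers to another file. Reading and writing the files cannot be parallelized, but the transformation step in the middle can. The general form of a parallel_pipeline call is: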

parallel_pipeline( max_number_of_live_tokens, 
                   make_filter<void,I1>(mode0,g0) &
                   make_filter<I1,I2>(mode1,g1) &
                   make_filter<I2,I3>(mode2,g2) &
                   ... 
                   make_filter<In,void>(moden,gn) );

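The first argument is the maximum number of items that may be in flight in the pipeline at once. Filters are joined with &, and they are applied in that order: the output of one filter is the input of the next.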

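float RootMeanSquare( float* first, float* last ) {
    // Count the elements up front, because the first filter advances `first`.
    const size_t n = last - first;
    if( n==0 ) return 0;
    float sum=0;
    parallel_pipeline( /*max_number_of_live_token=*/16,
        // Serial input filter: hands out one element pointer per call.
        make_filter<void,float*>(
            filter::serial,
            [&](flow_control& fc)-> float*{
                if( first<last ) {
                    return first++;
                } else {
                    fc.stop();
                    return NULL;
                }
            }
        ) &
        // Parallel filter: squares each element.
        make_filter<float*,float>(
            filter::parallel,
            [](float* p){return (*p)*(*p);}
        ) &
        // Serial output filter: accumulates the squares, one at a time.
        make_filter<float,void>(
            filter::serial,
            [&](float x) {sum+=x;}
        )
    );
    // Divide by n so the result is the root mean square, not the root of the sum.
    return sqrt(sum/n);
}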

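The first filter produces the data (for example, by reading it from a file), the second transforms each item, and the third accumulates the transformed items. The second filter can run in parallel, which is selected by passing filter::parallel as its mode.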
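
The same three-stage shape can be mimicked with standard-library futures to show how the token limit bounds the number of items in flight. This is only a sketch under assumptions: root_mean_square_sketch and max_live_tokens are hypothetical names, one std::async call per item is far more expensive than TBB's scheduling, and the function divides by the element count so that it returns a true root mean square.

```cpp
#include <cmath>
#include <cstddef>
#include <deque>
#include <future>

// Hypothetical stand-in for a serial -> parallel -> serial pipeline
// (not TBB's scheduler). At most max_live_tokens items are in flight.
float root_mean_square_sketch(const float* first, const float* last,
                              std::size_t max_live_tokens) {
    if (first >= last) return 0.f;
    if (max_live_tokens == 0) max_live_tokens = 1;
    const std::size_t n = static_cast<std::size_t>(last - first);

    std::deque<std::future<float>> in_flight;  // items currently inside the pipeline
    float sum = 0.f;

    // Serial output stage: take the oldest item and add it to the running sum.
    auto drain_one = [&] {
        sum += in_flight.front().get();
        in_flight.pop_front();
    };

    while (first < last) {
        // Respect the token limit before admitting a new item.
        if (in_flight.size() >= max_live_tokens) drain_one();
        const float* p = first++;  // serial input stage: one element per step
        // Parallel middle stage: square the element on another thread.
        in_flight.push_back(
            std::async(std::launch::async, [p] { return (*p) * (*p); }));
    }
    while (!in_flight.empty()) drain_one();

    return std::sqrt(sum / static_cast<float>(n));
}
```

Only the middle stage runs concurrently. The input and output stages execute on the calling thread, one item at a time, which mirrors the role of the serial filters.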

parallel_sort: parallel sorting

const int N = 1000000;
float a[N];
float b[N];
parallel_sort(a, a + N);
parallel_sort(b, b + N, std::greater<float>());

parallel_invoke: runs several functions in parallel

void f();
extern void bar(int);

void RunFunctionsInParallel() {
    tbb::parallel_invoke(f, []{bar(2);}, []{bar(3);} );
}

TBB tasks

A task_group is a collection of tasks that can be waited on or cancelled.

task_group g;
g.run([]{TestPrint(); });
g.run([]{TestPrint(); });
g.run([]{TestPrint(); });
g.wait();
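
The run/wait contract can be illustrated with a very small standard-library stand-in. TaskGroupSketch and run_three_tasks_sketch are hypothetical names. Unlike tbb::task_group, this sketch starts one thread per task instead of scheduling onto a worker pool, and it does not support cancellation.

```cpp
#include <atomic>
#include <functional>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical minimal task group (not tbb::task_group):
// run() starts a piece of work, wait() blocks until all started work is done.
class TaskGroupSketch {
public:
    void run(std::function<void()> fn) { workers_.emplace_back(std::move(fn)); }

    void wait() {
        for (auto& w : workers_) w.join();
        workers_.clear();
    }

    // Join on destruction so no joinable thread is ever destroyed.
    ~TaskGroupSketch() { wait(); }

private:
    std::vector<std::thread> workers_;
};

// Start three tasks that each bump a shared counter, then wait for all of them.
int run_three_tasks_sketch() {
    std::atomic<int> done{0};
    TaskGroupSketch g;
    for (int i = 0; i < 3; ++i) g.run([&done] { ++done; });
    g.wait();
    return done.load();
}
```

After wait() returns, all three tasks have finished, so run_three_tasks_sketch() returns 3.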

PPL(Parallel Patterns Library)

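PPL is a parallel computing library developed by Microsoft. Its features are much the same as TBB's, but PPL is available only on Windows. The parallel algorithms of the two libraries are used in nearly the same way, with a few differences: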

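1. The parallel_reduce prototype differs. PPL's parallel_reduce takes one more argument than TBB's: parallel_reduce(begin, end, identity, func, reduction). The meaning is the same, but TBB takes a range object while PPL takes a pair of iterators.
2. PPL has no parallel_pipeline interface.
3. TBB's tasks are less powerful than PPL's: PPL tasks can be chained with continuations and composed with one another, whereas TBB's cannot.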

Chaining PPL task continuations with then

int main()
{
    auto t = create_task([]() -> int
    { 
        return 0;
    });

    // Create a lambda that increments its input value.
    auto increment = [](int n) { return n + 1; };

    // Run a chain of continuations and print the result. 
    int result = t.then(increment).then(increment).then(increment).get();
    cout << result << endl;
}
/* Output:
    3
*/
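
For readers without access to PPL, the chaining idea can be approximated with std::future. The helper then_sketch below is hypothetical and is not the PPL API. It occupies a thread inside each continuation while waiting for the previous result, whereas PPL schedules a continuation when its antecedent task completes.

```cpp
#include <future>
#include <utility>

// Hypothetical helper (not the PPL API): run fn on the result of prev
// once that result is available, and return a future for fn's result.
template <typename T, typename F>
auto then_sketch(std::future<T> prev, F fn)
    -> std::future<decltype(fn(std::declval<T>()))> {
    return std::async(std::launch::async,
                      [](std::future<T> p, F f) { return f(p.get()); },
                      std::move(prev), std::move(fn));
}

// Start from a task that yields 0 and chain three increments onto it.
int chained_increment_sketch() {
    auto increment = [](int n) { return n + 1; };
    std::future<int> t = std::async(std::launch::async, [] { return 0; });
    auto r = then_sketch(
        then_sketch(then_sketch(std::move(t), increment), increment), increment);
    return r.get();
}
```

As in the PPL example, the chain starts from 0 and applies increment three times, so chained_increment_sketch() returns 3.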

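Composing PPL tasks

1. when_all runs a group of tasks and, once all of them have finished, returns their results in a single collection. All tasks in the group must have the same return type.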

array<task<int>, 3> tasks =
{
    create_task([]() -> int { return 88; }),
    create_task([]() -> int { return 42; }),
    create_task([]() -> int { return 99; })
};

auto joinTask = when_all(begin(tasks), end(tasks)).then([](vector<int> results)
{
    cout << "The sum is " 
          << accumulate(begin(results), end(results), 0)
          << '.' << endl;
});

// Print a message from the joining thread.
cout << "Hello from the joining thread." << endl;

// Wait for the tasks to finish.
joinTask.wait();
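
The gathering behaviour of when_all can be approximated with standard futures. The names when_all_sketch and sum_of_tasks_sketch are hypothetical and are not part of PPL. One difference is worth noting: this sketch blocks the calling thread until every result is ready, whereas PPL's when_all returns a task that a continuation can be attached to.

```cpp
#include <future>
#include <initializer_list>
#include <numeric>
#include <vector>

// Hypothetical helper (not the PPL API): wait for every future and
// gather the results in the same order as the input tasks.
template <typename T>
std::vector<T> when_all_sketch(std::vector<std::future<T>>& tasks) {
    std::vector<T> results;
    results.reserve(tasks.size());
    for (auto& t : tasks) results.push_back(t.get());
    return results;
}

// Launch three tasks with the same return type and sum their results.
int sum_of_tasks_sketch() {
    std::vector<std::future<int>> tasks;
    for (int value : {88, 42, 99}) {
        tasks.push_back(std::async(std::launch::async, [value] { return value; }));
    }
    std::vector<int> results = when_all_sketch(tasks);
    return std::accumulate(results.begin(), results.end(), 0);
}
```

With the same three values as the PPL example (88, 42, and 99), sum_of_tasks_sketch() returns 229.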

2. when_any completes as soon as any one task in the group finishes and returns a pair holding that task's result and its index.

array<task<int>, 3> tasks =
{
    create_task([]() -> int { return 88; }),
    create_task([]() -> int { return 42; }),
    create_task([]() -> int { return 99; })
};

// Select the first to finish.
when_any(begin(tasks), end(tasks)).then([](pair<int, size_t> result)
{
    cout << "First task to finish returns "
          << result.first
          << " and has index "
          << result.second << endl;
}).wait();
// Sample output (which task finishes first may vary):
// First task to finish returns 42 and has index 1