Use the compiler directive:
```cpp
#pragma omp parallel for
for (...) { /* Content */ }
```
```cpp
#include <omp.h>
#include <cstdio>
#include <cstdlib> // for rand()
#include <ctime>   // for clock()

const int maxn = 1e8;

int main() {
    int begintime, endtime;

    printf("It is use parallel compute:\n");
    begintime = clock();  // start timing
    #pragma omp parallel for
    for (int i = 0; i < maxn; ++i);
    endtime = clock();    // stop timing
    printf("\n\nRunning Time:%dms\n", endtime - begintime);

    printf("\n\n\nIt is not use parallel compute:\n");
    begintime = clock();  // start timing
    for (int i = 0; i < maxn; ++i);
    endtime = clock();    // stop timing
    printf("\n\nRunning Time:%dms\n", endtime - begintime);

    return 0;
}
```
The performance gap looks like this:
```cpp
#include <omp.h>
#include <cstdio>
#include <cstdlib> // for rand()
#include <ctime>   // for clock()

const int maxn = 1e8;

int main() {
    int begintime, endtime;
    int nthreads;

    printf("It is use parallel compute:\n\n\n");

    #pragma omp parallel
    nthreads = omp_get_num_threads();
    printf("Now, it is %d threads\n", nthreads);

    begintime = clock();  // start timing
    #pragma omp parallel for
    for (int i = 0; i < maxn; ++i);
    endtime = clock();    // stop timing
    printf("\nRunning Time:%dms\n", endtime - begintime);

    // omp_set_num_threads() must be called in the serial part,
    // outside any parallel region.
    nthreads = 12;
    omp_set_num_threads(nthreads);
    printf("\nNow, it is %d threads\n", nthreads);

    begintime = clock();  // start timing
    #pragma omp parallel for
    for (int i = 0; i < maxn; ++i);
    endtime = clock();    // stop timing
    printf("\nRunning Time:%dms\n", endtime - begintime);

    printf("\nIt is not use parallel compute:\n");
    begintime = clock();  // start timing
    for (int i = 0; i < maxn; ++i);
    endtime = clock();    // stop timing
    printf("\nRunning Time:%dms\n", endtime - begintime);

    return 0;
}
```
Modifying the earlier code: first, use the function
```cpp
int nthreads = omp_get_num_threads();
```
to find out how many threads are running in parallel; on my machine the default is 8. Note that OpenMP's runtime functions must be called inside the parallel region, that is, under the pragma; outside it, omp_get_num_threads() simply returns 1!
If I set the number of parallel threads to just 1, how will the efficiency compare with serial computation?
Let's try it:
```cpp
    // Set the thread count to 1 in the serial part, outside any
    // parallel region.
    nthreads = 1;
    omp_set_num_threads(nthreads);
    printf("\nNow, it is %d threads\n", nthreads);

    begintime = clock();  // start timing
    #pragma omp parallel for
    for (int i = 0; i < maxn; ++i);
    endtime = clock();    // stop timing
    printf("\nRunning Time:%dms\n", endtime - begintime);

    printf("\nIt is not use parallel compute:\n");
    begintime = clock();  // start timing
    for (int i = 0; i < maxn; ++i);
    endtime = clock();    // stop timing
    printf("\nRunning Time:%dms\n", endtime - begintime);
```
Look at the result:
As you can see, single-threaded "parallel" computation is nearly twice as slow as the serial version! Why?
This is not hard to understand. With a single thread there is no work to distribute (a lone commander with no troops to command), yet the pragma still goes through the whole parallel setup. That useless overhead of entering and leaving the parallel region, with no inter-thread speedup to offset it, drags out the runtime and produces the result above.
OpenMP differs from the MPI we studied earlier in one major respect: the computation model.
MPI: the core idea is that memory is not shared. Parallel computation relies on message passing; only by exchanging messages can processes share data, and only by sharing data can they compute in parallel. In MPI's model, therefore, every process runs the same complete program, and the senders and receivers of messages are identified by process rank.
OpenMP: the core idea is inserting parallel blocks into a serial program, with all threads sharing memory. The private and shared clauses determine whether the data each thread operates on in the parallel region is shared. Once every statement in the parallel region has finished, execution always returns to the serial program, and this continues until the program ends.