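(Matlab) An introduction to GPU computing and a comparison with CPU computing performance

1. Structural comparison of GPU and CPU

2. Can the GPU accelerate my application?

3. Comparing the computational efficiency of GPU and CPU

4. The general workflow for GPU computing in Matlab

5. Hardware and software configuration for GPU computing

5.1 Hardware and drivers

5.2 Software

6. Example Matlab code: GPU versus CPU computational efficiency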

1. Structural comparison of GPU and CPU

Original text:

Multicore machines and hyper-threading technology have enabled scientists, engineers, and financial analysts to speed up computationally intensive applications in a variety of disciplines. Today, another type of hardware promises even higher computational performance: the graphics processing unit (GPU).

Originally used to accelerate graphics rendering, GPUs are increasingly applied to scientific calculations. Unlike a traditional CPU, which includes no more than a handful of cores, a GPU has a massively parallel array of integer and floating-point processors, as well as dedicated, high-speed memory. A typical GPU comprises hundreds of these smaller processors (Figure 1).

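My note 1: The figure above shows that a CPU has far fewer cores than a GPU. Although each CPU core is very powerful, a typical GPU contains hundreds of smaller processors that work in parallel, so it can be highly efficient when processing large amounts of data. As the saying goes, many hands make light work.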

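2. Can the GPU accelerate my application?

Original text:

My note 2: Of these two conditions, I think the second is the main prerequisite for GPU computing: only when a task can be broken into many small pieces can the GPU's physical structure be fully exploited to improve computational efficiency. The first condition reflects the fact that GPU computing requires transferring the data to GPU memory, and this step takes time. If data transfer accounts for a large share of the total time, GPU computing is not recommended. Below is a screenshot I took that shows the concrete workflow of a GPU computation.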

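To find out the order of magnitude of the data transfer time, you can test it by passing a matrix B through the gpuArray and gather functions, with no other operations between the two transfers. Repeat this a few thousand times and take the average elapsed time. Then change the size of B, repeat the steps above, and plot the average transfer time (vertical axis) against the size of B (horizontal axis).

I no longer use GPU computing much and do not have the necessary software installed, so I cannot conveniently provide results here. Interested readers can try it themselves.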
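The measurement described above could be scripted roughly as follows. This is only a minimal sketch, assuming the Parallel Computing Toolbox and a supported GPU are available. The matrix sizes, the repetition count, and the variable names (sizes, nReps, tAvg) are illustrative choices, not values from the original post.

```matlab
% Sketch: average host <-> GPU round-trip time for square matrices of several sizes.
% Assumptions: Parallel Computing Toolbox installed, supported GPU present.
sizes = [100 200 500 1000 2000];   % side lengths of B (illustrative)
nReps = 1000;                      % repetitions per size; reduce if this is too slow
tAvg  = zeros(size(sizes));

gather(gpuArray(1));               % warm-up so one-time GPU initialization is not timed

for k = 1:numel(sizes)
    B = rand(sizes(k));
    tic;
    for r = 1:nReps
        Bg = gpuArray(B);          % host -> GPU memory
        Bh = gather(Bg);           % GPU -> host memory (returns once the data is back)
    end
    tAvg(k) = toc / nReps;         % average time of one round trip
end

plot(sizes.^2, tAvg, 'o-');
xlabel('Number of elements in B');
ylabel('Average round-trip time (s)');
```

Because gather returns only after the data is back in host memory, tic and toc around the pair should capture the full round trip. To separate the two directions, each call would have to be timed on its own.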

3. Comparing the computational efficiency of GPU and CPU

Original text:

To evaluate the benefits of using the GPU to solve second-order wave equations, we ran a benchmark study in which we measured the amount of time the algorithm took to execute 50 time steps for grid sizes of 64, 128, 512, 1024, and 2048 on an Intel® Xeon® Processor X5650 and then using an NVIDIA® Tesla™ C2050 GPU.

For a grid size of 2048, the algorithm shows a 7.5x decrease in compute time from more than a minute on the CPU to less than 10 seconds on the GPU (Figure 4). The log scale plot shows that the CPU is actually faster for small grid sizes. As the technology evolves and matures, however, GPU solutions are increasingly able to handle smaller problems, a trend that we expect to continue.

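My note 3: This figure is very intuitive. It was what first showed me how powerful GPU computing is! It shows that the larger the data set, the greater the GPU's advantage over the CPU, while for small data sets the CPU is much faster than the GPU. I think there are two likely reasons (for reference only). First, a small amount of data may not occupy all of the GPU's many cores, and each individual core is relatively slow, so the computation is slower. Second, as discussed above, the data transfer is relatively time-consuming.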

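4. The general workflow for GPU computing in Matlab

My note 4: The marked part of the figure above illustrates the data transfer discussed in Section 2.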
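As a rough illustration of this workflow, the three stages (send the data to the GPU, compute there, bring the result back) might look like the sketch below. It assumes the Parallel Computing Toolbox and a supported GPU. The variable names and the particular arithmetic are chosen for illustration and are not taken from the figure.

```matlab
% Sketch of the usual gpuArray workflow (illustrative names and operations).
A  = rand(1000);          % data created in host (CPU) memory
B  = rand(1000);

Ag = gpuArray(A);         % 1) transfer the inputs to GPU memory
Bg = gpuArray(B);

Dg = Ag + Bg;             % 2) operations on gpuArray inputs run on the GPU
Fg = Bg ./ (Ag .* Dg + eps);

F  = gather(Fg);          % 3) copy the result back to host memory

disp(class(Fg));          % class of the GPU-resident result
disp(class(F));           % class of the result after gather
```

Only steps 1 and 3 move data between host and GPU memory, so keeping intermediate results such as Dg on the GPU avoids paying the transfer cost more than once.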

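5. Hardware and software configuration for GPU computing

5.1 Hardware and drivers

Computer: Lenovo Yangtian M4400

System: Windows 7 x64

Hardware: NVIDIA GeForce GT 740M, discrete graphics card with 2 GB of memory

Driver:

5.2 Software

Matlab 2015a, with the Parallel Computing Toolbox installed (required)

VS 2013, with only the C++ foundation classes installed

CUDA 7.5.18, with only the Toolkit installed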
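One way to check that Matlab can see the GPU after such a setup is to query it from the command window. The following is a small sketch under the assumption that the Parallel Computing Toolbox is installed. The printed format is my own choice.

```matlab
% Sketch: check whether Matlab detects a supported GPU and print a few properties.
nGpu = gpuDeviceCount;                 % number of GPUs visible to Matlab
if nGpu > 0
    g = gpuDevice;                     % select (or return) the current GPU device
    fprintf('GPU: %s\n', g.Name);
    fprintf('Compute capability: %s\n', g.ComputeCapability);
    fprintf('Total memory: %.1f GB\n', g.TotalMemory / 2^30);
else
    disp('No supported GPU was detected by Matlab.');
end
```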

6. Example Matlab code: GPU versus CPU computational efficiency

%% First, compare element-wise arithmetic on 200-by-200 matrices
% CPU version
t = zeros(1,100);
A = rand(200,200); B = rand(200,200); C = rand(200,200);
for i = 1:100
    tic;
    D = A+B; E = A.*D; F = B./(E+eps);
    t(i) = toc;
end
mean(t)
% ans = 2.4812e-04

% GPU version: the same operations on gpuArray inputs
t1 = gpuArray(zeros(1,100));
A1 = gpuArray(rand(200,200));
B1 = gpuArray(rand(200,200));
C1 = gpuArray(rand(200,200));
for i = 1:100
    tic;
    D1 = A1+B1; E1 = A1.*D1; F1 = B1./(E1+eps);
    t1(i) = toc;
end
mean(t1)
% ans = 1.2260e-04
% About twice as fast!

%% Next, increase the matrix size to 2000-by-2000
% CPU version
t = zeros(1,100);
A = rand(2000,2000); B = rand(2000,2000); C = rand(2000,2000);
for i = 1:100
    tic;
    D = A+B; E = A.*D; F = B./(E+eps);
    t(i) = toc;
end
mean(t)
% ans = 0.0337

% GPU version: the same operations on gpuArray inputs
t1 = gpuArray(zeros(1,100));
A1 = gpuArray(rand(2000,2000));
B1 = gpuArray(rand(2000,2000));
C1 = gpuArray(rand(2000,2000));
for i = 1:100
    tic;
    D1 = A1+B1; E1 = A1.*D1; F1 = B1./(E1+eps);
    t1(i) = toc;
end
mean(t1)
% ans = 1.1730e-04
% mean(t)/mean(t1) = 287.1832, about 287 times faster!
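One caveat about the timings above: Matlab can run GPU operations asynchronously, so toc may return before the GPU has finished its work. The GPU averages are almost the same for the 200-by-200 and 2000-by-2000 cases, which suggests that the loops may mostly be measuring launch overhead, and the real speedup could be well below 287x. The sketch below shows one way to time the same arithmetic with synchronization, using timeit, gputimeit, and wait(gpuDevice). The helper f and the variable names are my own, and no results are claimed here.

```matlab
% Sketch: time the same arithmetic with explicit GPU synchronization.
% Assumptions: Parallel Computing Toolbox installed, supported GPU present.
n  = 2000;
A  = rand(n);        B  = rand(n);
A1 = gpuArray(A);    B1 = gpuArray(B);

% Same arithmetic as D, E, F above, written as one function: F = B./(A.*(A+B)+eps)
f = @(X, Y) Y ./ (X .* (X + Y) + eps);

tCPU = timeit(@() f(A, B));          % robust CPU timing
tGPU = gputimeit(@() f(A1, B1));     % waits for the GPU work to complete
fprintf('CPU %.3g s, GPU %.3g s, ratio %.1f\n', tCPU, tGPU, tCPU / tGPU);

% Equivalent manual pattern if tic/toc is preferred:
dev = gpuDevice;
tic;
F1 = f(A1, B1);
wait(dev);                           % block until all queued GPU work has finished
tManual = toc;
```

Neither measurement includes the cost of moving A and B to the GPU or of gathering F1 back, so a comparison of complete workflows should time those steps as well.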

Reference: https://ww2.mathworks.cn/company/newsletters/articles/gpu-programming-in-matlab.html
