A First Look at General-Purpose GPU Computing in C#

Date: 2021-08-13


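GPUs offer more parallel computing power than CPUs, so quite a few GPU-based projects have come into view recently. On InfoQ I came across an article introducing Accelerator V2, a Microsoft Research project that requires registration before you can download it. It looked like a good first step into general-purpose GPU computing for me, so I downloaded it.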

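The installation package includes several sample programs, such as the well-known Game of Life. For someone who has only just started with GPU computing, though, Life was still a bit too complex, so I simplified things and ran only some basic calculations. I found that DX9Target.ToArray throws an "unsupported operation" exception when its output parameter is an int array. On reflection that makes sense: graphics cards are built for floating-point work.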
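One possible way to live with the float-only restriction is to convert integer data before handing it to the GPU and round it back afterwards. The sketch below is plain C# and does not touch Accelerator at all; GridConversion, ToFloatGrid and ToIntGrid are hypothetical helper names introduced here for illustration.

```csharp
using System;

public static class GridConversion
{
    // Copies an int grid into a float grid of the same shape.
    // Note: float represents integers exactly only up to 2^24 (16,777,216).
    public static float[,] ToFloatGrid(int[,] source)
    {
        int rows = source.GetLength(0);
        int cols = source.GetLength(1);
        var result = new float[rows, cols];
        for (int r = 0; r < rows; r++)
        {
            for (int c = 0; c < cols; c++)
            {
                result[r, c] = source[r, c];
            }
        }
        return result;
    }

    // Rounds a float grid back to the nearest ints after the float-based step.
    public static int[,] ToIntGrid(float[,] source)
    {
        int rows = source.GetLength(0);
        int cols = source.GetLength(1);
        var result = new int[rows, cols];
        for (int r = 0; r < rows; r++)
        {
            for (int c = 0; c < cols; c++)
            {
                result[r, c] = (int)Math.Round(source[r, c]);
            }
        }
        return result;
    }
}

public static class Program
{
    public static void Main()
    {
        var ints = new int[,] { { 1, 2 }, { 3, 4 } };
        float[,] floats = GridConversion.ToFloatGrid(ints);
        int[,] back = GridConversion.ToIntGrid(floats);
        Console.WriteLine(back[1, 0]);
    }
}
```

The round trip is only lossless while every value stays within the range that float can represent exactly, so this suits small counters or cell states rather than large integers.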

I had assumed that GPU computing was a feature available only in DirectX 11, but Accelerator targets DirectX 9. Presumably DirectX 11 offers more computing capability and a simpler way to use it.

For a rough comparison of CPU and GPU speed, I also wrote a parallel program using .NET 4. Because DX9Target does not support int, the array here uses float as well:

Code

using System;
using System.Diagnostics;
using System.Drawing;
using System.Linq;
using System.Windows.Forms;

public partial class Form1 : Form
{
    private const int GridSize = 1024;
    private float[] _map;

    public Form1()
    {
        InitializeComponent();
        // Flat GridSize x GridSize grid; each cell starts as x * y.
        _map = new float[GridSize * GridSize];
        for (int y = 0; y < GridSize; y++)
        {
            for (int x = 0; x < GridSize; x++)
            {
                _map[x * GridSize + y] = x * y;
            }
        }
        Render();
    }

    private void Start_Click(object sender, EventArgs e)
    {
        var stopwatch = new Stopwatch();
        stopwatch.Start();
        // AsOrdered keeps each result at its source position; PLINQ does not
        // guarantee output order without it.
        _map = _map.AsParallel().AsOrdered().Select(p => p * p * p / 4 + 194).ToArray();
        var time = stopwatch.ElapsedMilliseconds;
        this.Text = time.ToString(); // elapsed milliseconds shown in the title bar
        Render();
    }

    private void Render()
    {
        var workingBitmap = new Bitmap(pictureBox1.Width, pictureBox1.Height);
        for (int y = 0; y < pictureBox1.Height; y++)
        {
            for (int x = 0; x < pictureBox1.Width; x++)
            {
                // Sample every second cell; -0x1000000 sets the alpha byte to 0xFF.
                workingBitmap.SetPixel(x, y, Color.FromArgb(-0x1000000 | (int)_map[x * 2 * GridSize + y * 2]));
            }
        }
        pictureBox1.Image = workingBitmap;
    }
}
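Because the grid is rendered by index, the parallel query has to return each result at the position of its input, and PLINQ only promises that when the query is marked as ordered. The following console sketch compares an ordered PLINQ query against a plain loop applying the same formula; CubeMap and its method names are hypothetical and exist only for this illustration.

```csharp
using System;
using System.Linq;

public static class CubeMap
{
    // The per-element update used in the listing above.
    public static float Step(float p)
    {
        return p * p * p / 4 + 194;
    }

    // Reference result computed with an ordinary loop.
    public static float[] Sequential(float[] source)
    {
        var result = new float[source.Length];
        for (int i = 0; i < source.Length; i++)
        {
            result[i] = Step(source[i]);
        }
        return result;
    }

    // Parallel result; AsOrdered asks PLINQ to keep the source order.
    public static float[] ParallelOrdered(float[] source)
    {
        return source.AsParallel().AsOrdered().Select(p => Step(p)).ToArray();
    }
}

public static class Program
{
    public static void Main()
    {
        float[] input = Enumerable.Range(0, 1 << 16).Select(i => (float)i).ToArray();
        bool same = CubeMap.Sequential(input).SequenceEqual(CubeMap.ParallelOrdered(input));
        Console.WriteLine(same);
    }
}
```

Ordering can cost some throughput, so a benchmark should use the same ordering setting in every run being compared.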

The version that uses Accelerator looks like this:

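Code

using System;
using System.Diagnostics;
using System.Drawing;
using System.Windows.Forms;
using Microsoft.ParallelArrays; // Accelerator v2 types (namespace assumed from the Accelerator samples)

public partial class Form1 : Form
{
    private const int GridSize = 1024;
    private readonly DX9Target _target;
    private float[,] _map;

    public Form1()
    {
        InitializeComponent();
        _target = new DX9Target();
        // 2D GridSize x GridSize grid; each cell starts as x * y.
        _map = new float[GridSize, GridSize];
        for (int y = 0; y < GridSize; y++)
        {
            for (int x = 0; x < GridSize; x++)
            {
                _map[x, y] = x * y;
            }
        }
        Render();
    }

    private void Start_Click(object sender, EventArgs e)
    {
        var stopwatch = new Stopwatch();
        stopwatch.Start();
        var p = new FloatParallelArray(_map);
        p = p * p * p / 4 + 194;      // describes the computation on the whole array
        _target.ToArray(p, out _map); // evaluates it on the GPU and copies the result back
        var time = stopwatch.ElapsedMilliseconds;
        this.Text = time.ToString(); // elapsed milliseconds shown in the title bar
        Render();
    }

    private void Render()
    {
        var workingBitmap = new Bitmap(pictureBox1.Width, pictureBox1.Height);
        for (int y = 0; y < pictureBox1.Height; y++)
        {
            for (int x = 0; x < pictureBox1.Width; x++)
            {
                // Sample every second cell; -0x1000000 sets the alpha byte to 0xFF.
                workingBitmap.SetPixel(x, y, Color.FromArgb(-0x1000000 | (int)_map[x * 2, y * 2]));
            }
        }
        pictureBox1.Image = workingBitmap;
    }
}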
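Both listings time a single Start click with a Stopwatch, so each reading may include one-time costs and whatever else the machine happened to be doing. A slightly more robust approach, sketched below under my own assumptions, is to run a few untimed warm-up iterations and report the median of several timed ones. Bench, Median and MedianMilliseconds are hypothetical names, and the workload in Main is a stand-in rather than the article's code.

```csharp
using System;
using System.Diagnostics;

public static class Bench
{
    // Median of a non-empty sample set; the input array is left unmodified.
    public static double Median(double[] samples)
    {
        if (samples == null || samples.Length == 0)
        {
            throw new ArgumentException("At least one sample is required.");
        }
        var sorted = (double[])samples.Clone();
        Array.Sort(sorted);
        int mid = sorted.Length / 2;
        return sorted.Length % 2 == 1
            ? sorted[mid]
            : (sorted[mid - 1] + sorted[mid]) / 2.0;
    }

    // Runs the work untimed warmupRuns times, then returns the median
    // wall-clock time in milliseconds over timedRuns timed iterations.
    public static double MedianMilliseconds(Action work, int warmupRuns, int timedRuns)
    {
        for (int i = 0; i < warmupRuns; i++)
        {
            work();
        }
        var samples = new double[timedRuns];
        for (int i = 0; i < timedRuns; i++)
        {
            var stopwatch = Stopwatch.StartNew();
            work();
            stopwatch.Stop();
            samples[i] = stopwatch.Elapsed.TotalMilliseconds;
        }
        return Median(samples);
    }
}

public static class Program
{
    public static void Main()
    {
        var data = new float[1024 * 1024];
        // Stand-in workload: one pass of the article's formula over a fresh buffer.
        double ms = Bench.MedianMilliseconds(() =>
        {
            for (int i = 0; i < data.Length; i++)
            {
                float p = i % 1024;
                data[i] = p * p * p / 4 + 194;
            }
        }, 2, 5);
        Console.WriteLine(ms);
    }
}
```

Each timed iteration here starts from the same inputs. Feeding the previous output back in, as the Start button does, changes the data from run to run and makes the timings harder to compare.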

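I tested both programs on my laptop (Core i5 430 CPU, ATI 5650 graphics card), clicking the Start button several times in each. After about 3 runs the picture box turns completely black. From that point the ordinary parallel program slows down, while the GPU program shows no obvious change in speed. Four runs of the ordinary parallel program took 96, 89, 277 and 291 ms, and four runs of the GPU program took 71, 40, 35 and 50 ms. Judging by this test alone, on my machine the GPU program is roughly twice as fast as the ordinary parallel program. The test itself is not necessarily fair, so treat the results as a rough reference only.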
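A plausible reason for the black picture, which the test above did not verify, is float overflow: every click feeds the previous output back into p * p * p / 4 + 194, so the values grow cubically and soon exceed what a float can hold. The sketch below counts how many updates it takes for the smallest and largest starting cells (0 and 1023 × 1023) to reach infinity; OverflowCheck and RunsUntilInfinity are hypothetical names used only here.

```csharp
using System;

public static class OverflowCheck
{
    // The per-click update from both listings, narrowed to float explicitly.
    public static float Step(float p)
    {
        return (float)(p * p * p / 4 + 194);
    }

    // Returns the number of updates needed for the value to become infinite,
    // or -1 if it is still finite after maxRuns updates.
    public static int RunsUntilInfinity(float start, int maxRuns)
    {
        float value = start;
        for (int run = 1; run <= maxRuns; run++)
        {
            value = Step(value);
            if (float.IsInfinity(value))
            {
                return run;
            }
        }
        return -1;
    }
}

public static class Program
{
    public static void Main()
    {
        Console.WriteLine(OverflowCheck.RunsUntilInfinity(0f, 10));            // smallest starting cell
        Console.WriteLine(OverflowCheck.RunsUntilInfinity(1023f * 1023f, 10)); // largest starting cell
    }
}
```

In this sketch the largest cell overflows on the second update and the smallest on the fourth, which is consistent with the whole picture changing after about 3 clicks. What the (int) cast in Render returns for such values is not specified by C# in an unchecked context, so the exact color is platform dependent; black is what one would see if the cast yields a value whose low 24 bits are zero.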

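That said, parallel programming in Accelerator feels heavily constrained. Code that is normally easy to write takes a lot of effort to convert to this parallel model, and some logic cannot be expressed at all. Compared with Accelerator, Brahma code is much easier to write and to read, and its Game of Life sample is simple and clear. Unfortunately, I compiled Brahma v0.1 and v0.4, and on my machine the DirectX samples had no effect, while the OpenGL samples threw a "The generated GLSL was invalid" exception. It looks like Brahma needs to mature further before it can be used.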