.NET 並行編程——數據並行

時間 2019-12-01

標籤並行編程數據简体版

原文原文鏈接

本文內容

並行編程
數據並行
- 環境
- 計算 PI
- 矩陣相乘
- 把目錄中的所有圖片複製到另外一個目錄
- 列出指定目錄中的全部文件，包括其子目錄

最近，對多線程編程，並行編程，異步編程，這三個概念有點暈了，以前我研究了異步編程《VS 2013 C# 異步編程 async await》，如今猛然發覺，本身怎麼有點不明白這三者之間有什麼聯繫和區別了呢？有點說不清、道不明的感受~html

所以，回顧了一下我的經歷，屢屢思路~我剛接觸計算機時，仍是學校的 DOS 和 win 3.x，以後，學校換了 Windows 95，再以後，我有本身的臺式機……但不管如何，那時電腦的 CPU 都是單核的，即使採用多線程，程序不管看上多麼像「同時」執行，其本質上仍是順序的，由於代碼段是獨佔 CPU 的；以後，我賣了臺式機，買了筆記本電腦，CPU 是雙核的，若是用多線程，那狀況就不一樣了，能達到正真的「同時」執行，也就是並行。程序員

「並行」是目的，爲了實現這個目的，咱們採用「多線程編程」這個手段，而咱們知道，多線程編程涉及的問題不少，申請超額、競爭條件、死鎖、活鎖、二步舞、優先級翻轉等，爲了簡化多線程編程，加之多核 CPU 愈來愈廣泛，因而不少編程框架自己就提供了對多線程的封裝，好比一些類和方法，這些就是並行編程。所以，多線程編程變成了較底層東西，而並行編程則是較高層次，較高抽象，至少能將一段很簡單的代碼從順序的直接編程並行的；而異步編程呢，異步方法旨在成爲非阻止操做，異步並不會建立其餘線程。異步方法不會在其自身線程上運行，而是在 CLR 提供的線程上，所以它不須要多線程。express

總之，在多核和衆核（manycore）時代，想一想一下，在將來，具備一百萬個核的 CPU 不是不可能的事。人類中樞神經系統中約含1000億個神經元，僅大腦皮層中就約有140億。若是再讓程序員本身用多線程編程，顯然過低效了，低效也就算了，還容易犯錯，因此才須要並行編程。編程

2009年Google推出了它的第二個開源語言 Go。對 Go 的評價褒貶不一，中國比國外的熱情高中國比國外的熱情高。Go 天生就是爲併發和網絡而生的，除了這點外，在靜態編譯、GC、跨平臺、易學、豐富的標準庫等，其實並不如 C/C++、Java、C#、Python。由此可想而知，爲何會出現 Go？以及爲何 Go 存在如此多的問題和爭論？——也許Go 更像是一個「天才的自閉症患者」，若是看清了這點，對 Go 的褒貶也就能泰然啦~網絡

使用 TPL，除了線程方面的知識，你最好對委託、匿名方法或 Lambda 表達式有所瞭解。多線程

下載 MyDemo

下載 Samples for Parallel Programming with .net framework 完整示例

下載 Professional Parallel Programming with C#: Master Parallel Extensions with .NET 4 完整示例（該書的例子，深刻淺出，按部就班，對理解並行編程幫助很大，針對本文的數據並行，你能夠參考 CH2，看做者如何對 ASE 和 MD5 的計算進行改進的，評價的標準是 Amdahl 定律）

並行編程

多核 CPU 已經至關廣泛，使得多個線程可以同時執行。將代碼並行化，工做也就分攤到多個 CPU 上。併發

過去，並行化須要線程和鎖的低級操做。而 Visual Studio 2010 和 .NET Framework 4 開始提供了新的運行時、新的類庫類型以及新的診斷工具，從而加強了對並行編程的支持。這些功能簡化了並行開發，經過固有方法編寫高效、細化且可伸縮的並行代碼，而沒必要直接處理線程或線程池。app

下圖從較高層面上概述了 .NET Framework 4 中的並行編程體系結構。框架

任務並行庫（The Task Parallel Library，TPL）是 System.Threading 和 System.Threading.Tasks 空間中的一組公共類型和 API。TPL 的目的是經過簡化將並行和併發添加到應用程序的過程來提升開發人員的工做效率。TPL 能動態地最有效地使用全部可用的處理器。此外，TPL 還處理工做分區、ThreadPool 上的線程調度、取消支持、狀態管理以及其餘低級別的細節操做。經過使用 TPL，你能夠將精力集中於程序要完成的工做，同時最大程度地提升代碼的性能。 dom

從 .NET Framework 4 開始，TPL 是編寫多線程代碼和並行代碼的首選方法。但並非全部代碼都適合並行化，例如，若是某個循環在每次迭代時只執行少許工做，或它在不少次迭代時都不運行，那麼並行化的開銷可能致使代碼運行更慢。此外，像任何多線程代碼同樣，並行化會增長程序執行的複雜性。儘管 TPL 簡化了多線程方案，但建議對線程處理概念（例如，鎖、死鎖和爭用條件）進行基本瞭解，以便可以有效地使用 TPL。

數據並行

咱們能夠對數據進行並行，簡單地說，對集合中的每一個數據同時執行相同的操做，固然也能夠對任務和數據流進行並行。本文主要描述數據並行。

TPL 經過 System.Threading.Tasks.Parallel 類實現數據並行，此類提供了 for 和 foreach 基於並行的實現。爲 Parallel.For 或 Parallel.ForEach 編寫循環邏輯與編寫順序循環很是相似。你沒必要建立線程或隊列工做項。基本循環中沒必要採用鎖。TPL 將處理全部低級別工做。

System.Threading.Tasks.Parallel類有三個方法：For、ForEach、Invoke，它們有不少重載，不必說明這些方法自己，所以，下面用實例說明如何用這些方法進行並行編程，並對比與順序執行的性能。

環境

Windows 7 旗艦版 SP1
Microsoft Visual Studio Ultimate 2013 Update 4

計算 PI

對比順序計算 PI、並行計算 PI 和並行分區計算 PI 的性能。

using System;

using System.Collections.Concurrent;

using System.Diagnostics;

using System.Threading.Tasks;

namespace ComputePi

    class Program

        const int num_steps = 100000000;

        static void Main(string[] args)

            Time(() => SerialPi());

            Time(() => ParallelPi());

            Time(() => ParallelPartitionerPi());

            Console.WriteLine("Press any keys to Exit.");

            Console.ReadLine();

        /// <summary>

        /// Times the execution of a function and outputs both the elapsed time and the function's result.

        /// </summary>

        static void Time<T>(Func<T> work)

            var sw = Stopwatch.StartNew();

            var result = work();

            Console.WriteLine(sw.Elapsed + ": " + result);

        /// <summary>

        /// Estimates the value of PI using a for loop.

        /// </summary>

        static double SerialPi()

            double sum = 0.0;

            double step = 1.0 / (double)num_steps;

            for (int i = 0; i < num_steps; i++)

                double x = (i + 0.5) * step;

                sum = sum + 4.0 / (1.0 + x * x);

            return step * sum;

        /// <summary>

        /// Estimates the value of PI using a Parallel.For.

        /// </summary>

        static double ParallelPi()

            double sum = 0.0;

            double step = 1.0 / (double)num_steps;

            object monitor = new object();

            Parallel.For(0, num_steps, () => 0.0, (i, state, local) =>

                double x = (i + 0.5) * step;

                return local + 4.0 / (1.0 + x * x);

            }, local => { lock (monitor) sum += local; });

            return step * sum;

        /// <summary>

        /// Estimates the value of PI using a Parallel.ForEach and a range partitioner.

        /// </summary>

        static double ParallelPartitionerPi()

            double sum = 0.0;

            double step = 1.0 / (double)num_steps;

            object monitor = new object();

            Parallel.ForEach(Partitioner.Create(0, num_steps), () => 0.0, (range, state, local) =>

                for (int i = range.Item1; i < range.Item2; i++)

                    double x = (i + 0.5) * step;

                    local += 4.0 / (1.0 + x * x);

                return local;

            }, local => { lock (monitor) sum += local; });

            return step * sum;

//RESULT:

//00:00:00.4358850: 3.14159265359043

//00:00:00.4523856: 3.14159265358987

//00:00:00.1435475: 3.14159265358979

//Press any keys to Exit.

當 For 循環的循環體很小時，它的執行速度可能比等效的順序循環更慢。這也就是爲何順序計算 PI 與並行計算 PI 的時間差很少，由於對數據進行分區所涉及的開銷以及調用每一個循環迭代上的委託的開銷致使了性能下降。爲了解決相似狀況，Partitioner 類提供 Partitioner.Create 方法，該方法使您能夠爲委託體提供順序循環，以便每一個分區只調用一次委託，而不是每一個迭代調用一次委託。所以，並行分區計算 PI 時，性能有大幅度提高。

矩陣相乘

對比順序與並行計算矩陣乘法的性能。

using System;

using System.Diagnostics;

using System.Threading.Tasks;

namespace DataParallelismDemo

    class Program

        /// <summary>

        /// Sequential_Loop

        /// </summary>

        /// <param name="matA"></param>

        /// <param name="matB"></param>

        /// <param name="result"></param>

        static void MultiplyMatricesSequential(double[,] matA, double[,] matB, double[,] result)

            int matACols = matA.GetLength(1);

            int matBCols = matB.GetLength(1);

            int matARows = matA.GetLength(0);

            for (int i = 0; i < matARows; i++)

                for (int j = 0; j < matBCols; j++)

                    for (int k = 0; k < matACols; k++)

                        result[i, j] += matA[i, k] * matB[k, j];

        /// <summary>

        /// Parallel_Loop

        /// </summary>

        /// <param name="matA"></param>

        /// <param name="matB"></param>

        /// <param name="result"></param>

        static void MultiplyMatricesParallel(double[,] matA, double[,] matB, double[,] result)

            int matACols = matA.GetLength(1);

            int matBCols = matB.GetLength(1);

            int matARows = matA.GetLength(0);

            // A basic matrix multiplication.

            // Parallelize the outer loop to partition the source array by rows.

            Parallel.For(0, matARows, i =>

                for (int j = 0; j < matBCols; j++)

                    // Use a temporary to improve parallel performance.

                    double temp = 0;

                    for (int k = 0; k < matACols; k++)

                        temp += matA[i, k] * matB[k, j];

                    result[i, j] = temp;

            }); // Parallel.For

        static void Main(string[] args)

            // Set up matrices. Use small values to better view

            // result matrix. Increase the counts to see greater

            // speedup in the parallel loop vs. the sequential loop.

            int colCount = 180;

            int rowCount = 2000;

            int colCount2 = 270;

            double[,] m1 = InitializeMatrix(rowCount, colCount);

            double[,] m2 = InitializeMatrix(colCount, colCount2);

            double[,] result = new double[rowCount, colCount2];

            // First do the sequential version.

            Console.WriteLine("Executing sequential loop...");

            Stopwatch stopwatch = new Stopwatch();

            stopwatch.Start();

            MultiplyMatricesSequential(m1, m2, result);

            stopwatch.Stop();

            Console.WriteLine("Sequential loop time in milliseconds: {0}", stopwatch.ElapsedMilliseconds);

            // For the skeptics.

            OfferToPrint(rowCount, colCount2, result);

            // Reset timer and results matrix.

            stopwatch.Reset();

            result = new double[rowCount, colCount2];

            // Do the parallel loop.

            Console.WriteLine("Executing parallel loop...");

            stopwatch.Start();

            MultiplyMatricesParallel(m1, m2, result);

            stopwatch.Stop();

            Console.WriteLine("Parallel loop time in milliseconds: {0}", stopwatch.ElapsedMilliseconds);

            OfferToPrint(rowCount, colCount2, result);

            // Keep the console window open in debug mode.

            Console.WriteLine("Press any key to exit.");

            Console.ReadKey();

        /// <summary>

        /// 生成矩陣

        /// </summary>

        /// <param name="rows"></param>

        /// <param name="cols"></param>

        /// <returns></returns>

        static double[,] InitializeMatrix(int rows, int cols)

            double[,] matrix = new double[rows, cols];

            Random r = new Random();

            for (int i = 0; i < rows; i++)

                for (int j = 0; j < cols; j++)

                    matrix[i, j] = r.Next(100);

            return matrix;

        private static void OfferToPrint(int rowCount, int colCount, double[,] matrix)

            Console.WriteLine("Computation complete. Print results? y/n");

            char c = Console.ReadKey().KeyChar;

            if (c == 'y' || c == 'Y')

                Console.WindowWidth = 180;

                Console.WriteLine();

                for (int x = 0; x < rowCount; x++)

                    Console.WriteLine("ROW {0}: ", x);

                    for (int y = 0; y < colCount; y++)

                        Console.Write("{0:#.##} ", matrix[x, y]);

                    Console.WriteLine();

//RESULST:

//Executing sequential loop...

//Sequential loop time in milliseconds: 1168

//Computation complete. Print results? y/n

//nExecuting parallel loop...

//Parallel loop time in milliseconds: 360

//Computation complete. Print results? y/n

//nPress any key to exit.

using System;

//using System.Collections.Generic;

//using System.Linq;

//using System.Text;

using System.Threading;

using System.Threading.Tasks;

using System.Configuration;

namespace MovePics

    class Program

        protected static string PIC_PATH = ConfigurationManager.AppSettings["PicPath"].ToString();

        protected static string NEW_PIC_PATH = ConfigurationManager.AppSettings["NewPicPath"].ToString();

        static void Main(string[] args)

            // A simple source for demonstration purposes. Modify this path as necessary.

            string[] files = System.IO.Directory.GetFiles(PIC_PATH, "*.png");

            System.IO.Directory.CreateDirectory(NEW_PIC_PATH);

            //  Method signature: Parallel.ForEach(IEnumerable<TSource> source, Action<TSource> body)

            Parallel.ForEach(files, currentFile =>

                // The more computational work you do here, the greater

                // the speedup compared to a sequential foreach loop.

                string filename = System.IO.Path.GetFileName(currentFile);

                System.Drawing.Bitmap bitmap = new System.Drawing.Bitmap(currentFile);

                bitmap.RotateFlip(System.Drawing.RotateFlipType.Rotate180FlipNone);

                bitmap.Save(System.IO.Path.Combine(NEW_PIC_PATH, filename));

                // Peek behind the scenes to see how work is parallelized.

                // But be aware: Thread contention for the Console slows down parallel loops!!!

                Console.WriteLine("Processing {0} on thread {1}", filename,

                                    Thread.CurrentThread.ManagedThreadId);

            } //close lambda expression

                 ); //close method invocation

            // Keep the console window open in debug mode.

            Console.WriteLine("Processing complete. Press any key to exit.");

            Console.ReadKey();

using System;

using System.Collections.Generic;

using System.Diagnostics;

using System.IO;

using System.Linq;

using System.Security;

using System.Text;

using System.Threading;

using System.Threading.Tasks;

namespace TraverseTreeParallelForEach

    class Program

        static void Main(string[] args)

try

                TraverseTreeParallelForEach(@"C:\Program Files", (f) =>

                    // Exceptions are no-ops.

try

                        // Do nothing with the data except read it.

                        byte[] data = File.ReadAllBytes(f);

                    catch (FileNotFoundException) { }

                    catch (IOException) { }

                    catch (UnauthorizedAccessException) { }

                    catch (SecurityException) { }

                    // Display the filename.

                    Console.WriteLine(f);

});

            catch (ArgumentException)

                Console.WriteLine(@"The directory 'C:\Program Files' does not exist.");

            // Keep the console window open.

            Console.WriteLine("Press any key to exit.");

            Console.ReadKey();

        public static void TraverseTreeParallelForEach(string root, Action<string> action)

            //Count of files traversed and timer for diagnostic output

            int fileCount = 0;

            var sw = Stopwatch.StartNew();

            // Determine whether to parallelize file processing on each folder based on processor count.

            int procCount = System.Environment.ProcessorCount;

            // Data structure to hold names of subfolders to be examined for files.

            Stack<string> dirs = new Stack<string>();

            if (!Directory.Exists(root))

                throw new ArgumentException();

            dirs.Push(root);

            while (dirs.Count > 0)

                string currentDir = dirs.Pop();

                string[] subDirs = { };

                string[] files = { };

try

                    subDirs = Directory.GetDirectories(currentDir);

                // Thrown if we do not have discovery permission on the directory.

                catch (UnauthorizedAccessException e)

                    Console.WriteLine(e.Message);

                    continue;

                // Thrown if another process has deleted the directory after we retrieved its name.

                catch (DirectoryNotFoundException e)

                    Console.WriteLine(e.Message);

                    continue;

try

                    files = Directory.GetFiles(currentDir);

                catch (UnauthorizedAccessException e)

                    Console.WriteLine(e.Message);

                    continue;

                catch (DirectoryNotFoundException e)

                    Console.WriteLine(e.Message);

                    continue;

                catch (IOException e)

                    Console.WriteLine(e.Message);

                    continue;

                // Execute in parallel if there are enough files in the directory.

                // Otherwise, execute sequentially.Files are opened and processed

                // synchronously but this could be modified to perform async I/O.

try

                    if (files.Length < procCount)

                        foreach (var file in files)

                            action(file);

                            fileCount++;

                    else

                        Parallel.ForEach(files, () => 0, (file, loopState, localCount) =>

                            action(file);

                            return (int)++localCount;

},

                                         (c) =>

                                             Interlocked.Add(ref fileCount, c);

});

                catch (AggregateException ae)

                    ae.Handle((ex) =>

                        if (ex is UnauthorizedAccessException)

                            // Here we just output a message and go on.

                            Console.WriteLine(ex.Message);

                            return true;

                        // Handle other exceptions here if necessary...

                        return false;

});

                // Push the subdirectories onto the stack for traversal.

                // This could also be done before handing the files.

                foreach (string str in subDirs)

                    dirs.Push(str);

            // For diagnostic purposes.

            Console.WriteLine("Processed {0} files in {1} milleseconds", fileCount, sw.ElapsedMilliseconds);

另外，Parallel.For 和 Parallel.ForEach 方法都有若干重載，利用這些重載能夠中止或中斷循環執行、監視其餘線程上循環的狀態、維護線程本地狀態、完成線程本地對象、控制併發程度，等等。啓用此功能的幫助器類型包括 ParallelLoopState、ParallelOptions、ParallelLoopResult、 CancellationToken 和 CancellationTokenSource。

參考資料

下載 MyDemo

下載 Samples for Parallel Programming with .net framework 完整示例

下載 Professional Parallel Programming with C#: Master Parallel Extensions with .NET 4 完整示例（該書的例子，深刻淺出，按部就班，對理解並行編程幫助很大，針對本文的數據並行，你能夠參考 CH2，看做者如何對 ASE 和 MD5 的計算進行改進的，評價的標準是 Amdahl 定律）