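A while ago I saw someone in our blogging community (the "garden") using Parallel to bulk-insert rows into a database. Quite a few people then posted their own views on it. I won't comment on whether the content was good or bad; at the very least, writing up and sharing your own work takes courage and perseverance.
Enough chit-chat. While browsing Cui Pengfei's (崔鵬飛) GitHub recently, I found that he had also written a summary of this topic, so here I walk through and present his material. What is the scenario? Say I own a company and need to compute its total income. I have also acquired another company, so its income is counted together with mine. One day I have acquired N companies, and we want to compute the total income in parallel.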
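
    // Namespaces used by the snippets in this post
    using System;
    using System.Collections.Generic;
    using System.Diagnostics;
    using System.Linq;
    using System.Threading.Tasks;

    internal class Company
    {
        public decimal TotalIncome;

        public Company Merge(Company that)
        {
            Calc();
            TotalIncome += that.TotalIncome;
            return this;
        }

        /// <summary>
        /// Complex computation
        /// </summary>
        private void Calc()
        {
            // TODO: details omitted
        }
    }
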
The first idea that comes to mind is simply to add the incomes up one by one. This is the so-called linear computation.

    /// <summary>
    /// Linear (sequential) merge
    /// </summary>
    /// <param name="bigCompany"></param>
    /// <param name="smallCompanies"></param>
    /// <returns></returns>
    private static Company LinearMerge(Company bigCompany, IEnumerable<Company> smallCompanies)
    {
        foreach (Company smallCompany in smallCompanies)
        {
            bigCompany.Merge(smallCompany);
        }
        return bigCompany;
    }

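With the linear approach the result is undoubtedly correct. But if N is large, say 30000000, it may take a while.
So can we process the companies in parallel instead? OK, straight to the code.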

    /// <summary>
    /// Parallel merge (no synchronization)
    /// </summary>
    /// <param name="bigCompany"></param>
    /// <param name="smallCompanies"></param>
    /// <returns></returns>
    private static Company ParallelMerge(Company bigCompany, IEnumerable<Company> smallCompanies)
    {
        Parallel.ForEach(smallCompanies, smallCompany => bigCompany.Merge(smallCompany));
        return bigCompany;
    }

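It runs quickly, but what about the result? Does it match the linear one? It does not have to: `TotalIncome += that.TotalIncome` is a non-atomic read-modify-write on the shared `bigCompany`, so updates made concurrently by different threads can be lost.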
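To make the lost-update problem concrete, here is a minimal sketch that is not part of the original post. It swaps the `Company` class for a plain `long` counter, and the class and method names (`LostUpdateDemo`, `UnsafeCount`, `SafeCount`) are hypothetical. The unsynchronized version may return less than the number of iterations; the `Interlocked` version should always return the exact count.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

internal static class LostUpdateDemo
{
    // Unsynchronized: "total += 1" is a read, an add and a write, so two threads
    // can read the same old value and one of the two increments is lost.
    public static long UnsafeCount(int iterations)
    {
        long total = 0;
        Parallel.For(0, iterations, _ => { total += 1; });
        return total;
    }

    // Synchronized: Interlocked.Increment performs the whole update atomically.
    public static long SafeCount(int iterations)
    {
        long total = 0;
        Parallel.For(0, iterations, _ => { Interlocked.Increment(ref total); });
        return total;
    }

    private static void Main()
    {
        const int iterations = 1000000;
        Console.WriteLine("Unsafe: {0}", UnsafeCount(iterations)); // may be less than 1000000
        Console.WriteLine("Safe:   {0}", SafeCount(iterations));
    }
}
```

`Interlocked` has no overloads for `decimal`, so this trick does not carry over directly to `Company.TotalIncome`; that is one reason to reach for a lock in the `Company` case.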
So what if I add a lock on top of the parallel version, so that only one thread at a time touches the shared resource?

    /// <summary>
    /// Parallel merge with a lock
    /// </summary>
    /// <param name="bigCompany"></param>
    /// <param name="smallCompanies"></param>
    /// <returns></returns>
    private static Company ParallelMergeLock(Company bigCompany, IEnumerable<Company> smallCompanies)
    {
        var obj = new object();
        Parallel.ForEach(smallCompanies, smallCompany =>
        {
            lock (obj)
            {
                bigCompany.Merge(smallCompany);
            }
        });
        return bigCompany;
    }

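Without a doubt this result is correct as well, but now the running time is what we have to worry about: every `Merge` call, including `Calc`, runs inside the lock, so the work is effectively serialized and pays for the locking on top. How long does it actually take?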
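One common way to reduce that contention, which the original post does not use, is the `Parallel.ForEach` overload with thread-local state: each worker accumulates a private subtotal and takes the lock only once, when it finishes. The sketch below works on plain `decimal` values rather than `Company` objects, and the names `LocalSubtotalDemo`, `ParallelSum` and `gate` are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

internal static class LocalSubtotalDemo
{
    // Hypothetical helper: sums the incomes in parallel on top of an initial value.
    public static decimal ParallelSum(IEnumerable<decimal> incomes, decimal initial)
    {
        var gate = new object();
        decimal total = initial;

        Parallel.ForEach(
            incomes,
            () => 0m,                                            // localInit: a fresh subtotal per worker task
            (income, loopState, subtotal) => subtotal + income,  // body: touches no shared state
            subtotal =>                                          // localFinally: runs once per worker task
            {
                lock (gate)
                {
                    total += subtotal;
                }
            });

        return total;
    }

    private static void Main()
    {
        decimal[] incomes = Enumerable.Range(0, 1000).Select(n => (decimal)n).ToArray();
        Console.WriteLine(ParallelSum(incomes, 1000000m)); // 1000000 + 499500 = 1499500
    }
}
```

The lock is still there, but it is taken once per worker instead of once per element, so the per-element work can actually run concurrently.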
We can also take a functional approach.

    /// <summary>
    /// Functional merge
    /// </summary>
    /// <param name="bigCompany"></param>
    /// <param name="smallCompanies"></param>
    /// <returns></returns>
    private static Company FunctionalMerge(Company bigCompany, IEnumerable<Company> smallCompanies)
    {
        return smallCompanies.Aggregate(bigCompany, (buyer, seller) => buyer.Merge(seller));
    }

And what if we parallelize the functional version?

    /// <summary>
    /// Parallelized functional merge
    /// </summary>
    /// <param name="bigCompany"></param>
    /// <param name="smallCompanies"></param>
    /// <returns></returns>
    private static Company FunctionParallelMerge(Company bigCompany, IEnumerable<Company> smallCompanies)
    {
        return smallCompanies.AsParallel().Aggregate(
            () => new Company(),
            (shell, smallCompany) => shell.Merge(smallCompany),
            (shell1, shell2) => shell1.Merge(shell2),
            bigCompany.Merge);
    }

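The four arguments of this PLINQ `Aggregate` overload are easier to see on plain numbers. The sketch below is an illustration only, with hypothetical names (`PlinqAggregateDemo`, `ParallelTotal`): the first delegate creates a fresh accumulator for each partition, the second folds one element into a partition's accumulator, the third combines two partition accumulators, and the fourth turns the combined accumulator into the final result.

```csharp
using System;
using System.Linq;

internal static class PlinqAggregateDemo
{
    // Hypothetical helper: initial value plus the parallel sum of the incomes.
    public static decimal ParallelTotal(decimal initial, decimal[] incomes)
    {
        return incomes.AsParallel().Aggregate(
            () => 0m,                                 // seed factory: one fresh accumulator per partition
            (subtotal, income) => subtotal + income,  // fold an element into a partition's accumulator
            (left, right) => left + right,            // combine the accumulators of two partitions
            sum => initial + sum);                    // final projection, applied once
    }

    private static void Main()
    {
        decimal[] incomes = Enumerable.Range(0, 1000).Select(n => (decimal)n).ToArray();
        Console.WriteLine(ParallelTotal(1000000m, incomes)); // 1000000 + 499500 = 1499500
    }
}
```

This also suggests why the original code seeds with `() => new Company()` instead of `bigCompany`: the seed factory runs once per partition, so seeding with the shared `bigCompany` would bring back the shared mutable state, and its income would be counted once per partition. Merging `bigCompany` in the final projection counts it exactly once.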
The sections above raised several questions; let's answer them with actual test data.
Test code

    private static IEnumerable<Company> GenerateSmallCompanies()
    {
        return Enumerable.Range(0, 30000000).Select(number => new Company { TotalIncome = number }).ToArray();
    }

    private static void PrintMergeResult(Func<Company, IEnumerable<Company>, Company> mergeMethod, string funcApproach)
    {
        var stopWatch = new Stopwatch();
        stopWatch.Start();
        var mergeResult = mergeMethod(new Company { TotalIncome = 1000000 }, m_SmallCompanies);
        stopWatch.Stop();
        Console.WriteLine("{0}:{1} Time:{2}", funcApproach, mergeResult.TotalIncome, stopWatch.ElapsedMilliseconds);
    }

    private static void TryAll()
    {
        Console.WriteLine("============================");
        PrintMergeResult(LinearMerge, "Plain linear             ");
        PrintMergeResult(ParallelMerge, "Incorrect parallel       ");
        PrintMergeResult(ParallelMergeLock, "Locked parallel          ");
        Console.WriteLine("***********");
        PrintMergeResult(FunctionalMerge, "Functional merge         ");
        PrintMergeResult(FunctionParallelMerge, "Functional parallel merge");
    }

    private static readonly IEnumerable<Company> m_SmallCompanies = GenerateSmallCompanies();

    static void Main()
    {
        Console.WriteLine("Test data: 30000000 items");
        for (int i = 0; i < 5; i++)
        {
            TryAll();
        }
        Console.ReadKey();
    }

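When reading the output it helps to know what a correct run must print. With the test data above (incomes 0 through 29999999 plus an initial 1000000), the total follows from the closed form for an arithmetic series, n(n - 1)/2. The sketch below is an addition to the post, and the names `ExpectedTotalDemo` and `ExpectedTotal` are hypothetical.

```csharp
using System;
using System.Linq;

internal static class ExpectedTotalDemo
{
    // Closed form for initial + (0 + 1 + ... + (count - 1)).
    public static decimal ExpectedTotal(decimal initial, long count)
    {
        return initial + count * (count - 1) / 2;
    }

    private static void Main()
    {
        // 30000000 * 29999999 / 2 = 449999985000000, plus the initial 1000000.
        Console.WriteLine(ExpectedTotal(1000000m, 30000000L)); // 449999986000000

        // Cross-check the formula against a brute-force sum on a small input.
        decimal brute = 1000000m + Enumerable.Range(0, 1000).Sum(n => (decimal)n);
        Console.WriteLine(brute == ExpectedTotal(1000000m, 1000L)); // True
    }
}
```

Any approach whose printed total differs from 449999986000000 has lost or duplicated updates.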
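The test results are as follows:
In theory, the incorrect parallel version should be faster than the plain linear one, but for some reason that is not what happened on my machine (AMD CPU); the other results look reasonable. Running the same test on another computer (Intel CPU) gave the following data: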