Concrete Parallel Collection Classes in Scala [Translation]

Original article

Contents

  • Parallel Array
  • Parallel Vector
  • Parallel Range
  • Parallel Hash Tables
  • Parallel Hash Tries
  • Parallel Concurrent Tries
  • Performance characteristics
  • References

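Parallel Array


A ParArray sequence holds a linear, contiguous array of elements. This means that elements can be accessed and updated efficiently by modifying the underlying array. For the same reason, traversing the elements is also very efficient. Like ordinary arrays, parallel arrays have a fixed size.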

scala> val pa = scala.collection.parallel.mutable.ParArray.tabulate(1000)(x => 2 * x + 1)
pa: scala.collection.parallel.mutable.ParArray[Int] = ParArray(1, 3, 5, 7, 9, 11
, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51
, 53, 55, 57, 59, 61, 63, 65, 67, 69, 71, 73, 75, 77, 79, 81, 83, 85, 87, 89, 91
, 93, 95, 97, 99, 101, 103, 105, 107, 109, 111, 113, 115, 117, 119, 121, 123, 12
5, 127, 129, 131, 133, 135, 137, 139, 141, 143, 145, 147, 149, 151, 153, 155, 15
7, 159, 161, 163, 165, 167, 169, 171, 173, 175, 177, 179, 181, 183, 185, 187, 18
9, 191, 193, 195, 197, 199, 201, 203, 205, 207, 209, 211, 213, 215, 217, 219, 22
1, 223, 225, 227, 229, 231, 233, 235, 237, 239, 241, 243, 245, 247, 249, 251, 25
3, 255, 257, 259, 261, 263, 265, 267, 269, 271, 273, 275, 277, 279, 281, 283, 28
5, 287, 289, 291, 293, 295, 297, 299, 301, 303, 305, 307, 309, 311, 313, 315,...
 
scala> pa reduce(_+_)
res0: Int = 1000000
 
scala> pa map(x=>(x-1)/2)
res1: scala.collection.parallel.mutable.ParArray[Int] = ParArray(0, 1, 2, 3, 4,
5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 2
6, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 4
6, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 6
6, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 8
6, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104,
105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120,
121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136,
137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152,
153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 16...
 
scala>
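
As a cross-check of the transcript above, the same two results can be reproduced with a plain sequential Array from the standard library. This is only a sketch: the object name ParArrayCheck is made up for illustration, and it deliberately uses Array rather than ParArray so that it runs without the parallel collections on the classpath.

```scala
// Sequential cross-check of the ParArray transcript (illustrative names only).
object ParArrayCheck {
  // The first 1000 odd numbers: 1, 3, 5, ..., 1999.
  val xs: Array[Int] = Array.tabulate(1000)(x => 2 * x + 1)

  // The sum of the first n odd numbers is n * n, so this is 1000 * 1000.
  val total: Int = xs.reduce(_ + _)

  // Inverting 2 * x + 1 recovers the original indices 0 to 999.
  val indices: Array[Int] = xs.map(x => (x - 1) / 2)
}
```

Because + on Int is associative, a parallel reduce is expected to return the same total as this sequential one.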

Parallel Vector


A ParVector is an immutable sequence with low-constant-factor logarithmic access and update time.

scala> val pv = scala.collection.parallel.immutable.ParVector.tabulate(1000)(x => x)
pv: scala.collection.parallel.immutable.ParVector[Int] = ParVector(0, 1, 2, 3, 4
, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25,
 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45,
 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65,
 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85,
 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104
, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120
, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136
, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152
, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, ...
 
scala> pv filter (_ %2==0)
res2: scala.collection.parallel.immutable.ParVector[Int] = ParVector(0, 2, 4, 6,
 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46,
48, 50, 52, 54, 56, 58, 60, 62, 64, 66, 68, 70, 72, 74, 76, 78, 80, 82, 84, 86,
88, 90, 92, 94, 96, 98, 100, 102, 104, 106, 108, 110, 112, 114, 116, 118, 120, 1
22, 124, 126, 128, 130, 132, 134, 136, 138, 140, 142, 144, 146, 148, 150, 152, 1
54, 156, 158, 160, 162, 164, 166, 168, 170, 172, 174, 176, 178, 180, 182, 184, 1
86, 188, 190, 192, 194, 196, 198, 200, 202, 204, 206, 208, 210, 212, 214, 216, 2
18, 220, 222, 224, 226, 228, 230, 232, 234, 236, 238, 240, 242, 244, 246, 248, 2
50, 252, 254, 256, 258, 260, 262, 264, 266, 268, 270, 272, 274, 276, 278, 280, 2
82, 284, 286, 288, 290, 292, 294, 296, 298, 300, 302, 304, 306, 308, 310, 312...
 
scala>

Parallel Range


A ParRange is an ordered sequence of elements equally spaced apart. A parallel range is created in much the same way as a sequential Range.

scala> 1 to 3 par
warning: there were 1 feature warning(s); re-run with -feature for details
res3: scala.collection.parallel.immutable.ParRange = ParRange(1, 2, 3)
 
scala> 15 to 5 by -2 par
warning: there were 1 feature warning(s); re-run with -feature for details
res4: scala.collection.parallel.immutable.ParRange = ParRange(15, 13, 11, 9, 7,
5)
 
scala>
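
Since a parallel range is built on top of the sequential Range syntax, the "equally spaced" property can be checked on the sequential ranges alone. The sketch below is an illustration under that assumption; RangeCheck and gaps are hypothetical names, and no parallel conversion is performed.

```scala
// Sequential ranges matching the two transcript examples (illustrative names only).
object RangeCheck {
  val up: Range = 1 to 3
  val down: Range = 15 to 5 by -2

  // Differences between neighbouring elements; a range has a single constant gap.
  def gaps(r: Range): List[Int] =
    r.zip(r.tail).map { case (a, b) => b - a }.toList
}
```

Calling RangeCheck.gaps(RangeCheck.down) yields only the step -2, repeated once per neighbouring pair.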

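Parallel Hash Tables


Parallel hash tables store their elements in an underlying array, placing each element at the position determined by its hash code. Parallel mutable hash sets (mutable.ParHashSet) and parallel mutable hash maps (mutable.ParHashMap) are both based on hash tables.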

scala> val phs = scala.collection.parallel.mutable.ParHashSet(1 until 2000: _*)
phs: scala.collection.parallel.mutable.ParHashSet[Int] = ParHashSet(307, 1705, 3
67, 1007, 1954, 1067, 1316, 1033, 707, 1980, 335, 1093, 395, 1342, 1016, 644, 12
65, 35, 704, 95, 1042, 344, 1291, 1044, 404, 1991, 1351, 653, 1353, 1602, 1662,
1379, 1053, 41, 1628, 1302, 1688, 1362, 1079, 381, 1328, 441, 1388, 690, 1637, 9
39, 1390, 1697, 52, 999, 18, 1699, 639, 78, 1665, 1339, 327, 1, 1725, 1399, 1648
, 1365, 27, 1708, 1425, 727, 1674, 976, 1734, 89, 1036, 1983, 676, 115, 736, 38,
 985, 287, 1045, 1685, 347, 64, 1745, 313, 1711, 1385, 373, 1013, 1771, 1073, 13
22, 1382, 773, 75, 1022, 1969, 324, 1082, 384, 1657, 1331, 633, 350, 24, 693, 41
0, 84, 1357, 659, 333, 50, 1731, 719, 393, 110, 359, 1059, 419, 361, 668, 1119,
421, 1368, 728, 670, 1617, 730, 1677, 1394, 979, 696, 370, 1643, 1317, 756, 4...
 
scala> phs map (x => x * x)
res5: scala.collection.parallel.mutable.ParHashSet[Int] = ParHashSet(11236, 2563
201, 1957201, 143641, 227529, 3556996, 1214404, 1946025, 670761, 2181529, 219024
, 1648656, 2062096, 2152089, 343396, 2418025, 1582564, 440896, 925444, 312481, 3
526884, 2775556, 3359889, 175561, 35721, 84681, 2244004, 20164, 2102500, 576081,
 1557504, 338724, 952576, 300304, 1030225, 3139984, 687241, 1227664, 2627641, 35
75881, 51984, 851929, 94249, 2505889, 3972049, 788544, 1459264, 2937796, 2920681
, 2446096, 413449, 59536, 690561, 306916, 441, 2647129, 5929, 1054729, 746496, 3
392964, 3207681, 2989441, 2547216, 180625, 1868689, 166464, 33124, 237169, 25856
64, 1, 3625216, 57600, 99225, 315844, 251001, 238144, 32761, 595984, 1194649, 12
18816, 676, 1464100, 797449, 2131600, 1527696, 2383936, 786769, 3697929, 3222...
 
scala>
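
The placement rule described at the start of this section (an element's slot is derived from its hash code) can be illustrated with a toy table. This is a deliberately simplified sketch, not the algorithm used by ParHashSet or ParHashMap: HashPlacement, slotFor and place are hypothetical names, and collisions are handled here by keeping a list per slot purely for illustration.

```scala
// Toy illustration of hash-based placement; NOT the library's implementation.
object HashPlacement {
  // Slot index for an element: its hash code reduced to the range 0 until capacity.
  // floorMod keeps the index non-negative even for negative hash codes.
  def slotFor(elem: Any, capacity: Int): Int =
    java.lang.Math.floorMod(elem.##, capacity)

  // Places every element into the slot chosen by slotFor; colliding elements
  // share a slot (newest first).
  def place(elems: Seq[Int], capacity: Int): Array[List[Int]] = {
    val table = Array.fill(capacity)(List.empty[Int])
    for (e <- elems) {
      val i = slotFor(e, capacity)
      table(i) = e :: table(i)
    }
    table
  }
}
```

Because the slot depends on the hash code rather than on insertion order, iterating such a table visits elements in an order unrelated to how they were added, which is consistent with the scrambled element order in the transcript above.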

Parallel Hash Tries


Parallel hash tries are the parallel counterpart of immutable hash tries, which are used to represent immutable sets and maps efficiently. They are provided by the classes immutable.ParHashSet and immutable.ParHashMap.

scala> val phs = scala.collection.parallel.immutable.ParHashSet(1 until 1000: _*)
phs: scala.collection.parallel.immutable.ParHashSet[Int] = ParSet(645, 892, 69,
809, 629, 365, 138, 760, 101, 479, 347, 846, 909, 333, 628, 249, 893, 518, 962,
468, 234, 941, 777, 555, 666, 88, 481, 352, 408, 977, 170, 523, 582, 762, 115,
83, 730, 217, 276, 994, 308, 741, 5, 873, 449, 120, 247, 379, 878, 440, 655, 51
, 614, 269, 677, 202, 597, 861, 10, 385, 384, 56, 533, 550, 142, 500, 797, 715,
472, 814, 698, 747, 913, 945, 340, 538, 153, 930, 670, 829, 174, 404, 898, 185,
42, 782, 709, 841, 417, 24, 973, 885, 288, 301, 320, 565, 436, 37, 25, 651, 257
 389, 52, 724, 14, 570, 184, 719, 785, 372, 504, 110, 587, 619, 838, 917, 702,
51, 802, 125, 344, 934, 357, 196, 949, 542, 460, 157, 817, 902, 559, 638, 853,
89, 20, 421, 870, 46, 969, 93, 606, 284, 770, 881, 416, 325, 152, 228, 289, 4..
 
scala> phs map { x => x * x } sum
warning: there were 1 feature warning(s); re-run with -feature for details
res6: Int = 332833500
 
scala>
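
The sequential immutable HashSet behaves the same way from the caller's point of view, so it can be used to illustrate immutability and to cross-check the sum in the transcript. This is a sketch using the sequential class only; HashTrieCheck and its members are names invented for the example.

```scala
import scala.collection.immutable.HashSet

// Sequential immutable hash set mirroring the transcript (illustrative names only).
object HashTrieCheck {
  val base: HashSet[Int] = HashSet(1 until 1000: _*)

  // Adding to an immutable set returns a new set; `base` itself is unchanged.
  val extended: HashSet[Int] = base + 1000

  // Sum of the squares of 1 to 999, computed over an iterator of the elements.
  val sumOfSquares: Int = base.iterator.map(x => x * x).sum
}
```

The closed form 999 * 1000 * 1999 / 6 gives 332833500, which agrees with res6 above.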

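Parallel Concurrent Tries


A concurrent.TrieMap is a concurrent, thread-safe map, and mutable.ParTrieMap is its parallel counterpart. Most concurrent data structures do not guarantee a consistent traversal if the structure is modified while it is being traversed. Ctries, in contrast, guarantee that updates only become visible in the next iteration. This means you can modify a concurrent trie while traversing it, as in the following example, which outputs the square roots of the numbers from 1 to 99.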

scala> val numbers = scala.collection.parallel.mutable.ParTrieMap((1 until 100) zip (1 until 100): _*) map { case (k, v) => (k.toDouble, v.toDouble) }
numbers: scala.collection.parallel.mutable.ParTrieMap[Double,Double] = ParTrieMa
p(15.0 -> 15.0, 51.0 -> 51.0, 33.0 -> 33.0, 48.0 -> 48.0, 84.0 -> 84.0, 30.0 ->
30.0, 66.0 -> 66.0, 12.0 -> 12.0, 27.0 -> 27.0, 9.0 -> 9.0, 99.0 -> 99.0, 81.0 -
> 81.0, 63.0 -> 63.0, 45.0 -> 45.0, 78.0 -> 78.0, 60.0 -> 60.0, 96.0 -> 96.0, 19
.0 -> 19.0, 1.0 -> 1.0, 37.0 -> 37.0, 52.0 -> 52.0, 88.0 -> 88.0, 34.0 -> 34.0,
70.0 -> 70.0, 16.0 -> 16.0, 67.0 -> 67.0, 13.0 -> 13.0, 49.0 -> 49.0, 85.0 -> 85
.0, 31.0 -> 31.0, 82.0 -> 82.0, 64.0 -> 64.0, 46.0 -> 46.0, 5.0 -> 5.0, 56.0 ->
56.0, 2.0 -> 2.0, 38.0 -> 38.0, 74.0 -> 74.0, 20.0 -> 20.0, 89.0 -> 89.0, 71.0 -
> 71.0, 17.0 -> 17.0, 53.0 -> 53.0, 35.0 -> 35.0, 86.0 -> 86.0, 68.0 -> 68.0, 32
.0 -> 32.0, 50.0 -> 50.0, 83.0 -> 83.0, 6.0 -> 6.0, 42.0 -> 42.0, 24.0 -> 24....
 
scala> while (numbers.nonEmpty) {
     |   numbers foreach { case (num, sqrt) =>
     |     val nsqrt = 0.5 * (sqrt + num / sqrt)
     |     numbers(num) = nsqrt
     |     if (math.abs(nsqrt - sqrt) < 0.01) {
     |       println(num, nsqrt)
     |       numbers.remove(num)
     |     }
     |   }
     | }
(1.0,1.0)
(2.0,1.4142156862745097)
(5.0,2.2360688956433634)
(6.0,2.4494943716069653)
(3.0,1.7320508100147274)
(7.0,2.64576704419029)
(4.0,2.0000000929222947)
(15.0,3.872983698008724)
(12.0,3.4641016533502986)
(9.0,3.000000001396984)
(19.0,4.358901750853372)
(16.0,4.000000636692939)
(13.0,3.6055513629176015)
(20.0,4.4721402170657)
……
 
scala>
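
The same loop can be tried with the sequential concurrent.TrieMap from the standard library, which shares the property that traversal works on a consistent snapshot while the map is being modified. The sketch below swaps ParTrieMap for TrieMap so that it runs single-threaded and without the parallel collections; TrieMapSqrt, sqrtAll and the eps parameter are hypothetical names, and results are collected into a map instead of being printed.

```scala
import scala.collection.concurrent.TrieMap

// Babylonian square roots over a TrieMap that is modified during traversal
// (sequential variant of the transcript; names are illustrative only).
object TrieMapSqrt {
  def sqrtAll(upTo: Int, eps: Double): Map[Double, Double] = {
    val numbers = TrieMap.empty[Double, Double]
    for (i <- 1 to upTo) numbers.put(i.toDouble, i.toDouble)

    var results = Map.empty[Double, Double]
    while (numbers.nonEmpty) {
      // Each foreach pass iterates over a snapshot, so the put and remove
      // calls below only take effect for the next pass.
      numbers.foreach { case (num, sqrt) =>
        val nsqrt = 0.5 * (sqrt + num / sqrt)
        numbers.put(num, nsqrt)
        if (math.abs(nsqrt - sqrt) < eps) {
          results += (num -> nsqrt)
          numbers.remove(num)
        }
      }
    }
    results
  }
}
```

For example, TrieMapSqrt.sqrtAll(99, 0.01) returns one approximate square root per key, each entry being removed from the trie as soon as two successive estimates differ by less than eps.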

Performance characteristics


Performance characteristics of sequence types:

            head  tail  apply  update  prepend  append  insert
ParArray    C     L     C      C       L        L       L
ParVector   eC    eC    eC     eC      eC       eC      -
ParRange    C     C     C      -       -        -       -

Performance characteristics of set and map types:

                        lookup  add  remove
immutable
ParHashSet/ParHashMap   eC      eC   eC
mutable
ParHashSet/ParHashMap   C       C    C
ParTrieMap              eC      eC   eC

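Key

The entries in the two tables above are explained as follows:

C    The operation takes (fast) constant time.
eC   The operation takes effectively constant time, but this may depend on some assumptions, such as the maximum length of a vector or the distribution of hash keys.
aC   The operation takes amortized constant time. Some invocations of the operation might take longer, but if many operations are performed, on average only constant time per operation is taken.
Log  The operation takes time proportional to the logarithm of the collection size.
L    The operation is linear, that is, it takes time proportional to the collection size.
-    The operation is not supported.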
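
To make the amortized (aC) entry concrete, the following cost model counts how many element copies a growable array performs when it doubles its capacity each time it fills up. It is a sketch of the general idea only, not a description of any particular Scala collection; AmortizedAppend and copiesForAppends are hypothetical names.

```scala
// Cost model for appending to a doubling array (illustrative, not a real collection).
object AmortizedAppend {
  // Returns the number of element copies caused by resizing during n appends,
  // starting from capacity 1 and doubling whenever the array is full.
  def copiesForAppends(n: Int): Long = {
    var capacity = 1
    var size = 0
    var copies = 0L
    var i = 0
    while (i < n) {
      if (size == capacity) {
        copies += size // a resize copies every existing element once
        capacity *= 2
      }
      size += 1
      i += 1
    }
    copies
  }
}
```

Individual appends that trigger a resize are expensive, but the total number of copies stays below 2 * n, so the average cost per append is constant.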

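The first table covers sequence types, both mutable and immutable, with the following operations:

head     Selecting the first element of the sequence.
tail     Producing a new sequence that consists of all elements except the first one.
apply    Indexing.
update   Functional update for immutable sequences, side-effecting update for mutable sequences.
prepend  Adding an element to the front of the sequence. For immutable sequences, this produces a new sequence. For mutable sequences, it modifies the existing sequence.
append   Adding an element to the end of the sequence. For immutable sequences, this produces a new sequence. For mutable sequences, it modifies the existing sequence.
insert   Inserting an element at an arbitrary position in the sequence. This is only supported for mutable sequences.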
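
The difference between functional and side-effecting operations in the list above can be seen with the sequential Vector and ArrayBuffer classes. This is a sketch with sequential standard-library collections standing in for the parallel ones; SeqOps and demo are names chosen for the example.

```scala
import scala.collection.mutable.ArrayBuffer

// Functional operations on an immutable sequence versus in-place operations
// on a mutable one (illustrative names only).
object SeqOps {
  def demo(): (Vector[Int], Vector[Int], Vector[Int], List[Int]) = {
    val v = Vector(1, 2, 3)
    val updated = v.updated(0, 10) // functional update: returns a new vector
    val grown = (0 +: v) :+ 4      // prepend, then append: each returns a new vector

    val buf = ArrayBuffer(1, 2, 3)
    buf(0) = 10                    // side-effecting update: modifies buf in place
    buf.insert(1, 99)              // insert is only available on mutable sequences

    (v, updated, grown, buf.toList)
  }
}
```

After demo() runs, the original vector v still holds 1, 2, 3, whereas the buffer has been changed in place.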

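The second table covers mutable and immutable sets and maps, with the following operations:

lookup  Testing whether an element is contained in a set, or selecting the value associated with a key.
add     Adding an element to a set, or adding a key/value pair to a map.
remove  Removing an element from a set, or removing a key from a map.
min     The smallest element of a set, or the smallest key of a map.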
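
The four operations just listed look as follows on a sequential mutable HashSet and an immutable HashMap. This is only an illustrative sketch with sequential standard-library classes; SetMapOps and demo are hypothetical names.

```scala
import scala.collection.immutable.HashMap
import scala.collection.mutable

// lookup, add, remove and min on a mutable set and an immutable map
// (illustrative names only).
object SetMapOps {
  def demo(): (Boolean, Option[String], Int, Int) = {
    val set = mutable.HashSet(1, 2, 3)
    set += 4                      // add: modifies the set in place
    set -= 1                      // remove: modifies the set in place
    val hasTwo = set.contains(2)  // lookup on a set

    val m0 = HashMap(1 -> "one")
    val m1 = m0 + (2 -> "two")    // add: returns a new map
    val m2 = m1 - 1               // remove: returns a new map

    // lookup on a map, smallest set element, smallest map key
    (hasTwo, m2.get(2), set.min, m2.keys.min)
  }
}
```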

References

