The default partitioner in Spark is org.apache.spark.HashPartitioner. Its implementation is shown below:
```scala
class HashPartitioner(partitions: Int) extends Partitioner {
  require(partitions >= 0, s"Number of partitions ($partitions) cannot be negative.")

  def numPartitions: Int = partitions

  // null keys always go to partition 0; everything else is hashed and
  // mapped into [0, numPartitions) with a non-negative modulo.
  def getPartition(key: Any): Int = key match {
    case null => 0
    case _ => Utils.nonNegativeMod(key.hashCode, numPartitions)
  }

  // Two HashPartitioners are equal iff they have the same partition count.
  override def equals(other: Any): Boolean = other match {
    case h: HashPartitioner => h.numPartitions == numPartitions
    case _ => false
  }

  override def hashCode: Int = numPartitions
}
```
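Note why `Utils.nonNegativeMod` is needed: in Java and Scala, `%` can return a negative value when `hashCode` is negative, which would be an invalid partition index. The helper's logic is roughly the following sketch:

```scala
// Sketch of org.apache.spark.util.Utils.nonNegativeMod: shift a possibly
// negative remainder back into the [0, mod) range.
def nonNegativeMod(x: Int, mod: Int): Int = {
  val rawMod = x % mod
  rawMod + (if (rawMod < 0) mod else 0)
}
```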
If you want to compute a key's partition in Python, you only need to implement Java's hashCode and then take the result modulo the number of partitions.
A hashCode implementation for strings looks like this:
```python
def java_string_hashcode(s):
    """Replicate Java's String.hashCode for a Python string."""
    h = 0
    for c in s:
        # Mask to 32 bits to emulate Java's int overflow behavior.
        h = (31 * h + ord(c)) & 0xFFFFFFFF
    # Convert the unsigned 32-bit value back to a signed Java int.
    return ((h + 0x80000000) & 0xFFFFFFFF) - 0x80000000
```
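With that in place, the partition index for a string key is just the hash modulo the partition count. A minimal example (the key and partition count here are arbitrary); note that Python's `%` already returns a non-negative result for a positive modulus, so it matches `nonNegativeMod` without extra work:

```python
# Hypothetical example: 10 partitions, key "hello".
num_partitions = 10
print(java_string_hashcode("hello") % num_partitions)
```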
Verification
Scala implementation
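The original verification snippet is not reproduced here; the following is a minimal sketch of what such a check could look like, assuming 10 partitions and a few arbitrary sample keys:

```scala
import org.apache.spark.HashPartitioner

// Minimal check: print the partition HashPartitioner assigns to each key.
// The partition count (10) and the sample keys are illustrative assumptions.
object HashPartitionerCheck {
  def main(args: Array[String]): Unit = {
    val partitioner = new HashPartitioner(10)
    for (key <- Seq("hello", "world", "spark")) {
      println(s"$key -> ${partitioner.getPartition(key)}")
    }
  }
}
```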
Python implementation
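A matching sketch in Python, reusing `java_string_hashcode` from above with the same assumed keys and partition count; the printed indices should agree with the Scala run:

```python
# Same keys and partition count as the Scala sketch above (assumptions).
num_partitions = 10
for key in ["hello", "world", "spark"]:
    # Python's % is non-negative for a positive modulus, matching
    # Spark's nonNegativeMod behavior.
    print(key, "->", java_string_hashcode(key) % num_partitions)
```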