A Python Implementation of Spark's HashPartitioner


Spark's default partitioner is org.apache.spark.HashPartitioner. The relevant Scala source is shown below:

class HashPartitioner(partitions: Int) extends Partitioner {
  require(partitions >= 0, s"Number of partitions ($partitions) cannot be negative.")

  def numPartitions: Int = partitions

  def getPartition(key: Any): Int = key match {
    case null => 0
    case _ => Utils.nonNegativeMod(key.hashCode, numPartitions)
  }

  override def equals(other: Any): Boolean = other match {
    case h: HashPartitioner =>
      h.numPartitions == numPartitions
    case _ =>
      false
  }

  override def hashCode: Int = numPartitions
}

To compute a key's partition in Python, you only need to reproduce Java's hashCode and then take a non-negative modulus of the partition count.
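The "non-negative modulus" part mirrors what `Utils.nonNegativeMod` does in the Scala source above: Java's `%` truncates toward zero and can return a negative remainder, which is then shifted back into range. A minimal Python sketch of that helper (the name `non_negative_mod` is mine):

```python
import math

def non_negative_mod(x, mod):
    # Java's % truncates toward zero, so the remainder can be negative;
    # math.fmod reproduces that behaviour for integers in Python.
    raw_mod = int(math.fmod(x, mod))
    # Mirror Utils.nonNegativeMod: shift a negative remainder into [0, mod).
    return raw_mod + (mod if raw_mod < 0 else 0)
```

Note that Python's own `%` already returns a non-negative result for a positive divisor; the `math.fmod` detour is only needed to reproduce Java's semantics step for step.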

hashCode can be implemented as follows:

def java_string_hashcode(s):
    # Mimic Java's String.hashCode(): h = 31 * h + c over the characters.
    h = 0
    for c in s:
        h = (31 * h + ord(c)) & 0xFFFFFFFF  # truncate to 32 bits, like Java int overflow
    return ((h + 0x80000000) & 0xFFFFFFFF) - 0x80000000  # reinterpret as signed 32-bit int
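As a quick sanity check (the definition is repeated so the snippet runs standalone), the port should reproduce Java's `String.hashCode()` values:

```python
def java_string_hashcode(s):
    h = 0
    for c in s:
        h = (31 * h + ord(c)) & 0xFFFFFFFF
    return ((h + 0x80000000) & 0xFFFFFFFF) - 0x80000000

# These match what "".hashCode() and "hello".hashCode() return in Java.
print(java_string_hashcode(""))       # 0
print(java_string_hashcode("hello"))  # 99162322
```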

Verification

Scala implementation: call `getPartition` on a `HashPartitioner` directly, e.g. `new HashPartitioner(10).getPartition("hello")` in spark-shell.

Python implementation
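A self-contained sketch of the full port, combining the hashCode above with the Java-style non-negative modulus (the names `get_partition` and `non_negative_mod` are mine, mirroring `HashPartitioner.getPartition` and `Utils.nonNegativeMod`):

```python
import math

def java_string_hashcode(s):
    # Mimic Java's String.hashCode() with 32-bit signed overflow.
    h = 0
    for c in s:
        h = (31 * h + ord(c)) & 0xFFFFFFFF
    return ((h + 0x80000000) & 0xFFFFFFFF) - 0x80000000

def non_negative_mod(x, mod):
    # Java's % truncates toward zero; math.fmod reproduces that.
    raw_mod = int(math.fmod(x, mod))
    return raw_mod + (mod if raw_mod < 0 else 0)

def get_partition(key, num_partitions):
    # Mirror HashPartitioner.getPartition: null keys go to partition 0.
    if key is None:
        return 0
    return non_negative_mod(java_string_hashcode(key), num_partitions)

print(get_partition("hello", 10))  # 99162322 % 10 = 2
```

The result can be checked against `new HashPartitioner(10).getPartition("hello")` in spark-shell; the two should agree for string keys.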
