Scala Comparators: Ordered and Ordering

In projects we often need to sort (or compare) values. For example, given a Person class:

case class Person(name: String, age: Int) {
  override def toString = {
    "name: " + name + ", age: " + age
  }
}

how do we sort by name in reverse lexicographic order and then by age in ascending order in Scala?

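1. Two Traits

Scala provides two traits for comparison, Ordered and Ordering. Ordered mixes in Java's Comparable interface, while Ordering mixes in Comparator. As is well known, in Java:

  • A class that implements Comparable makes its objects comparable to one another;
  • A class that implements Comparator provides an external comparator for comparing two objects.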

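The difference between Ordered and Ordering is analogous:

  • The Ordered trait defines how instances of the same type are compared, but this internal comparison is the only one the type has;
  • Ordering provides a comparator template, so multiple comparison strategies can be defined.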
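
To make the contrast concrete, here is a minimal sketch. The Version class and the object and value names are hypothetical and not part of the original example. Extending Ordered fixes one built-in order on the class, while any number of Ordering values can be defined outside it.

```scala
object OrderedVsOrdering {
  // Hypothetical class for illustration: Ordered gives it a single built-in order.
  case class Version(major: Int, minor: Int) extends Ordered[Version] {
    def compare(that: Version): Int =
      if (major != that.major) major compare that.major
      else minor compare that.minor
  }

  // External comparators: as many as needed, defined outside the class.
  val byMinorOnly: Ordering[Version] = Ordering.by[Version, Int](_.minor)
  val newestFirst: Ordering[Version] =
    Ordering.by[Version, (Int, Int)](v => (v.major, v.minor)).reverse

  val versions = List(Version(1, 9), Version(2, 0), Version(1, 2))

  val natural = versions.sorted              // uses Version.compare through Ordered
  val byMinor = versions.sorted(byMinorOnly) // explicit external comparator
  val newest  = versions.sorted(newestFirst) // another external comparator
}
```

Here natural is ordered 1.2, 1.9, 2.0, byMinor is ordered 2.0, 1.2, 1.9, and newest is ordered 2.0, 1.9, 1.2, all from the same list.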

The source analysis below is based on Scala 2.10.5.

Ordered

The Ordered trait is essentially a richer version of the Comparable interface: in addition to compare, it provides the comparison operators (<, >, <=, >=):

trait Ordered[A] extends Any with java.lang.Comparable[A] {
  def compare(that: A): Int
  def <  (that: A): Boolean = (this compare that) <  0
  def >  (that: A): Boolean = (this compare that) >  0
  def <= (that: A): Boolean = (this compare that) <= 0
  def >= (that: A): Boolean = (this compare that) >= 0
  def compareTo(that: A): Int = compare(that)
}

In addition, the Ordered companion object provides an implicit conversion from T to Ordered[T], which takes an implicit Ordering[T] parameter:

object Ordered {
  /** Lens from `Ordering[T]` to `Ordered[T]` */
  implicit def orderingToOrdered[T](x: T)(implicit ord: Ordering[T]): Ordered[T] =
    new Ordered[T] { def compare(that: T): Int = ord.compare(x, that) }
}
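
As a small sketch of this conversion at work (the object and value names are hypothetical): tuples do not extend Ordered, but the standard library supplies an implicit Ordering for them, so importing orderingToOrdered is enough to get the comparison operators.

```scala
object OrderingToOrderedDemo {
  import scala.math.Ordered.orderingToOrdered

  val a = (1, "b")
  val b = (1, "c")

  // Tuple2 has no `<` of its own; the conversion wraps `a` in an Ordered
  // backed by the implicit Ordering[(Int, String)].
  val less: Boolean = a < b

  // compare is also available through the same conversion.
  val cmp: Int = a compare b
}
```

Because the first components are equal and "b" sorts before "c", less is true and cmp is negative.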

Ordering

Ordering provides the built-in functions by (on the companion object) and on (on an Ordering instance) for custom sorting:

import scala.util.Sorting
val pairs = Array(("a", 5, 2), ("c", 3, 1), ("b", 1, 3))

// sort by 2nd element
Sorting.quickSort(pairs)(Ordering.by[(String, Int, Int), Int](_._2))

// sort by the 3rd element, then 1st
Sorting.quickSort(pairs)(Ordering[(Int, String)].on(x => (x._3, x._1)))
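
The same two orderings can be named and passed to sorted, which makes the results easier to inspect. This is a sketch with hypothetical object and value names; unlike Sorting.quickSort, which sorts the array in place, sorted returns a new array.

```scala
object ByAndOnDemo {
  val pairs = Array(("a", 5, 2), ("c", 3, 1), ("b", 1, 3))

  // Ordering.by builds an Ordering from a key-extraction function.
  val bySecond: Ordering[(String, Int, Int)] =
    Ordering.by[(String, Int, Int), Int](_._2)

  // on adapts an existing Ordering[(Int, String)] to the triple via a projection.
  val byThirdThenFirst: Ordering[(String, Int, Int)] =
    Ordering[(Int, String)].on((x: (String, Int, Int)) => (x._3, x._1))

  // sorted leaves `pairs` untouched and returns a sorted copy.
  val r1 = pairs.sorted(bySecond).toList
  val r2 = pairs.sorted(byThirdThenFirst).toList
}
```

With this input, r1 is ordered b, c, a (second elements 1, 3, 5) and r2 is ordered c, a, b (third elements 1, 2, 3).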

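2. In Practice

Comparison

How do we make Person objects comparable? We can rely on the orderingToOrdered function of the Ordered companion object for the implicit conversion, but we still have to supply an implicit Ordering[Person]: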

implicit object PersonOrdering extends Ordering[Person] {
  override def compare(p1: Person, p2: Person): Int =
    if (p1.name != p2.name) p2.name.compareTo(p1.name) // name: reverse lexicographic order
    else p1.age compare p2.age                          // age: ascending, without subtraction overflow
}

val p1 = new Person("rain", 13)
val p2 = new Person("rain", 14)
import Ordered._
p1 < p2 // true

Collection Sort

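In real projects we often need to sort collections. Back to the opening question: how do we sort a collection of Person objects in the specified order? Below we use List as a demo to explore collection sorting in Scala. First, let's look at List's sort functions: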

// scala.collection.SeqLike

def sortWith(lt: (A, A) => Boolean): Repr = sorted(Ordering fromLessThan lt)

def sortBy[B](f: A => B)(implicit ord: Ordering[B]): Repr = sorted(ord on f)

def sorted[B >: A](implicit ord: Ordering[B]): Repr = {
...
}

To sort with sorted, an implicit Ordering must be in scope:

val p1 = new Person("rain", 24)
val p2 = new Person("rain", 22)
val p3 = new Person("Lily", 15)
val list = List(p1, p2, p3)

implicit object PersonOrdering extends Ordering[Person] {
  override def compare(p1: Person, p2: Person): Int =
    if (p1.name != p2.name) p2.name.compareTo(p1.name) // name: reverse lexicographic order
    else p1.age compare p2.age                          // age: ascending, without subtraction overflow
}
list.sorted 
// res3: List[Person] = List(name: rain, age: 22, name: rain, age: 24, name: Lily, age: 15)

With sortWith, you define a comparison function that returns a Boolean:

list.sortWith { (p1: Person, p2: Person) =>
  if (p1.name != p2.name) p1.name > p2.name // name: reverse lexicographic order
  else p1.age < p2.age                      // age: ascending
}
// res4: List[Person] = List(name: rain, age: 22, name: rain, age: 24, name: Lily, age: 15)

sortBy also requires an implicit Ordering:

implicit object PersonOrdering extends Ordering[Person] {
  override def compare(p1: Person, p2: Person): Int =
    if (p1.name != p2.name) p2.name.compareTo(p1.name) // name: reverse lexicographic order
    else p1.age compare p2.age                          // age: ascending, without subtraction overflow
}

list.sortBy[Person](t => t)
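
sortBy does not have to use the identity function. As an alternative sketch that is not from the original article, the key can be a (name, age) tuple whose ordering is assembled from standard-library pieces and passed explicitly. Person is redefined here only to keep the block self-contained, and the object and value names are hypothetical.

```scala
object SortByKeyDemo {
  case class Person(name: String, age: Int)

  val people = List(Person("rain", 24), Person("rain", 22), Person("Lily", 15))

  // Key ordering: name in reverse lexicographic order, then age ascending.
  val keyOrdering: Ordering[(String, Int)] =
    Ordering.Tuple2(Ordering.String.reverse, Ordering.Int)

  // The key ordering is passed explicitly, so no implicit Ordering[Person] is needed.
  val sortedPeople = people.sortBy(p => (p.name, p.age))(keyOrdering)
}
```

This yields the same order as the examples above: rain (22), rain (24), Lily (15). The trade-off is that the ordering lives at the call site rather than in a reusable implicit object.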

RDD sort

RDD's sortBy function sorts an RDD globally by the specified key. sortBy is defined as follows:

def sortBy[K](
  f: (T) => K,
  ascending: Boolean = true,
  numPartitions: Int = this.partitions.length)
  (implicit ord: Ordering[K], ctag: ClassTag[K]): RDD[T]

All that is needed is an implicit Ordering for the key type:

scala> val rdd = sc.parallelize(Array(new Person("rain", 24),
      new Person("rain", 22), new Person("Lily", 15)))

scala> implicit object PersonOrdering extends Ordering[Person] {
        override def compare(p1: Person, p2: Person): Int =
          if (p1.name != p2.name) p2.name.compareTo(p1.name) // name: reverse lexicographic order
          else p1.age compare p2.age                          // age: ascending, without subtraction overflow
      }

scala> rdd.sortBy[Person](t => t).collect()
// res1: Array[Person] = Array(name: rain, age: 22, name: rain, age: 24, name: Lily, age: 15)

