How STL sort Is Implemented

Date: 2019-12-15

Tags: stl, sort, function implementation

Note: this article draws on the reference "STL::sort實現".

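Sorting is a staple of interview questions on algorithm fundamentals. You will rarely implement these algorithms directly in real work, but understanding them helps with more complex and more practical algorithms, and since interviews cannot go too deep, these basics matter all the more. We call sort on sequences all the time in our programs, but do you know how it is implemented?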

Function declaration

#include <algorithm>

template <class RandomAccessIterator>
  void sort (RandomAccessIterator first, RandomAccessIterator last);

template <class RandomAccessIterator, class Compare>
  void sort (RandomAccessIterator first, RandomAccessIterator last, Compare comp);

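The declarations above are from "sort - C++ Reference". The STL provides two ways to call it: one compares elements with the default < operator, and the other takes a custom comparison function. But why is it usually so much faster than a sort we write ourselves?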
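
As a quick illustration of the two call forms, here is a minimal sketch. The helper names sorted_ascending and sorted_descending are made up for this example; only std::sort, std::greater, and std::is_sorted come from the standard library.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical helper: sorts a copy in ascending order with the default operator<.
std::vector<int> sorted_ascending(std::vector<int> v) {
    std::sort(v.begin(), v.end());
    assert(std::is_sorted(v.begin(), v.end()));
    return v;
}

// Hypothetical helper: sorts a copy in descending order via a custom comparator.
std::vector<int> sorted_descending(std::vector<int> v) {
    std::sort(v.begin(), v.end(), std::greater<int>());
    assert(std::is_sorted(v.begin(), v.end(), std::greater<int>()));
    return v;
}
```

Any callable that defines a strict weak ordering can be passed as the third argument, for example a lambda comparing one field of a struct.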

How it works

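The sort in the STL is not a plain quicksort. Besides optimizing ordinary quicksort, it also combines insertion sort and heapsort, and it picks the appropriate method automatically depending on the data size and the situation. When the data set is large it uses quicksort, recursing on partitions. Once a partition shrinks below a certain threshold, it stops recursing and leaves that partition to insertion sort, which avoids the overhead of further recursive calls. And if the recursion gets too deep, which suggests the worst case is developing, it switches to heapsort.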

Plain quicksort

See my other post, 十大排序算法 (Ten Sorting Algorithms), which analyzes each sorting algorithm. Quicksort is described there as follows:

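- Choose a pivot from the sequence;
- Rearrange the sequence so that every element smaller than the pivot ends up to its left and every element larger than the pivot ends up to its right, splitting the sequence into a left and a right subsequence. This is the partition step;
- Recursively quicksort the left and right subsequences.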

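The partition step usually uses two iterators, head and tail. head moves from the front toward the back and tail moves from the back toward the front. head stops when it meets an element greater than or equal to the pivot, and tail stops when it meets an element less than or equal to the pivot. If head is still before tail, that is, the two have not crossed, the elements are swapped and the same steps continue, closing in on the middle, until the two iterators cross and the partition pass ends.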
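
The two-iterator scheme described above can be sketched roughly as follows. This is an illustration rather than the library's code: partition_sketch and the Iter typedef are names invented here, the element type is fixed to int, and the function assumes the pivot value is equal to some element in the range, which is what keeps both scans from running off the ends.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

typedef std::vector<int>::iterator Iter;  // hypothetical alias for this sketch

// Assumption: `pivot` equals at least one element of [first, last).
// Returns the cut: everything before it is <= pivot, everything from it on is >= pivot.
Iter partition_sketch(Iter first, Iter last, int pivot) {
    assert(first != last);
    while (true) {
        while (*first < pivot)   // head stops at an element >= pivot
            ++first;
        --last;
        while (pivot < *last)    // tail stops at an element <= pivot
            --last;
        if (!(first < last))     // the iterators met or crossed: done
            return first;
        std::iter_swap(first, last);
        ++first;
    }
}
```

For the input {5, 2, 8, 1, 9, 3} with pivot 5, the pass ends with {3, 2, 1, 8, 9, 5} and the cut pointing at the 8.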

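The key to quicksort is the choice of pivot. The worst case occurs when a partition produces an empty range, so the split achieves nothing. The STL uses an approach called median-of-three: take the elements at the first, last, and middle positions of the sequence and use their median as the pivot.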
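
To make median-of-three concrete, here is a small sketch. median_index and pick_pivot are hypothetical names, and the element type is fixed to int for brevity. The comparison ladder needs at most three comparisons.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: returns whichever of the indices a, b, c holds the median value.
std::size_t median_index(const std::vector<int>& v,
                         std::size_t a, std::size_t b, std::size_t c) {
    if (v[a] < v[b]) {
        if (v[b] < v[c]) return b;       // a < b < c
        else if (v[a] < v[c]) return c;  // a < c <= b
        else return a;                   // c <= a < b
    }
    else if (v[a] < v[c]) return a;      // b <= a < c
    else if (v[b] < v[c]) return c;      // b < c <= a
    else return b;                       // c <= b <= a
}

// Hypothetical helper: median of the first, middle, and last elements.
int pick_pivot(const std::vector<int>& v) {
    assert(!v.empty());
    return v[median_index(v, 0, v.size() / 2, v.size() - 1)];
}
```

On already sorted or reverse-sorted input this picks the middle element, which is exactly the case where always taking the first element as the pivot would degrade.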

Introspective sort (Introsort)

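A poor pivot choice leads to poor partitions and can degrade quicksort to O(n^2). In 1996 David R. Musser proposed a hybrid sorting algorithm, Introspective Sorting, or IntroSort for short. For the most part it behaves exactly like the median-of-three quicksort described above, but when partitioning shows signs of degrading toward quadratic behavior, it detects this by itself and switches to heapsort. This keeps the complexity at heapsort's O(n lg n), while still performing better than using heapsort from the start.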

Code walkthrough

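The sort function ends up calling __sort. Below is the implementation of __sort (the listings in this article are from GCC's libstdc++); by default the comparison uses the < operator.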

template<typename _RandomAccessIterator, typename _Compare>
    inline void
    __sort(_RandomAccessIterator __first, _RandomAccessIterator __last,
	   _Compare __comp)
    {
      if (__first != __last)
	{
	  std::__introsort_loop(__first, __last,
				std::__lg(__last - __first) * 2,
				__comp);
	  std::__final_insertion_sort(__first, __last, __comp);
	}
    }

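std::__introsort_loop is the introspective sort introduced above. Its third argument is computed with __lg(), which is what keeps partitioning from degrading: it returns roughly lg(n), that is, the base-2 logarithm rounded down, so the quicksort recursion goes at most 2*lg(n) levels deep.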
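
To get a feel for the numbers, the depth limit can be reproduced with a few lines. floor_log2 and depth_limit are hypothetical helpers written for this illustration; they are not the library's __lg, only a stand-in that computes the same rounded-down logarithm.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for __lg: floor(log2(n)) for n > 0, by repeated halving.
int floor_log2(std::size_t n) {
    assert(n > 0);
    int k = 0;
    while (n >>= 1)
        ++k;
    return k;
}

// Depth budget passed to the introsort loop for a range of n elements.
int depth_limit(std::size_t n) {
    return 2 * floor_log2(n);
}
```

For one million elements, floor_log2 gives 19, so the loop may descend 38 levels before falling back to heapsort.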

1. Introsort: __introsort_loop

__sort first calls the introspective sort. __introsort_loop is implemented as follows:

/// This is a helper function for the sort routine.
  template<typename _RandomAccessIterator, typename _Size, typename _Compare>
    void
    __introsort_loop(_RandomAccessIterator __first,
		     _RandomAccessIterator __last,
		     _Size __depth_limit, _Compare __comp)
    {
      while (__last - __first > int(_S_threshold))
	{
	  if (__depth_limit == 0)
	    {
	      std::__partial_sort(__first, __last, __last, __comp);
	      return;
	    }
	  --__depth_limit;
	  _RandomAccessIterator __cut =
	    std::__unguarded_partition_pivot(__first, __last, __comp);
	  std::__introsort_loop(__cut, __last, __depth_limit, __comp);
	  __last = __cut;
	}
    }

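The loop first checks whether the number of elements exceeds the threshold _S_threshold, a compile-time constant with value 16. If there are 16 or fewer elements, the introsort loop ends and control returns to __sort, which finishes with the insertion sort __final_insertion_sort.
If there are more than _S_threshold elements, it checks whether the recursion depth limit has been reached. If so, it switches to heapsort; __partial_sort in the code is implemented with heapsort.
Otherwise it calls __unguarded_partition_pivot to perform one quicksort partition pass over the current range and return the cut position.
After partitioning, it calls itself recursively on the right part, then goes back to the while loop to handle the left part. The source is written differently from how we would usually write quicksort, but the principle is the same: the second recursive call has been replaced by the loop, a manual tail-recursion elimination that is worth noting.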
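
The control flow just described can be tried out with a simplified sketch built from public standard-library pieces. This is not the libstdc++ code: all names (introsort_loop_sketch, sort_sketch, kThreshold) are invented here, the element type is fixed to int, and the unguarded partition is swapped for two std::partition passes (a three-way split) to keep the sketch short and safe. std::partial_sort over the whole range plays the role of the heapsort fallback.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

typedef std::vector<int>::iterator Iter;  // hypothetical alias for this sketch

const std::ptrdiff_t kThreshold = 16;     // assumption: mirrors _S_threshold

void introsort_loop_sketch(Iter first, Iter last, int depth_limit) {
    while (last - first > kThreshold) {
        if (depth_limit == 0) {
            // Depth budget used up: sort what is left with partial_sort.
            std::partial_sort(first, last, last);
            return;
        }
        --depth_limit;
        // Median of the first, middle, and last elements as the pivot value.
        int a = *first, b = *(first + (last - first) / 2), c = *(last - 1);
        int pivot = std::max(std::min(a, b), std::min(std::max(a, b), c));
        // Swapped-in technique: split into < pivot, == pivot, > pivot.
        Iter lower = std::partition(first, last,
                                    [pivot](int x) { return x < pivot; });
        Iter upper = std::partition(lower, last,
                                    [pivot](int x) { return !(pivot < x); });
        assert(lower != upper);  // the pivot itself lies in [lower, upper)
        introsort_loop_sketch(upper, last, depth_limit);  // recurse on the right part
        last = lower;                                     // loop on the left part
    }
}

void sort_sketch(std::vector<int>& v) {
    if (v.empty()) return;
    int depth_limit = 0;
    for (std::size_t n = v.size(); n > 1; n >>= 1)
        depth_limit += 2;  // ends up as 2 * floor(log2(size))
    introsort_loop_sketch(v.begin(), v.end(), depth_limit);
    // Final pass: a simple insertion sort over the whole, nearly sorted range.
    for (Iter i = v.begin(); i != v.end(); ++i)
        std::rotate(std::upper_bound(v.begin(), i, *i), i, i + 1);
}
```

Note that only one recursive call appears in the loop body; the left part is handled by shrinking last and going around the while loop again, just as in the listing above.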

2. Quicksort: __unguarded_partition_pivot

The quicksort partition function __unguarded_partition_pivot, together with its helper __unguarded_partition, is as follows:

/// This is a helper function...
  template<typename _RandomAccessIterator, typename _Compare>
    _RandomAccessIterator
    __unguarded_partition(_RandomAccessIterator __first,
			  _RandomAccessIterator __last,
			  _RandomAccessIterator __pivot, _Compare __comp)
    {
      while (true)
	{
	  while (__comp(__first, __pivot))
	    ++__first;
	  --__last;
	  while (__comp(__pivot, __last))
	    --__last;
	  if (!(__first < __last))
	    return __first;
	  std::iter_swap(__first, __last);
	  ++__first;
	}
    }

  /// This is a helper function...
  template<typename _RandomAccessIterator, typename _Compare>
    inline _RandomAccessIterator
    __unguarded_partition_pivot(_RandomAccessIterator __first,
				_RandomAccessIterator __last, _Compare __comp)
    {
      _RandomAccessIterator __mid = __first + (__last - __first) / 2;
      std::__move_median_to_first(__first, __first + 1, __mid, __last - 1,
				  __comp);
      return std::__unguarded_partition(__first + 1, __last, __first, __comp);
    }

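This code is fairly easy to follow: it performs one partition pass and returns the cut position. __unguarded_partition() uses the two-iterator method described above to split the sequence into left and right subsequences. Note also __move_median_to_first, which is the median-of-three mentioned earlier. It picks the median of three elements near the head, middle, and tail (in the call above: __first + 1, __mid, and __last - 1) as the pivot and swaps it into __first. Its implementation is as follows: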

/// Swaps the median value of *__a, *__b and *__c under __comp to *__result
  template<typename _Iterator, typename _Compare>
    void
    __move_median_to_first(_Iterator __result,_Iterator __a, _Iterator __b,
			   _Iterator __c, _Compare __comp)
    {
      if (__comp(__a, __b))
	{
	  if (__comp(__b, __c))
	    std::iter_swap(__result, __b);
	  else if (__comp(__a, __c))
	    std::iter_swap(__result, __c);
	  else
	    std::iter_swap(__result, __a);
	}
      else if (__comp(__a, __c))
	std::iter_swap(__result, __a);
      else if (__comp(__b, __c))
	std::iter_swap(__result, __c);
      else
	std::iter_swap(__result, __b);
    }

3. Heapsort: __partial_sort

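As seen earlier in __introsort_loop, once the recursion depth limit is reached the algorithm switches to heapsort. __partial_sort in the code is implemented with heapsort; part of its implementation is shown below (the heapsort code is quite long):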

/// This is a helper function for the sort routines.
  template<typename _RandomAccessIterator, typename _Compare>
    void
    __heap_select(_RandomAccessIterator __first,
		  _RandomAccessIterator __middle,
		  _RandomAccessIterator __last, _Compare __comp)
    {
      std::__make_heap(__first, __middle, __comp);
      for (_RandomAccessIterator __i = __middle; __i < __last; ++__i)
	if (__comp(__i, __first))
	  std::__pop_heap(__first, __middle, __i, __comp);
    }

template<typename _RandomAccessIterator, typename _Compare>
    void
    __sort_heap(_RandomAccessIterator __first, _RandomAccessIterator __last,
		_Compare __comp)
    {
      while (__last - __first > 1)
	{
	  --__last;
	  std::__pop_heap(__first, __last, __last, __comp);
	}
    }

template<typename _RandomAccessIterator, typename _Compare>
    inline void
    __partial_sort(_RandomAccessIterator __first,
		   _RandomAccessIterator __middle,
		   _RandomAccessIterator __last,
		   _Compare __comp)
    {
      std::__heap_select(__first, __middle, __last, __comp);
      std::__sort_heap(__first, __middle, __comp);
    }
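
When __partial_sort is called with __middle equal to __last, as __introsort_loop does, the selection loop has nothing to scan and the routine reduces to an ordinary heapsort. A minimal sketch of that case using the public heap algorithms follows; heap_sort_sketch is a name invented for this example and the element type is fixed to int.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical helper: heapsort built from std::make_heap and std::pop_heap.
void heap_sort_sketch(std::vector<int>& v) {
    std::make_heap(v.begin(), v.end());  // build a max-heap over the whole range
    std::vector<int>::iterator last = v.end();
    while (last - v.begin() > 1) {
        std::pop_heap(v.begin(), last);  // move the current maximum to last - 1
        --last;                          // the heap shrinks, the sorted tail grows
    }
    assert(std::is_sorted(v.begin(), v.end()));
}
```

The loop does by hand what std::sort_heap does in a single call.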

4. Insertion sort: __final_insertion_sort

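After __introsort_loop finishes, every still-unsorted segment holds at most _S_threshold elements. Control then returns to __sort, which runs the insertion sort __final_insertion_sort, implemented as follows: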

/// This is a helper function for the sort routine.
  template<typename _RandomAccessIterator, typename _Compare>
    void
    __final_insertion_sort(_RandomAccessIterator __first,
			   _RandomAccessIterator __last, _Compare __comp)
    {
      if (__last - __first > int(_S_threshold))
	{
	  std::__insertion_sort(__first, __first + int(_S_threshold), __comp);
	  std::__unguarded_insertion_sort(__first + int(_S_threshold), __last,
					  __comp);
	}
      else
	std::__insertion_sort(__first, __last, __comp);
    }
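
The split between a guarded and an unguarded pass can be illustrated with a simplified sketch. The names below are invented for this example, the element type is fixed to int, and the inner loops are plainer than the library's. The unguarded version skips the bounds check, which is only safe under the stated assumption that a smallest element of the whole range sits within the first kThreshold positions. That holds after the introsort loop, because the segments are already ordered relative to each other.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

typedef std::vector<int>::iterator Iter;  // hypothetical alias for this sketch

const std::ptrdiff_t kThreshold = 16;     // assumption: mirrors _S_threshold

// Guarded: the inner loop checks that it does not walk past `first`.
void insertion_sort_sketch(Iter first, Iter last) {
    for (Iter i = first; i != last; ++i) {
        int value = *i;
        Iter j = i;
        while (j != first && value < *(j - 1)) { *j = *(j - 1); --j; }
        *j = value;
    }
}

// Unguarded: assumes some element before `first` is <= every element in
// [first, last), so the inner loop always stops without a bounds check.
void unguarded_insertion_sort_sketch(Iter first, Iter last) {
    for (Iter i = first; i != last; ++i) {
        int value = *i;
        Iter j = i;
        while (value < *(j - 1)) { *j = *(j - 1); --j; }
        *j = value;
    }
}

void final_insertion_sort_sketch(std::vector<int>& v) {
    if (v.end() - v.begin() > kThreshold) {
        // Precondition of this sketch: a minimum lies in the first kThreshold slots.
        assert(std::min_element(v.begin(), v.end()) - v.begin() < kThreshold);
        insertion_sort_sketch(v.begin(), v.begin() + kThreshold);
        unguarded_insertion_sort_sketch(v.begin() + kThreshold, v.end());
    } else {
        insertion_sort_sketch(v.begin(), v.end());
    }
}
```

Dropping one comparison per inner-loop step is a small saving, but this pass touches every element, so it adds up.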