Algorithms (4th Edition), Chapter 5.1: String Sorts

Date: 2019-12-05

Tags: algorithms, chapter 5.1, string sorts

Algorithms Fourth Edition
Written By Robert Sedgewick & Kevin Wayne
Translated By 謝路雲
Chapter 5 Section 1: String Sorts

References
http://blog.csdn.net/guanhang...

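Introduction

Are strings convenient to compare directly? Not really.
So what can we do? Map each character to a number: toIndex(c).
How many characters are there in total? R.
How many bits does it take to represent one of R values? lg R (rounded up).
- For example, extended ASCII has 256 characters, so each one takes 8 bits.
The difference between two similar-looking calls:
- Alphabet.toChar(index) maps a number to a character: it returns the character at position index in the alphabet.
- String.charAt(index) returns the character at position index in the string.

Alphabet API

Standard alphabets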
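
The character-to-index mapping described above can be sketched as a small class. This is only an illustration: `SimpleAlphabet` and its method names other than toIndex and toChar are hypothetical, it is not the book's Alphabet class, and the "ACGT" alphabet in the usage note is just a sample.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration class; not the Alphabet class from the book.
class SimpleAlphabet {
    private final char[] chars;                   // index -> character
    private final Map<Character, Integer> index;  // character -> index

    SimpleAlphabet(String s) {
        chars = s.toCharArray();
        index = new HashMap<>();
        for (int i = 0; i < chars.length; i++) {
            if (index.put(chars[i], i) != null)
                throw new IllegalArgumentException("duplicate character: " + chars[i]);
        }
    }

    // Number of characters in the alphabet (R).
    int radix() {
        return chars.length;
    }

    // Number of bits needed to represent an index in 0 .. R-1.
    int lgR() {
        int bits = 0;
        for (int t = chars.length - 1; t >= 1; t /= 2)
            bits++;
        return bits;
    }

    // Maps a character to its position in the alphabet.
    int toIndex(char c) {
        Integer i = index.get(c);
        if (i == null)
            throw new IllegalArgumentException("character not in alphabet: " + c);
        return i;
    }

    // Maps a position in the alphabet back to its character.
    char toChar(int i) {
        if (i < 0 || i >= chars.length)
            throw new IllegalArgumentException("index out of range: " + i);
        return chars[i];
    }
}
```

For example, with `new SimpleAlphabet("ACGT")`, radix() is 4, lgR() is 2, toIndex('G') is 2, and toChar(3) is 'T'.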

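Key-indexed counting

The input is a list of strings, each paired with a group number (the group number serves as the key for that string).
The goal is to arrange the items so that the groups appear in increasing order while the strings within each group stay in alphabetical order.

Algorithm steps

Step 1: count the frequency of each group
(To find the range an item will occupy after sorting, note that group 2, for example, must come after group 1 and before group 3. Recording how many items each group contains lets us locate those ranges.)

Suppose there are 5 groups, numbered 0 through 4. Create an array of size 6 (= 5 + 1): int[] count = new int[6];
count[] records the frequencies.
The frequency of key r is recorded at index r + 1; the offset of 1 makes it easy to turn the counts into the starting position of each key later.

Step 2: transform the counts into indices
(This yields the starting position of each group.)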
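
As a small worked example of steps 1 and 2, the sketch below counts keys in the range 0 to 4 and then turns the counts into starting positions. The class name `CountToIndexDemo` and the sample keys are made up for illustration.

```java
import java.util.Arrays;

// Hypothetical demo class for steps 1 and 2 of key-indexed counting.
class CountToIndexDemo {
    // Returns count[] after steps 1 and 2: count[r] is the starting position of group r.
    static int[] startPositions(int[] keys, int R) {
        int[] count = new int[R + 1];
        for (int key : keys)          // step 1: the frequency of key r goes to count[r + 1]
            count[key + 1]++;
        for (int r = 0; r < R; r++)   // step 2: running sums turn frequencies into start positions
            count[r + 1] += count[r];
        return count;
    }

    public static void main(String[] args) {
        int[] keys = {2, 3, 3, 4, 1, 3, 4, 3, 1, 2};
        // prints [0, 0, 2, 4, 8, 10]
        System.out.println(Arrays.toString(startPositions(keys, 5)));
    }
}
```

Reading the result: group 0 is empty and group 1 starts at position 0, group 2 starts at 2, group 3 at 4, and group 4 at 8. The last entry, 10, is the total number of items.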

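Step 3: distribute

Create an auxiliary copy (the original array is still being traversed, so it must not be overwritten yet).
Drop each item into the auxiliary array at the current starting position of its group.
- The input data is already in order.
- The next few points are my own side thoughts and can be skipped.
- If the original data is in order, the data within each group will also be in order.
- If the original data is not in order, sort it first.
- There is something recursive about this idea:
  - sort the outer level first, then sort the inner levels one layer at a time, or
  - sort the inner level first, then sort the outer levels one layer at a time.
Then move that group's starting position forward by one (the current slot is now taken).

Step 4: copy back

Copy the data from the auxiliary array back into the original array.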

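KeyIndexedCounting code

Complexity
- Array accesses: 8N + 3R + 1 (the sum of the per-line counts annotated in the code below)
Key-indexed counting is stable.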

int N = a.length;
String[] aux = new String[N];   // N array accesses (initialization)
int[] count = new int[R+1];     // R+1 array accesses (initialization)
// Compute frequency counts.
for (int i = 0; i < N; i++)     // 2N array accesses
    count[a[i].key()+1]++;
// Transform counts to indices.
for (int r = 0; r < R; r++)     // 2R array accesses, R additions
    count[r+1] += count[r];
// Distribute the records.
for (int i = 0; i < N; i++)     // 3N array accesses: N counter increments and N data moves
    aux[count[a[i].key()]++] = a[i];
// Copy back.
for (int i = 0; i < N; i++)     // 2N array accesses, N data moves
    a[i] = aux[i];
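
The fragment above leaves the item type and R undefined. A minimal runnable version might look like the following sketch, where `KeyIndexedCountingDemo`, the `Student` type, and the sample data are all assumptions made for illustration. It also shows stability: names that were in alphabetical order stay in that order within each group.

```java
import java.util.Arrays;

// Hypothetical runnable version of the key-indexed counting fragment.
class KeyIndexedCountingDemo {
    // Hypothetical item type: a name plus an integer key (group) in 0 .. R-1.
    static final class Student {
        final String name;
        final int key;

        Student(String name, int key) {
            this.name = name;
            this.key = key;
        }

        int key() {
            return key;
        }

        @Override
        public String toString() {
            return name + ":" + key;
        }
    }

    // Stably sorts a[] by key, assuming every key is in 0 .. R-1.
    static void sort(Student[] a, int R) {
        int n = a.length;
        Student[] aux = new Student[n];
        int[] count = new int[R + 1];
        for (int i = 0; i < n; i++)   // frequency counts
            count[a[i].key() + 1]++;
        for (int r = 0; r < R; r++)   // counts to starting positions
            count[r + 1] += count[r];
        for (int i = 0; i < n; i++)   // distribute, advancing each group's position
            aux[count[a[i].key()]++] = a[i];
        for (int i = 0; i < n; i++)   // copy back
            a[i] = aux[i];
    }

    public static void main(String[] args) {
        Student[] a = {
            new Student("Anderson", 2), new Student("Brown", 3),
            new Student("Davis", 3), new Student("Garcia", 4),
            new Student("Harris", 1), new Student("Jackson", 3)
        };
        sort(a, 5);
        // prints [Harris:1, Anderson:2, Brown:3, Davis:3, Jackson:3, Garcia:4]
        System.out.println(Arrays.toString(a));
    }
}
```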

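LSD (least-significant-digit-first) string sort

Building on key-indexed counting: start at the low-order end of the strings (the rightmost character) and work from right to left. Each character position in turn serves as the key, and the whole array of strings is sorted by that key. This works because key-indexed counting is stable: a pass on a more significant position keeps the order that earlier passes established among strings that tie on that position.
Limitation of the code below: every string must have the same length. A small modification lets it handle strings of different lengths.

LSD code

Complexity
- Array accesses
  - Worst case: ~7WN + 3WR
  - Best case (a single pass, W = 1): about 8N + 3R
- Extra space: proportional to R + N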

public class LSD {
    public static void sort(String[] a, int W) { // Sort a[] on leading W characters.
        int N = a.length;
        int R = 256;
        String[] aux = new String[N];
        for (int d = W - 1; d >= 0; d--) { // Sort by key-indexed counting on dth char.
            int[] count = new int[R + 1];  // array of size R + 1
            for (int i = 0; i < N; i++) // Compute frequency counts.
                count[a[i].charAt(d) + 1]++;
            for (int r = 0; r < R; r++) // Transform counts to indices.
                count[r + 1] += count[r];
            for (int i = 0; i < N; i++) // Distribute into the auxiliary array by key.
                aux[count[a[i].charAt(d)]++] = a[i];
            for (int i = 0; i < N; i++) // Copy back to the original array.
                a[i] = aux[i];
        }
    }
}
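
The notes above mention that a small modification lets LSD handle strings of different lengths. One possible way to do that, sketched here as an assumption rather than as the book's method, is to treat a missing character as a key smaller than every real character and to run one pass per position of the longest string. The class name `LsdVariableLength` is hypothetical, and the sketch assumes every character code is below 256.

```java
import java.util.Arrays;

// Hypothetical variant of LSD for strings of different lengths.
class LsdVariableLength {
    private static final int R = 256; // assumes extended-ASCII input

    // Returns the character at position d, or -1 when d is past the end of s.
    private static int charAt(String s, int d) {
        return d < s.length() ? s.charAt(d) : -1;
    }

    static void sort(String[] a) {
        int n = a.length;
        int w = 0;                                   // length of the longest string
        for (String s : a)
            w = Math.max(w, s.length());
        String[] aux = new String[n];
        for (int d = w - 1; d >= 0; d--) {
            int[] count = new int[R + 2];            // one extra slot for "no character"
            for (int i = 0; i < n; i++)              // frequency counts, shifted by one more slot
                count[charAt(a[i], d) + 2]++;
            for (int r = 0; r < R + 1; r++)          // counts to starting positions
                count[r + 1] += count[r];
            for (int i = 0; i < n; i++)              // distribute; "no character" uses slot 0
                aux[count[charAt(a[i], d) + 1]++] = a[i];
            for (int i = 0; i < n; i++)              // copy back
                a[i] = aux[i];
        }
    }

    public static void main(String[] args) {
        String[] a = {"she", "sells", "sea", "shells", "by", "the", "a", "as"};
        sort(a);
        System.out.println(Arrays.toString(a));
    }
}
```

Because every pass is stable and a missing character sorts first, a string that is a prefix of another (such as "a" and "as") ends up before the longer one.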

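MSD (most-significant-digit-first) string sort

Consider comparing strings of different lengths.

For example, "as" should come before "aspect". To handle this, add one more group that counts the strings that have no character at the current position.
This group must come first, so it is count[0].
- How do we make "no character" land in count[0]?
- When there is no character, the corresponding number is 0 (in the implementation, charAt returns -1 and 1 is then added to it).
- Every other character's number is shifted up by 1 (this frees up slot 0, and all other positions move along by one).
int[] count = new int[R+2];
- The size used to be R+1.
- Adding 1 to that gives R+2.
"No character" means the position being examined is past the end of the string.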
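
To make the shift concrete, here is a tiny sketch of the convention just described. `EndOfStringDemo` and `group` are hypothetical names used only for this illustration.

```java
// Hypothetical demo of the "no character sorts first" convention.
class EndOfStringDemo {
    // Returns the character at position d, or -1 when d is past the end of s.
    static int charAt(String s, int d) {
        return d < s.length() ? s.charAt(d) : -1;
    }

    // Group number used when sorting on position d:
    // 0 means "no character", and every real character c maps to c + 1.
    static int group(String s, int d) {
        return charAt(s, d) + 1;
    }

    public static void main(String[] args) {
        System.out.println(group("as", 2));      // position 2 is past the end of "as": prints 0
        System.out.println(group("aspect", 2));  // 'p' has code 112: prints 113
    }
}
```

Since 0 is smaller than any shifted character code, "as" lands in the first group and therefore before "aspect".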

MSD code

public class MSD {
    private static int R = 256; // radix: 256 characters
    private static final int M = 15; // cutoff for small subarrays, below which insertion sort is used
    private static String[] aux; // auxiliary array for distribution

    private static int charAt(String s, int d) {
        if (d < s.length())
            return s.charAt(d);
        else
            return -1;
    }

    public static void sort(String[] a) {
        int N = a.length;
        aux = new String[N];
        sort(a, 0, N - 1, 0);
    }

    // Sort from a[lo] to a[hi], starting at the dth character.
    private static void sort(String[] a, int lo, int hi, int d) { 
        // For small subarrays, use insertion sort (the implementation of Insertion.sort is omitted).
        if (hi <= lo + M) {
            Insertion.sort(a, lo, hi, d);
            return;
        }
        
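        int[] count = new int[R + 2]; // array of size R + 2
        for (int i = lo; i <= hi; i++) // Compute frequency counts (only the hi - lo + 1 items of this subarray).
            count[charAt(a[i], d) + 2]++; // each index is shifted up by 1 compared with key-indexed counting
        for (int r = 0; r < R + 1; r++) // Transform counts to indices.
            count[r + 1] += count[r];
        for (int i = lo; i <= hi; i++) // Distribute into the auxiliary array by group.
            aux[count[charAt(a[i], d) + 1]++] = a[i]; // each index is shifted up by 1 compared with key-indexed counting
                                                      // aux is filled from aux[0] through aux[hi - lo]
                                                      // count changes here. Before, the increment only advanced to the slot for the next item;
                                                      // now, after this loop, count[r] and count[r + 1] also delimit the subarray for character r,
                                                      // so an empty range (count[r] == count[r + 1]) ends the recursion immediately.
        for (int i = lo; i <= hi; i++) // Copy back. Note the offset between aux indices and a indices.
            a[i] = aux[i - lo];
        // Recursively sort for each character value.
        // r = 0 is the character with code 0, not the end-of-string group: strings that end at position d
        // occupy a[lo .. lo + count[0] - 1] and are already in place, so the loop has to start at r = 0.
        for (int r = 0; r < R; r++)
            sort(a, lo + count[r], lo + count[r + 1] - 1, d + 1); // strings sharing this character form one group, sorted on the next character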
    }
}
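
The MSD code calls Insertion.sort(a, lo, hi, d) without showing it. A minimal sketch of such a helper follows; the class is named `Insertion` only to match that call, and its body is an assumption about what the omitted helper does: an insertion sort that compares strings from character d onward, assuming the strings in a[lo..hi] already agree on their first d characters.

```java
// Sketch of the insertion-sort helper that the MSD code calls but does not show.
class Insertion {
    // Sorts a[lo..hi], comparing strings from character d onward.
    static void sort(String[] a, int lo, int hi, int d) {
        for (int i = lo; i <= hi; i++) {
            for (int j = i; j > lo && less(a[j], a[j - 1], d); j--) {
                String t = a[j];      // swap a[j] and a[j - 1]
                a[j] = a[j - 1];
                a[j - 1] = t;
            }
        }
    }

    // Is v less than w, looking only at characters d and beyond?
    private static boolean less(String v, String w, int d) {
        int n = Math.min(v.length(), w.length());
        for (int i = d; i < n; i++) {
            if (v.charAt(i) != w.charAt(i))
                return v.charAt(i) < w.charAt(i);
        }
        return v.length() < w.length(); // a proper prefix sorts first
    }
}
```

Skipping the first d characters avoids re-comparing the prefix that MSD has already established as equal for the whole subarray.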

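LSD
- Works from right to left; every pass treats all N strings as a single group and sorts the whole array.
MSD
- Works from left to right; at each step the strings that share the same i-th character form a group, and each group is sorted on character i+1.

3-way string quicksort

Handles equal keys, long common prefixes, small subarrays, and keys drawn from a small range well.
Avoids creating large numbers of empty subarrays and needs no auxiliary array.

Quick3string code

Complexity
- Average: ~2N ln N character compares (for random strings)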

public class Quick3string {
    private static int charAt(String s, int d) {
        if (d < s.length())
            return s.charAt(d);
        else
            return -1;
    }

    public static void sort(String[] a) {
        sort(a, 0, a.length - 1, 0);
    }

    private static void sort(String[] a, int lo, int hi, int d) {
        if (hi <= lo)
            return;
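        int lt = lo, gt = hi; // lower pointer, upper pointer
        int v = charAt(a[lo], d); // partitioning value
        int i = lo + 1; // start with character d of the second string
        while (i <= gt) {
            int t = charAt(a[i], d);
            if (t < v) // smaller than the partitioning value: move it in front of the equal region
                exch(a, lt++, i++);
            else if (t > v) // larger than the partitioning value: move it to the end
                exch(a, i, gt--);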
            else
                i++;
        }

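        // a[lo..lt-1] < v = a[lt..gt] < a[gt+1..hi]
        sort(a, lo, lt - 1, d);
        if (v >= 0) // a[lt..gt] share character d and it is not end-of-string, so compare them from the next character
            sort(a, lt, gt, d + 1);
        sort(a, gt + 1, hi, d);
    }

    // Exchange a[i] and a[j].
    private static void exch(String[] a, int i, int j) {
        String t = a[i];
        a[i] = a[j];
        a[j] = t;
    }
}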
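
To see what a single partitioning pass does, the sketch below isolates the while loop from the sort above and returns the final lt and gt. `ThreeWayPartitionDemo`, `partition`, and the sample words are hypothetical and exist only for this illustration.

```java
import java.util.Arrays;

// Hypothetical demo: one 3-way partitioning pass on character d.
class ThreeWayPartitionDemo {
    // Returns the character at position d, or -1 when d is past the end of s.
    static int charAt(String s, int d) {
        return d < s.length() ? s.charAt(d) : -1;
    }

    // Partitions a[lo..hi] on character d around v = charAt(a[lo], d).
    // Returns {lt, gt}: a[lo..lt-1] are smaller, a[lt..gt] equal, a[gt+1..hi] larger at position d.
    static int[] partition(String[] a, int lo, int hi, int d) {
        int lt = lo, gt = hi, i = lo + 1;
        int v = charAt(a[lo], d);
        while (i <= gt) {
            int t = charAt(a[i], d);
            if (t < v) swap(a, lt++, i++);
            else if (t > v) swap(a, i, gt--);
            else i++;
        }
        return new int[] {lt, gt};
    }

    private static void swap(String[] a, int i, int j) {
        String t = a[i];
        a[i] = a[j];
        a[j] = t;
    }

    public static void main(String[] args) {
        String[] a = {"she", "sells", "by", "the", "sea", "are"};
        int[] bounds = partition(a, 0, a.length - 1, 0);
        System.out.println(Arrays.toString(bounds)); // prints [2, 4]
        System.out.println(Arrays.toString(a));      // prints [by, are, she, sells, sea, the]
    }
}
```

After this pass only the middle region (the words starting with 's') moves on to character d + 1; the two outer regions are sorted again on the same character.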
