Algorithm Notes 5: Prefix Trees (Tries), Bucket Sort, and a Summary of Sorting Algorithms

Date: 2020-07-17


1 Prefix Tree (Trie), Bucket Sort, and Sorting Summary

1.1 Prefix Tree (Trie)

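Insert the characters of each string, from first to last, into a multi-way tree.

Characters are placed on the edges (paths); each node holds its own data fields (commonly pass and end).

Add every sample this way: if a path does not exist, create it; if it does, reuse it.

Every node along the way has its pass value incremented by 1, and the node reached at the end of each string has its end value incremented by 1.

If the strings in an array contain N characters in total, adding the whole array to the trie costs O(N).

Function 1: once the trie is built, it is easy to check whether a string is in it and how many times it was added. For example, to count "ab", first check whether there is a path for the character a (if not, "ab" is absent), then follow the path for b and check the end value of the node reached. If end is 0, "ab" is not in the trie; if end > 0, its value is the number of times "ab" was added.

Function 2: function 1 alone could also be implemented with a hash table. The trie can additionally answer how many of the inserted strings start with the prefix "a": follow the path for "a" and read that node's pass value, which equals the number of strings with prefix "a".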

package class05;

import java.util.HashMap;

public class Code02_TrieTree {

	public static class Node1 {
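		// pass: how many inserted strings pass through this node
		public int pass;
		// end: how many inserted strings end at this node
		public int end;
		public Node1[] nexts;

		public Node1() {
			pass = 0;
			end = 0;
			// Each node has 26 paths by default, one per letter a~z
			// (this version only supports lowercase a~z)
			// 0    a
			// 1    b
			// 2    c
			// ..   ..
			// 25   z
			// nexts[i] == null   the path in direction i does not exist
			// nexts[i] != null   the path in direction i exists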
			nexts = new Node1[26];
		}
	}

	public static class Trie1 {
		// Initially only the root (head) node exists
		private Node1 root;

		public Trie1() {
			root = new Node1();
		}

		// Add a string to the trie
		public void insert(String word) {
			if (word == null) {
				return;
			}
			char[] str = word.toCharArray();
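			// Start from the root
			Node1 node = root;
			// The root's pass is incremented first
			node.pass++;
			// Index of the path to take
			int path = 0;
			for (int i = 0; i < str.length; i++) { // traverse the characters left to right
				// Subtract 'a' from the character to get the index of the next node
				path = str[i] - 'a'; // map the character to the path it takes
				// No node in this direction yet, i.e. the path does not exist: create it
				if (node.nexts[path] == null) {
					node.nexts[path] = new Node1();
				}
				// Move to the node just reached
				node = node.nexts[path];
				// Increment pass on the current node
				node.pass++;
			}
			// After all characters are processed, node is the string's end node: end++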
			node.end++;
		}

		// Delete one occurrence of a string from the trie
		public void delete(String word) {
			// First check that the string has been added
			if (search(word) != 0) {
				// Decrement pass along the way
				char[] chs = word.toCharArray();
				Node1 node = root;
				node.pass--;
				int path = 0;
				for (int i = 0; i < chs.length; i++) {
					path = chs[i] - 'a';
					// If a child's pass drops to 0, no remaining string uses the path below it,
					// so that whole subtree can be dropped right away.
					// Setting the reference to null lets the JVM garbage-collect the subtree.
					if (--node.nexts[path].pass == 0) {
						node.nexts[path] = null;
						return;
					}
					node = node.nexts[path];
				}
				// Finally, end--
				node.end--;
			}
		}
		// Search the trie:
		// how many times was word added before
		public int search(String word) {
			if (word == null) {
				return 0;
			}
			char[] chs = word.toCharArray();
			Node1 node = root;
			int index = 0;
			for (int i = 0; i < chs.length; i++) {
				index = chs[i] - 'a';
				// If the path runs out before the word ends, the word was never added: 0 times
				if (node.nexts[index] == null) {
					return 0;
				}
				node = node.nexts[index];
			}
			// The whole word was walked successfully: this node's end value is how many times word was added
			return node.end;
		}

		// Among all inserted strings, how many have pre as a prefix
		public int prefixNumber(String pre) {
			if (pre == null) {
				return 0;
			}
			char[] chs = pre.toCharArray();
			Node1 node = root;
			int index = 0;
			for (int i = 0; i < chs.length; i++) {
				index = chs[i] - 'a';
				// Cannot reach the end of pre: no string has this prefix
				if (node.nexts[index] == null) {
					return 0;
				}
				node = node.nexts[index];
			}
			// Reached the end of pre: its pass value is the number of strings with prefix pre
			return node.pass;
		}
	}


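	/**
	 * Implementation 2: to support arbitrary characters rather than only the 26 letters a~z,
	 * each node stores its paths in a HashMap<Integer, Node2> keyed by the character code.
	 */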
	public static class Node2 {
		public int pass;
		public int end;
		public HashMap<Integer, Node2> nexts;

		public Node2() {
			pass = 0;
			end = 0;
			nexts = new HashMap<>();
		}
	}

	public static class Trie2 {
		private Node2 root;

		public Trie2() {
			root = new Node2();
		}

		public void insert(String word) {
			if (word == null) {
				return;
			}
			char[] chs = word.toCharArray();
			Node2 node = root;
			node.pass++;
			int index = 0;
			for (int i = 0; i < chs.length; i++) {
				index = (int) chs[i];
				if (!node.nexts.containsKey(index)) {
					node.nexts.put(index, new Node2());
				}
				node = node.nexts.get(index);
				node.pass++;
			}
			node.end++;
		}

		public void delete(String word) {
			if (search(word) != 0) {
				char[] chs = word.toCharArray();
				Node2 node = root;
				node.pass--;
				int index = 0;
				for (int i = 0; i < chs.length; i++) {
					index = (int) chs[i];
					if (--node.nexts.get(index).pass == 0) {
						node.nexts.remove(index);
						return;
					}
					node = node.nexts.get(index);
				}
				node.end--;
			}
		}

		// How many times word was added before
		public int search(String word) {
			if (word == null) {
				return 0;
			}
			char[] chs = word.toCharArray();
			Node2 node = root;
			int index = 0;
			for (int i = 0; i < chs.length; i++) {
				index = (int) chs[i];
				if (!node.nexts.containsKey(index)) {
					return 0;
				}
				node = node.nexts.get(index);
			}
			return node.end;
		}

		// Among all inserted strings, how many have pre as a prefix
		public int prefixNumber(String pre) {
			if (pre == null) {
				return 0;
			}
			char[] chs = pre.toCharArray();
			Node2 node = root;
			int index = 0;
			for (int i = 0; i < chs.length; i++) {
				index = (int) chs[i];
				if (!node.nexts.containsKey(index)) {
					return 0;
				}
				node = node.nexts.get(index);
			}
			return node.pass;
		}
	}

	// Brute-force reference implementation (HashMap-based) used to check the two tries
	public static class Right {

		private HashMap<String, Integer> box;

		public Right() {
			box = new HashMap<>();
		}

		public void insert(String word) {
			if (!box.containsKey(word)) {
				box.put(word, 1);
			} else {
				box.put(word, box.get(word) + 1);
			}
		}

		public void delete(String word) {
			if (box.containsKey(word)) {
				if (box.get(word) == 1) {
					box.remove(word);
				} else {
					box.put(word, box.get(word) - 1);
				}
			}
		}

		public int search(String word) {
			if (!box.containsKey(word)) {
				return 0;
			} else {
				return box.get(word);
			}
		}

		public int prefixNumber(String pre) {
			int count = 0;
			for (String cur : box.keySet()) {
				if (cur.startsWith(pre)) {
					count += box.get(cur);
				}
			}
			return count;
		}
	}

	// for test
	public static String generateRandomString(int strLen) {
		char[] ans = new char[(int) (Math.random() * strLen) + 1];
		for (int i = 0; i < ans.length; i++) {
			int value = (int) (Math.random() * 6);
			ans[i] = (char) (97 + value);
		}
		return String.valueOf(ans);
	}

	// for test
	public static String[] generateRandomStringArray(int arrLen, int strLen) {
		String[] ans = new String[(int) (Math.random() * arrLen) + 1];
		for (int i = 0; i < ans.length; i++) {
			ans[i] = generateRandomString(strLen);
		}
		return ans;
	}

	public static void main(String[] args) {
		int arrLen = 100;
		int strLen = 20;
		int testTimes = 100000;
		for (int i = 0; i < testTimes; i++) {
			String[] arr = generateRandomStringArray(arrLen, strLen);
			Trie1 trie1 = new Trie1();
			Trie2 trie2 = new Trie2();
			Right right = new Right();
			for (int j = 0; j < arr.length; j++) {
				double decide = Math.random();
				if (decide < 0.25) {
					trie1.insert(arr[j]);
					trie2.insert(arr[j]);
					right.insert(arr[j]);
				} else if (decide < 0.5) {
					trie1.delete(arr[j]);
					trie2.delete(arr[j]);
					right.delete(arr[j]);
				} else if (decide < 0.75) {
					int ans1 = trie1.search(arr[j]);
					int ans2 = trie2.search(arr[j]);
					int ans3 = right.search(arr[j]);
					if (ans1 != ans2 || ans2 != ans3) {
						System.out.println("Oops!");
					}
				} else {
					int ans1 = trie1.prefixNumber(arr[j]);
					int ans2 = trie2.prefixNumber(arr[j]);
					int ans3 = right.prefixNumber(arr[j]);
					if (ans1 != ans2 || ans2 != ans3) {
						System.out.println("Oops!");
					}
				}
			}
		}
		System.out.println("finish!");

	}

}

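1.2 Non-Comparison Sorting: Bucket Sort

Example: sort an array of employee ages. The data range is limited, so count the frequency of each age in an array arr[0~200] initialized to 0, where M = 200.

This trades space for time.

1.2.1 Counting Sort

Sorts built on the bucket idea: counting sort and radix sort

1. Sorts built on the bucket idea are all non-comparison sorts.

2. Time complexity is O(N) (more precisely O(N + M)); extra space complexity is O(M).

3. Their applicability is limited: the sample data must fit the bucket partition.

Drawback: strongly dependent on the distribution of the sample data.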

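1.2.2 Radix Sort

Preconditions: decimal, non-negative data

Sort [100,17,29,13,5,27] =>

1. Find the number of digits of the largest number; here 100 has 3 digits. Pad the other numbers with leading zeros to get

[100,017,029,013,005,027]

2. Prepare 10 buckets numbered 0~9, each of which is a queue. Put each sample into the bucket matching its ones digit; samples with the same ones digit line up in the same queue. Then pour the buckets out in order starting from bucket 0, each queue first-in, first-out. After entering by the ones digit and pouring out in order, we get:

[100,013,005,017,027,029]

3. Take the samples poured out after the ones-digit pass, put them into buckets by the tens digit, and pour them out by the same rule:

[100,005,013,017,027,029]

4. Put the resulting samples into buckets by the hundreds digit and pour them out:

[005,013,017,027,029,100]

The array is now sorted!

Idea: first sort by the ones digit; then use the tens digit to adjust that order, and then the hundreds digit. Priority increases with each digit, the hundreds digit being highest; when two numbers share the same hundreds digit, they keep the order produced by the previous (tens-digit) pass, and so on.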
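The four steps above can be sketched directly with ten FIFO queues, one per decimal digit. This is a minimal sketch under the same assumption as the text (non-negative decimal integers); the class `QueueRadixSortDemo` and its method `sortWithQueues` are illustrative names, not part of the original code, and the method returns the array state after every pass so it can be compared with steps 2 to 4.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Queue;

public class QueueRadixSortDemo {

    // LSD radix sort for non-negative ints using ten FIFO queues (buckets 0..9).
    // Sorts arr in place and returns a copy of the array after each pass.
    public static List<int[]> sortWithQueues(int[] arr) {
        List<int[]> passes = new ArrayList<>();
        if (arr == null || arr.length < 2) {
            return passes;
        }
        int max = 0;
        for (int v : arr) {
            max = Math.max(max, v);
        }
        // Number of decimal digits of the maximum (at least one pass)
        int digits = String.valueOf(max).length();
        List<Queue<Integer>> buckets = new ArrayList<>();
        for (int b = 0; b < 10; b++) {
            buckets.add(new ArrayDeque<>());
        }
        int divisor = 1;
        for (int d = 0; d < digits; d++) {
            // Distribute: each number enters the queue of its current digit
            for (int v : arr) {
                buckets.get((v / divisor) % 10).offer(v);
            }
            // Collect: pour out queues 0..9 in order; FIFO keeps ties in their previous order
            int i = 0;
            for (Queue<Integer> q : buckets) {
                while (!q.isEmpty()) {
                    arr[i++] = q.poll();
                }
            }
            passes.add(arr.clone());
            divisor *= 10;
        }
        return passes;
    }

    public static void main(String[] args) {
        int[] arr = {100, 17, 29, 13, 5, 27};
        for (int[] state : sortWithQueues(arr)) {
            System.out.println(Arrays.toString(state));
        }
    }
}
```

Running `main` should print `[100, 13, 5, 17, 27, 29]`, `[100, 5, 13, 17, 27, 29]` and `[5, 13, 17, 27, 29, 100]`, matching steps 2 to 4 without the leading zeros. The zero padding in step 1 is only conceptual: a missing high digit simply reads as 0.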

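Conclusion: the time complexity lower bound for comparison-based sorting is O(NlogN), while non-comparison sorting can reach O(N). In interviews or coding practice, however, estimate the cost of sorting with the comparison-based bound, since non-comparison sorts only apply to restricted data.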

/**
 * Counting sort
 **/
package class05;

import java.util.Arrays;

public class Code03_CountSort {

	// Counting sort
	// Only for small non-negative values (e.g. 0~200): allocates a bucket array of size max + 1
	public static void countSort(int[] arr) {
		if (arr == null || arr.length < 2) {
			return;
		}
		int max = Integer.MIN_VALUE;
		for (int i = 0; i < arr.length; i++) {
			max = Math.max(max, arr[i]);
		}
		int[] bucket = new int[max + 1];
		for (int i = 0; i < arr.length; i++) {
			bucket[arr[i]]++;
		}
		int i = 0;
		for (int j = 0; j < bucket.length; j++) {
			while (bucket[j]-- > 0) {
				arr[i++] = j;
			}
		}
	}

	// for test
	public static void comparator(int[] arr) {
		Arrays.sort(arr);
	}

	// for test
	public static int[] generateRandomArray(int maxSize, int maxValue) {
		int[] arr = new int[(int) ((maxSize + 1) * Math.random())];
		for (int i = 0; i < arr.length; i++) {
			arr[i] = (int) ((maxValue + 1) * Math.random());
		}
		return arr;
	}

	// for test
	public static int[] copyArray(int[] arr) {
		if (arr == null) {
			return null;
		}
		int[] res = new int[arr.length];
		for (int i = 0; i < arr.length; i++) {
			res[i] = arr[i];
		}
		return res;
	}

	// for test
	public static boolean isEqual(int[] arr1, int[] arr2) {
		if ((arr1 == null && arr2 != null) || (arr1 != null && arr2 == null)) {
			return false;
		}
		if (arr1 == null && arr2 == null) {
			return true;
		}
		if (arr1.length != arr2.length) {
			return false;
		}
		for (int i = 0; i < arr1.length; i++) {
			if (arr1[i] != arr2[i]) {
				return false;
			}
		}
		return true;
	}

	// for test
	public static void printArray(int[] arr) {
		if (arr == null) {
			return;
		}
		for (int i = 0; i < arr.length; i++) {
			System.out.print(arr[i] + " ");
		}
		System.out.println();
	}

	// for test
	public static void main(String[] args) {
		int testTime = 500000;
		int maxSize = 100;
		int maxValue = 150;
		boolean succeed = true;
		for (int i = 0; i < testTime; i++) {
			int[] arr1 = generateRandomArray(maxSize, maxValue);
			int[] arr2 = copyArray(arr1);
			countSort(arr1);
			comparator(arr2);
			if (!isEqual(arr1, arr2)) {
				succeed = false;
				printArray(arr1);
				printArray(arr2);
				break;
			}
		}
		System.out.println(succeed ? "Nice!" : "Fucking fucked!");

		int[] arr = generateRandomArray(maxSize, maxValue);
		printArray(arr);
		countSort(arr);
		printArray(arr);

	}

}

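The idea behind the code below:

Take the array [101,003,202,41,302]. Counting its ones digits gives the frequency array count = [0,2,2,1,0,0,0,0,0,0]. Accumulating count gives count' = [0,2,4,5,5,5,5,5,5,5]. Here count'[i] means how many numbers have a ones digit less than or equal to i: 0 numbers have a ones digit <= 0, two have a ones digit <= 1, four have a ones digit <= 2, and so on.

With count' in hand, traverse the original array [101,003,202,41,302] from right to left. By the radix-sort idea, 302 should be the last number poured out of bucket 2. We know 4 numbers have a ones digit <= 2, so 302 is the last of those 4 and goes to index 3 of the help array, and count'[2] is decremented to 3. Next, 202 goes to index 2 and count'[2] becomes 2. Likewise, 41 is the last number in bucket 1; 2 numbers have a ones digit <= 1, so 41 goes to index 1 and count'[1] is decremented to 1, and so on.

In effect, the count and count' arrays replace ten queue structures. If you don't want the trick, allocating 10 queues and following the radix-sort idea directly works just as well.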
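To check the walkthrough, the sketch below runs a single ones-digit pass of the count/count' technique on [101,3,202,41,302] (003 written as 3). It is an illustration under stated assumptions: the class `CountPrefixDemo` and the helpers `digitAt` and `onePass` are hypothetical names, and only one pass is performed, not a full sort.

```java
import java.util.Arrays;

public class CountPrefixDemo {

    // Returns the decimal digit of x at position d (d = 1 is the ones digit)
    static int digitAt(int x, int d) {
        for (int k = 1; k < d; k++) {
            x /= 10;
        }
        return x % 10;
    }

    // One stable counting pass on digit d; returns the reordered copy.
    // countOut / prefixOut receive the count and count' arrays for inspection.
    public static int[] onePass(int[] arr, int d, int[] countOut, int[] prefixOut) {
        int[] count = new int[10];
        for (int v : arr) {
            count[digitAt(v, d)]++;                 // frequency of each digit
        }
        System.arraycopy(count, 0, countOut, 0, 10);
        for (int i = 1; i < 10; i++) {
            count[i] += count[i - 1];               // count'[i] = how many have digit <= i
        }
        System.arraycopy(count, 0, prefixOut, 0, 10);
        int[] help = new int[arr.length];
        for (int i = arr.length - 1; i >= 0; i--) { // right to left keeps the pass stable
            int j = digitAt(arr[i], d);
            help[--count[j]] = arr[i];              // last free slot among digits <= j
        }
        return help;
    }

    public static void main(String[] args) {
        int[] count = new int[10];
        int[] prefix = new int[10];
        int[] help = onePass(new int[]{101, 3, 202, 41, 302}, 1, count, prefix);
        System.out.println(Arrays.toString(count));
        System.out.println(Arrays.toString(prefix));
        System.out.println(Arrays.toString(help));
    }
}
```

It should print count = `[0, 2, 2, 1, 0, 0, 0, 0, 0, 0]`, count' = `[0, 2, 4, 5, 5, 5, 5, 5, 5, 5]` and help = `[101, 41, 202, 302, 3]`. Note that 101 still precedes 41 and 202 still precedes 302; that stability is what lets the later passes build on earlier ones.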

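More precisely, radix sort runs in O(N·d) time, where d = floor(log10(max)) + 1 is the number of decimal digits of the largest value max, with one O(N) pass per digit. Because radix sort is usually applied to samples whose value range is small, d is treated as a constant and the cost as O(N); for values with arbitrarily many digits, the strict bound is O(N·log10(max)).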

/**
 * Radix sort
 **/
package class05;

import java.util.Arrays;

public class Code04_RadixSort {

	// Only for non-negative decimal values; negative numbers would require substantially rewriting this method
	public static void radixSort(int[] arr) {
		if (arr == null || arr.length < 2) {
			return;
		}
		radixSort(arr, 0, arr.length - 1, maxbits(arr));
	}

	// Number of decimal digits of the maximum value in the array
	public static int maxbits(int[] arr) {
		int max = Integer.MIN_VALUE;
		for (int i = 0; i < arr.length; i++) {
			max = Math.max(max, arr[i]);
		}
		int res = 0;
		while (max != 0) {
			res++;
			max /= 10;
		}
		return res;
	}

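	// Sort arr[L..R]; digit: number of decimal digits of the maximum value
	// e.g. arr[L..R] = [3, 56, 17, 100], digit = 3
	public static void radixSort(int[] arr, int L, int R, int digit) {
		// Decimal numbers, so the base (radix) is 10
		final int radix = 10;
		int i = 0, j = 0;
		// One auxiliary slot per number being sorted
		int[] help = new int[R - L + 1];
		for (int d = 1; d <= digit; d++) { // one distribute-and-collect pass per digit
			// 10 slots; after the prefix-sum step below:
			// count[0]: how many numbers have digit d equal to 0
			// count[1]: how many numbers have digit d equal to 0 or 1
			// count[2]: how many numbers have digit d equal to 0, 1 or 2
			// count[i]: how many numbers have digit d in 0~i
			int[] count = new int[radix]; // count[0..9]
			for (i = L; i <= R; i++) {
				// e.g. for 103 with d = 1 (ones digit), j = 3
				// for 209 with d = 1, j = 9
				j = getDigit(arr[i], d);
				count[j]++;
			}
			// Turn count into count' (prefix sums)
			for (i = 1; i < radix; i++) {
				count[i] = count[i] + count[i - 1];
			}
			// Scan i from the last position backwards so the pass stays stable
			for (i = R; i >= L; i--) {
				j = getDigit(arr[i], d);
				help[count[j] - 1] = arr[i];
				// Decrement the frequency
				count[j]--;
			}
			// After each pass (ones digit, tens digit, ...) copy back into the original array
			for (i = L, j = 0; i <= R; i++, j++) {
				arr[i] = help[j];
			}
		}
	}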

	public static int getDigit(int x, int d) {
		return ((x / ((int) Math.pow(10, d - 1))) % 10);
	}

	// for test
	public static void comparator(int[] arr) {
		Arrays.sort(arr);
	}

	// for test
	public static int[] generateRandomArray(int maxSize, int maxValue) {
		int[] arr = new int[(int) ((maxSize + 1) * Math.random())];
		for (int i = 0; i < arr.length; i++) {
			arr[i] = (int) ((maxValue + 1) * Math.random());
		}
		return arr;
	}

	// for test
	public static int[] copyArray(int[] arr) {
		if (arr == null) {
			return null;
		}
		int[] res = new int[arr.length];
		for (int i = 0; i < arr.length; i++) {
			res[i] = arr[i];
		}
		return res;
	}

	// for test
	public static boolean isEqual(int[] arr1, int[] arr2) {
		if ((arr1 == null && arr2 != null) || (arr1 != null && arr2 == null)) {
			return false;
		}
		if (arr1 == null && arr2 == null) {
			return true;
		}
		if (arr1.length != arr2.length) {
			return false;
		}
		for (int i = 0; i < arr1.length; i++) {
			if (arr1[i] != arr2[i]) {
				return false;
			}
		}
		return true;
	}

	// for test
	public static void printArray(int[] arr) {
		if (arr == null) {
			return;
		}
		for (int i = 0; i < arr.length; i++) {
			System.out.print(arr[i] + " ");
		}
		System.out.println();
	}

	// for test
	public static void main(String[] args) {
		int testTime = 500000;
		int maxSize = 100;
		int maxValue = 100000;
		boolean succeed = true;
		for (int i = 0; i < testTime; i++) {
			int[] arr1 = generateRandomArray(maxSize, maxValue);
			int[] arr2 = copyArray(arr1);
			radixSort(arr1);
			comparator(arr2);
			if (!isEqual(arr1, arr2)) {
				succeed = false;
				printArray(arr1);
				printArray(arr2);
				break;
			}
		}
		System.out.println(succeed ? "Nice!" : "Fucking fucked!");

		int[] arr = generateRandomArray(maxSize, maxValue);
		printArray(arr);
		radixSort(arr);
		printArray(arr);

	}

}

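1.3 Stability of Sorting Algorithms

Stability means that samples with equal values keep their relative order after sorting. For primitive types stability is meaningless; it matters when sorting objects (reference types) by one attribute after another. For example, students have a class and an age: sort by class first, then by age. A stable sort will not break the class order established by the first sort, so students of the same age remain ordered by class.

A typical use of stable sorting: when shopping, sort products by price first, then by rating; the rating order is then built on top of the price order, so products with the same rating stay in price order. An unstable sort would destroy the price order established in the first step.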
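A small Java sketch of the student example, assuming a hypothetical `Student` class with `classId` and `age` fields (the class `StableSortDemo` is likewise illustrative). It relies on `List.sort`, which the JDK documents as a stable sort, so sorting by class and then by age keeps students of equal age in class order.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class StableSortDemo {

    // Hypothetical record type for the example: a student with a class and an age
    public static class Student {
        public final String name;
        public final int classId;
        public final int age;

        public Student(String name, int classId, int age) {
            this.name = name;
            this.classId = classId;
            this.age = age;
        }
    }

    // Sorts by class, then by age. List.sort is stable, so after the second sort
    // students with the same age are still ordered by class.
    public static List<Student> sortByClassThenAge(List<Student> students) {
        List<Student> list = new ArrayList<>(students);
        list.sort(Comparator.comparingInt(s -> s.classId));
        list.sort(Comparator.comparingInt(s -> s.age));
        return list;
    }

    public static void main(String[] args) {
        List<Student> students = new ArrayList<>();
        students.add(new Student("Ann", 3, 17));
        students.add(new Student("Bob", 1, 18));
        students.add(new Student("Cid", 2, 17));
        students.add(new Student("Dan", 1, 17));
        for (Student s : sortByClassThenAge(students)) {
            System.out.println(s.name + " class=" + s.classId + " age=" + s.age);
        }
    }
}
```

Expected order: Dan (class 1), Cid (class 2), Ann (class 3), all age 17, then Bob (age 18).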

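1.3.1 Stable Sorts

1. Bubble sort (do not swap when elements are equal)

2. Insertion sort (do not swap when elements are equal)

3. Merge sort (during merge, copy from the left half first when elements are equal)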
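The tie-handling rules in the list above can be seen in the sketch below, which sorts `{key, tag}` pairs so that the order of equal keys is visible through the tags. The class `StableSortsDemo` and its helpers are illustrative names; bubble sort would follow the same idea as insertion sort (swap only when strictly greater).

```java
import java.util.Arrays;

public class StableSortsDemo {

    // Insertion sort on {key, tag}: swap only while the left key is strictly greater,
    // so equal keys never pass each other.
    public static void insertionSort(int[][] a) {
        for (int i = 1; i < a.length; i++) {
            for (int j = i - 1; j >= 0 && a[j][0] > a[j + 1][0]; j--) {
                int[] t = a[j];
                a[j] = a[j + 1];
                a[j + 1] = t;
            }
        }
    }

    // Merge sort on {key, tag} over a[l..r]: on ties, the merge copies the left half first.
    public static void mergeSort(int[][] a, int l, int r) {
        if (l >= r) {
            return;
        }
        int m = l + ((r - l) >> 1);
        mergeSort(a, l, m);
        mergeSort(a, m + 1, r);
        int[][] help = new int[r - l + 1][];
        int i = 0, p1 = l, p2 = m + 1;
        while (p1 <= m && p2 <= r) {
            help[i++] = a[p1][0] <= a[p2][0] ? a[p1++] : a[p2++]; // "<=" keeps ties stable
        }
        while (p1 <= m) {
            help[i++] = a[p1++];
        }
        while (p2 <= r) {
            help[i++] = a[p2++];
        }
        System.arraycopy(help, 0, a, l, help.length);
    }

    // Tags in array order, exposing the relative order of equal keys
    public static int[] tags(int[][] a) {
        int[] t = new int[a.length];
        for (int i = 0; i < a.length; i++) {
            t[i] = a[i][1];
        }
        return t;
    }

    public static void main(String[] args) {
        int[][] x = {{3, 0}, {1, 1}, {3, 2}, {2, 3}, {1, 4}};
        int[][] y = {{3, 0}, {1, 1}, {3, 2}, {2, 3}, {1, 4}};
        insertionSort(x);
        mergeSort(y, 0, y.length - 1);
        System.out.println(Arrays.toString(tags(x)));
        System.out.println(Arrays.toString(tags(y)));
    }
}
```

Both lines should print `[1, 4, 3, 0, 2]`: tags 1 and 4 (key 1) and tags 0 and 2 (key 3) keep their original order. Changing `<=` to `<` in the merge would let equal keys from the right half overtake those from the left half.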

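1.3.2 Unstable Sorts

1. Selection sort

2. Quicksort (the partition process cannot guarantee stability)

3. Heap sort (maintaining the heap structure reorders equal elements)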
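A minimal example of instability, using selection sort on `{key, tag}` pairs (the class `UnstableSelectionDemo` is an illustrative name). The long-distance swap that moves the minimum to the front can jump one element over another with an equal key; quicksort's partition and heap sort's heap adjustments break stability for a similar reason.

```java
import java.util.Arrays;

public class UnstableSelectionDemo {

    // Selection sort on {key, tag}: swapping the minimum into position i
    // can move a[i] past another element with the same key.
    public static void selectionSort(int[][] a) {
        for (int i = 0; i < a.length - 1; i++) {
            int min = i;
            for (int j = i + 1; j < a.length; j++) {
                if (a[j][0] < a[min][0]) {
                    min = j;
                }
            }
            int[] t = a[i];
            a[i] = a[min];
            a[min] = t;
        }
    }

    public static void main(String[] args) {
        // Two records with key 5 (tags 0 and 1), followed by a smaller key 3
        int[][] a = {{5, 0}, {5, 1}, {3, 2}};
        selectionSort(a);
        for (int[] r : a) {
            System.out.print(Arrays.toString(r) + " ");
        }
        System.out.println();
    }
}
```

It prints `[3, 2] [5, 1] [5, 0]`: the two records with key 5 (tags 0 and 1) have reversed their relative order.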

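1.3.3 Comparison of Sorting Algorithms

Sort	Time complexity	Extra space	Stable
Selection sort	O(N^2)	O(1)	No
Bubble sort	O(N^2)	O(1)	Yes
Insertion sort	O(N^2)	O(1)	Yes
Merge sort	O(NlogN)	O(N)	Yes
Randomized quicksort	O(NlogN)	O(logN)	No
Heap sort	O(NlogN)	O(1)	No
Counting sort	O(N)	O(M)	Yes
Radix sort	O(N)	O(N)	Yes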

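1.4 Summary of Sorting Algorithms

Non-comparison sorts impose strict requirements on the sample data and are hard to adapt
Comparison-based sorts can be reused directly, as long as you define how two samples compare
The time complexity lower bound for comparison-based sorts is O(NlogN)
No commonly used comparison-based sort (and none covered here) is at the same time O(NlogN) in time, below O(N) in extra space, and stable
Choose quicksort for raw speed (it has the smallest constant factor), heap sort to save space, and merge sort for stability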
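To illustrate the point about reuse, here is a sketch of one generic insertion sort driven by a `java.util.Comparator`; the class `ComparatorReuseDemo` is an illustrative name. Only the comparator changes between the two calls, which is exactly what cannot be done with counting or radix sort.

```java
import java.util.Arrays;
import java.util.Comparator;

public class ComparatorReuseDemo {

    // One generic (stable) insertion sort; only the Comparator changes between uses
    public static <T> void insertionSort(T[] a, Comparator<? super T> cmp) {
        for (int i = 1; i < a.length; i++) {
            for (int j = i - 1; j >= 0 && cmp.compare(a[j], a[j + 1]) > 0; j--) {
                T t = a[j];
                a[j] = a[j + 1];
                a[j + 1] = t;
            }
        }
    }

    public static void main(String[] args) {
        Integer[] nums = {5, 2, 9, 1};
        insertionSort(nums, Comparator.reverseOrder());                 // descending integers
        String[] words = {"pear", "fig", "banana", "kiwi"};
        insertionSort(words, Comparator.comparingInt(String::length)); // by length
        System.out.println(Arrays.toString(nums));
        System.out.println(Arrays.toString(words));
    }
}
```

Expected output: `[9, 5, 2, 1]` and `[fig, pear, kiwi, banana]`. Because insertion sort is stable, "pear" stays ahead of "kiwi" (both have length 4).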

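1.5 Common Pitfalls in Sorting

Merge sort's extra space complexity can be reduced to O(1) with the "internal buffer" technique, but the sort then becomes unstable. If stability is not required, you might as well use heap sort directly.

The "in-place merge sort" described in many low-quality blog posts makes the time complexity degrade to O(N^2). If you accept O(N^2), you might as well use insertion sort.

Quicksort can be made stable ("01 stable sort"), but only by placing more requirements on the sample data. If you restrict the data anyway, you might as well use bucket sort.

In an integer array, put the odd numbers on the left and the even numbers on the right, keeping the original relative order among the odd numbers and among the even numbers, in O(N) time and O(1) extra space. This is a standard 0/1 partition (with an odd/even criterion), but quicksort's partition process cannot be made stable, so an ordinary implementation cannot meet these requirements. The academic paper on "01 stable sort" (not recommended; it is quite difficult) only achieves this after placing restrictions on the data.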
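The trade-off can be seen by comparing two sketches (the class `OddEvenPartitionDemo` and its methods are illustrative names): a quicksort-style in-place partition that meets the O(N) time and O(1) space bounds but is not stable, and a stable version that needs an O(N) helper array. The stable O(1)-space algorithm from the paper is not shown.

```java
import java.util.Arrays;

public class OddEvenPartitionDemo {

    // Quicksort-style partition: O(N) time, O(1) extra space.
    // Odd numbers end up on the left, but relative order is not preserved.
    public static void partitionInPlace(int[] arr) {
        int oddEnd = -1; // last index of the odd zone on the left
        for (int i = 0; i < arr.length; i++) {
            if ((arr[i] & 1) != 0) { // odd (also correct for negative ints)
                oddEnd++;
                int t = arr[oddEnd];
                arr[oddEnd] = arr[i];
                arr[i] = t;
            }
        }
    }

    // Stable version: O(N) time, but O(N) extra space
    public static int[] partitionStable(int[] arr) {
        int[] help = new int[arr.length];
        int k = 0;
        for (int v : arr) {
            if ((v & 1) != 0) {
                help[k++] = v; // odds first, in original order
            }
        }
        for (int v : arr) {
            if ((v & 1) == 0) {
                help[k++] = v; // then evens, in original order
            }
        }
        return help;
    }

    public static void main(String[] args) {
        int[] a = {2, 4, 6, 1};
        int[] stable = partitionStable(a); // computed before a is modified in place
        partitionInPlace(a);
        System.out.println(Arrays.toString(a));
        System.out.println(Arrays.toString(stable));
    }
}
```

For `{2, 4, 6, 1}`, the in-place version yields `[1, 4, 6, 2]` (the evens 2, 4, 6 lose their order), while the stable version yields `[1, 2, 4, 6]`.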