Majority Vote Algorithm and Its Extensions

Date: 2019-11-10



Boyer-Moore: A Linear Time Majority Vote Algorithm. This is the most basic majority vote algorithm.

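The original says that the algorithm "decides which element of a sequence is in the majority, provided there is such an element", but that is a little vague. Let me add to it: in a vote, if one kind of ballot appears more than half of the total number of votes (it must be strictly more than half, not equal to half, otherwise the result can be wrong in some special cases), we consider that ballot to be the Majority Element we are looking for.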

See this LeetCode problem: 169. Majority Element

Given an array of size n, find the Majority Element. The Majority Element is the element that appears more than ⌊ n/2 ⌋ times.

You may assume that the array is non-empty and the Majority Element always exist in the array.

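The idea behind the algorithm is this: suppose that in a given array of length n the Majority Element appears k times. Then the non-majority elements appear n-k times. If we can get rid of those n-k elements, everything that remains is the Majority Element.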

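We can walk through the array, and whenever we meet two different numbers we discard both of them. One of the two may be the Majority Element, or neither of them may be. Because k is greater than n/2, even in the worst case (every discarded pair contains one Majority Element) we are still guaranteed that the number left at the end is the Majority Element.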
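To make the "discard two different numbers" step concrete, here is a minimal sketch that performs the cancellation literally with a stack. The class and method names are made up for illustration, and the sketch assumes that a Majority Element exists, so the stack is never empty at the end. It uses O(n) extra space on purpose; it is only meant to show the pairing, not to replace the constant-space version.

```java
import java.util.ArrayDeque;
import java.util.Deque;

final class PairCancellationSketch {
    // Hypothetical helper: cancels pairs of different values explicitly.
    // Assumes nums contains a Majority Element (more than n/2 occurrences).
    static int survivor(int[] nums) {
        Deque<Integer> kept = new ArrayDeque<>();
        for (int x : nums) {
            if (kept.isEmpty() || kept.peek() == x) {
                kept.push(x);   // same value as everything kept so far: keep it
            } else {
                kept.pop();     // different value: discard x together with one kept value
            }
        }
        return kept.peek();     // every value still on the stack is identical
    }

    public static void main(String[] args) {
        System.out.println(survivor(new int[] {2, 2, 1, 1, 1, 2, 2}));
    }
}
```

Since the stack only ever holds copies of a single value, it can be replaced by one candidate plus a counter, which is what the constant-space solution does.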

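Many of the explanations of this step that I have seen online are slightly off. Many people describe it simply as: find a Majority Element, then find a non-majority element, and remove them together. That is actually wrong. While looping we have no way of telling whether the current number is the Majority Element, so when we remove a pair we may be removing one Majority Element and one non-majority element, or we may be removing two non-majority elements. The final value of count is therefore not fixed, but the algorithm guarantees that, even in the worst case, what remains is still the Majority Element. For example, the arrays [1,2,3,3,3] and [1,3,2,3,3] end with count equal to 3 and 1 respectively, yet this does not affect the correctness of the answer.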
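The two arrays above can be checked with a small sketch that returns the final count alongside the candidate. The helper name `vote` and the pair-style return value are assumptions made for this illustration only.

```java
import java.util.Arrays;

final class VoteTrace {
    // Hypothetical helper: returns {candidate, final count} after one voting pass.
    static int[] vote(int[] nums) {
        int candidate = 0;
        int count = 0;
        for (int x : nums) {
            if (count == 0) {          // nothing left to pair against: adopt x
                candidate = x;
                count = 1;
            } else if (candidate == x) {
                count++;               // one more copy waiting to be paired
            } else {
                count--;               // x cancels one copy of the candidate
            }
        }
        return new int[] {candidate, count};
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(vote(new int[] {1, 2, 3, 3, 3}))); // [3, 3]
        System.out.println(Arrays.toString(vote(new int[] {1, 3, 2, 3, 3}))); // [3, 1]
    }
}
```

Both runs end with candidate 3; only the leftover count differs, depending on whether the discarded pairs contained a 3 or not.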

This is also why, as mentioned earlier, the number of occurrences of the Majority Element must be strictly greater than n/2.

It is easy to work out that the number of Majority Elements left at the end is at least: n - ((n - k) + (n - k)) = 2k - n.
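Written out as a short derivation, with n the array length and k the number of occurrences of the Majority Element as defined above (the name "survivors" is only a label used here): in the worst case each of the n - k other elements cancels one Majority Element, so

```latex
\begin{align*}
\text{survivors} &\ge n - \bigl((n - k) + (n - k)\bigr) = 2k - n, \\
2k - n > 0 &\iff k > \frac{n}{2}.
\end{align*}
```

For n = 5 and k = 3 this gives 2k - n = 1, which matches the final count of 1 for [1,3,2,3,3]. With k exactly n/2 the bound drops to 0, so nothing is guaranteed to survive.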

public class Solution {
    public int majorityElement(int[] nums) {
        int candidate = 0;
        for(int i = 0,count = 0; i < nums.length; i++){
            // Question 1: does the order of the if checks matter? If so, what should the order be?
            if(count == 0){
                count++;
                candidate = nums[i];
            }else if(candidate != nums[i]){
                count--;
            }else{
                count++;
            }
        }
        return candidate;
    }
}
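The solution above relies on the LeetCode guarantee that a Majority Element always exists. As a hedged sketch of what one might do without that guarantee, the variant below adds a second counting pass and returns an empty `OptionalInt` when no element appears more than n/2 times. The class name and the `OptionalInt` return type are choices made for this illustration, not part of the original solution.

```java
import java.util.OptionalInt;

final class VerifiedMajority {
    // Voting pass followed by a verification pass.
    // Returns empty when no element occurs more than nums.length / 2 times.
    static OptionalInt majorityElement(int[] nums) {
        int candidate = 0;
        int count = 0;
        for (int x : nums) {
            if (count == 0) {
                candidate = x;
                count = 1;
            } else if (candidate == x) {
                count++;
            } else {
                count--;
            }
        }
        // The survivor is only a candidate; count its real occurrences.
        int occurrences = 0;
        for (int x : nums) {
            if (x == candidate) {
                occurrences++;
            }
        }
        return occurrences > nums.length / 2
                ? OptionalInt.of(candidate)
                : OptionalInt.empty();
    }

    public static void main(String[] args) {
        System.out.println(majorityElement(new int[] {3, 3, 4}));
        System.out.println(majorityElement(new int[] {1, 1, 2, 2}));
    }
}
```

The array [1,1,2,2] shows why "equal to n/2" is not enough: the voting pass still ends with some candidate, but the verification pass rejects it.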

This algorithm is a classic, and a simple one too, ~~after all, you don't have to come up with it yourself~~.

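Next, we can make some simple extensions to this algorithm. So far we have defined the Majority Element as the element that appears more than n/2 times.

What if we only require the number of votes to exceed n/3 for an element to count as a majority? Can we still find it?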

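~~Damn, a question like this in an article is like asking whether the hero of a novel dies after jumping off a cliff: there is a standard answer.~~
~~George R. R. Martin: ?~~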

Majority vote expansion pack: ~~Mists of Pandaria~~ 229. Majority Element II

Given an integer array of size n, find all elements that appear more than ⌊ n/3 ⌋ times. The algorithm should run in linear time and in O(1) space.

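The idea is the same as for Majority Element. The difference is that we now need two Majority Element candidates, and two counts that keep track of each candidate separately.

Each count is the number of copies of its candidate currently waiting to be cancelled. count == 0 means that the corresponding candidate has been removed completely, and we need to set a new candidate.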

import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class Solution {
    public List<Integer> majorityElement(int[] nums) {
        // Question 2: candidate0 and candidate1 are both initialized to 0 here. Does this affect the result?
        int candidate0 = 0,candidate1 = 0,count0 = 0, count1 = 0;
        for(int i = 0; i < nums.length; i++){
            if(candidate0 == nums[i]){
                // The current number equals the first candidate
                count0++;
            }else if(candidate1 == nums[i]){
                // The current number equals the second candidate
                count1++;
            }else if(count0 == 0){
                // The current number equals neither the first nor the second candidate.
                // The count must also be 0: if count != 0, that candidate is still
                // waiting for the other two numbers of its group.
                count0++;
                candidate0 = nums[i];
            }else if(count1 == 0){
                count1++;
                candidate1 = nums[i];
            }else{
                // Only when none of the conditions above hold may we decrement the counts
                count0--;
                count1--;
            }
        }

        // Question 3: can distinct() be omitted here? Why?
        return Stream.of(candidate0, candidate1).distinct().filter(num -> {
            int count = 0;
            for(int i = 0; i < nums.length; i++){
                if(nums[i] == num){
                    count++;
                }
            }
            return count > nums.length / 3;
        }).collect(Collectors.toList());
    }
}

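Let us go over the idea once more. We need to find three different numbers and then throw those three numbers away:
First check whether the number equals one of the candidates. If it does, the count of that candidate must be incremented by one, and the number waits for other numbers to cancel it.
When the count of a candidate is 0, that candidate has been cancelled completely, and we need to set a new candidate number.
When a number equals neither candidate and the counts of both candidates are non-zero, the current number is the third number the two candidates have been waiting for. These three numbers are removed together, so both counts are decremented by one.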
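The three rules above can be watched in action with a small tracing sketch. It runs the same voting pass and prints the state after every element; the class name, the `trace` helper and its array-shaped return value are assumptions made for this illustration.

```java
import java.util.Arrays;

final class TwoCandidateTrace {
    // Hypothetical helper: returns {candidate0, count0, candidate1, count1}
    // after the voting pass, printing the state after every element.
    static int[] trace(int[] nums) {
        int candidate0 = 0, candidate1 = 0, count0 = 0, count1 = 0;
        for (int x : nums) {
            if (candidate0 == x) {
                count0++;                    // rule 1: matches a candidate
            } else if (candidate1 == x) {
                count1++;
            } else if (count0 == 0) {
                candidate0 = x;              // rule 2: a slot is free, adopt x
                count0 = 1;
            } else if (count1 == 0) {
                candidate1 = x;
                count1 = 1;
            } else {
                count0--;                    // rule 3: x is the third distinct number,
                count1--;                    // so one full triple is discarded
            }
            System.out.printf("after %d: (%d x%d) (%d x%d)%n",
                    x, candidate0, count0, candidate1, count1);
        }
        return new int[] {candidate0, count0, candidate1, count1};
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(trace(new int[] {1, 1, 1, 3, 3, 2, 2, 2})));
    }
}
```

For [1,1,1,3,3,2,2,2] the first two 2s arrive while 1 and 3 are both alive, so each of them removes one triple {1, 3, 2}. The pass ends with candidates 1 and 2, each with count 1, and the verification pass then confirms that both appear more than n/3 times.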

That is the whole algorithm. Its time complexity is linear, O(n), and its space complexity is O(1).
Now it is time to answer the questions:

Question 1: does the order of the if checks matter? If so, what should the order be?

The answer is yes. Attentive readers may have noticed that in Majority Element we check count == 0 before we compare candidate with nums[i], whereas in Majority Element II it is exactly the other way around.

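This is because count == 0 tells us how many copies of the corresponding candidate are still alive. Before we make that check, we must be sure that the current number equals neither of the two candidates. Otherwise, when count0 != 0 && count1 == 0 && nums[i] == candidate0, we would wrongly assign nums[i] to candidate1. With a single candidate no such conflict exists; there the only requirement is that count == 0 is tested before count is decremented.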
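To see that the branch order really changes the result, the sketch below runs the voting pass twice: once with the match checks first, as in the solution above, and once with the count == 0 checks first. The class name, the method names and the shared `verify` helper are made up for this illustration.

```java
import java.util.ArrayList;
import java.util.List;

final class BranchOrderDemo {
    // Branch order of the solution above: compare with the candidates first.
    static List<Integer> matchFirst(int[] nums) {
        int candidate0 = 0, candidate1 = 0, count0 = 0, count1 = 0;
        for (int x : nums) {
            if (candidate0 == x) count0++;
            else if (candidate1 == x) count1++;
            else if (count0 == 0) { candidate0 = x; count0 = 1; }
            else if (count1 == 0) { candidate1 = x; count1 = 1; }
            else { count0--; count1--; }
        }
        return verify(nums, candidate0, candidate1);
    }

    // Deliberately wrong order: the count == 0 checks come first, so a number
    // equal to candidate0 can be stored a second time as candidate1.
    static List<Integer> countFirst(int[] nums) {
        int candidate0 = 0, candidate1 = 0, count0 = 0, count1 = 0;
        for (int x : nums) {
            if (count0 == 0) { candidate0 = x; count0 = 1; }
            else if (count1 == 0) { candidate1 = x; count1 = 1; }
            else if (candidate0 == x) count0++;
            else if (candidate1 == x) count1++;
            else { count0--; count1--; }
        }
        return verify(nums, candidate0, candidate1);
    }

    // Keeps only the distinct candidates that occur more than n/3 times.
    static List<Integer> verify(int[] nums, int a, int b) {
        List<Integer> result = new ArrayList<>();
        if (occurrences(nums, a) > nums.length / 3) result.add(a);
        if (b != a && occurrences(nums, b) > nums.length / 3) result.add(b);
        return result;
    }

    static int occurrences(int[] nums, int value) {
        int count = 0;
        for (int x : nums) {
            if (x == value) count++;
        }
        return count;
    }

    public static void main(String[] args) {
        int[] nums = {1, 1, 1, 2, 3, 4, 5};
        System.out.println(matchFirst(nums)); // [1]
        System.out.println(countFirst(nums)); // []
    }
}
```

In [1,1,1,2,3,4,5] the number 1 appears 3 times, which is more than ⌊7/3⌋ = 2. With the wrong order the votes for 1 are split across both candidate slots, each slot is cancelled separately, and 1 is lost before the verification pass.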

Question 2: candidate0 and candidate1 are both initialized to 0 here. Does this affect the result?

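No. The initial value of a candidate only matters while its count is still 0. Take candidate0 in the first iteration: if candidate0 == nums[0], then count0++ causes no problem at all, because it has the same effect as adopting nums[0] as the candidate. If candidate0 != nums[0], then count0 == 0 at that point, so we reinitialize with candidate0 = nums[0], which again has no effect on the result. The same reasoning applies to candidate1.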

Question 2, extended: if we initialize int candidate0 = 0, candidate1 = 1 instead, does that affect the result?

Question 3: can distinct() be omitted here? Why?

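No. In the loop we first go through the two checks if(candidate0 == nums[i]) and else if(candidate1 == nums[i]), which makes candidate0 != candidate1 hold in the vast majority of cases, but in one very special situation we can still end up with duplicate numbers in the result.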

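Imagine that all the numbers in the array are equal. Of our two candidates, candidate0 and candidate1, one will never be reassigned. In other words, one of them keeps the initial value we gave it all the way to the end.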

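In our code both candidates are initialized to 0, so without distinct() an array consisting entirely of 0s would return a wrong result: 0 would be reported twice.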

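We can solve this by initializing the two candidates to different numbers: int candidate0 = 0,candidate1 = 1. The loop never assigns a candidate a value equal to the other candidate, so the two stay different and we can remove distinct().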
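As a rough check of this claim, the sketch below drops distinct() and takes the two initial values as parameters, so that both initializations can be compared on the same input. The extra parameters `init0` and `init1` and the class name are assumptions introduced only for this experiment.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class InitialValueDemo {
    // Same voting and verification as the solution above, but without distinct().
    // init0 and init1 are hypothetical parameters for the initial candidate values.
    static List<Integer> majorityElement(int[] nums, int init0, int init1) {
        int candidate0 = init0, candidate1 = init1, count0 = 0, count1 = 0;
        for (int x : nums) {
            if (candidate0 == x) count0++;
            else if (candidate1 == x) count1++;
            else if (count0 == 0) { candidate0 = x; count0 = 1; }
            else if (count1 == 0) { candidate1 = x; count1 = 1; }
            else { count0--; count1--; }
        }
        return Stream.of(candidate0, candidate1)
                .filter(num -> occurrences(nums, num) > nums.length / 3)
                .collect(Collectors.toList());
    }

    static int occurrences(int[] nums, int value) {
        int count = 0;
        for (int x : nums) {
            if (x == value) count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(majorityElement(new int[] {0, 0, 0}, 0, 0)); // [0, 0]
        System.out.println(majorityElement(new int[] {0, 0, 0}, 0, 1)); // [0]
    }
}
```

With both initial values equal to 0, the all-zero array reports 0 twice. With 0 and 1 the duplicate disappears, and an array such as [1,1,1] still yields [1], which also suggests an answer to the extended Question 2: the different initial values do not change the result.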