[leetcode] 287. Find the Duplicate Number

Date: 2019-12-15

Tags: leetcode, duplicate number


Problem Summary

https://leetcode.com/problems/find-the-duplicate-number/description/

287. Find the Duplicate Number

Given an array nums containing n + 1 integers where each integer is between 1 and n (inclusive), prove that at least one duplicate number must exist. Assume that there is only one duplicate number, find the duplicate one.

Example 1:

Input: [1,3,4,2,2]
Output: 2

Example 2:

Input: [3,1,3,4,2]
Output: 3

Note:

You must not modify the array (assume the array is read only).
You must use only constant, O(1) extra space.
Your runtime complexity should be less than O(n^2).
There is only one duplicate number in the array, but it could be repeated more than once.


Solution Approaches

Note
The first two approaches mentioned do not satisfy the constraints given in the prompt, but they are solutions that you might be likely to come up with during a technical interview. As an interviewer, I personally would not expect someone to come up with the cycle detection solution unless they have heard it before.

Proof
Proving that at least one duplicate must exist in nums is a simple application of the pigeonhole principle. Here, each number in nums is a "pigeon" and each distinct number that can appear in nums is a "pigeonhole". Because there are n + 1 numbers but only n distinct possible values, the pigeonhole principle implies that at least one of the values is duplicated.
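
As a quick sanity check of the pigeonhole argument, the sketch below enumerates every array of length n + 1 over the values 1..n for a few tiny n and confirms that each one contains a repeat. The helper name every_array_has_duplicate is made up for this illustration, and the brute force is only feasible for very small n.

```python
from itertools import product


def every_array_has_duplicate(n):
    """Check all arrays of length n + 1 with values in 1..n (hypothetical helper)."""
    values = range(1, n + 1)
    for candidate in product(values, repeat=n + 1):
        # A duplicate exists exactly when there are fewer distinct values than slots.
        if len(set(candidate)) == len(candidate):
            return False
    return True


# There are n ** (n + 1) arrays to check, so keep n tiny.
print(all(every_array_has_duplicate(n) for n in range(1, 6)))  # True
```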

Approach 1: Sorting

Intuition

If the numbers are sorted, then any duplicate numbers will be adjacent in the sorted array.

Algorithm

Given the intuition, the algorithm follows fairly simply. First, we sort the array, and then we compare each element to the previous element. Because there is exactly one duplicated element in the array, we know that the array is of at least length 2, and we can return the duplicate element as soon as we find it.

class Solution:
    def findDuplicate(self, nums):
        nums.sort()
        for i in range(1, len(nums)):
            if nums[i] == nums[i-1]:
                return nums[i]

Complexity Analysis

Time complexity : O(n log n)

The sort invocation costs O(n log n) time in Python and Java, so it dominates the subsequent linear scan.

Space complexity : O(1) (or O(n))

Here, we sort nums in place, so the memory footprint is constant. If we cannot modify the input array, then we must allocate linear space for a copy of nums and sort that instead.
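
For the read-only case mentioned above, a minimal sketch might sort a copy instead of the input. The function name find_duplicate_sorted_copy is an assumption for this example; it trades the O(1) footprint for O(n) extra space, so it still does not meet the problem's constraints.

```python
def find_duplicate_sorted_copy(nums):
    """Sorting approach on a copy; nums itself is left untouched."""
    ordered = sorted(nums)  # sorted() returns a new list: O(n) extra space
    # Compare each element with its predecessor in the sorted copy.
    for previous, current in zip(ordered, ordered[1:]):
        if previous == current:
            return current
    return None  # not reached for inputs that satisfy the problem statement


data = [3, 1, 3, 4, 2]
print(find_duplicate_sorted_copy(data))  # 3
print(data)                              # [3, 1, 3, 4, 2], unchanged
```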

Approach 2: Set

Intuition

If we store each element as we iterate over the array, we can simply check each new element against the ones we have already seen.

Algorithm

In order to achieve linear time complexity, we need to be able to insert elements into a data structure (and look them up) in constant time. A Set satisfies these constraints nicely, so we iterate over the array and insert each element into seen. Before inserting it, we check whether it is already there. If it is, then we found our duplicate, so we return it.

class Solution:
    def findDuplicate(self, nums):
        seen = set()
        for num in nums:
            if num in seen:
                return num
            seen.add(num)

Complexity Analysis

Time complexity : O(n)

Sets in both Python and Java rely on underlying hash tables, so insertion and lookup have amortized constant time complexities. The algorithm is therefore linear, as it consists of a for loop that performs constant work at most n + 1 times.

Space complexity : O(n)

In the worst case, the duplicate element appears twice, with its second appearance at the last array index, n. In this case, seen will contain n distinct values, and will therefore occupy O(n) space.

Approach 3: Floyd's Tortoise and Hare (Cycle Detection)

Intuition

The idea is the same as in Linked List Cycle II: https://leetcode.com/problems/linked-list-cycle-ii/solution/

Algorithm

First off, we can easily show that the constraints of the problem imply that a cycle must exist. Because each number in nums is between 1 and n, it will necessarily point to an index that exists. Therefore, the list can be traversed infinitely, which implies that there is a cycle. Additionally, because 0 cannot appear as a value in nums, index 0 cannot be part of the cycle. Therefore, traversing the array in this manner starting from index 0 is equivalent to traversing a cyclic linked list. Given this, the problem can be solved just like Linked List Cycle II.

To see the algorithm in action, check out the animation from: https://leetcode.com/problems/find-the-duplicate-number/solution/
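
To make the "array as linked list" view concrete, here is a small sketch that follows i -> nums[i] from index 0 and records the indices visited. The helper trace_indices is hypothetical and exists only for illustration.

```python
def trace_indices(nums, steps):
    """Return the indices visited when following i -> nums[i] from index 0."""
    path = [0]
    for _ in range(steps):
        path.append(nums[path[-1]])
    return path


# For Example 1 the walk leaves index 0, then loops 2 -> 4 -> 2 -> ...
print(trace_indices([1, 3, 4, 2, 2], 7))  # [0, 1, 3, 2, 4, 2, 4, 2]
```

In this trace the cycle is entered at index 2, which is also the duplicated value.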

class Solution:
    def findDuplicate(self, nums):
        # Find the intersection point of the two runners.
        tortoise = nums[0]
        hare = nums[0]
        while True:
            tortoise = nums[tortoise]
            hare = nums[nums[hare]]
            if tortoise == hare:
                break
        
        # Find the "entrance" to the cycle.
        ptr1 = nums[0]
        ptr2 = tortoise
        while ptr1 != ptr2:
            ptr1 = nums[ptr1]
            ptr2 = nums[ptr2]
        
        return ptr1

Complexity Analysis

Time complexity : O(n)

For detailed analysis, refer to Linked List Cycle II.

Space complexity : O(1)

For detailed analysis, refer to Linked List Cycle II.

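An alternative explanation:

Reference: http://keithschwarz.com/interesting/code/find-duplicate/FindDuplicate.python.html

and the blog post: http://bookshadow.com/weblog/2015/07/10/leetcode-linked-list-cycle-ii/

This problem (reportedly) took the legendary computer scientist Don Knuth 24 hours to solve, and I have only met one person (Keith Amling) who solved it in less time.

The first part of the problem, proving that at least one duplicate exists, is a direct application of the pigeonhole principle. If the values range over [1, n], there are only n distinct possible values. With n + 1 elements, at least one value must repeat.

The second part, finding the duplicate under the given constraints, is much harder. Solving it takes a sharp insight that turns the problem, through a series of transformations, into an entirely different one.

The main trick is to notice that because the array has n + 1 elements with values from 1 to n, we can view it as a function f from the set {0, 1, ..., n} into itself, defined by f(i) = A[i]. Under this view, a duplicate corresponds to a pair of indices i != j with f(i) = f(j). Our task becomes finding such a pair (i, j); once we have it, the duplicate is simply f(i) = A[i].

But how do we find this repeated value? It turns out to be the well-known "cycle detection" problem from computer science. In its general form: given a function f, define the sequence x_i by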

    x_0     = k       (for some k)
    x_1     = f(x_0)
    x_2     = f(f(x_0))
    ...
    x_{n+1} = f(x_n)

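Assuming f maps its domain into itself, there are three cases. First, if the domain is infinite, the sequence may be infinitely long without ever repeating. For example, f(n) = n + 1 on the integers has this property: no number is ever repeated. Second, the sequence may be a closed loop, meaning there is some i > 0 with x_0 = x_i; in that case the sequence cycles forever through a fixed set of values. Third, the sequence may be "ρ-shaped" (rho-shaped), in which case it looks like this: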

      x_0 -> x_1 -> ... x_k -> x_{k+1} ... -> x_{k+j}
                         ^                       |
                         |                       |
                         +-----------------------+

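That is, the sequence starts with a chain of elements that leads into a cycle and then loops forever. We call the point where the chain reaches the cycle the "entrance" of the cycle.

For the problem of finding a duplicate in the array, consider the sequence that starts at position 0 and repeatedly applies f. That is, start at the first element of the array, move to the index given by its value, and repeat. The resulting sequence is ρ-shaped. To see this, note that it must contain a cycle: the array is finite, so among the first n + 2 positions visited some position must appear twice. This holds no matter where in the array we start.

Also note that because the values range from 1 to n, no element has the value 0. So once we apply f starting from the first element (index 0), we never return there. This means index 0 cannot be part of the cycle, yet if we keep applying f we must eventually visit some node twice. The chain that starts at node 0 joins a cycle, so the shape must be ρ.

Now consider the entrance of the cycle. Because that node is where the chain meets the cycle, two different inputs must map to it under f: one on the chain and one on the cycle. So there exist indices i != j with f(i) = f(j), that is, A[i] = A[j], and this shared value is the index of the entrance. Hence the entrance of the cycle is the duplicate value.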
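
To see the ρ shape on concrete inputs, the following sketch measures the chain length c, the cycle length l, and the entrance index by remembering when each index was first visited. The helper rho_shape is hypothetical, and it deliberately uses O(n) extra space, so it only illustrates the structure and is not a valid solution under the problem's constraints.

```python
def rho_shape(nums, start=0):
    """Return (c, l, entrance) for the walk i -> nums[i] beginning at `start`."""
    first_seen = {}  # index -> step at which it was first visited
    position, step = start, 0
    while position not in first_seen:
        first_seen[position] = step
        position = nums[position]
        step += 1
    c = first_seen[position]  # steps taken before the entrance is first reached
    l = step - c              # steps needed to return to the entrance
    return c, l, position


print(rho_shape([1, 3, 4, 2, 2]))  # (3, 2, 2)
print(rho_shape([3, 1, 3, 4, 2]))  # (1, 3, 3)
```

In both examples the entrance index equals the duplicated value, as the argument above predicts.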

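Robert Floyd devised a famous algorithm that, given a ρ-shaped sequence, finds the start of the cycle in linear time using only constant space. It is often called the "tortoise and hare" algorithm, for reasons that will become clear shortly.

The idea behind the algorithm is to define a few quantities. Let c be the length of the chain leading into the cycle, and let l be the length of the cycle. Then let l' be the smallest multiple of l that is at least c. The claim is that for any ρ-shaped sequence as defined above,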
 
     x_{l'} = x_{2l'}
 
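The proof is short and intuitive; it is one of my favorite proofs in computer science. Since l' is at least c, x_{l'} lies inside the cycle. And since l' is a multiple of the cycle length, we can write l' = ml for some integer m. If we start at x_{l'} (which is in the cycle) and take another l' steps to reach x_{2l'}, we go around the cycle exactly m times and end up right where we started.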
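
The identity x_{l'} = x_{2l'} can be checked numerically on Example 1. In this sketch the helper x_term is a made-up name, and the values c = 3, l = 2, l' = 4 were worked out by hand for this particular array rather than computed.

```python
def x_term(nums, k):
    """k-th term of the sequence x_0 = 0, x_{k+1} = nums[x_k]."""
    position = 0
    for _ in range(k):
        position = nums[position]
    return position


nums = [1, 3, 4, 2, 2]
c, l = 3, 2   # chain length and cycle length for this array (assumed, hand-derived)
l_prime = 4   # the first multiple of l = 2 that reaches past c = 3

print(x_term(nums, l_prime), x_term(nums, 2 * l_prime))  # 4 4
```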

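A key point of Floyd's algorithm is that even without knowing c explicitly, we can still find l' in O(l') time. The idea is as follows. We keep track of two values, "slow" and "fast", both starting at x_0, and then repeatedly compute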
 
     slow = f(slow)
     fast = f(f(fast))
 
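We repeat this step until slow and fast are equal. At that point there is some j with slow = x_j and fast = x_{2j}. Since x_j = x_{2j}, j must be at least c, because both values are inside the cycle. Moreover, j must be a multiple of l, because x_j = x_{2j} means that taking j more steps inside the cycle brings us back to the same place. Finally, j must be the smallest multiple of l that is at least c, because if there were a smaller such multiple, slow and fast would already have met there before reaching j. So j = l', which means we can find l' without knowing the length or shape of the cycle.

To finish, we need to see how to use l' to find the entrance of the cycle (denoted x_c). For this we introduce one last variable, "finder", starting at x_0. We then repeatedly execute: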

 
    finder = f(finder)
    slow   = f(slow)
 
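until finder = slow. We claim that (1) the two will meet and (2) they meet at the entrance of the cycle. To see this, note that slow is at x_{l'}, so after c more steps it reaches x_{l' + c}. Since l' is a multiple of the cycle length, this is the same as taking c steps and then going around the cycle some number of times back to the same spot; in other words, x_{l' + c} = x_c. Meanwhile finder starts at x_0, so after c steps it is at x_c as well. They cannot meet any earlier, because before step c finder is still on the chain while slow stays inside the cycle. This proves (1) and (2): the two eventually meet, and the meeting point is the entrance of the cycle.

The beauty of the algorithm is that it needs only O(1) extra storage for the pointers: slow and fast (first phase) and finder (second phase). On top of that, the running time is O(n). To see this, note that slow and fast meet after O(l') steps. Since l' is the smallest multiple of l that is at least c, there are two cases. If l >= c, then l' = l. Otherwise l < c, and there must be a multiple of l in the range from c to 2c: that range contains c consecutive values and l < c, so one of them is congruent to 0 modulo l. Finally, finding the entrance takes O(c) time. The total running time is therefore at most O(c + max{l, 2c}). All of these quantities are at most n, so the algorithm runs in O(n) time.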

class Solution(object):
    def findDuplicate(self, nums):
        # The "tortoise and hare" step. Both pointers start at index 0, which
        # lies on the chain (no value is 0), and we look for a meeting point
        # inside the cycle.
        slow = 0
        fast = 0

        # Keep advancing 'slow' by one step and 'fast' by two steps until they
        # meet inside the cycle.
        while True:
            slow = nums[slow]
            fast = nums[nums[fast]]

            if slow == fast:
                break

        # Start another pointer from index 0 and advance it in lockstep with
        # 'slow' until the two meet at the entrance of the cycle.
        finder = 0
        while True:
            slow   = nums[slow]
            finder = nums[finder]

            # The index where they meet is the duplicate value.
            if slow == finder:
                return slow

Time complexity: O(n)
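
To connect the step counts in the analysis with actual runs, here is an instrumented sketch of the same two-phase walk. The function find_duplicate_with_steps and its counters are additions for illustration only; per the argument above, the first counter should equal l' and the second should equal c.

```python
def find_duplicate_with_steps(nums):
    """Return (duplicate, phase-one steps, phase-two steps)."""
    slow = fast = 0
    phase1_steps = 0
    while True:
        slow = nums[slow]
        fast = nums[nums[fast]]
        phase1_steps += 1
        if slow == fast:
            break

    finder = 0
    phase2_steps = 0
    while True:
        slow = nums[slow]
        finder = nums[finder]
        phase2_steps += 1
        if slow == finder:
            return slow, phase1_steps, phase2_steps


print(find_duplicate_with_steps([1, 3, 4, 2, 2]))  # (2, 4, 3)
print(find_duplicate_with_steps([3, 1, 3, 4, 2]))  # (3, 3, 1)
```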

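Approach 4: Binary Search + Pigeonhole Principle

For background on the pigeonhole principle, see the Wikipedia entry: https://en.wikipedia.org/wiki/Pigeonhole_principle

The "must not modify the array" and "constant extra space" constraints rule out sorting as well as auxiliary data structures such as a Map.

A running time below O(n^2) suggests using binary search to turn one of the factors of n into log n.

See LeetCode Discuss: https://leetcode.com/discuss/60830/python-solution-explanation-without-changing-input-array

Binary search over the range of possible answers, using the pigeonhole principle as the test.

By the pigeonhole principle, given n + 1 integers in the range [1, n], some number must appear at least twice.

Suppose the candidate being tested is n / 2:

Scan the array. If the number of elements not greater than n / 2 exceeds n / 2, the answer must lie in [1, n / 2];

otherwise the answer lies in (n / 2, n].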
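
The pigeonhole test can be tabulated for Example 1 to show why binary search applies: the condition "count > mid" is false below the duplicate and true from the duplicate onward. The helper count_at_most is a hypothetical name used only in this sketch.

```python
def count_at_most(nums, mid):
    """Number of elements in nums that are <= mid."""
    return sum(1 for value in nums if value <= mid)


nums = [1, 3, 4, 2, 2]
n = len(nums) - 1
for mid in range(1, n + 1):
    cnt = count_at_most(nums, mid)
    print(mid, cnt, cnt > mid)
# 1 1 False
# 2 3 True
# 3 4 True
# 4 5 True
```

The smallest mid for which the test is true, here 2, is the duplicate.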

class Solution(object):
    def findDuplicate(self, nums):
        """
        :type nums: List[int]
        :rtype: int
        """
        low, high = 1, len(nums) - 1
        while low <= high:
            mid = (low + high) >> 1
            cnt = sum(x <= mid for x in nums)
            if cnt > mid:
                high = mid - 1
            else:
                low = mid + 1
        return low

Time complexity: O(n log n)

References:

https://leetcode.com/problems/find-the-duplicate-number/solution/

http://bookshadow.com/weblog/2015/09/28/leetcode-find-duplicate-number/
