[LeetCode] 839. Similar String Groups

Date: 2019-11-05

Tags: leetcode, similar string groups


Two strings X and Y are similar if we can swap two letters (in different positions) of X, so that it equals Y.

For example, "tars" and "rats" are similar (swapping at positions 0 and 2), and "rats" and "arts" are similar, but "star" is not similar to "tars", "rats", or "arts".

Together, these form two connected groups by similarity: {"tars", "rats", "arts"} and {"star"}. Notice that "tars" and "arts" are in the same group even though they are not similar. Formally, each group is such that a word is in the group if and only if it is similar to at least one other word in the group.

We are given a list A of strings. Every string in A is an anagram of every other string in A. How many groups are there?

Example 1:

Input: ["tars","rats","arts","star"]
Output: 2

Note:

A.length <= 2000
A[i].length <= 1000
A.length * A[i].length <= 20000
All words in A consist of lowercase letters only.
All words in A have the same length and are anagrams of each other.
The judging time limit has been increased for this question.

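This problem defines a similarity relation between strings: X and Y are similar if swapping two characters at different positions in X yields Y. Given an array of strings, we have to put similar strings into the same group. Strings in one group need not be pairwise similar; they only need to be linked through a chain of similar strings, much like a connected graph. All connected nodes count as one group, and the question is how many groups the whole array splits into. The problem therefore reduces to counting the connected components of a graph, and since it is a graph problem, it comes down to traversal, with both DFS and BFS solutions.

Start with DFS. Although this is a graph problem at heart, there is no explicit graph and no adjacency list; deciding whether two nodes are connected just means deciding whether the two strings are similar. So we write a helper function that tests similarity. It is simple to implement: compare the strings position by position, increment diff on every mismatch, and return false as soon as diff exceeds 2, because two strings are similar only when diff is exactly 2 or 0. The problem states that all strings are anagrams of each other, so they contain the same characters in a different order, and a diff of exactly 1 cannot occur. Larger odd values are possible ("tars" and "arts" differ in 3 positions), but the diff > 2 check already rejects them. Two identical strings also meet the requirement and count as similar.

For the DFS traversal, use a HashSet to record the strings already visited. For each string in the array, skip it if it is already in the HashSet; otherwise increment the result res and call the recursive function. The recursive function finds every string connected to the current one. It first checks whether the current string str has been visited and returns if so; otherwise it adds str to the HashSet. It then loops over the original array and tests each string word against str; if the two are similar, it recurses on word. This finds every string in the group. See the code below: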

Solution 1:

class Solution {
public:
    int numSimilarGroups(vector<string>& A) {
        int res = 0;
        unordered_set<string> visited;  // strings already assigned to a group
        for (const string& str : A) {
            if (visited.count(str)) continue;
            ++res;  // str starts a new group
            helper(A, str, visited);
        }
        return res;
    }
    // DFS: mark str as visited, then recurse into every string similar to it.
    void helper(vector<string>& A, const string& str, unordered_set<string>& visited) {
        if (visited.count(str)) return;
        visited.insert(str);
        for (const string& word : A) {
            if (isSimilar(word, str)) {
                helper(A, word, visited);
            }
        }
    }
    // True if the strings differ in at most two positions. For anagrams this
    // means 0 or 2 mismatches, since exactly 1 mismatch cannot occur.
    bool isSimilar(const string& str1, const string& str2) {
        for (int i = 0, cnt = 0; i < (int)str1.size(); ++i) {
            if (str1[i] == str2[i]) continue;
            if (++cnt > 2) return false;
        }
        return true;
    }
};
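
To make the similarity test concrete, here is a minimal sketch that counts mismatched positions for the example words. The helper names countDiff and similarByCount are hypothetical and not part of the solutions in this post; the sketch assumes, as the problem guarantees, that both strings have the same length and are anagrams of each other.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: number of positions at which a and b differ.
int countDiff(const std::string& a, const std::string& b) {
    assert(a.size() == b.size());  // equal lengths are guaranteed by the problem
    int diff = 0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        if (a[i] != b[i]) ++diff;
    }
    return diff;
}

// Assumption: a and b are anagrams, and equal strings count as similar
// (the convention this post follows). Then similar means 0 or 2 mismatches.
bool similarByCount(const std::string& a, const std::string& b) {
    int diff = countDiff(a, b);
    return diff == 0 || diff == 2;
}
```

For the example input, countDiff gives 2 for "tars"/"rats" and "rats"/"arts", 3 for "tars"/"arts", and 4 for "star" against each of the other three words, which matches the two groups in the expected output.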

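We can also use BFS: a bool array marks the words already visited, and a queue drives the traversal. Loop over all the words; skip any word that has been visited, otherwise mark it true and increment the result res. The idea is the same as in the DFS solution above, finding in one pass every node connected to the current one, only written as an iterative BFS. Push the current string onto the queue and enter a while loop: pop the front string, loop over all strings again, skip the visited ones, and count the positions at which each remaining string differs from the front string. If the final diff is 0, the two strings are identical, so the string is marked true but not pushed onto the queue. If the final diff is 2, the two are similar, so the string is marked true and also pushed onto the queue for the next round of searching. See the code below: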

Solution 2:

class Solution {
public:
    int numSimilarGroups(vector<string>& A) {
        int res = 0, n = A.size();
        vector<bool> visited(n);
        queue<string> q;
        for (int i = 0; i < n; ++i) {
            if (visited[i]) continue;
            visited[i] = true;
            ++res;
            q.push(A[i]);
            while (!q.empty()) {
                string t = q.front(); q.pop();
                for (int j = 0; j < n; ++j) {
                    if (visited[j]) continue;
                    int diff = 0;
                    for (int k = 0; k < A[j].size(); ++k) {
                        if (t[k] == A[j][k]) continue;
                        if (++diff > 2) break;
                    }
                    if (diff == 0) visited[j] = true;
                    if (diff == 2) {
                        visited[j] = true;
                        q.push(A[j]);
                    }
                }
            }
        }
        return res;
    }
};

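Grouping problems like this are a good fit for Union Find, and LeetCode has other problems that use the same idea, such as Friend Circles, Accounts Merge, Redundant Connection II, Redundant Connection, Number of Islands II, Graph Valid Tree, and Number of Connected Components in an Undirected Graph. All of them use a root array in which every node starts with a distinct value. When two nodes belong to the same group, the root value of one is set to the position of the other, so that getRoot returns the same value for any two nodes in the same group. Here, for each node A[i], we loop over all earlier nodes A[j]. If the two are not similar, skip the pair; otherwise set the root value of the representative of A[j]'s group to i, so that all connected nodes end up with the same root. Exactly one node in each group keeps its initial root value, so counting the nodes whose root value is still the initial value gives the number of groups. See the code below: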

Solution 3:

class Solution {
public:
    int numSimilarGroups(vector<string>& A) {
        int res = 0, n = A.size();
        vector<int> root(n);
        for (int i = 0; i < n; ++i) root[i] = i;
        for (int i = 1; i < n; ++i) {
            for (int j = 0; j < i; ++j) {
                if (!isSimilar(A[i], A[j])) continue;
                root[getRoot(root, j)] = i;
            }
        }
        for (int i = 0; i < n; ++i) {
            if (root[i] == i) ++res;
        }
        return res;
    }
    int getRoot(vector<int>& root, int i) {
        return (root[i] == i) ? i : getRoot(root, root[i]);
    }
    bool isSimilar(string& str1, string& str2) {
        for (int i = 0, cnt = 0; i < str1.size(); ++i) {
            if (str1[i] == str2[i]) continue;
            if (++cnt > 2) return false;
        }
        return true;
    }
};
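
The getRoot function above walks the whole parent chain on every call. As a possible variant, and not the author's method, the sketch below swaps in a disjoint-set structure with path halving and union by size, which keeps the chains short. The names DisjointSet, isSimilarPair, and countGroups are hypothetical and chosen only for illustration; the pairwise similarity test is unchanged.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <string>
#include <utility>
#include <vector>

// Hypothetical disjoint-set structure with path halving and union by size.
struct DisjointSet {
    std::vector<int> parent, sz;
    int groups;  // current number of disjoint groups
    explicit DisjointSet(int n) : parent(n), sz(n, 1), groups(n) {
        std::iota(parent.begin(), parent.end(), 0);  // each node is its own root
    }
    int find(int x) {
        while (parent[x] != x) {
            parent[x] = parent[parent[x]];  // path halving: skip one level
            x = parent[x];
        }
        return x;
    }
    void unite(int a, int b) {
        a = find(a);
        b = find(b);
        if (a == b) return;                // already in the same group
        if (sz[a] < sz[b]) std::swap(a, b);
        parent[b] = a;                     // attach the smaller tree to the larger
        sz[a] += sz[b];
        --groups;
    }
};

// Same test as isSimilar above: at most two mismatched positions.
bool isSimilarPair(const std::string& a, const std::string& b) {
    assert(a.size() == b.size());  // equal lengths are guaranteed by the problem
    int cnt = 0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        if (a[i] != b[i] && ++cnt > 2) return false;
    }
    return true;
}

int countGroups(const std::vector<std::string>& words) {
    int n = static_cast<int>(words.size());
    DisjointSet dsu(n);
    for (int i = 1; i < n; ++i) {
        for (int j = 0; j < i; ++j) {
            if (isSimilarPair(words[i], words[j])) dsu.unite(i, j);
        }
    }
    return dsu.groups;
}
```

Tracking the group count inside unite removes the final counting loop. The pairwise comparisons still dominate the running time, so this variant mainly tidies the union-find part rather than changing the overall cost.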

GitHub mirror:

https://github.com/grandyang/leetcode/issues/839

Similar problems:

Friend Circles

Accounts Merge

Redundant Connection II

Redundant Connection

Number of Islands II

Graph Valid Tree

Number of Connected Components in an Undirected Graph

References:

https://leetcode.com/problems/similar-string-groups/