Substring with Concatenation of All Words

Date: 2019-11-11

Original Problem

　　You are given a string, s, and a list of words, words, that are all of the same length. Find all starting indices of substring(s) in s that is a concatenation of each word in words exactly once and without any intervening characters.
　　For example, given:
　　s: "barfoothefoobarman"
　　words: ["foo", "bar"]
　　You should return the indices: [0,9].
　　(order does not matter).

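Problem Summary

　　Given a string s and an array of strings words, where all strings in words have the same length, find every substring of s that contains each word in words exactly once, with nothing in between, and return the starting index of each such substring.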

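Approach

　　Convert words into a HashMap that maps each word to the number of times it occurs. Then, for each starting offset from 0 to wLen - 1, slide a window over s one word at a time, consuming entries from a copy of the map. Whenever the copy becomes empty, the window covers all the words and its start index is recorded.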
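The two map operations this approach relies on can be sketched as follows. This is a minimal C++ sketch (the language of the later solutions), not the author's Java code; the helper names countWords and consume are made up for illustration.

```cpp
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical helper: count how many times each word occurs in `words`.
std::unordered_map<std::string, int> countWords(const std::vector<std::string>& words) {
    std::unordered_map<std::string, int> counts;
    for (const std::string& w : words) ++counts[w];
    return counts;
}

// Hypothetical helper: use up one occurrence of `word` from `remaining`.
// Returns false if no occurrence is left. The key is erased when its count
// reaches zero, so an empty map means every word has been matched.
bool consume(std::unordered_map<std::string, int>& remaining, const std::string& word) {
    auto it = remaining.find(word);
    if (it == remaining.end()) return false;
    if (--it->second == 0) remaining.erase(it);
    return true;
}
```

Erasing exhausted keys keeps the "all words matched" test down to a single empty() check, which is the same trick the Java version below uses with mapT.isEmpty().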

Code Implementation

Algorithm implementation class

import java.util.*;

public class Solution {

    public List<Integer> findSubstring(String s, String[] words) {
        List<Integer> list = new ArrayList<Integer>();
        if (words.length == 0) return list;
        int wLen = words[0].length();
        int len = s.length();
        if (len < wLen * words.length) return list;
        // Count how many times each word must appear.
        Map<String, Integer> mapW = new HashMap<String, Integer>();
        for (String word : words)
            mapW.merge(word, 1, Integer::sum);
        for (int start = 0; start < wLen; start++) {
            int pos = start;
            int tStart = -1;
            Map<String, Integer> mapT = new HashMap<String, Integer>(mapW);
            while (pos + wLen <= len) {
                String cand = s.substring(pos, pos + wLen);
                if (!mapW.containsKey(cand)) {
                    if (tStart != -1) mapT = new HashMap<String, Integer>(mapW);
                    tStart = -1;
                } else if (mapT.containsKey(cand)) {
                    tStart = tStart == -1 ? pos : tStart;
                    if (mapT.get(cand) == 1) mapT.remove(cand);
                    else mapT.put(cand, mapT.get(cand) - 1);
                    if (mapT.isEmpty()) list.add(tStart);
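                } else {
                    // cand is a valid word but all its copies are already in the window:
                    // shrink from the left until an earlier copy of cand is dropped.
                    while (tStart < pos) {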
                        String rCand = s.substring(tStart, tStart + wLen);
                        if (cand.equals(rCand)) {
                            tStart += wLen;
                            if (mapT.isEmpty()) list.add(tStart);
                            break;
                        }
                        tStart += wLen;
                        mapT.put(rCand, mapT.containsKey(rCand) ? mapT.get(rCand) + 1 : 1);
                    }
                }
                pos += wLen;
            }
        }
        return list;
    }
}

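This problem asks for substrings that concatenate all the given words: given a long string and several words of equal length, find the starting index of every substring formed by concatenating all of the words. It is a fairly hard problem. We need two hash tables. The first stores every word together with its count. We then scan from the beginning one character at a time, stopping once the number of remaining characters is smaller than the total length of all the words. For each starting index i we create a second hash table and repeatedly take a substring of the word length, checking whether it is in the first table. If it is not, break. If it is, add it to the second table, but a word may not appear more often than it does in words; if it does, break as well. If every word in the set is matched, store i in the result. The code is given below.

Solution 1: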

#include <string>
#include <unordered_map>
#include <vector>
using namespace std;

class Solution {
public:
    vector<int> findSubstring(string s, vector<string>& words) {
        vector<int> res;
        if (s.empty() || words.empty()) return res;
        int n = words.size(), m = words[0].size();
        unordered_map<string, int> m1;
        for (auto &a : words) ++m1[a];
        for (int i = 0; i <= (int)s.size() - n * m; ++i) {
            unordered_map<string, int> m2;
            int j = 0; 
            for (j = 0; j < n; ++j) {
                string t = s.substr(i + j * m, m);
                if (m1.find(t) == m1.end()) break;
                ++m2[t];
                if (m2[t] > m1[t]) break;
            }
            if (j == n) res.push_back(i);
        }
        return res;
    }
};
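When testing a solution like this one, an independent check for a single index is handy. The sketch below swaps in a different technique, sorting, instead of the two hash tables: it cuts the window into chunks, sorts them, and compares with the sorted word list. The function name isConcatenationAt is an assumption made for this example.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical cross-check, not the hash-table method above: cut the window
// starting at `i` into words.size() chunks, sort them, and compare with the
// sorted word list. `words` is taken by value so it can be sorted locally.
bool isConcatenationAt(const std::string& s, std::vector<std::string> words, std::size_t i) {
    if (words.empty()) return false;
    const std::size_t m = words[0].size();
    const std::size_t n = words.size();
    if (i > s.size() || s.size() - i < n * m) return false;  // window does not fit
    std::vector<std::string> chunks;
    for (std::size_t j = 0; j < n; ++j) chunks.push_back(s.substr(i + j * m, m));
    std::sort(chunks.begin(), chunks.end());
    std::sort(words.begin(), words.end());
    return chunks == words;
}
```

On the example from the problem, indices 0 and 9 pass this check, while index 3 (the window "foothe") does not. Sorting costs more per index than the hash tables, so this is only meant as a test oracle.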

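There is also a solution with O(n) time complexity, counted in word-sized steps. The idea is clever but hard to come up with, and the author admits to not being at that level yet. Instead of advancing one character at a time, this method advances one word at a time. Taking the example from the problem, the length n of s is 18, words contains two words (cnt = 2), and each word has length len = 3. The start positions are visited in the order 0, 3, 6, 9, 12, 15; then, shifted by one character, 1, 4, 7, 10, 13; then, shifted by one more, 2, 5, 8, 11, 14. This covers every case, and each of the len offsets visits about n / len words, so about n words are visited in total.

As before, a hash table m1 records all the words in words. For each offset we scan from the start, with left marking the left boundary of the window and count the number of words matched so far. We walk through s word by word. If the current word t exists in m1, we add it to a second hash table m2. If its count in m2 is less than or equal to its count in m1, we increment count. If it is greater, some handling is needed. Consider s = barfoofoo, words = {bar, foo, abc}, where abc is added so that the scan does not already finish with a match at barfoo. When we reach the second foo, m2[foo] = 2 while m1[foo] = 1, so the window is no longer a valid run and the left boundary must move. We take the first word t1 = bar and decrement m2[t1]; if m2[t1] < m1[t1], a match has been lost, so count is decremented too, and then left advances by len. This repeats until m2[t] no longer exceeds m1[t]; here one more step drops the first foo, leaving left = 6.

If at some point count equals cnt, we have a match: store left in res, then drop the leftmost word, decrement count, move left right by len, and continue matching. If we meet a word that is not in m1, the run is broken: reset m2, set count to 0, and move left to j + len. The code is given below.

Solution 2: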
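To make the word-by-word visiting order concrete, here is a small C++ sketch that lists the start positions for each offset. The helper name visitOrder is hypothetical; the bound j + len <= n mirrors the loop condition j <= n - len used in Solution 2.

```cpp
#include <vector>

// Hypothetical helper: for each offset 0..len-1, list the start indices
// j = offset, offset + len, ... that still leave room for a full word
// (j + len <= n).
std::vector<std::vector<int>> visitOrder(int n, int len) {
    std::vector<std::vector<int>> order;
    for (int offset = 0; offset < len; ++offset) {
        std::vector<int> starts;
        for (int j = offset; j + len <= n; j += len) starts.push_back(j);
        order.push_back(starts);
    }
    return order;
}
```

For n = 18 and len = 3 this gives 0, 3, 6, 9, 12, 15 for offset 0, then 1, 4, 7, 10, 13 for offset 1, and 2, 5, 8, 11, 14 for offset 2. Positions 16 and 17 never appear because a three-character word starting there would run past the end of s.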

#include <string>
#include <unordered_map>
#include <vector>
using namespace std;

class Solution {
public:
    vector<int> findSubstring(string s, vector<string>& words) {
        if (s.empty() || words.empty()) return {};
        vector<int> res;
        int n = s.size(), cnt = words.size(), len = words[0].size();
        unordered_map<string, int> m1;
        for (const string& w : words) ++m1[w];
        for (int i = 0; i < len; ++i) {
            int left = i, count = 0;
            unordered_map<string, int> m2;
            for (int j = i; j <= n - len; j += len) {
                string t = s.substr(j, len);
                if (m1.count(t)) {
                    ++m2[t];
                    if (m2[t] <= m1[t]) {
                        ++count;
                    } else {
                        while (m2[t] > m1[t]) {
                            string t1 = s.substr(left, len);
                            --m2[t1];
                            if (m2[t1] < m1[t1]) --count;
                            left += len;
                        }
                    }
                    if (count == cnt) {
                        res.push_back(left);
                        --m2[s.substr(left, len)];
                        --count;
                        left += len;
                    }
                } else {
                    m2.clear();
                    count = 0;
                    left = j + len;
                }
            }
        }
        return res;
    }
};
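To watch the window move, the inner loop of Solution 2 can be instrumented. The following is a sketch for a single starting offset that records (j, left, count) after each word instead of collecting results; traceWindow and Step are names invented for this illustration.

```cpp
#include <string>
#include <tuple>
#include <unordered_map>
#include <vector>

// One row per visited word: (j, left, count) after the word at j is processed.
using Step = std::tuple<int, int, int>;

// Hypothetical tracing variant of the inner loop of Solution 2 for one offset.
// It records the window state and does not collect result indices.
std::vector<Step> traceWindow(const std::string& s, const std::vector<std::string>& words, int offset) {
    std::vector<Step> steps;
    if (s.empty() || words.empty() || words[0].empty()) return steps;
    const int n = s.size(), cnt = words.size(), len = words[0].size();
    std::unordered_map<std::string, int> m1, m2;
    for (const std::string& w : words) ++m1[w];
    int left = offset, count = 0;
    for (int j = offset; j <= n - len; j += len) {
        const std::string t = s.substr(j, len);
        if (m1.count(t)) {
            ++m2[t];
            if (m2[t] <= m1[t]) {
                ++count;
            } else {
                // Too many copies of t: drop words from the left until it fits.
                while (m2[t] > m1[t]) {
                    const std::string t1 = s.substr(left, len);
                    --m2[t1];
                    if (m2[t1] < m1[t1]) --count;
                    left += len;
                }
            }
            if (count == cnt) {  // full match: slide past the leftmost word
                --m2[s.substr(left, len)];
                --count;
                left += len;
            }
        } else {  // unknown word: restart the window after it
            m2.clear();
            count = 0;
            left = j + len;
        }
        steps.emplace_back(j, left, count);
    }
    return steps;
}
```

For s = "barfoofoo", words = {"bar", "foo", "abc"} and offset 0, the recorded steps are (0, 0, 1), (3, 0, 2), (6, 6, 1): the second foo forces both bar and the first foo out of the window, so left jumps from 0 to 6 and count drops back to 1.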
