[LeetCode] Short Encoding of Words

Date: 2019-11-06


Given a list of words, we may encode it by writing a reference string S and a list of indexes A.

For example, if the list of words is ["time", "me", "bell"], we can write it as S = "time#bell#" and indexes = [0, 2, 5].

Then for each index, we will recover the word by reading from the reference string from that index until we reach a "#" character.

What is the length of the shortest reference string S possible that encodes the given words?

Example:

Input: words = ["time", "me", "bell"]
Output: 10
Explanation: S = "time#bell#" and indexes = [0, 2, 5].
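
To make the decoding rule concrete, here is a minimal sketch of how the words could be recovered from S and the indexes. The function name decodeWords is a hypothetical helper introduced only for illustration; the problem itself asks only for the length of S.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper (not part of the problem statement): rebuilds the
// word list from a reference string and the start index of each word.
std::vector<std::string> decodeWords(const std::string& s,
                                     const std::vector<int>& indexes) {
    std::vector<std::string> words;
    for (int start : indexes) {
        // Read from `start` up to, but not including, the next '#'.
        std::size_t end = s.find('#', start);
        assert(end != std::string::npos);  // every word must be terminated
        words.push_back(s.substr(start, end - start));
    }
    return words;
}
```

Calling decodeWords("time#bell#", {0, 2, 5}) reads "time" from index 0, "me" from index 2 and "bell" from index 5, which is why "me" costs no extra characters.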

Note:

1 <= words.length <= 2000.
1 <= words[i].length <= 7.
Each word has only lowercase letters.

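This problem gives us an array of words and asks us to encode it. Each word is terminated by a '#', the starting position of each word is stored in an index array, words that can be merged should be merged, and we are asked for the length of the shortest encoded string. The statement is easy to understand; the hard part is how to merge words. Look at the example in the problem: "me" and "time" can be merged as long as their starting positions are recorded. "time" starts at 0 and "me" starts at 2, so reading from each start up to the '#' recovers both words. Note that if "me" were replaced by "im" or "tim", merging would no longer work, because we must read every character from the start position up to the '#'. In other words, a word can be merged only if it is a suffix of another word.

With that settled, look at the order of processing. Since "me" is contained in "time", we should produce "time#" first and then check whether it already covers "me", rather than generating "me#" first and then handling "time". So longer words should be processed first, which means sorting the array by length in descending order with a custom comparator.

Then we iterate over the array. For each word we search for it in the encoded string. If it is not there, we append the word followed by a '#'. If it is there, we get the position of the match; for example, searching for "me" in "time#" gives found = 2. We then have to verify that the word is immediately followed by a '#', so we look at position found + word.size(). If that character is not '#', the word cannot be merged at that position. One caveat: find reports only the first occurrence, so a word that first appears in the middle of a longer word and only later as a suffix would be missed; searching for the word with '#' appended avoids this. Finally we return the length of the encoded string. See the code below: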

Solution 1:

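#include <algorithm>
#include <string>
#include <vector>
using namespace std;

class Solution {
public:
    int minimumLengthEncoding(vector<string>& words) {
        string str;
        // Process longer words first so that shorter suffixes can reuse them.
        sort(words.begin(), words.end(), [](const string& a, const string& b) {
            return a.size() > b.size();
        });
        for (const string& word : words) {
            // Searching for word + "#" checks the match and the terminator in
            // one step, and is not limited to the first occurrence of word.
            if (str.find(word + "#") == string::npos) {
                str += word + "#";
            }
        }
        return str.size();
    }
};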
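
The merge rule discussed above (a word can share storage only if it is a suffix of a longer word) can be sketched as a small predicate. The name isSuffixOf is a hypothetical helper for illustration and is not used by the solutions in this post.

```cpp
#include <string>

// Hypothetical helper: true when `shortWord` is a suffix of `longWord`.
bool isSuffixOf(const std::string& shortWord, const std::string& longWord) {
    if (shortWord.size() > longWord.size()) return false;
    // Compare the tail of longWord, of the same length as shortWord.
    return longWord.compare(longWord.size() - shortWord.size(),
                            shortWord.size(), shortWord) == 0;
}
```

With this predicate, isSuffixOf("me", "time") holds, while isSuffixOf("im", "time") and isSuffixOf("tim", "time") do not, which matches the cases mentioned in the analysis.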

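Now let's look at a method that needs no custom comparator. From the analysis above, what we are really looking for is suffixes: "me" is a suffix of "time". We would like the words that can be merged to sit next to each other, which makes them easier to handle, but suffixes are awkward to sort on. So we turn them into prefixes by reversing every word: "time" becomes "emit" and "me" becomes "em". Sorting in the default lexicographic order then gives em, emit, so mergeable words end up adjacent, and the current word is always merged into the one that follows it. That makes the rest much simpler. We only need to check whether the current word is a prefix of the next word: if it is, we add 0; if not, we add the length of the current word plus 1, where the extra 1 is the '#'. The prefix check is simple: take a prefix of the same length from the next word and compare. Since we always look at the next word, we only process up to the second-to-last word to stay in bounds, and then add the length of the last word plus 1 to the result res. See the code below: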

Solution 2:

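#include <algorithm>
#include <string>
#include <vector>
using namespace std;

class Solution {
public:
    int minimumLengthEncoding(vector<string>& words) {
        int res = 0, n = words.size();
        // Reverse every word so that suffixes become prefixes.
        for (int i = 0; i < n; ++i) reverse(words[i].begin(), words[i].end());
        // After sorting, a word that is a prefix of some other word is
        // immediately followed by a word that starts with it.
        sort(words.begin(), words.end());
        for (int i = 0; i < n - 1; ++i) {
            // Add 0 if words[i] is a prefix of words[i + 1], else its length plus '#'.
            res += (words[i] == words[i + 1].substr(0, words[i].size())) ? 0 : words[i].size() + 1;
        }
        // The last word cannot be absorbed by a later one.
        return res + words.back().size() + 1;
    }
};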
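
To see the ordering this relies on, the following sketch performs only the reverse-and-sort step and returns the result. The function name reversedSorted is a hypothetical helper for illustration.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical helper: reverses each word, then sorts lexicographically.
// Takes the vector by value so the caller's data is left untouched.
std::vector<std::string> reversedSorted(std::vector<std::string> words) {
    for (std::string& w : words) std::reverse(w.begin(), w.end());
    std::sort(words.begin(), words.end());
    return words;
}
```

For ["time", "me", "bell"] this yields em, emit, lleb. Since "em" is a prefix of "emit" it contributes 0, while "emit" and "lleb" each contribute 4 + 1, giving 10.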

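The next method is also neat. It uses a HashSet and first puts all the words into it. The idea is that for each word we walk through all of its proper suffixes (for "time" these are "ime", "me" and "e") and check whether the HashSet contains them. Any suffix that is present gets erased, so "me" is removed from the HashSet. This guarantees that the words left in the set cannot be merged any further. Finally we add the length of each remaining word to the result res, plus 1 for its '#'. See the code below: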

Solution 3:

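#include <string>
#include <unordered_set>
#include <vector>
using namespace std;

class Solution {
public:
    int minimumLengthEncoding(vector<string>& words) {
        int res = 0;
        unordered_set<string> st(words.begin(), words.end());
        // Iterate over the input rather than the set, so the set is never
        // modified while it is being traversed.
        for (const string& word : words) {
            // Erase every proper suffix of word, e.g. ime, me, e for "time".
            for (size_t i = 1; i < word.size(); ++i) {
                st.erase(word.substr(i));
            }
        }
        // Each surviving word costs its length plus one '#'.
        for (const string& word : st) res += word.size() + 1;
        return res;
    }
};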
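
For checking any of the solutions above on small inputs, a brute-force reference can be handy. The sketch below is not from the original post; bruteForceEncodingLength is a hypothetical name. It counts every distinct word that is not a proper suffix of another distinct word. It compares all pairs of words, so it is meant for verification rather than as a replacement for the faster methods.

```cpp
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical reference implementation for cross-checking results.
int bruteForceEncodingLength(const std::vector<std::string>& words) {
    // Duplicates never need to be stored twice.
    std::unordered_set<std::string> uniq(words.begin(), words.end());
    int total = 0;
    for (const std::string& w : uniq) {
        bool absorbed = false;
        for (const std::string& other : uniq) {
            // w is absorbed when it is a proper suffix of a longer word.
            if (other.size() > w.size() &&
                other.compare(other.size() - w.size(), w.size(), w) == 0) {
                absorbed = true;
                break;
            }
        }
        // A word that is not absorbed costs its length plus one '#'.
        if (!absorbed) total += static_cast<int>(w.size()) + 1;
    }
    return total;
}
```

An input such as ["mexme", "time", "me"] is a useful test: "me" occurs inside "mexme" before it occurs as a suffix, and the shortest encoding "mexme#time#" has length 11.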