https://leetcode.com/problems/encode-and-decode-tinyurl
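One approach is to assign each requested longURL an integer, counting up from 0. That integer is both the encoding of the longURL and its index in storage. To decode a short URL, parse the integer out of it and look up the original long URL.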
    #include <string>
    #include <vector>
    using namespace std;

    class Solution {
    public:
        // Encodes a URL to a shortened URL.
        string encode(string longUrl) {
            long_urls.push_back(longUrl);
            return "http://t.com/" + std::to_string(long_urls.size() - 1);
        }

        // Decodes a shortened URL to its original URL.
        string decode(string shortUrl) {
            auto pos = shortUrl.find_last_of('/');
            auto id = std::stoi(shortUrl.substr(pos + 1));
            return long_urls[id];
        }

    private:
        vector<string> long_urls;  // the index in this vector is the short-URL id
    };
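The advantage of the incrementing approach is that every encoding is unique. The drawbacks are just as clear: the same longURL gets a different code each time it is encoded, which wastes both ids and storage. Adding a hash table solves the wasted-space problem, but the incrementing approach still exposes the short-URL counter to users, which may be a security risk.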
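The hash-table fix mentioned above could look roughly like the following. This is only a sketch: the class name `DedupCounterCodec` and the member `ids` are hypothetical names chosen for illustration, not part of the original solution.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical variant of the counter-based codec: a hash map remembers the
// id already assigned to each long URL, so repeated requests reuse it.
class DedupCounterCodec {
public:
    std::string encode(const std::string& longUrl) {
        auto it = ids.find(longUrl);
        if (it == ids.end()) {
            // First time this URL is seen: assign the next free id.
            it = ids.emplace(longUrl, long_urls.size()).first;
            long_urls.push_back(longUrl);
        }
        return "http://t.com/" + std::to_string(it->second);
    }

    std::string decode(const std::string& shortUrl) const {
        auto pos = shortUrl.find_last_of('/');
        auto id = std::stoul(shortUrl.substr(pos + 1));
        return long_urls.at(id);  // throws std::out_of_range on an unknown id
    }

private:
    std::vector<std::string> long_urls;                // id -> long URL
    std::unordered_map<std::string, std::size_t> ids;  // long URL -> id
};
```

Encoding the same URL twice now returns the same short URL, but the ids are still sequential, so the counter remains visible to users.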
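A better approach is to build the short URL from a string. Using only digits and letters gives an alphabet of $10 + 2 \times 26 = 62$ characters. Variable-length codes are certainly possible, but the encoding rules would likely be more complicated, and a fixed length is enough. As for the length, Sina Weibo reportedly uses 7 characters, and $62^7 \approx 3.5 \times 10^{12}$, which is already far more than the total number of URLs on today's Internet. So a workable scheme is: for each new long URL, randomly pick 7 of the 62 characters to form its key and store it in a hash table (if the key is already taken, keep generating new ones until it is unused, although the probability of a collision is very low). To decode a short URL, look up its key in the hash table.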
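To make the key-space arithmetic and the retry-on-collision step concrete, here is a minimal sketch. It assumes each of the 7 characters is drawn independently and uniformly, which is what the $62^7$ count corresponds to. The helper names `keySpace`, `randomKey` and `insertWithFreshKey` are hypothetical and exist only for this illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <random>
#include <string>
#include <unordered_map>

// Number of distinct keys of length `len` over a 62-character alphabet.
constexpr std::uint64_t keySpace(int len) {
    std::uint64_t n = 1;
    for (int i = 0; i < len; ++i) n *= 62;
    return n;
}
static_assert(keySpace(7) == 3521614606208ULL, "62^7 is about 3.5e12");

// Draws `len` characters independently and uniformly from the alphabet.
inline std::string randomKey(std::mt19937& rng, int len = 7) {
    static const std::string alphabet =
        "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";
    std::uniform_int_distribution<std::size_t> pick(0, alphabet.size() - 1);
    std::string key;
    for (int i = 0; i < len; ++i) key += alphabet[pick(rng)];
    return key;
}

// Keeps drawing until the key is not yet in `table`, then stores the mapping.
inline std::string insertWithFreshKey(
        std::unordered_map<std::string, std::string>& table,
        const std::string& longUrl, std::mt19937& rng) {
    std::string key = randomKey(rng);
    while (table.count(key)) key = randomKey(rng);
    table[key] = longUrl;
    return key;
}
```

With so few stored keys relative to the key space, the retry loop should almost never run more than once in practice.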
In addition, to avoid wasting keys, a second hash table can record the short URL assigned to each long URL.
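    #include <algorithm>
    #include <random>
    #include <string>
    #include <unordered_map>
    using namespace std;

    class Solution {
    public:
        Solution()
            : dict("0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"),
              len_tiny(7),
              rng(std::random_device{}()) {}

        // Encodes a URL to a shortened URL.
        string encode(string longUrl) {
            // Reuse the key if this long URL has been encoded before.
            auto it = long2short.find(longUrl);
            if (it != long2short.end()) {
                return "http://t.com/" + it->second;
            }
            // Shuffle the alphabet and take its first len_tiny characters as the
            // key, retrying until the key is unused. std::shuffle replaces
            // std::random_shuffle, which was removed in C++17. Keys built this
            // way never repeat a character, so slightly fewer than 62^7 exist.
            string tiny;
            do {
                std::shuffle(dict.begin(), dict.end(), rng);
                tiny = dict.substr(0, len_tiny);
            } while (short2long.count(tiny));
            long2short[longUrl] = tiny;
            short2long[tiny] = longUrl;
            return "http://t.com/" + tiny;
        }

        // Decodes a shortened URL to its original URL.
        string decode(string shortUrl) {
            auto pos = shortUrl.find_last_of('/');
            auto tiny = shortUrl.substr(pos + 1);
            // Unknown keys are returned unchanged.
            return short2long.count(tiny) ? short2long[tiny] : shortUrl;
        }

    private:
        unordered_map<string, string> short2long, long2short;
        string dict;      // alphabet, reshuffled for every new key
        int len_tiny;     // fixed key length
        std::mt19937 rng; // seeded once in the constructor
    };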