Given a string S and an array of strings T (the strings in T are much shorter than S), design an algorithm to find the strings of T in S

Date: 2019-11-17


Given a string S and an array of strings T (the strings in T are much shorter than S), design an algorithm to search S for each string in T.

Solution

This is a multi-pattern string matching problem.

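We call S the target string and the strings in T the pattern strings. Let the target string S have length m, let the patterns have average length n, and let there be k patterns. If we run KMP (or BM) once per pattern to decide whether that pattern occurs in the target, matching one pattern against the target takes O(m+n) time, so the total time complexity is O(k(m+n)). In practice the target is usually a passage of text, an article, or even a gene database, while the patterns are short strings, so m is generally much larger than n. If there are very many patterns to match (that is, k is very large), this approach becomes very slow. This is why KMP and BM are generally used for single-pattern matching rather than multi-pattern matching.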
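As a baseline for the O(k(m+n)) estimate above, the following is a minimal sketch that runs one KMP search per pattern. The class name KmpPerPattern and its method names are illustrative and do not come from the original article; the test strings are the ones used by the sample program at the end.

```java
import java.util.ArrayList;
import java.util.List;

// Baseline sketch: one KMP pass over the target per pattern, O(k(m+n)) overall.
// All names here are hypothetical, chosen only for illustration.
class KmpPerPattern {

    // fail[i] is the length of the longest proper prefix of p[0..i]
    // that is also a suffix of p[0..i].
    static int[] failureTable(String p) {
        int[] fail = new int[p.length()];
        int k = 0;
        for (int i = 1; i < p.length(); i++) {
            while (k > 0 && p.charAt(i) != p.charAt(k)) {
                k = fail[k - 1];
            }
            if (p.charAt(i) == p.charAt(k)) {
                k++;
            }
            fail[i] = k;
        }
        return fail;
    }

    // Returns every start index at which pattern p occurs in target s.
    static List<Integer> findAll(String s, String p) {
        List<Integer> hits = new ArrayList<>();
        if (p.isEmpty()) {
            return hits; // this sketch treats the empty pattern as "no hits"
        }
        int[] fail = failureTable(p);
        int k = 0; // number of pattern characters currently matched
        for (int i = 0; i < s.length(); i++) {
            while (k > 0 && s.charAt(i) != p.charAt(k)) {
                k = fail[k - 1];
            }
            if (s.charAt(i) == p.charAt(k)) {
                k++;
            }
            if (k == p.length()) {
                hits.add(i - k + 1);
                k = fail[k - 1]; // keep going to find overlapping matches
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        String s = "mississippi";
        String[] patterns = { "is", "sip", "hi", "sis" };
        for (String p : patterns) {
            // Each call rescans the whole target, which is the cost the text describes.
            System.out.println(p + ": " + findAll(s, p));
        }
    }
}
```

Every call to findAll walks the full target again, so the work grows linearly with the number of patterns k even when the patterns are tiny.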

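So which algorithms can solve the multi-pattern matching problem? There are quite a few: the Trie, the AC (Aho-Corasick) automaton, the WM (Wu-Manber) algorithm, suffix trees, and so on. Let us start with the simple Trie.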

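A Trie, also called a dictionary tree, word search tree, or prefix tree, is a multi-way tree structure used for fast retrieval. For example, a Trie over the English letters is a 26-ary tree, and a Trie over the digits is a 10-ary tree.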

A Trie saves storage space by sharing the common prefixes of strings, which is why it is also called a prefix tree.

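Given the words abc, abcd, abd, b, bcd, efg, and hig, we can build a Trie from them, as shown in the original figure (the last edge on the far right of that figure is missing a letter).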
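To make the prefix sharing concrete, here is a minimal sketch of a Trie holding the seven words above. WordTrie, Node, isWord, and nodeCount are names invented for this illustration, and the HashMap-based children are one possible representation, not necessarily the one in the figure.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative Trie sketch; all names are hypothetical.
class WordTrie {
    static class Node {
        Map<Character, Node> children = new HashMap<>();
        boolean isWord; // true if some inserted word ends at this node
    }

    final Node root = new Node();
    int nodeCount = 1; // counts the root

    void insert(String word) {
        Node cur = root;
        for (char c : word.toCharArray()) {
            Node next = cur.children.get(c);
            if (next == null) {
                next = new Node();
                cur.children.put(c, next);
                nodeCount++;
            }
            cur = next;
        }
        cur.isWord = true;
    }

    // Follows the path spelled by s; returns null if the path does not exist.
    private Node walk(String s) {
        Node cur = root;
        for (char c : s.toCharArray()) {
            cur = cur.children.get(c);
            if (cur == null) {
                return null;
            }
        }
        return cur;
    }

    boolean contains(String word) {
        Node n = walk(word);
        return n != null && n.isWord;
    }

    boolean hasPrefix(String prefix) {
        return walk(prefix) != null;
    }

    public static void main(String[] args) {
        WordTrie trie = new WordTrie();
        for (String w : new String[] { "abc", "abcd", "abd", "b", "bcd", "efg", "hig" }) {
            trie.insert(w);
        }
        System.out.println(trie.nodeCount);       // 15
        System.out.println(trie.contains("abd")); // true
        System.out.println(trie.contains("ab"));  // false: "ab" is only a prefix
        System.out.println(trie.hasPrefix("ab")); // true
    }
}
```

The seven words contain 20 characters in total, but the Trie needs only 14 nodes besides the root, because abc, abcd, and abd share the path a-b and b and bcd share the node b.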

Back to our problem: we want to know whether the strings in T occur in S (or where they occur). How does this relate to a Trie?

Suppose S = "abcd". Then all of its suffixes are:

abcd
bcd
cd
d

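Observe that if a string t is a substring of S, then t must be a prefix of some suffix of S. For example, t = bc is a prefix of the suffix bcd, and t = c is a prefix of the suffix cd.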
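The observation can be checked directly, without any Trie, by testing t against each suffix in turn. This is only a sketch to illustrate the statement; SuffixPrefixCheck and isSubstring are hypothetical names, and the loop is the slow brute-force form, not the approach the article goes on to build.

```java
// Brute-force check of "t is a substring of s iff t is a prefix of some suffix of s".
// Names are illustrative only.
class SuffixPrefixCheck {

    static boolean isSubstring(String s, String t) {
        // i ranges over the start of every suffix of s, including the empty suffix.
        for (int i = 0; i <= s.length(); i++) {
            // startsWith(t, i) asks whether t is a prefix of the suffix s.substring(i).
            if (s.startsWith(t, i)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(isSubstring("abcd", "bc")); // true: prefix of suffix "bcd"
        System.out.println(isSubstring("abcd", "c"));  // true: prefix of suffix "cd"
        System.out.println(isSubstring("abcd", "ca")); // false
    }
}
```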

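Therefore, we only need to build a Trie from all suffixes of S (a suffix Trie) and then check whether each pattern appears in that Trie. If a pattern t has length n, we match downward from the root and can decide in O(n) time whether t is a substring of S.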

Figure: the suffix Trie of BANANAS.

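Lookup in a suffix Trie is very efficient: finding a string of length n takes only O(n) time, with as many comparisons as the string has characters. However, building the suffix Trie of S takes O(m^2) time (where m is the length of S) and O(m^2) space in the worst case, because the m suffixes have a total length of m(m+1)/2.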
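To see where the quadratic cost comes from, the sketch below builds a suffix Trie and counts its nodes. SuffixTrieSize and countNodes are names made up for this illustration. When all characters of S are distinct, no two suffixes share a path, so the node count reaches 1 + m(m+1)/2; heavily repetitive strings need far fewer nodes.

```java
import java.util.HashMap;
import java.util.Map;

// Counts the nodes of the suffix Trie of a string. Names are illustrative only.
class SuffixTrieSize {
    static class Node {
        Map<Character, Node> children = new HashMap<>();
    }

    static int countNodes(String s) {
        Node root = new Node();
        int count = 1; // the root
        // Insert every suffix s[i..]; the nested loops do m(m+1)/2 steps in total.
        for (int i = 0; i < s.length(); i++) {
            Node cur = root;
            for (int j = i; j < s.length(); j++) {
                char c = s.charAt(j);
                Node next = cur.children.get(c);
                if (next == null) {
                    next = new Node();
                    cur.children.put(c, next);
                    count++;
                }
                cur = next;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countNodes("abcd")); // 11 = 1 + 4*5/2, no sharing at all
        System.out.println(countNodes("aaaa")); // 5, every suffix lies on one path
    }
}
```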

package Hard;
import java.util.ArrayList;
import java.util.HashMap;
/**
 * Given a string s and an array of smaller strings T, design a method to
 * search s for each small string in T.
 */
public class S18_8 {
    // Node of the suffix Trie (the class keeps its original "SuffixTree" name).
    static class SuffixTreeNode {
        HashMap<Character, SuffixTreeNode> children = new HashMap<Character, SuffixTreeNode>();
        // Start positions in the target of every suffix whose path passes through this node.
        ArrayList<Integer> indexes = new ArrayList<Integer>();

        public SuffixTreeNode() {
        }

        // Inserts the suffix s, which starts at position index in the target string.
        public void insertString(String s, int index) {
            indexes.add(index);
            if (s != null && s.length() > 0) {
                char first = s.charAt(0);
                SuffixTreeNode child = children.get(first);
                if (child == null) {
                    child = new SuffixTreeNode();
                    children.put(first, child);
                }
                child.insertString(s.substring(1), index);
            }
        }

        // Returns the start positions of every occurrence of s, or null if s does not occur.
        public ArrayList<Integer> search(String s) {
            if (s == null || s.length() == 0) {
                return indexes;
            }
            char first = s.charAt(0);
            if (children.containsKey(first)) {
                return children.get(first).search(s.substring(1));
            }
            return null;
        }
    }
    // Suffix Trie built by inserting every suffix of s.
    static class SuffixTree {
        SuffixTreeNode root = new SuffixTreeNode();

        public SuffixTree(String s) {
            for (int i = 0; i < s.length(); i++) {
                String suffix = s.substring(i);
                root.insertString(suffix, i);
            }
        }

        public ArrayList<Integer> search(String s) {
            return root.search(s);
        }
    }
    public static void main(String[] args) {
        String testString = "mississippi";
        String[] stringList = { "is", "sip", "hi", "sis" };
        SuffixTree tree = new SuffixTree(testString);
        for (String s : stringList) {
            ArrayList<Integer> list = tree.search(s);
            // search returns null when s does not occur, so "hi" prints nothing.
            if (list != null) {
                System.out.println(s + ": " + list.toString());
            }
        }
        // Prints: is: [1, 4], then sip: [6], then sis: [3]
    }
}