Understanding how KMP computes the next array

Date: 2019-12-14

First, the code:

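#include <cstring>

// Fills next[0..pLen-1] for the pattern p (0-indexed, next[0] = -1).
void GetNext(char* p,int next[])
{
    int pLen = strlen(p);
    next[0] = -1;
    int k = -1;
    int j = 0;
    while (j < pLen - 1)
    {
        // p[k] is the end of the prefix, p[j] is the end of the suffix
        if (k == -1 || p[j] == p[k])
        {
            ++k;
            ++j;
            next[j] = k;
        }
        else
        {
            // mismatch: fall back to the next shorter candidate prefix
            k = next[k];
        }
    }
}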

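First we need to be clear that next[j] is the length of the longest match between a prefix and a suffix of the string 0~j-1, that is, the string without p[j] itself.

Because indices start at 0, this longest match length is also the index of the position right after the longest matching prefix.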
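As a quick illustration of this definition, here is a minimal sketch that wraps the same loop in a helper returning a vector. The name buildNext and the use of std::string are my own choices for the example, not part of the original code.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: the same loop as GetNext, but returning a vector.
std::vector<int> buildNext(const std::string& p) {
    assert(!p.empty());               // next[0] needs at least one character
    const int n = static_cast<int>(p.size());
    std::vector<int> next(n, 0);
    next[0] = -1;
    int k = -1, j = 0;
    while (j < n - 1) {
        if (k == -1 || p[j] == p[k]) {
            ++k;
            ++j;
            next[j] = k;              // longest match for the string 0~j-1
        } else {
            k = next[k];              // try a shorter prefix
        }
    }
    return next;
}
```

For the pattern ababc this gives next = {-1, 0, 0, 1, 2}: the string before index 4 is abab, whose longest matching prefix and suffix is ab, of length 2, and index 2 is also the position right after that prefix.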

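The next array therefore satisfies the following property.

For the string 0~j:

p[j-1]=p[next[j]-1] whenever next[j]>0. Why? Because next[j] is the position right after the longest prefix that matches a suffix ending in p[j-1],

so in the string 0~next[j], the position just before the last one must match the last character of that suffix, p[j-1]. Hence p[next[j]-1]=p[j-1].

Now for the string 0~next[j]:

p[next[j]-1]=p[next[next[j]]-1]

next[next[j]] is the position right after the longest prefix that matches a suffix ending in p[next[j]-1]. (Being the longest means it cannot be extended: if a longer prefix also matched, this one would not be the longest.) So in the string 0~next[next[j]], the position just before the last one must equal the last character of the suffix, p[next[j]-1].

Therefore

p[next[next[j]]-1]=p[next[j]-1]

So what can we conclude?

p[j-1]=p[next[j]-1]=p[next[next[j]]-1]

In other words, call the starting position j and let j' be the position reached after iterating the next array some number of times (j=next[j]).

Then p[j-1]=p[j'-1] always holds, as long as j'>0.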
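The chain property can be checked mechanically. The sketch below is only an illustration: nextArray repeats the GetNext loop, and chainInvariantHolds is a hypothetical helper that follows j' = next[j'] from every j and compares p[j-1] with p[j'-1] while j' stays positive.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper mirroring the GetNext loop from the text.
std::vector<int> nextArray(const std::string& p) {
    assert(!p.empty());
    const int n = static_cast<int>(p.size());
    std::vector<int> next(n, 0);
    next[0] = -1;
    int k = -1, j = 0;
    while (j < n - 1) {
        if (k == -1 || p[j] == p[k]) {
            next[++j] = ++k;
        } else {
            k = next[k];
        }
    }
    return next;
}

// True if every j' reached from j by repeating j' = next[j'] (while j' > 0)
// satisfies p[j-1] == p[j'-1].
bool chainInvariantHolds(const std::string& p) {
    const std::vector<int> next = nextArray(p);
    for (int j = 1; j < static_cast<int>(p.size()); ++j) {
        for (int jp = next[j]; jp > 0; jp = next[jp]) {
            if (p[j - 1] != p[jp - 1]) return false;
        }
    }
    return true;
}
```

The inner loop stops at j' = 0 because there is no character before position 0 to compare.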

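Now for the code above, let us first analyse the state at each position.

If the position is 0, then on a mismatch it falls into a self-loop: it can only keep trying to match against position 0. So we design

0->-1->0 as this self-loop state.

For positions other than 0 there are two cases.

No equal prefix and suffix can be found. If the current position is j, this means no prefix matches a suffix ending in p[j-1]. The only option is then to fall back by brute force to position 0.

An equal prefix and suffix can be found.

　　　　　　　　　　This again splits into two cases. First, the characters match.

　　　　　　　　　　If they match, next for the following position equals the current k+1, because at this point k is really next[j]: if p[j]==p[next[j]], then next[j+1]=next[j]+1.

　　　　　　　　　　Because next[j] does not count p[j], one more match means the common prefix/suffix length grows by one, for the newly added character p[j].

　　　　　　　　　　In more detail, the current longest matching prefix grows from 0~next[j]-1 to 0~next[j].

　　　　　　　　　　Second, the characters do not match.

　　　　　　　　　　We set j=next[j] (k=next[k] in the code). By the conclusion above, p[j-1]=p[j'-1], so we obtain progressively shorter prefixes, each matching a suffix that ends in p[j-1].

　　　　　　　　　　If k falls back to -1, there really is no equal prefix and suffix: discard the match length kept so far and reset the position to 0.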
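One way to gain confidence in this case analysis is to compare the loop against the definition by brute force. The following sketch assumes nothing beyond the definition given earlier; longestBorderBrute, nextByLoop and agreesWithDefinition are names invented for the example.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Brute force: length of the longest prefix of p[0..j-1], shorter than the
// string itself, that is also a suffix of p[0..j-1].
int longestBorderBrute(const std::string& p, int j) {
    assert(j >= 0 && j <= static_cast<int>(p.size()));
    for (int len = j - 1; len > 0; --len) {
        if (p.compare(0, len, p, j - len, len) == 0) return len;
    }
    return 0;
}

// The GetNext loop from the text, returning a vector.
std::vector<int> nextByLoop(const std::string& p) {
    assert(!p.empty());
    const int n = static_cast<int>(p.size());
    std::vector<int> next(n, 0);
    next[0] = -1;
    int k = -1, j = 0;
    while (j < n - 1) {
        if (k == -1 || p[j] == p[k]) {
            next[++j] = ++k;
        } else {
            k = next[k];
        }
    }
    return next;
}

// True if next[j] equals the brute-force value for every j >= 1.
bool agreesWithDefinition(const std::string& p) {
    const std::vector<int> next = nextByLoop(p);
    for (int j = 1; j < static_cast<int>(p.size()); ++j) {
        if (next[j] != longestBorderBrute(p, j)) return false;
    }
    return true;
}
```

The brute-force version is quadratic or worse, so it is only meant for short test patterns.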

In fact, a program for computing the Next array that does not hide the internal logic is

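void initNext(){
    Next[1]=0;
    int p1=0,p2=1;
    while(p2<=m){
        if(p1==0){
            // Not even the first character can be matched, so the prefix pointer should stay at the first character, position 1
            // no prefix matches a suffix ending at p2
            Next[p2+1]=1;// so when the next position mismatches, compare against position 1
            // The meaning here is special: the string before position 1 is empty, which still fits the definition of Next, since the longest prefix/suffix match of the empty string is 0
            // Comparing against position 1 therefore means the longest prefix/suffix match of the empty string before it is 0
            // It also means that when the longest match is 0 the pointer can only go back to position 1, which is not yet matched and is waiting to be matched
            p1=1;// reset the pointer; p1 hovers at the last character that might extend the match
            p2++;// move on to the longest match ending at p2+1
        }
        else if(b[p1]==b[p2]){
            // one more character of the prefix matches the suffix
            Next[p2+1]=Next[p2]+1;// this transition extends the matched string from 1~Next[p2]-1 to 1~Next[p2]
            p1=Next[p2]+1;// p1 moves to the last character that might currently extend the match
            p2++;// move on to the longest match ending at p2+1
        }
        else{
            // On first entering this state, or on the first of several consecutive iterations of it, the current p1 is exactly Next[p2]
            // Next[p2] describes the prefix 1~Next[p2]-1, with b[Next[p2]-1]=b[p2-1];
            // p1=Next[p1] is then equivalent to p1=Next[Next[p2]];
            // which still yields a prefix with b[Next[Next[p2]]-1]=b[Next[p2]-1]=b[p2-1]
            p1=Next[p1];// keep iterating to shorter prefixes whose last character b[p1-1] still matches
        }
    }
}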

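It is also worth noting that

the position where pointer p1 hovers is one whose state is undetermined: it needs one comparison to confirm.

p1=minIndex-1 means that not even the first character can be matched. We then take the self-loop and move p1 back to minIndex, because position minIndex is now waiting to be matched.

For a string of length len, the prefix and suffix of length len are always equal, since both are the string itself. This length is not counted as the longest match because it carries no information.

For example a: it has prefix a and suffix a, but their length is len, so they are not counted.

So for the prefix pointer p1 and the suffix pointer p2 we start with p2-p1=1, that is, adjacent. In other words the smallest length considered is 2, one pointer in front and one behind.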

hdu1711: find the earliest position at which the pattern occurs in the text

#include <iostream>
#include <cstdio>
#include <cstring>
using namespace std;
const int maxn=1e5+7;
const int N=1e6+7;
int Next[maxn],a[N],b[maxn],n,m;
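void initNext(){
    memset(Next,0,sizeof(Next));
    Next[1]=0;
    int p1=0,p2=1;
    while(p2<=m){
        if(p1==0){
            // Not even the first character can be matched, so the prefix pointer should stay at the first character, position 1
            // no prefix matches a suffix ending at p2
            Next[p2+1]=1;// so when the next position mismatches, compare against position 1
            // The meaning here is special: the string before position 1 is empty, which still fits the definition of Next, since the longest prefix/suffix match of the empty string is 0
            // Comparing against position 1 therefore means the longest prefix/suffix match of the empty string before it is 0
            // It also means that when the longest match is 0 the pointer can only go back to position 1, which is not yet matched and is waiting to be matched
            p1=1;// reset the pointer; p1 hovers at the last character that might extend the match
            p2++;// move on to the longest match ending at p2+1
        }
        else if(b[p1]==b[p2]){
            // one more character of the prefix matches the suffix
            Next[p2+1]=Next[p2]+1;// this transition extends the matched string from 1~Next[p2]-1 to 1~Next[p2]
            p1=Next[p2]+1;// p1 moves to the last character that might currently extend the match
            p2++;// move on to the longest match ending at p2+1
        }
        else{
            // On first entering this state, or on the first of several consecutive iterations of it, the current p1 is exactly Next[p2]
            // Next[p2] describes the prefix 1~Next[p2]-1, with b[Next[p2]-1]=b[p2-1];
            // p1=Next[p1] is then equivalent to p1=Next[Next[p2]];
            // which still yields a prefix with b[Next[Next[p2]]-1]=b[Next[p2]-1]=b[p2-1]
            p1=Next[p1];// keep iterating to shorter prefixes whose last character b[p1-1] still matches
        }
    }
}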
int Match(){
    int i,j;i=1;j=1;
    while(i<=n){
        if(j==0){
            i++;j++;
        }
        else if(a[i]==b[j]){
            i++;j++;
        }
        else j=Next[j];
        if(j==m+1) return i-j+1;
    }
    return -1;
}
int main(){
    int T;scanf("%d",&T);
    while(T--){
        scanf("%d%d",&n,&m);
        for(int i=1;i<=n;++i) scanf("%d",a+i);
        for(int i=1;i<=m;++i) scanf("%d",b+i);
        initNext();
        //for(int i=1;i<=m;++i) printf("nxt:%d,",Next[i]);printf("\n");
        printf("%d\n",Match());
    }
    return 0;
}

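However, this program is actually wrong: the transition Next[p2+1]=Next[p2]+1; is incorrect.

The real transition is Next[p2+1]=i+1, where i is the prefix pointer (p1 above) after some number of fallback iterations, so it may correspond to Next[p2'] for an earlier p2' rather than to Next[p2].

So for poj1961 the computation above gets WA. It passed the earlier data only because that data was weak.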
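To make the difference concrete, here is a small sketch that runs both transitions on the same pattern. The function buildNextOneIndexed and its flag useStaleValue are hypothetical names for this example; the loop follows the 1-indexed layout of initNext.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Builds the 1-indexed Next array for pattern t (t[0] plays the role of b[1]).
// useStaleValue = true reproduces the transition Next[p2+1] = Next[p2] + 1;
// false uses Next[p2+1] = p1 + 1, with p1 taken after any fallbacks.
std::vector<int> buildNextOneIndexed(const std::string& t, bool useStaleValue) {
    assert(!t.empty());
    const int m = static_cast<int>(t.size());
    const std::string b = " " + t;           // shift to 1-based indexing
    std::vector<int> Next(m + 2, 0);
    int p1 = 0, p2 = 1;
    while (p2 <= m) {
        if (p1 == 0) {
            Next[p2 + 1] = 1;
            p1 = 1;
            ++p2;
        } else if (b[p1] == b[p2]) {
            const int value = useStaleValue ? Next[p2] + 1 : p1 + 1;
            Next[p2 + 1] = value;
            p1 = value;
            ++p2;
        } else {
            p1 = Next[p1];                   // fall back to a shorter prefix
        }
    }
    return Next;
}
```

For the pattern aabaaa both variants agree up to Next[6]=3. At the last character the pointer first falls back from 3 to 2 and only then matches, so the corrected transition stores Next[7]=3 while the stale one stores Next[7]=4. The longest prefix/suffix match of aabaaa is aa, of length 2, so 3 is the value consistent with the definition.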

Period

Time Limit: 3000MS		Memory Limit: 30000K
Total Submissions: 17771		Accepted: 8562

Description

For each prefix of a given string S with N characters (each character has an ASCII code between 97 and 126, inclusive), we want to know whether the prefix is a periodic string. That is, for each i (2 <= i <= N) we want to know the largest K > 1 (if there is one) such that the prefix of S with length i can be written as A ^K ,that is A concatenated K times, for some string A. Of course, we also want to know the period K.

Input

The input consists of several test cases. Each test case consists of two lines. The first one contains N (2 <= N <= 1 000 000) – the size of the string S.The second line contains the string S. The input file ends with a line, having the
number zero on it.

Output

For each test case, output "Test case #" and the consecutive test case number on a single line; then, for each prefix with length i that has a period K > 1, output the prefix size i and the period K separated by a single space; the prefix sizes must be in increasing order. Print a blank line after each test case.

Sample Input

3
aaa
12
aabaabaabaab
0

Sample Output

Test case #1
2 2
3 3

Test case #2
2 2
6 2
9 3
12 4

The correct code is as follows

#include <iostream>
#include <cstdio>
#include <cstring>
using namespace std;
const int maxn=1e6+7;
char s[maxn];int n,Next[maxn];

int main(){
    int cas=0;
    while(~scanf("%d",&n)){
        if(n!=0&&cas) printf("\n");
        if(n==0) break;
        printf("Test case #%d\n",++cas);
        scanf("%s",s+1);
        memset(Next,-1,sizeof(Next));
        Next[1]=0;int i=0,j=1;
        while(j<=n){
            if(i==0){
                Next[j+1]=1;//1 not 0
                i=1;j++;
                if(Next[j]!=-1&&j>=2&&j<=n&&j%(j-Next[j])==0&&s[j]==s[Next[j]]){
                    printf("%d %d\n",j,j/(j-Next[j]));
                }
            }
            else if(s[i]==s[j]){
                Next[j+1]=i+1;//
                i++;j++;
                if(Next[j]!=-1&&j>=2&&j<=n&&j%(j-Next[j])==0&&s[j]==s[Next[j]]){
                    printf("%d %d\n",j,j/(j-Next[j]));
                }
            }
            else {
                i=Next[i];//not j=Next[j]
            }
        }
    }
    return 0;
}

Also, there was one logic error: the length of a periodic string is not necessarily a multiple of 2.
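The periodicity test can be sketched separately from the input handling. Note that this sketch swaps in the common 0-indexed border table (border[i] is the longest prefix/suffix match of the first i characters) instead of the 1-indexed Next array used in the listing above; borderLengths and periodicPrefixes are names made up for the example.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// border[i] = length of the longest prefix of s[0..i-1], shorter than i,
// that is also a suffix of s[0..i-1].
std::vector<int> borderLengths(const std::string& s) {
    const int n = static_cast<int>(s.size());
    std::vector<int> border(n + 1, 0);
    int k = 0;
    for (int i = 1; i < n; ++i) {
        while (k > 0 && s[i] != s[k]) k = border[k];
        if (s[i] == s[k]) ++k;
        border[i + 1] = k;
    }
    return border;
}

// Returns (prefix length i, repeat count K) for every prefix equal to A^K, K > 1.
std::vector<std::pair<int, int>> periodicPrefixes(const std::string& s) {
    const std::vector<int> border = borderLengths(s);
    std::vector<std::pair<int, int>> out;
    for (int i = 2; i <= static_cast<int>(s.size()); ++i) {
        const int period = i - border[i];
        assert(period > 0);                  // border[i] is always below i
        if (border[i] > 0 && i % period == 0) out.emplace_back(i, i / period);
    }
    return out;
}
```

On the sample input aaa this reports (2, 2) and (3, 3): the prefix of length 3 is periodic with K=3 even though 3 is not a multiple of 2.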

Power Strings

Time Limit: 3000MS		Memory Limit: 65536K
Total Submissions: 48116		Accepted: 20030

Description

Given two strings a and b we define a*b to be their concatenation. For example, if a = "abc" and b = "def" then a*b = "abcdef". If we think of concatenation as multiplication, exponentiation by a non-negative integer is defined in the normal way: a^0 = "" (the empty string) and a^(n+1) = a*(a^n).

Input

Each test case is a line of input representing s, a string of printable characters. The length of s will be at least 1 and will not exceed 1 million characters. A line containing a period follows the last test case.

Output

For each s you should print the largest n such that s = a^n for some string a.

Sample Input

abcd
aaaa
ababab
.

Sample Output

1
4
3

Hint

This problem has huge input, use scanf instead of cin to avoid time limit exceed.

#include <iostream>
#include <cstring>
#include <cstdio>
using namespace std;
const int maxn=1e6+7;
char s[maxn];int Next[maxn];
int main(){
    while(~scanf("%s",s+1)){
        int len=strlen(s+1);
        if(len==1){
            if(s[1]=='.') break;
            printf("1\n");continue;
        }
        memset(Next,-1,sizeof(Next));
        Next[1]=0;int p1=0,p2=1;
        while(p2<=len){
            if(p1==0){
                Next[++p2]=1;
                p1=1;
            }
            else if(s[p1]==s[p2]){
                Next[++p2]=p1+1;
                p1++;
            }
            else p1=Next[p1];
        }
        if(Next[len]==1){
            printf("1\n");continue;
        }
        if((len%(len-Next[len])==0)&&s[len]==s[Next[len]]){
            printf("%d\n",len/(len-Next[len]));
        }    
        else printf("1\n");
    }
    return 0;
}

Oulipo

Time Limit: 1000MS		Memory Limit: 65536K
Total Submissions: 40122		Accepted: 16122

Description

The French author Georges Perec (1936–1982) once wrote a book, La disparition, without the letter 'e'. He was a member of the Oulipo group. A quote from the book:

Tout avait Pair normal, mais tout s’affirmait faux. Tout avait Fair normal, d’abord, puis surgissait l’inhumain, l’affolant. Il aurait voulu savoir où s’articulait l’association qui l’unissait au roman : stir son tapis, assaillant à tout instant son imagination, l’intuition d’un tabou, la vision d’un mal obscur, d’un quoi vacant, d’un non-dit : la vision, l’avision d’un oubli commandant tout, où s’abolissait la raison : tout avait l’air normal mais…

Perec would probably have scored high (or rather, low) in the following contest. People are asked to write a perhaps even meaningful text on some subject with as few occurrences of a given "word" as possible. Our task is to provide the jury with a program that counts these occurrences, in order to obtain a ranking of the competitors. These competitors often write very long texts with nonsense meaning; a sequence of 500,000 consecutive 'T's is not unusual. And they never use spaces.

So we want to quickly find out how often a word, i.e., a given string, occurs in a text. More formally: given the alphabet {'A', 'B', 'C', …, 'Z'} and two finite strings over that alphabet, a word W and a text T, count the number of occurrences of W in T. All the consecutive characters of W must exactly match consecutive characters of T. Occurrences may overlap.

Input

The first line of the input file contains a single number: the number of test cases to follow. Each test case has the following format:

One line with the word W, a string over {'A', 'B', 'C', …, 'Z'}, with 1 ≤ |W| ≤ 10,000 (here |W| denotes the length of the string W).
One line with the text T, a string over {'A', 'B', 'C', …, 'Z'}, with |W| ≤ |T| ≤ 1,000,000.

Output

For every test case in the input file, the output should contain a single number, on a single line: the number of occurrences of the word W in the text T.

Sample Input

3
BAPC
BAPC
AZA
AZAZAZA
VERDI
AVERDXIVYERDIAN

Sample Output

1
3
0

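Use KMP to count how many times the substring occurs. Note the case p1>t_len (this is why the Next array is computed up to maxIndex+1). Since the string S contains no null character, this case amounts to

the terminating null character of T failing to match S, which means every character of T before it has matched S. Continuing with p1=Next[p1] is therefore correct; setting p1=1 is wrong and undercounts the answer.

Take AZAZAZA and AZA: when the pointer into the second string reaches 4, Next lets it jump to 2 and go on to match the middle AZA, whereas resetting directly to 1 would only pick up the last AZA.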
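The effect of the two choices can be seen in a short sketch. It is written with a 0-indexed border table rather than the 1-indexed Next array of the listing that follows, and countOccurrences with its flag restartAfterMatch is a hypothetical helper for illustration only.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Counts occurrences of word in text. After a full match the pattern pointer
// either falls back through the border table (overlaps are kept) or, when
// restartAfterMatch is true, is reset to the start (overlaps can be lost).
int countOccurrences(const std::string& word, const std::string& text,
                     bool restartAfterMatch) {
    assert(!word.empty());
    const int m = static_cast<int>(word.size());
    std::vector<int> border(m + 1, 0);
    for (int i = 1, k = 0; i < m; ++i) {
        while (k > 0 && word[i] != word[k]) k = border[k];
        if (word[i] == word[k]) ++k;
        border[i + 1] = k;
    }
    int count = 0, k = 0;
    for (char c : text) {
        while (k > 0 && c != word[k]) k = border[k];
        if (c == word[k]) ++k;
        if (k == m) {                        // the whole word has matched
            ++count;
            k = restartAfterMatch ? 0 : border[m];
        }
    }
    return count;
}
```

With AZA and AZAZAZA the fallback variant counts 3 occurrences, while the restart variant counts only 2 in this sketch, because the occurrence that overlaps the first match is skipped.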

#include <iostream>
#include <cstdio>
#include <cstring>
using namespace std;
const int maxn=1e6+7;
char S[maxn],T[maxn];
int Next[maxn];
int main(){
    int t;scanf("%d",&t);
    while(t--){
        scanf("%s%s",T+1,S+1);
        memset(Next,-1,sizeof(Next));
        int s_len=strlen(S+1),t_len=strlen(T+1);
        Next[1]=0;int p1=0,p2=1;
        while(p2<=t_len){
            if(p1==0){
                Next[++p2]=1;p1=1;
            }
            else if(T[p1]==T[p2]){
                Next[++p2]=++p1;
            }
            else p1=Next[p1];
        }
        int ans=0;
        p1=p2=1;
        while(p2<=s_len){
            if(p1==0||T[p1]==S[p2]){
                p1++;p2++;
                if(p1>t_len){
                    ans++;// no need to reset the p1 pointer here
                }
            }
            else {
                p1=Next[p1];
            }
        }
        printf("%d\n",ans);
    }
    return 0;
}

Cyclic Nacklace

Time Limit: 2000/1000 MS (Java/Others) Memory Limit: 32768/32768 K (Java/Others)
Total Submission(s): 8622 Accepted Submission(s): 3707

Problem Description

CC always becomes very depressed at the end of this month, he has checked his credit card yesterday, without any surprise, there are only 99.9 yuan left. he is too distressed and thinking about how to tide over the last days. Being inspired by the entrepreneurial spirit of "HDU CakeMan", he wants to sell some little things to make money. Of course, this is not an easy task.

As Christmas is around the corner, Boys are busy in choosing christmas presents to send to their girlfriends. It is believed that chain bracelet is a good choice. However, Things are not always so simple, as is known to everyone, girl's fond of the colorful decoration to make bracelet appears vivid and lively, meanwhile they want to display their mature side as college students. after CC understands the girls demands, he intends to sell the chain bracelet called CharmBracelet. The CharmBracelet is made up with colorful pearls to show girls' lively, and the most important thing is that it must be connected by a cyclic chain which means the color of pearls are cyclic connected from the left to right. And the cyclic count must be more than one. If you connect the leftmost pearl and the rightmost pearl of such chain, you can make a CharmBracelet. Just like the pictrue below, this CharmBracelet's cycle is 9 and its cyclic count is 2:

Now CC has brought in some ordinary bracelet chains, he wants to buy minimum number of pearls to make CharmBracelets so that he can save more money. but when remaking the bracelet, he can only add color pearls to the left end and right end of the chain, that is to say, adding to the middle is forbidden.
CC is satisfied with his ideas and ask you for help.

Input

The first line of the input is a single integer T ( 0 < T <= 100 ) which means the number of test cases.
Each test case contains only one line describe the original ordinary chain to be remade. Each character in the string stands for one pearl and there are 26 kinds of pearls being described by 'a' ~'z' characters. The length of the string Len: ( 3 <= Len <= 100000 ).

Output

For each case, you are required to output the minimum count of pearls added to make a CharmBracelet.

Sample Input

3
aaa
abca
abcde

Sample Output

0
2
5

Author

possessor WC

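When printing the answer, note the special case Next[len]=1,

the distinction between abca and abce,

and xyzabcabcqe, which has no period.

As long as a suffix of the period shows up at the front and a prefix of it shows up at the back, we can pad on the right only: just look for the periodic string from left to right, with no need to carve the periodic part out of the middle.

Examples: exyzabcabcqe, qexyzabcabcqe, qexyzabcabcqex, xyzabcabcqex, xyzabcabcqexy, and so on.

Also, the length of the period is always len-Next[len]. Note the special cases len-Next[len]=1 and Next[len]=1.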

#include <iostream>
#include <cstdio>
#include <cstring>
using namespace std;
const int maxn=1e5+7;
char s[maxn];int Next[maxn];
int main(){
    int T;scanf("%d",&T);
    while(T--){
        scanf("%s",s+1);
        memset(Next,-1,sizeof(Next));
        Next[1]=0;int p1=0,p2=1,len=strlen(s+1);
        while(p2<=len){
            if(p1==0){
                Next[++p2]=1;p1=1;
            }
            else if(s[p1]==s[p2]){
                Next[++p2]=++p1;
            }
            else p1=Next[p1];
        }
        int mod=len-Next[len];
        if(Next[len]==1){
            if(s[len]==s[1]) printf("%d\n",len-2);// the period length itself becomes len-1; subtract the ending already present: len-1-1
            else           printf("%d\n",len);
            continue;
        }
        if(mod==1){
            if(s[len]==s[Next[len]]) printf("0\n");
            else                      printf("%d\n",len);
        }
        else{
            if(len%mod==0) printf("0\n");
            else            printf("%d\n",mod-len%mod);
        }
    }
    return 0;
}
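As an alternative way to look at the same problem, the sketch below works from the longest prefix/suffix match of the whole string, which removes the need for special cases. This is my own variant, not the listing above: it uses a 0-indexed border table, and pearlsToAdd is a name invented for the example. By my reading, border[len] here corresponds to Next[len+1]-1 in the article's 1-indexed convention.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Minimum number of pearls to append on the right so that the chain becomes
// A^K for some string A with K >= 2.
int pearlsToAdd(const std::string& s) {
    assert(!s.empty());
    const int len = static_cast<int>(s.size());
    std::vector<int> border(len + 1, 0);
    for (int i = 1, k = 0; i < len; ++i) {
        while (k > 0 && s[i] != s[k]) k = border[k];
        if (s[i] == s[k]) ++k;
        border[i + 1] = k;
    }
    const int period = len - border[len];    // smallest period of the string
    if (period != len && len % period == 0) return 0;   // already cyclic
    return period - len % period;            // pearls needed to finish the last cycle
}
```

On the sample input this returns 0, 2 and 5 for aaa, abca and abcde. It may also be worth trying inputs such as abcad, where the last character breaks an otherwise promising match; the sketch returns 5 for it.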
