KMP Algorithm Side Note: Computing the next Array

Date: 2019-11-21

Tags: kmp, algorithm, solving, array


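For string pattern matching, KMP has a much better time complexity than naive matching (O(n + m) in the worst case instead of O(nm), for a text of length n and a pattern of length m). In practice, though, the gain comes with a precondition: the pattern must contain a lot of repetition, otherwise the improvement does not show. In real applications KMP itself is not a particularly widely used algorithm, but its core idea, the way the next (prefix) array is computed, is used very often. This note pulls that part out on its own and explains how to compute the prefix array.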

1. Definition of the next (prefix) array

Whether you are solving problems or deriving an algorithm, always remember the definition; it is the most important thing.
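The formula for the definition is missing from this copy. The reconstruction below is an assumption, chosen so that it agrees with the code in the following sections; for a pattern $p_0 p_1 \cdots p_{m-1}$:

```latex
\mathrm{next}[j] =
\begin{cases}
-1, & j = 0,\\[2pt]
\max\{\, k \mid 0 < k < j,\; p_0 p_1 \cdots p_{k-1} = p_{j-k} p_{j-k+1} \cdots p_{j-1} \,\}, & \text{if this set is non-empty},\\[2pt]
0, & \text{otherwise.}
\end{cases}
```

In words: next[j] is the length of the longest proper prefix of $p_0 \cdots p_{j-1}$ that is also a suffix of it, with next[0] = -1 acting as a sentinel.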

2. Computing the next array by brute force

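The main idea of this method is:

To compute n_j, take all proper prefixes and suffixes of $p_0 p_1 \cdots p_{j-1}$ and compare them starting from the longest, until the longest common prefix-suffix is found. If there is none, n_j is 0.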

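How the prefixes and suffixes are chosen: for a candidate length k with 1 ≤ k ≤ j-1, the prefix is $p_0 p_1 \cdots p_{k-1}$ and the suffix is $p_{j-k} p_{j-k+1} \cdots p_{j-1}$.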

The brute-force algorithm simply compares these prefix/suffix pairs one by one, starting from the longest.
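As a sketch of how the candidates are chosen, the hypothetical helper `CandidatePairs` below (not from the original post; it uses `std::string` instead of the post's `char *` style) lists the (prefix, suffix) pairs of $p_0 \cdots p_{j-1}$ in the order the brute-force method tries them, longest first.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: list the candidate (prefix, suffix) pairs of p[0..j-1]
// in the order the brute-force method tries them, from length j-1 down to 1.
std::vector<std::pair<std::string, std::string>>
CandidatePairs(const std::string &p, int j)
{
    assert(j >= 1 && j <= static_cast<int>(p.size()));
    std::vector<std::pair<std::string, std::string>> pairs;
    for (int k = j - 1; k >= 1; --k)
    {
        // prefix p[0..k-1] and suffix p[j-k..j-1], both of length k
        pairs.emplace_back(p.substr(0, k), p.substr(j - k, k));
    }
    return pairs;
}
```

For "ABCDABD" with j = 5 it yields (ABCD, BCDA), (ABC, CDA), (AB, DA), (A, A), the same pairs that appear in the worked example of this section.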

Algorithm description:

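(1) By definition, initialize n[0] = -1.

(2) Traverse the pattern starting from index 1. For the character at index j, set k = j - 1.

(3) Look for the longest common prefix-suffix of the string in front of it, i.e. test whether $p_0 p_1 \cdots p_{k-1} = p_{j-k} p_{j-k+1} \cdots p_{j-1}$ holds.

(4) If it holds, set next[j] = k; otherwise decrement k and repeat step (3). If k reaches 0, set next[j] = 0.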

Implementation:

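#include <iostream>
#include <cstring>   // strlen

bool IsPatternMatch(const char *p, int compareNum, int totalNum);
void ViolentGetNext(const char *p, int *next);

int main()
{
    int next[20];                          // large enough for the 16-character pattern below
    const char *str = "agctagcagctagctg";
    ViolentGetNext(str, next);

    // Print the computed next array (replaces the Windows-only system("pause"))
    int len = strlen(str);
    for(int i = 0; i < len; i++)
        std::cout << next[i] << ' ';
    std::cout << std::endl;

    return 0;
}

void ViolentGetNext(const char *p, int *next)
{
    int pLen = strlen(p);
    int k = 0;
    next[0] = -1;

    for(int j = 1; j < pLen; j++)
    {
        // Try candidate lengths from the longest proper one (j - 1) down to 1
        k = j - 1;
        while(k > 0)
        {
            if(IsPatternMatch(p, k , j))
                break;
            else
                k--;
        }// while

        next[j] = k;                       // k == 0 if no common prefix/suffix was found
    }// for
}
//param compareNum: length k of the prefix and of the suffix being compared
//param totalNum: length j of the substring p[0..j-1] being examined
//Together they delimit the prefix p[0..k-1] and the suffix p[j-k..j-1]
bool IsPatternMatch(const char *p, int compareNum, int totalNum)
{
    int i = 0;
    int j = totalNum - compareNum;         // start index of the suffix

    for(; i < compareNum; i++, j++)
    {
        if(p[i] != p[j])
        {
            return false;
        }
    }

    return true;
}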

A concrete example: suppose the pattern is ABCDABD. n[5] is computed from the substring ABCDA in front of index 5 as follows:
k=4
ABCD≠BCDA,k=3
ABC≠CDA,k=2
AB≠DA,k=1
A==A,n[5]=k=1
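To double-check the trace above, here is a minimal sketch. `TraceBruteForceNext` is a hypothetical helper, not part of the original code: it runs the brute-force search for a single index j and records each step in the same form, writing != in place of ≠.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: brute-force search for one index j, recording every
// comparison in the textual form of the worked example. Returns next[j].
int TraceBruteForceNext(const std::string &p, int j, std::vector<std::string> &steps)
{
    assert(j >= 1 && j < static_cast<int>(p.size()));
    int k = j - 1;                                   // longest proper candidate
    steps.push_back("k=" + std::to_string(k));
    while (k > 0)
    {
        std::string prefix = p.substr(0, k);         // p[0..k-1]
        std::string suffix = p.substr(j - k, k);     // p[j-k..j-1]
        if (prefix == suffix)
        {
            steps.push_back(prefix + "==" + suffix + ",n[" + std::to_string(j) +
                            "]=" + std::to_string(k));
            return k;
        }
        --k;                                         // try a shorter candidate
        steps.push_back(prefix + "!=" + suffix + ",k=" + std::to_string(k));
    }
    steps.push_back("n[" + std::to_string(j) + "]=0");
    return 0;
}
```

Calling it with "ABCDABD" and j = 5 returns 1 and records k=4, ABCD!=BCDA,k=3, ABC!=CDA,k=2, AB!=DA,k=1 and A==A,n[5]=1.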

3. Computing the next array recursively

In the brute-force method, each next[j] is computed independently. In fact, next[j+1] can be computed from next[0…j], and the recursive algorithm does exactly that.

Let the pattern be $P = p_0 p_1 \cdots p_{m-1}$, and suppose next[0…j] has already been computed. How do we compute next[j+1]?

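We use the values computed earlier (this is where the algorithm improves on brute force: the next entries are not computed independently). If next[j] = k is known, then the pattern certainly satisfies $p_0 p_1 \cdots p_{k-1} = p_{j-k} p_{j-k+1} \cdots p_{j-1}$, and by the definition no longer proper prefix of $p_0 \cdots p_{j-1}$ has this property.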

So the algorithm can be described as follows:

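(1) If k == -1 (only the first character has next value -1), then either j = 0, meaning we are computing the value for the second position and a proper prefix cannot be the one-character string itself, or every shorter candidate has already been ruled out. In both cases next[j+1] = 0, and the algorithm ends.

(2) If $p_k = p_j$, then next[j+1] = k + 1, and the algorithm ends. Note how k, the value of next[j], is used here as a character index: the common prefix of length k occupies $p_0 \cdots p_{k-1}$, so the character right after it is $p_k$.

Hint: As analyzed above, the shortcut for computing the next array is not to compute each entry independently but to inherit the symmetry already found. Once the symmetry given by next[j] is known, we only need to check whether the characters right after the prefix and right after the suffix are equal; $p_k$ and $p_j$ are exactly those characters.

(3) If $p_k \neq p_j$, the prefix of length k cannot be extended, so we need the next shorter common prefix-suffix of $p_0 \cdots p_{j-1}$, i.e. the largest k' < k with $p_0 \cdots p_{k'-1} = p_{j-k'} \cdots p_{j-1}$. Where does k' come from? Since $p_0 \cdots p_{k-1} = p_{j-k} \cdots p_{j-1}$, the last k' characters of both sides are equal, so $p_0 \cdots p_{k'-1} = p_{k-k'} \cdots p_{k-1} = p_{j-k'} \cdots p_{j-1}$. Looking at the two ends of this chain, k' is a common prefix-suffix length of $p_0 \cdots p_{k-1}$, and the largest such value is k' = next[k].

(4) Assign k' to k and go back to step (1).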
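Steps (1) to (4) can be sketched as a single function that computes next[j+1] from next[0…j]. `NextFromPrevious` and its `visited` parameter are hypothetical names used only for illustration; the function records every k it examines, so the chain k, next[k], next[next[k]], … becomes visible.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: given next[0..j], compute next[j+1] following steps (1)-(4),
// appending each examined k to `visited`.
int NextFromPrevious(const std::string &p, const std::vector<int> &next, int j,
                     std::vector<int> &visited)
{
    assert(j >= 0 && j + 1 < static_cast<int>(p.size()));
    assert(static_cast<int>(next.size()) > j);
    int k = next[j];                 // start from the known symmetry next[j]
    while (true)
    {
        visited.push_back(k);
        if (k == -1)                 // step (1): no candidate left
            return 0;
        if (p[k] == p[j])            // step (2): extend the prefix/suffix by one character
            return k + 1;
        k = next[k];                 // steps (3)-(4): fall back to k' = next[k]
    }
}
```

For the pattern abaabcac, next[0…5] = -1 0 0 1 1 2. Computing next[6] (j = 5) visits k = 2, 0, -1 because c matches neither p_2 nor p_0, so next[6] = 0; computing next[4] (j = 3) visits k = 1, 0 and stops at p_0 = p_3, giving next[4] = 1.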

Implementation:

//The recursive method to obtain the next array
//pLen is the length of the prefix p[0..pLen-1] whose next values are computed
void RecursionGetNext(const char *p, int pLen, int *next)
{
    if(pLen <= 0)
        return;                            // empty pattern: nothing to compute

    if(pLen == 1)
    {
        next[pLen - 1] = -1;
        return;
    }

    RecursionGetNext(p, pLen - 1, next);   // fill next[0..pLen-2] first

    //pLen - 1 is the index of the last character, whose next value is computed in this call.
    //pLen - 2 is the index of the second-to-last character, whose next value is already known.
    int k = next[pLen - 2];

    //k == -1 means every candidate has been tried and none can be extended by p[pLen-2], so next[pLen-1] = 0.
    //k == 0 only means no non-empty prefix matches a suffix; p[k] may still match p[pLen-2].
    while(k >= 0)
    {
        if(p[pLen-2] == p[k])
        {
            break;
        }
        else
        {
            k = next[k];
        }
    }//while

    next[pLen -1] = k + 1;

}//RecursionGetNext()

4. Computing the next array by unrolling the recursion (iterative version)

#include <cstring>   // strlen

void GetNext(const char *p, int *next)
{
    int pLen = strlen(p);
    int j = 0;
    int k = -1;
    next[0] = -1;

    while(j < pLen - 1)
    {
        //According to the description of the algorithm, the procedure can be written as:
        //if(k == -1)
        //{
        //    ++j;
        //    ++k;
        //    next[j] = k;
        //}
        //else if(p[j] == p[k])
        //{
        //    ++j;
        //    ++k;
        //    next[j] = k;
        //}
        //but the first two branches can be merged into one:

        //p[j] == p[k] shows that we can inherit the symmetry of the part that already matches
        //k == -1 covers two cases: 1. the start of the algorithm; 2. no prefix matches a suffix and the last character also differs from the first one
        if(k == -1 || p[j] == p[k])
        {
            ++j;
            ++k;
            next[j] = k;
        }
        else
        {
            k = next[k];
        }
    }//while
}
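One way to gain confidence in the iterative version is to compare it with the definition on a few patterns. The sketch below is a rewrite under assumptions, not the original code: `IterativeNext` follows the same loop as GetNext but takes a `std::string` and returns a `std::vector<int>`, while `BruteForceNext` and `CheckAgainstBruteForce` are hypothetical helpers that recompute the array straight from the definition.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Same loop as GetNext, written for std::string (a sketch, not the original code).
std::vector<int> IterativeNext(const std::string &p)
{
    if (p.empty())
        return {};
    int m = static_cast<int>(p.size());
    std::vector<int> next(m);
    next[0] = -1;
    int j = 0;
    int k = -1;
    while (j < m - 1)
    {
        if (k == -1 || p[j] == p[k])
        {
            ++j;
            ++k;
            next[j] = k;       // invariant: p[0..k-1] == p[j-k..j-1]
        }
        else
        {
            k = next[k];       // fall back to a shorter prefix-suffix
        }
    }
    return next;
}

// Hypothetical reference: recompute every entry directly from the definition.
std::vector<int> BruteForceNext(const std::string &p)
{
    int m = static_cast<int>(p.size());
    std::vector<int> next(m);
    if (m == 0)
        return next;
    next[0] = -1;
    for (int j = 1; j < m; ++j)
    {
        int k = j - 1;
        // compare prefix p[0..k-1] with suffix p[j-k..j-1]; compare() returns 0 when equal
        while (k > 0 && p.compare(0, k, p, j - k, k) != 0)
            --k;
        next[j] = k;
    }
    return next;
}

// Hypothetical check: both versions must agree on pattern p.
void CheckAgainstBruteForce(const std::string &p)
{
    assert(IterativeNext(p) == BruteForceNext(p));
}
```

For example, IterativeNext("abaabcac") gives -1 0 0 1 1 2 0 1 and IterativeNext("ABCDABD") gives -1 0 0 0 0 1 2. The brute-force reference is much slower, so it only makes sense as a test oracle.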