SSE Image Algorithm Optimization Series 23: Optimization of value-and-criterion structure filters (such as the Kuwahara, MLV and MCV filters).

       Filters built on the value-and-criterion structure are actually quite simple in principle. The following passage from the paper already describes the idea clearly, so I quote the English directly; translating it would only lose the flavor of the original.

       The value-and-criterion filter structure is based on the geometrical structure of mathematical morphology, but allows the use of a much wider variety of operations (both linear and nonlinear) than standard morphology. Morphological filters usually only use extreme order statistic operations (maximum and minimum). The new filter structure is much more flexible, but still retains the geometric characteristics of mathematical morphology.

       The value-and-criterion filter structure is derived from the "compound" structure of the morphological operations opening and closing. Opening and closing are both sequential operations; opening consists of the successive application of the basic morphological operations of erosion and dilation, and closing is dilation followed by erosion. Like opening or closing, a value-and-criterion function has two distinct stages. However, a value-and-criterion filter can have two separate operations acting in parallel in the first stage, as opposed to a single operation (such as erosion in the case of opening). The second stage of a value-and-criterion function uses the output of one of these first stage operations to determine which output from the other first stage operation is selected as the final output.

  Put simply, a filter built on the value-and-criterion structure has two components: a value function and a criterion function. In an ordinary filter we only need to compute the value function and we are done, whereas in the value-and-criterion structure the criterion decides which output of the value function is finally selected.
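  To make this concrete, here is a minimal scalar sketch of the selection stage (purely illustrative; the names Value, Criterion and SubRegionCount are hypothetical placeholders, not taken from the paper): given the value and the criterion of each sub-region, the output is simply the value belonging to the sub-region whose criterion is smallest.

    //  Selection stage of a value-and-criterion filter (illustrative sketch only).
    //  For Kuwahara the value is the sub-window mean and the criterion is its variance.
    float ValueAndCriterionSelect(const float *Value, const float *Criterion, int SubRegionCount)
    {
        int Best = 0;
        for (int I = 1; I < SubRegionCount; I++)
        {
            if (Criterion[I] < Criterion[Best])    Best = I;        //  smallest criterion wins
        }
        return Value[Best];                                         //  output the winning sub-region's value
    }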

  Classic filters of this kind include the Kuwahara filter, the Mean of Least Variance (MLV) filter and the Minimum Coefficient of Variation (MCV) filter; let us briefly introduce them.

  1. The Kuwahara Filter

       Obviously, this filter was proposed by Kuwahara et al. The basic idea is: compute the mean and variance of the neighborhoods inside the filter window, and replace the gray value of the center pixel with the mean of the most homogeneous region, i.e. the region whose variance is smallest. To find that mean, the filter region R is divided into K overlapping sub-regions R1, R2, …, Rk. For every pixel at position (u, v) in the image, the mean and variance of each of its sub-regions are computed. The standard Kuwahara filter only evaluates the variances of four neighborhoods, picks the one with the smallest variance, and uses its mean as the new value of the target pixel. Later, Tomita-Tsuji, Nagao-Matsuyama and others proposed different neighborhood layouts; see https://blog.csdn.net/lz0499/article/details/54646952 for details.
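  To illustrate the four-sub-window idea, a deliberately naive per-pixel reference might look like the sketch below (grayscale only, interior pixels only, no border handling; this is just the definition written out directly, not the optimized code shown later in this post):

    unsigned char IM_KuwaharaPixelNaive(const unsigned char *Src, int Stride, int X, int Y, int Radius)
    {
        //  The four overlapping (Radius + 1) x (Radius + 1) sub-windows are anchored at the four corners around (X, Y).
        const int StartX[4] = { X - Radius, X, X - Radius, X };
        const int StartY[4] = { Y - Radius, Y - Radius, Y, Y };
        int Amount = (Radius + 1) * (Radius + 1);
        float BestMean = Src[Y * Stride + X], BestVar = 3.4e38f;
        for (int R = 0; R < 4; R++)
        {
            float Sum = 0, SumSq = 0;
            for (int J = StartY[R]; J <= StartY[R] + Radius; J++)
            {
                for (int I = StartX[R]; I <= StartX[R] + Radius; I++)
                {
                    float V = Src[J * Stride + I];
                    Sum += V;    SumSq += V * V;
                }
            }
            float Mean = Sum / Amount;
            float Var = SumSq / Amount - Mean * Mean;                 //  variance of this sub-window
            if (Var < BestVar) { BestVar = Var; BestMean = Mean; }    //  keep the mean of the most homogeneous sub-window
        }
        return (unsigned char)(BestMean + 0.5f);
    }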

   In this filter, the value function is the mean and the criterion function is the variance. On the implementation side, the core is a fast computation of the local mean and variance, which has already been described in my post SSE圖像算法優化系列十四:局部均方差及局部平方差算法的優化 (SSE series 14: optimization of local mean-variance and local squared-difference algorithms), so I will not repeat it here.
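  The essence of that fast computation is that both the sum of the pixels and the sum of their squares inside a sliding window can be maintained in O(1) per sample, after which mean = Sum / N and variance = SumSq / N - mean * mean. A simplified one-dimensional sketch is shown below (clamped borders for brevity instead of the mirrored borders used in the real code, no SSE, hypothetical function name); the 2D version simply runs this along rows and then along columns.

    void RunningSumAndSquareSum(const unsigned char *Src, int Length, int Radius, float *Sum, float *SumSq)
    {
        float S = 0, SQ = 0;
        for (int X = -Radius; X <= Radius; X++)                        //  prime the window of the first sample
        {
            float V = Src[X < 0 ? 0 : X];
            S += V;    SQ += V * V;
        }
        for (int X = 0; X < Length; X++)
        {
            Sum[X] = S;    SumSq[X] = SQ;                              //  window sums for sample X
            int In = X + Radius + 1;    if (In > Length - 1)    In = Length - 1;
            int Out = X - Radius;       if (Out < 0)            Out = 0;
            float VIn = Src[In], VOut = Src[Out];
            S  += VIn - VOut;                                          //  slide the window by one sample
            SQ += VIn * VIn - VOut * VOut;
        }
    }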

        The main code of a simple Kuwahara filter might then look like this:

int IM_KuwaharaFilter(unsigned char *Src, unsigned char *Dest, int Width, int Height, int Stride, int Radius)
{
    int Channel = Stride / Width;
    if ((Src == NULL) || (Dest == NULL))                return IM_STATUS_NULLREFRENCE;
    if ((Width <= 0) || (Height <= 0))                  return IM_STATUS_INVALIDPARAMETER;
    if ((Channel != 1) && (Channel != 3))               return IM_STATUS_INVALIDPARAMETER;
    int Status = IM_STATUS_OK;
    unsigned char *Average = (unsigned char *)malloc(Height * Width * sizeof(unsigned char));
    int *Deviation = (int *)malloc(Height * Width * sizeof(int));
    int *RowOffset = (int *)malloc((Width + Radius + Radius) * sizeof(int));
    int *ColOffset = (int *)malloc((Height + Radius + Radius) * sizeof(int));
    unsigned char *Blur = NULL;
    if ((Average == NULL) || (Deviation == NULL) || (RowOffset == NULL) || (ColOffset == NULL))
    {
        Status = IM_STATUS_OUTOFMEMORY;
        goto FreeMemory;
    }
    if (Channel == 1)
    {
        Status = IM_GetLocalMeanAndDeviationFilter(Src, Average, Deviation, Width, Height, Stride, Radius, false);
        if (Status != IM_STATUS_OK)    goto FreeMemory;
    }
    else
    {
        Blur = (unsigned char *)malloc(Height * Stride * sizeof(unsigned char));        // For color images the deviation must be computed on the luminance channel, otherwise abnormal colored blocks appear
        if (Blur == NULL)
        {
            Status = IM_STATUS_OUTOFMEMORY;
            goto FreeMemory;
        }
        Status = IM_GetLuminance(Src, Average, Width, Height, Stride, false);            // Get the luminance channel
        if (Status != IM_STATUS_OK)    goto FreeMemory;
        Status = IM_GetLocalMeanAndDeviationFilter(Average, NULL, Deviation, Width, Height, Width, Radius, false);    // No need to keep the corresponding mean, it is not used
        if (Status != IM_STATUS_OK)    goto FreeMemory;
        Status = IM_BoxBlur(Src, Blur, Width, Height, Stride, Radius);                    // Box blur of the color image
        if (Status != IM_STATUS_OK)    goto FreeMemory;
    }
    for (int Y = 0; Y < Height + Radius + Radius; Y++)    ColOffset[Y] = IM_GetMirrorPos(Height, Y - Radius);
    for (int X = 0; X < Width + Radius + Radius; X++)     RowOffset[X] = IM_GetMirrorPos(Width, X - Radius);
    for (int Y = 0; Y < Height; Y++)
    {
        int *LinePST = Deviation + ColOffset[Y] * Width;                        // Deviation (criterion)
        int *LinePSB = Deviation + ColOffset[Y + Radius + Radius] * Width;
        unsigned char *LinePD = Dest + Y * Stride;
        if (Channel == 1)
        {
            unsigned char *LinePT = Average + ColOffset[Y] * Width;             // Mean (value)
            unsigned char *LinePB = Average + ColOffset[Y + Radius + Radius] * Width;
            for (int X = 0; X < Width; X++)
            {
                int OffsetL = RowOffset[X];                                     // Find the minimum of the deviations of the four sample points
                int OffsetR = RowOffset[X + Radius + Radius];
                int Min, Mean;
                if (LinePST[OffsetL] < LinePST[OffsetR])    { Min = LinePST[OffsetL]; Mean = LinePT[OffsetL]; }
                else                                        { Min = LinePST[OffsetR]; Mean = LinePT[OffsetR]; }
                if (Min > LinePSB[OffsetL])                 { Min = LinePSB[OffsetL]; Mean = LinePB[OffsetL]; }
                if (Min > LinePSB[OffsetR])                 { Min = LinePSB[OffsetR]; Mean = LinePB[OffsetR]; }
                LinePD[X] = Mean;
            }
        }
        else
        {
            unsigned char *LinePT = Blur + ColOffset[Y] * Stride;               // Mean (value)
            unsigned char *LinePB = Blur + ColOffset[Y + Radius + Radius] * Stride;
            for (int X = 0; X < Width; X++)
            {
                int OffsetL = RowOffset[X];                                     // Find the minimum of the deviations of the four sample points
                int OffsetR = RowOffset[X + Radius + Radius];
                int Min, MeanB, MeanG, MeanR;
                if (LinePST[OffsetL] < LinePST[OffsetR])
                {
                    Min = LinePST[OffsetL];
                    MeanB = LinePT[OffsetL * 3 + 0];    MeanG = LinePT[OffsetL * 3 + 1];    MeanR = LinePT[OffsetL * 3 + 2];
                }
                else
                {
                    Min = LinePST[OffsetR];
                    MeanB = LinePT[OffsetR * 3 + 0];    MeanG = LinePT[OffsetR * 3 + 1];    MeanR = LinePT[OffsetR * 3 + 2];
                }
                if (Min > LinePSB[OffsetL])
                {
                    Min = LinePSB[OffsetL];
                    MeanB = LinePB[OffsetL * 3 + 0];    MeanG = LinePB[OffsetL * 3 + 1];    MeanR = LinePB[OffsetL * 3 + 2];
                }
                if (Min > LinePSB[OffsetR])
                {
                    Min = LinePSB[OffsetR];
                    MeanB = LinePB[OffsetR * 3 + 0];    MeanG = LinePB[OffsetR * 3 + 1];    MeanR = LinePB[OffsetR * 3 + 2];
                }
                LinePD[0] = MeanB;    LinePD[1] = MeanG;    LinePD[2] = MeanR;
                LinePD += 3;
            }
        }
    }
FreeMemory:
    if (Average != NULL)        free(Average);
    if (Deviation != NULL)      free(Deviation);
    if (RowOffset != NULL)      free(RowOffset);
    if (ColOffset != NULL)      free(ColOffset);
    if (Blur != NULL)           free(Blur);
    return Status;
}

  Skipping the memory allocation and the mean/variance computation, let us look at how the grayscale version picks the value function according to the criterion function along the width direction:

            for (int X = 0; X < Width; X++)
            {
                int OffsetL = RowOffset[X];                                     // Find the minimum of the deviations of the four sample points
                int OffsetR = RowOffset[X + Radius + Radius];
                int Min, Mean;
                if (LinePST[OffsetL] < LinePST[OffsetR])    { Min = LinePST[OffsetL]; Mean = LinePT[OffsetL]; }
                else                                        { Min = LinePST[OffsetR]; Mean = LinePT[OffsetR]; }
                if (Min > LinePSB[OffsetL])                 { Min = LinePSB[OffsetL]; Mean = LinePB[OffsetL]; }
                if (Min > LinePSB[OffsetR])                 { Min = LinePSB[OffsetR]; Mean = LinePB[OffsetR]; }
                LinePD[X] = Mean;
            }

  Very simple: for the four sample points, find the minimum variance and, at the same time, pick up the corresponding mean, which becomes the result value.

      Essentially this is also an edge-preserving filter, although its edge-preserving quality is not great; Kuwahara mainly exists as an artistic/stylization filter. The open-source GPUImage project also has a Kuwahara implementation, see the GPUImageKuwaharaFilter.h file.

  Here is the effect on a portrait image that others have also used for testing.

             Original image                                        Radius 1                                         Radius 5

   Note that for color images it is not advisable to process each channel separately: when the channels are handled independently, the position of the minimum variance is not necessarily the same for every channel of a pixel, and this produces color blotches in the final result, which is not what we want. A similar statement appears in the original article, quoted below. In this case we can use the luminance value as the single criterion shared by all channels.

     Obviously the Normal filter can't be used for color images by applying the filter to each RGB channel separately and then using the three resulting channels to compose the image. The main problem with this is that the sub regions will have different variances for each of the channels. For example, a region with the lowest variance in the red channel might have the highest variance in the green channel. This once again causes ambiguity which would result in the color of the central pixel to be determined by several regions, which might also result in blurrier edges.
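  A possible luminance computation for the color path might look like the sketch below (assumptions of mine: integer BT.601-style weights and BGR byte order; the exact coefficients inside the real IM_GetLuminance are not shown in this post, so the name here is a hypothetical stand-in):

    void GetLuminanceSketch(const unsigned char *Src, unsigned char *Lum, int Width, int Height, int Stride)
    {
        for (int Y = 0; Y < Height; Y++)
        {
            const unsigned char *LinePS = Src + Y * Stride;        //  BGR, 3 bytes per pixel
            unsigned char *LinePD = Lum + Y * Width;
            for (int X = 0; X < Width; X++, LinePS += 3)
            {
                //  Y = 0.114 * B + 0.587 * G + 0.299 * R, scaled by 256 for integer arithmetic
                LinePD[X] = (LinePS[0] * 29 + LinePS[1] * 150 + LinePS[2] * 77 + 128) >> 8;
            }
        }
    }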

      2. The MLV Filter

  The MLV filter is another typical value-and-criterion structure filter; its reference paper is "A morphology-based filter structure for edge-enhancing smoothing". Unlike Kuwahara, the MLV filter does not take only the variances of four surrounding neighborhoods as the criterion: it considers every neighborhood that contains the center pixel, and replaces the center value with the mean of the neighborhood whose variance is smallest. At this point the computational cost must be taken into account, because it grows rapidly as the radius increases.

  This brings to mind my earlier post SSE圖像算法優化系列七:基於SSE實現的極速的矩形核腐蝕和膨脹(最大值和最小值)算法 (SSE series 7: extremely fast rectangular-kernel erosion and dilation, i.e. max and min), which describes a radius-independent min/max optimization; it cleverly exploits some properties so that the running time may even drop slightly as the radius grows.
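  For reference, one well-known radius-independent scheme is the van Herk / Gil-Werman algorithm, sketched below in 1D for a sliding minimum (an illustrative scalar version, borders left unhandled; I am not claiming it is exactly the scheme of that earlier post, the point is only that the cost per sample no longer depends on the radius):

    #include <stdlib.h>

    void RunningMin1D(const unsigned char *Src, unsigned char *Dest, int Length, int Radius)
    {
        int Window = 2 * Radius + 1;
        unsigned char *Forward = (unsigned char *)malloc(Length);
        unsigned char *Backward = (unsigned char *)malloc(Length);
        for (int X = 0; X < Length; X++)                            //  prefix minima, restarted at every block boundary
            Forward[X] = (X % Window == 0) ? Src[X] : (Src[X] < Forward[X - 1] ? Src[X] : Forward[X - 1]);
        for (int X = Length - 1; X >= 0; X--)                       //  suffix minima, restarted at every block boundary
            Backward[X] = (X % Window == Window - 1 || X == Length - 1) ? Src[X] : (Src[X] < Backward[X + 1] ? Src[X] : Backward[X + 1]);
        for (int X = Radius; X < Length - Radius; X++)              //  interior samples only, window [X - Radius, X + Radius]
        {
            unsigned char L = Backward[X - Radius], R = Forward[X + Radius];
            Dest[X] = L < R ? L : R;                                //  about three comparisons per sample, independent of Radius
        }
        free(Forward);
        free(Backward);
    }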

  However, note that the MLV filter is not a plain minimum computation: it carries a tail. While taking the minimum, another quantity has to be kept in sync. This is a bit like C's sort function: if I want to sort an array of structs, and the ordering is decided by one member of the struct, sort handles it easily. If we implemented the algorithm of that earlier post in plain C, this extra bookkeeping would also be easy (a small scalar sketch appears a little further below); but here I need SSE, so let us first look at the core statements of the original post.

    __m128i Max1 = _mm_set1_epi8(255), Max2 = _mm_set1_epi8(255);                // Writing it this way saves one memory load
    for (int Y = StartY; Y < EndY; Y++)
    {
        Max1 = _mm_min_epu8(_mm_loadu_si128((__m128i *)(LinePS + 0)), Max1);     // A single AVX instruction
        Max2 = _mm_min_epu8(_mm_loadu_si128((__m128i *)(LinePS + 16)), Max2);
        _mm_store_si128((__m128i *)(LinePD + 0), Max1);
        _mm_store_si128((__m128i *)(LinePD + 16), Max2);
        LinePS += Stride;
        LinePD += 32;
    }

  Very simple: just a straightforward call to _mm_min_epu8; there is no comparison function anywhere.

       If, while taking the minimum, we also need to carry along another quantity tied to it, then clearly the code above cannot do that as-is. We must find a way to bring the comparison information along.
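  In plain C, what we want is trivial: while scanning the criterion for its minimum, the associated value simply tags along inside the same if, as in this small sketch (hypothetical names, grayscale case):

    void ArgMinWithPayload(const int *Var, const unsigned char *Mean, int Count, int *MinVar, unsigned char *MinMean)
    {
        *MinVar = Var[0];                                //  criterion: the (unnormalized) variance
        *MinMean = Mean[0];                              //  payload: the corresponding mean
        for (int I = 1; I < Count; I++)
        {
            if (Var[I] < *MinVar)                        //  compare on the criterion ...
            {
                *MinVar = Var[I];
                *MinMean = Mean[I];                      //  ... and carry the value along with it
            }
        }
    }

  With SSE, however, _mm_min_epu8 alone gives no per-lane "which operand won" information, which is exactly what the cmp-based formulation below provides.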

  In fact, the min-style operations can also be implemented with the cmp family of SSE instructions. For example, to take the element-wise minimum of two vectors of 32-bit integers, besides _mm_min_epi32 we can also use the following code:

    __m128i S1 = _mm_set_epi32(10, 20, 30, 50);
    __m128i S2 = _mm_set_epi32(15, 8, 34, 234);
    __m128i Min1 = _mm_min_epi32(S1, S2);
    __m128i Cmp = _mm_cmplt_epi32(S1, S2);
    __m128i Min2 = _mm_blendv_epi8(S2, S1, Cmp);
    for (int Y = 0; Y < 4; Y++)    printf("%d \t", Min1.m128i_i32[Y]);
    printf("\n");
    for (int Y = 0; Y < 4; Y++)    printf("%d \t", Min2.m128i_i32[Y]);

  The printed results of the two approaches are exactly the same.

  Notice that along the way we obtain some flag information, such as the Cmp variable in the code above. This is extremely useful: with it we effectively hold a flag, as if someone in team A raised a banner and we only had to find the person standing at the corresponding position in team B. That lookup is again done with _mm_blendv_epi8.

  Specifically for this example, the plain variance (not the mean squared deviation: here it is the value before dividing by the number of samples, i.e. (v0-m)*(v0-m)+(v1-m)*(v1-m)+...+(vn-m)*(vn-m), where m is the average of v0 to vn, kept in floating point during the computation) can be represented as an integer (rounded at the end), while the mean is stored directly as an unsigned char, as in the IM_KuwaharaFilter function above. Following the example of the min/max post, we can compare four SSE variables at a time, which produces four comparison masks, while the payload to carry is 4*4 bytes of data. The four comparison masks then have to be merged into a single SSE variable so that _mm_blendv_epi8 can use it, and this is where the packs family of instructions comes in, as shown below:

    __m128i Min0 = _mm_set1_epi32(IM_INT_MAX), Min1 = _mm_set1_epi32(IM_INT_MAX);        // Process 16 variance values at a time
    __m128i Min2 = _mm_set1_epi32(IM_INT_MAX), Min3 = _mm_set1_epi32(IM_INT_MAX);
    __m128i Avg = _mm_setzero_si128(), AvgB = _mm_setzero_si128();
    __m128i AvgG = _mm_setzero_si128(), AvgR = _mm_setzero_si128();
    for (int Y = StartY; Y < EndY; Y++)
    {
        __m128i Src0 = _mm_loadu_si128((__m128i *)(LinePS + 0));
        __m128i Src1 = _mm_loadu_si128((__m128i *)(LinePS + 4));
        __m128i Src2 = _mm_loadu_si128((__m128i *)(LinePS + 8));
        __m128i Src3 = _mm_loadu_si128((__m128i *)(LinePS + 12));
        __m128i Cmp0 = _mm_cmplt_epi32(Src0, Min0);
        __m128i Cmp1 = _mm_cmplt_epi32(Src1, Min1);
        __m128i Cmp2 = _mm_cmplt_epi32(Src2, Min2);
        __m128i Cmp3 = _mm_cmplt_epi32(Src3, Min3);
        Min0 = _mm_blendv_epi8(Min0, Src0, Cmp0);            // _mm_min_epi32 is deliberately not used here, mainly so that we get the comparison masks
        Min1 = _mm_blendv_epi8(Min1, Src1, Cmp1);
        Min2 = _mm_blendv_epi8(Min2, Src2, Cmp2);
        Min3 = _mm_blendv_epi8(Min3, Src3, Cmp3);
        _mm_storeu_si128((__m128i *)(LinePD + 0), Min0);
        _mm_storeu_si128((__m128i *)(LinePD + 4), Min1);
        _mm_storeu_si128((__m128i *)(LinePD + 8), Min2);
        _mm_storeu_si128((__m128i *)(LinePD + 12), Min3);
        if (Channel == 1)                                    // Note that _mm_packs_epi16 must be used, not _mm_packus_epi16
        {
            Avg = _mm_blendv_epi8(Avg, _mm_loadu_si128((__m128i *)LinePA), _mm_packs_epi16(_mm_packs_epi32(Cmp0, Cmp1), _mm_packs_epi32(Cmp2, Cmp3)));
            _mm_storeu_si128((__m128i *)LinePM, Avg);        // Mean corresponding to the minimum variance
        }
        else
        {
            __m128i Mask0 = _mm_or_si128(_mm_shuffle_epi8(Cmp0, _mm_setr_epi8(0, 0, 0, 4, 4, 4, 8, 8, 8, 12, 12, 12, -1, -1, -1, -1)), _mm_shuffle_epi8(Cmp1, _mm_setr_epi8(-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, 0, 0, 0, 4)));
            __m128i Mask1 = _mm_or_si128(_mm_shuffle_epi8(Cmp1, _mm_setr_epi8(4, 4, 8, 8, 8, 12, 12, 12, -1, -1, -1, -1, -1, -1, -1, -1)), _mm_shuffle_epi8(Cmp2, _mm_setr_epi8(-1, -1, -1, -1, -1, -1, -1, -1, 0, 0, 0, 4, 4, 4, 8, 8)));
            __m128i Mask2 = _mm_or_si128(_mm_shuffle_epi8(Cmp2, _mm_setr_epi8(8, 12, 12, 12, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1)), _mm_shuffle_epi8(Cmp3, _mm_setr_epi8(-1, -1, -1, -1, 0, 0, 0, 4, 4, 4, 8, 8, 8, 12, 12, 12)));
            AvgB = _mm_blendv_epi8(AvgB, _mm_loadu_si128((__m128i *)LinePA), Mask0);
            AvgG = _mm_blendv_epi8(AvgG, _mm_loadu_si128((__m128i *)(LinePA + 16)), Mask1);
            AvgR = _mm_blendv_epi8(AvgR, _mm_loadu_si128((__m128i *)(LinePA + 32)), Mask2);
            _mm_storeu_si128((__m128i *)LinePM, AvgB);
            _mm_storeu_si128((__m128i *)(LinePM + 16), AvgG);
            _mm_storeu_si128((__m128i *)(LinePM + 32), AvgR);
        }
        LinePS += Width;
        LinePA += Stride;
        LinePD += 16;
        LinePM += 16 * Channel;
    }

  Note that the code above uses packs rather than packus; I leave it to the reader to figure out why.
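  For readers who want to check their reasoning, a tiny standalone test such as the one below (assuming the MSVC-style m128i_i8 accessor already used above) makes it easy to see what each of the two pack instructions does to a 32-bit comparison mask:

    #include <stdio.h>
    #include <emmintrin.h>

    int main()
    {
        __m128i Cmp = _mm_cmplt_epi32(_mm_set1_epi32(1), _mm_set1_epi32(2));    //  every 32-bit lane is -1 (all bits set)
        __m128i Word = _mm_packs_epi32(Cmp, Cmp);                               //  signed saturation: -1 stays -1 per 16-bit lane
        __m128i ByteS = _mm_packs_epi16(Word, Word);                            //  signed saturation to bytes: prints FF
        __m128i ByteU = _mm_packus_epi16(Word, Word);                           //  unsigned saturation to bytes: prints 00
        printf("packs : %02X\n", (unsigned char)ByteS.m128i_i8[0]);
        printf("packus: %02X\n", (unsigned char)ByteU.m128i_i8[0]);
        return 0;
    }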

  For color images, as described above for the Kuwahara filter, the luminance also has to be computed. In that case one flag corresponds to three color components, so the packs-style instructions can no longer be used; instead the _mm_shuffle_epi8-based masks shown above handle it easily.

  Of course, the full implementation involves many other details, such as transposing int matrices and making efficient use of memory, which I will not go into here.

  3. The MCV Filter

  Slightly different from the MLV filter, the MCV filter changes the criterion; its reference paper is "An edge-enhancing nonlinear filter for reducing multiplicative noise". The core implementation, however, is not very different.

  According to the papers, the MCV filter can effectively remove multiplicative noise, while the MLV filter removes additive noise, as stated below:

       The MCV filter therefore uses the sample mean for the value function, the coefficient of variation as the criterion function, and the minimum as the selection function. It is a value-and-criterion filter specifically designed to reduce multiplicative noise.

  and:

       The MCV filter is closely related to a value-and-criterion filter designed to remove additive noise known as the Mean of Least Variance (MLV) filter [7–9]. The MLV filter uses the sample variance as its criterion function rather than the coefficient of variation. In images corrupted by additive noise, the variance is theoretically minimum in structuring elements where the signal is constant.
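  Structurally, then, the only change relative to the MLV code is how each window is scored. A scalar sketch of the MCV criterion might look like the following (the fixed-point scaling by 1024 is my own assumption so that the same integer SSE min/blend code can be reused; it is not taken from the paper):

    #include <math.h>

    int CoefficientOfVariationCriterion(float Sum, float SumSq, int Amount)
    {
        float Mean = Sum / Amount;
        if (Mean <= 0)    return 0x7FFFFFFF;                     //  guard against division by zero on all-black windows
        float Var = SumSq / Amount - Mean * Mean;                //  sample variance of the window
        float Cv = sqrtf(Var) / Mean;                            //  coefficient of variation = standard deviation / mean
        return (int)(Cv * 1024 + 0.5f);                          //  fixed-point integer criterion
    }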

  MLV and MCV produce smoother images than Kuwahara and should have some use on SAR and medical images. My personal feeling is also that, at small radii, their results can help somewhat with boundary segmentation and edge detection.

                             Original image                                                                        Kuwahara

                                 MLV                                                        MCV

  上述操做半徑都爲5。

  After processing, the boundaries between regions seem more pronounced.

  Of course, the images I posted are not really the application scenarios emphasized by the papers; they are only meant as illustrations.

  Filters of this kind can also use a circular or elliptical neighborhood; some people have studied this kind of requirement, see for example https://blog.csdn.net/qq_16013649/article/details/78744910.

  For a 1024*768 color image, the optimized algorithm takes about 20 ms (about 10 ms for a grayscale image), which is reasonably fast. For a test, see the stylize menu of the SSE_Optimization_Demo in the attachment.

      Demo download link: https://files.cnblogs.com/files/Imageshop/SSE_Optimization_Demo.rar

      
