【算法隨記六】一段Matlab版本的Total Variation(TV)去噪算法的C語言翻譯。

  最近看到一篇文章講IMAGE DECOMPOSITION,裏面提到了將圖像分爲Texture layer和Structure layer,測試了不少方法,對於那些具備很是強烈紋理的圖像,總以爲用TV去燥的方法分離的結果都比其餘的方法都要好(好比導向、雙邊),好比下圖:html

   

   再好比:git

 

   可見TV能夠把紋理很好的提取出來。github

  如今應該能找到不少的TV代碼,好比IPOL上就有,詳見 http://www.ipol.im/pub/art/2013/61/算法

  我在其餘地方也見過一些,好比這裏: http://yu-li.github.io/paper/li_eccv14_jpeg.zip,他是藉助於FFT實現的,固然少不了屢次迭代,速度也是比較慢的。微信

  我還收藏了好久前一位朋友寫的M代碼,可是如今我不知道把他QQ或者微信弄到哪裏去了,也不知道他會不會介意我把他的代碼分享出來。ide

function dualROF() clc f0=imread('rr.bmp'); f0=f0(:,:,1); [m,n]=size(f0); f0=double(f0); lamda=30; % smoothness paramter, the larger the smoother tao=.125; % fixed do not change it. p1=zeros(m,n); p2=zeros(m,n); tic for step=1:100
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% div_p=div(p1,p2); cx=Fx(div_p-f0/lamda); cy=Fy(div_p-f0/lamda);

    abs_c=sqrt(cx.^2+cy.^2);
    p1=(p1+tao*cx)./(1+tao*abs_c);
    p2=(p2+tao*cy)./(1+tao*abs_c);post

end u=f0-lamda*div_p; toc figure; imagesc(f0); colormap(gray); axis off; axis equal; figure; imagesc(u); colormap(gray); axis off; axis equal; % Compute divergence using backward derivative function f = div(a,b) f = Bx(a)+By(b); % Forward derivative operator on x with boundary condition u(:,:,1)=u(:,:,1) function Fxu = Fx(u) [m,n] = size(u); Fxu = circshift(u,[0 -1])-u; Fxu(:,n) = zeros(m,1); % Forward derivative operator on y with boundary condition u(1,:,:)=u(m,:,:) function Fyu = Fy(u) [m,n] = size(u); Fyu = circshift(u,[-1 0])-u; Fyu(m,:) = zeros(1,n); % Backward derivative operator on x with boundary condition Bxu(:,1)=u(:,1) function Bxu = Bx(u) [~,n] = size(u); Bxu = u - circshift(u,[0 1]); Bxu(:,1) = u(:,1); Bxu(:,n) = -u(:,n-1); % Backward derivative operator on y with boundary condition Bxu(1,:)=u(1,:) function Byu = By(u) [m,~] = size(u); Byu = u - circshift(u,[1 0]); Byu(1,:) = u(1,:); Byu(m,:) = -u(m-1,:);

  M的代碼,代碼量不大,那是由於Matlab的向量化確實很厲害,可是這個代碼仍是很慢的,256*256的灰度圖迭代100次都要700ms了。性能

  這裏拋開一些優化不說,用這個circshift會形成很大的性能損失,咱們稍微分析下就能看到用這個地方其實就是簡單的水平或者垂直方向的差分,徹底沒有必要這樣寫。學習

  直接按照代碼的意思用C語言把他們展開並不作其餘的優化可獲得大概下面這種不怎麼好的代碼:測試

int IM_DualTVDenoising(unsigned char *Src, unsigned char *Dest, int Width, int Height, int Stride,  float Lamda = 20 , int Iter = 20) { int Channel = Stride / Width; if ((Src == NULL) || (Dest == NULL))                        return IM_STATUS_NULLREFRENCE; if ((Width <= 0) || (Height <= 0))                            return IM_STATUS_INVALIDPARAMETER; if ((Channel != 1) && (Channel != 3) && (Channel != 4))        return IM_STATUS_INVALIDPARAMETER; if (Channel == 1) { float tao = 0.125; // fixed do not change it.
        float InvLamda = 1.0 / Lamda; float *p1 = (float *)malloc(Width * Height * sizeof(float)); float *p2 = (float *)malloc(Width * Height * sizeof(float)); float *div_p = (float *)malloc(Width * Height * sizeof(float)); float *cx = (float *)malloc(Width * Height * sizeof(float)); float *cy = (float *)malloc(Width * Height * sizeof(float)); float *Temp = (float *)malloc(Width * Height * sizeof(float)); int X, Y; float q1, q2, q, abs_c; float *LineP1, *LineP2, *LineP3, *LineP4; unsigned char *LinePS, *LinePD; for (int Z = 0; Z < Iter; Z++) { //Div(p1, p2, div_p);

            for (Y = 0; Y < Height; Y++) { LineP1 = p1 + Y * Width;                        //Fx(Temp, cx);
                LineP2 = cx + Y * Width; LineP2[0] = LineP1[0]; for (X = 1; X < Width; X++) { LineP2[X] = LineP1[X] - LineP1[X - 1]; } LineP2[Width - 1] = -LineP1[Width - 2]; } memcpy(cy, p2, Width * sizeof(float)); for (Y = 1; Y < Height; Y++) { LineP1 = (float *)(p2 + (Y - 1)* Width); LineP2 = (float *)(p2 + Y * Width);            //Fy(Temp, cy);
                LineP3 = (float *)(cy + Y * Width); for (X = 0; X < Width; X++) { LineP3[X] = LineP2[X] - LineP1[X]; } } LineP1 = (float *)(p2 + (Height - 2) * Width); LineP2 = (float *)(cy + (Height - 1) * Width); for (X = 0; X < Width; X++) { LineP2[X] = -LineP1[X]; } for (Y = 0; Y < Height; Y++) { LineP1 = (float *)(cx + Y * Width); LineP2 = (float *)(cy + Y * Width);            //Fy(Temp, cy);
                LineP3 = (float *)(div_p + Y * Width); for (X = 0; X < Width; X++) { LineP3[X] = LineP1[X] + LineP2[X]; } } for (Y = 0; Y < Height; Y++) { LineP1 = (float *)(div_p + Y * Width); LineP2 = (float *)(Temp + Y * Width); LinePS = Src + Y * Stride; for (X = 0; X < Width; X++) { LineP2[X] = LineP1[X] - LinePS[X] * InvLamda; } } for (Y = 0; Y < Height; Y++) { LineP1 = (float *)(Temp + Y * Width);                        //Fx(Temp, cx);
                LineP2 = (float *)(cx + Y * Width); for (X = 0; X < Width - 1; X++) { LineP2[X] = LineP1[X + 1] - LineP1[X]; } LineP2[Width - 1] = 0; } for (Y = 0; Y < Height - 1; Y++) { LineP1 = (float *)(Temp + Y * Width); LineP2 = (float *)(Temp + (Y + 1) * Width);            //Fy(Temp, cy);
                LineP3 = (float *)(cy + Y * Width); for (X = 0; X < Width; X++) { LineP3[X] = LineP2[X] - LineP1[X]; } } memset(Temp + (Height - 1) * Width, 0, Width * sizeof(float)); for (Y = 0; Y < Height; Y++) { LineP1 = (float *)(p1 + Y * Width); LineP2 = (float *)(p2 + Y * Width); LineP3 = (float *)(cx + Y * Width); LineP4 = (float *)(cy + Y * Width); for (X = 0; X < Width; X++) { abs_c = sqrt(LineP3[X] * LineP3[X] + LineP4[X] * LineP4[X]); abs_c = 1 / (1 + tao * abs_c); LineP1[X] = (LineP1[X] + tao * LineP3[X]) * abs_c; LineP2[X] = (LineP2[X] + tao * LineP4[X]) * abs_c; } } } for (Y = 0; Y < Height; Y++) { LineP1 = (float *)(div_p + Y * Width); LinePS = Src + Y * Stride; LinePD = Dest + Y * Stride; for (X = 0; X < Width; X++) { LinePD[X] = IM_ClampToByte((int)(LinePS[X] - Lamda * LineP1[X])); } } free(p1); free(p2); free(div_p); free(cx); free(cy); free(Temp); } else { } }

  算法明顯佔用很大的內存,並且看起來彆扭,不過速度仍是槓槓的,256*256的灰度圖迭代100次都要30ms了。反編譯看了下代碼,編譯器對代碼作了很好的SIMD指令優化。

  上面的C語言仍是能夠繼續優化的,這就須要你們本身的認真的去研讀代碼深層次的邏輯關係了,實際上能夠只要上面的一半的臨時內存的,並且不少計算能夠集中在一個循環裏完成,能夠手動內嵌SIMD指令,或者直接使用編譯器的優化能力,基本上這樣的簡單的算法邏輯編譯器編譯後的速度不會比咱們手寫的SIMD指令慢,有的時候仍是會快一些,不得不佩服那些寫編譯器的大牛。優化後的速度大概在14ms左右。

  研究TV算法須要很好的數學功底,之前朋友曾經給我寄過一本書,裏面都是微分方面的數學公式,看的我嚇死了,不過TV算法彷佛有不少很好的應用,也曾經流行過一段時間,惋惜如今深度學習一出來,不少人都喜歡這種直接從海量數據中建造黑盒模型,而對那些有着很明顯的數學邏輯的算法嗤之以鼻了,真有點惋惜。

  之前在基於總變差模型的紋理圖像中圖像主結構的提取方法 一文中曾提到那個論文附帶的Matlab代碼沒有什麼意義,由於他很難轉換成C的代碼,即時轉換成功了,也處理不了大圖,可是本文這裏的TV算法總的來講在內存佔用或者速度方面都還使人滿意。

  在去噪效果上,這個算法還算能夠:

          

  本文Demo下載地址:  http://files.cnblogs.com/files/Imageshop/SSE_Optimization_Demo.rar, 算法位於Denoise --> TV Denoising下。

相關文章
相關標籤/搜索