Metal Tutorial Series (2): Implementing a LUT Filter with Metal

Date: 2019-11-30

Tags: metal, tutorial series, lut


Simple Filters

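In everyday image processing, one of the most common tasks is changing some color across the whole image.
For example, suppose we want to scale the R component of every pixel to 0.5 times its original value. As mentioned in the previous article, drawing an image runs the vertex stage first and then the fragment stage, and the fragment shader is responsible for computing the color of each pixel.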

fragment float4 myFragmentShader(VertexOut vertexIn [[stage_in]],
                                 texture2d<float, access::sample> inputImage [[texture(0)]],
                                 sampler textureSampler [[sampler(0)]])
{
    // Sample the input image at this fragment's texture coordinate
    float4 color = inputImage.sample(textureSampler, vertexIn.texCoords);
    return color;
}

So in this shader, multiplying the r component of the returned color by 0.5 gives the effect we want:

return float4(color.r * 0.5, color.gba);

Rerun the earlier demo and the triangle looks a little greener, which shows the effect works.

ColorLUT

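That was the ideal case, though; real image processing is usually far more complex.
Suppose our image is 1280 × 720 pixels. That means 921,600 floating-point multiplications, one per pixel to scale its r value by 0.5.
For small images this puts no real pressure on the GPU, but as images get larger and more numerous, it starts to slow down GPU computation.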
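To make the per-pixel cost concrete, here is a CPU sketch of the same "r × 0.5" operation. It is not the article's GPU code: the function name halveRed and the tightly packed RGBA8 buffer layout are assumptions for illustration. It simply counts one multiplication per pixel.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// CPU sketch of what the "r * 0.5" fragment shader does: one multiply per pixel.
// Assumption: pixels are tightly packed RGBA8 (4 bytes per pixel, R first).
std::size_t halveRed(std::vector<std::uint8_t>& rgba, std::size_t width, std::size_t height) {
    assert(rgba.size() == width * height * 4);
    std::size_t multiplications = 0;
    for (std::size_t i = 0; i < width * height; ++i) {
        std::uint8_t& r = rgba[i * 4];               // R is the first byte of each pixel
        r = static_cast<std::uint8_t>(r * 0.5f);     // g, b and a are left untouched
        ++multiplications;
    }
    return multiplications;
}
```

For a 1280 × 720 buffer, halveRed returns 921,600, matching the count above; a more complex color formula would multiply that work per pixel, which is what a lookup table avoids.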

Lookup table (LUT)
As the name says, it is a table you look values up in, and a ColorLUT is a color lookup table.

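So a lookup table is introduced: the transformed color for each input color is stored in advance, and applying the filter takes only one lookup. This is much faster than computing the transformation directly, especially when the color operation is complex.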

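But storing the transformation for every color is expensive. Assuming RGB24, a color takes 8 × 3 = 24 bits and each of R, G, B ranges over 0-255, so there are 16,777,216 colors to map. Storing them all takes 256 × 256 × 256 × 24 / 8 / 1024 / 1024 = 48 MB. If every filter took 48 MB, an image-editing app with many filters would grow without bound.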
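The size arithmetic can be checked at compile time. A small C++ sketch (the constant names are illustrative only) comparing a full RGB24 table with a 512 × 512 LUT image stored as uncompressed RGBA8:

```cpp
#include <cassert>
#include <cstdint>

// Storage for a complete RGB24 -> RGB24 table: one 24-bit output per input color.
constexpr std::uint64_t kColors = 256ull * 256 * 256;        // 16,777,216 input colors
constexpr std::uint64_t kFullTableBytes = kColors * 24 / 8;  // 3 bytes per entry
constexpr std::uint64_t kMiB = 1024ull * 1024;

static_assert(kColors == 16777216ull, "2^24 colors");
static_assert(kFullTableBytes == 48 * kMiB, "exactly 48 MiB");

// For comparison: a 512 x 512 LUT image kept as uncompressed RGBA8.
constexpr std::uint64_t kLutImageBytes = 512ull * 512 * 4;   // 1 MiB
static_assert(kLutImageBytes == kMiB, "1 MiB");

// 512 * 512 pixels = 64 * 64 * 64 entries, i.e. 64 levels per channel instead of 256.
static_assert(512ull * 512 == 64ull * 64 * 64, "64^3 entries");
```

The 48× reduction comes from storing 64 levels per channel and interpolating between them, rather than from compression.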

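To solve this, there is the standard ColorLUT filter image. The default one is 512 × 512 and represents the transformation of all colors: 512 × 512 = 262,144 = 64³ entries, so each channel is stored at 64 levels, and colors that fall between entries are obtained by interpolation.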

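This is the standard (identity) color image: every rgb value maps to itself. Adjust the colors of this image and you get a new LUT image; with the modified LUT, the filter can look up what each color should be replaced with and produce the new image.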

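Let's explain this image and how to use it.
First, look closely at the image:

It is made up of 8 × 8 squares
Across the squares, from top-left to bottom-right, the top-left corner of each square goes from black to blue
Within each square, the top-right corner is mostly red
Within each square, the bottom-left corner is mostly green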

Does that give you a hint?
Let's simplify a bit more.
A color is three values r, g, b, all normalized (1 stands for 255).

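Across the squares, from top-left to bottom-right, b goes from 0 to 1 in Z order (left to right, then row by row)
Within each square, r goes from 0 to 1 from left to right; this is x
Within each square, g goes from 0 to 1 from top to bottom; this is y

So pure blue, (0, 0, 1), is at pixel (7 × 64, 7 × 64) = (448, 448): the top-left corner of the bottom-right square.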
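The layout just described can be turned into a generator for an identity LUT. This is a C++ sketch under the stated layout; makeIdentityLut and lutPixel are hypothetical helpers, and the rounding from 64 levels to 0-255 is an assumption, so its bytes may differ slightly from any particular published LUT image.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Builds a 512 x 512 RGBA8 identity LUT with the layout above:
// 8 x 8 squares of 64 x 64 pixels; b picks the square in Z order,
// r grows left to right (x) and g grows top to bottom (y) inside each square.
std::vector<std::uint8_t> makeIdentityLut() {
    constexpr int kSize = 512, kSquare = 64;
    std::vector<std::uint8_t> rgba(kSize * kSize * 4);
    for (int y = 0; y < kSize; ++y) {
        for (int x = 0; x < kSize; ++x) {
            int blueIndex = (y / kSquare) * 8 + (x / kSquare);  // 0...63, row by row
            int r = x % kSquare, g = y % kSquare;                // 0...63 inside the square
            std::uint8_t* p = &rgba[(static_cast<std::size_t>(y) * kSize + x) * 4];
            // Scale each 0...63 level to 0...255 with rounding.
            p[0] = static_cast<std::uint8_t>((r * 255 + 31) / 63);
            p[1] = static_cast<std::uint8_t>((g * 255 + 31) / 63);
            p[2] = static_cast<std::uint8_t>((blueIndex * 255 + 31) / 63);
            p[3] = 255;
        }
    }
    return rgba;
}

// Reads one LUT pixel as {r, g, b}.
std::array<std::uint8_t, 3> lutPixel(const std::vector<std::uint8_t>& lut, int x, int y) {
    assert(x >= 0 && x < 512 && y >= 0 && y < 512);
    const std::uint8_t* p = &lut[(static_cast<std::size_t>(y) * 512 + x) * 4];
    return {p[0], p[1], p[2]};
}
```

With this generator, lutPixel(lut, 448, 448) is {0, 0, 255}, the pure blue position computed above, and the top-right pixel of the first square, (63, 0), is pure red.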

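Now let's walk through a lookup with an example.

Suppose the color we need to look up is (0.4, 0.6, 0.2), in normalized coordinates.

First, determine which square to use: b = 0.2 × 63 = 12.6, so square 12 (row 12 / 8 = 1, column 12 mod 8 = 4), i.e. square (4, 1)
r = 0.4 × 63 = 25.2 and g = 0.6 × 63 = 37.8, which in whole-image pixel coordinates is (4 × 64 + 25.2, 1 × 64 + 37.8) = (281.2, 101.8)
The values from the steps above are all floating-point, but the pixels of the filter image are discrete, with no fractional positions
For r and g, convert the final coordinate back to normalized form, (281.2 / 512, 101.8 / 512) ≈ (0.549, 0.199), and let the sampler interpolate to get an exact color (the shader below also adds half a texel, 0.5 / 512, so that samples land on pixel centers)
For b, sample the next square, (5, 1), at the same r, g position as well, and mix the two colors by the fractional part of 12.6, i.e. 0.6, to get the final color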
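The worked example can be checked on the CPU. This C++ sketch assumes an identity LUT (so the lookup should return the input color) and emulates a linear-filtering, pixel-coordinate sampler by hand; identityTexel, sampleBilinear and lookup are hypothetical helpers, not Metal API. It follows the same steps as the text, including the half-texel offset.

```cpp
#include <array>
#include <cassert>
#include <cmath>

using Color = std::array<float, 3>;  // normalized r, g, b in [0, 1]

// Color stored at texel (x, y) of an identity 512 x 512 LUT (layout as in the text).
Color identityTexel(int x, int y) {
    int blueIndex = (y / 64) * 8 + (x / 64);
    return {(x % 64) / 63.0f, (y % 64) / 63.0f, blueIndex / 63.0f};
}

// Bilinear sample in pixel coordinates (texel centers at i + 0.5), clamped to the edge.
Color sampleBilinear(float px, float py) {
    float u = px - 0.5f, v = py - 0.5f;
    int x0 = static_cast<int>(std::floor(u)), y0 = static_cast<int>(std::floor(v));
    float fx = u - x0, fy = v - y0;
    auto clampi = [](int i) { return i < 0 ? 0 : (i > 511 ? 511 : i); };
    Color out{};
    for (int c = 0; c < 3; ++c) {
        float top = identityTexel(clampi(x0), clampi(y0))[c] * (1 - fx)
                  + identityTexel(clampi(x0 + 1), clampi(y0))[c] * fx;
        float bottom = identityTexel(clampi(x0), clampi(y0 + 1))[c] * (1 - fx)
                     + identityTexel(clampi(x0 + 1), clampi(y0 + 1))[c] * fx;
        out[c] = top * (1 - fy) + bottom * fy;
    }
    return out;
}

// Pick two squares by blue, sample each at the r/g position, blend by fract(blue).
Color lookup(const Color& color) {
    assert(color[2] >= 0.0f && color[2] <= 1.0f);
    float blue = color[2] * 63.0f;
    int q1 = static_cast<int>(std::floor(blue)), q2 = static_cast<int>(std::ceil(blue));
    auto sampleSquare = [&](int q) {
        // square origin + half-texel offset + 63 texels spanned by r (x) and g (y)
        float px = (q % 8) * 64 + 0.5f + 63.0f * color[0];
        float py = (q / 8) * 64 + 0.5f + 63.0f * color[1];
        return sampleBilinear(px, py);
    };
    Color c1 = sampleSquare(q1), c2 = sampleSquare(q2);
    float t = blue - std::floor(blue);  // fract(blue)
    Color out{};
    for (int c = 0; c < 3; ++c) out[c] = c1[c] * (1 - t) + c2[c] * t;
    return out;
}
```

For (0.4, 0.6, 0.2) the first sample lands at pixel (281.7, 102.3), the text's (281.2, 101.8) plus half a texel, and the blended result comes back as approximately (0.4, 0.6, 0.2), as an identity LUT should.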

Metal Image Processing

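In the previous article, we mentioned three kinds of encoders that a CommandBuffer can create:

MTLRenderCommandEncoder: render encoder for 3D drawing
MTLComputeCommandEncoder: compute encoder
MTLBlitCommandEncoder: blit encoder, which copies between buffers and textures and can also generate mipmaps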

The earlier demo simply drew an image, using an MTLRenderCommandEncoder.
This time we add a filter to the image with an MTLComputeCommandEncoder, using the GPU's compute power to look up the LUT and blend the colors.

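In short: for plain rendering, passing in the image texture was enough to draw it. A filter needs a processing step in between: we give the GPU the original image texture and the LUT image texture, the GPU returns a new texture with the filter applied, and we hand that texture to the render encoder from before, which then draws the filtered image on the triangle.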

We continue with the earlier demo: the Device, CommandQueue and CommandBuffer already exist, and we add a compute encoder before the existing render encoder.

Every encoder needs a PipelineState, which links it to a shader function.
Here we create a ComputePipelineState; the corresponding shader function is introduced later.

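id<MTLLibrary> library = [device newDefaultLibrary];
id<MTLFunction> function = [library newFunctionWithName:@"image_filiter"];

// Keep the error instead of passing nil, so shader/pipeline problems are visible
NSError *error = nil;
self.computeState = [device newComputePipelineStateWithFunction:function error:&error];
if (self.computeState == nil) {
    NSLog(@"Failed to create compute pipeline state: %@", error);
}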

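Set up the resources: the original image and the LUT image.

Below is one way to turn a UIImage into a texture: draw it into a CGContext and upload the resulting pixels.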

- (void)setLutImage:(UIImage *)lutImage{
    _lutImage = lutImage;

    CGImageRef imageRef = [_lutImage CGImage];

    // Create a suitable bitmap context for extracting the bits of the image
    NSUInteger width = CGImageGetWidth(imageRef);
    NSUInteger height = CGImageGetHeight(imageRef);
    NSUInteger bytesPerPixel = 4;
    NSUInteger bytesPerRow = bytesPerPixel * width;
    NSUInteger bitsPerComponent = 8;
    CGColorSpaceRef colorSpace = CGColorSpaceCreateDeviceRGB();
    uint8_t *rawData = (uint8_t *)calloc(height * bytesPerRow, sizeof(uint8_t));
    // 8 bits per channel, bytes in R, G, B, A order: matches MTLPixelFormatRGBA8Unorm
    CGContextRef bitmapContext = CGBitmapContextCreate(rawData, width, height,
                                                       bitsPerComponent, bytesPerRow, colorSpace,
                                                       kCGImageAlphaPremultipliedLast | kCGBitmapByteOrder32Big);
    CGColorSpaceRelease(colorSpace);

    CGContextDrawImage(bitmapContext, CGRectMake(0, 0, width, height), imageRef);
    CGContextRelease(bitmapContext);

    // The texture must exist before replaceRegion: can copy pixels into it.
    // Assumption: self.device holds the MTLDevice created earlier in the demo.
    MTLTextureDescriptor *descriptor =
        [MTLTextureDescriptor texture2DDescriptorWithPixelFormat:MTLPixelFormatRGBA8Unorm
                                                           width:width
                                                          height:height
                                                       mipmapped:NO];
    descriptor.usage = MTLTextureUsageShaderRead;
    self.lutTexture = [self.device newTextureWithDescriptor:descriptor];

    MTLRegion region = MTLRegionMake2D(0, 0, width, height);
    [self.lutTexture replaceRegion:region mipmapLevel:0 withBytes:rawData bytesPerRow:bytesPerRow];

    free(rawData);
}

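Configure the adjustable parameters, such as the filter's blend strength and the region it applies to.
Here I created a struct describing the filter's region and strength; setBytes passes this configuration to the shader.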

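// Filter parameters passed to the shader with setBytes:.
// The byte layout must match the struct the kernel reads from buffer(0).
typedef struct
{
    UInt32 clipOriginX;   // filter region origin, in pixels
    UInt32 clipOriginY;
    UInt32 clipSizeX;     // filter region size, in pixels
    UInt32 clipSizeY;
    Float32 saturation;   // filter strength: 0 = original, 1 = full LUT result
    bool changeColor;     // swap r and b when writing
    bool changeCoord;     // flip vertically when writing
} ImageSaturationParameters;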
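Because setBytes copies the struct as raw bytes, the CPU struct and the shader's struct must agree byte for byte. Here is a standard C++ sketch of the same struct (UInt32 and Float32 written as std::uint32_t and float) that checks the offsets a shader-side definition of uint2 clipOrigin; uint2 clipSize; float saturation; bool changeColor; bool changeCoord; would expect. That shader-side definition is an assumption inferred from how the kernel reads params->clipOrigin and params->clipSize.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Standard C++ mirror of ImageSaturationParameters.
struct ImageSaturationParameters {
    std::uint32_t clipOriginX;
    std::uint32_t clipOriginY;
    std::uint32_t clipSizeX;
    std::uint32_t clipSizeY;
    float saturation;
    bool changeColor;
    bool changeCoord;
};

// Offsets expected by a Metal struct {uint2; uint2; float; bool; bool}:
// 0, 8, 16, 20, 21, with the total size padded to 24 bytes.
static_assert(offsetof(ImageSaturationParameters, clipSizeX) == 8, "clipSize at 8");
static_assert(offsetof(ImageSaturationParameters, saturation) == 16, "saturation at 16");
static_assert(offsetof(ImageSaturationParameters, changeColor) == 20, "changeColor at 20");
static_assert(offsetof(ImageSaturationParameters, changeCoord) == 21, "changeCoord at 21");
```

Two 32-bit fields line up with one uint2 here because uint2 is 8 bytes with 8-byte alignment and clipOriginX sits at offset 0; reordering the CPU fields would silently break that correspondence.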

Configuring the Encoder

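Put the pieces above together. sourceTexture is the input image texture, destinationTexture is the texture to write into,
and self.lutTexture is the LUT filter image texture; they are bound to texture indices 0, 1 and 2 respectively.
The parameters are passed to the shader as bytes.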

ImageSaturationParameters params;
params.clipOriginX = floor(self.filiterRect.origin.x);
params.clipOriginY = floor(self.filiterRect.origin.y);
params.clipSizeX = floor(self.filiterRect.size.width);
params.clipSizeY = floor(self.filiterRect.size.height);

params.saturation = self.saturation;       // filter strength, 0...1
params.changeColor = self.needColorTrans;  // swap r and b in the output
params.changeCoord = self.needCoordTrans;  // flip the output vertically

id<MTLComputeCommandEncoder> encoder = [commandBuffer computeCommandEncoder];
[encoder pushDebugGroup:@"filter"];
[encoder setLabel:@"filiter encoder"];

[encoder setComputePipelineState:self.computeState];
[encoder setTexture:sourceTexture atIndex:0];
[encoder setTexture:destinationTexture atIndex:1];

if (self.lutTexture == nil) {
    // No LUT loaded yet: bind the source so texture index 2 is never empty
    NSLog(@"lut == nil");
    [encoder setTexture:sourceTexture atIndex:2];
} else {
    [encoder setTexture:self.lutTexture atIndex:2];
}

// One sampler shared by all textures in the kernel
[encoder setSamplerState:self.samplerState atIndex:0];

// Small constant data can be copied directly into the command stream
[encoder setBytes:&params length:sizeof(params) atIndex:0];

threadgroups
In a compute encoder, the image is split into small units that the GPU processes in parallel; how many groups there are and how large each group is are configured on the encoder.

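To get the most out of the GPU, you can configure them as follows: use the pipeline's threadExecutionWidth (its SIMD width) as the group width, and fill the rest of maxTotalThreadsPerThreadgroup with rows: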

NSUInteger wid = self.computeState.threadExecutionWidth;
NSUInteger hei = self.computeState.maxTotalThreadsPerThreadgroup / wid;

// Number of threadgroups (not threads), rounded up so the whole texture is covered
MTLSize threadgroupsPerGrid = {(sourceTexture.width + wid - 1) / wid, (sourceTexture.height + hei - 1) / hei, 1};
MTLSize threadsPerGroup = {wid, hei, 1};

[encoder dispatchThreadgroups:threadgroupsPerGrid
        threadsPerThreadgroup:threadsPerGroup];
[encoder popDebugGroup];
[encoder endEncoding];
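The dispatch arithmetic is easy to check by hand. A C++ sketch of the same computation; threadgroupsFor and Size2D are hypothetical names, and the example values 32 (execution width) and 1024 (max threads per threadgroup) are typical but assumed, since the real values come from the pipeline state at runtime.

```cpp
#include <cassert>
#include <cstddef>

struct Size2D { std::size_t width, height; };

// Same arithmetic as the Objective-C dispatch code: a threadgroup of
// executionWidth x (maxThreads / executionWidth), and enough groups to cover the image.
Size2D threadgroupsFor(std::size_t imageW, std::size_t imageH,
                       std::size_t executionWidth, std::size_t maxThreadsPerGroup) {
    assert(executionWidth > 0 && maxThreadsPerGroup >= executionWidth);
    std::size_t groupW = executionWidth;
    std::size_t groupH = maxThreadsPerGroup / executionWidth;
    // Integer ceiling division: (n + d - 1) / d
    return {(imageW + groupW - 1) / groupW, (imageH + groupH - 1) / groupH};
}
```

For a 1280 × 720 image with 32 × 32 groups this gives 40 × 23 threadgroups. 23 × 32 = 736 rows, so 16 rows of threads lie below the image and must not write to the texture. On GPUs that support non-uniform threadgroups, dispatchThreads:threadsPerThreadgroup: lets Metal size the edge groups instead.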

Shader
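This is the core computation. Unlike rendering, the function is neither vertex nor fragment; it is marked with the kernel qualifier instead. It is essentially the code version of the LUT explanation above, so if you followed how the LUT coordinates are located, the code below should pose no problem.
The code also checks whether each pixel lies inside the region the filter should be applied to. Note that a sampler can be reused: different textures can all use the same sampler.
image_filiter takes six inputs; from top to bottom they are the parameters, the source texture, the target texture to write to, the LUT texture, the sampler, and the thread's position. The position is derived from the threadgroup configuration above: it is the pixel's location in the whole image, in pixels rather than normalized, so a sampler that uses pixel coordinates can read that pixel's color from it.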

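#include <metal_stdlib>
using namespace metal;

// Must match the CPU-side ImageSaturationParameters byte layout (24 bytes).
typedef struct
{
    uint2 clipOrigin;
    uint2 clipSize;
    float saturation;
    bool changeColor;
    bool changeCoord;
} ImageSaturationParams;

// Check whether point lies inside the rect [origin, origin + size)
bool checkPointInRect(uint2 point, uint2 origin, uint2 size){
    return point.x >= origin.x &&
           point.y >= origin.y &&
           point.x < (origin.x + size.x) &&
           point.y < (origin.y + size.y);
}

// Assumes samp was created with normalizedCoordinates = NO and linear filtering,
// because every coordinate passed to sample() below is in pixels.
kernel void image_filiter(constant ImageSaturationParams *params [[buffer(0)]],
                          texture2d<half, access::sample> sourceTexture [[texture(0)]],
                          texture2d<half, access::write> targetTexture [[texture(1)]],
                          texture2d<half, access::sample> lutTexture [[texture(2)]],
                          sampler samp [[sampler(0)]],
                          uint2 gridPos [[thread_position_in_grid]]){

    // The grid is rounded up to whole threadgroups: skip threads outside the image
    if (gridPos.x >= targetTexture.get_width() || gridPos.y >= targetTexture.get_height()) {
        return;
    }

    // +0.5 hits the texel center, so linear filtering returns this exact pixel
    float2 sourceCoord = float2(gridPos) + 0.5f;
    half4 color = sourceTexture.sample(samp, sourceCoord);

    // Blue selects one of 64 squares (0...63), laid out 8 x 8 in Z order
    float blueColor = color.b * 63.0;

    int2 quad1;
    quad1.y = floor(floor(blueColor) / 8.0);
    quad1.x = floor(blueColor) - (quad1.y * 8.0);

    int2 quad2;
    quad2.y = floor(ceil(blueColor) / 8.0);
    quad2.x = ceil(blueColor) - (quad2.y * 8.0);

    // Normalized LUT positions (float, not half, for sub-texel precision);
    // the 0.5/512 offset lands on texel centers
    float2 texPos1;
    texPos1.x = (quad1.x * 0.125) + 0.5/512.0 + ((0.125 - 1.0/512.0) * color.r);
    texPos1.y = (quad1.y * 0.125) + 0.5/512.0 + ((0.125 - 1.0/512.0) * color.g);

    float2 texPos2;
    texPos2.x = (quad2.x * 0.125) + 0.5/512.0 + ((0.125 - 1.0/512.0) * color.r);
    texPos2.y = (quad2.y * 0.125) + 0.5/512.0 + ((0.125 - 1.0/512.0) * color.g);

    // Convert to pixel coordinates for the pixel-coordinate sampler
    half4 newColor1 = lutTexture.sample(samp, texPos1 * 512.0);
    half4 newColor2 = lutTexture.sample(samp, texPos2 * 512.0);

    // Blend the two squares by the fractional part of blue
    half4 newColor = mix(newColor1, newColor2, half(fract(blueColor)));

    // saturation is the filter strength: 0 keeps the original, 1 is the full LUT result
    half4 finalColor = mix(color, half4(newColor.rgb, color.a), half(params->saturation));

    uint2 destCoords = gridPos;

    // Optionally flip vertically; valid rows are 0 ... height - 1
    if (params->changeCoord){
        destCoords = uint2(gridPos.x, targetTexture.get_height() - 1 - gridPos.y);
    }

    // Apply the filter only inside the clip rect
    half4 outColor = checkPointInRect(destCoords, params->clipOrigin, params->clipSize) ? finalColor : color;

    // Optionally swap r and b (RGBA <-> BGRA) for every written pixel
    if (params->changeColor){
        outColor = outColor.bgra;
    }

    targetTexture.write(outColor, destCoords);
}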
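The coordinate bookkeeping at the end of a kernel like this is easy to get off by one. This plain CPU sketch in C++ shows the two rules on their own; pointInRect, flipY and UInt2 are hypothetical helpers, not Metal API. A clip rect covers exactly clipSize pixels starting at clipOrigin, so its far edge is exclusive. Flipping row y of an image with height rows gives height - 1 - y, because height - y would point one row past the end for y = 0.

```cpp
#include <cassert>
#include <cstdint>

struct UInt2 { std::uint32_t x, y; };

// Half-open rect test [origin, origin + size): exactly size.x * size.y pixels pass.
bool pointInRect(UInt2 p, UInt2 origin, UInt2 size) {
    return p.x >= origin.x && p.y >= origin.y &&
           p.x < origin.x + size.x && p.y < origin.y + size.y;
}

// Vertical flip of a row index; valid rows are 0 ... height - 1.
std::uint32_t flipY(std::uint32_t y, std::uint32_t height) {
    assert(y < height);
    return height - 1 - y;
}
```

With a 5 × 5 rect at (10, 10), pixels 10 through 14 on each axis are inside and 15 is not; in a 720-row image, row 0 maps to row 719 and row 719 maps back to row 0.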