CNN-Based Texture Synthesis with Deep Learning in Practice (with Python Implementation)

Date: 2020-08-10


Q0: Preliminary knowledge of Texture Synthesis

See here for the baseline; every code change below builds on that code.

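1. A brief overview of texture synthesis

Texture synthesis is used mainly in computer graphics and related fields to simulate the surface detail of geometric models and make rendered models look more realistic. Unlike traditional texture mapping, texture synthesis infers a generalized generative process from a single sample texture and uses it to produce arbitrary new images with that texture, which effectively avoids problems such as seams and distortion.

Depending on the underlying principle, texture synthesis methods are usually divided into procedural texture synthesis (PTS) and texture synthesis from samples (TSFS). They differ as follows.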

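PTS: generates textures such as hair, clouds, or wood grain directly on a surface by simulating the physical process that produces them. It can yield realistic textures, but only if that process is modeled accurately, which is very difficult; for more complex textures PTS is not practical.
TSFS: generates large textured areas by analyzing the texture features of a given sample image. TSFS preserves the similarity and continuity of the texture while avoiding the tedious physical modeling that PTS requires. Traditional algorithms include feature matching, synthesis based on Markov random field models, and patch-based synthesis; in recent years the fastest-developing approach has been deep learning, and the paper this assignment is based on, "Texture Synthesis Using Convolutional Neural Networks", belongs to this category.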

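2. Reading the paper

2-1 Basic architecture

Texture analysis: the source texture is fed through a convolutional neural network (this assignment uses VGG-16), and the Gram matrix of the feature maps is computed at each layer;
Texture generation: a white-noise image is fed through the same network, a loss is computed at every layer included in the texture model, and gradient descent is applied to the total loss with respect to the pixel values, until the Gram matrices of the generated image match those of the source texture.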

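2-2 Gram matrix

The Gram matrix can be viewed as an uncentered covariance matrix of the feature maps, that is, a covariance matrix without the mean subtracted. One way to read it: "In a feature map, every number comes from one particular filter convolved at one particular position, so each number represents the strength of a feature. What the Gram matrix computes is the pairwise correlation between features: which two features appear together, which two trade off against each other, and so on. In addition, the diagonal entries of the Gram matrix reflect how much of each feature is present in the image." (Zhihu user 90後後生) In the figure that followed in the original post, the left-hand expression is the definition of the Gram matrix, which is simply the transpose of the feature matrix multiplied by the matrix itself; the right-hand expression is [text truncated in the source]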
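
To make the quoted reading concrete, here is a minimal dependency-free sketch. The function name `gram_matrix` and the toy values are made up for illustration; the features are assumed to be flattened to one row per spatial position and one column per channel, and no normalization is applied.

```python
def gram_matrix(features):
    """Uncentered Gram matrix of a list of per-position feature vectors.

    `features` has one row per spatial position and one column per channel
    (a flattened H*W x C layout). Entry (i, j) sums channel_i * channel_j
    over all positions; no mean is subtracted and nothing is normalized.
    """
    n_channels = len(features[0])
    return [
        [sum(row[i] * row[j] for row in features) for j in range(n_channels)]
        for i in range(n_channels)
    ]


# Toy feature map: 4 positions, 3 channels (hypothetical values).
# Channels 0 and 1 fire at the same positions; channel 2 fires elsewhere.
toy = [
    [1.0, 2.0, 0.0],
    [1.0, 2.0, 0.0],
    [0.0, 0.0, 3.0],
    [0.0, 0.0, 3.0],
]
gram = gram_matrix(toy)
for row in gram:
    print(row)
```

The off-diagonal entry for channels 0 and 1 is positive because they are active together, the entries pairing channel 2 with the others are zero because they never coincide, and the diagonal grows with how strongly each channel fires overall. With NumPy, the same quantity for an (H*W, C) array `F` would be `F.T @ F`.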

Q1: Implementing Gram matrix and loss function.

Use the features extracted from all 13 convolution layers, complete the baseline project with a loss function based on the Gram matrix, and run the training process.

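q1-1. Code

import tensorflow as tf

# Compute the Gram matrix of one layer's feature map
def get_gram_matrix(feature_map):
    # feature_map is laid out as [batch, height, width, channels]
    shape = feature_map.get_shape().as_list()
    # Flatten to one row per spatial position, one column per channel
    re_shape = tf.reshape(feature_map, (-1, shape[3]))
    # F^T F, normalized by height * width * channels
    gram = tf.matmul(re_shape, re_shape, transpose_a=True) / (shape[1]*shape[2]*shape[3])
    return gram

# L2 loss between the Gram matrices of the source and the generated image
def get_l2_gram_loss_for_layer(noise, source, layer):
    source_feature = getattr(source, layer)
    noise_feature = getattr(noise, layer)
    Gram_s = get_gram_matrix(source_feature)
    Gram_n = get_gram_matrix(noise_feature)
    # tf.nn.l2_loss(t) is sum(t**2)/2, so this is sum of squared differences / 4
    loss = tf.nn.l2_loss((Gram_s-Gram_n))/2
    return loss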
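
As a check on the normalization, the two functions above can be written out and compared with the per-layer loss in the paper. The symbols here are notation introduced for this note, not names from the code: \(N_l\) is the number of channels, \(M_l = H \cdot W\) is the number of spatial positions, \(F^l\) is the \(M_l \times N_l\) reshaped feature matrix, the subscripts \(s\) and \(n\) mark the source and noise images, and the batch size is assumed to be 1.

```latex
\begin{aligned}
G^l &= \frac{(F^l)^{\top} F^l}{M_l N_l}
  && \text{(\texttt{get\_gram\_matrix})} \\
\texttt{tf.nn.l2\_loss}(T) &= \tfrac{1}{2} \sum_{i,j} T_{ij}^2 \\
\mathrm{loss}_l &= \tfrac{1}{2} \cdot \tfrac{1}{2}
  \sum_{i,j} \left( G^{l}_{s,ij} - G^{l}_{n,ij} \right)^2 \\
&= \frac{1}{4 N_l^2 M_l^2}
  \sum_{i,j} \left( \big[ (F_s^l)^{\top} F_s^l \big]_{ij}
                  - \big[ (F_n^l)^{\top} F_n^l \big]_{ij} \right)^2
\end{aligned}
```

Under these assumptions the result has the same form as the layer term \(E_l\) in Gatys et al., where the Gram matrix is left unnormalized and the factor \(1/(4 N_l^2 M_l^2)\) sits in the loss instead.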

q1-2. Results

An animation of the image being generated can be viewed here.

[Image table. Columns: Origin; Generate.]

Q2: Training with non-texture images.

To better understand how the texture model represents image information, choose another non-texture image (such as robot.jpg in the ./images folder) and rerun the training process.

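q2-1. Code

To get better training results, in Q2 I gave the layers increasing weights, so that the network's output on different texture images can be compared more clearly. The code is as follows.

import numpy as np

def get_gram_loss(noise, source):
    with tf.name_scope('get_gram_loss'):
        # Geometric alternative: 1, 3.5, 3.5**2, ...
        # weight = np.logspace(0, len(GRAM_LAYERS)-1, len(GRAM_LAYERS), base=3.5)
        # Linearly increasing weights 1, 2, ..., len(GRAM_LAYERS)
        weight = np.linspace(1, len(GRAM_LAYERS), len(GRAM_LAYERS), endpoint=True)
        gram_loss = [get_l2_gram_loss_for_layer(noise, source, layer) for layer in GRAM_LAYERS ]
    # Mean of the weighted per-layer losses
    return tf.reduce_mean(tf.convert_to_tensor(list(map(lambda x,y:x*y, weight, gram_loss))))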

q2-2. Results

[Image table. Columns: origin; epoch=1000, weight=1,2,3,4…; epoch=5000, weight=1,2,4,8…. Rows: red-peppers; robot; shibuya; stone.]

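q2-3. Analysis

Judging from the results, the network does reasonably well on textures with a fairly regular distribution, such as red-peppers and stone. For non-texture images, however, the results are poor, and the elements of the original image are hard to recognize in the generated one.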

Q3: Training with less layers of features.

To reduce the parameter size, please use fewer layers for extracting features (based on which we compute the Gram matrix and loss) and explore a combination of layers with which we can still synthesize texture images with a high degree of naturalness.

q3-1. Code

Set the weights of different layers to 0 in turn to remove those layers from the loss. The code is as follows.

def get_gram_loss(noise, source):
    with tf.name_scope('get_gram_loss'):
        # Enable exactly one weight setting per run (one entry per layer,
        # grouped as conv1, conv2, conv3, conv4, conv5).
        weight = [1,1, 1,1, 1,1,1, 1,1,1, 1,1,1]      # keep all layers
        # weight = [0,0, 1,1, 1,1,1, 1,1,1, 1,1,1]    # remove conv1
        # weight = [1,1, 0,0, 1,1,1, 1,1,1, 1,1,1]    # remove conv2
        # weight = [1,1, 1,1, 0,0,0, 1,1,1, 1,1,1]    # remove conv3
        # weight = [1,1, 1,1, 1,1,1, 0,0,0, 1,1,1]    # remove conv4
        # weight = [1,1, 1,1, 1,1,1, 1,1,1, 0,0,0]    # remove conv5
        # weight = [10,10, 20,20, 30,30,30, 40,40,40, 50,50,50]  # increasing
        # weight = [50,50, 40,40, 30,30,30, 20,20,20, 10,10,10]  # decreasing
        gram_loss = [get_l2_gram_loss_for_layer(noise, source, layer) for layer in GRAM_LAYERS ]
    return tf.reduce_mean(tf.convert_to_tensor(list(map(lambda x,y:x*y, weight, gram_loss))))
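
The hand-written weight lists can also be generated from one weight per block, which makes it harder to miscount entries. This is only a sketch: `expand_block_weights` and `drop_block` are hypothetical helpers, and it assumes `GRAM_LAYERS` lists the 13 VGG-16 convolution layers in order from conv1_1 to conv5_3, as the grouping of the lists above suggests.

```python
# Number of convolution layers in each VGG-16 block (conv1 .. conv5): 2+2+3+3+3 = 13.
VGG16_BLOCK_SIZES = [2, 2, 3, 3, 3]


def expand_block_weights(block_weights, block_sizes=VGG16_BLOCK_SIZES):
    """Repeat one weight per block so that every layer in a block shares it."""
    if len(block_weights) != len(block_sizes):
        raise ValueError("need exactly one weight per block")
    layer_weights = []
    for weight, size in zip(block_weights, block_sizes):
        layer_weights.extend([weight] * size)
    return layer_weights


def drop_block(block_index, n_blocks=len(VGG16_BLOCK_SIZES)):
    """Per-block weights: 1 everywhere, 0 for the dropped block (0-based index)."""
    return [0 if i == block_index else 1 for i in range(n_blocks)]


without_conv1 = expand_block_weights(drop_block(0))
increasing = expand_block_weights([10, 20, 30, 40, 50])
print(without_conv1)
print(increasing)
```

Either list can then be assigned to `weight` inside `get_gram_loss` in place of a hand-typed one.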

q3-2. Results

[Image table. Columns: all layers kept; conv1 removed; conv2 removed; conv3 removed; conv4 removed; conv5 removed.]

[Image table. Columns: increasing weights [10,10, 20,20, 30,30,30, 40,40,40, 50,50,50]; decreasing weights [50,50, 40,40, 30,30,30, 20,20,20, 10,10,10].]

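q3-3. Analysis

Comparing the results of removing different layers shows that the first block (conv1) is especially important for extracting image features, whereas removing any one of conv2 to conv5 on its own has little effect. I also tried weights that increase or decrease toward the deeper layers. The increasing weights produced better textures, which suggests that giving the deeper conv layers more influence improves output quality. On balance, one can drop the conv5 feature maps and raise the weights of the deeper layers to get good results.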

Q4: Finding alternatives of Gram matrix.

We may use the Earth mover's distance between the features of source texture image and the generated image.

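q4-1. Code

EMD (Earth Mover's Distance) is a metric used in content-based image retrieval to measure the distance between two distributions. Intuitively, it is the optimal solution of the transportation problem in linear programming: the minimum cost that must be paid to turn one distribution into the other. It was first proposed by Peleg et al. for certain vision problems. Based on EMD, we can build the following loss function, where \(F_i^l\) and \(\hat{F}_i^l\) are the \(i\)-th feature map at layer \(l\) for the source texture and the generated image.

\[ \mathrm{Loss} = \sum_l w_l \sum_i \left( \operatorname{sorted}(F_i^l) - \operatorname{sorted}(\hat{F}_i^l) \right)^2 \]

The code is as follows.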

def get_l2_emd_loss_for_layer(noise, source, layer):
    noise_feature = getattr(noise, layer)
    source_feature = getattr(source, layer)
    shape = noise_feature.get_shape().as_list()
    # Flatten to (height*width, channels)
    noise_re_shape = tf.reshape(noise_feature, (shape[1]*shape[2], shape[3]))
    source_re_shape = tf.reshape(source_feature, (shape[1]*shape[2], shape[3]))
    # axis=-1 is tf.sort's default and is what the reported results used: it
    # sorts the channel values at each position. Sorting each feature map over
    # positions, as the formula reads, would be axis=0.
    noise_sort = tf.sort(noise_re_shape, axis=-1, direction='ASCENDING')
    source_sort = tf.sort(source_re_shape, axis=-1, direction='ASCENDING')
    # Sum of squared differences between the sorted activations
    return tf.reduce_sum(tf.math.square(noise_sort-source_sort))

def get_emd_loss(noise, source):
    with tf.name_scope('get_emd_loss'):
        emd_loss = [get_l2_emd_loss_for_layer(noise, source, layer) for layer in GRAM_LAYERS ]
    # Unweighted mean over layers
    return tf.reduce_mean(tf.convert_to_tensor(emd_loss))
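
The sorting trick works because, for two 1-D samples of equal size, pairing the k-th smallest value of one with the k-th smallest of the other is the optimal transport plan, so no linear program needs to be solved. The dependency-free sketch below shows that idea and also why the sort axis matters. All function names and values are made up for illustration, and features are assumed to be (H*W, C) matrices stored as lists of rows.

```python
def sorted_l2(a, b):
    """Sum of squared differences between two equal-length samples after sorting."""
    if len(a) != len(b):
        raise ValueError("samples must have the same length")
    return sum((x - y) ** 2 for x, y in zip(sorted(a), sorted(b)))


def per_channel_loss(noise, source):
    """Sort each channel over spatial positions (the columns of the matrix)."""
    return sum(
        sorted_l2(n_col, s_col)
        for n_col, s_col in zip(zip(*noise), zip(*source))
    )


def per_position_loss(noise, source):
    """Sort the channel values at each position (the rows of the matrix)."""
    return sum(sorted_l2(n_row, s_row) for n_row, s_row in zip(noise, source))


# Hypothetical 2-position, 2-channel features: same values, channels swapped.
source = [[1.0, 5.0], [2.0, 6.0]]
noise = [[5.0, 1.0], [6.0, 2.0]]

channel_loss = per_channel_loss(noise, source)
position_loss = per_position_loss(noise, source)
print(channel_loss, position_loss)
```

Sorting along the channel axis gives zero loss here even though the two channels are swapped, because it discards which channel carried which value. Whether this contributes to the color loss reported below has not been verified; it is only one possible explanation.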

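q4-2. Results

The loss has not fully converged at this point; this is the output at [e:3700 loss: 2575.86865]. ~~My poor little computer has done its best...~~

[Image table. Columns: Origin; Generate.]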

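q4-3. Analysis

Judging from the results, the network has learned the correlations between the feature vectors of the source texture, and the direction of the texture in the generated image resembles the original. Unfortunately, after switching to the EMD loss, the network loses most of the color characteristics of the source texture (possibly because of the sort operation in the EMD computation) and performs very poorly on color.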

Q5: Training with different weighting factor.

Use the configuration in Q3 as baseline. Change the weighting factor of each layer and rerun the training process.

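q5-1. Code

According to Q3, increasing weights give better training results, so in Q5 I used two kinds of increasing weight sequences: 1) arithmetic; 2) geometric. The code is as follows.

def get_gram_loss(noise, source):
    with tf.name_scope('get_gram_loss'):
        n_active = len(GRAM_LAYERS) - 3  # the three conv5 layers get weight 0
        # Geometric sequence 1, q, q**2, ... (here q = 2)
        # weight = np.logspace(0, n_active-1, n_active, base=2)
        # Arithmetic sequence from 1 to 128*n_active; the actual step is
        # (128*n_active - 1)/(n_active - 1), slightly larger than 128.
        weight = np.linspace(1, 128*n_active, n_active, endpoint=True)
        # `weight + [0, 0, 0]` would try element-wise addition on the NumPy
        # array and fail on the shape mismatch, so append the zeros explicitly.
        weight = np.concatenate([weight, np.zeros(3)])
        gram_loss = [get_l2_gram_loss_for_layer(noise, source, layer) for layer in GRAM_LAYERS ]
    return tf.reduce_mean(tf.convert_to_tensor(list(map(lambda x,y:x*y, weight, gram_loss))))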
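
For reference, the two weight schedules can be written with an exact common difference \(d\) or ratio \(q\). This is a dependency-free sketch with hypothetical helper names; it assumes 13 layers with the last three (conv5) set to zero, as in the Q3 conclusion. Note that `np.linspace(1, d*n, n)` has a step of `(d*n - 1)/(n - 1)`, which equals `d` only when `d = 1`.

```python
def arithmetic_weights(n, d, first=1.0):
    """first, first + d, first + 2d, ... (n terms)."""
    return [first + d * k for k in range(n)]


def geometric_weights(n, q, first=1.0):
    """first, first * q, first * q**2, ... (n terms)."""
    return [first * q ** k for k in range(n)]


def pad_with_zeros(weights, n_total):
    """Append zeros so that the list has one entry per layer."""
    return list(weights) + [0.0] * (n_total - len(weights))


N_LAYERS = 13  # convolution layers in VGG-16
N_ACTIVE = 10  # conv1_1 .. conv4_3; the conv5 layers get weight 0

arith = pad_with_zeros(arithmetic_weights(N_ACTIVE, d=128), N_LAYERS)
geom = pad_with_zeros(geometric_weights(N_ACTIVE, q=2), N_LAYERS)
print(arith)
print(geom)
```

Both lists have 13 entries and can be passed to the weighted mean in `get_gram_loss` unchanged.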

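q5-2. Results

Geometric sequence, increasing; \(q\) is the ratio between adjacent terms

[Image table. Columns: q = 2; q = 2.5; q = 3; q = 3.5.]

Arithmetic sequence, increasing; \(d\) is the difference between adjacent terms

[Image table. Columns: d = 1; d = 2; d = 4; d = 8; d = 16; d = 32; d = 64; d = 128.]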

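q5-3. Analysis

Compared with arithmetically increasing weights, the network performs better with geometrically increasing weights. Moreover, as q or d grows, the fidelity of the generated image also increases. Together these suggest a preliminary conclusion: widening the gap between layer weights (that is, lowering the weights of shallow layers and raising those of deep layers) effectively improves the fidelity of the generated texture. The larger the gap between layer weights, the better the result, and vice versa.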