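Texture synthesis is mainly used in computer graphics and related fields to simulate the surface detail of geometric models and to make rendered models look more realistic. Unlike traditional texture mapping, texture synthesis infers a generalized process from a sample texture and uses it to generate arbitrary new images with that texture, which effectively avoids problems such as texture seams and distortion.
Depending on the underlying principle, texture synthesis methods are usually divided into procedural texture synthesis (PTS) and texture synthesis from samples (TSFS). The differences are as follows.
TSFS: generates large areas of texture by analyzing the texture features of a given sample image. TSFS preserves the similarity and continuity of the texture while avoiding the tedious construction of a physical model that PTS requires. Traditional algorithms include feature matching, synthesis based on Markov random field models, and patch-based synthesis. The fastest-developing methods in recent years are based on deep learning, and the paper this assignment is built on, "Texture Synthesis Using Convolutional Neural Networks", belongs to this category.
The Gram matrix can be viewed as the uncentered covariance matrix between feature maps, that is, a covariance matrix without the mean subtracted. One way to read it: "In a feature map, each number comes from the convolution of a specific filter at a specific position, so each number represents the strength of one feature. What the Gram matrix computes is the correlation between every pair of features: which two features appear together, which two trade off against each other, and so on. The diagonal elements of the Gram matrix also reflect how much of each feature appears in the image." (Zhihu, 90後後生) In the figure below, the left-hand expression is the definition of the Gram matrix, which simply multiplies the transpose of the feature matrix by the matrix itself; the right-hand expression is …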
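The "uncentered covariance" reading can be checked numerically. The following is a minimal NumPy sketch, not the assignment's TensorFlow code: `gram_matrix` is a hypothetical helper that uses the same H*W*C normalisation as the assignment code, and the feature map is random toy data.

```python
import numpy as np

def gram_matrix(feature_map):
    """Gram matrix of an (H, W, C) feature map, normalised by H*W*C."""
    h, w, c = feature_map.shape
    flat = feature_map.reshape(h * w, c)   # one row per spatial position
    return flat.T @ flat / (h * w * c)     # shape (C, C)

rng = np.random.default_rng(0)
features = rng.normal(size=(4, 4, 3))      # toy feature map (assumed values)
gram = gram_matrix(features)

# Uncentered covariance: average outer product of the per-position
# feature vectors, with no mean subtracted.
flat = features.reshape(-1, 3)
uncentered_cov = flat.T @ flat / flat.shape[0]
```

With this normalisation, `gram` equals `uncentered_cov` divided by the number of channels, so the two differ only by a constant factor. The matrix is symmetric, and its diagonal holds the (scaled) mean squared activation of each channel.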
Use the features extracted from all 13 convolution layers, complete the baseline project with a loss function based on the Gram matrix, and run the training.
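    import tensorflow as tf

    # Compute the Gram matrix of a feature map with shape (N, H, W, C)
    def get_gram_matrix(feature_map):
        shape = feature_map.get_shape().as_list()
        # Flatten the spatial positions: one row per position, one column per channel
        re_shape = tf.reshape(feature_map, (-1, shape[3]))
        # (C, C) matrix of channel co-activations, normalised by H*W*C
        gram = tf.matmul(re_shape, re_shape, transpose_a=True) / (shape[1]*shape[2]*shape[3])
        return gram

    # L2 loss between the Gram matrices of one layer (the part to be completed)
    def get_l2_gram_loss_for_layer(noise, source, layer):
        source_feature = getattr(source, layer)
        noise_feature = getattr(noise, layer)
        Gram_s = get_gram_matrix(source_feature)
        Gram_n = get_gram_matrix(noise_feature)
        # tf.nn.l2_loss already returns sum(x**2) / 2; the result is halved once more
        loss = tf.nn.l2_loss((Gram_s-Gram_n))/2
        return loss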
Original | Generated |
To better understand how the texture model represents image information, choose another non-texture image (such as robot.jpg in the ./images folder) and rerun the training process.
    import numpy as np

    def get_gram_loss(noise, source):
        with tf.name_scope('get_gram_loss'):
            # Geometric alternative:
            # weight = np.logspace(0, len(GRAM_LAYERS)-1, len(GRAM_LAYERS), base=3.5)
            # Linearly increasing weights 1, 2, ..., len(GRAM_LAYERS)
            weight = np.linspace(1, len(GRAM_LAYERS), len(GRAM_LAYERS), endpoint=True)
            gram_loss = [get_l2_gram_loss_for_layer(noise, source, layer) for layer in GRAM_LAYERS]
            # Weight each layer's loss, then average over the layers
            return tf.reduce_mean(tf.convert_to_tensor(list(map(lambda x, y: x*y, weight, gram_loss))))
image | original | epoch=1000, weight=1,2,3,4,… | epoch=5000, weight=1,2,4,8,… |
red-peppers | |||
robot | |||
shibuya | |||
stone |
To reduce the parameter size, please use fewer layers for extracting features (based on which we compute the Gram matrix and loss) and explore a combination of layers with which we can still synthesize texture images with a high degree of naturalness.
    def get_gram_loss(noise, source):
        with tf.name_scope('get_gram_loss'):
            # One weight per conv layer, grouped by block (conv1 .. conv5).
            # Enable exactly one configuration; the all-ones baseline is active here.
            weight = [1,1, 1,1, 1,1,1, 1,1,1, 1,1,1]                       # keep all layers
            # weight = [0,0, 1,1, 1,1,1, 1,1,1, 1,1,1]                     # remove conv1
            # weight = [1,1, 0,0, 1,1,1, 1,1,1, 1,1,1]                     # remove conv2
            # weight = [1,1, 1,1, 0,0,0, 1,1,1, 1,1,1]                     # remove conv3
            # weight = [1,1, 1,1, 1,1,1, 0,0,0, 1,1,1]                     # remove conv4
            # weight = [1,1, 1,1, 1,1,1, 1,1,1, 0,0,0]                     # remove conv5
            # weight = [10,10, 20,20, 30,30,30, 40,40,40, 50,50,50]        # increasing with depth
            # weight = [50,50, 40,40, 30,30,30, 20,20,20, 10,10,10]        # decreasing with depth
            gram_loss = [get_l2_gram_loss_for_layer(noise, source, layer) for layer in GRAM_LAYERS]
            return tf.reduce_mean(tf.convert_to_tensor(list(map(lambda x, y: x*y, weight, gram_loss))))
All layers kept | conv1 removed | conv2 removed | conv3 removed | conv4 removed | conv5 removed |
weight ↗ | weight ↘ |
[10,10, 20,20, 30,30,30, 40,40,40, 50,50,50] | [50,50, 40,40, 30,30,30, 20,20,20, 10,10,10] |
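When removing different layers, comparing the results shows that the first block (conv1) is especially critical for extracting image features, whereas removing any one of conv2 to conv5 on its own has little effect on the result. I also tried weights that increase or decrease toward the deeper layers. The comparison shows that increasing weights produce better textures, which suggests that raising the influence of the deeper conv layers effectively improves output quality. On balance, a good choice is to drop the conv5 feature maps and raise the weights of the deeper layers.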
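The per-layer weight lists used in these experiments can also be generated from one weight per block, which makes a configuration such as "drop conv5, increase with depth" easier to write down. This is only a sketch: `BLOCK_SIZES` and `block_weights` are hypothetical names, and the 2, 2, 3, 3, 3 grouping is taken from the way the weight lists above are spaced.

```python
# Number of conv layers per block, conv1 .. conv5 (2 + 2 + 3 + 3 + 3 = 13).
BLOCK_SIZES = [2, 2, 3, 3, 3]

def block_weights(per_block_weight):
    """Expand one weight per block into one weight per conv layer."""
    if len(per_block_weight) != len(BLOCK_SIZES):
        raise ValueError("expected one weight per block")
    weights = []
    for size, w in zip(BLOCK_SIZES, per_block_weight):
        weights.extend([w] * size)   # repeat the block weight for each of its layers
    return weights

# Weights grow with depth, and conv5 is switched off with zeros.
drop_conv5_increasing = block_weights([10, 20, 30, 40, 0])
```

The resulting list has 13 entries and could be assigned to `weight` in `get_gram_loss`. The specific values 10 to 40 are illustrative, not a tuned result.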
We may use the Earth mover's distance between the features of source texture image and the generated image.
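EMD (Earth Mover's Distance) is a metric used in content-based image retrieval to measure the distance between two distributions. Intuitively, it is the optimal solution of the transportation problem in linear programming: the minimum cost that must be paid to transform one distribution into the other. It was first proposed by Peleg et al. for certain vision problems. Based on EMD, we can construct the following loss function.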
    def get_l2_emd_loss_for_layer(noise, source, layer):
        noise_feature = getattr(noise, layer)
        source_feature = getattr(source, layer)
        shape = noise_feature.get_shape().as_list()
        # Flatten to (H*W, C): one row per spatial position
        noise_re_shape = tf.reshape(noise_feature, (shape[1]*shape[2], shape[3]))
        source_re_shape = tf.reshape(source_feature, (shape[1]*shape[2], shape[3]))
        # tf.sort sorts along the last axis by default, i.e. across the channels
        # at each position; axis=0 would sort each channel over the positions instead.
        noise_sort = tf.sort(noise_re_shape, axis=-1, direction='ASCENDING')
        source_sort = tf.sort(source_re_shape, axis=-1, direction='ASCENDING')
        # Squared difference between the sorted activations
        return tf.reduce_sum(tf.math.square(noise_sort-source_sort))

    def get_emd_loss(noise, source):
        with tf.name_scope('get_emd_loss'):
            emd_loss = [get_l2_emd_loss_for_layer(noise, source, layer) for layer in GRAM_LAYERS]
            return tf.reduce_mean(tf.convert_to_tensor(emd_loss))
The loss had not fully converged at this point; the output shown is from [e:3700 loss: 2575.86865].
Original | Generated |
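The sorting step in the EMD loss rests on a one-dimensional fact: for two samples of equal size with uniform weights, the optimal transport plan pairs the i-th smallest value of one sample with the i-th smallest value of the other. The sketch below illustrates this with NumPy. `emd_1d` is a hypothetical helper, and it uses absolute differences (the classic EMD), whereas the loss above sums squared differences.

```python
import numpy as np

def emd_1d(a, b):
    """1-D earth mover's distance between two equal-size, uniformly weighted samples."""
    a = np.sort(np.asarray(a, dtype=float))
    b = np.sort(np.asarray(b, dtype=float))
    if a.shape != b.shape:
        raise ValueError("samples must have the same size")
    # Pair the sorted values and average the distance each unit of mass travels
    return np.mean(np.abs(a - b))

x = [0.0, 1.0, 3.0]
y = [5.0, 6.0, 8.0]
print(emd_1d(x, y))   # y is x shifted by 5, so every unit of mass moves by 5
```

Because the samples are sorted first, the order in which the values are listed does not matter, which is what makes the measure a distance between distributions rather than between aligned vectors.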
Use the configuration in Q3 as baseline. Change the weighting factor of each layer and rerun the training process.
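    def get_gram_loss(noise, source):
        with tf.name_scope('get_gram_loss'):
            # Geometric alternative:
            # weight = np.logspace(0, len(GRAM_LAYERS)-4, len(GRAM_LAYERS)-3, base=2)
            weight = np.linspace(1, 128*(len(GRAM_LAYERS)-3), len(GRAM_LAYERS)-3, endpoint=True)
            # Append zero weights for the last three layers (conv5). np.concatenate is
            # needed here: adding a 3-element list to the array would attempt
            # element-wise broadcasting and fail instead of extending it.
            weight = np.concatenate([weight, [0, 0, 0]])
            gram_loss = [get_l2_gram_loss_for_layer(noise, source, layer) for layer in GRAM_LAYERS]
            return tf.reduce_mean(tf.convert_to_tensor(list(map(lambda x, y: x*y, weight, gram_loss))))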
q = 2 | q = 2.5 | q = 3 | q = 3.5 |
d = 1 | d = 2 | d = 4 | d = 8 |
d = 16 | d = 32 | d = 64 | d = 128 |
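The two weighting schemes compared above can be produced by one small helper. This is a sketch under the assumption that q denotes the common ratio of a geometric progression and d the common difference of an arithmetic one, which is how the table labels read; `progression_weights` is a hypothetical name and not part of the assignment code.

```python
import numpy as np

def progression_weights(n_active, n_dropped, d=None, q=None):
    """Weights for n_active layers followed by n_dropped zero weights.

    Pass d for an arithmetic progression 1, 1+d, 1+2d, ...
    or q for a geometric progression 1, q, q**2, ...
    """
    if (d is None) == (q is None):
        raise ValueError("pass exactly one of d or q")
    k = np.arange(n_active)
    active = 1 + d * k if d is not None else float(q) ** k
    # Zero weights switch off the dropped layers (conv5 in the Q3 baseline)
    return np.concatenate([active, np.zeros(n_dropped)])

geometric = progression_weights(10, 3, q=2)    # 1, 2, 4, ..., 512, then three zeros
arithmetic = progression_weights(10, 3, d=2)   # 1, 3, 5, ..., 19, then three zeros
```

Either array has 13 entries and can replace `weight` in `get_gram_loss`. Geometric weights grow much faster with depth than arithmetic ones, so the deepest active layers dominate the loss for larger q.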