This article introduces pix2pix, a CGAN-based model that can be used for a wide range of paired image-to-image translation tasks.
Paired image translation covers many application scenarios in which the input and the output are both images of the same size.
pix2pix provides a general technical framework for completing all kinds of paired image translation tasks.
The authors also provide an online demo, including the once wildly popular edge2cat: https://affinelayer.com/pixsrv/
The principle of pix2pix is as follows: it is a typical CGAN structure, except that G takes only a fixed input X, which can be understood as the condition C (so no random noise is needed), and outputs the translated version Y.
D takes an X (the C in a CGAN) and a Y (a real or fake sample) and judges whether X and Y form a correct translation pair.
In addition to the standard GAN loss, pix2pix also uses the L1 distance between the generated sample and the real sample as a loss term:
$$ L_{L_1}(G)=\mathbb{E}_{x\sim p_x,\,y\sim p_y}\left[\left\| y-G(x) \right\|_1\right] $$
The GAN loss is responsible for capturing the high-frequency detail of the image, while the L1 loss captures the low-frequency structure, so the generated results are both realistic and sharp.
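Putting the two terms together, the full pix2pix objective weights the L1 term with a factor $\lambda$ (the LAMBDA = 100 used in the code below):

$$ G^{*}=\arg\min_G\max_D L_{cGAN}(G,D)+\lambda L_{L_1}(G) $$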
The generator G is implemented as a U-Net, relying mainly on skip connections to learn the mapping between paired images.
The discriminator D uses the PatchGAN idea: instead of producing a single score for the whole image, PatchGAN divides the image into many patches and gives each patch its own score.
The code is adapted from the following project, which provides many convenient, ready-to-use features: https://github.com/affinelayer/pix2pix-tensorflow
The datasets can be downloaded from https://people.eecs.berkeley.edu/~tinghuiz/projects/pix2pix/datasets/, which hosts five datasets: buildings (facades), street scenes, maps, shoes, and bags.
Taking the facades (buildings) data as an example, train, val, and test contain 400, 100, and 106 images respectively. Each image consists of two halves, corresponding to the versions before and after translation.
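If you prefer to fetch the data from a script rather than the browser, a minimal sketch along the following lines should work, assuming the archive is published as facades.tar.gz under the URL above and unpacks into facades/train, facades/val, and facades/test (the folder is then renamed to data/ to match the paths used below):

import os
import tarfile
import urllib.request

URL = 'https://people.eecs.berkeley.edu/~tinghuiz/projects/pix2pix/datasets/facades.tar.gz'
ARCHIVE = 'facades.tar.gz'

if not os.path.exists(ARCHIVE):
    urllib.request.urlretrieve(URL, ARCHIVE)   # download the archive (assumed file name)
with tarfile.open(ARCHIVE) as tar:
    tar.extractall('.')                        # expected to unpack into facades/{train,val,test}
if not os.path.exists('data'):
    os.rename('facades', 'data')               # match the data/train and data/val paths used below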
Load the libraries
# -*- coding: utf-8 -*-

import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
from imageio import imread, imsave, mimsave
import glob
import os
from tqdm import tqdm
Load the images. Both train and val are used, 500 images in total.
images = glob.glob('data/train/*.jpg') + glob.glob('data/val/*.jpg')
print(len(images))
Organize the data by splitting each image into X and Y. B2A means translating from right to left.
X_all = []
Y_all = []
WIDTH = 256
HEIGHT = 256
for image in images:
    img = imread(image)
    img = (img / 255. - 0.5) * 2
    # B2A
    X_all.append(img[:, WIDTH:, :])
    Y_all.append(img[:, :WIDTH, :])
X_all = np.array(X_all)
Y_all = np.array(Y_all)
print(X_all.shape, Y_all.shape)
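Optionally, it is worth displaying one pair to confirm the halves were split in the intended direction; a minimal check that shows the first input/target pair side by side:

# visualize the first (input, target) pair to verify the B2A split direction
fig, axes = plt.subplots(1, 2, figsize=(8, 4))
axes[0].imshow((X_all[0] + 1) / 2)   # network input X (right half of the combined image)
axes[0].set_title('X (input)')
axes[0].axis('off')
axes[1].imshow((Y_all[0] + 1) / 2)   # translation target Y (left half)
axes[1].set_title('Y (target)')
axes[1].axis('off')
plt.show()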
Define some constants, the network tensors, and a few helper functions. Here batch_size is set to 1, so each training step translates a single image pair.
batch_size = 1
LAMBDA = 100

OUTPUT_DIR = 'samples'
if not os.path.exists(OUTPUT_DIR):
    os.mkdir(OUTPUT_DIR)

X = tf.placeholder(dtype=tf.float32, shape=[None, HEIGHT, WIDTH, 3], name='X')
Y = tf.placeholder(dtype=tf.float32, shape=[None, HEIGHT, WIDTH, 3], name='Y')
k_initializer = tf.random_normal_initializer(0, 0.02)
g_initializer = tf.random_normal_initializer(1, 0.02)

def lrelu(x, leak=0.2):
    return tf.maximum(x, leak * x)

def d_conv(inputs, filters, strides):
    padded = tf.pad(inputs, [[0, 0], [1, 1], [1, 1], [0, 0]], mode='CONSTANT')
    return tf.layers.conv2d(padded, kernel_size=4, filters=filters, strides=strides, padding='valid', kernel_initializer=k_initializer)

def g_conv(inputs, filters):
    return tf.layers.conv2d(inputs, kernel_size=4, filters=filters, strides=2, padding='same', kernel_initializer=k_initializer)

def g_deconv(inputs, filters):
    return tf.layers.conv2d_transpose(inputs, kernel_size=4, filters=filters, strides=2, padding='same', kernel_initializer=k_initializer)

def batch_norm(inputs):
    return tf.layers.batch_normalization(inputs, axis=3, epsilon=1e-5, momentum=0.1, training=True, gamma_initializer=g_initializer)

def sigmoid_cross_entropy_with_logits(x, y):
    return tf.nn.sigmoid_cross_entropy_with_logits(logits=x, labels=y)
The discriminator concatenates X and Y along the channel axis and, after several convolutions, produces a 30*30*1 score map. This is the PatchGAN idea, in contrast to the earlier approach of a dense layer with a single output neuron scoring the whole image.
def discriminator(x, y, reuse=None):
    with tf.variable_scope('discriminator', reuse=reuse):
        x = tf.concat([x, y], axis=3)
        h0 = lrelu(d_conv(x, 64, 2))   # 128 128 64

        h0 = d_conv(h0, 128, 2)
        h0 = lrelu(batch_norm(h0))     # 64 64 128

        h0 = d_conv(h0, 256, 2)
        h0 = lrelu(batch_norm(h0))     # 32 32 256

        h0 = d_conv(h0, 512, 1)
        h0 = lrelu(batch_norm(h0))     # 31 31 512

        h0 = d_conv(h0, 1, 1)          # 30 30 1
        # return the raw patch logits; the sigmoid is applied inside the
        # cross-entropy loss, so adding tf.nn.sigmoid here would squash the logits twice
        return h0
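As a quick check on the 30*30 output claimed above: each d_conv pads the input by 1 on every side and uses a 4*4 kernel, so the output size is (in + 2 - 4) // stride + 1. A small sketch tracing the five layers:

# trace the discriminator's spatial size: 256 -> 128 -> 64 -> 32 -> 31 -> 30
size = 256
for stride in [2, 2, 2, 1, 1]:
    size = (size + 2 - 4) // stride + 1
    print(size)   # prints 128, 64, 32, 31, 30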
The generator is a U-Net whose encoder and decoder each contain 8 convolutional layers, and the first three decoder layers use dropout. During training, dropout randomly zeroes out some activations with a given probability, which helps prevent overfitting.
def generator(x):
    with tf.variable_scope('generator', reuse=None):
        layers = []
        h0 = g_conv(x, 64)
        layers.append(h0)

        for filters in [128, 256, 512, 512, 512, 512, 512]:
            h0 = lrelu(layers[-1])
            h0 = g_conv(h0, filters)
            h0 = batch_norm(h0)
            layers.append(h0)

        encode_layers_num = len(layers)  # 8

        for i, filters in enumerate([512, 512, 512, 512, 256, 128, 64]):
            skip_layer = encode_layers_num - i - 1
            if i == 0:
                inputs = layers[-1]
            else:
                inputs = tf.concat([layers[-1], layers[skip_layer]], axis=3)
            h0 = tf.nn.relu(inputs)
            h0 = g_deconv(h0, filters)
            h0 = batch_norm(h0)
            if i < 3:
                h0 = tf.nn.dropout(h0, keep_prob=0.5)
            layers.append(h0)

        inputs = tf.concat([layers[-1], layers[0]], axis=3)
        h0 = tf.nn.relu(inputs)
        h0 = g_deconv(h0, 3)
        h0 = tf.nn.tanh(h0, name='g')
        return h0
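For some intuition on why the skip connections matter: the eight stride-2 convolutions of the encoder shrink the 256*256 input all the way down to a 1*1 bottleneck, so without skip connections every spatial detail would have to pass through that single vector. A quick check of the encoder sizes:

# encoder spatial sizes: 256 is halved eight times down to a 1x1 bottleneck
sizes = [256]
for _ in range(8):
    sizes.append(sizes[-1] // 2)
print(sizes)   # [256, 128, 64, 32, 16, 8, 4, 2, 1]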
Define the loss functions; the generator loss additionally includes the L1 term.
g = generator(X)
d_real = discriminator(X, Y)
d_fake = discriminator(X, g, reuse=True)

vars_g = [var for var in tf.trainable_variables() if var.name.startswith('generator')]
vars_d = [var for var in tf.trainable_variables() if var.name.startswith('discriminator')]

loss_d_real = tf.reduce_mean(sigmoid_cross_entropy_with_logits(d_real, tf.ones_like(d_real)))
loss_d_fake = tf.reduce_mean(sigmoid_cross_entropy_with_logits(d_fake, tf.zeros_like(d_fake)))
loss_d = loss_d_real + loss_d_fake

loss_g_gan = tf.reduce_mean(sigmoid_cross_entropy_with_logits(d_fake, tf.ones_like(d_fake)))
loss_g_l1 = tf.reduce_mean(tf.abs(Y - g))
loss_g = loss_g_gan + loss_g_l1 * LAMBDA
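Optionally, the static shapes can be printed to confirm the graph is wired as intended (the values assume the 256*256 inputs defined above):

# generator output matches the image size, discriminator output is the 30x30 patch map
print(g.shape)       # (?, 256, 256, 3)
print(d_real.shape)  # (?, 30, 30, 1)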
Define the optimizers
update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
with tf.control_dependencies(update_ops):
    optimizer_d = tf.train.AdamOptimizer(learning_rate=0.0002, beta1=0.5).minimize(loss_d, var_list=vars_d)
    optimizer_g = tf.train.AdamOptimizer(learning_rate=0.0002, beta1=0.5).minimize(loss_g, var_list=vars_g)
Train the model
sess = tf.Session()
sess.run(tf.global_variables_initializer())
loss = {'d': [], 'g': []}

for i in tqdm(range(100000)):
    k = i % X_all.shape[0]
    X_batch, Y_batch = X_all[k:k + batch_size, :, :, :], Y_all[k:k + batch_size, :, :, :]

    _, d_ls = sess.run([optimizer_d, loss_d], feed_dict={X: X_batch, Y: Y_batch})
    _, g_ls = sess.run([optimizer_g, loss_g], feed_dict={X: X_batch, Y: Y_batch})

    loss['d'].append(d_ls)
    loss['g'].append(g_ls)

    if i % 1000 == 0:
        print(i, d_ls, g_ls)
        gen_imgs = sess.run(g, feed_dict={X: X_batch})
        result = np.zeros([HEIGHT, WIDTH * 3, 3])
        result[:, :WIDTH, :] = (X_batch[0] + 1) / 2
        result[:, WIDTH: 2 * WIDTH, :] = (Y_batch[0] + 1) / 2
        result[:, 2 * WIDTH:, :] = (gen_imgs[0] + 1) / 2
        plt.axis('off')
        plt.imshow(result)
        imsave(os.path.join(OUTPUT_DIR, 'sample_%d.jpg' % i), result)
        plt.show()

plt.plot(loss['d'], label='Discriminator')
plt.plot(loss['g'], label='Generator')
plt.legend(loc='upper right')
plt.savefig('Loss.png')
plt.show()
The results are shown below; from left to right, the three images are the input image, the ground-truth image, and the generated image.
Save the model so it can be used on a local machine
saver = tf.train.Saver()
saver.save(sess, './pix2pix_diy', global_step=100000)
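If the save succeeded, the working directory should contain the usual TensorFlow checkpoint files for this step (a .meta graph definition plus .index and .data shards, and a checkpoint file); a quick way to verify:

# list the files written by the saver for step 100000
print(glob.glob('./pix2pix_diy-100000*'))
# expect the .meta, .index and .data-00000-of-00001 files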
Load the model on a local machine and translate the images in val
# -*- coding: utf-8 -*-

import tensorflow as tf
import numpy as np
from imageio import imread, imsave
import glob

images = glob.glob('data/val/*.jpg')
X_all = []
Y_all = []
WIDTH = 256
HEIGHT = 256
N = 10
images = np.random.choice(images, N, replace=False)
for image in images:
    img = imread(image)
    img = (img / 255. - 0.5) * 2
    # B2A
    X_all.append(img[:, WIDTH:, :])
    Y_all.append(img[:, :WIDTH, :])
X_all = np.array(X_all)
Y_all = np.array(Y_all)
print(X_all.shape, Y_all.shape)

sess = tf.Session()
sess.run(tf.global_variables_initializer())

saver = tf.train.import_meta_graph('./pix2pix_diy-100000.meta')
saver.restore(sess, tf.train.latest_checkpoint('./'))

graph = tf.get_default_graph()
g = graph.get_tensor_by_name('generator/g:0')
X = graph.get_tensor_by_name('X:0')

gen_imgs = sess.run(g, feed_dict={X: X_all})
result = np.zeros([N * HEIGHT, WIDTH * 3, 3])
for i in range(N):
    result[i * HEIGHT: i * HEIGHT + HEIGHT, :WIDTH, :] = (X_all[i] + 1) / 2
    result[i * HEIGHT: i * HEIGHT + HEIGHT, WIDTH: 2 * WIDTH, :] = (Y_all[i] + 1) / 2
    result[i * HEIGHT: i * HEIGHT + HEIGHT, 2 * WIDTH:, :] = (gen_imgs[i] + 1) / 2
imsave('facades翻譯結果.jpg', result)
Now let's look at the ready-made tooling the project provides: https://github.com/affinelayer/pix2pix-tensorflow
Resize the images to 256*256, where input_dir is the directory of the original images and output_dir is the directory for the uniformly resized images:
python tools/process.py --input_dir input_dir --operation resize --output_dir output_dir
Prepare the paired X and Y data (two folders holding X and Y respectively, with matching image names and sizes), then combine each pair side by side as in the facades dataset:
python tools/process.py --input_dir X_dir --b_dir Y_dir --operation combine --output_dir combine_dir
Once combine_dir is ready, you can train the paired pix2pix translation model:
python pix2pix.py --mode train --output_dir model_dir --max_epochs 200 --input_dir combine_dir --which_direction AtoB
mode: run mode; train means training a model
output_dir: directory where the model is written
max_epochs: number of training epochs (note the difference between an epoch and an iteration)
input_dir: directory of the combined images
which_direction: direction of the translation, left to right (AtoB) or right to left (BtoA)

Both while the model is training and after training has finished, you can use TensorBoard to inspect the training details:
tensorboard --logdir=model_dir
After training, run translation on the test data:
python pix2pix.py --mode test --output_dir output_dir --input_dir input_dir --checkpoint model_dir
mode: run mode; test means testing
output_dir: directory for the translation results
input_dir: directory of the images to translate
checkpoint: path to the model trained earlier

If you want to train a colorization model, the image-combining step described above is not needed; just provide a folder of color images, since the corresponding grayscale images can be extracted from the color images automatically:
python pix2pix.py --mode train --output_dir model_dir --max_epochs 200 --input_dir color_dir --lab_colorization
The project also provides some pre-trained paired image translation models.
Here we use the following dataset, http://lear.inrialpes.fr/~jegou/data.php, a collection of travel and scenery photos already resized to 256*256, split into train and test with 750 and 62 images respectively.
Train a colorization model with the images in train:
python pix2pix.py --mode train --output_dir photos/model --max_epochs 200 --input_dir photos/data/train --lab_colorization
Test with the images in test; for each color image the model generates the corresponding grayscale image and a colorized image, and writes all colorization results to a web page:
python pix2pix.py --mode test --output_dir photos/test --input_dir photos/data/test --checkpoint photos/model
The colorization results are shown below; from left to right: the grayscale image, the colorized image, and the original image.