Header image from: Experiments with style transfer
This post has been a long time coming. Over the past couple of years, apps for artistic-style image processing have appeared one after another, such as the once-popular Prisma.
This article briefly introduces and implements the style-transfer algorithm; for more background, see the previously translated articles (image stylization; AI composition, machine learning and art).
Note that practical applications may require further refinement, e.g. frame-to-frame stability when processing video.
01 - Simple Linear Model | 02 - Convolutional Neural Network | 03 - PrettyTensor | 04 - Save & Restore
05 - Ensemble Learning | 06 - CIFAR-10 | 07 - Inception Model | 08 - Transfer Learning
09 - Video Data | 11 - Adversarial Examples | 12 - Adversarial Noise for MNIST | 13 - Visual Analysis
14 - DeepDream
by Magnus Erik Hvass Pedersen / GitHub / Videos on YouTube
Chinese translation by thrillerist / GitHub
If you reproduce this article, please include a link to it.
In the previous Tutorial #14 we saw how to maximize the feature activations inside a neural network so as to amplify patterns in the input image. This is called DeepDream.
This tutorial uses a similar idea, but takes two input images: a content-image and a style-image. We then wish to create a mixed-image which has the contours of the content-image and the texture of the style-image.
This builds on the previous tutorials. You should be roughly familiar with neural networks (see Tutorials #01 and #02), and it is also helpful to be familiar with DeepDream in Tutorial #14.
This flowchart shows roughly the idea of the Style-Transfer algorithm, although we use the VGG-16 model, which has many more layers than shown here.
Two images are input to the neural network: a content-image and a style-image. We wish to create the mixed-image, which has the contours of the content-image and the texture of the style-image.
We do this by creating several loss-functions that can be optimized.
The loss-function for the content-image tries to minimize the difference between the feature activations for the content-image and the mixed-image, at one or more layers in the network. This causes the contours of the mixed-image to resemble those of the content-image.
The loss-function for the style-image is slightly more complicated, because it instead tries to minimize the difference between the so-called Gram-matrices of the style-image and the mixed-image. This is done at one or more layers in the network. The Gram-matrix measures which features are activated simultaneously in a given layer. Changing the mixed-image so that it mimics the activation patterns (activation patterns) of the style-image causes the colours and textures to be transferred.
We use TensorFlow to automatically derive the gradients of these loss-functions. The gradients are then used to update the mixed-image. This procedure is repeated a number of times until we are satisfied with the resulting image.
Some details of the Style-Transfer algorithm are not shown in this flowchart, e.g. the calculation of the Gram-matrices, the calculation and storage of intermediate values for efficiency, a loss-function for denoising the mixed-image, and the normalization (normalization) of the loss-functions so they are easier to scale relative to each other.
from IPython.display import Image, display
Image('images/15_style_transfer_flowchart.png')
%matplotlib inline
import matplotlib.pyplot as plt
import tensorflow as tf
import numpy as np
import PIL.Image
This was developed with Python 3.5.2 (Anaconda) and TensorFlow version:
tf.__version__
'0.11.0rc0'
I spent two days trying to implement the Style-Transfer algorithm using the Inception 5h model that we used for DeepDream in Tutorial #14, but I could not get images that looked good enough. This is a bit strange, because the images generated in Tutorial #14 looked quite good. But recall that we also used a few tricks there to achieve that quality, such as smoothing the gradient and recursively downscaling and processing the image.
The original paper used the VGG-19 convolutional neural network, but for some reason the pre-trained VGG-19 model was not stable enough in TensorFlow for this tutorial. So we use the VGG-16 model instead, which someone else has made and which can easily be obtained and loaded in TensorFlow. For convenience it has been wrapped in a class.
import vgg16
The VGG-16 model is downloaded from the internet. This is the default directory where the data-files are saved. The directory is created if it does not exist.
# vgg16.data_dir = 'vgg16/'
Download the data for the VGG-16 model if it doesn't already exist in the directory.
WARNING: It is 550 MB!
vgg16.maybe_download()
Downloading VGG16 Model ...
Data has apparently already been downloaded and unpacked.
This function loads an image and returns it as a numpy array of floating-points. The image can be automatically resized so the largest of the height or width equals max_size.
def load_image(filename, max_size=None):
    image = PIL.Image.open(filename)

    if max_size is not None:
        # Calculate the appropriate rescale-factor for
        # ensuring a max height and width, while keeping
        # the proportion between them.
        factor = max_size / np.max(image.size)

        # Scale the image's height and width.
        size = np.array(image.size) * factor

        # The size is now floating-point because it was scaled.
        # But PIL requires the size to be integers.
        size = size.astype(int)

        # Resize the image.
        image = image.resize(size, PIL.Image.LANCZOS)

    # Convert to numpy floating-point array.
    return np.float32(image)
This function saves an image as a jpeg-file. The image is given as a numpy array with pixel-values between 0 and 255.
def save_image(image, filename):
    # Ensure the pixel-values are between 0 and 255.
    image = np.clip(image, 0.0, 255.0)

    # Convert to bytes.
    image = image.astype(np.uint8)

    # Write the image-file in jpeg-format.
    with open(filename, 'wb') as file:
        PIL.Image.fromarray(image).save(file, 'jpeg')
This function plots a large image. The image is given as a numpy array with pixel-values between 0 and 255.
def plot_image_big(image):
    # Ensure the pixel-values are between 0 and 255.
    image = np.clip(image, 0.0, 255.0)

    # Convert pixels to bytes.
    image = image.astype(np.uint8)

    # Convert to a PIL-image and display it.
    display(PIL.Image.fromarray(image))
This function plots the content-, mixed- and style-images.
def plot_images(content_image, style_image, mixed_image):
    # Create figure with sub-plots.
    fig, axes = plt.subplots(1, 3, figsize=(10, 10))

    # Adjust vertical spacing.
    fig.subplots_adjust(hspace=0.1, wspace=0.1)

    # Use interpolation to smooth pixels?
    smooth = True

    # Interpolation type.
    if smooth:
        interpolation = 'sinc'
    else:
        interpolation = 'nearest'

    # Plot the content-image.
    # Note that the pixel-values are normalized to
    # the [0.0, 1.0] range by dividing with 255.
    ax = axes.flat[0]
    ax.imshow(content_image / 255.0, interpolation=interpolation)
    ax.set_xlabel("Content")

    # Plot the mixed-image.
    ax = axes.flat[1]
    ax.imshow(mixed_image / 255.0, interpolation=interpolation)
    ax.set_xlabel("Mixed")

    # Plot the style-image.
    ax = axes.flat[2]
    ax.imshow(style_image / 255.0, interpolation=interpolation)
    ax.set_xlabel("Style")

    # Remove ticks from all the plots.
    for ax in axes.flat:
        ax.set_xticks([])
        ax.set_yticks([])

    # Ensure the plot is shown correctly with multiple plots
    # in a single Notebook cell.
    plt.show()
These helper-functions create the loss-functions that are used in optimization with TensorFlow.
This function creates a TensorFlow operation for calculating the Mean Squared Error between the two input tensors.
def mean_squared_error(a, b):
    return tf.reduce_mean(tf.square(a - b))
This function creates the loss-function for the content-image. It is the Mean Squared Error of the feature activations for the content-image and the mixed-image, at the given layers in the model. When the content-loss is minimized, it means that the mixed-image has feature activations at the given layers that are very similar to those of the content-image. Depending on which layers you select, this should transfer the contours of the content-image to the mixed-image.
def create_content_loss(session, model, content_image, layer_ids):
    """
    Create the loss-function for the content-image.

    Parameters:
    session: An open TensorFlow session for running the model's graph.
    model: The model, e.g. an instance of the VGG16-class.
    content_image: Numpy float array with the content-image.
    layer_ids: List of integer id's for the layers to use in the model.
    """

    # Create a feed-dict with the content-image.
    feed_dict = model.create_feed_dict(image=content_image)

    # Get references to the tensors for the given layers.
    layers = model.get_layer_tensors(layer_ids)

    # Calculate the output values of those layers when
    # feeding the content-image to the model.
    values = session.run(layers, feed_dict=feed_dict)

    # Set the model's graph as the default so we can add
    # computational nodes to it. It is not always clear
    # when this is necessary in TensorFlow, but if you
    # want to re-use this code then it may be necessary.
    with model.graph.as_default():
        # Initialize an empty list of loss-functions.
        layer_losses = []

        # For each layer and its corresponding values
        # for the content-image.
        for value, layer in zip(values, layers):
            # These are the values that are calculated
            # for this layer in the model when inputting
            # the content-image. Wrap it to ensure it
            # is a const - although this may be done
            # automatically by TensorFlow.
            value_const = tf.constant(value)

            # The loss-function for this layer is the
            # Mean Squared Error between the layer-values
            # when inputting the content- and mixed-images.
            # Note that the mixed-image is not calculated
            # yet, we are merely creating the operations
            # for calculating the MSE between those two.
            loss = mean_squared_error(layer, value_const)

            # Add the loss-function for this layer to the
            # list of loss-functions.
            layer_losses.append(loss)

        # The combined loss for all layers is just the average.
        # The loss-functions could be weighted differently for
        # each layer. You can try it and see what happens.
        total_loss = tf.reduce_mean(layer_losses)

    return total_loss
We will do something similar for the style-layers, but now we want to measure which features in the style-layers activate simultaneously for the style-image, and then copy this activation-pattern to the mixed-image.
One way of doing this is to calculate a so-called Gram-matrix for the output tensors of the style-layers. The Gram-matrix is essentially a matrix of dot-products for the vectors of the feature activations of a style-layer.
If an entry in the Gram-matrix has a value close to zero, it means the two features in the given layer do not activate simultaneously for the style-image. And vice versa, if an entry in the Gram-matrix has a large value, it means the two features do activate simultaneously. We will then try to create a mixed-image that replicates this activation-pattern of the style-image.
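To build some intuition before the TensorFlow implementation below, here is a small NumPy sketch (my own illustration, not part of the tutorial's code) with made-up activations for a layer with 4 spatial positions and 3 feature-channels:

import numpy as np

# Hypothetical layer activations: 4 spatial positions x 3 feature-channels.
# Channels 0 and 2 always fire together; channel 1 fires on its own.
activations = np.array([[1.0, 0.0, 2.0],
                        [2.0, 0.0, 4.0],
                        [0.0, 3.0, 0.0],
                        [0.0, 1.0, 0.0]])

# The Gram-matrix is the matrix of dot-products between feature-channels.
gram = activations.T @ activations

print(gram)
# [[ 5.  0. 10.]
#  [ 0. 10.  0.]
#  [10.  0. 20.]]

The entry for channels 0 and 2 is large because they activate at the same positions, while the entries involving channel 1 are zero because it never activates together with the others. The style-loss drives these co-activation statistics of the mixed-image towards those of the style-image.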
This is the helper-function for calculating the Gram-matrix of a tensor output by a convolutional layer in the neural network. The actual loss-function is created further below.
def gram_matrix(tensor):
    shape = tensor.get_shape()

    # Get the number of feature channels for the input tensor,
    # which is assumed to be from a convolutional layer with 4-dim.
    num_channels = int(shape[3])

    # Reshape the tensor so it is a 2-dim matrix. This essentially
    # flattens the contents of each feature-channel.
    matrix = tf.reshape(tensor, shape=[-1, num_channels])

    # Calculate the Gram-matrix as the matrix-product of
    # the 2-dim matrix with itself. This calculates the
    # dot-products of all combinations of the feature-channels.
    gram = tf.matmul(tf.transpose(matrix), matrix)

    return gram
The next function creates the loss-function for the style-image. It is quite similar to create_content_loss() above, except that we calculate the Mean Squared Error for the Gram-matrices instead of the raw outputs of the layers.
def create_style_loss(session, model, style_image, layer_ids):
    """
    Create the loss-function for the style-image.

    Parameters:
    session: An open TensorFlow session for running the model's graph.
    model: The model, e.g. an instance of the VGG16-class.
    style_image: Numpy float array with the style-image.
    layer_ids: List of integer id's for the layers to use in the model.
    """

    # Create a feed-dict with the style-image.
    feed_dict = model.create_feed_dict(image=style_image)

    # Get references to the tensors for the given layers.
    layers = model.get_layer_tensors(layer_ids)

    # Set the model's graph as the default so we can add
    # computational nodes to it. It is not always clear
    # when this is necessary in TensorFlow, but if you
    # want to re-use this code then it may be necessary.
    with model.graph.as_default():
        # Construct the TensorFlow-operations for calculating
        # the Gram-matrices for each of the layers.
        gram_layers = [gram_matrix(layer) for layer in layers]

        # Calculate the values of those Gram-matrices when
        # feeding the style-image to the model.
        values = session.run(gram_layers, feed_dict=feed_dict)

        # Initialize an empty list of loss-functions.
        layer_losses = []

        # For each Gram-matrix layer and its corresponding values.
        for value, gram_layer in zip(values, gram_layers):
            # These are the Gram-matrix values that are calculated
            # for this layer in the model when inputting the
            # style-image. Wrap it to ensure it is a const,
            # although this may be done automatically by TensorFlow.
            value_const = tf.constant(value)

            # The loss-function for this layer is the
            # Mean Squared Error between the Gram-matrix values
            # for the style- and mixed-images.
            # Note that the mixed-image is not calculated
            # yet, we are merely creating the operations
            # for calculating the MSE between those two.
            loss = mean_squared_error(gram_layer, value_const)

            # Add the loss-function for this layer to the
            # list of loss-functions.
            layer_losses.append(loss)

        # The combined loss for all layers is just the average.
        # The loss-functions could be weighted differently for
        # each layer. You can try it and see what happens.
        total_loss = tf.reduce_mean(layer_losses)

    return total_loss
This creates the loss-function for denoising the mixed-image. The algorithm is called Total Variation Denoising and essentially just shifts the image one pixel in the x- and y-axis, calculates the difference from the original image, takes the absolute value to ensure the difference is a positive value, and sums over all the pixels in the image. This creates a loss-function that can be minimized, so as to suppress some of the noise in the image.
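As a quick sanity-check of what this loss measures, here is a tiny NumPy example (my own illustration, not part of the tutorial) on a hypothetical 2x2 image with a single noisy pixel; the TensorFlow version used by the algorithm follows.

import numpy as np

# A constant image would give a loss of zero; any abrupt
# pixel-to-pixel change increases the loss.
img = np.array([[0.0, 10.0],
                [0.0,  0.0]])

tv_loss = (np.abs(img[1:, :] - img[:-1, :]).sum() +
           np.abs(img[:, 1:] - img[:, :-1]).sum())

print(tv_loss)  # 10 (vertical) + 10 (horizontal) = 20.0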
def create_denoise_loss(model):
    loss = tf.reduce_sum(tf.abs(model.input[:,1:,:,:] - model.input[:,:-1,:,:])) + \
           tf.reduce_sum(tf.abs(model.input[:,:,1:,:] - model.input[:,:,:-1,:]))

    return loss
This is the main optimization algorithm for the Style-Transfer algorithm. It is basically just gradient descent on the loss-functions defined above.
This algorithm also uses normalization of the loss-functions. This appears to be a novel idea not previously published. In each iteration of the optimization, the loss-values are adjusted so each of them equals one. This allows the user to set the loss-weights independently of the chosen style- and content-layers. It also adapts the weighting during the optimization, to ensure the desired ratio between style, content and denoising is preserved.
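To see the normalization idea in isolation before the full function below, here is a minimal sketch; the loss magnitudes are made up, but the reciprocal-adjustment scheme and the epsilon match the code that follows:

# Hypothetical raw loss magnitudes; in practice they can differ by
# many orders of magnitude, which makes fixed weights hard to choose.
loss_content, loss_style, loss_denoise = 3.2e9, 7.5e24, 8.1e5

# Adjustment values: reciprocals of the losses, with a small
# epsilon to avoid division by zero.
adj_content = 1.0 / (loss_content + 1e-10)
adj_style   = 1.0 / (loss_style + 1e-10)
adj_denoise = 1.0 / (loss_denoise + 1e-10)

# Each adjusted loss is now ~1, so these weights directly control
# the relative importance of content, style and denoising.
weight_content, weight_style, weight_denoise = 1.5, 10.0, 0.3
loss_combined = (weight_content * adj_content * loss_content +
                 weight_style * adj_style * loss_style +
                 weight_denoise * adj_denoise * loss_denoise)

print(loss_combined)  # ~11.8 = 1.5 + 10.0 + 0.3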
def style_transfer(content_image, style_image,
                   content_layer_ids, style_layer_ids,
                   weight_content=1.5, weight_style=10.0,
                   weight_denoise=0.3,
                   num_iterations=120, step_size=10.0):
    """
    Use gradient descent to find an image that minimizes the
    loss-functions of the content-layers and style-layers. This
    should result in a mixed-image that resembles the contours
    of the content-image, and resembles the colours and textures
    of the style-image.

    Parameters:
    content_image: Numpy 3-dim float-array with the content-image.
    style_image: Numpy 3-dim float-array with the style-image.
    content_layer_ids: List of integers identifying the content-layers.
    style_layer_ids: List of integers identifying the style-layers.
    weight_content: Weight for the content-loss-function.
    weight_style: Weight for the style-loss-function.
    weight_denoise: Weight for the denoising-loss-function.
    num_iterations: Number of optimization iterations to perform.
    step_size: Step-size for the gradient in each iteration.
    """

    # Create an instance of the VGG16-model. This is done
    # in each call of this function, because we will add
    # operations to the graph so it can grow very large
    # and run out of RAM if we keep using the same instance.
    model = vgg16.VGG16()

    # Create a TensorFlow-session.
    session = tf.InteractiveSession(graph=model.graph)

    # Print the names of the content-layers.
    print("Content layers:")
    print(model.get_layer_names(content_layer_ids))
    print()

    # Print the names of the style-layers.
    print("Style layers:")
    print(model.get_layer_names(style_layer_ids))
    print()

    # Create the loss-function for the content-layers and -image.
    loss_content = create_content_loss(session=session,
                                       model=model,
                                       content_image=content_image,
                                       layer_ids=content_layer_ids)

    # Create the loss-function for the style-layers and -image.
    loss_style = create_style_loss(session=session,
                                   model=model,
                                   style_image=style_image,
                                   layer_ids=style_layer_ids)

    # Create the loss-function for the denoising of the mixed-image.
    loss_denoise = create_denoise_loss(model)

    # Create TensorFlow variables for adjusting the values of
    # the loss-functions. This is explained below.
    adj_content = tf.Variable(1e-10, name='adj_content')
    adj_style = tf.Variable(1e-10, name='adj_style')
    adj_denoise = tf.Variable(1e-10, name='adj_denoise')

    # Initialize the adjustment values for the loss-functions.
    session.run([adj_content.initializer,
                 adj_style.initializer,
                 adj_denoise.initializer])

    # Create TensorFlow operations for updating the adjustment values.
    # These are basically just the reciprocal values of the
    # loss-functions, with a small value 1e-10 added to avoid the
    # possibility of division by zero.
    update_adj_content = adj_content.assign(1.0 / (loss_content + 1e-10))
    update_adj_style = adj_style.assign(1.0 / (loss_style + 1e-10))
    update_adj_denoise = adj_denoise.assign(1.0 / (loss_denoise + 1e-10))

    # This is the weighted loss-function that we will minimize
    # below in order to generate the mixed-image.
    # Because we multiply the loss-values with their reciprocal
    # adjustment values, we can use relative weights for the
    # loss-functions that are easier to select, as they are
    # independent of the exact choice of style- and content-layers.
    loss_combined = weight_content * adj_content * loss_content + \
                    weight_style * adj_style * loss_style + \
                    weight_denoise * adj_denoise * loss_denoise

    # Use TensorFlow to get the mathematical function for the
    # gradient of the combined loss-function with regard to
    # the input image.
    gradient = tf.gradients(loss_combined, model.input)

    # List of tensors that we will run in each optimization iteration.
    run_list = [gradient, update_adj_content, update_adj_style,
                update_adj_denoise]

    # The mixed-image is initialized with random noise.
    # It is the same size as the content-image.
    mixed_image = np.random.rand(*content_image.shape) + 128

    for i in range(num_iterations):
        # Create a feed-dict with the mixed-image.
        feed_dict = model.create_feed_dict(image=mixed_image)

        # Use TensorFlow to calculate the value of the
        # gradient, as well as updating the adjustment values.
        grad, adj_content_val, adj_style_val, adj_denoise_val \
            = session.run(run_list, feed_dict=feed_dict)

        # Reduce the dimensionality of the gradient.
        grad = np.squeeze(grad)

        # Scale the step-size according to the gradient-values.
        step_size_scaled = step_size / (np.std(grad) + 1e-8)

        # Update the image by following the gradient.
        mixed_image -= grad * step_size_scaled

        # Ensure the image has valid pixel-values between 0 and 255.
        mixed_image = np.clip(mixed_image, 0.0, 255.0)

        # Print a little progress-indicator.
        print(". ", end="")

        # Display status once every 10 iterations, and the last.
        if (i % 10 == 0) or (i == num_iterations - 1):
            print()
            print("Iteration:", i)

            # Print adjustment weights for loss-functions.
            msg = "Weight Adj. for Content: {0:.2e}, Style: {1:.2e}, Denoise: {2:.2e}"
            print(msg.format(adj_content_val, adj_style_val, adj_denoise_val))

            # Plot the content-, style- and mixed-images.
            plot_images(content_image=content_image,
                        style_image=style_image,
                        mixed_image=mixed_image)

    print()
    print("Final image:")
    plot_image_big(mixed_image)

    # Close the TensorFlow session to release its resources.
    session.close()

    # Return the mixed-image.
    return mixed_image
This example shows how to transfer the style of various images onto a portrait.
First we load the content-image, which has the overall contours that we want in the mixed-image.
content_filename = 'images/willy_wonka_old.jpg'
content_image = load_image(content_filename, max_size=None)
Then we load the style-image, which has the colours and textures we want in the mixed-image.
style_filename = 'images/style7.jpg'
style_image = load_image(style_filename, max_size=300)
Then we define a list of integers which identify the layers in the neural network that we want to use for matching the content-image. These are indices into the layers of the neural network. For the VGG16 model, the 5th layer (index 4) seems to be the only one that works well as the content-layer.
content_layer_ids = [4]
Then we define another list of integers for the style-layers.
# The VGG16-model has 13 convolutional layers.
# This selects all those layers as the style-layers.
# This is somewhat slow to optimize.
style_layer_ids = list(range(13))

# You can also select a sub-set of the layers, e.g. like this:
# style_layer_ids = [1, 2, 3, 4]
Now perform the style-transfer. This automatically creates the appropriate loss-functions for the style- and content-images, and then performs a number of optimization iterations. This will gradually generate a mixed-image which has the overall contours of the content-image, with the textures and colours resembling those of the style-image.
This can be very slow on a CPU!
%%time
img = style_transfer(content_image=content_image,
                     style_image=style_image,
                     content_layer_ids=content_layer_ids,
                     style_layer_ids=style_layer_ids,
                     weight_content=1.5,
                     weight_style=10.0,
                     weight_denoise=0.3,
                     num_iterations=60,
                     step_size=10.0)
Content layers:
['conv3_1/conv3_1']

Style layers:
['conv1_1/conv1_1', 'conv1_2/conv1_2', 'conv2_1/conv2_1', 'conv2_2/conv2_2', 'conv3_1/conv3_1', 'conv3_2/conv3_2', 'conv3_3/conv3_3', 'conv4_1/conv4_1', 'conv4_2/conv4_2', 'conv4_3/conv4_3', 'conv5_1/conv5_1', 'conv5_2/conv5_2', 'conv5_3/conv5_3']

.
Iteration: 0
Weight Adj. for Content: 5.18e-11, Style: 2.14e-29, Denoise: 5.61e-06
. . . . . . . . . .
Iteration: 10
Weight Adj. for Content: 2.79e-11, Style: 4.13e-28, Denoise: 1.25e-07
. . . . . . . . . .
Iteration: 20
Weight Adj. for Content: 2.63e-11, Style: 1.09e-27, Denoise: 1.30e-07
. . . . . . . . . .
Iteration: 30
Weight Adj. for Content: 2.66e-11, Style: 1.27e-27, Denoise: 1.27e-07
. . . . . . . . . .
Iteration: 40
Weight Adj. for Content: 2.73e-11, Style: 1.16e-27, Denoise: 1.26e-07
. . . . . . . . . .
Iteration: 50
Weight Adj. for Content: 2.75e-11, Style: 1.12e-27, Denoise: 1.24e-07
. . . . . . . . .
Iteration: 59
Weight Adj. for Content: 1.85e-11, Style: 3.86e-28, Denoise: 1.01e-07

Final image:
CPU times: user 20min 1s, sys: 45.5 s, total: 20min 46s
Wall time: 3min 4s
This tutorial showed the basic idea of using neural networks for combining the content and style of two images. The results were unfortunately not as good as those of some commercial systems, such as DeepArt, which was developed by some of the pioneers of this technique. The reason is unclear. Perhaps we simply need more computational power, so we can perform more optimization iterations with smaller step-sizes on higher-resolution images. Or perhaps we need a more sophisticated optimization method. The exercises below give some suggestions that may improve the quality, and you are encouraged to try them.
Below are a few suggested exercises that may help improve your skills with TensorFlow. It is important to get hands-on experience in order to learn how to use TensorFlow properly.
You may want to backup this Notebook before making any changes to it.
Try changing the max_size argument of the load_image() function to resize the images. How does it affect the result?
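For example, a hypothetical experiment (reusing the variables defined in the example above) might compare a few sizes:

# Smaller images optimize much faster but give coarser results;
# max_size=None keeps the original resolution.
for size in [200, 300, None]:
    content_image = load_image(content_filename, max_size=size)
    img = style_transfer(content_image=content_image,
                         style_image=style_image,
                         content_layer_ids=content_layer_ids,
                         style_layer_ids=style_layer_ids,
                         num_iterations=60)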