Header image from: Experiments with style transfer
This post has been a long time coming. Over the past couple of years, apps for artistic-style image processing have appeared one after another, such as the once-popular Prisma.
This article briefly introduces and implements the style-transfer algorithm; for more background, see the previously translated articles (image stylization; AI composition, machine learning and art).
Note that practical applications may require further refinement, e.g. frame-to-frame stability when processing video.
01 - Simple Linear Model | 02 - Convolutional Neural Network | 03 - PrettyTensor | 04 - Save & Restore
05 - Ensemble Learning | 06 - CIFAR-10 | 07 - Inception Model | 08 - Transfer Learning
09 - Video Data | 11 - Adversarial Examples | 12 - Adversarial Noise for MNIST | 13 - Visual Analysis
14 - DeepDream
by Magnus Erik Hvass Pedersen / GitHub / Videos on YouTube
Chinese translation by thrillerist / GitHub
If you reproduce this article, please include a link to it.
In the previous Tutorial #14 we saw how to maximize the feature activations inside a neural network so as to amplify patterns in the input image. This is called DeepDream.
This tutorial uses a similar idea, but takes two input images: a content-image and a style-image. We then wish to create a mixed-image which has the contours of the content-image and the texture of the style-image.
This builds on the previous tutorials. You should be roughly familiar with neural networks (see Tutorials #01 and #02), and it is also helpful to be familiar with DeepDream in Tutorial #14.
This flowchart shows roughly the idea of the Style-Transfer algorithm, although we use the VGG-16 model, which has many more layers than shown here.
Two images are input to the neural network: a content-image and a style-image. We wish to create the mixed-image, which has the contours of the content-image and the texture of the style-image.
We do this by creating several loss-functions that can be optimized.
The loss-function for the content-image tries to minimize the difference between the feature activations for the content-image and the mixed-image, at one or more layers in the network. This causes the contours of the mixed-image to resemble those of the content-image.
The loss-function for the style-image is slightly more complicated, because it instead tries to minimize the difference between the so-called Gram-matrices of the style-image and the mixed-image. This is done at one or more layers in the network. The Gram-matrix measures which features are activated simultaneously in a given layer. Changing the mixed-image so that it mimics the activation patterns (activation patterns) of the style-image causes the colours and textures to be transferred.
We use TensorFlow to automatically derive the gradients of these loss-functions. The gradients are then used to update the mixed-image. This procedure is repeated a number of times until we are satisfied with the resulting image.
Some details of the Style-Transfer algorithm are not shown in this flowchart, e.g. the calculation of the Gram-matrices, the calculation and storage of intermediate values for efficiency, a loss-function for denoising the mixed-image, and the normalization (normalization) of the loss-functions so they are easier to scale relative to each other.
from IPython.display import Image, display
Image('images/15_style_transfer_flowchart.png')
%matplotlib inline
import matplotlib.pyplot as plt
import tensorflow as tf
import numpy as np
import PIL.Image
This was developed with Python 3.5.2 (Anaconda) and TensorFlow version:
tf.__version__
'0.11.0rc0'
I spent two days trying to implement the Style-Transfer algorithm using the Inception 5h model that we used for DeepDream in Tutorial #14, but I could not get images that looked good enough. This is a bit strange, because the images generated in Tutorial #14 looked quite good. But recall that we also used a few tricks there to achieve that quality, such as smoothing the gradient and recursively downscaling and processing the image.
The original paper used the VGG-19 convolutional neural network, but for some reason the pre-trained VGG-19 model was not stable enough in TensorFlow for this tutorial. So we use the VGG-16 model instead, which someone else has made and which can easily be obtained and loaded in TensorFlow. For convenience it has been wrapped in a class.
import vgg16
The VGG-16 model is downloaded from the internet. This is the default directory where the data-files are saved. The directory is created if it does not exist.
# vgg16.data_dir = 'vgg16/'
Download the data for the VGG-16 model if it doesn't already exist in the directory.
WARNING: It is 550 MB!
vgg16.maybe_download()
Downloading VGG16 Model ...
Data has apparently already been downloaded and unpacked.
This function loads an image and returns it as a numpy array of floating-points. The image can be automatically resized so the largest of the height or width equals max_size.
def load_image(filename, max_size=None):
    image = PIL.Image.open(filename)

    if max_size is not None:
        # Calculate the appropriate rescale-factor for
        # ensuring a max height and width, while keeping
        # the proportion between them.
        factor = max_size / np.max(image.size)

        # Scale the image's height and width.
        size = np.array(image.size) * factor

        # The size is now floating-point because it was scaled.
        # But PIL requires the size to be integers.
        size = size.astype(int)

        # Resize the image.
        image = image.resize(size, PIL.Image.LANCZOS)

    # Convert to numpy floating-point array.
    return np.float32(image)
This function saves an image as a jpeg-file. The image is given as a numpy array with pixel-values between 0 and 255.
def save_image(image, filename):
    # Ensure the pixel-values are between 0 and 255.
    image = np.clip(image, 0.0, 255.0)

    # Convert to bytes.
    image = image.astype(np.uint8)

    # Write the image-file in jpeg-format.
    with open(filename, 'wb') as file:
        PIL.Image.fromarray(image).save(file, 'jpeg')
This function plots a large image. The image is given as a numpy array with pixel-values between 0 and 255.
def plot_image_big(image):
    # Ensure the pixel-values are between 0 and 255.
    image = np.clip(image, 0.0, 255.0)

    # Convert pixels to bytes.
    image = image.astype(np.uint8)

    # Convert to a PIL-image and display it.
    display(PIL.Image.fromarray(image))
This function plots the content-, mixed- and style-images.
def plot_images(content_image, style_image, mixed_image):
    # Create figure with sub-plots.
    fig, axes = plt.subplots(1, 3, figsize=(10, 10))

    # Adjust vertical spacing.
    fig.subplots_adjust(hspace=0.1, wspace=0.1)

    # Use interpolation to smooth pixels?
    smooth = True

    # Interpolation type.
    if smooth:
        interpolation = 'sinc'
    else:
        interpolation = 'nearest'

    # Plot the content-image.
    # Note that the pixel-values are normalized to
    # the [0.0, 1.0] range by dividing with 255.
    ax = axes.flat[0]
    ax.imshow(content_image / 255.0, interpolation=interpolation)
    ax.set_xlabel("Content")

    # Plot the mixed-image.
    ax = axes.flat[1]
    ax.imshow(mixed_image / 255.0, interpolation=interpolation)
    ax.set_xlabel("Mixed")

    # Plot the style-image.
    ax = axes.flat[2]
    ax.imshow(style_image / 255.0, interpolation=interpolation)
    ax.set_xlabel("Style")

    # Remove ticks from all the plots.
    for ax in axes.flat:
        ax.set_xticks([])
        ax.set_yticks([])

    # Ensure the plot is shown correctly with multiple plots
    # in a single Notebook cell.
    plt.show()
These helper-functions create the loss-functions that are used in optimization with TensorFlow.
This function creates a TensorFlow operation for calculating the Mean Squared Error between the two input tensors.
def mean_squared_error(a, b):
    return tf.reduce_mean(tf.square(a - b))
This function creates the loss-function for the content-image. It is the Mean Squared Error of the feature activations for the content-image and the mixed-image, at the given layers in the model. When the content-loss is minimized, it means that the mixed-image has feature activations at the given layers that are very similar to those of the content-image. Depending on which layers you select, this should transfer the contours of the content-image to the mixed-image.
def create_content_loss(session, model, content_image, layer_ids):
    """
    Create the loss-function for the content-image.

    Parameters:
    session: An open TensorFlow session for running the model's graph.
    model: The model, e.g. an instance of the VGG16-class.
    content_image: Numpy float array with the content-image.
    layer_ids: List of integer id's for the layers to use in the model.
    """

    # Create a feed-dict with the content-image.
    feed_dict = model.create_feed_dict(image=content_image)

    # Get references to the tensors for the given layers.
    layers = model.get_layer_tensors(layer_ids)

    # Calculate the output values of those layers when
    # feeding the content-image to the model.
    values = session.run(layers, feed_dict=feed_dict)

    # Set the model's graph as the default so we can add
    # computational nodes to it. It is not always clear
    # when this is necessary in TensorFlow, but if you
    # want to re-use this code then it may be necessary.
    with model.graph.as_default():
        # Initialize an empty list of loss-functions.
        layer_losses = []

        # For each layer and its corresponding values
        # for the content-image.
        for value, layer in zip(values, layers):
            # These are the values that are calculated
            # for this layer in the model when inputting
            # the content-image. Wrap it to ensure it
            # is a const - although this may be done
            # automatically by TensorFlow.
            value_const = tf.constant(value)

            # The loss-function for this layer is the
            # Mean Squared Error between the layer-values
            # when inputting the content- and mixed-images.
            # Note that the mixed-image is not calculated
            # yet, we are merely creating the operations
            # for calculating the MSE between those two.
            loss = mean_squared_error(layer, value_const)

            # Add the loss-function for this layer to the
            # list of loss-functions.
            layer_losses.append(loss)

        # The combined loss for all layers is just the average.
        # The loss-functions could be weighted differently for
        # each layer. You can try it and see what happens.
        total_loss = tf.reduce_mean(layer_losses)

    return total_loss
We will do something similar for the style-layers, but now we want to measure which features in the style-layers activate simultaneously for the style-image, and then copy this activation-pattern to the mixed-image.
One way of doing this is to calculate a so-called Gram-matrix for the output tensors of the style-layers. The Gram-matrix is essentially a matrix of dot-products for the vectors of the feature activations of a style-layer.
If an entry in the Gram-matrix has a value close to zero, it means the two features in the given layer do not activate simultaneously for the style-image. And vice versa, if an entry in the Gram-matrix has a large value, it means the two features do activate simultaneously. We will then try to create a mixed-image that replicates this activation-pattern of the style-image.
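To build some intuition before the TensorFlow implementation below, here is a small NumPy sketch (my own illustration, not part of the tutorial's code) with made-up activations for a layer with 4 spatial positions and 3 feature-channels:

import numpy as np

# Hypothetical layer activations: 4 spatial positions x 3 feature-channels.
# Channels 0 and 2 always fire together; channel 1 fires on its own.
activations = np.array([[1.0, 0.0, 2.0],
                        [2.0, 0.0, 4.0],
                        [0.0, 3.0, 0.0],
                        [0.0, 1.0, 0.0]])

# The Gram-matrix is the matrix of dot-products between feature-channels.
gram = activations.T @ activations

print(gram)
# [[ 5.  0. 10.]
#  [ 0. 10.  0.]
#  [10.  0. 20.]]

The entry for channels 0 and 2 is large because they activate at the same positions, while the entries involving channel 1 are zero because it never activates together with the others. The style-loss drives these co-activation statistics of the mixed-image towards those of the style-image.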
This is the helper-function for calculating the Gram-matrix of a tensor output by a convolutional layer in the neural network. The actual loss-function is created further below.
def gram_matrix(tensor):
    shape = tensor.get_shape()

    # Get the number of feature channels for the input tensor,
    # which is assumed to be from a convolutional layer with 4-dim.
    num_channels = int(shape[3])

    # Reshape the tensor so it is a 2-dim matrix. This essentially
    # flattens the contents of each feature-channel.
    matrix = tf.reshape(tensor, shape=[-1, num_channels])

    # Calculate the Gram-matrix as the matrix-product of
    # the 2-dim matrix with itself. This calculates the
    # dot-products of all combinations of the feature-channels.
    gram = tf.matmul(tf.transpose(matrix), matrix)

    return gram
The next function creates the loss-function for the style-image. It is quite similar to create_content_loss() above, except that we calculate the Mean Squared Error for the Gram-matrices instead of the raw outputs of the layers.
def create_style_loss(session, model, style_image, layer_ids):
    """
    Create the loss-function for the style-image.

    Parameters:
    session: An open TensorFlow session for running the model's graph.
    model: The model, e.g. an instance of the VGG16-class.
    style_image: Numpy float array with the style-image.
    layer_ids: List of integer id's for the layers to use in the model.
    """

    # Create a feed-dict with the style-image.
    feed_dict = model.create_feed_dict(image=style_image)

    # Get references to the tensors for the given layers.
    layers = model.get_layer_tensors(layer_ids)

    # Set the model's graph as the default so we can add
    # computational nodes to it. It is not always clear
    # when this is necessary in TensorFlow, but if you
    # want to re-use this code then it may be necessary.
    with model.graph.as_default():
        # Construct the TensorFlow-operations for calculating
        # the Gram-matrices for each of the layers.
        gram_layers = [gram_matrix(layer) for layer in layers]

        # Calculate the values of those Gram-matrices when
        # feeding the style-image to the model.
        values = session.run(gram_layers, feed_dict=feed_dict)

        # Initialize an empty list of loss-functions.
        layer_losses = []

        # For each Gram-matrix layer and its corresponding values.
        for value, gram_layer in zip(values, gram_layers):
            # These are the Gram-matrix values that are calculated
            # for this layer in the model when inputting the
            # style-image. Wrap it to ensure it is a const,
            # although this may be done automatically by TensorFlow.
            value_const = tf.constant(value)

            # The loss-function for this layer is the
            # Mean Squared Error between the Gram-matrix values
            # for the style- and mixed-images.
            # Note that the mixed-image is not calculated
            # yet, we are merely creating the operations
            # for calculating the MSE between those two.
            loss = mean_squared_error(gram_layer, value_const)

            # Add the loss-function for this layer to the
            # list of loss-functions.
            layer_losses.append(loss)

        # The combined loss for all layers is just the average.
        # The loss-functions could be weighted differently for
        # each layer. You can try it and see what happens.
        total_loss = tf.reduce_mean(layer_losses)

    return total_loss
This creates the loss-function for denoising the mixed-image. The algorithm is called Total Variation Denoising and essentially just shifts the image one pixel in the x- and y-axis, calculates the difference from the original image, takes the absolute value to ensure the difference is a positive value, and sums over all the pixels in the image. This creates a loss-function that can be minimized, so as to suppress some of the noise in the image.
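As a quick sanity-check of what this loss measures, here is a tiny NumPy example (my own illustration, not part of the tutorial) on a hypothetical 2x2 image with a single noisy pixel; the TensorFlow version used by the algorithm follows.

import numpy as np

# A constant image would give a loss of zero; any abrupt
# pixel-to-pixel change increases the loss.
img = np.array([[0.0, 10.0],
                [0.0,  0.0]])

tv_loss = (np.abs(img[1:, :] - img[:-1, :]).sum() +
           np.abs(img[:, 1:] - img[:, :-1]).sum())

print(tv_loss)  # 10 (vertical) + 10 (horizontal) = 20.0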
def create_denoise_loss(model):
    loss = tf.reduce_sum(tf.abs(model.input[:,1:,:,:] - model.input[:,:-1,:,:])) + \
           tf.reduce_sum(tf.abs(model.input[:,:,1:,:] - model.input[:,:,:-1,:]))

    return loss
This is the main optimization algorithm for the Style-Transfer algorithm. It is basically just gradient descent on the loss-functions defined above.
This algorithm also uses normalization of the loss-functions. This appears to be a novel idea not previously published. In each iteration of the optimization, the loss-values are adjusted so each of them equals one. This allows the user to set the loss-weights independently of the chosen style- and content-layers. It also adapts the weighting during the optimization, to ensure the desired ratio between style, content and denoising is preserved.
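To see the normalization idea in isolation before the full function below, here is a minimal sketch; the loss magnitudes are made up, but the reciprocal-adjustment scheme and the epsilon match the code that follows:

# Hypothetical raw loss magnitudes; in practice they can differ by
# many orders of magnitude, which makes fixed weights hard to choose.
loss_content, loss_style, loss_denoise = 3.2e9, 7.5e24, 8.1e5

# Adjustment values: reciprocals of the losses, with a small
# epsilon to avoid division by zero.
adj_content = 1.0 / (loss_content + 1e-10)
adj_style   = 1.0 / (loss_style + 1e-10)
adj_denoise = 1.0 / (loss_denoise + 1e-10)

# Each adjusted loss is now ~1, so these weights directly control
# the relative importance of content, style and denoising.
weight_content, weight_style, weight_denoise = 1.5, 10.0, 0.3
loss_combined = (weight_content * adj_content * loss_content +
                 weight_style * adj_style * loss_style +
                 weight_denoise * adj_denoise * loss_denoise)

print(loss_combined)  # ~11.8 = 1.5 + 10.0 + 0.3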
def style_transfer(content_image, style_image,
                   content_layer_ids, style_layer_ids,
                   weight_content=1.5, weight_style=10.0,
                   weight_denoise=0.3,
                   num_iterations=120, step_size=10.0):
    """
    Use gradient descent to find an image that minimizes the
    loss-functions of the content-layers and style-layers. This
    should result in a mixed-image that resembles the contours
    of the content-image, and resembles the colours and textures
    of the style-image.

    Parameters:
    content_image: Numpy 3-dim float-array with the content-image.
    style_image: Numpy 3-dim float-array with the style-image.
    content_layer_ids: List of integers identifying the content-layers.
    style_layer_ids: List of integers identifying the style-layers.
    weight_content: Weight for the content-loss-function.
    weight_style: Weight for the style-loss-function.
    weight_denoise: Weight for the denoising-loss-function.
    num_iterations: Number of optimization iterations to perform.
    step_size: Step-size for the gradient in each iteration.
    """

    # Create an instance of the VGG16-model. This is done
    # in each call of this function, because we will add
    # operations to the graph so it can grow very large
    # and run out of RAM if we keep using the same instance.
    model = vgg16.VGG16()

    # Create a TensorFlow-session.
    session = tf.InteractiveSession(graph=model.graph)

    # Print the names of the content-layers.
    print("Content layers:")
    print(model.get_layer_names(content_layer_ids))
    print()

    # Print the names of the style-layers.
    print("Style layers:")
    print(model.get_layer_names(style_layer_ids))
    print()

    # Create the loss-function for the content-layers and -image.
    loss_content = create_content_loss(session=session,
                                       model=model,
                                       content_image=content_image,
                                       layer_ids=content_layer_ids)

    # Create the loss-function for the style-layers and -image.
    loss_style = create_style_loss(session=session,
                                   model=model,
                                   style_image=style_image,
                                   layer_ids=style_layer_ids)

    # Create the loss-function for the denoising of the mixed-image.
    loss_denoise = create_denoise_loss(model)

    # Create TensorFlow variables for adjusting the values of
    # the loss-functions. This is explained below.
    adj_content = tf.Variable(1e-10, name='adj_content')
    adj_style = tf.Variable(1e-10, name='adj_style')
    adj_denoise = tf.Variable(1e-10, name='adj_denoise')

    # Initialize the adjustment values for the loss-functions.
    session.run([adj_content.initializer,
                 adj_style.initializer,
                 adj_denoise.initializer])

    # Create TensorFlow operations for updating the adjustment values.
    # These are basically just the reciprocal values of the
    # loss-functions, with a small value 1e-10 added to avoid the
    # possibility of division by zero.
    update_adj_content = adj_content.assign(1.0 / (loss_content + 1e-10))
    update_adj_style = adj_style.assign(1.0 / (loss_style + 1e-10))
    update_adj_denoise = adj_denoise.assign(1.0 / (loss_denoise + 1e-10))

    # This is the weighted loss-function that we will minimize
    # below in order to generate the mixed-image.
    # Because we multiply the loss-values with their reciprocal
    # adjustment values, we can use relative weights for the
    # loss-functions that are easier to select, as they are
    # independent of the exact choice of style- and content-layers.
    loss_combined = weight_content * adj_content * loss_content + \
                    weight_style * adj_style * loss_style + \
                    weight_denoise * adj_denoise * loss_denoise

    # Use TensorFlow to get the mathematical function for the
    # gradient of the combined loss-function with regard to
    # the input image.
    gradient = tf.gradients(loss_combined, model.input)

    # List of tensors that we will run in each optimization iteration.
    run_list = [gradient, update_adj_content, update_adj_style,
                update_adj_denoise]

    # The mixed-image is initialized with random noise.
    # It is the same size as the content-image.
    mixed_image = np.random.rand(*content_image.shape) + 128

    for i in range(num_iterations):
        # Create a feed-dict with the mixed-image.
        feed_dict = model.create_feed_dict(image=mixed_image)

        # Use TensorFlow to calculate the value of the
        # gradient, as well as updating the adjustment values.
        grad, adj_content_val, adj_style_val, adj_denoise_val \
            = session.run(run_list, feed_dict=feed_dict)

        # Reduce the dimensionality of the gradient.
        grad = np.squeeze(grad)

        # Scale the step-size according to the gradient-values.
        step_size_scaled = step_size / (np.std(grad) + 1e-8)

        # Update the image by following the gradient.
        mixed_image -= grad * step_size_scaled

        # Ensure the image has valid pixel-values between 0 and 255.
        mixed_image = np.clip(mixed_image, 0.0, 255.0)

        # Print a little progress-indicator.
        print(". ", end="")

        # Display status once every 10 iterations, and the last.
        if (i % 10 == 0) or (i == num_iterations - 1):
            print()
            print("Iteration:", i)

            # Print adjustment weights for loss-functions.
            msg = "Weight Adj. for Content: {0:.2e}, Style: {1:.2e}, Denoise: {2:.2e}"
            print(msg.format(adj_content_val, adj_style_val, adj_denoise_val))

            # Plot the content-, style- and mixed-images.
            plot_images(content_image=content_image,
                        style_image=style_image,
                        mixed_image=mixed_image)

    print()
    print("Final image:")
    plot_image_big(mixed_image)

    # Close the TensorFlow session to release its resources.
    session.close()

    # Return the mixed-image.
    return mixed_image
This example shows how to transfer the style of various images onto a portrait.
First we load the content-image, which has the overall contours that we want in the mixed-image.
content_filename = 'images/willy_wonka_old.jpg'
content_image = load_image(content_filename, max_size=None)
Then we load the style-image, which has the colours and textures we want in the mixed-image.
style_filename = 'images/style7.jpg'
style_image = load_image(style_filename, max_size=300)
Then we define a list of integers which identify the layers in the neural network that we want to use for matching the content-image. These are indices into the layers of the neural network. For the VGG16 model, the 5th layer (index 4) seems to be the only one that works well as the content-layer.
content_layer_ids = [4]
Then we define another list of integers for the style-layers.
# The VGG16-model has 13 convolutional layers.
# This selects all those layers as the style-layers.
# This is somewhat slow to optimize.
style_layer_ids = list(range(13))

# You can also select a sub-set of the layers, e.g. like this:
# style_layer_ids = [1, 2, 3, 4]
Now perform the style-transfer. This automatically creates the appropriate loss-functions for the style- and content-images, and then performs a number of optimization iterations. This will gradually generate a mixed-image which has the overall contours of the content-image, with the textures and colours resembling those of the style-image.
This can be very slow on a CPU!
%%time
img = style_transfer(content_image=content_image,
                     style_image=style_image,
                     content_layer_ids=content_layer_ids,
                     style_layer_ids=style_layer_ids,
                     weight_content=1.5,
                     weight_style=10.0,
                     weight_denoise=0.3,
                     num_iterations=60,
                     step_size=10.0)
Content layers:
['conv3_1/conv3_1']

Style layers:
['conv1_1/conv1_1', 'conv1_2/conv1_2', 'conv2_1/conv2_1', 'conv2_2/conv2_2', 'conv3_1/conv3_1', 'conv3_2/conv3_2', 'conv3_3/conv3_3', 'conv4_1/conv4_1', 'conv4_2/conv4_2', 'conv4_3/conv4_3', 'conv5_1/conv5_1', 'conv5_2/conv5_2', 'conv5_3/conv5_3']

.
Iteration: 0
Weight Adj. for Content: 5.18e-11, Style: 2.14e-29, Denoise: 5.61e-06
. . . . . . . . . .
Iteration: 10
Weight Adj. for Content: 2.79e-11, Style: 4.13e-28, Denoise: 1.25e-07
. . . . . . . . . .
Iteration: 20
Weight Adj. for Content: 2.63e-11, Style: 1.09e-27, Denoise: 1.30e-07
. . . . . . . . . .
Iteration: 30
Weight Adj. for Content: 2.66e-11, Style: 1.27e-27, Denoise: 1.27e-07
. . . . . . . . . .
Iteration: 40
Weight Adj. for Content: 2.73e-11, Style: 1.16e-27, Denoise: 1.26e-07
. . . . . . . . . .
Iteration: 50
Weight Adj. for Content: 2.75e-11, Style: 1.12e-27, Denoise: 1.24e-07
. . . . . . . . .
Iteration: 59
Weight Adj. for Content: 1.85e-11, Style: 3.86e-28, Denoise: 1.01e-07

Final image:
CPU times: user 20min 1s, sys: 45.5 s, total: 20min 46s
Wall time: 3min 4s
This tutorial showed the basic idea of using neural networks for combining the content and style of two images. The results were unfortunately not as good as those of some commercial systems, such as DeepArt, which was developed by some of the pioneers of this technique. The reason is unclear. Perhaps we simply need more computational power, so we can perform more optimization iterations with smaller step-sizes on higher-resolution images. Or perhaps we need a more sophisticated optimization method. The exercises below give some suggestions that may improve the quality, and you are encouraged to try them.
Below are a few suggested exercises that may help improve your skills with TensorFlow. It is important to get hands-on experience in order to learn how to use TensorFlow properly.
You may want to backup this Notebook before making any changes to it.
Try changing the max_size argument of the load_image() function to resize the images. How does it affect the result?
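For example, a hypothetical experiment (reusing the variables defined in the example above) might compare a few sizes:

# Smaller images optimize much faster but give coarser results;
# max_size=None keeps the original resolution.
for size in [200, 300, None]:
    content_image = load_image(content_filename, max_size=size)
    img = style_transfer(content_image=content_image,
                         style_image=style_image,
                         content_layer_ids=content_layer_ids,
                         style_layer_ids=style_layer_ids,
                         num_iterations=60)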