Implementing a Convolutional Autoencoder with TensorFlow

Date: 2019-12-13

Author: chen_h
WeChat & QQ: 862251340
WeChat official account: coderpai
Jianshu: https://www.jianshu.com/p/250...

Introduction and concepts

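An autoencoder is a kind of neural network whose output is meant to reproduce its input. It compresses the input into a latent representation space and then reconstructs the output from that representation.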

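Image processing is a very popular use case for autoencoders. The trick is to replace the fully connected layers with convolutional layers. These layers turn a wide, thin image (for example 100*100 pixels with 3 RGB channels) into a narrow, thick one. This helps extract visual features from the image and so yields a more accurate latent representation. The image is then reconstructed with upsampling and convolution.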
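
The upsampling step can be illustrated with a minimal NumPy sketch of nearest-neighbor resizing. The helper name `upsample_nearest` is hypothetical, and the index rule (each output pixel copies the input pixel at floor(dst * in_size / out_size)) is an assumption made for illustration, not a statement about how any particular library implements it.

```python
import numpy as np

def upsample_nearest(image, out_h, out_w):
    """Nearest-neighbor resize of an (H, W, C) array to (out_h, out_w, C).

    Assumption: output index d copies input index floor(d * in_size / out_size).
    """
    in_h, in_w = image.shape[:2]
    rows = (np.arange(out_h) * in_h) // out_h   # source row for each output row
    cols = (np.arange(out_w) * in_w) // out_w   # source column for each output column
    return image[rows][:, cols]

# A 2x2 single-channel image grows to 4x4; every pixel becomes a 2x2 patch.
small = np.arange(4, dtype=np.float32).reshape(2, 2, 1)
big = upsample_nearest(small, 4, 4)
print(big[:, :, 0])
```

Upsampling adds no new information by itself, which is why a convolution usually follows it to refine the enlarged feature map.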

An autoencoder built this way is called a convolutional autoencoder (CAE).

Using convolutional autoencoders

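Convolutional autoencoders can be used for image reconstruction. For example, they can learn to remove noise from an image or to reconstruct its missing parts.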

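To achieve this, we generally do not use the same data as input and output. Instead, a noisy image is the input and the clean image is the target output. Through training, the convolutional autoencoder learns to remove the noise or to fill in the missing parts.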

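Next, let's look at how a CAE fills in a crosshair drawn over an eye. Suppose each picture of an eye has a black crosshair over it, and we need to remove this crosshair noise. First we have to build the dataset by hand, which is easy to do.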
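
One way such a dataset could be built by hand is sketched below. The helper `add_cross`, the bar thickness, and the random stand-in images are all assumptions for illustration; the original does not show how its eye dataset was created.

```python
import numpy as np

def add_cross(images, thickness=2):
    """Return a copy of (N, H, W) images with a centered black cross on each."""
    corrupted = images.copy()
    _, h, w = images.shape
    r0 = (h - thickness) // 2
    c0 = (w - thickness) // 2
    corrupted[:, r0:r0 + thickness, :] = 0.0   # horizontal bar
    corrupted[:, :, c0:c0 + thickness] = 0.0   # vertical bar
    return corrupted

# Random arrays stand in for real eye photos (hypothetical data).
rng = np.random.default_rng(0)
clean = rng.uniform(0.2, 1.0, size=(8, 28, 28)).astype(np.float32)
corrupted = add_cross(clean)
# Training pairs: `corrupted` is the network input, `clean` is the target.
```

Because the corruption is applied programmatically, every training input comes with its exact clean target.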

Now the convolutional autoencoder can be trained, and afterwards we can use it to remove the crosshairs from eye photos it has never seen!

Implementing the convolutional autoencoder with TensorFlow

Let's use the MNIST dataset to see how this network is implemented. The complete code can be downloaded from GitHub.

Network architecture

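The encoder part of the convolutional autoencoder is a typical convolutional pipeline. Each convolutional layer is followed by a pooling layer, mainly to reduce the dimensionality of the data. The decoder has to reconstruct a wide image from a very narrow representation.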
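
To see how quickly pooling shrinks the data, here is a small sketch that traces the spatial size of a 28x28 MNIST image through three 2x2 pooling layers with stride 2 and 'same' padding, where the output size is ceil(size / stride). The helper name `same_pool_output` is illustrative only.

```python
import math

def same_pool_output(size, stride=2):
    """Spatial size after a 'same'-padded pooling layer: ceil(size / stride)."""
    return math.ceil(size / stride)

size = 28
trace = [size]
for _ in range(3):               # three max-pooling layers in the encoder
    size = same_pool_output(size)
    trace.append(size)
print(trace)                     # [28, 14, 7, 4]

# With 16 filters in the last encoder layer, the code is 4 x 4 x 16 values.
input_values = 28 * 28 * 1
encoded_values = size * size * 16
print(input_values, encoded_values)   # 784 256
```

The 'same'-padded convolutions with stride 1 leave the width and height unchanged, so only the pooling layers appear in the trace.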

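Usually you will see transposed convolution (deconvolution) layers used to increase the width and height of the image. They work almost exactly like convolutional layers, but in the opposite direction. For example, with a 3*3 kernel, the encoder encodes a 3*3 region of the image into a single element, whereas the decoder, in a transposed convolution, decodes a single element into a 3*3 region. The TensorFlow API provides this operation as tf.nn.conv2d_transpose. Note that the model defined below instead upsamples with nearest-neighbor resizing followed by a regular convolution.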
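
The "one element becomes a 3*3 region" idea can be sketched in plain NumPy. This is a single-channel illustration without padding, written for clarity rather than speed; `conv_transpose2d` is a hypothetical helper and not the TensorFlow implementation.

```python
import numpy as np

def conv_transpose2d(x, kernel, stride):
    """Single-channel transposed convolution with no padding.

    Every input element scatters a scaled copy of `kernel` into the output;
    contributions that overlap are summed.
    """
    h, w = x.shape
    k = kernel.shape[0]
    out = np.zeros(((h - 1) * stride + k, (w - 1) * stride + k), dtype=x.dtype)
    for i in range(h):
        for j in range(w):
            out[i * stride:i * stride + k, j * stride:j * stride + k] += x[i, j] * kernel
    return out

kernel = np.ones((3, 3), dtype=np.float32)
x = np.array([[1.0, 2.0], [3.0, 4.0]], dtype=np.float32)
y = conv_transpose2d(x, kernel, stride=3)
print(y.shape)   # (6, 6)
```

With a stride equal to the kernel size the 3*3 patches do not overlap. With a smaller stride they do, and the overlapping contributions add up.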

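An autoencoder can denoise images very successfully just by being trained on noisy images. For example, we can create noisy images by adding Gaussian noise to the training images and then clipping the pixel values to between 0 and 1. The noisy images are the inputs, and the original clean images are the outputs, that is, our targets.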
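
The noising step can be tried on its own with NumPy. In this sketch the helper `make_noisy` and the random stand-in batch are assumptions for illustration; the noise factor of 0.5 matches the value used in the training code.

```python
import numpy as np

def make_noisy(images, noise_factor=0.5, rng=None):
    """Add Gaussian noise scaled by `noise_factor`, then clip to [0, 1]."""
    rng = np.random.default_rng() if rng is None else rng
    noisy = images + noise_factor * rng.standard_normal(images.shape)
    return np.clip(noisy, 0.0, 1.0)

rng = np.random.default_rng(42)
clean = rng.uniform(0.0, 1.0, size=(4, 28, 28, 1))   # stand-in for an MNIST batch
noisy = make_noisy(clean, noise_factor=0.5, rng=rng)
# `noisy` is fed to the network as input, `clean` is the target.
```

Clipping keeps the inputs in the same [0, 1] range as the clean images, so the network sees consistently scaled pixel values.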

Model definition

# Uses the TensorFlow 1.x API (tf.placeholder, tf.layers, tf.train)
import tensorflow as tf

learning_rate = 0.001
inputs_ = tf.placeholder(tf.float32, (None, 28, 28, 1), name='inputs')
targets_ = tf.placeholder(tf.float32, (None, 28, 28, 1), name='targets')
### Encoder
conv1 = tf.layers.conv2d(inputs=inputs_, filters=32, kernel_size=(3,3), padding='same', activation=tf.nn.relu)
# Now 28x28x32
maxpool1 = tf.layers.max_pooling2d(conv1, pool_size=(2,2), strides=(2,2), padding='same')
# Now 14x14x32
conv2 = tf.layers.conv2d(inputs=maxpool1, filters=32, kernel_size=(3,3), padding='same', activation=tf.nn.relu)
# Now 14x14x32
maxpool2 = tf.layers.max_pooling2d(conv2, pool_size=(2,2), strides=(2,2), padding='same')
# Now 7x7x32
conv3 = tf.layers.conv2d(inputs=maxpool2, filters=16, kernel_size=(3,3), padding='same', activation=tf.nn.relu)
# Now 7x7x16
encoded = tf.layers.max_pooling2d(conv3, pool_size=(2,2), strides=(2,2), padding='same')
# Now 4x4x16
### Decoder
upsample1 = tf.image.resize_images(encoded, size=(7,7), method=tf.image.ResizeMethod.NEAREST_NEIGHBOR)
# Now 7x7x16
conv4 = tf.layers.conv2d(inputs=upsample1, filters=16, kernel_size=(3,3), padding='same', activation=tf.nn.relu)
# Now 7x7x16
upsample2 = tf.image.resize_images(conv4, size=(14,14), method=tf.image.ResizeMethod.NEAREST_NEIGHBOR)
# Now 14x14x16
conv5 = tf.layers.conv2d(inputs=upsample2, filters=32, kernel_size=(3,3), padding='same', activation=tf.nn.relu)
# Now 14x14x32
upsample3 = tf.image.resize_images(conv5, size=(28,28), method=tf.image.ResizeMethod.NEAREST_NEIGHBOR)
# Now 28x28x32
conv6 = tf.layers.conv2d(inputs=upsample3, filters=32, kernel_size=(3,3), padding='same', activation=tf.nn.relu)
# Now 28x28x32
logits = tf.layers.conv2d(inputs=conv6, filters=1, kernel_size=(3,3), padding='same', activation=None)
# Now 28x28x1
# Pass logits through sigmoid to get reconstructed image
decoded = tf.nn.sigmoid(logits)
# Pass logits through sigmoid and calculate the cross-entropy loss
loss = tf.nn.sigmoid_cross_entropy_with_logits(labels=targets_, logits=logits)
# Get cost and define the optimizer
cost = tf.reduce_mean(loss)
opt = tf.train.AdamOptimizer(learning_rate).minimize(cost)
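
The loss is computed from the logits rather than from the sigmoid output for numerical stability. As a rough NumPy sketch, the stable form given in the TensorFlow documentation for this op is max(x, 0) - x*z + log(1 + exp(-|x|)), with x the logits and z the labels. The function name and sample values below are illustrative only.

```python
import numpy as np

def sigmoid_cross_entropy(labels, logits):
    """Element-wise cross-entropy between labels in [0, 1] and sigmoid(logits).

    Stable form: max(x, 0) - x * z + log(1 + exp(-|x|)).
    """
    x = np.asarray(logits, dtype=np.float64)
    z = np.asarray(labels, dtype=np.float64)
    return np.maximum(x, 0.0) - x * z + np.log1p(np.exp(-np.abs(x)))

logits = np.array([-2.0, 0.0, 3.0])
labels = np.array([0.0, 1.0, 1.0])
loss = sigmoid_cross_entropy(labels, logits)

# The textbook formula gives the same values for moderate logits.
p = 1.0 / (1.0 + np.exp(-logits))
naive = -(labels * np.log(p) + (1.0 - labels) * np.log(1.0 - p))
print(np.allclose(loss, naive))   # True
```

The textbook formula evaluates log(1 - p) with p rounded to 1 for large positive logits, which is where the stable form matters.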

Training:

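import numpy as np
# Assumption: the original does not show how `mnist` is created. This uses the
# TensorFlow 1.x tutorial loader; 'MNIST_data' is a placeholder directory.
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets('MNIST_data')

sess = tf.Session()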
epochs = 100
batch_size = 200
# Sets how much noise we're adding to the MNIST images
noise_factor = 0.5
sess.run(tf.global_variables_initializer())
for e in range(epochs):
    for ii in range(mnist.train.num_examples//batch_size):
        batch = mnist.train.next_batch(batch_size)
        # Get images from the batch
        imgs = batch[0].reshape((-1, 28, 28, 1))
        
        # Add random noise to the input images
        noisy_imgs = imgs + noise_factor * np.random.randn(*imgs.shape)
        # Clip the images to be between 0 and 1
        noisy_imgs = np.clip(noisy_imgs, 0., 1.)
        
        # Noisy images as inputs, original images as targets
        batch_cost, _ = sess.run([cost, opt], feed_dict={inputs_: noisy_imgs,
                                                         targets_: imgs})
        # Report the loss of the current batch (inside the training loop)
        print("Epoch: {}/{}...".format(e+1, epochs),
              "Training loss: {:.4f}".format(batch_cost))
