Autoencoders

Date: 2019-11-08


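A neural network trained so that its output equals its input is the simplest form of autoencoder. After training its parameters we obtain the weights of every layer, and with them several different representations of the input x (one per layer). These representations are the features. An autoencoder is a neural network that tries to reproduce the original data as closely as possible.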

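"Autoencoding" is a data compression algorithm in which both compression and decompression are lossy. Strictly speaking, training an autoencoder is self-supervised learning rather than unsupervised learning, because the training targets are generated from the input data itself.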

An autoencoder (AE) uses the backpropagation algorithm to learn features that minimize the error between its input and its output. An autoencoder has two parts:

Encoder: extracts features from the input and reduces its dimensionality
Decoder: restores the features to the original data
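The encoder/decoder split can be sketched without any deep learning library. The following is a minimal illustration, assuming a purely linear encoder and decoder, toy random data, and plain gradient descent; the names `W_enc`, `W_dec`, the sizes, and the learning rate are illustrative choices, not taken from the Keras examples in this article.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 200 samples in 8 dimensions that actually lie on a 2-D subspace
latent = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 8))
X = latent @ mixing

W_enc = rng.normal(scale=0.1, size=(8, 2))  # encoder weights: 8 -> 2
W_dec = rng.normal(scale=0.1, size=(2, 8))  # decoder weights: 2 -> 8

def mse(X, W_enc, W_dec):
    """Mean squared error between the input and its reconstruction."""
    return float(np.mean((X @ W_enc @ W_dec - X) ** 2))

initial_loss = mse(X, W_enc, W_dec)

lr = 0.005
for _ in range(3000):
    code = X @ W_enc                    # encode
    recon = code @ W_dec                # decode
    err = (recon - X) / len(X)          # gradient of the squared error w.r.t. recon
    grad_dec = code.T @ err             # backpropagate into the decoder
    grad_enc = X.T @ (err @ W_dec.T)    # backpropagate into the encoder
    W_dec -= lr * grad_dec
    W_enc -= lr * grad_enc

final_loss = mse(X, W_enc, W_dec)
print(initial_loss, final_loss)
```

The input serves as its own target, so no labels are involved; the 2-dimensional `code` is the learned feature representation.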

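Compared with the original data, the reconstructed data is blurrier: autoencoding is a lossy process that needs no labels. The goal of an autoencoder, however, is not the reconstruction with the smallest loss but the features that achieve the smallest error. Autoencoders are used for:

Feature extraction
Dimensionality reduction
Denoising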

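Autoencoders are somewhat similar to PCA (principal component analysis), but because they can use nonlinear activations they can learn nonlinear mappings and often outperform PCA.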
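To make the comparison concrete, PCA itself can be written as a linear "encoder" (project onto the top-k principal directions) and a linear "decoder" (project back). The sketch below uses NumPy's SVD on random toy data; the helper names `pca_fit`, `pca_encode`, and `pca_decode` are hypothetical and exist only for this illustration.

```python
import numpy as np

def pca_fit(X, k):
    """Return the data mean and the top-k principal directions (as rows)."""
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:k]

def pca_encode(X, mean, components):
    return (X - mean) @ components.T      # (n, d) -> (n, k)

def pca_decode(codes, mean, components):
    return codes @ components + mean      # (n, k) -> (n, d)

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5)) @ rng.normal(size=(5, 5))

errors = []
for k in (1, 3, 5):
    mean, comps = pca_fit(X, k)
    recon = pca_decode(pca_encode(X, mean, comps), mean, comps)
    errors.append(float(np.mean((recon - X) ** 2)))

print(errors)  # reconstruction error shrinks as k grows
```

An autoencoder replaces these two fixed linear maps with trainable, possibly nonlinear, layers.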

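An autoencoder can only compress data that resembles its training data. A trained autoencoder is specific to one kind of dataset: if another dataset is structured differently, the autoencoder will not compress it well. An autoencoder can be trained so that the input keeps as much information as possible after passing through the encoder and decoder, but it can also be trained so that the new representation has various other properties. Different types of autoencoder aim at different properties. The sections below introduce several of them.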

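Basic autoencoder

The vanilla autoencoder has only three layers, i.e., it is a neural network with a single hidden layer. Its input and its target output are identical, and it learns to reconstruct the input, for example with the Adam optimizer and a mean squared error loss.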

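The hidden layer has a compressed dimension of 32, smaller than the input dimension of 784, so this encoder is lossy. This constraint forces the network to learn a compressed representation of the data.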
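As a quick check of these numbers, the compression ratio and the number of trainable parameters of a 784-32-784 fully connected network can be computed by hand. The helper `dense_params` is a hypothetical name used only here; it counts one weight per input-output pair plus one bias per output unit.

```python
def dense_params(n_in, n_out):
    """Weights plus biases of one fully connected layer."""
    return n_in * n_out + n_out

input_dim, encoding_dim = 784, 32

compression_ratio = input_dim / encoding_dim              # 24.5
encoder_params = dense_params(input_dim, encoding_dim)    # 25120
decoder_params = dense_params(encoding_dim, input_dim)    # 25872
total_params = encoder_params + decoder_params            # 50992

print(compression_ratio, total_params)
```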

from keras.layers import Input, Dense
from keras.models import Model
import numpy as np
import matplotlib.pyplot as plt

# Load the data
path = './mnist.npz'
f = np.load(path)
x_train, y_train = f['x_train'], f['y_train']   # (60000, 28, 28), (60000,)
x_test, y_test = f['x_test'], f['y_test']       # (10000, 28, 28), (10000,)
f.close()

x_train = x_train.astype('float32') / 255.      # scale to [0, 1]
x_test = x_test.astype('float32') / 255.        # scale to [0, 1]
x_train = x_train.reshape((len(x_train), np.prod(x_train.shape[1:])))   # (60000, 784)
x_test = x_test.reshape((len(x_test), np.prod(x_test.shape[1:])))       # (10000, 784)

encoding_dim = 32   # size of the compressed representation
input_img = Input(shape=(784,))

encoded = Dense(encoding_dim, activation='relu')(input_img)        # encoding layer (?, 32)
decoded = Dense(784, activation='sigmoid')(encoded)                # decoding layer (?, 784)

autoencoder = Model(inputs=input_img, outputs=decoded)            # autoencoder model
encoder = Model(inputs=input_img, outputs=encoded)                # encoder model

decoder_input = Input(shape=(encoding_dim,))
decoder_layer = autoencoder.layers[-1]

decoder = Model(inputs=decoder_input, outputs=decoder_layer(decoder_input))

autoencoder.compile(optimizer='adadelta', loss='binary_crossentropy')
# The autoencoder's input and target are the same data
autoencoder.fit(x_train, x_train, epochs=10, batch_size=256, shuffle=True, validation_data=(x_test, x_test))

encoded_imgs = encoder.predict(x_test)
decoded_imgs = decoder.predict(encoded_imgs)
autoencoder_imgs = autoencoder.predict(x_test)


ax = plt.subplot(1, 3, 1)
plt.imshow(x_test[1].reshape(28, 28))       # original image
plt.gray()  # grayscale colormap
ax.get_xaxis().set_visible(False)   # hide x ticks
ax.get_yaxis().set_visible(False)   # hide y ticks

ax = plt.subplot(1, 3, 2)
# The encoded data is no longer an image, so show the decoder output instead
plt.imshow(decoded_imgs[1].reshape(28, 28))
plt.gray()  # grayscale colormap
ax.get_xaxis().set_visible(False)   # hide x ticks
ax.get_yaxis().set_visible(False)   # hide y ticks

ax = plt.subplot(1, 3, 3)
plt.imshow(autoencoder_imgs[1].reshape(28, 28)) # autoencoder output, identical to the decoder output
plt.gray()  # grayscale colormap
ax.get_xaxis().set_visible(False)   # hide x ticks
ax.get_yaxis().set_visible(False)   # hide y ticks

plt.show()

Image 1 is the original, image 2 is the decoder output, and image 3 is the autoencoder output.

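Multilayer autoencoder

This example builds an autoencoder with eight Dense layers: four in the encoder and four in the decoder (the last decoder layer is the output layer).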

import numpy as np
from keras.datasets import mnist
from keras.models import Model   # functional API model
from keras.layers import Dense, Input
import matplotlib.pyplot as plt

# Load the data
path = './mnist.npz'
f = np.load(path)
x_train, y_train = f['x_train'], f['y_train']
x_test, y_test = f['x_test'], f['y_test']
f.close()

# Preprocess the data
x_train = x_train.astype('float32') / 255.        # scale to [0, 1]
x_test = x_test.astype('float32') / 255.          # scale to [0, 1]
x_train = x_train.reshape((x_train.shape[0], -1))   # (60000, 784)
x_test = x_test.reshape((x_test.shape[0], -1))      # (10000, 784)

encoding_dim = 2          # compress the features down to 2 dimensions
input_img = Input(shape=(784,))          # input placeholder
  
# Encoding layers
encoded = Dense(128, activation='relu')(input_img)
encoded = Dense(64, activation='relu')(encoded)
encoded = Dense(10, activation='relu')(encoded)
encoder_output = Dense(encoding_dim)(encoded)

# Decoding layers
decoded = Dense(10, activation='relu')(encoder_output)
decoded = Dense(64, activation='relu')(decoded)
decoded = Dense(128, activation='relu')(decoded)
decoded = Dense(784, activation='tanh')(decoded)

encoder = Model(inputs=input_img, outputs=encoder_output)   # encoder model
autoencoder = Model(inputs=input_img, outputs=decoded)      # autoencoder model

autoencoder.compile(optimizer='adam', loss='mse')   # compile the autoencoder
# The autoencoder's input and target are the same data
autoencoder.fit(x_train, x_train, epochs=10, batch_size=256, shuffle=True)
 
# plotting  
encoded_imgs = encoder.predict(x_test)  
decoded_imgs = autoencoder.predict(x_test)
 
# Original image
ax = plt.subplot(1, 2, 1)
plt.imshow(x_test[1].reshape(28, 28))
plt.gray()
ax.get_xaxis().set_visible(False)
ax.get_yaxis().set_visible(False)

# Autoencoder output
ax = plt.subplot(1, 2, 2)
plt.imshow(decoded_imgs[1].reshape(28, 28))
plt.gray()
ax.get_xaxis().set_visible(False)
ax.get_yaxis().set_visible(False)
plt.show()

Image 1 is the original; image 2 is the reconstruction after compression and decoding.

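Convolutional autoencoder: building an autoencoder from convolutional layers

When the input is an image, a convolutional neural network is the better choice. The encoder of a convolutional autoencoder consists of convolutional layers and MaxPooling layers, where MaxPooling performs spatial downsampling. The decoder consists of convolutional layers and upsampling layers.

keras.layers.MaxPooling2D((2, 2), padding='same')   # downsampling
keras.layers.UpSampling2D((2, 2))                   # upsampling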

from keras.layers import Input, Convolution2D, MaxPooling2D, UpSampling2D  
from keras.models import Model  
import numpy as np  
import matplotlib.pyplot as plt  
from keras.callbacks import TensorBoard  
 
# Load the data
path = './mnist.npz'
f = np.load(path)
x_train, y_train = f['x_train'], f['y_train']
x_test, y_test = f['x_test'], f['y_test']
f.close()
  
# Preprocess the data
x_train = x_train.astype('float32') / 255.        # minmax_normalized  
x_test = x_test.astype('float32') / 255.          # minmax_normalized  
x_train = np.reshape(x_train, (len(x_train), 28, 28, 1))      # shape (60000, 28, 28, 1)
x_test = np.reshape(x_test, (len(x_test), 28, 28, 1))          # shape (10000, 28, 28, 1)        

input_img = Input(shape=(28, 28, 1))  

# Encoder
x = Convolution2D(16, (3, 3), activation='relu', padding='same')(input_img)  
print(x.shape)        # (?, 28, 28, 16)
x = MaxPooling2D((2, 2), padding='same')(x)  
print(x.shape)        # (?, 14, 14, 16)
x = Convolution2D(8, (3, 3), activation='relu', padding='same')(x)  
print(x.shape)        # (?, 14, 14, 8)
x = MaxPooling2D((2, 2), padding='same')(x)  
print(x.shape)        # (?, 7, 7, 8)
x = Convolution2D(8, (3, 3), activation='relu', padding='same')(x)  
print(x.shape)        # (?, 7, 7, 8)
encoded = MaxPooling2D((2, 2), padding='same')(x)  
print(encoded.shape)        # (?, 4, 4, 8)

# Decoder
x = Convolution2D(8, (3, 3), activation='relu', padding='same')(encoded)  
print(x.shape)        # (?, 4, 4, 8)
x = UpSampling2D((2, 2))(x)  
print(x.shape)        # (?, 8, 8, 8)
x = Convolution2D(8, (3, 3), activation='relu', padding='same')(x)  
print(x.shape)        # (?, 8, 8, 8)
x = UpSampling2D((2, 2))(x)  
print(x.shape)        # (?, 16, 16, 8)
x = Convolution2D(16, (3, 3), activation='relu')(x)  
print(x.shape)        # (?, 14, 14, 16)
x = UpSampling2D((2, 2))(x)  
print(x.shape)        # (?, 28, 28, 16)
decoded = Convolution2D(1, (3, 3), activation='sigmoid', padding='same')(x)  
print(decoded.shape)        # (?, 28, 28, 1)
  
autoencoder = Model(inputs=input_img, outputs=decoded)  
autoencoder.compile(optimizer='adadelta', loss='binary_crossentropy')  

# The autoencoder's input and target are the same data
autoencoder.fit(x_train, x_train, epochs=10, batch_size=256,  
                shuffle=True, validation_data=(x_test, x_test),  
                callbacks=[TensorBoard(log_dir='autoencoder')])  
  
decoded_imgs = autoencoder.predict(x_test)  

# Original image
ax = plt.subplot(1, 2, 1)
plt.imshow(x_test[1].reshape(28, 28))
plt.gray()
ax.get_xaxis().set_visible(False)
ax.get_yaxis().set_visible(False)

# Autoencoder output
ax = plt.subplot(1, 2, 2)
plt.imshow(decoded_imgs[1].reshape(28, 28))
plt.gray()
ax.get_xaxis().set_visible(False)
ax.get_yaxis().set_visible(False)
plt.show()

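Note: with padding "same" the input is zero-padded, so the spatial shape after a convolution depends only on the stride; with padding "valid" nothing is padded and positions where the kernel does not fit are dropped. In both cases the last dimension equals the number of convolution filters.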
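The shape comments in the listing above can be reproduced with a few lines of arithmetic. This is a sketch under the usual conventions (output size `ceil(n / stride)` for "same" and `floor((n - kernel) / stride) + 1` for "valid"); the helper `out_size` is a hypothetical name, not a Keras function.

```python
import math

def out_size(n, kernel=3, stride=1, padding="same"):
    """Spatial output size of a convolution or pooling layer along one axis."""
    if padding == "same":
        return math.ceil(n / stride)
    if padding == "valid":
        return (n - kernel) // stride + 1
    raise ValueError(f"unknown padding: {padding}")

# Encoder: three rounds of 3x3 'same' convolution followed by 2x2 'same' max pooling
sizes = [28]
for _ in range(3):
    n = out_size(sizes[-1], kernel=3, stride=1, padding="same")     # convolution keeps the size
    sizes.append(out_size(n, kernel=2, stride=2, padding="same"))   # pooling halves it, rounding up

# Decoder: each upsampling doubles the size; the single 'valid' convolution trims 16 to 14
n = sizes[-1] * 2                               # 4 -> 8
n = n * 2                                       # 8 -> 16
n = out_size(n, kernel=3, padding="valid")      # 16 -> 14
decoded_size = n * 2                            # 14 -> 28

print(sizes, decoded_size)
```

The one convolution without `padding='same'` in the decoder is what brings the size back to exactly 28 after the 7 to 4 rounding in the encoder.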

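Regularized autoencoders

Besides adding a hidden layer that is smaller than the input, other methods can be used to constrain the reconstruction, such as regularized autoencoders.

A regularized autoencoder does not need a shallow encoder and decoder or a small code dimension to limit model capacity. Instead, it uses the loss function to encourage the model to learn properties other than copying the input to the output. These properties include sparse representations, representations with small derivatives, and robustness to noise or missing inputs.

In practice, two kinds of regularized autoencoder are commonly used: the sparse autoencoder and the denoising autoencoder.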

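Sparse autoencoder

Sparse autoencoders are typically used to learn features for tasks such as classification. A sparsity-regularized autoencoder must respond to the distinctive statistical features of the training dataset rather than simply act as an identity function. Trained this way, the reconstruction task with a sparsity penalty yields a model that learns useful features.

Another way to constrain the reconstruction is to impose a constraint on the loss function. For example, adding a regularization term to the loss makes the autoencoder learn a sparse representation of the data.

Note that the hidden layer in the code below also carries an L1 regularizer, which acts as a penalty term in the loss during optimization. Compared with the basic autoencoder, the resulting representation is sparser.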
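The effect of the penalty can be illustrated with a small NumPy sketch: the loss is the reconstruction error plus a weight times the sum of absolute hidden activations. The function `sparse_loss` and the toy arrays are assumptions made for illustration; how Keras scales the penalty over a batch is not modeled here.

```python
import numpy as np

def sparse_loss(x, x_recon, hidden, l1_weight=1e-4):
    """Mean squared reconstruction error plus an L1 penalty on the hidden activations."""
    reconstruction = np.mean((x - x_recon) ** 2)
    penalty = l1_weight * np.sum(np.abs(hidden))
    return float(reconstruction + penalty)

x = np.array([[0.0, 1.0, 0.5, 0.2]])
x_recon = np.array([[0.1, 0.9, 0.5, 0.2]])

dense_code = np.array([[0.8, 0.7, 0.9, 0.6]])    # every hidden unit is active
sparse_code = np.array([[0.0, 0.0, 0.9, 0.0]])   # only one hidden unit is active

loss_dense = sparse_loss(x, x_recon, dense_code)
loss_sparse = sparse_loss(x, x_recon, sparse_code)
print(loss_dense, loss_sparse)
```

For the same reconstruction quality, the code with fewer active units has the lower loss, which is what pushes training toward sparse representations.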

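# ------- Sparse autoencoder ------- #
from keras import regularizers
from keras.layers import Input, Dense
from keras.models import Model

# x_train, x_test: the flattened (n, 784) MNIST arrays prepared in the basic autoencoder example
x = Input(shape=(784,))
# The only difference from the vanilla autoencoder is the extra regularization term
h = Dense(32, activation='relu', activity_regularizer=regularizers.l1(10e-5))(x)   # encoder
r = Dense(784, activation='sigmoid')(h)     # decoder

autoencoder = Model(inputs=x, outputs=r)
autoencoder.compile(optimizer='adam', loss='mse')

history = autoencoder.fit(x_train, x_train, batch_size=128, epochs=15, verbose=1, validation_data=(x_test, x_test))
decoded_imgs = autoencoder.predict(x_test)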

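Denoising autoencoder

Here, useful information is learned by changing the reconstruction error term of the loss function.

Noise is added to the training data, and the autoencoder learns to remove it and recover the true, uncorrupted input. This forces the encoder to extract the most important features and to learn a more robust representation of the input data, which is why it generalizes better than an ordinary autoencoder.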
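Additive Gaussian noise is one way to corrupt the input; another common choice, which simulates missing inputs, is to zero out a random subset of pixels. The sketch below shows that variant in NumPy. The function `mask_pixels`, the drop probability, and the random stand-in images are assumptions for illustration only.

```python
import numpy as np

def mask_pixels(x, drop_prob, rng):
    """Set each entry of x to zero independently with probability drop_prob."""
    keep = rng.random(x.shape) >= drop_prob
    return x * keep

rng = np.random.default_rng(0)
clean = rng.random((4, 28, 28, 1)).astype("float32")   # stand-in for normalized images
corrupted = mask_pixels(clean, drop_prob=0.3, rng=rng)

# Training pairs would be (corrupted, clean): corrupted input, clean target
print(corrupted.shape, float((corrupted == 0).mean()))
```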

# -*- encoding:utf-8 -*-
import keras
import numpy as np
import matplotlib.pyplot as plt
from keras.models import Model
from keras.layers import Input, Conv2D, MaxPooling2D, UpSampling2D

f = np.load('./mnist.npz')
X_train, _ = f['x_train'], f['y_train']
X_test, _ = f['x_test'], f['y_test']
f.close()

X_train = X_train.astype("float32") / 255.  # scale to [0, 1]
X_test = X_test.astype("float32") / 255.  # scale to [0, 1]
X_train = X_train.reshape(X_train.shape[0], 28, 28, 1)  # (60000, 28, 28, 1)
X_test = X_test.reshape(X_test.shape[0], 28, 28, 1)  # (10000, 28, 28, 1)

# Create the noisy data
noise_factor = 0.5  # noise strength
X_train_noisy = X_train + noise_factor * np.random.normal(0.0, 1.0, X_train.shape)
X_test_noisy = X_test + noise_factor * np.random.normal(0.0, 1.0, X_test.shape)

X_train_noisy = np.clip(X_train_noisy, 0., 1.)
X_test_noisy = np.clip(X_test_noisy, 0., 1.)

# ------- Denoising autoencoder ------- #
input_img = Input(shape=(28, 28, 1))

# Encoder
x = Conv2D(32, (3, 3), activation='relu', padding='same')(input_img)  # (?, 28, 28, 32)
x = MaxPooling2D((2, 2), padding='same')(x)  # (?, 14, 14, 32)
x = Conv2D(32, (3, 3), activation='relu', padding='same')(x)  # (?, 14, 14, 32)
encoded = MaxPooling2D((2, 2), padding='same')(x)  # (?, 7, 7, 32)
print(encoded.shape)

# Decoder
x = Conv2D(32, (3, 3), activation='relu', padding='same')(encoded)  # (?, 7, 7, 32)
x = UpSampling2D((2, 2))(x)  # (?, 14, 14, 32)
x = Conv2D(32, (3, 3), activation='relu', padding='same')(x)  # (?, 14, 14, 32)
x = UpSampling2D((2, 2))(x)  # (?, 28, 28, 32)
decoded = Conv2D(1, (3, 3), activation='sigmoid', padding='same')(x)  # (?, 28, 28, 1)

autoencoder = Model(inputs=input_img, outputs=decoded)
autoencoder.compile(optimizer='adadelta', loss='binary_crossentropy')

# The input is the noisy data, the target is the clean data
history = autoencoder.fit(X_train_noisy, X_train, batch_size=128, epochs=3,
                          verbose=1, validation_data=(X_test_noisy, X_test))
decoded_imgs = autoencoder.predict(X_test_noisy)

# Noisy input
ax = plt.subplot(1, 2, 1)
plt.imshow(X_test_noisy[1].reshape(28, 28))
plt.gray()
ax.get_xaxis().set_visible(False)
ax.get_yaxis().set_visible(False)

# Denoised output of the autoencoder
ax = plt.subplot(1, 2, 2)
plt.imshow(decoded_imgs[1].reshape(28, 28))
plt.gray()
ax.get_xaxis().set_visible(False)
ax.get_yaxis().set_visible(False)
plt.show()

The left image is the noisy data; the right image is the data after denoising by the autoencoder.

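Sequence-to-sequence autoencoder

If the input is a sequence rather than a 2D image, the autoencoder should be built from a sequence model such as an LSTM. To build an LSTM-based autoencoder, first use an LSTM encoder to turn the input sequence into a single vector, then repeat this vector N times, and finally use an LSTM decoder to turn this N-step sequence into the target sequence.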
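The "repeat the vector N times" step is the only part that is not a recurrent layer, and it is easy to picture in NumPy. The sketch below mimics what that step does to the shapes; `repeat_vector` is a hypothetical helper written for this illustration, and the toy batch stands in for the encoder output.

```python
import numpy as np

def repeat_vector(codes, timesteps):
    """Repeat each code along a new time axis: (batch, dim) -> (batch, timesteps, dim)."""
    return np.repeat(codes[:, np.newaxis, :], timesteps, axis=1)

codes = np.array([[1.0, 2.0, 3.0],
                  [4.0, 5.0, 6.0]])          # batch of 2 encoded vectors, latent size 3

decoder_input = repeat_vector(codes, timesteps=4)
print(decoder_input.shape)   # one copy of the code per time step
```

The decoder therefore sees the same summary vector at every time step and has to unroll it into a sequence.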

from keras.layers import Input, LSTM, RepeatVector
from keras.models import Model

# timesteps, input_dim and latent_dim must be set to match your data
inputs = Input(shape=(timesteps, input_dim))
encoded = LSTM(latent_dim)(inputs)

decoded = RepeatVector(timesteps)(encoded)
decoded = LSTM(input_dim, return_sequences=True)(decoded)

sequence_autoencoder = Model(inputs, decoded)
encoder = Model(inputs, encoded)

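Variational autoencoder (VAE): encoding the data distribution

The variational autoencoder is a more modern and interesting kind of autoencoder. It places constraints on the code so that the encoder learns a latent variable model of the input data. A latent variable model is a statistical model that connects a set of observed variables with a set of latent variables; it assumes that the observed variables are controlled by the state of the latent variables and are conditionally independent of one another. In other words, instead of learning an arbitrary function, a variational autoencoder learns the parameters of a probability distribution that models your data. By sampling from this distribution you can generate new input data, so a variational autoencoder is a generative model.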

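Here is how a variational autoencoder works:

First, the encoder network maps an input sample x to two parameters in the latent space, written z_mean and z_log_sigma. Then we randomly sample a point z from the latent normal distribution, which we assume is the distribution that generated the input data: z = z_mean + exp(z_log_sigma) * epsilon, where epsilon is a random tensor drawn from a normal distribution. Finally, the decoder network maps the latent space back to the observed space, i.e., it converts z back to the original input space.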
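The sampling step can be checked numerically outside Keras. The sketch below applies the formula z = z_mean + exp(z_log_sigma) * epsilon in NumPy with epsilon drawn from a standard normal distribution; the function `sample_z` and the chosen means and standard deviations are assumptions for illustration.

```python
import numpy as np

def sample_z(z_mean, z_log_sigma, rng):
    """Draw z = z_mean + exp(z_log_sigma) * epsilon with epsilon ~ N(0, 1)."""
    epsilon = rng.standard_normal(z_mean.shape)
    return z_mean + np.exp(z_log_sigma) * epsilon

rng = np.random.default_rng(0)
n = 100_000
z_mean = np.tile([1.0, -2.0], (n, 1))                  # target means per latent dimension
z_log_sigma = np.tile(np.log([0.5, 2.0]), (n, 1))      # target standard deviations 0.5 and 2.0

z = sample_z(z_mean, z_log_sigma, rng)
print(z.mean(axis=0), z.std(axis=0))   # close to the requested means and standard deviations
```

Writing the sample as a deterministic function of z_mean, z_log_sigma, and external noise is what lets gradients flow back into the encoder.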

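The parameters are trained with two loss terms. The first is a reconstruction loss, which requires the decoded samples to resemble the inputs (as in the previous autoencoders). The second is the KL divergence between the learned latent distribution and the prior distribution, which acts as a regularizer. The model can be trained without this second term, although it helps in learning a well-formed latent space and in reducing overfitting.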
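For a diagonal Gaussian with mean z_mean and standard deviation exp(z_log_sigma), the KL divergence to a standard normal prior has a closed form: 0.5 * (mean^2 + sigma^2 - 1 - 2 * log sigma) per latent dimension. The sketch below evaluates it in plain Python, assuming that z_log_sigma is the log of the standard deviation as in the sampling formula; `kl_to_standard_normal` is a hypothetical helper name.

```python
import math

def kl_to_standard_normal(z_mean, z_log_sigma):
    """KL( N(mean, sigma^2) || N(0, 1) ) summed over latent dimensions, sigma = exp(z_log_sigma)."""
    return sum(
        0.5 * (m * m + math.exp(2.0 * s) - 1.0 - 2.0 * s)
        for m, s in zip(z_mean, z_log_sigma)
    )

kl_zero = kl_to_standard_normal([0.0, 0.0], [0.0, 0.0])     # posterior equals the prior
kl_shift = kl_to_standard_normal([1.0, 0.0], [0.0, 0.0])    # mean moved by one unit in one dimension
kl_narrow = kl_to_standard_normal([0.0, 0.0], [-1.0, 0.0])  # one dimension narrower than the prior

print(kl_zero, kl_shift, kl_narrow)
```

The term is zero only when the latent distribution matches the prior, so it pulls every encoded distribution toward the same standard normal.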

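Because the VAE is a fairly complex example, the full VAE code is available on GitHub. Here we walk through how the model is built, step by step.

First, build the encoder network, which maps the input to the parameters of the latent distribution: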

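from keras import backend as K
from keras.layers import Input, Dense, Lambda
# Assumed example values; the text only fixes the 784-pixel input and the 2-D latent space
batch_size, original_dim, intermediate_dim, latent_dim = 100, 784, 256, 2
x = Input(batch_shape=(batch_size, original_dim))
h = Dense(intermediate_dim, activation='relu')(x)
z_mean = Dense(latent_dim)(h)
z_log_sigma = Dense(latent_dim)(h)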

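Then sample from the distribution determined by these parameters; the sample plays the role of the hidden-layer value in the earlier autoencoders:

epsilon_std = 1.0  # assumed value; the text does not specify it

def sampling(args):
    z_mean, z_log_sigma = args
    epsilon = K.random_normal(shape=(batch_size, latent_dim),
                              mean=0., stddev=epsilon_std)
    return z_mean + K.exp(z_log_sigma) * epsilon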

# note that "output_shape" isn't necessary with the TensorFlow backend
# so you could write `Lambda(sampling)([z_mean, z_log_sigma])`
z = Lambda(sampling, output_shape=(latent_dim,))([z_mean, z_log_sigma])

Finally, map the sampled point back to reconstruct the original input:

decoder_h = Dense(intermediate_dim, activation='relu')
decoder_mean = Dense(original_dim, activation='sigmoid')
h_decoded = decoder_h(z)
x_decoded_mean = decoder_mean(h_decoded)

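With the work so far, we need to instantiate three models:

An end-to-end autoencoder that reconstructs the input signal
An encoder that maps the input space to the latent space
A generator that takes points sampled from the latent distribution and produces the corresponding reconstructed samples

from keras.models import Model

# end-to-end autoencoder
vae = Model(x, x_decoded_mean)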

# encoder, from inputs to latent space
encoder = Model(x, z_mean)

# generator, from latent space to reconstructed inputs
decoder_input = Input(shape=(latent_dim,))
_h_decoded = decoder_h(decoder_input)
_x_decoded_mean = decoder_mean(_h_decoded)
generator = Model(decoder_input, _x_decoded_mean)

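We train the end-to-end model. The loss is the sum of a reconstruction term and a KL divergence term:

from keras import losses

def vae_loss(x, x_decoded_mean):
    # Reconstruction term: binary cross-entropy between input and reconstruction
    xent_loss = losses.binary_crossentropy(x, x_decoded_mean)
    # KL term written for sigma = exp(z_log_sigma), consistent with the sampling function
    kl_loss = - 0.5 * K.mean(1 + 2 * z_log_sigma - K.square(z_mean) - K.exp(2 * z_log_sigma), axis=-1)
    return xent_loss + kl_loss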

vae.compile(optimizer='rmsprop', loss=vae_loss)

Now train the variational autoencoder on the MNIST dataset:

import numpy as np
from keras.datasets import mnist

(x_train, y_train), (x_test, y_test) = mnist.load_data()

x_train = x_train.astype('float32') / 255.
x_test = x_test.astype('float32') / 255.
x_train = x_train.reshape((len(x_train), np.prod(x_train.shape[1:])))
x_test = x_test.reshape((len(x_test), np.prod(x_test.shape[1:])))

nb_epoch = 50  # assumed value; the text does not specify the number of epochs
vae.fit(x_train, x_train,
        shuffle=True,
        epochs=nb_epoch,
        batch_size=batch_size,
        validation_data=(x_test, x_test))

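Because the latent space has only two dimensions, it can be visualized. Here is how the different digit classes are distributed in the 2D plane:

import matplotlib.pyplot as plt

x_test_encoded = encoder.predict(x_test, batch_size=batch_size)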
plt.figure(figsize=(6, 6))
plt.scatter(x_test_encoded[:, 0], x_test_encoded[:, 1], c=y_test)
plt.colorbar()
plt.show()

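Each color in the resulting plot represents one digit; digits that cluster close together are structurally similar.

Because the variational autoencoder is a generative model, it can be used to generate new digits. We sample points from the latent plane and generate the corresponding observed variables, i.e., MNIST digits: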

# display a 2D manifold of the digits
n = 15  # figure with 15x15 digits
digit_size = 28
figure = np.zeros((digit_size * n, digit_size * n))
# we will sample n points within [-15, 15] standard deviations
grid_x = np.linspace(-15, 15, n)
grid_y = np.linspace(-15, 15, n)

for i, yi in enumerate(grid_x):
    for j, xi in enumerate(grid_y):
        z_sample = np.array([[xi, yi]]) * epsilon_std
        x_decoded = generator.predict(z_sample)
        digit = x_decoded[0].reshape(digit_size, digit_size)
        figure[i * digit_size: (i + 1) * digit_size,
               j * digit_size: (j + 1) * digit_size] = digit

plt.figure(figsize=(10, 10))
plt.imshow(figure)
plt.show()