As is well known, training data for machine learning is expensive because it requires large amounts of manual labeling.
An autoencoder has input and output of the same dimensionality, so during training the output end can be matched directly against the input data, which amounts to unsupervised training. In addition, an autoencoder performs dimensionality reduction: although input and output have the same dimension, the middle layer can be much smaller, yielding a condensed representation of the data.
Autoencoders can be used for pretraining: for deep models that are hard to train, first pin down the network structure this way, then fine-tune with labeled training data.
Certain types of autoencoders can act as generative models and produce new instances, for example automatic poetry generation.
Data representation
Human memory is strongly tied to patterns in data. Ask a skilled chess player to memorize a board position and they show remarkable recall; face them with a randomly scrambled board and their memory is no better than an average person's. This illustrates the power of patterns: memorizing through the relationships within data is far more efficient.
Because its middle layer reduces the dimensionality, an autoencoder is forced to find an internal pattern in the data, which lets it memorize the training data efficiently.
As shown in the figure below, the middle layer is usually given a small dimensionality, producing an efficient representation.
If the training is entirely linear and the cost function is MSE, the trained autoencoder is equivalent to PCA.
import numpy as np
import numpy.random as rnd

# build the dataset
rnd.seed(4)
m = 200
w1, w2 = 0.1, 0.3
noise = 0.1

angles = rnd.rand(m) * 3 * np.pi / 2 - 0.5
data = np.empty((m, 3))
data[:, 0] = np.cos(angles) + np.sin(angles)/2 + noise * rnd.randn(m) / 2
data[:, 1] = np.sin(angles) * 0.7 + noise * rnd.randn(m) / 2
data[:, 2] = data[:, 0] * w1 + data[:, 1] * w2 + noise * rnd.randn(m)

# normalize the training set
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
X_train = scaler.fit_transform(data[:100])
X_test = scaler.transform(data[100:])

# build the autoencoder
import tensorflow as tf
from tensorflow.contrib.layers import fully_connected

n_inputs = 3  # 3D inputs
n_hidden = 2  # 2D codings
n_outputs = n_inputs  # force the output layer to match the input layer

learning_rate = 0.01

X = tf.placeholder(tf.float32, shape=[None, n_inputs])
# fully connect the input layer to the hidden layer;
# activation_fn=None means no nonlinearity is applied
hidden = fully_connected(X, n_hidden, activation_fn=None)
outputs = fully_connected(hidden, n_outputs, activation_fn=None)

# the loss function is the mean squared error
reconstruction_loss = tf.reduce_mean(tf.square(outputs - X))  # MSE

optimizer = tf.train.AdamOptimizer(learning_rate)
training_op = optimizer.minimize(reconstruction_loss)

init = tf.global_variables_initializer()

# execution phase (X_train and X_test were built above)
n_iterations = 1000
# the output of the hidden layer provides the codings
codings = hidden

with tf.Session() as sess:
    init.run()
    for iteration in range(n_iterations):
        # no labels (unsupervised)
        training_op.run(feed_dict={X: X_train})
    codings_val = codings.eval(feed_dict={X: X_test})
The effect of the hidden layer is shown in the figure below: the 3D data on the left is cut along an optimal plane and mapped onto a 2D surface.
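To view this projection directly, the 2D codings evaluated above can be plotted; a minimal sketch, assuming matplotlib (the styling values are illustrative):

import matplotlib.pyplot as plt

# codings_val holds the 2D codings computed from X_test above
fig = plt.figure(figsize=(4, 3))
plt.plot(codings_val[:, 0], codings_val[:, 1], "b.")
plt.xlabel("$z_1$", fontsize=18)
plt.ylabel("$z_2$", fontsize=18, rotation=0)
plt.show()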
stacked autoencoder
Use multiple hidden layers, arranged so that input and output form a symmetric structure, as shown below: from the input to the middle is the encoder, from the middle to the output is the decoder.
But once the layers deepen, training runs into many difficulties. In the code below, l2 regularization is applied and ELU is used as the activation function:
n_inputs = 28 * 28  # for MNIST
n_hidden1 = 300
n_hidden2 = 150  # codings
n_hidden3 = n_hidden1
n_outputs = n_inputs

learning_rate = 0.01
l2_reg = 0.001

X = tf.placeholder(tf.float32, shape=[None, n_inputs])

# arg_scope fills in common parameters for fully_connected, e.g. applying
# l2_regularizer uniformly; the four fully_connected calls below all take
# the defaults written in this with-block
with tf.contrib.framework.arg_scope(
        [fully_connected],
        activation_fn=tf.nn.elu,
        weights_initializer=tf.contrib.layers.variance_scaling_initializer(),
        weights_regularizer=tf.contrib.layers.l2_regularizer(l2_reg)):
    hidden1 = fully_connected(X, n_hidden1)
    hidden2 = fully_connected(hidden1, n_hidden2)  # codings
    hidden3 = fully_connected(hidden2, n_hidden3)
    # the last layer overrides the earlier default with activation_fn=None
    outputs = fully_connected(hidden3, n_outputs, activation_fn=None)

# since regularization was set up above, the accumulated losses can be pulled
# straight out of REGULARIZATION_LOSSES and added to reconstruction_loss
reconstruction_loss = tf.reduce_mean(tf.square(outputs - X))  # MSE
reg_losses = tf.get_collection(tf.GraphKeys.REGULARIZATION_LOSSES)
loss = tf.add_n([reconstruction_loss] + reg_losses)

optimizer = tf.train.AdamOptimizer(learning_rate)
training_op = optimizer.minimize(loss)

init = tf.global_variables_initializer()

n_epochs = 5
batch_size = 150

with tf.Session() as sess:
    init.run()
    for epoch in range(n_epochs):
        n_batches = mnist.train.num_examples // batch_size
        for iteration in range(n_batches):
            X_batch, y_batch = mnist.train.next_batch(batch_size)
            # only X values are fed; no labels
            sess.run(training_op, feed_dict={X: X_batch})
Since the autoencoder is symmetric in its weights, the weights can also be tied (shared), cutting the parameter count in half, reducing the risk of overfitting, and improving training efficiency.
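A minimal sketch of weight tying, building the layers by hand rather than with fully_connected (variable names follow the four-layer network above; the initialization constants are illustrative):

activation = tf.nn.elu

# only the encoder weights are variables; the decoder reuses their transposes
weights1 = tf.Variable(tf.truncated_normal([n_inputs, n_hidden1], stddev=0.1))
weights2 = tf.Variable(tf.truncated_normal([n_hidden1, n_hidden2], stddev=0.1))
weights3 = tf.transpose(weights2)  # tied to weights2
weights4 = tf.transpose(weights1)  # tied to weights1

# biases are never tied
biases1 = tf.Variable(tf.zeros(n_hidden1))
biases2 = tf.Variable(tf.zeros(n_hidden2))
biases3 = tf.Variable(tf.zeros(n_hidden1))
biases4 = tf.Variable(tf.zeros(n_inputs))

hidden1 = activation(tf.matmul(X, weights1) + biases1)
hidden2 = activation(tf.matmul(hidden1, weights2) + biases2)
hidden3 = activation(tf.matmul(hidden2, weights3) + biases3)
outputs = tf.matmul(hidden3, weights4) + biases4

Note that only weights1 and weights2 are trainable variables here, so a regularizer should be applied to those two alone.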
A common training approach is layer-wise training: train hidden1 first and freeze its weights, then train hidden2, then mirror the result (hidden3 corresponds exactly to hidden1) to obtain the final network.
Alternatively, define separate name scopes and train the phases one at a time:
[...] # Build the whole stacked autoencoder normally.
      # In this example, the weights are not tied.

optimizer = tf.train.AdamOptimizer(learning_rate)

with tf.name_scope("phase1"):
    phase1_outputs = tf.matmul(hidden1, weights4) + biases4
    phase1_reconstruction_loss = tf.reduce_mean(tf.square(phase1_outputs - X))
    phase1_reg_loss = regularizer(weights1) + regularizer(weights4)
    phase1_loss = phase1_reconstruction_loss + phase1_reg_loss
    phase1_training_op = optimizer.minimize(phase1_loss)

# while phase2 trains, the phase1 layers stay frozen
with tf.name_scope("phase2"):
    phase2_reconstruction_loss = tf.reduce_mean(tf.square(hidden3 - hidden1))
    phase2_reg_loss = regularizer(weights2) + regularizer(weights3)
    phase2_loss = phase2_reconstruction_loss + phase2_reg_loss
    train_vars = [weights2, biases2, weights3, biases3]
    phase2_training_op = optimizer.minimize(phase2_loss, var_list=train_vars)
Pretraining
If a large amount of data is unlabeled and only a little is labeled, use the unlabeled data for unsupervised pretraining in a first phase, then take the encoder part as-is and rebuild the output part. This reduces the overfitting caused by having too little labeled data, e.g. the fully connected layer and the softmax output in the figure below.
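A minimal sketch of the reuse step, assuming weights1/biases1 and weights2/biases2 were kept from the pretrained encoder, with a hypothetical 10-class softmax head on top:

X = tf.placeholder(tf.float32, shape=[None, n_inputs])
y = tf.placeholder(tf.int32, shape=[None])

# reuse the pretrained encoder layers
hidden1 = tf.nn.elu(tf.matmul(X, weights1) + biases1)
hidden2 = tf.nn.elu(tf.matmul(hidden1, weights2) + biases2)
# new, randomly initialized classification head replacing the decoder
logits = fully_connected(hidden2, 10, activation_fn=None)

xentropy = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=y,
                                                          logits=logits)
loss = tf.reduce_mean(xentropy)
# pass var_list here to train only the new head and keep the encoder frozen
training_op = tf.train.AdamOptimizer(0.001).minimize(loss)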
Denoising Autoencoder
Noise is deliberately injected into the input, and what is learned in the end is the noise-free result. Dropout layers can also be added during training, removing part of the network structure (not applied at test time). Both make training harder, which improves the network's robustness and makes the model more stable.
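A minimal sketch of the corruption step, assuming the stacked network above; only the input the encoder sees is corrupted, while the reconstruction target stays clean (noise_level is an illustrative hyperparameter):

noise_level = 1.0  # illustrative corruption strength

X = tf.placeholder(tf.float32, shape=[None, n_inputs])
# corrupt only the input fed to the encoder
X_noisy = X + noise_level * tf.random_normal(tf.shape(X))
hidden1 = fully_connected(X_noisy, n_hidden1)
# ... remaining layers as in the stacked autoencoder above ...
# the loss still compares against the clean input X
reconstruction_loss = tf.reduce_mean(tf.square(outputs - X))

A dropout variant replaces the additive noise with tf.contrib.layers.dropout(X, keep_prob=0.7, is_training=is_training), switched off at test time.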
sparse Autoencoder
The number of activated neurons in the coding layer is constrained by an upper threshold, making the layer very sparse: only a few neurons carry data. As with concise writing, this forces the middle layer to summarize, increasing its ability to distill and express the information.
There are two common ways to impose the sparsity penalty: the first adds a squared error term, the second uses the KL divergence; the figure below compares the KL divergence against MSE.
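Concretely, for a target sparsity p and the measured mean activation q of a neuron over a batch, the (Bernoulli) KL divergence used in the code below is

\[ \mathrm{KL}(p \,\|\, q) = p \log\frac{p}{q} + (1-p)\log\frac{1-p}{1-q} \]

Unlike the MSE penalty (p - q)^2, it grows without bound as q approaches 0 or 1, giving much stronger gradients far from the target.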
def kl_divergence(p, q):
    return p * tf.log(p / q) + (1 - p) * tf.log((1 - p) / (1 - q))

learning_rate = 0.01
sparsity_target = 0.1
sparsity_weight = 0.2

[...] # Build a normal autoencoder (the coding layer is hidden1)

optimizer = tf.train.AdamOptimizer(learning_rate)

hidden1_mean = tf.reduce_mean(hidden1, axis=0)  # batch mean
sparsity_loss = tf.reduce_sum(kl_divergence(sparsity_target, hidden1_mean))
reconstruction_loss = tf.reduce_mean(tf.square(outputs - X))  # MSE
loss = reconstruction_loss + sparsity_weight * sparsity_loss
training_op = optimizer.minimize(loss)

# the KL divergence cannot take the value 0, so tanh cannot serve as the
# activation; use sigmoid instead, whose range is (0, 1). A variant also
# replaces the MSE with sigmoid cross-entropy:
hidden1 = tf.nn.sigmoid(tf.matmul(X, weights1) + biases1)
# [...]
logits = tf.matmul(hidden1, weights2) + biases2
outputs = tf.nn.sigmoid(logits)

reconstruction_loss = tf.reduce_sum(
    tf.nn.sigmoid_cross_entropy_with_logits(labels=X, logits=logits))
Variational Autoencoder
The output is determined by sampling, so the model is probabilistic and random at inference time. It is generative: the outputs are related to the training set but merely resemble it; each one is a completely new instance.
As the figure below shows, the middle layer adds Gaussian noise parameterized by a learned mean and variance. What the middle layer learns is therefore not a simple coding but the pattern of the data: the training data is mapped onto a normal distribution, so the output layer can produce data that closely resembles the input yet differs from it.
At generation time, the encoder is discarded; feed random Gaussian noise into the coding layer and a completely new instance appears at the output.
That is, the input passes through the NN encoder to produce two codings; one of them, after some processing, is multiplied by Gaussian noise (a vector of normally distributed noise), and the product is added to the other coding to form the final middle coding. The figure below follows the same idea as the one above: the generated output must minimize the reconstruction loss, i.e. the closer to 0 the better.
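In symbols (the standard VAE formulation, matching the code below): the encoder outputs a mean \(\mu\) and a value \(\gamma = \log\sigma^2\); the coding is sampled as \(z = \mu + \sigma \odot \epsilon\) with \(\epsilon \sim \mathcal{N}(0, I)\), and the latent loss is the KL divergence from \(\mathcal{N}(\mu, \sigma^2)\) to \(\mathcal{N}(0, I)\):

\[ \mathcal{L}_{\text{latent}} = \frac{1}{2}\sum_i\left(\sigma_i^2 + \mu_i^2 - 1 - \log\sigma_i^2\right) = \frac{1}{2}\sum_i\left(e^{\gamma_i} + \mu_i^2 - 1 - \gamma_i\right) \]

The two sums are equal; they correspond to the two equivalent latent_loss definitions in the code.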
# smoothing term to avoid computing log(0)
eps = 1e-10
# minimizing the latent loss maps the original inputs onto a well-behaved
# normal distribution in the coding space; two equivalent forms, the second
# using hidden3_gamma = log(sigma^2)
latent_loss = 0.5 * tf.reduce_sum(
    tf.square(hidden3_sigma) + tf.square(hidden3_mean)
    - 1 - tf.log(eps + tf.square(hidden3_sigma)))
latent_loss = 0.5 * tf.reduce_sum(
    tf.exp(hidden3_gamma) + tf.square(hidden3_mean) - 1 - hidden3_gamma)

n_inputs = 28 * 28  # for MNIST
n_hidden1 = 500
n_hidden2 = 500
n_hidden3 = 20  # codings
n_hidden4 = n_hidden2
n_hidden5 = n_hidden1
n_outputs = n_inputs

learning_rate = 0.001

with tf.contrib.framework.arg_scope(
        [fully_connected],
        activation_fn=tf.nn.elu,
        weights_initializer=tf.contrib.layers.variance_scaling_initializer()):
    X = tf.placeholder(tf.float32, [None, n_inputs])
    hidden1 = fully_connected(X, n_hidden1)
    hidden2 = fully_connected(hidden1, n_hidden2)
    # the middle layer represents a distribution, with noise added
    hidden3_mean = fully_connected(hidden2, n_hidden3, activation_fn=None)
    hidden3_gamma = fully_connected(hidden2, n_hidden3, activation_fn=None)
    hidden3_sigma = tf.exp(0.5 * hidden3_gamma)
    noise = tf.random_normal(tf.shape(hidden3_sigma), dtype=tf.float32)
    # build the subsequent layers on top of the noisy coding
    hidden3 = hidden3_mean + hidden3_sigma * noise
    hidden4 = fully_connected(hidden3, n_hidden4)
    hidden5 = fully_connected(hidden4, n_hidden5)
    logits = fully_connected(hidden5, n_outputs, activation_fn=None)
    outputs = tf.sigmoid(logits)

reconstruction_loss = tf.reduce_sum(
    tf.nn.sigmoid_cross_entropy_with_logits(labels=X, logits=logits))
latent_loss = 0.5 * tf.reduce_sum(
    tf.exp(hidden3_gamma) + tf.square(hidden3_mean) - 1 - hidden3_gamma)
cost = reconstruction_loss + latent_loss

optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate)
training_op = optimizer.minimize(cost)

init = tf.global_variables_initializer()

# generate new digits
import numpy as np
import matplotlib.pyplot as plt

n_digits = 60
n_epochs = 50
batch_size = 150

with tf.Session() as sess:
    init.run()
    for epoch in range(n_epochs):
        n_batches = mnist.train.num_examples // batch_size
        for iteration in range(n_batches):
            X_batch, y_batch = mnist.train.next_batch(batch_size)
            sess.run(training_op, feed_dict={X: X_batch})

    # sample codings from a standard normal and decode them
    codings_rnd = np.random.normal(size=[n_digits, n_hidden3])
    outputs_val = outputs.eval(feed_dict={hidden3: codings_rnd})

# plot_image is a small helper that draws one MNIST digit
for iteration in range(n_digits):
    plt.subplot(n_digits, 10, iteration + 1)
    plot_image(outputs_val[iteration])
The generated results are shown below; none of these images appears in the training set.