Deep learning: 42 (A Simple Understanding of the Denoising Autoencoder)

Date: 2019-11-05


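Preface:

When the weights of a deep network are pretrained layer by layer with an unsupervised method, random noise can be added at the network's visible layer (the input layer) so that the network learns more robust features. This approach is called the denoising autoencoder (dAE). It was proposed in 2008 by Vincent, Bengio and colleagues in the paper Extracting and composing robust features with denoising autoencoders. A dAE learns to reconstruct the original (uncorrupted) data from a corrupted version of the input, so the features it learns are more robust. This post briefly introduces the dAE based on that paper, and then uses two simple experiments to show how to apply a dAE in practice. Both experiments are slightly modified versions of existing tools: one is the MATLAB Deep Learning toolbox, https://github.com/rasmusbergpalm/DeepLearnToolbox, and the other is the Python library Theano, following http://deeplearning.net/tutorial/dA.html.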

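Background:

First, consider the dAE schematic from the paper [figure: the dAE corruption, encoding and reconstruction pipeline].

In this schematic, a sample $x$ is corrupted with random noise drawn from the distribution $q_D$, becoming $\tilde{x} \sim q_D(\tilde{x} \mid x)$. In the paper this is not Gaussian noise: instead, each input-layer node is set to 0 with a certain probability. This is very similar to dropout, introduced in the previous post (Deep learning: 41 (A Simple Understanding of Dropout)), except that dropout acts on the hidden layers. The data fed into the visible layer is now $\tilde{x}$, the hidden-layer output is $y$, and the decoder maps $y$ to the output $z$. Note that $z$ is trained to reconstruct the clean $x$, not $\tilde{x}$.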
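A minimal pure-Python sketch of this pipeline, for illustration only. It assumes masking noise as $q_D$, sigmoid units, a cross-entropy loss and tied weights $W' = W^\top$ (the choices made in the dA.py code of Experiment 2); the function names corrupt and dae_forward are hypothetical.

```python
import math
import random


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def corrupt(x, corruption_level, rng):
    """Masking noise q_D: set each component to 0 with probability corruption_level."""
    return [0.0 if rng.random() < corruption_level else xi for xi in x]


def dae_forward(x, W, b, b_prime, corruption_level, rng):
    """One dAE forward pass. W is a list of n_hidden rows, each of length n_visible."""
    x_tilde = corrupt(x, corruption_level, rng)
    # encoder: y = s(W x_tilde + b)
    y = [sigmoid(sum(w * v for w, v in zip(row, x_tilde)) + bi) for row, bi in zip(W, b)]
    # decoder with tied weights W' = W^T: z = s(W^T y + b')
    z = [sigmoid(sum(W[i][j] * y[i] for i in range(len(y))) + b_prime[j])
         for j in range(len(x))]
    # the loss compares z with the CLEAN x, not with x_tilde
    loss = -sum(v * math.log(zj) + (1 - v) * math.log(1 - zj) for v, zj in zip(x, z))
    return x_tilde, y, z, loss


# Toy usage: 6 visible units, 3 hidden units, 30% masking noise.
rng = random.Random(0)
W = [[rng.uniform(-0.5, 0.5) for _ in range(6)] for _ in range(3)]
x = [1.0, 0.0, 1.0, 1.0, 0.0, 1.0]
x_tilde, y, z, loss = dae_forward(x, W, [0.0] * 3, [0.0] * 6, 0.3, rng)
```

With corruption_level=0 the same function reduces to an ordinary autoencoder forward pass, since corrupt then returns x unchanged.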

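The authors give the following intuitive explanations of the dAE:

1. A dAE resembles the human perceptual system: when part of an object is occluded, a person can still recognize it.

2. When humans receive multimodal information (such as sound and images), missing some of the modalities often matters little.

3. An ordinary autoencoder essentially learns an identity function, i.e. the reconstructed output equals the input. Such an identity-like mapping has a drawback: when test samples do not follow the same distribution as the training samples, i.e. when they differ substantially, it performs poorly. The dAE clearly improves on this.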
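Point 3 can be made concrete with a toy check (hypothetical data, squared error, a hand-picked corruption): copying the input is a perfect solution for the ordinary autoencoder criterion, but not for the denoising criterion, so a dAE cannot reach zero loss simply by learning the identity.

```python
def squared_error(x, z):
    """Summed squared reconstruction error."""
    return sum((xi - zi) ** 2 for xi, zi in zip(x, z))


def identity(v):
    """The trivial 'autoencoder' that copies its input."""
    return list(v)


x = [1.0, 0.0, 1.0, 1.0, 0.0, 1.0]
x_tilde = [1.0, 0.0, 0.0, 1.0, 0.0, 0.0]  # hand-picked corruption: indices 2 and 5 masked to 0

plain_ae_error = squared_error(x, identity(x))         # ordinary AE: reconstruct x from x
denoising_error = squared_error(x, identity(x_tilde))  # dAE: reconstruct x from x_tilde
print(plain_ae_error, denoising_error)  # 0.0 2.0
```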

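Of course, the authors also justify the dAE mathematically, from several perspectives:

1. The manifold learning perspective. High-dimensional data generally lie near a lower-dimensional manifold. Corruption tends to push a sample away from that manifold, and the dAE learns to map it back, so the features learned by a dAE essentially capture this manifold. An ordinary autoencoder, even with a sparsity constraint, does not necessarily learn features that lie on this low-dimensional manifold (although it can still extract the main information in the data).

2. The top-down generative model perspective.

3. The information-theoretic perspective.

4. The stochastic operator perspective.

These explanations involve more mathematics; see the paper for the details.

When a deep network is trained with unsupervised pretraining of its weights, Dropout and the denoising autoencoder are usually applied at different stages. Dropout is not used during layer-wise pretraining and is introduced only in the later fine-tuning stage; the denoising autoencoder, by contrast, corrupts the input of each layer while that layer is being pretrained and is not used during fine-tuning. In addition, the reconstruction error can generally take the form of the mean squared error, but when the elements of the input and output vectors are binary (or, more generally, values in [0, 1]), cross-entropy is usually used to measure the difference between them.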
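The two reconstruction errors can be compared on a toy vector. This is a sketch with made-up numbers; the cross-entropy form matches the per-sample sum computed in get_cost_updates in dA.py (which then averages over the minibatch with T.mean).

```python
import math


def mse(x, z):
    """Mean squared reconstruction error."""
    return sum((xi - zi) ** 2 for xi, zi in zip(x, z)) / len(x)


def cross_entropy(x, z):
    """-sum(x log z + (1 - x) log(1 - z)) for targets x in [0, 1] and outputs z in (0, 1)."""
    return -sum(xi * math.log(zi) + (1 - xi) * math.log(1 - zi) for xi, zi in zip(x, z))


x = [1.0, 0.0, 1.0, 0.0]  # binary target, e.g. a binarized pixel vector
z = [0.9, 0.2, 0.6, 0.1]  # reconstruction produced by sigmoid outputs
print(round(mse(x, z), 4))            # 0.055
print(round(cross_entropy(x, z), 4))  # 0.9447
```

For binary targets only one of the two log terms is active per component, so cross-entropy penalizes a confident wrong output (z near 0 when x = 1) much more heavily than squared error does.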

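Experiments:

Experiment 1:

As in the previous post, this experiment uses the MNIST handwritten digit database, with 60000 training samples and 10000 test samples, and the MATLAB Deep Learning toolbox (https://github.com/rasmusbergpalm/DeepLearnToolbox). There are 2 hidden layers with 100 nodes each, so the overall network structure is 784-100-100-10. The experiment compares the test error rate with and without denoising, together with the features learned in each case. The results are as follows: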
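A quick arithmetic check of the 784-100-100-10 network's weight shapes. It assumes the convention of the toolbox's nnsetup, where each W{i} has size n_out × (n_in + 1) with the bias in the first column; that is why the code visualizes W{1}(:,2:end) and why an autoencoder's encoder W{1} can be copied into the classifier unchanged. weight_shapes is a hypothetical helper.

```python
def weight_shapes(sizes):
    """Shapes of W{1}, ..., W{n-1} when each W{i} is n_out x (n_in + 1), bias in column 1."""
    return [(n_out, n_in + 1) for n_in, n_out in zip(sizes[:-1], sizes[1:])]


nn_shapes = weight_shapes([784, 100, 100, 10])
# saesetup([784 100 100]) builds one autoencoder per layer: [784 100 784] and [100 100 100];
# only their encoder matrices W{1} are copied into the classifier.
ae_encoder_shapes = [weight_shapes([784, 100, 784])[0], weight_shapes([100, 100, 100])[0]]
n_params = sum(rows * cols for rows, cols in nn_shapes)

print(nn_shapes)          # [(100, 785), (100, 101), (10, 101)]
print(ae_encoder_shapes)  # [(100, 785), (100, 101)]
print(n_params)           # 89610
```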

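Features learned by the autoencoder without denoising:

Test error rate: 9.33%

Features learned by the denoising autoencoder:

Test error rate: 8.26%

The results show that the autoencoder trained with added noise learns slightly better features (the parameters were not tuned; with careful tuning the results would likely improve further).

Main code for Experiment 1, with comments: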

　　Test.m:

%% Load data (pixel values scaled to [0, 1])
load mnist_uint8;
train_x = double(train_x)/255;
test_x  = double(test_x)/255;
train_y = double(train_y);
test_y  = double(test_y);

%% Run 1: plain autoencoder pretraining (inputZeroMaskedFraction = 0, i.e. no denoising)
rng(0);
sae = saesetup([784 100 100]); % saesetup already initializes the weights W of each autoencoder randomly
sae.ae{1}.activation_function       = 'sigm';
sae.ae{1}.learningRate              = 1;
sae.ae{1}.inputZeroMaskedFraction   = 0.;
sae.ae{2}.activation_function       = 'sigm';
sae.ae{2}.learningRate              = 1;
sae.ae{2}.inputZeroMaskedFraction   = 0.; % for layer 2 this corruption acts like dropout on hidden layer 1's output, but it is applied layer by layer during pretraining
opts.numepochs =   1;
opts.batchsize = 100;
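sae = saetrain(sae, train_x, opts); % unsupervised: no labels needed; the learned weights are stored in sae.
                                    % Pretraining is layer-wise: each autoencoder is trained on the hidden
                                    % output of the previous one (computed inside saetrain; train_x here is unchanged),
                                    % and each layer's input is corrupted according to its inputZeroMaskedFraction
visualize(sae.ae{1}.W{1}(:,2:end)') % first-layer filters; column 1 of W holds the biases, so it is dropped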
% Use the SDAE to initialize a FFNN
nn = nnsetup([784 100 100 10]);
nn.activation_function              = 'sigm';
nn.learningRate                     = 1;
%add pretrained weights
nn.W{1} = sae.ae{1}.W{1}; % use the pretrained weights as initial values, overwriting nn's random initialization
nn.W{2} = sae.ae{2}.W{1};
% Train the FFNN
opts.numepochs =   1;
opts.batchsize = 100;
nn = nntrain(nn, train_x, train_y, opts);
[er, bad] = nntest(nn, test_x, test_y);
str = sprintf('testing error rate is: %f',er);
disp(str)


%% Run 2: denoising autoencoder pretraining (50% of each layer's inputs zeroed)
rng(0);
sae = saesetup([784 100 100]); % saesetup already initializes the weights W of each autoencoder randomly
sae.ae{1}.activation_function       = 'sigm';
sae.ae{1}.learningRate              = 1;
sae.ae{1}.inputZeroMaskedFraction   = 0.5;
sae.ae{2}.activation_function       = 'sigm';
sae.ae{2}.learningRate              = 1;
sae.ae{2}.inputZeroMaskedFraction   = 0.5; % for layer 2 this corruption acts like dropout on hidden layer 1's output, but it is applied layer by layer during pretraining
opts.numepochs =   1;
opts.batchsize = 100;
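sae = saetrain(sae, train_x, opts); % unsupervised: no labels needed; the learned weights are stored in sae.
                                    % Pretraining is layer-wise: each autoencoder is trained on the hidden
                                    % output of the previous one (computed inside saetrain; train_x here is unchanged),
                                    % and each layer's input is corrupted according to its inputZeroMaskedFraction
figure,visualize(sae.ae{1}.W{1}(:,2:end)') % first-layer filters; column 1 of W holds the biases, so it is dropped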
% Use the SDAE to initialize a FFNN
nn = nnsetup([784 100 100 10]);
nn.activation_function              = 'sigm';
nn.learningRate                     = 1;
%add pretrained weights
nn.W{1} = sae.ae{1}.W{1}; % use the pretrained weights as initial values, overwriting nn's random initialization
nn.W{2} = sae.ae{2}.W{1};
% Train the FFNN
opts.numepochs =   1;
opts.batchsize = 100;
nn = nntrain(nn, train_x, train_y, opts);
[er, bad] = nntest(nn, test_x, test_y);
str = sprintf('testing error rate is: %f',er);
disp(str)
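saetrain trains the stacked autoencoders greedily, one layer at a time, each on the previous layer's hidden output. The following pure-Python sketch illustrates that scheme on toy data; it is not DeepLearnToolbox code. TinyDAE and pretrain_stack are hypothetical names, and the sketch follows the dA.py conventions (tied weights, cross-entropy) rather than the toolbox's own autoencoders, which keep a separate decoder matrix W{2}.

```python
import math
import random


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


class TinyDAE:
    """Hypothetical tied-weight dAE: sigmoid units, cross-entropy loss, plain per-sample SGD."""

    def __init__(self, n_visible, n_hidden, rng):
        self.rng = rng
        self.W = [[rng.uniform(-0.5, 0.5) for _ in range(n_visible)] for _ in range(n_hidden)]
        self.b = [0.0] * n_hidden   # hidden bias
        self.c = [0.0] * n_visible  # reconstruction (visible) bias

    def encode(self, x):
        return [sigmoid(sum(w * v for w, v in zip(row, x)) + bi) for row, bi in zip(self.W, self.b)]

    def decode(self, h):
        return [sigmoid(sum(self.W[i][j] * h[i] for i in range(len(h))) + self.c[j])
                for j in range(len(self.c))]

    def loss(self, x):
        """Clean-input reconstruction cross-entropy, used only for monitoring."""
        z = self.decode(self.encode(x))
        return -sum(v * math.log(zj) + (1 - v) * math.log(1 - zj) for v, zj in zip(x, z))

    def sgd_step(self, x, corruption_level, lr):
        x_tilde = [0.0 if self.rng.random() < corruption_level else v for v in x]  # masking noise
        h = self.encode(x_tilde)
        z = self.decode(h)
        d_out = [zj - v for zj, v in zip(z, x)]  # gradient at decoder pre-activation; target is the CLEAN x
        d_h = [sum(self.W[i][j] * d_out[j] for j in range(len(x))) * h[i] * (1 - h[i])
               for i in range(len(h))]
        for i in range(len(h)):
            for j in range(len(x)):
                # tied weights: W[i][j] gets gradient from both its encoder and its decoder use
                self.W[i][j] -= lr * (d_h[i] * x_tilde[j] + h[i] * d_out[j])
            self.b[i] -= lr * d_h[i]
        for j in range(len(x)):
            self.c[j] -= lr * d_out[j]


def pretrain_stack(data, layer_sizes, corruption_level, epochs, lr, seed=0):
    """Greedy layer-wise pretraining: train one dAE, then feed its hidden output to the next."""
    rng = random.Random(seed)
    layers, x = [], data
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        dae = TinyDAE(n_in, n_out, rng)
        for _ in range(epochs):
            for sample in x:
                dae.sgd_step(sample, corruption_level, lr)
        layers.append(dae)
        x = [dae.encode(sample) for sample in x]  # next layer trains on this layer's hidden output
    return layers, x


data = [[1.0, 1.0, 0.0, 0.0, 0.0, 0.0],
        [0.0, 0.0, 1.0, 1.0, 0.0, 0.0],
        [0.0, 0.0, 0.0, 0.0, 1.0, 1.0],
        [1.0, 1.0, 0.0, 0.0, 1.0, 1.0]]
layers, top = pretrain_stack(data, [6, 4, 3], corruption_level=0.3, epochs=200, lr=0.5)
```

After pretraining, each layer's encoder weights would serve as initial weights of a feedforward classifier, which is what nn.W{1} = sae.ae{1}.W{1} and nn.W{2} = sae.ae{2}.W{1} do in Test.m. With corruption_level=0 the same code reduces to plain autoencoder pretraining, mirroring the first block of Test.m.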

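As in the previous post, where we traced the Dropout code, we can trace the dAE code here. The statement that adds 50% noise to the input layer of the SAE is:

sae.ae{1}.inputZeroMaskedFraction = 0.5;

Tracing further into the SAE training process: it also uses the nntrain() function, which contains the following code:

if(nn.inputZeroMaskedFraction ~= 0)
    batch_x = batch_x.*(rand(size(batch_x))>nn.inputZeroMaskedFraction); % add noise to the input: rand() is uniform on (0,1), so each entry is zeroed with probability inputZeroMaskedFraction
end

The code is self-explanatory: an entry of batch_x is kept only when its uniform random draw exceeds inputZeroMaskedFraction, so each entry is set to 0 with probability inputZeroMaskedFraction.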
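The same masking rule can be written in a few lines of Python. The sketch below is a hypothetical re-implementation (not toolbox or Theano code) of the MATLAB line above and of the binomial-based get_corrupted_input in dA.py, used only to check empirically that both zero each entry with the stated probability.

```python
import random


def mask_deeplearntoolbox_style(batch, fraction, rng):
    """Mirrors batch_x .* (rand(size(batch_x)) > fraction): keep an entry only if its draw exceeds fraction."""
    return [[v if rng.random() > fraction else 0.0 for v in row] for row in batch]


def mask_theano_style(batch, corruption_level, rng):
    """Mirrors binomial(n=1, p=1-corruption_level) * input: keep an entry with probability 1-corruption_level."""
    return [[v if rng.random() < 1.0 - corruption_level else 0.0 for v in row] for row in batch]


def zeroed_fraction(batch):
    entries = [v for row in batch for v in row]
    return sum(v == 0.0 for v in entries) / len(entries)


rng = random.Random(42)
ones = [[1.0] * 1000 for _ in range(100)]  # 100,000 entries, all equal to 1
print(zeroed_fraction(mask_deeplearntoolbox_style(ones, 0.5, rng)))  # close to 0.5
print(zeroed_fraction(mask_theano_style(ones, 0.3, rng)))            # close to 0.3
```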

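Experiment 2:

This experiment essentially follows the online tutorial http://deeplearning.net/tutorial/dA.html; see the tutorial for details, as it explains things quite thoroughly. Because its dAE implementation uses the Theano library, Theano and a set of related libraries must be installed first. On Ubuntu, for example, follow the pages Installing Theano and Easy Installation of an optimized Theano on Ubuntu; installation succeeds easily (some minor, unimportant test failures can be ignored). These are the versions I used when installing Theano: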

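Ubuntu 13.04, the Linux operating system.

Python 2.7.4, the programming language.

python-numpy 1.7.1, Python's numerical package, including matrix operations.

python-scipy 0.11, a scientific computing library, useful for sparse matrix operations.

python-pip 1.1, Python's package manager.

python-nose 1.1.2, used to run Theano's test suite.

libopenblas-dev 0.2.6, an optimized BLAS (linear algebra) library with its development headers.

git 1.8.1, used to download source code.

gcc 4.7.3, used to compile C code.

theano 0.6.0rc3, a Python library for multidimensional array operations and optimization that can run on the GPU.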

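This experiment also uses the MNIST database, but with only one hidden layer of 500 nodes. Its only purpose is to compare the features learned by the autoencoder with and without denoising.

Features learned without denoising:

Features learned with denoising (30% corruption):

As the figures show, the features learned with denoising are more representative.

Main code for Experiment 2, with comments: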

　　dA.py:

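# -*- coding: utf-8 -*-
import cPickle
import gzip
import os
import sys
import time
import numpy
import theano
import theano.tensor as T  # common symbolic operations live in theano's tensor submodule
from theano.tensor.shared_randomstreams import RandomStreams
from logistic_sgd import load_data  # helper module from the deeplearning.net tutorial
from utils import tile_raster_images  # helper module from the deeplearning.net tutorial
import PIL.Image  # used to save the filter images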

class dA(object):
    def __init__(self, numpy_rng, theano_rng=None, input=None,
                 n_visible=784, n_hidden=500,
                 W=None, bhid=None, bvis=None):
        self.n_visible = n_visible
        self.n_hidden = n_hidden
        if theano_rng is None:
            theano_rng = RandomStreams(numpy_rng.randint(2 ** 30))
        if W is None:
            # uniform initialization in +/- 4*sqrt(6/(n_hidden+n_visible)), suited to sigmoid units
            initial_W = numpy.asarray(numpy_rng.uniform(
                      low=-4 * numpy.sqrt(6. / (n_hidden + n_visible)),
                      high=4 * numpy.sqrt(6. / (n_hidden + n_visible)),
                      size=(n_visible, n_hidden)), dtype=theano.config.floatX)
            W = theano.shared(value=initial_W, name='W', borrow=True)  # W, bvis and bhid are all shared variables
        if bvis is None:
            bvis = theano.shared(value=numpy.zeros(n_visible, dtype=theano.config.floatX),
                                 name='b_prime', borrow=True)
        if bhid is None:
            bhid = theano.shared(value=numpy.zeros(n_hidden, dtype=theano.config.floatX),
                                 name='b', borrow=True)
        self.W = W
        self.b = bhid  # hidden-layer bias
        self.b_prime = bvis  # visible-layer (reconstruction) bias
        self.W_prime = self.W.T  # tied weights: the decoder uses the transpose of W
        self.theano_rng = theano_rng
        if input is None:
            self.x = T.dmatrix(name='input')
        else:
            self.x = input  # keep a reference to the symbolic input
        self.params = [self.W, self.b, self.b_prime]

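    def get_corrupted_input(self, input, corruption_level):
        # binomial(n=1, p=1-corruption_level) draws 1 with probability 1-corruption_level and 0 otherwise,
        # so multiplying by it zeroes each input element with probability corruption_level (masking noise)
        return self.theano_rng.binomial(size=input.shape, n=1,
                                        p=1 - corruption_level,
                                        dtype=theano.config.floatX) * input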

    def get_hidden_values(self, input):
        return T.nnet.sigmoid(T.dot(input, self.W) + self.b)

    def get_reconstructed_input(self, hidden):
        return  T.nnet.sigmoid(T.dot(hidden, self.W_prime) + self.b_prime)

    def get_cost_updates(self, corruption_level, learning_rate):
        # builds the symbolic cost and SGD updates; nothing is computed until theano.function runs them
        tilde_x = self.get_corrupted_input(self.x, corruption_level)
        y = self.get_hidden_values(tilde_x)
        z = self.get_reconstructed_input(y)
        # cross-entropy between the CLEAN input x and the reconstruction z, one value per sample
        L = - T.sum(self.x * T.log(z) + (1 - self.x) * T.log(1 - z), axis=1)
        cost = T.mean(L)
        gparams = T.grad(cost, self.params)
        updates = []
        for param, gparam in zip(self.params, gparams):
            updates.append((param, param - learning_rate * gparam))  # (parameter, new value) pair for plain SGD
        return (cost, updates)

# Demo function: trains one dA without and one with corruption, and saves the learned filters as images
def test_dA(learning_rate=0.1, training_epochs=15,
            dataset='data/mnist.pkl.gz',
            batch_size=20, output_folder='dA_plots'):
    datasets = load_data(dataset)
    train_set_x, train_set_y = datasets[0]  # each row of train_set_x is one sample
    n_train_batches = train_set_x.get_value(borrow=True).shape[0] // batch_size  # number of minibatches
    index = T.lscalar()    # index to a [mini]batch
    x = T.matrix('x')  # the data is presented as rasterized images
    if not os.path.isdir(output_folder):
        os.makedirs(output_folder)
    os.chdir(output_folder)

    # Without denoising (corruption_level = 0)
    rng = numpy.random.RandomState(123)
    theano_rng = RandomStreams(rng.randint(2 ** 30))
    da = dA(numpy_rng=rng, theano_rng=theano_rng, input=x,
            n_visible=28 * 28, n_hidden=500)  # x is only a symbolic placeholder: building the dA needs no data, only its structure
    cost, updates = da.get_cost_updates(corruption_level=0.,
                                        learning_rate=learning_rate)
    train_da = theano.function([index], cost, updates=updates,  # compiles a function whose input is index,
         givens={x: train_set_x[index * batch_size: (index + 1) * batch_size]})  # whose output is cost, and which applies the SGD updates
    start_time = time.clock()
    for epoch in xrange(training_epochs):
        c = []
        for batch_index in xrange(n_train_batches):
            c.append(train_da(batch_index))
        print 'Training epoch %d, cost ' % epoch, numpy.mean(c)
    end_time = time.clock()
    training_time = (end_time - start_time)
    print >> sys.stderr, ('The no corruption code for file ' +
                          os.path.split(__file__)[1] +
                          ' ran for %.2fm' % ((training_time) / 60.))
    image = PIL.Image.fromarray(
        tile_raster_images(X=da.W.get_value(borrow=True).T,
                           img_shape=(28, 28), tile_shape=(10, 10),
                           tile_spacing=(1, 1)))
    image.save('filters_corruption_0.png')

    # With denoising (corruption_level = 0.3)
    rng = numpy.random.RandomState(123)
    theano_rng = RandomStreams(rng.randint(2 ** 30))
    da = dA(numpy_rng=rng, theano_rng=theano_rng, input=x,
            n_visible=28 * 28, n_hidden=500)
    cost, updates = da.get_cost_updates(corruption_level=0.3,
                                        learning_rate=learning_rate)  # each input pixel is zeroed with probability 0.3
    train_da = theano.function([index], cost, updates=updates,
         givens={x: train_set_x[index * batch_size:
                                  (index + 1) * batch_size]})
    start_time = time.clock()
    for epoch in xrange(training_epochs):
        c = []
        for batch_index in xrange(n_train_batches):
            c.append(train_da(batch_index))
        print 'Training epoch %d, cost ' % epoch, numpy.mean(c)
    end_time = time.clock()
    training_time = (end_time - start_time)
    print >> sys.stderr, ('The 30% corruption code for file ' +
                          os.path.split(__file__)[1] +
                          ' ran for %.2fm' % (training_time / 60.))
    image = PIL.Image.fromarray(tile_raster_images(
        X=da.W.get_value(borrow=True).T,
        img_shape=(28, 28), tile_shape=(10, 10),
        tile_spacing=(1, 1)))
    image.save('filters_corruption_30.png')
    os.chdir('../')

if __name__ == '__main__':
    test_dA()

The part of the code specific to the dAE is:

def get_corrupted_input(self, input, corruption_level):
    # binomial() draws 0/1 values; p sets the probability of drawing 1, i.e. of keeping an input element
    return self.theano_rng.binomial(size=input.shape, n=1, p=1 - corruption_level,
                                    dtype=theano.config.floatX) * input

References:

Vincent, P., Larochelle, H., Bengio, Y., and Manzagol, P.-A. (2008). Extracting and composing robust features with denoising autoencoders. Proceedings of the 25th International Conference on Machine Learning, ACM.

https://github.com/rasmusbergpalm/DeepLearnToolbox

http://deeplearning.net/tutorial/dA.html

Deep learning: 41 (A Simple Understanding of Dropout)

Installing Theano

Easy Installation of an optimized Theano on Ubuntu