Author: 凱魯嘎吉 (kailugaji) - cnblogs http://www.cnblogs.com/kailugaji/
Because of the special structure of the restricted Boltzmann machine, a learning algorithm more efficient than plain Gibbs sampling can be used: contrastive divergence (CD). Contrastive divergence needs only k steps of Gibbs sampling. To improve efficiency, the algorithm uses a training sample as the initial value of the visible vector, then alternately Gibbs-samples the visible and hidden vectors; there is no need to wait for the chain to converge, k steps are enough. This is the CD-k algorithm. In practice, k = 1 usually already learns well. The contrastive divergence procedure is given as Algorithm 12.1 in [1].
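For reference, here is a sketch of the standard CD-1 update rules from Hinton [4], restated in the notation used by the code comments below (W for the weights, a for the visible biases, b for the hidden biases, ε for the learning rate); the actual code additionally applies momentum and L2 weight decay. One Gibbs step uses the two conditional distributions of the RBM:

$$
p(h_j = 1 \mid v) = \sigma\Big(b_j + \sum_i v_i W_{ij}\Big), \qquad
p(v_i = 1 \mid h) = \sigma\Big(a_i + \sum_j W_{ij} h_j\Big)
$$

and the parameter updates are the differences between the data-driven and reconstruction-driven statistics:

$$
\Delta W = \epsilon\big(\langle v h^\top\rangle_{\text{data}} - \langle v h^\top\rangle_{\text{recon}}\big), \quad
\Delta a = \epsilon\big(\langle v\rangle_{\text{data}} - \langle v\rangle_{\text{recon}}\big), \quad
\Delta b = \epsilon\big(\langle h\rangle_{\text{data}} - \langle h\rangle_{\text{recon}}\big)
$$

The MATLAB code below, Hinton's rbm.m from the MNIST deep autoencoder package [3], implements exactly these CD-1 updates.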
% maxepoch  -- maximum number of epochs
% numhid    -- number of hidden units
% batchdata -- training data divided into batches (numcases x numdims x numbatches)
% restart   -- set to 1 if learning starts from the beginning
% Purpose: train an RBM with 1-step contrastive divergence (CD-1), applying
% the weight-update formula directly; no backpropagation is used.
% Binary, stochastic visible pixels are connected to binary, stochastic
% hidden feature detectors through symmetrically weighted connections.

epsilonw  = 0.1;       % learning rate for weights
epsilonvb = 0.1;       % learning rate for biases of visible units
epsilonhb = 0.1;       % learning rate for biases of hidden units
weightcost = 0.0002;   % weight decay, guards against overfitting
initialmomentum = 0.5; % momentum balances convergence speed against instability
finalmomentum   = 0.9;

[numcases numdims numbatches] = size(batchdata); % [cases per batch, dims per case, number of batches]

if restart == 1  % train from scratch
  restart = 0;
  epoch = 1;

  % Initializing symmetric weights and biases.
  vishid    = 0.1*randn(numdims, numhid); % connection weights W_ij, e.g. 784x1000
  hidbiases = zeros(1,numhid);            % hidden biases b_j
  visbiases = zeros(1,numdims);           % visible biases a_i

  poshidprobs = zeros(numcases,numhid); % p(h=1|v0), one row per case, e.g. 100x1000
  neghidprobs = zeros(numcases,numhid); % hidden probabilities driven by the reconstruction
  posprods    = zeros(numdims,numhid);  % <v_i h_j>_data, used to update W_ij, 784x1000
  negprods    = zeros(numdims,numhid);  % <v_i h_j>_recon
  vishidinc   = zeros(numdims,numhid);  % weight increment (Delta W)
  hidbiasinc  = zeros(1,numhid);        % hidden-bias increment (Delta b), 1x1000
  visbiasinc  = zeros(1,numdims);       % visible-bias increment (Delta a), 1x784
  batchposhidprobs = zeros(numcases,numhid,numbatches); % hidden outputs for the whole data set
end

for epoch = epoch:maxepoch
 fprintf(1,'epoch %d\r',epoch);
 errsum = 0;
 for batch = 1:numbatches
  fprintf(1,'epoch %d batch %d\r',epoch,batch);

  %%%%%%%%% START POSITIVE PHASE (CD-1) %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
  data = batchdata(:,:,batch); % e.g. 100 images per batch
  poshidprobs = 1./(1 + exp(-data*vishid - repmat(hidbiases,numcases,1))); % p(h=1|v0) = sigmoid(v0*W + b); a hidden vector h is sampled from this distribution below
  batchposhidprobs(:,:,batch) = poshidprobs; % store the hidden outputs in a 3-D array
  posprods  = data' * poshidprobs; % positive gradient <v0 h'>, used in the weight update
  poshidact = sum(poshidprobs);    % summed hidden probabilities, used for the hidden-bias update
  posvisact = sum(data);           % summed visible activities, used for the visible-bias update
  %%%%%%%%% END OF POSITIVE PHASE %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

  % Gibbs sampling: binarize the hidden probabilities (1 where the probability exceeds a uniform random number)
  poshidstates = poshidprobs > rand(numcases,numhid);

  %%%%%%%%% START NEGATIVE PHASE %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
  negdata = 1./(1 + exp(-poshidstates*vishid' - repmat(visbiases,numcases,1))); % reconstruction: p(v=1|h) = sigmoid(h*W' + a)
  neghidprobs = 1./(1 + exp(-negdata*vishid - repmat(hidbiases,numcases,1)));   % hidden output driven by the reconstruction: p(h=1|v1) = sigmoid(v1*W + b)
  negprods  = negdata'*neghidprobs; % negative gradient <v1 h1'>
  neghidact = sum(neghidprobs);
  negvisact = sum(negdata);
  %%%%%%%%% END OF NEGATIVE PHASE %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

  err = sum(sum( (data-negdata).^2 )); % reconstruction error ||v - v'||^2 over the batch
  errsum = err + errsum;

  if epoch > 5  % raise the momentum after the first few epochs
    momentum = finalmomentum;
  else
    momentum = initialmomentum;
  end

  %%%%%%%%% UPDATE WEIGHTS AND BIASES %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
  vishidinc = momentum*vishidinc + ...
              epsilonw*( (posprods-negprods)/numcases - weightcost*vishid);      % Delta W = epsilon*(<v h'>_data - <v h'>_recon), with weight decay
  visbiasinc = momentum*visbiasinc + (epsilonvb/numcases)*(posvisact-negvisact); % Delta a = epsilon*(<v>_data - <v>_recon)
  hidbiasinc = momentum*hidbiasinc + (epsilonhb/numcases)*(poshidact-neghidact); % Delta b = epsilon*(<h>_data - <h>_recon)
  vishid    = vishid + vishidinc;      % W = W + Delta W
  visbiases = visbiases + visbiasinc;  % a = a + Delta a
  hidbiases = hidbiases + hidbiasinc;  % b = b + Delta b
  %%%%%%%%%%%%%%%% END OF UPDATES %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 end
 fprintf(1, 'epoch %4i error %6.1f \n', epoch, errsum);
end
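In Hinton's original package [3] this block is a script (rbm.m) rather than a function: it reads maxepoch, numhid, batchdata, and restart from the workspace and leaves vishid, visbiases, hidbiases, and batchposhidprobs behind when it finishes. A minimal driver might look like the sketch below; the random batchdata is only a placeholder standing in for real pixel values scaled to [0, 1].

% Minimal driver sketch; assumes the training code above is saved as rbm.m.
maxepoch = 10;    % number of training epochs
numhid   = 1000;  % number of hidden units
restart  = 1;     % initialize weights from scratch
% batchdata must be numcases x numdims x numbatches;
% here: 100 cases of 784-dimensional (28x28) inputs in 600 mini-batches.
batchdata = rand(100, 784, 600); % placeholder: real data would be pixels in [0,1]
rbm;  % run CD-1 training; learned parameters remain in the workspace
% After training, vishid (784x1000), visbiases, and hidbiases hold the model.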
[1] Qiu Xipeng. Neural Networks and Deep Learning (神經網絡與深度學習) [M]. 2019.
[2] Salakhutdinov R, Hinton G. Deep Boltzmann machines[C]//Artificial Intelligence and Statistics. 2009: 448-455.
[3] Hinton G E. Training a deep autoencoder or a classifier on MNIST digits. 2006.
[4] Hinton G E. Training products of experts by minimizing contrastive divergence[J]. Neural Computation, 2002, 14(8): 1771-1800.
[5] Hinton G E. A practical guide to training restricted Boltzmann machines[M]//Neural Networks: Tricks of the Trade. Springer, Berlin, Heidelberg, 2012: 599-619.
[6] 深度學習 --- 受限玻爾茲曼機詳解(RBM) (Deep Learning: A Detailed Explanation of the Restricted Boltzmann Machine; blog post, in Chinese).
[7] 受限玻爾茲曼機(RBM)學習筆記(六)對比散度算法 (Restricted Boltzmann Machine (RBM) Study Notes (6): The Contrastive Divergence Algorithm; blog post, in Chinese).