CNN Handwritten Digit Recognition

Date: 2019-11-10

Tags: cnn, handwritten digit recognition


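1. Background knowledge

Before looking at convolutional neural networks (CNNs), two concepts need to be understood: the first is convolution over a two-dimensional image, and the second is pooling.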

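a. Convolution

For the concept and details of convolution, see the linked reference. The convolution operation has two very important properties. Take the following one-dimensional convolution as an example:

The first property is sparse connectivity. Each node in layer m is connected only to the three nodes in the corresponding region of layer m-1. This local region is also called the receptive field. The second property is weight sharing: lines of the same color represent the same weight. What is the benefit of this? On one hand, weight sharing greatly reduces the number of parameters, which makes learning more efficient. On the other hand, because the same weights are applied at every position, a filter can detect a feature regardless of where it appears in the image, which gives a CNN stronger generalization ability.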
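The two properties can be made concrete with a minimal NumPy sketch. The function name `conv1d_valid` and the weight values are assumptions chosen for illustration, not taken from the original post. Like most CNN code, the sketch slides the weights without flipping them (strictly a cross-correlation), which does not change the connectivity argument.

```python
import numpy as np

def conv1d_valid(x, w):
    """Slide the shared weights w over x; each output sees only len(w) inputs."""
    k = len(w)
    return np.array([np.dot(x[i:i + k], w) for i in range(len(x) - k + 1)])

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # layer m-1: five nodes
w = np.array([1.0, 0.0, -1.0])            # three shared weights (hypothetical values)
y = conv1d_valid(x, w)                    # layer m: three nodes, receptive field of 3

# Parameter counts for a 5 -> 3 layer.
dense_params = len(x) * len(y)            # fully connected: one weight per connection
shared_params = len(w)                    # convolution: the same three weights reused

# Position independence: the same edge at two positions gives the same
# response, just shifted by the same amount as the edge.
edge_left = np.array([0.0, 0.0, 1.0, 1.0, 1.0, 1.0])
edge_right = np.array([0.0, 0.0, 0.0, 0.0, 1.0, 1.0])
resp_left = conv1d_valid(edge_left, w)
resp_right = conv1d_valid(edge_right, w)

print(y, dense_params, shared_params)
print(resp_left, resp_right)
```

With these numbers the fully connected layer would need 15 weights, while the convolution needs only 3.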

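b. Pooling

In principle, we could convolve the image with several different filters to obtain several convolved images and then classify directly on those images, but that would be too expensive computationally. Pooling reduces the amount of data while preserving the original image features to a certain extent. The concept of pooling is even simpler; for details, see the linked reference. Pooling comes in two kinds, average pooling and max pooling; here we use max pooling. Note that pooling regions do not overlap, whereas the receptive fields of a convolution do.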
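As a rough illustration of non-overlapping pooling, the NumPy sketch below splits an image into 2 × 2 blocks and reduces each block to a single value. The helper name `pool_2x2` and the 4 × 4 test image are assumptions made for this example.

```python
import numpy as np

def pool_2x2(img, reduce=np.max):
    """Non-overlapping 2x2 pooling; an odd trailing row or column is dropped."""
    h, w = img.shape
    h2, w2 = h // 2, w // 2
    # blocks[i, a, j, b] == img[2*i + a, 2*j + b]
    blocks = img[:h2 * 2, :w2 * 2].reshape(h2, 2, w2, 2)
    return reduce(blocks, axis=(1, 3))

img = np.arange(16, dtype=float).reshape(4, 4)
max_pooled = pool_2x2(img, np.max)     # keeps the largest value of each block
avg_pooled = pool_2x2(img, np.mean)    # keeps the mean of each block

print(max_pooled)
print(avg_pooled)
```

Each 2 × 2 block contributes exactly one output value, so the data volume drops by a factor of four.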

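2. Building the convolutional neural network

The figure below shows the simple LeNet-5 convolutional neural network model used for handwritten digit recognition: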

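The input is a 28 × 28 handwritten digit image. Passing it through the first layer of 20 convolution kernels of size 5 × 5 yields 20 convolved images. The kernel weights are initialized to random values within a certain range. A single 28 × 28 image thus becomes 20 images of size (28-5+1) × (28-5+1) = 24 × 24.
Each 24 × 24 image is max-pooled over 2 × 2 regions, giving 20 images of size 12 × 12. The pixel values must then pass through the tanh function (after the per-map bias is added, as in the code in section 3) before they can serve as input to the next convolutional layer.
The 12 × 12 images produced by tanh are convolved in the same way with 20 × 50 kernels of size 5 × 5 (50 filters, each spanning all 20 input images), giving 50 images of size (12-5+1) × (12-5+1) = 8 × 8.
Each 8 × 8 image is max-pooled over 2 × 2 regions, giving 50 images of size 4 × 4. After tanh squashes the values into the range (-1, 1), the resulting 50 × 4 × 4 = 800 values serve as the inputs of the MLP.
What remains is training the MLP.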
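The size arithmetic in the steps above can be checked with a few lines of Python. The helper names (`conv_valid`, `pool`, `trace_shapes`) are made up for this sketch. It assumes 'valid' convolutions (no padding, stride 1) and non-overlapping pooling, which is what the numbers in the text imply.

```python
def conv_valid(size, kernel):
    """Side length after a 'valid' convolution with a square kernel."""
    return size - kernel + 1

def pool(size, window):
    """Side length after non-overlapping pooling with a square window."""
    return size // window

def trace_shapes(size=28, kernel=5, window=2, maps=(20, 50)):
    """Return [(stage, n_maps, side), ...] for consecutive conv + pool stages."""
    trace = [("input", 1, size)]
    for idx, n_maps in enumerate(maps):
        size = conv_valid(size, kernel)
        trace.append(("conv%d" % idx, n_maps, size))
        size = pool(size, window)
        trace.append(("pool%d" % idx, n_maps, size))
    return trace

trace = trace_shapes()
for stage, n_maps, side in trace:
    print("%-6s %2d maps of %2d x %2d" % (stage, n_maps, side, side))

# Flattening the last stage gives the MLP input size.
_, last_maps, last_side = trace[-1]
mlp_inputs = last_maps * last_side * last_side
print("MLP inputs: %d" % mlp_inputs)
```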

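3. LR, MLP, and CNN recognition code

Download address for the trained model parameters.

Code for recognizing handwritten digits with the three methods: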

# Written for Python 2 (cPickle) and an older Theano release that still
# provides theano.tensor.signal.downsample and theano.tensor.nnet.conv.
import cPickle

import numpy

import theano
import theano.tensor as T
from theano.tensor.signal import downsample
from theano.tensor.nnet import conv

########################################
# define the classifier constructs
########################################

def load_params(path):
    # Pickle files are binary, so open them in "rb" mode.
    with open(path, "rb") as fle:
        return cPickle.load(fle)

class LogisticRegression(object):
    def __init__(self, input, W=None, b=None):
        # Fall back to the pre-trained parameters when none are given.
        if W is None:
            W, b = load_params("../model_param/lr_sgd_best.pkl")

        self.W = W
        self.b = b

        # Class probabilities and the most likely class for each input row.
        self.outputs = T.nnet.softmax(T.dot(input, self.W) + self.b)
        self.pred = T.argmax(self.outputs, axis=1)

class MLP(object):
    def __init__(self, input, params=None):
        if params is None:
            params = load_params("../model_param/mlp_best.pkl")

        self.hidden_W, self.hidden_b, self.lr_W, self.lr_b = params

        # One tanh hidden layer followed by a softmax output layer.
        self.hiddenlayer = T.tanh(T.dot(input, self.hidden_W) + self.hidden_b)
        self.outputs = T.nnet.softmax(
            T.dot(self.hiddenlayer, self.lr_W) + self.lr_b)
        self.pred = T.argmax(self.outputs, axis=1)

class CNN(object):
    def __init__(self, input, params=None):
        if params is None:
            params = load_params("../model_param/cnn_best.pkl")

        # Parameters are stored from the last layer to the first.
        self.layer3_W, self.layer3_b, self.layer2_W, self.layer2_b, \
            self.layer1_W, self.layer1_b, self.layer0_W, self.layer0_b = params

        # layer0: convolution -> 2x2 max pooling -> bias + tanh
        # filters: (20, 1, 5, 5); input image: (1, 1, 28, 28)
        self.conv_out0 = conv.conv2d(input=input, filters=self.layer0_W)
        self.pooled_out0 = downsample.max_pool_2d(
            input=self.conv_out0, ds=(2, 2), ignore_border=True)
        self.layer0_output = T.tanh(
            self.pooled_out0 + self.layer0_b.dimshuffle('x', 0, 'x', 'x'))

        # layer1: convolution -> 2x2 max pooling -> bias + tanh
        # filters: (50, 20, 5, 5); input image: (1, 20, 12, 12)
        self.conv_out1 = conv.conv2d(input=self.layer0_output,
                                     filters=self.layer1_W)
        self.pooled_out1 = downsample.max_pool_2d(
            input=self.conv_out1, ds=(2, 2), ignore_border=True)
        self.layer1_output = T.tanh(
            self.pooled_out1 + self.layer1_b.dimshuffle('x', 0, 'x', 'x'))

        # layer2: flatten the 50 maps of 4 x 4 into 800 values, then a tanh
        # hidden layer
        self.layer2_input = self.layer1_output.flatten(2)
        self.layer2_output = T.tanh(
            T.dot(self.layer2_input, self.layer2_W) + self.layer2_b)

        # layer3: softmax output layer
        self.outputs = T.nnet.softmax(
            T.dot(self.layer2_output, self.layer3_W) + self.layer3_b)
        self.pred = T.argmax(self.outputs, axis=1)

########################################
# build classifier
########################################

def lr(input):
    # Reshape the image in place into a single row vector.
    input.shape = 1, -1

    x = T.fmatrix('x')
    classifier = LogisticRegression(input=x)

    # One compiled function returns both the probabilities and the label.
    predict = theano.function(inputs=[x],
                              outputs=[classifier.outputs, classifier.pred])
    p_y, y = predict(input)
    return (p_y, y)

def mlp(input):
    input.shape = 1, -1

    x = T.fmatrix('x')
    classifier = MLP(input=x)

    predict = theano.function(inputs=[x],
                              outputs=[classifier.outputs, classifier.pred])
    p_y, y = predict(input)
    return (p_y, y)

def cnn(input):
    # The CNN expects (batch, channels, height, width).
    input.shape = (1, 1, 28, 28)

    x = T.dtensor4('x')
    classifier = CNN(input=x)

    predict = theano.function(inputs=[x],
                              outputs=[classifier.outputs, classifier.pred])
    p_y, y = predict(input)
    return (p_y, y)
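For readers without Theano or the pickled parameter files, the CNN forward pass above can be approximated in plain NumPy. This is a sketch under stated assumptions, not the author's implementation. The weights are random rather than trained, the hidden layer size of 500 is a guess (the post does not state it), and the helper names are invented here. The kernels are flipped on the understanding that Theano's `conv2d` computes a true convolution; for a shape check the distinction does not matter.

```python
import numpy as np

def conv2d_valid(x, w):
    """x: (n, c_in, H, W); w: (c_out, c_in, kh, kw). 'Valid' convolution."""
    n, _, H, W = x.shape
    c_out, _, kh, kw = w.shape
    w_flipped = w[:, :, ::-1, ::-1]          # true convolution flips the kernel
    out = np.zeros((n, c_out, H - kh + 1, W - kw + 1))
    for i in range(H - kh + 1):
        for j in range(W - kw + 1):
            patch = x[:, :, i:i + kh, j:j + kw]              # (n, c_in, kh, kw)
            out[:, :, i, j] = np.tensordot(
                patch, w_flipped, axes=([1, 2, 3], [1, 2, 3]))
    return out

def max_pool_2x2(x):
    """Non-overlapping 2x2 max pooling over the last two axes."""
    n, c, H, W = x.shape
    x = x[:, :, :H // 2 * 2, :W // 2 * 2]
    return x.reshape(n, c, H // 2, 2, W // 2, 2).max(axis=(3, 5))

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def cnn_forward(x, params):
    # Same ordering as the pickled tuple: last layer first.
    w3, b3, w2, b2, w1, b1, w0, b0 = params
    h0 = np.tanh(max_pool_2x2(conv2d_valid(x, w0)) + b0.reshape(1, -1, 1, 1))
    h1 = np.tanh(max_pool_2x2(conv2d_valid(h0, w1)) + b1.reshape(1, -1, 1, 1))
    h2 = np.tanh(h1.reshape(h1.shape[0], -1).dot(w2) + b2)   # 800 -> hidden
    probs = softmax(h2.dot(w3) + b3)
    return probs, probs.argmax(axis=1)

rng = np.random.RandomState(0)
n_hidden = 500                                # assumption, not from the post
params = (
    rng.uniform(-0.1, 0.1, (n_hidden, 10)), np.zeros(10),
    rng.uniform(-0.1, 0.1, (800, n_hidden)), np.zeros(n_hidden),
    rng.uniform(-0.1, 0.1, (50, 20, 5, 5)), np.zeros(50),
    rng.uniform(-0.1, 0.1, (20, 1, 5, 5)), np.zeros(20),
)
image = rng.rand(1, 1, 28, 28)                # stand-in for a digit image
probs, pred = cnn_forward(image, params)
print(probs.shape, pred)
```

Because the weights are random, the predicted label is meaningless; the sketch only confirms that the layer shapes fit together and that the output is a probability distribution over ten classes.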