Section 3: Handwritten Digit Recognition with a CNN in TensorFlow (Introducing the Convolution Function tf.nn.conv2d)

Date: 2019-12-20


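In the previous section we used a fully connected network to recognize handwritten digits and reached an accuracy of roughly 98%. In this section we use a convolutional neural network (CNN) for the same task, which can push the accuracy above 99%. The program consists of the following steps: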

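[1]: Load the data, i.e. the training, validation, and test sets
[2]: Import tensorflow and start an InteractiveSession (more flexible than a plain Session)
[3]: Define two helper functions that initialize w and b, for later reuse
[4]: Define the convolution and pooling functions. The convolution uses SAME padding with stride 1 so that the output image has the same size as the input; pooling uses a 2x2 window, so every 4 cells collapse into 1
[5]: Create the input placeholders x_ and y_
[6]: Reshape x
[7]: Define w and b for the first convolutional layer
[8]: Convolve x_image (the reshaped input, named X in the listing) with w, add b, apply the ReLU activation, then apply max pooling
[9]: The second convolutional layer, analogous to the first
[10]: The fully connected layer
[11]: To reduce overfitting, dropout can be added before the output layer. (This example is simple enough that leaving it out makes little difference.)
[12]: A softmax layer produces the output
[13]: Define the cost function and the training step, optimizing with Adam
[14]: Evaluate on samples from the test set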
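
As a rough check on the outline above, the following sketch traces how the tensor shape changes through the two convolution/pooling stages. It assumes the layer sizes used in the listing later in this section (SAME padding, stride-1 convolutions with 32 and 64 filters, 2x2 pooling with stride 2). The helper names `same_out` and `trace_shapes` are made up for illustration and are not part of TensorFlow.

```python
import math

def same_out(size, stride):
    """Output length along one axis under SAME padding: ceil(size / stride)."""
    return math.ceil(size / stride)

def trace_shapes(height=28, width=28, channels=1):
    """Return (layer name, shape) pairs for the two conv/pool stages."""
    shapes = [("input", (height, width, channels))]
    for name, filters in (("conv1", 32), ("conv2", 64)):
        # SAME convolution with stride 1 keeps height and width
        height, width = same_out(height, 1), same_out(width, 1)
        channels = filters
        shapes.append((name, (height, width, channels)))
        # 2x2 max pooling with stride 2 halves height and width
        height, width = same_out(height, 2), same_out(width, 2)
        shapes.append((name.replace("conv", "pool"), (height, width, channels)))
    # The fully connected layer sees one flat vector per image
    shapes.append(("flatten", (height * width * channels,)))
    return shapes

for layer, shape in trace_shapes():
    print(layer, shape)
```

The last entry, 7 * 7 * 64 = 3136, is the input width of the 1024-neuron fully connected layer.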

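Let us first introduce the TensorFlow functions used to build convolutional neural networks:

1 The convolution function tf.nn.conv2d()

TensorFlow implements convolution with tf.nn.conv2d(), whose signature is: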

tf.nn.conv2d(input,filter,strides,padding,use_cudnn_on_gpu=None,name=None)

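input: the input images to convolve. It must be a Tensor of shape [batch, in_height, in_width, in_channels], meaning "number of images in a training batch, image height, image width, number of image channels". Note that this is a 4-D Tensor whose dtype must be float32 or float64.
filter: the convolution kernel of the CNN. It must be a Tensor of shape [filter_height, filter_width, in_channels, out_channels], meaning "kernel height, kernel width, number of input channels, number of filters", with the same dtype as input. Note that its third dimension, in_channels, must equal the fourth dimension of input.
strides: the stride along each dimension of the input, a 1-D vector of length 4 that matches the layout of input. The usual value is [1, x, x, 1], where x is the stride.
padding: a string, either "SAME" or "VALID", that selects the padding scheme, i.e. whether the borders of the input are zero-padded. The two values lead to different output sizes.
use_cudnn_on_gpu: a bool that controls whether cuDNN acceleration is used; defaults to True.
name: an optional name for the operation.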

The function returns a Tensor; this output is what is commonly called the feature map.
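
To make the two shape conventions concrete, here is a minimal pure-Python sketch of a VALID convolution on nested lists laid out as [batch, height, width, channels]. It is not how TensorFlow implements the operation, and the name `conv2d_valid` is invented for this example. Like tf.nn.conv2d, it slides the kernel without flipping it (a cross-correlation).

```python
def conv2d_valid(images, kernels, stride=1):
    """Naive NHWC convolution with VALID padding on nested lists.

    images:  [batch][in_height][in_width][in_channels]
    kernels: [filter_height][filter_width][in_channels][out_channels]
    """
    fh, fw = len(kernels), len(kernels[0])
    in_c, out_c = len(kernels[0][0]), len(kernels[0][0][0])
    in_h, in_w = len(images[0]), len(images[0][0])
    # Same value as ceil((in - filter + 1) / stride)
    out_h = (in_h - fh) // stride + 1
    out_w = (in_w - fw) // stride + 1
    output = []
    for image in images:
        fmap = [[[0.0] * out_c for _ in range(out_w)] for _ in range(out_h)]
        for oy in range(out_h):
            for ox in range(out_w):
                for k in range(out_c):
                    total = 0.0
                    # Sum over the kernel window and all input channels
                    for dy in range(fh):
                        for dx in range(fw):
                            for c in range(in_c):
                                pixel = image[oy * stride + dy][ox * stride + dx][c]
                                total += pixel * kernels[dy][dx][c][k]
                    fmap[oy][ox][k] = total
        output.append(fmap)
    return output

# One 4x4 single-channel image whose pixels are all 1.0
images = [[[[1.0] for _ in range(4)] for _ in range(4)]]
# One 3x3 kernel of ones: in_channels=1, out_channels=1
kernels = [[[[1.0]] for _ in range(3)] for _ in range(3)]

feature_map = conv2d_valid(images, kernels)
shape = (len(feature_map), len(feature_map[0]),
         len(feature_map[0][0]), len(feature_map[0][0][0]))
print(shape)                     # (1, 2, 2, 1)
print(feature_map[0][0][0][0])   # 9.0
```

The output keeps the batch dimension, shrinks height and width from 4 to 2, and has one channel per filter.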

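Note: padding is the conv2d argument most likely to be misunderstood. It only decides whether zeros are added around the input, so be clear about what SAME actually means: with SAME, the feature map has the same size as the input only when the stride is 1.

Padding rules:

The padding argument controls how the borders of the input are handled: VALID adds no padding, while SAME pads the input with zeros.

How exactly does tf.nn.conv2d compute the output size when padding is VALID or SAME? There are explicit formulas. For convenience, first define a few variables: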

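in_height, in_width: the height and width of the input;
filter_height, filter_width: the height and width of the kernel;
output_height, output_width: the height and width of the output;
strides_height, strides_width: the stride along the height and the width.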

1. The VALID case

The output width and height are:

output_width = (in_width - filter_width + 1) / strides_width (rounded up)
output_height = (in_height - filter_height + 1) / strides_height (rounded up)

2. The SAME case

output_width = in_width / strides_width (rounded up)
output_height = in_height / strides_height (rounded up)
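
The two sets of formulas can be checked with a small helper. This is only a sketch of the arithmetic stated above; `conv_output_size` is a hypothetical name, not a TensorFlow function.

```python
import math

def conv_output_size(in_size, filter_size, stride, padding):
    """Output length along one axis for 'VALID' or 'SAME' padding."""
    if padding == "VALID":
        return math.ceil((in_size - filter_size + 1) / stride)
    if padding == "SAME":
        return math.ceil(in_size / stride)
    raise ValueError("padding must be 'VALID' or 'SAME'")

# 28-pixel input, 5-pixel kernel
print(conv_output_size(28, 5, 1, "VALID"))  # 24
print(conv_output_size(28, 5, 1, "SAME"))   # 28
print(conv_output_size(28, 5, 2, "VALID"))  # 12
print(conv_output_size(28, 5, 2, "SAME"))   # 14
```

With stride 2 the SAME output is 14 rather than 28, which illustrates why SAME preserves the input size only at stride 1.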

An important detail here is the zero-padding rule:

pad_height = max((output_height - 1) * strides_height + filter_height - in_height, 0)
pad_width = max((output_width - 1) * strides_width + filter_width - in_width, 0)
pad_top = pad_height / 2 (rounded down)
pad_bottom = pad_height - pad_top
pad_left = pad_width / 2 (rounded down)
pad_right = pad_width - pad_left

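pad_height: the total number of zero rows added along the height;
pad_width: the total number of zero columns added along the width;
pad_top, pad_bottom, pad_left, pad_right: the number of zero rows or columns added on the top, bottom, left, and right respectively.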
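
The padding rule can be evaluated per axis as in the sketch below. The function name `same_padding` is an assumption made for this example; it simply applies the formulas above, returning the (top, bottom) or (left, right) pair.

```python
import math

def same_padding(in_size, filter_size, stride):
    """Return (pad_before, pad_after) along one axis for SAME padding."""
    out_size = math.ceil(in_size / stride)
    pad_total = max((out_size - 1) * stride + filter_size - in_size, 0)
    pad_before = pad_total // 2          # top or left
    pad_after = pad_total - pad_before   # bottom or right
    return pad_before, pad_after

print(same_padding(28, 5, 1))  # (2, 2): the 5x5 convolution in this section
print(same_padding(28, 2, 2))  # (0, 0): the 2x2 pooling in this section
print(same_padding(5, 2, 2))   # (0, 1): an odd total puts the extra zero last
```

When the total padding is odd, the extra row or column goes to the bottom or right, because pad_top and pad_left are rounded down.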

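2 The pooling functions tf.nn.max_pool() and tf.nn.avg_pool()

The pooling functions in TensorFlow are:

tf.nn.max_pool(input,ksize,strides,padding,name=None)
tf.nn.avg_pool(input,ksize,strides,padding,name=None)

The first 4 arguments of these two functions closely resemble those of the convolution function: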

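input: the input to pool. A pooling layer usually follows a convolutional layer, so the input is typically a feature map, again of shape [batch, height, width, channels].
ksize: the size of the pooling window, a vector of 4 elements, usually [1, height, width, 1]. We do not want to pool across batch or channels, so those two dimensions are set to 1.
strides: as for convolution, the step of the window along each dimension, usually [1, stride, stride, 1].
padding: as for convolution, either "VALID" or "SAME".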

The function returns a Tensor of the same dtype, whose shape still has the form [batch, height, width, channels].
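
As an illustration of what ksize=[1,2,2,1] with strides=[1,2,2,1] does to a single channel, here is a minimal pure-Python sketch. The function `max_pool_2x2` is a made-up name and handles only one 2-D feature map with even height and width.

```python
def max_pool_2x2(fmap):
    """2x2 max pooling with stride 2 on a [height][width] nested list.

    Assumes even height and width, so VALID and SAME give the same result.
    """
    height, width = len(fmap), len(fmap[0])
    pooled = []
    for y in range(0, height, 2):
        row = []
        for x in range(0, width, 2):
            # Keep the largest of the 4 cells in each window
            window = (fmap[y][x], fmap[y][x + 1],
                      fmap[y + 1][x], fmap[y + 1][x + 1])
            row.append(max(window))
        pooled.append(row)
    return pooled

fmap = [
    [1, 3, 2, 0],
    [4, 2, 1, 5],
    [0, 1, 7, 2],
    [6, 2, 3, 4],
]
print(max_pool_2x2(fmap))  # [[4, 5], [6, 7]]
```

Every 4 cells become 1, so a 4x4 map shrinks to 2x2. Replacing `max(window)` with `sum(window) / 4` would give average pooling instead.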

The code for handwritten digit recognition with a CNN is as follows:

# -*- coding: utf-8 -*-
"""
Created on Mon Apr  2 18:32:47 2018

@author: Administrator
"""

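'''
No CNN class is defined here; one could define a CNN class and even make each layer a class of its own.

Handwritten digit recognition with a CNN

In [1]: Load the data, i.e. the training, validation, and test sets

In [2]: Import tensorflow and start an InteractiveSession (more flexible than a plain Session)

In [3]: Define two helper functions that initialize w and b, for later reuse

In [4]: Define the convolution and pooling functions. The convolution uses SAME padding with stride 1 so that the output has the same size as the input; pooling uses a 2x2 window, so every 4 cells collapse into 1

In [5]: Create the input placeholders x_ and y_

In [6]: Reshape x

In [7]: Define w and b for the first convolutional layer

In [8]: Convolve x_image with w, add b, apply the ReLU activation, then apply max pooling

In [9]: The second convolutional layer, analogous to the first

In [10]: The fully connected layer

In [11]: To reduce overfitting, dropout can be added before the output layer. (This example is simple enough that leaving it out makes little difference.)

In [12]: A softmax layer produces the output

In [13]: Define the cost function and the training step, optimizing with Adam

In [14]: Evaluate on samples from the test set

'''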


import tensorflow as tf
import numpy as np

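'''
1. Load the data
'''
from tensorflow.examples.tutorials.mnist import input_data

# mnist is a lightweight class that stores the training, validation, and test sets as numpy arrays; one_hot=True encodes each label as a 10-dimensional one-hot vector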
mnist = input_data.read_data_sets('MNIST-data',one_hot=True)

print(type(mnist)) #<class 'tensorflow.contrib.learn.python.learn.datasets.base.Datasets'>

print('Training data shape:',mnist.train.images.shape)           #Training data shape: (55000, 784)
print('Test data shape:',mnist.test.images.shape)                #Test data shape: (10000, 784)
print('Validation data shape:',mnist.validation.images.shape)    #Validation data shape: (5000, 784)
print('Training label shape:',mnist.train.labels.shape)          #Training label shape: (55000, 10)



# Let TensorFlow allocate GPU memory on demand
config = tf.ConfigProto()
config.gpu_options.allow_growth = True

sess = tf.InteractiveSession(config=config)


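'''
2. Build the network
'''
'''
Initializing weights and biases
Building this model requires many weights and biases. The weights should be initialized with a small
amount of noise to break symmetry and avoid zero gradients. Because we use ReLU neurons, it is good
practice to initialize the biases with a small positive value, which helps avoid neurons whose output is
always 0 (dead neurons). To avoid repeating the initialization while building the model, we define two
helper functions.
'''
def weight_variable(shape):
    # Initialize the weights from a truncated normal distribution
    initial = tf.truncated_normal(shape,stddev=0.1)    # standard deviation 0.1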
    return tf.Variable(initial)

def bias_variable(shape):
    initial = tf.constant(0.1,shape=shape)
    return tf.Variable(initial)

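'''
Convolution and pooling layers
TensorFlow offers a lot of flexibility for convolution and pooling: how should the borders be handled,
and how large should the stride be? This example uses the vanilla settings throughout. The convolutions
use a stride of 1 with zero padding (SAME), so the output has the same size as the input. Pooling is plain
2x2 max pooling. To keep the code tidy, both operations are wrapped in functions.
'''
# Convolution layer
def conv2d(x,W):
    '''strides[0] = strides[3] = 1; strides[1] is the stride along the height, strides[2] the stride along the width
     Given an input tensor of shape `[batch, in_height, in_width, in_channels]`
     and a filter / kernel tensor of shape `[filter_height, filter_width, in_channels,
     out_channels]`
    '''
    return tf.nn.conv2d(x,W,strides=[1,1,1,1],padding = 'SAME')

# Pooling layer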
def max_pooling(x):
    return tf.nn.max_pool(x,ksize=[1,2,2,1],strides=[1,2,2,1],padding='SAME')


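# Start building the computation graph by creating nodes for the input images and the target classes; None leaves that dimension unspecified so it can hold any batch size
x_ = tf.placeholder(tf.float32,[None,784])
y_ = tf.placeholder(tf.float32,[None,10])


# Reshape x into the form the convolution expects: batch_size handwritten digits, each of shape 28x28x1
'''
To feed this layer, x is reshaped into a 4-D tensor whose 2nd and 3rd dimensions are the image height and
width and whose last dimension is the number of color channels (1 here because the images are grayscale;
it would be 3 for RGB images).
'''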
X = tf.reshape(x_,shape=[-1,28,28,1])


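'''
Now we can implement the first layer: a convolution followed by max pooling. The convolution computes
32 features for each 5x5 patch. Its weight tensor has shape [5, 5, 1, 32]: the first two dimensions are
the patch size, the third is the number of input channels, and the last is the number of output channels.
Each output channel also has its own bias.
'''
# First convolutional layer: 32 filters, each with shared weights of shape 5x5x1  h_conv1.shape=[-1,28,28,32]
w_conv1 = weight_variable([5,5,1,32])
b_conv1 = bias_variable([32])
h_conv1 = tf.nn.relu(conv2d(X,w_conv1) + b_conv1)

# First pooling layer: 2x2 max pooling [-1,28,28,32]->[-1,14,14,32]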
h_pool1 = max_pooling(h_conv1)



# Second convolutional layer: 64 filters, each with shared weights of shape 5x5x32  h_conv2.shape=[-1,14,14,64]
w_conv2 = weight_variable([5,5,32,64])
b_conv2 = bias_variable([64])
h_conv2 = tf.nn.relu(conv2d(h_pool1,w_conv2) + b_conv2)

# Second pooling layer: 2x2 max pooling [-1,14,14,64]->[-1,7,7,64]
h_pool2 = max_pooling(h_conv2)

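'''
Fully connected layer
The image is now reduced to 7x7. We add a fully connected layer with 1024 neurons that processes the
whole image: the pooling output is reshaped into a batch of vectors, multiplied by a weight matrix,
offset by a bias, and passed through ReLU.
'''
h_pool2_flat = tf.reshape(h_pool2,[-1,7*7*64])
# Hidden layer
w_h = weight_variable([7*7*64,1024])
b_h = bias_variable([1024])
hidden = tf.nn.relu(tf.matmul(h_pool2_flat,w_h) + b_h)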


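'''
Dropout: set the outputs of some neurons to 0
To reduce overfitting, dropout is applied before the output layer. A placeholder holds the probability that
a neuron's output is kept, so dropout can be enabled during training and disabled during testing.
Besides masking neuron outputs, tf.nn.dropout also rescales the kept outputs automatically,
so no extra scaling is needed when using it.
'''
keep_prob = tf.placeholder(tf.float32)    # keep probability in (0.0, 1.0]; 1.0 disables dropout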
hidden_drop = tf.nn.dropout(hidden,keep_prob)

'''
Output layer
Finally, we add a softmax layer, just like the single-layer softmax regression seen earlier.
'''
w_o = weight_variable([1024,10])
b_o = bias_variable([10])
output = tf.nn.softmax(tf.matmul(hidden_drop,w_o) + b_o)


'''
3. Define the log-likelihood (cross-entropy) loss
'''
# Cost function J = -(Σ y.log(aL))/n    where . is the element-wise product
cost = tf.reduce_mean(-tf.reduce_sum(y_*tf.log(output),axis=1))


'''
4. Training
'''
train = tf.train.AdamOptimizer(0.0001).minimize(cost)

# Evaluate the predictions
# tf.argmax(output,1) returns the index of the largest value in each row
correct = tf.equal(tf.argmax(output,1),tf.argmax(y_,1))       # boolean tensor marking each prediction as right or wrong
accuracy = tf.reduce_mean(tf.cast(correct,tf.float32))        # mean accuracy


# Lists that store the results of each logged iteration
training_accuracy_list = []
test_accuracy_list = []
training_cost_list=[]
test_cost_list=[]


# Run the graph in the session
sess.run(tf.global_variables_initializer())   # initialize the variables


# Training loop: mini-batch gradient descent with the Adam optimizer
for i in range(5000):   # iterations per epoch = training set size / batch_size
    x_batch,y_batch = mnist.train.next_batch(batch_size = 64)
    # Run one training step (keep_prob 1.0 means dropout is disabled)
    train.run(feed_dict={x_:x_batch,y_:y_batch,keep_prob:1.0})
    if (i+1)%200 == 0:
        # Report accuracy and cost on the current training batch
        #training_accuracy = accuracy.eval(feed_dict={x_:mnist.train.images,y_:mnist.train.labels})
        training_accuracy,training_cost = sess.run([accuracy,cost],feed_dict={x_:x_batch,y_:y_batch,keep_prob:1.0})
        training_accuracy_list.append(training_accuracy)
        training_cost_list.append(training_cost)
        print('Step {0}:Training set accuracy {1},cost {2}.'.format(i+1,training_accuracy,training_cost))

# After training, evaluate on the test set in 200 batches of 50 samples each
# Evaluating the whole test set at once can run out of memory (OOM), so small mini-batches are used instead
#test_accuracy = accuracy.eval(feed_dict={x_:mnist.test.images,y_:mnist.test.labels})
for i in range(200):
    x_batch,y_batch = mnist.test.next_batch(batch_size = 50)
    test_accuracy,test_cost = sess.run([accuracy,cost],feed_dict={x_:x_batch,y_:y_batch,keep_prob:1.0})
    test_accuracy_list.append(test_accuracy)
    test_cost_list.append(test_cost)
    if (i+1)%20 == 0:
        print('Step {0}:Test set accuracy {1},cost {2}.'.format(i+1,test_accuracy,test_cost))
print('Test accuracy:',np.mean(test_accuracy_list))
        

'''
Displaying an image
'''
import matplotlib.pyplot as plt
# Pick an arbitrary image
img = mnist.train.images[2]
label = mnist.train.labels[2]

#print('Pixel values: {0}, label: {1}'.format(img.reshape(28,28),np.argmax(label)))
print('Image label: {0}'.format(np.argmax(label)))

plt.figure()

# Subplot 1
plt.subplot(1,2,1)
plt.imshow(img.reshape(28,28))                # shown with the default (heat-map style) colormap
plt.axis('off')                               # hide the axes

# Subplot 2
plt.subplot(1,2,2)
plt.imshow(img.reshape(28,28),cmap='gray')    # shown as a grayscale image
plt.axis('off')


plt.show()

'''
Display the outputs of the convolution and pooling layers
'''
plt.figure(figsize=(1.0*8,1.6*4))
plt.subplots_adjust(bottom=0,left=.01,right=.99,top=.90,hspace=.35)
# Output of the first convolutional layer  (1,28,28,32)
conv1 = h_conv1.eval(feed_dict={x_:img.reshape([-1,784]),y_:label.reshape([-1,10]),keep_prob:1.0})
print('conv1 shape',conv1.shape)

for i in range(32):
    # Feature map of channel i, shape (28,28); indexing with i shows all 32 maps instead of repeating channel 1
    show_image = conv1[0,:,:,i]
    plt.subplot(4,8,i+1)
    plt.imshow(show_image,cmap='gray')
    plt.axis('off')
plt.show()

plt.figure(figsize=(1.2*8,2.0*4))
plt.subplots_adjust(bottom=0,left=.01,right=.99,top=.90,hspace=.35)
# Output of the first pooling layer  (1,14,14,32)
pool1 = h_pool1.eval(feed_dict={x_:img.reshape([-1,784]),y_:label.reshape([-1,10]),keep_prob:1.0})
print('pool1 shape',pool1.shape)

for i in range(32):
    # Pooled feature map of channel i, shape (14,14)
    show_image = pool1[0,:,:,i]
    plt.subplot(4,8,i+1)
    plt.imshow(show_image,cmap='gray')
    plt.axis('off')
plt.show()
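
The dropout note in the listing says that tf.nn.dropout rescales the outputs it keeps. The sketch below shows that idea (often called inverted dropout) in plain Python. It is an illustration under stated assumptions rather than TensorFlow's implementation; the function name `dropout` and its `rng` parameter are invented for this example.

```python
import random

def dropout(values, keep_prob, rng=random):
    """Inverted dropout: keep each value with probability keep_prob and
    scale the survivors by 1 / keep_prob; dropped values become 0."""
    if not 0.0 < keep_prob <= 1.0:
        raise ValueError("keep_prob must be in (0, 1]")
    return [v / keep_prob if rng.random() < keep_prob else 0.0 for v in values]

random.seed(0)
activations = [1.0] * 10000
dropped = dropout(activations, keep_prob=0.5)
mean_after = sum(dropped) / len(dropped)
print(mean_after)                               # close to 1.0 on average
print(dropout([1.0, 2.0, 3.0], keep_prob=1.0))  # [1.0, 2.0, 3.0]
```

Because the survivors are scaled up, the expected activation is unchanged, which is why feeding keep_prob 1.0 at test time needs no further correction.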