CNN with TFLearn for MNIST image recognition, code walkthrough — conv_2d parameters explained. Training the whole network is, above all, a matter of learning those convolution kernels.

Official parameter documentation:

Convolution 2D

tflearn.layers.conv.conv_2d (incoming, nb_filter, filter_size, strides=1, padding='same', activation='linear', bias=True, weights_init='uniform_scaling', bias_init='zeros', regularizer=None, weight_decay=0.001, trainable=True, restore=True, reuse=False, scope=None, name='Conv2D')

Input

4-D Tensor [batch, height, width, in_channels].

Output

4-D Tensor [batch, new height, new width, nb_filter].

Arguments

  • incoming: Tensor. Incoming 4-D Tensor.
  • nb_filter: int. The number of convolutional filters.
  • filter_size: int or list of int. Size of filters.
  • strides: `int` or list of `int`. Strides of conv operation. Default: [1 1 1 1].
  • padding: str from "same", "valid". Padding algo to use. Default: 'same'.
  • activation: str (name) or function (returning a Tensor) or None. Activation applied to this layer (see tflearn.activations). Default: 'linear'.
  • bias: bool. If True, a bias is used.
  • weights_init: str (name) or Tensor. Weights initialization. (see tflearn.initializations) Default: 'truncated_normal'.
  • bias_init: str (name) or Tensor. Bias initialization. (see tflearn.initializations) Default: 'zeros'.
  • regularizer: str (name) or Tensor. Add a regularizer to this layer weights (see tflearn.regularizers). Default: None.
  • weight_decay: float. Regularizer decay parameter. Default: 0.001.
  • trainable: bool. If True, weights will be trainable.
  • restore: bool. If True, this layer weights will be restored when loading a model.
  • reuse: bool. If True and 'scope' is provided, this layer variables will be reused (shared).
  • scope: str. Define this layer scope (optional). A scope can be used to share variables between layers. Note that scope will override name.
  • name: A name for this layer (optional). Default: 'Conv2D'.
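
As a quick illustration, a call that exercises several of the arguments above might look like the sketch below (net is assumed to be the 4-D output of a previous layer such as input_data; the layer name is just an example):

    # 32 filters of size 5x5, stride 2; with 'same' padding the spatial size becomes ceil(H/2) x ceil(W/2)
    net = tflearn.conv_2d(net, nb_filter=32, filter_size=5, strides=2,
                          padding='same', activation='relu',
                          regularizer='L2', weight_decay=0.001, name='conv1')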

 

Code:

    # 64 filters
    net = tflearn.conv_2d(net, 64, 3, activation='relu')
My own understanding:

Here the filter (convolution kernel) is
[1 0 1
 0 1 0
 1 0 1], size = 3.
Because 64 filters are configured, the convolution yields 64 results, which become the feature maps passed on as input to the next layer. Could it be that the activation function that follows is there so that only part of these outputs get activated???
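
Concretely, for conv_2d(net, 64, 3) all 64 kernels live in a single weight tensor; a small sketch of the shapes involved, assuming a single-channel 28×28 MNIST input and the default strides=1, padding='same':

    # Assumed shapes for conv_2d(net, 64, 3) on a [batch, 28, 28, 1] input:
    #   weights W: [filter_size, filter_size, in_channels, nb_filter] = [3, 3, 1, 64]
    #   bias    b: [nb_filter] = [64]
    #   output:    [batch, 28, 28, 64]  -> 64 feature maps
    n_params = 3 * 3 * 1 * 64 + 64   # 640 trainable parameters in this layer
    print(n_params)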


The figure originally comes from: http://cs231n.github.io/convolutional-networks/

 

If a convolutional layer produces 4 feature maps, does that mean it has 4 convolution kernels?
Yes.


How are these 4 kernels defined?
They are usually initialized randomly and then trained with backpropagation (BP). If you have little data, or no labeled data, you can also consider initializing them layer by layer from the K centroids of a K-means clustering.

The kernels are learned. The layer is called a convolutional layer because its weights are applied exactly the way a convolution is; you can still regard it as a parameter layer whose weights need to be updated.
These four kernels are simply parameters of the network and are trained via BP.
Training the whole network is, above all, a matter of learning those kernels.
Initialize them first, then let BP adjust them; have a look at the Caffe source code if you want the details.
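
In TFLearn you can actually watch this happen: the layer's output tensor carries a W attribute and tflearn.DNN exposes get_weights, so the kernels can be read out before and after a training step. A minimal sketch with dummy data (network layout and variable names are illustrative only):

    import numpy as np
    import tflearn
    from tflearn.layers.core import input_data, fully_connected
    from tflearn.layers.conv import conv_2d
    from tflearn.layers.estimator import regression

    net = input_data(shape=[None, 28, 28, 1])
    conv1 = conv_2d(net, 64, 3, activation='relu', name='conv1')   # kernels get a random init
    net = fully_connected(conv1, 10, activation='softmax')
    net = regression(net, optimizer='adam', loss='categorical_crossentropy')

    model = tflearn.DNN(net)
    before = model.get_weights(conv1.W)           # shape (3, 3, 1, 64): 64 kernels of 3x3x1
    X = np.random.rand(32, 28, 28, 1)             # dummy inputs, just to run one epoch of BP
    Y = np.eye(10)[np.random.randint(0, 10, 32)]
    model.fit(X, Y, n_epoch=1, show_metric=False)
    after = model.get_weights(conv1.W)            # same shape, values nudged by backprop
    print(before.shape, np.abs(after - before).mean())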



--------------------------------------------------------------------------------------------------
The following content is excerpted from: http://blog.csdn.net/bugcreater/article/details/53293075
    from __future__ import division, print_function, absolute_import

    import tflearn
    from tflearn.layers.core import input_data, dropout, fully_connected
    from tflearn.layers.conv import conv_2d, max_pool_2d
    from tflearn.layers.normalization import local_response_normalization
    from tflearn.layers.estimator import regression
    # Load the famous MNIST dataset (http://yann.lecun.com/exdb/mnist/)
    import tflearn.datasets.mnist as mnist
    X, Y, testX, testY = mnist.load_data(one_hot=True)
    X = X.reshape([-1, 28, 28, 1])
    testX = testX.reshape([-1, 28, 28, 1])

    network = input_data(shape=[None, 28, 28, 1], name='input')
    # Convolution step of the CNN; explained in detail below
    network = conv_2d(network, 32, 3, activation='relu', regularizer="L2")
    # Max pooling
    network = max_pool_2d(network, 2)
    # Local response normalization
    network = local_response_normalization(network)
    network = conv_2d(network, 64, 3, activation='relu', regularizer="L2")
    network = max_pool_2d(network, 2)
    network = local_response_normalization(network)
    # Fully connected layers
    network = fully_connected(network, 128, activation='tanh')
    # Dropout
    network = dropout(network, 0.8)
    network = fully_connected(network, 256, activation='tanh')
    network = dropout(network, 0.8)
    network = fully_connected(network, 10, activation='softmax')
    # Regression layer: optimizer, learning rate and loss
    network = regression(network, optimizer='adam', learning_rate=0.01,
                         loss='categorical_crossentropy', name='target')

    # Training
    # DNN wrapper: builds the deep neural network model
    model = tflearn.DNN(network, tensorboard_verbose=0)
    model.fit({'input': X}, {'target': Y}, n_epoch=20,
              validation_set=({'input': testX}, {'target': testY}),
              snapshot_step=100, show_metric=True, run_id='convnet_mnist')
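
Once fit returns, the trained model can be evaluated on the test set and saved to disk; a short follow-up sketch using the variables above (the file name is just an example):

    # Evaluate on the held-out test set and persist the trained weights
    print("Test accuracy:", model.evaluate({'input': testX}, {'target': testY}))
    model.save('convnet_mnist.tflearn')     # example file name
    # model.load('convnet_mnist.tflearn')   # later restores every layer that has restore=True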

 


Looking at the conv_2d source, there are 14 parameters worth going through one by one (the reuse and scope arguments from the signature above are omitted here):

1. incoming: the input tensor, of shape [batch, height, width, in_channels]
2. nb_filter: the number of filters
3. filter_size: the size of each filter, an int (or a list of ints)
4. strides: the stride of the convolution, default [1,1,1,1]
5. padding: the padding mode, "same" or "valid", default "same"
6. activation: the activation function (there is a lot to know here; it will be covered separately)
7. bias: bool; if True, a bias is used
8. weights_init: weight initialization
9. bias_init: bias initialization, default zeros; in the familiar linear function y = wx + b, w plays the role of the weights and b is the bias
10. regularizer: the regularization term applied to the layer weights (another big topic, covered separately)
11. weight_decay: the decay coefficient of the regularizer, default 0.001
12. trainable: bool; whether the weights can be trained
13. restore: bool; whether this layer's weights are restored when a saved model is loaded
14. name: the name of the convolutional layer, default "Conv2D"
 
The max_pool_2d function has 5 parameters in the source:
1. incoming: same as incoming in conv_2d
2. kernel_size: the size of the pooling kernel, analogous to filter_size in conv_2d
3. strides: same as strides in conv_2d
4. padding: as above
5. name: as above
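
A quick sketch of how max_pool_2d is typically called, and what it does to the tensor shape (assuming the 28×28 feature maps produced by the first conv layer above; when strides is not given, TFLearn uses kernel_size as the stride):

    # 2x2 max pooling; with the default stride (= kernel_size) height and width are halved:
    # [batch, 28, 28, 32] -> [batch, 14, 14, 32], the depth (number of feature maps) is unchanged
    network = max_pool_2d(network, 2)
    # an explicit stride can also be given, e.g. overlapping pooling:
    # network = max_pool_2d(network, kernel_size=3, strides=2)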
 
After going through this many parameters things may still feel a bit fuzzy, so let me first use a figure to explain what each parameter means.
 

Here the filter is
[1 0 1
 0 1 0
 1 0 1], size = 3. Because the filter moves one cell at a time, strides = 1.
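
To make the stride-1 sliding concrete, here is a small NumPy sketch that slides this exact 3×3 kernel over a 5×5 binary image one cell at a time (the example image is made up; note that what conv_2d actually computes is this cross-correlation, i.e. no kernel flipping):

    import numpy as np

    image = np.array([[1, 1, 1, 0, 0],
                      [0, 1, 1, 1, 0],
                      [0, 0, 1, 1, 1],
                      [0, 0, 1, 1, 0],
                      [0, 1, 1, 0, 0]])
    kernel = np.array([[1, 0, 1],
                       [0, 1, 0],
                       [1, 0, 1]])
    stride = 1
    out_size = (image.shape[0] - kernel.shape[0]) // stride + 1   # (5 - 3)/1 + 1 = 3
    out = np.zeros((out_size, out_size))
    for i in range(out_size):
        for j in range(out_size):
            patch = image[i*stride:i*stride+3, j*stride:j*stride+3]
            out[i, j] = np.sum(patch * kernel)   # element-wise multiply, then sum
    print(out)   # the resulting 3x3 feature map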
 
For max pooling, have a look at the next figure, where strides = 1 and kernel_size = 2 (the size of each colored block). The max pooling illustrated there keeps the most salient information: in text analysis it can pick out the key words of a sentence, and in image processing it keeps the dominant colors, textures and so on. One more remark about pooling: sometimes average pooling is wanted, and sometimes min pooling.
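
A minimal NumPy sketch of 2×2 max pooling over one feature map, with non-overlapping windows (set s = 1 for the stride-1 case described above; swap np.max for np.mean or np.min to get average or min pooling):

    import numpy as np

    fmap = np.array([[1, 1, 2, 4],
                     [5, 6, 7, 8],
                     [3, 2, 1, 0],
                     [1, 2, 3, 4]])
    k, s = 2, 2                      # kernel_size = 2, stride = 2
    out = np.zeros((fmap.shape[0] // s, fmap.shape[1] // s))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.max(fmap[i*s:i*s+k, j*s:j*s+k])   # keep the largest value in each block
    print(out)   # [[6. 8.], [3. 4.]]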

Now a word about the padding operation. Anyone who has done image processing will recognize it: put simply, it is just filling. For example, when you convolve an image with a 3×3 kernel, near the border the kernel sticks out past the original image, so the image has to be enlarged first; the enlarged part is the padding, and it is almost always filled with zeros.
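
In NumPy terms the zero padding described here is just np.pad: with a 3×3 kernel, a one-pixel border of zeros is exactly what keeps the output the same size as the input ('same' padding):

    import numpy as np

    image = np.arange(9).reshape(3, 3)
    padded = np.pad(image, pad_width=1, mode='constant', constant_values=0)
    print(padded.shape)   # (5, 5): sliding a 3x3 kernel over this gives back a 3x3 output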




Convolution Demo. Below is a running demo of a CONV layer. Since 3D volumes are hard to visualize, all the volumes (the input volume (in blue), the weight volumes (in red), the output volume (in green)) are visualized with each depth slice stacked in rows. The input volume is of size W1=5, H1=5, D1=3, and the CONV layer parameters are K=2, F=3, S=2, P=1. That is, we have two filters of size 3×3, and they are applied with a stride of 2. Therefore, the output volume has spatial size (5 - 3 + 2)/2 + 1 = 3. Moreover, notice that a padding of P=1 is applied to the input volume, making the outer border of the input volume zero.
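
The spatial size used in this demo follows the general formula (W - F + 2P)/S + 1; a tiny helper to check the numbers quoted in this post:

    def conv_output_size(W, F, S, P):
        """Spatial output size of a conv/pool layer: (W - F + 2P)/S + 1."""
        return (W - F + 2 * P) // S + 1

    print(conv_output_size(W=5, F=3, S=2, P=1))    # 3, as in the demo above
    print(conv_output_size(W=28, F=3, S=1, P=1))   # 28, the 'same'-padded MNIST conv layers
    print(conv_output_size(W=224, F=2, S=2, P=0))  # 112, the pooling example below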

 

 

General pooling. In addition to max pooling, the pooling units can also perform other functions, such as average pooling or even L2-norm pooling. Average pooling was often used historically but has recently fallen out of favor compared to the max pooling operation, which has been shown to work better in practice.

 
Pooling layer downsamples the volume spatially, independently in each depth slice of the input volume. Left: In this example, the input volume of size [224x224x64] is pooled with filter size 2, stride 2 into output volume of size [112x112x64]. Notice that the volume depth is preserved. Right: The most common downsampling operation is max, giving rise to max pooling, here shown with a stride of 2. That is, each max is taken over 4 numbers (little 2x2 square).