In the previous article we walked through the basic principles of convolutional neural networks, including the definitions of the basic layers and their computation rules. This article covers how a CNN performs one complete round of training, including forward propagation and backward propagation, and then implements a CNN by hand. If you are not familiar with the basics, read the previous article first: 【深度學習系列】卷積神經網絡CNN原理詳解(一)——基本原理
Forward Propagation in a Convolutional Neural Network
First, let's look at the simplest convolutional neural network:
1. Input layer ----> Convolutional layer
Using the example from the previous section: the input is a 4×4 image, and convolving it with two 2×2 kernels produces two 3×3 feature maps.
Take kernel filter1 as an example (stride = 1):
Compute the input of the first convolutional-layer neuron $o_{11}$:
\begin{equation}
\begin{aligned}
net_{o_{11}} &= conv(input, filter)\\
&= i_{11} \times h_{11} + i_{12} \times h_{12} + i_{21} \times h_{21} + i_{22} \times h_{22}\\
&= 1 \times 1 + 0 \times (-1) + 1 \times 1 + 1 \times (-1) = 1
\end{aligned}
\end{equation}
The output of neuron $o_{11}$ (using the ReLU activation function here):
\begin{equation}
\begin{aligned}
out_{o_{11}} &= activators(net_{o_{11}}) \\
&=max(0,net_{o_{11}}) = 1
\end{aligned}
\end{equation}
The other neurons are computed in the same way.
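To make the arithmetic concrete, here is a minimal numpy sketch of this forward pass. The full 4×4 input below is a made-up example; only its top-left 2×2 patch and the filter1 values follow from the computation above:

    import numpy as np

    # hypothetical 4x4 input; only the top-left 2x2 patch [[1, 0], [1, 1]] is fixed by the text
    image = np.array([[1, 0, 1, 0],
                      [1, 1, 0, 0],
                      [0, 1, 1, 0],
                      [0, 0, 1, 1]])
    filter1 = np.array([[1, -1],
                        [1, -1]])   # h11, h12, h21, h22 as used above

    feature_map = np.zeros((3, 3))
    for i in range(3):
        for j in range(3):
            # net input of neuron o_ij: element-wise product of the patch and the kernel
            net = (image[i:i+2, j:j+2] * filter1).sum()
            feature_map[i, j] = max(0, net)   # ReLU activation

    print(feature_map)   # feature_map[0, 0] == 1, matching net_o11 above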
2. Convolutional layer ----> Pooling layer
Compute the input of pooling-layer neuron $m_{11}$ (using a 2×2 window); the pooling layer has no activation function:
\begin{equation}
\begin{aligned}
net_{m_{11}} &= max(o_{11},o_{12},o_{21},o_{22}) = 1\\
out_{m_{11}} &= net_{m_{11}} = 1
\end{aligned}
\end{equation}
3. Pooling layer ----> Fully connected layer
The pooling layer's output goes to a flatten layer, which "flattens" all the elements into a vector, and then on to the fully connected layer.
4. Fully connected layer ----> Output layer
From the fully connected layer to the output layer is ordinary neuron-to-neuron connectivity; the softmax function converts the outputs into probabilities for the different classes, and the class with the highest probability is taken as the image's predicted class.
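A minimal sketch of this softmax step; the logits vector is a made-up example of fully-connected outputs:

    import numpy as np

    def softmax(logits):
        # subtract the max for numerical stability; the result is unchanged
        exps = np.exp(logits - np.max(logits))
        return exps / exps.sum()

    logits = np.array([2.0, 1.0, 0.1])   # hypothetical fully-connected outputs
    probs = softmax(logits)              # roughly [0.659, 0.242, 0.099]
    print(np.argmax(probs))              # predicted class: index of the highest probability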
Backward Propagation in a Convolutional Neural Network
A traditional neural network is fully connected: to backpropagate, each layer repeatedly takes partial derivatives with respect to the previous layer, i.e. applies the chain rule, which yields every layer's error-sensitivity terms; from those, the gradients of the weights and biases follow, and the weights can be updated. A convolutional neural network, however, has two special layers: the convolutional layer and the pooling layer. The pooling layer's output does not pass through an activation function; it is the maximum of a sliding window, a constant, so its partial derivative is 1. The pooling layer effectively compresses the image from the layer above, so computing its error-sensitivity terms differs from traditional backpropagation. Backpropagating from the convolved feature map to the previous layer is also different from the traditional case, because the feature map was produced in the forward pass by convolving with a kernel, and the kernel's parameters must be updated. Below we look at how the pooling and convolutional layers perform backward propagation.
Before that, let's review traditional backpropagation (a small numeric sketch follows this list):
1. Forward-propagate to compute each layer's input values $net_{i,j}$ (e.g. the input of the first neuron of the convolved feature map: $net_{i_{11}}$)
2. Backward-propagate to compute each neuron's error term $\delta_{i,j}$, $\delta_{i,j} = \frac{\partial E}{\partial net_{i,j}}$, where E is the total error given by the loss function, which can be a squared error, a cross entropy, etc.
3. Compute the gradient of each weight $w_{i,j}$: $\eta_{i,j} = \frac{\partial E}{\partial net_{i,j}} \cdot \frac{\partial net_{i,j}}{\partial w_{i,j}} = \delta_{i,j} \cdot out_{i,j}$
4. Update the weights: $w_{i,j} = w_{i,j}-\lambda \cdot \eta_{i,j}$ (where $\lambda$ is the learning rate)
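A minimal numeric sketch of these four steps for a single ReLU neuron with one weight, under a squared-error loss (all numbers are made up):

    # E = (target - out)^2 / 2, one neuron: net = w * x, out = max(0, net)
    w, lam = 0.5, 0.1                                     # weight and learning rate (lambda above)
    x, target = 1.0, 1.0
    net = w * x                                           # step 1: forward pass, net input
    out = max(0.0, net)                                   # ReLU output = 0.5
    delta = -(target - out) * (1.0 if net > 0 else 0.0)   # step 2: error term dE/dnet = -0.5
    grad = delta * x                                      # step 3: weight gradient dE/dw = -0.5
    w = w - lam * grad                                    # step 4: update, w becomes 0.55
    print(w)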
Backward propagation through the convolutional layer
From forward propagation we have:
Each neuron's input is the previous neuron's output; it passes through the activation function, and the activated output becomes the next neuron's input. Here I use $i_{11}$ for the previous layer and $o_{11}$ for the layer after $i_{11}$. Then $net_{i_{11}}$ is the input of neuron $i_{11}$ and $out_{i_{11}}$ is its output; likewise, $net_{o_{11}}$ is the input of neuron $o_{11}$ and $out_{o_{11}}$ is its output. Because the previous layer's outputs are the next layer's inputs, $out_{i_{11}} = net_{o_{11}}$; to simplify, I write $out_{i_{11}}$ simply as $i_{11}$.
\begin{equation}
\begin{aligned}
i_{11} &= out_{i_{11}} \\
&= activators(net_{i_{11}})\\
net_{o_{11}} &= conv(input, filter)\\
&= i_{11} \times h_{11} + i_{12} \times h_{12} + i_{21} \times h_{21} + i_{22} \times h_{22}\\
out_{o_{11}} &= activators(net_{o_{11}}) \\
&= max(0, net_{o_{11}})
\end{aligned}
\end{equation}
$net_{i_{11}}$ is the previous layer's input and $out_{i_{11}}$ is the previous layer's output.
First compute the error term $\delta_{11}$ of $i_{11}$, the first element of the layer before the convolution:
$$\delta_{11} = \frac{\partial E}{\partial net_{i_{11}}} =\frac{\partial E}{\partial out_{i_{11}}} \cdot \frac{\partial out_{i_{11}}}{\partial net_{i_{11}}} = \frac{\partial E}{\partial i_{11}} \cdot \frac{\partial i_{11}}{\partial net_{i_{11}}}$$
First compute $\frac{\partial E}{\partial i_{11}}$.
It is not obvious how to compute $\frac{\partial E}{\partial i_{11}}$ directly, so let's write out the feature map produced by convolving the input layer with the kernel:
\begin{equation}
\begin{aligned}
net_{o_{11}} = i_{11} \times h_{11} + i_{12} \times h_{12} +i_{21} \times h_{21} + i_{22} \times h_{22} \\
net_{o_{12}} = i_{12} \times h_{11} + i_{13} \times h_{12} +i_{22} \times h_{21} + i_{23} \times h_{22} \\
net_{o_{13}} = i_{13} \times h_{11} + i_{14} \times h_{12} +i_{23} \times h_{21} + i_{24} \times h_{22} \\
net_{o_{21}} = i_{21} \times h_{11} + i_{22} \times h_{12} +i_{31} \times h_{21} + i_{32} \times h_{22} \\
net_{o_{22}} = i_{22} \times h_{11} + i_{23} \times h_{12} +i_{32} \times h_{21} + i_{33} \times h_{22} \\
net_{o_{23}} = i_{23} \times h_{11} + i_{24} \times h_{12} +i_{33} \times h_{21} + i_{34} \times h_{22} \\
net_{o_{31}} = i_{31} \times h_{11} + i_{32} \times h_{12} +i_{41} \times h_{21} + i_{42} \times h_{22} \\
net_{o_{32}} = i_{32} \times h_{11} + i_{33} \times h_{12} +i_{42} \times h_{21} + i_{43} \times h_{22} \\
net_{o_{33}} = i_{33} \times h_{11} + i_{34} \times h_{12} +i_{43} \times h_{21} + i_{44} \times h_{22} \\
\end{aligned}
\end{equation}
Then take the partial derivative with respect to each input element $i_{i,j}$ in turn.
Partial derivative with respect to $i_{11}$:
\begin{equation}
\begin{aligned}
\frac{\partial E}{\partial i_{11}}&=\frac{\partial E}{\partial net_{o_{11}}} \cdot \frac{\partial net_{o_{11}}}{\partial i_{11}}\\
&=\delta_{11} \cdot h_{11}
\end{aligned}
\end{equation}
Partial derivative with respect to $i_{12}$:
\begin{equation}
\begin{aligned}
\frac{\partial E}{\partial i_{12}}&=\frac{\partial E}{\partial net_{o_{11}}} \cdot \frac{\partial net_{o_{11}}}{\partial i_{12}} +\frac{\partial E}{\partial net_{o_{12}}} \cdot \frac{\partial net_{o_{12}}}{\partial i_{12}}\\
&=\delta_{11} \cdot h_{12}+\delta_{12} \cdot h_{11}
\end{aligned}
\end{equation}
Partial derivative with respect to $i_{13}$:
\begin{equation}
\begin{aligned}
\frac{\partial E}{\partial i_{13}}&=\frac{\partial E}{\partial net_{o_{12}}} \cdot \frac{\partial net_{o_{12}}}{\partial i_{13}} +\frac{\partial E}{\partial net_{o_{13}}} \cdot \frac{\partial net_{o_{13}}}{\partial i_{13}}\\
&=\delta_{12} \cdot h_{12}+\delta_{13} \cdot h_{11}
\end{aligned}
\end{equation}
Partial derivative with respect to $i_{21}$:
\begin{equation}
\begin{aligned}
\frac{\partial E}{\partial i_{21}}&=\frac{\partial E}{\partial net_{o_{11}}} \cdot \frac{\partial net_{o_{11}}}{\partial i_{21}} +\frac{\partial E}{\partial net_{o_{21}}} \cdot \frac{\partial net_{o_{21}}}{\partial i_{21}}\\
&=\delta_{11} \cdot h_{21}+\delta_{21} \cdot h_{11}
\end{aligned}
\end{equation}
Partial derivative with respect to $i_{22}$:
\begin{equation}
\begin{aligned}
\frac{\partial E}{\partial i_{22}}&=\frac{\partial E}{\partial net_{o_{11}}} \cdot \frac{\partial net_{o_{11}}}{\partial i_{22}} +\frac{\partial E}{\partial net_{o_{12}}} \cdot \frac{\partial net_{o_{12}}}{\partial i_{22}}\\
&+\frac{\partial E}{\partial net_{o_{21}}} \cdot \frac{\partial net_{o_{21}}}{\partial i_{22}}+\frac{\partial E}{\partial net_{o_{22}}} \cdot \frac{\partial net_{o_{22}}}{\partial i_{22}}\\
&=\delta_{11} \cdot h_{22}+\delta_{12} \cdot h_{21}+\delta_{21} \cdot h_{12}+\delta_{22} \cdot h_{11}
\end{aligned}
\end{equation}
Observing the pattern in the expressions above, we can generalize them into the following expression:
\begin{equation}
\left[ \begin{array}{ccccc}
0 & 0 & 0 & 0 & 0 \\
0 & \delta_{11} & \delta_{12} & \delta_{13} & 0 \\
0 & \delta_{21} & \delta_{22} & \delta_{23} & 0 \\
0 & \delta_{31} & \delta_{32} & \delta_{33} & 0 \\
0 & 0 & 0 & 0 & 0 \\
\end{array} \right]
\ast
\left[ \begin{array}{cc}
h_{22} & h_{21} \\
h_{12} & h_{11} \\
\end{array} \right]
=
\left[ \begin{array}{cccc}
\frac{\partial E}{\partial i_{11}} & \frac{\partial E}{\partial i_{12}} & \frac{\partial E}{\partial i_{13}} & \frac{\partial E}{\partial i_{14}} \\
\frac{\partial E}{\partial i_{21}} & \frac{\partial E}{\partial i_{22}} & \frac{\partial E}{\partial i_{23}} & \frac{\partial E}{\partial i_{24}} \\
\frac{\partial E}{\partial i_{31}} & \frac{\partial E}{\partial i_{32}} & \frac{\partial E}{\partial i_{33}} & \frac{\partial E}{\partial i_{34}} \\
\frac{\partial E}{\partial i_{41}} & \frac{\partial E}{\partial i_{42}} & \frac{\partial E}{\partial i_{43}} & \frac{\partial E}{\partial i_{44}} \\
\end{array} \right]
\end{equation}
The kernel here is rotated by 180°, and convolving it with this layer's error-sensitivity matrix $\delta_{i,j}$ zero-padded around the border yields $\frac{\partial E}{\partial i_{i,j}}$, i.e.
$$\frac{\partial E}{\partial i_{i,j}} = \sum_m \sum_n h_{m,n}\delta_{i+m,j+n}$$
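In code this is a "full" convolution of the zero-padded sensitivity map with the 180°-rotated kernel; a minimal numpy sketch with the shapes used above (the delta values are placeholders):

    import numpy as np

    delta = np.random.randn(3, 3)                # hypothetical sensitivity map of the conv layer
    h = np.array([[1., -1.],
                  [1., -1.]])                    # the 2x2 kernel from the forward example

    padded = np.pad(delta, 1, mode='constant')   # zero-pad by filter_size - 1 = 1 on each side
    rotated = np.rot90(h, 2)                     # rotate the kernel by 180 degrees

    dE_di = np.zeros((4, 4))                     # gradient w.r.t. the 4x4 input
    for i in range(4):
        for j in range(4):
            dE_di[i, j] = (padded[i:i+2, j:j+2] * rotated).sum()

    # e.g. dE_di[0, 0] == delta[0, 0] * h[0, 0], matching the i_11 case above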
With the first term done, let's compute the second term $\frac{\partial i_{11}}{\partial net_{i_{11}}}$:
\begin{equation}
\begin{aligned}
\because i_{11} &= out_{i_{11}} \\
&= activators(net_{i_{11}})\\
\therefore \frac{\partial i_{11}}{\partial net_{i_{11}}}
&=f'(net_{i_{11}})\\
\therefore \delta_{11} &=\frac{\partial E}{\partial net_{i_{11}}} \\
&=\frac{\partial E}{\partial i_{11}} \cdot \frac{\partial i_{11}}{\partial net_{i_{11}}}\\
&=\sum_m \sum_n h_{m,n}\delta_{i+m,j+n} \cdot f'(net_{i_{11}})
\end{aligned}
\end{equation}
With this, the error-sensitivity matrix is complete, and from it we can compute the gradients of the weights.
Since the expressions relating the convolutional layer's inputs $net_{o_{11}}, \dots, net_{o_{33}}$ to the weights $h_{i,j}$ were written out above, we can compute directly:
\begin{equation}
\begin{aligned}
\frac{\partial E}{\partial h_{11}}&=\frac{\partial E}{\partial net_{o_{11}}} \cdot \frac{\partial net_{o_{11}}}{\partial h_{11}}+...\\
&+\frac{\partial E}{\partial net_{o_{33}}} \cdot \frac{\partial net_{o_{33}}}{\partial h_{11}}\\
&=\delta_{11} \cdot i_{11} +...+ \delta_{33} \cdot i_{33}
\end{aligned}
\end{equation}
Generalizing, the gradient of the weights is:
\begin{equation}
\begin{aligned}
\frac{\partial E}{\partial h_{i,j}} = \sum_m \sum_n \delta_{m,n} \, i_{i+m,j+n}
\end{aligned}
\end{equation}
The gradient of the bias term:
\begin{equation}
\begin{aligned}
\frac{\partial E}{\partial b} &=\frac{\partial E}{\partial net_{o_{11}}} \frac{\partial net_{o_{11}}}{\partial b} +\frac{\partial E}{\partial net_{o_{12}}} \frac{\partial net_{o_{12}}}{\partial b}\\
&+ \dots +\frac{\partial E}{\partial net_{o_{33}}} \frac{\partial net_{o_{33}}}{\partial b}\\
&=\delta_{11}+\delta_{12}+ \dots +\delta_{33}\\
&=\sum_i\sum_j\delta_{i,j}
\end{aligned}
\end{equation}
As you can see, the partial derivative with respect to the bias equals the sum of all of this layer's error-sensitivity terms. With the gradients of the weights and the bias in hand, the parameters can be updated by gradient descent.
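Both formulas translate directly into numpy. A sketch with the same shapes as the example above (the input and delta values are placeholders):

    import numpy as np

    inp = np.random.randn(4, 4)      # the 4x4 input to the conv layer (placeholder values)
    delta = np.random.randn(3, 3)    # sensitivity map of the conv layer

    # weight gradient: a "valid" convolution of the input with the sensitivity map
    h_grad = np.zeros((2, 2))
    for i in range(2):
        for j in range(2):
            h_grad[i, j] = (delta * inp[i:i+3, j:j+3]).sum()

    b_grad = delta.sum()             # bias gradient: the sum of all sensitivities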
Backward propagation through the pooling layer
Backward propagation through the pooling layer is easier. One side is the previous layer's output, i.e. the conv layer's feature map; the other is the pooling layer's input. As before, first write out the forward-propagation equations to make the computation easier.
Suppose the maximum within this sliding window of the previous layer is $out_{o_{11}}$:
\begin{equation}
\begin{aligned}
&\because net_{m_{11}} = max(out_{o_{11}},out_{o_{12}},out_{o_{21}},out_{o_{22}})\\
&\therefore \frac{\partial net_{m_{11}}}{\partial out_{o_{11}}} = 1\\
& \frac{\partial net_{m_{11}}}{\partial out_{o_{12}}}=\frac{\partial net_{m_{11}}}{\partial out_{o_{21}}}=\frac{\partial net_{m_{11}}}{\partial out_{o_{22}}} = 0\\
&\therefore \delta_{11}^{l-1} = \frac{\partial E}{\partial out_{o_{11}}} = \frac{\partial E}{\partial net_{m_{11}}} \cdot \frac{\partial net_{m_{11}}}{\partial out_{o_{11}}} =\delta_{11}^l\\
&\delta_{12}^{l-1} = \delta_{21}^{l-1} =\delta_{22}^{l-1} = 0
\end{aligned}
\end{equation}
This gives the pooling layer's error-sensitivity matrix. In the same way we can compute every neuron's gradient and update the weights.
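In other words, the error is routed back only to the position that produced the maximum. A minimal sketch for a single 2×2 window (the values are made up):

    import numpy as np

    window = np.array([[1., 3.],
                       [2., 0.]])     # a 2x2 patch of the conv layer's output
    delta_pool = 0.7                  # sensitivity of the pooling neuron m11

    delta_prev = np.zeros_like(window)
    k, l = np.unravel_index(np.argmax(window), window.shape)
    delta_prev[k, l] = delta_pool     # only the max position receives the error
    print(delta_prev)                 # [[0.  0.7] [0.  0. ]]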
Implementing a Convolutional Neural Network by Hand
1. Define a convolutional layer
First we implement a convolutional layer with ConvLayer, defining the layer's hyperparameters:

    class ConvLayer(object):
        '''
        Parameters:
        input_width: width of the input image
        input_height: height of the input image
        channel_number: number of channels, 3 for color, 1 for grayscale
        filter_width: width of the kernel
        filter_height: height of the kernel
        filter_number: number of kernels
        zero_padding: zero-padding size
        stride: stride
        activator: activation function
        learning_rate: learning rate
        '''
        def __init__(self, input_width, input_height,
                     channel_number, filter_width,
                     filter_height, filter_number,
                     zero_padding, stride, activator,
                     learning_rate):
            self.input_width = input_width
            self.input_height = input_height
            self.channel_number = channel_number
            self.filter_width = filter_width
            self.filter_height = filter_height
            self.filter_number = filter_number
            self.zero_padding = zero_padding
            self.stride = stride
            self.output_width = \
                ConvLayer.calculate_output_size(
                    self.input_width, filter_width, zero_padding,
                    stride)
            self.output_height = \
                ConvLayer.calculate_output_size(
                    self.input_height, filter_height, zero_padding,
                    stride)
            self.output_array = np.zeros((self.filter_number,
                self.output_height, self.output_width))
            self.filters = []
            for i in range(filter_number):
                self.filters.append(Filter(filter_width,
                    filter_height, self.channel_number))
            self.activator = activator
            self.learning_rate = learning_rate

calculate_output_size computes the size of the feature map produced by the convolution, (input_size - filter_size + 2 * zero_padding) / stride + 1; for the 4×4 example above with a 2×2 kernel, no padding and stride 1, that gives (4 - 2 + 0) / 1 + 1 = 3.

    @staticmethod
    def calculate_output_size(input_size,
                              filter_size, zero_padding, stride):
        # integer division under Python 2; use // under Python 3
        return (input_size - filter_size +
                2 * zero_padding) / stride + 1
2. Build the activation functions
Here we use the ReLU activation function, so we define it in activators.py; forward is the forward computation, and backward computes the derivative of the formula:

    class ReluActivator(object):
        def forward(self, weighted_input):
            # return weighted_input
            return max(0, weighted_input)

        def backward(self, output):
            return 1 if output > 0 else 0
Other common activation functions can also go into activators.py; for the sigmoid function, we can define:

    class SigmoidActivator(object):
        def forward(self, weighted_input):
            return 1.0 / (1.0 + np.exp(-weighted_input))

        # the derivative of sigmoid
        def backward(self, output):
            return output * (1 - output)

If we need to define other custom activation functions, each one just needs its own class in activators.py.
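For instance, a tanh activator could be added along the same lines (a sketch, not part of the original activators.py; it assumes numpy is imported as np):

    class TanhActivator(object):
        def forward(self, weighted_input):
            return np.tanh(weighted_input)

        # derivative of tanh expressed via the output: 1 - tanh(x)^2
        def backward(self, output):
            return 1 - output * output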
3. Define a class that stores the convolutional layer's parameters and gradients

    class Filter(object):
        def __init__(self, width, height, depth):
            # initial weights
            self.weights = np.random.uniform(-1e-4, 1e-4,
                (depth, height, width))
            # initial bias
            self.bias = 0
            self.weights_grad = np.zeros(
                self.weights.shape)
            self.bias_grad = 0

        def __repr__(self):
            return 'filter weights:\n%s\nbias:\n%s' % (
                repr(self.weights), repr(self.bias))

        def get_weights(self):
            return self.weights

        def get_bias(self):
            return self.bias

        def update(self, learning_rate):
            self.weights -= learning_rate * self.weights_grad
            self.bias -= learning_rate * self.bias_grad
4. Forward propagation of the convolutional layer
1). Get the convolution region

    # get the convolution region
    def get_patch(input_array, i, j, filter_width,
                  filter_height, stride):
        '''
        Get the region of the input array for this convolution step;
        automatically handles 2D and 3D inputs
        '''
        start_i = i * stride
        start_j = j * stride
        if input_array.ndim == 2:
            input_array_conv = input_array[
                start_i : start_i + filter_height,
                start_j : start_j + filter_width]
            print "input_array_conv:", input_array_conv
            return input_array_conv
        elif input_array.ndim == 3:
            input_array_conv = input_array[:,
                start_i : start_i + filter_height,
                start_j : start_j + filter_width]
            print "input_array_conv:", input_array_conv
            return input_array_conv
2). Perform the convolution

    def conv(input_array,
             kernel_array,
             output_array,
             stride, bias):
        '''
        Compute the convolution; automatically handles 2D and 3D inputs
        '''
        channel_number = input_array.ndim
        output_width = output_array.shape[1]
        output_height = output_array.shape[0]
        kernel_width = kernel_array.shape[-1]
        kernel_height = kernel_array.shape[-2]
        for i in range(output_height):
            for j in range(output_width):
                output_array[i][j] = (
                    get_patch(input_array, i, j, kernel_width,
                              kernel_height, stride) * kernel_array
                ).sum() + bias
3). Add zero_padding

    # add zero padding
    def padding(input_array, zp):
        '''
        Zero-pad the array; automatically handles 2D and 3D inputs
        '''
        if zp == 0:
            return input_array
        else:
            if input_array.ndim == 3:
                input_width = input_array.shape[2]
                input_height = input_array.shape[1]
                input_depth = input_array.shape[0]
                padded_array = np.zeros((
                    input_depth,
                    input_height + 2 * zp,
                    input_width + 2 * zp))
                padded_array[:,
                    zp : zp + input_height,
                    zp : zp + input_width] = input_array
                return padded_array
            elif input_array.ndim == 2:
                input_width = input_array.shape[1]
                input_height = input_array.shape[0]
                padded_array = np.zeros((
                    input_height + 2 * zp,
                    input_width + 2 * zp))
                padded_array[zp : zp + input_height,
                    zp : zp + input_width] = input_array
                return padded_array
4). Perform forward propagation

    def forward(self, input_array):
        '''
        Compute the convolutional layer's output;
        the result is stored in self.output_array
        '''
        self.input_array = input_array
        self.padded_input_array = padding(input_array,
            self.zero_padding)
        for f in range(self.filter_number):
            filter = self.filters[f]
            conv(self.padded_input_array,
                 filter.get_weights(), self.output_array[f],
                 self.stride, filter.get_bias())
        element_wise_op(self.output_array,
                        self.activator.forward)

Here the element_wise_op function applies an operation (in this case the activation function) to each element of a numpy array in place:

    # element-wise operation on a numpy array: apply op to each element in place
    def element_wise_op(array, op):
        for i in np.nditer(array,
                           op_flags=['readwrite']):
            i[...] = op(i)
5. Backward propagation of the convolutional layer
1). Pass the error to the previous layer

    def bp_sensitivity_map(self, sensitivity_array,
                           activator):
        '''
        Compute the sensitivity map passed to the previous layer
        sensitivity_array: this layer's sensitivity map
        activator: the previous layer's activation function
        '''
        # handle the stride: expand the original sensitivity map
        expanded_array = self.expand_sensitivity_map(
            sensitivity_array)
        # full convolution: zero-pad the sensitivity map.
        # The original zero-padded cells also receive a residual,
        # but it need not be propagated further, so we skip it.
        expanded_width = expanded_array.shape[2]
        zp = (self.input_width +
              self.filter_width - 1 - expanded_width) / 2
        padded_array = padding(expanded_array, zp)
        # initialize delta_array, which stores the sensitivity map
        # passed to the previous layer
        self.delta_array = self.create_delta_array()
        # for a layer with multiple filters, the sensitivity map
        # passed to the previous layer is the sum of all the
        # filters' sensitivity maps
        for f in range(self.filter_number):
            filter = self.filters[f]
            # rotate the filter weights by 180 degrees
            flipped_weights = np.array(map(
                lambda i: np.rot90(i, 2),
                filter.get_weights()))
            # compute the delta_array corresponding to one filter
            delta_array = self.create_delta_array()
            for d in range(delta_array.shape[0]):
                conv(padded_array[f], flipped_weights[d],
                     delta_array[d], 1, 0)
            self.delta_array += delta_array
        # element-wise multiply the result with the derivative
        # of the activation function
        derivative_array = np.array(self.input_array)
        element_wise_op(derivative_array,
                        activator.backward)
        self.delta_array *= derivative_array
2). Create the array that stores the sensitivity map passed to the previous layer

    def create_delta_array(self):
        return np.zeros((self.channel_number,
            self.input_height, self.input_width))
3). Compute the gradients

    def bp_gradient(self, sensitivity_array):
        # handle the stride: expand the original sensitivity map
        expanded_array = self.expand_sensitivity_map(
            sensitivity_array)
        for f in range(self.filter_number):
            # compute the gradient of each weight
            filter = self.filters[f]
            for d in range(filter.weights.shape[0]):
                conv(self.padded_input_array[d],
                     expanded_array[f],
                     filter.weights_grad[d], 1, 0)
            # compute the gradient of the bias
            filter.bias_grad = expanded_array[f].sum()
4). Update the parameters by gradient descent

    def update(self):
        '''
        Update the weights by gradient descent
        '''
        for filter in self.filters:
            filter.update(self.learning_rate)
6. Training the MaxPooling layer
1). Define the MaxPooling class

    class MaxPoolingLayer(object):
        def __init__(self, input_width, input_height,
                     channel_number, filter_width,
                     filter_height, stride):
            self.input_width = input_width
            self.input_height = input_height
            self.channel_number = channel_number
            self.filter_width = filter_width
            self.filter_height = filter_height
            self.stride = stride
            self.output_width = (input_width -
                filter_width) / self.stride + 1
            self.output_height = (input_height -
                filter_height) / self.stride + 1
            self.output_array = np.zeros((self.channel_number,
                self.output_height, self.output_width))
2). Forward propagation

    # forward propagation
    def forward(self, input_array):
        for d in range(self.channel_number):
            for i in range(self.output_height):
                for j in range(self.output_width):
                    self.output_array[d, i, j] = (
                        get_patch(input_array[d], i, j,
                                  self.filter_width,
                                  self.filter_height,
                                  self.stride).max())
3). Backward propagation

    # backward propagation
    def backward(self, input_array, sensitivity_array):
        self.delta_array = np.zeros(input_array.shape)
        for d in range(self.channel_number):
            for i in range(self.output_height):
                for j in range(self.output_width):
                    patch_array = get_patch(
                        input_array[d], i, j,
                        self.filter_width,
                        self.filter_height,
                        self.stride)
                    k, l = get_max_index(patch_array)
                    self.delta_array[d,
                        i * self.stride + k,
                        j * self.stride + l] = \
                        sensitivity_array[d, i, j]
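A quick usage sketch, assuming the get_patch and get_max_index helpers shown above and Python 2 semantics (the output-size computation relies on integer division); the input values are made up:

    import numpy as np

    pool = MaxPoolingLayer(input_width=4, input_height=4,
                           channel_number=1, filter_width=2,
                           filter_height=2, stride=2)
    x = np.array([[[ 1.,  2.,  3.,  4.],
                   [ 5.,  6.,  7.,  8.],
                   [ 9., 10., 11., 12.],
                   [13., 14., 15., 16.]]])
    pool.forward(x)
    print(pool.output_array)    # [[[ 6. 8.] [14. 16.]]]

    sens = np.ones(pool.output_array.shape)
    pool.backward(x, sens)
    print(pool.delta_array)     # 1 at each window's max position, 0 elsewhere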
The full code is in cnn.py (https://github.com/huxiaoman7/PaddlePaddle_code/blob/master/1.mnist/cnn.py); an excerpt:

    #coding:utf-8
    '''
    Created by huxiaoman 2017.11.22
    '''

    import numpy as np
    from activators import ReluActivator, IdentityActivator

    class ConvLayer(object):
        def __init__(self, input_width, input_height,
                     channel_number, filter_width,
                     filter_height, filter_number,
                     zero_padding, stride, activator,
                     learning_rate):
            self.input_width = input_width
            self.input_height = input_height
            self.channel_number = channel_number
            self.filter_width = filter_width
            self.filter_height = filter_height
            self.filter_number = filter_number
            self.zero_padding = zero_padding
            self.stride = stride  # stride_x and stride_y could be added here
            self.output_width = ConvLayer.calculate_output_size(
                self.input_width, filter_width, zero_padding,
                stride)
            self.output_height = ConvLayer.calculate_output_size(
                self.input_height, filter_height, zero_padding,
                stride)
            self.output_array = np.zeros((self.filter_number,
                self.output_height, self.output_width))
            self.filters = []
            for i in range(filter_number):
                self.filters.append(Filter(filter_width,
                    filter_height, self.channel_number))
            self.activator = activator
            self.learning_rate = learning_rate

        def forward(self, input_array):
            '''
            Compute the convolutional layer's output;
            the result is stored in self.output_array
            '''
            self.input_array = input_array
            self.padded_input_array = padding(input_array,
                self.zero_padding)
            for f in range(self.filter_number):
                filter = self.filters[f]
                conv(self.padded_input_array,
                     filter.get_weights(), self.output_array[f],
                     self.stride, filter.get_bias())
            element_wise_op(self.output_array,
                            self.activator.forward)

    def get_patch(input_array, i, j, filter_width, filter_height, stride):
        '''
        Get the region of the input array for this convolution step;
        automatically handles 2D and 3D inputs
        '''
        start_i = i * stride
        start_j = j * stride
        if input_array.ndim == 2:
            return input_array[
                start_i : start_i + filter_height,
                start_j : start_j + filter_width]
        elif input_array.ndim == 3:
            return input_array[:,
                start_i : start_i + filter_height,
                start_j : start_j + filter_width]

    # get the index of the maximum in a 2D region
    def get_max_index(array):
        max_i = 0
        max_j = 0
        max_value = array[0, 0]
        for i in range(array.shape[0]):
            for j in range(array.shape[1]):
                if array[i, j] > max_value:
                    max_value = array[i, j]
                    max_i, max_j = i, j
        return max_i, max_j

    def conv(input_array, kernel_array,
             output_array, stride, bias):
        '''
        Compute the convolution; automatically handles 2D and 3D inputs
        '''
        channel_number = input_array.ndim
        output_width = output_array.shape[1]
        output_height = output_array.shape[0]
        kernel_width = kernel_array.shape[-1]
        kernel_height = kernel_array.shape[-2]
        for i in range(output_height):
            for j in range(output_width):
                output_array[i][j] = (
                    get_patch(input_array, i, j, kernel_width,
                              kernel_height, stride) * kernel_array).sum() + bias

    def element_wise_op(array, op):
        for i in np.nditer(array,
                           op_flags=['readwrite']):
            i[...] = op(i)

    class ReluActivators(object):
        def forward(self, weighted_input):
            # ReLU formula: max(0, input)
            return max(0, weighted_input)

        def backward(self, output):
            return 1 if output > 0 else 0

    class SigmoidActivator(object):
        def forward(self, weighted_input):
            return 1.0 / (1.0 + np.exp(-weighted_input))

        def backward(self, output):
            return output * (1 - output)
Finally, let's verify the output after one forward pass and one backward pass of the network; the test input is a 5×5 image with 3 channels and two 3×3 kernels:

    def init_test():
        a = np.array(
            [[[0, 1, 1, 0, 2],
              [2, 2, 2, 2, 1],
              [1, 0, 0, 2, 0],
              [0, 1, 1, 0, 0],
              [1, 2, 0, 0, 2]],
             [[1, 0, 2, 2, 0],
              [0, 0, 0, 2, 0],
              [1, 2, 1, 2, 1],
              [1, 0, 0, 0, 0],
              [1, 2, 1, 1, 1]],
             [[2, 1, 2, 0, 0],
              [1, 0, 0, 1, 0],
              [0, 2, 1, 0, 1],
              [0, 1, 2, 2, 2],
              [2, 1, 0, 0, 1]]])
        b = np.array(
            [[[0, 1, 1],
              [2, 2, 2],
              [1, 0, 0]],
             [[1, 0, 2],
              [0, 0, 0],
              [1, 2, 1]]])
        cl = ConvLayer(5, 5, 3, 3, 3, 2, 1, 2, IdentityActivator(), 0.001)
        cl.filters[0].weights = np.array(
            [[[-1, 1, 0],
              [0, 1, 0],
              [0, 1, 1]],
             [[-1, -1, 0],
              [0, 0, 0],
              [0, -1, 0]],
             [[0, 0, -1],
              [0, 1, 0],
              [1, -1, -1]]], dtype=np.float64)
        cl.filters[0].bias = 1
        cl.filters[1].weights = np.array(
            [[[1, 1, -1],
              [-1, -1, 1],
              [0, -1, 1]],
             [[0, 1, 0],
              [-1, 0, -1],
              [-1, 1, 0]],
             [[-1, 0, 0],
              [-1, 0, 1],
              [-1, 0, 0]]], dtype=np.float64)
        return a, b, cl

Run it:

    def test():
        a, b, cl = init_test()
        cl.forward(a)
        print "forward propagation result:", cl.output_array
        cl.backward(a, b, IdentityActivator())
        cl.update()
        print "filter1 after backward propagation and update:", cl.filters[0]
        print "filter2 after backward propagation and update:", cl.filters[1]

    if __name__ == "__main__":
        test()

Output:

    forward propagation result: [[[ 6.  7.  5.]
      [ 3. -1. -1.]
      [ 2. -1.  4.]]

     [[ 2. -5. -8.]
      [ 1. -4. -4.]
      [ 0. -5. -5.]]]
    filter1 after backward propagation and update: filter weights:
    array([[[-1.008,  0.99 , -0.009],
            [-0.005,  0.994, -0.006],
            [-0.006,  0.995,  0.996]],

           [[-1.004, -1.001, -0.004],
            [-0.01 , -0.009, -0.012],
            [-0.002, -1.002, -0.002]],

           [[-0.002, -0.002, -1.003],
            [-0.005,  0.992, -0.005],
            [ 0.993, -1.008, -1.007]]])
    bias:
    0.99099999999999999
    filter2 after backward propagation and update: filter weights:
    array([[[  9.98000000e-01,   9.98000000e-01,  -1.00100000e+00],
            [ -1.00400000e+00,  -1.00700000e+00,   9.97000000e-01],
            [ -4.00000000e-03,  -1.00400000e+00,   9.98000000e-01]],

           [[  0.00000000e+00,   9.99000000e-01,   0.00000000e+00],
            [ -1.00900000e+00,  -5.00000000e-03,  -1.00400000e+00],
            [ -1.00400000e+00,   1.00000000e+00,   0.00000000e+00]],

           [[ -1.00400000e+00,  -6.00000000e-03,  -5.00000000e-03],
            [ -1.00200000e+00,  -5.00000000e-03,   9.98000000e-01],
            [ -1.00200000e+00,  -1.00000000e-03,   0.00000000e+00]]])
    bias:
    -0.0070000000000000001
PaddlePaddle Convolutional Network Source Code Walkthrough
The convolutional layer
In the previous article we briefly introduced PaddlePaddle's functions for building a convolutional network. For handwritten-digit recognition, we designed the CNN by calling simple_img_conv_pool (the link in the previous article is broken because the framework moved to fluid; it updates fast = =). It is used like this:

    conv_pool_1 = paddle.networks.simple_img_conv_pool(
        input=img,
        filter_size=5,
        num_filters=20,
        num_channel=1,
        pool_size=2,
        pool_stride=2,
        act=paddle.activation.Relu())

This function wraps the convolutional layer and the pooling layer together, so a single call takes care of both, which is very convenient. If you only need a convolutional layer on its own, you can call img_conv_layer, used like this:

    conv = img_conv_layer(input=data, filter_size=1, filter_size_y=1,
                          num_channels=8,
                          num_filters=16, stride=1,
                          bias_attr=False,
                          act=ReluActivation())

Let's look at the parameters of this function (the comments explain what each parameter means and how it is used):
    def img_conv_layer(input,
                       filter_size,
                       num_filters,
                       name=None,
                       num_channels=None,
                       act=None,
                       groups=1,
                       stride=1,
                       padding=0,
                       dilation=1,
                       bias_attr=None,
                       param_attr=None,
                       shared_biases=True,
                       layer_attr=None,
                       filter_size_y=None,
                       stride_y=None,
                       padding_y=None,
                       dilation_y=None,
                       trans=False,
                       layer_type=None):
        """
        Convolution layer for images. Paddle supports both square and
        rectangular image inputs.

        It can also be used for image deconvolution (convolutional
        transpose, i.e. deconv), again with square or rectangular inputs.

        num_channels: number of channels of the input image. It can be 1 or 3,
        or the channel number of the previous layer (number of kernels * number
        of groups). Each group processes some of the image's channels. For
        example, if an input's num_channels is 256 and we set 4 groups with
        32 kernels, then 32 * 4 = 128 kernels are created to process the input
        image. The channels are split into 4 blocks, and the first 32 kernels
        process 64 (256 / 4 = 64) channels; the remaining kernel groups process
        the remaining channels.

        name: name of the layer. Optional, user-defined.
        type: basestring

        input: input of this layer
        type: LayerOutput

        filter_size: the x dimension of the kernel, i.e. its width.
        A tuple can also be passed to give both dimensions of the kernel.
        type: int / tuple / list

        filter_size_y: the y dimension of the kernel, i.e. its height.
        PaddlePaddle supports rectangular image sizes, so the kernel size is
        (filter_size, filter_size_y).
        type: int / None

        act: activation type. Defaults to ReLU.
        type: BaseActivation

        groups: number of kernel groups
        type: int

        stride: horizontal stride. A tuple can also be passed, meaning the
        horizontal and vertical strides are the same.
        type: int / tuple / list

        stride_y: vertical stride.
        type: int

        padding: horizontal zero-padding size; a tuple can also be passed,
        meaning the horizontal and vertical padding are the same.
        type: int / tuple / list

        padding_y: vertical zero-padding size
        type: int

        dilation: horizontal dilation; a tuple likewise means the horizontal
        and vertical dilations are the same.
        type: int / tuple / list

        dilation_y: vertical dilation
        type: int

        bias_attr: bias attribute.
        False: no bias; True: bias initialized to 0
        type: ParameterAttribute / None / bool / Any

        num_channels: channels of the input image. If set to None, it is
        inferred automatically from the previous layer's output channels.
        type: int

        param_attr: convolution parameter attribute. None means defaults.
        type: ParameterAttribute

        shared_biases: whether the bias is shared among kernels.
        type: bool

        layer_attr: the layer's Extra Attribute
        type: ExtraLayerAttribute

        trans: set to True for a convTransLayer, False for a conv layer
        type: bool

        layer_type: specify the layer type explicitly; defaults to None.
        If trans=True it must be "exconvt" or "cudnn_convt"; otherwise it is
        either "exconv" or "cudnn_conv".
        Note: by default paddle automatically picks ExpandConvLayer for CPU
        and CudnnConvLayer for GPU, but we can also choose the type explicitly.
        type: string
        return: LayerOutput object
        rtype: LayerOutput
        """

        if num_channels is None:
            assert input.num_filters is not None
            num_channels = input.num_filters

        if filter_size_y is None:
            if isinstance(filter_size, collections.Sequence):
                assert len(filter_size) == 2
                filter_size, filter_size_y = filter_size
            else:
                filter_size_y = filter_size

        if stride_y is None:
            if isinstance(stride, collections.Sequence):
                assert len(stride) == 2
                stride, stride_y = stride
            else:
                stride_y = stride

        if padding_y is None:
            if isinstance(padding, collections.Sequence):
                assert len(padding) == 2
                padding, padding_y = padding
            else:
                padding_y = padding

        if dilation_y is None:
            if isinstance(dilation, collections.Sequence):
                assert len(dilation) == 2
                dilation, dilation_y = dilation
            else:
                dilation_y = dilation

        if param_attr.attr.get('initial_smart'):
            # special initialization for conv layers
            init_w = (2.0 / (filter_size**2 * num_channels))**0.5
            param_attr.attr["initial_mean"] = 0.0
            param_attr.attr["initial_std"] = init_w
            param_attr.attr["initial_strategy"] = 0
            param_attr.attr["initial_smart"] = False

        if layer_type:
            if dilation > 1 or dilation_y > 1:
                assert layer_type in [
                    "cudnn_conv", "cudnn_convt", "exconv", "exconvt"
                ]
            if trans:
                assert layer_type in ["exconvt", "cudnn_convt"]
            else:
                assert layer_type in ["exconv", "cudnn_conv"]
            lt = layer_type
        else:
            lt = LayerType.CONVTRANS_LAYER if trans else LayerType.CONV_LAYER

        l = Layer(
            name=name,
            inputs=Input(
                input.name,
                conv=Conv(
                    filter_size=filter_size,
                    padding=padding,
                    dilation=dilation,
                    stride=stride,
                    channels=num_channels,
                    groups=groups,
                    filter_size_y=filter_size_y,
                    padding_y=padding_y,
                    dilation_y=dilation_y,
                    stride_y=stride_y),
                **param_attr.attr),
            active_type=act.name,
            num_filters=num_filters,
            bias=ParamAttr.to_bias(bias_attr),
            shared_biases=shared_biases,
            type=lt,
            **ExtraLayerAttribute.to_kwargs(layer_attr))
        return LayerOutput(
            name,
            lt,
            parents=[input],
            activation=act,
            num_filters=num_filters,
            size=l.config.size)
Knowing what these parameters mean and comparing with the CNN we wrote by hand, we can see several advantages of paddlepaddle:
- It supports both rectangular and square image sizes
- The stride, zero_padding and dilation can take different values in the horizontal and vertical directions
- The bias can be shared across kernels
- It automatically adapts the convolution network to CPU and GPU
Our hand-written CNN supports only square images and raises an error on rectangular ones; the stride, padding size and so on must also be the same horizontally and vertically. Now that we understand the convolutional layer's parameters, let's see how the underlying source implements it: ConvBaseLayer.py. Anyone interested can follow that link to see how the underlying C++ ConvLayer is written.
The pooling layer works the same way and can be analyzed along the same lines; if you are interested, you can follow it all the way down to the low-level implementation. I'll analyze it in detail another time. (Placeholder: I'll add the TensorFlow source-code analysis tomorrow.)
Summary
This article explained some of the fine points of backpropagation in convolutional neural networks, including how backpropagation through the convolutional and pooling layers differs from traditional backpropagation, and implemented a complete CNN. You can modify the code yourself afterwards, for example to handle different horizontal and vertical strides. Finally, we studied how PaddlePaddle implements the CNN's convolutional layer and, comparing it with our hand-written CNN, summarized four advantages; the lower level is implemented in C++, which those interested can dig into further. This write-up is fairly rough, so feel free to leave a comment if you spot problems :)
References:
1. https://www.cnblogs.com/pinard/p/6494810.html
2. https://www.zybuluo.com/hanbingtao/note/476663