In the previous article we walked through the basic principles of convolutional neural networks, including the definitions of the basic layers and their computation rules. This article covers how a CNN performs one complete round of training, including forward propagation and backward propagation, and then implements a CNN by hand. If you are not familiar with the basics, read the previous article first: 【深度學習系列】卷積神經網絡CNN原理詳解(一)——基本原理
Forward Propagation in a Convolutional Neural Network
First, let's look at the simplest convolutional neural network:
1. Input layer ----> Convolutional layer
Using the example from the previous section: the input is a 4×4 image, and convolving it with two 2×2 kernels produces two 3×3 feature maps.
Take kernel filter1 as an example (stride = 1):
Compute the input of the first convolutional-layer neuron $o_{11}$:
\begin{equation}
\begin{aligned}
net_{o_{11}} &= conv(input, filter)\\
&= i_{11} \times h_{11} + i_{12} \times h_{12} + i_{21} \times h_{21} + i_{22} \times h_{22}\\
&= 1 \times 1 + 0 \times (-1) + 1 \times 1 + 1 \times (-1) = 1
\end{aligned}
\end{equation}
The output of neuron $o_{11}$ (using the ReLU activation function here):
\begin{equation}
\begin{aligned}
out_{o_{11}} &= activators(net_{o_{11}}) \\
&=max(0,net_{o_{11}}) = 1
\end{aligned}
\end{equation}
The other neurons are computed in the same way.
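To make the arithmetic concrete, here is a minimal numpy sketch of this forward pass. The full 4×4 input below is a made-up example; only its top-left 2×2 patch and the filter1 values follow from the computation above:

    import numpy as np

    # hypothetical 4x4 input; only the top-left 2x2 patch [[1, 0], [1, 1]] is fixed by the text
    image = np.array([[1, 0, 1, 0],
                      [1, 1, 0, 0],
                      [0, 1, 1, 0],
                      [0, 0, 1, 1]])
    filter1 = np.array([[1, -1],
                        [1, -1]])   # h11, h12, h21, h22 as used above

    feature_map = np.zeros((3, 3))
    for i in range(3):
        for j in range(3):
            # net input of neuron o_ij: element-wise product of the patch and the kernel
            net = (image[i:i+2, j:j+2] * filter1).sum()
            feature_map[i, j] = max(0, net)   # ReLU activation

    print(feature_map)   # feature_map[0, 0] == 1, matching net_o11 above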
2. Convolutional layer ----> Pooling layer
Compute the input of pooling-layer neuron $m_{11}$ (using a 2×2 window); the pooling layer has no activation function:
\begin{equation}
\begin{aligned}
net_{m_{11}} &= max(o_{11},o_{12},o_{21},o_{22}) = 1\\
out_{m_{11}} &= net_{m_{11}} = 1
\end{aligned}
\end{equation}
3. Pooling layer ----> Fully connected layer
The pooling layer's output goes to a flatten layer, which "flattens" all the elements into a vector, and then on to the fully connected layer.
4. Fully connected layer ----> Output layer
From the fully connected layer to the output layer is ordinary neuron-to-neuron connectivity; the softmax function converts the outputs into probabilities for the different classes, and the class with the highest probability is taken as the image's predicted class.
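A minimal sketch of this softmax step; the logits vector is a made-up example of fully-connected outputs:

    import numpy as np

    def softmax(logits):
        # subtract the max for numerical stability; the result is unchanged
        exps = np.exp(logits - np.max(logits))
        return exps / exps.sum()

    logits = np.array([2.0, 1.0, 0.1])   # hypothetical fully-connected outputs
    probs = softmax(logits)              # roughly [0.659, 0.242, 0.099]
    print(np.argmax(probs))              # predicted class: index of the highest probability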
Backward Propagation in a Convolutional Neural Network
A traditional neural network is fully connected: to backpropagate, each layer repeatedly takes partial derivatives with respect to the previous layer, i.e. applies the chain rule, which yields every layer's error-sensitivity terms; from those, the gradients of the weights and biases follow, and the weights can be updated. A convolutional neural network, however, has two special layers: the convolutional layer and the pooling layer. The pooling layer's output does not pass through an activation function; it is the maximum of a sliding window, a constant, so its partial derivative is 1. The pooling layer effectively compresses the image from the layer above, so computing its error-sensitivity terms differs from traditional backpropagation. Backpropagating from the convolved feature map to the previous layer is also different from the traditional case, because the feature map was produced in the forward pass by convolving with a kernel, and the kernel's parameters must be updated. Below we look at how the pooling and convolutional layers perform backward propagation.
Before that, let's review traditional backpropagation (a small numeric sketch follows this list):
1. Forward-propagate to compute each layer's input values $net_{i,j}$ (e.g. the input of the first neuron of the convolved feature map: $net_{i_{11}}$)
2. Backward-propagate to compute each neuron's error term $\delta_{i,j}$, $\delta_{i,j} = \frac{\partial E}{\partial net_{i,j}}$, where E is the total error given by the loss function, which can be a squared error, a cross entropy, etc.
3. Compute the gradient of each weight $w_{i,j}$: $\eta_{i,j} = \frac{\partial E}{\partial net_{i,j}} \cdot \frac{\partial net_{i,j}}{\partial w_{i,j}} = \delta_{i,j} \cdot out_{i,j}$
4. Update the weights: $w_{i,j} = w_{i,j}-\lambda \cdot \eta_{i,j}$ (where $\lambda$ is the learning rate)
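A minimal numeric sketch of these four steps for a single ReLU neuron with one weight, under a squared-error loss (all numbers are made up):

    # E = (target - out)^2 / 2, one neuron: net = w * x, out = max(0, net)
    w, lam = 0.5, 0.1                                     # weight and learning rate (lambda above)
    x, target = 1.0, 1.0
    net = w * x                                           # step 1: forward pass, net input
    out = max(0.0, net)                                   # ReLU output = 0.5
    delta = -(target - out) * (1.0 if net > 0 else 0.0)   # step 2: error term dE/dnet = -0.5
    grad = delta * x                                      # step 3: weight gradient dE/dw = -0.5
    w = w - lam * grad                                    # step 4: update, w becomes 0.55
    print(w)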
Backward propagation through the convolutional layer
From forward propagation we have:
Each neuron's input is the previous neuron's output; it passes through the activation function, and the activated output becomes the next neuron's input. Here I use $i_{11}$ for the previous layer and $o_{11}$ for the layer after $i_{11}$. Then $net_{i_{11}}$ is the input of neuron $i_{11}$ and $out_{i_{11}}$ is its output; likewise, $net_{o_{11}}$ is the input of neuron $o_{11}$ and $out_{o_{11}}$ is its output. Because the previous layer's outputs are the next layer's inputs, $out_{i_{11}} = net_{o_{11}}$; to simplify, I write $out_{i_{11}}$ simply as $i_{11}$.
\begin{equation}
\begin{aligned}
i_{11} &= out_{i_{11}} \\
&= activators(net_{i_{11}})\\
net_{o_{11}} &= conv(input, filter)\\
&= i_{11} \times h_{11} + i_{12} \times h_{12} + i_{21} \times h_{21} + i_{22} \times h_{22}\\
out_{o_{11}} &= activators(net_{o_{11}}) \\
&= max(0, net_{o_{11}})
\end{aligned}
\end{equation}
$net_{i_{11}}$ is the previous layer's input and $out_{i_{11}}$ is the previous layer's output.
First compute the error term $\delta_{11}$ of $i_{11}$, the first element of the layer before the convolution:
$$\delta_{11} = \frac{\partial E}{\partial net_{i_{11}}} =\frac{\partial E}{\partial out_{i_{11}}} \cdot \frac{\partial out_{i_{11}}}{\partial net_{i_{11}}} = \frac{\partial E}{\partial i_{11}} \cdot \frac{\partial i_{11}}{\partial net_{i_{11}}}$$
First compute $\frac{\partial E}{\partial i_{11}}$.
It is not obvious how to compute $\frac{\partial E}{\partial i_{11}}$ directly, so let's write out the feature map produced by convolving the input layer with the kernel:
\begin{equation}
\begin{aligned}
net_{o_{11}} = i_{11} \times h_{11} + i_{12} \times h_{12} +i_{21} \times h_{21} + i_{22} \times h_{22} \\
net_{o_{12}} = i_{12} \times h_{11} + i_{13} \times h_{12} +i_{22} \times h_{21} + i_{23} \times h_{22} \\
net_{o_{13}} = i_{13} \times h_{11} + i_{14} \times h_{12} +i_{23} \times h_{21} + i_{24} \times h_{22} \\
net_{o_{21}} = i_{21} \times h_{11} + i_{22} \times h_{12} +i_{31} \times h_{21} + i_{32} \times h_{22} \\
net_{o_{22}} = i_{22} \times h_{11} + i_{23} \times h_{12} +i_{32} \times h_{21} + i_{33} \times h_{22} \\
net_{o_{23}} = i_{23} \times h_{11} + i_{24} \times h_{12} +i_{33} \times h_{21} + i_{34} \times h_{22} \\
net_{o_{31}} = i_{31} \times h_{11} + i_{32} \times h_{12} +i_{41} \times h_{21} + i_{42} \times h_{22} \\
net_{o_{32}} = i_{32} \times h_{11} + i_{33} \times h_{12} +i_{42} \times h_{21} + i_{43} \times h_{22} \\
net_{o_{33}} = i_{33} \times h_{11} + i_{34} \times h_{12} +i_{43} \times h_{21} + i_{44} \times h_{22} \\
\end{aligned}
\end{equation}
Then take the partial derivative with respect to each input element $i_{i,j}$ in turn.
Partial derivative with respect to $i_{11}$:
\begin{equation}
\begin{aligned}
\frac{\partial E}{\partial i_{11}}&=\frac{\partial E}{\partial net_{o_{11}}} \cdot \frac{\partial net_{o_{11}}}{\partial i_{11}}\\
&=\delta_{11} \cdot h_{11}
\end{aligned}
\end{equation}
Partial derivative with respect to $i_{12}$:
\begin{equation}
\begin{aligned}
\frac{\partial E}{\partial i_{12}}&=\frac{\partial E}{\partial net_{o_{11}}} \cdot \frac{\partial net_{o_{11}}}{\partial i_{12}} +\frac{\partial E}{\partial net_{o_{12}}} \cdot \frac{\partial net_{o_{12}}}{\partial i_{12}}\\
&=\delta_{11} \cdot h_{12}+\delta_{12} \cdot h_{11}
\end{aligned}
\end{equation}
Partial derivative with respect to $i_{13}$:
\begin{equation}
\begin{aligned}
\frac{\partial E}{\partial i_{13}}&=\frac{\partial E}{\partial net_{o_{12}}} \cdot \frac{\partial net_{o_{12}}}{\partial i_{13}} +\frac{\partial E}{\partial net_{o_{13}}} \cdot \frac{\partial net_{o_{13}}}{\partial i_{13}}\\
&=\delta_{12} \cdot h_{12}+\delta_{13} \cdot h_{11}
\end{aligned}
\end{equation}
Partial derivative with respect to $i_{21}$:
\begin{equation}
\begin{aligned}
\frac{\partial E}{\partial i_{21}}&=\frac{\partial E}{\partial net_{o_{11}}} \cdot \frac{\partial net_{o_{11}}}{\partial i_{21}} +\frac{\partial E}{\partial net_{o_{21}}} \cdot \frac{\partial net_{o_{21}}}{\partial i_{21}}\\
&=\delta_{11} \cdot h_{21}+\delta_{21} \cdot h_{11}
\end{aligned}
\end{equation}
Partial derivative with respect to $i_{22}$:
\begin{equation}
\begin{aligned}
\frac{\partial E}{\partial i_{22}}&=\frac{\partial E}{\partial net_{o_{11}}} \cdot \frac{\partial net_{o_{11}}}{\partial i_{22}} +\frac{\partial E}{\partial net_{o_{12}}} \cdot \frac{\partial net_{o_{12}}}{\partial i_{22}}\\
&+\frac{\partial E}{\partial net_{o_{21}}} \cdot \frac{\partial net_{o_{21}}}{\partial i_{22}}+\frac{\partial E}{\partial net_{o_{22}}} \cdot \frac{\partial net_{o_{22}}}{\partial i_{22}}\\
&=\delta_{11} \cdot h_{22}+\delta_{12} \cdot h_{21}+\delta_{21} \cdot h_{12}+\delta_{22} \cdot h_{11}
\end{aligned}
\end{equation}
Observing the pattern in the expressions above, we can generalize them into the following expression:
\begin{equation}
\left[ \begin{array}{ccccc}
0 & 0 & 0 & 0 & 0 \\
0 & \delta_{11} & \delta_{12} & \delta_{13} & 0 \\
0 & \delta_{21} & \delta_{22} & \delta_{23} & 0 \\
0 & \delta_{31} & \delta_{32} & \delta_{33} & 0 \\
0 & 0 & 0 & 0 & 0 \\
\end{array} \right]
\ast
\left[ \begin{array}{cc}
h_{22} & h_{21} \\
h_{12} & h_{11} \\
\end{array} \right]
=
\left[ \begin{array}{cccc}
\frac{\partial E}{\partial i_{11}} & \frac{\partial E}{\partial i_{12}} & \frac{\partial E}{\partial i_{13}} & \frac{\partial E}{\partial i_{14}} \\
\frac{\partial E}{\partial i_{21}} & \frac{\partial E}{\partial i_{22}} & \frac{\partial E}{\partial i_{23}} & \frac{\partial E}{\partial i_{24}} \\
\frac{\partial E}{\partial i_{31}} & \frac{\partial E}{\partial i_{32}} & \frac{\partial E}{\partial i_{33}} & \frac{\partial E}{\partial i_{34}} \\
\frac{\partial E}{\partial i_{41}} & \frac{\partial E}{\partial i_{42}} & \frac{\partial E}{\partial i_{43}} & \frac{\partial E}{\partial i_{44}} \\
\end{array} \right]
\end{equation}
The kernel here is rotated by 180°, and convolving it with this layer's error-sensitivity matrix $\delta_{i,j}$ zero-padded around the border yields $\frac{\partial E}{\partial i_{i,j}}$, i.e.
$$\frac{\partial E}{\partial i_{i,j}} = \sum_m \sum_n h_{m,n}\delta_{i+m,j+n}$$
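In code this is a "full" convolution of the zero-padded sensitivity map with the 180°-rotated kernel; a minimal numpy sketch with the shapes used above (the delta values are placeholders):

    import numpy as np

    delta = np.random.randn(3, 3)                # hypothetical sensitivity map of the conv layer
    h = np.array([[1., -1.],
                  [1., -1.]])                    # the 2x2 kernel from the forward example

    padded = np.pad(delta, 1, mode='constant')   # zero-pad by filter_size - 1 = 1 on each side
    rotated = np.rot90(h, 2)                     # rotate the kernel by 180 degrees

    dE_di = np.zeros((4, 4))                     # gradient w.r.t. the 4x4 input
    for i in range(4):
        for j in range(4):
            dE_di[i, j] = (padded[i:i+2, j:j+2] * rotated).sum()

    # e.g. dE_di[0, 0] == delta[0, 0] * h[0, 0], matching the i_11 case above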
With the first term done, let's compute the second term $\frac{\partial i_{11}}{\partial net_{i_{11}}}$:
\begin{equation}
\begin{aligned}
\because i_{11} &= out_{i_{11}} \\
&= activators(net_{i_{11}})\\
\therefore \frac{\partial i_{11}}{\partial net_{i_{11}}}
&=f'(net_{i_{11}})\\
\therefore \delta_{11} &=\frac{\partial E}{\partial net_{i_{11}}} \\
&=\frac{\partial E}{\partial i_{11}} \cdot \frac{\partial i_{11}}{\partial net_{i_{11}}}\\
&=\sum_m \sum_n h_{m,n}\delta_{i+m,j+n} \cdot f'(net_{i_{11}})
\end{aligned}
\end{equation}
With this, the error-sensitivity matrix is complete, and from it we can compute the gradients of the weights.
Since the expressions relating the convolutional layer's inputs $net_{o_{11}}, \dots, net_{o_{33}}$ to the weights $h_{i,j}$ were written out above, we can compute directly:
\begin{equation}
\begin{aligned}
\frac{\partial E}{\partial h_{11}}&=\frac{\partial E}{\partial net_{o_{11}}} \cdot \frac{\partial net_{o_{11}}}{\partial h_{11}}+...\\
&+\frac{\partial E}{\partial net_{o_{33}}} \cdot \frac{\partial net_{o_{33}}}{\partial h_{11}}\\
&=\delta_{11} \cdot i_{11} +...+ \delta_{33} \cdot i_{33}
\end{aligned}
\end{equation}
Generalizing, the gradient of the weights is:
\begin{equation}
\begin{aligned}
\frac{\partial E}{\partial h_{i,j}} = \sum_m \sum_n \delta_{m,n} \, i_{i+m,j+n}
\end{aligned}
\end{equation}
The gradient of the bias term:
\begin{equation}
\begin{aligned}
\frac{\partial E}{\partial b} &=\frac{\partial E}{\partial net_{o_{11}}} \frac{\partial net_{o_{11}}}{\partial b} +\frac{\partial E}{\partial net_{o_{12}}} \frac{\partial net_{o_{12}}}{\partial b}\\
&+ \dots +\frac{\partial E}{\partial net_{o_{33}}} \frac{\partial net_{o_{33}}}{\partial b}\\
&=\delta_{11}+\delta_{12}+ \dots +\delta_{33}\\
&=\sum_i\sum_j\delta_{i,j}
\end{aligned}
\end{equation}
As you can see, the partial derivative with respect to the bias equals the sum of all of this layer's error-sensitivity terms. With the gradients of the weights and the bias in hand, the parameters can be updated by gradient descent.
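Both formulas translate directly into numpy. A sketch with the same shapes as the example above (the input and delta values are placeholders):

    import numpy as np

    inp = np.random.randn(4, 4)      # the 4x4 input to the conv layer (placeholder values)
    delta = np.random.randn(3, 3)    # sensitivity map of the conv layer

    # weight gradient: a "valid" convolution of the input with the sensitivity map
    h_grad = np.zeros((2, 2))
    for i in range(2):
        for j in range(2):
            h_grad[i, j] = (delta * inp[i:i+3, j:j+3]).sum()

    b_grad = delta.sum()             # bias gradient: the sum of all sensitivities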
Backward propagation through the pooling layer
Backward propagation through the pooling layer is easier. One side is the previous layer's output, i.e. the conv layer's feature map; the other is the pooling layer's input. As before, first write out the forward-propagation equations to make the computation easier.
Suppose the maximum within this sliding window of the previous layer is $out_{o_{11}}$:
\begin{equation}
\begin{aligned}
&\because net_{m_{11}} = max(out_{o_{11}},out_{o_{12}},out_{o_{21}},out_{o_{22}})\\
&\therefore \frac{\partial net_{m_{11}}}{\partial out_{o_{11}}} = 1\\
& \frac{\partial net_{m_{11}}}{\partial out_{o_{12}}}=\frac{\partial net_{m_{11}}}{\partial out_{o_{21}}}=\frac{\partial net_{m_{11}}}{\partial out_{o_{22}}} = 0\\
&\therefore \delta_{11}^{l-1} = \frac{\partial E}{\partial out_{o_{11}}} = \frac{\partial E}{\partial net_{m_{11}}} \cdot \frac{\partial net_{m_{11}}}{\partial out_{o_{11}}} =\delta_{11}^l\\
&\delta_{12}^{l-1} = \delta_{21}^{l-1} =\delta_{22}^{l-1} = 0
\end{aligned}
\end{equation}
This gives the pooling layer's error-sensitivity matrix. In the same way we can compute every neuron's gradient and update the weights.
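In other words, the error is routed back only to the position that produced the maximum. A minimal sketch for a single 2×2 window (the values are made up):

    import numpy as np

    window = np.array([[1., 3.],
                       [2., 0.]])     # a 2x2 patch of the conv layer's output
    delta_pool = 0.7                  # sensitivity of the pooling neuron m11

    delta_prev = np.zeros_like(window)
    k, l = np.unravel_index(np.argmax(window), window.shape)
    delta_prev[k, l] = delta_pool     # only the max position receives the error
    print(delta_prev)                 # [[0.  0.7] [0.  0. ]]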
Implementing a Convolutional Neural Network by Hand
1. Define a convolutional layer
First we implement a convolutional layer with ConvLayer, defining the layer's hyperparameters:

    class ConvLayer(object):
        '''
        Parameters:
        input_width: width of the input image
        input_height: height of the input image
        channel_number: number of channels, 3 for color, 1 for grayscale
        filter_width: width of the kernel
        filter_height: height of the kernel
        filter_number: number of kernels
        zero_padding: zero-padding size
        stride: stride
        activator: activation function
        learning_rate: learning rate
        '''
        def __init__(self, input_width, input_height,
                     channel_number, filter_width,
                     filter_height, filter_number,
                     zero_padding, stride, activator,
                     learning_rate):
            self.input_width = input_width
            self.input_height = input_height
            self.channel_number = channel_number
            self.filter_width = filter_width
            self.filter_height = filter_height
            self.filter_number = filter_number
            self.zero_padding = zero_padding
            self.stride = stride
            self.output_width = \
                ConvLayer.calculate_output_size(
                    self.input_width, filter_width, zero_padding,
                    stride)
            self.output_height = \
                ConvLayer.calculate_output_size(
                    self.input_height, filter_height, zero_padding,
                    stride)
            self.output_array = np.zeros((self.filter_number,
                self.output_height, self.output_width))
            self.filters = []
            for i in range(filter_number):
                self.filters.append(Filter(filter_width,
                    filter_height, self.channel_number))
            self.activator = activator
            self.learning_rate = learning_rate

calculate_output_size computes the size of the feature map produced by the convolution, (input_size - filter_size + 2 * zero_padding) / stride + 1; for the 4×4 example above with a 2×2 kernel, no padding and stride 1, that gives (4 - 2 + 0) / 1 + 1 = 3.

    @staticmethod
    def calculate_output_size(input_size,
                              filter_size, zero_padding, stride):
        # integer division under Python 2; use // under Python 3
        return (input_size - filter_size +
                2 * zero_padding) / stride + 1
2. Build the activation functions
Here we use the ReLU activation function, so we define it in activators.py; forward is the forward computation, and backward computes the derivative of the formula:

    class ReluActivator(object):
        def forward(self, weighted_input):
            # return weighted_input
            return max(0, weighted_input)

        def backward(self, output):
            return 1 if output > 0 else 0
Other common activation functions can also go into activators.py; for the sigmoid function, we can define:

    class SigmoidActivator(object):
        def forward(self, weighted_input):
            return 1.0 / (1.0 + np.exp(-weighted_input))

        # the derivative of sigmoid
        def backward(self, output):
            return output * (1 - output)

If we need to define other custom activation functions, each one just needs its own class in activators.py.
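For instance, a tanh activator could be added along the same lines (a sketch, not part of the original activators.py; it assumes numpy is imported as np):

    class TanhActivator(object):
        def forward(self, weighted_input):
            return np.tanh(weighted_input)

        # derivative of tanh expressed via the output: 1 - tanh(x)^2
        def backward(self, output):
            return 1 - output * output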
3. Define a class that stores the convolutional layer's parameters and gradients

    class Filter(object):
        def __init__(self, width, height, depth):
            # initial weights
            self.weights = np.random.uniform(-1e-4, 1e-4,
                (depth, height, width))
            # initial bias
            self.bias = 0
            self.weights_grad = np.zeros(
                self.weights.shape)
            self.bias_grad = 0

        def __repr__(self):
            return 'filter weights:\n%s\nbias:\n%s' % (
                repr(self.weights), repr(self.bias))

        def get_weights(self):
            return self.weights

        def get_bias(self):
            return self.bias

        def update(self, learning_rate):
            self.weights -= learning_rate * self.weights_grad
            self.bias -= learning_rate * self.bias_grad
4. Forward propagation of the convolutional layer
1). Get the convolution region

    # get the convolution region
    def get_patch(input_array, i, j, filter_width,
                  filter_height, stride):
        '''
        Get the region of the input array for this convolution step;
        automatically handles 2D and 3D inputs
        '''
        start_i = i * stride
        start_j = j * stride
        if input_array.ndim == 2:
            input_array_conv = input_array[
                start_i : start_i + filter_height,
                start_j : start_j + filter_width]
            print "input_array_conv:", input_array_conv
            return input_array_conv
        elif input_array.ndim == 3:
            input_array_conv = input_array[:,
                start_i : start_i + filter_height,
                start_j : start_j + filter_width]
            print "input_array_conv:", input_array_conv
            return input_array_conv
2). Perform the convolution

    def conv(input_array,
             kernel_array,
             output_array,
             stride, bias):
        '''
        Compute the convolution; automatically handles 2D and 3D inputs
        '''
        channel_number = input_array.ndim
        output_width = output_array.shape[1]
        output_height = output_array.shape[0]
        kernel_width = kernel_array.shape[-1]
        kernel_height = kernel_array.shape[-2]
        for i in range(output_height):
            for j in range(output_width):
                output_array[i][j] = (
                    get_patch(input_array, i, j, kernel_width,
                              kernel_height, stride) * kernel_array
                ).sum() + bias
3). Add zero_padding

    # add zero padding
    def padding(input_array, zp):
        '''
        Zero-pad the array; automatically handles 2D and 3D inputs
        '''
        if zp == 0:
            return input_array
        else:
            if input_array.ndim == 3:
                input_width = input_array.shape[2]
                input_height = input_array.shape[1]
                input_depth = input_array.shape[0]
                padded_array = np.zeros((
                    input_depth,
                    input_height + 2 * zp,
                    input_width + 2 * zp))
                padded_array[:,
                    zp : zp + input_height,
                    zp : zp + input_width] = input_array
                return padded_array
            elif input_array.ndim == 2:
                input_width = input_array.shape[1]
                input_height = input_array.shape[0]
                padded_array = np.zeros((
                    input_height + 2 * zp,
                    input_width + 2 * zp))
                padded_array[zp : zp + input_height,
                    zp : zp + input_width] = input_array
                return padded_array
4). Perform forward propagation

    def forward(self, input_array):
        '''
        Compute the convolutional layer's output;
        the result is stored in self.output_array
        '''
        self.input_array = input_array
        self.padded_input_array = padding(input_array,
            self.zero_padding)
        for f in range(self.filter_number):
            filter = self.filters[f]
            conv(self.padded_input_array,
                 filter.get_weights(), self.output_array[f],
                 self.stride, filter.get_bias())
        element_wise_op(self.output_array,
                        self.activator.forward)

Here the element_wise_op function applies an operation (in this case the activation function) to each element of a numpy array in place:

    # element-wise operation on a numpy array: apply op to each element in place
    def element_wise_op(array, op):
        for i in np.nditer(array,
                           op_flags=['readwrite']):
            i[...] = op(i)
5. Backward propagation of the convolutional layer
1). Pass the error to the previous layer

    def bp_sensitivity_map(self, sensitivity_array,
                           activator):
        '''
        Compute the sensitivity map passed to the previous layer
        sensitivity_array: this layer's sensitivity map
        activator: the previous layer's activation function
        '''
        # handle the stride: expand the original sensitivity map
        expanded_array = self.expand_sensitivity_map(
            sensitivity_array)
        # full convolution: zero-pad the sensitivity map.
        # The original zero-padded cells also receive a residual,
        # but it need not be propagated further, so we skip it.
        expanded_width = expanded_array.shape[2]
        zp = (self.input_width +
              self.filter_width - 1 - expanded_width) / 2
        padded_array = padding(expanded_array, zp)
        # initialize delta_array, which stores the sensitivity map
        # passed to the previous layer
        self.delta_array = self.create_delta_array()
        # for a layer with multiple filters, the sensitivity map
        # passed to the previous layer is the sum of all the
        # filters' sensitivity maps
        for f in range(self.filter_number):
            filter = self.filters[f]
            # rotate the filter weights by 180 degrees
            flipped_weights = np.array(map(
                lambda i: np.rot90(i, 2),
                filter.get_weights()))
            # compute the delta_array corresponding to one filter
            delta_array = self.create_delta_array()
            for d in range(delta_array.shape[0]):
                conv(padded_array[f], flipped_weights[d],
                     delta_array[d], 1, 0)
            self.delta_array += delta_array
        # element-wise multiply the result with the derivative
        # of the activation function
        derivative_array = np.array(self.input_array)
        element_wise_op(derivative_array,
                        activator.backward)
        self.delta_array *= derivative_array
2). Create the array that stores the sensitivity map passed to the previous layer

    def create_delta_array(self):
        return np.zeros((self.channel_number,
            self.input_height, self.input_width))
3). Compute the gradients

    def bp_gradient(self, sensitivity_array):
        # handle the stride: expand the original sensitivity map
        expanded_array = self.expand_sensitivity_map(
            sensitivity_array)
        for f in range(self.filter_number):
            # compute the gradient of each weight
            filter = self.filters[f]
            for d in range(filter.weights.shape[0]):
                conv(self.padded_input_array[d],
                     expanded_array[f],
                     filter.weights_grad[d], 1, 0)
            # compute the gradient of the bias
            filter.bias_grad = expanded_array[f].sum()
4). Update the parameters by gradient descent

    def update(self):
        '''
        Update the weights by gradient descent
        '''
        for filter in self.filters:
            filter.update(self.learning_rate)
6. Training the MaxPooling layer
1). Define the MaxPooling class

    class MaxPoolingLayer(object):
        def __init__(self, input_width, input_height,
                     channel_number, filter_width,
                     filter_height, stride):
            self.input_width = input_width
            self.input_height = input_height
            self.channel_number = channel_number
            self.filter_width = filter_width
            self.filter_height = filter_height
            self.stride = stride
            self.output_width = (input_width -
                filter_width) / self.stride + 1
            self.output_height = (input_height -
                filter_height) / self.stride + 1
            self.output_array = np.zeros((self.channel_number,
                self.output_height, self.output_width))
2). Forward propagation

    # forward propagation
    def forward(self, input_array):
        for d in range(self.channel_number):
            for i in range(self.output_height):
                for j in range(self.output_width):
                    self.output_array[d, i, j] = (
                        get_patch(input_array[d], i, j,
                                  self.filter_width,
                                  self.filter_height,
                                  self.stride).max())
3). Backward propagation

    # backward propagation
    def backward(self, input_array, sensitivity_array):
        self.delta_array = np.zeros(input_array.shape)
        for d in range(self.channel_number):
            for i in range(self.output_height):
                for j in range(self.output_width):
                    patch_array = get_patch(
                        input_array[d], i, j,
                        self.filter_width,
                        self.filter_height,
                        self.stride)
                    k, l = get_max_index(patch_array)
                    self.delta_array[d,
                        i * self.stride + k,
                        j * self.stride + l] = \
                        sensitivity_array[d, i, j]
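A quick usage sketch, assuming the get_patch and get_max_index helpers shown above and Python 2 semantics (the output-size computation relies on integer division); the input values are made up:

    import numpy as np

    pool = MaxPoolingLayer(input_width=4, input_height=4,
                           channel_number=1, filter_width=2,
                           filter_height=2, stride=2)
    x = np.array([[[ 1.,  2.,  3.,  4.],
                   [ 5.,  6.,  7.,  8.],
                   [ 9., 10., 11., 12.],
                   [13., 14., 15., 16.]]])
    pool.forward(x)
    print(pool.output_array)    # [[[ 6. 8.] [14. 16.]]]

    sens = np.ones(pool.output_array.shape)
    pool.backward(x, sens)
    print(pool.delta_array)     # 1 at each window's max position, 0 elsewhere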
The full code is in cnn.py (https://github.com/huxiaoman7/PaddlePaddle_code/blob/master/1.mnist/cnn.py); an excerpt:

    #coding:utf-8
    '''
    Created by huxiaoman 2017.11.22
    '''

    import numpy as np
    from activators import ReluActivator, IdentityActivator

    class ConvLayer(object):
        def __init__(self, input_width, input_height,
                     channel_number, filter_width,
                     filter_height, filter_number,
                     zero_padding, stride, activator,
                     learning_rate):
            self.input_width = input_width
            self.input_height = input_height
            self.channel_number = channel_number
            self.filter_width = filter_width
            self.filter_height = filter_height
            self.filter_number = filter_number
            self.zero_padding = zero_padding
            self.stride = stride  # stride_x and stride_y could be added here
            self.output_width = ConvLayer.calculate_output_size(
                self.input_width, filter_width, zero_padding,
                stride)
            self.output_height = ConvLayer.calculate_output_size(
                self.input_height, filter_height, zero_padding,
                stride)
            self.output_array = np.zeros((self.filter_number,
                self.output_height, self.output_width))
            self.filters = []
            for i in range(filter_number):
                self.filters.append(Filter(filter_width,
                    filter_height, self.channel_number))
            self.activator = activator
            self.learning_rate = learning_rate

        def forward(self, input_array):
            '''
            Compute the convolutional layer's output;
            the result is stored in self.output_array
            '''
            self.input_array = input_array
            self.padded_input_array = padding(input_array,
                self.zero_padding)
            for f in range(self.filter_number):
                filter = self.filters[f]
                conv(self.padded_input_array,
                     filter.get_weights(), self.output_array[f],
                     self.stride, filter.get_bias())
            element_wise_op(self.output_array,
                            self.activator.forward)

    def get_patch(input_array, i, j, filter_width, filter_height, stride):
        '''
        Get the region of the input array for this convolution step;
        automatically handles 2D and 3D inputs
        '''
        start_i = i * stride
        start_j = j * stride
        if input_array.ndim == 2:
            return input_array[
                start_i : start_i + filter_height,
                start_j : start_j + filter_width]
        elif input_array.ndim == 3:
            return input_array[:,
                start_i : start_i + filter_height,
                start_j : start_j + filter_width]

    # get the index of the maximum in a 2D region
    def get_max_index(array):
        max_i = 0
        max_j = 0
        max_value = array[0, 0]
        for i in range(array.shape[0]):
            for j in range(array.shape[1]):
                if array[i, j] > max_value:
                    max_value = array[i, j]
                    max_i, max_j = i, j
        return max_i, max_j

    def conv(input_array, kernel_array,
             output_array, stride, bias):
        '''
        Compute the convolution; automatically handles 2D and 3D inputs
        '''
        channel_number = input_array.ndim
        output_width = output_array.shape[1]
        output_height = output_array.shape[0]
        kernel_width = kernel_array.shape[-1]
        kernel_height = kernel_array.shape[-2]
        for i in range(output_height):
            for j in range(output_width):
                output_array[i][j] = (
                    get_patch(input_array, i, j, kernel_width,
                              kernel_height, stride) * kernel_array).sum() + bias

    def element_wise_op(array, op):
        for i in np.nditer(array,
                           op_flags=['readwrite']):
            i[...] = op(i)

    class ReluActivators(object):
        def forward(self, weighted_input):
            # ReLU formula: max(0, input)
            return max(0, weighted_input)

        def backward(self, output):
            return 1 if output > 0 else 0

    class SigmoidActivator(object):
        def forward(self, weighted_input):
            return 1.0 / (1.0 + np.exp(-weighted_input))

        def backward(self, output):
            return output * (1 - output)
Finally, let's verify the output after one forward pass and one backward pass of the network; the test input is a 5×5 image with 3 channels and two 3×3 kernels:

    def init_test():
        a = np.array(
            [[[0, 1, 1, 0, 2],
              [2, 2, 2, 2, 1],
              [1, 0, 0, 2, 0],
              [0, 1, 1, 0, 0],
              [1, 2, 0, 0, 2]],
             [[1, 0, 2, 2, 0],
              [0, 0, 0, 2, 0],
              [1, 2, 1, 2, 1],
              [1, 0, 0, 0, 0],
              [1, 2, 1, 1, 1]],
             [[2, 1, 2, 0, 0],
              [1, 0, 0, 1, 0],
              [0, 2, 1, 0, 1],
              [0, 1, 2, 2, 2],
              [2, 1, 0, 0, 1]]])
        b = np.array(
            [[[0, 1, 1],
              [2, 2, 2],
              [1, 0, 0]],
             [[1, 0, 2],
              [0, 0, 0],
              [1, 2, 1]]])
        cl = ConvLayer(5, 5, 3, 3, 3, 2, 1, 2, IdentityActivator(), 0.001)
        cl.filters[0].weights = np.array(
            [[[-1, 1, 0],
              [0, 1, 0],
              [0, 1, 1]],
             [[-1, -1, 0],
              [0, 0, 0],
              [0, -1, 0]],
             [[0, 0, -1],
              [0, 1, 0],
              [1, -1, -1]]], dtype=np.float64)
        cl.filters[0].bias = 1
        cl.filters[1].weights = np.array(
            [[[1, 1, -1],
              [-1, -1, 1],
              [0, -1, 1]],
             [[0, 1, 0],
              [-1, 0, -1],
              [-1, 1, 0]],
             [[-1, 0, 0],
              [-1, 0, 1],
              [-1, 0, 0]]], dtype=np.float64)
        return a, b, cl

Run it:

    def test():
        a, b, cl = init_test()
        cl.forward(a)
        print "forward propagation result:", cl.output_array
        cl.backward(a, b, IdentityActivator())
        cl.update()
        print "filter1 after backward propagation and update:", cl.filters[0]
        print "filter2 after backward propagation and update:", cl.filters[1]

    if __name__ == "__main__":
        test()

Output:

    forward propagation result: [[[ 6.  7.  5.]
      [ 3. -1. -1.]
      [ 2. -1.  4.]]

     [[ 2. -5. -8.]
      [ 1. -4. -4.]
      [ 0. -5. -5.]]]
    filter1 after backward propagation and update: filter weights:
    array([[[-1.008,  0.99 , -0.009],
            [-0.005,  0.994, -0.006],
            [-0.006,  0.995,  0.996]],

           [[-1.004, -1.001, -0.004],
            [-0.01 , -0.009, -0.012],
            [-0.002, -1.002, -0.002]],

           [[-0.002, -0.002, -1.003],
            [-0.005,  0.992, -0.005],
            [ 0.993, -1.008, -1.007]]])
    bias:
    0.99099999999999999
    filter2 after backward propagation and update: filter weights:
    array([[[  9.98000000e-01,   9.98000000e-01,  -1.00100000e+00],
            [ -1.00400000e+00,  -1.00700000e+00,   9.97000000e-01],
            [ -4.00000000e-03,  -1.00400000e+00,   9.98000000e-01]],

           [[  0.00000000e+00,   9.99000000e-01,   0.00000000e+00],
            [ -1.00900000e+00,  -5.00000000e-03,  -1.00400000e+00],
            [ -1.00400000e+00,   1.00000000e+00,   0.00000000e+00]],

           [[ -1.00400000e+00,  -6.00000000e-03,  -5.00000000e-03],
            [ -1.00200000e+00,  -5.00000000e-03,   9.98000000e-01],
            [ -1.00200000e+00,  -1.00000000e-03,   0.00000000e+00]]])
    bias:
    -0.0070000000000000001
PaddlePaddle Convolutional Network Source Code Walkthrough
The convolutional layer
In the previous article we briefly introduced PaddlePaddle's functions for building a convolutional network. For handwritten-digit recognition, we designed the CNN by calling simple_img_conv_pool (the link in the previous article is broken because the framework moved to fluid; it updates fast = =). It is used like this:

    conv_pool_1 = paddle.networks.simple_img_conv_pool(
        input=img,
        filter_size=5,
        num_filters=20,
        num_channel=1,
        pool_size=2,
        pool_stride=2,
        act=paddle.activation.Relu())

This function wraps the convolutional layer and the pooling layer together, so a single call takes care of both, which is very convenient. If you only need a convolutional layer on its own, you can call img_conv_layer, used like this:

    conv = img_conv_layer(input=data, filter_size=1, filter_size_y=1,
                          num_channels=8,
                          num_filters=16, stride=1,
                          bias_attr=False,
                          act=ReluActivation())

Let's look at the parameters of this function (the comments explain what each parameter means and how it is used):
    def img_conv_layer(input,
                       filter_size,
                       num_filters,
                       name=None,
                       num_channels=None,
                       act=None,
                       groups=1,
                       stride=1,
                       padding=0,
                       dilation=1,
                       bias_attr=None,
                       param_attr=None,
                       shared_biases=True,
                       layer_attr=None,
                       filter_size_y=None,
                       stride_y=None,
                       padding_y=None,
                       dilation_y=None,
                       trans=False,
                       layer_type=None):
        """
        Convolution layer for images. Paddle supports both square and
        rectangular image inputs.

        It can also be used for image deconvolution (convolutional
        transpose, i.e. deconv), again with square or rectangular inputs.

        num_channels: number of channels of the input image. It can be 1 or 3,
        or the channel number of the previous layer (number of kernels * number
        of groups). Each group processes some of the image's channels. For
        example, if an input's num_channels is 256 and we set 4 groups with
        32 kernels, then 32 * 4 = 128 kernels are created to process the input
        image. The channels are split into 4 blocks, and the first 32 kernels
        process 64 (256 / 4 = 64) channels; the remaining kernel groups process
        the remaining channels.

        name: name of the layer. Optional, user-defined.
        type: basestring

        input: input of this layer
        type: LayerOutput

        filter_size: the x dimension of the kernel, i.e. its width.
        A tuple can also be passed to give both dimensions of the kernel.
        type: int / tuple / list

        filter_size_y: the y dimension of the kernel, i.e. its height.
        PaddlePaddle supports rectangular image sizes, so the kernel size is
        (filter_size, filter_size_y).
        type: int / None

        act: activation type. Defaults to ReLU.
        type: BaseActivation

        groups: number of kernel groups
        type: int

        stride: horizontal stride. A tuple can also be passed, meaning the
        horizontal and vertical strides are the same.
        type: int / tuple / list

        stride_y: vertical stride.
        type: int

        padding: horizontal zero-padding size; a tuple can also be passed,
        meaning the horizontal and vertical padding are the same.
        type: int / tuple / list

        padding_y: vertical zero-padding size
        type: int

        dilation: horizontal dilation; a tuple likewise means the horizontal
        and vertical dilations are the same.
        type: int / tuple / list

        dilation_y: vertical dilation
        type: int

        bias_attr: bias attribute.
        False: no bias; True: bias initialized to 0
        type: ParameterAttribute / None / bool / Any

        num_channels: channels of the input image. If set to None, it is
        inferred automatically from the previous layer's output channels.
        type: int

        param_attr: convolution parameter attribute. None means defaults.
        type: ParameterAttribute

        shared_biases: whether the bias is shared among kernels.
        type: bool

        layer_attr: the layer's Extra Attribute
        type: ExtraLayerAttribute

        trans: set to True for a convTransLayer, False for a conv layer
        type: bool

        layer_type: specify the layer type explicitly; defaults to None.
        If trans=True it must be "exconvt" or "cudnn_convt"; otherwise it is
        either "exconv" or "cudnn_conv".
        Note: by default paddle automatically picks ExpandConvLayer for CPU
        and CudnnConvLayer for GPU, but we can also choose the type explicitly.
        type: string
        return: LayerOutput object
        rtype: LayerOutput
        """

        if num_channels is None:
            assert input.num_filters is not None
            num_channels = input.num_filters

        if filter_size_y is None:
            if isinstance(filter_size, collections.Sequence):
                assert len(filter_size) == 2
                filter_size, filter_size_y = filter_size
            else:
                filter_size_y = filter_size

        if stride_y is None:
            if isinstance(stride, collections.Sequence):
                assert len(stride) == 2
                stride, stride_y = stride
            else:
                stride_y = stride

        if padding_y is None:
            if isinstance(padding, collections.Sequence):
                assert len(padding) == 2
                padding, padding_y = padding
            else:
                padding_y = padding

        if dilation_y is None:
            if isinstance(dilation, collections.Sequence):
                assert len(dilation) == 2
                dilation, dilation_y = dilation
            else:
                dilation_y = dilation

        if param_attr.attr.get('initial_smart'):
            # special initialization for conv layers
            init_w = (2.0 / (filter_size**2 * num_channels))**0.5
            param_attr.attr["initial_mean"] = 0.0
            param_attr.attr["initial_std"] = init_w
            param_attr.attr["initial_strategy"] = 0
            param_attr.attr["initial_smart"] = False

        if layer_type:
            if dilation > 1 or dilation_y > 1:
                assert layer_type in [
                    "cudnn_conv", "cudnn_convt", "exconv", "exconvt"
                ]
            if trans:
                assert layer_type in ["exconvt", "cudnn_convt"]
            else:
                assert layer_type in ["exconv", "cudnn_conv"]
            lt = layer_type
        else:
            lt = LayerType.CONVTRANS_LAYER if trans else LayerType.CONV_LAYER

        l = Layer(
            name=name,
            inputs=Input(
                input.name,
                conv=Conv(
                    filter_size=filter_size,
                    padding=padding,
                    dilation=dilation,
                    stride=stride,
                    channels=num_channels,
                    groups=groups,
                    filter_size_y=filter_size_y,
                    padding_y=padding_y,
                    dilation_y=dilation_y,
                    stride_y=stride_y),
                **param_attr.attr),
            active_type=act.name,
            num_filters=num_filters,
            bias=ParamAttr.to_bias(bias_attr),
            shared_biases=shared_biases,
            type=lt,
            **ExtraLayerAttribute.to_kwargs(layer_attr))
        return LayerOutput(
            name,
            lt,
            parents=[input],
            activation=act,
            num_filters=num_filters,
            size=l.config.size)
Knowing what these parameters mean and comparing with the CNN we wrote by hand, we can see several advantages of paddlepaddle:
- It supports both rectangular and square image sizes
- The stride, zero_padding and dilation can take different values in the horizontal and vertical directions
- The bias can be shared across kernels
- It automatically adapts the convolution network to CPU and GPU
Our hand-written CNN supports only square images and raises an error on rectangular ones; the stride, padding size and so on must also be the same horizontally and vertically. Now that we understand the convolutional layer's parameters, let's see how the underlying source implements it: ConvBaseLayer.py. Anyone interested can follow that link to see how the underlying C++ ConvLayer is written.
The pooling layer works the same way and can be analyzed along the same lines; if you are interested, you can follow it all the way down to the low-level implementation. I'll analyze it in detail another time. (Placeholder: I'll add the TensorFlow source-code analysis tomorrow.)
Summary
This article explained some of the fine points of backpropagation in convolutional neural networks, including how backpropagation through the convolutional and pooling layers differs from traditional backpropagation, and implemented a complete CNN. You can modify the code yourself afterwards, for example to handle different horizontal and vertical strides. Finally, we studied how PaddlePaddle implements the CNN's convolutional layer and, comparing it with our hand-written CNN, summarized four advantages; the lower level is implemented in C++, which those interested can dig into further. This write-up is fairly rough, so feel free to leave a comment if you spot problems :)
References:
1. https://www.cnblogs.com/pinard/p/6494810.html
2. https://www.zybuluo.com/hanbingtao/note/476663