The Evolution of CNN Models from the Perspective of Convolution Splitting and Grouping

Date: 2020-05-15



Blog: 博客園 (cnblogs) | CSDN | blog

Preface

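As the title suggests, this article looks at how the modules of classic CNN backbone networks have evolved, viewed through the lens of convolution splitting. To keep the viewpoint consistent, only the form of the convolutions along a single path is analyzed.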

Formalization

For convenience, define the following for a regular convolution:

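\(I\): input size, with height \(H\) and width \(W\); take the two to be equal, i.e. \(I = H = W\)
\(M\): number of input channels, which can be viewed as the height of the tensor
\(K\): kernel size \(K \times K\); each kernel has the same number of channels as the input, \(M\)
\(N\): number of kernels
\(F\): size of the resulting feature map, \(F \times F\); its number of channels equals the number of kernels, \(N\)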

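So the input is an \(M \times I \times I\) tensor, the kernels form an \(N \times M \times K \times K\) tensor, and the feature map is an \(N \times F \times F\) tensor. The computational cost of a regular convolution, counted in multiply-accumulate operations, is therefore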

\[FLOPS = K \times K \times M \times N \times F \times F \]

In particular, considering only SAME padding with \(stride = 1\), we have \(F = I\), and the cost becomes

\[FLOPS = K \times K \times M \times N \times I \times I \]

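This can be read as \((K \times K \times M) \times (N \times I \times I)\): the first factor is the cost of one inner product in the convolution, and the second is the number of inner products required.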

The number of parameters is

\[\#Params = N \times M \times K \times K \]
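The two formulas above can be checked with a few lines of arithmetic. This is a minimal sketch: the function names `conv_flops` and `conv_params` and the layer sizes are illustrative assumptions, not taken from any particular network, and biases are ignored.

```python
def conv_flops(K, M, N, F):
    """Multiply-accumulate count of a regular convolution: K*K*M*N*F*F."""
    return K * K * M * N * F * F


def conv_params(K, M, N):
    """Weight count of a regular convolution (no bias): N*M*K*K."""
    return N * M * K * K


# Hypothetical layer: 3x3 kernels, 64 input channels, 128 kernels, 56x56 output.
flops = conv_flops(K=3, M=64, N=128, F=56)
params = conv_params(K=3, M=64, N=128)

# The cost factors as (one inner product) x (number of inner products).
assert flops == (3 * 3 * 64) * (128 * 56 * 56)
print(flops, params)
```

Note that the cost is simply the parameter count multiplied by the number of output positions \(F \times F\), which is why the rest of the article can reason about the weight factors alone.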

Network evolution

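Looking across lightweight networks such as SqueezeNet, MobileNet V1 and V2, and ShuffleNet, they can all be seen as splitting or grouping the kernel \(M \times K \times K\) in various ways (while inserting activation functions along the way). These splits and groupings usually reduce the parameter count and the computational cost, which leaves room to further increase the number of kernels \(N\). The structural change also acts as a form of regularization. Together, these changes are how the networks strike a balance between performance and computational cost.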

Taken as a whole, these changes amount to various transformations of the original \(FLOPS = K \times K \times M \times N \times I \times I\).

The list below goes through them from this viewpoint. For brevity, only the factors that change are shown.

Group Convolution (AlexNet): the input is split into groups; the number of kernels is unchanged, but each kernel has fewer channels, which amounts to

\[M \rightarrow \frac{M}{G} \]
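As a quick check of the \(M \rightarrow M/G\) substitution, the sketch below counts weights with and without grouping. The helper `conv_params` and the layer sizes are hypothetical values chosen for illustration; the divisibility check is an assumption about how the channels are split.

```python
def conv_params(K, M, N, G=1):
    """Weights of a (grouped) convolution: N kernels, each K x K x (M / G)."""
    if M % G != 0 or N % G != 0:
        raise ValueError("M and N must both be divisible by G")
    return N * (M // G) * K * K


# Hypothetical layer: 5x5 kernels, 96 input channels, 256 kernels.
dense = conv_params(K=5, M=96, N=256)
grouped = conv_params(K=5, M=96, N=256, G=2)

# Grouping divides the weight count (and the cost) by exactly G.
print(dense, grouped, dense // grouped)
```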
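Replacing a large kernel with a stack of small kernels (VGG): for example, one \(5\times 5\) is replaced by two \(3\times 3\), and one \(7\times 7\) by three \(3\times 3\). This keeps the receptive field unchanged while reducing parameters and computation, which amounts to turning a product of large numbers into a sum of products of small numbers,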

\[(K \times K) \rightarrow (k \times k + \dots + k \times k) \]
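The two examples in this item can be verified directly. The sketch below assumes stride 1 and the same channel count at every layer of the stack, so that only the spatial factor changes; the function names are made up for illustration.

```python
def stacked_receptive_field(k, n):
    """Receptive field of n stacked k x k convolutions with stride 1."""
    return 1 + n * (k - 1)


def kernel_cost(k, n=1):
    """Spatial cost factor of n stacked k x k kernels: n * k * k."""
    return n * k * k


for big, n in [(5, 2), (7, 3)]:
    # n stacked 3x3 kernels cover the same window as one big kernel...
    assert stacked_receptive_field(3, n) == big
    # ...with a smaller spatial factor.
    print(big, kernel_cost(big), kernel_cost(3, n))
```

Under these assumptions the spatial factor drops from 25 to 18 for the \(5\times 5\) case and from 49 to 27 for the \(7\times 7\) case.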
Factorized Convolution (Inception V2): a 2D convolution becomes separate row and column convolutions, first along rows and then along columns,

\[(K \times K) \rightarrow (K \times 1 + 1 \times K) \]
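Fire module (SqueezeNet): pointwise + ReLU + (pointwise + 3x3 conv) + ReLU. A pointwise convolution first reduces the channel dimension, and a certain fraction of the \(3\times 3\) convolutions are replaced with \(1 \times 1\),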

\[(K \times K \times M \times N) \rightarrow (M \times \frac{N}{t} + \frac{N}{t} \times (1-p)N + K \times K \times \frac{N}{t} \times pN) \\ K = 3 \]
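To see the size of the saving, the three terms of the Fire module formula can be evaluated for concrete numbers. This is a sketch under assumed values: `fire_cost` is a hypothetical helper, and \(M = N = 128\), \(t = 8\), \(p = 0.5\) are illustrative choices, with \(p\) read as the fraction of expand kernels that stay \(3\times 3\).

```python
def fire_cost(M, N, t, p, K=3):
    """Weight factor of a Fire module, following the three terms of the formula."""
    squeeze = N // t                   # channels after the pointwise squeeze
    expand_1x1 = round((1 - p) * N)    # expand kernels replaced by 1x1
    expand_kxk = N - expand_1x1        # expand kernels kept as K x K
    return (M * squeeze
            + squeeze * expand_1x1
            + K * K * squeeze * expand_kxk)


M, N, K = 128, 128, 3
fire = fire_cost(M, N, t=8, p=0.5)
plain = K * K * M * N
print(fire, plain, plain / fire)
```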
Bottleneck (ResNet): pointwise + BN ReLU + 3x3 conv + BN ReLU + pointwise, similar to applying an SVD along the channel dimension,

\[(K \times K \times M \times N) \rightarrow (M \times \frac{N}{t} + K \times K \times \frac{N}{t} \times \frac{N}{t} + \frac{N}{t} \times N) \\ t = 4 \]
ResNeXt Block (ResNeXt): a bottleneck whose \(3\times 3\) convolution is a group convolution,

\[(K \times K \times M \times N) \rightarrow (M \times \frac{N}{t} + K \times K \times \frac{N}{tG} \times \frac{N}{t} + \frac{N}{t} \times N) \\t = 2, \ G = 32 \]
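The ResNet bottleneck and the ResNeXt block share the same three-term shape, so one helper covers both. The sketch below is illustrative only: `bottleneck_cost` is a hypothetical function, and \(M = N = 256\) is an assumed width; \(t\) and \(G\) follow the values given in the formulas.

```python
def bottleneck_cost(M, N, t, G=1, K=3):
    """Weight factor of pointwise -> (grouped) K x K -> pointwise."""
    mid = N // t                       # reduced channel count
    if mid % G != 0:
        raise ValueError("reduced channel count must be divisible by G")
    return (M * mid                    # pointwise reduction
            + K * K * (mid // G) * mid # K x K conv, grouped when G > 1
            + mid * N)                 # pointwise expansion


M = N = 256
plain = 3 * 3 * M * N
resnet = bottleneck_cost(M, N, t=4)           # ResNet bottleneck
resnext = bottleneck_cost(M, N, t=2, G=32)    # ResNeXt block
print(plain, resnet, resnext)
```

With these numbers the two blocks land at nearly the same cost: grouping the \(3\times 3\) convolution pays for a reduced width that is twice as large.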
Depthwise Separable Convolution (MobileNet V1): depthwise + BN ReLU + pointwise + BN ReLU, which amounts to factoring the channel dimension out on its own,

\[(K \times K \times N) \rightarrow (K \times K + N) \]
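Restoring the common factor \(M\), the substitution above reads \(K \times K \times M \times N \rightarrow K \times K \times M + M \times N\), so the cost ratio is \(1/N + 1/K^2\). A minimal numerical check, with a hypothetical helper name and assumed layer sizes:

```python
def separable_cost(K, M, N):
    """Depthwise (K*K*M) plus pointwise (M*N) weight factor."""
    return K * K * M + M * N


K, M, N = 3, 256, 256
plain = K * K * M * N
separable = separable_cost(K, M, N)
ratio = separable / plain

# The ratio matches the closed form 1/N + 1/K^2.
assert abs(ratio - (1 / N + 1 / K ** 2)) < 1e-12
print(plain, separable, ratio)
```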
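Separable Convolution (Xception): pointwise + depthwise + BN ReLU. This also factors the channel dimension out, but in the opposite order (since the blocks are stacked back to back, it is in effect equivalent to the basic Depthwise Separable Convolution), and the ReLU between the two is removed,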

\[(K \times K \times M) \rightarrow (M + K \times K) \]
In the actual implementation, however, the order is still depthwise + pointwise + ReLU.
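Pointwise group convolution and channel shuffle (ShuffleNet): group pointwise + BN ReLU + Channel Shuffle + depthwise + BN + group pointwise + BN. This amounts to giving both pointwise convolutions of the bottleneck the same group count, while the \(3\times 3\) conv becomes depthwise. In other words, all three convolution layers are grouped, which hinders the exchange of information between channels (between groups), so a channel shuffle is added after the first group pointwise convolution, i.e.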

\[(K \times K \times M \times N) \rightarrow (\frac{M}{G} \times \frac{N}{t} + channel \ shuffle +K \times K \times \frac{N}{t} + \frac{N}{tG} \times N) \]
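The channel shuffle term carries no weights; it only reorders channels so that the next grouped layer sees channels from every group. One common way to express it is reshape, transpose, flatten. The sketch below does this on a plain Python list of channel indices; the function name and the 6-channel, 2-group example are assumptions for illustration, not the ShuffleNet code.

```python
def channel_shuffle(channels, groups):
    """Interleave channels across groups: reshape to (groups, n), transpose, flatten."""
    if len(channels) % groups != 0:
        raise ValueError("channel count must be divisible by groups")
    per_group = len(channels) // groups
    # Rows are the groups produced by the preceding grouped convolution.
    grouped = [channels[g * per_group:(g + 1) * per_group] for g in range(groups)]
    # Reading column by column mixes one channel from each group in turn.
    return [ch for column in zip(*grouped) for ch in column]


shuffled = channel_shuffle([0, 1, 2, 3, 4, 5], groups=2)
print(shuffled)  # [0, 3, 1, 4, 2, 5]
```

After the shuffle, any contiguous block of channels handed to the next group contains channels that originated in different groups.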
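Inverted Linear Bottleneck (MobileNet V2): a bottleneck first reduces the channel dimension with a pointwise convolution, then convolves, then expands; an inverted bottleneck first expands, then convolves, then reduces: pointwise + BN ReLU6 + depthwise + BN ReLU6 + pointwise + BN,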

\[(K \times K \times M \times N) \rightarrow (M \times tM + K \times K \times tM + tM \times N) \\t = 6 \]
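The three terms of the inverted bottleneck can be evaluated separately to see where the cost sits. This is a sketch with assumed values: `inverted_bottleneck_terms` is a hypothetical helper and \(M = N = 24\) is an illustrative width; \(t = 6\) follows the formula.

```python
def inverted_bottleneck_terms(M, N, t=6, K=3):
    """Return the (expand, depthwise, project) weight factors."""
    wide = t * M                       # expanded channel count
    return (M * wide,                  # pointwise expansion
            K * K * wide,              # depthwise K x K on the wide tensor
            wide * N)                  # pointwise linear projection


expand, depthwise, project = inverted_bottleneck_terms(M=24, N=24)
total = expand + depthwise + project
regular_at_wide = 3 * 3 * (6 * 24) * (6 * 24)  # regular 3x3 at the expanded width
print(expand, depthwise, project, total, regular_at_wide)
```

With these numbers the depthwise term is the smallest of the three, and the whole block costs a small fraction of a regular \(3\times 3\) convolution run at the expanded width \(tM\).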