Image Classification | The Evolution of the Inception Family: GoogLeNet, Inception, Xception

Date: 2019-12-05

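Introduction

Google's Inception series is a representative line of work in image classification. Unlike VGG, which simply stacks convolutional layers, Inception pays attention to the topology of the network. This post traces how the Inception family evolved and includes Xception for comparison.

PS1: Bharath Raj has written a blog post that introduces this line of work clearly and concisely: https://towardsdatascience.com/a-simple-guide-to-the-versions-of-the-inception-network-7fc52b863202. Strongly recommended.

PS2: I have read quite a few blog posts, and none of them explains the difference between V2 and V3 clearly. The main reason is that V2 involves two papers, and V2 and V3 are both presented in the same paper. In practice, the two differ very little.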

InceptionV1

Going Deeper with Convolutions

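Core idea

The salient part of an image can vary enormously in size, which makes it hard to choose the right kernel size for a convolution: more global information calls for a large kernel, while more local information calls for a small one. So why not run filters of several sizes at the same level, making the network "wider" rather than "deeper"?

The paper proposes the Inception module (left), which combines three filter sizes (1x1, 3x3, 5x5) with max pooling. To cut computation, GoogLeNet borrows the idea of Network-in-Network and uses 1x1 convolutions to reduce dimensionality and the number of parameters (right). This makes it possible to increase both the depth and the width of the network while keeping the computational cost under control.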
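To make the effect of the 1x1 reduction concrete, here is a rough sketch that counts the weights of a single Inception module with and without the 1x1 bottlenecks. The channel widths (192 input channels, 64/128/32 outputs for the 1x1/3x3/5x5 branches, 96 and 16 reduction channels, 32 pool-projection channels) are assumptions chosen to resemble an early GoogLeNet module, and the helper names are made up for this example. Biases are ignored.

```python
def conv_weights(c_in, c_out, k):
    """Number of weights in a k x k convolution from c_in to c_out channels."""
    return c_in * c_out * k * k


def naive_module(c_in, n1, n3, n5):
    """Naive module: 1x1, 3x3 and 5x5 convs applied directly to the input.
    The pooling branch has no weights."""
    return (conv_weights(c_in, n1, 1)
            + conv_weights(c_in, n3, 3)
            + conv_weights(c_in, n5, 5))


def reduced_module(c_in, n1, r3, n3, r5, n5, pool_proj):
    """Module with 1x1 reductions before the 3x3 and 5x5 convs
    and a 1x1 projection after the pooling branch."""
    return (conv_weights(c_in, n1, 1)
            + conv_weights(c_in, r3, 1) + conv_weights(r3, n3, 3)
            + conv_weights(c_in, r5, 1) + conv_weights(r5, n5, 5)
            + conv_weights(c_in, pool_proj, 1))


# Illustrative channel widths (assumed, see the note above).
naive = naive_module(192, 64, 128, 32)
reduced = reduced_module(192, 64, 96, 128, 16, 32, 32)
print(naive, reduced, round(reduced / naive, 3))
```

With these widths the reduced module needs 163,328 weights against 387,072 for the naive one, and its output has 256 channels instead of 416 because the pooling branch is projected down as well.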

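Network architecture

GoogLeNet has 9 Inception modules and is 22 layers deep (27 if pooling layers are counted), with global average pooling after the last Inception module.
Because the network is so deep, it is prone to the vanishing gradient problem.
To keep the gradient from vanishing in the middle of the network, the authors add two auxiliary classifiers (purple). The total loss is a weighted sum of the real loss and the auxiliary losses.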

# The total loss used by the inception net during training.
total_loss = real_loss + 0.3 * aux_loss_1 + 0.3 * aux_loss_2

Experimental results

InceptionV2

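Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift

Rethinking the Inception Architecture for Computer Vision

Core idea

Using Batch Normalization

BN normalizes layer outputs toward an N(0,1) normal distribution (zero mean, unit variance over each mini-batch). On the one hand this allows a larger learning rate and faster convergence; on the other hand BN has a regularizing effect.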
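The normalization step can be sketched for a single feature as follows. This is a minimal training-time version in plain Python, not the paper's full procedure: `gamma` and `beta` stand for BN's learned scale and shift, `eps` is an assumed small constant for numerical stability, and the running statistics used at inference are left out.

```python
import math


def batch_norm(values, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize one feature over a mini-batch, then scale and shift it."""
    n = len(values)
    mean = sum(values) / n
    # Biased variance (divide by n), computed from the mini-batch itself.
    var = sum((v - mean) ** 2 for v in values) / n
    return [gamma * (v - mean) / math.sqrt(var + eps) + beta for v in values]


batch = [2.0, 4.0, 6.0, 8.0]  # one feature across a mini-batch of 4 samples
normed = batch_norm(batch)

# Check the statistics of the normalized feature.
mean_out = sum(normed) / len(normed)
var_out = sum((v - mean_out) ** 2 for v in normed) / len(normed)
print(normed)
print(mean_out, var_out)
```

The output mean is zero up to rounding and the variance is just under 1 because of `eps`.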

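Factorizing convolutions

Neural networks perform better when a convolution does not drastically change the dimensions of its input. Reducing the dimensions too aggressively loses information, which the paper calls a "representational bottleneck". Used carefully, factorization makes convolutions more computationally efficient.

Factorizing into smaller convolutions: a \(5\times5\) convolution can be factorized into two \(3\times3\) convolutions, which lowers the cost to \(\frac{3\times3+3\times3}{5\times5} = \frac{18}{25}\) of the original.
Factorizing into asymmetric convolutions: an \(n\times n\) convolution can be factorized into a \(1\times n\) convolution followed by an \(n \times 1\) convolution.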
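The two factorizations can be checked with a little arithmetic. The sketch below assumes every layer has the same number of input and output channels, so that cost is proportional to kernel area; the function names are made up for this example.

```python
def kernel_cost(h, w):
    """Weights (and multiply-adds per output position) of one h x w kernel,
    per input/output channel pair."""
    return h * w


def stacked_receptive_field(kernel_sizes):
    """Receptive field along one axis of stride-1 convolutions applied in sequence."""
    rf = 1
    for k in kernel_sizes:
        rf += k - 1
    return rf


def asymmetric_ratio(n):
    """Cost of a 1 x n conv followed by an n x 1 conv, relative to one n x n conv."""
    return (kernel_cost(1, n) + kernel_cost(n, 1)) / kernel_cost(n, n)


# A 5x5 convolution replaced by two stacked 3x3 convolutions.
ratio_small = (kernel_cost(3, 3) + kernel_cost(3, 3)) / kernel_cost(5, 5)

print(ratio_small)                      # 18 / 25
print(stacked_receptive_field([3, 3]))  # two 3x3 layers still cover a 5-wide window
print(asymmetric_ratio(3), asymmetric_ratio(7))
```

The asymmetric ratio is 2/n, so the saving grows with the kernel size: about 33% for n = 3 and about 71% for n = 7.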

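Evolution of the Inception module

a is the InceptionV1 module; replacing the 5x5 convolution with two 3x3 convolutions gives b; factorizing the 3x3 convolutions further into 3x1 and 1x3 gives c; for high-level features, the convolution group is expanded into d to produce more diverse features.

Downsampling module

InceptionV3 no longer downsamples with max pooling alone, because that loses too much information. The authors considered first expanding the channels with a convolution and then pooling, but that is computationally expensive. They therefore designed a parallel two-branch structure, Grid Size Reduction, to replace max pooling.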
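A back-of-the-envelope comparison of the three options is sketched below. It is deliberately simplified: the parallel design is modeled as one stride-2 3x3 convolution next to one stride-2 pooling branch, whereas the module in the paper has more branches. The grid size `d = 35` and channel count `k = 320` are assumed values for illustration, and all convolutions are taken to be 3x3.

```python
def out_size(size, kernel, stride, padding=0):
    """Spatial output size of a convolution or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1


def conv_cost(out_grid, c_in, c_out, kernel=3):
    """Multiply-adds of a conv that produces an out_grid x out_grid output."""
    return out_grid * out_grid * c_in * c_out * kernel * kernel


d, k = 35, 320                            # assumed input grid size and channels
half = out_size(d, kernel=3, stride=2)    # grid size after a stride-2 layer

# Option 1: expand to 2k channels on the full grid, then pool (expensive).
expand_then_pool = conv_cost(d, k, 2 * k)

# Option 2: pool first, then expand (cheap, but a representational bottleneck).
pool_then_expand = conv_cost(half, k, 2 * k)

# Parallel branches: a stride-2 conv with k filters and a stride-2 pooling of the
# k input channels, concatenated along the channel axis.
parallel = conv_cost(half, k, k)
parallel_channels = k + k

print(half, parallel_channels)
print(expand_then_pool, pool_then_expand, parallel)
```

In this simplified model the parallel design reaches the same 17x17 grid with 2k channels at half the convolution cost of pooling first, and without squeezing the representation through a pooled k-channel bottleneck before expanding it.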

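Network structure

Figures 5, 6, and 7 correspond to b, c, and d in the figure above. A Grid Size Reduction module is inserted between each kind of block.

Experimental results

Inception-v2 reaches a top-1 error of 23.4%. Inception-v3 refers to Inception-v2 with RMSProp, Label Smoothing, a factorized 7x7 convolution, and BN in the auxiliary classifier all applied together.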

InceptionV3

Rethinking the Inception Architecture for Computer Vision

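Core idea

The authors point out that the auxiliary classifiers only make a large contribution near the end of training, when accuracy is close to saturating. They can therefore be seen as regularizers.

V3 makes the following improvements over V2 (see the V2 experimental results):
1. RMSProp optimizer
2. Factorized 7x7 convolution
3. BatchNorm in the auxiliary classifier
4. Label Smoothing, to prevent overfitting.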
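Label smoothing from item 4 can be sketched as follows: the one-hot target is mixed with a uniform distribution over the classes, so the network is no longer rewarded for pushing the predicted probability of the true class all the way to 1. The value `epsilon = 0.1`, the 4-class setup, and the function names are assumptions for illustration.

```python
import math


def smooth_labels(true_class, num_classes, epsilon=0.1):
    """Return (1 - epsilon) * one_hot + epsilon * uniform as a list of probabilities."""
    uniform = epsilon / num_classes
    return [(1.0 - epsilon) + uniform if k == true_class else uniform
            for k in range(num_classes)]


def cross_entropy(target, probs):
    """Cross-entropy between a target distribution and predicted probabilities."""
    return -sum(t * math.log(p) for t, p in zip(target, probs) if t > 0)


target = smooth_labels(true_class=2, num_classes=4)
print(target)

# An overconfident prediction is penalized relative to one that matches the
# smoothed target.
overconfident = [0.001, 0.001, 0.997, 0.001]
loss_overconfident = cross_entropy(target, overconfident)
loss_matched = cross_entropy(target, target)
print(loss_overconfident, loss_matched)
```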

Experimental results

InceptionV4

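Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning

This paper combines ResNet with Inception and proposes three new network architectures:

Inception-ResNet-v1: a hybrid Inception with roughly the same computational cost as InceptionV3.
Inception-ResNet-v2: computationally more expensive, with significantly better performance.
InceptionV4: a pure Inception variant without residual connections, on par with Inception-ResNet-v2.

Core idea

InceptionV4 tidies up the earlier versions. The original models were trained in a partitioned fashion; after the move to the TensorFlow framework, the Inception modules could be made more uniform and simplified.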

Network architecture

Stem: Inception-ResNet-v1 uses the top one; InceptionV4 and Inception-ResNet-v2 use the bottom one.

Inception modules A,B,C

Reduction Blocks A,B

Network

Inception-ResNet

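Core idea

Inspired by ResNet, the authors propose a hybrid Inception. Inception-ResNet comes in two versions, v1 and v2.
1. Inception-ResNet-v1 has a computational cost similar to InceptionV3; Inception-ResNet-v2 has a cost similar to InceptionV4.
2. They have different stems.
3. Their A, B, C modules share the same structure and differ only in hyperparameter settings.

When the number of filters exceeds 1000, units deeper in the architecture cause the network to "die". To improve stability, the authors scale the residual activations by a factor between 0.1 and 0.3.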
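The scaling step amounts to multiplying the output of the residual branch by a small constant before adding it to the shortcut. The sketch below works on plain lists; the function name and the sample values are made up for illustration.

```python
def scaled_residual_add(shortcut, residual, scale=0.1):
    """Element-wise shortcut + scale * residual."""
    if len(shortcut) != len(residual):
        raise ValueError("shortcut and residual must have the same length")
    return [s + scale * r for s, r in zip(shortcut, residual)]


x = [1.0, -2.0, 0.5]            # activations entering the block
branch = [10.0, 20.0, -30.0]    # hypothetical output of the Inception branch

unscaled = scaled_residual_add(x, branch, scale=1.0)
scaled = scaled_residual_add(x, branch, scale=0.2)
print(unscaled)
print(scaled)
```

With a scale of 0.2, a large branch output shifts the accumulated activations far less than the unscaled sum would, which is the stabilizing effect described above.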

Network architecture

Stem: see InceptionV4
Inception-ResNet Module A,B,C

Reduction Blocks A,B

Network

Experimental results

Xception

Core idea

Xception: Deep Learning with Depthwise Separable Convolutions

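Xception borrows depthwise separable convolutions to improve InceptionV3.

Inception rests on the hypothesis that it is better to handle cross-channel correlations and spatial correlations separately. Its 1x1 convolutions act across channels, but its 3x3 convolutions act on channels and space at the same time, so the separation is incomplete.

Xception (Extreme Inception) makes each 3x3 convolution operate on the feature map of a single channel, which achieves complete separation.

From InceptionV3 to Xception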

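How the "extreme" Inception module behind Xception differs from a standard depthwise separable conv:

A depthwise separable conv applies the per-channel (depthwise) convolution first and the 1x1 convolution second, whereas the extreme Inception module applies the 1x1 convolution first and the per-channel convolution second.
A depthwise separable conv usually has no activation between the two convolutions, whereas the extreme Inception module applies ReLU after each.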
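To see why separating the channel and spatial steps saves parameters, the sketch below compares the weight count of a regular convolution with that of a depthwise separable one. The channel counts (128 in, 256 out) and the 3x3 kernel are assumptions for illustration, and biases are ignored.

```python
def regular_conv_weights(c_in, c_out, k):
    """A regular conv mixes channels and space in one step."""
    return c_in * c_out * k * k


def depthwise_separable_weights(c_in, c_out, k):
    """Depthwise k x k conv (one filter per input channel) plus a 1x1 pointwise conv."""
    depthwise = c_in * k * k
    pointwise = c_in * c_out
    return depthwise + pointwise


c_in, c_out, k = 128, 256, 3  # illustrative sizes
regular = regular_conv_weights(c_in, c_out, k)
separable = depthwise_separable_weights(c_in, c_out, k)
print(regular, separable, round(separable / regular, 3))
```

The ratio works out to 1/c_out + 1/k^2, so for 3x3 kernels and wide layers the separable version needs roughly one ninth of the weights.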

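Network architecture

Experimental results

Summary

GoogLeNet, i.e. InceptionV1, introduced the Inception structure, which combines 1x1, 3x3, and 5x5 convolutions with pooling. This widens the network and makes it more adaptable to multiple scales.
InceptionV2 introduced Batch Normalization, which normalizes outputs toward an N(0,1) distribution and speeds up convergence. It also introduced convolution factorization, which splits large convolutions into smaller or asymmetric ones to reduce computation.
InceptionV3 builds on InceptionV2 with a few improvements: it further factorizes the 7x7 convolution, adds Label Smoothing, and uses BN in the auxiliary classifier as well.
InceptionV4 reworks the InceptionV3 structure and removes unnecessary computation. It is a pure Inception network without residual connections, and its accuracy is on par with Inception-ResNet-v2.
Inception-ResNet combines Inception with residual connections and improves performance. It has two versions, v1 and v2: v1 has a computational cost similar to InceptionV3, and v2 has a cost similar to InceptionV4.
Xception borrows depthwise separable convolutions to improve InceptionV3. It fully separates the spatial and channel dimensions, which improves performance and reduces the number of parameters.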

References

paper

[1]Szegedy C, Liu W, Jia Y, et al. Going deeper with convolutions[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2015: 1-9.

[2]Ioffe S, Szegedy C. Batch normalization: Accelerating deep network training by reducing internal covariate shift[J]. arXiv preprint arXiv:1502.03167, 2015.

[3]Szegedy C, Vanhoucke V, Ioffe S, et al. Rethinking the inception architecture for computer vision[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2016: 2818-2826.

[4]Szegedy C, Ioffe S, Vanhoucke V, et al. Inception-v4, inception-resnet and the impact of residual connections on learning[C]//Thirty-First AAAI Conference on Artificial Intelligence. 2017.

[5]Chollet F. Xception: Deep learning with depthwise separable convolutions[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2017: 1251-1258.