Using TensorFlow and a Pretrained VGG16 Model for Prediction
The fast.ai introductory course uses the Kaggle Dogs vs. Cats competition as its example for getting started with computer vision, but it doesn't use the currently popular TensorFlow. Keras can run with TensorFlow as a backend, but since we can skip that extra layer and work with TensorFlow directly, in the spirit of learning, let's do exactly that.
In practice, the common engineering approach is not to train a model from scratch, but to take an already-trained model and modify it slightly (a process called finetuning). That means we first need to reproduce the VGG16 model in TensorFlow. I borrowed frossard's Python code and his converted weights. The architecture is as follows (cs231n has a more detailed walkthrough):
INPUT: [224x224x3] memory: 224*224*3=150K weights: 0
CONV3-64: [224x224x64] memory: 224*224*64=3.2M weights: (3*3*3)*64 = 1,728
CONV3-64: [224x224x64] memory: 224*224*64=3.2M weights: (3*3*64)*64 = 36,864
POOL2: [112x112x64] memory: 112*112*64=800K weights: 0
CONV3-128: [112x112x128] memory: 112*112*128=1.6M weights: (3*3*64)*128 = 73,728
CONV3-128: [112x112x128] memory: 112*112*128=1.6M weights: (3*3*128)*128 = 147,456
POOL2: [56x56x128] memory: 56*56*128=400K weights: 0
CONV3-256: [56x56x256] memory: 56*56*256=800K weights: (3*3*128)*256 = 294,912
CONV3-256: [56x56x256] memory: 56*56*256=800K weights: (3*3*256)*256 = 589,824
CONV3-256: [56x56x256] memory: 56*56*256=800K weights: (3*3*256)*256 = 589,824
POOL2: [28x28x256] memory: 28*28*256=200K weights: 0
CONV3-512: [28x28x512] memory: 28*28*512=400K weights: (3*3*256)*512 = 1,179,648
CONV3-512: [28x28x512] memory: 28*28*512=400K weights: (3*3*512)*512 = 2,359,296
CONV3-512: [28x28x512] memory: 28*28*512=400K weights: (3*3*512)*512 = 2,359,296
POOL2: [14x14x512] memory: 14*14*512=100K weights: 0
CONV3-512: [14x14x512] memory: 14*14*512=100K weights: (3*3*512)*512 = 2,359,296
CONV3-512: [14x14x512] memory: 14*14*512=100K weights: (3*3*512)*512 = 2,359,296
CONV3-512: [14x14x512] memory: 14*14*512=100K weights: (3*3*512)*512 = 2,359,296
POOL2: [7x7x512] memory: 7*7*512=25K weights: 0
FC: [1x1x4096] memory: 4096 weights: 7*7*512*4096 = 102,760,448
FC: [1x1x4096] memory: 4096 weights: 4096*4096 = 16,777,216
FC: [1x1x1000] memory: 1000 weights: 4096*1000 = 4,096,000
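The per-layer weight counts listed above can be sanity-checked with a few lines of plain Python arithmetic (biases excluded, matching the table):

```python
# Verify the VGG16 weight counts from the architecture table above.
# Each conv layer is 3x3, so weights = 3*3*in_channels*out_channels.
convs = [(3, 64), (64, 64),            # block 1
         (64, 128), (128, 128),        # block 2
         (128, 256), (256, 256), (256, 256),   # block 3
         (256, 512), (512, 512), (512, 512),   # block 4
         (512, 512), (512, 512), (512, 512)]   # block 5
conv_weights = [3 * 3 * cin * cout for cin, cout in convs]

# The three FC layers: 7x7x512 -> 4096 -> 4096 -> 1000.
fc_weights = [7 * 7 * 512 * 4096, 4096 * 4096, 4096 * 1000]

total = sum(conv_weights) + sum(fc_weights)
print(conv_weights[:2], total)  # first two conv layers and the grand total
```

Note that the bulk of the parameters sit in the first FC layer, which is why the FC layers dominate memory for the weights while the early conv layers dominate activation memory.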
For the concrete implementation, see VGG16. One thing to note: the final output must not be passed through a ReLU.
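To illustrate why the last layer skips the ReLU: the network's final output must be raw logits (which can be negative), while the hidden FC layers do apply ReLU. The model itself is written in TensorFlow; this is just a minimal NumPy sketch of the last two FC layers, with made-up random weights:

```python
import numpy as np

rng = np.random.default_rng(0)

def fc(x, w, b, relu=True):
    """Fully connected layer; the final logits layer skips the ReLU."""
    y = x @ w + b
    return np.maximum(y, 0.0) if relu else y

x = rng.standard_normal((1, 4096))                   # activations entering fc2
w2, b2 = rng.standard_normal((4096, 4096)) * 0.01, np.zeros(4096)
w3, b3 = rng.standard_normal((4096, 1000)) * 0.01, np.zeros(1000)

h = fc(x, w2, b2, relu=True)        # fc2: ReLU applied, so h >= 0 everywhere
logits = fc(h, w3, b3, relu=False)  # fc3: raw logits, both signs allowed
print(logits.shape)                 # (1, 1000)
```

If a ReLU were applied here, every negative logit would be clamped to zero and the softmax probabilities downstream would be distorted.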
We can't use this architecture as-is to predict cats and dogs, because VGG16 predicts the 1000 ImageNet classes. To predict the two classes, we add one more FC layer on top of this architecture that maps the 1000 outputs down to 2. (Alternatively, the last layer could be replaced so it outputs 2 directly.) I also added a BN layer after VGG16; the original VGG16 has no BN. I did not add BN after every CONV layer either, because I didn't want to do all that extra computation...
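The extra head described above can be sketched as follows. This is a NumPy stand-in for the TensorFlow ops (the real code is in the linked repo); the batch-norm here uses fixed gamma=1, beta=0 for simplicity, and the weights are random placeholders:

```python
import numpy as np

def batch_norm(x, eps=1e-5):
    # Normalize each feature over the batch dimension (gamma=1, beta=0).
    mean = x.mean(axis=0, keepdims=True)
    var = x.var(axis=0, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

rng = np.random.default_rng(0)
imagenet_logits = rng.standard_normal((4, 1000))  # VGG16's 1000-way output
w = rng.standard_normal((1000, 2)) * 0.01         # new FC layer: 1000 -> 2
b = np.zeros(2)

# BN after VGG16's output, then the new 2-way FC head.
cat_dog_logits = batch_norm(imagenet_logits) @ w + b
print(cat_dog_logits.shape)  # (4, 2): one (cat, dog) logit pair per image
```

During finetuning, only this new FC layer (and the BN parameters) need to be trained; the pretrained VGG16 weights can stay frozen.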
During training, the FC output is fed into a cross-entropy loss; at prediction time, a softmax is applied. That is enough to tell whether a given picture is a cat or a dog. For the concrete code, see cats_model.py.
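The loss and prediction step can be written out explicitly. The actual model uses TensorFlow's built-in ops for this; below is an equivalent NumPy sketch with a tiny hand-made batch (class 0 = cat, class 1 = dog is an arbitrary labeling for illustration):

```python
import numpy as np

def softmax(logits):
    z = logits - logits.max(axis=1, keepdims=True)  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy(logits, labels):
    """Mean cross-entropy loss; labels are integer class indices."""
    probs = softmax(logits)
    return -np.log(probs[np.arange(len(labels)), labels]).mean()

logits = np.array([[2.0, 0.5],   # image 1: leans toward class 0 (cat)
                   [0.1, 3.0]])  # image 2: leans toward class 1 (dog)
labels = np.array([0, 1])

loss = cross_entropy(logits, labels)        # used during training
preds = softmax(logits).argmax(axis=1)      # used at prediction time
print(preds)  # [0 1]
```

Taking the argmax of the softmax gives the predicted class; the softmax itself is monotonic in the logits, so for a hard prediction the argmax of the raw logits would give the same answer.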
Let's see how well it works. The full notebook: Jupyter Notebook
Results of running the modified VGG16 (with the extra FC layer) without any finetuning: the predictions are very inaccurate, since the last layer's weights are all random. The point of this run is to check that the model runs at all, and to see how many it guesses right by chance.
After just one epoch, accuracy already reaches 95% (I have repeated this several times; this run was not the best).
Looking at the same pictures again, the predictions seem much more accurate.
Final Thoughts
Image recognition is fascinating, and a very challenging field.