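Last week we looked at how the classic CNN AlexNet performs on image classification. In 2014, two years after AlexNet, the University of Oxford proposed the VGG network, which took second place in the classification task of ILSVRC 2014 (first place went to GoogLeNet, proposed the same year). In the paper "Very Deep Convolutional Networks for Large-Scale Image Recognition", the authors propose building deeper networks by shrinking the convolution kernel size.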
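VGG network architecture
VGGNet comes from the Visual Geometry Group at Oxford. Its main contribution at ILSVRC 2014 was to show that increasing network depth can, to a certain extent, improve final performance. As the figure below shows, the paper improves performance by increasing depth step by step. The approach looks a little brute-force, with few clever tricks, but it works. Many methods that rely on pretrained models use VGG (mainly the 16- and 19-layer variants). Compared with other approaches VGG has a very large parameter space, so training a VGG model usually takes longer, but the publicly released pretrained models make it easy to use. The configurations from the paper are:
Figure 1: VGG network configurations
D and E in the figure are VGG-16 and VGG-19, with about 138M and 144M parameters respectively; they are the two best-performing configurations in the paper. VGG can be seen as a deeper AlexNet. It also works well in object detection (for example in Faster R-CNN): this plain sequential structure preserves the local position information of the image relatively well (unlike GoogLeNet, where the Inception modules may disturb position information).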
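The 138M and 144M figures can be checked by hand. The sketch below counts weights and biases for configurations D and E, assuming the standard setup from the paper: a 224×224 RGB input, 3×3 convolutions, five 2×2 max pools, and three fully connected layers of 4096, 4096 and 1000 units. The names `VGG16_BLOCKS`, `VGG19_BLOCKS` and `count_vgg_params` are made up for this illustration.

```python
# (output channels, number of 3x3 conv layers) per block, for
# configuration D (VGG-16) and configuration E (VGG-19).
VGG16_BLOCKS = [(64, 2), (128, 2), (256, 3), (512, 3), (512, 3)]
VGG19_BLOCKS = [(64, 2), (128, 2), (256, 4), (512, 4), (512, 4)]


def count_vgg_params(blocks, in_channels=3, image_size=224, num_classes=1000):
    """Count weights and biases of the conv blocks plus three FC layers."""
    total = 0
    channels = in_channels
    size = image_size
    for width, repeats in blocks:
        for _ in range(repeats):
            total += 3 * 3 * channels * width + width  # kernel weights + biases
            channels = width
        size //= 2  # each block ends with a 2x2, stride-2 max pool
    fc_in = channels * size * size  # 512 * 7 * 7 for a 224x224 input
    for fc_out in (4096, 4096, num_classes):
        total += fc_in * fc_out + fc_out
        fc_in = fc_out
    return total


print(count_vgg_params(VGG16_BLOCKS))  # 138357544
print(count_vgg_params(VGG19_BLOCKS))  # 143667240
```

Most of the parameters sit in the first fully connected layer (25088 × 4096 weights), not in the convolutions.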
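Let's take a closer look at the VGG-16 architecture:
Figure 2: VGG-16 network architecture
As the figure shows, every convolutional layer uses a small 3×3 kernel, and these small kernels are arranged one after another as a convolution sequence. Put simply, the input image is convolved with a 3×3 kernel, then with another 3×3 kernel, and so on: several small kernels are applied to the image in succession.
AlexNet starts with a large 11×11 kernel, so why does VGG use small 3×3 kernels, and why several in a row? When VGG was proposed it ran against the design practice of LeNet and AlexNet, which relied on larger kernels to capture similar features across the image (with weight sharing): LeNet uses 5×5 kernels, and AlexNet uses 11×11 and 5×5 kernels in its first layers and no 1×1 kernels there. The key observation in VGG is that several stacked 3×3 kernels can cover the same local region as one larger kernel: two stacked 3×3 layers see a 5×5 region and three see a 7×7 region. The idea of chaining small kernels was later adopted by later Inception versions and by ResNet, among others.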
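How a stack of small kernels reaches the view of a large one can be worked out with a few lines of arithmetic. This is a minimal sketch assuming stride-1 convolutions with no pooling in between; `stacked_receptive_field` is a name chosen for this example.

```python
def stacked_receptive_field(num_layers, kernel_size=3):
    """Receptive field of num_layers stride-1 convolutions applied in sequence."""
    field = 1
    for _ in range(num_layers):
        # every extra layer widens the view by (kernel_size - 1) pixels
        field += kernel_size - 1
    return field


for n in (1, 2, 3, 5):
    size = stacked_receptive_field(n)
    print("%d stacked 3x3 layers see a %dx%d region" % (n, size, size))
```

Five stacked 3×3 layers are needed to match the 11×11 view of AlexNet's first layer, and each of them adds its own ReLU non-linearity along the way.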
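Figure 1 also shows that VGG extracts high-level features with stacks of 3×3 convolutions. With larger kernels the number of parameters grows quickly, and so does the computation time: a 3×3 kernel has only 9 weights, while a 7×7 kernel has 49. For C input and C output channels, three stacked 3×3 layers need 3·(3²·C²) = 27C² weights, whereas a single 7×7 layer with the same receptive field needs 49C². Without a mechanism to regularize or prune that many parameters, or otherwise limit their number, networks with larger kernels become very hard to train.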
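The parameter comparison can be made concrete. The sketch below counts kernel weights only (no biases) and assumes the same number of input and output channels, C = 256, which is an arbitrary choice for illustration; `conv_weights` is a hypothetical helper.

```python
def conv_weights(kernel_size, channels, num_layers=1):
    """Kernel weights of num_layers conv layers with `channels` inputs and outputs."""
    return num_layers * kernel_size * kernel_size * channels * channels


C = 256
three_3x3 = conv_weights(3, C, num_layers=3)  # 27 * C^2
one_7x7 = conv_weights(7, C)                  # 49 * C^2

print(three_3x3, one_7x7)
print("ratio: %.2f" % (three_3x3 / float(one_7x7)))
```

Both options cover a 7×7 receptive field, but the stacked version uses 27/49, roughly 55%, of the weights.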
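VGG takes the view that large kernels waste a lot of computation, and that smaller kernels reduce the parameter count and save compute. Although training takes longer because the network is deeper, for the same receptive field the stacked small kernels need fewer parameters and less computation than one large kernel.
Advantages of VGG
Compared with AlexNet:
Similarities
- The overall structure has five convolutional stages;
- Apart from the softmax layer, the last few layers are fully connected;
- The five stages are connected by max pooling.
Differences
- Small 3×3 kernels replace large ones (three stacked 3×3 layers cover the receptive field of a 7×7 kernel), and the network is built much deeper;
- LRN is dropped because it costs too much computation for too little benefit;
- More feature maps are used, so more features can be extracted and combined.
Implementing VGG with PaddlePaddle
1. Network structure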
# coding: utf-8
'''
Created by huxiaoman 2017.12.12
vggnet.py: classify CIFAR-10 with a VGG network
'''

import paddle.v2 as paddle


def vgg(input):
    # One VGG block: `groups` 3x3 conv layers (each followed by batch norm
    # and the given dropout rate), then a 2x2 max pool with stride 2.
    def conv_block(ipt, num_filter, groups, dropouts, num_channels=None):
        return paddle.networks.img_conv_group(
            input=ipt,
            num_channels=num_channels,
            pool_size=2,
            pool_stride=2,
            conv_num_filter=[num_filter] * groups,
            conv_filter_size=3,
            conv_act=paddle.activation.Relu(),
            conv_with_batchnorm=True,
            conv_batchnorm_drop_rate=dropouts,
            pool_type=paddle.pooling.Max())

    # Five blocks with 2, 2, 3, 3, 3 conv layers (13 conv layers in total)
    conv1 = conv_block(input, 64, 2, [0.3, 0], 3)
    conv2 = conv_block(conv1, 128, 2, [0.4, 0])
    conv3 = conv_block(conv2, 256, 3, [0.4, 0.4, 0])
    conv4 = conv_block(conv3, 512, 3, [0.4, 0.4, 0])
    conv5 = conv_block(conv4, 512, 3, [0.4, 0.4, 0])

    # Two 512-unit fully connected layers; the softmax layer is added by the caller
    drop = paddle.layer.dropout(input=conv5, dropout_rate=0.5)
    fc1 = paddle.layer.fc(input=drop, size=512, act=paddle.activation.Linear())
    bn = paddle.layer.batch_norm(
        input=fc1,
        act=paddle.activation.Relu(),
        layer_attr=paddle.attr.Extra(drop_rate=0.5))
    fc2 = paddle.layer.fc(input=bn, size=512, act=paddle.activation.Linear())
    return fc2
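The layer sizes that PaddlePaddle prints at startup (see the training log in section 3) follow directly from this block structure. A minimal sketch in plain Python, assuming that each 3×3 convolution keeps the spatial size and each block ends by halving it; `CIFAR_VGG_BLOCKS` and `trace_vgg_shapes` are names invented for this example.

```python
# (num_filter, groups) for the five conv_block calls in vggnet.py
CIFAR_VGG_BLOCKS = [(64, 2), (128, 2), (256, 3), (512, 3), (512, 3)]


def trace_vgg_shapes(blocks, height=32, width=32):
    """Return (name, channels, height, width, size) for every conv and pool layer."""
    rows = []
    conv_index = 0
    for pool_index, (num_filter, groups) in enumerate(blocks):
        for _ in range(groups):
            rows.append(("conv_%d" % conv_index, num_filter, height, width,
                         num_filter * height * width))
            conv_index += 1
        # 2x2 max pool with stride 2 halves height and width
        height //= 2
        width //= 2
        rows.append(("pool_%d" % pool_index, num_filter, height, width,
                     num_filter * height * width))
    return rows


for name, c, h, w, size in trace_vgg_shapes(CIFAR_VGG_BLOCKS):
    print("%-8s c = %d, h = %d, w = %d, size = %d" % (name, c, h, w, size))
```

For a 32×32 CIFAR-10 image the last pooling layer ends at 512 × 1 × 1, which is why the fully connected layers here use 512 units instead of the 4096 of the ImageNet model.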
2. Training the model
# coding: utf-8
'''
Created by huxiaoman 2017.12.12
train_vgg.py: train VGG-16 to classify the CIFAR-10 dataset
'''
# paddle.v2 targets Python 2, hence the print statements below.

import sys, os

import numpy as np
import paddle.v2 as paddle
from PIL import Image

from vggnet import vgg

# Train on GPU when the WITH_GPU environment variable is set to 1.
with_gpu = os.getenv('WITH_GPU', '0') == '1'


def main():
    datadim = 3 * 32 * 32
    classdim = 10

    # PaddlePaddle init
    paddle.init(use_gpu=with_gpu, trainer_count=8)

    image = paddle.layer.data(
        name="image", type=paddle.data_type.dense_vector(datadim))

    net = vgg(image)

    out = paddle.layer.fc(
        input=net, size=classdim, act=paddle.activation.Softmax())

    lbl = paddle.layer.data(
        name="label", type=paddle.data_type.integer_value(classdim))
    cost = paddle.layer.classification_cost(input=out, label=lbl)

    # Create parameters
    parameters = paddle.parameters.create(cost)

    # Create optimizer
    momentum_optimizer = paddle.optimizer.Momentum(
        momentum=0.9,
        regularization=paddle.optimizer.L2Regularization(rate=0.0002 * 128),
        learning_rate=0.1 / 128.0,
        learning_rate_decay_a=0.1,
        learning_rate_decay_b=50000 * 100,
        learning_rate_schedule='discexp')

    # End batch and end pass event handler
    def event_handler(event):
        if isinstance(event, paddle.event.EndIteration):
            if event.batch_id % 100 == 0:
                print "\nPass %d, Batch %d, Cost %f, %s" % (
                    event.pass_id, event.batch_id, event.cost, event.metrics)
            else:
                sys.stdout.write('.')
                sys.stdout.flush()
        if isinstance(event, paddle.event.EndPass):
            # save parameters (binary mode, since this is a tar archive)
            with open('params_pass_%d.tar' % event.pass_id, 'wb') as f:
                parameters.to_tar(f)

            # evaluate on the CIFAR-10 test set after every pass
            result = trainer.test(
                reader=paddle.batch(
                    paddle.dataset.cifar.test10(), batch_size=128),
                feeding={'image': 0,
                         'label': 1})
            print "\nTest with Pass %d, %s" % (event.pass_id, result.metrics)

    # Create trainer
    trainer = paddle.trainer.SGD(
        cost=cost, parameters=parameters, update_equation=momentum_optimizer)

    # Save the inference topology to protobuf.
    inference_topology = paddle.topology.Topology(layers=out)
    with open("inference_topology.pkl", 'wb') as f:
        inference_topology.serialize_for_inference(f)

    trainer.train(
        reader=paddle.batch(
            paddle.reader.shuffle(
                paddle.dataset.cifar.train10(), buf_size=50000),
            batch_size=128),
        num_passes=200,
        event_handler=event_handler,
        feeding={'image': 0,
                 'label': 1})

    # inference
    def load_image(file):
        im = Image.open(file)
        im = im.resize((32, 32), Image.ANTIALIAS)
        im = np.array(im).astype(np.float32)
        im = im.transpose((2, 0, 1))  # CHW
        im = im[(2, 1, 0), :, :]  # BGR
        im = im.flatten()
        im = im / 255.0
        return im

    test_data = []
    cur_dir = os.path.dirname(os.path.realpath(__file__))
    test_data.append((load_image(cur_dir + '/image/dog.png'), ))

    probs = paddle.infer(
        output_layer=out, parameters=parameters, input=test_data)
    lab = np.argsort(-probs)  # probs and lab are the results of one batch data
    print "Label of image/dog.png is: %d" % lab[0][0]


if __name__ == '__main__':
    main()
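The optimizer settings above imply a step-wise learning rate. The sketch below assumes that the `discexp` schedule computes `learning_rate * decay_a ** floor(samples_processed / decay_b)`, which is how I understand the PaddlePaddle v2 documentation; treat the formula as an assumption and check it against the version you use. `discexp_lr`, `BASE_LR` and `DECAY_B` are names made up for this example.

```python
import math

BASE_LR = 0.1 / 128.0      # learning_rate in the script
DECAY_A = 0.1              # learning_rate_decay_a
DECAY_B = 50000 * 100      # learning_rate_decay_b: 100 passes over 50000 images


def discexp_lr(base_lr, decay_a, decay_b, samples_processed):
    """Assumed 'discexp' rule: multiply by decay_a once every decay_b samples."""
    return base_lr * decay_a ** (samples_processed // decay_b)


for finished_passes in (0, 99, 100, 199):
    samples = finished_passes * 50000
    print("after %3d passes: lr = %.3e"
          % (finished_passes, discexp_lr(BASE_LR, DECAY_A, DECAY_B, samples)))
```

Under that assumption the 200-pass run uses the initial rate for the first 100 passes and a rate ten times smaller for the remaining 100.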
3. Training results
nohup: ignoring input
I1127 09:36:58.313799 13026 Util.cpp:166] commandline: --use_gpu=True --trainer_count=7
[INFO 2017-11-27 09:37:04,477 layers.py:2539] output for __conv_0__: c = 64, h = 32, w = 32, size = 65536
[INFO 2017-11-27 09:37:04,478 layers.py:3062] output for __batch_norm_0__: c = 64, h = 32, w = 32, size = 65536
[INFO 2017-11-27 09:37:04,479 layers.py:2539] output for __conv_1__: c = 64, h = 32, w = 32, size = 65536
[INFO 2017-11-27 09:37:04,480 layers.py:3062] output for __batch_norm_1__: c = 64, h = 32, w = 32, size = 65536
[INFO 2017-11-27 09:37:04,480 layers.py:2667] output for __pool_0__: c = 64, h = 16, w = 16, size = 16384
[INFO 2017-11-27 09:37:04,481 layers.py:2539] output for __conv_2__: c = 128, h = 16, w = 16, size = 32768
[INFO 2017-11-27 09:37:04,482 layers.py:3062] output for __batch_norm_2__: c = 128, h = 16, w = 16, size = 32768
[INFO 2017-11-27 09:37:04,483 layers.py:2539] output for __conv_3__: c = 128, h = 16, w = 16, size = 32768
[INFO 2017-11-27 09:37:04,484 layers.py:3062] output for __batch_norm_3__: c = 128, h = 16, w = 16, size = 32768
[INFO 2017-11-27 09:37:04,485 layers.py:2667] output for __pool_1__: c = 128, h = 8, w = 8, size = 8192
[INFO 2017-11-27 09:37:04,485 layers.py:2539] output for __conv_4__: c = 256, h = 8, w = 8, size = 16384
[INFO 2017-11-27 09:37:04,486 layers.py:3062] output for __batch_norm_4__: c = 256, h = 8, w = 8, size = 16384
[INFO 2017-11-27 09:37:04,487 layers.py:2539] output for __conv_5__: c = 256, h = 8, w = 8, size = 16384
[INFO 2017-11-27 09:37:04,488 layers.py:3062] output for __batch_norm_5__: c = 256, h = 8, w = 8, size = 16384
[INFO 2017-11-27 09:37:04,489 layers.py:2539] output for __conv_6__: c = 256, h = 8, w = 8, size = 16384
[INFO 2017-11-27 09:37:04,490 layers.py:3062] output for __batch_norm_6__: c = 256, h = 8, w = 8, size = 16384
[INFO 2017-11-27 09:37:04,490 layers.py:2667] output for __pool_2__: c = 256, h = 4, w = 4, size = 4096
[INFO 2017-11-27 09:37:04,491 layers.py:2539] output for __conv_7__: c = 512, h = 4, w = 4, size = 8192
[INFO 2017-11-27 09:37:04,492 layers.py:3062] output for __batch_norm_7__: c = 512, h = 4, w = 4, size = 8192
[INFO 2017-11-27 09:37:04,493 layers.py:2539] output for __conv_8__: c = 512, h = 4, w = 4, size = 8192
[INFO 2017-11-27 09:37:04,494 layers.py:3062] output for __batch_norm_8__: c = 512, h = 4, w = 4, size = 8192
[INFO 2017-11-27 09:37:04,495 layers.py:2539] output for __conv_9__: c = 512, h = 4, w = 4, size = 8192
[INFO 2017-11-27 09:37:04,495 layers.py:3062] output for __batch_norm_9__: c = 512, h = 4, w = 4, size = 8192
[INFO 2017-11-27 09:37:04,496 layers.py:2667] output for __pool_3__: c = 512, h = 2, w = 2, size = 2048
[INFO 2017-11-27 09:37:04,497 layers.py:2539] output for __conv_10__: c = 512, h = 2, w = 2, size = 2048
[INFO 2017-11-27 09:37:04,498 layers.py:3062] output for __batch_norm_10__: c = 512, h = 2, w = 2, size = 2048
[INFO 2017-11-27 09:37:04,499 layers.py:2539] output for __conv_11__: c = 512, h = 2, w = 2, size = 2048
[INFO 2017-11-27 09:37:04,499 layers.py:3062] output for __batch_norm_11__: c = 512, h = 2, w = 2, size = 2048
[INFO 2017-11-27 09:37:04,502 layers.py:2539] output for __conv_12__: c = 512, h = 2, w = 2, size = 2048
[INFO 2017-11-27 09:37:04,502 layers.py:3062] output for __batch_norm_12__: c = 512, h = 2, w = 2, size = 2048
[INFO 2017-11-27 09:37:04,503 layers.py:2667] output for __pool_4__: c = 512, h = 1, w = 1, size = 512
I1127 09:37:04.563228 13026 MultiGradientMachine.cpp:99] numLogicalDevices=1 numThreads=7 numDevices=8
I1127 09:37:04.822993 13026 GradientMachine.cpp:85] Initing parameters..
I1127 09:37:05.728123 13026 GradientMachine.cpp:92] Init parameters done.

Pass 0, Batch 0, Cost 2.407296, {'classification_error_evaluator': 0.8828125}
...................................................................................................
Pass 0, Batch 100, Cost 1.994910, {'classification_error_evaluator': 0.84375}
...................................................................................................
Pass 0, Batch 200, Cost 2.199248, {'classification_error_evaluator': 0.8671875}
...................................................................................................
Pass 0, Batch 300, Cost 1.982006, {'classification_error_evaluator': 0.8125}
..........................................................................................
Test with Pass 0, {'classification_error_evaluator': 0.8999999761581421}

[...]

Pass 199, Batch 0, Cost 0.012132, {'classification_error_evaluator': 0.0}
...................................................................................................
Pass 199, Batch 100, Cost 0.021121, {'classification_error_evaluator': 0.0078125}
...................................................................................................
Pass 199, Batch 200, Cost 0.068369, {'classification_error_evaluator': 0.0078125}
...................................................................................................
Pass 199, Batch 300, Cost 0.015805, {'classification_error_evaluator': 0.0}
..........................................................................................I1128 01:57:44.727157 13026 MultiGradientMachine.cpp:99] numLogicalDevices=1 numThreads=7 numDevices=8

Test with Pass 199, {'classification_error_evaluator': 0.10890000313520432}
Label of image/dog.png is: 5
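In this run, with 7 trainer threads on 8 Tesla K80 GPUs, 200 passes over the training set took 16h21min. That is much longer than the few hours LeNet and AlexNet needed in the earlier posts, but the result is good: test accuracy is 89.11%, higher than both LeNet and AlexNet on the same hardware with the same number of passes.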
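Both numbers can be read off the log. The sketch below takes the first and last glog timestamps and the final test error from the log above; the year 2017 is assumed for the `I1127` and `I1128` lines, since glog timestamps carry no year.

```python
from datetime import datetime

start = datetime(2017, 11, 27, 9, 36, 58)  # "I1127 09:36:58" (first log line)
end = datetime(2017, 11, 28, 1, 57, 44)    # "I1128 01:57:44" (before the last test)
elapsed = end - start

test_error = 0.10890000313520432           # classification_error_evaluator, pass 199
accuracy = 1.0 - test_error

hours, remainder = divmod(int(elapsed.total_seconds()), 3600)
minutes = remainder // 60
print("accuracy = %.2f%%, elapsed = %dh%02dmin" % (accuracy * 100, hours, minutes))
```

The last timestamp is written just before the final test pass, so the measured span comes out slightly below the 16h21min quoted above.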
Implementing VGG with TensorFlow
1. Network structure
# conv_op, mpool_op and fc_op are defined in the full script in section 2.
def inference_op(input_op, keep_prob):
    p = []
    # Block 1: conv1_1-conv1_2-pool1
    conv1_1 = conv_op(input_op, name='conv1_1', kh=3, kw=3,
                n_out = 64, dh = 1, dw = 1, p = p)
    conv1_2 = conv_op(conv1_1, name='conv1_2', kh=3, kw=3,
                n_out = 64, dh = 1, dw = 1, p = p)
    pool1 = mpool_op(conv1_2, name = 'pool1', kh = 2, kw = 2,
                dw = 2, dh = 2)
    # Block 2: conv2_1-conv2_2-pool2
    conv2_1 = conv_op(pool1, name='conv2_1', kh=3, kw=3,
                n_out = 128, dh = 1, dw = 1, p = p)
    conv2_2 = conv_op(conv2_1, name='conv2_2', kh=3, kw=3,
                n_out = 128, dh = 1, dw = 1, p = p)
    pool2 = mpool_op(conv2_2, name = 'pool2', kh = 2, kw = 2,
                dw = 2, dh = 2)
    # Block 3: conv3_1-conv3_2-conv3_3-pool3
    conv3_1 = conv_op(pool2, name='conv3_1', kh=3, kw=3,
                n_out = 256, dh = 1, dw = 1, p = p)
    conv3_2 = conv_op(conv3_1, name='conv3_2', kh=3, kw=3,
                n_out = 256, dh = 1, dw = 1, p = p)
    conv3_3 = conv_op(conv3_2, name='conv3_3', kh=3, kw=3,
                n_out = 256, dh = 1, dw = 1, p = p)
    pool3 = mpool_op(conv3_3, name = 'pool3', kh = 2, kw = 2,
                dw = 2, dh = 2)
    # Block 4: conv4_1-conv4_2-conv4_3-pool4
    conv4_1 = conv_op(pool3, name='conv4_1', kh=3, kw=3,
                n_out = 512, dh = 1, dw = 1, p = p)
    conv4_2 = conv_op(conv4_1, name='conv4_2', kh=3, kw=3,
                n_out = 512, dh = 1, dw = 1, p = p)
    conv4_3 = conv_op(conv4_2, name='conv4_3', kh=3, kw=3,
                n_out = 512, dh = 1, dw = 1, p = p)
    pool4 = mpool_op(conv4_3, name = 'pool4', kh = 2, kw = 2,
                dw = 2, dh = 2)
    # Block 5: conv5_1-conv5_2-conv5_3-pool5
    conv5_1 = conv_op(pool4, name='conv5_1', kh=3, kw=3,
                n_out = 512, dh = 1, dw = 1, p = p)
    conv5_2 = conv_op(conv5_1, name='conv5_2', kh=3, kw=3,
                n_out = 512, dh = 1, dw = 1, p = p)
    conv5_3 = conv_op(conv5_2, name='conv5_3', kh=3, kw=3,
                n_out = 512, dh = 1, dw = 1, p = p)
    pool5 = mpool_op(conv5_3, name = 'pool5', kh = 2, kw = 2,
                dw = 2, dh = 2)
    # Flatten pool5 ([7, 7, 512] for a 224x224 input) into a vector
    shp = pool5.get_shape()
    flattened_shape = shp[1].value * shp[2].value * shp[3].value
    resh1 = tf.reshape(pool5, [-1, flattened_shape], name = 'resh1')

    # Fully connected layer 1, with dropout against overfitting
    fc1 = fc_op(resh1, name = 'fc1', n_out = 2048, p = p)
    fc1_drop = tf.nn.dropout(fc1, keep_prob, name = 'fc1_drop')

    # Fully connected layer 2, with dropout against overfitting
    fc2 = fc_op(fc1_drop, name = 'fc2', n_out = 2048, p = p)
    fc2_drop = tf.nn.dropout(fc2, keep_prob, name = 'fc2_drop')

    # Fully connected layer 3, followed by softmax for the class probabilities
    fc3 = fc_op(fc2_drop, name = 'fc3', n_out = 10, p = p)
    softmax = tf.nn.softmax(fc3)
    predictions = tf.argmax(softmax, 1)
    return predictions, softmax, fc3, p
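The comment about pool5 being [7, 7, 512] holds for a 224×224 input; smaller inputs shrink much further. The sketch below traces the size through the five 'SAME'-padded stride-2 pools, using the rule that such a layer outputs ceil(size / stride). The helper names are invented here, and the 24×24 case assumes that `cifar10.distorted_inputs()` is the TensorFlow tutorial module, which crops CIFAR-10 images to 24×24.

```python
import math


def same_pool_output(size, stride=2):
    """Output length of a 'SAME'-padded pooling layer: ceil(size / stride)."""
    return int(math.ceil(size / float(stride)))


def pool5_shape(image_size, num_pools=5, channels=512):
    """Spatial shape after the five stride-2 pools of VGG-16."""
    size = image_size
    for _ in range(num_pools):
        size = same_pool_output(size)
    return (size, size, channels)


print(pool5_shape(224))  # ImageNet-sized input
print(pool5_shape(32))   # full CIFAR-10 image
print(pool5_shape(24))   # cropped CIFAR-10 image (assumed tutorial pipeline)
```

Because `inference_op` computes `flattened_shape` from the actual tensor shape, the same code works for either input size; only the size of the first fully connected layer changes.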
2. Training the network
# -*- coding: utf-8 -*-
"""
Created by huxiaoman 2017.12.12
vgg_tf.py: train the TensorFlow version of VGG-16 to classify the CIFAR-10 data
"""
from datetime import datetime
import math
import time
import tensorflow as tf
import cifar10

batch_size = 128
num_batches = 200

# Create a convolutional layer.
# input_op : input tensor
# name     : name of the layer, used with tf.name_scope()
# kh, kw   : kernel height and width
# n_out    : number of output channels
# dh, dw   : stride height and width
# p        : parameter list that collects the VGG variables
# Kernel weights are initialised with the Xavier method.
def conv_op(input_op, name, kh, kw, n_out, dh, dw, p):
    n_in = input_op.get_shape()[-1].value  # number of input channels
    with tf.name_scope(name) as scope:
        kernel = tf.get_variable(scope+'w',
            shape = [kh, kw, n_in, n_out], dtype = tf.float32,
            initializer = tf.contrib.layers.xavier_initializer_conv2d())
        # convolution
        conv = tf.nn.conv2d(input_op, kernel, (1, dh, dw, 1), padding = 'SAME')
        bias_init_val = tf.constant(0.0, shape = [n_out], dtype = tf.float32)
        biases = tf.Variable(bias_init_val, trainable = True, name = 'b')
        z = tf.nn.bias_add(conv, biases)
        activation = tf.nn.relu(z, name = scope)
        p += [kernel, biases]
        return activation

# Create a fully connected layer.
# input_op : input tensor
# name     : name of the layer
# n_out    : number of output units
# p        : parameter list
# Weights are initialised with the Xavier method.
def fc_op(input_op, name, n_out, p):
    n_in = input_op.get_shape()[-1].value

    with tf.name_scope(name) as scope:
        kernel = tf.get_variable(scope+'w',
            shape = [n_in, n_out], dtype = tf.float32,
            initializer = tf.contrib.layers.xavier_initializer())
        biases = tf.Variable(tf.constant(0.1, shape = [n_out],
            dtype = tf.float32), name = 'b')
        activation = tf.nn.relu_layer(input_op, kernel,
            biases, name = scope)
        p += [kernel, biases]
        return activation

# Create a max pooling layer.
# input_op : input tensor
# name     : name of the pooling layer
# kh, kw   : pooling window height and width
# dh, dw   : stride height and width
def mpool_op(input_op, name, kh, kw, dh, dw):
    return tf.nn.max_pool(input_op, ksize = [1,kh,kw,1],
        strides = [1, dh, dw, 1], padding = 'SAME', name = name)

#--------------- Build VGG-16 ------------------

def inference_op(input_op, keep_prob):
    p = []
    # Block 1: conv1_1-conv1_2-pool1
    conv1_1 = conv_op(input_op, name='conv1_1', kh=3, kw=3,
                n_out = 64, dh = 1, dw = 1, p = p)
    conv1_2 = conv_op(conv1_1, name='conv1_2', kh=3, kw=3,
                n_out = 64, dh = 1, dw = 1, p = p)
    pool1 = mpool_op(conv1_2, name = 'pool1', kh = 2, kw = 2,
                dw = 2, dh = 2)
    # Block 2: conv2_1-conv2_2-pool2
    conv2_1 = conv_op(pool1, name='conv2_1', kh=3, kw=3,
                n_out = 128, dh = 1, dw = 1, p = p)
    conv2_2 = conv_op(conv2_1, name='conv2_2', kh=3, kw=3,
                n_out = 128, dh = 1, dw = 1, p = p)
    pool2 = mpool_op(conv2_2, name = 'pool2', kh = 2, kw = 2,
                dw = 2, dh = 2)
    # Block 3: conv3_1-conv3_2-conv3_3-pool3
    conv3_1 = conv_op(pool2, name='conv3_1', kh=3, kw=3,
                n_out = 256, dh = 1, dw = 1, p = p)
    conv3_2 = conv_op(conv3_1, name='conv3_2', kh=3, kw=3,
                n_out = 256, dh = 1, dw = 1, p = p)
    conv3_3 = conv_op(conv3_2, name='conv3_3', kh=3, kw=3,
                n_out = 256, dh = 1, dw = 1, p = p)
    pool3 = mpool_op(conv3_3, name = 'pool3', kh = 2, kw = 2,
                dw = 2, dh = 2)
    # Block 4: conv4_1-conv4_2-conv4_3-pool4
    conv4_1 = conv_op(pool3, name='conv4_1', kh=3, kw=3,
                n_out = 512, dh = 1, dw = 1, p = p)
    conv4_2 = conv_op(conv4_1, name='conv4_2', kh=3, kw=3,
                n_out = 512, dh = 1, dw = 1, p = p)
    conv4_3 = conv_op(conv4_2, name='conv4_3', kh=3, kw=3,
                n_out = 512, dh = 1, dw = 1, p = p)
    pool4 = mpool_op(conv4_3, name = 'pool4', kh = 2, kw = 2,
                dw = 2, dh = 2)
    # Block 5: conv5_1-conv5_2-conv5_3-pool5
    conv5_1 = conv_op(pool4, name='conv5_1', kh=3, kw=3,
                n_out = 512, dh = 1, dw = 1, p = p)
    conv5_2 = conv_op(conv5_1, name='conv5_2', kh=3, kw=3,
                n_out = 512, dh = 1, dw = 1, p = p)
    conv5_3 = conv_op(conv5_2, name='conv5_3', kh=3, kw=3,
                n_out = 512, dh = 1, dw = 1, p = p)
    pool5 = mpool_op(conv5_3, name = 'pool5', kh = 2, kw = 2,
                dw = 2, dh = 2)
    # Flatten pool5 ([7, 7, 512] for a 224x224 input) into a vector
    shp = pool5.get_shape()
    flattened_shape = shp[1].value * shp[2].value * shp[3].value
    resh1 = tf.reshape(pool5, [-1, flattened_shape], name = 'resh1')

    # Fully connected layer 1, with dropout against overfitting
    fc1 = fc_op(resh1, name = 'fc1', n_out = 2048, p = p)
    fc1_drop = tf.nn.dropout(fc1, keep_prob, name = 'fc1_drop')

    # Fully connected layer 2, with dropout against overfitting
    fc2 = fc_op(fc1_drop, name = 'fc2', n_out = 2048, p = p)
    fc2_drop = tf.nn.dropout(fc2, keep_prob, name = 'fc2_drop')

    # Fully connected layer 3, followed by softmax for the class probabilities
    fc3 = fc_op(fc2_drop, name = 'fc3', n_out = 10, p = p)
    softmax = tf.nn.softmax(fc3)
    predictions = tf.argmax(softmax, 1)
    return predictions, softmax, fc3, p

# Timing function: runs `target` for num_batches steps after a warm-up
# and reports the mean and standard deviation of the time per batch.

def time_tensorflow_run(session, target, feed, info_string):
    num_steps_burn_in = 10
    total_duration = 0.0
    total_duration_squared = 0.0

    for i in range(num_batches + num_steps_burn_in):
        start_time = time.time()
        _ = session.run(target, feed_dict = feed)
        duration = time.time() - start_time
        if i >= num_steps_burn_in:
            if not i % 10:
                print('%s: step %d, duration = %.3f' %
                    (datetime.now(), i-num_steps_burn_in, duration))
            total_duration += duration
            total_duration_squared += duration * duration
    mean_dur = total_duration / num_batches
    var_dur = total_duration_squared / num_batches - mean_dur * mean_dur
    std_dur = math.sqrt(var_dur)
    print('%s: %s across %d steps, %.3f +/- %.3f sec / batch' %(datetime.now(), info_string, num_batches, mean_dur, std_dur))


def train_vgg16():
    with tf.Graph().as_default():
        image_size = 224  # input image size for the random-input test below
        # Random input, to check that the graph runs at all:
        #images = tf.Variable(tf.random_normal([batch_size, image_size, image_size, 3], dtype=tf.float32, stddev=1e-1))
        with tf.device('/cpu:0'):
            images, labels = cifar10.distorted_inputs()
        keep_prob = tf.placeholder(tf.float32)
        prediction,softmax,fc8,p = inference_op(images,keep_prob)
        init = tf.global_variables_initializer()
        sess = tf.Session()
        sess.run(init)
        # The tutorial cifar10 module feeds data through input queues,
        # so the queue runners must be started before the first run.
        tf.train.start_queue_runners(sess=sess)
        time_tensorflow_run(sess, prediction,{keep_prob:1.0}, "Forward")
        # Simulate a training step
        objective = tf.nn.l2_loss(fc8)     # a stand-in loss
        grad = tf.gradients(objective, p)  # gradients of the loss w.r.t. all model parameters
        time_tensorflow_run(sess, grad, {keep_prob:0.5},"Forward-backward")


if __name__ == '__main__':
    train_vgg16()
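`time_tensorflow_run` keeps only two running sums and derives the spread from the identity Var(x) = E[x²] − (E[x])². A minimal stand-alone sketch of that calculation, with made-up durations; `summarize_durations` is a hypothetical helper, and the `max(..., 0.0)` guard against tiny negative rounding errors is an addition that the script itself does not have.

```python
import math


def summarize_durations(durations):
    """Mean and standard deviation from running sums, as in time_tensorflow_run."""
    n = float(len(durations))
    total = sum(durations)
    total_squared = sum(d * d for d in durations)
    mean = total / n
    variance = total_squared / n - mean * mean  # E[x^2] - (E[x])^2
    return mean, math.sqrt(max(variance, 0.0))


mean, std = summarize_durations([0.1, 0.2, 0.3])
print("%.3f +/- %.3f sec / batch" % (mean, std))
```

This is the population standard deviation (dividing by n, not n − 1), which is fine for a rough per-batch timing figure.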
Of course, tf.slim (tf.contrib.slim) can be used to write the network more compactly:
import tensorflow as tf
import tensorflow.contrib.slim as slim


def vgg16(inputs):
    with slim.arg_scope([slim.conv2d, slim.fully_connected],
                      activation_fn=tf.nn.relu,
                      weights_initializer=tf.truncated_normal_initializer(0.0, 0.01),
                      weights_regularizer=slim.l2_regularizer(0.0005)):
        net = slim.repeat(inputs, 2, slim.conv2d, 64, [3, 3], scope='conv1')
        net = slim.max_pool2d(net, [2, 2], scope='pool1')
        net = slim.repeat(net, 2, slim.conv2d, 128, [3, 3], scope='conv2')
        net = slim.max_pool2d(net, [2, 2], scope='pool2')
        net = slim.repeat(net, 3, slim.conv2d, 256, [3, 3], scope='conv3')
        net = slim.max_pool2d(net, [2, 2], scope='pool3')
        net = slim.repeat(net, 3, slim.conv2d, 512, [3, 3], scope='conv4')
        net = slim.max_pool2d(net, [2, 2], scope='pool4')
        net = slim.repeat(net, 3, slim.conv2d, 512, [3, 3], scope='conv5')
        net = slim.max_pool2d(net, [2, 2], scope='pool5')
        # flatten the feature maps before the fully connected layers
        net = slim.flatten(net)
        net = slim.fully_connected(net, 4096, scope='fc6')
        net = slim.dropout(net, 0.5, scope='dropout6')
        net = slim.fully_connected(net, 4096, scope='fc7')
        net = slim.dropout(net, 0.5, scope='dropout7')
        # 1000 outputs as for ImageNet; use 10 for CIFAR-10
        net = slim.fully_connected(net, 1000, activation_fn=None, scope='fc8')
    return net
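What `slim.repeat` does can be emulated in a few lines of plain Python. This is not slim's implementation, only a sketch of its documented behaviour: the layer is applied several times in sequence, and the scopes are unrolled to names such as `conv3/conv3_1`. `repeat` and `fake_conv2d` below are stand-ins written for this example.

```python
def repeat(inputs, repetitions, layer, *args, **kwargs):
    """Apply `layer` `repetitions` times, numbering the scopes like slim.repeat."""
    scope = kwargs.pop("scope")
    outputs = inputs
    for i in range(repetitions):
        kwargs["scope"] = "%s/%s_%d" % (scope, scope, i + 1)
        outputs = layer(outputs, *args, **kwargs)
    return outputs


def fake_conv2d(inputs, num_outputs, kernel_size, scope):
    # Hypothetical stand-in for slim.conv2d: records which layer would be built.
    return inputs + [(scope, num_outputs, tuple(kernel_size))]


layers = repeat([], 3, fake_conv2d, 256, [3, 3], scope="conv3")
for layer in layers:
    print(layer)
```

One `slim.repeat` line therefore replaces the three explicit `conv_op` calls of a block in the hand-written version.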
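Comparing the results: on the same hardware and environment, the TensorFlow version reaches 89.18% after 200 passes and takes 18h12min. Compared with PaddlePaddle, the accuracy is about the same and training is somewhat slower. Preprocessing the data before training, for example converting it to TFRecord and feeding it with multi-threaded input, should make training considerably faster.
Summary
From the paper and the experiments, I take away the following points:
1. LRN layers cost a lot of computation for little benefit and can be dropped.
2. Large kernels can learn features over a larger spatial extent but need more parameters; small kernels each see a limited region but need far fewer parameters, and stacking several of them may work better.
3. Deeper networks tend to perform better, but vanishing gradients have to be kept in check; ReLU activations and batch normalization both help to some extent.
4. For the same number of training passes, small kernels with a deep network did better than large kernels with a shallow network, which is useful guidance when designing our own networks. Training the deeper network may take longer, but it may converge in fewer passes and reach higher accuracy.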
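PS: To make updates easier to follow, I have set up a WeChat public account. Future posts will be published both there and on cnblogs (博客園), and questions left on the public account will be seen and answered promptly.
To follow it, search for the public account CharlotteDataMining. Thanks for following ^_^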
This post is also published at: https://mp.weixin.qq.com/s?__biz=MzI0OTQwMTA5Ng==&mid=2247483677&idx=1&sn=9402a0532bc6330f83e58c7e18f51b93&chksm=e9935b7adee4d26cd69de6c89b25be994735094ef420befd1d275f97821819ba9528f13e079a#rd
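References:
1. K. Simonyan, A. Zisserman. Very Deep Convolutional Networks for Large-Scale Image Recognition. https://arxiv.org/pdf/1409.1556.pdf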