The name Inception comes from the Network in Network paper (http://www.javashuo.com/tag/nin) and the famous line [we need to go deeper][3]. In this paper, "deep" carries two meanings: the introduction of the Inception module, and the direct increase in network depth. In general, the Inception module can be seen as a logical culmination of [12], with its inspiration and guiding ideas coming mainly from the theoretical work of Arora et al.
2. 相關工做
Since LeNet-5, convolutional neural networks (CNNs) have followed a typical standard structure: stacked convolutional layers (optionally followed by contrast normalization and max pooling), then one or more fully connected layers. Variants of this basic structure are popular in image classification and have achieved state-of-the-art results on MNIST, CIFAR and, most notably, the ImageNet classification challenge. For larger datasets, the current trend is to increase both the depth and the width of the model while using dropout to combat overfitting.
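For concreteness, here is a minimal sketch (my own illustration, not from the paper) of that standard structure in TensorFlow 1.x, using the same tf.layers API as the code later in this post; the layer sizes are loosely LeNet-5-like assumptions, not an exact reproduction:

# A LeNet-style "standard structure": stacked conv/pool layers, then FC layers.
# Layer sizes here are illustrative assumptions, not the exact LeNet-5 setup.
import tensorflow as tf

x = tf.placeholder(tf.float32, [None, 32, 32, 1])   # grayscale input
x = tf.layers.Conv2D(6, [5, 5], activation=tf.nn.relu)(x)
x = tf.layers.MaxPooling2D([2, 2], 2)(x)
x = tf.layers.Conv2D(16, [5, 5], activation=tf.nn.relu)(x)
x = tf.layers.MaxPooling2D([2, 2], 2)(x)
x = tf.layers.flatten(x)
x = tf.layers.Dense(120, activation=tf.nn.relu)(x)  # fully connected
logits = tf.layers.Dense(10)(x)                     # one logit per class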
The current state of the art in object detection is Regions with Convolutional Neural Networks (R-CNN). R-CNN decomposes the overall detection problem into two steps: first, low-level cues (color, superpixel consistency) are used to generate category-agnostic region proposals; then a CNN classifier identifies the object categories at those locations. This two-step approach leverages both the accuracy of low-level cues for bounding box segmentation and the powerful classification ability of state-of-the-art CNNs. We adopted a similar pipeline in our detection submissions, but with enhancements in both steps, such as multi-box prediction for higher object bounding box recall and an ensemble approach for better categorization of bounding box proposals.
3. Motivation and High Level Considerations
最直接的提升深度神經網絡的性能的方法是增長它的size。這包括增長深度(the number of levels)和寬度(the number of units at each level)。這是一個訓練更高質量模型的容易且安全的方法,尤爲是在有大量帶標籤的訓練數據的狀況下。但這個簡單的解決方案帶來了兩個主要問題。
A bigger size usually means a larger number of parameters, which makes the enlarged network more prone to overfitting, especially when labeled training data is scarce. This can be a major bottleneck, because creating high-quality large datasets is tricky and expensive, especially when expert human raters are needed to distinguish fine-grained categories.
The other drawback is that today's computing hardware is very inefficient when it comes to numerical computation on non-uniform sparse data structures. Current vision-oriented machine learning systems exploit sparsity in the spatial domain merely by using convolutions, yet a convolution can be viewed as a collection of local fully connected operations, i.e. dense computation over small patches. This suggests that sparse matrices should be clustered into dense submatrices, and that a similar method could be applied to the automated construction of non-uniform deep architectures.
# coding: utf-8
# inception_modules.py
''' Inception module '''
import tensorflow as tf

relu = tf.nn.relu

def inception_naive(inputs, sub_chs, scope='inception_naive'):
    ''' sub_chs: output channels of the 1x1, 3x3 and 5x5 branches '''
    [sub_ch1, sub_ch2, sub_ch3] = sub_chs
    with tf.variable_scope(scope):
        x = inputs
        # four parallel branches over the same input
        sub1 = tf.layers.Conv2D(sub_ch1, [1, 1], padding='SAME', activation=relu)(x)
        sub2 = tf.layers.Conv2D(sub_ch2, [3, 3], padding='SAME', activation=relu)(x)
        sub3 = tf.layers.Conv2D(sub_ch3, [5, 5], padding='SAME', activation=relu)(x)
        sub4 = tf.layers.MaxPooling2D([3, 3], 1, padding='SAME')(x)
        # concatenate along the channel axis
        x = tf.concat([sub1, sub2, sub3, sub4], axis=-1)
    return x

def inception(inputs, sub_chs, scope='inception'):
    ''' sub_chs: [ch1, [reduce3, ch3], [reduce5, ch5], ch_pool] '''
    [sub_ch1, sub_ch2, sub_ch3, sub_ch4] = sub_chs
    with tf.variable_scope(scope):
        x = inputs
        sub1 = tf.layers.Conv2D(sub_ch1, [1, 1], padding='SAME', activation=relu)(x)
        # 1x1 "reduce" convolutions shrink the channel dimension
        # before the expensive 3x3 and 5x5 convolutions
        _sub2 = tf.layers.Conv2D(sub_ch2[0], [1, 1], padding='SAME', activation=relu)(x)
        sub2 = tf.layers.Conv2D(sub_ch2[1], [3, 3], padding='SAME', activation=relu)(_sub2)
        _sub3 = tf.layers.Conv2D(sub_ch3[0], [1, 1], padding='SAME', activation=relu)(x)
        sub3 = tf.layers.Conv2D(sub_ch3[1], [5, 5], padding='SAME', activation=relu)(_sub3)
        # the pooling branch is followed by a 1x1 projection
        _sub4 = tf.layers.MaxPooling2D([3, 3], 1, padding='SAME')(x)
        sub4 = tf.layers.Conv2D(sub_ch4, [1, 1], padding='SAME', activation=relu)(_sub4)
        x = tf.concat([sub1, sub2, sub3, sub4], axis=-1)
    return x

if __name__ == '__main__':
    x = tf.placeholder(tf.float32, [192, 28, 28, 3])
    y = inception_naive(x, [64, 128, 32])
    assert y.get_shape().as_list() == [192, 28, 28, 227]  # 64+128+32+3 (pool keeps input channels)
    print('inception_naive is ok')
    y1 = inception(x, [64, [96, 128], [16, 32], 32])
    assert y1.get_shape().as_list() == [192, 28, 28, 256]  # 64+128+32+32
    print('inception is ok')
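As a quick back-of-the-envelope check (my own illustration, not from the original post), the 1x1 "reduce" layers are what make the full module affordable. Taking the inception_3a configuration [64, [96, 128], [16, 32], 32] with a 192-channel input, the weight count of the 5x5 branch alone:

# Weight-only parameter count for the 5x5 branch of inception_3a
# (192 input channels); biases ignored for simplicity.
naive_5x5   = 5 * 5 * 192 * 32                     # direct 5x5 convolution
reduced_5x5 = 1 * 1 * 192 * 16 + 5 * 5 * 16 * 32   # 1x1 reduce, then 5x5
print(naive_5x5)    # 153600
print(reduced_5x5)  # 15872 -- roughly a 10x saving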
Below is an implementation of GoogLeNet:
# coding: utf-8
''' Inception v1 (GoogLeNet) '''
import tensorflow as tf
import inception_modules as modules

relu = tf.nn.relu

def print_activation(x):
    print(x.op.name, x.get_shape().as_list())

def inference(inputs, num_classes=10, is_training=True, dropout_rate=0.4):
    '''
    inputs: a tensor of images
    num_classes: the number of categories
    is_training: set True when used for training
    dropout_rate: the rate of dropout during training
    '''
    caches = []
    x = inputs
    print_activation(x)
    # conv1
    x = tf.layers.Conv2D(64, [7, 7], 2, activation=relu, padding='SAME', name='conv1')(x)
    print_activation(x)
    # pool1
    x = tf.layers.MaxPooling2D([3, 3], 2, padding='SAME', name='pool1')(x)
    print_activation(x)
    # lrn1
    x = tf.nn.local_response_normalization(x, name='lrn1')
    print_activation(x)
    # conv2
    x = tf.layers.Conv2D(64, [1, 1], 1, activation=relu, padding='SAME', name='conv2')(x)
    print_activation(x)
    # conv3
    x = tf.layers.Conv2D(192, [3, 3], 1, activation=relu, padding='SAME', name='conv3')(x)
    print_activation(x)
    # lrn2
    x = tf.nn.local_response_normalization(x, name='lrn2')
    print_activation(x)
    # pool3
    x = tf.layers.MaxPooling2D([3, 3], 2, padding='SAME', name='pool3')(x)
    print_activation(x)
    with tf.variable_scope('inception3'):
        x = modules.inception(x, [64, [96, 128], [16, 32], 32], scope='inception_3a')
        print_activation(x)
        x = modules.inception(x, [128, [128, 192], [32, 96], 64], scope='inception_3b')
        print_activation(x)
    # pool4
    x = tf.layers.MaxPooling2D([3, 3], 2, padding='SAME', name='pool4')(x)
    print_activation(x)
    with tf.variable_scope('inception4'):
        # inception_4a: its output feeds the first auxiliary classifier
        x = modules.inception(x, [192, [96, 208], [16, 48], 64], scope='inception_4a')
        caches.append(x)
        print_activation(x)
        x = modules.inception(x, [160, [112, 224], [24, 64], 64], scope='inception_4b')
        print_activation(x)
        x = modules.inception(x, [128, [128, 256], [24, 64], 64], scope='inception_4c')
        print_activation(x)
        # inception_4d: its output feeds the second auxiliary classifier
        x = modules.inception(x, [112, [144, 288], [32, 64], 64], scope='inception_4d')
        print_activation(x)
        caches.append(x)
        x = modules.inception(x, [256, [160, 320], [32, 128], 128], scope='inception_4e')
        print_activation(x)
    # pool5
    x = tf.layers.MaxPooling2D([3, 3], 2, padding='SAME', name='pool5')(x)
    print_activation(x)
    with tf.variable_scope('inception5'):
        x = modules.inception(x, [256, [160, 320], [32, 128], 128], scope='inception_5a')
        print_activation(x)
        x = modules.inception(x, [384, [192, 384], [48, 128], 128], scope='inception_5b')
        print_activation(x)
    # global average pooling: kernel size equals the feature map size
    _ksize = x.get_shape().as_list()[1]
    x = tf.layers.AveragePooling2D([_ksize, _ksize], 1, name='avg_pool')(x)
    print_activation(x)
    # dropout (only active during training)
    x = tf.layers.Dropout(dropout_rate, name='dropout')(x, training=is_training)
    print_activation(x)
    # classifier head: a 1x1 convolution acting as the final linear layer;
    # the softmax is folded into the cross-entropy loss in build_cost
    logits = tf.layers.Conv2D(num_classes, [1, 1], 1, name='linear')(x)
    print_activation(logits)
    return logits, caches

def build_cost(logits, labels, scope='costs'):
    with tf.variable_scope(scope):
        # softmax cross-entropy (the original post applied sigmoid
        # cross-entropy to softmax outputs, which double-applies an activation)
        cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits_v2(
            logits=logits, labels=labels), name='xent')
    return cost

def build_sub_cost(cache, labels, scope='sub_costs'):
    ''' auxiliary classifier attached to an intermediate activation '''
    num_classes = labels.get_shape().as_list()[-1]
    with tf.variable_scope(scope):
        x = cache
        x = tf.layers.AveragePooling2D([5, 5], 3, name='avg_pool')(x)
        x = tf.layers.Conv2D(128, [1, 1], 1, activation=relu, name='conv')(x)
        _ksize = x.get_shape().as_list()[1]
        x = tf.layers.Conv2D(1024, [_ksize, _ksize], 1, activation=relu, name='fc')(x)
        x = tf.layers.Dropout(0.7)(x, training=True)  # sub costs are only built in train mode
        logits = tf.layers.Conv2D(num_classes, [1, 1], 1, name='fc-linear')(x)
        cost = build_cost(tf.layers.flatten(logits), labels)
    return cost

def build_train_op(cost, lrn_rate=0.001, scope='train'):
    with tf.variable_scope(scope):
        train_op = tf.train.AdamOptimizer(lrn_rate).minimize(cost)
    return train_op

if __name__ == '__main__':
    mode = 'train'
    with tf.variable_scope('inputs'):
        images = tf.placeholder(tf.float32, [None, 224, 224, 3])
        labels = tf.placeholder(tf.float32, [None, 1000])
    logits, caches = inference(inputs=images, num_classes=1000)
    logits = tf.layers.flatten(logits)
    print('inference is ok!')
    if mode == 'train':
        with tf.variable_scope('costs'):
            # total loss = main loss + 0.3 * each auxiliary loss
            cost = tf.add_n([build_cost(logits, labels),
                             build_sub_cost(caches[0], labels, scope='sub_cost1') * 0.3,
                             build_sub_cost(caches[1], labels, scope='sub_cost2') * 0.3])
    else:
        cost = build_cost(logits, labels)
    print('build_cost is ok!')
    train_op = build_train_op(cost, lrn_rate=0.001)
    print('build_train_op is ok!')
    sess = tf.Session()
    tf.summary.FileWriter('./', sess.graph)
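To check that the graph actually runs, one could append a minimal smoke test under the __main__ block; the snippet below is a hypothetical illustration (random data, a made-up batch size of 8), not part of the original post:

import numpy as np

# One training step on random data (batch size 8 is an assumption).
batch_images = np.random.rand(8, 224, 224, 3).astype('float32')
batch_labels = np.eye(1000, dtype='float32')[np.random.randint(0, 1000, size=8)]

sess.run(tf.global_variables_initializer())
loss, _ = sess.run([cost, train_op],
                   feed_dict={images: batch_images, labels: batch_labels})
print('one training step done, loss = %.4f' % loss)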
Note: if you use the code from this blog, please add a citation.
[3]: We Need to Go Deeper: "The world of We Need to Go Deeper was heavily inspired by the works of Jules Verne, with 20,000 Leagues Under the Sea in particular being a heavy influence on our game's universe. What would it be like to be a crew member aboard the Nautilus? To be exploring the deep, facing the wrath of a giant squid one moment, and uncovering the lost city of Atlantis the next?"