6.1 Image Recognition Problems and Classic Datasets
The CIFAR dataset is a highly influential image classification dataset. It comes in two variants, CIFAR-10 and CIFAR-100, both subsets of the 8 million images in the Visual Dictionary project. CIFAR images are 32×32 color images.
CIFAR-10 contains 60,000 images drawn from 10 different classes. Every image has a fixed size and contains exactly one kind of object. Compared with MNIST, the biggest differences are that the images are color rather than grayscale and that classification is harder.
Whether MNIST or CIFAR, both datasets fall short of real-world image recognition in two major ways. First, real-world images have far higher resolution than 32×32, and the resolution is not fixed. Second, the real world has a great many object categories; neither 10 nor 100 classes comes close, and a real photo rarely contains just one kind of object.
ImageNet largely addresses both problems and is much closer to real-world image recognition.
ImageNet is a large image database organized according to WordNet. Nearly 15 million images are linked to roughly 20,000 WordNet noun synsets. Each WordNet synset used in ImageNet represents a real-world entity and can be treated as one class in a classification problem. A single image may contain the entities of several synsets.
The ILSVRC2012 image classification dataset is a subset of ImageNet with 1.2 million images from 1,000 classes, each image belonging to exactly one class. The images were crawled straight from the web, so their file sizes range from a few kilobytes to several megabytes.
Top-N accuracy is the probability that the correct answer appears among the first N answers the recognition algorithm returns. In image classification, many papers compare methods by the accuracy of the top N answers, where N is usually 3 or 5.
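As a concrete illustration, here is a minimal NumPy sketch of top-N accuracy; the function name and the toy data are hypothetical, not from the book.

import numpy as np

def top_n_accuracy(logits, labels, n=5):
    # argsort ascending, keep the last n columns: the indices of the n highest-scoring classes
    top_n = np.argsort(logits, axis=1)[:, -n:]
    hits = [labels[i] in top_n[i] for i in range(len(labels))]
    return np.mean(hits)

logits = np.array([[0.1, 0.5, 0.2, 0.2],
                   [0.7, 0.1, 0.1, 0.1]])
labels = np.array([2, 0])
print(top_n_accuracy(logits, labels, n=3))  # 1.0: both true labels fall in the top 3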
6.2 Introduction to Convolutional Neural Networks
In a fully connected network, every node in one layer is connected to every node in the adjacent layer, so the nodes of each fully connected layer are usually drawn as a column, which makes the connection structure easy to display. In a convolutional network, only some of the nodes in adjacent layers are connected, and the nodes of each convolutional layer are usually organized as a three-dimensional matrix. Although the two look very different, their overall architectures are very similar, and their inputs, outputs, and training procedure are essentially the same. The only difference lies in how adjacent layers are connected.
The biggest problem with using a fully connected network on images is the sheer number of parameters in the fully connected layers; besides slowing down computation, so many parameters easily lead to overfitting. The purpose of a convolutional network is to reduce the number of parameters.
In the first few layers of a convolutional network, each node is connected to only some of the nodes in the previous layer.
The five kinds of layers in a convolutional network:
1. Input layer: the pixel matrix of an image, height × width × depth (color channels).
2. Convolutional layer: each node's input is a small patch of the previous layer, typically 3×3 or 5×5. The convolutional layer analyzes each small patch in more depth to extract more abstract features. In general, the depth of the node matrix increases after a convolutional layer.
3. Pooling layer: does not change the depth of the three-dimensional matrix but shrinks its height and width. Pooling can be thought of as converting a high-resolution image into a lower-resolution one. It further reduces the number of nodes feeding the final fully connected layers and hence the number of parameters in the whole network.
4. Fully connected layer: after several rounds of convolution and pooling, the information in the image can be considered abstracted into higher-level features. Once the convolutional and pooling layers have performed automatic feature extraction, fully connected layers are still needed to carry out the classification.
5. Softmax layer: used for classification; it yields the probability distribution of the current example over the classes.
6.3 Common Structures in Convolutional Networks
The most important component of a convolutional layer is the filter (or kernel). A filter converts a sub-matrix of nodes in the current layer into a unit node matrix in the next layer, that is, a matrix with height and width 1 but unrestricted depth.
The height and width of the node matrix a filter processes are set manually; these are called the filter size, commonly 3×3 or 5×5. Because the depth a filter processes always equals the depth of the current layer's node matrix, the filter size needs only two dimensions even though the node matrix is three-dimensional.
The other manually specified setting is the depth of the resulting unit node matrix, called the depth of the filter.
(Locally) A filter's forward pass computes the nodes of the unit node matrix on the right from the nodes of the small matrix on the left. As in a fully connected layer, the computation uses weights and a bias term. See Figure 6-8.
(Globally) A convolutional layer's forward pass slides one filter from the top-left corner of the current layer to the bottom-right corner, computing the corresponding unit matrix at every position. See Figure 6-10.
Each move of the filter produces one value (k values when the filter depth is k); stitching these values into a new matrix completes the convolutional layer's forward pass.
When the filter is larger than 1×1, the matrix produced by the forward pass is smaller than the current layer's matrix.
To avoid this change in size, zero padding can be added around the border of the current layer's matrix. See Figure 6-11.
Besides zero padding, the stride of the filter can also be used to adjust the size of the result matrix. Figure 6-12 shows the forward pass of a convolutional layer with stride 2 and zero padding.
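To make the moving-filter computation concrete, here is a minimal NumPy sketch of a single-channel, single-filter forward pass without padding and with stride 1; all names and data are hypothetical.

import numpy as np

def conv2d_valid(x, w, b):
    # x: input matrix (H, W); w: filter (k, k); b: scalar bias shared across positions
    k = w.shape[0]
    out_h, out_w = x.shape[0] - k + 1, x.shape[1] - k + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # weighted sum of the k-by-k patch plus the shared bias
            out[i, j] = np.sum(x[i:i+k, j:j+k] * w) + b
    return out

x = np.arange(16, dtype=float).reshape(4, 4)
w = np.ones((3, 3))
print(conv2d_valid(x, w, 0.0).shape)  # (2, 2): 4 - 3 + 1 = 2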
Determining the size of the output layer:
With zero padding, rounding up:
out_length = ceil(in_length / stride_length)
out_width = ceil(in_width / stride_width)
Without zero padding, rounding up:
out_length = ceil((in_length - filter_length + 1) / stride_length)
out_width = ceil((in_width - filter_width + 1) / stride_width)
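A small sketch under these formulas, with math.ceil implementing the rounding up; the function name is hypothetical.

import math

def conv_output_size(in_size, filter_size, stride, zero_padding):
    if zero_padding:
        return math.ceil(in_size / stride)
    return math.ceil((in_size - filter_size + 1) / stride)

print(conv_output_size(32, 5, 1, True))   # 32
print(conv_output_size(32, 5, 1, False))  # 28
print(conv_output_size(32, 5, 2, True))   # 16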
Convolutional networks have a very important property: the filter parameters are shared within each convolutional layer, which makes what the network detects independent of where it appears in the image. Taking MNIST handwritten digit recognition as an example, whether the digit "1" appears in the top-left or the bottom-right corner, the class of the image is the same. Sharing each layer's filter parameters also slashes the number of parameters in the network. Take CIFAR-10: the input matrix is 32×32×3; if a convolutional layer uses 5×5 filters of depth 16, it has 5*5*3*16+16 = 1216 parameters (picture it as a fully connected layer from a 5×5×3 input to a 16×1 output). Moreover, the parameter count depends only on the filter size, the filter depth, and the depth of the current layer's node matrix, not on the image size, which is what lets convolutional networks scale to larger images.
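The arithmetic in this paragraph can be checked directly; a hypothetical helper:

def conv_params(filter_h, filter_w, in_depth, out_depth):
    # weights plus one bias per slice of the output depth
    return filter_h * filter_w * in_depth * out_depth + out_depth

print(conv_params(5, 5, 3, 16))  # 1216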
Implementing the forward pass of a convolutional layer with TensorFlow:
import tensorflow as tf

x = tf.placeholder(tf.float32, shape=[None, 32, 32, 3], name='x-input')

# shape: filter size (height, width), current-layer depth, filter depth
filter_weight = tf.get_variable(
    'weights', shape=[5, 5, 3, 16], initializer=tf.truncated_normal_initializer(stddev=0.1)
)
biases = tf.get_variable(
    'biases', shape=[16], initializer=tf.constant_initializer(0.1)  # shape equals the filter depth
)

# 1st argument: the current layer's node matrix, a four-dimensional tensor; the first dimension
#   indexes the input batch, the remaining three are one node matrix (height * width * depth)
# 2nd argument: the convolutional layer's weights, i.e. the filter
# 3rd argument: the stride along each dimension, a length-4 array whose first and fourth entries
#   must be 1, because the stride applies only to the height and width of the matrix
# 4th argument: padding, either 'SAME' or 'VALID'
conv = tf.nn.conv2d(
    x, filter_weight, strides=[1, 1, 1, 1], padding='SAME'
)
# print(conv.shape)  # (?, 32, 32, 16): the depth becomes 16; by the formulas above the size is 32
#                    # with zero padding and 28 without

# Plain addition cannot be used, because the same bias value must be added at every position of the matrix.
# In Figure 6-13, for example, the next layer is 2x2 but there is only one bias value (the depth is 1),
# and every entry of the 2x2 matrix has it added.
bias = tf.nn.bias_add(conv, biases)
actived_conv = tf.nn.relu(bias)

# Keep the four dimensions straight: the input's, the weights', and the strides' dimensions all mean different things.
6.3.2 Pooling Layers
Pooling layers mainly shrink the size of the matrix, which reduces the number of parameters in the final fully connected layers. Pooling both speeds up computation and helps prevent overfitting.
The forward pass of a pooling layer also slides a filter-like structure, but the computation inside the filter is not a weighted sum; it is a simpler maximum or average. A pooling layer that takes the maximum is a max pooling layer; one that takes the average is an average pooling layer.
Like a convolutional filter, a pooling filter needs a manually chosen size, padding scheme, and stride. The filters of the two layer types move in similar ways; the only difference is that a convolutional filter spans the full depth of the node matrix, while a pooling filter affects nodes at only one depth. A pooling filter therefore moves not only along the height and width but also along the depth.
Convolutional layer: with an input depth of 3, the three channels are summed together.
Pooling layer: with a depth of 2, each channel is processed separately, as in the sketch below.
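A minimal NumPy sketch of 2×2 max pooling with stride 2, pooling each channel independently; all names are hypothetical.

import numpy as np

def max_pool_2x2(x):
    # x: (H, W, C) with H and W even; each channel is pooled on its own
    h, w, c = x.shape
    out = np.zeros((h // 2, w // 2, c))
    for i in range(0, h, 2):
        for j in range(0, w, 2):
            # maximum over the 2x2 window, computed per channel
            out[i // 2, j // 2] = x[i:i+2, j:j+2].max(axis=(0, 1))
    return out

x = np.random.rand(4, 4, 3)
print(max_pool_2x2(x).shape)  # (2, 2, 3): the depth is unchanged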
Implementing the forward pass of a pooling layer with TensorFlow:
# 1st argument: the current layer's node matrix (four-dimensional)
# 2nd argument: the filter size, a length-4 array whose first and fourth entries must be 1, meaning the
#   filter cannot span different input examples or the full node-matrix depth (unlike a convolutional
#   filter); [1, 2, 2, 1] and [1, 3, 3, 1] are used most often
# 3rd argument: the strides, a length-4 array whose first and fourth entries must be 1, meaning a pooling
#   layer cannot reduce the node-matrix depth or the number of input examples
# 4th argument: padding
pool = tf.nn.max_pool(actived_conv, ksize=[1, 3, 3, 1], strides=[1, 2, 2, 1], padding='SAME')
The biggest difference between a convolutional layer and a pooling layer lies in the filter: weights of shape [5, 5, 3, 16] versus a ksize of [1, 3, 3, 1].
6.4 Classic Convolutional Network Models
6.4.1 The LeNet-5 Model
LeNet-5 was the first convolutional network successfully applied to digit recognition.
The LeNet-5 model takes as its input layer a three-dimensional matrix (height × width × depth).
The number of parameters is far smaller than the number of connections. When counting a convolutional layer's connections, the +1 in the formula is the bias term, which takes part in every output computation alongside the weights.
Only the weights of the fully connected layers need regularization.
ReLU and dropout are not used on the last layer.
# mnist_inference.py

import tensorflow as tf

IMAGE_SIZE = 28
NUM_CHANNELS = 1  # grayscale
NUM_LABELS = 10

# size and depth of the first convolutional layer
CONV1_SIZE = 5
CONV1_DEEP = 32
# size and depth of the second convolutional layer
CONV2_SIZE = 5
CONV2_DEEP = 64
# number of nodes in the fully connected layer
FC_SIZE = 512

def get_weight_variable(shape, regularizer):
    weights = tf.get_variable('weight', shape, initializer=tf.truncated_normal_initializer(stddev=0.1))
    if regularizer:
        tf.add_to_collection('losses', regularizer(weights))
    return weights

def inference(input_tensor, train, regularizer):
    with tf.variable_scope('layer1-conv1'):
        # input 28x28x1, filter 5x5, depth 32, stride 1, output 28x28x32
        conv1_weights = get_weight_variable([CONV1_SIZE, CONV1_SIZE, NUM_CHANNELS, CONV1_DEEP], None)
        conv1_biases = tf.get_variable('bias', [CONV1_DEEP], initializer=tf.constant_initializer(0.0))
        conv1 = tf.nn.conv2d(input_tensor, conv1_weights, strides=[1, 1, 1, 1], padding='SAME')
        relu1 = tf.nn.relu(tf.nn.bias_add(conv1, conv1_biases))

    with tf.name_scope('layer2-pool1'):
        # output 14x14x32
        pool1 = tf.nn.max_pool(relu1, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')

    with tf.variable_scope('layer3-conv2'):
        # filter 5x5, depth 64, output 14x14x64
        conv2_weights = get_weight_variable([CONV2_SIZE, CONV2_SIZE, CONV1_DEEP, CONV2_DEEP], None)
        conv2_biases = tf.get_variable('bias', [CONV2_DEEP], initializer=tf.constant_initializer(0.0))
        conv2 = tf.nn.conv2d(pool1, conv2_weights, strides=[1, 1, 1, 1], padding='SAME')
        relu2 = tf.nn.relu(tf.nn.bias_add(conv2, conv2_biases))

    with tf.name_scope('layer4-pool2'):
        # output 7x7x64
        pool2 = tf.nn.max_pool(relu2, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')

    # A fully connected layer takes a feature vector as input, so the three-dimensional matrix
    # must be flattened into a one-dimensional vector.
    pool_shape = pool2.get_shape().as_list()  # includes the batch dimension
    nodes = pool_shape[1] * pool_shape[2] * pool_shape[3]  # 3136
    reshaped = tf.reshape(pool2, [pool_shape[0], nodes])

    # Dropout randomly zeroes the output of some nodes during training. It further improves model
    # robustness and prevents overfitting, and it is applied only during training.
    with tf.variable_scope('layer5-fc1'):
        # only the fully connected weights are regularized
        fc1_weights = get_weight_variable([nodes, FC_SIZE], regularizer)
        fc1_biases = tf.get_variable('bias', shape=[FC_SIZE], initializer=tf.constant_initializer(0.1))
        fc1 = tf.nn.relu(tf.matmul(reshaped, fc1_weights) + fc1_biases)
        if train:
            fc1 = tf.nn.dropout(fc1, 0.5)

    with tf.variable_scope('layer6-fc2'):
        fc2_weights = get_weight_variable([FC_SIZE, NUM_LABELS], regularizer)
        fc2_biases = tf.get_variable('bias', shape=[NUM_LABELS], initializer=tf.constant_initializer(0.1))
        logit = tf.matmul(fc1, fc2_weights) + fc2_biases

    # ReLU and dropout are not applied to the last layer;
    # sparse_softmax_cross_entropy_with_logits computes the cross entropy later.
    return logit


# mnist_train.py

#!coding:utf8
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data
from tensorflow.contrib.layers import l2_regularizer
import mnist_inference
import os
import numpy as np

BATCH_SIZE = 100

LEARNING_RATE_BASE = 0.01  # 0.8, carried over from the fully connected model, makes this network diverge
LEARNING_RATE_DECAY = 0.99
REGULARIZATION_RATE = 0.0001  # lambda
TRAINING_STEPS = 30000
MOVING_AVERAGE_DACAY = 0.99

MODEL_SAVE_PATH = '/home/yangxl/files/save_model'
MODEL_NAME = 'conv2d.ckpt'


def train(mnist):
    # shape[0] cannot be None because of the reshape between the pooling layer and the fully connected layer
    x = tf.placeholder(tf.float32, [BATCH_SIZE, mnist_inference.IMAGE_SIZE, mnist_inference.IMAGE_SIZE, mnist_inference.NUM_CHANNELS], 'x-input')
    y_ = tf.placeholder(tf.float32, [BATCH_SIZE, mnist_inference.NUM_LABELS], 'y-input')

    # regularization
    regularizer = l2_regularizer(REGULARIZATION_RATE)

    y = mnist_inference.inference(x, True, regularizer)

    global_step = tf.Variable(0, trainable=False)

    # moving average
    variables_averages = tf.train.ExponentialMovingAverage(MOVING_AVERAGE_DACAY, global_step)
    variables_averages_op = variables_averages.apply(tf.trainable_variables())
    # mutually exclusive classes;
    # the ground truth is a length-10 one-hot vector, while this function expects the index of the
    # correct class, so tf.argmax extracts that index.
    cross_entropy = tf.nn.sparse_softmax_cross_entropy_with_logits(logits=y, labels=tf.argmax(y_, 1))
    cross_entropy_mean = tf.reduce_mean(cross_entropy)
    loss = cross_entropy_mean + tf.add_n(tf.get_collection('losses'))

    learning_rate = tf.train.exponential_decay(LEARNING_RATE_BASE, global_step, mnist.train.num_examples / BATCH_SIZE,
                                               LEARNING_RATE_DECAY, staircase=True)
    train_step = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss, global_step)
    with tf.control_dependencies([train_step, variables_averages_op]):
        train_op = tf.no_op(name='train')

    saver = tf.train.Saver()

    with tf.Session() as sess:
        tf.global_variables_initializer().run()

        for i in range(TRAINING_STEPS):
            xs, ys = mnist.train.next_batch(BATCH_SIZE)  # xs.shape=(100, 784)
            reshaped_xs = np.reshape(xs, [BATCH_SIZE, mnist_inference.IMAGE_SIZE, mnist_inference.IMAGE_SIZE, mnist_inference.NUM_CHANNELS])
            _, loss_value, step = sess.run([train_op, loss, global_step], feed_dict={x: reshaped_xs, y_: ys})

            if i % 1000 == 0:
                print('after %d training steps, loss on training batch is %g ' % (i, loss_value))
                saver.save(sess, os.path.join(MODEL_SAVE_PATH, MODEL_NAME), global_step=global_step)


def main(argv=None):
    mnist = input_data.read_data_sets('/home/yangxl/files/mnist', one_hot=True)
    train(mnist)


if __name__ == '__main__':
    tf.app.run()


# mnist_eval.py

#!coding:utf8
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data
import mnist_inference
import mnist_train
import time
import numpy as np

# load the latest model every EVAL_INTERVAL_SECS seconds and measure its accuracy on the test data
EVAL_INTERVAL_SECS = 60

def evaluate(mnist):
    x = tf.placeholder(tf.float32, [mnist.test.num_examples, mnist_inference.IMAGE_SIZE, mnist_inference.IMAGE_SIZE, mnist_inference.NUM_CHANNELS], 'x-input')
    y_ = tf.placeholder(tf.float32, [mnist.test.num_examples, mnist_inference.NUM_LABELS], 'y-input')

    y = mnist_inference.inference(x, False, None)

    correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

    # moving average
    variables_averages = tf.train.ExponentialMovingAverage(mnist_train.MOVING_AVERAGE_DACAY)
    variables_to_restore = variables_averages.variables_to_restore()

    saver = tf.train.Saver(variables_to_restore)  # the shadow variables must be saved during training to be restorable here

    while True:
        with tf.Session() as sess:
            reshape_x = np.reshape(mnist.test.images, [-1, 28, 28, 1])
            validate_feed = {x: reshape_x, y_: mnist.test.labels}

            # locate the latest model file in the directory via the checkpoint file
            ckpt = tf.train.get_checkpoint_state(mnist_train.MODEL_SAVE_PATH)
            if ckpt and ckpt.model_checkpoint_path:
                saver.restore(sess, ckpt.model_checkpoint_path)

                global_step = ckpt.model_checkpoint_path.split('/')[-1].split('-')[-1]
                accuracy_score = sess.run(accuracy, feed_dict=validate_feed)
                print('after %s training steps, validation accuracy = %g ' % (global_step, accuracy_score))
            else:
                print('No checkpoint file found')
                return
        time.sleep(EVAL_INTERVAL_SECS)


def main(argv=None):
    mnist = input_data.read_data_sets('/home/yangxl/files/mnist', one_hot=True)
    evaluate(mnist)


if __name__ == '__main__':
    tf.app.run()
As originally published (with LEARNING_RATE_BASE = 0.8), the loss never settled and the accuracy hovered around 0.117, when it should be about 99.4%. The base learning rate of 0.8, carried over from the fully connected model, is almost certainly the cause; it is far too large for this convolutional network, and lowering it to about 0.01, as in the code above, lets training converge.
No single convolutional network architecture can solve every problem. LeNet-5, for example, cannot cope well with a large image dataset such as ImageNet.
The following regular-expression-style formula summarizes some classic convolutional architectures for image classification: input layer --> (convolutional layer+ --> pooling layer?)+ --> fully connected layer+
Most convolutional networks use at most three convolutional layers in a row.
After several rounds of convolution and pooling, a convolutional network usually passes through one or two fully connected layers before the output.
As for filter depth, most convolutional networks increase it layer by layer. A sketch following these guidelines appears below.
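Here is a sketch of a small network that follows the pattern above, written with slim for brevity; the layer sizes are illustrative assumptions, not taken from the book.

import tensorflow as tf
import tensorflow.contrib.slim as slim

def simple_convnet(images, num_classes=10):
    # input --> (conv+ --> pool?)+ --> fully connected+
    net = slim.conv2d(images, 32, [3, 3], scope='conv1')
    net = slim.max_pool2d(net, [2, 2], scope='pool1')
    net = slim.conv2d(net, 64, [3, 3], scope='conv2a')
    net = slim.conv2d(net, 64, [3, 3], scope='conv2b')  # two convolutional layers in a row
    net = slim.max_pool2d(net, [2, 2], scope='pool2')
    net = slim.flatten(net)
    net = slim.fully_connected(net, 512, scope='fc1')
    return slim.fully_connected(net, num_classes, activation_fn=None, scope='logits')

images = tf.placeholder(tf.float32, [None, 32, 32, 3], name='images')
logits = simple_convnet(images)  # the filter depth grows layer by layer: 32 -> 64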
6.4.2 The Inception-v3 Model
In LeNet-5, different convolutional layers are connected in series; in Inception-v3, the Inception structure combines different convolutional layers in parallel.
Section 6.4.1 noted that a convolutional layer may use filters with sides of 1, 3, or 5; how should one choose among them? The Inception module's answer is to use all the different sizes at once and then concatenate the resulting matrices.
Although the filter sizes differ, if every filter uses zero padding with stride 1, the height and width of each result matrix equal those of the input matrix, so the matrices produced by the different filters can be concatenated into a single deeper matrix, as the sketch below illustrates.
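A minimal sketch of the idea: three filter sizes applied in parallel with zero padding and stride 1, then concatenated along the depth axis; the branch depths are arbitrary assumptions.

import tensorflow as tf
import tensorflow.contrib.slim as slim

def mini_inception_module(net):
    with slim.arg_scope([slim.conv2d], stride=1, padding='SAME'):
        branch_1x1 = slim.conv2d(net, 64, [1, 1], scope='b1x1')
        branch_3x3 = slim.conv2d(net, 96, [3, 3], scope='b3x3')
        branch_5x5 = slim.conv2d(net, 32, [5, 5], scope='b5x5')
    # every branch keeps the input height and width, so a depth-axis concat is valid
    return tf.concat([branch_1x1, branch_3x3, branch_5x5], axis=3)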
The Inception-v3 model has 46 layers in total (the layer counts in the figure's boxes) and is built from 11 Inception modules (the boxes themselves). It contains 96 convolutional layers.
The Inception-v3 model code, using the slim library:
import tensorflow as tf
import tensorflow.contrib.slim as slim

trunc_normal = lambda stddev: tf.truncated_normal_initializer(0.0, stddev)

def inception_v3_base(inputs,
                      final_endpoint='Mixed_7c',
                      min_depth=16,
                      depth_multiplier=1.0,
                      scope=None):
    end_points = {}

    if depth_multiplier <= 0:
        raise ValueError('depth_multiplier is not greater than zero.')
    depth = lambda d: max(int(d * depth_multiplier), min_depth)

    with tf.variable_scope(scope, 'InceptionV3', [inputs]):
        # arg_scope sets the default values of arguments
        with slim.arg_scope([slim.conv2d, slim.max_pool2d, slim.avg_pool2d],
                            stride=1,
                            padding='VALID'):
            # 299 x 299 x 3
            end_point = 'Conv2d_1a_3x3'  # scope names use letters, digits and underscores; 'x' stands for the multiplication sign
            # no zero padding
            net = slim.conv2d(inputs, depth(32), [3, 3], stride=2, scope=end_point)
            end_points[end_point] = net
            if end_point == final_endpoint:
                return net, end_points
            # 149 x 149 x 32
            end_point = 'Conv2d_2a_3x3'
            # no zero padding, stride 1
            net = slim.conv2d(net, depth(32), [3, 3], scope=end_point)
            end_points[end_point] = net
            if end_point == final_endpoint:
                return net, end_points
            # 147 x 147 x 32
            end_point = 'Conv2d_2b_3x3'
            net = slim.conv2d(net, depth(64), [3, 3], padding='SAME', scope=end_point)
            end_points[end_point] = net
            if end_point == final_endpoint:
                return net, end_points
            # 147 x 147 x 64
            end_point = 'MaxPool_3a_3x3'
            net = slim.max_pool2d(net, [3, 3], stride=2, scope=end_point)
            end_points[end_point] = net
            if end_point == final_endpoint:
                return net, end_points
            # 73 x 73 x 64
            end_point = 'Conv2d_3b_1x1'
            net = slim.conv2d(net, depth(80), [1, 1], scope=end_point)
            end_points[end_point] = net
            if end_point == final_endpoint:
                return net, end_points
            # 73 x 73 x 80
            end_point = 'Conv2d_4a_3x3'
            net = slim.conv2d(net, depth(192), [3, 3], scope=end_point)
            end_points[end_point] = net
            if end_point == final_endpoint:
                return net, end_points
            # 71 x 71 x 192
            end_point = 'MaxPool_5a_3x3'
            net = slim.max_pool2d(net, [3, 3], stride=2, scope=end_point)
            end_points[end_point] = net
            if end_point == final_endpoint:
                return net, end_points
            # 35 x 35 x 192

        # Inception blocks
        with slim.arg_scope([slim.conv2d, slim.max_pool2d, slim.avg_pool2d],
                            stride=1,
                            padding='SAME'):
            # mixed: 35 x 35 x 256
            end_point = 'Mixed_5b'
            with tf.variable_scope(end_point):
                with tf.variable_scope('Branch_0'):
                    branch_0 = slim.conv2d(net, depth(64), [1, 1], scope='Conv2d_0a_1x1')
                with tf.variable_scope('Branch_1'):
                    branch_1 = slim.conv2d(net, depth(48), [1, 1], scope='Conv2d_0a_1x1')
                    branch_1 = slim.conv2d(branch_1, depth(64), [5, 5], scope='Conv2d_0b_5x5')
                with tf.variable_scope('Branch_2'):
                    branch_2 = slim.conv2d(net, depth(64), [1, 1], scope='Conv2d_0a_1x1')
                    branch_2 = slim.conv2d(branch_2, depth(96), [3, 3], scope='Conv2d_0b_3x3')
                    branch_2 = slim.conv2d(branch_2, depth(96), [3, 3], scope='Conv2d_0c_3x3')
                with tf.variable_scope('Branch_3'):
                    branch_3 = slim.avg_pool2d(net, [3, 3], scope='AvgPool_0a_3x3')
                    branch_3 = slim.conv2d(branch_3, depth(32), [1, 1], scope='Conv2d_0b_1x1')
                net = tf.concat([branch_0, branch_1, branch_2, branch_3], 3)
            end_points[end_point] = net
            if end_point == final_endpoint:
                return net, end_points

            # mixed_1: 35 x 35 x 288
            end_point = 'Mixed_5c'
            with tf.variable_scope(end_point):
                with tf.variable_scope('Branch_0'):
                    branch_0 = slim.conv2d(net, depth(64), [1, 1], scope='Conv2d_0a_1x1')
                with tf.variable_scope('Branch_1'):
                    branch_1 = slim.conv2d(net, depth(48), [1, 1], scope='Conv2d_0b_1x1')
                    branch_1 = slim.conv2d(branch_1, depth(64), [5, 5], scope='Conv_1_0c_5x5')
                with tf.variable_scope('Branch_2'):
                    branch_2 = slim.conv2d(net, depth(64), [1, 1], scope='Conv2d_0a_1x1')
                    branch_2 = slim.conv2d(branch_2, depth(96), [3, 3], scope='Conv2d_0b_3x3')
                    branch_2 = slim.conv2d(branch_2, depth(96), [3, 3], scope='Conv2d_0c_3x3')
                with tf.variable_scope('Branch_3'):
                    branch_3 = slim.avg_pool2d(net, [3, 3], scope='AvgPool_0a_3x3')
                    branch_3 = slim.conv2d(branch_3, depth(64), [1, 1], scope='Conv2d_0b_1x1')
                net = tf.concat([branch_0, branch_1, branch_2, branch_3], 3)
            end_points[end_point] = net
            if end_point == final_endpoint:
                return net, end_points

            # mixed_2: 35 x 35 x 288
            end_point = 'Mixed_5d'
            with tf.variable_scope(end_point):
                with tf.variable_scope('Branch_0'):
                    branch_0 = slim.conv2d(net, depth(64), [1, 1], scope='Conv2d_0a_1x1')
                with tf.variable_scope('Branch_1'):
                    branch_1 = slim.conv2d(net, depth(48), [1, 1], scope='Conv2d_0a_1x1')
                    branch_1 = slim.conv2d(branch_1, depth(64), [5, 5], scope='Conv2d_0b_5x5')
                with tf.variable_scope('Branch_2'):
                    branch_2 = slim.conv2d(net, depth(64), [1, 1], scope='Conv2d_0a_1x1')
                    branch_2 = slim.conv2d(branch_2, depth(96), [3, 3], scope='Conv2d_0b_3x3')
                    branch_2 = slim.conv2d(branch_2, depth(96), [3, 3], scope='Conv2d_0c_3x3')
                with tf.variable_scope('Branch_3'):
                    branch_3 = slim.avg_pool2d(net, [3, 3], scope='AvgPool_0a_3x3')
                    branch_3 = slim.conv2d(branch_3, depth(64), [1, 1], scope='Conv2d_0b_1x1')
                net = tf.concat([branch_0, branch_1, branch_2, branch_3], 3)
            end_points[end_point] = net
            if end_point == final_endpoint:
                return net, end_points

            # mixed_3: 17 x 17 x 768
            end_point = 'Mixed_6a'
            with tf.variable_scope(end_point):
                with tf.variable_scope('Branch_0'):
                    branch_0 = slim.conv2d(net, depth(384), [3, 3], stride=2, padding='VALID', scope='Conv2d_1a_1x1')
                with tf.variable_scope('Branch_1'):
                    branch_1 = slim.conv2d(net, depth(64), [1, 1], scope='Conv2d_0a_1x1')
                    branch_1 = slim.conv2d(branch_1, depth(96), [3, 3], scope='Conv2d_0b_3x3')
                    branch_1 = slim.conv2d(branch_1, depth(96), [3, 3], stride=2, padding='VALID', scope='Conv2d_1a_1x1')
                with tf.variable_scope('Branch_2'):
                    branch_2 = slim.max_pool2d(net, [3, 3], stride=2, padding='VALID', scope='MaxPool_1a_3x3')
                net = tf.concat([branch_0, branch_1, branch_2], 3)
            end_points[end_point] = net
            if end_point == final_endpoint:
                return net, end_points

            # mixed_4: 17 x 17 x 768
            end_point = 'Mixed_6b'
            with tf.variable_scope(end_point):
                with tf.variable_scope('Branch_0'):
                    branch_0 = slim.conv2d(net, depth(192), [1, 1], scope='Conv2d_0a_1x1')
                with tf.variable_scope('Branch_1'):
                    branch_1 = slim.conv2d(net, depth(128), [1, 1], scope='Conv2d_0a_1x1')
                    branch_1 = slim.conv2d(branch_1, depth(128), [1, 7], scope='Conv2d_0b_1x7')  # the output size is unchanged even though the filter height and width differ
                    branch_1 = slim.conv2d(branch_1, depth(192), [7, 1], scope='Conv2d_0c_7x1')
                with tf.variable_scope('Branch_2'):
                    branch_2 = slim.conv2d(net, depth(128), [1, 1], scope='Conv2d_0a_1x1')
                    branch_2 = slim.conv2d(branch_2, depth(128), [7, 1], scope='Conv2d_0b_7x1')
                    branch_2 = slim.conv2d(branch_2, depth(128), [1, 7], scope='Conv2d_0c_1x7')
                    branch_2 = slim.conv2d(branch_2, depth(128), [7, 1], scope='Conv2d_0d_7x1')
                    branch_2 = slim.conv2d(branch_2, depth(192), [1, 7], scope='Conv2d_0e_1x7')
                with tf.variable_scope('Branch_3'):
                    branch_3 = slim.avg_pool2d(net, [3, 3], scope='AvgPool_0a_3x3')
                    branch_3 = slim.conv2d(branch_3, depth(192), [1, 1], scope='Conv2d_0b_1x1')
                net = tf.concat([branch_0, branch_1, branch_2, branch_3], 3)
            end_points[end_point] = net
            if end_point == final_endpoint:
                return net, end_points

            # mixed_5: 17 x 17 x 768
            end_point = 'Mixed_6c'
            with tf.variable_scope(end_point):
                with tf.variable_scope('Branch_0'):
                    branch_0 = slim.conv2d(net, depth(192), [1, 1], scope='Conv2d_0a_1x1')
                with tf.variable_scope('Branch_1'):
                    branch_1 = slim.conv2d(net, depth(160), [1, 1], scope='Conv2d_0a_1x1')
                    branch_1 = slim.conv2d(branch_1, depth(160), [1, 7], scope='Conv2d_0b_1x7')
                    branch_1 = slim.conv2d(branch_1, depth(192), [7, 1], scope='Conv2d_0c_7x1')
                with tf.variable_scope('Branch_2'):
                    branch_2 = slim.conv2d(net, depth(160), [1, 1], scope='Conv2d_0a_1x1')
                    branch_2 = slim.conv2d(branch_2, depth(160), [7, 1], scope='Conv2d_0b_7x1')
                    branch_2 = slim.conv2d(branch_2, depth(160), [1, 7], scope='Conv2d_0c_1x7')
                    branch_2 = slim.conv2d(branch_2, depth(160), [7, 1], scope='Conv2d_0d_7x1')
                    branch_2 = slim.conv2d(branch_2, depth(192), [1, 7], scope='Conv2d_0e_1x7')
                with tf.variable_scope('Branch_3'):
                    branch_3 = slim.avg_pool2d(net, [3, 3], scope='AvgPool_0a_3x3')
                    branch_3 = slim.conv2d(branch_3, depth(192), [1, 1], scope='Conv2d_0b_1x1')
                net = tf.concat([branch_0, branch_1, branch_2, branch_3], 3)
            end_points[end_point] = net
            if end_point == final_endpoint:
                return net, end_points

            # mixed_6: 17 x 17 x 768
            end_point = 'Mixed_6d'
            with tf.variable_scope(end_point):
                with tf.variable_scope('Branch_0'):
                    branch_0 = slim.conv2d(net, depth(192), [1, 1], scope='Conv2d_0a_1x1')
                with tf.variable_scope('Branch_1'):
                    branch_1 = slim.conv2d(net, depth(160), [1, 1], scope='Conv2d_0a_1x1')
                    branch_1 = slim.conv2d(branch_1, depth(160), [1, 7], scope='Conv2d_0b_1x7')
                    branch_1 = slim.conv2d(branch_1, depth(192), [7, 1], scope='Conv2d_0c_7x1')
                with tf.variable_scope('Branch_2'):
                    branch_2 = slim.conv2d(net, depth(160), [1, 1], scope='Conv2d_0a_1x1')
                    branch_2 = slim.conv2d(branch_2, depth(160), [7, 1], scope='Conv2d_0b_7x1')
                    branch_2 = slim.conv2d(branch_2, depth(160), [1, 7], scope='Conv2d_0c_1x7')
                    branch_2 = slim.conv2d(branch_2, depth(160), [7, 1], scope='Conv2d_0d_7x1')
                    branch_2 = slim.conv2d(branch_2, depth(192), [1, 7], scope='Conv2d_0e_1x7')
                with tf.variable_scope('Branch_3'):
                    branch_3 = slim.avg_pool2d(net, [3, 3], scope='AvgPool_0a_3x3')
                    branch_3 = slim.conv2d(branch_3, depth(192), [1, 1], scope='Conv2d_0b_1x1')
                net = tf.concat([branch_0, branch_1, branch_2, branch_3], 3)
            end_points[end_point] = net
            if end_point == final_endpoint:
                return net, end_points

            # mixed_7: 17 x 17 x 768
            end_point = 'Mixed_6e'
            with tf.variable_scope(end_point):
                with tf.variable_scope('Branch_0'):
                    branch_0 = slim.conv2d(net, depth(192), [1, 1], scope='Conv2d_0a_1x1')
                with tf.variable_scope('Branch_1'):
                    branch_1 = slim.conv2d(net, depth(192), [1, 1], scope='Conv2d_0a_1x1')
                    branch_1 = slim.conv2d(branch_1, depth(192), [1, 7], scope='Conv2d_0b_1x7')
                    branch_1 = slim.conv2d(branch_1, depth(192), [7, 1], scope='Conv2d_0c_7x1')
                with tf.variable_scope('Branch_2'):
                    branch_2 = slim.conv2d(net, depth(192), [1, 1], scope='Conv2d_0a_1x1')
                    branch_2 = slim.conv2d(branch_2, depth(192), [7, 1], scope='Conv2d_0b_7x1')
                    branch_2 = slim.conv2d(branch_2, depth(192), [1, 7], scope='Conv2d_0c_1x7')
                    branch_2 = slim.conv2d(branch_2, depth(192), [7, 1], scope='Conv2d_0d_7x1')
                    branch_2 = slim.conv2d(branch_2, depth(192), [1, 7], scope='Conv2d_0e_1x7')
                with tf.variable_scope('Branch_3'):
                    branch_3 = slim.avg_pool2d(net, [3, 3], scope='AvgPool_0a_3x3')
                    branch_3 = slim.conv2d(branch_3, depth(192), [1, 1], scope='Conv2d_0b_1x1')
                net = tf.concat([branch_0, branch_1, branch_2, branch_3], 3)
            end_points[end_point] = net
            if end_point == final_endpoint:
                return net, end_points

            # mixed_8: 8 x 8 x 1280
            end_point = 'Mixed_7a'
            with tf.variable_scope(end_point):
                with tf.variable_scope('Branch_0'):
                    branch_0 = slim.conv2d(net, depth(192), [1, 1], scope='Conv2d_0a_1x1')
                    branch_0 = slim.conv2d(branch_0, depth(320), [3, 3], stride=2, padding='VALID', scope='Conv2d_1a_3x3')
                with tf.variable_scope('Branch_1'):
                    branch_1 = slim.conv2d(net, depth(192), [1, 1], scope='Conv2d_0a_1x1')
                    branch_1 = slim.conv2d(branch_1, depth(192), [1, 7], scope='Conv2d_0b_1x7')
                    branch_1 = slim.conv2d(branch_1, depth(192), [7, 1], scope='Conv2d_0c_7x1')
                    branch_1 = slim.conv2d(branch_1, depth(192), [3, 3], stride=2, padding='VALID', scope='Conv2d_1a_3x3')
                with tf.variable_scope('Branch_2'):
                    branch_2 = slim.max_pool2d(net, [3, 3], stride=2, padding='VALID', scope='MaxPool_1a_3x3')
                net = tf.concat([branch_0, branch_1, branch_2], 3)
            end_points[end_point] = net
            if end_point == final_endpoint:
                return net, end_points

            # mixed_9: 8 x 8 x 2048
            end_point = 'Mixed_7b'
            with tf.variable_scope(end_point):
                with tf.variable_scope('Branch_0'):
                    branch_0 = slim.conv2d(net, depth(320), [1, 1], scope='Conv2d_0a_1x1')
                with tf.variable_scope('Branch_1'):
                    branch_1 = slim.conv2d(net, depth(384), [1, 1], scope='Conv2d_0a_1x1')
                    branch_1 = tf.concat(
                        [
                            slim.conv2d(branch_1, depth(384), [1, 3], scope='Conv2d_0b_1x3'),
                            slim.conv2d(branch_1, depth(384), [3, 1], scope='Conv2d_0b_3x1')
                        ],
                        3)
                with tf.variable_scope('Branch_2'):
                    branch_2 = slim.conv2d(net, depth(448), [1, 1], scope='Conv2d_0a_1x1')
                    branch_2 = slim.conv2d(branch_2, depth(384), [3, 3], scope='Conv2d_0b_3x3')
                    branch_2 = tf.concat(
                        [
                            slim.conv2d(branch_2, depth(384), [1, 3], scope='Conv2d_0c_1x3'),
                            slim.conv2d(branch_2, depth(384), [3, 1], scope='Conv2d_0d_3x1')
                        ],
                        3)
                with tf.variable_scope('Branch_3'):
                    branch_3 = slim.avg_pool2d(net, [3, 3], scope='AvgPool_0a_3x3')
                    branch_3 = slim.conv2d(branch_3, depth(192), [1, 1], scope='Conv2d_0b_1x1')
                net = tf.concat([branch_0, branch_1, branch_2, branch_3], 3)
            end_points[end_point] = net
            if end_point == final_endpoint:
                return net, end_points

            # mixed_10: 8 x 8 x 2048
            end_point = 'Mixed_7c'
            with tf.variable_scope(end_point):
                with tf.variable_scope('Branch_0'):
                    branch_0 = slim.conv2d(net, depth(320), [1, 1], scope='Conv2d_0a_1x1')
                with tf.variable_scope('Branch_1'):
                    branch_1 = slim.conv2d(net, depth(384), [1, 1], scope='Conv2d_0a_1x1')
                    branch_1 = tf.concat(
                        [
                            slim.conv2d(branch_1, depth(384), [1, 3], scope='Conv2d_0b_1x3'),
                            slim.conv2d(branch_1, depth(384), [3, 1], scope='Conv2d_0c_3x1')
                        ],
                        3)
                with tf.variable_scope('Branch_2'):
                    branch_2 = slim.conv2d(net, depth(448), [1, 1], scope='Conv2d_0a_1x1')
                    branch_2 = slim.conv2d(branch_2, depth(384), [3, 3], scope='Conv2d_0b_3x3')
                    branch_2 = tf.concat(
                        [
                            slim.conv2d(branch_2, depth(384), [1, 3], scope='Conv2d_0c_1x3'),
                            slim.conv2d(branch_2, depth(384), [3, 1], scope='Conv2d_0d_3x1')
                        ],
                        3)
                with tf.variable_scope('Branch_3'):
                    branch_3 = slim.avg_pool2d(net, [3, 3], scope='AvgPool_0a_3x3')
                    branch_3 = slim.conv2d(branch_3, depth(192), [1, 1], scope='Conv2d_0b_1x1')
                net = tf.concat([branch_0, branch_1, branch_2, branch_3], 3)
            end_points[end_point] = net
            if end_point == final_endpoint:
                return net, end_points
        raise ValueError('Unknown final endpoint %s' % final_endpoint)


# In the source file this function is defined after its callers yet can still be called:
# Python resolves names at call time, and by then every top-level definition in the module is bound.
def _reduced_kernel_size_for_small_input(input_tensor, kernel_size):
    '''
    Define a kernel size which is automatically reduced for small input.

    If the shape of the input images is unknown at graph construction time, this
    function assumes that the input images are large enough.
    '''
    shape = input_tensor.get_shape().as_list()  # e.g. [?, 5, 5, 128]
    if shape[1] is None or shape[2] is None:
        kernel_size_out = kernel_size
    else:
        kernel_size_out = [min(shape[1], kernel_size[0]), min(shape[2], kernel_size[1])]
    return kernel_size_out


def inception_v3(inputs,
                 num_classes=1000,
                 is_training=True,
                 dropout_keep_prob=0.8,
                 min_depth=16,
                 depth_multiplier=1.0,
                 prediction_fn=slim.softmax,
                 spatial_squeeze=True,
                 reuse=None,
                 scope='InceptionV3'):
    if depth_multiplier <= 0:
        raise ValueError('depth_multiplier is not greater than zero.')
    depth = lambda d: max(int(d * depth_multiplier), min_depth)

    with tf.variable_scope(scope, 'InceptionV3', [inputs, num_classes], reuse=reuse) as scope:
        with slim.arg_scope([slim.batch_norm, slim.dropout], is_training=is_training):
            net, end_points = inception_v3_base(inputs, scope=scope, min_depth=min_depth, depth_multiplier=depth_multiplier)

            # Auxiliary head logits: a side classifier attached to Mixed_6e that injects an extra
            # gradient signal during training and acts as a regularizer.
            with slim.arg_scope([slim.conv2d, slim.max_pool2d, slim.avg_pool2d],
                                stride=1,
                                padding='SAME'):
                aux_logits = end_points['Mixed_6e']  # mixed_7: 17 x 17 x 768
                with tf.variable_scope('AuxLogits'):
                    # 5 x 5 x 768
                    aux_logits = slim.avg_pool2d(aux_logits, [5, 5], stride=3, padding='VALID', scope='AvgPool_1a_5x5')
                    # 5 x 5 x 128
                    aux_logits = slim.conv2d(aux_logits, depth(128), [1, 1], scope='Conv2d_1b_1x1')

                    # shape of the feature map before the final layer
                    kernel_size = _reduced_kernel_size_for_small_input(aux_logits, [5, 5])
                    # 1 x 1 x 768: the input size equals the filter size, consistent with the size formulas
                    aux_logits = slim.conv2d(aux_logits, depth(768), kernel_size, padding='VALID', weights_initializer=trunc_normal(0.01), scope='Conv2d_2a_{}x{}'.format(*kernel_size))
                    # 1 x 1 x 1000
                    aux_logits = slim.conv2d(aux_logits, num_classes, [1, 1], activation_fn=None, normalizer_fn=None, weights_initializer=trunc_normal(0.001), scope='Conv2d_2b_1x1')
                    if spatial_squeeze:
                        # (?, 1000)
                        aux_logits = tf.squeeze(aux_logits, [1, 2], name='SpatialSqueeze')
                    end_points['AuxLogits'] = aux_logits

            # final pooling and prediction
            with tf.variable_scope('Logits'):
                kernel_size = _reduced_kernel_size_for_small_input(net, [8, 8])
                # 1 x 1 x 2048
                net = slim.avg_pool2d(net, kernel_size, padding='VALID', scope='AvgPool_1a_{}x{}'.format(*kernel_size))
                # 1 x 1 x 2048; the dropout here regularizes the final classifier
                net = slim.dropout(net, keep_prob=dropout_keep_prob, scope='Dropout_1b')
                end_points['PreLogits'] = net
                # 1 x 1 x num_classes
                logits = slim.conv2d(net, num_classes, [1, 1], activation_fn=None, normalizer_fn=None, scope='Conv2d_1c_1x1')
                if spatial_squeeze:
                    # (?, num_classes)
                    logits = tf.squeeze(logits, [1, 2], name='SpatialSqueeze')
            end_points['Logits'] = logits
            end_points['Predictions'] = prediction_fn(logits, scope='Predictions')

    return logits, end_points


# used when defining the model for transfer learning
def inception_v3_arg_scope(weight_decay=0.00004,
                           batch_norm_var_collection='moving_vars',
                           batch_norm_decay=0.9997,
                           batch_norm_epsilon=0.001,
                           updates_collections=tf.GraphKeys.UPDATE_OPS,
                           use_fused_batchnorm=True):
    """Defines the default InceptionV3 arg scope.
    Returns:
      An `arg_scope` to use for the inception v3 model.
    """
    batch_norm_params = {
        # Decay for the moving averages.
        'decay': batch_norm_decay,
        # epsilon to prevent 0s in variance.
        'epsilon': batch_norm_epsilon,
        # collection containing update_ops.
        'updates_collections': updates_collections,
        # Use fused batch norm if possible.
        'fused': use_fused_batchnorm,
        # collection containing the moving mean and moving variance.
        'variables_collections': {
            'beta': None,
            'gamma': None,
            'moving_mean': [batch_norm_var_collection],
            'moving_variance': [batch_norm_var_collection],
        }
    }

    # Set weight_decay for weights in Conv and FC layers.
    with slim.arg_scope([slim.conv2d, slim.fully_connected], weights_regularizer=slim.l2_regularizer(weight_decay)):
        with slim.arg_scope(
                [slim.conv2d],
                weights_initializer=slim.variance_scaling_initializer(),
                activation_fn=tf.nn.relu,
                normalizer_fn=slim.batch_norm,
                normalizer_params=batch_norm_params) as sc:
            return sc


inputs = tf.placeholder(tf.float32, shape=[None, 299, 299, 3], name='X')
# inception_v3_base(inputs)
inception_v3(inputs)
The input layer is a 299×299×3 three-dimensional matrix.
6.5 Transfer Learning with Convolutional Networks
Transfer learning takes a model trained on one problem and, with only simple adjustments, applies it to a new problem.
For example, an Inception-v3 model trained on ImageNet can be used to solve a new image classification problem: keep the parameters of all the trained convolutional layers and replace only the final fully connected layer. The layers before that final fully connected layer are called the bottleneck layer; the bottleneck refers to a single layer.
Generally, when enough data is available, transfer learning does not perform as well as training from scratch.
Transfer learning: preprocessing the data.
A sample script for preprocessing the files; it takes roughly a 2-core, 8 GB machine to run.
import os
import glob
import tensorflow as tf
import numpy as np

INPUT_DATA = '/home/yangxl/flower_photos'  # input directory
OUTPUT_DATA = '/home/yangxl/flower_processed_data.npy'  # output file

VALIDATION_PERCENTAGE = 10
TEST_PERCENTAGE = 10

def create_image_lists(sess, testing_percentage, validation_percentage):
    sub_dirs = [x[0] for x in os.walk(INPUT_DATA)]  # the directory itself plus its subdirectories
    is_root_dir = True

    # initialize the datasets
    training_images = []
    training_labels = []
    testing_images = []
    testing_labels = []
    validation_images = []
    validation_labels = []
    current_labels = 0

    # read every subdirectory
    for sub_dir in sub_dirs:
        if is_root_dir:  # skip the first entry, the root directory itself
            is_root_dir = False
            continue

        # collect all the image files in one subdirectory
        extensions = ['jpg', 'jpeg', 'JPG', 'JPEG']
        file_list = []
        dir_name = os.path.basename(sub_dir)  # the part after the last '/'
        print(dir_name)
        for extension in extensions:
            file_glob = os.path.join(INPUT_DATA, dir_name, '*.' + extension)
            file_list.extend(glob.glob(file_glob))  # glob.glob returns the list of matching paths; glob works together with os for file handling
        if not file_list:
            continue

        # process the image data
        for file_name in file_list:
            image_raw_data = tf.gfile.GFile(file_name, 'rb').read()  # binary data
            image = tf.image.decode_jpeg(image_raw_data)  # uint8 tensor, e.g. 333x500x3, channel values 0-255
            if image.dtype != tf.float32:
                image = tf.image.convert_image_dtype(image, dtype=tf.float32)  # channel values 0-1
            image = tf.image.resize_images(image, [299, 299])
            image_value = sess.run(image)  # numpy.ndarray
            # note: new graph nodes are created for every image, which is largely why this script needs so much memory

            # randomly split the dataset
            chance = np.random.randint(100)
            if chance < validation_percentage:
                validation_images.append(image_value)
                validation_labels.append(current_labels)
            elif chance < validation_percentage + testing_percentage:
                testing_images.append(image_value)
                testing_labels.append(current_labels)
            else:
                training_images.append(image_value)
                training_labels.append(current_labels)
        current_labels += 1

    # shuffle the training data for better results, keeping training_images and training_labels aligned
    state = np.random.get_state()
    np.random.shuffle(training_images)
    np.random.set_state(state)
    np.random.shuffle(training_labels)

    print("it's time to return")
    return np.asarray([training_images, training_labels,
                       validation_images, validation_labels,
                       testing_images, testing_labels])

def main():
    with tf.Session() as sess:
        processed_data = create_image_lists(sess, TEST_PERCENTAGE, VALIDATION_PERCENTAGE)
        # save the processed data in numpy format
        np.save(OUTPUT_DATA, processed_data)

if __name__ == '__main__':
    main()
A more convenient way to obtain the cross entropy:
tf.losses.softmax_cross_entropy(tf.one_hot(labels, N_CLASSES), logits, weights=1.0)
train_step = tf.train.RMSPropOptimizer(LEARNING_RATE).minimize(tf.losses.get_total_loss())
The transfer learning example:
#!coding:utf8

import tensorflow as tf
import numpy as np
import tensorflow.contrib.slim as slim

# load the Inception-v3 model definition
import tensorflow.contrib.slim.python.slim.nets.inception_v3 as inception_v3

INPUT_DATA = '/home/yangxl/files/flower_processed_data.npy'

TRAIN_FILE = '/home/yangxl/files/save_model'
CKPT_FILE = '/home/yangxl/files/inception_v3.ckpt'

LEARNING_RATE = 0.0001
STEPS = 300
BATCH = 32
N_CLASSES = 5  # five kinds of flowers

CHECKPOINT_EXCLUDE_SCOPES = 'InceptionV3/Logits,InceptionV3/AuxLogits'
TRAINABLE_SCOPES = 'InceptionV3/Logits,InceptionV3/AuxLogits'

# collect all the parameters that should be loaded from the pre-trained model
def get_tuned_variables():
    exclusions = [scope.strip() for scope in CHECKPOINT_EXCLUDE_SCOPES.split(',')]
    variables_to_restore = []

    # filter the parameters
    for var in slim.get_model_variables():  # the Inception-v3 model must be defined first, otherwise there are no variables
        excluded = False
        for exclusion in exclusions:
            if var.op.name.startswith(exclusion):
                excluded = True
                break
        if not excluded:
            variables_to_restore.append(var)
    return variables_to_restore

# collect the list of variables that should be trained
def get_trainable_variables():
    scopes = [scope.strip() for scope in TRAINABLE_SCOPES.split(',')]
    variables_to_train = []
    for scope in scopes:
        variables = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope)  # the scope is matched as a regular expression
        variables_to_train.extend(variables)  # extend, not append: get_collection already returns a list
    return variables_to_train

def main(arg=None):
    processed_data = np.load(INPUT_DATA)
    training_images = processed_data[0]
    n_training_example = len(training_images)
    training_labels = processed_data[1]
    validation_images = processed_data[2]
    validation_labels = processed_data[3]
    testing_images = processed_data[4]
    testing_labels = processed_data[5]
    print('%d training examples, %s validation examples and %d testing examples.'
          % (n_training_example, len(validation_labels), len(testing_labels)))

    images = tf.placeholder(tf.float32, [None, 299, 299, 3], name='input_images')
    labels = tf.placeholder(tf.int64, [None], name='labels')  # five kinds of flowers

    # Define the Inception-v3 model. Google only provides the trained parameter values,
    # so the structure itself has to be defined here in code.
    with slim.arg_scope(inception_v3.inception_v3_arg_scope()):
        # inception_v3.inception_v3_arg_scope() is a dict with two keys; nested arg_scope dicts
        # are merged together, and functions inside inception_v3.inception_v3 may pick up these defaults.
        logits, _ = inception_v3.inception_v3(images, num_classes=N_CLASSES)

    # get the variables to train
    trainable_variables = get_trainable_variables()
    # print(len(trainable_variables), trainable_variables)
    '''
    [<tf.Variable 'InceptionV3/Logits/Conv2d_1c_1x1/weights:0' shape=(1, 1, 2048, 5) dtype=float32_ref>,
     <tf.Variable 'InceptionV3/Logits/Conv2d_1c_1x1/biases:0' shape=(5,) dtype=float32_ref>,
     <tf.Variable 'InceptionV3/AuxLogits/Conv2d_1b_1x1/weights:0' shape=(1, 1, 768, 128) dtype=float32_ref>,
     <tf.Variable 'InceptionV3/AuxLogits/Conv2d_1b_1x1/BatchNorm/beta:0' shape=(128,) dtype=float32_ref>,
     <tf.Variable 'InceptionV3/AuxLogits/Conv2d_2a_5x5/weights:0' shape=(5, 5, 128, 768) dtype=float32_ref>,
     <tf.Variable 'InceptionV3/AuxLogits/Conv2d_2a_5x5/BatchNorm/beta:0' shape=(768,) dtype=float32_ref>,
     <tf.Variable 'InceptionV3/AuxLogits/Conv2d_2b_1x1/weights:0' shape=(1, 1, 768, 5) dtype=float32_ref>,
     <tf.Variable 'InceptionV3/AuxLogits/Conv2d_2b_1x1/biases:0' shape=(5,) dtype=float32_ref>]
    '''
    # Define the loss function. The regularization losses were already added to the loss collection
    # when the model was defined.
    tf.losses.softmax_cross_entropy(tf.one_hot(labels, N_CLASSES), logits, weights=1.0)

    # define the training step
    train_step = tf.train.RMSPropOptimizer(LEARNING_RATE).minimize(tf.losses.get_total_loss())

    # compute the accuracy
    with tf.name_scope('evaluation'):
        correct_prediction = tf.equal(tf.argmax(logits, 1), labels)
        evaluation_step = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

    # Define the model-loading function. It returns a callback; calling callback(sess) loads the
    # variables from get_tuned_variables() into the current graph.
    load_fn = slim.assign_from_checkpoint_fn(CKPT_FILE, get_tuned_variables(), ignore_missing_vars=True)

    # define the saver for the newly trained model
    saver = tf.train.Saver()

    with tf.Session() as sess:
        # Initialize the variables that will not be loaded. This must happen before loading the model,
        # otherwise initialization would overwrite the already-loaded values.
        tf.global_variables_initializer().run()
        # load the pre-trained model
        print('Loading tuned variables from %s' % CKPT_FILE)
        load_fn(sess)

        start = 0
        end = BATCH
        for i in range(STEPS):
            sess.run(train_step, feed_dict={
                images: training_images[start: end],
                labels: training_labels[start: end]
            })

            # logging
            if i % 30 == 0 or i + 1 == STEPS:
                saver.save(sess, TRAIN_FILE, global_step=i)
                validation_accuracy = sess.run(evaluation_step, feed_dict={
                    images: validation_images, labels: validation_labels
                })
                print('Step %d: Validation accuracy = %.1f%%' % (i, validation_accuracy * 100.0))

            start = end
            if start == n_training_example:
                start = 0
            end = start + BATCH
            if end > n_training_example:
                end = n_training_example
        test_accuracy = sess.run(evaluation_step, feed_dict={
            images: testing_images, labels: testing_labels
        })
        print('Final test accuracy = %.1f%%' % (test_accuracy * 100.0))

if __name__ == '__main__':
    tf.app.run()
Notes on the run:
The script ran for about 12 hours of wall-clock time, yet TIME+ in top showed only 300-odd minutes. TIME+ measures the CPU time the process actually consumed, not the elapsed time, so a process that spends most of its life waiting accumulates little of it.
During the run the load average was quite high while the process's CPU and memory usage stayed low; that pattern usually points to processes blocked on I/O, most plausibly the kernel shuttling pages between memory and swap.