6.1 Image Recognition Problems and Classic Datasets
The CIFAR dataset is a highly influential image classification dataset. It comes in two variants, CIFAR-10 and CIFAR-100, both subsets of the 8 million images in the Visual Dictionary project. CIFAR images are 32×32 color images.
CIFAR-10 contains 60,000 images drawn from 10 different classes. Every image has a fixed size and contains exactly one kind of object. Compared with MNIST, the biggest differences are that the images are color rather than grayscale and that classification is harder.
Whether MNIST or CIFAR, both datasets fall short of real-world image recognition in two major ways. First, real-world images have far higher resolution than 32×32, and the resolution is not fixed. Second, the real world has a great many object categories; neither 10 nor 100 classes comes close, and a real photo rarely contains just one kind of object.
ImageNet largely addresses both problems and is much closer to real-world image recognition.
ImageNet is a large image database organized according to WordNet. Nearly 15 million images are linked to roughly 20,000 WordNet noun synsets. Each WordNet synset used in ImageNet represents a real-world entity and can be treated as one class in a classification problem. A single image may contain the entities of several synsets.
The ILSVRC2012 image classification dataset is a subset of ImageNet with 1.2 million images from 1,000 classes, each image belonging to exactly one class. The images were crawled straight from the web, so their file sizes range from a few kilobytes to several megabytes.
Top-N accuracy is the probability that the correct answer appears among the first N answers the recognition algorithm returns. In image classification, many papers compare methods by the accuracy of the top N answers, where N is usually 3 or 5.
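As a concrete illustration, here is a minimal NumPy sketch of top-N accuracy; the function name and the toy data are hypothetical, not from the book.

import numpy as np

def top_n_accuracy(logits, labels, n=5):
    # argsort ascending, keep the last n columns: the indices of the n highest-scoring classes
    top_n = np.argsort(logits, axis=1)[:, -n:]
    hits = [labels[i] in top_n[i] for i in range(len(labels))]
    return np.mean(hits)

logits = np.array([[0.1, 0.5, 0.2, 0.2],
                   [0.7, 0.1, 0.1, 0.1]])
labels = np.array([2, 0])
print(top_n_accuracy(logits, labels, n=3))  # 1.0: both true labels fall in the top 3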
6.2 Introduction to Convolutional Neural Networks
In a fully connected network, every node in one layer is connected to every node in the adjacent layer, so the nodes of each fully connected layer are usually drawn as a column, which makes the connection structure easy to display. In a convolutional network, only some of the nodes in adjacent layers are connected, and the nodes of each convolutional layer are usually organized as a three-dimensional matrix. Although the two look very different, their overall architectures are very similar, and their inputs, outputs, and training procedure are essentially the same. The only difference lies in how adjacent layers are connected.
The biggest problem with using a fully connected network on images is the sheer number of parameters in the fully connected layers; besides slowing down computation, so many parameters easily lead to overfitting. The purpose of a convolutional network is to reduce the number of parameters.
In the first few layers of a convolutional network, each node is connected to only some of the nodes in the previous layer.
The five kinds of layers in a convolutional network:
1. Input layer: the pixel matrix of an image, height × width × depth (color channels).
2. Convolutional layer: each node's input is a small patch of the previous layer, typically 3×3 or 5×5. The convolutional layer analyzes each small patch in more depth to extract more abstract features. In general, the depth of the node matrix increases after a convolutional layer.
3. Pooling layer: does not change the depth of the three-dimensional matrix but shrinks its height and width. Pooling can be thought of as converting a high-resolution image into a lower-resolution one. It further reduces the number of nodes feeding the final fully connected layers and hence the number of parameters in the whole network.
4. Fully connected layer: after several rounds of convolution and pooling, the information in the image can be considered abstracted into higher-level features. Once the convolutional and pooling layers have performed automatic feature extraction, fully connected layers are still needed to carry out the classification.
5. Softmax layer: used for classification; it yields the probability distribution of the current example over the classes.
6.3 Common Structures in Convolutional Networks
The most important component of a convolutional layer is the filter (or kernel). A filter converts a sub-matrix of nodes in the current layer into a unit node matrix in the next layer, that is, a matrix with height and width 1 but unrestricted depth.
The height and width of the node matrix a filter processes are set manually; these are called the filter size, commonly 3×3 or 5×5. Because the depth a filter processes always equals the depth of the current layer's node matrix, the filter size needs only two dimensions even though the node matrix is three-dimensional.
The other manually specified setting is the depth of the resulting unit node matrix, called the depth of the filter.
(Locally) A filter's forward pass computes the nodes of the unit node matrix on the right from the nodes of the small matrix on the left. As in a fully connected layer, the computation uses weights and a bias term. See Figure 6-8.
(Globally) A convolutional layer's forward pass slides one filter from the top-left corner of the current layer to the bottom-right corner, computing the corresponding unit matrix at every position. See Figure 6-10.
Each move of the filter produces one value (k values when the filter depth is k); stitching these values into a new matrix completes the convolutional layer's forward pass.
When the filter is larger than 1×1, the matrix produced by the forward pass is smaller than the current layer's matrix.
To avoid this change in size, zero padding can be added around the border of the current layer's matrix. See Figure 6-11.
Besides zero padding, the stride of the filter can also be used to adjust the size of the result matrix. Figure 6-12 shows the forward pass of a convolutional layer with stride 2 and zero padding.
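To make the moving-filter computation concrete, here is a minimal NumPy sketch of a single-channel, single-filter forward pass without padding and with stride 1; all names and data are hypothetical.

import numpy as np

def conv2d_valid(x, w, b):
    # x: input matrix (H, W); w: filter (k, k); b: scalar bias shared across positions
    k = w.shape[0]
    out_h, out_w = x.shape[0] - k + 1, x.shape[1] - k + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # weighted sum of the k-by-k patch plus the shared bias
            out[i, j] = np.sum(x[i:i+k, j:j+k] * w) + b
    return out

x = np.arange(16, dtype=float).reshape(4, 4)
w = np.ones((3, 3))
print(conv2d_valid(x, w, 0.0).shape)  # (2, 2): 4 - 3 + 1 = 2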
Determining the size of the output layer:
With zero padding, rounding up:
out_length = ceil(in_length / stride_length)
out_width = ceil(in_width / stride_width)
Without zero padding, rounding up:
out_length = ceil((in_length - filter_length + 1) / stride_length)
out_width = ceil((in_width - filter_width + 1) / stride_width)
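A small sketch under these formulas, with math.ceil implementing the rounding up; the function name is hypothetical.

import math

def conv_output_size(in_size, filter_size, stride, zero_padding):
    if zero_padding:
        return math.ceil(in_size / stride)
    return math.ceil((in_size - filter_size + 1) / stride)

print(conv_output_size(32, 5, 1, True))   # 32
print(conv_output_size(32, 5, 1, False))  # 28
print(conv_output_size(32, 5, 2, True))   # 16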
Convolutional networks have a very important property: the filter parameters are shared within each convolutional layer, which makes what the network detects independent of where it appears in the image. Taking MNIST handwritten digit recognition as an example, whether the digit "1" appears in the top-left or the bottom-right corner, the class of the image is the same. Sharing each layer's filter parameters also slashes the number of parameters in the network. Take CIFAR-10: the input matrix is 32×32×3; if a convolutional layer uses 5×5 filters of depth 16, it has 5*5*3*16+16 = 1216 parameters (picture it as a fully connected layer from a 5×5×3 input to a 16×1 output). Moreover, the parameter count depends only on the filter size, the filter depth, and the depth of the current layer's node matrix, not on the image size, which is what lets convolutional networks scale to larger images.
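The arithmetic in this paragraph can be checked directly; a hypothetical helper:

def conv_params(filter_h, filter_w, in_depth, out_depth):
    # weights plus one bias per slice of the output depth
    return filter_h * filter_w * in_depth * out_depth + out_depth

print(conv_params(5, 5, 3, 16))  # 1216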
Implementing the forward pass of a convolutional layer with TensorFlow:
import tensorflow as tf

x = tf.placeholder(tf.float32, shape=[None, 32, 32, 3], name='x-input')

# shape: filter size (height, width), current-layer depth, filter depth
filter_weight = tf.get_variable(
    'weights', shape=[5, 5, 3, 16], initializer=tf.truncated_normal_initializer(stddev=0.1)
)
biases = tf.get_variable(
    'biases', shape=[16], initializer=tf.constant_initializer(0.1)  # shape equals the filter depth
)

# 1st argument: the current layer's node matrix, a four-dimensional tensor; the first dimension
#   indexes the input batch, the remaining three are one node matrix (height * width * depth)
# 2nd argument: the convolutional layer's weights, i.e. the filter
# 3rd argument: the stride along each dimension, a length-4 array whose first and fourth entries
#   must be 1, because the stride applies only to the height and width of the matrix
# 4th argument: padding, either 'SAME' or 'VALID'
conv = tf.nn.conv2d(
    x, filter_weight, strides=[1, 1, 1, 1], padding='SAME'
)
# print(conv.shape)  # (?, 32, 32, 16): the depth becomes 16; by the formulas above the size is 32
#                    # with zero padding and 28 without

# Plain addition cannot be used, because the same bias value must be added at every position of the matrix.
# In Figure 6-13, for example, the next layer is 2x2 but there is only one bias value (the depth is 1),
# and every entry of the 2x2 matrix has it added.
bias = tf.nn.bias_add(conv, biases)
actived_conv = tf.nn.relu(bias)

# Keep the four dimensions straight: the input's, the weights', and the strides' dimensions all mean different things.
6.3.2 Pooling Layers
Pooling layers mainly shrink the size of the matrix, which reduces the number of parameters in the final fully connected layers. Pooling both speeds up computation and helps prevent overfitting.
The forward pass of a pooling layer also slides a filter-like structure, but the computation inside the filter is not a weighted sum; it is a simpler maximum or average. A pooling layer that takes the maximum is a max pooling layer; one that takes the average is an average pooling layer.
Like a convolutional filter, a pooling filter needs a manually chosen size, padding scheme, and stride. The filters of the two layer types move in similar ways; the only difference is that a convolutional filter spans the full depth of the node matrix, while a pooling filter affects nodes at only one depth. A pooling filter therefore moves not only along the height and width but also along the depth.
Convolutional layer: with an input depth of 3, the three channels are summed together.
Pooling layer: with a depth of 2, each channel is processed separately, as in the sketch below.
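A minimal NumPy sketch of 2×2 max pooling with stride 2, pooling each channel independently; all names are hypothetical.

import numpy as np

def max_pool_2x2(x):
    # x: (H, W, C) with H and W even; each channel is pooled on its own
    h, w, c = x.shape
    out = np.zeros((h // 2, w // 2, c))
    for i in range(0, h, 2):
        for j in range(0, w, 2):
            # maximum over the 2x2 window, computed per channel
            out[i // 2, j // 2] = x[i:i+2, j:j+2].max(axis=(0, 1))
    return out

x = np.random.rand(4, 4, 3)
print(max_pool_2x2(x).shape)  # (2, 2, 3): the depth is unchanged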
Implementing the forward pass of a pooling layer with TensorFlow:
# 1st argument: the current layer's node matrix (four-dimensional)
# 2nd argument: the filter size, a length-4 array whose first and fourth entries must be 1, meaning the
#   filter cannot span different input examples or the full node-matrix depth (unlike a convolutional
#   filter); [1, 2, 2, 1] and [1, 3, 3, 1] are used most often
# 3rd argument: the strides, a length-4 array whose first and fourth entries must be 1, meaning a pooling
#   layer cannot reduce the node-matrix depth or the number of input examples
# 4th argument: padding
pool = tf.nn.max_pool(actived_conv, ksize=[1, 3, 3, 1], strides=[1, 2, 2, 1], padding='SAME')
The biggest difference between a convolutional layer and a pooling layer lies in the filter: weights of shape [5, 5, 3, 16] versus a ksize of [1, 3, 3, 1].
6.4 Classic Convolutional Network Models
6.4.1 The LeNet-5 Model
LeNet-5 was the first convolutional network successfully applied to digit recognition.
The LeNet-5 model takes as its input layer a three-dimensional matrix (height × width × depth).
The number of parameters is far smaller than the number of connections. When counting a convolutional layer's connections, the +1 in the formula is the bias term, which takes part in every output computation alongside the weights.
Only the weights of the fully connected layers need regularization.
ReLU and dropout are not used on the last layer.
# mnist_inference.py

import tensorflow as tf

IMAGE_SIZE = 28
NUM_CHANNELS = 1  # grayscale
NUM_LABELS = 10

# size and depth of the first convolutional layer
CONV1_SIZE = 5
CONV1_DEEP = 32
# size and depth of the second convolutional layer
CONV2_SIZE = 5
CONV2_DEEP = 64
# number of nodes in the fully connected layer
FC_SIZE = 512

def get_weight_variable(shape, regularizer):
    weights = tf.get_variable('weight', shape, initializer=tf.truncated_normal_initializer(stddev=0.1))
    if regularizer:
        tf.add_to_collection('losses', regularizer(weights))
    return weights

def inference(input_tensor, train, regularizer):
    with tf.variable_scope('layer1-conv1'):
        # input 28x28x1, filter 5x5, depth 32, stride 1, output 28x28x32
        conv1_weights = get_weight_variable([CONV1_SIZE, CONV1_SIZE, NUM_CHANNELS, CONV1_DEEP], None)
        conv1_biases = tf.get_variable('bias', [CONV1_DEEP], initializer=tf.constant_initializer(0.0))
        conv1 = tf.nn.conv2d(input_tensor, conv1_weights, strides=[1, 1, 1, 1], padding='SAME')
        relu1 = tf.nn.relu(tf.nn.bias_add(conv1, conv1_biases))

    with tf.name_scope('layer2-pool1'):
        # output 14x14x32
        pool1 = tf.nn.max_pool(relu1, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')

    with tf.variable_scope('layer3-conv2'):
        # filter 5x5, depth 64, output 14x14x64
        conv2_weights = get_weight_variable([CONV2_SIZE, CONV2_SIZE, CONV1_DEEP, CONV2_DEEP], None)
        conv2_biases = tf.get_variable('bias', [CONV2_DEEP], initializer=tf.constant_initializer(0.0))
        conv2 = tf.nn.conv2d(pool1, conv2_weights, strides=[1, 1, 1, 1], padding='SAME')
        relu2 = tf.nn.relu(tf.nn.bias_add(conv2, conv2_biases))

    with tf.name_scope('layer4-pool2'):
        # output 7x7x64
        pool2 = tf.nn.max_pool(relu2, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')

    # A fully connected layer takes a feature vector as input, so the three-dimensional matrix
    # must be flattened into a one-dimensional vector.
    pool_shape = pool2.get_shape().as_list()  # includes the batch dimension
    nodes = pool_shape[1] * pool_shape[2] * pool_shape[3]  # 3136
    reshaped = tf.reshape(pool2, [pool_shape[0], nodes])

    # Dropout randomly zeroes the output of some nodes during training. It further improves model
    # robustness and prevents overfitting, and it is applied only during training.
    with tf.variable_scope('layer5-fc1'):
        # only the fully connected weights are regularized
        fc1_weights = get_weight_variable([nodes, FC_SIZE], regularizer)
        fc1_biases = tf.get_variable('bias', shape=[FC_SIZE], initializer=tf.constant_initializer(0.1))
        fc1 = tf.nn.relu(tf.matmul(reshaped, fc1_weights) + fc1_biases)
        if train:
            fc1 = tf.nn.dropout(fc1, 0.5)

    with tf.variable_scope('layer6-fc2'):
        fc2_weights = get_weight_variable([FC_SIZE, NUM_LABELS], regularizer)
        fc2_biases = tf.get_variable('bias', shape=[NUM_LABELS], initializer=tf.constant_initializer(0.1))
        logit = tf.matmul(fc1, fc2_weights) + fc2_biases

    # ReLU and dropout are not applied to the last layer;
    # sparse_softmax_cross_entropy_with_logits computes the cross entropy later.
    return logit


# mnist_train.py

#!coding:utf8
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data
from tensorflow.contrib.layers import l2_regularizer
import mnist_inference
import os
import numpy as np

BATCH_SIZE = 100

LEARNING_RATE_BASE = 0.01  # 0.8, carried over from the fully connected model, makes this network diverge
LEARNING_RATE_DECAY = 0.99
REGULARIZATION_RATE = 0.0001  # lambda
TRAINING_STEPS = 30000
MOVING_AVERAGE_DACAY = 0.99

MODEL_SAVE_PATH = '/home/yangxl/files/save_model'
MODEL_NAME = 'conv2d.ckpt'


def train(mnist):
    # shape[0] cannot be None because of the reshape between the pooling layer and the fully connected layer
    x = tf.placeholder(tf.float32, [BATCH_SIZE, mnist_inference.IMAGE_SIZE, mnist_inference.IMAGE_SIZE, mnist_inference.NUM_CHANNELS], 'x-input')
    y_ = tf.placeholder(tf.float32, [BATCH_SIZE, mnist_inference.NUM_LABELS], 'y-input')

    # regularization
    regularizer = l2_regularizer(REGULARIZATION_RATE)

    y = mnist_inference.inference(x, True, regularizer)

    global_step = tf.Variable(0, trainable=False)

    # moving average
    variables_averages = tf.train.ExponentialMovingAverage(MOVING_AVERAGE_DACAY, global_step)
    variables_averages_op = variables_averages.apply(tf.trainable_variables())
    # mutually exclusive classes;
    # the ground truth is a length-10 one-hot vector, while this function expects the index of the
    # correct class, so tf.argmax extracts that index.
    cross_entropy = tf.nn.sparse_softmax_cross_entropy_with_logits(logits=y, labels=tf.argmax(y_, 1))
    cross_entropy_mean = tf.reduce_mean(cross_entropy)
    loss = cross_entropy_mean + tf.add_n(tf.get_collection('losses'))

    learning_rate = tf.train.exponential_decay(LEARNING_RATE_BASE, global_step, mnist.train.num_examples / BATCH_SIZE,
                                               LEARNING_RATE_DECAY, staircase=True)
    train_step = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss, global_step)
    with tf.control_dependencies([train_step, variables_averages_op]):
        train_op = tf.no_op(name='train')

    saver = tf.train.Saver()

    with tf.Session() as sess:
        tf.global_variables_initializer().run()

        for i in range(TRAINING_STEPS):
            xs, ys = mnist.train.next_batch(BATCH_SIZE)  # xs.shape=(100, 784)
            reshaped_xs = np.reshape(xs, [BATCH_SIZE, mnist_inference.IMAGE_SIZE, mnist_inference.IMAGE_SIZE, mnist_inference.NUM_CHANNELS])
            _, loss_value, step = sess.run([train_op, loss, global_step], feed_dict={x: reshaped_xs, y_: ys})

            if i % 1000 == 0:
                print('after %d training steps, loss on training batch is %g ' % (i, loss_value))
                saver.save(sess, os.path.join(MODEL_SAVE_PATH, MODEL_NAME), global_step=global_step)


def main(argv=None):
    mnist = input_data.read_data_sets('/home/yangxl/files/mnist', one_hot=True)
    train(mnist)


if __name__ == '__main__':
    tf.app.run()


# mnist_eval.py

#!coding:utf8
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data
import mnist_inference
import mnist_train
import time
import numpy as np

# load the latest model every EVAL_INTERVAL_SECS seconds and measure its accuracy on the test data
EVAL_INTERVAL_SECS = 60

def evaluate(mnist):
    x = tf.placeholder(tf.float32, [mnist.test.num_examples, mnist_inference.IMAGE_SIZE, mnist_inference.IMAGE_SIZE, mnist_inference.NUM_CHANNELS], 'x-input')
    y_ = tf.placeholder(tf.float32, [mnist.test.num_examples, mnist_inference.NUM_LABELS], 'y-input')

    y = mnist_inference.inference(x, False, None)

    correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

    # moving average
    variables_averages = tf.train.ExponentialMovingAverage(mnist_train.MOVING_AVERAGE_DACAY)
    variables_to_restore = variables_averages.variables_to_restore()

    saver = tf.train.Saver(variables_to_restore)  # the shadow variables must be saved during training to be restorable here

    while True:
        with tf.Session() as sess:
            reshape_x = np.reshape(mnist.test.images, [-1, 28, 28, 1])
            validate_feed = {x: reshape_x, y_: mnist.test.labels}

            # locate the latest model file in the directory via the checkpoint file
            ckpt = tf.train.get_checkpoint_state(mnist_train.MODEL_SAVE_PATH)
            if ckpt and ckpt.model_checkpoint_path:
                saver.restore(sess, ckpt.model_checkpoint_path)

                global_step = ckpt.model_checkpoint_path.split('/')[-1].split('-')[-1]
                accuracy_score = sess.run(accuracy, feed_dict=validate_feed)
                print('after %s training steps, validation accuracy = %g ' % (global_step, accuracy_score))
            else:
                print('No checkpoint file found')
                return
        time.sleep(EVAL_INTERVAL_SECS)


def main(argv=None):
    mnist = input_data.read_data_sets('/home/yangxl/files/mnist', one_hot=True)
    evaluate(mnist)


if __name__ == '__main__':
    tf.app.run()
As originally published (with LEARNING_RATE_BASE = 0.8), the loss never settled and the accuracy hovered around 0.117, when it should be about 99.4%. The base learning rate of 0.8, carried over from the fully connected model, is almost certainly the cause; it is far too large for this convolutional network, and lowering it to about 0.01, as in the code above, lets training converge.
No single convolutional network architecture can solve every problem. LeNet-5, for example, cannot cope well with a large image dataset such as ImageNet.
The following regular-expression-style formula summarizes some classic convolutional architectures for image classification: input layer --> (convolutional layer+ --> pooling layer?)+ --> fully connected layer+
Most convolutional networks use at most three convolutional layers in a row.
After several rounds of convolution and pooling, a convolutional network usually passes through one or two fully connected layers before the output.
As for filter depth, most convolutional networks increase it layer by layer. A sketch following these guidelines appears below.
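Here is a sketch of a small network that follows the pattern above, written with slim for brevity; the layer sizes are illustrative assumptions, not taken from the book.

import tensorflow as tf
import tensorflow.contrib.slim as slim

def simple_convnet(images, num_classes=10):
    # input --> (conv+ --> pool?)+ --> fully connected+
    net = slim.conv2d(images, 32, [3, 3], scope='conv1')
    net = slim.max_pool2d(net, [2, 2], scope='pool1')
    net = slim.conv2d(net, 64, [3, 3], scope='conv2a')
    net = slim.conv2d(net, 64, [3, 3], scope='conv2b')  # two convolutional layers in a row
    net = slim.max_pool2d(net, [2, 2], scope='pool2')
    net = slim.flatten(net)
    net = slim.fully_connected(net, 512, scope='fc1')
    return slim.fully_connected(net, num_classes, activation_fn=None, scope='logits')

images = tf.placeholder(tf.float32, [None, 32, 32, 3], name='images')
logits = simple_convnet(images)  # the filter depth grows layer by layer: 32 -> 64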
6.4.2 The Inception-v3 Model
In LeNet-5, different convolutional layers are connected in series; in Inception-v3, the Inception structure combines different convolutional layers in parallel.
Section 6.4.1 noted that a convolutional layer may use filters with sides of 1, 3, or 5; how should one choose among them? The Inception module's answer is to use all the different sizes at once and then concatenate the resulting matrices.
Although the filter sizes differ, if every filter uses zero padding with stride 1, the height and width of each result matrix equal those of the input matrix, so the matrices produced by the different filters can be concatenated into a single deeper matrix, as the sketch below illustrates.
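A minimal sketch of the idea: three filter sizes applied in parallel with zero padding and stride 1, then concatenated along the depth axis; the branch depths are arbitrary assumptions.

import tensorflow as tf
import tensorflow.contrib.slim as slim

def mini_inception_module(net):
    with slim.arg_scope([slim.conv2d], stride=1, padding='SAME'):
        branch_1x1 = slim.conv2d(net, 64, [1, 1], scope='b1x1')
        branch_3x3 = slim.conv2d(net, 96, [3, 3], scope='b3x3')
        branch_5x5 = slim.conv2d(net, 32, [5, 5], scope='b5x5')
    # every branch keeps the input height and width, so a depth-axis concat is valid
    return tf.concat([branch_1x1, branch_3x3, branch_5x5], axis=3)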
The Inception-v3 model has 46 layers in total (the layer counts in the figure's boxes) and is built from 11 Inception modules (the boxes themselves). It contains 96 convolutional layers.
The Inception-v3 model code, using the slim library:
import tensorflow as tf
import tensorflow.contrib.slim as slim

trunc_normal = lambda stddev: tf.truncated_normal_initializer(0.0, stddev)

def inception_v3_base(inputs,
                      final_endpoint='Mixed_7c',
                      min_depth=16,
                      depth_multiplier=1.0,
                      scope=None):
    end_points = {}

    if depth_multiplier <= 0:
        raise ValueError('depth_multiplier is not greater than zero.')
    depth = lambda d: max(int(d * depth_multiplier), min_depth)

    with tf.variable_scope(scope, 'InceptionV3', [inputs]):
        # arg_scope sets the default values of arguments
        with slim.arg_scope([slim.conv2d, slim.max_pool2d, slim.avg_pool2d],
                            stride=1,
                            padding='VALID'):
            # 299 x 299 x 3
            end_point = 'Conv2d_1a_3x3'  # scope names use letters, digits and underscores; 'x' stands for the multiplication sign
            # no zero padding
            net = slim.conv2d(inputs, depth(32), [3, 3], stride=2, scope=end_point)
            end_points[end_point] = net
            if end_point == final_endpoint:
                return net, end_points
            # 149 x 149 x 32
            end_point = 'Conv2d_2a_3x3'
            # no zero padding, stride 1
            net = slim.conv2d(net, depth(32), [3, 3], scope=end_point)
            end_points[end_point] = net
            if end_point == final_endpoint:
                return net, end_points
            # 147 x 147 x 32
            end_point = 'Conv2d_2b_3x3'
            net = slim.conv2d(net, depth(64), [3, 3], padding='SAME', scope=end_point)
            end_points[end_point] = net
            if end_point == final_endpoint:
                return net, end_points
            # 147 x 147 x 64
            end_point = 'MaxPool_3a_3x3'
            net = slim.max_pool2d(net, [3, 3], stride=2, scope=end_point)
            end_points[end_point] = net
            if end_point == final_endpoint:
                return net, end_points
            # 73 x 73 x 64
            end_point = 'Conv2d_3b_1x1'
            net = slim.conv2d(net, depth(80), [1, 1], scope=end_point)
            end_points[end_point] = net
            if end_point == final_endpoint:
                return net, end_points
            # 73 x 73 x 80
            end_point = 'Conv2d_4a_3x3'
            net = slim.conv2d(net, depth(192), [3, 3], scope=end_point)
            end_points[end_point] = net
            if end_point == final_endpoint:
                return net, end_points
            # 71 x 71 x 192
            end_point = 'MaxPool_5a_3x3'
            net = slim.max_pool2d(net, [3, 3], stride=2, scope=end_point)
            end_points[end_point] = net
            if end_point == final_endpoint:
                return net, end_points
            # 35 x 35 x 192

        # Inception blocks
        with slim.arg_scope([slim.conv2d, slim.max_pool2d, slim.avg_pool2d],
                            stride=1,
                            padding='SAME'):
            # mixed: 35 x 35 x 256
            end_point = 'Mixed_5b'
            with tf.variable_scope(end_point):
                with tf.variable_scope('Branch_0'):
                    branch_0 = slim.conv2d(net, depth(64), [1, 1], scope='Conv2d_0a_1x1')
                with tf.variable_scope('Branch_1'):
                    branch_1 = slim.conv2d(net, depth(48), [1, 1], scope='Conv2d_0a_1x1')
                    branch_1 = slim.conv2d(branch_1, depth(64), [5, 5], scope='Conv2d_0b_5x5')
                with tf.variable_scope('Branch_2'):
                    branch_2 = slim.conv2d(net, depth(64), [1, 1], scope='Conv2d_0a_1x1')
                    branch_2 = slim.conv2d(branch_2, depth(96), [3, 3], scope='Conv2d_0b_3x3')
                    branch_2 = slim.conv2d(branch_2, depth(96), [3, 3], scope='Conv2d_0c_3x3')
                with tf.variable_scope('Branch_3'):
                    branch_3 = slim.avg_pool2d(net, [3, 3], scope='AvgPool_0a_3x3')
                    branch_3 = slim.conv2d(branch_3, depth(32), [1, 1], scope='Conv2d_0b_1x1')
                net = tf.concat([branch_0, branch_1, branch_2, branch_3], 3)
            end_points[end_point] = net
            if end_point == final_endpoint:
                return net, end_points

            # mixed_1: 35 x 35 x 288
            end_point = 'Mixed_5c'
            with tf.variable_scope(end_point):
                with tf.variable_scope('Branch_0'):
                    branch_0 = slim.conv2d(net, depth(64), [1, 1], scope='Conv2d_0a_1x1')
                with tf.variable_scope('Branch_1'):
                    branch_1 = slim.conv2d(net, depth(48), [1, 1], scope='Conv2d_0b_1x1')
                    branch_1 = slim.conv2d(branch_1, depth(64), [5, 5], scope='Conv_1_0c_5x5')
                with tf.variable_scope('Branch_2'):
                    branch_2 = slim.conv2d(net, depth(64), [1, 1], scope='Conv2d_0a_1x1')
                    branch_2 = slim.conv2d(branch_2, depth(96), [3, 3], scope='Conv2d_0b_3x3')
                    branch_2 = slim.conv2d(branch_2, depth(96), [3, 3], scope='Conv2d_0c_3x3')
                with tf.variable_scope('Branch_3'):
                    branch_3 = slim.avg_pool2d(net, [3, 3], scope='AvgPool_0a_3x3')
                    branch_3 = slim.conv2d(branch_3, depth(64), [1, 1], scope='Conv2d_0b_1x1')
                net = tf.concat([branch_0, branch_1, branch_2, branch_3], 3)
            end_points[end_point] = net
            if end_point == final_endpoint:
                return net, end_points

            # mixed_2: 35 x 35 x 288
            end_point = 'Mixed_5d'
            with tf.variable_scope(end_point):
                with tf.variable_scope('Branch_0'):
                    branch_0 = slim.conv2d(net, depth(64), [1, 1], scope='Conv2d_0a_1x1')
                with tf.variable_scope('Branch_1'):
                    branch_1 = slim.conv2d(net, depth(48), [1, 1], scope='Conv2d_0a_1x1')
                    branch_1 = slim.conv2d(branch_1, depth(64), [5, 5], scope='Conv2d_0b_5x5')
                with tf.variable_scope('Branch_2'):
                    branch_2 = slim.conv2d(net, depth(64), [1, 1], scope='Conv2d_0a_1x1')
                    branch_2 = slim.conv2d(branch_2, depth(96), [3, 3], scope='Conv2d_0b_3x3')
                    branch_2 = slim.conv2d(branch_2, depth(96), [3, 3], scope='Conv2d_0c_3x3')
                with tf.variable_scope('Branch_3'):
                    branch_3 = slim.avg_pool2d(net, [3, 3], scope='AvgPool_0a_3x3')
                    branch_3 = slim.conv2d(branch_3, depth(64), [1, 1], scope='Conv2d_0b_1x1')
                net = tf.concat([branch_0, branch_1, branch_2, branch_3], 3)
            end_points[end_point] = net
            if end_point == final_endpoint:
                return net, end_points

            # mixed_3: 17 x 17 x 768
            end_point = 'Mixed_6a'
            with tf.variable_scope(end_point):
                with tf.variable_scope('Branch_0'):
                    branch_0 = slim.conv2d(net, depth(384), [3, 3], stride=2, padding='VALID', scope='Conv2d_1a_1x1')
                with tf.variable_scope('Branch_1'):
                    branch_1 = slim.conv2d(net, depth(64), [1, 1], scope='Conv2d_0a_1x1')
                    branch_1 = slim.conv2d(branch_1, depth(96), [3, 3], scope='Conv2d_0b_3x3')
                    branch_1 = slim.conv2d(branch_1, depth(96), [3, 3], stride=2, padding='VALID', scope='Conv2d_1a_1x1')
                with tf.variable_scope('Branch_2'):
                    branch_2 = slim.max_pool2d(net, [3, 3], stride=2, padding='VALID', scope='MaxPool_1a_3x3')
                net = tf.concat([branch_0, branch_1, branch_2], 3)
            end_points[end_point] = net
            if end_point == final_endpoint:
                return net, end_points

            # mixed_4: 17 x 17 x 768
            end_point = 'Mixed_6b'
            with tf.variable_scope(end_point):
                with tf.variable_scope('Branch_0'):
                    branch_0 = slim.conv2d(net, depth(192), [1, 1], scope='Conv2d_0a_1x1')
                with tf.variable_scope('Branch_1'):
                    branch_1 = slim.conv2d(net, depth(128), [1, 1], scope='Conv2d_0a_1x1')
                    branch_1 = slim.conv2d(branch_1, depth(128), [1, 7], scope='Conv2d_0b_1x7')  # the output size is unchanged even though the filter height and width differ
                    branch_1 = slim.conv2d(branch_1, depth(192), [7, 1], scope='Conv2d_0c_7x1')
                with tf.variable_scope('Branch_2'):
                    branch_2 = slim.conv2d(net, depth(128), [1, 1], scope='Conv2d_0a_1x1')
                    branch_2 = slim.conv2d(branch_2, depth(128), [7, 1], scope='Conv2d_0b_7x1')
                    branch_2 = slim.conv2d(branch_2, depth(128), [1, 7], scope='Conv2d_0c_1x7')
                    branch_2 = slim.conv2d(branch_2, depth(128), [7, 1], scope='Conv2d_0d_7x1')
                    branch_2 = slim.conv2d(branch_2, depth(192), [1, 7], scope='Conv2d_0e_1x7')
                with tf.variable_scope('Branch_3'):
                    branch_3 = slim.avg_pool2d(net, [3, 3], scope='AvgPool_0a_3x3')
                    branch_3 = slim.conv2d(branch_3, depth(192), [1, 1], scope='Conv2d_0b_1x1')
                net = tf.concat([branch_0, branch_1, branch_2, branch_3], 3)
            end_points[end_point] = net
            if end_point == final_endpoint:
                return net, end_points

            # mixed_5: 17 x 17 x 768
            end_point = 'Mixed_6c'
            with tf.variable_scope(end_point):
                with tf.variable_scope('Branch_0'):
                    branch_0 = slim.conv2d(net, depth(192), [1, 1], scope='Conv2d_0a_1x1')
                with tf.variable_scope('Branch_1'):
                    branch_1 = slim.conv2d(net, depth(160), [1, 1], scope='Conv2d_0a_1x1')
                    branch_1 = slim.conv2d(branch_1, depth(160), [1, 7], scope='Conv2d_0b_1x7')
                    branch_1 = slim.conv2d(branch_1, depth(192), [7, 1], scope='Conv2d_0c_7x1')
                with tf.variable_scope('Branch_2'):
                    branch_2 = slim.conv2d(net, depth(160), [1, 1], scope='Conv2d_0a_1x1')
                    branch_2 = slim.conv2d(branch_2, depth(160), [7, 1], scope='Conv2d_0b_7x1')
                    branch_2 = slim.conv2d(branch_2, depth(160), [1, 7], scope='Conv2d_0c_1x7')
                    branch_2 = slim.conv2d(branch_2, depth(160), [7, 1], scope='Conv2d_0d_7x1')
                    branch_2 = slim.conv2d(branch_2, depth(192), [1, 7], scope='Conv2d_0e_1x7')
                with tf.variable_scope('Branch_3'):
                    branch_3 = slim.avg_pool2d(net, [3, 3], scope='AvgPool_0a_3x3')
                    branch_3 = slim.conv2d(branch_3, depth(192), [1, 1], scope='Conv2d_0b_1x1')
                net = tf.concat([branch_0, branch_1, branch_2, branch_3], 3)
            end_points[end_point] = net
            if end_point == final_endpoint:
                return net, end_points

            # mixed_6: 17 x 17 x 768
            end_point = 'Mixed_6d'
            with tf.variable_scope(end_point):
                with tf.variable_scope('Branch_0'):
                    branch_0 = slim.conv2d(net, depth(192), [1, 1], scope='Conv2d_0a_1x1')
                with tf.variable_scope('Branch_1'):
                    branch_1 = slim.conv2d(net, depth(160), [1, 1], scope='Conv2d_0a_1x1')
                    branch_1 = slim.conv2d(branch_1, depth(160), [1, 7], scope='Conv2d_0b_1x7')
                    branch_1 = slim.conv2d(branch_1, depth(192), [7, 1], scope='Conv2d_0c_7x1')
                with tf.variable_scope('Branch_2'):
                    branch_2 = slim.conv2d(net, depth(160), [1, 1], scope='Conv2d_0a_1x1')
                    branch_2 = slim.conv2d(branch_2, depth(160), [7, 1], scope='Conv2d_0b_7x1')
                    branch_2 = slim.conv2d(branch_2, depth(160), [1, 7], scope='Conv2d_0c_1x7')
                    branch_2 = slim.conv2d(branch_2, depth(160), [7, 1], scope='Conv2d_0d_7x1')
                    branch_2 = slim.conv2d(branch_2, depth(192), [1, 7], scope='Conv2d_0e_1x7')
                with tf.variable_scope('Branch_3'):
                    branch_3 = slim.avg_pool2d(net, [3, 3], scope='AvgPool_0a_3x3')
                    branch_3 = slim.conv2d(branch_3, depth(192), [1, 1], scope='Conv2d_0b_1x1')
                net = tf.concat([branch_0, branch_1, branch_2, branch_3], 3)
            end_points[end_point] = net
            if end_point == final_endpoint:
                return net, end_points

            # mixed_7: 17 x 17 x 768
            end_point = 'Mixed_6e'
            with tf.variable_scope(end_point):
                with tf.variable_scope('Branch_0'):
                    branch_0 = slim.conv2d(net, depth(192), [1, 1], scope='Conv2d_0a_1x1')
                with tf.variable_scope('Branch_1'):
                    branch_1 = slim.conv2d(net, depth(192), [1, 1], scope='Conv2d_0a_1x1')
                    branch_1 = slim.conv2d(branch_1, depth(192), [1, 7], scope='Conv2d_0b_1x7')
                    branch_1 = slim.conv2d(branch_1, depth(192), [7, 1], scope='Conv2d_0c_7x1')
                with tf.variable_scope('Branch_2'):
                    branch_2 = slim.conv2d(net, depth(192), [1, 1], scope='Conv2d_0a_1x1')
                    branch_2 = slim.conv2d(branch_2, depth(192), [7, 1], scope='Conv2d_0b_7x1')
                    branch_2 = slim.conv2d(branch_2, depth(192), [1, 7], scope='Conv2d_0c_1x7')
                    branch_2 = slim.conv2d(branch_2, depth(192), [7, 1], scope='Conv2d_0d_7x1')
                    branch_2 = slim.conv2d(branch_2, depth(192), [1, 7], scope='Conv2d_0e_1x7')
                with tf.variable_scope('Branch_3'):
                    branch_3 = slim.avg_pool2d(net, [3, 3], scope='AvgPool_0a_3x3')
                    branch_3 = slim.conv2d(branch_3, depth(192), [1, 1], scope='Conv2d_0b_1x1')
                net = tf.concat([branch_0, branch_1, branch_2, branch_3], 3)
            end_points[end_point] = net
            if end_point == final_endpoint:
                return net, end_points

            # mixed_8: 8 x 8 x 1280
            end_point = 'Mixed_7a'
            with tf.variable_scope(end_point):
                with tf.variable_scope('Branch_0'):
                    branch_0 = slim.conv2d(net, depth(192), [1, 1], scope='Conv2d_0a_1x1')
                    branch_0 = slim.conv2d(branch_0, depth(320), [3, 3], stride=2, padding='VALID', scope='Conv2d_1a_3x3')
                with tf.variable_scope('Branch_1'):
                    branch_1 = slim.conv2d(net, depth(192), [1, 1], scope='Conv2d_0a_1x1')
                    branch_1 = slim.conv2d(branch_1, depth(192), [1, 7], scope='Conv2d_0b_1x7')
                    branch_1 = slim.conv2d(branch_1, depth(192), [7, 1], scope='Conv2d_0c_7x1')
                    branch_1 = slim.conv2d(branch_1, depth(192), [3, 3], stride=2, padding='VALID', scope='Conv2d_1a_3x3')
                with tf.variable_scope('Branch_2'):
                    branch_2 = slim.max_pool2d(net, [3, 3], stride=2, padding='VALID', scope='MaxPool_1a_3x3')
                net = tf.concat([branch_0, branch_1, branch_2], 3)
            end_points[end_point] = net
            if end_point == final_endpoint:
                return net, end_points

            # mixed_9: 8 x 8 x 2048
            end_point = 'Mixed_7b'
            with tf.variable_scope(end_point):
                with tf.variable_scope('Branch_0'):
                    branch_0 = slim.conv2d(net, depth(320), [1, 1], scope='Conv2d_0a_1x1')
                with tf.variable_scope('Branch_1'):
                    branch_1 = slim.conv2d(net, depth(384), [1, 1], scope='Conv2d_0a_1x1')
                    branch_1 = tf.concat(
                        [
                            slim.conv2d(branch_1, depth(384), [1, 3], scope='Conv2d_0b_1x3'),
                            slim.conv2d(branch_1, depth(384), [3, 1], scope='Conv2d_0b_3x1')
                        ],
                        3)
                with tf.variable_scope('Branch_2'):
                    branch_2 = slim.conv2d(net, depth(448), [1, 1], scope='Conv2d_0a_1x1')
                    branch_2 = slim.conv2d(branch_2, depth(384), [3, 3], scope='Conv2d_0b_3x3')
                    branch_2 = tf.concat(
                        [
                            slim.conv2d(branch_2, depth(384), [1, 3], scope='Conv2d_0c_1x3'),
                            slim.conv2d(branch_2, depth(384), [3, 1], scope='Conv2d_0d_3x1')
                        ],
                        3)
                with tf.variable_scope('Branch_3'):
                    branch_3 = slim.avg_pool2d(net, [3, 3], scope='AvgPool_0a_3x3')
                    branch_3 = slim.conv2d(branch_3, depth(192), [1, 1], scope='Conv2d_0b_1x1')
                net = tf.concat([branch_0, branch_1, branch_2, branch_3], 3)
            end_points[end_point] = net
            if end_point == final_endpoint:
                return net, end_points

            # mixed_10: 8 x 8 x 2048
            end_point = 'Mixed_7c'
            with tf.variable_scope(end_point):
                with tf.variable_scope('Branch_0'):
                    branch_0 = slim.conv2d(net, depth(320), [1, 1], scope='Conv2d_0a_1x1')
                with tf.variable_scope('Branch_1'):
                    branch_1 = slim.conv2d(net, depth(384), [1, 1], scope='Conv2d_0a_1x1')
                    branch_1 = tf.concat(
                        [
                            slim.conv2d(branch_1, depth(384), [1, 3], scope='Conv2d_0b_1x3'),
                            slim.conv2d(branch_1, depth(384), [3, 1], scope='Conv2d_0c_3x1')
                        ],
                        3)
                with tf.variable_scope('Branch_2'):
                    branch_2 = slim.conv2d(net, depth(448), [1, 1], scope='Conv2d_0a_1x1')
                    branch_2 = slim.conv2d(branch_2, depth(384), [3, 3], scope='Conv2d_0b_3x3')
                    branch_2 = tf.concat(
                        [
                            slim.conv2d(branch_2, depth(384), [1, 3], scope='Conv2d_0c_1x3'),
                            slim.conv2d(branch_2, depth(384), [3, 1], scope='Conv2d_0d_3x1')
                        ],
                        3)
                with tf.variable_scope('Branch_3'):
                    branch_3 = slim.avg_pool2d(net, [3, 3], scope='AvgPool_0a_3x3')
                    branch_3 = slim.conv2d(branch_3, depth(192), [1, 1], scope='Conv2d_0b_1x1')
                net = tf.concat([branch_0, branch_1, branch_2, branch_3], 3)
            end_points[end_point] = net
            if end_point == final_endpoint:
                return net, end_points
        raise ValueError('Unknown final endpoint %s' % final_endpoint)


# In the source file this function is defined after its callers yet can still be called:
# Python resolves names at call time, and by then every top-level definition in the module is bound.
def _reduced_kernel_size_for_small_input(input_tensor, kernel_size):
    '''
    Define a kernel size which is automatically reduced for small input.

    If the shape of the input images is unknown at graph construction time, this
    function assumes that the input images are large enough.
    '''
    shape = input_tensor.get_shape().as_list()  # e.g. [?, 5, 5, 128]
    if shape[1] is None or shape[2] is None:
        kernel_size_out = kernel_size
    else:
        kernel_size_out = [min(shape[1], kernel_size[0]), min(shape[2], kernel_size[1])]
    return kernel_size_out


def inception_v3(inputs,
                 num_classes=1000,
                 is_training=True,
                 dropout_keep_prob=0.8,
                 min_depth=16,
                 depth_multiplier=1.0,
                 prediction_fn=slim.softmax,
                 spatial_squeeze=True,
                 reuse=None,
                 scope='InceptionV3'):
    if depth_multiplier <= 0:
        raise ValueError('depth_multiplier is not greater than zero.')
    depth = lambda d: max(int(d * depth_multiplier), min_depth)

    with tf.variable_scope(scope, 'InceptionV3', [inputs, num_classes], reuse=reuse) as scope:
        with slim.arg_scope([slim.batch_norm, slim.dropout], is_training=is_training):
            net, end_points = inception_v3_base(inputs, scope=scope, min_depth=min_depth, depth_multiplier=depth_multiplier)

            # Auxiliary head logits: a side classifier attached to Mixed_6e that injects an extra
            # gradient signal during training and acts as a regularizer.
            with slim.arg_scope([slim.conv2d, slim.max_pool2d, slim.avg_pool2d],
                                stride=1,
                                padding='SAME'):
                aux_logits = end_points['Mixed_6e']  # mixed_7: 17 x 17 x 768
                with tf.variable_scope('AuxLogits'):
                    # 5 x 5 x 768
                    aux_logits = slim.avg_pool2d(aux_logits, [5, 5], stride=3, padding='VALID', scope='AvgPool_1a_5x5')
                    # 5 x 5 x 128
                    aux_logits = slim.conv2d(aux_logits, depth(128), [1, 1], scope='Conv2d_1b_1x1')

                    # shape of the feature map before the final layer
                    kernel_size = _reduced_kernel_size_for_small_input(aux_logits, [5, 5])
                    # 1 x 1 x 768: the input size equals the filter size, consistent with the size formulas
                    aux_logits = slim.conv2d(aux_logits, depth(768), kernel_size, padding='VALID', weights_initializer=trunc_normal(0.01), scope='Conv2d_2a_{}x{}'.format(*kernel_size))
                    # 1 x 1 x 1000
                    aux_logits = slim.conv2d(aux_logits, num_classes, [1, 1], activation_fn=None, normalizer_fn=None, weights_initializer=trunc_normal(0.001), scope='Conv2d_2b_1x1')
                    if spatial_squeeze:
                        # (?, 1000)
                        aux_logits = tf.squeeze(aux_logits, [1, 2], name='SpatialSqueeze')
                    end_points['AuxLogits'] = aux_logits

            # final pooling and prediction
            with tf.variable_scope('Logits'):
                kernel_size = _reduced_kernel_size_for_small_input(net, [8, 8])
                # 1 x 1 x 2048
                net = slim.avg_pool2d(net, kernel_size, padding='VALID', scope='AvgPool_1a_{}x{}'.format(*kernel_size))
                # 1 x 1 x 2048; the dropout here regularizes the final classifier
                net = slim.dropout(net, keep_prob=dropout_keep_prob, scope='Dropout_1b')
                end_points['PreLogits'] = net
                # 1 x 1 x num_classes
                logits = slim.conv2d(net, num_classes, [1, 1], activation_fn=None, normalizer_fn=None, scope='Conv2d_1c_1x1')
                if spatial_squeeze:
                    # (?, num_classes)
                    logits = tf.squeeze(logits, [1, 2], name='SpatialSqueeze')
            end_points['Logits'] = logits
            end_points['Predictions'] = prediction_fn(logits, scope='Predictions')

    return logits, end_points


# used when defining the model for transfer learning
def inception_v3_arg_scope(weight_decay=0.00004,
                           batch_norm_var_collection='moving_vars',
                           batch_norm_decay=0.9997,
                           batch_norm_epsilon=0.001,
                           updates_collections=tf.GraphKeys.UPDATE_OPS,
                           use_fused_batchnorm=True):
    """Defines the default InceptionV3 arg scope.
    Returns:
      An `arg_scope` to use for the inception v3 model.
    """
    batch_norm_params = {
        # Decay for the moving averages.
        'decay': batch_norm_decay,
        # epsilon to prevent 0s in variance.
        'epsilon': batch_norm_epsilon,
        # collection containing update_ops.
        'updates_collections': updates_collections,
        # Use fused batch norm if possible.
        'fused': use_fused_batchnorm,
        # collection containing the moving mean and moving variance.
        'variables_collections': {
            'beta': None,
            'gamma': None,
            'moving_mean': [batch_norm_var_collection],
            'moving_variance': [batch_norm_var_collection],
        }
    }

    # Set weight_decay for weights in Conv and FC layers.
    with slim.arg_scope([slim.conv2d, slim.fully_connected], weights_regularizer=slim.l2_regularizer(weight_decay)):
        with slim.arg_scope(
                [slim.conv2d],
                weights_initializer=slim.variance_scaling_initializer(),
                activation_fn=tf.nn.relu,
                normalizer_fn=slim.batch_norm,
                normalizer_params=batch_norm_params) as sc:
            return sc


inputs = tf.placeholder(tf.float32, shape=[None, 299, 299, 3], name='X')
# inception_v3_base(inputs)
inception_v3(inputs)
The input layer is a 299×299×3 three-dimensional matrix.
6.5 Transfer Learning with Convolutional Networks
Transfer learning takes a model trained on one problem and, with only simple adjustments, applies it to a new problem.
For example, an Inception-v3 model trained on ImageNet can be used to solve a new image classification problem: keep the parameters of all the trained convolutional layers and replace only the final fully connected layer. The layers before that final fully connected layer are called the bottleneck layer; the bottleneck refers to a single layer.
Generally, when enough data is available, transfer learning does not perform as well as training from scratch.
Transfer learning: preprocessing the data.
A sample script for preprocessing the files; it takes roughly a 2-core, 8 GB machine to run.
import os
import glob
import tensorflow as tf
import numpy as np

INPUT_DATA = '/home/yangxl/flower_photos'  # input directory
OUTPUT_DATA = '/home/yangxl/flower_processed_data.npy'  # output file

VALIDATION_PERCENTAGE = 10
TEST_PERCENTAGE = 10

def create_image_lists(sess, testing_percentage, validation_percentage):
    sub_dirs = [x[0] for x in os.walk(INPUT_DATA)]  # the directory itself plus its subdirectories
    is_root_dir = True

    # initialize the datasets
    training_images = []
    training_labels = []
    testing_images = []
    testing_labels = []
    validation_images = []
    validation_labels = []
    current_labels = 0

    # read every subdirectory
    for sub_dir in sub_dirs:
        if is_root_dir:  # skip the first entry, the root directory itself
            is_root_dir = False
            continue

        # collect all the image files in one subdirectory
        extensions = ['jpg', 'jpeg', 'JPG', 'JPEG']
        file_list = []
        dir_name = os.path.basename(sub_dir)  # the part after the last '/'
        print(dir_name)
        for extension in extensions:
            file_glob = os.path.join(INPUT_DATA, dir_name, '*.' + extension)
            file_list.extend(glob.glob(file_glob))  # glob.glob returns the list of matching paths; glob works together with os for file handling
        if not file_list:
            continue

        # process the image data
        for file_name in file_list:
            image_raw_data = tf.gfile.GFile(file_name, 'rb').read()  # binary data
            image = tf.image.decode_jpeg(image_raw_data)  # uint8 tensor, e.g. 333x500x3, channel values 0-255
            if image.dtype != tf.float32:
                image = tf.image.convert_image_dtype(image, dtype=tf.float32)  # channel values 0-1
            image = tf.image.resize_images(image, [299, 299])
            image_value = sess.run(image)  # numpy.ndarray
            # note: new graph nodes are created for every image, which is largely why this script needs so much memory

            # randomly split the dataset
            chance = np.random.randint(100)
            if chance < validation_percentage:
                validation_images.append(image_value)
                validation_labels.append(current_labels)
            elif chance < validation_percentage + testing_percentage:
                testing_images.append(image_value)
                testing_labels.append(current_labels)
            else:
                training_images.append(image_value)
                training_labels.append(current_labels)
        current_labels += 1

    # shuffle the training data for better results, keeping training_images and training_labels aligned
    state = np.random.get_state()
    np.random.shuffle(training_images)
    np.random.set_state(state)
    np.random.shuffle(training_labels)

    print("it's time to return")
    return np.asarray([training_images, training_labels,
                       validation_images, validation_labels,
                       testing_images, testing_labels])

def main():
    with tf.Session() as sess:
        processed_data = create_image_lists(sess, TEST_PERCENTAGE, VALIDATION_PERCENTAGE)
        # save the processed data in numpy format
        np.save(OUTPUT_DATA, processed_data)

if __name__ == '__main__':
    main()
A more convenient way to obtain the cross entropy:
tf.losses.softmax_cross_entropy(tf.one_hot(labels, N_CLASSES), logits, weights=1.0)
train_step = tf.train.RMSPropOptimizer(LEARNING_RATE).minimize(tf.losses.get_total_loss())
The transfer learning example:
#!coding:utf8

import tensorflow as tf
import numpy as np
import tensorflow.contrib.slim as slim

# load the Inception-v3 model definition
import tensorflow.contrib.slim.python.slim.nets.inception_v3 as inception_v3

INPUT_DATA = '/home/yangxl/files/flower_processed_data.npy'

TRAIN_FILE = '/home/yangxl/files/save_model'
CKPT_FILE = '/home/yangxl/files/inception_v3.ckpt'

LEARNING_RATE = 0.0001
STEPS = 300
BATCH = 32
N_CLASSES = 5  # five kinds of flowers

CHECKPOINT_EXCLUDE_SCOPES = 'InceptionV3/Logits,InceptionV3/AuxLogits'
TRAINABLE_SCOPES = 'InceptionV3/Logits,InceptionV3/AuxLogits'

# collect all the parameters that should be loaded from the pre-trained model
def get_tuned_variables():
    exclusions = [scope.strip() for scope in CHECKPOINT_EXCLUDE_SCOPES.split(',')]
    variables_to_restore = []

    # filter the parameters
    for var in slim.get_model_variables():  # the Inception-v3 model must be defined first, otherwise there are no variables
        excluded = False
        for exclusion in exclusions:
            if var.op.name.startswith(exclusion):
                excluded = True
                break
        if not excluded:
            variables_to_restore.append(var)
    return variables_to_restore

# collect the list of variables that should be trained
def get_trainable_variables():
    scopes = [scope.strip() for scope in TRAINABLE_SCOPES.split(',')]
    variables_to_train = []
    for scope in scopes:
        variables = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope)  # the scope is matched as a regular expression
        variables_to_train.extend(variables)  # extend, not append: get_collection already returns a list
    return variables_to_train

def main(arg=None):
    processed_data = np.load(INPUT_DATA)
    training_images = processed_data[0]
    n_training_example = len(training_images)
    training_labels = processed_data[1]
    validation_images = processed_data[2]
    validation_labels = processed_data[3]
    testing_images = processed_data[4]
    testing_labels = processed_data[5]
    print('%d training examples, %s validation examples and %d testing examples.'
          % (n_training_example, len(validation_labels), len(testing_labels)))

    images = tf.placeholder(tf.float32, [None, 299, 299, 3], name='input_images')
    labels = tf.placeholder(tf.int64, [None], name='labels')  # five kinds of flowers

    # Define the Inception-v3 model. Google only provides the trained parameter values,
    # so the structure itself has to be defined here in code.
    with slim.arg_scope(inception_v3.inception_v3_arg_scope()):
        # inception_v3.inception_v3_arg_scope() is a dict with two keys; nested arg_scope dicts
        # are merged together, and functions inside inception_v3.inception_v3 may pick up these defaults.
        logits, _ = inception_v3.inception_v3(images, num_classes=N_CLASSES)

    # get the variables to train
    trainable_variables = get_trainable_variables()
    # print(len(trainable_variables), trainable_variables)
    '''
    [<tf.Variable 'InceptionV3/Logits/Conv2d_1c_1x1/weights:0' shape=(1, 1, 2048, 5) dtype=float32_ref>,
     <tf.Variable 'InceptionV3/Logits/Conv2d_1c_1x1/biases:0' shape=(5,) dtype=float32_ref>,
     <tf.Variable 'InceptionV3/AuxLogits/Conv2d_1b_1x1/weights:0' shape=(1, 1, 768, 128) dtype=float32_ref>,
     <tf.Variable 'InceptionV3/AuxLogits/Conv2d_1b_1x1/BatchNorm/beta:0' shape=(128,) dtype=float32_ref>,
     <tf.Variable 'InceptionV3/AuxLogits/Conv2d_2a_5x5/weights:0' shape=(5, 5, 128, 768) dtype=float32_ref>,
     <tf.Variable 'InceptionV3/AuxLogits/Conv2d_2a_5x5/BatchNorm/beta:0' shape=(768,) dtype=float32_ref>,
     <tf.Variable 'InceptionV3/AuxLogits/Conv2d_2b_1x1/weights:0' shape=(1, 1, 768, 5) dtype=float32_ref>,
     <tf.Variable 'InceptionV3/AuxLogits/Conv2d_2b_1x1/biases:0' shape=(5,) dtype=float32_ref>]
    '''
    # Define the loss function. The regularization losses were already added to the loss collection
    # when the model was defined.
    tf.losses.softmax_cross_entropy(tf.one_hot(labels, N_CLASSES), logits, weights=1.0)

    # define the training step
    train_step = tf.train.RMSPropOptimizer(LEARNING_RATE).minimize(tf.losses.get_total_loss())

    # compute the accuracy
    with tf.name_scope('evaluation'):
        correct_prediction = tf.equal(tf.argmax(logits, 1), labels)
        evaluation_step = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

    # Define the model-loading function. It returns a callback; calling callback(sess) loads the
    # variables from get_tuned_variables() into the current graph.
    load_fn = slim.assign_from_checkpoint_fn(CKPT_FILE, get_tuned_variables(), ignore_missing_vars=True)

    # define the saver for the newly trained model
    saver = tf.train.Saver()

    with tf.Session() as sess:
        # Initialize the variables that will not be loaded. This must happen before loading the model,
        # otherwise initialization would overwrite the already-loaded values.
        tf.global_variables_initializer().run()
        # load the pre-trained model
        print('Loading tuned variables from %s' % CKPT_FILE)
        load_fn(sess)

        start = 0
        end = BATCH
        for i in range(STEPS):
            sess.run(train_step, feed_dict={
                images: training_images[start: end],
                labels: training_labels[start: end]
            })

            # logging
            if i % 30 == 0 or i + 1 == STEPS:
                saver.save(sess, TRAIN_FILE, global_step=i)
                validation_accuracy = sess.run(evaluation_step, feed_dict={
                    images: validation_images, labels: validation_labels
                })
                print('Step %d: Validation accuracy = %.1f%%' % (i, validation_accuracy * 100.0))

            start = end
            if start == n_training_example:
                start = 0
            end = start + BATCH
            if end > n_training_example:
                end = n_training_example
        test_accuracy = sess.run(evaluation_step, feed_dict={
            images: testing_images, labels: testing_labels
        })
        print('Final test accuracy = %.1f%%' % (test_accuracy * 100.0))

if __name__ == '__main__':
    tf.app.run()
Notes on the run:
The script ran for about 12 hours of wall-clock time, yet TIME+ in top showed only 300-odd minutes. TIME+ measures the CPU time the process actually consumed, not the elapsed time, so a process that spends most of its life waiting accumulates little of it.
During the run the load average was quite high while the process's CPU and memory usage stayed low; that pattern usually points to processes blocked on I/O, most plausibly the kernel shuttling pages between memory and swap.