In recent years, with the breakthroughs that deep learning, represented by convolutional neural networks (CNNs), has achieved in image recognition, more and more image recognition algorithms keep emerging. Last year, we made a first successful attempt at applying image recognition to testing: we reframed broken page styling on websites and device-compatibility issues on mobile as "binary classification of normal vs. abnormal images in a specific scenario", used transfer learning on top of Google's open-source Inception V3 network, and retrained image classification models for the corresponding scenarios, reaching over 95% accuracy on problem images.
Over the past year, our main work on intelligent image recognition has included:
This article studies and analyzes the source code of the model retraining process, to deepen our understanding of how training works, so that later adjustments to the training process can be made with clear intent.
A quick explanation of transfer learning: image recognition models often contain millions of parameters, and training one from scratch requires a large amount of labeled images as well as a great deal of compute (often hundreds of GPU-hours). Transfer learning is a shortcut: it continues training a new model on top of an existing model that was already trained on a similar task.
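In code terms, the idea looks roughly like the following minimal sketch (the TF Hub module URL and the two-class dense layer here are illustrative assumptions, not our production code): the pre-trained network is kept frozen as a feature extractor, and only a small, newly added classification layer is trained.

```python
import tensorflow as tf
import tensorflow_hub as hub

# The pre-trained Inception V3 module acts as a frozen feature extractor;
# only the small, newly added classification layer below is trained.
module = hub.Module(
    'https://tfhub.dev/google/imagenet/inception_v3/feature_vector/1')
height, width = hub.get_expected_image_size(module)

images = tf.placeholder(tf.float32, [None, height, width, 3])
labels = tf.placeholder(tf.int64, [None])

bottlenecks = module(images)              # 2048-d feature ("bottleneck") vectors
logits = tf.layers.dense(bottlenecks, 2)  # new trainable 2-class output layer

# Only the dense layer's variables are trainable, so gradient descent
# leaves the pre-trained weights untouched.
loss = tf.losses.sparse_softmax_cross_entropy(labels=labels, logits=logits)
train_op = tf.train.GradientDescentOptimizer(0.01).minimize(loss)
```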
The transfer-learning code in the image intelligence service we currently use is based on the open-source code on GitHub: tensorflow/hub/image_retraining/retrain.py.
Below is a walkthrough of the source code.
```python
if __name__ == '__main__':
  parser = argparse.ArgumentParser()
  parser.add_argument(
      '--image_dir',
      type=str,
      default='',
      help='Path to folders of labeled images.'
  )
  parser.add_argument(
      '--output_graph',
      type=str,
      default='/tmp/output_graph.pb',
      help='Where to save the trained graph.'
  )
  # ... omitted ...
  parser.add_argument(
      '--logging_verbosity',
      type=str,
      default='INFO',
      choices=['DEBUG', 'INFO', 'WARN', 'ERROR', 'FATAL'],
      help='How much logging output should be produced.')
  FLAGS, unparsed = parser.parse_known_args()
  tf.app.run(main=main, argv=[sys.argv[0]] + unparsed)
```
As you can see, the program's main entry point mostly declares and parses the command-line arguments. At runtime the parsed arguments are stored in the FLAGS variable, and then tf.app.run(main=main, argv=[sys.argv[0]] + unparsed) is executed to start the actual training.
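For reference, a typical invocation looks something like this (the paths are placeholders; all flags come from the argparse definitions in the source):

```
python retrain.py \
    --image_dir=/data/labeled_images \
    --how_many_training_steps=4000 \
    --output_graph=/tmp/output_graph.pb \
    --output_labels=/tmp/output_labels.txt
```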
```python
def main(_):
  # Needed to make sure the logging output is visible.
  # See https://github.com/tensorflow/tensorflow/issues/3047
  ## Set the log level.
  logging_verbosity = logging_level_verbosity(FLAGS.logging_verbosity)
  tf.logging.set_verbosity(logging_verbosity)

  ## Check that the image_dir flag was passed in; it is the path to the
  ## image set used for training.
  if not FLAGS.image_dir:
    tf.logging.error('Must set flag --image_dir.')
    return -1

  # Prepare necessary directories that can be used during training
  ## Recreate summaries_dir and make sure intermediate_output_graphs_dir exists.
  prepare_file_system()

  # Look at the folder structure, and create lists of all the images.
  ## Split the input image set into training, testing and validation sets,
  ## based on the image set path and the testing/validation percentages.
  image_lists = create_image_lists(FLAGS.image_dir, FLAGS.testing_percentage,
                                   FLAGS.validation_percentage)

  ## The number of classes equals the number of subdirectories under
  ## image_dir: each subdirectory is one class, and each class is split into
  ## training, testing and validation sets. If there are 0 or 1 classes,
  ## return an error, since classification needs at least 2 classes.
  class_count = len(image_lists.keys())
  if class_count == 0:
    tf.logging.error('No valid folders of images found at ' + FLAGS.image_dir)
    return -1
  if class_count == 1:
    tf.logging.error('Only one valid folder of images found at ' +
                     FLAGS.image_dir +
                     ' - multiple classes are needed for classification.')
    return -1

  # See if the command-line flags mean we're applying any distortions.
  ## Decide from the flags whether the input images should be distorted.
  do_distort_images = should_distort_images(
      FLAGS.flip_left_right, FLAGS.random_crop, FLAGS.random_scale,
      FLAGS.random_brightness)

  # Set up the pre-trained graph.
  ## Load the module. Inception V3 is used by default; the --tfhub_module
  ## flag can switch to another pre-trained model.
  module_spec = hub.load_module_spec(FLAGS.tfhub_module)
  ## Create the model graph.
  graph, bottleneck_tensor, resized_image_tensor, wants_quantization = (
      create_module_graph(module_spec))

  # Add the new layer that we'll be training.
  ## Call add_final_retrain_ops to obtain the train step, cross entropy,
  ## bottleneck input, ground-truth input and final tensor.
  with graph.as_default():
    (train_step, cross_entropy, bottleneck_input,
     ground_truth_input, final_tensor) = add_final_retrain_ops(
         class_count, FLAGS.final_tensor_name, bottleneck_tensor,
         wants_quantization, is_training=True)

  with tf.Session(graph=graph) as sess:
    # Initialize all weights: for the module to their pretrained values,
    # and for the newly added retraining layer to random initial values.
    ## Initialize the variables.
    init = tf.global_variables_initializer()
    sess.run(init)

    # Set up the image decoding sub-graph.
    ## Call the JPEG decoding helper to get the input image tensor and the
    ## decoded image tensor.
    jpeg_data_tensor, decoded_image_tensor = add_jpeg_decoding(module_spec)

    if do_distort_images:
      # We will be applying distortions, so set up the operations we'll need.
      (distorted_jpeg_data_tensor,
       distorted_image_tensor) = add_input_distortions(
           FLAGS.flip_left_right, FLAGS.random_crop, FLAGS.random_scale,
           FLAGS.random_brightness, module_spec)
    else:
      # We'll make sure we've calculated the 'bottleneck' image summaries and
      # cached them on disk.
      ## Create the bottlenecks for each image and cache them to disk.
      cache_bottlenecks(sess, image_lists, FLAGS.image_dir,
                        FLAGS.bottleneck_dir, jpeg_data_tensor,
                        decoded_image_tensor, resized_image_tensor,
                        bottleneck_tensor, FLAGS.tfhub_module)

    # Create the operations we need to evaluate the accuracy of our new layer.
    ## Create the evaluation operation.
    evaluation_step, _ = add_evaluation_step(final_tensor, ground_truth_input)

    # Merge all the summaries and write them out to the summaries_dir
    ## Merge the summaries and write them to the summaries_dir directory.
    merged = tf.summary.merge_all()
    train_writer = tf.summary.FileWriter(FLAGS.summaries_dir + '/train',
                                         sess.graph)
    validation_writer = tf.summary.FileWriter(
        FLAGS.summaries_dir + '/validation')

    # Create a train saver that is used to restore values into an eval graph
    # when exporting models.
```
```python
    train_saver = tf.train.Saver()

    # Run the training for as many cycles as requested on the command line.
    ## Start training, for the number of iterations passed in.
    for i in range(FLAGS.how_many_training_steps):
      # Get a batch of input bottleneck values, either calculated fresh every
      # time with distortions applied, or from the cache stored on disk.
      if do_distort_images:
        (train_bottlenecks,
         train_ground_truth) = get_random_distorted_bottlenecks(
             sess, image_lists, FLAGS.train_batch_size, 'training',
             FLAGS.image_dir, distorted_jpeg_data_tensor,
             distorted_image_tensor, resized_image_tensor, bottleneck_tensor)
      else:
        ## Get the bottleneck values of images for training. The default
        ## train_batch_size is 100, i.e. each iteration trains on a batch
        ## of 100 images.
        (train_bottlenecks,
         train_ground_truth, _) = get_random_cached_bottlenecks(
             sess, image_lists, FLAGS.train_batch_size, 'training',
             FLAGS.bottleneck_dir, FLAGS.image_dir, jpeg_data_tensor,
             decoded_image_tensor, resized_image_tensor, bottleneck_tensor,
             FLAGS.tfhub_module)
      # Feed the bottlenecks and ground truth into the graph, and run a
      # training step. Capture training summaries for TensorBoard with the
      # `merged` op.
      ## Run the merged op, filling the placeholders via feed_dict.
      train_summary, _ = sess.run(
          [merged, train_step],
          feed_dict={bottleneck_input: train_bottlenecks,
                     ground_truth_input: train_ground_truth})
      train_writer.add_summary(train_summary, i)

      # Every so often, print out how well the graph is training.
      ## Check whether this is the last training step.
      is_last_step = (i + 1 == FLAGS.how_many_training_steps)
      ## The default eval_step_interval is 10, i.e. print the current
      ## training results every 10 steps and when training completes.
      if (i % FLAGS.eval_step_interval) == 0 or is_last_step:
        ## Print the training accuracy and cross entropy.
        train_accuracy, cross_entropy_value = sess.run(
            [evaluation_step, cross_entropy],
            feed_dict={bottleneck_input: train_bottlenecks,
                       ground_truth_input: train_ground_truth})
        tf.logging.info('%s: Step %d: Train accuracy = %.1f%%' %
                        (datetime.now(), i, train_accuracy * 100))
        tf.logging.info('%s: Step %d: Cross entropy = %f' %
                        (datetime.now(), i, cross_entropy_value))
        # TODO: Make this use an eval graph, to avoid quantization
        # moving averages being updated by the validation set, though in
        # practice this makes a negligible difference.
        ## Get the bottleneck values of the validation set images, also 100
        ## per batch by default.
        validation_bottlenecks, validation_ground_truth, _ = (
            get_random_cached_bottlenecks(
                sess, image_lists, FLAGS.validation_batch_size, 'validation',
                FLAGS.bottleneck_dir, FLAGS.image_dir, jpeg_data_tensor,
                decoded_image_tensor, resized_image_tensor, bottleneck_tensor,
                FLAGS.tfhub_module))
        # Run a validation step and capture training summaries for TensorBoard
        # with the `merged` op.
        validation_summary, validation_accuracy = sess.run(
            [merged, evaluation_step],
            feed_dict={bottleneck_input: validation_bottlenecks,
                       ground_truth_input: validation_ground_truth})
        validation_writer.add_summary(validation_summary, i)
        ## Print the validation accuracy and the number of validation images.
        tf.logging.info('%s: Step %d: Validation accuracy = %.1f%% (N=%d)' %
                        (datetime.now(), i, validation_accuracy * 100,
                         len(validation_bottlenecks)))

      # Store intermediate results
      intermediate_frequency = FLAGS.intermediate_store_frequency

      if (intermediate_frequency > 0 and (i % intermediate_frequency == 0)
          and i > 0):
        # If we want to do an intermediate save, save a checkpoint of the
        # train graph, to restore into the eval graph.
        train_saver.save(sess, CHECKPOINT_NAME)
        intermediate_file_name = (FLAGS.intermediate_output_graphs_dir +
                                  'intermediate_' + str(i) + '.pb')
        tf.logging.info('Save intermediate result to : ' +
                        intermediate_file_name)
        save_graph_to_file(intermediate_file_name, module_spec, class_count)

    # After training is complete, force one last save of the train checkpoint.
```
```python
    train_saver.save(sess, CHECKPOINT_NAME)

    # We've completed all our training, so run a final test evaluation on
    # some new images we haven't used before.
    ## Run the final evaluation.
    run_final_eval(sess, module_spec, class_count, image_lists,
                   jpeg_data_tensor, decoded_image_tensor,
                   resized_image_tensor, bottleneck_tensor)

    # Write out the trained graph and labels with the weights stored as
    # constants.
    tf.logging.info('Save final result to : ' + FLAGS.output_graph)
    if wants_quantization:
      tf.logging.info('The model is instrumented for quantization with TF-Lite')
    save_graph_to_file(FLAGS.output_graph, module_spec, class_count)
    with tf.gfile.GFile(FLAGS.output_labels, 'w') as f:
      f.write('\n'.join(image_lists.keys()) + '\n')

    ## Export the trained graph as a SavedModel, if requested.
    if FLAGS.saved_model_dir:
      export_model(module_spec, class_count, FLAGS.saved_model_dir)
```
Detailed notes on the main method are annotated inline in the code above (comments beginning with "##"). Its main steps are:

1. Set the log level and check that --image_dir was provided.
2. Prepare the working directories, then split the images under image_dir into training, testing and validation sets (one class per subdirectory, with at least 2 classes required).
3. Load the pre-trained TF Hub module (Inception V3 by default), create the model graph, and add the new final layer to be retrained.
4. Set up JPEG decoding, and either set up distortion ops or compute and cache the bottleneck values on disk.
5. Run the training loop: feed batches of bottlenecks into the graph, and periodically log training and validation accuracy, optionally saving intermediate graphs.
6. After training, run a final evaluation on the test set, save the output graph and labels, and optionally export a SavedModel.
Having analyzed the main execution path of the code, let's look at the other methods. Since the full source is very long and space is limited, the remaining methods are briefly introduced below, in order.
```python
def create_image_lists(image_dir, testing_percentage, validation_percentage):
  # ... omitted ...
  result[label_name] = {
      'dir': dir_name,
      'training': training_images,
      'testing': testing_images,
      'validation': validation_images,
  }
  return result
```
Splits the image set under image_dir according to the testing_percentage and validation_percentage ratios. The returned structure looks like this:
```python
{
    'correct': {
        'dir': correct_image_dir,
        'training': correct_training_images,
        'testing': correct_testing_images,
        'validation': correct_validation_images
    },
    'error': {
        'dir': error_image_dir,
        'training': error_training_images,
        'testing': error_testing_images,
        'validation': error_validation_images
    }
}
```
The value for each training/testing/validation key is a list of image file names.
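One detail worth noting is how an image is assigned to a set. Condensed roughly from create_image_lists, the split hashes the file name into a stable percentage, so the same image always lands in the same set across runs, even as more images are added:

```python
import hashlib
import re

MAX_NUM_IMAGES_PER_CLASS = 2 ** 27 - 1  # ~134M, as defined in retrain.py

def which_set(file_name, testing_percentage, validation_percentage):
  # Ignore anything after '_nohash_' so grouped variants of one photo
  # always fall into the same set.
  hash_name = re.sub(r'_nohash_.*$', '', file_name)
  hash_name_hashed = hashlib.sha1(hash_name.encode('utf-8')).hexdigest()
  # Map the hash to a stable percentage in [0, 100].
  percentage_hash = ((int(hash_name_hashed, 16) %
                      (MAX_NUM_IMAGES_PER_CLASS + 1)) *
                     (100.0 / MAX_NUM_IMAGES_PER_CLASS))
  if percentage_hash < validation_percentage:
    return 'validation'
  elif percentage_hash < (testing_percentage + validation_percentage):
    return 'testing'
  return 'training'
```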
`get_image_path`: gets the full path to an image.

`get_bottleneck_path`: gets the bottleneck file path for an image in a given category (training, testing or validation).
`create_module_graph`: creates the model graph from the given pre-trained Hub Module.
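Condensed, the function looks roughly like this: it wraps the Hub module in a fresh graph, exposing the resized-image input and the bottleneck output, and checks whether the graph was instrumented for quantization:

```python
# Ops inserted by the TF-Lite quantization rewriter (constant in retrain.py).
FAKE_QUANT_OPS = ('FakeQuantWithMinMaxVars',
                  'FakeQuantWithMinMaxVarsPerChannel')

def create_module_graph(module_spec):
  """Creates a new graph and loads the given Hub Module into it."""
  height, width = hub.get_expected_image_size(module_spec)
  with tf.Graph().as_default() as graph:
    resized_input_tensor = tf.placeholder(tf.float32,
                                          [None, height, width, 3])
    m = hub.Module(module_spec)
    bottleneck_tensor = m(resized_input_tensor)
    wants_quantization = any(node.op in FAKE_QUANT_OPS
                             for node in graph.as_graph_def().node)
  return graph, bottleneck_tensor, resized_input_tensor, wants_quantization
```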
```python
def run_bottleneck_on_image(sess, image_data, image_data_tensor,
                            decoded_image_tensor, resized_input_tensor,
                            bottleneck_tensor):
  """Runs inference on an image to extract the 'bottleneck' summary layer.

  Args:
    sess: Current active TensorFlow Session.
    image_data: String of raw JPEG data.
    image_data_tensor: Input data layer in the graph.
    decoded_image_tensor: Output of initial image resizing and preprocessing.
    resized_input_tensor: The input node of the recognition graph.
    bottleneck_tensor: Layer before the final softmax.

  Returns:
    Numpy array of bottleneck values.
  """
  # First decode the JPEG image, resize it, and rescale the pixel values.
  resized_input_values = sess.run(decoded_image_tensor,
                                  {image_data_tensor: image_data})
  # Then run it through the recognition network.
  bottleneck_values = sess.run(bottleneck_tensor,
                               {resized_input_tensor: resized_input_values})
  bottleneck_values = np.squeeze(bottleneck_values)
  return bottleneck_values
```
Computes the bottleneck values from the decoded input image tensor, then applies np.squeeze, which removes the dimensions of size 1 from the shape.
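A quick illustration of that squeeze step: the network returns the bottleneck as a batch of one, and squeeze drops the singleton batch dimension (2048 is the bottleneck size for Inception V3):

```python
import numpy as np

# The module returns a batch of one bottleneck vector, shape (1, 2048).
bottleneck_values = np.zeros((1, 2048), dtype=np.float32)
print(np.squeeze(bottleneck_values).shape)  # -> (2048,)
```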
`ensure_dir_exists`: makes sure a directory exists, creating it if it does not.

`create_bottleneck_file`: calls run_bottleneck_on_image to compute an image's bottleneck values and caches them in a file on disk.

`get_or_create_bottleneck`: fetches the cached bottleneck values for an image, computing and caching them first if they do not exist yet.

`cache_bottlenecks`: caches the bottleneck values for whole batches of images up front.

`get_random_cached_bottlenecks`: randomly fetches a batch of cached bottlenecks, together with their ground truths and file names.
```python
def add_final_retrain_ops(class_count, final_tensor_name, bottleneck_tensor,
                          quantize_layer, is_training):
  batch_size, bottleneck_tensor_size = bottleneck_tensor.get_shape().as_list()
  assert batch_size is None, 'We want to work with arbitrary batch size.'
  with tf.name_scope('input'):
    bottleneck_input = tf.placeholder_with_default(
        bottleneck_tensor,
        shape=[batch_size, bottleneck_tensor_size],
        name='BottleneckInputPlaceholder')

    ground_truth_input = tf.placeholder(
        tf.int64, [batch_size], name='GroundTruthInput')

  # Organizing the following ops so they are easier to see in TensorBoard.
  layer_name = 'final_retrain_ops'
  with tf.name_scope(layer_name):
    with tf.name_scope('weights'):
      initial_value = tf.truncated_normal(
          [bottleneck_tensor_size, class_count], stddev=0.001)
      layer_weights = tf.Variable(initial_value, name='final_weights')
      variable_summaries(layer_weights)

    with tf.name_scope('biases'):
      layer_biases = tf.Variable(tf.zeros([class_count]), name='final_biases')
      variable_summaries(layer_biases)

    with tf.name_scope('Wx_plus_b'):
      logits = tf.matmul(bottleneck_input, layer_weights) + layer_biases
      tf.summary.histogram('pre_activations', logits)

  final_tensor = tf.nn.softmax(logits, name=final_tensor_name)

  # The tf.contrib.quantize functions rewrite the graph in place for
  # quantization. The imported model graph has already been rewritten, so upon
  # calling these rewrites, only the newly added final layer will be
  # transformed.
  if quantize_layer:
    if is_training:
      tf.contrib.quantize.create_training_graph()
    else:
      tf.contrib.quantize.create_eval_graph()

  tf.summary.histogram('activations', final_tensor)

  # If this is an eval graph, we don't need to add loss ops or an optimizer.
  if not is_training:
    return None, None, bottleneck_input, ground_truth_input, final_tensor

  with tf.name_scope('cross_entropy'):
    cross_entropy_mean = tf.losses.sparse_softmax_cross_entropy(
        labels=ground_truth_input, logits=logits)

  tf.summary.scalar('cross_entropy', cross_entropy_mean)

  with tf.name_scope('train'):
    optimizer = tf.train.GradientDescentOptimizer(FLAGS.learning_rate)
    train_step = optimizer.minimize(cross_entropy_mean)

  return (train_step, cross_entropy_mean, bottleneck_input,
          ground_truth_input, final_tensor)
```
Adds a new fully connected layer (y = Wx + b) followed by a softmax at the end of the graph, for training and evaluation. This is essentially the same as a logistic (softmax) regression model: gradient descent is used to iteratively minimize the cross entropy.
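For intuition, here is a tiny numpy sketch (illustrative numbers, not from the source) of what the softmax and the sparse softmax cross-entropy loss compute for a single example:

```python
import numpy as np

logits = np.array([2.0, 0.5, -1.0])  # Wx + b for 3 classes
label = 0                            # ground-truth class index

# Softmax turns the logits into a probability distribution.
probs = np.exp(logits) / np.sum(np.exp(logits))

# Sparse softmax cross entropy is the negative log-probability
# assigned to the true class.
loss = -np.log(probs[label])
print(probs, loss)  # probs ~ [0.79, 0.18, 0.04], loss ~ 0.24
```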
```python
def add_evaluation_step(result_tensor, ground_truth_tensor):
  with tf.name_scope('accuracy'):
    with tf.name_scope('correct_prediction'):
      ## Find the index of the largest value in each row, i.e. the
      ## predicted class for each example.
      prediction = tf.argmax(result_tensor, 1)
      ## Compare the predicted results with the actual results element-wise:
      ## True where they match, False otherwise.
      correct_prediction = tf.equal(prediction, ground_truth_tensor)
    with tf.name_scope('accuracy'):
      ## Cast the True/False values to floats and take the mean.
      evaluation_step = tf.reduce_mean(tf.cast(correct_prediction,
                                               tf.float32))
  tf.summary.scalar('accuracy', evaluation_step)
  return evaluation_step, prediction
```
See the annotations in the code above; the function returns the final accuracy and the list of predicted values.
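A concrete example (hypothetical numbers) of what these ops compute:

```python
import numpy as np

result = np.array([[0.9, 0.1],    # predicted class 0
                   [0.2, 0.8],    # predicted class 1
                   [0.6, 0.4]])   # predicted class 0
ground_truth = np.array([0, 1, 1])

prediction = np.argmax(result, axis=1)        # [0, 1, 0]
correct = (prediction == ground_truth)        # [True, True, False]
accuracy = correct.astype(np.float32).mean()  # 0.6666...
print(prediction, accuracy)
```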
`run_final_eval`: runs the final evaluation, using the test set. If the --print_misclassified_test_images flag is passed, the names and predicted results of the misclassified test images are also printed.

`save_graph_to_file`: saves the graph to a file.

`prepare_file_system`: prepares the workspace directories.
`add_jpeg_decoding`: parses the input JPEG data into a tensor, decoding and resizing it.
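Roughly, the decoding sub-graph it builds looks like this (condensed): the raw JPEG string is decoded, converted to floats, given a batch dimension, and resized to the input size the module expects:

```python
def add_jpeg_decoding(module_spec):
  input_height, input_width = hub.get_expected_image_size(module_spec)
  input_depth = hub.get_num_image_channels(module_spec)
  # Placeholder fed with the raw JPEG file contents.
  jpeg_data = tf.placeholder(tf.string, name='DecodeJPGInput')
  decoded_image = tf.image.decode_jpeg(jpeg_data, channels=input_depth)
  # Convert to floats in [0, 1), add a batch dimension, and resize
  # to the module's expected input size.
  decoded_image_as_float = tf.image.convert_image_dtype(decoded_image,
                                                        tf.float32)
  decoded_image_4d = tf.expand_dims(decoded_image_as_float, 0)
  resize_shape = tf.stack([input_height, input_width])
  resize_shape_as_int = tf.cast(resize_shape, dtype=tf.int32)
  resized_image = tf.image.resize_bilinear(decoded_image_4d,
                                           resize_shape_as_int)
  return jpeg_data, resized_image
```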
`export_model`: exports the trained model as a SavedModel to the directory given by --saved_model_dir.
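As a usage sketch (the directory and tensor names here are illustrative assumptions, not from the source), the exported SavedModel can be loaded back in a fresh session for serving:

```python
import tensorflow as tf

saved_model_dir = '/tmp/saved_model'  # hypothetical export path

with tf.Session(graph=tf.Graph()) as sess:
  # Load the SavedModel under the standard SERVING tag.
  tf.saved_model.loader.load(
      sess, [tf.saved_model.tag_constants.SERVING], saved_model_dir)
  # Tensors can then be looked up by name and run as usual, e.g.
  # (assuming the default final tensor name 'final_result'):
  # result = sess.run('final_result:0', feed_dict={...})
```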