Applying the TensorFlow Object Detection API

Date: 2019-12-01


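The previous post covered installing and configuring the TensorFlow Object Detection API. In this one we use the API to build our own object detection model.

1. Preparing the dataset

　　This post is about face recognition. I downloaded 120 pictures of Zhang Junning (張鈞甯) from Baidu Images and stored them in a new images folder under /models/research/object_detection. Inside images I created two folders, train and test, and split the 120 pictures into 100 for train and 20 for test.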
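The 100/20 split can be done by hand, but a small script keeps it random and repeatable. The following is a minimal sketch, not the script used for this post: the function name split_images, the fixed seed, and the assumption that all pictures are .jpg files sitting directly in one folder are mine.

```python
import random
import shutil
from pathlib import Path


def split_images(src_dir, train_dir, test_dir, n_test=20, seed=0):
    """Move n_test randomly chosen .jpg files to test_dir and the rest to train_dir."""
    src = Path(src_dir)
    # Collect the pictures before the target folders are created.
    images = sorted(p for p in src.iterdir() if p.suffix.lower() == ".jpg")
    if len(images) <= n_test:
        raise ValueError("need more images than n_test")
    random.Random(seed).shuffle(images)  # fixed seed keeps the split reproducible
    Path(train_dir).mkdir(parents=True, exist_ok=True)
    Path(test_dir).mkdir(parents=True, exist_ok=True)
    for i, img in enumerate(images):
        target = test_dir if i < n_test else train_dir
        shutil.move(str(img), str(Path(target) / img.name))
    return len(images) - n_test, n_test
```

Called as split_images('images', 'images/train', 'images/test'), it would leave 100 pictures in train and 20 in test for a folder of 120.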

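Next, use LabelImg, a small annotation tool (the original post links to installation instructions), to label the pictures in train and test by hand; if time allows, the more pictures the better. The original post shows a screenshot of the tool at this point.

After labeling, save each annotation as an XML file with the same name as its picture, in the same folder.

TensorFlow requires its input in the dedicated TFRecord format.

So we write two small Python scripts: the first collects the information from all the XML files in a folder into one .csv table, and the second builds a TFRecord file from that .csv.

The code for both follows: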

# xml2csv.py

import os
import glob
import pandas as pd
import xml.etree.ElementTree as ET

os.chdir('/home/zzf/tensorflow/models/research/object_detection/images/test')
path = '/home/zzf/tensorflow/models/research/object_detection/images/test'


def xml_to_csv(path):
    xml_list = []
    for xml_file in glob.glob(path + '/*.xml'):
        tree = ET.parse(xml_file)
        root = tree.getroot()
        for member in root.findall('object'):
            value = (root.find('filename').text,
                     int(root.find('size')[0].text),
                     int(root.find('size')[1].text),
                     member[0].text,
                     int(member[4][0].text),
                     int(member[4][1].text),
                     int(member[4][2].text),
                     int(member[4][3].text)
                     )
            xml_list.append(value)
    column_name = ['filename', 'width', 'height', 'class', 'xmin', 'ymin', 'xmax', 'ymax']
    xml_df = pd.DataFrame(xml_list, columns=column_name)
    return xml_df


def main():
    image_path = path
    xml_df = xml_to_csv(image_path)
    xml_df.to_csv('zhangjn_train.csv', index=None)
    print('Successfully converted xml to csv.')


main()
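xml_to_csv reads fields by position (member[0], member[4][0], and so on), which works only while the XML keeps the element order LabelImg writes. As a hedged alternative, the sketch below looks the same fields up by tag name. The sample annotation is made up for illustration and is trimmed to the elements the script reads; parse_annotation is my own name.

```python
import xml.etree.ElementTree as ET

# A made-up annotation in the Pascal VOC layout that LabelImg saves.
SAMPLE_XML = """<annotation>
  <filename>1.jpg</filename>
  <size><width>500</width><height>375</height><depth>3</depth></size>
  <object>
    <name>ZhangJN</name>
    <pose>Unspecified</pose>
    <truncated>0</truncated>
    <difficult>0</difficult>
    <bndbox><xmin>120</xmin><ymin>60</ymin><xmax>310</xmax><ymax>290</ymax></bndbox>
  </object>
</annotation>"""


def parse_annotation(root):
    """Return one row per <object>, looking fields up by tag name."""
    filename = root.findtext("filename")
    width = int(root.findtext("size/width"))
    height = int(root.findtext("size/height"))
    rows = []
    for obj in root.findall("object"):
        box = obj.find("bndbox")
        rows.append((filename, width, height, obj.findtext("name"),
                     int(box.findtext("xmin")), int(box.findtext("ymin")),
                     int(box.findtext("xmax")), int(box.findtext("ymax"))))
    return rows


rows = parse_annotation(ET.fromstring(SAMPLE_XML))
print(rows)
```

The rows have the same column order as the CSV above, so they could be passed to pd.DataFrame in the same way.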


# generate_tfrecord.py

# -*- coding: utf-8 -*-

"""
Usage:
  # From tensorflow/models/
  # Create train data:
  python generate_tfrecord.py --csv_input=data/tv_vehicle_labels.csv --output_path=train.record
  # Create test data:
  python generate_tfrecord.py --csv_input=data/test_labels.csv --output_path=test.record
"""

import os
import io
import pandas as pd
import tensorflow as tf

from PIL import Image
from object_detection.utils import dataset_util
from collections import namedtuple, OrderedDict

os.chdir('/home/zzf/tensorflow/models/research/object_detection')

flags = tf.app.flags
flags.DEFINE_string('csv_input', '', 'Path to the CSV input')
flags.DEFINE_string('output_path', '', 'Path to output TFRecord')
FLAGS = flags.FLAGS


# TO-DO replace this with label map
def class_text_to_int(row_label):
    if row_label == 'ZhangJN':     # change to your own label
        return 1
    else:
        return None


def split(df, group):
    data = namedtuple('data', ['filename', 'object'])
    gb = df.groupby(group)
    return [data(filename, gb.get_group(x)) for filename, x in zip(gb.groups.keys(), gb.groups)]


def create_tf_example(group, path):
    with tf.gfile.GFile(os.path.join(path, '{}'.format(group.filename)), 'rb') as fid:
        encoded_jpg = fid.read()
    encoded_jpg_io = io.BytesIO(encoded_jpg)
    image = Image.open(encoded_jpg_io)
    width, height = image.size

    filename = group.filename.encode('utf8')
    image_format = b'jpg'
    xmins = []
    xmaxs = []
    ymins = []
    ymaxs = []
    classes_text = []
    classes = []

    # Box coordinates are stored normalized to the image size.
    for index, row in group.object.iterrows():
        xmins.append(row['xmin'] / width)
        xmaxs.append(row['xmax'] / width)
        ymins.append(row['ymin'] / height)
        ymaxs.append(row['ymax'] / height)
        classes_text.append(row['class'].encode('utf8'))
        classes.append(class_text_to_int(row['class']))

    tf_example = tf.train.Example(features=tf.train.Features(feature={
        'image/height': dataset_util.int64_feature(height),
        'image/width': dataset_util.int64_feature(width),
        'image/filename': dataset_util.bytes_feature(filename),
        'image/source_id': dataset_util.bytes_feature(filename),
        'image/encoded': dataset_util.bytes_feature(encoded_jpg),
        'image/format': dataset_util.bytes_feature(image_format),
        'image/object/bbox/xmin': dataset_util.float_list_feature(xmins),
        'image/object/bbox/xmax': dataset_util.float_list_feature(xmaxs),
        'image/object/bbox/ymin': dataset_util.float_list_feature(ymins),
        'image/object/bbox/ymax': dataset_util.float_list_feature(ymaxs),
        'image/object/class/text': dataset_util.bytes_list_feature(classes_text),
        'image/object/class/label': dataset_util.int64_list_feature(classes),
    }))
    return tf_example


def main(_):
    writer = tf.python_io.TFRecordWriter(FLAGS.output_path)
    path = os.path.join(os.getcwd(), 'images/test')         # change: images/train or images/test
    examples = pd.read_csv(FLAGS.csv_input)
    grouped = split(examples, 'filename')
    for group in grouped:
        tf_example = create_tf_example(group, path)
        writer.write(tf_example.SerializeToString())

    writer.close()
    output_path = os.path.join(os.getcwd(), FLAGS.output_path)
    print('Successfully created the TFRecords: {}'.format(output_path))


if __name__ == '__main__':
    tf.app.run()


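In xml2csv.py, change the os.chdir and path values near the top, and the name of the CSV file written at the end. The same goes for generate_tfrecord.py: change the paths to your own, and edit class_text_to_int so that it handles your own labels. I have only one.

Run both scripts once for the training set and once for the test set to get train.record and test.record.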
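Before building the TFRecord files it can be worth checking the CSV for labeling slips, since a bad box only shows up much later as odd training behavior. This is a minimal sketch under my own assumptions: it uses only the standard csv module, the column names written by xml2csv.py, and a made-up three-row sample; find_bad_boxes is a hypothetical helper name.

```python
import csv
import io


def find_bad_boxes(csv_file):
    """Yield (filename, reason) for rows whose box is empty or outside the image."""
    for row in csv.DictReader(csv_file):
        w, h = int(row["width"]), int(row["height"])
        xmin, ymin = int(row["xmin"]), int(row["ymin"])
        xmax, ymax = int(row["xmax"]), int(row["ymax"])
        if xmin >= xmax or ymin >= ymax:
            yield row["filename"], "empty or inverted box"
        elif xmin < 0 or ymin < 0 or xmax > w or ymax > h:
            yield row["filename"], "box outside image"


# Made-up rows: the second box is inverted, the third runs past the right edge.
SAMPLE = io.StringIO(
    "filename,width,height,class,xmin,ymin,xmax,ymax\n"
    "1.jpg,500,375,ZhangJN,120,60,310,290\n"
    "2.jpg,500,375,ZhangJN,300,60,200,290\n"
    "3.jpg,500,375,ZhangJN,400,60,520,290\n"
)
problems = list(find_bad_boxes(SAMPLE))
print(problems)
```

For a real file, pass an open file object instead of the StringIO sample.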

2. Config file and model

For convenience I moved the csv and record files for train and test out of images and into object_detection/data, so that the object_detection folder has the following layout:

Object-Detection
-data/
--test_labels.csv
--test.record
--train_labels.csv
--train.record
-images/
--test/
---testingimages.jpg
--train/
---testingimages.jpg
--...yourimages.jpg
-training/   # newly created; used shortly for training the model
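A quick way to confirm the layout before moving on is to list whatever is missing. The sketch below is an illustration only; the EXPECTED list is my own reading of the tree above, and missing_paths is a hypothetical helper.

```python
from pathlib import Path

# Relative paths taken from the layout above; adjust to your own file names.
EXPECTED = [
    "data/train.record",
    "data/test.record",
    "images/train",
    "images/test",
    "training",
]


def missing_paths(root, expected=EXPECTED):
    """Return the expected relative paths that do not exist under root."""
    root = Path(root)
    return [rel for rel in expected if not (root / rel).exists()]
```

Running missing_paths('.') from inside the object_detection folder should return an empty list once everything is in place.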


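Next, set up the config file. Under object_detection/samples, find the config file for the model you need.

We can also download a trained model from the official model zoo. We use ssd_mobilenet_v1_coco, so download it first.

Under the object_detection folder, extract ssd_mobilenet_v1_coco_2017_11_17.tar.gz.

Put ssd_mobilenet_v1_coco.config in the training folder, open it in a text editor (I use Sublime Text 3), and make the following changes:

1) Search for PATH_TO_BE_CONFIGURED and replace each occurrence with your own path. Take care not to mix up test and train.

　Note that label_map_path must be identical in the train input reader and the eval input reader at the end of the file.

2) Set num_classes to match your data; in my example it is 1.

3) batch_size was originally 24. I ran out of GPU memory at that setting, so to be safe I changed it to 1. If the same problem appears even at 1, consider a different machine...

4) fine_tune_checkpoint: "ssd_mobilenet_v1_coco_11_06_2017/model.ckpt"
from_detection_checkpoint: true

　　This enables fine-tuning: training starts from the weights of the already trained model, which is much faster than training from scratch. The directory in the path should be the one you actually extracted.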
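The four edits above can also be scripted, which helps when you switch models often. The sketch below is only an illustration: SAMPLE_CONFIG is a heavily shortened stand-in that I wrote for the real config file, and set_field and unresolved_placeholders are hypothetical helpers. set_field rewrites every occurrence of a field, so it suits num_classes, batch_size, and label_map_path, but not input_path, which must differ between the train and eval readers.

```python
import re

# Shortened stand-in for ssd_mobilenet_v1_coco.config (illustration only).
SAMPLE_CONFIG = '''model {
  ssd {
    num_classes: 90
  }
}
train_config: {
  batch_size: 24
  fine_tune_checkpoint: "PATH_TO_BE_CONFIGURED/model.ckpt"
  from_detection_checkpoint: true
}
train_input_reader: {
  tf_record_input_reader {
    input_path: "PATH_TO_BE_CONFIGURED/mscoco_train.record"
  }
  label_map_path: "PATH_TO_BE_CONFIGURED/mscoco_label_map.pbtxt"
}
'''


def set_field(text, name, value):
    """Replace the value of every `name: <value>` entry; fail if none is found."""
    pattern = r"(\b%s:\s*)\S+" % re.escape(name)
    new_text, count = re.subn(pattern, lambda m: m.group(1) + str(value), text)
    if count == 0:
        raise KeyError(name)
    return new_text


def unresolved_placeholders(text):
    """Return the lines that still contain PATH_TO_BE_CONFIGURED."""
    return [line.strip() for line in text.splitlines()
            if "PATH_TO_BE_CONFIGURED" in line]


edited = set_field(SAMPLE_CONFIG, "num_classes", 1)
edited = set_field(edited, "batch_size", 1)
edited = set_field(edited, "fine_tune_checkpoint",
                   '"ssd_mobilenet_v1_coco_11_06_2017/model.ckpt"')
edited = set_field(edited, "label_map_path", '"data/zhangjn.pbtxt"')
leftover = unresolved_placeholders(edited)
print(leftover)  # the input_path line is left for editing by hand
```

Checking for leftover placeholders before training is the useful part; the field rewriting is a convenience and no substitute for reading the config.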

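Now create a text file named zhangjn.pbtxt in the data directory (you can copy an existing .pbtxt file and edit it in a text editor) and write our labels into it. My example has just one. The ids start from 1 and must match the ones assigned earlier in class_text_to_int in generate_tfrecord.py.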

item {
  id: 1
  name: 'ZhangJN'
}
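Because the ids in the label map and in class_text_to_int must agree, one option is to derive both from a single list of class names. This is a sketch of that idea, not part of the API: CLASS_NAMES and label_map_text are my own names, and the class_text_to_int here is a list-driven variant of the function in generate_tfrecord.py.

```python
CLASS_NAMES = ["ZhangJN"]  # the order fixes the ids: the first name gets id 1


def label_map_text(names):
    """Render names as label-map items with ids starting at 1."""
    items = []
    for class_id, name in enumerate(names, start=1):
        items.append("item {\n  id: %d\n  name: '%s'\n}\n" % (class_id, name))
    return "\n".join(items)


def class_text_to_int(row_label, names=CLASS_NAMES):
    """Return the id for a label, or None when the label is unknown."""
    return names.index(row_label) + 1 if row_label in names else None


print(label_map_text(CLASS_NAMES))
```

Writing the returned text to data/zhangjn.pbtxt would give the file shown above, and adding a second class later means editing only CLASS_NAMES.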

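Good, all the data is ready and training can begin.

3. Training the model

I trained on a local GPU (environment: Ubuntu 16.04 LTS). In a terminal, change into the object_detection directory. The latest version uses model_main.py; the older train.py also works and is covered further down. When I ran model_main.py the fan was already roaring, but the terminal showed no step or loss output, which was a little unnerving. A few things need changing first:

Add tf.logging.set_verbosity(tf.logging.INFO) after the imports in model_main.py. The loss is then printed every hundred steps, which is better than nothing: at least you know it is running.
If you train with Python 3, wrap category_index.values() in list() at around line 390 of model_lib.py, giving list(category_index.values()); otherwise a "can't pickle dict_values" error appears.
One more issue: model_main.py merges the old train.py and eval.py, so a badly chosen number of eval examples produces warnings such as: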

WARNING:tensorflow:Ignoring ground truth with image id 558212937 since it was previously added

　　因此在config文件設置時，eval部分的 num_examples （以下）和運行設置參數--num_eval_steps 時任何一個值只要比你數據集中訓練圖片數目要大就會出現警告，由於它沒那麼多圖片來評估，因此這兩個值直接設置成訓練圖片數量就行了。

eval_config: {
  num_examples: 20
  # Note: The below line limits the evaluation process to 10 evaluations.
  # Remove the below line to evaluate indefinitely.
  max_evals: 10
}
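Rather than counting by hand, num_examples can be read off the evaluation CSV: generate_tfrecord.py groups rows by filename, so the number of distinct filenames is the number of records written. A minimal sketch, assuming the column names from xml2csv.py; count_images and the sample rows are made up for illustration.

```python
import csv
import io


def count_images(csv_file):
    """Count distinct filenames in a CSV that has one row per bounding box."""
    return len({row["filename"] for row in csv.DictReader(csv_file)})


# Made-up sample: two boxes on 1.jpg and one on 2.jpg, so two images in total.
SAMPLE = io.StringIO(
    "filename,width,height,class,xmin,ymin,xmax,ymax\n"
    "1.jpg,500,375,ZhangJN,120,60,310,290\n"
    "1.jpg,500,375,ZhangJN,10,20,90,140\n"
    "2.jpg,500,375,ZhangJN,30,60,200,290\n"
)
num_examples = count_images(SAMPLE)
print(num_examples)
```

Counting rows instead would overestimate whenever a picture carries more than one box.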

Then enter in the terminal:

python3 model_main.py \
    --pipeline_config_path=training/ssd_mobilenet_v1_coco.config \
    --model_dir=training \
    --num_train_steps=60000 \
    --num_eval_steps=20 \
    --alsologtostderr

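If all is well, after a short wait you will hear the fan speed up, which means training is under way. At the end, model_main.py also creates an export folder that even contains a saved_model.pb. I have not tested whether this is the file we need later; try it if you are curious.

If you would rather not make those changes, use the older train.py, located at legacy/train.py, and run it in the same way: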

python3 legacy/train.py --logtostderr --train_dir=training/ --pipeline_config_path=training/ssd_mobilenet_v1_coco.config

and training starts.

Open another terminal, change into the object_detection directory as well, and enter:

tensorboard --logdir=training

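We can now open TensorBoard in a browser to follow training progress; it keeps picking up newly written training data.

After training has run for a while, the training folder will contain saved model checkpoints, and we can generate the model file we need. In a terminal in the object_detection directory, enter: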

python3 export_inference_graph.py --input_type image_tensor --pipeline_config_path training/ssd_mobilenet_v1_coco.config --trained_checkpoint_prefix training/model.ckpt-3737 --output_directory zhangjn_detction

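Here the number after trained_checkpoint_prefix must be changed to the step your own training reached, and output_directory is where the exported model is stored; I created a new folder, zhangjn_detction. When the command finishes, that folder contains several files, including saved_model, checkpoint, and frozen_inference_graph.pb. The file ending in .pb is the all-important frozen model. The small demo in the previous post used one, and it is what we test with next.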
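Finding the right number for trained_checkpoint_prefix means looking through the training folder. The sketch below automates that under one assumption: each saved checkpoint has a file named model.ckpt-<step>.index in the folder. latest_checkpoint_prefix is my own helper name; TensorFlow's tf.train.latest_checkpoint serves a similar purpose by reading the checkpoint file.

```python
import re
from pathlib import Path


def latest_checkpoint_prefix(train_dir):
    """Return 'train_dir/model.ckpt-N' for the highest N that has an .index file."""
    steps = []
    for path in Path(train_dir).glob("model.ckpt-*.index"):
        match = re.fullmatch(r"model\.ckpt-(\d+)\.index", path.name)
        if match:
            steps.append(int(match.group(1)))
    if not steps:
        return None
    # Compare as integers: as strings, '900' would sort after '3737'.
    return str(Path(train_dir) / ("model.ckpt-%d" % max(steps)))
```

The returned string can be passed directly as the --trained_checkpoint_prefix value.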

4. Testing the model

Open object_detection_tutorial.ipynb in the object_detection directory, or convert it to a Python file named object_detection_tutorial.py, and make a few changes to test the model.

# coding: utf-8

# # Object Detection Demo
# Welcome to the object detection inference walkthrough!  This notebook will walk you step by step through the process of using a pre-trained model to detect objects in an image. Make sure to follow the [installation instructions](https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/installation.md) before you start.


from distutils.version import StrictVersion
import numpy as np
import os
import six.moves.urllib as urllib
import sys
import tarfile
import tensorflow as tf
import zipfile

from collections import defaultdict
from io import StringIO
from matplotlib import pyplot as plt
from PIL import Image

# This is needed since the notebook is stored in the object_detection folder.
sys.path.append("..")
from object_detection.utils import ops as utils_ops

# if StrictVersion(tf.__version__) < StrictVersion('1.9.0'):
#   raise ImportError('Please upgrade your TensorFlow installation to v1.9.* or later!')


# ## Env setup

# In[2]:


# This is needed to display the images.
# get_ipython().magic(u'matplotlib inline')


# ## Object detection imports
# Here are the imports from the object detection module.



from utils import label_map_util

from utils import visualization_utils as vis_util


# # Model preparation 

# ## Variables
# 
# Any model exported using the `export_inference_graph.py` tool can be loaded here simply by changing `PATH_TO_FROZEN_GRAPH` to point to a new .pb file.  
# 
# By default we use an "SSD with Mobilenet" model here. See the [detection model zoo](https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/detection_model_zoo.md) for a list of other models that can be run out-of-the-box with varying speeds and accuracies.

# In[4]:


# What model to download.
MODEL_NAME = 'zhangjn_detction'
# MODEL_FILE = MODEL_NAME + '.tar.gz'
# DOWNLOAD_BASE = 'http://download.tensorflow.org/models/object_detection/'

# Path to frozen detection graph. This is the actual model that is used for the object detection.
PATH_TO_FROZEN_GRAPH = MODEL_NAME + '/frozen_inference_graph.pb'

# List of the strings that is used to add correct label for each box.
PATH_TO_LABELS = os.path.join('data', 'zhangjn.pbtxt')

NUM_CLASSES = 1


# ## Download Model



# opener = urllib.request.URLopener()
# opener.retrieve(DOWNLOAD_BASE + MODEL_FILE, MODEL_FILE)
'''
tar_file = tarfile.open(MODEL_FILE)
for file in tar_file.getmembers():
  file_name = os.path.basename(file.name)
  if 'frozen_inference_graph.pb' in file_name:
    tar_file.extract(file, os.getcwd())
'''

# ## Load a (frozen) Tensorflow model into memory.



detection_graph = tf.Graph()
with detection_graph.as_default():
  od_graph_def = tf.GraphDef()
  with tf.gfile.GFile(PATH_TO_FROZEN_GRAPH, 'rb') as fid:
    serialized_graph = fid.read()
    od_graph_def.ParseFromString(serialized_graph)
    tf.import_graph_def(od_graph_def, name='')


# ## Loading label map
# Label maps map indices to category names, so that when our convolution network predicts `5`, we know that this corresponds to `airplane`.  Here we use internal utility functions, but anything that returns a dictionary mapping integers to appropriate string labels would be fine



label_map = label_map_util.load_labelmap(PATH_TO_LABELS)
categories = label_map_util.convert_label_map_to_categories(label_map, max_num_classes=NUM_CLASSES, use_display_name=True)
category_index = label_map_util.create_category_index(categories)


# ## Helper code

# In[8]:


def load_image_into_numpy_array(image):
  (im_width, im_height) = image.size
  return np.array(image.getdata()).reshape(
      (im_height, im_width, 3)).astype(np.uint8)


# # Detection



# For the sake of simplicity we will use only 2 images:
# image1.jpg
# image2.jpg
# If you want to test the code with your images, just add path to the images to the TEST_IMAGE_PATHS.
PATH_TO_TEST_IMAGES_DIR = 'test_images'
TEST_IMAGE_PATHS = [ os.path.join(PATH_TO_TEST_IMAGES_DIR, 'image{}.jpg'.format(i)) for i in range(3, 8) ]

# Size, in inches, of the output images.
IMAGE_SIZE = (12, 8)


# In[10]:


def run_inference_for_single_image(image, graph):
  with graph.as_default():
    with tf.Session() as sess:
      # Get handles to input and output tensors
      ops = tf.get_default_graph().get_operations()
      all_tensor_names = {output.name for op in ops for output in op.outputs}
      tensor_dict = {}
      for key in [
          'num_detections', 'detection_boxes', 'detection_scores',
          'detection_classes', 'detection_masks'
      ]:
        tensor_name = key + ':0'
        if tensor_name in all_tensor_names:
          tensor_dict[key] = tf.get_default_graph().get_tensor_by_name(
              tensor_name)
      if 'detection_masks' in tensor_dict:
        # The following processing is only for single image
        detection_boxes = tf.squeeze(tensor_dict['detection_boxes'], [0])
        detection_masks = tf.squeeze(tensor_dict['detection_masks'], [0])
        # Reframe is required to translate mask from box coordinates to image coordinates and fit the image size.
        real_num_detection = tf.cast(tensor_dict['num_detections'][0], tf.int32)
        detection_boxes = tf.slice(detection_boxes, [0, 0], [real_num_detection, -1])
        detection_masks = tf.slice(detection_masks, [0, 0, 0], [real_num_detection, -1, -1])
        detection_masks_reframed = utils_ops.reframe_box_masks_to_image_masks(
            detection_masks, detection_boxes, image.shape[0], image.shape[1])
        detection_masks_reframed = tf.cast(
            tf.greater(detection_masks_reframed, 0.5), tf.uint8)
        # Follow the convention by adding back the batch dimension
        tensor_dict['detection_masks'] = tf.expand_dims(
            detection_masks_reframed, 0)
      image_tensor = tf.get_default_graph().get_tensor_by_name('image_tensor:0')

      # Run inference
      output_dict = sess.run(tensor_dict,
                             feed_dict={image_tensor: np.expand_dims(image, 0)})

      # all outputs are float32 numpy arrays, so convert types as appropriate
      output_dict['num_detections'] = int(output_dict['num_detections'][0])
      output_dict['detection_classes'] = output_dict[
          'detection_classes'][0].astype(np.uint8)
      output_dict['detection_boxes'] = output_dict['detection_boxes'][0]
      output_dict['detection_scores'] = output_dict['detection_scores'][0]
      if 'detection_masks' in output_dict:
        output_dict['detection_masks'] = output_dict['detection_masks'][0]
  return output_dict


# In[ ]:


for image_path in TEST_IMAGE_PATHS:
  image = Image.open(image_path)
  # the array based representation of the image will be used later in order to prepare the
  # result image with boxes and labels on it.
  image_np = load_image_into_numpy_array(image)
  # Expand dimensions since the model expects images to have shape: [1, None, None, 3]
  image_np_expanded = np.expand_dims(image_np, axis=0)
  # Actual detection.
  output_dict = run_inference_for_single_image(image_np, detection_graph)
  # Visualization of the results of a detection.
  vis_util.visualize_boxes_and_labels_on_image_array(
      image_np,
      output_dict['detection_boxes'],
      output_dict['detection_classes'],
      output_dict['detection_scores'],
      category_index,
      instance_masks=output_dict.get('detection_masks'),
      use_normalized_coordinates=True,
      line_thickness=8)
  plt.figure(figsize=IMAGE_SIZE)
  plt.imshow(image_np)
  plt.show()
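The script above only draws the boxes. To use the detections in code, for instance to crop the faces, the entries of output_dict can be filtered by score and converted to pixel coordinates. This is a minimal sketch under stated assumptions: boxes are normalized and ordered [ymin, xmin, ymax, xmax], as I understand the API's output to be; the 0.5 threshold is my own choice; boxes_above_threshold is a hypothetical helper.

```python
def boxes_above_threshold(output_dict, image_width, image_height, min_score=0.5):
    """Return (class_id, score, (left, top, right, bottom)) tuples in pixels."""
    results = []
    n = output_dict["num_detections"]
    for box, cls, score in zip(output_dict["detection_boxes"][:n],
                               output_dict["detection_classes"][:n],
                               output_dict["detection_scores"][:n]):
        if score < min_score:
            continue
        ymin, xmin, ymax, xmax = box  # assumed normalized, in this order
        results.append((int(cls), float(score),
                        (int(round(xmin * image_width)),
                         int(round(ymin * image_height)),
                         int(round(xmax * image_width)),
                         int(round(ymax * image_height)))))
    return results
```

Inside the loop over TEST_IMAGE_PATHS it could be called as boxes_above_threshold(output_dict, image.size[0], image.size[1]).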


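1) Since the model does not need to be downloaded, the download code can be removed. Change MODEL_NAME, PATH_TO_LABELS, and NUM_CLASSES to your own values and delete the Download Model section.

2) For test pictures, put a few into the test_images folder, named image plus a number with a .jpg extension (image3.jpg, for example), so that the rest of the code needs no change. In the line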

TEST_IMAGE_PATHS = [ os.path.join(PATH_TO_TEST_IMAGES_DIR, 'image{}.jpg'.format(i)) for i in range(3, 8) ]

change the range to match the numbers of your own pictures. Mine are numbered 3 to 7, hence range(3, 8).
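If the numbers have gaps, or you would rather not edit the range each time, the list can be built from whatever files are present. This is a sketch under the same naming assumption (image plus a number, .jpg); numbered_image_paths is a hypothetical helper, not part of the tutorial script.

```python
import re
from pathlib import Path


def numbered_image_paths(directory):
    """Return paths named image<number>.jpg, sorted by that number."""
    found = []
    for path in Path(directory).glob("image*.jpg"):
        match = re.fullmatch(r"image(\d+)\.jpg", path.name)
        if match:
            found.append((int(match.group(1)), str(path)))
    # Sort numerically so that image10.jpg comes after image7.jpg.
    return [p for _, p in sorted(found)]
```

TEST_IMAGE_PATHS = numbered_image_paths(PATH_TO_TEST_IMAGES_DIR) would then replace the list comprehension.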

If you run it as a Python file, add one line at the end so that the pictures are displayed:

plt.show()

Then just run it:

python3 object_detection_tutorial.py

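That is the whole training workflow, and once you are familiar with it, it is fairly simple. You may hit assorted problems along the way, many of them caused by version differences. The most annoying thing about TensorFlow is that releases come quickly, change a lot, and are sometimes incompatible with each other. So do not be alarmed by errors: search Google or Baidu and you can usually find an answer. If the cause is a version mismatch and you cannot upgrade right away, work out which difference between your version and the latest one is responsible, then change the calls in the code to the form your version supports.

When I was on 1.4 I ran into this constantly. For example, tf.contrib.data.parallel_interleave() exists in recent versions, but tf.contrib.data in 1.4 has no such function. Likewise, in 1.10 the tf.keras.Model() class can also be reached as tf.keras.models.Model(), but 1.4 offers only the latter, so a program written with the former has to be edited by hand before it runs on 1.4. After a while I upgraded TensorFlow to 1.10 anyway, because I was making more changes than I could put up with. Upgrading is a bit of a chore, since the NVIDIA driver, CUDA, and cuDNN all have to change too. Fortunately I knew the routine this time and had it done in three or four hours.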
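When code has to branch on the installed version, note that comparing version strings directly gives the wrong answer for 1.4 versus 1.10, because '4' sorts after '1' as text. The sketch below compares numeric tuples instead. version_tuple is my own helper; it looks only at the first three dot-separated parts and ignores suffixes such as rc1.

```python
import re


def version_tuple(version):
    """Turn '1.10.0' into (1, 10, 0), keeping only the leading digits of each part."""
    parts = []
    for piece in version.split(".")[:3]:
        match = re.match(r"\d+", piece)
        parts.append(int(match.group()) if match else 0)
    return tuple(parts)


# As plain strings the comparison is misleading; as tuples it is numeric.
print("1.4.0" < "1.10.0")
print(version_tuple("1.4.0") < version_tuple("1.10.0"))
```

In a real script the argument would be tf.__version__, the same value used by the commented-out check in the test script.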
