Recognizing Objects in Video Frames with TensorFlow

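This tutorial sets up a video object recognition system on Windows 10 using the TensorFlow Object Detection API released by Google. Users on other platforms can adapt the steps.

It consolidates material from several online sources (listed in the references at the end) and aims to serve as a quick guide to setting up the environment and getting video object recognition working. For more about the API itself, please search online.

Note: the Windows user name must not contain Chinese characters!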

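Installing Python

Note: at the time of writing, TensorFlow on Windows supports only Python 3.5.x.
Go to the Python 3.5.2 download page, pick the Windows installer from the Files section, then download and install it.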
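
Before going further, it can help to confirm which Python version cmd actually launches. The following is a minimal sketch, assuming the 3.5.x requirement stated in the note above; the helper name is_supported is hypothetical and not part of any library.

```python
import sys

# Assumption: the tutorial targets Python 3.5.x, as stated in the note above.
REQUIRED = (3, 5)


def is_supported(version_info=sys.version_info, required=REQUIRED):
    """Return True when the major/minor version matches the required pair."""
    return tuple(version_info[:2]) == tuple(required)


if __name__ == "__main__":
    # sys.version starts with the dotted version number, e.g. "3.5.2"
    print("Python", sys.version.split()[0], "supported:", is_supported())
```

If this reports an unsupported version, cmd is probably picking up a different Python installation than the one just installed.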

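Installing TensorFlow

Go to the TensorFlow on Windows download page. This tutorial uses the simplest combination: CPU support only + Native pip.

Open cmd and enter the following command to download and install TensorFlow. It is installed to python\Lib\site-packages\tensorflow:

Open IDLE and enter the following commands:

If the following output appears, the installation succeeded:

If something goes wrong, see the common problems section at the bottom of the TensorFlow on Windows download page.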

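Installing Protoc

Protoc compiles the .proto files the API needs at run time. Go to the Protoc download page and download the archive whose name contains win32, similar to the one shown below.

After extracting it, copy protoc.exe from the bin folder into c:\windows\system32. That directory is already on the PATH environment variable, so protoc can then be run from any directory.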
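
To confirm that the copy worked, one option is to look protoc up on PATH from Python. This is a minimal sketch using only the standard library; find_tool is a hypothetical helper name.

```python
import shutil
import subprocess


def find_tool(name):
    """Return the full path of an executable found on PATH, or None."""
    return shutil.which(name)


if __name__ == "__main__":
    path = find_tool("protoc")
    if path is None:
        print("protoc not found on PATH")
    else:
        print("protoc found at", path)
        # --version is a standard protoc flag that prints the compiler version
        subprocess.call([path, "--version"])
```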

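Installing git

Download git for Windows from the official git website. For detailed installation and configuration notes, see the git article listed in the references.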

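Installing other components

Enter the following command in cmd to download and install the packages the API depends on:

Note: Native pip is affected by other Python installations on the machine. The author had previously installed Anaconda for simulation work, so jupyter and the other packages ended up in Anaconda's site-packages folder, and later calls to them failed.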
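
A quick way to spot the problem described in the note is to print which interpreter is running and which packages it can see. The sketch below assumes a package list for illustration only; adjust it to whatever was actually installed in this step. missing_packages is a hypothetical helper.

```python
import importlib.util
import sys


def missing_packages(names):
    """Return the top-level package names the current interpreter cannot import."""
    return [n for n in names if importlib.util.find_spec(n) is None]


if __name__ == "__main__":
    # The interpreter path shows whether cmd picked up Anaconda or the standalone Python
    print("Interpreter:", sys.executable)
    # Hypothetical package list, not taken from the original command
    wanted = ["tensorflow", "jupyter", "matplotlib", "PIL", "lxml"]
    print("Missing:", missing_packages(wanted))
```

If the interpreter path points into an Anaconda folder, pip and python are probably resolving to that installation rather than the Python 3.5 one set up earlier.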

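Downloading and compiling the code

Enter the following command in cmd:

This downloads Google's tensorflow/models code from GitHub. By default it usually ends up on the C: drive.

Then, still in cmd, change into the models folder and compile the Object Detection API code: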
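
On Windows, cmd does not expand wildcards such as *.proto itself, and some protoc builds do not either, so one option is to list the files explicitly. The following is a minimal sketch, assuming the proto definitions live under object_detection/protos inside the models folder. That layout, the clone location "models", and the helper name build_protoc_command are assumptions rather than details taken from this tutorial.

```python
import glob
import os
import subprocess


def build_protoc_command(models_dir,
                         protos_subdir=os.path.join("object_detection", "protos")):
    """Build a protoc command that names every .proto file explicitly."""
    pattern = os.path.join(models_dir, protos_subdir, "*.proto")
    # Paths are made relative to models_dir, where protoc will be run
    protos = sorted(os.path.relpath(p, models_dir) for p in glob.glob(pattern))
    return ["protoc", "--python_out=."] + protos


if __name__ == "__main__":
    models_dir = "models"  # assumed clone location
    cmd = build_protoc_command(models_dir)
    if len(cmd) > 2:  # only run when at least one .proto file was found
        subprocess.check_call(cmd, cwd=models_dir)
    else:
        print("No .proto files found under", models_dir)
```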

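Running the notebook demo

Still in the models folder, run the following command:

The browser opens automatically and shows the following page:

Open object_detection_tutorial.ipynb in the object_detection folder:

Click Run All in the Cell menu and wait about three minutes (the author's computer is close to retirement). The following result appears:

Change the file paths to run detection on your own images:

Note: image file names must match the pattern the code expects, for example image1.jpg.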
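
The naming rule can be illustrated with a short sketch of how such a list of paths might be generated. The directory name test_images and the helper build_image_paths are assumptions for illustration; check the corresponding cell in your copy of the notebook for the actual names.

```python
import os

# Assumed folder name; the notebook cell may use a different one
PATH_TO_TEST_IMAGES_DIR = "test_images"


def build_image_paths(count, directory=PATH_TO_TEST_IMAGES_DIR):
    """Return paths image1.jpg .. image<count>.jpg inside the given directory."""
    return [os.path.join(directory, "image{}.jpg".format(i))
            for i in range(1, count + 1)]


if __name__ == "__main__":
    for path in build_image_paths(3):
        # A file whose name does not follow the pattern would never be picked up
        print(path, "exists:", os.path.isfile(path))
```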

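The TensorFlow Object Detection API provides five ready-to-use detection models. The default is the simplest one, ssd + mobilenet.

To use one of the other models, set MODEL_NAME to one of the following values: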

MODEL_NAME = 'ssd_inception_v2_coco_11_06_2017'

MODEL_NAME = 'rfcn_resnet101_coco_11_06_2017'

MODEL_NAME = 'faster_rcnn_resnet101_coco_11_06_2017'

MODEL_NAME = 'faster_rcnn_inception_resnet_v2_atrous_coco_11_06_2017'
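
Changing MODEL_NAME matters because other values in the notebook are derived from it. The sketch below shows the idea, assuming the download base URL and the frozen graph file name used by the demo notebook at the time; treat both as assumptions and compare them with your copy. model_files is a hypothetical helper.

```python
# Assumed download location for the pretrained model archives
DOWNLOAD_BASE = "http://download.tensorflow.org/models/object_detection/"


def model_files(model_name):
    """Derive the archive URL and frozen graph path from a model name."""
    return {
        "archive_url": DOWNLOAD_BASE + model_name + ".tar.gz",
        "frozen_graph": model_name + "/frozen_inference_graph.pb",
    }


if __name__ == "__main__":
    info = model_files("rfcn_resnet101_coco_11_06_2017")
    print(info["archive_url"])
    print(info["frozen_graph"])
```

Because the archive is fetched by name, a typo in MODEL_NAME typically shows up as a failed download rather than as a detection error.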

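With the model switched to faster_rcnn_inception_resnet, the result is as follows:

Accuracy improves considerably, but speed drops: on the author's aging machine it takes five minutes to produce a result.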

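Video object recognition

Google has published the complete code for this project on GitHub. Next we add a few modules to the existing code to recognize objects in video.

Step 1: Download the cv2 package for OpenCV

The OpenCV library for Python can be downloaded from the official Python website.

The author installed the following version:

Once the download finishes, run the install command in cmd: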

pip install opencv_python-3.2.0.8-cp35-cp35m-win_amd64.whl

After installation, open IDLE and enter:

import cv2

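If no error is reported, the opencv-python library was imported successfully and the environment is ready.

Step 2: Import the cv2 package in the original code

Step 3: Add the video recognition code
The main steps are:
1. Use VideoFileClip to extract frames from the video.
2. Use fl_image to replace each original frame with its modified version; this is how every extracted frame is passed through object detection.
3. All the modified frames are combined into a new video.

Building on the original code, append the following snippets at the end, in order. They can be copied from the complete code, but that requires some changes; alternatively, copy the already modified code below: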

# Import everything needed to edit/save/watch video clips
import imageio
imageio.plugins.ffmpeg.download()

from moviepy.editor import VideoFileClip
from IPython.display import HTML

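This step downloads ffmpeg.win32.exe, a program required for video editing. The download tends to drop on an internal network; in that case, fetch the file with a download tool and place it at the following path:

C:\Users\<user name>\AppData\Local\imageio\ffmpeg\ffmpeg.win32.exe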
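
To check whether the file is already in place, the path can be built and tested from Python. This is a minimal sketch that follows the path given above; the user name "alice" and the helper ffmpeg_cache_path are hypothetical.

```python
import ntpath
import os


def ffmpeg_cache_path(user_name):
    """Build the Windows path where the tutorial expects ffmpeg.win32.exe."""
    return ntpath.join("C:\\Users", user_name, "AppData", "Local",
                       "imageio", "ffmpeg", "ffmpeg.win32.exe")


if __name__ == "__main__":
    path = ffmpeg_cache_path("alice")  # hypothetical user name
    # On a non-Windows machine this check simply reports False
    print(path, "present:", os.path.isfile(path))
```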

def detect_objects(image_np, sess, detection_graph):
    # Expand dimensions since the model expects images to have shape: [1, None, None, 3]
    image_np_expanded = np.expand_dims(image_np, axis=0)
    image_tensor = detection_graph.get_tensor_by_name('image_tensor:0')

    # Each box represents a part of the image where a particular object was detected.
    boxes = detection_graph.get_tensor_by_name('detection_boxes:0')

    # Each score represents the level of confidence for the detected object.
    # The score is shown on the result image, together with the class label.
    scores = detection_graph.get_tensor_by_name('detection_scores:0')
    classes = detection_graph.get_tensor_by_name('detection_classes:0')
    num_detections = detection_graph.get_tensor_by_name('num_detections:0')

    # Actual detection.
    (boxes, scores, classes, num_detections) = sess.run(
        [boxes, scores, classes, num_detections],
        feed_dict={image_tensor: image_np_expanded})

    # Visualization of the results of a detection.
    vis_util.visualize_boxes_and_labels_on_image_array(
        image_np,
        np.squeeze(boxes),
        np.squeeze(classes).astype(np.int32),
        np.squeeze(scores),
        category_index,
        use_normalized_coordinates=True,
        line_thickness=8)
    return image_np

Processing each frame

def process_image(image):
    # NOTE: the returned frame must be a color image (3 channels) for the video processing below.
    # Return the final output: the frame with detection boxes and labels drawn on it.
    with detection_graph.as_default():
        with tf.Session(graph=detection_graph) as sess:
            image_process = detect_objects(image, sess, detection_graph)
            return image_process
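
Note that process_image opens a new tf.Session for every frame, which adds overhead on each call. One possible alternative is to open the session once and reuse it. The sketch below shows only the reuse pattern with plain Python; make_frame_processor, open_session, and detect are hypothetical names, and the commented usage line assumes the detect_objects function and detection_graph from the code above.

```python
def make_frame_processor(open_session, detect):
    """Return a per-frame callback that opens the session once and reuses it."""
    state = {"session": None}

    def process(frame):
        if state["session"] is None:
            # The setup cost is paid only on the first frame
            state["session"] = open_session()
        return detect(frame, state["session"])

    return process


# Possible usage with the tutorial's code (assumption, not tested here):
# process_image = make_frame_processor(
#     lambda: tf.Session(graph=detection_graph),
#     lambda image, sess: detect_objects(image, sess, detection_graph))
```

The trade-off is that the session stays open for the lifetime of the notebook kernel unless it is closed explicitly.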

Specifying the input video

white_output = 'video1_out.mp4'
# Process only seconds 25 to 30 of the input video
clip1 = VideoFileClip("video1.mp4").subclip(25, 30)
white_clip = clip1.fl_image(process_image)  # NOTE: this function expects color images!
%time white_clip.write_videofile(white_output, audio=False)

Here video1.mp4 has already been uploaded to the object_detection folder, and subclip(25, 30) means that only the segment from 25 s to 30 s of the video is processed.
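
Keeping the subclip short matters because detection runs once per frame. A rough estimate of the workload can be sketched as follows; the frame rate and per-frame time in the example are assumed values for illustration, not measurements from this tutorial, and both helper names are hypothetical.

```python
def frames_in_subclip(start_s, end_s, fps):
    """Number of frames in the segment from start_s to end_s at the given frame rate."""
    return int(round((end_s - start_s) * fps))


def estimated_seconds(start_s, end_s, fps, seconds_per_frame):
    """Rough total processing time if each frame takes seconds_per_frame."""
    return frames_in_subclip(start_s, end_s, fps) * seconds_per_frame


if __name__ == "__main__":
    # Assumed values: a 30 fps source and 2 seconds of detection time per frame
    print(frames_in_subclip(25, 30, 30), "frames")
    print(estimated_seconds(25, 30, 30, 2.0), "seconds (estimate)")
```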

Original video:

Displaying the processed video:

from moviepy.editor import *
clip1 = VideoFileClip("video1_out.mp4")
clip1.write_gif("final.gif")

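This exports the processed video as a GIF and saves it to the object_detection folder.

That concludes this quick tutorial. You should now be able to run video object recognition with the API Google has released.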

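References

Zhihu: He Zhiyuan's answer to "How good is Google's newly released TensorFlow Object Detection API?"
Lin Junyu's blog: importing the opencv-python library
myboyliu2007's column: how to install ffmpeg
Chen Qiang: a detailed guide to installing Protocol Buffers
Synced (機器之心): how to build a video object recognition system with the TensorFlow API
Installing git on Windows and configuring environment variables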

Original author: withzheng. Original link: https://blog.csdn.net/xiaoxiao123jun/article/details/76605928