Object Detection in 6 Steps Using Detectron2

Date: 2020-10-26


Author | Aakarsh Yelisetty
Compiled by | Flin
Source | towardsdatascience

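Let's see how to use Detectron 2 from FAIR (Facebook AI Research) for instance detection on a custom dataset that involves text recognition.

Have you ever tried to train an object detection model from scratch on a custom dataset of your own choice?

If so, you know how tedious the process can be. If we opt for a region-proposal-based method such as Faster R-CNN, we need to build the model from a Feature Pyramid Network combined with a Region Proposal Network; alternatively, we can use single-shot detectors such as SSD and YOLO.

Either one is fairly complicated to implement from scratch. We need a framework in which we can use state-of-the-art models such as Fast, Faster, and Mask R-CNN. Still, it is worth building a model from scratch at least once to understand the math behind it.

If we want to train an object detection model quickly on a custom dataset, Detectron 2 can help. All the models in the Detectron 2 model zoo are pre-trained on the COCO dataset. We only need to fine-tune a pre-trained model on our custom dataset.

Detectron 2 is a complete rewrite of the first Detectron, which was released in 2018. Its predecessor was written in Caffe2, a deep learning framework that was also backed by Facebook. Both Caffe2 and Detectron are now deprecated. Caffe2 is now part of PyTorch, and Detectron's successor, Detectron 2, is written entirely in PyTorch.

Detectron2 aims to advance machine learning by offering fast training and by addressing the problems companies face when moving from research to production.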

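Below are the types of object detection models that Detectron 2 offers.

Let's dive straight into instance detection.

Instance detection means classifying and localizing objects with bounding boxes. In this article, we use a Faster R-CNN model from the Detectron 2 model zoo to identify the language of text in images.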

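Note that we restrict the languages to two.

We identify Hindi and English text, and provide a class named "Others" for any other language.

We will implement a model that produces output in this form.

Let's get started!

With Detectron 2, object detection can be performed on any custom dataset in six steps. All of these steps are available in this Google Colab Notebook, which you can run right away!

Google Colab makes this easy because we can use a GPU for faster training.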

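Step 1: Install Detectron 2

Start by installing some dependencies, such as Torch Vision and the COCO API, then check whether CUDA is available. The torch.cuda package reports whether a GPU can be used and keeps track of the currently selected GPU. Then install Detectron2.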

# install dependencies: 
!pip install -U torch==1.5 torchvision==0.6 -f https://download.pytorch.org/whl/cu101/torch_stable.html
!pip install cython pyyaml==5.1
!pip install -U 'git+https://github.com/cocodataset/cocoapi.git#subdirectory=PythonAPI'
import torch, torchvision
print(torch.__version__, torch.cuda.is_available())
!gcc --version
# install detectron2:
!pip install detectron2==0.1.3 -f https://dl.fbaipublicfiles.com/detectron2/wheels/cu101/torch1.5/index.html

Step 2: Prepare and Register the Dataset

Import the necessary packages.

# You may need to restart your runtime prior to this, to let your installation take effect
import detectron2
from detectron2.utils.logger import setup_logger
setup_logger()

# import some common libraries
import numpy as np
import cv2
import random
from google.colab.patches import cv2_imshow

# import some common detectron2 utilities
from detectron2 import model_zoo
from detectron2.engine import DefaultPredictor
from detectron2.config import get_cfg
from detectron2.utils.visualizer import Visualizer
from detectron2.data import MetadataCatalog

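The datasets that detectron2 supports out of the box are listed under built-in datasets. To use a custom dataset while reusing detectron2's data loaders, you need to register it (that is, tell detectron2 how to obtain your dataset).

Built-in datasets: https://detectron2.readthedocs.io/tutorials/builtin_datasets.html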

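We use a text detection dataset with three classes:

English
Hindi
Others

We will train the text detection model starting from an existing model pre-trained on the COCO dataset, which is available in detectron2's model zoo.

If you are interested in the conversion from the raw dataset format to the format Detectron 2 accepts, see: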

https://colab.research.google.com/drive/1q-gwQteO79r8sX59oYnHYCNtP9zXWFPN?usp=sharing

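How do we feed data to the model? Input data must follow a defined format such as YOLO, PASCAL VOC, or COCO. Detectron2 accepts datasets in COCO format. A COCO-format dataset consists of a JSON file containing all the details of each image, such as its size, its annotations (i.e., bounding box coordinates), and the label for each bounding box. For example,

This is what an image looks like in JSON format. Bounding boxes can be represented in several formats, and the format must be a member of Detectron2's structures.BoxMode. There are 5 such formats, but currently only BoxMode.XYXY_ABS and BoxMode.XYWH_ABS are supported.

We use the second format. (X, Y) is the top-left corner of the bounding box, and W and H are its width and height. category_id refers to the class the bounding box belongs to.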
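As a rough illustration of the two supported box modes, the sketch below builds one hypothetical dataset record and converts its XYWH_ABS box to XYXY_ABS by hand. The file name, image size, and box values are made up rather than taken from the dataset, and `xywh_to_xyxy` is an illustrative helper name, not a Detectron2 function.

```python
# One hypothetical record; all values are invented for illustration.
record = {
    "file_name": "example.jpg",
    "height": 480,
    "width": 640,
    "annotations": [
        {"bbox": [100, 50, 200, 80], "category_id": 1},  # [x, y, w, h]
    ],
}

def xywh_to_xyxy(box):
    """Convert [x, y, w, h] (top-left corner plus size) to [x0, y0, x1, y1]."""
    x, y, w, h = box
    return [x, y, x + w, y + h]

corner_boxes = [xywh_to_xyxy(a["bbox"]) for a in record["annotations"]]
print(corner_boxes)  # [[100, 50, 300, 130]]
```

Both forms describe the same rectangle; only the meaning of the last two numbers differs, which is why each annotation has to state its bbox_mode.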

Next, we need to register our dataset.

import json
from detectron2.structures import BoxMode
def get_board_dicts(imgdir):
    json_file = imgdir+"/dataset.json" #Fetch the json file
    with open(json_file) as f:
        dataset_dicts = json.load(f)
    for i in dataset_dicts:
        filename = i["file_name"] 
        i["file_name"] = imgdir+"/"+filename 
        for j in i["annotations"]:
            j["bbox_mode"] = BoxMode.XYWH_ABS #Setting the required Box Mode
            j["category_id"] = int(j["category_id"])
    return dataset_dicts
from detectron2.data import DatasetCatalog, MetadataCatalog
#Registering the Dataset
for d in ["train", "val"]:
    DatasetCatalog.register("boardetect_" + d, lambda d=d: get_board_dicts("Text_Detection_Dataset_COCO_Format/" + d))
    MetadataCatalog.get("boardetect_" + d).set(thing_classes=["HINDI","ENGLISH","OTHER"])
board_metadata = MetadataCatalog.get("boardetect_train")
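The `lambda d=d:` in the registration loop deserves a note. DatasetCatalog.register stores a function and calls it later, and a plain `lambda:` would look up `d` only at call time, after the loop has moved on. A minimal sketch of the difference in plain Python, using a hypothetical `fake_loader` in place of get_board_dicts:

```python
def fake_loader(path):
    # Hypothetical stand-in for get_board_dicts; it just echoes the path.
    return path

late, bound = {}, {}
for d in ["train", "val"]:
    late[d] = lambda: fake_loader("data/" + d)        # d is looked up when called
    bound[d] = lambda d=d: fake_loader("data/" + d)   # d is frozen per iteration

print(late["train"]())   # data/val  (the loop variable has already moved on)
print(bound["train"]())  # data/train
```

Without the default argument, both registered names would end up loading whatever `d` happens to hold when the dataset is first requested.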

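To verify that the data loads correctly, let's visualize the annotations of randomly selected samples from the training set.

Step 3: Visualize the Training Set

We randomly pick 3 images from the dataset's train folder and see what the bounding boxes look like.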

#Visualizing the Train Dataset
dataset_dicts = get_board_dicts("Text_Detection_Dataset_COCO_Format/train")
#Randomly choosing 3 images from the Set
for d in random.sample(dataset_dicts, 3):
    img = cv2.imread(d["file_name"])
    visualizer = Visualizer(img[:, :, ::-1], metadata=board_metadata)
    vis = visualizer.draw_dataset_dict(d)
    cv2_imshow(vis.get_image()[:, :, ::-1])

The output looks like this:
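The `[:, :, ::-1]` slices in the visualization code reverse the channel axis: cv2.imread returns images in BGR order, while Visualizer expects RGB, so the image is flipped on the way in and flipped back before cv2_imshow. A small NumPy sketch on a made-up 1x2 image:

```python
import numpy as np

# Made-up 1x2 "image" in BGR order: one pure-blue and one pure-red pixel.
bgr = np.array([[[255, 0, 0], [0, 0, 255]]], dtype=np.uint8)

rgb = bgr[:, :, ::-1]          # reverse the last (channel) axis: BGR -> RGB
round_trip = rgb[:, :, ::-1]   # reversing again restores BGR

print(rgb[0, 0].tolist())               # [0, 0, 255]
print(np.array_equal(round_trip, bgr))  # True
```

Skipping either flip would not crash anything; the colors would simply come out swapped.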

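Step 4: Train the Model

We've taken a big step forward. This is the step where we provide the configuration and set up the model for training. Technically, we are only fine-tuning the model on our dataset, since it has already been pre-trained on the COCO dataset.

Detectron2's model zoo offers many models for object detection. Here we use faster_rcnn_R_50_FPN_3x.

It has a backbone (ResNet here) that extracts features from the image, followed by a Region Proposal Network that proposes regions, and a box head that tightens the bounding boxes.

You can read more about how Faster R-CNN works in my previous article.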

https://towardsdatascience.com/understanding-fast-r-cnn-and-faster-r-cnn-for-object-detection-adbb55653d97

Let's set up the configuration for training.

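from detectron2.engine import DefaultTrainer
from detectron2.evaluation import COCOEvaluator
from detectron2.config import get_cfg
import os

# CocoTrainer is used below but is not defined in this text. A minimal definition,
# assuming it is a DefaultTrainer subclass that evaluates with COCOEvaluator
# (DefaultTrainer needs build_evaluator once cfg.TEST.EVAL_PERIOD is set):
class CocoTrainer(DefaultTrainer):
    @classmethod
    def build_evaluator(cls, cfg, dataset_name, output_folder=None):
        if output_folder is None:
            output_folder = "coco_eval"  # assumed location for evaluation output
        os.makedirs(output_folder, exist_ok=True)
        return COCOEvaluator(dataset_name, cfg, False, output_folder)
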
cfg = get_cfg()
cfg.merge_from_file(model_zoo.get_config_file("COCO-Detection/faster_rcnn_R_50_FPN_3x.yaml")) #Get the basic model configuration from the model zoo 
#Passing the Train and Validation sets
cfg.DATASETS.TRAIN = ("boardetect_train",)
cfg.DATASETS.TEST = ("boardetect_val",)
# Number of data loading threads
cfg.DATALOADER.NUM_WORKERS = 4
cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url("COCO-Detection/faster_rcnn_R_50_FPN_3x.yaml")  # Let training initialize from model zoo
# Number of images per batch across all machines.
cfg.SOLVER.IMS_PER_BATCH = 4
cfg.SOLVER.BASE_LR = 0.0125  # pick a good LearningRate
cfg.SOLVER.MAX_ITER = 1500  #No. of iterations   
cfg.MODEL.ROI_HEADS.BATCH_SIZE_PER_IMAGE = 256  
cfg.MODEL.ROI_HEADS.NUM_CLASSES = 3 # No. of classes = [HINDI, ENGLISH, OTHER]
cfg.TEST.EVAL_PERIOD = 500 # No. of iterations after which the Validation Set is evaluated. 
os.makedirs(cfg.OUTPUT_DIR, exist_ok=True)
trainer = CocoTrainer(cfg) 
trainer.resume_or_load(resume=False)
trainer.train()

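I wouldn't call this the best configuration. Other configurations may well give better accuracy. In the end, it comes down to choosing the right hyperparameters.

Note that here we also evaluate accuracy on the validation set every 500 iterations.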
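To get a feel for what these solver settings mean, it helps to convert iterations into passes over the data. The sketch below does that arithmetic. The training-set size of 1000 images is a hypothetical placeholder, since the dataset size is not stated here, so substitute your own count.

```python
ims_per_batch = 4        # cfg.SOLVER.IMS_PER_BATCH
max_iter = 1500          # cfg.SOLVER.MAX_ITER
eval_period = 500        # cfg.TEST.EVAL_PERIOD
num_train_images = 1000  # hypothetical; not given in the article

images_seen = ims_per_batch * max_iter
approx_epochs = images_seen / num_train_images
# Iterations that are multiples of the evaluation period within the run.
eval_iters = list(range(eval_period, max_iter + 1, eval_period))

print(images_seen)    # 6000
print(approx_epochs)  # 6.0
print(eval_iters)     # [500, 1000, 1500]
```

If the validation numbers stop improving between these evaluation points while the training loss keeps falling, that is a hint to stop earlier or lower the learning rate.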

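Step 5: Run Inference with the Trained Model

Now it's time to run inference by testing the model on the validation set.

After training completes successfully, the output folder, which holds the final weights, is saved in local storage. You can keep this folder to run inference with this model in the future.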

from detectron2.utils.visualizer import ColorMode

#Use the final weights generated after successful training for inference  
cfg.MODEL.WEIGHTS = os.path.join(cfg.OUTPUT_DIR, "model_final.pth")

cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.8  # set the testing threshold for this model
#Pass the validation dataset
cfg.DATASETS.TEST = ("boardetect_val", )

predictor = DefaultPredictor(cfg)

dataset_dicts = get_board_dicts("Text_Detection_Dataset_COCO_Format/val")
for d in random.sample(dataset_dicts, 3):    
    im = cv2.imread(d["file_name"])
    outputs = predictor(im)
    v = Visualizer(im[:, :, ::-1],
                   metadata=board_metadata, 
                   scale=0.8,
                   instance_mode=ColorMode.IMAGE   
    )
    v = v.draw_instance_predictions(outputs["instances"].to("cpu")) #Passing the predictions to CPU from the GPU
    cv2_imshow(v.get_image()[:, :, ::-1])

Results:
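In the visualized predictions, only detections whose confidence score exceeds SCORE_THRESH_TEST (0.8 here) are kept and drawn. A plain-Python sketch of that filtering step, using made-up detections rather than real model output; `filter_by_score` is an illustrative helper, not part of Detectron2:

```python
CLASSES = ["HINDI", "ENGLISH", "OTHER"]  # same order as thing_classes

# Made-up detections: (class index, confidence score)
detections = [(1, 0.97), (0, 0.85), (2, 0.42), (1, 0.79)]

def filter_by_score(dets, threshold):
    """Keep only detections whose score is above the threshold."""
    return [(CLASSES[c], s) for c, s in dets if s > threshold]

kept = filter_by_score(detections, 0.8)
print(kept)  # [('ENGLISH', 0.97), ('HINDI', 0.85)]
```

Lowering the threshold shows more boxes at the cost of more false positives, so it is worth trying a few values on your own images.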

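Step 6: Evaluate the Trained Model

Model evaluation usually follows the COCO evaluation standard. Mean Average Precision (mAP) is used to measure the model's performance.

Here is an article on mAP: https://tarangshah.com/blog/2018-01-27/what-is-map-understanding-the-statistic-of-choice-for-comparing-object-detection-models/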

#import the COCO Evaluator to use the COCO Metrics
from detectron2.evaluation import COCOEvaluator, inference_on_dataset
from detectron2.data import build_detection_test_loader

#Call the COCO Evaluator function and pass the Validation Dataset
evaluator = COCOEvaluator("boardetect_val", cfg, False, output_dir="/output/")
val_loader = build_detection_test_loader(cfg, "boardetect_val")

#Use the created predicted model in the previous step
inference_on_dataset(predictor.model, val_loader, evaluator)

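At an IoU of 0.5, we get an accuracy of about 79.4%, which is not bad. It can be improved by tuning the parameters a little and increasing the number of iterations. But watch the training process closely, since the model may overfit.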
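For reference, IoU (intersection over union) measures how much a predicted box overlaps a ground-truth box. At a threshold of 0.5, a prediction of the right class counts as a match only if that ratio is at least 0.5. A minimal sketch for boxes in [x0, y0, x1, y1] form, with made-up coordinates:

```python
def iou(box_a, box_b):
    """Intersection over union of two [x0, y0, x1, y1] boxes."""
    ix0, iy0 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix1, iy1 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix1 - ix0) * max(0, iy1 - iy0)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

truth = [0, 0, 100, 100]
pred = [50, 0, 150, 100]            # shifted right by half the box width
print(round(iou(truth, pred), 3))   # 0.333
```

So a box shifted by half its width scores about 0.33 and would not count as a match at an IoU threshold of 0.5.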

If you need to run inference from a saved model, see: https://colab.research.google.com/drive/1d0kXs-TE7_3CXldJNs1WsEshXf8Gw_5n?usp=sharing