[Code Walkthrough] Mask R-CNN: Introduction and Implementation (Repost)

Source

DFann

Copyright notice: if you find this useful, consider leaving a tip. Please contact the author before reposting. https://blog.csdn.net/u011974639/article/details/78483779

Introduction

Paper: Mask R-CNN
Source code: matterport - github

The code comes from the matterport team; you can fork their work on GitHub.

Requirements

This Mask R-CNN implementation is based on Python 3, Keras, and TensorFlow.

  • Python 3.4+
  • TensorFlow 1.3+
  • Keras 2.0.8+
  • Jupyter Notebook
  • Numpy, skimage, scipy

A recent Anaconda3 installation together with a TensorFlow-GPU build is recommended.



Mask R-CNN Paper Review

Mask R-CNN (MRCNN for short) builds on the R-CNN series, FPN, FCIS, and related work. The idea is simple: Faster R-CNN produces two outputs per candidate region, a class label and a bbox offset; MRCNN adds a branch on top of Faster R-CNN to produce a third output, the object mask.

First, a quick review of Faster R-CNN, which consists of two stages: the Region Proposal Network (RPN) and a Fast R-CNN model.

  • The RPN generates candidate regions
    (figure)

  • Fast R-CNN extracts features for each candidate region through a RoIPool layer, then performs classification and bbox regression
    (figure)

MRCNN uses the same two stages as Faster R-CNN. The first stage (the RPN) is identical; in the second stage, in addition to predicting the class and the bbox regression, a binary mask is predicted in parallel for each RoI. Illustration:

(figure)

This keeps the whole task a multi-stage pipeline and decouples the sub-tasks from one another, which at this point appears to bring many benefits.

Main Work

Loss function definition

A multi-task loss is again used, defined for each RoI as

L = L_{cls} + L_{box} + L_{mask}

L_{cls} and L_{box} are defined as in Faster R-CNN; the focus here is L_{mask}.

The mask branch produces a K·m² output per RoI, i.e. K binary masks of resolution m × m, where K is the number of object classes. Based on the class i predicted by the classification branch, only the i-th binary mask contributes to L_{mask}.
The mask-branch loss is computed as in the following illustration:

  1. The mask branch predicts K binary masks of size m × m, one per class
  2. The class prediction branch (the Faster R-CNN part) predicts that the current RoI's class is i
  3. The i-th binary mask is the one used for this RoI's loss L_{mask}

(figure)

A sigmoid is applied to every pixel of the predicted binary masks, and the overall loss is the average binary cross-entropy.
Predicting K separate outputs lets every class generate its own mask and avoids competition between classes, decoupling mask prediction from class prediction. This differs from FCN-style methods that apply a per-pixel softmax with a multinomial cross-entropy loss, which introduces competition between classes and ultimately hurts segmentation quality.
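
To make the definition concrete, here is a minimal NumPy sketch of this loss; the function and argument names are illustrative, not the project's code:

import numpy as np

def mask_loss_sketch(pred_masks, gt_masks, gt_class_ids, eps=1e-7):
    """Sketch of L_mask: per-pixel sigmoid outputs, averaged binary
    cross-entropy, computed only on the mask of the ground-truth class.

    pred_masks:   [num_rois, m, m, K] sigmoid probabilities
    gt_masks:     [num_rois, m, m]    binary ground-truth masks
    gt_class_ids: [num_rois]          ground-truth class of each RoI
    """
    losses = []
    for i, c in enumerate(gt_class_ids):
        p = np.clip(pred_masks[i, :, :, c], eps, 1 - eps)  # only the c-th mask is used
        y = gt_masks[i]
        bce = -(y * np.log(p) + (1 - y) * np.log(1 - p))   # per-pixel binary cross-entropy
        losses.append(bce.mean())                          # average over the m x m pixels
    return float(np.mean(losses)) if losses else 0.0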

From mask representation to the RoIAlign layer

In Faster R-CNN, predicting the class label or the bbox offset compresses the feature map through FC layers into a vector, and that compression discards spatial (planar) structure. A mask, however, encodes the spatial layout of the input object, so representing the pixel-to-pixel correspondence directly with convolutions is ideal.

Producing the mask does not require compressing the output into a vector, so a Fully Convolutional Network (FCN) can be used, which is both efficient and light on parameters. To better preserve the pixel correspondence between the RoI input and the FCN output features, the RoIAlign layer is proposed.
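
For intuition, here is a rough Keras sketch of an FCN-style mask head; the layer sizes and the 28×28 output follow the paper's setup, but this is illustrative, not the project's exact architecture:

from keras.layers import Input, Conv2D, Conv2DTranspose
from keras.models import Model

def build_mask_head_sketch(pool_size=14, depth=256, num_classes=81):
    """FCN-style mask head: a few 3x3 convs on the pooled RoI feature,
    one stride-2 deconv (14x14 -> 28x28), then per-class sigmoid masks."""
    roi_feature = Input(shape=(pool_size, pool_size, depth))
    x = roi_feature
    for _ in range(4):
        x = Conv2D(256, (3, 3), padding="same", activation="relu")(x)
    x = Conv2DTranspose(256, (2, 2), strides=2, activation="relu")(x)
    masks = Conv2D(num_classes, (1, 1), activation="sigmoid")(x)  # K masks per RoI
    return Model(roi_feature, masks)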

First, a recap of the RoIPool layer:

Its core idea is that RoIs of different sizes are fed into the RoIPool layer, which quantizes each RoI into a grid of bins on the feature map and then pools features from each bin.

The figure below shows how SPPNet operates on RoIs; Faster R-CNN uses only a single feature-map granularity:

(figure)

A planar illustration:

(figure)

There is a problem with the quantization above: in practice the mapping uses [x/16] with a quantization stride of 16, where [·] denotes rounding. This quantize-and-round scheme is fairly robust for feature extraction (detection tolerates small translations), but it is harmful for mask localization and has a noticeably negative effect.

To address this, the RoIAlign layer is proposed: it avoids quantizing the RoI boundaries or bins and instead uses bilinear interpolation when sampling the feature map. For the surrounding architecture, see the FPN paper:

(figure)

The original Faster R-CNN makes predictions from only the top-level feature map, which loses high-resolution information; the visible effect is missed small objects and insensitivity to fine detail. Inspired by SSD, FPN predicts from multiple feature levels. It uses a top-down architecture that upsamples (deconvolves) high-level features and merges them into the lower levels, so the features carry both semantics and precision; the bilinear interpolation mentioned in the MRCNN paper is the interpolation used in this top-down path.
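
The core operation behind RoIAlign is bilinear sampling of the feature map at fractional coordinates instead of rounding to a cell. A minimal NumPy sketch of that sampling (simplified: a single feature map, one sample point per output cell, no averaging over sub-bins):

import numpy as np

def bilinear_sample(feature_map, y, x):
    """Bilinearly interpolate feature_map [H, W, C] at fractional (y, x)."""
    h, w = feature_map.shape[:2]
    y0 = min(int(np.floor(y)), h - 1)
    x0 = min(int(np.floor(x)), w - 1)
    yb, xb = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    dy, dx = y - y0, x - x0
    top    = (1 - dx) * feature_map[y0, x0] + dx * feature_map[y0, xb]
    bottom = (1 - dx) * feature_map[yb, x0] + dx * feature_map[yb, xb]
    return (1 - dy) * top + dy * bottom

def roi_align_sketch(feature_map, y1, x1, y2, x2, out_size=7):
    """Sample an out_size x out_size grid of points inside an RoI given in
    feature-map coordinates, without rounding the RoI boundaries."""
    ys = np.linspace(y1, y2, out_size)
    xs = np.linspace(x1, x2, out_size)
    return np.array([[bilinear_sample(feature_map, y, x) for x in xs] for y in ys])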

Summary

MRCNN achieves excellent results. Setting aside the mask branch itself, much of this comes from the strengthened backbone: the paper uses ResNeXt101 plus an FPN top-down pathway, which has very strong feature-learning capacity, and the experiments also mix in a number of engineering tricks.

That said, MRCNN's drawbacks are also clear: it needs a lot of compute and is slow, so there is still a long way to go before practical deployment. Looking forward to further progress!



How to use the code

Project source code: github/Mask R-CNN

  1. Set up the environment

    • Python 3.4+
    • TensorFlow 1.3+
    • Keras 2.0.8+
    • Jupyter Notebook
    • Numpy, skimage, scipy, Pillow (installing Anaconda3 covers these)
    • cv2
  2. Download the code

    • On Linux, clone it directly:

      git clone https://github.com/matterport/Mask_RCNN.git
    • On Windows, download the code from the address above

  3. Download the COCO pre-trained weights (mask_rcnn_coco.h5) from the releases page.

  4. To train or test on the COCO dataset, install pycocotools: clone it, run make to generate the files, and copy them into the project directory (see the README.md of that repo for details).

  5. To use the COCO dataset you also need:

    • pycocotools (from step 4)
    • the MS COCO Dataset (the 2014 training images)
    • the COCO subsets: the 5K minival and the 35K validation-minus-minival. (These download slowly, so instead of the original links only my CSDN mirror is provided; message me if you are short of download credits.)

The code analysis below assumes a Jupyter environment.



Code Analysis: Data Preprocessing

Project source code: matterport - github

inspect_data.ipynb shows the preprocessing steps used to prepare the training data.

Imports

The coco package (the COCO data-handling code) must be downloaded from coco/PythonAPI and compiled locally with make. Copy the generated pycocotools into the project's root directory, i.e. the same directory as this inspect_data.ipynb file.

import os
import sys
import itertools
import math
import logging
import json
import re
import random
from collections import OrderedDict
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
import matplotlib.patches as patches
import matplotlib.lines as lines
from matplotlib.patches import Polygon

import utils
import visualize
from visualize import display_images
import model as modellib
from model import log

%matplotlib inline 

ROOT_DIR = os.getcwd()

# Pick one of the two blocks below
# import shapes
# config = shapes.ShapesConfig() # build the dataset in code (introduced later)

# MS COCO dataset
import coco
config = coco.CocoConfig()
COCO_DIR = "/root/模型復現/Mask_RCNN-master/coco"  # location of the COCO data

Load the dataset

The COCO training set contains 82081 images across 81 classes.

# COCO is used here
if config.NAME == 'shapes':
    dataset = shapes.ShapesDataset()
    dataset.load_shapes(500, config.IMAGE_SHAPE[0], config.IMAGE_SHAPE[1])
elif config.NAME == "coco":
    dataset = coco.CocoDataset()
    dataset.load_coco(COCO_DIR, "train")

# Must call before using the dataset
dataset.prepare()

print("Image Count: {}".format(len(dataset.image_ids)))
print("Class Count: {}".format(dataset.num_classes))
for i, info in enumerate(dataset.class_info):
    print("{:3}. {:50}".format(i, info['name']))

>>>
>>>
loading annotations into memory...
Done (t=7.68s)
creating index...
index created!
Image Count: 82081
Class Count: 81
  0. BG                                                
  1. person                                            
  2. bicycle   
 ...
 77. scissors                                          
 78. teddy bear                                        
 79. hair drier                                        
 80. toothbrush

Pick a few random images and look at them:

# Load and display a few random images and their corresponding masks
image_ids = np.random.choice(dataset.image_ids, 4)
for image_id in image_ids:
    image = dataset.load_image(image_id)
    mask, class_ids = dataset.load_mask(image_id)
    visualize.display_top_masks(image, mask, class_ids, dataset.class_names)

(figures: four random samples with their instance masks)

Bounding Boxes(bbox)

Here we do not use the bbox coordinates provided by the dataset; instead we compute bboxes from the masks. This lets us handle bboxes the same way across different datasets, and because the bbox is derived from the mask, scaling, rotating, or cropping the image is much easier than it would be if bboxes were computed from the image and then converted.
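
The project's utils.extract_bboxes does this computation; the sketch below is just the underlying idea (find the extent of the non-zero pixels of each instance mask):

import numpy as np

def extract_bboxes_sketch(mask):
    """Compute [y1, x1, y2, x2] boxes from a binary mask array of shape
    [height, width, num_instances]."""
    boxes = np.zeros([mask.shape[-1], 4], dtype=np.int32)
    for i in range(mask.shape[-1]):
        ys, xs = np.where(mask[:, :, i] > 0)
        if ys.size:
            # +1 so that y2/x2 lie just outside the object pixels
            boxes[i] = [ys.min(), xs.min(), ys.max() + 1, xs.max() + 1]
    return boxes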

# Load random image and mask.
image_id = random.choice(dataset.image_ids)
image = dataset.load_image(image_id)
mask, class_ids = dataset.load_mask(image_id)
# Compute Bounding box
bbox = utils.extract_bboxes(mask)

# Display image and additional stats
print("image_id ", image_id, dataset.image_reference(image_id))
log("image", image)
log("mask", mask)
log("class_ids", class_ids)
log("bbox", bbox)
# Display image and instances
visualize.display_instances(image, bbox, mask, class_ids, dataset.class_names)

>>>
>>>
image_id  41194 http://cocodataset.org/#explore?id=190360
image                    shape: (428, 640, 3)         min:    0.00000  max:  255.00000
mask                     shape: (428, 640, 5)         min:    0.00000  max:    1.00000
class_ids                shape: (5,)                  min:    1.00000  max:   59.00000
bbox                     shape: (5, 4)                min:    1.00000  max:  640.00000

(figure)

Resize images

Training is done in batches, with several images per batch, so the model needs a fixed input size. Training images are therefore resized to a fixed size (1024×1024) while preserving the aspect ratio; if an image is not square, the borders are padded with zeros. (This was argued in the R-CNN paper.)

Note that when the image is resized, the corresponding mask must be resized as well. Since our bboxes are computed from the masks, no extra code changes are needed.
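
A simplified sketch of the resize-and-pad idea (the project's utils.resize_image also handles min_dim and returns the crop window; the scipy.misc.imresize call here is an assumption based on the dependencies of that era):

import numpy as np
import scipy.misc

def resize_with_padding_sketch(image, max_dim=1024):
    """Scale the longer side to max_dim, keep the aspect ratio, and
    zero-pad the shorter side so the result is max_dim x max_dim."""
    h, w = image.shape[:2]
    scale = max_dim / max(h, w)
    image = scipy.misc.imresize(image, (round(h * scale), round(w * scale)))
    top_pad = (max_dim - image.shape[0]) // 2
    bottom_pad = max_dim - image.shape[0] - top_pad
    left_pad = (max_dim - image.shape[1]) // 2
    right_pad = max_dim - image.shape[1] - left_pad
    padding = [(top_pad, bottom_pad), (left_pad, right_pad), (0, 0)]
    return np.pad(image, padding, mode='constant', constant_values=0), scale, padding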

# Load random image and mask.
image_id = np.random.choice(dataset.image_ids, 1)[0]
image = dataset.load_image(image_id)
mask, class_ids = dataset.load_mask(image_id)
original_shape = image.shape
# Resize to a fixed size
image, window, scale, padding = utils.resize_image(
    image, 
    min_dim=config.IMAGE_MIN_DIM, 
    max_dim=config.IMAGE_MAX_DIM,
    padding=config.IMAGE_PADDING)
mask = utils.resize_mask(mask, scale, padding)  # the mask must be resized too
# Compute Bounding box
bbox = utils.extract_bboxes(mask)

# Display image and additional stats
print("image_id: ", image_id, dataset.image_reference(image_id))
print("Original shape: ", original_shape)
log("image", image)
log("mask", mask)
log("class_ids", class_ids)
log("bbox", bbox)
# Display image and instances
visualize.display_instances(image, bbox, mask, class_ids, dataset.class_names)

>>>
>>>
image_id:  6104 http://cocodataset.org/#explore?id=139889
Original shape:  (426, 640, 3)
image                    shape: (1024, 1024, 3)       min:    0.00000  max:  255.00000
mask                     shape: (1024, 1024, 2)       min:    0.00000  max:    1.00000
class_ids                shape: (2,)                  min:   24.00000  max:   24.00000
bbox                     shape: (2, 4)                min:  169.00000  max:  917.00000

The image is scaled from (426, 640, 3) up to (1024, 1024, 3), with zero padding (the black regions) added at the top and bottom:

(figure)

Mini Mask

When training on high-resolution images, the binary mask of each object also becomes very large. For example, on a 1024×1024 training image the mask of one object needs 1MB of memory (one boolean per pixel), so an image with 100 objects needs 100MB. That might be acceptable if the masks were rich in content, but in reality the mask matrix is mostly zeros, which wastes a lot of space.

To save space and speed up training, we optimize the mask representation: instead of storing all those zeros, we store only the mask content relative to its bounding box, similar in spirit to a compression scheme.

  • We store the mask pixels inside the object's bounding box rather than the mask of the whole image. Most objects are small relative to the image, so the savings come from not storing the zeros around the object.
  • The mask is resized to a small 56×56. For large objects this loses some precision, but most annotations are not that accurate to begin with, so in most cases the loss is negligible. (The mini mask size can be set in the Config class.)

In short, during preprocessing we first compute the bbox from the annotated mask and then use that bbox to change how the mask is represented. The goal is a uniform representation that reduces both memory and compute; a minimal sketch of the idea follows.
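
A sketch of both directions of the mini-mask idea, roughly what utils.minimize_mask and utils.expand_mask do (simplified; the scipy.misc.imresize call and the 128 threshold are assumptions):

import numpy as np
import scipy.misc

MINI_MASK_SHAPE = (56, 56)

def minimize_mask_sketch(bbox, mask):
    """Crop each instance mask to its bbox and resize it to MINI_MASK_SHAPE."""
    mini = np.zeros(MINI_MASK_SHAPE + (mask.shape[-1],), dtype=bool)
    for i in range(mask.shape[-1]):
        y1, x1, y2, x2 = bbox[i][:4]  # first four values are y1, x1, y2, x2
        m = mask[y1:y2, x1:x2, i]
        resized = scipy.misc.imresize(m.astype(float), MINI_MASK_SHAPE, interp='bilinear')
        mini[:, :, i] = resized >= 128
    return mini

def expand_mask_sketch(bbox, mini_mask, image_shape):
    """Resize each mini mask back into its bbox inside a full-size mask."""
    mask = np.zeros(tuple(image_shape[:2]) + (mini_mask.shape[-1],), dtype=bool)
    for i in range(mini_mask.shape[-1]):
        y1, x1, y2, x2 = bbox[i][:4]
        m = mini_mask[:, :, i].astype(float)
        resized = scipy.misc.imresize(m, (y2 - y1, x2 - x1), interp='bilinear')
        mask[y1:y2, x1:x2, i] = resized >= 128
    return mask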

image_id = np.random.choice(dataset.image_ids, 1)[0]
# Use load_image_gt to get the bbox and mask
image, image_meta, bbox, mask = modellib.load_image_gt(
    dataset, config, image_id, use_mini_mask=False)

log("image", image)
log("image_meta", image_meta)
log("bbox", bbox)
log("mask", mask)

display_images([image]+[mask[:,:,i] for i in range(min(mask.shape[-1], 7))])

>>>
>>>
image                    shape: (1024, 1024, 3)       min:    0.00000  max:  252.00000
image_meta               shape: (89,)                 min:    0.00000  max: 66849.00000
bbox                     shape: (1, 5)                min:   62.00000  max:  987.00000
mask                     shape: (1024, 1024, 1)       min:    0.00000  max:    1.00000

Pick a random image; note that the object is small relative to the whole image:

(figure)

visualize.display_instances(image, bbox[:,:4], mask, bbox[:,4], dataset.class_names)

(figure)

Call load_image_gt with use_mini_mask=True to enable the mini mask:

# load_image_gt handles the mini_mask logic
image, image_meta, bbox, mask = modellib.load_image_gt(
    dataset, config, image_id, augment=True, use_mini_mask=True)
log("mask", mask)
display_images([image]+[mask[:,:,i] for i in range(min(mask.shape[-1], 7))])

>>>
>>>
mask                     shape: (56, 56, 1)           min:    0.00000  max:    1.00000

(figure)

For demonstration, expand the mini_mask representation back to a full-image mask with expand_mask and draw it again:

mask = utils.expand_mask(bbox, mask, image.shape)
visualize.display_instances(image, bbox[:,:4], mask, bbox[:,4], dataset.class_names)

(figure)
The boundary is jagged, a side effect of the compression, but overall the result is acceptable.

Anchors

Anchors were introduced in Faster R-CNN.
The model works with several feature-map levels and therefore a very large number of anchors, so keeping the anchor ordering consistent matters; for example, the anchor order must match the order in which the convolutions process the feature maps.

For an FPN network, the anchor order must match the order of the convolutional outputs:

  • Sort by pyramid level first: all anchors of the first level, then all anchors of the second level, etc. This makes it easy to separate anchors by level.
  • Within each level, order anchors by feature-map processing order: a convolution typically sweeps the feature map row by row starting from the top-left corner.
  • For each feature-map cell, anchors of different aspect ratios may be in any order; here we follow the order of the ratios passed as a parameter.

Anchor stride: in the FPN architecture the first few feature maps are high resolution. For a 1024×1024 input, the first-level feature map is 256×256, which would yield about 200K anchors (256×256×3). Each of these anchors is 32×32 and their stride relative to image pixels is 4 (1024/256 = 4), so there is a lot of overlap. If we do not generate a distinct anchor for every single feature-map cell, the load drops significantly: with an anchor stride of 2, the number of anchors drops by a factor of 4.

Here we use a stride of 2, which differs from the paper. The Config class sets three anchor ratios ([0.5, 1, 2]). Taking the first feature map as an example, with size 256×256 we get (feature_map size × ratios) / stride² = (256 × 256 × 3) / 2² = 49152 anchors.
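
A quick sanity check of the per-level anchor counts implied by this configuration (plain Python; the feature-map sizes are the BACKBONE_SHAPES shown in the output further below):

# Cells along each axis are thinned by the anchor stride,
# and each remaining cell gets one anchor per ratio.
backbone_shapes = [(256, 256), (128, 128), (64, 64), (32, 32), (16, 16)]
anchors_per_cell = 3   # len(RPN_ANCHOR_RATIOS) = len([0.5, 1, 2])
anchor_stride = 2      # RPN_ANCHOR_STRIDE

counts = [(h // anchor_stride) * (w // anchor_stride) * anchors_per_cell
          for h, w in backbone_shapes]
print(counts)       # [49152, 12288, 3072, 768, 192]
print(sum(counts))  # 65472, matching "Count" in the output below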

# Generate anchors
anchors = utils.generate_pyramid_anchors(config.RPN_ANCHOR_SCALES, 
                                          config.RPN_ANCHOR_RATIOS,
                                          config.BACKBONE_SHAPES,
                                          config.BACKBONE_STRIDES, 
                                          config.RPN_ANCHOR_STRIDE)

# Print summary of anchors
print("Scales: ", config.RPN_ANCHOR_SCALES)
print("ratios: {}, \nAnchors_per_cell:{}".format(config.RPN_ANCHOR_RATIOS , len(config.RPN_ANCHOR_RATIOS)))
print("backbone_shapes: ",config.BACKBONE_SHAPES)
print("backbone_strides: ",config.BACKBONE_STRIDES)
print("rpn_anchor_stride: ",config.RPN_ANCHOR_STRIDE)

num_levels = len(config.BACKBONE_SHAPES)
anchors_per_cell = len(config.RPN_ANCHOR_RATIOS)  # 3 ratios per cell
print("Count: ", anchors.shape[0])
print("Levels: ", num_levels)
anchors_per_level = []
for l in range(num_levels):
    num_cells = config.BACKBONE_SHAPES[l][0] * config.BACKBONE_SHAPES[l][1]
    anchors_per_level.append(anchors_per_cell * num_cells // config.RPN_ANCHOR_STRIDE**2)
    print("Anchors in Level {}: {}".format(l, anchors_per_level[l]))


>>>
>>>
Scales:  (32, 64, 128, 256, 512)
ratios: [0.5, 1, 2], 
 anchors_per_cell:3
backbone_shapes:  [[256 256] [128 128] [ 64  64]  [ 32  32]  [ 16  16]]
backbone_strides:  [4, 8, 16, 32, 64]
rpn_anchor_stride:  2
Count:  65472
Levels:  5
Anchors in Level 0: 49152
Anchors in Level 1: 12288
Anchors in Level 2: 3072
Anchors in Level 3: 768
Anchors in Level 4: 192

Look at the anchors of each level at the cell in the center of the image:

# Load and draw random image
image_id = np.random.choice(dataset.image_ids, 1)[0]
image, image_meta, _, _ = modellib.load_image_gt(dataset, config, image_id)
fig, ax = plt.subplots(1, figsize=(10, 10))
ax.imshow(image)

levels = len(config.BACKBONE_SHAPES)  # 5 levels in total, 15 anchors at the center cell

for level in range(levels):
    colors = visualize.random_colors(levels)
    # Compute the index of the anchors at the center of the image
    level_start = sum(anchors_per_level[:level]) # sum of anchors of previous levels
    level_anchors = anchors[level_start:level_start+anchors_per_level[level]]
    print("Level {}. Anchors: {:6} Feature map Shape: {}".format(level, level_anchors.shape[0], 
                                                                config.BACKBONE_SHAPES[level]))
    center_cell = config.BACKBONE_SHAPES[level] // 2
    center_cell_index = (center_cell[0] * config.BACKBONE_SHAPES[level][1] + center_cell[1])
    level_center = center_cell_index * anchors_per_cell 
    center_anchor = anchors_per_cell * (
        (center_cell[0] * config.BACKBONE_SHAPES[level][1] / config.RPN_ANCHOR_STRIDE**2) \
        + center_cell[1] / config.RPN_ANCHOR_STRIDE)
    level_center = int(center_anchor)

    # Draw anchors. Brightness show the order in the array, dark to bright.
    for i, rect in enumerate(level_anchors[level_center:level_center+anchors_per_cell]):
        y1, x1, y2, x2 = rect
        p = patches.Rectangle((x1, y1), x2-x1, y2-y1, linewidth=2, facecolor='none',
                              edgecolor=(i+1)*np.array(colors[level]) / anchors_per_cell)
        ax.add_patch(p)


>>>
>>>
Level 0. Anchors:  49152  Feature map Shape: [256 256]
Level 1. Anchors:  12288  Feature map Shape: [128 128]
Level 2. Anchors:   3072  Feature map Shape: [64 64]
Level 3. Anchors:    768  Feature map Shape: [32 32]
Level 4. Anchors:    192  Feature map Shape: [16 16]

(figure)

Code Analysis: Training on Your Own Dataset

Project source code: matterport - github

train_shapes.ipynb shows how to train Mask R-CNN on your own dataset.

To train the model on your own training data, create two subclasses of the following base classes:

  • Config, which holds the default configuration; subclass it to customize the settings for your dataset.
  • Dataset, which provides a uniform API; subclass it and override the relevant methods, so multiple datasets can be used (even at the same time) without modifying the model code.

Both Dataset and Config are base classes meant to be subclassed and customized; see the demo below for how they are used.

Imports

The demo dataset is generated with OpenCV, so there is nothing extra to download. The demo should still be run on a GPU to work properly.

import os
import sys
import random
import math
import re
import time
import numpy as np
import cv2
import matplotlib
import matplotlib.pyplot as plt

from config import Config
import utils
import model as modellib
import visualize
from model import log

%matplotlib inline 

ROOT_DIR = os.getcwd()  # Root directory of the project
MODEL_DIR = os.path.join(ROOT_DIR, "logs") # Directory to save logs and trained model
COCO_MODEL_PATH = os.path.join(ROOT_DIR, "mask_rcnn_coco.h5") # Path to COCO trained weights

Building your own dataset

Here OpenCV is used to create a dataset of simple geometric shapes (triangles, squares, circles) placed on a blank canvas.

The dataset class must inherit from utils.Dataset, expose a load_shapes() method for loading the data, and override the following methods:

  • load_image()
  • load_mask()
  • image_reference()

Dataset construction code:

class ShapesDataset(utils.Dataset):
    """Generates a dataset of images made of simple shapes (triangles, squares, circles) placed on a blank canvas."""

    def load_shapes(self, count, height, width):
        """Generate the requested number of fixed-size images. count: number of images to generate. height, width: size of the generated images."""
        # Add the class info
        self.add_class("shapes", 1, "square")
        self.add_class("shapes", 2, "circle")
        self.add_class("shapes", 3, "triangle")

        # Generate random shape specs; each image is identified by its image_id
        for i in range(count):
            bg_color, shapes = self.random_image(height, width)
            self.add_image("shapes", image_id=i, path=None,
                           width=width, height=height,
                           bg_color=bg_color, shapes=shapes)

    def load_image(self, image_id):
        """Generate the image for the given image_id. Normally this function would read an image file; here we look up the specs recorded in image_info and draw the image on the fly."""
        info = self.image_info[image_id]
        bg_color = np.array(info['bg_color']).reshape([1, 1, 3])
        image = np.ones([info['height'], info['width'], 3], dtype=np.uint8)
        image = image * bg_color.astype(np.uint8)
        for shape, color, dims in info['shapes']:
            image = self.draw_shape(image, shape, dims, color)
        return image

    def image_reference(self, image_id):
        """Return the shapes data of the image."""
        info = self.image_info[image_id]
        if info["source"] == "shapes":
            return info["shapes"]
        else:
            super(self.__class__).image_reference(self, image_id)

    def load_mask(self, image_id):
        """Generate the instance masks for the shapes of the given image_id."""
        info = self.image_info[image_id]
        shapes = info['shapes']
        count = len(shapes)
        mask = np.zeros([info['height'], info['width'], count], dtype=np.uint8)
        for i, (shape, _, dims) in enumerate(info['shapes']):
            mask[:, :, i:i+1] = self.draw_shape(mask[:, :, i:i+1].copy(),
                                                shape, dims, 1)
        # Handle occlusions
        occlusion = np.logical_not(mask[:, :, -1]).astype(np.uint8)
        for i in range(count-2, -1, -1):
            mask[:, :, i] = mask[:, :, i] * occlusion
            occlusion = np.logical_and(occlusion, np.logical_not(mask[:, :, i]))
        # Map class names to class IDs.
        class_ids = np.array([self.class_names.index(s[0]) for s in shapes])
        return mask, class_ids.astype(np.int32)

    def draw_shape(self, image, shape, dims, color):
        """Draw the given shape."""
        # Get the center x, y and the size s
        x, y, s = dims
        if shape == 'square':
            image = cv2.rectangle(image, (x-s, y-s), (x+s, y+s), color, -1)
        elif shape == "circle":
            image = cv2.circle(image, (x, y), s, color, -1)
        elif shape == "triangle":
            points = np.array([[(x, y-s),
                                (x-s/math.sin(math.radians(60)), y+s),
                                (x+s/math.sin(math.radians(60)), y+s),
                                ]], dtype=np.int32)
            image = cv2.fillPoly(image, points, color)
        return image

    def random_shape(self, height, width):
        """Generate random specs for one shape within the given height/width bounds. Returns a tuple of three values: * shape: the shape name (square, circle, ...) * color: the shape color (a tuple of 3 values, RGB) * dimensions: the center position and size of the shape (center_x, center_y, size)"""
        # Shape
        shape = random.choice(["square", "circle", "triangle"])
        # Color
        color = tuple([random.randint(0, 255) for _ in range(3)])
        # Center x, y
        buffer = 20
        y = random.randint(buffer, height - buffer - 1)
        x = random.randint(buffer, width - buffer - 1)
        # Size
        s = random.randint(buffer, height//4)
        return shape, color, (x, y, s)

    def random_image(self, height, width):
        """Generate the specs for an image with multiple random shapes. Returns the background color and a list of shape specs that can be used to draw the image."""
        # Random background color (three channels)
        bg_color = np.array([random.randint(0, 255) for _ in range(3)])
        # Generate a few random shapes and record their bboxes
        shapes = []
        boxes = []
        N = random.randint(1, 4)
        for _ in range(N):
            shape, color, dims = self.random_shape(height, width)
            shapes.append((shape, color, dims))
            x, y, s = dims
            boxes.append([y-s, x-s, y+s, x+s])
        # Use non-max suppression (threshold 0.3) to avoid shapes covering each other
        keep_ixs = utils.non_max_suppression(np.array(boxes), np.arange(N), 0.3)
        shapes = [s for i, s in enumerate(shapes) if i in keep_ixs]
        return bg_color, shapes

Build some data with the class above and take a look:

# Build the training set: 500 images
dataset_train = ShapesDataset()
dataset_train.load_shapes(500, config.IMAGE_SHAPE[0], config.IMAGE_SHAPE[1])
dataset_train.prepare()

# Build the validation set: 50 images
dataset_val = ShapesDataset()
dataset_val.load_shapes(50, config.IMAGE_SHAPE[0], config.IMAGE_SHAPE[1])
dataset_val.prepare()

# Pick 4 random samples
image_ids = np.random.choice(dataset_train.image_ids, 4)  

for image_id in image_ids:
    image = dataset_train.load_image(image_id)
    mask, class_ids = dataset_train.load_mask(image_id)
    visualize.display_top_masks(image, mask, class_ids, dataset_train.class_names)

(figures: four generated samples with their masks)

Now define a ShapesConfig class for the dataset built above; its job is to hold the model's configuration parameters in one place. It must inherit from Config:

class ShapesConfig(Config):
    """Training configuration for the shapes dataset. Derives from the base Config class."""
    NAME = "shapes"  # identifier for this configuration

    # Batch size is 8 (GPUs * images/GPU).
    GPU_COUNT = 1  # number of GPUs
    IMAGES_PER_GPU = 8  # images per GPU (our images are small, so several fit at once)

    # Number of classes (including background)
    NUM_CLASSES = 1 + 3  # background + 3 shapes

    # Small images train faster
    IMAGE_MIN_DIM = 128  # shorter image side
    IMAGE_MAX_DIM = 128  # longer image side

    # Use small anchors because the images and the objects are small
    RPN_ANCHOR_SCALES = (8, 16, 32, 64, 128)  # anchor side in pixels

    # Use fewer ROIs per training image since the images are small and have few objects.
    # Aim to allow ROI sampling to pick 33% positive ROIs.
    TRAIN_ROIS_PER_IMAGE = 32

    STEPS_PER_EPOCH = 100  # the data is simple, so short epochs are enough

    VALIDATION_STPES = 5  # with short epochs, few validation steps are enough

config = ShapesConfig()
config.display()

>>>
>>>

Configurations:
BACKBONE_SHAPES                [[32 32]
 [16 16]
 [ 8  8]
 [ 4  4]
 [ 2  2]]
BACKBONE_STRIDES               [4, 8, 16, 32, 64]
BATCH_SIZE                     8
BBOX_STD_DEV                   [ 0.1  0.1  0.2  0.2]
DETECTION_MAX_INSTANCES        100
DETECTION_MIN_CONFIDENCE       0.7
DETECTION_NMS_THRESHOLD        0.3
GPU_COUNT                      1
IMAGES_PER_GPU                 8
IMAGE_MAX_DIM                  128
IMAGE_MIN_DIM                  128
IMAGE_PADDING                  True
IMAGE_SHAPE                    [128 128   3]
LEARNING_MOMENTUM              0.9
LEARNING_RATE                  0.002
MASK_POOL_SIZE                 14
MASK_SHAPE                     [28, 28]
MAX_GT_INSTANCES               100
MEAN_PIXEL                     [ 123.7  116.8  103.9]
MINI_MASK_SHAPE                (56, 56)
NAME                           shapes
NUM_CLASSES                    4
POOL_SIZE                      7
POST_NMS_ROIS_INFERENCE        1000
POST_NMS_ROIS_TRAINING         2000
ROI_POSITIVE_RATIO             0.33
RPN_ANCHOR_RATIOS              [0.5, 1, 2]
RPN_ANCHOR_SCALES              (8, 16, 32, 64, 128)
RPN_ANCHOR_STRIDE              2
RPN_BBOX_STD_DEV               [ 0.1  0.1  0.2  0.2]
RPN_TRAIN_ANCHORS_PER_IMAGE    256
STEPS_PER_EPOCH                100
TRAIN_ROIS_PER_IMAGE           32
USE_MINI_MASK                  True
USE_RPN_ROIS                   True
VALIDATION_STPES               5
WEIGHT_DECAY                   0.0001

Load the model and train

With the custom dataset and its Config in place, load the pre-trained model:

# The model has two modes: training and inference
# Create the model in training mode
model = modellib.MaskRCNN(mode="training", config=config,
                          model_dir=MODEL_DIR)

# Choose which weights to initialize from; here we use the COCO pre-trained weights
init_with = "coco"  # imagenet, coco, or last

if init_with == "imagenet":
    model.load_weights(model.get_imagenet_weights(), by_name=True)
elif init_with == "coco":
    # Load weights pre-trained on MS COCO, skipping the layers whose number of classes differs
    model.load_weights(COCO_MODEL_PATH, by_name=True,
                       exclude=["mrcnn_class_logits", "mrcnn_bbox_fc", 
                                "mrcnn_bbox", "mrcnn_mask"])
elif init_with == "last":
    # Load the last model you trained and continue training
    model.load_weights(model.find_last()[1], by_name=True)

Train the model

The base layers were initialized from pre-trained weights above; training on top of them is done in two steps:

  • Train only the heads. To avoid damaging the backbone's feature extraction, freeze all backbone layers and train only the randomly initialized layers; to do this, pass layers='heads' to the train() method.
  • Fine-tune all layers. After the heads have trained for a while, fine-tune everything with layers='all' so the model better fits the new dataset.

These two steps are the standard recipe for transfer learning.

1. Train the head layers

# Passing layers="heads" trains only the head layers and freezes all others. A regular expression can also be passed to select layers to train by name
model.train(dataset_train, dataset_val, 
            learning_rate=config.LEARNING_RATE, 
            epochs=1, 
            layers='heads')

>>>
>>>
Starting at epoch 0. LR=0.002

Checkpoint Path: /root/Mask_RCNNmaster/logs/shapes20171103T2047/mask_rcnn_shapes_{epoch:04d}.h5
Selecting layers to train
fpn_c5p5               (Conv2D)
fpn_c4p4               (Conv2D)
fpn_c3p3               (Conv2D)
fpn_c2p2               (Conv2D)
fpn_p5                 (Conv2D)
fpn_p2                 (Conv2D)
fpn_p3                 (Conv2D)
fpn_p4                 (Conv2D)
In model:  rpn_model
    rpn_conv_shared        (Conv2D)
    rpn_class_raw          (Conv2D)
    rpn_bbox_pred          (Conv2D)
mrcnn_mask_conv1       (TimeDistributed)
...
mrcnn_mask_conv4       (TimeDistributed)
mrcnn_mask_bn4         (TimeDistributed)
mrcnn_bbox_fc          (TimeDistributed)
mrcnn_mask_deconv      (TimeDistributed)
mrcnn_class_logits     (TimeDistributed)
mrcnn_mask             (TimeDistributed)

Epoch 1/1
100/100 [==============================] - 37s 371ms/step - loss: 2.5472 - rpn_class_loss: 0.0244 - rpn_bbox_loss: 1.1118 - mrcnn_class_loss: 0.3692 - mrcnn_bbox_loss: 0.3783 - mrcnn_mask_loss: 0.3223 - val_loss: 1.7634 - val_rpn_class_loss: 0.0143 - val_rpn_bbox_loss: 0.9989 - val_mrcnn_class_loss: 0.1673 - val_mrcnn_bbox_loss: 0.0857 - val_mrcnn_mask_loss: 0.1559

2. Fine-tune all layers

# Passing layers="all" trains all layers
model.train(dataset_train, dataset_val, 
            learning_rate=config.LEARNING_RATE / 10,
            epochs=2, 
            layers="all")

>>>
>>>

Starting at epoch 1. LR=0.0002

Checkpoint Path: /root/Mask_RCNN-master/logs/shapes20171103T2047/mask_rcnn_shapes_{epoch:04d}.h5
Selecting layers to train
conv1                  (Conv2D)
bn_conv1               (BatchNorm)
res2a_branch2a         (Conv2D)
bn2a_branch2a          (BatchNorm)
res2a_branch2b         (Conv2D)
...
...
res5c_branch2c         (Conv2D)
bn5c_branch2c          (BatchNorm)
fpn_c5p5               (Conv2D)
fpn_c4p4               (Conv2D)
fpn_c3p3               (Conv2D)
fpn_c2p2               (Conv2D)
fpn_p5                 (Conv2D)
fpn_p2                 (Conv2D)
fpn_p3                 (Conv2D)
fpn_p4                 (Conv2D)
In model:  rpn_model
    rpn_conv_shared        (Conv2D)
    rpn_class_raw          (Conv2D)
    rpn_bbox_pred          (Conv2D)
mrcnn_mask_conv1       (TimeDistributed)
mrcnn_mask_bn1         (TimeDistributed)
mrcnn_mask_conv2       (TimeDistributed)
mrcnn_class_conv1      (TimeDistributed)
mrcnn_mask_bn2         (TimeDistributed)
mrcnn_class_bn1        (TimeDistributed)
mrcnn_mask_conv3       (TimeDistributed)
mrcnn_mask_bn3         (TimeDistributed)
mrcnn_class_conv2      (TimeDistributed)
mrcnn_class_bn2        (TimeDistributed)
mrcnn_mask_conv4       (TimeDistributed)
mrcnn_mask_bn4         (TimeDistributed)
mrcnn_bbox_fc          (TimeDistributed)
mrcnn_mask_deconv      (TimeDistributed)
mrcnn_class_logits     (TimeDistributed)
mrcnn_mask             (TimeDistributed)

Epoch 2/2
100/100 [==============================] - 38s 381ms/step - loss: 11.4351 - rpn_class_loss: 0.0190 - rpn_bbox_loss: 0.9108 - mrcnn_class_loss: 0.2085 - mrcnn_bbox_loss: 0.1606 - mrcnn_mask_loss: 0.2198 - val_loss: 11.2957 - val_rpn_class_loss: 0.0173 - val_rpn_bbox_loss: 0.8740 - val_mrcnn_class_loss: 0.1590 - val_mrcnn_bbox_loss: 0.0997 - val_mrcnn_mask_loss: 0.2296

Model inference

Inference also needs its own configuration class, InferenceConfig; most of it matches the training configuration:

class InferenceConfig(ShapesConfig):
    GPU_COUNT = 1
    IMAGES_PER_GPU = 1

inference_config = InferenceConfig()

# Recreate the model in inference mode
model = modellib.MaskRCNN(mode="inference", 
                          config=inference_config,
                          model_dir=MODEL_DIR)

# Get the last saved weights, or specify the path manually
# model_path = os.path.join(ROOT_DIR, ".h5 file name here")
model_path = model.find_last()[1]

# Load the weights
assert model_path != "", "Provide path to trained weights"
print("Loading weights from ", model_path)
model.load_weights(model_path, by_name=True)

# Test on a random image
image_id = random.choice(dataset_val.image_ids)
original_image, image_meta, gt_bbox, gt_mask =\
    modellib.load_image_gt(dataset_val, inference_config, 
                           image_id, use_mini_mask=False)

log("original_image", original_image)
log("image_meta", image_meta)
log("gt_bbox", gt_bbox)
log("gt_mask", gt_mask)

visualize.display_instances(original_image, gt_bbox[:,:4], gt_mask, gt_bbox[:,4], 
                            dataset_train.class_names, figsize=(8, 8))

>>>
>>>
original_image           shape: (128, 128, 3)         min:   18.00000  max:  231.00000
image_meta               shape: (12,)                 min:    0.00000  max:  128.00000
gt_bbox                  shape: (2, 5)                min:    1.00000  max:  115.00000
gt_mask                  shape: (128, 128, 2)         min:    0.00000  max:    1.00000

A random validation-set image:

(figure)

Run the model on it:

def get_ax(rows=1, cols=1, size=8):
    """Return a Matplotlib Axes array for visualization, with a single place to control figure size."""
    _, ax = plt.subplots(rows, cols, figsize=(size*cols, size*rows))
    return ax

results = model.detect([original_image], verbose=1)  # run detection

r = results[0]
visualize.display_instances(original_image, r['rois'], r['masks'], r['class_ids'], 
                            dataset_val.class_names, r['scores'], ax=get_ax())

>>>
>>>
Processing 1 images
image                    shape: (128, 128, 3)         min:   18.00000  max:  231.00000
molded_images            shape: (1, 128, 128, 3)      min:  -98.80000  max:  127.10000
image_metas              shape: (1, 12)               min:    0.00000  max:  128.00000

(figure)

Compute the AP:

# Compute VOC-Style mAP @ IoU=0.5
# Running on 10 images. Increase for better accuracy.
image_ids = np.random.choice(dataset_val.image_ids, 10)
APs = []
for image_id in image_ids:
    # Load the image and ground truth
    image, image_meta, gt_bbox, gt_mask =\
        modellib.load_image_gt(dataset_val, inference_config,
                               image_id, use_mini_mask=False)
    molded_images = np.expand_dims(modellib.mold_image(image, inference_config), 0)
    # Run object detection
    results = model.detect([image], verbose=0)
    r = results[0]
    # Compute AP
    AP, precisions, recalls, overlaps =\
        utils.compute_ap(gt_bbox[:,:4], gt_bbox[:,4],
                         r["rois"], r["class_ids"], r["scores"])
    APs.append(AP)

print("mAP: ", np.mean(APs))

>>>
>>>
mAP:  0.9


Code Analysis: Inspecting the Mask R-CNN Model

Test, debug, and evaluate the Mask R-CNN model.

Imports

This section uses the custom COCO subsets: the 5K minival and the 35K validation-minus-minival. (These download slowly, so instead of the original links only my CSDN mirror is provided; message me if you are short of download credits.)

import os
import sys
import random
import math
import re
import time
import numpy as np
import scipy.misc
import tensorflow as tf
import matplotlib
import matplotlib.pyplot as plt
import matplotlib.patches as patches

import utils
import visualize
from visualize import display_images
import model as modellib
from model import log

%matplotlib inline 

ROOT_DIR = os.getcwd()  # Root directory of the project
MODEL_DIR = os.path.join(ROOT_DIR, "logs") # Directory to save logs and trained model
COCO_MODEL_PATH = os.path.join(ROOT_DIR, "coco/mask_rcnn_coco.h5")  # Path to trained weights file
SHAPES_MODEL_PATH = os.path.join(ROOT_DIR, "log/shapes20171103T2047/mask_rcnn_shapes_0002.h5") # Path to Shapes trained weights

# Shapes toy dataset
# import shapes
# config = shapes.ShapesConfig()

# MS COCO Dataset
import coco
config = coco.CocoConfig()
COCO_DIR = os.path.join(ROOT_DIR, "coco")  # TODO: enter value here

def get_ax(rows=1, cols=1, size=16):
    """Control plot size."""
    _, ax = plt.subplots(rows, cols, figsize=(size*cols, size*rows))
    return ax

# Create an InferenceConfig class for testing the pre-trained model
class InferenceConfig(config.__class__):
    # Run detection on one image at a time
    GPU_COUNT = 1
    IMAGES_PER_GPU = 1

config = InferenceConfig()
DEVICE = "/cpu:0"  # /cpu:0 or /gpu:0
TEST_MODE = "inference" # values: 'inference' or 'training'

# Load the validation set
if config.NAME == 'shapes':
    dataset = shapes.ShapesDataset()
    dataset.load_shapes(500, config.IMAGE_SHAPE[0], config.IMAGE_SHAPE[1])
elif config.NAME == "coco":
    dataset = coco.CocoDataset()
    dataset.load_coco(COCO_DIR, "minival")

# Must call before using the dataset
dataset.prepare()

# Create the model in inference mode
with tf.device(DEVICE):
    model = modellib.MaskRCNN(mode="inference", model_dir=MODEL_DIR,
                              config=config)

# Set weights file path
if config.NAME == "shapes":
    weights_path = SHAPES_MODEL_PATH
elif config.NAME == "coco":
    weights_path = COCO_MODEL_PATH
# Or, uncomment to load the last model you trained
# weights_path = model.find_last()[1]

# Load weights
print("Loading weights ", weights_path)
model.load_weights(weights_path, by_name=True)


image_id = random.choice(dataset.image_ids)
image, image_meta, gt_bbox, gt_mask =\
    modellib.load_image_gt(dataset, config, image_id, use_mini_mask=False)
info = dataset.image_info[image_id]
print("image ID: {}.{} ({}) {}".format(info["source"], info["id"], image_id, 
                                       dataset.image_reference(image_id)))
gt_class_id = gt_bbox[:, 4]

# Run object detection
results = model.detect([image], verbose=1)

# Display results
ax = get_ax(1)
r = results[0]
# visualize.display_instances(image, gt_bbox[:,:4], gt_mask, gt_bbox[:,4], 
# dataset.class_names, ax=ax[0], title="Ground Truth")
visualize.display_instances(image, r['rois'], r['masks'], r['class_ids'], 
                            dataset.class_names, r['scores'], ax=ax,
                            title="Predictions")
log("gt_class_id", gt_class_id)
log("gt_bbox", gt_bbox)
log("gt_mask", gt_mask)

Pick a random image from the dataset and look at the result:

(figure)

Region Proposal Network (RPN)

The RPN's job is region proposal. From the Selective Search used in R-CNN to the anchor scheme used in Faster R-CNN, the goal has always been to produce better RoIs faster.

The RPN lays a large number of boxes (anchors) over the image and runs a lightweight binary classifier on them, returning object/no-object scores. Anchors with high scores (positive anchors) are passed to the next stage for classification.

Positive anchors usually do not cover the object exactly, so while scoring each anchor the RPN also regresses an offset and a scale used to correct the anchor's position and size.

RPN Target

The RPN targets mark which anchors contain objects, to be passed on for classification and the later tasks. The RPN covers the whole image with anchors of many shapes, and each anchor is labeled by its IoU with the annotated ground truth (GT box): IoU ≥ 0.7 is a positive sample, IoU ≤ 0.3 is negative, and anything in between is neutral and not used for training.

As mentioned above, training the RPN also regresses an offset and a scale to correct the anchor's position and size so that it covers the ground truth better; a minimal sketch of both ideas follows.
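
The sketch below labels anchors by IoU and computes the (dy, dx, log(dh), log(dw)) regression target for one anchor. It is a simplification of what modellib.build_rpn_targets and utils.apply_box_deltas handle (the rule that forces at least one positive anchor per GT box and the anchor subsampling are omitted):

import numpy as np

def iou(box, boxes):
    """IoU of one box [y1, x1, y2, x2] against an array of boxes."""
    yy1 = np.maximum(box[0], boxes[:, 0])
    xx1 = np.maximum(box[1], boxes[:, 1])
    yy2 = np.minimum(box[2], boxes[:, 2])
    xx2 = np.minimum(box[3], boxes[:, 3])
    inter = np.maximum(yy2 - yy1, 0) * np.maximum(xx2 - xx1, 0)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter)

def label_anchors_sketch(anchors, gt_boxes):
    """1 = positive (IoU >= 0.7), -1 = negative (IoU <= 0.3), 0 = neutral."""
    overlaps = np.stack([iou(a, gt_boxes) for a in anchors])  # [num_anchors, num_gt]
    best_iou = overlaps.max(axis=1)
    match = np.zeros(len(anchors), dtype=np.int8)
    match[best_iou >= 0.7] = 1
    match[best_iou <= 0.3] = -1
    return match

def box_deltas_sketch(anchor, gt):
    """Regression target (dy, dx, log(dh), log(dw)) that maps anchor -> gt."""
    ah, aw = anchor[2] - anchor[0], anchor[3] - anchor[1]
    gh, gw = gt[2] - gt[0], gt[3] - gt[1]
    acy, acx = anchor[0] + 0.5 * ah, anchor[1] + 0.5 * aw
    gcy, gcx = gt[0] + 0.5 * gh, gt[1] + 0.5 * gw
    return np.array([(gcy - acy) / ah, (gcx - acx) / aw, np.log(gh / ah), np.log(gw / aw)])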

# Generate RPN training targets
# target_rpn_match is 1 for positive anchors, -1 for negative, 0 for neutral.
target_rpn_match, target_rpn_bbox = modellib.build_rpn_targets(
    image.shape, model.anchors, gt_bbox, model.config)

log("target_rpn_match", target_rpn_match)
log("target_rpn_bbox", target_rpn_bbox)

# Split the anchors by label
positive_anchor_ix = np.where(target_rpn_match[:] == 1)[0]
negative_anchor_ix = np.where(target_rpn_match[:] == -1)[0]
neutral_anchor_ix = np.where(target_rpn_match[:] == 0)[0]

positive_anchors = model.anchors[positive_anchor_ix]
negative_anchors = model.anchors[negative_anchor_ix]
neutral_anchors = model.anchors[neutral_anchor_ix]
log("positive_anchors", positive_anchors)
log("negative_anchors", negative_anchors)
log("neutral anchors", neutral_anchors)

# Apply the refinement deltas to the positive anchors
refined_anchors = utils.apply_box_deltas(
    positive_anchors,
    target_rpn_bbox[:positive_anchors.shape[0]] * model.config.RPN_BBOX_STD_DEV)
log("refined_anchors", refined_anchors, )

>>>
>>>
target_rpn_match         shape: (65472,)              min:   -1.00000  max:    1.00000
target_rpn_bbox          shape: (256, 4)              min:   -3.66348  max:    7.29204
positive_anchors         shape: (19, 4)               min:  -53.01934  max: 1030.62742
negative_anchors         shape: (237, 4)              min:  -90.50967  max: 1038.62742
neutral anchors          shape: (65216, 4)            min: -362.03867  max: 1258.03867
refined_anchors          shape: (19, 4)               min:   -0.00000  max: 1024.00000

The positive anchors and the refined positive anchors:

visualize.draw_boxes(image, boxes=positive_anchors, refined_boxes=refined_anchors, ax=get_ax()) 

(figure)

RPN Prediction

# Run RPN sub-graph
pillar = model.keras_model.get_layer("ROI").output  # node to start searching from
rpn = model.run_graph([image], [
    ("rpn_class", model.keras_model.get_layer("rpn_class").output),
    ("pre_nms_anchors", model.ancestor(pillar, "ROI/pre_nms_anchors:0")),
    ("refined_anchors", model.ancestor(pillar, "ROI/refined_anchors:0")),
    ("refined_anchors_clipped", model.ancestor(pillar, "ROI/refined_anchors_clipped:0")),
    ("post_nms_anchor_ix", model.ancestor(pillar, "ROI/rpn_non_max_suppression:0")),
    ("proposals", model.keras_model.get_layer("ROI").output),
])

>>>
>>>
rpn_class                shape: (1, 65472, 2)         min:    0.00000  max:    1.00000
pre_nms_anchors          shape: (1, 10000, 4)         min: -362.03867  max: 1258.03870
refined_anchors          shape: (1, 10000, 4)         min: -1030.40588  max: 2164.92578
refined_anchors_clipped  shape: (1, 10000, 4)         min:    0.00000  max: 1024.00000
post_nms_anchor_ix       shape: (1000,)               min:    0.00000  max: 1879.00000
proposals                shape: (1, 1000, 4)          min:    0.00000  max:    1.00000

The high-scoring anchors (before refinement):

limit = 100
sorted_anchor_ids = np.argsort(rpn['rpn_class'][:,:,1].flatten())[::-1]
visualize.draw_boxes(image, boxes=model.anchors[sorted_anchor_ids[:limit]], ax=get_ax())

(figure)

The refined high-scoring anchors; parts extending beyond the image boundary are clipped:

limit = 50
ax = get_ax(1, 2)
visualize.draw_boxes(image, boxes=rpn["pre_nms_anchors"][0, :limit], 
 refined_boxes=rpn["refined_anchors"][0, :limit], ax=ax[0])
visualize.draw_boxes(image, refined_boxes=rpn["refined_anchors_clipped"][0, :limit], ax=ax[1])

(figure)

Apply non-max suppression to the anchors above:

limit = 50
ixs = rpn["post_nms_anchor_ix"][:limit]
visualize.draw_boxes(image, refined_boxes=rpn["refined_anchors_clipped"][0, ixs], ax=get_ax())

(figure)

The final proposals are the same as in the step above, only with normalized coordinates:

limit = 50
# Convert back to image coordinates for display
h, w = config.IMAGE_SHAPE[:2]
proposals = rpn['proposals'][0, :limit] * np.array([h, w, h, w])
visualize.draw_boxes(image, refined_boxes=proposals, ax=get_ax())

(figure)

Measure the RPN recall (the fraction of objects covered by anchors). Here recall is computed in three ways:

  • over all anchors
  • over all refined anchors
  • over the refined anchors after non-max suppression

iou_threshold = 0.7

recall, positive_anchor_ids = utils.compute_recall(model.anchors, gt_bbox, iou_threshold)
print("All Anchors ({:5}) Recall: {:.3f} Positive anchors: {}".format(
    model.anchors.shape[0], recall, len(positive_anchor_ids)))

recall, positive_anchor_ids = utils.compute_recall(rpn['refined_anchors'][0], gt_bbox, iou_threshold)
print("Refined Anchors ({:5}) Recall: {:.3f} Positive anchors: {}".format(
    rpn['refined_anchors'].shape[1], recall, len(positive_anchor_ids)))

recall, positive_anchor_ids = utils.compute_recall(proposals, gt_bbox, iou_threshold)
print("Post NMS Anchors ({:5}) Recall: {:.3f} Positive anchors: {}".format(
    proposals.shape[0], recall, len(positive_anchor_ids)))

>>>
>>>
All Anchors (65472)       Recall: 0.263  Positive anchors: 5
Refined Anchors (10000)   Recall: 0.895  Positive anchors: 126
Post NMS Anchors (   50)  Recall: 0.526  Positive anchors: 12

Proposal Classification

The RPN targets above generate the region proposals; now they need to be classified.

Proposal Classification

The proposals selected by the RPN are fed to the classification head, which produces the class probability distribution and the bbox regression.

# Get input and output to classifier and mask heads.
mrcnn = model.run_graph([image], [
    ("proposals", model.keras_model.get_layer("ROI").output),
    ("probs", model.keras_model.get_layer("mrcnn_class").output),
    ("deltas", model.keras_model.get_layer("mrcnn_bbox").output),
    ("masks", model.keras_model.get_layer("mrcnn_mask").output),
    ("detections", model.keras_model.get_layer("mrcnn_detection").output),
])

>>>
>>>
proposals                shape: (1, 1000, 4)          min:    0.00000  max:    1.00000
probs                    shape: (1, 1000, 81)         min:    0.00000  max:    0.99825
deltas                   shape: (1, 1000, 81, 4)      min:   -3.31265  max:    2.86541
masks                    shape: (1, 100, 28, 28, 81)  min:    0.00003  max:    0.99986
detections               shape: (1, 100, 6)           min:    0.00000  max:  930.00000

Get the detected classes and trim the zero padding:

det_class_ids = mrcnn['detections'][0, :, 4].astype(np.int32)
det_count = np.where(det_class_ids == 0)[0][0]
det_class_ids = det_class_ids[:det_count]
detections = mrcnn['detections'][0, :det_count]

print("{} detections: {}".format(
    det_count, np.array(dataset.class_names)[det_class_ids]))

captions = ["{} {:.3f}".format(dataset.class_names[int(c)], s) if c > 0 else ""
            for c, s in zip(detections[:, 4], detections[:, 5])]
visualize.draw_boxes(
    image, 
    refined_boxes=detections[:, :4],
    visibilities=[2] * len(detections),
    captions=captions, title="Detections",
    ax=get_ax())

>>>
>>>
11 detections: ['person' 'person' 'person' 'person' 'person' 'orange' 'person' 'orange'
 'dog' 'handbag' 'apple']

(figure)

Step by Step Detection

# Proposals are in normalized coordinates; scale them back to image coordinates
h, w = config.IMAGE_SHAPE[:2]
proposals = np.around(mrcnn["proposals"][0] * np.array([h, w, h, w])).astype(np.int32)

# Class ID, score, and mask per proposal
roi_class_ids = np.argmax(mrcnn["probs"][0], axis=1)
roi_scores = mrcnn["probs"][0, np.arange(roi_class_ids.shape[0]), roi_class_ids]
roi_class_names = np.array(dataset.class_names)[roi_class_ids]
roi_positive_ixs = np.where(roi_class_ids > 0)[0]

# How many ROIs vs empty rows?
print("{} Valid proposals out of {}".format(np.sum(np.any(proposals, axis=1)), proposals.shape[0]))
print("{} Positive ROIs".format(len(roi_positive_ixs)))

# Class counts
print(list(zip(*np.unique(roi_class_names, return_counts=True))))

>>>
>>>
1000 Valid proposals out of 1000
106 Positive ROIs
[('BG', 894), ('apple', 25), ('cup', 2), ('dog', 4), ('handbag', 2), ('orange', 36), ('person', 36), ('sandwich', 1)]

A random sample of proposals; background (BG) ones are not drawn, only those assigned a class, together with their scores:

limit = 200
ixs = np.random.randint(0, proposals.shape[0], limit)
captions = ["{} {:.3f}".format(dataset.class_names[c], s) if c > 0 else ""
            for c, s in zip(roi_class_ids[ixs], roi_scores[ixs])]
visualize.draw_boxes(image, boxes=proposals[ixs],
                     visibilities=np.where(roi_class_ids[ixs] > 0, 2, 1),
                     captions=captions, title="ROIs Before Refinment",
                     ax=get_ax())

(figure)

Apply the bbox refinement:

# Class-specific bounding box shifts.
roi_bbox_specific = mrcnn["deltas"][0, np.arange(proposals.shape[0]), roi_class_ids]
log("roi_bbox_specific", roi_bbox_specific)

# Apply bounding box transformations
# Shape: [N, (y1, x1, y2, x2)]
refined_proposals = utils.apply_box_deltas(
    proposals, roi_bbox_specific * config.BBOX_STD_DEV).astype(np.int32)
log("refined_proposals", refined_proposals)

# Show positive proposals
# ids = np.arange(roi_boxes.shape[0]) # Display all
limit = 5
ids = np.random.randint(0, len(roi_positive_ixs), limit)  # Display random sample
captions = ["{} {:.3f}".format(dataset.class_names[c], s) if c > 0 else ""
            for c, s in zip(roi_class_ids[roi_positive_ixs][ids], roi_scores[roi_positive_ixs][ids])]
visualize.draw_boxes(image, boxes=proposals[roi_positive_ixs][ids],
                     refined_boxes=refined_proposals[roi_positive_ixs][ids],
                     visibilities=np.where(roi_class_ids[roi_positive_ixs][ids] > 0, 1, 0),
                     captions=captions, title="ROIs After Refinment",
                     ax=get_ax())

>>>
>>>
roi_bbox_specific        shape: (1000, 4)             min:   -3.31265  max:    2.86541
refined_proposals        shape: (1000, 4)             min:   -1.00000  max: 1024.00000

(figure)

Filter out low-confidence detections:

# Remove boxes classified as background
keep = np.where(roi_class_ids > 0)[0]
print("Keep {} detections:\n{}".format(keep.shape[0], keep))

# Remove low confidence detections
keep = np.intersect1d(keep, np.where(roi_scores >= config.DETECTION_MIN_CONFIDENCE)[0])
print("Remove boxes below {} confidence. Keep {}:\n{}".format(
    config.DETECTION_MIN_CONFIDENCE, keep.shape[0], keep))

>>>
>>>
Keep 106 detections:
[  0   1   2   3   4   5   6   7   9  10  11  12  13  14  15  16  17  18
  19  22  23  24  25  26  27  28  31  34  35  36  37  38  41  43  47  51
  56  65  66  67  68  71  73  75  82  87  91  92 101 102 105 109 110 115
 117 120 123 138 156 164 171 175 177 184 197 205 241 253 258 263 265 280
 287 325 367 430 451 452 464 469 491 514 519 527 554 597 610 686 697 712
 713 748 750 780 815 871 911 917 933 938 942 947 949 953 955 981]

Remove boxes below 0.7 confidence. Keep 44:
[  0   1   2   3   4   5   6   9  12  13  14  17  19  26  31  34  38  41
  43  47  67  75  82  87  92 120 123 164 171 175 177 205 258 325 452 469
 519 697 713 815 871 911 917 949]

Apply per-class non-max suppression:

# Apply per-class non-max suppression
pre_nms_boxes = refined_proposals[keep]
pre_nms_scores = roi_scores[keep]
pre_nms_class_ids = roi_class_ids[keep]

nms_keep = []
for class_id in np.unique(pre_nms_class_ids):
    # Pick detections of this class
    ixs = np.where(pre_nms_class_ids == class_id)[0]
    # Apply NMS
    class_keep = utils.non_max_suppression(pre_nms_boxes[ixs], 
                                            pre_nms_scores[ixs],
                                            config.DETECTION_NMS_THRESHOLD)
    # Map indicies
    class_keep = keep[ixs[class_keep]]
    nms_keep = np.union1d(nms_keep, class_keep)
    print("{:22}: {} -> {}".format(dataset.class_names[class_id][:20], 
                                   keep[ixs], class_keep))

keep = np.intersect1d(keep, nms_keep).astype(np.int32)
print("\nKept after per-class NMS: {}\n{}".format(keep.shape[0], keep))

>>>
>>>
person                : [  0   1   2   3   5   9  12  13  14  19  26  41  43  47  82  92 120 123
 175 177 258 452 469 519 871 911 917] -> [ 5 12  1  2  3 19]
dog                   : [  6  75 171] -> [75]
handbag               : [815] -> [815]
apple                 : [38] -> [38]
orange                : [  4  17  31  34  67  87 164 205 325 697 713 949] -> [ 4 87]

Kept after per-class NMS: 11
[  1   2   3   4   5  12  19  38  75  87 815]

The final result:

ixs = np.arange(len(keep))  # Display all
# ixs = np.random.randint(0, len(keep), 10) # Display random sample
captions = ["{} {:.3f}".format(dataset.class_names[c], s) if c > 0 else ""
            for c, s in zip(roi_class_ids[keep][ixs], roi_scores[keep][ixs])]
visualize.draw_boxes(
    image, boxes=proposals[keep][ixs],
    refined_boxes=refined_proposals[keep][ixs],
    visibilities=np.where(roi_class_ids[keep][ixs] > 0, 1, 0),
    captions=captions, title="Detections after NMS",
    ax=get_ax())

(figure)

Generating Masks

Using the instances produced in the previous stage, the mask head generates a segmentation mask for each instance.

Mask Target

The training targets of the mask branch:

display_images(np.transpose(gt_mask, [2, 0, 1]), cmap="Blues")

(figure)

Predicted Masks

# Get predictions of mask head
mrcnn = model.run_graph([image], [
    ("detections", model.keras_model.get_layer("mrcnn_detection").output),
    ("masks", model.keras_model.get_layer("mrcnn_mask").output),
])

# Get detection class IDs. Trim zero padding.
det_class_ids = mrcnn['detections'][0, :, 4].astype(np.int32)
det_count = np.where(det_class_ids == 0)[0][0]
det_class_ids = det_class_ids[:det_count]

print("{} detections: {}".format(
    det_count, np.array(dataset.class_names)[det_class_ids]))

# Masks
det_boxes = mrcnn["detections"][0, :, :4].astype(np.int32)
det_mask_specific = np.array([mrcnn["masks"][0, i, :, :, c] 
                              for i, c in enumerate(det_class_ids)])
det_masks = np.array([utils.unmold_mask(m, det_boxes[i], image.shape)
                      for i, m in enumerate(det_mask_specific)])
log("det_mask_specific", det_mask_specific)
log("det_masks", det_masks)

display_images(det_mask_specific[:4] * 255, cmap="Blues", interpolation="none")

>>>
>>>
detections               shape: (1, 100, 6)           min:    0.00000  max:  930.00000
masks                    shape: (1, 100, 28, 28, 81)  min:    0.00003  max:    0.99986
11 detections: ['person' 'person' 'person' 'person' 'person' 'orange' 'person' 'orange'
 'dog' 'handbag' 'apple']

det_mask_specific        shape: (11, 28, 28)          min:    0.00016  max:    0.99985
det_masks                shape: (11, 1024, 1024)      min:    0.00000  max:    1.00000

(figure)

display_images(det_masks[:4] * 255, cmap="Blues", interpolation="none")

(figure)
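
The jump from det_mask_specific (28×28 per detection) to det_masks (full image size) above is performed by utils.unmold_mask, which resizes each soft mask to its detection box, binarizes it, and pastes it into a full-size frame. A rough sketch of that idea (the 0.5 threshold and the scipy.misc.imresize call are assumptions):

import numpy as np
import scipy.misc

def unmold_mask_sketch(mask28, box, image_shape, threshold=0.5):
    """Resize a soft 28x28 mask to its detection box, binarize it, and
    paste it into a zero mask of the full image size."""
    y1, x1, y2, x2 = box
    resized = scipy.misc.imresize(mask28, (y2 - y1, x2 - x1), interp='bilinear') / 255.0
    full_mask = np.zeros(image_shape[:2], dtype=np.uint8)
    full_mask[y1:y2, x1:x2] = np.where(resized >= threshold, 1, 0)
    return full_mask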
