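In the previous posts we spent a great deal of space on the inference branch of the build method. This post looks at how that branch is actually invoked.
In dimo.ipynb, the operations that involve the model can be summarised briefly. First the graph is created and the pretrained weights are loaded,
then the list of class names is fixed,
and the block that actually runs detection is as follows,
The model.detect method runs the graph constructed by model.build (the method covered in the previous posts) to make predictions.
Let us start with the first few lines of detect (like build, it lives in model.py),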
def detect(self, images, verbose=0):
    """Runs the detection pipeline.

    images: List of images, potentially of different sizes.

    Returns a list of dicts, one dict per image. The dict contains:
    rois: [N, (y1, x1, y2, x2)] detection bounding boxes
    class_ids: [N] int class IDs
    scores: [N] float probability scores for the class IDs
    masks: [H, W, N] instance binary masks
    """
    assert self.mode == "inference", "Create model in inference mode."
    assert len(images) == self.config.BATCH_SIZE, \
        "len(images) must be equal to BATCH_SIZE"

    # Logging
    if verbose:
        log("Processing {} images".format(len(images)))
        for image in images:
            log("image", image)

    # Mold inputs to format expected by the neural network
    molded_images, image_metas, windows = self.mold_inputs(images)

    # Validate image sizes
    # All images in a batch MUST be of the same size
    image_shape = molded_images[0].shape
    for g in molded_images[1:]:
        assert g.shape == image_shape, \
            "After resizing, all images must have the same size. Check IMAGE_RESIZE_MODE and image sizes."
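After some simple sanity checks and logging, detect calls mold_inputs to adjust the input images and record information about each of them.
The self.mold_inputs method is as follows,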
def mold_inputs(self, images):
    """Takes a list of images and modifies them to the format expected
    as an input to the neural network.
    images: List of image matrices [height,width,depth]. Images can have
        different sizes.

    Returns 3 Numpy matrices:
    molded_images: [N, h, w, 3]. Images resized and normalized.
    image_metas: [N, length of meta data]. Details about each image.
    windows: [N, (y1, x1, y2, x2)]. The portion of the image that has the
        original image (padding excluded).
    """
    molded_images = []
    image_metas = []
    windows = []
    for image in images:
        # Resize image
        # TODO: move resizing to mold_image()
        molded_image, window, scale, padding, crop = utils.resize_image(
            image,
            min_dim=self.config.IMAGE_MIN_DIM,      # 800
            min_scale=self.config.IMAGE_MIN_SCALE,  # 0
            max_dim=self.config.IMAGE_MAX_DIM,      # 1024
            mode=self.config.IMAGE_RESIZE_MODE)     # square
        molded_image = mold_image(molded_image, self.config)  # subtract the mean pixel
        # Build image_meta (a NumPy array)
        image_meta = compose_image_meta(
            0, image.shape, molded_image.shape, window, scale,
            np.zeros([self.config.NUM_CLASSES], dtype=np.int32))
        # Append
        molded_images.append(molded_image)
        windows.append(window)
        image_metas.append(image_meta)
    # Pack into arrays
    molded_images = np.stack(molded_images)
    image_metas = np.stack(image_metas)
    windows = np.stack(windows)
    return molded_images, image_metas, windows
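The utils.resize_image function rescales the original image. It computes a scale, and the resized image has the size of the input image multiplied by scale, chosen so that the shorter side reaches min_dim where possible while the longer side does not exceed max_dim.
Finally, the image is padded to max_dim*max_dim (so the size of molded_images is in fact fixed). Its return value is: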
image.astype(image_dtype), window, scale, padding, crop
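These are: the resized image; the window, i.e. where the original image sits inside the resized image (see 『計算機視覺』Mask-RCNN_推斷網絡其五:目標檢測結果精煉); the scale factor; the padding (four integers); and the crop (four integers or None).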
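To make the scale and window concrete, here is a minimal sketch of the arithmetic for the square mode. It is not the library's resize_image: the function name square_resize_plan is hypothetical, the defaults 800 and 1024 are taken from the comments in mold_inputs above, min_scale is left out, and we assume the resized image is centred on the padded canvas.

```python
def square_resize_plan(h, w, min_dim=800, max_dim=1024):
    """Return (scale, resized_hw, window, padding) for a square-style resize.

    Simplified illustration only; no pixels are touched here.
    """
    # Scale up so the shorter side reaches min_dim, but never scale down here.
    scale = max(1.0, min_dim / min(h, w))
    # Cap the scale so the longer side does not exceed max_dim.
    if round(max(h, w) * scale) > max_dim:
        scale = max_dim / max(h, w)
    new_h, new_w = round(h * scale), round(w * scale)
    # Assumption: centre the resized image on a max_dim x max_dim canvas.
    top = (max_dim - new_h) // 2
    left = (max_dim - new_w) // 2
    bottom = max_dim - new_h - top
    right = max_dim - new_w - left
    # window marks where the real image sits inside the padded canvas.
    window = (top, left, top + new_h, left + new_w)
    padding = (top, bottom, left, right)
    return scale, (new_h, new_w), window, padding


scale, resized_hw, window, padding = square_resize_plan(480, 640)
print(scale, resized_hw, window, padding)
```

For a 480x640 input the longer side is the limiting factor, so the image is scaled until its width hits max_dim and the remaining rows are padding above and below the window.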
The mold_image function is simpler still: it just subtracts a mean value from the image pixels, MEAN_PIXEL = [123.7, 116.8, 103.9].
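The mean subtraction can be sketched in a couple of lines. This is an illustration under our own assumptions rather than the library code: mold_image_sketch is a hypothetical name, and the mean is passed in directly instead of being read from a config object.

```python
import numpy as np

# Per-channel mean quoted in the text above.
MEAN_PIXEL = np.array([123.7, 116.8, 103.9])


def mold_image_sketch(image, mean_pixel=MEAN_PIXEL):
    # Convert to float, then subtract the mean of each channel;
    # the length-3 vector broadcasts over the height and width axes.
    return image.astype(np.float32) - mean_pixel


image = np.full((4, 4, 3), 128, dtype=np.uint8)  # a flat grey test image
molded = mold_image_sketch(image)
print(molded[0, 0])
```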
compose_image_meta records all of the original information; note that crop is not included,
def compose_image_meta(image_id, original_image_shape, image_shape,
                       window, scale, active_class_ids):
    """Takes attributes of an image and puts them in one 1D array.

    image_id: An int ID of the image. Useful for debugging.
    original_image_shape: [H, W, C] before resizing or padding.
    image_shape: [H, W, C] after resizing and padding
    window: (y1, x1, y2, x2) in pixels. The area of the image where the real
        image is (excluding the padding)
    scale: The scaling factor applied to the original image (float32)
    active_class_ids: List of class_ids available in the dataset from which
        the image came. Useful if training on images from multiple datasets
        where not all classes are present in all datasets.
    """
    meta = np.array(
        [image_id] +                   # size=1
        list(original_image_shape) +   # size=3
        list(image_shape) +            # size=3
        list(window) +                 # size=4 (y1, x1, y2, x2) in image coordinates
        [scale] +                      # size=1
        list(active_class_ids)         # size=num_classes
    )
    return meta
Finally, mold_inputs stacks the per-image results into arrays and returns them.
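Since the meta vector is just a flat array, reading a field back is a matter of slicing at the offsets listed in the size comments above. The sketch below illustrates that layout; compose_meta_sketch and parse_meta_sketch are hypothetical names for this example, and the 81 classes and the sample shapes are assumed values.

```python
import numpy as np


def compose_meta_sketch(image_id, original_shape, image_shape, window,
                        scale, active_class_ids):
    # Same field order as compose_image_meta in the post.
    return np.array([image_id] + list(original_shape) + list(image_shape)
                    + list(window) + [scale] + list(active_class_ids))


def parse_meta_sketch(meta):
    # Offsets follow the sizes 1 + 3 + 3 + 4 + 1 + num_classes.
    return {
        "image_id": int(meta[0]),
        "original_image_shape": meta[1:4].astype(np.int32),
        "image_shape": meta[4:7].astype(np.int32),
        "window": meta[7:11].astype(np.int32),
        "scale": float(meta[11]),
        "active_class_ids": meta[12:].astype(np.int32),
    }


num_classes = 81  # assumed value for illustration
meta = compose_meta_sketch(0, (480, 640, 3), (1024, 1024, 3),
                           (128, 0, 896, 1024), 1.6,
                           np.zeros(num_classes, dtype=np.int32))
parsed = parse_meta_sketch(meta)
print(meta.shape, parsed["window"], parsed["scale"])
```

Because every field shares one array, the integers are stored as floats alongside scale, which is why the sketch casts them back when parsing.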
Back in detect, the get_anchors method is called first to generate the anchors (see 『計算機視覺』Mask-RCNN_錨框生成), with shape [anchor_count, (y1, x1, y2, x2)],
    # Anchors
    anchors = self.get_anchors(image_shape)
    # Duplicate across the batch dimension because Keras requires it
    # TODO: can this be optimized to avoid duplicating the anchors?
    # [anchor_count, (y1, x1, y2, x2)] --> [batch, anchor_count, (y1, x1, y2, x2)]
    anchors = np.broadcast_to(anchors, (self.config.BATCH_SIZE,) + anchors.shape)
A batch dimension is then added, giving a final shape of [batch, anchor_count, (y1, x1, y2, x2)].
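A small example shows what np.broadcast_to does to the anchors here. The two anchor boxes and the batch size of 2 are made-up values for illustration.

```python
import numpy as np

# Two toy anchors in (y1, x1, y2, x2) form; values are arbitrary.
anchors = np.array([[0.0, 0.0, 0.5, 0.5],
                    [0.25, 0.25, 1.0, 1.0]])
batch_size = 2  # stands in for config.BATCH_SIZE

# Prepend a batch axis without copying the data.
batched = np.broadcast_to(anchors, (batch_size,) + anchors.shape)

print(batched.shape)            # shape gains a leading batch axis
print(batched.strides[0])       # stride 0: every batch entry reuses the same memory
print(batched.flags.writeable)  # the broadcast result is a read-only view
```

The result is a view rather than a copy, so on the NumPy side the anchors are not physically duplicated even though every batch entry sees the same values.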
The Keras predict method then runs the forward pass. For prediction we only care about two of the outputs, detections and mrcnn_mask.
    # Run object detection
    # Defined in __init__: self.keras_model = self.build(mode=mode, config=config)
    # Returns a list: [detections, mrcnn_class, mrcnn_bbox,
    #                  mrcnn_mask, rpn_rois, rpn_class, rpn_bbox]
    # detections, [batch, num_detections, (y1, x1, y2, x2, class_id, score)]
    # mrcnn_mask, [batch, num_detections, MASK_POOL_SIZE, MASK_POOL_SIZE, NUM_CLASSES]
    detections, _, _, mrcnn_mask, _, _, _ = \
        self.keras_model.predict([molded_images, image_metas, anchors], verbose=0)
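All coordinate operations inside the network use positions relative to the input image, normalised by its height and width, so at the end they have to be converted back to pixel coordinates.
Because the input list does not require the images to share a size, this restoration has to be done one image at a time.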
    # Process detections
    results = []
    for i, image in enumerate(images):
        # Handled per image, because the original images are not guaranteed to share a size
        final_rois, final_class_ids, final_scores, final_masks = \
            self.unmold_detections(detections[i], mrcnn_mask[i],
                                   image.shape, molded_images[i].shape,
                                   windows[i])
def unmold_detections(self, detections, mrcnn_mask, original_image_shape,
                      image_shape, window):
    """Reformats the detections of one image from the format of the neural
    network output to a format suitable for use in the rest of the
    application.

    detections: [N, (y1, x1, y2, x2, class_id, score)] in normalized coordinates
    mrcnn_mask: [N, height, width, num_classes]
    original_image_shape: [H, W, C] Original image shape before resizing
    image_shape: [H, W, C] Shape of the image after resizing and padding
    window: [y1, x1, y2, x2] Pixel coordinates of box in the image where the real
            image is excluding the padding.

    Returns:
    boxes: [N, (y1, x1, y2, x2)] Bounding boxes in pixels
    class_ids: [N] Integer class IDs for each bounding box
    scores: [N] Float probability scores of the class_id
    masks: [height, width, num_instances] Instance masks
    """
    # How many detections do we have?
    # Detections array is padded with zeros. Find the first class_id == 0.
    # (DetectionLayer zero-pads the end of its output.)
    zero_ix = np.where(detections[:, 4] == 0)[0]
    N = zero_ix[0] if zero_ix.shape[0] > 0 else detections.shape[0]  # number of meaningful detections

    # Extract boxes, class_ids, scores, and class-specific masks
    boxes = detections[:N, :4]                          # [N, (y1, x1, y2, x2)]
    class_ids = detections[:N, 4].astype(np.int32)      # [N]
    scores = detections[:N, 5]                          # [N]
    masks = mrcnn_mask[np.arange(N), :, :, class_ids]   # [N, height, width]

    # Translate normalized coordinates in the resized image to pixel
    # coordinates in the original image before resizing
    window = utils.norm_boxes(window, image_shape[:2])  # normalize window w.r.t. the input image
    wy1, wx1, wy2, wx2 = window
    shift = np.array([wy1, wx1, wy1, wx1])
    wh = wy2 - wy1  # window height
    ww = wx2 - wx1  # window width
    scale = np.array([wh, ww, wh, ww])
    # Convert boxes to normalized coordinates on the window
    boxes = np.divide(boxes - shift, scale)
    # Convert boxes to pixel coordinates on the original image
    boxes = utils.denorm_boxes(boxes, original_image_shape[:2])

    # Filter out detections with zero area. Happens in early training when
    # network weights are still random
    exclude_ix = np.where(
        (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1]) <= 0)[0]
    if exclude_ix.shape[0] > 0:
        boxes = np.delete(boxes, exclude_ix, axis=0)
        class_ids = np.delete(class_ids, exclude_ix, axis=0)
        scores = np.delete(scores, exclude_ix, axis=0)
        masks = np.delete(masks, exclude_ix, axis=0)
        N = class_ids.shape[0]

    # Resize masks to original image size and set boundary threshold.
    full_masks = []
    for i in range(N):  # one box at a time
        # Convert neural network mask to full size mask
        full_mask = utils.unmold_mask(masks[i], boxes[i], original_image_shape)
        full_masks.append(full_mask)
    full_masks = np.stack(full_masks, axis=-1) \
        if full_masks else np.empty(original_image_shape[:2] + (0,))

    # boxes: [n, (y1, x1, y2, x2)], class_ids: [n], scores: [n], full_masks: [h, w, n]
    return boxes, class_ids, scores, full_masks
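To restore the output to a usable format, the following steps are needed:
Drop the all-zero detections that were added as padding to reach DETECTION_MAX_INSTANCES
Rescale the boxes back to the size of the original image
Drop boxes whose area is 0
Restore the mask outputs to the original size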
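The first step, together with the per-class mask selection that follows it in unmold_detections, can be tried on toy data. The detections and mask values below are made up, and 4 classes with 28x28 masks are assumed sizes for illustration.

```python
import numpy as np

# Rows are (y1, x1, y2, x2, class_id, score); the last two rows are zero padding.
detections = np.array([
    [0.1, 0.1, 0.5, 0.5, 3, 0.98],
    [0.2, 0.3, 0.9, 0.8, 1, 0.87],
    [0.0, 0.0, 0.0, 0.0, 0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0, 0.0],
])
num_classes = 4
# One 28x28 mask per detection and per class, filled with random values.
mrcnn_mask = np.random.default_rng(0).random((4, 28, 28, num_classes))

# The first row with class_id == 0 marks where the padding starts.
zero_ix = np.where(detections[:, 4] == 0)[0]
n = zero_ix[0] if zero_ix.shape[0] > 0 else detections.shape[0]

class_ids = detections[:n, 4].astype(np.int32)
# Pair detection i with its own class channel: result is [n, height, width].
masks = mrcnn_mask[np.arange(n), :, :, class_ids]

print(n, class_ids, masks.shape)
```

The indexing pairs each row index with one class index, so only the mask channel of the predicted class survives for each detection.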
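Inside the network the boxes are in normalised coordinates relative to the input image, while window is in pixel coordinates. So we first normalise window relative to the input image, which puts window and the boxes in the same coordinate system where they can be combined directly. We then normalise the boxes relative to window, so that box coordinates and sizes are expressed as fractions of window. Since window maps directly onto the original image, the boxes follow the same mapping and can simply be scaled back to the original pixel size.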
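The chain of coordinate changes can be followed numerically. This is a sketch under stated assumptions: norm_boxes_sketch and denorm_boxes_sketch are our own stand-ins for utils.norm_boxes and utils.denorm_boxes, and the pixel convention they use (the end coordinate lies one pixel past the box, hence the h - 1 and the shift of 1) is an assumption of this example. The shapes reuse a 480x640 image padded to 1024x1024.

```python
import numpy as np


def norm_boxes_sketch(boxes, shape):
    # Pixel (y1, x1, y2, x2) -> normalised; assumes y2/x2 are exclusive ends.
    h, w = shape
    scale = np.array([h - 1, w - 1, h - 1, w - 1])
    shift = np.array([0, 0, 1, 1])
    return (np.asarray(boxes) - shift) / scale


def denorm_boxes_sketch(boxes, shape):
    # Inverse of norm_boxes_sketch, rounded to whole pixels.
    h, w = shape
    scale = np.array([h - 1, w - 1, h - 1, w - 1])
    shift = np.array([0, 0, 1, 1])
    return np.around(boxes * scale + shift).astype(np.int32)


original_shape = (480, 640)       # original image
molded_shape = (1024, 1024)       # after resize and padding
window_px = (128, 0, 896, 1024)   # where the real image sits, in pixels

# Step 1: put window in the same normalised system as the boxes.
window = norm_boxes_sketch(window_px, molded_shape)
wy1, wx1, wy2, wx2 = window
shift = np.array([wy1, wx1, wy1, wx1])
scale = np.array([wy2 - wy1, wx2 - wx1, wy2 - wy1, wx2 - wx1])

# A detection that covers exactly the window (the whole real image).
boxes = window[np.newaxis, :]

# Step 2: express the box relative to the window.
boxes_on_window = (boxes - shift) / scale
# Step 3: map to pixels of the original image.
boxes_px = denorm_boxes_sketch(boxes_on_window, original_shape)
print(boxes_on_window, boxes_px)
```

A box that coincides with the window becomes (0, 0, 1, 1) relative to the window and therefore covers the full original image after the last step.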
Once the boxes have been mapped, we move on to the masks.
utils.unmold_mask is called at the end of unmold_detections:
    # Resize masks to original image size and set boundary threshold.
    full_masks = []
    for i in range(N):  # one box at a time
        # Convert neural network mask to full size mask
        full_mask = utils.unmold_mask(masks[i], boxes[i], original_image_shape)
        full_masks.append(full_mask)
    full_masks = np.stack(full_masks, axis=-1) \
        if full_masks else np.empty(original_image_shape[:2] + (0,))
To repeat, unmold_detections processes a single image, and the mask handling goes one level further and processes each detection box individually,
def unmold_mask(mask, bbox, image_shape):
    """Converts a mask generated by the neural network to a format similar
    to its original shape.
    mask: [height, width] of type float. A small, typically 28x28 mask.
    bbox: [y1, x1, y2, x2]. The box to fit the mask in.

    Returns a binary mask with the same size as the original image.
    """
    threshold = 0.5
    y1, x1, y2, x2 = bbox
    mask = resize(mask, (y2 - y1, x2 - x1))
    # np.bool has been removed from recent NumPy releases; the builtin bool is equivalent here
    mask = np.where(mask >= threshold, 1, 0).astype(bool)

    # Put the mask in the right location.
    full_mask = np.zeros(image_shape[:2], dtype=bool)
    full_mask[y1:y2, x1:x2] = mask
    return full_mask
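The mask produced in inference is just the raw, continuous-valued output of the network, so a threshold is needed to turn it into a binary mask. With that in mind the next step is simple: the mask output is resized to the size of its box (by this point the box has already been scaled to the original image and is in pixel coordinates), and the resized mask is then pasted, at the position of the box, onto a blank image of the same size as the original.
Doing this for every detection gives as many full-size mask images as there are detections (each holding a single object mask). These are stacked along axis=-1 to produce an output of shape [h, w, n], where h and w are the size of the original image and n is the number of objects finally detected.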
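The resize, threshold, paste and stack sequence can be reproduced with NumPy alone. In this sketch resize_nearest is our own nearest-neighbour stand-in for the interpolating resize used in unmold_mask, so edge pixels may differ from the real output; the image size, boxes and mask values are made up.

```python
import numpy as np


def resize_nearest(mask, out_h, out_w):
    # Nearest-neighbour resize: pick a source row/column for each output pixel.
    rows = np.arange(out_h) * mask.shape[0] // out_h
    cols = np.arange(out_w) * mask.shape[1] // out_w
    return mask[rows][:, cols]


def unmold_mask_sketch(mask, bbox, image_shape, threshold=0.5):
    y1, x1, y2, x2 = bbox
    # Stretch the small mask to the box size, then binarise it.
    resized = resize_nearest(mask, y2 - y1, x2 - x1)
    # Paste it onto a blank canvas the size of the original image.
    full_mask = np.zeros(image_shape[:2], dtype=bool)
    full_mask[y1:y2, x1:x2] = resized >= threshold
    return full_mask


image_shape = (100, 120, 3)
boxes = np.array([[10, 20, 50, 80],    # 40 x 60 box
                  [30, 0, 90, 60]])    # 60 x 60 box
masks = np.full((2, 28, 28), 0.9)
masks[1, :14, :] = 0.1                 # top half of the second mask is below threshold

full_masks = [unmold_mask_sketch(m, b, image_shape) for m, b in zip(masks, boxes)]
full_masks = np.stack(full_masks, axis=-1) \
    if full_masks else np.empty(image_shape[:2] + (0,))
print(full_masks.shape, full_masks[..., 0].sum(), full_masks[..., 1].sum())
```

Each channel of the stacked result holds one object, which is why the last axis has the same length as the number of detections.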
Finally, the results are returned and the function exits.
    # boxes: [n, (y1, x1, y2, x2)], class_ids: [n], scores: [n], full_masks: [h, w, n]
    return boxes, class_ids, scores, full_masks
The actual call is as follows: