上節的最後,咱們進行了以下操做獲取了有限的proposal,html
# [IMAGES_PER_GPU, num_rois, (y1, x1, y2, x2)] # IMAGES_PER_GPU取代了batch,以後說的batch都是IMAGES_PER_GPU rpn_rois = ProposalLayer( proposal_count=proposal_count, nms_threshold=config.RPN_NMS_THRESHOLD, # 0.7 name="ROI", config=config)([rpn_class, rpn_bbox, anchors])
總結一下:python
與 GT 的 IOU 大於0.7git
與某一個 GT 的 IOU 最大的那個 anchor網絡
進一步,咱們須要按照RCNN的思路,使用proposal對共享特徵進行ROI操做,在Mask-RCNN中這裏有兩個創新:架構
ROI使用ROI Align取代了以前的ROI Poolingapp
共享特徵由以前的單層變換爲了FPN獲得的金字塔多層特徵,即:
mrcnn_feature_maps
=
[P2, P3, P4, P5]
ide
其中創新點2意味着咱們不一樣的proposal對應去ROI的特徵層並不相同,因此,咱們須要:函數
按照proposal的長寬,將不一樣的proposal對應給不一樣的特徵層oop
在對應特徵層上進行ROI操做post
下面會用到高維切片函數,這裏先行給出講解連接:『TensorFlow』高級高維切片gather_nd
接前文bulid函數代碼,咱們以下調入實現本節的功能,
if mode == "training": …… else: # Network Heads # Proposal classifier and BBox regressor heads # output shapes: # mrcnn_class_logits: [batch, num_rois, NUM_CLASSES] classifier logits (before softmax) # mrcnn_class: [batch, num_rois, NUM_CLASSES] classifier probabilities # mrcnn_bbox(deltas): [batch, num_rois, NUM_CLASSES, (dy, dx, log(dh), log(dw))] mrcnn_class_logits, mrcnn_class, mrcnn_bbox =\ fpn_classifier_graph(rpn_rois, mrcnn_feature_maps, input_image_meta, config.POOL_SIZE, config.NUM_CLASSES, train_bn=config.TRAIN_BN, fc_layers_size=config.FPN_CLASSIF_FC_LAYERS_SIZE)
FPN特徵層分類函數縱覽以下,
############################################################ # Feature Pyramid Network Heads ############################################################ def fpn_classifier_graph(rois, feature_maps, image_meta, pool_size, num_classes, train_bn=True, fc_layers_size=1024): """Builds the computation graph of the feature pyramid network classifier and regressor heads. rois: [batch, num_rois, (y1, x1, y2, x2)] Proposal boxes in normalized coordinates. feature_maps: List of feature maps from different layers of the pyramid, [P2, P3, P4, P5]. Each has a different resolution. image_meta: [batch, (meta data)] Image details. See compose_image_meta() pool_size: The width of the square feature map generated from ROI Pooling. num_classes: number of classes, which determines the depth of the results train_bn: Boolean. Train or freeze Batch Norm layers fc_layers_size: Size of the 2 FC layers Returns: logits: [batch, num_rois, NUM_CLASSES] classifier logits (before softmax) probs: [batch, num_rois, NUM_CLASSES] classifier probabilities bbox_deltas: [batch, num_rois, NUM_CLASSES, (dy, dx, log(dh), log(dw))] Deltas to apply to proposal boxes """ # ROI Pooling # Shape: [batch, num_rois, POOL_SIZE, POOL_SIZE, channels] x = PyramidROIAlign([pool_size, pool_size], name="roi_align_classifier")([rois, image_meta] + feature_maps) # Two 1024 FC layers (implemented with Conv2D for consistency) # TimeDistributed拆分了輸入數據的第1維(從0開始),將徹底同樣的模型獨立的應用於拆分後的輸入數據,具體到下行, # 就是將num_rois個卷積應用到num_rois個維度爲[batch, POOL_SIZE, POOL_SIZE, channels]的輸入,結果合併 x = KL.TimeDistributed(KL.Conv2D(fc_layers_size, (pool_size, pool_size), padding="valid"), name="mrcnn_class_conv1")(x) # [batch, num_rois, 1, 1, 1024] x = KL.TimeDistributed(BatchNorm(), name='mrcnn_class_bn1')(x, training=train_bn) x = KL.Activation('relu')(x) x = KL.TimeDistributed(KL.Conv2D(fc_layers_size, (1, 1)), name="mrcnn_class_conv2")(x) x = KL.TimeDistributed(BatchNorm(), name='mrcnn_class_bn2')(x, training=train_bn) x = KL.Activation('relu')(x) shared = KL.Lambda(lambda x: K.squeeze(K.squeeze(x, 3), 2), name="pool_squeeze")(x) # [batch, num_rois, 1024] # Classifier head mrcnn_class_logits = KL.TimeDistributed(KL.Dense(num_classes), name='mrcnn_class_logits')(shared) mrcnn_probs = KL.TimeDistributed(KL.Activation("softmax"), name="mrcnn_class")(mrcnn_class_logits) # BBox head # [batch, num_rois, NUM_CLASSES * (dy, dx, log(dh), log(dw))] x = KL.TimeDistributed(KL.Dense(num_classes * 4, activation='linear'), name='mrcnn_bbox_fc')(shared) # Reshape to [batch, num_rois, NUM_CLASSES, (dy, dx, log(dh), log(dw))] s = K.int_shape(x) # 下行源碼:K.reshape(inputs, (K.shape(inputs)[0],) + self.target_shape) mrcnn_bbox = KL.Reshape((s[1], num_classes, 4), name="mrcnn_bbox")(x) return mrcnn_class_logits, mrcnn_probs, mrcnn_bbox
下面咱們來分析一下該函數。進入函數,首先調用了PyramidROI,
# ROI Pooling # Shape: [batch, num_rois, POOL_SIZE, POOL_SIZE, channels] x = PyramidROIAlign([pool_size, pool_size], name="roi_align_classifier")([rois, image_meta] + feature_maps)
這個class基本實現了咱們開篇所說的所有功能,即特徵層分類並ROI。
首先咱們依據『計算機視覺』FPN特徵金字塔網絡中第三節所講,對proposal進行分類,注意的是咱們使用於網絡中的hw是歸一化了的(以原圖hw爲單位長度),因此計算時須要還原(對於公式而言:w,h分別表示寬度和高度;k是分配RoI的level;是w,h=224,224時映射的level)。
注意兩個操做節點:level_boxes和box_indices,第一個記錄了對應的level特徵層中分配到的每一個box的座標,第二個則記錄了每一個box對應的圖片在batch中的索引(一個記錄了候選框索引對應的圖片即下文中的兩個大塊,一個記錄了候選框的索引對應其座標即小黑框的座標),二者結合能夠索引到下面每一個黑色小框的座標信息。
至於ROI Align自己,實際就是雙線性插值,使用內置API實現便可。
這裏屬於RPN網絡和RCNN網絡的分界線,level_boxes和box_indices自己屬於RPN計算出來結果,可是二者做用於feature後的輸出Tensor倒是RCNN部分的輸入,可是兩部分的梯度不能相互流通的,因此須要tf.stop_gradient()截斷梯度傳播。
############################################################ # ROIAlign Layer ############################################################ def log2_graph(x): """Implementation of Log2. TF doesn't have a native implementation.""" return tf.log(x) / tf.log(2.0) class PyramidROIAlign(KE.Layer): """Implements ROI Pooling on multiple levels of the feature pyramid. Params: - pool_shape: [pool_height, pool_width] of the output pooled regions. Usually [7, 7] Inputs: - boxes: [batch, num_boxes, (y1, x1, y2, x2)] in normalized coordinates. Possibly padded with zeros if not enough boxes to fill the array. - image_meta: [batch, (meta data)] Image details. See compose_image_meta() - feature_maps: List of feature maps from different levels of the pyramid. Each is [batch, height, width, channels] Output: Pooled regions in the shape: [batch, num_boxes, pool_height, pool_width, channels]. The width and height are those specific in the pool_shape in the layer constructor. """ def __init__(self, pool_shape, **kwargs): super(PyramidROIAlign, self).__init__(**kwargs) self.pool_shape = tuple(pool_shape) def call(self, inputs): # num_boxes指的是proposal數目,它們均會做用於每張圖片上,只是不一樣的proposal做用於圖片 # 的特徵級別不一樣,我經過循環特徵層尋找符合的proposal,應用ROIAlign # Crop boxes [batch, num_boxes, (y1, x1, y2, x2)] in normalized coords boxes = inputs[0] # Image meta # Holds details about the image. See compose_image_meta() image_meta = inputs[1] # Feature Maps. List of feature maps from different level of the # feature pyramid. Each is [batch, height, width, channels] feature_maps = inputs[2:] # Assign each ROI to a level in the pyramid based on the ROI area. y1, x1, y2, x2 = tf.split(boxes, 4, axis=2) h = y2 - y1 w = x2 - x1 # Use shape of first image. Images in a batch must have the same size. image_shape = parse_image_meta_graph(image_meta)['image_shape'][0] # h, w, c # Equation 1 in the Feature Pyramid Networks paper. Account for # the fact that our coordinates are normalized here. # e.g. a 224x224 ROI (in pixels) maps to P4 image_area = tf.cast(image_shape[0] * image_shape[1], tf.float32) roi_level = log2_graph(tf.sqrt(h * w) / (224.0 / tf.sqrt(image_area))) # h、w已經歸一化 roi_level = tf.minimum(5, tf.maximum( 2, 4 + tf.cast(tf.round(roi_level), tf.int32))) # 確保值位於2到5之間 roi_level = tf.squeeze(roi_level, 2) # [batch, num_boxes] # Loop through levels and apply ROI pooling to each. P2 to P5. pooled = [] box_to_level = [] for i, level in enumerate(range(2, 6)): # tf.where 返回值格式 [座標1, 座標2……] # np.where 返回值格式 [[座標1.x, 座標2.x……], [座標1.y, 座標2.y……]] ix = tf.where(tf.equal(roi_level, level)) # 返回座標表示:第n張圖片的第i個proposal level_boxes = tf.gather_nd(boxes, ix) # [本level的proposal數目, 4] # Box indices for crop_and_resize. box_indices = tf.cast(ix[:, 0], tf.int32) # 記錄每一個propose對應圖片序號 # Keep track of which box is mapped to which level box_to_level.append(ix) # Stop gradient propogation to ROI proposals level_boxes = tf.stop_gradient(level_boxes) box_indices = tf.stop_gradient(box_indices) # Crop and Resize # From Mask R-CNN paper: "We sample four regular locations, so # that we can evaluate either max or average pooling. In fact, # interpolating only a single value at each bin center (without # pooling) is nearly as effective." # # Here we use the simplified approach of a single value per bin, # which is how it's done in tf.crop_and_resize() # Result: [this_level_num_boxes, pool_height, pool_width, channels] pooled.append(tf.image.crop_and_resize( feature_maps[i], level_boxes, box_indices, self.pool_shape, method="bilinear")) # 輸入參數shape: # [batch, image_height, image_width, channels] # [this_level_num_boxes, 4] # [this_level_num_boxes] # [height, pool_width] # Pack pooled features into one tensor pooled = tf.concat(pooled, axis=0) # [batch*num_boxes, pool_height, pool_width, channels] # Pack box_to_level mapping into one array and add another # column representing the order of pooled boxes box_to_level = tf.concat(box_to_level, axis=0) # [batch*num_boxes, 2] box_range = tf.expand_dims(tf.range(tf.shape(box_to_level)[0]), 1) # [batch*num_boxes, 1] box_to_level = tf.concat([tf.cast(box_to_level, tf.int32), box_range], axis=1) # [batch*num_boxes, 3] # 截止到目前,咱們獲取了記錄所有ROIAlign結果feat集合的張量pooled,和記錄這些feat相關信息的張量box_to_level, # 因爲提取方法的緣由,此時的feat並非按照原始順序排序(先按batch而後按box index排序),下面咱們設法將之恢復順 # 序(ROIAlign做用於對應圖片的對應proposal生成feat) # Rearrange pooled features to match the order of the original boxes # Sort box_to_level by batch then box index # TF doesn't have a way to sort by two columns, so merge them and sort. # box_to_level[i, 0]表示的是當前feat隸屬的圖片索引,box_to_level[i, 1]表示的是其box序號 sorting_tensor = box_to_level[:, 0] * 100000 + box_to_level[:, 1] # [batch*num_boxes] ix = tf.nn.top_k(sorting_tensor, k=tf.shape( box_to_level)[0]).indices[::-1] ix = tf.gather(box_to_level[:, 2], ix) pooled = tf.gather(pooled, ix) # Re-add the batch dimension # [batch, num_boxes, (y1, x1, y2, x2)], [batch*num_boxes, pool_height, pool_width, channels] shape = tf.concat([tf.shape(boxes)[:2], tf.shape(pooled)[1:]], axis=0) pooled = tf.reshape(pooled, shape) return pooled # [batch, num_boxes, pool_height, pool_width, channels]
通過ROI以後,咱們獲取了衆多shape一致的小feat,爲了獲取他們的分類迴歸信息,咱們構建一系列並行的網絡進行處理,
# Two 1024 FC layers (implemented with Conv2D for consistency) # TimeDistributed拆分了輸入數據的第1維(從0開始),將徹底同樣的模型獨立的應用於拆分後的輸入數據,具體到下行, # 就是將num_rois個卷積應用到num_rois個維度爲[batch, POOL_SIZE, POOL_SIZE, channels]的輸入,結果合併 x = KL.TimeDistributed(KL.Conv2D(fc_layers_size, (pool_size, pool_size), padding="valid"), name="mrcnn_class_conv1")(x) # [batch, num_rois, 1, 1, 1024] x = KL.TimeDistributed(BatchNorm(), name='mrcnn_class_bn1')(x, training=train_bn) x = KL.Activation('relu')(x) x = KL.TimeDistributed(KL.Conv2D(fc_layers_size, (1, 1)), name="mrcnn_class_conv2")(x) x = KL.TimeDistributed(BatchNorm(), name='mrcnn_class_bn2')(x, training=train_bn) x = KL.Activation('relu')(x) shared = KL.Lambda(lambda x: K.squeeze(K.squeeze(x, 3), 2), name="pool_squeeze")(x) # [batch, num_rois, 1024] # Classifier head mrcnn_class_logits = KL.TimeDistributed(KL.Dense(num_classes), name='mrcnn_class_logits')(shared) mrcnn_probs = KL.TimeDistributed(KL.Activation("softmax"), name="mrcnn_class")(mrcnn_class_logits) # BBox head # [batch, num_rois, NUM_CLASSES * (dy, dx, log(dh), log(dw))] x = KL.TimeDistributed(KL.Dense(num_classes * 4, activation='linear'), name='mrcnn_bbox_fc')(shared) # Reshape to [batch, num_rois, NUM_CLASSES, (dy, dx, log(dh), log(dw))] s = K.int_shape(x) # 下行源碼:K.reshape(inputs, (K.shape(inputs)[0],) + self.target_shape) mrcnn_bbox = KL.Reshape((s[1], num_classes, 4), name="mrcnn_bbox")(x) return mrcnn_class_logits, mrcnn_probs, mrcnn_bbox
返回以下:
mrcnn_class_logits: [batch, num_rois, NUM_CLASSES] classifier logits (before softmax)
mrcnn_class: [batch, num_rois, NUM_CLASSES] classifier probabilities
mrcnn_bbox(deltas): [batch, num_rois, NUM_CLASSES, (dy, dx, log(dh), log(dw))]
KL.TimeDistributed實現創建一系列一樣架構的的並行網絡結構(dim0個),將[dim0, dim1, ……]中的每一個[dim1, ……]做爲輸入,並行的計算輸出。
def build(self, mode, config): """Build Mask R-CNN architecture. input_shape: The shape of the input image. mode: Either "training" or "inference". The inputs and outputs of the model differ accordingly. """ assert mode in ['training', 'inference'] # Image size must be dividable by 2 multiple times h, w = config.IMAGE_SHAPE[:2] # [1024 1024 3] if h / 2**6 != int(h / 2**6) or w / 2**6 != int(w / 2**6): # 這裏就限定了下采樣不會產生座標偏差 raise Exception("Image size must be dividable by 2 at least 6 times " "to avoid fractions when downscaling and upscaling." "For example, use 256, 320, 384, 448, 512, ... etc. ") # Inputs input_image = KL.Input( shape=[None, None, config.IMAGE_SHAPE[2]], name="input_image") input_image_meta = KL.Input(shape=[config.IMAGE_META_SIZE], name="input_image_meta") if mode == "training": # RPN GT input_rpn_match = KL.Input( shape=[None, 1], name="input_rpn_match", dtype=tf.int32) input_rpn_bbox = KL.Input( shape=[None, 4], name="input_rpn_bbox", dtype=tf.float32) # Detection GT (class IDs, bounding boxes, and masks) # 1. GT Class IDs (zero padded) input_gt_class_ids = KL.Input( shape=[None], name="input_gt_class_ids", dtype=tf.int32) # 2. GT Boxes in pixels (zero padded) # [batch, MAX_GT_INSTANCES, (y1, x1, y2, x2)] in image coordinates input_gt_boxes = KL.Input( shape=[None, 4], name="input_gt_boxes", dtype=tf.float32) # Normalize coordinates gt_boxes = KL.Lambda(lambda x: norm_boxes_graph( x, K.shape(input_image)[1:3]))(input_gt_boxes) # 3. GT Masks (zero padded) # [batch, height, width, MAX_GT_INSTANCES] if config.USE_MINI_MASK: input_gt_masks = KL.Input( shape=[config.MINI_MASK_SHAPE[0], config.MINI_MASK_SHAPE[1], None], name="input_gt_masks", dtype=bool) else: input_gt_masks = KL.Input( shape=[config.IMAGE_SHAPE[0], config.IMAGE_SHAPE[1], None], name="input_gt_masks", dtype=bool) elif mode == "inference": # Anchors in normalized coordinates input_anchors = KL.Input(shape=[None, 4], name="input_anchors") # Build the shared convolutional layers. # Bottom-up Layers # Returns a list of the last layers of each stage, 5 in total. # Don't create the thead (stage 5), so we pick the 4th item in the list. if callable(config.BACKBONE): _, C2, C3, C4, C5 = config.BACKBONE(input_image, stage5=True, train_bn=config.TRAIN_BN) else: _, C2, C3, C4, C5 = resnet_graph(input_image, config.BACKBONE, stage5=True, train_bn=config.TRAIN_BN) # Top-down Layers # TODO: add assert to varify feature map sizes match what's in config P5 = KL.Conv2D(config.TOP_DOWN_PYRAMID_SIZE, (1, 1), name='fpn_c5p5')(C5) # 256 P4 = KL.Add(name="fpn_p4add")([ KL.UpSampling2D(size=(2, 2), name="fpn_p5upsampled")(P5), KL.Conv2D(config.TOP_DOWN_PYRAMID_SIZE, (1, 1), name='fpn_c4p4')(C4)]) P3 = KL.Add(name="fpn_p3add")([ KL.UpSampling2D(size=(2, 2), name="fpn_p4upsampled")(P4), KL.Conv2D(config.TOP_DOWN_PYRAMID_SIZE, (1, 1), name='fpn_c3p3')(C3)]) P2 = KL.Add(name="fpn_p2add")([ KL.UpSampling2D(size=(2, 2), name="fpn_p3upsampled")(P3), KL.Conv2D(config.TOP_DOWN_PYRAMID_SIZE, (1, 1), name='fpn_c2p2')(C2)]) # Attach 3x3 conv to all P layers to get the final feature maps. P2 = KL.Conv2D(config.TOP_DOWN_PYRAMID_SIZE, (3, 3), padding="SAME", name="fpn_p2")(P2) P3 = KL.Conv2D(config.TOP_DOWN_PYRAMID_SIZE, (3, 3), padding="SAME", name="fpn_p3")(P3) P4 = KL.Conv2D(config.TOP_DOWN_PYRAMID_SIZE, (3, 3), padding="SAME", name="fpn_p4")(P4) P5 = KL.Conv2D(config.TOP_DOWN_PYRAMID_SIZE, (3, 3), padding="SAME", name="fpn_p5")(P5) # P6 is used for the 5th anchor scale in RPN. Generated by # subsampling from P5 with stride of 2. P6 = KL.MaxPooling2D(pool_size=(1, 1), strides=2, name="fpn_p6")(P5) # Note that P6 is used in RPN, but not in the classifier heads. rpn_feature_maps = [P2, P3, P4, P5, P6] mrcnn_feature_maps = [P2, P3, P4, P5] # Anchors if mode == "training": anchors = self.get_anchors(config.IMAGE_SHAPE) # Duplicate across the batch dimension because Keras requires it # TODO: can this be optimized to avoid duplicating the anchors? anchors = np.broadcast_to(anchors, (config.BATCH_SIZE,) + anchors.shape) # A hack to get around Keras's bad support for constants anchors = KL.Lambda(lambda x: tf.Variable(anchors), name="anchors")(input_image) else: anchors = input_anchors # RPN Model, 返回的是keras的Module對象, 注意keras中的Module對象是可call的 rpn = build_rpn_model(config.RPN_ANCHOR_STRIDE, # 1 3 256 len(config.RPN_ANCHOR_RATIOS), config.TOP_DOWN_PYRAMID_SIZE) # Loop through pyramid layers layer_outputs = [] # list of lists for p in rpn_feature_maps: layer_outputs.append(rpn([p])) # 保存各pyramid特徵通過RPN以後的結果 # Concatenate layer outputs # Convert from list of lists of level outputs to list of lists # of outputs across levels. # e.g. [[a1, b1, c1], [a2, b2, c2]] => [[a1, a2], [b1, b2], [c1, c2]] output_names = ["rpn_class_logits", "rpn_class", "rpn_bbox"] outputs = list(zip(*layer_outputs)) # [[logits2,……6], [class2,……6], [bbox2,……6]] outputs = [KL.Concatenate(axis=1, name=n)(list(o)) for o, n in zip(outputs, output_names)] # [batch, num_anchors, 2/4] # 其中num_anchors指的是所有特徵層上的anchors總數 rpn_class_logits, rpn_class, rpn_bbox = outputs # Generate proposals # Proposals are [batch, N, (y1, x1, y2, x2)] in normalized coordinates # and zero padded. # POST_NMS_ROIS_INFERENCE = 1000 # POST_NMS_ROIS_TRAINING = 2000 proposal_count = config.POST_NMS_ROIS_TRAINING if mode == "training"\ else config.POST_NMS_ROIS_INFERENCE # [IMAGES_PER_GPU, num_rois, (y1, x1, y2, x2)] # IMAGES_PER_GPU取代了batch,以後說的batch都是IMAGES_PER_GPU rpn_rois = ProposalLayer( proposal_count=proposal_count, nms_threshold=config.RPN_NMS_THRESHOLD, # 0.7 name="ROI", config=config)([rpn_class, rpn_bbox, anchors]) if mode == "training": # Class ID mask to mark class IDs supported by the dataset the image # came from. active_class_ids = KL.Lambda( lambda x: parse_image_meta_graph(x)["active_class_ids"] )(input_image_meta) if not config.USE_RPN_ROIS: # Ignore predicted ROIs and use ROIs provided as an input. input_rois = KL.Input(shape=[config.POST_NMS_ROIS_TRAINING, 4], name="input_roi", dtype=np.int32) # Normalize coordinates target_rois = KL.Lambda(lambda x: norm_boxes_graph( x, K.shape(input_image)[1:3]))(input_rois) else: target_rois = rpn_rois # Generate detection targets # Subsamples proposals and generates target outputs for training # Note that proposal class IDs, gt_boxes, and gt_masks are zero # padded. Equally, returned rois and targets are zero padded. rois, target_class_ids, target_bbox, target_mask =\ DetectionTargetLayer(config, name="proposal_targets")([ target_rois, input_gt_class_ids, gt_boxes, input_gt_masks]) # Network Heads # TODO: verify that this handles zero padded ROIs mrcnn_class_logits, mrcnn_class, mrcnn_bbox =\ fpn_classifier_graph(rois, mrcnn_feature_maps, input_image_meta, config.POOL_SIZE, config.NUM_CLASSES, train_bn=config.TRAIN_BN, fc_layers_size=config.FPN_CLASSIF_FC_LAYERS_SIZE) mrcnn_mask = build_fpn_mask_graph(rois, mrcnn_feature_maps, input_image_meta, config.MASK_POOL_SIZE, config.NUM_CLASSES, train_bn=config.TRAIN_BN) # TODO: clean up (use tf.identify if necessary) output_rois = KL.Lambda(lambda x: x * 1, name="output_rois")(rois) # Losses rpn_class_loss = KL.Lambda(lambda x: rpn_class_loss_graph(*x), name="rpn_class_loss")( [input_rpn_match, rpn_class_logits]) rpn_bbox_loss = KL.Lambda(lambda x: rpn_bbox_loss_graph(config, *x), name="rpn_bbox_loss")( [input_rpn_bbox, input_rpn_match, rpn_bbox]) class_loss = KL.Lambda(lambda x: mrcnn_class_loss_graph(*x), name="mrcnn_class_loss")( [target_class_ids, mrcnn_class_logits, active_class_ids]) bbox_loss = KL.Lambda(lambda x: mrcnn_bbox_loss_graph(*x), name="mrcnn_bbox_loss")( [target_bbox, target_class_ids, mrcnn_bbox]) mask_loss = KL.Lambda(lambda x: mrcnn_mask_loss_graph(*x), name="mrcnn_mask_loss")( [target_mask, target_class_ids, mrcnn_mask]) # Model inputs = [input_image, input_image_meta, input_rpn_match, input_rpn_bbox, input_gt_class_ids, input_gt_boxes, input_gt_masks] if not config.USE_RPN_ROIS: inputs.append(input_rois) outputs = [rpn_class_logits, rpn_class, rpn_bbox, mrcnn_class_logits, mrcnn_class, mrcnn_bbox, mrcnn_mask, rpn_rois, output_rois, rpn_class_loss, rpn_bbox_loss, class_loss, bbox_loss, mask_loss] model = KM.Model(inputs, outputs, name='mask_rcnn') else: # Network Heads # Proposal classifier and BBox regressor heads # output shapes: # mrcnn_class_logits: [batch, num_rois, NUM_CLASSES] classifier logits (before softmax) # mrcnn_class: [batch, num_rois, NUM_CLASSES] classifier probabilities # mrcnn_bbox(deltas): [batch, num_rois, NUM_CLASSES, (dy, dx, log(dh), log(dw))] mrcnn_class_logits, mrcnn_class, mrcnn_bbox =\ fpn_classifier_graph(rpn_rois, mrcnn_feature_maps, input_image_meta, config.POOL_SIZE, config.NUM_CLASSES, train_bn=config.TRAIN_BN, fc_layers_size=config.FPN_CLASSIF_FC_LAYERS_SIZE) # Detections # output is [batch, num_detections, (y1, x1, y2, x2, class_id, score)] in # normalized coordinates detections = DetectionLayer(config, name="mrcnn_detection")( [rpn_rois, mrcnn_class, mrcnn_bbox, input_image_meta]) # Create masks for detections detection_boxes = KL.Lambda(lambda x: x[..., :4])(detections) mrcnn_mask = build_fpn_mask_graph(detection_boxes, mrcnn_feature_maps, input_image_meta, config.MASK_POOL_SIZE, config.NUM_CLASSES, train_bn=config.TRAIN_BN) model = KM.Model([input_image, input_image_meta, input_anchors], [detections, mrcnn_class, mrcnn_bbox, mrcnn_mask, rpn_rois, rpn_class, rpn_bbox], name='mask_rcnn') # Add multi-GPU support. if config.GPU_COUNT > 1: from mrcnn.parallel_model import ParallelModel model = ParallelModel(model, config.GPU_COUNT) return model