[PaddlePaddle (飛槳) Developer Says] Wu Binghong, engineer at a leading Chinese internet company, computer vision enthusiast; research interests: object detection and medical imaging.
Overview
Download and installation commands:
## CPU version
pip install -f https://paddlepaddle.org.cn/pip/oschina/cpu paddlepaddle
## GPU version
pip install -f https://paddlepaddle.org.cn/pip/oschina/gpu paddlepaddle-gpu
EfficientDet, proposed by Google Brain at the end of 2019, is the undisputed new SOTA in object detection and was accepted to CVPR 2020. This article analyzes the EfficientDet algorithm in detail and describes how the model was reproduced with PaddleDetection, the official object detection development kit.
EfficientDet comes from a CVPR 2020 paper (https://arxiv.org/abs/1911.09070, official code: https://github.com/google/automl/tree/master/efficientdet). Its core idea is to start from EfficientNet, a backbone obtained through network architecture search, perform further multi-scale feature fusion with the newly designed BiFPN, and finally produce detection boxes through classification/regression branches, thereby extending an efficient classifier into an efficient detector. In overall structure, EfficientDet does not differ significantly from RetinaNet and other anchor-based one-stage detectors, but every individual module is pushed to maximize accuracy under limited compute and memory budgets.
Performance comparison between EfficientDet and other mainstream models:
EfficientDet network architecture:
As shown above from left to right, EfficientDet consists of three parts: the feature extraction module (Backbone), EfficientNet; the multi-scale feature fusion module (Neck), BiFPN; and the classification/regression prediction module (Head), the Class/Box prediction nets. The model definition code is as follows:
class EfficientDet(object):
    """
    EfficientDet architecture, see https://arxiv.org/abs/1911.09070

    Args:
        backbone (object): backbone instance
        fpn (object): feature pyramid network instance
        retina_head (object): `RetinaHead` instance
    """
    __category__ = 'architecture'
    __inject__ = ['backbone', 'fpn', 'efficient_head', 'anchor_grid']

    def __init__(self,
                 backbone,
                 fpn,
                 efficient_head,
                 anchor_grid,
                 box_loss_weight=50.):
        super(EfficientDet, self).__init__()
        self.backbone = backbone
        self.fpn = fpn
        self.efficient_head = efficient_head
        self.anchor_grid = anchor_grid
        self.box_loss_weight = box_loss_weight

    def build(self, feed_vars, mode='train'):
        im = feed_vars['image']
        if mode == 'train':
            gt_labels = feed_vars['gt_label']
            gt_targets = feed_vars['gt_target']
            fg_num = feed_vars['fg_num']
        else:
            im_info = feed_vars['im_info']

        mixed_precision_enabled = mixed_precision_global_state() is not None
        if mixed_precision_enabled:
            im = fluid.layers.cast(im, 'float16')
        body_feats = self.backbone(im)
        if mixed_precision_enabled:
            body_feats = [fluid.layers.cast(f, 'float32') for f in body_feats]
        body_feats = self.fpn(body_feats)
        anchors = self.anchor_grid()

        if mode == 'train':
            loss = self.efficient_head.get_loss(body_feats, gt_labels,
                                                gt_targets, fg_num)
            loss_cls = loss['loss_cls']
            loss_bbox = loss['loss_bbox']
            total_loss = loss_cls + self.box_loss_weight * loss_bbox
            loss.update({'loss': total_loss})
            return loss
        else:
            pred = self.efficient_head.get_prediction(body_feats, anchors,
                                                      im_info)
            return pred
For the overall model, EfficientDet offers a range of configurations from lightweight to heavy to trade off speed against accuracy. Taking EfficientDet-D0 as an example, the model and training configuration (in YML format) is:
architecture: EfficientDet
…
pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/EfficientNetB0_pretrained.tar
weights: output/efficientdet_d0/model_final
…

EfficientDet:
  backbone: EfficientNet
  fpn: BiFPN
  efficient_head: EfficientHead
  anchor_grid: AnchorGrid
  box_loss_weight: 50.

EfficientNet:
  norm_type: sync_bn
  scale: b0
  use_se: true

BiFPN:
  num_chan: 64
  repeat: 3
  levels: 5

EfficientHead:
  repeat: 3
  num_chan: 64
  prior_prob: 0.01
  num_anchors: 9
  gamma: 1.5
  alpha: 0.25
  delta: 0.1
  output_decoder:
    score_thresh: 0.05   # originally 0.
    nms_thresh: 0.5
    pre_nms_top_n: 1000  # originally 5000
    detections_per_im: 100
    nms_eta: 1.0

AnchorGrid:
  anchor_base_scale: 4
  num_scales: 3
  aspect_ratios: [[1, 1], [1.4, 0.7], [0.7, 1.4]]
…
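To make the mapping between these YML keys and the Python modules more concrete, here is a minimal wiring sketch (my own, not part of PaddleDetection) that instantiates the two modules walked through later in this article, EfficientNet and BiFPN, with the D0 settings above; EfficientHead and AnchorGrid are omitted for brevity, and sync_bn is replaced with plain bn for a single-card run:

import paddle.fluid as fluid

# Assumes the EfficientNet and BiFPN classes shown later in this article are
# importable; this only sketches how the YML sections map to constructor
# arguments, it is not the PaddleDetection training entry point.
image = fluid.data(name='image', shape=[None, 3, 512, 512], dtype='float32')

backbone = EfficientNet(scale='b0', use_se=True, norm_type='bn')  # EfficientNet section
fpn = BiFPN(num_chan=64, repeat=3, levels=5)                      # BiFPN section

body_feats = backbone(image)  # three feature maps at strides 8/16/32
fpn_feats = fpn(body_feats)   # five fused pyramid levels
# fpn_feats would then be consumed by EfficientHead / AnchorGrid (not shown here).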
Backbone: EfficientNet

EfficientNet is a classification network published at ICML 2019 by the same first author, Mingxing Tan. Its aim is to find, under a limited compute budget, how to compose a network so that it reaches the highest possible classification accuracy. The design considers three dimensions that drive both accuracy and resource consumption: network depth, network width, and input image resolution. In the architecture-search setting, the authors formulate the optimization objective as:

\max_{d,w,r} \; \mathrm{Accuracy}\big(\mathcal{N}(d, w, r)\big)
\text{s.t.}\quad \mathcal{N}(d, w, r) = \bigodot_{i=1 \dots s} \hat{\mathcal{F}}_i^{\,d \cdot \hat{L}_i}\big(X_{\langle r \cdot \hat{H}_i,\; r \cdot \hat{W}_i,\; w \cdot \hat{C}_i \rangle}\big),
\quad \mathrm{Memory}(\mathcal{N}) \le \text{target\_memory},
\quad \mathrm{FLOPS}(\mathcal{N}) \le \text{target\_flops}
During the search, network depth (d), network width (w) and input resolution (r) are the free variables. To model how the three are coupled under a fixed compute budget, the authors propose the following constraint:

d = \alpha^{\phi}, \quad w = \beta^{\phi}, \quad r = \gamma^{\phi}
\text{s.t.}\quad \alpha \cdot \beta^{2} \cdot \gamma^{2} \approx 2, \quad \alpha \ge 1,\; \beta \ge 1,\; \gamma \ge 1
Unlike other network-design work, EfficientNet scales the searched architecture with a compound scaling method, which proceeds in two steps:

1. With the compute budget fixed, run a small grid search to obtain the base depth/width/resolution ratios;

2. Scale depth, width and resolution simultaneously with the compound coefficient φ, which yields the family EfficientNet-B0 through B7 (a minimal sketch of this scaling follows below).
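As a rough illustration (my own sketch, not code from the paper), applying the compound scaling rule with the base coefficients reported in the EfficientNet paper, α = 1.2, β = 1.1, γ = 1.15, shows how the three multipliers grow with φ; note that the released B0-B7 models use hand-rounded coefficients, which is why the implementation below simply stores a (width, depth) pair per scale:

# Sketch of compound scaling; alpha/beta/gamma are the grid-searched base
# coefficients reported in the EfficientNet paper.
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15

def compound_scale(phi):
    """Return (depth, width, resolution) multipliers for compound coefficient phi."""
    depth = ALPHA ** phi        # layer-count multiplier
    width = BETA ** phi         # channel-count multiplier
    resolution = GAMMA ** phi   # input-resolution multiplier
    return depth, width, resolution

for phi in range(4):
    d, w, r = compound_scale(phi)
    # total FLOPS grow roughly as (alpha * beta**2 * gamma**2) ** phi ~ 2 ** phi
    print('phi=%d  depth x%.2f  width x%.2f  resolution x%.2f' % (phi, d, w, r))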
The EfficientNet implementation in PaddleDetection is as follows:
from __future__ import absolute_import
from __future__ import division

import collections
import math
import re

from paddle import fluid
from paddle.fluid.regularizer import L2Decay

from ppdet.core.workspace import register

__all__ = ['EfficientNet']

GlobalParams = collections.namedtuple('GlobalParams', [
    'batch_norm_momentum', 'batch_norm_epsilon', 'width_coefficient',
    'depth_coefficient', 'depth_divisor'
])

BlockArgs = collections.namedtuple('BlockArgs', [
    'kernel_size', 'num_repeat', 'input_filters', 'output_filters',
    'expand_ratio', 'stride', 'se_ratio'
])

GlobalParams.__new__.__defaults__ = (None, ) * len(GlobalParams._fields)
BlockArgs.__new__.__defaults__ = (None, ) * len(BlockArgs._fields)


def _decode_block_string(block_string):
    assert isinstance(block_string, str)
    ops = block_string.split('_')
    options = {}
    for op in ops:
        splits = re.split(r'(\d.*)', op)
        if len(splits) >= 2:
            key, value = splits[:2]
            options[key] = value

    assert (('s' in options and len(options['s']) == 1) or
            (len(options['s']) == 2 and options['s'][0] == options['s'][1]))

    return BlockArgs(
        kernel_size=int(options['k']),
        num_repeat=int(options['r']),
        input_filters=int(options['i']),
        output_filters=int(options['o']),
        expand_ratio=int(options['e']),
        se_ratio=float(options['se']) if 'se' in options else None,
        stride=int(options['s'][0]))


def get_model_params(scale):
    block_strings = [
        'r1_k3_s11_e1_i32_o16_se0.25',
        'r2_k3_s22_e6_i16_o24_se0.25',
        'r2_k5_s22_e6_i24_o40_se0.25',
        'r3_k3_s22_e6_i40_o80_se0.25',
        'r3_k5_s11_e6_i80_o112_se0.25',
        'r4_k5_s22_e6_i112_o192_se0.25',
        'r1_k3_s11_e6_i192_o320_se0.25',
    ]
    block_args = []
    for block_string in block_strings:
        block_args.append(_decode_block_string(block_string))

    params_dict = {
        # width, depth
        'b0': (1.0, 1.0),
        'b1': (1.0, 1.1),
        'b2': (1.1, 1.2),
        'b3': (1.2, 1.4),
        'b4': (1.4, 1.8),
        'b5': (1.6, 2.2),
        'b6': (1.8, 2.6),
        'b7': (2.0, 3.1),
    }

    w, d = params_dict[scale]

    global_params = GlobalParams(
        batch_norm_momentum=0.99,
        batch_norm_epsilon=1e-3,
        width_coefficient=w,
        depth_coefficient=d,
        depth_divisor=8)

    return block_args, global_params


def round_filters(filters, global_params):
    multiplier = global_params.width_coefficient
    if not multiplier:
        return filters
    divisor = global_params.depth_divisor
    filters *= multiplier
    min_depth = divisor
    new_filters = max(min_depth, int(filters + divisor / 2) // divisor * divisor)
    if new_filters < 0.9 * filters:  # prevent rounding by more than 10%
        new_filters += divisor
    return int(new_filters)


def round_repeats(repeats, global_params):
    multiplier = global_params.depth_coefficient
    if not multiplier:
        return repeats
    return int(math.ceil(multiplier * repeats))


def conv2d(inputs,
           num_filters,
           filter_size,
           stride=1,
           padding='SAME',
           groups=1,
           use_bias=False,
           name='conv2d'):
    param_attr = fluid.ParamAttr(name=name + '_weights')
    bias_attr = False
    if use_bias:
        bias_attr = fluid.ParamAttr(
            name=name + '_offset', regularizer=L2Decay(0.))
    feats = fluid.layers.conv2d(
        inputs,
        num_filters,
        filter_size,
        groups=groups,
        name=name,
        stride=stride,
        padding=padding,
        param_attr=param_attr,
        bias_attr=bias_attr)
    return feats


def batch_norm(inputs, momentum, eps, name=None):
    param_attr = fluid.ParamAttr(name=name + '_scale', regularizer=L2Decay(0.))
    bias_attr = fluid.ParamAttr(name=name + '_offset', regularizer=L2Decay(0.))
    return fluid.layers.batch_norm(
        input=inputs,
        momentum=momentum,
        epsilon=eps,
        name=name,
        moving_mean_name=name + '_mean',
        moving_variance_name=name + '_variance',
        param_attr=param_attr,
        bias_attr=bias_attr)


def mb_conv_block(inputs,
                  input_filters,
                  output_filters,
                  expand_ratio,
                  kernel_size,
                  stride,
                  momentum,
                  eps,
                  se_ratio=None,
                  name=None):
    feats = inputs
    num_filters = input_filters * expand_ratio

    # expansion phase (1x1 conv), skipped when expand_ratio == 1
    if expand_ratio != 1:
        feats = conv2d(feats, num_filters, 1, name=name + '_expand_conv')
        feats = batch_norm(feats, momentum, eps, name=name + '_bn0')
        feats = fluid.layers.swish(feats)

    # depthwise convolution
    feats = conv2d(
        feats,
        num_filters,
        kernel_size,
        stride,
        groups=num_filters,
        name=name + '_depthwise_conv')
    feats = batch_norm(feats, momentum, eps, name=name + '_bn1')
    feats = fluid.layers.swish(feats)

    # squeeze-and-excitation
    if se_ratio is not None:
        filter_squeezed = max(1, int(input_filters * se_ratio))
        squeezed = fluid.layers.pool2d(
            feats, pool_type='avg', global_pooling=True)
        squeezed = conv2d(
            squeezed,
            filter_squeezed,
            1,
            use_bias=True,
            name=name + '_se_reduce')
        squeezed = fluid.layers.swish(squeezed)
        squeezed = conv2d(
            squeezed, num_filters, 1, use_bias=True, name=name + '_se_expand')
        feats = feats * fluid.layers.sigmoid(squeezed)

    # projection phase (1x1 conv)
    feats = conv2d(feats, output_filters, 1, name=name + '_project_conv')
    feats = batch_norm(feats, momentum, eps, name=name + '_bn2')

    # skip connection when spatial size and channel count are preserved
    if stride == 1 and input_filters == output_filters:
        feats = fluid.layers.elementwise_add(feats, inputs)

    return feats


@register
class EfficientNet(object):
    """
    EfficientNet, see https://arxiv.org/abs/1905.11946

    Args:
        scale (str): compounding scale factor, 'b0' - 'b7'.
        use_se (bool): use squeeze and excite module.
        norm_type (str): normalization type, 'bn' and 'sync_bn' are supported
    """
    __shared__ = ['norm_type']

    def __init__(self, scale='b0', use_se=True, norm_type='bn'):
        assert scale in ['b' + str(i) for i in range(8)], \
            "valid scales are b0 - b7"
        assert norm_type in ['bn', 'sync_bn'], \
            "only 'bn' and 'sync_bn' are supported"

        super(EfficientNet, self).__init__()
        self.norm_type = norm_type
        self.scale = scale
        self.use_se = use_se

    def __call__(self, inputs):
        blocks_args, global_params = get_model_params(self.scale)
        momentum = global_params.batch_norm_momentum
        eps = global_params.batch_norm_epsilon

        # stem: 3x3 conv with stride 2
        num_filters = round_filters(32, global_params)
        feats = conv2d(
            inputs,
            num_filters=num_filters,
            filter_size=3,
            stride=2,
            name='_conv_stem')
        feats = batch_norm(feats, momentum=momentum, eps=eps, name='_bn0')
        feats = fluid.layers.swish(feats)

        layer_count = 0
        feature_maps = []

        for b, block_arg in enumerate(blocks_args):
            for r in range(block_arg.num_repeat):
                input_filters = round_filters(block_arg.input_filters,
                                              global_params)
                output_filters = round_filters(block_arg.output_filters,
                                               global_params)
                kernel_size = block_arg.kernel_size
                stride = block_arg.stride
                se_ratio = None
                if self.use_se:
                    se_ratio = block_arg.se_ratio

                if r > 0:
                    input_filters = output_filters
                    stride = 1

                feats = mb_conv_block(
                    feats,
                    input_filters,
                    output_filters,
                    block_arg.expand_ratio,
                    kernel_size,
                    stride,
                    momentum,
                    eps,
                    se_ratio=se_ratio,
                    name='_blocks.{}.'.format(layer_count))

                layer_count += 1
            feature_maps.append(feats)

        return list(feature_maps[i] for i in [2, 4, 6])
The scale argument of EfficientNet corresponds to the compound coefficient in the paper and can be set to b0 - b7. During training and inference, the feature maps of three block groups (indices 2, 4 and 6, i.e. strides 8, 16 and 32) are returned and fed into BiFPN for further multi-scale feature fusion.
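As a quick illustration of what the scale parameter changes, the snippet below (illustrative only, reusing the helper functions from the listing above) compares two scales:

# Compare how the scaling helpers defined above behave for two scales.
for scale in ('b0', 'b4'):
    block_args, global_params = get_model_params(scale)
    stem_channels = round_filters(32, global_params)
    first_block_repeats = round_repeats(block_args[0].num_repeat, global_params)
    print(scale,
          'width_coef=%.1f' % global_params.width_coefficient,
          'depth_coef=%.1f' % global_params.depth_coefficient,
          'stem_channels=%d' % stem_channels,
          'first_block_repeats=%d' % first_block_repeats)
# b0 keeps the 32-channel stem, while b4 (width coefficient 1.4) rounds it up to 48.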
Neck: BiFPN
As the main novelty of EfficientDet, BiFPN fuses features across scales by stacking multiple BiFPN layers. The figure below compares a BiFPN layer with several FPN variants. Compared with the basic FPN, BiFPN adds a second, bottom-up fusion path on top of the top-down connections. Compared with PANet, which also has a bottom-up pass, BiFPN additionally has same-scale skip connections across layers (purple arrows), and every fusion node in a BiFPN layer carries its own set of attention weights: to compute a node's feature map, the feature maps at the tails of the arrows pointing into that node are first resized to the node's scale, then multiplied by normalized weights and summed.
Taking node P6 as an example, the features are fused as follows, where the first equation gives the intermediate (middle-column) node P6td and the second gives the output (last-column) node P6out:

P_6^{td} = \mathrm{Conv}\!\left(\frac{w_1 \cdot P_6^{in} + w_2 \cdot \mathrm{Resize}(P_7^{in})}{w_1 + w_2 + \epsilon}\right)

P_6^{out} = \mathrm{Conv}\!\left(\frac{w_1' \cdot P_6^{in} + w_2' \cdot P_6^{td} + w_3' \cdot \mathrm{Resize}(P_5^{out})}{w_1' + w_2' + w_3' + \epsilon}\right)
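For intuition, here is a tiny standalone sketch (NumPy, not the PaddleDetection code) of this fast normalized fusion for the two-input case:

import numpy as np

# Toy sketch of fast normalized fusion for two inputs.
eps = 1e-4
w = np.maximum(np.array([1.0, 1.0]), 0.0)  # learnable gates, clipped to be non-negative
w = w / (w.sum() + eps)                    # normalize so the weights sum to ~1

p6_in = np.random.rand(1, 64, 16, 16).astype('float32')
p7_up = np.random.rand(1, 64, 16, 16).astype('float32')  # P7 after 2x nearest-neighbor upsampling

p6_td = w[0] * p6_in + w[1] * p7_up
# In BiFPN this weighted sum then goes through a depthwise-separable conv +
# BN + swish (the FusionConv class in the listing below).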
The concrete implementation of a BiFPN layer is shown below (the BiFPNCell class):
from __future__ import absolute_import
from __future__ import division

from paddle import fluid
from paddle.fluid.param_attr import ParamAttr
from paddle.fluid.regularizer import L2Decay
from paddle.fluid.initializer import Constant, Xavier

from ppdet.core.workspace import register

__all__ = ['BiFPN']


class FusionConv(object):
    def __init__(self, num_chan):
        super(FusionConv, self).__init__()
        self.num_chan = num_chan

    def __call__(self, inputs, name=''):
        x = fluid.layers.swish(inputs)
        # depthwise
        x = fluid.layers.conv2d(
            x,
            self.num_chan,
            filter_size=3,
            padding='SAME',
            groups=self.num_chan,
            param_attr=ParamAttr(
                initializer=Xavier(), name=name + '_dw_w'),
            bias_attr=False)
        # pointwise
        x = fluid.layers.conv2d(
            x,
            self.num_chan,
            filter_size=1,
            param_attr=ParamAttr(
                initializer=Xavier(), name=name + '_pw_w'),
            bias_attr=ParamAttr(
                regularizer=L2Decay(0.), name=name + '_pw_b'))
        # bn + act
        x = fluid.layers.batch_norm(
            x,
            momentum=0.997,
            epsilon=1e-04,
            param_attr=ParamAttr(
                initializer=Constant(1.0),
                regularizer=L2Decay(0.),
                name=name + '_bn_w'),
            bias_attr=ParamAttr(
                regularizer=L2Decay(0.), name=name + '_bn_b'))
        return x


class BiFPNCell(object):
    def __init__(self, num_chan, levels=5):
        super(BiFPNCell, self).__init__()
        self.levels = levels
        self.num_chan = num_chan
        num_trigates = levels - 2
        num_bigates = levels
        self.trigates = fluid.layers.create_parameter(
            shape=[num_trigates, 3],
            dtype='float32',
            default_initializer=fluid.initializer.Constant(1.))
        self.bigates = fluid.layers.create_parameter(
            shape=[num_bigates, 2],
            dtype='float32',
            default_initializer=fluid.initializer.Constant(1.))
        self.eps = 1e-4

    def __call__(self, inputs, cell_name=''):
        assert len(inputs) == self.levels

        def upsample(feat):
            return fluid.layers.resize_nearest(feat, scale=2.)

        def downsample(feat):
            return fluid.layers.pool2d(
                feat,
                pool_type='max',
                pool_size=3,
                pool_stride=2,
                pool_padding='SAME')

        fuse_conv = FusionConv(self.num_chan)

        # normalize weight
        trigates = fluid.layers.relu(self.trigates)
        bigates = fluid.layers.relu(self.bigates)
        trigates /= fluid.layers.reduce_sum(
            trigates, dim=1, keep_dim=True) + self.eps
        bigates /= fluid.layers.reduce_sum(
            bigates, dim=1, keep_dim=True) + self.eps

        feature_maps = list(inputs)  # make a copy
        # top down path
        for l in range(self.levels - 1):
            p = self.levels - l - 2
            w1 = fluid.layers.slice(
                bigates, axes=[0, 1], starts=[l, 0], ends=[l + 1, 1])
            w2 = fluid.layers.slice(
                bigates, axes=[0, 1], starts=[l, 1], ends=[l + 1, 2])
            above = upsample(feature_maps[p + 1])
            feature_maps[p] = fuse_conv(
                w1 * above + w2 * inputs[p],
                name='{}_tb_{}'.format(cell_name, l))
        # bottom up path
        for l in range(1, self.levels):
            p = l
            name = '{}_bt_{}'.format(cell_name, l)
            below = downsample(feature_maps[p - 1])
            if p == self.levels - 1:
                # handle P7
                w1 = fluid.layers.slice(
                    bigates, axes=[0, 1], starts=[p, 0], ends=[p + 1, 1])
                w2 = fluid.layers.slice(
                    bigates, axes=[0, 1], starts=[p, 1], ends=[p + 1, 2])
                feature_maps[p] = fuse_conv(
                    w1 * below + w2 * inputs[p], name=name)
            else:
                w1 = fluid.layers.slice(
                    trigates, axes=[0, 1], starts=[p - 1, 0], ends=[p, 1])
                w2 = fluid.layers.slice(
                    trigates, axes=[0, 1], starts=[p - 1, 1], ends=[p, 2])
                w3 = fluid.layers.slice(
                    trigates, axes=[0, 1], starts=[p - 1, 2], ends=[p, 3])
                feature_maps[p] = fuse_conv(
                    w1 * feature_maps[p] + w2 * below + w3 * inputs[p],
                    name=name)
        return feature_maps


@register
class BiFPN(object):
    """
    Bidirectional Feature Pyramid Network, see https://arxiv.org/abs/1911.09070

    Args:
        num_chan (int): number of feature channels
        repeat (int): number of repeats of the BiFPN module
        level (int): number of FPN levels, default: 5
    """

    def __init__(self, num_chan, repeat=3, levels=5):
        super(BiFPN, self).__init__()
        self.num_chan = num_chan
        self.repeat = repeat
        self.levels = levels

    def __call__(self, inputs):
        feats = []
        # NOTE add two extra levels
        for idx in range(self.levels):
            if idx <= len(inputs):
                if idx == len(inputs):
                    feat = inputs[-1]
                else:
                    feat = inputs[idx]

                if feat.shape[1] != self.num_chan:
                    feat = fluid.layers.conv2d(
                        feat,
                        self.num_chan,
                        filter_size=1,
                        padding='SAME',
                        param_attr=ParamAttr(initializer=Xavier()),
                        bias_attr=ParamAttr(regularizer=L2Decay(0.)))
                    feat = fluid.layers.batch_norm(
                        feat,
                        momentum=0.997,
                        epsilon=1e-04,
                        param_attr=ParamAttr(
                            initializer=Constant(1.0),
                            regularizer=L2Decay(0.)),
                        bias_attr=ParamAttr(regularizer=L2Decay(0.)))

            if idx >= len(inputs):
                feat = fluid.layers.pool2d(
                    feat,
                    pool_type='max',
                    pool_size=3,
                    pool_stride=2,
                    pool_padding='SAME')

            feats.append(feat)

        biFPN = BiFPNCell(self.num_chan, self.levels)
        for r in range(self.repeat):
            feats = biFPN(feats, 'bifpn_{}'.format(r))
        return feats
On top of this BiFPN layer, the full BiFPN is built by stacking several such layers. As the backbone switches to larger EfficientNet variants, BiFPN follows the same design idea as EfficientNet and uses a similar compound coefficient φ to control its own width and depth:

W_{bifpn} = 64 \cdot \left(1.35^{\phi}\right), \qquad D_{bifpn} = 3 + \phi
With this compound-coefficient control, the backbone and BiFPN grow in lockstep, as listed in the scaling-config table of the paper; for D0 (φ = 0) this gives a BiFPN with 64 channels ("#channels") and 3 layers ("#layers").
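A short sanity check of the two formulas (my own sketch; the officially released configs round W_bifpn to hand-picked channel counts, so treat the printed widths as approximate):

# Evaluate the BiFPN compound scaling rules from the paper.
def bifpn_scaling(phi):
    w_bifpn = 64 * (1.35 ** phi)  # BiFPN width  (#channels), before rounding
    d_bifpn = 3 + phi             # BiFPN depth  (#layers)
    return w_bifpn, d_bifpn

for phi in range(4):
    w, d = bifpn_scaling(phi)
    print('phi=%d  ~%d channels, %d layers' % (phi, round(w), d))
# phi=0 gives 64 channels and 3 layers, matching num_chan=64 / repeat=3 in
# the efficientdet_d0.yml shown earlier.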
In the PaddleDetection implementation, BiFPN is implemented as in the code above, and the corresponding knobs live in the BiFPN section of the efficientdet_d0.yml config shown earlier: the repeat parameter maps to the paper's BiFPN "#layers", and num_chan maps to "#channels". Overall, as the EfficientDet backbone becomes deeper and more complex, more BiFPN layers are stacked and the channel count grows with them.
Head: Class prediction net & Box prediction net
As an anchor-based detector, EfficientDet's head is largely the same as in other SOTA detectors: it classifies and regresses detection boxes on each of the five feature maps produced by BiFPN. Both the class prediction net and the box prediction net are built from depthwise separable convolution layers, and the number of stacked layers again depends on the backbone size. What differs from other anchor-based detectors is that EfficientDet shares the convolution parameters of the prediction nets across pyramid levels to reduce the parameter count: most convolution layers of the classification and regression branches use the same kernels regardless of level, while the batch-norm layers remain independent per level. As the code below shows, the conv layer names do not depend on the level argument, whereas the BN names do:
def subnet(inputs, prefix, level):
    feat = inputs
    for i in range(self.repeat):
        # NOTE share weight across FPN levels
        conv_name = '{}_pred_conv_{}'.format(prefix, i)
        feat = separable_conv(feat, self.num_chan, name=conv_name)
        # NOTE batch norm params are not shared
        bn_name = '{}_pred_bn_{}_{}'.format(prefix, level, i)
        feat = fluid.layers.batch_norm(
            input=feat,
            act='swish',
            momentum=0.997,
            epsilon=1e-4,
            moving_mean_name=bn_name + '_mean',
            moving_variance_name=bn_name + '_variance',
            param_attr=ParamAttr(
                name=bn_name + '_w',
                initializer=Constant(value=1.),
                regularizer=L2Decay(0.)),
            bias_attr=ParamAttr(
                name=bn_name + '_b', regularizer=L2Decay(0.)))
    return feat
Training
As noted in the EfficientDet paper, D0-D7 were trained with a batch size of 128 on 32 TPUv3 cores for 300 epochs (600 epochs for D7/D7x). Training with our EfficientDet-D0 configuration reproduced in PaddleDetection, the model converges at epoch 216. Evaluated on COCO minival it gives the results below, within 0.02 mAP of the number reported in the paper:
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.341
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.523
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.360
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.134
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.401
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.525
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.289
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.445
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.471
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.196
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.559
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.690
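For reference, a summary table in exactly this format can be produced with pycocotools from COCO-format ground-truth annotations and a detection-result JSON (the file paths below are placeholders):

from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval

# Placeholder paths: the minival annotation file and the bbox results dumped
# by the evaluation script.
coco_gt = COCO('annotations/instances_minival2014.json')
coco_dt = coco_gt.loadRes('bbox_detections.json')

coco_eval = COCOeval(coco_gt, coco_dt, iouType='bbox')
coco_eval.evaluate()
coco_eval.accumulate()
coco_eval.summarize()  # prints the AP/AR table in the format shown above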
The reproduced EfficientDet-D0 therefore matches the results reported in the paper. The full reproduction code and models will be merged into the official PaddleDetection repository soon, and COCO-pretrained models for the larger configurations will be added over time. Feedback and usage are very welcome:
https://github.com/PaddlePaddle/PaddleDetection