[Semantic Segmentation] DeepLab V3 (repost)

Date: 2019-12-13

Tags: semantic segmentation, deeplab v3

Original post: original link

Paper: DeepLabv3

Code: git

TensorFlow

Abstract

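DeepLabv3 takes a further look at atrous (dilated) convolution, a powerful tool in semantic segmentation for adjusting a filter's field of view and for controlling the resolution of the feature responses computed by a convolutional neural network. To segment objects at multiple scales, we design modules that use atrous convolution either in cascade or in parallel with different sampling rates. We also revisit the ASPP (Atrous Spatial Pyramid Pooling) module, which probes convolutional features at multiple scales and further improves performance. Finally, we share implementation details and training experience. The proposed DeepLabv3 improves significantly over the previous versions and reaches state-of-the-art-level performance on PASCAL VOC 2012.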

Introduction

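Applying deep convolutional neural networks (DCNNs) to semantic segmentation faces two challenges:

The first challenge: consecutive pooling and downsampling give high-level features a built-in invariance to local image transformations, which lets a DCNN learn increasingly abstract feature representations. The accompanying loss of feature resolution, however, hampers dense prediction tasks, which need detailed spatial information. The DeepLab series addresses this with atrous convolution (the first two versions also refine the segmentation result with a CRF). Atrous convolution lets us compute feature responses at a higher resolution while keeping the number of parameters and the per-position computation unchanged, and so capture more context.
The second challenge: objects exist at multiple scales. There are several ways to handle multi-scale objects; we mainly consider four, shown in the figure below:
- a. Image Pyramid: the input image is rescaled to several sizes, a DCNN is applied to each, and the predictions are fused into the final output
- b. Encoder-Decoder: multi-scale features from the encoder stage are used in the decoder stage to recover spatial resolution (representative works include FCN, SegNet and PSPNet)
- c. Deeper w. Atrous Convolution: extra modules, for example DenseCRF, are added on top of the original model to capture long-range information between pixels
- d. Spatial Pyramid Pooling: spatial pyramid pooling uses kernels with different sampling rates and several fields of view, so it can capture objects at multiple scales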

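The main contributions of DeepLabv3 are:

The paper revisits atrous convolution, which, within both cascaded modules and spatial pyramid pooling, enlarges the receptive field and so captures multi-scale information.
The ASPP module is improved: it consists of atrous convolutions with different sampling rates plus batch normalization (BN) layers, and we experiment with laying out the modules in cascade or in parallel.
An important practical issue is discussed: a $3 \times 3$ atrous convolution with a very large sampling rate cannot capture long-range information because of image boundary effects, and it effectively degenerates to a $1 \times 1$ convolution. We therefore propose incorporating image-level features into the ASPP module.
Training details and experience are shared. The proposed "DeepLabv3" improves over the earlier versions and obtains very good results.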

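Many works have shown that global features or contextual interactions help semantic segmentation. We discuss four types of fully convolutional networks that exploit context information for semantic segmentation.

Image pyramid: usually the same model, with shared weights, is applied to inputs at multiple scales. Responses from small-scale inputs carry the semantics, while responses from large-scale inputs carry the details. A Laplacian pyramid transforms the input into multiple scales, each scale is fed to a DCNN, and the outputs are merged. The drawback is that, because of limited GPU memory, this approach does not scale well to larger or deeper models, so it is usually applied at inference time.
Encoder-decoder: the high-level encoder features readily capture longer-range information, and the decoder uses information from the encoder stage to recover object detail and spatial dimensions. For example, SegNet reuses the pooling indices from downsampling to guide upsampling; U-Net adds skip connections from the encoder features to the decoder; RefineNet and others demonstrate the effectiveness of the encoder-decoder structure.
Context module: extra modules are laid out in cascade to encode long-range context. One effective approach is to incorporate DenseCRF into the DCNN and train the DCNN and the CRF jointly.
Spatial pyramid pooling: spatial pyramid pooling captures context at several levels. ParseNet exploits image-level features for context information; DeepLabv2 proposes ASPP, in which parallel atrous convolutions with different sampling rates capture multi-scale information. More recently, PSPNet performs spatial pooling at several grid scales and achieves excellent results on multiple datasets. There are also LSTM-based methods that aggregate global information.

Our work mainly explores atrous convolution both as a context module and as a tool for spatial pyramid pooling, which applies to any network. Concretely, we take the last block of ResNet, duplicate it several times and arrange the copies in cascade, and we also revisit the ASPP module. We found experimentally that BN layers help training, and to capture more global context we propose adding image-level features to ASPP.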

Method

Atrous convolution for dense feature extraction

This was already covered for DeepLabv1 and DeepLabv2, so it is not explained in detail here.
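As a brief reminder of the idea, here is a minimal 1-D sketch in plain Python (not the paper's or the repository's code). The function names `atrous_conv1d` and `effective_kernel_size` are made up for this illustration, and the operation is the cross-correlation form that deep learning frameworks call convolution. It shows that a rate of r spaces the kernel taps r samples apart, which widens the field of view without adding weights.

```python
def atrous_conv1d(signal, kernel, rate):
    """1-D atrous convolution with zero padding; output has the input's length."""
    center = len(kernel) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for t, w in enumerate(kernel):
            j = i + (t - center) * rate   # taps are spaced `rate` samples apart
            if 0 <= j < len(signal):      # taps outside the signal read zero padding
                acc += w * signal[j]
        out.append(acc)
    return out


def effective_kernel_size(kernel_size, rate):
    """Span covered by a kernel with (rate - 1) holes between adjacent taps."""
    return kernel_size + (kernel_size - 1) * (rate - 1)


signal = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
kernel = [1.0, 1.0, 1.0]
print(atrous_conv1d(signal, kernel, rate=1))   # ordinary convolution
print(atrous_conv1d(signal, kernel, rate=2))   # same 3 weights, wider span
print(effective_kernel_size(3, 2))
```

With rate 2 the three weights cover a span of five samples, so each output sees more context while the parameter count stays at three.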

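Going deeper with atrous convolution

We first explore atrous convolution in cascaded modules. Specifically, we take the last block of ResNet, block4 in the figure below, and append cascaded modules after it.

As (a) in the figure above shows, the information of the whole image is summarized into a very small feature map at the end, but experiments show that this is harmful for semantic segmentation, as in the figure below:

The larger the output stride of the feature map, the worse the result. The best result, at out_stride = 8, requires more memory. Consecutive downsampling lowers the resolution of the feature maps and decimates detail, which hurts semantic segmentation.
As (b) in the figure above shows, atrous convolutions with different sampling rates can keep the output stride at out_stride = 16. This effectively reduces the stride without adding parameters or computation.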

Atrous Spatial Pyramid Pooling

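The ASPP module proposed in DeepLabv2 applies four parallel atrous convolutions with different sampling rates on top of the feature map, which showed that sampling at different scales is effective. In DeepLabv3 we add BN layers to ASPP. Atrous convolutions with different sampling rates capture multi-scale information effectively, but we found that as the sampling rate grows, the number of valid filter weights (weights applied to the valid feature region rather than to zero padding) shrinks, as shown in the figure below:

When $3 \times 3$ kernels with different sampling rates are applied to a $65 \times 65$ feature map and the rate approaches the feature-map size, the $3 \times 3$ filter no longer captures whole-image context. It degenerates to a simple $1 \times 1$ filter, because only the center weight is effective.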
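The degeneration can be checked by counting. The sketch below is an illustration in plain Python, not code from the paper; `valid_tap_histogram` is a hypothetical helper. For every position on a square feature map it counts how many of the nine taps of a $3 \times 3$ atrous kernel fall inside the map rather than on zero padding.

```python
from collections import Counter


def valid_tap_histogram(feature_size, rate):
    """Map 'number of in-bounds taps' -> 'number of positions with that count'."""
    def valid_along_axis(i):
        # A 3-tap axis samples positions i - rate, i and i + rate.
        return sum(1 for d in (-rate, 0, rate) if 0 <= i + d < feature_size)

    per_axis = [valid_along_axis(i) for i in range(feature_size)]
    hist = Counter()
    for rows in per_axis:
        for cols in per_axis:
            hist[rows * cols] += 1   # valid taps of the 2-D kernel at this position
    return hist


size = 65
for rate in (1, 6, 12, 18, 32, 65):
    hist = valid_tap_histogram(size, rate)
    total = size * size
    print(rate, "all 9 taps valid: %.3f" % (hist[9] / total),
          "center tap only: %.3f" % (hist[1] / total))
```

As the rate approaches 65, the share of positions where all nine weights are used falls to zero and the share where only the center weight is used rises to one, which is the $1 \times 1$ behaviour described above.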

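To overcome this problem we use image-level features. Specifically, we apply global average pooling to the last feature map of the model, feed the result through a $1 \times 1$ convolution, and bilinearly upsample it to the desired spatial size. The improved ASPP finally consists of:

One $1 \times 1$ convolution and three $3 \times 3$ atrous convolutions with $rates = \{6, 12, 18\}$, each with 256 filters and a BN layer. These rates are for output_stride=16. See part (a), Atrous Spatial Pyramid Pooling, in the figure below.
Image-level features: the features are global-average-pooled, passed through a convolution, and then fused. See part (b), Image Pooling, in the figure below.

The improved ASPP module is shown in the figure below:

Note that the sampling rates are doubled when output_stride=8. The features from all branches are concatenated and passed through a $1 \times 1$ convolution to produce the final scores.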
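The branch layout can be written down as a small configuration table. This is only a sketch in plain Python; `aspp_branches` and the dictionary keys are names chosen for this illustration, and the 256 filters for the image-pooling branch are taken from the repository code discussed later rather than from the text above.

```python
def aspp_branches(output_stride):
    """Describe the ASPP branches for output_stride 16 or 8."""
    if output_stride not in (8, 16):
        raise ValueError("this sketch only covers output_stride 8 and 16")
    scale = 16 // output_stride          # 1 at stride 16, 2 at stride 8
    branches = [{"name": "conv1x1", "kernel": 1, "rate": 1, "filters": 256}]
    for base_rate in (6, 12, 18):
        rate = base_rate * scale         # rates are doubled at output_stride 8
        branches.append({"name": "conv3x3_rate%d" % rate,
                         "kernel": 3, "rate": rate, "filters": 256})
    # Image-level branch: global average pooling -> 1x1 conv -> bilinear upsampling.
    branches.append({"name": "image_pooling", "kernel": 1, "rate": 1, "filters": 256})
    return branches


for stride in (16, 8):
    print(stride, [b["name"] for b in aspp_branches(stride)])
```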

Experiment

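The backbone is a pretrained ResNet, with atrous convolution used to control the output stride. The output stride, output_stride, is defined as the ratio of the input image resolution to the final output resolution. When the output stride is 8, the last two blocks of the original ResNet use atrous convolution with sampling rates $r = 2$ and $r = 4$.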
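To make the definition concrete, the sketch below estimates the feature-map size for a given input size and output stride. It assumes every stride-2 layer uses SAME-style padding, so each halving rounds up; `feature_map_size` is a hypothetical helper, not part of the paper or the repository.

```python
def feature_map_size(input_size, output_stride):
    """Spatial size after downsampling by `output_stride` (a power of two)."""
    size = input_size
    stride = 1
    while stride < output_stride:
        size = (size + 1) // 2   # ceil(size / 2): one stride-2 layer, SAME padding
        stride *= 2
    return size


for os_ in (8, 16, 32):
    print("input 513, output_stride %d -> %d" % (os_, feature_map_size(513, os_)))
```

Under this assumption a 513 crop gives a $65 \times 65$ map at output stride 8 and a $33 \times 33$ map at output stride 16, roughly four times fewer positions.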

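Training settings:

Item	Setting
Dataset	PASCAL VOC 2012
Framework	TensorFlow
Crop size	Crops of size 513 are sampled
Learning rate policy	The poly policy: the initial learning rate is multiplied by $(1 - \frac{iter}{max\_iter})^{power}$, with $power = 0.9$
BN policy	With output_stride=16 we use a batch size of 16 and a BN decay of 0.9997. After training for 30K iterations on the augmented dataset with an initial learning rate of 0.007, the BN parameters are frozen. We then switch to output_stride=8 and train for another 30K iterations with an initial learning rate of 0.001. Training with output_stride=16 is much faster than with output_stride=8 because the intermediate feature maps are spatially four times smaller, but the coarser feature maps at output_stride=16 cost accuracy.
Upsampling policy	In earlier work, the final output was compared against the ground truth downsampled by 8. We now find it important to keep the ground truth intact, so the final output is upsampled by 8 and compared against the full-resolution ground truth.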
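The poly policy from the table can be sketched in a few lines of plain Python. The function name `poly_learning_rate` is chosen for this illustration; the values 0.007, 30K and 0.9 are the ones quoted in the table.

```python
def poly_learning_rate(base_lr, step, max_steps, power=0.9):
    """Poly schedule: base_lr * (1 - step / max_steps) ** power."""
    return base_lr * (1.0 - step / max_steps) ** power


for step in (0, 10000, 20000, 30000):
    print(step, "%.6f" % poly_learning_rate(0.007, step, 30000))
```

The rate starts at the initial value and decays smoothly to zero at the last iteration.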

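Experiments on going deeper with atrous convolution

We first experiment with cascading several blocks that use atrous convolution.

ResNet50: as shown in the figure below, we study the effect of the output stride. At an output stride of 256, performance drops sharply because of severe signal decimation.

When atrous convolution with different sampling rates is used instead, results improve markedly, which shows that atrous convolution is necessary for semantic segmentation.
ResNet50 vs. ResNet101: we use a deeper model and vary the number of cascaded blocks. As shown in the figure below, performance improves as more blocks are added.
Multi-grid: our variant of the residual module uses the Multi-grid strategy: all three convolutions of the main branch are atrous convolutions, with rates set by the Multi-grid values. From the figure below:
- Applying different strategies is generally better than the uniform setting $(r_{1}, r_{2}, r_{3}) = (1, 1, 1)$
- Simply doubling the unit rates, $(r_{1}, r_{2}, r_{3}) = (2, 2, 2)$, is not effective
- Going deeper with Multi-grid improves performance; the best result is obtained with block7 and $(r_{1}, r_{2}, r_{3}) = (1, 2, 1)$
Inference strategy on val set:
Using output_stride = 8 at inference time preserves richer detail: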
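For the Multi-grid item above, the final atrous rate of each convolution is the block's base rate multiplied by the corresponding unit rate. A minimal sketch in plain Python, where `multi_grid_rates` is a name chosen for this illustration:

```python
def multi_grid_rates(block_rate, multi_grid):
    """Final atrous rates: the block's base rate times each Multi-grid unit rate."""
    return tuple(block_rate * unit_rate for unit_rate in multi_grid)


# A block with base rate 2 and Multi-grid (1, 2, 4).
print(multi_grid_rates(2, (1, 2, 4)))
# The uniform setting (1, 1, 1) leaves the base rate unchanged.
print(multi_grid_rates(2, (1, 1, 1)))
```

The same product, `rate * multi_grid[i]`, appears in the `bottleneck_hdc` function of the repository code analysed later in this post.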

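Experiments on Atrous Spatial Pyramid Pooling

Compared with the earlier version, the ASPP module now has BN layers. Results comparing the gains from the multi-grid strategy and from image-level features:

Inference strategy on val set:
Using output_stride = 8 at inference time preserves richer detail; multi-scale inputs and flipping improve performance further:

Results on PASCAL VOC 2012:

Results on Cityscapes

Results for various combinations of techniques:

Comparison with other models:

Effect of other hyperparameters

Effect of the upsampling strategy, crop size and BN layers:
Effect of batch size:
Effect of the evaluation output stride: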

Conclusion

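DeepLabv3 focuses on the use of atrous convolution and improves the ASPP module so that it captures multi-scale context better.

Code analysis

I could not find the official code, so I picked a DeepLabV3-TensorFlow implementation on GitHub.

Training script analysis

Start with the training file train_voc12.py.

The key part is the main function:

Building the training model and computing the loss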

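import sys
import time

import tensorflow as tf  # TensorFlow 1.x API

slim = tf.contrib.slim
# args, read_data, deeplabv3, streaming_mean_iou, load, save and NUM_VAL
# are defined elsewhere in the repository.


def main():
    """Create the model and prepare for training."""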
    h = args.input_size
    w = args.input_size
    input_size = (h, w)

    # Set the random seed
    tf.set_random_seed(args.random_seed)

    # Create the coordinator for the input queue threads
    coord = tf.train.Coordinator()

    # Read the data
    image_batch, label_batch = read_data(is_training=True)

    # Build the training model
    net, end_points = deeplabv3(image_batch,
                                num_classes=args.num_classes,
                                depth=args.num_layers,
                                is_training=True,
                                )
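    # For small batch sizes it is better to keep the BN statistics
    # (i.e. freeze the BN statistics of the pretrained model).
    # If is_training=True, the statistics are updated during training.
    # Note: even with is_training=False, the BN parameters gamma (scale)
    # and beta (offset) are still updated.

    # Fetch the model predictions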
    raw_output = end_points['resnet{}/logits'.format(args.num_layers)]

    # Which variables to load. Running means and variances are not trainable,
    # thus all_variables() should be restored.
    restore_var = [v for v in tf.global_variables() if 'fc' not in v.name 
        or not args.not_restore_last]
    if args.freeze_bn:
        all_trainable = [v for v in tf.trainable_variables() if 'beta' not in 
            v.name and 'gamma' not in v.name]
    else:
        all_trainable = [v for v in tf.trainable_variables()]
    conv_trainable = [v for v in all_trainable if 'fc' not in v.name] 

    # Upsample the logits instead of downsampling the ground truth
    raw_output_up = tf.image.resize_bilinear(raw_output, [h, w])  # bilinear resize back to the input size

    # Predictions: ignore labels greater than or equal to num_classes
    label_proc = tf.squeeze(label_batch)  # drop size-1 dimensions from the label tensor
    mask = label_proc <= args.num_classes - 1  # keep only valid class labels
    seg_logits = tf.boolean_mask(raw_output_up, mask)  # logits at the valid pixels
    seg_gt = tf.boolean_mask(label_proc, mask)  # labels at the valid pixels
    seg_gt = tf.cast(seg_gt, tf.int32)  # cast the labels to int32

    # Pixel-wise softmax loss.
    loss = tf.nn.sparse_softmax_cross_entropy_with_logits(logits=seg_logits,
        labels=seg_gt)
    seg_loss = tf.reduce_mean(loss)
    seg_loss_sum = tf.summary.scalar('loss/seg', seg_loss) # TensorBoard記錄

    # Add the regularization loss
    reg_losses = tf.get_collection(tf.GraphKeys.REGULARIZATION_LOSSES)
    reg_loss = tf.add_n(reg_losses)
    reg_loss_sum = tf.summary.scalar('loss/reg', reg_loss)

    tot_loss = seg_loss + reg_loss
    tot_loss_sum = tf.summary.scalar('loss/tot', tot_loss)

    seg_pred = tf.argmax(seg_logits, axis=1)

    # Compute the mean IOU
    train_mean_iou, train_update_mean_iou = streaming_mean_iou(seg_pred, 
        seg_gt, args.num_classes, name="train_iou")  

    train_iou_sum = tf.summary.scalar('accuracy/train_mean_iou', 
        train_mean_iou)

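The code for streaming_mean_iou is in metric_ops.py. It computes the mean intersection over union (mIOU) at each step: the IOU of each class is computed first and then averaged over the classes.

IOU is defined as follows:

$IOU = \frac{true\_positive}{true\_positive + false\_positive + false\_negative}$

The method returns an update_op that estimates the metric over a stream of data: it updates the accumulated variables and returns mean_iou.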
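The metric itself can be illustrated without TensorFlow. The sketch below is a plain Python approximation, not the code in metric_ops.py: it builds a confusion matrix from flat prediction and label lists and averages the per-class IOU. Skipping classes that appear in neither the predictions nor the labels is an assumption of this sketch, and `mean_iou` is a name chosen for the illustration.

```python
def mean_iou(predictions, labels, num_classes):
    """Mean IOU over classes, from flat lists of predicted and true class ids."""
    # confusion[t][p] counts pixels with true class t predicted as class p.
    confusion = [[0] * num_classes for _ in range(num_classes)]
    for p, t in zip(predictions, labels):
        confusion[t][p] += 1

    ious = []
    for c in range(num_classes):
        true_positive = confusion[c][c]
        false_negative = sum(confusion[c]) - true_positive
        false_positive = sum(confusion[r][c] for r in range(num_classes)) - true_positive
        denominator = true_positive + false_positive + false_negative
        if denominator > 0:              # skip classes absent from both inputs
            ious.append(true_positive / denominator)
    return sum(ious) / len(ious) if ious else 0.0


predictions = [0, 0, 1, 1]
labels = [0, 1, 1, 1]
print(mean_iou(predictions, labels, num_classes=2))
```

In this example class 0 has IOU 1/2 and class 1 has IOU 2/3, so the mean is 7/12. A streaming version would keep the confusion matrix between batches and only add to it, which is the role of the update_op described above.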

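The code above builds the DeepLabv3 model, fetches its output, and computes the loss and the mIOU.

Training parameter settings

The poly learning rate policy is not used here; the GitHub repository says a learning rate of 0.00001 works somewhat better.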
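The masking step in that loss can be illustrated in plain Python. This is a sketch of the idea, not the repository's code: pixels whose label is not a valid class index are dropped before the cross-entropy is averaged. The label value 255, used for void pixels in PASCAL VOC, serves as the example of an ignored label, and `masked_softmax_cross_entropy` is a name chosen for the illustration.

```python
import math


def masked_softmax_cross_entropy(logits, labels, num_classes):
    """Mean softmax cross-entropy over pixels whose label is a valid class id."""
    losses = []
    for pixel_logits, label in zip(logits, labels):
        if not 0 <= label < num_classes:     # e.g. an ignore label such as 255
            continue
        peak = max(pixel_logits)             # subtract the max for numerical stability
        log_sum_exp = peak + math.log(sum(math.exp(v - peak) for v in pixel_logits))
        losses.append(log_sum_exp - pixel_logits[label])
    return sum(losses) / len(losses) if losses else 0.0


logits = [[0.0, 0.0], [5.0, 1.0], [2.0, 2.0]]   # three pixels, two classes
labels = [0, 255, 1]                            # the middle pixel is ignored
print(masked_softmax_cross_entropy(logits, labels, num_classes=2))
```

Both remaining pixels have equal logits for the two classes, so the result is the natural logarithm of 2.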

    # Initializer for the training mIOU metric variables
    train_initializer = tf.variables_initializer(var_list=tf.get_collection(
        tf.GraphKeys.LOCAL_VARIABLES, scope="train_iou"))

    # Define the loss and the optimization parameters.
    # The poly learning rate policy is not used here
    base_lr = tf.constant(args.learning_rate)
    step_ph = tf.placeholder(dtype=tf.float32, shape=())
    # learning_rate = tf.scalar_mul(base_lr, 
    # tf.pow((1 - step_ph / args.num_steps), args.power))
    learning_rate = base_lr
    lr_sum = tf.summary.scalar('params/learning_rate', learning_rate)

    train_sum_op = tf.summary.merge([seg_loss_sum, reg_loss_sum, 
        tot_loss_sum, train_iou_sum, lr_sum])

Building the validation model and setting up its outputs

    # Validation model
    image_batch_val, label_batch_val = read_data(is_training=False)
    _, end_points_val = deeplabv3(image_batch_val,
                                  num_classes=args.num_classes,
                                  depth=args.num_layers,
                                  reuse=True,
                                  is_training=False,
                                  )
    raw_output_val = end_points_val['resnet{}/logits'.format(args.num_layers)] # validation output
    nh, nw = tf.shape(image_batch_val)[1], tf.shape(image_batch_val)[2]

    seg_logits_val = tf.image.resize_bilinear(raw_output_val, [nh, nw])
    seg_pred_val = tf.argmax(seg_logits_val, axis=3)
    seg_pred_val = tf.expand_dims(seg_pred_val, 3)
    seg_pred_val = tf.reshape(seg_pred_val, [-1,])

    seg_gt_val = tf.cast(label_batch_val, tf.int32)
    seg_gt_val = tf.reshape(seg_gt_val, [-1,])
    mask_val = seg_gt_val <= args.num_classes - 1

    seg_pred_val = tf.boolean_mask(seg_pred_val, mask_val)
    seg_gt_val = tf.boolean_mask(seg_gt_val, mask_val)

    val_mean_iou, val_update_mean_iou = streaming_mean_iou(seg_pred_val, 
        seg_gt_val, num_classes=args.num_classes, name="val_iou")        
    val_iou_sum = tf.summary.scalar('accuracy/val_mean_iou', val_mean_iou)

Training the model

    val_initializer = tf.variables_initializer(var_list=tf.get_collection(
        tf.GraphKeys.LOCAL_VARIABLES, scope="val_iou"))
    test_sum_op = tf.summary.merge([val_iou_sum])
    global_step = tf.train.get_or_create_global_step()

    opt = tf.train.MomentumOptimizer(learning_rate, args.momentum)
    grads_conv = tf.gradients(tot_loss, conv_trainable)
    # train_op = opt.apply_gradients(zip(grads_conv, conv_trainable))
    train_op = slim.learning.create_train_op(
        tot_loss, opt,
        global_step=global_step,
        variables_to_train=conv_trainable,
        summarize_gradients=True)

    # Set up tf session and initialize variables. 
    config = tf.ConfigProto()
    config.gpu_options.allow_growth = True
    sess = tf.Session(config=config)

    sess.run(tf.global_variables_initializer())
    sess.run(tf.local_variables_initializer())

    # Saver for storing checkpoints of the model.
    saver = tf.train.Saver(var_list=tf.global_variables(), max_to_keep=20)

    # Load the checkpoint if one is available
    if args.ckpt > 0 or args.restore_from is not None:
        loader = tf.train.Saver(var_list=restore_var)
        load(loader, sess, args.snapshot_dir)

    # Start the queue runner threads
    threads = tf.train.start_queue_runners(coord=coord, sess=sess)

    # tf.get_default_graph().finalize()
    summary_writer = tf.summary.FileWriter(args.snapshot_dir,
                                           sess.graph)

    # Training loop
    for step in range(args.ckpt, args.num_steps):
        start_time = time.time()
        feed_dict = { step_ph : step }
        tot_loss_float, seg_loss_float, reg_loss_float, _, lr_float, _,train_summary = sess.run([tot_loss, seg_loss, reg_loss, train_op,
            learning_rate, train_update_mean_iou, train_sum_op], 
            feed_dict=feed_dict)
        train_mean_iou_float = sess.run(train_mean_iou)
        duration = time.time() - start_time
        sys.stdout.write('step {:d}, tot_loss = {:.6f}, seg_loss = {:.6f}, ' \
            'reg_loss = {:.6f}, mean_iou = {:.6f}, lr: {:.6f}({:.3f}' \
            'sec/step)\n'.format(step, tot_loss_float, seg_loss_float,
             reg_loss_float, train_mean_iou_float, lr_float, duration)
            )
        sys.stdout.flush()

        if step % args.save_pred_every == 0 and step > args.ckpt:
            summary_writer.add_summary(train_summary, step)
            sess.run(val_initializer)
            for val_step in range(NUM_VAL-1):
                _, test_summary = sess.run([val_update_mean_iou, test_sum_op],
                feed_dict=feed_dict)

            summary_writer.add_summary(test_summary, step)
            val_mean_iou_float= sess.run(val_mean_iou)

            save(saver, sess, args.snapshot_dir, step)
            sys.stdout.write('step {:d}, train_mean_iou: {:.6f}, ' \
                'val_mean_iou: {:.6f}\n'.format(step, train_mean_iou_float, 
                val_mean_iou_float))
            sys.stdout.flush()
            sess.run(train_initializer)

        if coord.should_stop():
            coord.request_stop()
            coord.join(threads)

Model analysis

With the training script covered, we now turn to the DeepLabv3 model definition in libs.nets.deeplabv3.py.

The ResNet variant in deeplabv3

def deeplabv3(inputs, num_classes, depth=50, aspp=True, reuse=None, is_training=True):
  """DeepLabV3.

  Args:
    inputs: A tensor of size [batch, height, width, channels].
    depth: Depth of the ResNet, typically 50 or 101.
    aspp: Whether to use the ASPP module. If True, use 4 blocks with
      multi_grid=(1,2,4); if False, use 7 blocks with multi_grid=(1,2,1).
    reuse: Whether to reuse the model variables (validation reuses the
      variables of the training model).

  Returns:
    net: A rank-4 tensor of size [batch, height_out, width_out, channels_out].
    end_points: A dictionary of the model's intermediate outputs.
  """

  if aspp:
    multi_grid = (1,2,4)
  else:
    multi_grid = (1,2,1)
  scope ='resnet{}'.format(depth)
  with tf.variable_scope(scope, [inputs], reuse=reuse) as sc:
    end_points_collection = sc.name + '_end_points'
    with slim.arg_scope(resnet_arg_scope(weight_decay=args.weight_decay, 
      batch_norm_decay=args.bn_weight_decay)):
      with slim.arg_scope([slim.conv2d, bottleneck, bottleneck_hdc],
                          outputs_collections=end_points_collection):
        with slim.arg_scope([slim.batch_norm], is_training=is_training):
          net = inputs
          net = resnet_utils.conv2d_same(net, 64, 7, stride=2, scope='conv1')
          net = slim.max_pool2d(net, [3, 3], stride=2, scope='pool1')

          with tf.variable_scope('block1', [net]) as sc:
            base_depth = 64
            for i in range(2):
              with tf.variable_scope('unit_%d' % (i + 1), values=[net]):
                net = bottleneck(net, depth=base_depth * 4, 
                  depth_bottleneck=base_depth, stride=1)
            with tf.variable_scope('unit_3', values=[net]):
              net = bottleneck(net, depth=base_depth * 4, 
                depth_bottleneck=base_depth, stride=2)
            net = slim.utils.collect_named_outputs(end_points_collection, 
              sc.name, net)

          with tf.variable_scope('block2', [net]) as sc:
            base_depth = 128
            for i in range(3):
              with tf.variable_scope('unit_%d' % (i + 1), values=[net]):
                net = bottleneck(net, depth=base_depth * 4, 
                  depth_bottleneck=base_depth, stride=1)
            with tf.variable_scope('unit_4', values=[net]):
              net = bottleneck(net, depth=base_depth * 4, 
                depth_bottleneck=base_depth, stride=2)
            net = slim.utils.collect_named_outputs(end_points_collection, 
              sc.name, net)

          with tf.variable_scope('block3', [net]) as sc:
            base_depth = 256

            num_units = 6
            if depth == 101:
              num_units = 23
            elif depth == 152:
              num_units = 36

            for i in range(num_units):
              with tf.variable_scope('unit_%d' % (i + 1), values=[net]):
                net = bottleneck(net, depth=base_depth * 4, 
                  depth_bottleneck=base_depth, stride=1)
            net = slim.utils.collect_named_outputs(end_points_collection, 
              sc.name, net)

          with tf.variable_scope('block4', [net]) as sc:
            base_depth = 512

            for i in range(3):
              with tf.variable_scope('unit_%d' % (i + 1), values=[net]):
                net = bottleneck_hdc(net, depth=base_depth * 4, 
                  depth_bottleneck=base_depth, stride=1, rate=2, 
                  multi_grid=multi_grid)
            net = slim.utils.collect_named_outputs(end_points_collection, 
              sc.name, net)

This part implements the ResNet variant. The residual unit with multi-grid is provided by the bottleneck_hdc function in libs.nets.deeplabv3.py.
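The output stride of this variant follows from the strides in the listing above: conv1, pool1, the last unit of block1 and the last unit of block2 each use stride 2, while block3 and block4 use stride 1 (block4 uses atrous rate 2 instead). A small sketch in plain Python, with the hypothetical helper `cumulative_output_stride`:

```python
def cumulative_output_stride(layer_strides):
    """Multiply the strides of all layers to get the network's output stride."""
    stride = 1
    for _name, layer_stride in layer_strides:
        stride *= layer_stride
    return stride


# Strides read off the deeplabv3 listing above.
layers = [("conv1", 2), ("pool1", 2), ("block1", 2),
          ("block2", 2), ("block3", 1), ("block4", 1)]
print(cumulative_output_stride(layers))
```

The product is 16, so this implementation runs at output_stride=16, where atrous convolution in block4 replaces what would otherwise be a further stride-2 step.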

The bottleneck_hdc residual unit with the multi-grid strategy is implemented as follows:

@slim.add_arg_scope
def bottleneck_hdc(inputs, depth, depth_bottleneck, stride, rate=1, multi_grid=(1,2,4), outputs_collections=None, scope=None, use_bounded_activations=False):
  """Hybrid Dilated Convolution Bottleneck.

  Multi_Grid = (1,2,4)
  See Understanding Convolution for Semantic Segmentation.

  When putting together two consecutive ResNet blocks that use this unit, one
  should use stride = 2 in the last unit of the first block.

  Args:
    inputs: A tensor of size [batch, height, width, channels].
    depth: The depth of the ResNet unit output.
    depth_bottleneck: The depth of the bottleneck layers.
    stride: The ResNet unit's stride. Determines the amount of downsampling of
      the units output compared to its input.
    rate: An integer, rate for atrous convolution.
    multi_grid: multi_grid structure.
    outputs_collections: Collection to add the ResNet unit output.
    scope: Optional variable_scope.
    use_bounded_activations: Whether or not to use bounded activations. Bounded
      activations better lend themselves to quantized inference.

  Returns:
    The ResNet unit's output.
  """
  with tf.variable_scope(scope, 'bottleneck_v1', [inputs]) as sc:
    depth_in = slim.utils.last_dimension(inputs.get_shape(), min_rank=4)
    # Shortcut branch: subsample when the depths match, otherwise project with a 1x1 convolution
    if depth == depth_in:
      shortcut = resnet_utils.subsample(inputs, stride, 'shortcut')
    else:
      shortcut = slim.conv2d(
          inputs,
          depth, [1, 1],
          stride=stride,
          activation_fn=tf.nn.relu6 if use_bounded_activations else None,
          scope='shortcut')

    # Main branch of the residual unit
    residual = slim.conv2d(inputs, depth_bottleneck, [1, 1], stride=1, 
      rate=rate*multi_grid[0], scope='conv1')
    residual = resnet_utils.conv2d_same(residual, depth_bottleneck, 3, stride,
      rate=rate*multi_grid[1], scope='conv2')
    residual = slim.conv2d(residual, depth, [1, 1], stride=1, 
      rate=rate*multi_grid[2], activation_fn=None, scope='conv3')

    # Choose the final activation function
    if use_bounded_activations:
      # Use clip_by_value to simulate bandpass activation.
      residual = tf.clip_by_value(residual, -6.0, 6.0)
      output = tf.nn.relu6(shortcut + residual)
    else:
      output = tf.nn.relu(shortcut + residual)

    return slim.utils.collect_named_outputs(outputs_collections,
                                            sc.name,
                                            output)

The ASPP module and the later atrous convolution strategy are shown below

          if aspp:
            with tf.variable_scope('aspp', [net]) as sc:
              aspp_list = []
              branch_1 = slim.conv2d(net, 256, [1,1], stride=1, 
                scope='1x1conv')
              branch_1 = slim.utils.collect_named_outputs(
                end_points_collection, sc.name, branch_1)
              aspp_list.append(branch_1)

              for i in range(3):
                branch_2 = slim.conv2d(net, 256, [3,3], stride=1, rate=6*(i+1), scope='rate{}'.format(6*(i+1)))
                branch_2 = slim.utils.collect_named_outputs(end_points_collection, sc.name, branch_2)
                aspp_list.append(branch_2)

              # Note: this implementation sums the branches; the paper concatenates them.
              aspp = tf.add_n(aspp_list)
              aspp = slim.utils.collect_named_outputs(end_points_collection, sc.name, aspp)

            # Add image-level features via global average pooling
            with tf.variable_scope('img_pool', [net]) as sc:
              """Image Pooling See ParseNet: Looking Wider to See Better """
              pooled = tf.reduce_mean(net, [1, 2], name='avg_pool', 
                keep_dims=True)
              pooled = slim.utils.collect_named_outputs(end_points_collection, 
                sc.name, pooled)

              pooled = slim.conv2d(pooled, 256, [1,1], stride=1, scope='1x1conv')
              pooled = slim.utils.collect_named_outputs(end_points_collection, 
                sc.name, pooled)

              pooled = tf.image.resize_bilinear(pooled, tf.shape(net)[1:3])
              pooled = slim.utils.collect_named_outputs(end_points_collection, 
                sc.name, pooled)

            # Fuse the image-level features with the ASPP output
            with tf.variable_scope('fusion', [aspp, pooled]) as sc:
              net = tf.concat([aspp, pooled], 3)
              net = slim.utils.collect_named_outputs(end_points_collection, 
                sc.name, net)

              net = slim.conv2d(net, 256, [1,1], stride=1, scope='1x1conv')
              net = slim.utils.collect_named_outputs(end_points_collection, 
                sc.name, net)

          # Without ASPP, use residual units with multi-grid
          else:
            with tf.variable_scope('block5', [net]) as sc:
              base_depth = 512

              for i in range(3):
                with tf.variable_scope('unit_%d' % (i + 1), values=[net]):
                  net = bottleneck_hdc(net, depth=base_depth * 4, 
                    depth_bottleneck=base_depth, stride=1, rate=4)
              net = slim.utils.collect_named_outputs(end_points_collection, 
                sc.name, net)

            with tf.variable_scope('block6', [net]) as sc:
              base_depth = 512

              for i in range(3):
                with tf.variable_scope('unit_%d' % (i + 1), values=[net]):
                  net = bottleneck_hdc(net, depth=base_depth * 4, 
                    depth_bottleneck=base_depth, stride=1, rate=8)
              net = slim.utils.collect_named_outputs(end_points_collection, 
                sc.name, net)

            with tf.variable_scope('block7', [net]) as sc:
              base_depth = 512

              for i in range(3):
                with tf.variable_scope('unit_%d' % (i + 1), values=[net]):
                  net = bottleneck_hdc(net, depth=base_depth * 4, 
                    depth_bottleneck=base_depth, stride=1, rate=16)
              net = slim.utils.collect_named_outputs(end_points_collection, 
                sc.name, net)

          # Output
          with tf.variable_scope('logits',[net]) as sc:
            net = slim.conv2d(net, num_classes, [1,1], stride=1, 
              activation_fn=None, normalizer_fn=None)
            net = slim.utils.collect_named_outputs(end_points_collection, 
            sc.name, net)

          end_points = slim.utils.convert_collection_to_dict(
              end_points_collection)

          return net, end_points

if __name__ == "__main__":
  x = tf.placeholder(tf.float32, [None, 512, 512, 3])

  net, end_points = deeplabv3(x, 21)
  for i in end_points:
    print(i, end_points[i])