[High-Performance Models] Depthwise Separable Convolution and MobileNet_v1

Date: 2019-11-06

Paper: MobileNets v1

TensorFlow implementation: mobilenet_v1.py

TensorFlow pretrained models: mobilenet_v1.md

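1. Depthwise separable convolution

In a standard convolution, for example with a 2×2 kernel, all channels of the corresponding image region are considered at the same time. The question is: why must the spatial region and the channels be considered together? Why can't channels and spatial regions be handled separately?

Depthwise separable convolution proposes a different approach: each input channel is convolved with its own kernel, and the ordinary convolution is decomposed into two steps.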

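The convolution process

Suppose the input has shape $N\times H\times W \times C$ and there are $k$ kernels of size $3\times3$. With padding 1 and stride 1, an ordinary convolution outputs $N\times H\times W \times k$.

Depthwise step

Depthwise means splitting the $N\times H\times W \times C$ input into $C$ groups (one per channel) and applying a $3\times3$ convolution to each group. This collects the spatial features of each channel, i.e. the depthwise features.

Pointwise step

Pointwise means applying $k$ ordinary $1\times1$ convolutions to the $N\times H\times W \times C$ depthwise output. This collects the features at each point across channels, i.e. the pointwise features. The final output of depthwise + pointwise is also $N\times H\times W \times k$.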
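
The two steps can be sketched in plain NumPy. This is only an illustration under stated assumptions: padding 1, stride 1, no bias, and random weights; the helper names `depthwise_conv` and `pointwise_conv` are made up for this example and are not part of any library.

```python
import numpy as np

def depthwise_conv(x, dw):
    """x: (N, H, W, C); dw: (3, 3, C). One 3x3 filter per channel, padding 1, stride 1."""
    n, h, w, c = x.shape
    xp = np.pad(x, ((0, 0), (1, 1), (1, 1), (0, 0)))  # zero-pad H and W by 1
    out = np.zeros_like(x)
    for i in range(3):
        for j in range(3):
            # Each channel is multiplied only by its own filter tap.
            out += xp[:, i:i + h, j:j + w, :] * dw[i, j, :]
    return out

def pointwise_conv(x, pw):
    """x: (N, H, W, C); pw: (C, k). A 1x1 convolution is a per-pixel matrix product."""
    return x @ pw

rng = np.random.default_rng(0)
N, H, W, C, k = 2, 8, 8, 3, 16          # illustrative sizes
x = rng.standard_normal((N, H, W, C))
dw = rng.standard_normal((3, 3, C))     # depthwise weights
pw = rng.standard_normal((C, k))        # pointwise weights

features = depthwise_conv(x, dw)        # shape (N, H, W, C)
y = pointwise_conv(features, pw)        # shape (N, H, W, k)
print(features.shape, y.shape)
```

The depthwise step keeps the channel count at C, and only the pointwise step changes it to k.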

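2. Advantages and innovations

Depthwise + pointwise can be regarded as an approximation of one convolution layer:

Ordinary convolution: 3x3 Conv + BN + ReLU
MobileNet convolution: 3x3 Depthwise Conv + BN + ReLU, followed by 1x1 Pointwise Conv + BN + ReLU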

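Faster computation

Fewer parameters

Suppose the input has 3 channels and the required output has 256 channels. There are two ways to do this:

1. Apply 256 kernels of size 3×3 directly; the parameter count is 3×3×3×256 = 6,912.

2. Use a depthwise separable convolution in two steps; the parameter count is 3×3×3 + 3×1×1×256 = 795 (3 feature channels with one 3×3 kernel each for the depthwise step, plus 256 kernels of size 1×1 over 3 channels for the pointwise step). The depth multiplier is usually set to 1.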
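
The two counts can be checked with a few lines of Python. This is a minimal sketch that ignores biases and batch-norm parameters; the function names are hypothetical helpers for this example.

```python
def conv_params(kernel, c_in, c_out):
    """Weights of an ordinary kernel x kernel convolution (no bias)."""
    return kernel * kernel * c_in * c_out

def separable_params(kernel, c_in, c_out, depth_multiplier=1):
    """Weights of a depthwise step followed by a 1x1 pointwise step (no bias)."""
    depthwise = kernel * kernel * c_in * depth_multiplier
    pointwise = c_in * depth_multiplier * c_out
    return depthwise + pointwise

standard = conv_params(3, 3, 256)        # 3*3*3*256
separable = separable_params(3, 3, 256)  # 3*3*3 + 3*1*1*256
print(standard, separable)               # 6912 795
```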

Fewer multiplications

Compare the number of multiplications of the different convolutions (per input sample):

Ordinary convolution: $H\times W \times C\times k \times 3\times 3$
Depthwise: $H\times W \times C \times 3\times 3$
Pointwise: $H\times W\times C\times k$

Splitting the operation into depthwise + pointwise therefore reduces the cost of an ordinary convolution to the fraction:

$\frac{\text{depthwise}+\text{pointwise}}{\text{conv}}=\frac{H\times W \times C \times 3\times 3 + H\times W\times C\times k}{H\times W \times C\times k \times 3\times 3}=\frac{1}{k} +\frac{1}{3\times 3}$
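
As a quick numerical check of this ratio, the sketch below plugs in an illustrative layer size (the 14×14×512 shape is an assumption chosen for the example, not a figure from the article) and compares the result with $\frac{1}{k}+\frac{1}{3\times 3}$.

```python
def conv_mults(h, w, c, k, kernel=3):
    """Multiplications of an ordinary kernel x kernel convolution."""
    return h * w * c * k * kernel * kernel

def separable_mults(h, w, c, k, kernel=3):
    """Multiplications of the depthwise step plus the pointwise step."""
    depthwise = h * w * c * kernel * kernel
    pointwise = h * w * c * k
    return depthwise + pointwise

h, w, c, k = 14, 14, 512, 512            # illustrative layer size
ratio = separable_mults(h, w, c, k) / conv_mults(h, w, c, k)
closed_form = 1 / k + 1 / (3 * 3)
print(f"measured ratio: {ratio:.6f}, closed form: {closed_form:.6f}")
```

For large k the ratio is dominated by the 1/9 term, so a 3×3 separable layer needs roughly 8 to 9 times fewer multiplications.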

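Separating channels from regions

An ordinary convolution considers channels and spatial regions at the same time. Depthwise separable convolution changes this: the convolution first considers only the spatial region and then the channels, so the two are separated.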

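3. MobileNet v1

MobileNet v1 uses depthwise separable convolutions for acceleration. Its body is a standard 3×3 convolution followed by a stack of depthwise separable layers (see the layer definitions in the implementation section below).

The number of channels in every convolution layer can also be multiplied uniformly by a width multiplier $\alpha$ (where $\alpha\in(0,1]$, with typical values 1, 0.75, 0.5 and 0.25) to shrink the network. The total depthwise + pointwise cost is then further reduced to:

$H\times W \times \alpha C \times 3\times 3 + H\times W\times \alpha C\times \alpha k$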
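
The effect of $\alpha$ on this cost can be tabulated with a short script. It is a sketch under assumptions: the 14×14×512 layer size is illustrative, and channel counts are simply truncated with `int`.

```python
def separable_mults(h, w, c, k, alpha=1.0, kernel=3):
    """Multiplications of one depthwise + pointwise layer with width multiplier alpha."""
    c_scaled = int(alpha * c)   # thinned input channels
    k_scaled = int(alpha * k)   # thinned output channels
    depthwise = h * w * c_scaled * kernel * kernel
    pointwise = h * w * c_scaled * k_scaled
    return depthwise + pointwise

H, W, C, K = 14, 14, 512, 512   # illustrative layer size
baseline = separable_mults(H, W, C, K)
relative_cost = {alpha: separable_mults(H, W, C, K, alpha) / baseline
                 for alpha in (1.0, 0.75, 0.5, 0.25)}
for alpha, rel in relative_cost.items():
    print(f"alpha={alpha}: {rel:.3f} of the alpha=1 cost")
```

Because the pointwise term dominates and scales with $\alpha^2$, the cost drops roughly quadratically in $\alpha$.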

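Of course, shrinking the network comes at a price. The paper reports the performance of MobileNet v1 on ImageNet for different values of $\alpha$: even at $\alpha=0.5$, MobileNet v1 still reaches 63.7% accuracy on ImageNet.

The paper also compares MobileNet v1 with $\alpha=1.0$ against GoogleNet and VGG16 at an input resolution of $224\times 224$: the accuracy gap is very small, while the computation and parameter count are much lower. In addition, the paper reports the performance of SSD and Faster R-CNN on the COCO dataset with MobileNet v1 as the feature extractor.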

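A look at the implementation

In the implementation (linked at the top of this article), the authors store the network structure in named tuples: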

from collections import namedtuple

Conv = namedtuple('Conv', ['kernel', 'stride', 'depth'])
DepthSepConv = namedtuple('DepthSepConv', ['kernel', 'stride', 'depth'])

# MOBILENETV1_CONV_DEFS specifies the MobileNet body
MOBILENETV1_CONV_DEFS = [
    Conv(kernel=[3, 3], stride=2, depth=32),
    DepthSepConv(kernel=[3, 3], stride=1, depth=64),
    DepthSepConv(kernel=[3, 3], stride=2, depth=128),
    DepthSepConv(kernel=[3, 3], stride=1, depth=128),
    DepthSepConv(kernel=[3, 3], stride=2, depth=256),
    DepthSepConv(kernel=[3, 3], stride=1, depth=256),
    DepthSepConv(kernel=[3, 3], stride=2, depth=512),
    DepthSepConv(kernel=[3, 3], stride=1, depth=512),
    DepthSepConv(kernel=[3, 3], stride=1, depth=512),
    DepthSepConv(kernel=[3, 3], stride=1, depth=512),
    DepthSepConv(kernel=[3, 3], stride=1, depth=512),
    DepthSepConv(kernel=[3, 3], stride=1, depth=512),
    DepthSepConv(kernel=[3, 3], stride=2, depth=1024),
    DepthSepConv(kernel=[3, 3], stride=1, depth=1024)
]

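Then, when building the network, the code iterates over this list of named tuples and generates the structure from that information. Only the part that implements the depthwise separable layer is shown here: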

elif isinstance(conv_def, DepthSepConv):
  end_point = end_point_base + '_depthwise'

  # By passing filters=None
  # separable_conv2d produces only a depthwise convolution layer
  if use_explicit_padding:
    net = _fixed_padding(net, conv_def.kernel, layer_rate)
  net = slim.separable_conv2d(net, None, conv_def.kernel,  # <---Depthwise
                              depth_multiplier=1,
                              stride=layer_stride,
                              rate=layer_rate,
                              scope=end_point)

  end_points[end_point] = net
  if end_point == final_endpoint:
    return net, end_points
  end_point = end_point_base + '_pointwise'
  net = slim.conv2d(net, depth(conv_def.depth), [1, 1],  # <---Pointwise
                    stride=1,
                    scope=end_point)
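
To see what iterating over the definitions produces, the sketch below traces the feature-map size and channel count through the list without building any TensorFlow graph. The function `trace_shapes`, the 'SAME'-padding size rule, and the channel scaling `max(int(depth * alpha), min_depth)` with `min_depth=8` are assumptions made for this illustration, not the author's code.

```python
import math
from collections import namedtuple

Conv = namedtuple('Conv', ['kernel', 'stride', 'depth'])
DepthSepConv = namedtuple('DepthSepConv', ['kernel', 'stride', 'depth'])

# (stride, depth) pairs copied from the MOBILENETV1_CONV_DEFS list above.
SEPARABLE_LAYERS = [(1, 64), (2, 128), (1, 128), (2, 256), (1, 256), (2, 512),
                    (1, 512), (1, 512), (1, 512), (1, 512), (1, 512),
                    (2, 1024), (1, 1024)]
CONV_DEFS = ([Conv([3, 3], 2, 32)] +
             [DepthSepConv([3, 3], s, d) for s, d in SEPARABLE_LAYERS])

def trace_shapes(conv_defs, size=224, alpha=1.0, min_depth=8):
    """Return (layer type, spatial size, channels) after each layer."""
    shapes = []
    for conv_def in conv_defs:
        size = math.ceil(size / conv_def.stride)                 # 'SAME' padding
        channels = max(int(conv_def.depth * alpha), min_depth)   # width multiplier
        shapes.append((type(conv_def).__name__, size, channels))
    return shapes

for name, size, channels in trace_shapes(CONV_DEFS):
    print(f"{name:13s} {size:3d} x {size:3d} x {channels}")
```

With a 224×224 input the five stride-2 layers reduce the resolution by a factor of 32, so the body ends in a 7×7×1024 feature map; passing `alpha=0.5` halves every channel count.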

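4. Framework implementations

TensorFlow, step by step

Incidentally, the TensorFlow implementation accepts a rate parameter, so the operation can also be performed as an atrous (dilated) convolution.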
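
As a small aside on what rate does: a dilation rate inserts gaps between the kernel taps, which enlarges the region the kernel covers without adding weights. The helper below is a hypothetical illustration of that relationship, not a TensorFlow function.

```python
def effective_kernel_size(kernel, rate):
    """Spatial extent covered by a `kernel`-tap filter dilated by `rate`."""
    return kernel + (kernel - 1) * (rate - 1)

for rate in (1, 2, 4):
    # The number of weights stays 3x3; only the covered extent grows.
    print(f"rate={rate}: 3x3 kernel covers "
          f"{effective_kernel_size(3, rate)} pixels per side")
```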

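Step 1: depthwise_conv2d, the separation part

We define a 4×4 image with two channels (the code in this section uses the TensorFlow 1.x graph API):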

import tensorflow as tf

img1 = tf.constant(value=[[[[1],[2],[3],[4]],
                           [[1],[2],[3],[4]],
                           [[1],[2],[3],[4]],
                           [[1],[2],[3],[4]]]],dtype=tf.float32)

img2 = tf.constant(value=[[[[1],[1],[1],[1]],
                           [[1],[1],[1],[1]],
                           [[1],[1],[1],[1]],
                           [[1],[1],[1],[1]]]],dtype=tf.float32)

img = tf.concat(values=[img1,img2],axis=3)

img

<tf.Tensor 'concat_1:0' shape=(1, 4, 4, 2) dtype=float32>

Use 3×3 kernels with 2 input channels and 2 output channels (i.e. 2 kernels):

filter1 = tf.constant(value=0, shape=[3,3,1,1],dtype=tf.float32)
filter2 = tf.constant(value=1, shape=[3,3,1,1],dtype=tf.float32)
filter3 = tf.constant(value=2, shape=[3,3,1,1],dtype=tf.float32)
filter4 = tf.constant(value=3, shape=[3,3,1,1],dtype=tf.float32)
filter_out1 = tf.concat(values=[filter1,filter2],axis=2)
filter_out2 = tf.concat(values=[filter3,filter4],axis=2)
filter = tf.concat(values=[filter_out1,filter_out2],axis=3)

filter

<tf.Tensor 'concat_4:0' shape=(3, 3, 2, 2) dtype=float32>

Run both the ordinary convolution and the depthwise convolution:

out_img_conv = tf.nn.conv2d(input=img, filter=filter,
                            strides=[1,1,1,1], padding='VALID')
out_img_depthwise = tf.nn.depthwise_conv2d(input=img,
                                           filter=filter, strides=[1,1,1,1],
                                           rate=[1,1], padding='VALID')

with tf.Session() as sess:
    res1 = sess.run(out_img_conv)
    res2 = sess.run(out_img_depthwise)
print(res1, '\n', res1.shape)
print(res2, '\n', res2.shape)

[[[[  9.  63.]
   [  9.  81.]]

  [[  9.  63.]
   [  9.  81.]]]] 
 (1, 2, 2, 2)  # <----------


[[[[  0.  36.   9.  27.]
   [  0.  54.   9.  27.]]

  [[  0.  36.   9.  27.]
   [  0.  54.   9.  27.]]]] 
 (1, 2, 2, 4)  # <----------

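Comparing the output shapes: the number of output channels of depthwise_conv2d is in_channels × the number of kernels (the channel multiplier). Each kernel's per-channel slice is convolved once with its corresponding input channel, so there are more output channels.

At this point it may look as if a depthwise separable convolution produces more output channels than an ordinary convolution. In fact this is only the "separation" part, and a combination step still follows. An ordinary convolution simply performs the combination directly: by adding corresponding positions, it merges the four intermediate results into as many channels as there are kernels (2 here).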
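
This relationship can be reproduced without TensorFlow. The NumPy sketch below rebuilds the same image and filter, computes the four intermediate maps, and shows that summing them over the input channels gives the ordinary convolution result. The helper `depthwise_valid` is a hypothetical name for this example, and it handles only the 3×3, stride 1, 'VALID' case.

```python
import numpy as np

# Same data as the TensorFlow example: a 4x4 image with 2 channels.
img = np.stack([np.tile([1., 2., 3., 4.], (4, 1)),   # channel 0: columns 1..4
                np.ones((4, 4))], axis=-1)           # channel 1: all ones
# filt[:, :, c, q]: constant 3x3 kernels, values 0 and 2 for channel 0, 1 and 3 for channel 1.
filt = np.zeros((3, 3, 2, 2))
filt[:, :, 0, :] = [0., 2.]
filt[:, :, 1, :] = [1., 3.]

def depthwise_valid(img, filt):
    """Per-(channel, multiplier) maps, shape (H-2, W-2, C, M); stride 1, no padding."""
    h, w, c = img.shape
    out = np.zeros((h - 2, w - 2, c, filt.shape[3]))
    for i in range(3):
        for j in range(3):
            out += img[i:i + h - 2, j:j + w - 2, :, None] * filt[i, j]
    return out

maps = depthwise_valid(img, filt)
depthwise = maps.reshape(2, 2, 4)   # channel order c * M + q, as in depthwise_conv2d
standard = maps.sum(axis=2)         # ordinary convolution: add over input channels
print(depthwise[0])                 # rows [0, 36, 9, 27] and [0, 54, 9, 27]
print(standard[0])                  # rows [9, 63] and [9, 81]
```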

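Step 2: merging the features

The merging step is shown below. In a separable convolution the merge becomes learnable: an ordinary 1×1 convolution combines the features.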

point_filter = tf.constant(value=1, shape=[1,1,4,4],dtype=tf.float32)
out_img_s = tf.nn.conv2d(input=out_img_depthwise, filter=point_filter, strides=[1,1,1,1], padding='VALID')
with tf.Session() as sess:
    res3 = sess.run(out_img_s)
print(res3, '\n', res3.shape)

[[[[ 72.  72.  72.  72.]
   [ 90.  90.  90.  90.]]

  [[ 72.  72.  72.  72.]
   [ 90.  90.  90.  90.]]]] 
 (1, 2, 2, 4)
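
Since a 1×1 convolution is just a matrix product applied at every pixel, the merge can be checked in NumPy starting from the depthwise output printed earlier. This is an illustrative sketch; the fixed 0/1 matrix `select` is a made-up example showing that the ordinary convolution corresponds to one particular, non-learned choice of pointwise weights.

```python
import numpy as np

# Depthwise output from the previous step, shape (2, 2, 4).
depthwise_out = np.array([[[0., 36., 9., 27.], [0., 54., 9., 27.]],
                          [[0., 36., 9., 27.], [0., 54., 9., 27.]]])

# All-ones pointwise weights, shape (in_channels, out_channels), as in point_filter.
point_filter = np.ones((4, 4))
merged = depthwise_out @ point_filter    # every output channel is the sum 72 or 90
print(merged[0])

# Fixed 0/1 weights that add channels (0, 2) and (1, 3) reproduce the ordinary
# convolution output from the earlier example.
select = np.array([[1., 0.],
                   [0., 1.],
                   [1., 0.],
                   [0., 1.]])
as_standard_conv = depthwise_out @ select
print(as_standard_conv[0])               # rows [9, 63] and [9, 81]
```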

TensorFlow, in one step

out_img_se = tf.nn.separable_conv2d(input=img, 
                                    depthwise_filter=filter, 
                                    pointwise_filter=point_filter, 
                                    strides=[1,1,1,1], rate=[1,1], padding='VALID')

with tf.Session() as sess:
    res4 = sess.run(out_img_se)
print(res4, '\n', res4.shape)

[[[[ 72.  72.  72.  72.]
   [ 90.  90.  90.  90.]]

  [[ 72.  72.  72.  72.]
   [ 90.  90.  90.  90.]]]] 
 (1, 2, 2, 4)

The slim library API

def separable_convolution2d(
    inputs,
    num_outputs,
    kernel_size,
    depth_multiplier=1,
    stride=1,
    padding='SAME',
    data_format=DATA_FORMAT_NHWC,
    rate=1,
    activation_fn=nn.relu,
    normalizer_fn=None,
    normalizer_params=None,
    weights_initializer=initializers.xavier_initializer(),
    pointwise_initializer=None,
    weights_regularizer=None,
    biases_initializer=init_ops.zeros_initializer(),
    biases_regularizer=None,
    reuse=None,
    variables_collections=None,
    outputs_collections=None,
    trainable=True,
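    scope=None):

  """A 2D separable convolution with optional batch normalization.
  This op first performs a depthwise convolution (each channel is convolved
  separately), creating a variable called depthwise_weights. If num_outputs
  is not None, it adds a pointwise convolution (mixing information across
  channels), creating a variable called pointwise_weights. If normalizer_fn
  is None, it adds a bias to the result and creates a variable called biases;
  otherwise the normalizer function is called. Finally, an activation
  function is applied to produce the final result.

  Args:
    inputs: a tensor of shape [batch_size, height, width, channels].
    num_outputs: number of pointwise convolution kernels. If None, the
      pointwise convolution step is skipped.
    kernel_size: kernel size, [kernel_height, kernel_width]. Can be a single
      integer if both values are the same.
    depth_multiplier: number of depthwise output channels per input channel.
      The total number of depthwise output channels is
      num_filters_in * depth_multiplier.
    stride: convolution stride, [stride_height, stride_width]. Can be a
      single integer if both values are the same.
    padding: padding scheme, 'VALID' or 'SAME'.
    data_format: data format, `NHWC` (default) or `NCHW`.
    rate: dilation rate for atrous convolution, [rate_height, rate_width].
      Can be a single integer if both values are the same. If either value
      is greater than 1, stride must be 1.
    activation_fn: activation function, ReLU by default. Set to None to skip it.
    normalizer_fn: normalization function used instead of biases. If it is
      provided, biases_initializer and biases_regularizer are ignored and no
      biases are created. If None, no normalization is applied.
    normalizer_params: parameters of the normalization function.
    weights_initializer: initializer for the depthwise weights.
    pointwise_initializer: initializer for the pointwise weights. If None,
      weights_initializer is used.
    weights_regularizer: (optional) regularizer for the weights.
    biases_initializer: initializer for the biases. If None, biases are skipped.
    biases_regularizer: (optional) regularizer for the biases.
    reuse: whether the layer and its variables should be reused. To reuse
      the layer, its scope must be given.
    variables_collections: (optional) list of collections for all variables,
      or a dictionary mapping each variable to its collections.
    outputs_collections: collection to which the outputs are added.
    trainable: whether the variables are trainable.
    scope: (optional) variable scope.
  Returns:
    A tensor representing the output of the operation."""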
