3.5 Advanced Convolutional Neural Networks: Inception / MobileNet in Practice

4.2.5 Inception / MobileNet in Practice

  • Inception-Net

    The core idea of Inception Net is grouped convolution: the previous layer is split into several parallel convolution branches, and once each branch has finished its convolution, the results are concatenated back together.

    This can be extended so that each branch contains many layers; here we implement only the basic grouped convolution, as the code below shows.

    # Define the grouped-convolution (Inception) block
    def inception_block(x, output_channel_for_each_path, name):
        """Inception block implementation.

        Args:
        - x: input tensor
        - output_channel_for_each_path: output channels for each path, e.g. [10, 20, 30]
        - name: name of the variable scope for this block
        """
        # Inside a variable_scope, names will not clash: conv1 = 'conv1' => scope_name/conv1
        with tf.variable_scope(name):
            conv1_1 = tf.layers.conv2d(x,
                                       output_channel_for_each_path[0],
                                       (1, 1),
                                       strides = (1,1),
                                       padding = 'same',
                                       activation = tf.nn.relu,
                                       name = 'conv1_1')
            
            conv3_3 = tf.layers.conv2d(x,
                                       output_channel_for_each_path[1],
                                       (3, 3),
                                       strides = (1,1),
                                       padding = 'same',
                                       activation = tf.nn.relu,
                                       name = 'conv3_3')
            conv5_5 = tf.layers.conv2d(x,
                                       output_channel_for_each_path[2],
                                       (5, 5),
                                       strides = (1,1),
                                       padding = 'same',
                                       activation = tf.nn.relu,
                                       name = 'conv5_5')
            max_pooling = tf.layers.max_pooling2d(x,
                                                (2,2),
                                                (2,2),
                                                name = 'max_pooling')
            
            # max pooling halves the spatial size, so pad it back to match the conv branches
            max_pooling_shape = max_pooling.get_shape().as_list()[1:]
            input_shape = x.get_shape().as_list()[1:]
            width_padding = (input_shape[0] - max_pooling_shape[0]) // 2
            height_padding = (input_shape[1] - max_pooling_shape[1]) // 2
            padded_pooling = tf.pad(max_pooling,
                                    [[0,0],
                                     [width_padding,width_padding],
                                     [height_padding,height_padding],
                                     [0,0]])
            
            # Concatenate along the channel axis (dimension 4); output channels =
            # sum(output_channel_for_each_path) + input channels (from the pooling branch)
            concat_layer = tf.concat(
                [conv1_1, conv3_3, conv5_5, padded_pooling],
                axis = 3)
            return concat_layer
            
    x = tf.placeholder(tf.float32, [None, 3072])
    y = tf.placeholder(tf.int64, [None])
    
    # Reshape each 3072-dim vector into a 3-channel image (channels-first)
    x_image = tf.reshape(x, [-1, 3, 32, 32])
    # Transpose to channels-last: [None, 32, 32, 3]
    x_image = tf.transpose(x_image, perm = [0, 2, 3, 1])
    
    # Start with an ordinary convolution layer and a pooling layer
    # conv1: feature map / output image
    conv1 = tf.layers.conv2d(x_image,
                               32, # output channel number
                               (3,3), # kernel size
                               padding = 'same', # 'same' keeps the output size unchanged; 'valid' applies no padding
                               activation = tf.nn.relu,
                               name = 'conv1')
    # 16*16
    pooling1 = tf.layers.max_pooling2d(conv1,
                                       (2, 2), # kernel size
                                       (2, 2), # stride
                                       name = 'pool1' # naming the layer makes the printed graph readable
                                      )
    
    # Two grouped-convolution (Inception) blocks
    inception_2a = inception_block(pooling1, 
                                   [16, 16, 16],
                                   name = 'inception_2a')
    
    inception_2b = inception_block(inception_2a, 
                                   [16, 16, 16],
                                   name = 'inception_2b')
    
    # Followed by a pooling layer
    pooling2 = tf.layers.max_pooling2d(inception_2b,
                                       (2, 2), 
                                       (2, 2), 
                                       name = 'pool2' 
                                      )
    
    # Two more Inception blocks and another pooling layer
    inception_3a = inception_block(pooling2, 
                                   [16, 16, 16],
                                   name = 'inception_3a')
    
    inception_3b = inception_block(inception_3a, 
                                   [16, 16, 16],
                                   name = 'inception_3b')
    
    pooling3 = tf.layers.max_pooling2d(inception_3b,
                                       (2, 2), 
                                       (2, 2), 
                                       name = 'pool3' 
                                      )
    
    # Flatten the 3-D feature map (here [None, 4, 4, 224]) into a 2-D matrix
    flatten = tf.layers.flatten(pooling3)
    y_ = tf.layers.dense(flatten, 10)
    
    
    # Cross-entropy loss:
    # y_ -> softmax probabilities, y -> one-hot labels,
    # loss = -sum(y * log(y_))
    loss = tf.losses.sparse_softmax_cross_entropy(labels=y, logits=y_)
    
    
    # predict: index of the largest logit for each sample
    predict = tf.argmax(y_, 1)
    # correct_prediction: bool vector, e.g. [1,0,1,1,1,0,0,0]
    correct_prediction = tf.equal(predict, y)
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float64))
    
    with tf.name_scope('train_op'):
        train_op = tf.train.AdamOptimizer(1e-3).minimize(loss)
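    The graph above only builds the network; it still has to be run in a session. A minimal training-loop sketch, assuming the CIFAR-10 data has already been loaded and that a hypothetical next_batch helper returns [batch_size, 3072] images and [batch_size] labels (neither is defined in this section):

    init = tf.global_variables_initializer()
    batch_size = 20
    train_steps = 10000

    with tf.Session() as sess:
        sess.run(init)
        for i in range(train_steps):
            # hypothetical helper: yields one mini-batch of flattened images and labels
            batch_data, batch_labels = next_batch(batch_size)
            loss_val, acc_val, _ = sess.run(
                [loss, accuracy, train_op],
                feed_dict = {x: batch_data, y: batch_labels})
            if (i + 1) % 500 == 0:
                print('[Train] step: %d, loss: %4.5f, acc: %4.5f'
                      % (i + 1, loss_val, acc_val))

    The same loop applies unchanged to the MobileNet graph below.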
  • Mobile-Net

    The basic MobileNet block is: depthwise separable convolution -> BN -> ReLU -> 1*1 convolution -> BN -> ReLU.

    We leave BN out for now; it is the topic of the next lesson (a sketch of the block with BN included follows the code below).


    def separable_conv_block(x,
                             output_channel_number,
                             name):
        """Separable convolution block implementation.

        Args:
        - x: input tensor
        - output_channel_number: number of channels produced by the 1*1
          convolution that follows the depthwise convolution
        - name: name of the variable scope for this block
        """
        # Inside a variable_scope, names will not clash: conv1 = 'conv1' => scope_name/conv1
        with tf.variable_scope(name):
            input_channel = x.get_shape().as_list()[-1]
            # Split x along the channel axis (axis = 3) into input_channel slices
            # channel_wise_x: [channel1, channel2, ...]
            channel_wise_x = tf.split(x, input_channel, axis = 3)
            output_channels = []
            for i in range(len(channel_wise_x)):
                output_channel = tf.layers.conv2d(channel_wise_x[i],
                                                  1,
                                                  (3,3),
                                                  strides = (1,1),
                                                  padding = 'same',
                                                  activation = tf.nn.relu,
                                                  name = 'conv_%d' % i)
                output_channels.append(output_channel)
            concat_layers = tf.concat(output_channels, axis = 3)
            conv1_1 = tf.layers.conv2d(concat_layers,
                                       output_channel_number,
                                       (1,1),
                                       strides = (1,1),
                                       padding = 'same',
                                       activation = tf.nn.relu,
                                       name = 'conv1_1')
            return conv1_1
            
            
    x = tf.placeholder(tf.float32, [None, 3072])
    y = tf.placeholder(tf.int64, [None])
    
    # Reshape each 3072-dim vector into a 3-channel image (channels-first)
    x_image = tf.reshape(x, [-1, 3, 32, 32])
    # Transpose to channels-last: [None, 32, 32, 3]
    x_image = tf.transpose(x_image, perm = [0, 2, 3, 1])
    
    # conv1: feature map / output image
    conv1 = tf.layers.conv2d(x_image,
                               32, # output channel number
                               (3,3), # kernel size
                               padding = 'same', # 'same' keeps the output size unchanged; 'valid' applies no padding
                               activation = tf.nn.relu,
                               name = 'conv1')
    # 16*16
    pooling1 = tf.layers.max_pooling2d(conv1,
                                       (2, 2), # kernel size
                                       (2, 2), # stride
                                       name = 'pool1' # naming the layer makes the printed graph readable
                                      )
    
    separable_2a = separable_conv_block(pooling1, 
                                        32,
                                        name = 'separable_2a')
    
    separable_2b = separable_conv_block(separable_2a, 
                                        32,
                                        name = 'separable_2b')
    
    pooling2 = tf.layers.max_pooling2d(separable_2b,
                                       (2, 2), 
                                       (2, 2), 
                                       name = 'pool2' 
                                      )
    
    separable_3a = separable_conv_block(pooling2, 
                                        32,
                                        name = 'separable_3a')
    
    separable_3b = separable_conv_block(separable_3a, 
                                        32,
                                        name = 'separable_3b')
    
    pooling3 = tf.layers.max_pooling2d(separable_3b,
                                       (2, 2), 
                                       (2, 2), 
                                       name = 'pool3')
    
    # Flatten the [None, 4, 4, 32] feature map into a 2-D matrix
    flatten = tf.layers.flatten(pooling3)
    y_ = tf.layers.dense(flatten, 10)
    
    
    # Cross-entropy loss:
    # y_ -> softmax probabilities, y -> one-hot labels,
    # loss = -sum(y * log(y_))
    loss = tf.losses.sparse_softmax_cross_entropy(labels=y, logits=y_)
    
    
    # predict: index of the largest logit for each sample
    predict = tf.argmax(y_, 1)
    # correct_prediction: bool vector, e.g. [1,0,1,1,1,0,0,0]
    correct_prediction = tf.equal(predict, y)
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float64))
    
    with tf.name_scope('train_op'):
        train_op = tf.train.AdamOptimizer(1e-3).minimize(loss)
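    The per-channel split/conv/concat loop above is easy to read but slow, since it launches one tiny convolution per input channel. For reference, a minimal sketch of the same block built on the fused tf.nn.depthwise_conv2d, with the batch normalization from the block structure included (BN itself is next lesson's topic; the training flag and the weight initializer here are assumptions, not part of this section's code):

    def separable_conv_bn_block(x, output_channel_number, training, name):
        """Depthwise 3*3 -> BN -> ReLU -> pointwise 1*1 -> BN -> ReLU."""
        with tf.variable_scope(name):
            input_channel = x.get_shape().as_list()[-1]
            # one 3*3 filter per input channel (channel multiplier = 1)
            dw_filter = tf.get_variable(
                'dw_filter', [3, 3, input_channel, 1],
                initializer = tf.truncated_normal_initializer(stddev=0.1))
            # fused depthwise convolution over all channels at once
            dw = tf.nn.depthwise_conv2d(x, dw_filter,
                                        strides = [1, 1, 1, 1],
                                        padding = 'SAME')
            dw = tf.nn.relu(tf.layers.batch_normalization(dw, training = training))
            # pointwise 1*1 convolution mixes the channels
            pw = tf.layers.conv2d(dw, output_channel_number, (1, 1),
                                  strides = (1, 1), padding = 'same',
                                  name = 'conv1_1')
            return tf.nn.relu(tf.layers.batch_normalization(pw, training = training))

    Note that once batch normalization is used, train_op must also run the ops in tf.GraphKeys.UPDATE_OPS so the BN moving averages are updated during training.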

    The accuracy here is about 60% after 10,000 training steps; MobileNet trades some accuracy for its reduction in parameters and computation (see the arithmetic below).
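    To make that trade-off concrete, here is the back-of-the-envelope weight count for one 3*3 layer with 32 input and 32 output channels, as used in the blocks above (bias terms ignored):

    standard_conv  = 3 * 3 * 32 * 32               # 9216 weights
    separable_conv = 3 * 3 * 32 + 1 * 1 * 32 * 32  # 288 depthwise + 1024 pointwise = 1312
    print(standard_conv / separable_conv)          # roughly 7x fewer parameters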

  • All of the training here uses only 10,000 steps; real neural-network training goes far beyond that and can reach a scale of one million steps.
