Introduction
This post introduces VGG, named after the Visual Geometry Group, the lab behind the original paper [1]. VGG proposed the idea of building deep models by repeatedly stacking simple basic blocks [2].
Notes:
[1] Simonyan, K., & Zisserman, A. (2014). Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556.
[2] Li Mu, Aston Zhang, et al., Dive into Deep Learning.
1 Imports
import time

import torch
from torch import nn, optim

from util.SimpleTool import load_data_fashion_mnist
2 The VGG Block
A VGG block follows a fixed pattern: several consecutive convolutional layers with padding $1$ and a $3 \times 3$ window, followed by a max pooling layer with stride $2$ and a $2 \times 2$ window [1]. The convolutional layers keep the input height and width unchanged, while the pooling layer halves them.
The following code implements a basic VGG block; it lets us specify the number of convolutional layers and the input and output channel counts:
def vgg_block(num_convs, in_channels, out_channels):
    """
    The VGG block.
    """
    temp_block = [nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1),
                  nn.ReLU()]
    for i in range(1, num_convs):
        temp_block.append(nn.Conv2d(out_channels, out_channels, kernel_size=3, padding=1))
        temp_block.append(nn.ReLU())
    temp_block.append(nn.MaxPool2d(kernel_size=2, stride=2))
    return nn.Sequential(*temp_block)
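As a quick sanity check (a minimal sketch of my own, not part of the original article), we can pass a dummy tensor through a single block and confirm that the convolutions preserve the spatial size while the pooling layer halves it:

# Hypothetical sanity check: a block with 2 convolutions, 1 -> 64 channels.
block = vgg_block(2, 1, 64)
x = torch.rand(1, 1, 224, 224)
print(block(x).shape)  # expected: torch.Size([1, 64, 112, 112])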
3 The VGG Network
The VGG network consists of a convolutional module followed by a fully connected module. The convolutional module chains several vgg_blocks, whose hyperparameters are defined by the variable conv_arch: it specifies the number of convolutional layers and the input and output channel counts of each VGG block. The fully connected module is the same as in AlexNet.
We now construct a VGG network with the following characteristics:
1) 5 convolutional blocks, where the first two use a single convolutional layer and the last 3 use two convolutional layers each;
2) the first block has 1 input channel and 64 output channels, and each subsequent block doubles the number of output channels until it reaches 512.
Since this network uses 8 convolutional layers and 3 fully connected layers, it is called VGG-11:
def vgg(conv_arch, fc_num_features, fc_num_hiddens=4096):
    ret_net = nn.Sequential()
    for i, (num_convs, in_channels, out_channels) in enumerate(conv_arch):
        ret_net.add_module("vgg_block_" + str(i + 1),
                           vgg_block(num_convs, in_channels, out_channels))
    ret_net.add_module("fc", nn.Sequential(FlattenLayer(),
                                           nn.Linear(fc_num_features, fc_num_hiddens),
                                           nn.ReLU(),
                                           nn.Dropout(0.5),
                                           nn.Linear(fc_num_hiddens, fc_num_hiddens),
                                           nn.ReLU(),
                                           nn.Dropout(0.5),
                                           nn.Linear(fc_num_hiddens, 10)))
    return ret_net


class FlattenLayer(torch.nn.Module):

    def __init__(self):
        super(FlattenLayer, self).__init__()

    def forward(self, x):
        return x.view(x.shape[0], -1)
Let's print the output shape of each stage:
def test1():
    temp_conv_arch = ((1, 1, 64), (1, 64, 128), (2, 128, 256), (2, 256, 512), (2, 512, 512))
    temp_fc_num_features = 512 * 7 * 7
    temp_fc_num_hiddens = 4096
    temp_net = vgg(temp_conv_arch, temp_fc_num_features, temp_fc_num_hiddens)
    temp_x = torch.rand(1, 1, 224, 224)
    for name, block in temp_net.named_children():
        temp_x = block(temp_x)
        print(name, "output shape:", temp_x.shape)


if __name__ == '__main__':
    test1()
The output is as follows:
vgg_block_1 output shape: torch.Size([1, 64, 112, 112])
vgg_block_2 output shape: torch.Size([1, 128, 56, 56])
vgg_block_3 output shape: torch.Size([1, 256, 28, 28])
vgg_block_4 output shape: torch.Size([1, 512, 14, 14])
vgg_block_5 output shape: torch.Size([1, 512, 7, 7])
fc output shape: torch.Size([1, 10])
We can see that each block halves the input height and width, until the feature map reaches $7 \times 7$ before being passed to the fully connected layers. Meanwhile, the number of output channels doubles at each stage, up to 512.
Since every convolutional layer uses the same $3 \times 3$ window, each layer's computational cost is proportional to the product of the input height, input width, and the input and output channel counts, while its parameter count is proportional to the product of the channel counts.
This VGG design of halving the height and width while doubling the number of channels therefore keeps the computational complexity of most convolutional layers roughly the same.
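To make this concrete, here is a small illustrative calculation (my own example, not from the original article) of the multiply-accumulate count of a $3 \times 3$ convolution, roughly $H_{out} \times W_{out} \times 3 \times 3 \times C_{in} \times C_{out}$; halving the spatial size while doubling the channels leaves this product unchanged:

def conv_macs(h_out, w_out, c_in, c_out, k=3):
    # Multiply-accumulates of one k x k convolution layer (biases ignored).
    return h_out * w_out * k * k * c_in * c_out

# Second convolution of block 3 (56x56 feature map, 256 channels) versus the
# second convolution of block 4 (28x28 feature map, 512 channels).
print(conv_macs(56, 56, 256, 256))  # 1849688064
print(conv_macs(28, 28, 512, 512))  # 1849688064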
4 Getting the Data and Training the Model
Since VGG-11 is relatively heavy, we construct a network with fewer channels and train it on the Fashion-MNIST dataset (the train function and load_data_fashion_mnist are the same as in the AlexNet post):
def test2():
    temp_ratio = 8
    temp_conv_arch = ((1, 1, 64 // temp_ratio),
                      (1, 64 // temp_ratio, 128 // temp_ratio),
                      (2, 128 // temp_ratio, 256 // temp_ratio),
                      (2, 256 // temp_ratio, 512 // temp_ratio),
                      (2, 512 // temp_ratio, 512 // temp_ratio))
    temp_fc_num_features = 512 * 7 * 7
    temp_fc_num_hiddens = 4096
    temp_net = vgg(temp_conv_arch, temp_fc_num_features // temp_ratio, temp_fc_num_hiddens // temp_ratio)
    temp_batch_size = 64
    temp_tr_iter, temp_te_iter = load_data_fashion_mnist(temp_batch_size, resize=224)
    temp_lr = 0.001
    temp_num_epochs = 5
    temp_optimizer = optim.Adam(temp_net.parameters(), lr=temp_lr)
    train(temp_net, temp_tr_iter, temp_te_iter, temp_batch_size, temp_optimizer, num_epochs=temp_num_epochs)


if __name__ == '__main__':
    test2()
The output is as follows:
Training on cpu
Epoch 1, loss 0.5778, training acc 0.786, test acc 0.881, time 1180.2 s
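The train helper used above is not defined in this file; as noted, it is the same as in the AlexNet post. For reference, here is a minimal sketch of what such a d2l-style training loop might look like (the exact signature and behavior of the helper in util.SimpleTool may differ):

def train(net, train_iter, test_iter, batch_size, optimizer, num_epochs,
          device=torch.device('cuda' if torch.cuda.is_available() else 'cpu')):
    """A minimal training-loop sketch; the actual helper in util.SimpleTool may differ."""
    # batch_size is kept only to mirror the call above; it is not used here.
    net = net.to(device)
    print("Training on", device)
    loss_fn = nn.CrossEntropyLoss()
    for epoch in range(num_epochs):
        start, total_loss, total_acc, n = time.time(), 0.0, 0.0, 0
        net.train()
        for x, y in train_iter:
            x, y = x.to(device), y.to(device)
            y_hat = net(x)
            loss = loss_fn(y_hat, y)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            total_loss += loss.item() * y.shape[0]
            total_acc += (y_hat.argmax(dim=1) == y).sum().item()
            n += y.shape[0]
        # Evaluate on the test set.
        net.eval()
        test_acc, m = 0.0, 0
        with torch.no_grad():
            for x, y in test_iter:
                x, y = x.to(device), y.to(device)
                test_acc += (net(x).argmax(dim=1) == y).sum().item()
                m += y.shape[0]
        print("Epoch %d, loss %.4f, training acc %.3f, test acc %.3f, time %.1f s"
              % (epoch + 1, total_loss / n, total_acc / n, test_acc / m, time.time() - start))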
Full Code
""" @author: Inki @contact: inki.yinji@gmail.com @version: Created in 2020 1220, last modified in 2020 1220. """ import time import torch from torch import nn, optim from util.SimpleTool import load_data_fashion_mnist def vgg_block(num_convs, in_channels, out_channels): """ The VGG block. """ temp_block = [nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1), nn.ReLU()] for i in range(1, num_convs): temp_block.append(nn.Conv2d(out_channels, out_channels, kernel_size=3, padding=1)) temp_block.append(nn.ReLU()) temp_block.append(nn.MaxPool2d(kernel_size=2, stride=2)) return nn.Sequential(*temp_block) def vgg(conv_arch, fc_num_features, fc_num_hiddens=4096): ret_net = nn.Sequential() for i, (num_convs, in_channels, out_channels) in enumerate(conv_arch): ret_net.add_module("vgg_block_" + str(i + 1), vgg_block(num_convs, in_channels, out_channels)) ret_net.add_module("fc", nn.Sequential(FlattenLayer(), nn.Linear(fc_num_features, fc_num_hiddens), nn.ReLU(), nn.Dropout(0.5), nn.Linear(fc_num_hiddens, fc_num_hiddens), nn.ReLU(), nn.Dropout(0.5), nn.Linear(fc_num_hiddens, 10) )) return ret_net class FlattenLayer(torch.nn.Module): def __init__(self): super(FlattenLayer, self).__init__() def forward(self, x): return x.view(x.shape[0], -1) def test1(): temp_conv_arch = ((1, 1, 64), (1, 64, 128), (2, 128, 256), (2, 256, 512), (2, 512, 512)) temp_fc_num_features = 512 * 7 * 7 temp_fc_num_hiddens = 4096 temp_net = vgg(temp_conv_arch, temp_fc_num_features, temp_fc_num_hiddens) temp_x = torch.rand(1, 1, 224, 224) for name, block in temp_net.named_children(): temp_x = block(temp_x) print(name, "output shape:", temp_x.shape) def test2(): temp_ratio = 8 temp_conv_arch = ((1, 1, 64 // temp_ratio), (1, 64 // temp_ratio, 128 // temp_ratio), (2, 128 // temp_ratio, 256 // temp_ratio), (2, 256 // temp_ratio, 512 // temp_ratio), (2, 512 // temp_ratio, 512 // temp_ratio)) temp_fc_num_features = 512 * 7 * 7 temp_fc_num_hiddens = 4096 temp_net = vgg(temp_conv_arch, temp_fc_num_features // temp_ratio, temp_fc_num_hiddens // temp_ratio) temp_batch_size = 64 temp_tr_iter, temp_te_iter = load_data_fashion_mnist(temp_batch_size, resize=224) temp_lr = 0.001 temp_num_epochs = 5 temp_optimizer = optim.Adam(temp_net.parameters(), lr=temp_lr) train(temp_net, temp_tr_iter, temp_te_iter, temp_batch_size, temp_optimizer, num_epochs=temp_num_epochs)
Notes:
[1] For a given receptive field, stacking small convolution kernels works better than using a single large kernel, because it increases the network depth (allowing more complex functions to be learned) at a lower cost. For example, VGG uses 3 stacked $3 \times 3$ kernels in place of a $7 \times 7$ kernel and 2 stacked $3 \times 3$ kernels in place of a $5 \times 5$ kernel, which deepens the network (improving its performance) while reducing the number of parameters.
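As a quick illustration (my own example, not from the article), for C input and C output channels the parameter counts of the two options can be compared directly, while the receptive field of three stacked $3 \times 3$ convolutions still covers $7 \times 7$:

# Parameters of one convolution with c -> c channels and a k x k kernel (biases ignored).
def conv_params(k, c):
    return k * k * c * c

c = 256
print(3 * conv_params(3, c))  # three stacked 3x3 convs: 27 * c^2 = 1,769,472
print(conv_params(7, c))      # one 7x7 conv:            49 * c^2 = 3,211,264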