This section introduces VGG, which is named after the Visual Geometry Group, the lab where the paper was written [1]. VGG proposed the idea of building deep models by repeatedly using simple basic blocks [2].
1 Imports
    import time
    import torch
    from torch import nn, optim
    from util.SimpleTool import load_data_fashion_mnist
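2 VGG block
A VGG block follows a simple pattern: several consecutive identical convolutional layers with padding $1$ and a $3 \times 3$ window, followed by one max-pooling layer with stride $2$ and a $2 \times 2$ window [1]. The convolutional layers keep the height and width of the input unchanged, while the pooling layer halves them.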
    def vgg_block(num_convs, in_channels, out_channels):
        """
        The VGG block.
        """
        temp_block = [nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1), nn.ReLU()]
        for i in range(1, num_convs):
            temp_block.append(nn.Conv2d(out_channels, out_channels, kernel_size=3, padding=1))
            temp_block.append(nn.ReLU())
        temp_block.append(nn.MaxPool2d(kernel_size=2, stride=2))
        return nn.Sequential(*temp_block)
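To see why the convolutional layers preserve the height and width while the pooling layer halves them, the usual output-size formula $\lfloor (n + 2p - k) / s \rfloor + 1$ can be evaluated directly. The following is a minimal sketch in plain Python; the helper name `out_size` is an assumption introduced for illustration and is not part of the original code.

```python
def out_size(n, kernel_size, padding=0, stride=1):
    """Output length along one spatial dimension of a convolution or pooling layer."""
    return (n + 2 * padding - kernel_size) // stride + 1


# 3x3 convolution with padding 1 and stride 1: the size is unchanged.
after_conv = out_size(224, kernel_size=3, padding=1)
# 2x2 max pooling with stride 2: the size is halved.
after_pool = out_size(after_conv, kernel_size=2, stride=2)
print(after_conv, after_pool)
```

Because every convolution in a block leaves the size untouched, only the single pooling layer at the end of the block changes the spatial resolution, no matter how many convolutions the block stacks.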
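3 VGG network
The network is constructed as follows:
1) It has $5$ convolutional blocks; the first two use a single convolutional layer each, and the last $3$ use two convolutional layers each;
2) The first block has $1$ input channel and $64$ output channels; each subsequent block doubles the number of output channels until it reaches $512$.
Because this network uses $8$ convolutional layers and $3$ fully connected layers, it is called VGG-11: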
    def vgg(conv_arch, fc_num_features, fc_num_hiddens=4096):
        ret_net = nn.Sequential()
        # Convolutional part: one VGG block per (num_convs, in_channels, out_channels) tuple.
        for i, (num_convs, in_channels, out_channels) in enumerate(conv_arch):
            ret_net.add_module("vgg_block_" + str(i + 1), vgg_block(num_convs, in_channels, out_channels))
        # Fully connected part: flatten, two hidden layers with dropout, then 10 output classes.
        ret_net.add_module("fc", nn.Sequential(FlattenLayer(),
                                               nn.Linear(fc_num_features, fc_num_hiddens),
                                               nn.ReLU(),
                                               nn.Dropout(0.5),
                                               nn.Linear(fc_num_hiddens, fc_num_hiddens),
                                               nn.ReLU(),
                                               nn.Dropout(0.5),
                                               nn.Linear(fc_num_hiddens, 10)
                                               ))
        return ret_net


    class FlattenLayer(torch.nn.Module):
        def __init__(self):
            super(FlattenLayer, self).__init__()

        def forward(self, x):
            # Reshape (batch, channels, height, width) to (batch, channels * height * width).
            return x.view(x.shape[0], -1)
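The name VGG-11 can be checked by counting layers from the block description given above. This is a small sketch in plain Python; the variable names are assumptions introduced for illustration, and the tuple simply restates the five blocks as (number of convolutions, input channels, output channels).

```python
# (num_convs, in_channels, out_channels) for each of the five VGG blocks.
conv_arch = ((1, 1, 64), (1, 64, 128), (2, 128, 256), (2, 256, 512), (2, 512, 512))

num_conv_layers = sum(num_convs for num_convs, _, _ in conv_arch)
num_fc_layers = 3  # the three nn.Linear layers inside the "fc" module
print(num_conv_layers, num_fc_layers, num_conv_layers + num_fc_layers)
```

Only layers with weights are counted here; the ReLU, pooling, and dropout layers do not contribute to the number in the name.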
    def test1():
        temp_conv_arch = ((1, 1, 64), (1, 64, 128), (2, 128, 256), (2, 256, 512), (2, 512, 512))
        temp_fc_num_features = 512 * 7 * 7
        temp_fc_num_hiddens = 4096
        temp_net = vgg(temp_conv_arch, temp_fc_num_features, temp_fc_num_hiddens)
        temp_x = torch.rand(1, 1, 224, 224)
        for name, block in temp_net.named_children():
            temp_x = block(temp_x)
            print(name, "output shape:", temp_x.shape)


    if __name__ == '__main__':
        test1()
    vgg_block_1 output shape: torch.Size([1, 64, 112, 112])
    vgg_block_2 output shape: torch.Size([1, 128, 56, 56])
    vgg_block_3 output shape: torch.Size([1, 256, 28, 28])
    vgg_block_4 output shape: torch.Size([1, 512, 14, 14])
    vgg_block_5 output shape: torch.Size([1, 512, 7, 7])
    fc output shape: torch.Size([1, 10])
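As the output shows, each block halves the height and width of its input until they reach $7 \times 7$, at which point the result is passed to the fully connected layers. Meanwhile, the number of output channels doubles from block to block until it reaches $512$.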
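The same shapes can be derived without running the network, which also explains where the value `512 * 7 * 7` passed as `fc_num_features` comes from. The sketch below uses plain Python; the function name `trace_shapes` is an assumption introduced for illustration.

```python
conv_arch = ((1, 1, 64), (1, 64, 128), (2, 128, 256), (2, 256, 512), (2, 512, 512))


def trace_shapes(conv_arch, height, width):
    """Return the (channels, height, width) produced by each VGG block."""
    shapes = []
    for _, _, out_channels in conv_arch:
        # The 3x3 convolutions with padding 1 keep height and width;
        # the 2x2 max pooling with stride 2 halves them.
        height, width = height // 2, width // 2
        shapes.append((out_channels, height, width))
    return shapes


shapes = trace_shapes(conv_arch, 224, 224)
for i, shape in enumerate(shapes, start=1):
    print(f"vgg_block_{i}:", shape)

# Number of inputs to the first fully connected layer after flattening.
channels, height, width = shapes[-1]
fc_num_features = channels * height * width
print("fc_num_features:", fc_num_features)
```

If the input resolution changes, `fc_num_features` has to change with it, since the first `nn.Linear` layer expects exactly this flattened size.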
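4 Getting the data and training the model
Because VGG-11 is relatively complex, we construct a network with fewer channels and train it on the Fashion-MNIST dataset (the train function and load_data_fashion_mnist are the same as those used for AlexNet):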
    def test2():
        # Shrink every channel count (and the fully connected widths) by this factor.
        temp_ratio = 8
        temp_conv_arch = ((1, 1, 64 // temp_ratio),
                          (1, 64 // temp_ratio, 128 // temp_ratio),
                          (2, 128 // temp_ratio, 256 // temp_ratio),
                          (2, 256 // temp_ratio, 512 // temp_ratio),
                          (2, 512 // temp_ratio, 512 // temp_ratio))
        temp_fc_num_features = 512 * 7 * 7
        temp_fc_num_hiddens = 4096
        temp_net = vgg(temp_conv_arch, temp_fc_num_features // temp_ratio, temp_fc_num_hiddens // temp_ratio)

        temp_batch_size = 64
        temp_tr_iter, temp_te_iter = load_data_fashion_mnist(temp_batch_size, resize=224)
        temp_lr = 0.001
        temp_num_epochs = 5
        temp_optimizer = optim.Adam(temp_net.parameters(), lr=temp_lr)
        # train() is the same function as in the AlexNet section; it must be defined or imported before this call.
        train(temp_net, temp_tr_iter, temp_te_iter, temp_batch_size, temp_optimizer, num_epochs=temp_num_epochs)


    if __name__ == '__main__':
        test2()
    Training on cpu
    Epoch 1, loss 0.5778, training acc 0.786, test acc 0.881, time 1180.2 s

The complete code is as follows:

    """
    @author: Inki
    @contact: inki.yinji@gmail.com
    @version: Created in 2020 1220, last modified in 2020 1220.
    """

    import time
    import torch
    from torch import nn, optim
    from util.SimpleTool import load_data_fashion_mnist


    def vgg_block(num_convs, in_channels, out_channels):
        """
        The VGG block.
        """
        temp_block = [nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1), nn.ReLU()]
        for i in range(1, num_convs):
            temp_block.append(nn.Conv2d(out_channels, out_channels, kernel_size=3, padding=1))
            temp_block.append(nn.ReLU())
        temp_block.append(nn.MaxPool2d(kernel_size=2, stride=2))
        return nn.Sequential(*temp_block)


    def vgg(conv_arch, fc_num_features, fc_num_hiddens=4096):
        ret_net = nn.Sequential()
        for i, (num_convs, in_channels, out_channels) in enumerate(conv_arch):
            ret_net.add_module("vgg_block_" + str(i + 1), vgg_block(num_convs, in_channels, out_channels))
        ret_net.add_module("fc", nn.Sequential(FlattenLayer(),
                                               nn.Linear(fc_num_features, fc_num_hiddens),
                                               nn.ReLU(),
                                               nn.Dropout(0.5),
                                               nn.Linear(fc_num_hiddens, fc_num_hiddens),
                                               nn.ReLU(),
                                               nn.Dropout(0.5),
                                               nn.Linear(fc_num_hiddens, 10)
                                               ))
        return ret_net


    class FlattenLayer(torch.nn.Module):
        def __init__(self):
            super(FlattenLayer, self).__init__()

        def forward(self, x):
            return x.view(x.shape[0], -1)


    def test1():
        temp_conv_arch = ((1, 1, 64), (1, 64, 128), (2, 128, 256), (2, 256, 512), (2, 512, 512))
        temp_fc_num_features = 512 * 7 * 7
        temp_fc_num_hiddens = 4096
        temp_net = vgg(temp_conv_arch, temp_fc_num_features, temp_fc_num_hiddens)
        temp_x = torch.rand(1, 1, 224, 224)
        for name, block in temp_net.named_children():
            temp_x = block(temp_x)
            print(name, "output shape:", temp_x.shape)


    def test2():
        temp_ratio = 8
        temp_conv_arch = ((1, 1, 64 // temp_ratio),
                          (1, 64 // temp_ratio, 128 // temp_ratio),
                          (2, 128 // temp_ratio, 256 // temp_ratio),
                          (2, 256 // temp_ratio, 512 // temp_ratio),
                          (2, 512 // temp_ratio, 512 // temp_ratio))
        temp_fc_num_features = 512 * 7 * 7
        temp_fc_num_hiddens = 4096
        temp_net = vgg(temp_conv_arch, temp_fc_num_features // temp_ratio, temp_fc_num_hiddens // temp_ratio)

        temp_batch_size = 64
        temp_tr_iter, temp_te_iter = load_data_fashion_mnist(temp_batch_size, resize=224)
        temp_lr = 0.001
        temp_num_epochs = 5
        temp_optimizer = optim.Adam(temp_net.parameters(), lr=temp_lr)
        # train() is the same function as in the AlexNet section; it must be defined or imported before this call.
        train(temp_net, temp_tr_iter, temp_te_iter, temp_batch_size, temp_optimizer, num_epochs=temp_num_epochs)
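The channel scaling used in test2 can be checked by hand. The sketch below, in plain Python, rebuilds the reduced architecture from the full one and confirms that dividing `512 * 7 * 7` by the ratio gives the flattened size the smaller network actually produces. The names `ratio`, `full_arch`, and `small_arch` are assumptions introduced for illustration.

```python
ratio = 8  # plays the role of temp_ratio in test2
full_arch = ((1, 1, 64), (1, 64, 128), (2, 128, 256), (2, 256, 512), (2, 512, 512))

# Divide every channel count by ratio, except the single input channel
# of the grayscale Fashion-MNIST images, which stays at 1.
small_arch = tuple(
    (num_convs, in_channels if i == 0 else in_channels // ratio, out_channels // ratio)
    for i, (num_convs, in_channels, out_channels) in enumerate(full_arch)
)
print(small_arch)

# With 224x224 inputs the last block still outputs 7x7 feature maps, so the
# flattened size is (512 // ratio) * 7 * 7, which equals (512 * 7 * 7) // ratio.
last_out_channels = small_arch[-1][2]
fc_num_features = last_out_channels * 7 * 7
fc_num_hiddens = 4096 // ratio
print(fc_num_features, fc_num_hiddens)
```

The spatial sizes are unaffected by the ratio because they depend only on the input resolution and the pooling layers, which is why the images are still resized to 224.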
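[1] For a given receptive field, stacking small convolution kernels is preferable to using one large kernel: the added depth lets the network learn more complex models, and at a lower cost. For example, VGG uses three $3 \times 3$ kernels in place of one $7 \times 7$ kernel, and two $3 \times 3$ kernels in place of one $5 \times 5$ kernel. This increases the depth of the network, improves its performance, and reduces the number of parameters.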
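The parameter saving mentioned in this note can be made concrete. The following sketch in plain Python compares weight counts under two simplifying assumptions that are not stated in the original: every layer has the same number of input and output channels, and biases are ignored. The function names are likewise introduced only for illustration.

```python
def conv_weights(kernel_size, channels, num_layers=1):
    """Weights in num_layers stacked convolutions with channels in and out (no bias)."""
    return num_layers * kernel_size * kernel_size * channels * channels


def stacked_receptive_field(num_layers, kernel_size=3):
    """Receptive field of num_layers stacked stride-1 convolutions."""
    # Each additional layer widens the receptive field by kernel_size - 1.
    return 1 + num_layers * (kernel_size - 1)


channels = 64  # assumed channel count, chosen only for the example
for num_layers, big_kernel in ((2, 5), (3, 7)):
    # The stack of 3x3 kernels covers the same receptive field as the large kernel.
    assert stacked_receptive_field(num_layers) == big_kernel
    stacked = conv_weights(3, channels, num_layers)
    single = conv_weights(big_kernel, channels)
    print(f"{num_layers} x 3x3: {stacked} weights, 1 x {big_kernel}x{big_kernel}: {single} weights")
```

Under these assumptions the stacked version needs $18C^2$ instead of $25C^2$ weights for the $5 \times 5$ case and $27C^2$ instead of $49C^2$ for the $7 \times 7$ case, where $C$ is the channel count.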
