Learning PyTorch from Scratch (16): VGG Net

VGG

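AlexNet built on LeNet by adding several convolutional layers and changing the kernel sizes, the number of output channels per layer, and so on, and it achieved very good results. However, it did not offer a simple, effective recipe for designing networks.
VGG did: it proposed building deep models by repeatedly stacking a simple basic block.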

Paper: https://arxiv.org/abs/1409.1556

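The structure of VGG is shown below:

The figure above gives the VGG configurations for different depths, i.e. the commonly cited vgg16, vgg19, and so on.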

VGG BLOCK

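The design idea of VGG is to keep stacking 3x3 convolution kernels to make the model deeper. VGG net showed that increasing depth is a very effective way to improve a model's learning capacity.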


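As the figure above shows, two consecutive 3x3 convolutions have the same receptive field as a single 5x5 convolution, but the former applies two nonlinear transformations while the latter applies only one. This is the key reason why stacking small kernels improves feature learning.
In addition, two 3x3 kernels have fewer parameters than one 5x5 kernel (2x3x3 = 18 < 5x5 = 25 weights for each input/output channel pair).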
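The receptive-field and parameter comparison can be checked with a little arithmetic. The sketch below is only illustrative: the helper names `receptive_field` and `conv_weights` and the channel count of 64 are assumptions made for this example, not part of the original code.

```python
# Compare two stacked 3x3 convolutions with one 5x5 convolution.
# Helper names and the channel count are illustrative assumptions.

def receptive_field(kernel_sizes):
    """Receptive field of a stack of stride-1 convolutions."""
    rf = 1
    for k in kernel_sizes:
        rf += k - 1  # each stride-1 layer widens the field by k - 1
    return rf

def conv_weights(kernel_size, in_channels, out_channels):
    """Number of weights (bias excluded) in one convolutional layer."""
    return kernel_size * kernel_size * in_channels * out_channels

channels = 64  # assumed: same number of input and output channels

rf_stacked = receptive_field([3, 3])
rf_single = receptive_field([5])
w_stacked = 2 * conv_weights(3, channels, channels)
w_single = conv_weights(5, channels, channels)

print('receptive field:', rf_stacked, 'vs', rf_single)
print('weights:', w_stacked, 'vs', w_single)
```

With equal input and output channels, the ratio of weights is always 18 to 25, regardless of the channel count.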

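The basic building block of VGG consists of n 3x3 convolutions (with padding 1) followed by a 2x2 max pooling layer with stride 2. The convolutions therefore leave the height and width unchanged, and the pooling halves them.
This gives the following code: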

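import time
import torch
from torch import nn

def make_layers(in_channels,cfg):
    layers = []
    previous_channel = in_channels # number of output channels of the previous layer
    for v in cfg:
        if v == 'M':
            # 2x2 max pooling with stride 2 halves the height and width
            layers.append(nn.MaxPool2d(kernel_size=2,stride=2))
        else:
            # 3x3 convolution with padding 1 keeps the height and width
            layers.append(nn.Conv2d(previous_channel,v,kernel_size=3,padding=1))
            layers.append(nn.ReLU())

            previous_channel = v

    conv = nn.Sequential(*layers)
    return conv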


cfgs = {
    'A': [64, 'M', 128, 'M', 256, 256, 'M', 512, 512, 'M', 512, 512, 'M'],
    'B': [64, 64, 'M', 128, 128, 'M', 256, 256, 'M', 512, 512, 'M', 512, 512, 'M'],
    'D': [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 'M', 512, 512, 512, 'M', 512, 512, 512, 'M'],
    'E': [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 256, 'M', 512, 512, 512, 512, 'M', 512, 512, 512, 512, 'M'],
}

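cfgs defines the structure of the different VGG models; for example, 'A' stands for vgg11. A number is the number of output channels of a convolution, and 'M' stands for max pooling.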
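To see where names like vgg11 come from, count the layers that carry weights: every number in a configuration is one convolution, and the classifier adds three fully connected layers. A small sketch (the helper `count_weight_layers` is a name made up for this illustration; `cfgs` is repeated so the snippet runs on its own):

```python
# Count the weight layers (conv + fully connected) of each configuration.
cfgs = {
    'A': [64, 'M', 128, 'M', 256, 256, 'M', 512, 512, 'M', 512, 512, 'M'],
    'B': [64, 64, 'M', 128, 128, 'M', 256, 256, 'M', 512, 512, 'M', 512, 512, 'M'],
    'D': [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 'M', 512, 512, 512, 'M', 512, 512, 512, 'M'],
    'E': [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 256, 'M', 512, 512, 512, 512, 'M', 512, 512, 512, 512, 'M'],
}

NUM_FC_LAYERS = 3  # the classifier has three nn.Linear layers

def count_weight_layers(cfg):
    """Convolutions in cfg plus the fully connected layers."""
    convs = sum(1 for v in cfg if v != 'M')  # 'M' (pooling) has no weights
    return convs + NUM_FC_LAYERS

depths = {name: count_weight_layers(cfg) for name, cfg in cfgs.items()}
print(depths)  # {'A': 11, 'B': 13, 'D': 16, 'E': 19}
```

So 'A', 'B', 'D' and 'E' correspond to vgg11, vgg13, vgg16 and vgg19.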

Now we can define the model:

class VGG(nn.Module):
    def __init__(self,input_channels,cfg,num_classes=10, init_weights=True):
        super(VGG, self).__init__()
        self.conv = make_layers(input_channels,cfg) # for a 224x224 input the output is torch.Size([N, 512, 7, 7])
        self.fc = nn.Sequential(
            nn.Linear(512*7*7,4096),
            nn.ReLU(),
            nn.Linear(4096,4096),
            nn.ReLU(),
            nn.Linear(4096,num_classes)
        )
    
    def forward(self, img):
        feature = self.conv(img)
        output = self.fc(feature.view(img.shape[0], -1))
        return output

The output shape of the convolutional part can be checked with the following test code:

conv = make_layers(1,cfgs['A'])
X = torch.randn((1,1,224,224))
out = conv(X)
print(out.shape) # torch.Size([1, 512, 7, 7])
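The 512*7*7 input size of the first linear layer can also be worked out by hand: the convolutions keep the spatial size and each 'M' halves it. A minimal sketch of that arithmetic, where `feature_map_size` is a hypothetical helper written for this example:

```python
# Work out the size of the flattened feature map for configuration 'A'.
cfg_a = [64, 'M', 128, 'M', 256, 256, 'M', 512, 512, 'M', 512, 512, 'M']

def feature_map_size(input_size, cfg):
    """Side length after the conv part: only pooling changes the size."""
    size = input_size
    for v in cfg:
        if v == 'M':
            size //= 2  # 2x2 max pooling with stride 2
    return size

side = feature_map_size(224, cfg_a)
last_channels = [v for v in cfg_a if v != 'M'][-1]
flat_features = last_channels * side * side

print(side, flat_features)  # 7 25088
```

Five pooling layers divide the side length by 32, so a different input resolution changes the input size the first nn.Linear needs.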

Loading the data

import learntorch_utils # helper module from the author's repository (linked at the end)

batch_size,num_workers=4,4
train_iter,test_iter = learntorch_utils.load_data(batch_size,num_workers,resize=224)

With batch_size set to 8, my GPU already ran out of memory...

Defining the model

net = VGG(1,cfgs['A']).cuda()

Defining the loss function

loss = nn.CrossEntropyLoss()
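One detail worth keeping in mind when accumulating the loss across batches: with its default setting reduction='mean', nn.CrossEntropyLoss already returns the average over the samples in the batch. The sketch below checks this on random data; the tensor shapes and label values are made up for illustration.

```python
import torch
from torch import nn

torch.manual_seed(0)
logits = torch.randn(4, 10)            # fake scores: 4 samples, 10 classes
labels = torch.tensor([1, 0, 3, 9])    # fake class labels

mean_loss = nn.CrossEntropyLoss()(logits, labels)                 # default: reduction='mean'
sum_loss = nn.CrossEntropyLoss(reduction='sum')(logits, labels)   # sum over the batch

# The two values printed here agree up to floating point error.
print(mean_loss.item(), sum_loss.item() / labels.shape[0])
```

To get a per-image average over a whole epoch, the batch mean can be multiplied by the batch size before summing, then divided by the total number of samples.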

Defining the optimizer

opt = torch.optim.Adam(net.parameters(),lr=0.001)

Defining the evaluation function

def test():
    acc_sum = 0
    n = 0
    with torch.no_grad(): # no gradients are needed for evaluation
        for X,y in test_iter:
            X,y = X.cuda(),y.cuda()
            y_hat = net(X)
            acc_sum += (y_hat.argmax(dim=1) == y).float().sum().item()
            n += y.shape[0] # count samples; the last batch may be smaller

    return acc_sum/n

Training

num_epochs = 3
def train():
    for epoch in range(num_epochs):
        train_l_sum,batch,acc_sum,n = 0,0,0,0
        start = time.time()
        for X,y in train_iter:
            X,y = X.cuda(),y.cuda()
            y_hat = net(X)
            acc_sum += (y_hat.argmax(dim=1) == y).float().sum().item()

            l = loss(y_hat,y)
            opt.zero_grad()
            l.backward()

            opt.step()
            # loss() returns the batch mean, so weight it by the batch size
            train_l_sum += l.item()*y.shape[0]

            batch += 1
            n += y.shape[0]

            mean_loss = train_l_sum/n # average loss per image so far
            time_elapsed = time.time() - start # time since the start of the epoch

            print('epoch %d,batch %d,train_loss %.3f,time %.3f' % 
                (epoch + 1,batch,mean_loss,time_elapsed))

        print('***************************************')
        mean_loss = train_l_sum/n   # average loss per image
        train_acc = acc_sum/n       # training accuracy
        test_acc = test()           # test accuracy
        end = time.time()
        time_per_epoch =  end - start
        print('epoch %d,train_loss %f,train_acc %f,test_acc %f,time %f' % 
                (epoch + 1,mean_loss,train_acc,test_acc,time_per_epoch))

train()

On a GTX 1050 with 4 GB of memory, one epoch takes a bit over an hour.
Full code: https://github.com/sdu2011/learn_pytorch
