PyTorch Quick Start (3): Training an Image Classifier and Multi-GPU Training

Date: 2019-12-11

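Original link: mp.weixin.qq.com/s/3hXlcOVuJ…html

This article follows the first two articles in the PyTorch quick-start series.

This is the third and final tutorial in the PyTorch quick-start series. It trains a simple image classifier on the CIFAR10 dataset, walking through the whole workflow: defining the network, processing and loading the data, training the model, and finally testing its performance. It also shows how to train a model on multiple GPUs.

The article is organized into the following sections.

4. Training a classifier

The previous part covered how to build a neural network, compute the loss, and update the network weights. The next step is to implement an image classifier.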

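4.1 Training data

Before training a classifier, we of course need to think about the data. When working with image, text, audio, or video data, the usual approach is to load it with standard Python libraries into a NumPy array and then convert that array into a PyTorch tensor.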

For images, Pillow and OpenCV are useful;
For audio, there are scipy and librosa;
For text, you can load the data with plain Python or Cython, or use NLTK and SpaCy.
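As a rough illustration of the "NumPy array, then tensor" step described above, here is a minimal sketch. The random array is only a stand-in for an image decoded by one of the libraries listed; the variable names are made up for this example.

```python
import numpy as np
import torch

# Stand-in for an image decoded by Pillow or OpenCV:
# height x width x channels, unsigned 8-bit integers (random pixels here).
rng = np.random.default_rng(0)
image_hwc = rng.integers(0, 256, size=(32, 32, 3), dtype=np.uint8)

# NumPy array -> PyTorch tensor (from_numpy shares memory with the array).
tensor_hwc = torch.from_numpy(image_hwc)

# Convolution layers expect channels first (C x H x W) and floating-point
# input, so reorder the axes and rescale 0..255 to [0, 1].
tensor_chw = tensor_hwc.permute(2, 0, 1).float() / 255.0

print(tensor_chw.shape, tensor_chw.dtype)
```

In practice torchvision's transforms take care of this conversion, which is what the rest of this tutorial relies on.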

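For computer vision specifically, PyTorch provides the torchvision package. It includes data loaders for common datasets such as Imagenet, CIFAR10, and MNIST, as well as data transformers for images; the relevant modules are torchvision.datasets and torch.utils.data.DataLoader.

This tutorial uses the CIFAR10 dataset, which has 10 classes: airplane, automobile, bird, cat, deer, dog, frog, horse, ship, and truck. Every image has size 3x32x32, that is, 3 color channels and 32x32 pixels. Some examples are shown below:

4.2 Training an image classifier

The training workflow is as follows:

Load and normalize the CIFAR10 training and test sets using torchvision;
Build a convolutional neural network;
Define a loss function;
Train the network on the training set;
Test the network's performance on the test set.

4.2.1 Loading and normalizing CIFAR10

First, import the required packages: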

import torch
import torchvision
import torchvision.transforms as transforms

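The torchvision datasets return PILImage images; ToTensor converts them to tensors with values in the range [0, 1]. Here we additionally normalize them to the range [-1, 1], as shown in the following code:

# Normalize image data from [0, 1] to the range [-1, 1]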
transform = transforms.Compose(
    [transforms.ToTensor(),
     transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])

trainset = torchvision.datasets.CIFAR10(root='./data', train=True,
                                        download=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=4,
                                          shuffle=True, num_workers=2)

testset = torchvision.datasets.CIFAR10(root='./data', train=False,
                                       download=True, transform=transform)
testloader = torch.utils.data.DataLoader(testset, batch_size=4,
                                         shuffle=False, num_workers=2)

classes = ('plane', 'car', 'bird', 'cat',
           'deer', 'dog', 'frog', 'horse', 'ship', 'truck')
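
To see why a mean of 0.5 and a standard deviation of 0.5 map [0, 1] onto [-1, 1], the arithmetic can be reproduced with plain tensor operations. This is a sketch of the per-channel formula (input - mean) / std, not a call into torchvision itself; the tiny tensor is made up for illustration.

```python
import torch

# A fake 3-channel "image" with values in [0, 1], shape 3 x 4 x 1.
x = torch.tensor([0.0, 0.25, 0.5, 1.0]).repeat(3, 1).unsqueeze(-1)

# One mean and one std per channel, shaped so they broadcast over H and W.
mean = torch.tensor([0.5, 0.5, 0.5]).view(3, 1, 1)
std = torch.tensor([0.5, 0.5, 0.5]).view(3, 1, 1)

# Per-channel normalization: output = (input - mean) / std
y = (x - mean) / std

print(y[0].flatten())  # values of the first channel after normalization
```

The endpoints 0 and 1 land on -1 and 1, and 0.5 lands on 0.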

Once the data has been downloaded, we can visualize some of the training images with the following code:

import matplotlib.pyplot as plt
import numpy as np

# Function to display an image
def imshow(img):
    img = img / 2 + 0.5     # unnormalize: map [-1, 1] back to [0, 1]
    npimg = img.numpy()
    plt.imshow(np.transpose(npimg, (1, 2, 0)))
    plt.show()


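# Get a random batch of training images
dataiter = iter(trainloader)
images, labels = next(dataiter)  # built-in next(); the .next() method is gone in recent PyTorch releases

# Show the images
imshow(torchvision.utils.make_grid(images))
# Print the class labels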
print(' '.join('%5s' % classes[labels[j]] for j in range(4)))
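
The two transformations inside imshow can be checked in isolation. The sketch below uses a random tensor as a stand-in for the image grid (its size is arbitrary) and skips the actual plotting, so it only shows the value range and the axis order that matplotlib expects.

```python
import numpy as np
import torch

# Stand-in for a normalized image grid: channels first, values in [-1, 1].
img = torch.rand(3, 36, 138) * 2 - 1

# Undo the normalization: x * std + mean = x * 0.5 + 0.5 = x / 2 + 0.5
restored = img / 2 + 0.5

# matplotlib's imshow wants height x width x channels,
# so axis 0 (channels) is moved to the end.
npimg = np.transpose(restored.numpy(), (1, 2, 0))

print(npimg.shape, float(npimg.min()), float(npimg.max()))
```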

The images are displayed as follows:

Their class labels are:

frog plane   dog  ship

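4.2.2 Building a convolutional neural network

This part can reuse the network defined in the previous part almost directly. The only change is the number of input channels of conv1, from 1 to 3, because this time the network receives 3-channel color images.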

import torch.nn as nn
import torch.nn.functional as F


class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 16 * 5 * 5)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x


net = Net()
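
The 16 * 5 * 5 input size of fc1 follows from how the spatial size shrinks through the layers. A minimal sketch that traces a dummy 32x32 input through standalone copies of the same layers (the variable names are chosen for this example only):

```python
import torch
import torch.nn as nn

x = torch.zeros(1, 3, 32, 32)   # one dummy CIFAR10-sized image
conv1 = nn.Conv2d(3, 6, 5)
pool = nn.MaxPool2d(2, 2)
conv2 = nn.Conv2d(6, 16, 5)

with torch.no_grad():
    a = conv1(x)   # 5x5 kernel, no padding: 32 -> 28
    b = pool(a)    # 2x2 max pooling: 28 -> 14
    c = conv2(b)   # 5x5 kernel, no padding: 14 -> 10
    d = pool(c)    # 2x2 max pooling: 10 -> 5

for name, t in [("conv1", a), ("pool", b), ("conv2", c), ("pool", d)]:
    print(name, tuple(t.shape))
```

The last shape has 16 channels of 5x5 values, so flattening it gives 16 * 5 * 5 = 400 features per image.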

4.2.3 Defining the loss function and optimizer

Here we use the categorical cross-entropy loss and SGD with momentum:

import torch.optim as optim

criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)
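
It may help to see what nn.CrossEntropyLoss computes. It takes raw network outputs (logits) and integer class indices, which is why the network above ends with a plain linear layer and no softmax. The sketch below compares it against a manual log-softmax calculation on made-up numbers.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Made-up raw outputs for 2 samples and 3 classes, plus their true classes.
logits = torch.tensor([[2.0, 0.5, -1.0],
                       [0.1, 0.2, 3.0]])
targets = torch.tensor([0, 2])   # class indices, not one-hot vectors

loss = nn.CrossEntropyLoss()(logits, targets)

# Manual version: log-softmax, pick the log-probability of the true class,
# negate it, and average over the batch.
log_probs = F.log_softmax(logits, dim=1)
manual = -log_probs[torch.arange(2), targets].mean()

print(loss.item(), manual.item())
```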

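4.2.4 Training the network

The fourth step is to train the network: specify the number of epochs, feed in the data, and every fixed number of iterations print information about the current state of training, such as the loss or the accuracy.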

import time
start = time.time()
for epoch in range(2):
    
    running_loss = 0.0
    for i, data in enumerate(trainloader, 0):
        # Get the input data
        inputs, labels = data
        # Zero the parameter gradients
        optimizer.zero_grad()
        
        outputs = net(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
        
        # Print statistics
        running_loss += loss.item()
        if i % 2000 == 1999:
            # Print once every 2000 iterations
            print('[%d, %5d] loss: %.3f' % (epoch + 1, i+1, running_loss / 2000))
            running_loss = 0.0
print('Finished Training! Total cost time: ', time.time()-start)

Training runs for 2 epochs in total. The training log is shown below; it took about 77 s.

[1,  2000] loss: 2.226
[1,  4000] loss: 1.897
[1,  6000] loss: 1.725
[1,  8000] loss: 1.617
[1, 10000] loss: 1.524
[1, 12000] loss: 1.489
[2,  2000] loss: 1.407
[2,  4000] loss: 1.376
[2,  6000] loss: 1.354
[2,  8000] loss: 1.347
[2, 10000] loss: 1.324
[2, 12000] loss: 1.311

Finished Training! Total cost time:  77.24696755409241
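
The iteration numbers in this log follow from the dataset and batch sizes. A small arithmetic sketch, using the 50000 training images of CIFAR10 and the batch size and logging interval from the code above:

```python
import math

train_images = 50000   # size of the CIFAR10 training set
batch_size = 4         # as passed to the DataLoader above
log_every = 2000       # as in the training loop above

iters_per_epoch = math.ceil(train_images / batch_size)

# Same condition as in the loop: i % 2000 == 1999, reported as i + 1.
logged_at = [i + 1 for i in range(iters_per_epoch)
             if i % log_every == log_every - 1]

print(iters_per_epoch)
print(logged_at)
```

Each epoch has 12500 iterations, so the last 500 iterations of an epoch are accumulated in running_loss but never printed before the next epoch resets it.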

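4.2.5 Testing model performance

After training a model, we need to evaluate it on the test set to check how well it generalizes. For image classification, accuracy is the usual evaluation metric.

First, let's run a small test on a single batch of images. Here batch=4, i.e. 4 images. The code is as follows: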

dataiter = iter(testloader)
images, labels = next(dataiter)  # built-in next(); the .next() method is gone in recent PyTorch releases

# Show the images
imshow(torchvision.utils.make_grid(images))
print('GroundTruth: ', ' '.join('%5s' % classes[labels[j]] for j in range(4)))

The images and labels are shown below:

GroundTruth:    cat  ship  ship plane

Then we feed these four images into the network and look at its predictions:

# Network output
outputs = net(images)

# Predicted classes
_, predicted = torch.max(outputs, 1)
print('Predicted: ', ' '.join('%5s' % classes[predicted[j]] for j in range(4)))

The output is:

Predicted:    cat  ship  ship  ship

The first three images are predicted correctly; for the fourth, the network misclassifies the plane as a ship.
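
The prediction step relies on torch.max(outputs, 1), which reduces over dimension 1 (the class dimension) and returns both the largest score and its index for every row. A minimal sketch with made-up scores:

```python
import torch

# Made-up class scores: 2 samples x 3 classes.
outputs = torch.tensor([[0.1, 2.0, -1.0],
                        [1.5, 0.3, 0.2]])

# Reduce over dim 1: one (value, index) pair per sample.
values, predicted = torch.max(outputs, 1)

print(values)      # largest score in each row
print(predicted)   # index of that score, i.e. the predicted class
```

Only the indices are needed for classification, which is why the code above discards the first return value with an underscore.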

Next, let's see what accuracy the network reaches on the whole test set!

correct = 0
total = 0
with torch.no_grad():
    for data in testloader:
        images, labels = data
        outputs = net(images)
        _, predicted = torch.max(outputs, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

print('Accuracy of the network on the 10000 test images: %d %%' % (100 * correct / total))

The output is as follows:

Accuracy of the network on the 10000 test images: 55 %

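Your accuracy may not be exactly the same; the result in the original tutorial is 51%. Because of random weight initialization, the numbers fluctuate somewhat from run to run. Compared with random guessing over 10 classes (10% accuracy), this result is decent. In absolute terms it is of course quite poor, but we are only using a 5-layer network, and only as example code for a tutorial.

We can go one step further and look at the classification accuracy of each class. The difference from the code above is the line c = (predicted == labels).squeeze(), which produces 1 or 0 (true or false) for each sample depending on whether the prediction equals the ground-truth label. When counting the correct predictions for a class we can therefore simply add these values: a correct prediction adds 1, and a wrong one adds 0, leaving the count unchanged.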

class_correct = list(0. for i in range(10))
class_total = list(0. for i in range(10))
with torch.no_grad():
    for data in testloader:
        images, labels = data
        outputs = net(images)
        _, predicted = torch.max(outputs, 1)
        c = (predicted == labels).squeeze()
        for i in range(labels.size(0)):  # batch size read from the data instead of hard-coding 4
            label = labels[i]
            class_correct[label] += c[i].item()
            class_total[label] += 1
        

for i in range(10):
    print('Accuracy of %5s : %2d %%' % (classes[i], 100 * class_correct[i] / class_total[i]))

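The output is shown below. Cat, bird, and deer have the three highest error rates, i.e. they are the three classes predicted least accurately, whereas ship and truck are the most accurate.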

Accuracy of plane : 58 %
Accuracy of   car : 59 %
Accuracy of  bird : 40 %
Accuracy of   cat : 33 %
Accuracy of  deer : 39 %
Accuracy of   dog : 60 %
Accuracy of  frog : 54 %
Accuracy of horse : 66 %
Accuracy of  ship : 70 %
Accuracy of truck : 72 %
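
The per-class counting used for these numbers can also be written without the inner Python loop. The following is a sketch of one alternative, not the tutorial's method: it uses torch.bincount on made-up labels and predictions for 3 classes.

```python
import torch

num_classes = 3
labels = torch.tensor([0, 1, 1, 2, 2, 2])      # made-up ground truth
predicted = torch.tensor([0, 1, 0, 2, 2, 1])   # made-up predictions

correct = (predicted == labels)   # boolean tensor, True where the prediction is right

# Samples per class, and correct samples per class (True counts as 1.0).
class_total = torch.bincount(labels, minlength=num_classes)
class_correct = torch.bincount(labels, weights=correct.float(),
                               minlength=num_classes)

accuracy = class_correct / class_total
print(accuracy)
```

A class with no samples would give a division by zero here, so real code would need to guard against that case.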

4.3 Training on a GPU

Deep learning naturally relies on GPUs to speed up training, so this part shows how to train on a GPU.

First, check whether a GPU is available for training:

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
print(device)

The output is shown below. It means that CUDA is available and the first (or only) GPU will be used; otherwise cpu would be printed.

cuda:0

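Since a GPU is available, the next step is to train on it. The code that needs to change is shown below: both the network parameters and the data have to be moved to the GPU: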

net.to(device)
inputs, labels = inputs.to(device), labels.to(device)
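
A quick way to confirm that the model and the data ended up on the same device is to inspect their .device attributes. The sketch below uses a tiny stand-in layer and random input (both made up for this example) and falls back to the CPU when no GPU is present.

```python
import torch
import torch.nn as nn

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

layer = nn.Linear(4, 2).to(device)      # parameters now live on `device`
batch = torch.randn(3, 4).to(device)    # inputs must be on the same device

out = layer(batch)

print(next(layer.parameters()).device, batch.device, out.device)
```

If the inputs were left on the CPU while the model sits on the GPU, the forward pass would raise a RuntimeError about mismatched devices.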

The modified training code:

import time
# When training on a GPU, both the network and the data must be moved to the GPU
net.to(device)
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)

start = time.time()
for epoch in range(2):
    
    running_loss = 0.0
    for i, data in enumerate(trainloader, 0):
        # Get the input data
        inputs, labels = data
        inputs, labels = inputs.to(device), labels.to(device)
        # Zero the parameter gradients
        optimizer.zero_grad()
        
        outputs = net(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
        
        # Print statistics
        running_loss += loss.item()
        if i % 2000 == 1999:
            # Print once every 2000 iterations
            print('[%d, %5d] loss: %.3f' % (epoch + 1, i+1, running_loss / 2000))
            running_loss = 0.0
print('Finished Training! Total cost time: ', time.time() - start)

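Note that the optimizer is defined after calling net.to(device), so the network parameters passed to it are CUDA tensors. The training results are similar to before. Because this network is very small, moving it to the GPU does not bring much of a speedup; in my own runs training actually got slower, which may also be due to the GPU in my laptop.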
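
When comparing CPU and GPU speed, keep in mind that CUDA kernels run asynchronously, so a timer can stop before the GPU has finished. The training loop above calls loss.item() every iteration, which copies a value back to the CPU and therefore waits for the GPU. For code without such a call, a sketch of a fairer measurement looks like this (the layer size, batch size, and repeat count are arbitrary choices for illustration):

```python
import time
import torch
import torch.nn as nn

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
model = nn.Linear(256, 256).to(device)
x = torch.randn(64, 256, device=device)

def sync():
    # Wait for pending CUDA work before reading the clock (no-op on CPU).
    if device.type == "cuda":
        torch.cuda.synchronize()

with torch.no_grad():
    model(x)        # warm-up call, excluded from the timing
    sync()
    start = time.time()
    for _ in range(100):
        model(x)
    sync()
    elapsed = time.time() - start

print("100 forward passes on %s: %.4f s" % (device, elapsed))
```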

To speed things up further, you can consider using multiple GPUs; see the data parallelism tutorial, which is optional material.

pytorch.org/tutorials/b…

Tutorial for this part:

pytorch.org/tutorials/b…

Code for this part:

github.com/ccc013/Deep…

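5. Data parallelism

This part of the tutorial shows how to use DataParallel to train a network on multiple GPUs.

First, training a model on a GPU is simple. As the code below shows, define a device object and then use the .to() method to move the model parameters to the specified GPU.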

device = torch.device("cuda:0")
model.to(device)

Next, move all tensors to the GPU:

mytensor = my_tensor.to(device)

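Note that my_tensor.to(device) returns a new copy of my_tensor on the target device rather than modifying my_tensor in place, so you need to assign the result to a new tensor and use that tensor from then on.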
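
The difference between tensors and modules here is easy to check. So that the sketch runs without a GPU, it uses a dtype conversion as a stand-in for a device move (an assumption made for this example; .to() handles both kinds of conversion) and contrasts it with nn.Module.to(), which changes the module in place and returns the same object.

```python
import torch
import torch.nn as nn

t = torch.ones(2, dtype=torch.float32)
t64 = t.to(torch.float64)    # stand-in for moving a tensor to another device

print(t.dtype, t64.dtype)    # the original tensor is unchanged
print(t64 is t)              # the result is a different tensor object

model = nn.Linear(2, 1)
same = model.to(torch.device("cpu"))
print(same is model)         # the module itself is modified and returned
```

This is why model.to(device) works without assignment, while a tensor has to be reassigned.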

By default PyTorch uses only one GPU. To use multiple GPUs, wrap the model in DataParallel, as shown below:

model = nn.DataParallel(model)

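This line is the key to this part of the tutorial; the following sections explain it in more detail.

5.1 Imports and parameters

First, import the required libraries and define some parameters: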

import torch
import torch.nn as nn
from torch.utils.data import Dataset, DataLoader

# Parameters and DataLoaders
input_size = 5
output_size = 2

batch_size = 30
data_size = 100

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

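These define the input and output sizes of the network, the batch size, and the size of the dataset (the number of samples), along with a device object.

5.2 Building a dummy dataset

Next, build a dummy (random) dataset. The implementation is as follows: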

class RandomDataset(Dataset):

    def __init__(self, size, length):
        self.len = length
        self.data = torch.randn(length, size)

    def __getitem__(self, index):
        return self.data[index]

    def __len__(self):
        return self.len

rand_loader = DataLoader(dataset=RandomDataset(input_size, data_size),
                         batch_size=batch_size, shuffle=True)
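
With 100 samples and a batch size of 30, the loader yields three full batches and one smaller final batch. The sketch below checks this with the same numbers; TensorDataset is used only as a compact stand-in for RandomDataset.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

data_size, input_size, batch_size = 100, 5, 30   # same values as in section 5.1

# Compact stand-in for RandomDataset: 100 random vectors of length 5.
dataset = TensorDataset(torch.randn(data_size, input_size))
loader = DataLoader(dataset, batch_size=batch_size, shuffle=True)

# TensorDataset yields tuples, so each batch is a list holding one tensor.
batch_sizes = [batch[0].size(0) for batch in loader]
print(batch_sizes)
```

Passing drop_last=True to DataLoader would discard the final batch of 10 instead.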

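5.3 A simple model

Next, build a simple network model consisting of just one fully connected layer. A print() call is added to monitor the sizes of the network's input and output tensors: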

class Model(nn.Module):
    # Our model

    def __init__(self, input_size, output_size):
        super(Model, self).__init__()
        self.fc = nn.Linear(input_size, output_size)

    def forward(self, input):
        output = self.fc(input)
        print("\tIn Model: input size", input.size(),
              "output size", output.size())

        return output

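5.4 Creating the model and DataParallel

This is the core of this part. First create a model instance and check whether multiple GPUs are available. If so, wrap the model in nn.DataParallel, and then call model.to(device). The code is as follows: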

model = Model(input_size, output_size)
if torch.cuda.device_count() > 1:
  print("Let's use", torch.cuda.device_count(), "GPUs!")
  # dim = 0 [30, xxx] -> [10, ...], [10, ...], [10, ...] on 3 GPUs
  model = nn.DataParallel(model)

model.to(device)

5.5 Running the model

Now we can run the model and look at the printed information:

for data in rand_loader:
    input = data.to(device)
    output = model(input)
    print("Outside: input size", input.size(),
          "output_size", output.size())

The output is as follows:

        In Model: input size torch.Size([15, 5]) output size torch.Size([15, 2])
        In Model: input size torch.Size([15, 5]) output size torch.Size([15, 2])
Outside: input size torch.Size([30, 5]) output_size torch.Size([30, 2])
        In Model: input size torch.Size([15, 5]) output size torch.Size([15, 2])
        In Model: input size torch.Size([15, 5]) output size torch.Size([15, 2])
Outside: input size torch.Size([30, 5]) output_size torch.Size([30, 2])
        In Model: input size torch.Size([15, 5]) output size torch.Size([15, 2])
        In Model: input size torch.Size([15, 5]) output size torch.Size([15, 2])
Outside: input size torch.Size([30, 5]) output_size torch.Size([30, 2])
        In Model: input size torch.Size([5, 5]) output size torch.Size([5, 2])
        In Model: input size torch.Size([5, 5]) output size torch.Size([5, 2])
Outside: input size torch.Size([10, 5]) output_size torch.Size([10, 2])

5.6 Results

With only one GPU or none, a batch of 30 means the model receives an input of size 30 and produces an output of size 30. With multiple GPUs, the results look like this:

2 GPUs

# on 2 GPUs
Let's use 2 GPUs!
    In Model: input size torch.Size([15, 5]) output size torch.Size([15, 2])
    In Model: input size torch.Size([15, 5]) output size torch.Size([15, 2])
Outside: input size torch.Size([30, 5]) output_size torch.Size([30, 2])
    In Model: input size torch.Size([15, 5]) output size torch.Size([15, 2])
    In Model: input size torch.Size([15, 5]) output size torch.Size([15, 2])
Outside: input size torch.Size([30, 5]) output_size torch.Size([30, 2])
    In Model: input size torch.Size([15, 5]) output size torch.Size([15, 2])
    In Model: input size torch.Size([15, 5]) output size torch.Size([15, 2])
Outside: input size torch.Size([30, 5]) output_size torch.Size([30, 2])
    In Model: input size torch.Size([5, 5]) output size torch.Size([5, 2])
    In Model: input size torch.Size([5, 5]) output size torch.Size([5, 2])
Outside: input size torch.Size([10, 5]) output_size torch.Size([10, 2])

3 GPUs

Let's use 3 GPUs!
    In Model: input size torch.Size([10, 5]) output size torch.Size([10, 2])
    In Model: input size torch.Size([10, 5]) output size torch.Size([10, 2])
    In Model: input size torch.Size([10, 5]) output size torch.Size([10, 2])
Outside: input size torch.Size([30, 5]) output_size torch.Size([30, 2])
    In Model: input size torch.Size([10, 5]) output size torch.Size([10, 2])
    In Model: input size torch.Size([10, 5]) output size torch.Size([10, 2])
    In Model: input size torch.Size([10, 5]) output size torch.Size([10, 2])
Outside: input size torch.Size([30, 5]) output_size torch.Size([30, 2])
    In Model: input size torch.Size([10, 5]) output size torch.Size([10, 2])
    In Model: input size torch.Size([10, 5]) output size torch.Size([10, 2])
    In Model: input size torch.Size([10, 5]) output size torch.Size([10, 2])
Outside: input size torch.Size([30, 5]) output_size torch.Size([30, 2])
    In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
    In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
    In Model: input size torch.Size([2, 5]) output size torch.Size([2, 2])
Outside: input size torch.Size([10, 5]) output_size torch.Size([10, 2])

8 GPUs

Let's use 8 GPUs!
    In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
    In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
    In Model: input size torch.Size([2, 5]) output size torch.Size([2, 2])
    In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
    In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
    In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
    In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
    In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
Outside: input size torch.Size([30, 5]) output_size torch.Size([30, 2])
    In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
    In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
    In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
    In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
    In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
    In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
    In Model: input size torch.Size([2, 5]) output size torch.Size([2, 2])
    In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
Outside: input size torch.Size([30, 5]) output_size torch.Size([30, 2])
    In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
    In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
    In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
    In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
    In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
    In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
    In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
    In Model: input size torch.Size([2, 5]) output size torch.Size([2, 2])
Outside: input size torch.Size([30, 5]) output_size torch.Size([30, 2])
    In Model: input size torch.Size([2, 5]) output size torch.Size([2, 2])
    In Model: input size torch.Size([2, 5]) output size torch.Size([2, 2])
    In Model: input size torch.Size([2, 5]) output size torch.Size([2, 2])
    In Model: input size torch.Size([2, 5]) output size torch.Size([2, 2])
    In Model: input size torch.Size([2, 5]) output size torch.Size([2, 2])
Outside: input size torch.Size([10, 5]) output_size torch.Size([10, 2])
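
The per-GPU sizes in these logs (for example 4, 4, 2 for a batch of 10 on 3 GPUs, or only five pieces of 2 on 8 GPUs) are what you get from splitting the batch into equal chunks along dimension 0. The sketch below reproduces them on the CPU with torch.chunk. It assumes that DataParallel's splitting behaves like torch.chunk, which is consistent with the logged sizes but is an assumption of this example; split_sizes is a hypothetical helper.

```python
import torch

def split_sizes(batch_size, num_gpus):
    """Sizes of the pieces torch.chunk produces along dim 0."""
    batch = torch.zeros(batch_size, 5)
    return [piece.size(0) for piece in torch.chunk(batch, num_gpus, dim=0)]

# Full batches of 30 and the final batch of 10, for each GPU count above.
for gpus in (2, 3, 8):
    print(gpus, "GPUs:", split_sizes(30, gpus), split_sizes(10, gpus))
```

Note that with 8 GPUs the final batch of 10 yields only five chunks, so three of the GPUs receive no data for that step.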