How to Train an Image Classification Model in PyTorch and TensorFlow

Author | PULKIT SHARMA   Translated by | Flin   Source | analyticsvidhya

Introduction

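Image classification is one of the most important applications of computer vision. Its uses range from classifying objects in self-driving cars to identifying blood cells in healthcare, and from spotting defective items in manufacturing to building systems that can tell whether a person is wearing a mask. Image classification is used in all of these industries in one way or another. How do they do it? Which framework do they use?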

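You have probably read a lot about the differences between deep learning frameworks such as TensorFlow, PyTorch, and Keras. TensorFlow and PyTorch are undoubtedly the most popular frameworks in the industry, and there is no shortage of resources that compare them.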

Here is one such resource: 5 Amazing Deep Learning Frameworks Every Data Scientist Must Know!

https://www.analyticsvidhya.com/blog/2019/03/deep-learning-frameworks-comparison

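In this article, we will see how to build a basic image classification model in both PyTorch and TensorFlow. We will start with a brief overview of the two frameworks. Then we will take the MNIST handwritten digit classification dataset and build an image classification model using a CNN (convolutional neural network) in PyTorch and in TensorFlow.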

This will be your starting point. From here you can pick whichever framework you prefer and start building other computer vision models.

If you are new to deep learning and interested in computer vision (who isn't?), check out the "Certified Computer Vision Master's Program".

https://courses.analyticsvidhya.com/bundles/certified-computer-vision-masters-program

Overview of PyTorch

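PyTorch is growing in popularity in the deep learning community and is widely used by deep learning practitioners. It is a Python package that provides tensor computation. Tensors are multidimensional arrays, much like NumPy's ndarrays, that can also run on a GPU.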
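
As a small illustration of the array-like behaviour described above, the following sketch builds a tensor from a NumPy array and moves it to a GPU when one is present. The variable names are illustrative and do not come from the original article.

```python
import numpy as np
import torch

# a NumPy array and a tensor built from the same values
array = np.array([[1.0, 2.0], [3.0, 4.0]], dtype=np.float32)
tensor = torch.from_numpy(array)

# tensors support the familiar array-style arithmetic
doubled = tensor * 2

# move the tensor to the GPU only if one is available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
on_device = doubled.to(device)

# a tensor must be on the CPU before it can be converted back to NumPy
result = on_device.cpu().numpy()
print(result.shape, on_device.device.type)
```

The same code runs unchanged on a machine without a GPU, because the device is chosen at runtime.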

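A distinctive feature of PyTorch is its use of dynamic computational graphs. Instead of relying on a predefined graph with fixed functionality, PyTorch's Autograd package builds the computational graph from the tensors as operations run and computes the gradients automatically.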

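PyTorch gives us a framework for building computational graphs on the fly, and even changing them at runtime. This is especially useful when we do not know in advance how much memory a neural network will need.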
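
To make the idea of a graph built at runtime concrete, here is a minimal sketch in which ordinary Python control flow decides how many operations end up in the graph. The helper name `forward` and the threshold of 10 are assumptions chosen for illustration only.

```python
import torch

def forward(x):
    # the number of loop iterations depends on the data,
    # so the graph is only known once the code has run
    y = x
    steps = 0
    while y.norm() < 10:
        y = y * 2
        steps += 1
    return y.sum(), steps

x = torch.tensor([1.0, 2.0], requires_grad=True)
total, steps = forward(x)

# Autograd walks back through whatever graph was recorded
total.backward()
print(steps, x.grad)
```

Each doubling multiplies the gradient by 2, so the gradient of the sum with respect to each input element is 2 raised to the number of recorded steps.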

You can tackle a wide range of deep learning challenges with PyTorch. Here are a few:

Images (detection, classification, etc.)
Text (classification, generation, etc.)
Reinforcement learning

If you want to learn PyTorch from scratch, here are some detailed resources:

A Beginner's Guide to PyTorch
- https://www.analyticsvidhya.com/blog/2019/09/introduction-to-pytorch-from-scratch
Build an Image Classification Model Using Convolutional Neural Networks in PyTorch
- https://www.analyticsvidhya.com/blog/2019/10/building-image-classification-models-cnn-pytorch
Deep Learning for Everyone: Master the Powerful Art of Transfer Learning Using PyTorch
- https://www.analyticsvidhya.com/blog/2019/10/how-to-master-transfer-learning-using-pytorch
Image Augmentation for Deep Learning Using PyTorch: Feature Engineering for Images
- https://www.analyticsvidhya.com/blog/2019/12/image-augmentation-deep-learning-pytorch

Overview of TensorFlow

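TensorFlow was developed by researchers and engineers from the Google Brain team. It is by far the most commonly used software library in the field of deep learning (though others are catching up quickly).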

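One of the biggest reasons TensorFlow is so popular is its support for multiple languages for creating deep learning models, such as Python, C++, and R. It also comes with detailed documentation and walkthroughs for guidance.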

TensorFlow has many components. Here are two prominent ones:

TensorBoard: helps visualize data effectively using data flow graphs
TensorFlow: useful for rapid deployment of new algorithms and experiments

At the time of writing, TensorFlow is on version 2.0, which was officially released in September 2019. We will implement our CNN on a 2.x release as well.

If you want to learn more about this new version of TensorFlow, check out the TensorFlow 2.0 Tutorial for Deep Learning:

https://www.analyticsvidhya.com/blog/2020/03/tensorflow-2-tutorial-deep-learning

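I hope you now have a basic understanding of both PyTorch and TensorFlow. Let's now build a deep learning model with each framework and look at how it works internally. Before that, let's understand the problem statement we will be solving in this article.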

Understanding the Problem Statement: MNIST

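Before we begin, let's understand the dataset. In this article we will solve the popular MNIST problem, a digit recognition task in which we have to classify images of handwritten digits into one of 10 classes, 0 through 9.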

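The MNIST dataset contains images of digits taken from a variety of scanned documents, normalized in size and centered. Each image is a 28 x 28 pixel square (784 pixels in total). A standard split of the dataset is used to evaluate and compare models: 60,000 images are used to train a model and a separate set of 10,000 images is used to test it.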

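Now that we understand the dataset, let's build the image classification model using a CNN in PyTorch and TensorFlow. We will start with the PyTorch implementation. We will implement these models in Google Colab, which provides a free GPU for running deep learning models.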

I hope you are familiar with convolutional neural networks (CNNs). If not, feel free to refer to the following article:

A Comprehensive Tutorial to Learn Convolutional Neural Networks from Scratch: https://www.analyticsvidhya.com/blog/2018/12/guide-convolutional-neural-network-cnn

Implementing a Convolutional Neural Network (CNN) in PyTorch

Let's first import all the libraries:

# importing the libraries
import numpy as np
import torch
import torchvision
import matplotlib.pyplot as plt
from time import time
from torchvision import datasets, transforms
from torch import nn, optim

Let's also check the version of PyTorch on Google Colab:

# version of pytorch
print(torch.__version__)

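I am using PyTorch 1.5.1. With a different version you may see some warnings or errors, in which case you can switch to this version. We will apply some transformations to the images, such as normalizing the pixel values, so let's define those transformations as well: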

# transformations to be applied on images
transform = transforms.Compose([transforms.ToTensor(),
                              transforms.Normalize((0.5,), (0.5,)),
                              ])
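
The effect of these two transformations on the pixel values can be traced by hand. The sketch below uses plain NumPy rather than torchvision and only mimics the arithmetic: ToTensor scales 8-bit intensities to the range 0 to 1, and Normalize with a mean and standard deviation of 0.5 then maps them to the range -1 to 1. The sample pixel values are made up for illustration.

```python
import numpy as np

# a few raw 8-bit pixel intensities (illustrative values)
pixels = np.array([0, 64, 128, 255], dtype=np.uint8)

# step 1: the scaling that ToTensor applies, from [0, 255] to [0, 1]
scaled = pixels.astype(np.float32) / 255.0

# step 2: the arithmetic of Normalize((0.5,), (0.5,)), i.e. (x - mean) / std
mean, std = 0.5, 0.5
normalized = (scaled - mean) / std

print(normalized.min(), normalized.max())
```

Centering the inputs around zero in this way is a common choice that tends to make optimization better behaved.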

Now, let's load the training and test sets of the MNIST dataset:

# defining the training and testing set
trainset = datasets.MNIST('./data', download=True, train=True, transform=transform)
testset = datasets.MNIST('./data', download=True, train=False, transform=transform)

Next, I define the training and test loaders, which will help us load the training and test sets in batches. I set the batch size to 64:

# defining trainloader and testloader
trainloader = torch.utils.data.DataLoader(trainset, batch_size=64, shuffle=True)
testloader = torch.utils.data.DataLoader(testset, batch_size=64, shuffle=True)
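
With a batch size of 64, the number of batches per epoch follows from simple arithmetic. The sketch below assumes the loaders keep the final, smaller batch (the DataLoader default, `drop_last=False`); the helper name `batch_summary` is hypothetical.

```python
import math

def batch_summary(num_samples, batch_size):
    """Return (number of batches, size of the last batch)."""
    num_batches = math.ceil(num_samples / batch_size)
    last_batch = num_samples - (num_batches - 1) * batch_size
    return num_batches, last_batch

# 60,000 training images and 10,000 test images, as in the standard MNIST split
train_batches = batch_summary(60000, 64)
test_batches = batch_summary(10000, 64)
print(train_batches, test_batches)
```

The first number in each pair is what `len(trainloader)` and `len(testloader)` would report under that assumption.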

Let's first look at a summary of the training set:

# shape of training data
dataiter = iter(trainloader)
images, labels = next(dataiter)

print(images.shape)
print(labels.shape)

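So in each batch we have 64 images, each of size 28 x 28 with a single channel (a batch tensor of shape 64 x 1 x 28 x 28), and for each image we have a corresponding label. Let's visualize a training image and see what it looks like: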

# visualizing the training images
plt.imshow(images[0].numpy().squeeze(), cmap='gray')

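In the author's run this was an image of the digit 0. Because the loader shuffles the data, you may see a different digit. Similarly, let's check the shape of a batch from the test set: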

# shape of validation data
dataiter = iter(testloader)
images, labels = next(dataiter)

print(images.shape)
print(labels.shape)

The test set also comes in batches of 64. Now let's define the architecture.

Defining the Model Architecture

We will use a CNN model here. So let's define and train that model:

# defining the model architecture
class Net(nn.Module):   
  def __init__(self):
      super(Net, self).__init__()

      self.cnn_layers = nn.Sequential(
          # Defining a 2D convolution layer
          nn.Conv2d(1, 4, kernel_size=3, stride=1, padding=1),
          nn.BatchNorm2d(4),
          nn.ReLU(inplace=True),
          nn.MaxPool2d(kernel_size=2, stride=2),
          # Defining another 2D convolution layer
          nn.Conv2d(4, 4, kernel_size=3, stride=1, padding=1),
          nn.BatchNorm2d(4),
          nn.ReLU(inplace=True),
          nn.MaxPool2d(kernel_size=2, stride=2),
      )

      self.linear_layers = nn.Sequential(
          nn.Linear(4 * 7 * 7, 10)
      )

  # Defining the forward pass    
  def forward(self, x):
      x = self.cnn_layers(x)
      x = x.view(x.size(0), -1)
      x = self.linear_layers(x)
      return x
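
The `4 * 7 * 7` input size of the linear layer follows from how each layer changes the spatial size of the image. The sketch below traces that arithmetic with the standard output-size formula for convolution and pooling; the helper name `conv_output_size` is hypothetical.

```python
def conv_output_size(size, kernel_size, stride=1, padding=0):
    # standard formula for the spatial size after a convolution or pooling layer
    return (size + 2 * padding - kernel_size) // stride + 1

size = 28                                                       # MNIST images are 28 x 28
size = conv_output_size(size, kernel_size=3, stride=1, padding=1)  # first conv keeps 28
size = conv_output_size(size, kernel_size=2, stride=2)             # first max pool halves to 14
size = conv_output_size(size, kernel_size=3, stride=1, padding=1)  # second conv keeps 14
size = conv_output_size(size, kernel_size=2, stride=2)             # second max pool halves to 7

channels = 4
flattened = channels * size * size
print(size, flattened)
```

If you change the number of filters, the padding, or the pooling, the same trace tells you what the first argument of `nn.Linear` has to become.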

Let's also define the optimizer and the loss function, and then look at a summary of the model:

# defining the model
model = Net()
# defining the optimizer
optimizer = optim.Adam(model.parameters(), lr=0.01)
# defining the loss function
criterion = nn.CrossEntropyLoss()
# checking if GPU is available
if torch.cuda.is_available():
    model = model.cuda()
    criterion = criterion.cuda()
    
print(model)
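
`nn.CrossEntropyLoss` takes the raw scores (logits) produced by the model together with integer class labels, and combines a log-softmax with the negative log-likelihood, averaged over the batch by default. The NumPy sketch below reproduces that computation for illustration; the function name `cross_entropy` and the sample inputs are assumptions, not part of the original article.

```python
import numpy as np

def cross_entropy(logits, labels):
    # subtract the row-wise maximum for numerical stability
    shifted = logits - logits.max(axis=1, keepdims=True)
    # log-softmax: log of each class probability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # negative log-probability of the true class, averaged over the batch
    return -log_probs[np.arange(len(labels)), labels].mean()

# two samples with identical scores for all 10 classes (a uniform guess)
logits = np.zeros((2, 10))
labels = np.array([3, 7])
loss = cross_entropy(logits, labels)
print(loss)
```

A model that guesses uniformly over 10 classes has a loss of ln(10), roughly 2.30, which is a useful reference point when reading the first training losses.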

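So we have 2 convolutional layers, which help extract features from the images. The features from these convolutional layers are passed to a fully connected layer, which classifies the images into their respective classes. Now that our model architecture is ready, let's train the model for ten epochs: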

for i in range(10):
    running_loss = 0
    for images, labels in trainloader:

        if torch.cuda.is_available():
          images = images.cuda()
          labels = labels.cuda()

        # Training pass
        optimizer.zero_grad()
        
        output = model(images)
        loss = criterion(output, labels)
        
        #This is where the model learns by backpropagating
        loss.backward()
        
        #And optimizes its weights here
        optimizer.step()
        
        running_loss += loss.item()
    # report the average loss per batch once the epoch is finished
    print("Epoch {} - Training loss: {}".format(i+1, running_loss/len(trainloader)))

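You will see the training loss decrease as the epochs go by. This means the model is learning patterns from the training set. Let's check the model's performance on the test set: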

# getting predictions on test set and measuring the performance
# switch to evaluation mode so BatchNorm uses its running statistics
model.eval()
correct_count, all_count = 0, 0
with torch.no_grad():
    for images, labels in testloader:
        if torch.cuda.is_available():
            images = images.cuda()
            labels = labels.cuda()
        # the model returns raw scores (logits); the largest score is the predicted class
        logits = model(images)
        pred_labels = logits.argmax(dim=1)
        correct_count += (pred_labels == labels).sum().item()
        all_count += labels.size(0)

print("Number Of Images Tested =", all_count)
print("\nModel Accuracy =", (correct_count/all_count))

So we tested 10,000 images in total, and in the author's run the model predicted the labels of the test images with an accuracy of about 96%.

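That is how you can build a convolutional neural network in PyTorch. In the next section, we will look at how to implement the same kind of architecture in TensorFlow.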

Implementing a Convolutional Neural Network (CNN) in TensorFlow

Now let's solve the same MNIST problem using a convolutional neural network in TensorFlow. As always, we start by importing the libraries:

# importing the libraries
import tensorflow as tf

from tensorflow.keras import datasets, layers, models
from tensorflow.keras.utils import to_categorical
import matplotlib.pyplot as plt

Let's check the version of TensorFlow we are using:

# version of tensorflow
print(tf.__version__)

So we are using TensorFlow 2.2.0. Now let's load the MNIST dataset using the datasets module of tensorflow.keras:

(train_images, train_labels), (test_images, test_labels) = datasets.mnist.load_data(path='mnist.npz')
# Normalize pixel values to be between 0 and 1
train_images, test_images = train_images / 255.0, test_images / 255.0

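Here we have loaded the training and test sets of the MNIST dataset, and we have scaled the pixel values of the training and test images to the range 0 to 1. Next, let's visualize a few images from the dataset: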

# visualizing a few images
plt.figure(figsize=(10,10))
for i in range(9):
    plt.subplot(3,3,i+1)
    plt.xticks([])
    plt.yticks([])
    plt.grid(False)
    plt.imshow(train_images[i], cmap='gray')
plt.show()

This is what our dataset looks like: images of handwritten digits. Let's also look at the shapes of the training and test sets:

# shape of the training and test set
(train_images.shape, train_labels.shape), (test_images.shape, test_labels.shape)

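So the training set has 60,000 images of size 28 by 28 and the test set has 10,000 images of the same shape. Next, we will reshape the images to add a channel dimension and one-hot encode the target variable: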

# reshaping the images
train_images = train_images.reshape((60000, 28, 28, 1))
test_images = test_images.reshape((10000, 28, 28, 1))

# one hot encoding the target variable
train_labels = to_categorical(train_labels)
test_labels = to_categorical(test_labels)
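
One-hot encoding turns each integer label into a vector of length 10 with a single 1 at the position of the class. The NumPy sketch below illustrates the idea behind `to_categorical`; the helper `one_hot` and the sample labels are hypothetical.

```python
import numpy as np

def one_hot(labels, num_classes=10):
    # row i of the identity matrix is the one-hot vector for class i
    return np.eye(num_classes, dtype=np.float32)[labels]

labels = np.array([5, 0, 4])
encoded = one_hot(labels)
print(encoded.shape)
print(encoded[0])
```

Taking the position of the largest entry in each row recovers the original integer label.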

Defining the Model Architecture

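Now we will define the architecture of the model. We will use an architecture similar to the one defined in PyTorch: 2 convolutional layers, each followed by a max pooling layer, then a Flatten layer, and finally a fully connected layer with 10 neurons, since we have 10 classes. Unlike the PyTorch version, this model uses no padding and no batch normalization.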

# defining the model architecture
model = models.Sequential()
model.add(layers.Conv2D(4, (3, 3), activation='relu', input_shape=(28, 28, 1)))
model.add(layers.MaxPooling2D((2, 2), strides=2))
model.add(layers.Conv2D(4, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2), strides=2))
model.add(layers.Flatten())
model.add(layers.Dense(10, activation='softmax'))

Let's quickly look at a summary of the model:

# summary of the model
model.summary()
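
As a rough cross-check of what the summary prints, the layer sizes and parameter counts can be worked out by hand. The sketch below assumes the Keras defaults used in the model definition (no padding on the convolutions, pooling that rounds down); the helper names are hypothetical.

```python
def conv_params(in_channels, out_channels, kernel_size):
    # one kernel_size x kernel_size filter per input channel, plus one bias, per output channel
    return (kernel_size * kernel_size * in_channels + 1) * out_channels

def valid_conv_size(size, kernel_size):
    # spatial size after a convolution without padding and with stride 1
    return size - kernel_size + 1

size = 28
total_params = conv_params(1, 4, 3)    # first Conv2D
size = valid_conv_size(size, 3) // 2   # 28 -> 26, then 2 x 2 pooling -> 13
total_params += conv_params(4, 4, 3)   # second Conv2D
size = valid_conv_size(size, 3) // 2   # 13 -> 11, then 2 x 2 pooling -> 5

flattened = 4 * size * size            # input size of the Dense layer
total_params += (flattened + 1) * 10   # Dense layer weights plus biases
print(flattened, total_params)
```

Most of the parameters sit in the final Dense layer, which is typical for small CNNs of this shape.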

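To summarize, we have 2 convolutional layers, 2 max pooling layers, a Flatten layer, and a fully connected layer. The model has 1,198 parameters in total. Now that our model is ready, we will compile it: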

# compiling the model
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])

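We are using the Adam optimizer, which you can change. The loss function is set to categorical cross-entropy because we are solving a multi-class classification problem, and the metric is 'accuracy'. Now let's train the model for 10 epochs: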

# training the model
history = model.fit(train_images, train_labels, epochs=10, validation_data=(test_images, test_labels))

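To summarize, the training loss started at about 0.46 and fell to 0.08 after 10 epochs. The training and validation accuracies after 10 epochs were 97.31% and 97.48%, respectively.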

So that is how we can train a CNN in TensorFlow.

End Notes

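To summarize, in this article we first looked at a brief overview of PyTorch and TensorFlow. We then went through the MNIST handwritten digit classification challenge, and finally built an image classification model using a CNN (convolutional neural network) in both PyTorch and TensorFlow. I hope you are now familiar with both frameworks. As a next step, take on another image classification challenge and try to solve it with both PyTorch and TensorFlow.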

Here is a practice problem for building your image classification skills:

Identify the Apparels (Fashion MNIST): https://datahack.analyticsvidhya.com/contest/practice-problem-identify-the-apparels

Original article: https://www.analyticsvidhya.com/blog/2020/07/how-to-train-an-image-classification-model-in-pytorch-and-tensorflow/
