Purpose of this article: using the Kaggle dog breed identification competition to show how to fine-tune a model with PyTorch.
torchvision is PyTorch's toolkit for computer vision; besides a large number of datasets, it also provides many pretrained classic models. Here we take the officially pretrained resnet50 and apply it to the Kaggle dog breed identification competition.
```python
import torch
import torchvision
import torch.nn as nn
from torch.utils.data import Dataset, DataLoader
from torchvision import datasets, models, transforms
import pandas as pd
import os
from PIL import Image
from sklearn.model_selection import StratifiedShuffleSplit

print(torch.__version__)        # 1.1.0
print(torchvision.__version__)  # 0.3.0

# Define some hyperparameters
IMG_SIZE = 224                    # input size required by the model
IMG_MEAN = [0.485, 0.456, 0.406]  # mean and std needed for image preprocessing
IMG_STD = [0.229, 0.224, 0.225]
DEVICE = torch.device('cuda' if torch.cuda.is_available() else 'cpu')  # use the GPU if available
BATCH_SIZE = 64                   # size of each batch
EPOCHS = 7                        # number of training epochs
```
In PyTorch, data loading is usually wrapped into a Dataset object and a DataLoader object.
First download the official data and unzip it; just keep the original directory structure. Here we specify the directory location and take a look at the contents. (Note: labels.csv contains 10222 labels, corresponding to the images in the train folder.)
```python
# DATA_ROOT = r'D:\KaggleDatasets\competitions\dog-breed-identification'
# Note 1: web URLs and Linux paths use '/' as the separator, while Windows paths use '\'
DATA_ROOT = '/KaggleDatasets/competitions/dog-breed-identification'
df = pd.read_csv(os.path.join(DATA_ROOT, 'labels.csv'))
df.head()
```
For later convenience, we define two dictionaries here and add the class index as a new column of the DataFrame.
```python
# Define two dicts, keyed by breed string and by index respectively
breeds = df.breed.unique()
breed2idx = dict((breed, idx) for idx, breed in enumerate(breeds))
idx2breed = dict((idx, breed) for idx, breed in enumerate(breeds))
len(breeds)  # 120

# Add the class index as a column of df
df['label_idx'] = pd.Series(breed2idx, index=df.breed).values
# df.shape  # (10222, 3)
df.head()
```
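The label-to-index mapping above can be sketched on toy data; an equivalent and slightly more idiomatic way to build the column is `Series.map`. The breed names and ids below are made up for illustration:

```python
import pandas as pd

# Toy DataFrame standing in for labels.csv (ids and breeds are illustrative)
df = pd.DataFrame({'id': ['a', 'b', 'c', 'd'],
                   'breed': ['pug', 'beagle', 'pug', 'corgi']})

breeds = df.breed.unique()  # order of first appearance: pug, beagle, corgi
breed2idx = {breed: idx for idx, breed in enumerate(breeds)}
idx2breed = {idx: breed for idx, breed in enumerate(breeds)}

# Equivalent to the pd.Series(...).values trick, but via .map
df['label_idx'] = df.breed.map(breed2idx)
print(df.label_idx.tolist())  # [0, 1, 0, 2]
```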
Split the data into a training set and a validation set. Here we split off only 10% of the data as validation data for training.
```python
# Split the dataset
shuffle_split = StratifiedShuffleSplit(n_splits=1, test_size=0.1, random_state=0)  # stratified split
train_idx, val_idx = next(iter(shuffle_split.split(df, df.breed)))  # split() returns an iterator
train_df = df.iloc[train_idx].reset_index(drop=True)  # (9199, 3)
val_df = df.iloc[val_idx].reset_index(drop=True)      # (1023, 3)
```
Note 2: StratifiedShuffleSplit().split(X, y) yields (train_index, test_index) index arrays.
Note 3: sklearn offers several data-splitting methods.
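A minimal sketch of what "stratified" means here, on toy labels with an 80/20 class ratio: the split preserves that ratio inside the 10-sample test fold.

```python
import numpy as np
from sklearn.model_selection import StratifiedShuffleSplit

# Toy labels: 80 samples of class 0, 20 of class 1
y = np.array([0] * 80 + [1] * 20)
X = np.zeros((100, 1))  # features are irrelevant to the split itself

sss = StratifiedShuffleSplit(n_splits=1, test_size=0.1, random_state=0)
train_idx, test_idx = next(iter(sss.split(X, y)))

# The 4:1 class ratio is preserved in the 10-sample test fold
print(len(test_idx))             # 10
print((y[test_idx] == 1).sum())  # 2
```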
torch.utils.data.Dataset is an abstract class; a custom Dataset must subclass it and implement two member methods: __len__ and __getitem__.
The transform pipeline is also passed in here.
```python
# Custom Dataset
class DogDataset(Dataset):
    def __init__(self, df, img_path, transform=None):
        self.df = df
        self.img_path = img_path
        self.transform = transform

    def __len__(self):
        return self.df.shape[0]  # return the length of the dataset

    def __getitem__(self, idx):
        # Return one (image, label) pair for the given idx
        img_name = os.path.join(self.img_path, self.df.id[idx]) + '.jpg'
        img = Image.open(img_name)  # PIL is recommended over skimage
        label = self.df.label_idx[idx]
        if self.transform:
            img = self.transform(img)
        return img, label

# Define the transforms for the training and validation sets
train_transform = transforms.Compose([
    transforms.Resize(IMG_SIZE),
    transforms.RandomResizedCrop(IMG_SIZE),
    transforms.RandomHorizontalFlip(),
    transforms.RandomRotation(30),
    transforms.ToTensor(),
    transforms.Normalize(IMG_MEAN, IMG_STD),
])
test_transform = transforms.Compose([
    transforms.Resize(IMG_SIZE),  # Note 4: given an int, the shorter side is scaled to IMG_SIZE, the longer side proportionally
    transforms.CenterCrop(IMG_SIZE),
    transforms.ToTensor(),
    transforms.Normalize(IMG_MEAN, IMG_STD),
])

# Create the datasets
train_dataset = DogDataset(train_df, os.path.join(DATA_ROOT, 'train'), train_transform)
val_dataset = DogDataset(val_df, os.path.join(DATA_ROOT, 'train'), test_transform)
```
The class is defined as:
torch.utils.data.DataLoader(dataset, batch_size=1, shuffle=False, sampler=None, batch_sampler=None, num_workers=0, ...)
The main parameters are dataset (the Dataset object to load from), batch_size, shuffle, and num_workers (the number of subprocesses used for data loading).
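A small sketch of the batching behavior on a toy TensorDataset (the sizes here are illustrative): when the dataset length is not divisible by batch_size, the last batch is simply smaller, unless drop_last=True is passed.

```python
import torch
from torch.utils.data import TensorDataset, DataLoader

# 10 samples of 3 features each, with integer labels (toy data)
X = torch.randn(10, 3)
y = torch.arange(10)
ds = TensorDataset(X, y)

loader = DataLoader(ds, batch_size=4, shuffle=False)
batches = [xb.shape[0] for xb, yb in loader]
print(batches)  # [4, 4, 2] -- the last batch is smaller unless drop_last=True
```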
```python
# Create the dataloaders
train_loader = DataLoader(train_dataset, batch_size=BATCH_SIZE, shuffle=True)
val_loader = DataLoader(val_dataset, batch_size=BATCH_SIZE, shuffle=True)
```
We use torchvision.models.resnet50 from PyTorch. Since ImageNet covers 1000 classes while this dog classification task has only 120, we fine-tune the model's final fully connected layer, changing its output from 1000 to 120.
```python
# Prepare the model
model = models.resnet50(pretrained=True)  # use dir(model) to inspect attributes and methods

# Freeze all parameters
for param in model.parameters():
    param.requires_grad = False

print(model.fc)
# Replace the fc layer. Use model.named_parameters() to iterate over names and parameters
num_feature = model.fc.in_features              # number of input features of the fc layer
model.fc = nn.Linear(num_feature, len(breeds))  # redefine the fc layer
print(model.fc)
# print(model)

# Move the model to the GPU
model.to(DEVICE)
```
Note 5: when using a pretrained model, the inputs need the same preprocessing as during pretraining (here, resizing to 224×224 and normalizing with the ImageNet mean and std used above).
Training requires a loss function and an optimizer. We also wrap the training and validation loops into functions.
```python
# Specify the loss function and optimizer
loss_fn = nn.CrossEntropyLoss()  # Note 6: the default reduction is 'mean', i.e. the average loss
# optimizer = torch.optim.Adam([{'params': model.fc.parameters()}], lr=0.001)  # per-group lr for the fc layer
optimizer = torch.optim.Adam(model.fc.parameters(), lr=0.001)

# Define the training function
# Note 7: the five training steps: zero the gradients, forward pass, compute the loss,
# backward pass, update the parameters.
def train(model, train_loader, device, epoch):
    model.train()  # Note 8: enable training mode, i.e. turn on BN, Dropout, etc.
    for batch_idx, (data, target) in enumerate(train_loader):
        data, target = data.to(device), target.to(device)  # Note 9: both model and data must be moved to the GPU
        # data and target have sizes torch.Size([64, 3, 224, 224]) and torch.Size([64])
        optimizer.zero_grad()         # zero the gradients
        yhat = model(data)            # forward pass, torch.Size([64, 120])
        loss = loss_fn(yhat, target)  # compute the loss
        loss.backward()               # backward pass
        optimizer.step()              # update the parameters
    print('Train epoch {}\t Loss {:.6f}'.format(epoch, loss.item()))

# Define the test function
def test(model, val_loader, device):
    model.eval()
    test_loss = 0  # accumulated test loss
    correct = 0    # number of correct predictions
    with torch.no_grad():
        for batch_idx, (data, target) in enumerate(val_loader):
            data, target = data.to(device), target.to(device)
            yhat = model(data)
            test_loss += loss_fn(yhat, target).item()  # add one batch's mean loss each time
            pred = torch.max(yhat, dim=1, keepdim=True)[1]  # Note 10: index of the largest probability
            correct += pred.eq(target.view_as(pred)).sum().item()  # accumulate the number of correct samples
    test_loss /= len(val_loader)  # note: divide by the number of batches, not len(val_loader.dataset)
    print('\nTest set: Average loss: {:.4f}, Accuracy: {}/{} ({:.1f}%)\n'.format(
        test_loss, correct, len(val_loader.dataset),
        100. * correct / len(val_loader.dataset)))
```
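The pred/eq/view_as bookkeeping in test() can be traced on toy logits (four samples, three classes, all values made up): torch.max along dim=1 returns (values, indices), so [1] picks the predicted class index per row.

```python
import torch

# Toy logits for 4 samples over 3 classes
yhat = torch.tensor([[2.0, 0.1, 0.3],
                     [0.1, 3.0, 0.2],
                     [0.5, 0.4, 0.3],
                     [0.1, 0.2, 2.5]])
target = torch.tensor([0, 1, 2, 2])

pred = torch.max(yhat, dim=1, keepdim=True)[1]       # argmax indices, shape [4, 1]
correct = pred.eq(target.view_as(pred)).sum().item()  # count matching positions
print(correct)  # 3  (sample 2 is predicted as class 0, not class 2)
```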
```python
# Start training
for epoch in range(1, EPOCHS + 1):
    %time train(model, train_loader, DEVICE, epoch)  # %time is a Jupyter magic
    test(model, val_loader, DEVICE)
```
The results show that after a few epochs the accuracy reaches about 80%, far better than random guessing (1/120 ≈ 0.83%).
```
Train epoch 1	 Loss 1.935438
Wall time: 3min 26s
Test set: Average loss: 1.2672, Accuracy: 723/1023 (70.7%)

Train epoch 2	 Loss 1.673698
Wall time: 1min 41s
Test set: Average loss: 0.8607, Accuracy: 782/1023 (76.4%)

Train epoch 3	 Loss 1.657430
Wall time: 1min 41s
Test set: Average loss: 0.7643, Accuracy: 795/1023 (77.7%)

Train epoch 4	 Loss 1.463368
Wall time: 1min 40s
Test set: Average loss: 0.7109, Accuracy: 806/1023 (78.8%)

Train epoch 5	 Loss 1.849077
Wall time: 1min 40s
Test set: Average loss: 0.7227, Accuracy: 803/1023 (78.5%)

Train epoch 6	 Loss 1.442590
Wall time: 1min 40s
Test set: Average loss: 0.7080, Accuracy: 796/1023 (77.8%)

Train epoch 7	 Loss 1.540823
Wall time: 1min 41s
Test set: Average loss: 0.6738, Accuracy: 822/1023 (80.4%)
```
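Once trained, the model's logits can be turned into a breed name via softmax, argmax, and the idx2breed dict from earlier. A self-contained sketch with a toy stand-in model (the Linear head and three-breed mapping below are illustrative, not the article's actual ResNet or its 120 classes):

```python
import torch
import torch.nn as nn

idx2breed = {0: 'pug', 1: 'beagle', 2: 'corgi'}  # toy mapping; the article's has 120 entries

model = nn.Linear(8, 3)  # stand-in for the fine-tuned model's classification head
model.eval()
x = torch.randn(1, 8)    # stand-in for a preprocessed image batch
with torch.no_grad():
    probs = torch.softmax(model(x), dim=1)  # probabilities sum to 1 per sample
pred_idx = probs.argmax(dim=1).item()
print(idx2breed[pred_idx])
```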