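When solving any machine learning problem, a great deal of effort goes into preparing the data. PyTorch provides many tools that simplify data loading and help keep the code readable. In this tutorial, we will learn how to load, preprocess, and augment data from a non-trivial dataset.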
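Before starting this tutorial, make sure the Python packages used by the code below are installed: pandas for parsing the CSV file and scikit-image for image I/O and transforms, in addition to PyTorch, torchvision, NumPy, and matplotlib.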
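The dataset we are going to work with is a facial pose dataset, in which every face is annotated with landmark points.
In total, each face has 68 distinct landmark points.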
Download the dataset and extract it so that the images are in the directory 'data/faces/'.
The dataset comes with a CSV file of face annotations whose contents look like this:
    image_name,part_0_x,part_0_y,part_1_x,part_1_y,part_2_x, ... ,part_67_x,part_67_y
    0805personali01.jpg,27,83,27,98, ... 84,134
    1084239450_e76e00b7e7.jpg,70,236,71,257, ... ,128,312
Next, let's quickly read the CSV file and get the annotations as an (N, 2) array, where N is the number of landmarks.
    import pandas as pd

    landmarks_frame = pd.read_csv('data/faces/face_landmarks.csv')

    n = 65
    img_name = landmarks_frame.iloc[n, 0]
    # DataFrame.as_matrix() was removed in pandas 1.0; to_numpy() replaces it.
    landmarks = landmarks_frame.iloc[n, 1:].to_numpy()
    landmarks = landmarks.astype('float').reshape(-1, 2)

    print('Image name: {}'.format(img_name))
    print('Landmarks shape: {}'.format(landmarks.shape))
    print('First 4 Landmarks: {}'.format(landmarks[:4]))
Output:
    Image name: person-7.jpg
    Landmarks shape: (68, 2)
    First 4 Landmarks: [[32. 65.]
     [33. 76.]
     [34. 86.]
     [34. 97.]]
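To make clearer what reshape(-1, 2) does to a row of the CSV, here is a small sketch of the same grouping using only the standard library. It is an illustration rather than part of the tutorial's pipeline: the file name and coordinates are made up, and the hypothetical row has only two landmarks instead of 68.

```python
import csv

# Hypothetical two-landmark file; the real CSV has 68 landmarks per row.
csv_text = (
    "image_name,part_0_x,part_0_y,part_1_x,part_1_y\n"
    "example.jpg,10,20,30,40\n"
)


def row_to_landmarks(row):
    """Turn [name, x0, y0, x1, y1, ...] into (name, [(x0, y0), (x1, y1), ...])."""
    name, flat = row[0], [float(v) for v in row[1:]]
    # Pair consecutive values, the same grouping that reshape(-1, 2) performs.
    pairs = list(zip(flat[0::2], flat[1::2]))
    return name, pairs


reader = csv.reader(csv_text.splitlines())
header = next(reader)  # skip the column names
name, landmarks = row_to_landmarks(next(reader))
print(name, landmarks)
```

Each pair is one (x, y) landmark, which is why the real array has shape (68, 2).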
Now let's write a simple helper function that shows an image together with its landmarks, and use it to show a sample.
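    import os

    import matplotlib.pyplot as plt
    from skimage import io

    def show_landmarks(image, landmarks):
        '''Show an image with its landmark points.'''
        plt.imshow(image)
        plt.scatter(landmarks[:, 0], landmarks[:, 1], s=10, marker='.', c='r')
        plt.pause(0.001)  # pause briefly so that the plot is updated

    plt.figure()
    show_landmarks(io.imread(os.path.join('data/faces', img_name)), landmarks)
    plt.show()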
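torch.utils.data.Dataset is an abstract class representing a dataset. Your custom dataset should inherit from Dataset and override two methods: __len__, so that len(dataset) returns the size of the dataset, and __getitem__, so that dataset[i] returns the i-th sample.
Now let's implement our face landmarks dataset class. We read the CSV in __init__ but leave reading the images to __getitem__. This is memory efficient because the images are not all held in memory at once; they are read as needed.
A sample of our dataset is a dict: {'image': image, 'landmarks': landmarks}. The dataset takes an optional argument transform so that any required processing can be applied to the sample. We will see the use of transform in the next section.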
    import os

    import numpy as np
    import pandas as pd
    from skimage import io
    from torch.utils.data import Dataset

    class FaceLandmarksDataset(Dataset):
        '''Face Landmarks dataset.'''

        def __init__(self, csv_file, root_dir, transform=None):
            '''
            param csv_file (string): path to the CSV file with annotations
            param root_dir (string): directory containing the images
            param transform (callable, optional): optional transform applied to a sample
            '''
            self.landmarks_frame = pd.read_csv(csv_file)
            self.root_dir = root_dir
            self.transform = transform

        def __len__(self):
            return len(self.landmarks_frame)

        def __getitem__(self, idx):
            # The image is read here, on access, rather than in __init__.
            img_name = os.path.join(self.root_dir, self.landmarks_frame.iloc[idx, 0])
            image = io.imread(img_name)
            landmarks = self.landmarks_frame.iloc[idx, 1:]
            landmarks = np.array([landmarks])
            landmarks = landmarks.astype('float').reshape(-1, 2)
            sample = {'image': image, 'landmarks': landmarks}

            if self.transform:
                sample = self.transform(sample)

            return sample
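The two-method protocol and the idea of producing samples only on access can be sketched without PyTorch or any image files. The class below, LazySquares, is a hypothetical stand-in and not part of torch.utils.data; its loads counter plays the role of "images read from disk".

```python
class LazySquares:
    """Hypothetical stand-in for a Dataset: items are computed on access."""

    def __init__(self, size, transform=None):
        self.size = size
        self.transform = transform
        self.loads = 0  # counts how many samples were actually produced

    def __len__(self):
        return self.size

    def __getitem__(self, idx):
        if not 0 <= idx < self.size:
            raise IndexError(idx)
        self.loads += 1  # stands in for reading one image from disk
        sample = {'index': idx, 'value': idx * idx}
        if self.transform:
            sample = self.transform(sample)
        return sample


dataset = LazySquares(1000)
print(len(dataset))   # len() calls __len__
print(dataset[3])     # indexing calls __getitem__
print(dataset.loads)  # only the sample that was requested has been produced
```

Creating the object does no per-sample work; the cost is paid only for the indices that are actually requested, which is the same reason the face dataset defers reading images to __getitem__.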
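Now let's instantiate FaceLandmarksDataset and iterate over the data samples. We print the sizes of the first 4 samples and show their landmarks.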
    face_dataset = FaceLandmarksDataset(
        csv_file='data/faces/face_landmarks.csv',
        root_dir='data/faces/')

    fig = plt.figure()

    for i in range(len(face_dataset)):
        sample = face_dataset[i]

        print(i, sample['image'].shape, sample['landmarks'].shape)

        ax = plt.subplot(1, 4, i + 1)
        plt.tight_layout()
        ax.set_title('Sample #{}'.format(i))
        ax.axis('off')
        show_landmarks(**sample)

        if i == 3:
            plt.show()
            break
Output:
    0 (324, 215, 3) (68, 2)
    1 (500, 333, 3) (68, 2)
    2 (250, 258, 3) (68, 2)
    3 (434, 290, 3) (68, 2)
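The output above shows that the samples are not all the same size. Most neural networks expect images of a fixed size, so we need some preprocessing code. Let's create three transforms:
Rescale: to scale the image
RandomCrop: to crop from the image randomly; this is data augmentation
ToTensor: to convert numpy images to torch images (we need to swap axes)
We will write them as callable classes instead of simple functions, so that the parameters of a transform need not be passed every time it is called. For this, we only need to implement the __call__ method and, if required, the __init__ method. We can then use a transform like this: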
    tsfm = Transform(params)
    transformed_sample = tsfm(sample)
Observe below how these transforms have to be applied to both the image and the landmarks.
    import numpy as np
    import torch
    from skimage import transform

    class Rescale(object):
        '''
        Rescale the image in a sample to a given size.

        param output_size (tuple or int): desired output size. If tuple, the output
            is matched to output_size. If int, the smaller image edge is matched to
            output_size, keeping the aspect ratio the same.
        '''

        def __init__(self, output_size):
            assert isinstance(output_size, (int, tuple))
            self.output_size = output_size

        def __call__(self, sample):
            image, landmarks = sample['image'], sample['landmarks']

            h, w = image.shape[:2]
            if isinstance(self.output_size, int):
                if h > w:
                    new_h, new_w = self.output_size * h / w, self.output_size
                else:
                    new_h, new_w = self.output_size, self.output_size * w / h
            else:
                new_h, new_w = self.output_size

            new_h, new_w = int(new_h), int(new_w)

            img = transform.resize(image, (new_h, new_w))

            # Landmarks are (x, y): x scales with the width, y with the height.
            landmarks = landmarks * [new_w / w, new_h / h]

            return {'image': img, 'landmarks': landmarks}

    class RandomCrop(object):
        '''
        Crop the image in a sample randomly.

        param output_size (tuple or int): desired output size. If int, a square
            crop is made.
        '''

        def __init__(self, output_size):
            assert isinstance(output_size, (int, tuple))
            if isinstance(output_size, int):
                self.output_size = (output_size, output_size)
            else:
                assert len(output_size) == 2
                self.output_size = output_size

        def __call__(self, sample):
            image, landmarks = sample['image'], sample['landmarks']

            h, w = image.shape[:2]
            new_h, new_w = self.output_size

            # randint excludes the upper bound; + 1 keeps the call valid
            # when the crop is exactly as large as the image.
            top = np.random.randint(0, h - new_h + 1)
            left = np.random.randint(0, w - new_w + 1)

            image = image[top:top + new_h, left:left + new_w]

            landmarks = landmarks - [left, top]

            return {'image': image, 'landmarks': landmarks}

    class ToTensor(object):
        '''Convert the ndarrays in a sample to Tensors.'''

        def __call__(self, sample):
            image, landmarks = sample['image'], sample['landmarks']

            # Swap the color axis:
            # numpy image: H x W x C
            # torch image: C x H x W
            image = image.transpose((2, 0, 1))
            return {'image': torch.from_numpy(image),
                    'landmarks': torch.from_numpy(landmarks)}
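The arithmetic inside Rescale is easy to check by hand on plain numbers. The sketch below repeats the integer-size branch with two hypothetical helper functions (rescale_size and rescale_landmarks are names introduced here, not part of the tutorial's classes) and a made-up 500 x 250 image.

```python
def rescale_size(h, w, output_size):
    """Return (new_h, new_w) so that the shorter edge equals output_size."""
    if h > w:
        new_h, new_w = output_size * h / w, output_size
    else:
        new_h, new_w = output_size, output_size * w / h
    return int(new_h), int(new_w)


def rescale_landmarks(landmarks, h, w, new_h, new_w):
    """Scale (x, y) points: x follows the width, y follows the height."""
    return [(x * new_w / w, y * new_h / h) for x, y in landmarks]


h, w = 500, 250  # made-up image size, taller than wide
new_h, new_w = rescale_size(h, w, 100)
points = rescale_landmarks([(125, 250), (50, 400)], h, w, new_h, new_w)
print((new_h, new_w), points)
```

The shorter edge (the width) becomes 100 and the height becomes 200, so a point at the image centre, (125, 250), stays at the centre of the resized image, (50, 100). Scaling the landmarks by the same factors as the image is what keeps the annotations aligned.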
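Now, let's apply the transforms to a sample.
Say we want to rescale the shorter side of the image to 256 and then randomly crop a square of size 224 from it. In other words, we want to compose the Rescale and RandomCrop transforms. torchvision.transforms.Compose is a simple callable class that lets us do this.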
    from torchvision import transforms

    scale = Rescale(256)
    crop = RandomCrop(128)
    composed = transforms.Compose([Rescale(256), RandomCrop(224)])

    # Apply each of the above transforms to the same sample.
    fig = plt.figure()
    sample = face_dataset[65]
    for i, tsfrm in enumerate([scale, crop, composed]):
        transformed_sample = tsfrm(sample)

        ax = plt.subplot(1, 3, i + 1)
        plt.tight_layout()
        ax.set_title(type(tsfrm).__name__)
        show_landmarks(**transformed_sample)

    plt.show()
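Conceptually, composing transforms just means calling them one after another on the same sample. The sketch below shows that idea with a hypothetical SimpleCompose class and two toy transforms. It is not torchvision's implementation, only a minimal model of the behaviour we rely on above.

```python
class SimpleCompose:
    """Hypothetical minimal composition: call each transform in order."""

    def __init__(self, transforms):
        self.transforms = transforms

    def __call__(self, sample):
        for t in self.transforms:
            sample = t(sample)
        return sample


class AddOffset:
    """Toy transform with a parameter fixed at construction time."""

    def __init__(self, offset):
        self.offset = offset

    def __call__(self, sample):
        return {'value': sample['value'] + self.offset}


class Double:
    """Toy transform without parameters."""

    def __call__(self, sample):
        return {'value': sample['value'] * 2}


pipeline = SimpleCompose([AddOffset(3), Double()])
result = pipeline({'value': 4})
print(result)
```

The order of the list matters: adding 3 and then doubling gives a different result from doubling and then adding 3, just as rescaling before cropping differs from cropping before rescaling.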
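Let's put all of this together to create a dataset with composed transforms. To summarize, every time this dataset is sampled, an image is read from file on the fly, the transforms are applied to the image that was read, and, because one of the transforms is random, the data is augmented on sampling.
We can iterate over the created dataset with a for i in range loop as before: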
    transformed_dataset = FaceLandmarksDataset(
        csv_file='data/faces/face_landmarks.csv',
        root_dir='data/faces/',
        transform=transforms.Compose([Rescale(256), RandomCrop(224), ToTensor()])
    )

    for i in range(len(transformed_dataset)):
        sample = transformed_dataset[i]

        print(i, sample['image'].size(), sample['landmarks'].size())

        if i == 3:
            break
Output:
    0 torch.Size([3, 224, 224]) torch.Size([68, 2])
    1 torch.Size([3, 224, 224]) torch.Size([68, 2])
    2 torch.Size([3, 224, 224]) torch.Size([68, 2])
    3 torch.Size([3, 224, 224]) torch.Size([68, 2])
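However, by iterating over the data with a simple for loop we lose a lot of features. In particular, we miss out on batching the data, shuffling the data, and loading the data in parallel with multiprocessing workers.
torch.utils.data.DataLoader is an iterator that provides all of these features. The parameters used below should be clear. One parameter of interest is collate_fn: you can use it to specify exactly how samples are combined into a batch. However, the default collate should work fine for most use cases.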
    from torch.utils.data import DataLoader
    from torchvision import utils

    dataloader = DataLoader(transformed_dataset, batch_size=4,
                            shuffle=True, num_workers=4)

    def show_landmarks_batch(sample_batched):
        '''Show images with landmarks for a batch of samples.'''
        images_batch, landmarks_batch = \
            sample_batched['image'], sample_batched['landmarks']
        # len(sample_batched) would count the dict keys, not the samples.
        batch_size = len(images_batch)
        im_size = images_batch.size(2)
        grid_border_size = 2

        grid = utils.make_grid(images_batch)
        plt.imshow(grid.numpy().transpose((1, 2, 0)))

        for i in range(batch_size):
            # Shift each sample's x coordinates to its position in the grid.
            plt.scatter(
                landmarks_batch[i, :, 0].numpy() + i * im_size + (i + 1) * grid_border_size,
                landmarks_batch[i, :, 1].numpy() + grid_border_size,
                s=10, marker='.', c='r'
            )
        plt.title('Batch from dataloader')

    for i_batch, sample_batched in enumerate(dataloader):
        print(i_batch, sample_batched['image'].size(),
              sample_batched['landmarks'].size())

        # Show the 4th batch and stop.
        if i_batch == 3:
            plt.figure()
            show_landmarks_batch(sample_batched)
            plt.axis('off')
            plt.ioff()
            plt.show()
            break
Output:
    0 torch.Size([4, 3, 224, 224]) torch.Size([4, 68, 2])
    1 torch.Size([4, 3, 224, 224]) torch.Size([4, 68, 2])
    2 torch.Size([4, 3, 224, 224]) torch.Size([4, 68, 2])
    3 torch.Size([4, 3, 224, 224]) torch.Size([4, 68, 2])
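To see what batching, shuffling, and a collate function each contribute, here is a single-process sketch in plain Python. The names simple_collate and iterate_batches are hypothetical and are not DataLoader's API. Unlike the default collate, which stacks tensors into one batched tensor, this toy version only gathers values into lists.

```python
import random


def simple_collate(samples):
    """Merge a list of dict samples into one dict of lists, key by key."""
    return {key: [s[key] for s in samples] for key in samples[0]}


def iterate_batches(dataset, batch_size, shuffle=False,
                    collate_fn=simple_collate, seed=None):
    """Yield collated batches; a single-process sketch, not DataLoader itself."""
    indices = list(range(len(dataset)))
    if shuffle:
        random.Random(seed).shuffle(indices)
    for start in range(0, len(indices), batch_size):
        chunk = indices[start:start + batch_size]
        yield collate_fn([dataset[i] for i in chunk])


# A plain list works as a dataset here because it supports len() and indexing.
toy_dataset = [{'image': 'img{}'.format(i), 'label': i} for i in range(5)]
batches = list(iterate_batches(toy_dataset, batch_size=2))
for batch in batches:
    print(batch)
```

With five samples and a batch size of 2, the last batch holds a single sample. Passing a different collate_fn changes only how the list of samples is merged, which is the role the collate_fn parameter plays in DataLoader.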
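In this tutorial, we have seen how to write and use datasets, transforms, and a dataloader. The torchvision package provides some common datasets and transforms, so you might not even have to write custom classes. One of the more generic datasets available in torchvision is ImageFolder. It assumes that images are organized in the following way: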
    root/ants/xxx.png
    root/ants/xxy.jpeg
    root/ants/xxz.png
    .
    .
    .
    root/bees/123.jpg
    root/bees/nsdf3.png
    root/bees/asd932_.png
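Here 'ants', 'bees', and so on are the class labels. Similarly generic transforms that operate on PIL.Image, such as RandomHorizontalFlip and Resize (called Scale in older torchvision releases), are also available. You can use these to write a dataloader like the one below: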
    import torch
    from torchvision import transforms, datasets

    data_transform = transforms.Compose([
        # RandomSizedCrop was renamed RandomResizedCrop in later torchvision releases.
        transforms.RandomResizedCrop(224),
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406],
                             std=[0.229, 0.224, 0.225])
    ])
    hymenoptera_dataset = datasets.ImageFolder(root='hymenoptera_data/train',
                                               transform=data_transform)
    dataset_loader = torch.utils.data.DataLoader(hymenoptera_dataset,
                                                 batch_size=4, shuffle=True,
                                                 num_workers=4)
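The folder-per-class convention can be illustrated on a plain list of path strings. The function index_by_folder below is a hypothetical sketch: it does not scan a disk or load images the way ImageFolder does, and it assumes that integer labels are assigned in sorted order of the folder names.

```python
from pathlib import PurePosixPath

paths = [
    'root/bees/123.jpg',
    'root/ants/xxx.png',
    'root/ants/xxy.jpeg',
    'root/bees/nsdf3.png',
]


def index_by_folder(paths):
    """Map each file to an integer label derived from its parent folder name."""
    classes = sorted({PurePosixPath(p).parent.name for p in paths})
    class_to_idx = {name: i for i, name in enumerate(classes)}
    samples = [(p, class_to_idx[PurePosixPath(p).parent.name]) for p in paths]
    return class_to_idx, samples


class_to_idx, samples = index_by_folder(paths)
print(class_to_idx)
print(samples[0])
```

Because the label comes from the directory name alone, adding a new class only requires adding a new folder of images under the root.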