Comparing Image Read/Write Methods in Python

Date: 2020-11-15


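When training vision-related neural networks, you constantly need to read and write images. There are many ways to do this, such as matplotlib, cv2, and PIL. This post compares several of them in order to pick the fastest one and speed up training.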

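Experimental setup

Because the training framework is PyTorch, the read benchmark is defined as follows:

1. Read five 1920x1080 images (one PNG and four JPG) and store them in an array.

2. Convert the array to a PyTorch tensor with dimension order CxHxW and channels in RGB order, and place it in GPU memory (I train on a GPU).

3. Record the time each method takes for the steps above. A PNG file is roughly 10 times the size of a JPG of only slightly lower quality, so datasets are rarely stored as PNG, and the difference in read time between the two formats is not compared here.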
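The conversion in step 2 comes down to two array operations: reversing the channel axis (cv2 returns BGR) and moving the channel axis to the front. Here is a minimal sketch with plain numpy on a synthetic image, so it runs without image files or a GPU. The helper name `hwc_bgr_to_chw_rgb` is hypothetical and not part of any library.

```python
import numpy as np

def hwc_bgr_to_chw_rgb(img):
    """Convert an HxWx3 BGR uint8 image to a 3xHxW RGB float32 array in [0, 1]."""
    rgb = img[..., ::-1]            # reverse the last axis: BGR -> RGB
    chw = rgb.transpose(2, 0, 1)    # HxWxC -> CxHxW
    return np.ascontiguousarray(chw, dtype=np.float32) / 255.0

# Synthetic 2x2 image that is pure blue in BGR order (B=255, G=0, R=0).
bgr = np.zeros((2, 2, 3), dtype=np.uint8)
bgr[..., 0] = 255
out = hwc_bgr_to_chw_rgb(bgr)
print(out.shape)
print(out[0].max(), out[2].max())   # red channel, blue channel
```

The benchmarks below perform the same two steps on a whole batch with `[...,[2,1,0]]` and `permute([0,3,1,2])`.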

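The write benchmark is defined as follows:

1. Convert the PyTorch tensor holding the five 1920x1080 images into the data type each method accepts.

2. Save the five images in JPG format.

3. Record the time each method takes to save the images.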
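Every benchmark in this post wraps a block of code between two `time()` calls. One caveat worth keeping in mind is that CUDA operations are launched asynchronously, so a timer that stops before the GPU has finished can under-report. Below is a small sketch of a reusable timer. `timed` and its `sync` argument are hypothetical names, and passing `torch.cuda.synchronize` as `sync` is a suggestion, not something the measurements in this post did.

```python
from contextlib import contextmanager
from time import perf_counter

@contextmanager
def timed(label, results, sync=None):
    """Time the enclosed block and store the elapsed seconds in results[label].

    sync is an optional zero-argument callable run before each clock reading,
    for example torch.cuda.synchronize when the block launches GPU work.
    """
    if sync is not None:
        sync()
    start = perf_counter()
    try:
        yield
    finally:
        if sync is not None:
            sync()
        results[label] = perf_counter() - start

results = {}
with timed('sum', results):
    total = sum(range(100_000))
print(results)
```

Collecting the timings in a dictionary also makes it easy to repeat each measurement several times and average, since single runs of a few tenths of a second are noisy.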

Experiments

cv2

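Because a GPU is available, there are two ways to read images with cv2:

1. Read all images into one numpy array, then convert it to a PyTorch tensor stored on the GPU.

2. Initialize a PyTorch tensor on the GPU, then copy each image directly into it.

The code for the first approach: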

import os, torch
import cv2 as cv 
import numpy as np 
from time import time 
 
read_path = 'D:test'
write_path = 'D:test\\write\\'
 
# cv2 read, approach 1: fill a numpy array on the CPU, then move it to the GPU
start_t = time()
imgs = np.zeros([5, 1080, 1920, 3])
for img, i in zip(os.listdir(read_path), range(5)): 
  img = cv.imread(filename=os.path.join(read_path, img))  # HxWx3 array in BGR order
  imgs[i] = img   
# BGR -> RGB, NHWC -> NCHW, scale to [0, 1]
imgs = torch.tensor(imgs).to('cuda')[...,[2,1,0]].permute([0,3,1,2])/255 
print('cv2 read time 1:', time() - start_t) 
# cv2 save: back to NHWC, BGR and [0, 255] on the CPU
start_t = time()
imgs = (imgs.permute([0,2,3,1])[...,[2,1,0]]*255).cpu().numpy()
for i in range(imgs.shape[0]): 
  cv.imwrite(write_path + str(i) + '.jpg', imgs[i])
print('cv2 save time:', time() - start_t)

Results (seconds):

cv2 read time 1: 0.39693760871887207
cv2 save time: 0.3560612201690674

The code for the second approach:

import os, torch
import cv2 as cv 
import numpy as np 
from time import time 
 
read_path = 'D:test'
write_path = 'D:test\\write\\'
 
 
# cv2 read, approach 2: copy each image straight into a preallocated GPU tensor
start_t = time()
imgs = torch.zeros([5, 1080, 1920, 3], device='cuda')
for img, i in zip(os.listdir(read_path), range(5)): 
  img = torch.tensor(cv.imread(filename=os.path.join(read_path, img)), device='cuda')
  imgs[i] = img   
# BGR -> RGB, NHWC -> NCHW, scale to [0, 1]
imgs = imgs[...,[2,1,0]].permute([0,3,1,2])/255 
print('cv2 read time 2:', time() - start_t) 
# cv2 save: back to NHWC, BGR and [0, 255] on the CPU
start_t = time()
imgs = (imgs.permute([0,2,3,1])[...,[2,1,0]]*255).cpu().numpy()
for i in range(imgs.shape[0]): 
  cv.imwrite(write_path + str(i) + '.jpg', imgs[i])
print('cv2 save time:', time() - start_t)

Results (seconds):

cv2 read time 2: 0.23636841773986816
cv2 save time: 0.3066873550415039

matplotlib

The same two read approaches apply to matplotlib. The code for the first:

import os, torch 
import numpy as np
import matplotlib.pyplot as plt 
from time import time 
 
read_path = 'D:test'
write_path = 'D:test\\write\\'
 
# matplotlib read, approach 1: fill a numpy array on the CPU, then move it to the GPU
start_t = time()
imgs = np.zeros([5, 1080, 1920, 3])
for img, i in zip(os.listdir(read_path), range(5)): 
  img = plt.imread(os.path.join(read_path, img))  # already in RGB order
  imgs[i] = img    
# NHWC -> NCHW, scale to [0, 1]
imgs = torch.tensor(imgs).to('cuda').permute([0,3,1,2])/255  
print('matplotlib read time 1:', time() - start_t) 
# matplotlib save: back to NHWC on the CPU
start_t = time()
imgs = (imgs.permute([0,2,3,1])).cpu().numpy()
for i in range(imgs.shape[0]):  
  plt.imsave(write_path + str(i) + '.jpg', imgs[i])
print('matplotlib save time:', time() - start_t)

Results (seconds):

matplotlib read time 1: 0.45380306243896484
matplotlib save time: 0.768944263458252

The code for the second approach:

import os, torch 
import numpy as np
import matplotlib.pyplot as plt 
from time import time 
 
read_path = 'D:test'
write_path = 'D:test\\write\\'
 
# matplotlib read, approach 2: copy each image straight into a preallocated GPU tensor
start_t = time()
imgs = torch.zeros([5, 1080, 1920, 3], device='cuda')
for img, i in zip(os.listdir(read_path), range(5)): 
  img = torch.tensor(plt.imread(os.path.join(read_path, img)), device='cuda')
  imgs[i] = img    
# NHWC -> NCHW, scale to [0, 1]
imgs = imgs.permute([0,3,1,2])/255  
print('matplotlib read time 2:', time() - start_t) 
# matplotlib save: back to NHWC on the CPU
start_t = time()
imgs = (imgs.permute([0,2,3,1])).cpu().numpy()
for i in range(imgs.shape[0]):  
  plt.imsave(write_path + str(i) + '.jpg', imgs[i])
print('matplotlib save time:', time() - start_t)

Results (seconds):

matplotlib read time 2: 0.2044532299041748
matplotlib save time: 0.4737534523010254

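Note that matplotlib returns PNG images as floating-point arrays with values in $[0, 1]$, whereas JPG images come back as integers in $[0, 255]$. If the dataset mixes formats, convert the images to a single format before reading; otherwise preprocessing the dataset becomes a hassle.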
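An alternative to converting the files is to normalize by dtype right after reading. The sketch below assumes that integer arrays are 8-bit images in [0, 255] and that floating-point arrays are already in [0, 1]. `to_unit_float` is a hypothetical helper, and the synthetic arrays stand in for what `plt.imread` would return.

```python
import numpy as np

def to_unit_float(img):
    """Return img as float32 in [0, 1], whichever format it was read from."""
    if np.issubdtype(img.dtype, np.integer):
        return img.astype(np.float32) / 255.0   # JPG-style integer data
    return img.astype(np.float32)               # PNG-style float data

jpg_like = np.full((2, 2, 3), 255, dtype=np.uint8)    # stand-in for a JPG read
png_like = np.full((2, 2, 3), 1.0, dtype=np.float32)  # stand-in for a PNG read
a = to_unit_float(jpg_like)
b = to_unit_float(png_like)
print(a.dtype, a.max(), b.dtype, b.max())
```

With a helper like this applied per image, the unconditional `/255` in the read code above would have to be dropped, since the scaling has already happened.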

PIL

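PIL cannot read into or write from PyTorch tensors or numpy arrays directly; the data has to be converted to and from its Image type first. That is cumbersome, and I expect the extra conversion to make it slower as well, so I did not benchmark it.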

torchvision

torchvision can save images directly from PyTorch tensors. Combined with the fastest read method above (matplotlib, approach 2), the code is:

import os, torch  
import matplotlib.pyplot as plt 
from time import time 
from torchvision import utils 

read_path = 'D:test'
write_path = 'D:test\\write\\'
 
# matplotlib read, approach 2
start_t = time()
imgs = torch.zeros([5, 1080, 1920, 3], device='cuda')
for img, i in zip(os.listdir(read_path), range(5)): 
  img = torch.tensor(plt.imread(os.path.join(read_path, img)), device='cuda')
  imgs[i] = img    
imgs = imgs.permute([0,3,1,2])/255  
print('matplotlib read time 2:', time() - start_t) 
# torchvision save: takes CxHxW tensors directly
start_t = time() 
for i in range(imgs.shape[0]):   
  utils.save_image(imgs[i], write_path + str(i) + '.jpg')
print('torchvision save time:', time() - start_t)

Results (seconds):

matplotlib read time 2: 0.15358829498291016
torchvision save time: 0.14760661125183105

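In these measurements, these two are the fastest read and write methods. To keep image I/O from slowing down training as far as possible, both steps can also run in parallel with training. In addition, utils.save_image can tile several images into a single image before saving. Usage: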

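utils.save_image(tensor = imgs,     # batch of images to save, shape = [n, C, H, W]
                 fp = 'test.jpg',   # output path
                 nrow = 5,          # number of images per row in the tiled grid
                 padding = 1,       # gap in pixels between tiled images
                 normalize = True,  # rescale values to [0, 1]; needed when the network output comes from tanh
                 value_range = (-1,1))    # input range used for normalization (named `range` in older torchvision releases, as when this post was written)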
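As for running image writes alongside training, one option is to hand each save call to a background thread. The following is a sketch using only the standard library. `fake_save` is a placeholder that just writes bytes so the example runs without torch; in practice it would be replaced by a real writer such as `utils.save_image` called on a tensor that has already been copied to the CPU.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor

def fake_save(data, path):
    """Placeholder for a real image writer: dump raw bytes to path."""
    with open(path, 'wb') as f:
        f.write(data)
    return path

out_dir = tempfile.mkdtemp()
with ThreadPoolExecutor(max_workers=2) as pool:
    futures = []
    for step in range(3):
        # ... a training step would run here ...
        data = bytes([step]) * 16                    # stand-in for an image
        path = os.path.join(out_dir, f'{step}.bin')
        # submit() returns immediately; the write happens on a worker thread
        futures.append(pool.submit(fake_save, data, path))
    # Wait for all writes and re-raise any exception from a worker
    saved = [f.result() for f in futures]
print(sorted(os.listdir(out_dir)))
```

Whether this actually overlaps with training depends on the writer releasing the GIL while it encodes and writes the file; if it does not, a separate process is the usual alternative.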