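When PyTorch is used for typical deep-learning image tasks, images are first loaded through a Dataset class and a DataLoader class, and a transform is applied as they are read. The transform pipeline usually includes ToTensor(), which converts the PIL or OpenCV image data read inside the dataset's __getitem__() method into a torch.FloatTensor. The details are as follows: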
    from PIL import Image
    import numpy as np

    image = Image.open('test.jpg')  # the image is 400x300 (width x height)
    print(type(image))              # out: <class 'PIL.JpegImagePlugin.JpegImageFile'>
    print(image.size)               # out: (400, 300)
    print(image.mode)               # out: RGB
    print(image.getpixel((0, 0)))   # out: (143, 198, 201)

    # resize takes (w, h)
    image = image.resize((200, 100), Image.NEAREST)
    print(image.size)               # out: (200, 100)

    '''
    Notes on the code
    **image is a :py:class:`~PIL.Image.Image` object.** It has many attributes:
    its size is (w, h) and its mode is RGB. It also has many methods. For example,
    getpixel((x, y)) returns the values of the three channels at one position,
    where x can be at most w-1 and y at most h-1.
    The resize method rescales the image. Its parameters are:

    resize(self, size, resample=0) method of PIL.Image.Image instance
        Returns a resized copy of this image.

        :param size: The requested size in pixels, as a 2-tuple:
           (width, height).
        Note that size is (w, h), the same order as image.size.
        :param resample: An optional resampling filter.  This can be
           one of :py:attr:`PIL.Image.NEAREST`, :py:attr:`PIL.Image.BOX`,
           :py:attr:`PIL.Image.BILINEAR`, :py:attr:`PIL.Image.HAMMING`,
           :py:attr:`PIL.Image.BICUBIC` or :py:attr:`PIL.Image.LANCZOS`.
           If omitted, or if the image has mode "1" or "P", it is
           set :py:attr:`PIL.Image.NEAREST`.
           See: :ref:`concept-filters`.
        Note the interpolation choices: in the version quoted here the default is
        NEAREST (nearest neighbour, common for segmentation labels), while
        classification usually uses BILINEAR or BICUBIC. The default may differ
        in other Pillow versions, so check the documentation for yours.
        :returns: An :py:class:`~PIL.Image.Image` object.
    '''

    image = np.array(image, dtype=np.float32)  # np.array(image) alone gives uint8
    print(image.shape)  # out: (100, 200, 3)
    # Note that w and h have swapped places: the array is (h, w, c).
    # An ndarray is indexed as row x column x channel, so the number of rows
    # is the height and the number of columns is the width.
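The (w, h) versus (h, w, c) swap can be checked without an image file. The sketch below uses a synthetic NumPy array as a stand-in for `np.array(image)`; the array contents and the variable names are illustrative assumptions, not part of the original post.

```python
import numpy as np

# Stand-in for np.array(pil_image) of an image with width 200 and height 100.
w, h, c = 200, 100, 3
arr = np.zeros((h, w, c), dtype=np.uint8)  # rows (height) first, then columns (width)

# Mark the pixel at x=150 (column), y=20 (row).
x, y = 150, 20
arr[y, x] = (143, 198, 201)

# PIL's image.getpixel((x, y)) addresses the same pixel as arr[y, x].
pixel = tuple(int(v) for v in arr[y, x])

# image.size reports (w, h), which is the reverse of arr.shape[:2].
pil_style_size = (arr.shape[1], arr.shape[0])

print(arr.shape)       # (100, 200, 3)
print(pil_style_size)  # (200, 100)
print(pixel)           # (143, 198, 201)
```

In short, anything that takes a size in PIL wants (w, h), while anything that indexes the array wants [row, column], that is [y, x].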
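    import cv2
    import numpy as np

    image = cv2.imread('test.jpg')
    print(type(image))  # out: <class 'numpy.ndarray'>
    print(image.dtype)  # out: uint8
    print(image.shape)  # out: (300, 400, 3), i.e. (h, w, c), similar to skimage
    print(image)        # channels are in BGR order
    '''
    Layout of the printed array (each innermost triple is one pixel in B, G, R order,
    so the pixel PIL reported as (143, 198, 201) is expected to appear reversed):
    array([
            [ [201, 198, 143], [201, 198, 143], ... (w=400 pixels per row) ],
            [ [201, 198, 143], [201, 198, 143], ... ],
            ... (h=300 rows)
          ], dtype=uint8)
    '''

    # dsize is (w, h)
    image = cv2.resize(image, (100, 200), interpolation=cv2.INTER_LINEAR)
    print(image.dtype)  # out: uint8
    print(image.shape)  # out: (200, 100, 3)

    '''
    Important: this differs from skimage.
    resize(src, dsize[, dst[, fx[, fy[, interpolation]]]])
    The keyword arguments are dst, fx, fy and interpolation.
    dst is the resized output image.
    dsize is (w, h), even though the image array itself is (h, w, c).
    fx and fy are the scale factors along the x and y directions.
    interpolation selects the interpolation method. Commonly used values:
    cv2.INTER_AREA:    resampling using pixel area relation. When shrinking an image
                       it avoids moire patterns; when enlarging it behaves like
                       nearest-neighbour interpolation.
    cv2.INTER_CUBIC:   bicubic interpolation
    cv2.INTER_LINEAR:  bilinear interpolation
    cv2.INTER_NEAREST: nearest-neighbour interpolation (CV_INTER_NN in the old C API)
    [See this blog post for details](http://www.tuicool.com/articles/rq6fIn)
    '''

    '''
    cv2.imread(filename, flags=None):
    flags:
    cv2.IMREAD_COLOR      1: Loads a color image. Any transparency of image will be
                             neglected. It is the default flag. A normal 3-channel image.
    cv2.IMREAD_GRAYSCALE  0: Loads image in grayscale mode. A single-channel image.
    cv2.IMREAD_UNCHANGED -1: Loads image as such including alpha channel. A 4-channel
                             image if the file has an alpha channel.
    Note: the default is cv2.IMREAD_COLOR, so cv2.imread('gray.png') on a grayscale
    file still returns a 3-channel image whose three channels hold the same values.
    '''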
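Note also that a PIL image converted to numpy.ndarray has layout (h, w, c) with channels in RGB order;
the array returned by OpenCV's cv2.imread() is a numpy.ndarray with layout (h, w, c) and channels in BGR order.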
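Because the two libraries share the (h, w, c) layout and differ only in channel order, an OpenCV array can be brought into PIL's order by reversing the last axis. The following is a minimal sketch on a synthetic array (the pixel values are reused from the earlier output purely for illustration); with OpenCV available, `cv2.cvtColor(image, cv2.COLOR_BGR2RGB)` does the same conversion.

```python
import numpy as np

# Synthetic 2x2 "OpenCV-style" image: shape (h, w, c), channels in B, G, R order.
bgr = np.zeros((2, 2, 3), dtype=np.uint8)
bgr[0, 0] = (201, 198, 143)  # B=201, G=198, R=143

# Reversing the last axis swaps B and R, giving the R, G, B order PIL uses.
rgb = bgr[..., ::-1]

# The reversed view has a negative stride; copy it into contiguous memory so
# that code expecting a plain array (for example torch.from_numpy) accepts it.
rgb = np.ascontiguousarray(rgb)

print(rgb[0, 0].tolist())  # [143, 198, 201]
print(rgb.shape)           # (2, 2, 3)
```

Doing this swap once, right after loading, keeps the rest of the pipeline independent of which library read the file.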
torchvision.transforms.transforms.py:61
    class ToTensor(object):
        """Convert a ``PIL Image`` or ``numpy.ndarray`` to tensor.

        Converts a PIL Image or numpy.ndarray (H x W x C) in the range
        [0, 255] to a torch.FloatTensor of shape (C x H x W) in the range [0.0, 1.0].
        """

        def __call__(self, pic):
            """
            Args:
                pic (PIL Image or numpy.ndarray): Image to be converted to tensor.

            Returns:
                Tensor: Converted image.
            """
            return F.to_tensor(pic)

        def __repr__(self):
            return self.__class__.__name__ + '()'
torchvision.transforms.functional.py:32
    def to_tensor(pic):
        """Convert a ``PIL Image`` or ``numpy.ndarray`` to tensor.

        See ``ToTensor`` for more details.

        Args:
            pic (PIL Image or numpy.ndarray): Image to be converted to tensor.

        Returns:
            Tensor: Converted image.
        """
        if not(_is_pil_image(pic) or _is_numpy_image(pic)):
            raise TypeError('pic should be PIL Image or ndarray. Got {}'.format(type(pic)))

        if isinstance(pic, np.ndarray):
            # handle numpy array
            img = torch.from_numpy(pic.transpose((2, 0, 1)))
            # backward compatibility
            if isinstance(img, torch.ByteTensor):
                return img.float().div(255)
            else:
                return img

        if accimage is not None and isinstance(pic, accimage.Image):
            nppic = np.zeros([pic.channels, pic.height, pic.width], dtype=np.float32)
            pic.copyto(nppic)
            return torch.from_numpy(nppic)

        # handle PIL Image
        if pic.mode == 'I':
            img = torch.from_numpy(np.array(pic, np.int32, copy=False))
        elif pic.mode == 'I;16':
            img = torch.from_numpy(np.array(pic, np.int16, copy=False))
        elif pic.mode == 'F':
            img = torch.from_numpy(np.array(pic, np.float32, copy=False))
        elif pic.mode == '1':
            img = 255 * torch.from_numpy(np.array(pic, np.uint8, copy=False))
        else:
            img = torch.ByteTensor(torch.ByteStorage.from_buffer(pic.tobytes()))
        # PIL image mode: L, P, I, F, RGB, YCbCr, RGBA, CMYK
        if pic.mode == 'YCbCr':
            nchannel = 3
        elif pic.mode == 'I;16':
            nchannel = 1
        else:
            nchannel = len(pic.mode)
        img = img.view(pic.size[1], pic.size[0], nchannel)
        # put it from HWC to CHW format
        # yikes, this transpose takes 80% of the loading time/CPU
        img = img.transpose(0, 1).transpose(0, 2).contiguous()
        if isinstance(img, torch.ByteTensor):
            return img.float().div(255)
        else:
            return img
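As the to_tensor() function shows, it accepts a PIL Image or a numpy.ndarray and first transposes it from HWC to CHW. If the resulting tensor is a torch.ByteTensor (that is, the input was 8-bit data), it is then converted to float and every value is divided by 255. Input of any other dtype, such as an ndarray created with dtype=np.float32, is returned after the transpose without being scaled.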
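The dtype dependence of the ndarray branch quoted above is easy to overlook, so here is a rough NumPy-only imitation of it. The helper name `to_chw_float` and the test arrays are hypothetical; this is a sketch of the logic under those assumptions, not torchvision's implementation, and it returns arrays rather than tensors.

```python
import numpy as np

def to_chw_float(pic):
    """Hypothetical NumPy-only imitation of the ndarray branch of to_tensor()."""
    chw = pic.transpose((2, 0, 1))  # HWC -> CHW
    if chw.dtype == np.uint8:
        # Only 8-bit input is converted to float and rescaled to [0.0, 1.0].
        return chw.astype(np.float32) / 255
    # Other dtypes pass through unchanged, as in the quoted source.
    return chw

# Two arrays holding the same pixel values but with different dtypes.
u8 = np.full((100, 200, 3), 255, dtype=np.uint8)
f32 = np.full((100, 200, 3), 255, dtype=np.float32)

out_u8 = to_chw_float(u8)
out_f32 = to_chw_float(f32)

print(out_u8.shape, out_u8.dtype, float(out_u8.max()))     # (3, 100, 200) float32 1.0
print(out_f32.shape, out_f32.dtype, float(out_f32.max()))  # (3, 100, 200) float32 255.0
```

The practical consequence is that an image converted with `np.array(image, dtype=np.float32)` before ToTensor() keeps its [0, 255] range, so it would need to be divided by 255 explicitly if later steps expect [0.0, 1.0].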