Understanding ToTensor after loading images in PyTorch (with a comparison of reading images via PIL and OpenCV)

Date: 2019-11-30

Overview

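In typical deep-learning image tasks, PyTorch reads images through the Dataset and DataLoader classes and applies a transform as each image is read. That transform usually includes ToTensor(), which converts the PIL or OpenCV image loaded in the Dataset's __getitem__() method into a torch.FloatTensor. The sections below walk through the details.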

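PIL and OpenCV data formats

PIL (RGB)
PIL (Python Imaging Library, maintained today as the Pillow fork) is the most basic image-processing library in Python. Typical usage: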

from PIL import Image
import numpy as np
image = Image.open('test.jpg')  # the image is 400x300 (width x height)
print(type(image))  # out: <class 'PIL.JpegImagePlugin.JpegImageFile'>
print(image.size)  # out: (400, 300)
print(image.mode)  # out: RGB
print(image.getpixel((0, 0)))  # out: (143, 198, 201)
# resize takes (w, h)
image = image.resize((200, 100), Image.NEAREST)
print(image.size)  # out: (200, 100)
'''
Code notes
Note that image is a :py:class:`~PIL.Image.Image` object. It has many attributes: its size is (w, h) and its mode is RGB. It also has many methods, such as getpixel((x, y)), which returns the three channel values at one position; x can be at most w-1 and y at most h-1.
The resize method rescales the image. Its parameters are:
resize(self, size, resample=0) method of PIL.Image.Image instance
    Returns a resized copy of this image.

    :param size: The requested size in pixels, as a 2-tuple:
       (width, height).
    Note that size is (w, h), the same order as image.size.
    :param resample: An optional resampling filter.  This can be
       one of :py:attr:`PIL.Image.NEAREST`, :py:attr:`PIL.Image.BOX`,
       :py:attr:`PIL.Image.BILINEAR`, :py:attr:`PIL.Image.HAMMING`,
       :py:attr:`PIL.Image.BICUBIC` or :py:attr:`PIL.Image.LANCZOS`.
       If omitted, or if the image has mode "1" or "P", it is
       set :py:attr:`PIL.Image.NEAREST`.
       See: :ref:`concept-filters`.
    Note on the filters: NEAREST (nearest neighbor) is the default in the docstring quoted here and is commonly used for segmentation; BILINEAR and BICUBIC are commonly used for classification. Pillow 7.0 and later default to BICUBIC instead.
    :returns: An :py:class:`~PIL.Image.Image` object.

'''
image = np.array(image, dtype=np.float32)  # np.array(image) alone gives uint8
print(image.shape)  # out: (100, 200, 3)
# Note the swap: PIL reports (w, h), but the array is (h, w, c)
# An ndarray is indexed as row x column x channel, so rows are the height and columns are the width
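The (w, h) versus (h, w, c) swap is easy to check without an image file. The sketch below builds a synthetic image with `Image.new` (an assumption made for portability; the original reads `test.jpg`) and marks one pixel so the two indexing orders can be compared.

```python
import numpy as np
from PIL import Image

# Synthetic 400x300 (w x h) image filled with the pixel value from the example above
image = Image.new("RGB", (400, 300), color=(143, 198, 201))
resized = image.resize((200, 100), Image.NEAREST)

# Mark the pixel at x=5, y=2 in PIL's (x, y) coordinates
resized.putpixel((5, 2), (255, 0, 0))

array = np.array(resized)  # uint8 by default

print(image.size)    # (400, 300)
print(resized.size)  # (200, 100)
print(array.shape)   # (100, 200, 3)

# PIL indexes (x, y); the ndarray indexes [row, col] = [y, x]
print(resized.getpixel((5, 2)), array[2, 5])
```

The same pixel is `getpixel((5, 2))` in PIL and `array[2, 5]` in NumPy, which is the ordering difference the comments above describe.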

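OpenCV (Python bindings) (BGR)
OpenCV is a powerful image-processing library with a broader scope. It is used in a wide range of settings, performs well, and has plenty of example code available. Typical usage: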

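import cv2
import numpy as np
image = cv2.imread('test.jpg')
print(type(image))  # out: <class 'numpy.ndarray'>
print(image.dtype)  # out: uint8
print(image.shape)  # out: (300, 400, 3), i.e. (h, w, c), similar to skimage
print(image)  # channels are in BGR order
'''
Illustrative output (each pixel is [B, G, R], the reverse of the PIL tuple above):
array([
        [ [201, 198, 143 (dim=3)],[201, 198, 143],... (w=400)],
        [ [201, 198, 143],[201, 198, 143],... ],
        ...(h=300)
      ], dtype=uint8)

'''
# dsize is (w, h)
image = cv2.resize(image, (100, 200), interpolation=cv2.INTER_LINEAR)
print(image.dtype)  # out: uint8
print(image.shape)  # out: (200, 100, 3)
'''
Careful: this differs from skimage.
resize(src, dsize[, dst[, fx[, fy[, interpolation]]]])
The optional arguments are dst, fx, fy and interpolation.
dst is the resized output image.
dsize is (w, h), whereas image.shape is (h, w, c).
fx and fy are the scale factors along the x and y directions.
interpolation selects the interpolation method. Common choices:
cv2.INTER_AREA: resampling using pixel area relation. It avoids moire patterns when shrinking an image; when enlarging, it behaves like cv2.INTER_NEAREST.
cv2.INTER_CUBIC: bicubic interpolation
cv2.INTER_LINEAR: bilinear interpolation (the default)
cv2.INTER_NEAREST: nearest-neighbor interpolation
[See this blog post for details](http://www.tuicool.com/articles/rq6fIn)
'''
'''
cv2.imread(filename, flags=None):
flags:
cv2.IMREAD_COLOR 1: Loads a color image. Any transparency of image will be neglected. It is the default flag. (an ordinary 3-channel image)
cv2.IMREAD_GRAYSCALE 0: Loads image in grayscale mode (a single-channel grayscale image)
cv2.IMREAD_UNCHANGED -1: Loads image as such including alpha channel (4 channels when the file has an alpha channel)
Note: the default is cv2.IMREAD_COLOR, so cv2.imread('gray.png') returns a 3-channel image whose three channels hold identical values, even though the file is grayscale.

'''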

To summarize: once converted to numpy.ndarray, a PIL image has layout (h, w, c) with channels in RGB order.
cv2.imread() returns a numpy.ndarray with layout (h, w, c) and channels in BGR order.
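Because the two libraries differ only in channel order, a BGR array can be turned into RGB by reversing the last axis. The sketch below uses plain NumPy on a synthetic array; with OpenCV installed, `cv2.cvtColor(image, cv2.COLOR_BGR2RGB)` is the more common way to do the same thing.

```python
import numpy as np

# Synthetic 2x3 BGR image in which every pixel is [B, G, R] = [201, 198, 143]
bgr = np.empty((2, 3, 3), dtype=np.uint8)
bgr[...] = (201, 198, 143)

# Reverse the channel axis; copy() gives a contiguous array with positive strides
rgb = bgr[..., ::-1].copy()

print(bgr[0, 0])  # [201 198 143]
print(rgb[0, 0])  # [143 198 201]
print(rgb.shape)  # (2, 3, 3): layout is still (h, w, c)
```

The `copy()` matters when the result is later handed to `torch.from_numpy`, which does not accept arrays with negative strides.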

torchvision.transforms.ToTensor()

torchvision.transforms.transforms.py:61

class ToTensor(object):
    """Convert a ``PIL Image`` or ``numpy.ndarray`` to tensor.

    Converts a PIL Image or numpy.ndarray (H x W x C) in the range
    [0, 255] to a torch.FloatTensor of shape (C x H x W) in the range [0.0, 1.0].
    """

    def __call__(self, pic):
        """
        Args:
            pic (PIL Image or numpy.ndarray): Image to be converted to tensor.

        Returns:
            Tensor: Converted image.
        """
        return F.to_tensor(pic)

    def __repr__(self):
        return self.__class__.__name__ + '()'

torchvision.transforms.functional.py:32

def to_tensor(pic):
    """Convert a ``PIL Image`` or ``numpy.ndarray`` to tensor.

    See ``ToTensor`` for more details.

    Args:
        pic (PIL Image or numpy.ndarray): Image to be converted to tensor.

    Returns:
        Tensor: Converted image.
    """
    if not(_is_pil_image(pic) or _is_numpy_image(pic)):
        raise TypeError('pic should be PIL Image or ndarray. Got {}'.format(type(pic)))

    if isinstance(pic, np.ndarray):
        # handle numpy array
        img = torch.from_numpy(pic.transpose((2, 0, 1)))
        # backward compatibility
        if isinstance(img, torch.ByteTensor):
            return img.float().div(255)
        else:
            return img

    if accimage is not None and isinstance(pic, accimage.Image):
        nppic = np.zeros([pic.channels, pic.height, pic.width], dtype=np.float32)
        pic.copyto(nppic)
        return torch.from_numpy(nppic)

    # handle PIL Image
    if pic.mode == 'I':
        img = torch.from_numpy(np.array(pic, np.int32, copy=False))
    elif pic.mode == 'I;16':
        img = torch.from_numpy(np.array(pic, np.int16, copy=False))
    elif pic.mode == 'F':
        img = torch.from_numpy(np.array(pic, np.float32, copy=False))
    elif pic.mode == '1':
        img = 255 * torch.from_numpy(np.array(pic, np.uint8, copy=False))
    else:
        img = torch.ByteTensor(torch.ByteStorage.from_buffer(pic.tobytes()))
    # PIL image mode: L, P, I, F, RGB, YCbCr, RGBA, CMYK
    if pic.mode == 'YCbCr':
        nchannel = 3
    elif pic.mode == 'I;16':
        nchannel = 1
    else:
        nchannel = len(pic.mode)
    img = img.view(pic.size[1], pic.size[0], nchannel)
    # put it from HWC to CHW format
    # yikes, this transpose takes 80% of the loading time/CPU
    img = img.transpose(0, 1).transpose(0, 2).contiguous()
    if isinstance(img, torch.ByteTensor):
        return img.float().div(255)
    else:
        return img

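As the to_tensor() function shows, it accepts a PIL Image or a numpy.ndarray and first rearranges the data from HWC to CHW layout. If the result is a ByteTensor (uint8 data), it is then converted to float and every value is divided by 255; any other dtype is returned without scaling. An array created with dtype=np.float32, as in the PIL example above, therefore keeps its [0, 255] range.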
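The ndarray branch of the function can be imitated in plain NumPy to make the dtype dependence visible. `to_chw_float` is a hypothetical helper written for this illustration; it is not part of torchvision and returns NumPy arrays rather than tensors.

```python
import numpy as np


def to_chw_float(pic):
    """Hypothetical NumPy imitation of the ndarray branch of to_tensor()."""
    chw = pic.transpose((2, 0, 1))  # HWC -> CHW
    if chw.dtype == np.uint8:
        # Only uint8 input is converted to float and scaled into [0.0, 1.0]
        return chw.astype(np.float32) / 255.0
    # Any other dtype is returned as-is, without scaling
    return chw


# A 100x200 (h x w) RGB image in which every value is 255
uint8_image = np.full((100, 200, 3), 255, dtype=np.uint8)
float_image = uint8_image.astype(np.float32)

scaled = to_chw_float(uint8_image)
unscaled = to_chw_float(float_image)

print(scaled.shape, scaled.dtype, scaled.max())        # (3, 100, 200) float32 1.0
print(unscaled.shape, unscaled.dtype, unscaled.max())  # (3, 100, 200) float32 255.0
```

In practice this means an image should stay uint8 until after ToTensor(), or be divided by 255 by hand, if the model expects inputs in [0.0, 1.0].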
