Introduction

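This section introduces convolution kernels with multiple input channels or multiple output channels $^{[1]}$. A color image, for example, has 3 RGB color channels in addition to its $2$ dimensions of height and width, so it can be represented as a $3 \times h \times w$ multidimensional array. The dimension of size $3$ is called the channel dimension.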
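A minimal PyTorch sketch of this layout, with a random tensor standing in for a real image. The sizes `h = 4` and `w = 5` are arbitrary assumptions, and treating channel 0 as red assumes RGB channel order:

```python
import torch

h, w = 4, 5                      # arbitrary example height and width (assumption)
img = torch.rand(3, h, w)        # stand-in for an RGB image: (channels, height, width)

print(img.shape)                 # torch.Size([3, 4, 5]); dim 0 is the channel dimension
red = img[0]                     # channel 0 (R, assuming RGB order) is an h x w matrix
print(red.shape)                 # torch.Size([4, 5])

# Built-in layers such as torch.nn.Conv2d also expect a leading batch dimension: (N, C, H, W).
batch = img.unsqueeze(0)
print(batch.shape)               # torch.Size([1, 3, 4, 5])
```

The hand-written functions below work on a single `(c, h, w)` array and have no batch dimension.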

1 Multiple Input Channels

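When the input data has multiple channels, we need a convolution kernel with the same number of input channels as the input data, so that it can perform cross-correlation with the multi-channel input.
Suppose the input has $c_i$ channels and the kernel window has shape $k_h \times k_w$. When $c_i = 1$, the kernel only needs a single two-dimensional array of shape $k_h \times k_w$. When $c_i > 1$, we assign one kernel matrix of shape $k_h \times k_w$ to each input channel.
Concatenating these $c_i$ matrices along the input-channel dimension gives a kernel of shape $c_i \times k_h \times k_w$. We perform one 2D cross-correlation per channel and then sum the resulting two-dimensional outputs across channels; this is the 2D cross-correlation of multi-channel input data with a multi-channel kernel. In short, cross-correlate each channel separately and add up the results, $Y = \sum_{c=1}^{c_i} X_c \star K_c$, where $X_c$ and $K_c$ are the $c$-th channels of the input and the kernel and $\star$ denotes 2D cross-correlation. The code is as follows: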

```python
import torch


def corr2d_multi_in(x, k):
    """Multi-input-channel cross-correlation.

    x: (c_i, h, w), k: (c_i, k_h, k_w). Cross-correlate each channel
    separately, then sum the results over channels.
    """
    ret_mat = corr2d(x[0, :, :], k[0, :, :])
    for i in range(1, x.shape[0]):
        ret_mat += corr2d(x[i, :, :], k[i, :, :])

    return ret_mat


def corr2d(x, k):
    """2D cross-correlation of a single-channel matrix x with kernel k."""
    m, n = k.shape
    ret_mat = torch.zeros((x.shape[0] - m + 1, x.shape[1] - n + 1))
    for i in range(ret_mat.shape[0]):
        for j in range(ret_mat.shape[1]):
            ret_mat[i, j] = torch.sum(x[i: i + m, j: j + n] * k)

    return ret_mat


if __name__ == '__main__':
    # torch.range is deprecated; torch.arange excludes the end point, hence 19.
    temp_x = torch.arange(1, 19, dtype=torch.float32).reshape(2, 3, 3)
    temp_k = torch.tensor([[[0, 1], [2, 3]], [[1, 2], [3, 4]]])
    print(corr2d_multi_in(temp_x, temp_k))
```

Output:

```
tensor([[152., 168.],
        [200., 216.]])
```
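As a cross-check, PyTorch's built-in `torch.nn.functional.conv2d` also computes cross-correlation (it does not flip the kernel) and sums over input channels. A minimal sketch on the same data, assuming only that the kernel is written as a float tensor and that we add the batch and output-channel dimensions `conv2d` expects:

```python
import torch
import torch.nn.functional as F

# Same input and kernel as above, as float tensors.
x = torch.arange(1, 19, dtype=torch.float32).reshape(2, 3, 3)   # (c_i, h, w)
k = torch.tensor([[[0., 1.], [2., 3.]],
                  [[1., 2.], [3., 4.]]])                          # (c_i, k_h, k_w)

# conv2d expects input (N, C_in, H, W) and weight (C_out, C_in, kH, kW),
# so add a batch dimension to x and an output-channel dimension to k.
y = F.conv2d(x.unsqueeze(0), k.unsqueeze(0))                     # (1, 1, 2, 2)
print(y.squeeze())
# tensor([[152., 168.],
#         [200., 216.]])
```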

2 Multiple Output Channels

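Put simply, to obtain an output with $c_o$ channels, we create one multi-input kernel of shape $c_i \times k_h \times k_w$ per output channel and stack them along a new output-channel dimension, giving a kernel of shape $c_o \times c_i \times k_h \times k_w$. Output channel $o$ is the multi-input cross-correlation of the input with the $o$-th kernel, $Y_o = \sum_{c=1}^{c_i} X_c \star K_{o,c}$: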

```python
import torch


def corr2d_multi_in_out(x, K):
    """Multi-output cross-correlation: K is (c_o, c_i, k_h, k_w).

    Compute one multi-input result per output channel and stack them.
    """
    return torch.stack([corr2d_multi_in(x, k) for k in K])


def corr2d_multi_in(x, k):
    """Cross-correlate each input channel separately, then sum over channels."""
    ret_mat = corr2d(x[0, :, :], k[0, :, :])
    for i in range(1, x.shape[0]):
        ret_mat += corr2d(x[i, :, :], k[i, :, :])

    return ret_mat


def corr2d(x, k):
    """2D cross-correlation of a single-channel matrix x with kernel k."""
    m, n = k.shape
    ret_mat = torch.zeros((x.shape[0] - m + 1, x.shape[1] - n + 1))
    for i in range(ret_mat.shape[0]):
        for j in range(ret_mat.shape[1]):
            ret_mat[i, j] = torch.sum(x[i: i + m, j: j + n] * k)

    return ret_mat


if __name__ == '__main__':
    temp_x = torch.arange(1, 19, dtype=torch.float32).reshape(2, 3, 3)
    temp_k = torch.tensor([[[0, 1], [2, 3]], [[1, 2], [3, 4]]])
    # Three output channels: kernel shape (c_o=3, c_i=2, k_h=2, k_w=2).
    temp_k = torch.stack([temp_k, temp_k + 1, temp_k + 2])
    print(corr2d_multi_in_out(temp_x, temp_k))
```

Output:

```
tensor([[[152., 168.],
         [200., 216.]],

        [[212., 236.],
         [284., 308.]],

        [[272., 304.],
         [368., 400.]]])
```
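The weight argument of `torch.nn.functional.conv2d` has shape `(C_out, C_in, kH, kW)`, which is exactly the $c_o \times c_i \times k_h \times k_w$ layout described above, so the stacked kernel can be passed in directly. A minimal sketch, assuming float kernel literals, that also counts the kernel's weights:

```python
import torch
import torch.nn.functional as F

x = torch.arange(1, 19, dtype=torch.float32).reshape(2, 3, 3)   # (c_i=2, h=3, w=3)
k = torch.tensor([[[0., 1.], [2., 3.]],
                  [[1., 2.], [3., 4.]]])                          # (c_i=2, k_h=2, k_w=2)
K = torch.stack([k, k + 1, k + 2])                               # (c_o=3, c_i=2, k_h=2, k_w=2)

# Add a batch dimension to x; K already has the (C_out, C_in, kH, kW) layout.
y = F.conv2d(x.unsqueeze(0), K).squeeze(0)                       # (c_o, 2, 2)
print(y.shape)      # torch.Size([3, 2, 2])
print(K.numel())    # 24 = c_o * c_i * k_h * k_w = 3 * 2 * 2 * 2
```

`y` should match the tensor printed by `corr2d_multi_in_out` above.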

3 $1 \times 1$ Convolutional Layer

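When the convolution window of a multi-channel convolutional layer has shape $1 \times 1$, we call it a $1 \times 1$ convolutional layer, and the convolution it performs a $1 \times 1$ convolution. Because it uses the smallest possible window, it loses the ability to recognize patterns formed by adjacent elements along the height and width dimensions. In fact, the main computation of a $1 \times 1$ convolution takes place along the channel dimension.
The figure below shows the cross-correlation of a $1 \times 1$ kernel with $2$ input channels and $1$ output channel. Note that the input and output have the same height and width. Each output element is a weighted sum, across channels, of the input elements at the same height and width position: $Y_{o,i,j} = \sum_{c=1}^{c_i} K_{o,c} X_{c,i,j}$.
If we treat the channel dimension as the feature dimension and the elements along height and width as data samples, then **a $1 \times 1$ convolutional layer is equivalent to a fully connected layer**.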

```python
import torch


def corr2d_multi_in_out_1x1(x, K):
    """1x1 convolution as a single matrix multiplication.

    x: (c_i, h, w), K: (c_o, c_i, 1, 1). Each of the h*w positions is a
    sample and each channel a feature, as in a fully connected layer.
    """
    c_i, h, w = x.shape
    c_o = K.shape[0]
    x = x.view(c_i, h * w)
    K = K.view(c_o, c_i)
    y = torch.mm(K, x)  # (c_o, c_i) @ (c_i, h*w) -> (c_o, h*w)
    return y.view(c_o, h, w)


def corr2d_multi_in_out(x, K):
    """Multi-output cross-correlation: one multi-input result per output channel."""
    return torch.stack([corr2d_multi_in(x, k) for k in K])


def corr2d_multi_in(x, k):
    """Cross-correlate each input channel separately, then sum over channels."""
    ret_mat = corr2d(x[0, :, :], k[0, :, :])
    for i in range(1, x.shape[0]):
        ret_mat += corr2d(x[i, :, :], k[i, :, :])

    return ret_mat


def corr2d(x, k):
    """2D cross-correlation of a single-channel matrix x with kernel k."""
    m, n = k.shape
    ret_mat = torch.zeros((x.shape[0] - m + 1, x.shape[1] - n + 1))
    for i in range(ret_mat.shape[0]):
        for j in range(ret_mat.shape[1]):
            ret_mat[i, j] = torch.sum(x[i: i + m, j: j + n] * k)

    return ret_mat


if __name__ == '__main__':
    torch.manual_seed(1)
    temp_x = torch.rand(3, 3, 3)      # c_i=3, h=w=3
    temp_k = torch.rand(2, 3, 1, 1)   # c_o=2, c_i=3, 1x1 window
    # Both implementations should print the same tensor.
    print(corr2d_multi_in_out_1x1(temp_x, temp_k))
    print(corr2d_multi_in_out(temp_x, temp_k))
```

Output:

```
tensor([[[1.3486, 0.7537, 0.5711],
         [0.7351, 0.3631, 0.8528],
         [0.5868, 1.0130, 1.0915]],

        [[1.6899, 1.3451, 1.0295],
         [1.2994, 0.6615, 1.1993],
         [0.9793, 1.4752, 1.5119]]])
tensor([[[1.3486, 0.7537, 0.5711],
         [0.7351, 0.3631, 0.8528],
         [0.5868, 1.0130, 1.0915]],

        [[1.6899, 1.3451, 1.0295],
         [1.2994, 0.6615, 1.1993],
         [0.9793, 1.4752, 1.5119]]])
```
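To make the fully-connected equivalence concrete, here is a minimal sketch that treats each of the $h \cdot w$ positions as a sample, applies `torch.nn.functional.linear` (which computes `input @ weight.T`), and compares the result with `torch.nn.functional.conv2d` using a $1 \times 1$ kernel. The sizes and the seed are arbitrary assumptions:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)                  # arbitrary seed (assumption)
c_i, c_o, h, w = 3, 2, 3, 3           # arbitrary sizes (assumption)
x = torch.rand(c_i, h, w)
K = torch.rand(c_o, c_i, 1, 1)

# 1x1 convolution with the built-in op; height and width are preserved.
y_conv = F.conv2d(x.unsqueeze(0), K).squeeze(0)      # (c_o, h, w)

# Fully connected view: rows are the h*w positions (samples), columns are channels (features).
samples = x.reshape(c_i, h * w).T                    # (h*w, c_i)
y_fc = F.linear(samples, K.reshape(c_o, c_i))        # (h*w, c_o)
y_fc = y_fc.T.reshape(c_o, h, w)                     # back to (c_o, h, w)

print(y_conv.shape)                   # torch.Size([2, 3, 3])
print(torch.allclose(y_conv, y_fc))   # True
```

The weight matrix of the fully connected layer is simply the $1 \times 1$ kernel with its two trailing singleton dimensions dropped.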

References
[1] Mu Li, Aston Zhang, et al. Dive into Deep Learning (《動手學深度學習》).

This article is also shared on the blog 「因吉」 (CSDN).
This article is part of the OSC 「源創計劃」 program.

Deep Learning (18): Multiple Input Channels and Multiple Output Channels