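There is no need to go over how the convolution operation itself works. Instead, let's walk through the code step by step to see how a convolutional layer is implemented.
Code source: https://github.com/eriklindernoren/ML-From-Scratch
First, look at the basic helper functions, starting with determine_padding(filter_shape, output_shape="same"):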
import math

def determine_padding(filter_shape, output_shape="same"):
    # No padding
    if output_shape == "valid":
        return (0, 0), (0, 0)
    # Pad so that the output shape is the same as input shape (given that stride=1)
    elif output_shape == "same":
        filter_height, filter_width = filter_shape

        # Derived from:
        # output_height = (height + pad_h - filter_height) / stride + 1
        # In this case output_height = height and stride = 1. This gives the
        # expression for the padding below.
        pad_h1 = int(math.floor((filter_height - 1)/2))
        pad_h2 = int(math.ceil((filter_height - 1)/2))
        pad_w1 = int(math.floor((filter_width - 1)/2))
        pad_w2 = int(math.ceil((filter_width - 1)/2))

        return (pad_h1, pad_h2), (pad_w1, pad_w2)
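Note: this computes the padding amounts (top, bottom, left, right) from the kernel shape and the padding mode; output_shape="valid" means no padding.
Plugging in concrete arguments shows the output:
pad_h, pad_w = determine_padding((3, 3), output_shape="same")
Output: (1, 1), (1, 1)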
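To see what the floor/ceil split does when the kernel size is even, here is a minimal sketch that condenses the same logic and tries a few more kernel shapes. The extra shapes (4, 4) and (5, 3) are chosen only for illustration and do not appear in the original walkthrough.

```python
import math

def determine_padding(filter_shape, output_shape="same"):
    """Condensed version of the function above (same arithmetic)."""
    if output_shape == "valid":
        return (0, 0), (0, 0)
    filter_height, filter_width = filter_shape
    pad_h = (math.floor((filter_height - 1) / 2), math.ceil((filter_height - 1) / 2))
    pad_w = (math.floor((filter_width - 1) / 2), math.ceil((filter_width - 1) / 2))
    return pad_h, pad_w

print(determine_padding((3, 3)))           # ((1, 1), (1, 1))
print(determine_padding((4, 4)))           # ((1, 2), (1, 2)): odd total, split unevenly
print(determine_padding((5, 3)))           # ((2, 2), (1, 1))
print(determine_padding((3, 3), "valid"))  # ((0, 0), (0, 0))
```

In every "same" case the two values for one axis add up to kernel size minus 1, which is what keeps the output size equal to the input size at stride 1.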
Next is the image_to_column(images, filter_shape, stride, output_shape='same') function:
import numpy as np

def image_to_column(images, filter_shape, stride, output_shape='same'):
    filter_height, filter_width = filter_shape

    pad_h, pad_w = determine_padding(filter_shape, output_shape)

    # Add padding to the image (only along the height and width axes)
    images_padded = np.pad(images, ((0, 0), (0, 0), pad_h, pad_w), mode='constant')

    # Calculate the indices where the dot products are to be applied between weights
    # and the image
    k, i, j = get_im2col_indices(images.shape, filter_shape, (pad_h, pad_w), stride)

    # Get content from image at those indices
    cols = images_padded[:, k, i, j]
    channels = images.shape[1]
    # Reshape content into column shape
    cols = cols.transpose(1, 2, 0).reshape(filter_height * filter_width * channels, -1)
    return cols
Note: the input images has shape [batch_size, channels, height, width], similar to the image layout used by PyTorch. In other words, images_padded is padded only along height and width. The function calls get_im2col_indices(), so let's look at that next:
def get_im2col_indices(images_shape, filter_shape, padding, stride=1):
    # First figure out what the size of the output should be
    batch_size, channels, height, width = images_shape
    filter_height, filter_width = filter_shape
    pad_h, pad_w = padding
    out_height = int((height + np.sum(pad_h) - filter_height) / stride + 1)
    out_width = int((width + np.sum(pad_w) - filter_width) / stride + 1)

    i0 = np.repeat(np.arange(filter_height), filter_width)
    i0 = np.tile(i0, channels)
    i1 = stride * np.repeat(np.arange(out_height), out_width)
    j0 = np.tile(np.arange(filter_width), filter_height * channels)
    j1 = stride * np.tile(np.arange(out_width), out_height)
    i = i0.reshape(-1, 1) + i1.reshape(1, -1)
    j = j0.reshape(-1, 1) + j1.reshape(1, -1)

    k = np.repeat(np.arange(channels), filter_height * filter_width).reshape(-1, 1)

    return (k, i, j)
Note: this is hard to understand in isolation, so let's again step through it with concrete arguments.
get_im2col_indices((1, 3, 32, 32), (3, 3), ((1, 1), (1, 1)), stride=1)
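Note: look at how each variable changes. out_height and out_width need little explanation: they are the height and width of the feature map produced by the convolution, both 32 here.
For this call, i0 and j0 have shape (27,), i1 and j1 have shape (1024,), i and j have shape (27, 1024), and k has shape (27, 1).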
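The shapes of the index arrays can be checked directly. The sketch below repeats the same index construction, prints the shapes for the (1, 3, 32, 32) call, and then prints the actual index values for a much smaller case. The tiny 3×3 single-channel input with a 2×2 kernel is an assumption made here only so the arrays are small enough to read.

```python
import numpy as np

def get_im2col_indices(images_shape, filter_shape, padding, stride=1):
    # Same logic as the function above
    batch_size, channels, height, width = images_shape
    filter_height, filter_width = filter_shape
    pad_h, pad_w = padding
    out_height = int((height + np.sum(pad_h) - filter_height) / stride + 1)
    out_width = int((width + np.sum(pad_w) - filter_width) / stride + 1)

    i0 = np.tile(np.repeat(np.arange(filter_height), filter_width), channels)
    i1 = stride * np.repeat(np.arange(out_height), out_width)
    j0 = np.tile(np.arange(filter_width), filter_height * channels)
    j1 = stride * np.tile(np.arange(out_width), out_height)
    i = i0.reshape(-1, 1) + i1.reshape(1, -1)
    j = j0.reshape(-1, 1) + j1.reshape(1, -1)
    k = np.repeat(np.arange(channels), filter_height * filter_width).reshape(-1, 1)
    return k, i, j

# The call from the walkthrough
k, i, j = get_im2col_indices((1, 3, 32, 32), (3, 3), ((1, 1), (1, 1)), stride=1)
print(k.shape, i.shape, j.shape)  # (27, 1) (27, 1024) (27, 1024)

# A tiny case (illustrative): 1 channel, 3x3 image, 2x2 kernel, no padding
k, i, j = get_im2col_indices((1, 1, 3, 3), (2, 2), ((0, 0), (0, 0)), stride=1)
print(i)  # row indices: one row per kernel element, one column per output position
print(j)  # column indices, same layout
# Column 0 lists the pixels covered by the kernel at the first output position
print(list(zip(i[:, 0].tolist(), j[:, 0].tolist())))  # [(0, 0), (0, 1), (1, 0), (1, 1)]
```

Each column of (i, j) enumerates the pixel coordinates of one receptive field, and each row corresponds to one kernel element, which is why i0/j0 depend on the kernel and i1/j1 depend on the output position.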
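Knowing these sizes alone still does not make things easy to understand, so let's continue. The key point is that k indexes the channel, i indexes the height of the feature map, and j indexes its width. A 3×3 kernel applied to one channel touches 3×3=9 pixels at a time; with 3 channels, each kernel position touches 9×3=27 pixels. The image is 32×32, which gives 1024 output positions. Now go back to these three lines of code: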
cols = images_padded[:, k, i, j]
channels = images.shape[1]
# Reshape content into column shape
cols = cols.transpose(1, 2, 0).reshape(filter_height * filter_width * channels, -1)
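images_padded has shape (1, 3, 34, 34), so cols = images_padded[:, k, i, j] has shape (1, 27, 1024), because k of shape (27, 1) broadcasts against i and j of shape (27, 1024).
channels is 3.
The final cols = cols.transpose(1, 2, 0).reshape(3*3*3, -1) has shape (27, 1024).
When the batch size is not 1, say 64, the final cols has shape (27, 1024×64) = (27, 65536).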
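To make the layout of cols concrete, here is a slow, loop-based sketch that builds the same kind of matrix patch by patch. The function name im2col_loops is hypothetical (it is not part of the source repository), it skips padding for brevity, and it is meant only to illustrate what each column holds, assuming the column order implied by transpose(1, 2, 0) followed by reshape.

```python
import numpy as np

def im2col_loops(images, filter_shape, stride=1):
    """Loop-based im2col without padding (illustrative and slow)."""
    batch, channels, height, width = images.shape
    fh, fw = filter_shape
    out_h = (height - fh) // stride + 1
    out_w = (width - fw) // stride + 1
    cols = np.zeros((channels * fh * fw, out_h * out_w * batch), dtype=images.dtype)
    for y in range(out_h):
        for x in range(out_w):
            for b in range(batch):
                top, left = y * stride, x * stride
                patch = images[b, :, top:top + fh, left:left + fw]
                # Column order: output position varies slowest, batch index fastest
                cols[:, (y * out_w + x) * batch + b] = patch.reshape(-1)
    return cols

image = np.arange(9).reshape(1, 1, 3, 3)  # pixels 0..8 in a 3x3 grid
cols = im2col_loops(image, (2, 2))
print(cols)
# [[0 1 3 4]
#  [1 2 4 5]
#  [3 4 6 7]
#  [4 5 7 8]]
print(cols[:, 0].reshape(2, 2))  # the top-left 2x2 patch of the image

# Already-padded 34x34 input with batch size 2: 27 rows, 1024 * 2 columns
print(im2col_loops(np.zeros((2, 3, 34, 34)), (3, 3)).shape)  # (27, 2048)
```

Each column is one flattened receptive field (channel, then kernel row, then kernel column), so a single matrix product with the flattened kernels evaluates every kernel at every position.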
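Finally, the convolutional layer itself.
There is a generic Layer base class. Different layers, such as convolution, pooling, and batch normalization layers, are implemented by inheriting from it: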
class Layer(object):

    def set_input_shape(self, shape):
        """ Sets the shape that the layer expects of the input in the forward
        pass method """
        self.input_shape = shape

    def layer_name(self):
        """ The name of the layer. Used in model summary. """
        return self.__class__.__name__

    def parameters(self):
        """ The number of trainable parameters used by the layer """
        return 0

    def forward_pass(self, X, training):
        """ Propagates the signal forward in the network """
        raise NotImplementedError()

    def backward_pass(self, accum_grad):
        """ Propagates the accumulated gradient backwards in the network.
        If the layer has trainable weights then these weights are also tuned in this method.
        As input (accum_grad) it receives the gradient with respect to the output of the layer and
        returns the gradient with respect to the output of the previous layer. """
        raise NotImplementedError()

    def output_shape(self):
        """ The shape of the output produced by forward_pass """
        raise NotImplementedError()
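Methods that every subclass must implement raise NotImplementedError() in the base class, so calling one that a subclass has not overridden fails with an exception.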
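As a small illustration of this pattern, the sketch below uses a condensed copy of the base class and a hypothetical Scale layer. Scale is invented here for demonstration and does not exist in the source repository.

```python
class Layer(object):
    """Condensed copy of the base class above (only two methods shown)."""

    def layer_name(self):
        return self.__class__.__name__

    def forward_pass(self, X, training):
        raise NotImplementedError()


class Scale(Layer):
    """Hypothetical layer: multiplies its input by a constant factor."""

    def __init__(self, factor):
        self.factor = factor

    def forward_pass(self, X, training=True):
        return X * self.factor


print(Scale(2).forward_pass(3))  # 6
print(Scale(2).layer_name())     # Scale

try:
    Layer().forward_pass(3, training=True)
except NotImplementedError:
    print("forward_pass is not implemented on the base class")
```

The subclass inherits layer_name() unchanged and only overrides what it needs, while the base class fails loudly if a required method is missing.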
With that base class in place, Conv2D can be implemented:
import copy
import math
import numpy as np

class Conv2D(Layer):
    """A 2D Convolution Layer.

    Parameters:
    -----------
    n_filters: int
        The number of filters that will convolve over the input matrix. The number of channels
        of the output shape.
    filter_shape: tuple
        A tuple (filter_height, filter_width).
    input_shape: tuple
        The shape of the expected input of the layer. (batch_size, channels, height, width)
        Only needs to be specified for first layer in the network.
    padding: string
        Either 'same' or 'valid'. 'same' results in padding being added so that the output height and width
        matches the input height and width. For 'valid' no padding is added.
    stride: int
        The stride length of the filters during the convolution over the input.
    """
    def __init__(self, n_filters, filter_shape, input_shape=None, padding='same', stride=1):
        self.n_filters = n_filters
        self.filter_shape = filter_shape
        self.padding = padding
        self.stride = stride
        self.input_shape = input_shape
        self.trainable = True

    def initialize(self, optimizer):
        # Initialize the weights
        filter_height, filter_width = self.filter_shape
        channels = self.input_shape[0]
        limit = 1 / math.sqrt(np.prod(self.filter_shape))
        self.W = np.random.uniform(-limit, limit, size=(self.n_filters, channels, filter_height, filter_width))
        self.w0 = np.zeros((self.n_filters, 1))
        # Weight optimizers
        self.W_opt = copy.copy(optimizer)
        self.w0_opt = copy.copy(optimizer)

    def parameters(self):
        return np.prod(self.W.shape) + np.prod(self.w0.shape)

    def forward_pass(self, X, training=True):
        batch_size, channels, height, width = X.shape
        self.layer_input = X
        # Turn image shape into column shape
        # (enables dot product between input and weights)
        self.X_col = image_to_column(X, self.filter_shape, stride=self.stride, output_shape=self.padding)
        # Turn weights into column shape
        self.W_col = self.W.reshape((self.n_filters, -1))
        # Calculate output
        output = self.W_col.dot(self.X_col) + self.w0
        # Reshape into (n_filters, out_height, out_width, batch_size)
        output = output.reshape(self.output_shape() + (batch_size, ))
        # Redistribute axes so that batch size comes first
        return output.transpose(3, 0, 1, 2)

    def backward_pass(self, accum_grad):
        # Reshape accumulated gradient into column shape
        accum_grad = accum_grad.transpose(1, 2, 3, 0).reshape(self.n_filters, -1)

        if self.trainable:
            # Take dot product between column shaped accum. gradient and column shape
            # layer input to determine the gradient at the layer with respect to layer weights
            grad_w = accum_grad.dot(self.X_col.T).reshape(self.W.shape)
            # The gradient with respect to bias terms is the sum similarly to in Dense layer
            grad_w0 = np.sum(accum_grad, axis=1, keepdims=True)

            # Update the layers weights
            self.W = self.W_opt.update(self.W, grad_w)
            self.w0 = self.w0_opt.update(self.w0, grad_w0)

        # Recalculate the gradient which will be propagated back to prev. layer
        accum_grad = self.W_col.T.dot(accum_grad)
        # Reshape from column shape to image shape
        # (column_to_image comes from the source repository and is not shown in this post)
        accum_grad = column_to_image(accum_grad,
                                     self.layer_input.shape,
                                     self.filter_shape,
                                     stride=self.stride,
                                     output_shape=self.padding)

        return accum_grad

    def output_shape(self):
        channels, height, width = self.input_shape
        pad_h, pad_w = determine_padding(self.filter_shape, output_shape=self.padding)
        output_height = (height + np.sum(pad_h) - self.filter_shape[0]) / self.stride + 1
        output_width = (width + np.sum(pad_w) - self.filter_shape[1]) / self.stride + 1
        return self.n_filters, int(output_height), int(output_width)
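Suppose the input still has shape (1, 3, 32, 32) and we convolve it with 16 kernels of size 3×3. Then self.W has shape (16, 3, 3, 3) and self.w0 has shape (16, 1).
self.X_col has shape (27, 1024) and self.W_col has shape (16, 27), so output = self.W_col.dot(self.X_col) + self.w0 has shape (16, 1024). It is then reshaped to (16, 32, 32, 1) and transposed to (1, 16, 32, 32).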
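One way to convince yourself that the matrix product really is a convolution is to compare it against direct loops on a small random input. The sketch below is a self-contained check under assumptions of my own: a condensed im2col helper (hypothetical name, following the same index construction as above), a batch of 2 images of size 5×5, 4 kernels, 'same' padding, and stride 1.

```python
import numpy as np

def im2col(images, filter_shape, pad, stride=1):
    """Condensed im2col following the functions above; pad = (pad_h, pad_w) pairs."""
    batch, channels, height, width = images.shape
    fh, fw = filter_shape
    pad_h, pad_w = pad
    out_h = (height + sum(pad_h) - fh) // stride + 1
    out_w = (width + sum(pad_w) - fw) // stride + 1
    padded = np.pad(images, ((0, 0), (0, 0), pad_h, pad_w), mode="constant")
    i0 = np.tile(np.repeat(np.arange(fh), fw), channels)
    i1 = stride * np.repeat(np.arange(out_h), out_w)
    j0 = np.tile(np.arange(fw), fh * channels)
    j1 = stride * np.tile(np.arange(out_w), out_h)
    i = i0.reshape(-1, 1) + i1.reshape(1, -1)
    j = j0.reshape(-1, 1) + j1.reshape(1, -1)
    k = np.repeat(np.arange(channels), fh * fw).reshape(-1, 1)
    cols = padded[:, k, i, j].transpose(1, 2, 0).reshape(fh * fw * channels, -1)
    return cols, out_h, out_w

rng = np.random.default_rng(0)
X = rng.normal(size=(2, 3, 5, 5))   # (batch, channels, height, width)
W = rng.normal(size=(4, 3, 3, 3))   # (n_filters, channels, fh, fw)
b = rng.normal(size=(4, 1))         # one bias per filter
pad = ((1, 1), (1, 1))              # 'same' padding for a 3x3 kernel

# im2col route, mirroring forward_pass
X_col, out_h, out_w = im2col(X, (3, 3), pad)
out = W.reshape(4, -1).dot(X_col) + b
out = out.reshape(4, out_h, out_w, 2).transpose(3, 0, 1, 2)

# Reference: slide each kernel over the padded input with explicit loops
X_pad = np.pad(X, ((0, 0), (0, 0), (1, 1), (1, 1)), mode="constant")
ref = np.zeros((2, 4, 5, 5))
for n in range(2):
    for f in range(4):
        for y in range(5):
            for x in range(5):
                ref[n, f, y, x] = np.sum(W[f] * X_pad[n, :, y:y + 3, x:x + 3]) + b[f, 0]

print(out.shape, np.allclose(out, ref))  # (2, 4, 5, 5) True
```

Note that, as in most deep learning code, the kernel is not flipped, so strictly speaking this is a cross-correlation.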
Finally, this is how it is used:
image = np.random.randint(0, 255, size=(1, 3, 32, 32)).astype(np.uint8)
input_shape = image.squeeze().shape  # (3, 32, 32): channels, height, width
conv2d = Conv2D(16, (3, 3), input_shape=input_shape, padding='same', stride=1)
conv2d.initialize(None)  # no optimizer is needed for a forward pass only
output = conv2d.forward_pass(image, training=True)
print(output.shape)
Output: (1, 16, 32, 32)
Now count the parameters:
print(conv2d.parameters())
Output: 448
That is, 448 = 3×3×3×16 + 16.
Next, an example with padding='valid':
image = np.random.randint(0, 255, size=(1, 3, 32, 32)).astype(np.uint8)
input_shape = image.squeeze().shape
conv2d = Conv2D(16, (3, 3), input_shape=input_shape, padding='valid', stride=1)
conv2d.initialize(None)
output = conv2d.forward_pass(image, training=True)
print(output.shape)
print(conv2d.parameters())
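Note that the size of cols changes, because the output of the convolution is now (1, 16, 30, 30), which gives 30×30 = 900 kernel positions.
Output:
Size of cols: (27, 900)
(1, 16, 30, 30)
448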
Finally, an example with a stride:
image = np.random.randint(0, 255, size=(1, 3, 32, 32)).astype(np.uint8)
input_shape = image.squeeze().shape
conv2d = Conv2D(16, (3, 3), input_shape=input_shape, padding='valid', stride=2)
conv2d.initialize(None)
output = conv2d.forward_pass(image, training=True)
print(output.shape)
print(conv2d.parameters())
Output:
Size of cols: (27, 225)
(1, 16, 15, 15)
448
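A few closing notes:
Parameter count of a convolutional layer: params = kernel height × kernel width × number of input channels × number of kernels + bias terms (one per kernel)
Output size after convolution:
output height = (input height + padding (height) × 2 - kernel height) / stride + 1
output width = (input width + padding (width) × 2 - kernel width) / stride + 1
Here padding is the amount added on each side; the code above uses the sum of the two sides instead, which also covers uneven padding. The division is rounded down, which is why stride 2 on a 32×32 input with a 3×3 kernel and no padding gives 15.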
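The two formulas can be checked against the three runs above with a few lines of arithmetic. The helper names below are hypothetical, and pad_total stands for the padding on both sides combined, matching np.sum(pad_h) in the code.

```python
def conv_output_size(size, filter_size, pad_total, stride):
    """Output height or width; pad_total is the padding on both sides combined."""
    return (size + pad_total - filter_size) // stride + 1

def conv_param_count(filter_h, filter_w, in_channels, n_filters):
    """Weights plus one bias per filter."""
    return filter_h * filter_w * in_channels * n_filters + n_filters

print(conv_output_size(32, 3, pad_total=2, stride=1))  # 32  ('same', stride 1)
print(conv_output_size(32, 3, pad_total=0, stride=1))  # 30  ('valid', stride 1)
print(conv_output_size(32, 3, pad_total=0, stride=2))  # 15  ('valid', stride 2)
print(conv_param_count(3, 3, 3, 16))                   # 448
```

The parameter count does not depend on padding or stride, which is why all three runs report 448.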
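The transformations inside get_im2col_indices() are now clear, but why the indices are constructed this way still deserves some careful thought. Backpropagation and the optimizer will be covered in a later update, once I have studied them properly.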