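In TensorFlow, tf.nn.conv2d implements the convolution operation and tf.nn.max_pool implements max pooling. Passing different arguments to these functions yields the different kinds of convolution and pooling.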
The convolution function tf.nn.conv2d
TensorFlow implements convolution with the tf.nn.conv2d function, whose signature is:
tf.nn.conv2d(input, filter, strides, padding, use_cudnn_on_gpu=None, name=None)
The parameters have the following meanings:
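input: the images to convolve. It must be a Tensor of shape [batch, in_height, in_width, in_channels]. This is four-dimensional data whose dimensions are, in order, the number of images in one training batch, the image height, the image width, and the number of image channels. The data type is float.
filter: the convolution kernel of the CNN. It must be a Tensor of shape [filter_height, filter_width, in_channels, out_channels], that is, the kernel height, the kernel width, the number of image channels, and the number of filters. Its type must match that of input. Note that the third dimension, in_channels, must equal the fourth dimension of input.
strides: the step of the convolution along each dimension of the image. This is a one-dimensional vector of length 4.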
padding: a string that must be either SAME or VALID; it selects the convolution mode. With VALID the edges are not padded. With SAME the input is padded with zeros so that the filter can reach the edges of the image.
use_cudnn_on_gpu: a bool that controls whether cuDNN acceleration is used. Defaults to true.
Return value: the function returns a Tensor, which is the feature map.
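How strides and padding determine the size of the returned feature map can be sketched with a little arithmetic. The helper below is hypothetical (conv_output_size is not a TensorFlow function); it only restates the usual output-size rules for the two padding modes, applied along one spatial dimension.

```python
import math

def conv_output_size(in_size, filter_size, stride, padding):
    """Spatial output length along one dimension (height or width).

    Hypothetical helper for illustration; it is not part of TensorFlow.
    """
    if padding == "SAME":
        # Zeros are added as needed, so only the stride shrinks the output.
        return math.ceil(in_size / stride)
    if padding == "VALID":
        # No padding: the filter must fit entirely inside the input.
        return (in_size - filter_size) // stride + 1
    raise ValueError("padding must be 'SAME' or 'VALID'")

# A 2x2 filter with stride 2 on 5x5 and 4x4 inputs.
for in_size in (5, 4):
    for padding in ("SAME", "VALID"):
        out = conv_output_size(in_size, filter_size=2, stride=2, padding=padding)
        print(f"input {in_size}x{in_size}, {padding}: output {out}x{out}")
```

With an odd input size the two modes differ (3 versus 2 for a 5x5 input here), while for the 4x4 input both give 2 because the filter already tiles the input exactly.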
    import tensorflow as tf

    # Compare conv2d results for different filters, input sizes and padding modes.
    # Written against the TensorFlow 1.x API (tf.Session).

    # input shape: [batch, in_height, in_width, in_channels]
    input = tf.Variable(tf.constant(1.0, shape=[1, 5, 5, 1]))
    input2 = tf.Variable(tf.constant(1.0, shape=[1, 5, 5, 2]))
    input3 = tf.Variable(tf.constant(1.0, shape=[1, 4, 4, 1]))

    # filter shape: [filter_height, filter_width, in_channels, out_channels]
    filter1 = tf.Variable(tf.constant([-1.0, 0, 0, -1], shape=[2, 2, 1, 1]))
    filter2 = tf.Variable(tf.constant([-1.0, 0, 0, -1, -1, 0, 0, 1], shape=[2, 2, 1, 2]))
    filter3 = tf.Variable(tf.constant([-1.0, 0, 0, -1, -1.0, 0, 0, -1, -1.0, 0, 0, -1], shape=[2, 2, 1, 3]))
    filter4 = tf.Variable(tf.constant([-1.0, 0, 0, -1, -1.0, 0, 0, -1, -1.0, 0, 0, -1, -1.0, 0, 0, -1], shape=[2, 2, 2, 2]))
    filter5 = tf.Variable(tf.constant([-1.0, 0, 0, -1, -1.0, 0, 0, -1], shape=[2, 2, 2, 1]))

    # One input channel, 1 / 2 / 3 output channels.
    op1 = tf.nn.conv2d(input, filter1, strides=[1, 2, 2, 1], padding='SAME')
    op2 = tf.nn.conv2d(input, filter2, strides=[1, 2, 2, 1], padding='SAME')
    op3 = tf.nn.conv2d(input, filter3, strides=[1, 2, 2, 1], padding='SAME')

    # Two input channels, 2 / 1 output channels.
    op4 = tf.nn.conv2d(input2, filter4, strides=[1, 2, 2, 1], padding='SAME')
    op5 = tf.nn.conv2d(input2, filter5, strides=[1, 2, 2, 1], padding='SAME')

    # SAME versus VALID on a 5x5 input and on a 4x4 input.
    vop1 = tf.nn.conv2d(input, filter1, strides=[1, 2, 2, 1], padding='VALID')
    op6 = tf.nn.conv2d(input3, filter1, strides=[1, 2, 2, 1], padding='SAME')
    vop6 = tf.nn.conv2d(input3, filter1, strides=[1, 2, 2, 1], padding='VALID')

    init = tf.global_variables_initializer()
    with tf.Session() as sess:
        sess.run(init)
        print('-op1:\n', sess.run([op1, filter1]))
        print('-------------------------------')
        print('-op2:\n', sess.run([op2, filter2]))
        print('-op3:\n', sess.run([op3, filter3]))
        print('-------------------------------')
        print('-op4:\n', sess.run([op4, filter4]))
        print('-op5:\n', sess.run([op5, filter5]))
        print('-------------------------------')
        print('-op1:\n', sess.run([op1, filter1]))
        print('-vop1:\n', sess.run([vop1, filter1]))
        print('-op6:\n', sess.run([op6, filter1]))
        print('-vop6:\n', sess.run([vop6, filter1]))
        print('-------------------------------')
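Studying the output of the program above closely shows how convolution works. Pay particular attention to how SAME padding fills in zeros.
For example, op1 produces a 3×3 feature map: a 5×5 input with stride 2 and SAME padding gives ceil(5/2) = 3 positions per dimension. The other results follow the same pattern. When the image has multiple channels, the convolution sums the per-channel results to produce a single output feature map.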
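The zero padding and the per-channel summation can be checked by hand with a small reference implementation. The sketch below uses only the Python standard library; the function names are made up for illustration, and the padding split (any odd leftover goes to the bottom and right) is the convention assumed here for SAME. The single-channel kernel has the same values as filter1, while the two-channel case uses illustrative kernels and does not reproduce op4 or op5.

```python
import math

def same_padding(size, k, stride):
    """Output length and leading zero padding for SAME along one dimension."""
    out = math.ceil(size / stride)
    total = max((out - 1) * stride + k - size, 0)
    return out, total // 2  # any odd leftover padding goes at the end

def conv2d_plane(image, kernel, stride, padding):
    """Convolve one 2-D channel with one 2-D kernel (no kernel flip)."""
    h, w = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    if padding == "SAME":
        out_h, top = same_padding(h, kh, stride)
        out_w, left = same_padding(w, kw, stride)
    else:  # VALID: no padding at all
        out_h, top = (h - kh) // stride + 1, 0
        out_w, left = (w - kw) // stride + 1, 0

    def pixel(r, c):
        # Positions outside the image read as zero padding.
        return image[r][c] if 0 <= r < h and 0 <= c < w else 0.0

    return [[sum(kernel[a][b] * pixel(i * stride - top + a, j * stride - left + b)
                 for a in range(kh) for b in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]

def conv2d_multi(planes, kernels, stride, padding):
    """Multi-channel input, one output channel: sum the per-channel results."""
    per_channel = [conv2d_plane(p, k, stride, padding)
                   for p, k in zip(planes, kernels)]
    return [[sum(vals) for vals in zip(*rows)] for rows in zip(*per_channel)]

ones = [[1.0] * 5 for _ in range(5)]
kernel = [[-1.0, 0.0], [0.0, -1.0]]  # same values as filter1

same_out = conv2d_plane(ones, kernel, stride=2, padding="SAME")
valid_out = conv2d_plane(ones, kernel, stride=2, padding="VALID")
two_channel = conv2d_multi([ones, ones], [kernel, kernel], stride=2, padding="SAME")
print(same_out)
print(valid_out)
print(two_channel)
```

For the all-ones 5×5 input, the SAME result is 3×3 with -2 in the interior and -1 in the last row and column, because part of each border window falls on zero padding. The VALID result is 2×2 with -2 everywhere, and the two-channel result is simply twice the single-channel SAME result.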
Image convolution example
    import tensorflow as tf
    import numpy as np
    import matplotlib.pyplot as plt
    import matplotlib.image as mpimg

    path = 'image/1.jpg'
    img = mpimg.imread(path)
    plt.imshow(img)
    plt.axis('off')
    plt.show()
    print(img.shape)

    # The reshape assumes a 374 x 560 image with 3 channels; adjust for other images.
    full = np.reshape(img, [1, 374, 560, 3])
    # Placeholder for the image; the actual pixels are supplied through feed_dict.
    inputfull = tf.placeholder(tf.float32, shape=[1, 374, 560, 3])

    # 3x3 kernel over 3 input channels, producing 1 output channel.
    filter = tf.Variable(tf.constant([[-1.0, -1.0, -1.0], [0, 0, 0], [1.0, 1.0, 1.0],
                                      [-2.0, -2.0, -2.0], [0, 0, 0], [2.0, 2.0, 2.0],
                                      [-1.0, -1.0, -1.0], [0, 0, 0], [1.0, 1.0, 1.0]],
                                     shape=[3, 3, 3, 1]))

    op = tf.nn.conv2d(inputfull, filter, strides=[1, 1, 1, 1], padding='SAME')
    # Rescale the result to the 0..255 range so it can be shown as an image.
    o = tf.cast(((op - tf.reduce_min(op)) / (tf.reduce_max(op) - tf.reduce_min(op))) * 255, tf.uint8)

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        t, f = sess.run([o, filter], feed_dict={inputfull: full})
        # Drop the batch and channel dimensions to get a 2-D grayscale image.
        t = np.reshape(t, [374, 560])
        plt.imshow(t, cmap='Greys_r')
        plt.axis('off')
        plt.show()
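It is not obvious from the flat list of 27 numbers what kernel each colour channel actually sees. The sketch below, in plain Python, unpacks the values in row-major order for a [3, 3, 3, 1] shape (height, width, input channel, output channel), which is the layout assumed here. kernel_for_channel is a hypothetical helper, not a TensorFlow function.

```python
# The 27 filter values from the example, flattened in the order they are written.
flat = [-1.0, -1.0, -1.0, 0, 0, 0, 1.0, 1.0, 1.0,
        -2.0, -2.0, -2.0, 0, 0, 0, 2.0, 2.0, 2.0,
        -1.0, -1.0, -1.0, 0, 0, 0, 1.0, 1.0, 1.0]

def kernel_for_channel(c, size=3, channels=3):
    """Pick out the size x size kernel applied to input channel c.

    Row-major layout [height, width, in_channels, 1]: the channel index
    varies fastest, then the width index, then the height index.
    """
    return [[flat[(h * size + w) * channels + c] for w in range(size)]
            for h in range(size)]

for c in range(3):
    print("channel", c)
    for row in kernel_for_channel(c):
        print("   ", row)
```

Every channel gets the same 3×3 kernel with rows [-1, 0, 1], [-2, 0, 2], [-1, 0, 1], which is the Sobel operator for horizontal intensity changes. The example therefore highlights vertical edges, summed over the three colour channels.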
The pooling functions tf.nn.max_pool (avg_pool)
The pooling functions in TensorFlow are:
tf.nn.max_pool(input, ksize, strides, padding, name)
tf.nn.avg_pool(input, ksize, strides, padding, name)
In both pooling functions the four main parameters have the following meanings:
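input: the data to pool. A pooling layer usually follows a convolution layer, so the input is typically a feature map, again of shape [batch, height, width, channels].
ksize: the size of the pooling window, a vector of length 4, usually [1, height, width, 1], since pooling is normally not applied over the batch and channels dimensions.
strides: similar to the convolution parameter; the step by which the window slides along each dimension, usually [1, stride, stride, 1].
padding: same meaning as the convolution parameter; either VALID or SAME.
Returned Tensor: the type is unchanged and the shape still has the form [batch, height, width, channels], although height and width may be smaller than in the input.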
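What ksize and strides do to a single feature-map channel can be sketched in plain Python. pool2d below is a hypothetical helper, not the TensorFlow implementation, and it covers only the case where no padding is needed (windows lie fully inside the input). The 4×4 values are made up for illustration.

```python
def pool2d(plane, ksize, stride, reduce_fn):
    """Slide a ksize x ksize window over one 2-D channel and reduce each window.

    Hypothetical helper for illustration; it handles only windows that lie
    fully inside the input, so no padding is involved.
    """
    h, w = len(plane), len(plane[0])
    out_h = (h - ksize) // stride + 1
    out_w = (w - ksize) // stride + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            window = [plane[i * stride + a][j * stride + b]
                      for a in range(ksize) for b in range(ksize)]
            row.append(reduce_fn(window))
        out.append(row)
    return out

# Made-up 4x4 feature map (one channel of one image).
feature_map = [[1, 3, 2, 4],
               [5, 6, 1, 2],
               [7, 2, 9, 0],
               [1, 8, 3, 4]]

max_pooled = pool2d(feature_map, ksize=2, stride=2, reduce_fn=max)
avg_pooled = pool2d(feature_map, ksize=2, stride=2,
                    reduce_fn=lambda win: sum(win) / len(win))
print(max_pooled)  # [[6, 4], [8, 9]]
print(avg_pooled)  # [[3.75, 2.25], [4.5, 4.0]]
```

A 2×2 window with stride 2 halves both height and width. Max pooling keeps the largest value in each window, while average pooling keeps the mean.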
    import tensorflow as tf
    import numpy as np
    import matplotlib.pyplot as plt
    import matplotlib.image as mpimg

    path = 'image/1.jpg'
    img = mpimg.imread(path)
    plt.imshow(img)
    plt.axis('off')
    plt.show()
    print(img.shape)

    # The reshape assumes a 374 x 560 image with 3 channels; adjust for other images.
    full = np.reshape(img, [1, 374, 560, 3])
    # Placeholder for the image; the actual pixels are supplied through feed_dict.
    inputfull = tf.placeholder(tf.float32, shape=[1, 374, 560, 3])

    # 2x2 window with stride 2: halves the height and width.
    pooling = tf.nn.max_pool(inputfull, [1, 2, 2, 1], [1, 2, 2, 1], padding='SAME')
    # Other window/stride combinations to try. To display one of them, use it in
    # place of `pooling` below and change the final reshape to its output size.
    pooling1 = tf.nn.max_pool(inputfull, [1, 2, 2, 1], [1, 1, 1, 1], padding='SAME')
    pooling2 = tf.nn.max_pool(inputfull, [1, 4, 4, 1], [1, 1, 1, 1], padding='SAME')
    pooling3 = tf.nn.max_pool(inputfull, [1, 4, 4, 1], [1, 4, 4, 1], padding='SAME')
    pooling4 = tf.nn.max_pool(inputfull, [1, 2, 2, 1], [1, 2, 2, 1], padding='SAME')

    # Rescale the result to the 0..255 range so it can be shown as an image.
    o = tf.cast(((pooling - tf.reduce_min(pooling)) / (tf.reduce_max(pooling) - tf.reduce_min(pooling))) * 255, tf.uint8)

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        t = sess.run(o, feed_dict={inputfull: full})
        print(t.shape)
        # 374 / 2 = 187 and 560 / 2 = 280; drop the batch dimension for display.
        t = np.reshape(t, [187, 280, 3])
        plt.imshow(t)
        plt.axis('off')
        plt.show()
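The purpose of pooling is to condense the information in the feature map so that its features stand out more clearly: max pooling keeps the strongest response in each window while reducing the spatial size.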