A multilayer feedforward neural network is a multilayer network in which the neurons of each layer are fully connected to the neurons of the next layer; there are no connections between neurons within the same layer and no connections that skip layers, as shown in Figure 11.
Figure 11
The output of the j-th neuron in the hidden layer of the figure above can be written as:

y_j = f(Σ_i w_{i,j} · x_i + b_j)
where the sum runs over all input neurons x_i, f is the activation function, and b_j is the bias of neuron j.
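As a concrete illustration, here is a minimal numpy sketch of such a fully connected layer (the sigmoid activation and the random weights are assumptions made for the example, not taken from the figure):

import numpy as np

def sigmoid(z):
    # an example choice for the activation function f
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([0.5, 1.0, -0.5, 2.0])   # 4 input neurons, as in Figure 11
W = np.random.randn(4, 3)             # W[i, j]: weight from input i to hidden neuron j
b = np.zeros(3)                       # b[j]: per-neuron bias
y = sigmoid(x @ W + b)                # y[j] = f(sum_i W[i, j] * x[i] + b[j])
print(y.shape)                        # (3,): one output per hidden neuron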
A convolutional neural network differs structurally from a multilayer feedforward network: the neurons of each layer are not fully connected to the next layer but only partially connected, i.e., each neuron of the next layer is connected to only some of the neurons in the current layer. As in the multilayer network, however, there are still no same-layer connections and no cross-layer connections. See Figure 12.
Figure 12
The input layer in Figure 12 has 4 neurons, yet each hidden-layer neuron receives only 3 inputs, whereas in the multilayer feedforward network of Figure 11 every hidden-layer neuron receives input from all 4 input-layer neurons.
The group of input-layer neurons feeding one hidden neuron is called its local receptive field; in Figure 12, (x1, x2, x3), (x2, x3, x4), and (x3, x4) are all local receptive fields.
Another way a convolutional network differs from a feedforward network is that all neurons of a hidden layer use the same set of input weights and share a single bias. The output of the j-th hidden neuron in Figure 12 can therefore be written as:

y_j = f(Σ_i w_i · x_{j+i} + b)
where i ranges over the positions of the local receptive field (0 to 2 for a field of size 3), f is the activation function, and b is the bias shared by all neurons of the hidden layer.
The set of weights and the single bias shared between the input layer and the hidden layer are called the shared weights and the shared bias.
The hidden-to-output connection is not fully connected either: as Figure 12 shows, only some of the hidden neurons connect to each output neuron. There are several ways to compute an output neuron from its hidden neurons. A common one is max-pooling, in which each output neuron takes the maximum of the hidden neurons connected to it; in Figure 12, if y1, y2, y3 are 1, 2, 3 respectively, then o1 = 2 and o2 = 3. Convolutional networks support many other pooling methods as well, such as L2 pooling.
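The following numpy sketch makes the weight sharing and the max pooling concrete (it keeps only the positions where the full receptive field fits, so unlike Figure 12 it has no truncated field such as (x3, x4); the input values and weights are made up for the example):

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])    # input layer (x1..x4)
w = np.array([0.2, 0.5, 0.3])         # the single shared weight set (field size 3)
b = 0.1                               # the single shared bias

# hidden layer: every neuron applies the same w and b to its receptive field
hidden = np.array([np.dot(w, x[j:j+3]) + b
                   for j in range(len(x) - len(w) + 1)])

# max pooling: each output neuron keeps the largest hidden value in its window
pooled = np.array([hidden[k:k+2].max() for k in range(len(hidden) - 1)])
print(hidden, pooled)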
Figure 12 maps the input layer to a single hidden layer through one set of weights and one bias; in fact, a convolutional network can map the input layer to several hidden layers through several sets of weights and biases. These hidden layers are parallel to one another, and the number of such feature maps is entirely up to the user's computational needs. As Figure 13 shows, the first convolution maps one input layer to 4 hidden (convolutional) layers; the second convolution maps each of its input (pooling) layers to 3 hidden layers, so after the second convolution there are 12 convolutional layers in total.
Figure 13
The convolutional network structures described so far are all one-dimensional: the input, hidden, and output layers are vectors. An image, however, is a two-dimensional structure, i.e., a matrix, so applying a convolutional network to image recognition requires shifting from one-dimensional to two-dimensional data.
As shown in Figure 14, a 28×28 image (the input layer) is convolved with a 5×5 local receptive field. The first pixel of the hidden layer is computed from the 5×5 block of input pixels in the top-left corner, i.e.

h_{i,j} = f(b + Σ_{k=0..4} Σ_{l=0..4} w_{k,l} · x_{i+k, j+l})    (3)

with i = 0 and j = 0. Rewritten against the one-dimensional (flattened) input, equation (3) becomes:

h_{i,j} = f(b + Σ_{k=0..4} Σ_{l=0..4} w_{k,l} · x[28·(i+k) + (j+l)])    (4)
Figure 14
The second pixel of the hidden layer can be computed in the same way, as shown in Figure 15:

h_{0,1} = f(b + Σ_{k=0..4} Σ_{l=0..4} w_{k,l} · x_{k, l+1})    (5)

i.e., i = 0 and j = 1. Note that the weights w_{k,l} and the bias b used in equations (3) and (5) are identical.
Figure 15
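The arithmetic of equations (3) and (5) can be checked with a short numpy sketch (random image and weights, activation omitted for brevity): every hidden pixel applies the same 5×5 weights and bias at a different offset into the input.

import numpy as np

img = np.random.rand(28, 28)   # the input layer: a 28x28 image
w = np.random.randn(5, 5)      # the shared 5x5 receptive-field weights
b = 0.1                        # the shared bias

def hidden_pixel(i, j):
    # h[i, j] = b + sum over k, l of w[k, l] * img[i+k, j+l]
    return b + np.sum(w * img[i:i+5, j:j+5])

print(hidden_pixel(0, 0))      # equation (3): i = 0, j = 0
print(hidden_pixel(0, 1))      # equation (5): i = 0, j = 1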
Max pooling is then applied to each 2×2 block of the hidden layer, forming an output layer, as shown in Figure 16.
Figure 16
With 3 sets of feature maps, one input layer is mapped to 3 hidden layers, and each hidden layer is then pooled into an output layer, giving the structure shown in Figure 17.
Figure 17
As described above, a convolutional network has two main computational steps: convolution and pooling. TensorFlow provides a number of APIs to make these computations convenient.
In TensorFlow, the dimensionality of the input data often does not match that of an API's formal parameters. For example, the input dataset may be a vector (array) of shape [784] while the API expects a matrix of shape [28, 28]; the one-dimensional (1-d) vector must then be converted into a two-dimensional (2-d) matrix without changing the data itself. TensorFlow's reshape function does exactly this:
def reshape(tensor, shape, name=None):
The main parameter semantics of reshape are illustrated by the following examples:
# tensor 't' is [1, 2, 3, 4, 5, 6, 7, 8, 9]
# tensor 't' has shape [9]
reshape(t, [3, 3]) ==> [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

# tensor 't' is [[[1, 1], [2, 2]],
#                [[3, 3], [4, 4]]]
# tensor 't' has shape [2, 2, 2]
reshape(t, [2, 4]) ==> [[1, 1, 2, 2], [3, 3, 4, 4]]

# tensor 't' is [[[1, 1, 1],
#                 [2, 2, 2]],
#                [[3, 3, 3],
#                 [4, 4, 4]],
#                [[5, 5, 5],
#                 [6, 6, 6]]]
# tensor 't' has shape [3, 2, 3]
# pass '[-1]' to flatten 't'
reshape(t, [-1]) ==> [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5, 6, 6, 6]

# -1 can also be used to infer the shape
# -1 is inferred to be 9:
reshape(t, [2, -1]) ==> [[1, 1, 1, 2, 2, 2, 3, 3, 3],
                         [4, 4, 4, 5, 5, 5, 6, 6, 6]]
# -1 is inferred to be 2:
reshape(t, [-1, 9]) ==> [[1, 1, 1, 2, 2, 2, 3, 3, 3],
                         [4, 4, 4, 5, 5, 5, 6, 6, 6]]
# -1 is inferred to be 3:
reshape(t, [2, -1, 3]) ==> [[[1, 1, 1], [2, 2, 2], [3, 3, 3]],
                            [[4, 4, 4], [5, 5, 5], [6, 6, 6]]]
That is, if one dimension of shape is given as -1, reshape infers the size of that dimension automatically so that the total number of elements stays unchanged.
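As a quick preview of the conversion used later (a numpy sketch; numpy's reshape follows the same semantics as tf.reshape):

import numpy as np

flat = np.arange(784)          # a [784] vector, like one flattened MNIST image
img = flat.reshape(28, 28)     # the same 784 values, now a [28, 28] matrix
back = img.reshape(-1)         # -1 lets reshape infer 784 automatically
print(img.shape, back.shape)   # (28, 28) (784,)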
TensorFlow provides the conv2d function for the convolution computation of a neural network, defined as follows:
def tf.nn.conv2d(input, filter, strides, padding, use_cudnn_on_gpu=None, data_format=None, name=None)
The main parameter semantics are: input is a 4-d tensor of shape [batch, in_height, in_width, in_channels]; filter is a 4-d tensor of shape [filter_height, filter_width, in_channels, out_channels]; strides gives the stride of the sliding window along each dimension of input; and padding selects the padding algorithm, either 'SAME' or 'VALID'.
如圖 14的池化操做,能夠按以下使用:
x_image = tf.reshape(x, [-1, 28, 28, 1])

initial_w = tf.truncated_normal([5, 5, 1, 3], stddev=0.1)
w = tf.Variable(initial_w)

initial_d = tf.constant(0.1, shape=[3])
d = tf.Variable(initial_d)

y = tf.nn.relu(tf.nn.conv2d(x_image, w, strides=[1, 1, 1, 1], padding='SAME') + d)
Here x_image reshapes the input into a 4-d tensor (one 28×28 grayscale channel per image), w holds 3 sets of shared 5×5 weights (3 feature maps), and d holds the 3 corresponding shared biases.
TensorFlow provides several functions for the pooling computation of a neural network. Since the pooling functions have similar parameter semantics, only max_pool is introduced here; its definition is:
def tf.nn.max_pool(value, ksize, strides, padding, data_format="NHWC", name=None):
The main parameter semantics are: value is a 4-d tensor of shape [batch, height, width, channels]; ksize gives the size of the pooling window along each dimension of value; strides gives the stride of the window along each dimension; and padding is either 'SAME' or 'VALID'.
如所示的池化操做,能夠用TensorFlow進行以下操做:
tf.nn.max_pool(y, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')
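As a quick sanity check (a sketch assuming y is the [-1, 28, 28, 3] convolution output built above), 'SAME' padding with stride 2 halves each spatial dimension:

pooled = tf.nn.max_pool(y, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')
print(pooled.get_shape())   # (?, 14, 14, 3)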
For convenience in the computations that follow, we define these four helper functions:
def conv2d(x, W):
  """conv2d returns a 2d convolution layer with full stride."""
  return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')

def max_pool_2x2(x):
  """max_pool_2x2 downsamples a feature map by 2X."""
  return tf.nn.max_pool(x, ksize=[1, 2, 2, 1],
                        strides=[1, 2, 2, 1], padding='SAME')

def weight_variable(shape):
  """weight_variable generates a weight variable of a given shape."""
  initial = tf.truncated_normal(shape, stddev=0.1)
  return tf.Variable(initial)

def bias_variable(shape):
  """bias_variable generates a bias variable of a given shape."""
  initial = tf.constant(0.1, shape=shape)
  return tf.Variable(initial)
The MNIST data fetched above is stored as a tensor of shape [n, 784], one row per flattened image. To use TensorFlow's tf.nn.conv2d function, this input must first be reshaped.
# Reshape to use within a convolutional neural net.
# Last dimension is for "features" - there is only one here, since images are
# grayscale -- it would be 3 for an RGB image, 4 for RGBA, etc.
x_image = tf.reshape(x, [-1, 28, 28, 1])

# First convolutional layer - maps one grayscale image to 32 feature maps.
W_conv1 = weight_variable([5, 5, 1, 32])
b_conv1 = bias_variable([32])
h_conv1 = tf.nn.relu(conv2d(x_image, W_conv1) + b_conv1)

# Pooling layer - downsamples by 2X.
h_pool1 = max_pool_2x2(h_conv1)
Each input image starts as a [28, 28] matrix; after the first convolution it becomes a [28, 28, 32] tensor, where 32 is the number of feature maps; after pooling it becomes [14, 14, 32]. See Figure 21.
Figure 21
The h_pool1 object produced by the first convolution and pooling is a [None, 14, 14, 32] tensor, and it serves as the input tensor to the second convolution, shown below.
# Second convolutional layer -- maps 32 feature maps to 64.
W_conv2 = weight_variable([5, 5, 32, 64])
b_conv2 = bias_variable([64])
h_conv2 = tf.nn.relu(conv2d(h_pool1, W_conv2) + b_conv2)

# Second pooling layer.
h_pool2 = max_pool_2x2(h_conv2)
Figure 22
After the two rounds of convolution and pooling, a conventional feedforward network takes over for training. The h_pool2 object produced by the second pooling is a [None, 7, 7, 64] tensor: each input image has been transformed into a feature volume of 7×7×64 values.
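These shapes can be read directly off the graph (a sketch assuming the tensors h_conv1 through h_pool2 built above are in scope; get_shape returns the static shape, with ? for the unknown batch size):

print(h_conv1.get_shape())   # (?, 28, 28, 32)
print(h_pool1.get_shape())   # (?, 14, 14, 32)
print(h_conv2.get_shape())   # (?, 14, 14, 64)
print(h_pool2.get_shape())   # (?, 7, 7, 64)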
Since a conventional feedforward network expects one-dimensional (1-d) input and output data, the h_pool2 object must first be reshaped.
# Fully connected layer 1 -- after 2 rounds of downsampling, our 28x28 image
# is down to 7x7x64 feature maps -- maps this to 1024 features.
W_fc1 = weight_variable([7 * 7 * 64, 1024])
b_fc1 = bias_variable([1024])

h_pool2_flat = tf.reshape(h_pool2, [-1, 7*7*64])
h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, W_fc1) + b_fc1)
Figure 23 shows this reshape and the fully connected structure:
Figure 23
A dropout layer is inserted between the fully connected layer and the output layer to reduce overfitting:

# Dropout - controls the complexity of the model, prevents co-adaptation of
# features.
keep_prob = tf.placeholder(tf.float32)
h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob)
A multilayer feedforward network has one input layer, one output layer, and several hidden layers. Here we implement only one hidden layer, so the hidden layer feeds directly into the output layer, as in the following code:
# Map the 1024 features to 10 classes, one for each digit
W_fc2 = weight_variable([1024, 10])
b_fc2 = bias_variable([10])

y_conv = tf.matmul(h_fc1_drop, W_fc2) + b_fc2
Adding this output layer to the input and hidden layers of Figure 23 gives the complete multilayer feedforward structure shown in Figure 24.
Figure 24
After each image (a vector of shape [784]) passes through the multilayer feedforward network and produces a [10] vector, the softmax activation function can be applied to turn it into a [10] label indicating which digit the image shows.
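For reference, turning the [10] logits y_conv into a digit label is one softmax plus one argmax (a minimal sketch; y_conv is the tensor defined above):

# Probabilities over the 10 digit classes, then the most likely class index.
probs = tf.nn.softmax(y_conv)
predicted_digit = tf.argmax(probs, 1)

Note that the training step below feeds the raw logits y_conv to tf.nn.softmax_cross_entropy_with_logits, which applies softmax internally. The training process itself is as follows: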
# Create the optimizer that will tune parameters such as W and b.
cross_entropy = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=y_conv))
train_step = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy)

# The convolution and feedforward computation yields the prediction y_conv,
# which can be compared against y_ to measure performance.
correct_prediction = tf.equal(tf.argmax(y_conv, 1), tf.argmax(y_, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

with tf.Session() as sess:
  sess.run(tf.global_variables_initializer())
  for i in range(20000):
    batch = mnist.train.next_batch(50)
    if i % 100 == 0:
      train_accuracy = accuracy.eval(feed_dict={
          x: batch[0], y_: batch[1], keep_prob: 1.0})
      print('step %d, training accuracy %g' % (i, train_accuracy))
    train_step.run(feed_dict={x: batch[0], y_: batch[1], keep_prob: 0.5})

  print('test accuracy %g' % accuracy.eval(feed_dict={
      x: mnist.test.images, y_: mnist.test.labels, keep_prob: 1.0}))
The program in this appendix comes from \tensorflow\examples\tutorials\mnist\mnist_deep.py; the MNIST data is stored locally under '/tmp/MNIST_data/'.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import argparse
import sys

from tensorflow.examples.tutorials.mnist import input_data

import tensorflow as tf

FLAGS = None


def deepnn(x):
  """deepnn builds the graph for a deep net for classifying digits.

  Args:
    x: an input tensor with the dimensions (N_examples, 784), where 784 is the
    number of pixels in a standard MNIST image.

  Returns:
    A tuple (y, keep_prob). y is a tensor of shape (N_examples, 10), with
    values equal to the logits of classifying the digit into one of 10 classes
    (the digits 0-9). keep_prob is a scalar placeholder for the probability of
    dropout.
  """
  # Reshape to use within a convolutional neural net.
  # Last dimension is for "features" - there is only one here, since images are
  # grayscale -- it would be 3 for an RGB image, 4 for RGBA, etc.
  x_image = tf.reshape(x, [-1, 28, 28, 1])

  # First convolutional layer - maps one grayscale image to 32 feature maps.
  W_conv1 = weight_variable([5, 5, 1, 32])
  b_conv1 = bias_variable([32])
  h_conv1 = tf.nn.relu(conv2d(x_image, W_conv1) + b_conv1)

  # Pooling layer - downsamples by 2X.
  h_pool1 = max_pool_2x2(h_conv1)

  # Second convolutional layer -- maps 32 feature maps to 64.
  W_conv2 = weight_variable([5, 5, 32, 64])
  b_conv2 = bias_variable([64])
  h_conv2 = tf.nn.relu(conv2d(h_pool1, W_conv2) + b_conv2)

  # Second pooling layer.
  h_pool2 = max_pool_2x2(h_conv2)

  # Fully connected layer 1 -- after 2 rounds of downsampling, our 28x28 image
  # is down to 7x7x64 feature maps -- maps this to 1024 features.
  W_fc1 = weight_variable([7 * 7 * 64, 1024])
  b_fc1 = bias_variable([1024])

  h_pool2_flat = tf.reshape(h_pool2, [-1, 7*7*64])
  h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, W_fc1) + b_fc1)

  # Dropout - controls the complexity of the model, prevents co-adaptation of
  # features.
  keep_prob = tf.placeholder(tf.float32)
  h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob)

  # Map the 1024 features to 10 classes, one for each digit
  W_fc2 = weight_variable([1024, 10])
  b_fc2 = bias_variable([10])

  y_conv = tf.matmul(h_fc1_drop, W_fc2) + b_fc2
  return y_conv, keep_prob


def conv2d(x, W):
  """conv2d returns a 2d convolution layer with full stride."""
  return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')


def max_pool_2x2(x):
  """max_pool_2x2 downsamples a feature map by 2X."""
  return tf.nn.max_pool(x, ksize=[1, 2, 2, 1],
                        strides=[1, 2, 2, 1], padding='SAME')


def weight_variable(shape):
  """weight_variable generates a weight variable of a given shape."""
  initial = tf.truncated_normal(shape, stddev=0.1)
  return tf.Variable(initial)


def bias_variable(shape):
  """bias_variable generates a bias variable of a given shape."""
  initial = tf.constant(0.1, shape=shape)
  return tf.Variable(initial)


def main(_):
  # Import data
  mnist = input_data.read_data_sets(FLAGS.data_dir, one_hot=True)

  # Create the model
  x = tf.placeholder(tf.float32, [None, 784])

  # Define loss and optimizer
  y_ = tf.placeholder(tf.float32, [None, 10])

  # Build the graph for the deep net
  y_conv, keep_prob = deepnn(x)

  cross_entropy = tf.reduce_mean(
      tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=y_conv))
  train_step = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy)
  correct_prediction = tf.equal(tf.argmax(y_conv, 1), tf.argmax(y_, 1))
  accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

  with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for i in range(20000):
      batch = mnist.train.next_batch(50)
      if i % 100 == 0:
        train_accuracy = accuracy.eval(feed_dict={
            x: batch[0], y_: batch[1], keep_prob: 1.0})
        print('step %d, training accuracy %g' % (i, train_accuracy))
      train_step.run(feed_dict={x: batch[0], y_: batch[1], keep_prob: 0.5})

    print('test accuracy %g' % accuracy.eval(feed_dict={
        x: mnist.test.images, y_: mnist.test.labels, keep_prob: 1.0}))


if __name__ == '__main__':
  parser = argparse.ArgumentParser()
  parser.add_argument('--data_dir', type=str,
                      default='/tmp/MNIST_data/',
                      help='Directory for storing input data')
  FLAGS, unparsed = parser.parse_known_args()
  tf.app.run(main=main, argv=[sys.argv[0]] + unparsed)