Convolutional Neural Networks

Date: 2020-06-24

Source: https://www.cnblogs.com/Belter/p/10662718.html

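For a long time, the MNIST dataset has been the benchmark for many classification algorithms in machine learning. For someone new to deep learning, training a working convolutional neural network on this dataset is the equivalent of printing "Hello World!" when learning to program. Below, a convolutional neural network is built on Fashion-MNIST, another dataset that closely resembles MNIST.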

0. The Fashion-MNIST dataset

MNIST is widely used in machine learning. The following quote sums up its importance and standing:

In fact, MNIST is often the first dataset researchers try. "If it doesn't work on MNIST, it won't work at all", they said. "Well, if it does work on MNIST, it may still fail on others."

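Fashion-MNIST was created by Zalando Research and published in 2017. The dataset's introduction lists the shortcomings of MNIST:

MNIST is too easy: convolutional neural networks can reach 99.7% accuracy, and classic classification algorithms easily reach 97%;

it is overused;

it does not represent modern computer vision tasks well.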

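Fashion-MNIST matches MNIST in format (28×28-pixel grayscale images, 10 classes) and in size (60,000 training images and 10,000 test images). The difference is that MNIST contains the handwritten digits 0-9, whereas Fashion-MNIST contains images of different kinds of clothing and shoes.

The labels in the dataset are: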

Label	Description
0	T-shirt/top
1	Trouser
2	Pullover
3	Dress
4	Coat
5	Sandal
6	Shirt
7	Sneaker
8	Bag
9	Ankle boot

Some examples:

Figure 0-1: Sample images from Fashion-MNIST

For convenience, TF bundles commonly used datasets into a standalone Python package, which can be installed as follows:

- More information about this package: https://github.com/tensorflow/datasets

pip install -U tensorflow_datasets

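1. A plain neural network

1.1 Importing dependencies

The code below imports the required packages (including tensorflow_datasets, installed above) and prints the version of TensorFlow (TF) in use. If TF 2.0 is not installed yet, the following command installs the version used in this article.

pip install tensorflow==2.0.0-alpha0  # the latest TF release when this was written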

from __future__ import absolute_import, division, print_function

# Import TensorFlow and TensorFlow Datasets
import tensorflow as tf
import tensorflow_datasets as tfds

# Helper libraries
import math
import numpy as np
import matplotlib.pyplot as plt

# Improve progress bar display
import tqdm
import tqdm.auto
tqdm.tqdm = tqdm.auto.tqdm

print(tf.__version__)  # 2.0.0-alpha0

# Eager execution is enabled by default in TensorFlow 2.x.
# On TensorFlow 1.x, uncomment the next line to enable it:
# tf.enable_eager_execution()

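1.2 Loading the dataset

With everything in place, Fashion-MNIST can be loaded from tensorflow_datasets:

- the data is shuffled automatically while loading (this depends on the tensorflow_datasets version; the shuffle_files argument of tfds.load controls it explicitly);

- as with MNIST, train_dataset contains 60,000 images used as the training set and test_dataset contains 10,000 images used as the test set.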

dataset, metadata = tfds.load('fashion_mnist', as_supervised=True, with_info=True)
train_dataset, test_dataset = dataset['train'], dataset['test']

The names of all the clothing and shoe classes are listed below, in the same order as the labels given earlier:

class_names = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat', 
               'Sandal',      'Shirt',   'Sneaker',  'Bag',   'Ankle boot']

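The metadata can be used to inspect information about the dataset:

- the code below prints the number of samples in the training and test sets

# metadata holds meta-information about the dataset, such as its description, url and version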
num_train_examples = metadata.splits['train'].num_examples
num_test_examples = metadata.splits['test'].num_examples
print("Number of training examples: {}".format(num_train_examples))
print("Number of test examples:     {}".format(num_test_examples))

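1.3 Data preprocessing

In the raw data, each pixel of an image is an integer in the range [0, 255]. To train the model more effectively, all values need to be scaled to the range [0, 1].

- In testing, skipping this step lowered the final test-set accuracy by about 8%.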

def normalize(images, labels):
    images = tf.cast(images, tf.float32)  # Casts a tensor to a new type
    images /= 255
    return images, labels

# The map function applies the normalize function to each element in the train
# and test datasets
train_dataset = train_dataset.map(normalize)
test_dataset = test_dataset.map(normalize)
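As a quick sanity check of the scaling step, here is a minimal pure-Python sketch of the same arithmetic, without TensorFlow. The helper name scale_pixels and the sample values are hypothetical and are not part of the original code.

```python
def scale_pixels(pixels):
    """Map 8-bit pixel intensities from [0, 255] to floats in [0.0, 1.0]."""
    return [p / 255 for p in pixels]


# One row of raw pixel values, as they would appear in a uint8 image.
raw_row = [0, 51, 102, 255]
scaled_row = scale_pixels(raw_row)
print(scaled_row)  # [0.0, 0.2, 0.4, 1.0]
```

Dividing by the largest possible value (255) guarantees that every scaled pixel lies in [0, 1], whatever the image content.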

The preprocessed data still represents an image. Below, one image is taken from the test set and displayed:

# Take a single image, and remove the color dimension by reshaping
for image, label in test_dataset.take(1):
    break
# print(image.shape, label.shape)
image = image.numpy().reshape((28,28))

# Plot the image - voila a piece of fashion clothing
plt.figure()
plt.imshow(image, cmap=plt.cm.binary)
plt.colorbar()
plt.grid(False)
plt.show()

Figure 1-1: An image after normalization

The first 25 images of the training set:

plt.figure(figsize=(10,10))
for i, (image, label) in enumerate(train_dataset.take(25)):
    image = image.numpy().reshape((28,28))
    plt.subplot(5,5,i+1)
    plt.xticks([])
    plt.yticks([])
    plt.grid(False)
    plt.imshow(image, cmap=plt.cm.binary)
    plt.xlabel(class_names[label])
plt.show()

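Figure 1-2: The first 25 images of the training set

1.4 Building the model

With the data prepared, the neural network model can be built. This involves two parts: constructing the network and compiling it.

1.4.1 Constructing the network

Constructing the network requires deciding the following:

the total number of layers in the network;
the type of each layer, e.g. Flatten, Dense;
the number of units (neurons) in each layer;
the activation function of each layer, e.g. ReLU, Softmax; leaving this parameter unset means the layer applies no non-linear transformation.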
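To illustrate what a fully connected layer computes and what the activation adds, here is a minimal pure-Python sketch. The functions dense and relu and the tiny weights are hypothetical illustrations, not the Keras implementation.

```python
def relu(values):
    """ReLU: keep positive values, replace negative ones with 0."""
    return [max(0.0, v) for v in values]


def dense(inputs, weights, biases, activation=None):
    """Forward pass of one fully connected layer.

    weights[j] holds the incoming weights of unit j, so the layer has
    len(weights) units. With activation=None the output is purely linear.
    """
    outputs = [
        sum(w * x for w, x in zip(unit_weights, inputs)) + b
        for unit_weights, b in zip(weights, biases)
    ]
    return activation(outputs) if activation else outputs


x = [1.0, -2.0]                  # a tiny input with 2 features
W = [[1.0, 1.0], [0.5, -1.0]]    # 2 units, 2 weights each
b = [0.0, 0.5]

linear_out = dense(x, W, b)                 # no activation
relu_out = dense(x, W, b, activation=relu)  # same layer with ReLU
print(linear_out)  # [-1.0, 3.0]
print(relu_out)    # [0.0, 3.0]
```

The only difference between the two calls is the non-linear step: the negative output of the first unit is clipped to 0 by ReLU.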

The code for constructing the network:

model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28, 1)),
    tf.keras.layers.Dense(128, activation=tf.nn.relu),
    tf.keras.layers.Dense(10,  activation=tf.nn.softmax)
])

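The network has 3 layers (the description below assumes a single input sample, i.e. one image):

The first layer is a Flatten layer (l0 in the figure below). Its input is a single 28×28 sample (each element is the value of the corresponding pixel), and its output is a vector of length 784;
The second layer is a Dense layer (l1 in the figure below). Its input is the previous layer's output, i.e. the vector of length 784; it has 128 units with ReLU activation and outputs a vector of length 128;
The third layer is a Dense layer (l2 in the figure below). Its input is the previous layer's output; it has 10 units with Softmax activation and outputs a vector of length 10. It is also the network's output layer.

Figure 1-3: The network structure

In the figure above, the superscript denotes the layer index.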
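The layer sizes above determine how many trainable parameters the network has. The following sketch works through that arithmetic. The helper dense_param_count is a hypothetical name, and the count assumes each Dense unit has one weight per input plus one bias.

```python
def dense_param_count(n_inputs, n_units):
    """Weights (n_inputs * n_units) plus one bias per unit."""
    return n_inputs * n_units + n_units


flatten_out = 28 * 28 * 1   # Flatten: (28, 28, 1) -> 784 values, no parameters
hidden_params = dense_param_count(flatten_out, 128)
output_params = dense_param_count(128, 10)
total_params = hidden_params + output_params

print(flatten_out)    # 784
print(hidden_params)  # 100480
print(output_params)  # 1290
print(total_params)   # 101770
```

Almost all of the parameters sit in the first Dense layer, because it connects every one of the 784 pixels to every one of the 128 units.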

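1.4.2 Compiling

Once constructed, the network needs to be compiled. Compilation requires choosing the following:

Loss function: measures how well the model performs;
Optimizer: updates the parameters based on the error and its gradients so as to minimize the error;
Metrics: also used to measure how well the model performs.

Similarities and differences between the loss function and the metrics:

both measure model quality and are highly correlated;
the loss function must be differentiable and is a function of the trainable parameters; training the model means optimizing the loss function;
metrics need not be differentiable and are easier to interpret, e.g. classification accuracy in a classification problem.

The compilation code: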

model.compile(optimizer='adam', 
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
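To make the difference between a loss and a metric concrete, here is a simplified pure-Python sketch for a single sample. It assumes the loss is the negative log-probability of the true class, which is the core of sparse categorical crossentropy, and leaves out the batching and numerical safeguards a real library applies. The function names and probability values are hypothetical.

```python
import math


def sparse_categorical_crossentropy(probabilities, true_label):
    """Negative log-probability assigned to the true class."""
    return -math.log(probabilities[true_label])


def is_correct(probabilities, true_label):
    """1 if the most probable class is the true class, else 0."""
    predicted = max(range(len(probabilities)), key=probabilities.__getitem__)
    return int(predicted == true_label)


true_label = 0
confident = [0.9, 0.05, 0.05]   # correct and confident
hesitant = [0.4, 0.3, 0.3]      # correct but unsure

for probs in (confident, hesitant):
    loss = sparse_categorical_crossentropy(probs, true_label)
    print(round(loss, 4), is_correct(probs, true_label))
# 0.1054 1
# 0.9163 1
```

Both predictions count as correct for the accuracy metric, but the loss still distinguishes them. This smooth dependence on the predicted probabilities is what gives the optimizer a gradient to follow.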

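1.5 Training the model

Once the model is built, it can be trained. Because training uses (mini-batch) gradient descent, two parameters must be specified in addition to the training set:

Batch size: the number of samples used in a single training step (set to 32 below, so each step uses only 32 samples of the training set, and going through the whole training set takes 60000/32 = 1875 steps);
Number of epochs: the number of passes over the entire training set. With 5 epochs and a batch size of 32, the parameters are updated 5*1875 = 9375 times in total (in other words, each training image is used 5 times).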
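The arithmetic above can be checked with a few lines of Python. The variable names are chosen for illustration only.

```python
import math

num_train_examples = 60000
batch_size = 32
epochs = 5

# Steps needed to see every training sample once.
steps_per_epoch = math.ceil(num_train_examples / batch_size)
# Every step performs one parameter update.
total_updates = steps_per_epoch * epochs

print(steps_per_epoch)  # 1875
print(total_updates)    # 9375
```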

The training code:

BATCH_SIZE = 32
train_dataset = train_dataset.repeat().shuffle(num_train_examples).batch(BATCH_SIZE)
test_dataset = test_dataset.batch(BATCH_SIZE)

model.fit(train_dataset, epochs=5, steps_per_epoch=math.ceil(num_train_examples/BATCH_SIZE))

Output during training:

Epoch 1/5
1875/1875 [==============================] - 24s 13ms/step - loss: 0.2735 - accuracy: 0.8981
Epoch 2/5
1875/1875 [==============================] - 14s 8ms/step - loss: 0.2719 - accuracy: 0.8995
Epoch 3/5
1875/1875 [==============================] - 14s 8ms/step - loss: 0.2613 - accuracy: 0.9018
Epoch 4/5
1875/1875 [==============================] - 13s 7ms/step - loss: 0.2457 - accuracy: 0.9087
Epoch 5/5
1875/1875 [==============================] - 13s 7ms/step - loss: 0.2407 - accuracy: 0.9091
<tensorflow.python.keras.callbacks.History at 0x7fe5305bca58>

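As the number of epochs increases, the loss decreases and the classification accuracy rises. The model's final classification accuracy on the training set is 90.91%.

1.6 Final evaluation of the model

So far the model has been trained on the training set, and training stopped after a manually chosen number of epochs. When training stopped, the model's classification accuracy on the training set was 91%. If we consider training complete, the last step is to evaluate the model on the test set. The test set contains new samples the model has never seen. Good performance on it indicates that the model generalizes well and has learned the essential features of this kind of data.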

test_loss, test_accuracy = model.evaluate(test_dataset, steps=math.ceil(num_test_examples/32))
print('Accuracy on test dataset:', test_accuracy)

The output:

313/313 [==============================] - 2s 6ms/step - loss: 0.3582 - accuracy: 0.8772
Accuracy on test dataset: 0.8772

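Because the test set is also batched with a batch size of 32, covering the whole test set takes ceil(10000/32) = ceil(312.5) = 313 steps. The final classification accuracy on the test set is 88%.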
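The half step in 312.5 corresponds to a final, smaller batch. The sketch below lists the batch sizes produced when a dataset does not divide evenly. The helper batch_sizes is a hypothetical name, and the sketch assumes the last partial batch is kept rather than dropped.

```python
def batch_sizes(num_examples, batch_size):
    """Sizes of consecutive batches when the last one may be partial."""
    full, remainder = divmod(num_examples, batch_size)
    return [batch_size] * full + ([remainder] if remainder else [])


sizes = batch_sizes(10000, 32)
print(len(sizes))  # 313
print(sizes[-1])   # 16
```

So 312 full batches of 32 images are followed by one batch of 16, which together cover all 10,000 test images.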

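1.7 Using the model for prediction and visualizing the results

Below, one batch (32 samples) is taken from the test set for prediction, and the true labels are kept in test_labels. For the first sample, both the predicted class and the true class are 6.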

for test_images, test_labels in test_dataset.take(1):
    test_images = test_images.numpy()
    test_labels = test_labels.numpy()
    predictions = model.predict(test_images)
np.argmax(predictions[0]), test_labels[0]  # (6, 6)
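The step from the 10 output values to a class label can be sketched without TensorFlow. The raw scores below are made up for illustration, and softmax and decode are hypothetical helper names. Only the class names come from the dataset's label table.

```python
import math

CLASS_NAMES = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
               'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    largest = max(logits)
    exps = [math.exp(z - largest) for z in logits]  # shift for stability
    total = sum(exps)
    return [e / total for e in exps]


def decode(probabilities):
    """Return (class index, class name, confidence) of the top class."""
    index = max(range(len(probabilities)), key=probabilities.__getitem__)
    return index, CLASS_NAMES[index], probabilities[index]


# Hypothetical raw scores for one image; index 6 has the largest score.
scores = [0.1, -1.0, 0.3, 0.0, 0.2, -2.0, 2.5, -1.5, 0.0, -0.5]
probabilities = softmax(scores)
index, name, confidence = decode(probabilities)
print(index, name)  # 6 Shirt
```

Taking the index of the largest probability is what np.argmax does in the code above. The probability itself can be read as the model's confidence in that class.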

Some of the results are visualized below:

def plot_image(i, predictions_array, true_labels, images):
    predictions_array, true_label, img = predictions_array[i], true_labels[i], images[i]
    plt.grid(False)
    plt.xticks([])
    plt.yticks([])

    plt.imshow(img[...,0], cmap=plt.cm.binary)

    predicted_label = np.argmax(predictions_array)
    if predicted_label == true_label:
        color = 'blue'
    else:
        color = 'red'

    plt.xlabel("{} {:2.0f}% ({})".format(class_names[predicted_label],
                                  100*np.max(predictions_array),
                                  class_names[true_label]),
                                  color=color)

def plot_value_array(i, predictions_array, true_label):
    predictions_array, true_label = predictions_array[i], true_label[i]
    plt.grid(False)
    plt.xticks([])
    plt.yticks([])
    thisplot = plt.bar(range(10), predictions_array, color="#777777")
    plt.ylim([0, 1])
    predicted_label = np.argmax(predictions_array)

    thisplot[predicted_label].set_color('red')
    thisplot[true_label].set_color('blue')

# Plot the first X test images, their predicted label, and the true label
# Color correct predictions in blue, incorrect predictions in red
num_rows = 5
num_cols = 3
num_images = num_rows*num_cols
plt.figure(figsize=(2*2*num_cols, 2*num_rows))
for i in range(num_images):
    plt.subplot(num_rows, 2*num_cols, 2*i+1)
    plot_image(i, predictions, test_labels, test_images)
    plt.subplot(num_rows, 2*num_cols, 2*i+2)
    plot_value_array(i, predictions, test_labels)
plt.show()

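The result:

Figure 1-4: Visualization of some of the results

In the figure above, blue text indicates a correct prediction and a blue bar marks the true class; red indicates a wrong prediction.

2. Convolutional neural network

Using nothing but fully connected layers with activation functions already gives quite good results: 88% accuracy on the test set. Implementing a convolutional neural network only requires changing the network structure (section 1.4.1, Constructing the network):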

model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, (3,3), padding='same', activation=tf.nn.relu,
                           input_shape=(28, 28, 1)),
    tf.keras.layers.MaxPooling2D((2, 2), strides=2),
    tf.keras.layers.Conv2D(64, (3,3), padding='same', activation=tf.nn.relu),
    tf.keras.layers.MaxPooling2D((2, 2), strides=2),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation=tf.nn.relu),
    tf.keras.layers.Dense(10,  activation=tf.nn.softmax)
])

Besides the Flatten and Dense layers seen earlier, there are two new layer types: Conv2D and MaxPooling2D.
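Before looking at the two new layer types in detail, it can help to trace how the shape of a single image changes through this network. The sketch below assumes stride-1 convolutions with padding='same' and pooling without padding, matching the settings in the code above. The helpers conv_same and max_pool are hypothetical names.

```python
def conv_same(shape, filters):
    """Conv2D with padding='same' and stride 1: height and width unchanged."""
    height, width, _ = shape
    return (height, width, filters)


def max_pool(shape, pool=2, stride=2):
    """MaxPooling2D without padding: each spatial dimension shrinks."""
    height, width, channels = shape
    return ((height - pool) // stride + 1,
            (width - pool) // stride + 1,
            channels)


shape = (28, 28, 1)
shape = conv_same(shape, 32)   # (28, 28, 32)
shape = max_pool(shape)        # (14, 14, 32)
shape = conv_same(shape, 64)   # (14, 14, 64)
shape = max_pool(shape)        # (7, 7, 64)
flattened = shape[0] * shape[1] * shape[2]
print(shape, flattened)  # (7, 7, 64) 3136
```

The Flatten layer therefore hands a vector of length 3136 to the Dense layers, compared with 784 in the plain network.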

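2.1 Convolutional layer

Conv2D is a 2D convolution layer. Its main parameters are:

filters: the number n of filters (also called kernels). Each filter is convolved over the entire output of the previous layer, giving n activation maps. In the network above, the first convolutional layer has n=32, i.e. 32 filters, so its output has shape (28, 28, 32);
kernel_size: the size of each filter. Only the kernel's height and width need to be set; its depth always equals the number of input channels. The images here are grayscale with a single channel (color images have 3), so the kernels of the first convolutional layer have depth 1, while those of the second have depth 32. Both convolutional layers above use (3, 3) filters;
padding: how the borders are padded. Without padding, information at the edges of the image is lost after filtering. Here the parameter is set to "same" everywhere, which pads the input with zeros so that the height and width stay unchanged after filtering;
activation: as in other layers, applies a non-linear transformation to the units' values.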
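The filters and kernel_size parameters together fix the number of trainable parameters of a convolutional layer. The following sketch works through that arithmetic for the two Conv2D layers above, assuming one bias per filter. The helper conv2d_param_count is a hypothetical name.

```python
def conv2d_param_count(kernel_h, kernel_w, in_channels, filters):
    """Each filter has kernel_h * kernel_w * in_channels weights plus a bias."""
    return (kernel_h * kernel_w * in_channels + 1) * filters


first_conv = conv2d_param_count(3, 3, in_channels=1, filters=32)
second_conv = conv2d_param_count(3, 3, in_channels=32, filters=64)
print(first_conv)   # 320
print(second_conv)  # 18496
```

The count does not depend on the image size, because the same small kernel is reused at every position of the image.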

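A schematic of what a convolutional layer does:

Figure 2-1: Filtering in a convolutional layer

In the figure above, the original image is on the left, the filter in the middle, and the result of the convolution on the right.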
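The filtering step can also be written out as a naive pure-Python loop. This is a sketch for a single-channel image and a square kernel of odd size, with zero padding so that the output keeps the input size. It is not how TensorFlow implements the layer, and the image and kernel values are made up.

```python
def conv2d_same(image, kernel):
    """Slide a square kernel over a 2-D image with zero padding ('same').

    Like Conv2D, this computes a cross-correlation (the kernel is not
    flipped). The output has the same height and width as the input.
    """
    k = len(kernel)
    pad = k // 2
    height, width = len(image), len(image[0])
    output = [[0] * width for _ in range(height)]
    for row in range(height):
        for col in range(width):
            total = 0
            for i in range(k):
                for j in range(k):
                    r, c = row + i - pad, col + j - pad
                    # Positions outside the image count as 0 (zero padding).
                    if 0 <= r < height and 0 <= c < width:
                        total += image[r][c] * kernel[i][j]
            output[row][col] = total
    return output


image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
# A kernel that simply sums each 3x3 neighbourhood.
kernel = [[1, 1, 1],
          [1, 1, 1],
          [1, 1, 1]]
result = conv2d_same(image, kernel)
print(result[1][1])  # 45
print(result[0][0])  # 12
```

The corner value is smaller than the centre value because five of the nine positions under the kernel fall on the zero padding.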

This article focuses on how to implement neural networks with TF 2.0. For more background on convolutional layers, see:

- http://cs231n.stanford.edu/syllabus.html (the parts on Convolutional Neural Networks)

- https://jhui.github.io/2017/03/16/CNN-Convolutional-neural-network/

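2.2 Max pooling layer

MaxPooling2D is a 2D max pooling layer. It downsamples its input, shrinking the image and making training easier. Max pooling is usually used together with convolution. Its main parameters are:

pool_size: the size of the pooling window. Both max pooling layers above use a (2, 2) window;
strides: the stride, i.e. how far the window moves each time. It is set to 2 above, so each window is offset from the previous one by two pixels.

Figure 2-2: Max pooling with a (2, 2) window and a stride of 2

Max pooling keeps only the maximum value in each window. As the figure above shows, with a (2, 2) window and a stride of 2 the (4, 4) image on the left contains just 4 windows, and taking the maximum of each gives the result on the right.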
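The same operation can be sketched in a few lines of pure Python. The 4x4 input below is made up for illustration and is not the one shown in Figure 2-2, and max_pool_2d is a hypothetical helper name.

```python
def max_pool_2d(image, pool=2, stride=2):
    """Keep only the maximum of each pool x pool window."""
    height, width = len(image), len(image[0])
    return [
        [
            max(image[r + i][c + j] for i in range(pool) for j in range(pool))
            for c in range(0, width - pool + 1, stride)
        ]
        for r in range(0, height - pool + 1, stride)
    ]


# A hypothetical 4x4 input.
image = [[1, 3, 2, 4],
         [5, 6, 7, 8],
         [3, 2, 1, 0],
         [1, 2, 3, 4]]
print(max_pool_2d(image))  # [[6, 8], [3, 4]]
```

Each of the four non-overlapping windows contributes exactly one value, so a 4x4 input becomes a 2x2 output.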

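2.3 Position invariance of CNNs

One of the main reasons convolutional neural networks suit image processing is position invariance: in image recognition, an object can be recognized regardless of where it appears in the image. This property comes from the convolution operation, which sweeps a small window (the kernel) across every local region of the image.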
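A one-dimensional toy example illustrates the idea. The same kernel is slid over two signals that contain an identical pattern at different positions. The response moves with the pattern, but its peak value is unchanged. The signals, the kernel and the helper correlate_1d are hypothetical.

```python
def correlate_1d(signal, kernel):
    """Slide the kernel over the signal (valid positions only)."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]


kernel = [1, 2, 1]                  # responds strongly to a small "bump"
left = [0, 1, 2, 1, 0, 0, 0, 0]     # bump near the start
right = [0, 0, 0, 0, 1, 2, 1, 0]    # same bump, shifted by 3 positions

response_left = correlate_1d(left, kernel)
response_right = correlate_1d(right, kernel)

print(response_left)   # [4, 6, 4, 1, 0, 0]
print(response_right)  # [0, 0, 1, 4, 6, 4]
print(max(response_left) == max(response_right))  # True
```

Taking a maximum over positions, as pooling layers do locally, then yields the same value wherever the pattern sits. This is one way to read the position invariance described above.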

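The convolutional neural network built from convolutional and max pooling layers raises the classification accuracy on the Fashion-MNIST test set to 92%.

3. Summary

General workflow for building a deep learning model

Prepare the dataset: identify the features, labels and total number of samples; split the data into a training set and a test set (sometimes also a validation set); preprocess the data (e.g. normalization);
Define the network structure: in Keras and TF 2.0 the layer is the basic building block, and any type of network can be assembled from basic layer types. Hyperparameters to decide here include the number of layers and each layer's type, activation function and number of units;
Compile the model: compiling the constructed network requires three choices: loss function, optimizer and metrics;
Train the model: requires a batch size and a number of epochs;
Evaluate the model: measure the model's performance on the test set.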

Choosing a loss function

Reference: https://keras.io/losses/

Binary classification: binary crossentropy
Multi-class classification: categorical crossentropy
Regression: mean squared error
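For reference, the three losses can be written as plain Python functions for small inputs. This is a simplified sketch of the textbook formulas, without the numerical safeguards a library adds, and the inputs are made up.

```python
import math


def binary_crossentropy(y_true, p):
    """y_true is 0 or 1; p is the predicted probability of class 1."""
    return -(y_true * math.log(p) + (1 - y_true) * math.log(1 - p))


def categorical_crossentropy(one_hot, probabilities):
    """one_hot marks the true class; probabilities sum to 1."""
    return -sum(t * math.log(p) for t, p in zip(one_hot, probabilities))


def mean_squared_error(y_true, y_pred):
    """Average squared difference between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


bce = binary_crossentropy(1, 0.8)
cce = categorical_crossentropy([0, 1, 0], [0.1, 0.7, 0.2])
mse = mean_squared_error([1.0, 2.0], [1.5, 2.5])
print(round(bce, 4))  # 0.2231
print(round(cce, 4))  # 0.3567
print(mse)            # 0.25
```

The sparse_categorical_crossentropy loss used earlier in this article computes the same quantity as categorical crossentropy, but takes the true class as an integer label instead of a one-hot vector.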

Choosing an optimizer

Reference: https://keras.io/optimizers/

RMSprop and Adam are the most widely used at present.

Metrics

Reference: https://keras.io/metrics/

Reference

https://github.com/zalandoresearch/fashion-mnist#why-we-made-fashion-mnist

https://arxiv.org/abs/1708.07747

https://medium.com/tensorflow/introducing-tensorflow-datasets-c7f01f7e19f3

https://towardsdatascience.com/a-comprehensive-guide-to-convolutional-neural-networks-the-eli5-way-3bd2b1164a53

https://datascience.stackexchange.com/questions/13663/neural-networks-loss-and-accuracy-correlation

https://keras.io/layers/convolutional/

https://keras.io/layers/pooling/

http://cs231n.stanford.edu/slides/2019/cs231n_2019_lecture05.pdf

https://blogs.nvidia.com/blog/2018/09/05/whats-the-difference-between-a-cnn-and-an-rnn/

https://github.com/OnlyBelter/examples/blob/master/courses/udacity_intro_to_tensorflow_for_deep_learning/l03c01_classifying_images_of_clothing.ipynb (code)

https://github.com/OnlyBelter/examples/blob/master/courses/udacity_intro_to_tensorflow_for_deep_learning/l04c01_image_classification_with_cnns.ipynb (code)

https://github.com/keras-team/keras-docs-zh (the translation of some terms follows this documentation)

Deep Learning with Python, by François Chollet, 2017.11