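Kaggle hosts an image classification competition called Digit Recognizer, whose dataset is the well-known MNIST. Each image is an already segmented 28×28 grayscale image: pixels belonging to the handwritten digit take grayscale values in the range 0 to 255, and background pixels are 0.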
```python
from keras.datasets import mnist

(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train[0]  # a single image; x_train[0].shape == (28, 28)
"""
[[  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0]
 ...
 [  0   0   0   0   0   0   0   0   0   0   0   0   3  18  18  18 126 136 175  26 166 255 247 127   0   0   0   0]
 [  0   0   0   0   0   0   0   0  30  36  94 154 170 253 253 253 253 253 225 172 253 242 195  64   0   0   0   0]
 ...
 [  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0]]
"""
```
The handwritten digit images look like this:
```python
import matplotlib.pyplot as plt

# Show the first three training images side by side.
plt.subplot(1, 3, 1)
plt.imshow(x_train[0], cmap='gray')
plt.subplot(1, 3, 2)
plt.imshow(x_train[1], cmap='gray')
plt.subplot(1, 3, 3)
plt.imshow(x_train[2], cmap='gray')
plt.show()
```
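The same structure (nonzero ink on a zero background) can also be inspected as text, without a plotting library. The following is a minimal sketch in plain Python; `to_ascii` and `toy_image` are hypothetical names, and the 5×5 values are made up for illustration rather than taken from MNIST.

```python
def to_ascii(image, threshold=0):
    """Render a 2-D grayscale image as text: '#' for ink, '.' for background."""
    return "\n".join(
        "".join("#" if pixel > threshold else "." for pixel in row)
        for row in image
    )


# Hypothetical 5x5 toy image (not real MNIST data): a single vertical stroke.
toy_image = [
    [0, 0, 200, 0, 0],
    [0, 0, 255, 0, 0],
    [0, 0, 253, 0, 0],
    [0, 0, 180, 0, 0],
    [0, 0,  90, 0, 0],
]

print(to_ascii(toy_image))
```

Because the function only iterates over rows and pixels, it should also accept a NumPy array such as `x_train[0]`.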
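Handwritten digit recognition can be viewed as an image classification problem: classifying grayscale images, each represented as a 2-D array.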
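Rodrigo Benenson has compiled the error rates of 50 methods on MNIST. This article moves from traditional methods to deep learning and compares their accuracy. The code below is based on Python 3.6 + sklearn 0.18.1 + keras 2.0.4.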
kNN
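The idea is simple: flatten each 2-D array into a 1-D vector and measure the similarity between vectors with a distance metric. This naive approach, with no feature extraction, clearly discards the most important information in the 2-D image: which pixels are adjacent to each other. On a relatively clean dataset such as MNIST it still performs well, scoring 96.927% (the code below reports macro-averaged precision rather than plain accuracy). kNN is also slow: fitting does little more than store or index the training set, so most of the cost is paid at prediction time, when each test sample has to be compared against the training samples.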
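The flatten-then-compare idea described above can be sketched in plain Python. This is only an illustration under stated assumptions: the helper names (`flatten`, `squared_euclidean`, `knn_predict`) are hypothetical, the 2×2 "images" are made up, and the brute-force search is not how sklearn implements `KNeighborsClassifier`.

```python
from collections import Counter


def flatten(image):
    """Flatten a 2-D image (a list of rows) into a 1-D list of pixel values."""
    return [pixel for row in image for pixel in row]


def squared_euclidean(u, v):
    """Squared Euclidean distance; ranking by it equals ranking by the distance."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def knn_predict(train_vectors, train_labels, query, k=3):
    """Majority vote among the k training vectors closest to `query`."""
    order = sorted(
        range(len(train_vectors)),
        key=lambda i: squared_euclidean(train_vectors[i], query),
    )
    nearest_labels = [train_labels[i] for i in order[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Made-up 2x2 images: label 0 has a bright top row, label 1 a bright bottom row.
train_images = [
    [[255, 255], [0, 0]],
    [[250, 240], [0, 10]],
    [[0, 0], [255, 255]],
    [[5, 0], [245, 250]],
]
train_labels = [0, 0, 1, 1]
train_vectors = [flatten(img) for img in train_images]

query = flatten([[230, 255], [20, 0]])
prediction = knn_predict(train_vectors, train_labels, query, k=3)
print(prediction)  # 0: the two closest training vectors both carry label 0
```

Once the image is flattened, the pixel directly below position `i` sits at `i + width`, and the distance function treats it like any other coordinate. That is the loss of neighborhood information mentioned above.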
```python
from sklearn import neighbors
from sklearn.metrics import precision_score

# Flatten each 28x28 image into a 784-dimensional vector.
num_pixels = x_train[0].shape[0] * x_train[0].shape[1]
x_train = x_train.reshape((x_train.shape[0], num_pixels))
x_test = x_test.reshape((x_test.shape[0], num_pixels))

knn = neighbors.KNeighborsClassifier()
knn.fit(x_train, y_train)

pred = knn.predict(x_test)
# Macro-averaged precision over the 10 digit classes (not plain accuracy).
precision_score(y_test, pred, average='macro')
# 0.96927533865705706
```
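Note that `precision_score(..., average='macro')` and accuracy are different quantities, although they are often close on a balanced dataset. The sketch below, in plain Python with made-up labels, shows how the two can diverge. The function names are hypothetical, and scoring a never-predicted class as 0 is an assumption of this sketch. For plain accuracy, sklearn provides `sklearn.metrics.accuracy_score`.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_precision(y_true, y_pred):
    """Unweighted mean over classes of (correct predictions of c) / (predictions of c)."""
    classes = sorted(set(y_true) | set(y_pred))
    per_class = []
    for c in classes:
        true_labels_where_predicted_c = [t for t, p in zip(y_true, y_pred) if p == c]
        if true_labels_where_predicted_c:
            hits = sum(t == c for t in true_labels_where_predicted_c)
            per_class.append(hits / len(true_labels_where_predicted_c))
        else:
            # Assumption of this sketch: a class that is never predicted scores 0.
            per_class.append(0.0)
    return sum(per_class) / len(per_class)


# Made-up labels: six samples of class 0, two of class 1, one mistake.
y_true_toy = [0, 0, 0, 0, 0, 0, 1, 1]
y_pred_toy = [0, 0, 0, 0, 0, 1, 1, 1]

print(accuracy(y_true_toy, y_pred_toy))         # 7 of 8 correct
print(macro_precision(y_true_toy, y_pred_toy))  # mean of 5/5 and 2/3
```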
MLP
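The multilayer perceptron (MLP) is used here as a three-layer feed-forward neural network: an input layer, one hidden layer and an output layer. It uses the same features as the kNN method: the grayscale value of each pixel feeds one neuron of the input layer. The hidden layer has 700 neurons (a number usually chosen between the sizes of the input and output layers). sklearn's MLPClassifier implements MLP classification; a keras-based implementation is given below. Without careful hyperparameter tuning, the accuracy is around 98.530%.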
```python
from keras.layers import Dense
from keras.models import Sequential
from keras.utils import np_utils

# Flatten the images and normalize pixel values to [0, 1].
num_pixels = 28 * 28
x_train = x_train.reshape(x_train.shape[0], num_pixels).astype('float32') / 255
x_test = x_test.reshape(x_test.shape[0], num_pixels).astype('float32') / 255

# One-hot encode the class labels.
y_train = np_utils.to_categorical(y_train)
y_test = np_utils.to_categorical(y_test)
num_classes = y_train.shape[1]

model = Sequential([
    Dense(700, input_dim=num_pixels, activation='relu', kernel_initializer='normal'),  # hidden layer
    Dense(num_classes, activation='softmax', kernel_initializer='normal')  # output layer
])
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
model.summary()

model.fit(x_train, y_train, validation_data=(x_test, y_test), epochs=600, batch_size=200, verbose=2)
model.evaluate(x_test, y_test, verbose=0)  # returns [loss, accuracy]
# [0.10381294689745164, 0.98529999999999995]
```
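To make the structure of this network concrete, here is a minimal plain-Python sketch of the forward pass of a dense, ReLU, dense, softmax stack, plus the parameter count for the 784-700-10 layout used above. The function names and the tiny 3-2-2 network with made-up weights are assumptions for illustration only; keras does this with vectorized tensor operations.

```python
import math


def dense(inputs, weights, biases):
    """Fully connected layer: one output per (weight row, bias) pair."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def relu(values):
    return [max(0.0, v) for v in values]


def softmax(values):
    """Numerically stable softmax: shift by the maximum before exponentiating."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def dense_param_count(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out


# Tiny made-up network: 3 inputs -> 2 hidden units -> 2 output classes.
hidden_w = [[0.5, -0.2, 0.1], [-0.3, 0.8, 0.4]]
hidden_b = [0.0, 0.1]
out_w = [[1.0, -1.0], [-1.0, 1.0]]
out_b = [0.0, 0.0]

x = [1.0, 0.5, 0.0]
hidden = relu(dense(x, hidden_w, hidden_b))
probs = softmax(dense(hidden, out_w, out_b))
print(probs)  # two class probabilities that sum to 1

# Parameter count of the 784 -> 700 -> 10 network from the keras code above.
total_params = dense_param_count(784, 700) + dense_param_count(700, 10)
print(total_params)  # 556510
```

If the arithmetic is right, `model.summary()` should report the same total number of trainable parameters.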
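CNN
LeCun proposed using CNNs (convolutional neural networks) for handwritten digit recognition as early as his 1989 paper [1]; a later paper [2] refined the design into LeNet-5, whose network architecture is shown in the figure below:
[Figure: LeNet-5 network architecture]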
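The structure is convolution, pooling, convolution, pooling, followed by two fully connected layers and finally a Gaussian connection layer. As is well known, a CNN performs feature extraction on its own, so there is no need to hand-design a feature extractor. An unofficial keras implementation of LeNet-5 follows: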
```python
import keras
from keras.datasets import mnist
from keras.layers import Conv2D, MaxPooling2D
from keras.layers import Dense, Dropout, Flatten, Activation
from keras.models import Sequential
from keras.utils import np_utils

img_rows, img_cols = 28, 28

# Reload the raw data: the MLP block above already flattened, scaled and
# one-hot encoded these arrays, and repeating those steps would corrupt them.
(x_train, y_train), (x_test, y_test) = mnist.load_data()

# TensorFlow backend: image_data_format() == 'channels_last'
x_train = x_train.reshape(x_train.shape[0], img_rows, img_cols, 1).astype('float32') / 255
x_test = x_test.reshape(x_test.shape[0], img_rows, img_cols, 1).astype('float32') / 255

# One-hot encode the class labels.
y_train = np_utils.to_categorical(y_train)
y_test = np_utils.to_categorical(y_test)
num_classes = y_train.shape[1]

model = Sequential()
# Convolution -> pooling -> sigmoid, twice.
model.add(Conv2D(filters=6, kernel_size=(5, 5), padding='valid', input_shape=(28, 28, 1)))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Activation("sigmoid"))
model.add(Conv2D(16, kernel_size=(5, 5), padding='valid'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Activation("sigmoid"))
model.add(Dropout(0.25))
# Stand-in for LeNet-5's C5 layer: a 1x1 convolution with 120 filters (no activation).
model.add(Conv2D(120, kernel_size=(1, 1), padding='valid'))
model.add(Flatten())
# Fully connected layers.
model.add(Dense(84, activation='sigmoid'))
model.add(Dense(num_classes, activation='softmax'))

model.compile(loss=keras.losses.categorical_crossentropy,
              optimizer=keras.optimizers.SGD(lr=0.08, momentum=0.9),
              metrics=['accuracy'])
model.summary()

model.fit(x_train, y_train, batch_size=32, epochs=8,
          verbose=1, validation_data=(x_test, y_test))
model.evaluate(x_test, y_test, verbose=0)  # returns [loss, accuracy]
```
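The feature-map sizes and parameter counts of the model above can be checked by hand. The sketch below does the arithmetic in plain Python, assuming stride-1 'valid' convolutions and non-overlapping 2×2 pooling (the keras defaults used in the code); the helper names are hypothetical. For comparison it also traces a 32×32 input, the input size used in the LeNet-5 paper [2].

```python
def conv_valid(size, kernel):
    """Output width of a 'valid' (no padding, stride 1) convolution."""
    return size - kernel + 1


def max_pool(size, window):
    """Output width of non-overlapping pooling (stride equal to the window)."""
    return size // window


def conv_params(kernel, in_channels, out_channels):
    """Weights plus biases of a square-kernel convolution layer."""
    return kernel * kernel * in_channels * out_channels + out_channels


def dense_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out


def trace_conv_pool_twice(input_size):
    """Width after conv 5x5 -> pool 2x2 -> conv 5x5 -> pool 2x2."""
    size = conv_valid(input_size, 5)
    size = max_pool(size, 2)
    size = conv_valid(size, 5)
    return max_pool(size, 2)


side = trace_conv_pool_twice(28)  # 28 -> 24 -> 12 -> 8 -> 4
flat = side * side * 120          # the 1x1 convolution keeps the 4x4 grid
total_params = (
    conv_params(5, 1, 6)
    + conv_params(5, 6, 16)
    + conv_params(1, 16, 120)
    + dense_params(flat, 84)
    + dense_params(84, 10)
)
print(side, flat, total_params)   # 4 1920 166826

# With the paper's 32x32 input the same stack ends at 5x5, where a further
# 5x5 'valid' convolution yields 1x1 maps, i.e. behaves like a dense layer.
print(trace_conv_pool_twice(32))  # 5
```

With a 28×28 input the maps entering the 120-filter layer are 4×4, so the 1×1 convolution here is not equivalent to a fully connected layer; most of the parameters end up in the `Dense(84)` layer that follows `Flatten`.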
The accuracy of the three methods above is as follows:
| Features | Classifier | Accuracy |
|---|---|---|
| gray | kNN | 96.927% |
| gray | 3-layer neural network | 98.530% |
| | LeNet-5 | 98.640% |
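
Since results on MNIST are usually quoted as error rates, the figures in the table can be converted with a line of arithmetic. The snippet below is a small sketch; the dictionary simply restates the table, and `error_rate` is a hypothetical helper.

```python
def error_rate(accuracy_percent):
    """Convert an accuracy in percent into an error rate in percent."""
    return round(100.0 - accuracy_percent, 3)


# Values restated from the table above.
results = {
    "kNN": 96.927,
    "3-layer neural network": 98.530,
    "LeNet-5": 98.640,
}

errors = {name: error_rate(acc) for name, acc in results.items()}
for name, err in errors.items():
    print(f"{name}: {err}% error")
```

Seen this way, the two neural networks roughly halve the error of kNN (about 3.07% versus about 1.47% and 1.36%), while the gap between the MLP and this LeNet-5 variant is small.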
[1] LeCun, Yann, et al. "Backpropagation applied to handwritten zip code recognition." Neural computation 1.4 (1989): 541-551.
[2] LeCun, Yann, et al. "Gradient-based learning applied to document recognition." Proceedings of the IEEE 86.11 (1998): 2278-2324.
[3] Taylor B. Arnold, Computer vision: LeNet-5, AlexNet, VGG-19, GoogLeNet.