Going further and further down the road of image recognition ✌( •̀ ω •́ )y
1. A quick explanation
It's late at night and my brain isn't entirely clear; most of the code is borrowed from GitHub...
This CNN image-recognition network will later be used to benchmark the performance of an NVIDIA DGX server, so the training time is deliberately made as long as possible.
The server carries 8 NVIDIA Tesla V100 cards, currently the top deep-learning accelerator: a single card sells for 1.02 million RMB and the whole machine goes for nearly 10 million. A nuclear bomb at an astronomical price; it must be nice to be rich. According to information online, this server can finish in 8 hours what a Titan X needs 8 days for, and what a top consumer CPU would need several months for.
The network borrows from an image-recognition project on GitHub: it uses a DenseNet model, with an ImageDataGenerator added to augment the dataset. The plan is to vary the `epochs` constant later and run the same job on each platform.
Since it's the middle of the night and the GPU isn't configured yet, I set `epochs` to 1 and run on the CPU first, then estimate from experience how long it would take on a GTX 1080.
2. Dataset description
Training uses the CIFAR-10 dataset: 60,000 32x32-pixel color images, each belonging to one of 10 classes, as shown below:
For details, see the University of Toronto page: http://www.cs.toronto.edu/~kr...
The goal of the network is to classify each image into its correct class as accurately as possible.
After downloading the dataset, simply rename it and drop it into the `.keras\datasets` folder under your user directory:
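If you're not sure where that folder lives, the default cache location can be checked from Python (a minimal sketch; the only assumption is Keras's standard `~/.keras/datasets` cache directory):

```python
import os

# Keras caches downloaded datasets under ~/.keras/datasets by default
# (on Windows this resolves to C:\Users\<name>\.keras\datasets)
cache_dir = os.path.expanduser(os.path.join('~', '.keras', 'datasets'))
print(os.listdir(cache_dir))
```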
After extraction you can see the dataset is split into 6 batches: 5 training batches and 1 test batch:
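As a side note, each batch file (in the Python version of the dataset) is a pickled dict; a minimal sketch to peek inside one, assuming the extracted `cifar-10-batches-py` folder sits in the current directory:

```python
import pickle

# each batch file is a pickled dict holding raw pixel data and labels
with open('cifar-10-batches-py/data_batch_1', 'rb') as f:
    batch = pickle.load(f, encoding='bytes')

print(batch.keys())          # dict_keys([b'batch_label', b'labels', b'data', b'filenames'])
print(batch[b'data'].shape)  # (10000, 3072): 10000 images, 32*32*3 flattened
```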
3. Late at night and in a hurry, so straight to the code
Import the third-party libraries (numpy/keras/math):
```python
import numpy as np
import keras
import math
from keras.datasets import cifar10
from keras.preprocessing.image import ImageDataGenerator
from keras.layers.normalization import BatchNormalization
from keras.layers import Conv2D, Dense, Input, add, Activation, AveragePooling2D, GlobalAveragePooling2D
from keras.layers import Lambda, concatenate
from keras.initializers import he_normal
from keras.layers.merge import Concatenate
from keras.callbacks import LearningRateScheduler, TensorBoard, ModelCheckpoint
from keras.models import Model
from keras import optimizers
from keras import regularizers
from keras.utils.vis_utils import plot_model as plot
```
Set the constants:
```python
growth_rate = 12
depth = 100
compression = 0.5

img_rows, img_cols = 32, 32  # image size
img_channels = 3             # color channels (RGB)
num_classes = 10             # number of classes in the dataset
batch_size = 64              # examples per training batch; only 64 or 32 here
epochs = 1                   # passes over the full dataset; one CPU run for now
                             # change the epoch count to suit the card under test
                             # ~250 epochs gives good accuracy, but accuracy
                             # isn't the concern here
iterations = 782             # steps per epoch
weight_decay = 0.0001

mean = [125.307, 122.95, 113.865]  # per-channel mean of CIFAR-10
std = [62.9932, 62.0887, 66.7048]  # per-channel std of CIFAR-10
```
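The hard-coded 782 is just the number of 64-example batches in the 50,000-image training set; it could equally be derived from the constants above (using the already-imported `math`):

```python
# ceil(50000 / 64) = 782 steps to cover the training set once
iterations = int(math.ceil(50000 / float(batch_size)))
```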
The scheduler returns a smaller learning rate the further training has progressed, meaning we want the random fluctuations of the updates to gradually shrink:
```python
def scheduler(epoch):
    if epoch <= 100:
        return 0.1
    if epoch <= 180:
        return 0.01
    return 0.0005
```
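A quick sanity print of the schedule at the boundary epochs (not part of training, just a check):

```python
for e in (1, 100, 101, 180, 181, 250):
    print(e, scheduler(e))  # 0.1 through epoch 100, 0.01 through 180, then 0.0005
```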
Define the DenseNet model (GitHub porter reporting for duty!):
```python
def densenet(img_input, classes_num):
    def bn_relu(x):
        # pre-activation: BatchNorm then ReLU before every convolution
        x = BatchNormalization()(x)
        x = Activation('relu')(x)
        return x

    def bottleneck(x):
        # 1x1 conv up to 4*growth_rate channels, then 3x3 conv down to growth_rate
        channels = growth_rate * 4
        x = bn_relu(x)
        x = Conv2D(channels, kernel_size=(1, 1), strides=(1, 1), padding='same',
                   kernel_initializer=he_normal(),
                   kernel_regularizer=regularizers.l2(weight_decay), use_bias=False)(x)
        x = bn_relu(x)
        x = Conv2D(growth_rate, kernel_size=(3, 3), strides=(1, 1), padding='same',
                   kernel_initializer=he_normal(),
                   kernel_regularizer=regularizers.l2(weight_decay), use_bias=False)(x)
        return x

    def single(x):
        # plain 3x3 conv variant (unused while bottleneck layers are enabled)
        x = bn_relu(x)
        x = Conv2D(growth_rate, kernel_size=(3, 3), strides=(1, 1), padding='same',
                   kernel_initializer=he_normal(),
                   kernel_regularizer=regularizers.l2(weight_decay), use_bias=False)(x)
        return x

    def transition(x, inchannels):
        # compress channels by `compression` and halve the spatial resolution
        x = bn_relu(x)
        x = Conv2D(int(inchannels * compression), kernel_size=(1, 1), strides=(1, 1),
                   padding='same', kernel_initializer=he_normal(),
                   kernel_regularizer=regularizers.l2(weight_decay), use_bias=False)(x)
        x = AveragePooling2D((2, 2), strides=(2, 2))(x)
        return x

    def dense_block(x, blocks, nchannels):
        # each layer's output is concatenated with everything that came before it
        concat = x
        for i in range(blocks):
            x = bottleneck(concat)
            concat = concatenate([x, concat], axis=-1)
            nchannels += growth_rate
        return concat, nchannels

    def dense_layer(x):
        return Dense(classes_num, activation='softmax',
                     kernel_initializer=he_normal(),
                     kernel_regularizer=regularizers.l2(weight_decay))(x)

    # nblocks = (depth - 4) // 3
    nblocks = (depth - 4) // 6
    nchannels = growth_rate * 2

    # stem convolution, then 3 dense blocks separated by 2 transitions
    x = Conv2D(nchannels, kernel_size=(3, 3), strides=(1, 1), padding='same',
               kernel_initializer=he_normal(),
               kernel_regularizer=regularizers.l2(weight_decay), use_bias=False)(img_input)
    x, nchannels = dense_block(x, nblocks, nchannels)
    x = transition(x, nchannels)
    x, nchannels = dense_block(x, nblocks, nchannels)
    x = transition(x, nchannels)
    x, nchannels = dense_block(x, nblocks, nchannels)
    x = bn_relu(x)
    x = GlobalAveragePooling2D()(x)
    x = dense_layer(x)
    return x
```
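As a sanity check on `(depth - 4) // 6`: each bottleneck is 2 convolutions, and the 4 covers the initial convolution, the two transition convolutions, and the final dense layer, so depth = 100 really does mean 100 weighted layers:

```python
nblocks = (100 - 4) // 6                # 16 bottleneck layers per dense block
layers = 1 + 3 * (2 * nblocks) + 2 + 1  # stem conv + 3 blocks of 2-conv bottlenecks + 2 transitions + dense
print(nblocks, layers)                  # -> 16 100
```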
Load the dataset, one-hot encode the labels, and convert the image data to float32:
```python
(x_train, y_train), (x_test, y_test) = cifar10.load_data()
y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)
x_train = x_train.astype('float32')
x_test = x_test.astype('float32')
```
Normalize the dataset per channel to make training easier:
```python
for i in range(3):
    x_train[:, :, :, i] = (x_train[:, :, :, i] - mean[i]) / std[i]
    x_test[:, :, :, i] = (x_test[:, :, :, i] - mean[i]) / std[i]
```
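The hard-coded mean/std are the CIFAR-10 training set's per-channel statistics. If you'd rather not trust magic numbers, they can be recomputed (run this before the in-place normalization above):

```python
# per-channel mean and std over every training pixel
print(np.mean(x_train, axis=(0, 1, 2)))  # ~[125.3, 123.0, 113.9]
print(np.std(x_train, axis=(0, 1, 2)))   # ~[63.0, 62.1, 66.7]
```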
Define the model and plot a diagram of it. The model graph printed in the shell is too long to paste here, ridiculously long; if you need it, just print the summary in the shell yourself:
```python
img_input = Input(shape=(img_rows, img_cols, img_channels))
output = densenet(img_input, num_classes)
model = Model(img_input, output)
# model.load_weights('ckpt.h5')
print(model.summary())
plot(model, to_file='cnn_model.png', show_shapes=True)
```
The model's parameter counts are shown below. This is the annoying part of image recognition: far too many parameters, massive amounts of differentiation. No wonder the astronomically priced nuclear bombs still sell so well:
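If you just want the headline number without scrolling through the whole summary, Keras can report it directly:

```python
# total parameter count of the model
print(model.count_params())
```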
At heart this is still a classification problem, so cross-entropy serves as the loss function that scores how good the output is:
```python
sgd = optimizers.SGD(lr=.1, momentum=0.9, nesterov=True)
model.compile(loss='categorical_crossentropy', optimizer=sgd, metrics=['accuracy'])
```
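For reference, with one-hot labels $y$ and predicted class probabilities $\hat{y}$, the categorical cross-entropy for a single example is (the L2 weight-decay penalties attached to each layer are added on top of this):

```latex
L = -\sum_{i=1}^{10} y_i \log \hat{y}_i
```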
Set up the callbacks (note that with `period=10` the checkpoint is only written every 10 epochs, so it won't fire at all during this 1-epoch CPU test run):
```python
tb_cb = TensorBoard(log_dir='./densenet/', histogram_freq=0)
change_lr = LearningRateScheduler(scheduler)
ckpt = ModelCheckpoint('./ckpt.h5', save_best_only=False, mode='auto', period=10)
cbks = [change_lr, tb_cb, ckpt]
```
Add real-time data augmentation, applying some random transforms to the images: horizontal flips plus small horizontal and vertical shifts (the exposed border is filled with zeros):
```python
print('Using real-time data augmentation.')
datagen = ImageDataGenerator(horizontal_flip=True,
                             width_shift_range=0.125,
                             height_shift_range=0.125,
                             fill_mode='constant',
                             cval=0.)
datagen.fit(x_train)
```
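Pulling one augmented batch is a cheap way to confirm the generator produces what the model expects:

```python
# grab a single augmented batch and check the shapes
xb, yb = next(datagen.flow(x_train, y_train, batch_size=batch_size))
print(xb.shape, yb.shape)  # (64, 32, 32, 3) (64, 10)
```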
Train the model:
```python
model.fit_generator(datagen.flow(x_train, y_train, batch_size=batch_size),
                    steps_per_epoch=iterations,
                    epochs=epochs,
                    callbacks=cbks,
                    validation_data=(x_test, y_test))
model.save('densenet.h5')
```
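Since the whole point is benchmarking, it may be worth timing the run explicitly rather than reading it off the progress bar; a minimal sketch wrapping the same call (`time` is an extra import):

```python
import time

start = time.time()
model.fit_generator(datagen.flow(x_train, y_train, batch_size=batch_size),
                    steps_per_epoch=iterations, epochs=epochs,
                    callbacks=cbks, validation_data=(x_test, y_test))
elapsed = time.time() - start
print('total: %.0f s, per epoch: %.0f s' % (elapsed, elapsed / epochs))
```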
During training the CPU (an i7-7820HK) is pegged at full load:
A single epoch on the CPU takes close to 10,000 seconds:
Based on earlier experience with a handwritten-digit recognition model (12 s per epoch on the CPU vs. 0.47 s on a GTX 1080, so the GPU is roughly 25.7x faster), setting this program's epochs to 2500 would take a GTX 1080 about 270 hours: 10,000 s / 25.7 ≈ 390 s per epoch, times 2500 epochs ≈ 970,000 s.
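The same back-of-the-envelope estimate in code, with every number taken from the measurements above:

```python
cpu_epoch_s = 10000.0  # measured: one epoch of this model on the i7-7820HK
speedup = 12.0 / 0.47  # CPU-vs-GTX1080 ratio from the earlier digit-recognition test (~25.5x)
target_epochs = 2500

gpu_hours = cpu_epoch_s / speedup * target_epochs / 3600.0
print('%.0f hours' % gpu_hours)  # ~272 hours
```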
And what will it look like on the V100 nuclear bomb? I'll give it a try tomorrow!