【PaddlePaddle Series】Handwritten Digit Recognition

Date: 2020-05-09

Tags: PaddlePaddle series, handwritten digit recognition


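Recently, Baidu has been launching all kinds of competitions to promote PaddlePaddle, its in-house deep learning framework. Baidu describes PaddlePaddle as an "easy to learn, easy to use" open-source deep learning framework, yet there is very little material about it online. Baidu has clearly put effort into its documentation, which is available in both Chinese and English. The real problem is that when something throws an error, it is hard to find a fix anywhere online. To prepare for Baidu's competitions next year, I started learning PaddlePaddle.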

1. Installation

PaddlePaddle also supports CUDA-accelerated computation. If you don't have an NVIDIA graphics card, install the CPU version.

CPU version: pip install paddlepaddle

For the GPU version, the package depends on the installed CUDA and cuDNN versions:

CUDA9 + cuDNN7.0: pip install paddlepaddle-gpu

CUDA8 + cuDNN7.0: pip install paddlepaddle-gpu==0.14.0.post87

CUDA8 + cuDNN5.0: pip install paddlepaddle-gpu==0.14.0.post85
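For reference, the install list above can be encoded as a small lookup. This is a hypothetical helper: PIP_COMMANDS and pip_command are names made up for this sketch. The version pins are copied from the 0.14-era list above, so newer PaddlePaddle releases will need different packages.

```python
# Hypothetical helper: PIP_COMMANDS and pip_command are names made up for this sketch.
# Package names and version pins are copied from the install list above
# (PaddlePaddle 0.14-era releases).
PIP_COMMANDS = {
    None: "pip install paddlepaddle",  # CPU-only build
    ("9", "7.0"): "pip install paddlepaddle-gpu",
    ("8", "7.0"): "pip install paddlepaddle-gpu==0.14.0.post87",
    ("8", "5.0"): "pip install paddlepaddle-gpu==0.14.0.post85",
}


def pip_command(cuda=None, cudnn=None):
    """Return the pip command for a CUDA/cuDNN pair, or the CPU build if cuda is None."""
    if cuda is None:
        return PIP_COMMANDS[None]
    key = (cuda, cudnn)
    if key not in PIP_COMMANDS:
        raise ValueError("no prebuilt package listed for CUDA %s + cuDNN %s" % key)
    return PIP_COMMANDS[key]


print(pip_command())            # pip install paddlepaddle
print(pip_command("8", "7.0"))  # pip install paddlepaddle-gpu==0.14.0.post87
```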

2. Learning handwritten digit recognition

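Paddle's GitHub repository actually provides this example. In my opinion, however, parts of it call PaddlePaddle's internal classes directly, which makes it hard to read. This is especially true of the reader used for data input (feed). Reading the program, you see a single function call that handles all of the image input, with no clue how it works. So this post focuses on that part, which I think is one of the bigger differences from TensorFlow.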

2.1 Building the network

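The program provides three network models. The code is self-explanatory, so I'll just paste it here. Note that PaddlePaddle puts the channel dimension first, i.e. [C H W], rather than [H W C].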
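To make the layout difference concrete, here is a sketch that uses only NumPy (no PaddlePaddle) and dummy arrays. It converts an [H W C] image to [C H W] with transpose.

```python
import numpy as np

# A dummy 28x28 RGB image in the [H W C] layout used by e.g. PIL or TensorFlow.
hwc = np.arange(28 * 28 * 3, dtype=np.float32).reshape(28, 28, 3)

# Move the channel axis to the front to get PaddlePaddle's [C H W] layout.
chw = hwc.transpose(2, 0, 1)
print(hwc.shape, "->", chw.shape)  # (28, 28, 3) -> (3, 28, 28)

# Pixel values are unchanged; only the axis order differs.
assert chw[1, 5, 7] == hwc[5, 7, 1]

# MNIST is grayscale, so a single 28x28 image becomes [1, 28, 28],
# matching shape=[1, 28, 28] in the data layers below.
gray = np.zeros((28, 28), dtype=np.float32)
print(gray[np.newaxis, :, :].shape)  # (1, 28, 28)
```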

(1) Single fully connected layer + softmax

import paddle.fluid as fluid

# a single fully connected layer network using softmax as the activation function
def softmax_regression():
    img = fluid.layers.data(name='img', shape=[1, 28, 28], dtype='float32')
    predict = fluid.layers.fc(input=img, size=10, act='softmax')
    return predict

(2) Multiple fully connected layers + softmax

# 3 fully connected layers: ReLU in the hidden layers, softmax on the output layer
def multilayer_perceptron():
    img = fluid.layers.data(name='img', shape=[1, 28, 28], dtype='float32')
    # softmax belongs on the output layer; hidden layers use ReLU
    hidden = fluid.layers.fc(input=img, size=128, act='relu')
    hidden = fluid.layers.fc(input=hidden, size=64, act='relu')
    prediction = fluid.layers.fc(input=hidden, size=10, act='softmax')
    return prediction

(3) Convolutional neural network

# traditional convolutional neural network
def cnn():
    img = fluid.layers.data(name='img',shape=[1, 28, 28], dtype ='float32')
    # first conv pool
    conv_pool_1 = fluid.nets.simple_img_conv_pool(
        input = img,
        filter_size = 5,
        num_filters = 20,
        pool_size=2,
        pool_stride=2,
        act="relu")
    conv_pool_1 = fluid.layers.batch_norm(conv_pool_1)
    # second conv pool
    conv_pool_2 = fluid.nets.simple_img_conv_pool(
        input=conv_pool_1,
        filter_size=5,
        num_filters=50,
        pool_size=2,
        pool_stride=2,
        act="relu")
    # output layer with softmax activation function. size = 10 since there are only 10 possible digits.
    prediction = fluid.layers.fc(input=conv_pool_2, size=10, act='softmax')
    return prediction

2.2 Building the loss function

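Building a loss function in PaddlePaddle is basically the same as in TensorFlow, with two points worth noting:

(1) In TensorFlow, the cross-entropy function is computed against a one-hot vector as long as the output, such as [0 0 0 ... 1 ...]. In PaddlePaddle, cross_entropy takes the integer class label directly.

(2) The label input uses the int64 data type, even though the reader generator returns a Python int. I tried changing it to int32, but that raised an error, and int32 also caused errors in my other experiments.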
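The two label conventions compute the same number. Picking -log(p) at the true class index gives the same result as the one-hot sum. Below is a small check in plain NumPy, not PaddlePaddle, using made-up probabilities.

```python
import numpy as np

# Softmax output for a batch of 2 samples over 3 classes (rows sum to 1).
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.3, 0.6]])

# PaddlePaddle-style label: one int64 class index per sample, shape [N, 1].
labels = np.array([[0], [2]], dtype=np.int64)

# TensorFlow-style label: the same classes as one-hot vectors, shape [N, 3].
one_hot = np.eye(3)[labels[:, 0]]

# One-hot cross entropy: -sum_k y_k * log(p_k) for each sample.
ce_one_hot = -(one_hot * np.log(probs)).sum(axis=1)

# Integer-label cross entropy: take -log(p) at the true class index.
ce_index = -np.log(probs[np.arange(len(labels)), labels[:, 0]])

print(np.allclose(ce_one_hot, ce_index))  # True
```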

def train_program():
    # the label must be int64; using dtype='int32' reports errors!
    label = fluid.layers.data(name='label', shape=[1], dtype='int64')
    # Here we can build the prediction network in different ways; uncomment the one you want.
    predict = cnn()
    #predict = softmax_regression()
    #predict = multilayer_perceptron()
    # Calculate the cost from the prediction and label.
    cost = fluid.layers.cross_entropy(input=predict, label=label)
    avg_cost = fluid.layers.mean(cost)
    acc = fluid.layers.accuracy(input=predict, label=label)
    return [avg_cost, acc]

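PaddlePaddle trains with a Trainer. You only need to pass the training function train_program as a Trainer argument (explained in more detail below). The function returns a list [avg_cost, acc]. The first element is used as the loss. The remaining elements are optional values that the trainer passes back (as event.metrics) so they can be printed during training. Returning avg_cost is therefore required, and everything else is optional.

One caution: don't put a constant in the list. Every element must be something that changes as training proceeds. If you hard-code something like "acc=1", training will throw an error.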

2.3 Training

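PaddlePaddle uses fluid.Trainer to create a trainer. The trainer needs three things:

- train_program, which defines the loss;
- place, which selects whether to run on a GPU;
- optimizer_program, the optimizer.

Then call its train method to start training. The full program is below: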

import paddle
import paddle.fluid as fluid

def optimizer_program():
    return fluid.optimizer.Adam(learning_rate=0.001)

if __name__ == "__main__":
    print("run mnist train\n")
    minst_prefix = '/home/dzqiu/DataSet/minst/'
    train_image_path   = minst_prefix + 'train-images-idx3-ubyte.gz'
    train_label_path   = minst_prefix + 'train-labels-idx1-ubyte.gz'
    test_image_path    = minst_prefix + 't10k-images-idx3-ubyte.gz'
    test_label_path    = minst_prefix + 't10k-labels-idx1-ubyte.gz'
    # reader_creator is described below
    train_reader = paddle.batch(
        paddle.reader.shuffle(  # shuffle randomizes the sample order within a buffer
            reader_creator(train_image_path, train_label_path, buffer_size=100),
            buf_size=500),
        batch_size=64)
    test_reader  = paddle.batch(
        reader_creator(test_image_path, test_label_path, buffer_size=100),
        batch_size=64)  # no need to shuffle the test set

    # if using a GPU, run 'export FLAGS_fraction_of_gpu_memory_to_use=0' first
    use_cuda = True
    place    = fluid.CUDAPlace(0) if use_cuda else fluid.CPUPlace()

    trainer  = fluid.Trainer(train_func=train_program,
                             place=place,
                             optimizer_func=optimizer_program)

    params_dirname = "recognize_digits_network.inference.model"
    lists = []

    # called by the trainer after every step and every epoch
    def event_handler(event):
        if isinstance(event, fluid.EndStepEvent):  # fired at the end of each step
            if event.step % 100 == 0:
                print("Pass %d, Epoch %d, Cost %f, Acc %f"
                      % (event.step, event.epoch,
                         event.metrics[0],   # 1st value returned by train_program: avg_cost
                         event.metrics[1]))  # 2nd value returned by train_program: acc
        if isinstance(event, fluid.EndEpochEvent):  # fired at the end of each epoch
            trainer.save_params(params_dirname)
            # trainer.test returns the same values as train_program, so unpack them in the same order
            avg_cost, acc = trainer.test(reader=test_reader,
                                         feed_order=['img', 'label'])
            print("Test with Epoch %d, avg_cost: %s, acc: %s"
                  % (event.epoch, avg_cost, acc))
            lists.append((event.epoch, avg_cost, acc))

    # Train the model now
    trainer.train(num_epochs=5, event_handler=event_handler,
                  reader=train_reader, feed_order=['img', 'label'])

    # find the epoch with the lowest test cost
    best = sorted(lists, key=lambda record: float(record[1]))[0]
    print('Best pass is %s, testing Avgcost is %s' % (best[0], best[1]))
    print('The classification accuracy is %.2f%%' % (float(best[2]) * 100))
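paddle.reader.shuffle and paddle.batch are reader decorators. Each takes a reader (a zero-argument function returning a generator) and returns a new reader. The sketch below re-implements simplified versions in plain Python to show what happens to the samples coming out of reader_creator. shuffle_sketch, batch_sketch and toy_reader are hypothetical names, and the real Paddle implementations may differ in detail.

```python
import random


def shuffle_sketch(reader, buf_size):
    """Simplified stand-in for paddle.reader.shuffle: shuffle consecutive chunks of buf_size samples."""
    def shuffled_reader():
        buf = []
        for sample in reader():
            buf.append(sample)
            if len(buf) >= buf_size:
                random.shuffle(buf)
                for s in buf:
                    yield s
                buf = []
        # flush whatever is left when the underlying reader is exhausted
        if buf:
            random.shuffle(buf)
            for s in buf:
                yield s
    return shuffled_reader


def batch_sketch(reader, batch_size):
    """Simplified stand-in for paddle.batch: group samples into lists of batch_size."""
    def batched_reader():
        batch = []
        for sample in reader():
            batch.append(sample)
            if len(batch) == batch_size:
                yield batch
                batch = []
        if batch:  # keep the final, smaller batch
            yield batch
    return batched_reader


# A toy reader yielding (image, label) pairs; it stands in for reader_creator(...).
def toy_reader():
    for i in range(10):
        yield [float(i)], i % 3


train = batch_sketch(shuffle_sketch(toy_reader, buf_size=5), batch_size=4)
print([len(b) for b in train()])  # [4, 4, 2]
```

Because each decorator only wraps a function, the order of composition in the training code matters: samples are shuffled first and then grouped into batches.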

2.4 Reading the training data: Reader

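PaddlePaddle's example reads the training data with a single call, paddle.dataset.mnist.train(). Because that call is wrapped up, it is hard to see what it actually does, and even harder to tell how to read your own training set. So I dug this function out of the source code and simplified it into reader_creator, which reads the MNIST dataset. First, let's look at the format of the MNIST dataset: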

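The label file starts with 8 bytes holding the magic number and the item count. Each following byte is one label from 0 to 9.

The image file starts with 16 bytes of dataset information: the magic number, the number of images, the number of rows and the number of columns. Each following byte is one pixel, so reading 28*28 consecutive bytes in order gives one 28*28 image.

Once we know how the files are laid out, we can write the reader: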

import platform
import subprocess

import numpy

def reader_creator(image_filename, label_filename, buffer_size):
    def reader():
        # decompress with an external command: zcat on Linux, gzcat on macOS
        if platform.system() == 'Linux':
            zcat_cmd = 'zcat'
        elif platform.system() == 'Darwin':
            zcat_cmd = 'gzcat'
        else:
            raise NotImplementedError("This program supports Linux and macOS, "
                                      "but your platform is " + platform.system())

        # create a subprocess to read the images
        sub_img = subprocess.Popen([zcat_cmd, image_filename], stdout=subprocess.PIPE)
        sub_img.stdout.read(16)  # skip the 16-byte header; we already know its contents
        # create a subprocess to read the labels
        sub_lab = subprocess.Popen([zcat_cmd, label_filename], stdout=subprocess.PIPE)
        sub_lab.stdout.read(8)   # skip the 8-byte header for the same reason

        try:
            while True:  # the finally block cleans up once the data runs out
                # read() + frombuffer: numpy.fromfile cannot reliably read a buffered pipe in Python 3
                # each label is one unsigned byte, so read buffer_size bytes
                labels = numpy.frombuffer(
                    sub_lab.stdout.read(buffer_size), dtype='ubyte').astype("int")
                if labels.size != buffer_size:
                    break
                # read 28*28 bytes per image, then reshape
                images = numpy.frombuffer(
                    sub_img.stdout.read(buffer_size * 28 * 28), dtype='ubyte')
                images = images.reshape(buffer_size, 28, 28).astype("float32")
                # map each pixel from [0, 255] to [-1, 1]
                images = images / 255.0 * 2.0 - 1.0
                for i in range(buffer_size):
                    # yield (image, label); the order must match feed_order!
                    yield images[i, :], int(labels[i])
        finally:
            # terminate the reader subprocesses
            for sub in (sub_img, sub_lab):
                try:
                    sub.terminate()
                except Exception:
                    pass
    return reader
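The zcat-based reader depends on an external command. Below is a possibly more portable alternative, sketched under my own assumptions. It uses Python's gzip and struct modules to parse the headers described above instead of skipping them. It also checks the standard MNIST magic numbers: 2051 for images and 2049 for labels.

idx_reader_creator is a hypothetical name. Unlike reader_creator, it loads each whole file into memory at once. That is acceptable for a dataset of MNIST's size but not for large datasets.

```python
import gzip
import struct

import numpy


def idx_reader_creator(image_filename, label_filename):
    """Hypothetical portable variant of reader_creator using gzip + struct instead of zcat."""
    def reader():
        with gzip.open(image_filename, 'rb') as f_img, gzip.open(label_filename, 'rb') as f_lab:
            # Image header: magic, count, rows, cols (four big-endian uint32 = 16 bytes).
            magic_img, n_img, rows, cols = struct.unpack('>IIII', f_img.read(16))
            # Label header: magic, count (two big-endian uint32 = 8 bytes).
            magic_lab, n_lab = struct.unpack('>II', f_lab.read(8))
            if magic_img != 2051 or magic_lab != 2049:
                raise ValueError("not MNIST IDX image/label files")
            if n_img != n_lab:
                raise ValueError("image and label counts differ")

            labels = numpy.frombuffer(f_lab.read(n_lab), dtype=numpy.uint8)
            pixels = numpy.frombuffer(f_img.read(n_img * rows * cols), dtype=numpy.uint8)
            images = pixels.reshape(n_img, rows, cols).astype('float32')
            # Same normalization as reader_creator: map [0, 255] to [-1, 1].
            images = images / 255.0 * 2.0 - 1.0
            for i in range(n_img):
                yield images[i], int(labels[i])
    return reader
```

It is used the same way as reader_creator, for example paddle.batch(idx_reader_creator(train_image_path, train_label_path), batch_size=64).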

2.5 Results

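The training set has 60,000 images. With buffer_size 100 and batch_size 64, each epoch runs a little over 900 steps (60000 / 64 ≈ 938). In the log below, the number after "Pass" is the step within the epoch and the number after "Batch" is the epoch index (the code above labels it "Epoch").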
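The step count can be checked quickly. This sketch assumes paddle.batch keeps the final partial batch rather than dropping it, which is consistent with the log below (the last printed step is 900).

```python
import math

num_images = 60000
batch_size = 64

# Assumption: the final partial batch is kept, so round up.
steps_per_epoch = int(math.ceil(num_images / float(batch_size)))
print(steps_per_epoch)  # 938

# event.step runs from 0 to steps_per_epoch - 1; the handler prints every 100th step.
printed = [s for s in range(steps_per_epoch) if s % 100 == 0]
print(printed)  # [0, 100, 200, 300, 400, 500, 600, 700, 800, 900]
```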

Pass 0, Batch 0, Cost 4.250958, Acc 0.062500
Pass 100, Batch 0, Cost 0.249865, Acc 0.953125
Pass 200, Batch 0, Cost 0.281933, Acc 0.906250
Pass 300, Batch 0, Cost 0.147851, Acc 0.953125
Pass 400, Batch 0, Cost 0.144059, Acc 0.968750
Pass 500, Batch 0, Cost 0.082035, Acc 0.953125
Pass 600, Batch 0, Cost 0.105593, Acc 0.984375
Pass 700, Batch 0, Cost 0.148170, Acc 0.968750
Pass 800, Batch 0, Cost 0.182150, Acc 0.937500
Pass 900, Batch 0, Cost 0.066323, Acc 0.968750
Test with Epoch 0, avg_cost: 0.07329441363440427, acc: 0.9762620192307693
Pass 0, Batch 1, Cost 0.157396, Acc 0.953125
Pass 100, Batch 1, Cost 0.050120, Acc 0.968750
Pass 200, Batch 1, Cost 0.086324, Acc 0.984375
Pass 300, Batch 1, Cost 0.002137, Acc 1.000000
Pass 400, Batch 1, Cost 0.173876, Acc 0.984375
Pass 500, Batch 1, Cost 0.059772, Acc 0.968750
Pass 600, Batch 1, Cost 0.035788, Acc 0.984375
Pass 700, Batch 1, Cost 0.008351, Acc 1.000000
Pass 800, Batch 1, Cost 0.022678, Acc 0.984375
Pass 900, Batch 1, Cost 0.021835, Acc 1.000000
Test with Epoch 1, avg_cost: 0.06836433922317389, acc: 0.9774639423076923
Pass 0, Batch 2, Cost 0.214221, Acc 0.937500
Pass 100, Batch 2, Cost 0.212448, Acc 0.953125
Pass 200, Batch 2, Cost 0.007266, Acc 1.000000
Pass 300, Batch 2, Cost 0.015241, Acc 1.000000
Pass 400, Batch 2, Cost 0.061948, Acc 0.984375
Pass 500, Batch 2, Cost 0.043950, Acc 0.984375
Pass 600, Batch 2, Cost 0.018946, Acc 0.984375
Pass 700, Batch 2, Cost 0.015527, Acc 0.984375
Pass 800, Batch 2, Cost 0.035185, Acc 0.984375
Pass 900, Batch 2, Cost 0.004890, Acc 1.000000
Test with Epoch 2, avg_cost: 0.05774364945361809, acc: 0.9822716346153846
Pass 0, Batch 3, Cost 0.031849, Acc 0.984375
Pass 100, Batch 3, Cost 0.059525, Acc 0.953125
Pass 200, Batch 3, Cost 0.022106, Acc 0.984375
Pass 300, Batch 3, Cost 0.006763, Acc 1.000000
Pass 400, Batch 3, Cost 0.056089, Acc 0.984375
Pass 500, Batch 3, Cost 0.018876, Acc 1.000000
Pass 600, Batch 3, Cost 0.010325, Acc 1.000000
Pass 700, Batch 3, Cost 0.010989, Acc 1.000000
Pass 800, Batch 3, Cost 0.026476, Acc 0.984375
Pass 900, Batch 3, Cost 0.007792, Acc 1.000000
Test with Epoch 3, avg_cost: 0.05476908334449968, acc: 0.9830729166666666
Pass 0, Batch 4, Cost 0.061547, Acc 0.984375
Pass 100, Batch 4, Cost 0.002315, Acc 1.000000
Pass 200, Batch 4, Cost 0.009715, Acc 1.000000
Pass 300, Batch 4, Cost 0.024202, Acc 0.984375
Pass 400, Batch 4, Cost 0.150663, Acc 0.968750
Pass 500, Batch 4, Cost 0.082586, Acc 0.984375
Pass 600, Batch 4, Cost 0.012232, Acc 1.000000
Pass 700, Batch 4, Cost 0.055258, Acc 0.984375
Pass 800, Batch 4, Cost 0.016068, Acc 1.000000
Pass 900, Batch 4, Cost 0.004945, Acc 1.000000
Test with Epoch 4, avg_cost: 0.041706092633705505, acc: 0.9865785256410257
Best pass is 4, testing Avgcost is 0.041706092633705505
The classification accuracy is 98.66%


2.6 Inference interface

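PaddlePaddle provides an interface function for inference, so you just call it. The one special requirement is that the image must be converted to an [N C H W] tensor. For a single image N is 1, and because the image is grayscale C is also 1. See the code below: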

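import os

import numpy
import paddle.fluid as fluid
from PIL import Image

def load_image(file):
    im = Image.open(file).convert('L')       # grayscale
    im = im.resize((28, 28), Image.LANCZOS)  # LANCZOS replaces the deprecated Image.ANTIALIAS alias
    im = numpy.array(im).reshape(1, 1, 28, 28).astype(numpy.float32)  # [N C H W]: note the extra N
    im = im / 255.0 * 2.0 - 1.0              # same [-1, 1] normalization as in training
    return im

# params_dirname must match the training script; cnn() is defined in section 2.1
params_dirname = "recognize_digits_network.inference.model"
place = fluid.CPUPlace()  # or fluid.CUDAPlace(0), as in training

cur_dir = os.path.dirname(os.path.realpath(__file__))
img = load_image(cur_dir + '/infer_3.png')
inferencer = fluid.Inferencer(
    # infer_func=softmax_regression, # uncomment for softmax regression
    # infer_func=multilayer_perceptron, # uncomment for MLP
    infer_func=cnn,  # LeNet5-style CNN
    param_path=params_dirname,
    place=place)
results = inferencer.infer({'img': img})
lab = numpy.argsort(results)  # results holds one batch; the last index is the most probable class
print("Label of infer_3.png is: %d" % lab[0][0][-1])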
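Why is lab[0][0][-1] the predicted digit? The NumPy-only sketch below answers this with a made-up probability vector standing in for the inferencer output. It assumes, as the indexing above implies, that the output is a list holding one [N, 10] array. It also shows that numpy.argmax gives the same answer, and how several preprocessed images could be stacked along N.

```python
import numpy

# Stand-in for the inferencer output: a list with one array of shape [N, 10].
# The probabilities here are made up for illustration.
probs = numpy.full((1, 10), 0.01, dtype=numpy.float32)
probs[0, 3] = 0.91
results = [probs]

# argsort over the list gives shape [1, N, 10]; the last index is the top class.
lab = numpy.argsort(results)
print(lab[0][0][-1])  # 3

# argmax gives the same answer more directly and also handles N > 1.
print(numpy.argmax(results[0], axis=1))  # [3]

# Several [1, 1, 28, 28] images (as returned by load_image) stack along N.
batch = numpy.concatenate([numpy.zeros((1, 1, 28, 28), numpy.float32)] * 3, axis=0)
print(batch.shape)  # (3, 1, 28, 28)
```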