[Deep Learning Series] Handwritten Digit Recognition with PaddlePaddle

Date: 2019-11-12

Tags: deep learning series, paddlepaddle, handwritten digit recognition

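Last week, while searching for material on distributed training for deep learning, I stumbled on PaddlePaddle and found that its distributed training scheme is quite well done, so I wanted to share it. That topic is too involved for one post, though, so here I will only introduce PaddlePaddle's first "hello world" program: MNIST handwritten digit recognition. Distributed training with PaddlePaddle will come next time. I previously wrote an article on recognizing handwritten digits with a CNN, implemented in Keras, so having now tried PaddlePaddle is a good chance to briefly compare the strengths and weaknesses of the two frameworks.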

What is PaddlePaddle?

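PaddlePaddle is a deep learning framework released by Baidu. Most people probably use TensorFlow, Caffe, MXNet and the like day to day, but PaddlePaddle is also a very good framework. (I hear it used to be called Paddle and was later renamed PaddlePaddle; for some reason I find the name oddly cute.)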

What can PaddlePaddle do?

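It can handle most of the standard tasks, and its NLP support is especially good: sentiment analysis, word embedding, language models and so on. Basically, any common task you can think of is worth trying with it.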

Installing PaddlePaddle

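I have to complain a bit about installing PaddlePaddle. The official site says that "the only officially supported way to run PaddlePaddle at present is a Docker container", yet Docker is not all that popular in China. Every other framework I have used offers several installation methods and is very convenient, so supporting only Docker feels very odd = =! I happened to try pip install, though, and it worked, so why doesn't the official site mention it? For newcomers, then, the simplest way to install is: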

CPU version

pip install paddlepaddle

GPU version

pip install paddlepaddle-gpu
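
To confirm that the install worked, one quick check is whether Python can import the paddle module, which is the import name the script later in this post uses. This is only a minimal sketch: the helper name is_installed is made up for this illustration, and it works for any module name.

```python
def is_installed(module_name):
    """Return True if the named top-level module can be imported."""
    try:
        __import__(module_name)
    except ImportError:
        return False
    return True


if __name__ == "__main__":
    # "paddle" is the import name used by the training script in this post.
    print("paddle importable: %s" % is_installed("paddle"))
```

If this prints False, the pip install most likely went into a different Python environment than the one running the script.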

Handwritten digit recognition with PaddlePaddle

Training steps

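I won't go into traditional methods this time; for the sake of comparison we will again train with a CNN. A complete PaddlePaddle training run can be broken into the following steps: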

Load data ----> define the network structure ----> train the model ----> save the model ----> test the results

Below I show the training process directly in code (the code will be put on GitHub later):

#coding:utf-8
import os
from PIL import Image
import numpy as np
import paddle.v2 as paddle

# Whether to use the GPU: WITH_GPU=0 (the default) runs on CPU, WITH_GPU=1 runs on GPU
with_gpu = os.getenv('WITH_GPU', '0') != '0'

# Define the network structure
def convolutional_neural_network_org(img):
    # First convolution + pooling layer
    conv_pool_1 = paddle.networks.simple_img_conv_pool(
        input=img,
        filter_size=5,
        num_filters=20,
        num_channel=1,
        pool_size=2,
        pool_stride=2,
        act=paddle.activation.Relu())
    # Second convolution + pooling layer
    conv_pool_2 = paddle.networks.simple_img_conv_pool(
        input=conv_pool_1,
        filter_size=5,
        num_filters=50,
        num_channel=20,
        pool_size=2,
        pool_stride=2,
        act=paddle.activation.Relu())
    # Fully connected output layer
    predict = paddle.layer.fc(
        input=conv_pool_2, size=10, act=paddle.activation.Softmax())
    return predict

def main():
    # Initialize PaddlePaddle and choose the device to run on
    paddle.init(use_gpu=with_gpu, trainer_count=1)

    # Declare the input layers (image pixels and label)
    images = paddle.layer.data(
        name='pixel', type=paddle.data_type.dense_vector(784))
    label = paddle.layer.data(
        name='label', type=paddle.data_type.integer_value(10))

    # Build the network defined above
    predict = convolutional_neural_network_org(images)

    # Define the loss function
    cost = paddle.layer.classification_cost(input=predict, label=label)

    # Create the trainable parameters
    parameters = paddle.parameters.create(cost)

    # Define the optimizer
    optimizer = paddle.optimizer.Momentum(
        learning_rate=0.1 / 128.0,
        momentum=0.9,
        regularization=paddle.optimizer.L2Regularization(rate=0.0005 * 128))

    # Create the trainer
    trainer = paddle.trainer.SGD(
        cost=cost, parameters=parameters, update_equation=optimizer)


    lists = []

    # Define event_handler to print results during training
    def event_handler(event):
        if isinstance(event, paddle.event.EndIteration):
            if event.batch_id % 100 == 0:
                print "Pass %d, Batch %d, Cost %f, %s" % (
                    event.pass_id, event.batch_id, event.cost, event.metrics)
        if isinstance(event, paddle.event.EndPass):
            # Save the parameters (tar archives are binary, so open in 'wb' mode)
            with open('params_pass_%d.tar' % event.pass_id, 'wb') as f:
                parameters.to_tar(f)

            result = trainer.test(reader=paddle.batch(
                paddle.dataset.mnist.test(), batch_size=128))
            print "Test with Pass %d, Cost %f, %s\n" % (
                event.pass_id, result.cost, result.metrics)
            lists.append((event.pass_id, result.cost,
                          result.metrics['classification_error_evaluator']))

    trainer.train(
        reader=paddle.batch(
            paddle.reader.shuffle(paddle.dataset.mnist.train(), buf_size=8192),
            batch_size=128),
        event_handler=event_handler,
        num_passes=10)

    # Find the pass with the lowest test cost
    best = sorted(lists, key=lambda item: float(item[1]))[0]
    print 'Best pass is %s, testing Avgcost is %s' % (best[0], best[1])
    print 'The classification accuracy is %.2f%%' % (100 - float(best[2]) * 100)

    # Load an image and convert it to the network's input format
    def load_image(file):
        im = Image.open(file).convert('L')
        im = im.resize((28, 28), Image.ANTIALIAS)
        im = np.array(im).astype(np.float32).flatten()
        im = im / 255.0
        return im

    # Run inference on a test image
    test_data = []
    cur_dir = os.path.dirname(os.path.realpath(__file__))
    test_data.append((load_image(cur_dir + '/image/infer_3.png'), ))

    probs = paddle.infer(
        output_layer=predict, parameters=parameters, input=test_data)
    lab = np.argsort(-probs)  # probs and lab are the results of one batch data
    print "Label of image/infer_3.png is: %d" % lab[0][0]


if __name__ == '__main__':
    main()

The code above looks long, but its structure is quite clear. Let's test it on real data and see how well it actually works.
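
The model-selection step at the end of main() can be tried on its own, without PaddlePaddle. The sketch below uses the same (pass_id, cost, error) tuple layout that event_handler appends to lists, filled in with the two test results that appear in the baseline log further down. It swaps sorted(...)[0] for min(), which gives the same answer here; that substitution is mine, not part of the original script.

```python
# Each tuple is (pass_id, test_cost, classification_error), the same layout
# that event_handler appends to `lists` in the training script.
# The two rows are the test results printed in the baseline run's log.
results = [
    (0, 0.074587, 0.023800000548362732),
    (4, 0.035841, 0.01209999993443489),
]

# min() with a key picks the lowest-cost pass without sorting the whole list.
best_pass, best_cost, best_error = min(results, key=lambda r: r[1])

# The evaluator reports an error rate in [0, 1]; convert it to percent accuracy.
accuracy = 100 - best_error * 100

print("Best pass is %d, testing Avgcost is %s" % (best_pass, best_cost))
print("The classification accuracy is %.2f%%" % accuracy)
```

Note that the selection is made on test cost, so the reported accuracy is the accuracy of the pass with the lowest cost, which is not always the pass with the lowest error.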

Baseline version

First I used the example from the official site and trained the most basic CNN structure as is. The code is as follows:

def convolutional_neural_network_org(img):
    # First convolution + pooling layer
    conv_pool_1 = paddle.networks.simple_img_conv_pool(
        input=img,
        filter_size=5,
        num_filters=20,
        num_channel=1,
        pool_size=2,
        pool_stride=2,
        act=paddle.activation.Relu())
    # Second convolution + pooling layer
    conv_pool_2 = paddle.networks.simple_img_conv_pool(
        input=conv_pool_1,
        filter_size=5,
        num_filters=50,
        num_channel=20,
        pool_size=2,
        pool_stride=2,
        act=paddle.activation.Relu())
    # Fully connected output layer
    predict = paddle.layer.fc(
        input=conv_pool_2, size=10, act=paddle.activation.Softmax())
    return predict

The output is as follows:

I1023 13:45:46.519075 34144 Util.cpp:166] commandline:  --use_gpu=True --trainer_count=1
[INFO 2017-10-23 13:45:52,667 layers.py:2539] output for __conv_pool_0___conv: c = 20, h = 24, w = 24, size = 11520
[INFO 2017-10-23 13:45:52,667 layers.py:2667] output for __conv_pool_0___pool: c = 20, h = 12, w = 12, size = 2880
[INFO 2017-10-23 13:45:52,668 layers.py:2539] output for __conv_pool_1___conv: c = 50, h = 8, w = 8, size = 3200
[INFO 2017-10-23 13:45:52,669 layers.py:2667] output for __conv_pool_1___pool: c = 50, h = 4, w = 4, size = 800
I1023 13:45:52.675750 34144 GradientMachine.cpp:85] Initing parameters..
I1023 13:45:52.686153 34144 GradientMachine.cpp:92] Init parameters done.
Pass 0, Batch 0, Cost 3.048408, {'classification_error_evaluator': 0.890625}
Pass 0, Batch 100, Cost 0.188828, {'classification_error_evaluator': 0.0546875}
Pass 0, Batch 200, Cost 0.075183, {'classification_error_evaluator': 0.015625}
Pass 0, Batch 300, Cost 0.070798, {'classification_error_evaluator': 0.015625}
Pass 0, Batch 400, Cost 0.079673, {'classification_error_evaluator': 0.046875}
Test with Pass 0, Cost 0.074587, {'classification_error_evaluator': 0.023800000548362732}
...
Pass 4, Batch 0, Cost 0.032454, {'classification_error_evaluator': 0.015625}
Pass 4, Batch 100, Cost 0.021028, {'classification_error_evaluator': 0.0078125}
Pass 4, Batch 200, Cost 0.020458, {'classification_error_evaluator': 0.0}
Pass 4, Batch 300, Cost 0.046728, {'classification_error_evaluator': 0.015625}
Pass 4, Batch 400, Cost 0.030264, {'classification_error_evaluator': 0.015625}
Test with Pass 4, Cost 0.035841, {'classification_error_evaluator': 0.01209999993443489}

Best pass is 4, testing Avgcost is 0.0358410408473
The classification accuracy is 98.79%
Label of image/infer_3.png is: 3

real    0m31.565s
user    0m20.996s
sys    0m15.891s

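As you can see, the first line shows whether the selected device is a GPU. I chose the GPU here, so the flag is 1 (use_gpu=True in the log); for CPU it would be 0. The next four lines describe the network structure, and then the training results are printed. When training ends we print the lowest-error result among the passes, 98.79%, which is quite good given that only 5 passes were run. Finally, look at the running time: very fast, about 31 seconds. Still, I was not entirely satisfied with this result, because the network I tuned earlier in Keras reached 99.72% accuracy, although it was very slow: 69 iterations took about 30 minutes. So I felt this network structure could be improved, and I made some changes to it. See the improved version below.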
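
The four network-structure lines in the log can be reproduced by hand. The sketch below assumes the convolutions use stride 1 and no padding, which is what the logged sizes imply (the script does not set either explicitly); the helper names conv_out and pool_out are made up for this illustration.

```python
def conv_out(size, filter_size, stride=1, padding=0):
    """Output width/height of a convolution along one dimension."""
    return (size + 2 * padding - filter_size) // stride + 1


def pool_out(size, pool_size, stride):
    """Output width/height of a pooling layer (no padding, exact division)."""
    return (size - pool_size) // stride + 1


side = 28  # MNIST images are 28x28, i.e. the 784-dim input vector
shapes = []
for name, channels in (("conv_pool_0", 20), ("conv_pool_1", 50)):
    # 5x5 convolution, then 2x2 pooling with stride 2, as in the script
    side = conv_out(side, filter_size=5)
    shapes.append((name + "_conv", channels, side, channels * side * side))
    side = pool_out(side, pool_size=2, stride=2)
    shapes.append((name + "_pool", channels, side, channels * side * side))

for name, c, s, total in shapes:
    print("%s: c = %d, h = %d, w = %d, size = %d" % (name, c, s, s, total))
```

The sizes come out as 11520, 2880, 3200 and 800, matching the log, so the fully connected layer receives an 800-dimensional input.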

Improved version

def convolutional_neural_network(img):
    # First convolution + pooling layer
    conv_pool_1 = paddle.networks.simple_img_conv_pool(
        input=img,
        filter_size=5,
        num_filters=20,
        num_channel=1,
        pool_size=2,
        pool_stride=2,
        act=paddle.activation.Relu())
    # Add a dropout layer
    drop_1 = paddle.layer.dropout(input=conv_pool_1, dropout_rate=0.2)
    # Second convolution + pooling layer
    conv_pool_2 = paddle.networks.simple_img_conv_pool(
        input=drop_1,
        filter_size=5,
        num_filters=50,
        num_channel=20,
        pool_size=2,
        pool_stride=2,
        act=paddle.activation.Relu())
    # Add a dropout layer
    drop_2 = paddle.layer.dropout(input=conv_pool_2, dropout_rate=0.5)
    # Fully connected layer, batch normalization, then the softmax output layer
    fc1 = paddle.layer.fc(input=drop_2, size=10, act=paddle.activation.Linear())
    bn = paddle.layer.batch_norm(
        input=fc1,
        act=paddle.activation.Relu(),
        layer_attr=paddle.attr.Extra(drop_rate=0.2))
    predict = paddle.layer.fc(input=bn, size=10, act=paddle.activation.Softmax())
    return predict

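In the improved version we add dropout layers to reduce overfitting: one after the first convolution layer with a rate of 0.2, and one after the second with a rate of 0.5. The code also places a fully connected layer and batch normalization in front of the softmax output. Changing the network structure is very simple: just edit the model inside the network definition function. In this respect it is quite similar to how Keras defines networks, and very easy to use. Let's look at the results: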
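
For readers new to dropout, here is a small NumPy sketch of the idea. It shows the common "inverted dropout" formulation as a generic illustration; it is not taken from PaddlePaddle, and paddle.layer.dropout may handle the rescaling differently internally. The function name and arguments are made up for this example.

```python
import numpy as np


def dropout(x, rate, training, rng):
    """Inverted dropout: zero a fraction `rate` of activations while training
    and rescale the survivors so the expected activation is unchanged."""
    if not training or rate == 0.0:
        return x
    keep_prob = 1.0 - rate
    # Boolean mask: True where the activation is kept
    mask = rng.uniform(size=x.shape) < keep_prob
    return x * mask / keep_prob


rng = np.random.RandomState(0)
activations = np.ones((4, 5), dtype=np.float32)

train_out = dropout(activations, rate=0.5, training=True, rng=rng)
eval_out = dropout(activations, rate=0.5, training=False, rng=rng)

print(train_out)  # every entry is either 0 or 1 / (1 - 0.5) = 2
print(eval_out)   # unchanged at inference time
```

Because random units are silenced on every batch, the training cost is noisier and higher than the test cost, which is consistent with the log below, where the per-batch training error stays well above the test error.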

I1023 14:01:51.653827 34244 Util.cpp:166] commandline:  --use_gpu=True --trainer_count=1
[INFO 2017-10-23 14:01:57,830 layers.py:2539] output for __conv_pool_0___conv: c = 20, h = 24, w = 24, size = 11520
[INFO 2017-10-23 14:01:57,831 layers.py:2667] output for __conv_pool_0___pool: c = 20, h = 12, w = 12, size = 2880
[INFO 2017-10-23 14:01:57,832 layers.py:2539] output for __conv_pool_1___conv: c = 50, h = 8, w = 8, size = 3200
[INFO 2017-10-23 14:01:57,833 layers.py:2667] output for __conv_pool_1___pool: c = 50, h = 4, w = 4, size = 800
I1023 14:01:57.842871 34244 GradientMachine.cpp:85] Initing parameters..
I1023 14:01:57.854014 34244 GradientMachine.cpp:92] Init parameters done.
Pass 0, Batch 0, Cost 2.536199, {'classification_error_evaluator': 0.875}
Pass 0, Batch 100, Cost 1.668236, {'classification_error_evaluator': 0.515625}
Pass 0, Batch 200, Cost 1.024846, {'classification_error_evaluator': 0.375}
Pass 0, Batch 300, Cost 1.086315, {'classification_error_evaluator': 0.46875}
Pass 0, Batch 400, Cost 0.767804, {'classification_error_evaluator': 0.25}
Pass 0, Batch 500, Cost 0.545784, {'classification_error_evaluator': 0.1875}
Pass 0, Batch 600, Cost 0.731662, {'classification_error_evaluator': 0.328125}
...
Pass 49, Batch 0, Cost 0.415184, {'classification_error_evaluator': 0.09375}
Pass 49, Batch 100, Cost 0.067616, {'classification_error_evaluator': 0.0}
Pass 49, Batch 200, Cost 0.161415, {'classification_error_evaluator': 0.046875}
Pass 49, Batch 300, Cost 0.202667, {'classification_error_evaluator': 0.046875}
Pass 49, Batch 400, Cost 0.336043, {'classification_error_evaluator': 0.140625}
Pass 49, Batch 500, Cost 0.290948, {'classification_error_evaluator': 0.125}
Pass 49, Batch 600, Cost 0.223433, {'classification_error_evaluator': 0.109375}
Pass 49, Batch 700, Cost 0.217345, {'classification_error_evaluator': 0.0625}
Pass 49, Batch 800, Cost 0.163140, {'classification_error_evaluator': 0.046875}
Pass 49, Batch 900, Cost 0.203645, {'classification_error_evaluator': 0.078125}
Test with Pass 49, Cost 0.033639, {'classification_error_evaluator': 0.008100000210106373}

Best pass is 48, testing Avgcost is 0.0313018567383
The classification accuracy is 99.28%
Label of image/infer_3.png is: 3

real    5m3.151s
user    4m0.052s
sys    1m8.084s

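Judging from the numbers above, the result is quite good: 99.28% accuracy in about 5 minutes. For comparison, my earlier Keras run reached 99.72% but needed about 30 minutes for 69 iterations.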

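The speed difference is clearly large: with roughly comparable accuracy, training time is about one sixth of what it was with Keras. The network structure is also relatively simple, which means far fewer parameters to tune.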
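
The "about one sixth" figure can be checked from the numbers quoted in this post. The sketch below assumes the improved run used 50 passes, since its log ends at pass 49, and takes the Keras figure as exactly 30 minutes even though the text only says "about". The hardware of the Keras run is not stated, so this is a rough comparison rather than a controlled benchmark.

```python
# Figures quoted in the text: Keras took about 30 minutes for 69 iterations;
# the improved PaddlePaddle run reports real 5m3.151s, assumed to cover
# 50 passes (the log ends at pass 49).
keras_seconds = 30 * 60
paddle_seconds = 5 * 60 + 3.151

speedup = keras_seconds / paddle_seconds
keras_per_iter = keras_seconds / 69.0
paddle_per_pass = paddle_seconds / 50.0

print("overall speedup: %.1fx" % speedup)
print("Keras: %.1f s per iteration" % keras_per_iter)
print("PaddlePaddle: %.1f s per pass" % paddle_per_pass)
```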

Summary

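PaddlePaddle is quite convenient to use, and both its network definition and its training speed deserve a mention. In my own experience, though, these are the points most worth highlighting: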

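1. Importing data is easy. The handwritten digit dataset used here is fairly small, but adding more data is also very convenient: just put it in the corresponding directory.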

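2. The event_handler mechanism lets you customize what is printed during training. Keras, MXNet and the others I used before wrap this in ready-made functions, so the output is always the same. PaddlePaddle does not fully wrap it and instead lets users define the output themselves, which makes it easy to cut redundant information, print more detail about training, or call plotting functions to visualize the convergence curve.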

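3. It is fast. The example above shows PaddlePaddle's speed, and the gain in speed comes with accuracy not far from the best result, which is a huge advantage when training models on massive data!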
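
To make the event_handler point concrete, here is a framework-free sketch of the callback pattern. The EndIteration and EndPass classes and the toy_train loop are simplified stand-ins written for this illustration; they only mirror the names of the paddle.event classes used in the script and are not PaddlePaddle code.

```python
class EndIteration(object):
    """Toy event fired after every batch."""
    def __init__(self, pass_id, batch_id, cost):
        self.pass_id = pass_id
        self.batch_id = batch_id
        self.cost = cost


class EndPass(object):
    """Toy event fired after every full pass over the data."""
    def __init__(self, pass_id):
        self.pass_id = pass_id


def toy_train(batch_costs, num_passes, event_handler):
    """Stand-in training loop: reports each batch and each pass to the callback."""
    for pass_id in range(num_passes):
        for batch_id, cost in enumerate(batch_costs):
            event_handler(EndIteration(pass_id, batch_id, cost))
        event_handler(EndPass(pass_id))


history = []  # (pass_id, batch_id, cost) tuples, e.g. for plotting later


def event_handler(event):
    if isinstance(event, EndIteration):
        # Record every second batch only, to cut down on noise.
        if event.batch_id % 2 == 0:
            history.append((event.pass_id, event.batch_id, event.cost))
    elif isinstance(event, EndPass):
        print("finished pass %d, %d points recorded"
              % (event.pass_id, len(history)))


toy_train(batch_costs=[0.9, 0.7, 0.5, 0.4], num_passes=2,
          event_handler=event_handler)
```

The training loop knows nothing about logging or plotting; everything the user wants to see or store lives in the handler, so the collected history could be passed to a plotting library to draw the convergence curve.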

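That said, a few things about PaddlePaddle were a bit painful: there is too little documentation, and searching online for an error message turns up little. I don't think this is a big problem, since more material will surely appear as more people use it. So I keep wondering why PaddlePaddle isn't more popular. The odd installation is one complaint, but it is really an excellent piece of open source software, especially its distributed training, where the multi-machine, multi-GPU design is excellent. I did not cover that here; next time I will discuss how to train with PaddlePaddle on a single machine with a single GPU, a single machine with multiple GPUs, multiple machines with a single GPU each, and multiple machines with multiple GPUs. Do give it a try, and let's exchange notes!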

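PS: Because PaddlePaddle has so little documentation (the official articles are mostly theory, and most blog posts just rerun the same few classic examples), I plan to write a hands-on series that goes beyond the deep learning "hello world". This "hello world" post is the lead-in; the next post will get to the real content.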