Last month I published four posts covering the "hello world" of deep learning (MNIST image recognition) and a detailed look at convolutional neural networks: the basic principles, a hand-written CNN, and a walk through PaddlePaddle's source code. This post shows how to do image classification with PaddlePaddle and TensorFlow. All of the code is in my GitHub repository and can be downloaded and trained directly.
Among convolutional neural networks there are five classic models: LeNet-5, AlexNet, GoogLeNet, VGG, and ResNet. In this post we first design a small CNN of our own to classify images, then look at the LeNet-5 architecture for the same task, implement LeNet-5 with both the popular TensorFlow framework and Baidu's PaddlePaddle, and compare the results.
What is image classification
Image classification assigns an image to one of a set of categories based on its semantic content. It is a fundamental problem in computer vision and underlies higher-level vision tasks such as object detection, image segmentation, object tracking, and behavior analysis. It has wide applications: face recognition and intelligent video analysis in security, traffic-scene recognition in transportation, content-based image retrieval and automatic album organization on the web, and image recognition in medicine (adapted from the official PaddlePaddle documentation).
The CIFAR-10 dataset
CIFAR-10 is a common benchmark in machine learning. It consists of 60,000 32×32 RGB color images in 10 classes, with 50,000 images used for training and 10,000 for testing. The task is to classify each 32×32 RGB image into one of 10 categories: airplane, automobile, bird, cat, deer, dog, frog, horse, ship, and truck.
For more information, see the CIFAR-10 page and Alex Krizhevsky's report. Related benchmarks include CIFAR-100, which has 100 object classes, and the 1000 classes of the ILSVRC competition.
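As a quick way to get a feel for the data, here is a minimal sketch that peeks at a few samples through the same `paddle.v2` CIFAR-10 reader the training scripts below rely on (the scripts consume it as `(image, label)` tuples via `feeding={'image': 0, 'label': 1}`); the loop below is illustrative, not part of the training code:

```python
# Minimal sketch: inspect a few CIFAR-10 training samples via the paddle.v2
# reader used by the training scripts below. Each sample is a flattened
# 3*32*32 image vector plus an integer label in [0, 9].
import numpy as np
import paddle.v2 as paddle

paddle.init(use_gpu=False, trainer_count=1)

reader = paddle.dataset.cifar.train10()
for i, (image, label) in enumerate(reader()):
    image = np.asarray(image)
    print "sample %d: image vector length %d, label %d" % (i, image.size, label)
    if i >= 4:
        break
```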
Designing our own CNN
Having covered the basic structure of CNNs, let's first design a simple CNN of our own to classify the CIFAR-10 data.
Network structure
Implementation
1. Network definition: simple_cnn.py
```python
#coding:utf-8
'''
Created by huxiaoman 2017.11.27
simple_cnn.py: a simple CNN designed from scratch
'''

import os
from PIL import Image
import numpy as np
import paddle.v2 as paddle
from paddle.trainer_config_helpers import *

with_gpu = os.getenv('WITH_GPU', '0') != '1'

def simple_cnn(img):
    conv_pool_1 = paddle.networks.simple_img_conv_pool(
        input=img,
        filter_size=5,
        num_filters=20,
        num_channel=3,
        pool_size=2,
        pool_stride=2,
        act=paddle.activation.Relu())
    conv_pool_2 = paddle.networks.simple_img_conv_pool(
        input=conv_pool_1,
        filter_size=5,
        num_filters=50,
        num_channel=20,
        pool_size=2,
        pool_stride=2,
        act=paddle.activation.Relu())
    fc = paddle.layer.fc(
        input=conv_pool_2, size=512, act=paddle.activation.Softmax())
    return fc
```
2. Training script: train_simple_cnn.py
```python
#coding:utf-8
'''
Created by huxiaoman 2017.11.27
train_simple_cnn.py: train simple_cnn on the cifar10 dataset
'''
import sys, os

import paddle.v2 as paddle
from simple_cnn import simple_cnn

with_gpu = os.getenv('WITH_GPU', '0') != '1'


def main():
    datadim = 3 * 32 * 32
    classdim = 10

    # PaddlePaddle init
    paddle.init(use_gpu=with_gpu, trainer_count=7)

    image = paddle.layer.data(
        name="image", type=paddle.data_type.dense_vector(datadim))

    # Add neural network config
    # option 1. resnet
    # net = resnet_cifar10(image, depth=32)
    # option 2. simple_cnn
    net = simple_cnn(image)

    out = paddle.layer.fc(
        input=net, size=classdim, act=paddle.activation.Softmax())

    lbl = paddle.layer.data(
        name="label", type=paddle.data_type.integer_value(classdim))
    cost = paddle.layer.classification_cost(input=out, label=lbl)

    # Create parameters
    parameters = paddle.parameters.create(cost)

    # Create optimizer
    momentum_optimizer = paddle.optimizer.Momentum(
        momentum=0.9,
        regularization=paddle.optimizer.L2Regularization(rate=0.0002 * 128),
        learning_rate=0.1 / 128.0,
        learning_rate_decay_a=0.1,
        learning_rate_decay_b=50000 * 100,
        learning_rate_schedule='discexp')

    # End batch and end pass event handler
    def event_handler(event):
        if isinstance(event, paddle.event.EndIteration):
            if event.batch_id % 100 == 0:
                print "\nPass %d, Batch %d, Cost %f, %s" % (
                    event.pass_id, event.batch_id, event.cost, event.metrics)
            else:
                sys.stdout.write('.')
                sys.stdout.flush()
        if isinstance(event, paddle.event.EndPass):
            # save parameters
            with open('params_pass_%d.tar' % event.pass_id, 'w') as f:
                parameters.to_tar(f)

            result = trainer.test(
                reader=paddle.batch(
                    paddle.dataset.cifar.test10(), batch_size=128),
                feeding={'image': 0,
                         'label': 1})
            print "\nTest with Pass %d, %s" % (event.pass_id, result.metrics)

    # Create trainer
    trainer = paddle.trainer.SGD(
        cost=cost, parameters=parameters, update_equation=momentum_optimizer)

    # Save the inference topology to protobuf.
    inference_topology = paddle.topology.Topology(layers=out)
    with open("inference_topology.pkl", 'wb') as f:
        inference_topology.serialize_for_inference(f)

    trainer.train(
        reader=paddle.batch(
            paddle.reader.shuffle(
                paddle.dataset.cifar.train10(), buf_size=50000),
            batch_size=128),
        num_passes=200,
        event_handler=event_handler,
        feeding={'image': 0,
                 'label': 1})

    # inference
    from PIL import Image
    import numpy as np
    import os

    def load_image(file):
        im = Image.open(file)
        im = im.resize((32, 32), Image.ANTIALIAS)
        im = np.array(im).astype(np.float32)
        # The storage order of the loaded image is W(width),
        # H(height), C(channel). PaddlePaddle requires
        # the CHW order, so transpose them.
        im = im.transpose((2, 0, 1))  # CHW
        # In the training phase, the channel order of CIFAR
        # image is B(Blue), G(green), R(Red). But PIL opens
        # the image in RGB mode. It must swap the channel order.
        im = im[(2, 1, 0), :, :]  # BGR
        im = im.flatten()
        im = im / 255.0
        return im

    test_data = []
    cur_dir = os.path.dirname(os.path.realpath(__file__))
    test_data.append((load_image(cur_dir + '/image/dog.png'), ))

    # users can remove the comments and change the model name
    # with open('params_pass_50.tar', 'r') as f:
    #     parameters = paddle.parameters.Parameters.from_tar(f)

    probs = paddle.infer(
        output_layer=out, parameters=parameters, input=test_data)
    lab = np.argsort(-probs)  # probs and lab are the results of one batch data
    print "Label of image/dog.png is: %d" % lab[0][0]


if __name__ == '__main__':
    main()
```
3. Output
```
I1128 21:44:30.218085 14733 Util.cpp:166] commandline:  --use_gpu=True --trainer_count=7
[INFO 2017-11-28 21:44:35,874 layers.py:2539] output for __conv_pool_0___conv: c = 20, h = 28, w = 28, size = 15680
[INFO 2017-11-28 21:44:35,874 layers.py:2667] output for __conv_pool_0___pool: c = 20, h = 14, w = 14, size = 3920
[INFO 2017-11-28 21:44:35,875 layers.py:2539] output for __conv_pool_1___conv: c = 50, h = 10, w = 10, size = 5000
[INFO 2017-11-28 21:44:35,876 layers.py:2667] output for __conv_pool_1___pool: c = 50, h = 5, w = 5, size = 1250
I1128 21:44:35.881502 14733 MultiGradientMachine.cpp:99] numLogicalDevices=1 numThreads=7 numDevices=8
I1128 21:44:35.928449 14733 GradientMachine.cpp:85] Initing parameters..
I1128 21:44:36.056259 14733 GradientMachine.cpp:92] Init parameters done.

Pass 0, Batch 0, Cost 2.302628, {'classification_error_evaluator': 0.9296875}
................................................................................
......
Pass 199, Batch 200, Cost 0.869726, {'classification_error_evaluator': 0.3671875}
...................................................................................................
Pass 199, Batch 300, Cost 0.801396, {'classification_error_evaluator': 0.3046875}
..........................................................................................I1128 23:21:39.443141 14733 MultiGradientMachine.cpp:99] numLogicalDevices=1 numThreads=7 numDevices=8

Test with Pass 199, {'classification_error_evaluator': 0.5248000025749207}
Label of image/dog.png is: 9
```
I ran 7 trainer threads on 8 Tesla K80 GPUs with batch_size = 128 for 200 passes; training took 1 h 37 min and the test classification error was 0.5248. Not a great result, but we can treat it as a baseline and tune it later.
The LeNet-5 architecture
LeNet-5 was proposed by Yann LeCun in the paper "Gradient-based learning applied to document recognition", which validated the network on 32×32 MNIST handwritten digits. Let's look at the architecture.
LeNet-5 has 8 layers in total: 1 input layer + 3 convolutional layers (C1, C3, C5) + 2 subsampling layers (S2, S4) + 1 fully connected layer (F6) + 1 output layer. Each layer has multiple feature maps (sets of automatically extracted features).
Input layer
The CIFAR-10 dataset; each image is 32 × 32.
C1 convolutional layer
- 6 feature maps, 5 × 5 kernels, feature map size 28 × 28
- Each convolutional neuron has 5 × 5 = 25 weights plus one bias
- Connections: (5×5+1) × 6 × (28×28) = 122,304
- Parameter sharing: parameters are shared within each feature map, so there are (5×5+1) × 6 = 156 parameters in total
S2 subsampling layer (pooling layer)
- 6 feature maps of size 14 × 14; pooling window 2 × 2
- Each unit connects to a non-overlapping 2 × 2 window of the corresponding C1 feature map, so each S2 feature map is 1/4 the size of its C1 feature map
- Connections: (2×2+1) × 14 × 14 × 6 = 5,880
- Parameter sharing: parameters are shared within each feature map, giving 2 × 6 = 12 trainable parameters
C3 convolutional layer
This layer is a little more complex: the connection from S2 to C3 is many-to-many. The simplest scheme is to fully connect all of S2's feature maps to every C3 feature map (one could also connect only a sampled subset of S2's feature maps to each C3 feature map). Under the fully connected scheme, each C3 feature map is computed from the 6 S2 feature maps using 6 separate 5×5 kernels, plus one bias per output map. C3 has 16 feature maps, so the layer has (5×5×6+1)×16 = 2,416 trainable parameters; each C3 feature map is 10×10 (14 - 5 + 1), so the number of connections is 2,416×10×10 = 241,600.
S4 subsampling layer
Same as S2: with max pooling or mean pooling this layer has no trainable parameters. Each S4 feature map is 5×5, so the number of connections is (2×2+1)×16×5×5 = 2,000.
C5 convolutional layer
Similar to C3: all of S4's feature maps are fully connected to every C5 feature map. Here each C5 feature map is computed from the 16 S4 feature maps using 16 separate 1×1 kernels, plus one bias. C5 has 120 feature maps, so the layer has (1×1×16+1)×120 = 2,040 trainable parameters; each C5 feature map is 5×5, so the number of connections is 2,040×5×5 = 51,000.
F6 fully connected layer
Flattening C5 gives 5×5×120 = 3,000 nodes, which are fed into a fully connected layer of 84 units. Counting biases, this layer has (3000+1)×84 = 252,084 parameters and connections.
Output layer
This is a 10-class problem, so there are 10 output units whose scores are normalized into probabilities with softmax; each output unit is connected to all 84 F6 units.
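To make the arithmetic above easy to check, here is a small framework-free sketch (the helper names are my own, not from any library) that recomputes the trainable parameter counts layer by layer; C1 follows the original single-channel LeNet-5 description, and C5 uses the 1×1-kernel adaptation of the PaddlePaddle implementation below:

```python
# Recompute the LeNet-5 trainable parameter counts quoted above.

def conv_params(kernel, in_maps, out_maps):
    # each output map has kernel*kernel weights per input map plus one bias
    return (kernel * kernel * in_maps + 1) * out_maps

c1 = conv_params(5, 1, 6)        # 156
s2 = 2 * 6                       # 12: one coefficient + one bias per map
c3 = conv_params(5, 6, 16)       # 2416 (fully connected S2 -> C3 scheme)
s4 = 0                           # max/mean pooling has no trainable parameters
c5 = conv_params(1, 16, 120)     # 2040 (1x1 kernels, as in lenet.py below)
f6 = (5 * 5 * 120 + 1) * 84      # 252084: flattened C5 (3000 nodes) -> 84 units
out = (84 + 1) * 10              # 850: 84 F6 units -> 10 softmax outputs

for name, n in [('C1', c1), ('S2', s2), ('C3', c3),
                ('S4', s4), ('C5', c5), ('F6', f6), ('output', out)]:
    print "%-6s %d" % (name, n)
```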
LeNet-5 in PaddlePaddle
1. Network definition: lenet.py
```python
#coding:utf-8
'''
Created by huxiaoman 2017.11.27
lenet.py: LeNet-5
'''

import os
from PIL import Image
import numpy as np
import paddle.v2 as paddle
from paddle.trainer_config_helpers import *

with_gpu = os.getenv('WITH_GPU', '0') != '1'

def lenet(img):
    conv_pool_1 = paddle.networks.simple_img_conv_pool(
        input=img,
        filter_size=5,
        num_filters=6,
        num_channel=3,
        pool_size=2,
        pool_stride=2,
        act=paddle.activation.Relu())
    conv_pool_2 = paddle.networks.simple_img_conv_pool(
        input=conv_pool_1,
        filter_size=5,
        num_filters=16,
        pool_size=2,
        pool_stride=2,
        act=paddle.activation.Relu())
    conv_3 = img_conv_layer(
        input=conv_pool_2,
        filter_size=1,
        num_filters=120,
        stride=1)
    fc = paddle.layer.fc(
        input=conv_3, size=84, act=paddle.activation.Sigmoid())
    return fc
```
2. Training script: train_lenet.py
```python
#coding:utf-8
'''
Created by huxiaoman 2017.11.27
train_lenet.py: train LeNet-5 on the cifar10 dataset
'''

import sys, os

import paddle.v2 as paddle
from lenet import lenet

with_gpu = os.getenv('WITH_GPU', '0') != '1'


def main():
    datadim = 3 * 32 * 32
    classdim = 10

    # PaddlePaddle init
    paddle.init(use_gpu=with_gpu, trainer_count=7)

    image = paddle.layer.data(
        name="image", type=paddle.data_type.dense_vector(datadim))

    # Add neural network config
    # option 1. resnet
    # net = resnet_cifar10(image, depth=32)
    # option 2. lenet
    net = lenet(image)

    out = paddle.layer.fc(
        input=net, size=classdim, act=paddle.activation.Softmax())

    lbl = paddle.layer.data(
        name="label", type=paddle.data_type.integer_value(classdim))
    cost = paddle.layer.classification_cost(input=out, label=lbl)

    # Create parameters
    parameters = paddle.parameters.create(cost)

    # Create optimizer
    momentum_optimizer = paddle.optimizer.Momentum(
        momentum=0.9,
        regularization=paddle.optimizer.L2Regularization(rate=0.0002 * 128),
        learning_rate=0.1 / 128.0,
        learning_rate_decay_a=0.1,
        learning_rate_decay_b=50000 * 100,
        learning_rate_schedule='discexp')

    # End batch and end pass event handler
    def event_handler(event):
        if isinstance(event, paddle.event.EndIteration):
            if event.batch_id % 100 == 0:
                print "\nPass %d, Batch %d, Cost %f, %s" % (
                    event.pass_id, event.batch_id, event.cost, event.metrics)
            else:
                sys.stdout.write('.')
                sys.stdout.flush()
        if isinstance(event, paddle.event.EndPass):
            # save parameters
            with open('params_pass_%d.tar' % event.pass_id, 'w') as f:
                parameters.to_tar(f)

            result = trainer.test(
                reader=paddle.batch(
                    paddle.dataset.cifar.test10(), batch_size=128),
                feeding={'image': 0,
                         'label': 1})
            print "\nTest with Pass %d, %s" % (event.pass_id, result.metrics)

    # Create trainer
    trainer = paddle.trainer.SGD(
        cost=cost, parameters=parameters, update_equation=momentum_optimizer)

    # Save the inference topology to protobuf.
    inference_topology = paddle.topology.Topology(layers=out)
    with open("inference_topology.pkl", 'wb') as f:
        inference_topology.serialize_for_inference(f)

    trainer.train(
        reader=paddle.batch(
            paddle.reader.shuffle(
                paddle.dataset.cifar.train10(), buf_size=50000),
            batch_size=128),
        num_passes=200,
        event_handler=event_handler,
        feeding={'image': 0,
                 'label': 1})

    # inference
    from PIL import Image
    import numpy as np
    import os

    def load_image(file):
        im = Image.open(file)
        im = im.resize((32, 32), Image.ANTIALIAS)
        im = np.array(im).astype(np.float32)
        # The storage order of the loaded image is W(width),
        # H(height), C(channel). PaddlePaddle requires
        # the CHW order, so transpose them.
        im = im.transpose((2, 0, 1))  # CHW
        # In the training phase, the channel order of CIFAR
        # image is B(Blue), G(green), R(Red). But PIL opens
        # the image in RGB mode. It must swap the channel order.
        im = im[(2, 1, 0), :, :]  # BGR
        im = im.flatten()
        im = im / 255.0
        return im

    test_data = []
    cur_dir = os.path.dirname(os.path.realpath(__file__))
    test_data.append((load_image(cur_dir + '/image/dog.png'), ))

    # users can remove the comments and change the model name
    # with open('params_pass_50.tar', 'r') as f:
    #     parameters = paddle.parameters.Parameters.from_tar(f)

    probs = paddle.infer(
        output_layer=out, parameters=parameters, input=test_data)
    lab = np.argsort(-probs)  # probs and lab are the results of one batch data
    print "Label of image/dog.png is: %d" % lab[0][0]


if __name__ == '__main__':
    main()
```
3. Output
```
I1129 14:52:44.314946 15153 Util.cpp:166] commandline:  --use_gpu=True --trainer_count=7
[INFO 2017-11-29 14:52:50,490 layers.py:2539] output for __conv_pool_0___conv: c = 6, h = 28, w = 28, size = 4704
[INFO 2017-11-29 14:52:50,491 layers.py:2667] output for __conv_pool_0___pool: c = 6, h = 14, w = 14, size = 1176
[INFO 2017-11-29 14:52:50,491 layers.py:2539] output for __conv_pool_1___conv: c = 16, h = 10, w = 10, size = 1600
[INFO 2017-11-29 14:52:50,492 layers.py:2667] output for __conv_pool_1___pool: c = 16, h = 5, w = 5, size = 400
[INFO 2017-11-29 14:52:50,493 layers.py:2539] output for __conv_0__: c = 120, h = 5, w = 5, size = 3000
I1129 14:52:50.498749 15153 MultiGradientMachine.cpp:99] numLogicalDevices=1 numThreads=7 numDevices=8
I1129 14:52:50.545882 15153 GradientMachine.cpp:85] Initing parameters..
I1129 14:52:50.651103 15153 GradientMachine.cpp:92] Init parameters done.

Pass 0, Batch 0, Cost 2.331898, {'classification_error_evaluator': 0.9609375}
......
Pass 199, Batch 300, Cost 0.004373, {'classification_error_evaluator': 0.0}
..........................................................................................I1129 16:17:08.678097 15153 MultiGradientMachine.cpp:99] numLogicalDevices=1 numThreads=7 numDevices=8

Test with Pass 199, {'classification_error_evaluator': 0.39579999446868896}
Label of image/dog.png is: 7
```
With the same setup (7 trainer threads, 8 Tesla K80 GPUs, batch_size = 128, 200 passes), training took 1 h 25 min and the test classification error was 0.3957, an improvement of 12.91 percentage points over simple_cnn's 0.5248. This is still not a great result; the detailed logs show the loss dropping and then rising during training, which indicates some overfitting. How to prevent overfitting will be covered in detail later.
There is a website that visualizes the network structures used for MNIST and CIFAR-10 classification; here is the CIFAR-10 BaseCNN structure:
LeNet-5 in TensorFlow
A TensorFlow version of LeNet-5 can be trained by following the steps in models/tutorials/image/cifar10/ (https://github.com/tensorflow/models/tree/master/tutorials/image/cifar10), although that code also includes quite a bit of data preprocessing, weight decay, and regularization to prevent overfitting. According to the official description, with batch_size = 128 on a Tesla K40, 100k iterations take about 4 hours and reach roughly 86% accuracy. Running it without the data preprocessing would probably not do as well. Still, the idea of enlarging the dataset through the preprocessing in cifar10_input.py's distorted_inputs function, and the weight and bias decay settings in cifar10.py, are well worth borrowing. My run has currently reached about 10k iterations, with a cost of 0.98 and an accuracy of 78.4%.
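For reference, here is a rough TensorFlow 1.x sketch of the kind of per-image distortion that distorted_inputs applies; the crop size and the brightness/contrast ranges below are illustrative assumptions rather than values copied from the official script:

```python
# A minimal TF 1.x sketch of distorted_inputs-style augmentation for one
# 32x32x3 CIFAR-10 image tensor (illustrative parameter values).
import tensorflow as tf

def distort_image(image):
    # image: float32 tensor of shape [32, 32, 3]
    distorted = tf.random_crop(image, [24, 24, 3])           # random 24x24 crop
    distorted = tf.image.random_flip_left_right(distorted)   # random horizontal flip
    distorted = tf.image.random_brightness(distorted, max_delta=63)
    distorted = tf.image.random_contrast(distorted, lower=0.2, upper=1.8)
    # zero-mean, unit-variance normalization per image
    return tf.image.per_image_standardization(distorted)
```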
I also plan to run CIFAR-10 without the data preprocessing to see how it does and compare it with the PaddlePaddle results, but that will have to wait until the weekend.
Summary
In this post we classified the standard CIFAR-10 dataset with three implementations: a simple CNN of our own design, LeNet-5 in PaddlePaddle, and LeNet-5 in TensorFlow. The speed and accuracy comparison, assembled from the figures reported above, is:

| Model | Hardware | batch_size | Passes / iterations | Training time | Test error |
|---|---|---|---|---|---|
| simple_cnn (PaddlePaddle) | 8× Tesla K80, 7 threads | 128 | 200 passes | 1 h 37 min | 0.5248 |
| LeNet-5 (PaddlePaddle) | 8× Tesla K80, 7 threads | 128 | 200 passes | 1 h 25 min | 0.3957 |
| LeNet-5 (TensorFlow) | — | 128 | ~10k iterations so far | still running | to be added |
LeNet-5 improves on the original simple_cnn in both accuracy and speed; once the TensorFlow run finishes, its results can be added to the comparison. Even with LeNet-5, though, the result is still not ideal, and the loss behavior in the logs points to overfitting. How to avoid overfitting when training neural networks will be the focus of the next post. After that, I will introduce AlexNet, run a classification experiment with it, and compare its results.
References
1. LeNet-5 paper: Y. LeCun et al., "Gradient-based learning applied to document recognition"