Last month I published four posts covering the "hello world" of deep learning (MNIST image recognition) and a detailed look at convolutional neural networks: the basic principles, a hand-written CNN, and a walk through PaddlePaddle's source code. This post shows how to do image classification with PaddlePaddle and TensorFlow. All of the code is in my GitHub repository and can be downloaded and trained directly.
Among convolutional neural networks there are five classic models: LeNet-5, AlexNet, GoogLeNet, VGG, and ResNet. In this post we first design a small CNN of our own to classify images, then look at the LeNet-5 architecture for the same task, implement LeNet-5 with both the popular TensorFlow framework and Baidu's PaddlePaddle, and compare the results.
What is image classification
Image classification assigns an image to one of a set of categories based on its semantic content. It is a fundamental problem in computer vision and underlies higher-level vision tasks such as object detection, image segmentation, object tracking, and behavior analysis. It has wide applications: face recognition and intelligent video analysis in security, traffic-scene recognition in transportation, content-based image retrieval and automatic album organization on the web, and image recognition in medicine (adapted from the official PaddlePaddle documentation).
The CIFAR-10 dataset
CIFAR-10 is a common benchmark in machine learning. It consists of 60,000 32×32 RGB color images in 10 classes, with 50,000 images used for training and 10,000 for testing. The task is to classify each 32×32 RGB image into one of 10 categories: airplane, automobile, bird, cat, deer, dog, frog, horse, ship, and truck.
For more information, see the CIFAR-10 page and Alex Krizhevsky's report. Related benchmarks include CIFAR-100, which has 100 object classes, and the 1000 classes of the ILSVRC competition.
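As a quick way to get a feel for the data, here is a minimal sketch that peeks at a few samples through the same `paddle.v2` CIFAR-10 reader the training scripts below rely on (the scripts consume it as `(image, label)` tuples via `feeding={'image': 0, 'label': 1}`); the loop below is illustrative, not part of the training code:

```python
# Minimal sketch: inspect a few CIFAR-10 training samples via the paddle.v2
# reader used by the training scripts below. Each sample is a flattened
# 3*32*32 image vector plus an integer label in [0, 9].
import numpy as np
import paddle.v2 as paddle

paddle.init(use_gpu=False, trainer_count=1)

reader = paddle.dataset.cifar.train10()
for i, (image, label) in enumerate(reader()):
    image = np.asarray(image)
    print "sample %d: image vector length %d, label %d" % (i, image.size, label)
    if i >= 4:
        break
```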
Designing our own CNN
Having covered the basic structure of CNNs, let's first design a simple CNN of our own to classify the CIFAR-10 data.
Network structure
Implementation
1. Network definition: simple_cnn.py
```python
#coding:utf-8
'''
Created by huxiaoman 2017.11.27
simple_cnn.py: a simple CNN designed from scratch
'''

import os
from PIL import Image
import numpy as np
import paddle.v2 as paddle
from paddle.trainer_config_helpers import *

with_gpu = os.getenv('WITH_GPU', '0') != '1'

def simple_cnn(img):
    conv_pool_1 = paddle.networks.simple_img_conv_pool(
        input=img,
        filter_size=5,
        num_filters=20,
        num_channel=3,
        pool_size=2,
        pool_stride=2,
        act=paddle.activation.Relu())
    conv_pool_2 = paddle.networks.simple_img_conv_pool(
        input=conv_pool_1,
        filter_size=5,
        num_filters=50,
        num_channel=20,
        pool_size=2,
        pool_stride=2,
        act=paddle.activation.Relu())
    fc = paddle.layer.fc(
        input=conv_pool_2, size=512, act=paddle.activation.Softmax())
    return fc
```
2. Training script: train_simple_cnn.py
```python
#coding:utf-8
'''
Created by huxiaoman 2017.11.27
train_simple_cnn.py: train simple_cnn on the cifar10 dataset
'''
import sys, os

import paddle.v2 as paddle
from simple_cnn import simple_cnn

with_gpu = os.getenv('WITH_GPU', '0') != '1'


def main():
    datadim = 3 * 32 * 32
    classdim = 10

    # PaddlePaddle init
    paddle.init(use_gpu=with_gpu, trainer_count=7)

    image = paddle.layer.data(
        name="image", type=paddle.data_type.dense_vector(datadim))

    # Add neural network config
    # option 1. resnet
    # net = resnet_cifar10(image, depth=32)
    # option 2. simple_cnn
    net = simple_cnn(image)

    out = paddle.layer.fc(
        input=net, size=classdim, act=paddle.activation.Softmax())

    lbl = paddle.layer.data(
        name="label", type=paddle.data_type.integer_value(classdim))
    cost = paddle.layer.classification_cost(input=out, label=lbl)

    # Create parameters
    parameters = paddle.parameters.create(cost)

    # Create optimizer
    momentum_optimizer = paddle.optimizer.Momentum(
        momentum=0.9,
        regularization=paddle.optimizer.L2Regularization(rate=0.0002 * 128),
        learning_rate=0.1 / 128.0,
        learning_rate_decay_a=0.1,
        learning_rate_decay_b=50000 * 100,
        learning_rate_schedule='discexp')

    # End batch and end pass event handler
    def event_handler(event):
        if isinstance(event, paddle.event.EndIteration):
            if event.batch_id % 100 == 0:
                print "\nPass %d, Batch %d, Cost %f, %s" % (
                    event.pass_id, event.batch_id, event.cost, event.metrics)
            else:
                sys.stdout.write('.')
                sys.stdout.flush()
        if isinstance(event, paddle.event.EndPass):
            # save parameters
            with open('params_pass_%d.tar' % event.pass_id, 'w') as f:
                parameters.to_tar(f)

            result = trainer.test(
                reader=paddle.batch(
                    paddle.dataset.cifar.test10(), batch_size=128),
                feeding={'image': 0,
                         'label': 1})
            print "\nTest with Pass %d, %s" % (event.pass_id, result.metrics)

    # Create trainer
    trainer = paddle.trainer.SGD(
        cost=cost, parameters=parameters, update_equation=momentum_optimizer)

    # Save the inference topology to protobuf.
    inference_topology = paddle.topology.Topology(layers=out)
    with open("inference_topology.pkl", 'wb') as f:
        inference_topology.serialize_for_inference(f)

    trainer.train(
        reader=paddle.batch(
            paddle.reader.shuffle(
                paddle.dataset.cifar.train10(), buf_size=50000),
            batch_size=128),
        num_passes=200,
        event_handler=event_handler,
        feeding={'image': 0,
                 'label': 1})

    # inference
    from PIL import Image
    import numpy as np
    import os

    def load_image(file):
        im = Image.open(file)
        im = im.resize((32, 32), Image.ANTIALIAS)
        im = np.array(im).astype(np.float32)
        # The storage order of the loaded image is W(width),
        # H(height), C(channel). PaddlePaddle requires
        # the CHW order, so transpose them.
        im = im.transpose((2, 0, 1))  # CHW
        # In the training phase, the channel order of CIFAR
        # image is B(Blue), G(green), R(Red). But PIL opens
        # the image in RGB mode. It must swap the channel order.
        im = im[(2, 1, 0), :, :]  # BGR
        im = im.flatten()
        im = im / 255.0
        return im

    test_data = []
    cur_dir = os.path.dirname(os.path.realpath(__file__))
    test_data.append((load_image(cur_dir + '/image/dog.png'), ))

    # users can remove the comments and change the model name
    # with open('params_pass_50.tar', 'r') as f:
    #     parameters = paddle.parameters.Parameters.from_tar(f)

    probs = paddle.infer(
        output_layer=out, parameters=parameters, input=test_data)
    lab = np.argsort(-probs)  # probs and lab are the results of one batch data
    print "Label of image/dog.png is: %d" % lab[0][0]


if __name__ == '__main__':
    main()
```
3. Output
```
I1128 21:44:30.218085 14733 Util.cpp:166] commandline:  --use_gpu=True --trainer_count=7
[INFO 2017-11-28 21:44:35,874 layers.py:2539] output for __conv_pool_0___conv: c = 20, h = 28, w = 28, size = 15680
[INFO 2017-11-28 21:44:35,874 layers.py:2667] output for __conv_pool_0___pool: c = 20, h = 14, w = 14, size = 3920
[INFO 2017-11-28 21:44:35,875 layers.py:2539] output for __conv_pool_1___conv: c = 50, h = 10, w = 10, size = 5000
[INFO 2017-11-28 21:44:35,876 layers.py:2667] output for __conv_pool_1___pool: c = 50, h = 5, w = 5, size = 1250
I1128 21:44:35.881502 14733 MultiGradientMachine.cpp:99] numLogicalDevices=1 numThreads=7 numDevices=8
I1128 21:44:35.928449 14733 GradientMachine.cpp:85] Initing parameters..
I1128 21:44:36.056259 14733 GradientMachine.cpp:92] Init parameters done.

Pass 0, Batch 0, Cost 2.302628, {'classification_error_evaluator': 0.9296875}
................................................................................
......
Pass 199, Batch 200, Cost 0.869726, {'classification_error_evaluator': 0.3671875}
...................................................................................................
Pass 199, Batch 300, Cost 0.801396, {'classification_error_evaluator': 0.3046875}
..........................................................................................I1128 23:21:39.443141 14733 MultiGradientMachine.cpp:99] numLogicalDevices=1 numThreads=7 numDevices=8

Test with Pass 199, {'classification_error_evaluator': 0.5248000025749207}
Label of image/dog.png is: 9
```
I ran 7 trainer threads on 8 Tesla K80 GPUs with batch_size = 128 for 200 passes; training took 1 h 37 min and the test classification error was 0.5248. Not a great result, but we can treat it as a baseline and tune it later.
The LeNet-5 architecture
LeNet-5 was proposed by Yann LeCun in the paper "Gradient-based learning applied to document recognition", which validated the network on 32×32 MNIST handwritten digits. Let's look at the architecture.
LeNet-5 has 8 layers in total: 1 input layer + 3 convolutional layers (C1, C3, C5) + 2 subsampling layers (S2, S4) + 1 fully connected layer (F6) + 1 output layer. Each layer has multiple feature maps (sets of automatically extracted features).
Input layer
The CIFAR-10 dataset; each image is 32 × 32.
C1 convolutional layer
- 6 feature maps, 5 × 5 kernels, feature map size 28 × 28
- Each convolutional neuron has 5 × 5 = 25 weights plus one bias
- Connections: (5×5+1) × 6 × (28×28) = 122,304
- Parameter sharing: parameters are shared within each feature map, so there are (5×5+1) × 6 = 156 parameters in total
S2 subsampling layer (pooling layer)
- 6 feature maps of size 14 × 14; pooling window 2 × 2
- Each unit connects to a non-overlapping 2 × 2 window of the corresponding C1 feature map, so each S2 feature map is 1/4 the size of its C1 feature map
- Connections: (2×2+1) × 14 × 14 × 6 = 5,880
- Parameter sharing: parameters are shared within each feature map, giving 2 × 6 = 12 trainable parameters
C3 convolutional layer
This layer is a little more complex: the connection from S2 to C3 is many-to-many. The simplest scheme is to fully connect all of S2's feature maps to every C3 feature map (one could also connect only a sampled subset of S2's feature maps to each C3 feature map). Under the fully connected scheme, each C3 feature map is computed from the 6 S2 feature maps using 6 separate 5×5 kernels, plus one bias per output map. C3 has 16 feature maps, so the layer has (5×5×6+1)×16 = 2,416 trainable parameters; each C3 feature map is 10×10 (14 - 5 + 1), so the number of connections is 2,416×10×10 = 241,600.
S4 subsampling layer
Same as S2: with max pooling or mean pooling this layer has no trainable parameters. Each S4 feature map is 5×5, so the number of connections is (2×2+1)×16×5×5 = 2,000.
C5 convolutional layer
Similar to C3: all of S4's feature maps are fully connected to every C5 feature map. Here each C5 feature map is computed from the 16 S4 feature maps using 16 separate 1×1 kernels, plus one bias. C5 has 120 feature maps, so the layer has (1×1×16+1)×120 = 2,040 trainable parameters; each C5 feature map is 5×5, so the number of connections is 2,040×5×5 = 51,000.
F6 fully connected layer
Flattening C5 gives 5×5×120 = 3,000 nodes, which are fed into a fully connected layer of 84 units. Counting biases, this layer has (3000+1)×84 = 252,084 parameters and connections.
Output layer
This is a 10-class problem, so there are 10 output units whose scores are normalized into probabilities with softmax; each output unit is connected to all 84 F6 units.
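To make the arithmetic above easy to check, here is a small framework-free sketch (the helper names are my own, not from any library) that recomputes the trainable parameter counts layer by layer; C1 follows the original single-channel LeNet-5 description, and C5 uses the 1×1-kernel adaptation of the PaddlePaddle implementation below:

```python
# Recompute the LeNet-5 trainable parameter counts quoted above.

def conv_params(kernel, in_maps, out_maps):
    # each output map has kernel*kernel weights per input map plus one bias
    return (kernel * kernel * in_maps + 1) * out_maps

c1 = conv_params(5, 1, 6)        # 156
s2 = 2 * 6                       # 12: one coefficient + one bias per map
c3 = conv_params(5, 6, 16)       # 2416 (fully connected S2 -> C3 scheme)
s4 = 0                           # max/mean pooling has no trainable parameters
c5 = conv_params(1, 16, 120)     # 2040 (1x1 kernels, as in lenet.py below)
f6 = (5 * 5 * 120 + 1) * 84      # 252084: flattened C5 (3000 nodes) -> 84 units
out = (84 + 1) * 10              # 850: 84 F6 units -> 10 softmax outputs

for name, n in [('C1', c1), ('S2', s2), ('C3', c3),
                ('S4', s4), ('C5', c5), ('F6', f6), ('output', out)]:
    print "%-6s %d" % (name, n)
```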
LeNet-5 in PaddlePaddle
1. Network definition: lenet.py
```python
#coding:utf-8
'''
Created by huxiaoman 2017.11.27
lenet.py: LeNet-5
'''

import os
from PIL import Image
import numpy as np
import paddle.v2 as paddle
from paddle.trainer_config_helpers import *

with_gpu = os.getenv('WITH_GPU', '0') != '1'

def lenet(img):
    conv_pool_1 = paddle.networks.simple_img_conv_pool(
        input=img,
        filter_size=5,
        num_filters=6,
        num_channel=3,
        pool_size=2,
        pool_stride=2,
        act=paddle.activation.Relu())
    conv_pool_2 = paddle.networks.simple_img_conv_pool(
        input=conv_pool_1,
        filter_size=5,
        num_filters=16,
        pool_size=2,
        pool_stride=2,
        act=paddle.activation.Relu())
    conv_3 = img_conv_layer(
        input=conv_pool_2,
        filter_size=1,
        num_filters=120,
        stride=1)
    fc = paddle.layer.fc(
        input=conv_3, size=84, act=paddle.activation.Sigmoid())
    return fc
```
2. Training script: train_lenet.py
```python
#coding:utf-8
'''
Created by huxiaoman 2017.11.27
train_lenet.py: train LeNet-5 on the cifar10 dataset
'''

import sys, os

import paddle.v2 as paddle
from lenet import lenet

with_gpu = os.getenv('WITH_GPU', '0') != '1'


def main():
    datadim = 3 * 32 * 32
    classdim = 10

    # PaddlePaddle init
    paddle.init(use_gpu=with_gpu, trainer_count=7)

    image = paddle.layer.data(
        name="image", type=paddle.data_type.dense_vector(datadim))

    # Add neural network config
    # option 1. resnet
    # net = resnet_cifar10(image, depth=32)
    # option 2. lenet
    net = lenet(image)

    out = paddle.layer.fc(
        input=net, size=classdim, act=paddle.activation.Softmax())

    lbl = paddle.layer.data(
        name="label", type=paddle.data_type.integer_value(classdim))
    cost = paddle.layer.classification_cost(input=out, label=lbl)

    # Create parameters
    parameters = paddle.parameters.create(cost)

    # Create optimizer
    momentum_optimizer = paddle.optimizer.Momentum(
        momentum=0.9,
        regularization=paddle.optimizer.L2Regularization(rate=0.0002 * 128),
        learning_rate=0.1 / 128.0,
        learning_rate_decay_a=0.1,
        learning_rate_decay_b=50000 * 100,
        learning_rate_schedule='discexp')

    # End batch and end pass event handler
    def event_handler(event):
        if isinstance(event, paddle.event.EndIteration):
            if event.batch_id % 100 == 0:
                print "\nPass %d, Batch %d, Cost %f, %s" % (
                    event.pass_id, event.batch_id, event.cost, event.metrics)
            else:
                sys.stdout.write('.')
                sys.stdout.flush()
        if isinstance(event, paddle.event.EndPass):
            # save parameters
            with open('params_pass_%d.tar' % event.pass_id, 'w') as f:
                parameters.to_tar(f)

            result = trainer.test(
                reader=paddle.batch(
                    paddle.dataset.cifar.test10(), batch_size=128),
                feeding={'image': 0,
                         'label': 1})
            print "\nTest with Pass %d, %s" % (event.pass_id, result.metrics)

    # Create trainer
    trainer = paddle.trainer.SGD(
        cost=cost, parameters=parameters, update_equation=momentum_optimizer)

    # Save the inference topology to protobuf.
    inference_topology = paddle.topology.Topology(layers=out)
    with open("inference_topology.pkl", 'wb') as f:
        inference_topology.serialize_for_inference(f)

    trainer.train(
        reader=paddle.batch(
            paddle.reader.shuffle(
                paddle.dataset.cifar.train10(), buf_size=50000),
            batch_size=128),
        num_passes=200,
        event_handler=event_handler,
        feeding={'image': 0,
                 'label': 1})

    # inference
    from PIL import Image
    import numpy as np
    import os

    def load_image(file):
        im = Image.open(file)
        im = im.resize((32, 32), Image.ANTIALIAS)
        im = np.array(im).astype(np.float32)
        # The storage order of the loaded image is W(width),
        # H(height), C(channel). PaddlePaddle requires
        # the CHW order, so transpose them.
        im = im.transpose((2, 0, 1))  # CHW
        # In the training phase, the channel order of CIFAR
        # image is B(Blue), G(green), R(Red). But PIL opens
        # the image in RGB mode. It must swap the channel order.
        im = im[(2, 1, 0), :, :]  # BGR
        im = im.flatten()
        im = im / 255.0
        return im

    test_data = []
    cur_dir = os.path.dirname(os.path.realpath(__file__))
    test_data.append((load_image(cur_dir + '/image/dog.png'), ))

    # users can remove the comments and change the model name
    # with open('params_pass_50.tar', 'r') as f:
    #     parameters = paddle.parameters.Parameters.from_tar(f)

    probs = paddle.infer(
        output_layer=out, parameters=parameters, input=test_data)
    lab = np.argsort(-probs)  # probs and lab are the results of one batch data
    print "Label of image/dog.png is: %d" % lab[0][0]


if __name__ == '__main__':
    main()
```
3. Output
```
I1129 14:52:44.314946 15153 Util.cpp:166] commandline:  --use_gpu=True --trainer_count=7
[INFO 2017-11-29 14:52:50,490 layers.py:2539] output for __conv_pool_0___conv: c = 6, h = 28, w = 28, size = 4704
[INFO 2017-11-29 14:52:50,491 layers.py:2667] output for __conv_pool_0___pool: c = 6, h = 14, w = 14, size = 1176
[INFO 2017-11-29 14:52:50,491 layers.py:2539] output for __conv_pool_1___conv: c = 16, h = 10, w = 10, size = 1600
[INFO 2017-11-29 14:52:50,492 layers.py:2667] output for __conv_pool_1___pool: c = 16, h = 5, w = 5, size = 400
[INFO 2017-11-29 14:52:50,493 layers.py:2539] output for __conv_0__: c = 120, h = 5, w = 5, size = 3000
I1129 14:52:50.498749 15153 MultiGradientMachine.cpp:99] numLogicalDevices=1 numThreads=7 numDevices=8
I1129 14:52:50.545882 15153 GradientMachine.cpp:85] Initing parameters..
I1129 14:52:50.651103 15153 GradientMachine.cpp:92] Init parameters done.

Pass 0, Batch 0, Cost 2.331898, {'classification_error_evaluator': 0.9609375}
......
Pass 199, Batch 300, Cost 0.004373, {'classification_error_evaluator': 0.0}
..........................................................................................I1129 16:17:08.678097 15153 MultiGradientMachine.cpp:99] numLogicalDevices=1 numThreads=7 numDevices=8

Test with Pass 199, {'classification_error_evaluator': 0.39579999446868896}
Label of image/dog.png is: 7
```
With the same setup (7 trainer threads, 8 Tesla K80 GPUs, batch_size = 128, 200 passes), training took 1 h 25 min and the test classification error was 0.3957, an improvement of 12.91 percentage points over simple_cnn's 0.5248. This is still not a great result; the detailed logs show the loss dropping and then rising during training, which indicates some overfitting. How to prevent overfitting will be covered in detail later.
There is a website that visualizes the network structures used for MNIST and CIFAR-10 classification; here is the CIFAR-10 BaseCNN structure:
LeNet-5 in TensorFlow
A TensorFlow version of LeNet-5 can be trained by following the steps in models/tutorials/image/cifar10/ (https://github.com/tensorflow/models/tree/master/tutorials/image/cifar10), although that code also includes quite a bit of data preprocessing, weight decay, and regularization to prevent overfitting. According to the official description, with batch_size = 128 on a Tesla K40, 100k iterations take about 4 hours and reach roughly 86% accuracy. Running it without the data preprocessing would probably not do as well. Still, the idea of enlarging the dataset through the preprocessing in cifar10_input.py's distorted_inputs function, and the weight and bias decay settings in cifar10.py, are well worth borrowing. My run has currently reached about 10k iterations, with a cost of 0.98 and an accuracy of 78.4%.
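For reference, here is a rough TensorFlow 1.x sketch of the kind of per-image distortion that distorted_inputs applies; the crop size and the brightness/contrast ranges below are illustrative assumptions rather than values copied from the official script:

```python
# A minimal TF 1.x sketch of distorted_inputs-style augmentation for one
# 32x32x3 CIFAR-10 image tensor (illustrative parameter values).
import tensorflow as tf

def distort_image(image):
    # image: float32 tensor of shape [32, 32, 3]
    distorted = tf.random_crop(image, [24, 24, 3])           # random 24x24 crop
    distorted = tf.image.random_flip_left_right(distorted)   # random horizontal flip
    distorted = tf.image.random_brightness(distorted, max_delta=63)
    distorted = tf.image.random_contrast(distorted, lower=0.2, upper=1.8)
    # zero-mean, unit-variance normalization per image
    return tf.image.per_image_standardization(distorted)
```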
I also plan to run CIFAR-10 without the data preprocessing to see how it does and compare it with the PaddlePaddle results, but that will have to wait until the weekend.
Summary
In this post we classified the standard CIFAR-10 dataset with three implementations: a simple CNN of our own design, LeNet-5 in PaddlePaddle, and LeNet-5 in TensorFlow. The speed and accuracy comparison, assembled from the figures reported above, is:

| Model | Hardware | batch_size | Passes / iterations | Training time | Test error |
|---|---|---|---|---|---|
| simple_cnn (PaddlePaddle) | 8× Tesla K80, 7 threads | 128 | 200 passes | 1 h 37 min | 0.5248 |
| LeNet-5 (PaddlePaddle) | 8× Tesla K80, 7 threads | 128 | 200 passes | 1 h 25 min | 0.3957 |
| LeNet-5 (TensorFlow) | — | 128 | ~10k iterations so far | still running | to be added |
LeNet-5 improves on the original simple_cnn in both accuracy and speed; once the TensorFlow run finishes, its results can be added to the comparison. Even with LeNet-5, though, the result is still not ideal, and the loss behavior in the logs points to overfitting. How to avoid overfitting when training neural networks will be the focus of the next post. After that, I will introduce AlexNet, run a classification experiment with it, and compare its results.
References
1. LeNet-5 paper: Y. LeCun et al., "Gradient-based learning applied to document recognition"