Image Classification in Practice with PaddlePaddle | Deep Learning Basic Tasks Tutorial Series (1)

Overview

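Compared with text, images convey information that is more vivid, easier to understand, and more artistic. Image classification distinguishes images of different categories according to their semantic content, and it is the foundation for higher-level vision tasks such as object detection, image segmentation, object tracking, and behavior analysis. It is widely used in security, transportation, the Internet, medicine, and other fields.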

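Generally speaking, image classification describes an entire image through hand-crafted features or feature learning and then uses a classifier to determine the object category, so how image features are extracted is crucial. Deep-learning-based image classification can learn hierarchical feature descriptions in a supervised or unsupervised manner, replacing the work of manually designing or selecting image features.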

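Among deep learning models, the convolutional neural network (CNN) takes raw image pixels directly as input, which preserves as much of the input image's information as possible. It extracts features and builds high-level abstractions through convolution operations, and the model output is directly the recognition result. This end-to-end "input-output" learning approach has achieved very good results.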

This tutorial introduces deep learning models for image classification and shows how to quickly implement a CNN model on the CIFAR10 dataset with PaddlePaddle.

Download and installation commands

## Install command for the CPU version
pip install -f https://paddlepaddle.org.cn/pip/oschina/cpu paddlepaddle

## Install command for the GPU version
pip install -f https://paddlepaddle.org.cn/pip/oschina/gpu paddlepaddle-gpu

Project page:

http://paddlepaddle.org/documentation/docs/zh/1.3/beginners_guide/basics/image_classification/index.html

For more image classification models trained on the ImageNet dataset, the corresponding pretrained models, and details on fine-tuning, see GitHub:

https://github.com/PaddlePaddle/models/blob/develop/PaddleCV/image_classification/README_cn.md

Results

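Image classification includes general image classification, fine-grained image classification, and so on. Figure 1 shows general image classification, in which the model correctly recognizes the main object in an image.

Figure 1. General image classification

Figure 2 shows fine-grained image classification, here flower recognition, which requires the model to correctly identify the kind of flower.

Figure 2. Fine-grained image classification

A good model must not only recognize different categories correctly, but also correctly recognize images taken under different viewpoints, lighting, backgrounds, deformation, or partial occlusion (referred to collectively here as image perturbations). Figure 3 shows some perturbed images. A good model, like an intelligent human, can still recognize them correctly.

Figure 3. Perturbed images [7]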

Model overview: CNN

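A traditional CNN consists of components such as convolution layers and fully connected layers, and uses a softmax multi-class classifier with a multi-class cross-entropy loss function. A typical convolutional neural network is shown in Figure 4. We first introduce the common components used to build a CNN.

Figure 4. Example CNN [5]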

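• Convolution layer: performs convolution operations to extract features from low level to high level, exploiting the local correlation and spatial invariance of images.

• Pooling layer: performs down-sampling by taking the maximum (max-pooling) or the average (avg-pooling) of each local block of the convolution output feature map. Down-sampling is also a common operation in image processing and can filter out some unimportant high-frequency information.

• Fully connected layer (fc layer): every neuron in the input layer is connected to every neuron in the hidden layer.

• Non-linear activation: convolution layers and fully connected layers are usually followed by a non-linear activation function such as Sigmoid, Tanh, or ReLU to strengthen the expressive power of the network. ReLU is the most commonly used activation function in CNNs.

• Dropout [1]: during training, randomly disables some hidden units, which improves the generalization ability of the network and helps prevent overfitting to some extent.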
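To make the pooling and activation components above concrete, here is a minimal pure-Python sketch that applies ReLU and then 2x2 pooling with stride 2 to a small feature map. The helper names relu and pool2d and the sample values are illustrative assumptions, not part of the PaddlePaddle API.

```python
def relu(values):
    """Element-wise ReLU: negative activations become 0."""
    return [[max(0.0, v) for v in row] for row in values]


def pool2d(feature_map, size=2, stride=2, mode="max"):
    """Down-sample a 2-D feature map with max or average pooling."""
    height, width = len(feature_map), len(feature_map[0])
    pooled = []
    for top in range(0, height - size + 1, stride):
        row = []
        for left in range(0, width - size + 1, stride):
            window = [feature_map[top + i][left + j]
                      for i in range(size) for j in range(size)]
            if mode == "max":
                row.append(max(window))
            else:
                row.append(sum(window) / len(window))
        pooled.append(row)
    return pooled


# A made-up 4x4 feature map used only for illustration.
feature_map = [
    [1.0, -2.0, 3.0, 0.0],
    [4.0, 5.0, -6.0, 1.0],
    [-1.0, 0.0, 2.0, 2.0],
    [0.0, -3.0, 2.0, 2.0],
]
activated = relu(feature_map)
max_pooled = pool2d(activated, mode="max")
avg_pooled = pool2d(activated, mode="avg")
print(max_pooled)  # [[5.0, 3.0], [0.0, 2.0]]
print(avg_pooled)  # [[2.5, 1.0], [0.0, 2.0]]
```

Each 2x2 block collapses to a single value, so the 4x4 map becomes 2x2. Max-pooling keeps the strongest response in each block, while avg-pooling smooths it.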

Next we introduce the VGG and ResNet network structures.

1. VGG

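The model proposed by the Visual Geometry Group (VGG) at the University of Oxford for ILSVRC 2014 is known as the VGG model [2]. Compared with earlier models, it further widens and deepens the network. Its core is five groups of convolutions, with max-pooling between consecutive groups to reduce the spatial dimensions. Each group applies several consecutive 3x3 convolutions. The number of convolution kernels grows from 64 in the shallowest group to 512 in the deepest, and is the same within a group. The convolutions are followed by two fully connected layers and then a classification layer. Because the number of convolution layers per group varies, there are 11-, 13-, 16-, and 19-layer variants. Figure 5 shows a 16-layer network.

The VGG structure is relatively simple, and many later papers built on it. For example, the first publicly reported model to surpass human-level recognition on ImageNet [4] borrowed from the VGG structure.

Figure 5. VGG16 model for ImageNet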
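The 11-, 13-, 16-, and 19-layer names follow directly from the layer counts described above. The sketch below tallies the weight layers for each variant. The per-group convolution counts are those of the standard VGG configurations, and the names VGG_CONV_GROUPS and weight_layers are illustrative.

```python
# Number of 3x3 convolutions in each of the five groups, per VGG variant.
VGG_CONV_GROUPS = {
    11: [1, 1, 2, 2, 2],
    13: [2, 2, 2, 2, 2],
    16: [2, 2, 3, 3, 3],
    19: [2, 2, 4, 4, 4],
}

# Two fully connected layers plus one classification layer.
FC_AND_CLASSIFIER_LAYERS = 3


def weight_layers(convs_per_group):
    """Total number of layers with weights: all convolutions plus the FC part."""
    return sum(convs_per_group) + FC_AND_CLASSIFIER_LAYERS


for depth, groups in sorted(VGG_CONV_GROUPS.items()):
    print("VGG%d: %d conv + %d fc = %d layers"
          % (depth, sum(groups), FC_AND_CLASSIFIER_LAYERS, weight_layers(groups)))
```

Pooling layers carry no weights, which is why they do not count toward the depth in the model name.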

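2. ResNet

ResNet (Residual Network) [3] won the 2015 ImageNet competitions in image classification, object localization, and object detection. To address the drop in accuracy that occurs as networks become deeper, ResNet proposes residual learning to ease the training of deep networks. Building on existing design ideas (batch normalization, small convolution kernels, fully convolutional networks), it introduces the residual module. Each residual module contains two paths: one is a direct shortcut for the input feature, and the other applies two or three convolutions to the feature to obtain its residual. The features from the two paths are then added together.

The residual module is shown in Figure 7. On the left is the basic block, composed of two 3x3 convolutions with the same number of output channels. On the right is the bottleneck block. It is called a bottleneck because the upper 1x1 convolution reduces the dimension (256->64 in the figure) and the lower 1x1 convolution restores it (64->256 in the figure), so the 3x3 convolution in the middle has a small number of input and output channels (64->64 in the figure).

Figure 7. Residual modules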
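A quick way to see why the bottleneck design helps is to count convolution weights. The sketch below compares the bottleneck from the figure (256->64->64->256) with two plain 3x3 convolutions that keep all 256 channels. Biases and BN parameters are ignored, and the helper conv_weights is an illustrative name.

```python
def conv_weights(kernel_size, in_channels, out_channels):
    """Number of weights in a 2-D convolution, ignoring biases."""
    return kernel_size * kernel_size * in_channels * out_channels


# Bottleneck block: 1x1 reduce, 3x3 in the narrow space, 1x1 restore.
bottleneck = (conv_weights(1, 256, 64)
              + conv_weights(3, 64, 64)
              + conv_weights(1, 64, 256))

# Hypothetical alternative: two 3x3 convolutions operating on 256 channels.
plain_256 = 2 * conv_weights(3, 256, 256)

print(bottleneck)  # 69632
print(plain_256)   # 1179648
print(round(plain_256 / bottleneck, 1))  # 16.9
```

Under these assumptions the bottleneck needs roughly 17 times fewer weights for a 256-channel input and output, which is what makes very deep variants affordable.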

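3. Data preparation

Because the ImageNet dataset is large and slow to download and train on, we use the CIFAR10 dataset to make learning easier. CIFAR10 contains 60,000 32x32 color images in 10 classes, with 6,000 images per class. Of these, 50,000 images form the training set and 10,000 form the test set. Figure 11 shows 10 randomly selected images from each class, covering all classes.

Figure 11. CIFAR10 dataset [6]

The Paddle API provides the module paddle.dataset.cifar, which loads the CIFAR dataset automatically.

Run python train.py to start training the model. The following sections describe the contents of train.py in detail.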

Model structure

1. Paddle initialization

Let's start by importing the Paddle Fluid API and helper modules.

from __future__ import print_function

import os
import sys

import numpy
import paddle
import paddle.fluid as fluid

# Model definitions from the local vgg and resnet modules described below.
from vgg import vgg_bn_drop
from resnet import resnet_cifar10

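This tutorial provides configurations for two models, VGG and ResNet.

2. VGG

We first introduce the VGG model structure. Because CIFAR10 images are much smaller and fewer than those in ImageNet, the model here is adapted to CIFAR10, and batch normalization (BN) and Dropout are added to the convolution part. The input of the VGG core module is the data layer. vgg_bn_drop defines a 16-layer VGG structure in which each convolution is followed by a BN layer and a Dropout layer. The detailed definition is as follows: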

def vgg_bn_drop(input):
    # One VGG group: `groups` consecutive 3x3 Conv->BN->ReLU->Dropout units,
    # followed by a single 2x2 max pooling with stride 2.
    def conv_block(ipt, num_filter, groups, dropouts):
        return fluid.nets.img_conv_group(
            input=ipt,
            pool_size=2,
            pool_stride=2,
            conv_num_filter=[num_filter] * groups,
            conv_filter_size=3,
            conv_act='relu',
            conv_with_batchnorm=True,
            conv_batchnorm_drop_rate=dropouts,
            pool_type='max')

    conv1 = conv_block(input, 64, 2, [0.3, 0])
    conv2 = conv_block(conv1, 128, 2, [0.4, 0])
    conv3 = conv_block(conv2, 256, 3, [0.4, 0.4, 0])
    conv4 = conv_block(conv3, 512, 3, [0.4, 0.4, 0])
    conv5 = conv_block(conv4, 512, 3, [0.4, 0.4, 0])

    # Two 512-dimensional fully connected layers.
    drop = fluid.layers.dropout(x=conv5, dropout_prob=0.5)
    fc1 = fluid.layers.fc(input=drop, size=512, act=None)
    bn = fluid.layers.batch_norm(input=fc1, act='relu')
    drop2 = fluid.layers.dropout(x=bn, dropout_prob=0.5)
    fc2 = fluid.layers.fc(input=drop2, size=512, act=None)
    # Classifier: softmax probabilities over the 10 CIFAR10 classes.
    predict = fluid.layers.fc(input=fc2, size=10, act='softmax')
    return predict

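First, a group of convolution layers is defined as conv_block. The convolution kernel size is 3x3, the pooling window is 2x2 with a stride of 2, groups sets the number of consecutive convolutions in each VGG block, and dropouts specifies the Dropout probabilities. The img_conv_group used here is a predefined module in fluid.nets, made up of several Conv->BN->ReLU->Dropout units followed by one Pooling layer.

There are five groups of convolutions, that is, five conv_block calls. The first and second groups each use two consecutive convolutions, and the third, fourth, and fifth groups each use three. The Dropout probability after the last convolution in each group is 0, meaning no Dropout is applied there.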
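The effect of the five conv_block calls on the feature map size can be traced with a few lines of arithmetic. This sketch assumes each 3x3 convolution uses a padding of 1 (so it keeps the spatial size), which is believed to be the default of img_conv_group but is an assumption here. The helpers conv_out and pool_out are illustrative names.

```python
def conv_out(size, kernel=3, stride=1, padding=1):
    """Output width/height of a convolution (assumed padding of 1)."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_out(size, window=2, stride=2):
    """Output width/height of a pooling layer without padding."""
    return (size - window) // stride + 1


# (num_filter, groups) for the five conv_block calls in vgg_bn_drop.
blocks = [(64, 2), (128, 2), (256, 3), (512, 3), (512, 3)]

size = 32  # CIFAR10 images are 32x32
shapes = []
for num_filter, groups in blocks:
    for _ in range(groups):
        size = conv_out(size)   # 3x3 convolution keeps the size
    size = pool_out(size)       # 2x2 max pooling halves it
    shapes.append((num_filter, size, size))

print(shapes)
# [(64, 16, 16), (128, 8, 8), (256, 4, 4), (512, 2, 2), (512, 1, 1)]
```

Under that assumption, the last block outputs a 512x1x1 feature map, so the fully connected part receives a 512-dimensional vector per image.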

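Finally, two 512-dimensional fully connected layers are attached.

Here the VGG network first extracts high-level features, then maps them in the fully connected layers to a vector whose size matches the number of classes, and finally uses Softmax to compute the probability that the image belongs to each class.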
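The Softmax step turns the class-sized vector into probabilities. Below is a minimal pure-Python sketch. The logits are made-up numbers for 10 classes, and subtracting the maximum is a common numerical-stability trick rather than something the tutorial specifies.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    largest = max(logits)
    exps = [math.exp(x - largest) for x in logits]  # shift for numerical stability
    total = sum(exps)
    return [e / total for e in exps]


# Made-up scores for the 10 CIFAR10 classes.
logits = [2.0, 1.0, 0.1, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
probs = softmax(logits)

print(len(probs))                # 10
print(probs.index(max(probs)))   # 0, the class with the largest score
```

The ordering of the scores is preserved, so the predicted class is simply the index of the largest probability.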

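3. ResNet

Steps 1, 3, and 4 of the ResNet model are the same as for the VGG model and are not repeated here. This section covers step 2, the core ResNet module for the CIFAR10 dataset.

We first introduce some basic functions used by resnet_cifar10 and then describe how the network is connected.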

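• conv_bn_layer: a convolution layer with BN.

• shortcut: the "shortcut" path of a residual module. It takes two forms: when the numbers of input and output channels of the residual module differ, a 1x1 convolution is used to raise the dimension; when they are equal, the input is passed through directly.

• basicblock: a basic residual module, shown on the left of Figure 7, consisting of a path with two 3x3 convolutions and a "shortcut" path.

• layer_warp: a group of residual modules stacked together. The stride of the first residual module in each group may differ from the others, which reduces the height and width of the feature map.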

# Convolution followed by batch normalization (the activation is applied by BN).
def conv_bn_layer(input,
                  ch_out,
                  filter_size,
                  stride,
                  padding,
                  act='relu',
                  bias_attr=False):
    tmp = fluid.layers.conv2d(
        input=input,
        filter_size=filter_size,
        num_filters=ch_out,
        stride=stride,
        padding=padding,
        act=None,
        bias_attr=bias_attr)
    return fluid.layers.batch_norm(input=tmp, act=act)


# Shortcut path: 1x1 convolution when the channel count changes, identity otherwise.
def shortcut(input, ch_in, ch_out, stride):
    if ch_in != ch_out:
        return conv_bn_layer(input, ch_out, 1, stride, 0, None)
    else:
        return input


# Basic residual block: two 3x3 convolutions plus the shortcut, then ReLU.
def basicblock(input, ch_in, ch_out, stride):
    tmp = conv_bn_layer(input, ch_out, 3, stride, 1)
    tmp = conv_bn_layer(tmp, ch_out, 3, 1, 1, act=None, bias_attr=True)
    short = shortcut(input, ch_in, ch_out, stride)
    return fluid.layers.elementwise_add(x=tmp, y=short, act='relu')


# A group of `count` residual blocks; only the first block uses `stride`.
def layer_warp(block_func, input, ch_in, ch_out, count, stride):
    tmp = block_func(input, ch_in, ch_out, stride)
    for i in range(1, count):
        tmp = block_func(tmp, ch_out, ch_out, 1)
    return tmp

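resnet_cifar10 is connected as follows.

The input is first connected to a conv_bn_layer, a convolution layer with BN.

Next come three groups of residual modules, configured below as three layer_warp calls, each built from the residual module on the left of Figure 7.

Finally, average pooling is applied and a fully connected softmax layer produces the returned prediction.

Note: excluding the first convolution layer and the last fully connected layer, the total number of parameterized layers in the three layer_warp groups must be divisible by 6, so the depth of resnet_cifar10 must satisfy (depth - 2) % 6 == 0.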
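The divisibility rule comes from the layout: three groups, each with n basic blocks of two convolutions, give 6n layers, and the first convolution plus the final fully connected layer add 2. The sketch below checks a few depths. The function name blocks_per_stage is illustrative.

```python
def blocks_per_stage(depth):
    """Return n, the number of basic blocks in each of the three groups."""
    if (depth - 2) % 6 != 0:
        raise ValueError("depth must satisfy (depth - 2) % 6 == 0, got %d" % depth)
    return (depth - 2) // 6


for depth in (20, 32, 44, 56, 110):
    n = blocks_per_stage(depth)
    # 3 groups * n blocks * 2 convolutions + first conv + final fc
    assert 3 * n * 2 + 2 == depth
    print("depth=%d -> %d blocks per group" % (depth, n))
```

For the default depth of 32 this gives 5 blocks per group, while a value such as 30 is rejected.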

def resnet_cifar10(ipt, depth=32):
    # depth should be one of 20, 32, 44, 56, 110, 1202
    assert (depth - 2) % 6 == 0
    n = (depth - 2) // 6
    nStages = {16, 64, 128}  # unused
    conv1 = conv_bn_layer(ipt, ch_out=16, filter_size=3, stride=1, padding=1)
    # Three groups of n basic blocks; groups 2 and 3 halve the feature map size.
    res1 = layer_warp(basicblock, conv1, 16, 16, n, 1)
    res2 = layer_warp(basicblock, res1, 16, 32, n, 2)
    res3 = layer_warp(basicblock, res2, 32, 64, n, 2)
    pool = fluid.layers.pool2d(
        input=res3, pool_size=8, pool_type='avg', pool_stride=1)
    predict = fluid.layers.fc(input=pool, size=10, act='softmax')
    return predict
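To see why pool_size=8 is used, the spatial size can be traced through the network with the standard convolution output formula. This is a sketch of the arithmetic only. The helper conv_out is an illustrative name, and only the first block of each group is traced because the remaining blocks use stride 1 and keep the size.

```python
def conv_out(size, kernel, stride, padding):
    """Output width/height of a convolution."""
    return (size + 2 * padding - kernel) // stride + 1


size = conv_out(32, kernel=3, stride=1, padding=1)  # conv1 on a 32x32 image
trace = [("conv1", 16, size)]

# (name, output channels, stride of the first block) for the three groups.
for name, channels, stride in [("res1", 16, 1), ("res2", 32, 2), ("res3", 64, 2)]:
    size = conv_out(size, kernel=3, stride=stride, padding=1)
    trace.append((name, channels, size))

# 8x8 average pooling with stride 1 and no padding.
pooled = (size - 8) // 1 + 1
trace.append(("pool", 64, pooled))

print(trace)
# [('conv1', 16, 32), ('res1', 16, 32), ('res2', 32, 16), ('res3', 64, 8), ('pool', 64, 1)]
```

The last group outputs 8x8 feature maps, so an 8x8 average pool reduces each of the 64 channels to a single value before the final fully connected layer.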

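4. Inference configuration

The network input is defined as a data layer, which in image classification holds the image pixels. CIFAR10 images are 32x32 RGB color images with 3 channels, so each input contains 3072 values (3x32x32).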

def inference_network():
    # The image is 32 * 32 with RGB representation.
    data_shape = [3, 32, 32]
    images = fluid.layers.data(name='pixel', shape=data_shape, dtype='float32')

    predict = resnet_cifar10(images, 32)
    # predict = vgg_bn_drop(images)  # un-comment to use vgg net
    return predict
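The relationship between the flat 3072 values and the [3, 32, 32] data shape can be sketched with index arithmetic. This assumes a row-major channel-height-width layout, which matches how the CIFAR10 files store each image (1024 red values, then 1024 green, then 1024 blue). The helper names are illustrative.

```python
CHANNELS, HEIGHT, WIDTH = 3, 32, 32


def flat_index(channel, row, col):
    """Position of pixel (channel, row, col) in the flat 3072-value vector."""
    return (channel * HEIGHT + row) * WIDTH + col


def unflatten(index):
    """Inverse of flat_index: recover (channel, row, col)."""
    channel, rest = divmod(index, HEIGHT * WIDTH)
    row, col = divmod(rest, WIDTH)
    return channel, row, col


print(CHANNELS * HEIGHT * WIDTH)  # 3072
print(flat_index(0, 0, 0))        # 0, first red value
print(flat_index(1, 0, 0))        # 1024, first green value
print(unflatten(3071))            # (2, 31, 31), last blue value
```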

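5. Training configuration

Next we set up the training program train_network. It takes the prediction from the inference program and computes avg_cost from it during training. Supervised training also needs the class label of each image, which is likewise defined with fluid.layers.data. Training uses multi-class cross-entropy as the loss function and as the network output, while in the prediction stage the network output is the probability produced by the classifier.

Note: the training program should return an array whose first element must be avg_cost. The trainer uses it to compute gradients.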

def train_network(predict):
    label = fluid.layers.data(name='label', shape=[1], dtype='int64')
    cost = fluid.layers.cross_entropy(input=predict, label=label)
    avg_cost = fluid.layers.mean(cost)
    accuracy = fluid.layers.accuracy(input=predict, label=label)
    return [avg_cost, accuracy]
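What train_network computes can be illustrated on a tiny batch in plain Python: the cross-entropy of each sample is the negative log of the probability assigned to its true class, avg_cost is the mean over the batch, and accuracy is the fraction of samples whose most probable class matches the label. The probabilities and labels below are made up, with 3 classes instead of 10 for brevity.

```python
import math


def cross_entropy(probs, label):
    """Multi-class cross-entropy for one sample given softmax probabilities."""
    return -math.log(probs[label])


# Made-up softmax outputs for a batch of 3 samples.
batch_probs = [
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.3, 0.4, 0.3],
]
labels = [0, 1, 2]

costs = [cross_entropy(p, y) for p, y in zip(batch_probs, labels)]
avg_cost = sum(costs) / len(costs)

predictions = [p.index(max(p)) for p in batch_probs]
accuracy = sum(int(pred == y) for pred, y in zip(predictions, labels)) / len(labels)

print(round(avg_cost, 4))   # 0.5946
print(round(accuracy, 4))   # 0.6667
```

The third sample is misclassified and also contributes the largest cost, which is the behavior the loss is designed to penalize.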

6. Optimizer configuration

In the Adam optimizer below, learning_rate is the learning rate, which affects how quickly training converges.

def optimizer_program():	
    return fluid.optimizer.Adam(learning_rate=0.001)
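To give a feel for what learning_rate means in Adam, here is a sketch of a single textbook Adam update for one scalar parameter. It is not PaddlePaddle's implementation. The function name adam_step is illustrative, and the beta1, beta2, and epsilon values are the commonly used defaults, assumed here.

```python
import math


def adam_step(param, grad, m, v, t,
              learning_rate=0.001, beta1=0.9, beta2=0.999, epsilon=1e-8):
    """One textbook Adam update for a scalar parameter at step t (t >= 1)."""
    m = beta1 * m + (1 - beta1) * grad          # running mean of gradients
    v = beta2 * v + (1 - beta2) * grad * grad   # running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)                # bias correction
    v_hat = v / (1 - beta2 ** t)
    param = param - learning_rate * m_hat / (math.sqrt(v_hat) + epsilon)
    return param, m, v


# First step from x = 0 with a small and a large gradient.
x_small, _, _ = adam_step(0.0, grad=-6.0, m=0.0, v=0.0, t=1)
x_large, _, _ = adam_step(0.0, grad=-600.0, m=0.0, v=0.0, t=1)

print(round(x_small, 6))  # 0.001
print(round(x_large, 6))  # 0.001
```

Because the update is normalized by the gradient magnitude, the first step is about learning_rate in size regardless of how large the gradient is, which is why learning_rate directly controls how fast the parameters move.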

7. Training the model

-1) Data feeder configuration

cifar.train10() yields one sample at a time. After shuffling and batching, the samples are used as training input.

# Each batch will yield 128 images
BATCH_SIZE = 128

# Reader for training
train_reader = paddle.batch(
    paddle.reader.shuffle(
        paddle.dataset.cifar.train10(), buf_size=128 * 100),
    batch_size=BATCH_SIZE)
# Reader for testing. A separated data set for testing.
test_reader = paddle.batch(
    paddle.dataset.cifar.test10(), batch_size=BATCH_SIZE)
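The reader pipeline above composes small generator factories. The following pure-Python sketch shows the general idea with simplified stand-ins for buffered shuffling and batching. These functions are illustrative only, not the PaddlePaddle implementations, and the seed argument and toy_reader are assumptions added for a reproducible example.

```python
import random


def shuffle(reader, buf_size, seed=0):
    """Stand-in for a buffered shuffle: shuffle samples within chunks of buf_size."""
    def shuffled_reader():
        rng = random.Random(seed)
        buffer = []
        for sample in reader():
            buffer.append(sample)
            if len(buffer) >= buf_size:
                rng.shuffle(buffer)
                for item in buffer:
                    yield item
                buffer = []
        rng.shuffle(buffer)  # flush the remaining samples
        for item in buffer:
            yield item
    return shuffled_reader


def batch(reader, batch_size):
    """Stand-in for batching: group samples into lists of batch_size."""
    def batch_reader():
        current = []
        for sample in reader():
            current.append(sample)
            if len(current) == batch_size:
                yield current
                current = []
        if current:  # keep the last, smaller batch
            yield current
    return batch_reader


def toy_reader():
    """Hypothetical dataset: 10 (feature, label) samples."""
    for i in range(10):
        yield (i, i % 2)


train_reader = batch(shuffle(toy_reader, buf_size=4), batch_size=3)
batches = list(train_reader())
print([len(b) for b in batches])  # [3, 3, 3, 1]
```

With the real settings, 50,000 training images and a batch size of 128 give 391 batches per epoch, the last one being smaller than the rest.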

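-2) Implementing the trainer program

We need to specify a main_program for training and, likewise, a test_program for testing. We also define the place where training runs and use the optimizer defined earlier.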

# Device selection: set use_cuda to True to train on GPU (value assumed here).
use_cuda = False
place = fluid.CUDAPlace(0) if use_cuda else fluid.CPUPlace()

feed_order = ['pixel', 'label']

main_program = fluid.default_main_program()
star_program = fluid.default_startup_program()

predict = inference_network()
avg_cost, acc = train_network(predict)

# Test program: cloned before the optimizer is added, so it holds only forward ops.
test_program = main_program.clone(for_test=True)

optimizer = optimizer_program()
optimizer.minimize(avg_cost)

exe = fluid.Executor(place)

EPOCH_NUM = 1


# Average the cost and accuracy of `program` over all batches of `reader`.
def train_test(program, reader):
    count = 0
    feed_var_list = [
        program.global_block().var(var_name) for var_name in feed_order
    ]
    feeder_test = fluid.DataFeeder(feed_list=feed_var_list, place=place)
    test_exe = fluid.Executor(place)
    accumulated = len([avg_cost, acc]) * [0]
    for tid, test_data in enumerate(reader()):
        avg_cost_np = test_exe.run(
            program=program,
            feed=feeder_test.feed(test_data),
            fetch_list=[avg_cost, acc])
        accumulated = [
            x[0] + x[1][0] for x in zip(accumulated, avg_cost_np)
        ]
        count += 1
    return [x / count for x in accumulated]

-3) Main training loop and progress output

In the main training loop below, we watch the training progress through the printed output and run tests.

# Directory where the inference model is saved after each pass (example name, assumed here).
params_dirname = "image_classification_resnet.inference.model"


# main train loop.
def train_loop():
    feed_var_list_loop = [
        main_program.global_block().var(var_name) for var_name in feed_order
    ]
    feeder = fluid.DataFeeder(feed_list=feed_var_list_loop, place=place)
    exe.run(star_program)

    step = 0
    for pass_id in range(EPOCH_NUM):
        for step_id, data_train in enumerate(train_reader()):
            avg_loss_value = exe.run(
                main_program,
                feed=feeder.feed(data_train),
                fetch_list=[avg_cost, acc])
            if step_id % 100 == 0:
                # Arguments follow the format string: pass index, then batch index.
                print("\nPass %d, Batch %d, Cost %f, Acc %f" % (
                    pass_id, step_id, avg_loss_value[0], avg_loss_value[1]))
            else:
                sys.stdout.write('.')
                sys.stdout.flush()
            step += 1

        # Evaluate on the test set at the end of each pass.
        avg_cost_test, accuracy_test = train_test(
            test_program, reader=test_reader)
        print('\nTest with Pass {0}, Loss {1:2.2}, Acc {2:2.2}'.format(
            pass_id, avg_cost_test, accuracy_test))

        if params_dirname is not None:
            fluid.io.save_inference_model(params_dirname, ["pixel"],
                                          [predict], exe)

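-4) Training

Training runs through the train_loop function. Here we run only 1 epoch (EPOCH_NUM = 1). In practice, training usually runs for hundreds of epochs.

Note: on a CPU, each epoch takes roughly 15 to 20 minutes, so this part may take a while. Feel free to modify the code to run on a GPU to speed up training.

train_loop()

A sample log for one round of training is shown below. After 1 pass, the average accuracy is about 0.59 on the training set and 0.6 on the test set. In this sample log, the number printed after "Pass" is the batch index and the number after "Batch" is the pass index.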

Pass 0, Batch 0, Cost 3.869598, Acc 0.164062

...................................................................................................

Pass 100, Batch 0, Cost 1.481038, Acc 0.460938

...................................................................................................

Pass 200, Batch 0, Cost 1.340323, Acc 0.523438

...................................................................................................

Pass 300, Batch 0, Cost 1.223424, Acc 0.593750

..........................................................................................

Test with Pass 0, Loss 1.1, Acc 0.6

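Figure 13 shows the classification error rate curve during training. Training essentially converges after the 200th pass, and the final classification error rate on the test set is 8.54%.

Figure 13. Classification error rate of the VGG model on the CIFAR10 dataset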

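Applying the model

The trained model can be used to classify images. The following program shows how to load the trained network and parameters for inference.

1. Generate input data for inference

dog.png is a picture of a puppy. We convert it to a numpy array to match the format expected by the feeder.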

from PIL import Image


def load_image(infer_file):
    im = Image.open(infer_file)
    # Image.ANTIALIAS was removed in Pillow 10; use Image.LANCZOS there.
    im = im.resize((32, 32), Image.ANTIALIAS)

    im = numpy.array(im).astype(numpy.float32)
    # The loaded array is stored in H(height), W(width), C(channel) order.
    # PaddlePaddle requires the CHW order, so transpose the axes.
    im = im.transpose((2, 0, 1))  # CHW
    im = im / 255.0

    # Add a leading batch dimension to mimic the list format.
    im = numpy.expand_dims(im, axis=0)
    return im


cur_dir = os.path.dirname(os.path.realpath(__file__))
img = load_image(cur_dir + '/image/dog.png')
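The axis reordering and scaling done in load_image can be illustrated without any image library. This sketch converts a made-up 2x2 RGB image from height-width-channel order to channel-height-width order and scales the values to [0, 1]. The function name hwc_to_chw is illustrative.

```python
def hwc_to_chw(image):
    """Reorder a nested-list image from HWC to CHW and scale 0-255 to 0-1."""
    height, width, channels = len(image), len(image[0]), len(image[0][0])
    return [[[image[h][w][c] / 255.0 for w in range(width)]
             for h in range(height)]
            for c in range(channels)]


# Made-up 2x2 image: red, green on the first row; blue, white on the second.
image_hwc = [
    [[255, 0, 0], [0, 255, 0]],
    [[0, 0, 255], [255, 255, 255]],
]
image_chw = hwc_to_chw(image_hwc)
batch = [image_chw]  # leading batch dimension of size 1

print(image_chw[0])  # red plane:   [[1.0, 0.0], [0.0, 1.0]]
print(image_chw[1])  # green plane: [[0.0, 1.0], [0.0, 1.0]]
print(image_chw[2])  # blue plane:  [[0.0, 0.0], [1.0, 1.0]]
```

After the conversion each channel is a contiguous plane, which is the layout the 'pixel' data layer with shape [3, 32, 32] expects.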

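2. Inferencer configuration and prediction

As with training, the inferencer needs its own setup. We load the network and the trained parameters from params_dirname, plug in the inference program defined earlier, and are then ready to predict.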

place = fluid.CUDAPlace(0) if use_cuda else fluid.CPUPlace()
exe = fluid.Executor(place)
inference_scope = fluid.core.Scope()

with fluid.scope_guard(inference_scope):
    # Use fluid.io.load_inference_model to obtain the inference program desc,
    # the feed_target_names (the names of variables that will be fed
    # data using feed operators), and the fetch_targets (variables that
    # we want to obtain data from using fetch operators).
    [inference_program, feed_target_names,
     fetch_targets] = fluid.io.load_inference_model(params_dirname, exe)

    # The input's dimension of conv should be 4-D or 5-D.
    # Use inference_transpiler to speed up inference.
    inference_transpiler_program = inference_program.clone()
    t = fluid.transpiler.InferenceTranspiler()
    t.transpile(inference_transpiler_program, place)

    # Construct feed as a dictionary of {feed_target_name: feed_target_data}
    # and results will contain a list of data corresponding to fetch_targets.
    results = exe.run(
        inference_program,
        feed={feed_target_names[0]: img},
        fetch_list=fetch_targets)

    transpiler_results = exe.run(
        inference_transpiler_program,
        feed={feed_target_names[0]: img},
        fetch_list=fetch_targets)

    # The transpiled program must give the same results as the original one.
    assert len(results[0]) == len(transpiler_results[0])
    for i in range(len(results[0])):
        numpy.testing.assert_almost_equal(
            results[0][i], transpiler_results[0][i], decimal=5)

    # infer label
    label_list = [
        "airplane", "automobile", "bird", "cat", "deer", "dog", "frog",
        "horse", "ship", "truck"
    ]

    print("infer results: %s" % label_list[numpy.argmax(results[0])])
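The last step of the inference code maps the most probable class index to a label name. The same lookup can be sketched in plain Python. The probabilities below are made up for illustration, and top_prediction is an illustrative helper name.

```python
label_list = [
    "airplane", "automobile", "bird", "cat", "deer", "dog", "frog",
    "horse", "ship", "truck"
]


def top_prediction(probabilities):
    """Return the label with the highest probability and that probability."""
    best = max(range(len(probabilities)), key=probabilities.__getitem__)
    return label_list[best], probabilities[best]


# Made-up softmax output for one image.
probs = [0.01, 0.02, 0.05, 0.10, 0.02, 0.70, 0.03, 0.04, 0.02, 0.01]
label, confidence = top_prediction(probs)
print("infer results: %s (%.2f)" % (label, confidence))  # infer results: dog (0.70)
```

Returning the probability alongside the label makes it easy to spot low-confidence predictions.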

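Summary

Traditional image classification methods consist of multiple stages with a fairly complex framework, whereas an end-to-end CNN handles the task in a single step and greatly improves classification accuracy. In this article we first introduced two classic models, VGG and ResNet. Then, based on the CIFAR10 dataset, we showed how to configure and train a CNN model with PaddlePaddle. Finally, we showed how to use the PaddlePaddle API for prediction and feature extraction on an image. The configuration and training flow is the same for other datasets such as ImageNet. For details, see GitHub:

https://github.com/PaddlePaddle/models/blob/develop/PaddleCV/image_classification/README_cn.md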

References

[1] G.E. Hinton, N. Srivastava, A. Krizhevsky, I. Sutskever, and R.R. Salakhutdinov. Improving neural networks by preventing co-adaptation of feature detectors. arXiv preprint arXiv:1207.0580, 2012.

[2] K. Chatfield, K. Simonyan, A. Vedaldi, A. Zisserman. Return of the Devil in the Details: Delving Deep into Convolutional Nets. BMVC, 2014.

[3] K. He, X. Zhang, S. Ren, J. Sun. Deep Residual Learning for Image Recognition. CVPR 2016.

[4] He, K., Zhang, X., Ren, S., and Sun, J. Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification. ArXiv e-prints, February 2015.

[5] http://deeplearning.net/tutorial/lenet.html

[6] https://www.cs.toronto.edu/~kriz/cifar.html

[7] http://cs231n.github.io/classification/


This article is also published on the blog "飛槳 PaddlePaddle" (CSDN).