Starting with TensorFlow 1.0, Google shipped a library called slim. TF-Slim is a lightweight, high-level API for TensorFlow. The module was introduced in 2016, and its main purpose is so-called "code slimming". Much like the tf.contrib.layers module introduced earlier in this TensorFlow series, it wraps many common TensorFlow functions a second time so that code becomes far more concise. It is particularly well suited to building deep neural networks with complex structures, and it can be used to define, train, and evaluate complex models.
Why cover this topic here? Mainly because TensorFlow's models repository provides a large amount of network-architecture code written with slim, along with checkpoint files trained from that code, which we can use as pre-trained models. We therefore need to know how to use the slim library.
To be able to use the code in models, first verify that your TensorFlow version includes the slim module, and then download the models code from GitHub.
Before using slim, test whether the local tf.contrib.slim module works by entering the following at the command line:
python -c "import tensorflow.contrib.slim as slim; eval = slim.evaluation.evaluate_once"
If no error is raised, TF-Slim is working.
To use TF-Slim for image classification, you also have to install the TF-Slim image models library, which is not part of the core TF library. To do this, check out the tensorflow/models repository as follows:
cd $HOME/workspace
git clone https://github.com/tensorflow/models/
This will put the TF-Slim image models library in $HOME/workspace/models/research/slim. (It will also create a directory called models/inception, which contains an older version of slim; you can safely ignore this.)
To verify that this has worked, execute the following commands; it should run without raising any errors.
cd $HOME/workspace/models/research/slim
python -c "from nets import cifarnet; mynet = cifarnet.cifarnet"
I am using the Windows operating system, so I downloaded the module directly from https://github.com/tensorflow/models/:
slim lives under \models-master\research\slim and contains five folders:
Here we focus on three of them: datasets, nets, and preprocessing.
The datasets folder holds the code for common image training datasets; the main ones supported are cifar10, flowers, mnist, and imagenet.
The code files are named after the corresponding datasets, and they can be used to download or fetch the data in those datasets. Taking imagenet as an example, the following function fetches the imagenet labels from the web:
imagenet_map = imagenet.create_readable_names_for_imagenet_labels()
The code above returns the readable label names for imagenet's 1000 classes (matching the label ids of the samples).
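For example, the returned dict can be indexed with an integer label (a small sketch; the label id 281 used here is only an illustrative value):

from datasets import imagenet

# Sketch: look up the readable class name for a label id.
# Entry 0 is 'background'; entries 1-1000 are the ImageNet classes.
imagenet_map = imagenet.create_readable_names_for_imagenet_labels()
print(imagenet_map[281])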
The nets folder contains the various network model modules:
Each network model file is named after its model, and the code in each file follows roughly the same structure. Take inception_resnet_v2 as an example:
# Copyright 2016 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. # You may obtain a copy of the License at # # http://www.apache.org/licenses/LICENSE-2.0 # # Unless required by applicable law or agreed to in writing, software # distributed under the License is distributed on an "AS IS" BASIS, # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. # See the License for the specific language governing permissions and # limitations under the License. # ============================================================================== """Contains the definition of the Inception Resnet V2 architecture. As described in http://arxiv.org/abs/1602.07261. Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning Christian Szegedy, Sergey Ioffe, Vincent Vanhoucke, Alex Alemi """ from __future__ import absolute_import from __future__ import division from __future__ import print_function import tensorflow as tf slim = tf.contrib.slim def block35(net, scale=1.0, activation_fn=tf.nn.relu, scope=None, reuse=None): """Builds the 35x35 resnet block.""" with tf.variable_scope(scope, 'Block35', [net], reuse=reuse): with tf.variable_scope('Branch_0'): tower_conv = slim.conv2d(net, 32, 1, scope='Conv2d_1x1') with tf.variable_scope('Branch_1'): tower_conv1_0 = slim.conv2d(net, 32, 1, scope='Conv2d_0a_1x1') tower_conv1_1 = slim.conv2d(tower_conv1_0, 32, 3, scope='Conv2d_0b_3x3') with tf.variable_scope('Branch_2'): tower_conv2_0 = slim.conv2d(net, 32, 1, scope='Conv2d_0a_1x1') tower_conv2_1 = slim.conv2d(tower_conv2_0, 48, 3, scope='Conv2d_0b_3x3') tower_conv2_2 = slim.conv2d(tower_conv2_1, 64, 3, scope='Conv2d_0c_3x3') mixed = tf.concat(axis=3, values=[tower_conv, tower_conv1_1, tower_conv2_2]) up = slim.conv2d(mixed, net.get_shape()[3], 1, normalizer_fn=None, activation_fn=None, scope='Conv2d_1x1') scaled_up = up * scale if activation_fn == tf.nn.relu6: # Use clip_by_value to simulate bandpass activation. scaled_up = tf.clip_by_value(scaled_up, -6.0, 6.0) net += scaled_up if activation_fn: net = activation_fn(net) return net def block17(net, scale=1.0, activation_fn=tf.nn.relu, scope=None, reuse=None): """Builds the 17x17 resnet block.""" with tf.variable_scope(scope, 'Block17', [net], reuse=reuse): with tf.variable_scope('Branch_0'): tower_conv = slim.conv2d(net, 192, 1, scope='Conv2d_1x1') with tf.variable_scope('Branch_1'): tower_conv1_0 = slim.conv2d(net, 128, 1, scope='Conv2d_0a_1x1') tower_conv1_1 = slim.conv2d(tower_conv1_0, 160, [1, 7], scope='Conv2d_0b_1x7') tower_conv1_2 = slim.conv2d(tower_conv1_1, 192, [7, 1], scope='Conv2d_0c_7x1') mixed = tf.concat(axis=3, values=[tower_conv, tower_conv1_2]) up = slim.conv2d(mixed, net.get_shape()[3], 1, normalizer_fn=None, activation_fn=None, scope='Conv2d_1x1') scaled_up = up * scale if activation_fn == tf.nn.relu6: # Use clip_by_value to simulate bandpass activation. 
scaled_up = tf.clip_by_value(scaled_up, -6.0, 6.0) net += scaled_up if activation_fn: net = activation_fn(net) return net def block8(net, scale=1.0, activation_fn=tf.nn.relu, scope=None, reuse=None): """Builds the 8x8 resnet block.""" with tf.variable_scope(scope, 'Block8', [net], reuse=reuse): with tf.variable_scope('Branch_0'): tower_conv = slim.conv2d(net, 192, 1, scope='Conv2d_1x1') with tf.variable_scope('Branch_1'): tower_conv1_0 = slim.conv2d(net, 192, 1, scope='Conv2d_0a_1x1') tower_conv1_1 = slim.conv2d(tower_conv1_0, 224, [1, 3], scope='Conv2d_0b_1x3') tower_conv1_2 = slim.conv2d(tower_conv1_1, 256, [3, 1], scope='Conv2d_0c_3x1') mixed = tf.concat(axis=3, values=[tower_conv, tower_conv1_2]) up = slim.conv2d(mixed, net.get_shape()[3], 1, normalizer_fn=None, activation_fn=None, scope='Conv2d_1x1') scaled_up = up * scale if activation_fn == tf.nn.relu6: # Use clip_by_value to simulate bandpass activation. scaled_up = tf.clip_by_value(scaled_up, -6.0, 6.0) net += scaled_up if activation_fn: net = activation_fn(net) return net def inception_resnet_v2_base(inputs, final_endpoint='Conv2d_7b_1x1', output_stride=16, align_feature_maps=False, scope=None, activation_fn=tf.nn.relu): """Inception model from http://arxiv.org/abs/1602.07261. Constructs an Inception Resnet v2 network from inputs to the given final endpoint. This method can construct the network up to the final inception block Conv2d_7b_1x1. Args: inputs: a tensor of size [batch_size, height, width, channels]. final_endpoint: specifies the endpoint to construct the network up to. It can be one of ['Conv2d_1a_3x3', 'Conv2d_2a_3x3', 'Conv2d_2b_3x3', 'MaxPool_3a_3x3', 'Conv2d_3b_1x1', 'Conv2d_4a_3x3', 'MaxPool_5a_3x3', 'Mixed_5b', 'Mixed_6a', 'PreAuxLogits', 'Mixed_7a', 'Conv2d_7b_1x1'] output_stride: A scalar that specifies the requested ratio of input to output spatial resolution. Only supports 8 and 16. align_feature_maps: When true, changes all the VALID paddings in the network to SAME padding so that the feature maps are aligned. scope: Optional variable_scope. activation_fn: Activation function for block scopes. Returns: tensor_out: output tensor corresponding to the final_endpoint. end_points: a set of activations for external use, for example summaries or losses. Raises: ValueError: if final_endpoint is not set to one of the predefined values, or if the output_stride is not 8 or 16, or if the output_stride is 8 and we request an end point after 'PreAuxLogits'. 
""" if output_stride != 8 and output_stride != 16: raise ValueError('output_stride must be 8 or 16.') padding = 'SAME' if align_feature_maps else 'VALID' end_points = {} def add_and_check_final(name, net): end_points[name] = net return name == final_endpoint with tf.variable_scope(scope, 'InceptionResnetV2', [inputs]): with slim.arg_scope([slim.conv2d, slim.max_pool2d, slim.avg_pool2d], stride=1, padding='SAME'): # 149 x 149 x 32 net = slim.conv2d(inputs, 32, 3, stride=2, padding=padding, scope='Conv2d_1a_3x3') if add_and_check_final('Conv2d_1a_3x3', net): return net, end_points # 147 x 147 x 32 net = slim.conv2d(net, 32, 3, padding=padding, scope='Conv2d_2a_3x3') if add_and_check_final('Conv2d_2a_3x3', net): return net, end_points # 147 x 147 x 64 net = slim.conv2d(net, 64, 3, scope='Conv2d_2b_3x3') if add_and_check_final('Conv2d_2b_3x3', net): return net, end_points # 73 x 73 x 64 net = slim.max_pool2d(net, 3, stride=2, padding=padding, scope='MaxPool_3a_3x3') if add_and_check_final('MaxPool_3a_3x3', net): return net, end_points # 73 x 73 x 80 net = slim.conv2d(net, 80, 1, padding=padding, scope='Conv2d_3b_1x1') if add_and_check_final('Conv2d_3b_1x1', net): return net, end_points # 71 x 71 x 192 net = slim.conv2d(net, 192, 3, padding=padding, scope='Conv2d_4a_3x3') if add_and_check_final('Conv2d_4a_3x3', net): return net, end_points # 35 x 35 x 192 net = slim.max_pool2d(net, 3, stride=2, padding=padding, scope='MaxPool_5a_3x3') if add_and_check_final('MaxPool_5a_3x3', net): return net, end_points # 35 x 35 x 320 with tf.variable_scope('Mixed_5b'): with tf.variable_scope('Branch_0'): tower_conv = slim.conv2d(net, 96, 1, scope='Conv2d_1x1') with tf.variable_scope('Branch_1'): tower_conv1_0 = slim.conv2d(net, 48, 1, scope='Conv2d_0a_1x1') tower_conv1_1 = slim.conv2d(tower_conv1_0, 64, 5, scope='Conv2d_0b_5x5') with tf.variable_scope('Branch_2'): tower_conv2_0 = slim.conv2d(net, 64, 1, scope='Conv2d_0a_1x1') tower_conv2_1 = slim.conv2d(tower_conv2_0, 96, 3, scope='Conv2d_0b_3x3') tower_conv2_2 = slim.conv2d(tower_conv2_1, 96, 3, scope='Conv2d_0c_3x3') with tf.variable_scope('Branch_3'): tower_pool = slim.avg_pool2d(net, 3, stride=1, padding='SAME', scope='AvgPool_0a_3x3') tower_pool_1 = slim.conv2d(tower_pool, 64, 1, scope='Conv2d_0b_1x1') net = tf.concat( [tower_conv, tower_conv1_1, tower_conv2_2, tower_pool_1], 3) if add_and_check_final('Mixed_5b', net): return net, end_points # TODO(alemi): Register intermediate endpoints net = slim.repeat(net, 10, block35, scale=0.17, activation_fn=activation_fn) # 17 x 17 x 1088 if output_stride == 8, # 33 x 33 x 1088 if output_stride == 16 use_atrous = output_stride == 8 with tf.variable_scope('Mixed_6a'): with tf.variable_scope('Branch_0'): tower_conv = slim.conv2d(net, 384, 3, stride=1 if use_atrous else 2, padding=padding, scope='Conv2d_1a_3x3') with tf.variable_scope('Branch_1'): tower_conv1_0 = slim.conv2d(net, 256, 1, scope='Conv2d_0a_1x1') tower_conv1_1 = slim.conv2d(tower_conv1_0, 256, 3, scope='Conv2d_0b_3x3') tower_conv1_2 = slim.conv2d(tower_conv1_1, 384, 3, stride=1 if use_atrous else 2, padding=padding, scope='Conv2d_1a_3x3') with tf.variable_scope('Branch_2'): tower_pool = slim.max_pool2d(net, 3, stride=1 if use_atrous else 2, padding=padding, scope='MaxPool_1a_3x3') net = tf.concat([tower_conv, tower_conv1_2, tower_pool], 3) if add_and_check_final('Mixed_6a', net): return net, end_points # TODO(alemi): register intermediate endpoints with slim.arg_scope([slim.conv2d], rate=2 if use_atrous else 1): net = slim.repeat(net, 20, block17, 
scale=0.10, activation_fn=activation_fn) if add_and_check_final('PreAuxLogits', net): return net, end_points if output_stride == 8: # TODO(gpapan): Properly support output_stride for the rest of the net. raise ValueError('output_stride==8 is only supported up to the ' 'PreAuxlogits end_point for now.') # 8 x 8 x 2080 with tf.variable_scope('Mixed_7a'): with tf.variable_scope('Branch_0'): tower_conv = slim.conv2d(net, 256, 1, scope='Conv2d_0a_1x1') tower_conv_1 = slim.conv2d(tower_conv, 384, 3, stride=2, padding=padding, scope='Conv2d_1a_3x3') with tf.variable_scope('Branch_1'): tower_conv1 = slim.conv2d(net, 256, 1, scope='Conv2d_0a_1x1') tower_conv1_1 = slim.conv2d(tower_conv1, 288, 3, stride=2, padding=padding, scope='Conv2d_1a_3x3') with tf.variable_scope('Branch_2'): tower_conv2 = slim.conv2d(net, 256, 1, scope='Conv2d_0a_1x1') tower_conv2_1 = slim.conv2d(tower_conv2, 288, 3, scope='Conv2d_0b_3x3') tower_conv2_2 = slim.conv2d(tower_conv2_1, 320, 3, stride=2, padding=padding, scope='Conv2d_1a_3x3') with tf.variable_scope('Branch_3'): tower_pool = slim.max_pool2d(net, 3, stride=2, padding=padding, scope='MaxPool_1a_3x3') net = tf.concat( [tower_conv_1, tower_conv1_1, tower_conv2_2, tower_pool], 3) if add_and_check_final('Mixed_7a', net): return net, end_points # TODO(alemi): register intermediate endpoints net = slim.repeat(net, 9, block8, scale=0.20, activation_fn=activation_fn) net = block8(net, activation_fn=None) # 8 x 8 x 1536 net = slim.conv2d(net, 1536, 1, scope='Conv2d_7b_1x1') if add_and_check_final('Conv2d_7b_1x1', net): return net, end_points raise ValueError('final_endpoint (%s) not recognized', final_endpoint) def inception_resnet_v2(inputs, num_classes=1001, is_training=True, dropout_keep_prob=0.8, reuse=None, scope='InceptionResnetV2', create_aux_logits=True, activation_fn=tf.nn.relu): """Creates the Inception Resnet V2 model. Args: inputs: a 4-D tensor of size [batch_size, height, width, 3]. Dimension batch_size may be undefined. If create_aux_logits is false, also height and width may be undefined. num_classes: number of predicted classes. If 0 or None, the logits layer is omitted and the input features to the logits layer (before dropout) are returned instead. is_training: whether is training or not. dropout_keep_prob: float, the fraction to keep before final layer. reuse: whether or not the network and its variables should be reused. To be able to reuse 'scope' must be given. scope: Optional variable_scope. create_aux_logits: Whether to include the auxilliary logits. activation_fn: Activation function for conv2d. Returns: net: the output of the logits layer (if num_classes is a non-zero integer), or the non-dropped-out input to the logits layer (if num_classes is 0 or None). end_points: the set of end_points from the inception model. 
""" end_points = {} with tf.variable_scope(scope, 'InceptionResnetV2', [inputs], reuse=reuse) as scope: with slim.arg_scope([slim.batch_norm, slim.dropout], is_training=is_training): net, end_points = inception_resnet_v2_base(inputs, scope=scope, activation_fn=activation_fn) if create_aux_logits and num_classes: with tf.variable_scope('AuxLogits'): aux = end_points['PreAuxLogits'] aux = slim.avg_pool2d(aux, 5, stride=3, padding='VALID', scope='Conv2d_1a_3x3') aux = slim.conv2d(aux, 128, 1, scope='Conv2d_1b_1x1') aux = slim.conv2d(aux, 768, aux.get_shape()[1:3], padding='VALID', scope='Conv2d_2a_5x5') aux = slim.flatten(aux) aux = slim.fully_connected(aux, num_classes, activation_fn=None, scope='Logits') end_points['AuxLogits'] = aux with tf.variable_scope('Logits'): # TODO(sguada,arnoegw): Consider adding a parameter global_pool which # can be set to False to disable pooling here (as in resnet_*()). kernel_size = net.get_shape()[1:3] if kernel_size.is_fully_defined(): net = slim.avg_pool2d(net, kernel_size, padding='VALID', scope='AvgPool_1a_8x8') else: net = tf.reduce_mean(net, [1, 2], keep_dims=True, name='global_pool') end_points['global_pool'] = net if not num_classes: return net, end_points net = slim.flatten(net) net = slim.dropout(net, dropout_keep_prob, is_training=is_training, scope='Dropout') end_points['PreLogitsFlatten'] = net logits = slim.fully_connected(net, num_classes, activation_fn=None, scope='Logits') end_points['Logits'] = logits end_points['Predictions'] = tf.nn.softmax(logits, name='Predictions') return logits, end_points inception_resnet_v2.default_image_size = 299 def inception_resnet_v2_arg_scope(weight_decay=0.00004, batch_norm_decay=0.9997, batch_norm_epsilon=0.001, activation_fn=tf.nn.relu): """Returns the scope with the default parameters for inception_resnet_v2. Args: weight_decay: the weight decay for weights variables. batch_norm_decay: decay for the moving average of batch_norm momentums. batch_norm_epsilon: small float added to variance to avoid dividing by zero. activation_fn: Activation function for conv2d. Returns: a arg_scope with the parameters needed for inception_resnet_v2. """ # Set weight_decay for weights in conv2d and fully_connected layers. with slim.arg_scope([slim.conv2d, slim.fully_connected], weights_regularizer=slim.l2_regularizer(weight_decay), biases_regularizer=slim.l2_regularizer(weight_decay)): batch_norm_params = { 'decay': batch_norm_decay, 'epsilon': batch_norm_epsilon, 'fused': None, # Use fused batch norm if possible. } # Set activation_fn and parameters for batch_norm. with slim.arg_scope([slim.conv2d], activation_fn=activation_fn, normalizer_fn=slim.batch_norm, normalizer_params=batch_norm_params) as scope: return scope
The file exposes the following interfaces: the residual blocks block35, block17 and block8, plus inception_resnet_v2_base, inception_resnet_v2 and inception_resnet_v2_arg_scope.
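A minimal usage sketch (the placeholder shape is an assumption; 299 is inception_resnet_v2.default_image_size): combine the model's arg_scope with its network function to build the graph.

import tensorflow as tf
from nets import inception_resnet_v2

slim = tf.contrib.slim

# Sketch: build the Inception-ResNet-v2 graph for 299x299 RGB inputs.
images = tf.placeholder(tf.float32, [None, 299, 299, 3])
with slim.arg_scope(inception_resnet_v2.inception_resnet_v2_arg_scope()):
    logits, end_points = inception_resnet_v2.inception_resnet_v2(
        images, num_classes=1001, is_training=False)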
The preprocessing folder contains several image-preprocessing files, again named after the models. slim groups the preprocessing functions commonly used by a family of models into one file named after that family, and these files also follow roughly the same structure. For example, inception preprocessing is invoked through the following function:
inception_preprocessing.preprocess_image
This function converts the incoming image to the model's input size and normalizes it.
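A minimal sketch of the call (raw_image is assumed to be an already-decoded image tensor, e.g. from tf.image.decode_jpeg):

from preprocessing import inception_preprocessing

# Sketch: crop/resize to the model's input size and scale the pixel values.
processed_image = inception_preprocessing.preprocess_image(
    raw_image, height=299, width=299, is_training=False)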
As part of this library, we've included scripts to download several popular image datasets (listed below) and convert them to slim format.
TFRecord is TensorFlow's recommended dataset format and is tightly integrated with the framework. TensorFlow provides a set of APIs for accessing TFRecord files. The format exists mainly so that, when working with huge sample sets, data can be read from disk while training is running: the raw files are converted to TFRecord format and then read by multiple threads at run time, which relieves the main training thread and makes training more efficient. For details on the TFRecord format, see the earlier article in this series, "Section 12: several ways of reading data in TensorFlow and the use of queues".
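As a quick illustration of the format (a minimal sketch that is not part of the slim code; the file name and label value are made up), a TF-Example can be written to a TFRecord file and read back like this:

import tensorflow as tf

# Sketch: serialize one TF-Example into a TFRecord file, then read it back.
example = tf.train.Example(features=tf.train.Features(feature={
    'image/class/label': tf.train.Feature(
        int64_list=tf.train.Int64List(value=[3])),
}))
with tf.python_io.TFRecordWriter('/tmp/demo.tfrecord') as writer:
    writer.write(example.SerializeToString())

for record in tf.python_io.tf_record_iterator('/tmp/demo.tfrecord'):
    print(tf.train.Example.FromString(record))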
For each dataset, we'll need to download the raw data and convert it to TensorFlow's native TFRecord format. Each TFRecord contains a TF-Example protocol buffer. Below we demonstrate how to do this for the Flowers dataset.
$ DATA_DIR=/tmp/data/flowers
$ python download_and_convert_data.py \
    --dataset_name=flowers \
    --dataset_dir="${DATA_DIR}"
There are two key arguments here: the dataset name (flowers in this example) and the download directory (here /tmp/data/flowers).
When the script finishes you will find several TFRecord files created:
These represent the training and validation data, sharded over 5 files each. You will also find the $DATA_DIR/labels.txt file, which contains the mapping from integer labels to class names.
You can use the same script to create the mnist and cifar10 datasets. However, for ImageNet, you have to follow the instructions here. Note that you first have to sign up for an account at image-net.org. Also, the download can take several hours, and could use up to 500GB.
Here I will walk through the code that gets executed. Open the download_and_convert_data.py file; its contents are as follows:
# Copyright 2016 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. # You may obtain a copy of the License at # # http://www.apache.org/licenses/LICENSE-2.0 # # Unless required by applicable law or agreed to in writing, software # distributed under the License is distributed on an "AS IS" BASIS, # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. # See the License for the specific language governing permissions and # limitations under the License. # ============================================================================== r"""Downloads and converts a particular dataset. Usage: ```shell $ python download_and_convert_data.py \ --dataset_name=mnist \ --dataset_dir=/tmp/mnist $ python download_and_convert_data.py \ --dataset_name=cifar10 \ --dataset_dir=/tmp/cifar10 $ python download_and_convert_data.py \ --dataset_name=flowers \ --dataset_dir=/tmp/flowers ``` """ from __future__ import absolute_import from __future__ import division from __future__ import print_function import tensorflow as tf from datasets import download_and_convert_cifar10 from datasets import download_and_convert_flowers from datasets import download_and_convert_mnist FLAGS = tf.app.flags.FLAGS tf.app.flags.DEFINE_string( 'dataset_name', None, 'The name of the dataset to convert, one of "cifar10", "flowers", "mnist".') tf.app.flags.DEFINE_string( 'dataset_dir', None, 'The directory where the output TFRecords and temporary files are saved.') def main(_): if not FLAGS.dataset_name: raise ValueError('You must supply the dataset name with --dataset_name') if not FLAGS.dataset_dir: raise ValueError('You must supply the dataset directory with --dataset_dir') if FLAGS.dataset_name == 'cifar10': download_and_convert_cifar10.run(FLAGS.dataset_dir) elif FLAGS.dataset_name == 'flowers': download_and_convert_flowers.run(FLAGS.dataset_dir) elif FLAGS.dataset_name == 'mnist': download_and_convert_mnist.run(FLAGS.dataset_dir) else: raise ValueError( 'dataset_name [%s] was not recognized.' % FLAGS.dataset_name) if __name__ == '__main__': tf.app.run()
The download_and_convert_flowers.run function is defined in download_and_convert_flowers.py; the run() function looks like this:
def run(dataset_dir):
  """Runs the download and conversion operation.

  Args:
    dataset_dir: The dataset directory where the dataset is stored.
  """
  if not tf.gfile.Exists(dataset_dir):
    tf.gfile.MakeDirs(dataset_dir)

  if _dataset_exists(dataset_dir):
    print('Dataset files already exist. Exiting without re-creating them.')
    return

  dataset_utils.download_and_uncompress_tarball(_DATA_URL, dataset_dir)
  photo_filenames, class_names = _get_filenames_and_classes(dataset_dir)
  class_names_to_ids = dict(zip(class_names, range(len(class_names))))

  # Divide into train and test:
  random.seed(_RANDOM_SEED)
  random.shuffle(photo_filenames)
  training_filenames = photo_filenames[_NUM_VALIDATION:]
  validation_filenames = photo_filenames[:_NUM_VALIDATION]

  # First, convert the training and validation sets.
  _convert_dataset('train', training_filenames, class_names_to_ids,
                   dataset_dir)
  _convert_dataset('validation', validation_filenames, class_names_to_ids,
                   dataset_dir)

  # Finally, write the labels file:
  labels_to_class_names = dict(zip(range(len(class_names)), class_names))
  dataset_utils.write_label_file(labels_to_class_names, dataset_dir)

  _clean_up_temporary_files(dataset_dir)
  print('\nFinished converting the Flowers dataset!')
Here I will only roughly explain the execution flow: run() creates the dataset directory if necessary, skips the work if the TFRecords already exist, downloads and unpacks the flowers tarball, lists the photo filenames and class names, shuffles and splits them into training and validation sets, converts each split into TFRecord shards, writes the labels file, and finally cleans up the temporary files. Each sample written to the TFRecord files is a TF-Example protocol buffer built by image_to_tfexample:
def image_to_tfexample(image_data, image_format, height, width, class_id):
  return tf.train.Example(features=tf.train.Features(feature={
      'image/encoded': bytes_feature(image_data),
      'image/format': bytes_feature(image_format),
      'image/class/label': int64_feature(class_id),
      'image/height': int64_feature(height),
      'image/width': int64_feature(width),
  }))
Now that the TFRecord files have been created, we can read the data back from them.
# -*- coding: utf-8 -*-
"""
Created on Fri Jun  8 08:52:30 2018

@author: zy
"""

'''
Load the flowers dataset.
'''

from datasets import download_and_convert_flowers
from preprocessing import vgg_preprocessing
from datasets import flowers
import tensorflow as tf

slim = tf.contrib.slim


def read_flower_image_and_label(dataset_dir, is_training=False):
    '''
    Download the flower_photos.tgz dataset, split it into training and
    validation sets, and convert the data to TFRecord format:
    5 training files (3320 samples), 5 validation files (350 samples),
    plus a labels file mapping each integer label to its class name.

    args:
        dataset_dir: directory where the dataset is stored
        is_training: True to load the training split, otherwise the validation split
    return:
        image, label: one randomly read image and its corresponding label
    '''
    download_and_convert_flowers.run(dataset_dir)

    '''
    Read the TFRecord data with slim.
    '''
    # Select the dataset split.
    if is_training:
        dataset = flowers.get_split(split_name='train', dataset_dir=dataset_dir)
    else:
        dataset = flowers.get_split(split_name='validation', dataset_dir=dataset_dir)

    # Create a data provider.
    provider = slim.dataset_data_provider.DatasetDataProvider(dataset)

    # provider.get returns two tensors holding one randomly read sample.
    [image, label] = provider.get(['image', 'label'])

    return image, label
In the code above, we first import the required modules, then create a provider and call get to obtain the image and label tensors. At this point no data has actually been read; this only builds the graph. The data itself only becomes available after a session starts the queue threads.
Next, we start a session and read the data.
import matplotlib.pyplot as plt

if __name__ == '__main__':
    # test()
    # Read one image and its corresponding label.
    image, label = read_flower_image_and_label('./datasets/data/flowers')

    '''
    Start a session and read the data.
    '''
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())

        # Create a coordinator to manage the threads.
        coord = tf.train.Coordinator()

        # Start the QueueRunners; only now do the filenames enter the queue.
        threads = tf.train.start_queue_runners(sess=sess, coord=coord)

        img, lab = sess.run([image, label])
        plt.imshow(img)
        plt.title('Original image')
        plt.show()

        # Stop the threads.
        coord.request_stop()
        coord.join(threads)
What if we want to read several images at once?
Recall that each sample in the TFRecord files is defined as:
def image_to_tfexample(image_data, image_format, height, width, class_id):
  return tf.train.Example(features=tf.train.Features(feature={
      'image/encoded': bytes_feature(image_data),
      'image/format': bytes_feature(image_format),
      'image/class/label': int64_feature(class_id),
      'image/height': int64_feature(height),
      'image/width': int64_feature(width),
  }))
Suppose that during training we want to read data from the five generated TFRecord files and assemble it into batches.
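Before decoding, we need the file pattern that matches those shards. Below is a hedged sketch that mirrors what datasets/flowers.py does (the _FILE_PATTERN name and the path are assumptions based on the flowers code):

import os

# The flowers TFRecord shards are named like flowers_train_00000-of-00005.tfrecord.
_FILE_PATTERN = 'flowers_%s_*.tfrecord'
split_name = 'train'
dataset_dir = './datasets/data/flowers'
file_pattern = os.path.join(dataset_dir, _FILE_PATTERN % split_name)

The keys_to_features dictionary below then tells the decoder how to parse each TF-Example: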
keys_to_features = {
    'image/encoded': tf.FixedLenFeature((), tf.string, default_value=''),
    'image/format': tf.FixedLenFeature((), tf.string, default_value='png'),
    'image/class/label': tf.FixedLenFeature(
        [], tf.int64, default_value=tf.zeros([], dtype=tf.int64)),
}
items_to_handlers = {
    'image': slim.tfexample_decoder.Image('image/encoded', 'image/format'),
    'label': slim.tfexample_decoder.Tensor('image/class/label'),
}
decoder = slim.tfexample_decoder.TFExampleDecoder(
    keys_to_features, items_to_handlers)
dataset = slim.dataset.Dataset(
    data_sources=file_pattern,
    reader=tf.TFRecordReader,
    decoder=decoder,
    num_samples=SPLITS_TO_SIZES[split_name],  # total number of samples in the split
    items_to_descriptions=_ITEMS_TO_DESCRIPTIONS,
    num_classes=_NUM_CLASSES,
    labels_to_names=labels_to_names  # dict mapping id -> class name
)
provider = slim.dataset_data_provider.DatasetDataProvider(
    dataset,
    num_readers=FLAGS.num_readers,
    common_queue_capacity=20 * FLAGS.batch_size,
    common_queue_min=10 * FLAGS.batch_size)
[image, label] = provider.get(['image', 'label'])

# Image preprocessing.
image = preprocessing_image(image, train_image_size, train_image_size)

images, labels = tf.train.batch(
    [image, label],
    batch_size=FLAGS.batch_size,
    num_threads=FLAGS.num_preprocessing_threads,
    capacity=5 * FLAGS.batch_size)
labels = slim.one_hot_encoding(
    labels, dataset.num_classes - FLAGS.labels_offset)
Because the samples returned by DatasetDataProvider are already read in random order, there is no need to use tf.train.shuffle_batch when assembling the batch later. The following code reads batch_size samples at a time:
def get_batch_images_and_label(dataset_dir, batch_size, num_classes, is_training=False,
                               output_height=224, output_width=224, num_threads=10):
    '''
    Fetch batch_size samples at a time.

    Note: the preprocessing here uses slim's image-preprocessing functions.
    For example, if you use a VGG network, call the VGG preprocessing function;
    if you use a network of your own, you can write a preprocessing function
    suited to your images (e.g. normalization), or reuse one already written
    for another network.

    args:
        dataset_dir: directory where the dataset is stored
        batch_size: number of samples to fetch at a time
        num_classes: number of output classes, used to one-hot encode the labels
        is_training: True to load the training split, otherwise the validation split
        output_height: output image height
        output_width: output image width
    return:
        images, labels: batch_size randomly read images and their one-hot encoded labels
    '''
    # Get a single image and its label.
    image, label = read_flower_image_and_label(dataset_dir, is_training)

    # Image preprocessing; the image data must be tf.float32 here.
    image = vgg_preprocessing.preprocess_image(image, output_height, output_width,
                                               is_training=is_training)

    # Alternative resizing:
    # image = tf.image.convert_image_dtype(image, dtype=tf.float32)
    # image = tf.image.resize_image_with_crop_or_pad(image, output_height, output_width)

    # shuffle_batch would shuffle the sample order; batch does not.
    images, labels = tf.train.batch(
        [image, label],
        batch_size=batch_size,
        capacity=5 * batch_size,
        num_threads=num_threads)

    # One-hot encode the labels.
    labels = slim.one_hot_encoding(labels, num_classes)

    return images, labels
At this point, images can be used as the input to a neural network and labels can be used to compute the loss, and so on.
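For example (a hedged sketch of the wiring, not code from the original post: it assumes slim's VGG-16, which matches the 224x224 vgg preprocessing used above, and the 5 flowers classes):

from nets import vgg

# Sketch: feed the batched images into a slim network and attach a loss.
images, labels = get_batch_images_and_label('./datasets/data/flowers',
                                            batch_size=32, num_classes=5,
                                            is_training=True)
with slim.arg_scope(vgg.vgg_arg_scope()):
    logits, _ = vgg.vgg_16(images, num_classes=5, is_training=True)
loss = tf.losses.softmax_cross_entropy(onehot_labels=labels, logits=logits)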
The slim module also ships the shared model-training code, so users no longer need to touch the model code themselves; training, fine-tuning, and evaluation can all be done from the command line.
For Linux users, the scripts folder under slim also provides end-to-end shell scripts covering model download, training, pre-training, fine-tuning, and testing; on Windows you can copy the commands out of them and run them one by one at the command line.
The training code lives in train_image_classifier.py under slim. From the directory containing that file, train an Inception_v3 model on the flowers dataset by running:
python train_image_classifier.py \
    --train_dir=./log/train_logs \
    --dataset_name=flowers \
    --dataset_split_name=train \
    --dataset_dir=./datasets/data/flowers \
    --model_name=inception_v3
Pre-training here means training further on top of a model someone else has already trained in order to get the model you want; it can save a great deal of time, since high-quality models are trained on enormous numbers of samples. GitHub provides many models pre-trained on the ImageNet dataset, which can be downloaded from https://github.com/tensorflow/models/tree/master/research/slim/#Pretrained.
Neural nets work best when they have many parameters, making them powerful function approximators. However, this means they must be trained on very large datasets. Because training models from scratch can be a very computationally intensive process requiring days or even weeks, we provide various pre-trained models, as listed below. These CNNs have been trained on the ILSVRC-2012-CLS image classification dataset.
In the table below, we list each model, the corresponding TensorFlow model file, the link to the model checkpoint, and the top 1 and top 5 accuracy (on the imagenet test set). Note that the VGG and ResNet V1 parameters have been converted from their original caffe formats (here and here), whereas the Inception and ResNet V2 parameters have been trained internally at Google. Also be aware that these accuracies were computed by evaluating using a single image crop. Some academic papers report higher accuracy by using multiple crops at multiple scales.
After downloading a pre-trained model, simply add a checkpoint_path argument to the command from the previous section:
--checkpoint_path=<model path>
The model given by checkpoint_path is only used to initialize the parameters; it is not modified during training. The newly produced model is saved under the --train_dir path.
Note: the samples used for pre-training must match the original model's input size and number of output classes. The downloadable models all classify into 1000 classes; if you do not want that many classes, use the fine-tuning approach below.
The pre-trained models above were all trained on imagenet and output 1000 classes. If we want to use a pre-trained model on our own dataset, we need to fine-tune it.
During fine-tuning, the last layer of the original model is removed and replaced by a classification layer that matches our own dataset. For example, to train on the flowers dataset, the 1000 outputs need to be replaced by 5 outputs.
The concrete steps are as follows:
For example, fine-tune the inception_v3 model so that it can be trained on the flowers dataset. Unpack the downloaded inception_v3.ckpt into a folder named inception_v3 in the current directory, then open a command prompt, change into the slim directory, and run:
python train_image_classifier.py \
    --train_dir=./log/in3 \
    --dataset_dir=./datasets/data/flowers \
    --dataset_name=flowers \
    --dataset_split_name=train \
    --model_name=inception_v3 \
    --checkpoint_path=./inception_v3/inception_v3.ckpt \
    --checkpoint_exclude_scopes=InceptionV3/Logits,InceptionV3/AuxLogits \
    --trainable_scopes=InceptionV3/Logits,InceptionV3/AuxLogits
In this example, the model at --checkpoint_path is loaded and its weights are used to initialize the network, while --checkpoint_exclude_scopes keeps the last layer from being initialized from the checkpoint. --trainable_scopes specifies that only the newly added last layer is trained: the frozen parameters keep the good values learned by the original model, while the new layer optimizes its own parameters through the training iterations.
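Roughly speaking, excluding scopes from restoration looks like the following in slim (a hedged sketch of the same idea, not the script's exact implementation):

# Restore every variable except the final logits layers, which are trained
# from scratch for the new classes.
exclude = ['InceptionV3/Logits', 'InceptionV3/AuxLogits']
variables_to_restore = slim.get_variables_to_restore(exclude=exclude)
init_fn = slim.assign_from_checkpoint_fn('./inception_v3/inception_v3.ckpt',
                                         variables_to_restore)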
During fine-tuning you can also add
--max_number_of_steps=500
to the command above to limit the number of training steps. If no step count is specified, training runs indefinitely by default. For more flags, see the train_image_classifier.py source. The scripts folder also contains an example of using a trained model to classify images.
To evaluate the performance of a model (whether pretrained or your own), you can use the eval_image_classifier.py script, as shown below.
Below we give an example of downloading the pretrained inception model and evaluating it on the imagenet dataset.
python eval_image_classifier.py \
    --alsologtostderr \
    --checkpoint_path=./log/in3/model.ckpt \
    --dataset_dir=./datasets/data/flowers \
    --dataset_name=flowers \
    --dataset_split_name=validation \
    --model_name=inception_v3
The ./log/in3/model.ckpt given here is the checkpoint produced by the fine-tuning above.
A trained model can be packaged for use on various platforms, whether iOS, Android, or Linux; this is done with the open-source bazel tool. For details see: https://github.com/tensorflow/models/tree/master/research/slim/#Export