Starting with TensorFlow 1.0, Google shipped a library called slim. TF-Slim is a lightweight, high-level API for TensorFlow. The module was introduced in 2016, and its main purpose is so-called "code slimming". Much like the tf.contrib.layers module introduced earlier in this TensorFlow series, it wraps many common TensorFlow functions a second time so that code becomes far more concise. It is particularly well suited to building deep neural networks with complex structures, and it can be used to define, train, and evaluate complex models.
Why cover this topic here? Mainly because TensorFlow's models repository provides a large amount of network-architecture code written with slim, along with checkpoint files trained from that code, which we can use as pre-trained models. We therefore need to know how to use the slim library.
To be able to use the code in models, first verify that your TensorFlow version includes the slim module, and then download the models code from GitHub.
Before using slim, test whether the local tf.contrib.slim module works by entering the following at the command line:
python -c "import tensorflow.contrib.slim as slim; eval = slim.evaluation.evaluate_once"
If no error is raised, TF-Slim is working.
To use TF-Slim for image classification, you also have to install the TF-Slim image models library, which is not part of the core TF library. To do this, check out the tensorflow/models repository as follows:
cd $HOME/workspace
git clone https://github.com/tensorflow/models/
This will put the TF-Slim image models library in $HOME/workspace/models/research/slim. (It will also create a directory called models/inception, which contains an older version of slim; you can safely ignore this.)
To verify that this has worked, execute the following commands; it should run without raising any errors.
cd $HOME/workspace/models/research/slim
python -c "from nets import cifarnet; mynet = cifarnet.cifarnet"
I am using the Windows operating system, so I downloaded the module directly from https://github.com/tensorflow/models/:
slim lives under \models-master\research\slim and contains five folders:
Here we focus on three of them: datasets, nets, and preprocessing.
The datasets folder holds the code for common image training datasets; the main ones supported are cifar10, flowers, mnist, and imagenet.
The code files are named after the corresponding datasets, and they can be used to download or fetch the data in those datasets. Taking imagenet as an example, the following function fetches the imagenet labels from the web:
imagenet_map = imagenet.create_readable_names_for_imagenet_labels()
The code above returns the readable label names for imagenet's 1000 classes (matching the label ids of the samples).
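For example, the returned dict can be indexed with an integer label (a small sketch; the label id 281 used here is only an illustrative value):

from datasets import imagenet

# Sketch: look up the readable class name for a label id.
# Entry 0 is 'background'; entries 1-1000 are the ImageNet classes.
imagenet_map = imagenet.create_readable_names_for_imagenet_labels()
print(imagenet_map[281])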
The nets folder contains the various network model modules:
Each network model file is named after its model, and the code in each file follows roughly the same structure. Take inception_resnet_v2 as an example:
# Copyright 2016 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. # You may obtain a copy of the License at # # http://www.apache.org/licenses/LICENSE-2.0 # # Unless required by applicable law or agreed to in writing, software # distributed under the License is distributed on an "AS IS" BASIS, # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. # See the License for the specific language governing permissions and # limitations under the License. # ============================================================================== """Contains the definition of the Inception Resnet V2 architecture. As described in http://arxiv.org/abs/1602.07261. Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning Christian Szegedy, Sergey Ioffe, Vincent Vanhoucke, Alex Alemi """ from __future__ import absolute_import from __future__ import division from __future__ import print_function import tensorflow as tf slim = tf.contrib.slim def block35(net, scale=1.0, activation_fn=tf.nn.relu, scope=None, reuse=None): """Builds the 35x35 resnet block.""" with tf.variable_scope(scope, 'Block35', [net], reuse=reuse): with tf.variable_scope('Branch_0'): tower_conv = slim.conv2d(net, 32, 1, scope='Conv2d_1x1') with tf.variable_scope('Branch_1'): tower_conv1_0 = slim.conv2d(net, 32, 1, scope='Conv2d_0a_1x1') tower_conv1_1 = slim.conv2d(tower_conv1_0, 32, 3, scope='Conv2d_0b_3x3') with tf.variable_scope('Branch_2'): tower_conv2_0 = slim.conv2d(net, 32, 1, scope='Conv2d_0a_1x1') tower_conv2_1 = slim.conv2d(tower_conv2_0, 48, 3, scope='Conv2d_0b_3x3') tower_conv2_2 = slim.conv2d(tower_conv2_1, 64, 3, scope='Conv2d_0c_3x3') mixed = tf.concat(axis=3, values=[tower_conv, tower_conv1_1, tower_conv2_2]) up = slim.conv2d(mixed, net.get_shape()[3], 1, normalizer_fn=None, activation_fn=None, scope='Conv2d_1x1') scaled_up = up * scale if activation_fn == tf.nn.relu6: # Use clip_by_value to simulate bandpass activation. scaled_up = tf.clip_by_value(scaled_up, -6.0, 6.0) net += scaled_up if activation_fn: net = activation_fn(net) return net def block17(net, scale=1.0, activation_fn=tf.nn.relu, scope=None, reuse=None): """Builds the 17x17 resnet block.""" with tf.variable_scope(scope, 'Block17', [net], reuse=reuse): with tf.variable_scope('Branch_0'): tower_conv = slim.conv2d(net, 192, 1, scope='Conv2d_1x1') with tf.variable_scope('Branch_1'): tower_conv1_0 = slim.conv2d(net, 128, 1, scope='Conv2d_0a_1x1') tower_conv1_1 = slim.conv2d(tower_conv1_0, 160, [1, 7], scope='Conv2d_0b_1x7') tower_conv1_2 = slim.conv2d(tower_conv1_1, 192, [7, 1], scope='Conv2d_0c_7x1') mixed = tf.concat(axis=3, values=[tower_conv, tower_conv1_2]) up = slim.conv2d(mixed, net.get_shape()[3], 1, normalizer_fn=None, activation_fn=None, scope='Conv2d_1x1') scaled_up = up * scale if activation_fn == tf.nn.relu6: # Use clip_by_value to simulate bandpass activation. 
scaled_up = tf.clip_by_value(scaled_up, -6.0, 6.0) net += scaled_up if activation_fn: net = activation_fn(net) return net def block8(net, scale=1.0, activation_fn=tf.nn.relu, scope=None, reuse=None): """Builds the 8x8 resnet block.""" with tf.variable_scope(scope, 'Block8', [net], reuse=reuse): with tf.variable_scope('Branch_0'): tower_conv = slim.conv2d(net, 192, 1, scope='Conv2d_1x1') with tf.variable_scope('Branch_1'): tower_conv1_0 = slim.conv2d(net, 192, 1, scope='Conv2d_0a_1x1') tower_conv1_1 = slim.conv2d(tower_conv1_0, 224, [1, 3], scope='Conv2d_0b_1x3') tower_conv1_2 = slim.conv2d(tower_conv1_1, 256, [3, 1], scope='Conv2d_0c_3x1') mixed = tf.concat(axis=3, values=[tower_conv, tower_conv1_2]) up = slim.conv2d(mixed, net.get_shape()[3], 1, normalizer_fn=None, activation_fn=None, scope='Conv2d_1x1') scaled_up = up * scale if activation_fn == tf.nn.relu6: # Use clip_by_value to simulate bandpass activation. scaled_up = tf.clip_by_value(scaled_up, -6.0, 6.0) net += scaled_up if activation_fn: net = activation_fn(net) return net def inception_resnet_v2_base(inputs, final_endpoint='Conv2d_7b_1x1', output_stride=16, align_feature_maps=False, scope=None, activation_fn=tf.nn.relu): """Inception model from http://arxiv.org/abs/1602.07261. Constructs an Inception Resnet v2 network from inputs to the given final endpoint. This method can construct the network up to the final inception block Conv2d_7b_1x1. Args: inputs: a tensor of size [batch_size, height, width, channels]. final_endpoint: specifies the endpoint to construct the network up to. It can be one of ['Conv2d_1a_3x3', 'Conv2d_2a_3x3', 'Conv2d_2b_3x3', 'MaxPool_3a_3x3', 'Conv2d_3b_1x1', 'Conv2d_4a_3x3', 'MaxPool_5a_3x3', 'Mixed_5b', 'Mixed_6a', 'PreAuxLogits', 'Mixed_7a', 'Conv2d_7b_1x1'] output_stride: A scalar that specifies the requested ratio of input to output spatial resolution. Only supports 8 and 16. align_feature_maps: When true, changes all the VALID paddings in the network to SAME padding so that the feature maps are aligned. scope: Optional variable_scope. activation_fn: Activation function for block scopes. Returns: tensor_out: output tensor corresponding to the final_endpoint. end_points: a set of activations for external use, for example summaries or losses. Raises: ValueError: if final_endpoint is not set to one of the predefined values, or if the output_stride is not 8 or 16, or if the output_stride is 8 and we request an end point after 'PreAuxLogits'. 
""" if output_stride != 8 and output_stride != 16: raise ValueError('output_stride must be 8 or 16.') padding = 'SAME' if align_feature_maps else 'VALID' end_points = {} def add_and_check_final(name, net): end_points[name] = net return name == final_endpoint with tf.variable_scope(scope, 'InceptionResnetV2', [inputs]): with slim.arg_scope([slim.conv2d, slim.max_pool2d, slim.avg_pool2d], stride=1, padding='SAME'): # 149 x 149 x 32 net = slim.conv2d(inputs, 32, 3, stride=2, padding=padding, scope='Conv2d_1a_3x3') if add_and_check_final('Conv2d_1a_3x3', net): return net, end_points # 147 x 147 x 32 net = slim.conv2d(net, 32, 3, padding=padding, scope='Conv2d_2a_3x3') if add_and_check_final('Conv2d_2a_3x3', net): return net, end_points # 147 x 147 x 64 net = slim.conv2d(net, 64, 3, scope='Conv2d_2b_3x3') if add_and_check_final('Conv2d_2b_3x3', net): return net, end_points # 73 x 73 x 64 net = slim.max_pool2d(net, 3, stride=2, padding=padding, scope='MaxPool_3a_3x3') if add_and_check_final('MaxPool_3a_3x3', net): return net, end_points # 73 x 73 x 80 net = slim.conv2d(net, 80, 1, padding=padding, scope='Conv2d_3b_1x1') if add_and_check_final('Conv2d_3b_1x1', net): return net, end_points # 71 x 71 x 192 net = slim.conv2d(net, 192, 3, padding=padding, scope='Conv2d_4a_3x3') if add_and_check_final('Conv2d_4a_3x3', net): return net, end_points # 35 x 35 x 192 net = slim.max_pool2d(net, 3, stride=2, padding=padding, scope='MaxPool_5a_3x3') if add_and_check_final('MaxPool_5a_3x3', net): return net, end_points # 35 x 35 x 320 with tf.variable_scope('Mixed_5b'): with tf.variable_scope('Branch_0'): tower_conv = slim.conv2d(net, 96, 1, scope='Conv2d_1x1') with tf.variable_scope('Branch_1'): tower_conv1_0 = slim.conv2d(net, 48, 1, scope='Conv2d_0a_1x1') tower_conv1_1 = slim.conv2d(tower_conv1_0, 64, 5, scope='Conv2d_0b_5x5') with tf.variable_scope('Branch_2'): tower_conv2_0 = slim.conv2d(net, 64, 1, scope='Conv2d_0a_1x1') tower_conv2_1 = slim.conv2d(tower_conv2_0, 96, 3, scope='Conv2d_0b_3x3') tower_conv2_2 = slim.conv2d(tower_conv2_1, 96, 3, scope='Conv2d_0c_3x3') with tf.variable_scope('Branch_3'): tower_pool = slim.avg_pool2d(net, 3, stride=1, padding='SAME', scope='AvgPool_0a_3x3') tower_pool_1 = slim.conv2d(tower_pool, 64, 1, scope='Conv2d_0b_1x1') net = tf.concat( [tower_conv, tower_conv1_1, tower_conv2_2, tower_pool_1], 3) if add_and_check_final('Mixed_5b', net): return net, end_points # TODO(alemi): Register intermediate endpoints net = slim.repeat(net, 10, block35, scale=0.17, activation_fn=activation_fn) # 17 x 17 x 1088 if output_stride == 8, # 33 x 33 x 1088 if output_stride == 16 use_atrous = output_stride == 8 with tf.variable_scope('Mixed_6a'): with tf.variable_scope('Branch_0'): tower_conv = slim.conv2d(net, 384, 3, stride=1 if use_atrous else 2, padding=padding, scope='Conv2d_1a_3x3') with tf.variable_scope('Branch_1'): tower_conv1_0 = slim.conv2d(net, 256, 1, scope='Conv2d_0a_1x1') tower_conv1_1 = slim.conv2d(tower_conv1_0, 256, 3, scope='Conv2d_0b_3x3') tower_conv1_2 = slim.conv2d(tower_conv1_1, 384, 3, stride=1 if use_atrous else 2, padding=padding, scope='Conv2d_1a_3x3') with tf.variable_scope('Branch_2'): tower_pool = slim.max_pool2d(net, 3, stride=1 if use_atrous else 2, padding=padding, scope='MaxPool_1a_3x3') net = tf.concat([tower_conv, tower_conv1_2, tower_pool], 3) if add_and_check_final('Mixed_6a', net): return net, end_points # TODO(alemi): register intermediate endpoints with slim.arg_scope([slim.conv2d], rate=2 if use_atrous else 1): net = slim.repeat(net, 20, block17, 
scale=0.10, activation_fn=activation_fn) if add_and_check_final('PreAuxLogits', net): return net, end_points if output_stride == 8: # TODO(gpapan): Properly support output_stride for the rest of the net. raise ValueError('output_stride==8 is only supported up to the ' 'PreAuxlogits end_point for now.') # 8 x 8 x 2080 with tf.variable_scope('Mixed_7a'): with tf.variable_scope('Branch_0'): tower_conv = slim.conv2d(net, 256, 1, scope='Conv2d_0a_1x1') tower_conv_1 = slim.conv2d(tower_conv, 384, 3, stride=2, padding=padding, scope='Conv2d_1a_3x3') with tf.variable_scope('Branch_1'): tower_conv1 = slim.conv2d(net, 256, 1, scope='Conv2d_0a_1x1') tower_conv1_1 = slim.conv2d(tower_conv1, 288, 3, stride=2, padding=padding, scope='Conv2d_1a_3x3') with tf.variable_scope('Branch_2'): tower_conv2 = slim.conv2d(net, 256, 1, scope='Conv2d_0a_1x1') tower_conv2_1 = slim.conv2d(tower_conv2, 288, 3, scope='Conv2d_0b_3x3') tower_conv2_2 = slim.conv2d(tower_conv2_1, 320, 3, stride=2, padding=padding, scope='Conv2d_1a_3x3') with tf.variable_scope('Branch_3'): tower_pool = slim.max_pool2d(net, 3, stride=2, padding=padding, scope='MaxPool_1a_3x3') net = tf.concat( [tower_conv_1, tower_conv1_1, tower_conv2_2, tower_pool], 3) if add_and_check_final('Mixed_7a', net): return net, end_points # TODO(alemi): register intermediate endpoints net = slim.repeat(net, 9, block8, scale=0.20, activation_fn=activation_fn) net = block8(net, activation_fn=None) # 8 x 8 x 1536 net = slim.conv2d(net, 1536, 1, scope='Conv2d_7b_1x1') if add_and_check_final('Conv2d_7b_1x1', net): return net, end_points raise ValueError('final_endpoint (%s) not recognized', final_endpoint) def inception_resnet_v2(inputs, num_classes=1001, is_training=True, dropout_keep_prob=0.8, reuse=None, scope='InceptionResnetV2', create_aux_logits=True, activation_fn=tf.nn.relu): """Creates the Inception Resnet V2 model. Args: inputs: a 4-D tensor of size [batch_size, height, width, 3]. Dimension batch_size may be undefined. If create_aux_logits is false, also height and width may be undefined. num_classes: number of predicted classes. If 0 or None, the logits layer is omitted and the input features to the logits layer (before dropout) are returned instead. is_training: whether is training or not. dropout_keep_prob: float, the fraction to keep before final layer. reuse: whether or not the network and its variables should be reused. To be able to reuse 'scope' must be given. scope: Optional variable_scope. create_aux_logits: Whether to include the auxilliary logits. activation_fn: Activation function for conv2d. Returns: net: the output of the logits layer (if num_classes is a non-zero integer), or the non-dropped-out input to the logits layer (if num_classes is 0 or None). end_points: the set of end_points from the inception model. 
""" end_points = {} with tf.variable_scope(scope, 'InceptionResnetV2', [inputs], reuse=reuse) as scope: with slim.arg_scope([slim.batch_norm, slim.dropout], is_training=is_training): net, end_points = inception_resnet_v2_base(inputs, scope=scope, activation_fn=activation_fn) if create_aux_logits and num_classes: with tf.variable_scope('AuxLogits'): aux = end_points['PreAuxLogits'] aux = slim.avg_pool2d(aux, 5, stride=3, padding='VALID', scope='Conv2d_1a_3x3') aux = slim.conv2d(aux, 128, 1, scope='Conv2d_1b_1x1') aux = slim.conv2d(aux, 768, aux.get_shape()[1:3], padding='VALID', scope='Conv2d_2a_5x5') aux = slim.flatten(aux) aux = slim.fully_connected(aux, num_classes, activation_fn=None, scope='Logits') end_points['AuxLogits'] = aux with tf.variable_scope('Logits'): # TODO(sguada,arnoegw): Consider adding a parameter global_pool which # can be set to False to disable pooling here (as in resnet_*()). kernel_size = net.get_shape()[1:3] if kernel_size.is_fully_defined(): net = slim.avg_pool2d(net, kernel_size, padding='VALID', scope='AvgPool_1a_8x8') else: net = tf.reduce_mean(net, [1, 2], keep_dims=True, name='global_pool') end_points['global_pool'] = net if not num_classes: return net, end_points net = slim.flatten(net) net = slim.dropout(net, dropout_keep_prob, is_training=is_training, scope='Dropout') end_points['PreLogitsFlatten'] = net logits = slim.fully_connected(net, num_classes, activation_fn=None, scope='Logits') end_points['Logits'] = logits end_points['Predictions'] = tf.nn.softmax(logits, name='Predictions') return logits, end_points inception_resnet_v2.default_image_size = 299 def inception_resnet_v2_arg_scope(weight_decay=0.00004, batch_norm_decay=0.9997, batch_norm_epsilon=0.001, activation_fn=tf.nn.relu): """Returns the scope with the default parameters for inception_resnet_v2. Args: weight_decay: the weight decay for weights variables. batch_norm_decay: decay for the moving average of batch_norm momentums. batch_norm_epsilon: small float added to variance to avoid dividing by zero. activation_fn: Activation function for conv2d. Returns: a arg_scope with the parameters needed for inception_resnet_v2. """ # Set weight_decay for weights in conv2d and fully_connected layers. with slim.arg_scope([slim.conv2d, slim.fully_connected], weights_regularizer=slim.l2_regularizer(weight_decay), biases_regularizer=slim.l2_regularizer(weight_decay)): batch_norm_params = { 'decay': batch_norm_decay, 'epsilon': batch_norm_epsilon, 'fused': None, # Use fused batch norm if possible. } # Set activation_fn and parameters for batch_norm. with slim.arg_scope([slim.conv2d], activation_fn=activation_fn, normalizer_fn=slim.batch_norm, normalizer_params=batch_norm_params) as scope: return scope
The file exposes the following interfaces: the residual blocks block35, block17 and block8, plus inception_resnet_v2_base, inception_resnet_v2 and inception_resnet_v2_arg_scope.
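A minimal usage sketch (the placeholder shape is an assumption; 299 is inception_resnet_v2.default_image_size): combine the model's arg_scope with its network function to build the graph.

import tensorflow as tf
from nets import inception_resnet_v2

slim = tf.contrib.slim

# Sketch: build the Inception-ResNet-v2 graph for 299x299 RGB inputs.
images = tf.placeholder(tf.float32, [None, 299, 299, 3])
with slim.arg_scope(inception_resnet_v2.inception_resnet_v2_arg_scope()):
    logits, end_points = inception_resnet_v2.inception_resnet_v2(
        images, num_classes=1001, is_training=False)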
The preprocessing folder contains several image-preprocessing files, again named after the models. slim groups the preprocessing functions commonly used by a family of models into one file named after that family, and these files also follow roughly the same structure. For example, inception preprocessing is invoked through the following function:
inception_preprocessing.preprocess_image
This function converts the incoming image to the model's input size and normalizes it.
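A minimal sketch of the call (raw_image is assumed to be an already-decoded image tensor, e.g. from tf.image.decode_jpeg):

from preprocessing import inception_preprocessing

# Sketch: crop/resize to the model's input size and scale the pixel values.
processed_image = inception_preprocessing.preprocess_image(
    raw_image, height=299, width=299, is_training=False)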
As part of this library, we've included scripts to download several popular image datasets (listed below) and convert them to slim format.
TFRecord is TensorFlow's recommended dataset format and is tightly integrated with the framework. TensorFlow provides a set of APIs for accessing TFRecord files. The format exists mainly so that, when working with huge sample sets, data can be read from disk while training is running: the raw files are converted to TFRecord format and then read by multiple threads at run time, which relieves the main training thread and makes training more efficient. For details on the TFRecord format, see the earlier article in this series, "Section 12: several ways of reading data in TensorFlow and the use of queues".
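As a quick illustration of the format (a minimal sketch that is not part of the slim code; the file name and label value are made up), a TF-Example can be written to a TFRecord file and read back like this:

import tensorflow as tf

# Sketch: serialize one TF-Example into a TFRecord file, then read it back.
example = tf.train.Example(features=tf.train.Features(feature={
    'image/class/label': tf.train.Feature(
        int64_list=tf.train.Int64List(value=[3])),
}))
with tf.python_io.TFRecordWriter('/tmp/demo.tfrecord') as writer:
    writer.write(example.SerializeToString())

for record in tf.python_io.tf_record_iterator('/tmp/demo.tfrecord'):
    print(tf.train.Example.FromString(record))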
For each dataset, we'll need to download the raw data and convert it to TensorFlow's native TFRecord format. Each TFRecord contains a TF-Example protocol buffer. Below we demonstrate how to do this for the Flowers dataset.
$ DATA_DIR=/tmp/data/flowers
$ python download_and_convert_data.py \
    --dataset_name=flowers \
    --dataset_dir="${DATA_DIR}"
There are two key arguments here: the dataset name (flowers in this example) and the download directory (here /tmp/data/flowers).
When the script finishes you will find several TFRecord files created:
These represent the training and validation data, sharded over 5 files each. You will also find the $DATA_DIR/labels.txt file, which contains the mapping from integer labels to class names.
You can use the same script to create the mnist and cifar10 datasets. However, for ImageNet, you have to follow the instructions here. Note that you first have to sign up for an account at image-net.org. Also, the download can take several hours, and could use up to 500GB.
Here I will walk through the code that gets executed. Open the download_and_convert_data.py file; its contents are as follows:
# Copyright 2016 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. # You may obtain a copy of the License at # # http://www.apache.org/licenses/LICENSE-2.0 # # Unless required by applicable law or agreed to in writing, software # distributed under the License is distributed on an "AS IS" BASIS, # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. # See the License for the specific language governing permissions and # limitations under the License. # ============================================================================== r"""Downloads and converts a particular dataset. Usage: ```shell $ python download_and_convert_data.py \ --dataset_name=mnist \ --dataset_dir=/tmp/mnist $ python download_and_convert_data.py \ --dataset_name=cifar10 \ --dataset_dir=/tmp/cifar10 $ python download_and_convert_data.py \ --dataset_name=flowers \ --dataset_dir=/tmp/flowers ``` """ from __future__ import absolute_import from __future__ import division from __future__ import print_function import tensorflow as tf from datasets import download_and_convert_cifar10 from datasets import download_and_convert_flowers from datasets import download_and_convert_mnist FLAGS = tf.app.flags.FLAGS tf.app.flags.DEFINE_string( 'dataset_name', None, 'The name of the dataset to convert, one of "cifar10", "flowers", "mnist".') tf.app.flags.DEFINE_string( 'dataset_dir', None, 'The directory where the output TFRecords and temporary files are saved.') def main(_): if not FLAGS.dataset_name: raise ValueError('You must supply the dataset name with --dataset_name') if not FLAGS.dataset_dir: raise ValueError('You must supply the dataset directory with --dataset_dir') if FLAGS.dataset_name == 'cifar10': download_and_convert_cifar10.run(FLAGS.dataset_dir) elif FLAGS.dataset_name == 'flowers': download_and_convert_flowers.run(FLAGS.dataset_dir) elif FLAGS.dataset_name == 'mnist': download_and_convert_mnist.run(FLAGS.dataset_dir) else: raise ValueError( 'dataset_name [%s] was not recognized.' % FLAGS.dataset_name) if __name__ == '__main__': tf.app.run()
The download_and_convert_flowers.run function is defined in download_and_convert_flowers.py; the run() function looks like this:
def run(dataset_dir):
  """Runs the download and conversion operation.

  Args:
    dataset_dir: The dataset directory where the dataset is stored.
  """
  if not tf.gfile.Exists(dataset_dir):
    tf.gfile.MakeDirs(dataset_dir)

  if _dataset_exists(dataset_dir):
    print('Dataset files already exist. Exiting without re-creating them.')
    return

  dataset_utils.download_and_uncompress_tarball(_DATA_URL, dataset_dir)
  photo_filenames, class_names = _get_filenames_and_classes(dataset_dir)
  class_names_to_ids = dict(zip(class_names, range(len(class_names))))

  # Divide into train and test:
  random.seed(_RANDOM_SEED)
  random.shuffle(photo_filenames)
  training_filenames = photo_filenames[_NUM_VALIDATION:]
  validation_filenames = photo_filenames[:_NUM_VALIDATION]

  # First, convert the training and validation sets.
  _convert_dataset('train', training_filenames, class_names_to_ids,
                   dataset_dir)
  _convert_dataset('validation', validation_filenames, class_names_to_ids,
                   dataset_dir)

  # Finally, write the labels file:
  labels_to_class_names = dict(zip(range(len(class_names)), class_names))
  dataset_utils.write_label_file(labels_to_class_names, dataset_dir)

  _clean_up_temporary_files(dataset_dir)
  print('\nFinished converting the Flowers dataset!')
Here I will only roughly explain the execution flow: run() creates the dataset directory if necessary, skips the work if the TFRecords already exist, downloads and unpacks the flowers tarball, lists the photo filenames and class names, shuffles and splits them into training and validation sets, converts each split into TFRecord shards, writes the labels file, and finally cleans up the temporary files. Each sample written to the TFRecord files is a TF-Example protocol buffer built by image_to_tfexample:
def image_to_tfexample(image_data, image_format, height, width, class_id):
  return tf.train.Example(features=tf.train.Features(feature={
      'image/encoded': bytes_feature(image_data),
      'image/format': bytes_feature(image_format),
      'image/class/label': int64_feature(class_id),
      'image/height': int64_feature(height),
      'image/width': int64_feature(width),
  }))
Now that the TFRecord files have been created, we can read the data back from them.
# -*- coding: utf-8 -*-
"""
Created on Fri Jun  8 08:52:30 2018

@author: zy
"""

'''
Load the flowers dataset.
'''

from datasets import download_and_convert_flowers
from preprocessing import vgg_preprocessing
from datasets import flowers
import tensorflow as tf

slim = tf.contrib.slim


def read_flower_image_and_label(dataset_dir, is_training=False):
    '''
    Download the flower_photos.tgz dataset, split it into training and
    validation sets, and convert the data to TFRecord format:
    5 training files (3320 samples), 5 validation files (350 samples),
    plus a labels file mapping each integer label to its class name.

    args:
        dataset_dir: directory where the dataset is stored
        is_training: True to load the training split, otherwise the validation split
    return:
        image, label: one randomly read image and its corresponding label
    '''
    download_and_convert_flowers.run(dataset_dir)

    '''
    Read the TFRecord data with slim.
    '''
    # Select the dataset split.
    if is_training:
        dataset = flowers.get_split(split_name='train', dataset_dir=dataset_dir)
    else:
        dataset = flowers.get_split(split_name='validation', dataset_dir=dataset_dir)

    # Create a data provider.
    provider = slim.dataset_data_provider.DatasetDataProvider(dataset)

    # provider.get returns two tensors holding one randomly read sample.
    [image, label] = provider.get(['image', 'label'])

    return image, label
In the code above, we first import the required modules, then create a provider and call get to obtain the image and label tensors. At this point no data has actually been read; this only builds the graph. The data itself only becomes available after a session starts the queue threads.
Next, we start a session and read the data.
import matplotlib.pyplot as plt

if __name__ == '__main__':
    # test()
    # Read one image and its corresponding label.
    image, label = read_flower_image_and_label('./datasets/data/flowers')

    '''
    Start a session and read the data.
    '''
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())

        # Create a coordinator to manage the threads.
        coord = tf.train.Coordinator()

        # Start the QueueRunners; only now do the filenames enter the queue.
        threads = tf.train.start_queue_runners(sess=sess, coord=coord)

        img, lab = sess.run([image, label])
        plt.imshow(img)
        plt.title('Original image')
        plt.show()

        # Stop the threads.
        coord.request_stop()
        coord.join(threads)
What if we want to read several images at once?
Recall that each sample in the TFRecord files is defined as:
def image_to_tfexample(image_data, image_format, height, width, class_id):
  return tf.train.Example(features=tf.train.Features(feature={
      'image/encoded': bytes_feature(image_data),
      'image/format': bytes_feature(image_format),
      'image/class/label': int64_feature(class_id),
      'image/height': int64_feature(height),
      'image/width': int64_feature(width),
  }))
Suppose that during training we want to read data from the five generated TFRecord files and assemble it into batches.
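Before decoding, we need the file pattern that matches those shards. Below is a hedged sketch that mirrors what datasets/flowers.py does (the _FILE_PATTERN name and the path are assumptions based on the flowers code):

import os

# The flowers TFRecord shards are named like flowers_train_00000-of-00005.tfrecord.
_FILE_PATTERN = 'flowers_%s_*.tfrecord'
split_name = 'train'
dataset_dir = './datasets/data/flowers'
file_pattern = os.path.join(dataset_dir, _FILE_PATTERN % split_name)

The keys_to_features dictionary below then tells the decoder how to parse each TF-Example: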
keys_to_features = {
    'image/encoded': tf.FixedLenFeature((), tf.string, default_value=''),
    'image/format': tf.FixedLenFeature((), tf.string, default_value='png'),
    'image/class/label': tf.FixedLenFeature(
        [], tf.int64, default_value=tf.zeros([], dtype=tf.int64)),
}
items_to_handlers = {
    'image': slim.tfexample_decoder.Image('image/encoded', 'image/format'),
    'label': slim.tfexample_decoder.Tensor('image/class/label'),
}
decoder = slim.tfexample_decoder.TFExampleDecoder(
    keys_to_features, items_to_handlers)
dataset = slim.dataset.Dataset(
    data_sources=file_pattern,
    reader=tf.TFRecordReader,
    decoder=decoder,
    num_samples=SPLITS_TO_SIZES[split_name],  # total number of samples in the split
    items_to_descriptions=_ITEMS_TO_DESCRIPTIONS,
    num_classes=_NUM_CLASSES,
    labels_to_names=labels_to_names  # dict mapping id -> class name
)
provider = slim.dataset_data_provider.DatasetDataProvider(
    dataset,
    num_readers=FLAGS.num_readers,
    common_queue_capacity=20 * FLAGS.batch_size,
    common_queue_min=10 * FLAGS.batch_size)
[image, label] = provider.get(['image', 'label'])

# Image preprocessing.
image = preprocessing_image(image, train_image_size, train_image_size)

images, labels = tf.train.batch(
    [image, label],
    batch_size=FLAGS.batch_size,
    num_threads=FLAGS.num_preprocessing_threads,
    capacity=5 * FLAGS.batch_size)
labels = slim.one_hot_encoding(
    labels, dataset.num_classes - FLAGS.labels_offset)
Because the samples returned by DatasetDataProvider are already read in random order, there is no need to use tf.train.shuffle_batch when assembling the batch later. The following code reads batch_size samples at a time:
def get_batch_images_and_label(dataset_dir, batch_size, num_classes, is_training=False,
                               output_height=224, output_width=224, num_threads=10):
    '''
    Fetch batch_size samples at a time.

    Note: the preprocessing here uses slim's image-preprocessing functions.
    For example, if you use a VGG network, call the VGG preprocessing function;
    if you use a network of your own, you can write a preprocessing function
    suited to your images (e.g. normalization), or reuse one already written
    for another network.

    args:
        dataset_dir: directory where the dataset is stored
        batch_size: number of samples to fetch at a time
        num_classes: number of output classes, used to one-hot encode the labels
        is_training: True to load the training split, otherwise the validation split
        output_height: output image height
        output_width: output image width
    return:
        images, labels: batch_size randomly read images and their one-hot encoded labels
    '''
    # Get a single image and its label.
    image, label = read_flower_image_and_label(dataset_dir, is_training)

    # Image preprocessing; the image data must be tf.float32 here.
    image = vgg_preprocessing.preprocess_image(image, output_height, output_width,
                                               is_training=is_training)

    # Alternative resizing:
    # image = tf.image.convert_image_dtype(image, dtype=tf.float32)
    # image = tf.image.resize_image_with_crop_or_pad(image, output_height, output_width)

    # shuffle_batch would shuffle the sample order; batch does not.
    images, labels = tf.train.batch(
        [image, label],
        batch_size=batch_size,
        capacity=5 * batch_size,
        num_threads=num_threads)

    # One-hot encode the labels.
    labels = slim.one_hot_encoding(labels, num_classes)

    return images, labels
At this point, images can be used as the input to a neural network and labels can be used to compute the loss, and so on.
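For example (a hedged sketch of the wiring, not code from the original post: it assumes slim's VGG-16, which matches the 224x224 vgg preprocessing used above, and the 5 flowers classes):

from nets import vgg

# Sketch: feed the batched images into a slim network and attach a loss.
images, labels = get_batch_images_and_label('./datasets/data/flowers',
                                            batch_size=32, num_classes=5,
                                            is_training=True)
with slim.arg_scope(vgg.vgg_arg_scope()):
    logits, _ = vgg.vgg_16(images, num_classes=5, is_training=True)
loss = tf.losses.softmax_cross_entropy(onehot_labels=labels, logits=logits)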
The slim module also ships the shared model-training code, so users no longer need to touch the model code themselves; training, fine-tuning, and evaluation can all be done from the command line.
For Linux users, the scripts folder under slim also provides end-to-end shell scripts covering model download, training, pre-training, fine-tuning, and testing; on Windows you can copy the commands out of them and run them one by one at the command line.
The training code lives in train_image_classifier.py under slim. From the directory containing that file, train an Inception_v3 model on the flowers dataset by running:
python train_image_classifier.py \
    --train_dir=./log/train_logs \
    --dataset_name=flowers \
    --dataset_split_name=train \
    --dataset_dir=./datasets/data/flowers \
    --model_name=inception_v3
Pre-training here means training further on top of a model someone else has already trained in order to get the model you want; it can save a great deal of time, since high-quality models are trained on enormous numbers of samples. GitHub provides many models pre-trained on the ImageNet dataset, which can be downloaded from https://github.com/tensorflow/models/tree/master/research/slim/#Pretrained.
Neural nets work best when they have many parameters, making them powerful function approximators. However, this means they must be trained on very large datasets. Because training models from scratch can be a very computationally intensive process requiring days or even weeks, we provide various pre-trained models, as listed below. These CNNs have been trained on the ILSVRC-2012-CLS image classification dataset.
In the table below, we list each model, the corresponding TensorFlow model file, the link to the model checkpoint, and the top 1 and top 5 accuracy (on the imagenet test set). Note that the VGG and ResNet V1 parameters have been converted from their original caffe formats (here and here), whereas the Inception and ResNet V2 parameters have been trained internally at Google. Also be aware that these accuracies were computed by evaluating using a single image crop. Some academic papers report higher accuracy by using multiple crops at multiple scales.
After downloading a pre-trained model, simply add a checkpoint_path argument to the command from the previous section:
--checkpoint_path=<model path>
The model given by checkpoint_path is only used to initialize the parameters; it is not modified during training. The newly produced model is saved under the --train_dir path.
Note: the samples used for pre-training must match the original model's input size and number of output classes. The downloadable models all classify into 1000 classes; if you do not want that many classes, use the fine-tuning approach below.
The pre-trained models above were all trained on imagenet and output 1000 classes. If we want to use a pre-trained model on our own dataset, we need to fine-tune it.
During fine-tuning, the last layer of the original model is removed and replaced by a classification layer that matches our own dataset. For example, to train on the flowers dataset, the 1000 outputs need to be replaced by 5 outputs.
The concrete steps are as follows:
For example, fine-tune the inception_v3 model so that it can be trained on the flowers dataset. Unpack the downloaded inception_v3.ckpt into a folder named inception_v3 in the current directory, then open a command prompt, change into the slim directory, and run:
python train_image_classifier.py \
    --train_dir=./log/in3 \
    --dataset_dir=./datasets/data/flowers \
    --dataset_name=flowers \
    --dataset_split_name=train \
    --model_name=inception_v3 \
    --checkpoint_path=./inception_v3/inception_v3.ckpt \
    --checkpoint_exclude_scopes=InceptionV3/Logits,InceptionV3/AuxLogits \
    --trainable_scopes=InceptionV3/Logits,InceptionV3/AuxLogits
In this example, the model at --checkpoint_path is loaded and its weights are used to initialize the network, while --checkpoint_exclude_scopes keeps the last layer from being initialized from the checkpoint. --trainable_scopes specifies that only the newly added last layer is trained: the frozen parameters keep the good values learned by the original model, while the new layer optimizes its own parameters through the training iterations.
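Roughly speaking, excluding scopes from restoration looks like the following in slim (a hedged sketch of the same idea, not the script's exact implementation):

# Restore every variable except the final logits layers, which are trained
# from scratch for the new classes.
exclude = ['InceptionV3/Logits', 'InceptionV3/AuxLogits']
variables_to_restore = slim.get_variables_to_restore(exclude=exclude)
init_fn = slim.assign_from_checkpoint_fn('./inception_v3/inception_v3.ckpt',
                                         variables_to_restore)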
During fine-tuning you can also add
--max_number_of_steps=500
to the command above to limit the number of training steps. If no step count is specified, training runs indefinitely by default. For more flags, see the train_image_classifier.py source. The scripts folder also contains an example of using a trained model to classify images.
To evaluate the performance of a model (whether pretrained or your own), you can use the eval_image_classifier.py script, as shown below.
Below we give an example of downloading the pretrained inception model and evaluating it on the imagenet dataset.
python eval_image_classifier.py \
    --alsologtostderr \
    --checkpoint_path=./log/in3/model.ckpt \
    --dataset_dir=./datasets/data/flowers \
    --dataset_name=flowers \
    --dataset_split_name=validation \
    --model_name=inception_v3
The ./log/in3/model.ckpt given here is the checkpoint produced by the fine-tuning above.
A trained model can be packaged for use on various platforms, whether iOS, Android, or Linux; this is done with the open-source bazel tool. For details see: https://github.com/tensorflow/models/tree/master/research/slim/#Export