Implementing an L2-constrained softmax loss on a convolutional neural network with TensorFlow

Date: 2019-12-04

Author: chen_h
WeChat & QQ: 862251340
WeChat official account: coderpai
Jianshu: https://www.jianshu.com/p/d6a...

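When we build a multi-class classifier with a neural network, we usually use the softmax function as the final classification layer. Softmax assigns a probability to every class, and we take the class with the highest probability as the model's output, which is how a concrete classification result is read off the model. To train the model, we backpropagate through the softmax loss. The final prediction can then be written as a 0-1 (one-hot) vector.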
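As a minimal sketch of the step described above, the NumPy snippet below turns a vector of raw class scores into softmax probabilities and picks the most probable class. The function name `softmax` and the example scores are illustrative assumptions, not part of the original code.

```python
import numpy as np

def softmax(logits):
    """Convert raw scores to probabilities along the last axis."""
    # Subtracting the maximum keeps exp() from overflowing; the result is unchanged.
    shifted = logits - np.max(logits, axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=-1, keepdims=True)

# Hypothetical scores for one sample and three classes.
scores = np.array([2.0, 1.0, 0.1])
probs = softmax(scores)

predicted_class = int(np.argmax(probs))         # index of the largest probability
one_hot = np.eye(len(scores))[predicted_class]  # 0-1 vector for that class
```

The probabilities sum to 1, and the 0-1 vector marks only the winning class.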

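This article does not explain what softmax regression or a CNN is. Its focus is how to design an L2-constrained softmax function in TensorFlow, using the MNIST dataset. For the full theoretical analysis, see the paper (https://arxiv.org/abs/1703.09507).

Before the implementation, let us first clarify a few concepts.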

The softmax loss function

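The softmax loss can be defined as follows (in the notation of the paper, https://arxiv.org/abs/1703.09507):

$$L_S = -\frac{1}{M} \sum_{i=1}^{M} \log \frac{e^{W_{y_i}^T f(x_i) + b_{y_i}}}{\sum_{j=1}^{C} e^{W_j^T f(x_i) + b_j}}$$

where the parameters are defined as follows: M is the training batch size, x_i is the i-th input sample in the batch, f(x_i) is the corresponding output of the penultimate layer of the network, y_i is the class label of x_i, C is the number of classes, and W and b are the weights and bias of the last layer, which acts as the classifier.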
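To make the loss concrete, here is a small NumPy sketch of the mean softmax cross-entropy over a batch. It assumes the last layer is a weight matrix `W` of shape (feature size, number of classes) and a bias `b`; the names `softmax_loss`, `features`, `M`, `D` and `C` and the random data are illustrative assumptions.

```python
import numpy as np

def softmax_loss(features, labels, W, b):
    """Mean cross-entropy of softmax(features @ W + b) against integer labels."""
    logits = features @ W + b                             # shape (M, C)
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Select the log-probability of the true class for each sample.
    true_class_log_probs = log_probs[np.arange(len(labels)), labels]
    return -true_class_log_probs.mean()

rng = np.random.default_rng(0)
M, D, C = 4, 8, 3                     # batch size, feature size, classes
features = rng.normal(size=(M, D))    # stand-in for the penultimate layer output
labels = np.array([0, 2, 1, 2])
W = np.zeros((D, C))                  # untrained last layer
b = np.zeros(C)

loss = softmax_loss(features, labels, W, b)
```

With all-zero weights every class gets probability 1/C, so the loss equals log(C), which is a handy sanity check for an untrained classifier.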

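The L2-constrained softmax loss function

The constrained loss is defined almost exactly as before, and the goal is still to minimize it.

However, we need to modify f(x).

Instead of directly multiplying the last layer's weights with the previous layer's output f(x), we first normalize f(x) to unit L2 norm, then scale the normalized vector by a factor α, and finally apply the usual softmax.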

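In other words, the loss is minimized subject to the following constraint:

$$\|f(x_i)\|_2 = \alpha \quad \text{for every training sample } x_i$$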
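The normalize-then-scale step can be sketched in NumPy as follows. This is only an illustration under assumed shapes: `l2_constrained_logits`, the random features and the value `alpha = 5.0` are hypothetical and not taken from the original code.

```python
import numpy as np

def l2_constrained_logits(features, W, b, alpha):
    """Rescale each feature vector to L2 norm alpha, then apply the last layer."""
    norms = np.linalg.norm(features, axis=1, keepdims=True)  # one norm per sample
    constrained = alpha * features / norms
    return constrained @ W + b, constrained

rng = np.random.default_rng(0)
# Four samples whose raw feature norms differ by orders of magnitude.
features = rng.normal(size=(4, 8)) * np.array([[0.1], [1.0], [10.0], [100.0]])
W = rng.normal(size=(8, 3))
b = np.zeros(3)
alpha = 5.0

logits, constrained = l2_constrained_logits(features, W, b, alpha)
constrained_norms = np.linalg.norm(constrained, axis=1)
```

After this step every row of `constrained` has norm `alpha`, regardless of how large or small the raw features were.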

Implementation details

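The resulting architecture is shown in the figure below (this is also the architecture I want to implement):

C denotes a convolutional layer, P a pooling layer, and FC a fully connected layer. The L2-Norm and Scale layers are the ones we focus on implementing.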
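To get a feel for the tensor sizes in this architecture, the sketch below traces the spatial size of a 28x28 MNIST image through the C and P layers. It assumes the kernel sizes and filter counts used in the full listing later in this article (5x5 and 3x3 convolutions, 2x2 pooling with stride 2, 64 filters in the second convolution) and no padding; `conv_out` is a hypothetical helper.

```python
def conv_out(size, kernel, stride=1):
    """Output size of a convolution or pooling window without padding."""
    return (size - kernel) // stride + 1

size = 28
size = conv_out(size, kernel=5)            # C1: 28 -> 24
size = conv_out(size, kernel=2, stride=2)  # P1: 24 -> 12
size = conv_out(size, kernel=3)            # C2: 12 -> 10
size = conv_out(size, kernel=2, stride=2)  # P2: 10 -> 5

# Flattened input to the first fully connected layer: 5 * 5 * 64 values.
flat_features = size * size * 64
```

Under these assumptions the first FC layer receives 1600 values per image and maps them to the 1024-dimensional vector that the L2-Norm and Scale layers operate on.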

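Implementation in TensorFlow

To implement this model, we build on this code base: https://github.com/aymericdamien/TensorFlow-Examples/blob/master/notebooks/3_NeuralNetworks/convolutional_network.ipynb

Before applying dropout, we L2-normalize the output of layer N-1, multiply the normalized result by the parameter alpha, and then compute the softmax. The relevant code is: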

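# axis=1 normalizes each sample's feature vector separately, not the whole batch
fc1 = alpha * tf.divide(fc1, tf.norm(fc1, ord='euclidean', axis=1, keepdims=True))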

If alpha is set to 0, the code skips the normalization step and we get the regular softmax; otherwise the L2 constraint is applied.
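The constraint is meant to hold for each sample, so the axis over which the norm is taken matters. The NumPy sketch below compares a per-sample norm with a norm over the whole batch and also shows the alpha = 0 shortcut. `l2_norm_scale` and the tiny batch are hypothetical, chosen so the numbers are easy to check by hand.

```python
import numpy as np

def l2_norm_scale(features, alpha):
    """Return features unchanged when alpha == 0, else rescale each row to L2 norm alpha."""
    if alpha == 0:
        return features
    row_norms = np.linalg.norm(features, axis=1, keepdims=True)
    return alpha * features / row_norms

batch = np.array([[3.0, 4.0],    # norm 5
                  [6.0, 8.0]])   # norm 10

per_sample = l2_norm_scale(batch, alpha=50)
# Dividing by the norm of the whole batch leaves the rows with different norms.
whole_batch = 50 * batch / np.linalg.norm(batch)

per_sample_norms = np.linalg.norm(per_sample, axis=1)
whole_batch_norms = np.linalg.norm(whole_batch, axis=1)
unchanged = l2_norm_scale(batch, alpha=0)
```

Only the per-sample version gives every row a norm of exactly 50; with the whole-batch norm the second row stays twice as long as the first.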

The complete code is as follows:

# Actual Code : https://github.com/aymericdamien/TensorFlow-Examples/blob/master/notebooks/3_NeuralNetworks/convolutional_network.ipynb
# Modified By: Manash

from __future__ import division, print_function, absolute_import

# Import MNIST data
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("/tmp/data/", one_hot=False)

import tensorflow as tf
import matplotlib.pyplot as plt
import numpy as np

# Training Parameters
learning_rate = 0.001
num_steps = 100
batch_size = 20


# Network Parameters
num_input = 784 # MNIST data input (img shape: 28*28)
num_classes = 10 # MNIST total classes (0-9 digits)
dropout = 0.75 # Dropout rate: probability of dropping a unit (passed as `rate` to tf.layers.dropout)


# Create the neural network
def conv_net(x_dict, n_classes, dropout, reuse, is_training, alpha=5):
    
    # Define a scope for reusing the variables
    with tf.variable_scope('ConvNet', reuse=reuse):
        # TF Estimator input is a dict, in case of multiple inputs
        x = x_dict['images']

        # MNIST data input is a 1-D vector of 784 features (28*28 pixels)
        # Reshape to match picture format [Height x Width x Channel]
        # Tensor input become 4-D: [Batch Size, Height, Width, Channel]
        x = tf.reshape(x, shape=[-1, 28, 28, 1])

        # Convolution Layer with 32 filters and a kernel size of 5
        conv1 = tf.layers.conv2d(x, 32, 5, activation=tf.nn.relu)
        # Max Pooling (down-sampling) with strides of 2 and kernel size of 2
        conv1 = tf.layers.max_pooling2d(conv1, 2, 2)

        # Convolution Layer with 64 filters and a kernel size of 3
        conv2 = tf.layers.conv2d(conv1, 64, 3, activation=tf.nn.relu)
        # Max Pooling (down-sampling) with strides of 2 and kernel size of 2
        conv2 = tf.layers.max_pooling2d(conv2, 2, 2)

        # Flatten the data to a 1-D vector for the fully connected layer
        fc1 = tf.contrib.layers.flatten(conv2)

        # Fully connected layer with 1024 units (no activation)
        fc1 = tf.layers.dense(fc1, 1024)
        
        # If alpha is not zero, L2-normalize each sample's feature vector
        # (axis=1: per sample, not over the whole batch), then scale by alpha
        if alpha != 0:
            fc1 = alpha * tf.divide(fc1, tf.norm(fc1, ord='euclidean', axis=1, keepdims=True))
    
        # Apply Dropout (if is_training is False, dropout is not applied)
        fc1 = tf.layers.dropout(fc1, rate=dropout, training=is_training)
  
        # Output layer, class prediction
        out = tf.layers.dense(fc1, n_classes)

    return out
  
# Define the model function (following TF Estimator Template)
def model_fn(features, labels, mode):
    # Set alpha
    alph = 50
    
    # Build the neural network
    # Because Dropout behaves differently at training and prediction time, we
    # need to create 2 distinct computation graphs that still share the same weights.
    logits_train = conv_net(features, num_classes, dropout, reuse=False, is_training=True, alpha=alph)
    
    # At test time we don't need to normalize or scale, it's redundant as per paper : https://arxiv.org/abs/1703.09507
    logits_test = conv_net(features, num_classes, dropout, reuse=True, is_training=False, alpha=0)
    
    # Predictions
    pred_classes = tf.argmax(logits_test, axis=1)
    pred_probas = tf.nn.softmax(logits_test)
    
    # If prediction mode, early return
    if mode == tf.estimator.ModeKeys.PREDICT:
        return tf.estimator.EstimatorSpec(mode, predictions=pred_classes) 
        
    # Define loss and optimizer
    loss_op = tf.reduce_mean(tf.nn.sparse_softmax_cross_entropy_with_logits(
        logits=logits_train, labels=tf.cast(labels, dtype=tf.int32)))
    optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate)
    train_op = optimizer.minimize(loss_op, global_step=tf.train.get_global_step())
    
    # Evaluate the accuracy of the model
    acc_op = tf.metrics.accuracy(labels=labels, predictions=pred_classes)
    
    # TF Estimators require the model function to return an EstimatorSpec that
    # specifies the different ops for training, evaluating, ...
    estim_specs = tf.estimator.EstimatorSpec(
      mode=mode,
      predictions=pred_classes,
      loss=loss_op,
      train_op=train_op,
      eval_metric_ops={'accuracy': acc_op})

    return estim_specs

  
# Build the Estimator
model = tf.estimator.Estimator(model_fn)

# Define the input function for training
input_fn = tf.estimator.inputs.numpy_input_fn(
    x={'images': mnist.train.images}, y=mnist.train.labels,
    batch_size=batch_size, num_epochs=None, shuffle=False)
# Train the Model
model.train(input_fn, steps=num_steps)

# Evaluate the Model
# Define the input function for evaluating
input_fn = tf.estimator.inputs.numpy_input_fn(
    x={'images': mnist.test.images}, y=mnist.test.labels,
    batch_size=batch_size, shuffle=False)
# Use the Estimator 'evaluate' method
model.evaluate(input_fn)


# Predict single images
n_images = 4
# Get images from test set
test_images = mnist.test.images[:n_images]
# Prepare the input data
input_fn = tf.estimator.inputs.numpy_input_fn(
    x={'images': test_images}, shuffle=False)
# Use the model to predict the images class
preds = list(model.predict(input_fn))

# Display
for i in range(n_images):
    plt.imshow(np.reshape(test_images[i], [28, 28]), cmap='gray')
    plt.show()
    print("Model prediction:", preds[i])