Using relu and LeakyReLU in TensorFlow 2.0

There is plenty of theory online about ReLU, LReLU and friends, but very little of it gathers together how to actually apply them.

In the Convolutional Neural Network (CNN) tutorial https://tensorflow.google.cn/tutorials/images/cnn?hl=en the activation function used is relu.

While studying, I saw blog posts suggesting that when ReLU performs poorly you should try LReLU instead, but there was no particularly detailed guide online on how to use it, so I went digging in the official docs.

1 The usual way to use relu

First, the standard relu: it can be used directly.

Take the official example from "Create the convolutional base":

model = models.Sequential()
model.add(layers.Conv2D(32, (3, 3), activation='relu', input_shape=(32, 32, 3)))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))

Clearly, relu can be used directly here.

Next, find the official documentation for activation='relu'.

On the activations page of the official docs

(Module: tf.compat.v2.keras.activations) https://tensorflow.google.cn/api_docs/python/tf/compat/v2/keras/activations?hl=en

we can see the following:

Built-in activation functions.

Functions

deserialize(...)

elu(...): Exponential linear unit.

exponential(...): Exponential activation function.

get(...)

hard_sigmoid(...): Hard sigmoid activation function.

linear(...): Linear activation function.

relu(...): Rectified Linear Unit.

selu(...): Scaled Exponential Linear Unit (SELU).

serialize(...)

sigmoid(...): Sigmoid.

softmax(...): The softmax activation function transforms the outputs so that all values are in range (0, 1) and sum to 1.

softplus(...): Softplus activation function.

softsign(...): Softsign activation function.

tanh(...): Hyperbolic Tangent (tanh) activation function.

Clearly the built-in functions include relu but not LReLU, so using activation='lrelu' directly raises an error!

The error is: ValueError: Unknown activation function:lrelu

Note: activation='relu' is equivalent to activation=tf.keras.activations.relu, i.e. passing the function object itself (not the result of calling it).
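
A minimal sketch of that equivalence (the layer sizes and input are just illustrative):

import tensorflow as tf
from tensorflow.keras import layers

# Both layers apply the same ReLU: one referenced by its string name,
# the other by the function object itself.
a = layers.Dense(8, activation='relu')
b = layers.Dense(8, activation=tf.keras.activations.relu)

x = tf.random.normal([2, 4])
print(a(x).shape, b(x).shape)  # both (2, 8)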

2 Tracing the relu function

Official docs: tf.keras.activations.relu https://tensorflow.google.cn/api_docs/python/tf/keras/activations/relu

 

tf.keras.activations.relu(
    x,
    alpha=0.0,
    max_value=None,
    threshold=0
)

Arguments:

  • x: A tensor or variable.

  • alpha: A scalar, the slope of the negative section (default = 0.).

  • max_value: Float. Saturation threshold.

  • threshold: Float. Threshold value for thresholded activation.

With default values, the function returns element-wise max(x, 0).

Otherwise, it follows:

f(x) = max_value for x >= max_value

f(x) = x for threshold <= x < max_value

f(x) = alpha * (x - threshold) otherwise.

This made me wonder: if the negative-section slope alpha is not 0, wouldn't this behave just like LeakyReLU?
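
A quick numeric check of that hunch (a minimal sketch; the alpha value and inputs are purely illustrative):

import tensorflow as tf

x = tf.constant([-2.0, -0.5, 0.0, 1.0, 3.0])

print(tf.keras.activations.relu(x).numpy())              # [ 0.     0.    0.  1.  3.]
print(tf.keras.activations.relu(x, alpha=0.01).numpy())  # [-0.02  -0.005 0.  1.  3.]

With alpha not equal to 0, negative inputs keep a small slope instead of being clamped to zero, which is exactly the leaky behaviour.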

Next, let's compare it with LeakyReLU.

3 Tracing LeakyReLU

3.1 The basics of LeakyReLU

Official page for tf.keras.layers.LeakyReLU: https://tensorflow.google.cn/api_docs/python/tf/keras/layers/LeakyReLU?hl=en

class tf.keras.layers.LeakyReLU

The first thing to be clear about is that LeakyReLU is a class, not a function!

The class inherits from Layer (when I realized it was a class, I initially assumed it inherited from layers; the source code is attached at the end).

 

Arguments:

  • alpha: Float >= 0. Negative slope coefficient.

The __init__ method:

__init__(
    alpha=0.3,
    **kwargs
)
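
As a small sanity check (a sketch of my own; the input values are illustrative), the class is instantiated first and then called on a tensor:

import tensorflow as tf

act = tf.keras.layers.LeakyReLU(alpha=0.3)  # 0.3 is the default alpha
x = tf.constant([-10.0, -1.0, 0.0, 2.0])
print(act(x).numpy())                       # [-3.  -0.3  0.   2. ]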

3.2 LeakyReLU in practice

The first example is the Deep Convolutional Generative Adversarial Network (DCGAN) tutorial.

That tutorial applies LeakyReLU like this:

def make_generator_model():
    model = tf.keras.Sequential()
    model.add(layers.Dense(7*7*256, use_bias=False, input_shape=(100,)))
    model.add(layers.BatchNormalization())
    model.add(layers.LeakyReLU())

    model.add(layers.Reshape((7, 7, 256)))
    assert model.output_shape == (None, 7, 7, 256)  # Note: the batch size is not fixed

    model.add(layers.Conv2DTranspose(128, (5, 5), strides=(1, 1), padding='same', use_bias=False))
    assert model.output_shape == (None, 7, 7, 128)
    model.add(layers.BatchNormalization())
    model.add(layers.LeakyReLU())

    model.add(layers.Conv2DTranspose(64, (5, 5), strides=(2, 2), padding='same', use_bias=False))
    assert model.output_shape == (None, 14, 14, 64)
    model.add(layers.BatchNormalization())
    model.add(layers.LeakyReLU())

    model.add(layers.Conv2DTranspose(1, (5, 5), strides=(2, 2), padding='same', use_bias=False, activation='tanh'))
    assert model.output_shape == (None, 28, 28, 1)

    return model

In other words, the layer is instantiated directly and added to the model.

Another official tutorial uses the same approach.

Pix2Pix https://tensorflow.google.cn/tutorials/generative/pix2pix

The usage there is:

def downsample(filters, size, apply_batchnorm=True):
  initializer = tf.random_normal_initializer(0., 0.02)

  result = tf.keras.Sequential()
  result.add(
      tf.keras.layers.Conv2D(filters, size, strides=2, padding='same',
                             kernel_initializer=initializer, use_bias=False))

  if apply_batchnorm:
    result.add(tf.keras.layers.BatchNormalization())

  result.add(tf.keras.layers.LeakyReLU())

  return result

As the tf.keras.layers.LeakyReLU() line shows, here too the layer is used by direct instantiation.
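
A quick usage sketch of that building block (the filter count, kernel size and input shape are my own illustrative choices; it assumes the downsample function above and import tensorflow as tf):

down = downsample(64, 4)
x = tf.random.normal([1, 256, 256, 3])
print(down(x).shape)  # (1, 128, 128, 64): strides=2 halves the spatial size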

Of course, you can also create the instance first, assign it to a variable, and then add it:
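
A minimal sketch of that pattern (layer choices and shapes are illustrative):

import tensorflow as tf
from tensorflow.keras import layers, models
from tensorflow.keras.layers import LeakyReLU

leaky = LeakyReLU(alpha=0.01)  # create the instance first
model = models.Sequential()
model.add(layers.Conv2D(32, (3, 3), input_shape=(32, 32, 3)))
model.add(leaky)               # then add it like any other layer
model.add(layers.MaxPooling2D((2, 2)))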

3.3 Using LeakyReLU in place of relu (the code from Section 1)

The modifications follow the official approach.

First, the imports:

from tensorflow.keras import layers, models
from tensorflow.keras.layers import LeakyReLU

3.3.1 Building the model (Option 1, key section)

model = models.Sequential()
model.add(layers.Conv2D(32, (3, 3), input_shape=(28, 28, 3)))
model.add(LeakyReLU(alpha=0.01))
model.add(layers.MaxPooling2D((2, 2)))

model.add(layers.Conv2D(64, (3, 3)))
model.add(LeakyReLU(alpha=0.01))
model.add(layers.MaxPooling2D((2, 2)))

model.add(layers.Conv2D(64, (3, 3)))
model.add(LeakyReLU(alpha=0.01))
model.add(layers.MaxPooling2D((2, 2)))
model.summary()

Trying it out, this runs:

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
conv2d (Conv2D)              (None, 26, 26, 32)        896       
_________________________________________________________________
leaky_re_lu (LeakyReLU)      (None, 26, 26, 32)        0         
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 13, 13, 32)        0         
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 11, 11, 64)        18496     
_________________________________________________________________
leaky_re_lu_1 (LeakyReLU)    (None, 11, 11, 64)        0         
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 5, 5, 64)          0         
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 3, 3, 64)          36928     
_________________________________________________________________
leaky_re_lu_2 (LeakyReLU)    (None, 3, 3, 64)          0         
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 1, 1, 64)          0         
=================================================================
Total params: 56,320
Trainable params: 56,320
Non-trainable params: 0
_________________________________________________________________

Now let's try compressing the code into fewer lines and see whether that also works:

3.3.2 Building the model (Option 2, key section)

model = models.Sequential()
model.add(layers.Conv2D(32, (3, 3), activation=LeakyReLU(alpha=0.01), input_shape=(32, 32, 3)))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation=LeakyReLU(alpha=0.01)))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation=LeakyReLU(alpha=0.01)))
model.add(layers.MaxPooling2D((2, 2)))
model.summary()

Running it gives:

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
conv2d (Conv2D)              (None, 30, 30, 32)        896       
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 15, 15, 32)        0         
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 13, 13, 64)        18496     
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 6, 6, 64)          0         
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 4, 4, 64)          36928     
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 2, 2, 64)          0         
=================================================================
Total params: 56,320
Trainable params: 56,320
Non-trainable params: 0
_________________________________________________________________

Clearly this also runs. The summaries differ slightly: in Option 2 the LeakyReLU instances are passed as the activation argument, so they no longer appear as separate layers.

3.3.3 A surprise!

Although the model in Section 3.3.2 built and summarized successfully, an unexpected error appeared during actual use:

 

AttributeError: 'LeakyReLU' object has no attribute '__name__'

 

The Option 1 approach from Section 3.3.1 raised no such error and behaved just like the unmodified original, so Section 3.3.1 is the approach to stick with. The likely cause is that the activation argument expects a function (which has a __name__ attribute) rather than a Layer instance, so Keras fails when it tries to serialize the LeakyReLU object by name.
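
If the compact activation= style of Option 2 is still wanted, one possible workaround (a sketch of my own, not from the official docs; the helper name leaky_relu_001 is hypothetical) is to wrap the call in a plain named function, which does have a __name__:

import tensorflow as tf
from tensorflow.keras import layers, models

def leaky_relu_001(x):
    # leaky behaviour via the built-in relu, slope 0.01 on the negative side
    return tf.keras.activations.relu(x, alpha=0.01)

model = models.Sequential()
model.add(layers.Conv2D(32, (3, 3), activation=leaky_relu_001,
                        input_shape=(32, 32, 3)))
model.add(layers.MaxPooling2D((2, 2)))

Loading a model saved this way would need the function passed via custom_objects, so Option 1 remains the simpler choice.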

 

4 The underlying code of LeakyReLU

To show how useful the underlying code is, it gets its own section.

From the tensorflow/tensorflow repository:

 

In that file we can see the following code, which confirms that class LeakyReLU does indeed inherit from class Layer:

# Copyright 2015 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Layers that act as activation functions.
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

from tensorflow.python.keras import backend as K
from tensorflow.python.keras import constraints
from tensorflow.python.keras import initializers
from tensorflow.python.keras import regularizers
from tensorflow.python.keras.engine.base_layer import Layer
from tensorflow.python.keras.engine.input_spec import InputSpec
from tensorflow.python.keras.utils import tf_utils
from tensorflow.python.ops import math_ops
from tensorflow.python.util.tf_export import keras_export


@keras_export('keras.layers.LeakyReLU')
class LeakyReLU(Layer):
  """Leaky version of a Rectified Linear Unit.
  It allows a small gradient when the unit is not active:
  `f(x) = alpha * x for x < 0`,
  `f(x) = x for x >= 0`.
  Input shape:
    Arbitrary. Use the keyword argument `input_shape`
    (tuple of integers, does not include the samples axis)
    when using this layer as the first layer in a model.
  Output shape:
    Same shape as the input.
  Arguments:
    alpha: Float >= 0. Negative slope coefficient.
  """

  def __init__(self, alpha=0.3, **kwargs):
    super(LeakyReLU, self).__init__(**kwargs)
    self.supports_masking = True
    self.alpha = K.cast_to_floatx(alpha)

  def call(self, inputs):
    return K.relu(inputs, alpha=self.alpha)

  def get_config(self):
    config = {'alpha': float(self.alpha)}
    base_config = super(LeakyReLU, self).get_config()
    return dict(list(base_config.items()) + list(config.items()))

  @tf_utils.shape_type_conversion
  def compute_output_shape(self, input_shape):
    return input_shape


@keras_export('keras.layers.PReLU')
class PReLU(Layer):
  """Parametric Rectified Linear Unit.
  It follows:
  `f(x) = alpha * x for x < 0`,
  `f(x) = x for x >= 0`,
  where `alpha` is a learned array with the same shape as x.
  Input shape:
    Arbitrary. Use the keyword argument `input_shape`
    (tuple of integers, does not include the samples axis)
    when using this layer as the first layer in a model.
  Output shape:
    Same shape as the input.
  Arguments:
    alpha_initializer: Initializer function for the weights.
    alpha_regularizer: Regularizer for the weights.
    alpha_constraint: Constraint for the weights.
    shared_axes: The axes along which to share learnable
      parameters for the activation function.
      For example, if the incoming feature maps
      are from a 2D convolution
      with output shape `(batch, height, width, channels)`,
      and you wish to share parameters across space
      so that each filter only has one set of parameters,
      set `shared_axes=[1, 2]`.
  """

  def __init__(self,
               alpha_initializer='zeros',
               alpha_regularizer=None,
               alpha_constraint=None,
               shared_axes=None,
               **kwargs):
    super(PReLU, self).__init__(**kwargs)
    self.supports_masking = True
    self.alpha_initializer = initializers.get(alpha_initializer)
    self.alpha_regularizer = regularizers.get(alpha_regularizer)
    self.alpha_constraint = constraints.get(alpha_constraint)
    if shared_axes is None:
      self.shared_axes = None
    elif not isinstance(shared_axes, (list, tuple)):
      self.shared_axes = [shared_axes]
    else:
      self.shared_axes = list(shared_axes)

  @tf_utils.shape_type_conversion
  def build(self, input_shape):
    param_shape = list(input_shape[1:])
    if self.shared_axes is not None:
      for i in self.shared_axes:
        param_shape[i - 1] = 1
    self.alpha = self.add_weight(
        shape=param_shape,
        name='alpha',
        initializer=self.alpha_initializer,
        regularizer=self.alpha_regularizer,
        constraint=self.alpha_constraint)
    # Set input spec
    axes = {}
    if self.shared_axes:
      for i in range(1, len(input_shape)):
        if i not in self.shared_axes:
          axes[i] = input_shape[i]
    self.input_spec = InputSpec(ndim=len(input_shape), axes=axes)
    self.built = True

  def call(self, inputs):
    pos = K.relu(inputs)
    neg = -self.alpha * K.relu(-inputs)
    return pos + neg

  def get_config(self):
    config = {
        'alpha_initializer': initializers.serialize(self.alpha_initializer),
        'alpha_regularizer': regularizers.serialize(self.alpha_regularizer),
        'alpha_constraint': constraints.serialize(self.alpha_constraint),
        'shared_axes': self.shared_axes
    }
    base_config = super(PReLU, self).get_config()
    return dict(list(base_config.items()) + list(config.items()))

  @tf_utils.shape_type_conversion
  def compute_output_shape(self, input_shape):
    return input_shape


@keras_export('keras.layers.ELU')
class ELU(Layer):
  """Exponential Linear Unit.
  It follows:
  `f(x) =  alpha * (exp(x) - 1.) for x < 0`,
  `f(x) = x for x >= 0`.
  Input shape:
    Arbitrary. Use the keyword argument `input_shape`
    (tuple of integers, does not include the samples axis)
    when using this layer as the first layer in a model.
  Output shape:
    Same shape as the input.
  Arguments:
    alpha: Scale for the negative factor.
  """

  def __init__(self, alpha=1.0, **kwargs):
    super(ELU, self).__init__(**kwargs)
    self.supports_masking = True
    self.alpha = K.cast_to_floatx(alpha)

  def call(self, inputs):
    return K.elu(inputs, self.alpha)

  def get_config(self):
    config = {'alpha': float(self.alpha)}
    base_config = super(ELU, self).get_config()
    return dict(list(base_config.items()) + list(config.items()))

  @tf_utils.shape_type_conversion
  def compute_output_shape(self, input_shape):
    return input_shape


@keras_export('keras.layers.ThresholdedReLU')
class ThresholdedReLU(Layer):
  """Thresholded Rectified Linear Unit.
  It follows:
  `f(x) = x for x > theta`,
  `f(x) = 0 otherwise`.
  Input shape:
    Arbitrary. Use the keyword argument `input_shape`
    (tuple of integers, does not include the samples axis)
    when using this layer as the first layer in a model.
  Output shape:
    Same shape as the input.
  Arguments:
    theta: Float >= 0. Threshold location of activation.
  """

  def __init__(self, theta=1.0, **kwargs):
    super(ThresholdedReLU, self).__init__(**kwargs)
    self.supports_masking = True
    self.theta = K.cast_to_floatx(theta)

  def call(self, inputs):
    return inputs * math_ops.cast(
        math_ops.greater(inputs, self.theta), K.floatx())

  def get_config(self):
    config = {'theta': float(self.theta)}
    base_config = super(ThresholdedReLU, self).get_config()
    return dict(list(base_config.items()) + list(config.items()))

  @tf_utils.shape_type_conversion
  def compute_output_shape(self, input_shape):
    return input_shape


@keras_export('keras.layers.Softmax')
class Softmax(Layer):
  """Softmax activation function.
  Input shape:
    Arbitrary. Use the keyword argument `input_shape`
    (tuple of integers, does not include the samples axis)
    when using this layer as the first layer in a model.
  Output shape:
    Same shape as the input.
  Arguments:
    axis: Integer, axis along which the softmax normalization is applied.
  """

  def __init__(self, axis=-1, **kwargs):
    super(Softmax, self).__init__(**kwargs)
    self.supports_masking = True
    self.axis = axis

  def call(self, inputs):
    return K.softmax(inputs, axis=self.axis)

  def get_config(self):
    config = {'axis': self.axis}
    base_config = super(Softmax, self).get_config()
    return dict(list(base_config.items()) + list(config.items()))

  @tf_utils.shape_type_conversion
  def compute_output_shape(self, input_shape):
    return input_shape


@keras_export('keras.layers.ReLU')
class ReLU(Layer):
  """Rectified Linear Unit activation function.
  With default values, it returns element-wise `max(x, 0)`.
  Otherwise, it follows:
  `f(x) = max_value` for `x >= max_value`,
  `f(x) = x` for `threshold <= x < max_value`,
  `f(x) = negative_slope * (x - threshold)` otherwise.
  Input shape:
    Arbitrary. Use the keyword argument `input_shape`
    (tuple of integers, does not include the samples axis)
    when using this layer as the first layer in a model.
  Output shape:
    Same shape as the input.
  Arguments:
    max_value: Float >= 0. Maximum activation value.
    negative_slope: Float >= 0. Negative slope coefficient.
    threshold: Float. Threshold value for thresholded activation.
  """

  def __init__(self, max_value=None, negative_slope=0, threshold=0, **kwargs):
    super(ReLU, self).__init__(**kwargs)
    if max_value is not None and max_value < 0.:
      raise ValueError('max_value of Relu layer '
                       'cannot be negative value: ' + str(max_value))
    if negative_slope < 0.:
      raise ValueError('negative_slope of Relu layer '
                       'cannot be negative value: ' + str(negative_slope))

    self.support_masking = True
    if max_value is not None:
      max_value = K.cast_to_floatx(max_value)
    self.max_value = max_value
    self.negative_slope = K.cast_to_floatx(negative_slope)
    self.threshold = K.cast_to_floatx(threshold)

  def call(self, inputs):
    # alpha is used for leaky relu slope in activations instead of
    # negative_slope.
    return K.relu(inputs,
                  alpha=self.negative_slope,
                  max_value=self.max_value,
                  threshold=self.threshold)

  def get_config(self):
    config = {
        'max_value': self.max_value,
        'negative_slope': self.negative_slope,
        'threshold': self.threshold
    }
    base_config = super(ReLU, self).get_config()
    return dict(list(base_config.items()) + list(config.items()))

  @tf_utils.shape_type_conversion
  def compute_output_shape(self, input_shape):
    return input_shape