Two ways to set initial values for a Keras Embedding layer

Randomly initialized Embedding

from keras.models import Sequential
from keras.layers import Embedding
import numpy as np

model = Sequential()
model.add(Embedding(1000, 64, input_length=10))
# the model will take as input an integer matrix of size (batch, input_length).
# the largest integer (i.e. word index) in the input should be no larger than 999 (vocabulary size).
# now model.output_shape == (None, 10, 64), where None is the batch dimension.

input_array = np.random.randint(1000, size=(32, 10))

model.compile('rmsprop', 'mse')
output_array = model.predict(input_array)
print(output_array)
assert output_array.shape == (32, 10, 64)
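
The shape in the assertion above follows from what the layer does: every integer index is replaced by one row of a (vocabulary size, embedding size) matrix. A dependency-free sketch of that lookup, assuming plain nested lists in place of tensors (embedding_lookup and table are illustrative names, not Keras APIs):

```python
import random


def embedding_lookup(table, batch):
    """Replace every integer index in `batch` with the matching row of `table`."""
    return [[table[idx] for idx in sequence] for sequence in batch]


random.seed(0)
vocab_size, dim = 1000, 64
# Hypothetical stand-in for the layer's randomly initialized weight matrix.
table = [[random.uniform(-0.05, 0.05) for _ in range(dim)]
         for _ in range(vocab_size)]
# 32 sequences of 10 word indices each, all below vocab_size.
batch = [[random.randrange(vocab_size) for _ in range(10)] for _ in range(32)]

output = embedding_lookup(table, batch)
print(len(output), len(output[0]), len(output[0][0]))  # batch, input_length, output_dim
```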

Using the weights argument to specify the initial Embedding values

import numpy as np

import keras

m = keras.models.Sequential()
"""
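The initial weights can be specified through the weights argument.

The Embedding lookup is not differentiable with respect to its integer
index inputs, so gradients stop at this layer. Placing an Embedding in the
middle of a model is therefore pointless; it can only serve as the first layer.

Note that the binding from weights to embeddings is fairly involved,
and that weights is a list.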
"""
embedding = keras.layers.Embedding(input_dim=3, output_dim=2, input_length=1, weights=[np.arange(3 * 2).reshape((3, 2))], mask_zero=True)
m.add(embedding)  # once added, embedding.build() is called automatically
print(keras.backend.get_value(embedding.embeddings))
m.compile(keras.optimizers.RMSprop(), keras.losses.mse)
print(m.predict([1, 2, 2, 1, 2, 0]))
print(m.get_layer(index=0).get_weights())
print(keras.backend.get_value(embedding.embeddings))
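
To make the expected result of the predict call concrete, the lookup can be reproduced by hand. This is a sketch with plain lists rather than Keras output; it assumes that mask_zero=True only affects the mask handed to later layers, so index 0 still returns row 0 of the matrix. Keras itself would return the same values as floats.

```python
# Same values as np.arange(3 * 2).reshape((3, 2)) in the example above.
weights = [[0, 1], [2, 3], [4, 5]]
indices = [1, 2, 2, 1, 2, 0]

# input_length=1, so each sample is a length-1 sequence of indices.
expected = [[weights[i]] for i in indices]
for index, rows in zip(indices, expected):
    print(index, rows)
```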

The second way to set initial Embedding values: using an initializer

import numpy as np

import keras

m = keras.models.Sequential()
"""
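The initial values can also be specified through an initializer.

The Embedding lookup is not differentiable with respect to its integer
index inputs, so gradients stop at this layer. Placing an Embedding in the
middle of a model is therefore pointless; it can only serve as the first layer.

This is the second way to set the Embedding weights: use a constant initializer.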
"""
embedding = keras.layers.Embedding(input_dim=3, output_dim=2, input_length=1, embeddings_initializer=keras.initializers.constant(np.arange(3 * 2, dtype=np.float32).reshape((3, 2))))
m.add(embedding)
print(keras.backend.get_value(embedding.embeddings))
m.compile(keras.optimizers.RMSprop(), keras.losses.mse)
print(m.predict([1, 2, 2, 1, 2]))
print(m.get_layer(index=0).get_weights())
print(keras.backend.get_value(embedding.embeddings))

The key difficulty is working out how weights ends up inside the embedding.embeddings tensor.

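Embedding is a layer that inherits from Layer. The Layer constructor accepts a weights argument, which is a list whose elements are all numpy arrays. When the Layer constructor runs, the weights argument is stored in the _initial_weights attribute.

From the Layer class in base_layer.py: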

        if 'weights' in kwargs:
            self._initial_weights = kwargs['weights']
        else:
            self._initial_weights = None
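
Because weights is a list with one array per layer variable, the matrix has to be wrapped in a list even when the layer owns a single variable, as Embedding does. The sketch below mimics how such a list could be paired with variables; ToyVariable, shape_of and toy_set_weights are hypothetical helpers written for illustration and are not the Keras implementation.

```python
class ToyVariable:
    """Hypothetical stand-in for a backend variable with a fixed shape."""

    def __init__(self, shape):
        self.shape = shape
        self.value = None


def shape_of(nested):
    """Shape of a non-empty rectangular nested list, e.g. [[0, 1], [2, 3]] -> (2, 2)."""
    shape = []
    while isinstance(nested, list):
        shape.append(len(nested))
        nested = nested[0]
    return tuple(shape)


def toy_set_weights(variables, weights):
    """Pair the weights list with the variables, one array per variable."""
    if len(weights) != len(variables):
        raise ValueError('expected %d arrays, got %d'
                         % (len(variables), len(weights)))
    for variable, array in zip(variables, weights):
        if shape_of(array) != variable.shape:
            raise ValueError('shape mismatch: %s vs %s'
                             % (shape_of(array), variable.shape))
        variable.value = array


embeddings = ToyVariable(shape=(3, 2))   # an Embedding layer owns one variable
matrix = [[0, 1], [2, 3], [4, 5]]
toy_set_weights([embeddings], [matrix])  # note the list wrapper around the matrix
print(embeddings.value)
```

In this sketch, passing the bare matrix instead of [matrix] fails the length check, because the three rows are then read as three separate arrays.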

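When the Embedding layer is added to a model and connected to the previous layer, layer(previous_output) is called, where layer is the Embedding instance. Embedding is a subclass of Layer and does not override __call__(); Layer implements it, and Layer.__call__() in turn calls the subclass's call() method to compute the result. So the method that actually runs is Layer.__call__(), and it checks whether the layer has already been built (via the boolean self.built).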

Layer.__call__ is the function that matters here.

    def __call__(self, inputs, **kwargs):
        """Wrapper around self.call(), for handling internal references.

        If a Keras tensor is passed:
            - We call self._add_inbound_node().
            - If necessary, we `build` the layer to match
                the _keras_shape of the input(s).
            - We update the _keras_shape of every input tensor with
                its new shape (obtained via self.compute_output_shape).
                This is done as part of _add_inbound_node().
            - We update the _keras_history of the output tensor(s)
                with the current layer.
                This is done as part of _add_inbound_node().

        # Arguments
            inputs: Can be a tensor or list/tuple of tensors.
            **kwargs: Additional keyword arguments to be passed to `call()`.

        # Returns
            Output of the layer's `call` method.

        # Raises
            ValueError: in case the layer is missing shape information
                for its `build` call.
        """
        if isinstance(inputs, list):
            inputs = inputs[:]
        with K.name_scope(self.name):
            # Handle laying building (weight creating, input spec locking).
            if not self.built:  # if not built yet, run build() before calling call()
                # Raise exceptions in case the input is not compatible
                # with the input_spec specified in the layer constructor.
                self.assert_input_compatibility(inputs)

                # Collect input shapes to build layer.
                input_shapes = []
                for x_elem in to_list(inputs):
                    if hasattr(x_elem, '_keras_shape'):
                        input_shapes.append(x_elem._keras_shape)
                    elif hasattr(K, 'int_shape'):
                        input_shapes.append(K.int_shape(x_elem))
                    else:
                        raise ValueError('You tried to call layer "' +
                                         self.name +
                                         '". This layer has no information'
                                         ' about its expected input shape, '
                                         'and thus cannot be built. '
                                         'You can build it manually via: '
                                         '`layer.build(batch_input_shape)`')
                self.build(unpack_singleton(input_shapes))
                self.built = True  # largely redundant when build() already sets self.built = True, as Embedding.build() does

                # Load weights that were specified at layer instantiation.
                if self._initial_weights is not None:  # if weights were passed in, assign them to the variables; this overwrites the values build() just set
                    self.set_weights(self._initial_weights)

            # Raise exceptions in case the input is not compatible
            # with the input_spec set at build time.
            self.assert_input_compatibility(inputs)

            # Handle mask propagation.
            previous_mask = _collect_previous_mask(inputs)
            user_kwargs = copy.copy(kwargs)
            if not is_all_none(previous_mask):
                # The previous layer generated a mask.
                if has_arg(self.call, 'mask'):
                    if 'mask' not in kwargs:
                        # If mask is explicitly passed to __call__,
                        # we should override the default mask.
                        kwargs['mask'] = previous_mask
            # Handle automatic shape inference (only useful for Theano).
            input_shape = _collect_input_shape(inputs)

            # Actually call the layer,
            # collecting output(s), mask(s), and shape(s).
            output = self.call(inputs, **kwargs)
            output_mask = self.compute_mask(inputs, previous_mask)

            # If the layer returns tensors from its inputs, unmodified,
            # we copy them to avoid loss of tensor metadata.
            output_ls = to_list(output)
            inputs_ls = to_list(inputs)
            output_ls_copy = []
            for x in output_ls:
                if x in inputs_ls:
                    x = K.identity(x)
                output_ls_copy.append(x)
            output = unpack_singleton(output_ls_copy)

            # Inferring the output shape is only relevant for Theano.
            if all([s is not None
                    for s in to_list(input_shape)]):
                output_shape = self.compute_output_shape(input_shape)
            else:
                if isinstance(input_shape, list):
                    output_shape = [None for _ in input_shape]
                else:
                    output_shape = None

            if (not isinstance(output_mask, (list, tuple)) and
                    len(output_ls) > 1):
                # Augment the mask to match the length of the output.
                output_mask = [output_mask] * len(output_ls)

            # Add an inbound node to the layer, so that it keeps track
            # of the call and of all new variables created during the call.
            # This also updates the layer history of the output tensor(s).
            # If the input tensor(s) had not previous Keras history,
            # this does nothing.
            self._add_inbound_node(input_tensors=inputs,
                                   output_tensors=output,
                                   input_masks=previous_mask,
                                   output_masks=output_mask,
                                   input_shapes=input_shape,
                                   output_shapes=output_shape,
                                   arguments=user_kwargs)

            # Apply activity regularizer if any:
            if (hasattr(self, 'activity_regularizer') and
                    self.activity_regularizer is not None):
                with K.name_scope('activity_regularizer'):
                    regularization_losses = [
                        self.activity_regularizer(x)
                        for x in to_list(output)]
                self.add_loss(regularization_losses,
                              inputs=to_list(inputs))
        return output

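If the layer has not been built, Embedding.build() is called automatically. Embedding.build() pays no attention to weights. If no embeddings_initializer was passed, self.embeddings_initializer falls back to the default random initializer; if one was passed, the embedding matrix already receives the desired values at this step. If both embeddings_initializer and weights are passed, the weights argument later overwrites Embedding.embeddings.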

The build function of the Embedding class in embeddings.py:

    def build(self, input_shape):
        self.embeddings = self.add_weight(
            shape=(self.input_dim, self.output_dim),
            initializer=self.embeddings_initializer,
            name='embeddings',
            regularizer=self.embeddings_regularizer,
            constraint=self.embeddings_constraint,
            dtype=self.dtype)
        self.built = True
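
The order of operations described above (build() applies the initializer first, then __call__ applies any stored weights) can be condensed into a toy class. This is a hypothetical miniature written with plain lists, not Keras code; ToyEmbedding, ones and the zero-filled default initializer are assumptions made for the illustration.

```python
class ToyEmbedding:
    """Hypothetical miniature of the Layer/Embedding interplay; not Keras code."""

    def __init__(self, input_dim, output_dim, initializer=None, weights=None):
        self.input_dim = input_dim
        self.output_dim = output_dim
        # Default initializer: zeros here, where Keras would draw random values.
        self.initializer = initializer or (
            lambda rows, cols: [[0.0] * cols for _ in range(rows)])
        self._initial_weights = weights  # stored only, not applied yet
        self.built = False
        self.embeddings = None

    def build(self):
        # build() knows nothing about `weights`; it only uses the initializer.
        self.embeddings = self.initializer(self.input_dim, self.output_dim)
        self.built = True

    def set_weights(self, weights):
        # Exactly one variable, so exactly one array is expected in the list.
        (self.embeddings,) = weights

    def __call__(self, indices):
        if not self.built:
            self.build()
            if self._initial_weights is not None:
                # Runs after build(), so it overwrites the initializer's values.
                self.set_weights(self._initial_weights)
        return [self.embeddings[i] for i in indices]


def ones(rows, cols):
    """Illustrative initializer that fills the matrix with 1.0."""
    return [[1.0] * cols for _ in range(rows)]


table = [[0, 1], [2, 3], [4, 5]]
from_initializer = ToyEmbedding(3, 2, initializer=ones)([1, 2])
from_weights = ToyEmbedding(3, 2, initializer=ones, weights=[table])([1, 2])
print(from_initializer)  # values produced by the initializer
print(from_weights)      # weights were applied last, so they win
```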

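In summary, assigning layer variables through weights is a fairly general mechanism in Keras, but it is not very intuitive. Keras encourages using explicit initializers and leaving weights alone wherever possible.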