Keras: A Deep Learning Library Based on Theano and TensorFlow

Date: 2019-11-21

Tags: keras, theano, tensorflow, deep learning


Contents

1. Introduction
2. Basic concepts
3. The Sequential model
4. The functional model
5. Common layers
6. Convolutional layers
7. Pooling layers
8. Recurrent layers
9. Embedding layers

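1. Introduction

Keras is a high-level neural network library written in pure Python that runs on top of TensorFlow or Theano. It offers:

Easy and fast prototyping (Keras is highly modular, minimalist, and extensible)
Support for CNNs and RNNs, or a combination of the two
Support for arbitrary connectivity schemes (including multi-input and multi-output training)
Seamless switching between CPU and GPU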

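0x1: Keras design principles

1. Modularity: a model can be understood as a standalone sequence or graph of fully configurable modules that can be combined freely at minimal cost. Specifically, network layers, loss functions, optimizers, initialization schemes, activation functions, and regularization schemes are all independent modules that you can use to build your own model.
2. Minimalism: each module should be as concise as possible, and every piece of code should be intuitive on first reading. There is no black magic, because it would get in the way of iteration and innovation.
3. Easy extensibility: adding a new module is very easy; you only need to write a new class or function modeled on the existing ones. This ease of creating new modules makes Keras better suited to advanced research.
4. Work with Python: Keras has no separate model configuration file format. Models are described in Python code, which makes them more compact, easier to debug, and easy to extend.

0x2: Quick start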

sudo apt-get install libblas-dev liblapack-dev libatlas-base-dev gfortran
pip install scipy

The core data structure of Keras is the "model", a way of organizing network layers. The main model in Keras is the Sequential model, a stack of network layers arranged in order.

from keras.models import Sequential

model = Sequential()

Stacking layers with .add() builds up a model:

from keras.layers import Dense, Activation

model.add(Dense(output_dim=64, input_dim=100))
model.add(Activation("relu"))
model.add(Dense(output_dim=10))
model.add(Activation("softmax"))

Once the model is assembled, we compile it with the .compile() method:

model.compile(loss='categorical_crossentropy', optimizer='sgd', metrics=['accuracy'])

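When compiling a model you must specify the loss function and the optimizer; if needed, you can also define your own loss function. A core principle of Keras is to stay simple and easy to use while leaving the user in full control: users can customize their own models and layers as needed, and even modify the source code.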

from keras.optimizers import SGD
model.compile(loss='categorical_crossentropy', optimizer=SGD(lr=0.01, momentum=0.9, nesterov=True))

After compiling the model, we fit the network by iterating over the training data in batches for a given number of epochs:

model.fit(X_train, Y_train, nb_epoch=5, batch_size=32)

Of course, we can also feed the data to the network by hand, one batch at a time, using:

model.train_on_batch(X_batch, Y_batch)

Afterwards, one line of code evaluates the model so we can check whether its metrics meet our requirements:

loss_and_metrics = model.evaluate(X_test, Y_test, batch_size=32)

Alternatively, we can use the model to make predictions on new data:

classes = model.predict_classes(X_test, batch_size=32)
proba = model.predict_proba(X_test, batch_size=32)

Relevant Link:

https://github.com/fchollet/keras
http://playground.tensorflow.org/#activation=tanh&regularization=L1&batchSize=10&dataset=circle&regDataset=reg-plane&learningRate=0.03&regularizationRate=0.001&noise=45&networkShape=4,5&seed=0.75320&showTestData=true&discretize=true&percTrainData=50&x=true&y=true&xTimesY=true&xSquared=true&ySquared=true&cosX=false&sinX=true&cosY=false&sinY=true&collectStats=false&problem=classification&initZero=false&hideText=false

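2. Basic concepts

0x1: Symbolic computation

Keras is built on Theano or TensorFlow; these two libraries are also called the Keras backends. Both Theano and TensorFlow are "symbolic" libraries.
As a result, programming with Keras differs somewhat from ordinary Python code. Broadly speaking, symbolic computation first defines variables and then builds a "computation graph" that specifies how those variables are computed from one another. The finished graph has to be compiled to fix its internal details, but at that point it is still an "empty shell" that contains no actual data. Only when you feed in the inputs to be computed does data flow through the whole model and produce output values.
Keras models are built in exactly this way. Once you finish assembling a Keras model, it is an empty shell; a real data flow appears only after a callable function has been generated (K.function) and data has been fed in.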
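
The define-then-run idea can be sketched in plain Python, without Theano or TensorFlow. This is only an illustration: the Placeholder, Add and Mul classes and the function helper are hypothetical names invented here, not part of Keras or its backends.

```python
# A toy "symbolic" graph: building it computes nothing,
# values only flow once concrete inputs are supplied.

class Placeholder:
    """A named input with no value attached yet."""
    def __init__(self, name):
        self.name = name

    def evaluate(self, feed):
        return feed[self.name]


class Add:
    def __init__(self, left, right):
        self.left, self.right = left, right

    def evaluate(self, feed):
        return self.left.evaluate(feed) + self.right.evaluate(feed)


class Mul:
    def __init__(self, left, right):
        self.left, self.right = left, right

    def evaluate(self, feed):
        return self.left.evaluate(feed) * self.right.evaluate(feed)


def function(inputs, output):
    """Turn a graph into a callable, loosely analogous to K.function."""
    names = [p.name for p in inputs]

    def run(*values):
        return output.evaluate(dict(zip(names, values)))

    return run


x = Placeholder("x")
w = Placeholder("w")
b = Placeholder("b")
y = Add(Mul(x, w), b)          # the graph is still an empty shell here

f = function([x, w, b], y)     # "compile" the graph into a callable
result = f(2.0, 3.0, 1.0)      # data flows through only now
print(result)
```

The graph object y holds no numbers at all; the same graph can be evaluated again with different inputs.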

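0x2: Tensors

The term is used to keep the terminology uniform. A tensor can be seen as the natural generalization of vectors and matrices, and we use tensors to represent a wide range of data types.
The smallest tensor is the order-0 tensor, i.e. a scalar, which is just a single number.
Arranging numbers in order gives an order-1 tensor, i.e. a vector.
Arranging a group of vectors in order gives an order-2 tensor, i.e. a matrix.
Stacking matrices gives an order-3 tensor, which we can call a cube; a color image with 3 color channels is such a cube.
The order of a tensor is sometimes also called its number of dimensions or axes. For example, the matrix [[1,2],[3,4]] is an order-2 tensor with two dimensions, or axes. Along axis 0 (to match Python's counting, dimensions and axes are numbered from 0 in this document) you see the two vectors [1,2] and [3,4]; along axis 1 you see the two vectors [1,3] and [2,4].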

import numpy as np

a = np.array([[1,2],[3,4]])
sum0 = np.sum(a, axis=0)
sum1 = np.sum(a, axis=1)

print(sum0)  # [4 6]
print(sum1)  # [3 7]
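
The notions of order and shape can also be checked without numpy. The sketch below uses nested Python lists as stand-ins for tensors; the shape helper is a hypothetical function written for this illustration and assumes regularly nested, non-empty lists.

```python
def shape(t):
    """Return the shape of a regularly nested list as a tuple."""
    dims = []
    while isinstance(t, list):
        dims.append(len(t))
        t = t[0]
    return tuple(dims)


scalar = 5                        # order 0
vector = [1, 2, 3]                # order 1
matrix = [[1, 2], [3, 4]]         # order 2
cube = [matrix, matrix, matrix]   # order 3

for t in (scalar, vector, matrix, cube):
    print(len(shape(t)), shape(t))   # order, then shape

# Summing along an axis collapses that axis, as np.sum(a, axis=...) does.
sum_axis0 = [sum(col) for col in zip(*matrix)]
sum_axis1 = [sum(row) for row in matrix]
```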

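0x3: The functional model

Earlier versions of Keras actually had two kinds of model:

1. The first, Sequential, is the sequential model: single input, single output, one path straight through. Layers are connected only to their neighbors, and there are no cross-layer connections at all. This kind of model compiles quickly and is simple to work with.
2. The second, Graph, is the graph model. It supports multiple inputs and outputs, and layers can be connected in any way you like, but it compiles slowly. Sequential is in fact a special case of Graph.

In the current version of Keras, the graph model has been removed and the "functional model API" added in its place, which emphasizes even more that Sequential is a special case. The general model is simply called Model, and for a simple sequential network the Sequential shortcut is still available.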

Relevant Link:

http://keras-cn.readthedocs.io/en/latest/getting_started/concepts/

3. The Sequential model

A Sequential model is a linear stack of network layers.
You can build one by passing a list of layers to the Sequential constructor:

from keras.models import Sequential
from keras.layers import Dense, Activation

model = Sequential([
    Dense(32, input_dim=784),
    Activation('relu'),
    Dense(10),
    Activation('softmax'),
])

You can also add layers to the model one at a time with the .add() method:

model = Sequential()
model.add(Dense(32, input_dim=784))
model.add(Activation('relu'))

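0x1: Specifying the input shape

The model needs to know the shape of its input data, so the first layer of a Sequential model must receive an argument describing the input shape. The following layers can infer the shape of the intermediate data automatically, so this argument is not needed for every layer. There are several ways to specify the input shape for the first layer:

1. Pass an input_shape keyword argument to the first layer. input_shape is a tuple; an entry may be None, which means any positive integer is allowed at that position. The batch size is not included.
2. Pass a batch_input_shape keyword argument to the first layer; this one does include the batch size. It is useful when a fixed batch size must be specified, for example in stateful RNNs. Internally, Keras in fact converts input_shape to batch_input_shape by prepending a None.
3. Some 2D layers, such as Dense, let you specify the input shape implicitly through the input dimension input_dim. Some 3D temporal layers accept the input_dim and input_length arguments for the same purpose.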
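
How the three styles reduce to one batch shape can be sketched as follows. The to_batch_input_shape helper is a hypothetical function that mirrors the description above; it is not Keras's internal code.

```python
def to_batch_input_shape(input_shape=None, input_dim=None, input_length=None):
    """Normalise the three ways of describing the input into one batch shape."""
    if input_shape is None:
        if input_dim is None:
            raise ValueError("give either input_shape or input_dim")
        # 2D layers: (input_dim,); 3D temporal layers: (input_length, input_dim)
        if input_length is None:
            input_shape = (input_dim,)
        else:
            input_shape = (input_length, input_dim)
    # The batch axis comes first and is left unspecified.
    return (None,) + tuple(input_shape)


dense_a = to_batch_input_shape(input_shape=(784,))
dense_b = to_batch_input_shape(input_dim=784)
lstm_a = to_batch_input_shape(input_shape=(10, 64))
lstm_b = to_batch_input_shape(input_length=10, input_dim=64)
print(dense_a, lstm_a)
```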

The following three ways of specifying the input shape are strictly equivalent:

model = Sequential()
model.add(Dense(32, input_shape=(784,)))

model = Sequential()
model.add(Dense(32, batch_input_shape=(None, 784)))
# note that batch dimension is "None" here,
# so the model will be able to process batches of any size.

model = Sequential()
model.add(Dense(32, input_dim=784))

The following three are also strictly equivalent:

from keras.layers import LSTM

model = Sequential()
model.add(LSTM(32, input_shape=(10, 64)))

model = Sequential()
model.add(LSTM(32, batch_input_shape=(None, 10, 64)))

model = Sequential()
model.add(LSTM(32, input_length=10, input_dim=64))

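0x2: The Merge layer

Several Sequential models can be merged into a single output through a Merge layer. The output of a Merge layer is a layer object that can be added to a new Sequential model. The example below merges two Sequential models (the final softmax activation produces the result matrix):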

from keras.layers import Merge

left_branch = Sequential()
left_branch.add(Dense(32, input_dim=784))

right_branch = Sequential()
right_branch.add(Dense(32, input_dim=784))

merged = Merge([left_branch, right_branch], mode='concat')

final_model = Sequential()
final_model.add(merged)
final_model.add(Dense(10, activation='softmax'))

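The Merge layer supports several predefined merge modes:

sum (default): element-wise sum
concat: tensor concatenation; the concat_axis keyword argument selects the axis to concatenate along
mul: element-wise multiplication
ave: tensor average
dot: tensor product; the dot_axes keyword argument selects the axes to contract
cos: cosine distance between the vectors of 2D tensors (i.e. matrices)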
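
What each mode does to a pair of vectors can be sketched in plain Python. The merge function below is a hypothetical stand-in written for this illustration, not the Keras layer; it works on flat lists only, and its cos branch returns the cosine similarity of the two vectors, which is an assumption about what the mode computes.

```python
import math


def merge(a, b, mode="sum"):
    """Combine two equal-length vectors the way each merge mode is described."""
    if mode == "sum":
        return [x + y for x, y in zip(a, b)]
    if mode == "mul":
        return [x * y for x, y in zip(a, b)]
    if mode == "ave":
        return [(x + y) / 2 for x, y in zip(a, b)]
    if mode == "concat":
        return a + b
    if mode == "dot":
        return sum(x * y for x, y in zip(a, b))
    if mode == "cos":
        dot = sum(x * y for x, y in zip(a, b))
        norm_a = math.sqrt(sum(x * x for x in a))
        norm_b = math.sqrt(sum(y * y for y in b))
        return dot / (norm_a * norm_b)
    raise ValueError("unknown mode: %s" % mode)


left = [1.0, 2.0]
right = [3.0, 4.0]
for mode in ("sum", "mul", "ave", "concat", "dot", "cos"):
    print(mode, merge(left, right, mode))
```

Note that concat changes the length of the result while the element-wise modes keep it, which matters for the layer that follows the merge.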

This two-branch model can be trained with the following code:

final_model.compile(optimizer='rmsprop', loss='categorical_crossentropy')
final_model.fit([input_data_1, input_data_2], targets)  # we pass one data array per model input

You can also pass a function as the mode argument of the Merge layer to apply an arbitrary transformation, for example:

merged = Merge([left_branch, right_branch], mode=lambda x: x[0] - x[1])

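For complex models that cannot be built by combining Sequential and Merge, see the functional model API.

0x3: Compilation

Before training a model, we need to configure the learning process with compile, which takes three arguments:

1. optimizer: the name of a predefined optimizer, such as rmsprop or adagrad, or an instance of the Optimizer class.
2. loss: the objective function the model tries to minimize. It can be the name of a predefined loss function, such as categorical_crossentropy or mse, or a loss function itself.
3. metrics: a list of metrics. For classification problems this is usually set to metrics=['accuracy']. A metric can be the name of a predefined metric or a user-defined function. A metric function should return a single tensor, or a dict mapping metric_name -> metric_value.

The metrics list is what produces the final evaluation results; metrics are reported during training and evaluation but are not optimized.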
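
As an illustration of what a metric such as 'accuracy' measures for one-hot labels, here is a minimal sketch in plain Python. The categorical_accuracy function is written for this example and operates on lists rather than backend tensors.

```python
def categorical_accuracy(y_true, y_pred):
    """Fraction of samples whose highest-scoring class matches the one-hot label."""
    def argmax(row):
        return max(range(len(row)), key=lambda i: row[i])

    hits = sum(1 for t, p in zip(y_true, y_pred) if argmax(t) == argmax(p))
    return hits / len(y_true)


y_true = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 1, 0]]
y_pred = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.3, 0.6, 0.1], [0.2, 0.5, 0.3]]
accuracy = categorical_accuracy(y_true, y_pred)
print(accuracy)   # three of the four predictions pick the labelled class
```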

# for a multi-class classification problem
model.compile(optimizer='rmsprop',
              loss='categorical_crossentropy',
              metrics=['accuracy'])

# for a binary classification problem
model.compile(optimizer='rmsprop',
              loss='binary_crossentropy',
              metrics=['accuracy'])

# for a mean squared error regression problem
model.compile(optimizer='rmsprop',
              loss='mse')

# for custom metrics
import keras.backend as K

def mean_pred(y_true, y_pred):
    return K.mean(y_pred)

def false_rates(y_true, y_pred):
    false_neg = ...
    false_pos = ...
    return {
        'false_neg': false_neg,
        'false_pos': false_pos,
    }

model.compile(optimizer='rmsprop',
              loss='binary_crossentropy',
              metrics=['accuracy', mean_pred, false_rates])

0x4: Training

Keras uses Numpy arrays as the data type for input data and labels. Models are usually trained with the fit function:

# for a single-input model with 2 classes (binary):
model = Sequential()
model.add(Dense(1, input_dim=784, activation='sigmoid'))
model.compile(optimizer='rmsprop',
              loss='binary_crossentropy',
              metrics=['accuracy'])

# generate dummy data
import numpy as np
data = np.random.random((1000, 784))
labels = np.random.randint(2, size=(1000, 1))

# train the model, iterating on the data in batches
# of 32 samples
model.fit(data, labels, nb_epoch=10, batch_size=32)

Another example:

# for a multi-input model with 10 classes:

left_branch = Sequential()
left_branch.add(Dense(32, input_dim=784))

right_branch = Sequential()
right_branch.add(Dense(32, input_dim=784))

merged = Merge([left_branch, right_branch], mode='concat')

model = Sequential()
model.add(merged)
model.add(Dense(10, activation='softmax'))

model.compile(optimizer='rmsprop',
              loss='categorical_crossentropy',
              metrics=['accuracy'])

# generate dummy data
import numpy as np
from keras.utils.np_utils import to_categorical
data_1 = np.random.random((1000, 784))
data_2 = np.random.random((1000, 784))

# these are integers between 0 and 9
labels = np.random.randint(10, size=(1000, 1))
# we convert the labels to a binary matrix of size (1000, 10)
# for use with categorical_crossentropy
labels = to_categorical(labels, 10)

# train the model
# note that we are passing a list of Numpy arrays as training data
# since the model has 2 inputs
model.fit([data_1, data_2], labels, nb_epoch=10, batch_size=32)
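
The to_categorical step in the example above turns integer class labels into a binary matrix with one column per class. A minimal sketch of that conversion in plain Python follows; the one_hot function is a hypothetical re-implementation for illustration, not the Keras utility.

```python
def one_hot(labels, nb_classes):
    """Turn integer class labels into rows that contain a single 1."""
    rows = []
    for label in labels:
        row = [0.0] * nb_classes
        row[label] = 1.0
        rows.append(row)
    return rows


encoded = one_hot([0, 2, 1], 3)
for row in encoded:
    print(row)
```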

0x5: Some examples

1. Multilayer perceptron (MLP) for multi-class softmax classification

from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation
from keras.optimizers import SGD

model = Sequential()
# Dense(64) is a fully-connected layer with 64 hidden units.
# in the first layer, you must specify the expected input data shape:
# here, 20-dimensional vectors.
model.add(Dense(64, input_dim=20, init='uniform'))
model.add(Activation('tanh'))
model.add(Dropout(0.5))
model.add(Dense(64, init='uniform'))
model.add(Activation('tanh'))
model.add(Dropout(0.5))
model.add(Dense(10, init='uniform'))
model.add(Activation('softmax'))

sgd = SGD(lr=0.1, decay=1e-6, momentum=0.9, nesterov=True)
model.compile(loss='categorical_crossentropy',
              optimizer=sgd,
              metrics=['accuracy'])

model.fit(X_train, y_train,
          nb_epoch=20,
          batch_size=16)
score = model.evaluate(X_test, y_test, batch_size=16)

2. Another implementation of a similar MLP

model = Sequential()
model.add(Dense(64, input_dim=20, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(64, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(10, activation='softmax'))

model.compile(loss='categorical_crossentropy',
              optimizer='adadelta',
              metrics=['accuracy'])

3. MLP for binary classification

model = Sequential()
model.add(Dense(64, input_dim=20, init='uniform', activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(64, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(1, activation='sigmoid'))

model.compile(loss='binary_crossentropy',
              optimizer='rmsprop',
              metrics=['accuracy'])

4. VGG-like convolutional neural network

from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation, Flatten
from keras.layers import Convolution2D, MaxPooling2D
from keras.optimizers import SGD

model = Sequential()
# input: 100x100 images with 3 channels -> (3, 100, 100) tensors.
# this applies 32 convolution filters of size 3x3 each.
model.add(Convolution2D(32, 3, 3, border_mode='valid', input_shape=(3, 100, 100)))
model.add(Activation('relu'))
model.add(Convolution2D(32, 3, 3))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))

model.add(Convolution2D(64, 3, 3, border_mode='valid'))
model.add(Activation('relu'))
model.add(Convolution2D(64, 3, 3))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))

model.add(Flatten())
# Note: Keras does automatic shape inference.
model.add(Dense(256))
model.add(Activation('relu'))
model.add(Dropout(0.5))

model.add(Dense(10))
model.add(Activation('softmax'))

sgd = SGD(lr=0.1, decay=1e-6, momentum=0.9, nesterov=True)
model.compile(loss='categorical_crossentropy', optimizer=sgd)

model.fit(X_train, Y_train, batch_size=32, nb_epoch=1)

5. Sequence classification with an LSTM

from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation
from keras.layers import Embedding
from keras.layers import LSTM

model = Sequential()
model.add(Embedding(max_features, 256, input_length=maxlen))
model.add(LSTM(output_dim=128, activation='sigmoid', inner_activation='hard_sigmoid'))
model.add(Dropout(0.5))
model.add(Dense(1))
model.add(Activation('sigmoid'))

model.compile(loss='binary_crossentropy',
              optimizer='rmsprop',
              metrics=['accuracy'])

model.fit(X_train, Y_train, batch_size=16, nb_epoch=10)
score = model.evaluate(X_test, Y_test, batch_size=16)

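6. Stacked LSTM for sequence classification

In this model we stack three LSTM layers on top of each other, which lets the model learn higher-level temporal feature representations.
The first two LSTM layers return their full output sequences, while the third returns only the last step of its output sequence, which drops the temporal dimension (i.e. converts the input sequence into a single vector).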

from keras.models import Sequential
from keras.layers import LSTM, Dense
import numpy as np

data_dim = 16
timesteps = 8
nb_classes = 10

# expected input data shape: (batch_size, timesteps, data_dim)
model = Sequential()
model.add(LSTM(32, return_sequences=True,
               input_shape=(timesteps, data_dim)))  # returns a sequence of vectors of dimension 32
model.add(LSTM(32, return_sequences=True))  # returns a sequence of vectors of dimension 32
model.add(LSTM(32))  # return a single vector of dimension 32
model.add(Dense(10, activation='softmax'))

model.compile(loss='categorical_crossentropy',
              optimizer='rmsprop',
              metrics=['accuracy'])

# generate dummy training data
x_train = np.random.random((1000, timesteps, data_dim))
y_train = np.random.random((1000, nb_classes))

# generate dummy validation data
x_val = np.random.random((100, timesteps, data_dim))
y_val = np.random.random((100, nb_classes))

model.fit(x_train, y_train,
          batch_size=64, nb_epoch=5,
          validation_data=(x_val, y_val))
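
The effect of return_sequences on the shapes in the stacked model above can be traced with a small helper. The lstm_output_shape function is hypothetical and only does the shape bookkeeping described in the text; it performs no LSTM computation.

```python
def lstm_output_shape(input_shape, units, return_sequences=False):
    """Shape produced by a recurrent layer for (batch, timesteps, features) input."""
    batch, timesteps, _ = input_shape
    if return_sequences:
        return (batch, timesteps, units)   # one vector per time step
    return (batch, units)                  # only the last step


inputs = (None, 8, 16)                     # (batch, timesteps, data_dim)
first = lstm_output_shape(inputs, 32, return_sequences=True)
second = lstm_output_shape(first, 32, return_sequences=True)
third = lstm_output_shape(second, 32)
print(first, second, third)
```

The third layer can only be stacked on the second because the second still returns a 3D sequence; the final 2D output is what the Dense layer consumes.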

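7. The same model with stateful LSTMs

A stateful LSTM is one whose internal state (memory) after processing one batch of training data is reused as the initial state for the next batch. Stateful LSTMs make it possible to process long sequences at a reasonable computational cost.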

from keras.models import Sequential
from keras.layers import LSTM, Dense
import numpy as np

data_dim = 16
timesteps = 8
nb_classes = 10
batch_size = 32

# expected input batch shape: (batch_size, timesteps, data_dim)
# note that we have to provide the full batch_input_shape since the network is stateful.
# the sample of index i in batch k is the follow-up for the sample i in batch k-1.
model = Sequential()
model.add(LSTM(32, return_sequences=True, stateful=True,
               batch_input_shape=(batch_size, timesteps, data_dim)))
model.add(LSTM(32, return_sequences=True, stateful=True))
model.add(LSTM(32, stateful=True))
model.add(Dense(10, activation='softmax'))

model.compile(loss='categorical_crossentropy',
              optimizer='rmsprop',
              metrics=['accuracy'])

# generate dummy training data
x_train = np.random.random((batch_size * 10, timesteps, data_dim))
y_train = np.random.random((batch_size * 10, nb_classes))

# generate dummy validation data
x_val = np.random.random((batch_size * 3, timesteps, data_dim))
y_val = np.random.random((batch_size * 3, nb_classes))

model.fit(x_train, y_train,
          batch_size=batch_size, nb_epoch=5,
          validation_data=(x_val, y_val))
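
The difference between carrying state across batches and resetting it can be seen with a toy recurrence. The cell below (new state = 0.5 * old state + input) is an assumption chosen for easy arithmetic; it is not an LSTM, and run_chunk is a hypothetical helper.

```python
def run_chunk(chunk, state):
    """A toy recurrent cell: the new state mixes the old state with the input."""
    for x in chunk:
        state = 0.5 * state + x
    return state


sequence = [1.0, 2.0, 3.0, 4.0]
chunks = [sequence[:2], sequence[2:]]     # one long sequence cut into two batches

# Stateless: every chunk starts again from a zero state.
stateless = [run_chunk(chunk, 0.0) for chunk in chunks]

# Stateful: the final state of one chunk seeds the next one.
state = 0.0
stateful = []
for chunk in chunks:
    state = run_chunk(chunk, state)
    stateful.append(state)

full = run_chunk(sequence, 0.0)           # the whole sequence in one pass
print(stateless, stateful, full)
```

With state carried over, the result after the last chunk equals that of processing the uncut sequence, which is why a long sequence can be fed in pieces.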

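8. Two merged LSTM encoders for classifying two parallel sequences

The two input sequences are encoded into feature vectors by two separate LSTMs.
The two feature vectors are concatenated and passed through a fully connected network to produce the result.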

from keras.models import Sequential
from keras.layers import Merge, LSTM, Dense
import numpy as np

data_dim = 16
timesteps = 8
nb_classes = 10

encoder_a = Sequential()
encoder_a.add(LSTM(32, input_shape=(timesteps, data_dim)))

encoder_b = Sequential()
encoder_b.add(LSTM(32, input_shape=(timesteps, data_dim)))

decoder = Sequential()
decoder.add(Merge([encoder_a, encoder_b], mode='concat'))
decoder.add(Dense(32, activation='relu'))
decoder.add(Dense(nb_classes, activation='softmax'))

decoder.compile(loss='categorical_crossentropy',
                optimizer='rmsprop',
                metrics=['accuracy'])

# generate dummy training data
x_train_a = np.random.random((1000, timesteps, data_dim))
x_train_b = np.random.random((1000, timesteps, data_dim))
y_train = np.random.random((1000, nb_classes))

# generate dummy validation data
x_val_a = np.random.random((100, timesteps, data_dim))
x_val_b = np.random.random((100, timesteps, data_dim))
y_val = np.random.random((100, nb_classes))

decoder.fit([x_train_a, x_train_b], y_train,
            batch_size=64, nb_epoch=5,
            validation_data=([x_val_a, x_val_b], y_val))

Relevant Link:

http://www.jianshu.com/p/9dc9f41f0b29
http://keras-cn.readthedocs.io/en/latest/getting_started/sequential_model/

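4. The functional model

The Keras functional model API is the way to define complex models such as multi-output models, directed acyclic graphs, or models with shared layers.

1. A layer object takes a tensor as its argument and returns a tensor. Mathematically, a tensor simply generalizes familiar data structures: an order-1 tensor is a vector, an order-2 tensor is a matrix, and an order-3 tensor is a cube. Here "tensor" just denotes a data structure in this broad sense. A color image, for example, is an order-3 tensor formed by stacking the pixel values of its three channels, and a dataset of 10000 color images is an order-4 tensor.
2. A framework whose input is a tensor and whose output is a tensor is a model.
3. Such a model can be trained just like a Keras Sequential model.

For example, this fully connected network: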

from keras.layers import Input, Dense
from keras.models import Model

# this returns a tensor
inputs = Input(shape=(784,))

# a layer instance is callable on a tensor, and returns a tensor
x = Dense(64, activation='relu')(inputs)
x = Dense(64, activation='relu')(x)
predictions = Dense(10, activation='softmax')(x)

# this creates a model that includes
# the Input layer and three Dense layers
model = Model(input=inputs, output=predictions)
model.compile(optimizer='rmsprop',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
model.fit(data, labels)  # starts training

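0x1: All models are callable, just like layers

With the functional model API it is easy to reuse trained models: you can treat any model as if it were a layer and call it on a tensor. Note that calling a model reuses not only its architecture but also its weights.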

x = Input(shape=(784,))
# this works, and returns the 10-way softmax we defined above.
y = model(x)

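This makes it quick to build models that process sequences of inputs. You can turn an image classification model into a video classification model with a single line of code: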

from keras.layers import TimeDistributed

# input tensor for sequences of 20 timesteps,
# each containing a 784-dimensional vector
input_sequences = Input(shape=(20, 784))

# this applies our previous model to every timestep in the input sequences.
# the output of the previous model was a 10-way softmax,
# so the output of the layer below will be a sequence of 20 vectors of size 10.
processed_sequences = TimeDistributed(model)(input_sequences)

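0x2: Multi-input and multi-output models

A typical use case for the functional model is building models with multiple inputs and outputs.
Consider the following model. We want to predict how many retweets and likes a news headline will receive on Twitter. The main input to the model is the headline itself, as a sequence of words. We can also have an auxiliary input, such as the date the headline was posted. The loss of this model has two parts: the auxiliary loss evaluates predictions based on the headline alone, and the main loss evaluates predictions based on the headline plus the extra information. Even if the gradients from the main loss vanish, the signal from the auxiliary loss can still train the Embedding and LSTM layers. Using the main loss function early in a model is a good regularization mechanism for deep networks. In short, the model has two inputs and two outputs.

Let us implement it with the functional model.
The main input receives the headline itself as a sequence of integers (each integer encodes a word). The integers lie between 1 and 10,000 (i.e. our vocabulary has 10,000 words), and each sequence is 100 words long.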

from keras.layers import Input, Embedding, LSTM, Dense, merge
from keras.models import Model

# headline input: meant to receive sequences of 100 integers, between 1 and 10000.
# note that we can name any layer by passing it a "name" argument.
main_input = Input(shape=(100,), dtype='int32', name='main_input')

# this embedding layer will encode the input sequence
# into a sequence of dense 512-dimensional vectors.
x = Embedding(output_dim=512, input_dim=10000, input_length=100)(main_input)

# a LSTM will transform the vector sequence into a single vector,
# containing information about the entire sequence
lstm_out = LSTM(32)(x)

Next, we insert an auxiliary loss so that the LSTM and Embedding layers can be trained smoothly even when the main loss is high:

auxiliary_output = Dense(1, activation='sigmoid', name='aux_output')(lstm_out)

Then we concatenate the LSTM output with the auxiliary input data and feed the result into the rest of the model:

auxiliary_input = Input(shape=(5,), name='aux_input')
x = merge([lstm_out, auxiliary_input], mode='concat')

# we stack a deep fully-connected network on top
x = Dense(64, activation='relu')(x)
x = Dense(64, activation='relu')(x)
x = Dense(64, activation='relu')(x)

# and finally we add the main logistic regression layer
main_output = Dense(1, activation='sigmoid', name='main_output')(x)

Finally, we define the whole model with 2 inputs and 2 outputs:

model = Model(input=[main_input, auxiliary_input], output=[main_output, auxiliary_output])

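With the model defined, the next step is to compile it. We give the auxiliary loss a weight of 0.2. The loss and loss_weights keyword arguments set different loss functions or weights for different outputs; both accept a Python list or dict. Here we pass a single loss function as loss, so it is applied to all outputs.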

model.compile(optimizer='rmsprop', loss='binary_crossentropy',
              loss_weights=[1., 0.2])
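
The way loss_weights=[1., 0.2] combines the two outputs can be sketched as a weighted sum of per-output losses. The predictions below are made-up numbers used only for illustration, and binary_crossentropy is re-implemented here on plain lists rather than taken from Keras.

```python
import math


def binary_crossentropy(y_true, y_pred):
    """Mean binary cross-entropy over a list of predictions."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
    return total / len(y_true)


labels = [1, 0, 1]
main_pred = [0.9, 0.2, 0.8]     # hypothetical output of main_output
aux_pred = [0.6, 0.4, 0.7]      # hypothetical output of aux_output

main_loss = binary_crossentropy(labels, main_pred)
aux_loss = binary_crossentropy(labels, aux_pred)
total_loss = 1.0 * main_loss + 0.2 * aux_loss   # loss_weights=[1., 0.2]
print(main_loss, aux_loss, total_loss)
```

Because the auxiliary term is scaled by 0.2, it still contributes gradient to the shared layers but counts for less than the main objective.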

After compiling, we train the model by passing the training data and target values:

model.fit([headline_data, additional_data], [labels, labels],
          nb_epoch=50, batch_size=32)

Because our inputs and outputs are named (we passed them a "name" argument when defining them), we can also compile and train the model this way:

model.compile(optimizer='rmsprop',
              loss={'main_output': 'binary_crossentropy', 'aux_output': 'binary_crossentropy'},
              loss_weights={'main_output': 1., 'aux_output': 0.2})

# and trained it via:
model.fit({'main_input': headline_data, 'aux_input': additional_data},
          {'main_output': labels, 'aux_output': labels},
          nb_epoch=50, batch_size=32)

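0x3: Shared layers

Another good use of the functional model is with shared layers.
Consider a dataset of tweets (microblog posts). We want to build a model that decides whether two tweets come from the same user; the same setup can also be used to judge how similar two tweets by one user are.
One way to do this is to build a model that maps each of the two tweets to a feature vector, concatenates the vectors, and adds a logistic regression layer that outputs the probability that both tweets come from the same user. The training data for such a model consists of pairs of tweets.
Because the problem is symmetric, the model that processes the first tweet can of course be reused for the second. So here we use a shared LSTM layer to do the mapping.
First, we turn each tweet into a matrix of shape (140, 256): each tweet has 140 characters, and each character is represented by a 256-dimensional vector in which an element is 1 if the corresponding character is present and 0 otherwise. This is a one-hot encoding.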

from keras.layers import Input, LSTM, Dense, merge
from keras.models import Model

tweet_a = Input(shape=(140, 256))
tweet_b = Input(shape=(140, 256))

To share a layer across different inputs, instantiate the layer once and then call it as many times as needed:

# this layer can take as input a matrix
# and will return a vector of size 64
shared_lstm = LSTM(64)

# when we reuse the same layer instance
# multiple times, the weights of the layer
# are also being reused
# (it is effectively *the same* layer)
encoded_a = shared_lstm(tweet_a)
encoded_b = shared_lstm(tweet_b)

# we can then concatenate the two vectors:
merged_vector = merge([encoded_a, encoded_b], mode='concat', concat_axis=-1)

# and add a logistic regression on top
predictions = Dense(1, activation='sigmoid')(merged_vector)

# we define a trainable model linking the
# tweet inputs to the predictions
model = Model(input=[tweet_a, tweet_b], output=predictions)

model.compile(optimizer='rmsprop',
              loss='binary_crossentropy',
              metrics=['accuracy'])
model.fit([data_a, data_b], labels, nb_epoch=10)

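0x4: The concept of layer "nodes"

Whenever you call a layer on some input, you create a new tensor (the layer's output) and also add a "(computation) node" to the layer. The node maps the input tensor to the output tensor. When you call the same layer several times, it owns several nodes, indexed 0, 1, 2...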
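
The bookkeeping behind nodes can be pictured with a toy class. ToyLayer is a hypothetical stand-in, not a Keras class; its get_output_at method is only modelled on the idea of looking up an output by node index.

```python
class ToyLayer:
    """Records one node per call, mimicking how a shared layer accumulates nodes."""

    def __init__(self, scale):
        self.scale = scale
        self.nodes = []                    # (input, output) pairs, indexed 0, 1, 2...

    def __call__(self, inputs):
        outputs = [self.scale * v for v in inputs]
        self.nodes.append((inputs, outputs))
        return outputs

    def get_output_at(self, index):
        return self.nodes[index][1]


shared = ToyLayer(2)
out_a = shared([1, 2])      # creates node 0
out_b = shared([10, 20])    # creates node 1, same weights (scale)
print(len(shared.nodes), shared.get_output_at(0), shared.get_output_at(1))
```

Since a shared layer has several outputs, asking for "the" output is ambiguous, and an explicit node index is needed to say which call is meant.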

0x5: More examples

1. Inception module

from keras.layers import merge, Convolution2D, MaxPooling2D, Input

input_img = Input(shape=(3, 256, 256))

tower_1 = Convolution2D(64, 1, 1, border_mode='same', activation='relu')(input_img)
tower_1 = Convolution2D(64, 3, 3, border_mode='same', activation='relu')(tower_1)

tower_2 = Convolution2D(64, 1, 1, border_mode='same', activation='relu')(input_img)
tower_2 = Convolution2D(64, 5, 5, border_mode='same', activation='relu')(tower_2)

tower_3 = MaxPooling2D((3, 3), strides=(1, 1), border_mode='same')(input_img)
tower_3 = Convolution2D(64, 1, 1, border_mode='same', activation='relu')(tower_3)

output = merge([tower_1, tower_2, tower_3], mode='concat', concat_axis=1)

2. Residual connection on a convolution layer (Residual Network)

from keras.layers import merge, Convolution2D, Input

# input tensor for a 3-channel 256x256 image
x = Input(shape=(3, 256, 256))
# 3x3 conv with 3 output channels(same as input channels)
y = Convolution2D(3, 3, 3, border_mode='same')(x)
# this returns x + y.
z = merge([x, y], mode='sum')

3. Shared vision model

This model reuses the same image-processing module on two inputs to decide whether two MNIST digits are the same digit.

from keras.layers import merge, Convolution2D, MaxPooling2D, Input, Dense, Flatten
from keras.models import Model

# first, define the vision modules
digit_input = Input(shape=(1, 27, 27))
x = Convolution2D(64, 3, 3)(digit_input)
x = Convolution2D(64, 3, 3)(x)
x = MaxPooling2D((2, 2))(x)
out = Flatten()(x)

vision_model = Model(digit_input, out)

# then define the tell-digits-apart model
digit_a = Input(shape=(1, 27, 27))
digit_b = Input(shape=(1, 27, 27))

# the vision model will be shared, weights and all
out_a = vision_model(digit_a)
out_b = vision_model(digit_b)

concatenated = merge([out_a, out_b], mode='concat')
out = Dense(1, activation='sigmoid')(concatenated)

classification_model = Model([digit_a, digit_b], out)

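4. Visual question answering model (question-based image CAPTCHA)

Given a natural-language question about a picture, this model can produce a one-word answer about that picture.
The model maps the question and the picture to feature vectors, concatenates the two, and trains a logistic regression layer on top that picks one answer from a set of possible answers.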

from keras.layers import Convolution2D, MaxPooling2D, Flatten
from keras.layers import Input, LSTM, Embedding, Dense, merge
from keras.models import Model, Sequential

# first, let's define a vision model using a Sequential model.
# this model will encode an image into a vector.
vision_model = Sequential()
vision_model.add(Convolution2D(64, 3, 3, activation='relu', border_mode='same', input_shape=(3, 224, 224)))
vision_model.add(Convolution2D(64, 3, 3, activation='relu'))
vision_model.add(MaxPooling2D((2, 2)))
vision_model.add(Convolution2D(128, 3, 3, activation='relu', border_mode='same'))
vision_model.add(Convolution2D(128, 3, 3, activation='relu'))
vision_model.add(MaxPooling2D((2, 2)))
vision_model.add(Convolution2D(256, 3, 3, activation='relu', border_mode='same'))
vision_model.add(Convolution2D(256, 3, 3, activation='relu'))
vision_model.add(Convolution2D(256, 3, 3, activation='relu'))
vision_model.add(MaxPooling2D((2, 2)))
vision_model.add(Flatten())

# now let's get a tensor with the output of our vision model:
image_input = Input(shape=(3, 224, 224))
encoded_image = vision_model(image_input)

# next, let's define a language model to encode the question into a vector.
# each question will be at most 100 word long,
# and we will index words as integers from 1 to 9999.
question_input = Input(shape=(100,), dtype='int32')
embedded_question = Embedding(input_dim=10000, output_dim=256, input_length=100)(question_input)
encoded_question = LSTM(256)(embedded_question)

# let's concatenate the question vector and the image vector:
merged = merge([encoded_question, encoded_image], mode='concat')

# and let's train a logistic regression over 1000 words on top:
output = Dense(1000, activation='softmax')(merged)

# this is our final model:
vqa_model = Model(input=[image_input, question_input], output=output)

# the next stage would be training this model on actual data.

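5. Video question answering model

Having built the image question answering model, we can quickly turn it into a video question answering model. With appropriate training, you can show the model a short video (e.g. 100 frames) and ask it a question about the video, such as "what sport is the boy playing?" -> "football".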

from keras.layers import TimeDistributed

video_input = Input(shape=(100, 3, 224, 224))
# this is our video encoded via the previously trained vision_model (weights are reused)
encoded_frame_sequence = TimeDistributed(vision_model)(video_input)  # the output will be a sequence of vectors
encoded_video = LSTM(256)(encoded_frame_sequence)  # the output will be a vector

# this is a model-level representation of the question encoder, reusing the same weights as before:
question_encoder = Model(input=question_input, output=encoded_question)

# let's use it to encode the question:
video_question_input = Input(shape=(100,), dtype='int32')
encoded_video_question = question_encoder(video_question_input)

# and this is our video question answering model:
merged = merge([encoded_video, encoded_video_question], mode='concat')
output = Dense(1000, activation='softmax')(merged)
video_qa_model = Model(input=[video_input, video_question_input], output=output)

Relevant Link:

http://wiki.jikexueyuan.com/project/tensorflow-zh/resources/dims_types.html

5. Common layers

0x1: Dense layer

Dense is the ordinary fully connected layer.

keras.layers.core.Dense(
    output_dim, 
    init='glorot_uniform', 
    activation='linear', 
    weights=None, 
    W_regularizer=None, 
    b_regularizer=None, 
    activity_regularizer=None, 
    W_constraint=None, 
    b_constraint=None, 
    bias=True, 
    input_dim=None
)

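1. output_dim: integer greater than 0, the output dimension of the layer. For a fully connected layer that is not the first layer of the model, the input dimension is inferred automatically, so it does not need to be specified.
2. init: initialization method, given as the name of a predefined initializer or as a Theano function used to initialize the weights. It only matters when the weights argument is not passed.
3. activation: activation function, given as the name of a predefined activation function (see Activations) or as an element-wise Theano function. If it is not specified, no activation is applied (i.e. the linear activation a(x)=x is used).
4. weights: initial weights, a list of numpy arrays. The list should contain a weight matrix of shape (input_dim, output_dim) and a bias vector of shape (output_dim,).
5. W_regularizer: regularizer applied to the weights, a WeightRegularizer object.
6. b_regularizer: regularizer applied to the bias vector, a WeightRegularizer object.
7. activity_regularizer: regularizer applied to the output, an ActivityRegularizer object.
8. W_constraint: constraint applied to the weights, a Constraints object.
9. b_constraint: constraint applied to the bias, a Constraints object.
10. bias: boolean, whether to include a bias vector (i.e. whether the layer applies a linear or an affine transformation to its input).
11. input_dim: integer, the dimension of the input data. When Dense is the first layer of the network, either this argument or input_shape must be specified.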

after the first layer, you don't need to specify the size of the input anymore

0x2: Activation layer

The Activation layer applies an activation function to the output of a layer.

keras.layers.core.Activation(activation) 

activation: the activation function to use, given as the name of a predefined activation function or as a TensorFlow/Theano function

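0x3: Dropout layer

Applies Dropout to the input. During training, at every parameter update Dropout randomly disconnects a given fraction (p) of the input units; the Dropout layer is used to prevent overfitting.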

keras.layers.core.Dropout(p) 

p: float between 0 and 1, the fraction of connections to drop
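
The training-time behaviour can be sketched in plain Python. The dropout function below is a hypothetical illustration; rescaling the surviving values by 1 / (1 - p) ("inverted dropout") is an assumption about the implementation and is not stated in this text.

```python
import random


def dropout(values, p, rng):
    """Zero each input with probability p and rescale the survivors by 1 / (1 - p)."""
    keep = 1.0 - p
    return [v / keep if rng.random() >= p else 0.0 for v in values]


rng = random.Random(0)                 # fixed seed so the run is repeatable
activations = [1.0] * 10000
dropped = dropout(activations, 0.5, rng)

zero_fraction = dropped.count(0.0) / len(dropped)
mean_after = sum(dropped) / len(dropped)
print(zero_fraction, mean_after)
```

About half of the units are zeroed, while the rescaling keeps the average activation close to its original value.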

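0x4: Flatten layer

The Flatten layer "flattens" the input, i.e. turns a multi-dimensional input into a one-dimensional one. It is commonly used in the transition from convolutional layers to fully connected layers. Flatten does not affect the batch size.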

keras.layers.core.Flatten() 

model = Sequential()
model.add(Convolution2D(64, 3, 3, border_mode='same', input_shape=(3, 32, 32)))
# now: model.output_shape == (None, 64, 32, 32)

model.add(Flatten())
# now: model.output_shape == (None, 65536)
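
The 65536 in the comment above is simply the product of the non-batch dimensions, 64 * 32 * 32. A small sketch of that shape arithmetic follows; flatten_shape is a hypothetical helper written for this illustration.

```python
from functools import reduce
from operator import mul


def flatten_shape(shape):
    """Keep the batch axis and multiply all remaining axes into one."""
    batch, rest = shape[0], shape[1:]
    return (batch, reduce(mul, rest, 1))


flat = flatten_shape((None, 64, 32, 32))
print(flat)
```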

0x5: Reshape layer

The Reshape layer converts the input to a given shape.

keras.layers.core.Reshape(target_shape) 

target_shape: target shape, a tuple of integers that does not include the samples dimension (batch size)

# as first layer in a Sequential model
model = Sequential()
model.add(Reshape((3, 4), input_shape=(12,)))
# now: model.output_shape == (None, 3, 4)
# note: `None` is the batch dimension

# as intermediate layer in a Sequential model
model.add(Reshape((6, 2)))
# now: model.output_shape == (None, 6, 2)

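0x6: Permute layer

The Permute layer permutes the dimensions of the input according to a given pattern. It can be useful, for example, when connecting an RNN and a CNN.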

keras.layers.core.Permute(dims) 

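dims: tuple of integers giving the permutation pattern, not including the samples dimension. Indexing starts at 1. For example, (2, 1) moves the second dimension of the input to the first dimension of the output and the first dimension of the input to the second.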
 
model = Sequential()
model.add(Permute((2, 1), input_shape=(10, 64)))
# now: model.output_shape == (None, 64, 10)
# note: `None` is the batch dimension
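
The 1-based dims pattern can be checked with a short sketch. Both helpers below are hypothetical: permute_shape does the shape bookkeeping, and transpose_sample shows what dims=(2, 1) does to the values of one 2D sample.

```python
def permute_shape(shape, dims):
    """Reorder the non-batch axes; dims is 1-based, so index 0 (batch) stays put."""
    return (shape[0],) + tuple(shape[d] for d in dims)


def transpose_sample(sample):
    """Swap the two axes of one 2D sample, the data-level effect of dims=(2, 1)."""
    return [list(col) for col in zip(*sample)]


out_shape = permute_shape((None, 10, 64), (2, 1))
swapped = transpose_sample([[1, 2, 3], [4, 5, 6]])
print(out_shape, swapped)
```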

0x7: RepeatVector layer

The RepeatVector layer repeats the input n times.

keras.layers.core.RepeatVector(n) 

n: integer, the number of repetitions

model = Sequential()
model.add(Dense(32, input_dim=32))
# now: model.output_shape == (None, 32)
# note: `None` is the batch dimension

model.add(RepeatVector(3))
# now: model.output_shape == (None, 3, 32)

0x8: Merge layer

The Merge layer merges a list of tensors into a single tensor according to a given mode.

keras.engine.topology.Merge(
    layers=None, 
    mode='sum', 
    concat_axis=-1, 
    dot_axes=-1, 
    output_shape=None, 
    node_indices=None, 
    tensor_indices=None, 
    name=None
)

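1. layers: a list of Keras tensors or a list of Keras layer objects. The list must contain more than one element.
2. mode: the merge mode, given as the name of a predefined mode or as a lambda or ordinary function. A function must take a list of tensors as input and return a single tensor. A string must be one of the following: "sum", "mul", "concat", "ave", "cos", "dot".
3. concat_axis: integer, the axis to concatenate along when mode=concat.
4. dot_axes: integer or tuple of integers, the axes to contract when mode=dot.
5. output_shape: tuple of integers, or a lambda/ordinary function (when mode is a function). If output_shape is a function, it takes a list of shapes, one per input, and returns the shape of the output tensor.
6. node_indices: optional list of integers. If some layers have multiple output nodes, this selects the indices of the nodes to merge. If omitted, it defaults to all zeros, i.e. the output of node 0 of each input layer is merged.
7. tensor_indices: optional list of integers. If some layers return multiple output tensors, this selects which tensors to merge.

When merging, think carefully about which merge mode to use and which axis to merge along, because this can strongly affect how the network trains.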

0x9: Lambda layer

This layer applies an arbitrary Theano/TensorFlow expression to the output of the previous layer.

keras.layers.core.Lambda(
    function, 
    output_shape=None, 
    arguments={}
) 

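1. function: the function to evaluate; it takes a single argument, the output of the previous layer.
2. output_shape: the shape of the value the function returns, either a tuple or a function that computes the output shape from the input shape.
3. arguments: optional dict of additional keyword arguments to pass to the function.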

0x10: ActivityRegularization layer

Data passes through this layer unchanged, but the layer updates the loss function value based on the activations.

keras.layers.core.ActivityRegularization(l1=0.0, l2=0.0) 

l1: L1 regularization factor (positive float)
l2: L2 regularization factor (positive float)

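0x11: Masking layer

Masks an input sequence using a given value, to mark the time steps that should be skipped.
For each time step of the input tensor, i.e. along dimension 1 (counting from 0), if all values of the input tensor at that time step equal mask_value, the time step is skipped (masked) in all downstream layers, as long as they support masking.
If a downstream layer does not support masking but receives masked data, an exception is raised.

Consider input data x, a tensor of shape (samples, timesteps, features), to be fed to an LSTM layer. You are missing the data for time steps 3 and 5 and want to mask them. To do so:

Set x[:,3,:] = 0. and x[:,5,:] = 0.
Insert a Masking layer with mask_value=0. before the LSTM layer

model = Sequential()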
model.add(Masking(mask_value=0., input_shape=(timesteps, features)))
model.add(LSTM(32))
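
The rule "a time step is masked only if all of its features equal mask_value" can be sketched on nested lists. The compute_mask function is a hypothetical helper for this illustration; the data is made up, and the missing steps are 1 and 3 here only to keep the sample short.

```python
def compute_mask(batch, mask_value=0.0):
    """Per sample, mark a time step False when all its features equal mask_value."""
    return [[any(f != mask_value for f in step) for step in sample]
            for sample in batch]


# One sample, 4 time steps, 2 features; steps 1 and 3 are missing (all zeros).
x = [[[0.5, 0.1], [0.0, 0.0], [0.3, 0.0], [0.0, 0.0]]]
mask = compute_mask(x)
print(mask)
```

Step 2 contains one zero feature but is still kept, because masking requires every feature at that step to equal mask_value.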

0x12: Highway layer

The Highway layer builds a fully connected highway network, an adaptation of the LSTM idea to feedforward networks.

keras.layers.core.Highway(
    init='glorot_uniform', 
    transform_bias=-2, 
    activation='linear', 
    weights=None, 
    W_regularizer=None, 
    b_regularizer=None, 
    activity_regularizer=None, 
    W_constraint=None, 
    b_constraint=None, 
    bias=True, 
    input_dim=None
)

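output_dim: integer greater than 0, the output dimension of the layer (this argument is listed in the source documentation but does not appear in the signature above). For fully connected layers that are not the first layer of a model, the input dimension is inferred automatically and need not be specified.
init: initialization method, given as the name of a predefined initializer or as a Theano function for initializing the weights. Only relevant when the weights argument is not passed.
activation: activation function, given as the name of a predefined activation (see the activations documentation) or as an element-wise Theano function. If unspecified, no activation is applied (i.e. the linear activation a(x)=x is used)
weights: list of numpy arrays to use as initial weights. The list should contain a weight matrix of shape (input_dim, output_dim) and a bias vector of shape (output_dim,).
W_regularizer: regularizer applied to the weights, a WeightRegularizer object
b_regularizer: regularizer applied to the bias vector, a WeightRegularizer object
activity_regularizer: regularizer applied to the output, an ActivityRegularizer object
W_constraint: constraint applied to the weights, a Constraints object
b_constraint: constraint applied to the bias, a Constraints object
bias: boolean, whether to include a bias vector (i.e. whether the layer applies an affine rather than a purely linear transformation to its input)
input_dim: integer, dimensionality of the input. When this layer is the first layer of a model, either this argument or input_shape must be specified.
transform_bias: initial value of the transform gate bias, -2 by default (see the Highway Networks literature for the meaning of this parameter)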
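
To make the role of transform_bias concrete, here is a scalar sketch of the highway formulation from the Highway Networks literature, y = T * H(x) + (1 - T) * x, where T is a sigmoid gate. Treating the gate pre-activation as just the bias (all gate weights zero) is a simplifying assumption, and `highway_unit` is a hypothetical name.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def highway_unit(x, h, gate_pre_activation):
    """Blend the transformed value h with the untouched input x."""
    t = sigmoid(gate_pre_activation)  # transform gate, between 0 and 1
    return t * h + (1.0 - t) * x


# With the gate pre-activation at -2, the gate is about 0.12,
# so the unit initially passes most of its input straight through.
print(sigmoid(-2.0))
print(highway_unit(x=1.0, h=0.0, gate_pre_activation=-2.0))
```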

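0x13: MaxoutDense layer

A fully connected Maxout layer. A MaxoutDense layer outputs the maximum over the outputs of nb_features linear Dense(input_dim, output_dim) layers. It lets the model learn a convex, piecewise-linear activation function over its inputs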
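
A small plain-Python sketch of the maxout idea: several linear pieces are evaluated for each output unit and the largest one is kept. The weight layout `weights[k][j][i]` (piece, output unit, input) and the name `maxout_dense` are assumptions made for this illustration.

```python
def maxout_dense(x, weights, biases):
    """x: input vector; weights[k][j][i]: piece k, output unit j, input i."""
    n_outputs = len(weights[0])
    outputs = []
    for j in range(n_outputs):
        candidates = []
        for k in range(len(weights)):
            z = sum(w * v for w, v in zip(weights[k][j], x)) + biases[k][j]
            candidates.append(z)
        outputs.append(max(candidates))  # keep the largest linear piece
    return outputs


# Two linear pieces, one output unit, two inputs.
weights = [[[1.0, 0.0]], [[0.0, -1.0]]]
biases = [[0.0], [0.5]]
print(maxout_dense([2.0, 3.0], weights, biases))
print(maxout_dense([-1.0, -3.0], weights, biases))
```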

Relevant Link:

https://keras-cn.readthedocs.io/en/latest/layers/core_layer/

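6. Convolutional layers

Data input layer: preprocesses the data, for example mean subtraction (centering every input dimension at 0 so that large offsets do not hurt training), normalization (scaling all data to the same range), PCA/whitening, and so on

In the middle are
CONV: convolution layer, a linear product followed by a sum (inner product)
RELU: activation layer (activation function), which turns a vector into a "magnitude" used to judge how well the current parameters classify
POOL: pooling layer, in short taking the average or maximum over a region

On the far right is
FC: fully connected layer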

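0x0: The convolution layer in a CNN

1. Core CNN concept: filtering

In communications, filtering (wave filtering) means removing specific frequency bands from a signal, and is an important measure for suppressing and preventing interference. In CNN image recognition, taking the inner product (element-wise multiplication followed by a sum) of an image (the data in successive data windows) and a filter matrix (a fixed set of weights: because each neuron's weights are fixed, it can be regarded as a constant filter) is the so-called "convolution" operation, which is where convolutional neural networks get their name.
Intuitively, this extracts the "important details" from a region (whose size is the size of the filter). The extraction works by assigning "region weights" and using them to filter out the key details of the region
More concretely, for a picture of a car, the filters pick out the tires, the mirrors, the outline of the front end, and the shape of the A-pillar, viewing an unstructured image in terms of its edge details
The theoretical basis is the view that human vision also recognizes images in layers: the eye first picks up the contour details of an object and passes them to the cerebral cortex, which then builds an overall perception of the object on top of those details

Loosely speaking, the region framed in red in the original figure can be understood as a filter, i.e. a neuron with a fixed set of weights. Stacking multiple filters forms a convolution layer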

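2. Convolution on images

In this computation, the input is the data in a region of a given size (width*height); taking its inner product with a filter (a neuron with a fixed set of weights) yields new two-dimensional data.

Specifically, the image input is on the left and the filter (a neuron with a fixed set of weights) is in the middle. Different filters produce different output data, such as color intensity or contours. In other words, to extract different features of an image, use different filters, each pulling out one specific kind of information: color intensity or contours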

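3. CNN filters

In a CNN, a filter (a neuron with a fixed set of weights) convolves local input data. After the local data in one data window has been processed, the window slides along until all of the data has been covered

In this example:

Two neurons, i.e. depth=2, meaning there are two filters.
The data window moves two positions at a time and takes a 3*3 patch of local data, i.e. stride=2.
zero-padding=1

Each of the two filters is then slid over the data in turn and convolved with it, giving two different sets of results. Through this sliding-window filtering, the details of the image (edges and contours, intensity) are extracted step by step. Two points are worth noting

1. Local perception
The data on the left keeps changing, and each time the filter convolves only one local data window. This is the local perception mechanism of a CNN.
By analogy, a filter is like a pair of eyes: the human field of view is limited, so one glance takes in only part of the world. Taking in everything at once would be overwhelming, and the brain could not process it. Even within a local view, the eyes weight some information more than the rest: when looking at a person, attention concentrates on a few features such as the face, so those inputs carry relatively large weights

2. Parameter (weight) sharing
As the data window slides, the data fed to the filter changes, but the weights of filter w0 (the weights connecting each neuron to the data window) stay fixed. This is the parameter (weight) sharing mechanism of a CNN.
By analogy again, someone traveling around the world sees different things, but the eyes collecting the information stay the same. A person's way of perceiving scenery stays constant over a given period. Note, however, that these weights are not fixed forever: as training proceeds they are adjusted continually according to the gradient of the loss (this is the backpropagation, BP, algorithm)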
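
The sliding-window computation with a single shared filter can be sketched in plain Python as follows. This is a single-channel illustration under the usual output-size relation (W - F + 2P) / S + 1; the function name `conv2d_single` and the all-ones test data are hypothetical. As is common in CNN practice, the kernel is not flipped.

```python
def conv2d_single(image, kernel, stride=1, padding=0):
    """Slide one square kernel over a 2D image with zero padding."""
    height, width = len(image), len(image[0])
    k = len(kernel)
    # Build the zero-padded input.
    padded = [[0.0] * (width + 2 * padding) for _ in range(height + 2 * padding)]
    for r in range(height):
        for c in range(width):
            padded[r + padding][c + padding] = image[r][c]
    out_h = (height - k + 2 * padding) // stride + 1
    out_w = (width - k + 2 * padding) // stride + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            total = 0.0
            # The same kernel weights are reused at every window position.
            for a in range(k):
                for b in range(k):
                    total += padded[i * stride + a][j * stride + b] * kernel[a][b]
            row.append(total)
        output.append(row)
    return output


image = [[1.0] * 5 for _ in range(5)]
kernel = [[1.0] * 3 for _ in range(3)]
result = conv2d_single(image, kernel, stride=2, padding=1)
print(result)
```

With a 5*5 input, a 3*3 filter, stride=2 and zero-padding=1, the output is 3*3; the smaller values at the border come from windows that overlap the zero padding.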

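4. CNN activation layer

Commonly used nonlinear activation functions include sigmoid, tanh, and relu. The first two, sigmoid and tanh, are more common in fully connected layers, while relu is common in convolution layers

The sigmoid activation function

g(z) = 1 / (1 + e^(-z))

where z is a linear combination, for example z = b + w1*x1 + w2*x2

With z on the horizontal axis (the domain) and g(z) on the vertical axis (the range), the sigmoid function squashes a real number into the interval between 0 and 1. When z is a very large positive number, g(z) approaches 1, and when z is a very large negative number, g(z) approaches 0
The activation can therefore be read as a "classification probability": an output of 0.9, for example, can be interpreted as a 90% probability that the sample is positive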
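
A short sketch of a sigmoid neuron in plain Python, using the standard definition g(z) = 1 / (1 + e^(-z)). The helper name `neuron` and the example weights are made up for illustration.

```python
import math


def sigmoid(z):
    """Squash a real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def neuron(inputs, weights, bias):
    """Compute g(z) with z = bias + sum of weight * input."""
    z = bias + sum(w * x for w, x in zip(weights, inputs))
    return sigmoid(z)


print(sigmoid(0.0))
print(neuron([1.0, 2.0], weights=[0.5, -0.25], bias=0.0))
```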

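ReLU activation layer

The advantages of ReLU, defined as max(0, x), are fast convergence and a simple gradient

5. CNN pooling layer

Pooling, in short, takes the average or the maximum over a region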
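
A plain-Python sketch of pooling over non-overlapping square regions, showing both the maximum and the average variants. The function name `pool2d` and the sample feature map are hypothetical.

```python
def pool2d(feature_map, size=2, mode="max"):
    """Reduce each non-overlapping size*size region to one value."""
    if mode == "max":
        reduce_fn = max
    else:
        reduce_fn = lambda values: sum(values) / len(values)
    pooled = []
    for i in range(0, len(feature_map) - size + 1, size):
        row = []
        for j in range(0, len(feature_map[0]) - size + 1, size):
            window = [feature_map[i + a][j + b]
                      for a in range(size) for b in range(size)]
            row.append(reduce_fn(window))
        pooled.append(row)
    return pooled


fm = [[1, 3, 2, 4],
      [5, 6, 7, 8],
      [3, 2, 1, 0],
      [1, 2, 3, 4]]
print(pool2d(fm, mode="max"))
print(pool2d(fm, mode="avg"))
```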

Next, a real CNN is used to explain how a CNN is constructed

1. Input layer of NxN pixels (N=32).
2. Convolutional layer (64 filter maps of size 11x11).
3. Max-pooling layer.
4. Densely-connected layer (4096 neurons)
5. Output layer. 9 neurons.

The input is a set of 32*32 images. The dimension changes at each layer are explained below

1. input layer: 32x32 neurons 
2. convolutional layer(64 filters, size 11x11): (32−11+1)∗(32−11+1) = 22∗22 = 484 for each feature map. As a result, the total output of the convolutional layer is 22∗22∗64 = 30976. 
3. pooling layer(2x2 regions): reduced to 11∗11∗64 = 7744.
4. fully-connected layer: 4096 neurons
5. output layer

The number of learnable parameters P of this network is:

P = 1024∗(11∗11∗64)+64+(11∗11∗64)∗4096+4096+4096∗9+9 = 39690313
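
The arithmetic above can be re-derived step by step. The sketch below simply reproduces the formula for P as written; reading the factor 1024 as 32*32 is an assumption, and the variable names are made up for this check.

```python
n = 32                      # input is n x n pixels
filters, kernel = 64, 11    # 64 filter maps of size 11x11
dense, classes = 4096, 9

conv_side = n - kernel + 1                   # 22
conv_out = conv_side * conv_side * filters   # 22 * 22 * 64
pool_side = conv_side // 2                   # 2x2 pooling regions
pool_out = pool_side * pool_side * filters   # 11 * 11 * 64

# Terms in the same order as the formula for P above.
params = ((n * n) * (kernel * kernel * filters) + filters
          + pool_out * dense + dense
          + dense * classes + classes)

print(conv_out, pool_out, params)
```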

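Note the convolution layer (layer 2): it can be understood as producing, from the same image, several views of different details, one for each point of focus (as the filter window moves)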

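0x1: Convolution1D layer

One-dimensional convolution layer, used for neighborhood filtering over a one-dimensional input signal. When this layer is the first layer of a model, the keyword argument input_dim or input_shape must be provided. For example, input_dim=128 denotes an input sequence of 128-dimensional vectors, while input_shape=(10,128) denotes a sequence of 10 vectors of dimension 128 (for code-segment feature vectors built from byte frequencies, this would be input_shape=(15000, 256))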

keras.layers.convolutional.Convolution1D(
    nb_filter, 
    filter_length, 
    init='uniform', 
    activation='linear', 
    weights=None, 
    border_mode='valid', 
    subsample_length=1, 
    W_regularizer=None, 
    b_regularizer=None, 
    activity_regularizer=None, 
    W_constraint=None, 
    b_constraint=None, 
    bias=True, 
    input_dim=None, 
    input_length=None
)
 
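1. nb_filter: number of convolution kernels, i.e. the output dimension (filters can be used to reduce the dimensionality of the CNN input layer and lower the computational cost)
2. filter_length: spatial or temporal length of each kernel
3. init: initialization method, given as the name of a predefined initializer or as a Theano function for initializing the weights. Only relevant when the weights argument is not passed.
4. activation: activation function, given as the name of a predefined activation (see the activations documentation) or as an element-wise Theano function. If unspecified, no activation is applied (i.e. the linear activation a(x)=x is used)
5. weights: list of numpy arrays to use as initial weights. The list should contain a weight matrix of shape (input_dim, output_dim) and a bias vector of shape (output_dim,).
6. border_mode: border mode, one of "valid", "same" or "full"; "full" requires the Theano backend
7. subsample_length: factor by which the output is subsampled relative to the input
8. W_regularizer: regularizer applied to the weights, a WeightRegularizer object
9. b_regularizer: regularizer applied to the bias vector, a WeightRegularizer object
10. activity_regularizer: regularizer applied to the output, an ActivityRegularizer object
11. W_constraint: constraint applied to the weights, a Constraints object
12. b_constraint: constraint applied to the bias, a Constraints object
13. bias: boolean, whether to include a bias vector (i.e. whether the layer applies an affine rather than a purely linear transformation to its input)
14. input_dim: integer, dimensionality of the input. When this layer is the first layer of a model, either this argument or input_shape must be specified.
15. input_length: length of the input sequences when it is fixed. This argument is required if the layer is followed by a Flatten layer and then a Dense layer; otherwise the output shape of the fully connected layer cannot be computed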

example

from keras.models import Sequential
from keras.layers import Convolution1D

# apply a convolution 1d of length 3 to a sequence with 10 timesteps,
# with 64 output filters
model = Sequential()
model.add(Convolution1D(64, 3, border_mode='same', input_shape=(10, 32)))
# now model.output_shape == (None, 10, 64)

# add a new conv1d on top
model.add(Convolution1D(32, 3, border_mode='same'))
# now model.output_shape == (None, 10, 32)

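Convolution1D can be regarded as a shortcut for Convolution2D: applying a 1D convolution to the (10, 32) signal in the example is equivalent to applying a 2D convolution with a kernel of size (filter_length, 32)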
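
The output lengths quoted in the example follow from the border mode and the subsampling factor. Below is a plain-Python sketch of that relation; `conv1d_output_length` is a hypothetical helper written for this article, not a documented Keras function.

```python
def conv1d_output_length(input_length, filter_length,
                         border_mode="valid", subsample_length=1):
    """Number of output timesteps of a 1D convolution."""
    if border_mode == "same":
        length = input_length
    elif border_mode == "valid":
        length = input_length - filter_length + 1
    elif border_mode == "full":
        length = input_length + filter_length - 1
    else:
        raise ValueError("unknown border_mode: %r" % (border_mode,))
    # Ceiling division by the subsampling factor.
    return (length + subsample_length - 1) // subsample_length


print(conv1d_output_length(10, 3, border_mode="same"))
print(conv1d_output_length(10, 3, border_mode="valid"))
```

With border_mode='same' the 10 timesteps are preserved, which matches the (None, 10, 64) shape in the example; with 'valid' two timesteps would be lost.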

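0x2: AtrousConvolution1D layer

The AtrousConvolution1D layer filters a 1D signal using a dilated convolution (a convolution "with holes"). When this layer is the first layer of a model, the keyword argument input_dim or input_shape must be provided. For example, input_dim=128 denotes an input sequence of 128-dimensional vectors, while input_shape=(10,128) denotes a sequence of 10 vectors of dimension 128.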

keras.layers.convolutional.AtrousConvolution1D(
    nb_filter, 
    filter_length, 
    init='uniform', 
    activation='linear', 
    weights=None, 
    border_mode='valid', 
    subsample_length=1, 
    atrous_rate=1, 
    W_regularizer=None, 
    b_regularizer=None, 
    activity_regularizer=None, 
    W_constraint=None, 
    b_constraint=None, 
    bias=True
)

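nb_filter: number of convolution kernels (i.e. the output dimension)
filter_length: spatial or temporal length of each kernel
init: initialization method, given as the name of a predefined initializer or as a Theano function for initializing the weights. Only relevant when the weights argument is not passed.
activation: activation function, given as the name of a predefined activation (see the activations documentation) or as an element-wise Theano function. If unspecified, no activation is applied (i.e. the linear activation a(x)=x is used)
weights: list of numpy arrays to use as initial weights. The list should contain a weight matrix of shape (input_dim, output_dim) and a bias vector of shape (output_dim,).
border_mode: border mode, one of "valid", "same" or "full"; "full" requires the Theano backend
subsample_length: factor by which the output is subsampled relative to the input
atrous_rate: dilation factor of the kernel, called 'filter_dilation' elsewhere
W_regularizer: regularizer applied to the weights, a WeightRegularizer object
b_regularizer: regularizer applied to the bias vector, a WeightRegularizer object
activity_regularizer: regularizer applied to the output, an ActivityRegularizer object
W_constraint: constraint applied to the weights, a Constraints object
b_constraint: constraint applied to the bias, a Constraints object
bias: boolean, whether to include a bias vector (i.e. whether the layer applies an affine rather than a purely linear transformation to its input)
input_dim: integer, dimensionality of the input. When this layer is the first layer of a model, either this argument or input_shape must be specified.
input_length: length of the input sequences when it is fixed. This argument is required if the layer is followed by a Flatten layer and then a Dense layer; otherwise the output shape of the fully connected layer cannot be computed.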

example

from keras.models import Sequential
from keras.layers import AtrousConvolution1D

# apply an atrous convolution 1d with atrous rate 2 of length 3 to a sequence with 10 timesteps,
# with 64 output filters
model = Sequential()
model.add(AtrousConvolution1D(64, 3, atrous_rate=2, border_mode='same', input_shape=(10, 32)))
# now model.output_shape == (None, 10, 64)

# add a new atrous conv1d on top
model.add(AtrousConvolution1D(32, 3, atrous_rate=2, border_mode='same'))
# now model.output_shape == (None, 10, 32)
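
To show what the dilation factor does, the sketch below applies a dilated 1D filter to a scalar sequence in plain Python, in "valid" mode and without subsampling. `atrous_conv1d` is a hypothetical name, and the kernel is not flipped.

```python
def atrous_conv1d(signal, kernel, atrous_rate=1):
    """Dilated 1D filtering: kernel taps are atrous_rate samples apart."""
    # Span of input covered by one application of the dilated kernel.
    span = (len(kernel) - 1) * atrous_rate + 1
    output = []
    for start in range(len(signal) - span + 1):
        value = sum(kernel[k] * signal[start + k * atrous_rate]
                    for k in range(len(kernel)))
        output.append(value)
    return output


signal = [1, 2, 3, 4, 5, 6]
kernel = [1, 0, -1]
print(atrous_conv1d(signal, kernel, atrous_rate=1))
print(atrous_conv1d(signal, kernel, atrous_rate=2))
```

With atrous_rate=2 a kernel of length 3 covers 5 input samples, so it sees a wider context without any additional weights.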

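0x3: Convolution2D layer

Two-dimensional convolution layer, which applies a sliding-window convolution to a 2D input. When this layer is the first layer of a model, the input_shape argument should be provided. For example, input_shape = (3,128,128) represents a 128*128 RGB color image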

keras.layers.convolutional.Convolution2D(
    nb_filter, 
    nb_row, 
    nb_col, 
    init='glorot_uniform', 
    activation='linear', 
    weights=None, 
    border_mode='valid', 
    subsample=(1, 1), 
    dim_ordering='th', 
    W_regularizer=None, 
    b_regularizer=None, 
    activity_regularizer=None, 
    W_constraint=None, 
    b_constraint=None, 
    bias=True
)

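nb_filter: number of convolution kernels
nb_row: number of rows in each kernel
nb_col: number of columns in each kernel
init: initialization method, given as the name of a predefined initializer or as a Theano function for initializing the weights. Only relevant when the weights argument is not passed.
activation: activation function, given as the name of a predefined activation (see the activations documentation) or as an element-wise Theano function. If unspecified, no activation is applied (i.e. the linear activation a(x)=x is used)
weights: list of numpy arrays to use as initial weights. The list should contain a weight matrix of shape (input_dim, output_dim) and a bias vector of shape (output_dim,).
border_mode: border mode, one of "valid", "same" or "full"; "full" requires the Theano backend
subsample: tuple of length 2, the factor by which the output is subsampled relative to the input, more commonly called "strides"
W_regularizer: regularizer applied to the weights, a WeightRegularizer object
b_regularizer: regularizer applied to the bias vector, a WeightRegularizer object
activity_regularizer: regularizer applied to the output, an ActivityRegularizer object
W_constraint: constraint applied to the weights, a Constraints object
b_constraint: constraint applied to the bias, a Constraints object
dim_ordering: 'th' or 'tf'. In 'th' mode the channel dimension (e.g. the 3 channels of a color image) is at position 1 (counting from 0), while in 'tf' mode it is at position 3. For example, for a 128*128 three-channel color image, input_shape is written (3, 128, 128) in 'th' mode and (128, 128, 3) in 'tf' mode. Note that the 3 appears at position 0 here because input_shape does not include the sample dimension; internally the shapes are (None, 3, 128, 128) and (None, 128, 128, 3). The default is the mode given by image_dim_ordering, which can be checked in ~/.keras/keras.json; if it has never been set, it is 'tf'.
bias: boolean, whether to include a bias vector (i.e. whether the layer applies an affine rather than a purely linear transformation to its input)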

example

from keras.models import Sequential
from keras.layers import Convolution2D

# apply a 3x3 convolution with 64 output filters on a 256x256 image:
model = Sequential()
model.add(Convolution2D(64, 3, 3, border_mode='same', input_shape=(3, 256, 256)))
# now model.output_shape == (None, 64, 256, 256)

# add a 3x3 convolution on top, with 32 output filters:
model.add(Convolution2D(32, 3, 3, border_mode='same'))
# now model.output_shape == (None, 32, 256, 256)
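
The output shapes in the comments above depend on border_mode, subsample and dim_ordering. The following plain-Python sketch shows how they could be derived; `conv2d_output_shape` and `_output_length` are hypothetical helpers written for this article, and only the 'valid' and 'same' modes are covered.

```python
def _output_length(size, kernel, border_mode, stride):
    if border_mode == "same":
        length = size
    elif border_mode == "valid":
        length = size - kernel + 1
    else:
        raise ValueError("unsupported border_mode: %r" % (border_mode,))
    return (length + stride - 1) // stride


def conv2d_output_shape(input_shape, nb_filter, nb_row, nb_col,
                        border_mode="valid", subsample=(1, 1),
                        dim_ordering="th"):
    """input_shape includes the leading sample dimension (None)."""
    if dim_ordering == "th":
        batch, _, rows, cols = input_shape    # channels first
    elif dim_ordering == "tf":
        batch, rows, cols, _ = input_shape    # channels last
    else:
        raise ValueError("unknown dim_ordering: %r" % (dim_ordering,))
    out_rows = _output_length(rows, nb_row, border_mode, subsample[0])
    out_cols = _output_length(cols, nb_col, border_mode, subsample[1])
    if dim_ordering == "th":
        return (batch, nb_filter, out_rows, out_cols)
    return (batch, out_rows, out_cols, nb_filter)


print(conv2d_output_shape((None, 3, 256, 256), 64, 3, 3, border_mode="same"))
print(conv2d_output_shape((None, 256, 256, 3), 64, 3, 3, dim_ordering="tf"))
```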

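0x4: AtrousConvolution2D layer

This layer applies an atrous convolution, also known as a dilated convolution or a convolution with holes, to a 2D input. When this layer is the first layer of a model, the input_shape argument should be provided. For example, input_shape = (3,128,128) represents a 128*128 RGB color image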

Relevant Link:

https://keras-cn.readthedocs.io/en/latest/layers/convolutional_layer/
http://baike.baidu.com/item/%E6%BB%A4%E6%B3%A2
http://blog.csdn.net/v_july_v/article/details/51812459
http://cs231n.github.io/convolutional-networks/#overview
http://blog.csdn.net/stdcoutzyx/article/details/41596663

7. Pooling layers

0x1: MaxPooling1D layer

Applies max pooling to a temporal 1D signal

keras.layers.convolutional.MaxPooling1D(
    pool_length=2, 
    stride=None, 
    border_mode='valid'
)

pool_length: downsampling factor; for example, 2 halves the length of the input
stride: integer or None, the stride value
border_mode: 'valid' or 'same'
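
A plain-Python sketch of 1D max pooling in 'valid' mode over a scalar sequence. It assumes that stride=None means the stride equals pool_length; `max_pooling_1d` is a hypothetical name, and a real layer would pool every feature of the sequence independently.

```python
def max_pooling_1d(sequence, pool_length=2, stride=None):
    """Take the maximum over windows of pool_length timesteps."""
    if stride is None:
        stride = pool_length  # non-overlapping windows by default
    return [max(sequence[i:i + pool_length])
            for i in range(0, len(sequence) - pool_length + 1, stride)]


seq = [1, 5, 2, 4, 3, 0]
print(max_pooling_1d(seq))
print(max_pooling_1d(seq, pool_length=2, stride=1))
```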

0x2: MaxPooling2D layer

Applies max pooling to a spatial signal

keras.layers.convolutional.MaxPooling2D(
    pool_size=(2, 2), 
    strides=None, 
    border_mode='valid', 
    dim_ordering='th'
) 

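1. pool_size: tuple of 2 integers, the downsampling factors in the two directions (vertical, horizontal); for example, (2, 2) halves the image along both dimensions
2. strides: tuple of 2 integers or None, the stride values.
3. border_mode: 'valid' or 'same'
4. dim_ordering: 'th' or 'tf'. In 'th' mode the channel dimension (e.g. the 3 channels of a color image) is at position 1 (counting from 0), while in 'tf' mode it is at position 3. For example, for a 128*128 three-channel color image, input_shape is written (3, 128, 128) in 'th' mode and (128, 128, 3) in 'tf' mode. Note that the 3 appears at position 0 here because input_shape does not include the sample dimension; internally the shapes are (None, 3, 128, 128) and (None, 128, 128, 3). The default is the mode given by image_dim_ordering, which can be checked in ~/.keras/keras.json; if it has never been set, it is 'tf'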

0x3: AveragePooling1D layer

Applies average pooling to a temporal 1D signal

keras.layers.convolutional.AveragePooling1D(
    pool_length=2, 
    stride=None, 
    border_mode='valid'
) 

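1. pool_length: downsampling factor; for example, 2 halves the length of the input
2. stride: integer or None, the stride value
3. border_mode: 'valid' or 'same'
Note that 'same' mode currently works only with the TensorFlow backend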

0x4: GlobalMaxPooling1D layer

Global max pooling over a temporal signal

keras.layers.pooling.GlobalMaxPooling1D()
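
Global max pooling collapses the whole time axis: for one sample of shape (timesteps, features) it keeps, for each feature, the largest value seen at any timestep. A plain-Python sketch, with the hypothetical name `global_max_pooling_1d`:

```python
def global_max_pooling_1d(sample):
    """sample: list of timesteps, each a list of feature values."""
    # zip(*sample) groups the values of each feature across all timesteps.
    return [max(column) for column in zip(*sample)]


sample = [[1, 9], [4, 2], [3, 5]]  # 3 timesteps, 2 features
print(global_max_pooling_1d(sample))
```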

Relevant Link:

https://keras-cn.readthedocs.io/en/latest/layers/pooling_layer/

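8. Recurrent layers

0x1: Recurrent layer

This is the abstract base class for recurrent layers. Do not use it directly in a model (it is abstract and cannot be instantiated); use one of its subclasses LSTM, GRU, or SimpleRNN instead.
All recurrent layers (LSTM, GRU, SimpleRNN) follow the behavior of this class and accept all of the keyword arguments it specifies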

keras.layers.recurrent.Recurrent(
    weights=None, 
    return_sequences=False, 
    go_backwards=False, 
    stateful=False, 
    unroll=False, 
    consume_less='cpu', 
    input_dim=None, 
    input_length=None
)

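1. weights: list of numpy arrays used to initialize the weights. The arrays have shapes [(input_dim, output_dim), (output_dim, output_dim), (output_dim,)]
2. return_sequences: boolean, default False; controls the return type. If True, the full output sequence is returned; otherwise only the last output of the sequence is returned
3. go_backwards: boolean, default False. If True, the input sequence is processed in reverse
4. stateful: boolean, default False. If True, the final state of the sample at index i in a batch is used as the initial state of the sample at index i in the next batch.
5. unroll: boolean, default False. If True, the recurrent layer is unrolled; otherwise a symbolic loop is used. With the TensorFlow backend the recurrent network is always unrolled, so this argument has no effect. Unrolling uses more memory but speeds up the RNN, and is only suitable for short sequences.
6. consume_less: either 'cpu' or 'mem'. With 'cpu', the RNN is implemented with fewer, larger matrix multiplications, which runs faster on a CPU but uses more memory. With 'mem', the RNN is implemented with more, smaller matrix multiplications, which runs faster with GPU parallelism (but slower on a CPU) and uses less memory.
7. input_dim: input dimension. When this layer is the first layer of a model, this value (or equivalently input_shape) should be specified
8. input_length: length of the input sequences when it is fixed. This argument is required if the layer is followed by a Flatten layer and then a Dense layer; otherwise the output shape of the fully connected layer cannot be computed. Note that if the recurrent layer is not the first layer of the network, the sequence length must be specified in the first layer, for example through input_shape.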
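
The effect of return_sequences and go_backwards can be seen in a scalar toy recurrence. The sketch below assumes the usual simple-RNN update h = tanh(w*x + u*h + b) with a zero initial state; it is an illustration with made-up names and weights, not the library's code.

```python
import math


def simple_rnn(inputs, w, u, b, return_sequences=False, go_backwards=False):
    """Run a one-unit recurrent layer over a sequence of scalars."""
    steps = list(reversed(inputs)) if go_backwards else list(inputs)
    h = 0.0
    outputs = []
    for x in steps:
        h = math.tanh(w * x + u * h + b)  # new state depends on the previous one
        outputs.append(h)
    return outputs if return_sequences else outputs[-1]


xs = [1.0, 0.5, -1.0]
full = simple_rnn(xs, w=0.5, u=0.1, b=0.0, return_sequences=True)
last = simple_rnn(xs, w=0.5, u=0.1, b=0.0)
print(full)
print(last)
```

With return_sequences=True there is one output per timestep; otherwise only the final one is returned, which is what a following Dense layer usually expects.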

0x2: SimpleRNN layer

Fully connected RNN in which the output is fed back to the input

keras.layers.recurrent.SimpleRNN(
    output_dim, 
    init='glorot_uniform', 
    inner_init='orthogonal', 
    activation='tanh', 
    W_regularizer=None, 
    U_regularizer=None, 
    b_regularizer=None, 
    dropout_W=0.0, 
    dropout_U=0.0
)

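output_dim: dimension of the internal projections and of the output
init: initialization method, given as the name of a predefined initializer or as a Theano function for initializing the weights.
inner_init: initialization method for the inner cells
activation: activation function, given as the name of a predefined activation (see the activations documentation)
W_regularizer: regularizer applied to the weights, a WeightRegularizer object
U_regularizer: regularizer applied to the recurrent weights, a WeightRegularizer object
b_regularizer: regularizer applied to the bias vector, a WeightRegularizer object
dropout_W: float between 0 and 1, the fraction of input units to drop for the input gates
dropout_U: float between 0 and 1, the fraction of input units to drop for the recurrent connections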

0x3: GRU layer

Gated Recurrent Unit

keras.layers.recurrent.GRU(
    output_dim, 
    init='glorot_uniform', 
    inner_init='orthogonal', 
    activation='tanh', 
    inner_activation='hard_sigmoid', 
    W_regularizer=None, 
    U_regularizer=None, 
    b_regularizer=None, 
    dropout_W=0.0, 
    dropout_U=0.0
)

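output_dim: dimension of the internal projections and of the output
init: initialization method, given as the name of a predefined initializer or as a Theano function for initializing the weights.
inner_init: initialization method for the inner cells
activation: activation function, given as the name of a predefined activation (see the activations documentation)
inner_activation: activation function for the inner cells
W_regularizer: regularizer applied to the weights, a WeightRegularizer object
U_regularizer: regularizer applied to the recurrent weights, a WeightRegularizer object
b_regularizer: regularizer applied to the bias vector, a WeightRegularizer object
dropout_W: float between 0 and 1, the fraction of input units to drop for the input gates
dropout_U: float between 0 and 1, the fraction of input units to drop for the recurrent connections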

0x4: LSTM layer

The Keras Long Short-Term Memory (LSTM) layer

keras.layers.recurrent.LSTM(
    output_dim, 
    init='glorot_uniform', 
    inner_init='orthogonal', 
    forget_bias_init='one', 
    activation='tanh', 
    inner_activation='hard_sigmoid', 
    W_regularizer=None, 
    U_regularizer=None, 
    b_regularizer=None, 
    dropout_W=0.0, 
    dropout_U=0.0
)

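output_dim: dimension of the internal projections and of the output
init: initialization method, given as the name of a predefined initializer or as a Theano function for initializing the weights.
inner_init: initialization method for the inner cells
forget_bias_init: initialization function for the forget gate bias; Jozefowicz et al. recommend initializing it to all ones
activation: activation function, given as the name of a predefined activation (see the activations documentation)
inner_activation: activation function for the inner cells
W_regularizer: regularizer applied to the weights, a WeightRegularizer object
U_regularizer: regularizer applied to the recurrent weights, a WeightRegularizer object
b_regularizer: regularizer applied to the bias vector, a WeightRegularizer object
dropout_W: float between 0 and 1, the fraction of input units to drop for the input gates
dropout_U: float between 0 and 1, the fraction of input units to drop for the recurrent connections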

Relevant Link:

https://keras-cn.readthedocs.io/en/latest/layers/recurrent_layer/

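9. Embedding layer

0x1: Embedding layer

The embedding layer turns positive integers (indices) into dense vectors of fixed size, e.g. [[4],[20]]->[[0.25,0.1],[0.6,-0.2]]. It is an encoding from numbers to vectors; using Embedding requires the input feature vectors to have spatial correlation
The Embedding layer can only be used as the first layer of a model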

keras.layers.embeddings.Embedding(
    input_dim, 
    output_dim, 
    init='uniform', 
    input_length=None, 
    W_regularizer=None, 
    activity_regularizer=None, 
    W_constraint=None, 
    mask_zero=False, 
    weights=None, 
    dropout=0.0
)

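input_dim: integer greater than or equal to 0, the size of the vocabulary, i.e. the largest index in the input data + 1
output_dim: integer greater than 0, the dimension of the dense embedding
init: initialization method, given as the name of a predefined initializer or as a Theano function for initializing the weights. Only relevant when the weights argument is not passed.
weights: list of numpy arrays to use as initial weights. The list should contain only one weight matrix of shape (input_dim, output_dim)
W_regularizer: regularizer applied to the weights, a WeightRegularizer object
W_constraint: constraint applied to the weights, a Constraints object
mask_zero: boolean, whether the input value 0 should be treated as a "padding" value to be ignored. This is useful when recurrent layers process variable-length input. If set to True, all subsequent layers in the model must support masking, otherwise an exception is raised
input_length: length of the input sequences when it is fixed. This argument is required if the layer is followed by a Flatten layer and then a Dense layer; otherwise the output dimension of the Dense layer cannot be inferred.
dropout: float between 0 and 1, the fraction of the embeddings to drop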
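
Conceptually, an embedding is a table lookup: row i of an (input_dim, output_dim) weight matrix is the vector for index i. The plain-Python sketch below reuses the indices and vectors from the example above; the table contents and the name `embed` are otherwise hypothetical.

```python
def embed(indices, table):
    """Replace every integer index by the corresponding row of the table."""
    return [[table[i] for i in row] for row in indices]


input_dim, output_dim = 21, 2   # largest index is 20, so input_dim is 21
table = [[0.0] * output_dim for _ in range(input_dim)]
table[4] = [0.25, 0.1]
table[20] = [0.6, -0.2]

print(embed([[4], [20]], table))
```

The result has one more dimension than the input: each sample of 1 index becomes a sequence of 1 vector of length output_dim.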

Relevant Link:

https://keras-cn.readthedocs.io/en/latest/layers/embedding_layer/
