Deep learning with Python study notes (8)

Date: 2019-12-01

Tags: deep learning, python, study notes. Category: Python


Keras functional API

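With the Keras functional API you can build graph-like models, share a layer across different inputs, and use Keras models just like Python functions. Keras callbacks and TensorBoard, a browser-based visualization tool, let you monitor a model while it trains.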

Multi-input models, multi-output models and graph-like models cannot be built with Keras's Sequential model class alone. For these there is a more general and more flexible way to use Keras: the functional API.

With the functional API you manipulate tensors directly and use layers as functions that take tensors and return tensors (hence the name "functional API").
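To make "layers as functions" concrete without Keras, here is a minimal NumPy sketch under stated assumptions: ToyDense is a hypothetical stand-in for a Dense layer (random, untrained weights), and a NumPy array plays the role of a tensor. Chaining the calls mirrors the shape of the Keras example that follows (64 -> 32 -> 32 -> 10).

```python
import numpy as np


class ToyDense:
    """Hypothetical stand-in for a Dense layer: called like a function on an array."""

    def __init__(self, in_dim, units, activation=None, seed=0):
        rng = np.random.default_rng(seed)
        self.kernel = rng.normal(scale=0.1, size=(in_dim, units))  # random, untrained
        self.bias = np.zeros(units)
        self.activation = activation

    def __call__(self, x):
        y = x @ self.kernel + self.bias
        if self.activation == 'relu':
            y = np.maximum(y, 0.0)
        elif self.activation == 'softmax':
            e = np.exp(y - y.max(axis=-1, keepdims=True))  # subtract max for stability
            y = e / e.sum(axis=-1, keepdims=True)
        return y


# Each "layer" takes an array and returns a new array, so the calls chain like functions
input_tensor = np.random.default_rng(1).normal(size=(4, 64))  # batch of 4 samples
x = ToyDense(64, 32, 'relu', seed=2)(input_tensor)
x = ToyDense(32, 32, 'relu', seed=3)(x)
output_tensor = ToyDense(32, 10, 'softmax', seed=4)(x)
print(output_tensor.shape)  # (4, 10)
```

In real Keras the layer objects also create their weights lazily and record the call graph; this toy only shows the calling convention.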

A simple example

from keras.models import Sequential, Model
from keras import layers
from keras import Input

input_tensor = Input(shape=(64,))
x = layers.Dense(32, activation='relu')(input_tensor)
x = layers.Dense(32, activation='relu')(x)
output_tensor = layers.Dense(10, activation='softmax')(x)
model = Model(input_tensor, output_tensor)
model.summary()

The code above uses the functional API. The equivalent Sequential model is:

model = Sequential()
model.add(layers.Dense(32, activation='relu', input_shape=(64, )))
model.add(layers.Dense(32, activation='relu'))
model.add(layers.Dense(10, activation='softmax'))


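When instantiating a Model you only need an input tensor and an output tensor. Behind the scenes, Keras retrieves every layer involved in going from input_tensor to output_tensor and assembles them into a graph-like data structure: a Model. Of course, this only works because output_tensor was obtained by repeatedly transforming input_tensor. If you try to build a model from an input and an output that are unrelated, you get an error (the book calls it a RuntimeError; recent Keras versions raise a ValueError saying the graph is disconnected).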
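To make "Keras retrieves every layer from input to output" concrete, here is a toy, framework-free sketch of such a backward trace. It is not how Keras is implemented; the Node class and trace_layers function are hypothetical. Each toy tensor remembers the layer that produced it, and the trace fails if the output depends on a source tensor other than the declared input.

```python
class Node:
    """A toy tensor: remembers the layer that produced it and that layer's inputs."""

    def __init__(self, name, layer=None, parents=()):
        self.name = name
        self.layer = layer            # None for a model input
        self.parents = list(parents)


def trace_layers(input_node, output_node):
    """Walk backwards from output_node and return layer names in forward order.

    Raises RuntimeError if the output depends on a source tensor other than
    input_node, i.e. the output was not derived from the given input.
    """
    order, seen = [], set()

    def visit(node):
        if id(node) in seen:
            return
        seen.add(id(node))
        if node is input_node:
            return                    # reached the model input: stop here
        if not node.parents:
            raise RuntimeError(f'graph disconnected: {node.name} is not the model input')
        for parent in node.parents:
            visit(parent)
        order.append(node.layer)      # post-order gives a forward (topological) order

    visit(output_node)
    return order


# input_tensor -> dense_1 -> dense_2 -> dense_3, as in the simple example
inp = Node('input_tensor')
h1 = Node('x1', 'dense_1', [inp])
h2 = Node('x2', 'dense_2', [h1])
out = Node('output_tensor', 'dense_3', [h2])
print(trace_layers(inp, out))  # ['dense_1', 'dense_2', 'dense_3']

unrelated = Node('unrelated_input')
try:
    trace_layers(unrelated, out)
except RuntimeError as err:
    print(err)  # graph disconnected: input_tensor is not the model input
```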

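The functional API can be used to build models with multiple inputs. Typically such a model merges its input branches at some point with a layer that combines several tensors, for example by adding or concatenating them. This is usually done with Keras merge operations such as keras.layers.add and keras.layers.concatenate.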
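The two merge styles have different shape rules. As a NumPy sketch of the semantics only (not the Keras layers themselves, and with made-up branch widths), concatenation along the last axis adds up the feature widths, while element-wise addition requires the branches to have identical shapes:

```python
import numpy as np

batch = 8
encoded_text = np.ones((batch, 32))            # e.g. output of a 32-unit branch
encoded_question = np.full((batch, 16), 2.0)   # e.g. output of a 16-unit branch

# Concatenation along the last (feature) axis: widths add up, 32 + 16 = 48
concatenated = np.concatenate([encoded_text, encoded_question], axis=-1)
print(concatenated.shape)  # (8, 48)

# Element-wise addition needs identical shapes, so only same-width branches can be added
other_branch = np.full((batch, 32), 3.0)
added = encoded_text + other_branch
print(added.shape)  # (8, 32)
```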

A multi-input model example

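A typical question-answering model has two inputs: a question in natural language and a text snippet that provides the information used to answer it. The model must then produce an answer. In the simplest case the answer is a single word, obtained through a softmax over some predefined vocabulary.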

from keras.models import Model
from keras import layers
from keras import Input
import numpy as np
import keras.utils
import tools  # the author's own helper module; draw_acc_and_loss is shown below

num_samples = 1000
max_length = 100
text_vocabulary_size = 10000
question_vocabulary_size = 10000
answer_vocabulary_size = 500
# Model
text_input = Input(shape=(None,), dtype='int32', name='text')
embedded_text = layers.Embedding(text_vocabulary_size, 64)(text_input)
encoded_text = layers.LSTM(32)(embedded_text)
question_input = Input(shape=(None,), dtype='int32', name='question')
embedded_question = layers.Embedding(question_vocabulary_size, 32)(question_input)
encoded_question = layers.LSTM(16)(embedded_question)
concatenated = layers.concatenate([encoded_text, encoded_question], axis=-1)
answer = layers.Dense(answer_vocabulary_size, activation='softmax')(concatenated)
model = Model([text_input, question_input], answer)
model.compile(optimizer='rmsprop', loss='categorical_crossentropy', metrics=['acc'])
model.summary()
# Training on random dummy data
text = np.random.randint(1, text_vocabulary_size, size=(num_samples, max_length))
question = np.random.randint(1, question_vocabulary_size, size=(num_samples, max_length))
answers = np.random.randint(answer_vocabulary_size, size=(num_samples))
answers = keras.utils.to_categorical(answers, answer_vocabulary_size)
history = model.fit([text, question], answers, epochs=10, batch_size=128)
# model.fit({'text': text, 'question': question}, answers, epochs=10, batch_size=128)
tools.draw_acc_and_loss(history)

The helper tools.draw_acc_and_loss(history) in the author's tools module is defined as follows:

import matplotlib.pyplot as plt


def draw_acc_and_loss(history):
    # Plot training accuracy and training loss per epoch from a Keras History object
    acc = history.history['acc']
    loss = history.history['loss']
    epochs = range(1, len(loss) + 1)

    plt.figure()
    plt.plot(epochs, acc, 'b', label='Training acc')
    plt.title('Training acc')
    plt.legend()
    plt.show()

    plt.figure()
    plt.plot(epochs, loss, 'b', label='Training loss')
    plt.title('Training loss')
    plt.legend()
    plt.show()

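(Figure: model structure)

(Figure: training acc and loss; not a meaningful result, since the data is random)

Training longer should push these numbers in a better direction, ha. With random data the model can only memorize noise, so a better training accuracy here says nothing about generalization.
(Figure: results after changing epochs to 50)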

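In the same way, the functional API can build models with multiple outputs (or multiple heads). The following example takes a series of social media posts from an anonymous person and tries to predict attributes of that person, such as age, gender and income level.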

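With a multi-output model you can specify a different loss function for each head of the network. For example, age prediction is a scalar regression task while gender prediction is a binary classification task, so the two need different training procedures. Gradient descent, however, requires a single scalar to minimize, so to train the model these losses must be combined into one scalar. The simplest way to combine different losses is to sum them. In Keras you can pass a list or a dict of losses at compile time to assign different losses to different outputs; the resulting loss values are summed into a global loss that is minimized during training.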
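Here is a minimal NumPy sketch of "one loss per head, then sum them into one scalar". The three loss functions are simplified textbook versions averaged over the batch (not Keras's exact implementations), and the predictions are made-up numbers for a batch of 2 samples.

```python
import numpy as np


def mse(y_true, y_pred):
    return np.mean((y_true - y_pred) ** 2)


def categorical_crossentropy(y_true_onehot, y_prob, eps=1e-7):
    y_prob = np.clip(y_prob, eps, 1.0)
    return np.mean(-np.sum(y_true_onehot * np.log(y_prob), axis=-1))


def binary_crossentropy(y_true, y_prob, eps=1e-7):
    y_prob = np.clip(y_prob, eps, 1 - eps)
    return np.mean(-(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob)))


# Made-up targets and predictions for 2 samples
age_true, age_pred = np.array([30.0, 45.0]), np.array([32.0, 44.0])
income_true = np.array([[0, 1, 0], [1, 0, 0]], dtype=float)
income_pred = np.array([[0.2, 0.7, 0.1], [0.6, 0.3, 0.1]])
gender_true, gender_pred = np.array([1.0, 0.0]), np.array([0.9, 0.2])

losses = {
    'age': mse(age_true, age_pred),                                    # (4 + 1) / 2 = 2.5
    'income': categorical_crossentropy(income_true, income_pred),
    'gender': binary_crossentropy(gender_true, gender_pred),
}
total_loss = sum(losses.values())  # the single scalar gradient descent minimizes
print(losses, total_loss)
```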

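When each head has its own loss function, severely imbalanced loss contributions will bias the model's representations toward the task with the largest individual loss, at the expense of the other tasks. To address this, you can weight how much each loss value contributes to the final loss. For example, the mean squared error (MSE) loss for age regression is typically around 3 to 5, while the cross-entropy loss for gender classification can be as low as 0.1. To balance the contributions in this situation, you can give the cross-entropy loss a weight of 10 and the MSE loss a weight of 0.25 (the values used in the code below).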
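The arithmetic behind the weighting is just a weighted sum, total = Σ wᵢ·Lᵢ. This sketch plugs in the typical magnitudes quoted above together with the loss_weights from the compile call below (age 0.25, income 1, gender 10). The income loss value is a made-up placeholder, and weighted_total is a hypothetical helper, not a Keras function.

```python
def weighted_total(losses, loss_weights):
    """Scalar loss = sum of weight * loss, with dict keys mirroring the output names."""
    return sum(loss_weights[name] * value for name, value in losses.items())


# MSE around 4 and cross-entropy around 0.1 as in the text; 'income' is a placeholder
losses = {'age': 4.0, 'income': 1.0, 'gender': 0.1}
loss_weights = {'age': 0.25, 'income': 1.0, 'gender': 10.0}

contributions = {name: loss_weights[name] * losses[name] for name in losses}
print(contributions)                          # {'age': 1.0, 'income': 1.0, 'gender': 1.0}
print(weighted_total(losses, loss_weights))   # 3.0
```

Without the weights, the age term (4.0) would dominate the gender term (0.1) by a factor of 40; with them, each head contributes about equally.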

Model overview

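from keras import layers
from keras import Input
from keras.models import Model
from keras.utils import to_categorical
import numpy as np

vocabulary_size = 50000
num_income_groups = 10
# Input
posts_input = Input(shape=(None,), dtype='int32', name='posts')
# Embedding(input_dim, output_dim): vocabulary size first, embedding size second
embedded_posts = layers.Embedding(vocabulary_size, 256)(posts_input)
# 1D convnet
x = layers.Conv1D(128, 5, activation='relu')(embedded_posts)
x = layers.MaxPooling1D(5)(x)
x = layers.Conv1D(256, 5, activation='relu')(x)
x = layers.Conv1D(256, 5, activation='relu')(x)
x = layers.MaxPooling1D(5)(x)
x = layers.Conv1D(256, 5, activation='relu')(x)
x = layers.Conv1D(256, 5, activation='relu')(x)
x = layers.GlobalMaxPooling1D()(x)
x = layers.Dense(128, activation='relu')(x)
# Output heads; their names are the keys used in the dict forms below
age_prediction = layers.Dense(1, name='age')(x)
income_prediction = layers.Dense(num_income_groups, activation='softmax', name='income')(x)
gender_prediction = layers.Dense(1, activation='sigmoid', name='gender')(x)
# Assemble the network
model = Model(posts_input, [age_prediction, income_prediction, gender_prediction])
# Compile: one loss per output, each with its own weight
model.compile(optimizer='rmsprop',
    loss=['mse', 'categorical_crossentropy', 'binary_crossentropy'],
    loss_weights=[0.25, 1., 10.])
# Equivalent form, addressing the outputs by name
'''
model.compile(optimizer='rmsprop', loss={'age': 'mse',
        'income': 'categorical_crossentropy',
        'gender': 'binary_crossentropy'},
    loss_weights={'age': 0.25,
        'income': 1.,
        'gender': 10.})
'''
# Random dummy data so the snippet runs (replace with real posts and labels).
# Sequences must be long enough to survive the conv/pooling stack; 500 tokens is enough.
num_samples = 1000
max_length = 500
posts = np.random.randint(1, vocabulary_size, size=(num_samples, max_length))
age_targets = np.random.randint(18, 80, size=(num_samples,)).astype('float32')
# categorical_crossentropy expects one-hot income labels
income_targets = to_categorical(np.random.randint(num_income_groups, size=(num_samples,)),
    num_income_groups)
gender_targets = np.random.randint(2, size=(num_samples,))
# Feed the data into the network
model.fit(posts, [age_targets, income_targets, gender_targets],
    epochs=10, batch_size=64)
# Equivalent form, addressing the outputs by name
'''
model.fit(posts, {'age': age_targets,
    'income': income_targets,
    'gender': gender_targets},
    epochs=10, batch_size=64)
'''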

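With the functional API you can build not only multi-input and multi-output models but also networks with a complex internal topology. A neural network in Keras can be any directed acyclic graph of layers. The qualifier "acyclic" is important: these graphs cannot contain cycles, that is, a tensor x cannot become the input of one of the layers that produced x. The only processing loops allowed (that is, recurrent connections) are the ones internal to recurrent layers.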
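The acyclicity requirement can be checked with a topological sort. This sketch uses Kahn's algorithm on a layer graph written as a plain dict {layer: [layers it feeds]}; the layer names are hypothetical and describe a small Inception-style block. If some layers can never be scheduled, the graph has a cycle.

```python
from collections import deque


def is_acyclic(graph):
    """Kahn's algorithm: True if the directed graph {node: [successors]} has no cycle."""
    indegree = {node: 0 for node in graph}
    for successors in graph.values():
        for s in successors:
            indegree[s] = indegree.get(s, 0) + 1
    queue = deque(node for node, d in indegree.items() if d == 0)
    visited = 0
    while queue:
        node = queue.popleft()
        visited += 1
        for s in graph.get(node, []):
            indegree[s] -= 1
            if indegree[s] == 0:
                queue.append(s)
    return visited == len(indegree)   # every node scheduled means no cycle


# x fans out into branches that merge again: a DAG
inception_like = {
    'x': ['conv_a', 'conv_b1', 'pool_c'],
    'conv_a': ['concat'],
    'conv_b1': ['conv_b2'],
    'conv_b2': ['concat'],
    'pool_c': ['conv_c'],
    'conv_c': ['concat'],
    'concat': [],
}
print(is_acyclic(inception_like))  # True

# Feeding concat's output back into conv_a creates the loop conv_a -> concat -> conv_a
with_cycle = dict(inception_like, concat=['conv_a'])
print(is_acyclic(with_cycle))  # False
```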

Implementing an Inception V3-style module with Keras

Assume we have a 4D input tensor x.

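from keras import layers
from keras import Input

# Example 4D input (batch, height, width, channels); the shape is an arbitrary choice
x = Input(shape=(28, 28, 192))

# padding='same' keeps every branch at the same output height and width,
# which the channel-wise concatenation at the end requires
branch_a = layers.Conv2D(128, 1, activation='relu', strides=2)(x)

branch_b = layers.Conv2D(128, 1, activation='relu')(x)
branch_b = layers.Conv2D(128, 3, activation='relu', strides=2, padding='same')(branch_b)

branch_c = layers.AveragePooling2D(3, strides=2, padding='same')(x)
branch_c = layers.Conv2D(128, 3, activation='relu', padding='same')(branch_c)

branch_d = layers.Conv2D(128, 1, activation='relu')(x)
branch_d = layers.Conv2D(128, 3, activation='relu', padding='same')(branch_d)
branch_d = layers.Conv2D(128, 3, activation='relu', strides=2, padding='same')(branch_d)

# Concatenate the four branch outputs along the channel axis
output = layers.concatenate([branch_a, branch_b, branch_c, branch_d], axis=-1)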
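Concatenating along the channel axis only works if every branch ends with the same height and width. One way to check this by hand is the usual output-length formula for a conv or pooling window (dilation 1): ceil(n / s) for padding='same' and ceil((n - k + 1) / s) for padding='valid'. This sketch applies it to the four branches above for a 28×28 input (an arbitrary choice), once with every layer using 'valid' (Keras's default) and once with 'same'.

```python
import math


def conv_output_length(length, kernel, stride, padding):
    """Output length along one spatial axis for a conv/pool window (dilation 1)."""
    if padding == 'same':
        return math.ceil(length / stride)
    if padding == 'valid':
        return math.ceil((length - kernel + 1) / stride)
    raise ValueError(f'unknown padding: {padding}')


def branch_sizes(h, padding):
    """Spatial size at the end of each Inception branch, all layers using `padding`."""
    a = conv_output_length(h, 1, 2, padding)                                        # 1x1, stride 2
    b = conv_output_length(conv_output_length(h, 1, 1, padding), 3, 2, padding)     # 1x1, 3x3/2
    c = conv_output_length(conv_output_length(h, 3, 2, padding), 3, 1, padding)     # pool 3/2, 3x3
    d = h
    for kernel, stride in [(1, 1), (3, 1), (3, 2)]:                                 # 1x1, 3x3, 3x3/2
        d = conv_output_length(d, kernel, stride, padding)
    return a, b, c, d


print(branch_sizes(28, 'valid'))  # (14, 13, 11, 12): cannot be concatenated
print(branch_sizes(28, 'same'))   # (14, 14, 14, 14): aligned
```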

The full Inception V3 architecture is built into Keras as keras.applications.inception_v3.InceptionV3, including weights pretrained on the ImageNet dataset.

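A residual connection makes the output of an earlier layer available as input to a later layer, effectively creating a shortcut in a sequential network. The earlier output is not concatenated with the later activation but added to it (this assumes both activations have the same shape). If their shapes differ, a linear transformation can reshape the earlier activation to the target shape.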
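A minimal NumPy sketch of both cases, using dense (fully connected) transformations in place of the convolutions to keep it short. The weights are random and untrained, and the matrix names W, W2 and P are illustrative only. With matching shapes the input is added back unchanged; when the main path changes the width, the shortcut goes through a linear projection first.

```python
import numpy as np

rng = np.random.default_rng(0)


def relu(z):
    return np.maximum(z, 0.0)


x = rng.normal(size=(4, 64))  # batch of 4 samples, 64 features

# Identity residual: the main path keeps the width at 64, so x is added back as-is
W = rng.normal(scale=0.1, size=(64, 64))
y_identity = relu(x @ W) + x

# Linear residual: the main path widens to 128, so the shortcut goes through
# a linear projection P (64 -> 128) before the sum
W2 = rng.normal(scale=0.1, size=(64, 128))
P = rng.normal(scale=0.1, size=(64, 128))
y_linear = relu(x @ W2) + x @ P

print(y_identity.shape, y_linear.shape)  # (4, 64) (4, 128)
```

If the main path outputs zeros, the identity version returns x itself, which is why an identity residual block can easily fall back to "do nothing".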

If the feature maps have the same size, a residual connection in Keras can be implemented as follows, using an identity residual connection. Again assume we have a 4D input tensor x.

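from keras import layers
from keras import Input

# Example input; an identity shortcut requires x to already have 128 channels
# so that it can be added to the 128-channel output y
x = Input(shape=(32, 32, 128))
# Transform x; padding='same' keeps the height and width unchanged
y = layers.Conv2D(128, 3, activation='relu', padding='same')(x)
y = layers.Conv2D(128, 3, activation='relu', padding='same')(y)
y = layers.Conv2D(128, 3, activation='relu', padding='same')(y)
# Add the original x back to the output features
y = layers.add([y, x])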

If the feature maps have different sizes, a residual connection can be implemented as follows, using a linear residual connection. Again assume we have a 4D input tensor x.

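from keras import layers
from keras import Input

# Example input with 64 channels and an even height/width, so that the 2×2 pooling
# and the stride-2 shortcut produce the same spatial size (16×16)
x = Input(shape=(32, 32, 64))
y = layers.Conv2D(128, 3, activation='relu', padding='same')(x)
y = layers.Conv2D(128, 3, activation='relu', padding='same')(y)
y = layers.MaxPooling2D(2, strides=2)(y)
# Use a 1×1 convolution to linearly downsample the original x to the same shape as y
residual = layers.Conv2D(128, 1, strides=2, padding='same')(x)
y = layers.add([y, residual])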

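Another important feature of the functional API is the ability to reuse a layer instance several times. If you call a layer instance twice, instead of instantiating a new layer for each call, every call reuses the same weights. This lets you build models with shared branches: several branches that all share the same knowledge and perform the same operations. In other words, the branches share the same representations and learn them simultaneously from different sets of inputs.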

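from keras import layers
from keras import Input
from keras.models import Model
import numpy as np

# Instantiate a single LSTM layer, once
lstm = layers.LSTM(32)

left_input = Input(shape=(None, 128))
left_output = lstm(left_input)

right_input = Input(shape=(None, 128))
# Calling the existing layer instance reuses its weights
right_output = lstm(right_input)

merged = layers.concatenate([left_output, right_output], axis=-1)
predictions = layers.Dense(1, activation='sigmoid')(merged)
model = Model([left_input, right_input], predictions)
# A model must be compiled before it can be trained
model.compile(optimizer='rmsprop', loss='binary_crossentropy', metrics=['acc'])

# Random dummy data: 100 pairs of sequences, 20 timesteps × 128 features each
left_data = np.random.random((100, 20, 128))
right_data = np.random.random((100, 20, 128))
targets = np.random.randint(2, size=(100, 1))
model.fit([left_data, right_data], targets)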
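What "reusing a layer instance" means can be sketched without Keras. Assume a hypothetical toy layer that owns one weight matrix: calling the same object on two inputs uses that one matrix both times, so identical inputs give identical outputs. Any update to it (for example by training) changes both branches at once, whereas a second, separately created layer has its own weights.

```python
import numpy as np


class ToyLayer:
    """Hypothetical toy layer: one weight matrix, reused by every call."""

    def __init__(self, in_dim, units, seed=0):
        self.kernel = np.random.default_rng(seed).normal(size=(in_dim, units))

    def __call__(self, x):
        return np.tanh(x @ self.kernel)


shared = ToyLayer(128, 32, seed=0)      # instantiated once, like `lstm` above
separate = ToyLayer(128, 32, seed=1)    # a different instance, different weights

left_batch = np.random.default_rng(2).normal(size=(2, 128))
right_batch = left_batch.copy()

left_out = shared(left_batch)           # both branches use shared.kernel
right_out = shared(right_batch)
print(np.allclose(left_out, right_out))                        # True
print(np.allclose(shared(left_batch), separate(left_batch)))   # False
```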

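In the functional API, models can be used like layers. In effect, you can think of a model as a "bigger layer". This holds for both the Sequential class and the Model class. It means you can call a model on an input tensor and get an output tensor: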

y = model(x)

If the model has multiple input tensors and multiple output tensors, call it with a list of tensors:

y1, y2 = model([x1, x2])

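When you call a model instance, you reuse the model's weights, just as calling a layer instance reuses the layer's weights. Calling any instance, whether a layer or a model, reuses the representations that instance has already learned.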
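The "model as a bigger layer" idea rests on composition: a container that applies callables in order is itself a callable, so it can be nested inside another container and reused. A toy sketch, where ToySequential is hypothetical and the "layers" are plain functions rather than Keras layers:

```python
import numpy as np


class ToySequential:
    """Hypothetical container: applies its callables in order and is itself callable."""

    def __init__(self, *steps):
        self.steps = steps

    def __call__(self, x):
        for step in self.steps:
            x = step(x)
        return x


def double(x):
    return 2 * x


def inc(x):
    return x + 1


inner = ToySequential(double, inc)            # a small "model": 2x + 1
outer = ToySequential(inner, inner, np.sqrt)  # reuses the same inner model twice

x = np.array([0.0, 1.0, 4.0])
print(inner(x))   # [1. 3. 9.]
print(outer(x))   # equals sqrt(2 * (2x + 1) + 1) = sqrt(4x + 3)
```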

Implementing a Siamese vision model (shared convolutional base) in Keras

from keras import layers
from keras import applications
from keras import Input


# The base image-processing model is the Xception network (convolutional base only)
xception_base = applications.Xception(weights=None, include_top=False)

# Inputs are 250×250 RGB images
left_input = Input(shape=(250, 250, 3))
left_features = xception_base(left_input)

right_input = Input(shape=(250, 250, 3))
# Call the same vision model a second time: both branches share its weights
right_features = xception_base(right_input)

# Merge the features of both branches along the channel axis
merged_features = layers.concatenate([left_features, right_features], axis=-1)

Notes:

1×1 convolutions

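We already know that a convolution extracts spatial patches around every tile of the input tensor and applies the same transformation to each patch. The extreme case is a patch consisting of a single tile. The convolution is then equivalent to running each tile vector through a Dense layer: it computes features that mix together information from the input tensor's channels, but it does not mix information across space (because it looks at only one tile at a time). Such 1×1 convolutions (also called pointwise convolutions) are a hallmark of Inception modules; they help separate channel-wise feature learning from spatial feature learning. This makes sense if you assume that each channel is highly autocorrelated across space, while different channels may not be highly correlated with each other.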
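The equivalence can be checked numerically. Keras stores a Conv2D kernel as (kernel_height, kernel_width, in_channels, out_channels), so a 1×1 kernel is effectively a plain in_channels × out_channels matrix; this NumPy sketch drops the two singleton axes and uses random data for one channels-last image. It compares the convolution against a Dense-style matrix product applied to each pixel vector separately.

```python
import numpy as np

rng = np.random.default_rng(0)
h, w, c_in, c_out = 5, 5, 8, 4
x = rng.normal(size=(h, w, c_in))          # one image, channels last
kernel = rng.normal(size=(c_in, c_out))    # 1x1 kernel with the singleton axes dropped

# 1x1 convolution: at every pixel, mix the channels with the same matrix
conv1x1 = np.einsum('hwc,cd->hwd', x, kernel)

# The same computation as a Dense layer applied to each pixel vector on its own
dense_per_pixel = np.empty((h, w, c_out))
for i in range(h):
    for j in range(w):
        dense_per_pixel[i, j] = x[i, j] @ kernel

print(np.allclose(conv1x1, dense_per_pixel))  # True
```

No pixel's output depends on its neighbours, which is exactly the "mixes channels, not space" property described above.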

Representational bottlenecks in deep learning

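In a Sequential model, each successive representation layer is built on top of the previous one, which means it only has access to the information contained in the previous layer's activations. If one layer is too small (for example, its features are too low-dimensional), the model is limited by how much information can be crammed into that layer's activations. Residual connections reinject earlier information into downstream layers, partially solving this problem for deep learning models.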
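A linear toy illustrates the bottleneck. This is an illustration, not a proof about nonlinear networks: squeezing 8 features through a 2-unit layer leaves at most 2 independent directions in the output, whatever comes after. Adding the input back through a residual connection restores the lost directions. The sizes and random weights are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(100, 8))      # 100 samples with 8 features

# Linear "bottleneck" path: squeeze 8 features down to 2, then expand back to 8
down = rng.normal(size=(8, 2))
up = rng.normal(size=(2, 8))
bottleneck_out = x @ down @ up

# Residual version: the original x is added back after the narrow path
residual_out = bottleneck_out + x

print(np.linalg.matrix_rank(bottleneck_out))  # 2: only 2 directions survive
print(np.linalg.matrix_rank(residual_out))    # 8 for generic random weights
```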

Vanishing gradients in deep learning

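Backpropagation, the main algorithm used to train deep neural networks, works by propagating a feedback signal from the output loss down to earlier layers. If this feedback signal has to pass through many layers, it can become very weak or even be lost entirely, making the network untrainable. This is known as the vanishing gradient problem.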

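The problem arises both in deep networks and in recurrent networks over very long sequences. In both cases the feedback signal must propagate through a long chain of operations. The LSTM layer introduces a carry track that propagates information along a track parallel to the main processing track. Residual connections work in a similar way in deep feedforward networks, but more simply: they introduce a purely linear information-carrying track parallel to the main stack of layers, which helps gradients propagate through arbitrarily deep stacks of layers.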
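The effect can be seen in a scalar toy chain. This sketch assumes each "layer" is h = sigmoid(h), with weight 1 and no bias, and applies the chain rule by hand. The sigmoid's derivative is at most 0.25, so 30 stacked layers multiply the gradient by less than 0.25**30. With a residual connection, h = h + sigmoid(h), each local derivative becomes 1 + sigmoid'(h), which is never below 1, so the signal cannot vanish.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def chain_gradient(depth, residual, h0=0.5):
    """d(h_depth)/d(h_0) for a scalar chain of sigmoid layers (weight 1, no bias)."""
    h, grad = h0, 1.0
    for _ in range(depth):
        s = sigmoid(h)
        local = s * (1.0 - s)                        # derivative of sigmoid, at most 0.25
        if residual:
            h, grad = h + s, grad * (1.0 + local)    # h_next = h + f(h)
        else:
            h, grad = s, grad * local                # h_next = f(h)
    return grad


print(chain_gradient(30, residual=False))  # smaller than 0.25**30 (about 8.7e-19)
print(chain_gradient(30, residual=True))   # at least 1.0: the identity path carries it
```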
