Neural Network Basics

Date: 2019-12-07


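To get started with deep learning and understand it further, there are a few concepts you cannot avoid, such as:

What a perceptron is
What a neural network is
Tensors and tensor operations
Differentiation
Gradient descent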

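Starting with a problem

Before starting, it helps to have some background in machine learning. Solving a problem starts with posing one, so let us pose this one: analyze the MNIST dataset, and clarify the concepts involved step by step along the way.

MNIST is a dataset of handwritten digits. You have probably seen it before: it is a classic dataset in machine learning, and it seems that almost every tutorial uses it, which itself shows how classic it is. A brief introduction: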

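A training set of 60,000 examples and a test set of 10,000 examples
Each image is a 28 × 28 matrix, which can also be flattened into a 784-dimensional vector
The images fall into 10 classes, one for each of the digits 0 to 9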
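To make the link between the 28 × 28 matrix and the 784-dimensional vector concrete, here is a minimal sketch. It uses a zero-filled stand-in array named `image` (an assumption for illustration, not real MNIST data) and simply flattens it:

```python
import numpy as np

# Hypothetical stand-in for a single MNIST image: 28 x 28 grayscale pixels.
image = np.zeros((28, 28), dtype=np.uint8)

# Flattening concatenates the 28 rows into one vector of 28 * 28 = 784 values.
vector = image.reshape(28 * 28)

print(image.shape, vector.shape)
```

The number of values is the same in both forms; only the arrangement changes.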

The archive contains the following files:

train-images-idx3-ubyte.gz: training set images (9912422 bytes)
train-labels-idx1-ubyte.gz: training set labels (28881 bytes)
t10k-images-idx3-ubyte.gz: test set images (1648877 bytes)
t10k-labels-idx1-ubyte.gz: test set labels (4542 bytes)

[Figure: sample of handwritten digits from the MNIST training set]

The code that generates the figure is as follows:

%matplotlib inline
# Jupyter magic: remove the line above when running as a plain Python script

import matplotlib.pyplot as plt
import numpy as np

from keras.datasets import mnist

(train_images, train_labels), (test_images, test_labels) = mnist.load_data()

def plot_digits(instances, images_per_row=10, **options):
    """Draw the given 28 x 28 digit images in a grid."""
    size = 28
    images_per_row = min(len(instances), images_per_row)
    # Build a Python list so that a padding block can be appended below
    images = [instance.reshape(size, size) for instance in instances]
    n_rows = (len(instances) - 1) // images_per_row + 1
    row_images = []
    # Pad the last row with a blank block so every row has the same width
    n_empty = n_rows * images_per_row - len(instances)
    images.append(np.zeros((size, size * n_empty)))
    for row in range(n_rows):
        rimages = images[row * images_per_row : (row + 1) * images_per_row]
        row_images.append(np.concatenate(rimages, axis=1))
    image = np.concatenate(row_images, axis=0)
    plt.imshow(image, cmap="binary", **options)
    plt.axis("off")

plt.figure(figsize=(9,9))
plot_digits(train_images[:100], images_per_row=10)
plt.show()

There is no need to rush to try it, though. Next we will analyze the handwritten digit training set step by step.

Look at this line of code:

(train_images, train_labels), (test_images, test_labels) = mnist.load_data()

The MNIST dataset is loaded through keras.datasets. train_images and train_labels make up the training set, and the other two make up the test set:

train_images.shape: (60000, 28, 28)
train_labels.shape: (60000,)

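What we want to do is simple: feed the training set into a neural network, train it to obtain the model we want, then have the model predict on the test set and check whether the predicted digits are correct.

Before building a neural network in code, let me briefly introduce what a neural network actually is, starting with the perceptron.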

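Perceptron

The perceptron, proposed by Frank Rosenblatt, is an artificial neural network made up of two layers of neurons. It caused a sensation when it appeared, because it was regarded as the first neural network that could learn.

The perceptron works as follows:

[Figure: a perceptron with three inputs on the left and a single output]

The three variables on the left are three different binary inputs, and output is a binary output. With several inputs, some may hold and some may not, so how should output be decided under their combined influence? Rosenblatt introduced weights to express the importance of each input.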

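The output can then be written as:

\[
\text{output} =
\begin{cases}
0 & \text{if } \sum_j w_j x_j + b \le 0 \\
1 & \text{if } \sum_j w_j x_j + b > 0
\end{cases}
\]

The piecewise expression on the right-hand side is a step function. It is an activation function and plays the same role as Sigmoid or ReLU. With that, we can implement a perceptron ourselves: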

import numpy as np


class Perceptron:
    """
    A perceptron as proposed by Frank Rosenblatt, trained here as an AND gate
    to build intuition for how a perceptron works
    blog: https://www.howie6879.cn/post/33/
    """

    def __init__(self, act_func, input_nums=2):
        """
        Set up the basic parameters
        :param act_func: activation function
        :param input_nums: number of inputs
        """
        # Activation function
        self.act_func = act_func
        # Weights, one per input (two binary inputs by default)
        self.w = np.zeros(input_nums)
        # Bias
        self.b = 0.0

    def fit(self, input_vectors, labels, learn_nums=10, rate=0.1):
        """
        Learn suitable values for w and b
        :param input_vectors: training samples
        :param labels: target values
        :param learn_nums: number of passes over the training data
        :param rate: learning rate
        """
        for i in range(learn_nums):
            for index, input_vector in enumerate(input_vectors):
                label = labels[index]
                output = self.predict(input_vector)
                delta = label - output
                self.w += input_vector * rate * delta
                self.b += rate * delta
        print("Perceptron weights: {0}, bias: {1}".format(self.w, self.b))
        return self

    def predict(self, input_vector):
        if isinstance(input_vector, list):
            input_vector = np.array(input_vector)
        return self.act_func(sum(self.w * input_vector) + self.b)


def f(z):
    """
    Activation function (step function)
    :param z: (w1*x1+w2*x2+...+wj*xj) + b
    :return: 1 or 0
    """
    return 1 if z > 0 else 0

def get_and_gate_training_data():
    '''
    Training data for the AND gate
    '''
    input_vectors = np.array([[1, 1], [1, 0], [0, 1], [0, 0]])
    labels = np.array([1, 0, 0, 0])
    return input_vectors, labels


if __name__ == '__main__':
    """
    Output (array formatting may vary with the NumPy version):
        Perceptron weights: [0.1 0.2], bias: -0.2
        1 and 1 = 1
        1 and 0 = 0
        0 and 1 = 0
        0 and 0 = 0
    """
    # Get the training samples
    and_input_vectors, and_labels = get_and_gate_training_data()
    # Instantiate the perceptron model
    p = Perceptron(f)
    # Learn AND
    p_and = p.fit(and_input_vectors, and_labels)
    # Predict AND
    print('1 and 1 = %d' % p_and.predict([1, 1]))
    print('1 and 0 = %d' % p_and.predict([1, 0]))
    print('0 and 1 = %d' % p_and.predict([0, 1]))
    print('0 and 0 = %d' % p_and.predict([0, 0]))

Sigmoid neurons

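A neuron and a perceptron are essentially the same; the difference is the activation function, for example replacing the step function with the Sigmoid function.

A neural network can adjust the weights and biases of its artificial neurons by learning from samples, so that its output becomes more accurate. How do we design such an algorithm for a neural network?

Take digit recognition as an example. Suppose the network wrongly classifies an image of a 9 as an 8. We can make small changes to the weights and biases so that the result moves toward the 9 we want; this is learning. A perceptron, however, returns either 0 or 1, so the following can easily happen: we finally manage to correct one target, say the image of a 9 classified as an 8, but the adjusted weights and biases now cause other samples to be misjudged. That kind of adjustment is not a good approach.

So we need the sigmoid neuron. It returns a real number between 0 and 1, so a small change in the weights and biases causes only a small change in the output. The output can now be written as σ(w⋅x+b), where σ is the Sigmoid function, defined as:

\[
\sigma(z) = \frac{1}{1 + e^{-z}}
\]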
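The difference between the step function and the Sigmoid function can be checked numerically. The following is a minimal sketch using the standard definition σ(z) = 1 / (1 + e^(-z)); the two values of z are made up for illustration and stand for the weighted input just before and just after a tiny change in the weights:

```python
import math

def step(z):
    """Perceptron activation: jumps from 0 to 1 at z = 0."""
    return 1 if z > 0 else 0

def sigmoid(z):
    """Sigmoid activation: a smooth value strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical weighted inputs just below and just above zero.
z_before, z_after = -0.01, 0.01

step_change = step(z_after) - step(z_before)           # the output flips completely
sigmoid_change = sigmoid(z_after) - sigmoid(z_before)  # the output barely moves

print(step_change, sigmoid_change)
```

With the step function the tiny change flips the output from 0 to 1, while the Sigmoid output moves only slightly. This is what makes gradual learning possible.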

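Neural networks

A neural network is simply many neurons connected according to certain rules. A neural network consists of the following components:

Input layer: receives the incoming data; here it should have 784 neurons
Hidden layers: extract features
Weights between layers: learned automatically
Each hidden layer has a carefully chosen activation function, such as Sigmoid or ReLU
Output layer: 10 outputs
The output of one layer is the input of the next; information always propagates forward and is never fed back: a feedforward neural network
With loops, where feedback is possible: a recurrent neural network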

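The handwritten digit training set is fed in at the input layer and passed forward through the hidden layer, and the output layer finally produces 10 probability values that sum to 1. Now we can look at the Keras code: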
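How ten raw scores become ten probabilities that sum to 1 can be sketched with a hand-written softmax, the function used by the output layer of the model in this post. The scores below are invented for illustration, and the helper is a plain-Python sketch rather than the Keras implementation:

```python
import math

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    # Subtracting the maximum keeps exp() from overflowing; it does not change the result.
    shifted = [s - max(scores) for s in scores]
    exps = [math.exp(s) for s in shifted]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw scores for the digits 0 to 9.
scores = [1.0, 0.5, 0.1, 2.0, 0.3, 0.0, 0.2, 3.0, 0.4, 0.6]
probabilities = softmax(scores)

# The predicted digit is the index with the highest probability.
predicted_digit = probabilities.index(max(probabilities))
print(predicted_digit, sum(probabilities))
```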

Step one, preprocess the data. The original data has shape (60000, 28, 28) with values in [0, 255]; we reshape it to (60000, 784) and rescale the values to [0, 1]:

train_images = train_images.reshape((60000, 28 * 28)) 
train_images = train_images.astype('float32') / 255

test_images = test_images.reshape((10000, 28 * 28)) 
test_images = test_images.astype('float32') / 255

Then one-hot encode the labels:

from keras.utils import to_categorical

train_labels = to_categorical(train_labels) 
test_labels = to_categorical(test_labels)
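To see what this encoding does to the labels, here is a hand-written NumPy sketch of one-hot encoding. The helper `one_hot` and the three sample labels are assumptions for illustration; it is meant to mirror the idea behind `to_categorical`, not to reproduce its implementation:

```python
import numpy as np

def one_hot(labels, num_classes=10):
    """Hand-written sketch of one-hot encoding integer labels."""
    encoded = np.zeros((len(labels), num_classes), dtype="float32")
    # In each row, set the column given by the label to 1.
    encoded[np.arange(len(labels)), labels] = 1.0
    return encoded

# Hypothetical labels for three images.
labels = np.array([5, 0, 4])
encoded = one_hot(labels)
print(encoded)
```

Each label becomes a vector of length 10 with a single 1 at the position of the digit, which matches the 10 outputs of the network.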

Step two, write the model:

from keras import models 
from keras import layers

network = models.Sequential() 
network.add(layers.Dense(512, activation='relu', input_shape=(28 * 28,))) 
network.add(layers.Dense(10, activation='softmax'))

network.compile(optimizer='rmsprop',loss='categorical_crossentropy', metrics=['accuracy'])
network.fit(train_images, train_labels, epochs=5, batch_size=128)

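There is one hidden layer with the relu activation function, and the output layer uses softmax to return an array of 10 probability values (summing to 1).

Two numbers are displayed during training: the loss of the network on the training data (loss) and its accuracy on the training data (acc).

That is all: building and training a neural network takes only these few lines of code. It is this short because the Keras API is well encapsulated, but the theory behind it still deserves careful study.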

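Data representations for neural networks

The Tensor in TensorFlow means tensor. The data stored in multidimensional NumPy arrays in the example above are tensors: a tensor is a container for data, a matrix is a 2D tensor, tensors generalize matrices to an arbitrary number of dimensions, and a dimension of a tensor is called an axis.

Scalars

A tensor that contains a single number is called a scalar (0D tensor):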

x = np.array(12)
print(x, x.ndim)
# 12 0

The number of axes of a tensor is also called its rank.

Vectors

An array of numbers is called a vector (1D tensor):

x = np.array([12, 3, 6, 14, 7])
print(x, x.ndim)
# [12  3  6 14  7] 1

Matrices

An array of vectors is called a matrix (2D tensor):

x = np.array([[5, 78, 2, 34, 0], [6, 79, 3, 35, 1], [7, 80, 4, 36, 2]])
print(x, x.ndim)
# [[ 5 78  2 34  0]
#  [ 6 79  3 35  1]
#  [ 7 80  4 36  2]] 2

3D tensors and higher-dimensional tensors

Packing several matrices into a new array gives a 3D tensor:

x = np.array([[[5, 78, 2, 34, 0], [6, 79, 3, 35, 1]], [[5, 78, 2, 34, 0], [6, 79, 3, 35, 1]], [[5, 78, 2, 34, 0], [6, 79, 3, 35, 1]]])
print(x, x.ndim)
# [[[ 5 78  2 34  0]
#   [ 6 79  3 35  1]]
#
#  [[ 5 78  2 34  0]
#   [ 6 79  3 35  1]]
#
#  [[ 5 78  2 34  0]
#   [ 6 79  3 35  1]]] 3

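Packing several 3D tensors into an array creates a 4D tensor.

Key attributes

A tensor is defined by the following three key attributes:

Number of axes: a 3D tensor has three axes, a matrix has two
Shape: a tuple of integers; the matrix above has shape (3, 5), the vector (5,), and the 3D tensor (3, 2, 5)
Data type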
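In NumPy these three attributes are available as `ndim`, `shape`, and `dtype`. A short sketch using the matrix from the earlier section (the explicit `dtype=np.int64` is an assumption added here so the data type is the same on every platform):

```python
import numpy as np

x = np.array([[5, 78, 2, 34, 0],
              [6, 79, 3, 35, 1],
              [7, 80, 4, 36, 2]], dtype=np.int64)

print(x.ndim)   # number of axes
print(x.shape)  # size along each axis
print(x.dtype)  # type of the stored values
```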

Manipulating tensors in NumPy

Take the train_images loaded earlier as an example:

(train_images, train_labels), (test_images, test_labels) = mnist.load_data()

For example, slicing selects digits 10 through 99 (index 100 is excluded):

train_images[10:100].shape
# (90, 28, 28)

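The notion of data batches

Deep learning models do not process the whole dataset at once; they split it into small batches, for example: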

batch = train_images[:128]
batch.shape
# (128, 28, 28)
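The same slicing gives any later batch. Here is a minimal sketch with a hypothetical helper `get_batch` and a small zero-filled stand-in array instead of the real `train_images`; it takes batches in order and does no shuffling:

```python
import numpy as np

# Hypothetical stand-in for train_images: 1000 blank 28 x 28 images.
images = np.zeros((1000, 28, 28), dtype=np.uint8)
batch_size = 128

def get_batch(data, n, size=batch_size):
    """Return the n-th batch (0-based) by slicing along the first axis."""
    return data[size * n : size * (n + 1)]

first_batch = get_batch(images, 0)
last_batch = get_batch(images, 7)  # 1000 = 7 * 128 + 104, so this batch is smaller

print(first_batch.shape, last_batch.shape)
```

The first axis is the sample axis, so only that axis is sliced; the final batch is simply shorter when the dataset size is not a multiple of the batch size.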

Real-world data tensors

The shapes of data commonly found in the real world are:

Vector data: 2D tensors, (samples, features)
Time series or sequence data: 3D tensors, (samples, timesteps, features)
Images: 4D tensors, (samples, height, width, channels) or (samples, channels, height, width)
Video: 5D tensors, (samples, frames, height, width, channels) or (samples, frames, channels, height, width)

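Tensor operations

Just as the computations of any computer program can be reduced to operations on binary values, the computations in deep learning can be reduced to a handful of tensor operations on tensors of numeric data.

The hidden layer of the model above is: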

keras.layers.Dense(512, activation='relu')

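This layer can be understood as a function that takes a 2D tensor and returns a 2D tensor, much like the output function at the end of the perceptron section: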

output = relu(dot(W, input) + b)
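Written out in NumPy, one such layer might look like the sketch below. The batch of 4 random inputs and the randomly initialized `W` and `b` are assumptions for illustration. Unlike the formula above, the input is placed on the left of the product so that the batch stays on the first axis, which is the usual layout for (samples, features) data:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical batch of 4 flattened images, each with 784 features.
inputs = rng.random((4, 784))

# Illustrative parameters: small random weights and zero biases.
W = rng.normal(scale=0.01, size=(784, 512))
b = np.zeros(512)

def relu(x):
    """Element-wise max(x, 0)."""
    return np.maximum(x, 0)

# One dense layer: matrix product, bias added by broadcasting, then ReLU.
output = relu(np.dot(inputs, W) + b)
print(output.shape)
```

A 2D tensor of shape (4, 784) goes in and a 2D tensor of shape (4, 512) comes out, one row per sample.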

Element-wise operations

ReLU and addition are both element-wise operations, for example:

# Example input
input_x = np.array([[2], [3], [1]])
# Weights
W = np.array([[5, 6, 1], [7, 8, 1]])
# Compute the output z
z = np.dot(W, input_x)

# Implement the activation function
def naive_relu(x):
    assert len(x.shape) == 2
    x = x.copy()
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            x[i, j] = max(x[i, j], 0)
    return x

# Output after applying the activation function
output = naive_relu(z)
output
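The explicit loops make the element-wise nature visible, but in practice NumPy can apply the same operation to every element at once. A small sketch comparing the two, with a made-up input that contains negative values so that the effect of ReLU shows:

```python
import numpy as np

def naive_relu(x):
    """Loop-based ReLU for a 2D array, as in the example above."""
    x = x.copy()
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            x[i, j] = max(x[i, j], 0)
    return x

# Hypothetical pre-activation values, including negatives.
z = np.array([[-3, 2], [5, -1]])

looped = naive_relu(z)
vectorized = np.maximum(z, 0)  # NumPy applies max(value, 0) to every element

print(vectorized)
```

Both versions give the same result; the vectorized form is shorter and usually much faster on large arrays.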

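Broadcasting

The tensor operations section contained this line:

output = relu(dot(W, input) + b)

dot(W, input) is a 2D tensor and b is a vector. What happens when two tensors of different shapes are added?

If there is no ambiguity, the smaller tensor is broadcast to match the shape of the larger one: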

# Example input
input_x = np.array([[1], [3]])
# Weights
W = np.array([[5, 6], [7, 8]])
b = np.array([1])
# Compute the output z
z = np.dot(W, input_x) + b
# array([[24],
#        [32]])
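What broadcasting does behind the scenes can be made explicit with `np.broadcast_to`. The sketch below starts from the product of the example above, written out as a literal, and shows that stretching `b` by hand gives the same result as letting NumPy broadcast it:

```python
import numpy as np

z = np.array([[23], [31]])  # shape (2, 1), the value of np.dot(W, input_x) above
b = np.array([1])           # shape (1,)

# Broadcasting conceptually stretches b to the shape of z before adding.
b_stretched = np.broadcast_to(b, z.shape)

explicit = z + b_stretched
implicit = z + b

print(b_stretched.shape, implicit.tolist())
```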

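Tensor dot product

The dot operation, also called the tensor product (not to be confused with the element-wise product), for example: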

import numpy as np

# Example input
input_x = np.array([[2], [3], [1]])
# Weights
W = np.array([[5, 6, 1], [7, 8, 1]])
np.dot(W, input_x)

The dot product of two vectors is a scalar:

def naive_vector_dot(x, y):
    assert len(x.shape) == 1
    assert len(y.shape) == 1 
    assert x.shape[0] == y.shape[0]
    z = 0.
    for i in range(x.shape[0]):
        z += x[i] * y[i] 
    return z

x = np.array([1,2])
y = np.array([1,2])

naive_vector_dot(x, y)

# 5.0

The dot product of a matrix and a vector is a vector:

np.dot(W, [1, 2, 3])
# array([20, 26])

Tensor reshaping

Earlier, when preprocessing the data:

train_images = train_images.reshape((60000, 28 * 28)) 
train_images = train_images.astype('float32') / 255

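The example above changes the shape of the input data to (60000, 784). Reshaping a tensor means rearranging its rows and columns to obtain the desired shape; the total number of elements stays the same. A frequently encountered special case of reshaping is transposition: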

x = np.zeros((300, 20))
x = np.transpose(x)
x.shape
# (20, 300)
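Reshaping and transposing can produce the same shape while arranging the elements differently, which is easy to miss with an all-zero array. A small sketch with a made-up 3 × 2 array:

```python
import numpy as np

x = np.array([[0, 1],
              [2, 3],
              [4, 5]])          # shape (3, 2)

reshaped = x.reshape((2, 3))    # keeps the row-by-row element order
transposed = np.transpose(x)    # swaps rows and columns

print(reshaped.tolist())
print(transposed.tolist())
```

Both results have shape (2, 3) and contain the same six values, but in a different order.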

Gradient-based optimization

For each input, the neural network transforms the input data with the following function:

output = relu(dot(W, input_x) + b)

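where:

relu: the activation function
W: a tensor holding the weights; it can initially be filled with small random values (random initialization)
b: a tensor holding the biases

Now we need an algorithm that finds the weights and biases so that the network output y = y(x) fits the training inputs x.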

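Back to the perceptron

Learning in a perceptron is the process of repeatedly tuning its weights and bias; the bias can be thought of as the weight of an input that is always 1. So how are the weights updated?

First, a concept: the loss function. Here is an explanation from Li Hang's book Statistical Learning Methods (統計學習方法):

A supervised learning problem selects a model f from the hypothesis space as the decision function. For a given input X, f(X) gives the corresponding output Y. The predicted value f(X) may or may not agree with the true value Y, and a loss function (or cost function) is used to measure the degree of prediction error. The loss function is a non-negative real-valued function of f(X) and Y, written L(Y, f(X)).

The average loss of the model f(X) over the training dataset is called the empirical risk. The weight adjustment described above keeps reducing the empirical risk in order to find the best model f(X). Ignoring regularization for now, the objective of empirical risk minimization is:

\[
\min_{f \in \mathcal{F}} \; \frac{1}{N} \sum_{i=1}^{N} L\bigl(y_i, f(x_i)\bigr)
\]

where N is the number of training samples and \mathcal{F} is the hypothesis space.

The weights at which this objective function reaches its minimum are the weights of our perceptron. Before the derivation, we need to understand two more concepts:

What a derivative is
What a gradient is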

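What a derivative is

Suppose there is a continuous, smooth function f(x) = y. What does continuity mean? It means that a small change in x can only produce a small change in y.

Suppose two points a and b on f(x) are close enough to each other. Between them, f can be approximated by a linear function with slope k, and we can say that the slope k is the derivative of f at b.

In short, the derivative describes how f(x) changes when x changes. If you want to reduce the value of f(x), move x a small step in the direction opposite to the derivative, and vice versa.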
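The slope between two nearby points can be computed directly. A minimal sketch, assuming the example function f(x) = x² (chosen here for illustration because its derivative, 2x, is known), estimates the derivative and then takes one small step against it:

```python
def f(x):
    """Example function: f(x) = x ** 2, whose derivative is 2 * x."""
    return x ** 2

def numerical_derivative(func, x, eps=1e-6):
    """Slope of the line through two nearby points around x."""
    return (func(x + eps) - func(x - eps)) / (2 * eps)

x = 3.0
slope = numerical_derivative(f, x)

# Move a small step against the sign of the derivative to reduce f(x).
step_size = 0.1
x_new = x - step_size * slope

print(slope, f(x), f(x_new))
```

The estimated slope is close to 6, and after the step the function value is smaller than before.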

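What a gradient is

The gradient is the derivative of a tensor operation. It generalizes the concept of the derivative to functions of several variables. It points in the direction in which the function value increases fastest, so the direction in which the function value decreases fastest is the opposite of the gradient.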
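Repeatedly stepping against the gradient is the core of gradient descent. The sketch below assumes a simple two-variable function with a known minimum at (1, -2); the function, the starting point, and the learning rate are all chosen for illustration:

```python
def f(x, y):
    """Example function of two variables with its minimum at (1, -2)."""
    return (x - 1) ** 2 + (y + 2) ** 2

def gradient(x, y):
    """Partial derivatives of f with respect to x and y."""
    return 2 * (x - 1), 2 * (y + 2)

x, y = 5.0, 5.0
learning_rate = 0.1

for _ in range(100):
    gx, gy = gradient(x, y)
    # Step against the gradient, the direction of steepest descent.
    x -= learning_rate * gx
    y -= learning_rate * gy

print(round(x, 4), round(y, 4), f(x, y))
```

After 100 steps the point has moved from (5, 5) to very near the minimum.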

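Stochastic gradient descent

The derivation is as follows:

[Figure: derivation of the weight update rule]

This line in the perceptron code:

self.w += input_vector * rate * delta

corresponds to the update rule that the derivation arrives at.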
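One common way to arrive at this rule is sketched below. It is an assumption-laden reconstruction, not the original derivation: it takes a squared-error loss on the linear output for a single sample, and the symbols η (learning rate), y (label), and ŷ (output) are names chosen here.

```latex
\begin{aligned}
E(w, b) &= \tfrac{1}{2}\,(y - \hat{y})^2, \qquad \hat{y} = w \cdot x + b \\
\frac{\partial E}{\partial w} &= -(y - \hat{y})\,x, \qquad
\frac{\partial E}{\partial b} = -(y - \hat{y}) \\
w &\leftarrow w - \eta\,\frac{\partial E}{\partial w} = w + \eta\,(y - \hat{y})\,x \\
b &\leftarrow b - \eta\,\frac{\partial E}{\partial b} = b + \eta\,(y - \hat{y})
\end{aligned}
```

In the perceptron code, `rate` plays the role of η and `delta` plays the role of y − ŷ, although there it is computed from the step-function output rather than the linear one. Updating after every single sample, instead of after the whole dataset, is what makes the procedure stochastic.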

Summary

Here is the complete code for the handwritten digit recognition model:

from keras import models
from keras import layers
from keras.datasets import mnist
from keras.utils import to_categorical

(train_images, train_labels), (test_images, test_labels) = mnist.load_data()

train_images = train_images.reshape((60000, 28 * 28)) 
train_images = train_images.astype('float32') / 255

test_images = test_images.reshape((10000, 28 * 28)) 
test_images = test_images.astype('float32') / 255

train_labels = to_categorical(train_labels) 
test_labels = to_categorical(test_labels)


network = models.Sequential() 
network.add(layers.Dense(512, activation='relu', input_shape=(28 * 28,))) 
network.add(layers.Dense(10, activation='softmax'))

network.compile(optimizer='rmsprop',loss='categorical_crossentropy', metrics=['accuracy'])
network.fit(train_images, train_labels, epochs=5, batch_size=128)

test_loss, test_acc = network.evaluate(test_images, test_labels)
print('test_acc:', test_acc)

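The input data is stored in NumPy tensors of type float32, with shapes (60000, 784) and (10000, 784)
The network structure is one input layer, one hidden layer, and one output layer
categorical_crossentropy is a loss function for classification models
Each batch has 128 samples and training runs for 5 epochs; with 469 batches per epoch, the weights are updated 469 * 5 = 2345 times in total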
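The number of updates follows from the dataset size, the batch size, and the number of epochs. A short sketch of the arithmetic (the variable names are chosen here for illustration):

```python
import math

num_samples = 60000
batch_size = 128
epochs = 5

# The last batch of each epoch is smaller, so round up.
batches_per_epoch = math.ceil(num_samples / batch_size)
total_updates = batches_per_epoch * epochs

print(batches_per_epoch, total_updates)
```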

Notes

Thanks to the authors of the books and articles that influenced this post.
