In the previous post I gave a rough introduction to what a neural network actually is, hopefully helping readers who have never touched this field build an intuitive sense of it. In this post I will walk through simple mathematical derivations and Python implementations of the two most fundamental elements of a neural network, and assemble a deep neural network at the end. To keep the basic arithmetic easy to follow, I will use only `numpy` for the math in this post.
The most common use of a neural network is this: given some input, process it and produce a result as output, whether a prediction, a classification, or something else. When a neural network performs such a task, the input (usually a set of feature values) is fed into a network of interconnected nodes. These individual nodes are called perceptrons, or neurons, and they are the basic building blocks of a neural network. Each perceptron decides how to classify the data based on its inputs.
Take school admissions as an example. Here is one school's admission record for past applicants:
Knowing a student's college entrance exam score and EQ test result, we can then predict whether they would be admitted by this school. Judging from the historical data, admission is decided jointly by the exam score and the EQ test. Neither factor alone determines the outcome; each carries a certain weight. Assuming we already know these two weights, a neural network for this admission prediction might be structured like this:
When feature data is fed into a perceptron, it is multiplied by the weight assigned to that particular input. For example, the perceptron in the figure above has two inputs, test and iq, so it has two associated weights, each of which can be adjusted independently. A larger weight means the network considers that input more important than the others; a smaller weight means the data matters less. As an extreme example, if the test score had no effect on admission at all, the weight on the test score would be zero, meaning it contributes nothing to the perceptron's output.
The process in which the perceptron applies the weights to the inputs and sums them up is called a linear combination. Expressed compactly in math:
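$$h = w_1 x_1 + w_2 x_2 + \dots + w_n x_n = \sum_{i=1}^{n} w_i x_i$$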
This calculation alone is not yet enough to conveniently predict whether the student will be admitted; the perceptron's sum has to be converted into an output signal before the final result can be produced. In this example, the output might be: admitted (1) or not admitted (0).
This is done by passing the linear combination to an activation function f. A simple activation function that is up to the job is:
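$$f(h) = \begin{cases} 0 & \text{if } h \le 0 \\ 1 & \text{if } h > 0 \end{cases}$$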
To make the math fully general, the formula also gets a bias term that shifts the output signal. With that, we have a complete perceptron formula:
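$$y = f\left(\sum_{i=1}^{n} w_i x_i + b\right)$$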
Note that when the data has been preprocessed well enough (let's leave how to preprocess data for another time), we don't really need the bias term. So don't be surprised if it disappears from the derivations and code that follow.
Here is a sample Python implementation of the perceptron:
```python
import numpy as np

# Step activation: fire (1) only when the weighted sum plus bias is positive
def activation(h):
    if h <= 0:
        return 0
    else:
        return 1

inputs = np.array([0.7, 0.3])
weights = np.random.normal(loc=0.0, scale=1, size=(1, 2))
bias = np.random.normal(loc=0.0, scale=1, size=(1))

output = activation(np.dot(weights, inputs) + bias)
print('Output: {}'.format(output))
```
To sum up, a single perceptron has the structure shown on the left below. To solve the prediction problem above, however, the network would not be a lone perceptron as in the earlier figure, but many perceptrons organized into layers (right side below): the output of one perceptron can become the input of another, and the final result emerges after several layers of computation. A single prediction involves running every perceptron in the network once; this process is called forward propagation, as in the sketch below.
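As a minimal sketch of forward propagation through one hidden layer (the layer sizes and random weights here are made up for illustration; the full implementation comes later in this post):

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

x = np.array([0.7, 0.3])               # input features, e.g. test and iq
W1 = np.random.normal(size=(2, 3))     # input -> hidden weights: 2 inputs, 3 hidden perceptrons
W2 = np.random.normal(size=(3, 1))     # hidden -> output weights

hidden = sigmoid(np.dot(x, W1))        # each hidden perceptron: linear combination + activation
output = sigmoid(np.dot(hidden, W2))   # hidden outputs become the next layer's inputs
print('Output: {}'.format(output))
```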
With the perceptrons above in place, we could start predicting admissions. But in all likelihood, such a network will not give reliable predictions, because at this point we have no idea what the weight of each input feature should be, and unreliable weights naturally produce unreliable results. Fortunately, we have plenty of historical data at hand: we know which kinds of students were admitted and which were not. We can feed those historical student records into the network and see how its outputs differ from the actual outcomes, then correct the weights based on that difference. Keep this up, and the network becomes more and more accurate (hopefully).
This process is called training the neural network, and the existing real-world data is called the training set. When a network is first created, its weights are random values. As the network learns from the training set which inputs lead to which outputs, it adjusts the weights according to the classification errors it made under the previous weights.
To pull off this trick, we need to sort out two things: how to quantify the gap between the network's output and the actual result, and how to adjust the weights based on that gap.
To quantify the gap in the output, a very intuitive approach is simply to subtract the computed result ŷ from the true result y. But this is not the best method, because it can produce negative numbers, which makes it hard to judge the size of the difference. Instead, we use the square of (y − ŷ) to quantify the error of each prediction during training. Summed over all the training data the network has run through, the total error is (why the 1/2 in front? purely to make the later algebra cleaner):
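$$E = \frac{1}{2}\sum_{\mu}\bigl(y^{\mu} - \hat{y}^{\mu}\bigr)^2$$

where $\mu$ runs over all records in the training set.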
This quantity is called the SSE (Sum of Squared Errors of prediction). For the network to perform as well as possible, we want the SSE to be as small as possible: the smaller the SSE, the closer the network's outputs are to reality.
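As a quick numeric sanity check (the numbers here are made up), the SSE of a batch of predictions is one line of numpy:

```python
import numpy as np

y = np.array([1, 0, 1, 1])               # actual outcomes
y_hat = np.array([0.8, 0.2, 0.6, 0.9])   # network outputs
sse = 0.5 * np.sum((y - y_hat) ** 2)
print(sse)  # 0.5 * (0.04 + 0.04 + 0.16 + 0.01) = 0.125
```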
The next question is how to adjust the weights and the bias. As the formula shows, the SSE depends on the inputs x and the weights w. There is nothing we can do about the inputs, so the weights are the only knob we can turn. To keep the explanation clear, consider a single training record and its corresponding output in isolation. Suppose the SSE varies with the weight w as in the figure below.
To minimize the SSE, the weight must keep being adjusted in every training iteration until it finally reaches the value where the SSE is smallest. This process is gradient descent.
Each adjustment moves the weight in the direction opposite to the gradient at the current w, by a step whose size is proportional to that gradient. Here is a short derivation:
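For a single training record, write the error as

$$E = \frac{1}{2}\,(y - \hat{y})^2,\qquad \hat{y} = f(h),\qquad h = \sum_i w_i x_i$$

Applying the chain rule:

$$\frac{\partial E}{\partial w_i} = -(y - \hat{y})\,f'(h)\,x_i$$

Stepping against the gradient, scaled by a factor $\eta$:

$$\Delta w_i = -\eta\,\frac{\partial E}{\partial w_i} = \eta\,(y - \hat{y})\,f'(h)\,x_i = \eta\,\delta\,x_i,\qquad \delta = (y - \hat{y})\,f'(h)$$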
In the computation, δ is called the error term; it has no real meaning of its own and exists purely for mathematical convenience. η is called the learning rate and is set by the developer; it controls how fast the weights change. Setting this parameter well is especially important when training a neural network: too low a learning rate and the network takes a very long time to reach a decent accuracy; too high and the network keeps jumping past the optimal weights, so its accuracy fluctuates during training and may never reach the best state that is actually attainable.
Here is a sample Python implementation of gradient descent:
```python
import numpy as np

# Sigmoid is used as the activation function here
def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# `features`, `targets`, `features_test` and `targets_test` are assumed
# to be preloaded training and test data (pandas DataFrames / arrays)
np.random.seed(42)
n_records, n_features = features.shape
last_loss = None
weights = np.random.normal(scale=1 / n_features**.5, size=n_features)

epochs = 1000
learnrate = 0.5

for e in range(epochs):
    del_w = np.zeros(weights.shape)
    for x, y in zip(features.values, targets):
        # The power of the formulas: output, error and error term
        output = sigmoid(np.dot(x, weights))
        error = y - output
        error_term = error * output * (1 - output)
        del_w += error_term * x
    weights += learnrate * del_w / n_records

    if e % (epochs / 10) == 0:
        out = sigmoid(np.dot(features, weights))
        loss = np.mean((out - targets) ** 2)
        print("Train loss: ", loss)

tes_out = sigmoid(np.dot(features_test, weights))
predictions = tes_out > 0.5
accuracy = np.mean(predictions == targets_test)
print("Prediction accuracy: {:.3f}".format(accuracy))
```
In the opposite direction to forward propagation, the weights in a complex network are updated starting from the last layer (the output) and working back through the earlier layers; this process is backpropagation. Its inventor, deep learning godfather Geoffrey Hinton, recently pointed out that today's backpropagation algorithm has many flaws and urgently needs a replacement. While we look forward to new results from the greats, backpropagation remains the most effective learning method we have.
Below is a neural network with one hidden layer implemented using nothing but `numpy`. The activation functions of the hidden layer and the output layer are, respectively:
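$$\sigma(x) = \frac{1}{1 + e^{-x}}\ \text{(hidden layer)},\qquad f(x) = x\ \text{(output layer)}$$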
NeuralNetwork.py:
```python
import numpy as np


class NeuralNetwork:
    def __init__(self, input_nodes, hidden_nodes, output_nodes, learning_rate,
                 weights_input_to_hidden=None, weights_hidden_to_output=None):
        self.input_nodes = input_nodes
        self.hidden_nodes = hidden_nodes
        self.output_nodes = output_nodes

        # Initialize weights randomly unless pretrained weights are supplied
        if weights_input_to_hidden is None and weights_hidden_to_output is None:
            self.weights_input_to_hidden = np.random.normal(
                0.0, self.input_nodes**-0.5,
                (self.input_nodes, self.hidden_nodes))
            self.weights_hidden_to_output = np.random.normal(
                0.0, self.hidden_nodes**-0.5,
                (self.hidden_nodes, self.output_nodes))
        else:
            self.weights_input_to_hidden = weights_input_to_hidden
            self.weights_hidden_to_output = weights_hidden_to_output

        self.lr = learning_rate

        def sigmoid(x):
            return 1 / (1 + np.exp(-x))

        def sigmoid_prime(x):
            return sigmoid(x) * (1 - sigmoid(x))

        def linear(x):
            return x

        def linear_prime(x):
            return x ** 0

        # Activation functions
        self.activation_function = sigmoid
        self.activation_function_prime = sigmoid_prime
        self.activation_function2 = linear
        self.activation_function_prime2 = linear_prime

    def train(self, features, targets):
        n_records = features.shape[0]
        delta_weights_i_h = np.zeros(self.weights_input_to_hidden.shape)
        delta_weights_h_o = np.zeros(self.weights_hidden_to_output.shape)
        for X, y in zip(features, targets):
            # Forward pass
            hidden_inputs = np.dot(X, self.weights_input_to_hidden)
            hidden_outputs = self.activation_function(hidden_inputs)
            final_inputs = np.dot(hidden_outputs, self.weights_hidden_to_output)
            final_outputs = self.activation_function2(final_inputs)

            # Backward pass
            error = y - final_outputs
            output_error_term = error * self.activation_function_prime2(final_outputs)
            hidden_error = np.dot(output_error_term, self.weights_hidden_to_output.T)
            hidden_error_term = hidden_error * self.activation_function_prime(hidden_inputs)

            # Weight steps
            delta_weights_i_h += hidden_error_term * X[:, None]
            delta_weights_h_o += output_error_term * hidden_outputs[:, None]

        self.weights_hidden_to_output += self.lr * delta_weights_h_o / n_records
        self.weights_input_to_hidden += self.lr * delta_weights_i_h / n_records

    def run(self, features):
        hidden_inputs = np.dot(features, self.weights_input_to_hidden)
        hidden_outputs = self.activation_function(hidden_inputs)
        final_inputs = np.dot(hidden_outputs, self.weights_hidden_to_output)
        final_outputs = self.activation_function2(final_inputs)
        return final_outputs

    def get_weights(self):
        return self.weights_input_to_hidden, self.weights_hidden_to_output
```
DataProcessor.py:
```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt


class DataProcessor:
    def __init__(self, data_path):
        self.orig_data = pd.read_csv(data_path)
        self.data = self.orig_data
        self.scaled_features = {}
        self.train_features = None
        self.train_targets = None
        self.test_features = None
        self.test_targets = None
        self.test_data = None
        self.val_features = None
        self.val_targets = None

    def show_data(self, plot_by_dteday=False):
        print(self.data.head())
        if plot_by_dteday:
            self.data[:24*10].plot(x='dteday', y='cnt', title='Data for the first 10 days')
            plt.show()

    def virtualize(self):
        # Add dummy (one-hot) columns for the categorical fields
        dummy_fields = ['season', 'weathersit', 'mnth', 'hr', 'weekday']
        for each in dummy_fields:
            dummies = pd.get_dummies(self.data[each], prefix=each, drop_first=False)
            self.data = pd.concat([self.data, dummies], axis=1)

        # Drop the original categorical columns and other unused fields
        fields_to_drop = ['instant', 'dteday', 'season', 'weathersit',
                          'weekday', 'atemp', 'mnth', 'workingday', 'hr']
        self.data = self.data.drop(fields_to_drop, axis=1)

    def normalize(self):
        quant_features = ['casual', 'registered', 'cnt', 'temp', 'hum', 'windspeed']
        for each in quant_features:
            mean, std = self.data[each].mean(), self.data[each].std()
            self.scaled_features[each] = [mean, std]
            self.data.loc[:, each] = (self.data[each] - mean) / std

    def split(self):
        # Save data of the last 21 days for testing
        self.test_data = self.data[-21 * 24:]
        self.data = self.data[:-21 * 24]

        target_fields = ['cnt', 'casual', 'registered']
        features, targets = self.data.drop(target_fields, axis=1), self.data[target_fields]
        self.test_features = self.test_data.drop(target_fields, axis=1)
        self.test_targets = self.test_data[target_fields]

        # Hold out the last 60 days of the remaining data for validation
        self.train_features, self.train_targets = features[:-60*24], targets[:-60*24]
        self.val_features, self.val_targets = features[-60*24:], targets[-60*24:]

    def get_train_data(self):
        return self.train_features, self.train_targets

    def get_test_data(self):
        return self.test_features, self.test_targets, self.test_data

    def get_val_data(self):
        return self.val_features, self.val_targets

    def get_scaled_features(self):
        return self.scaled_features

    def get_orig_data(self):
        return self.orig_data
```
Train.py:
```python
import sys
import json
from pprint import pprint

import DataProcessor
import NeuralNetwork
import numpy as np
import matplotlib.pyplot as plt

# Get training parameters
with open('networkConfig.json') as config_file:
    config = json.load(config_file)
pprint(config)
iterations = config['iterations']
learning_rate = config['learning_rate']
hidden_nodes = config['hidden_nodes']
output_nodes = config['output_nodes']

# Get data
data_processor = DataProcessor.DataProcessor('Bike-Sharing-Dataset/hour.csv')
data_processor.virtualize()
data_processor.normalize()
data_processor.split()
train_features, train_targets = data_processor.get_train_data()
val_features, val_targets = data_processor.get_val_data()

# Initialize NeuralNetwork
N_i = train_features.shape[1]
network = NeuralNetwork.NeuralNetwork(N_i, hidden_nodes, output_nodes, learning_rate)

losses = {'train': [], 'validation': []}

def MSE(y, Y):
    return np.mean((y-Y)**2)

for ii in range(iterations):
    # Pick 128 random records from the training data set
    batch = np.random.choice(train_features.index, size=128)
    X, y = train_features.loc[batch].values, train_targets.loc[batch]['cnt']
    network.train(X, y)

    train_loss = MSE(network.run(train_features).T, train_targets['cnt'].values)
    val_loss = MSE(network.run(val_features).T, val_targets['cnt'].values)
    sys.stdout.write("\rProgress: {:2.1f}".format(100 * ii/float(iterations)) \
                     + "% ... Training loss: " + str(train_loss)[:5] \
                     + " ... Validation loss: " + str(val_loss)[:5])
    sys.stdout.flush()

    losses['train'].append(train_loss)
    losses['validation'].append(val_loss)

# Store weights
weights_input_to_hidden, weights_hidden_to_output = network.get_weights()
np.save('weights_input_to_hidden', weights_input_to_hidden)
np.save('weights_hidden_to_output', weights_hidden_to_output)

# Plot losses
plt.plot(losses['train'], label='Training loss')
plt.plot(losses['validation'], label='Validation loss')
plt.legend()
_ = plt.ylim()
plt.show()
```
Run.py:
```python
import json
from pprint import pprint

import DataProcessor
import NeuralNetwork
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Get training parameters
with open('networkConfig.json') as config_file:
    config = json.load(config_file)
pprint(config)
iterations = config['iterations']
learning_rate = config['learning_rate']
hidden_nodes = config['hidden_nodes']
output_nodes = config['output_nodes']

# Get data
data_processor = DataProcessor.DataProcessor('Bike-Sharing-Dataset/hour.csv')
data_processor.virtualize()
data_processor.normalize()
data_processor.split()
test_features, test_targets, test_data = data_processor.get_test_data()
scaled_features = data_processor.get_scaled_features()
orig_data = data_processor.get_orig_data()
mean, std = scaled_features['cnt']

# Initialize the network with the stored weights
weights_input_to_hidden = np.load('weights_input_to_hidden.npy')
weights_hidden_to_output = np.load('weights_hidden_to_output.npy')
N_i = test_features.shape[1]
network = NeuralNetwork.NeuralNetwork(N_i, hidden_nodes, output_nodes, learning_rate,
                                      weights_input_to_hidden=weights_input_to_hidden,
                                      weights_hidden_to_output=weights_hidden_to_output)

# Run network prediction (undo the normalization of 'cnt')
predictions = network.run(test_features).T * std + mean

# Plot prediction and ground truth
fig, ax = plt.subplots(figsize=(8, 4))
ax.plot(predictions[0], label='Prediction')
ax.plot((test_targets['cnt']*std + mean).values, label='Data')
ax.set_xlim(right=len(predictions))
ax.legend()

dates = pd.to_datetime(orig_data.loc[test_data.index]['dteday'])
dates = dates.apply(lambda d: d.strftime('%b %d'))
ax.set_xticks(np.arange(len(dates))[12::24])
_ = ax.set_xticklabels(dates[12::24], rotation=45)
plt.show()
```
networkConfig.json:
```json
{
    "iterations": 10000,
    "learning_rate": 0.1,
    "hidden_nodes": 7,
    "output_nodes": 1
}
```
The data comes from the UCI Machine Learning Repository's Bike Sharing Dataset; download and unpack it first:

```bash
> curl -O https://archive.ics.uci.edu/ml/machine-learning-databases/00275/Bike-Sharing-Dataset.zip
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  273k  100  273k    0     0  26888      0  0:00:10  0:00:10 --:--:-- 59889

> unzip Bike-Sharing-Dataset.zip
Archive:  Bike-Sharing-Dataset.zip
  inflating: Readme.txt
  inflating: day.csv
  inflating: hour.csv
```
Trained on this data, the neural network will be able to predict rental bike usage.
```python
>>> from DataProcessor import DataProcessor as dp
>>> data_processor = dp('Bike-Sharing-Dataset/hour.csv')
>>> data_processor.show_data()
   instant      dteday  season  yr  mnth  hr  holiday  weekday  workingday  \
0        1  2011-01-01       1   0     1   0        0        6           0
1        2  2011-01-01       1   0     1   1        0        6           0
2        3  2011-01-01       1   0     1   2        0        6           0
3        4  2011-01-01       1   0     1   3        0        6           0
4        5  2011-01-01       1   0     1   4        0        6           0

   weathersit  temp   atemp   hum  windspeed  casual  registered  cnt
0           1  0.24  0.2879  0.81        0.0       3          13   16
1           1  0.22  0.2727  0.80        0.0       8          32   40
2           1  0.22  0.2727  0.80        0.0       5          27   32
3           1  0.24  0.2879  0.75        0.0       3          10   13
4           1  0.24  0.2879  0.75        0.0       0           1    1
```
```python
>>> data_processor.virtualize()
>>> data_processor.show_data()
   yr  holiday  temp   hum  windspeed  casual  registered  cnt  season_1  \
0   0        0  0.24  0.81        0.0       3          13   16         1
1   0        0  0.22  0.80        0.0       8          32   40         1
2   0        0  0.22  0.80        0.0       5          27   32         1
3   0        0  0.24  0.75        0.0       3          10   13         1
4   0        0  0.24  0.75        0.0       0           1    1         1

   season_2 ...  hr_21  hr_22  hr_23  weekday_0  weekday_1  weekday_2  \
0         0 ...      0      0      0          0          0          0
1         0 ...      0      0      0          0          0          0
2         0 ...      0      0      0          0          0          0
3         0 ...      0      0      0          0          0          0
4         0 ...      0      0      0          0          0          0

   weekday_3  weekday_4  weekday_5  weekday_6
0          0          0          0          1
1          0          0          0          1
2          0          0          0          1
3          0          0          0          1
4          0          0          0          1

[5 rows x 59 columns]
```
```python
>>> data_processor.normalize()
>>> data_processor.show_data()
   yr  holiday      temp       hum  windspeed    casual  registered       cnt  \
0   0        0 -1.334609  0.947345  -1.553844 -0.662736   -0.930162 -0.956312
1   0        0 -1.438475  0.895513  -1.553844 -0.561326   -0.804632 -0.823998
2   0        0 -1.438475  0.895513  -1.553844 -0.622172   -0.837666 -0.868103
3   0        0 -1.334609  0.636351  -1.553844 -0.662736   -0.949983 -0.972851
4   0        0 -1.334609  0.636351  -1.553844 -0.723582   -1.009445 -1.039008

   season_1  season_2 ...  hr_21  hr_22  hr_23  weekday_0  weekday_1  \
0         1         0 ...      0      0      0          0          0
1         1         0 ...      0      0      0          0          0
2         1         0 ...      0      0      0          0          0
3         1         0 ...      0      0      0          0          0
4         1         0 ...      0      0      0          0          0

   weekday_2  weekday_3  weekday_4  weekday_5  weekday_6
0          0          0          0          0          1
1          0          0          0          0          1
2          0          0          0          0          1
3          0          0          0          0          1
4          0          0          0          0          1

[5 rows x 59 columns]
```
```bash
> python Train.py
```
Before training, you may want to tweak the hyperparameters in networkConfig.json yourself:
```json
{
    "iterations": 10000,
    "learning_rate": 0.1,
    "hidden_nodes": 7,
    "output_nodes": 1
}
```
When training finishes, you will see the training and validation loss curves:
Once the network is trained, it can be run:
```bash
> python Run.py
```
You will see a plot like the following, comparing the predictions against the actual data: