In the previous post I gave a rough introduction to what a neural network actually is, hopefully helping readers who have never touched this field build an intuitive sense of it. In this post I will walk through simple mathematical derivations and Python implementations of the two most fundamental elements of a neural network, and assemble a deep neural network at the end. To keep the basic arithmetic easy to follow, I will use only `numpy` for the math in this post.
The most common use of a neural network is this: given some input, process it and produce a result as output, whether a prediction, a classification, or something else. When a neural network performs such a task, the input (usually a set of feature values) is fed into a network of interconnected nodes. These individual nodes are called perceptrons, or neurons, and they are the basic building blocks of a neural network. Each perceptron decides how to classify the data based on its inputs.
Take school admissions as an example. Here is one school's admission record for past applicants:
Knowing a student's college entrance exam score and EQ test result, we can then predict whether they would be admitted by this school. Judging from the historical data, admission is decided jointly by the exam score and the EQ test. Neither factor alone determines the outcome; each carries a certain weight. Assuming we already know these two weights, a neural network for this admission prediction might be structured like this:
When feature data is fed into a perceptron, it is multiplied by the weight assigned to that particular input. For example, the perceptron in the figure above has two inputs, test and iq, so it has two associated weights, each of which can be adjusted independently. A larger weight means the network considers that input more important than the others; a smaller weight means the data matters less. As an extreme example, if the test score had no effect on admission at all, the weight on the test score would be zero, meaning it contributes nothing to the perceptron's output.
The process in which the perceptron applies the weights to the inputs and sums them up is called a linear combination. Expressed compactly in math:
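$$h = w_1 x_1 + w_2 x_2 + \dots + w_n x_n = \sum_{i=1}^{n} w_i x_i$$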
This calculation alone is not yet enough to conveniently predict whether the student will be admitted; the perceptron's sum has to be converted into an output signal before the final result can be produced. In this example, the output might be: admitted (1) or not admitted (0).
This is done by passing the linear combination to an activation function f. A simple activation function that is up to the job is:
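$$f(h) = \begin{cases} 0 & \text{if } h \le 0 \\ 1 & \text{if } h > 0 \end{cases}$$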
To make the math fully general, the formula also gets a bias term that shifts the output signal. With that, we have a complete perceptron formula:
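$$y = f\left(\sum_{i=1}^{n} w_i x_i + b\right)$$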
Note that when the data has been preprocessed well enough (let's leave how to preprocess data for another time), we don't really need the bias term. So don't be surprised if it disappears from the derivations and code that follow.
Here is a sample Python implementation of the perceptron:
```python
import numpy as np

# Step activation: fire (1) only when the weighted sum plus bias is positive
def activation(h):
    if h <= 0:
        return 0
    else:
        return 1

inputs = np.array([0.7, 0.3])
weights = np.random.normal(loc=0.0, scale=1, size=(1, 2))
bias = np.random.normal(loc=0.0, scale=1, size=(1))

output = activation(np.dot(weights, inputs) + bias)
print('Output: {}'.format(output))
```
To sum up, a single perceptron has the structure shown on the left below. To solve the prediction problem above, however, the network would not be a lone perceptron as in the earlier figure, but many perceptrons organized into layers (right side below): the output of one perceptron can become the input of another, and the final result emerges after several layers of computation. A single prediction involves running every perceptron in the network once; this process is called forward propagation, as in the sketch below.
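As a minimal sketch of forward propagation through one hidden layer (the layer sizes and random weights here are made up for illustration; the full implementation comes later in this post):

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

x = np.array([0.7, 0.3])               # input features, e.g. test and iq
W1 = np.random.normal(size=(2, 3))     # input -> hidden weights: 2 inputs, 3 hidden perceptrons
W2 = np.random.normal(size=(3, 1))     # hidden -> output weights

hidden = sigmoid(np.dot(x, W1))        # each hidden perceptron: linear combination + activation
output = sigmoid(np.dot(hidden, W2))   # hidden outputs become the next layer's inputs
print('Output: {}'.format(output))
```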
With the perceptrons above in place, we could start predicting admissions. But in all likelihood, such a network will not give reliable predictions, because at this point we have no idea what the weight of each input feature should be, and unreliable weights naturally produce unreliable results. Fortunately, we have plenty of historical data at hand: we know which kinds of students were admitted and which were not. We can feed those historical student records into the network and see how its outputs differ from the actual outcomes, then correct the weights based on that difference. Keep this up, and the network becomes more and more accurate (hopefully).
This process is called training the neural network, and the existing real-world data is called the training set. When a network is first created, its weights are random values. As the network learns from the training set which inputs lead to which outputs, it adjusts the weights according to the classification errors it made under the previous weights.
To pull off this trick, we need to sort out two things: how to quantify the gap between the network's output and the actual result, and how to adjust the weights based on that gap.
To quantify the gap in the output, a very intuitive approach is simply to subtract the computed result ŷ from the true result y. But this is not the best method, because it can produce negative numbers, which makes it hard to judge the size of the difference. Instead, we use the square of (y − ŷ) to quantify the error of each prediction during training. Summed over all the training data the network has run through, the total error is (why the 1/2 in front? purely to make the later algebra cleaner):
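$$E = \frac{1}{2}\sum_{\mu}\bigl(y^{\mu} - \hat{y}^{\mu}\bigr)^2$$

where $\mu$ runs over all records in the training set.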
This quantity is called the SSE (Sum of Squared Errors of prediction). For the network to perform as well as possible, we want the SSE to be as small as possible: the smaller the SSE, the closer the network's outputs are to reality.
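As a quick numeric sanity check (the numbers here are made up), the SSE of a batch of predictions is one line of numpy:

```python
import numpy as np

y = np.array([1, 0, 1, 1])               # actual outcomes
y_hat = np.array([0.8, 0.2, 0.6, 0.9])   # network outputs
sse = 0.5 * np.sum((y - y_hat) ** 2)
print(sse)  # 0.5 * (0.04 + 0.04 + 0.16 + 0.01) = 0.125
```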
The next question is how to adjust the weights and the bias. As the formula shows, the SSE depends on the inputs x and the weights w. There is nothing we can do about the inputs, so the weights are the only knob we can turn. To keep the explanation clear, consider a single training record and its corresponding output in isolation. Suppose the SSE varies with the weight w as in the figure below.
To minimize the SSE, the weight must keep being adjusted in every training iteration until it finally reaches the value where the SSE is smallest. This process is gradient descent.
Each adjustment moves the weight in the direction opposite to the gradient at the current w, by a step whose size is proportional to that gradient. Here is a short derivation:
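For a single training record, write the error as

$$E = \frac{1}{2}\,(y - \hat{y})^2,\qquad \hat{y} = f(h),\qquad h = \sum_i w_i x_i$$

Applying the chain rule:

$$\frac{\partial E}{\partial w_i} = -(y - \hat{y})\,f'(h)\,x_i$$

Stepping against the gradient, scaled by a factor $\eta$:

$$\Delta w_i = -\eta\,\frac{\partial E}{\partial w_i} = \eta\,(y - \hat{y})\,f'(h)\,x_i = \eta\,\delta\,x_i,\qquad \delta = (y - \hat{y})\,f'(h)$$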
In the computation, δ is called the error term; it has no real meaning of its own and exists purely for mathematical convenience. η is called the learning rate and is set by the developer; it controls how fast the weights change. Setting this parameter well is especially important when training a neural network: too low a learning rate and the network takes a very long time to reach a decent accuracy; too high and the network keeps jumping past the optimal weights, so its accuracy fluctuates during training and may never reach the best state that is actually attainable.
Here is a sample Python implementation of gradient descent:
```python
import numpy as np

# Sigmoid is used as the activation function here
def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# `features`, `targets`, `features_test` and `targets_test` are assumed
# to be preloaded training and test data (pandas DataFrames / arrays)
np.random.seed(42)
n_records, n_features = features.shape
last_loss = None
weights = np.random.normal(scale=1 / n_features**.5, size=n_features)

epochs = 1000
learnrate = 0.5

for e in range(epochs):
    del_w = np.zeros(weights.shape)
    for x, y in zip(features.values, targets):
        # The power of the formulas: output, error and error term
        output = sigmoid(np.dot(x, weights))
        error = y - output
        error_term = error * output * (1 - output)
        del_w += error_term * x
    weights += learnrate * del_w / n_records

    if e % (epochs / 10) == 0:
        out = sigmoid(np.dot(features, weights))
        loss = np.mean((out - targets) ** 2)
        print("Train loss: ", loss)

tes_out = sigmoid(np.dot(features_test, weights))
predictions = tes_out > 0.5
accuracy = np.mean(predictions == targets_test)
print("Prediction accuracy: {:.3f}".format(accuracy))
```
In the opposite direction to forward propagation, the weights in a complex network are updated starting from the last layer (the output) and working back through the earlier layers; this process is backpropagation. Its inventor, deep learning godfather Geoffrey Hinton, recently pointed out that today's backpropagation algorithm has many flaws and urgently needs a replacement. While we look forward to new results from the greats, backpropagation remains the most effective learning method we have.
Below is a neural network with one hidden layer implemented using nothing but `numpy`. The activation functions of the hidden layer and the output layer are, respectively:
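$$\sigma(x) = \frac{1}{1 + e^{-x}}\ \text{(hidden layer)},\qquad f(x) = x\ \text{(output layer)}$$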
NeuralNetwork.py:
```python
import numpy as np


class NeuralNetwork:
    def __init__(self, input_nodes, hidden_nodes, output_nodes, learning_rate,
                 weights_input_to_hidden=None, weights_hidden_to_output=None):
        self.input_nodes = input_nodes
        self.hidden_nodes = hidden_nodes
        self.output_nodes = output_nodes

        # Initialize weights randomly unless pretrained weights are supplied
        if weights_input_to_hidden is None and weights_hidden_to_output is None:
            self.weights_input_to_hidden = np.random.normal(
                0.0, self.input_nodes**-0.5,
                (self.input_nodes, self.hidden_nodes))
            self.weights_hidden_to_output = np.random.normal(
                0.0, self.hidden_nodes**-0.5,
                (self.hidden_nodes, self.output_nodes))
        else:
            self.weights_input_to_hidden = weights_input_to_hidden
            self.weights_hidden_to_output = weights_hidden_to_output

        self.lr = learning_rate

        def sigmoid(x):
            return 1 / (1 + np.exp(-x))

        def sigmoid_prime(x):
            return sigmoid(x) * (1 - sigmoid(x))

        def linear(x):
            return x

        def linear_prime(x):
            return x ** 0

        # Activation functions
        self.activation_function = sigmoid
        self.activation_function_prime = sigmoid_prime
        self.activation_function2 = linear
        self.activation_function_prime2 = linear_prime

    def train(self, features, targets):
        n_records = features.shape[0]
        delta_weights_i_h = np.zeros(self.weights_input_to_hidden.shape)
        delta_weights_h_o = np.zeros(self.weights_hidden_to_output.shape)
        for X, y in zip(features, targets):
            # Forward pass
            hidden_inputs = np.dot(X, self.weights_input_to_hidden)
            hidden_outputs = self.activation_function(hidden_inputs)
            final_inputs = np.dot(hidden_outputs, self.weights_hidden_to_output)
            final_outputs = self.activation_function2(final_inputs)

            # Backward pass
            error = y - final_outputs
            output_error_term = error * self.activation_function_prime2(final_outputs)
            hidden_error = np.dot(output_error_term, self.weights_hidden_to_output.T)
            hidden_error_term = hidden_error * self.activation_function_prime(hidden_inputs)

            # Weight steps
            delta_weights_i_h += hidden_error_term * X[:, None]
            delta_weights_h_o += output_error_term * hidden_outputs[:, None]

        self.weights_hidden_to_output += self.lr * delta_weights_h_o / n_records
        self.weights_input_to_hidden += self.lr * delta_weights_i_h / n_records

    def run(self, features):
        hidden_inputs = np.dot(features, self.weights_input_to_hidden)
        hidden_outputs = self.activation_function(hidden_inputs)
        final_inputs = np.dot(hidden_outputs, self.weights_hidden_to_output)
        final_outputs = self.activation_function2(final_inputs)
        return final_outputs

    def get_weights(self):
        return self.weights_input_to_hidden, self.weights_hidden_to_output
```
DataProcessor.py:
```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt


class DataProcessor:
    def __init__(self, data_path):
        self.orig_data = pd.read_csv(data_path)
        self.data = self.orig_data
        self.scaled_features = {}
        self.train_features = None
        self.train_targets = None
        self.test_features = None
        self.test_targets = None
        self.test_data = None
        self.val_features = None
        self.val_targets = None

    def show_data(self, plot_by_dteday=False):
        print(self.data.head())
        if plot_by_dteday:
            self.data[:24*10].plot(x='dteday', y='cnt', title='Data for the first 10 days')
            plt.show()

    def virtualize(self):
        # Add dummy (one-hot) columns for the categorical fields
        dummy_fields = ['season', 'weathersit', 'mnth', 'hr', 'weekday']
        for each in dummy_fields:
            dummies = pd.get_dummies(self.data[each], prefix=each, drop_first=False)
            self.data = pd.concat([self.data, dummies], axis=1)

        # Drop the original categorical columns and other unused fields
        fields_to_drop = ['instant', 'dteday', 'season', 'weathersit',
                          'weekday', 'atemp', 'mnth', 'workingday', 'hr']
        self.data = self.data.drop(fields_to_drop, axis=1)

    def normalize(self):
        quant_features = ['casual', 'registered', 'cnt', 'temp', 'hum', 'windspeed']
        for each in quant_features:
            mean, std = self.data[each].mean(), self.data[each].std()
            self.scaled_features[each] = [mean, std]
            self.data.loc[:, each] = (self.data[each] - mean) / std

    def split(self):
        # Save data of the last 21 days for testing
        self.test_data = self.data[-21 * 24:]
        self.data = self.data[:-21 * 24]

        target_fields = ['cnt', 'casual', 'registered']
        features, targets = self.data.drop(target_fields, axis=1), self.data[target_fields]
        self.test_features = self.test_data.drop(target_fields, axis=1)
        self.test_targets = self.test_data[target_fields]

        # Hold out the last 60 days of the remaining data for validation
        self.train_features, self.train_targets = features[:-60*24], targets[:-60*24]
        self.val_features, self.val_targets = features[-60*24:], targets[-60*24:]

    def get_train_data(self):
        return self.train_features, self.train_targets

    def get_test_data(self):
        return self.test_features, self.test_targets, self.test_data

    def get_val_data(self):
        return self.val_features, self.val_targets

    def get_scaled_features(self):
        return self.scaled_features

    def get_orig_data(self):
        return self.orig_data
```
Train.py:
```python
import sys
import json
from pprint import pprint

import DataProcessor
import NeuralNetwork
import numpy as np
import matplotlib.pyplot as plt

# Get training parameters
with open('networkConfig.json') as config_file:
    config = json.load(config_file)
pprint(config)
iterations = config['iterations']
learning_rate = config['learning_rate']
hidden_nodes = config['hidden_nodes']
output_nodes = config['output_nodes']

# Get data
data_processor = DataProcessor.DataProcessor('Bike-Sharing-Dataset/hour.csv')
data_processor.virtualize()
data_processor.normalize()
data_processor.split()
train_features, train_targets = data_processor.get_train_data()
val_features, val_targets = data_processor.get_val_data()

# Initialize NeuralNetwork
N_i = train_features.shape[1]
network = NeuralNetwork.NeuralNetwork(N_i, hidden_nodes, output_nodes, learning_rate)

losses = {'train': [], 'validation': []}

def MSE(y, Y):
    return np.mean((y-Y)**2)

for ii in range(iterations):
    # Pick 128 random records from the training data set
    batch = np.random.choice(train_features.index, size=128)
    X, y = train_features.loc[batch].values, train_targets.loc[batch]['cnt']
    network.train(X, y)

    train_loss = MSE(network.run(train_features).T, train_targets['cnt'].values)
    val_loss = MSE(network.run(val_features).T, val_targets['cnt'].values)
    sys.stdout.write("\rProgress: {:2.1f}".format(100 * ii/float(iterations)) \
                     + "% ... Training loss: " + str(train_loss)[:5] \
                     + " ... Validation loss: " + str(val_loss)[:5])
    sys.stdout.flush()

    losses['train'].append(train_loss)
    losses['validation'].append(val_loss)

# Store weights
weights_input_to_hidden, weights_hidden_to_output = network.get_weights()
np.save('weights_input_to_hidden', weights_input_to_hidden)
np.save('weights_hidden_to_output', weights_hidden_to_output)

# Plot losses
plt.plot(losses['train'], label='Training loss')
plt.plot(losses['validation'], label='Validation loss')
plt.legend()
_ = plt.ylim()
plt.show()
```
Run.py:
```python
import json
from pprint import pprint

import DataProcessor
import NeuralNetwork
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Get training parameters
with open('networkConfig.json') as config_file:
    config = json.load(config_file)
pprint(config)
iterations = config['iterations']
learning_rate = config['learning_rate']
hidden_nodes = config['hidden_nodes']
output_nodes = config['output_nodes']

# Get data
data_processor = DataProcessor.DataProcessor('Bike-Sharing-Dataset/hour.csv')
data_processor.virtualize()
data_processor.normalize()
data_processor.split()
test_features, test_targets, test_data = data_processor.get_test_data()
scaled_features = data_processor.get_scaled_features()
orig_data = data_processor.get_orig_data()
mean, std = scaled_features['cnt']

# Initialize the network with the stored weights
weights_input_to_hidden = np.load('weights_input_to_hidden.npy')
weights_hidden_to_output = np.load('weights_hidden_to_output.npy')
N_i = test_features.shape[1]
network = NeuralNetwork.NeuralNetwork(N_i, hidden_nodes, output_nodes, learning_rate,
                                      weights_input_to_hidden=weights_input_to_hidden,
                                      weights_hidden_to_output=weights_hidden_to_output)

# Run network prediction (undo the normalization of 'cnt')
predictions = network.run(test_features).T * std + mean

# Plot prediction and ground truth
fig, ax = plt.subplots(figsize=(8, 4))
ax.plot(predictions[0], label='Prediction')
ax.plot((test_targets['cnt']*std + mean).values, label='Data')
ax.set_xlim(right=len(predictions))
ax.legend()

dates = pd.to_datetime(orig_data.loc[test_data.index]['dteday'])
dates = dates.apply(lambda d: d.strftime('%b %d'))
ax.set_xticks(np.arange(len(dates))[12::24])
_ = ax.set_xticklabels(dates[12::24], rotation=45)
plt.show()
```
networkConfig.json:
```json
{
    "iterations": 10000,
    "learning_rate": 0.1,
    "hidden_nodes": 7,
    "output_nodes": 1
}
```
The data comes from the UCI Machine Learning Repository's Bike Sharing Dataset; download and unpack it first:

```bash
> curl -O https://archive.ics.uci.edu/ml/machine-learning-databases/00275/Bike-Sharing-Dataset.zip
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  273k  100  273k    0     0  26888      0  0:00:10  0:00:10 --:--:-- 59889

> unzip Bike-Sharing-Dataset.zip
Archive:  Bike-Sharing-Dataset.zip
  inflating: Readme.txt
  inflating: day.csv
  inflating: hour.csv
```
Trained on this data, the neural network will be able to predict rental bike usage.
```python
>>> from DataProcessor import DataProcessor as dp
>>> data_processor = dp('Bike-Sharing-Dataset/hour.csv')
>>> data_processor.show_data()
   instant      dteday  season  yr  mnth  hr  holiday  weekday  workingday  \
0        1  2011-01-01       1   0     1   0        0        6           0
1        2  2011-01-01       1   0     1   1        0        6           0
2        3  2011-01-01       1   0     1   2        0        6           0
3        4  2011-01-01       1   0     1   3        0        6           0
4        5  2011-01-01       1   0     1   4        0        6           0

   weathersit  temp   atemp   hum  windspeed  casual  registered  cnt
0           1  0.24  0.2879  0.81        0.0       3          13   16
1           1  0.22  0.2727  0.80        0.0       8          32   40
2           1  0.22  0.2727  0.80        0.0       5          27   32
3           1  0.24  0.2879  0.75        0.0       3          10   13
4           1  0.24  0.2879  0.75        0.0       0           1    1
```
```python
>>> data_processor.virtualize()
>>> data_processor.show_data()
   yr  holiday  temp   hum  windspeed  casual  registered  cnt  season_1  \
0   0        0  0.24  0.81        0.0       3          13   16         1
1   0        0  0.22  0.80        0.0       8          32   40         1
2   0        0  0.22  0.80        0.0       5          27   32         1
3   0        0  0.24  0.75        0.0       3          10   13         1
4   0        0  0.24  0.75        0.0       0           1    1         1

   season_2 ...  hr_21  hr_22  hr_23  weekday_0  weekday_1  weekday_2  \
0         0 ...      0      0      0          0          0          0
1         0 ...      0      0      0          0          0          0
2         0 ...      0      0      0          0          0          0
3         0 ...      0      0      0          0          0          0
4         0 ...      0      0      0          0          0          0

   weekday_3  weekday_4  weekday_5  weekday_6
0          0          0          0          1
1          0          0          0          1
2          0          0          0          1
3          0          0          0          1
4          0          0          0          1

[5 rows x 59 columns]
```
```python
>>> data_processor.normalize()
>>> data_processor.show_data()
   yr  holiday      temp       hum  windspeed    casual  registered       cnt  \
0   0        0 -1.334609  0.947345  -1.553844 -0.662736   -0.930162 -0.956312
1   0        0 -1.438475  0.895513  -1.553844 -0.561326   -0.804632 -0.823998
2   0        0 -1.438475  0.895513  -1.553844 -0.622172   -0.837666 -0.868103
3   0        0 -1.334609  0.636351  -1.553844 -0.662736   -0.949983 -0.972851
4   0        0 -1.334609  0.636351  -1.553844 -0.723582   -1.009445 -1.039008

   season_1  season_2 ...  hr_21  hr_22  hr_23  weekday_0  weekday_1  \
0         1         0 ...      0      0      0          0          0
1         1         0 ...      0      0      0          0          0
2         1         0 ...      0      0      0          0          0
3         1         0 ...      0      0      0          0          0
4         1         0 ...      0      0      0          0          0

   weekday_2  weekday_3  weekday_4  weekday_5  weekday_6
0          0          0          0          0          1
1          0          0          0          0          1
2          0          0          0          0          1
3          0          0          0          0          1
4          0          0          0          0          1

[5 rows x 59 columns]
```
```bash
> python Train.py
```
Before training, you may want to tweak the hyperparameters in networkConfig.json yourself:
```json
{
    "iterations": 10000,
    "learning_rate": 0.1,
    "hidden_nodes": 7,
    "output_nodes": 1
}
```
When training finishes, you will see the training and validation loss curves:
Once the network is trained, it can be run:
```bash
> python Run.py
```
You will see a plot like the following, comparing the predictions against the actual data: