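In the previous post I gave a rough introduction to what a neural network actually is, in the hope of giving readers who have never worked in this field an intuitive feel for it. This post uses some simple mathematical derivations and a Python implementation to explain the two most basic elements of a neural network.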
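When feature data is fed into a perceptron, each input is multiplied by the weight assigned to that particular input. For example, the perceptron in the figure above has two inputs, test and iq, so it has two associated weights, each of which can be adjusted separately. A larger weight means the network considers that input more important than the others; a smaller weight means the input matters less. As an extreme example, if the test score had no influence on whether a student is admitted, the weight for test would be zero, meaning it has no effect on the perceptron's output.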
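The effect of a weight can be sketched in a few lines of Python. The input values, the weights, and the helper name `linear_combination` below are made up for illustration and are not taken from the original figure.

```python
import numpy as np

def linear_combination(inputs, weights, bias=0.0):
    """Weighted sum of the inputs plus a bias term."""
    return float(np.dot(weights, inputs) + bias)

# Hypothetical student: test score and IQ, both scaled to the range [0, 1].
inputs = np.array([0.9, 0.6])  # [test, iq]

# With both weights non-zero, each input contributes to the result.
h_both = linear_combination(inputs, np.array([0.8, 0.2]))

# With the test weight set to zero, changing the test score changes nothing.
zero_test_weights = np.array([0.0, 0.2])
h_low_test = linear_combination(np.array([0.1, 0.6]), zero_test_weights)
h_high_test = linear_combination(np.array([0.9, 0.6]), zero_test_weights)

print(h_both, h_low_test, h_high_test)
```

The last two values are identical even though the test score differs, which is what a zero weight means in practice.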
This is done by passing the linear combination to an activation function f. A simple activation function that does the job could be:
import numpy as np

def activation(h):
    if h <= 0:
        return 0
    else:
        return 1

inputs = np.array([0.7, 0.3])
weights = np.random.normal(loc=0.0, scale=1, size=(1, 2))
bias = np.random.normal(loc=0.0, scale=1, size=(1))

output = activation(np.dot(weights, inputs) + bias)

print('Output: {}'.format(output))
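A very intuitive way to quantify the gap between outputs is to subtract the computed result ŷ from the true result y. This is not the best approach, though, because it produces negative numbers, which makes it hard to judge the size of the gap. Instead, we use the square of the difference between y and ŷ to quantify the error of each prediction during training. After the network has run through all the training data, the total error is the sum of these squared differences multiplied by 1/2 (why the 1/2? purely to simplify the calculus later on).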
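Written out from the description above, the total error would take the following form. The symbol E and the index μ over training records are naming assumptions, not taken from the original post.

```latex
E = \frac{1}{2} \sum_{\mu} \left( y^{\mu} - \hat{y}^{\mu} \right)^{2}
```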
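This value is called the SSE (sum of squared errors of prediction). For the network to perform as well as possible, we want the SSE to be as small as possible, because the smaller the SSE, the closer the network's outputs are to the truth.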
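As a rough illustration of how the SSE separates good predictions from poor ones, here is a minimal sketch. The function name `sse` and all the numbers are hypothetical.

```python
import numpy as np

def sse(y_true, y_pred):
    """Half the sum of squared differences between targets and predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return 0.5 * np.sum((y_true - y_pred) ** 2)

y_true = [1, 0, 1, 1]
close_pred = [0.9, 0.1, 0.8, 0.7]   # predictions near the targets
far_pred = [0.4, 0.6, 0.3, 0.5]     # predictions far from the targets

sse_close = sse(y_true, close_pred)
sse_far = sse(y_true, far_pred)
print(sse_close, sse_far)
```

The predictions that sit closer to the targets give the smaller value, so minimising this quantity pushes the outputs toward the truth.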
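The next question is how to adjust the weights and the bias term. The formula shows that the size of the SSE depends on the inputs x and the weights w. We cannot tamper with the inputs, so all we can do is change the weights. To keep the explanation clear, consider here the computation for a single data record and its corresponding output. Suppose the relationship between the SSE and a weight w looks like the figure below.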
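The size of the weight adjustment is proportional to the gradient at the current position of w, and its direction is opposite to the gradient; the update rule follows from a short derivation.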
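For a single record with one output unit, the derivation can be sketched as follows. This is a reconstruction under assumptions: h denotes the linear combination of inputs, f the activation function, and the bias term is left out for brevity.

```latex
\begin{aligned}
E &= \tfrac{1}{2}\,(y - \hat{y})^{2}, \qquad \hat{y} = f(h), \qquad h = \sum_i w_i x_i \\
\Delta w_i &= -\eta\,\frac{\partial E}{\partial w_i} \\
\frac{\partial E}{\partial w_i} &= -(y - \hat{y})\,\frac{\partial \hat{y}}{\partial w_i}
  = -(y - \hat{y})\,f'(h)\,x_i \\
\delta &= (y - \hat{y})\,f'(h) \\
\Delta w_i &= \eta\,\delta\,x_i
\end{aligned}
```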
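δ is called the error term in the calculation; it has no meaning of its own and exists purely for mathematical convenience. η is called the learning rate and is set by the developer; it controls how fast the weights change. Setting it correctly is especially important when training a neural network: a learning rate that is too low means the network needs a long time to reach the desired accuracy, while one that is too high makes the network keep overshooting the optimal weights, so that accuracy fluctuates heavily during training and may never reach a best state that is in fact attainable.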
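The influence of the learning rate is easy to see on a toy problem. The sketch below minimises a made-up one-dimensional error function E(w) = (w - 3)^2 instead of a real network; the function, the starting point, and the three learning rates are all assumptions chosen for illustration.

```python
def gradient_descent(learnrate, steps=20, w0=0.0):
    """Minimise E(w) = (w - 3)^2 with plain gradient descent."""
    w = w0
    for _ in range(steps):
        grad = 2 * (w - 3)        # dE/dw at the current w
        w -= learnrate * grad     # step against the gradient
    return w

for lr in (0.01, 0.3, 1.1):
    w = gradient_descent(lr)
    print("learnrate={:<5} final w={:.4f} distance from optimum={:.4f}".format(
        lr, w, abs(w - 3)))
```

With the smallest rate the weight is still far from the optimum after 20 steps, the middle rate converges, and the largest rate overshoots further on every step and moves away from the optimum.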
import numpy as np

# Use sigmoid as the activation function
def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# features, targets, features_test and targets_test are not defined here;
# they must be loaded before this code runs (features.values implies a
# pandas DataFrame)

np.random.seed(42)

n_records, n_features = features.shape
weights = np.random.normal(scale=1 / n_features**.5, size=n_features)

epochs = 1000
learnrate = 0.5

for e in range(epochs):
    del_w = np.zeros(weights.shape)
    for x, y in zip(features.values, targets):
        # The update rule in action
        output = sigmoid(np.dot(x, weights))
        error = y - output
        error_term = error * output * (1 - output)
        del_w += error_term * x
    # Average the accumulated steps over all records
    weights += learnrate * del_w / n_records

    # Report the training loss ten times over the whole run
    if e % (epochs / 10) == 0:
        out = sigmoid(np.dot(features, weights))
        loss = np.mean((out - targets) ** 2)
        print("Train loss: ", loss)

# Evaluate on the test data
tes_out = sigmoid(np.dot(features_test, weights))
predictions = tes_out > 0.5
accuracy = np.mean(predictions == targets_test)
print("Prediction accuracy: {:.3f}".format(accuracy))
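In contrast to forward propagation, in a complex network structure the weights are updated starting from the last layer (the output) and moving step by step back through the earlier layers; this process is backpropagation. Geoffrey Hinton, a pioneer of backpropagation who is often called the godfather of deep learning, has recently argued that the current backpropagation algorithm has many shortcomings and needs to be replaced. While we look forward to new research from the field's leading figures, backpropagation remains the most effective learning method available today.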
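For a network with one hidden layer, the backward flow of the error can be sketched with the formulas below. The notation is an assumption: x_i are the inputs, a_j the hidden-layer outputs, w_{ij} the input-to-hidden weights, W_{jk} the hidden-to-output weights, f and f_o the hidden and output activation functions, and h the corresponding linear combinations.

```latex
\begin{aligned}
\delta^{o}_{k} &= (y_k - \hat{y}_k)\, f_o'(h^{o}_{k}) \\
\delta^{h}_{j} &= \Big( \sum_{k} W_{jk}\, \delta^{o}_{k} \Big)\, f'(h_j) \\
\Delta W_{jk} &= \eta\, \delta^{o}_{k}\, a_j \\
\Delta w_{ij} &= \eta\, \delta^{h}_{j}\, x_i
\end{aligned}
```

The output error term is computed first, and the hidden error term is then obtained by sending it back through the hidden-to-output weights.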
import numpy as np


class NeuralNetwork:
    def __init__(self, input_nodes, hidden_nodes, output_nodes, learning_rate,
                 weights_input_to_hidden=None, weights_hidden_to_output=None):
        self.input_nodes = input_nodes
        self.hidden_nodes = hidden_nodes
        self.output_nodes = output_nodes

        # Initialize weights randomly unless saved weights are passed in
        if weights_input_to_hidden is None and weights_hidden_to_output is None:
            self.weights_input_to_hidden = np.random.normal(
                0.0, self.input_nodes**-0.5,
                (self.input_nodes, self.hidden_nodes))
            self.weights_hidden_to_output = np.random.normal(
                0.0, self.hidden_nodes**-0.5,
                (self.hidden_nodes, self.output_nodes))
        else:
            self.weights_input_to_hidden = weights_input_to_hidden
            self.weights_hidden_to_output = weights_hidden_to_output

        self.lr = learning_rate

        def sigmoid(x):
            return 1 / (1 + np.exp(-x))

        def sigmoid_prime(x):
            return sigmoid(x) * (1 - sigmoid(x))

        def linear(x):
            return x

        def linear_prime(x):
            # Derivative of f(x) = x is 1 everywhere
            return x ** 0

        # Activation functions: sigmoid for the hidden layer, linear for the output
        self.activation_function = sigmoid
        self.activation_function_prime = sigmoid_prime
        self.activation_function2 = linear
        self.activation_function_prime2 = linear_prime

    def train(self, features, targets):
        n_records = features.shape[0]
        delta_weights_i_h = np.zeros(self.weights_input_to_hidden.shape)
        delta_weights_h_o = np.zeros(self.weights_hidden_to_output.shape)

        for X, y in zip(features, targets):
            # Forward Pass
            hidden_inputs = np.dot(X, self.weights_input_to_hidden)
            hidden_outputs = self.activation_function(hidden_inputs)
            final_inputs = np.dot(hidden_outputs, self.weights_hidden_to_output)
            final_outputs = self.activation_function2(final_inputs)

            # Backward Pass
            error = y - final_outputs
            output_error_term = error * self.activation_function_prime2(final_outputs)

            hidden_error = np.dot(output_error_term, self.weights_hidden_to_output.T)
            hidden_error_term = hidden_error * self.activation_function_prime(hidden_inputs)

            # Weight steps
            delta_weights_i_h += hidden_error_term * X[:, None]
            delta_weights_h_o += output_error_term * hidden_outputs[:, None]

        # Apply the averaged weight steps once per batch
        self.weights_hidden_to_output += self.lr * delta_weights_h_o / n_records
        self.weights_input_to_hidden += self.lr * delta_weights_i_h / n_records

    def run(self, features):
        hidden_inputs = np.dot(features, self.weights_input_to_hidden)
        hidden_outputs = self.activation_function(hidden_inputs)
        final_inputs = np.dot(hidden_outputs, self.weights_hidden_to_output)
        final_outputs = self.activation_function2(final_inputs)

        return final_outputs

    def get_weights(self):
        return self.weights_input_to_hidden, self.weights_hidden_to_output
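One way to gain confidence in a backward pass like the one above is a numerical gradient check. The sketch below is a standalone illustration, not part of the original project: it rebuilds the same forward and backward steps (sigmoid hidden layer, linear output) as plain functions for a single record with made-up shapes and random data, and compares the backpropagated gradients against finite differences.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def forward(x, w_ih, w_ho):
    """Return hidden-layer outputs and the (linear) network output."""
    hidden_out = sigmoid(np.dot(x, w_ih))
    return hidden_out, np.dot(hidden_out, w_ho)

def loss(x, y, w_ih, w_ho):
    """Half squared error for one record."""
    _, y_hat = forward(x, w_ih, w_ho)
    return 0.5 * np.sum((y - y_hat) ** 2)

def backprop_gradients(x, y, w_ih, w_ho):
    """Gradients of the loss with respect to both weight matrices."""
    hidden_out, y_hat = forward(x, w_ih, w_ho)
    output_error_term = y - y_hat              # linear output: derivative is 1
    hidden_error = np.dot(output_error_term, w_ho.T)
    hidden_error_term = hidden_error * hidden_out * (1 - hidden_out)
    # The weight step is the negative gradient, so flip the sign here.
    grad_ho = -output_error_term * hidden_out[:, None]
    grad_ih = -hidden_error_term * x[:, None]
    return grad_ih, grad_ho

def numerical_gradient(f, w, eps=1e-6):
    """Central finite differences of f() with respect to each entry of w."""
    grad = np.zeros_like(w)
    for idx in np.ndindex(*w.shape):
        original = w[idx]
        w[idx] = original + eps
        plus = f()
        w[idx] = original - eps
        minus = f()
        w[idx] = original              # restore the entry
        grad[idx] = (plus - minus) / (2 * eps)
    return grad

rng = np.random.RandomState(0)
x = rng.normal(size=3)                                  # 3 inputs
y = np.array([0.5])                                     # 1 target
w_ih = rng.normal(scale=3 ** -0.5, size=(3, 2))         # 2 hidden nodes
w_ho = rng.normal(scale=2 ** -0.5, size=(2, 1))

grad_ih, grad_ho = backprop_gradients(x, y, w_ih, w_ho)
num_ih = numerical_gradient(lambda: loss(x, y, w_ih, w_ho), w_ih)
num_ho = numerical_gradient(lambda: loss(x, y, w_ih, w_ho), w_ho)

print(np.max(np.abs(grad_ih - num_ih)), np.max(np.abs(grad_ho - num_ho)))
```

If the two printed differences are close to zero, the analytic error terms agree with the slope of the loss measured directly.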
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt


class DataProcessor:
    def __init__(self, data_path):
        self.orig_data = pd.read_csv(data_path)
        self.data = self.orig_data
        self.scaled_features = {}
        self.train_features = None
        self.train_targets = None
        self.test_features = None
        self.test_targets = None
        self.test_data = None
        self.val_features = None
        self.val_targets = None

    def show_data(self, plot_by_dteday=False):
        print(self.data.head())
        if plot_by_dteday:
            self.data[:24*10].plot(x='dteday', y='cnt', title='Data for the first 10 days')
            plt.show()

    def virtualize(self):
        # Add virtualized (one-hot encoded) data columns
        dummy_fields = ['season', 'weathersit', 'mnth', 'hr', 'weekday']
        for each in dummy_fields:
            # dtype=int keeps the 0/1 columns numeric; newer pandas versions
            # would otherwise produce booleans
            dummies = pd.get_dummies(self.data[each], prefix=each,
                                     drop_first=False, dtype=int)
            self.data = pd.concat([self.data, dummies], axis=1)

        # Drop the original categorical columns and other unused columns
        fields_to_drop = ['instant', 'dteday', 'season', 'weathersit',
                          'weekday', 'atemp', 'mnth', 'workingday', 'hr']
        self.data = self.data.drop(fields_to_drop, axis=1)

    def normalize(self):
        quant_features = ['casual', 'registered', 'cnt', 'temp', 'hum', 'windspeed']
        for each in quant_features:
            mean, std = self.data[each].mean(), self.data[each].std()
            # Keep mean and std so predictions can be converted back later
            self.scaled_features[each] = [mean, std]
            self.data[each] = (self.data[each] - mean) / std

    def split(self):
        # Save data of last 21 days for testing
        self.test_data = self.data[-21 * 24:]
        self.data = self.data[:-21 * 24]

        target_fields = ['cnt', 'casual', 'registered']
        features, targets = self.data.drop(target_fields, axis=1), self.data[target_fields]
        self.test_features = self.test_data.drop(target_fields, axis=1)
        self.test_targets = self.test_data[target_fields]

        # Hold out the last 60 days of the remaining data for validation
        self.train_features, self.train_targets = features[:-60*24], targets[:-60*24]
        self.val_features, self.val_targets = features[-60*24:], targets[-60*24:]

    def get_train_data(self):
        return self.train_features, self.train_targets

    def get_test_data(self):
        return self.test_features, self.test_targets, self.test_data

    def get_val_data(self):
        return self.val_features, self.val_targets

    def get_scaled_features(self):
        return self.scaled_features

    def get_orig_data(self):
        return self.orig_data
import sys
import json
from pprint import pprint

import DataProcessor
import NeuralNetwork
import numpy as np
import matplotlib.pyplot as plt

# Get training parameters
with open('networkConfig.json') as config_file:
    config = json.load(config_file)

pprint(config)

iterations = config['iterations']
learning_rate = config['learning_rate']
hidden_nodes = config['hidden_nodes']
output_nodes = config['output_nodes']

# Get data
data_processor = DataProcessor.DataProcessor('Bike-Sharing-Dataset/hour.csv')
data_processor.virtualize()
data_processor.normalize()
data_processor.split()
train_features, train_targets = data_processor.get_train_data()
val_features, val_targets = data_processor.get_val_data()

# Initialize NeuralNetwork
N_i = train_features.shape[1]
network = NeuralNetwork.NeuralNetwork(N_i, hidden_nodes, output_nodes, learning_rate)

losses = {'train': [], 'validation': []}

def MSE(y, Y):
    return np.mean((y-Y)**2)

for ii in range(iterations):
    # pick 128 random records from training data set
    batch = np.random.choice(train_features.index, size=128)
    # .loc replaces the .ix indexer, which has been removed from pandas
    X, y = train_features.loc[batch].values, train_targets.loc[batch]['cnt']

    network.train(X, y)

    train_loss = MSE(network.run(train_features).T, train_targets['cnt'].values)
    val_loss = MSE(network.run(val_features).T, val_targets['cnt'].values)
    sys.stdout.write("\rProgress: {:2.1f}".format(100 * ii/float(iterations)) \
                     + "% ... Training loss: " + str(train_loss)[:5] \
                     + " ... Validation loss: " + str(val_loss)[:5])
    sys.stdout.flush()

    losses['train'].append(train_loss)
    losses['validation'].append(val_loss)

# Store weights
weights_input_to_hidden, weights_hidden_to_output = network.get_weights()
np.save('weights_input_to_hidden', weights_input_to_hidden)
np.save('weights_hidden_to_output', weights_hidden_to_output)

# Plot losses
plt.plot(losses['train'], label='Training loss')
plt.plot(losses['validation'], label='Validation loss')
plt.legend()
_ = plt.ylim()
plt.show()
import json
from pprint import pprint

import DataProcessor
import NeuralNetwork
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Get training parameters
with open('networkConfig.json') as config_file:
    config = json.load(config_file)

pprint(config)

iterations = config['iterations']
learning_rate = config['learning_rate']
hidden_nodes = config['hidden_nodes']
output_nodes = config['output_nodes']

# Get data
data_processor = DataProcessor.DataProcessor('Bike-Sharing-Dataset/hour.csv')
data_processor.virtualize()
data_processor.normalize()
data_processor.split()
test_features, test_targets, test_data = data_processor.get_test_data()
scaled_features = data_processor.get_scaled_features()
orig_data = data_processor.get_orig_data()
mean, std = scaled_features['cnt']

# Initialize network with the weights saved during training
weights_input_to_hidden = np.load('weights_input_to_hidden.npy')
weights_hidden_to_output = np.load('weights_hidden_to_output.npy')
N_i = test_features.shape[1]
network = NeuralNetwork.NeuralNetwork(N_i, hidden_nodes, output_nodes, learning_rate,
                                      weights_input_to_hidden=weights_input_to_hidden,
                                      weights_hidden_to_output=weights_hidden_to_output)

# Run network prediction and undo the normalization of 'cnt'
predictions = network.run(test_features).T * std + mean

# Plot prediction and ground truth
fig, ax = plt.subplots(figsize=(8, 4))
ax.plot(predictions[0], label='Prediction')
ax.plot((test_targets['cnt']*std + mean).values, label='Data')
# predictions has shape (1, n), so the number of points is len(predictions[0])
ax.set_xlim(right=len(predictions[0]))
ax.legend()

# .loc replaces the .ix indexer, which has been removed from pandas
dates = pd.to_datetime(orig_data.loc[test_data.index]['dteday'])
dates = dates.apply(lambda d: d.strftime('%b %d'))
ax.set_xticks(np.arange(len(dates))[12::24])
_ = ax.set_xticklabels(dates[12::24], rotation=45)
plt.show()
{
    "iterations": 10000,
    "learning_rate": 0.1,
    "hidden_nodes": 7,
    "output_nodes": 1
}
> curl -O https://archive.ics.uci.edu/ml/machine-learning-databases/00275/Bike-Sharing-Dataset.zip
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  273k  100  273k    0     0  26888      0  0:00:10  0:00:10 --:--:-- 59889
> unzip Bike-Sharing-Dataset.zip
Archive:  Bike-Sharing-Dataset.zip
  inflating: Readme.txt
  inflating: day.csv
  inflating: hour.csv
>>> from DataProcessor import DataProcessor as dp
>>> data_processor = dp('Bike-Sharing-Dataset/hour.csv')
>>> data_processor.show_data()
   instant      dteday  season  yr  mnth  hr  holiday  weekday  workingday  \
0        1  2011-01-01       1   0     1   0        0        6           0
1        2  2011-01-01       1   0     1   1        0        6           0
2        3  2011-01-01       1   0     1   2        0        6           0
3        4  2011-01-01       1   0     1   3        0        6           0
4        5  2011-01-01       1   0     1   4        0        6           0

   weathersit  temp   atemp   hum  windspeed  casual  registered  cnt
0           1  0.24  0.2879  0.81        0.0       3          13   16
1           1  0.22  0.2727  0.80        0.0       8          32   40
2           1  0.22  0.2727  0.80        0.0       5          27   32
3           1  0.24  0.2879  0.75        0.0       3          10   13
4           1  0.24  0.2879  0.75        0.0       0           1    1
>>> data_processor.virtualize()
>>> data_processor.show_data()
   yr  holiday  temp   hum  windspeed  casual  registered  cnt  season_1  \
0   0        0  0.24  0.81        0.0       3          13   16         1
1   0        0  0.22  0.80        0.0       8          32   40         1
2   0        0  0.22  0.80        0.0       5          27   32         1
3   0        0  0.24  0.75        0.0       3          10   13         1
4   0        0  0.24  0.75        0.0       0           1    1         1

   season_2  ...  hr_21  hr_22  hr_23  weekday_0  weekday_1  weekday_2  \
0         0  ...      0      0      0          0          0          0
1         0  ...      0      0      0          0          0          0
2         0  ...      0      0      0          0          0          0
3         0  ...      0      0      0          0          0          0
4         0  ...      0      0      0          0          0          0

   weekday_3  weekday_4  weekday_5  weekday_6
0          0          0          0          1
1          0          0          0          1
2          0          0          0          1
3          0          0          0          1
4          0          0          0          1

[5 rows x 59 columns]
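The idea behind these 0/1 columns can be sketched with plain NumPy. This is only an illustration of one-hot encoding, not the `pd.get_dummies` call used by the project, and the category codes below are made up.

```python
import numpy as np

# Hypothetical category codes, e.g. a 'season' column with values 1 to 4.
season = np.array([1, 1, 2, 3, 4, 2])

# Sorted distinct values; each becomes one 0/1 column.
categories = np.unique(season)

# Compare every record against every category to build the 0/1 matrix.
one_hot = (season[:, None] == categories[None, :]).astype(int)

print(categories)
print(one_hot)
```

Each row contains exactly one 1, in the column of the category that record belongs to, so the network no longer treats the category codes as ordered quantities.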
>>> data_processor.normalize()
>>> data_processor.show_data()
   yr  holiday      temp       hum  windspeed    casual  registered       cnt  \
0   0        0 -1.334609  0.947345  -1.553844 -0.662736   -0.930162 -0.956312
1   0        0 -1.438475  0.895513  -1.553844 -0.561326   -0.804632 -0.823998
2   0        0 -1.438475  0.895513  -1.553844 -0.622172   -0.837666 -0.868103
3   0        0 -1.334609  0.636351  -1.553844 -0.662736   -0.949983 -0.972851
4   0        0 -1.334609  0.636351  -1.553844 -0.723582   -1.009445 -1.039008

   season_1  season_2  ...  hr_21  hr_22  hr_23  weekday_0  weekday_1  \
0         1         0  ...      0      0      0          0          0
1         1         0  ...      0      0      0          0          0
2         1         0  ...      0      0      0          0          0
3         1         0  ...      0      0      0          0          0
4         1         0  ...      0      0      0          0          0

   weekday_2  weekday_3  weekday_4  weekday_5  weekday_6
0          0          0          0          0          1
1          0          0          0          0          1
2          0          0          0          0          1
3          0          0          0          0          1
4          0          0          0          0          1

[5 rows x 59 columns]
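Standardization is reversible as long as the mean and standard deviation are kept, which is why `normalize()` stores them. A minimal NumPy sketch follows, using only the five raw `cnt` values shown earlier as sample data; the statistics therefore differ from those of the full dataset, and the variable names are chosen for illustration.

```python
import numpy as np

# The five raw 'cnt' values from the first rows of hour.csv shown earlier.
cnt = np.array([16, 40, 32, 13, 1], dtype=float)

mean = cnt.mean()
std = cnt.std(ddof=1)   # ddof=1 matches the default of pandas' Series.std()

scaled = (cnt - mean) / std        # zero mean, unit standard deviation
restored = scaled * std + mean     # undo the scaling

print(scaled)
print(restored)
```

Multiplying by the stored standard deviation and adding the mean brings scaled values back to ride counts; the same step applies to the network's predictions.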
> python Train.py
> python Run.py