Author: chen_h
WeChat & QQ: 862251340
WeChat public account: coderpai
Jianshu: https://www.jianshu.com/p/d94...
This tutorial is a translation of the neural network tutorial written by Peter Roelants. The author has authorized the translation; here is the original article.
The tutorial series covers how to get started with neural networks and consists of five parts. You can find the complete series at the links below.
This part of the tutorial covers logistic regression (classification).
In the previous tutorial we built a very simple model with only one input and one output. In this tutorial we will build a binary classification model that takes two input variables. Statistically, this model is known as logistic regression; the network structure is shown in the figure below.
We first import the packages needed for this tutorial.
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import colorConverter, ListedColormap
from matplotlib import cm
In this tutorial, the target classes t are generated from two independent distributions: samples with t=1 are shown in blue, and samples with t=0 are shown in red. The input X is an N x 2 matrix, and the target t is an N x 1 vector. For a more intuitive picture, see the figure below.
# Define and generate the samples
nb_of_samples_per_class = 20  # The number of samples in each class
red_mean = [-1, 0]  # The mean of the red class
blue_mean = [1, 0]  # The mean of the blue class
std_dev = 1.2  # Standard deviation of both classes
# Generate samples from both classes
x_red = np.random.randn(nb_of_samples_per_class, 2) * std_dev + red_mean
x_blue = np.random.randn(nb_of_samples_per_class, 2) * std_dev + blue_mean
# Merge samples into a set of input variables x, and corresponding set of output variables t
X = np.vstack((x_red, x_blue))
t = np.vstack((np.zeros((nb_of_samples_per_class, 1)),
               np.ones((nb_of_samples_per_class, 1))))
# Plot both classes on the x1, x2 plane
plt.plot(x_red[:,0], x_red[:,1], 'ro', label='class red')
plt.plot(x_blue[:,0], x_blue[:,1], 'bo', label='class blue')
plt.grid()
plt.legend(loc=2)
plt.xlabel('$x_1$', fontsize=15)
plt.ylabel('$x_2$', fontsize=15)
plt.axis([-4, 4, -4, 4])
plt.title('red vs. blue classes in the input space')
plt.show()
The goal of the network we designed is to predict the target t from the input x. Suppose the input is x = [x1, x2] and the weights are w = [w1, w2]. Then the probability that the predicted target equals 1, P(t=1 | x, w), is the output y of the neural network, i.e. y = σ(x · w^T). Here σ denotes the logistic function, defined as:

σ(z) = 1 / (1 + e^(-z))
If the logistic function and its derivative are not yet clear to you, see this tutorial, which describes them in detail.
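As a quick sanity check (our addition, not part of the original tutorial), the identity σ'(z) = σ(z) · (1 − σ(z)) can be verified numerically with a central finite difference:

import numpy as np

def logistic(z):
    return 1 / (1 + np.exp(-z))

# Compare the analytic derivative with a finite-difference approximation
z = np.linspace(-5, 5, 11)
eps = 1e-6
numeric = (logistic(z + eps) - logistic(z - eps)) / (2 * eps)
analytic = logistic(z) * (1 - logistic(z))  # sigma'(z) = sigma(z) * (1 - sigma(z))
print(np.max(np.abs(numeric - analytic)))  # should be close to 0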
To optimize this classification problem, we use the cross-entropy error function as the loss. For each training sample i, the cross-entropy error is defined as:

ξ(t_i, y_i) = -t_i · log(y_i) - (1 - t_i) · log(1 - y_i)
To compute the cross-entropy error over the whole training set, we simply sum the per-sample errors:

ξ(t, y) = Σ_{i=1}^{N} [ -t_i · log(y_i) - (1 - t_i) · log(1 - y_i) ]
A more detailed introduction to the cross-entropy error function can be found in this tutorial.
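To get a feel for this loss (a small illustration of ours; the helper name cross_entropy is not from the original), note that a confident correct prediction costs almost nothing, while a confident wrong one is penalized heavily:

import numpy as np

def cross_entropy(y, t):
    return -(t * np.log(y) + (1 - t) * np.log(1 - y))

print(cross_entropy(0.99, 1))  # confident and correct: ~0.01
print(cross_entropy(0.01, 1))  # confident and wrong:   ~4.61
print(cross_entropy(0.5, 1))   # unsure:                ~0.69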
In the code below, the logistic(z) function implements the logistic function, cost(y, t) implements the loss function, nn(x, w) computes the output of the neural network, and nn_predict(x, w) computes the network's class prediction.
# Define the logistic function
def logistic(z):
    return 1 / (1 + np.exp(-z))

# Define the neural network function y = 1 / (1 + numpy.exp(-x*w))
def nn(x, w):
    return logistic(x.dot(w.T))

# Define the neural network prediction function that only returns
# 1 or 0 depending on the predicted class
def nn_predict(x, w):
    return np.around(nn(x, w))

# Define the cost function
def cost(y, t):
    return -np.sum(np.multiply(t, np.log(y)) + np.multiply((1-t), np.log(1-y)))
# Plot the cost in function of the weights
# Define a vector of weights for which we want to plot the cost
nb_of_ws = 100  # compute the cost nb_of_ws times in each dimension
ws1 = np.linspace(-5, 5, num=nb_of_ws)  # weight 1
ws2 = np.linspace(-5, 5, num=nb_of_ws)  # weight 2
ws_x, ws_y = np.meshgrid(ws1, ws2)  # generate grid
cost_ws = np.zeros((nb_of_ws, nb_of_ws))  # initialize cost matrix
# Fill the cost matrix for each combination of weights
for i in range(nb_of_ws):
    for j in range(nb_of_ws):
        cost_ws[i,j] = cost(nn(X, np.asmatrix([ws_x[i,j], ws_y[i,j]])), t)
# Plot the cost function surface
plt.contourf(ws_x, ws_y, cost_ws, 20, cmap=cm.pink)
cbar = plt.colorbar()
cbar.ax.set_ylabel('$\\xi$', fontsize=15)
plt.xlabel('$w_1$', fontsize=15)
plt.ylabel('$w_2$', fontsize=15)
plt.title('Cost function surface')
plt.grid()
plt.show()
梯度降低算法的工做原理是損失函數ξ
對於每個參數的求導,而後沿着負梯度方向進行參數更新。
參數w
按照必定的學習率沿着負梯度方向更新,即w(k+1)=w(k)−Δw(k+1)
,其中Δw
能夠表示爲:
對於每一個訓練樣本i
,∂ξi/∂w
計算以下:
其中,yi=σ(zi)
是神經元的Logistic
輸出,zi=xi∗wT
是神經元的輸入。
Before deriving the derivative of the loss with respect to the weights in detail, let's first borrow a few intermediate results from this tutorial:

∂ξ/∂y = (y - t) / (y · (1 - y)),    ∂y/∂z = y · (1 - y),    ∂z/∂w = x

Following these step-by-step results, the full derivation by the chain rule is:

∂ξ_i/∂w = (∂ξ_i/∂y_i) · (∂y_i/∂z_i) · (∂z_i/∂w) = ((y_i - t_i) / (y_i · (1 - y_i))) · y_i · (1 - y_i) · x_i = (y_i - t_i) · x_i

Therefore, the update Δw_j for each individual weight can be written as:

Δw_j = μ · (y_i - t_i) · x_ij

In batch processing, we accumulate the gradients of all N samples:

Δw_j = μ · Σ_{i=1}^{N} (y_i - t_i) · x_ij
Before starting the gradient descent algorithm, you need to initialize the parameters to random values, then update them with gradient descent updates until convergence.
The gradient(w, x, t) function implements the gradient ∂ξ/∂w, and the delta_w(w_k, x, t, learning_rate) function implements Δw.
# Define the gradient function
def gradient(w, x, t):
    return (nn(x, w) - t).T * x

# Define the update function delta w which returns the
# delta w for each weight in a vector
def delta_w(w_k, x, t, learning_rate):
    return learning_rate * gradient(w_k, x, t)
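As an optional sanity check (our addition, assuming the functions and data defined above have been run), the analytic gradient can be compared against a finite-difference approximation of the cost:

# Numerically verify gradient(w, x, t) with central finite differences
w_test = np.asmatrix([1.0, -1.5])
eps = 1e-5
num_grad = np.zeros(2)
for k in range(2):
    w_plus, w_minus = np.copy(w_test), np.copy(w_test)
    w_plus[0, k] += eps
    w_minus[0, k] -= eps
    num_grad[k] = (cost(nn(X, w_plus), t) - cost(nn(X, w_minus), t)) / (2 * eps)
print(gradient(w_test, X, t))  # analytic gradient
print(num_grad)                # should be nearly identical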
We run 10 gradient descent updates on the training set X. The figure below plots the first three updates; the blue dots mark the value of w(k) at iteration k.
# Set the initial weight parameter
w = np.asmatrix([-4, -2])
# Set the learning rate
learning_rate = 0.05

# Start the gradient descent updates and plot the iterations
nb_of_iterations = 10  # Number of gradient descent updates
w_iter = [w]  # List to store the weight values over the iterations
for i in range(nb_of_iterations):
    dw = delta_w(w, X, t, learning_rate)  # Get the delta w update
    w = w - dw  # Update the weights
    w_iter.append(w)  # Store the weights for plotting
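To see that the updates are actually converging, one can also print the cost at each iteration (a small addition of ours, reusing the variables defined above):

# Track the cost over the gradient descent iterations
w_check = np.asmatrix([-4, -2])
for i in range(nb_of_iterations):
    print('iteration {}: cost = {:.4f}'.format(i, cost(nn(X, w_check), t)))
    w_check = w_check - delta_w(w_check, X, t, learning_rate)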
# Plot the first weight updates on the error surface
# Plot the error surface
plt.contourf(ws_x, ws_y, cost_ws, 20, alpha=0.9, cmap=cm.pink)
cbar = plt.colorbar()
cbar.ax.set_ylabel('cost')
# Plot the updates
for i in range(1, 4):
    w1 = w_iter[i-1]
    w2 = w_iter[i]
    # Plot the weight-cost value and the line that represents the update
    plt.plot(w1[0,0], w1[0,1], 'bo')  # Plot the weight cost value
    plt.plot([w1[0,0], w2[0,0]], [w1[0,1], w2[0,1]], 'b-')
    plt.text(w1[0,0]-0.2, w1[0,1]+0.4, '$w({})$'.format(i), color='b')
w1 = w_iter[3]
# Plot the last weight
plt.plot(w1[0,0], w1[0,1], 'bo')
plt.text(w1[0,0]-0.2, w1[0,1]+0.4, '$w({})$'.format(4), color='b')
# Show figure
plt.xlabel('$w_1$', fontsize=15)
plt.ylabel('$w_2$', fontsize=15)
plt.title('Gradient descent updates on cost surface')
plt.grid()
plt.show()
With the following code, we visualize the trained result as a decision boundary over the input space.
# Plot the resulting decision boundary
# Generate a grid over the input space to plot the color of the
# classification at that grid point
nb_of_xs = 200
xs1 = np.linspace(-4, 4, num=nb_of_xs)
xs2 = np.linspace(-4, 4, num=nb_of_xs)
xx, yy = np.meshgrid(xs1, xs2)  # create the grid
# Initialize and fill the classification plane
classification_plane = np.zeros((nb_of_xs, nb_of_xs))
for i in range(nb_of_xs):
    for j in range(nb_of_xs):
        classification_plane[i,j] = nn_predict(np.asmatrix([xx[i,j], yy[i,j]]), w)
# Create a color map to show the classification colors of each grid point
cmap = ListedColormap([
    colorConverter.to_rgba('r', alpha=0.30),
    colorConverter.to_rgba('b', alpha=0.30)])
# Plot the classification plane with decision boundary and input samples
plt.contourf(xx, yy, classification_plane, cmap=cmap)
plt.plot(x_red[:,0], x_red[:,1], 'ro', label='target red')
plt.plot(x_blue[:,0], x_blue[:,1], 'bo', label='target blue')
plt.grid()
plt.legend(loc=2)
plt.xlabel('$x_1$', fontsize=15)
plt.ylabel('$x_2$', fontsize=15)
plt.title('red vs. blue classification boundary')
plt.show()
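Beyond the visual check, we can also measure how many training samples the learned weights classify correctly (our addition, using the nn_predict function and the data defined above):

# Compute the training accuracy of the learned weights
predictions = nn_predict(X, w)
accuracy = np.mean(np.asarray(predictions) == np.asarray(t))
print('training accuracy: {:.2f}'.format(accuracy))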