(1) Getting Started with Neural Networks: Linear Regression

Date: 2019-12-13


Author: chen_h
WeChat & QQ: 862251340
WeChat official account: coderpai
Jianshu: https://www.jianshu.com/p/0da...html

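This tutorial is a translation of the neural network tutorial written by Peter Roelants. The author has authorized the translation. The original article is here.

The tutorial series introduces how to get started with neural networks and consists of five parts. The full content is available at the following links.

The code in this tutorial was generated from a Python 2 IPython Notebook. At the end of the tutorial I give a link to all of the code to help with learning. Matrix operations are done with NumPy and plots are drawn with Matplotlib. If you have not installed these packages yet, I strongly recommend installing them through Anaconda Python, which bundles every package needed to run this tutorial and is very convenient to use.

We first import the packages the tutorial needs: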

from __future__ import print_function

import numpy as np
import matplotlib.pyplot as plt

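Linear regression

This tutorial covers three topics:

A very simple neural network
Some concepts, such as the target function and the cost function
Gradient descent

We start by building the simplest possible neural network. It has one input and one output and is used to build a linear regression model that predicts a target value t from an input x. The model is y = x * w, where x is the input, w is the weight, and y is the prediction. The model can be represented by the following figure:

A regular neural network has multiple layers, non-linear activation functions, and a bias unit at each node. In this tutorial we use a single layer with a single weight w, and no activation function or bias unit. In simple linear regression, the weight w and the bias are usually written together as a parameter vector β, where the bias is the intercept on the y-axis and w is the slope of the regression line. In linear regression these parameters are usually fitted with the least squares method.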
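For the one-parameter model used here (no bias), the least squares fit mentioned above has a simple closed form: minimizing the sum of squared errors over w gives w = Σ x_i t_i / Σ x_i². The following is a minimal sketch of that calculation, not code from the original tutorial. The seed, the generated data, and the names `w_closed_form` and `w_lstsq` are assumptions made for illustration.

```python
import numpy as np

# Hypothetical data for illustration: targets follow t = 2x plus small noise
rng = np.random.RandomState(0)
x = rng.uniform(0, 1, 20)
t = 2 * x + rng.randn(20) * 0.2

# Closed-form least squares slope for a line through the origin:
# minimizing sum((t - x*w)**2) over w gives w = sum(x*t) / sum(x*x)
w_closed_form = np.sum(x * t) / np.sum(x * x)

# Cross-check with NumPy's general least squares solver (x as a one-column matrix)
w_lstsq = np.linalg.lstsq(x[:, np.newaxis], t, rcond=None)[0][0]

print('closed form: {:.4f}, lstsq: {:.4f}'.format(w_closed_form, w_lstsq))
```

Both values should agree up to floating-point error and lie close to the true slope of 2. This closed form is a useful reference point when judging how well gradient descent has converged.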

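In this tutorial, our goal is to minimize the cost function so that the actual output y is as close as possible to the correct result t. We define the cost function as the sum of squared errors over the N training samples:

$$\xi = \sum_{i=1}^{N} (t_i - y_i)^2$$

To optimize the cost function we use gradient descent, a common optimization method for neural networks.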

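Define the target function

In this example, the targets t are generated by a function f with added Gaussian noise N(0, 0.2), where N is the normal distribution with mean 0 and variance 0.2 (the code below scales standard normal samples by 0.2, so in practice 0.2 acts as the standard deviation). f is defined as f(x) = 2x, where x is the input; the slope of the regression line is 2 and the intercept is 0. The targets are therefore t = f(x) + N(0, 0.2).

We draw 20 samples from a uniform distribution as the inputs x and then build the targets t. The program below generates x and t and plots the linear relationship between them.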

# Define the vector of input samples as x, with 20 values sampled from a uniform distribution
# between 0 and 1
x = np.random.uniform(0, 1, 20)

# Generate the target values t from x with small gaussian noise so the estimation won't be perfect.
# Define a function f that represents the line that generates t without noise
def f(x): return x * 2

# Create the targets t with some gaussian noise
noise_variance = 0.2 # Scale of the gaussian noise (applied below as a standard deviation)
# Gaussian noise error for each sample in x
noise = np.random.randn(x.shape[0]) * noise_variance
# Create targets t
t = f(x) + noise

# Plot the target t versus the input x
plt.plot(x, t, 'o', label='t')
# Plot the initial line
plt.plot([0, 1], [f(0), f(1)], 'b-', label='f(x)')
plt.xlabel('$x$', fontsize=15)
plt.ylabel('$t$', fontsize=15)
plt.ylim([0,2])
plt.title('inputs (x) vs targets (t)')
plt.grid()
plt.legend(loc=2)
plt.show()
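The noise term above multiplies standard normal samples by 0.2. The short check below is a sketch added for illustration (the seed, the sample size, and the variable names are assumptions). It shows that this scaling produces a standard deviation of about 0.2, that is, a variance of about 0.04, and that a variance of 0.2 would instead require scaling by the square root of 0.2.

```python
import numpy as np

rng = np.random.RandomState(0)

# Same scaling as the noise term above: standard normal samples times 0.2
samples = rng.randn(100000) * 0.2
print('std: {:.3f}, variance: {:.3f}'.format(samples.std(), samples.var()))

# To obtain a variance of 0.2, scale by sqrt(0.2) instead
samples_var = rng.randn(100000) * np.sqrt(0.2)
print('std: {:.3f}, variance: {:.3f}'.format(samples_var.std(), samples_var.var()))
```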

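Define the cost function

We optimize the parameter w of the model y = w * x so that the cost function is minimized over the N samples in the training set.

That is, the optimization objective is:

$$\underset{w}{\operatorname{argmin}} \sum_{i=1}^{N} (t_i - y_i)^2$$

Note that the errors of all samples are summed in this function. This is known as batch training. It is also possible to update on one sample at a time, which is common in online training.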
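To make the contrast with batch training concrete, here is a minimal sketch of online (per-sample) updates. It is not part of the original tutorial: the data, the seed, the number of passes, and the learning rate are assumptions. It uses the per-sample gradient 2 * x_i * (y_i - t_i), which the gradient descent section derives.

```python
import numpy as np

# Hypothetical data for illustration: targets follow t = 2x plus small noise
rng = np.random.RandomState(0)
x = rng.uniform(0, 1, 20)
t = 2 * x + rng.randn(20) * 0.2

w = 0.1              # initial weight
learning_rate = 0.1  # assumed step size for this sketch

# Online training: update w after every single sample instead of
# summing the gradients of all samples first
for epoch in range(50):
    for i in rng.permutation(len(x)):
        y_i = x[i] * w                    # prediction for sample i
        grad_i = 2 * x[i] * (y_i - t[i])  # gradient of (t_i - y_i)**2 w.r.t. w
        w = w - learning_rate * grad_i

print('w after online training: {:.4f}'.format(w))
```

With a constant learning rate the weight keeps fluctuating slightly around the least squares solution rather than settling exactly on it, because each step only sees one noisy sample.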

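We use the functions below to plot the cost as a function of the weight. The plot shows that the cost reaches its minimum at a w close to 2, which is the slope of f(x). The cost function is convex and has a single global minimum.

The function nn(x, w) implements the neural network model, and cost(y, t) implements the cost function.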

# Define the neural network function y = x * w
def nn(x, w): return x*w

# Define the cost function
def cost(y, t): return ((t - y) ** 2).sum()
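The two functions above are enough to evaluate the cost on a grid of weights and locate its minimum. The following sketch assumes its own sample data, seed, and grid (`ws` from 0 to 4 with 100 points); these are illustrative choices, not values confirmed by the text.

```python
import numpy as np

def nn(x, w): return x * w
def cost(y, t): return ((t - y) ** 2).sum()

# Hypothetical data for illustration: targets follow t = 2x plus small noise
rng = np.random.RandomState(0)
x = rng.uniform(0, 1, 20)
t = 2 * x + rng.randn(20) * 0.2

# Evaluate the cost for each weight on an assumed grid
ws = np.linspace(0, 4, num=100)
cost_ws = np.array([cost(nn(x, w), t) for w in ws])

# The grid point with the lowest cost approximates the optimal weight
w_best = ws[np.argmin(cost_ws)]
print('grid minimum near w = {:.2f}'.format(w_best))

# For a convex curve sampled on an even grid, second differences are non-negative
second_diff = np.diff(cost_ws, n=2)
print('convex on the grid:', bool(np.all(second_diff > 0)))
```

Because the cost is a quadratic in w, the second differences are constant and positive, which matches the claim that there is a single global minimum.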

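Optimizing the cost function

For the simple cost function in this tutorial, you may be able to spot the optimal weight at a glance. For complex or higher-dimensional cost functions that is not possible, which is why we use optimization methods.

Gradient descent

Gradient descent is a commonly used optimization algorithm for training neural networks. It takes the derivative of the cost function with respect to each parameter and updates the parameter in the direction of the negative gradient. The weight w is updated iteratively:

$$w(k+1) = w(k) - \Delta w(k)$$

where w(k) is the value of w at iteration k, and Δw is defined as:

$$\Delta w = \mu \frac{\partial \xi}{\partial w}$$

where μ is the learning rate, which sets the size of each update step, and ∂ξ/∂w is the gradient of the cost function ξ with respect to w. For each training sample i, the gradient follows from the chain rule:

$$\frac{\partial \xi_i}{\partial w} = \frac{\partial y_i}{\partial w} \frac{\partial \xi_i}{\partial y_i}$$

where ξ_i is the cost of sample i, so ∂ξ_i/∂y_i is:

$$\frac{\partial \xi_i}{\partial y_i} = \frac{\partial (t_i - y_i)^2}{\partial y_i} = -2 (t_i - y_i) = 2 (y_i - t_i)$$

Because y_i = x_i * w, ∂y_i/∂w is:

$$\frac{\partial y_i}{\partial w} = \frac{\partial (x_i w)}{\partial w} = x_i$$

The full expression of Δw for training sample i is therefore:

$$\Delta w = \mu \frac{\partial \xi_i}{\partial w} = \mu \cdot 2 x_i (y_i - t_i)$$

In batch processing, the gradients of all samples are summed:

$$\Delta w = \mu \cdot 2 \sum_{i=1}^{N} x_i (y_i - t_i)$$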
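A derivation like this can be sanity-checked numerically by comparing the summed analytic gradient 2 * x_i * (y_i - t_i) with a central finite difference of the cost. The sketch below is an added illustration; the data, the seed, the test weight, and the step `eps` are assumptions.

```python
import numpy as np

def nn(x, w): return x * w
def cost(y, t): return ((t - y) ** 2).sum()
def gradient(w, x, t): return 2 * x * (nn(x, w) - t)

# Hypothetical data for illustration: targets follow t = 2x plus small noise
rng = np.random.RandomState(0)
x = rng.uniform(0, 1, 20)
t = 2 * x + rng.randn(20) * 0.2

w = 0.1     # weight at which the gradient is checked
eps = 1e-6  # assumed finite-difference step

# Analytic batch gradient: sum of the per-sample gradients
analytic = gradient(w, x, t).sum()
# Numerical gradient: central difference of the cost around w
numeric = (cost(nn(x, w + eps), t) - cost(nn(x, w - eps), t)) / (2 * eps)

print('analytic: {:.6f}, numeric: {:.6f}'.format(analytic, numeric))
```

The two numbers should agree to several decimal places. A large mismatch would point to a sign or factor error in the derivation.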

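Before running gradient descent, the weight must be initialized. Gradient descent is then applied repeatedly until the algorithm converges. The learning rate is a hyperparameter and has to be tuned separately.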
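The effect of the learning rate can be seen by running the same batch update with several values. For this quadratic cost each update multiplies the distance to the optimum by (1 - 2 * μ * Σ x_i²), so the iteration only converges when μ is below 1 / Σ x_i². The sketch below is an added illustration: the data, the seed, the three learning rates, and the iteration count are assumptions.

```python
import numpy as np

def nn(x, w): return x * w
def gradient(w, x, t): return 2 * x * (nn(x, w) - t)

# Hypothetical data for illustration: targets follow t = 2x plus small noise
rng = np.random.RandomState(0)
x = rng.uniform(0, 1, 20)
t = 2 * x + rng.randn(20) * 0.2

# Least squares optimum, used only as a reference point
w_star = np.sum(x * t) / np.sum(x * x)
print('stability limit for the learning rate: {:.4f}'.format(1.0 / np.sum(x * x)))

final_w = {}
for learning_rate in [0.01, 0.1, 0.5]:
    w = 0.1  # same initial weight for every run
    for k in range(20):
        w = w - learning_rate * gradient(w, x, t).sum()
    final_w[learning_rate] = w
    print('learning rate {}: w = {:.4g}, distance to optimum = {:.3g}'.format(
        learning_rate, w, abs(w - w_star)))
```

On this sample data the smallest rate approaches the optimum slowly, the middle rate converges quickly, and the largest rate exceeds the stability limit and diverges.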

The function gradient(w, x, t) implements the gradient ∂ξ/∂w, and delta_w(w_k, x, t, learning_rate) implements Δw.

# define the gradient function. Remember that y = nn(x, w) = x * w
def gradient(w, x, t):
  return 2 * x * (nn(x, w) - t)

# define the update function delta w
def delta_w(w_k, x, t, learning_rate):
  return learning_rate * gradient(w_k, x, t).sum()

# Set the initial weight parameter
w = 0.1
# Set the learning rate
learning_rate = 0.1

# Start performing the gradient descent updates, and print the weights and cost:
nb_of_iterations = 4 # number of gradient descent updates
w_cost = [(w, cost(nn(x, w), t))] # List to store the weight, costs values
for i in range(nb_of_iterations):
  dw = delta_w(w, x, t, learning_rate) # Get the delta w update
  w = w - dw # Update the current weight parameter
  w_cost.append((w, cost(nn(x, w), t))) # Add weight, cost to list

# Print the final w, and cost
for i in range(0, len(w_cost)):
  print('w({}): {:.4f} \t cost: {:.4f}'.format(i, w_cost[i][0], w_cost[i][1]))

# output
w(0): 0.1000   cost: 23.3917
w(1): 2.3556   cost: 1.0670
w(2): 2.0795   cost: 0.7324
w(3): 2.1133   cost: 0.7274
w(4): 2.1091   cost: 0.7273

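The results show that gradient descent quickly converges to a value close to 2 (about 2.11 in this run). Next we visualize the gradient descent process.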
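Instead of fixing the number of updates in advance, the loop can stop once the update becomes negligible. The sketch below shows one possible stopping rule and is not the tutorial's own code; the data, the seed, `tolerance`, and `max_iterations` are assumptions.

```python
import numpy as np

def nn(x, w): return x * w
def gradient(w, x, t): return 2 * x * (nn(x, w) - t)

# Hypothetical data for illustration: targets follow t = 2x plus small noise
rng = np.random.RandomState(0)
x = rng.uniform(0, 1, 20)
t = 2 * x + rng.randn(20) * 0.2

w = 0.1                # initial weight
learning_rate = 0.1
tolerance = 1e-6       # assumed threshold on the size of an update
max_iterations = 1000  # safety cap in case the updates never become small

for k in range(max_iterations):
    dw = learning_rate * gradient(w, x, t).sum()
    w = w - dw
    if abs(dw) < tolerance:  # updates are negligible: treat as converged
        break

print('stopped after {} updates, w = {:.4f}'.format(k + 1, w))
```

The cap on the number of iterations matters when the learning rate is too large, since the updates then grow instead of shrinking and the tolerance is never reached.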

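# ws and cost_ws are not defined in the code shown above; they are assumed
# here to be a grid of weights and the cost at each of those weights
ws = np.linspace(0, 4, num=100)  # weight values
cost_ws = np.array([cost(nn(x, w), t) for w in ws])  # cost for each weight in ws
# Plot the first 3 gradient descent updates on the cost curve
plt.plot(ws, cost_ws, 'r-')  # Plot the error curve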
# Plot the updates
for i in range(0, len(w_cost)-2):
  w1, c1 = w_cost[i]
  w2, c2 = w_cost[i+1]
  plt.plot(w1, c1, 'bo')
  plt.plot([w1, w2],[c1, c2], 'b-')
  plt.text(w1, c1+0.5, '$w({})$'.format(i)) 
# Show figure
plt.xlabel('$w$', fontsize=15)
plt.ylabel('$\\xi$', fontsize=15)
plt.title('Gradient descent updates plotted on cost function')
plt.grid()
plt.show()

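Gradient updates

The figure above visualizes the gradient descent process. The blue dots mark the value of w(k) at iteration k. The figure shows w converging toward 2.0. The model converges within 10 iterations, as shown in the figure below.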

w = 0
# Start performing the gradient descent updates
nb_of_iterations = 10  # number of gradient descent updates
for i in range(nb_of_iterations):
  dw = delta_w(w, x, t, learning_rate)  # get the delta w update
  w = w - dw  # update the current weight parameter

# Plot the fitted line against the target line
# Plot the target t versus the input x
plt.plot(x, t, 'o', label='t')
# Plot the initial line
plt.plot([0, 1], [f(0), f(1)], 'b-', label='f(x)')
# plot the fitted line
plt.plot([0, 1], [0*w, 1*w], 'r-', label='fitted line')
plt.xlabel('input x')
plt.ylabel('target t')
plt.ylim([0,2])
plt.title('input vs. target')
plt.grid()
plt.legend(loc=2)
plt.show()

For the full code, click here.
