AI in Practice: Assignment 3 (陳澤寅)

Assignment 3: Gradient Descent with Mini-batches


1. Overview

  • Course: AI in Practice 2019
  • Assignment requirements: assignment requirements
  • My goal in this course: understand AI theory and improve my coding ability
  • How this assignment helps me reach that goal: understand how a single-layer neural network works, learn the strengths and weaknesses of several gradient descent methods, and implement a simple algorithm myself

2. Single-Layer Neural Network and Its Weight-Update Rule

  • Single-variable stochastic gradient descent, SGD (Stochastic Gradient Descent)

  • Forward pass:
    \[Z^{n \times 1}=W^{n \times f} \cdot X^{f \times 1} + B^{n \times 1}\]
    \[A^{n \times 1}=a(Z)\]

  • Backward pass:
    \[ \Delta Z^{n \times 1} = \frac{\partial J}{\partial Z} = A^{n \times 1} - Y^{n \times 1}\]
    \[ W^{n \times f} = W^{n \times f} - \eta \cdot (\Delta Z^{n \times 1} \cdot (X^T)^{1 \times f})\]
    \[ B^{n \times 1} = B^{n \times 1} - \eta \cdot \Delta Z^{n \times 1}\]
    • where:
      \[f=\text{number of features},\ m=\text{number of samples},\ n=\text{number of neurons},\ \eta=\text{learning rate} \\ A=\text{prediction},\ Y=\text{label},\ X=\text{input},\ X^T=\text{transpose of } X\]
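
To make the update rule concrete, here is a minimal NumPy sketch of a single mini-batch update with an identity activation and squared loss. It is my own illustration, not the assignment code; the function name and shapes are assumptions chosen to match the formulas above.

import numpy as np

def minibatch_step(W, B, X, Y, eta):
    """W: (n, f), B: (n, 1), X: (f, m), Y: (n, m), eta: learning rate."""
    m = X.shape[1]
    Z = np.dot(W, X) + B                      # forward pass: Z = W.X + B
    dZ = Z - Y                                # gradient of the squared loss w.r.t. Z
    dW = np.dot(dZ, X.T) / m                  # averaged gradient for W
    dB = dZ.sum(axis=1, keepdims=True) / m    # averaged gradient for B
    return W - eta * dW, B - eta * dB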

3. Implementation Steps

  • Follow the steps in the course PDF and write the corresponding single-layer neural network model.
  • Use SGD to update the weights at each step.
  • Record the loss after the weight update of every epoch.
  • Change batch_size to 5, 10 and 15 and observe how fast and how well the loss converges under each batch size.

4. Code

#-*-coding:utf-8-*-
import numpy as np
import matplotlib.pyplot as plt
from pathlib import Path

x_data_name = "TemperatureControlXData.dat"  # file holding the X data
y_data_name = "TemperatureControlYData.dat"  # file holding the Y data


# This class stores the parameters of one linear-model snapshot (loss, w, b, epoch, iteration).

class CData(object):
    def __init__(self, loss, w, b, epoch, iteration):
        self.loss = loss
        self.w = w
        self.b = b
        self.epoch = epoch
        self.iteration = iteration

# Read the X and Y data from file and reshape each array into a single row.

def ReadData():
    Xfile = Path(x_data_name)
    Yfile = Path(y_data_name)
    if Xfile.exists() and Yfile.exists():
        X = np.load(Xfile)
        Y = np.load(Yfile)
        return X.reshape(1, -1), Y.reshape(1, -1)  # reshape both arrays into one row each
    else:
        return None, None
# Forward pass: z = w*x + b
def Forward(w, b, x):
    z = np.dot(w, x) + b
    return z

# Backward pass: compute how much w and b should change
def BackPropagation(x, y, z):
    m = np.shape(x)[1]  # number of samples in the batch
    # delta z = z - y
    deltaZ = z - y

    # delta b = sum(delta z) / m
    deltaB = deltaZ.sum(axis=1, keepdims=True) / m

    # delta w = (delta z) * (x') / m
    deltaW = np.dot(deltaZ, x.T) / m
    return deltaW, deltaB

# Update w and b after each backward pass
def UpdateWeights(w, b, deltaW, deltaB, eta):
    w = w - eta * deltaW
    b = b - eta * deltaB
    return w, b

# Initialize w and b as in the course example; the matrix shapes depend on
# the number of inputs and outputs.
def SetParam(num_input,num_output, flag):
    if flag == 0:
        # zero
        W = np.zeros((num_output, num_input))
    elif flag == 1:
        # normal distribution
        W = np.random.normal(size=(num_output, num_input))
    elif flag == 2:
        # xavier
        W = np.random.uniform(
            -np.sqrt(6 / (num_input + num_output)),
            np.sqrt(6 / (num_input + num_output)),
            size=(num_output, num_input))

    B = np.zeros((num_output, 1))
    return W, B

# Compute the loss over the whole data set after each weight update
def GetLoss(w,b,x,y):
    m = x.shape[1]
    z = np.dot(w, x) + b
    LOSS = (z - y) ** 2
    loss = LOSS.sum() / m / 2
    return loss
# Take a batch of the given size out of x and y for one training iteration
def GetBatchSamples(X, Y, batch_size, iteration):
    num_feature = X.shape[0]
    start = iteration * batch_size
    end = start + batch_size
    batch_x = X[0:num_feature, start:end].reshape(num_feature, batch_size)
    batch_y = Y[0, start:end].reshape(1, batch_size)
    return batch_x, batch_y

# Shuffle the samples and yield mini-batches; X and y hold one sample per column
def shuffle_batch(X, y, batch_size):
    m = X.shape[1]                      # number of samples (columns)
    rnd_idx = np.random.permutation(m)
    n_batches = m // batch_size
    for batch_idx in np.array_split(rnd_idx, n_batches):
        X_batch, y_batch = X[:, batch_idx], y[:, batch_idx]
        yield X_batch, y_batch

if __name__ == '__main__':

    # set the learning rate eta, the batch sizes to compare and the number of epochs
    eta = 0.1
    size = [5, 10, 15]
    epochMax = 50
    # x, y are the training data
    x, y = ReadData()
    print(np.shape(x), np.shape(y))

    plt.figure()

    for batch_size in size:
        loss = []
        w, b = SetParam(1, 1, 2)
        for epoch in range(epochMax):
            # loop over the shuffled mini-batches of this epoch
            for x_batch, y_batch in shuffle_batch(x, y, batch_size):

                # forward pass on the batch
                z_batch = Forward(w, b, x_batch)
                # backward pass
                deltaW,deltaB = BackPropagation(x_batch,y_batch,z_batch)
                # update w and b
                w,b = UpdateWeights(w,b,deltaW,deltaB,eta)
            # loss over the full data set with the parameters after this epoch
            c = GetLoss(w, b, x, y)
            print("Epoch = %d , w = %f,b = %f ,loss = %f" % (epoch, w.item(), b.item(), c))
            loss.append(c)
        axisX = np.arange(0, epochMax, 1)
        plt.plot(axisX, loss)
    plt.legend(['batch_size : 5','batch_size : 10','batch_size : 15'],loc ='upper right')
    plt.show()
    # plot the fitted line on top of the raw data
    plt.figure()
    xmin = np.min(x)
    xmax = np.max(x)
    lineX = np.arange(xmin, xmax, 0.01)
    lineX = lineX.reshape(1, -1)
    lineY = lineX * w + b
    print(np.shape(lineY))
    plt.scatter(x, y, s=1)
    plt.plot(lineX[0, :], lineY[0, :], 'r')

    plt.show()
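
The two .dat files are the course-provided data. As a convenience for running the script when they are missing, the sketch below (appended to the script above, so it reuses numpy and the filename constants) writes a stand-in data set in the same on-disk format; the linear relation y ≈ 2x + 3 and the noise level are my assumptions, not the real TemperatureControl data.

# Optional helper (my own addition, not part of the assignment): write synthetic
# data files that ReadData() above can load. Call make_fake_data() once before
# running the script. The underlying line and noise level are assumed values,
# not the real course data.
def make_fake_data(m=200, seed=0):
    rng = np.random.RandomState(seed)
    X = rng.uniform(0, 1, size=m)
    Y = 2.0 * X + 3.0 + rng.normal(0, 0.1, size=m)
    # np.save would append ".npy" to a plain string filename; writing through an
    # open file object keeps the exact ".dat" names that ReadData() expects.
    with open(x_data_name, "wb") as f:
        np.save(f, X)
    with open(y_data_name, "wb") as f:
        np.save(f, Y)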

5. Experimental Results

  • The figure above (omitted here) shows how the loss changes with the epoch under different batch_size values; the loss drops fastest when batch_size is 10.

  • We also change the learning rate to 0.01, 0.1, 0.2 and 0.5 and observe the resulting curves:
    • eta = 0.01 (figure omitted)

    • eta = 0.1 (figure omitted)

    • eta = 0.2 (figure omitted)

    • eta = 0.5 (figure omitted)

  • As the learning rate grows, the oscillation of SGD also grows.

  • The fitted line obtained with the final weights is shown in the last figure (omitted).

Loss per epoch:

batch_size = 5

Epoch = 0 , w = 1.263874,b = 3.099159 ,loss = 0.061878
Epoch = 1 , w = 1.454535,b = 3.274156 ,loss = 0.017243
Epoch = 2 , w = 1.529773,b = 3.253623 ,loss = 0.014271
Epoch = 3 , w = 1.589247,b = 3.223812 ,loss = 0.012070
Epoch = 4 , w = 1.640776,b = 3.196955 ,loss = 0.010385
Epoch = 5 , w = 1.685762,b = 3.173438 ,loss = 0.009097
Epoch = 6 , w = 1.725059,b = 3.152890 ,loss = 0.008113
Epoch = 7 , w = 1.759388,b = 3.134939 ,loss = 0.007362
Epoch = 8 , w = 1.789379,b = 3.119256 ,loss = 0.006788
Epoch = 9 , w = 1.815578,b = 3.105557 ,loss = 0.006349
Epoch = 10 , w = 1.838466,b = 3.093589 ,loss = 0.006014
Epoch = 11 , w = 1.858460,b = 3.083133 ,loss = 0.005758
Epoch = 12 , w = 1.875927,b = 3.074000 ,loss = 0.005562
Epoch = 13 , w = 1.891186,b = 3.066021 ,loss = 0.005412
Epoch = 14 , w = 1.904517,b = 3.059050 ,loss = 0.005297
Epoch = 15 , w = 1.916162,b = 3.052961 ,loss = 0.005210
Epoch = 16 , w = 1.926335,b = 3.047641 ,loss = 0.005142
Epoch = 17 , w = 1.935222,b = 3.042994 ,loss = 0.005091
Epoch = 18 , w = 1.942986,b = 3.038934 ,loss = 0.005051
Epoch = 19 , w = 1.949769,b = 3.035387 ,loss = 0.005021
Epoch = 20 , w = 1.955694,b = 3.032289 ,loss = 0.004998
Epoch = 21 , w = 1.960870,b = 3.029582 ,loss = 0.004980
Epoch = 22 , w = 1.965392,b = 3.027218 ,loss = 0.004966
Epoch = 23 , w = 1.969342,b = 3.025152 ,loss = 0.004956
Epoch = 24 , w = 1.972793,b = 3.023348 ,loss = 0.004948
Epoch = 25 , w = 1.975808,b = 3.021771 ,loss = 0.004941
Epoch = 26 , w = 1.978442,b = 3.020394 ,loss = 0.004937
Epoch = 27 , w = 1.980742,b = 3.019191 ,loss = 0.004933
Epoch = 28 , w = 1.982752,b = 3.018140 ,loss = 0.004930
Epoch = 29 , w = 1.984508,b = 3.017222 ,loss = 0.004928
Epoch = 30 , w = 1.986042,b = 3.016420 ,loss = 0.004926
Epoch = 31 , w = 1.987382,b = 3.015719 ,loss = 0.004925
Epoch = 32 , w = 1.988553,b = 3.015107 ,loss = 0.004924
Epoch = 33 , w = 1.989575,b = 3.014572 ,loss = 0.004923
Epoch = 34 , w = 1.990469,b = 3.014105 ,loss = 0.004922
Epoch = 35 , w = 1.991249,b = 3.013697 ,loss = 0.004922
Epoch = 36 , w = 1.991931,b = 3.013340 ,loss = 0.004921
Epoch = 37 , w = 1.992527,b = 3.013029 ,loss = 0.004921
Epoch = 38 , w = 1.993047,b = 3.012757 ,loss = 0.004921
Epoch = 39 , w = 1.993502,b = 3.012519 ,loss = 0.004920
Epoch = 40 , w = 1.993899,b = 3.012311 ,loss = 0.004920
Epoch = 41 , w = 1.994246,b = 3.012130 ,loss = 0.004920
Epoch = 42 , w = 1.994549,b = 3.011972 ,loss = 0.004920
Epoch = 43 , w = 1.994813,b = 3.011833 ,loss = 0.004920
Epoch = 44 , w = 1.995045,b = 3.011712 ,loss = 0.004920
Epoch = 45 , w = 1.995247,b = 3.011607 ,loss = 0.004920
Epoch = 46 , w = 1.995423,b = 3.011514 ,loss = 0.004920
Epoch = 47 , w = 1.995577,b = 3.011434 ,loss = 0.004920
Epoch = 48 , w = 1.995712,b = 3.011363 ,loss = 0.004920
Epoch = 49 , w = 1.995830,b = 3.011302 ,loss = 0.004920

batch_size = 10

Epoch = 0 , w = 2.627484,b = 2.662620 ,loss = 0.022283
Epoch = 1 , w = 2.482123,b = 2.755555 ,loss = 0.014914
Epoch = 2 , w = 2.366523,b = 2.817388 ,loss = 0.010696
Epoch = 3 , w = 2.278350,b = 2.864496 ,loss = 0.008255
Epoch = 4 , w = 2.211115,b = 2.900418 ,loss = 0.006844
Epoch = 5 , w = 2.159845,b = 2.927810 ,loss = 0.006031
Epoch = 6 , w = 2.120750,b = 2.948697 ,loss = 0.005563
Epoch = 7 , w = 2.090938,b = 2.964625 ,loss = 0.005295
Epoch = 8 , w = 2.068205,b = 2.976770 ,loss = 0.005142
Epoch = 9 , w = 2.050870,b = 2.986031 ,loss = 0.005055
Epoch = 10 , w = 2.037652,b = 2.993094 ,loss = 0.005006
Epoch = 11 , w = 2.027572,b = 2.998479 ,loss = 0.004979
Epoch = 12 , w = 2.019886,b = 3.002585 ,loss = 0.004965
Epoch = 13 , w = 2.014025,b = 3.005717 ,loss = 0.004957
Epoch = 14 , w = 2.009556,b = 3.008104 ,loss = 0.004953
Epoch = 15 , w = 2.006148,b = 3.009925 ,loss = 0.004951
Epoch = 16 , w = 2.003549,b = 3.011314 ,loss = 0.004951
Epoch = 17 , w = 2.001568,b = 3.012372 ,loss = 0.004950
Epoch = 18 , w = 2.000056,b = 3.013180 ,loss = 0.004951
Epoch = 19 , w = 1.998904,b = 3.013795 ,loss = 0.004951
Epoch = 20 , w = 1.998025,b = 3.014265 ,loss = 0.004951
Epoch = 21 , w = 1.997355,b = 3.014623 ,loss = 0.004951
Epoch = 22 , w = 1.996845,b = 3.014896 ,loss = 0.004951
Epoch = 23 , w = 1.996455,b = 3.015104 ,loss = 0.004952
Epoch = 24 , w = 1.996158,b = 3.015263 ,loss = 0.004952
Epoch = 25 , w = 1.995931,b = 3.015384 ,loss = 0.004952
Epoch = 26 , w = 1.995759,b = 3.015476 ,loss = 0.004952
Epoch = 27 , w = 1.995627,b = 3.015546 ,loss = 0.004952
Epoch = 28 , w = 1.995526,b = 3.015600 ,loss = 0.004952
Epoch = 29 , w = 1.995450,b = 3.015641 ,loss = 0.004952
Epoch = 30 , w = 1.995391,b = 3.015672 ,loss = 0.004952
Epoch = 31 , w = 1.995347,b = 3.015696 ,loss = 0.004952
Epoch = 32 , w = 1.995313,b = 3.015714 ,loss = 0.004952
Epoch = 33 , w = 1.995287,b = 3.015728 ,loss = 0.004952
Epoch = 34 , w = 1.995267,b = 3.015738 ,loss = 0.004952
Epoch = 35 , w = 1.995252,b = 3.015746 ,loss = 0.004952
Epoch = 36 , w = 1.995241,b = 3.015753 ,loss = 0.004952
Epoch = 37 , w = 1.995232,b = 3.015757 ,loss = 0.004952
Epoch = 38 , w = 1.995225,b = 3.015761 ,loss = 0.004952
Epoch = 39 , w = 1.995220,b = 3.015764 ,loss = 0.004952
Epoch = 40 , w = 1.995216,b = 3.015766 ,loss = 0.004952
Epoch = 41 , w = 1.995213,b = 3.015767 ,loss = 0.004952
Epoch = 42 , w = 1.995211,b = 3.015768 ,loss = 0.004952
Epoch = 43 , w = 1.995209,b = 3.015769 ,loss = 0.004952
Epoch = 44 , w = 1.995208,b = 3.015770 ,loss = 0.004952
Epoch = 45 , w = 1.995207,b = 3.015771 ,loss = 0.004952
Epoch = 46 , w = 1.995206,b = 3.015771 ,loss = 0.004952
Epoch = 47 , w = 1.995206,b = 3.015771 ,loss = 0.004952
Epoch = 48 , w = 1.995205,b = 3.015772 ,loss = 0.004952
Epoch = 49 , w = 1.995205,b = 3.015772 ,loss = 0.004952

batch_size = 15

Epoch = 0 , w = 0.359754,b = 3.013313 ,loss = 0.427990
Epoch = 1 , w = 0.753299,b = 3.498736 ,loss = 0.075995
Epoch = 2 , w = 0.903888,b = 3.542616 ,loss = 0.055048
Epoch = 3 , w = 1.004769,b = 3.512235 ,loss = 0.046501
Epoch = 4 , w = 1.090577,b = 3.472006 ,loss = 0.039700
Epoch = 5 , w = 1.167951,b = 3.433004 ,loss = 0.034029
Epoch = 6 , w = 1.238552,b = 3.396928 ,loss = 0.029284
Epoch = 7 , w = 1.303122,b = 3.363848 ,loss = 0.025311
Epoch = 8 , w = 1.362202,b = 3.333566 ,loss = 0.021986
Epoch = 9 , w = 1.416263,b = 3.305853 ,loss = 0.019201
Epoch = 10 , w = 1.465732,b = 3.280493 ,loss = 0.016871
Epoch = 11 , w = 1.511001,b = 3.257288 ,loss = 0.014919
Epoch = 12 , w = 1.552425,b = 3.236052 ,loss = 0.013286
Epoch = 13 , w = 1.590330,b = 3.216621 ,loss = 0.011919
Epoch = 14 , w = 1.625017,b = 3.198839 ,loss = 0.010774
Epoch = 15 , w = 1.656758,b = 3.182568 ,loss = 0.009816
Epoch = 16 , w = 1.685803,b = 3.167679 ,loss = 0.009014
Epoch = 17 , w = 1.712381,b = 3.154054 ,loss = 0.008343
Epoch = 18 , w = 1.736702,b = 3.141586 ,loss = 0.007782
Epoch = 19 , w = 1.758958,b = 3.130177 ,loss = 0.007312
Epoch = 20 , w = 1.779323,b = 3.119737 ,loss = 0.006918
Epoch = 21 , w = 1.797959,b = 3.110184 ,loss = 0.006589
Epoch = 22 , w = 1.815012,b = 3.101442 ,loss = 0.006314
Epoch = 23 , w = 1.830617,b = 3.093442 ,loss = 0.006083
Epoch = 24 , w = 1.844897,b = 3.086122 ,loss = 0.005890
Epoch = 25 , w = 1.857964,b = 3.079423 ,loss = 0.005729
Epoch = 26 , w = 1.869921,b = 3.073294 ,loss = 0.005594
Epoch = 27 , w = 1.880863,b = 3.067685 ,loss = 0.005481
Epoch = 28 , w = 1.890875,b = 3.062552 ,loss = 0.005387
Epoch = 29 , w = 1.900037,b = 3.057855 ,loss = 0.005308
Epoch = 30 , w = 1.908421,b = 3.053557 ,loss = 0.005242
Epoch = 31 , w = 1.916093,b = 3.049624 ,loss = 0.005186
Epoch = 32 , w = 1.923113,b = 3.046026 ,loss = 0.005140
Epoch = 33 , w = 1.929538,b = 3.042732 ,loss = 0.005102
Epoch = 34 , w = 1.935416,b = 3.039719 ,loss = 0.005070
Epoch = 35 , w = 1.940796,b = 3.036961 ,loss = 0.005043
Epoch = 36 , w = 1.945718,b = 3.034438 ,loss = 0.005020
Epoch = 37 , w = 1.950222,b = 3.032129 ,loss = 0.005001
Epoch = 38 , w = 1.954344,b = 3.030016 ,loss = 0.004986
Epoch = 39 , w = 1.958116,b = 3.028082 ,loss = 0.004973
Epoch = 40 , w = 1.961568,b = 3.026313 ,loss = 0.004962
Epoch = 41 , w = 1.964726,b = 3.024694 ,loss = 0.004953
Epoch = 42 , w = 1.967616,b = 3.023212 ,loss = 0.004945
Epoch = 43 , w = 1.970261,b = 3.021856 ,loss = 0.004939
Epoch = 44 , w = 1.972681,b = 3.020616 ,loss = 0.004933
Epoch = 45 , w = 1.974895,b = 3.019480 ,loss = 0.004929
Epoch = 46 , w = 1.976922,b = 3.018442 ,loss = 0.004925
Epoch = 47 , w = 1.978776,b = 3.017491 ,loss = 0.004922
Epoch = 48 , w = 1.980473,b = 3.016621 ,loss = 0.004920
Epoch = 49 , w = 1.982026,b = 3.015825 ,loss = 0.004918

6. Conclusion

The behaviour of mini-batch SGD depends on batch_size: a batch that is too large or too small slows down the weight updates, and batch_size = 10 is a suitable size here. The loss also fluctuates up and down as the epochs progress, which is exactly the stochastic nature of SGD showing through.

7. Questions and Answers

  • Question 2: Why are the loss contours ellipses rather than circles? How could the plot be turned into circles?
    • Answer: The loss function is \(f(w,b) = \Sigma(w \cdot x+b-y)^2\) = \(w^2 \cdot x^2+b^2+y^2+C \cdot w+D \cdot b+K\) (expanding term by term and ignoring cross terms), so the \(w^2\) term carries a coefficient that depends on \(x\) while the \(b^2\) term does not, which makes the contours of this function clearly elliptical. If we want a circle, we only need to change the loss to \(f(w,b) = (w \cdot y-w \cdot x-b)^2\); plotted over \(w\) and \(b\), that loss would give a circular picture, but its meaning is no longer as clear as the original one. See the worked expansion after this Q&A list.
  • Question 3: Why is the center of the contour plot an elliptical region rather than a single point?
    • Answer: My understanding is:
      • 1. The numerical precision of the data is limited, so points that are actually different end up being treated as equal.
      • 2. SGD only reaches a local optimum rather than the global optimum, and there may be many such local optima, so what we get is a region.
      • 3. In practice w and b are not continuous but a set of discrete points, so there are many approximately equal optimal solutions.
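
For reference on Question 2, here is a worked expansion of the squared loss over m samples \((x_i, y_i)\); this is my own addition, using the same loss as above, and it makes the quadratic form explicit:
\[ f(w,b) = \sum_{i=1}^{m}(w x_i + b - y_i)^2 = \Big(\sum_i x_i^2\Big) w^2 + 2\Big(\sum_i x_i\Big) wb + m\,b^2 - 2\Big(\sum_i x_i y_i\Big) w - 2\Big(\sum_i y_i\Big) b + \sum_i y_i^2 \]
The level sets of this quadratic are circles only when the \(w^2\) and \(b^2\) coefficients are equal and the cross term vanishes, i.e. when \(\sum_i x_i^2 = m\) and \(\sum_i x_i = 0\); for raw, unnormalized inputs they are ellipses.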