Dive into PyTorch: Linear Regression

Date: 2020-02-14



Linear Regression


Contents:

Basic elements of linear regression
Implementing linear regression from scratch
A concise implementation of linear regression with PyTorch

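Basic elements of linear regression

Model
For simplicity, assume here that the price of a house depends on only two factors: its area (square meters) and its age (years). We want to explore the exact relationship between the price and these two factors. Linear regression assumes that the output is a linear function of the inputs:

$$\hat{y} = x_1 w_1 + x_2 w_2 + b$$

where $x_1$ is the area, $x_2$ is the age, $w_1$ and $w_2$ are the weights, $b$ is the bias, and $\hat{y}$ is the predicted price.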
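As a quick sanity check of the formula, here is a minimal pure-Python sketch of the prediction for a single house. The function name `predict` and the input values (a hypothetical 100 m² house that is 10 years old) are illustrative assumptions; the parameters simply reuse the values of `true_w` and `true_b` from the code later in this post.

```python
def predict(x1, x2, w1, w2, b):
    """Linear model: y_hat = x1 * w1 + x2 * w2 + b."""
    return x1 * w1 + x2 * w2 + b


# Hypothetical house: area 100 m^2, age 10 years; weights 2 and -3.4, bias 4.2.
print(predict(100.0, 10.0, 2.0, -3.4, 4.2))
```

Each weight says how much the prediction changes per unit of its feature: here every extra square meter adds 2 and every extra year of age subtracts 3.4.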

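Dataset
We usually collect a set of real data, for example the actual sale prices of several houses together with their areas and ages. We want to find model parameters on this data that make the error between the predicted and the actual prices as small as possible. In machine learning terminology, this dataset is called the training data set or training set; each house is called a sample; its actual sale price is called a label; and the two factors used to predict the label are called features. Features characterize a sample.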

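Loss function
During training we need to measure the error between the predicted price and the true price. Usually we choose a non-negative number as the error, where a smaller value means a smaller error. A common choice is the squared function. For the sample with index $i$, the error is

$$\ell^{(i)}(w_1, w_2, b) = \frac{1}{2}\left(\hat{y}^{(i)} - y^{(i)}\right)^2$$

where $\hat{y}^{(i)}$ is the predicted price and $y^{(i)}$ the true price of sample $i$. The constant $\frac{1}{2}$ only makes the derivative simpler. Training looks for the parameters that minimize the average loss over all $n$ training samples:

$$\ell(w_1, w_2, b) = \frac{1}{n}\sum_{i=1}^{n} \ell^{(i)}(w_1, w_2, b)$$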
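The per-sample squared error and its average over a dataset are easy to write down directly. This is a stdlib-only sketch; `squared_error`, `mean_loss`, and the three predicted/true prices are hypothetical values chosen for illustration.

```python
def squared_error(y_hat, y):
    """Per-sample loss 0.5 * (y_hat - y)^2; the 1/2 cancels when differentiating."""
    return 0.5 * (y_hat - y) ** 2


def mean_loss(predictions, labels):
    """Average of the per-sample losses over all samples."""
    losses = [squared_error(p, t) for p, t in zip(predictions, labels)]
    return sum(losses) / len(losses)


# Hypothetical predicted and true prices for three houses.
preds = [170.0, 95.0, 240.0]
truth = [172.0, 95.0, 236.0]
print(mean_loss(preds, truth))  # (2 + 0 + 8) / 3
```

Squaring makes every error non-negative and penalizes large errors more than small ones: the 4-unit miss on the third house contributes four times as much as the 2-unit miss on the first.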

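Optimization function - stochastic gradient descent
When the model and loss function are simple enough, the solution to the error-minimization problem above can be written directly as a formula. Such a solution is called an analytical solution. The linear regression model and squared loss used in this section fall into this category. Most deep learning models, however, have no analytical solution; their loss can only be reduced as far as possible by updating the model parameters for a finite number of iterations with an optimization algorithm. Such a solution is called a numerical solution.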
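For this model the minimizer can indeed be written in closed form: setting the gradient of the average loss to zero gives the normal equations $(X^\top X)\theta = X^\top y$, where $X$ is the feature matrix with an extra column of ones for the bias and $\theta = (w_1, w_2, b)$. The sketch below solves them in pure Python for synthetic data generated like the data later in this post. It is an illustration under assumptions: the helper names `solve_linear_system` and `least_squares`, the seed, and the use of `random.gauss` are not from the original.

```python
import random


def solve_linear_system(A, v):
    """Solve A x = v for square A by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [v[i]] for i, row in enumerate(A)]  # augmented matrix [A | v]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def least_squares(features, labels):
    """Closed-form fit from the normal equations; returns [w_1, ..., w_d, b]."""
    X = [list(f) + [1.0] for f in features]  # append a bias column of ones
    d = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(d)] for i in range(d)]
    Xty = [sum(row[i] * t for row, t in zip(X, labels)) for i in range(d)]
    return solve_linear_system(XtX, Xty)


random.seed(0)
true_w, true_b = [2.0, -3.4], 4.2
feats = [[random.gauss(0, 1), random.gauss(0, 1)] for _ in range(1000)]
labs = [true_w[0] * x1 + true_w[1] * x2 + true_b + random.gauss(0, 0.01)
        for x1, x2 in feats]
print(least_squares(feats, labs))  # close to [2.0, -3.4, 4.2]
```

No iteration is needed here, but forming and solving $X^\top X$ does not carry over to models whose loss has no such closed-form minimizer, which is why the rest of the post uses gradient descent.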

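Among optimization algorithms for numerical solutions, mini-batch stochastic gradient descent is widely used in deep learning. The algorithm is simple: first choose initial values for the model parameters, for example at random; then update the parameters over many iterations, each of which may lower the value of the loss function. In each iteration, first sample uniformly at random a mini-batch $\mathcal{B}$ made up of a fixed number of training samples, then compute the derivative (gradient) of the mini-batch's average loss with respect to the model parameters, and finally subtract the product of this gradient and a preset positive number from the parameters:

$$(\mathbf{w}, b) \leftarrow (\mathbf{w}, b) - \frac{\eta}{|\mathcal{B}|} \sum_{i \in \mathcal{B}} \partial_{(\mathbf{w}, b)} \ell^{(i)}(\mathbf{w}, b)$$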

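Learning rate: $\eta$ is the size of the step taken in each optimization step.
Batch size: $|\mathcal{B}|$ is the number of samples in each mini-batch (the batch size).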
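For the squared loss the gradient has a simple closed form, which makes the update rule concrete. With the per-sample loss $\ell^{(i)} = \frac{1}{2}\left(\hat{y}^{(i)} - y^{(i)}\right)^2$ and prediction $\hat{y}^{(i)} = x_1^{(i)} w_1 + x_2^{(i)} w_2 + b$, the chain rule gives the following partial derivatives and per-parameter updates:

$$
\begin{aligned}
\frac{\partial \ell^{(i)}}{\partial w_1} &= x_1^{(i)}\left(\hat{y}^{(i)} - y^{(i)}\right), \qquad
\frac{\partial \ell^{(i)}}{\partial w_2} = x_2^{(i)}\left(\hat{y}^{(i)} - y^{(i)}\right), \qquad
\frac{\partial \ell^{(i)}}{\partial b} = \hat{y}^{(i)} - y^{(i)},\\
w_1 &\leftarrow w_1 - \frac{\eta}{|\mathcal{B}|}\sum_{i \in \mathcal{B}} x_1^{(i)}\left(\hat{y}^{(i)} - y^{(i)}\right),\\
w_2 &\leftarrow w_2 - \frac{\eta}{|\mathcal{B}|}\sum_{i \in \mathcal{B}} x_2^{(i)}\left(\hat{y}^{(i)} - y^{(i)}\right),\\
b &\leftarrow b - \frac{\eta}{|\mathcal{B}|}\sum_{i \in \mathcal{B}} \left(\hat{y}^{(i)} - y^{(i)}\right).
\end{aligned}
$$

Each weight moves in proportion to the prediction error times its own feature, so a feature that is zero for a sample contributes nothing to that weight's update.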

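To summarize, the optimization procedure has two steps:

(i) Initialize the model parameters, usually at random;
(ii) Iterate over the data many times, updating each parameter by moving it in the direction of the negative gradient.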
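The two steps can be sketched without PyTorch, using only the Python standard library, so that the arithmetic of each update is visible. This is a minimal illustration, not the post's implementation: the function name `train_linreg_sgd`, the seeds, and the use of `random.gauss` are assumptions, while the true parameters and hyperparameters (learning rate 0.03, batch size 10, 5 epochs) mirror the PyTorch code below.

```python
import random


def train_linreg_sgd(features, labels, lr=0.03, batch_size=10, num_epochs=5, seed=0):
    """Mini-batch SGD for y_hat = x1*w1 + x2*w2 + b with loss 0.5*(y_hat - y)^2."""
    rng = random.Random(seed)
    # Step (i): initialize parameters: small random weights, zero bias.
    w = [rng.gauss(0, 0.01), rng.gauss(0, 0.01)]
    b = 0.0
    n = len(features)
    indices = list(range(n))
    # Step (ii): pass over the data several times, stepping against the gradient.
    for _ in range(num_epochs):
        rng.shuffle(indices)
        for start in range(0, n, batch_size):
            batch = indices[start:start + batch_size]
            grad_w = [0.0, 0.0]
            grad_b = 0.0
            for i in batch:
                x1, x2 = features[i]
                err = x1 * w[0] + x2 * w[1] + b - labels[i]  # y_hat - y
                grad_w[0] += x1 * err
                grad_w[1] += x2 * err
                grad_b += err
            m = len(batch)  # actual size; the last batch may be smaller
            w[0] -= lr * grad_w[0] / m
            w[1] -= lr * grad_w[1] / m
            b -= lr * grad_b / m
    return w, b


data_rng = random.Random(1)
true_w, true_b = [2.0, -3.4], 4.2
feats = [[data_rng.gauss(0, 1), data_rng.gauss(0, 1)] for _ in range(1000)]
labs = [true_w[0] * x1 + true_w[1] * x2 + true_b + data_rng.gauss(0, 0.01)
        for x1, x2 in feats]
w, b = train_linreg_sgd(feats, labs)
print(w, b)
```

Dividing by the actual batch length `m` instead of a fixed batch size is a small design choice here; it only matters when the number of samples is not a multiple of the batch size.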

Implementing linear regression from scratch

%matplotlib inline
import torch
from IPython import display
from matplotlib import pyplot as plt
import numpy as np
import random

print(torch.__version__)

# Generate the dataset: 1000 samples produced by a linear model.
# The linear relation used to generate the data is set up below.

# set input feature number 
num_inputs = 2
# set example number
num_examples = 1000

# set the true weight and bias used to generate the corresponding labels
true_w = [2, -3.4]
true_b = 4.2

features = torch.randn(num_examples, num_inputs,
                      dtype=torch.float32)
# torch.randn draws one tensor of shape (num_examples, num_inputs) from the standard normal distribution, with dtype float32
# signature: torch.randn(*sizes, out=None, dtype=None, layout=torch.strided, device=None, requires_grad=False) -> Tensor
labels = true_w[0] * features[:, 0] + true_w[1] * features[:, 1] + true_b  # labels from the true linear relation
labels += torch.tensor(np.random.normal(0, 0.01, size=labels.size()),
                       dtype=torch.float32)  # add Gaussian noise with mean 0 and standard deviation 0.01

# plot the second feature against the labels to visualize the generated data
plt.scatter(features[:, 1].numpy(), labels.numpy(), 1);


# read the dataset in mini-batches
def data_iter(batch_size, features, labels):
    num_examples = len(features)
    indices = list(range(num_examples))  # sample indices
    random.shuffle(indices)  # shuffle so that samples are read in random order

    for i in range(0, num_examples, batch_size):
        j = torch.LongTensor(indices[i: min(i + batch_size, num_examples)])  # the last batch may be smaller than batch_size
        yield features.index_select(0, j), labels.index_select(0, j)
# torch.index_select(input, dim, index): input is the tensor to index; dim=0 selects rows, dim=1 selects columns; index is a LongTensor of the positions to select.
 
 batch_size = 10
for X, y in data_iter(batch_size, features, labels):
    print(X, '\n', y)
    break
    
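# initialize model parameters
w = torch.tensor(np.random.normal(0, 0.01, (num_inputs, 1)), dtype=torch.float32)  # weights of shape (num_inputs, 1) from a normal distribution with mean 0 and std 0.01; compared with true_w after training
b = torch.zeros(1, dtype=torch.float32)

w.requires_grad_(requires_grad=True)  # track gradients so that backward() fills in w.grad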
b.requires_grad_(requires_grad=True)

Define the model

def linreg(X, w, b):
    # torch.mm is matrix multiplication: for a of shape (1, 2) and b of shape (2, 3), torch.mm(a, b) has shape (1, 3).
    # torch.mul(a, b), by contrast, multiplies element-wise (shapes must match or be broadcastable): two (1, 2) inputs give a (1, 2) result.
    return torch.mm(X, w) + b
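To make the shape rules in these comments concrete, here is a small stdlib-only sketch of both operations on nested lists. `matmul` and `elementwise_mul` are hypothetical helpers that mimic, but are not, `torch.mm` and `torch.mul`; in particular, the real `torch.mul` also broadcasts, which this sketch omits. In `linreg`, `X` has shape (batch_size, 2) and `w` has shape (2, 1), so `torch.mm(X, w)` has shape (batch_size, 1), and adding `b` of shape (1,) broadcasts the bias to every row.

```python
def matmul(a, b):
    """Matrix product of nested lists: (n, k) x (k, m) -> (n, m), like torch.mm."""
    if len(a[0]) != len(b):
        raise ValueError("inner dimensions must match")
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def elementwise_mul(a, b):
    """Element-wise product of same-shaped nested lists (no broadcasting)."""
    if len(a) != len(b) or any(len(r) != len(s) for r, s in zip(a, b)):
        raise ValueError("shapes must match")
    return [[x * y for x, y in zip(r, s)] for r, s in zip(a, b)]


a = [[1.0, 2.0]]                          # shape (1, 2)
b = [[1.0, 0.0, 2.0], [3.0, 1.0, 0.0]]    # shape (2, 3)
print(matmul(a, b))                       # [[7.0, 2.0, 2.0]], shape (1, 3)
print(elementwise_mul(a, [[10.0, 20.0]]))  # [[10.0, 40.0]], shape (1, 2)
```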

Define the loss function

def squared_loss(y_hat, y):
    # reshape y from (batch_size,) to y_hat's shape (batch_size, 1) so the subtraction is element-wise instead of broadcasting
    return (y_hat - y.view(y_hat.size())) ** 2 / 2
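The reshape matters because of broadcasting: in the training loop `y` comes from `labels`, a 1-D tensor of shape (batch_size,), while `y_hat` has shape (batch_size, 1). Without `view`, the subtraction would silently broadcast to a (batch_size, batch_size) matrix of all pairwise differences. The hypothetical helper below is a stdlib-only sketch of the broadcasting shape rule that NumPy and PyTorch follow, used only to show the resulting shapes.

```python
def broadcast_shape(shape_a, shape_b):
    """Result shape of an element-wise op under NumPy/PyTorch broadcasting rules."""
    result = []
    # Align shapes at the trailing end; a missing dimension counts as 1.
    for i in range(1, max(len(shape_a), len(shape_b)) + 1):
        da = shape_a[-i] if i <= len(shape_a) else 1
        db = shape_b[-i] if i <= len(shape_b) else 1
        if da != db and 1 not in (da, db):
            raise ValueError(f"cannot broadcast {shape_a} with {shape_b}")
        result.append(da if db == 1 else db)
    return tuple(reversed(result))


print(broadcast_shape((10, 1), (10,)))    # (10, 10): y_hat - y without the reshape
print(broadcast_shape((10, 1), (10, 1)))  # (10, 1): after y.view(y_hat.size())
```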

Define the optimization function

def sgd(params, lr, batch_size): 
    for param in params:
        # divide by batch_size because param.grad is the gradient of the summed batch loss;
        # use .data to update param without gradient tracking
        param.data -= lr * param.grad / batch_size
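Dividing by `batch_size` turns the gradient of the summed loss (the training loop calls `l.sum()` before `backward()`) into the gradient of the average loss over the mini-batch, which is the $\frac{\eta}{|\mathcal{B}|}\sum_{i \in \mathcal{B}}$ factor in the update rule. The stdlib-only check below uses a hypothetical one-weight model without bias; all names and numbers are made up for illustration. It compares the analytic gradient of the sum, divided by the batch size, with a finite-difference gradient of the mean.

```python
def batch_losses(w, xs, ys):
    """Per-sample losses 0.5 * (x * w - y)^2 for a one-weight model without bias."""
    return [0.5 * (x * w - y) ** 2 for x, y in zip(xs, ys)]


def grad_of_sum(w, xs, ys):
    """Analytic gradient of the summed batch loss (what l.sum().backward() yields)."""
    return sum(x * (x * w - y) for x, y in zip(xs, ys))


def grad_of_mean_numeric(w, xs, ys, eps=1e-6):
    """Central finite-difference gradient of the mean batch loss."""
    def mean_loss(v):
        return sum(batch_losses(v, xs, ys)) / len(xs)
    return (mean_loss(w + eps) - mean_loss(w - eps)) / (2 * eps)


xs, ys, w = [1.0, 2.0, 3.0, 4.0], [2.0, 3.0, 7.0, 8.0], 0.5
batch_size = len(xs)
print(grad_of_sum(w, xs, ys) / batch_size, grad_of_mean_numeric(w, xs, ys))
```

Using the mean rather than the sum keeps the effective step size independent of the batch size, so the same learning rate works for different batch sizes.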

Training
Once the dataset, model, loss function, and optimization function have been defined, we can start training the model.
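Before running the loop it helps to know how many parameter updates it performs. With the settings used in this post (1000 examples, `batch_size = 10`, `num_epochs = 5`), a quick calculation in plain Python:

```python
import math

num_examples, batch_size, num_epochs = 1000, 10, 5

# Each epoch visits every sample once, in ceil(n / batch_size) mini-batches;
# the last batch is smaller when batch_size does not divide num_examples.
iters_per_epoch = math.ceil(num_examples / batch_size)
total_updates = iters_per_epoch * num_epochs
print(iters_per_epoch, total_updates)  # 100 500
```

So `sgd` is called 500 times in total, and the loss is printed once per epoch, after every 100 updates.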

# hyperparameters
lr = 0.03
num_epochs = 5

net = linreg  # linear model
loss = squared_loss  # squared loss

# training
for epoch in range(num_epochs):  # training repeats num_epochs times
    # in each epoch, all the samples in dataset will be used once
    
    # X is the feature and y is the label of a batch sample
    for X, y in data_iter(batch_size, features, labels):
        l = loss(net(X, w, b), y).sum()  
        # calculate the gradient of batch sample loss 
        l.backward()  
        # using small batch random gradient descent to iter model parameters
        sgd([w, b], lr, batch_size)  
        # reset parameter gradient
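        w.grad.data.zero_()  # zero the gradients so they do not accumulate: compute the gradient once, update once
        b.grad.data.zero_()
    train_l = loss(net(features, w, b), labels)  # loss of the current w, b against the true labels on the whole training set
    print('epoch %d, loss %f' % (epoch + 1, train_l.mean().item()))  # formatted output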
    
w, true_w, b, true_b

Training results (2000 training samples were used):