Hands-on Deep Learning 1: PyTorch Basics

Date: 2019-11-06

Tags: Hands-on Deep Learning, pytorch, basics


PyTorch basics

Tensors

Tensors are very similar to NumPy's ndarrays, with the addition that PyTorch tensors can also run on a GPU.

from __future__ import print_function
import torch

Create an empty tensor

x = torch.empty(5,3) # uninitialized 5x3 tensor; its values are whatever was in memory (the zeros below are incidental)
print(x)

tensor([[0., 0., 0.],
        [0., 0., 0.],
        [0., 0., 0.],
        [0., 0., 0.],
        [0., 0., 0.]])

Create a matrix of random numbers

x = torch.rand(5,3)
print(x)

tensor([[0.5109, 0.1927, 0.5499],
        [0.8677, 0.8713, 0.9610],
        [0.9356, 0.0391, 0.3159],
        [0.0266, 0.7895, 0.6610],
        [0.7188, 0.1331, 0.2180]])

Create a matrix of zeros

x = torch.zeros(5,3)

print(x)

tensor([[0., 0., 0.],
        [0., 0., 0.],
        [0., 0., 0.],
        [0., 0., 0.],
        [0., 0., 0.]])

Create a tensor directly from existing data

x= torch.tensor([5,5,3])
print(x)

tensor([5, 5, 3])

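Create a new tensor from an existing one

x = x.new_ones(5,3,dtype=torch.double)
print(x)
# new_* methods create a tensor from an existing one. They reuse the properties of the input tensor, such as dtype, unless new values are given to override them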

tensor([[1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.]], dtype=torch.float64)

x = torch.randn_like(x,dtype=torch.float)
print(x)
# dtype is overridden; the result keeps the size of the original x

tensor([[ 0.8914,  1.5704, -0.1844],
        [ 0.7747, -0.6860, -0.5596],
        [ 0.1804, -0.2909, -1.3262],
        [-1.3021, -0.4132, -2.7060],
        [ 0.8989, -0.7269,  1.3862]])

print(x.size())

torch.Size([5, 3])

Operations

Addition

y = torch.rand(5,3)
print(x+y)

tensor([[ 1.6333,  2.1744,  0.4975],
        [ 1.5430, -0.5863, -0.1416],
        [ 0.6954,  0.6694, -0.4113],
        [-0.9279, -0.1156, -1.8519],
        [ 1.5791,  0.1524,  2.1037]])

print(torch.add(x,y))

tensor([[ 1.6333,  2.1744,  0.4975],
        [ 1.5430, -0.5863, -0.1416],
        [ 0.6954,  0.6694, -0.4113],
        [-0.9279, -0.1156, -1.8519],
        [ 1.5791,  0.1524,  2.1037]])

result = torch.empty(5,3)
torch.add(x,y,out=result)
print(result)

tensor([[ 1.6333,  2.1744,  0.4975],
        [ 1.5430, -0.5863, -0.1416],
        [ 0.6954,  0.6694, -0.4113],
        [-0.9279, -0.1156, -1.8519],
        [ 1.5791,  0.1524,  2.1037]])

# In-place addition: replaces the value of y, similar to y += x
y.add_(x)
print(y)

tensor([[ 1.6333,  2.1744,  0.4975],
        [ 1.5430, -0.5863, -0.1416],
        [ 0.6954,  0.6694, -0.4113],
        [-0.9279, -0.1156, -1.8519],
        [ 1.5791,  0.1524,  2.1037]])

Reshaping

x = torch.randn(4,4)
y= x.view(16)
z=x.view(-1,8)  # a size of -1 is inferred from the other dimensions
print(x.size(),y.size(),z.size())

torch.Size([4, 4]) torch.Size([16]) torch.Size([2, 8])

torch.numel(x)  # returns the total number of elements in the input tensor
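One detail worth checking about view() is that it does not copy data: the reshaped tensor shares storage with the original. The sketch below is a minimal illustration of that behavior; the values written (7.0 and -1.0) are arbitrary and not from the original post.

```python
import torch

x = torch.zeros(4, 4)
y = x.view(16)        # same storage as x, different shape
y[0] = 7.0            # writing through the view also changes x
print(x[0, 0])

# clone() makes an independent copy before reshaping
z = x.clone().view(-1, 8)
z[0, 0] = -1.0        # does not touch x
print(x[0, 0])
print(x.numel(), y.numel(), z.numel())
```

If an independent copy is wanted, cloning before reshaping, as done for z here, is one straightforward option.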

Converting between tensors and NumPy arrays

import numpy as np
a = np.array([1,2,3])
t = torch.as_tensor(a)
print(t)

tensor([1, 2, 3])

a = torch.ones(5)
print(a)

tensor([1., 1., 1., 1., 1.])

b = a.numpy()  # a and b share memory: when a changes, b changes too, much like a view in NumPy
print(b)

[1. 1. 1. 1. 1.]

a.add_(1)
print(a)
print(b)

tensor([2., 2., 2., 2., 2.])
[2. 2. 2. 2. 2.]

id(a),id(b)

(4487109080, 4807132064)
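The two ids differ because a and b are distinct Python objects, even though they share the same underlying memory. The following sketch, using arbitrary example values, contrasts a conversion that shares memory (torch.from_numpy) with one that copies (torch.tensor). For an ndarray input, torch.as_tensor also avoids a copy when dtype and device already match.

```python
import numpy as np
import torch

a = np.array([1, 2, 3])
shared = torch.from_numpy(a)   # shares memory with a
copied = torch.tensor(a)       # always copies the data

a[0] = 100                     # modify the NumPy array in place
print(shared)                  # reflects the change
print(copied)                  # keeps the original values
print(np.shares_memory(a, shared.numpy()))
```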

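Working with data on the GPU

# The code below only runs on a GPU build of PyTorch with a CUDA device available; the Mac used here has no GPU, so no output is shown
if torch.cuda.is_available():
    device = torch.device("cuda")          # GPU
    y = torch.ones_like(x, device=device)  # create a tensor directly on the GPU
    x = x.to(device)                       # equivalent to .to("cuda")
    z = x + y
    print(z)
    print(z.to("cpu", torch.double))       # to() can also change the dtype at the same time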
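Since the block above is skipped entirely on a machine without a GPU, a common alternative is to pick the device once and write the rest of the code against it. This is a minimal sketch of that pattern, not something from the original post; it falls back to the CPU when CUDA is unavailable.

```python
import torch

# Choose the GPU if one is available, otherwise stay on the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

x = torch.randn(4, 4, device=device)
y = torch.ones_like(x)                 # ones_like inherits device and dtype from x
z = (x + y).to("cpu", torch.double)    # move to the CPU and change dtype in one call
print(z.device, z.dtype)
```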

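Automatic differentiation

Deep learning usually requires computing gradients of functions. PyTorch's autograd package automatically builds a computation graph from the inputs and the forward pass, and then runs backpropagation. The rest of this post covers the main operations autograd provides for automatic differentiation.

The concept of automatic differentiation

The Tensor introduced in the previous section is the core class of this package. If its .requires_grad attribute is set to True, autograd starts tracking every operation on it, so that gradients can be propagated with the chain rule. Once the computation is finished, call .backward() to compute all the gradients. The gradient of this Tensor is accumulated into its .grad attribute.
Note that when calling y.backward(), no argument is needed if y is a scalar. Otherwise, you must pass in a Tensor with the same shape as y.

To stop a tensor from being tracked, call .detach() to separate it from its tracking history. Later computations on it are then not tracked, so gradients cannot flow back through it. You can also wrap code in with torch.no_grad() so that the operations inside are not tracked. This is common when evaluating a model, because evaluation does not need gradients for the already-trained parameters.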
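As a small illustration of .detach(), the sketch below builds an expression in which one branch is detached. The starting value 2.0 and the factor 3 are arbitrary choices for the example. Only the tracked branch contributes to the gradient.

```python
import torch

x = torch.tensor(2.0, requires_grad=True)
y = x ** 2
d = y.detach()          # same value as y, but cut off from the graph
out = y + d * 3         # only the y term is connected to x
out.backward()

print(d.requires_grad)  # False
print(x.grad)           # gradient of x**2 only, i.e. 2*x at x = 2
```

If d had not been detached, the gradient would be 4 times larger, since out would then equal 4 * x**2.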

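The Function class

Function is another important class. Together, Tensor and Function build a directed acyclic graph (DAG) that records the whole computation.
Every Tensor has a .grad_fn attribute that refers to the Function that created it. In other words, it tells whether the Tensor was produced by some operation. If so, grad_fn returns an object associated with that operation; otherwise it is None.

Tensor examples

# Create a Tensor with requires_grad=True
x= torch.ones(2,2,requires_grad=True)
print(x)
print(x.grad_fn)  # None: x was created directly, not produced by an operation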

tensor([[1., 1.],
        [1., 1.]], requires_grad=True)
None

y =x+2
print(y)
print(y.grad_fn)  # AddBackward0: y was created by an addition


'''
Tensors created directly, like x, are called leaf nodes; the grad_fn of a leaf node is None
'''
print(x.is_leaf,y.is_leaf)

tensor([[3., 3.],
        [3., 3.]], grad_fn=<AddBackward0>)
<AddBackward0 object at 0x11eb55cc0>
True False

More complex operations

z = y*y*3
out = z.mean()
print(z,out)

tensor([[27., 27.],
        [27., 27.]], grad_fn=<MulBackward0>) tensor(27., grad_fn=<MeanBackward1>)

"""
.requires_grad_() changes the requires_grad attribute in place
"""
a= torch.randn(2,2)  # requires_grad defaults to False when not specified
a =(a*3)/(a-1)
print(a.requires_grad)   # False
a.requires_grad_(True)
print(a.requires_grad)    # True
b = (a*a).sum()
print(b.grad_fn)

False
True
<SumBackward0 object at 0x11eb65780>

Gradients

'''
out is a scalar, so backward() needs no gradient argument:
'''
out.backward()    # equivalent to out.backward(torch.tensor(1.))

Let's look at the gradient of out with respect to x:
\[ \frac{d(out)}{dx} \]

print(x.grad)

tensor([[4.5000, 4.5000],
        [4.5000, 4.5000]])

\[ o = \frac{1}{4} \sum_{i=1}^{4} z_i = \frac{1}{4} \sum_{i=1}^{4} 3 \left( x_i + 2 \right)^2 \]

\[ \left. \frac{\partial o}{\partial x_i} \right|_{x_i = 1} = \frac{9}{2} = 4.5 \]
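The value 4.5 can be checked against autograd. Differentiating the expression for o term by term gives (1/4) * 6 * (x_i + 2) = 1.5 * (x_i + 2), which equals 4.5 at x_i = 1. The sketch below recomputes out from scratch and compares the two; the variable name analytic is introduced here only for illustration.

```python
import torch

x = torch.ones(2, 2, requires_grad=True)
out = (3 * (x + 2) ** 2).mean()
out.backward()

# Hand-derived gradient: d(out)/dx_i = (1/4) * 6 * (x_i + 2)
analytic = 1.5 * (x.detach() + 2)

print(x.grad)
print(torch.allclose(x.grad, analytic))
```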

Mathematical meaning

Mathematically, if both the value and the argument of a function y = f(x) are vectors, then the gradient of y with respect to x is a Jacobian matrix:
\[ J = \left( \begin{array}{ccc} \frac{\partial y_1}{\partial x_1} & \cdots & \frac{\partial y_1}{\partial x_n} \\ \vdots & \ddots & \vdots \\ \frac{\partial y_m}{\partial x_1} & \cdots & \frac{\partial y_m}{\partial x_n} \end{array} \right) \]

The torch.autograd package computes products with such Jacobian matrices (vector-Jacobian products). For example, suppose v is the gradient of a scalar function l = g(y):

\[ v = \left( \begin{array}{ccc} \frac{\partial l}{\partial y_1} & \cdots & \frac{\partial l}{\partial y_m} \end{array} \right) \]

Then by the chain rule, the gradient of l with respect to x is:

\[ vJ = \left( \begin{array}{ccc} \frac{\partial l}{\partial y_1} & \cdots & \frac{\partial l}{\partial y_m} \end{array} \right) \left( \begin{array}{ccc} \frac{\partial y_1}{\partial x_1} & \cdots & \frac{\partial y_1}{\partial x_n} \\ \vdots & \ddots & \vdots \\ \frac{\partial y_m}{\partial x_1} & \cdots & \frac{\partial y_m}{\partial x_n} \end{array} \right) = \left( \begin{array}{ccc} \frac{\partial l}{\partial x_1} & \cdots & \frac{\partial l}{\partial x_n} \end{array} \right) \]
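To make the product vJ concrete, the sketch below compares what backward(v) produces with an explicitly computed Jacobian. The function f and the vector v are hypothetical, chosen only for illustration, and torch.autograd.functional.jacobian is assumed to be available, which requires a reasonably recent PyTorch release.

```python
import torch
from torch.autograd.functional import jacobian

def f(x):
    # Hypothetical function from R^3 to R^2, chosen for illustration
    return torch.stack([x[0] * x[1], x[1] + x[2] ** 2])

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
v = torch.tensor([0.5, 2.0])          # stands in for dl/dy

J = jacobian(f, x.detach())           # explicit 2x3 Jacobian
y = f(x)
y.backward(v)                         # autograd computes v @ J without forming J

print(J)
print(x.grad)
print(torch.allclose(x.grad, v @ J))
```

Forming J explicitly is only practical for small problems; backward() avoids it, which is why it asks for v instead.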

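Note: grad is accumulated during backpropagation. Every backward pass adds its gradients to those from the previous pass, so gradients usually need to be zeroed before each backward pass.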

# Backpropagate once more; note that grad accumulates
out2 = x.sum()
out2.backward()
print(x.grad)

out3 = x.sum()
x.grad.data.zero_()   # zero the gradient in place
out3.backward()
print(x.grad)

tensor([[5.5000, 5.5000],
        [5.5000, 5.5000]])
tensor([[1., 1.],
        [1., 1.]])
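In a training loop the same zeroing step is repeated before every backward pass. The sketch below shows the pattern on a hypothetical parameter w with a made-up loss; it is an illustration of the idea rather than a full training loop.

```python
import torch

w = torch.tensor([1.0, 2.0], requires_grad=True)   # hypothetical parameter

for step in range(3):
    if w.grad is not None:
        w.grad.zero_()          # clear the gradient left by the previous step
    loss = (w ** 2).sum()
    loss.backward()
    print(step, w.grad)         # 2*w every time, because the old gradient was cleared
```

Without the zero_() call, the printed gradient would grow by 2*w on every iteration.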

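One question now needs explaining:
Why does y.backward() need no argument when y is a scalar, but need a Tensor with the same shape as y otherwise?

The reason is to avoid differentiating a vector (or a higher-dimensional tensor) with respect to a tensor, by turning the problem into differentiating a scalar with respect to a tensor.
Differentiating a tensor with respect to a tensor is not allowed, only a scalar with respect to a tensor, and the result has the same shape as the independent variable. Put differently, backward() always starts from a single scalar, so a non-scalar output has to be reduced to one first.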

x = torch.tensor([1.0,2.0,3.0,4.0],requires_grad=True)
y = 2*x
z= y.view(2,2)
print(z)

tensor([[2., 4.],
        [6., 8.]], grad_fn=<ViewBackward>)

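z is not a scalar here, so backward() needs a weight tensor with the same shape as z. The elements of z are weighted by it and summed to give a scalar.

v = torch.tensor([[1.0,1.0],[0.01,0.01]],dtype=torch.float)
z.backward(v)   # v is the weight tensor, with the same shape as z
print(x.grad)   # x.grad has the same shape as x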

tensor([2.0000, 2.0000, 0.0200, 0.0200])
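Passing v to backward() gives the same gradient as explicitly forming the weighted sum and calling backward() on the resulting scalar. The sketch below runs both routes on fresh copies of the same input (named x1 and x2 here for illustration) and compares the results.

```python
import torch

v = torch.tensor([[1.0, 1.0], [0.01, 0.01]])

# Route 1: pass the weights to backward()
x1 = torch.tensor([1.0, 2.0, 3.0, 4.0], requires_grad=True)
z1 = (2 * x1).view(2, 2)
z1.backward(v)

# Route 2: reduce to a scalar by hand, then call backward() with no argument
x2 = torch.tensor([1.0, 2.0, 3.0, 4.0], requires_grad=True)
z2 = (2 * x2).view(2, 2)
(z2 * v).sum().backward()

print(x1.grad)
print(x2.grad)
print(torch.allclose(x1.grad, x2.grad))
```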

Stopping gradient tracking

x = torch.tensor(1.0,requires_grad=True)
y1 = x**2
with torch.no_grad():
    y2 = x**3
y3 = y1+y2
print(x.requires_grad)
print(y1,y1.requires_grad)   # square
print(y2,y2.requires_grad)   # cube, computed under torch.no_grad(), so not tracked
print(y3,y3.requires_grad)    # sum
print(y2,y2.is_leaf)  #  y2 has no recorded operation, so it is also a leaf node; the grad_fn of a leaf node is None

True
tensor(1., grad_fn=<PowBackward0>) True
tensor(1.) False
tensor(2., grad_fn=<AddBackward0>) True
tensor(1.) True

y3.backward()

print(x.grad)

tensor(2.)

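Here y3 = y1 + y2 = x^2 + x^3, so at x = 1 the derivative dy3/dx would normally be 2x + 3x^2 = 5. However, y2 was defined inside torch.no_grad(), so the gradient through y2 is not propagated back.
Only the gradient through y1 is propagated, that is, the gradient of x**2 with respect to x, which is 2.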
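The two values, 5 and 2, can be reproduced side by side. The sketch below uses torch.set_grad_enabled as a context manager so that the same code runs with and without tracking of the cube term; the helper name grad_at_one is hypothetical.

```python
import torch

def grad_at_one(track_cube: bool) -> float:
    """Return dy3/dx at x = 1, optionally tracking the x**3 term."""
    x = torch.tensor(1.0, requires_grad=True)
    y1 = x ** 2
    with torch.set_grad_enabled(track_cube):
        y2 = x ** 3             # tracked only when track_cube is True
    (y1 + y2).backward()
    return x.grad.item()

print(grad_at_one(True))    # both terms contribute: 2 + 3
print(grad_at_one(False))   # only x**2 contributes
```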

To change the values of a tensor without autograd recording it (so that backpropagation is not affected), operate on tensor.data.

x = torch.ones(1,requires_grad=True)
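print(x.data) # also a tensor
print(x.data.requires_grad)  # but it is already detached from the computation graph
y = 2*x

x.data *=100 # only changes the values; not recorded in the computation graph, so gradient propagation is unaffected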
y.backward()
print(x)
print(x.grad)

tensor([1.])
False
tensor([100.], requires_grad=True)
tensor([2.])
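In more recent PyTorch code, untracked updates are usually written inside torch.no_grad() rather than through .data. The sketch below shows that style in the shape of a single manual gradient descent step; the learning rate 0.1 is an assumed value for illustration, and the update is done after backward() rather than before it as in the example above.

```python
import torch

x = torch.ones(1, requires_grad=True)
y = 2 * x
y.backward()                 # x.grad is now 2

with torch.no_grad():
    x -= 0.1 * x.grad        # in-place update that autograd does not record

print(x)                     # still a leaf tensor with requires_grad=True
print(x.grad)
```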
