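Reference: A detailed explanation of autograd and hook functions in PyTorch
In PyTorch's automatic differentiation mechanism (Autograd mechanics), if a tensor's requires_grad is set to True, gradients are computed automatically during backpropagation for every operation that involves that tensor.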
import torch

x = torch.randn(5, 5)  # requires_grad=False by default
y = torch.randn(5, 5)  # requires_grad=False by default
z = torch.randn((5, 5), requires_grad=True)
a = x + y  # neither operand requires grad
b = a + z  # z requires grad, so b does too
print(a.requires_grad, b.requires_grad)  # prints: False True
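There is one thing to watch out for, though: autograd only keeps gradients for leaf nodes. Gradients of intermediate (non-leaf) tensors are released once they have been used in the backward pass, to save memory. So in the code below we only obtain the gradient of z with respect to x; the gradients for y and z are not retained after the backward pass, so y.grad and z.grad show up as None.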
x = torch.tensor([1,2],dtype=torch.float32,requires_grad=True)
y = x * 2
z = torch.mean(y)
z.backward()
print("x.grad =", x.grad)
print("y.grad =", y.grad)
print("z.grad =", z.grad)
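As a small check of the leaf / non-leaf distinction described above, the sketch below inspects the is_leaf attribute of the same three tensors. The dictionary name leaf_flags is only for illustration.

```python
import torch

x = torch.tensor([1.0, 2.0], requires_grad=True)  # created by the user: a leaf
y = x * 2           # produced by an operation: not a leaf
z = torch.mean(y)   # produced by an operation: not a leaf

# is_leaf tells us which tensors will have .grad populated by backward()
leaf_flags = {"x": x.is_leaf, "y": y.is_leaf, "z": z.is_leaf}
print(leaf_flags)

z.backward()
print(x.grad)  # only the leaf tensor x keeps its gradient
```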
So can we get the gradients of y and z? This is where hooks come in.
The PyTorch tutorial introduces them like this:
We’ve inspected the weights and the gradients. But how about inspecting / modifying the output and grad_output of a layer ? We introduce hooks for this purpose. In other words, hooks exist so that we can inspect or modify a layer's output or grad_output.
Hooks can be registered on a Module or on a Tensor.
To register a hook on a Tensor, use register_hook();
To register a hook on a Module, use register_forward_hook() to get the layer's input and output during the forward pass, and register_backward_hook() to get the layer's grad_in and grad_out during the backward pass.
x = torch.tensor([1,2],dtype=torch.float32,requires_grad=True)
y = x * 2
y.register_hook(print)
z = torch.mean(y)
z.backward()
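In the code above, register_hook attaches the print function to y, so the hook simply prints the gradient associated with y.
When z.backward() runs, y's hook runs too and prints the gradient of the output z with respect to y: tensor([0.5000, 0.5000]).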
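Printing is only one use of a tensor hook. The sketch below stores the gradient instead, so it can be inspected after the backward pass; the helper save_grad and the dictionary grads are hypothetical names introduced here. It also uses retain_grad() on z, which is a separate way to ask autograd to keep a non-leaf gradient.

```python
import torch

grads = {}  # hypothetical container for captured gradients

def save_grad(name):
    # Build a hook that stores the incoming gradient under `name`.
    # Returning None from the hook leaves the gradient unchanged.
    def hook(grad):
        grads[name] = grad.clone()
    return hook

x = torch.tensor([1.0, 2.0], requires_grad=True)
y = x * 2
y.register_hook(save_grad("y"))  # capture dz/dy when backward reaches y
z = torch.mean(y)
z.retain_grad()                  # alternative: keep z.grad after backward
z.backward()

print(grads["y"])  # gradient of z with respect to y
print(z.grad)      # gradient of z with respect to itself
```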
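Reference link: Toy example to understand Pytorch hooks
Before going through register_forward_hook and register_backward_hook, we first define a module; the hooks that follow are registered on this module.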
import numpy as np
import torch
import torch.nn as nn
from IPython.display import Image
''' Define the Net '''
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.fc1 = nn.Linear(2,2)
        self.s1 = nn.Sigmoid()
        self.fc2 = nn.Linear(2,2)
        self.s2 = nn.Sigmoid()
        # fixed weights; each bias is a single value shared by both units via broadcasting
        self.fc1.weight = torch.nn.Parameter(torch.Tensor([[0.15,0.2],[0.250,0.30]]))
        self.fc1.bias = torch.nn.Parameter(torch.Tensor([0.35]))
        self.fc2.weight = torch.nn.Parameter(torch.Tensor([[0.4,0.45],[0.5,0.55]]))
        self.fc2.bias = torch.nn.Parameter(torch.Tensor([0.6]))

    def forward(self, x):
        x = self.fc1(x)
        x = self.s1(x)
        x = self.fc2(x)
        x = self.s2(x)
        return x
net = Net()
print(net)
''' Get the value of parameters defined in the Net '''
# parameters: weight and bias
print(list(net.parameters()))
''' feed the input data to get the output and loss '''
# input data
data = torch.Tensor([0.05,0.1])
# output of last layer
out = net(data)
target = torch.Tensor([0.01,0.99]) # a dummy target, for example
criterion = nn.MSELoss()
loss = criterion(out, target)
print(loss)
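MODULE.register_forward_hook(FUNCTION) involves the input and output arguments;
MODULE.register_backward_hook(FUNCTION) involves the grad_in and grad_out arguments. In the diagram below, input and output are a layer's input and output.
grad_in is the gradient that enters the layer during the backward pass: the partial derivative of the network's final output (think of it as the final loss L) with respect to the layer's output. grad_out is the gradient that leaves the layer: (dL/d output × d output/d input), by the chain rule. Note that PyTorch names these from the forward direction: its hook signature is hook(module, grad_input, grad_output), where grad_output is the diagram's grad_in and grad_input is the diagram's grad_out.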
input -------------------------> output -------> ... -------> last layer output
y -----------------------------> z ------------> ... -------> L
grad_out <---------------------- grad_in
(dL/dz) * (dz/dy)                (dL/dz)
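In the code below, backward=False means the forward pass, and input and output are the layer's input and output.
backward=True means the backward pass. PyTorch calls the hook as hook_fn(module, grad_input, grad_output), so input holds the gradient with respect to the layer's input (grad_out in the diagram above) and output holds the gradient with respect to the layer's output (grad_in in the diagram above).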
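To make the two gradients concrete, here is a minimal sketch on a single nn.Linear layer. It is an assumption-laden illustration, not the post's own code: it uses register_full_backward_hook (the newer module-level API) instead of register_backward_hook, and the names captured and backward_hook are made up for this example. For a linear layer, the gradient with respect to the input should equal the gradient with respect to the output multiplied by the weight matrix, which is the chain rule from the diagram.

```python
import torch
import torch.nn as nn

captured = {}  # hypothetical store for the hook's arguments

def backward_hook(module, grad_input, grad_output):
    # grad_output: dL/d(layer output); grad_input: dL/d(layer input)
    captured["grad_input"] = grad_input[0]
    captured["grad_output"] = grad_output[0]

layer = nn.Linear(3, 2)
handle = layer.register_full_backward_hook(backward_hook)

x = torch.randn(4, 3, requires_grad=True)  # input must require grad to get grad_input
out = layer(x)
out.sum().backward()
handle.remove()

print(captured["grad_output"].shape)  # same shape as the layer output
print(captured["grad_input"].shape)   # same shape as the layer input
```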
''' Define hook
'''
# A simple hook class that returns the input and output of a layer during forward/backward pass
class Hook():
    def __init__(self, module, backward=False):
        if not backward:
            self.hook = module.register_forward_hook(self.hook_fn)
        else:
            # legacy API; newer PyTorch releases also provide register_full_backward_hook
            self.hook = module.register_backward_hook(self.hook_fn)

    def hook_fn(self, module, input, output):
        # forward: (input, output) of the layer; backward: (grad_input, grad_output)
        self.input = input
        self.output = output

    def close(self):
        self.hook.remove()

# get the _modules.items()
# format: (name, module)
print(list(net._modules.items()))
# use layer[0] to get the name and layer[1] to get the module
for layer in net._modules.items():
    print(layer[0], layer[1])
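When creating a Hook object you need to pass in a module, which the code below gets from layer[1]. All forward hooks are stored in the hookF list and all backward hooks in the hookB list.
Note that hooks must be registered before the data is passed through the network, i.e. before net(data) is called, because the hook functions are bound during the forward call.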
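The ordering requirement can be checked with a small sketch. It counts how often a forward hook fires when the layer is called before registration, after registration, and after the hook is removed. The names seen and forward_hook are hypothetical.

```python
import torch
import torch.nn as nn

layer = nn.Linear(2, 2)
seen = []  # hypothetical log of outputs observed by the hook

def forward_hook(module, inputs, output):
    # Called after every forward pass while the hook is registered
    seen.append(output.detach().clone())

data = torch.tensor([0.05, 0.1])

layer(data)                  # hook not registered yet: nothing is recorded
calls_before = len(seen)

handle = layer.register_forward_hook(forward_hook)
layer(data)                  # hook fires once
calls_after = len(seen)

handle.remove()
layer(data)                  # hook removed: nothing new is recorded
calls_after_remove = len(seen)

print(calls_before, calls_after, calls_after_remove)
```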
''' Register hooks on each layer
'''
hookF = [Hook(layer[1]) for layer in list(net._modules.items())]
hookB = [Hook(layer[1],backward=True) for layer in list(net._modules.items())]
# run a data batch
out=net(data)
print(out)
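Note that in the source notebook, loss.backward(retain_graph=True) did not work with the backward hooks.
The error reported there was 'Hook' object has no attribute 'input'. The explanation given is that loss is not a network layer with an input and an output; it is only the aggregated result of the last layer's output and the target.
The Hook class defined earlier expects an explicit input and output, so it does not fit loss.backward().
Use out.backward(label_tensor, retain_graph=True) instead.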
loss.backward(retain_graph = True)
print('***'*3+' Forward Hooks Inputs & Outputs '+'***'*3)
for hook in hookF:
    print(hook.input)
    print(hook.output)
    print('---'*17)
print('\n')
#! loss.backward(retain_graph=True) # doesn't work with backward hooks,
#! since it's not a network layer but an aggregated result from the outputs of last layer vs target
print('***'*3+' Backward Hooks Inputs & Outputs '+'***'*3)
for hook in hookB:
    print(hook.input)
    print(hook.output)
    print('---'*17)
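The code below uses the correct form, out.backward(torch.tensor([1, 1], dtype=torch.float), retain_graph=True).
Because backward() is called on out, which is a tensor rather than a scalar, PyTorch cannot compute its Jacobian matrix directly; it computes a vector-Jacobian product instead, so a gradient argument (called grad_tensors in torch.autograd.backward) must be supplied. It can be seen as the gradient with respect to each element of the tensor.
For example, for y.backward(v, retain_graph=True) with y = (y1, y2, y3) and v = (v1, v2, v3), backward first forms (y1 * v1, y2 * v2, y3 * v3) and sums them, then differentiates the result with respect to y, and y with respect to the parameters, following the chain rule.
Equivalently, think of the usual setup in which the network output y and the label l are combined by a loss function into a scalar loss L. Here that scalar is:
L = v1 * y1 + v2 * y2 + v3 * y3;
dL/dy = (v1, v2, v3);
dL/dw = dL/dy * dy/dw = v1 * dy1/dw + v2 * dy2/dw + v3 * dy3/dw
The v in dL/dw above is exactly the v in y.backward(v, retain_graph=True): the gradient of each yi is scaled by the coefficient vi.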
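The equivalence described above can be verified numerically. The sketch below is an illustration with made-up values: it compares y.backward(v) with calling backward() on the scalar L = sum(v * y), using y = w ** 2 as a stand-in for a network.

```python
import torch

w = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
v = torch.tensor([0.1, 1.0, 10.0])  # plays the role of v = (v1, v2, v3)

# Route 1: non-scalar output, weights passed as the gradient argument
y = w ** 2
y.backward(v)
grad_from_vector = w.grad.clone()

# Route 2: build the scalar L = v1*y1 + v2*y2 + v3*y3 and call backward() on it
w.grad = None  # reset the accumulated gradient
y = w ** 2
L = (v * y).sum()
L.backward()
grad_from_scalar = w.grad.clone()

print(grad_from_vector)
print(grad_from_scalar)
```

Both routes give v * 2w, since dyi/dwi = 2 * wi for this choice of y.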
out.backward(torch.tensor([1, 1], dtype = torch.float), retain_graph = True)
print('***'*3+' Forward Hooks Inputs & Outputs '+'***'*3)
for hook in hookF:
    print(hook.input)
    print(hook.output)
    print('---'*17)
print('\n')
print('***'*3+' Backward Hooks Inputs & Outputs '+'***'*3)
for hook in hookB:
    print(hook.input)
    print(hook.output)
    print('---'*17)
Problem with backward hook function #598
This issue points out a problem with PyTorch's module hooks:
「Ok, so the problem is that module hooks are actually registered on the last function that the module has created. In your case x + y + z is computed as ((x + y) + z) so the hook is registered on that (_ + z) operation, and this is why you're getting only two grad inputs.
We'll definitely have to resolve this but it will need a large change in the autograd internals. However, right now @colesbury is rewriting them to make it possible to have multiple functions dispatched in parallel, and they would heavily conflict with his work. For now use only Variable hooks (or module hooks, but not on containers). Sorry!」
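In other words, module hooks are registered only on the last function that a module creates. For (x + y + z) we would expect a grad for each of x, y and z, but PyTorch first computes (x + y) and then (_ + z), so only two grads come out: one for (x + y) as a whole and one for z. This was a hard problem to fix in PyTorch and was still unresolved when this was written; later PyTorch releases added register_full_backward_hook and deprecated register_backward_hook.
Given this problem, and to avoid unnecessary bugs, the developers recommend the tensor-level register_hook rather than module hooks. If you run into similar behavior, this is where to look for the cause.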
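Following that recommendation, the x + y + z case from the issue can be handled with tensor hooks. This is a minimal sketch, not the code from the issue: save_grad and grads are hypothetical names, and the inputs are random tensors chosen for illustration. Each of the three tensors gets its own hook, so three separate gradients are captured.

```python
import torch

grads = {}  # hypothetical container for captured gradients

def save_grad(name):
    # Build a hook that records the gradient flowing into tensor `name`
    def hook(grad):
        grads[name] = grad.clone()
    return hook

x = torch.randn(3, requires_grad=True)
y = torch.randn(3, requires_grad=True)
z = torch.randn(3, requires_grad=True)

# One hook per tensor, independent of how the sum is grouped internally
for name, tensor in (("x", x), ("y", y), ("z", z)):
    tensor.register_hook(save_grad(name))

out = x + y + z  # evaluated as ((x + y) + z)
out.sum().backward()

print(sorted(grads))  # one entry per input tensor
```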