PyTorch: freezing BN layer parameters

Background: In a PyTorch-based model, I wanted to freeze the parameters of the main branch and train only a sub-branch, but found that the same test data passed through the main branch produced different outputs in different epochs.

Cause: the running_mean and running_var buffers in the main branch's BN layers were not frozen.

Solution: put the BN layers that need to be frozen into eval mode.

Problem example

Environment: torch 1.7.0

# -*- coding:utf-8 -*-
import torch
import torch.nn as nn
import torch.nn.functional as F


class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(1, 6, 3)
        self.bn1 = nn.BatchNorm2d(6)
        self.conv2 = nn.Conv2d(6, 16, 3)
        self.bn2 = nn.BatchNorm2d(16)
        # an affine operation: y = Wx + b
        self.fc1 = nn.Linear(16 * 6 * 6, 120)  # 6*6 from image dimension
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 5)

    def forward(self, x):
        # Max pooling over a (2, 2) window
        x = F.max_pool2d(F.relu(self.bn1(self.conv1(x))), (2, 2))
        # If the size is a square you can only specify a single number
        x = F.max_pool2d(F.relu(self.bn2(self.conv2(x))), 2)
        x = x.view(-1, self.num_flat_features(x))
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

    def num_flat_features(self, x):
        size = x.size()[1:]  # all dimensions except the batch dimension
        num_features = 1
        for s in size:
            num_features *= s
        return num_features


def print_parameter_grad_info(net):
    print('-------parameters requires grad info--------')
    for name, p in net.named_parameters():
        print(f'{name}:\t{p.requires_grad}')


def print_net_state_dict(net):
    for key, v in net.state_dict().items():
        print(f'{key}')


if __name__ == "__main__":
    net = Net()
    print_parameter_grad_info(net)
    net.requires_grad_(False)
    print_parameter_grad_info(net)

    torch.random.manual_seed(5)
    test_data = torch.rand(1, 1, 32, 32)
    train_data = torch.rand(5, 1, 32, 32)
    # print(test_data)
    # print(train_data[0, ...])

    for epoch in range(2):
        # training phase, assume only one iteration per epoch
        net.train()
        pre = net(train_data)
        # compute loss, update parameters, etc.
        # ....

        # test phase
        net.eval()
        x = net(test_data)
        print(f'epoch:{epoch}', x)

Output:

-------parameters requires grad info--------
conv1.weight:   True
conv1.bias:     True
bn1.weight:     True
bn1.bias:       True
conv2.weight:   True
conv2.bias:     True
bn2.weight:     True
bn2.bias:       True
fc1.weight:     True
fc1.bias:       True
fc2.weight:     True
fc2.bias:       True
fc3.weight:     True
fc3.bias:       True
-------parameters requires grad info--------
conv1.weight:   False
conv1.bias:     False
bn1.weight:     False
bn1.bias:       False
conv2.weight:   False
conv2.bias:     False
bn2.weight:     False
bn2.bias:       False
fc1.weight:     False
fc1.bias:       False
fc2.weight:     False
fc2.bias:       False
fc3.weight:     False
fc3.bias:       False
epoch:0 tensor([[-0.0755,  0.1138,  0.0966,  0.0564, -0.0224]])
epoch:1 tensor([[-0.0763,  0.1113,  0.0970,  0.0574, -0.0235]])

We can see that:

net.requires_grad_(False) has indeed set every parameter in the network so that it no longer requires gradient updates, yet the same test data test_data produces different results after a forward pass in different epochs.

Calling print_net_state_dict shows that the BN statistics running_mean and running_var are not among the optimizable parameters net.parameters():

bn1.weight
bn1.bias
bn1.running_mean
bn1.running_var
bn1.num_batches_tracked

However, during the forward pass in the training phase these two buffers are still updated, so even with the whole network frozen, the same test data yields different results.
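A quick way to confirm this split is to compare named_parameters() with named_buffers(): the running statistics only appear among the buffers, which is why requires_grad_(False) and the optimizer never touch them. A minimal sketch, assuming net is the Net instance from the example above:

# Parameters: affected by requires_grad_() and visible to the optimizer
for name, p in net.named_parameters():
    print('param :', name, p.requires_grad)

# Buffers: running_mean, running_var, num_batches_tracked live here,
# so requires_grad_(False) has no effect on them
for name, b in net.named_buffers():
    print('buffer:', name)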

Also by default, during training this layer keeps running estimates of its computed mean and variance, which are then used for normalization during evaluation. The running estimates are kept with a default momentum of 0.1. (source)
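To make the quoted behaviour concrete, the helper below mimics the per-channel update applied on every training-mode forward pass. This is only an illustrative sketch of the update rule, not PyTorch's internal implementation; momentum defaults to 0.1 in nn.BatchNorm2d:

import torch

def update_running_stats(running_mean, running_var, batch, momentum=0.1):
    # batch: (N, C, H, W); statistics are computed per channel, like BatchNorm2d
    batch_mean = batch.mean(dim=(0, 2, 3))
    batch_var = batch.var(dim=(0, 2, 3), unbiased=True)
    # exponential moving average with the given momentum
    new_mean = (1 - momentum) * running_mean + momentum * batch_mean
    new_var = (1 - momentum) * running_var + momentum * batch_var
    return new_mean, new_var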

Therefore, explicitly set the BN layers to eval mode during the training phase:

if __name__ == "__main__":
    net = Net()
    net.requires_grad_(False)

    torch.random.manual_seed(5)
    test_data = torch.rand(1, 1, 32, 32)
    train_data = torch.rand(5, 1, 32, 32)
    # print(test_data)
    # print(train_data[0, ...])

    for epoch in range(2):
        # training phase, assume only one iteration per epoch
        net.train()
        net.bn1.eval()
        net.bn2.eval()
        pre = net(train_data)
        # compute loss, update parameters, etc.
        # ....

        # test phase
        net.eval()
        x = net(test_data)
        print(f'epoch:{epoch}', x)

Now the results are consistent:

epoch:0 tensor([[ 0.0944, -0.0372,  0.0059, -0.0625, -0.0048]])
epoch:1 tensor([[ 0.0944, -0.0372,  0.0059, -0.0625, -0.0048]])
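Calling net.bn1.eval() and net.bn2.eval() by hand works for this toy network, but it does not scale to a deep backbone with many BN layers. A common alternative is to walk the module tree and switch every BatchNorm instance back to eval right after net.train(). The freeze_bn helper below is an illustrative sketch, not part of the example above; it relies on the private base class nn.modules.batchnorm._BatchNorm, which BatchNorm1d/2d/3d all inherit from:

import torch.nn as nn

def freeze_bn(module):
    # Put every BatchNorm layer into eval mode so that
    # running_mean / running_var stop being updated
    if isinstance(module, nn.modules.batchnorm._BatchNorm):
        module.eval()

# Usage inside the training loop, right after switching to train mode:
net.train()
net.apply(freeze_bn)

If you prefer to avoid the private base class, checking isinstance(module, (nn.BatchNorm1d, nn.BatchNorm2d, nn.BatchNorm3d)) achieves the same effect.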

