VisualPytorch is published at the following domain, backed by two servers:
http://nag.visualpytorch.top/static/ (served by 114.115.148.27)
http://visualpytorch.top/static/ (served by 39.97.209.22)
Gradient descent: \(w_{i+1} = w_i - LR * g(w_i)\), where the learning rate (LR) controls the size of each update step.
All learning-rate schedulers in PyTorch inherit from class _LRScheduler.

Main attributes and functions:
```python
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.1)  # set the LR decay policy

for epoch in range(MAX_EPOCH):
    ...
    for i, data in enumerate(train_loader):
        ...
    scheduler.step()  # update the learning rate; call once per epoch, not per iteration
```
StepLR
Function: adjusts the learning rate at fixed intervals.
Main parameters:
• step_size: interval (in epochs) between adjustments
• gamma: multiplicative decay factor
\(lr = lr_0 * gamma^{\lfloor epoch / step\_size \rfloor}\)
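The formula can be checked numerically in plain Python; the values lr_0 = 0.1, gamma = 0.1, step_size = 10 below are illustrative:

```python
# Plain-Python check of lr = lr_0 * gamma ** (epoch // step_size); values are illustrative
lr_0, gamma, step_size = 0.1, 0.1, 10

def step_lr(epoch):
    return lr_0 * gamma ** (epoch // step_size)

for epoch in (0, 9, 10, 19, 20):
    print(epoch, step_lr(epoch))
# the lr stays at 0.1 for epochs 0-9, drops to 0.01 at epoch 10, and to 0.001 at epoch 20
```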
MultiStepLR
Function: adjusts the learning rate at user-specified epochs.
Main parameters:
• milestones: list of epochs at which to adjust
• gamma: multiplicative decay factor
\(lr = lr * gamma\) at each milestone
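A minimal sketch of this schedule in plain Python; the milestone epochs and values are made up for illustration:

```python
# Sketch of MultiStepLR: lr is multiplied by gamma at each milestone epoch (illustrative values)
milestones, gamma = {20, 50}, 0.1
lr = 0.1
schedule = []
for epoch in range(60):
    if epoch in milestones:
        lr *= gamma
    schedule.append(lr)
print(schedule[19], schedule[20], schedule[50])
# 0.1 until epoch 19, 0.01 from epoch 20, 0.001 from epoch 50
```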
ExponentialLR
Function: decays the learning rate exponentially.
Main parameters:
• gamma: base of the exponential
\(lr = lr_0 * gamma^{epoch}\)
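A quick numeric check of the exponential formula, with assumed values lr_0 = 0.1 and gamma = 0.9:

```python
# Numeric check of lr = lr_0 * gamma ** epoch (illustrative values)
lr_0, gamma = 0.1, 0.9

def exp_lr(epoch):
    return lr_0 * gamma ** epoch

print(exp_lr(0), exp_lr(1), exp_lr(2))
# the lr shrinks by a factor of gamma every single epoch
```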
CosineAnnealingLR
Function: adjusts the learning rate along a cosine schedule.
Main parameters:
• T_max: period of the decay; in the figure the decay period is 50 epochs
• eta_min: lower bound on the learning rate
\(\eta_t=\eta_{min}+\frac{1}{2}(\eta_{max}-\eta_{min})\left(1+\cos\left(\frac{T_{cur}}{T_{max}}\pi\right)\right)\)
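The cosine formula can be verified numerically; eta_max = 0.1, eta_min = 0, T_max = 50 are assumed values:

```python
import math

# Numeric check of the cosine-annealing formula (illustrative values)
eta_max, eta_min, T_max = 0.1, 0.0, 50

def cosine_lr(t_cur):
    return eta_min + 0.5 * (eta_max - eta_min) * (1 + math.cos(t_cur / T_max * math.pi))

print(cosine_lr(0), cosine_lr(25), cosine_lr(50))
# starts at eta_max, is halfway down at T_max/2, and reaches eta_min at T_max
```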
ReduceLROnPlateau
Function: monitors a metric and adjusts the learning rate when the metric stops improving.
Main parameters:
• mode: "min" or "max"; "min" adjusts when the monitored metric stops decreasing
• factor: multiplicative decay factor
• patience: number of non-improving steps to tolerate before adjusting
• cooldown: number of steps to pause monitoring after an adjustment
• verbose: whether to print a log message on each adjustment
• min_lr: lower bound on the learning rate
• eps: minimal decay applied to the learning rate
```python
scheduler_lr = optim.lr_scheduler.ReduceLROnPlateau(optimizer, factor=0.1, mode="min", patience=10,
                                                    cooldown=10, min_lr=1e-4, verbose=True)

loss_value = 0.5  # initial value of the monitored metric
for epoch in range(max_epoch):
    for i in range(iteration):
        # train(...)
        optimizer.step()
        optimizer.zero_grad()
    if epoch == 5:
        loss_value = 0.4
    scheduler_lr.step(loss_value)

'''
Epoch    16: reducing learning rate of group 0 to 1.0000e-02.
Epoch    37: reducing learning rate of group 0 to 1.0000e-03.
Epoch    58: reducing learning rate of group 0 to 1.0000e-04.
'''
```
LambdaLR
Function: custom adjustment policy; different parameter groups can use different learning-rate schedules.
Main parameters:
• lr_lambda: a function, or a list of functions (one per parameter group)

```python
optimizer = optim.SGD([{'params': [weights_1]},
                       {'params': [weights_2]}], lr=lr_init)
lambda1 = lambda epoch: 0.1 ** (epoch // 20)
lambda2 = lambda epoch: 0.95 ** epoch
scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=[lambda1, lambda2])
```
pip install tensorboard
pip install future

In the folder containing runs, run tensorboard --logdir=./ on the command line to open it, as shown in the figure below.
Visualizing the Loss and Accuracy curves while training any network; the Train and Valid curves must appear in the same plot (excerpt from the RMB-classification training code):
```python
# build the SummaryWriter
writer = SummaryWriter(comment='test_your_comment', filename_suffix="_test_your_filename_suffix")

for epoch in range(MAX_EPOCH):
    loss_mean = 0.
    correct = 0.
    total = 0.
    net.train()
    for i, data in enumerate(train_loader):
        ...
        # record data into the event file
        writer.add_scalars("Loss", {"Train": loss.item()}, iter_count)
        writer.add_scalars("Accuracy", {"Train": correct / total}, iter_count)
    # once per epoch, record gradients and weights
    for name, param in net.named_parameters():
        writer.add_histogram(name + '_grad', param.grad, epoch)
        writer.add_histogram(name + '_data', param, epoch)
    scheduler.step()  # update the learning rate
```
The first figure shows curves drawn directly with matplotlib (training and validation sets, one point per iteration); the second is TensorBoard. Notice that without outlier removal and smoothing, the two plots are identical.
As the iteration count grows, the gradients become smaller and smaller; this is not vanishing gradients, but simply the Loss itself having reached 1e-4.
add_image()
Function: records an image.
• tag: label of the image; unique identifier of the plot
• img_tensor: image data; note the value scale. If the image contains any pixel value > 1, it is no longer multiplied by 255 for normalization
• global_step: x axis
• dataformats: data layout, one of CHW, HWC, HW
torchvision.utils.make_grid()
Function: assembles a grid of images.
• tensor: image data in B*C*H*W layout
• nrow: number of images per row (the number of rows is computed automatically)
• padding: spacing between images, in pixels
• normalize: whether to normalize pixel values
• range: normalization range
• scale_each: whether to normalize each image individually
• pad_value: pixel value used for padding
```python
writer = SummaryWriter(comment='test_your_comment', filename_suffix="_test_your_filename_suffix")

alexnet = models.alexnet(pretrained=True)

kernel_num = -1
for sub_module in alexnet.modules():
    if isinstance(sub_module, nn.Conv2d):
        kernel_num += 1
        kernels = sub_module.weight
        c_out, c_int, k_w, k_h = tuple(kernels.shape)

        # draw the three channels of each kernel separately
        for o_idx in range(c_out):
            kernel_idx = kernels[o_idx, :, :, :].unsqueeze(1)  # make_grid needs BCHW; expand the C dim
            kernel_grid = vutils.make_grid(kernel_idx, normalize=True, scale_each=True, nrow=c_int)
            writer.add_image('{}_Convlayer_split_in_channel'.format(kernel_num), kernel_grid, global_step=o_idx)

        # draw all kernels together
        kernel_all = kernels.view(-1, 3, k_h, k_w)  # 3, h, w
        kernel_grid = vutils.make_grid(kernel_all, normalize=True, scale_each=True, nrow=8)  # c, h, w
        writer.add_image('{}_all'.format(kernel_num), kernel_grid, global_step=322)

        print("{}_convlayer shape:{}".format(kernel_num, tuple(kernels.shape)))

# visualizing the model's feature maps
alexnet = models.alexnet(pretrained=True)

# forward
convlayer1 = alexnet.features[0]
fmap_1 = convlayer1(img_tensor)

# preprocessing
fmap_1.transpose_(0, 1)  # bchw=(1, 64, 55, 55) --> (64, 1, 55, 55)
fmap_1_grid = vutils.make_grid(fmap_1, normalize=True, scale_each=True, nrow=8)

writer.add_image('feature map in conv1', fmap_1_grid, global_step=322)
writer.close()
```
add_graph()
Function: visualizes the model's computation graph.
• model: the model; must be an nn.Module
• input_to_model: data fed to the model
• verbose: whether to print structural information about the graph

Note that this method constrains the environment: the torch version must be >= 1.3. After generating the runs folder under that version, you can switch back to the original environment to run tensorboard.
torchsummary
Function: inspects model information, useful for debugging.
• model: the PyTorch model
• input_size: size of the model's input
• batch_size: batch size
• device: "cuda" or "cpu"
Tensor.register_hook
Function: registers a backward hook; implements extra functionality without modifying the main code.
The hook function takes a single argument: the tensor's gradient.

```python
w = torch.tensor([1.], requires_grad=True)
x = torch.tensor([2.], requires_grad=True)
a = torch.add(w, x)
b = torch.add(w, 1)
y = torch.mul(a, b)

a_grad = list()

def a_hook(grad):      # records a's gradient
    a_grad.append(grad)

def w_hook(grad):
    grad *= 2
    return grad * 3    # the returned value replaces the original grad, so w.grad = 5*2*3 = 30

handle_w = w.register_hook(w_hook)
handle_a = a.register_hook(a_hook)
y.backward()

# inspect the gradients
print("gradient:", w.grad, x.grad, a.grad, b.grad, y.grad)  # 30  2  None  None  None
print("a_grad[0]: ", a_grad[0])  # 2

handle_w.remove()
handle_a.remove()
```
Function | Parameters | Usage |
---|---|---|
Module.register_forward_hook | module, input, output | registers a hook that runs after the module's forward pass |
Module.register_forward_pre_hook | module, input | registers a hook that runs before the module's forward pass |
Module.register_backward_hook | module, grad_input, grad_output | registers a hook that runs during the module's backward pass |

Parameters:
• module: the current layer
• input: the input to the current layer
• output: the output of the current layer
```python
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(1, 2, 3)
        self.pool1 = nn.MaxPool2d(2, 2)

    def forward(self, x):
        x = self.conv1(x)
        x = self.pool1(x)
        return x

def forward_hook(module, data_input, data_output):
    fmap_block.append(data_output)
    input_block.append(data_input)

def forward_pre_hook(module, data_input):
    print("forward_pre_hook input:{}".format(data_input))

def backward_hook(module, grad_input, grad_output):
    print("backward hook input:{}".format(grad_input))
    print("backward hook output:{}".format(grad_output))

# initialize the network
net = Net()
net.conv1.weight[0].detach().fill_(1)
net.conv1.weight[1].detach().fill_(2)
net.conv1.bias.data.detach().zero_()

# register the hooks
fmap_block = list()
input_block = list()
net.conv1.register_forward_hook(forward_hook)
net.conv1.register_forward_pre_hook(forward_pre_hook)
net.conv1.register_backward_hook(backward_hook)

# inference
fake_img = torch.ones((1, 1, 4, 4))  # batch size * channel * H * W
output = net(fake_img)

loss_fnc = nn.L1Loss()
target = torch.randn_like(output)
loss = loss_fnc(target, output)
loss.backward()
```
Taking register_forward_hook as an example, the call sequence when output = net(fake_img) runs is as follows:

After net.conv1.register_forward_hook(forward_hook) is registered, net's _modules already hold the corresponding _forward_hooks.

Module.__call__() runs next; this function has four steps. net itself carries no hooks, so it proceeds into forward:

```python
def __call__(self, *input, **kwargs):
    # 1. _forward_pre_hooks
    for hook in self._forward_pre_hooks.values():
        result = hook(self, input)
        if result is not None:
            if not isinstance(result, tuple):
                result = (result,)
            input = result
    # 2. forward
    if torch._C._get_tracing_state():
        result = self._slow_forward(*input, **kwargs)
    else:
        result = self.forward(*input, **kwargs)
    # 3. _forward_hooks
    for hook in self._forward_hooks.values():
        hook_result = hook(self, input, result)
        if hook_result is not None:
            result = hook_result
    # 4. _backward_hooks
    if len(self._backward_hooks) > 0:
        var = result
        while not isinstance(var, torch.Tensor):
            if isinstance(var, dict):
                var = next((v for v in var.values() if isinstance(v, torch.Tensor)))
            else:
                var = var[0]
        grad_fn = var.grad_fn
        if grad_fn is not None:
            for hook in self._backward_hooks.values():
                wrapper = functools.partial(hook, self)
                functools.update_wrapper(wrapper, hook)
                grad_fn.register_hook(wrapper)
    return result
```
Net.forward then calls the first convolutional layer:

```python
def forward(self, x):
    x = self.conv1(x)
    x = self.pool1(x)
    return x
```

Module.__call__() runs again, this time for conv1; after its forward step, the registered hook functions are called, i.e. the ones we defined in the main program.

CAM (class activation map): the final layers of an ordinary network are replaced with GAP to obtain the last weight layer, followed by a fully connected layer with softmax; the feature maps are then directly weighted and averaged.
Grad-CAM: an improved CAM that uses gradients as the feature-map weights, so the network structure no longer needs to be modified.
This analysis yields an interesting finding: what drives the model to predict "airplane" is not the airplane itself but the blue sky. For the code, see "PyTorch的hook及其在Grad-CAM中的應用" (PyTorch hooks and their application in Grad-CAM).
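The Grad-CAM idea can be sketched with the hooks introduced above. This is a minimal illustration on a tiny, randomly initialized network; the layer sizes and the class index are made up, and it is not the implementation from the linked article:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Minimal Grad-CAM sketch: grab a conv layer's feature map and its gradient
# via hooks, weight each channel by the average of its gradient (GAP), ReLU.
net = nn.Sequential(
    nn.Conv2d(3, 8, 3, padding=1),   # conv layer whose feature map we visualize
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),         # GAP
    nn.Flatten(),
    nn.Linear(8, 10),                # 10 hypothetical classes
)

feats, grads = [], []
net[0].register_forward_hook(lambda m, i, o: feats.append(o))
net[0].register_full_backward_hook(lambda m, gi, go: grads.append(go[0]))

img = torch.randn(1, 3, 32, 32)      # fake input image
net(img)[0, 5].backward()            # back-propagate the score of class 5

alpha = grads[0].mean(dim=(2, 3), keepdim=True)   # channel weights = GAP of gradients
cam = F.relu((alpha * feats[0]).sum(dim=1))       # weighted channel sum, then ReLU
print(cam.shape)                                  # torch.Size([1, 32, 32])
```

The resulting cam can be upsampled to the input resolution and overlaid on the image as a heatmap.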