Dynamic learning rate schedulers in PyTorch

How do you adaptively adjust the learning rate in PyTorch?

torch.optim.lr_scheduler in PyTorch provides several methods for adjusting the learning rate based on the number of epochs.

torch.optim.lr_scheduler.ReduceLROnPlateau reduces the learning rate dynamically based on a validation error measurement.
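
All of the schedulers below wrap an existing optimizer and are stepped once per epoch. A minimal sketch of the common pattern, using StepLR (section 2 below); the linear model and the SGD settings are illustrative assumptions, not taken from the text:

>>> import torch
>>> from torch.optim.lr_scheduler import StepLR
>>> model = torch.nn.Linear(10, 2)                            # placeholder model
>>> optimizer = torch.optim.SGD(model.parameters(), lr=0.05)
>>> scheduler = StepLR(optimizer, step_size=30, gamma=0.1)
>>> for epoch in range(100):
>>>     # train(...) and validate(...) would run here
>>>     optimizer.step()                                      # parameter update
>>>     scheduler.step()                                      # learning-rate update, once per epoch
>>> print(optimizer.param_groups[0]['lr'])                    # 5e-05 after decays at epochs 30, 60, 90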

1. torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda, last_epoch=-1)

Sets the learning rate of each parameter group to the initial lr times a given function (with the epoch as its argument). When last_epoch = -1, sets the initial lr as lr.

  • lr_lambda (function or list) – A function which computes a multiplicative factor given an integer parameter epoch, or a list of such functions, one for each group in optimizer.param_groups.
  • last_epoch (int) – The index of the last epoch. Default: -1.

Example:

>>> # Assuming optimizer has two groups.
>>> lambda1 = lambda epoch: epoch // 30
>>> lambda2 = lambda epoch: 0.95 ** epoch
>>> scheduler = LambdaLR(optimizer, lr_lambda=[lambda1, lambda2])
>>> for epoch in range(100):
>>>     train(...)
>>>     validate(...)
>>>     scheduler.step()

state_dict(): Returns the state of the scheduler as a dict.

It contains an entry for every variable in self.__dict__ which is not the optimizer.

The learning rate lambda functions will only be saved if they are callable objects and not if they are functions or lambdas.

load_state_dict(state_dict): Loads the scheduler's state.

Parameter: state_dict (dict) – scheduler state. Should be an object returned from a call to state_dict().
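
Together these two methods allow a schedule to be checkpointed and resumed. A minimal sketch, assuming the scheduler built in the example above; 'checkpoint.pth' is a hypothetical file name:

>>> # Save the scheduler state (typically alongside the model/optimizer state).
>>> torch.save(scheduler.state_dict(), 'checkpoint.pth')
>>> # ... later, after recreating the optimizer and scheduler with the same arguments ...
>>> scheduler.load_state_dict(torch.load('checkpoint.pth'))

Note, per the remark above, that lambda functions passed as lr_lambda are not saved, so the scheduler must be reconstructed with the same lambdas before loading.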

2. torch.optim.lr_scheduler.StepLR(optimizer, step_size, gamma=0.1, last_epoch=-1)

Decays the learning rate of each parameter group by gamma every step_size epochs. When last_epoch = -1, sets the initial lr as lr.

  • step_size (int) – Period of learning rate decay.
  • gamma (float) – Multiplicative factor of learning rate decay. Default: 0.1.
  • last_epoch (int) – The index of the last epoch. Default: -1.

Example

>>> # Assuming optimizer uses lr = 0.05 for all groups
>>> # lr = 0.05 if epoch < 30
>>> # lr = 0.005 if 30 <= epoch < 60
>>> # lr = 0.0005 if 60 <= epoch < 90
>>> # ...
>>> scheduler = StepLR(optimizer, step_size=30, gamma=0.1)
>>> for epoch in range(100):
>>>     train(...)
>>>     validate(...)
>>>     scheduler.step()

3. torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones, gamma=0.1, last_epoch=-1)

Decays the learning rate of each parameter group by gamma once the number of epochs reaches one of the milestones. When last_epoch = -1, sets the initial lr as lr.

  • milestones (list) – List of epoch indices. Must be increasing.
  • gamma (float) – Multiplicative factor of learning rate decay. Default: 0.1.
  • last_epoch (int) – The index of the last epoch. Default: -1.

 Example

>>> # Assuming optimizer uses lr = 0.05 for all groups
>>> # lr = 0.05 if epoch < 30
>>> # lr = 0.005 if 30 <= epoch < 80
>>> # lr = 0.0005 if epoch >= 80
>>> scheduler = MultiStepLR(optimizer, milestones=[30, 80], gamma=0.1)
>>> for epoch in range(100):
>>>     train(...)
>>>     validate(...)
>>>     scheduler.step()

4. torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma, last_epoch=-1)

Decays the learning rate of each parameter group by gamma every epoch. When last_epoch = -1, sets the initial lr as lr.
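
The original text gives no example for ExponentialLR; the sketch below mirrors the doc-style examples above (optimizer, train and validate are assumed to be defined as before, and gamma = 0.9 is chosen purely for illustration):

>>> # Assuming optimizer uses lr = 0.05 for all groups:
>>> # lr = 0.05 * 0.9 ** epoch
>>> scheduler = ExponentialLR(optimizer, gamma=0.9)
>>> for epoch in range(100):
>>>     train(...)
>>>     validate(...)
>>>     scheduler.step()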

5. torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max, eta_min=0, last_epoch=-1)

Sets the learning rate of each parameter group using a cosine annealing schedule: the learning rate is annealed from the initial lr down to eta_min along a half cosine curve, with T_max giving the number of epochs over which the annealing takes place.

This schedule comes from SGDR (cosine annealing with warm restarts), but only the cosine annealing part is implemented here, not the restarts.
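
A minimal usage sketch in the same style as the examples above; T_max = 50 and eta_min = 1e-5 are illustrative values, not taken from the text:

>>> # lr follows eta_min + 0.5 * (lr_init - eta_min) * (1 + cos(pi * T_cur / T_max))
>>> scheduler = CosineAnnealingLR(optimizer, T_max=50, eta_min=1e-5)
>>> for epoch in range(100):
>>>     train(...)
>>>     validate(...)
>>>     scheduler.step()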

6. torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, mode='min', factor=0.1, patience=10, verbose=False, threshold=0.0001, threshold_mode='rel', cooldown=0, min_lr=0, eps=1e-08)

Reduces the learning rate when a metric has stopped improving. Models often benefit from reducing the learning rate by a factor of 2-10 once learning stagnates. This scheduler reads a metric quantity and, if no improvement is seen for a 'patience' number of epochs, the learning rate is reduced (see the sketch after the parameter list below).

  • mode (str) – One of min, max. In min mode, lr will be reduced when the quantity monitored has stopped decreasing; in max mode it will be reduced when the quantity monitored has stopped increasing. Default: 'min'.
  • factor (float) – Factor by which the learning rate will be reduced. new_lr = lr * factor. Default: 0.1.
  • patience (int) – Number of epochs with no improvement after which learning rate will be reduced. For example, if patience = 2, then we will ignore the first 2 epochs with no improvement, and will only decrease the LR after the 3rd epoch if the loss still hasn't improved then. Default: 10.
  • verbose (bool) – If True, prints a message to stdout for each update. Default: False.
  • threshold (float) – Threshold for measuring the new optimum, to only focus on significant changes. Default: 1e-4.
  • threshold_mode (str) – One of rel, abs. In rel mode, dynamic_threshold = best * (1 + threshold) in 'max' mode or best * (1 - threshold) in 'min' mode. In abs mode, dynamic_threshold = best + threshold in 'max' mode or best - threshold in 'min' mode. Default: 'rel'.
  • cooldown (int) – Number of epochs to wait before resuming normal operation after the lr has been reduced. Default: 0.
  • min_lr (float or list) – A scalar or a list of scalars. A lower bound on the learning rate of all param groups or each group respectively. Default: 0.
  • eps (float) – Minimal decay applied to lr. If the difference between new and old lr is smaller than eps, the update is ignored. Default: 1e-8.
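
A minimal usage sketch; the SGD settings are illustrative, and val_loss stands for whatever validation metric is being monitored:

>>> optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
>>> scheduler = ReduceLROnPlateau(optimizer, mode='min', factor=0.1, patience=10)
>>> for epoch in range(100):
>>>     train(...)
>>>     val_loss = validate(...)
>>>     # Unlike the epoch-based schedulers above, step() takes the monitored metric.
>>>     scheduler.step(val_loss)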