Rediscovering gradient descent: backtracking line search

Date: 2019-11-20

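I had always thought gradient descent was simple, but I recently found that a gradient descent routine I wrote was very slow. I eventually tracked down the cause: the choice of step size is critical. A variant of gradient descent that uses backtracking line search turns out to be much more efficient. The original post described the algorithm in a figure.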
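For reference, the usual textbook form of backtracking line search for plain gradient descent can be written as below. This is a sketch in standard notation, not a transcription of the original figure. The symbols t, \alpha and \beta are assumed to correspond to `step`, `alpha` and `beta` in the code further down.

```latex
\begin{aligned}
&\text{given } \alpha \in (0, 0.5),\ \beta \in (0, 1),\ \text{and the current point } x: \\
&\quad t := 1 \\
&\quad \text{while } f\bigl(x - t\,\nabla f(x)\bigr) > f(x) - \alpha\, t\, \lVert \nabla f(x) \rVert_2^2: \quad t := \beta\, t \\
&\quad x := x - t\,\nabla f(x)
\end{aligned}
```

The inner loop shrinks the trial step until the objective decreases by at least a fraction \alpha of what the linear approximation predicts.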

Here is a simple example, using the following unconstrained optimization problem:

minimize y = (x-3)*(x-3)
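To see why the step size matters so much on this problem, it helps to work out one gradient step by hand. The derivation below assumes a constant step size t, a symbol introduced here only for illustration.

```latex
\begin{aligned}
f(x) &= (x - 3)^2, \qquad f'(x) = 2\,(x - 3), \\
x_{k+1} &= x_k - t\, f'(x_k) = x_k - 2t\,(x_k - 3), \\
x_{k+1} - 3 &= (1 - 2t)\,(x_k - 3).
\end{aligned}
```

So the distance to the minimizer x = 3 is multiplied by |1 - 2t| at every iteration. A step of t = 0.1 gives a factor of 0.8, t = 0.5 would reach the minimum in a single step, and any t >= 1 no longer converges.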

Below is the Python code, which compares the two methods on this problem.

# Optimization test: minimize y = (x - 3)^2
from matplotlib.pyplot import figure, plot, show, xlabel, ylabel, legend


def f(x):
    """The function we want to minimize."""
    return (x - 3) ** 2


def f_grad(x):
    """Gradient of the function f."""
    return 2 * (x - 3)


x = 0
y = f(x)
err = 1.0
maxIter = 300
curve = [y]
it = 0
step = 0.1
# The method I used before: it looks fairly reasonable, but it is slow.
while err > 1e-4 and it < maxIter:
    it += 1
    gradient = f_grad(x)
    new_x = x - gradient * step
    new_y = f(new_x)
    new_err = abs(new_y - y)
    if new_y > y:
        # Signs of divergence: shrink the step size for later iterations.
        # Note that the current (worse) point is still accepted.
        step *= 0.8
    err, x, y = new_err, new_x, new_y
    print('err:', err, ', y:', y)
    curve.append(y)

print('iterations:', it)
# pyplot.hold() no longer exists in matplotlib 3.x; plots overlay by default.
figure()
plot(curve, 'r*-')
xlabel('iterations')
ylabel('objective function value')

# Backtracking line search: much faster.
x = 0
y = f(x)
err = 1.0
alpha = 0.25
beta = 0.8
curve2 = [y]
it = 0

while err > 1e-4 and it < maxIter:
    it += 1
    gradient = f_grad(x)
    step = 1.0
    # Shrink the step until the sufficient-decrease condition holds.
    while f(x - step * gradient) > y - alpha * step * gradient**2:
        step *= beta
    x = x - step * gradient
    new_y = f(x)
    err = y - new_y
    y = new_y
    print('err:', err, ', y:', y)
    curve2.append(y)

print('iterations:', it)
plot(curve2, 'bo-')
legend(['gradient descent I used', 'backtracking line search'])
show()
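To compare the two approaches without plotting, the same loops can be wrapped in functions that return the iteration count. This is a minimal sketch, not the author's code: the function names `fixed_step_descent` and `backtracking_descent` are hypothetical, and the fixed-step version leaves out the step-shrinking branch, which never triggers on this problem with a step of 0.1.

```python
def f(x):
    """Objective: y = (x - 3)^2."""
    return (x - 3) ** 2


def f_grad(x):
    """Gradient of f."""
    return 2 * (x - 3)


def fixed_step_descent(x, step=0.1, tol=1e-4, max_iter=300):
    """Gradient descent with a constant step; returns (x, iterations)."""
    y = f(x)
    it = 0
    while it < max_iter:
        it += 1
        x = x - step * f_grad(x)
        new_y = f(x)
        err, y = abs(new_y - y), new_y
        if err <= tol:  # stop once the objective barely changes
            break
    return x, it


def backtracking_descent(x, alpha=0.25, beta=0.8, tol=1e-4, max_iter=300):
    """Gradient descent with backtracking line search; returns (x, iterations)."""
    y = f(x)
    it = 0
    while it < max_iter:
        it += 1
        g = f_grad(x)
        step = 1.0
        # Shrink the trial step until the sufficient-decrease condition holds.
        while f(x - step * g) > y - alpha * step * g ** 2:
            step *= beta
        x = x - step * g
        new_y = f(x)
        err, y = y - new_y, new_y
        if err <= tol:
            break
    return x, it


x_fixed, n_fixed = fixed_step_descent(0.0)
x_bt, n_bt = backtracking_descent(0.0)
print('fixed step:  ', n_fixed, 'iterations, x =', x_fixed)
print('backtracking:', n_bt, 'iterations, x =', x_bt)
```

On this quadratic, the backtracking version should stop after far fewer iterations than the fixed step of 0.1, because it settles on a step much closer to the ideal one.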