

Author: 凱魯嘎吉 - 博客園

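    Carl Edward Rasmussen's MATLAB code for Gaussian process machine learning includes an optimization routine, minimize.m. Geoff Hinton also borrowed this same minimize.m when fine-tuning deep autoencoder networks with the BP (backpropagation) algorithm. Below is a brief look at roughly how the function works.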


    Purpose: Minimize a differentiable multivariate function.

1. Line search: choosing the step size
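
As a rough sketch of what the line search is looking for: minimize.m accepts a step once it satisfies the two Wolfe-Powell conditions below. The symbols $\alpha$ (step size along the search direction $s$) and $\phi$ are notation introduced here for illustration only; $\rho$ and $\sigma$ correspond to the constants RHO and SIG in the code of section 3, where they are set to 0.05 and 0.1.

```latex
\phi(\alpha) = f(X + \alpha s), \qquad \phi'(0) = \nabla f(X)^{\top} s < 0

% sufficient decrease
\phi(\alpha) \le \phi(0) + \rho\,\alpha\,\phi'(0)

% curvature condition (strong form)
|\phi'(\alpha)| \le -\sigma\,\phi'(0), \qquad 0 < \rho < \sigma < 1
```

The code tests both conditions with strict inequalities, and a smaller $\sigma$ demands a flatter slope at the accepted point, i.e. a more exact line search.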

2. Nonlinear conjugate gradient method: choosing the search direction
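
The direction update used in section 3 is the Polak-Ribière formula (spelled "Polack-Ribiere" in the code comments). Written out, with $g_k$ for the gradient at iteration $k$ and $s_k$ for the search direction (both symbols are chosen here for illustration), it reads roughly:

```latex
\beta_k = \frac{g_{k+1}^{\top}\,(g_{k+1} - g_k)}{g_k^{\top} g_k},
\qquad
s_{k+1} = -g_{k+1} + \beta_k\, s_k
```

If the new direction is not a descent direction, i.e. $g_{k+1}^{\top} s_{k+1} > 0$, the code falls back to steepest descent, $s_{k+1} = -g_{k+1}$.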

3. The MATLAB code in detail

function [X, fX, i] = minimize(X, f, length, varargin)
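% Here X holds the weights and biases, f returns the cost and its partial derivatives,
% length is 3 (line searches), and varargin carries the per-layer unit counts Dim and the training data data.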

% Minimize a differentiable multivariate function. 
% Usage: [X, fX, i] = minimize(X, f, length, P1, P2, P3, ... )



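% The starting point is given by "X" (D by 1), and the function named in the
% string "f" must return a function value and a vector of partial derivatives.
% "length" gives the length of the run: if positive, it is the maximum number
% of line searches; if negative, its absolute value is the maximum number of
% function evaluations.
% NOTE: If the function terminates within a few iterations, it could be an
% indication that the function values and derivatives are not consistent (i.e.,
% there may be a bug in the implementation of your "f" function).
% The function returns the found solution "X", a vector of function values "fX"
% indicating the progress made, and "i", the number of iterations (line
% searches or function evaluations, depending on the sign of "length") used.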

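INT = 0.1;    % don't reevaluate within 0.1 of the limit of the current bracket
EXT = 3.0;                  % extrapolate maximum 3 times the current step-size
MAX = 20;                         % max 20 function evaluations per line search
RATIO = 10;                                       % maximum allowed slope ratio
SIG = 0.1; RHO = SIG/2;
% SIG and RHO are the constants controlling the Wolfe-Powell conditions.
% SIG is the maximum allowed absolute ratio between previous and new slopes
% (derivatives in the search direction), so setting SIG to a low (positive)
% value forces higher precision in the line searches.
% RHO is the minimum allowed fraction of the expected decrease (expected from
% the slope at the initial point of the line search).
% The constants must satisfy 0 < RHO < SIG < 1. Tuning SIG (depending on the
% nature of the function being optimized) may speed up the minimization; it is
% probably not worth playing much with RHO.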

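if max(size(length)) == 2
    red = length(2); length = length(1);  % optional 2nd entry: expected reduction in first line search
else                                      % scalar length, e.g. length = 3
    red = 1;
end
if length>0
    S = 'Linesearch';                     % length counts line searches
else
    S = 'Function evaluation';            % length counts function evaluations
end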

i = 0;                                            % zero the run length counter
ls_failed = 0;                             % no previous line search has failed
[f0 df0] = feval(f, X, varargin{:});          % get function value and gradient
fX = f0;
i = i + (length<0);                                            % count epochs?!
s = -df0; d0 = -s'*s;           % initial search direction (steepest) and slope
x3 = red/(1-d0);                                % initial step is red/(|s|^2+1)

while i < abs(length)                                      % while not finished
  i = i + (length>0);                                      % count iterations?!

  X0 = X; F0 = f0; dF0 = df0;                   % make a copy of current values
  if length>0, M = MAX; else M = min(MAX, -length-i); end
  while 1                             % keep extrapolating as long as necessary
    x2 = 0; f2 = f0; d2 = d0; f3 = f0; df3 = df0;
    success = 0;
    while ~success && M > 0
      try
        M = M - 1; i = i + (length<0);                         % count epochs?!
        [f3 df3] = feval(f, X+x3*s, varargin{:});   % evaluate at X + step*direction
        if isnan(f3) || isinf(f3) || any(isnan(df3)+isinf(df3)), error(''), end
        success = 1;
      catch                               % catch any error which occurred in f
        x3 = (x2+x3)/2;                                  % bisect and try again
      end
    end
    if f3 < F0, X0 = X+x3*s; F0 = f3; dF0 = df3; end         % keep best values
    d3 = df3'*s;                                                    % new slope
    if d3 > SIG*d0 || f3 > f0+x3*RHO*d0 || M == 0  % are we done extrapolating?
      break
    end
    x1 = x2; f1 = f2; d1 = d2;                        % move point 2 to point 1
    x2 = x3; f2 = f3; d2 = d3;                        % move point 3 to point 2 
    A = 6*(f1-f2)+3*(d2+d1)*(x2-x1);                 % make cubic extrapolation
    B = 3*(f2-f1)-(2*d1+d2)*(x2-x1);
    x3 = x1-d1*(x2-x1)^2/(B+sqrt(B*B-A*d1*(x2-x1))); % num. error possible, ok!
    if ~isreal(x3) || isnan(x3) || isinf(x3) || x3 < 0 % num prob | wrong sign?
      x3 = x2*EXT;                                 % extrapolate maximum amount
    elseif x3 > x2*EXT                  % new point beyond extrapolation limit?
      x3 = x2*EXT;                                 % extrapolate maximum amount
    elseif x3 < x2+INT*(x2-x1)         % new point too close to previous point?
      x3 = x2+INT*(x2-x1);
    end
  end                                                       % end extrapolation
  while (abs(d3) > -SIG*d0 || f3 > f0+x3*RHO*d0) && M > 0  % keep interpolating
    if d3 > 0 || f3 > f0+x3*RHO*d0                         % choose subinterval
      x4 = x3; f4 = f3; d4 = d3;                      % move point 3 to point 4
    else
      x2 = x3; f2 = f3; d2 = d3;                      % move point 3 to point 2
    end
    if f4 > f0
      x3 = x2-(0.5*d2*(x4-x2)^2)/(f4-f2-d2*(x4-x2));  % quadratic interpolation
    else
      A = 6*(f2-f4)/(x4-x2)+3*(d4+d2);                    % cubic interpolation
      B = 3*(f4-f2)-(2*d2+d4)*(x4-x2);
      x3 = x2+(sqrt(B*B-A*d2*(x4-x2)^2)-B)/A;        % num. error possible, ok!
    end
    if isnan(x3) || isinf(x3)
      x3 = (x2+x4)/2;               % if we had a numerical problem then bisect
    end
    x3 = max(min(x3, x4-INT*(x4-x2)),x2+INT*(x4-x2));  % don't accept too close
    [f3 df3] = feval(f, X+x3*s, varargin{:});
    if f3 < F0, X0 = X+x3*s; F0 = f3; dF0 = df3; end         % keep best values
    M = M - 1; i = i + (length<0);                             % count epochs?!
    d3 = df3'*s;                                                    % new slope
  end                                                       % end interpolation
  if abs(d3) < -SIG*d0 && f3 < f0+x3*RHO*d0          % if line search succeeded
    X = X+x3*s; f0 = f3; fX = [fX' f0]';                     % update variables
    fprintf('%s %6i;  Value %4.6e\r', S, i, f0);
    s = (df3'*df3-df0'*df3)/(df0'*df0)*s - df3;   % Polack-Ribiere CG direction
    df0 = df3;                                               % swap derivatives
    d3 = d0; d0 = df0'*s;
    if d0 > 0                                      % new slope must be negative
      s = -df0; d0 = -s'*s;                  % otherwise use steepest direction
    end
    x3 = x3 * min(RATIO, d3/(d0-realmin));          % slope ratio but max RATIO
    ls_failed = 0;                              % this line search did not fail
  else
    X = X0; f0 = F0; df0 = dF0;                     % restore best point so far
    if ls_failed || i > abs(length)         % line search failed twice in a row
      break;                             % or we ran out of time, so we give up
    end
    s = -df0; d0 = -s'*s;                                        % try steepest
    x3 = 1/(1-d0);
    ls_failed = 1;                                    % this line search failed
  end
end
fprintf('\n');
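
To see the calling convention in action, here is a minimal usage sketch. It assumes the function above has been saved as minimize.m on the MATLAB path. The quadratic test problem and the names A, b and quad are made up for this illustration and are not part of the original code. The objective is wrapped with deal so that a single anonymous function returns both the value and the gradient, which is what minimize expects from f.

```matlab
% Hypothetical test problem: f(x) = 0.5*x'*A*x - b'*x with gradient A*x - b.
A = [3 1; 1 2];                       % symmetric positive definite
b = [1; 1];
quad = @(x, A, b) deal(0.5*x'*A*x - b'*x, A*x - b);   % returns [value, gradient]

x0 = zeros(2, 1);                     % starting point (D by 1)
[x, fx, n] = minimize(x0, quad, 20, A, b);   % positive length: at most 20 line searches

x_exact = A \ b;                      % exact minimizer of the quadratic, [0.2; 0.4]
fprintf('distance to exact minimizer: %g\n', norm(x - x_exact));
```

The extra arguments A and b are passed through varargin to the objective, in the same way the network dimensions and training data are passed in the autoencoder fine-tuning case. Using a negative length such as -50 would instead cap the number of function evaluations.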

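The comments in the code warn that early termination may mean the function values and derivatives returned by f are inconsistent. One common way to check this is to compare the analytic gradient with central finite differences. The sketch below does that for a made-up quadratic objective. The names quad, h and rel_err and the step size 1e-5 are assumptions for illustration, not part of minimize.m.

```matlab
% Hypothetical objective: value and gradient of a small quadratic.
A = [3 1; 1 2];  b = [1; 1];
quad = @(x) deal(0.5*x'*A*x - b'*x, A*x - b);    % returns [value, gradient]

x = [0.7; -1.3];                 % arbitrary test point
[~, g] = quad(x);                % analytic gradient claimed by the objective
h = 1e-5;                        % finite-difference step (assumed; tune per problem)
g_num = zeros(size(x));
for k = 1:numel(x)
    e = zeros(size(x));  e(k) = h;
    [f_plus,  ~] = quad(x + e);  % request two outputs so that deal() is satisfied
    [f_minus, ~] = quad(x - e);
    g_num(k) = (f_plus - f_minus) / (2*h);       % central difference
end
rel_err = norm(g - g_num) / norm(g + g_num);     % small value: gradient looks consistent
fprintf('relative gradient error: %g\n', rel_err);
```

A relative error many orders of magnitude below 1 suggests the gradient matches the function value. An error near 1 usually points to a bug in the derivative code.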
4. References

[1] 汪丹戎. Nonlinear conjugate gradient methods and global convergence analysis [D]. 長江大學, 2016.

[2] Quadratic and Cubic Search for a Minimum

[3] Carl Edward Rasmussen. Minimize. 2006.

[4] Conjugate Gradient Back-propagation with Modified Polack Rebier updates for training feed forward neural network. 2011.

[5] 景慧麗. Research on and implementation of algorithms for unconstrained optimization problems [D]. 西安科技大學, 2009.

[6] Numerical Optimization study series: line search methods (LineSearch)
