LibSVM源碼剖析（java版）

時間 2019-12-04

標籤 libsvm 源碼剖析 java 欄目 Java 简体版

原文原文鏈接

以前學習了SVM的原理（見http://www.cnblogs.com/bentuwuying/p/6444249.html），以及SMO算法的理論基礎（見http://www.cnblogs.com/bentuwuying/p/6444516.html）。最近又學習了SVM的實現：LibSVM（版本爲3.22），在這裏總結一下。html

1. LibSVM總體框架

Training:

parse_command_line
    從命令行讀入須要設置的參數，並初始化各參數
read_problem
    讀入訓練數據
svm_check_parameter
    檢查參數和訓練數據是否符合規定的格式
do_cross_validation
    svm_cross_validation
        對訓練數據進行均勻劃分，使得同一個類別的數據均勻分佈在各個fold中
        最終獲得的數組中，相同fold的數據的索引放在連續內存空間數組中
        Cross Validation（k fold）
            svm_train（使用k-1個fold的數據作訓練）
            svm_predict（使用剩下的1個fold的數據作預測）
        獲得每一個樣本在Cross Validation中的預測類別
    得出Cross Validation的準確率
svm_train
    classification
        svm_group_classes
            對訓練數據按類別進行分組重排，相同類別的數據的索引放在連續內存空間數組中
            獲得類別總數：nr_class，每一個類別的標識：label，每一個類別的樣本數：count，每一個類別在重排數組中的起始位置：start，重排後的索引數組：perm（每一個元素的值表明其在原始數組中的索引）
        train k*(k-1)/2 models
            新建1個訓練數據子集sub_prob，並使用兩個類別的訓練數據進行填充
            svm_train_one //根據svm_type的不一樣，使用不一樣的方式進行單次模型訓練
                solve_c_svc //針對C-SVC這一類型進行模型訓練
                    新建1個Solver類對象s
                    s.Solve //使用SMO算法求解對偶問題（二次優化問題）
                        初始化拉格朗日乘子狀態向量（是否在邊界上）
                        初始化須要參與迭代優化計算的拉格朗日乘子集合
                        初始化梯度G，以及爲了重建梯度時候節省計算開銷而維護的中間變量G_bar
                        迭代優化
                            do_shrinking（每間隔若干次迭代進行一次shrinking）
                                找出m(a)和M(a)
                                reconstruct_gradient（若是知足中止條件，進行梯度重建）
                                    由於有時候shrinking策略太過aggressive，因此當對shrinking以後的部分變量的優化問題迭代優化到第一次知足中止條件時，即可以對梯度進行重建
                                    接下來的shrinking過程即可以創建在更精確的梯度值上
                                be_shrunk
                                    判斷該alpha是否被shrinking（再也不參與後續迭代優化）
                                swap_index
                                    交換兩個變量的位置，從而使得參與迭代優化的變量（即沒有被shrinking的變量）始終保持在變量數組的最前面的連續位置上
                            select_working_set（選擇工做集）
                            對工做集的alpha進行更新
                            更新梯度G，拉格朗日乘子狀態向量，及中間變量G_bar
                            計算b
                        填充alpha數組和SolutionInfo對象si並返回
                    返回alpha數組和SolutionInfo對象si
                輸出decision_function對象（包含alpha數組和b）
            修改nonzero數組，將alpha大於0的對應位置改成true
        填充svm_model對象model（包含nr_class，label數組，b數組，probA&probB數組，nSV數組，l，SV二維數組，sv_indices數組,sv_coef二維數組）並返回 //sv_coef二維數組內元素的放置方式很特別
svm_save_model
    保存模型到制定文件中

    
    
Prediction:

svm_predict
    svm_predict_values
        Kernel.k_function //計算預測樣本與Support Vectors的Kernel值
        用k(k-1)/2個分類器對預測樣本進行預測，得出k(k-1)/2個預測類別
        使用投票策略，選擇預測數最多的那個類別做爲最終的預測類別
        返回預測類別

2. 各參數文件

1 public class svm_node implements java.io.Serializable
2 {
3     public int index;
4     public double value;
5 }

用於存儲單個向量中的單個特徵。index是該特徵的索引，value是該特徵的值。這樣設計的好處是，能夠節省存儲空間，提升計算速度。java

1 public class svm_problem implements java.io.Serializable
2 {
3     public int l;
4     public double[] y;
5     public svm_node[][] x;
6 }

用於存儲本次運算的全部樣本（數據集），及其所屬類別。node

 1 public class svm_parameter implements Cloneable,java.io.Serializable
 2 {
 3     /* svm_type */
 4     public static final int C_SVC = 0;
 5     public static final int NU_SVC = 1;
 6     public static final int ONE_CLASS = 2;
 7     public static final int EPSILON_SVR = 3;
 8     public static final int NU_SVR = 4;
 9 
10     /* kernel_type */
11     public static final int LINEAR = 0;
12     public static final int POLY = 1;
13     public static final int RBF = 2;
14     public static final int SIGMOID = 3;
15     public static final int PRECOMPUTED = 4;
16 
17     public int svm_type;
18     public int kernel_type;
19     public int degree;    // for poly
20     public double gamma;    // for poly/rbf/sigmoid
21     public double coef0;    // for poly/sigmoid
22 
23     // these are for training only
24     public double cache_size; // in MB
25     public double eps;    // stopping criteria
26     public double C;    // for C_SVC, EPSILON_SVR and NU_SVR
27     public int nr_weight;        // for C_SVC
28     public int[] weight_label;    // for C_SVC
29     public double[] weight;        // for C_SVC
30     public double nu;    // for NU_SVC, ONE_CLASS, and NU_SVR
31     public double p;    // for EPSILON_SVR
32     public int shrinking;    // use the shrinking heuristics
33     public int probability; // do probability estimates
34 
35     public Object clone()
36     {
37         try
38         {
39             return super.clone();
40         } catch (CloneNotSupportedException e)
41         {
42             return null;
43         }
44     }
45 
46 }

用於存儲模型訓練所需設置的參數。算法

 1 public class svm_model implements java.io.Serializable
 2 {
 3     public svm_parameter param;    // parameter
 4     public int nr_class;        // number of classes, = 2 in regression/one class svm
 5     public int l;            // total #SV
 6     public svm_node[][] SV;    // SVs (SV[l])
 7     public double[][] sv_coef;    // coefficients for SVs in decision functions (sv_coef[k-1][l])
 8     public double[] rho;        // constants in decision functions (rho[k*(k-1)/2])
 9     public double[] probA;         // pariwise probability information
10     public double[] probB;
11     public int[] sv_indices;       // sv_indices[0,...,nSV-1] are values in [1,...,num_traning_data] to indicate SVs in the training set
12 
13     // for classification only
14 
15     public int[] label;        // label of each class (label[k])
16     public int[] nSV;        // number of SVs for each class (nSV[k])
17     // nSV[0] + nSV[1] + ... + nSV[k-1] = l
18 };

用於存儲訓練獲得的模型，固然原來的訓練參數也必須保留。數組

3. Cache類

Cache類主要負責運算所涉及的內存的管理，包括申請和釋放等。緩存

 1     private final int l;
 2     private long size;
 3     private final class head_t
 4     {
 5         head_t prev, next;    // a cicular list
 6         float[] data;
 7         int len;        // data[0,len) is cached in this entry
 8     }
 9     private final head_t[] head;
10     private head_t lru_head;

類成員變量。包括：框架

l：樣本總數。函數

size：指定的所有內存總量。學習

head_t：單塊申請到的內存用class head_t來記錄所申請內存，並記錄長度。並且經過雙向的指針，造成鏈表，增長尋址的速度。記錄全部申請到的內存，一方面便於釋放內存，另外方便在內存不夠時適當釋放一部分已經申請到的內存。優化

head：相似於變量指針，該指針用來記錄程序所申請的內存。

lru_head：雙向鏈表的頭。

 1     Cache(int l_, long size_)
 2     {
 3         l = l_;
 4         size = size_;
 5         head = new head_t[l];
 6         for(int i=0;i<l;i++) head[i] = new head_t();
 7         size /= 4;
 8         size -= l * (16/4);    // sizeof(head_t) == 16
 9         size = Math.max(size, 2* (long) l);  // cache must be large enough for two columns
10         lru_head = new head_t();
11         lru_head.next = lru_head.prev = lru_head;
12     }

構造函數。該函數根據樣本數L，申請L 個head_t 的空間。Lru_head 由於尚沒有head_t 中申請到內存，故雙向鏈表指向本身。至於size 的處理，先將原來的byte 數目轉化爲float 的數目，而後扣除L 個head_t 的內存數目。

1     private void lru_delete(head_t h)
2     {
3         // delete from current location
4         h.prev.next = h.next;
5         h.next.prev = h.prev;
6     }

從雙向鏈表中刪除某個元素的連接，不刪除、不釋放該元素所涉及的內存。通常是刪除當前所指向的元素。

1     private void lru_insert(head_t h)
2     {
3         // insert to last position
4         h.next = lru_head;
5         h.prev = lru_head.prev;
6         h.prev.next = h;
7         h.next.prev = h;
8     }

在鏈表後面插入一個新的節點。

 1     // request data [0,len)
 2     // return some position p where [p,len) need to be filled
 3     // (p >= len if nothing needs to be filled)
 4     // java: simulate pointer using single-element array
 5     int get_data(int index, float[][] data, int len)
 6     {
 7         head_t h = head[index];
 8         if(h.len > 0) lru_delete(h);
 9         int more = len - h.len;
10 
11         if(more > 0)
12         {
13             // free old space
14             while(size < more)
15             {
16                 head_t old = lru_head.next;
17                 lru_delete(old);
18                 size += old.len;
19                 old.data = null;
20                 old.len = 0;
21             }
22 
23             // allocate new space
24             float[] new_data = new float[len];
25             if(h.data != null) System.arraycopy(h.data,0,new_data,0,h.len);
26             h.data = new_data;
27             size -= more;
28             do {int tmp=h.len; h.len=len; len=tmp;} while(false);
29         }
30 
31         lru_insert(h);
32         data[0] = h.data;
33         return len;
34     }

該函數保證head_t[index]中至少有len 個float 的內存，而且將可使用的內存塊的指針放在 data 指針中。返回值爲申請到的內存。函數首先將head_t[index]從鏈表中斷開,若是head_t[index]原來沒有分配內存，則跳過斷開這步。計算當前head_t[index]已經申請到的內存，若是不夠，釋放部份內存，等內存足夠後，從新分配內存。從新使head_t[index]進入雙向鏈表。返回值不爲申請到的內存的長度，爲head_t[index]原來的數據長度h.len。

調用該函數後，程序會計算的值，並將其填入 data 所指向的內存區域，若是下次index 不變，正常狀況下，不用從新計算該區域的值。若index 不變，則get_data()返回值len 與本次傳入的len 一致，從get_Q( )中能夠看到，程序不會從新計算。從而提升運算速度。

 1     void swap_index(int i, int j)
 2     {
 3         if(i==j) return;
 4 
 5         if(head[i].len > 0) lru_delete(head[i]);
 6         if(head[j].len > 0) lru_delete(head[j]);
 7         do {float[] tmp=head[i].data; head[i].data=head[j].data; head[j].data=tmp;} while(false);
 8         do {int tmp=head[i].len; head[i].len=head[j].len; head[j].len=tmp;} while(false);
 9         if(head[i].len > 0) lru_insert(head[i]);
10         if(head[j].len > 0) lru_insert(head[j]);
11 
12         if(i>j) do {int tmp=i; i=j; j=tmp;} while(false);
13         for(head_t h = lru_head.next; h!=lru_head; h=h.next)
14         {
15             if(h.len > i)
16             {
17                 if(h.len > j)
18                     do {float tmp=h.data[i]; h.data[i]=h.data[j]; h.data[j]=tmp;} while(false);
19                 else
20                 {
21                     // give up
22                     lru_delete(h);
23                     size += h.len;
24                     h.data = null;
25                     h.len = 0;
26                 }
27             }
28         }
29     }

交換head_t[i] 和head_t[j]的內容，先從雙向鏈表中斷開，交換後從新進入雙向鏈表中。而後對於每一個head_t[index]中的值，交換其第 i 位和第 j 位的值。這兩步是爲了在Kernel 矩陣中在兩個維度上都進行一次交換。

4. Kernel類

Kernel類是用來進行計算Kernel evaluation矩陣的。

 1 //
 2 // Kernel evaluation
 3 //
 4 // the static method k_function is for doing single kernel evaluation
 5 // the constructor of Kernel prepares to calculate the l*l kernel matrix
 6 // the member function get_Q is for getting one column from the Q Matrix
 7 //
 8 abstract class QMatrix {
 9     abstract float[] get_Q(int column, int len);
10     abstract double[] get_QD();
11     abstract void swap_index(int i, int j);
12 };

Kernel的父類，抽象類。

 1 abstract class Kernel extends QMatrix {
 2     private svm_node[][] x;
 3     private final double[] x_square;
 4 
 5     // svm_parameter
 6     private final int kernel_type;
 7     private final int degree;
 8     private final double gamma;
 9     private final double coef0;
10 
11     abstract float[] get_Q(int column, int len);
12     abstract double[] get_QD();

Kernel類的成員變量。

 1     Kernel(int l, svm_node[][] x_, svm_parameter param)
 2     {
 3         this.kernel_type = param.kernel_type;
 4         this.degree = param.degree;
 5         this.gamma = param.gamma;
 6         this.coef0 = param.coef0;
 7 
 8         x = (svm_node[][])x_.clone();
 9 
10         if(kernel_type == svm_parameter.RBF)
11         {
12             x_square = new double[l];
13             for(int i=0;i<l;i++)
14                 x_square[i] = dot(x[i],x[i]);
15         }
16         else x_square = null;
17     }

構造函數。初始化類中的部分常量、指定核函數、克隆樣本數據（每次數據傳入時經過克隆函數來實現，徹底從新分配內存）。若是使用RBF 核函數，則計算x_sqare[i]。

 1     double kernel_function(int i, int j)
 2     {
 3         switch(kernel_type)
 4         {
 5             case svm_parameter.LINEAR:
 6                 return dot(x[i],x[j]);
 7             case svm_parameter.POLY:
 8                 return powi(gamma*dot(x[i],x[j])+coef0,degree);
 9             case svm_parameter.RBF:
10                 return Math.exp(-gamma*(x_square[i]+x_square[j]-2*dot(x[i],x[j])));
11             case svm_parameter.SIGMOID:
12                 return Math.tanh(gamma*dot(x[i],x[j])+coef0);
13             case svm_parameter.PRECOMPUTED:
14                 return x[i][(int)(x[j][0].value)].value;
15             default:
16                 return 0;    // java
17         }
18     }

對Kernel類的對象中包含的任意2個樣本求kernel evaluation。

 1     static double k_function(svm_node[] x, svm_node[] y,
 2                              svm_parameter param)
 3     {
 4         switch(param.kernel_type)
 5         {
 6             case svm_parameter.LINEAR:
 7                 return dot(x,y);
 8             case svm_parameter.POLY:
 9                 return powi(param.gamma*dot(x,y)+param.coef0,param.degree);
10             case svm_parameter.RBF:
11             {
12                 double sum = 0;
13                 int xlen = x.length;
14                 int ylen = y.length;
15                 int i = 0;
16                 int j = 0;
17                 while(i < xlen && j < ylen)
18                 {
19                     if(x[i].index == y[j].index)
20                     {
21                         double d = x[i++].value - y[j++].value;
22                         sum += d*d;
23                     }
24                     else if(x[i].index > y[j].index)
25                     {
26                         sum += y[j].value * y[j].value;
27                         ++j;
28                     }
29                     else
30                     {
31                         sum += x[i].value * x[i].value;
32                         ++i;
33                     }
34                 }
35 
36                 while(i < xlen)
37                 {
38                     sum += x[i].value * x[i].value;
39                     ++i;
40                 }
41 
42                 while(j < ylen)
43                 {
44                     sum += y[j].value * y[j].value;
45                     ++j;
46                 }
47 
48                 return Math.exp(-param.gamma*sum);
49             }
50             case svm_parameter.SIGMOID:
51                 return Math.tanh(param.gamma*dot(x,y)+param.coef0);
52             case svm_parameter.PRECOMPUTED:
53                 return    x[(int)(y[0].value)].value;
54             default:
55                 return 0;    // java
56         }
57     }

靜態方法，對參數傳入的任意2個樣本求kernel evaluation。主要應用在predict過程當中。

 1 //
 2 // Q matrices for various formulations
 3 //
 4 class SVC_Q extends Kernel
 5 {
 6     private final byte[] y;
 7     private final Cache cache;
 8     private final double[] QD;
 9 
10     SVC_Q(svm_problem prob, svm_parameter param, byte[] y_)
11     {
12         super(prob.l, prob.x, param);
13         y = (byte[])y_.clone();
14         cache = new Cache(prob.l,(long)(param.cache_size*(1<<20)));
15         QD = new double[prob.l];
16         for(int i=0;i<prob.l;i++)
17             QD[i] = kernel_function(i,i);
18     }
19 
20     float[] get_Q(int i, int len)
21     {
22         float[][] data = new float[1][];
23         int start, j;
24         if((start = cache.get_data(i,data,len)) < len)
25         {
26             for(j=start;j<len;j++)
27                 data[0][j] = (float)(y[i]*y[j]*kernel_function(i,j));
28         }
29         return data[0];
30     }
31 
32     double[] get_QD()
33     {
34         return QD;
35     }
36 
37     void swap_index(int i, int j)
38     {
39         cache.swap_index(i,j);
40         super.swap_index(i,j);
41         do {byte tmp=y[i]; y[i]=y[j]; y[j]=tmp;} while(false);
42         do {double tmp=QD[i]; QD[i]=QD[j]; QD[j]=tmp;} while(false);
43     }
44 }

Kernel類的子類，用於對C-SVC進行定製的Kernel類。其中構造函數中會調用父類Kernel類的構造函數，而且初始化自身獨有的成員變量。

在get_Q函數中，調用了Cache類的get_data函數。想要獲得第 i 個變量與其它 len 個變量的Kernel函數值。可是若是取出的緩存中沒有所有的值，只有部分的值的話，就須要從新計算一下剩下的那部分的Kernel值，這部分新計算的值會保存到緩存中去。只要不刪除掉，之後能夠繼續使用，不用再重複計算了。

5. Solver類

LibSVM中的Solver類主要是SVM中SMO算法求解拉格朗日乘子的實現。SMO算法的原理能夠參考以前的一篇博客：http://www.cnblogs.com/bentuwuying/p/6444516.html。

 1 // An SMO algorithm in Fan et al., JMLR 6(2005), p. 1889--1918
 2 // Solves:
 3 //
 4 //    min 0.5(\alpha^T Q \alpha) + p^T \alpha
 5 //
 6 //        y^T \alpha = \delta
 7 //        y_i = +1 or -1
 8 //        0 <= alpha_i <= Cp for y_i = 1
 9 //        0 <= alpha_i <= Cn for y_i = -1
10 //
11 // Given:
12 //
13 //    Q, p, y, Cp, Cn, and an initial feasible point \alpha
14 //    l is the size of vectors and matrices
15 //    eps is the stopping tolerance
16 //
17 // solution will be put in \alpha, objective value will be put in obj
18 //
19 class Solver {

5.1 Solver類的成員變量

 1     int active_size;
 2     byte[] y;
 3     double[] G;        // gradient of objective function
 4     static final byte LOWER_BOUND = 0;
 5     static final byte UPPER_BOUND = 1;
 6     static final byte FREE = 2;
 7     byte[] alpha_status;    // LOWER_BOUND, UPPER_BOUND, FREE
 8     double[] alpha;
 9     QMatrix Q;
10     double[] QD;
11     double eps;
12     double Cp,Cn;
13     double[] p;
14     int[] active_set;
15     double[] G_bar;        // gradient, if we treat free variables as 0
16     int l;
17     boolean unshrink;    // XXX
18 
19     static final double INF = java.lang.Double.POSITIVE_INFINITY;

Solver類的成員變量。包括：

active_size：計算時實際參加運算的樣本數目，通過shrinking處理後，該數目會小於所有樣本總數。

y：樣本所屬類別，+1/-1。

G：梯度，。

alpha_status：拉格朗日乘子的狀態，分別是，表明內部點（非SV，LOWER_BOUND），錯分點（BSV，UPPER_BOUND），和支持向量（SV，FREE）。

alpha：拉格朗日乘子。

Q：核函數矩陣。

QD：核函數矩陣中的對角線部分。

eps：偏差極限。

Cp，Cn：正負樣本各自的懲罰係數。

p：目標函數中的係數。

active_set：計算時實際參加運算的樣本索引。

G_bar：在重建梯度時的中間變量，能夠下降重建的計算開銷。

l：樣本數目。

5.2 Solver類的簡單輔助成員函數

1     double get_C(int i)
2     {
3         return (y[i] > 0)? Cp : Cn;
4     }

double get_c(int i)：返回對應樣本的C值（懲罰係數）。這裏對正負樣本設置了不一樣的懲罰係數Cp，Cn。

1     void update_alpha_status(int i)
2     {
3         if(alpha[i] >= get_C(i))
4             alpha_status[i] = UPPER_BOUND;
5         else if(alpha[i] <= 0)
6             alpha_status[i] = LOWER_BOUND;
7         else alpha_status[i] = FREE;
8     }

void update_alpha_status(int i)：更新拉格朗日乘子的狀態。

1     boolean is_upper_bound(int i) { return alpha_status[i] == UPPER_BOUND; }
2     boolean is_lower_bound(int i) { return alpha_status[i] == LOWER_BOUND; }
3     boolean is_free(int i) {  return alpha_status[i] == FREE; }

boolean is_upper_bound(int i)，boolean is_lower_bound(int i)，boolean is_free(int i)：返回拉格朗日是否在界上。

1     // java: information about solution except alpha,
2     // because we cannot return multiple values otherwise...
3     static class SolutionInfo {
4         double obj;
5         double rho;
6         double upper_bound_p;
7         double upper_bound_n;
8         double r;    // for Solver_NU
9     }

static class Solutioninfo：SMO算法求得的解（除了alpha）。

 1     void swap_index(int i, int j)
 2     {
 3         Q.swap_index(i,j);
 4         do {byte tmp=y[i]; y[i]=y[j]; y[j]=tmp;} while(false);
 5         do {double tmp=G[i]; G[i]=G[j]; G[j]=tmp;} while(false);
 6         do {byte tmp=alpha_status[i]; alpha_status[i]=alpha_status[j]; alpha_status[j]=tmp;} while(false);
 7         do {double tmp=alpha[i]; alpha[i]=alpha[j]; alpha[j]=tmp;} while(false);
 8         do {double tmp=p[i]; p[i]=p[j]; p[j]=tmp;} while(false);
 9         do {int tmp=active_set[i]; active_set[i]=active_set[j]; active_set[j]=tmp;} while(false);
10         do {double tmp=G_bar[i]; G_bar[i]=G_bar[j]; G_bar[j]=tmp;} while(false);
11     }

void swap_index(int i, int j)：徹底交換樣本 i 和樣本 j 的內容，包括申請的內存的地址。

5.3 Solving the Quadratic Problems

咱們能夠用一個通用的表達式來表示SMO算法要解決的二次優化問題：

對於這個二次優化問題，最困難的地方在於Q是一個很大的稠密矩陣，難以存儲，因此須要採用分解的方式解決，SMO算法就是採用每次選取拉格朗日乘子中的2個來更新，直到收斂到最優解。

5.4 Stopping Criteria

假設存在向量是（11）式的解，則必然存在實數和兩個非負向量使得下式成立：

其中，是目標函數的梯度。

上面的條件能夠被重寫爲：

進一步，存在b使得

其中，

則目標函數存在最優解的條件是：

而實際實現過程當中，優化迭代的中止條件爲：

其中，epsilon是偏差極限，tolerance。

5.5 Working Set Selection

 1     // return 1 if already optimal, return 0 otherwise
 2     int select_working_set(int[] working_set)
 3     {
 4         // return i,j such that
 5         // i: maximizes -y_i * grad(f)_i, i in I_up(\alpha)
 6         // j: mimimizes the decrease of obj value
 7         //    (if quadratic coefficeint <= 0, replace it with tau)
 8         //    -y_j*grad(f)_j < -y_i*grad(f)_i, j in I_low(\alpha)
 9 
10         double Gmax = -INF;
11         double Gmax2 = -INF;
12         int Gmax_idx = -1;
13         int Gmin_idx = -1;
14         double obj_diff_min = INF;
15 
16         for(int t=0;t<active_size;t++)
17             if(y[t]==+1)
18             {
19                 if(!is_upper_bound(t))
20                     if(-G[t] >= Gmax)
21                     {
22                         Gmax = -G[t];
23                         Gmax_idx = t;
24                     }
25             }
26             else
27             {
28                 if(!is_lower_bound(t))
29                     if(G[t] >= Gmax)
30                     {
31                         Gmax = G[t];
32                         Gmax_idx = t;
33                     }
34             }
35 
36         int i = Gmax_idx;
37         float[] Q_i = null;
38         if(i != -1) // null Q_i not accessed: Gmax=-INF if i=-1
39             Q_i = Q.get_Q(i,active_size);
40 
41         for(int j=0;j<active_size;j++)
42         {
43             if(y[j]==+1)
44             {
45                 if (!is_lower_bound(j))
46                 {
47                     double grad_diff=Gmax+G[j];
48                     if (G[j] >= Gmax2)
49                         Gmax2 = G[j];
50                     if (grad_diff > 0)
51                     {
52                         double obj_diff;
53                         double quad_coef = QD[i]+QD[j]-2.0*y[i]*Q_i[j];
54                         if (quad_coef > 0)
55                             obj_diff = -(grad_diff*grad_diff)/quad_coef;
56                         else
57                             obj_diff = -(grad_diff*grad_diff)/1e-12;
58 
59                         if (obj_diff <= obj_diff_min)
60                         {
61                             Gmin_idx=j;
62                             obj_diff_min = obj_diff;
63                         }
64                     }
65                 }
66             }
67             else
68             {
69                 if (!is_upper_bound(j))
70                 {
71                     double grad_diff= Gmax-G[j];
72                     if (-G[j] >= Gmax2)
73                         Gmax2 = -G[j];
74                     if (grad_diff > 0)
75                     {
76                         double obj_diff;
77                         double quad_coef = QD[i]+QD[j]+2.0*y[i]*Q_i[j];
78                         if (quad_coef > 0)
79                             obj_diff = -(grad_diff*grad_diff)/quad_coef;
80                         else
81                             obj_diff = -(grad_diff*grad_diff)/1e-12;
82 
83                         if (obj_diff <= obj_diff_min)
84                         {
85                             Gmin_idx=j;
86                             obj_diff_min = obj_diff;
87                         }
88                     }
89                 }
90             }
91         }
92 
93         if(Gmax+Gmax2 < eps || Gmin_idx == -1)
94             return 1;
95 
96         working_set[0] = Gmax_idx;
97         working_set[1] = Gmin_idx;
98         return 0;
99     }

其中，第1個 for 循環是在肯定工做集的第1個變量 i 。在肯定了 i 以後，第2個 for 循環便是爲了肯定第2個變量 j 的選取。當二者都肯定了以後，經過看中止條件來肯定是否當前變量已經處在最優解上了，若是是則返回1，若是不是對working_set數組賦值並返回0。

5.6 Maintaining the Gradient

咱們能夠看到，在每次迭代中，主要操做是找到（（12）式目標函數中會用到），以及（在working set selection和stopping condition中會用到）。這兩個主要操做能夠合起來一塊兒看待，由於：

在第k次迭代中，咱們已經獲得了，因而即可以獲得，用來計算（12）式的目標函數。當第k次迭代的目標函數獲得解決後，咱們又能夠獲得 k+1 次的。因此，LibSVM在整個迭代優化過程當中都一直維護着梯度數組。

5.7 The Calculation of b

令

（1）當存在，有。爲了數值的穩定性，作一個平均處理，

（2）當，有

則取中值。

 1     double calculate_rho()
 2     {
 3         double r;
 4         int nr_free = 0;
 5         double ub = INF, lb = -INF, sum_free = 0;
 6         for(int i=0;i<active_size;i++)
 7         {
 8             double yG = y[i]*G[i];
 9 
10             if(is_lower_bound(i))
11             {
12                 if(y[i] > 0)
13                     ub = Math.min(ub,yG);
14                 else
15                     lb = Math.max(lb,yG);
16             }
17             else if(is_upper_bound(i))
18             {
19                 if(y[i] < 0)
20                     ub = Math.min(ub,yG);
21                 else
22                     lb = Math.max(lb,yG);
23             }
24             else
25             {
26                 ++nr_free;
27                 sum_free += yG;
28             }
29         }
30 
31         if(nr_free>0)
32             r = sum_free/nr_free;
33         else
34             r = (ub+lb)/2;
35 
36         return r;
37     }

5.8 Shrinking

對於（11）式目標函數的最優解中會包含一些邊界值，這些值在迭代優化的過程當中可能就已經成爲了邊界值了，以後便再也不變化。爲了節省訓練時間，使用shrinking方法去除這些個邊界值，從而能夠進一步解決一個更小的子優化問題。下面的這2個定理顯示，在迭代的最後，只有一小部分變量還在改變。

使用A來表示 k 輪迭代中沒有被shrinking的元素集合，從而（11）式目標函數即可以縮減爲一個子優化問題：

其中，是被shrinking的集合。在LibSVM中，經過元素的重排使得始終有。

當（28）式解決了以後，咱們會發現可能會有部分的元素被錯誤地shrinking了。若是有這種狀況發生，那麼須要對（11）式原優化問題進行從新求解最優值，而從新求解的起始點便是，其中是（28）式子問題的最優解，是被shrinking的變量。

shrinking的步驟以下：

 1     private boolean be_shrunk(int i, double Gmax1, double Gmax2)
 2     {
 3         if(is_upper_bound(i))
 4         {
 5             if(y[i]==+1)
 6                 return(-G[i] > Gmax1);
 7             else
 8                 return(-G[i] > Gmax2);
 9         }
10         else if(is_lower_bound(i))
11         {
12             if(y[i]==+1)
13                 return(G[i] > Gmax2);
14             else
15                 return(G[i] > Gmax1);
16         }
17         else
18             return(false);
19     }

經過上述shrinking條件來判斷第 i 個變量是否被shrinking。

 1     void do_shrinking()
 2     {
 3         int i;
 4         double Gmax1 = -INF;        // max { -y_i * grad(f)_i | i in I_up(\alpha) }
 5         double Gmax2 = -INF;        // max { y_i * grad(f)_i | i in I_low(\alpha) }
 6 
 7         // find maximal violating pair first
 8         for(i=0;i<active_size;i++)
 9         {
10             if(y[i]==+1)
11             {
12                 if(!is_upper_bound(i))
13                 {
14                     if(-G[i] >= Gmax1)
15                         Gmax1 = -G[i];
16                 }
17                 if(!is_lower_bound(i))
18                 {
19                     if(G[i] >= Gmax2)
20                         Gmax2 = G[i];
21                 }
22             }
23             else
24             {
25                 if(!is_upper_bound(i))
26                 {
27                     if(-G[i] >= Gmax2)
28                         Gmax2 = -G[i];
29                 }
30                 if(!is_lower_bound(i))
31                 {
32                     if(G[i] >= Gmax1)
33                         Gmax1 = G[i];
34                 }
35             }
36         }
37 
38         if(unshrink == false && Gmax1 + Gmax2 <= eps*10)
39         {
40             unshrink = true;
41             reconstruct_gradient();
42             active_size = l;
43         }
44 
45         for(i=0;i<active_size;i++)
46             if (be_shrunk(i, Gmax1, Gmax2))
47             {
48                 active_size--;
49                 while (active_size > i)
50                 {
51                     if (!be_shrunk(active_size, Gmax1, Gmax2))
52                     {
53                         swap_index(i,active_size);
54                         break;
55                     }
56                     active_size--;
57                 }
58             }
59     }

第1個 for 循環是爲了找到這2個變量。

由於有時候shrinking策略太過aggressive，因此當對shrinking以後的部分變量的優化問題迭代優化到第一次知足條件時，即可以對梯度進行重建。因而，接下來的shrinking過程即可以創建在更精確的梯度值上。

最後的 for 循環是爲了交換兩個變量的位置，從而使得參與迭代優化的變量（即沒有被shrinking的變量）始終保持在變量數組的最前面的連續位置上。進一步，爲了減小交換操做的次數，從而優化計算開銷，可使用如下的trick：從左向右遍歷找出須要被shrinking的變量位置 t1，從右向左遍歷找出須要參與迭代優化計算的變量位置 t2，交換 t1 和 t2 的位置，而後繼續遍歷，直到兩個遍歷的指針相遇。

5.9 Reconstructing the Gradient

一旦（31）式和（32）式知足了，便須要重建梯度向量。由於在優化（28）子問題的時候，一直都在內存中的，因此在重建梯度時，咱們只須要計算便可。爲了減小計算的開銷，在迭代計算時，咱們能夠維護着另外一個向量：

而對於任意不在 A 集合中的變量 i 而言，有：

而對於上式的計算，包含了兩層循環。是先對 i 循環仍是先對 j 進行循環，意味着大相徑庭的Kernel evaluation次數。具體證實以下：

因此，咱們具體選擇method 1 仍是method 2，是要看狀況討論的：

 1     void reconstruct_gradient()
 2     {
 3         // reconstruct inactive elements of G from G_bar and free variables
 4 
 5         if(active_size == l) return;
 6 
 7         int i,j;
 8         int nr_free = 0;
 9 
10         for(j=active_size;j<l;j++)
11             G[j] = G_bar[j] + p[j];
12 
13         for(j=0;j<active_size;j++)
14             if(is_free(j))
15                 nr_free++;
16 
17         if(2*nr_free < active_size)
18             svm.info("\nWARNING: using -h 0 may be faster\n");
19 
20         if (nr_free*l > 2*active_size*(l-active_size))
21         {
22             for(i=active_size;i<l;i++)
23             {
24                 float[] Q_i = Q.get_Q(i,active_size);
25                 for(j=0;j<active_size;j++)
26                     if(is_free(j))
27                         G[i] += alpha[j] * Q_i[j];
28             }
29         }
30         else
31         {
32             for(i=0;i<active_size;i++)
33                 if(is_free(i))
34                 {
35                     float[] Q_i = Q.get_Q(i,l);
36                     double alpha_i = alpha[i];
37                     for(j=active_size;j<l;j++)
38                         G[j] += alpha_i * Q_i[j];
39                 }
40         }
41     }

5.10 迭代優化總體框架

1     void Solve(int l, QMatrix Q, double[] p_, byte[] y_,
2                double[] alpha_, double Cp, double Cn, double eps, SolutionInfo si, int shrinking)

Solver類中最重要的迭代優化函數 Solve 的接口格式。因爲這個函數行數太多，下面分段進行剖析。

 1         this.l = l;
 2         this.Q = Q;
 3         QD = Q.get_QD();
 4         p = (double[])p_.clone();
 5         y = (byte[])y_.clone();
 6         alpha = (double[])alpha_.clone();
 7         this.Cp = Cp;
 8         this.Cn = Cn;
 9         this.eps = eps;
10         this.unshrink = false;

對Solver類的一些成員變量進行初始化。

1         // initialize alpha_status
2         {
3             alpha_status = new byte[l];
4             for(int i=0;i<l;i++)
5                 update_alpha_status(i);
6         }

初始化拉格朗日乘子狀態向量值。

1         // initialize active set (for shrinking)
2         {
3             active_set = new int[l];
4             for(int i=0;i<l;i++)
5                 active_set[i] = i;
6             active_size = l;
7         }

初始化須要參與迭代優化計算的拉格朗日乘子集合。

 1         // initialize gradient
 2         {
 3             G = new double[l];
 4             G_bar = new double[l];
 5             int i;
 6             for(i=0;i<l;i++)
 7             {
 8                 G[i] = p[i];
 9                 G_bar[i] = 0;
10             }
11             for(i=0;i<l;i++)
12                 if(!is_lower_bound(i))
13                 {
14                     float[] Q_i = Q.get_Q(i,l);
15                     double alpha_i = alpha[i];
16                     int j;
17                     for(j=0;j<l;j++)
18                         G[j] += alpha_i*Q_i[j];
19                     if(is_upper_bound(i))
20                         for(j=0;j<l;j++)
21                             G_bar[j] += get_C(i) * Q_i[j];
22                 }
23         }

初始化梯度G，以及爲了重建梯度時候節省計算開銷而維護的中間變量G_bar。

  1         // optimization step
  2 
  3         int iter = 0;
  4         int max_iter = Math.max(10000000, l>Integer.MAX_VALUE/100 ? Integer.MAX_VALUE : 100*l);
  5         int counter = Math.min(l,1000)+1;
  6         int[] working_set = new int[2];
  7 
  8         while(iter < max_iter)
  9         {
 10             // show progress and do shrinking
 11 
 12             if(--counter == 0)
 13             {
 14                 counter = Math.min(l,1000);
 15                 if(shrinking!=0) do_shrinking();
 16                 svm.info(".");
 17             }
 18 
 19             if(select_working_set(working_set)!=0)
 20             {
 21                 // reconstruct the whole gradient
 22                 reconstruct_gradient();
 23                 // reset active set size and check
 24                 active_size = l;
 25                 svm.info("*");
 26                 if(select_working_set(working_set)!=0)
 27                     break;
 28                 else
 29                     counter = 1;    // do shrinking next iteration
 30             }
 31 
 32             int i = working_set[0];
 33             int j = working_set[1];
 34 
 35             ++iter;
 36 
 37             // update alpha[i] and alpha[j], handle bounds carefully
 38 
 39             float[] Q_i = Q.get_Q(i,active_size);
 40             float[] Q_j = Q.get_Q(j,active_size);
 41 
 42             double C_i = get_C(i);
 43             double C_j = get_C(j);
 44 
 45             double old_alpha_i = alpha[i];
 46             double old_alpha_j = alpha[j];
 47 
 48             if(y[i]!=y[j])
 49             {
 50                 double quad_coef = QD[i]+QD[j]+2*Q_i[j];
 51                 if (quad_coef <= 0)
 52                     quad_coef = 1e-12;
 53                 double delta = (-G[i]-G[j])/quad_coef;
 54                 double diff = alpha[i] - alpha[j];
 55                 alpha[i] += delta;
 56                 alpha[j] += delta;
 57 
 58                 if(diff > 0)
 59                 {
 60                     if(alpha[j] < 0)
 61                     {
 62                         alpha[j] = 0;
 63                         alpha[i] = diff;
 64                     }
 65                 }
 66                 else
 67                 {
 68                     if(alpha[i] < 0)
 69                     {
 70                         alpha[i] = 0;
 71                         alpha[j] = -diff;
 72                     }
 73                 }
 74                 if(diff > C_i - C_j)
 75                 {
 76                     if(alpha[i] > C_i)
 77                     {
 78                         alpha[i] = C_i;
 79                         alpha[j] = C_i - diff;
 80                     }
 81                 }
 82                 else
 83                 {
 84                     if(alpha[j] > C_j)
 85                     {
 86                         alpha[j] = C_j;
 87                         alpha[i] = C_j + diff;
 88                     }
 89                 }
 90             }
 91             else
 92             {
 93                 double quad_coef = QD[i]+QD[j]-2*Q_i[j];
 94                 if (quad_coef <= 0)
 95                     quad_coef = 1e-12;
 96                 double delta = (G[i]-G[j])/quad_coef;
 97                 double sum = alpha[i] + alpha[j];
 98                 alpha[i] -= delta;
 99                 alpha[j] += delta;
100 
101                 if(sum > C_i)
102                 {
103                     if(alpha[i] > C_i)
104                     {
105                         alpha[i] = C_i;
106                         alpha[j] = sum - C_i;
107                     }
108                 }
109                 else
110                 {
111                     if(alpha[j] < 0)
112                     {
113                         alpha[j] = 0;
114                         alpha[i] = sum;
115                     }
116                 }
117                 if(sum > C_j)
118                 {
119                     if(alpha[j] > C_j)
120                     {
121                         alpha[j] = C_j;
122                         alpha[i] = sum - C_j;
123                     }
124                 }
125                 else
126                 {
127                     if(alpha[i] < 0)
128                     {
129                         alpha[i] = 0;
130                         alpha[j] = sum;
131                     }
132                 }
133             }
134 
135             // update G
136 
137             double delta_alpha_i = alpha[i] - old_alpha_i;
138             double delta_alpha_j = alpha[j] - old_alpha_j;
139 
140             for(int k=0;k<active_size;k++)
141             {
142                 G[k] += Q_i[k]*delta_alpha_i + Q_j[k]*delta_alpha_j;
143             }
144 
145             // update alpha_status and G_bar
146 
147             {
148                 boolean ui = is_upper_bound(i);
149                 boolean uj = is_upper_bound(j);
150                 update_alpha_status(i);
151                 update_alpha_status(j);
152                 int k;
153                 if(ui != is_upper_bound(i))
154                 {
155                     Q_i = Q.get_Q(i,l);
156                     if(ui)
157                         for(k=0;k<l;k++)
158                             G_bar[k] -= C_i * Q_i[k];
159                     else
160                         for(k=0;k<l;k++)
161                             G_bar[k] += C_i * Q_i[k];
162                 }
163 
164                 if(uj != is_upper_bound(j))
165                 {
166                     Q_j = Q.get_Q(j,l);
167                     if(uj)
168                         for(k=0;k<l;k++)
169                             G_bar[k] -= C_j * Q_j[k];
170                     else
171                         for(k=0;k<l;k++)
172                             G_bar[k] += C_j * Q_j[k];
173                 }
174             }
175 
176         }

迭代優化最重要的循環部分。每間隔若干次迭代進行一次shrinking，使得部分變量不用參與迭代計算。選擇工做集合（包含2個拉格朗日乘子），對工做集合進行更新計算操做。具體的更新原理可見paper：《Sequential Minimal Optimization: A Fast Algorithm for Training Support Vector Machines》。更新完工做集合後，再更新梯度G，拉格朗日乘子的狀態向量，以及維護的中間變量G_bar。

 1         // calculate rho
 2 
 3         si.rho = calculate_rho();
 4 
 5         // calculate objective value
 6         {
 7             double v = 0;
 8             int i;
 9             for(i=0;i<l;i++)
10                 v += alpha[i] * (G[i] + p[i]);
11 
12             si.obj = v/2;
13         }
14 
15         // put back the solution
16         {
17             for(int i=0;i<l;i++)
18                 alpha_[active_set[i]] = alpha[i];
19         }
20 
21         si.upper_bound_p = Cp;
22         si.upper_bound_n = Cn;
23 
24         svm.info("\noptimization finished, #iter = "+iter+"\n");