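Cross-validation is used mainly in modeling applications, for example when building PCR or PLS regression models. From the given modeling samples, most are used to fit the model and a small portion is held out. The newly fitted model then predicts the held-out samples, and their prediction errors are squared and summed. The process repeats until every sample has been predicted exactly once. The sum of the squared prediction errors over all samples is called PRESS (Prediction Error Sum of Squares). Cross-validation is very useful for guarding against over-fitting, because each error is measured on samples the model did not see during fitting.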
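The definition of PRESS can be written compactly as follows. The symbols are assumptions introduced here for illustration and do not appear in the original text: n is the number of samples, y_i is the observed response of sample i, and \hat{y}_{(-i)} is the prediction for sample i from a model fitted without the held-out group that contains sample i.

```latex
\mathrm{PRESS} = \sum_{i=1}^{n} \left( y_i - \hat{y}_{(-i)} \right)^2
```

Dividing PRESS by n gives a cross-validated mean square, which is easier to compare across data sets of different sizes.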
K-fold cross-validation
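In K-fold cross-validation, the original sample is partitioned into K subsamples. One subsample is held out as validation data and the remaining K-1 subsamples are used for training. The procedure is repeated K times so that each subsample serves as the validation set exactly once, and the K results are averaged (or combined in some other way) to produce a single estimate. The advantage of this method is that every randomly generated subsample is used for both training and validation, and each one is validated exactly once. 10-fold cross-validation is the most commonly used variant.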
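The K-fold procedure described above can be sketched in base R. This is a minimal illustration under stated assumptions, not the implementation used by any particular package: the helper name kfold_press, the synthetic data frame, and its coefficients are all hypothetical and chosen only so the example runs on its own.

```r
# Minimal K-fold cross-validation for a linear model, base R only.
# All names and data below are hypothetical stand-ins.
set.seed(1)                                     # reproducible data and folds
n  <- 27
cv <- data.frame(X1 = rnorm(n), X2 = rnorm(n))  # synthetic predictors
cv$Y <- 1 + 2 * cv$X1 - cv$X2 + rnorm(n)        # synthetic response

kfold_press <- function(data, formula, k = 10) {
  # Randomly assign each row to one of k folds of near-equal size.
  fold     <- sample(rep(seq_len(k), length.out = nrow(data)))
  response <- all.vars(formula)[1]              # name of the response column
  press    <- 0
  for (i in seq_len(k)) {
    test  <- data[fold == i, , drop = FALSE]    # held-out subsample
    train <- data[fold != i, , drop = FALSE]    # remaining k-1 subsamples
    fit   <- lm(formula, data = train)          # fit on the training part only
    resid <- test[[response]] - predict(fit, newdata = test)
    press <- press + sum(resid^2)               # accumulate squared errors
  }
  # PRESS and the cross-validated mean square over all rows.
  list(press = press, ms = press / nrow(data))
}

res <- kfold_press(cv, Y ~ X1 + X2, k = 10)
print(res)
```

Because every row is assigned to exactly one fold, each sample is predicted once and only once, which matches the description above.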
Example: CVlm {DAAG}

    library(DAAG)  # provides CVlm()

    # m = 10 requests 10-fold cross-validation; df = cv passes the data frame
    # named cv; the model is fitted by ordinary least squares.
    # Note: newer DAAG releases may name the data-frame argument `data`
    # instead of `df`.
    val <- CVlm(df = cv, m = 10, form.lm = formula(Y ~ X1 + X2 + X3 + X4))

Output:

    Analysis of Variance Table

    Response: Y
              Df Sum Sq Mean Sq F value  Pr(>F)
    X1         1   69.4    69.4   17.19 0.00042 ***
    X2         1    4.1     4.1    1.03 0.32210
    X3         1   32.3    32.3    8.01 0.00974 **
    X4         1   27.8    27.8    6.88 0.01552 *
    Residuals 22   88.8     4.0
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

    fold 1
    Observations in test set: 2
                   13     16
    Predicted   12.03 10.180
    cvpred      13.49 10.768
    Y            8.40 10.100
    CV residual -5.09 -0.668
    Sum of squares = 26.4    Mean square = 13.2    n = 2

    fold 2
    Observations in test set: 3
                    8    19    26
    Predicted   13.52 12.03  8.85
    cvpred      13.67 12.02  7.78
    Y           12.10 10.80 13.30
    CV residual -1.57 -1.22  5.52
    Sum of squares = 34.4    Mean square = 11.5    n = 3

    fold 3
    Observations in test set: 3
                   9    22    25
    Predicted   7.87 13.16 17.79
    cvpred      8.09 13.22 15.15
    Y           9.60 14.90 20.00
    CV residual 1.51  1.68  4.85
    Sum of squares = 28.7    Mean square = 9.56    n = 3

    fold 4
    Observations in test set: 3
                     1   20    27
    Predicted   11.428 12.3 11.29
    cvpred      11.571 12.5 11.52
    Y           11.200 10.2 10.40
    CV residual -0.371 -2.3 -1.12
    Sum of squares = 6.71    Mean square = 2.24    n = 3

    fold 5
    Observations in test set: 3
                    5    17     18
    Predicted   11.10 13.05  9.167
    cvpred      10.73 12.89  9.229
    Y           13.40 14.80  9.100
    CV residual  2.67  1.91 -0.129
    Sum of squares = 10.8    Mean square = 3.59    n = 3

    fold 6
    Observations in test set: 3
                    6    10    21
    Predicted   15.33  9.58 12.25
    cvpred      13.63  9.76 12.27
    Y           18.30  8.40 13.60
    CV residual  4.67 -1.36  1.33
    Sum of squares = 25.4    Mean square = 8.48    n = 3

    fold 7
    Observations in test set: 3
                    12     23    24
    Predicted   10.436 15.963 15.21
    cvpred      10.486 16.445 15.81
    Y           10.600 16.000 13.20
    CV residual  0.114 -0.445 -2.61
    Sum of squares = 7.03    Mean square = 2.34    n = 3

    fold 8
    Observations in test set: 3
                    2      3    11
    Predicted    9.48 13.064 11.87
    cvpred       9.91 13.202 12.32
    Y            8.80 12.300  9.30
    CV residual -1.11 -0.902 -3.02
    Sum of squares = 11.2    Mean square = 3.72    n = 3

    fold 9
    Observations in test set: 2
                     4     7
    Predicted   10.716 11.64
    cvpred      10.646 12.21
    Y           11.600 11.10
    CV residual  0.954 -1.11
    Sum of squares = 2.13    Mean square = 1.07    n = 2

    fold 10
    Observations in test set: 2
                   14     15
    Predicted   11.26 11.441
    cvpred      11.75 11.373
    Y            9.60 10.900
    CV residual -2.15 -0.473
    Sum of squares = 4.84    Mean square = 2.42    n = 2

    Overall (Sum over all 2 folds)
      ms
    5.83

The overall cross-validated mean square across the 10 folds is 5.83.
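As a cross-check, the overall mean square can be recomputed from the per-fold lines of the output. The sketch below only re-enters the printed "Sum of squares" and "n" values; the variable names are hypothetical. Because the printed sums are rounded to three significant figures, the result is expected to agree with the reported 5.83 only approximately.

```r
# Per-fold sums of squares and test-set sizes, copied from the printed output.
fold_ss <- c(26.4, 34.4, 28.7, 6.71, 10.8, 25.4, 7.03, 11.2, 2.13, 4.84)
fold_n  <- c(2, 3, 3, 3, 3, 3, 3, 3, 2, 2)

press      <- sum(fold_ss)       # total squared CV error (PRESS)
n_total    <- sum(fold_n)        # every observation is tested exactly once
overall_ms <- press / n_total    # cross-validated mean square

cat("PRESS =", press, " n =", n_total, " ms =", round(overall_ms, 2), "\n")
```

Note that this is the total squared error divided by the total number of observations, which weights every observation equally, rather than a plain average of the ten per-fold mean squares.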