Some explanations of R² out there are pretty poor, so let me walk through it in plain language.
Under a linear regression model, we can compute SE(line) and SE(mean of y).
The statistic R² describes the proportion of variance in the response variable explained by the predictor variable.
How to understand this sentence: Y itself has its own SE; under the model, there is also an SE between Y and its predicted values. If the model fits perfectly, then SE(line) = 0, and R² is 1, meaning all of the variance is explained by the model (you can picture this as a completely overfitted model).
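To make that extreme case concrete, here is a minimal R sketch of mine (made-up noise-free data, not from the original post): when y is an exact linear function of x, nothing is left unexplained and R² comes out as exactly 1.

# Hypothetical noise-free data: y is exactly 2x + 1
x <- 1:10
y <- 2 * x + 1
fit <- lm(y ~ x)          # ordinary least-squares fit
summary(fit)$r.squared    # prints 1: the line explains all the variance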
The coefficient of determination, rendered in some textbooks as the "determination coefficient", is also known as the goodness of fit.
The coefficient of determination reflects what percentage of the variation in y can be described by the variation in x, that is, what percentage of the variance of the dependent variable Y can be explained by the controlled independent variable X.
In simple linear regression, the value of the coefficient of determination happens to equal the square of the correlation coefficient.
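You can verify this identity yourself; the snippet below is a quick sketch of mine (arbitrary simulated data, not from the original post) comparing cor()^2 with the R² reported by lm():

set.seed(1)                 # reproducible toy data
x <- rnorm(50)
y <- 3 * x + rnorm(50)
fit <- lm(y ~ x)
cor(x, y)^2                 # squared Pearson correlation
summary(fit)$r.squared      # same value in simple linear regression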
Formula: R² = SSR/SST = 1 − SSE/SST
where SST = SSR + SSE: SST is the total sum of squares, SSR the regression sum of squares, and SSE the error (residual) sum of squares.
In ANOVA terms, the data's between-group variation / total variation × 100% is the so-called R-square.
Within-group variation (SSE) + between-group variation (SSA) = total variation (SST), which gives the formula R² = 1 − SSE/SST; I will skip the actual computation of the within-group, between-group, and total variation, since you presumably know how to do it.
Regression sum of squares: SSR (sum of squares for regression) = ESS (explained sum of squares)
Residual sum of squares: SSE (sum of squares for error) = RSS (residual sum of squares)
Total sum of squares: SST (sum of squares for total) = TSS (total sum of squares)
SSE + SSR = SST, equivalently RSS + ESS = TSS
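This decomposition is easy to confirm numerically. Here is a small sketch of mine (simulated data, my own check rather than part of the original post) using lm()'s fitted values and residuals:

set.seed(2)                            # arbitrary toy data
x <- runif(30, 0, 10)
y <- 1.5 * x + rnorm(30)
fit <- lm(y ~ x)
sst <- sum((y - mean(y))^2)            # total sum of squares (TSS)
sse <- sum(resid(fit)^2)               # residual sum of squares (RSS)
ssr <- sum((fitted(fit) - mean(y))^2)  # explained sum of squares (ESS)
all.equal(sse + ssr, sst)              # TRUE: the decomposition holds
ssr / sst                              # equals summary(fit)$r.squared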
Interpretation: the larger the goodness of fit, the better the independent variable explains the dependent variable, i.e., the higher the percentage of the total variation accounted for by the independent variable, and the more tightly the observed points cluster around the regression line.
Range: 0 to 1.
Example:
Suppose we have 10 points (plotted by the code below):
Let's use R to find the regression equation and R²:
# Linear regression by hand: returns slope, intercept, and sums of squares
mylr = function(x, y) {
  plot(x, y)
  x_mean  = mean(x)
  y_mean  = mean(y)
  xy_mean = mean(x * y)
  xx_mean = mean(x * x)
  # least-squares slope and intercept
  m = (x_mean * y_mean - xy_mean) / (x_mean^2 - xx_mean)
  b = y_mean - m * x_mean
  f = m * x + b              # fitted values on the regression line
  lines(x, f)
  sst = sum((y - y_mean)^2)  # total sum of squares
  sse = sum((y - f)^2)       # residual sum of squares
  ssr = sum((f - y_mean)^2)  # regression sum of squares
  result = c(m, b, sst, sse, ssr)
  names(result) = c('m', 'b', 'sst', 'sse', 'ssr')
  return(result)
}

x = c(60, 34, 12, 34, 71, 28, 96, 34, 42, 37)
y = c(301, 169, 47, 178, 365, 126, 491, 157, 202, 184)
f = mylr(x, y)
f['m']
f['b']
f['sse'] + f['ssr']     # equals f['sst'], confirming the decomposition
f['sst']
R2 = f['ssr'] / f['sst']
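As a sanity check (my addition, not in the original post), R's built-in lm() reproduces the same slope, intercept, and R² on this data:

fit = lm(y ~ x)           # least-squares fit with the built-in function
coef(fit)                 # intercept ≈ -15.5, slope ≈ 5.3
summary(fit)$r.squared    # ≈ 0.998, matching ssr/sst above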
The final equation is f(x) = 5.3x − 15.5.
R² is 0.998, i.e., 99.8% of the variance in y is accounted for by the model, so x explains y extremely well.