Some explanations of R² out there are pretty poor, so let me walk through it in plain language.
Under a linear regression model, we can compute SE(line) and SE(mean of y).
The statistic R² describes the proportion of variance in the response variable explained by the predictor variable.
How to understand this sentence: Y itself has its own SE; under the model, there is also an SE between Y and its predicted values. If the model fits perfectly, then SE(line) = 0, and R² is 1, meaning all of the variance is explained by the model (you can picture this as a completely overfitted model).
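To make that extreme case concrete, here is a minimal R sketch of mine (made-up noise-free data, not from the original post): when y is an exact linear function of x, nothing is left unexplained and R² comes out as exactly 1.

# Hypothetical noise-free data: y is exactly 2x + 1
x <- 1:10
y <- 2 * x + 1
fit <- lm(y ~ x)          # ordinary least-squares fit
summary(fit)$r.squared    # prints 1: the line explains all the variance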
The coefficient of determination, rendered in some textbooks as the "determination coefficient", is also known as the goodness of fit.
The coefficient of determination reflects what percentage of the variation in y can be described by the variation in x, that is, what percentage of the variance of the dependent variable Y can be explained by the controlled independent variable X.
In simple linear regression, the value of the coefficient of determination happens to equal the square of the correlation coefficient.
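You can verify this identity yourself; the snippet below is a quick sketch of mine (arbitrary simulated data, not from the original post) comparing cor()^2 with the R² reported by lm():

set.seed(1)                 # reproducible toy data
x <- rnorm(50)
y <- 3 * x + rnorm(50)
fit <- lm(y ~ x)
cor(x, y)^2                 # squared Pearson correlation
summary(fit)$r.squared      # same value in simple linear regression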
Formula: R² = SSR/SST = 1 − SSE/SST
where SST = SSR + SSE: SST is the total sum of squares, SSR the regression sum of squares, and SSE the error (residual) sum of squares.
In ANOVA terms, the data's between-group variation / total variation × 100% is the so-called R-square.
Within-group variation (SSE) + between-group variation (SSA) = total variation (SST), which gives the formula R² = 1 − SSE/SST; I will skip the actual computation of the within-group, between-group, and total variation, since you presumably know how to do it.
Regression sum of squares: SSR (sum of squares for regression) = ESS (explained sum of squares)
Residual sum of squares: SSE (sum of squares for error) = RSS (residual sum of squares)
Total sum of squares: SST (sum of squares for total) = TSS (total sum of squares)
SSE + SSR = SST, equivalently RSS + ESS = TSS
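This decomposition is easy to confirm numerically. Here is a small sketch of mine (simulated data, my own check rather than part of the original post) using lm()'s fitted values and residuals:

set.seed(2)                            # arbitrary toy data
x <- runif(30, 0, 10)
y <- 1.5 * x + rnorm(30)
fit <- lm(y ~ x)
sst <- sum((y - mean(y))^2)            # total sum of squares (TSS)
sse <- sum(resid(fit)^2)               # residual sum of squares (RSS)
ssr <- sum((fitted(fit) - mean(y))^2)  # explained sum of squares (ESS)
all.equal(sse + ssr, sst)              # TRUE: the decomposition holds
ssr / sst                              # equals summary(fit)$r.squared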
Interpretation: the larger the goodness of fit, the better the independent variable explains the dependent variable, i.e., the higher the percentage of the total variation accounted for by the independent variable, and the more tightly the observed points cluster around the regression line.
Range: 0 to 1.
Example:
Suppose we have 10 points (plotted by the code below):
Let's use R to find the regression equation and R²:
# Linear regression by hand: returns slope, intercept, and sums of squares
mylr = function(x, y) {
  plot(x, y)
  x_mean  = mean(x)
  y_mean  = mean(y)
  xy_mean = mean(x * y)
  xx_mean = mean(x * x)
  # least-squares slope and intercept
  m = (x_mean * y_mean - xy_mean) / (x_mean^2 - xx_mean)
  b = y_mean - m * x_mean
  f = m * x + b              # fitted values on the regression line
  lines(x, f)
  sst = sum((y - y_mean)^2)  # total sum of squares
  sse = sum((y - f)^2)       # residual sum of squares
  ssr = sum((f - y_mean)^2)  # regression sum of squares
  result = c(m, b, sst, sse, ssr)
  names(result) = c('m', 'b', 'sst', 'sse', 'ssr')
  return(result)
}

x = c(60, 34, 12, 34, 71, 28, 96, 34, 42, 37)
y = c(301, 169, 47, 178, 365, 126, 491, 157, 202, 184)
f = mylr(x, y)
f['m']
f['b']
f['sse'] + f['ssr']     # equals f['sst'], confirming the decomposition
f['sst']
R2 = f['ssr'] / f['sst']
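As a sanity check (my addition, not in the original post), R's built-in lm() reproduces the same slope, intercept, and R² on this data:

fit = lm(y ~ x)           # least-squares fit with the built-in function
coef(fit)                 # intercept ≈ -15.5, slope ≈ 5.3
summary(fit)$r.squared    # ≈ 0.998, matching ssr/sst above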
The final equation is f(x) = 5.3x − 15.5.
R² is 0.998, i.e., 99.8% of the variance in y is accounted for by the model, so x explains y extremely well.