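Simple linear regression fits the linear equation y = wx + b to produce predictions. The values of w and b are the ones that minimize the gap between the predicted values and the true values.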
Formula: $J(w, b) = \sum_i (y_i - y_i^{pre})^2 = \sum_i (y_i - w x_i - b)^2$, where w and b are chosen so that J(w, b) takes its minimum value.
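For a single feature, minimizing J(w, b) has a well-known closed-form solution: w is the covariance of x and y divided by the variance of x, and b is y_mean - w * x_mean. The sketch below shows this in plain Python. The function name fit_simple_linear and the toy data are made up for illustration and are not from the original post.

```python
def fit_simple_linear(xs, ys):
    """Return (w, b) minimizing sum((y - (w * x + b)) ** 2) for one feature."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Slope: sum of co-deviations of x and y over the sum of squared deviations of x.
    num = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    den = sum((x - x_mean) ** 2 for x in xs)
    w = num / den
    # Intercept: the fitted line passes through (x_mean, y_mean).
    b = y_mean - w * x_mean
    return w, b


# Toy data lying exactly on y = 2x + 1 (illustrative only).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.0, 5.0, 7.0, 9.0, 11.0]
w, b = fit_simple_linear(xs, ys)
print(w, b)
```

Because the toy points lie exactly on a line, the fit recovers w = 2 and b = 1. With noisy data the same formula returns the line with the smallest sum of squared errors.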
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split
# Import the linear regression estimator
from sklearn.linear_model import LinearRegression

# Note: load_boston was removed in scikit-learn 1.2, so this needs an older release
boston = datasets.load_boston()
a = boston.data
y = boston.target
# Keep only the samples whose target is below 50.0
a = a[y < 50.0]
y = y[y < 50.0]

X_train, X_test, y_train, y_test = train_test_split(a, y, random_state=666)
lin_reg = LinearRegression()
lin_reg.fit(X_train, y_train)

# Weight coefficients
lin_reg.coef_
# Intercept
lin_reg.intercept_
# R2 score on the test set
lin_reg.score(X_test, y_test)
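Predictions are usually most accurate when J(w, b) is smallest, but the value of J also grows with the number of samples. For example, a total error of 1000 over 10,000 samples (0.1 per sample) reflects better predictions than a total error of 500 over 100 samples (5 per sample), so the influence of the sample count has to be removed.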
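The comparison comes down to dividing each total error by its sample count. The short sketch below assumes the two figures in the text are totals summed over all samples; the dictionary keys are illustrative labels.

```python
# (total error, sample count) for the two cases in the text.
cases = {
    "10000 samples": (1000.0, 10000),
    "100 samples": (500.0, 100),
}

# Dividing by the sample count removes its influence on the error.
per_sample = {name: total / m for name, (total, m) in cases.items()}

for name, err in per_sample.items():
    print(name, err)
```

The larger data set has the larger total error but the smaller per-sample error, which is why a size-independent metric is needed.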
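Formula: $R^2 = 1 - \frac{\sum_i (y_i - y_i^{pre})^2}{\sum_i (y_{mean} - y_i)^2}$
Numerator: the error produced by the custom model's predictions.
Denominator: the error produced by the baseline model that always predicts y = y_mean.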
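The formula expresses, as a fraction, how much of the baseline model's error the custom model eliminates.
$R^2 \le 1$, and the larger the value, the better the custom model predicts.
$R^2 < 0$ indicates that the custom model predicts very poorly, worse than the baseline model; the relationship in the data is most likely not linear.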
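Dividing both the numerator and the denominator of $R^2$ by the number of samples m gives: $R^2 = 1 - \frac{MSE(y^{pre}, y)}{Var(y)}$ (Var is the variance).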
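To make the two forms of R^2 concrete, the sketch below computes it once from the sums of squares and once as 1 - MSE / Var(y), using plain Python. The function names and the toy values are hypothetical, and the variance here is the population variance (divided by m), which is what the derivation above requires.

```python
def r2_from_sums(y_true, y_pred):
    """R^2 = 1 - sum((y - y_pred)^2) / sum((y_mean - y)^2)."""
    y_mean = sum(y_true) / len(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y_mean - y) ** 2 for y in y_true)
    return 1 - ss_res / ss_tot


def r2_from_mse(y_true, y_pred):
    """R^2 = 1 - MSE / Var(y), with both terms divided by the sample count m."""
    m = len(y_true)
    y_mean = sum(y_true) / m
    mse = sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / m
    var = sum((y - y_mean) ** 2 for y in y_true) / m
    return 1 - mse / var


y_true = [3.0, 5.0, 7.0, 9.0]
good = [2.5, 5.5, 7.0, 9.5]                             # close to the true values
baseline = [sum(y_true) / len(y_true)] * len(y_true)    # always predicts the mean
bad = [9.0, 7.0, 5.0, 3.0]                              # worse than predicting the mean

for name, pred in [("good", good), ("baseline", baseline), ("bad", bad)]:
    print(name, r2_from_sums(y_true, pred), r2_from_mse(y_true, pred))
```

On this toy data the close predictions score 0.9625, the mean-only baseline scores exactly 0, and the reversed predictions score -3, matching the three cases described above.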
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split
# Import the linear regression estimator
from sklearn.linear_model import LinearRegression
# Import mean squared error (MSE)
from sklearn.metrics import mean_squared_error
# Import mean absolute error (MAE)
from sklearn.metrics import mean_absolute_error
# Import the R2 score
from sklearn.metrics import r2_score

# Note: load_boston was removed in scikit-learn 1.2, so this needs an older release
boston = datasets.load_boston()
a = boston.data
y = boston.target
# Keep only the samples whose target is below 50.0
a = a[y < 50.0]
y = y[y < 50.0]

X_train, X_test, y_train, y_test = train_test_split(a, y, random_state=666)
lin_reg = LinearRegression()
lin_reg.fit(X_train, y_train)

# Predictions on the held-out test set
y_calculate = lin_reg.predict(X_test)

MSE = mean_squared_error(y_test, y_calculate)
MAE = mean_absolute_error(y_test, y_calculate)
# RMSE
RMSE = np.sqrt(MSE)
# R2
r2_score(y_test, y_calculate)
When each sample has n features: $y = b + w_1 x_1 + w_2 x_2 + \dots + w_n x_n$
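The sign of each coefficient w indicates whether that feature is positively or negatively correlated with the target, and its magnitude indicates the strength of the relationship (magnitudes are comparable across features only when the features are on similar scales).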
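As a rough illustration of reading coefficient signs and magnitudes, the sketch below fits a multi-feature linear model to synthetic data with NumPy's least-squares solver. The data, the true weights, and all variable names are assumptions made up for this example; it is not the Boston data used earlier, and it uses np.linalg.lstsq in place of LinearRegression to stay dependency-light.

```python
import numpy as np

rng = np.random.default_rng(0)
m = 200                                  # number of samples
X = rng.normal(size=(m, 3))              # three synthetic features on similar scales
true_w = np.array([2.0, -3.0, 0.5])      # assumed "true" weights for the demo
true_b = 4.0
y = X @ true_w + true_b + rng.normal(scale=0.1, size=m)  # small noise added

# Append a column of ones so the intercept b is estimated along with the weights.
X_b = np.hstack([X, np.ones((m, 1))])
theta, *_ = np.linalg.lstsq(X_b, y, rcond=None)
w, b = theta[:3], theta[3]

for i, w_i in enumerate(w, start=1):
    direction = "positive" if w_i > 0 else "negative"
    print(f"w{i} = {w_i:.3f} ({direction} relationship with y)")
print(f"b = {b:.3f}")
```

The recovered weights should land close to the assumed ones: the second feature has the largest magnitude and a negative sign, so it pulls the prediction down most strongly per unit change.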