Simple linear regression is the most basic regression model: there is a single independent variable, the fitted function is a straight line, the dependent variable is continuous, and the independent variable may be either continuous or discrete. The function has the form:

y = β0 + β1x + ε
Here y is the dependent variable, x is the independent variable, β0 is the intercept, β1 is the slope coefficient, and ε is the error term, which is included so that the resulting model is more accurate.
Linear regression is usually fit with the method of least squares, which solves the model for β0 and β1. The method works as follows.
The four data points (x, y) in the figure are (1,6), (2,5), (3,7), (4,10). We need to find the blue line that fits the red data points. We assume these four points follow the line y = β0 + β1x, so we want the best fit, i.e. the most suitable β0 and β1.
The method of least squares minimizes the sum of the squared residuals, i.e. the squared vertical distances between the data points and the line; the line that makes this sum smallest is the best fit.
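Concretely, the quantity to minimize is the residual sum of squares, viewed as a function of β0 and β1:

$$Q(\beta_0, \beta_1) = \sum_{i=1}^{n} \left( y_i - \beta_0 - \beta_1 x_i \right)^2$$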
Then we take the partial derivatives of Q with respect to both β0 and β1.
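Setting both partial derivatives to zero gives the normal equations:

$$\frac{\partial Q}{\partial \beta_0} = -2\sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i) = 0, \qquad \frac{\partial Q}{\partial \beta_1} = -2\sum_{i=1}^{n} x_i\,(y_i - \beta_0 - \beta_1 x_i) = 0$$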
From this system of equations we can easily solve for β0 and β1.
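In closed form, with x̄ and ȳ denoting the sample means, the solution is:

$$\hat{\beta}_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}, \qquad \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$$

For the four points above, x̄ = 2.5 and ȳ = 7, so β1 = 7/5 = 1.4 and β0 = 7 − 1.4 × 2.5 = 3.5.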
So we obtain the line y = 3.5 + 1.4x.
In Python and Java the computation looks like this:
# -*- coding: utf-8 -*-
"""
Created on Thu Dec 01 00:02:49 2016

@author: steve
"""

def SLR(x, y):
    n = len(x)
    sumx = 0.0
    sumy = 0.0
    sumx2 = 0.0
    # First pass: accumulate the sums needed for the means
    for i in range(n):
        sumx += x[i]
        sumy += y[i]
        sumx2 += x[i] * x[i]
    xbar = sumx / n
    ybar = sumy / n

    # Second pass: centered sums of squares and cross-products
    xxbar = 0.0
    yybar = 0.0
    xybar = 0.0
    for i in range(n):
        xxbar += (x[i] - xbar) * (x[i] - xbar)
        yybar += (y[i] - ybar) * (y[i] - ybar)
        xybar += (x[i] - xbar) * (y[i] - ybar)

    # Slope and intercept of the least-squares line
    slope = xybar / xxbar
    intercept = ybar - slope * xbar
    print("slope is {}, intercept is {}".format(slope, intercept))
    # I won't write the other statistics here.

x = [1, 2, 3, 4]
y = [6, 5, 7, 10]
SLR(x, y)
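As a quick sanity check, assuming NumPy is available (this snippet is mine, not part of the original code), np.polyfit with degree 1 performs the same least-squares fit, returning the slope first and the intercept second:

import numpy as np

x = [1, 2, 3, 4]
y = [6, 5, 7, 10]
# Degree-1 polynomial fit == simple linear regression
slope, intercept = np.polyfit(x, y, 1)
print(slope, intercept)  # ~1.4 and ~3.5, matching the hand calculation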
package regression;

public class SLR {

    // Simple linear regression fit by ordinary least squares
    private final double intercept, slope;
    private final double r2;
    private final double svar0, svar1;

    public SLR(double[] x, double[] y) {
        if (x.length != y.length) {
            throw new IllegalArgumentException("array lengths don't match!");
        }
        int n = x.length;

        // First pass: means of all x and y values
        double sumx = 0.0;
        double sumy = 0.0;
        double sumx2 = 0.0;
        for (int i = 0; i < n; i++) {
            sumx += x[i];
            sumy += y[i];
            sumx2 += x[i] * x[i];
        }
        double xbar = sumx / n;
        double ybar = sumy / n;

        // Second pass: centered sums of squares and cross-products
        double xxbar = 0.0;
        double yybar = 0.0;
        double xybar = 0.0;
        for (int i = 0; i < n; i++) {
            xxbar += (x[i] - xbar) * (x[i] - xbar);
            yybar += (y[i] - ybar) * (y[i] - ybar);
            xybar += (x[i] - xbar) * (y[i] - ybar);
        }
        // Debug output of the intermediate sums
        System.out.println(xxbar);
        System.out.println(yybar);
        System.out.println(xybar);

        // Slope and intercept from setting the partial derivatives to zero
        slope = xybar / xxbar;
        intercept = ybar - slope * xbar;

        // Additional statistics
        double rss = 0.0;      // residual sum of squares
        double ssr = 0.0;      // regression sum of squares
        for (int i = 0; i < n; i++) {
            double fit = slope * x[i] + intercept;
            rss += (fit - y[i]) * (fit - y[i]);
            ssr += (fit - ybar) * (fit - ybar);
        }
        int degreesOfFreedom = n - 2;
        r2 = ssr / yybar;
        double svar = rss / degreesOfFreedom;
        svar1 = svar / xxbar;
        svar0 = svar / n + xbar * xbar * svar1;
    }

    public double intercept() { return intercept; }

    public double slope() { return slope; }

    public double R2() { return r2; }

    public double slopeStdErr() { return Math.sqrt(svar1); }

    public double interceptStdErr() { return Math.sqrt(svar0); }

    // Prediction method: the learned function, as used in machine learning
    public double predict(double x) {
        return slope * x + intercept;
    }

    public String toString() {
        StringBuilder s = new StringBuilder();
        s.append(String.format("%.2f x + %.2f", slope(), intercept()));
        s.append(" (R^2 = " + String.format("%.3f", R2()) + ")");
        return s.toString();
    }
}
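The extra statistics the Java class reports can be double-checked by hand. A minimal Python sketch of the same quantities for the example data (variable names here are illustrative, chosen to mirror the Java fields):

import math

x = [1, 2, 3, 4]
y = [6, 5, 7, 10]
n = len(x)
xbar = sum(x) / n
ybar = sum(y) / n
xxbar = sum((xi - xbar) ** 2 for xi in x)
yybar = sum((yi - ybar) ** 2 for yi in y)
xybar = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))

slope = xybar / xxbar            # 1.4
intercept = ybar - slope * xbar  # 3.5

fits = [slope * xi + intercept for xi in x]
rss = sum((f - yi) ** 2 for f, yi in zip(fits, y))  # residual sum of squares
ssr = sum((f - ybar) ** 2 for f in fits)            # regression sum of squares

r2 = ssr / yybar          # coefficient of determination: 0.7 for this data
svar = rss / (n - 2)      # residual variance estimate
slope_stderr = math.sqrt(svar / xxbar)
intercept_stderr = math.sqrt(svar / n + xbar * xbar * svar / xxbar)
print(r2, slope_stderr, intercept_stderr)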
This is part of a series of posts summarizing regression, and each one will come with code. I referenced a few papers; this is just a start, and I'll polish it bit by bit. My programming is embarrassingly rusty, just look at what I did to that Python... I guess getting old really doesn't suit writing code!