1. Difference from simple linear regression
There are multiple independent variables (x) instead of one.
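2. Multiple regression model
$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p + \varepsilon$$
where $\beta_0, \beta_1, \ldots, \beta_p$ are the parameters,
$\varepsilon$ is the error term,
and $\beta_0$ is the intercept.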
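To make the roles of the parameters and the error term concrete, the sketch below generates data from this model with p = 2, using hypothetical parameter values ($\beta_0 = -0.9$, $\beta_1 = 0.06$, $\beta_2 = 0.9$) and normally distributed errors with a hypothetical $\sigma = 0.3$; none of these numbers come from the text. Fitting the simulated data by least squares (`np.linalg.lstsq`) should approximately recover the parameters.

```python
import numpy as np

# Hypothetical true parameters (illustration only, not from the text)
beta = np.array([-0.9, 0.06, 0.9])  # beta0 (intercept), beta1, beta2
sigma = 0.3                         # hypothetical std. dev. of the error term

rng = np.random.default_rng(0)
n = 5000
x1 = rng.uniform(50, 100, n)     # e.g. miles
x2 = rng.integers(2, 5, n)       # e.g. deliveries: 2, 3 or 4
eps = rng.normal(0.0, sigma, n)  # error term: mean 0, constant variance, independent draws

# y = beta0 + beta1*x1 + beta2*x2 + eps
X = np.column_stack([np.ones(n), x1, x2])
y = X @ beta + eps

# Least-squares estimates b0, b1, b2 of beta0, beta1, beta2
b, *_ = np.linalg.lstsq(X, y, rcond=None)
print("true:     ", beta)
print("estimated:", b)
```

The estimates will not match the true values exactly, because each sample of errors is different; that gap is what the point estimates in section 4 have to live with.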
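3. Multiple regression equation
$$E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p$$
4. Estimated multiple regression equation
$$\hat{y} = b_0 + b_1 x_1 + b_2 x_2 + \cdots + b_p x_p$$
A sample is used to compute $b_0, b_1, \ldots, b_p$ as point estimates of $\beta_0, \beta_1, \ldots, \beta_p$.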
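5. Estimation process (similar to simple linear regression)
6. Estimation method
Least squares: choose $b_0, b_1, \ldots, b_p$ to minimize the sum of squared residuals
$$\min_{b_0, \ldots, b_p} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$
The calculation is similar to simple linear regression, but it involves linear algebra and matrix operations.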
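In matrix form, with a design matrix $X$ whose first column is all ones, the least-squares estimates solve the normal equations $X^\top X\, b = X^\top y$, i.e. $b = (X^\top X)^{-1} X^\top y$ when $X^\top X$ is invertible. A minimal NumPy sketch, using the delivery data from the example in section 7 typed in directly rather than read from a file:

```python
import numpy as np

# Delivery data (section 7): miles, number of deliveries, total time in hours
miles = np.array([100, 50, 100, 100, 50, 80, 75, 65, 90, 90], dtype=float)
deliveries = np.array([4, 3, 4, 2, 2, 2, 3, 4, 3, 2], dtype=float)
time = np.array([9.3, 4.8, 8.9, 6.5, 4.2, 6.2, 7.4, 6.0, 7.6, 6.1])

# Design matrix: a column of ones for b0, then one column per independent variable
X = np.column_stack([np.ones_like(miles), miles, deliveries])

# Solve the normal equations (X^T X) b = X^T y instead of forming the inverse explicitly
b = np.linalg.solve(X.T @ X, X.T @ time)
print(b)  # b0 ≈ -0.8687, b1 ≈ 0.0611, b2 ≈ 0.9234
```

Solving the linear system is numerically preferable to computing $(X^\top X)^{-1}$; the result matches the coefficients that scikit-learn reports in section 7.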
7. Example
A delivery company: x1: miles traveled, x2: number of deliveries, y: total travel time (hours).
| # | x1: Miles | x2: Deliveries | y: Time (hours) |
|---|---|---|---|
| 1 | 100 | 4 | 9.3 |
| 2 | 50 | 3 | 4.8 |
| 3 | 100 | 4 | 8.9 |
| 4 | 100 | 2 | 6.5 |
| 5 | 50 | 2 | 4.2 |
| 6 | 80 | 2 | 6.2 |
| 7 | 75 | 3 | 7.4 |
| 8 | 65 | 4 | 6.0 |
| 9 | 90 | 3 | 7.6 |
| 10 | 90 | 2 | 6.1 |
Model: Time = b0 + b1*Miles + b2*Deliveries
Fitted model: Time = -0.869 + 0.0611*Miles + 0.923*Deliveries
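Interpretation of the parameters:
b0: the intercept, the expected value of y when all the x's are 0.
b1: with the number of deliveries held constant, each additional mile increases travel time by 0.0611 hours on average.
b2: with the miles traveled held constant, each additional delivery increases travel time by 0.923 hours on average.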
Prediction: if a delivery run covers 102 miles with 6 deliveries, how many hours is it expected to take?
Time = -0.869 + 0.0611*102 + 0.923*6
     = 10.9 hours
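The hand calculation can be checked in a few lines. This sketch uses the rounded coefficients of the fitted model; the helper name `predict_time` is hypothetical. It also illustrates the interpretation of b1: with the number of deliveries held fixed, one extra mile changes the prediction by exactly b1.

```python
# Fitted model with the rounded coefficients from the text
b0, b1, b2 = -0.869, 0.0611, 0.923

def predict_time(miles, deliveries):
    """Hypothetical helper: predicted total time (hours) from the fitted equation."""
    return b0 + b1 * miles + b2 * deliveries

t = predict_time(102, 6)
print(round(t, 1))  # 10.9

# Holding deliveries fixed, one extra mile adds b1 hours
extra_mile = predict_time(103, 6) - predict_time(102, 6)
print(round(extra_mile, 4))  # 0.0611
```

With the rounded coefficients the unrounded result is 10.9012; the scikit-learn output in the code section, which uses full-precision coefficients, gives 10.9076. The difference comes only from rounding.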
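Code implementation:
Data source: `Delivery.csv`, containing the Miles, Deliveries and Time columns of the table above, with no header row (as the printed data in the output shows).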
```python
# -*- coding:utf-8 -*-
from numpy import genfromtxt
from sklearn import linear_model

# Delivery.csv columns: Miles, Deliveries, Time (no header row)
dataPath = r"Delivery.csv"
deliveryData = genfromtxt(dataPath, delimiter=',')
print("data")
print(deliveryData)

x = deliveryData[:, :-1]  # independent variables: Miles, Deliveries
y = deliveryData[:, -1]   # dependent variable: Time
print("x")
print(x)
print("y")
print(y)

lr = linear_model.LinearRegression()
lr.fit(x, y)
print("coefficients:")
print(lr.coef_)       # b1, b2
print("intercept:")
print(lr.intercept_)  # b0

# predict() expects a 2-D array: one row per sample
xpredict = [[102, 6]]
ypredict = lr.predict(xpredict)
print("predict:")
print(ypredict)
```
Output:
```
data
[[ 100. 4. 9.3]
[ 50. 3. 4.8]
[ 100. 4. 8.9]
[ 100. 2. 6.5]
[ 50. 2. 4.2]
[ 80. 2. 6.2]
[ 75. 3. 7.4]
[ 65. 4. 6. ]
[ 90. 3. 7.6]
[ 90. 2. 6.1]]
x
[[ 100. 4.]
[ 50. 3.]
[ 100. 4.]
[ 100. 2.]
[ 50. 2.]
[ 80. 2.]
[ 75. 3.]
[ 65. 4.]
[ 90. 3.]
[ 90. 2.]]
y
[ 9.3 4.8 8.9 6.5 4.2 6.2 7.4 6. 7.6 6.1]
coefficients:
[ 0.0611346 0.92342537]
intercept:
-0.868701466782
predict:
[ 10.90757981]
```
8. How should categorical independent variables (categorical data) be handled?
| # | Miles | Deliveries | Vehicle type | Time (hours) |
|---|---|---|---|---|
| 1 | 100 | 4 | 1 | 9.3 |
| 2 | 50 | 3 | 0 | 4.8 |
| 3 | 100 | 4 | 1 | 8.9 |
| 4 | 100 | 2 | 2 | 6.5 |
| 5 | 50 | 2 | 2 | 4.2 |
| 6 | 80 | 2 | 1 | 6.2 |
| 7 | 75 | 3 | 1 | 7.4 |
| 8 | 65 | 4 | 0 | 6.0 |
| 9 | 90 | 3 | 0 | 7.6 |
| 10 | 90 | 2 | 2 | 6.1 |
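Vehicle type is a category label, so feeding 0, 1, 2 in as numbers would impose an order and equal spacing that the categories do not have. The usual remedy is dummy (one-hot) encoding: one 0/1 column per category. The sketch below reproduces, with NumPy only, the three dummy columns that appear in `Delivery_Dummy.csv`; the column order (type 0, type 1, type 2) is inferred from the printed data in the output, not stated by the source.

```python
import numpy as np

# Vehicle type for each of the 10 deliveries in the table above
vehicle_type = np.array([1, 0, 1, 2, 2, 1, 1, 0, 0, 2])

# One 0/1 column per category: column k is 1 exactly where vehicle_type == k
n_categories = 3
dummies = np.eye(n_categories)[vehicle_type]
print(dummies)
```

Indexing the identity matrix by the category codes picks out one unit row per observation, which is exactly a one-hot row.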
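Code implementation:
Data source: `Delivery_Dummy.csv`. Here the vehicle type column is replaced by three 0/1 dummy columns (type 0 → 1 0 0, type 1 → 0 1 0, type 2 → 0 0 1), and the first row is a header, which `genfromtxt` reads as `nan` (see the first row of the printed data).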
```python
# -*- coding:utf-8 -*-
from numpy import genfromtxt
from sklearn import linear_model

datapath = r"Delivery_Dummy.csv"
data = genfromtxt(datapath, delimiter=",")
print("data:")
print(data)

# Row 0 is the header (parsed as nan), so skip it
x = data[1:, :-1]  # Miles, Deliveries and the three vehicle-type dummies
y = data[1:, -1]   # Time
print("x:")
print(x)
print("y:")
print(y)

mlr = linear_model.LinearRegression()
mlr.fit(x, y)
print(mlr)
print("coef:")
print(mlr.coef_)
print("intercept:")
print(mlr.intercept_)

# 90 miles, 2 deliveries, vehicle type 2 (dummies 0, 0, 1); predict() needs a 2-D array
xpredict = [[90, 2, 0, 0, 1]]
ypredict = mlr.predict(xpredict)
print("predict:")
print(ypredict)
```
Output:
```
data:
[[ nan nan nan nan nan nan]
[ 100. 4. 0. 1. 0. 9.3]
[ 50. 3. 1. 0. 0. 4.8]
[ 100. 4. 0. 1. 0. 8.9]
[ 100. 2. 0. 0. 1. 6.5]
[ 50. 2. 0. 0. 1. 4.2]
[ 80. 2. 0. 1. 0. 6.2]
[ 75. 3. 0. 1. 0. 7.4]
[ 65. 4. 1. 0. 0. 6. ]
[ 90. 3. 1. 0. 0. 7.6]
[ 90. 2. 0. 0. 1. 6.1]]
x:
[[ 100. 4. 0. 1. 0.]
[ 50. 3. 1. 0. 0.]
[ 100. 4. 0. 1. 0.]
[ 100. 2. 0. 0. 1.]
[ 50. 2. 0. 0. 1.]
[ 80. 2. 0. 1. 0.]
[ 75. 3. 0. 1. 0.]
[ 65. 4. 1. 0. 0.]
[ 90. 3. 1. 0. 0.]
[ 90. 2. 0. 0. 1.]]
y:
[ 9.3 4.8 8.9 6.5 4.2 6.2 7.4 6. 7.6 6.1]
coef:
[ 0.05520428 0.6952821 -0.16572633 0.58179313 -0.4160668 ]
intercept:
0.209160181582
predict:
[ 6.1520428]
```
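One caveat about this encoding: in every row the three dummy columns sum to 1, which duplicates the intercept column, so the design matrix does not have full column rank and the individual dummy coefficients above are not uniquely determined (the so-called dummy variable trap). The fitted values and predictions are still unique. A common convention is to keep only k-1 dummies for k categories, so the omitted category becomes the baseline absorbed into the intercept. The NumPy sketch below is not part of the original code; it fits both versions with `np.linalg.lstsq` and checks that they predict the same time for the query above (90 miles, 2 deliveries, vehicle type 2).

```python
import numpy as np

miles = np.array([100, 50, 100, 100, 50, 80, 75, 65, 90, 90], dtype=float)
deliveries = np.array([4, 3, 4, 2, 2, 2, 3, 4, 3, 2], dtype=float)
vehicle_type = np.array([1, 0, 1, 2, 2, 1, 1, 0, 0, 2])
time = np.array([9.3, 4.8, 8.9, 6.5, 4.2, 6.2, 7.4, 6.0, 7.6, 6.1])

ones = np.ones_like(miles)
dummies = np.eye(3)[vehicle_type]  # columns: type 0, type 1, type 2

# Intercept plus all three dummies: rank-deficient, lstsq returns the minimum-norm solution
X_full = np.column_stack([ones, miles, deliveries, dummies])
b_full, *_ = np.linalg.lstsq(X_full, time, rcond=None)

# Drop the type-0 dummy: type 0 becomes the baseline absorbed into the intercept
X_reduced = np.column_stack([ones, miles, deliveries, dummies[:, 1:]])
b_reduced, *_ = np.linalg.lstsq(X_reduced, time, rcond=None)

# Query: 90 miles, 2 deliveries, vehicle type 2
p_full = np.array([1, 90, 2, 0, 0, 1], dtype=float) @ b_full
p_reduced = np.array([1, 90, 2, 0, 1], dtype=float) @ b_reduced
print(p_full, p_reduced)  # the two predictions agree
```

In the reduced model each remaining dummy coefficient has a direct reading: the average time difference between that vehicle type and type 0, holding miles and deliveries fixed.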
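9. Distribution of the error term
The error $\varepsilon$ is a random variable with mean 0.
The variance of $\varepsilon$, denoted $\sigma^2$, is the same for all values of the independent variables.
The values of $\varepsilon$ are independent.
$\varepsilon$ is normally distributed and reflects the deviation of y from its expected value $E(y) = \beta_0 + \beta_1 x_1 + \cdots + \beta_p x_p$.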
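These assumptions cannot be checked directly because $\varepsilon$ is unobserved, but the residuals $e_i = y_i - \hat{y}_i$ serve as its sample counterpart, and $\sigma^2$ is conventionally estimated by $s^2 = \mathrm{SSE}/(n - p - 1)$, where p is the number of independent variables. A minimal NumPy sketch for the delivery example (p = 2), refitting the model with `np.linalg.lstsq`. Note that when the model has an intercept the residuals average to zero by construction, so that part is a property of least squares rather than evidence for the mean-zero assumption.

```python
import numpy as np

miles = np.array([100, 50, 100, 100, 50, 80, 75, 65, 90, 90], dtype=float)
deliveries = np.array([4, 3, 4, 2, 2, 2, 3, 4, 3, 2], dtype=float)
time = np.array([9.3, 4.8, 8.9, 6.5, 4.2, 6.2, 7.4, 6.0, 7.6, 6.1])

X = np.column_stack([np.ones_like(miles), miles, deliveries])
b, *_ = np.linalg.lstsq(X, time, rcond=None)

residuals = time - X @ b          # e_i = y_i - y_hat_i
n, p = len(time), X.shape[1] - 1  # p = number of independent variables
sse = np.sum(residuals ** 2)
s2 = sse / (n - p - 1)            # estimate of the error variance sigma^2

print("mean residual:", residuals.mean())  # ~0 because the model has an intercept
print("s^2:", s2)
```

Plotting the residuals against each x or against $\hat{y}$ is the usual next step for spotting non-constant variance or a non-linear pattern.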