Fitting a Straight Line with Least Squares in Python

Date: 2019-12-04

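Linear Regression

Linear regression is one of the most common modeling approaches in regression analysis. When the dependent variable is continuous, the independent variable is continuous or discrete, and the relationship between the two can be approximated by a straight line, the analysis is called simple (univariate) linear regression.
The model is the equation y = mx + c, where y is the outcome, x is the feature, m is the coefficient, and c is the bias. In mathematical terms, m is the slope and c is the intercept.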

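Least Squares

The least squares method finds the optimum of an objective function: it looks for the best match by minimizing the sum of squared errors, hence its name. Here it is used to obtain the optimal solution of the linear regression.
For the derivation of the least squares method, see the blog post "最小二乘法" (Least Squares).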
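
For reference, the objective and its closed-form solution for the line y = mx + c can be written as below. The notation is introduced here for illustration only: n data points (x_i, y_i), with an overbar denoting a sample mean. The result agrees with the mean-based formulas computed later in this post.

```latex
\begin{align}
S(m, c) &= \sum_{i=1}^{n} \left( y_i - m x_i - c \right)^2 \\
m &= \frac{\overline{xy} - \bar{x}\,\bar{y}}{\overline{x^2} - \bar{x}^2} \\
c &= \bar{y} - m\,\bar{x}
\end{align}
```

Setting the partial derivatives of S with respect to m and c to zero and solving the two resulting equations gives the expressions for m and c.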

Processing the Data with pandas

Import pandas and matplotlib

import pandas as pd
import matplotlib.pyplot as plt

# Jupyter plotting configuration

plt.style.use('ggplot')
%config InlineBackend.figure_format = 'retina'
%matplotlib inline

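Load a few sets of test data describing the relationship between length and width

The data is not quite clean, so the next step tidies it up with pandas:

Rename the column
Reset the index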

df = df.rename(columns={'Unnamed: 0':'0'})
df = df.set_index(keys=['0'])
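
The two tidy-up steps can be tried on a small stand-in table. The sketch below is only an illustration: the values and the sample column names 'a', 'b', 'c' are hypothetical, and it assumes the first column arrived unnamed, which pandas labels 'Unnamed: 0' when a CSV header cell is empty.

```python
import pandas as pd

# Hypothetical raw table: the first column holds the row labels but has no name.
raw = pd.DataFrame({
    'Unnamed: 0': ['長度', '寬度'],
    'a': [100.0, 40.0],
    'b': [150.0, 62.0],
    'c': [200.0, 81.0],
})

# Rename the unnamed column, then use it as the index.
tidy = raw.rename(columns={'Unnamed: 0': '0'}).set_index(keys=['0'])

# Each measurement is now a row that can be selected by its label.
length = tidy.loc['長度']
width = tidy.loc['寬度']
print(tidy)
```

After this, `tidy.loc[...]` returns a one-dimensional Series per measurement, which is the form the scatter plot and the mean-based formulas work with.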

To analyze the linear relationship between length and width, extract each as a one-dimensional series

xcord = df.loc['長度']
ycord = df.loc['寬度']

plt.scatter(xcord,ycord,s=30,c='red',marker='s')

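The scatter of width against length shows a roughly linear relationship, so next we fit this line with the least squares method

Computing the Least Squares Fit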

## mean of x*y
(xcord*ycord).mean()

## mean of x times mean of y
xcord.mean()* ycord.mean()

## mean of x squared
pow(xcord,2).mean()

## square of the mean of x
pow(xcord.mean(),2)

# numerator of m: mean of x*y minus (mean of x times mean of y)
# denominator of m: mean of x squared minus the square of the mean of x

m = ((xcord*ycord).mean() - xcord.mean()* ycord.mean())/(pow(xcord,2).mean()-pow(xcord.mean(),2))

# c equals the mean of y minus m times the mean of x
c = ycord.mean() - m*xcord.mean()

# Plot the data points and the fitted line
import numpy as np  # needed for np.arange below

plt.scatter(xcord,ycord,s=30,c='red',marker='s')
x=np.arange(90.0,250.0,0.1)
y=m*x+c
plt.plot(x,y)
plt.show()
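
The mean-based formulas above can be wrapped in a small function and checked against an independent fit. This is a minimal sketch under stated assumptions: `fit_line` is a hypothetical helper name, the data points are made up, and NumPy's `np.polyfit` with degree 1 serves only as a cross-check.

```python
import numpy as np

def fit_line(x, y):
    """Return slope m and intercept c of the least-squares line y = m*x + c."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Same formulas as above: differences of means in numerator and denominator.
    m = ((x * y).mean() - x.mean() * y.mean()) / ((x ** 2).mean() - x.mean() ** 2)
    c = y.mean() - m * x.mean()
    return m, c

# Hypothetical data lying close to a straight line.
x = np.array([100.0, 130.0, 160.0, 190.0, 220.0])
y = np.array([41.2, 52.8, 65.1, 77.3, 88.9])

m, c = fit_line(x, y)

# np.polyfit returns coefficients from the highest power down: [slope, intercept].
m_ref, c_ref = np.polyfit(x, y, 1)
print(m, c)
print(m_ref, c_ref)
```

Both approaches minimize the same sum of squared errors, so the two printed pairs should agree up to floating-point rounding.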

Modeling in Python

Process the data, compute the correlation matrix, and extract the feature and the label

df = pd.read_csv('./zuixiaoerchengfa.csv',encoding='gbk')
df.rename(columns={"Unnamed: 0":""},inplace=True)
df.set_index(keys="",inplace=True)
df_new = df.T
df_new.corr()

xcord = df_new['長度']
ycord = df_new['寬度']
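
To see what the transpose and `corr()` steps produce, here is a sketch on a small stand-in table. The values and the sample names 's1' to 's4' are hypothetical, and the layout (one row per measurement, one column per sample) is an assumption based on the code above.

```python
import numpy as np
import pandas as pd

# Hypothetical table: one row per measurement, one column per sample.
df = pd.DataFrame(
    [[100.0, 150.0, 200.0, 250.0],
     [40.0, 62.0, 81.0, 99.0]],
    index=['長度', '寬度'],
    columns=['s1', 's2', 's3', 's4'],
)

# Transposing puts each measurement in its own column, so corr() compares
# length against width rather than sample against sample.
df_new = df.T
corr = df_new.corr()  # Pearson correlation by default

r = corr.loc['長度', '寬度']
print(corr)
```

A coefficient close to 1 indicates a strong positive linear relationship, which is what makes a straight-line model a reasonable choice.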

Use sklearn to split the data into a training set and a test set

from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt

plt.style.use('ggplot')
%config InlineBackend.figure_format = 'retina'
%matplotlib inline

# Split training and test data 80/20
x_train,x_test,y_train,y_test = train_test_split(xcord, ycord, train_size = 0.8, test_size = 0.2)

# The plot shows that a simple linear regression model suits these two variables
plt.scatter(x_train,y_train,c = 'g')
plt.xlabel("L")
plt.ylabel("H")
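
Because `train_test_split` shuffles the data, the split above differs from run to run. If a repeatable split is wanted, the `random_state` parameter can be fixed. The sketch below uses hypothetical data, and the seed value 0 is arbitrary.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical data: 10 length/width pairs on an exact line.
x = np.arange(10, dtype=float)
y = 0.4 * x + 1.0

# A fixed random_state makes the shuffle, and therefore the split, repeatable.
split_a = train_test_split(x, y, train_size=0.8, test_size=0.2, random_state=0)
split_b = train_test_split(x, y, train_size=0.8, test_size=0.2, random_state=0)

x_train, x_test, y_train, y_test = split_a
print(len(x_train), len(x_test))
```

With 10 samples, the 80/20 split yields 8 training points and 2 test points, and both calls return identical arrays.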

Building the Linear Regression Model

from sklearn.linear_model import LinearRegression

model = LinearRegression()

# model.fit and model.score expect 2-D arrays, so reshape the data
x_train = x_train.values.reshape(-1,1)
y_train = y_train.values.reshape(-1,1)

model.fit(x_train,y_train)

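# Extract the fitted least-squares equation

# y = mx + c
# With a 2-D target, intercept_ has shape (1,) and coef_ has shape (1, 1),
# so index the single element before converting to float.
c = model.intercept_[0]
m = model.coef_[0][0]

c = round(float(c),2)
m = round(float(m),2)
print("Least-squares equation : y = {} + {}x".format(c,m))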
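
The shapes of `coef_` and `intercept_` depend on whether the target passed to `fit` is one- or two-dimensional. The following sketch, on hypothetical data, shows both cases side by side.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical training data lying close to a straight line.
x = np.array([100.0, 130.0, 160.0, 190.0, 220.0])
y = np.array([41.2, 52.8, 65.1, 77.3, 88.9])

# 2-D target, as produced by reshape(-1, 1).
model_2d = LinearRegression().fit(x.reshape(-1, 1), y.reshape(-1, 1))
# 1-D target, which scikit-learn also accepts.
model_1d = LinearRegression().fit(x.reshape(-1, 1), y)

print(model_2d.coef_.shape, model_2d.intercept_.shape)
print(model_1d.coef_.shape, np.ndim(model_1d.intercept_))

# Pull plain Python floats out of the 2-D model.
m = float(model_2d.coef_[0, 0])
c = float(model_2d.intercept_[0])
```

Only the feature matrix has to be two-dimensional. Leaving the target one-dimensional gives a scalar intercept and a flat coefficient array, which can be simpler to work with.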

Evaluating the Model

x_test = x_test.values.reshape(-1,1)
y_test = y_test.values.reshape(-1,1)
model.score(x_test,y_test)
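
For a regressor, `model.score` returns the coefficient of determination R². As a sketch of what that number means, it can be recomputed by hand. The data below is hypothetical, and the model is scored on the same points it was fitted on purely to keep the example short.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical data lying close to a straight line.
x = np.array([100.0, 130.0, 160.0, 190.0, 220.0, 250.0]).reshape(-1, 1)
y = np.array([41.2, 52.8, 65.1, 77.3, 88.9, 101.5])

model = LinearRegression().fit(x, y)
pred = model.predict(x)

# R^2 = 1 - (residual sum of squares) / (total sum of squares)
ss_res = ((y - pred) ** 2).sum()
ss_tot = ((y - y.mean()) ** 2).sum()
r2_manual = 1.0 - ss_res / ss_tot

r2_sklearn = model.score(x, y)
print(r2_manual, r2_sklearn)
```

A value near 1 means the line explains almost all of the variance in y. On held-out test data the score can be lower and may even be negative for a poor fit.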

Visualization

Plot the results to get a feel for how well the model fits

x_train_result = model.predict(x_train)

plt.scatter(xcord,ycord,c = 'r', label = "source data")
plt.scatter(x_train,y_train, c = 'b',label = "train data")
plt.scatter(x_test,y_test,c = 'g',label = "test data")

plt.xlabel("L")
plt.ylabel("H")
plt.legend(loc="upper left")
plt.plot(x_train,x_train_result)
