02-09 Log-Linear Regression (Boston House Price Prediction)

Date: 2019-11-10




1. Import modules

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.font_manager import FontProperties
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
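# Jupyter magic: render plots inline in the notebook
%matplotlib inline
# Font used for the plot text (macOS path; adjust for your system)
font = FontProperties(fname='/Library/Fonts/Heiti.ttc')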

2. Load the data

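In "Code: Ordinary Linear Regression" we saw that the feature LSTAT has the highest correlation with the label MEDV, but that the relationship between them is not linear. Polynomial regression gave good results, but polynomial terms increase model complexity and make overfitting more likely. Could the feature and the label instead follow a log-linear relationship? That is, assume $y$ and $x$ are related by

\[ \ln(y) = wx + b \]

where $w$ and $b$ are the weight and intercept learned by linear regression. We try log-linear regression below.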
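To make the assumption concrete: a log-linear model has the form ln(y) = wx + b, so fitting a straight line to ln(y) should recover w and b, and the model becomes an exponential curve on the original scale. The following is a minimal sketch on synthetic data. The values `w_true`, `b_true` and the `*_demo` arrays are made up for illustration and are not taken from the housing data.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data (an assumption for illustration, not the housing data):
# ln(y) = w_true * x + b_true + noise
rng = np.random.default_rng(0)
w_true, b_true = -0.05, 3.6
x_demo = rng.uniform(1, 35, size=200)
y_demo = np.exp(w_true * x_demo + b_true + rng.normal(0, 0.05, size=200))

# Fitting a straight line to ln(y) estimates w and b
model = LinearRegression().fit(x_demo[:, np.newaxis], np.log(y_demo))
w_hat, b_hat = model.coef_[0], model.intercept_

# On the original scale the fitted model is the curve exp(w_hat * x + b_hat)
y_hat = np.exp(model.predict(x_demo[:, np.newaxis]))

print('w_hat = {:.3f}, b_hat = {:.3f}'.format(w_hat, b_hat))
```

Because the predictions are passed through `np.exp`, they are always positive, which suits a target such as a price.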

# Raw string so that '\s' is passed to the regex engine unchanged
df = pd.read_csv('housing-data.txt', sep=r'\s+', header=0)
X = df[['LSTAT']].values
y = df['MEDV'].values

# np.log() is the natural logarithm (base e).
# Despite its name, y_sqrt holds ln(y), the log-transformed target.
y_sqrt = np.log(y)
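A short sketch of the properties of the log transform that matter here: `np.log` needs strictly positive inputs, `np.exp` undoes it, and switching to another base only rescales the target by a constant. The array `y_demo` contains hypothetical prices chosen for illustration, not values from the data set.

```python
import numpy as np

# Hypothetical house prices in $1000s (not taken from the data set)
y_demo = np.array([5.0, 21.2, 50.0])

# The logarithm is only defined for strictly positive targets
assert (y_demo > 0).all()

y_ln = np.log(y_demo)      # natural logarithm, base e
y_back = np.exp(y_ln)      # inverse transform, back to the original scale

# A different base only rescales the target: log10(y) = ln(y) / ln(10)
y_log10 = np.log10(y_demo)

print(np.allclose(y_back, y_demo))
print(np.allclose(y_log10, y_ln / np.log(10)))
```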

3. Train the model

# Evenly spaced x values for drawing the fitted line
X_fit = np.arange(X.min(), X.max(), 1)[:, np.newaxis]

lr = LinearRegression()

# Linear regression on the log-transformed target, i.e. ln(y) = wx + b
lr.fit(X, y_sqrt)
lr_predict = lr.predict(X_fit)
# R^2 of the fit, measured on the ln(y) scale
lr_r2 = r2_score(y_sqrt, lr.predict(X))

4. Visualization

plt.scatter(X, y_sqrt, c='gray', edgecolor='white', marker='s', label='Training data')
plt.plot(X_fit, lr_predict, c='r',
         label='Log-linear, $R^2={:.2f}$'.format(lr_r2))

plt.xlabel('% lower status of the population [LSTAT]', fontproperties=font)
plt.ylabel('ln(median house price in USD 1000s [MEDV])', fontproperties=font)
plt.title('Boston house price prediction', fontproperties=font, fontsize=20)
plt.legend(prop=font)
plt.show()

![png](http://www.chenyoude.com/ml/02-09 對數線性迴歸(波士頓房價預測)_9_0.png?x-oss-process=style/watermark)

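The plot shows that log-linear regression also fits the relationship between the feature and the label reasonably well. Here we used only the standard log-linear form; you can also choose a different link function $g(\cdot)$ to model the relationship, which may give good results as well.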
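One possible way to try several choices of $g(\cdot)$ is scikit-learn's `TransformedTargetRegressor`, which applies `func` to the target before fitting and `inverse_func` to the predictions. The sketch below is not the approach used in this article. It runs on synthetic data (the `X_demo` and `y_demo` arrays and the `links` dictionary are assumptions made for illustration) and scores every candidate on the original scale so the results are comparable.

```python
import numpy as np
from sklearn.compose import TransformedTargetRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

# Synthetic, strictly positive data (an assumption, not the housing data)
rng = np.random.default_rng(0)
X_demo = rng.uniform(1, 35, size=(300, 1))
y_demo = np.exp(3.6 - 0.05 * X_demo[:, 0] + rng.normal(0, 0.05, size=300))

# Candidate pairs (g, inverse of g); 'identity' is ordinary linear regression
links = {
    'identity': (None, None),
    'log': (np.log, np.exp),
    'sqrt': (np.sqrt, np.square),
}

scores = {}
for name, (g, g_inv) in links.items():
    model = TransformedTargetRegressor(
        regressor=LinearRegression(), func=g, inverse_func=g_inv)
    model.fit(X_demo, y_demo)
    # predict() applies the inverse of g, so R^2 is on the original scale
    scores[name] = r2_score(y_demo, model.predict(X_demo))

for name, r2 in scores.items():
    print('{}: R^2 = {:.3f}'.format(name, r2))
```

Scoring on the original scale matters: an $R^2$ computed on ln(y) and one computed on y are not directly comparable, so a fair comparison between link functions should transform the predictions back first.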