Logistic Regression 5: Logistic Regression in scikit-learn

Date: 2019-11-19

Logistic regression in scikit-learn

Constructing the dataset

import numpy
import matplotlib.pyplot as plt

numpy.random.seed(666)
X = numpy.random.normal(0,1,size=(200,2))
# The true decision boundary is a quadratic function
y = numpy.array(X[:,0]**2 + X[:,1] < 1.5,dtype='int')
# Set 20 randomly chosen points to class 1 to add noise
for _ in range(20):
    y[numpy.random.randint(200)] = 1

plt.scatter(X[y==0,0],X[y==0,1],color='red')
plt.scatter(X[y==1,0],X[y==1,1],color='blue')
plt.show()

Using scikit-learn's logistic regression:

from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split

def PolynomialLogisticRegression(degree):
    return Pipeline([
        ('poly',PolynomialFeatures(degree=degree)),
        ('stand_scalor',StandardScaler()),
        ('log_reg',LogisticRegression())
    ])
    
x_train,x_test,y_train,y_test = train_test_split(X,y)

With a degree-2 polynomial:

poly_log_reg = PolynomialLogisticRegression(2)
poly_log_reg.fit(x_train,y_train)

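The model's accuracy is 92%.
Plotting the decision boundary (see the previous post for the plotting method):
[Figure: decision boundary of the degree-2 model]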
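The plotting method lives in the previous post and is not reproduced here, so the following is only a minimal sketch of one common way to do it: predict a class for every point on a dense grid and shade the plane accordingly. The helper name `plot_decision_boundary`, the axis range, the step name `std_scaler`, and the fixed `random_state` are assumptions made for illustration, not taken from the original.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Rebuild the noisy quadratic dataset from the top of the post.
np.random.seed(666)
X = np.random.normal(0, 1, size=(200, 2))
y = np.array(X[:, 0] ** 2 + X[:, 1] < 1.5, dtype='int')
for _ in range(20):
    y[np.random.randint(200)] = 1

# random_state is an assumption added here so the split is repeatable.
x_train, x_test, y_train, y_test = train_test_split(X, y, random_state=666)

model = Pipeline([
    ('poly', PolynomialFeatures(degree=2)),
    ('std_scaler', StandardScaler()),
    ('log_reg', LogisticRegression()),
]).fit(x_train, y_train)

# Accuracy on the held-out points.
test_accuracy = model.score(x_test, y_test)
print('test accuracy:', test_accuracy)


def plot_decision_boundary(model, axis, steps=200):
    """Hypothetical helper: shade the plane by the model's predicted class."""
    x0, x1 = np.meshgrid(
        np.linspace(axis[0], axis[1], steps),
        np.linspace(axis[2], axis[3], steps),
    )
    grid = np.c_[x0.ravel(), x1.ravel()]
    labels = model.predict(grid).reshape(x0.shape)
    plt.contourf(x0, x1, labels, alpha=0.3)
    return labels


labels = plot_decision_boundary(model, axis=[-4, 4, -4, 4])
plt.scatter(X[y == 0, 0], X[y == 0, 1], color='red')
plt.scatter(X[y == 1, 0], X[y == 1, 1], color='blue')
plt.show()
```

Because the split here uses its own seed, the printed accuracy need not match the figure quoted in the post.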

With a degree-20 polynomial:
[Figure: decision boundary of the degree-20 model]

As the plots show, the model overfits as the polynomial degree grows.
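Besides looking at the boundary, one rough way to check for overfitting is to compare training and test accuracy at both degrees. The sketch below assumes a fixed `random_state` and a helper called `make_model`, neither of which appears in the original; the exact numbers depend on the split.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Same noisy quadratic dataset as in the post.
np.random.seed(666)
X = np.random.normal(0, 1, size=(200, 2))
y = np.array(X[:, 0] ** 2 + X[:, 1] < 1.5, dtype='int')
for _ in range(20):
    y[np.random.randint(200)] = 1

# random_state is an assumption added for repeatability.
x_train, x_test, y_train, y_test = train_test_split(X, y, random_state=666)


def make_model(degree):
    """Hypothetical helper mirroring PolynomialLogisticRegression(degree)."""
    return Pipeline([
        ('poly', PolynomialFeatures(degree=degree)),
        ('std_scaler', StandardScaler()),
        ('log_reg', LogisticRegression()),
    ])


scores = {}
for degree in (2, 20):
    model = make_model(degree).fit(x_train, y_train)
    scores[degree] = (model.score(x_train, y_train),
                      model.score(x_test, y_test))
    print('degree', degree, 'train/test accuracy:', scores[degree])
```

A training accuracy that rises while the test accuracy stalls or drops at degree 20 would be consistent with the overfitting seen in the plot.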

Changing the model's regularization parameters

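scikit-learn writes the regularized objective as \(C\cdot J(\theta) + L_1\) or \(C\cdot J(\theta) + L_2\), where the coefficient C defaults to 1 and the default penalty is L2.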
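Written out for the binary case, the two objectives take roughly the following form, following the scikit-learn user guide up to notation. Here \(J(\theta)\) is taken to be the log loss summed over the samples (not averaged), with labels coded as \(\pm 1\); the symbols \(m\) and \(\theta_0\) are notation assumed for this sketch.

```latex
\begin{aligned}
\text{L2 (default):}\quad & \min_{\theta,\theta_0}\; C\cdot J(\theta) + \tfrac{1}{2}\lVert\theta\rVert_2^2 \\
\text{L1:}\quad & \min_{\theta,\theta_0}\; C\cdot J(\theta) + \lVert\theta\rVert_1 \\
\text{where}\quad & J(\theta) = \sum_{i=1}^{m} \log\!\left(1 + e^{-y_i\,(x_i^{\top}\theta + \theta_0)}\right),
\qquad y_i \in \{-1, +1\}
\end{aligned}
```

Since C multiplies the loss and not the penalty, a smaller C makes the penalty term relatively more important.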

Decrease the coefficient C to give the regularization term more weight

def PolynomialLogisticRegression(degree,penalty='l2',C=1):
    return Pipeline([
        ('poly',PolynomialFeatures(degree=degree)),
        ('stand_scalor',StandardScaler()),
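        # liblinear supports both 'l1' and 'l2'; the current default solver (lbfgs) rejects penalty='l1'
        ('log_reg',LogisticRegression(penalty=penalty,C=C,solver='liblinear'))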
    ])
    
poly_log_reg2 = PolynomialLogisticRegression(20,penalty='l2',C=0.1)
poly_log_reg2.fit(x_train,y_train)
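To see what a smaller C does to the fitted model, one option is to compare the size of the learned coefficient vector at C=1 and C=0.1. This is only a sketch: `make_model`, the `liblinear` solver choice, and the fixed `random_state` are assumptions made here for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Same noisy quadratic dataset as in the post.
np.random.seed(666)
X = np.random.normal(0, 1, size=(200, 2))
y = np.array(X[:, 0] ** 2 + X[:, 1] < 1.5, dtype='int')
for _ in range(20):
    y[np.random.randint(200)] = 1

# random_state is an assumption added for repeatability.
x_train, x_test, y_train, y_test = train_test_split(X, y, random_state=666)


def make_model(degree, penalty='l2', C=1.0):
    """Hypothetical helper; solver='liblinear' is an assumption."""
    return Pipeline([
        ('poly', PolynomialFeatures(degree=degree)),
        ('std_scaler', StandardScaler()),
        ('log_reg', LogisticRegression(penalty=penalty, C=C,
                                       solver='liblinear')),
    ])


norms = {}
for C in (1.0, 0.1):
    model = make_model(20, C=C).fit(x_train, y_train)
    # coef_ has one row for a binary problem, one column per polynomial feature.
    coef = model.named_steps['log_reg'].coef_
    norms[C] = float(np.linalg.norm(coef))
    print('C =', C, '| L2 norm of coefficients:', norms[C],
          '| test accuracy:', model.score(x_test, y_test))
```

Stronger regularization (smaller C) is expected to pull the coefficients toward zero, which in turn tends to smooth the decision boundary.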

Switching the penalty from L2 to L1

poly_log_reg3 = PolynomialLogisticRegression(20,penalty='l1',C=0.1)
poly_log_reg3.fit(x_train,y_train)
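A practical difference between the two penalties is that L1 tends to drive many coefficients to exactly zero, which effectively selects a subset of the polynomial terms. The sketch below counts zero coefficients under each penalty; `make_model`, the `liblinear` solver, and the fixed `random_state` are assumptions for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Same noisy quadratic dataset as in the post.
np.random.seed(666)
X = np.random.normal(0, 1, size=(200, 2))
y = np.array(X[:, 0] ** 2 + X[:, 1] < 1.5, dtype='int')
for _ in range(20):
    y[np.random.randint(200)] = 1

# random_state is an assumption added for repeatability.
x_train, x_test, y_train, y_test = train_test_split(X, y, random_state=666)


def make_model(degree, penalty, C):
    """Hypothetical helper; liblinear is chosen because it handles l1 and l2."""
    return Pipeline([
        ('poly', PolynomialFeatures(degree=degree)),
        ('std_scaler', StandardScaler()),
        ('log_reg', LogisticRegression(penalty=penalty, C=C,
                                       solver='liblinear')),
    ])


zeros = {}
for penalty in ('l2', 'l1'):
    model = make_model(20, penalty=penalty, C=0.1).fit(x_train, y_train)
    coef = model.named_steps['log_reg'].coef_.ravel()
    # Count coefficients that are exactly zero.
    zeros[penalty] = int(np.sum(coef == 0))
    print(penalty, '| zero coefficients:', zeros[penalty], 'of', coef.size,
          '| test accuracy:', model.score(x_test, y_test))
```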

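Note: in scikit-learn's logistic regression, the loss coefficient C, the polynomial degree, and the penalty type are all hyperparameters. In practice, use grid search to find the best combination.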
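The grid search mentioned in the note could look roughly like this with `GridSearchCV`. The candidate values, the 5-fold setting, the `liblinear` solver, and the fixed `random_state` are all assumptions picked to keep the sketch small; a real search would use a grid suited to the data.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Same noisy quadratic dataset as in the post.
np.random.seed(666)
X = np.random.normal(0, 1, size=(200, 2))
y = np.array(X[:, 0] ** 2 + X[:, 1] < 1.5, dtype='int')
for _ in range(20):
    y[np.random.randint(200)] = 1

# random_state is an assumption added for repeatability.
x_train, x_test, y_train, y_test = train_test_split(X, y, random_state=666)

pipe = Pipeline([
    ('poly', PolynomialFeatures()),
    ('std_scaler', StandardScaler()),
    # liblinear is used because it accepts both penalties in the grid.
    ('log_reg', LogisticRegression(solver='liblinear')),
])

# Pipeline parameters are addressed as <step name>__<parameter name>.
param_grid = {
    'poly__degree': [1, 2, 3],
    'log_reg__C': [0.1, 1.0, 10.0],
    'log_reg__penalty': ['l1', 'l2'],
}

grid = GridSearchCV(pipe, param_grid, cv=5)
grid.fit(x_train, y_train)

print('best parameters:', grid.best_params_)
print('best cross-validated accuracy:', grid.best_score_)
print('test accuracy of the refit best model:', grid.score(x_test, y_test))
```

The search is cross-validated on the training split only, so the test split stays untouched until the final score.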
