Constructing the dataset
import numpy
import matplotlib.pyplot as plt

numpy.random.seed(666)
X = numpy.random.normal(0, 1, size=(200, 2))
# The decision boundary is a quadratic function
y = numpy.array(X[:, 0]**2 + X[:, 1] < 1.5, dtype='int')
# Set 20 randomly chosen points to class 1 in order to add noise
for _ in range(20):
    y[numpy.random.randint(200)] = 1

plt.scatter(X[y == 0, 0], X[y == 0, 1], color='red')
plt.scatter(X[y == 1, 0], X[y == 1, 1], color='blue')
plt.show()
Using logistic regression from scikit-learn:
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split

def PolynomialLogisticRegression(degree):
    return Pipeline([
        ('poly', PolynomialFeatures(degree=degree)),
        ('stand_scalor', StandardScaler()),
        ('log_reg', LogisticRegression())
    ])

x_train, x_test, y_train, y_test = train_test_split(X, y)
With a degree-2 polynomial:
poly_log_reg = PolynomialLogisticRegression(2)
poly_log_reg.fit(x_train, y_train)
The model reaches an accuracy of 92%.
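An accuracy figure like this can be obtained with the pipeline's `score` method. The sketch below rebuilds the data and the degree-2 pipeline so it runs on its own. The `random_state=666` argument is an assumption added only to make the split repeatable, since the post does not fix one, so the exact number printed may differ from 92%.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Rebuild the dataset from the post
np.random.seed(666)
X = np.random.normal(0, 1, size=(200, 2))
y = np.array(X[:, 0] ** 2 + X[:, 1] < 1.5, dtype='int')
for _ in range(20):
    y[np.random.randint(200)] = 1

# random_state is an assumption; the post uses an unseeded split
x_train, x_test, y_train, y_test = train_test_split(X, y, random_state=666)

poly_log_reg = Pipeline([
    ('poly', PolynomialFeatures(degree=2)),
    ('stand_scalor', StandardScaler()),
    ('log_reg', LogisticRegression()),
])
poly_log_reg.fit(x_train, y_train)

# score() returns the mean accuracy on the given data
train_acc = poly_log_reg.score(x_train, y_train)
test_acc = poly_log_reg.score(x_test, y_test)
print(f"train accuracy: {train_acc:.2f}, test accuracy: {test_acc:.2f}")
```

Reporting both numbers is a deliberate choice: the training accuracy alone says little about how the model generalizes.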
Plot the decision boundary (the plotting method is described in the previous post):
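The previous post is not reproduced here, so the following is only one common way to draw such a boundary, not necessarily the author's: predict a label for every point of a dense grid and shade the result with `contourf`. The helper name `plot_decision_boundary`, its parameters, and the axis range are assumptions made for this sketch.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Rebuild the dataset from the post
np.random.seed(666)
X = np.random.normal(0, 1, size=(200, 2))
y = np.array(X[:, 0] ** 2 + X[:, 1] < 1.5, dtype='int')
for _ in range(20):
    y[np.random.randint(200)] = 1


def plot_decision_boundary(model, axis, ax, steps=200):
    """Shade the regions that `model` assigns to each class.

    Hypothetical helper: axis = [x_min, x_max, y_min, y_max],
    steps = number of grid points per dimension.
    """
    x0, x1 = np.meshgrid(
        np.linspace(axis[0], axis[1], steps),
        np.linspace(axis[2], axis[3], steps),
    )
    grid = np.c_[x0.ravel(), x1.ravel()]
    labels = model.predict(grid).reshape(x0.shape)
    # RdBu maps the low label (0) to red and the high label (1) to blue
    ax.contourf(x0, x1, labels, alpha=0.3, cmap='RdBu')
    return labels


model = Pipeline([
    ('poly', PolynomialFeatures(degree=2)),
    ('stand_scalor', StandardScaler()),
    ('log_reg', LogisticRegression()),
]).fit(X, y)

fig, ax = plt.subplots()
labels = plot_decision_boundary(model, [-4, 4, -4, 4], ax)
ax.scatter(X[y == 0, 0], X[y == 0, 1], color='red')
ax.scatter(X[y == 1, 0], X[y == 1, 1], color='blue')
# Call plt.show() here to display the figure interactively
```

Passing a pipeline with a different `degree` to the same helper gives the corresponding plot for higher-order models.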
With a degree-20 polynomial:
As the number of polynomial terms grows, the model becomes overfitted.
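Besides inspecting the plot, one rough numeric check for overfitting is to compare training and test accuracy at each degree: a training score well above the test score is a typical symptom. The sketch below does this for degrees 2 and 20. The `random_state` and `max_iter=1000` values are assumptions (the latter only gives the solver more room to converge at degree 20), and whether a gap actually appears depends on the split.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Rebuild the dataset from the post
np.random.seed(666)
X = np.random.normal(0, 1, size=(200, 2))
y = np.array(X[:, 0] ** 2 + X[:, 1] < 1.5, dtype='int')
for _ in range(20):
    y[np.random.randint(200)] = 1

# random_state is an assumption; the post uses an unseeded split
x_train, x_test, y_train, y_test = train_test_split(X, y, random_state=666)


def PolynomialLogisticRegression(degree):
    return Pipeline([
        ('poly', PolynomialFeatures(degree=degree)),
        ('stand_scalor', StandardScaler()),
        # max_iter raised from the default as an assumption
        ('log_reg', LogisticRegression(max_iter=1000)),
    ])


scores = {}
for degree in (2, 20):
    model = PolynomialLogisticRegression(degree).fit(x_train, y_train)
    scores[degree] = (model.score(x_train, y_train),
                      model.score(x_test, y_test))
    print(f"degree={degree}: train={scores[degree][0]:.3f}, "
          f"test={scores[degree][1]:.3f}")
```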
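scikit-learn applies regularization in the form \(C\cdot J(\theta) + L_1\) or \(C\cdot J(\theta) + L_2\). The coefficient C multiplies the loss rather than the penalty, so a smaller C means stronger regularization. By default C is 1 and the regularization term is L2.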
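Because C weights the loss term, shrinking C should pull the coefficients toward zero. The sketch below illustrates this by fitting the degree-20 model with the default L2 penalty at two values of C and comparing the Euclidean norm of the learned coefficients. The values 0.01 and 100, the `max_iter` setting, and fitting on the full dataset are all choices made for this illustration only.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Rebuild the dataset from the post
np.random.seed(666)
X = np.random.normal(0, 1, size=(200, 2))
y = np.array(X[:, 0] ** 2 + X[:, 1] < 1.5, dtype='int')
for _ in range(20):
    y[np.random.randint(200)] = 1


def coef_norm(C):
    """Fit a degree-20 model with L2 penalty and return ||coef_||_2."""
    model = Pipeline([
        ('poly', PolynomialFeatures(degree=20)),
        ('stand_scalor', StandardScaler()),
        ('log_reg', LogisticRegression(C=C, max_iter=5000)),
    ])
    model.fit(X, y)
    return np.linalg.norm(model.named_steps['log_reg'].coef_)


norm_strong = coef_norm(0.01)   # small C: strong regularization
norm_weak = coef_norm(100.0)    # large C: weak regularization
print(f"C=0.01 -> {norm_strong:.3f}, C=100 -> {norm_weak:.3f}")
```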
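def PolynomialLogisticRegression(degree, penalty='l2', C=1):
    return Pipeline([
        ('poly', PolynomialFeatures(degree=degree)),
        ('stand_scalor', StandardScaler()),
        # liblinear supports both 'l1' and 'l2'; the default solver in
        # current scikit-learn releases (lbfgs) rejects penalty='l1'
        ('log_reg', LogisticRegression(penalty=penalty, C=C, solver='liblinear'))
    ])

# Degree 20 with L2 regularization
poly_log_reg2 = PolynomialLogisticRegression(20, penalty='l2', C=0.1)
poly_log_reg2.fit(x_train, y_train)

# Degree 20 with L1 regularization
poly_log_reg3 = PolynomialLogisticRegression(20, penalty='l1', C=0.1)
poly_log_reg3.fit(x_train, y_train)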
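One practical difference between the two penalties is that L1 tends to drive many coefficients to exactly zero, while L2 only shrinks them. The sketch below counts zero coefficients for both penalties at degree 20 with C=0.1. Using `solver='liblinear'` and fitting on the full dataset are assumptions made so the example runs on its own.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Rebuild the dataset from the post
np.random.seed(666)
X = np.random.normal(0, 1, size=(200, 2))
y = np.array(X[:, 0] ** 2 + X[:, 1] < 1.5, dtype='int')
for _ in range(20):
    y[np.random.randint(200)] = 1


def count_zero_coefs(penalty):
    """Return (number of exactly-zero coefficients, total coefficients)."""
    model = Pipeline([
        ('poly', PolynomialFeatures(degree=20)),
        ('stand_scalor', StandardScaler()),
        # liblinear is chosen because it accepts both penalties
        ('log_reg', LogisticRegression(penalty=penalty, C=0.1,
                                       solver='liblinear')),
    ])
    model.fit(X, y)
    coef = model.named_steps['log_reg'].coef_.ravel()
    return int(np.sum(coef == 0)), coef.size


l1_zeros, total = count_zero_coefs('l1')
l2_zeros, _ = count_zero_coefs('l2')
print(f"L1: {l1_zeros}/{total} zero coefficients")
print(f"L2: {l2_zeros}/{total} zero coefficients")
```

The sparsity under L1 is why it is sometimes used as a form of feature selection among the polynomial terms.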
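Note: in scikit-learn's logistic regression, the loss coefficient C, the polynomial degree, and the regularization term are all hyperparameters of the algorithm. In a concrete application, use grid search to find the best combination of these parameters.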
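The grid search mentioned in the note could look roughly like the following, using scikit-learn's `GridSearchCV` over the pipeline from this post. The candidate values in `param_grid`, the 5-fold cross-validation, the `random_state`, and the `liblinear` solver are illustrative assumptions, not recommendations from the post.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Rebuild the dataset from the post
np.random.seed(666)
X = np.random.normal(0, 1, size=(200, 2))
y = np.array(X[:, 0] ** 2 + X[:, 1] < 1.5, dtype='int')
for _ in range(20):
    y[np.random.randint(200)] = 1

# random_state is an assumption; the post uses an unseeded split
x_train, x_test, y_train, y_test = train_test_split(X, y, random_state=666)

pipeline = Pipeline([
    ('poly', PolynomialFeatures()),
    ('stand_scalor', StandardScaler()),
    # liblinear accepts both 'l1' and 'l2' penalties
    ('log_reg', LogisticRegression(solver='liblinear')),
])

# Pipeline parameters are addressed as <step name>__<parameter name>
param_grid = {
    'poly__degree': [2, 5, 10],
    'log_reg__C': [0.1, 1, 10],
    'log_reg__penalty': ['l1', 'l2'],
}

grid_search = GridSearchCV(pipeline, param_grid, cv=5)
grid_search.fit(x_train, y_train)

print("best parameters:", grid_search.best_params_)
print("cross-validated accuracy:", grid_search.best_score_)
print("test accuracy:", grid_search.score(x_test, y_test))
```

Searching on the training split only, and scoring once on the held-out test split at the end, keeps the test accuracy an honest estimate.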