05 Machine Learning in Action: Logistic Regression with scikit-learn

Likelihood function reference: http://www.javashuo.com/article/p-xobqrdds-ns.html

Principle: maximum likelihood estimation (MLE) is a statistical method built on the maximum likelihood principle; it is an application of probability theory to statistics. MLE provides a way to estimate model parameters from observed data: the form of the model is known, but its parameters are not. After running a number of trials and observing the outcomes, the parameter value that makes the observed sample most probable is taken as the maximum likelihood estimate.

Because the samples in the sample set are independent and identically distributed (i.i.d.), we can consider a single sample set D to estimate the parameter vector θ. Denote the known sample set as D = {x_1, x_2, ..., x_N}.

Likelihood function: the joint probability density p(D | θ) = p(x_1, x_2, ..., x_N | θ) = ∏_{i=1}^{N} p(x_i | θ), viewed as a function of θ, is called the likelihood function l(θ).
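Taking the logarithm turns the product into a sum, and the maximum likelihood estimate is the θ that maximizes it. A minimal sketch of this standard step in LaTeX notation (my notation, not taken from the linked posts):

% Log-likelihood and the MLE (standard i.i.d. setup)
H(\theta) = \ln l(\theta) = \sum_{i=1}^{N} \ln p(x_i \mid \theta)
\hat{\theta} = \arg\max_{\theta} H(\theta), \qquad
\left. \frac{\partial H(\theta)}{\partial \theta} \right|_{\theta = \hat{\theta}} = 0
\quad \text{(at an interior maximum)}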

Note: the definition of the likelihood function at https://blog.csdn.net/pql925/article/details/79021464 is somewhat incorrect; refer to it only for the derivation of the gradient.
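Since that link is referenced only for its gradient derivation, here is a minimal runnable sketch of the logistic regression log-likelihood and its gradient. The names sigmoid, log_likelihood, log_likelihood_gradient and the X_b convention (a leading column of ones for the intercept) are illustrative assumptions, not code from either post:

import numpy as np

def sigmoid(t):
    # Logistic function: sigma(t) = 1 / (1 + e^(-t))
    return 1.0 / (1.0 + np.exp(-t))

def log_likelihood(theta, X_b, y):
    # L(theta) = sum_i [ y_i * ln p_i + (1 - y_i) * ln(1 - p_i) ], p_i = sigma(x_i . theta)
    p = sigmoid(X_b.dot(theta))
    return np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

def log_likelihood_gradient(theta, X_b, y):
    # Differentiating L gives dL/dtheta = X_b^T (y - sigma(X_b theta));
    # gradient ascent on L (or descent on -L) recovers theta.
    return X_b.T.dot(y - sigmoid(X_b.dot(theta)))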

In [7]:
import numpy as np
import matplotlib.pyplot as plt

np.random.seed(666)
X = np.random.normal(0, 1, size=(200, 2))
y = np.array(X[:, 0] ** 2 + X[:, 1] < 1.5, dtype='int')

for _ in range(20):
    y[np.random.randint(200)] = 1  # flip a random label to inject noise
plt.scatter(X[y == 0, 0], X[y == 0, 1])
plt.scatter(X[y == 1, 0], X[y == 1, 1])
plt.show()
[Figure: scatter plot of the two classes in the noisy dataset]
In [25]:
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=666)
In [26]:
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import PolynomialFeatures
from sklearn.pipeline import Pipeline

log_reg = LogisticRegression(solver='lbfgs')
log_reg.fit(X_train, y_train)
Out[26]:
LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
          intercept_scaling=1, max_iter=100, multi_class='warn',
          n_jobs=None, penalty='l2', random_state=None, solver='lbfgs',
          tol=0.0001, verbose=0, warm_start=False)
In [27]:
log_reg.score(X_train, y_train)
Out[27]:
0.7933333333333333
In [28]:
log_reg.score(X_test, y_test)
Out[28]:
0.86
In [34]:
def plot_decision_boundary(model, axis):
    # Build a dense grid over the window axis = [x0_min, x0_max, x1_min, x1_max]
    x0, x1 = np.meshgrid(
        np.linspace(axis[0], axis[1], int((axis[1] - axis[0]) * 100)),
        np.linspace(axis[2], axis[3], int((axis[3] - axis[2]) * 100))
    )
    X_new = np.c_[x0.ravel(), x1.ravel()]

    # Predict a class for every grid point and reshape back to the grid
    y_predict = model.predict(X_new)
    zz = y_predict.reshape(x0.shape)

    from matplotlib.colors import ListedColormap
    custom_cmap = ListedColormap(['#EF9A9A', '#FFF59D', '#90CAF9'])

    # Filled contours color each region by its predicted class
    plt.contourf(x0, x1, zz, cmap=custom_cmap)
In [35]:
plot_decision_boundary(log_reg, axis=[-4, 4, -4, 4])
plt.scatter(X[y == 0, 0], X[y == 0, 1])
plt.scatter(X[y == 1, 0], X[y == 1, 1])
plt.show()

[Figure: straight-line decision boundary of the linear logistic regression over the scatter plot; it cannot follow the parabolic class region]

Applying polynomial features to logistic regression

The underlying boundary here is quadratic (the labels were generated from X[:, 0] ** 2 + X[:, 1] < 1.5), so adding degree-2 polynomial features should let the model recover it.

In [38]:
from sklearn.preprocessing import StandardScaler


def PolynomialLogisticRegression(degree):
    return Pipeline([
        ('Poly', PolynomialFeatures(degree=degree)),
        ('std_scaler', StandardScaler()),
        ('Logistic', LogisticRegression(solver='lbfgs'))
    ])


log_reg2 = PolynomialLogisticRegression(2)
log_reg2.fit(X_train, y_train)
log_reg2.score(X_train, y_train)
Out[38]:
0.9066666666666666
In [39]:
log_reg2.score(X_test, y_test)
Out[39]:
0.94
In [40]:
plot_decision_boundary(log_reg2, axis=[-4, 4, -4, 4])
plt.scatter(X[y == 0, 0], X[y == 0, 1])
plt.scatter(X[y == 1, 0], X[y == 1, 1])
plt.show()
[Figure: curved decision boundary of the degree-2 polynomial model over the scatter plot, following the parabolic class region]