Machine Learning: Logistic Regression (Classification)

Preface: I have honestly revised this post quite a few times; there really are a lot of details!

Machine learning column

  1. Machine Learning: Linear Regression (Prediction)
  2. Machine Learning: Logistic Regression (Classification)
  3. Machine Learning: Feature Scaling
  4. Machine Learning: Regularization

Logistic Regression (Classification)

1. Basic Principles

Logistic regression is used for classification: it predicts the probability that a sample belongs to a particular class. The log-odds (logistic) function is:
$$g(z)=\frac{1}{1+e^{-z}}$$
(Figure: plot of the sigmoid function $g(z)$.)
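A minimal NumPy sketch of this function, just to make the mapping concrete (the function and variable names here are my own, not part of the original post):

```python
import numpy as np

def sigmoid(z):
    """Logistic (sigmoid) function: maps any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

z = np.array([-5.0, -1.0, 0.0, 1.0, 5.0])
print(sigmoid(z))  # close to 0 for large negative z, 0.5 at z = 0, close to 1 for large positive z
```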

Given a data set $D=\{(x^{(1)},y^{(1)}),(x^{(2)},y^{(2)}),\dots,(x^{(m)},y^{(m)})\}$, where $x^{(i)}$ denotes the $i$-th sample and $x^{(i)}\in\mathbb{R}^{n}$ (each sample has $n$ attributes).
Since $y=\theta_0+\theta_1 x^{(i)}_1+\dots+\theta_n x^{(i)}_n$ takes continuous values, it cannot fit a discrete label directly. One could instead use it to fit the conditional probability, since probabilities are also continuous; however, its range is all of $\mathbb{R}$, which does not match the requirement that a probability lie between 0 and 1, so we turn to a generalized linear model.
For a simple binary classification problem, we use the logistic function, in place of the ideal step function, as the link function:
$$h_\theta(x^{(i)})=\frac{1}{1+e^{-\theta^T x^{(i)}}},\qquad z=\theta^T x^{(i)}$$
It follows that:
$$\ln\frac{h_\theta(x^{(i)})}{1-h_\theta(x^{(i)})}=\theta^T x^{(i)}$$
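For completeness, this identity follows in one line from the definition of $h_\theta$ (a step the derivation above skips):
$$\frac{h_\theta(x^{(i)})}{1-h_\theta(x^{(i)})}=\frac{\dfrac{1}{1+e^{-\theta^T x^{(i)}}}}{\dfrac{e^{-\theta^T x^{(i)}}}{1+e^{-\theta^T x^{(i)}}}}=e^{\theta^T x^{(i)}}\;\Longrightarrow\;\ln\frac{h_\theta(x^{(i)})}{1-h_\theta(x^{(i)})}=\theta^T x^{(i)}$$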
The ratio of the probability that an event occurs to the probability that it does not occur is called the odds; here $h_\theta(x^{(i)})$ is the probability that the event occurs, that is:
$$\begin{cases} P(y=1\mid x^{(i)};\theta)=h_\theta(x^{(i)})\\ P(y=0\mid x^{(i)};\theta)=1-h_\theta(x^{(i)}) \end{cases}$$
Combining the two cases gives:
$$P(y\mid x^{(i)};\theta)=\big(h_\theta(x^{(i)})\big)^{y}\big(1-h_\theta(x^{(i)})\big)^{1-y}$$
The idea of logistic regression is therefore to first fit a decision boundary (not necessarily linear; it can also be polynomial, and this step can be viewed as a perceptron), and then to link that boundary to the class probability through the log-odds function, which yields the class probabilities for the binary case.
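As a tiny numerical illustration of these two probabilities (the parameter values and the sample below are made up; x includes a leading 1 for the intercept term):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

theta = np.array([0.5, -1.2, 0.8])   # hypothetical parameters
x_i = np.array([1.0, 2.0, 3.0])      # one sample, with a leading 1 for theta_0

h = sigmoid(theta @ x_i)             # h_theta(x^(i)) = P(y=1 | x^(i); theta)
print("P(y=1):", h)
print("P(y=0):", 1.0 - h)
```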

I will not go into the details of maximum likelihood estimation here; see, for example, Zhejiang University's 《機率論與數理統計》 (Probability and Mathematical Statistics). We derive the cost function from the maximum likelihood method: we want the probability of each sample taking its true label to be as large as possible, so:
$$\max\quad L(\theta)=\prod_{i=1}^{m}P(y^{(i)}\mid x^{(i)};\theta)$$
Taking the log-likelihood:
$$\max\quad \log L(\theta)=\sum_{i=1}^{m}\log P(y^{(i)}\mid x^{(i)};\theta)$$
Maximizing the log-likelihood is the same as minimizing its negative, averaged over the $m$ samples, so we define the cost function as:
$$J(\theta)=\frac{1}{m}\sum_{i=1}^{m}C\big(h_\theta(x^{(i)}),y^{(i)}\big)=-\frac{1}{m}\sum_{i=1}^{m}\Big[y^{(i)}\log\big(h_\theta(x^{(i)})\big)+(1-y^{(i)})\log\big(1-h_\theta(x^{(i)})\big)\Big]$$
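A direct, loop-based sketch of this cost (the names are assumptions: X already contains a leading column of ones, y holds 0/1 labels, and a small eps keeps the logarithms finite):

```python
import numpy as np

def cost(theta, X, y, eps=1e-12):
    """Cross-entropy cost J(theta), computed sample by sample."""
    m = len(y)
    total = 0.0
    for i in range(m):
        h_i = 1.0 / (1.0 + np.exp(-X[i] @ theta))  # h_theta(x^(i))
        total += y[i] * np.log(h_i + eps) + (1 - y[i]) * np.log(1 - h_i + eps)
    return -total / m
```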
All of the sample predictions (each one a probability) can be computed in a single step:
$$h=g(X\theta)$$
where $X=\begin{bmatrix} x_0^{(1)} & x_1^{(1)} & \dots & x_n^{(1)} \\ x_0^{(2)} & x_1^{(2)} & \dots & x_n^{(2)} \\ \vdots & \vdots & & \vdots \\ x_0^{(m)} & x_1^{(m)} & \dots & x_n^{(m)} \end{bmatrix}$ is the training-set matrix and $\theta=\begin{bmatrix} \theta_0\\ \theta_1\\ \vdots\\ \theta_n \end{bmatrix}$.
Writing the cost function in matrix form:
$$J(\theta)=-\frac{1}{m}\Big(Y^{T}\log(h)+(1-Y)^{T}\log(1-h)\Big)$$
where $Y=\begin{bmatrix} y^{(1)}\\ y^{(2)}\\ \vdots\\ y^{(m)} \end{bmatrix}$ is the vector of all training labels and $h=\begin{bmatrix} h^{(1)}\\ h^{(2)}\\ \vdots\\ h^{(m)} \end{bmatrix}$ is the vector of predicted probabilities for all samples.
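The matrix form maps directly onto NumPy. A minimal sketch under the same assumptions as before (X with a leading column of ones, Y a 0/1 label vector):

```python
import numpy as np

def cost_vectorized(theta, X, Y, eps=1e-12):
    """J(theta) = -(1/m) * (Y^T log(h) + (1-Y)^T log(1-h)), where h = g(X theta)."""
    m = len(Y)
    h = 1.0 / (1.0 + np.exp(-X @ theta))  # predictions for all m samples at once
    return -(Y @ np.log(h + eps) + (1 - Y) @ np.log(1 - h + eps)) / m
```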

2. Gradient Descent

The gradient descent update rule is:
$$\theta_j:=\theta_j-\alpha\frac{\partial}{\partial\theta_j}J(\theta)=\theta_j-\frac{\alpha}{m}\sum_{i=1}^{m}\big(h_\theta(x^{(i)})-y^{(i)}\big)x_j^{(i)}$$
[A short derivation of the gradient descent update for logistic regression]
$$J(\theta)=-\frac{1}{m}\sum_{i=1}^{m}\Big[y^{(i)}\log\big(g(\theta^T x^{(i)})\big)+(1-y^{(i)})\log\big(1-g(\theta^T x^{(i)})\big)\Big]$$
$$\frac{\partial J(\theta)}{\partial \theta_j}=-\frac{1}{m}\sum_{i=1}^{m}\left[\frac{y^{(i)}}{g(\theta^T x^{(i)})}-\frac{1-y^{(i)}}{1-g(\theta^T x^{(i)})}\right]\frac{\partial g(\theta^T x^{(i)})}{\partial \theta_j}$$
First compute:
$$\frac{\partial g(\theta^T x^{(i)})}{\partial \theta_j}=-\frac{\partial\big(1+e^{-\theta^T x^{(i)}}\big)/\partial\theta_j}{\big(1+e^{-\theta^T x^{(i)}}\big)^{2}}=\frac{e^{-\theta^T x^{(i)}}\,x_j^{(i)}}{\big(1+e^{-\theta^T x^{(i)}}\big)^{2}}=h_\theta(x^{(i)})\big(1-h_\theta(x^{(i)})\big)x_j^{(i)}$$
Substituting this into $\frac{\partial J(\theta)}{\partial \theta_j}$ and simplifying gives
$$\frac{\partial J(\theta)}{\partial \theta_j}=\frac{1}{m}\sum_{i=1}^{m}\big(h_\theta(x^{(i)})-y^{(i)}\big)x_j^{(i)}$$
so the update rule is
$$\theta_j:=\theta_j-\frac{\alpha}{m}\sum_{i=1}^{m}\big(h_\theta(x^{(i)})-y^{(i)}\big)x_j^{(i)}$$
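A batch gradient descent sketch that applies this update to every $\theta_j$ at once (the learning rate, iteration count, and array names are illustrative assumptions, not values from the post):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gradient_descent(X, y, alpha=0.1, n_iters=1000):
    """Batch gradient descent for logistic regression.
    X: design matrix with a leading column of ones; y: 0/1 labels."""
    m, d = X.shape
    theta = np.zeros(d)
    for _ in range(n_iters):
        h = sigmoid(X @ theta)        # h_theta(x^(i)) for all samples
        grad = X.T @ (h - y) / m      # (1/m) * sum_i (h_i - y^(i)) * x_j^(i), for every j
        theta -= alpha * grad         # theta_j := theta_j - alpha * dJ/dtheta_j
    return theta
```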

3. Logistic Regression with sklearn

```python
# -*- coding: utf-8 -*-
"""Created on Tue Nov 12 19:28:12 2019 @author: 1"""

from sklearn.model_selection import train_test_split
# import the logistic regression model
from sklearn.linear_model import LogisticRegression
import numpy as np
import pandas as pd


df = pd.read_csv('D:\\workspace\\python\\machine learning\\data\\breast_cancer.csv',
                 sep=',', header=None, skiprows=1)
X = df.iloc[:, 0:29]
y = df.iloc[:, 30]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
model = LogisticRegression(solver='liblinear')
model.fit(X_train, y_train)
train_score = model.score(X_train, y_train)  # mean accuracy on the training set (closer to 1 is better)
cv_score = model.score(X_test, y_test)       # mean accuracy on the held-out test set
print('train_score:{0:.6f}, cv_score:{1:.6f}'.format(train_score, cv_score))

y_pre = model.predict(X_test)                # predicted labels
y_pre_proba = model.predict_proba(X_test)    # predicted probabilities

# count how many test samples are classified correctly
print('matchs:{0}/{1}'.format(np.equal(y_pre, y_test).sum(), y_test.shape[0]))
# print('y_pre:{}, \ny_pre_proba:{}'.format(y_pre, y_pre_proba))  # print the probability predictions
```

4. Multi-class Classification

4.1 Principle of Multi-class Classification

To perform multi-class classification, we mark one of the D classes as the positive class (y=1) and label all of the remaining classes as the negative class; this model is denoted $h_\theta^{(1)}(X)$. Similarly, we then choose another class as the positive class (y=2), label all other classes as negative, and denote that model $h_\theta^{(2)}(X)$, and so on. In the end we obtain a family of models, written compactly as:
$$h_\theta^{(k)}(X)=P(y=k\mid X;\theta),\qquad k=1,2,\dots,D$$
Finally, when making a prediction, we run every one of these classifiers on the test input and choose the class whose classifier outputs the highest probability as the predicted class:
$$\max_{k}\; h_\theta^{(k)}(x^{(i)})$$
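A minimal sketch of this one-vs-rest scheme built out of binary logistic regressions (variable names are assumptions; note that sklearn's LogisticRegression can also handle multi-class problems directly, as in the example below):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def one_vs_rest_fit_predict(X_train, y_train, X_test):
    """Train one binary classifier per class and predict the most probable class."""
    classes = np.unique(y_train)
    scores = np.zeros((X_test.shape[0], len(classes)))
    for k, c in enumerate(classes):
        # class c is the positive class (1), every other class is negative (0)
        clf = LogisticRegression(solver='liblinear')
        clf.fit(X_train, (y_train == c).astype(int))
        scores[:, k] = clf.predict_proba(X_test)[:, 1]  # h_theta^(k)(x) for each test sample
    return classes[np.argmax(scores, axis=1)]           # class with the highest probability
```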

4.2 Multi-class Classification with sklearn

```python
# -*- coding: utf-8 -*-
"""Created on Tue Nov 12 22:07:34 2019 @author: 1"""

from sklearn.model_selection import train_test_split
# import the logistic regression model
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score  # prediction accuracy
import pandas as pd
import matplotlib.pyplot as plt


df = pd.read_csv('D:\\workspace\\python\\machine learning\\data\\iris.csv', sep=',')
X = df.iloc[:, 0:2]  # first two feature columns (they are also the axes of the plots below)
y = df.iloc[:, 4]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
model = LogisticRegression(solver='liblinear')
model.fit(X_train, y_train)
y_pre = model.predict(X_test)
print('accuracy_score:{}'.format(accuracy_score(y_test, y_pre)))  # prediction accuracy
y_pre_proba = model.predict_proba(X_test)
print('y_pre:{}, \ny_pre_proba:{}'.format(y_pre, y_pre_proba))  # print the probability predictions

# plot the original data
colors = ['blue', 'red', 'green']
plt.figure(1)
for i in range(3):
    plt.scatter(df.loc[df['virginica'] == i].iloc[:, 0],
                df.loc[df['virginica'] == i].iloc[:, 1], c=colors[i])
plt.title('Classes in the original data')

# plot the predicted classes
colors = ['blue', 'red', 'green']
plt.figure(2)
df['virginica_pre'] = model.predict(X)
for i in range(3):
    plt.scatter(df.loc[df['virginica_pre'] == i].iloc[:, 0],
                df.loc[df['virginica_pre'] == i].iloc[:, 1], c=colors[i])
plt.title('Predicted classes')
plt.show()
```

Visualization of the results:
(Figures: scatter plots of the classes in the original data and of the predicted classes, produced by the code above.)
I also recommend this blog post: 一文詳盡講解什麼是邏輯迴歸 (a thorough explanation of what logistic regression is).
