Machine Learning: Understanding Logistic Regression from a Programming Perspective

Main references: 《統計學習方法》 (Statistical Learning Methods) and 《機器學習實戰》 (Machine Learning in Action).

Logistic Regression

One common definition goes like this: logistic regression is essentially a linear classifier with a logistic function wrapped around it, used mainly for binary classification. This definition points out the two defining features of logistic regression:

a linear classifier
an outer logistic function
We know that linear regression finds the coefficient vector θ relating the output Y to the input sample matrix X, satisfying Y = Xθ. Here Y is continuous, so this is a regression model. What if we want Y to be discrete instead? One natural idea is to apply a further transformation to Y, turning it into g(Y). If we declare that g(Y) falling in one real interval means class A, falling in another interval means class B, and so on, we obtain a classification model. If there are only two possible classes, it is a binary classification model. This is exactly the starting point of logistic regression.
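Concretely, the g that logistic regression chooses is the sigmoid (logistic) function, which squashes the linear score xθ into (0, 1) so the output can be read as a probability:

```latex
g(z) = \frac{1}{1+e^{-z}}, \qquad
h_\theta(x) = g(x\theta) = \frac{1}{1+e^{-x\theta}}
```

Classifying as 1 when h_θ(x) ≥ 0.5 is the same as checking xθ ≥ 0, which is why the decision boundary drawn later is a straight line.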
The parameters are estimated by maximum likelihood. Once the gradient of the log-likelihood is derived, gradient ascent can be used to iteratively update the parameters.
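Spelled out (this is the standard derivation, with h_θ(x) = 1/(1 + e^{−xθ})): the log-likelihood over m samples and its partial derivative are

```latex
\ell(\theta) = \sum_{i=1}^{m}\left[ y^{(i)} \log h_\theta\!\left(x^{(i)}\right) + \left(1-y^{(i)}\right)\log\left(1-h_\theta\!\left(x^{(i)}\right)\right)\right],
\qquad
\frac{\partial \ell(\theta)}{\partial \theta_j} = \sum_{i=1}^{m}\left(y^{(i)} - h_\theta\!\left(x^{(i)}\right)\right)x_j^{(i)}
```

so the gradient-ascent update is θ := θ + α·Xᵀ(y − h), which is exactly the `weights = weights + alpha * dataMatrix.transpose() * error` line in the code below.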
Now for the code. None of the examples below print predictions; they only draw the decision boundary.

How is the boundary drawn?
Set w0·x0 + w1·x1 + w2·x2 = 0 and solve for x2 as a function of x1; then you can plot the line. The score 0 is the natural threshold, since sigmoid(0) = 0.5. This boundary line is mainly meant to give a rough picture of how the plane is partitioned.
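As a minimal sketch (the weights here are made up purely for illustration), solving w0 + w1·x1 + w2·x2 = 0 for x2:

```python
import numpy as np

# hypothetical weights [w0, w1, w2], just for illustration
w = np.array([4.0, 0.5, -0.7])

x1 = np.arange(-3.0, 3.0, 0.1)
# from w0*1 + w1*x1 + w2*x2 = 0  =>  x2 = -(w0 + w1*x1) / w2
x2 = -(w[0] + w[1] * x1) / w[2]

# every point on the line scores exactly 0, i.e. sigmoid(score) = 0.5
scores = w[0] + w[1] * x1 + w[2] * x2
print(np.allclose(scores, 0.0))  # → True
```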
%matplotlib inline
import numpy as np
from numpy import *
import os
import pandas as pd
import matplotlib.pyplot as plt
The loop body of gradAscent implements exactly the formula derived above: gradient ascent iteratively updating the weights w.

Note that inside gradAscent, both h and the error term error are vectors: all samples are pushed through at once in matrix form. Keep this in mind to contrast with the stochastic gradient version later.
def loadDataSet():
    dataMat = []; labelMat = []
    fr = open('testSet.txt')
    for line in fr.readlines():
        lineArr = line.strip().split()
        dataMat.append([1.0, float(lineArr[0]), float(lineArr[1])])  # prepend bias term x0 = 1.0
        labelMat.append(int(lineArr[2]))
    return dataMat, labelMat
## logistic (sigmoid) function
def sigmoid(inX):
    return 1.0 / (1 + np.exp(-inX))
def gradAscent(dataMatIn, classLabels):
    dataMatrix = mat(dataMatIn)              # convert to NumPy matrix
    labelMat = mat(classLabels).transpose()  # convert to NumPy matrix
    m, n = shape(dataMatrix)
    alpha = 0.001
    maxCycles = 500
    weights = ones((n, 1))
    for k in range(maxCycles):               # heavy on matrix operations
        h = sigmoid(dataMatrix * weights)    # matrix mult
        error = labelMat - h                 # vector subtraction
        weights = weights + alpha * dataMatrix.transpose() * error  # matrix mult
    print("shapes of h and error:", shape(h), shape(error))
    return weights
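As a side note, np.matrix is deprecated in modern NumPy; the same batch gradient ascent can be sketched with plain ndarrays. Synthetic data is used below since testSet.txt is not bundled here:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1 + np.exp(-z))

def grad_ascent(X, y, alpha=0.001, max_cycles=500):
    """Batch gradient ascent with plain ndarrays: w += alpha * X.T @ (y - h)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float).reshape(-1, 1)
    w = np.ones((X.shape[1], 1))
    for _ in range(max_cycles):
        h = sigmoid(X @ w)          # predictions for all samples at once
        w += alpha * X.T @ (y - h)  # gradient of the log-likelihood
    return w

# synthetic linearly separable data: class is 1 when x1 + x2 > 0
rng = np.random.default_rng(0)
pts = rng.normal(size=(200, 2))
X = np.hstack([np.ones((200, 1)), pts])  # bias column x0 = 1.0
y = (pts.sum(axis=1) > 0).astype(float)

w = grad_ascent(X, y)
acc = np.mean((sigmoid(X @ w).ravel() > 0.5) == y)
print(acc)  # training accuracy on the synthetic set
```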
def plotBestFit(weights):
    dataMat, labelMat = loadDataSet()
    dataArr = array(dataMat)
    n = shape(dataArr)[0]
    xcord1 = []; ycord1 = []
    xcord2 = []; ycord2 = []
    for i in range(n):
        if int(labelMat[i]) == 1:
            xcord1.append(dataArr[i,1]); ycord1.append(dataArr[i,2])
        else:
            xcord2.append(dataArr[i,1]); ycord2.append(dataArr[i,2])
    fig = plt.figure()
    ax = fig.add_subplot(111)
    ax.scatter(xcord1, ycord1, s=30, c='red', marker='s')
    ax.scatter(xcord2, ycord2, s=30, c='green')
    x = arange(-3.0, 3.0, 0.1)
    y = (-weights[0] - weights[1]*x) / weights[2]  # solve w0 + w1*x1 + w2*x2 = 0 for x2
    ax.plot(x, y)
    plt.xlabel('X1'); plt.ylabel('X2')
    plt.show()
if __name__ == '__main__':
    dataMat, labelMat = loadDataSet()
    weights = gradAscent(dataMat, labelMat)
    weights = array(weights).ravel()
    print(weights)
    plotBestFit(weights)
def stocGradAscent0(dataMatrix, classLabels):
    m, n = shape(dataMatrix)
    alpha = 0.01
    weights = ones(n)   # initialize to all ones
    for i in range(m):  # one update per sample, one pass over the data
        h = sigmoid(sum(dataMatrix[i]*weights))
        error = classLabels[i] - h
        weights = weights + alpha * error * dataMatrix[i]
    print("h and error are now scalars:", h, error)
    return weights
if __name__ == '__main__':
    dataMat, labelMat = loadDataSet()
    weights = stocGradAscent0(array(dataMat), labelMat)
    print(weights)
    plotBestFit(weights)
def stocGradAscent1(dataMatrix, classLabels, numIter=150):
    m, n = shape(dataMatrix)
    weights = ones(n)   # initialize to all ones
    for j in range(numIter):
        dataIndex = [i for i in range(m)]
        for i in range(m):
            # alpha decreases with iteration but never reaches 0, thanks to the constant
            alpha = 4/(1.0+j+i) + 0.0001
            # pick a random remaining sample (without replacement within a pass)
            randIndex = int(random.uniform(0, len(dataIndex)))
            sample = dataIndex[randIndex]
            h = sigmoid(sum(dataMatrix[sample]*weights))
            error = classLabels[sample] - h
            weights = weights + alpha * error * dataMatrix[sample]
            dataIndex.pop(randIndex)
    return weights
if __name__ == '__main__':
    dataMat, labelMat = loadDataSet()
    weights = stocGradAscent1(array(dataMat), labelMat)
    print(weights)
    plotBestFit(weights)
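The same idea in ndarray form, as a sketch on synthetic data; a random permutation per pass replaces the draw-and-pop bookkeeping, which is equivalent to sampling without replacement:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1 + np.exp(-z))

def stoc_grad_ascent(X, y, num_iter=150, seed=0):
    """Stochastic gradient ascent: one randomly chosen sample per update,
    with a step size that decays but never reaches zero."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    w = np.ones(n)
    for j in range(num_iter):
        for i, idx in enumerate(rng.permutation(m)):  # each sample once per pass
            alpha = 4 / (1.0 + j + i) + 0.0001
            h = sigmoid(X[idx] @ w)                   # scalar prediction
            w += alpha * (y[idx] - h) * X[idx]        # single-sample update
    return w

# synthetic linearly separable data: class is 1 when x1 + x2 > 0
rng = np.random.default_rng(1)
pts = rng.normal(size=(200, 2))
X = np.hstack([np.ones((200, 1)), pts])
y = (pts.sum(axis=1) > 0).astype(float)

w = stoc_grad_ascent(X, y)
acc = np.mean((sigmoid(X @ w) > 0.5) == y)
print(acc)  # training accuracy on the synthetic set
```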
Using the sklearn package and its built-in interface, we obtain the final coefficients and then draw the decision boundary.
from sklearn.linear_model import LogisticRegression

def sk_lr(X_train, y_train):
    model = LogisticRegression()
    model.fit(X_train, y_train)
    model.score(X_train, y_train)
    print('weights', model.coef_)
    print(model.intercept_)
    # return model.predict(X_train)
    return model.coef_, model.intercept_
def plotBestFit1(coef, intercept):
    weights = array(coef).ravel()
    dataMat, labelMat = loadDataSet()
    dataArr = array(dataMat)
    n = shape(dataArr)[0]
    xcord1 = []; ycord1 = []
    xcord2 = []; ycord2 = []
    for i in range(n):
        if int(labelMat[i]) == 1:
            xcord1.append(dataArr[i,1]); ycord1.append(dataArr[i,2])
        else:
            xcord2.append(dataArr[i,1]); ycord2.append(dataArr[i,2])
    fig = plt.figure()
    ax = fig.add_subplot(111)
    ax.scatter(xcord1, ycord1, s=30, c='red', marker='s')
    ax.scatter(xcord2, ycord2, s=30, c='green')
    x = arange(-3.0, 3.0, 0.1)
    # the data already carries a bias column and sklearn fits its own intercept,
    # so the boundary is: intercept + w0 + w1*x1 + w2*x2 = 0
    y = (-intercept - weights[0] - weights[1]*x) / weights[2]
    ax.plot(x, y)
    plt.xlabel('X1'); plt.ylabel('X2')
    plt.show()
if __name__ == '__main__':
    dataMat, labelMat = loadDataSet()
    coef, intercept = sk_lr(dataMat, labelMat)
    plotBestFit1(coef, intercept)
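For completeness, a short sketch of actually predicting with a fitted sklearn model (synthetic data again, since testSet.txt is not bundled; here the bias column is omitted because sklearn handles the intercept itself):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# synthetic linearly separable data: class is 1 when x1 + x2 > 0
rng = np.random.default_rng(0)
pts = rng.normal(size=(200, 2))
y = (pts.sum(axis=1) > 0).astype(int)

model = LogisticRegression()
model.fit(pts, y)

print(model.score(pts, y))           # training accuracy
print(model.predict(pts[:5]))        # hard class labels for 5 samples
print(model.predict_proba(pts[:5]))  # [P(class 0), P(class 1)] per sample
```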