Regression means estimating the unknown parameters of a known formula. For example, if the known formula is $y = a*x + b$ and the unknown parameters are a and b, then many real (x, y) training samples are used to estimate the values of a and b automatically. Given the training samples and the form of the formula, the estimation procedure searches over candidate values of the one or more unknown parameters until it finds the value (or combination of values) that best fits the distribution of the sample points.
Let X be a continuous random variable. X follows the logistic distribution if X has the following distribution function and density function:
$$F(x)=P(X \le x)=\frac 1 {1+e^{-(x-\mu)/\gamma}}, \qquad f(x)=F^{'}(x)=\frac {e^{-(x-\mu)/\gamma}} {\gamma(1+e^{-(x-\mu)/\gamma})^2}$$
Here $\mu$ is the location parameter and $\gamma > 0$ is the shape parameter.
The graphs of $f(x)$ and $F(x)$ are shown below. The distribution function is centrally symmetric about the point $(\mu, \frac 1 2)$, and the smaller $\gamma$ is, the faster the curve changes near the center.
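For readers who want to reproduce the two curves, here is a minimal NumPy/matplotlib sketch; the defaults `mu=0.0, gamma=1.0` are arbitrary values chosen only for illustration:

```python
import numpy as np
import matplotlib.pyplot as plt

def logistic_cdf(x, mu=0.0, gamma=1.0):
    """Distribution function F(x) of the logistic distribution."""
    return 1.0 / (1.0 + np.exp(-(x - mu) / gamma))

def logistic_pdf(x, mu=0.0, gamma=1.0):
    """Density f(x) = F'(x); it peaks at x = mu with value 1/(4*gamma)."""
    z = np.exp(-(x - mu) / gamma)
    return z / (gamma * (1.0 + z) ** 2)

x = np.linspace(-6, 6, 400)
plt.plot(x, logistic_cdf(x), label="F(x)")
plt.plot(x, logistic_pdf(x), label="f(x)")
plt.legend()
plt.show()
```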
The binomial logistic regression model is:
$$P(Y=1|x)=\frac {\exp(w \cdot x + b)} {1 + \exp(w \cdot x + b)}, \qquad P(Y=0|x)=\frac {1} {1 + \exp(w \cdot x + b)}$$
Here $x \in R^n$ is the input, $Y \in \{0,1\}$ is the output, $w$ is called the weight vector, $b$ is called the bias, and $w \cdot x$ is the inner product of $w$ and $x$.
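As a quick sanity check of the two formulas, the following snippet evaluates them for made-up values of $w$, $b$ and $x$ (chosen purely for illustration) and verifies that the two probabilities sum to 1:

```python
import numpy as np

w = np.array([0.5, -0.25])   # hypothetical weight vector
b = 0.1                      # hypothetical bias
x = np.array([2.0, 4.0])     # hypothetical input

z = np.dot(w, x) + b                 # w . x + b
p1 = np.exp(z) / (1.0 + np.exp(z))   # P(Y=1|x)
p0 = 1.0 / (1.0 + np.exp(z))         # P(Y=0|x)
print(p1, p0, p1 + p0)               # the two probabilities sum to 1
```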
Assume:
$$P(Y=1|x)=\pi (x), \quad P(Y=0|x)=1-\pi (x)$$
The likelihood function is then:
$$\prod_{i=1}^N [\pi (x_i)]^{y_i} [1 - \pi(x_i)]^{1-y_i}$$
Taking the log-likelihood:
$$L(w) = \sum_{i=1}^N [y_i \log{\pi(x_i)} + (1-y_i) \log{(1 - \pi(x_i))}] = \sum_{i=1}^N \left[y_i \log{\frac {\pi (x_i)} {1 - \pi(x_i)}} + \log{(1 - \pi(x_i))}\right]$$
Maximizing $L(w)$ then yields the estimate of $w$.
The extremum can be found by methods such as gradient descent or gradient ascent.
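To make the connection to the update rule below explicit (this step is left implicit in the text), differentiating $L(w)$, with the bias $b$ absorbed into $w$ via a constant feature, gives the gradient that gradient ascent follows; it is exactly the "error times input" quantity computed in the code further below:

$$\nabla_w L(w) = \sum_{i=1}^N \left( y_i - \pi(x_i) \right) x_i$$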
The sigmoid function used by logistic regression is:
$$\sigma (z) = \frac 1 {1 + e^{-z}}$$
Assume the linear part of the logistic model (the input to the sigmoid) is:
$$y = w_0 + w_1 x_1 + w_2 x_2 + ... + w_n x_n$$
which can be written more compactly as:
$$y = w_0 + w^T x$$
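In the code listing further below, $w_0$ is handled by prepending a constant feature 1.0 to every sample, so the bias is learned as just another weight. A minimal sketch of this convention (the numbers are placeholders chosen only for illustration):

```python
import numpy as np

def predict_prob(weights, sample):
    """sigmoid(w0*1 + w1*x1 + ... + wn*xn), with the bias w0 carried as weights[0]."""
    z = np.dot(weights, sample)
    return 1.0 / (1.0 + np.exp(-z))

weights = np.array([0.1, 0.5, -0.25])   # hypothetical [w0, w1, w2]
sample = np.array([1.0, 2.0, 4.0])      # leading 1.0 plays the role of the bias input
print(predict_prob(weights, sample))    # P(Y=1|x) for this sample
```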
Gradient ascent keeps moving in the direction of steepest ascent, increasing $w$ by $\alpha \nabla_w f(w)$ at each step:
$$w = w + \alpha \nabla_w f(w) $$
where $\nabla_w f(w)$ is the gradient of the function and $\alpha$ is the step size.
The main idea of the algorithm is to use the sigmoid function of logistic regression to map the function value into a bounded range: since the sigmoid takes values between 0 and 1, the data can be separated into two classes, 0 and 1. In the code, all weights w are first initialized to 1; on each iteration the error is computed and the weights are adjusted according to that error.
```python
#!/usr/bin/env python
# Logistic regression trained with gradient ascent (batch and stochastic variants).
import time

import numpy
import matplotlib.pyplot as plt


def sigmoid(x):
    return 1.0 / (1 + numpy.exp(-x))


def loadData():
    """Read test.txt: two feature columns and a 0/1 label per line.
    A constant 1.0 is prepended to every sample so the bias is learned as w0."""
    dataMat = []
    laberMat = []
    with open("test.txt", 'r') as f:
        for line in f.readlines():
            arry = line.strip().split()
            dataMat.append([1.0, float(arry[0]), float(arry[1])])
            laberMat.append(float(arry[2]))
    return numpy.mat(dataMat), numpy.mat(laberMat).transpose()


def gradAscent(dataMat, laberMat, alpha=0.001, maxCycle=500):
    """Batch gradient ascent: every iteration uses all samples."""
    start_time = time.time()
    m, n = numpy.shape(dataMat)
    weights = numpy.ones((n, 1))
    for i in range(maxCycle):
        h = sigmoid(dataMat * weights)          # predicted probabilities
        error = laberMat - h                    # y - pi(x)
        weights += alpha * dataMat.transpose() * error
    print("duration of time:", time.time() - start_time)
    return weights


def stocGradAscent(dataMat, laberMat, alpha=0.01):
    """Stochastic gradient ascent: one pass over the data, updating on one sample at a time."""
    start_time = time.time()
    m, n = numpy.shape(dataMat)
    weights = numpy.ones((n, 1))
    for i in range(m):
        h = sigmoid(dataMat[i] * weights)
        error = laberMat[i] - h
        weights += alpha * dataMat[i].transpose() * error
    print("duration of time:", time.time() - start_time)
    return weights


def betterStocGradAscent(dataMat, laberMat, alpha=0.01, numIter=150):
    """Improved stochastic gradient ascent with a step size that shrinks over time."""
    start_time = time.time()
    m, n = numpy.shape(dataMat)
    weights = numpy.ones((n, 1))
    for j in range(numIter):
        for i in range(m):
            alpha = 4.0 / (1.0 + j + i) + 0.01  # dynamic step size
            h = sigmoid(dataMat[i] * weights)
            error = laberMat[i] - h
            weights += alpha * dataMat[i].transpose() * error
    print("duration of time:", time.time() - start_time)
    return weights


def show(dataMat, laberMat, weights):
    """Scatter the two classes and draw the decision boundary w0 + w1*x1 + w2*x2 = 0."""
    m, n = numpy.shape(dataMat)
    min_x = dataMat[:, 1].min()
    max_x = dataMat[:, 1].max()
    xcoord1, ycoord1 = [], []
    xcoord2, ycoord2 = [], []
    for i in range(m):
        if int(laberMat[i, 0]) == 0:
            xcoord1.append(dataMat[i, 1])
            ycoord1.append(dataMat[i, 2])
        elif int(laberMat[i, 0]) == 1:
            xcoord2.append(dataMat[i, 1])
            ycoord2.append(dataMat[i, 2])
    fig = plt.figure()
    ax = fig.add_subplot(111)
    ax.scatter(xcoord1, ycoord1, s=30, c="red", marker="s")
    ax.scatter(xcoord2, ycoord2, s=30, c="green")
    weights = numpy.asarray(weights).flatten()
    x = numpy.arange(min_x, max_x, 0.1)
    y = (-weights[0] - weights[1] * x) / weights[2]
    ax.plot(x, y)
    plt.xlabel("x1")
    plt.ylabel("x2")
    plt.show()


if __name__ == "__main__":
    dataMat, laberMat = loadData()
    # weights = gradAscent(dataMat, laberMat, maxCycle=500)
    # weights = stocGradAscent(dataMat, laberMat)
    weights = betterStocGradAscent(dataMat, laberMat, numIter=80)
    show(dataMat, laberMat, weights)
```
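Note that loadData expects a whitespace-separated file named test.txt, each line holding two feature values followed by a 0/1 label. The rows below are made-up numbers shown only to illustrate the layout:

```
1.2   3.4   0
2.5   0.8   1
0.7   2.9   0
```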
The result of the unoptimized (batch) version is as follows.
The stochastic gradient ascent algorithm (fewer iterations, so it runs faster, but the result is less accurate) gives the following.
After optimizing $\alpha$ by dynamically adjusting the step size (again with fewer iterations, but the dynamic alpha improves the accuracy of the result), the result is as follows.