Logistic regression (also known as log-odds regression) can be used to solve both binary and multi-class classification problems. In classification, the output set is no longer continuous but discrete, i.e. \(\mathcal{Y} = \{0, 1, 2, \cdots\}\). Taking binary classification as an example, the output set is usually \(\mathcal{Y} = \{0, 1\}\).
To solve the binary classification problem, logistic regression builds on linear regression by introducing the sigmoid function (also called the logistic function), where \(\exp(\cdot)\) is the natural exponential:
\[ g(z) = \dfrac{1}{1 + \exp(-z)} \]
The range of this function is \((0, 1)\), as shown below:
[Figure: the S-shaped curve of the sigmoid function \(g(z)\)]
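As a quick aside, here is a minimal NumPy sketch of this function; the name `sigmoid` and the input clipping (a common guard against overflow in `exp`) are my own additions, not part of the original text:

```python
import numpy as np

def sigmoid(z):
    """g(z) = 1 / (1 + exp(-z)); clipping z avoids overflow in exp for large |z|."""
    return 1.0 / (1.0 + np.exp(-np.clip(z, -500, 500)))

print(sigmoid(np.array([-10.0, 0.0, 10.0])))  # ~[4.54e-05, 0.5, 0.99995]
```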
The hypothesis of logistic regression is therefore defined as:
\[ h_\theta (x) = g ( \theta^T x ) \]
In fact, \(h_{\theta}(x)\) gives the probability that the label \(y = 1\), conditioned on the parameters \(\theta\) and the sample \(x\):
\[ \begin{aligned}& h_\theta(x) = P(y=1 | x ; \theta) = 1 - P(y=0 | x ; \theta) \\& P(y = 0 | x;\theta) + P(y = 1 | x ; \theta) = 1\end{aligned} \]
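To make this probabilistic reading concrete, a small sketch reusing the `sigmoid` helper above; the names `predict_proba`/`predict` and the 0.5 decision threshold are illustrative assumptions, not from the original text:

```python
def predict_proba(theta, x):
    """P(y = 1 | x; theta) = g(theta^T x) for a single sample x."""
    return sigmoid(theta @ x)  # reuses the sigmoid sketch above

def predict(theta, x, threshold=0.5):
    """Classify as 1 when the estimated probability crosses the threshold."""
    return int(predict_proba(theta, x) >= threshold)
```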
The loss function of logistic regression is as follows:
\[ J(\theta) = \dfrac{1}{n} \sum_{i=1}^n \mathrm{Cost}(h_\theta(x^{(i)}),y^{(i)}) \\ \mathrm{Cost}(h_\theta(x^{(i)}),y^{(i)}) =\left\{ \begin{aligned} &-\log(h_\theta(x^{(i)})) \; & \text{if }y^{(i)} = 1\\ &-\log(1-h_\theta(x^{(i)})) \; & \text{if } y^{(i)} = 0 \end{aligned} \right. \]
When \(y^{(i)} = 1\), the cost \(-\log(h_\theta(x^{(i)}))\) vanishes as the predicted probability approaches 1 and grows without bound as it approaches 0, so confident but wrong predictions are penalized heavily; the \(y^{(i)} = 0\) case is symmetric.
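For example, for a positive sample (\(y^{(i)} = 1\)): a confident correct prediction \(h_\theta(x^{(i)}) = 0.9\) costs \(-\log 0.9 \approx 0.105\), while a confident wrong prediction \(h_\theta(x^{(i)}) = 0.1\) costs \(-\log 0.1 \approx 2.303\).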
This loss function is derived via maximum likelihood. For a given input set \(\mathcal{X}\) and output set \(\mathcal{Y}\), the likelihood function is (note that each factor reduces to \(h_\theta(x^{(i)})\) when \(y^{(i)} = 1\) and to \(1 - h_\theta(x^{(i)})\) when \(y^{(i)} = 0\)):
\[ \prod _{i = 1}^n \left[h_\theta(x^{(i)})\right]^{y^{(i)}}\left[1 - h_\theta(x^{(i)})\right]^{1 - y^{(i)}} \]
Since a product is hard to optimize, we take the logarithm to convert it into a sum; averaging over the \(n\) samples (which does not change the maximizer) gives the log-likelihood:
\[ L(\theta)=\frac{1}{n} \sum _{i=1}^n \left[ y^{(i)} \log h_\theta(x^{(i)}) + (1-y^{(i)})\log(1 - h_\theta(x^{(i)})) \right ] \]
Maximizing this log-likelihood yields the optimal parameters \(\theta\). Since maximizing \(L(\theta)\) is equivalent to minimizing \(-L(\theta)\), we obtain the loss function in the following form:
\[ J(\theta) = -\frac{1}{n} \sum _{i=1}^n \left[ y^{(i)} \log h_\theta(x^{(i)}) + (1-y^{(i)})\log(1 - h_\theta(x^{(i)})) \right ] \]
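For reference, a vectorized NumPy sketch of this loss, reusing the `sigmoid` helper from the earlier sketch; the `eps` clipping is a numerical safeguard I added so the logarithm stays finite, not part of the derivation:

```python
def log_loss(theta, X, y, eps=1e-12):
    """J(theta) = -(1/n) * sum[ y*log(h) + (1-y)*log(1-h) ] over all n samples."""
    h = sigmoid(X @ theta)           # h_theta(x^(i)) for every sample, vectorized
    h = np.clip(h, eps, 1.0 - eps)   # keep log() away from log(0)
    return -np.mean(y * np.log(h) + (1.0 - y) * np.log(1.0 - h))
```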
With the loss function in hand, we use gradient descent to find its minimum. First, simplify the loss function:
\[ \begin{aligned} J(\theta) &=-\frac{1}{n} \sum _{i=1}^n \left[ y^{(i)} \log h_\theta(x^{(i)}) + (1-y^{(i)})\log(1 - h_\theta(x^{(i)})) \right ] \\ &=-\frac{1}{n} \sum _{i=1}^n \left[ y^{(i)}\log \frac {h_\theta(x^{(i)})} {1 - h_\theta(x^{(i)})} + \log(1 - h_\theta(x^{(i)})) \right ] \\ &=-\frac{1}{n} \sum _{i=1}^n \left[ y^{(i)} \log \frac { {\exp(\theta\cdot x^{(i)})} / (1 + \exp(\theta\cdot x^{(i)}))} {{1} /(1 + \exp(\theta\cdot x^{(i)}))} + \log(1 - h_\theta(x^{(i)})) \right ] \\ &=-\frac{1}{n} \sum _{i=1}^n \left[ y^{(i)} (\theta\cdot x^{(i)}) - \log(1 + \exp (\theta\cdot x^{(i)})) \right ] \end{aligned} \]
Here the last step uses \(h_\theta(x^{(i)}) = \exp(\theta\cdot x^{(i)})/(1+\exp(\theta\cdot x^{(i)}))\), so that \(\log(1 - h_\theta(x^{(i)})) = -\log(1 + \exp(\theta\cdot x^{(i)}))\).
Take the partial derivative of the loss \(J(\theta)\) with respect to the parameters \(\theta\):
\[ \begin{aligned} \frac{\partial}{\partial \theta}J(\theta) &=-\frac{1}{n} \sum _{i=1}^n \left [y^{(i)} \cdot x^{(i)} - \frac {1} {1 + \exp(\theta \cdot x^{(i)})} \cdot \exp(\theta \cdot x^{(i)}) \cdot x^{(i)}\right ] \\ &=-\frac{1}{n} \sum _{i=1}^n \left [y^{(i)} \cdot x^{(i)} - \frac {\exp(\theta \cdot x^{(i)})} {1 + \exp(\theta \cdot x^{(i)})} \cdot x^{(i)}\right ] \\ &=-\frac{1}{n} \sum _{i=1}^n \left (y^{(i)} - \frac {\exp(\theta \cdot x^{(i)})} {1 + \exp(\theta \cdot x^{(i)})} \right ) x^{(i)}\\ &=\frac{1}{n} \sum _{i=1}^n \left (h_\theta(x^{(i)})-y^{(i)} \right )x^{(i)} \end{aligned} \]
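In matrix form, the last line is \(\frac{1}{n} X^T (h - y)\). A minimal sketch under the same assumptions as the loss sketch above (reusing its `sigmoid`):

```python
def gradient(theta, X, y):
    """(1/n) * sum_i (h_theta(x^(i)) - y^(i)) x^(i), i.e. X^T (h - y) / n."""
    n = X.shape[0]
    h = sigmoid(X @ theta)  # sigmoid as defined in the earlier sketch
    return X.T @ (h - y) / n
```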
Gradient descent then updates each parameter \(\theta_j\) in turn, with learning rate \(\alpha\):
\[ \theta_j := \theta_j - \frac{\alpha}{n} \sum_{i=1}^n \left(h_\theta(x^{(i)}) - y^{(i)}\right) x_j^{(i)} \]
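Putting the pieces together, a toy batch-gradient-descent loop; the step size \(\alpha = 0.1\) and iteration count are arbitrary demo values, and `fit` reuses the `gradient` and `sigmoid` sketches above:

```python
import numpy as np

def fit(X, y, alpha=0.1, n_iters=5000):
    """Batch gradient descent for logistic regression (illustrative sketch)."""
    theta = np.zeros(X.shape[1])
    for _ in range(n_iters):
        theta -= alpha * gradient(theta, X, y)  # update every theta_j simultaneously
    return theta

# Toy check on a separable 1-D problem; the first column is the intercept term.
X = np.array([[1.0, -2.0], [1.0, -1.0], [1.0, 1.0], [1.0, 2.0]])
y = np.array([0.0, 0.0, 1.0, 1.0])
theta = fit(X, y)
print(sigmoid(X @ theta))  # predicted probabilities approach [0, 0, 1, 1]
```

In practice one would also monitor the loss (e.g. the `log_loss` sketch above) across iterations to decide when to stop.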