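This article summarizes my notes on Zhou Zhihua's "Watermelon Book" and the "Pumpkin Book", and organizes the line of reasoning.
Linear model: a linear combination of the attributes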
\[ f(\boldsymbol{x})=w_{1} x_{1}+w_{2} x_{2}+\ldots+w_{d} x_{d}+b=\omega^Tx+b \]
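As a minimal sketch of this formula, the prediction is a dot product plus a bias. The function name `predict` and the sample numbers are illustrative assumptions, not taken from the book.

```python
def predict(w, x, b):
    """Return f(x) = w_1*x_1 + ... + w_d*x_d + b for a single sample x."""
    if len(w) != len(x):
        raise ValueError("w and x must have the same length")
    return sum(w_j * x_j for w_j, x_j in zip(w, x)) + b


# Hypothetical weights and one sample with d = 3 attributes.
w = [0.5, -1.0, 2.0]
x = [2.0, 1.0, 3.0]
b = 1.0
y_hat = predict(w, x, b)  # 0.5*2 - 1.0*1 + 2.0*3 + 1.0 = 7.0
```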
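Basic ideas behind it:
Many powerful nonlinear models are obtained from linear models by introducing hierarchical structures or high-dimensional mappings.
The weights \(\omega\) intuitively express the importance of each attribute in the prediction.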
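Representation of the dataset: \(D=\{(x_{i}, y_{i})\}_{i=1}^{m}\), that is, \(m\) samples with input \(x_i\) and target \(y_i\).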
For regression problems, the goal is to minimize the loss function.
Loss function: the least-squares error (sum of squared errors)
\[ \begin{aligned}\left(w^{*}, b^{*}\right) &=\underset{(w, b)}{\arg \min } \sum_{i=1}^{m}\left(f\left(x_{i}\right)-y_{i}\right)^{2} \\ &=\underset{(w, b)}{\arg \min } \sum_{i=1}^{m}\left(y_{i}-w x_{i}-b\right)^{2} \end{aligned} \]
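To make the objective concrete, here is a small sketch that evaluates the sum of squared errors for a given \(w\) and \(b\) in the univariate case. The function name and the toy data are assumptions made for illustration.

```python
def squared_error_loss(w, b, xs, ys):
    """Sum over all samples of (y_i - w*x_i - b)^2."""
    return sum((y - w * x - b) ** 2 for x, y in zip(xs, ys))


# Toy data generated by y = 2x + 1 (hypothetical).
xs = [1.0, 2.0, 3.0]
ys = [3.0, 5.0, 7.0]

loss_exact = squared_error_loss(2.0, 1.0, xs, ys)  # the true line: 0.0
loss_off = squared_error_loss(1.0, 1.0, xs, ys)    # 1 + 4 + 9 = 14.0
```

The pair \((w^{*}, b^{*})\) is whichever pair makes this number smallest.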
First, take the partial derivative with respect to \(b\) and set it to zero:
\[ 2\sum_{i=1}^m(y_i-\omega x_i-b)(-1)=0\\ \Rightarrow b={} \frac{1}{m} \sum_{i=1}^{m}\left(y_{i}-w x_{i}\right) \]
Next, take the partial derivative with respect to \(\omega\) and set it to zero:
\[ \begin{aligned} 0 &=w \sum_{i=1}^{m} x_{i}^{2}-\sum_{i=1}^{m}\left(y_{i}-b\right) x_{i} \\ \Rightarrow\ w \sum_{i=1}^{m} x_{i}^{2} &=\sum_{i=1}^{m} y_{i} x_{i}-\sum_{i=1}^{m} b x_{i} \end{aligned} \]
Substitute \(b=\bar{y}-w\bar{x}\) (the expression for \(b\) written in terms of the means) into the equation above:
\[ \Rightarrow w \sum_{i=1}^{m} x_{i}^{2}=\sum_{i=1}^{m} y_{i} x_{i}-\sum_{i=1}^{m}(\bar{y}-w \bar{x}) x_{i}\\ \Rightarrow w\left(\sum_{i=1}^{m} x_{i}^{2}-\bar{x} \sum_{i=1}^{m} x_{i}\right)=\sum_{i=1}^{m} y_{i} x_{i}-\bar{y} \sum_{i=1}^{m} x_{i} \\ \Rightarrow w=\frac{\sum_{i=1}^{m} y_{i} x_{i}-\bar{y} \sum_{i=1}^{m} x_{i}}{\sum_{i=1}^{m} x_{i}^{2}-\bar{x} \sum_{i=1}^{m} x_{i}} \]
Using the following two identities, this can be converted into the formula given in the Watermelon Book [trick]:
\[ \begin{aligned} \bar{y} \sum_{i=1}^{m} x_{i} ={} & \frac{1}{m} \sum_{i=1}^{m} y_{i} \sum_{i=1}^{m} x_{i}=\bar{x} \sum_{i=1}^{m} y_{i} \\ \bar{x} \sum_{i=1}^{m} x_{i} ={} & \frac{1}{m} \sum_{i=1}^{m} x_{i} \sum_{i=1}^{m} x_{i}=\frac{1}{m}\left(\sum_{i=1}^{m} x_{i}\right)^{2} \end{aligned} \]
Finally we obtain:
\[ \Rightarrow w=\frac{\sum_{i=1}^{m} y_{i}\left(x_{i}-\bar{x}\right)}{\sum_{i=1}^{m} x_{i}^{2}-\frac{1}{m}\left(\sum_{i=1}^{m} x_{i}\right)^{2}} \]
This gives the closed-form solution for the optimal \(\omega\) and \(b\):
\[ \begin{aligned} w={} &\frac{\sum_{i=1}^{m} y_{i}\left(x_{i}-\bar{x}\right)}{\sum_{i=1}^{m} x_{i}^{2}-\frac{1}{m}\left(\sum_{i=1}^{m} x_{i}\right)^{2}}\\ b={} &\frac{1}{m} \sum_{i=1}^{m}\left(y_{i}-w x_{i}\right) \end{aligned} \]
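The closed form translates almost line by line into code. The sketch below is one possible implementation in plain Python; the function name `fit_univariate` and the data are assumptions for illustration.

```python
def fit_univariate(xs, ys):
    """Closed-form least-squares fit of y = w*x + b for one attribute."""
    m = len(xs)
    x_bar = sum(xs) / m
    numerator = sum(y * (x - x_bar) for x, y in zip(xs, ys))
    denominator = sum(x * x for x in xs) - sum(xs) ** 2 / m
    if denominator == 0:
        # All x_i are equal, so the slope is not determined by the data.
        raise ValueError("all x values are identical; w is not determined")
    w = numerator / denominator
    b = sum(y - w * x for x, y in zip(xs, ys)) / m
    return w, b


# Hypothetical noisy measurements roughly following y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.1, 4.9, 7.2, 8.8]
w, b = fit_univariate(xs, ys)  # w is about 1.94, b is about 1.15
```

Note that \(b\) is computed from \(w\), matching the order of the derivation: \(w\) first, then \(b\).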
Substituting \(\frac{1}{m}\left(\sum_{i=1}^{m} x_{i}\right)^{2}=\bar{x} \sum_{i=1}^{m} x_{i}\) into the denominator gives:
\[ \begin{aligned} w &=\frac{\sum_{i=1}^{m} y_{i}\left(x_{i}-\bar{x}\right)}{\sum_{i=1}^{m} x_{i}^{2}-\bar{x} \sum_{i=1}^{m} x_{i}} \\ &=\frac{\sum_{i=1}^{m}\left(y_{i} x_{i}-y_{i} \bar{x}\right)}{\sum_{i=1}^{m}\left(x_{i}^{2}-x_{i} \bar{x}\right)} \end{aligned} \]
Using the following two identities [trick]:
\[ \bar{y} \sum_{i=1}^{m} x_{i}=\bar{x} \sum_{i=1}^{m} y_{i}=\sum_{i=1}^{m} \bar{y} x_{i}=\sum_{i=1}^{m} \bar{x} y_{i}=m \bar{x} \bar{y}=\sum_{i=1}^{m} \bar{x} \bar{y} \\ \sum_{i=1}^{m} x_{i}\bar{x}=\bar{x} \sum_{i=1}^{m} x_{i}=\bar{x} \cdot m \cdot \frac{1}{m} \sum_{i=1}^{m} x_{i}=m \bar{x}^{2}=\sum_{i=1}^{m} \bar{x}^{2} \]
the expression for \(\omega\) can be rewritten as:
\[ \begin{aligned} w &=\frac{\sum_{i=1}^{m}\left(y_{i} x_{i}-y_{i} \bar{x}-x_{i} \bar{y}+\bar{x} \bar{y}\right)}{\sum_{i=1}^{m}\left(x_{i}^{2}-x_{i} \bar{x}-x_{i} \bar{x}+\bar{x}^{2}\right)} \\ &=\frac{\sum_{i=1}^{m}\left(x_{i}-\bar{x}\right)\left(y_{i}-\bar{y}\right)}{\sum_{i=1}^{m}\left(x_{i}-\bar{x}\right)^{2}} \end{aligned} \]
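A quick numerical check can confirm that the algebraic rewrites leave \(w\) unchanged. The sketch below evaluates three forms of the slope from the derivation on the same hypothetical data; the function name is an assumption.

```python
def slope_three_ways(xs, ys):
    """Evaluate three algebraically equivalent expressions for w."""
    m = len(xs)
    x_bar = sum(xs) / m
    y_bar = sum(ys) / m
    sum_x = sum(xs)
    sum_xx = sum(x * x for x in xs)
    sum_xy = sum(x * y for x, y in zip(xs, ys))

    # Form obtained directly after substituting b.
    w_a = (sum_xy - y_bar * sum_x) / (sum_xx - x_bar * sum_x)
    # Watermelon Book form.
    w_b = sum(y * (x - x_bar) for x, y in zip(xs, ys)) / (sum_xx - sum_x ** 2 / m)
    # Centered form: sum of (x - x_bar)(y - y_bar) over sum of (x - x_bar)^2.
    w_c = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
           / sum((x - x_bar) ** 2 for x in xs))
    return w_a, w_b, w_c


w_a, w_b, w_c = slope_three_ways([1.0, 2.0, 3.0, 4.0], [3.1, 4.9, 7.2, 8.8])
```

The three values agree up to floating-point rounding; the centered form is usually the better-behaved one numerically because it avoids subtracting two large sums.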
Let \(\boldsymbol{x}_{d}=\left(x_{1}-\bar{x}, x_{2}-\bar{x}, \ldots, x_{m}-\bar{x}\right)^{T}\) and \(\boldsymbol{y}_{d}=\left(y_{1}-\bar{y}, y_{2}-\bar{y}, \dots, y_{m}-\bar{y}\right)^{T}\).
The vectorized result is:
\[ w=\frac{\boldsymbol{x}_{d}^{T} \boldsymbol{y}_{d}}{\boldsymbol{x}_{d}^{T} \boldsymbol{x}_{d}} \]
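The vectorized form maps directly onto array operations. The following is a sketch assuming NumPy is available; the function name and data are illustrative.

```python
import numpy as np


def fit_slope_vectorized(x, y):
    """Return w = (x_d . y_d) / (x_d . x_d) with centered vectors x_d, y_d."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_d = x - x.mean()  # (x_1 - x_bar, ..., x_m - x_bar)
    y_d = y - y.mean()  # (y_1 - y_bar, ..., y_m - y_bar)
    return (x_d @ y_d) / (x_d @ x_d)


# Hypothetical data.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([3.1, 4.9, 7.2, 8.8])
w = fit_slope_vectorized(x, y)
b = y.mean() - w * x.mean()  # b = y_bar - w * x_bar
```

As a sanity check, the result can be compared with `np.polyfit(x, y, 1)`, which fits the same degree-1 least-squares line.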
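Derivation: for univariate linear regression, we first took the partial derivative with respect to \(b\) and then substituted \(b\) into the partial-derivative equation for \(\omega\).
For the multivariate case, the extremum of the loss function cannot be solved for directly in this one-variable-at-a-time way, so we rewrite the expression: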
\[ f(x_{i})=\omega^T x_i +b = \beta^T X \]
where
\[ \beta = (\omega,b) \\ X = (x_i,1) \]
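The rewrite only folds the bias into the weight vector by appending a constant 1 to each sample. A small sketch, assuming NumPy and hypothetical numbers, shows that both forms give the same prediction:

```python
import numpy as np

w = np.array([0.5, -1.0, 2.0])   # hypothetical weights
b = 1.0                          # hypothetical bias
x_i = np.array([2.0, 1.0, 3.0])  # one hypothetical sample

beta = np.append(w, b)           # beta = (w, b)
x_aug = np.append(x_i, 1.0)      # X = (x_i, 1)

f_original = w @ x_i + b         # w^T x_i + b
f_augmented = beta @ x_aug       # beta^T X
```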
The detailed derivation is as follows (done by hand, not typed out here):
Differentiating the loss function:
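The handwritten derivation is not reproduced in this text. As a sketch of the standard least-squares result, assume (notation introduced here, not taken from the original) that \(\mathbf{X}\) is the \(m \times (d+1)\) matrix whose \(i\)-th row is \((x_i^{T}, 1)\) and that \(\boldsymbol{y}=(y_1, \ldots, y_m)^{T}\). Then:

\[ \begin{aligned} E(\beta) &=(\boldsymbol{y}-\mathbf{X} \beta)^{T}(\boldsymbol{y}-\mathbf{X} \beta) \\ \frac{\partial E}{\partial \beta} &=2 \mathbf{X}^{T}(\mathbf{X} \beta-\boldsymbol{y})=0 \\ \Rightarrow\ \mathbf{X}^{T} \mathbf{X} \hat{\beta} &=\mathbf{X}^{T} \boldsymbol{y} \\ \Rightarrow\ \hat{\beta} &=\left(\mathbf{X}^{T} \mathbf{X}\right)^{-1} \mathbf{X}^{T} \boldsymbol{y} \end{aligned} \]

The last step is valid only when \(\mathbf{X}^{T}\mathbf{X}\) is invertible.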
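If \(X^TX\) in the expression above is a full-rank or positive definite matrix, its inverse exists.
In practice, however, \(X^TX\) is often not full rank. For example, gene-chip data in bioinformatics often have thousands or tens of thousands of attributes but only tens or hundreds of samples. In that case multiple \(\hat{\omega}\) can be solved for, all of which minimize the mean squared error. A common remedy is to introduce a regularization term.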
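The situation can be reproduced on synthetic data. The sketch below assumes NumPy, uses made-up sizes and random data, and picks an L2 (ridge) penalty as one example of a regularization term; the source does not say which regularizer is meant. For brevity the penalty is also applied to the bias entry, which is a simplification.

```python
import numpy as np

rng = np.random.default_rng(0)
m, d = 5, 20  # hypothetical sizes: far fewer samples than attributes
X = np.hstack([rng.normal(size=(m, d)), np.ones((m, 1))])  # rows are (x_i, 1)
y = rng.normal(size=m)

# rank(X^T X) = rank(X) <= m < d + 1, so X^T X has no inverse here.
rank = np.linalg.matrix_rank(X)

# One least-squares solution (the minimum-norm one).
beta_min_norm, *_ = np.linalg.lstsq(X, y, rcond=None)

# Adding a null-space direction of X gives a different solution
# with exactly the same predictions, hence the same error.
_, _, vt = np.linalg.svd(X)
null_direction = vt[-1]  # X @ null_direction is numerically zero
beta_alt = beta_min_norm + null_direction


def ridge_fit(X, y, lam):
    """Solve (X^T X + lam * I) beta = X^T y, which is invertible for lam > 0."""
    n_params = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_params), X.T @ y)


beta_ridge = ridge_fit(X, y, lam=1e-2)  # a single, well-defined solution
```

Here `beta_min_norm` and `beta_alt` both fit the training data equally well, which is the non-uniqueness described above, while the penalty term makes the linear system uniquely solvable.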