NTU Machine Learning Notes: The Perceptron

I recently noticed that my grasp of machine learning is not systematic and has quite a few gaps, so I decided to find a somewhat challenging course and study it carefully. NTU's machine learning course has real depth yet explains things in an accessible way, so I plan to take notes as I work through it. I have decided to keep these notes in an IPython notebook.

Course Introduction

What is machine learning?

learning: acquiring skill with experience accumulated from observation

%%dot 
digraph G {
       rankdir=LR; observations -> learning -> skill
    }

Machine learning: acquiring skill with experience accumulated/computed from data

%%dot 
digraph G {
        rankdir=LR;
        data -> ML -> skill;
    }

skill: improve some performance measure (e.g. prediction accuracy)

%%dot 
digraph G {
        rankdir=LR;
        data -> ML -> "improved performance measure";
    }

Why Use Machine Learning

  • when humans cannot program the system manually
  • when humans cannot 'define the solution' easily
  • when rapid decisions are needed that humans cannot make (high-frequency trading)
  • when a massive number of users must be served (large-scale personalized service)

Key Essence of Machine Learning

  • exists some 'underlying pattern' to be learned ('performance measure' can be improved)
  • but no programmable (easy) definition
  • somehow there is data about the pattern

Formalize the Learning Problem

  • input: \(x\in X\) (customer application)
  • output: \(y\in Y\) (good/bad after approving credit card)
  • unknown pattern to be learned \(\Leftrightarrow\) target function: \(f: X \to Y\) (ideal credit approval formula)
  • data \(\Leftrightarrow\) training examples: \(D=\{(x_1, y_1), (x_2, y_2), \ldots, (x_N, y_N)\}\)
  • hypothesis \(\Leftrightarrow\) skill with hopefully good performance: \(g : X \to Y\) ('learned' formula to be used)
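
To make the notation concrete, here is a small Python sketch of these objects (the toy features and the linear form of f are purely my own illustration; in reality f is unknown and need not be linear):

import numpy as np

rng = np.random.default_rng(0)

# input space X: each x is a customer feature vector, with bias term x_0 = 1
N, d = 100, 3
X = np.hstack([np.ones((N, 1)), rng.normal(size=(N, d))])

# target function f: X -> Y = {-1, +1} (the ideal credit approval formula);
# we pretend it is linear here purely to generate labels
w_star = rng.normal(size=d + 1)
f = lambda x: np.sign(x @ w_star)

# training examples D = {(x_1, y_1), ..., (x_N, y_N)}
y = f(X)
D = list(zip(X, y))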

The General Flow of Machine Learning

%%dot
digraph G {

rankdir=LR;a -> b;
b->c;
c -> d;
e ->c;
a [shape=box,sides=4,skew=.4,color=lightblue,style=filled,label="unknown target function\n f: X -> Y"];
b [shape=box,sides=4,skew=.4,label="training examples\n D: (x_1,y_1),...,(x_N,y_N)"]
c [label="learning algorithm A"];
d [shape=box, label="final hypothesis\n g"];
e [label="hypothesis set\n H"];
}

Perceptron Learning Algorithm (感知機)

The perceptron's workflow is as follows:
We want to find a w such that \(sign(w^Tx)\) matches y on every point, i.e. the line defined by w separates our data set.
\(w_0\) is initialized to 0.

For t = 0,1,...

  • Under \(w_t\), find a mistake \((x_{n(t)}, y_{n(t)})\), i.e. a point with \(sign(w_t^Tx_{n(t)})\not=y_{n(t)}\)
  • Try to correct the mistake by updating \(w_{t+1}\gets w_t + y_{n(t)}x_{n(t)}\)

The essence of this method is learning from one's mistakes: a fault confessed is half redressed! A code sketch follows.
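
Below is a minimal NumPy sketch of this loop (the cyclic variant that always fixes the first mistake it finds; the function name pla and the max_iter safety cap are my own additions, not from the course):

import numpy as np

def pla(X, y, max_iter=100000):
    """Perceptron Learning Algorithm.
    X: (N, d) inputs with the bias coordinate x_0 = 1 already included.
    y: (N,) labels in {-1, +1}.
    Returns w with sign(X @ w) == y everywhere, if the data are separable."""
    w = np.zeros(X.shape[1])                        # w_0 initialized to 0
    for _ in range(max_iter):
        mistakes = np.nonzero(np.sign(X @ w) != y)[0]
        if len(mistakes) == 0:                      # no mistakes left: halt
            return w
        n = mistakes[0]                             # a misclassified (x_n, y_n)
        w = w + y[n] * X[n]                         # w_{t+1} <- w_t + y_n x_n
    raise RuntimeError("no convergence; data may not be linearly separable")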

Guarantee of PLA

For PLA to halt at all, the data set must be linearly separable. But if the data set is linearly separable, is PLA guaranteed to halt?

Let \(w_f\) be the weights of a perfect line that separates the data set; since it classifies every point correctly, we have:

\(y_{n(t)}w_f^Tx_{n(t)}\ge\min\limits_{n}y_nw_f^Tx_n>0\)

From this we can show that \(w_f^Tw_t\) grows larger and larger as \(w_t\) is updated on the mistakes \((x_{n(t)},y_{n(t)})\):

\[ w_f^Tw_{t+1} = w_f^T(w_t + y_{n(t)}x_{n(t)}) \ge w_f^Tw_t + \min\limits_{n}y_nw_f^Tx_n > w_f^Tw_t + 0 \]

That \(w_f^Tw_t\) keeps growing can mean two things:

  1. the two vectors are becoming more aligned
  2. the length of \(w_t\) is growing

So next we must show that the length of \(w_t\) cannot grow too quickly, i.e. that the growth of \(w_f^Tw_t\) mostly reflects alignment. We have:

\[ \begin{align} ||w_{t+1}||^2 &= || w_t + y_{n(t)}x_{n(t)}||^2 \\ &= ||w_t||^2 + 2y_{n(t)}w_t^Tx_{n(t)} + ||y_{n(t)}x_{n(t)}||^2 \\ &\le ||w_t||^2 + 0 + ||y_{n(t)}x_{n(t)}||^2 \\ &\le ||w_t||^2 + \max\limits_{n}||y_nx_n||^2 \end{align} \]

That is, \(||w_T||^2 \le T\max\limits_{n}||y_nx_n||^2\) after T updates; for a fixed training set the second factor is a constant, so \(||w_T||\) grows at most like \(\sqrt{T}\) while \(w_f^Tw_T\) grows linearly in T. This shows that \(w_f\) and \(w_t\) become more and more aligned as the iterations proceed.

In fact, we can prove that the number of updates T is bounded, because

\[ \frac{w_f^T}{||w_f||}\frac{w_T}{||w_T||}\ge\sqrt{T}\cdot \text{constant} \]

The proof is as follows (recall that \(w_0 = 0\)):

\[ \begin{align} w_f^Tw_T &= w_f^T(w_{T-1} + y_{n(T-1)}x_{n(T-1)}) \\ &\ge w_f^Tw_{T-1} + \min\limits_{n}y_nw_f^Tx_n \\ &\ge w_f^Tw_0 + T\cdot\min\limits_{n}y_nw_f^Tx_n \\ &= T\cdot\min\limits_{n}y_nw_f^Tx_n \end{align} \]

As for \(||w_T||^2\), we have

\[ \begin{align} ||w_T||^2 &= ||w_{T-1} + y_{n(T-1)}x_{n(T-1)}||^2 \\ &\le ||w_{T-1}||^2 + \max\limits_{n}||x_n||^2 \\ &\le T\max\limits_{n}||x_n||^2 \end{align} \]

(as shown above, the cross term is non-positive on a mistake, and \(||y_nx_n||^2 = ||x_n||^2\) since \(y_n=\pm1\))

Combining the two results gives

\[ \frac{w_f^T}{||w_f||}\frac{w_T}{||w_T||}\ge\sqrt{T}\cdot \frac{\min\limits_{n}y_nw_f^Tx_n}{||w_f||\sqrt{\max\limits_{n}||x_n||^2}} \]
On the other hand, since the left-hand side is the cosine of the angle between \(w_f\) and \(w_T\), we always have

\[ \frac{w_f^T}{||w_f||}\frac{w_T}{||w_T||}\le1 \]

Combining these two facts, we finally obtain

\[ T\le\frac{\max\limits_{n}||x_n||^2\cdot ||w_f||^2}{\left(\min\limits_{n}y_nw_f^Tx_n\right)^2} \]
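
As a quick sanity check (my own experiment, not part of the lecture), we can run the update loop on separable toy data, count the updates T, and confirm that it stays under this bound:

import numpy as np

rng = np.random.default_rng(1)
N = 200
# separable toy data: bias x_0 = 1 plus two features, labeled by a known w_f
X = np.hstack([np.ones((N, 1)), rng.uniform(-1, 1, size=(N, 2))])
w_f = np.array([0.1, 1.0, -1.0])
y = np.sign(X @ w_f)

w, T = np.zeros(3), 0                       # run PLA, counting updates
while True:
    mistakes = np.nonzero(np.sign(X @ w) != y)[0]
    if len(mistakes) == 0:
        break
    w += y[mistakes[0]] * X[mistakes[0]]
    T += 1

R2 = np.max(np.sum(X**2, axis=1))           # max_n ||x_n||^2
rho = np.min(y * (X @ w_f))                 # min_n y_n w_f^T x_n, > 0 here
print(T, "<=", R2 * (w_f @ w_f) / rho**2)   # T never exceeds the bound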

Pros and Cons of PLA

  • Pros: simple, fast, and it generalizes easily to high dimensions
  • Cons: PLA only halts if the data set is linearly separable, and we cannot know the running time in advance

Pocket Algorithm

In practice we rarely get a perfectly linearly separable data set; the data may contain noise. To cope with noise, we adopt a new weight-update strategy (a code sketch follows the steps below).

Initialize the weights to \(\hat{w}\) and save it, which amounts to keeping the best w found so far 'in the pocket'.

For t = 0,1,...

  • Under \(w_t\), randomly find a mistake \((x_{n(t)}, y_{n(t)})\) with \(sign(w_t^Tx_{n(t)})\not=y_{n(t)}\)
  • Try to correct the mistake by updating \(w_{t+1}\gets w_t + y_{n(t)}x_{n(t)}\)
  • If the line \(w_{t+1}\) makes fewer mistakes on the data than \(\hat{w}\), replace \(\hat{w}\) with \(w_{t+1}\)
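
A minimal sketch of this strategy, using the same data conventions as the PLA sketch above (the function name pocket and the fixed iteration budget are my own choices):

import numpy as np

def pocket(X, y, max_iter=1000, seed=0):
    """Pocket algorithm: PLA-style updates on random mistakes, while keeping
    the best weight vector seen so far 'in the pocket'."""
    rng = np.random.default_rng(seed)
    errors = lambda v: np.count_nonzero(np.sign(X @ v) != y)
    w = np.zeros(X.shape[1])
    w_hat, best = w.copy(), errors(w)        # pocket weights and their error count
    for _ in range(max_iter):
        mistakes = np.nonzero(np.sign(X @ w) != y)[0]
        if len(mistakes) == 0:
            return w                         # perfectly separated already
        n = rng.choice(mistakes)             # a random misclassified point
        w = w + y[n] * X[n]                  # usual PLA correction
        if errors(w) < best:                 # better than the pocket? swap it in
            w_hat, best = w.copy(), errors(w)
    return w_hat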