NTU Machine Learning Notes: The Perceptron

I recently noticed that my grasp of machine learning is not systematic and has quite a few gaps, so I decided to find a somewhat challenging course and study it carefully. NTU's machine learning course has real depth yet explains things in an accessible way, so I plan to take notes as I work through it. I have decided to keep these notes in an IPython notebook.

Course Introduction

What is machine learning?

learning: acquiring skill with experience accumulated from observation

%%dot 
digraph G {
       rankdir=LR; observations -> learning -> skill
    }

Machine learning: acquiring skill with experience accumulated/computed from data

%%dot 
digraph G {
        rankdir=LR;
        data -> ML -> skill;
    }

skill: improve some performance measure (e.g. prediction accuracy)

%%dot 
digraph G {
        rankdir=LR;
        data -> ML -> "improved performance measure";
    }

Why Use Machine Learning

  • when humans cannot program the system manually
  • when humans cannot 'define the solution' easily
  • when rapid decisions are needed that humans cannot make (high-frequency trading)
  • when a massive number of users must be served (large-scale personalized service)

Key Essence of Machine Learning

  • exists some 'underlying pattern' to be learned ('performance measure' can be improved)
  • but no programmable (easy) definition
  • somehow there is data about the pattern

Formalize the Learning Problem

  • input: \(x\in X\) (customer application)
  • output: \(y\in Y\) (good/bad after approving credit card)
  • unknown pattern to be learned \(\Leftrightarrow\) target function: \(f: X \to Y\) (ideal credit approval formula)
  • data \(\Leftrightarrow\) training examples: \(D=\{(x_1, y_1), (x_2, y_2), \ldots, (x_N, y_N)\}\)
  • hypothesis \(\Leftrightarrow\) skill with hopefully good performance: \(g : X \to Y\) ('learned' formula to be used)
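
To make the notation concrete, here is a small Python sketch of these objects (the toy features and the linear form of f are purely my own illustration; in reality f is unknown and need not be linear):

import numpy as np

rng = np.random.default_rng(0)

# input space X: each x is a customer feature vector, with bias term x_0 = 1
N, d = 100, 3
X = np.hstack([np.ones((N, 1)), rng.normal(size=(N, d))])

# target function f: X -> Y = {-1, +1} (the ideal credit approval formula);
# we pretend it is linear here purely to generate labels
w_star = rng.normal(size=d + 1)
f = lambda x: np.sign(x @ w_star)

# training examples D = {(x_1, y_1), ..., (x_N, y_N)}
y = f(X)
D = list(zip(X, y))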

The General Flow of Machine Learning

%%dot
digraph G {

rankdir=LR;a -> b;
b->c;
c -> d;
e ->c;
a [shape=box,sides=4,skew=.4,color=lightblue,style=filled,label="unknown target function\n f: X -> Y"];
b [shape=box,sides=4,skew=.4,label="training examples\n D: (x_1,y_1),...,(x_N,y_N)"]
c [label="learning algorithm A"];
d [shape=box, label="final hypothesis\n g"];
e [label="hypothesis set\n H"];
}

Perceptron Learning Algorithm (感知機)

The perceptron's workflow is as follows:
We want to find a w such that \(sign(w^Tx)\) matches y on every point, i.e. the line defined by w separates our data set.
\(w_0\) is initialized to 0.

For t = 0,1,...

  • Under \(w_t\), find a mistake \((x_{n(t)}, y_{n(t)})\), i.e. a point with \(sign(w_t^Tx_{n(t)})\not=y_{n(t)}\)
  • Try to correct the mistake by updating \(w_{t+1}\gets w_t + y_{n(t)}x_{n(t)}\)

The essence of this method is learning from one's mistakes: a fault confessed is half redressed! A code sketch follows.
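
Below is a minimal NumPy sketch of this loop (the cyclic variant that always fixes the first mistake it finds; the function name pla and the max_iter safety cap are my own additions, not from the course):

import numpy as np

def pla(X, y, max_iter=100000):
    """Perceptron Learning Algorithm.
    X: (N, d) inputs with the bias coordinate x_0 = 1 already included.
    y: (N,) labels in {-1, +1}.
    Returns w with sign(X @ w) == y everywhere, if the data are separable."""
    w = np.zeros(X.shape[1])                        # w_0 initialized to 0
    for _ in range(max_iter):
        mistakes = np.nonzero(np.sign(X @ w) != y)[0]
        if len(mistakes) == 0:                      # no mistakes left: halt
            return w
        n = mistakes[0]                             # a misclassified (x_n, y_n)
        w = w + y[n] * X[n]                         # w_{t+1} <- w_t + y_n x_n
    raise RuntimeError("no convergence; data may not be linearly separable")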

Guarantee of PLA

For PLA to halt at all, the data set must be linearly separable. But if the data set is linearly separable, is PLA guaranteed to halt?

Let \(w_f\) be the weights of a perfect line that separates the data set; since it classifies every point correctly, we have:

\(y_{n(t)}w_f^Tx_{n(t)}\ge\min\limits_{n}y_nw_f^Tx_n>0\)

From this we can show that \(w_f^Tw_t\) grows larger and larger as \(w_t\) is updated on the mistakes \((x_{n(t)},y_{n(t)})\):

\[ w_f^Tw_{t+1} = w_f^T(w_t + y_{n(t)}x_{n(t)}) \ge w_f^Tw_t + \min\limits_{n}y_nw_f^Tx_n > w_f^Tw_t + 0 \]

That \(w_f^Tw_t\) keeps growing can mean two things:

  1. the two vectors are becoming more aligned
  2. the length of \(w_t\) is growing

So next we must show that the length of \(w_t\) cannot grow too quickly, i.e. that the growth of \(w_f^Tw_t\) mostly reflects alignment. We have:

\[ \begin{align} ||w_{t+1}||^2 &= || w_t + y_{n(t)}x_{n(t)}||^2 \\ &= ||w_t||^2 + 2y_{n(t)}w_t^Tx_{n(t)} + ||y_{n(t)}x_{n(t)}||^2 \\ &\le ||w_t||^2 + 0 + ||y_{n(t)}x_{n(t)}||^2 \\ &\le ||w_t||^2 + \max\limits_{n}||y_nx_n||^2 \end{align} \]

That is, \(||w_T||^2 \le T\max\limits_{n}||y_nx_n||^2\) after T updates; for a fixed training set the second factor is a constant, so \(||w_T||\) grows at most like \(\sqrt{T}\) while \(w_f^Tw_T\) grows linearly in T. This shows that \(w_f\) and \(w_t\) become more and more aligned as the iterations proceed.

In fact, we can prove that the number of updates T is bounded, because

\[ \frac{w_f^T}{||w_f||}\frac{w_T}{||w_T||}\ge\sqrt{T}\cdot \text{constant} \]

The proof is as follows (recall that \(w_0 = 0\)):

\[ \begin{align} w_f^Tw_T &= w_f^T(w_{T-1} + y_{n(T-1)}x_{n(T-1)}) \\ &\ge w_f^Tw_{T-1} + \min\limits_{n}y_nw_f^Tx_n \\ &\ge w_f^Tw_0 + T\cdot\min\limits_{n}y_nw_f^Tx_n \\ &= T\cdot\min\limits_{n}y_nw_f^Tx_n \end{align} \]

As for \(||w_T||^2\), we have

\[ \begin{align} ||w_T||^2 &= ||w_{T-1} + y_{n(T-1)}x_{n(T-1)}||^2 \\ &\le ||w_{T-1}||^2 + \max\limits_{n}||x_n||^2 \\ &\le T\max\limits_{n}||x_n||^2 \end{align} \]

(as shown above, the cross term is non-positive on a mistake, and \(||y_nx_n||^2 = ||x_n||^2\) since \(y_n=\pm1\))

Combining the two results gives

\[ \frac{w_f^T}{||w_f||}\frac{w_T}{||w_T||}\ge\sqrt{T}\cdot \frac{\min\limits_{n}y_nw_f^Tx_n}{||w_f||\sqrt{\max\limits_{n}||x_n||^2}} \]
On the other hand, since the left-hand side is the cosine of the angle between \(w_f\) and \(w_T\), we always have

\[ \frac{w_f^T}{||w_f||}\frac{w_T}{||w_T||}\le1 \]

Combining these two facts, we finally obtain

\[ T\le\frac{\max\limits_{n}||x_n||^2\cdot ||w_f||^2}{\left(\min\limits_{n}y_nw_f^Tx_n\right)^2} \]
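
As a quick sanity check (my own experiment, not part of the lecture), we can run the update loop on separable toy data, count the updates T, and confirm that it stays under this bound:

import numpy as np

rng = np.random.default_rng(1)
N = 200
# separable toy data: bias x_0 = 1 plus two features, labeled by a known w_f
X = np.hstack([np.ones((N, 1)), rng.uniform(-1, 1, size=(N, 2))])
w_f = np.array([0.1, 1.0, -1.0])
y = np.sign(X @ w_f)

w, T = np.zeros(3), 0                       # run PLA, counting updates
while True:
    mistakes = np.nonzero(np.sign(X @ w) != y)[0]
    if len(mistakes) == 0:
        break
    w += y[mistakes[0]] * X[mistakes[0]]
    T += 1

R2 = np.max(np.sum(X**2, axis=1))           # max_n ||x_n||^2
rho = np.min(y * (X @ w_f))                 # min_n y_n w_f^T x_n, > 0 here
print(T, "<=", R2 * (w_f @ w_f) / rho**2)   # T never exceeds the bound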

Pros and Cons of PLA

  • Pros: simple, fast, and it generalizes easily to high dimensions
  • Cons: PLA only halts if the data set is linearly separable, and we cannot know the running time in advance

Pocket Algorithm

In practice we rarely get a perfectly linearly separable data set; the data may contain noise. To cope with noise, we adopt a new weight-update strategy (a code sketch follows the steps below).

Initialize the weights to \(\hat{w}\) and save it, which amounts to keeping the best w found so far 'in the pocket'.

For t = 0,1,...

  • Under \(w_t\), randomly find a mistake \((x_{n(t)}, y_{n(t)})\) with \(sign(w_t^Tx_{n(t)})\not=y_{n(t)}\)
  • Try to correct the mistake by updating \(w_{t+1}\gets w_t + y_{n(t)}x_{n(t)}\)
  • If the line \(w_{t+1}\) makes fewer mistakes on the data than \(\hat{w}\), replace \(\hat{w}\) with \(w_{t+1}\)
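
A minimal sketch of this strategy, using the same data conventions as the PLA sketch above (the function name pocket and the fixed iteration budget are my own choices):

import numpy as np

def pocket(X, y, max_iter=1000, seed=0):
    """Pocket algorithm: PLA-style updates on random mistakes, while keeping
    the best weight vector seen so far 'in the pocket'."""
    rng = np.random.default_rng(seed)
    errors = lambda v: np.count_nonzero(np.sign(X @ v) != y)
    w = np.zeros(X.shape[1])
    w_hat, best = w.copy(), errors(w)        # pocket weights and their error count
    for _ in range(max_iter):
        mistakes = np.nonzero(np.sign(X @ w) != y)[0]
        if len(mistakes) == 0:
            return w                         # perfectly separated already
        n = rng.choice(mistakes)             # a random misclassified point
        w = w + y[n] * X[n]                  # usual PLA correction
        if errors(w) < best:                 # better than the pocket? swap it in
            w_hat, best = w.copy(), errors(w)
    return w_hat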