1 A simple sklearn example
from sklearn import svm

X = [[2, 0], [1, 1], [2, 3]]
y = [0, 0, 1]
clf = svm.SVC(kernel='linear')
clf.fit(X, y)

print(clf)
# get support vectors
print(clf.support_vectors_)
# get indices of support vectors
print(clf.support_)
# get number of support vectors for each class
print(clf.n_support_)
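Once fitted, the classifier can also be queried on new points; a minimal follow-up sketch (the test point [2, 2] is an illustrative choice, not part of the original example):

# predict the class of a new, unseen point (predict expects a 2-D array-like)
print(clf.predict([[2, 2]]))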
2 Plotting the decision boundary with sklearn
print(__doc__)
import numpy as np
import pylab as pl
from sklearn import svm
# we create 40 separable points
np.random.seed(0)
X = np.r_[np.random.randn(20, 2) - [2, 2], np.random.randn(20, 2) + [2, 2]]
Y = [0] * 20 + [1] * 20
# fit the model
clf = svm.SVC(kernel='linear')
clf.fit(X, Y)
# get the separating hyperplane
w = clf.coef_[0]
a = -w[0] / w[1]
xx = np.linspace(-5, 5)
yy = a * xx - (clf.intercept_[0]) / w[1]
# plot the parallels to the separating hyperplane that pass through the
# support vectors
b = clf.support_vectors_[0]
yy_down = a * xx + (b[1] - a * b[0])
b = clf.support_vectors_[-1]
yy_up = a * xx + (b[1] - a * b[0])
print "w: ", w
print "a: ", a
# print " xx: ", xx
# print " yy: ", yy
print "support_vectors_: ", clf.support_vectors_
print "clf.coef_: ", clf.coef_
# In scikit-learn coef_ attribute holds the vectors of the separating hyperplanes for linear models. It has shape (n_classes, n_features) if n_classes > 1 (multi-class one-vs-all) and (1, n_features) for binary classification.
#
# In this toy binary classification example, n_features == 2, hence w = coef_[0] is the vector orthogonal to the hyperplane (the hyperplane is fully defined by it + the intercept).
#
# To plot this hyperplane in the 2D case (any hyperplane of a 2D plane is a 1D line), we want to find a f as in y = f(x) = a.x + b. In this case a is the slope of the line and can be computed by a = -w[0] / w[1].
# plot the line, the points, and the nearest vectors to the plane
pl.plot(xx, yy, 'k-')
pl.plot(xx, yy_down, 'k--')
pl.plot(xx, yy_up, 'k--')
pl.scatter(clf.support_vectors_[:, 0], clf.support_vectors_[:, 1],
s=80, facecolors='none')
pl.scatter(X[:, 0], X[:, 1], c=Y, cmap=pl.cm.Paired)
pl.axis('tight')
pl.show()
5.2 Support Vector Machine (SVM) Algorithm (Part 2)
1. Properties of the SVM algorithm:
1.1 The complexity of a trained model is determined by the number of support vectors, not by the dimensionality of the data, so SVMs are not prone to overfitting.
1.2 The trained model depends entirely on the support vectors: even if all non-support-vector points were removed from the training set and training were repeated, exactly the same model would be obtained.
1.3 If an SVM ends up with a small number of support vectors, the trained model tends to generalize well.
2. The linearly inseparable case
2.1 The vectors corresponding to the data set cannot be separated by a hyperplane in the original space.
2.2 Two steps are used to deal with this:
2.2.1 use a nonlinear mapping to transform the vectors of the original data set into a higher-dimensional space;
2.2.2 find a linear separating hyperplane in that higher-dimensional space and proceed as in the linearly separable case.
2.3 How is the original data mapped into a higher dimension with a nonlinear mapping?
2.3.1 Example (the standard textbook construction):
a 3-dimensional input vector X = (x1, x2, x3) is mapped into a 6-dimensional space Z via
z1 = x1, z2 = x2, z3 = x3, z4 = (x1)^2, z5 = x1*x2, z6 = x1*x3.
The new decision hyperplane is d(Z) = WZ + b, where W and Z are vectors; this hyperplane is linear in Z.
After solving for W and b and substituting back into the original variables:
d(Z) = w1*x1 + w2*x2 + w3*x3 + w4*(x1)^2 + w5*x1*x2 + w6*x1*x3 + b
2.3.2 Two questions to think about:
2.3.2.1 how do we choose a reasonable nonlinear transformation into the higher-dimensional space?
2.3.2.2 how do we deal with the very high computational cost of the inner products involved?
2.3.3 Use the kernel trick.
3. The kernel trick
3.1 Motivation
When the linear SVM is turned into an optimization problem, the formulas to be solved involve the training vectors only through inner products (dot products) of the form Φ(Xi) · Φ(Xj), where Φ is the nonlinear mapping that sends the training vectors into the higher-dimensional space. Computing the inner products of the mapped vectors is very expensive, so a kernel function is used in place of the inner product of the nonlinear mapping:
K(Xi, Xj) = Φ(Xi) · Φ(Xj)
3.2 Commonly used kernel functions
Polynomial kernel of degree h: K(Xi, Xj) = (Xi · Xj + 1)^h
Gaussian radial basis function kernel: K(Xi, Xj) = exp(-||Xi - Xj||^2 / (2σ^2))
Sigmoid function kernel: K(Xi, Xj) = tanh(κ Xi · Xj - δ)
How do we choose which kernel to use?
Use prior knowledge: for image classification, for example, RBF is the usual choice, whereas it is usually not used for text.
Try different kernels and decide based on the resulting accuracy.
3.3 A kernel example:
define two vectors: x = (x1, x2, x3); y = (y1, y2, y3)
define the mapping: f(x) = (x1x1, x1x2, x1x3, x2x1, x2x2, x2x3, x3x1, x3x2, x3x3)
and the kernel K(x, y) = (<x, y>)^2
Let x = (1, 2, 3) and y = (4, 5, 6). Then
f(x) = (1, 2, 3, 2, 4, 6, 3, 6, 9)
f(y) = (16, 20, 24, 20, 25, 30, 24, 30, 36)
<f(x), f(y)> = 16 + 40 + 72 + 40 + 100 + 180 + 72 + 180 + 324 = 1024
K(x, y) = (4 + 10 + 18)^2 = 32^2 = 1024
The result is the same, but computing it through the kernel is much cheaper; a numeric check follows.
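The equality above is easy to verify numerically. A small sketch with numpy (the explicit mapping f and the squared dot-product kernel follow the definitions above):

import numpy as np

x = np.array([1, 2, 3])
y = np.array([4, 5, 6])

# explicit mapping f(x) = (x1x1, x1x2, ..., x3x3): a 9-dimensional feature vector
fx = np.outer(x, x).ravel()
fy = np.outer(y, y).ravel()

print(np.dot(fx, fy))     # inner product in the 9-dimensional space -> 1024
print(np.dot(x, y) ** 2)  # kernel K(x, y) = (<x, y>)^2 computed in 3 dimensions -> 1024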
4. Extending SVMs to multi-class classification
For each class, train a binary classifier of that class against all the other classes (one-vs-rest).
5.3 Support Vector Machine (SVM) Algorithm (Part 2): Application
Face recognition with SVM:
from __future__ import print_function
from time import time
import logging
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.datasets import fetch_lfw_people
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import classification_report
from sklearn.metrics import confusion_matrix
from sklearn.decomposition import PCA
from sklearn.svm import SVC
print(__doc__)
# Display progress logs on stdout
logging.basicConfig(level=logging.INFO, format='%(asctime)s %(message)s')
###############################################################################
# Download the data, if not already on disk and load it as numpy arrays
lfw_people = fetch_lfw_people(min_faces_per_person=70, resize=0.4)
# introspect the images arrays to find the shapes (for plotting)
n_samples, h, w = lfw_people.images.shape
# for machine learning we use the 2 data directly (as relative pixel
# positions info is ignored by this model)
X = lfw_people.data
n_features = X.shape[1]
# the label to predict is the id of the person
y = lfw_people.target
target_names = lfw_people.target_names
n_classes = target_names.shape[0]
print("Total dataset size:")
print("n_samples: %d" % n_samples)
print("n_features: %d" % n_features)
print("n_classes: %d" % n_classes)
###############################################################################
# Split into a training set and a test set using a stratified k fold
# split into a training and testing set
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25)
###############################################################################
# Compute a PCA (eigenfaces) on the face dataset (treated as unlabeled
# dataset): unsupervised feature extraction / dimensionality reduction
n_components = 150
print("Extracting the top %d eigenfaces from %d faces"
% (n_components, X_train.shape[0]))
t0 = time()
pca = PCA(n_components=n_components, whiten=True, svd_solver='randomized').fit(X_train)
print("done in %0.3fs" % (time() - t0))
eigenfaces = pca.components_.reshape((n_components, h, w))
print("Projecting the input data on the eigenfaces orthonormal basis")
t0 = time()
X_train_pca = pca.transform(X_train)
X_test_pca = pca.transform(X_test)
print("done in %0.3fs" % (time() - t0))
###############################################################################
# Train a SVM classification model
print("Fitting the classifier to the training set")
t0 = time()
param_grid = {'C': [1e3, 5e3, 1e4, 5e4, 1e5],
              'gamma': [0.0001, 0.0005, 0.001, 0.005, 0.01, 0.1], }
clf = GridSearchCV(SVC(kernel='rbf', class_weight='balanced'), param_grid)
clf = clf.fit(X_train_pca, y_train)
print("done in %0.3fs" % (time() - t0))
print("Best estimator found by grid search:")
print(clf.best_estimator_)
###############################################################################
# Quantitative evaluation of the model quality on the test set
print("Predicting people's names on the test set")
t0 = time()
y_pred = clf.predict(X_test_pca)
print("done in %0.3fs" % (time() - t0))
print(classification_report(y_test, y_pred, target_names=target_names))
print(confusion_matrix(y_test, y_pred, labels=range(n_classes)))
###############################################################################
# Qualitative evaluation of the predictions using matplotlib
def plot_gallery(images, titles, h, w, n_row=3, n_col=4):
    """Helper function to plot a gallery of portraits"""
    plt.figure(figsize=(1.8 * n_col, 2.4 * n_row))
    plt.subplots_adjust(bottom=0, left=.01, right=.99, top=.90, hspace=.35)
    for i in range(n_row * n_col):
        plt.subplot(n_row, n_col, i + 1)
        plt.imshow(images[i].reshape((h, w)), cmap=plt.cm.gray)
        plt.title(titles[i], size=12)
        plt.xticks(())
        plt.yticks(())
# plot the result of the prediction on a portion of the test set
def title(y_pred, y_test, target_names, i):
    pred_name = target_names[y_pred[i]].rsplit(' ', 1)[-1]
    true_name = target_names[y_test[i]].rsplit(' ', 1)[-1]
    return 'predicted: %s\ntrue: %s' % (pred_name, true_name)

prediction_titles = [title(y_pred, y_test, target_names, i)
                     for i in range(y_pred.shape[0])]
plot_gallery(X_test, prediction_titles, h, w)
# plot the gallery of the most significative eigenfaces
eigenface_titles = ["eigenface %d" % i for i in range(eigenfaces.shape[0])]
plot_gallery(eigenfaces, eigenface_titles, h, w)
plt.show()
6.1 Neural Network Algorithms (Neural Networks) (Part 1)
1. Background:
1.1 Inspired by the neural networks of the human brain; many different versions have appeared over the years.
1.2 The best-known algorithm is backpropagation, from the 1980s.
2. Multilayer feed-forward neural networks
2.1 Backpropagation is used on multilayer feed-forward neural networks.
2.2 A multilayer feed-forward neural network consists of the following parts:
an input layer, hidden layers, and an output layer.
2.3 Each layer is made up of units.
2.4 The input layer is fed the feature vectors of the training instances.
2.5 Values are passed to the next layer through the weights on the connections; the output of one layer is the input of the next.
2.6 The number of hidden layers can be arbitrary; there is one input layer and one output layer.
2.7 Each unit can also be called a neural node, after its biological origin.
2.8 The network above is called a 2-layer neural network (the input layer is not counted).
2.9 Within a layer, a weighted sum is taken and then transformed by a nonlinear function into the layer's output.
2.10 In theory, a multilayer feed-forward network with enough hidden layers and a large enough training set can approximate any function.
3. Designing the network structure
3.1 Before a neural network is trained, the number of layers and the number of units per layer must be decided.
3.2 The feature vectors are usually normalized to values between 0 and 1 before being fed into the input layer (to speed up learning).
3.3 A discrete variable can be encoded with one input unit per possible value of the feature (see the encoding sketch at the end of this section).
For example, if feature A can take three values (a0, a1, a2), three input units can be used to represent A.
If A = a0, the unit representing a0 is set to 1 and the others to 0;
if A = a1, the unit representing a1 is set to 1 and the others to 0; and so on.
3.4 Neural networks can be used for classification problems as well as regression problems.
3.4.1 For classification with 2 classes, one output unit is enough (0 and 1 represent the two classes);
with more than 2 classes, one output unit is used per class,
so the number of output units usually equals the number of classes.
3.4.2 There is no definite rule for how many hidden layers are best;
3.4.2.1 this is determined experimentally from the test error and accuracy.
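A minimal sketch of the encoding described in 3.3, done by hand with numpy (the helper encode and the value names a0/a1/a2 are illustrative, not from the original notes):

import numpy as np

# the three possible values of feature A, each mapped to one input unit
values = ['a0', 'a1', 'a2']

def encode(v):
    # the unit for the observed value is set to 1, all others to 0
    return np.array([1 if v == u else 0 for u in values])

print(encode('a0'))  # [1 0 0]
print(encode('a1'))  # [0 1 0]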
4. Cross-validation
K-fold cross-validation (a usage sketch follows below)
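A minimal K-fold cross-validation sketch with scikit-learn (the iris data and the logistic-regression classifier are placeholders, not part of the original notes):

import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)
clf = LogisticRegression(max_iter=1000)
# 5-fold cross-validation: train on 4 folds, evaluate on the held-out fold, repeat 5 times
scores = cross_val_score(clf, X, y, cv=5)
print(scores, scores.mean())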
5. The Backpropagation algorithm
5.1 Processes the instances of the training set iteratively.
5.2 Compares the predicted value produced at the output layer with the true (target) value.
5.3 Works backwards (output layer => hidden layers => input layer) and updates the weight of each connection so as to minimize the error.
5.4 The algorithm in detail
Input: D, the data set; l, the learning rate; a multilayer feed-forward network
Output: a trained neural network
5.4.1 Initialize the weights and biases: random values between -1 and 1 (or between -0.5 and 0.5); every unit has its own bias.
5.4.2 For each training instance X:
5.4.2.1 propagate the inputs forward: I_j = Σ_i w_ij O_i + θ_j, O_j = 1 / (1 + e^(-I_j))
5.4.2.2 backpropagate the error:
for the output layer: Err_j = O_j (1 - O_j)(T_j - O_j)
for a hidden layer: Err_j = O_j (1 - O_j) Σ_k Err_k w_jk
weight update: Δw_ij = (l) Err_j O_i, w_ij = w_ij + Δw_ij
bias update: Δθ_j = (l) Err_j, θ_j = θ_j + Δθ_j
5.4.3 Termination conditions
5.4.3.1 the weight updates fall below some threshold, or
5.4.3.2 the prediction error rate falls below some threshold, or
5.4.3.3 a preset number of iterations has been reached
6. A worked Backpropagation example
The output-layer error, hidden-layer error, weight updates, and bias updates are computed exactly as in the formulas of 5.4.2.2 above; a small numeric sketch follows.
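Since the figure with the worked numbers did not survive, here is a small numeric sketch of one Backpropagation step that follows the formulas in 5.4.2.2 (the tiny 2-1-1 network, its initial weights, and the input are made up for illustration):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

l = 0.9                      # learning rate
x = np.array([1.0, 0.0])     # input instance
t = 1.0                      # target value

w_h = np.array([0.2, -0.3])  # weights input -> hidden unit
b_h = -0.4                   # hidden bias
w_o = 0.5                    # weight hidden -> output unit
b_o = 0.1                    # output bias

# forward pass: I_j = sum_i w_ij O_i + theta_j, O_j = sigmoid(I_j)
o_h = sigmoid(np.dot(w_h, x) + b_h)
o_o = sigmoid(w_o * o_h + b_o)

# backward pass
err_o = o_o * (1 - o_o) * (t - o_o)    # output-layer error
err_h = o_h * (1 - o_h) * err_o * w_o  # hidden-layer error

# weight and bias updates
w_o += l * err_o * o_h
b_o += l * err_o
w_h += l * err_h * x
b_h += l * err_h
print(w_h, b_h, w_o, b_o)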
6.2 Neural Network Algorithms (Neural Networks): Application (Part 1)
1. The nonlinear transformation function
A sigmoid-shaped function (S curve) is used as the activation function, either
1.1 the hyperbolic tangent (tanh), or
1.2 the logistic function
2. Implementing a simple neural network
import numpy as np

def tanh(x):
    return np.tanh(x)

def tanh_deriv(x):
    return 1.0 - np.tanh(x)*np.tanh(x)

def logistic(x):
    return 1/(1 + np.exp(-x))

def logistic_derivative(x):
    return logistic(x)*(1-logistic(x))

class NeuralNetwork:
    def __init__(self, layers, activation='tanh'):
        """
        :param layers: A list containing the number of units in each layer.
        Should be at least two values
        :param activation: The activation function to be used. Can be
        "logistic" or "tanh"
        """
        if activation == 'logistic':
            self.activation = logistic
            self.activation_deriv = logistic_derivative
        elif activation == 'tanh':
            self.activation = tanh
            self.activation_deriv = tanh_deriv

        self.weights = []
        for i in range(1, len(layers) - 1):
            self.weights.append((2*np.random.random((layers[i - 1] + 1, layers[i] + 1))-1)*0.25)
            self.weights.append((2*np.random.random((layers[i] + 1, layers[i + 1]))-1)*0.25)

    def fit(self, X, y, learning_rate=0.2, epochs=10000):
        X = np.atleast_2d(X)
        temp = np.ones([X.shape[0], X.shape[1]+1])
        temp[:, 0:-1] = X  # adding the bias unit to the input layer
        X = temp
        y = np.array(y)

        for k in range(epochs):
            i = np.random.randint(X.shape[0])
            a = [X[i]]

            for l in range(len(self.weights)):  # going forward through the network, layer by layer
                a.append(self.activation(np.dot(a[l], self.weights[l])))  # compute the node values (O_i) for each layer using the activation function
            error = y[i] - a[-1]  # compute the error at the top layer
            deltas = [error * self.activation_deriv(a[-1])]  # output-layer Err calculation (delta is the updated error)

            # starting backpropagation
            for l in range(len(a) - 2, 0, -1):  # we need to begin at the second to last layer
                # compute the updated error (i.e. deltas) for each node going from the top layer to the input layer
                deltas.append(deltas[-1].dot(self.weights[l].T)*self.activation_deriv(a[l]))
            deltas.reverse()
            for i in range(len(self.weights)):
                layer = np.atleast_2d(a[i])
                delta = np.atleast_2d(deltas[i])
                self.weights[i] += learning_rate * layer.T.dot(delta)

    def predict(self, x):
        x = np.array(x)
        temp = np.ones(x.shape[0]+1)
        temp[0:-1] = x
        a = temp
        for l in range(0, len(self.weights)):
            a = self.activation(np.dot(a, self.weights[l]))
        return a
6.3 Neural Network Algorithms (Neural Networks): Application (Part 2)
1. Test on a simple nonlinear data set (XOR):
X1 X2 | Y
0  0  | 0
0  1  | 1
1  0  | 1
1  1  | 0
Code:
from NeuralNetwork import NeuralNetwork
import numpy as np

nn = NeuralNetwork([2, 2, 1], 'tanh')
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 1, 1, 0])
nn.fit(X, y)
for i in [[0, 0], [0, 1], [1, 0], [1, 1]]:
    print(i, nn.predict(i))
2. Handwritten digit recognition:
each image is 8x8 pixels
digits to recognize: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9
Code:
import numpy as np
from sklearn.datasets import load_digits
from sklearn.metrics import confusion_matrix, classification_report
from sklearn.preprocessing import LabelBinarizer
from NeuralNetwork import NeuralNetwork
from sklearn.model_selection import train_test_split

digits = load_digits()
X = digits.data
y = digits.target
X -= X.min()  # normalize the values to bring them into the range 0-1
X /= X.max()

nn = NeuralNetwork([64, 100, 10], 'logistic')
X_train, X_test, y_train, y_test = train_test_split(X, y)
labels_train = LabelBinarizer().fit_transform(y_train)
labels_test = LabelBinarizer().fit_transform(y_test)
print("start fitting")
nn.fit(X_train, labels_train, epochs=3000)
predictions = []
for i in range(X_test.shape[0]):
    o = nn.predict(X_test[i])
    predictions.append(np.argmax(o))
print(confusion_matrix(y_test, predictions))
print(classification_report(y_test, predictions))
7.1 Simple Linear Regression (Part 1)
0. Preliminaries:
Why do we need statistics?
Statistics describe the characteristics of data.
0.1 Measures of central tendency
0.1.1 Mean (average): for {6, 2, 9, 1, 2},
(6 + 2 + 9 + 1 + 2) / 5 = 20 / 5 = 4
0.1.2 Median: sort the values by size and take the one in the middle position
0.1.2.1 sort the data: 1, 2, 2, 6, 9
0.1.2.2 take the value in the middle position: 2
when n is odd, take the middle value directly;
when n is even, take the average of the two middle values
0.1.3 Mode: the value that appears most often in the data
0.2 Measures of dispersion
0.2.1 Variance: for {6, 2, 9, 1, 2},
(1) (6 - 4)^2 + (2 - 4)^2 + (9 - 4)^2 + (1 - 4)^2 + (2 - 4)^2 = 4 + 4 + 25 + 9 + 4 = 46
(2) n - 1 = 5 - 1 = 4
(3) 46 / 4 = 11.5
0.2.2 Standard deviation: s = sqrt(11.5) = 3.39
1. Introduction: in regression, the Y variable is a continuous numerical variable,
e.g. house price, number of people, rainfall;
in classification, the Y variable is categorical,
e.g. color, computer brand, creditworthy or not
2. Simple linear regression
2.1 Many decision-making processes are based on the relationship between two or more variables.
2.2 Regression analysis is used to build an equation that models how two or more variables are related.
2.3 The variable being predicted is called the dependent variable (y), the output.
2.4 The variables used to make the prediction are called independent variables (x), the inputs.
3. Introduction to simple linear regression
3.1 Simple linear regression involves one independent variable (x) and one dependent variable (y).
3.2 The relationship between the two variables is modeled by a straight line.
3.3 With two or more independent variables, the analysis is called multiple regression.
4. The simple linear regression model
4.1 The equation that describes how the dependent variable (y) is related to the independent variable (x) and an error term is called the regression model.
4.2 The simple linear regression model is y = β0 + β1 x + ε,
where β0 and β1 are parameters and ε is the error term.
5. The simple linear regression equation
E(y) = β0 + β1 x
The graph of this equation is a straight line, called the regression line, where
β0 is the intercept of the regression line,
β1 is the slope of the regression line, and
E(y) is the expected value (mean) of y for a given value of x.
6. Positive linear relationship
7. Negative linear relationship
8. No relationship
9. The estimated simple linear regression equation
ŷ = b0 + b1 x
This equation is called the estimated regression line, where
b0 is the intercept of the estimated line,
b1 is the slope of the estimated line, and
ŷ is the estimated value of y for a given value of the independent variable x.
10. Workflow of a linear regression analysis: estimate b0 and b1 from sample data and use ŷ = b0 + b1 x for prediction.
11. Assumptions about the error term ε
11.1 ε is a random variable with mean 0;
11.2 the variance of ε is the same for all values of the independent variable x;
11.3 the values of ε are independent;
11.4 ε follows a normal distribution.
7.2 Simple Linear Regression (Part 2)
1. A simple linear regression example:
number of TV ads run by a car dealer (x) versus number of cars sold (y),
with x = 1, 3, 2, 1, 3 and y = 14, 24, 18, 17, 27 (the data used in the code below).
1.1 How do we find the regression line that best fits the data?
Minimize the sum of squared residuals; the slope and intercept are
b1 = Σ(xi - x̄)(yi - ȳ) / Σ(xi - x̄)^2 and b0 = ȳ - b1 x̄.
1.1.2 Calculation (with x̄ = 2 and ȳ = 20):
numerator = (1-2)(14-20) + (3-2)(24-20) + (2-2)(18-20) + (1-2)(17-20) + (3-2)(27-20)
          = 6 + 4 + 0 + 3 + 7
          = 20
denominator = (1-2)^2 + (3-2)^2 + (2-2)^2 + (1-2)^2 + (3-2)^2
            = 1 + 1 + 0 + 1 + 1
            = 4
b1 = 20 / 4 = 5
b0 = 20 - 5 * 2 = 20 - 10 = 10
1.2 Prediction:
if the number of ads in a week is 6, what is the predicted number of cars sold?
x_given = 6
ŷ = 5 * 6 + 10 = 40
1.3 Python implementation:
import numpy as np

def fitSLR(x, y):
    n = len(x)
    denominator = 0
    numerator = 0
    for i in range(0, n):
        numerator += (x[i] - np.mean(x)) * (y[i] - np.mean(y))
        denominator += (x[i] - np.mean(x)) ** 2
    b1 = numerator / float(denominator)
    b0 = np.mean(y) - b1 * np.mean(x)
    return b0, b1

def predict(x, b0, b1):
    return b0 + x * b1

x = [1, 3, 2, 1, 3]
y = [14, 24, 18, 17, 27]
b0, b1 = fitSLR(x, y)
print("intercept:", b0, " slope:", b1)

x_test = 6
y_test = predict(x_test, b0, b1)
print("y_test:", y_test)
7.3 Multiple Regression Analysis
1. Difference from simple linear regression:
several independent variables (x)
2. The multiple regression model
y = β0 + β1 x1 + β2 x2 + ... + βp xp + ε
where β0, β1, β2, ..., βp are parameters and ε is the error term
3. The multiple regression equation
E(y) = β0 + β1 x1 + β2 x2 + ... + βp xp
4. The estimated multiple regression equation:
ŷ = b0 + b1 x1 + b2 x2 + ... + bp xp
A sample is used to compute b0, b1, b2, ..., bp as point estimates of β0, β1, β2, ..., βp.
5. Estimation procedure (analogous to simple linear regression)
6. Estimation method
Minimize the sum of squared residuals; the computation is analogous to simple linear regression but involves linear algebra and matrix operations.
7. Example
A delivery company: X1 = miles traveled, X2 = number of deliveries, Y = total travel time
Driving Assignment | X1 = Miles Traveled | X2 = Number of Deliveries | Y = Travel Time (Hours)
1  | 100 | 4 | 9.3
2  |  50 | 3 | 4.8
3  | 100 | 4 | 8.9
4  | 100 | 2 | 6.5
5  |  50 | 2 | 4.2
6  |  80 | 2 | 6.2
7  |  75 | 3 | 7.4
8  |  65 | 4 | 6.0
9  |  90 | 3 | 7.6
10 |  90 | 2 | 6.1
Time = b0 + b1 * Miles + b2 * Deliveries
Time = -0.869 + 0.0611 * Miles + 0.923 * Deliveries
8. Interpreting the parameters
b1: each additional mile traveled adds, on average, 0.0611 hours to the travel time.
b2: each additional delivery adds, on average, 0.923 hours to the travel time.
9. Prediction
If an assignment is to drive 102 miles and make 6 deliveries, how many hours will it take?
Time = -0.869 + 0.0611 * 102 + 0.923 * 6
     = 10.9 (hours)
10. What if one of the independent variables is categorical? (See the dummy-variable sketch after the table below.)
Miles | Deliveries | Car Type | Time
100 | 4 | 1 | 9.3
 50 | 3 | 0 | 4.8
100 | 4 | 1 | 8.9
100 | 2 | 2 | 6.5
 50 | 2 | 2 | 4.2
 80 | 2 | 1 | 6.2
 75 | 3 | 1 | 7.4
 65 | 4 | 0 | 6
 90 | 3 | 0 | 7.6
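One common way to handle such a categorical column (a sketch, not from the original notes) is to expand it into dummy/one-hot variables before fitting the regression, for example with pandas and scikit-learn; the values below are taken from the table above:

import pandas as pd
from sklearn.linear_model import LinearRegression

data = pd.DataFrame({
    'Miles':      [100, 50, 100, 100, 50, 80, 75, 65, 90],
    'Deliveries': [4, 3, 4, 2, 2, 2, 3, 4, 3],
    'CarType':    [1, 0, 1, 2, 2, 1, 1, 0, 0],   # categorical, coded 0/1/2
    'Time':       [9.3, 4.8, 8.9, 6.5, 4.2, 6.2, 7.4, 6.0, 7.6],
})
# expand the categorical column into 0/1 dummy columns
X = pd.get_dummies(data[['Miles', 'Deliveries', 'CarType']], columns=['CarType'])
regr = LinearRegression().fit(X, data['Time'])
print(regr.intercept_, regr.coef_)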
11. The distribution of the error term
The error ε is a random variable with mean 0;
the variance of ε is the same for all values of the independent variables;
the values of ε are independent;
ε follows a normal distribution, and the expected value of y is given by E(y) = β0 + β1 x1 + β2 x2 + ... + βp xp.
7.4 Multiple Regression Analysis: Application
1. Example
A delivery company: X1 = miles traveled, X2 = number of deliveries, Y = total travel time
Driving Assignment | X1 = Miles Traveled | X2 = Number of Deliveries | Y = Travel Time (Hours)
1  | 100 | 4 | 9.3
2  |  50 | 3 | 4.8
3  | 100 | 4 | 8.9
4  | 100 | 2 | 6.5
5  |  50 | 2 | 4.2
6  |  80 | 2 | 6.2
7  |  75 | 3 | 7.4
8  |  65 | 4 | 6.0
9  |  90 | 3 | 7.6
10 |  90 | 2 | 6.1
Goal: find b0, b1, ..., bp in the estimated equation
ŷ = b0 + b1 x1 + b2 x2 + ... + bp xp
2. Python code:
from numpy import genfromtxt
import numpy as np
from sklearn import datasets, linear_model

dataPath = r"D:\MaiziEdu\DeepLearningBasics_MachineLearning\Datasets\Delivery.csv"
deliveryData = genfromtxt(dataPath, delimiter=',')

print("data")
print(deliveryData)

X = deliveryData[:, :-1]
Y = deliveryData[:, -1]
print("X:")
print(X)
print("Y: ")
print(Y)

regr = linear_model.LinearRegression()
regr.fit(X, Y)
print("coefficients")
print(regr.coef_)
print("intercept: ")
print(regr.intercept_)

xPred = [[102, 6]]
yPred = regr.predict(xPred)
print("predicted y: ")
print(yPred)
7.5 Nonlinear Regression: Logistic Regression
1. Probability:
1.1 Definition: probability (P) measures how likely an event is to occur.
1.2 Range: 0 <= P <= 1
1.3 Ways of assigning it:
1.3.1 from personal belief
1.3.2 from historical data
1.3.3 from simulated data
1.4 Conditional probability: P(A | B) = P(AB) / P(B)
2. Logistic Regression
2.1 Example: classify by thresholding the hypothesis h(x), e.g. predict the positive class when h(x) > 0.5, or use a different threshold such as h(x) > 0.2.
2.2 The basic model
The test data is X(x0, x1, x2, ..., xn);
the parameters to be learned are Θ(θ0, θ1, θ2, ..., θn).
In vector form: z = Θ^T X = θ0 x0 + θ1 x1 + ... + θn xn
To handle binary outcomes, the Sigmoid function g(z) = 1 / (1 + e^(-z)) is introduced to smooth the curve.
Prediction function: hθ(X) = g(Θ^T X) = 1 / (1 + e^(-Θ^T X))
Interpreted as a probability:
positive example (y = 1): P(y = 1 | X; Θ) = hθ(X)
negative example (y = 0): P(y = 0 | X; Θ) = 1 - hθ(X)
2.3 The cost function
Linear regression: J(Θ) = (1 / (2m)) Σ_{i=1..m} (hθ(x^(i)) - y^(i))^2
find θ0, θ1 that minimize this expression.
Logistic regression:
cost function: J(Θ) = -(1/m) Σ_{i=1..m} [ y^(i) log(hθ(x^(i))) + (1 - y^(i)) log(1 - hθ(x^(i))) ]
goal: find Θ that minimizes this expression.
2.4 Solution: gradient descent
Update rule: θ_j := θ_j - α (1/m) Σ_{i=1..m} (hθ(x^(i)) - y^(i)) x_j^(i)
α is the learning rate;
all θ_j are updated simultaneously;
the updates are repeated until convergence (a coded sketch follows).
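A minimal sketch of batch gradient descent for logistic regression following the update rule above (the toy data are placeholders; note that the course code in the next section demonstrates gradient descent on a linear hypothesis rather than the logistic one):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# toy data: x0 = 1 is the bias feature
X = np.array([[1.0, 0.5], [1.0, 1.5], [1.0, 2.5], [1.0, 3.5]])
y = np.array([0, 0, 1, 1])

theta = np.zeros(X.shape[1])
alpha = 0.1                              # learning rate
for _ in range(5000):
    h = sigmoid(X.dot(theta))            # h_theta(x) for every example
    gradient = X.T.dot(h - y) / len(y)   # (1/m) * sum (h - y) * x_j
    theta = theta - alpha * gradient     # simultaneous update of all theta_j
print(theta)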
7.6 Nonlinear Regression Application: Logistic Regression Application
Python implementation:
import numpy as np
import random

# m denotes the number of examples here, not the number of features
def gradientDescent(x, y, theta, alpha, m, numIterations):
    xTrans = x.transpose()
    for i in range(0, numIterations):
        hypothesis = np.dot(x, theta)
        loss = hypothesis - y
        # avg cost per example (the 2 in 2*m doesn't really matter here.
        # But to be consistent with the gradient, I include it)
        cost = np.sum(loss ** 2) / (2 * m)
        print("Iteration %d | Cost: %f" % (i, cost))
        # avg gradient per example
        gradient = np.dot(xTrans, loss) / m
        # update
        theta = theta - alpha * gradient
    return theta

def genData(numPoints, bias, variance):
    x = np.zeros(shape=(numPoints, 2))
    y = np.zeros(shape=numPoints)
    # basically a straight line
    for i in range(0, numPoints):
        # bias feature
        x[i][0] = 1
        x[i][1] = i
        # our target variable
        y[i] = (i + bias) + random.uniform(0, 1) * variance
    return x, y

# gen 100 points with a bias of 25 and 10 variance as a bit of noise
x, y = genData(100, 25, 10)
m, n = np.shape(x)
numIterations = 100000
alpha = 0.0005
theta = np.ones(n)
theta = gradientDescent(x, y, theta, alpha, m, numIterations)
print(theta)
7.7 Correlation and the R-Squared Value in Regression
1. Pearson correlation coefficient:
1.1 measures the strength of the linear relationship between two variables
1.2 range [-1, 1]:
positive correlation: > 0, negative correlation: < 0, no correlation: = 0
1.3 r = Σ(xi - x̄)(yi - ȳ) / sqrt( Σ(xi - x̄)^2 · Σ(yi - ȳ)^2 )
2. A worked example:
X | Y
1 | 10
3 | 12
8 | 24
7 | 21
9 | 34
3. Other examples.
4. The R-squared value:
4.1 Definition: the coefficient of determination; the proportion of the total variation of the dependent variable that can be explained by the regression relationship with the independent variable(s).
4.2 Interpretation: an R-squared of 0.8 means the regression relationship explains 80% of the variation in the dependent variable; in other words, if we could hold the independent variables fixed, the variation of the dependent variable would be reduced by 80%.
4.3 Simple linear regression: R^2 = r * r
Multiple linear regression: R^2 = SSR / SST = 1 - SSE / SST
5. R-squared also has limitations: it grows as more independent variables are added, and it depends on the sample size, so it is usually corrected. One common correction is the adjusted R-squared: R^2_adj = 1 - (1 - R^2)(n - 1) / (n - p - 1).
7.8 Correlation and R-Squared: Application
Python implementation:
import numpy as np
import math

def computeCorrelation(X, Y):
    xBar = np.mean(X)
    yBar = np.mean(Y)
    SSR = 0
    varX = 0
    varY = 0
    for i in range(0, len(X)):
        diffXXBar = X[i] - xBar
        diffYYBar = Y[i] - yBar
        SSR += (diffXXBar * diffYYBar)
        varX += diffXXBar ** 2
        varY += diffYYBar ** 2
    SST = math.sqrt(varX * varY)
    return SSR / SST

testX = [1, 3, 8, 7, 9]
testY = [10, 12, 24, 21, 34]
print(computeCorrelation(testX, testY))
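For simple linear regression R^2 = r * r (section 4.3); as a hedged extension of the code above, R^2 can also be computed from a fitted line with np.polyfit (this helper is an addition, not part of the original):

import numpy as np

def polyfit_r2(x, y, degree=1):
    # fit y = b1*x + b0 and compute R^2 = SSR / SST
    coeffs = np.polyfit(x, y, degree)
    y_hat = np.polyval(coeffs, x)
    y_mean = np.mean(y)
    ssr = np.sum((y_hat - y_mean) ** 2)
    sst = np.sum((np.array(y) - y_mean) ** 2)
    return ssr / sst

testX = [1, 3, 8, 7, 9]
testY = [10, 12, 24, 21, 34]
print(polyfit_r2(testX, testY))  # should equal computeCorrelation(testX, testY) ** 2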
8.1 Clustering: the K-means Algorithm
1. Category:
clustering belongs to unsupervised learning;
the data carry no class labels.
2. Example: grouping unlabeled points into clusters.
3. The K-means algorithm:
3.1 A classic clustering algorithm, one of the ten classic data mining algorithms.
3.2 The algorithm takes a parameter k and partitions the n data objects given as input into k clusters, so that similarity within a cluster is high while similarity between different clusters is low.
3.3 Idea:
cluster around k center points in space, assigning each object to the nearest center; iteratively update the value of each cluster center until the best clustering is obtained.
3.4 Description:
(1) choose suitable initial centers for the c classes;
(2) in the k-th iteration, compute the distance from every sample to each of the c centers and assign the sample to the class of the nearest center;
(3) update the center of each class, for example as the mean of its members;
(4) if all c cluster centers are unchanged after the updates of (2) and (3), the iteration ends; otherwise continue iterating.
3.5 Procedure:
input: k, data[n];
(1) choose k initial center points, e.g. c[0] = data[0], ..., c[k-1] = data[k-1];
(2) compare each of data[0] ... data[n] with c[0] ... c[k-1]; if it is closest to c[i], mark it i;
(3) for all points marked i, recompute c[i] = {sum of all data[j] marked i} / number of points marked i;
(4) repeat (2) and (3) until the change of every c[i] is below a given threshold.
4. Worked example: assign points, update centers, and stop when the assignments no longer change.
Advantages: fast and simple.
Disadvantages: the final result depends on the choice of the initial points, the algorithm can get stuck in a local optimum, and k must be known in advance. (A scikit-learn usage sketch follows.)
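For comparison with the from-scratch implementation in the next section, the same kind of clustering can be done with scikit-learn's KMeans; a usage sketch (the four 2-D test points match the ones used in the next section):

import numpy as np
from sklearn.cluster import KMeans

X = np.array([[1, 1], [2, 1], [4, 3], [5, 4]])
km = KMeans(n_clusters=2, n_init=10).fit(X)
print(km.labels_)           # cluster index of each point
print(km.cluster_centers_)  # the two centroids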
8.2 Clustering: K-means Application
import numpy as np

# Function: K Means
# -------------
# K-Means is an algorithm that takes in a dataset and a constant
# k and returns k centroids (which define clusters of data in the
# dataset which are similar to one another).
def kmeans(X, k, maxIt):
    numPoints, numDim = X.shape
    dataSet = np.zeros((numPoints, numDim + 1))
    dataSet[:, :-1] = X

    # Initialize centroids randomly
    centroids = dataSet[np.random.randint(numPoints, size=k), :]
    centroids = dataSet[0:2, :]
    # Randomly assign labels to initial centroids
    centroids[:, -1] = range(1, k + 1)

    # Initialize book keeping vars.
    iterations = 0
    oldCentroids = None

    # Run the main k-means algorithm
    while not shouldStop(oldCentroids, centroids, iterations, maxIt):
        print("iteration: \n", iterations)
        print("dataSet: \n", dataSet)
        print("centroids: \n", centroids)
        # Save old centroids for convergence test. Book keeping.
        oldCentroids = np.copy(centroids)
        iterations += 1

        # Assign labels to each datapoint based on centroids
        updateLabels(dataSet, centroids)

        # Assign centroids based on datapoint labels
        centroids = getCentroids(dataSet, k)

    # We can get the labels too by calling getLabels(dataSet, centroids)
    return dataSet

# Function: Should Stop
# -------------
# Returns True or False if k-means is done. K-means terminates either
# because it has run a maximum number of iterations OR the centroids
# stop changing.
def shouldStop(oldCentroids, centroids, iterations, maxIt):
    if iterations > maxIt:
        return True
    return np.array_equal(oldCentroids, centroids)

# Function: Get Labels
# -------------
# Update a label for each piece of data in the dataset.
def updateLabels(dataSet, centroids):
    # For each element in the dataset, chose the closest centroid.
    # Make that centroid the element's label.
    numPoints, numDim = dataSet.shape
    for i in range(0, numPoints):
        dataSet[i, -1] = getLabelFromClosestCentroid(dataSet[i, :-1], centroids)

def getLabelFromClosestCentroid(dataSetRow, centroids):
    label = centroids[0, -1]
    minDist = np.linalg.norm(dataSetRow - centroids[0, :-1])
    for i in range(1, centroids.shape[0]):
        dist = np.linalg.norm(dataSetRow - centroids[i, :-1])
        if dist < minDist:
            minDist = dist
            label = centroids[i, -1]
    print("minDist:", minDist)
    return label

# Function: Get Centroids
# -------------
# Returns k random centroids, each of dimension n.
def getCentroids(dataSet, k):
    # Each centroid is the geometric mean of the points that
    # have that centroid's label. Important: If a centroid is empty (no points have
    # that centroid's label) you should randomly re-initialize it.
    result = np.zeros((k, dataSet.shape[1]))
    for i in range(1, k + 1):
        oneCluster = dataSet[dataSet[:, -1] == i, :-1]
        result[i - 1, :-1] = np.mean(oneCluster, axis=0)
        result[i - 1, -1] = i
    return result

x1 = np.array([1, 1])
x2 = np.array([2, 1])
x3 = np.array([4, 3])
x4 = np.array([5, 4])
testX = np.vstack((x1, x2, x3, x4))
result = kmeans(testX, 2, 10)
print("final result:")
print(result)
8.3 Clustering: Hierarchical Clustering
Given N samples to cluster, hierarchical (agglomerative) clustering proceeds as follows:
1. (initialization) treat each sample as its own cluster and compute the distance between every pair of clusters, i.e. the similarity between the samples;
2. find the two closest clusters and merge them into one (so the total number of clusters drops by one);
3. recompute the similarity between the newly created cluster and each of the old clusters;
4. repeat steps 2 and 3 until all sample points belong to a single cluster, then stop.
The whole clustering process builds a tree. While the tree is being built, a threshold can be set in step 2: when the distance between the two closest clusters exceeds the threshold, the iteration can be terminated. The other key step is step 3: there are quite a few ways to define the similarity between two clusters. Three of them are described here (a scipy usage sketch follows the descriptions):
Single linkage, also called nearest-neighbor: the distance between two clusters is the distance between their two closest samples, so the closer the nearest pair of samples, the more similar the two clusters. This easily produces the so-called chaining effect: two clusters that are clearly far apart "overall" get merged because a few individual points happen to be close, and after such a merge the chaining effect grows further, finally yielding rather loose clusters.
Complete linkage is the exact opposite extreme of single linkage: the distance between two clusters is the distance between their two farthest points. Its behavior is also exactly opposite: it is extremely restrictive; even if two clusters are already very close, a single badly matched pair of points prevents them from ever being merged, which is not a great approach either. The common problem of these two definitions is that they only consider a few extreme data points and ignore the overall character of the data inside the clusters.
Average linkage: take all pairwise distances between the points of the two clusters and average them, which tends to give more reasonable results.
A variant of average linkage uses the median of the pairwise distances instead of the mean, which is less affected by a few outlying samples.
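All three linkage criteria above are available in scipy.cluster.hierarchy; a minimal usage sketch (the five 2-D points are made up for illustration):

import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

X = np.array([[1, 1], [1.5, 1], [5, 5], [5.5, 5], [9, 9]])
# method can be 'single', 'complete' or 'average', matching the three criteria above
Z = linkage(X, method='average')
# cut the tree at distance 2: points merged below that distance end up in the same cluster
labels = fcluster(Z, t=2, criterion='distance')
print(labels)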
8.4 Clustering: Hierarchical Clustering Application
from numpy import *
"""
Code for hierarchical clustering, modified from
Programming Collective Intelligence by Toby Segaran
(O'Reilly Media 2007, page 33).
"""
class cluster_node:
    def __init__(self, vec, left=None, right=None, distance=0.0, id=None, count=1):
        self.left = left
        self.right = right
        self.vec = vec
        self.id = id
        self.distance = distance
        self.count = count  # only used for weighted average
def L2dist(v1, v2):
    return sqrt(sum((v1 - v2) ** 2))

def L1dist(v1, v2):
    return sum(abs(v1 - v2))

# def Chi2dist(v1, v2):
#     return sqrt(sum((v1 - v2) ** 2))
def hcluster(features, distance=L2dist):
    # cluster the rows of the "features" matrix
    distances = {}
    currentclustid = -1

    # clusters are initially just the individual rows
    clust = [cluster_node(array(features[i]), id=i) for i in range(len(features))]

    while len(clust) > 1:
        lowestpair = (0, 1)
        closest = distance(clust[0].vec, clust[1].vec)

        # loop through every pair looking for the smallest distance
        for i in range(len(clust)):
            for j in range(i + 1, len(clust)):
                # distances is the cache of distance calculations
                if (clust[i].id, clust[j].id) not in distances:
                    distances[(clust[i].id, clust[j].id)] = distance(clust[i].vec, clust[j].vec)
                d = distances[(clust[i].id, clust[j].id)]
                if d < closest:
                    closest = d
                    lowestpair = (i, j)

        # calculate the average of the two clusters
        mergevec = [(clust[lowestpair[0]].vec[i] + clust[lowestpair[1]].vec[i]) / 2.0
                    for i in range(len(clust[0].vec))]

        # create the new cluster
        newcluster = cluster_node(array(mergevec), left=clust[lowestpair[0]],
                                  right=clust[lowestpair[1]],
                                  distance=closest, id=currentclustid)

        # cluster ids that weren't in the original set are negative
        currentclustid -= 1
        del clust[lowestpair[1]]
        del clust[lowestpair[0]]
        clust.append(newcluster)

    return clust[0]
def extract_clusters(clust, dist):
    # extract list of sub-tree clusters from hcluster tree with distance < dist
    if clust.distance < dist:
        # we have found a cluster subtree
        return [clust]
    else:
        # check the right and left branches
        cl = []
        cr = []
        if clust.left != None:
            cl = extract_clusters(clust.left, dist=dist)
        if clust.right != None:
            cr = extract_clusters(clust.right, dist=dist)
        return cl + cr

def get_cluster_elements(clust):
    # return ids for elements in a cluster sub-tree
    if clust.id >= 0:
        # positive id means that this is a leaf
        return [clust.id]
    else:
        # check the right and left branches
        cl = []
        cr = []
        if clust.left != None:
            cl = get_cluster_elements(clust.left)
        if clust.right != None:
            cr = get_cluster_elements(clust.right)
        return cl + cr
def printclust(clust, labels=None, n=0):
    # indent to make a hierarchy layout
    for i in range(n):
        print(' ', end='')
    if clust.id < 0:
        # negative id means that this is a branch
        print('-')
    else:
        # positive id means that this is an endpoint
        if labels == None:
            print(clust.id)
        else:
            print(labels[clust.id])
    # now print the right and left branches
    if clust.left != None:
        printclust(clust.left, labels=labels, n=n + 1)
    if clust.right != None:
        printclust(clust.right, labels=labels, n=n + 1)
def getheight(clust):
    # Is this an endpoint? Then the height is just 1
    if clust.left == None and clust.right == None:
        return 1
    # Otherwise the height is the sum of the heights of
    # each branch
    return getheight(clust.left) + getheight(clust.right)

def getdepth(clust):
    # The distance of an endpoint is 0.0
    if clust.left == None and clust.right == None:
        return 0
    # The distance of a branch is the greater of its two sides
    # plus its own distance
    return max(getdepth(clust.left), getdepth(clust.right)) + clust.distance
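The file above only defines the tree-building utilities; a short usage sketch appended to the same file (the tiny feature matrix is made up for illustration):

from numpy import array

features = array([[1.0, 1.0],
                  [1.5, 1.0],
                  [5.0, 5.0],
                  [5.5, 5.0]])
tree = hcluster(features)                  # build the full hierarchy
printclust(tree)                           # text dump of the tree
clusters = extract_clusters(tree, dist=2)  # sub-trees whose merge distance is below 2
print([get_cluster_elements(c) for c in clusters])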