Learn Algorithms with Me: SVM (Support Vector Machine)

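A support vector machine looks for the decision boundary whose nearest points lie as far from it as possible, that is, the boundary with the largest margin.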

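The solution mainly relies on the method of Lagrange multipliers to solve a constrained optimization problem. The problem comes in two forms, linear and nonlinear; the nonlinear form is handled with kernel functions.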
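To make the Lagrange multiplier step concrete, here is the standard textbook hard-margin formulation. The symbols (w, b, alpha_i, labels y_i in {-1, +1}) are assumptions for illustration and do not appear in the post itself.

```latex
\begin{aligned}
&\min_{w,b}\ \tfrac{1}{2}\lVert w\rVert^{2}
  \quad \text{s.t.}\quad y_i\,(w^{\top}x_i + b) \ge 1,\quad i = 1,\dots,n \\[4pt]
&L(w,b,\alpha) = \tfrac{1}{2}\lVert w\rVert^{2}
  - \sum_{i=1}^{n}\alpha_i\bigl[\,y_i\,(w^{\top}x_i + b) - 1\,\bigr],
  \quad \alpha_i \ge 0 \\[4pt]
&\frac{\partial L}{\partial w} = 0 \;\Rightarrow\; w = \sum_{i=1}^{n}\alpha_i y_i x_i,
  \qquad
  \frac{\partial L}{\partial b} = 0 \;\Rightarrow\; \sum_{i=1}^{n}\alpha_i y_i = 0 \\[4pt]
&\max_{\alpha \ge 0}\ \sum_{i=1}^{n}\alpha_i
  - \tfrac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n}\alpha_i\alpha_j\,y_i y_j\,x_i^{\top}x_j
  \quad \text{s.t.}\quad \sum_{i=1}^{n}\alpha_i y_i = 0
\end{aligned}
```

In the dual (last line) the data enter only through the inner products x_i^T x_j, which is why the nonlinear case can swap in a kernel K(x_i, x_j) without changing the rest of the problem.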

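The kernel we see most often is the Gaussian kernel (radial basis function, RBF). Its main idea is to map low-dimensional data into a higher-dimensional space; the kernel value is computed from the squared distance between two samples.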
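As a small sketch of what the Gaussian kernel computes, the snippet below evaluates exp(-gamma * ||a - b||^2) by hand and compares it with scikit-learn's `rbf_kernel`. The helper name `gaussian_kernel`, the sample vectors, and the value of `gamma` are made up for illustration.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel


def gaussian_kernel(a, b, gamma):
    """Hypothetical helper: K(a, b) = exp(-gamma * ||a - b||^2) for 1-D vectors."""
    diff = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    return np.exp(-gamma * np.dot(diff, diff))


x = np.array([1.0, 2.0])
y = np.array([2.0, 4.0])
gamma = 0.5  # illustrative value, not taken from the post

# Squared distance is (1-2)^2 + (2-4)^2 = 5, so the kernel is exp(-0.5 * 5).
k_manual = gaussian_kernel(x, y, gamma)

# scikit-learn expects 2-D arrays of shape (n_samples, n_features).
k_sklearn = rbf_kernel(x.reshape(1, -1), y.reshape(1, -1), gamma=gamma)[0, 0]

print(k_manual, k_sklearn)
```

The kernel equals 1 for identical points and decays toward 0 as points move apart; `gamma` controls how fast it decays.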

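The support vector machine used here has two parameters, svc__C (the penalty on slack) and svc__gamma. The larger either one is, the more complex the model becomes.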
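The effect of the two parameters can be sketched on a toy problem. The dataset (`make_moons`) and the specific C and gamma values below are assumptions chosen only to contrast a rigid model with a flexible one; they are not the values used for the face data.

```python
from sklearn.datasets import make_moons
from sklearn.svm import SVC

# Toy two-class data with some noise (illustrative, not the face dataset).
X, y = make_moons(n_samples=200, noise=0.3, random_state=0)

# Small C and gamma: a smooth, almost linear boundary.
simple = SVC(kernel='rbf', C=0.1, gamma=0.01).fit(X, y)

# Large C and gamma: a boundary that bends around individual points.
flexible = SVC(kernel='rbf', C=100, gamma=100).fit(X, y)

# Accuracy on the training data itself measures how closely each model fits it.
acc_simple = simple.score(X, y)
acc_flexible = flexible.score(X, y)
print(acc_simple, acc_flexible)
```

A higher training accuracy here only shows that the model is more flexible; it says nothing about accuracy on unseen data, which is why the parameters are tuned by cross-validation rather than set as high as possible.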

Next we build a model on a set of face images and tune its parameters.


Step 1: Load the data

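from sklearn.datasets import fetch_lfw_people  # fetch the dataset from the datasets package
import pandas as pd
import matplotlib.pyplot as plt

faces = fetch_lfw_people(min_faces_per_person=60)  # keep only people with at least 60 images
print(faces.target_names)  # names of the people in the photos
print(faces.images.shape)  # (number of images, 62, 47); the original run reported 639 images, and each of the 62*47 pixels is one feature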

Step 2: Take the first 15 images and plot them in a grid of 3 rows and 5 columns

fig, ax = plt.subplots(3, 5)
for i, axi in enumerate(ax.flat):
    axi.imshow(faces.images[i], cmap='bone')  # cmap selects the color map; 'bone' is a pale grayscale
    axi.set(xticks=[], yticks=[], xlabel=faces.target_names[faces.target[i]])
    # faces.target[i] is the integer label (0, 1, ...); target_names maps that index to the person's name
plt.show()

Step 3: Chain PCA and the SVM with make_pipeline

from sklearn.svm import SVC
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline

pca = PCA(n_components=150, whiten=True, random_state=42)  # whiten makes the outputs uncorrelated with unit variance
svc = SVC(kernel='rbf', class_weight='balanced')  # radial basis function kernel
model = make_pipeline(pca, svc)  # chain the two estimators; they run in the order given
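The parameter names svc__C and svc__gamma used in the next step come from how make_pipeline names its steps: each step gets the lowercased class name, and a step's parameters are addressed as `<step>__<parameter>`. A minimal check, which needs no data:

```python
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

model = make_pipeline(PCA(n_components=150, whiten=True, random_state=42),
                      SVC(kernel='rbf', class_weight='balanced'))

# Step names are generated automatically from the class names.
step_names = [name for name, _ in model.steps]
print(step_names)  # ['pca', 'svc']

# Nested parameters are exposed as '<step>__<parameter>'.
params = model.get_params()
print('svc__C' in params, 'svc__gamma' in params, 'pca__n_components' in params)
```

The same naming rule means PCA settings such as pca__n_components could be added to a parameter grid as well.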

Step 4: Tune svc__C and svc__gamma with GridSearchCV; .best_estimator_ gives the trained model

# split the data into a training set and a test set
from sklearn.model_selection import train_test_split
Xtrain, Xtest, Ytrain, Ytest = train_test_split(faces.data, faces.target, random_state=40)

# tune svc__C and svc__gamma
from sklearn.model_selection import GridSearchCV
# candidate values
param_grid = {'svc__C': [1, 5, 10], 'svc__gamma': [0.0001, 0.0005, 0.001]}
grid = GridSearchCV(model, param_grid)  # first argument: the model; second: the grid of parameters to tune
print(Xtrain.shape, Ytrain.shape)
grid.fit(Xtrain, Ytrain)  # fit one model per parameter combination
print(grid.best_params_)  # the best parameter combination
model = grid.best_estimator_  # the best model found
yfit = model.predict(Xtest)  # predict with the best model
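The face data has to be downloaded, so here is a sketch of the same grid-search pattern on the digits dataset that ships with scikit-learn. The dataset, the 30 PCA components, the grid values, and `cv=3` are all assumptions chosen to keep the example small; they are not the settings from this post.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Bundled 8x8 digit images stand in for the face images (no download needed).
digits = load_digits()
Xtrain, Xtest, ytrain, ytest = train_test_split(
    digits.data, digits.target, random_state=0)

model = make_pipeline(PCA(n_components=30, whiten=True, random_state=42),
                      SVC(kernel='rbf'))

# Every combination of these values is scored by cross-validation.
param_grid = {'svc__C': [1, 10], 'svc__gamma': [0.001, 0.01, 0.1]}
grid = GridSearchCV(model, param_grid, cv=3)
grid.fit(Xtrain, ytrain)

print(grid.best_params_)

# best_estimator_ is the pipeline refit on all training data with the best values.
best_model = grid.best_estimator_
test_accuracy = best_model.score(Xtest, ytest)
print(test_accuracy)
```

If the best value lands on the edge of the grid, it is usually worth widening the grid in that direction and searching again.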

Step 5: Plot the predictions, here in a 4-by-6 grid

fig, ax = plt.subplots(4, 6)  # 24 images laid out in 4 rows and 6 columns
for i, axi in enumerate(ax.flat):
    axi.imshow(Xtest[i].reshape(62, 47), cmap='bone')
    axi.set(xticks=[], yticks=[])
    # label with the last word of the predicted name (the surname): black if the prediction is correct, red otherwise
    axi.set_ylabel(faces.target_names[yfit[i]].split()[-1],
                   color='black' if yfit[i] == Ytest[i] else 'red')
fig.suptitle('Predicted Names; Incorrect Labels in Red', size=14)  # add the title before showing the figure
plt.show()

from sklearn.metrics import classification_report  # reports precision, recall and F1 per class
print(classification_report(Ytest, yfit, target_names=faces.target_names))
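To show what the precision and recall columns of the report mean, the sketch below computes both by hand for one class and checks them against scikit-learn. The six labels are made-up toy values, not results from the face model.

```python
from sklearn.metrics import precision_score, recall_score

# Toy labels for three classes (illustrative only).
y_true = [0, 0, 1, 1, 1, 2]
y_pred = [0, 1, 1, 1, 2, 2]

cls = 1  # the class we inspect
pairs = list(zip(y_true, y_pred))
tp = sum(t == cls and p == cls for t, p in pairs)  # predicted cls, and it was cls
fp = sum(t != cls and p == cls for t, p in pairs)  # predicted cls, but it was not
fn = sum(t == cls and p != cls for t, p in pairs)  # was cls, but predicted otherwise

precision_manual = tp / (tp + fp)  # share of "cls" predictions that are right
recall_manual = tp / (tp + fn)     # share of true "cls" samples that were found

# average=None returns one score per class, indexed by class label.
precision_sklearn = precision_score(y_true, y_pred, average=None)[cls]
recall_sklearn = recall_score(y_true, y_pred, average=None)[cls]

print(precision_manual, recall_manual)
```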

Step 6: Plot a confusion matrix

from sklearn.metrics import confusion_matrix  # builds the confusion matrix
import seaborn as sns

mat = confusion_matrix(Ytest, yfit)  # Ytest: true test labels, yfit: predicted labels
sns.heatmap(mat.T, square=True, annot=True, fmt='d', cbar=False,
            xticklabels=faces.target_names, yticklabels=faces.target_names)
plt.xlabel('true label')
plt.ylabel('predicted label')
plt.show()
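The layout of the matrix, and why the heatmap uses the transpose, can be seen on a toy example. The labels below are made up for illustration.

```python
from sklearn.metrics import confusion_matrix

# Toy labels for three classes (illustrative only).
y_true = [0, 0, 1, 1, 1, 2]
y_pred = [0, 1, 1, 1, 2, 2]

# mat[i, j] counts samples whose true label is i and predicted label is j.
mat = confusion_matrix(y_true, y_pred)
print(mat)

# Transposing puts predictions on the rows and true labels on the columns,
# which matches the 'true label' x-axis and 'predicted label' y-axis of the heatmap.
mat_t = mat.T
print(mat_t[1, 0])  # samples predicted as 1 whose true label is 0

# Correct predictions sit on the diagonal in both layouts.
n_correct = int(mat.trace())
print(n_correct)
```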