Face Recognition with SVM and PCA: A Case Study

Dataset Introduction

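LFW (Labeled Faces in the Wild) is a face database compiled by the Computer Vision Lab at the University of Massachusetts Amherst. It is mainly used to study face recognition under unconstrained conditions. The images were collected from the internet rather than captured in a laboratory. The database contains more than 13,000 face images, each labeled with the name of the person pictured, and about 1,680 of those people have two or more images. LFW is mainly used to measure the accuracy of face recognition.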
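The "at least N images per person" criterion described above is the same idea as the min_faces_per_person argument used in the code below. A minimal sketch of that filter, using only the standard library and hypothetical toy labels (the helper name people_with_min_images and the labels are assumptions for illustration, not part of LFW or scikit-learn):

```python
from collections import Counter


def people_with_min_images(names, min_images):
    """Return the sorted names that label at least `min_images` images."""
    counts = Counter(names)
    return sorted(name for name, n in counts.items() if n >= min_images)


# Hypothetical toy labels, one entry per image (not real LFW data).
image_labels = ["A", "B", "A", "C", "A", "B", "D"]

# People with two or more images, then people with three or more.
print(people_with_min_images(image_labels, 2))
print(people_with_min_images(image_labels, 3))
```

Raising the threshold shrinks the set of identities but leaves more images per class, which is why the code below keeps only people with at least 70 images.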

Code Implementation

from time import time               # time each step
import logging                      # log the program's progress
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split    # split into training and test sets
from sklearn.datasets import fetch_lfw_people           # load the dataset (famous people)
from sklearn.model_selection import GridSearchCV        # search over hyperparameters
from sklearn.metrics import classification_report       # per-class precision, recall and F1 score
from sklearn.metrics import confusion_matrix            # true vs. predicted classes as a matrix
from sklearn.decomposition import PCA                   # PCA dimensionality reduction
from sklearn.svm import SVC                             # support vector classifier

# Dataset: LFW face images provided through scikit-learn (downloaded on first use)
lfw_people = fetch_lfw_people(min_faces_per_person=70, resize=0.4)
n_samples, h, w = lfw_people.images.shape   # number of samples, image height, image width
x = lfw_people.data                         # all the data: 1288 images, 1850 features each
n_features = x.shape[1]                     # dimension of each feature vector: 1850
y = lfw_people.target                       # label (person id) of each face
target_names = lfw_people.target_names      # names of the people to recognize
n_class = target_names.shape[0]             # number of people to recognize

print("total dataset size:")
print("n_samples: %d" % n_samples)
print("n_features: %d" % n_features)
print("n_class: %d" % n_class)
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.25)
n_components = 150
t0 = time()
# Reduce the high-dimensional feature vectors to a lower dimension with randomized PCA; fit the model first
pca = PCA(n_components=n_components, svd_solver='randomized', whiten=True).fit(x_train)
print("done in %0.3fs" % (time() - t0))
# Extract the eigenfaces
eigenfaces = pca.components_.reshape((n_components, h, w))   # reshape each principal component into an image
print('projecting the input data on the eigenfaces orthonomal basis')
t0 = time()
# Project the data down to 150 dimensions
x_train_pca = pca.transform(x_train)
x_test_pca = pca.transform(x_test)
print("done in %0.3fs" % (time() - t0))
print("fitting the classfier to the training set")
t0 = time()
# C is the penalty on misclassified samples; gamma is the RBF kernel coefficient
para_grid = {'C': [1e3, 5e3, 1e4, 5e4, 1e5], 'gamma': [0.0001, 0.0005, 0.001, 0.005, 0.01, 0.1]}
# The RBF kernel works well on images; GridSearchCV tries every C/gamma combination and keeps the best one
clf = GridSearchCV(SVC(kernel='rbf'), para_grid)
clf = clf.fit(x_train_pca, y_train)
print("done in %0.3fs" % (time() - t0))

print("best estimator found by grid search:")
print(clf.best_estimator_)                                       # details of the best model
print("predict the people's name on the test set")
t0 = time()
y_pred = clf.predict(x_test_pca)
print("done in %0.3fs" % (time() - t0))
print(classification_report(y_test, y_pred, target_names=target_names))
print(confusion_matrix(y_test, y_pred, labels=list(range(n_class))))

def plot_gallery(images,titles,h,w,n_row = 3,n_col = 4):
    plt.figure(figsize=(1.8*n_col,2.4*n_row))
    plt.subplots_adjust(bottom = 0,left = .01,right = .99,top = .90,hspace = .35)
    for i in range(n_row * n_col):
        plt.subplot(n_row,n_col,i+1)
        plt.imshow(images[i].reshape((h,w)),cmap=plt.cm.gray)
        plt.title(titles[i],size = 12)
        plt.xticks(())    # hide the x-axis ticks
        plt.yticks(())    # hide the y-axis ticks

def title(y_pred,y_test,target_names,i):
    pred_name = target_names[y_pred[i]].rsplit(' ',1)[-1]
    true_name = target_names[y_test[i]].rsplit(' ',1)[-1]
    return 'predicted: %s\ntrue: %s' % (pred_name, true_name)

prediction_titles = [title(y_pred,y_test,target_names,i) for i in range(y_pred.shape[0])]

plot_gallery(x_test,prediction_titles,h,w)
eigenface_title = ["eigenface %d" % i for i in range(eigenfaces.shape[0])]
plot_gallery(eigenfaces,eigenface_title,h,w)
plt.show()
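The script above fits PCA once on the whole training set and then runs the grid search on the projected data. A common variation is to wrap both steps in a scikit-learn Pipeline, so that PCA is refit inside every cross-validation fold. The sketch below illustrates that idea. As an assumption made so it runs offline, it uses the small digits dataset bundled with scikit-learn instead of LFW, and the component count and parameter grid are illustrative choices rather than the values used above.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

# Stand-in data: 8x8 digit images bundled with scikit-learn (an assumption
# so the sketch needs no download; it is not the LFW data used above).
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# PCA and the SVM become one estimator, so each CV fold refits PCA
# on its own training part only.
model = Pipeline([
    ("pca", PCA(n_components=20, whiten=True,
                svd_solver="randomized", random_state=0)),
    ("svc", SVC(kernel="rbf")),
])

# Step parameters are addressed as <step name>__<parameter name>.
param_grid = {"svc__C": [1, 10], "svc__gamma": [0.01, 0.05]}
search = GridSearchCV(model, param_grid, cv=3)
search.fit(X_train, y_train)

test_accuracy = search.score(X_test, y_test)
print(search.best_params_)
print("test accuracy: %.3f" % test_accuracy)
```

With this layout, search.predict applies the fitted PCA projection automatically, so the separate pca.transform calls are not needed.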

Results

total dataset size:
n_samples: 1288
n_features: 1850
n_class: 7
done in 0.270s
projecting the input data on the eigenfaces orthonomal basis
done in 0.040s
fitting the classfier to the training set
done in 30.796s
best estimator found by grid search:
SVC(C=1000.0, cache_size=200, class_weight=None, coef0=0.0,
  decision_function_shape='ovr', degree=3, gamma=0.005, kernel='rbf',
  max_iter=-1, probability=False, random_state=None, shrinking=True,
  tol=0.001, verbose=False)
predict the people's name on the test set
done in 0.125s
                   precision    recall  f1-score   support

     Ariel Sharon       0.91      0.67      0.77        15
     Colin Powell       0.80      0.82      0.81        45
  Donald Rumsfeld       0.96      0.61      0.75        41
    George W Bush       0.79      0.96      0.87       144
Gerhard Schroeder       0.95      0.63      0.76        30
      Hugo Chavez       1.00      0.79      0.88        19
       Tony Blair       0.86      0.89      0.88        28

      avg / total       0.85      0.84      0.83       322
Confusion matrix
[[ 10   2   0   3   0   0   0]
 [  1  37   0   7   0   0   0]
 [  0   1  25  13   1   0   1]
 [  0   4   1 138   0   0   1]
 [  0   1   0   8  19   0   2]
 [  0   1   0   3   0  15   0]
 [  0   0   0   3   0   0  25]]
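The classification report and the confusion matrix describe the same predictions, so the report's numbers can be recomputed from the matrix. The sketch below does this in plain Python, reading rows as true classes and columns as predicted classes. The variable names are illustrative, and the weighted average is assumed to be what the "avg / total" row reports.

```python
# Confusion matrix copied from the results above: rows are true classes,
# columns are predicted classes, in the order of the classification report.
names = ["Ariel Sharon", "Colin Powell", "Donald Rumsfeld", "George W Bush",
         "Gerhard Schroeder", "Hugo Chavez", "Tony Blair"]
cm = [
    [10,  2,  0,   3,  0,  0,  0],
    [ 1, 37,  0,   7,  0,  0,  0],
    [ 0,  1, 25,  13,  1,  0,  1],
    [ 0,  4,  1, 138,  0,  0,  1],
    [ 0,  1,  0,   8, 19,  0,  2],
    [ 0,  1,  0,   3,  0, 15,  0],
    [ 0,  0,  0,   3,  0,  0, 25],
]

n = len(cm)
support = [sum(row) for row in cm]                               # true count per class
predicted = [sum(cm[r][c] for r in range(n)) for c in range(n)]  # predicted count per class

recall = [cm[i][i] / support[i] for i in range(n)]        # correct / all true members
precision = [cm[i][i] / predicted[i] for i in range(n)]   # correct / all predicted members
f1 = [2 * p * r / (p + r) for p, r in zip(precision, recall)]

total = sum(support)
accuracy = sum(cm[i][i] for i in range(n)) / total
# Support-weighted average of the per-class precision.
weighted_precision = sum(p * s for p, s in zip(precision, support)) / total

for name, p, r, f, s in zip(names, precision, recall, f1, support):
    print("%-18s %.2f %.2f %.2f %4d" % (name, p, r, f, s))
print("accuracy: %.2f, weighted precision: %.2f" % (accuracy, weighted_precision))
```

The George W Bush column collects most of the misclassified images, which matches his high recall and comparatively low precision in the report.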
