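The LFW (Labeled Faces in the Wild) face database was compiled by the Computer Vision Lab at the University of Massachusetts Amherst, mainly for studying face recognition under unconstrained conditions. Its images were collected from the internet rather than captured in a laboratory. The database contains more than 13,000 face images, each labeled with the name of the person pictured, and about 1,680 of those people have two or more images. LFW is mainly used to evaluate the accuracy of face recognition methods.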
from time import time  # time each stage
import logging  # print the program's run log

import matplotlib.pyplot as plt
from sklearn.datasets import fetch_lfw_people  # LFW loader (well-known people)
from sklearn.decomposition import PCA  # PCA dimensionality reduction
from sklearn.metrics import classification_report  # per-class precision, recall and F1 score
from sklearn.metrics import confusion_matrix  # true vs. predicted classes as a matrix
# sklearn.cross_validation and sklearn.grid_search were merged into
# sklearn.model_selection, and RandomizedPCA was replaced by
# PCA(svd_solver='randomized'), in later scikit-learn releases.
from sklearn.model_selection import GridSearchCV  # hyperparameter search
from sklearn.model_selection import train_test_split  # split into training and test sets
from sklearn.svm import SVC  # support vector classifier

# Show progress messages (for example while the dataset downloads)
logging.basicConfig(level=logging.INFO, format='%(asctime)s %(message)s')

# Dataset: the LFW face images available through scikit-learn
lfw_people = fetch_lfw_people(min_faces_per_person=70, resize=0.4)

n_samples, h, w = lfw_people.images.shape  # number of samples, image height, image width
x = lfw_people.data  # all samples: 1288 images with 1850 features each
n_features = x.shape[1]  # dimension of each feature vector (1850)
y = lfw_people.target  # person label for each face
target_names = lfw_people.target_names  # names of the people to recognize
n_class = target_names.shape[0]  # number of people to recognize

print("total dataset size:")
print("n_samples: %d" % n_samples)
print("n_features: %d" % n_features)
print("n_class: %d" % n_class)

x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.25)

n_components = 150

t0 = time()
# Fit a randomized PCA that maps the high-dimensional features to a lower dimension
pca = PCA(n_components=n_components, svd_solver='randomized', whiten=True).fit(x_train)
print("done in %0.3fs" % (time() - t0))

# Reshape the principal components back into images: the eigenfaces
eigenfaces = pca.components_.reshape((n_components, h, w))

print('projecting the input data on the eigenfaces orthonomal basis')
t0 = time()
# Project both sets onto the 150 principal components
x_train_pca = pca.transform(x_train)
x_test_pca = pca.transform(x_test)
print("done in %0.3fs" % (time() - t0))

print("fitting the classfier to the training set")
t0 = time()
# C is the penalty on misclassified training points; gamma is the RBF kernel coefficient
para_grid = {'C': [1e3, 5e3, 1e4, 5e4, 1e5],
             'gamma': [0.0001, 0.0005, 0.001, 0.005, 0.01, 0.1]}
# The RBF kernel works well on images. GridSearchCV tries every C/gamma
# combination exhaustively and keeps the best one.
clf = GridSearchCV(SVC(kernel='rbf'), para_grid)
clf = clf.fit(x_train_pca, y_train)
print("done in %0.3fs" % (time() - t0))
print("best estimator found by grid search:")
print(clf.best_estimator_)  # details of the best model

print("predict the people's name on the test set")
t0 = time()
y_pred = clf.predict(x_test_pca)
print("done in %0.3fs" % (time() - t0))

print(classification_report(y_test, y_pred, target_names=target_names))
print(confusion_matrix(y_test, y_pred, labels=range(n_class)))


def plot_gallery(images, titles, h, w, n_row=3, n_col=4):
    """Plot the first n_row * n_col images in a grid, each with its title."""
    plt.figure(figsize=(1.8 * n_col, 2.4 * n_row))
    plt.subplots_adjust(bottom=0, left=.01, right=.99, top=.90, hspace=.35)
    for i in range(n_row * n_col):
        plt.subplot(n_row, n_col, i + 1)
        plt.imshow(images[i].reshape((h, w)), cmap=plt.cm.gray)
        plt.title(titles[i], size=12)
        plt.xticks(())  # hide the axis ticks
        plt.yticks(())


def title(y_pred, y_test, target_names, i):
    """Build a plot title from the predicted and true surnames of sample i."""
    pred_name = target_names[y_pred[i]].rsplit(' ', 1)[-1]
    true_name = target_names[y_test[i]].rsplit(' ', 1)[-1]
    return 'predicted: %s\ntrue: %s' % (pred_name, true_name)


prediction_titles = [title(y_pred, y_test, target_names, i)
                     for i in range(y_pred.shape[0])]
plot_gallery(x_test, prediction_titles, h, w)

eigenface_title = ["eigenface %d" % i for i in range(eigenfaces.shape[0])]
plot_gallery(eigenfaces, eigenface_title, h, w)

plt.show()
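To give a feel for what the grid search over gamma is tuning, here is a minimal sketch of the RBF kernel formula, K(a, b) = exp(-gamma * ||a - b||^2), in plain Python. The helper name `rbf_kernel` and the two sample points are illustrative assumptions, not part of the script above, and this is not how scikit-learn computes the kernel internally.

```python
import math


def rbf_kernel(a, b, gamma):
    """RBF kernel value exp(-gamma * ||a - b||^2) for two equal-length vectors."""
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-gamma * sq_dist)


# Two hypothetical points whose squared distance is 3^2 + 4^2 = 25
a = [0.0, 0.0]
b = [3.0, 4.0]

# Three of the gamma values that appear in the grid
for gamma in (0.0001, 0.005, 0.1):
    print("gamma=%g -> K(a, b)=%.4f" % (gamma, rbf_kernel(a, b, gamma)))
```

With a small gamma the two points still look similar to the kernel, which tends to give a smoother decision boundary. With a large gamma the similarity falls off quickly with distance, so each training sample only influences its close neighborhood.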
total dataset size:
n_samples: 1288
n_features: 1850
n_class: 7
done in 0.270s
projecting the input data on the eigenfaces orthonomal basis
done in 0.040s
fitting the classfier to the training set
done in 30.796s
best estimator found by grid search:
SVC(C=1000.0, cache_size=200, class_weight=None, coef0=0.0,
  decision_function_shape='ovr', degree=3, gamma=0.005, kernel='rbf',
  max_iter=-1, probability=False, random_state=None, shrinking=True,
  tol=0.001, verbose=False)
predict the people's name on the test set
done in 0.125s
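The sizes printed above can be cross-checked with a little arithmetic. The sketch below assumes the images come back as 50 x 37 pixels when `resize=0.4` is used (the shape is not printed in the output, so treat it as an assumption) and that a fractional `test_size` is rounded up to a whole number of samples.

```python
import math

n_samples, n_features = 1288, 1850  # values from the printed output
h, w = 50, 37                       # assumed image height and width for resize=0.4

# Each image is flattened into one feature vector of h * w pixels
assert h * w == n_features

test_size = 0.25
n_test = math.ceil(test_size * n_samples)  # samples held out for testing
n_train = n_samples - n_test               # samples used to fit PCA and the SVM

n_components = 150
print("train: %d, test: %d" % (n_train, n_test))
print("PCA keeps %d of %d dimensions (%.1fx smaller)"
      % (n_components, n_features, n_features / n_components))
```

The 322 test samples match the total support in the classification report, and the PCA step shrinks each face from 1850 pixel values to 150 coefficients.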
                   precision    recall  f1-score   support

     Ariel Sharon       0.91      0.67      0.77        15
     Colin Powell       0.80      0.82      0.81        45
  Donald Rumsfeld       0.96      0.61      0.75        41
    George W Bush       0.79      0.96      0.87       144
Gerhard Schroeder       0.95      0.63      0.76        30
      Hugo Chavez       1.00      0.79      0.88        19
       Tony Blair       0.86      0.89      0.88        28

      avg / total       0.85      0.84      0.83       322
Confusion matrix:
[[ 10   2   0   3   0   0   0]
 [  1  37   0   7   0   0   0]
 [  0   1  25  13   1   0   1]
 [  0   4   1 138   0   0   1]
 [  0   1   0   8  19   0   2]
 [  0   1   0   3   0  15   0]
 [  0   0   0   3   0   0  25]]
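The classification report and the confusion matrix describe the same predictions, so one can be recomputed from the other. The sketch below copies the matrix from the output above (rows are true classes, columns are predicted classes) and derives precision, recall and F1 per person in plain Python. The helper name `per_class_metrics` is an illustrative assumption, not a scikit-learn function.

```python
names = ["Ariel Sharon", "Colin Powell", "Donald Rumsfeld", "George W Bush",
         "Gerhard Schroeder", "Hugo Chavez", "Tony Blair"]

# Confusion matrix from the output: cm[i][j] = images of person i predicted as person j
cm = [
    [10, 2, 0, 3, 0, 0, 0],
    [1, 37, 0, 7, 0, 0, 0],
    [0, 1, 25, 13, 1, 0, 1],
    [0, 4, 1, 138, 0, 0, 1],
    [0, 1, 0, 8, 19, 0, 2],
    [0, 1, 0, 3, 0, 15, 0],
    [0, 0, 0, 3, 0, 0, 25],
]


def per_class_metrics(cm):
    """Return (precision, recall, f1, support) for every class in the matrix."""
    n = len(cm)
    metrics = []
    for k in range(n):
        tp = cm[k][k]                                   # correctly predicted as class k
        predicted_k = sum(cm[i][k] for i in range(n))   # column sum: everything predicted as k
        actual_k = sum(cm[k])                           # row sum: everything that truly is k
        precision = tp / predicted_k if predicted_k else 0.0
        recall = tp / actual_k if actual_k else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        metrics.append((precision, recall, f1, actual_k))
    return metrics


metrics = per_class_metrics(cm)
for name, (p, r, f1, support) in zip(names, metrics):
    print("%18s  %.2f  %.2f  %.2f  %4d" % (name, p, r, f1, support))

total = sum(sum(row) for row in cm)
accuracy = sum(cm[k][k] for k in range(len(cm))) / total
print("accuracy: %.3f on %d test images" % (accuracy, total))
```

The large fourth column explains the pattern in the report: many faces of other people are predicted as George W Bush, the most frequent class, which lowers his precision and the recall of everyone else.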