Reposted from: https://blog.csdn.net/MyJournal/article/details/77841348?locationNum=9&fps=1
The overall idea of the algorithm is roughly as follows:
1. Train a face classification model. Input: an image; output: the features of that image.
1-1. Pre-train on the Caltech256 dataset to obtain a large general image classifier;
1-2. Fine-tune the pre-trained model with a face / non-face dataset to obtain a face classification model.
2. Train an SVM model (with the positive and negative samples redefined). Input: image features; output: image class.
3. Split the image into multiple rectangular proposal boxes and classify each proposal region with the SVM model, i.e. decide whether the region contains a face.
4. Use a regressor to refine the positions of the candidate boxes.
Each step is explained in detail below.
1. Training the face classification model
With a beginner's mindset (having just about mastered MNIST handwritten digit recognition), the usual approach is to set up a neural network (typically borrowing the layer structure of a model that does well on image classification, such as AlexNet or VGG16, although VGG16 is said to be computationally heavy and I have not tried it) and simply start training. But one question has to be considered first: how large is the dataset we have chosen for the model?
If my network has seven layers, the first four being convolution-pooling layers and the last three fully connected layers, how much data is appropriate for such a relatively complex network? A few thousand images? Tens of thousands? That is probably still too little. With so few images the model easily overfits, so we borrow a model that someone else has already trained on a large dataset. Note, however, that once you borrow someone else's model, the network structure you define for fine-tuning must be identical to it, except for the final number of output classes.
The neural network I defined is as follows:
import tensorflow as tf

def inference(input_tensor, train, regularizer, num):
    # four convolution + max-pooling stages followed by three fully connected layers;
    # num is the number of output classes (256 for pre-training, faces + background for fine-tuning)
    with tf.name_scope('layer1-conv1'):
        conv1_weights = tf.get_variable("weight1", [5, 5, 3, 32], initializer=tf.truncated_normal_initializer(stddev=0.1))
        conv1_biases = tf.get_variable("bias1", [32], initializer=tf.constant_initializer(0.0))
        conv1 = tf.nn.conv2d(input_tensor, conv1_weights, strides=[1, 1, 1, 1], padding='SAME')
        relu1 = tf.nn.relu(tf.nn.bias_add(conv1, conv1_biases))
    with tf.name_scope("layer2-pool1"):
        pool1 = tf.nn.max_pool(relu1, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding="VALID")
    with tf.variable_scope("layer3-conv2"):
        conv2_weights = tf.get_variable("weight2", [5, 5, 32, 64], initializer=tf.truncated_normal_initializer(stddev=0.1))
        conv2_biases = tf.get_variable("bias2", [64], initializer=tf.constant_initializer(0.0))
        conv2 = tf.nn.conv2d(pool1, conv2_weights, strides=[1, 1, 1, 1], padding='SAME')
        relu2 = tf.nn.relu(tf.nn.bias_add(conv2, conv2_biases))
    with tf.name_scope("layer4-pool2"):
        pool2 = tf.nn.max_pool(relu2, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='VALID')
    with tf.variable_scope("layer5-conv3"):
        conv3_weights = tf.get_variable("weight3", [3, 3, 64, 128], initializer=tf.truncated_normal_initializer(stddev=0.1))
        conv3_biases = tf.get_variable("bias3", [128], initializer=tf.constant_initializer(0.0))
        conv3 = tf.nn.conv2d(pool2, conv3_weights, strides=[1, 1, 1, 1], padding='SAME')
        relu3 = tf.nn.relu(tf.nn.bias_add(conv3, conv3_biases))
    with tf.name_scope("layer6-pool3"):
        pool3 = tf.nn.max_pool(relu3, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='VALID')
    with tf.variable_scope("layer7-conv4"):
        conv4_weights = tf.get_variable("weight4", [3, 3, 128, 128], initializer=tf.truncated_normal_initializer(stddev=0.1))
        conv4_biases = tf.get_variable("bias4", [128], initializer=tf.constant_initializer(0.0))
        conv4 = tf.nn.conv2d(pool3, conv4_weights, strides=[1, 1, 1, 1], padding='SAME')
        relu4 = tf.nn.relu(tf.nn.bias_add(conv4, conv4_biases))
    with tf.name_scope("layer8-pool4"):
        pool4 = tf.nn.max_pool(relu4, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='VALID')

    # flatten: with 100x100 inputs the last feature map is 6x6x128
    nodes = 6 * 6 * 128
    reshaped = tf.reshape(pool4, [-1, nodes])

    with tf.variable_scope('layer9-fc1'):
        fc1_weights = tf.get_variable("weight5", [nodes, 1024], initializer=tf.truncated_normal_initializer(stddev=0.1))
        if regularizer is not None:
            tf.add_to_collection('losses1', regularizer(fc1_weights))
        fc1_biases = tf.get_variable("bias5", [1024], initializer=tf.constant_initializer(0.1))
        fc1 = tf.nn.relu(tf.matmul(reshaped, fc1_weights) + fc1_biases)
        if train:
            fc1 = tf.nn.dropout(fc1, 0.5)
    with tf.variable_scope('layer10-fc2'):
        fc2_weights = tf.get_variable("weight6", [1024, 512], initializer=tf.truncated_normal_initializer(stddev=0.1))
        if regularizer is not None:
            tf.add_to_collection('losses2', regularizer(fc2_weights))
        fc2_biases = tf.get_variable("bias6", [512], initializer=tf.constant_initializer(0.1))
        fc2 = tf.nn.relu(tf.matmul(fc1, fc2_weights) + fc2_biases)
        if train:
            fc2 = tf.nn.dropout(fc2, 0.5)
    with tf.variable_scope('layer11-fc3'):
        fc3_weights = tf.get_variable("weight7", [512, num], initializer=tf.truncated_normal_initializer(stddev=0.1))
        if regularizer is not None:
            tf.add_to_collection('losses3', regularizer(fc3_weights))
        fc3_biases = tf.get_variable("bias7", [num], initializer=tf.constant_initializer(0.1))
        logit = tf.matmul(fc2, fc3_weights) + fc3_biases
    return logit  # fc3
1-1. Pre-training on the Caltech256 dataset (256 classes of images, including still objects, animals, people, etc.)
The final number of classes is 256, so simply set num in the network above to 256.
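As a rough sketch (not the author's exact training script), the pre-training graph could be wired to a softmax cross-entropy loss as below; the placeholders x and y_ for the image batch and integer labels, the regularizer scale, and the choice of optimizer are my own assumptions:

import tensorflow as tf

x = tf.placeholder(tf.float32, [None, 100, 100, 3])   # 100x100 inputs give the 6*6*128 flatten size used above
y_ = tf.placeholder(tf.int64, [None])                  # integer class labels, 0..255
regularizer = tf.contrib.layers.l2_regularizer(0.0001)
logits = inference(x, train=True, regularizer=regularizer, num=256)
loss = tf.reduce_mean(tf.nn.sparse_softmax_cross_entropy_with_logits(labels=y_, logits=logits))
train_op = tf.train.AdamOptimizer(1e-4).minimize(loss)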
Save the trained model to model.ckpt; the fine-tuning stage later reloads this pre-trained model.
checkpoint_file = os.path.join(log_dir, 'model.ckpt')
saver.save(sess, checkpoint_file)
1-2. Fine-tuning
Here I first describe the approach used in the paper. (My computer is too slow, so I did not do it exactly this way; I simply fed a face / non-face dataset in for fine-tuning, but the results were not great...)
Suppose the target-localization system we are building has to localize four kinds of targets: men, women, cats and dogs. We then set num in the last layer of the fine-tuned network to 5 (4+1), where the extra class represents background. How do we obtain background samples? First, the target positions in the images must be annotated in advance; each image may have one or more ground-truth rectangles (x, y, w, h: the minimum x coordinate, the minimum y coordinate, the width of the box, and the height of the box). Next, we obtain a number of proposal boxes with the selectivesearch function from the Python selectivesearch library (it merges pixels into regions based on colour changes, texture, and so on). Then we compute the IoU between each proposal and the ground-truth box (IoU = overlap area / area covered by the two rectangles together, where one rectangle is the ground-truth box and the other is the proposal) and compare it with a threshold: if the IoU is above the threshold, the proposal is labelled as one of the four classes (man, woman, cat or dog); if it is below, the proposal is labelled as background. The paper uses threshold = 0.5. Finally, we load the pre-trained model and train on these samples, fine-tuning the parameters on top of the pre-trained weights.
IoU is defined as follows:
def if_intersection(xmin_a, xmax_a, ymin_a, ymax_a, xmin_b, xmax_b, ymin_b, ymax_b):
    if_intersect = False
    # Use four conditions to check whether the two boxes overlap.
    # If none of them holds, the boxes are considered disjoint.
    if xmin_a < xmax_b <= xmax_a and (ymin_a < ymax_b <= ymax_a or ymin_a <= ymin_b < ymax_a):
        if_intersect = True
    elif xmin_a <= xmin_b < xmax_a and (ymin_a < ymax_b <= ymax_a or ymin_a <= ymin_b < ymax_a):
        if_intersect = True
    elif xmin_b < xmax_a <= xmax_b and (ymin_b < ymax_a <= ymax_b or ymin_b <= ymin_a < ymax_b):
        if_intersect = True
    elif xmin_b <= xmin_a < xmax_b and (ymin_b < ymax_a <= ymax_b or ymin_b <= ymin_a < ymax_b):
        if_intersect = True
    else:
        return False
    # If the boxes do intersect, sort the four x (and y) coordinates; the middle two
    # give the width and height of the intersection.
    if if_intersect == True:
        x_sorted_list = sorted([xmin_a, xmax_a, xmin_b, xmax_b])  # from small to big
        y_sorted_list = sorted([ymin_a, ymax_a, ymin_b, ymax_b])
        x_intersect_w = x_sorted_list[2] - x_sorted_list[1]
        y_intersect_h = y_sorted_list[2] - y_sorted_list[1]
        area_inter = x_intersect_w * y_intersect_h
        return area_inter


def IOU(ver1, ver2):
    # convert (x, y, w, h) boxes to (xmin, ymin, xmax, ymax)
    vertice1 = [ver1[0], ver1[1], ver1[0] + ver1[2], ver1[1] + ver1[3]]
    vertice2 = [ver2[0], ver2[1], ver2[0] + ver2[2], ver2[1] + ver2[3]]
    area_inter = if_intersection(vertice1[0], vertice1[2], vertice1[1], vertice1[3],
                                 vertice2[0], vertice2[2], vertice2[1], vertice2[3])
    # if there is an intersection, IoU = overlap / union
    if area_inter:
        area_1 = ver1[2] * ver1[3]
        area_2 = ver2[2] * ver2[3]
        iou = float(area_inter) / (area_1 + area_2 - area_inter)
        return iou
    iou = 0
    return iou
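As a quick sanity check of this helper: a 100x100 box at (0, 0) and a 100x100 box at (50, 50) overlap in a 50x50 region, so the IoU is 2500 / (10000 + 10000 - 2500) ≈ 0.143, well below the 0.5 fine-tuning threshold:

box_a = [0, 0, 100, 100]    # (x, y, w, h)
box_b = [50, 50, 100, 100]
print(IOU(box_a, box_b))    # prints roughly 0.143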
Load the pre-trained model and run fine-tune training:
from tensorflow.python import pywrap_tensorflow

def load_with_skip(data_path, session, skip_layer):
    # Copy every tensor stored in the pre-trained checkpoint into the current graph,
    # except the layers listed in skip_layer (the fully connected layers, whose size
    # changes when the number of classes changes for fine-tuning).
    reader = pywrap_tensorflow.NewCheckpointReader(data_path)
    data_dict = reader.get_variable_to_shape_map()
    var_map = {v.op.name: v for v in tf.global_variables()}
    for key in data_dict:
        print("tensor_name: ", key, data_dict[key])
        if key in var_map and not any(key.startswith(layer) for layer in skip_layer):
            session.run(var_map[key].assign(reader.get_tensor(key)))

saver = tf.train.Saver()
with tf.Session() as sess:
    restore = False
    sess.run(tf.global_variables_initializer())
    # if a fine-tuned model already exists, resume training from it ...
    ckpt1 = tf.train.get_checkpoint_state(aim_dir)
    if ckpt1 and ckpt1.model_checkpoint_path:
        restore = True
        saver.restore(sess, ckpt1.model_checkpoint_path)
        print('fine-tuning model already exists!')
        print("Continue training")
    else:
        # ... otherwise start from the pre-trained Caltech256 model, skipping the
        # fully connected layers defined in inference()
        ckpt = tf.train.get_checkpoint_state(log_dir)
        if ckpt and ckpt.model_checkpoint_path:
            restore = True
            print('original model already exists!')
            print("Continue training")
            load_with_skip(ckpt.model_checkpoint_path, sess, ['layer9-fc1', 'layer10-fc2', 'layer11-fc3'])
2. Training the SVM models. The paper puts it this way:
(1) How the dataset for SVM classification differs from the one for CNN classification:
‘for finetuning we map each object proposal to the ground-truth instance with which it has maximum IoU overlap (if any) and label it as a positive for the matched ground-truth class if the IoU is at least 0.5. All other proposals are labeled 「background」 (i.e., negative examples for all classes). For training SVMs, in contrast, we take only the ground-truth boxes as positive examples for their respective classes and label proposals with less than 0.3 IoU overlap with all instances of a class as a negative for that class. Proposals that fall into the grey zone (more than 0.3 IoU overlap, but are not ground truth) are ignored.’
In the fine-tuning stage, the image regions cropped by proposals with IoU above 0.5 are used as positive samples, and regions cropped by proposals with IoU below 0.5 as negative samples. In the per-class SVM training stage, by contrast, only the regions cropped by the ground-truth boxes are positives, regions cropped by proposals with IoU below 0.3 are negatives, and the remaining proposals are discarded.
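To make the two labelling schemes concrete, here is a small illustrative sketch (my own paraphrase of the rules quoted above, not code from the original project):

# fine-tuning: IoU >= 0.5 keeps the matched class, everything else becomes background (class 0)
def finetune_label(iou, class_index):
    return class_index if iou >= 0.5 else 0

# SVM training: only ground-truth boxes are positives, proposals with IoU < 0.3 are negatives,
# and proposals in the grey zone in between are ignored (returned as None here)
def svm_label(iou, class_index, is_ground_truth):
    if is_ground_truth:
        return class_index
    if iou < 0.3:
        return 0
    return None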
(2) One linear SVM per target class
‘Once features are extracted and training labels are applied, we optimize one linear SVM per class.’
A simple way to understand an SVM (support vector machine): it looks for a (hyper)plane that separates one class from its opposite as well as possible (a binary classification problem).
We feed the samples into the fine-tuned model and take the output of one of the fully connected layers as the feature vector; these features, together with the labels defined above (positive and negative samples), are then used as the SVM's training data, which gives us the SVM model.
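The idea in its barest form looks like the sketch below (the full pipeline appears in the code further down); cnn_features here is a hypothetical stand-in for the fc-layer output of the fine-tuned model, which the author's Restore_show function provides:

import numpy as np
from sklearn import svm

X = np.array([cnn_features(img) for img in sample_images])  # one feature vector per sample
y = np.array(sample_labels)                                  # 0 = background/negative, 1..K = object classes
clf = svm.LinearSVC()
clf.fit(X, y)
print(clf.predict(X[:5]))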
(3) Why use an SVM?
‘In Appendix B we discuss why the positive and negative examples are defined differently in fine-tuning versus SVM training. We also discuss the trade-offs involved in training detection SVMs rather than simply using the outputs from the final softmax layer of the fine-tuned CNN.’
The appendix of the paper explains why the targets are not classified directly with the CNN and its softmax output, and why SVMs are used for classification instead.
import os
import pickle
import numpy as np
import skimage.io
import selectivesearch
from PIL import Image
from sklearn import svm
from sklearn.externals import joblib   # newer scikit-learn: import joblib
# clip_pic, resize_image, pil_to_nparray and Restore_show are the author's own helper functions (not shown here)


def load_from_pkl(dataset_file):
    X, Y = pickle.load(open(dataset_file, 'rb'))
    return X, Y


def load_train_proposals(datafile, num_clss, threshold=0.5, svm=False, save=False, save_path='dataset.pkl'):
    train_list = open(datafile, 'r')
    labels = []
    images = []
    n = 0
    for line in train_list:
        n = n + 1
        print('n: ' + str(n))
        tmp = line.strip().split(' ')
        # tmp[0] = image path
        # tmp[1] = label
        # tmp[2] = ground-truth rectangle (x,y,w,h)
        img = skimage.io.imread(tmp[0])
        ref_rect = tmp[2].split(',')
        ref_rect_int = [int(i) for i in ref_rect]
        print(ref_rect)
        # im_orig: input image; scale: the larger the value, the larger the regions kept by the
        # felzenszwalb segmentation; sigma: width of the Gaussian kernel used by the segmentation;
        # min_size: minimum region size after segmentation
        img_lbl, regions = selectivesearch.selective_search(img, scale=200, sigma=0.3, min_size=25)
        candidates = set()
        for r in regions:
            # exclude duplicate rectangles (coming from different segments)
            if r['rect'] in candidates:
                continue
            # exclude boxes that are too small or too large
            if r['size'] < 220:
                continue
            if r['size'] > 4000:
                continue
            proposal_img, proposal_vertice = clip_pic(img, r['rect'])
            # delete empty arrays
            if len(proposal_img) == 0:
                continue
            # exclude boxes whose width or height is 0
            x, y, w, h = r['rect']
            if w == 0 or h == 0:
                continue
            # keep only roughly square boxes (aspect ratio between 0.7 and 1.3)
            if h / w <= 0.7 or h / w >= 1.3:
                continue
            # exclude image arrays with a zero dimension
            [a, b, c] = np.shape(proposal_img)
            if a == 0 or b == 0 or c == 0:
                continue
            # resize the proposal to the network input size
            im = Image.fromarray(proposal_img)
            resized_proposal_img = resize_image(im, 100, 100, resize_mode=3)
            candidates.add(r['rect'])
            img_float = pil_to_nparray(resized_proposal_img)
            images.append(img_float)
            # compute the IoU against the ground-truth box
            iou_val = IOU(ref_rect_int, proposal_vertice)
            # labels: 0 represents the default class, i.e. background
            index = int(tmp[1])
            if svm == False:
                # one-hot labels for CNN fine-tuning
                label = np.zeros(num_clss + 1)
                if iou_val < threshold:
                    label[0] = 1
                else:
                    label[index] = 1
                labels.append(label)
            else:
                # scalar labels for SVM training
                if iou_val < threshold:
                    labels.append(0)
                else:
                    labels.append(index)
            print(r['rect'])
            print('iou_val: ' + str(iou_val))
            print('labels append!')
    if save:
        pickle.dump((images, labels), open(save_path, 'wb'))
    return images, labels


def generate_single_svm_train(one_class_train_file):
    # collect the training samples for one SVM
    trainfile = one_class_train_file
    savepath = one_class_train_file.replace('txt', 'pkl')
    print(savepath)
    images = []
    Y = []
    if os.path.isfile(savepath):
        print("restoring svm dataset " + savepath)
        images, Y = load_from_pkl(savepath)
    else:
        print("loading svm dataset " + savepath)
        images, Y = load_train_proposals(trainfile, 3, threshold=0.3, svm=True, save=True, save_path=savepath)
    return images, Y


def train_svms(train_file_folder, model):
    listings = os.listdir(train_file_folder)
    print(listings)
    svms = []
    for train_file in listings:
        if "pkl" in train_file:
            continue
        X, Y = generate_single_svm_train(train_file_folder + train_file)
        print(np.shape(X))
        print('success!')
        train_features = []
        for i in range(0, len(Y)):
            imgsvm = X[i]
            labelsvm = Y[i]
            print('svm LABEL: ' + str(labelsvm))
            # Restore_show feeds the image through the fine-tuned CNN and returns (features, predicted label)
            feats, prelabel = Restore_show(imgsvm)
            train_features.append(feats[0])
        print("feature dimension")
        clf = svm.LinearSVC()
        print("fit svm")
        clf.fit(train_features, Y)
        print(clf)
        print(clf.score(train_features, Y))  # print the fit on the training set
        joblib.dump(clf, os.getcwd() + '/svm/filename.pkl')  # save the SVM model
        svms.append(clf)
    print(svms)
    return svms
3. Use selective search to split the image into multiple rectangular proposals and classify each proposal region with the SVM model, i.e. decide whether the region contains a face; proposals labelled 1 (i.e. containing a face) are recorded:
# image_proposal works like load_train_proposals above: it runs selective search and filters the proposal boxes
imgs, verts = image_proposal(img_path)

with tf.Session() as sess:
    features = []
    box_images = []
    print("predict image:")
    results = []
    results_label = []
    results_ratio = []
    count = 0
    number = 0
    temp = []
    for f in imgs:
        # Restore_show feeds the region into the CNN classification model and returns
        # the features, the predicted label and the probability that it is a face
        feats, prelabel, ratio = Restore_show(f)
        # load the SVM model and classify the region from its features feats[0]
        clf = joblib.load(os.getcwd() + '/svm/filename.pkl')
        pred = clf.predict(feats[0])
        print(pred)
        if pred[0] != 0:
            # keep the boxes classified as faces (label != 0)
            results.append(verts[count])
            results_label.append(pred[0])
            results_ratio.append(ratio)
            temp.append((ratio, verts[count][0], verts[count][1], verts[count][2], verts[count][3]))
            number += 1
        count += 1
4. Refining the candidate box positions with a regressor (bounding-box regression)
This part is covered carefully both in the paper and in many blog posts, so I will not repeat the main formulas. The rough idea is that there is some deviation between the ground-truth box and the proposal box, so we need to learn a mapping that resets the proposal's centre point and size. To keep this mapping approximately linear, the proposals used for ridge regression should have an IoU with the ground-truth box of at least 0.6 (the value chosen in the paper; I used 0.7 and the results also seemed fine).
4-1. The inputs for ridge regression training are: the CNN feature values of the box region in the image, the centre coordinates, width and height (x, y, w, h) of the ground-truth box, and the centre coordinates, width and height (x, y, w, h) of the proposal box.
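The training step itself is not shown in the original post; a minimal sketch of what it might look like is below. It assumes arrays feats (one CNN feature vector per selected proposal), proposals and gts (matched proposal and ground-truth boxes as (x, y, w, h), kept only when their IoU is at least 0.7), and it writes to the same boxregression/*.pkl paths that the prediction code below loads. The regression targets mirror how the predictions are applied there (x1 = w*predx + x, w1 = w*exp(predw)):

import os
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.externals import joblib   # newer scikit-learn: import joblib

# regression targets for each matched (proposal, ground truth) pair
tx = (gts[:, 0] - proposals[:, 0]) / proposals[:, 2]
ty = (gts[:, 1] - proposals[:, 1]) / proposals[:, 3]
tw = np.log(gts[:, 2] / proposals[:, 2])
th = np.log(gts[:, 3] / proposals[:, 3])

# one ridge regressor per coordinate, saved to the paths used in the prediction code
for name, target in zip(['x', 'y', 'w', 'h'], [tx, ty, tw, th]):
    reg = Ridge(alpha=1.0)
    reg.fit(feats, target)
    joblib.dump(reg, os.getcwd() + '/boxregression/filename' + name + '.pkl')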
4-2. Prediction:
# Output_show is similar to Restore_show: it feeds the image region into the CNN
# classification model and returns the features and the predicted label.
# ax, temp and number come from the detection step above; flag_not holds the indices
# of boxes filtered out elsewhere in the author's code.
feature, classnum = Output_show(img_path, 0, 0, size[0], size[1])

# load the four ridge regression models (one each for x, y, w and h)
clf = joblib.load(os.getcwd() + '/boxregression/filenamex.pkl')
predx = clf.predict(feature)
clf = joblib.load(os.getcwd() + '/boxregression/filenamey.pkl')
predy = clf.predict(feature)
clf = joblib.load(os.getcwd() + '/boxregression/filenamew.pkl')
predw = clf.predict(feature)
clf = joblib.load(os.getcwd() + '/boxregression/filenameh.pkl')
predh = clf.predict(feature)

for i in range(number - 1, -1, -1):
    if i not in flag_not:
        print(temp[i][1], temp[i][2], temp[i][3], temp[i][4])
        x = float(temp[i][1])
        y = float(temp[i][2])
        w = float(temp[i][3])
        h = float(temp[i][4])
        # apply the regression offsets: shift the corner by (w*predx, h*predy)
        # and rescale the size by exp(predw), exp(predh)
        x1 = max(w * predx + x, 0)
        y1 = max(h * predy + y, 0)
        w1 = w * math.exp(predw)
        h1 = h * math.exp(predh)
        print(str(x1) + ' ' + str(y1) + ' ' + str(w1) + ' ' + str(h1))
        # draw the box after bounding-box regression (red)
        rect = mpatches.Rectangle((x1, y1), w1, h1, fill=False, edgecolor='red', linewidth=2)
        ax.add_patch(rect)
        # draw the box before bounding-box regression (white)
        rect1 = mpatches.Rectangle((x, y), w, h, fill=False, edgecolor='white', linewidth=2)
        ax.add_patch(rect1)
        # write the predicted face probability next to the box
        out_ratio = str(temp[i][1])
        plt.text(x1 + 15, y1 + 15, str(temp[i][0]), color='red')
References:
1. http://blog.csdn.net/bixiwen_liu/article/details/53840913
2. http://blog.csdn.net/ture_dream/article/details/52896452
3. http://blog.csdn.net/daunxx/article/details/51578787
4. https://github.com/rbgirshick/rcnn
5. http://www.cnblogs.com/edwardbi/p/5647522.html