Object Detection in OpenCV 3 Using Python
http://docs.opencv.org/master/d7/d8b/tutorial_py_face_detection.html
Face Detection using Haar Cascades

Goal:

In this session,
• We will see the basics of face detection using Haar feature-based cascade classifiers.
• We will extend the same for eye detection etc.
1 Basics

Object Detection using Haar feature-based cascade classifiers is an effective object detection method proposed by Paul Viola and Michael Jones in their 2001 paper, "Rapid Object Detection using a Boosted Cascade of Simple Features". It is a machine learning based approach where a cascade function is trained from a lot of positive and negative images, and is then used to detect objects in other images.
Here we will work with face detection. Initially, the algorithm needs a lot of positive images (images of faces) and negative images (images without faces) to train the classifier. Then we need to extract features from them. For this, the Haar features shown in the image below are used. They are just like our convolutional kernels. Each feature is a single value obtained by subtracting the sum of pixels under the white rectangle from the sum of pixels under the black rectangle.
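The "black sum minus white sum" idea above can be sketched in a few lines of NumPy. The window contents, feature geometry, and function name here are mine, chosen purely for illustration; they are not taken from the paper:

```python
import numpy as np

def haar_two_rect_feature(window, x, y, w, h):
    """Value of a horizontal two-rectangle Haar feature:
    sum of pixels under the black (right) half minus the
    sum of pixels under the white (left) half."""
    white = window[y:y+h, x:x+w//2].sum()
    black = window[y:y+h, x+w//2:x+w].sum()
    return int(black) - int(white)

# A toy 24x24 "window": left half bright, right half dark.
win = np.zeros((24, 24), dtype=np.uint8)
win[:, :12] = 200
win[:, 12:] = 50

print(haar_two_rect_feature(win, 0, 0, 24, 24))  # 24*12*50 - 24*12*200 = -43200
```

A real detector would evaluate thousands of such features, at every size and position inside the window, which is what makes the integral-image trick below necessary.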
Now all possible sizes and locations of each kernel are used to calculate plenty of features. (Just imagine how much computation it needs: even a 24x24 window results in over 160000 features.) For each feature calculation, we need to find the sum of pixels under the white and black rectangles. To solve this, they introduced the integral image. However large the rectangle, it reduces the calculation of the pixel sum to an operation involving just four pixels. Nice, isn't it? It makes things super-fast.
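The four-pixel trick can be sketched as follows, building the integral image with NumPy's cumulative sums (the variable names and the toy image are my own, not from the paper):

```python
import numpy as np

img = np.arange(36, dtype=np.int64).reshape(6, 6)  # toy 6x6 "image"

# Integral image with a zero row/column prepended, so that
# ii[y, x] = sum of img[:y, :x].
ii = np.zeros((7, 7), dtype=np.int64)
ii[1:, 1:] = img.cumsum(axis=0).cumsum(axis=1)

def rect_sum(ii, x, y, w, h):
    """Sum of pixels in the w x h rectangle with top-left corner (x, y),
    computed from just four entries of the integral image."""
    return ii[y+h, x+w] - ii[y, x+w] - ii[y+h, x] + ii[y, x]

print(rect_sum(ii, 1, 2, 3, 2))       # four lookups...
print(int(img[2:4, 1:4].sum()))       # ...equal the direct pixel sum: 102
```

Once `ii` is built (a single pass over the image), every rectangle sum costs the same four additions, regardless of how large the rectangle is.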
But among all these features we calculated, most of them are irrelevant. For example, consider the image below. The top row shows two good features. The first feature selected seems to focus on the property that the region of the eyes is often darker than the region of the nose and cheeks. The second feature selected relies on the property that the eyes are darker than the bridge of the nose. But the same windows applied to the cheeks or any other place are irrelevant. So how do we select the best features out of 160000+ features? It is achieved by Adaboost.
For this, we apply each and every feature on all the training images. For each feature, we find the best threshold which will classify the faces as positive and negative. But obviously, there will be errors or misclassifications. We select the features with the minimum error rate, which means they are the features that best classify the face and non-face images. (The process is not as simple as this. Each image is given an equal weight in the beginning. After each classification, the weights of misclassified images are increased. Then the same process is done again: new error rates and new weights are calculated. The process continues until the required accuracy or error rate is achieved, or the required number of features is found.)
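The per-feature threshold search can be sketched as a weighted decision stump over 1-D feature values. This is a toy illustration of the selection step only (the data, names, and exact stump form are my own, not the precise AdaBoost variant from the paper):

```python
def best_stump(values, labels, weights):
    """For one feature, find the (threshold, polarity) pair with the
    lowest weighted classification error over the training windows."""
    best = (float('inf'), None, None)  # (error, threshold, polarity)
    for t in sorted(set(values)):
        for polarity in (+1, -1):
            err = sum(w for v, y, w in zip(values, labels, weights)
                      if polarity * (1 if v >= t else -1) != y)
            if err < best[0]:
                best = (err, t, polarity)
    return best

# Feature values for 3 "face" (+1) and 3 "non-face" (-1) windows.
values  = [5.0, 4.0, 3.5, 1.0, 0.5, 2.0]
labels  = [+1, +1, +1, -1, -1, -1]
weights = [1/6] * 6  # equal weights before the first boosting round

err, thr, pol = best_stump(values, labels, weights)
print(err, thr, pol)  # a threshold of 3.5 separates this toy data perfectly
```

In boosting proper, after each round the weights of misclassified windows would be increased and `best_stump` run again, so later features concentrate on the hard examples.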
Final classifier is a weighted sum of these weak classifiers. It is called weak because it alone can't classify the image, but together with others forms a strong classifier. The paper says even 200 features provide detection with 95% accuracy. Their final setup had around 6000 features. (Imagine a reduction from 160000+ features to 6000 features. That is a big gain).
So now you take an image. Take each 24x24 window. Apply 6000 features to it. Check if it is face or not. Wow.. Wow.. Isn't it a little inefficient and time consuming? Yes, it is. Authors have a good solution for that.
In an image, most of the image region is non-face region. So it is a better idea to have a simple method to check whether a window is a face region; if it is not, discard it in a single shot and don't process it again. Instead, focus on regions where there can be a face. This way, we spend more time checking possible face regions.
For this they introduced the concept of a Cascade of Classifiers. Instead of applying all 6000 features to a window, group the features into different stages of classifiers and apply them one by one. (Normally the first few stages contain very few features.) If a window fails the first stage, discard it; we don't consider the remaining features on it. If it passes, apply the second stage of features and continue the process. A window which passes all stages is a face region. Isn't that a good plan?!
The authors' detector had 6000+ features in 38 stages, with 1, 10, 25, 25 and 50 features in the first five stages. (The two features in the image above are actually the best two features obtained from Adaboost.) According to the authors, on average 10 features out of 6000+ are evaluated per sub-window.
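The early-reject behaviour of the cascade can be sketched like this. The stage scores and thresholds below are stand-ins I invented for illustration, not the actual trained stages:

```python
def cascade_classify(window, stages):
    """Run a window through the cascade. A window must clear every
    stage threshold to be reported as a face; most windows fail an
    early, cheap stage, so on average only a few features are evaluated."""
    for stage_fn, threshold in stages:
        if stage_fn(window) < threshold:
            return False  # rejected: stop, skip all remaining stages
    return True  # passed every stage: candidate face region

# Toy stages: the "score" here is just the mean brightness of the window.
stages = [
    (lambda w: sum(w) / len(w), 10),  # cheap first stage, few features
    (lambda w: sum(w) / len(w), 50),  # later stages are stricter
]

print(cascade_classify([5, 5, 5], stages))      # fails stage 1 -> False
print(cascade_classify([80, 90, 100], stages))  # passes both -> True
```

The design choice is the same as in the paper: put the cheapest, most discriminative tests first so the expensive stages only ever see the rare windows that survive.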
So this is a simple intuitive explanation of how Viola-Jones face detection works. Read paper for more details or check out the references in Additional Resources section.
2 Haar Cascade Detection in OpenCV

OpenCV comes with a trainer as well as a detector. If you want to train your own classifier for any object like cars, planes etc., you can use OpenCV to create one. Its full details are given here: Cascade Classifier Training.
Here we will deal with detection. OpenCV already contains many pre-trained classifiers for faces, eyes, smiles etc. Those XML files are stored in the opencv/data/haarcascades/ folder. Let's create a face and eye detector with OpenCV.
First we need to load the required XML classifiers. Then load our input image (or video) in grayscale mode.
# -*- coding: utf-8 -*-
import numpy as np
import cv2

face_cascade = cv2.CascadeClassifier('haarcascade_frontalface_default.xml')
eye_cascade = cv2.CascadeClassifier('haarcascade_eye.xml')

img = cv2.imread('sachin.jpg')
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
Now we find the faces in the image. If faces are found, it returns the positions of detected faces as Rect(x,y,w,h). Once we get these locations, we can create a ROI for the face and apply eye detection on this ROI (since eyes are always on the face !!! ).
# Detects objects of different sizes in the input image.
# The detected objects are returned as a list of rectangles.
# cv2.CascadeClassifier.detectMultiScale(image, scaleFactor, minNeighbors, flags, minSize, maxSize)
#   scaleFactor  – how much the image size is reduced at each image scale.
#   minNeighbors – how many neighbors each candidate rectangle should have to retain it.
#   minSize      – minimum possible object size; smaller objects are ignored.
#   maxSize      – maximum possible object size; larger objects are ignored.
faces = face_cascade.detectMultiScale(gray, 1.3, 5)
for (x, y, w, h) in faces:
    img = cv2.rectangle(img, (x, y), (x+w, y+h), (255, 0, 0), 2)
    roi_gray = gray[y:y+h, x:x+w]
    roi_color = img[y:y+h, x:x+w]
    eyes = eye_cascade.detectMultiScale(roi_gray)
    for (ex, ey, ew, eh) in eyes:
        cv2.rectangle(roi_color, (ex, ey), (ex+ew, ey+eh), (0, 255, 0), 2)

cv2.imshow('img', img)
cv2.waitKey(0)
cv2.destroyAllWindows()
Result looks like below:
Additional Resources
1. Video Lecture on Face Detection and Tracking
2. An interesting interview regarding Face Detection by Adam Harvey