Viola–Jones object detection framework--Rapid Object Detection using a Boosted Cascade of Simple Features

ACCEPTED CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION 2001

 

 

 

Rapid Object Detection using a Boosted Cascade of Simple Features

Paul Viola                                                            Michael Jones

viola@merl.com                                               mjones@crl.dec.com

Mitsubishi Electric Research Labs                                        Compaq CRL

201 Broadway, 8th FL                                            One Cambridge Center

Cambridge, MA 02139                                           Cambridge, MA 02142

 

 

 

Abstract

 

This paper describes a machine learning approach for visual object detection which is capable of processing images extremely rapidly and achieving high detection rates. This work is distinguished by three key contributions. The first is the introduction of a new image representation called the "Integral Image" which allows the features used by our detector to be computed very quickly. The second is a learning algorithm, based on AdaBoost, which selects a small number of critical visual features from a larger set and yields extremely efficient classifiers [6]. The third contribution is a method for combining increasingly more complex classifiers in a "cascade" which allows background regions of the image to be quickly discarded while spending more computation on promising object-like regions. The cascade can be viewed as an object specific focus-of-attention mechanism which unlike previous approaches provides statistical guarantees that discarded regions are unlikely to contain the object of interest. In the domain of face detection the system yields detection rates comparable to the best previous systems. Used in real-time applications, the detector runs at 15 frames per second without resorting to image differencing or skin color detection.


 

1.  Introduction

 

This paper brings together new algorithms and insights to construct a framework for robust and extremely rapid object detection. This framework is demonstrated on, and in part motivated by, the task of face detection. Toward this end we have constructed a frontal face detection system which achieves detection and false positive rates which are equivalent to the best published results [16, 12, 15, 11, 1]. This face detection system is most clearly distinguished from previous approaches in its ability to detect faces extremely rapidly. Operating on 384 by 288 pixel images, faces are detected at 15 frames per second on a conventional 700 MHz Intel Pentium III. In other face detection systems, auxiliary information, such as image differences in video sequences, or pixel color in color images, have been used to achieve high frame rates. Our system achieves high frame rates working only with the information present in a single grey scale image. These alternative sources of information can also be integrated with our system to achieve even higher frame rates.


 

There are three main contributions of our object detection framework. We will introduce each of these ideas briefly below and then describe them in detail in subsequent sections.


 

The first contribution of this paper is a new image representation called an integral image that allows for very fast feature evaluation. Motivated in part by the work of Papageorgiou et al. our detection system does not work directly with image intensities [10]. Like these authors we use a set of features which are reminiscent of Haar Basis functions (though we will also use related filters which are more complex than Haar filters). In order to compute these features very rapidly at many scales we introduce the integral image representation for images. The integral image can be computed from an image using a few operations per pixel. Once computed, any one of these Haar-like features can be computed at any scale or location in constant time.


 

The second contribution of this paper is a method for constructing a classifier by selecting a small number of important features using AdaBoost [6]. Within any image sub-window the total number of Haar-like features is very large, far larger than the number of pixels. In order to ensure fast classification, the learning process must exclude a large majority of the available features, and focus on a small set of critical features. Motivated by the work of Tieu and Viola, feature selection is achieved through a simple modification of the AdaBoost procedure: the weak learner is constrained so that each weak classifier returned can depend on only a single feature [2]. As a result each stage of the boosting process, which selects a new weak classifier, can be viewed as a feature selection process. AdaBoost provides an effective learning algorithm and strong bounds on generalization performance [13, 9, 10].


 

The third major contribution of this paper is a method for combining successively more complex classifiers in a cascade structure which dramatically increases the speed of the detector by focusing attention on promising regions of the image. The notion behind focus of attention approaches is that it is often possible to rapidly determine where in an image an object might occur [17, 8, 1]. More complex processing is reserved only for these promising regions. The key measure of such an approach is the "false negative" rate of the attentional process. It must be the case that all, or almost all, object instances are selected by the attentional filter.


 

We will describe a process for training an extremely simple and efficient classifier which can be used as a "supervised" focus of attention operator. The term supervised refers to the fact that the attentional operator is trained to detect examples of a particular class. In the domain of face detection it is possible to achieve fewer than 1% false negatives and 40% false positives using a classifier constructed from two Haar-like features. The effect of this filter is to reduce by over one half the number of locations where the final detector must be evaluated.


 

Those sub-windows which are not rejected by the initial classifier are processed by a sequence of classifiers, each slightly more complex than the last. If any classifier rejects the sub-window, no further processing is performed.  The structure of the cascaded detection process is essentially that of a degenerate decision tree, and as such is related to the work of Geman and colleagues [1, 4].


 

An extremely fast face detector will have broad practical applications. These include user interfaces, image databases, and teleconferencing. In applications where rapid frame-rates are not necessary, our system will allow for significant additional post-processing and analysis. In addition our system can be implemented on a wide range of small low power devices, including hand-helds and embedded processors. In our lab we have implemented this face detector on the Compaq iPaq handheld and have achieved detection at two frames per second (this device has a low power 200 mips Strong Arm processor which lacks floating point hardware).


 

The remainder of the paper describes our contributions and a number of experimental results, including a detailed description of our experimental methodology.  Discussion of closely related work takes place at the end of each section.


 

2.  Features

 

Our object detection procedure classifies images based on the value of simple features.  There are many motivations for using features rather than the pixels directly. The most common reason is that features can act to encode ad-hoc domain knowledge that is difficult to learn using a finite quantity of training data.  For this system there is also a second critical motivation for features:  the feature based system operates much faster than a pixel-based system.


 

The simple features used are reminiscent of Haar basis functions which have been used by Papageorgiou et al. [10]. More specifically, we use three kinds of features. The value of a two-rectangle feature is the difference between the sum of the pixels within two rectangular regions. The regions have the same size and shape and are horizontally or vertically adjacent (see Figure 1). A three-rectangle feature computes the sum within two outside rectangles subtracted from the sum in a center rectangle. Finally a four-rectangle feature computes the difference between diagonal pairs of rectangles.


 

Given that the base resolution of the detector is 24x24, the exhaustive set of rectangle features is quite large, over 180,000. Note that unlike the Haar basis, the set of rectangle features is overcomplete.1

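As a rough check on this count, the placements of each feature shape at all integer scales and locations in a 24x24 window can be enumerated directly. The sketch below is our illustration, not the authors' code: it counts the five standard shapes (which gives 162,336); the paper's richer feature set brings the total above 180,000.

```python
def count_placements(W, H, base_w, base_h):
    """Count all positions of a feature whose base shape is base_w x base_h
    pixels, stretched by integer factors, inside a W x H detection window."""
    total = 0
    for w in range(base_w, W + 1, base_w):        # scaled widths
        for h in range(base_h, H + 1, base_h):    # scaled heights
            total += (W - w + 1) * (H - h + 1)    # top-left corner positions
    return total

# Two-rectangle (2x1 horizontal, 1x2 vertical), three-rectangle (3x1, 1x3)
# and four-rectangle (2x2) shapes in a 24x24 base window:
shapes = [(2, 1), (1, 2), (3, 1), (1, 3), (2, 2)]
total = sum(count_placements(24, 24, bw, bh) for bw, bh in shapes)
print(total)  # 162336 for these five shapes
```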

Figure 1: Example rectangle features shown relative to the enclosing detection window. The sum of the pixels which lie within the white rectangles is subtracted from the sum of pixels in the grey rectangles. Two-rectangle features are shown in (A) and (B). Figure (C) shows a three-rectangle feature, and (D) a four-rectangle feature.


2.1. Integral Image

Rectangle features can be computed very rapidly using an intermediate representation for the image which we call the integral image.2 The integral image at location x, y contains the sum of the pixels above and to the left of x, y, inclusive:

ii(x, y) = Σ_{x' ≤ x, y' ≤ y} i(x', y')




 

1 A complete basis has no linear dependence between basis elements and has the same number of elements as the image space, in this case 576. The full set of 180,000 features is many times over-complete.

2 There is a close relation to "summed area tables" as used in graphics [3]. We choose a different name here in order to emphasize its use for the analysis of images, rather than for texture mapping.

 


 

 


 

Figure 2: The sum of the pixels within rectangle D can be computed with four array references. The value of the integral image at location 1 is the sum of the pixels in rectangle A. The value at location 2 is A+B, at location 3 is A+C, and at location 4 is A+B+C+D. The sum within D can be computed as 4+1-(2+3).


where ii(x, y) is the integral image and i(x, y) is the original image. Using the following pair of recurrences:

s(x, y) = s(x, y - 1) + i(x, y)
ii(x, y) = ii(x - 1, y) + s(x, y)

(where s(x, y) is the cumulative row sum, s(x, -1) = 0, and ii(-1, y) = 0) the integral image can be computed in one pass over the original image.
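A minimal sketch of this one-pass computation (illustrative Python, assuming the image is stored row-major as img[y][x]):

```python
def integral_image(img):
    """One-pass integral image: ii[y][x] = sum of img[y'][x'] for y' <= y, x' <= x.
    A running row sum plays the role of the cumulative sum s in the recurrences."""
    h, w = len(img), len(img[0])
    ii = [[0] * w for _ in range(h)]
    for y in range(h):
        s = 0  # cumulative sum along the current row; starts at 0
        for x in range(w):
            s += img[y][x]
            ii[y][x] = (ii[y - 1][x] if y > 0 else 0) + s
    return ii
```

For example, integral_image([[1, 2], [3, 4]]) yields [[1, 3], [4, 10]].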

 

Using the integral image any rectangular sum can be computed in four array references (see Figure 2).  Clearly the difference between two rectangular sums can be computed in eight references. Since the two-rectangle features defined above involve adjacent rectangular sums they can be computed in six array references, eight in the case of the three-rectangle features, and nine for four-rectangle features.

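For illustration, the four-reference rectangle sum and a two-rectangle feature can be sketched as follows. These are hypothetical helpers (rectangles given by top-left corner, width, and height), not the authors' implementation; the brute-force integral image is included only to keep the sketch self-contained.

```python
def integral_image(img):
    """Brute-force integral image, for demonstration only."""
    h, w = len(img), len(img[0])
    return [[sum(img[yy][xx] for yy in range(y + 1) for xx in range(x + 1))
             for x in range(w)] for y in range(h)]

def rect_sum(ii, x, y, w, h):
    """Sum over a w x h rectangle at (x, y) with four array references,
    as in Figure 2: 4 + 1 - (2 + 3)."""
    A = ii[y - 1][x - 1] if x > 0 and y > 0 else 0   # reference 1
    B = ii[y - 1][x + w - 1] if y > 0 else 0         # reference 2
    C = ii[y + h - 1][x - 1] if x > 0 else 0         # reference 3
    D = ii[y + h - 1][x + w - 1]                     # reference 4
    return D + A - B - C

def two_rect_feature(ii, x, y, w, h):
    """Horizontally adjacent two-rectangle feature: left sum minus right sum.
    Written as two rect_sum calls for clarity; sharing the two common corners
    reduces this to the six references mentioned in the text."""
    return rect_sum(ii, x, y, w, h) - rect_sum(ii, x + w, y, w, h)
```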

 

2.2. Feature Discussion

Rectangle features are somewhat primitive when compared with alternatives such as steerable filters [5, 7]. Steerable filters, and their relatives, are excellent for the detailed analysis of boundaries, image compression, and texture analysis. In contrast rectangle features, while sensitive to the presence of edges, bars, and other simple image structure, are quite coarse. Unlike steerable filters the only orientations available are vertical, horizontal, and diagonal. The set of rectangle features do however provide a rich image representation which supports effective learning. In conjunction with the integral image, the efficiency of the rectangle feature set provides ample compensation for their limited flexibility.


 

3.  Learning Classification Functions

 

Given a feature set and a training set of positive and negative images, any number of machine learning approaches could be used to learn a classification function. In our system a variant of AdaBoost is used both to select a small set of features and train the classifier [6]. In its original form, the AdaBoost learning algorithm is used to boost the classification performance of a simple (sometimes called weak) learning algorithm. There are a number of formal guarantees provided by the AdaBoost learning procedure. Freund and Schapire proved that the training error of the strong classifier approaches zero exponentially in the number of rounds. More importantly a number of results were later proved about generalization performance [14]. The key insight is that generalization performance is related to the margin of the examples, and that AdaBoost achieves large margins rapidly.


 

Recall that there are over 180,000 rectangle features associated with each image sub-window, a number far larger than the number of pixels. Even though each feature can be computed very efficiently, computing the complete set is prohibitively expensive. Our hypothesis, which is borne out by experiment, is that a very small number of these features can be combined to form an effective classifier. The main challenge is to find these features.


 

In support of this goal, the weak learning algorithm is designed to select the single rectangle feature which best separates the positive and negative examples (this is similar to the approach of [2] in the domain of image database retrieval). For each feature, the weak learner determines the optimal threshold classification function, such that the minimum number of examples are misclassified. A weak classifier hj(x) thus consists of a feature fj, a threshold θj and a parity pj indicating the direction of the inequality sign:

hj(x) = 1 if pj fj(x) < pj θj, and 0 otherwise.

Here x is a 24x24 pixel sub-window of an image. See Table 1 for a summary of the boosting process.

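The thresholding can be written directly as a one-liner. In this sketch, f_j is any callable that evaluates feature j on a sub-window (a stand-in for the rectangle-feature evaluation, not part of the paper's code):

```python
def weak_classifier(f_j, theta_j, p_j, x):
    """h_j(x) = 1 if p_j * f_j(x) < p_j * theta_j, else 0.
    The parity p_j in {+1, -1} chooses the direction of the inequality."""
    return 1 if p_j * f_j(x) < p_j * theta_j else 0
```

With f_j the identity, parity +1 fires below the threshold and parity -1 fires above it.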

 

In practice no single feature can perform the classification task with low error. Features which are selected in early rounds of the boosting process had error rates between 0.1 and 0.3. Features selected in later rounds, as the task becomes more difficult, yield error rates between 0.4 and 0.5.


 

3.1. Learning Discussion

Many general feature selection procedures have been proposed (see chapter 8 of [18] for a review). Our final application demanded a very aggressive approach which would discard the vast majority of features. For a similar recognition problem Papageorgiou et al. proposed a scheme for feature selection based on feature variance [10]. They demonstrated good results selecting 37 features out of a total 1734 features.


 

Roth et al. propose a feature selection process based on the Winnow exponential perceptron learning rule [11]. The Winnow learning process converges to a solution where many of these weights are zero. Nevertheless a very large number of features are retained (perhaps a few hundred or thousand).


 

Table 1: The AdaBoost algorithm for classifier learning. Each round of boosting selects one feature from the 180,000 potential features.

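Since the body of Table 1 did not survive extraction, the boosting loop it summarizes can be sketched as follows. This is an illustrative rendering over a generic list of candidate weak classifiers; the epsilon guard on beta (for zero-error rounds) is our addition, not in the paper.

```python
import math

def adaboost(examples, labels, candidates, T):
    """Boosting loop of Table 1: a weight distribution over examples is
    maintained; each round selects the single weak classifier (feature)
    with the lowest weighted error. Labels are in {0, 1}."""
    n = len(examples)
    m = labels.count(0)                      # number of negative examples
    l = n - m                                # number of positive examples
    w = [1 / (2 * m) if y == 0 else 1 / (2 * l) for y in labels]
    chosen = []
    for _ in range(T):
        total = sum(w)
        w = [wi / total for wi in w]         # normalize to a distribution
        best, best_err = None, float("inf")  # pick the lowest-error classifier
        for h in candidates:
            err = sum(wi * abs(h(x) - y) for wi, x, y in zip(w, examples, labels))
            if err < best_err:
                best, best_err = h, err
        beta = max(best_err, 1e-10) / (1 - best_err)  # guard against zero error
        # decrease the weights of correctly classified examples
        w = [wi * (beta if best(x) == y else 1) for wi, x, y in zip(w, examples, labels)]
        chosen.append((best, math.log(1 / beta)))
    # strong classifier: weighted majority vote of the chosen weak classifiers
    threshold = 0.5 * sum(alpha for _, alpha in chosen)
    return lambda x: 1 if sum(a * h(x) for h, a in chosen) >= threshold else 0
```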

 

3.2. Learning Results

 

While details on the training and performance of the final system are presented in Section 5, several simple results merit discussion. Initial experiments demonstrated that a frontal face classifier constructed from 200 features yields a detection rate of 95% with a false positive rate of 1 in 14084. These results are compelling, but not sufficient for many real-world tasks. In terms of computation, this classifier is probably faster than any other published system, requiring 0.7 seconds to scan a 384 by 288 pixel image. Unfortunately, the most straightforward technique for improving detection performance, adding features to the classifier, directly increases computation time.


 

For the task of face detection, the initial rectangle features selected by AdaBoost are meaningful and easily interpreted. The first feature selected seems to focus on the property that the region of the eyes is often darker than the region of the nose and cheeks (see Figure 3). This feature is relatively large in comparison with the detection sub-window, and should be somewhat insensitive to size and location of the face. The second feature selected relies on the property that the eyes are darker than the bridge of the nose.


Figure 3: The first and second features selected by AdaBoost. The two features are shown in the top row and then overlayed on a typical training face in the bottom row. The first feature measures the difference in intensity between the region of the eyes and a region across the upper cheeks. The feature capitalizes on the observation that the eye region is often darker than the cheeks. The second feature compares the intensities in the eye regions to the intensity across the bridge of the nose.


 

4.  The Attentional Cascade

 

This section describes an algorithm for constructing a cascade of classifiers which achieves increased detection performance while radically reducing computation time. The key insight is that smaller, and therefore more efficient, boosted classifiers can be constructed which reject many of the negative sub-windows while detecting almost all positive instances (i.e. the threshold of a boosted classifier can be adjusted so that the false negative rate is close to zero). Simpler classifiers are used to reject the majority of sub-windows before more complex classifiers are called upon to achieve low false positive rates.


 

The overall form of the detection process is that of a degenerate decision tree, what we call a "cascade" (see Figure 4). A positive result from the first classifier triggers the evaluation of a second classifier which has also been adjusted to achieve very high detection rates. A positive result from the second classifier triggers a third classifier, and so on. A negative outcome at any point leads to the immediate rejection of the sub-window.

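The cascade evaluation itself reduces to an early-exit loop. In this sketch, each stage is assumed to be a callable returning True to pass the sub-window on and False to reject it:

```python
def cascade_classify(stages, window):
    """Degenerate decision tree: a sub-window must pass every stage;
    a rejection at any stage ends processing immediately."""
    for stage in stages:
        if not stage(window):
            return False   # rejected: no further stages are evaluated
    return True            # passed every classifier in the cascade
```

With stages ordered from cheapest to most complex, the overwhelming majority of sub-windows exit within the first stage or two.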

 

Stages in the cascade are constructed by training classifiers using AdaBoost and then adjusting the threshold to minimize false negatives. Note that the default AdaBoost threshold is designed to yield a low error rate on the training data. In general a lower threshold yields higher detection rates and higher false positive rates.


 

Figure 4: Schematic depiction of the detection cascade. A series of classifiers are applied to every sub-window. The initial classifier eliminates a large number of negative examples with very little processing. Subsequent layers eliminate additional negatives but require additional computation. After several stages of processing the number of sub-windows has been reduced radically. Further processing can take any form such as additional stages of the cascade (as in our detection system) or an alternative detection system.


 

For example an excellent first stage classifier can be constructed from a two-feature strong classifier by reducing the threshold to minimize false negatives. Measured against a validation training set, the threshold can be adjusted to detect 100% of the faces with a false positive rate of 40%. See Figure 3 for a description of the two features used in this classifier.


 

Computation of the two feature classifier amounts to about 60 microprocessor instructions. It seems hard to imagine that any simpler filter could achieve higher rejection rates. By comparison, scanning a simple image template, or a single layer perceptron, would require at least 20 times as many operations per sub-window.


 

The structure of the cascade reflects the fact that within any single image an overwhelming majority of sub-windows are negative. As such, the cascade attempts to reject as many negatives as possible at the earliest stage possible. While a positive instance will trigger the evaluation of every classifier in the cascade, this is an exceedingly rare event.


 

Much like a decision tree, subsequent classifiers are trained using those examples which pass through all the previous stages. As a result, the second classifier faces a more difficult task than the first. The examples which make it through the first stage are "harder" than typical examples. The more difficult examples faced by deeper classifiers push the entire receiver operating characteristic (ROC) curve downward. At a given detection rate, deeper classifiers have correspondingly higher false positive rates.


 

4.1. Training a Cascade of Classifiers

The cascade training process involves two types of trade-offs. In most cases classifiers with more features will achieve higher detection rates and lower false positive rates. At the same time classifiers with more features require more time to compute. In principle one could define an optimization framework in which: i) the number of classifier stages, ii) the number of features in each stage, and iii) the threshold of each stage, are traded off in order to minimize the expected number of evaluated features. Unfortunately finding this optimum is a tremendously difficult problem.


 

In practice a very simple framework is used to produce an effective classifier which is highly efficient. Each stage in the cascade reduces the false positive rate and decreases the detection rate. A target is selected for the minimum reduction in false positives and the maximum decrease in detection. Each stage is trained by adding features until the target detection and false positives rates are met (these rates are determined by testing the detector on a validation set). Stages are added until the overall target for false positive and detection rate is met.

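The trade-off can be made concrete with a back-of-the-envelope calculation (our illustration, not from the paper): if each stage passes a fraction f of false positives and keeps a fraction d of the faces, the overall rates after n independent stages are F = f^n and D = d^n.

```python
import math

def stages_needed(f_stage, f_target):
    """Stages required for an overall false positive rate f_stage ** n <= f_target."""
    return math.ceil(math.log(f_target) / math.log(f_stage))

# E.g. if every stage rejects 70% of false positives (f = 0.3) while keeping
# 99% of faces (d = 0.99):
n = stages_needed(0.3, 1e-5)   # stages for an overall false positive rate of 10^-5
D = 0.99 ** n                  # resulting overall detection rate
print(n, round(D, 3))
```

This is why per-stage detection targets must be very close to 100%: even 99% per stage compounds into a noticeable loss over a deep cascade.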

 

4.2. Detector Cascade Discussion

The complete face detection cascade has 38 stages with over 6000 features. Nevertheless the cascade structure results in fast average detection times. On a difficult dataset, containing 507 faces and 75 million sub-windows, faces are detected using an average of 10 feature evaluations per sub-window. In comparison, this system is about 15 times faster than an implementation of the detection system constructed by Rowley et al.3 [12]


 

A notion similar to the cascade appears in the face detection system described by Rowley et al. in which two detection networks are used [12]. Rowley et al. used a faster yet less accurate network to prescreen the image in order to find candidate regions for a slower more accurate network. Though it is difficult to determine exactly, it appears that Rowley et al.'s two network face system is the fastest existing face detector.4


 

The structure of the cascaded detection process is essentially that of a degenerate decision tree, and as such is related to the work of Amit and Geman [1]. Unlike techniques which use a fixed detector, Amit and Geman propose an alternative point of view where unusual co-occurrences of simple image features are used to trigger the evaluation of a more complex detection process. In this way the full detection process need not be evaluated at many of the potential image locations and scales. While this basic insight is very valuable, in their implementation it is necessary to first evaluate some feature detector at every location. These features are then grouped to find unusual co-occurrences. In practice, since the form of our detector and the features that it uses are extremely efficient, the amortized cost of evaluating our detector at every scale and location is much faster than finding and grouping edges throughout the image.


 

In recent work Fleuret and Geman have presented a face detection technique which relies on a "chain" of tests in order to signify the presence of a face at a particular scale and location [4]. The image properties measured by Fleuret and Geman, disjunctions of fine scale edges, are quite different than rectangle features which are simple, exist at all scales, and are somewhat interpretable. The two approaches also differ radically in their learning philosophy. The motivation for Fleuret and Geman's learning process is density estimation and density discrimination, while our detector is purely discriminative. Finally the false positive rate of Fleuret and Geman's approach appears to be higher than that of previous approaches like Rowley et al. and this approach. Unfortunately the paper does not report quantitative results of this kind. The included example images each have between 2 and 10 false positives.


 

5.  Results

 

A 38 layer cascaded classifier was trained to detect frontal upright faces. To train the detector, a set of face and non-face training images were used. The face training set consisted of 4916 hand labeled faces scaled and aligned to a base resolution of 24 by 24 pixels. The faces were extracted from images downloaded during a random crawl of the world wide web. Some typical face examples are shown in Figure 5. The non-face subwindows used to train the detector come from 9544 images which were manually inspected and found to not contain any faces. There are about 350 million subwindows within these non-face images.


 

The number of features in the first five layers of the detector is 1, 10, 25, 25 and 50 features respectively. The remaining layers have increasingly more features. The total number of features in all layers is 6061.


 

Each classifier in the cascade was trained with the 4916 training faces (plus their vertical mirror images for a total of 9832 training faces) and 10,000 non-face sub-windows (also of size 24 by 24 pixels) using the AdaBoost training procedure. For the initial one feature classifier, the non-face training examples were collected by selecting random sub-windows from a set of 9544 images which did not contain faces. The non-face examples used to train subsequent layers were obtained by scanning the partial cascade across the non-face images and collecting false positives. A maximum of 10,000 such non-face sub-windows were collected for each layer.
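The layer-by-layer bootstrapping of non-face examples described above can be sketched as follows. This is a toy illustration, not the paper's implementation: examples are reduced to scalar feature scores, and `train_layer` is a stand-in for a full AdaBoost round.

```python
def train_layer(positives, negatives):
    """Toy stand-in for AdaBoost: choose a threshold that accepts
    every positive example (examples are scalar scores here)."""
    thresh = min(positives)
    return lambda x: x >= thresh

def train_cascade(positives, negative_pool, num_layers, max_neg=100):
    """Train layers one at a time; each new layer trains only on the
    false positives of the partial cascade built so far."""
    layers = []
    negatives = negative_pool[:max_neg]
    for _ in range(num_layers):
        layers.append(train_layer(positives, negatives))
        # Bootstrap: scan the pool with the partial cascade and
        # keep only the sub-windows it wrongly accepts.
        negatives = [x for x in negative_pool
                     if all(layer(x) for layer in layers)][:max_neg]
        if not negatives:  # every negative is already rejected
            break
    return layers

def cascade_detect(layers, x):
    # A window is reported as a face only if every layer accepts it.
    return all(layer(x) for layer in layers)
```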


 

Figure 5: Example of frontal upright face images used for training

 

Speed of the Final Detector

The speed of the cascaded detector is directly related to the number of features evaluated per scanned sub-window. Evaluated on the MIT+CMU test set [12], an average of 10 features out of a total of 6061 are evaluated per sub-window. This is possible because a large majority of sub-windows are rejected by the first or second layer in the cascade. On a 700 MHz Pentium III processor, the face detector can process a 384 by 288 pixel image in about 0.067 seconds (using a starting scale of 1.25 and a step size of 1.5 described below). This is roughly 15 times faster than the Rowley-Baluja-Kanade detector [12] and about 600 times faster than the Schneiderman-Kanade detector [15].


 

Image Processing

All example sub-windows used for training were variance normalized to minimize the effect of different lighting conditions. Normalization is therefore necessary during detection as well. The variance of an image sub-window can be computed quickly using a pair of integral images. Recall that σ² = (1/N) Σ x² − m², where σ is the standard deviation, m is the mean, N is the number of pixels, and x is the pixel value within the sub-window. The mean of a sub-window can be computed using the integral image. The sum of squared pixels is computed using an integral image of the image squared (i.e. two integral images are used in the scanning process). During scanning the effect of image normalization can be achieved by post-multiplying the feature values rather than pre-multiplying the pixels.
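The two-integral-image trick can be sketched as follows (a pure-Python illustration; the function names are ours, not the paper's):

```python
def integral(img):
    """Integral image with a zero row/column prepended:
    ii[y][x] = sum of img[v][u] for all v < y, u < x."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii

def box_sum(ii, x, y, w, h):
    """Sum over the w-by-h window with top-left corner (x, y),
    computed with four array references."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]

def window_variance(ii, ii_sq, x, y, w, h):
    """sigma^2 = (1/N) sum x^2 - m^2, using the integral image of
    the pixels (ii) and of their squares (ii_sq)."""
    n = w * h
    m = box_sum(ii, x, y, w, h) / n
    return box_sum(ii_sq, x, y, w, h) / n - m * m
```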


 

Scanning the Detector

The final detector is scanned across the image at multiple scales and locations. Scaling is achieved by scaling the detector itself, rather than scaling the image. This process makes sense because the features can be evaluated at any scale with the same cost. Good results were obtained using a set of scales a factor of 1.25 apart.


 

The detector is also scanned across location. Subsequent locations are obtained by shifting the window some number of pixels Δ. This shifting process is affected by the scale of the detector: if the current scale is S the window is shifted by [SΔ], where [·] is the rounding operation.


 

The choice of Δ affects both the speed of the detector as well as accuracy. The results we present are for Δ = 1.0. We can achieve a significant speedup by setting Δ = 1.5 with only a slight decrease in accuracy.
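The scale-and-shift scanning scheme above can be sketched as follows (our sketch; boundary handling and rounding details may differ from the authors' code):

```python
def scan_positions(img_w, img_h, base=24, scale_step=1.25, delta=1.0):
    """Yield (x, y, size) for every sub-window visited: the detector
    (base size 24) is enlarged by a factor of 1.25 per scale, and at
    scale s the window is shifted by round(s * delta) pixels."""
    s = 1.0
    while round(base * s) <= min(img_w, img_h):
        size = round(base * s)
        shift = max(1, round(s * delta))
        for y in range(0, img_h - size + 1, shift):
            for x in range(0, img_w - size + 1, shift):
                yield (x, y, size)
        s *= scale_step
```

Increasing `delta` from 1.0 to 1.5 prunes the set of visited locations at every scale, which is where the speedup described above comes from.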


 

Integration of Multiple Detections

Since the final detector is insensitive to small changes in translation and scale, multiple detections will usually occur around each face in a scanned image. The same is often true of some types of false positives. In practice it often makes sense to return one final detection per face. Toward this end it is useful to postprocess the detected sub-windows in order to combine overlapping detections into a single detection.


 

In these experiments detections are combined in a very simple fashion.  The set of detections are first partitioned into disjoint subsets. Two detections are in the same subset if their bounding regions overlap.  Each partition yields a single final detection.  The corners of the final bounding region are the average of the corners of all detections in the set.
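The partition-and-average scheme above can be sketched as follows. We read "two detections are in the same subset if their bounding regions overlap" as the transitive closure of pairwise overlap, implemented here with union-find; that reading, and all function names, are ours.

```python
def boxes_overlap(a, b):
    """True if two boxes (x1, y1, x2, y2) have overlapping regions."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]

def merge_detections(boxes):
    """Partition detections into disjoint subsets of (transitively)
    overlapping boxes, then return one box per subset whose corners
    are the average of the subset's corners."""
    parent = list(range(len(boxes)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(boxes)):
        for j in range(i + 1, len(boxes)):
            if boxes_overlap(boxes[i], boxes[j]):
                parent[find(i)] = find(j)

    groups = {}
    for i, box in enumerate(boxes):
        groups.setdefault(find(i), []).append(box)
    return [tuple(sum(coord) / len(group) for coord in zip(*group))
            for group in groups.values()]
```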


 

Experiments on a Real-World Test Set

We tested our system on the MIT+CMU frontal face test set [12]. This set consists of 130 images with 507 labeled frontal faces. A ROC curve showing the performance of our detector on this test set is shown in Figure 6. To create the ROC curve the threshold of the final layer classifier is adjusted from −∞ to +∞. Adjusting the threshold to +∞ will yield a detection rate of 0.0 and a false positive rate of 0.0. Adjusting the threshold to −∞, however, increases both the detection rate and false positive rate, but only to a certain point. Neither rate can be higher than the rate of the detection cascade minus the final layer. In effect, a threshold of −∞ is equivalent to removing that layer. Further increasing the detection and false positive rates requires decreasing the threshold of the next classifier in the cascade. Thus, in order to construct a complete ROC curve, classifier layers are removed. We use the number of false positives as opposed to the rate of false positives for the x-axis of the ROC curve to facilitate comparison with other systems. To compute the false positive rate, simply divide by the total number of sub-windows scanned. In our experiments, the number of sub-windows scanned is 75,081,800.
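The count-to-rate conversion mentioned above is a single division by the number of sub-windows scanned:

```python
def false_positive_rate(num_false_positives, num_subwindows=75_081_800):
    """Convert a false-positive count on the MIT+CMU test set into a
    rate, dividing by the total number of sub-windows scanned in the
    run reported above (75,081,800)."""
    return num_false_positives / num_subwindows
```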


 

Unfortunately, most previous published results on face detection have only included a single operating regime (i.e. a single point on the ROC curve). To make comparison with our detector easier we have listed our detection rate for the false positive rates reported by the other systems. Table 2 lists the detection rate for various numbers of false detections for our system as well as other published systems. For the Rowley-Baluja-Kanade results [12], a number of different versions of their detector were tested, yielding a number of different results; they are all listed under the same heading. For the Roth-Yang-Ahuja detector [11], the result was reported on the MIT+CMU test set minus 5 images containing line-drawn faces.


 

 

Figure 6:   ROC curve for our face detector on the MIT+CMU test set. The detector was run using a step size of 1.0 and starting scale of 1.0 (75,081,800 sub-windows scanned).


 

Figure 7 shows the output of our face detector on some test images from the MIT+CMU test set.


 

 

Figure 7: Output of our face detector on a number of test images from the MIT+CMU test set.


 

A simple voting scheme to further improve results

In Table 2 we also show results from running three detectors (the 38 layer one described above plus two similarly trained detectors) and outputting the majority vote of the three detectors. This improves the detection rate as well as eliminating more false positives. The improvement would be greater if the detectors were more independent. The correlation of their errors results in a modest improvement over the best single detector.
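The majority vote over the three detectors can be sketched as follows (our simplification: each detector is modeled as a callable returning True when it accepts the window):

```python
def majority_vote(detectors, window):
    """Report a face when at least two of the three detectors
    accept the window."""
    votes = sum(1 for detect in detectors if detect(window))
    return votes >= 2
```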


Table 2: Detection rates for various numbers of false positives on the MIT+CMU test set containing 130 images and 507 faces.


 

6    Conclusions

 

We have presented an approach for object detection which minimizes computation time while achieving high detection accuracy. The approach was used to construct a face detection system which is approximately 15 times faster than any previous approach.


 

This paper brings together new algorithms, representations, and insights which are quite generic and may well have broader application in computer vision and image processing.


 

Finally this paper presents a set of detailed experiments on a difficult face detection dataset which has been widely studied. This dataset includes faces under a very wide range of conditions including: illumination, scale, pose, and camera variation. Experiments on such a large and complex dataset are difficult and time consuming. Nevertheless, systems which work under these conditions are unlikely to be brittle or limited to a single set of conditions. More importantly, conclusions drawn from this dataset are unlikely to be experimental artifacts.


 

References


[1]  Y. Amit, D. Geman, and K. Wilder. Joint induction of shape features and tree classifiers, 1997.

[2]  Anonymous. Anonymous. In Anonymous, 2000.


[3]  F. Crow. Summed-area tables for texture mapping. In Proceedings of SIGGRAPH, volume 18(3), pages 207–212, 1984.

[4]  F. Fleuret and D. Geman. Coarse-to-fine face detection. Int. J. Computer Vision, 2001.

[5]  William T. Freeman and Edward H. Adelson. The design and use of steerable filters. IEEE Transactions on Pattern Analysis and Machine Intelligence, 13(9):891–906, 1991.

[6]  Yoav Freund and Robert E. Schapire.  A decision-theoretic generalization of on-line learning and an application to boosting. In Computational Learning Theory: Eurocolt ’95, pages 23–37. Springer-Verlag, 1995.

[7]  H. Greenspan, S. Belongie, R. Goodman, P. Perona, S. Rakshit, and C. Anderson. Overcomplete steerable pyramid filters and rotation invariance. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 1994.

[8]  L. Itti, C. Koch, and E. Niebur.  A model of saliency-based visual attention for rapid scene analysis.  IEEE Patt. Anal. Mach. Intell., 20(11):1254–1259, November 1998.

[9]  Edgar Osuna, Robert Freund, and Federico Girosi. Training support vector machines:  an application to face detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 1997.

[10]  C. Papageorgiou, M. Oren, and T. Poggio. A general framework for object detection. In International Conference on Computer Vision, 1998.

[11]  D. Roth, M. Yang, and N. Ahuja. A SNoW-based face detector. In Neural Information Processing 12, 2000.

[12]  H. Rowley, S. Baluja, and T. Kanade. Neural network-based face detection. In IEEE Patt. Anal. Mach. Intell., volume 20, pages 22–38, 1998.

[13]  R. E. Schapire, Y. Freund, P. Bartlett, and W. S. Lee. Boosting the margin: a new explanation for the effectiveness of voting methods. Ann. Stat., 26(5):1651–1686, 1998.

[14]  Robert E. Schapire, Yoav Freund, Peter Bartlett, and Wee Sun Lee. Boosting the margin: A new explanation for the effectiveness of voting methods. In Proceedings of the Fourteenth International Conference on Machine Learning, 1997.

[15]  H. Schneiderman and T. Kanade. A statistical method for 3D object detection applied to faces and cars.  In International Conference on Computer Vision, 2000.

 

[16]  K. Sung and T. Poggio. Example-based learning for view-based face detection. In IEEE Patt. Anal. Mach. Intell., volume 20, pages 39–51, 1998.

[17]  J.K. Tsotsos, S.M. Culhane, W.Y.K. Wai, Y.H. Lai, N. Davis, and F. Nuflo. Modeling visual-attention via selective tuning. Artificial Intelligence Journal, 78(1-2):507–545, October 1995.

[18]  Andrew Webb. Statistical Pattern Recognition. Oxford University Press, New York, 1999.

 
