
Copyright 1998 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE.



Neural Network-Based Face Detection

Henry A. Rowley, Shumeet Baluja, and Takeo Kanade

Abstract






We present a neural network-based upright frontal face detection system. A retinally connected neural network examines small windows of an image, and decides whether each window contains a face. The system arbitrates between multiple networks to improve performance over a single network. We present a straightforward procedure for aligning positive face examples for training. To collect negative examples, we use a bootstrap algorithm, which adds false detections into the training set as training progresses. This eliminates the difficult task of manually selecting nonface training examples, which must be chosen to span the entire space of nonface images. Simple heuristics, such as using the fact that faces rarely overlap in images, can further improve the accuracy. Comparisons with several other state-of-the-art face detection systems are presented, showing that our system has comparable performance in terms of detection and false-positive rates.

Keywords: Face detection, Pattern recognition, Computer vision, Artificial neural networks, Machine learning




1 Introduction



In this paper, we present a neural network-based algorithm to detect upright, frontal views of faces in gray-scale images. The algorithm works by applying one or more neural networks directly to portions of the input image, and arbitrating their results. Each network is trained to output the presence or absence of a face. The algorithms and training methods are designed to be general, with little customization for faces.




Many face detection researchers have used the idea that facial images can be characterized directly in terms of pixel intensities. These images can be characterized by probabilistic models of the set of face images [4, 13, 15], or implicitly by neural networks or other mechanisms [3, 12, 14, 19, 21, 23, 25, 26]. The parameters for these models are adjusted either automatically from example images (as in our work) or by hand. A few authors have taken the approach of extracting features and applying either manually or automatically generated rules for evaluating these features [7, 11].


Training a neural network for the face detection task is challenging because of the difficulty in characterizing prototypical "nonface" images. Unlike face recognition, in which the classes to be discriminated are different faces, the two classes to be discriminated in face detection are "images containing faces" and "images not containing faces". It is easy to get a representative sample of images which contain faces, but much harder to get a representative sample of those which do not. We avoid the problem of using a huge training set for nonfaces by selectively adding images to the training set as training progresses [21]. This "bootstrap" method reduces the size of the training set needed. The use of arbitration between multiple networks and heuristics to clean up the results significantly improves the accuracy of the detector.


Detailed descriptions of the example collection and training methods, network architecture, and arbitration methods are given in Section 2. In Section 3, the performance of the system is examined. We find that the system is able to detect 90.5% of the faces over a test set of 130 complex images, with an acceptable number of false positives. Section 4 briefly discusses some techniques that can be used to make the system run faster, and Section 5 compares this system with similar systems. Conclusions and directions for future research are presented in Section 6.




2 Description of the System



Our system operates in two stages: it first applies a set of neural network-based filters to an image, and then uses an arbitrator to combine the outputs. The filters examine each location in the image at several scales, looking for locations that might contain a face. The arbitrator then merges detections from individual filters and eliminates overlapping detections.






2.1 Stage One: A Neural Network-Based Filter




The first component of our system is a filter that receives as input a 20x20 pixel region of the image, and generates an output ranging from 1 to -1, signifying the presence or absence of a face, respectively. To detect faces anywhere in the input, the filter is applied at every location in the image. To detect faces larger than the window size, the input image is repeatedly reduced in size (by subsampling), and the filter is applied at each size. This filter must have some invariance to position and scale. The amount of invariance determines the number of scales and positions at which it must be applied. For the work presented here, we apply the filter at every pixel position in the image, and scale the image down by a factor of 1.2 for each step in the pyramid.
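To make the search concrete, here is a minimal Python sketch of the pyramid and sliding-window scan, assuming nearest-neighbor subsampling (the text says only "subsampling") and a 20x20 window; the helper names `pyramid` and `candidate_windows` are ours, not the paper's.

```python
import numpy as np

def pyramid(image, scale=1.2, min_size=20):
    """Yield successively smaller copies of a gray-scale image,
    each reduced by a factor of `scale` (1.2 per step, as in the text)."""
    img = image.astype(np.float32)
    while min(img.shape) >= min_size:
        yield img
        h, w = img.shape
        new_h, new_w = int(h / scale), int(w / scale)
        # Nearest-neighbor subsampling; the exact resampling method is an assumption.
        rows = np.minimum((np.arange(new_h) * scale).astype(int), h - 1)
        cols = np.minimum((np.arange(new_w) * scale).astype(int), w - 1)
        img = img[np.ix_(rows, cols)]

def candidate_windows(image, window=20):
    """Apply a 20x20 window at every pixel position of every pyramid level."""
    for level, img in enumerate(pyramid(image)):
        h, w = img.shape
        for y in range(h - window + 1):
            for x in range(w - window + 1):
                yield level, y, x, img[y:y + window, x:x + window]
```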


The filtering algorithm is shown in Fig. 1. First, a preprocessing step, adapted from [21], is applied to a window of the image. The window is then passed through a neural network, which decides whether the window contains a face.



The preprocessing first attempts to equalize the intensity values across the window. We fit a function which varies linearly across the window to the intensity values in an oval region inside the window. Pixels outside the oval (shown in Fig. 2a) may represent the background, so those intensity values are ignored in computing the lighting variation across the face. The linear function will approximate the overall brightness of each part of the window, and can be subtracted from the window to compensate for a variety of lighting conditions. Then histogram equalization is performed, which non-linearly maps the intensity values to expand the range of intensities in the window. The histogram is computed for pixels inside an oval region in the window. This compensates for differences in camera input gains, as well as improving contrast in some cases. The preprocessing steps are shown in Fig. 2.
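The two preprocessing steps can be sketched as follows, assuming a boolean `oval_mask` marking the oval region (its exact shape is not reproduced here): the brightness plane is fit by least squares over the oval pixels only, then subtracted, and the equalization histogram is likewise computed over the oval.

```python
import numpy as np

def preprocess(window, oval_mask):
    """Lighting correction followed by histogram equalization (a sketch)."""
    h, w = window.shape
    ys, xs = np.mgrid[0:h, 0:w]
    # Fit I(x, y) ~ a*x + b*y + c over the oval pixels by least squares.
    A = np.column_stack([xs[oval_mask], ys[oval_mask], np.ones(oval_mask.sum())])
    coeffs, *_ = np.linalg.lstsq(A, window[oval_mask], rcond=None)
    plane = coeffs[0] * xs + coeffs[1] * ys + coeffs[2]
    corrected = window - plane  # compensate for the lighting gradient
    # Histogram equalization, with the histogram taken over the oval region.
    vals = np.sort(corrected[oval_mask])
    ranks = np.searchsorted(vals, corrected, side="right")
    return ranks.astype(np.float32) / vals.size  # intensities spread over [0, 1]
```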





The preprocessed window is then passed through a neural network. The network has retinal connections to its input layer; the receptive fields of hidden units are shown in Fig. 1. There are three types of hidden units: 4 which look at 10x10 pixel subregions, 16 which look at 5x5 pixel subregions, and 6 which look at overlapping 20x5 pixel horizontal stripes of pixels. Each of these types was chosen to allow the hidden units to detect local features that might be important for face detection. In particular, the horizontal stripes allow the hidden units to detect such features as mouths or pairs of eyes, while the hidden units with square receptive fields might detect features such as individual eyes, the nose, or corners of the mouth. Although the figure shows a single hidden unit for each subregion of the input, these units can be replicated. For the experiments which are described later, we use networks with two and three sets of these hidden units. Similar input connection patterns are commonly used in speech and character recognition tasks [10, 24]. The network has a single, real-valued output, which indicates whether or not the window contains a face.
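The three receptive-field types can be enumerated as below; the exact tiling and stripe overlap are our assumptions, chosen only to produce the stated counts (4, 16, and 6 units).

```python
def receptive_fields(window=20):
    """List (y, x, height, width) receptive fields for one set of hidden units:
    4 10x10 subregions, 16 5x5 subregions, 6 overlapping 20x5 stripes."""
    fields = []
    for y in (0, 10):                 # 2x2 grid of 10x10 blocks
        for x in (0, 10):
            fields.append((y, x, 10, 10))
    for y in range(0, window, 5):     # 4x4 grid of 5x5 blocks
        for x in range(0, window, 5):
            fields.append((y, x, 5, 5))
    for y in range(0, 16, 3):         # six 5-row stripes, overlapping by 2 rows
        fields.append((y, 0, 5, window))
    return fields
```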


Examples of output from a single network are shown in Fig. 3. In the figure, each box represents the position and size of a window to which the neural network gave a positive response. The network has some invariance to position and scale, which results in multiple boxes around some faces. Note also that there are some false detections; they will be eliminated by methods presented in Section 2.2.





To train the neural network used in stage one to serve as an accurate filter, a large number of face and nonface images are needed. Nearly 1050 face examples were gathered from face databases at CMU, Harvard, and from the World Wide Web. The images contained faces of various sizes, orientations, positions, and intensities. The eyes, tip of nose, and corners and center of the mouth of each face were labelled manually. These points were used to normalize each face to the same scale, orientation, and position, as follows:


1. Initialize F̄, a vector which will be the average positions of each labelled feature over all the faces, with the feature locations in the first face, F_1.

2. The feature coordinates in F̄ are rotated, translated, and scaled, so that the average locations of the eyes will appear at predetermined locations in a 20x20 pixel window.

3. For each face i, compute the best rotation, translation, and scaling to align the face's features F_i with the average feature locations F̄. Such transformations can be written as a linear function of their parameters. Thus, we can write a system of linear equations mapping the features from F_i to F̄. The least squares solution to this over-constrained system yields the parameters for the best alignment transformation. Call the aligned feature locations F'_i.

4. Update F̄ by averaging the aligned feature locations F'_i for each face i.

5. Go to step 2.
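Step 3 reduces to an ordinary least-squares solve, because a rotation-plus-scale-plus-translation can be parameterized linearly. A sketch, with `points` and `target` as (n, 2) arrays of feature coordinates (the function name is ours):

```python
import numpy as np

def best_alignment(points, target):
    """Least-squares similarity transform mapping `points` onto `target`,
    parameterized as x' = a*x - b*y + tx, y' = b*x + a*y + ty,
    which is linear in the parameters (a, b, tx, ty)."""
    n = points.shape[0]
    A = np.zeros((2 * n, 4))
    A[0::2] = np.column_stack([points[:, 0], -points[:, 1], np.ones(n), np.zeros(n)])
    A[1::2] = np.column_stack([points[:, 1],  points[:, 0], np.zeros(n), np.ones(n)])
    b = target.reshape(-1)                      # interleaved (x'_0, y'_0, x'_1, ...)
    (a, b_, tx, ty), *_ = np.linalg.lstsq(A, b, rcond=None)
    return lambda p: np.column_stack([a * p[:, 0] - b_ * p[:, 1] + tx,
                                      b_ * p[:, 0] + a * p[:, 1] + ty])
```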



The alignment algorithm converges within five iterations, yielding for each face a function which maps that face to a 20x20 pixel window. Fifteen face examples are generated for the training set from each original image, by randomly rotating the images (about their center points) up to 10°, scaling between 90% and 110%, translating up to half a pixel, and mirroring. Each 20x20 window in the set is then preprocessed (by applying lighting correction and histogram equalization). A few example images are shown in Fig. 4.
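The random perturbations can be drawn as below; the text gives only the ranges, so the uniform distributions are an assumption.

```python
import numpy as np

rng = np.random.default_rng(0)

def jitter_params(n=15):
    """Draw n random perturbations for one aligned face: rotation up to 10
    degrees, scaling between 90% and 110%, translation up to half a pixel,
    and an optional mirror flip."""
    return [dict(angle=rng.uniform(-10.0, 10.0),   # degrees, about the center
                 scale=rng.uniform(0.9, 1.1),
                 dx=rng.uniform(-0.5, 0.5),
                 dy=rng.uniform(-0.5, 0.5),
                 mirror=bool(rng.integers(0, 2)))
            for _ in range(n)]
```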


The randomization gives the filter invariance to translations of less than a pixel and scalings of 20%. Larger changes in translation and scale are dealt with by applying the filter at every pixel position in an image pyramid, in which the images are scaled by factors of 1.2.


Practically any image can serve as a nonface example because the space of nonface images is much larger than the space of face images. However, collecting a "representative" set of nonfaces is difficult. Instead of collecting the images before training is started, the images are collected during training, in the following manner, adapted from [21]:


1. Create an initial set of nonface images by generating 1000 random images. Apply the preprocessing steps to each of these images.

2. Train a neural network to produce an output of 1 for the face examples, and -1 for the nonface examples. The training algorithm is standard error backpropagation with momentum [8]. On the first iteration of this loop, the network's weights are initialized randomly. After the first iteration, we use the weights computed by training in the previous iteration as the starting point.

3. Run the system on an image of scenery which contains no faces. Collect subimages in which the network incorrectly identifies a face (a positive output activation).

4. Select up to 250 of these subimages at random, apply the preprocessing steps, and add them into the training set as negative examples. Go to step 2.
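Putting the four steps together, a sketch of the bootstrap loop (reusing the `candidate_windows` helper from the earlier sketch; `net` stands for any trainable classifier, its `train`/`score` methods are placeholders rather than an interface from the paper, and preprocessing is omitted for brevity):

```python
import random
import numpy as np

def bootstrap_training(faces, scenery_images, net, rounds=10):
    """Grow the nonface set from the network's own false detections."""
    # Step 1: 1000 random 20x20 images as the initial nonface set.
    nonfaces = [np.random.rand(20, 20) for _ in range(1000)]
    for _ in range(rounds):
        # Step 2: train on the current pool (targets 1 for faces, -1 for nonfaces).
        net.train(faces + nonfaces, [1] * len(faces) + [-1] * len(nonfaces))
        # Step 3: collect false detections on face-free scenery.
        false_alarms = [w for img in scenery_images
                        for _, _, _, w in candidate_windows(img)
                        if net.score(w) > 0]
        # Step 4: add up to 250 of them as new negatives, then retrain.
        random.shuffle(false_alarms)
        nonfaces += false_alarms[:250]
    return net
```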


Some examples of nonfaces that are collected during training are shown in Fig. 5. Note that some of the examples resemble faces, although they are not very close to the positive examples shown in Fig. 4. The presence of these examples forces the neural network to learn the precise boundary between face and nonface images.


We used 120 images of scenery for collecting negative examples in the bootstrap manner described above. A typical training run selects approximately 8000 nonface images from the 146,212,178 subimages that are available at all locations and scales in the training scenery images. A similar training algorithm was described in [5], where at each iteration an entirely new network was trained with the examples on which the previous networks had made mistakes.



2.2 Stage Two: Merging Overlapping Detections and Arbitration



The examples in Fig. 3 showed that the raw output from a single network will contain a number of false detections. In this section, we present two strategies to improve the reliability of the detector: merging overlapping detections from a single network and arbitrating among multiple networks.


2.2.1 Merging Overlapping Detections



Note that in Fig. 3, most faces are detected at multiple nearby positions or scales, while false detections often occur with less consistency. This observation leads to a heuristic which can eliminate many false detections. For each location and scale, the number of detections within a specified neighborhood of that location can be counted. If the number is above a threshold, then that location is classified as a face. The centroid of the nearby detections defines the location of the detection result, thereby collapsing multiple detections. In the experiments section, this heuristic will be referred to as "thresholding".
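A sketch of this heuristic, with `detections` as a list of (level, y, x) positive responses; the neighborhood radius and count threshold below are illustrative values, not the paper's.

```python
import numpy as np

def threshold_detections(detections, radius=2, min_count=2):
    """Keep a detection only if enough others fall nearby (in position and
    pyramid level), and collapse each surviving cluster to its centroid."""
    pts = np.array(detections, dtype=float)
    results = set()
    for p in pts:
        near = pts[np.all(np.abs(pts - p) <= radius, axis=1)]
        if len(near) >= min_count:
            results.add(tuple(near.mean(axis=0)))  # centroid of nearby detections
    return sorted(results)
```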



