Thomas David Heseltine BSc. Hons.
The University of York
Department of Computer Science
For the Qualification of PhD.
September 2005
Face Recognition: Two-Dimensional and Three-Dimensional Techniques
4 Two-dimensional Face Recognition
4.1 Feature Localization
Before discussing the methods of comparing two facial images, we take a brief look at some of the preliminary processes of facial feature alignment. This process typically consists of two stages: face detection and eye localisation. Depending on the application, if the position of the face within the image is known beforehand (for a cooperative subject in a door access system, for example), then the face detection stage can often be skipped, as the region of interest is already known. Therefore, we discuss eye localisation here, with a brief discussion of face detection in the literature review (section 3.1.1).
The eye localisation method is used to
align the 2D face images of the various test sets
used
throughout this section. However,
to ensure that all results presented are
representative of the face recognition
accuracy and not a product of the performance of
the eye
localisation routine, all image
alignments are manually checked and any errors
corrected, prior to
testing and
evaluation.
We detect the position of the eyes within an image using a simple template-based method. A training set of manually pre-aligned images of faces is taken, and each image is cropped to an area around both eyes. The average image is calculated and used as a template.
Figure 4-1 - The average
eyes. Used as a template for eye detection.
Both eyes are included in a single template, rather than individually searching for each eye in turn, as the characteristic symmetry of the eyes either side of the nose provides a useful feature that helps distinguish the eyes from false positives that may be picked up in the background. However, this method is highly susceptible to scale (i.e. subject distance from the camera) and also introduces the assumption that the eyes in the image appear near horizontal. Some preliminary experimentation also reveals that it is advantageous to include the area of skin just beneath the eyes. The reason is that in some cases the eyebrows can closely match the template, particularly if there are shadows in the eye-sockets, but the area of skin below the eyes helps to distinguish the eyes from the eyebrows (the area just below the eyebrows contains the eyes, whereas the area below the eyes contains only plain skin).
A window is passed over each test image and the absolute difference between the window contents and the average eye image shown above is computed. The area of the image with the lowest difference is taken as the region of interest containing the eyes. Applying the same procedure using smaller templates of the individual left and right eyes then refines each eye position.
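The window search described above can be sketched as follows. This is a minimal illustration only, assuming greyscale images held as NumPy arrays and a summed absolute difference as the match score; the function names `average_template` and `locate_template` are hypothetical and are not taken from the thesis.

```python
import numpy as np

def average_template(eye_crops):
    """Average a list of equally sized, pre-aligned eye-region crops."""
    stack = np.stack([np.asarray(c, dtype=np.float64) for c in eye_crops])
    return stack.mean(axis=0)

def locate_template(image, template):
    """Slide the template over the image and return the top-left (row, col)
    of the window with the lowest summed absolute difference, plus that score."""
    image = np.asarray(image, dtype=np.float64)
    template = np.asarray(template, dtype=np.float64)
    th, tw = template.shape
    best_score, best_pos = np.inf, (0, 0)
    for r in range(image.shape[0] - th + 1):
        for c in range(image.shape[1] - tw + 1):
            window = image[r:r + th, c:c + tw]
            score = np.abs(window - template).sum()
            if score < best_score:
                best_score, best_pos = score, (r, c)
    return best_pos, best_score
```

The same function could then be called again on the detected region with smaller single-eye templates to refine each eye position. The exhaustive double loop is chosen for clarity rather than speed.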
This basic template-based method of eye localisation, although providing fairly precise localisations, often fails to locate the eyes at all. However, we are able to improve performance by including a weighting scheme.
Eye localisation is
performed on the set of training images, which is
then separated into two
sets: those in
which eye detection was successful; and those in
which eye detection failed.
Taking the
set of successful localisations we compute the
average distance from the eye template
(Figure 4-2 top). Note that the image
is quite dark, indicating that the detected eyes
correlate
closely to the eye template,
as we would expect. However, bright points do
occur near the whites
of the eye,
suggesting that this area is often inconsistent,
varying greatly from the average eye
template.
Figure 4-2 - Distance to the eye template for successful detections (top), indicating variance due to noise, and failed detections (bottom), showing credible variance due to mis-detected features.
In the lower image (Figure 4-2 bottom), we have taken the set of failed localisations (images of the forehead, nose, cheeks, background etc. falsely detected by the localisation routine) and once again computed the average distance from the eye template. The bright pupils surrounded by darker areas indicate that a failed match is often due to the high correlation of the nose and cheekbone regions overwhelming the poorly correlated pupils. Wanting to emphasise the difference of the pupil regions for these failed matches and minimise the variance of the whites of the eyes for successful matches, we divide the lower image values by the upper image values to produce a weights vector, as shown in Figure 4-3. When applied to the difference image before summing a total error, this weighting scheme provides a much improved detection rate.
Figure 4-3 - Eye template
weights used to give higher priority to those
pixels that best
represent the eyes.
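The weighting scheme can be illustrated with a short sketch. It assumes that the "distance from the eye template" is a per-pixel absolute difference, and it adds a small constant `eps` to avoid division by zero; both are assumptions made for this example, as are the function names.

```python
import numpy as np

def difference_image(window, template):
    """Per-pixel absolute difference between a window and the template."""
    return np.abs(np.asarray(window, dtype=np.float64)
                  - np.asarray(template, dtype=np.float64))

def template_weights(success_windows, failed_windows, template, eps=1e-6):
    """Divide the mean difference image of failed detections by that of
    successful detections (eps is an assumed guard against zero division)."""
    mean_success = np.mean([difference_image(w, template) for w in success_windows], axis=0)
    mean_failed = np.mean([difference_image(w, template) for w in failed_windows], axis=0)
    return mean_failed / (mean_success + eps)

def weighted_error(window, template, weights):
    """Apply the weights to the difference image before summing a total error."""
    return float((weights * difference_image(window, template)).sum())
```

Pixels that differ strongly in failed detections but little in successful ones receive large weights, so they dominate the total error during the window search.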
4.2 The Direct Correlation Approach
We begin our investigation into face recognition with perhaps the simplest approach, known as the direct correlation method (also referred to as template matching by Brunelli and Poggio [29]), involving the direct comparison of pixel intensity values taken from facial images. We use the term ‘Direct Correlation’ to encompass all techniques in which face images are compared directly, without any form of image space analysis, weighting schemes or feature extraction, regardless of the distance metric used. Therefore, we do not imply that Pearson’s correlation is applied as the similarity function (although such an approach would obviously come under our definition of direct correlation). We typically use the Euclidean distance as our metric in these investigations (inversely related to Pearson’s correlation, it can be considered a scale and translation sensitive form of image correlation), as this is consistent with the contrast made between image space and subspace approaches in later sections.
Firstly, all facial images must be aligned such that the eye centres are located at two specified pixel coordinates and the image cropped to remove any background information. These images are stored as greyscale bitmaps of 65 by 82 pixels and, prior to recognition, converted into a vector of 5330 elements (each element containing the corresponding pixel intensity value). Each corresponding vector can be thought of as describing a point within a 5330-dimensional image space. This simple principle can easily be extended to much larger images: a 256 by 256 pixel image occupies a single point in 65,536-dimensional image space and again, similar images occupy close points within that space. Likewise, similar faces are located close together within the image space, while dissimilar faces are spaced far apart.
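To make the notion of image space concrete, the following sketch flattens a 65 by 82 pixel greyscale image into a 5330-element vector and measures how far apart two such points lie. It assumes images are stored as NumPy arrays of shape (82, 65), i.e. rows by columns; the function names are illustrative only.

```python
import numpy as np

def image_to_vector(image):
    """Flatten a greyscale image into a one-dimensional vector of pixel intensities."""
    return np.asarray(image, dtype=np.float64).ravel()

def image_space_distance(image_a, image_b):
    """Euclidean distance between two images treated as points in image space."""
    return float(np.linalg.norm(image_to_vector(image_a) - image_to_vector(image_b)))

# Example: two uniform 65 x 82 images differing by one grey level per pixel.
face_a = np.zeros((82, 65))
face_b = np.ones((82, 65))
vector_length = image_to_vector(face_a).size       # 5330 elements
distance = image_space_distance(face_a, face_b)    # sqrt(5330)
```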
Calculating the Euclidean distance d between two facial image vectors (often referred to as the query image q and gallery image g), we get an indication of similarity. A threshold is then applied to make the final verification decision.
\[ d = \lVert q - g \rVert, \qquad (d \le \text{threshold} \Rightarrow \text{accept}) \wedge (d > \text{threshold} \Rightarrow \text{reject}) \]
Equ. 4-1
4.2.1 Verification Tests
The primary concern in any face recognition system is its ability to correctly verify a claimed identity or determine a person's most likely identity from a set of potential matches in a database. In order to assess a given system’s ability to perform these tasks, a variety of evaluation methodologies have arisen.
Some of these analysis methods simulate a specific
mode
of operation (i.e. secure site
access or surveillance), while others provide a
more mathematical
description of data
distribution in some
classification
space. In addition, the results generated from
each analysis method may
be presented
in a variety of formats. Throughout the
experimentations in this thesis, we primarily
use the verification test as our method
of analysis and comparison, although we also use
Fisher’s
Linear Discriminant to analyse
individual subspace components in section 7 and
the
identification test for the final
evaluations described in section 8. The
verification test measures a
system’s
ability to correctly accept or reject the proposed
identity of an individual. At a
functional level, this reduces to two
images being presented for comparison, for which
the
system must return either an
acceptance (the two images are of the same person)
or rejection (the
two images are of
different people). The test is designed to
simulate the application area of
secure
site access. In this scenario, a subject will
present some form of identification at a point of
entry, perhaps as a swipe card,
proximity chip or PIN number. This number is then
used to
retrieve a stored image from a
database of known subjects (often referred to as
the target or
gallery image) and
compared with a live image captured at the point
of entry (the query image).
Access is
then granted depending on the acceptance/rejection
decision.
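The secure site access scenario can be sketched as below. This is an illustrative outline rather than the system evaluated in this thesis: the gallery is assumed to be a dictionary keyed by the presented identifier, the comparison is assumed to be the Euclidean distance between flattened images, and a distance at or below the threshold is assumed to mean acceptance.

```python
import numpy as np

def verify(claimed_id, query_image, gallery, threshold):
    """Compare a live query image with the stored gallery image for claimed_id.

    Returns (accepted, distance). `gallery` is a hypothetical mapping from
    identifier to stored image; smaller distances mean more similar faces.
    """
    gallery_image = gallery[claimed_id]
    q = np.asarray(query_image, dtype=np.float64).ravel()
    g = np.asarray(gallery_image, dtype=np.float64).ravel()
    distance = float(np.linalg.norm(q - g))
    return distance <= threshold, distance

# Example with synthetic uniform images standing in for aligned faces.
gallery = {"alice": np.zeros((82, 65)), "bob": np.full((82, 65), 10.0)}
query = np.full((82, 65), 1.0)
genuine_result = verify("alice", query, gallery, threshold=100.0)
impostor_result = verify("bob", query, gallery, threshold=100.0)
```

The threshold value here is arbitrary; in practice it would be chosen to suit the error rates the application can tolerate.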
The results of
the test are calculated according to how many
times the accept/reject decision
is
made correctly. In order to execute this test we
must first define our test set of face images.
Although the number of images in the
test set does not affect the results produced (as
the error
rates are specified as
percentages of image comparisons), it is important
to ensure that the test set
is
sufficiently large such that statistical anomalies
become insignificant (for example, a couple of
badly aligned images matching well).
Also, the type of images (high variation in
lighting, partial
occlusions etc.) will
significantly alter the results of the test.
Therefore, in order to compare
multiple
face recognition systems, they must be applied to
the same test set.
However, it should also be noted that if the results are to be representative of system performance in a real world situation, then the test data should be captured under precisely the same circumstances as in the application environment. On the other hand, if the purpose of the experimentation is to evaluate and improve a method of face recognition, which may be applied to a range of application environments, then the test data should present the range of difficulties that are to be overcome. This may mean including a greater percentage of ‘difficult’ images than would be expected in the perceived operating conditions and hence higher error rates in the results produced.
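As an aside on how error rates can be expressed as percentages of image comparisons, independent of test set size, the following sketch counts incorrect accept and reject decisions at a given threshold. It is not the verification test algorithm of this thesis; the input format (pairs of distance and a same-person flag) and the names "false acceptance rate" and "false rejection rate" are assumptions made for illustration.

```python
def verification_error_rates(comparisons, threshold):
    """Return (false_acceptance_rate, false_rejection_rate) as percentages.

    `comparisons` is an iterable of (distance, same_person) pairs, where
    same_person is True if both images show the same individual.
    """
    false_accepts = false_rejects = 0
    impostor_total = genuine_total = 0
    for distance, same_person in comparisons:
        accepted = distance <= threshold
        if same_person:
            genuine_total += 1
            false_rejects += not accepted   # genuine pair wrongly rejected
        else:
            impostor_total += 1
            false_accepts += accepted       # impostor pair wrongly accepted
    far = 100.0 * false_accepts / impostor_total if impostor_total else 0.0
    frr = 100.0 * false_rejects / genuine_total if genuine_total else 0.0
    return far, frr

# Example: two same-person comparisons and four different-person comparisons.
rates = verification_error_rates(
    [(1.0, True), (5.0, True), (2.0, False), (6.0, False), (7.0, False), (8.0, False)],
    threshold=3.0,
)
```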
Below we provide the algorithm for executing the
verification test. The
algorithm is
applied to a single test set of face images, using
a single function call to the face
recognition algorithm:
CompareFaces(FaceA, FaceB). This call is used to
compare two facial