Appendix (Original Text and Translation)
The original text is taken from: Thomas David Heseltine BSc. Hons., The University of York, Department of Computer Science, for the qualification of PhD, September 2005, "Face Recognition: Two-Dimensional and Three-Dimensional Techniques".
4 Two-dimensional Face Recognition
4.1 Feature Localization
Before discussing the methods of comparing two facial images, we now take a brief look at some of the preliminary processes of facial feature alignment. This process typically consists of two stages: face detection and eye localisation.
Depending on the application, if the position of
the face within the image is known
beforehand (for a cooperative subject in a door
access system
for example) then the
face detection stage can often be skipped, as the
region of interest is
already known.
Therefore, we discuss eye localisation here, with
a brief discussion of face
detection in
the literature review (section 3.1.1).
The eye localisation method is used to
align the 2D face images of the various test sets
used
throughout this section. However,
to ensure that all results presented are
representative of the face recognition
accuracy and not a product of the performance of
the eye
localisation routine, all image
alignments are manually checked and any errors
corrected, prior to
testing and
evaluation.
We detect the position of
the eyes within an image using a simple template-based method. A training set
of manually pre-aligned images of faces is taken,
and each
image cropped to an area
around both eyes. The average image is calculated
and used
as a template.
Figure 4-1 - The average
eyes. Used as a template for eye detection.
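As an illustration of this step, the following is a minimal sketch (in Python with NumPy, not taken from the original thesis) of how such an average template could be computed from a set of pre-aligned eye-region crops; the cropping and alignment are assumed to have been done already, and all names are illustrative.

    import numpy as np

    def build_eye_template(eye_crops):
        # eye_crops: list of equally sized greyscale crops (2D arrays) around both eyes
        stack = np.stack([crop.astype(np.float64) for crop in eye_crops])
        # the pixel-wise mean of the pre-aligned crops is the "average eyes" template
        return stack.mean(axis=0)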
Both eyes are included in a single template, rather than individually searching for each eye in turn, as the characteristic symmetry of the eyes either side of the nose provides a useful feature that helps distinguish between the eyes and other false positives that may be picked up in the background. This method is, however, highly susceptible to scale (i.e. subject distance from the camera) and also introduces the assumption that eyes in the image appear near horizontal. Some preliminary experimentation also reveals that it is advantageous to include the area of skin just beneath the eyes. The reason is that in some cases the eyebrows can closely match the template, particularly if there are shadows in the eye-sockets, but the area of skin below the eyes helps to distinguish the eyes from eyebrows (the area just below the eyebrows contains eyes, whereas the area below the eyes contains only plain skin).
A window is passed over the test images
and the absolute difference taken to that of the
average
eye image shown above. The area
of the image with the lowest difference is taken
as the region
of interest containing
the eyes. Applying the same procedure using a
smaller template of the
individual left
and right eyes then refines each eye position.
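A minimal sketch of this sliding-window search, assuming the test image and the average-eyes template are greyscale NumPy arrays; the exhaustive search strategy and the function names are illustrative only, not from the original text.

    import numpy as np

    def locate_eyes(image, template):
        # return (row, col) of the top-left corner of the best-matching window
        img = image.astype(np.float64)
        th, tw = template.shape
        best_pos, best_err = None, np.inf
        for r in range(img.shape[0] - th + 1):
            for c in range(img.shape[1] - tw + 1):
                window = img[r:r + th, c:c + tw]
                # summed absolute difference to the average eye template
                err = np.abs(window - template).sum()
                if err < best_err:
                    best_err, best_pos = err, (r, c)
        return best_pos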
This basic template-based method of eye
localisation, although providing fairly
precise localisations, often fails to
locate the eyes completely. However, we are able
to
improve performance by
including a weighting scheme.
Eye
localisation is performed on the set of training
images, which is then separated into two
sets: those in which eye detection was
successful; and those in which eye detection
failed.
Taking the set of successful
localisations we compute the average distance from
the eye template
(Figure 4-2 top). Note
that the image is quite dark, indicating that the
detected eyes correlate
closely to the
eye template, as we would expect. However, bright
points do occur near the whites
of the
eye, suggesting that this area is often
inconsistent, varying greatly from the average eye
template.
Figure 4-2
–
Distance to the eye template for successful
detections (top) indicating variance
due to
noise and failed
detections (bottom) showing credible variance due
to mis-detected
features.
In the lower image (Figure 4-2 bottom),
we have taken the set of failed
localisations (images
of the forehead,
nose, cheeks, background etc. falsely detected by
the localisation routine) and
once
again computed the average distance from the eye
template. The bright pupils surrounded
by darker areas indicate that a failed
match is often due to the high correlation of the
nose and
cheekbone regions overwhelming
the poorly correlated pupils. Wanting to emphasise
the
difference of
the pupil regions for these failed matches and
minimise the variance of the whites
of
the eyes for successful matches, we divide the
lower image values by the upper image to
produce a weights vector as shown in
Figure 4-3. When applied to the difference image
before
summing a total error, this
weighting scheme provides a much improved
detection rate.
Figure 4-3 - Eye template weights used
to give higher priority to those pixels that best
represent the eyes.
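A minimal sketch of this weighting step, assuming the two average difference images of Figure 4-2 are available as NumPy arrays; the function names and the small epsilon guarding against division by zero are illustrative additions, not from the original text.

    import numpy as np

    def build_weights(avg_diff_failed, avg_diff_success, eps=1e-6):
        # divide the failed-detection average distance (lower image) by the
        # successful-detection average distance (upper image), element-wise
        return avg_diff_failed / (avg_diff_success + eps)

    def weighted_error(window, template, weights):
        # apply the weights to the difference image before summing a total error
        return (np.abs(window.astype(np.float64) - template) * weights).sum()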
4.2 The Direct Correlation Approach
We begin our investigation into face recognition with perhaps the simplest approach, known as the direct correlation method (also referred to as template matching by Brunelli and Poggio [29]), involving the direct comparison of pixel intensity values taken from facial images. We use the term ‘Direct Correlation’ to encompass all techniques in which face images are compared directly, without any form of image space analysis, weighting schemes or feature extraction, regardless of the distance metric used. Therefore, we do not infer that Pearson’s correlation is applied as the similarity function (although such an approach would obviously come under our definition of direct correlation). We typically use the Euclidean distance as our metric in these investigations (inversely related to Pearson’s correlation, and which can be considered a scale and translation sensitive form of image correlation), as this is consistent with the contrast made between image space and subspace approaches in later sections.
Firstly, all facial images must be
aligned such that the eye centres are located at
two
specified pixel coordinates and the
image cropped to remove any background
information. These images are stored as
greyscale bitmaps of 65 by 82 pixels and prior to
recognition converted into a vector of
5330 elements (each element containing the
corresponding
pixel intensity value).
Each corresponding vector can be thought of as
describing a point within a
5330-dimensional image space. This simple principle can
easily be extended to much larger
images: a 256 by 256 pixel image
occupies a single point in 65,536-dimensional
image space
and again, similar images
occupy close points within that space. Likewise,
similar faces are
located close
together within the image space, while dissimilar
faces are spaced far apart.
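For illustration, a minimal sketch of this representation, assuming an already aligned and cropped 65 by 82 greyscale array (the row/column orientation chosen here is an assumption):

    import numpy as np

    def image_to_vector(face_crop):
        # unroll the 82x65 greyscale crop into a single 5330-element vector,
        # one element per pixel intensity value
        assert face_crop.shape == (82, 65)
        return face_crop.astype(np.float64).reshape(-1)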
Calculating the Euclidean distance d between two facial image vectors (often referred to as the query image q and the gallery image g), we get an indication of similarity. A threshold is then applied to make the final verification decision.

d = \left\| q - g \right\|, \qquad (d \le \mathrm{threshold} \Rightarrow \mathrm{accept}) \;\vee\; (d > \mathrm{threshold} \Rightarrow \mathrm{reject}).    Equ. 4-1
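A minimal sketch of this decision rule, assuming the query and gallery images have already been converted to vectors; the threshold value is application-dependent and not specified here.

    import numpy as np

    def verify(query_vec, gallery_vec, threshold):
        # Euclidean distance between the two image vectors (Equ. 4-1)
        d = np.linalg.norm(query_vec - gallery_vec)
        return "accept" if d <= threshold else "reject"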
4.2.1 Verification Tests
The primary concern in any face
recognition system is its ability to correctly
verify a
claimed identity or determine
a person's most likely identity from a set of
potential matches in a
database. In
order to assess a given system’s ability to
perform these tasks, a variety of
evaluation methodologies have arisen.
Some of these analysis methods simulate a specific
mode
of operation (i.e. secure site
access or surveillance), while others provide a
more mathematical
description of data
distribution in some
classification
space. In addition, the results generated from
each analysis method may
be presented
in a variety of formats. Throughout the
experimentations in this thesis, we primarily
use the verification test as our method
of analysis and comparison, although we also use
Fisher’s
Linear Discriminant to analyse
individual subspace components in section 7 and
the
identification test for the final
evaluations described in section 8. The
verification test measures a
system’s
ability to correctly accept or reject the proposed
identity of an individual. At a
functional level, this reduces to two
images being presented for comparison, for which
the
system must return either an
acceptance (the two images are of the same person)
or rejection (the
two images are of
different people). The test is designed to
simulate the application area of
secure
site access. In this scenario, a subject will
present some form of identification at a point of
entry, perhaps as a swipe card,
proximity chip or PIN number. This number is then
used to
retrieve a stored image from a
database of known subjects (often referred to as
the target or
gallery image) and
compared with a live image captured at the point
of entry (the query image).
Access is
then granted depending on the acceptance/rejection
decision.
The results of
the test are calculated according to how many
times the accept/reject decision
is
made correctly. In order to execute this test we
must first define our test set of face images.
Although the number of images in the
test set does not affect the results produced (as
the error
rates are specified as
percentages of image comparisons), it is important
to ensure that the test set
is
sufficiently large such that statistical anomalies
become insignificant (for example, a couple of
badly aligned images matching well).
Also, the type of images (high variation in
lighting, partial
occlusions etc.) will
significantly alter the results of the test.
Therefore, in order to compare
multiple
face recognition systems, they must be applied to
the same test set.
However, it should also be
noted that if the results are to be representative
of system
performance in a real world
situation, then the test data should be captured
under precisely the
same circumstances
as in the application environment. On the other hand, if the
purpose of the
experimentation is to
evaluate and improve a method of face recognition,
which may be applied
to a range of
application environments, then the test data
should present the range of difficulties
that are to be overcome. This may mean
including a greater percentage of ‘difficult’
images than
would be
expected in the perceived operating conditions and
hence higher error rates in the
results
produced. Below we provide the algorithm for
executing the verification test. The
algorithm is applied to a single test
set of face images, using a single function call
to the face
recognition algorithm:
CompareFaces(FaceA, FaceB). This call is used to
compare two facial
images, returning a
distance score indicating how dissimilar the two
face images are: the lower
the score
the more similar the two face images. Ideally,
images of the same face should produce
low scores, while images of different
faces should produce high scores.
Every
image is compared with every other image, no image
is compared with itself and no
pair is
compared more than once (we assume that the
relationship is symmetrical). Once two
images have been compared, producing a
similarity score, the ground-truth is used to
determine
if the images are of the same
person or different people. In practical tests
this information is
often encapsulated
as part of the image filename (by means of a
unique person identifier). Scores
are
then stored in one of two lists: a list containing
scores produced by comparing images of
different people and a list containing
scores produced by comparing images of the same
person.
The final acceptance/rejection
decision is made by application of a threshold.
Any incorrect
decision is recorded as
either a false acceptance or false rejection. The
false rejection rate (FRR)
is
calculated as the percentage of scores from the
same people that were classified as rejections.
The false acceptance rate (FAR) is
calculated as the percentage of scores from
different people
that were classified
as acceptances.
For IndexA = 0 to length(TestSet)
    For IndexB = IndexA+1 to length(TestSet)
        Score = CompareFaces(TestSet[IndexA], TestSet[IndexB])
        If TestSet[IndexA] and TestSet[IndexB] are the same person
            Append Score to AcceptScoresList
        Else
            Append Score to RejectScoresList

For Threshold = Minimum Score to Maximum Score:
    FalseAcceptCount = 0
    FalseRejectCount = 0
    For each Score in RejectScoresList
        If Score <= Threshold
            Increase FalseAcceptCount
    For each Score in AcceptScoresList
        If Score > Threshold
            Increase FalseRejectCount
    FalseAcceptRate = FalseAcceptCount / length(RejectScoresList)
    FalseRejectRate = FalseRejectCount / length(AcceptScoresList)
    Add plot to error curve at (FalseRejectRate, FalseAcceptRate)
These two error rates
express the inadequacies of the system when
operating at a
specific
threshold value. Ideally, both these figures
should be zero, but in reality reducing either
the FAR or FRR (by altering the
threshold value) will inevitably result
in increasing the other. Therefore, in
order to describe the full operating range of
a
particular system, we vary
the threshold value through the entire range of
scores
produced. The application of
each threshold value produces an additional FAR,
FRR
pair, which when plotted on a graph
produces the error rate curve shown
below.
Figure 4-5 - Example Error Rate Curve
produced by the verification test.
The
equal error rate (EER) can be seen as the point at
which FAR is equal to FRR. This EER
value is often used as a single figure
representing the general recognition
performance of a biometric system and
allows for easy visual comparison of
multiple
methods. However,
it is important to note that the EER does not
indicate the level of
error that would
be expected in a real world application. It is
unlikely that any real
system would use
a threshold value such that the percentage of
false acceptances were
equal
to the percentage of false rejections. Secure site
access systems would typically
set the
threshold such that false acceptances were
significantly lower than false rejections:
being unwilling to tolerate intruders even at the cost of inconvenient access denials.
Surveillance systems on the other hand
would require low false rejection rates to
successfully identify people in a less
controlled environment. Therefore we should bear
in mind
that a system with a lower EER
might not necessarily be the better performer
towards the
extremes of its operating
capability.
There is a
strong connection between the above graph and the
receiver operating
characteristic (ROC) curves, also used
in such experiments. Both graphs are simply two
visualisations of the same results, in
that the ROC format uses the True Acceptance
Rate (TAR), where TAR = 1.0 - FRR, in place of the FRR, effectively flipping the graph vertically. Another
visualisation of the verification test
results is to display both the FRR and FAR as
functions of
the threshold value. This
presentation format provides a reference to
determine the threshold
value necessary
to achieve a specific FRR and FAR. The EER can be
seen as the point where the
two curves
intersect.
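As an illustrative sketch (not from the original text), the EER could be read off such a threshold sweep by taking the threshold at which the two curves are closest:

    import numpy as np

    def equal_error_rate(thresholds, far, frr):
        # far and frr are the false acceptance/rejection rates evaluated at
        # the same sequence of threshold values; the EER is taken where they meet
        far, frr = np.asarray(far), np.asarray(frr)
        idx = int(np.argmin(np.abs(far - frr)))
        return thresholds[idx], (far[idx] + frr[idx]) / 2.0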