BIOMETRICS: A TECHNICAL
PRIMER
Author: Elaine M.
Newton, with John D. Woodward
Adapted from John D.
Woodward, Katherine W. Webb, Elaine M.
Newton et al., Appendix A, "Biometrics:
A Technical Primer," "Army
Biometric Applications: Identifying and
Addressing Sociocultural Concerns,"
RAND/MR-1237-A, Santa Monica, CA: RAND 2001.
Copyright RAND 2001.
Acknowledgements to Andy Oram and Jean Camp.
INTRODUCTION
In general, there are
three approaches to authenticating an
individual's identity. In order of most
secure and convenient to least secure and
convenient, they are as follows:
· Something you are - a biometric.
· Something you know - PIN, password.
· Something you have - key, token, card.
Any combination of these
approaches can potentially further heighten
security.1
Facial recognition
software, fingerprint readers, hand geometry
readers, and other forms of biometrics
appear increasingly in systems with
mission-critical security. Given the
widespread consensus in the security
community that passwords and magnetic-stripe
cards accompanied by PINs have weaknesses,
biometrics could well be ensconced in future
security systems.
This document begins with
a definition of biometrics and related
terms. It then describes the steps in the
biometric authentication process, and
reviews issues of template management and
storage. The appendix concludes with a brief
review of mainstream biometric applications.2
OVERVIEW
A biometric is any measurable,
robust, distinctive physical
characteristic or personal trait that can be
used to identify, or verify the claimed
identity of, an individual. Biometric
authentication, in the context of this
report, refers to automated methods of
identifying, or verifying the identity of, a
living person.
Several terms above (measurable, robust,
distinctive, and living person) require
explanation.
Measurable
means that the characteristic or trait can
be easily presented to a sensor and
converted into a quantifiable, digital
format. This allows for the automated
matching process to occur in a matter of
seconds.
The robustness
of a biometric is a measure of the extent to
which the characteristic or trait is subject
to significant changes over time. These
changes can occur as a result of age,
injury, illness, occupational use, or
chemical exposure. A highly robust biometric
does not change significantly over time. A
less robust biometric does. For example, the
iris, which changes very little over a
person's lifetime, is more robust than a
voice.
Distinctiveness
is a measure of the variations or
differences in the biometric pattern among
the general population. The higher the
degree of distinctiveness, the more unique
the identifier. The highest degree of
distinctiveness implies a unique identifier.
A low degree of distinctiveness indicates a
biometric pattern found frequently in the
general population. The iris and the retina
have higher degrees of distinctiveness than
hand or finger geometry.
The application helps
determine the degree of robustness and
distinctiveness required. The system's
ability to match a sample to a template is
sometimes referred to as the biometric's
reliability.
Systems can be used either
to identify people in a consensual or
nonconsensual manner - as when faces are
scanned in public places - or to verify the
claimed identity of a person who presents a
biometrics sample in order to gain access or
authorization for an activity. The following
section expands on this issue.
The presence of a living
person distinguishes biometric
authentication from forensics, which does
not involve real-time identification of a
living individual.
IDENTIFICATION VERSUS
VERIFICATION
Identification and
verification differ significantly. With
identification, the biometric system asks
and attempts to answer the question,
"Who is X?" In an identification
application, the biometric device reads a
sample and compares that sample against
every template in the database. This is
called a "one-to-many" search
(1:N). The device will either make a match
and subsequently identify the person or it
will not make a match and not be able to
identify the person.
Verification is when the
biometric system asks and attempts to answer
the question, "Is this X?" after
the user claims to be X. In a verification
application, the biometric device requires
input from the user, at which time the user
claims his identity via a password, token,
or user name (or any combination of the
three). This user input points the device to
a template in the database. The device also
requires a biometric sample from the user.
It then compares the sample against
the template selected by the user's claim. This is called a
"one-to-one" search (1:1). The
device will either find or fail to find a
match between the two.
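The difference between the two searches can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `similarity` function, the three-number templates, the user names, and the 0.9 threshold are all hypothetical, and real matching algorithms are proprietary and far more elaborate.

```python
from typing import Dict, Optional, Tuple

Template = Tuple[float, ...]


def similarity(a: Template, b: Template) -> float:
    """Toy score in (0, 1]: 1 / (1 + Euclidean distance). Assumed, not a vendor algorithm."""
    distance = sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    return 1.0 / (1.0 + distance)


def identify(sample: Template, database: Dict[str, Template],
             threshold: float) -> Optional[str]:
    """One-to-many (1:N): compare the sample against every template."""
    best_id, best_score = None, threshold
    for user_id, template in database.items():
        score = similarity(sample, template)
        if score >= best_score:
            best_id, best_score = user_id, score
    return best_id  # None means "no match, cannot identify"


def verify(sample: Template, claimed_id: str, database: Dict[str, Template],
           threshold: float) -> bool:
    """One-to-one (1:1): compare the sample against the claimed template only."""
    template = database.get(claimed_id)
    if template is None:
        return False
    return similarity(sample, template) >= threshold


# Hypothetical enrolled templates and a live sample close to "alice".
database = {"alice": (0.1, 0.9, 0.4), "bob": (0.8, 0.2, 0.5)}
sample = (0.12, 0.88, 0.41)

print(identify(sample, database, threshold=0.9))
print(verify(sample, "alice", database, threshold=0.9))
print(verify(sample, "bob", database, threshold=0.9))
```

Note that `identify` touches every record, while `verify` touches exactly one, which is why the claimed identity matters so much for both speed and error rates.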
Identification
applications require a highly robust and
distinctive biometric; otherwise, the error
rates falsely matching and falsely
nonmatching user's samples against templates
cause security problems and inhibit
convenience. Identification applications are
common where the end-user wants to identify
criminals (immigration, law enforcement,
etc.) or other "wolves in sheep's
clothing." Other types of applications
may use a verification process.3
In many ways, deciding whether to use
identification or verification requires a
trade-off: the end-user's needs for security
versus convenience.
In sum, biometric
authentication is used in two ways: to prove
who you are or who you claim you are and to
prove who you are not (e.g., to resolve a
case of mistaken identity).
THREE BASIC ELEMENTS OF
ALL BIOMETRIC SYSTEMS
All
biometric systems consist of three basic
elements:
1. Enrollment, or the process of collecting
biometric samples from an individual, known
as the enrollee, and the subsequent
generation of his template.
2. Templates, or the data representing the
enrollee's biometric.
3. Matching, or the process of comparing a
live biometric sample against one or many
templates in the system's database.
Performance refers to the
ability of a biometric system to correctly
match, or identify, individuals.
Enrollment
Enrollment is the crucial
first stage for biometric authentication
because it generates a template that will be
used for all subsequent matching. Typically,
the device takes three samples of the same
biometric and averages them to produce an
enrollment template. Enrollment is
complicated by a problem of timing: a user's
familiarity with a biometric device usually
improves performance, because the user knows how
to stand in front of or touch the
sensor, but enrollment is usually the first
time the user is exposed to the device.
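As a rough illustration of the averaging step, the sketch below treats each sample as a list of numeric features and averages them position by position. The feature-list representation and the function name `enroll` are assumptions made for the example; actual devices extract and combine features with proprietary algorithms.

```python
from typing import List, Sequence


def enroll(samples: Sequence[Sequence[float]]) -> List[float]:
    """Average several feature vectors into one enrollment template (illustrative only)."""
    if not samples:
        raise ValueError("at least one sample is required")
    length = len(samples[0])
    if any(len(s) != length for s in samples):
        raise ValueError("all samples must have the same number of features")
    # zip(*samples) groups the i-th feature of every sample together.
    return [sum(values) / len(values) for values in zip(*samples)]


# Three hypothetical samples of the same biometric, two features each.
template = enroll([[0.10, 0.90], [0.14, 0.86], [0.12, 0.88]])
print(template)
```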
Environmental conditions
also affect enrollment. Enrollment should
take place under conditions similar to those
expected during the routine matching
process. For example, if voice verification
is used in an environment where there is
background noise, the enrolling system
should capture voice templates in the same
environment.
In addition to user and
environmental issues, biometrics themselves
change over time. Many biometric systems
account for these changes by continuously
averaging. Templates are averaged and
updated each time the user attempts
authentication.
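One plausible way to picture continuous averaging is a weighted running average, in which each new sample nudges the stored template slightly. The text does not say which averaging rule devices use, so the rule, the `weight` parameter, and the function name below are assumptions. The sketch also assumes that a system would update only after a successful match, so that an imposter's sample cannot drift the template.

```python
from typing import List, Sequence


def update_template(template: Sequence[float], sample: Sequence[float],
                    weight: float = 0.1) -> List[float]:
    """Blend a newly matched sample into the stored template.

    weight is the share given to the new sample (assumed value);
    0.0 leaves the template unchanged, 1.0 replaces it outright.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be between 0 and 1")
    if len(template) != len(sample):
        raise ValueError("template and sample must have the same length")
    return [(1.0 - weight) * t + weight * s for t, s in zip(template, sample)]


stored = [1.0, 0.0]
stored = update_template(stored, [0.0, 1.0], weight=0.5)
print(stored)
```

A small weight tracks slow changes such as aging while damping the effect of a single unusual reading.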
Templates
The biometric device
stores the data captured when enrolling a
person as a template. The device uses a
proprietary algorithm to extract features
appropriate to that biometric from the
enrollee's samples. Templates are only a
record of distinguishing features, sometimes
called minutiae points, of a person's
biometric characteristic or trait. For
example, templates are not an image or
record of the actual fingerprint or voice.4
In basic terms, templates are numerical
representations of key points taken from a
person's body. They can be thought of as
very long passwords that can identify a body
part or behavior.
The template usually
occupies a small amount of computer memory
(and is smaller than the original image) and
thus allows for quick processing, a key
feature of making biometric authentication
practical.
The template must be
stored somewhere so that subsequent
templates, created when a user tries to
access the system using a sensor, can be
compared. Some biometric experts claim it is
impossible to reverse-engineer, or recreate,
a person's print or image from the biometric
template.
Matching
Matching is the comparison
of two templates: the one produced at the
time of enrollment (or at previous sessions,
if there is continuous updating) and the one
produced "on the spot" as a user
tries to gain access by providing a
biometric sample via a sensor.
There are three ways a
match can fail:
· Failure to enroll / Failure to acquire
· False match
· False nonmatch
Both failure to enroll
(during enrollment) and failure to acquire
(prior to matching) are failures to extract
distinguishing features appropriate to that
technology. For example, a small percentage
of the population fails to enroll in
fingerprint-based biometric authentication
systems. There are two primary reasons for
this failure: the individual's fingerprints
are not distinctive enough to be picked up
by the system, or the distinguishing
characteristics of the individual's
fingerprints have been altered because of
the individual's age or occupation, e.g., as
might happen with an elderly bricklayer.
False match (FM) and false
nonmatch (FNM) are frequently misnamed
"false acceptance" and "false
rejection," respectively, but the
latter pair of terms is
application-dependent in meaning. FM and FNM
are application-neutral terms that describe
the matching process between a live sample
and a biometric template.
A false match occurs when
a sample is incorrectly matched to a
template in the database (i.e., an imposter
is accepted). A false nonmatch occurs when a
sample is incorrectly not matched to a truly
matching template in the database (i.e., a
legitimate match is denied). People
deploying biometric systems calculate rates
for FM and FNM and use them to make
tradeoffs between security and convenience
when choosing a system or tuning its
parameters. For example, a heavy security
emphasis errs on the side of denying
legitimate matches and does not tolerate
acceptance of imposters. A heavy emphasis on
user convenience results in little tolerance
for denying legitimate matches but tolerates
some acceptance of imposters.
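The trade-off can be made concrete with a small calculation. The sketch below assumes that the matcher produces a similarity score and that a sample "matches" when its score reaches a threshold; the scores and thresholds are invented for illustration and do not come from any real system.

```python
from typing import Sequence, Tuple


def error_rates(genuine_scores: Sequence[float],
                imposter_scores: Sequence[float],
                threshold: float) -> Tuple[float, float]:
    """Return (false match rate, false nonmatch rate) at a given threshold."""
    # A genuine comparison scoring below the threshold is a false nonmatch.
    false_nonmatches = sum(1 for s in genuine_scores if s < threshold)
    # An imposter comparison scoring at or above the threshold is a false match.
    false_matches = sum(1 for s in imposter_scores if s >= threshold)
    return (false_matches / len(imposter_scores),
            false_nonmatches / len(genuine_scores))


# Hypothetical scores: same-person comparisons and different-person comparisons.
genuine = [0.91, 0.85, 0.78, 0.95, 0.66]
imposter = [0.30, 0.55, 0.72, 0.41, 0.20]

for threshold in (0.5, 0.7, 0.9):
    fm_rate, fnm_rate = error_rates(genuine, imposter, threshold)
    print(f"threshold {threshold:.1f}: FM rate {fm_rate:.1f}, FNM rate {fnm_rate:.1f}")
```

Raising the threshold lowers the false match rate at the cost of more false nonmatches, which is the security-versus-convenience choice described above.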
TEMPLATE
MANAGEMENT-STORAGE AND SECURITY
Template management is
critically linked to privacy, security, and
convenience. All biometric authentication
systems face a common issue: biometric
templates must be stored somewhere.
Templates must be protected to prevent
identity fraud and to protect the privacy of
users. Privacy is affected when additional
information is stored about each user along
with the biometric template.
Possible locations for template storage include:
· The biometric device itself
· A central computer that is remotely accessed
· A plastic card or token, via a bar code or magnetic stripe
· Radio Frequency Identification Device (RFID) cards and tags
· Optical memory cards
· Personal Computer Memory Card International Association (PCMCIA) cards
· Smart cards
In general, transmitting
biometric data over communications lines
reduces system security because the data
become vulnerable to the same interception
or tampering possible when any data is sent
"over the wire." On the other
hand, a network or central repository may be
needed for some applications where there are
multiple access points, or when there is a
need to confirm information with another
node or higher authority. Biometrics are
more secure when stored under the control of
the authorized user, such as on a smart
card, and used in verification applications.
Cards have varying degrees of utility and
storage memory.
Smart cards are the size
of credit cards and have an embedded
microchip or microprocessor chip. The chip
stores electronic data that can be protected
using biometrics. There are two types of
smart cards: contact and contactless smart
cards. A contact smart card must be inserted
into a smart card reader to be used. A
contactless smart card only has to be placed
near an antenna to carry out a transaction.5
Security for template
database storage is also affected by the
number of uses to which the database is put:
will it have a unique use or will it be used
for multiple security purposes?
For example, a facilities
manager might use a fingerprint reader for
physical access control to the building. The
manager might also want to use the same
fingerprint template database for his
employees to access their computer network.
Should the manager use separate databases
for these different uses, or is he willing
to risk accessing employee fingerprints from
a remote location for multiple purposes?
Additional security
features can be incorporated into biometric
systems to detect a "wolf," or
unauthorized user. For example, a
"liveliness test" tries to
determine whether the biometric sample is
being read from a live person versus a faux
body part or body part of a dead person.
Liveliness tests are done in many ways. The
device can check for such things as heat,
heartbeat, or electrical capacitance.6
Other security features
include encryption of biometric data and the
use of sequence numbers in template
transmission. A template with such a number
out of sequence suggests unauthorized use.
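A sequence-number check might look something like the sketch below. The numbering scheme (each sender counts up by one from zero), the class name, and the sender label are assumptions for illustration; the text does not describe a specific protocol, and a real system would also encrypt and authenticate the message that carries the number.

```python
from typing import Dict


class SequenceChecker:
    """Track the last sequence number seen from each sending device."""

    def __init__(self) -> None:
        self._last_seen: Dict[str, int] = {}

    def accept(self, sender_id: str, sequence_number: int) -> bool:
        """Accept only the next expected number; anything else is suspicious."""
        expected = self._last_seen.get(sender_id, -1) + 1
        if sequence_number != expected:
            # A repeated number suggests a replayed template;
            # a skipped number suggests a lost or diverted one.
            return False
        self._last_seen[sender_id] = sequence_number
        return True


checker = SequenceChecker()
results = [
    checker.accept("reader-1", 0),  # first transmission
    checker.accept("reader-1", 1),  # next in sequence
    checker.accept("reader-1", 1),  # replay of an old transmission
    checker.accept("reader-1", 3),  # gap in the sequence
    checker.accept("reader-1", 2),  # the expected number
]
print(results)
```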
In general, verification
applications provide more security than
identification applications because a
biometric and at least one other piece of
input (e.g., PIN, password, token, user
name) are required to match a template and
the corresponding record. In essence, it is
a second layer of security.
Verification provides a
user with more control over his data and
over the process when the template is stored
only on a card. Such a system would not
allow for clandestine, or involuntary,
capture of biometric data because the
individual would know when, where, and
to what system he or she was submitting a
biometric. Verification applications with
storage (and possibly matching, too) of a
biometric template on a card are potentially
more palatable to the public (for privacy,
convenience, and security concerns) and more
secure than identification applications or
applications with a repository for many
reasons:
1. There is no large
centralized storage location of templates,
which could be abused or hacked. Even
distributed databases should be regarded as
"honey pots" for hackers and leave
open the possibility of abuse by an
administrator.
2. They require the user's
consent to capture data.
3. There is an added layer
of security making it necessary to be in
possession of a card. Also requiring a
password can add yet another layer of
security.
4. Because the search
seeks only a match against one template in
the database, verification applications
require less processing time and memory.
BIOMETRIC APPLICATIONS
Most biometric
applications fall into one of nine general
categories. First there are financial
services (e.g., ATMs and kiosks) to limit
risks by using biometrics to provide
authentication to data. The second large
class is to evaluate the right of
individuals to make certain movements and
cross borders. These are most widely
proposed for immigration and border control
(e.g., points of entry, precleared frequent
travelers, passport and visa issuance,
asylum cases).
Biometrics are broadly
used in cases where the physical entity is
authenticated. In social services,
biometrics provide fraud prevention in
entitlement programs. In health care
biometrics offer security measures for the
privacy of medical records. Biometrics are
used for physical access control in a
variety of institutions (e.g., government
and residential).
Biometrics are also used
for narrow replacements for traditional
problems of verification. Applications here
include time and attendance where biometrics
are used as a replacement of time
punch-cards. Biometrics are widely proposed
as solutions to problems in computer
security including personal computer access,
network access, Internet use, e-commerce,
and e-mail authentication.
Biometrics are proposed as
an enabling underlying service in
telecommunications to limit mobile phone
fraud, authenticate callers into call
centers, strengthen the security of phone
cards, and enable televised shopping.
Finally, biometrics have
been embraced by law enforcement for use in
criminal investigations, national ID
systems, driver's licenses, correctional
institutions/prisons, home confinement, and
have been integrated into smart gun designs.
MAINSTREAM BIOMETRICS AND
THEIR APPLICATIONS
While there are many
possible biometrics, at least eight
mainstream biometric authentication
technologies have been deployed or
pilot-tested in applications in the public
and private sectors:7 (The
leaders are listed as the top four.)
· fingerprint
· iris scan
· facial recognition
· hand/finger geometry
· voice recognition
· retinal scan
· dynamic signature verification
· keystroke dynamics
Fingerprint
The fingerprint biometric
is an automated, digital version of the old
ink-and-paper method used for more than a
century for identification, primarily by law
enforcement agencies. Users place their
finger on a platen for the print to be read.
The minutiae are then extracted by the
vendor's algorithm, which also makes a
fingerprint pattern analysis. Fingerprint
template sizes are typically 50 to 1,000
bytes.
Fingerprint biometrics
currently have three main application
arenas: large-scale Automated Finger Imaging
Systems (AFIS) (generally used for law
enforcement), fraud prevention in
entitlement programs, and physical and
computer access.
Iris Scan
Iris scanning measures the
iris pattern in the colored part of the eye,
although the iris color has nothing to do
with the biometric. Iris patterns are formed
randomly. As a result, the iris patterns in
your left and right eyes are different, and
so are the iris patterns of identical twins.
Iris scan templates are typically around 256
bytes.
Iris scanning can provide
quick authentication for both identification
and verification applications because of its
large number of degrees of freedom. Current
pilot programs and applications include ATMs
("Eye-TMs"), grocery stores (for
checking out), and the Charlotte/Douglas
International Airport (physical access).
During the Winter Olympics in Nagano, Japan,
an iris scanning identification system
controlled access to the rifles used in the
biathlon.
Facial Recognition
Facial recognition records
the spatial geometry of distinguishing
features of the face. Different vendors use
different methods of facial recognition;
however, all focus on measures of key
features. Facial recognition templates are
typically 83 to 1,000 bytes. Facial
recognition technologies can encounter
performance problems stemming from a number
of factors, including noncooperative user
behavior and environmental variables such as
lighting.
Facial recognition has
been used to identify card counters in
casinos, shoplifters in stores, criminals in
targeted urban areas, and terrorists.
(See the Appendix for an
in-depth review of face recognition
performance.)
Hand/Finger Geometry
Hand or finger geometry is
an automated measurement of many dimensions
of the hand and fingers. Neither of these
methods takes actual prints of the palm or
fingers. Only the spatial geometry is
examined as the user puts his hand on the
sensor's surface and uses guiding poles
between the fingers to properly place the
hand and initiate the reading. Hand geometry
templates are typically 9 bytes, and finger
geometry templates are 20 to 25 bytes.
Finger geometry usually measures two or
three fingers. During the 1996 Summer
Olympics, hand geometry secured the
athletes' dormitories at Georgia Tech. Hand
geometry is a well-developed technology that
has been thoroughly field-tested and is
easily accepted by users.
Voice Recognition
Voice or speaker
recognition uses vocal characteristics to
identify individuals. Users speak
a pass-phrase so that the sample
they provided when enrolling can be matched
against the sample they provide at the time of attempted
access. A telephone or microphone can serve
as a sensor, which makes it a relatively
cheap and easily deployable technology.
Voice recognition can be
affected by environmental factors,
particularly background noise. Additionally,
it is unclear whether the technologies
actually recognize the voice or just the
pronunciation of the pass-phrase (password)
used. This technology has been the focus of
considerable efforts on the part of the
telecommunications industry and NSA, which
continue to work on improving reliability.
Retinal Scan
Retinal scans measure the
blood vessel patterns in the back of the
eye. Retinal scan templates are typically 40
to 96 bytes. Because the retina can change
with certain medical conditions, such as
pregnancy, high blood pressure, and AIDS,
this biometric might have the potential to
reveal more information than just an
individual's identity.
Because end-users perceive
the technology to be somewhat intrusive,
retinal scanning has not gained popularity
with them. The device shines a light into
the eye of a user, who must be standing very
still within inches of the device.
Dynamic Signature
Verification
Dynamic signature
verification is an automated method of
examining an individual's signature. This
technology examines such dynamics as speed,
direction, and pressure of writing; the time
that the stylus is in and out of contact
with the "paper"; the total time
taken to make the signature; and where the
stylus is raised from and lowered onto the
"paper." Dynamic signature
verification templates are typically 50 to
300 bytes.
Keystroke Dynamics
Keystroke dynamics is an
automated method of examining an
individual's keystrokes on a keyboard. This
technology examines such dynamics as speed
and pressure, the total time taken to type a
particular password, and the time a user
takes between hitting certain keys. This
technology's algorithms are still being
developed to improve robustness and
distinctiveness. One potentially useful
application that may emerge is computer
access, where this biometric could be used
to verify the computer user's identity
continuously.
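The timing measurements described above can be sketched as follows. The event format (a key with press and release times in milliseconds), the feature names, and the sample timings are assumptions for the example. Pressure is left out because ordinary keyboards do not report it.

```python
from typing import Dict, List, NamedTuple, Sequence


class KeyEvent(NamedTuple):
    key: str
    press_ms: int    # time the key went down
    release_ms: int  # time the key came back up


def keystroke_features(events: Sequence[KeyEvent]) -> Dict[str, object]:
    """Derive simple timing features from one typed password."""
    if not events:
        raise ValueError("at least one key event is required")
    # How long each key was held down.
    hold_ms: List[int] = [e.release_ms - e.press_ms for e in events]
    # Time between hitting one key and hitting the next.
    interval_ms: List[int] = [
        nxt.press_ms - cur.press_ms for cur, nxt in zip(events, events[1:])
    ]
    # Total time taken to type the whole password.
    total_ms = events[-1].release_ms - events[0].press_ms
    return {"total_ms": total_ms, "hold_ms": hold_ms, "interval_ms": interval_ms}


# Hypothetical timings for typing "pas".
typed = [KeyEvent("p", 0, 90), KeyEvent("a", 140, 220), KeyEvent("s", 300, 370)]
features = keystroke_features(typed)
print(features)
```

A template for this biometric could then be built from such features collected over several typing sessions, although the text gives no detail on how vendors do so.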
CLASSIFYING BIOMETRIC
APPLICATIONS
Biometric applications may
be classified in many different ways. James
Wayman of the National Biometric Test Center
suggests the following seven categories for
classifying biometric applications,
explained below.
1. overt or clandestine
2. cooperative or noncooperative
3. habituated or nonhabituated
4. supervised or nonsupervised
5. standard or nonstandard environment
6. closed or open system
7. public or private.
Overt versus clandestine
capture of a biometric sample refers to the
user's awareness that he is participating in
biometric authentication.8
Facial recognition is an example of a
biometric that can be used for clandestine
identification of individuals. Most uses of
biometrics are overt, because users' active
participation improves performance and
lowers error rates. Verification
applications are nearly always overt.
Cooperative versus
noncooperative applications refer to the
behavior that is in the best interest of the
"wolf." Is it in the interest of
"wolves" to match or to not match
a template in the database? Which is to the
"wolf's" benefit? This is
important in planning a security system with
biometrics because no perfect biometric
system exists. Every system can be tricked
into falsely not matching one's sample and
template - some more easily than others. It is
also possible to trick a biometric device
into falsely matching your sample against a
template, but it could be argued that this
requires more work and a sophisticated
hacker to make a model of the biometric
sample.
In systems that store user
information in a database, an intruder or
"wolf" may try to trick the system
into divulging biometric samples or other
information. One way to strengthen security
in a cooperative application is to require a
password or token along with a biometric, so
that the "wolf" must match one
specific template and is not allowed to
exploit the entire database for his gain.
To gain access to a
computer, a "wolf" would want to
be cooperative. To attempt to foil an INS
database consisting of illegal border
crossing recidivists, a "wolf"
(recidivist) would be noncooperative.
Habituated versus
nonhabituated use of a biometric system
refers to how often the users interface with
the biometric device. This is significant
because the user's familiarity with the
device affects its performance. Depending on
which type of application is chosen, the
end-user may need to utilize a biometric
that is highly robust. As examples, the use
of fingerprints for computer or network
access is a habituated use, while the use of
fingerprints on a driver's license, which is
updated once every several years, is a
nonhabituated use. Even
"habituated" applications are
"nonhabituated" during their first
week or so of operation or until the users
adjust to using the system.
Supervised versus
nonsupervised applications refer to whether
supervision (e.g., a security officer) is a
resource available to the end-user's
security system. Do users need to be
instructed on how to use the device (because
the application has many new users or
nonhabituated users) or be supervised to
ensure they are being properly sampled (such
as border crossing situations that deal with
the problem of recidivists or other
noncooperative applications)? Or is the
application made for increased convenience,
such as at an ATM? Routine use of an access
system may or may not require supervision.
The process of enrollment nearly always
requires supervision.
Standard versus
nonstandard environments are generally a
dichotomy between indoors versus outdoors. A
standard environment is optimal for a
biometric system and matching performance. A
nonstandard environment may present
variables that would create false nonmatches.
For example, a facial recognition template
depends, in part, on the lighting conditions
when the "picture" (image) was
taken. The variable lighting outdoors can
cause false nonmatches. Some indoor
situations may also be considered
nonstandard environments.
Closed versus open systems
refers to the number of uses of the template
database, now and in the future. Will the
database have a unique use (closed), or will
it be used for multiple security measures
(open)? Recall the fingerprint example from
"Template Management-Storage and
Security" for employees to enter a
building and log on to their computer
network. Should they use separate databases
for these different uses, or do they want to
risk remotely accessing employee
fingerprints for multiple purposes?
Other examples are state
driver's licenses and entitlement programs.
A state may want to communicate with other
states or other programs within the same
state to eliminate fraud. This would be an
open system, in which standard formats of
data and compression would be required to
exchange and compare information.
Public or private
applications refer to the users and their
relationship to system management. Examples
of users of public applications include
customers and entitlement recipients. Users
of private applications include employees of
business or government. Both user attitudes
toward biometric devices and management's
approach vary depending on whether the
application is public or private. Once
again, user attitudes toward the device will
affect the performance of the biometric
system.
It should be noted here
that performance figures and error rates
from vendor testing are unreliable for many
reasons. Part of the problem is that
determining the distinctiveness of a
biometric accurately requires thousands or
even millions of people. To acquire samples
over any amount of time in any number of
contexts from this number of people would be
impossible. To test for the many variables
in each type of application would be
impossible in most cases, and too costly in
the few where it is possible. Operational
and pilot testing is the only reasonable
method to test a system. Additionally,
vendor and scientific laboratory testing
generally present only the easiest
deployment scenario of a biometric
application: overt, cooperative, habituated,
supervised, standard, closed, and private.
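The seven categories lend themselves to a simple record, which makes it easy to see how far a planned deployment departs from the easiest scenario. The sketch below is illustrative only: the class and field names are invented, and the values chosen for the border-crossing example are assumptions rather than an assessment of any real program.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass(frozen=True)
class ApplicationProfile:
    """One flag per category; True is the easier side of each pair."""
    overt: bool
    cooperative: bool
    habituated: bool
    supervised: bool
    standard_environment: bool
    closed: bool
    private: bool

    def is_easiest_scenario(self) -> bool:
        """True only for the scenario that laboratory tests usually present."""
        return all(getattr(self, f.name) for f in fields(self))

    def harder_dimensions(self) -> List[str]:
        """Names of the categories where this application is on the harder side."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]


lab_test = ApplicationProfile(True, True, True, True, True, True, True)

# Hypothetical border-crossing deployment (values assumed for illustration).
border_check = ApplicationProfile(
    overt=True, cooperative=False, habituated=False, supervised=True,
    standard_environment=True, closed=False, private=False,
)

print(lab_test.is_easiest_scenario())
print(border_check.harder_dimensions())
```

Each entry returned by `harder_dimensions` marks a condition that vendor figures are unlikely to reflect and that pilot testing would need to cover.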
SALIENT CHARACTERISTICS
OF MAINSTREAM BIOMETRICS
Table A.1 compares the
eight mainstream biometrics in terms of a
number of characteristics, including how
robust and distinctive they are, how intrusive they are,
and what applications they can be used for
(i.e., identification or verification, or
verification alone).9 This
table is an attempt to assist the reader in
categorizing biometrics along important
dimensions. Because this industry is still
working to establish comprehensive standards
and the technology is changing rapidly,
however, it is difficult to make assessments
with which everyone would agree. The table
represents an assessment based on
discussions with technologists, vendors, and
program managers.
| Biometric | Identify versus Verify | Robust | Distinctive | Intrusive |
| Fingerprint | Either | High to Moderate 10 | High | Touching |
| Hand/Finger Geometry | Verify | Moderate | Low | Touching |
| Facial Recognition | Either | Moderate | Moderate | 12+ inches |
| Voice Recognition | Verify | Low | Moderate | Remote |
| Iris Scan | Either | High | High | 12+ inches |
| Retinal Scan | Either | High | High | 1-2 inches |
| Dynamic Signature Verification | Verify | Low | Low | Touching |
| Keystroke Dynamics | Verify | Low | Low | Touching |
Table A.1 Comparison of
Mainstream Biometrics
Half the systems in Table
A.1 can be used for either identification or
verification, while the rest can be used
only for verification. In particular, hand
geometry has been used only for verification
applications, such as physical access
control and time and attendance
verification. In addition, voice
recognition, because of the need for
enrollment and matching using a pass-phrase,
is typically used for verification only.
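For readers who want to query the comparison directly, Table A.1 can be transcribed into a small lookup structure. The values below are copied from the table; the variable and function names are invented for the example.

```python
from typing import Dict, List, Tuple

# Columns: (identify versus verify, robust, distinctive, intrusive)
TABLE_A1: Dict[str, Tuple[str, str, str, str]] = {
    "Fingerprint": ("Either", "High to Moderate", "High", "Touching"),
    "Hand/Finger Geometry": ("Verify", "Moderate", "Low", "Touching"),
    "Facial Recognition": ("Either", "Moderate", "Moderate", "12+ inches"),
    "Voice Recognition": ("Verify", "Low", "Moderate", "Remote"),
    "Iris Scan": ("Either", "High", "High", "12+ inches"),
    "Retinal Scan": ("Either", "High", "High", "1-2 inches"),
    "Dynamic Signature Verification": ("Verify", "Low", "Low", "Touching"),
    "Keystroke Dynamics": ("Verify", "Low", "Low", "Touching"),
}


def usable_for_identification(table: Dict[str, Tuple[str, str, str, str]]) -> List[str]:
    """Biometrics the table lists as suitable for either identification or verification."""
    return [name for name, row in table.items() if row[0] == "Either"]


either = usable_for_identification(TABLE_A1)
print(f"{len(either)} of {len(TABLE_A1)}: {', '.join(either)}")
```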
Robustness and
distinctiveness vary considerably.
Fingerprinting is moderately robust, and,
although it is distinctive, a small
percentage of the population has unusable
prints, usually because of age, genetics,
injury, occupation, exposure to chemicals,
or other occupational hazards. Hand/finger
geometry is moderate on the distinctiveness
scale, but it is not very robust, while
facial recognition is neither highly robust
nor distinctive. As for voice recognition,
assuming the voice and not the pronunciation
is what is being measured, this biometric is
moderately robust and distinctive. Iris
scans are both highly robust (because they
are not highly susceptible to day-to-day
changes or damage) and distinctive (because
they are randomly formed). Retinal scans are
fairly robust and very distinctive. Finally,
neither dynamic signature verification nor
keystroke dynamics are particularly robust
or distinctive.
As the table shows, the
biometrics vary in terms of how intrusive
they are, ranging from those biometrics that
require touching to others that can
recognize an individual from a distance.
APPENDIX A: FACE
RECOGNITION PERFORMANCE
There are three types of
evaluations in the biometrics community:
technology, scenario, and operations. The
Facial Recognition Vendor Test 2000 (FRVT
2000) marked the state-of-the-art of face
biometrics and the research issues that
continue to challenge the face recognition
community through both technology and
scenario evaluations. (The results of the
follow-on study, FRVT 2002, will be released
sometime in 2003.) A significant part of the
story that FRVT 2000 tells is not just in
the results but in the number of
participants, types of participants,
approach and evaluation protocol, parameters
tested, and how the results are reported.11
The Facial Recognition
Vendor Test 2000 consisted of two
components: the Technology Evaluation
(referred to as the "Recognition
Performance Test" in the FRVT 2000
Report) and the Physical Access Scenario
Test (referred to as the "Product
Usability Test" in the FRVT 2000
Report). The FRVT 2000 Technology Evaluation
is an assessment of commercially available
facial recognition systems. The FRVT 2000
Physical Access Scenario Test is an example
of a limited scenario evaluation, but not
all products tested were designed for
physical access applications. Hence, the
performance results of different systems are
difficult to compare to one another or
analyze.
By far, the most important
results of the FRVT 2000 Report are drawn
from the Technology Evaluation and are
reported here. The vendors that volunteered
to participate in the evaluation were Banque-Tec,
C-VIS, Lau Technologies, Miros Inc. (eTrue),
and Visionics Corp.
TECHNOLOGY EVALUATION
METHODOLOGY
For the Technology
Evaluation (referred to as the
"Recognition Performance Test" in
the FRVT 2000 Report) portion of this study,
the vendors were asked to compare 13,872
images from a sequestered database (called
FERET; see "FERET Images" below). Since each
image is
compared to the entire set, this amounts to
more than 192 million comparisons. The
vendors were given 72 hours to make these
comparisons. C-VIS, Lau Technologies, and
Visionics Corp. successfully completed the
comparison task. Banque-Tec completed
approximately 9,000 images, and Miros Inc. (eTrue)
completed approximately 4,000 images in the
time allowed. Since Banque-Tec and Miros
Inc. (eTrue) were unable to complete all of
the comparisons, their results were not
included in the Technology Evaluation.
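The comparison count quoted above follows directly from the image total. A minimal sketch of the arithmetic is given here; the image count and time limit come from the text, while the average rate assumes an even pace over the 72 hours and is an illustration, not a figure from the FRVT 2000 Report. The function names are hypothetical.

```python
# Workload arithmetic for the FRVT 2000 Technology Evaluation.
# NUM_IMAGES and TIME_LIMIT_HOURS are taken from the text; the
# per-second rate is derived here for illustration only.

NUM_IMAGES = 13_872
TIME_LIMIT_HOURS = 72


def total_comparisons(num_images: int) -> int:
    """Every image is compared with every image in the set."""
    return num_images * num_images


def required_rate(num_images: int, hours: int) -> float:
    """Average comparisons per second needed to finish in time."""
    return total_comparisons(num_images) / (hours * 3600)


if __name__ == "__main__":
    print(f"{total_comparisons(NUM_IMAGES):,} comparisons")
    rate = required_rate(NUM_IMAGES, TIME_LIMIT_HOURS)
    print(f"about {rate:.0f} comparisons per second on average")
```

Under these assumptions, a system had to sustain several hundred comparisons per second to finish, which gives a sense of why two vendors did not complete the task.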
In face recognition, a
gallery is a set of known individuals
against which an algorithm attempts to
perform recognition. A probe set is a set of
images of unknown individuals that an
algorithm attempts to recognize. For the
Technology Evaluation, the complete set of
13,872 images and the corresponding matrix
of 13,872 x 13,872 similarity scores were
divided into several subsets and used as
probe and gallery images for various
experiments.
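The use of one similarity matrix for many experiments can be sketched as follows. This is a hypothetical illustration, not the FRVT 2000 scoring code: the function name, the toy scores, and the assumption that a higher score means a closer match are all made up for the example. It selects a gallery subset and a probe subset by index and reports the fraction of probes whose best-scoring gallery image shows the same person.

```python
# Hypothetical sketch: score one experiment from a full similarity matrix.
# All names and the toy data are illustrative, not taken from FRVT 2000.

from typing import List, Sequence


def rank_one_rate(
    similarity: Sequence[Sequence[float]],
    identity: Sequence[str],
    gallery: List[int],
    probes: List[int],
) -> float:
    """Fraction of probes whose top gallery match is the same person.

    similarity[i][j] is the score for image i against image j
    (assumed: higher means more alike); identity[i] names the
    person shown in image i.
    """
    if not gallery or not probes:
        raise ValueError("gallery and probe sets must be non-empty")
    hits = 0
    for p in probes:
        # Best-matching gallery image for this probe.
        best = max(gallery, key=lambda g: similarity[p][g])
        if identity[best] == identity[p]:
            hits += 1
    return hits / len(probes)


if __name__ == "__main__":
    # Four images of two people: indices 0 and 1 form the gallery,
    # indices 2 and 3 are the probes.
    identity = ["ann", "bob", "ann", "bob"]
    similarity = [
        [1.0, 0.2, 0.8, 0.3],
        [0.2, 1.0, 0.1, 0.4],
        [0.8, 0.1, 1.0, 0.2],
        [0.3, 0.4, 0.2, 1.0],
    ]
    print(rank_one_rate(similarity, identity, gallery=[0, 1], probes=[2, 3]))
```

Because the full matrix is computed once, a different experiment only requires a different choice of gallery and probe indices.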
FERET Images
Research in face
recognition advanced greatly, and became
more firmly established as a scientific
field, in the 1990s through the work of P.
Jonathon Phillips with the U.S. Army and the
Department of Defense Counterdrug Technology
Development Program. Part of this work was
the collection of the first systematically
gathered database of face images, the Face
Recognition Technology (FERET) database,
which was accompanied by an evaluation and
its methodology. (The creation of all three
was a gigantic leap for face recognition,
and indeed for any single biometric.)
The FERET database was
collected between 1993 and 1996. For each
individual, two frontal views were taken
(fa and fb); a different facial expression
was requested for the second frontal image.
For 200 sets of images, a third frontal
image was taken with a different camera and
different lighting (this is referred to as
the fc image). The remaining images were
collected at various aspects between right
and left profile. To add simple variations
to the database, photographers sometimes
took a second set of images, for which the
subjects were asked to put on their glasses
and/or pull their hair back. Sometimes a
second set of images of a person was taken
on a later date; such a set of images is
referred to as a duplicate set. Such
duplicate sets result in variations in
scale, pose, expression, and illumination of
the face.
The database has 14,126
images of 1,199 individuals. For some people,
over two years elapsed between their first
and last photos. The Government sequestered
1,061 sets out of 1,564 sets of images to
enable independent evaluations like the FRVT
2000.
EXPERIMENTS
Expression Experiments
The expression experiments
were designed to evaluate the performance of
face matching algorithms when comparing
images of the same person with different
facial expressions, an obvious issue in
real-world applications.
The results of this
experiment provide an upper bound on the
performance of an algorithm for gallery and
probe images taken within five minutes of
each other. Recognition algorithms turn out
to be robust to expression changes; they
have far bigger problems elsewhere, such as
recognizing someone at a later time (e.g., a
month later).
Temporal Experiments
The temporal experiments
address the effect of time delay between a
gallery image and subsequent captures of
facial images. Solving this problem is very
important to the success of real-world
applications of this technology. In testing,
the difficulty with temporal experiments is
obtaining large data sets, since volunteers
must return to be photographed over many
years. The FRVT 2000 experiments rely on
imagery gathered during a period of less
than two years.
Tests on images with
a delay of 0 to 1031 days and tests on
images with a delay of 540 to 1031 days
produced similar rank-one identification
results, a significant improvement over the
older FERET evaluations, where the
difference between these two tests was seven
percentage points.
Other temporal experiments
varied galleries by lighting and compared
them to probe images recorded between 11 and
13 months prior to the gallery. The lighting
types from easiest to most difficult for
recognition algorithms were mugshot
lighting, FERET-style lighting (similar to
studio lighting), and overhead lighting.
These results also show that identifying
faces from images taken more than a year
apart remains a problem.
Pose Experiments
In an application of face
recognition where subjects may be unaware of
surveillance being used, they will
undoubtedly not be looking straight into one
of the cameras in a given area. A subject's
head may vary by the declination angle or
head tilt, but more commonly it varies by
the azimuthal head angle, referred to here
as the "pose."
The pose experiments show
that performance is stable when the angle
between a frontal gallery image and a probe
is less than 25 degrees and that performance
dramatically falls off when the angle is
greater than 40 degrees.
Compression Experiments
The compression
experiments were designed to estimate the
effect of lossy image compression on the
performance of face-matching algorithms.
Although image compression is widely used to
satisfy space and bandwidth constraints, it
is often assumed to degrade imagery for
computer vision applications and is
therefore usually avoided. The gallery
images in this experiment were obtained
under favorable, uncompressed circumstances,
but the probe sets were compressed by
different ratios.
These experiments show
that compression of facial images does not
necessarily adversely affect performance.
Performance actually increases slightly for
10:1 and 20:1 compression ratios versus
uncompressed probe images. It is not until a
compression ratio of 40:1 that the
performance rate drops below that of the
uncompressed probes. These results were
aggregated and consider only JPEG
compression.
Media Experiments
To evaluate the effect of
storage media of images on performance, the
FRVT 2000 tested digital CCD and 35 mm film
images. This issue is raised when comparing
mugshot pictures, captured on 35 mm film, to
video camera images. Using the different
media did not significantly affect the
performance of Lau Technologies or Visionics
Corp.'s algorithms.
Distance Experiments
These experiments were
designed to evaluate the effect of
increasing distance on performance of face
matching algorithms. The probe images in
these experiments were taken from relatively
low-resolution, lightly compressed video
sequences of subjects walking toward a
camera.
Increases in distance are
akin to lowering resolution, in that the
concern is the decreasing number of pixels
that are used to display a person's face.
Across all algorithms
and all three sets of distance experiments,
performance decreased as the distance
between the person and the camera increased.
The three sets were indoor digital gallery
images v. indoor video probes 2, 3 and 5
meters from the camera; indoor video gallery
images v. indoor video probes 3 and 5 meters
from the camera; and outdoor video gallery
images v. outdoor video probes 3 and 5
meters from the camera. (The performance
effects due to lighting differences between
the last two experiments are covered in the
next section.)
Illumination Experiments
While many researchers
have devoted efforts to correcting for (or
normalizing) lighting differences in images,
overcoming illumination variation remains a
difficult issue in face recognition. Gallery
images (or enrollment images for a real
system) need to be captured under the same
lighting conditions as the expected future
images for successful subsequent matching.
This problem is particularly acute when
comparing indoor and outdoor images.
These experiments compare
a gallery of subjects taken indoors under
mugshot lighting to different probe sets of
images taken shortly before or after their
gallery matches using different lighting
arrangements. All images were high-quality
digital pictures, and subjects had normal
facial expressions.
The least difficult test
for the algorithms involved comparing the
indoor mugshot gallery to an indoor probe
set with overhead lighting. The most
difficult was comparing the gallery to an
outdoor probe set.
Resolution Experiments
Image resolution is
another important factor in face recognition
performance. As resolution is decreased,
performance may degrade, and there is a
lower bound below which the face is no
longer recognizable. Here, resolution was
defined
as the number of pixels between the eyes
from eye-center to eye-center, or
inter-pupil distance.
The gallery consisted of
high-resolution (full-resolution, digital
CCD) indoor images taken at a fixed distance
from the camera under standard mugshot flood
lighting, and was compared to various probe
sets. The gallery inter-pupil distance
averaged 138.7 pixels across all subjects
and ranged from 88 to 163 pixels. The probe
sets were derived from these gallery images,
using a standard reduction algorithm that
reduces the inter-pupil distance to the
desired value while preserving the aspect
ratio. The smallest of the new images have
inter-pupil distances as low as 15 pixels.
The new images in each probe set are
normalized across face size (i.e., subjects
with large faces are reduced by a greater
factor).
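The normalization across face size can be sketched as a per-image scale factor: the target inter-pupil distance divided by the measured one. The sketch below is an assumption-laden illustration, not the reduction algorithm used in FRVT 2000 (which the text does not specify); the function names are hypothetical, and only the pixel figures come from the text.

```python
# Illustrative only: per-image scale factors that bring every face to the
# same inter-pupil distance. Not the reduction algorithm used in FRVT 2000.

from typing import Tuple


def scale_factor(inter_pupil_px: float, target_px: float) -> float:
    """Uniform scale for both axes, so the aspect ratio is preserved."""
    if inter_pupil_px <= 0 or target_px <= 0:
        raise ValueError("distances must be positive")
    return target_px / inter_pupil_px


def scaled_size(width: int, height: int, factor: float) -> Tuple[int, int]:
    """New pixel dimensions after applying the same factor to each axis."""
    return (round(width * factor), round(height * factor))


if __name__ == "__main__":
    # Gallery minimum, mean, and maximum quoted in the text,
    # each reduced to a 15-pixel inter-pupil distance.
    for measured in (88, 138.7, 163):
        factor = scale_factor(measured, 15)
        print(f"{measured} px -> 15 px: scale {factor:.3f}")
```

A face measured at 163 pixels gets a smaller scale factor than one measured at 88 pixels, which is the sense in which larger faces are reduced by a greater factor.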
Overall, and contrary to
conventional wisdom, these experiments
showed that performance improved at lower
resolution, as long as the resolution stayed
above a lower bound. The results for an
inter-pupil distance of 45 pixels exceeded
those for an inter-pupil distance of 60
pixels. All
systems had their worst performance with the
probe set with an inter-pupil distance of 15
pixels.
OVERALL CONCLUSIONS FOR
THE TECHNOLOGY EVALUATION
Where previous evaluations
identified temporal and pose variations as
two key areas for future research in face
recognition, the FRVT 2000 showed that
progress had been made on the
former, but developing algorithms that can
handle a year or more between image
captures remains a pressing
research area. In addition, developing
algorithms that can compensate for pose,
illumination, and distance changes was
noted as another area of needed research.
Differences in expression and media storage
do not appear to be issues for commercial
algorithms.
The FRVT 2000 experiments
on compression confirm previous findings
that moderate levels of compression do not
adversely affect performance. Resolution
experiments find that moderately decreasing
the resolution can slightly improve
performance, which is good news since many
video surveillance cameras, especially older
ones, do not acquire high-quality images. In
most cases, compression and resolution
reduction act as lowpass filters, and these
results suggest that such filtering can
increase performance.
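To illustrate why resolution reduction behaves like a lowpass filter, here is a one-dimensional sketch. It assumes block averaging as the downsampling method, which is only one simple choice and is not claimed to be what FRVT 2000 used; the function name and the toy signal are hypothetical.

```python
# Illustrative sketch only: block averaging is assumed as the downsampling
# method; the FRVT 2000 reduction algorithm is not reproduced here.

from typing import List, Sequence


def box_downsample(signal: Sequence[float], factor: int) -> List[float]:
    """Replace each run of `factor` samples with its mean.

    Averaging neighbours cancels rapid sample-to-sample variation
    (high frequencies) while keeping slow trends (low frequencies).
    Trailing samples that do not fill a whole run are dropped.
    """
    if factor < 1:
        raise ValueError("factor must be at least 1")
    usable = len(signal) - len(signal) % factor
    return [
        sum(signal[i:i + factor]) / factor
        for i in range(0, usable, factor)
    ]


if __name__ == "__main__":
    # A slow ramp with alternating +1/-1 "noise" added to each sample.
    ramp = [i / 10 for i in range(8)]
    noisy = [r + (1 if i % 2 == 0 else -1) for i, r in enumerate(ramp)]
    # The alternating component averages out; the ramp survives.
    print(box_downsample(noisy, 2))
```

In the toy signal, the fine alternating detail disappears after downsampling while the slow ramp remains, which is the kind of filtering the paragraph above refers to.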
REFERENCES AND END NOTES
P. J. Phillips, A. Martin,
C. L. Wilson, and M. Przybocki, "An
Introduction to Evaluating Biometric
Systems," IEEE Computer, February 2000,
pp. 56-63.
P. J. Phillips, H. Moon,
S. Rizvi, and P. Rauss, "The FERET
Evaluation Methodology for Face-Recognition
Algorithms," IEEE Trans. PAMI, Vol. 22,
No. 10, 2000.
T. Mansfield and J. L.
Wayman, Best Practices in Testing and
Reporting Performance of Biometric Devices,
Version 1.0, 12 January 2000.
http://www.afb.org.uk/bwg/bestprac10.pdf
1 Security also depends on other factors,
such as the care taken to safeguard tokens
and passwords and to ensure that
transmissions of biometric data are
adequately protected.
2 This primer does not cover standards for
interoperability or so-called "plug and
play" applications because this subject
is tangential to the project. This appendix
relies heavily on the following sources:
Hawkes and Hefferman (1999) and Wayman
(1999c, 19TK). See also Jain, Bolle, and
Pankanti (1998).
3 See, e.g., Appendix B,
Program Reports, Fort Sill Biometrically
Protected Smart Card.
4 Image files of
fingerprints may be of interest to an
organization (such as the FBI or a bank)
because of their law enforcement or security
applications. In the case of fingerprints,
the military may want to keep both
electronic image files of the fingerprint as
well as the biometric templates. The image
files are too large to be used for biometric
applications but would be useful for
forensic purposes. Moreover, an organization
might want to store image files to give it
greater technical flexibility. For example,
if the FBI did not keep image files of
enrollees, it might have to physically
reenroll each individual if the FBI decided
to change to a different proprietary
biometric system. Image files are also known
as raw data or the corpus.
5 For a detailed
discussion of smart cards, see Ratha and
Bolle (1999).
6 Electrical capacitance
has proved to be the best and least
reproducible method for effectively
identifying a live person.
7 For a detailed
discussion of these mainstream biometrics,
see Jain, Bolle, and Pankanti (1999).
8 James Wayman used
"covert" instead of
"clandestine."
9 The authors compiled
Table A.1 from various sources at the SJB
Biometrics 99 Workshop, November 9-11, 1999,
including Hawkes and Hefferman (1999). See
also Jain, Bolle, and Pankanti (1998).
10 This is a function of
the population using the system.
11 Although not covered
here, vendors, city governments, and
airports have conducted scenario evaluations
of face recognition systems to determine
their efficacy in specific locations when
used by the general population or the
airport employees. Overall, these pilots
have shown very poor performance.