
BIOMETRICS: A TECHNICAL PRIMER
Author: Elaine M. Newton, with John D. Woodward

Adapted from John D. Woodward, Katherine W. Webb, Elaine M. Newton et al., Appendix A, "Biometrics: A Technical Primer," in "Army Biometric Applications: Identifying and Addressing Sociocultural Concerns," RAND/MR-1237-A, Santa Monica, CA: RAND, 2001. Copyright RAND 2001. Acknowledgements to Andy Oram and Jean Camp.

INTRODUCTION 

In general, there are three approaches to authenticating an individual's identity. In order from most secure and convenient to least secure and convenient, they are as follows:

·        Something you are - a biometric.

·        Something you know - PIN, password.

·        Something you have - key, token, card.

Any combination of these approaches can potentially further heighten security.1  

Facial recognition software, fingerprint readers, hand geometry readers, and other forms of biometrics appear increasingly in systems with mission-critical security. Given the widespread consensus in the security community that passwords and magnetic-stripe cards accompanied by PINs have weaknesses, biometrics could well be ensconced in future security systems.

This document begins with a definition of biometrics and related terms. It then describes the steps in the biometric authentication process and reviews issues of template management and storage, and it concludes with a brief review of mainstream biometric applications.2

OVERVIEW

A biometric is any measurable, robust, distinctive physical characteristic or personal trait that can be used to identify, or verify the claimed identity of, an individual. Biometric authentication, in the context of this report, refers to automated methods of identifying, or verifying the identity of, a living person.

The terms measurable, robust, distinctive, and living person in these definitions require explanation.

Measurable means that the characteristic or trait can be easily presented to a sensor and converted into a quantifiable, digital format. This allows for the automated matching process to occur in a matter of seconds.

The robustness of a biometric is a measure of the extent to which the characteristic or trait is subject to significant changes over time. These changes can occur as a result of age, injury, illness, occupational use, or chemical exposure. A highly robust biometric does not change significantly over time. A less robust biometric does. For example, the iris, which changes very little over a person's lifetime, is more robust than a voice.

Distinctiveness is a measure of the variations or differences in the biometric pattern among the general population. The higher the degree of distinctiveness, the more unique the identifier. The highest degree of distinctiveness implies a unique identifier. A low degree of distinctiveness indicates a biometric pattern found frequently in the general population. The iris and the retina have higher degrees of distinctiveness than hand or finger geometry.

The application helps determine the degree of robustness and distinctiveness required. The system's ability to match a sample to a template is sometimes referred to as the biometric's reliability.

Systems can be used either to identify people, with or without their consent (as when faces are scanned in public places), or to verify the claimed identity of a person who presents a biometric sample in order to gain access or authorization for an activity. The following section expands on this issue.

The presence of a living person distinguishes biometric authentication from forensics, which does not involve real-time identification of a living individual.

IDENTIFICATION VERSUS VERIFICATION

Identification and verification differ significantly. With identification, the biometric system asks and attempts to answer the question, "Who is X?" In an identification application, the biometric device reads a sample and compares that sample against every template in the database. This is called a "one-to-many" search (1:N). The device will either make a match and subsequently identify the person or it will not make a match and not be able to identify the person.

Verification is when the biometric system asks and attempts to answer the question, "Is this X?" after the user claims to be X. In a verification application, the biometric device requires input from the user, at which time the user claims his identity via a password, token, or user name (or any combination of the three). This user input points the device to a template in the database. The device also requires a biometric sample from the user. It then compares the sample against the template associated with the claimed identity. This is called a "one-to-one" search (1:1). The device will either find or fail to find a match between the two.
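
The difference between the two searches can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not any vendor's method: each template is assumed to be a short numeric feature vector, cosine similarity stands in for a proprietary matching score, and the threshold of 0.95 and the names verify and identify are hypothetical.

```python
from __future__ import annotations

import math


def similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two feature vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def verify(sample: list[float], claimed_id: str,
           db: dict[str, list[float]], threshold: float = 0.95) -> bool:
    """1:1 search: compare the sample only with the claimed identity's template."""
    template = db.get(claimed_id)
    return template is not None and similarity(sample, template) >= threshold


def identify(sample: list[float], db: dict[str, list[float]],
             threshold: float = 0.95) -> str | None:
    """1:N search: score the sample against every template and return the
    best-scoring identity, or None if no score reaches the threshold."""
    best_id, best_score = None, threshold
    for user_id, template in db.items():
        score = similarity(sample, template)
        if score >= best_score:
            best_id, best_score = user_id, score
    return best_id


# Hypothetical enrolled templates.
db = {"alice": [0.9, 0.1, 0.3], "bob": [0.1, 0.8, 0.5]}
live_sample = [0.88, 0.12, 0.31]
print(verify(live_sample, "alice", db))   # the user claims to be alice
print(identify(live_sample, db))          # no claim: who is this?
```

Note that verify reads one template no matter how large the database is, while identify must score every enrolled template.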

Identification applications require a highly robust and distinctive biometric; otherwise, false matches and false nonmatches between users' samples and the templates in the database cause security problems and inhibit convenience. Identification applications are common where the end-user wants to identify criminals (immigration, law enforcement, etc.) or other "wolves in sheep's clothing." Other types of applications may use a verification process.3 In many ways, deciding whether to use identification or verification requires a trade-off: the end-user's needs for security versus convenience.
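
One reason identification demands a more distinctive biometric is that every comparison in a 1:N search is another chance for a false match. Assuming, for illustration, that each comparison against someone else's template independently produces a false match with probability p, the chance of at least one false match in a search of N templates is 1 - (1 - p)^N. The function name and the rate of 0.0001 below are hypothetical.

```python
def prob_any_false_match(fmr: float, n_templates: int) -> float:
    """Probability that a 1:N search produces at least one false match, assuming
    each of n_templates comparisons independently falsely matches with rate fmr."""
    return 1.0 - (1.0 - fmr) ** n_templates


# Hypothetical per-comparison false match rate of 0.01%.
for n in (1, 1_000, 10_000, 100_000):
    print(f"N = {n:>7,}: P(at least one false match) = {prob_any_false_match(1e-4, n):.4f}")
```

Under these assumptions, a per-comparison rate of 0.01% grows to about a 63% chance of at least one false match when the database holds 10,000 templates, whereas a 1:1 verification faces only the per-comparison rate.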

In sum, biometric authentication is used in two ways: to prove who you are (or that you are who you claim to be) and to prove who you are not (e.g., to resolve a case of mistaken identity).

THREE BASIC ELEMENTS OF ALL BIOMETRIC SYSTEMS

All biometric systems consist of three basic elements:

1. Enrollment, or the process of collecting biometric samples from an individual, known as the enrollee, and the subsequent generation of his template.

2. Templates, or the data representing the enrollee's biometric.

3. Matching, or the process of comparing a live biometric sample against one or many templates in the system's database.

Performance refers to the ability of a biometric system to correctly match or identify individuals.

Enrollment

Enrollment is the crucial first stage for biometric authentication because it generates a template that will be used for all subsequent matching. Typically, the device takes three samples of the same biometric and averages them to produce an enrollment template. Enrollment is complicated by the fact that users' familiarity with a biometric device usually improves performance, because they learn how to place themselves in front of or onto a sensor, yet enrollment is usually the first time a user is exposed to the device.

Environmental conditions also affect enrollment. Enrollment should take place under conditions similar to those expected during the routine matching process. For example, if voice verification is used in an environment where there is background noise, the enrolling system should capture voice templates in the same environment.

In addition to user and environmental issues, biometrics themselves change over time. Many biometric systems account for these changes by continuously averaging. Templates are averaged and updated each time the user attempts authentication.
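
The averaging described above can be sketched as follows. The sketch assumes templates are fixed-length numeric feature vectors (real systems use proprietary features), takes a simple element-wise mean of the enrollment samples, and models continuous updating as an exponential moving average. It also assumes the update is applied only after a successful match, so that rejected samples cannot drift the template; the weight of 0.1 is an arbitrary illustrative value.

```python
from __future__ import annotations


def enroll(samples: list[list[float]]) -> list[float]:
    """Average several feature vectors (e.g., three captures) into one template."""
    if not samples:
        raise ValueError("enrollment needs at least one sample")
    return [sum(values) / len(samples) for values in zip(*samples)]


def update_template(template: list[float], accepted_sample: list[float],
                    weight: float = 0.1) -> list[float]:
    """Blend a sample that has just matched into the stored template
    (exponential moving average). A small weight lets the template follow
    slow changes in the biometric without being dominated by one capture."""
    return [(1 - weight) * t + weight * s for t, s in zip(template, accepted_sample)]


# Three enrollment captures of the same (hypothetical) three-feature biometric.
template = enroll([[1.0, 2.0, 3.0], [1.2, 1.8, 3.1], [0.8, 2.2, 2.9]])
# Later, a successful match nudges the template toward the newer sample.
template = update_template(template, [1.5, 2.0, 3.0])
print(template)
```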

Templates

The biometric device stores the data captured when enrolling a person as a template. The device uses a proprietary algorithm to extract features appropriate to that biometric from the enrollee's samples. Templates are only a record of distinguishing features, sometimes called minutiae points, of a person's biometric characteristic or trait. For example, templates are not an image or record of the actual fingerprint or voice.4 In basic terms, templates are numerical representations of key points taken from a person's body. They can be thought of as very long passwords that can identify a body part or behavior.

The template usually occupies a small amount of computer memory (and is smaller than the original image) and thus allows for quick processing, a key feature of making biometric authentication practical.
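
To make the size claim concrete, the sketch below packs fingerprint minutiae into a compact binary template using Python's standard struct module. The record layout (a 16-bit count, then per minutia two 16-bit coordinates, an angle quantized to 256 steps, and a one-byte type) is hypothetical and does not correspond to any particular vendor or standard format.

```python
from __future__ import annotations

import struct

# Hypothetical layout, not a standard: a 16-bit minutia count, then per minutia
# x and y in pixels (uint16), ridge angle quantized to 256 steps (uint8), and
# type (uint8: 0 = ridge ending, 1 = bifurcation). "<" = little-endian, no padding.
HEADER = struct.Struct("<H")
MINUTIA = struct.Struct("<HHBB")


def pack_template(minutiae: list[tuple[int, int, float, int]]) -> bytes:
    """Serialize (x, y, angle_degrees, kind) minutiae into a compact byte string."""
    out = bytearray(HEADER.pack(len(minutiae)))
    for x, y, angle_deg, kind in minutiae:
        angle_q = round((angle_deg % 360) * 256 / 360) % 256  # lossy quantization
        out += MINUTIA.pack(x, y, angle_q, kind)
    return bytes(out)


def unpack_template(data: bytes) -> list[tuple[int, int, float, int]]:
    """Recover the (approximate) minutiae from a packed template."""
    (count,) = HEADER.unpack_from(data, 0)
    minutiae = []
    for i in range(count):
        x, y, angle_q, kind = MINUTIA.unpack_from(data, HEADER.size + i * MINUTIA.size)
        minutiae.append((x, y, angle_q * 360 / 256, kind))
    return minutiae


# 40 made-up minutiae.
sample = [(10 * i, 7 * i, 9.0 * i, i % 2) for i in range(40)]
template = pack_template(sample)
print(len(template), "bytes")
```

Under this layout, 40 minutiae occupy 2 + 40 × 6 = 242 bytes, within the 50 to 1,000 byte range cited for fingerprint templates, whereas a 500 × 500 pixel, 8-bit grayscale image would take 250,000 bytes. The quantized angle also shows that a template keeps only selected, approximated measurements rather than the image itself.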

The template must be stored somewhere so that subsequent templates, created when a user tries to access the system using a sensor, can be compared against it. Some biometric experts claim it is impossible to reverse-engineer, or recreate, a person's print or image from the biometric template.

Matching

Matching is the comparison of two templates: the one produced at the time of enrollment (or at previous sessions, if there is continuous updating) and the one produced "on the spot" as a user tries to gain access by providing a biometric sample via a sensor.

There are three ways a match can fail:

·        Failure to enroll / Failure to acquire
·        False match
·        False nonmatch

Both failure to enroll (during enrollment) and failure to acquire (prior to matching) are failures to extract distinguishing features appropriate to that technology. For example, a small percentage of the population fails to enroll in fingerprint-based biometric authentication systems. There are two primary reasons for this failure: the individual's fingerprints are not distinctive enough to be picked up by the system, or the distinguishing characteristics of the individual's fingerprints have been altered because of the individual's age or occupation, e.g., as might happen with an elderly bricklayer.

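False match (FM) and false nonmatch (FNM) are frequently mislabeled "false acceptance" and "false rejection," respectively, but the latter terms are application-dependent in meaning. For example, in a system that checks whether an applicant is already enrolled (such as an entitlement program screening for duplicate claims), a false match causes a legitimate applicant to be rejected. FM and FNM are application-neutral terms that describe the matching process between a live sample and a biometric template.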

A false match occurs when a sample is incorrectly matched to a template in the database (i.e., an imposter is accepted). A false nonmatch occurs when a sample is incorrectly not matched to a truly matching template in the database (i.e., a legitimate match is denied). People deploying biometric systems calculate rates for FM and FNM and use them to make tradeoffs between security and convenience when choosing a system or tuning its parameters. For example, a heavy security emphasis errs on the side of denying legitimate matches and does not tolerate acceptance of imposters. A heavy emphasis on user convenience results in little tolerance for denying legitimate matches but tolerates some acceptance of imposters.
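
The trade-off can be seen by computing both error rates at several decision thresholds. The similarity scores below are invented for illustration: genuine scores come from comparing samples with the same person's template, impostor scores from comparing them with other people's templates, and a comparison counts as a match when its score meets the threshold. The function name error_rates is hypothetical.

```python
from __future__ import annotations


def error_rates(genuine: list[float], impostor: list[float],
                threshold: float) -> tuple[float, float]:
    """Return (false match rate, false nonmatch rate) at a decision threshold.
    A comparison is declared a match when its score is >= threshold."""
    fmr = sum(score >= threshold for score in impostor) / len(impostor)
    fnmr = sum(score < threshold for score in genuine) / len(genuine)
    return fmr, fnmr


# Invented similarity scores, for illustration only.
genuine = [0.91, 0.85, 0.78, 0.95, 0.66, 0.88, 0.93, 0.72]   # same person
impostor = [0.12, 0.35, 0.41, 0.58, 0.27, 0.70, 0.19, 0.44]  # different people

for t in (0.5, 0.7, 0.9):
    fmr, fnmr = error_rates(genuine, impostor, t)
    print(f"threshold {t}: FMR = {fmr:.3f}, FNMR = {fnmr:.3f}")
```

Raising the threshold from 0.5 to 0.9 drives the false match rate from 0.25 to 0 but raises the false nonmatch rate from 0 to 0.625, which is the security-versus-convenience trade-off described above.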

TEMPLATE MANAGEMENT-STORAGE AND SECURITY

Template management is critically linked to privacy, security, and convenience. All biometric authentication systems face a common issue: biometric templates must be stored somewhere. Templates must be protected to prevent identity fraud and to protect the privacy of users. Privacy is affected when additional information is stored about each user along with the biometric template.

Possible locations for template storage include:

·        The biometric device itself
·        A central computer that is remotely accessed
·        A plastic card or token, via a bar code or magnetic stripe
·        Radio frequency identification (RFID) cards and tags
·        Optical memory cards
·        Personal Computer Memory Card International Association (PCMCIA) cards
·        Smart cards

In general, transmitting biometric data over communications lines reduces system security because the data become vulnerable to the same interception or tampering possible when any data is sent "over the wire." On the other hand, a network or central repository may be needed for some applications where there are multiple access points, or when there is a need to confirm information with another node or higher authority. Biometrics are more secure when stored under the control of the authorized user, such as on a smart card, and used in verification applications. Cards have varying degrees of utility and storage memory.

Smart cards are the size of credit cards and have an embedded microchip or microprocessor chip. The chip stores electronic data that can be protected using biometrics. There are two types of smart cards: contact and contactless smart cards. A contact smart card must be inserted into a smart card reader to be used. A contactless smart card only has to be placed near an antenna to carry out a transaction.5

Security for template database storage is also affected by the number of uses to which the database is put: will it have a unique use or will it be used for multiple security purposes?

For example, a facilities manager might use a fingerprint reader for physical access control to the building. The manager might also want to use the same fingerprint template database for his employees to access their computer network. Should the manager use separate databases for these different uses, or is he willing to risk accessing employee fingerprints from a remote location for multiple purposes?

Additional security features can be incorporated into biometric systems to detect a "wolf," or unauthorized user. For example, a "liveness test" tries to determine whether the biometric sample is being read from a live person rather than from a fake body part or the body part of a dead person. Liveness tests can be performed in many ways; the device can check for such things as heat, heartbeat, or electrical capacitance.6

Other security features include encryption of biometric data and the use of sequence numbers in template transmission. A template with such a number out of sequence suggests unauthorized use.
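
A minimal sketch of sequence-number checking follows. It is an illustration under stated assumptions, not a description of any deployed protocol: the receiver accepts template messages only in strict order, and, as an added assumption beyond the text, each message carries an HMAC-SHA256 tag (Python's standard hmac module) over the sequence number and payload so that an interceptor cannot simply rewrite the number. Key distribution and encryption of the payload are out of scope, and the names make_tag and SequencedReceiver are hypothetical.

```python
import hashlib
import hmac


def make_tag(key: bytes, seq: int, payload: bytes) -> bytes:
    """HMAC-SHA256 over the sequence number and payload (sender side)."""
    return hmac.new(key, seq.to_bytes(8, "big") + payload, hashlib.sha256).digest()


class SequencedReceiver:
    """Hypothetical receiver that accepts template messages only in strict order."""

    def __init__(self, key: bytes) -> None:
        self.key = key
        self.last_seq = 0  # sequence numbers start at 1

    def accept(self, seq: int, payload: bytes, tag: bytes) -> bool:
        if not hmac.compare_digest(tag, make_tag(self.key, seq, payload)):
            return False  # altered payload or forged sequence number
        if seq != self.last_seq + 1:
            return False  # replayed or skipped number: possible unauthorized use
        self.last_seq = seq
        return True


key = b"hypothetical-shared-key"
rx = SequencedReceiver(key)
msg1 = (1, b"template-bytes", make_tag(key, 1, b"template-bytes"))
print(rx.accept(*msg1))  # first message, in sequence
print(rx.accept(*msg1))  # the same message replayed
```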

In general, verification applications provide more security than identification applications because a biometric and at least one other piece of input (e.g., PIN, password, token, user name) are required to match a template and the corresponding record. In essence, it is a second layer of security.

Verification provides a user with more control over his data and over the process when the template is stored only on a card. Such a system would not allow for clandestine, or involuntary, capture of biometric data, because individuals would know each time when, where, and to what system they were submitting their biometric. Verification applications with storage (and possibly matching, too) of a biometric template on a card are potentially more palatable to the public (for privacy, convenience, and security concerns) and more secure than identification applications or applications with a central repository, for several reasons:

1. There is no large centralized storage location of templates, which could be abused or hacked. Even distributed databases should be regarded as "honey pots" for hackers and leave open the possibility of abuse by an administrator.

2. They require the user's consent to capture data.

3. There is an added layer of security making it necessary to be in possession of a card. Also requiring a password can add yet another layer of security.

4. Because the search seeks only a match against one template in the database, verification applications require less processing time and memory.

BIOMETRIC APPLICATIONS

Most biometric applications fall into one of nine general categories. First, financial services (e.g., ATMs and kiosks) use biometrics to authenticate access to data, limiting risk. The second large class evaluates the right of individuals to move and to cross borders. These applications are most widely proposed for immigration and border control (e.g., points of entry, precleared frequent travelers, passport and visa issuance, asylum cases).

Biometrics are broadly used in cases where the physical person is authenticated. In social services, biometrics provide fraud prevention in entitlement programs. In health care, biometrics offer security measures for the privacy of medical records. Biometrics are used for physical access control in a variety of institutions (e.g., government and residential).

Biometrics are also used as narrow replacements for traditional methods of verification. Applications here include time and attendance, where biometrics replace time punch cards. Biometrics are widely proposed as solutions to problems in computer security, including personal computer access, network access, Internet use, e-commerce, and e-mail authentication.

Biometrics are proposed as an enabling underlying service in telecommunications to limit mobile phone fraud, authenticate callers into call centers, strengthen the security of phone cards, and enable televised shopping.

Finally, biometrics have been embraced by law enforcement for use in criminal investigations, national ID systems, driver's licenses, correctional institutions/prisons, and home confinement, and they have been integrated into smart gun designs.

MAINSTREAM BIOMETRICS AND THEIR APPLICATIONS

While there are many possible biometrics, at least eight mainstream biometric authentication technologies have been deployed or pilot-tested in applications in the public and private sectors.7 The first four listed are the leaders:

fingerprint
iris scan
facial recognition
hand/finger geometry
voice recognition
retinal scan
dynamic signature verification
keystroke dynamics

Fingerprint

The fingerprint biometric is an automated, digital version of the old ink-and-paper method used for more than a century for identification, primarily by law enforcement agencies. Users place their finger on a platen for the print to be read. The minutiae are then extracted by the vendor's algorithm, which also makes a fingerprint pattern analysis. Fingerprint template sizes are typically 50 to 1,000 bytes.

Fingerprint biometrics currently have three main application arenas: large-scale Automated Finger Imaging Systems (AFIS) (generally used for law enforcement), fraud prevention in entitlement programs, and physical and computer access.

Iris Scan

Iris scanning measures the iris pattern in the colored part of the eye, although the iris color has nothing to do with the biometric. Iris patterns are formed randomly. As a result, the iris patterns in your left and right eyes are different, and so are the iris patterns of identical twins. Iris scan templates are typically around 256 bytes.

Iris scanning can provide quick authentication for both identification and verification applications because of its large number of degrees of freedom. Current pilot programs and applications include ATMs ("Eye-TMs"), grocery stores (for checking out), and the Charlotte/Douglas International Airport (physical access). During the Winter Olympics in Nagano, Japan, an iris scanning identification system controlled access to the rifles used in the biathlon.
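
A 256-byte template holds 2,048 bits. One widely published way of comparing iris templates of this size, John Daugman's IrisCode approach, scores two codes by the fraction of bits that disagree (the fractional Hamming distance), ignoring bits masked out by eyelids or reflections. The sketch below shows only that comparison step; it assumes the codes have already been extracted, uses random bytes as stand-ins for real codes, and does not claim that the systems named above use this method.

```python
from __future__ import annotations

import random


def fractional_hamming(code_a: bytes, code_b: bytes, mask: bytes | None = None) -> float:
    """Fraction of usable bits that differ between two equal-length iris codes.
    mask is a combined usable-bit mask for the pair: a 1 bit means the bit is
    usable (not hidden by eyelids, lashes, or glare)."""
    if len(code_a) != len(code_b):
        raise ValueError("iris codes must have the same length")
    if mask is None:
        mask = b"\xff" * len(code_a)
    differing = sum(bin((a ^ b) & m).count("1") for a, b, m in zip(code_a, code_b, mask))
    usable = sum(bin(m).count("1") for m in mask)
    return differing / usable if usable else 1.0


rng = random.Random(0)
code_1 = bytes(rng.getrandbits(8) for _ in range(256))  # stand-in for one eye's code
code_2 = bytes(rng.getrandbits(8) for _ in range(256))  # stand-in for an unrelated eye
# Same eye, slightly noisy capture: flip one bit in every eighth byte (32 bits).
noisy = bytes(b ^ 0x01 if i % 8 == 0 else b for i, b in enumerate(code_1))
print(fractional_hamming(code_1, noisy))
print(fractional_hamming(code_1, code_2))
```

Two codes from the same eye should disagree in relatively few bits, while codes from unrelated eyes behave like independent random bit strings and disagree in about half of them; the decision threshold between those cases is a tuning parameter.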

Facial Recognition

Facial recognition records the spatial geometry of distinguishing features of the face. Different vendors use different methods of facial recognition; however, all focus on measures of key features. Facial recognition templates are typically 83 to 1,000 bytes. Facial recognition technologies can encounter performance problems stemming from a number of factors, including noncooperative user behavior and environmental variables such as lighting.

Facial recognition has been used to identify card counters in casinos, shoplifters in stores, criminals in targeted urban areas, and terrorists.

(See Appendix A for an in-depth review of face recognition performance.)

Hand/Finger Geometry

Hand or finger geometry is an automated measurement of many dimensions of the hand and fingers. Neither of these methods takes actual prints of the palm or fingers. Only the spatial geometry is examined as the user puts his hand on the sensor's surface and uses guide pegs between the fingers to properly place the hand and initiate the reading. Hand geometry templates are typically 9 bytes, and finger geometry templates are 20 to 25 bytes. Finger geometry usually measures two or three fingers. During the 1996 Summer Olympics, hand geometry secured the athletes' dormitories at Georgia Tech. Hand geometry is a well-developed technology that has been thoroughly field-tested and is easily accepted by users.

Voice Recognition

Voice or speaker recognition uses vocal characteristics to identify individuals. The user speaks a pass-phrase so that the sample provided at enrollment can be matched against the sample provided at the time of attempted access. A telephone or microphone can serve as a sensor, which makes it a relatively cheap and easily deployable technology.

Voice recognition can be affected by environmental factors, particularly background noise. Additionally, it is unclear whether the technologies actually recognize the voice or just the pronunciation of the pass-phrase (password) used. This technology has been the focus of considerable efforts on the part of the telecommunications industry and the National Security Agency (NSA), which continue to work on improving reliability.

Retinal Scan

Retinal scans measure the blood vessel patterns in the back of the eye. Retinal scan templates are typically 40 to 96 bytes. Because the retina can change with certain medical conditions, such as pregnancy, high blood pressure, and AIDS, this biometric might have the potential to reveal more information than just an individual's identity.

Because end-users perceive the technology to be somewhat intrusive, retinal scanning has not gained popularity with them. The device shines a light into the eye of a user, who must be standing very still within inches of the device.

Dynamic Signature Verification

Dynamic signature verification is an automated method of examining an individual's signature. This technology examines such dynamics as speed, direction, and pressure of writing; the time that the stylus is in and out of contact with the "paper"; the total time taken to make the signature; and where the stylus is raised from and lowered onto the "paper." Dynamic signature verification templates are typically 50 to 300 bytes.
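
The sketch below computes a few of these dynamics from a sequence of tablet samples. It assumes each sample is a (time in seconds, x, y, pressure) tuple and that zero pressure means the stylus is off the "paper"; the function and feature names are illustrative, and direction and the lift and landing positions are omitted for brevity.

```python
from __future__ import annotations

import math


def signature_features(points: list[tuple[float, float, float, float]]) -> dict[str, float]:
    """Compute simple dynamic features from (time_s, x, y, pressure) samples.
    A pressure of 0 means the stylus is lifted; intervals in which the stylus
    is raised or lowered are counted as pen-up time."""
    pen_down_time = 0.0
    path_length = 0.0
    pen_lifts = 0
    for (t0, x0, y0, p0), (t1, x1, y1, p1) in zip(points, points[1:]):
        if p0 > 0 and p1 > 0:            # stylus stayed on the surface
            pen_down_time += t1 - t0
            path_length += math.hypot(x1 - x0, y1 - y0)
        elif p0 > 0 and p1 == 0:         # stylus was raised
            pen_lifts += 1
    total_time = points[-1][0] - points[0][0]
    pressures = [p for _, _, _, p in points if p > 0]
    return {
        "total_time": total_time,
        "pen_down_time": pen_down_time,
        "pen_up_time": total_time - pen_down_time,
        "mean_speed": path_length / pen_down_time if pen_down_time else 0.0,
        "mean_pressure": sum(pressures) / len(pressures) if pressures else 0.0,
        "pen_lifts": float(pen_lifts),
    }


# A toy two-stroke "signature" sampled every 0.1 s.
strokes = [(0.0, 0, 0, 0.5), (0.1, 3, 4, 0.6), (0.2, 6, 8, 0.7),
           (0.3, 6, 8, 0.0), (0.4, 10, 8, 0.0),
           (0.5, 10, 8, 0.4), (0.6, 10, 13, 0.5)]
print(signature_features(strokes))
```

A verifier would compare such feature values against those recorded at enrollment, in the same way as the other templates described in this document.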

Keystroke Dynamics

Keystroke dynamics is an automated method of examining an individual's keystrokes on a keyboard. This technology examines such dynamics as speed and pressure, the total time taken to type a particular password, and the time a user takes between hitting certain keys. This technology's algorithms are still being developed to improve robustness and distinctiveness. One potentially useful application that may emerge is computer access, where this biometric could be used to verify the computer user's identity continuously.
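
A common way to turn keystrokes into features is to measure dwell times (how long each key is held) and flight times (the gap between releasing one key and pressing the next). The sketch assumes key events arrive as (key, press time, release time) tuples in milliseconds, builds an enrollment profile by averaging several typings of the same password, and accepts a new typing when the mean absolute timing difference is within a hypothetical 25 ms tolerance. Pressure is omitted because ordinary keyboards do not report it.

```python
from __future__ import annotations


def timing_vector(events: list[tuple[str, float, float]]) -> list[float]:
    """Dwell times (release - press for each key) followed by flight times
    (next press - previous release), in milliseconds."""
    dwell = [release - press for _, press, release in events]
    flight = [nxt[1] - cur[2] for cur, nxt in zip(events, events[1:])]
    return dwell + flight


def enroll_profile(typings: list[list[float]]) -> list[float]:
    """Average several timing vectors of the same password into a profile."""
    return [sum(values) / len(typings) for values in zip(*typings)]


def verify_typing(profile: list[float], sample: list[float],
                  tolerance_ms: float = 25.0) -> bool:
    """Accept if the mean absolute timing difference is within the tolerance."""
    if len(profile) != len(sample):
        return False  # different password length: cannot compare
    distance = sum(abs(p - s) for p, s in zip(profile, sample)) / len(profile)
    return distance <= tolerance_ms


# Two hypothetical enrollment typings of "pass": (key, press_ms, release_ms).
profile = enroll_profile([
    timing_vector([("p", 0, 90), ("a", 150, 230), ("s", 300, 380), ("s", 450, 540)]),
    timing_vector([("p", 0, 100), ("a", 160, 240), ("s", 310, 400), ("s", 460, 545)]),
])
owner = timing_vector([("p", 0, 95), ("a", 155, 235), ("s", 305, 390), ("s", 455, 540)])
stranger = timing_vector([("p", 0, 150), ("a", 400, 520), ("s", 700, 800), ("s", 1000, 1130)])
print(verify_typing(profile, owner), verify_typing(profile, stranger))
```

Because the comparison needs only timing data the keyboard already produces, it could in principle be repeated throughout a session, which is the continuous-verification use mentioned above.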

CLASSIFYING BIOMETRIC APPLICATIONS

Biometric applications may be classified in many different ways. James Wayman of the National Biometric Test Center suggests the following seven categories for classifying biometric applications, explained below.

1. overt or clandestine
2. cooperative or noncooperative
3. habituated or nonhabituated
4. supervised or nonsupervised
5. standard or nonstandard environment
6. closed or open system
7. public or private

Overt versus clandestine capture of a biometric sample refers to the user's awareness that he is participating in biometric authentication.8 Facial recognition is an example of a biometric that can be used for clandestine identification of individuals. Most uses of biometrics are overt, because users' active participation improves performance and lowers error rates. Verification applications are nearly always overt.

Cooperative versus noncooperative applications refer to the behavior that is in the best interest of the "wolf." Is it in the interest of "wolves" to match or to not match a template in the database? Which is to the "wolf's" benefit? This is important in planning a security system with biometrics because no perfect biometric system exists. Every system can be tricked into falsely not matching one's sample and template, some more easily than others. It is also possible to trick a biometric device into falsely matching your sample against a template, but it could be argued that this requires more work, because a sophisticated attacker must make a model of the biometric sample.

In systems that store user information in a database, an intruder or "wolf" may try to trick the system into divulging biometric samples or other information. One way to strengthen security in a cooperative application is to require a password or token along with a biometric, so that the "wolf" must match one specific template and is not allowed to exploit the entire database for his gain.

To gain access to a computer, a "wolf" would want to be cooperative. To evade an Immigration and Naturalization Service (INS) database of people who have repeatedly crossed the border illegally, a "wolf" (recidivist) would be noncooperative.

Habituated versus nonhabituated use of a biometric system refers to how often the users interface with the biometric device. This is significant because the user's familiarity with the device affects its performance. Depending on which type of application is chosen, the end-user may need to utilize a biometric that is highly robust. As examples, the use of fingerprints for computer or network access is a habituated use, while the use of fingerprints on a driver's license, which is updated once every several years, is a nonhabituated use. Even "habituated" applications are "nonhabituated" during their first week or so of operation or until the users adjust to using the system.

Supervised versus nonsupervised applications refer to whether supervision (e.g., a security officer) is a resource available to the end-user's security system. Do users need to be instructed on how to use the device (because the application has many new users or nonhabituated users) or be supervised to ensure they are being properly sampled (such as border crossing situations that deal with the problem of recidivists or other noncooperative applications)? Or is the application made for increased convenience, such as at an ATM? Routine use of an access system may or may not require supervision. The process of enrollment nearly always requires supervision.

Standard versus nonstandard environments are generally a dichotomy between indoors versus outdoors. A standard environment is optimal for a biometric system and matching performance. A nonstandard environment may present variables that would create false nonmatches. For example, a facial recognition template depends, in part, on the lighting conditions when the "picture" (image) was taken. The variable lighting outdoors can cause false nonmatches. Some indoor situations may also be considered nonstandard environments.

Closed versus open systems refer to the number of uses of the template database, now and in the future. Will the database have a unique use (closed), or will it be used for multiple security measures (open)? Recall the fingerprint example from "Template Management-Storage and Security," in which employees use fingerprints both to enter a building and to log on to their computer network. Should the organization use separate databases for these different uses, or is it willing to risk remotely accessing employee fingerprints for multiple purposes?

Other examples are state driver's licenses and entitlement programs. A state may want to communicate with other states or other programs within the same state to eliminate fraud. This would be an open system, in which standard formats of data and compression would be required to exchange and compare information.

Public or private applications refer to the users and their relationship to system management. Examples of users of public applications include customers and entitlement recipients. Users of private applications include employees of business or government. Both user attitudes toward biometric devices and management's approach vary depending on whether the application is public or private. Once again, user attitudes toward the device will affect the performance of the biometric system.

It should be noted here that performance figures and error rates from vendor testing are unreliable for many reasons. Part of the problem is that determining the distinctiveness of a biometric accurately requires thousands or even millions of people. To acquire samples over any amount of time in any number of contexts from this number of people would be impossible. To test for the many variables in each type of application would be impossible in most cases, and too costly in the few where it is possible. Operational and pilot testing is the only reasonable method to test a system. Additionally, vendor and scientific laboratory testing generally present only the easiest deployment scenario of a biometric application: overt, cooperative, habituated, supervised, standard, closed, and private.
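
A rough calculation shows why large samples are needed. If a test runs n independent comparisons and observes no errors, the claim "the error rate is below p" holds at 95% confidence only when (1 - p)^n ≤ 0.05, which works out to roughly n ≈ 3/p (the statistical "rule of three"). The sketch below computes the exact n; it assumes independent trials, which comparisons drawn from a limited group of people are not, so real requirements are likely larger. The function name trials_needed is hypothetical.

```python
import math


def trials_needed(max_error_rate: float, confidence: float = 0.95) -> int:
    """Smallest number n of independent, error-free trials with
    (1 - max_error_rate) ** n <= 1 - confidence: if the true error rate were
    max_error_rate or worse, n trials would show at least one error with
    probability >= confidence."""
    return math.ceil(math.log(1 - confidence) / math.log(1 - max_error_rate))


for rate in (1e-2, 1e-3, 1e-5):
    print(f"error rate below {rate:g}: {trials_needed(rate):,} error-free trials "
          f"(rule of three: about {3 / rate:,.0f})")
```

Demonstrating an error rate below 1 in 100,000 would take about 300,000 error-free comparisons, before accounting for the many environments and user populations a deployment must cover.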

SALIENT CHARACTERISTICS OF MAINSTREAM BIOMETRICS

Table A.1 compares the eight mainstream biometrics in terms of a number of characteristics, including how robust and distinctive they are, how intrusive they are, and what applications they can be used for (i.e., identification or verification, or verification alone).9 This table is an attempt to assist the reader in categorizing biometrics along important dimensions. Because this industry is still working to establish comprehensive standards and the technology is changing rapidly, however, it is difficult to make assessments with which everyone would agree. The table represents an assessment based on discussions with technologists, vendors, and program managers.

Biometric                       Identify versus Verify  Robust                 Distinctive  Intrusive
Fingerprint                     Either                  High to Moderate [10]  High         Touching
Hand/Finger Geometry            Verify                  Moderate               Low          Touching
Facial Recognition              Either                  Moderate               Moderate     12+ inches
Voice Recognition               Verify                  Low                    Moderate     Remote
Iris Scan                       Either                  High                   High         12+ inches
Retinal Scan                    Either                  High                   High         1-2 inches
Dynamic Signature Verification  Verify                  Low                    Low          Touching
Keystroke Dynamics              Verify                  Low                    Low          Touching

Table A.1 Comparison of Mainstream Biometrics

Half the systems in Table A.1 can be used for either identification or verification, while the rest can be used only for verification. In particular, hand geometry has been used only for verification applications, such as physical access control and time and attendance verification. In addition, voice recognition, because of the need for enrollment and matching using a pass-phrase, is typically used for verification only.

Robustness and distinctiveness vary considerably. Fingerprinting is moderately robust, and, although it is distinctive, a small percentage of the population has unusable prints, usually because of age, genetics, injury, occupation, exposure to chemicals, or other occupational hazards. Hand/finger geometry is moderately robust but low on the distinctiveness scale, while facial recognition is neither highly robust nor highly distinctive. As for voice recognition, assuming the voice and not the pronunciation is what is being measured, this biometric is moderately distinctive but not very robust. Iris scans are both highly robust (because they are not highly susceptible to day-to-day changes or damage) and distinctive (because they are randomly formed). Retinal scans are fairly robust and very distinctive. Finally, neither dynamic signature verification nor keystroke dynamics is particularly robust or distinctive.

As the table shows, the biometrics vary in terms of how intrusive they are, ranging from those biometrics that require touching to others that can recognize an individual from a distance.

APPENDIX A: FACE RECOGNITION PERFORMANCE

There are three types of evaluations in the biometrics community: technology, scenario, and operational. Through both technology and scenario evaluations, the Facial Recognition Vendor Test 2000 (FRVT 2000) documented the state of the art in face biometrics and the research issues that continue to challenge the face recognition community. (The results of the follow-on study, FRVT 2002, will be released sometime in 2003.) A significant part of the story that FRVT 2000 tells lies not just in the results but in the number of participants, types of participants, approach and evaluation protocol, parameters tested, and how the results are reported.11

The Facial Recognition Vendor Test 2000 consisted of two components: the Technology Evaluation (referred to as the "Recognition Performance Test" in the FRVT 2000 Report) and the Physical Access Scenario Test (referred to as the "Product Usability Test" in the FRVT 2000 Report). The FRVT 2000 Technology Evaluation is an assessment of commercially available facial recognition systems. The FRVT 2000 Physical Access Scenario Test is an example of a limited scenario evaluation; however, not all of the products tested were designed for physical access applications, so the performance results of the different systems are difficult to compare with one another or to analyze.

By far, the most important results of the FRVT 2000 Report are drawn from the Technology Evaluation and are reported here. The vendors that volunteered to participate in the evaluation were Banque-Tec, C-VIS, Lau Technologies, Miros Inc. (eTrue), and Visionics Corp.

TECHNOLOGY EVALUATION

METHODOLOGY

For the Technology Evaluation portion of this study (referred to as the "Recognition Performance Test" in the FRVT 2000 Report), the vendors were asked to compare 13,872 images from a sequestered database (FERET; see "FERET Images" below). Because each image is compared with every image in the set, this amounts to more than 192 million comparisons. The vendors were given 72 hours to make these comparisons. C-VIS, Lau Technologies, and Visionics Corp. successfully completed the task. In the time allowed, Banque-Tec completed approximately 9,000 images and Miros Inc. (eTrue) approximately 4,000; because these two vendors did not complete all of the comparisons, their results were not included in the Technology Evaluation.

In face recognition, a gallery is a set of known individuals against which an algorithm attempts to perform recognition. A probe set is a set of images of unknown individuals that an algorithm attempts to recognize. For the Technology Evaluation, the complete set of 13,872 images and the corresponding matrix of 13,872 x 13,872 similarity scores were divided into several subsets and used as probe and gallery images for various experiments.
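Because the full matrix of similarity scores was computed once, each experiment can be defined afterward by choosing which images serve as probes (rows) and which serve as the gallery (columns). The following Python sketch shows that selection and checks the arithmetic behind the comparison count: 13,872 × 13,872 = 192,432,384. The 4 × 4 matrix and the index choices are toy values for illustration. Plain lists are used only to keep the example self-contained.

```python
from typing import List, Sequence

N_IMAGES = 13_872
TOTAL_COMPARISONS = N_IMAGES * N_IMAGES  # every image scored against every image


def submatrix(scores: List[List[float]], probe_idx: Sequence[int],
              gallery_idx: Sequence[int]) -> List[List[float]]:
    """Select the probe-by-gallery block of a full N x N similarity matrix.

    scores[i][j] is assumed to hold the similarity of image i (used as a probe)
    to image j (used as a gallery entry).
    """
    return [[scores[p][g] for g in gallery_idx] for p in probe_idx]


print(f"{TOTAL_COMPARISONS:,}")  # 192,432,384

# Toy 4 x 4 matrix standing in for the full 13,872 x 13,872 one.
full = [
    [1.0, 0.2, 0.9, 0.1],
    [0.2, 1.0, 0.3, 0.8],
    [0.9, 0.3, 1.0, 0.2],
    [0.1, 0.8, 0.2, 1.0],
]
# One hypothetical experiment: images 0 and 1 form the gallery; images 2 and 3 are probes.
print(submatrix(full, probe_idx=[2, 3], gallery_idx=[0, 1]))  # [[0.9, 0.3], [0.1, 0.8]]
```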

FERET Images

Research in face recognition was greatly advanced, and more firmly established as a scientific field, in the 1990s through the work of P. Jonathon Phillips with the U.S. Army and the Department of Defense Counterdrug Technology Development Program. This work included the Face Recognition Technology (FERET) database, the first database of face images collected in a systematic manner, which was accompanied by an evaluation and an evaluation methodology. (Together, these three were a gigantic leap for face recognition, as they would have been for any single biometric.)

The FERET database was collected between 1993 and 1996. For each individual, two frontal views were taken (fa and fb), with a different facial expression requested for the second frontal image. For 200 sets of images, a third frontal image was taken with a different camera and different lighting (referred to as the fc image). The remaining images were collected at various angles between right and left profile. To add simple variations to the database, photographers sometimes took a second set of images for which the subjects were asked to put on their glasses and/or pull their hair back. Sometimes a second set of images of a person was taken on a later date; such a set is referred to as a duplicate set. Duplicate sets introduce variations in scale, pose, expression, and illumination of the face.

The database has 14,126 images of 1,199 individuals. For some people, over two years elapsed between their first and last photos. The government sequestered 1,061 of the 1,564 sets of images to enable independent evaluations such as FRVT 2000.

EXPERIMENTS

Expression Experiments

The expression experiments were designed to evaluate the performance of face-matching algorithms when comparing images of the same person with different facial expressions, an obvious issue in real-world applications.

The results of this experiment, in which gallery and probe images were taken within five minutes of each other, provide an upper bound on algorithm performance: recognition algorithms turn out to be robust to expression changes and face far bigger problems, such as recognizing someone at a later time (e.g., a month later).

Temporal Experiments

The temporal experiments address the effect of the time delay between a gallery image and subsequent captures of facial images. Solving this problem is very important to the success of real-world applications of this technology. The practical difficulty with temporal experiments is obtaining large sets of test data, since volunteers must return to be photographed over many years. The FRVT 2000 experiments rely on imagery gathered during a period of less than two years.

Rank-one (top-match) identification results were similar for a test with a delay of 0 to 1,031 days between gallery and probe images and for a test with a delay of 540 to 1,031 days. This is a significant improvement over the earlier FERET evaluations, in which the difference between these two tests was seven percentage points.
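A rank-one result counts a probe as correctly identified only when the highest-scoring gallery entry is the same person. The sketch below computes that rate for two delay windows like those above. The scores, identities, and delays are hypothetical toy data, not FRVT 2000 results. Ties are broken in favor of the earlier gallery entry, which is an arbitrary choice.

```python
from typing import List, Tuple


def rank_one_rate(scores: List[List[float]], probe_ids: List[str],
                  gallery_ids: List[str]) -> float:
    """Fraction of probes whose top-scoring gallery entry is the correct person.

    scores[i][j] is the similarity of probe i to gallery entry j. Ties are
    broken in favor of the earlier gallery entry.
    """
    if not probe_ids:
        raise ValueError("no probes to score")
    hits = 0
    for row, true_id in zip(scores, probe_ids):
        best = max(range(len(row)), key=lambda j: row[j])
        if gallery_ids[best] == true_id:
            hits += 1
    return hits / len(probe_ids)


def select_by_delay(scores: List[List[float]], probe_ids: List[str],
                    delays_days: List[int], lo: int, hi: int
                    ) -> Tuple[List[List[float]], List[str]]:
    """Keep only probes whose gallery-to-probe delay is lo..hi days (inclusive)."""
    keep = [i for i, d in enumerate(delays_days) if lo <= d <= hi]
    return [scores[i] for i in keep], [probe_ids[i] for i in keep]


# Hypothetical toy data: three enrolled people and four probes.
gallery_ids = ["a", "b", "c"]
probe_ids = ["a", "b", "c", "a"]
delays = [10, 600, 900, 700]
scores = [
    [0.9, 0.2, 0.1],  # correct: "a" scores highest
    [0.3, 0.8, 0.4],  # correct
    [0.7, 0.1, 0.6],  # wrong: "a" outscores the true match "c"
    [0.5, 0.2, 0.3],  # correct
]

for lo, hi in [(0, 1031), (540, 1031)]:
    s, p = select_by_delay(scores, probe_ids, delays, lo, hi)
    print(lo, hi, round(rank_one_rate(s, p, gallery_ids), 3))
# 0 1031 0.75
# 540 1031 0.667
```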

Other temporal experiments varied galleries by lighting and compared them to probe images recorded between 11 and 13 months prior to the gallery. The lighting types from easiest to most difficult for recognition algorithms were mugshot lighting, FERET-style lighting (similar to studio lighting), and overhead lighting. These results also show that identifying faces from images taken more than a year apart remains a problem.

Pose Experiments

In face recognition applications where subjects may be unaware that surveillance is in use, subjects will often not be looking straight into any of the cameras in a given area. A subject's head may vary in declination angle (head tilt), but more commonly it varies in azimuthal angle (turning left or right), referred to here as the "pose."

The pose experiments show that performance is stable when the angle between a frontal gallery image and a probe is less than 25 degrees and that performance dramatically falls off when the angle is greater than 40 degrees.
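An application might use these reported thresholds to screen probe frames before matching. The sketch below assumes that a head-pose estimator (not shown) has already produced a facing direction in the horizontal plane. It converts that direction to an azimuthal angle and buckets it using the 25- and 40-degree figures above. The function names and the input representation are hypothetical.

```python
import math


def azimuth_degrees(dx: float, dz: float) -> float:
    """Left/right rotation of the head away from the camera, in degrees.

    (dx, dz) is an assumed facing direction in the horizontal plane: dz is the
    component pointing at the camera and dx the sideways component. 0 means a
    frontal view; 90 means a full profile.
    """
    return math.degrees(math.atan2(abs(dx), dz))


def pose_band(angle_deg: float) -> str:
    """Bucket a pose angle using the thresholds reported for FRVT 2000."""
    if angle_deg < 25:
        return "stable"        # performance reported as stable
    if angle_deg > 40:
        return "degraded"      # performance reported to fall off dramatically
    return "intermediate"      # between the two reported thresholds


print(pose_band(azimuth_degrees(0.0, 1.0)))   # stable (frontal)
print(pose_band(azimuth_degrees(1.0, 1.0)))   # degraded (45 degrees)
print(pose_band(azimuth_degrees(-0.6, 1.0)))  # intermediate (about 31 degrees)
```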

Compression Experiments

The compression experiments were designed to estimate the effect of lossy image compression on the performance of face-matching algorithms. Although image compression is widely used to satisfy storage and bandwidth constraints, in computer vision applications it is often assumed to degrade imagery and is therefore usually avoided. The gallery images in this experiment were captured under favorable conditions and left uncompressed, while the probe sets were compressed at different ratios.

These experiments show that compression of facial images does not necessarily degrade performance. Performance actually increased slightly at 10:1 and 20:1 compression ratios compared with uncompressed probe images; only at a ratio of 40:1 did performance drop below that of the uncompressed probes. These results are aggregated, and only JPEG compression was considered.
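A compression ratio compares the size of the raw image with the size of the compressed file, so 10:1 means the file is one tenth the raw size. The arithmetic is sketched below for a hypothetical 640 × 480, 8-bit grayscale image; the FRVT 2000 image dimensions are not given here. A JPEG encoder is normally driven by a quality setting, so the ratio actually achieved also depends on image content.

```python
def raw_size_bytes(width: int, height: int, channels: int = 1) -> int:
    """Size of an uncompressed image at 8 bits (1 byte) per channel per pixel."""
    return width * height * channels


def compression_ratio(raw_bytes: int, compressed_bytes: int) -> float:
    """A 10:1 ratio means the compressed file is one tenth the raw size."""
    return raw_bytes / compressed_bytes


raw = raw_size_bytes(640, 480)  # hypothetical 640 x 480 grayscale probe: 307,200 bytes
for ratio in (10, 20, 40):
    print(f"{ratio}:1 -> about {raw // ratio:,} bytes")
# 10:1 -> about 30,720 bytes
# 20:1 -> about 15,360 bytes
# 40:1 -> about 7,680 bytes
```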

Media Experiments

To evaluate the effect of the image storage medium on performance, FRVT 2000 tested images captured with digital CCD cameras and on 35 mm film. The issue arises, for example, when mugshots captured on 35 mm film are compared with video camera images. Using the different media did not significantly affect the performance of the algorithms from Lau Technologies or Visionics Corp.

Distance Experiments

These experiments were designed to evaluate the effect of increasing distance on the performance of face-matching algorithms. The probe images in these experiments were taken from relatively low-resolution, lightly compressed video sequences of subjects walking toward a camera.

Increases in distance are akin to lowering resolution, in that the concern is the decreasing number of pixels that are used to display a person's face.
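The link between distance and resolution follows from a simple pinhole-camera approximation, in which an object's size in the image is inversely proportional to its distance from the camera. The sketch below assumes a hypothetical focal length of 1,000 pixels and an adult inter-pupil distance of about 63 mm. Neither value comes from FRVT 2000.

```python
def pixels_between_eyes(focal_length_px: float, ipd_m: float, distance_m: float) -> float:
    """Pinhole-camera approximation: projected size = focal length x real size / distance."""
    return focal_length_px * ipd_m / distance_m


FOCAL_LENGTH_PX = 1000.0  # hypothetical camera
IPD_M = 0.063             # assumed typical adult inter-pupil distance (about 63 mm)

for d in (2.0, 3.0, 5.0):  # the probe distances used in the FRVT 2000 distance experiments
    print(d, round(pixels_between_eyes(FOCAL_LENGTH_PX, IPD_M, d), 1))
# 2.0 31.5
# 3.0 21.0
# 5.0 12.6
```

Under these assumptions, moving from 2 to 5 meters cuts the number of pixels between the eyes by 60 percent.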

Across all algorithms and all three sets of distance experiments, performance decreased as the distance between the person and the camera increased. The three sets were: indoor digital gallery images v. indoor video probes taken 2, 3, and 5 meters from the camera; indoor video gallery images v. indoor video probes taken 3 and 5 meters from the camera; and outdoor video gallery images v. outdoor video probes taken 3 and 5 meters from the camera. (The performance effects due to lighting differences between the last two experiments are covered in the next section.)

Illumination Experiments

While many researchers have worked on correcting for (or normalizing) lighting differences in images, illumination variation remains a difficult problem in face recognition. For reliable matching, gallery images (or enrollment images, in a deployed system) need to be captured under the same lighting conditions expected for future images. The problem is particularly acute when comparing indoor and outdoor images.
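One classic, simple normalization technique is global histogram equalization, which remaps pixel intensities so that images of the same scene under brighter or dimmer lighting end up with similar distributions. It is shown here only as an illustration and is not necessarily what any FRVT 2000 participant used. Because it applies one mapping to the whole image, it cannot remove directional effects such as side shadows. The sketch works on a small grayscale image stored as nested Python lists; the pixel values are hypothetical.

```python
from typing import List

Image = List[List[int]]  # rows of 8-bit grayscale values, 0..255


def equalize_histogram(image: Image, levels: int = 256) -> Image:
    """Global histogram equalization: remap intensities through the image's
    cumulative histogram so they spread over the full 0..levels-1 range."""
    flat = [p for row in image for p in row]
    hist = [0] * levels
    for p in flat:
        hist[p] += 1
    cdf, running = [], 0
    for count in hist:
        running += count
        cdf.append(running)
    total = len(flat)
    cdf_min = next((c for c in cdf if c > 0), 0)  # pixels at the darkest level present
    if total == cdf_min:  # empty image or a single intensity: nothing to stretch
        return [row[:] for row in image]
    lut = [max(0, round((c - cdf_min) / (total - cdf_min) * (levels - 1))) for c in cdf]
    return [[lut[p] for p in row] for row in image]


dim = [[10, 10], [20, 30]]         # hypothetical dim capture
bright = [[110, 110], [120, 130]]  # same scene, brighter lighting
print(equalize_histogram(dim))     # [[0, 0], [128, 255]]
print(equalize_histogram(bright))  # [[0, 0], [128, 255]]
```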

These experiments compared a gallery of subjects photographed indoors under mugshot lighting with several probe sets, each taken shortly before or after the corresponding gallery image but under a different lighting arrangement. All images were high-quality digital pictures, and subjects had normal facial expressions.

The least difficult test for the algorithms involved comparing the indoor mugshot gallery to an indoor probe set with overhead lighting. The most difficult was comparing the gallery to an outdoor probe set.

Resolution Experiments

Image resolution is another important factor in face recognition performance. As resolution decreases, performance may degrade, and there is a lower bound below which the face is no longer recognizable. Here, resolution was defined as the number of pixels between the centers of the eyes, or inter-pupil distance.

The gallery consisted of high-resolution (full-resolution, digital CCD) indoor images, taken at a fixed distance from the camera under standard mugshot flood lighting, and was compared with various probe sets. The gallery inter-pupil distance averaged 138.7 pixels across all subjects and ranged from 88 to 163 pixels. The probe sets were derived from these gallery images using a standard reduction algorithm that shrinks each image to the desired inter-pupil distance while preserving its aspect ratio. The smallest probe images have inter-pupil distances as low as 15 pixels. The probe images are normalized across face size (i.e., subjects with large faces are reduced by a greater factor).
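The geometry of this reduction can be sketched as follows: measure the inter-pupil distance from the two eye centers, then scale both image axes by the ratio of the target distance to the current one. The eye coordinates and image size are hypothetical, and the resampling filter itself, which the text does not specify, is not shown. The last call illustrates the normalization across face size: a face with a smaller inter-pupil distance needs a smaller reduction to reach the same target.

```python
import math
from typing import Tuple

Point = Tuple[float, float]


def inter_pupil_distance(left_eye: Point, right_eye: Point) -> float:
    """Eye-center to eye-center distance, in pixels."""
    return math.dist(left_eye, right_eye)


def reduce_to_ipd(width: int, height: int, current_ipd: float,
                  target_ipd: float) -> Tuple[int, int, float]:
    """Return the new width, height, and scale factor that bring an image's
    inter-pupil distance down to target_ipd. The same factor is applied to
    both axes, so the aspect ratio is preserved."""
    if not 0 < target_ipd <= current_ipd:
        raise ValueError("target must be positive and no larger than the current IPD")
    factor = target_ipd / current_ipd
    return round(width * factor), round(height * factor), factor


ipd = inter_pupil_distance((400.0, 500.0), (560.0, 500.0))  # hypothetical eye centers
print(ipd)                                    # 160.0
print(reduce_to_ipd(1280, 960, ipd, 45.0))    # (360, 270, 0.28125)
print(reduce_to_ipd(1280, 960, 90.0, 45.0))   # (640, 480, 0.5): smaller face, smaller reduction
```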

Contrary to conventional wisdom, these experiments showed that, above a lower bound, performance improved as resolution decreased. Results for an inter-pupil distance of 45 pixels exceeded those for an inter-pupil distance of 60 pixels. All systems had their worst performance on the probe set with an inter-pupil distance of 15 pixels.

OVERALL CONCLUSIONS FOR THE TECHNOLOGY EVALUATION

Previous evaluations identified temporal and pose variations as two key areas for future research in face recognition. FRVT 2000 showed that progress had been made on the former, but developing algorithms that can handle a year or more between image captures remains an important research area. Developing algorithms that can compensate for changes in pose, illumination, and distance was also noted as an area of needed research. Differences in expression and storage media do not appear to be issues for commercial algorithms.

The FRVT 2000 compression experiments confirm previous findings that moderate levels of compression do not adversely affect performance. The resolution experiments find that moderately decreasing resolution can slightly improve performance, which is good news, since many video surveillance cameras, especially older ones, do not acquire high-quality images. In most cases, compression and resolution reduction act as low-pass filters, which suggests that such filtering can improve performance.
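The low-pass intuition can be seen with the simplest form of resolution reduction, averaging neighboring pixels. The Python sketch below works on a single row of pixel values for brevity. Real resizing algorithms use better filters, and the example illustrates only the filtering effect, not why it helped matching.

```python
from typing import List


def downsample_by_averaging(row: List[float], factor: int) -> List[float]:
    """Reduce resolution by replacing each block of `factor` pixels with its mean.

    Averaging is a simple low-pass filter: fast pixel-to-pixel alternation
    (fine detail or noise) is smoothed away, while slow changes survive.
    Any trailing pixels that do not fill a whole block are dropped.
    """
    if factor < 1:
        raise ValueError("factor must be at least 1")
    return [sum(row[i:i + factor]) / factor
            for i in range(0, len(row) - factor + 1, factor)]


fine_detail = [0, 1, 0, 1, 0, 1, 0, 1]  # highest-frequency pattern a row can hold
slow_ramp = [0, 0, 1, 1, 2, 2, 3, 3]    # gradual change in brightness
print(downsample_by_averaging(fine_detail, 2))  # [0.5, 0.5, 0.5, 0.5]
print(downsample_by_averaging(slow_ramp, 2))    # [0.0, 1.0, 2.0, 3.0]
```

The alternating pattern collapses to a uniform gray, while the gradual ramp passes through intact.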

REFERENCES AND END NOTES

P. J. Phillips, A. Martin, C. L. Wilson, and M. Przybocki. "An Introduction to Evaluating Biometric Systems." IEEE Computer, February 2000, pp. 56-63.

P. J. Phillips, H. Moon, S. Rizvi, and P. Rauss. "The FERET Evaluation Methodology for Face-Recognition Algorithms." IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 22, No. 10, 2000.

T. Mansfield and J. L. Wayman. Best Practices in Testing and Reporting Performance of Biometric Devices, Version 1.0. 12 January 2000. http://www.afb.org.uk/bwg/bestprac10.pdf

1 Security also depends on other factors, such as the care taken to safeguard tokens and passwords and to ensure that transmissions of biometric data are adequately protected.

2 This primer does not cover standards for interoperability or so-called "plug and play" applications because this subject is tangential to the project. This appendix relies heavily on the following sources: Hawkes and Hefferman (1999) and Wayman (1999c, 19TK). See also Jain, Bolle, and Pankanti (1998).

3 See, e.g., Appendix B, Program Reports, Fort Sill Biometrically Protected Smart Card.

4 Image files of fingerprints may be of interest to an organization (such as the FBI or a bank) because of their law enforcement or security applications. In the case of fingerprints, the military may want to keep both electronic image files of the fingerprint as well as the biometric templates. The image files are too large to be used for biometric applications but would be useful for forensic purposes. Moreover, an organization might want to store image files to give it greater technical flexibility. For example, if the FBI did not keep image files of enrollees, it might have to physically reenroll each individual if the FBI decided to change to a different proprietary biometric system. Image files are also known as raw data or the corpus.

5 For a detailed discussion of smart cards, see Ratha and Bolle (1999).

6 Electrical capacitance has proved to be the best and least reproducible method for effectively identifying a live person.

7 For a detailed discussion of these mainstream biometrics, see Jain, Bolle, and Pankanti (1999).

8 James Wayman used "covert" instead of "clandestine."

9 The authors compiled Table A.1 from various sources at the SJB Biometrics 99 Workshop, November 9-11, 1999, including Hawkes and Hefferman (1999). See also Jain, Bolle, and Pankanti (1998).

10 This is a function of the population using the system.

11 Although not covered here, vendors, city governments, and airports have conducted scenario evaluations of face recognition systems to determine their efficacy in specific locations when used by the general population or by airport employees. Overall, these pilots have shown very poor performance.