Face Biometry Performance report
last updated: 2024-05-28
Introduction
das-Face is the face recognition engine designed and developed by Veridas Digital Authentication Solutions S.L. with the goal of performing automatic identity verification (1:1) and identification (1:N) under different scenarios.
This document summarises a performance analysis of das-Face for verification (1:1) and identification (1:N).
The analysis relies on NIST FRVT evaluations, which are commonly used by commercial and academic communities worldwide to evaluate systems and compare them against the state of the art. The results shown here are:
- Verification (1:1): FRVT 1:1 Verification, submission date: 2021-09-02
- Identification (1:N): FRVT 1:N Identification, submission date: 2021-11-09
das-Face is also capable of performing liveness detection using a passive procedure based on a selfie image, and a challenge-response active methodology based on a selfie and an annotated video. This document shows performance of das-Face for both use cases.
Veridas active liveness detection implemented in Selfie-Alive Pro was tested by iBeta to the ISO 30107-3 Biometric Presentation Attack Detection Standard. Confirmation letters are available in the links below:
Veridas passive liveness detection was tested by iBeta to the ISO 30107-3 Biometric Presentation Attack Detection Standard. Confirmation letters are available in the links below:
The document is organised as follows. Section 2 gives definitions that help in understanding the analysis and results. Section 3 analyses das-Face performance in the verification task (1:1). Section 4 analyses das-Face performance in the identification task (1:N). Section 5 presents the system calibration for the main use cases. Section 6 presents information on the liveness detection engine.
The face recognition engine developed by Veridas was ranked by NIST as the third best in the world in the WILD category on April 4th, 2019, and it is the subject of continuous development and improvement efforts.
The face recognition engine developed by VERIDAS was ranked by NIST in the top 25% of the systems submitted to FRVT 1:1 in the WILD category. The picture below shows all the competitors in that category, with the VERIDAS system marked in red. (Results shown from NIST do not constitute an endorsement of any particular system, product, service, or company by NIST.)
VERIDAS achieved a False Non Match Rate (FNMR) of 3.11% at a False Match Rate (FMR) of 0.001%, and an FNMR of 2.88% at an FMR of 0.01%. These figures put VERIDAS 0.14 points from the Top-3 system. Given these results, VERIDAS complies with the FIDO requirements for facial biometric verification systems, which state that the FNMR should be less than 5% at an FMR of 0.01%.
VERIDAS also complies with the CCN regulation, which requires an FNMR (FNR) less than or equal to 5% at an FMR (FPR) of 0.0001% in the VISABORDER category. VERIDAS fully meets this requirement, as its FNMR (FNR) in that category is just 0.8%.
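The two compliance statements above reduce to comparing a measured FNMR against a bound at a given FMR operating point. The following is only an illustrative sketch of that comparison: the `Requirement` class and `complies` function are hypothetical names introduced here, and the bounds and measured values are the ones quoted in this report.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Requirement:
    name: str
    fmr_percent: float        # operating point (FMR) at which FNMR is measured
    max_fnmr_percent: float   # FNMR bound stated by the requirement
    inclusive: bool           # True for "<=", False for "<"


def complies(measured_fnmr_percent: float, req: Requirement) -> bool:
    """Return True if the measured FNMR satisfies the requirement's bound."""
    if req.inclusive:
        return measured_fnmr_percent <= req.max_fnmr_percent
    return measured_fnmr_percent < req.max_fnmr_percent


# Bounds and measured values as quoted in this report (hypothetical structure).
FIDO = Requirement("FIDO", fmr_percent=0.01, max_fnmr_percent=5.0, inclusive=False)
CCN = Requirement("CCN (VISABORDER)", fmr_percent=0.0001, max_fnmr_percent=5.0, inclusive=True)

for req, measured in ((FIDO, 2.88), (CCN, 0.8)):
    print(f"{req.name}: FNMR {measured}% at FMR {req.fmr_percent}% -> {complies(measured, req)}")
```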
The WILD category is characterized by non-collaborative subjects: the person whose face is being captured does not have to be facing the camera, and the picture may show issues in terms of illumination, contrast, exposure, …
On-boarding pictures may resemble the WILD category, because the person takes the picture in uncontrolled conditions.
The face recognition engine developed by Veridas was ranked by NIST among the top 25% of systems in the world in the MUGSHOT category on August 5th, 2021, and it is the subject of continuous development and improvement efforts.
The face recognition engine developed by VERIDAS was ranked by NIST 51st of 319 systems submitted to FRVT 1:N in the MUGSHOT category. The evaluation was performed in November 2021. The picture below shows all the competitors in that category, with the VERIDAS system marked in red. (Results shown from NIST do not constitute an endorsement of any particular system, product, service, or company by NIST.)
VERIDAS achieved a False Negative Identification Rate (FNIR) of 1% at a False Positive Identification Rate (FPIR) of 0.3%, with a gallery of N=1.6M.
The MUGSHOT category is characterized by collaborative subjects and images that almost follow ISO 19794-5, so the person whose face is being captured is in good acquisition conditions.
The next diagram shows the relation between search time and FNIR, compared with other algorithms.
Definitions
Definitions to better understand the analysis and results:
- Verification task (1:1): Use case in which two different images containing the face of a person are presented to the system so that it determines whether or not they show the same person.
- National Institute of Standards and Technology (NIST): Measurement standards laboratory whose mission is to promote innovation and industrial competitiveness.
- Negative evaluation: Evaluation of two images belonging to two different people.
- Positive evaluation: Evaluation of two images belonging to the same person.
- Accuracy: Percentage of correct answers provided by the system.
- False Positive Rate (FPR) or False Match Rate (FMR): Ratio between the number of negative evaluations wrongly categorized as positive and the total number of actual negative evaluations.
- True Positive Rate (TPR): Ratio between the number of positive evaluations correctly categorized as positive and the total number of actual positive evaluations.
- False Non Match Rate (FNMR): Ratio between the number of positive evaluations rejected by the system and the total number of actual positive evaluations.
- Identification task (1:N): Use case in which an image containing the face of a person is presented to the system, which has access to a pool of N images, each corresponding to an identity, and must determine to which of the N identities (if any) the presented image belongs.
- Identification Rate (IR): Ratio between the number of successful identifications and the total number of performed identifications.
- Identification Rank (R): Upper bound of the position where the match should be in the list of candidates returned by the system in order to consider it a successful match.
- False Negative Identification Rate (FNIR): Ratio between the number of positive identifications rejected because they do not reach the threshold and the total number of actual positive identifications.
- False Positive Identification Rate (FPIR): Ratio between the number of negative identifications accepted because they reach the threshold and the total number of negative identifications.
- Liveness detection: An automatic procedure whose purpose is to estimate how likely it is that the captured evidence (images, videos, …) belongs to an actual person and not to a spoofed sample of a person.
- Bonafide: A presentation attempt that is performed by a trustworthy person.
- Attack: A presentation attempt that is performed by an impostor or spoofer.
- Attack Presentation Classification Error Rate (APCER): Ratio between the number of spoof attacks misclassified as authentic and the total number of performed attacks.
- Bona Fide Presentation Classification Error Rate (BPCER): Ratio between the number of bonafide samples (actual people's faces) misclassified as attacks and the total number of bonafide samples.
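As an illustration of the verification definitions above, the sketch below computes FMR, FNMR, TPR and accuracy from a list of similarity scores and ground-truth labels. The function name, the toy scores, and the choice of accepting a comparison when the score is greater than or equal to the threshold are assumptions made for this example; they do not describe das-Face internals.

```python
def verification_metrics(scores, same_person, threshold):
    """Compute 1:1 error rates from similarity scores.

    scores:      similarity score of each evaluation
    same_person: True for positive evaluations, False for negative ones
    threshold:   scores >= threshold are accepted (assumption of this sketch)
    """
    tp = fp = tn = fn = 0
    for score, positive in zip(scores, same_person):
        accepted = score >= threshold
        if positive and accepted:
            tp += 1
        elif positive:
            fn += 1
        elif accepted:
            fp += 1
        else:
            tn += 1

    positives, negatives = tp + fn, fp + tn
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative evaluation")

    return {
        "FMR": fp / negatives,    # negatives wrongly accepted
        "FNMR": fn / positives,   # positives wrongly rejected
        "TPR": tp / positives,    # positives correctly accepted
        "accuracy": (tp + tn) / (positives + negatives),
    }


# Toy data: four positive and four negative evaluations.
scores = [0.95, 0.85, 0.40, 0.90, 0.30, 0.10, 0.20, 0.60]
labels = [True, True, True, True, False, False, False, False]
print(verification_metrics(scores, labels, threshold=0.5))
```

Note that TPR and FNMR always add up to 1, since every positive evaluation is either accepted or rejected.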
das-Face performance in identity verification (1:1)
This section analyses the performance of the latest version of das-Face, using NIST evaluations as a reference.
1:1 Ongoing Face Recognition Vendor Test (FRVT) - Verification
This section describes the most relevant results of our model in the NIST FRVT 1:1 evaluation.
In verification, the system is tested on multiple datasets. The NIST benchmarks currently use the following types of datasets:
- Wild images: Typical social-media images, with many photojournalism-style images. Images are given to the algorithm. This category is the most difficult one, because the subjects are non-collaborative.
- Mugshot images: Photographic portrait of a person from the waist up, typically taken after a person is arrested. These photos have a reasonable conformance with the ISO/IEC 19794-5 Full Frontal image type.
- VISA images: Photos collected in immigration offices.
- Border images: Border crossing images collected in primary immigration lanes.
The following graph shows the system performance on different domains.
das-Face performance in identification (1:N)
This section describes the most relevant results of our biometric model in the NIST FRVT 1:N. In identification, a particular identity (probe) is searched within a pool of N known identities (gallery). Table I reports how the system FNIR is affected by the gallery size N, at a fixed operating point of FPIR=0.1%, with identification rank R=1, using the FRVT’2018 MUGSHOT dataset.
N | VERIDAS FNIR (%) |
---|---|
640K | 0.42 |
1.6M | 0.58 |
3M | 0.77 |
6M | 0.77 |
12M | 2.32 |
Table I System performance for identification task (1:N) under FRVT 1:N evaluation for FRVT’2018 MUGSHOT dataset and different gallery sizes (N).
From this table it can be deduced that, with a gallery of 1.6 million people, an FPIR of 0.1% means that in 1 out of every 1000 identifications where the probe is not in the gallery the system returns a match with a wrong person, and an FNIR of 0.58% means that the system incorrectly rejects, 0.58% of the time, a person who actually is in the gallery.
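The same interpretation can be turned into expected error counts for a given workload. The sketch below is plain arithmetic on the Table I operating point for N=1.6M (FPIR=0.1%, FNIR=0.58%); the function name and the search volumes of 10,000 are hypothetical values chosen only for illustration.

```python
def expected_identification_errors(searches_in_gallery, searches_not_in_gallery,
                                   fnir_percent, fpir_percent):
    """Expected number of errors for a batch of 1:N searches.

    searches_in_gallery:     probes whose identity is enrolled in the gallery
    searches_not_in_gallery: probes whose identity is NOT enrolled
    """
    # Enrolled people that the system fails to identify.
    missed = searches_in_gallery * fnir_percent / 100
    # Non-enrolled people wrongly matched to someone in the gallery.
    false_alarms = searches_not_in_gallery * fpir_percent / 100
    return missed, false_alarms


# Operating point from Table I for N = 1.6M; search volumes are hypothetical.
missed, false_alarms = expected_identification_errors(
    searches_in_gallery=10_000,
    searches_not_in_gallery=10_000,
    fnir_percent=0.58,
    fpir_percent=0.1,
)
print(f"expected missed identifications: {missed:.1f}")
print(f"expected false identifications:  {false_alarms:.1f}")
```

With these volumes the estimate is about 58 missed identifications and about 10 false identifications.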
Face verification technologies
Selfie vs Selfie
When using the system with selfie photos, the response may change because of the characteristics of this particular use case. Results of the system for the case of selfie-vs-selfie are presented in Table III, evaluated using an internal database created for this purpose. This table is the same for Veridas Native and HTML SDKs (mobile & desktop).
Similarity Threshold | FPR (%) | FNR (%) |
---|---|---|
0.50 | 0.042 | 0.156 |
0.55 | 0.042 | 0.156 |
0.60 | 0.042 | 0.156 |
0.65 | 0.019 | 0.159 |
0.70 | 0.019 | 0.159 |
0.75 | 0.019 | 0.159 |
0.80 | 0.004 | 0.165 |
0.85 | 0.003 | 0.174 |
0.90 | 0.002 | 0.188 |
0.95 | 0.001 | 0.250 |
Table III System performance for selfie vs selfie
In this use case, a similarity threshold of 0.80 is employed to decide whether two biometric samples match. Comparisons that yield a similarity score above 0.80 are classified as positive matches, indicating that the two samples belong to the same individual. Conversely, comparisons that yield a similarity score below 0.80 are classified as negative matches, indicating that the samples belong to different individuals. When comparing two self-captured facial images (i.e., "selfie-mode"), a threshold of 0.80 results in a false negative rate of 0.165%, meaning that 0.165% of genuine comparisons are incorrectly classified as non-matches, and a false positive rate of 0.004%, meaning that 0.004% of comparisons between different people are incorrectly classified as matches.
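The decision rule described above can be sketched in a few lines. This is an illustrative example only: `is_match` and `SELFIE_THRESHOLD` are hypothetical names, and the text does not say how a score exactly equal to the threshold is handled, so this sketch treats it as a non-match.

```python
SELFIE_THRESHOLD = 0.80  # operating point discussed for selfie-vs-selfie


def is_match(similarity: float, threshold: float = SELFIE_THRESHOLD) -> bool:
    """Classify a comparison as a positive match when the score is above the threshold.

    Assumption of this sketch: a score equal to the threshold is a non-match.
    """
    if not 0.0 <= similarity <= 1.0:
        raise ValueError("similarity is expected to lie in [0, 1]")
    return similarity > threshold


for score in (0.93, 0.80, 0.42):
    print(score, "match" if is_match(score) else "non-match")
```

Raising the threshold makes the rule stricter: according to Table III, the FPR falls while the FNR rises.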
Selfie vs ID Document
When using the system to compare a selfie photo and an identity document photograph crop, the response may change again because of the characteristics of this particular use case.
Several factors make the biometric comparison in a digital onboarding process extremely variable: the ID document manufacturing process, the environmental conditions during the capture of both the document and the selfie, the presence of visual artifacts in the document image, the capture technology and the lens used, the facial accessories a person may wear, and the time difference between the two photos.
Images printed on European identity documents
Selfie images from onboarding processes
The facial biometrics engine is specifically trained for the selfie vs. document comparison use case, allowing for optimized performance. The Veridas biometric engine is robust in the following situations.
- Presence of glare in the printed photo area.
- Presence of the kinegram and other visual artifacts on the printed photo.
- Temporal difference between the selfie photo captured by the user and the printed photo.
- Changes in the face: presence of beard, mustache, hair changes, glasses, make-up, etc.
- Presence of face-mask.
For this use case, Table IV better describes the behavior of the system. The system has been trained with document and selfie pairs in order to adapt it to this use case. The table has been computed using an internal testing dataset, created for this purpose, with 3,416 real cases of selfie and document images. This table is the same for Veridas Native and HTML SDKs (mobile & desktop).
Similarity Threshold | FPR (%) | FNR (%) |
---|---|---|
0.50 | 0.400 | 1.027 |
0.55 | 0.252 | 1.109 |
0.60 | 0.165 | 1.163 |
0.65 | 0.125 | 1.184 |
0.70 | 0.061 | 1.204 |
0.75 | 0.040 | 1.330 |
0.80 | 0.016 | 1.476 |
0.85 | 0.009 | 1.694 |
0.90 | 0.003 | 2.010 |
0.95 | 0.001 | 2.779 |
Table IV System performance for selfie vs ID Document
In this use case, a similarity threshold of 0.70 is employed to decide whether two biometric samples match. Comparisons that yield a similarity score above 0.70 are classified as positive matches, indicating that the two samples belong to the same individual. Conversely, comparisons that yield a similarity score below 0.70 are classified as negative matches, indicating that the samples belong to different individuals. When comparing a selfie photo and an identity document photograph (i.e., "document-mode"), a threshold of 0.70 results in a false negative rate of 1.20%, meaning that 1.20% of genuine comparisons are incorrectly classified as non-matches, and a false positive rate of 0.06%, meaning that 0.06% of comparisons between different people are incorrectly classified as matches.
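An integrator with a different risk tolerance could use Table IV to pick another operating point. The sketch below, which is only an illustration and not a Veridas tool, returns the lowest tabulated threshold whose FPR does not exceed a target; because FNR grows with the threshold in Table IV, that row also has the lowest FNR among the rows that qualify. The function name is hypothetical, and the data are copied from Table IV.

```python
# (similarity threshold, FPR %, FNR %) rows copied from Table IV,
# sorted by ascending threshold.
TABLE_IV = [
    (0.50, 0.400, 1.027),
    (0.55, 0.252, 1.109),
    (0.60, 0.165, 1.163),
    (0.65, 0.125, 1.184),
    (0.70, 0.061, 1.204),
    (0.75, 0.040, 1.330),
    (0.80, 0.016, 1.476),
    (0.85, 0.009, 1.694),
    (0.90, 0.003, 2.010),
    (0.95, 0.001, 2.779),
]


def lowest_threshold_for_fpr(table, max_fpr_percent):
    """Return the first (threshold, FPR, FNR) row with FPR <= max_fpr_percent.

    Returns None when no tabulated threshold meets the target.
    """
    for threshold, fpr, fnr in table:
        if fpr <= max_fpr_percent:
            return threshold, fpr, fnr
    return None


print(lowest_threshold_for_fpr(TABLE_IV, max_fpr_percent=0.1))
print(lowest_threshold_for_fpr(TABLE_IV, max_fpr_percent=0.01))
```

For a target FPR of 0.1%, the lookup lands on the 0.70 row, the same operating point discussed above.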
Liveness detection technologies
Veridas liveness detection has been evaluated with different presentation attack species, as well as DeepFake images. The presentation attack species are grouped into Level 1 and Level 2, as done by iBeta. The Level 1 group is composed of replay attacks, print attacks, and other screen or paper based attacks. The Level 2 group is composed of rendered faces, higher quality paper masks, and 3D masks.
SDK Selfie Alive Pro
das-Face implements an active liveness detection test based on a challenge-response method. das-Face generates a challenge that is consumed by the Selfie-Alive Pro (SAP) SDK, and the device then starts the interaction with the user. During the interaction, the user is asked to capture a selfie photograph and to record a short video of their face performing a few random head movements. The number of random movements is configurable by the integrator: we recommend 2 movements as standard and 6 movements for maximum security. Once everything is recorded, the SDK hands over all the captured evidence, and the device must send all the data back to the das-Face server for processing. das-Face then analyzes the video and selfie data looking for liveness evidence.
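To make the idea of a random challenge concrete, here is a minimal sketch of how a server could draw an unpredictable sequence of head movements. It is not the das-Face or Selfie-Alive Pro API: the movement labels, the function name and the challenge format are all hypothetical, since this report does not describe them. Only the configurable movement count (2 as standard, 6 for maximum security) comes from the text above.

```python
import secrets
from typing import List

# Hypothetical movement labels; the real challenge format is not described here.
MOVEMENTS = ("left", "right", "up", "down")


def generate_challenge(num_movements: int = 2) -> List[str]:
    """Draw a random sequence of head movements for one liveness session.

    `secrets` is used instead of `random` so that the sequence cannot be
    predicted by an attacker preparing a pre-recorded video.
    """
    if num_movements < 1:
        raise ValueError("at least one movement is required")
    return [secrets.choice(MOVEMENTS) for _ in range(num_movements)]


print(generate_challenge())    # standard configuration: 2 movements
print(generate_challenge(6))   # maximum-security configuration: 6 movements
```

The longer the sequence, the less likely a replayed video happens to contain the requested movements in the right order, at the cost of a longer interaction for the user.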
Veridas active liveness detection implemented in Selfie-Alive Pro was tested by iBeta to the ISO 30107-3 Biometric Presentation Attack Detection Standard and was found to be in compliance with Level 1 and Level 2.
Given this result, and because the ISO 30107-3 testing was performed with different paper-based attacks, screens, and 3D masks, the Selfie-Alive Pro solution falls within levels A, B and C indicated by the FIDO recommendations.
The system's algorithms are trained on multiple databases that combine different presentation attack species, and they are internally evaluated using datasets carefully curated by Veridas. Additionally, Veridas submits the solution for evaluation by third parties, such as iBeta, as mentioned above.
Passive Liveness Detection Engine
das-Face also includes a passive liveness detector designed to prevent fraudulent access. Given a selfie photo, the detector estimates a score indicating how likely it is that the photo was captured from an actual person’s face.
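Given such liveness scores, the APCER and BPCER defined earlier in this document follow from a decision threshold. The sketch below is a hedged illustration: the function name, the toy scores, the threshold value and the convention that a score greater than or equal to the threshold is accepted as bonafide are assumptions, not das-Face behavior.

```python
def pad_error_rates(attack_scores, bonafide_scores, threshold):
    """Compute (APCER, BPCER) for liveness scores at a given threshold.

    A sample is accepted as bonafide when score >= threshold
    (assumption of this sketch).
    """
    if not attack_scores or not bonafide_scores:
        raise ValueError("both score lists must be non-empty")

    # Attacks wrongly accepted as authentic.
    attacks_accepted = sum(1 for s in attack_scores if s >= threshold)
    # Bonafide presentations wrongly rejected as attacks.
    bonafide_rejected = sum(1 for s in bonafide_scores if s < threshold)

    apcer = attacks_accepted / len(attack_scores)
    bpcer = bonafide_rejected / len(bonafide_scores)
    return apcer, bpcer


# Toy scores, invented for illustration only.
attacks = [0.05, 0.20, 0.70, 0.10]
bonafide = [0.90, 0.95, 0.40, 0.85, 0.99]
apcer, bpcer = pad_error_rates(attacks, bonafide, threshold=0.5)
print(f"APCER = {apcer:.2%}, BPCER = {bpcer:.2%}")
```

Moving the threshold trades one error for the other: a higher value lets fewer attacks through but rejects more genuine users.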
Veridas passive liveness detection was tested by iBeta to the ISO 30107-3 Biometric Presentation Attack Detection Standard and was found to be in compliance with Level 1 and Level 2.
The system's algorithms are trained on multiple databases that combine different presentation attack species, and they are internally evaluated using datasets carefully curated by Veridas. Additionally, Veridas submits the solution for evaluation by third parties, such as iBeta, as mentioned above.
References
- Gary B. Huang, Manu Ramesh, Tamara Berg, and Erik Learned-Miller. "Labeled Faces in the Wild: A Database for Studying Face Recognition in Unconstrained Environments". University of Massachusetts, Amherst, Technical Report 07-49, October, 2007.
- FIDO Alliance. Biometric Requirements v1.0 (PAD criteria). 2019. url: https://fidoalliance.org/specs/biometric/requirements/ (visited on 2020-07-10).
- J. Liu, Y. Deng, and C. Huang. "Targeting ultimate accuracy: Face recognition via deep embedding". arXiv:1506.07310, 2015.
- F. Schroff, D. Kalenichenko, and J. Philbin. "Facenet: A unified embedding for face recognition and clustering". CVPR, 2015.
- Taigman, Y., Yang, M., Ranzato, M. & Wolf, L. "Deepface: closing the gap to human-level performance in face verification". Proc. Conference on Computer Vision and Pattern Recognition 1701–1708 (2014).
- Maze, B., et al. "IARPA Janus Benchmark – C: Face Dataset and Protocol". 11th IAPR International Conference on Biometrics (2018).
- Wang, M., et al. "Racial Faces in-the-Wild: Reducing Racial Bias by Information Maximization Adaptation Network". arXiv:1812.00194, 2019.