Face Biometry Performance Report¶
Last updated: 2022-05-17
Introduction¶
das-Face is the face recognition engine designed and developed by Veridas Digital Authentication Solutions S.L. with the goal of performing automatic identity verification (1:1) and identification (1:N) under different scenarios.
This document summarises a performance analysis of das-Face for both verification (1:1) and identification (1:N).
These evaluation standards are commonly used by commercial and academic communities worldwide for system evaluation and comparison against the state of the art. The results shown here are:
- Verification (1:1): FRVT 1:1 Verification, submission date: 2021-09-02
- Identification (1:N): FRVT 1:N Identification, submission date: 2021-11-09
das-Face is also capable of performing liveness detection using a passive procedure based on a selfie image, and a challenge-response active methodology based on a selfie and an annotated video. This document shows performance of das-Face for both use cases.
Veridas active liveness detection implemented in Selfie-Alive Pro was tested by iBeta to the ISO 30107-3 Biometric Presentation Attack Detection Standard. Confirmation letters are available in the links below:
The document's content is divided as follows: Section 2 provides definitions needed to better understand the analysis and results. Section 3 analyses das-Face performance on the verification task (1:1). Section 4 analyses das-Face performance on the identification task (1:N). Section 5 presents the system calibration for the main use cases. Section 6 presents information on the accuracy of the liveness detection engine.
The face recognition engine developed by Veridas was ranked by NIST as the third best in the world in the WILD category on April 4th, 2019, and it’s the subject of continuous development and improvement efforts.
The face recognition engine developed by VERIDAS was ranked by NIST in the top 25% of the systems submitted to FRVT 1:1 in the WILD category. Find below a picture of all the competitors in the mentioned WILD category. The VERIDAS system is marked in red. (Results shown from NIST do not constitute an endorsement of any particular system, product, service, or company by NIST.)
VERIDAS achieved a False Non Match Rate (FNMR) of 3.11% for a False Match Rate (FMR) threshold fixed at 0.001%. VERIDAS also achieved an FNMR of 2.84% for an FMR threshold fixed at 0.01%. These figures put VERIDAS 0.14 points from the Top-3 system. Taking these results into account, VERIDAS complies with the FIDO requirements for facial biometric verification systems. Specifically, FIDO states that the FNMR should be less than 5% for an FMR of 0.01%.
VERIDAS also complies with the CCN regulation, which requires the FNMR (FNR) to be less than or equal to 5% for an FMR (FPR) of 0.0001% in the VISABORDER category. VERIDAS fully meets this requirement, as its FNMR (FNR) in that category is just 0.8%.
The WILD category is characterized by a non-collaborative subject: the person whose face is being captured does not have to be facing the camera, and the picture may show different issues in terms of illumination, contrast, or exposure.
Given the nature of onboarding pictures, this use case is similar to the WILD category, since the person takes the picture in uncontrolled conditions.
The face recognition engine developed by Veridas was ranked by NIST among the top 25% of systems in the world in the MUGSHOT category on August 5th, 2021, and it is the subject of continuous development and improvement efforts.
The face recognition engine developed by VERIDAS was ranked by NIST 51st of the 319 systems submitted to FRVT 1:N in the MUGSHOT category. The evaluation was performed in November 2021. Find below a picture of all the competitors in the mentioned MUGSHOT category. The VERIDAS system is marked in red. (Results shown from NIST do not constitute an endorsement of any particular system, product, service, or company by NIST.)
VERIDAS achieved a False Negative Identification Rate (FNIR) of 1% for a False Positive Identification Rate (FPIR) threshold fixed at 0.3%, with a gallery of N=1.6M.
The MUGSHOT category is characterized by a collaborative subject, with images almost following ISO 19794-5, so the person whose face is being captured is photographed in good acquisition conditions.
The next diagram shows the relation between search time and FNIR for das-Face and other algorithms.
Definitions¶
Definitions to better understand the analysis and results:
- Verification task (1:1): Use case in which two different images containing the face of a person are presented to the system for it to determine if they are (or not) the same person.
- National Institute of Standards and Technology (NIST): Measurement standards laboratory whose mission is to promote innovation and industrial competitiveness.
- Negative evaluation: Evaluation of two images belonging to two different people.
- Positive evaluation: Evaluation of two images belonging to the same person.
- Accuracy: Percentage of correct answers provided by the system.
- False Positive Rate (FPR) or False Match Rate (FMR): Ratio between the number of negative evaluations wrongly categorized as positive and the total number of actual negative evaluations.
- True Positive Rate (TPR): Ratio between the number of positive evaluations correctly categorized as positive and the total number of actual positive evaluations.
- False Non Match Rate (FNMR): Ratio between the number of positive evaluations rejected by the system and the total number of actual positive evaluations.
- Identification task (1:N): Use case in which an image containing the face of a person is presented to the system, which has access to a pool of N images each corresponding to an identity, in order for the system to determine to which of the N identities (if any) the presented image belongs.
- Identification Rate (IR): Ratio between the number of successful identifications and the total number of performed identifications.
- Identification Rank (R): Upper bound of the position where the match should be in the list of candidates returned by the system in order to consider it a successful match.
- False Negative Identification Rate (FNIR): Ratio between the number of positive identifications rejected because they do not achieve the threshold and the total number of actual positive identifications.
- False Positive Identification Rate (FPIR): Ratio between the number of negative identifications accepted because they achieve the threshold and the total number of negative identifications.
- Liveness detection: An automatic procedure whose purpose is to estimate how likely it is that the captured evidence (images, videos, etc.) belongs to an actual person and not to a spoofed sample of a person.
- Bonafide: A presentation attempt that is performed by a trustworthy person.
- Attack: A presentation attempt that is performed by an impostor or spoofer.
- Attack Presentation Classification Error Rate (APCER): Ratio between the number of spoof attacks misclassified as authentic and the total number of performed attacks.
- Bona Fide Presentation Classification Error Rate (BPCER): Ratio between the number of bona fide presentations (actual persons' faces) misclassified as attacks and the total number of bona fide samples.
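The rates defined above all reduce to ratios over confusion counts. As a minimal illustrative sketch (not part of das-Face), they can be computed as follows; the example counts are chosen to reproduce the WILD figures quoted later in this report:

```python
def rates(tp, fn, tn, fp):
    """Compute the basic verification error rates from confusion counts.

    tp/fn: positive (same-person) evaluations accepted/rejected.
    tn/fp: negative (different-person) evaluations rejected/accepted.
    """
    positives = tp + fn
    negatives = tn + fp
    return {
        "FMR": fp / negatives,   # negatives wrongly accepted (a.k.a. FPR)
        "FNMR": fn / positives,  # positives wrongly rejected
        "TPR": tp / positives,   # positives correctly accepted
    }

# Example: 10,000 mated and 100,000 non-mated comparisons.
r = rates(tp=9_689, fn=311, tn=99_999, fp=1)
# FMR = 0.001% and FNMR = 3.11%, matching the WILD figures above.
```

The same arithmetic applies to APCER and BPCER, with attacks playing the role of negatives and bona fide presentations the role of positives.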
das-Face performance in identity verification (1:1)¶
In this section, a performance analysis of the latest version of das-Face is presented, using NIST evaluations as a reference.
1:1 Ongoing Face Recognition Vendor Test (FRVT) - Verification¶
This section describes the most relevant results of our model in the NIST FRVT 1:1 evaluation.
In verification, multiple datasets are used to test the system. Currently, the following types of datasets are used in the NIST benchmarks:
- Wild images: Typical social-media images, with many photojournalism-style images, given to the algorithm as-is. This is the most difficult category, as subjects are non-collaborative.
- Mugshot images: Photographic portrait of a person from the waist up, typically taken after a person is arrested. These photos have a reasonable conformance with the ISO/IEC 19794-5 Full Frontal image type.
- VISA images: Photos collected in immigration offices.
- Border images: Border crossing images collected in primary immigration lanes.
The next graph shows the system performance on different domains.
das-Face performance in identification (1:N)¶
This section describes the most relevant results of our biometric model in the NIST FRVT 1:N. In identification, a particular identity (probe) is searched within a pool of N known identities (gallery). The next Table reports how the system's FNIR is affected by the gallery size N, at a fixed operating point of FPIR=0.1%, using the FRVT'2018 MUGSHOT dataset and identification rank R=1.
N | VERIDAS FNIR (%) |
---|---|
640K | 1.17 |
1.6M | 1.66 |
3M | 2.19 |
6M | 4.45 |
12M | 15.43 |
Table I System performance for identification task (1:N) under FRVT 1:N evaluation for FRVT’2018 MUGSHOT dataset and different gallery sizes (N).
From this table it can be deduced that, with a gallery of 1.6 million people, an FPIR of 0.1% means that in 1 out of every 1000 identifications where the probe is not in the gallery, the system finds a match with the wrong person; the FNIR of 1.66% means the system incorrectly rejects a person who actually is in the gallery 1.66% of the time.
The next Table reports the FNIR of the system for different FPIR operating thresholds, for gallery sizes N=3k and N=640k, with identification rank R=1.
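The interpretation above can be made concrete with a short, purely illustrative calculation (the rates are taken from Table I; the search counts are hypothetical):

```python
# Operating point from Table I (N = 1.6M gallery).
FPIR = 0.001   # 0.1%: fraction of non-mated searches returning a wrong match
FNIR = 0.0166  # 1.66%: fraction of mated searches where the person is missed

non_mated_searches = 1000   # probes NOT present in the gallery
mated_searches = 1000       # probes present in the gallery

# Expected number of wrong matches and missed identifications.
expected_false_matches = FPIR * non_mated_searches
expected_missed = FNIR * mated_searches
# About 1 false match per 1000 non-mated searches,
# and about 16.6 misses per 1000 mated searches.
```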
Gallery size | FPIR (%) | FNIR (%) | Threshold |
---|---|---|---|
3,000 | 0.001% | 3.0% | >0.99 |
3,000 | 0.01% | 1.3% | >0.98 |
3,000 | 0.1% | 0.4% | >0.96 |
3,000 | 1% | <0.1% | >0.90 |
640,000 | 0.03% | 2.5% | >0.99 |
640,000 | 0.1% | 1.2% | >0.98 |
640,000 | 1% | 0.5% | >0.96 |
Table II Calibration curve for identification task (1:N)
The previous table shows that, for a score threshold above 0.96 with a gallery of N=640K persons, the system finds false matches 1% of the time (FPIR) and incorrectly rejects 0.5% of the people who actually are in the gallery (FNIR).
Face verification technologies¶
Selfie vs Selfie¶
When using the system with selfie photos, the response may differ because of the characteristics of this particular use case. Results for the selfie-vs-selfie case are presented in Table III, evaluated on an internal database created for this purpose. This table applies equally to the Veridas Native and HTML SDKs (mobile & desktop).
Similarity Threshold | FPR (%) | FNR (%) |
---|---|---|
0.50 | 0.039 | 0.156 |
0.55 | 0.027 | 0.156 |
0.60 | 0.026 | 0.156 |
0.65 | 0.026 | 0.156 |
0.70 | 0.011 | 0.159 |
0.75 | 0.011 | 0.162 |
0.80 | 0.006 | 0.171 |
0.85 | 0.004 | 0.188 |
0.90 | 0.003 | 0.215 |
0.95 | 0.001 | 0.344 |
Table III System performance for selfie vs selfie
For instance, choosing 0.80 as the threshold, all biometric comparisons with a score above 0.80 will be considered the same person, and all comparisons with a score below 0.80 will be considered different persons. At 0.80, in the selfie-vs-selfie case, 0.171% of comparisons between a person's selfies will be rejected (false negatives), and only 0.006% of cases will be incorrectly classified as authentic (false positives).
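The decision rule is a simple comparison of the similarity score against the chosen threshold. A minimal sketch (an illustrative helper, not the das-Face API):

```python
def same_person(similarity_score: float, threshold: float = 0.80) -> bool:
    """Binary verification decision: scores above the threshold are
    treated as the same person, scores below as different persons."""
    return similarity_score > threshold

# A high-similarity pair is accepted; a low-similarity pair is rejected.
same_person(0.91)  # True
same_person(0.42)  # False
```

The threshold is the integrator's lever for trading false positives against false negatives, as quantified in Table III.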
Selfie vs ID Document¶
When using the system to compare a selfie photo and an identity document photograph crop, the response may change again because of the characteristics of this particular use case.
Several factors make the biometric comparison in a digital onboarding process extremely variable: the ID document manufacturing process, environmental conditions during the capture of both the document and the selfie, visual artifacts in the document image, the capture technology and lens used, facial accessories the person may wear, and the time elapsed between the two photos.
- Images printed on European identity documents
- Selfie images from onboarding processes
The facial biometrics engine is specifically trained for the selfie vs. document comparison use case, allowing for optimized performance. The Veridas biometric engine is robust to the following situations:
- Presence of glare in the printed photo area.
- Presence of the kinegram and other visual artifacts on the printed photo.
- Temporal difference between the selfie photo captured by the user and the printed photo.
- Changes in the face: presence of beard, mustache, hair changes, glasses, make-up, etc.
- Presence of a face mask.
In this case, Table IV is more suitable to describe the behavior of the system. The system has been trained with document and selfie pairs in order to adapt it to this use case. The table has been computed using an internal testing dataset created for this purpose, with 3,416 real cases of selfie and document images. This table applies equally to the Veridas Native and HTML SDKs (mobile & desktop).
Similarity Threshold | FPR (%) | FNR (%) |
---|---|---|
0.50 | 1.12 | 1.22 |
0.55 | 0.81 | 1.30 |
0.60 | 0.59 | 1.36 |
0.65 | 0.41 | 1.46 |
0.70 | 0.27 | 1.57 |
0.75 | 0.17 | 1.73 |
0.80 | 0.09 | 1.85 |
0.85 | 0.05 | 2.10 |
0.90 | 0.01 | 2.31 |
0.95 | <0.01 | 3.50 |
Table IV System performance for selfie vs ID Document
Based on our experience, the operating point is usually at least 0.70. For instance, choosing 0.70 as the threshold, all biometric comparisons with a score above 0.70 will be considered the same person, and all comparisons with a score below 0.70 will be considered different persons. At 0.70, in the selfie-vs-document case, 1.57% of comparisons between a person's selfie and their corresponding legitimate ID card will be rejected (false negatives), and only 0.27% of comparisons between a selfie and an ID card belonging to different persons will be incorrectly classified as authentic (false positives).
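An integrator typically works backwards from a target false-positive budget. As an illustrative sketch (not a das-Face API), one can encode Table IV and look up the lowest threshold that meets a target FPR, together with the FNR paid at that operating point:

```python
# Table IV (selfie vs ID document) as (threshold, FPR %, FNR %) rows.
TABLE_IV = [
    (0.50, 1.12, 1.22), (0.55, 0.81, 1.30), (0.60, 0.59, 1.36),
    (0.65, 0.41, 1.46), (0.70, 0.27, 1.57), (0.75, 0.17, 1.73),
    (0.80, 0.09, 1.85), (0.85, 0.05, 2.10), (0.90, 0.01, 2.31),
]

def threshold_for_max_fpr(max_fpr_percent: float):
    """Return the lowest tabulated threshold whose FPR does not exceed
    the target, together with the FNR at that operating point."""
    for thr, fpr, fnr in TABLE_IV:
        if fpr <= max_fpr_percent:
            return thr, fnr
    return None  # target FPR tighter than any tabulated point

# Targeting at most 0.30% false positives selects the 0.70 threshold,
# at the cost of a 1.57% false negative rate.
threshold_for_max_fpr(0.30)  # (0.70, 1.57)
```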
Liveness detection technologies¶
Veridas liveness detection has been measured for different scenarios. These scenarios are of Level 1 and Level 2. Level 1 scenarios are the following:
- PHOTO-REPLAY-ATTACK: Attack that consists of displaying a photo on a digital device in front of the camera.
- PRINT-3D-LAYERED-MASK: Attack with a mask created from several layers of the same photo, each with a smaller contour, creating a depth effect. It is a modification of PRINT-MASK-ATTACK.
- PRINT-ATTACK: Attack that consists of displaying a simple paper photo in front of the camera (without any holes).
- PRINT-MASK-ATTACK: Attack that consists of printing and cutting out a simple paper photo and displaying it in front of the camera.
- VIDEO-REPLAY-ATTACK: Attack that consists of displaying a video on a digital device in front of the camera.
The Level 2 scenarios in which it has been measured are the following:
- 2D-TO-3D-AVATAR-ATTACK: Attack that consists of using a 3D avatar regenerated from a photo by any available phone and/or web app.
- 3D-CURVED-PAPER-MASK: Attack that consists of creating, cutting, and bending a mask from a high-resolution print.
- LATEX-MASK-ATTACK: Attack that consists of wearing a latex mask in front of the camera.
- LAYERED-2D-TRANSPARENT-PHOTO: Attack that consists of using a photo printed on translucent paper layered over a printed photo.
- MANNEQUIN-HEAD-ATTACK: Attack that consists of placing a mannequin head in front of the camera.
- PHOTO-REPLAY-3D-RENDER-ATTACK: Attack that consists of displaying a CG (Computer Graphics) customized model, as a still image, rendered in a specific engine, on a digital device in front of the camera.
- PLASTER-MASK-ATTACK: Attack that consists of wearing masks that appear to be made of plaster and painted in detail.
- PLASTIC-MASK-ATTACK: Attack that consists of wearing a rigid plastic mask in front of the camera.
- RESIN-MASK-ATTACK: Attack that consists of wearing a resin mask in front of the camera.
- SILICONE-MASK-ATTACK: Attack that consists of wearing a silicone mask in front of the camera.
- TRANSPARENT-MASK-ATTACK: Attack that consists of wearing a translucent mask in front of the camera.
- VIDEO-REPLAY-3D-RENDER-ATTACK: Attack that consists of displaying a CG (Computer Graphics) customized model, with movement, rendered in a specific engine, on a digital device in front of the camera.
SDK Selfie Alive Pro¶
das-Face also incorporates an active liveness detection procedure based on a challenge-response method. das-Face generates a challenge that is consumed by the Selfie-Alive Pro (SAP) SDK, which then starts the interaction with the user. During the interaction, the user is asked to capture a selfie photograph and to record a short video of their face performing a few random head movements. The number of random movements is configurable by the integrator; we recommend 2 movements as standard and 6 movements for maximum security. Once everything is recorded, the SDK returns all the captured evidence, and the device must send the data back to the das-Face server for processing. das-Face then analyzes the video and selfie data looking for liveness evidence.
Veridas active liveness detection implemented in Selfie-Alive Pro was tested by iBeta to the ISO 30107-3 Biometric Presentation Attack Detection Standard and was found to be in compliance with Level 1 and Level 2.
Having achieved this result, and because the ISO 30107-3 testing was performed with 2D printouts, paper masks, 3D layered photos, photos replayed on screens, 2D photos attached to 3D contoured masks, 3D animation software, and latex, silicone, and resin masks, the Selfie-Alive Pro solution falls into levels A, B, and C indicated by the FIDO recommendations.
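The challenge-response round trip can be sketched as follows. This is a hypothetical illustration of the protocol shape only; all function and field names here are invented for the example and are not the real das-Face or SAP SDK API:

```python
import random

HEAD_MOVES = ["left", "right", "up", "down"]

def generate_challenge(n_movements: int = 2) -> list:
    """Server side: issue a random sequence of head movements
    (2 is the recommended standard, 6 for maximum security)."""
    return [random.choice(HEAD_MOVES) for _ in range(n_movements)]

def movements_match(challenge: list, observed: list) -> bool:
    """Server side: the movements detected in the recorded video must be
    exactly the requested ones, in order, for the response to be accepted."""
    return observed == challenge

challenge = generate_challenge(n_movements=2)
# The SDK would guide the user through these movements and record video;
# here we simulate a user who performed them correctly.
movements_match(challenge, list(challenge))  # accepted
movements_match(challenge, ["down"] * 6)     # rejected (wrong response)
```

Because the challenge is freshly randomized per session, a pre-recorded replay video is unlikely to show the requested movement sequence, which is the core of the active defense.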
The system algorithms are trained on multiple databases combining the different types of attacks explained above. The system's performance has been evaluated on an internal database composed of more than 1400 authentic samples and 6000 attacks. The numbers for this dataset are reported in Table V:
Liveness Threshold | BPCER (%) | MAX APCER (%) Level 1 |
---|---|---|
0.50 | 0.55 | 25.00 |
0.55 | 0.74 | 9.41 |
0.60 | 1.05 | 7.06 |
0.65 | 4.19 | 3.53 |
0.70 | 5.12 | 1.18 |
0.75 | 6.97 | <1.18 |
0.80 | 10.60 | <1.18 |
0.85 | 16.83 | <1.18 |
0.90 | 31.01 | <1.18 |
0.95 | 66.58 | <1.18 |
Table V System performance for SAP
The MAX APCER column indicates the maximum APCER over the attack types in its category (Level 1 or Level 2). The increase at certain liveness thresholds with respect to older versions is because the figure was previously expressed as the average of the APCER over all attack types in a category, not the maximum.
Based on Table V, using an operating point of 0.70, 5.12% of authentic cases will be rejected and, for the worst-performing Level 1 attack type, 1.18% of spoofing attempts will be misclassified as authentic.
Optimal performance requires the following constraints:
- All evidence must be kept as returned by the SDK; any additional compression may lead to accuracy problems.
- The face must be at least 150px wide to ensure it can be processed by the anti-spoofing system. For best accuracy, we recommend faces wider than 320px.
- This size allows processing of images taken with the Native and HTML SDKs provided by Veridas. Using other capture procedures may harm the correct operation of the system.
- The face is expected to be frontal to the camera in the selfie.
- Face movements should be smooth during the video recording.
Based on Table V, the following thresholding criteria are recommended:
- When the score is above 0.7, the attempt is classified as "bona fide".
- When the score is below 0.5, the attempt is classified as "attack".
- When the score is between 0.5 and 0.7, the attempt is doubtful and should be reviewed by a human operator.
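The three-way decision above can be sketched as a small helper (illustrative only, not the das-Face API; the 0.5/0.7 boundaries are the recommendations from the text and are configurable):

```python
def classify_liveness(score: float, low: float = 0.5, high: float = 0.7) -> str:
    """Three-way liveness decision: scores above `high` are bona fide,
    scores below `low` are attacks, and anything in between is routed
    to a human operator for review."""
    if score > high:
        return "bona fide"
    if score < low:
        return "attack"
    return "review"

classify_liveness(0.82)  # "bona fide"
classify_liveness(0.31)  # "attack"
classify_liveness(0.63)  # "review"
```

The same decision structure applies to the passive liveness engine described later; only the recommended boundaries may differ depending on the Level 1 vs Level 2 protection goal.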
If the main goal is protecting against Level 2 attacks, it is encouraged to increase the threshold to 0.90. In this case, the BPCER is 31.01% and the APCER is 5.06%. In other words, assuming the worst-case Level 2 attack scenario, 31.01% of authentic cases are rejected and 5.06% of spoofing attempts will be misclassified as genuine.
Passive Liveness Detection Engine¶
das-Face also includes a passive liveness detector designed to avoid fraudulent access. Given a selfie photo, the detector estimates a score of the photo being captured from an actual person’s face.
The system algorithms are trained on multiple databases combining the different types of attacks explained above. The system's performance has been evaluated on an internal database composed of more than 1500 authentic samples and 7000 attacks.
The anti-spoofing performance is shown in Table VI. Notice that the performance of the system is shown for different authenticity thresholds; a score of 1.00 means authentic and 0.00 means a spoof attempt.
Liveness Threshold | BPCER (%) | MAX APCER (%) Level 1 |
---|---|---|
0.50 | 0.42 | 22.06 |
0.55 | 0.65 | 17.65 |
0.60 | 0.83 | 13.24 |
0.65 | 1.01 | 8.09 |
0.70 | 1.48 | 3.67 |
0.75 | 2.61 | 0.74 |
0.80 | 4.86 | 0.63 |
0.85 | 8.54 | <0.63 |
0.90 | 14.42 | <0.63 |
0.95 | 28.60 | <0.63 |
Table VI System performance for passive liveness detection
The MAX APCER column indicates the maximum APCER over the attack types in its category (Level 1 or Level 2). Please note that previous versions of this documentation showed the average of the APCER over all attack types in a category, not the maximum.
Based on Table VI, using an operating point of 0.70, 1.48% of authentic cases will be rejected and, for the worst-performing Level 1 attack type, 3.67% of spoofing attempts will be misclassified as authentic.
Based on Table VI, the following thresholding criteria are recommended:
- When the score is above 0.7, the attempt is classified as "bona fide".
- When the score is below 0.5, the attempt is classified as "attack".
- When the score is between 0.5 and 0.7, the attempt is doubtful and should be reviewed by a human operator.
If the main goal is protecting against Level 2 attacks, it is encouraged to increase the threshold to 0.95. In this case, the BPCER is 28.60% and the APCER is 7.86%. In other words, assuming the worst-case Level 2 attack scenario, 28.60% of authentic cases are rejected and 7.86% of spoofing attempts will be misclassified as genuine.
References¶
- Gary B. Huang, Manu Ramesh, Tamara Berg, and Erik Learned-Miller. "Labeled Faces in the Wild: A Database for Studying Face Recognition in Unconstrained Environments". University of Massachusetts, Amherst, Technical Report 07-49, October, 2007.
- FIDO Alliance. Biometric Requirements v1.0 (PAD criteria). 2019. url: https://fidoalliance.org/specs/biometric/requirements/ (visited on 2020-07-10).
- J. Liu, Y. Deng, and C. Huang. "Targeting ultimate accuracy: Face recognition via deep embedding". arXiv:1506.07310, 2015.
- F. Schroff, D. Kalenichenko, and J. Philbin. "Facenet: A unified embedding for face recognition and clustering". CVPR, 2015.
- Taigman, Y., Yang, M., Ranzato, M. & Wolf, L. "Deepface: closing the gap to human-level performance in face verification". Proc. Conference on Computer Vision and Pattern Recognition 1701–1708 (2014).
- Maze, B., et al. "IARPA Janus Benchmark – C: Face Dataset and Protocol". 11th IAPR International Conference on Biometrics (2018).
- Wang, M., et al. "Racial Faces in-the-Wild: Reducing Racial Bias by Information Maximization Adaptation Network". arXiv:1812.00194, 2019.