Voice Biometry Performance Report
das-Peak microservice
Voice biometrics is a state-of-the-art technology that validates a person's identity from their voice. The VERIDAS solution captures the unique physical features of the vocal apparatus, together with features such as frequency, speed and accent, and compiles them into a virtually unique voice biometric vector per person.
The voice biometric vector is a mathematical descriptor obtained from the characteristics of the voice in an audio recording. This mathematical conversion from voice into a biometric vector is irreversible. Therefore, it is not possible to recover a person's voice signal from the calculated biometric vector.
VERIDAS has developed its own speaker verification engine (das-Peak) as a cloud-based solution that can be consumed via APIs.
VERIDAS' voice biometrics group participated in the Short-duration Speaker Verification (SdSV) Challenge 2020, obtaining third place (second place among single-model systems), which demonstrates state-of-the-art results in voice biometrics under short-utterance conditions. See the challenge results for details.
das-Peak calculates the similarity between two audio recordings (in terms of the speakers present in them) using biometric algorithms. Because it is based on text-independent technology, the das-Peak engine can authenticate a user's voice without a password or predefined phrase (passive recognition): the biometric comparison depends on the characteristics of the voice, not on the content of the sentence. The system can nevertheless work with predefined phrases to meet customer requirements or to add further controls. It also follows that das-Peak is a language-independent technology, so it can identify a person whichever language they speak.
System quality report
The performance analysis below was carried out with audio samples encoded with the PCM_16 and G711 (mu-law/a-law, 8 bits per sample) codecs. Any deviation from these conditions may change the reported results unexpectedly. To ensure compliance, see the audio conditions in the Main features section.
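As a rough pre-check before submitting audio, the encoding can be read from the file header. The sketch below assumes the audio is stored in a RIFF/WAVE container, which this report does not state, and the function name `describe_wav_encoding` and its return labels are illustrative only; they are not part of any VERIDAS API. It reads the standard WAVE format tag (1 for PCM, 6 for A-law, 7 for mu-law) and the bits per sample from the `fmt ` chunk.

```python
import io
import struct
import wave

# Format tags registered for the RIFF/WAVE container.
WAVE_FORMAT_PCM = 0x0001
WAVE_FORMAT_ALAW = 0x0006
WAVE_FORMAT_MULAW = 0x0007


def describe_wav_encoding(data: bytes) -> str:
    """Return 'PCM_16', 'G711_ALAW', 'G711_MULAW' or 'OTHER' for WAV bytes.

    Hypothetical helper: the labels are chosen for this example only.
    """
    if len(data) < 12 or data[0:4] != b"RIFF" or data[8:12] != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")
    pos = 12
    # Walk the chunk list until the 'fmt ' chunk is found.
    while pos + 8 <= len(data):
        chunk_id = data[pos:pos + 4]
        (chunk_size,) = struct.unpack_from("<I", data, pos + 4)
        if chunk_id == b"fmt ":
            if chunk_size < 16 or pos + 8 + 16 > len(data):
                raise ValueError("truncated fmt chunk")
            fmt_tag, _channels, _rate, _byte_rate, _align, bits = struct.unpack_from(
                "<HHIIHH", data, pos + 8
            )
            if fmt_tag == WAVE_FORMAT_PCM and bits == 16:
                return "PCM_16"
            if fmt_tag == WAVE_FORMAT_ALAW and bits == 8:
                return "G711_ALAW"
            if fmt_tag == WAVE_FORMAT_MULAW and bits == 8:
                return "G711_MULAW"
            return "OTHER"
        # Chunks are padded to an even number of bytes.
        pos += 8 + chunk_size + (chunk_size & 1)
    raise ValueError("no fmt chunk found")


# Usage: build a short 16-bit PCM WAV in memory and inspect it.
buffer = io.BytesIO()
with wave.open(buffer, "wb") as writer:
    writer.setnchannels(1)
    writer.setsampwidth(2)      # 2 bytes per sample = 16-bit PCM
    writer.setframerate(8000)
    writer.writeframes(b"\x00\x00" * 80)
print(describe_wav_encoding(buffer.getvalue()))  # PCM_16
```

A check like this only inspects the header; it does not validate the other audio conditions referred to above.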
Verification performance (Telephone Channel)
VERIDAS has evaluated its voice biometrics model (das-Peak 2023Q4 version) on an internal English-language telephone audio database, using enrollment and verification audios of different durations (3, 5 and 10 seconds, as listed in the table). The following table shows the False Positive Rate (FPR, the probability of accepting a non-legitimate person) and the False Negative Rate (FNR, the probability of rejecting a legitimate person) for different threshold values. These values make it possible to choose the desired working point of the voice biometric system.
TELEPHONE (each column pair gives the enrollment and verification audio durations)

Similarity threshold | Enroll 3s, Verify 3s: FPR (%) | Enroll 3s, Verify 3s: FNR (%) | Enroll 5s, Verify 3s: FPR (%) | Enroll 5s, Verify 3s: FNR (%) | Enroll 5s, Verify 5s: FPR (%) | Enroll 5s, Verify 5s: FNR (%) | Enroll 10s, Verify 5s: FPR (%) | Enroll 10s, Verify 5s: FNR (%) | Enroll 10s, Verify 10s: FPR (%) | Enroll 10s, Verify 10s: FNR (%) |
---|---|---|---|---|---|---|---|---|---|---|
0.50 | 5.00 | 7.04 | 5.00 | 5.75 | 5.00 | 4.67 | 5.00 | 3.84 | 5.00 | 3.03 |
0.55 | 4.06 | 7.79 | 4.05 | 6.25 | 4.05 | 5.00 | 4.05 | 4.05 | 4.05 | 3.19 |
0.60 | 3.26 | 8.68 | 3.25 | 6.84 | 3.24 | 5.42 | 3.24 | 4.32 | 3.24 | 3.40 |
0.65 | 2.57 | 9.76 | 2.56 | 7.58 | 2.55 | 5.89 | 2.55 | 4.64 | 2.55 | 3.62 |
0.70 | 1.97 | 10.99 | 1.97 | 8.44 | 1.96 | 6.51 | 1.96 | 5.03 | 1.96 | 3.88 |
0.75 | 1.46 | 12.59 | 1.46 | 9.62 | 1.46 | 7.30 | 1.45 | 5.59 | 1.46 | 4.19 |
0.80 | 1.02 | 14.65 | 1.02 | 11.17 | 1.02 | 8.33 | 1.02 | 6.32 | 1.02 | 4.68 |
0.85 | 0.64 | 17.68 | 0.65 | 13.41 | 0.65 | 9.83 | 0.65 | 7.40 | 0.65 | 5.35 |
0.90 | 0.34 | 22.56 | 0.34 | 17.08 | 0.34 | 12.46 | 0.34 | 9.20 | 0.34 | 6.58 |
0.95 | 0.11 | 31.98 | 0.11 | 24.83 | 0.12 | 18.21 | 0.12 | 13.41 | 0.12 | 9.47 |
This calibration shows the security working points available, depending on the similarity threshold and the duration of the audios being compared.
For example, if the use case is 10 seconds to enroll and 5 seconds to verify, a threshold of 0.80 gives FPR = 1.02% and FNR = 6.32%. In this case, 93.68% of the comparisons between a person's voice and their own voice registration will be accepted as the same person, and only 1.02% of the comparisons between voices of different persons will be incorrectly classified as the same person.
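Choosing a working point from the table can be sketched in a few lines. The example below is a minimal illustration, not a VERIDAS tool: it copies the telephone figures for 10 s enrollment and 5 s verification from the table above and, given a maximum tolerated FPR, returns the row with the lowest FNR that still meets it. The names `OperatingPoint` and `pick_operating_point` are hypothetical.

```python
from typing import NamedTuple, Optional, Sequence


class OperatingPoint(NamedTuple):
    threshold: float
    fpr_pct: float
    fnr_pct: float


# Telephone channel, 10 s enrollment / 5 s verification (values from the table).
TELEPHONE_E10_V5 = [
    OperatingPoint(0.50, 5.00, 3.84),
    OperatingPoint(0.55, 4.05, 4.05),
    OperatingPoint(0.60, 3.24, 4.32),
    OperatingPoint(0.65, 2.55, 4.64),
    OperatingPoint(0.70, 1.96, 5.03),
    OperatingPoint(0.75, 1.45, 5.59),
    OperatingPoint(0.80, 1.02, 6.32),
    OperatingPoint(0.85, 0.65, 7.40),
    OperatingPoint(0.90, 0.34, 9.20),
    OperatingPoint(0.95, 0.12, 13.41),
]


def pick_operating_point(
    points: Sequence[OperatingPoint], max_fpr_pct: float
) -> Optional[OperatingPoint]:
    """Return the tabulated point with the lowest FNR whose FPR is within budget.

    Returns None when no tabulated threshold satisfies the FPR budget.
    """
    eligible = [p for p in points if p.fpr_pct <= max_fpr_pct]
    if not eligible:
        return None
    return min(eligible, key=lambda p: p.fnr_pct)


# Usage: tolerate at most 1.5% false positives.
choice = pick_operating_point(TELEPHONE_E10_V5, max_fpr_pct=1.5)
print(choice.threshold, choice.fpr_pct, choice.fnr_pct)  # 0.75 1.45 5.59
```

Only the tabulated thresholds are considered; values between rows would require interpolation or a fresh evaluation, which this sketch does not attempt.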
Verification performance (Lossless Audio)
VERIDAS has internally evaluated its voice biometrics model (das-Peak 2023Q4 version) on a test database, using enrollment and verification audios of different durations (3, 5 and 10 seconds, as listed in the table). The database language is English. The following table shows the False Positive Rate (FPR, the probability of accepting a non-legitimate person) and the False Negative Rate (FNR, the probability of rejecting a legitimate person) for different threshold values. These values make it possible to choose the desired working point of the voice biometric system.
LOSSLESS (each column pair gives the enrollment and verification audio durations)

Similarity threshold | Enroll 3s, Verify 3s: FPR (%) | Enroll 3s, Verify 3s: FNR (%) | Enroll 5s, Verify 3s: FPR (%) | Enroll 5s, Verify 3s: FNR (%) | Enroll 5s, Verify 5s: FPR (%) | Enroll 5s, Verify 5s: FNR (%) | Enroll 10s, Verify 5s: FPR (%) | Enroll 10s, Verify 5s: FNR (%) | Enroll 10s, Verify 10s: FPR (%) | Enroll 10s, Verify 10s: FNR (%) |
---|---|---|---|---|---|---|---|---|---|---|
0.50 | 1.00 | 11.24 | 1.00 | 7.71 | 1.00 | 4.98 | 1.00 | 3.51 | 1.00 | 2.45 |
0.55 | 0.73 | 13.12 | 0.74 | 9.00 | 0.74 | 5.82 | 0.74 | 4.12 | 0.74 | 2.77 |
0.60 | 0.53 | 15.20 | 0.53 | 10.66 | 0.54 | 6.90 | 0.54 | 4.86 | 0.54 | 3.25 |
0.65 | 0.38 | 17.72 | 0.38 | 12.55 | 0.38 | 8.29 | 0.38 | 5.63 | 0.39 | 3.82 |
0.70 | 0.26 | 20.61 | 0.26 | 14.79 | 0.26 | 9.89 | 0.26 | 6.72 | 0.27 | 4.47 |
0.75 | 0.17 | 24.11 | 0.17 | 17.65 | 0.17 | 11.85 | 0.17 | 8.14 | 0.17 | 5.32 |
0.80 | 0.10 | 28.31 | 0.10 | 21.23 | 0.10 | 14.70 | 0.10 | 10.12 | 0.10 | 6.68 |
0.85 | 0.05 | 33.88 | 0.05 | 26.26 | 0.05 | 18.56 | 0.05 | 13.08 | 0.05 | 8.56 |
0.90 | 0.02 | 42.06 | 0.02 | 33.57 | 0.02 | 24.80 | 0.02 | 17.91 | 0.02 | 11.94 |
0.95 | 0.00 | 55.64 | 0.00 | 46.77 | 0.00 | 36.78 | 0.00 | 28.12 | 0.00 | 19.69 |
This calibration shows the security working points available, depending on the similarity threshold and the duration of the audios being compared.
For example, if the use case is 5 seconds to enroll and 5 seconds to verify, a threshold of 0.80 gives FPR = 0.10% and FNR = 14.70%. In this case, 85.30% of the comparisons between a person's voice and their own voice registration will be accepted as the same person, and only 0.10% of the comparisons between voices of different persons will be incorrectly classified as the same person.
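To see what these percentages mean in practice, they can be converted into expected error counts for a given traffic volume. The sketch below uses the FPR and FNR from the example above; the attempt volumes (10,000 genuine and 1,000 impostor attempts) and the function name `expected_errors` are purely hypothetical and chosen for illustration.

```python
from typing import Dict


def expected_errors(
    genuine_attempts: int,
    impostor_attempts: int,
    fpr_pct: float,
    fnr_pct: float,
) -> Dict[str, float]:
    """Convert FPR/FNR percentages into expected counts for a traffic volume."""
    false_rejects = genuine_attempts * fnr_pct / 100.0
    false_accepts = impostor_attempts * fpr_pct / 100.0
    return {
        # Share of genuine users accepted is the complement of the FNR.
        "genuine_accept_pct": round(100.0 - fnr_pct, 2),
        "expected_false_rejects": round(false_rejects, 2),
        "expected_false_accepts": round(false_accepts, 2),
    }


# Lossless audio, 5 s enrollment / 5 s verification, threshold 0.80.
# The attempt volumes are hypothetical.
summary = expected_errors(
    genuine_attempts=10_000,
    impostor_attempts=1_000,
    fpr_pct=0.10,
    fnr_pct=14.70,
)
print(summary)
```

Under these assumed volumes, roughly 1,470 genuine attempts would be rejected and about 1 impostor attempt accepted, which illustrates the trade-off that a high threshold makes in favour of security.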
Identification performance
VERIDAS has evaluated its latest voice biometrics model (2023Q4) with a state-of-the-art database for the 1:N identification use case (N up to 1000). In this database, speakers were recorded in different sessions under different acoustic conditions (street, pub, train station, room, office, ...) through a web recording application. The database language is English. More than 10000 identification tasks were performed for each value of N (5, 10, 20, 50, 100, 200, 500 and 1000). For example, in a 1-to-1000 identification process (N = 1000), the probability of identifying the right individual is 98.15% (Enroll 10s-Test 5s).
The identification accuracy (%) for N = 5 to 1000, depending on the duration of the enrollment and test audios, is given in the following table:
Durations/N speakers | 5 | 10 | 20 | 50 | 100 | 200 | 500 | 1000 |
---|---|---|---|---|---|---|---|---|
Enroll 3s-Test 3s | 99.2 | 99.46 | 99.05 | 98.6 | 98.12 | 97.43 | 96.43 | 95.48 |
Enroll 5s-Test 3s | 99.76 | 99.6 | 99.35 | 98.95 | 98.6 | 98.15 | 97.33 | 96.71 |
Enroll 5s-Test 5s | 99.8 | 99.72 | 99.57 | 99.26 | 98.97 | 98.6 | 98.1 | 97.6 |
Enroll 10s-Test 3s | 99.76 | 99.66 | 99.5 | 99.25 | 98.84 | 98.54 | 98.1 | 97.46 |
Enroll 10s-Test 5s | 99.8 | 99.82 | 99.83 | 99.5 | 99.24 | 99.02 | 98.69 | 98.15 |
Enroll 10s-Test 10s | 99.92 | 99.84 | 99.84 | 99.66 | 99.42 | 99.29 | 98.96 | 98.62 |
Enroll 20s-Test 3s | 99.76 | 99.72 | 99.58 | 99.38 | 99.14 | 98.94 | 98.44 | 97.96 |
Enroll 20s-Test 5s | 99.84 | 99.88 | 99.69 | 99.61 | 99.41 | 99.35 | 98.91 | 98.59 |
Enroll 20s-Test 10s | 99.92 | 99.9 | 99.78 | 99.7 | 99.62 | 99.5 | 99.19 | 98.93 |
Enroll 20s-Test 20s | 99.96 | 99.9 | 99.84 | 99.84 | 99.68 | 99.61 | 99.42 | 99.3 |
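One plausible reading of the accuracy figures above is closed-set, rank-1 identification: the test audio is scored against all N enrolled speakers and the highest-scoring speaker is taken as the answer. This report does not describe the exact evaluation protocol, so the sketch below is an assumption, and the speaker identifiers, similarity scores and function names are invented toy values.

```python
from typing import Dict, List, Tuple


def identify(scores: Dict[str, float]) -> str:
    """Return the enrolled speaker with the highest similarity score."""
    return max(scores, key=scores.get)


def rank1_accuracy(trials: List[Tuple[str, Dict[str, float]]]) -> float:
    """Percentage of trials in which the top-scoring speaker is the true one.

    Each trial is (true_speaker_id, {enrolled_speaker_id: similarity}).
    """
    correct = sum(1 for true_id, scores in trials if identify(scores) == true_id)
    return 100.0 * correct / len(trials)


# Toy data: 4 identification tasks against N = 3 enrolled speakers.
toy_trials = [
    ("alice", {"alice": 0.91, "bob": 0.40, "carol": 0.35}),
    ("bob", {"alice": 0.30, "bob": 0.88, "carol": 0.52}),
    ("carol", {"alice": 0.61, "bob": 0.47, "carol": 0.58}),  # misidentified
    ("alice", {"alice": 0.79, "bob": 0.22, "carol": 0.41}),
]
print(rank1_accuracy(toy_trials))  # 75.0
```

In a setup like this, accuracy naturally falls as N grows, because each additional enrolled speaker is one more chance for an impostor score to exceed the genuine one, which matches the trend in the table.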
A graphical representation of the accuracy of the system per value of N, for different audio durations is shown next.
Voice Authenticity (Anti-spoofing)
Veridas has developed proprietary anti-spoofing technology designed to detect and prevent fraudulent access attempts. Given an audio file with a voice, the detector estimates the likelihood of the voice being authentic.
This technology has proven effective against all kinds of expected attacks, including both presentation and injection attacks, in tests using millions of spoof and non-spoof samples.
Veridas also works with third-party evaluators to assess and validate the security of the Veridas voice anti-spoofing technology.