
Voice Biometry Performance Report

das-Peak microservice

Voice biometrics is a state-of-the-art technology that allows a person to be validated by their voice. The VERIDAS solution captures the unique physical features of the vocal apparatus, along with features such as frequency, speed and accent, and compiles them into a virtually unique voice biometric vector per person.

The voice biometric vector is a mathematical descriptor obtained from the characteristics of the voice in an audio recording. This mathematical conversion from voice into a biometric vector is irreversible. Therefore, it is not possible to recover a person's voice signal from the calculated biometric vector.

VERIDAS has developed its own speaker verification engine (das-Peak) as a cloud-based solution that can be consumed via APIs.

VERIDAS' voice biometrics group participated in the Short-duration Speaker Verification (SdSV) Challenge 2020, taking third place overall (second place among single-model systems) and demonstrating state-of-the-art results in voice biometrics for short-utterance conditions.

das-Peak uses biometric algorithms to calculate the similarity between two audio recordings in terms of the speakers present in them. Because it is based on text-independent technology, the das-Peak engine can authenticate a user's voice without a password or predefined phrase (passive recognition): the biometric comparison depends on the characteristics of the voice, not on the content of the sentence. The system can nevertheless be used with predefined phrases to fulfill customer requirements or add extra controls. Text independence also makes das-Peak language-independent, so it can identify a person whichever language they speak.
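As a minimal sketch of the decision step described above, assuming each recording has already been converted into a biometric vector: the snippet scores two vectors and compares the score with a similarity threshold. Cosine similarity rescaled to the range 0 to 1 is an assumption chosen for illustration only; this report does not state how das-Peak computes its score, and the names `similarity` and `is_same_speaker` are hypothetical.

```python
import math


def similarity(vec_a, vec_b):
    """Cosine similarity rescaled to [0, 1].

    Illustrative assumption only, not das-Peak's actual scoring function.
    """
    if len(vec_a) != len(vec_b):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(vec_a, vec_b))
    norm_a = math.sqrt(sum(a * a for a in vec_a))
    norm_b = math.sqrt(sum(b * b for b in vec_b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("vectors must be non-zero")
    cosine = dot / (norm_a * norm_b)
    return (cosine + 1.0) / 2.0


def is_same_speaker(vec_a, vec_b, threshold=0.8):
    """Accept the comparison when the score reaches the chosen threshold."""
    return similarity(vec_a, vec_b) >= threshold
```

The content of the spoken sentence plays no role here: only the two vectors are compared, which is what makes a text-independent comparison possible.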

System quality report

The performance analysis below was carried out with audio samples encoded with the PCM_16 and G711 (mu-law/a-law, 8 bits per sample) codecs. Any deviation from these conditions may change the reported results unpredictably. To ensure compliance, please see the audio conditions in the Main features section.
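A rough way to check a recording against the encodings named above, assuming the audio is stored in a RIFF/WAVE container (the report does not say which container is used). The sketch reads the format tag and bit depth from the `fmt ` chunk; tags 1, 6 and 7 are the standard WAVE identifiers for PCM, A-law and mu-law. The function names are hypothetical, and the authoritative requirements remain those in the Main features section.

```python
import struct

# Standard WAVE format tags: 1 = PCM, 6 = G.711 A-law, 7 = G.711 mu-law.
PCM, ALAW, MULAW = 1, 6, 7


def read_wav_format(data):
    """Return (format_tag, bits_per_sample, sample_rate) from WAV bytes."""
    if data[:4] != b"RIFF" or data[8:12] != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")
    pos = 12
    while pos + 8 <= len(data):
        chunk_id, chunk_size = struct.unpack_from("<4sI", data, pos)
        if chunk_id == b"fmt ":
            tag, _channels, rate, _byte_rate, _align, bits = struct.unpack_from(
                "<HHIIHH", data, pos + 8
            )
            return tag, bits, rate
        # Chunks are padded to an even number of bytes.
        pos += 8 + chunk_size + (chunk_size & 1)
    raise ValueError("no fmt chunk found")


def is_reported_encoding(data):
    """True for 16-bit PCM or 8-bit G.711, the encodings used in this report."""
    tag, bits, _rate = read_wav_format(data)
    return (tag == PCM and bits == 16) or (tag in (ALAW, MULAW) and bits == 8)
```

Files that use the extensible WAVE header (format tag 0xFFFE) are reported as unsupported by this sketch, since it does not inspect the sub-format field.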

Verification performance (Telephone Channel)

VERIDAS has evaluated its voice biometrics model (das-Peak 2024Q1 version) on an internal database of telephone audio in English and Spanish, using different enrollment and verification audio durations. The following table shows the False Acceptance Rate (FAR, the probability of accepting a non-legitimate person) and the False Rejection Rate (FRR, the probability of rejecting a legitimate person) for different threshold values. These values make it possible to choose the desired working point of the voice biometric system.

TELEPHONE
Similarity    FAR (%)    FRR (%)      FRR (%)      FRR (%)      FRR (%)      FRR (%)
threshold                Enroll 3s    Enroll 5s    Enroll 5s    Enroll 10s   Enroll 10s
                         Verify 3s    Verify 3s    Verify 5s    Verify 5s    Verify 10s
0.50          5.00       7.33         5.34         3.47         2.64         1.88
0.55          4.02       8.05         5.82         3.77         2.88         2.06
0.60          3.21       8.91         6.38         4.11         3.12         2.24
0.65          2.52       9.83         7.12         4.60         3.37         2.38
0.70          1.95       11.05        7.93         5.12         3.69         2.62
0.75          1.44       12.30        8.98         5.96         4.22         2.96
0.80          1.02       14.29        10.50        7.04         4.93         3.38
0.85          0.65       16.86        12.49        8.59         6.14         4.13
0.90          0.34       21.15        15.83        11.22        7.83         5.56
0.95          0.11       29.94        23.14        16.69        11.90        8.45

This calibration shows the different security working points available, depending on the similarity threshold and the duration of the audio being compared.

For example, if the use case is 10 seconds to enroll and 5 seconds to verify with a threshold of 0.8, the result is FAR = 1.02% and FRR = 4.93%. In this case, 95.07% of comparisons between a person's voice and their own voice registration will be accepted as the same person, and only 1.02% of comparisons between voices of different persons will be incorrectly classified as the same person.
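Choosing a working point from the telephone calibration can be sketched as a table lookup: given a maximum tolerated FAR and a duration pair, take the lowest threshold that meets the FAR target, since FRR grows as the threshold rises. The data below is copied from the telephone table; the function name `pick_threshold` and the dictionary layout are hypothetical, introduced only for this illustration.

```python
# (enrollment seconds, verification seconds) for each FRR column, in table order.
DURATIONS = [(3, 3), (5, 3), (5, 5), (10, 5), (10, 10)]

# threshold -> (FAR %, [FRR % per duration pair]), telephone channel.
TELEPHONE = {
    0.50: (5.00, [7.33, 5.34, 3.47, 2.64, 1.88]),
    0.55: (4.02, [8.05, 5.82, 3.77, 2.88, 2.06]),
    0.60: (3.21, [8.91, 6.38, 4.11, 3.12, 2.24]),
    0.65: (2.52, [9.83, 7.12, 4.60, 3.37, 2.38]),
    0.70: (1.95, [11.05, 7.93, 5.12, 3.69, 2.62]),
    0.75: (1.44, [12.30, 8.98, 5.96, 4.22, 2.96]),
    0.80: (1.02, [14.29, 10.50, 7.04, 4.93, 3.38]),
    0.85: (0.65, [16.86, 12.49, 8.59, 6.14, 4.13]),
    0.90: (0.34, [21.15, 15.83, 11.22, 7.83, 5.56]),
    0.95: (0.11, [29.94, 23.14, 16.69, 11.90, 8.45]),
}


def pick_threshold(table, max_far, enroll_s, verify_s):
    """Return (threshold, FAR, FRR) for the lowest threshold with FAR <= max_far."""
    column = DURATIONS.index((enroll_s, verify_s))
    for threshold in sorted(table):
        far, frr_values = table[threshold]
        if far <= max_far:
            return threshold, far, frr_values[column]
    raise ValueError("no threshold in the table meets the FAR target")


threshold, far, frr = pick_threshold(TELEPHONE, max_far=1.02, enroll_s=10, verify_s=5)
genuine_acceptance = round(100 - frr, 2)  # share of genuine comparisons accepted
```

With a FAR target of 1.02% and the 10 s / 5 s pair, this selects threshold 0.80 with FRR = 4.93%, so 95.07% of genuine comparisons are accepted, matching the example above.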

Verification performance (Lossless Audio)

VERIDAS has evaluated its voice biometrics model (das-Peak 2024Q1 version) on an internal lossless-audio database, using different enrollment and verification audio durations. The database language is English. The following table shows the False Acceptance Rate (FAR, the probability of accepting a non-legitimate person) and the False Rejection Rate (FRR, the probability of rejecting a legitimate person) for different threshold values. These values make it possible to choose the desired working point of the voice biometric system.

LOSSLESS
Similarity    FAR (%)    FRR (%)      FRR (%)      FRR (%)      FRR (%)      FRR (%)
threshold                Enroll 3s    Enroll 5s    Enroll 5s    Enroll 10s   Enroll 10s
                         Verify 3s    Verify 3s    Verify 5s    Verify 5s    Verify 10s
0.50          5.00       2.33         1.58         1.06         0.64         0.39
0.55          3.91       2.83         1.91         1.28         0.76         0.45
0.60          3.05       3.40         2.32         1.55         0.91         0.51
0.65          2.36       4.17         2.84         1.84         1.11         0.59
0.70          1.80       5.12         3.53         2.34         1.37         0.70
0.75          1.36       6.48         4.46         2.97         1.75         0.88
0.80          1.01       8.15         5.76         3.89         2.33         1.18
0.85          0.74       10.81        7.75         5.28         3.22         1.67
0.90          0.54       15.11        11.19        7.72         4.94         2.72
0.95          0.36       24.41        18.83        13.78        9.49         5.73

This calibration shows the different security working points available, depending on the similarity threshold and the duration of the audio being compared.

For example, if the use case is 5 seconds to enroll and 5 seconds to verify with a threshold of 0.8, the result is FAR = 1.01% and FRR = 3.89%. In this case, 96.11% of comparisons between a person's voice and their own voice registration will be accepted as the same person, and only 1.01% of comparisons between voices of different persons will be incorrectly classified as the same person.
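To make these rates more tangible, the sketch below converts a working point into expected counts for a given volume of attempts. The FAR and FRR values come from the lossless table (5 s / 5 s, threshold 0.8); the attempt volumes and the function name `expected_outcomes` are hypothetical and chosen only for illustration.

```python
def expected_outcomes(far_pct, frr_pct, genuine_attempts, impostor_attempts):
    """Expected counts of each outcome, given FAR and FRR in percent."""
    false_rejections = genuine_attempts * frr_pct / 100
    false_acceptances = impostor_attempts * far_pct / 100
    return {
        "correct_acceptances": genuine_attempts - false_rejections,
        "false_rejections": false_rejections,
        "correct_rejections": impostor_attempts - false_acceptances,
        "false_acceptances": false_acceptances,
    }


# Lossless audio, enroll 5 s / verify 5 s, threshold 0.8: FAR = 1.01 %, FRR = 3.89 %.
# The volumes below (10,000 genuine and 1,000 impostor attempts) are made up.
outcomes = expected_outcomes(1.01, 3.89, genuine_attempts=10_000, impostor_attempts=1_000)
```

Under these assumed volumes, roughly 389 genuine attempts would be rejected and about 10 impostor attempts would be accepted. These are expected averages, not guarantees for any particular sample.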

Warning

The results shown above are merely a reference of the technology’s performance in a realistic production-like environment, while meeting certain minimum usage conditions (audio sample quality, amount of net speech, etc.) that favor biometric performance. Any other use under different conditions (low-quality audio, limited speech content, lossy audio encodings, multiple speakers in the same recording, etc.) may result in potentially worse performance than that shared in this report. For more information on how to make the best use of Veridas voice biometrics, please contact our technical team.

Identification performance

VERIDAS has evaluated its latest voice biometrics model (2024Q1) on a state-of-the-art database for the 1:N identification use case, with N up to 10000. In this database, speakers were recorded in different sessions under different acoustic conditions (street, pub, train station, room, office, etc.) through a web recording application. The database language is English. More than 10000 identification tasks were performed for each value of N, from N = 5 to N = 10000. For example, in a 1 to 10000 identification process (N = 10000), the probability of identifying the right individual is 98.15% (Enroll 10s-Test 5s).
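A 1:N identification task can be sketched as a search over the N enrolled vectors for the one that scores highest against the probe. The scoring function is passed in as a parameter because this report does not specify how das-Peak scores a comparison; `identify`, `gallery` and the dot-product scorer in the usage lines are hypothetical names and choices for illustration.

```python
def identify(probe, gallery, score):
    """Return (speaker_id, best_score) for the enrolled vector closest to probe.

    gallery maps speaker ids to enrolled vectors; score(a, b) returns a
    similarity value where higher means more alike.
    """
    if not gallery:
        raise ValueError("gallery is empty")
    best_id = max(gallery, key=lambda speaker_id: score(probe, gallery[speaker_id]))
    return best_id, score(probe, gallery[best_id])


# Toy usage with two enrolled speakers and a plain dot product as the scorer.
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


gallery = {"alice": [0.9, 0.1], "bob": [0.1, 0.9]}
best_id, best_score = identify([0.8, 0.2], gallery, dot)  # best_id is "alice"
```

An identification counts as correct when the returned id is the true speaker. As N grows there are more enrolled candidates that could outscore the true one, which is consistent with accuracy decreasing for larger N.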

Lossless audio

The following table shows the identification accuracy (%) for N = 5 to 10000 with lossless audio, depending on the enrollment and test audio durations:

Durations / N speakers  5       10      20      50      100     200     500     1000    2000    5000    10000
Enroll 3s-Test 3s       99.24   99.24   98.79   98.44   97.73   96.89   95.62   94.55   93.31   91.15   88.53
Enroll 5s-Test 3s       99.56   99.56   99.34   98.74   98.41   97.89   96.94   96.23   95.19   93.53   91.64
Enroll 5s-Test 5s       99.76   99.64   99.46   99.09   98.80   98.41   97.76   97.28   96.50   95.20   94.00
Enroll 10s-Test 3s      99.88   99.72   99.44   99.15   98.72   98.33   97.89   97.18   96.50   95.16   93.82
Enroll 10s-Test 5s      99.76   99.80   99.66   99.38   99.17   98.87   98.58   98.08   97.47   96.47   95.38
Enroll 10s-Test 10s     99.92   99.82   99.75   99.57   99.37   99.20   98.88   98.52   98.13   97.47   96.63
Enroll 20s-Test 3s      99.68   99.64   99.44   99.36   98.99   98.73   98.22   97.78   97.21   96.16   95.07
Enroll 20s-Test 5s      99.80   99.88   99.70   99.56   99.36   99.17   98.85   98.45   97.98   97.20   96.37
Enroll 20s-Test 10s     99.88   99.92   99.80   99.67   99.56   99.41   99.17   98.88   98.58   98.02   97.37
Enroll 20s-Test 20s     100.00  99.90   99.84   99.78   99.68   99.53   99.39   99.24   99.01   98.47   97.84

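A graphical representation of the system's accuracy for each value of N and for different audio durations is shown next.

[Figure: identification accuracy as a function of N for different enrollment and test audio durations]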

Voice Authenticity (Anti-spoofing)

Veridas has developed proprietary anti-spoofing technology designed to detect and prevent fraudulent access attempts. Given an audio file with a voice, the detector estimates the likelihood of the voice being authentic.

This technology has proven effective against all kinds of expected attacks, including both presentation and injection attacks, in evaluations using millions of spoof and non-spoof samples.

Veridas also works with third-party evaluators to assess and validate the security of the Veridas voice anti-spoofing technology.