Introduction
Voice biometrics is a state-of-the-art technology that allows a person to be validated by their voice. The VERIDAS solution captures the unique physical features of the vocal apparatus, together with features such as frequency, speed and accent, and compiles them into a virtually unique voice biometric vector per person.
The voice biometric vector is a mathematical descriptor obtained from the characteristics of the voice in an audio recording. This mathematical conversion from voice into a biometric vector is irreversible. Therefore, it is not possible to recover a person's voice signal from the calculated biometric vector.
VERIDAS has developed its own speaker verification engine (das-Peak) as a cloud-based solution that can be consumed via APIs.
VERIDAS' voice biometrics group participated in the Short-duration Speaker Verification (SdSV) Challenge 2020, placing third overall (second among single-model systems) and demonstrating state-of-the-art results in voice biometrics for short-utterance conditions. Check the results here.
das-Peak calculates the similarity between two audio recordings (in terms of the speakers present in them) using biometric algorithms. Because it is based on text-independent technology, the das-Peak engine can authenticate a user's voice without requiring a password or predefined phrase (passive recognition). This means that the biometric comparison depends on the characteristics of the voice and not on the content of the sentence. However, the system is flexible enough to use predefined phrases when customer requirements or additional controls call for them.
Within the voice biometrics field, two scenarios are typically handled:
- Verification: The process of checking the identity of a person by comparing two audio recordings.
- Identification: The process of searching for a person, or a set of persons, within a database of identities, given an input audio recording.
das-Peak currently offers solutions for both the verification and the identification problems.
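The difference between the two scenarios can be illustrated with a minimal sketch. It assumes that biometric vectors are plain lists of floats and that similarity is cosine similarity rescaled to the range 0 to 1; both are assumptions made for illustration, since the actual das-Peak vector format and scoring function are not described here. The function names `verify` and `identify`, the toy vectors and the identities are all hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def score(a, b):
    """Illustrative score in [0, 1]; NOT the das-Peak scoring function."""
    return (cosine_similarity(a, b) + 1.0) / 2.0


def verify(claimed_vector, probe_vector, threshold):
    """Verification (1:1): does the probe match one claimed identity?"""
    return score(claimed_vector, probe_vector) >= threshold


def identify(database, probe_vector, top_n=1):
    """Identification (1:N): rank every enrolled identity against the probe."""
    scored = [(identity, score(vector, probe_vector))
              for identity, vector in database.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


# Toy three-dimensional vectors, invented for this example.
database = {
    "alice": [0.9, 0.1, 0.2],
    "bob": [0.1, 0.8, 0.3],
    "carol": [0.2, 0.2, 0.9],
}
probe = [0.85, 0.15, 0.25]

best_identity, best_score = identify(database, probe)[0]
is_alice = verify(database["alice"], probe, threshold=0.9)
is_bob = verify(database["bob"], probe, threshold=0.9)
```

Verification answers a yes/no question about one claimed identity, whereas identification returns a ranked list of candidates from the whole database.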
Given two audio recordings, the system returns a score based on their similarity, measured in terms of the speakers present in them rather than the speech content.
das-Peak is offered as a REST API. The process for obtaining the similarity score between two audio recordings is described below.
- Two audio recordings are sent to the API.
- The audio recordings are pre-processed. This step detects voice in the audio recordings (removing silent segments) and analyzes the noise in the signals.
- The audio recordings are converted into irreversible mathematical descriptors (voice biometric vectors).
- Both mathematical vectors are compared and a matching score between 0 and 1 is provided. This matching score represents the probability that the audio recordings belong to the same person: the higher the score, the greater the certainty that they do.
- You can use this matching score to validate the identity of a customer. It is recommended to define a threshold that meets the required confidence level, based on the expected FAR (False Acceptance Rate) and FRR (False Rejection Rate).
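The threshold recommendation in the last step can be sketched as follows. This is a minimal example, assuming you have collected matching scores for a labelled set of genuine comparisons (same speaker) and impostor comparisons (different speakers). The score lists are invented for illustration, and `far_frr` and `pick_threshold` are hypothetical helper names that are not part of the das-Peak API.

```python
def far_frr(genuine_scores, impostor_scores, threshold):
    """Return (FAR, FRR) when scores >= threshold are accepted."""
    false_accepts = sum(1 for s in impostor_scores if s >= threshold)
    false_rejects = sum(1 for s in genuine_scores if s < threshold)
    far = false_accepts / len(impostor_scores)
    frr = false_rejects / len(genuine_scores)
    return far, frr


def pick_threshold(genuine_scores, impostor_scores, max_far):
    """Lowest observed score usable as a threshold with FAR <= max_far.

    FAR never increases and FRR never decreases as the threshold rises,
    so the first candidate that satisfies the FAR limit also gives the
    lowest FRR among the candidates. Returns None if no observed score
    satisfies the limit.
    """
    candidates = sorted(set(genuine_scores) | set(impostor_scores))
    for threshold in candidates:
        far, frr = far_frr(genuine_scores, impostor_scores, threshold)
        if far <= max_far:
            return threshold, far, frr
    return None


# Invented evaluation scores, for illustration only.
genuine = [0.91, 0.88, 0.95, 0.72, 0.85]
impostor = [0.10, 0.35, 0.55, 0.20, 0.74]

strict = pick_threshold(genuine, impostor, max_far=0.0)
lenient = pick_threshold(genuine, impostor, max_far=0.2)
```

With this toy data, requiring no false acceptances pushes the threshold up and rejects one genuine speaker, while tolerating a 20% FAR lowers the threshold and accepts every genuine speaker. A real evaluation set would need far more comparisons for the rates to be meaningful.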
das-Peak also provides the biometric vector generated from an audio recording. With this vector, verification can be carried out between a biometric registration vector and a new audio recording, instead of between two audio recordings.