Presentation

This framework records audio from the microphone and applies digital signal processing (DSP) to it.

The client app can request, in real time, all the audio recorded so far, the portion of it that contains only voice samples, and other information about the recording.

Specifications

  • Android minimum SDK version: API Level 21 (Android 5) (mobile devices only).
  • Signal format: 16 bit linear PCM, signed.
  • Sampling frequency: 8 or 16 kHz, configurable.
  • Maximum recording length: 30 seconds.
  • SDK size: 0.068 MB.
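
As a quick sizing check for the figures above, the following sketch computes how many bytes an uncompressed recording occupies. The function and constant names are illustrative and not part of the SDK, and a single (mono) channel is an assumption, since the specification does not state the channel count.

```kotlin
// Illustrative helper, not part of the SDK.
val BYTES_PER_SAMPLE = 2          // 16-bit signed linear PCM
val MAX_RECORDING_SECONDS = 30    // maximum recording length from the specification

// Size in bytes of a raw recording, assuming one channel (assumption).
fun pcmSizeBytes(sampleRateHz: Int, seconds: Int = MAX_RECORDING_SECONDS): Int {
    require(sampleRateHz == 8_000 || sampleRateHz == 16_000) {
        "The specification lists 8 kHz and 16 kHz only"
    }
    require(seconds in 0..MAX_RECORDING_SECONDS) {
        "Recordings are limited to $MAX_RECORDING_SECONDS seconds"
    }
    return sampleRateHz * seconds * BYTES_PER_SAMPLE
}
```

Under these assumptions, a full 30-second recording takes 960,000 bytes at 16 kHz and 480,000 bytes at 8 kHz, before any WAV header or Base64 overhead.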

Library dependencies

  • Internal dependencies: VDLibrary (common-core-jvm.jar)

Integration

  • Create a new Android Project.
  • Add the permission "android.permission.RECORD_AUDIO" to the app's manifest.
  • Copy the SDK "voice-capture-release.aar" into the "app/libs" directory.
  • Copy the SDK "common-core-jvm.jar" into the "app/libs" directory.
  • Add "implementation fileTree(dir: 'libs', include: ['*.aar', '*.jar'])" to the dependencies section of the app's build.gradle.
  • See the example below as a possible starting point for coding. Note that PermissionsHandler is only a helper class, so using it is optional. In any case, the app must ask the user for permission to record audio.
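
For projects whose module build script is build.gradle.kts rather than build.gradle (an assumption; the steps above only mention the Groovy file), the same dependency line can be sketched in the Gradle Kotlin DSL as follows.

```kotlin
// app/build.gradle.kts (assumed Kotlin DSL variant of the Groovy line above)
dependencies {
    // Pick up every .aar and .jar file placed in app/libs
    implementation(fileTree(mapOf("dir" to "libs", "include" to listOf("*.aar", "*.jar"))))
}
```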

Real-time requests from the UI to the voice framework

Typical scenario

  1. The UI configures the recording.
  2. The UI starts the recording. It can be stopped at any time.
  3. The UI sends requests to the SDK and receives the responses asynchronously through callbacks. There are five possible responses:
    • The audio info with the recording information.
    • The state of the recorder.
    • The total signal recorded so far.
    • The portion of voice from the recorded signal so far.
    • Possible errors from the SDK.
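
The callback flow above can be sketched without the SDK. The classes below are stand-ins written for illustration only: the method names mirror the callbacks used in the MainActivity example later in this document, but the parameter types are simplified and are not the SDK's own.

```kotlin
// Stand-in for the recording information; the real AudioInfo class has its own definition.
data class InfoSketch(val audioMs: Int, val voiceMs: Int)

// Adapter with empty defaults, one method per response type listed above.
open class ListenerSketch {
    open fun onRecorderState(state: Int) {}
    open fun onAudioInfo(info: InfoSketch) {}
    open fun onRecordedAudio(samples: ShortArray) {}
    open fun onRecordedVoice(samples: ShortArray) {}
    open fun onException(e: Exception) {}
}

// A client overrides only the responses it cares about; the rest stay no-ops.
class LoggingListener : ListenerSketch() {
    val log = mutableListOf<String>()

    override fun onRecorderState(state: Int) {
        log += "state=$state"
    }

    override fun onAudioInfo(info: InfoSketch) {
        log += "audio=${info.audioMs}ms voice=${info.voiceMs}ms"
    }

    override fun onException(e: Exception) {
        log += "error=${e.message}"
    }
}
```

The adapter style keeps client code short: a listener that only needs state changes and errors does not have to implement the audio callbacks.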

Example scenario

  • In the MainActivity:
    • Create a VoiceRecorder.
    • Configure the VoiceRecorder (optional).
    • Register a callback listener, as a subclass of VoiceRecorderAdapter, and override the desired methods.
    • Start the recording on user action.
    • Process the responses within the adapter, such as the received audio. Also process the recorder's state changes, typically to enable or disable certain user controls.
    • Send extra requests to the VoiceRecorder and process the asynchronous responses with the adapter.
    • Stop the recorder when desired.
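
As a minimal sketch of the state handling step, the function below maps a recorder state to the enabled flags of two user controls. The state codes and the Controls type are hypothetical: the SDK reports the state as an Int, but its actual constant names and values are not shown in this document.

```kotlin
// Hypothetical state codes, for illustration only.
val STATE_STOPPED = 0
val STATE_RECORDING = 1

// Enabled flags for a "start" and a "stop" button (hypothetical UI model).
data class Controls(val startEnabled: Boolean, val stopEnabled: Boolean)

fun controlsFor(recorderState: Int): Controls = when (recorderState) {
    STATE_RECORDING -> Controls(startEnabled = false, stopEnabled = true)
    STATE_STOPPED -> Controls(startEnabled = true, stopEnabled = false)
    // Unknown state: disable both controls rather than guess.
    else -> Controls(startEnabled = false, stopEnabled = false)
}
```

Keeping this mapping in a pure function makes it easy to call from the state callback and to test without a device.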

Simplified Object Diagram

[Figure: simplified object diagram]

Example of Android's MainActivity

package com.example.voice

import androidx.appcompat.app.AppCompatActivity
import android.os.Bundle
import com.veridas.voice.sdk.AudioInfo
import com.veridas.voice.sdk.SignalWrapper
import com.veridas.voice.sdk.VoiceRecorder
import com.veridas.voice.sdk.VoiceRecorderAdapter
import com.veridas.voice.sdk.helpers.PermissionsHandler

class MainActivity : AppCompatActivity(), PermissionsHandler.ResultCallback {
    // [...] Views & stuff

    // Model & recorder
    // [...]
    private lateinit var voiceRecorder: VoiceRecorder

    // Adapter for callbacks
    private lateinit var recAdapter: RecAdapter

    // Permissions
    private lateinit var permissionsHandler: PermissionsHandler


    override fun onCreate(savedInstanceState: Bundle?) {
        super.onCreate(savedInstanceState)
        setContentView(R.layout.activity_main)

        // Configure Views
        // [...]

        // Model state & recorder
        // [...]
        voiceRecorder = VoiceRecorder()

        // Callback control
        recAdapter = RecAdapter()

        // Ask for permission
        /* (The PermissionsHandler is not mandatory, you can use whatever permissions manager you like) */
        permissionsHandler = PermissionsHandler(this, arrayOf(android.Manifest.permission.RECORD_AUDIO))
        permissionsHandler.askForPermissions()

        println(voiceRecorder.version)
    }


    override fun onResume() {
        super.onResume()
        val config = VoiceRecorder.Config.Builder()
            .setFormat(VoiceRecorder.Config.RAW_FORMAT)
            .setFs(8000)
            .setAutoAudioInfo(true)
            .setAutoVoiceCapture(true)
            .setStopOnCapture(true)
            .build()
        voiceRecorder.setConfig(config)
        voiceRecorder.setListener(recAdapter)
    }


    override fun onPause() {
        voiceRecorder.setListener(null)
        super.onPause()
    }


    /**
     * Delegate the Activity's permissions callback to PermissionsHandler helper class.
     */
    override fun onRequestPermissionsResult(requestCode: Int, permissions: Array<out String>, grantResults: IntArray) {
        super.onRequestPermissionsResult(requestCode, permissions, grantResults)
        permissionsHandler.onRequestPermissionsResult(requestCode, permissions, grantResults)
    }


    /**
     * PermissionsHandler callback, accepted or not.
     */
    override fun onPermissionsResult(accepted: Boolean) {
        if (accepted) {
            voiceRecorder.start()
            println("Please, say something to the device (at least 10 seconds of speech)...")
        }
    }


    // ---------------------------------------------------------------------------------------------

    /**
     * Listener (adapter) for receiving the AudioThread callbacks.
     */
    private class RecAdapter: VoiceRecorderAdapter() {
        /**
         * Receives the state of the recorder. It could be STOPPED or RECORDING.
         * The UI changes in response to this state, enabling or disabling controls.
         */
        override fun onRecorderState(recorderState: Int) { /* Stub */ }


        override fun onException(exception: Exception) { /* Stub */ }


        /**
         * Receives the AudioInfo of the recording, and displays it on the UI.
         */
        override fun onAudioInfo(audioInfo: AudioInfo) {
            // Voice & total audio recorded so far (ms)
            val audio = audioInfo.audioDuration
            val voice = audioInfo.voiceDuration
            val ok = audioInfo.enoughQuality
            println("Audio info: audio: $audio, voice: $voice, quality ok: $ok")
        }


        /**
         * Receives a wrapper with AudioInfo and all the audio recorded so far.
         */
        override fun onRecordedAudio(wrapper: SignalWrapper) {
            if (wrapper.rawSignal != null) { /* process audio in raw format (short[]) */ }
            if (wrapper.wavSignal != null) { /* process audio in wav format (byte[]) */ }
            if (wrapper.wavBase64Signal != null) { /* process audio in wav format (base64 text encoded) */ }
            println("Received ${wrapper.audioInfo.audioDuration} ms of audio.")
        }


        /**
         * Receives a wrapper with AudioInfo and all the voice recorded so far.
         */
        override fun onRecordedVoice(wrapper: SignalWrapper) {
            if (wrapper.rawSignal != null) { /* process voice in raw format (short[]) */ }
            if (wrapper.wavSignal != null) { /* process voice in wav format (byte[]) */ }
            if (wrapper.wavBase64Signal != null) { /* process voice in wav format (base64 text encoded) */ }
            println("Received ${wrapper.audioInfo.voiceDuration} ms of voice.")
        }
    }
}
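
Once a raw signal (short[]) has been received, typical first steps are measuring its length and packing it into bytes for storage or upload. The helpers below are a sketch with hypothetical names. They assume a single channel, and the little-endian packing matches the byte order of WAV PCM data rather than any format required by the SDK.

```kotlin
import java.nio.ByteBuffer
import java.nio.ByteOrder

// Duration in milliseconds of a raw signal, assuming one channel (assumption).
fun durationMs(samples: ShortArray, sampleRateHz: Int): Long =
    samples.size * 1000L / sampleRateHz

// Packs 16-bit samples as little-endian bytes, two bytes per sample.
fun toLittleEndianBytes(samples: ShortArray): ByteArray {
    val buffer = ByteBuffer.allocate(samples.size * 2).order(ByteOrder.LITTLE_ENDIAN)
    // The short view inherits the byte order set on the underlying buffer.
    buffer.asShortBuffer().put(samples)
    return buffer.array()
}
```

For example, 8,000 samples at a sampling frequency of 8,000 Hz correspond to 1,000 ms of audio.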

Important note about integration

The SDK validates the audio recording before it is sent to the das-Peak service in the cloud. The expected behaviour is therefore that both ends agree on whether the recording is valid.

Although this holds in general, there may be a few cases where the SDK and the cloud disagree. This is due to slight algorithmic and precision differences.

The following practices are recommended:

  • The das-Peak service always has the final word. Its result may differ from the SDK's validation.
  • To avoid discrepancies between the two systems, it is recommended to configure the SDK to be slightly more restrictive than the cloud service. The default values of the VoiceRecorder guarantee that behaviour: 5 seconds of voice and a signal-to-noise ratio of 5 dB.
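
The two recommendations can be sketched as plain decision logic. This is an illustration only: the function names are hypothetical, the thresholds are the default values quoted above, and treating the thresholds as inclusive lower bounds is an assumption about how the SDK compares them.

```kotlin
// Default thresholds quoted in this document for the VoiceRecorder.
val SDK_MIN_VOICE_MS = 5_000
val SDK_MIN_SNR_DB = 5.0

// Local, stricter pre-check (inclusive comparison is an assumption).
fun passesLocalCheck(
    voiceMs: Int,
    snrDb: Double,
    minVoiceMs: Int = SDK_MIN_VOICE_MS,
    minSnrDb: Double = SDK_MIN_SNR_DB
): Boolean = voiceMs >= minVoiceMs && snrDb >= minSnrDb

// The cloud verdict, once available, overrides the local one.
// A null cloudOk means the cloud has not answered yet, so the local result is only provisional.
fun finalVerdict(localOk: Boolean, cloudOk: Boolean?): Boolean = cloudOk ?: localOk
```

With stricter local thresholds, most recordings that pass the local check should also pass in the cloud, and the remaining disagreements are resolved in favour of the cloud result.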