VAD API
Voice Activity Detection (VAD) API reference for sherpa-onnx-node.
Source file
API
Vad
Voice Activity Detector. Detects speech segments in an audio stream.
Constructor
const vad = new sherpa_onnx.Vad(config, bufferSizeInSeconds);
- param config
VAD configuration object.
- param bufferSizeInSeconds
Buffer size in seconds (number).
The config object has the following properties:
sileroVad(object, optional) — Silero VAD model config:model(string) — Path tosilero_vad.onnx.threshold(number) — Speech threshold, e.g.0.5.minSpeechDuration(number) — Min speech duration in seconds.minSilenceDuration(number) — Min silence duration in seconds.windowSize(number) — Window size in samples, e.g.512.maxSpeechDuration(number, optional) — Max speech duration in seconds.
tenVad(object, optional) — Ten VAD model config (same fields assileroVad).sampleRate(number) — Sample rate in Hz, e.g.16000.numThreads(number, optional) — Number of threads.debug(boolean, optional) — Enable debug output.
Methods
vad.acceptWaveform(samples)
Feed audio samples to the VAD.
- param samples
Input audio samples (
Float32Array).
vad.isDetected()
- returns
trueif speech is currently being detected (boolean).
vad.isEmpty()
- returns
trueif no completed speech segments are available (boolean).
vad.front(enableExternalBuffer?)
Get the earliest completed speech segment without removing it.
- param enableExternalBuffer
Whether to use an external buffer (
boolean, defaulttrue).- returns
A
SpeechSegmentobject withstart(number) andsamples(Float32Array).
vad.pop()
Remove the earliest completed speech segment from the queue.
vad.flush()
Flush the internal buffer to emit any pending speech segments.
vad.clear()
Clear all internal state.
vad.reset()
Reset the detector to its initial state.
Properties
vad.config— The configuration object passed to the constructor.
CircularBuffer
A circular buffer that stores Float32 audio samples.
Constructor
const buffer = new sherpa_onnx.CircularBuffer(capacity);
- param capacity
Buffer capacity in samples (number).
Methods
buffer.push(samples)
Push samples into the buffer.
- param samples
Audio samples (
Float32Array).
buffer.get(startIndex, n, enableExternalBuffer?)
Get a slice of samples from the buffer.
- param startIndex
Start index (number).
- param n
Number of samples to read (number).
- param enableExternalBuffer
Whether to use an external buffer (
boolean, defaulttrue).- returns
Audio samples (
Float32Array).
buffer.pop(n)
Remove n samples from the front of the buffer.
- param n
Number of samples to remove (number).
buffer.size()
- returns
Current number of samples in the buffer (
number).
buffer.head()
- returns
The head index of the buffer (
number).
buffer.reset()
Reset the buffer to empty state.
Example
const sherpa_onnx = require('sherpa-onnx-node');
const vad = new sherpa_onnx.Vad({
sileroVad: { model: './silero_vad.onnx', threshold: 0.5,
minSpeechDuration: 0.25, minSilenceDuration: 0.5, windowSize: 512 },
sampleRate: 16000, debug: false, numThreads: 1,
}, 60);
// Feed audio in chunks of windowSize samples
vad.acceptWaveform(samples);
if (vad.isDetected()) {
console.log('Speech detected');
}
while (!vad.isEmpty()) {
const segment = vad.front();
vad.pop();
console.log(`Segment: ${segment.samples.length} samples`);
}
Notes
Feed audio in chunks matching
windowSize(512 for Silero VAD at 16kHz).isDetected()returnstruewhile speech is ongoing.isEmpty()/front()/pop()are used to extract completed speech segments (speech followed by enough silence).Use
flush()at the end of a stream to emit any remaining buffered speech.You can use
ten-vad.onnxinstead ofsilero_vad.onnxby setting thetenVadconfig field and leavingsileroVad.modelempty.