Speaker Diarization API
Offline speaker diarization API reference for sherpa-onnx-node.
Source file
scripts/node-addon-api/lib/non-streaming-speaker-diarization.js
API
OfflineSpeakerDiarization
Identifies “who spoke when” in an audio recording.
Constructor
const diarizer = new sherpa_onnx.OfflineSpeakerDiarization(config);
- param config
Configuration object with:
- segmentation (object, optional): Segmentation model config:
  - pyannote: { model: string }, the path to the segmentation ONNX model.
  - numThreads (number, optional).
  - debug (boolean, optional).
  - provider (string, optional).
- embedding (object, optional): Speaker embedding model config:
  - model (string): Path to the embedding ONNX model.
  - numThreads (number, optional).
  - debug (boolean, optional).
  - provider (string, optional).
- clustering (object, optional): Clustering config:
  - numClusters (number, optional): Number of speakers (0 = auto).
  - threshold (number, optional): Clustering threshold.
- minDurationOn (number, optional): Minimum speaker segment duration.
- minDurationOff (number, optional): Minimum non-speech duration.
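As a rough illustration of how the fields above nest, the following sketch assembles a full configuration object. The helper name `buildDiarizationConfig`, the `'cpu'` provider string, and the `minDurationOn` / `minDurationOff` values are assumptions made for this example, not documented defaults; the model paths are the ones used in the Example section.

```javascript
// Hypothetical helper (not part of sherpa-onnx-node): builds a config object
// with the nesting described in the field list above.
function buildDiarizationConfig(segmentationModel, embeddingModel, options = {}) {
  // Illustrative fallback values; they are assumptions, not library defaults.
  const { numThreads = 1, provider = 'cpu', numClusters = 0, threshold = 0.5 } = options;
  return {
    segmentation: {
      pyannote: { model: segmentationModel },
      numThreads,
      debug: false,
      provider,
    },
    embedding: { model: embeddingModel, numThreads, debug: false, provider },
    clustering: { numClusters, threshold }, // numClusters: 0 means auto-detect
    minDurationOn: 0.3,  // assumed to be in seconds; value chosen for illustration
    minDurationOff: 0.5, // assumed to be in seconds; value chosen for illustration
  };
}

const config = buildDiarizationConfig(
  './segmentation-3-0.onnx',
  './3dspeaker_speech_eres2net_base_sv_zh-cn_3dspeaker_16k.onnx',
  { numThreads: 2 },
);
console.log(JSON.stringify(config, null, 2));
```

The resulting object can be passed to the constructor shown above; keeping the assembly in one place makes it easier to vary only the clustering settings between runs.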
Methods
diarizer.process(samples)
Run diarization on the input audio.
- param samples
Audio samples in [-1, 1] (Float32Array).
- returns
An array of SpeakerDiarizationSegment objects, each with:
- start (number): Start time in seconds.
- end (number): End time in seconds.
- speaker (number): Speaker ID (integer).
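Since each returned segment carries `start`, `end`, and `speaker`, a common follow-up step is to total the speaking time per speaker. The sketch below shows one way to do that; `speakingTimePerSpeaker` is a hypothetical helper and the segment values are made up for illustration, not real diarization output.

```javascript
// Hypothetical helper: sums (end - start) for every speaker ID.
function speakingTimePerSpeaker(segments) {
  const totals = new Map();
  for (const { start, end, speaker } of segments) {
    totals.set(speaker, (totals.get(speaker) ?? 0) + (end - start));
  }
  return totals;
}

// Made-up segments in the shape returned by diarizer.process().
const exampleSegments = [
  { start: 0.0, end: 1.5, speaker: 0 },
  { start: 1.5, end: 4.0, speaker: 1 },
  { start: 4.0, end: 5.0, speaker: 0 },
];

const totals = speakingTimePerSpeaker(exampleSegments);
for (const [speaker, seconds] of totals) {
  console.log(`Speaker ${speaker}: ${seconds.toFixed(2)}s total`);
}
```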
diarizer.setConfig(config)
Update clustering configuration at runtime.
- param config
{ clustering: { numClusters?, threshold? } }.
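One way to use `setConfig()` is to re-run diarization on the same samples with a different speaker count. The sketch below wraps that in a hypothetical helper, `diarizeWithSpeakerCount`. To keep it runnable without model files it uses a stand-in object instead of a real `OfflineSpeakerDiarization` instance; the stand-in's merge behavior and its output are assumptions, not a description of the library's internals.

```javascript
// Hypothetical helper: updates the clustering config, then re-runs diarization.
function diarizeWithSpeakerCount(diarizer, samples, numClusters) {
  diarizer.setConfig({ clustering: { numClusters } });
  return diarizer.process(samples);
}

// Stand-in for a real diarizer so the sketch runs without ONNX models.
// It only mimics the documented method names; its behavior is invented.
const stubDiarizer = {
  config: { clustering: { numClusters: 0, threshold: 0.5 } },
  setConfig(patch) {
    Object.assign(this.config.clustering, patch.clustering);
  },
  process(samples) {
    // Pretend the whole clip is one speaker, assuming 16 kHz audio.
    return [{ start: 0, end: samples.length / 16000, speaker: 0 }];
  },
};

const result = diarizeWithSpeakerCount(stubDiarizer, new Float32Array(16000), 2);
console.log(stubDiarizer.config.clustering, result);
```

With a real diarizer, the same helper would avoid reloading the segmentation and embedding models between runs.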
Properties
- diarizer.config: The configuration object.
- diarizer.sampleRate: Expected sample rate in Hz (number).
Example
const sherpa_onnx = require('sherpa-onnx-node');

// Create the diarizer; numClusters: 0 auto-detects the number of speakers.
const diarizer = new sherpa_onnx.OfflineSpeakerDiarization({
  segmentation: { pyannote: { model: './segmentation-3-0.onnx' } },
  embedding: { model: './3dspeaker_speech_eres2net_base_sv_zh-cn_3dspeaker_16k.onnx' },
  clustering: { numClusters: 0, threshold: 0.5 },
});

// Read the audio file and run diarization on its samples.
const wave = sherpa_onnx.readWave('./audio.wav');
const segments = diarizer.process(wave.samples);

// Print one line per segment: speaker ID and time range in seconds.
for (const seg of segments) {
  console.log(`Speaker ${seg.speaker}: ${seg.start.toFixed(2)}s - ${seg.end.toFixed(2)}s`);
}
Notes
- The input audio should be mono, 16 kHz, float32 in [-1, 1].
- Set numClusters: 0 to auto-detect the number of speakers.
- Use setConfig() to adjust clustering parameters without re-creating the diarizer.
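The input requirements above can be checked before calling `process()`. The sketch below shows one possible approach: `downmixToMono` averages interleaved channels into a single channel, and `checkInput` verifies the type, sample rate, and value range. Both helpers are hypothetical and not part of sherpa-onnx-node; in real use the expected rate would come from `diarizer.sampleRate`.

```javascript
// Hypothetical helper: averages interleaved channels into one mono channel.
function downmixToMono(interleaved, numChannels) {
  const frames = Math.floor(interleaved.length / numChannels);
  const mono = new Float32Array(frames);
  for (let i = 0; i < frames; i++) {
    let sum = 0;
    for (let c = 0; c < numChannels; c++) {
      sum += interleaved[i * numChannels + c];
    }
    mono[i] = sum / numChannels;
  }
  return mono;
}

// Hypothetical helper: throws if the samples do not match the stated requirements.
function checkInput(samples, sampleRate, expectedSampleRate) {
  if (!(samples instanceof Float32Array)) {
    throw new TypeError('samples must be a Float32Array');
  }
  if (sampleRate !== expectedSampleRate) {
    throw new RangeError(`expected ${expectedSampleRate} Hz, got ${sampleRate} Hz`);
  }
  for (const s of samples) {
    // The negated comparison also rejects NaN.
    if (!(s >= -1 && s <= 1)) {
      throw new RangeError('samples must lie in [-1, 1]');
    }
  }
}

// Two stereo frames (left, right, left, right), made up for illustration.
const mono = downmixToMono(new Float32Array([0.5, -0.5, 1.0, 0.0]), 2);
checkInput(mono, 16000, 16000); // passes silently
console.log(Array.from(mono));
```

Audio recorded at a different sample rate would still need resampling, which this sketch does not attempt.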