Speaker Diarization API

Offline speaker diarization API reference for sherpa-onnx-node.

Source file

scripts/node-addon-api/lib/non-streaming-speaker-diarization.js

API

OfflineSpeakerDiarization

Identifies “who spoke when” in an audio recording.

Constructor

const diarizer = new sherpa_onnx.OfflineSpeakerDiarization(config);
param config

Configuration object with:

  • segmentation (object, optional) — Segmentation model config:

    • pyannote (object) — { model: string }, where model is the path to the segmentation ONNX model.

    • numThreads (number, optional).

    • debug (boolean, optional).

    • provider (string, optional).

  • embedding (object, optional) — Speaker embedding model config:

    • model (string) — Path to the embedding ONNX model.

    • numThreads (number, optional).

    • debug (boolean, optional).

    • provider (string, optional).

  • clustering (object, optional) — Clustering config:

    • numClusters (number, optional) — Number of speakers (0 = auto).

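    • threshold (number, optional) — Distance threshold for clustering, used when numClusters is 0. A smaller value yields more speakers; a larger value yields fewer.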

  • minDurationOn (number, optional) — Minimum duration of a speaker segment, in seconds.

  • minDurationOff (number, optional) — Minimum duration of a non-speech gap, in seconds.
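
The fields above can be assembled in one place before constructing the diarizer. The following is a minimal sketch, not part of the library: buildDiarizationConfig is a hypothetical helper, and the fallback values it fills in (one thread, the 'cpu' provider, a 0.5 threshold, 0.3 and 0.5 for the two durations) are illustrative assumptions rather than documented defaults.

```javascript
// Hypothetical helper: builds a config object with the fields listed above.
// All fallback values are illustrative assumptions, not library defaults.
function buildDiarizationConfig({
  segmentationModel,
  embeddingModel,
  numSpeakers = 0, // 0 = let clustering decide the number of speakers
  threshold = 0.5,
}) {
  if (!segmentationModel || !embeddingModel) {
    throw new Error('Both segmentationModel and embeddingModel are required');
  }
  return {
    segmentation: {
      pyannote: { model: segmentationModel },
      numThreads: 1,
      debug: false,
      provider: 'cpu',
    },
    embedding: {
      model: embeddingModel,
      numThreads: 1,
      debug: false,
      provider: 'cpu',
    },
    clustering: { numClusters: numSpeakers, threshold },
    minDurationOn: 0.3,
    minDurationOff: 0.5,
  };
}
```

The returned object can be passed directly to new sherpa_onnx.OfflineSpeakerDiarization(...). Validating the model paths up front gives a clearer error than a failure inside the native addon.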

Methods

diarizer.process(samples)

Run diarization on the input audio.

param samples

Mono audio samples as a Float32Array, normalized to [-1, 1] and sampled at diarizer.sampleRate.

returns

An array of SpeakerDiarizationSegment objects, each with:

  • start (number) — Start time in seconds.

  • end (number) — End time in seconds.

  • speaker (number) — Speaker ID (integer).
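
Because each segment carries only start, end, and speaker, summary statistics have to be computed by the caller. The sketch below shows one way to total the speaking time per speaker. totalSpeakingTime is a hypothetical helper, and the sample array is hand-written data in the documented shape, not real diarizer output.

```javascript
// Sum the duration of every segment, grouped by speaker ID.
// `segments` must have the { start, end, speaker } shape described above.
function totalSpeakingTime(segments) {
  const totals = new Map();
  for (const { start, end, speaker } of segments) {
    totals.set(speaker, (totals.get(speaker) ?? 0) + (end - start));
  }
  return totals;
}

// Hand-written sample data for illustration only.
const sampleSegments = [
  { start: 0.0, end: 2.5, speaker: 0 },
  { start: 2.5, end: 4.0, speaker: 1 },
  { start: 4.0, end: 5.5, speaker: 0 },
];

for (const [speaker, seconds] of totalSpeakingTime(sampleSegments)) {
  console.log(`Speaker ${speaker}: ${seconds.toFixed(2)}s total`);
}
```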

diarizer.setConfig(config)

Update clustering configuration at runtime.

param config

{ clustering: { numClusters?, threshold? } }.
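
One use of setConfig() is to try several clustering thresholds on the same audio without reloading the models. The following is a rough sketch under that assumption: sweepThresholds is a hypothetical helper, and it relies only on the setConfig() and process() methods documented here.

```javascript
// For each threshold, reconfigure clustering, re-run diarization, and
// report how many distinct speaker IDs appear in the result.
// `diarizer` is any object exposing setConfig() and process() as documented.
function sweepThresholds(diarizer, samples, thresholds) {
  return thresholds.map((threshold) => {
    // numClusters: 0 leaves the speaker count to the threshold.
    diarizer.setConfig({ clustering: { numClusters: 0, threshold } });
    const segments = diarizer.process(samples);
    const speakers = new Set(segments.map((seg) => seg.speaker));
    return { threshold, speakerCount: speakers.size };
  });
}
```

A call such as sweepThresholds(diarizer, wave.samples, [0.3, 0.5, 0.7]) returns one { threshold, speakerCount } entry per value, which makes it easier to pick a threshold for a given recording. Each call to process() runs the full pipeline again, so the sweep takes proportionally longer.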

Properties

  • diarizer.config — The configuration object.

  • diarizer.sampleRate — Expected sample rate in Hz (number).

Example

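const sherpa_onnx = require('sherpa-onnx-node');

const diarizer = new sherpa_onnx.OfflineSpeakerDiarization({
  segmentation: { pyannote: { model: './segmentation-3-0.onnx' } },
  embedding: { model: './3dspeaker_speech_eres2net_base_sv_zh-cn_3dspeaker_16k.onnx' },
  // numClusters: 0 auto-detects the number of speakers.
  clustering: { numClusters: 0, threshold: 0.5 },
});

const wave = sherpa_onnx.readWave('./audio.wav');

// The audio must match the sample rate the diarizer expects.
if (wave.sampleRate !== diarizer.sampleRate) {
  throw new Error(`Expected ${diarizer.sampleRate} Hz audio, got ${wave.sampleRate} Hz`);
}

const segments = diarizer.process(wave.samples);

// Print one line per segment with its speaker ID and time range.
for (const seg of segments) {
  console.log(`Speaker ${seg.speaker}: ${seg.start.toFixed(2)}s - ${seg.end.toFixed(2)}s`);
}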

Notes

  • The input audio should be mono, sampled at 16 kHz, with float32 samples in [-1, 1].

  • Set numClusters: 0 to auto-detect the number of speakers.

  • Use setConfig() to adjust clustering parameters without re-creating the diarizer.
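
When audio arrives as raw 16-bit PCM rather than through readWave(), it has to be brought into the mono float32 format from the first note before calling process(). The sketch below shows one way to do that. int16ToMonoFloat32 is a hypothetical helper, not a library function; it averages channels to downmix and does not resample, so the input is assumed to already be at the expected sample rate.

```javascript
// Convert interleaved 16-bit PCM into a mono Float32Array in [-1, 1).
// Channels are averaged per frame; no resampling is performed.
function int16ToMonoFloat32(pcm, numChannels = 1) {
  const numFrames = Math.floor(pcm.length / numChannels);
  const out = new Float32Array(numFrames);
  for (let i = 0; i < numFrames; i++) {
    let sum = 0;
    for (let c = 0; c < numChannels; c++) {
      sum += pcm[i * numChannels + c];
    }
    // Dividing by 32768 maps the int16 range onto [-1, 1).
    out[i] = sum / numChannels / 32768;
  }
  return out;
}
```

For example, a stereo Int16Array can be converted with int16ToMonoFloat32(pcm, 2) and the result passed to diarizer.process().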