Spoken Language Identification

Identify the language spoken in a WAV file using a Whisper multilingual model. This example runs the model on seven test recordings and prints the detected language code and its English name for each.

Source file

nodejs-addon-examples/test_spoken_language_identification.js

Code

// Copyright (c)  2023-2024  Xiaomi Corporation
//
// Spoken language identification using a Whisper multilingual model.
//
// Usage:
//   node test_spoken_language_identification.js
//
const sherpa_onnx = require('sherpa-onnx-node');

// Download whisper multi-lingual models from
// https://github.com/k2-fsa/sherpa-onnx/releases/tag/asr-models
function createSpokenLanguageID() {
  const config = {
    whisper: {
      encoder: './sherpa-onnx-whisper-tiny/tiny-encoder.int8.onnx',
      decoder: './sherpa-onnx-whisper-tiny/tiny-decoder.int8.onnx',
    },
    debug: true,
    numThreads: 1,
    provider: 'cpu',
  };
  return new sherpa_onnx.SpokenLanguageIdentification(config);
}

const slid = createSpokenLanguageID();

const testWaves = [
  './spoken-language-identification-test-wavs/ar-arabic.wav',
  './spoken-language-identification-test-wavs/de-german.wav',
  './spoken-language-identification-test-wavs/en-english.wav',
  './spoken-language-identification-test-wavs/fr-french.wav',
  './spoken-language-identification-test-wavs/pt-portuguese.wav',
  './spoken-language-identification-test-wavs/es-spanish.wav',
  './spoken-language-identification-test-wavs/zh-chinese.wav',
];

// Intl.DisplayNames converts ISO language codes to human-readable names.
const display = new Intl.DisplayNames(['en'], {type: 'language'});

for (const f of testWaves) {
  const stream = slid.createStream();

  const wave = sherpa_onnx.readWave(f);
  stream.acceptWaveform({sampleRate: wave.sampleRate, samples: wave.samples});

  const lang = slid.compute(stream);
  console.log(`${f}: ${lang} (${display.of(lang)})`);
}

How to run

Install the package:
```
npm install sherpa-onnx-node
```

Download the Whisper multilingual model and test files:

curl -SL -O https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/sherpa-onnx-whisper-tiny.tar.bz2
tar xvf sherpa-onnx-whisper-tiny.tar.bz2
rm sherpa-onnx-whisper-tiny.tar.bz2

curl -SL -O https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/spoken-language-identification-test-wavs.tar.bz2
tar xvf spoken-language-identification-test-wavs.tar.bz2
rm spoken-language-identification-test-wavs.tar.bz2
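
Before loading the model, it can be convenient to confirm that the extracted files are where the script expects them. The following is a minimal sketch of such a pre-flight check; `missingFiles` and `requiredFiles` are hypothetical names introduced here for illustration, not part of sherpa-onnx-node, and the paths are the ones used in the example script.

```javascript
const fs = require('node:fs');

// Paths copied from the example script; adjust them if you extracted
// the archives somewhere else.
const requiredFiles = [
  './sherpa-onnx-whisper-tiny/tiny-encoder.int8.onnx',
  './sherpa-onnx-whisper-tiny/tiny-decoder.int8.onnx',
  './spoken-language-identification-test-wavs/en-english.wav',
];

// Hypothetical helper (not part of sherpa-onnx-node): returns the
// subset of `paths` that does not exist on disk.
function missingFiles(paths) {
  return paths.filter((p) => !fs.existsSync(p));
}

const missing = missingFiles(requiredFiles);
if (missing.length > 0) {
  console.error('Missing files:\n  ' + missing.join('\n  '));
} else {
  console.log('All model and test files found.');
}
```

Run it from the directory in which the archives were extracted, since all paths are relative.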

Set the library path and run:

# macOS
export DYLD_LIBRARY_PATH=$(npm root)/sherpa-onnx-node/lib:$DYLD_LIBRARY_PATH

# Linux
export LD_LIBRARY_PATH=$(npm root)/sherpa-onnx-node/lib:$LD_LIBRARY_PATH

node test_spoken_language_identification.js

Expected output

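Because debug is set to true, the library may print additional diagnostic lines. The script itself prints the full path of each file, followed by the detected code and language name:

./spoken-language-identification-test-wavs/ar-arabic.wav: ar (Arabic)
./spoken-language-identification-test-wavs/de-german.wav: de (German)
./spoken-language-identification-test-wavs/en-english.wav: en (English)
./spoken-language-identification-test-wavs/fr-french.wav: fr (French)
./spoken-language-identification-test-wavs/pt-portuguese.wav: pt (Portuguese)
./spoken-language-identification-test-wavs/es-spanish.wav: es (Spanish)
./spoken-language-identification-test-wavs/zh-chinese.wav: zh (Chinese)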
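
Each test file name starts with the language code it is expected to produce (for example, zh-chinese.wav), so the result can be checked mechanically rather than by eye. The sketch below assumes that naming pattern holds for every file you pass in; `expectedCodeFromPath` and `summarize` are hypothetical helpers, and `results` is assumed to be an array you collect yourself from the values returned by compute().

```javascript
const path = require('node:path');

// Assumption: files are named "<code>-<language>.wav", as in the
// test archive. Returns null when the name has no dash.
function expectedCodeFromPath(file) {
  const base = path.basename(file, '.wav');
  const dash = base.indexOf('-');
  return dash === -1 ? null : base.slice(0, dash);
}

// results: array of {file, lang}, where lang is the detected code.
function summarize(results) {
  const mismatches = results.filter(
    (r) => expectedCodeFromPath(r.file) !== r.lang
  );
  return {
    total: results.length,
    correct: results.length - mismatches.length,
    mismatches,
  };
}

// Example with hand-written results (not real model output).
const summary = summarize([
  {file: './wavs/en-english.wav', lang: 'en'},
  {file: './wavs/de-german.wav', lang: 'fr'},
]);
console.log(`${summary.correct}/${summary.total} matched`);
```

In the example script, you would push `{file: f, lang}` into an array inside the loop and call `summarize` once the loop finishes.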

Notes

SpokenLanguageIdentification requires a Whisper multilingual model (not an English-only model).
compute() returns the language code used by Whisper, which for most languages is the two-letter ISO 639-1 code (e.g., en, zh, fr).
Intl.DisplayNames is a built-in JavaScript API that converts language codes to human-readable names.
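
Intl.DisplayNames.of() throws a RangeError for strings that are not structurally valid language tags, and with the option fallback: 'none' it returns undefined for codes it has no name for. The sketch below wraps the lookup so that printing never fails; `describeLanguage` is a hypothetical helper, and the assumption that compute() could return an empty or unexpected string is a defensive guess rather than documented behavior.

```javascript
// fallback: 'none' makes of() return undefined instead of echoing
// the code when no display name is available.
const names = new Intl.DisplayNames(['en'], {
  type: 'language',
  fallback: 'none',
});

// Hypothetical helper: formats a language code for printing and
// never throws on empty or malformed input.
function describeLanguage(code) {
  if (typeof code !== 'string' || code.length === 0) {
    return 'unknown';
  }
  try {
    const name = names.of(code);
    return name === undefined ? code : `${code} (${name})`;
  } catch (err) {
    // of() throws RangeError for structurally invalid tags.
    if (err instanceof RangeError) {
      return code;
    }
    throw err;
  }
}

console.log(describeLanguage('en'));
console.log(describeLanguage(''));
```

In the example script, `describeLanguage(lang)` could replace the inline `display.of(lang)` call inside the template string.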