Speech Enhancement

Remove background noise from audio using a GTCRN (Grouped Temporal Convolutional Recurrent Network) model. This is useful for cleaning up noisy recordings before transcription.

Source file

nodejs-addon-examples/test_offline_speech_enhancement_gtcrn.js

Code

// Copyright (c)  2025  Xiaomi Corporation
//
// Offline speech enhancement (denoising) using a GTCRN model.
//
// Usage:
//   node test_offline_speech_enhancement_gtcrn.js
//
const sherpa_onnx = require('sherpa-onnx-node');

// Download the model and the test file from
// https://github.com/k2-fsa/sherpa-onnx/releases/tag/speech-enhancement-models
function createOfflineSpeechDenoiser() {
  const config = {
    model: {
      gtcrn: {model: './gtcrn_simple.onnx'},
      debug: true,     // print extra diagnostic information
      numThreads: 1,
    },
  };
  return new sherpa_onnx.OfflineSpeechDenoiser(config);
}

const sd = createOfflineSpeechDenoiser();

// readWave() returns {samples, sampleRate}.
const waveFilename = './inp_16k.wav';
const wave = sherpa_onnx.readWave(waveFilename);

// run() accepts {samples, sampleRate, enableExternalBuffer} and returns
// {samples, sampleRate}.
const denoised = sd.run({
  samples: wave.samples,
  sampleRate: wave.sampleRate,
  enableExternalBuffer: true
});

// Write the result using the sample rate reported by run().
sherpa_onnx.writeWave(
    './enhanced-16k.wav',
    {samples: denoised.samples, sampleRate: denoised.sampleRate});

console.log(`Saved to ./enhanced-16k.wav`);
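The comment in the listing describes the shape run() consumes and returns. The pure-JavaScript sketch below checks that shape before and after the call. The helper name checkAudio is hypothetical (it is not part of sherpa-onnx-node), and requiring a Float32Array for samples is an assumption about what readWave() and run() use.

```javascript
// Hypothetical helper (not part of sherpa-onnx-node): verifies that an object
// has the {samples, sampleRate} shape used by readWave(), run() and writeWave().
// Assumption: samples is a Float32Array of PCM values.
function checkAudio(audio, label = 'audio') {
  if (audio === null || typeof audio !== 'object') {
    throw new TypeError(`${label} must be an object`);
  }
  if (!(audio.samples instanceof Float32Array)) {
    throw new TypeError(`${label}.samples must be a Float32Array`);
  }
  if (!Number.isInteger(audio.sampleRate) || audio.sampleRate <= 0) {
    throw new RangeError(`${label}.sampleRate must be a positive integer`);
  }
  return audio;
}

// Synthetic example: one second of silence at 16 kHz.
const input = checkAudio({samples: new Float32Array(16000), sampleRate: 16000}, 'input');
console.log(input.samples.length / input.sampleRate); // 1
```

In the example script, one would call checkAudio(wave, 'input') before sd.run() and checkAudio(denoised, 'output') after it, so a malformed object fails early with a readable message.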

How to run

  1. Install the package:

    npm install sherpa-onnx-node
    
  2. Download the model and test file:

    curl -SL -O https://github.com/k2-fsa/sherpa-onnx/releases/download/speech-enhancement-models/gtcrn_simple.onnx
    curl -SL -O https://github.com/k2-fsa/sherpa-onnx/releases/download/speech-enhancement-models/inp_16k.wav
    
  3. Set the library path and run:

    # macOS
    export DYLD_LIBRARY_PATH=$(npm root)/sherpa-onnx-node/lib:$DYLD_LIBRARY_PATH
    
    # Linux
    export LD_LIBRARY_PATH=$(npm root)/sherpa-onnx-node/lib:$LD_LIBRARY_PATH
    
    node test_offline_speech_enhancement_gtcrn.js
    
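The script expects the model and test file in the current directory. A small pre-flight check like the sketch below (a hypothetical helper, using only Node's built-in fs module) reports which of them are missing before the addon tries to load them. The file list mirrors the names from the download step above.

```javascript
const fs = require('fs');

// Files the example expects in the current directory
// (names taken from the download step).
const REQUIRED_FILES = ['./gtcrn_simple.onnx', './inp_16k.wav'];

// Returns the subset of paths that do not exist on disk.
function findMissingFiles(paths) {
  return paths.filter((p) => !fs.existsSync(p));
}

const missing = findMissingFiles(REQUIRED_FILES);
if (missing.length > 0) {
  console.error(`Missing: ${missing.join(', ')}. Run the download step first.`);
} else {
  console.log('All required files found.');
}
```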

Expected output

Saved to ./enhanced-16k.wav
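The log line only confirms that the file was written. As a rough sanity check (not a quality metric), one can compare the overall RMS level of the input and output samples: removing background noise usually lowers it, although a lower level alone does not show that the speech was preserved. The helper below is a hypothetical sketch and assumes samples are floats in the range [-1, 1].

```javascript
// RMS level in dB relative to full scale (1.0).
// Returns -Infinity for silence or an empty array.
function rmsDb(samples) {
  if (samples.length === 0) return -Infinity;
  let sumSquares = 0;
  for (let i = 0; i < samples.length; i++) {
    sumSquares += samples[i] * samples[i];
  }
  const rms = Math.sqrt(sumSquares / samples.length);
  return rms === 0 ? -Infinity : 20 * Math.log10(rms);
}

// Synthetic data: halving the amplitude lowers the level by about 6.02 dB.
const louder = new Float32Array(1600).fill(0.5);
const quieter = new Float32Array(1600).fill(0.25);
console.log((rmsDb(louder) - rmsDb(quieter)).toFixed(2)); // 6.02
```

In the example script, the corresponding comparison would be rmsDb(wave.samples) against rmsDb(denoised.samples).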

Notes

  • OfflineSpeechDenoiser is non-streaming: run() processes the whole waveform in a single call, so the entire recording must be loaded into memory first.

  • run() accepts {samples, sampleRate, enableExternalBuffer} and returns {samples, sampleRate}.

  • enableExternalBuffer: true lets the addon return the output samples without an extra copy (zero-copy buffer sharing between native code and JavaScript).

  • In this example both the input and the output are 16 kHz. Write the result with denoised.sampleRate, as the code does, rather than assuming it equals the input rate.

  • You can also use dpdfnet_baseline.onnx as an alternative model.
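Two of the notes above can be illustrated in plain JavaScript without sherpa-onnx. "Zero-copy" means an array is a view onto an existing buffer rather than a fresh copy; the sketch contrasts a view (new Float32Array(buffer)) with a copy (slice()). How the addon allocates its output buffer is not shown here. For the sample-rate note, computing each duration from the object's own sampleRate avoids mixing up rates. Both helper names are hypothetical.

```javascript
// Duration in seconds of a {samples, sampleRate} object, using its own rate.
function durationSeconds(audio) {
  return audio.samples.length / audio.sampleRate;
}

// Returns a Float32Array backed by its own ArrayBuffer, so later writes to
// the original buffer do not affect it.
function detachedCopy(samples) {
  return samples.slice();
}

// View vs copy: the view shares memory with `backing`, the copy does not.
const backing = new Float32Array([0.1, 0.2, 0.3, 0.4]);
const view = new Float32Array(backing.buffer); // zero-copy view
const copy = detachedCopy(backing);            // independent copy
backing[0] = 0;
console.log(view[0], copy[0]); // 0 0.10000000149011612

// Two seconds of audio at 16 kHz.
console.log(durationSeconds({samples: new Float32Array(32000), sampleRate: 16000})); // 2
```

Copying costs time and memory proportional to the signal length, which is what a zero-copy option avoids; a copy is only needed when an independent array is required.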