Pre-trained models

Pre-trained models can be found at https://github.com/k2-fsa/sherpa-onnx/releases/tag/speech-enhancement-models

gtcrn_simple

This model is from https://github.com/Xiaobin-Rong/gtcrn. You can find its paper at https://ieeexplore.ieee.org/document/10448310.

In the following, we describe how to download and use it with sherpa-onnx.

Download the model

Use the following command to download the model:

wget https://github.com/k2-fsa/sherpa-onnx/releases/download/speech-enhancement-models/gtcrn_simple.onnx

After downloading, you can check its file size:

ls -lh  gtcrn_simple.onnx
-rw-r--r--  1 fangjun  staff   523K Mar 10 18:44 gtcrn_simple.onnx

Next, download a wave file for testing:

wget https://github.com/k2-fsa/sherpa-onnx/releases/download/speech-enhancement-models/speech_with_noise.wav

Use soxi to inspect the downloaded test wave file:

soxi ./speech_with_noise.wav

Input File     : './speech_with_noise.wav'
Channels       : 1
Sample Rate    : 16000
Precision      : 16-bit
Duration       : 00:00:02.40 = 38363 samples ~ 179.827 CDDA sectors
File Size      : 76.8k
Bit Rate       : 256k
Sample Encoding: 16-bit Signed Integer PCM
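
The numbers reported by soxi are related by simple arithmetic, which the following sketch checks. The values are copied from the output above; the helper name wav_duration is hypothetical and used only for illustration.

```shell
# Values copied from the soxi output above.
samples=38363
sample_rate=16000
bytes_per_sample=2   # 16-bit PCM, one channel

# Hypothetical helper: duration in seconds = samples / sample rate.
wav_duration() {
  awk -v n="$1" -v sr="$2" 'BEGIN { printf "%.3f\n", n / sr }'
}

wav_duration "$samples" "$sample_rate"      # prints 2.398
echo $(( samples * bytes_per_sample ))      # prints 76726 (bytes of audio data)
echo $(( sample_rate * bytes_per_sample * 8 ))  # prints 256000 (bits per second)
```

The 76726 bytes of audio data, plus a small file header (44 bytes if the file uses the canonical WAV header, which is an assumption here), match the 76.8k file size that soxi reports.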

Now run the denoiser:

./build/bin/sherpa-onnx-offline-denoiser \
  --speech-denoiser-gtcrn-model=./gtcrn_simple.onnx \
  --input-wav=./speech_with_noise.wav \
  --output-wav=./enhanced-16k.wav

The log of the above command is:

/Users/fangjun/open-source/sherpa-onnx/sherpa-onnx/csrc/parse-options.cc:Read:375 ./build/bin/sherpa-onnx-offline-denoiser --speech-denoiser-gtcrn-model=./gtcrn_simple.onnx --input-wav=./speech_with_noise.wav --output-wav=./enhanced-16k.wav

OfflineSpeechDenoiserConfig(model=OfflineSpeechDenoiserModelConfig(gtcrn=OfflineSpeechDenoiserGtcrnModelConfig(model="./gtcrn_simple.onnx"), num_threads=1, debug=False, provider="cpu"))
Started
Done
Saved to ./enhanced-16k.wav
num threads: 1
Elapsed seconds: 0.171 s
Real time factor (RTF): 0.171 / 2.398 = 0.071
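
The real time factor in the log is the elapsed processing time divided by the audio duration. As a minimal sketch, the figure can be reproduced from the two logged values; the helper name rtf is hypothetical.

```shell
# Hypothetical helper: RTF = elapsed seconds / audio duration in seconds.
rtf() {
  awk -v e="$1" -v d="$2" 'BEGIN { printf "%.3f\n", e / d }'
}

rtf 0.171 2.398   # prints 0.071
```

An RTF below 1 means the audio is processed faster than real time. On the machine that produced this log, a value of 0.071 corresponds to roughly 14 times faster than real time with a single thread.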

The enhanced audio is saved to enhanced-16k.wav. You can check its file size and properties:

ls -lh enhanced-16k.wav
-rw-r--r--  1 fangjun  staff    75K Mar 22 16:08 enhanced-16k.wav

soxi ./enhanced-16k.wav

Input File     : './enhanced-16k.wav'
Channels       : 1
Sample Rate    : 16000
Precision      : 16-bit
Duration       : 00:00:02.38 = 38144 samples ~ 178.8 CDDA sectors
File Size      : 76.3k
Bit Rate       : 256k
Sample Encoding: 16-bit Signed Integer PCM
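
Note that the enhanced file is slightly shorter than the input (38144 samples versus 38363). The sketch below compares the two counts. The hop size of 256 samples is an assumption that is not stated in this document; it is used only to show that the output length is consistent with frame-based processing that drops a trailing partial frame.

```shell
# Sample counts copied from the two soxi outputs.
in_samples=38363
out_samples=38144
hop=256   # ASSUMPTION: frame hop size in samples, not given in this document

echo $(( in_samples - out_samples ))   # prints 219 (samples dropped)
echo $(( out_samples % hop ))          # prints 0 (output is a whole number of hops)
echo $(( in_samples / hop * hop ))     # prints 38144 (input rounded down to a hop multiple)
```

The 219 missing samples amount to about 14 ms at 16 kHz.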

For comparison, we give the two wave files below so that you can listen to them.

Wave filename            Content
speech_with_noise.wav    Noisy input speech
enhanced-16k.wav         Enhanced (denoised) output