English
Hint
Please refer to Installation to install sherpa-onnx before you read this section.
Note
We use ./build/bin/sherpa-onnx-offline as an example in this section. You can use other scripts such as
This page lists offline CTC models from NeMo for English.
stt_en_citrinet_512
This model is converted from
Citrinet-512 model, which has been trained on the ASR Set dataset with over 7000 hours of English speech.
In the following, we describe how to download it and use it with sherpa-onnx.
Download the model
Please use the following commands to download it.
cd /path/to/sherpa-onnx
wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/sherpa-onnx-nemo-ctc-en-citrinet-512.tar.bz2
tar xvf sherpa-onnx-nemo-ctc-en-citrinet-512.tar.bz2
rm sherpa-onnx-nemo-ctc-en-citrinet-512.tar.bz2
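All four models on this page are downloaded with the same three commands, differing only in the archive name. The sketch below wraps them in two helpers, `model_url` and `download_model`. Both names are hypothetical, and the sketch assumes every archive is published under the same `asr-models` release URL used above.

```shell
# Base URL taken from the wget commands on this page.
RELEASE_URL=https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models

# Prints the download URL for a model directory name (hypothetical helper).
model_url() {
  echo "$RELEASE_URL/$1.tar.bz2"
}

# Downloads, extracts, and then deletes the archive (hypothetical helper).
download_model() {
  local name=$1
  wget "$(model_url "$name")" || return 1
  tar xvf "$name.tar.bz2" || return 1
  rm "$name.tar.bz2"
}

# Example, run inside /path/to/sherpa-onnx:
#   download_model sherpa-onnx-nemo-ctc-en-citrinet-512
```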
Please check that the file sizes of the pre-trained models are correct. See the file sizes of *.onnx files below.
sherpa-onnx-nemo-ctc-en-citrinet-512 fangjun$ ls -lh *.onnx
-rw-r--r-- 1 fangjun staff 36M Apr 7 16:10 model.int8.onnx
-rw-r--r-- 1 fangjun staff 142M Apr 7 14:24 model.onnx
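A small shell sketch can confirm that `model.onnx`, `model.int8.onnx` and `tokens.txt` all exist and are non-empty, and print their approximate sizes in MiB for comparison with the listing above. The helper names `size_mib` and `check_model_dir` are hypothetical, and the rounding may differ slightly from what `ls -lh` prints.

```shell
# Prints the size of a file in MiB, rounded to the nearest integer.
size_mib() {
  local bytes
  bytes=$(wc -c < "$1") || return 1
  awk -v b="$bytes" 'BEGIN { printf "%.0f\n", b / 1048576 }'
}

# Fails if any file needed for decoding is missing or empty;
# otherwise prints each file's approximate size.
check_model_dir() {
  local dir=$1 f
  for f in model.onnx model.int8.onnx tokens.txt; do
    if [ ! -s "$dir/$f" ]; then
      echo "missing or empty: $dir/$f" >&2
      return 1
    fi
    echo "$f: $(size_mib "$dir/$f")M"
  done
}

# Example:
#   check_model_dir ./sherpa-onnx-nemo-ctc-en-citrinet-512
```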
Decode wave files
Hint
It only supports decoding single-channel wave files with 16-bit samples. The sampling rate does not need to be 16 kHz.
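To check your own recordings against these requirements before decoding, the sketch below reads the channel count, sampling rate and bits per sample straight from the wave header with `od`. It assumes a canonical 44-byte PCM header (the `fmt ` chunk starting at byte 12) and a little-endian machine, because `od` prints integers in host byte order; files with extra chunks before `fmt ` are not handled. `read_le`, `wav_info` and `is_supported_wav` are hypothetical names.

```shell
# Reads an unsigned integer of $3 bytes at byte offset $2 of file $1.
# Assumes a little-endian host (od uses host byte order).
read_le() {
  od -An -tu"$3" -j"$2" -N"$3" "$1" | tr -d ' \n'
}

# Prints channels, sampling rate and bits per sample from a canonical header.
wav_info() {
  echo "channels=$(read_le "$1" 22 2) sample_rate=$(read_le "$1" 24 4) bits=$(read_le "$1" 34 2)"
}

# Succeeds only for single-channel, 16-bit files; any sampling rate is accepted.
is_supported_wav() {
  [ "$(read_le "$1" 22 2)" = 1 ] && [ "$(read_le "$1" 34 2)" = 16 ]
}

# Example:
#   wav_info ./sherpa-onnx-nemo-ctc-en-citrinet-512/test_wavs/8k.wav
```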
The following code shows how to use the fp32 model to decode wave files. To use the int8 quantized model, replace model.onnx with model.int8.onnx.
cd /path/to/sherpa-onnx
./build/bin/sherpa-onnx-offline \
--tokens=./sherpa-onnx-nemo-ctc-en-citrinet-512/tokens.txt \
--nemo-ctc-model=./sherpa-onnx-nemo-ctc-en-citrinet-512/model.onnx \
--num-threads=2 \
--decoding-method=greedy_search \
--debug=false \
./sherpa-onnx-nemo-ctc-en-citrinet-512/test_wavs/0.wav \
./sherpa-onnx-nemo-ctc-en-citrinet-512/test_wavs/1.wav \
./sherpa-onnx-nemo-ctc-en-citrinet-512/test_wavs/8k.wav
Note
Please use ./build/bin/Release/sherpa-onnx-offline.exe for Windows.
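The fp32 and int8 commands differ only in the model file name, and the four models on this page differ only in their directory, so the invocation can be wrapped in a function. This is a sketch: `model_file`, `decode` and the `SHERPA_ONNX_OFFLINE` variable are hypothetical names (sherpa-onnx itself does not read that variable), while the command-line flags are exactly those used in the command above.

```shell
# Maps a precision name to the model file name used on this page.
model_file() {
  case "${1:-fp32}" in
    fp32) echo model.onnx ;;
    int8) echo model.int8.onnx ;;
    *) echo "unknown precision: $1" >&2; return 1 ;;
  esac
}

# Usage: decode MODEL_DIR fp32|int8 WAV...
# SHERPA_ONNX_OFFLINE (hypothetical) overrides the binary path, e.g. for Windows.
decode() {
  local dir=$1 model
  model=$(model_file "$2") || return 1
  shift 2
  "${SHERPA_ONNX_OFFLINE:-./build/bin/sherpa-onnx-offline}" \
    --tokens="$dir/tokens.txt" \
    --nemo-ctc-model="$dir/$model" \
    --num-threads=2 \
    --decoding-method=greedy_search \
    --debug=false \
    "$@"
}

# Example: decode all test files with the int8 model.
#   d=./sherpa-onnx-nemo-ctc-en-citrinet-512
#   decode "$d" int8 "$d"/test_wavs/*.wav
```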
You should see the following output:
/Users/fangjun/open-source/sherpa-onnx/sherpa-onnx/csrc/parse-options.cc:Read:361 ./build/bin/sherpa-onnx-offline --tokens=./sherpa-onnx-nemo-ctc-en-citrinet-512/tokens.txt --nemo-ctc-model=./sherpa-onnx-nemo-ctc-en-citrinet-512/model.onnx --num-threads=2 --decoding-method=greedy_search --debug=false ./sherpa-onnx-nemo-ctc-en-citrinet-512/test_wavs/0.wav ./sherpa-onnx-nemo-ctc-en-citrinet-512/test_wavs/1.wav ./sherpa-onnx-nemo-ctc-en-citrinet-512/test_wavs/8k.wav
OfflineRecognizerConfig(feat_config=OfflineFeatureExtractorConfig(sampling_rate=16000, feature_dim=80), model_config=OfflineModelConfig(transducer=OfflineTransducerModelConfig(encoder_filename="", decoder_filename="", joiner_filename=""), paraformer=OfflineParaformerModelConfig(model=""), nemo_ctc=OfflineNemoEncDecCtcModelConfig(model="./sherpa-onnx-nemo-ctc-en-citrinet-512/model.onnx"), tokens="./sherpa-onnx-nemo-ctc-en-citrinet-512/tokens.txt", num_threads=2, debug=False), decoding_method="greedy_search")
Creating recognizer ...
Started
/Users/fangjun/open-source/sherpa-onnx/sherpa-onnx/csrc/offline-stream.cc:AcceptWaveformImpl:105 Creating a resampler:
in_sample_rate: 8000
output_sample_rate: 16000
Done!
./sherpa-onnx-nemo-ctc-en-citrinet-512/test_wavs/0.wav
after early nightfall the yellow lamps would light up here and there the squalid quarter of the brothels
----
./sherpa-onnx-nemo-ctc-en-citrinet-512/test_wavs/1.wav
god as a direct consequence of the sin which man thus punished had given her a lovely child whose place was on that same dishonoured bosom to connect her parent for ever with the race and descent of mortals and to be finally a blessed soul in heaven
----
./sherpa-onnx-nemo-ctc-en-citrinet-512/test_wavs/8k.wav
yet these thoughts affected hester prynne less with hope than apprehension
----
num threads: 2
decoding method: greedy_search
Elapsed seconds: 4.963 s
Real time factor (RTF): 4.963 / 28.165 = 0.176
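The real time factor (RTF) is the elapsed decoding time divided by the total duration of the decoded audio, so a value below 1 means decoding runs faster than real time. The arithmetic in the last line can be reproduced with a one-line `awk` helper; `rtf` is a hypothetical name.

```shell
# Real time factor = processing time / audio duration, both in seconds.
rtf() {
  awk -v t="$1" -v d="$2" 'BEGIN { printf "%.3f\n", t / d }'
}

# Reproduces the value reported above:
#   rtf 4.963 28.165   # prints 0.176
```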
stt_en_conformer_ctc_small
This model is converted from
It contains the small-size version of Conformer-CTC (13M parameters) trained on NeMo ASRSet with around 16000 hours of English speech. The model transcribes speech in the lowercase English alphabet along with spaces and apostrophes.
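Because the output contains only lowercase English letters, spaces and apostrophes, a reference transcript usually has to be mapped to the same alphabet before it can be compared with the decoded text. One possible normalization is sketched below; `normalize` is a hypothetical helper, and it simply drops every other character, so digits are removed rather than spelled out and hyphenated words are joined.

```shell
# Lower-cases stdin and keeps only a-z, apostrophes, spaces and newlines.
# LC_ALL=C keeps the a-z range byte-based regardless of the user's locale.
normalize() {
  LC_ALL=C tr '[:upper:]' '[:lower:]' | LC_ALL=C tr -cd "a-z' \n"
}

# Example:
#   normalize < reference.txt > reference.norm.txt
```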
In the following, we describe how to download it and use it with sherpa-onnx.
Download the model
Please use the following commands to download it.
cd /path/to/sherpa-onnx
wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/sherpa-onnx-nemo-ctc-en-conformer-small.tar.bz2
tar xvf sherpa-onnx-nemo-ctc-en-conformer-small.tar.bz2
rm sherpa-onnx-nemo-ctc-en-conformer-small.tar.bz2
Please check that the file sizes of the pre-trained models are correct. See the file sizes of *.onnx files below.
sherpa-onnx-nemo-ctc-en-conformer-small fangjun$ ls -lh *.onnx
-rw-r--r-- 1 fangjun staff 44M Apr 7 20:24 model.int8.onnx
-rw-r--r-- 1 fangjun staff 81M Apr 7 18:56 model.onnx
Decode wave files
Hint
It only supports decoding single-channel wave files with 16-bit samples. The sampling rate does not need to be 16 kHz.
The following code shows how to use the fp32 model to decode wave files. To use the int8 quantized model, replace model.onnx with model.int8.onnx.
cd /path/to/sherpa-onnx
./build/bin/sherpa-onnx-offline \
--tokens=./sherpa-onnx-nemo-ctc-en-conformer-small/tokens.txt \
--nemo-ctc-model=./sherpa-onnx-nemo-ctc-en-conformer-small/model.onnx \
--num-threads=2 \
--decoding-method=greedy_search \
--debug=false \
./sherpa-onnx-nemo-ctc-en-conformer-small/test_wavs/0.wav \
./sherpa-onnx-nemo-ctc-en-conformer-small/test_wavs/1.wav \
./sherpa-onnx-nemo-ctc-en-conformer-small/test_wavs/8k.wav
Note
Please use ./build/bin/Release/sherpa-onnx-offline.exe for Windows.
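If one script has to run on both Unix-like systems and Windows, the binary path from the note above can be chosen from the output of `uname -s`. This sketch assumes that on Windows the script runs under MSYS2, Git Bash or Cygwin, where `uname -s` starts with `MINGW`, `MSYS` or `CYGWIN`; `offline_bin` is a hypothetical name.

```shell
# Prints the decoder path for the given `uname -s` value (defaults to this host).
offline_bin() {
  case "${1:-$(uname -s)}" in
    MINGW*|MSYS*|CYGWIN*) echo ./build/bin/Release/sherpa-onnx-offline.exe ;;
    *) echo ./build/bin/sherpa-onnx-offline ;;
  esac
}

# Example:
#   "$(offline_bin)" --tokens=... --nemo-ctc-model=... ./test.wav
```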
You should see the following output:
/Users/fangjun/open-source/sherpa-onnx/sherpa-onnx/csrc/parse-options.cc:Read:361 ./build/bin/sherpa-onnx-offline --tokens=./sherpa-onnx-nemo-ctc-en-conformer-small/tokens.txt --nemo-ctc-model=./sherpa-onnx-nemo-ctc-en-conformer-small/model.onnx --num-threads=2 --decoding-method=greedy_search --debug=false ./sherpa-onnx-nemo-ctc-en-conformer-small/test_wavs/0.wav ./sherpa-onnx-nemo-ctc-en-conformer-small/test_wavs/1.wav ./sherpa-onnx-nemo-ctc-en-conformer-small/test_wavs/8k.wav
OfflineRecognizerConfig(feat_config=OfflineFeatureExtractorConfig(sampling_rate=16000, feature_dim=80), model_config=OfflineModelConfig(transducer=OfflineTransducerModelConfig(encoder_filename="", decoder_filename="", joiner_filename=""), paraformer=OfflineParaformerModelConfig(model=""), nemo_ctc=OfflineNemoEncDecCtcModelConfig(model="./sherpa-onnx-nemo-ctc-en-conformer-small/model.onnx"), tokens="./sherpa-onnx-nemo-ctc-en-conformer-small/tokens.txt", num_threads=2, debug=False), decoding_method="greedy_search")
Creating recognizer ...
Started
/Users/fangjun/open-source/sherpa-onnx/sherpa-onnx/csrc/offline-stream.cc:AcceptWaveformImpl:105 Creating a resampler:
in_sample_rate: 8000
output_sample_rate: 16000
Done!
./sherpa-onnx-nemo-ctc-en-conformer-small/test_wavs/0.wav
after early nightfall the yellow lamps would light up here and there the squalid quarter of the brothels
----
./sherpa-onnx-nemo-ctc-en-conformer-small/test_wavs/1.wav
god as a direct consequence of the sin which man thus punished had given her a lovely child whose place was on that same dishonoured bosom to connect her parent for ever with the race and descent of mortals and to be finally a blessed soul in heaven
----
./sherpa-onnx-nemo-ctc-en-conformer-small/test_wavs/8k.wav
yet these thoughts affected hester prin less with hope than apprehension
----
num threads: 2
decoding method: greedy_search
Elapsed seconds: 0.665 s
Real time factor (RTF): 0.665 / 28.165 = 0.024
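The denominator 28.165 is the total duration, in seconds, of the three decoded test files. For a PCM wave file, the duration is the size of the data chunk divided by the byte rate, i.e. sample rate × channels × bits per sample / 8. The sketch below computes it from the header; it assumes a canonical 44-byte header (data size at byte 40) and a little-endian machine, and `read_le`, `wav_duration` and `total_duration` are hypothetical names.

```shell
# Reads an unsigned integer of $3 bytes at byte offset $2 of file $1.
# Assumes a little-endian host (od uses host byte order).
read_le() {
  od -An -tu"$3" -j"$2" -N"$3" "$1" | tr -d ' \n'
}

# Duration in seconds = data bytes / (sample_rate * channels * bits / 8).
wav_duration() {
  awk -v n="$(read_le "$1" 40 4)" -v r="$(read_le "$1" 24 4)" \
      -v c="$(read_le "$1" 22 2)" -v b="$(read_le "$1" 34 2)" \
      'BEGIN { printf "%.3f\n", n / (r * c * b / 8) }'
}

# Sums the durations of all files given as arguments.
total_duration() {
  local f sum=0
  for f in "$@"; do
    sum=$(awk -v s="$sum" -v d="$(wav_duration "$f")" 'BEGIN { printf "%.3f\n", s + d }')
  done
  echo "$sum"
}

# Example:
#   total_duration ./sherpa-onnx-nemo-ctc-en-conformer-small/test_wavs/*.wav
```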
stt_en_conformer_ctc_medium
This model is converted from
It contains the medium-size version of Conformer-CTC (around 30M parameters) trained on NeMo ASRSet with around 16000 hours of English speech. The model transcribes speech in the lowercase English alphabet along with spaces and apostrophes.
In the following, we describe how to download it and use it with sherpa-onnx.
Download the model
Please use the following commands to download it.
cd /path/to/sherpa-onnx
wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/sherpa-onnx-nemo-ctc-en-conformer-medium.tar.bz2
tar xvf sherpa-onnx-nemo-ctc-en-conformer-medium.tar.bz2
rm sherpa-onnx-nemo-ctc-en-conformer-medium.tar.bz2
Please check that the file sizes of the pre-trained models are correct. See the file sizes of *.onnx files below.
sherpa-onnx-nemo-ctc-en-conformer-medium fangjun$ ls -lh *.onnx
-rw-r--r-- 1 fangjun staff 64M Apr 7 20:44 model.int8.onnx
-rw-r--r-- 1 fangjun staff 152M Apr 7 20:43 model.onnx
Decode wave files
Hint
It only supports decoding single-channel wave files with 16-bit samples. The sampling rate does not need to be 16 kHz.
The following code shows how to use the fp32 model to decode wave files. To use the int8 quantized model, replace model.onnx with model.int8.onnx.
cd /path/to/sherpa-onnx
./build/bin/sherpa-onnx-offline \
--tokens=./sherpa-onnx-nemo-ctc-en-conformer-medium/tokens.txt \
--nemo-ctc-model=./sherpa-onnx-nemo-ctc-en-conformer-medium/model.onnx \
--num-threads=2 \
--decoding-method=greedy_search \
--debug=false \
./sherpa-onnx-nemo-ctc-en-conformer-medium/test_wavs/0.wav \
./sherpa-onnx-nemo-ctc-en-conformer-medium/test_wavs/1.wav \
./sherpa-onnx-nemo-ctc-en-conformer-medium/test_wavs/8k.wav
Note
Please use ./build/bin/Release/sherpa-onnx-offline.exe for Windows.
You should see the following output:
/Users/fangjun/open-source/sherpa-onnx/sherpa-onnx/csrc/parse-options.cc:Read:361 ./build/bin/sherpa-onnx-offline --tokens=./sherpa-onnx-nemo-ctc-en-conformer-medium/tokens.txt --nemo-ctc-model=./sherpa-onnx-nemo-ctc-en-conformer-medium/model.onnx --num-threads=2 --decoding-method=greedy_search --debug=false ./sherpa-onnx-nemo-ctc-en-conformer-medium/test_wavs/0.wav ./sherpa-onnx-nemo-ctc-en-conformer-medium/test_wavs/1.wav ./sherpa-onnx-nemo-ctc-en-conformer-medium/test_wavs/8k.wav
OfflineRecognizerConfig(feat_config=OfflineFeatureExtractorConfig(sampling_rate=16000, feature_dim=80), model_config=OfflineModelConfig(transducer=OfflineTransducerModelConfig(encoder_filename="", decoder_filename="", joiner_filename=""), paraformer=OfflineParaformerModelConfig(model=""), nemo_ctc=OfflineNemoEncDecCtcModelConfig(model="./sherpa-onnx-nemo-ctc-en-conformer-medium/model.onnx"), tokens="./sherpa-onnx-nemo-ctc-en-conformer-medium/tokens.txt", num_threads=2, debug=False), decoding_method="greedy_search")
Creating recognizer ...
Started
/Users/fangjun/open-source/sherpa-onnx/sherpa-onnx/csrc/offline-stream.cc:AcceptWaveformImpl:105 Creating a resampler:
in_sample_rate: 8000
output_sample_rate: 16000
Done!
./sherpa-onnx-nemo-ctc-en-conformer-medium/test_wavs/0.wav
after early nightfall the yellow lamps would light up here and there the squalid quarter of the brothels
----
./sherpa-onnx-nemo-ctc-en-conformer-medium/test_wavs/1.wav
god as a direct consequence of the sin which man thus punished had given her a lovely child whose place was on that same dishonored bosom to connect her parent for ever with the race and descent of mortals and to be finally a blessed soul in heaven
----
./sherpa-onnx-nemo-ctc-en-conformer-medium/test_wavs/8k.wav
yet these thoughts affected hester pryne less with hope than apprehension
----
num threads: 2
decoding method: greedy_search
Elapsed seconds: 1.184 s
Real time factor (RTF): 1.184 / 28.165 = 0.042
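In the output, each recognition result is a line holding the wave file path, a line holding the text, and a `----` separator. Assuming the console output was saved to a file (for example by appending `> log.txt 2>&1` to the command, which captures both output streams), the small `awk` sketch below turns it into tab-separated `path<TAB>text` pairs. `extract_results` is a hypothetical name.

```shell
# Prints "path<TAB>text" for every result in a saved sherpa-onnx-offline log.
# A result header is a line consisting only of a path ending in .wav. The echoed
# command line at the top also ends in .wav but contains spaces, so NF == 1
# skips it.
extract_results() {
  awk 'NF == 1 && /\.wav$/ { file = $0; if ((getline text) > 0) print file "\t" text }' "$@"
}

# Example:
#   extract_results log.txt
```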
stt_en_conformer_ctc_large
This model is converted from
It contains the large-size version of Conformer-CTC (around 120M parameters) trained on NeMo ASRSet with around 24500 hours of English speech. The model transcribes speech in the lowercase English alphabet along with spaces and apostrophes.
In the following, we describe how to download it and use it with sherpa-onnx.
Download the model
Please use the following commands to download it.
cd /path/to/sherpa-onnx
wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/sherpa-onnx-nemo-ctc-en-conformer-large.tar.bz2
tar xvf sherpa-onnx-nemo-ctc-en-conformer-large.tar.bz2
rm sherpa-onnx-nemo-ctc-en-conformer-large.tar.bz2
Please check that the file sizes of the pre-trained models are correct. See the file sizes of *.onnx files below.
sherpa-onnx-nemo-ctc-en-conformer-large fangjun$ ls -lh *.onnx
-rw-r--r-- 1 fangjun staff 162M Apr 7 22:01 model.int8.onnx
-rw-r--r-- 1 fangjun staff 508M Apr 7 22:01 model.onnx
Decode wave files
Hint
It only supports decoding single-channel wave files with 16-bit samples. The sampling rate does not need to be 16 kHz.
The following code shows how to use the fp32 model to decode wave files. To use the int8 quantized model, replace model.onnx with model.int8.onnx.
cd /path/to/sherpa-onnx
./build/bin/sherpa-onnx-offline \
--tokens=./sherpa-onnx-nemo-ctc-en-conformer-large/tokens.txt \
--nemo-ctc-model=./sherpa-onnx-nemo-ctc-en-conformer-large/model.onnx \
--num-threads=2 \
--decoding-method=greedy_search \
--debug=false \
./sherpa-onnx-nemo-ctc-en-conformer-large/test_wavs/0.wav \
./sherpa-onnx-nemo-ctc-en-conformer-large/test_wavs/1.wav \
./sherpa-onnx-nemo-ctc-en-conformer-large/test_wavs/8k.wav
Note
Please use ./build/bin/Release/sherpa-onnx-offline.exe for Windows.
You should see the following output:
/Users/fangjun/open-source/sherpa-onnx/sherpa-onnx/csrc/parse-options.cc:Read:361 ./build/bin/sherpa-onnx-offline --tokens=./sherpa-onnx-nemo-ctc-en-conformer-large/tokens.txt --nemo-ctc-model=./sherpa-onnx-nemo-ctc-en-conformer-large/model.onnx --num-threads=2 --decoding-method=greedy_search --debug=false ./sherpa-onnx-nemo-ctc-en-conformer-large/test_wavs/0.wav ./sherpa-onnx-nemo-ctc-en-conformer-large/test_wavs/1.wav ./sherpa-onnx-nemo-ctc-en-conformer-large/test_wavs/8k.wav
OfflineRecognizerConfig(feat_config=OfflineFeatureExtractorConfig(sampling_rate=16000, feature_dim=80), model_config=OfflineModelConfig(transducer=OfflineTransducerModelConfig(encoder_filename="", decoder_filename="", joiner_filename=""), paraformer=OfflineParaformerModelConfig(model=""), nemo_ctc=OfflineNemoEncDecCtcModelConfig(model="./sherpa-onnx-nemo-ctc-en-conformer-large/model.onnx"), tokens="./sherpa-onnx-nemo-ctc-en-conformer-large/tokens.txt", num_threads=2, debug=False), decoding_method="greedy_search")
Creating recognizer ...
Started
/Users/fangjun/open-source/sherpa-onnx/sherpa-onnx/csrc/offline-stream.cc:AcceptWaveformImpl:105 Creating a resampler:
in_sample_rate: 8000
output_sample_rate: 16000
Done!
./sherpa-onnx-nemo-ctc-en-conformer-large/test_wavs/0.wav
after early nightfall the yellow lamps would light up here and there the squalid quarter of the brothels
----
./sherpa-onnx-nemo-ctc-en-conformer-large/test_wavs/1.wav
god as a direct consequence of the sin which man thus punished had given her a lovely child whose place was on that same dishonored bosom to connect her parent for ever with the race and descent of mortals and to be finally a blesed soul in heaven
----
./sherpa-onnx-nemo-ctc-en-conformer-large/test_wavs/8k.wav
yet these thoughts afected hester pryne les with hope than aprehension
----
num threads: 2
decoding method: greedy_search
Elapsed seconds: 3.553 s
Real time factor (RTF): 3.553 / 28.165 = 0.126
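The four models on this page share the same command-line flags and test files, so a single loop can run all of them on the same input. This makes it easy to compare their transcripts, such as the different spellings of the name in 8k.wav above. This is a sketch: `compare_models` and the `SHERPA_ONNX_OFFLINE` override are hypothetical names, and the four model directories are assumed to be already downloaded into the current directory.

```shell
# Model directories from this page, assumed to be extracted here.
MODELS="sherpa-onnx-nemo-ctc-en-citrinet-512
sherpa-onnx-nemo-ctc-en-conformer-small
sherpa-onnx-nemo-ctc-en-conformer-medium
sherpa-onnx-nemo-ctc-en-conformer-large"

# Decodes test_wavs/<name> (default 8k.wav) with each model's fp32 model.onnx.
# SHERPA_ONNX_OFFLINE (hypothetical) overrides the binary path.
compare_models() {
  local wav=${1:-8k.wav} dir
  for dir in $MODELS; do
    echo "== $dir"
    "${SHERPA_ONNX_OFFLINE:-./build/bin/sherpa-onnx-offline}" \
      --tokens="./$dir/tokens.txt" \
      --nemo-ctc-model="./$dir/model.onnx" \
      --num-threads=2 \
      --decoding-method=greedy_search \
      --debug=false \
      "./$dir/test_wavs/$wav" || return 1
  done
}

# Example:
#   compare_models 8k.wav
```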