NeMo transducer-based Models

Hint

Please refer to Installation to install sherpa-onnx before you read this section.

sherpa-onnx-nemo-parakeet-tdt-0.6b-v2-int8 (English, 英语)

This model is converted from

You can find the conversion script at

In the following, we describe how to download it and use it with sherpa-onnx.

Hint

This model supports punctuation and casing.

Download the model

Please use the following commands to download it.

wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/sherpa-onnx-nemo-parakeet-tdt-0.6b-v2-int8.tar.bz2
tar xvf sherpa-onnx-nemo-parakeet-tdt-0.6b-v2-int8.tar.bz2
rm sherpa-onnx-nemo-parakeet-tdt-0.6b-v2-int8.tar.bz2

Hint

If you want to try the float16 quantized model, please use sherpa-onnx-nemo-parakeet-tdt-0.6b-v2-fp16.tar.bz2.

If you want to try the non-quantized decoder and joiner models, please use sherpa-onnx-nemo-parakeet-tdt-0.6b-v2.tar.bz2.

You should see something like the following after downloading:

ls -lh sherpa-onnx-nemo-parakeet-tdt-0.6b-v2-int8/
total 1295752
-rw-r--r--  1 fangjun  staff   6.9M May  6 16:24 decoder.int8.onnx
-rw-r--r--  1 fangjun  staff   622M May  6 16:24 encoder.int8.onnx
-rw-r--r--  1 fangjun  staff   1.7M May  6 16:24 joiner.int8.onnx
drwxr-xr-x  3 fangjun  staff    96B May  6 16:24 test_wavs
-rw-r--r--  1 fangjun  staff   9.2K May  6 16:24 tokens.txt

Decode wave files

Hint

It supports decoding only single-channel wave files with 16-bit encoded samples. The sampling rate does not need to be 16 kHz.
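
If your wave files are stereo or not 16-bit PCM, you can convert them first. Below is a minimal sketch that down-mixes to mono and rewrites the file as 16-bit PCM; it assumes the third-party soundfile package is installed and uses placeholder file names, so adapt it to your setup (sox or ffmpeg work equally well).

import soundfile as sf

# Placeholder paths; replace with your own files.
samples, sample_rate = sf.read("input.wav", dtype="float32", always_2d=True)
mono = samples.mean(axis=1)  # average all channels down to one
sf.write("converted.wav", mono, sample_rate, subtype="PCM_16")  # mono, 16-bit PCM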

cd /path/to/sherpa-onnx

./build/bin/sherpa-onnx-offline \
  --encoder=./sherpa-onnx-nemo-parakeet-tdt-0.6b-v2-int8/encoder.int8.onnx \
  --decoder=./sherpa-onnx-nemo-parakeet-tdt-0.6b-v2-int8/decoder.int8.onnx \
  --joiner=./sherpa-onnx-nemo-parakeet-tdt-0.6b-v2-int8/joiner.int8.onnx \
  --tokens=./sherpa-onnx-nemo-parakeet-tdt-0.6b-v2-int8/tokens.txt \
  --model-type=nemo_transducer \
  ./sherpa-onnx-nemo-parakeet-tdt-0.6b-v2-int8/test_wavs/0.wav

Note

Please use ./build/bin/Release/sherpa-onnx-offline.exe for Windows.

You should see the following output:

/Users/fangjun/open-source/sherpa-onnx/sherpa-onnx/csrc/parse-options.cc:Read:372 ./build/bin/sherpa-onnx-offline --encoder=./sherpa-onnx-nemo-parakeet-tdt-0.6b-v2-int8/encoder.int8.onnx --decoder=./sherpa-onnx-nemo-parakeet-tdt-0.6b-v2-int8/decoder.int8.onnx --joiner=./sherpa-onnx-nemo-parakeet-tdt-0.6b-v2-int8/joiner.int8.onnx --tokens=./sherpa-onnx-nemo-parakeet-tdt-0.6b-v2-int8/tokens.txt --model-type=nemo_transducer ./sherpa-onnx-nemo-parakeet-tdt-0.6b-v2-int8/test_wavs/0.wav

OfflineRecognizerConfig(feat_config=FeatureExtractorConfig(sampling_rate=16000, feature_dim=80, low_freq=20, high_freq=-400, dither=0, normalize_samples=True, snip_edges=False), model_config=OfflineModelConfig(transducer=OfflineTransducerModelConfig(encoder_filename="./sherpa-onnx-nemo-parakeet-tdt-0.6b-v2-int8/encoder.int8.onnx", decoder_filename="./sherpa-onnx-nemo-parakeet-tdt-0.6b-v2-int8/decoder.int8.onnx", joiner_filename="./sherpa-onnx-nemo-parakeet-tdt-0.6b-v2-int8/joiner.int8.onnx"), paraformer=OfflineParaformerModelConfig(model=""), nemo_ctc=OfflineNemoEncDecCtcModelConfig(model=""), whisper=OfflineWhisperModelConfig(encoder="", decoder="", language="", task="transcribe", tail_paddings=-1), fire_red_asr=OfflineFireRedAsrModelConfig(encoder="", decoder=""), tdnn=OfflineTdnnModelConfig(model=""), zipformer_ctc=OfflineZipformerCtcModelConfig(model=""), wenet_ctc=OfflineWenetCtcModelConfig(model=""), sense_voice=OfflineSenseVoiceModelConfig(model="", language="auto", use_itn=False), moonshine=OfflineMoonshineModelConfig(preprocessor="", encoder="", uncached_decoder="", cached_decoder=""), dolphin=OfflineDolphinModelConfig(model=""), telespeech_ctc="", tokens="./sherpa-onnx-nemo-parakeet-tdt-0.6b-v2-int8/tokens.txt", num_threads=2, debug=False, provider="cpu", model_type="nemo_transducer", modeling_unit="cjkchar", bpe_vocab=""), lm_config=OfflineLMConfig(model="", scale=0.5), ctc_fst_decoder_config=OfflineCtcFstDecoderConfig(graph="", max_active=3000), decoding_method="greedy_search", max_active_paths=4, hotwords_file="", hotwords_score=1.5, blank_penalty=0, rule_fsts="", rule_fars="", hr=HomophoneReplacerConfig(dict_dir="", lexicon="", rule_fsts=""))
Creating recognizer ...
Started
Done!

./sherpa-onnx-nemo-parakeet-tdt-0.6b-v2-int8/test_wavs/0.wav
{"lang": "", "emotion": "", "event": "", "text": " Well, I don't wish to see it any more, observed Phebe, turning away her eyes. It is certainly very like the old portrait.", "timestamps": [0.32, 0.64, 0.72, 0.80, 0.88, 0.96, 1.04, 1.12, 1.28, 1.44, 1.60, 1.76, 1.92, 2.00, 2.24, 2.32, 2.40, 2.48, 2.64, 2.72, 2.88, 3.12, 3.36, 3.44, 3.52, 3.68, 3.76, 3.92, 4.16, 4.24, 4.32, 4.64, 4.96, 5.12, 5.36, 5.44, 5.52, 5.60, 5.76, 6.00, 6.24, 6.40, 6.48, 6.64, 6.72, 6.80, 6.88, 7.04], "tokens":[" Well", ",", " I", " don", "'", "t", " w", "ish", " to", " see", " it", " any", " more", ",", " ob", "s", "er", "ved", " P", "he", "be", ",", " t", "ur", "ning", " a", "way", " her", " e", "y", "es", ".", " It", " is", " c", "ert", "ain", "ly", " very", " like", " the", " o", "ld", " p", "ort", "ra", "it", "."], "words": []}
----
num threads: 2
decoding method: greedy_search
Elapsed seconds: 0.874 s
Real time factor (RTF): 0.874 / 7.435 = 0.118
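
If you prefer Python to the pre-built binary, the same model can be used through the sherpa-onnx Python package. The following is a small sketch, assuming pip install sherpa-onnx soundfile; the method and argument names follow the sherpa-onnx Python API for offline transducer models, so please double-check them against the python-api-examples directory of the repository if your version differs.

import sherpa_onnx
import soundfile as sf

# Create an offline (non-streaming) recognizer from the downloaded model files.
recognizer = sherpa_onnx.OfflineRecognizer.from_transducer(
    encoder="./sherpa-onnx-nemo-parakeet-tdt-0.6b-v2-int8/encoder.int8.onnx",
    decoder="./sherpa-onnx-nemo-parakeet-tdt-0.6b-v2-int8/decoder.int8.onnx",
    joiner="./sherpa-onnx-nemo-parakeet-tdt-0.6b-v2-int8/joiner.int8.onnx",
    tokens="./sherpa-onnx-nemo-parakeet-tdt-0.6b-v2-int8/tokens.txt",
    model_type="nemo_transducer",
    num_threads=2,
)

samples, sample_rate = sf.read(
    "./sherpa-onnx-nemo-parakeet-tdt-0.6b-v2-int8/test_wavs/0.wav", dtype="float32"
)

stream = recognizer.create_stream()
stream.accept_waveform(sample_rate, samples)
recognizer.decode_stream(stream)
print(stream.result.text)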

Speech recognition from a microphone

cd /path/to/sherpa-onnx

./build/bin/sherpa-onnx-microphone-offline \
  --encoder=./sherpa-onnx-nemo-parakeet-tdt-0.6b-v2-int8/encoder.int8.onnx \
  --decoder=./sherpa-onnx-nemo-parakeet-tdt-0.6b-v2-int8/decoder.int8.onnx \
  --joiner=./sherpa-onnx-nemo-parakeet-tdt-0.6b-v2-int8/joiner.int8.onnx \
  --tokens=./sherpa-onnx-nemo-parakeet-tdt-0.6b-v2-int8/tokens.txt \
  --model-type=nemo_transducer

Speech recognition from a microphone with VAD

cd /path/to/sherpa-onnx

wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/silero_vad.onnx

./build/bin/sherpa-onnx-vad-microphone-offline-asr \
  --silero-vad-model=./silero_vad.onnx \
  --encoder=./sherpa-onnx-nemo-parakeet-tdt-0.6b-v2-int8/encoder.int8.onnx \
  --decoder=./sherpa-onnx-nemo-parakeet-tdt-0.6b-v2-int8/decoder.int8.onnx \
  --joiner=./sherpa-onnx-nemo-parakeet-tdt-0.6b-v2-int8/joiner.int8.onnx \
  --tokens=./sherpa-onnx-nemo-parakeet-tdt-0.6b-v2-int8/tokens.txt \
  --model-type=nemo_transducer

RTF on RK3588 with Cortex A76 CPU

In the following, we test this model on an RK3588 board using its Cortex-A76 CPUs.

Information about the CPUs on the board is given below:

Architecture:                    aarch64
CPU op-mode(s):                  32-bit, 64-bit
Byte Order:                      Little Endian
CPU(s):                          8
On-line CPU(s) list:             0-7
Vendor ID:                       ARM
Model name:                      Cortex-A55
Model:                           0
Thread(s) per core:              1
Core(s) per socket:              4
Socket(s):                       1
Stepping:                        r2p0
CPU max MHz:                     1800.0000
CPU min MHz:                     408.0000
BogoMIPS:                        48.00
Flags:                           fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm lrcpc dcpop asimddp
Model name:                      Cortex-A76
Model:                           0
Thread(s) per core:              1
Core(s) per socket:              4
Socket(s):                       1
Stepping:                        r4p0
CPU max MHz:                     2304.0000
CPU min MHz:                     408.0000
BogoMIPS:                        48.00
Flags:                           fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm lrcpc dcpop asimddp
L1d cache:                       384 KiB (8 instances)
L1i cache:                       384 KiB (8 instances)
L2 cache:                        2.5 MiB (8 instances)
L3 cache:                        3 MiB (1 instance)
Vulnerability Itlb multihit:     Not affected
Vulnerability L1tf:              Not affected
Vulnerability Mds:               Not affected
Vulnerability Meltdown:          Not affected
Vulnerability Mmio stale data:   Not affected
Vulnerability Retbleed:          Not affected
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:        Mitigation; __user pointer sanitization
Vulnerability Spectre v2:        Vulnerable: Unprivileged eBPF enabled
Vulnerability Srbds:             Not affected
Vulnerability Tsx async abort:   Not affected

You can see that it has 8 CPUs: 4 Cortex A55 + 4 Cortex A76.

We use taskset below to pin the process to the Cortex-A76 cores (CPUs 4-7) when measuring the RTF.
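
In case the hexadecimal masks look opaque: each bit of the mask enables one CPU, so 0x80 selects CPU 7 (a Cortex-A76 core on this board), 0xc0 selects CPUs 6-7, 0xe0 selects CPUs 5-7, and 0xf0 selects CPUs 4-7. A quick way to verify the mapping in Python:

# Print which CPUs each taskset mask allows (bit i set => CPU i allowed).
for mask in (0x80, 0xC0, 0xE0, 0xF0):
    cpus = [i for i in range(8) if (mask >> i) & 1]
    print(f"{mask:#x} -> CPUs {cpus}")
# 0x80 -> CPUs [7]
# 0xc0 -> CPUs [6, 7]
# 0xe0 -> CPUs [5, 6, 7]
# 0xf0 -> CPUs [4, 5, 6, 7]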

taskset 0x80 sherpa-onnx-offline \
  --num-threads=1 \
  --encoder=./sherpa-onnx-nemo-parakeet-tdt-0.6b-v2-int8/encoder.int8.onnx \
  --decoder=./sherpa-onnx-nemo-parakeet-tdt-0.6b-v2-int8/decoder.int8.onnx \
  --joiner=./sherpa-onnx-nemo-parakeet-tdt-0.6b-v2-int8/joiner.int8.onnx \
  --tokens=./sherpa-onnx-nemo-parakeet-tdt-0.6b-v2-int8/tokens.txt \
  --model-type=nemo_transducer \
  ./sherpa-onnx-nemo-parakeet-tdt-0.6b-v2-int8/test_wavs/0.wav

Its output is given below:

/project/sherpa-onnx/csrc/parse-options.cc:Read:372 sherpa-onnx-offline --num-threads=1 --encoder=./sherpa-onnx-nemo-parakeet-tdt-0.6b-v2-int8/encoder.int8.onnx --decoder=./sherpa-onnx-nemo-parakeet-tdt-0.6b-v2-int8/decoder.int8.onnx --joiner=./sherpa-onnx-nemo-parakeet-tdt-0.6b-v2-int8/joiner.int8.onnx --tokens=./sherpa-onnx-nemo-parakeet-tdt-0.6b-v2-int8/tokens.txt --model-type=nemo_transducer ./sherpa-onnx-nemo-parakeet-tdt-0.6b-v2-int8/test_wavs/0.wav 

OfflineRecognizerConfig(feat_config=FeatureExtractorConfig(sampling_rate=16000, feature_dim=80, low_freq=20, high_freq=-400, dither=0, normalize_samples=True, snip_edges=False), model_config=OfflineModelConfig(transducer=OfflineTransducerModelConfig(encoder_filename="./sherpa-onnx-nemo-parakeet-tdt-0.6b-v2-int8/encoder.int8.onnx", decoder_filename="./sherpa-onnx-nemo-parakeet-tdt-0.6b-v2-int8/decoder.int8.onnx", joiner_filename="./sherpa-onnx-nemo-parakeet-tdt-0.6b-v2-int8/joiner.int8.onnx"), paraformer=OfflineParaformerModelConfig(model=""), nemo_ctc=OfflineNemoEncDecCtcModelConfig(model=""), whisper=OfflineWhisperModelConfig(encoder="", decoder="", language="", task="transcribe", tail_paddings=-1), fire_red_asr=OfflineFireRedAsrModelConfig(encoder="", decoder=""), tdnn=OfflineTdnnModelConfig(model=""), zipformer_ctc=OfflineZipformerCtcModelConfig(model=""), wenet_ctc=OfflineWenetCtcModelConfig(model=""), sense_voice=OfflineSenseVoiceModelConfig(model="", language="auto", use_itn=False), moonshine=OfflineMoonshineModelConfig(preprocessor="", encoder="", uncached_decoder="", cached_decoder=""), dolphin=OfflineDolphinModelConfig(model=""), telespeech_ctc="", tokens="./sherpa-onnx-nemo-parakeet-tdt-0.6b-v2-int8/tokens.txt", num_threads=1, debug=False, provider="cpu", model_type="nemo_transducer", modeling_unit="cjkchar", bpe_vocab=""), lm_config=OfflineLMConfig(model="", scale=0.5), ctc_fst_decoder_config=OfflineCtcFstDecoderConfig(graph="", max_active=3000), decoding_method="greedy_search", max_active_paths=4, hotwords_file="", hotwords_score=1.5, blank_penalty=0, rule_fsts="", rule_fars="", hr=HomophoneReplacerConfig(dict_dir="", lexicon="", rule_fsts=""))
Creating recognizer ...
Started
Done!

./sherpa-onnx-nemo-parakeet-tdt-0.6b-v2-int8/test_wavs/0.wav
{"lang": "", "emotion": "", "event": "", "text": " Well, I don't wish to see it any more, observed Phebe, turning away her eyes. It is certainly very like the old portrait.", "timestamps": [0.32, 0.64, 0.72, 0.80, 0.88, 0.96, 1.04, 1.12, 1.28, 1.44, 1.60, 1.76, 1.92, 2.00, 2.24, 2.32, 2.40, 2.48, 2.64, 2.72, 2.88, 3.12, 3.36, 3.44, 3.52, 3.68, 3.76, 3.92, 4.16, 4.24, 4.32, 4.64, 4.96, 5.12, 5.36, 5.44, 5.52, 5.60, 5.76, 6.00, 6.24, 6.40, 6.48, 6.64, 6.72, 6.80, 6.88, 7.04], "tokens":[" Well", ",", " I", " don", "'", "t", " w", "ish", " to", " see", " it", " any", " more", ",", " ob", "s", "er", "ved", " P", "he", "be", ",", " t", "ur", "ning", " a", "way", " her", " e", "y", "es", ".", " It", " is", " c", "ert", "ain", "ly", " very", " like", " the", " o", "ld", " p", "ort", "ra", "it", "."], "words": []}
----
num threads: 1
decoding method: greedy_search
Elapsed seconds: 1.639 s
Real time factor (RTF): 1.639 / 7.435 = 0.220

To test the RTF with different values of --num-threads, we use:

taskset 0xc0 sherpa-onnx-offline \
  --num-threads=2 \
  --encoder=./sherpa-onnx-nemo-parakeet-tdt-0.6b-v2-int8/encoder.int8.onnx \
  --decoder=./sherpa-onnx-nemo-parakeet-tdt-0.6b-v2-int8/decoder.int8.onnx \
  --joiner=./sherpa-onnx-nemo-parakeet-tdt-0.6b-v2-int8/joiner.int8.onnx \
  --tokens=./sherpa-onnx-nemo-parakeet-tdt-0.6b-v2-int8/tokens.txt \
  --model-type=nemo_transducer \
  ./sherpa-onnx-nemo-parakeet-tdt-0.6b-v2-int8/test_wavs/0.wav

taskset 0xe0 sherpa-onnx-offline \
  --num-threads=3 \
  --encoder=./sherpa-onnx-nemo-parakeet-tdt-0.6b-v2-int8/encoder.int8.onnx \
  --decoder=./sherpa-onnx-nemo-parakeet-tdt-0.6b-v2-int8/decoder.int8.onnx \
  --joiner=./sherpa-onnx-nemo-parakeet-tdt-0.6b-v2-int8/joiner.int8.onnx \
  --tokens=./sherpa-onnx-nemo-parakeet-tdt-0.6b-v2-int8/tokens.txt \
  --model-type=nemo_transducer \
  ./sherpa-onnx-nemo-parakeet-tdt-0.6b-v2-int8/test_wavs/0.wav

taskset 0xf0 sherpa-onnx-offline \
  --num-threads=4 \
  --encoder=./sherpa-onnx-nemo-parakeet-tdt-0.6b-v2-int8/encoder.int8.onnx \
  --decoder=./sherpa-onnx-nemo-parakeet-tdt-0.6b-v2-int8/decoder.int8.onnx \
  --joiner=./sherpa-onnx-nemo-parakeet-tdt-0.6b-v2-int8/joiner.int8.onnx \
  --tokens=./sherpa-onnx-nemo-parakeet-tdt-0.6b-v2-int8/tokens.txt \
  --model-type=nemo_transducer \
  ./sherpa-onnx-nemo-parakeet-tdt-0.6b-v2-int8/test_wavs/0.wav

The results are summarized below:

Number of threads        1       2       3       4
RTF on Cortex A76 CPU    0.220   0.142   0.118   0.088
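
Recall that the RTF is the elapsed decoding time divided by the audio duration, so its inverse is the speed-up over real time. A quick sanity check against the 1-thread run above (7.435 s of audio decoded in 1.639 s):

elapsed = 1.639    # seconds of compute, 1 thread (from the log above)
duration = 7.435   # seconds of audio in test_wavs/0.wav
rtf = elapsed / duration
print(f"RTF = {rtf:.3f}, speed-up over real time = {1 / rtf:.1f}x")
# RTF = 0.220, speed-up over real time = 4.5x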

sherpa-onnx-nemo-transducer-giga-am-v2-russian-2025-04-19 (Russian, 俄语)

This model is converted from

You can find the conversion script at

In the following, we describe how to download it and use it with sherpa-onnx.

Download the model

Please use the following commands to download it.

wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/sherpa-onnx-nemo-transducer-giga-am-v2-russian-2025-04-19.tar.bz2
tar xvf sherpa-onnx-nemo-transducer-giga-am-v2-russian-2025-04-19.tar.bz2
rm sherpa-onnx-nemo-transducer-giga-am-v2-russian-2025-04-19.tar.bz2

You should see something like the following after downloading:

ls -lh sherpa-onnx-nemo-transducer-giga-am-v2-russian-2025-04-19

total 231M
-rw-r--r-- 1 501 staff 3.2M Apr 20 01:58 decoder.onnx
-rw-r--r-- 1 501 staff 226M Apr 20 01:59 encoder.int8.onnx
-rw-r--r-- 1 501 staff 1.4M Apr 20 01:58 joiner.onnx
-rw-r--r-- 1 501 staff 219K Apr 20 01:59 LICENSE
-rw-r--r-- 1 501 staff  302 Apr 20 01:59 README.md
-rwxr-xr-x 1 501 staff  868 Apr 20 01:51 run-rnnt-v2.sh
-rwxr-xr-x 1 501 staff 8.9K Apr 20 01:59 test-onnx-rnnt.py
drwxr-xr-x 2 501 staff 4.0K Apr 21 09:35 test_wavs
-rw-r--r-- 1 501 staff  196 Apr 20 01:58 tokens.txt

Decode wave files

Hint

It supports decoding only single-channel wave files with 16-bit encoded samples. The sampling rate does not need to be 16 kHz.

cd /path/to/sherpa-onnx

./build/bin/sherpa-onnx-offline \
  --encoder=./sherpa-onnx-nemo-transducer-giga-am-v2-russian-2025-04-19/encoder.int8.onnx \
  --decoder=./sherpa-onnx-nemo-transducer-giga-am-v2-russian-2025-04-19/decoder.onnx \
  --joiner=./sherpa-onnx-nemo-transducer-giga-am-v2-russian-2025-04-19/joiner.onnx \
  --tokens=./sherpa-onnx-nemo-transducer-giga-am-v2-russian-2025-04-19/tokens.txt \
  --model-type=nemo_transducer \
  ./sherpa-onnx-nemo-transducer-giga-am-v2-russian-2025-04-19/test_wavs/example.wav

Note

Please use ./build/bin/Release/sherpa-onnx-offline.exe for Windows.

Caution

If you use Windows and run into encoding issues, please run:

CHCP 65001

in your command line.

You should see the following output:

/project/sherpa-onnx/csrc/parse-options.cc:Read:375 sherpa-onnx-offline --encoder=./sherpa-onnx-nemo-transducer-giga-am-v2-russian-2025-04-19/encoder.int8.onnx --decoder=./sherpa-onnx-nemo-transducer-giga-am-v2-russian-2025-04-19/decoder.onnx --joiner=./sherpa-onnx-nemo-transducer-giga-am-v2-russian-2025-04-19/joiner.onnx --tokens=./sherpa-onnx-nemo-transducer-giga-am-v2-russian-2025-04-19/tokens.txt --model-type=nemo_transducer ./sherpa-onnx-nemo-transducer-giga-am-v2-russian-2025-04-19/test_wavs/example.wav

OfflineRecognizerConfig(feat_config=FeatureExtractorConfig(sampling_rate=16000, feature_dim=80, low_freq=20, high_freq=-400, dither=0, normalize_samples=True, snip_edges=False), model_config=OfflineModelConfig(transducer=OfflineTransducerModelConfig(encoder_filename="./sherpa-onnx-nemo-transducer-giga-am-v2-russian-2025-04-19/encoder.int8.onnx", decoder_filename="./sherpa-onnx-nemo-transducer-giga-am-v2-russian-2025-04-19/decoder.onnx", joiner_filename="./sherpa-onnx-nemo-transducer-giga-am-v2-russian-2025-04-19/joiner.onnx"), paraformer=OfflineParaformerModelConfig(model=""), nemo_ctc=OfflineNemoEncDecCtcModelConfig(model=""), whisper=OfflineWhisperModelConfig(encoder="", decoder="", language="", task="transcribe", tail_paddings=-1), fire_red_asr=OfflineFireRedAsrModelConfig(encoder="", decoder=""), tdnn=OfflineTdnnModelConfig(model=""), zipformer_ctc=OfflineZipformerCtcModelConfig(model=""), wenet_ctc=OfflineWenetCtcModelConfig(model=""), sense_voice=OfflineSenseVoiceModelConfig(model="", language="auto", use_itn=False), moonshine=OfflineMoonshineModelConfig(preprocessor="", encoder="", uncached_decoder="", cached_decoder=""), dolphin=OfflineDolphinModelConfig(model=""), telespeech_ctc="", tokens="./sherpa-onnx-nemo-transducer-giga-am-v2-russian-2025-04-19/tokens.txt", num_threads=2, debug=False, provider="cpu", model_type="nemo_transducer", modeling_unit="cjkchar", bpe_vocab=""), lm_config=OfflineLMConfig(model="", scale=0.5), ctc_fst_decoder_config=OfflineCtcFstDecoderConfig(graph="", max_active=3000), decoding_method="greedy_search", max_active_paths=4, hotwords_file="", hotwords_score=1.5, blank_penalty=0, rule_fsts="", rule_fars="")
Creating recognizer ...
Started
Done!

./sherpa-onnx-nemo-transducer-giga-am-v2-russian-2025-04-19/test_wavs/example.wav
{"lang": "", "emotion": "", "event": "", "text": "ничьих не требуя похвал счастлив уж я надеждой сладкой что дева с трепетом любви посмотрит может быть украдкой на песни грешные мои у лукоморья дуб зеленый", "timestamps": [0.04, 0.12, 0.16, 0.24, 0.32, 0.40, 0.44, 0.52, 0.56, 0.60, 0.64, 0.72, 0.76, 0.80, 0.88, 0.96, 1.04, 1.12, 1.16, 1.24, 1.32, 1.36, 1.44, 1.56, 1.76, 1.84, 1.88, 1.96, 2.00, 2.04, 2.08, 2.16, 2.24, 2.28, 2.36, 2.40, 2.48, 2.60, 2.68, 2.72, 2.76, 2.84, 2.92, 2.96, 3.04, 3.08, 3.16, 3.20, 3.24, 3.32, 3.36, 3.44, 3.52, 3.56, 3.64, 3.68, 3.72, 3.76, 3.80, 3.88, 3.92, 4.00, 4.08, 4.16, 4.20, 4.24, 4.28, 4.32, 4.36, 4.44, 4.52, 4.56, 4.64, 4.68, 4.76, 4.80, 4.88, 4.92, 5.00, 5.08, 5.16, 5.36, 5.44, 5.52, 5.60, 5.68, 5.72, 5.76, 5.84, 5.92, 6.00, 6.04, 6.12, 6.16, 6.20, 6.24, 6.28, 6.32, 6.40, 6.44, 6.48, 6.52, 6.56, 6.64, 6.72, 6.76, 6.84, 6.92, 7.00, 7.04, 7.12, 7.16, 7.20, 7.24, 7.32, 7.36, 7.40, 7.48, 7.60, 7.64, 7.72, 7.76, 7.84, 7.88, 8.00, 8.08, 8.16, 8.24, 8.28, 8.32, 8.44, 8.76, 9.24, 9.32, 9.40, 9.44, 9.52, 9.60, 9.68, 9.76, 9.84, 9.92, 10.00, 10.08, 10.12, 10.24, 10.32, 10.44, 10.52, 10.56, 10.60, 10.68, 10.72, 10.84, 10.92], "tokens":["н", "и", "ч", "ь", "и", "х", " ", "н", "е", " ", "т", "р", "е", "б", "у", "я", " ", "п", "о", "х", "в", "а", "л", " ", "с", "ч", "а", "с", "т", "л", "и", "в", " ", "у", "ж", " ", "я", " ", "н", "а", "д", "е", "ж", "д", "о", "й", " ", "с", "л", "а", "д", "к", "о", "й", " ", "ч", "т", "о", " ", "д", "е", "в", "а", " ", "с", " ", "т", "р", "е", "п", "е", "т", "о", "м", " ", "л", "ю", "б", "в", "и", " ", "п", "о", "с", "м", "о", "т", "р", "и", "т", " ", "м", "о", "ж", "е", "т", " ", "б", "ы", "т", "ь", " ", "у", "к", "р", "а", "д", "к", "о", "й", " ", "н", "а", " ", "п", "е", "с", "н", "и", " ", "г", "р", "е", "ш", "н", "ы", "е", " ", "м", "о", "и", " ", "у", " ", "л", "у", "к", "о", "м", "о", "р", "ь", "я", " ", "д", "у", "б", " ", "з", "е", "л", "е", "н", "ы", "й"], "words": []}
----
num threads: 2
decoding method: greedy_search
Elapsed seconds: 4.317 s
Real time factor (RTF): 4.317 / 11.290 = 0.382

Speech recognition from a microphone

cd /path/to/sherpa-onnx

./build/bin/sherpa-onnx-microphone-offline \
  --encoder=./sherpa-onnx-nemo-transducer-giga-am-v2-russian-2025-04-19/encoder.int8.onnx \
  --decoder=./sherpa-onnx-nemo-transducer-giga-am-v2-russian-2025-04-19/decoder.onnx \
  --joiner=./sherpa-onnx-nemo-transducer-giga-am-v2-russian-2025-04-19/joiner.onnx \
  --tokens=./sherpa-onnx-nemo-transducer-giga-am-v2-russian-2025-04-19/tokens.txt \
  --model-type=nemo_transducer

Speech recognition from a microphone with VAD

cd /path/to/sherpa-onnx

wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/silero_vad.onnx

./build/bin/sherpa-onnx-vad-microphone-offline-asr \
  --silero-vad-model=./silero_vad.onnx \
  --encoder=./sherpa-onnx-nemo-transducer-giga-am-v2-russian-2025-04-19/encoder.int8.onnx \
  --decoder=./sherpa-onnx-nemo-transducer-giga-am-v2-russian-2025-04-19/decoder.onnx \
  --joiner=./sherpa-onnx-nemo-transducer-giga-am-v2-russian-2025-04-19/joiner.onnx \
  --tokens=./sherpa-onnx-nemo-transducer-giga-am-v2-russian-2025-04-19/tokens.txt \
  --model-type=nemo_transducer
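
The VAD-plus-recognizer combination is also available from Python, which is handy for decoding long recordings file by file. The sketch below is only an outline, assuming pip install sherpa-onnx soundfile numpy and a 16 kHz, single-channel input file; the file name long-recording.wav is a placeholder, and the VAD attribute names follow the sherpa-onnx Python API, so please compare with the python-api-examples directory before relying on it.

import numpy as np
import sherpa_onnx
import soundfile as sf

recognizer = sherpa_onnx.OfflineRecognizer.from_transducer(
    encoder="./sherpa-onnx-nemo-transducer-giga-am-v2-russian-2025-04-19/encoder.int8.onnx",
    decoder="./sherpa-onnx-nemo-transducer-giga-am-v2-russian-2025-04-19/decoder.onnx",
    joiner="./sherpa-onnx-nemo-transducer-giga-am-v2-russian-2025-04-19/joiner.onnx",
    tokens="./sherpa-onnx-nemo-transducer-giga-am-v2-russian-2025-04-19/tokens.txt",
    model_type="nemo_transducer",
)

vad_config = sherpa_onnx.VadModelConfig()
vad_config.silero_vad.model = "./silero_vad.onnx"
vad_config.sample_rate = 16000
vad = sherpa_onnx.VoiceActivityDetector(vad_config, buffer_size_in_seconds=60)

samples, sample_rate = sf.read("long-recording.wav", dtype="float32")  # placeholder path
# Pad with one second of silence so the final speech segment is flushed out.
samples = np.concatenate([samples, np.zeros(sample_rate, dtype=np.float32)])

window = vad_config.silero_vad.window_size
for i in range(len(samples) // window):
    vad.accept_waveform(samples[i * window : (i + 1) * window])
    while not vad.empty():
        stream = recognizer.create_stream()
        stream.accept_waveform(sample_rate, vad.front.samples)
        recognizer.decode_stream(stream)
        print(f"{vad.front.start / sample_rate:.2f}s  {stream.result.text}")
        vad.pop()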

sherpa-onnx-nemo-transducer-giga-am-russian-2024-10-24 (Russian, 俄语)

This model is converted from

You can find the conversion script at

Warning

The license of the model can be found at https://github.com/salute-developers/GigaAM/blob/main/GigaAM%20License_NC.pdf.

It is for non-commercial use only.

In the following, we describe how to download it and use it with sherpa-onnx.

Download the model

Please use the following commands to download it.

wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/sherpa-onnx-nemo-transducer-giga-am-russian-2024-10-24.tar.bz2
tar xvf sherpa-onnx-nemo-transducer-giga-am-russian-2024-10-24.tar.bz2
rm sherpa-onnx-nemo-transducer-giga-am-russian-2024-10-24.tar.bz2

You should see something like the following after downloading:

ls -lh sherpa-onnx-nemo-transducer-giga-am-russian-2024-10-24/
total 548472
-rw-r--r--  1 fangjun  staff    89K Oct 25 13:36 GigaAM%20License_NC.pdf
-rw-r--r--  1 fangjun  staff   318B Oct 25 13:37 README.md
-rw-r--r--  1 fangjun  staff   3.8M Oct 25 13:36 decoder.onnx
-rw-r--r--  1 fangjun  staff   262M Oct 25 13:37 encoder.int8.onnx
-rw-r--r--  1 fangjun  staff   3.8K Oct 25 13:32 export-onnx-rnnt.py
-rw-r--r--  1 fangjun  staff   2.0M Oct 25 13:36 joiner.onnx
-rwxr-xr-x  1 fangjun  staff   2.0K Oct 25 13:32 run-rnnt.sh
-rwxr-xr-x  1 fangjun  staff   8.7K Oct 25 13:32 test-onnx-rnnt.py
drwxr-xr-x  4 fangjun  staff   128B Oct 25 13:37 test_wavs
-rw-r--r--  1 fangjun  staff   5.8K Oct 25 13:36 tokens.txt

Decode wave files

Hint

It supports decoding only single-channel wave files with 16-bit encoded samples. The sampling rate does not need to be 16 kHz.

cd /path/to/sherpa-onnx

./build/bin/sherpa-onnx-offline \
  --encoder=./sherpa-onnx-nemo-transducer-giga-am-russian-2024-10-24/encoder.int8.onnx \
  --decoder=./sherpa-onnx-nemo-transducer-giga-am-russian-2024-10-24/decoder.onnx \
  --joiner=./sherpa-onnx-nemo-transducer-giga-am-russian-2024-10-24/joiner.onnx \
  --tokens=./sherpa-onnx-nemo-transducer-giga-am-russian-2024-10-24/tokens.txt \
  --model-type=nemo_transducer \
  ./sherpa-onnx-nemo-transducer-giga-am-russian-2024-10-24/test_wavs/example.wav

Note

Please use ./build/bin/Release/sherpa-onnx-offline.exe for Windows.

Caution

If you use Windows and run into encoding issues, please run:

CHCP 65001

in your command line.

You should see the following output:

/Users/fangjun/open-source/sherpa-onnx/sherpa-onnx/csrc/parse-options.cc:Read:375 ./build/bin/sherpa-onnx-offline --encoder=./sherpa-onnx-nemo-transducer-giga-am-russian-2024-10-24/encoder.int8.onnx --decoder=./sherpa-onnx-nemo-transducer-giga-am-russian-2024-10-24/decoder.onnx --joiner=./sherpa-onnx-nemo-transducer-giga-am-russian-2024-10-24/joiner.onnx --tokens=./sherpa-onnx-nemo-transducer-giga-am-russian-2024-10-24/tokens.txt --model-type=nemo_transducer ./sherpa-onnx-nemo-transducer-giga-am-russian-2024-10-24/test_wavs/example.wav 

OfflineRecognizerConfig(feat_config=FeatureExtractorConfig(sampling_rate=16000, feature_dim=80, low_freq=20, high_freq=-400, dither=0), model_config=OfflineModelConfig(transducer=OfflineTransducerModelConfig(encoder_filename="./sherpa-onnx-nemo-transducer-giga-am-russian-2024-10-24/encoder.int8.onnx", decoder_filename="./sherpa-onnx-nemo-transducer-giga-am-russian-2024-10-24/decoder.onnx", joiner_filename="./sherpa-onnx-nemo-transducer-giga-am-russian-2024-10-24/joiner.onnx"), paraformer=OfflineParaformerModelConfig(model=""), nemo_ctc=OfflineNemoEncDecCtcModelConfig(model=""), whisper=OfflineWhisperModelConfig(encoder="", decoder="", language="", task="transcribe", tail_paddings=-1), tdnn=OfflineTdnnModelConfig(model=""), zipformer_ctc=OfflineZipformerCtcModelConfig(model=""), wenet_ctc=OfflineWenetCtcModelConfig(model=""), sense_voice=OfflineSenseVoiceModelConfig(model="", language="auto", use_itn=False), telespeech_ctc="", tokens="./sherpa-onnx-nemo-transducer-giga-am-russian-2024-10-24/tokens.txt", num_threads=2, debug=False, provider="cpu", model_type="nemo_transducer", modeling_unit="cjkchar", bpe_vocab=""), lm_config=OfflineLMConfig(model="", scale=0.5), ctc_fst_decoder_config=OfflineCtcFstDecoderConfig(graph="", max_active=3000), decoding_method="greedy_search", max_active_paths=4, hotwords_file="", hotwords_score=1.5, blank_penalty=0, rule_fsts="", rule_fars="")
Creating recognizer ...
Started
Done!

./sherpa-onnx-nemo-transducer-giga-am-russian-2024-10-24/test_wavs/example.wav
{"lang": "", "emotion": "", "event": "", "text": " ничьих не требуя похвал счастлив уж я надеждой сладкой что дева с трепетом любви посмотрит может быть украдкой на песни грешные мои у лукоморья дуб зеленый", "timestamps": [0.04, 0.16, 0.24, 0.28, 0.40, 0.48, 0.60, 0.68, 0.80, 0.92, 1.04, 1.20, 1.28, 1.44, 1.76, 1.88, 2.00, 2.08, 2.16, 2.28, 2.36, 2.44, 2.64, 2.76, 2.92, 3.00, 3.04, 3.16, 3.24, 3.36, 3.48, 3.56, 3.68, 3.88, 4.04, 4.16, 4.24, 4.32, 4.40, 4.56, 4.76, 4.88, 4.92, 5.36, 5.64, 5.84, 5.92, 6.04, 6.32, 6.52, 6.60, 6.72, 6.84, 6.92, 7.04, 7.16, 7.28, 7.36, 7.44, 7.56, 7.68, 7.72, 7.88, 8.00, 8.20, 8.36, 9.28, 9.40, 9.44, 9.52, 9.68, 9.84, 9.88, 9.92, 10.12, 10.32, 10.40, 10.52, 10.56, 10.76, 10.84], "tokens":[" ни", "ч", "ь", "и", "х", " не", " т", "ре", "бу", "я", " по", "х", "ва", "л", " с", "ча", "ст", "ли", "в", " у", "ж", " я", " на", "де", "ж", "до", "й", " с", "ла", "д", "ко", "й", " что", " де", "ва", " с", " т", "ре", "пе", "том", " лю", "б", "ви", " пос", "мот", "ри", "т", " может", " быть", " у", "к", "ра", "д", "ко", "й", " на", " п", "е", "с", "ни", " г", "ре", "ш", "ные", " мо", "и", " у", " ", "лу", "ко", "мо", "р", "ь", "я", " ду", "б", " з", "е", "лен", "ы", "й"], "words": []}
----
num threads: 2
decoding method: greedy_search
Elapsed seconds: 1.775 s
Real time factor (RTF): 1.775 / 11.290 = 0.157

Speech recognition from a microphone

cd /path/to/sherpa-onnx

./build/bin/sherpa-onnx-microphone-offline \
  --encoder=./sherpa-onnx-nemo-transducer-giga-am-russian-2024-10-24/encoder.int8.onnx \
  --decoder=./sherpa-onnx-nemo-transducer-giga-am-russian-2024-10-24/decoder.onnx \
  --joiner=./sherpa-onnx-nemo-transducer-giga-am-russian-2024-10-24/joiner.onnx \
  --tokens=./sherpa-onnx-nemo-transducer-giga-am-russian-2024-10-24/tokens.txt \
  --model-type=nemo_transducer

Speech recognition from a microphone with VAD

cd /path/to/sherpa-onnx

wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/silero_vad.onnx

./build/bin/sherpa-onnx-vad-microphone-offline-asr \
  --silero-vad-model=./silero_vad.onnx \
  --encoder=./sherpa-onnx-nemo-transducer-giga-am-russian-2024-10-24/encoder.int8.onnx \
  --decoder=./sherpa-onnx-nemo-transducer-giga-am-russian-2024-10-24/decoder.onnx \
  --joiner=./sherpa-onnx-nemo-transducer-giga-am-russian-2024-10-24/joiner.onnx \
  --tokens=./sherpa-onnx-nemo-transducer-giga-am-russian-2024-10-24/tokens.txt \
  --model-type=nemo_transducer