NeMo transducer-based Models
Hint
Please refer to Installation to install sherpa-onnx before you read this section.
sherpa-onnx-nemo-parakeet-tdt-0.6b-v2-int8 (English, 英语)
This model is converted from
You can find the conversion script at
In the following, we describe how to download it and use it with sherpa-onnx.
Hint
This model outputs punctuation and capitalization.
Download the model
Please use the following commands to download it.
wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/sherpa-onnx-nemo-parakeet-tdt-0.6b-v2-int8.tar.bz2
tar xvf sherpa-onnx-nemo-parakeet-tdt-0.6b-v2-int8.tar.bz2
rm sherpa-onnx-nemo-parakeet-tdt-0.6b-v2-int8.tar.bz2
Hint
If you want to try the float16 quantized model, please use sherpa-onnx-nemo-parakeet-tdt-0.6b-v2-fp16.tar.bz2.
If you want to try the non-quantized decoder and joiner models, please use sherpa-onnx-nemo-parakeet-tdt-0.6b-v2.tar.bz2.
You should see something like the following after downloading:
ls -lh sherpa-onnx-nemo-parakeet-tdt-0.6b-v2-int8/
total 1295752
-rw-r--r-- 1 fangjun staff 6.9M May 6 16:24 decoder.int8.onnx
-rw-r--r-- 1 fangjun staff 622M May 6 16:24 encoder.int8.onnx
-rw-r--r-- 1 fangjun staff 1.7M May 6 16:24 joiner.int8.onnx
drwxr-xr-x 3 fangjun staff 96B May 6 16:24 test_wavs
-rw-r--r-- 1 fangjun staff 9.2K May 6 16:24 tokens.txt
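As a quick sanity check after extraction, the listing above can be verified programmatically. The following is a minimal sketch, not part of sherpa-onnx: the helper name `missing_model_files` is hypothetical, and the expected file names are simply copied from the listing.

```python
from pathlib import Path

# File names copied from the directory listing above.
EXPECTED_FILES = (
    "encoder.int8.onnx",
    "decoder.int8.onnx",
    "joiner.int8.onnx",
    "tokens.txt",
)


def missing_model_files(model_dir, expected=EXPECTED_FILES):
    """Return the expected file names that are absent from model_dir."""
    root = Path(model_dir)
    return [name for name in expected if not (root / name).is_file()]
```

Calling `missing_model_files("./sherpa-onnx-nemo-parakeet-tdt-0.6b-v2-int8")` returns an empty list when all four files are present.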
Decode wave files
Hint
Only single-channel wave files with 16-bit samples are supported. The sampling rate does not need to be 16 kHz.
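To check whether a file meets this requirement before decoding, its header can be inspected with Python's standard `wave` module. This is a sketch under stated assumptions: the function names `describe_wave` and `is_supported` are hypothetical and not part of sherpa-onnx.

```python
import wave


def describe_wave(path):
    """Return (channels, bits_per_sample, sample_rate) of a PCM wave file."""
    with wave.open(str(path), "rb") as f:
        return f.getnchannels(), f.getsampwidth() * 8, f.getframerate()


def is_supported(path):
    """True if the file is single-channel with 16-bit samples (any sample rate)."""
    channels, bits, _rate = describe_wave(path)
    return channels == 1 and bits == 16
```

The sample rate is returned but deliberately not checked, since it does not need to be 16 kHz.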
cd /path/to/sherpa-onnx
./build/bin/sherpa-onnx-offline \
--encoder=./sherpa-onnx-nemo-parakeet-tdt-0.6b-v2-int8/encoder.int8.onnx \
--decoder=./sherpa-onnx-nemo-parakeet-tdt-0.6b-v2-int8/decoder.int8.onnx \
--joiner=./sherpa-onnx-nemo-parakeet-tdt-0.6b-v2-int8/joiner.int8.onnx \
--tokens=./sherpa-onnx-nemo-parakeet-tdt-0.6b-v2-int8/tokens.txt \
--model-type=nemo_transducer \
./sherpa-onnx-nemo-parakeet-tdt-0.6b-v2-int8/test_wavs/0.wav
Note
Please use ./build/bin/Release/sherpa-onnx-offline.exe on Windows.
You should see the following output:
/Users/fangjun/open-source/sherpa-onnx/sherpa-onnx/csrc/parse-options.cc:Read:372 ./build/bin/sherpa-onnx-offline --encoder=./sherpa-onnx-nemo-parakeet-tdt-0.6b-v2-int8/encoder.int8.onnx --decoder=./sherpa-onnx-nemo-parakeet-tdt-0.6b-v2-int8/decoder.int8.onnx --joiner=./sherpa-onnx-nemo-parakeet-tdt-0.6b-v2-int8/joiner.int8.onnx --tokens=./sherpa-onnx-nemo-parakeet-tdt-0.6b-v2-int8/tokens.txt --model-type=nemo_transducer ./sherpa-onnx-nemo-parakeet-tdt-0.6b-v2-int8/test_wavs/0.wav
OfflineRecognizerConfig(feat_config=FeatureExtractorConfig(sampling_rate=16000, feature_dim=80, low_freq=20, high_freq=-400, dither=0, normalize_samples=True, snip_edges=False), model_config=OfflineModelConfig(transducer=OfflineTransducerModelConfig(encoder_filename="./sherpa-onnx-nemo-parakeet-tdt-0.6b-v2-int8/encoder.int8.onnx", decoder_filename="./sherpa-onnx-nemo-parakeet-tdt-0.6b-v2-int8/decoder.int8.onnx", joiner_filename="./sherpa-onnx-nemo-parakeet-tdt-0.6b-v2-int8/joiner.int8.onnx"), paraformer=OfflineParaformerModelConfig(model=""), nemo_ctc=OfflineNemoEncDecCtcModelConfig(model=""), whisper=OfflineWhisperModelConfig(encoder="", decoder="", language="", task="transcribe", tail_paddings=-1), fire_red_asr=OfflineFireRedAsrModelConfig(encoder="", decoder=""), tdnn=OfflineTdnnModelConfig(model=""), zipformer_ctc=OfflineZipformerCtcModelConfig(model=""), wenet_ctc=OfflineWenetCtcModelConfig(model=""), sense_voice=OfflineSenseVoiceModelConfig(model="", language="auto", use_itn=False), moonshine=OfflineMoonshineModelConfig(preprocessor="", encoder="", uncached_decoder="", cached_decoder=""), dolphin=OfflineDolphinModelConfig(model=""), telespeech_ctc="", tokens="./sherpa-onnx-nemo-parakeet-tdt-0.6b-v2-int8/tokens.txt", num_threads=2, debug=False, provider="cpu", model_type="nemo_transducer", modeling_unit="cjkchar", bpe_vocab=""), lm_config=OfflineLMConfig(model="", scale=0.5), ctc_fst_decoder_config=OfflineCtcFstDecoderConfig(graph="", max_active=3000), decoding_method="greedy_search", max_active_paths=4, hotwords_file="", hotwords_score=1.5, blank_penalty=0, rule_fsts="", rule_fars="", hr=HomophoneReplacerConfig(dict_dir="", lexicon="", rule_fsts=""))
Creating recognizer ...
Started
Done!
./sherpa-onnx-nemo-parakeet-tdt-0.6b-v2-int8/test_wavs/0.wav
{"lang": "", "emotion": "", "event": "", "text": " Well, I don't wish to see it any more, observed Phebe, turning away her eyes. It is certainly very like the old portrait.", "timestamps": [0.32, 0.64, 0.72, 0.80, 0.88, 0.96, 1.04, 1.12, 1.28, 1.44, 1.60, 1.76, 1.92, 2.00, 2.24, 2.32, 2.40, 2.48, 2.64, 2.72, 2.88, 3.12, 3.36, 3.44, 3.52, 3.68, 3.76, 3.92, 4.16, 4.24, 4.32, 4.64, 4.96, 5.12, 5.36, 5.44, 5.52, 5.60, 5.76, 6.00, 6.24, 6.40, 6.48, 6.64, 6.72, 6.80, 6.88, 7.04], "tokens":[" Well", ",", " I", " don", "'", "t", " w", "ish", " to", " see", " it", " any", " more", ",", " ob", "s", "er", "ved", " P", "he", "be", ",", " t", "ur", "ning", " a", "way", " her", " e", "y", "es", ".", " It", " is", " c", "ert", "ain", "ly", " very", " like", " the", " o", "ld", " p", "ort", "ra", "it", "."], "words": []}
----
num threads: 2
decoding method: greedy_search
Elapsed seconds: 0.874 s
Real time factor (RTF): 0.874 / 7.435 = 0.118
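In the JSON result above, the "words" field is empty, while "tokens" and "timestamps" have one entry per sub-word token. Judging from this sample output, a token with a leading space appears to start a new word; under that assumption, word-level start times can be recovered as sketched below. The function name `tokens_to_words` is hypothetical and not part of sherpa-onnx.

```python
import json


def tokens_to_words(result_json):
    """Group sub-word tokens into words; each word keeps its first token's timestamp."""
    result = json.loads(result_json)
    words = []
    for token, start in zip(result["tokens"], result["timestamps"]):
        if token.startswith(" ") or not words:
            # Assumption: a leading space marks the start of a new word.
            words.append([token.strip(), start])
        else:
            # Continuation piece or punctuation: append to the current word.
            words[-1][0] += token
    # Drop entries that ended up empty (e.g. a trailing space token).
    return [(text, start) for text, start in words if text]
```

For the first six tokens of the output above, this yields "Well," at 0.32 s, "I" at 0.72 s, and "don't" at 0.80 s.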
Speech recognition from a microphone
cd /path/to/sherpa-onnx
./build/bin/sherpa-onnx-microphone-offline \
--encoder=./sherpa-onnx-nemo-parakeet-tdt-0.6b-v2-int8/encoder.int8.onnx \
--decoder=./sherpa-onnx-nemo-parakeet-tdt-0.6b-v2-int8/decoder.int8.onnx \
--joiner=./sherpa-onnx-nemo-parakeet-tdt-0.6b-v2-int8/joiner.int8.onnx \
--tokens=./sherpa-onnx-nemo-parakeet-tdt-0.6b-v2-int8/tokens.txt \
--model-type=nemo_transducer
Speech recognition from a microphone with VAD
cd /path/to/sherpa-onnx
wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/silero_vad.onnx
./build/bin/sherpa-onnx-vad-microphone-offline-asr \
--silero-vad-model=./silero_vad.onnx \
--encoder=./sherpa-onnx-nemo-parakeet-tdt-0.6b-v2-int8/encoder.int8.onnx \
--decoder=./sherpa-onnx-nemo-parakeet-tdt-0.6b-v2-int8/decoder.int8.onnx \
--joiner=./sherpa-onnx-nemo-parakeet-tdt-0.6b-v2-int8/joiner.int8.onnx \
--tokens=./sherpa-onnx-nemo-parakeet-tdt-0.6b-v2-int8/tokens.txt \
--model-type=nemo_transducer
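The three commands above share the same model flags and differ only in the binary, the optional VAD model, and the optional wave file. A small helper can build the argument list from those parts. This is an illustrative sketch: `build_command` is a hypothetical name, and it only uses flags that appear in the commands above.

```python
def build_command(binary, model_dir, wav=None, vad_model=None,
                  encoder="encoder.int8.onnx",
                  decoder="decoder.int8.onnx",
                  joiner="joiner.int8.onnx"):
    """Build the argument list for one of the sherpa-onnx commands shown above."""
    root = model_dir.rstrip("/")
    cmd = [binary]
    if vad_model is not None:
        # In the examples, only sherpa-onnx-vad-microphone-offline-asr takes this flag.
        cmd.append(f"--silero-vad-model={vad_model}")
    cmd += [
        f"--encoder={root}/{encoder}",
        f"--decoder={root}/{decoder}",
        f"--joiner={root}/{joiner}",
        f"--tokens={root}/tokens.txt",
        "--model-type=nemo_transducer",
    ]
    if wav is not None:
        # The wave file to decode is passed as a positional argument.
        cmd.append(wav)
    return cmd
```

The returned list can be passed to `subprocess.run(cmd, check=True)`. The `decoder` and `joiner` defaults match this model; other models on this page use different file names.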
RTF on RK3588 with Cortex A76 CPU
In the following, we test this model on RK3588 with Cortex A76 CPU.
Information about the CPUs on the board is given below:
Architecture: aarch64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 8
On-line CPU(s) list: 0-7
Vendor ID: ARM
Model name: Cortex-A55
Model: 0
Thread(s) per core: 1
Core(s) per socket: 4
Socket(s): 1
Stepping: r2p0
CPU max MHz: 1800.0000
CPU min MHz: 408.0000
BogoMIPS: 48.00
Flags: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm lrcpc dcpop asimddp
Model name: Cortex-A76
Model: 0
Thread(s) per core: 1
Core(s) per socket: 4
Socket(s): 1
Stepping: r4p0
CPU max MHz: 2304.0000
CPU min MHz: 408.0000
BogoMIPS: 48.00
Flags: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm lrcpc dcpop asimddp
L1d cache: 384 KiB (8 instances)
L1i cache: 384 KiB (8 instances)
L2 cache: 2.5 MiB (8 instances)
L3 cache: 3 MiB (1 instance)
Vulnerability Itlb multihit: Not affected
Vulnerability L1tf: Not affected
Vulnerability Mds: Not affected
Vulnerability Meltdown: Not affected
Vulnerability Mmio stale data: Not affected
Vulnerability Retbleed: Not affected
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1: Mitigation; __user pointer sanitization
Vulnerability Spectre v2: Vulnerable: Unprivileged eBPF enabled
Vulnerability Srbds: Not affected
Vulnerability Tsx async abort: Not affected
You can see that it has 8 CPUs: 4 Cortex A55 + 4 Cortex A76.
We use taskset below to test the RTF on Cortex A76.
taskset 0x80 sherpa-onnx-offline \
--num-threads=1 \
--encoder=./sherpa-onnx-nemo-parakeet-tdt-0.6b-v2-int8/encoder.int8.onnx \
--decoder=./sherpa-onnx-nemo-parakeet-tdt-0.6b-v2-int8/decoder.int8.onnx \
--joiner=./sherpa-onnx-nemo-parakeet-tdt-0.6b-v2-int8/joiner.int8.onnx \
--tokens=./sherpa-onnx-nemo-parakeet-tdt-0.6b-v2-int8/tokens.txt \
--model-type=nemo_transducer \
./sherpa-onnx-nemo-parakeet-tdt-0.6b-v2-int8/test_wavs/0.wav
Its output is given below:
/project/sherpa-onnx/csrc/parse-options.cc:Read:372 sherpa-onnx-offline --num-threads=1 --encoder=./sherpa-onnx-nemo-parakeet-tdt-0.6b-v2-int8/encoder.int8.onnx --decoder=./sherpa-onnx-nemo-parakeet-tdt-0.6b-v2-int8/decoder.int8.onnx --joiner=./sherpa-onnx-nemo-parakeet-tdt-0.6b-v2-int8/joiner.int8.onnx --tokens=./sherpa-onnx-nemo-parakeet-tdt-0.6b-v2-int8/tokens.txt --model-type=nemo_transducer ./sherpa-onnx-nemo-parakeet-tdt-0.6b-v2-int8/test_wavs/0.wav
OfflineRecognizerConfig(feat_config=FeatureExtractorConfig(sampling_rate=16000, feature_dim=80, low_freq=20, high_freq=-400, dither=0, normalize_samples=True, snip_edges=False), model_config=OfflineModelConfig(transducer=OfflineTransducerModelConfig(encoder_filename="./sherpa-onnx-nemo-parakeet-tdt-0.6b-v2-int8/encoder.int8.onnx", decoder_filename="./sherpa-onnx-nemo-parakeet-tdt-0.6b-v2-int8/decoder.int8.onnx", joiner_filename="./sherpa-onnx-nemo-parakeet-tdt-0.6b-v2-int8/joiner.int8.onnx"), paraformer=OfflineParaformerModelConfig(model=""), nemo_ctc=OfflineNemoEncDecCtcModelConfig(model=""), whisper=OfflineWhisperModelConfig(encoder="", decoder="", language="", task="transcribe", tail_paddings=-1), fire_red_asr=OfflineFireRedAsrModelConfig(encoder="", decoder=""), tdnn=OfflineTdnnModelConfig(model=""), zipformer_ctc=OfflineZipformerCtcModelConfig(model=""), wenet_ctc=OfflineWenetCtcModelConfig(model=""), sense_voice=OfflineSenseVoiceModelConfig(model="", language="auto", use_itn=False), moonshine=OfflineMoonshineModelConfig(preprocessor="", encoder="", uncached_decoder="", cached_decoder=""), dolphin=OfflineDolphinModelConfig(model=""), telespeech_ctc="", tokens="./sherpa-onnx-nemo-parakeet-tdt-0.6b-v2-int8/tokens.txt", num_threads=1, debug=False, provider="cpu", model_type="nemo_transducer", modeling_unit="cjkchar", bpe_vocab=""), lm_config=OfflineLMConfig(model="", scale=0.5), ctc_fst_decoder_config=OfflineCtcFstDecoderConfig(graph="", max_active=3000), decoding_method="greedy_search", max_active_paths=4, hotwords_file="", hotwords_score=1.5, blank_penalty=0, rule_fsts="", rule_fars="", hr=HomophoneReplacerConfig(dict_dir="", lexicon="", rule_fsts=""))
Creating recognizer ...
Started
Done!
./sherpa-onnx-nemo-parakeet-tdt-0.6b-v2-int8/test_wavs/0.wav
{"lang": "", "emotion": "", "event": "", "text": " Well, I don't wish to see it any more, observed Phebe, turning away her eyes. It is certainly very like the old portrait.", "timestamps": [0.32, 0.64, 0.72, 0.80, 0.88, 0.96, 1.04, 1.12, 1.28, 1.44, 1.60, 1.76, 1.92, 2.00, 2.24, 2.32, 2.40, 2.48, 2.64, 2.72, 2.88, 3.12, 3.36, 3.44, 3.52, 3.68, 3.76, 3.92, 4.16, 4.24, 4.32, 4.64, 4.96, 5.12, 5.36, 5.44, 5.52, 5.60, 5.76, 6.00, 6.24, 6.40, 6.48, 6.64, 6.72, 6.80, 6.88, 7.04], "tokens":[" Well", ",", " I", " don", "'", "t", " w", "ish", " to", " see", " it", " any", " more", ",", " ob", "s", "er", "ved", " P", "he", "be", ",", " t", "ur", "ning", " a", "way", " her", " e", "y", "es", ".", " It", " is", " c", "ert", "ain", "ly", " very", " like", " the", " o", "ld", " p", "ort", "ra", "it", "."], "words": []}
----
num threads: 1
decoding method: greedy_search
Elapsed seconds: 1.639 s
Real time factor (RTF): 1.639 / 7.435 = 0.220
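The real time factor in the last log line is the elapsed processing time divided by the audio duration, so a value below 1 means decoding runs faster than real time. The sketch below parses such a line and recomputes the ratio; `parse_rtf` is a hypothetical helper, and the regular expression assumes the exact log format shown above.

```python
import re

# Matches lines such as "Real time factor (RTF): 1.639 / 7.435 = 0.220".
RTF_LINE = re.compile(r"Real time factor \(RTF\): ([\d.]+) / ([\d.]+) = ([\d.]+)")


def parse_rtf(line):
    """Return (elapsed_seconds, audio_seconds, recomputed_rtf) from a log line."""
    m = RTF_LINE.search(line)
    if m is None:
        raise ValueError(f"not an RTF line: {line!r}")
    elapsed, duration = float(m.group(1)), float(m.group(2))
    return elapsed, duration, elapsed / duration
```

For the line above, `round(parse_rtf(line)[2], 3)` gives 0.22, matching the logged value.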
To test the RTF with different values of --num-threads, we use:
taskset 0xc0 sherpa-onnx-offline \
--num-threads=2 \
--encoder=./sherpa-onnx-nemo-parakeet-tdt-0.6b-v2-int8/encoder.int8.onnx \
--decoder=./sherpa-onnx-nemo-parakeet-tdt-0.6b-v2-int8/decoder.int8.onnx \
--joiner=./sherpa-onnx-nemo-parakeet-tdt-0.6b-v2-int8/joiner.int8.onnx \
--tokens=./sherpa-onnx-nemo-parakeet-tdt-0.6b-v2-int8/tokens.txt \
--model-type=nemo_transducer \
./sherpa-onnx-nemo-parakeet-tdt-0.6b-v2-int8/test_wavs/0.wav
taskset 0xe0 sherpa-onnx-offline \
--num-threads=3 \
--encoder=./sherpa-onnx-nemo-parakeet-tdt-0.6b-v2-int8/encoder.int8.onnx \
--decoder=./sherpa-onnx-nemo-parakeet-tdt-0.6b-v2-int8/decoder.int8.onnx \
--joiner=./sherpa-onnx-nemo-parakeet-tdt-0.6b-v2-int8/joiner.int8.onnx \
--tokens=./sherpa-onnx-nemo-parakeet-tdt-0.6b-v2-int8/tokens.txt \
--model-type=nemo_transducer \
./sherpa-onnx-nemo-parakeet-tdt-0.6b-v2-int8/test_wavs/0.wav
taskset 0xf0 sherpa-onnx-offline \
--num-threads=4 \
--encoder=./sherpa-onnx-nemo-parakeet-tdt-0.6b-v2-int8/encoder.int8.onnx \
--decoder=./sherpa-onnx-nemo-parakeet-tdt-0.6b-v2-int8/decoder.int8.onnx \
--joiner=./sherpa-onnx-nemo-parakeet-tdt-0.6b-v2-int8/joiner.int8.onnx \
--tokens=./sherpa-onnx-nemo-parakeet-tdt-0.6b-v2-int8/tokens.txt \
--model-type=nemo_transducer \
./sherpa-onnx-nemo-parakeet-tdt-0.6b-v2-int8/test_wavs/0.wav
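The hexadecimal argument to taskset is a CPU affinity bitmask in which bit i selects CPU i. The masks used above (0x80, 0xc0, 0xe0, 0xf0) therefore select CPUs 7, 6-7, 5-7, and 4-7, which on this board are presumably the Cortex A76 cores, given the purpose of the test. The helper names below are hypothetical and only illustrate the bit arithmetic.

```python
def cpus_in_mask(mask):
    """Return the CPU indices selected by a taskset bitmask (bit i = CPU i)."""
    return [i for i in range(mask.bit_length()) if (mask >> i) & 1]


def mask_for_cpus(cpus):
    """Return the taskset bitmask, as a hex string, that selects the given CPUs."""
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu
    return hex(mask)
```

For example, `cpus_in_mask(0xe0)` returns `[5, 6, 7]`, and `mask_for_cpus([4, 5, 6, 7])` returns `'0xf0'`. Each command above pins the process to as many CPUs as it has threads.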
The results are summarized below:
Number of threads | 1 | 2 | 3 | 4
RTF on Cortex A76 CPU | 0.220 | 0.142 | 0.118 | 0.088
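To see how well decoding scales with the thread count, the RTF values above can be turned into a speedup (single-thread RTF divided by the n-thread RTF) and a parallel efficiency (speedup divided by n). This is a small illustrative calculation on the numbers in the table; the names `RTF_BY_THREADS` and `scaling` are hypothetical.

```python
# RTF values copied from the table above, keyed by number of threads.
RTF_BY_THREADS = {1: 0.220, 2: 0.142, 3: 0.118, 4: 0.088}


def scaling(rtf_by_threads):
    """Return {threads: (speedup, efficiency)} relative to the single-thread RTF."""
    base = rtf_by_threads[1]
    return {
        n: (base / rtf, base / rtf / n)
        for n, rtf in sorted(rtf_by_threads.items())
    }
```

With these numbers, four threads are about 2.5 times faster than one thread, which corresponds to a parallel efficiency of roughly 62%.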
sherpa-onnx-nemo-transducer-giga-am-v2-russian-2025-04-19 (Russian, 俄语)
This model is converted from
You can find the conversion script at
In the following, we describe how to download it and use it with sherpa-onnx.
Download the model
Please use the following commands to download it.
wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/sherpa-onnx-nemo-transducer-giga-am-v2-russian-2025-04-19.tar.bz2
tar xvf sherpa-onnx-nemo-transducer-giga-am-v2-russian-2025-04-19.tar.bz2
rm sherpa-onnx-nemo-transducer-giga-am-v2-russian-2025-04-19.tar.bz2
You should see something like the following after downloading:
ls -lh sherpa-onnx-nemo-transducer-giga-am-v2-russian-2025-04-19
total 231M
-rw-r--r-- 1 501 staff 3.2M Apr 20 01:58 decoder.onnx
-rw-r--r-- 1 501 staff 226M Apr 20 01:59 encoder.int8.onnx
-rw-r--r-- 1 501 staff 1.4M Apr 20 01:58 joiner.onnx
-rw-r--r-- 1 501 staff 219K Apr 20 01:59 LICENSE
-rw-r--r-- 1 501 staff 302 Apr 20 01:59 README.md
-rwxr-xr-x 1 501 staff 868 Apr 20 01:51 run-rnnt-v2.sh
-rwxr-xr-x 1 501 staff 8.9K Apr 20 01:59 test-onnx-rnnt.py
drwxr-xr-x 2 501 staff 4.0K Apr 21 09:35 test_wavs
-rw-r--r-- 1 501 staff 196 Apr 20 01:58 tokens.txt
Decode wave files
Hint
Only single-channel wave files with 16-bit samples are supported. The sampling rate does not need to be 16 kHz.
cd /path/to/sherpa-onnx
./build/bin/sherpa-onnx-offline \
--encoder=./sherpa-onnx-nemo-transducer-giga-am-v2-russian-2025-04-19/encoder.int8.onnx \
--decoder=./sherpa-onnx-nemo-transducer-giga-am-v2-russian-2025-04-19/decoder.onnx \
--joiner=./sherpa-onnx-nemo-transducer-giga-am-v2-russian-2025-04-19/joiner.onnx \
--tokens=./sherpa-onnx-nemo-transducer-giga-am-v2-russian-2025-04-19/tokens.txt \
--model-type=nemo_transducer \
./sherpa-onnx-nemo-transducer-giga-am-v2-russian-2025-04-19/test_wavs/example.wav
Note
Please use ./build/bin/Release/sherpa-onnx-offline.exe on Windows.
Caution
If you use Windows and get encoding issues, please run:
CHCP 65001
in your command line.
You should see the following output:
/project/sherpa-onnx/csrc/parse-options.cc:Read:375 sherpa-onnx-offline --encoder=./sherpa-onnx-nemo-transducer-giga-am-v2-russian-2025-04-19/encoder.int8.onnx --decoder=./sherpa-onnx-nemo-transducer-giga-am-v2-russian-2025-04-19/decoder.onnx --joiner=./sherpa-onnx-nemo-transducer-giga-am-v2-russian-2025-04-19/joiner.onnx --tokens=./sherpa-onnx-nemo-transducer-giga-am-v2-russian-2025-04-19/tokens.txt --model-type=nemo_transducer ./sherpa-onnx-nemo-transducer-giga-am-v2-russian-2025-04-19/test_wavs/example.wav
OfflineRecognizerConfig(feat_config=FeatureExtractorConfig(sampling_rate=16000, feature_dim=80, low_freq=20, high_freq=-400, dither=0, normalize_samples=True, snip_edges=False), model_config=OfflineModelConfig(transducer=OfflineTransducerModelConfig(encoder_filename="./sherpa-onnx-nemo-transducer-giga-am-v2-russian-2025-04-19/encoder.int8.onnx", decoder_filename="./sherpa-onnx-nemo-transducer-giga-am-v2-russian-2025-04-19/decoder.onnx", joiner_filename="./sherpa-onnx-nemo-transducer-giga-am-v2-russian-2025-04-19/joiner.onnx"), paraformer=OfflineParaformerModelConfig(model=""), nemo_ctc=OfflineNemoEncDecCtcModelConfig(model=""), whisper=OfflineWhisperModelConfig(encoder="", decoder="", language="", task="transcribe", tail_paddings=-1), fire_red_asr=OfflineFireRedAsrModelConfig(encoder="", decoder=""), tdnn=OfflineTdnnModelConfig(model=""), zipformer_ctc=OfflineZipformerCtcModelConfig(model=""), wenet_ctc=OfflineWenetCtcModelConfig(model=""), sense_voice=OfflineSenseVoiceModelConfig(model="", language="auto", use_itn=False), moonshine=OfflineMoonshineModelConfig(preprocessor="", encoder="", uncached_decoder="", cached_decoder=""), dolphin=OfflineDolphinModelConfig(model=""), telespeech_ctc="", tokens="./sherpa-onnx-nemo-transducer-giga-am-v2-russian-2025-04-19/tokens.txt", num_threads=2, debug=False, provider="cpu", model_type="nemo_transducer", modeling_unit="cjkchar", bpe_vocab=""), lm_config=OfflineLMConfig(model="", scale=0.5), ctc_fst_decoder_config=OfflineCtcFstDecoderConfig(graph="", max_active=3000), decoding_method="greedy_search", max_active_paths=4, hotwords_file="", hotwords_score=1.5, blank_penalty=0, rule_fsts="", rule_fars="")
Creating recognizer ...
Started
Done!
./sherpa-onnx-nemo-transducer-giga-am-v2-russian-2025-04-19/test_wavs/example.wav
{"lang": "", "emotion": "", "event": "", "text": "ничьих не требуя похвал счастлив уж я надеждой сладкой что дева с трепетом любви посмотрит может быть украдкой на песни грешные мои у лукоморья дуб зеленый", "timestamps": [0.04, 0.12, 0.16, 0.24, 0.32, 0.40, 0.44, 0.52, 0.56, 0.60, 0.64, 0.72, 0.76, 0.80, 0.88, 0.96, 1.04, 1.12, 1.16, 1.24, 1.32, 1.36, 1.44, 1.56, 1.76, 1.84, 1.88, 1.96, 2.00, 2.04, 2.08, 2.16, 2.24, 2.28, 2.36, 2.40, 2.48, 2.60, 2.68, 2.72, 2.76, 2.84, 2.92, 2.96, 3.04, 3.08, 3.16, 3.20, 3.24, 3.32, 3.36, 3.44, 3.52, 3.56, 3.64, 3.68, 3.72, 3.76, 3.80, 3.88, 3.92, 4.00, 4.08, 4.16, 4.20, 4.24, 4.28, 4.32, 4.36, 4.44, 4.52, 4.56, 4.64, 4.68, 4.76, 4.80, 4.88, 4.92, 5.00, 5.08, 5.16, 5.36, 5.44, 5.52, 5.60, 5.68, 5.72, 5.76, 5.84, 5.92, 6.00, 6.04, 6.12, 6.16, 6.20, 6.24, 6.28, 6.32, 6.40, 6.44, 6.48, 6.52, 6.56, 6.64, 6.72, 6.76, 6.84, 6.92, 7.00, 7.04, 7.12, 7.16, 7.20, 7.24, 7.32, 7.36, 7.40, 7.48, 7.60, 7.64, 7.72, 7.76, 7.84, 7.88, 8.00, 8.08, 8.16, 8.24, 8.28, 8.32, 8.44, 8.76, 9.24, 9.32, 9.40, 9.44, 9.52, 9.60, 9.68, 9.76, 9.84, 9.92, 10.00, 10.08, 10.12, 10.24, 10.32, 10.44, 10.52, 10.56, 10.60, 10.68, 10.72, 10.84, 10.92], "tokens":["н", "и", "ч", "ь", "и", "х", " ", "н", "е", " ", "т", "р", "е", "б", "у", "я", " ", "п", "о", "х", "в", "а", "л", " ", "с", "ч", "а", "с", "т", "л", "и", "в", " ", "у", "ж", " ", "я", " ", "н", "а", "д", "е", "ж", "д", "о", "й", " ", "с", "л", "а", "д", "к", "о", "й", " ", "ч", "т", "о", " ", "д", "е", "в", "а", " ", "с", " ", "т", "р", "е", "п", "е", "т", "о", "м", " ", "л", "ю", "б", "в", "и", " ", "п", "о", "с", "м", "о", "т", "р", "и", "т", " ", "м", "о", "ж", "е", "т", " ", "б", "ы", "т", "ь", " ", "у", "к", "р", "а", "д", "к", "о", "й", " ", "н", "а", " ", "п", "е", "с", "н", "и", " ", "г", "р", "е", "ш", "н", "ы", "е", " ", "м", "о", "и", " ", "у", " ", "л", "у", "к", "о", "м", "о", "р", "ь", "я", " ", "д", "у", "б", " ", "з", "е", "л", "е", "н", "ы", "й"], "words": []}
----
num threads: 2
decoding method: greedy_search
Elapsed seconds: 4.317 s
Real time factor (RTF): 4.317 / 11.290 = 0.382
Speech recognition from a microphone
cd /path/to/sherpa-onnx
./build/bin/sherpa-onnx-microphone-offline \
--encoder=./sherpa-onnx-nemo-transducer-giga-am-v2-russian-2025-04-19/encoder.int8.onnx \
--decoder=./sherpa-onnx-nemo-transducer-giga-am-v2-russian-2025-04-19/decoder.onnx \
--joiner=./sherpa-onnx-nemo-transducer-giga-am-v2-russian-2025-04-19/joiner.onnx \
--tokens=./sherpa-onnx-nemo-transducer-giga-am-v2-russian-2025-04-19/tokens.txt \
--model-type=nemo_transducer
Speech recognition from a microphone with VAD
cd /path/to/sherpa-onnx
wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/silero_vad.onnx
./build/bin/sherpa-onnx-vad-microphone-offline-asr \
--silero-vad-model=./silero_vad.onnx \
--encoder=./sherpa-onnx-nemo-transducer-giga-am-v2-russian-2025-04-19/encoder.int8.onnx \
--decoder=./sherpa-onnx-nemo-transducer-giga-am-v2-russian-2025-04-19/decoder.onnx \
--joiner=./sherpa-onnx-nemo-transducer-giga-am-v2-russian-2025-04-19/joiner.onnx \
--tokens=./sherpa-onnx-nemo-transducer-giga-am-v2-russian-2025-04-19/tokens.txt \
--model-type=nemo_transducer
sherpa-onnx-nemo-transducer-giga-am-russian-2024-10-24 (Russian, 俄语)
This model is converted from
You can find the conversion script at
Warning
The license of the model can be found at https://github.com/salute-developers/GigaAM/blob/main/GigaAM%20License_NC.pdf.
It is for non-commercial use only.
In the following, we describe how to download it and use it with sherpa-onnx.
Download the model
Please use the following commands to download it.
wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/sherpa-onnx-nemo-transducer-giga-am-russian-2024-10-24.tar.bz2
tar xvf sherpa-onnx-nemo-transducer-giga-am-russian-2024-10-24.tar.bz2
rm sherpa-onnx-nemo-transducer-giga-am-russian-2024-10-24.tar.bz2
You should see something like the following after downloading:
ls -lh sherpa-onnx-nemo-transducer-giga-am-russian-2024-10-24/
total 548472
-rw-r--r-- 1 fangjun staff 89K Oct 25 13:36 GigaAM%20License_NC.pdf
-rw-r--r-- 1 fangjun staff 318B Oct 25 13:37 README.md
-rw-r--r-- 1 fangjun staff 3.8M Oct 25 13:36 decoder.onnx
-rw-r--r-- 1 fangjun staff 262M Oct 25 13:37 encoder.int8.onnx
-rw-r--r-- 1 fangjun staff 3.8K Oct 25 13:32 export-onnx-rnnt.py
-rw-r--r-- 1 fangjun staff 2.0M Oct 25 13:36 joiner.onnx
-rwxr-xr-x 1 fangjun staff 2.0K Oct 25 13:32 run-rnnt.sh
-rwxr-xr-x 1 fangjun staff 8.7K Oct 25 13:32 test-onnx-rnnt.py
drwxr-xr-x 4 fangjun staff 128B Oct 25 13:37 test_wavs
-rw-r--r-- 1 fangjun staff 5.8K Oct 25 13:36 tokens.txt
Decode wave files
Hint
Only single-channel wave files with 16-bit samples are supported. The sampling rate does not need to be 16 kHz.
cd /path/to/sherpa-onnx
./build/bin/sherpa-onnx-offline \
--encoder=./sherpa-onnx-nemo-transducer-giga-am-russian-2024-10-24/encoder.int8.onnx \
--decoder=./sherpa-onnx-nemo-transducer-giga-am-russian-2024-10-24/decoder.onnx \
--joiner=./sherpa-onnx-nemo-transducer-giga-am-russian-2024-10-24/joiner.onnx \
--tokens=./sherpa-onnx-nemo-transducer-giga-am-russian-2024-10-24/tokens.txt \
--model-type=nemo_transducer \
./sherpa-onnx-nemo-transducer-giga-am-russian-2024-10-24/test_wavs/example.wav
Note
Please use ./build/bin/Release/sherpa-onnx-offline.exe on Windows.
Caution
If you use Windows and get encoding issues, please run:
CHCP 65001
in your command line.
You should see the following output:
/Users/fangjun/open-source/sherpa-onnx/sherpa-onnx/csrc/parse-options.cc:Read:375 ./build/bin/sherpa-onnx-offline --encoder=./sherpa-onnx-nemo-transducer-giga-am-russian-2024-10-24/encoder.int8.onnx --decoder=./sherpa-onnx-nemo-transducer-giga-am-russian-2024-10-24/decoder.onnx --joiner=./sherpa-onnx-nemo-transducer-giga-am-russian-2024-10-24/joiner.onnx --tokens=./sherpa-onnx-nemo-transducer-giga-am-russian-2024-10-24/tokens.txt --model-type=nemo_transducer ./sherpa-onnx-nemo-transducer-giga-am-russian-2024-10-24/test_wavs/example.wav
OfflineRecognizerConfig(feat_config=FeatureExtractorConfig(sampling_rate=16000, feature_dim=80, low_freq=20, high_freq=-400, dither=0), model_config=OfflineModelConfig(transducer=OfflineTransducerModelConfig(encoder_filename="./sherpa-onnx-nemo-transducer-giga-am-russian-2024-10-24/encoder.int8.onnx", decoder_filename="./sherpa-onnx-nemo-transducer-giga-am-russian-2024-10-24/decoder.onnx", joiner_filename="./sherpa-onnx-nemo-transducer-giga-am-russian-2024-10-24/joiner.onnx"), paraformer=OfflineParaformerModelConfig(model=""), nemo_ctc=OfflineNemoEncDecCtcModelConfig(model=""), whisper=OfflineWhisperModelConfig(encoder="", decoder="", language="", task="transcribe", tail_paddings=-1), tdnn=OfflineTdnnModelConfig(model=""), zipformer_ctc=OfflineZipformerCtcModelConfig(model=""), wenet_ctc=OfflineWenetCtcModelConfig(model=""), sense_voice=OfflineSenseVoiceModelConfig(model="", language="auto", use_itn=False), telespeech_ctc="", tokens="./sherpa-onnx-nemo-transducer-giga-am-russian-2024-10-24/tokens.txt", num_threads=2, debug=False, provider="cpu", model_type="nemo_transducer", modeling_unit="cjkchar", bpe_vocab=""), lm_config=OfflineLMConfig(model="", scale=0.5), ctc_fst_decoder_config=OfflineCtcFstDecoderConfig(graph="", max_active=3000), decoding_method="greedy_search", max_active_paths=4, hotwords_file="", hotwords_score=1.5, blank_penalty=0, rule_fsts="", rule_fars="")
Creating recognizer ...
Started
Done!
./sherpa-onnx-nemo-transducer-giga-am-russian-2024-10-24/test_wavs/example.wav
{"lang": "", "emotion": "", "event": "", "text": " ничьих не требуя похвал счастлив уж я надеждой сладкой что дева с трепетом любви посмотрит может быть украдкой на песни грешные мои у лукоморья дуб зеленый", "timestamps": [0.04, 0.16, 0.24, 0.28, 0.40, 0.48, 0.60, 0.68, 0.80, 0.92, 1.04, 1.20, 1.28, 1.44, 1.76, 1.88, 2.00, 2.08, 2.16, 2.28, 2.36, 2.44, 2.64, 2.76, 2.92, 3.00, 3.04, 3.16, 3.24, 3.36, 3.48, 3.56, 3.68, 3.88, 4.04, 4.16, 4.24, 4.32, 4.40, 4.56, 4.76, 4.88, 4.92, 5.36, 5.64, 5.84, 5.92, 6.04, 6.32, 6.52, 6.60, 6.72, 6.84, 6.92, 7.04, 7.16, 7.28, 7.36, 7.44, 7.56, 7.68, 7.72, 7.88, 8.00, 8.20, 8.36, 9.28, 9.40, 9.44, 9.52, 9.68, 9.84, 9.88, 9.92, 10.12, 10.32, 10.40, 10.52, 10.56, 10.76, 10.84], "tokens":[" ни", "ч", "ь", "и", "х", " не", " т", "ре", "бу", "я", " по", "х", "ва", "л", " с", "ча", "ст", "ли", "в", " у", "ж", " я", " на", "де", "ж", "до", "й", " с", "ла", "д", "ко", "й", " что", " де", "ва", " с", " т", "ре", "пе", "том", " лю", "б", "ви", " пос", "мот", "ри", "т", " может", " быть", " у", "к", "ра", "д", "ко", "й", " на", " п", "е", "с", "ни", " г", "ре", "ш", "ные", " мо", "и", " у", " ", "лу", "ко", "мо", "р", "ь", "я", " ду", "б", " з", "е", "лен", "ы", "й"], "words": []}
----
num threads: 2
decoding method: greedy_search
Elapsed seconds: 1.775 s
Real time factor (RTF): 1.775 / 11.290 = 0.157
Speech recognition from a microphone
cd /path/to/sherpa-onnx
./build/bin/sherpa-onnx-microphone-offline \
--encoder=./sherpa-onnx-nemo-transducer-giga-am-russian-2024-10-24/encoder.int8.onnx \
--decoder=./sherpa-onnx-nemo-transducer-giga-am-russian-2024-10-24/decoder.onnx \
--joiner=./sherpa-onnx-nemo-transducer-giga-am-russian-2024-10-24/joiner.onnx \
--tokens=./sherpa-onnx-nemo-transducer-giga-am-russian-2024-10-24/tokens.txt \
--model-type=nemo_transducer
Speech recognition from a microphone with VAD
cd /path/to/sherpa-onnx
wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/silero_vad.onnx
./build/bin/sherpa-onnx-vad-microphone-offline-asr \
--silero-vad-model=./silero_vad.onnx \
--encoder=./sherpa-onnx-nemo-transducer-giga-am-russian-2024-10-24/encoder.int8.onnx \
--decoder=./sherpa-onnx-nemo-transducer-giga-am-russian-2024-10-24/decoder.onnx \
--joiner=./sherpa-onnx-nemo-transducer-giga-am-russian-2024-10-24/joiner.onnx \
--tokens=./sherpa-onnx-nemo-transducer-giga-am-russian-2024-10-24/tokens.txt \
--model-type=nemo_transducer