Non-streaming Canary models
sherpa-onnx-nemo-canary-180m-flash-en-es-de-fr-int8 (English + Spanish + German + French)
This model is converted from https://huggingface.co/nvidia/canary-180m-flash.
As described in its Hugging Face model repo:
It supports automatic speech-to-text recognition (ASR) in 4 languages
(English, German, French, Spanish) and translation from English to
German/French/Spanish and from German/French/Spanish to English with or
without punctuation and capitalization (PnC).
You can find the conversion script at
In the following, we describe how to download it and use it with sherpa-onnx.
Download the model
Please use the following commands to download it.
wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/sherpa-onnx-nemo-canary-180m-flash-en-es-de-fr-int8.tar.bz2
tar xvf sherpa-onnx-nemo-canary-180m-flash-en-es-de-fr-int8.tar.bz2
rm sherpa-onnx-nemo-canary-180m-flash-en-es-de-fr-int8.tar.bz2
Hint
If you want to try the non-quantized model, please use sherpa-onnx-nemo-canary-180m-flash-en-es-de-fr.tar.bz2
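The download steps for the non-quantized model are analogous. A minimal sketch, assuming the archive is published under the same asr-models release tag as the int8 variant:

# download, extract, and remove the non-quantized archive
wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/sherpa-onnx-nemo-canary-180m-flash-en-es-de-fr.tar.bz2
tar xvf sherpa-onnx-nemo-canary-180m-flash-en-es-de-fr.tar.bz2
rm sherpa-onnx-nemo-canary-180m-flash-en-es-de-fr.tar.bz2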
You should see something like the following after downloading:
ls -lh sherpa-onnx-nemo-canary-180m-flash-en-es-de-fr-int8/
total 428208
-rw-r--r-- 1 fangjun staff 71M Jul 7 16:03 decoder.int8.onnx
-rw-r--r-- 1 fangjun staff 127M Jul 7 16:03 encoder.int8.onnx
drwxr-xr-x 4 fangjun staff 128B Jul 7 16:03 test_wavs
-rw-r--r-- 1 fangjun staff 52K Jul 7 16:03 tokens.txt
Decode wave files
Hint
It supports decoding only single-channel wave files with 16-bit encoded samples; the sampling rate does not need to be 16 kHz.
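If your audio is not in this format, you can convert it first. A minimal sketch using ffmpeg (assuming ffmpeg is installed; input.mp3 and output.wav are hypothetical file names):

# convert to a single-channel, 16-bit PCM wave file; the sampling rate is left unchanged
ffmpeg -i input.mp3 -ac 1 -c:a pcm_s16le output.wav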
Input English, output English
We use --canary-src-lang=en to indicate that the input audio contains English speech, and --canary-tgt-lang=en to indicate that the recognition result should be in English.
Hint
If the input audio is English, the output can be English, French, German, or Spanish (see the sketch after this hint).
If the input audio is German, the output can be English or German.
If the input audio is French, the output can be English or French.
If the input audio is Spanish, the output can be English or Spanish.
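For example, since English input can be rendered into any of the four languages, you can loop over the target languages. A minimal shell sketch, using the int8 model and the bundled en.wav described below:

cd /path/to/sherpa-onnx
# decode the same English wave file into each supported target language
for tgt in en de es fr; do
  ./build/bin/sherpa-onnx-offline \
    --canary-encoder=./sherpa-onnx-nemo-canary-180m-flash-en-es-de-fr-int8/encoder.int8.onnx \
    --canary-decoder=./sherpa-onnx-nemo-canary-180m-flash-en-es-de-fr-int8/decoder.int8.onnx \
    --tokens=./sherpa-onnx-nemo-canary-180m-flash-en-es-de-fr-int8/tokens.txt \
    --canary-src-lang=en \
    --canary-tgt-lang=$tgt \
    ./sherpa-onnx-nemo-canary-180m-flash-en-es-de-fr-int8/test_wavs/en.wav
done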
Warning
--canary-src-lang and --canary-tgt-lang both default to en.
cd /path/to/sherpa-onnx
./build/bin/sherpa-onnx-offline \
--canary-encoder=./sherpa-onnx-nemo-canary-180m-flash-en-es-de-fr-int8/encoder.int8.onnx \
--canary-decoder=./sherpa-onnx-nemo-canary-180m-flash-en-es-de-fr-int8/decoder.int8.onnx \
--tokens=./sherpa-onnx-nemo-canary-180m-flash-en-es-de-fr-int8/tokens.txt \
--canary-src-lang=en \
--canary-tgt-lang=en \
./sherpa-onnx-nemo-canary-180m-flash-en-es-de-fr-int8/test_wavs/en.wav
Note
Please use ./build/bin/Release/sherpa-onnx-offline.exe
for Windows.
You should see the following output:
/Users/fangjun/open-source/sherpa-onnx/sherpa-onnx/csrc/parse-options.cc:Read:372 ./build/bin/sherpa-onnx-offline --canary-encoder=./sherpa-onnx-nemo-canary-180m-flash-en-es-de-fr-int8/encoder.int8.onnx --canary-decoder=./sherpa-onnx-nemo-canary-180m-flash-en-es-de-fr-int8/decoder.int8.onnx --tokens=./sherpa-onnx-nemo-canary-180m-flash-en-es-de-fr-int8/tokens.txt --canary-src-lang=en --canary-tgt-lang=en ./sherpa-onnx-nemo-canary-180m-flash-en-es-de-fr-int8/test_wavs/en.wav
OfflineRecognizerConfig(feat_config=FeatureExtractorConfig(sampling_rate=16000, feature_dim=80, low_freq=20, high_freq=-400, dither=0, normalize_samples=True, snip_edges=False), model_config=OfflineModelConfig(transducer=OfflineTransducerModelConfig(encoder_filename="", decoder_filename="", joiner_filename=""), paraformer=OfflineParaformerModelConfig(model=""), nemo_ctc=OfflineNemoEncDecCtcModelConfig(model=""), whisper=OfflineWhisperModelConfig(encoder="", decoder="", language="", task="transcribe", tail_paddings=-1), fire_red_asr=OfflineFireRedAsrModelConfig(encoder="", decoder=""), tdnn=OfflineTdnnModelConfig(model=""), zipformer_ctc=OfflineZipformerCtcModelConfig(model=""), wenet_ctc=OfflineWenetCtcModelConfig(model=""), sense_voice=OfflineSenseVoiceModelConfig(model="", language="auto", use_itn=False), moonshine=OfflineMoonshineModelConfig(preprocessor="", encoder="", uncached_decoder="", cached_decoder=""), dolphin=OfflineDolphinModelConfig(model=""), canary=OfflineCanaryModelConfig(encoder="./sherpa-onnx-nemo-canary-180m-flash-en-es-de-fr-int8/encoder.int8.onnx", decoder="./sherpa-onnx-nemo-canary-180m-flash-en-es-de-fr-int8/decoder.int8.onnx", src_lang="en", tgt_lang="en", use_pnc=True), telespeech_ctc="", tokens="./sherpa-onnx-nemo-canary-180m-flash-en-es-de-fr-int8/tokens.txt", num_threads=2, debug=False, provider="cpu", model_type="", modeling_unit="cjkchar", bpe_vocab=""), lm_config=OfflineLMConfig(model="", scale=0.5), ctc_fst_decoder_config=OfflineCtcFstDecoderConfig(graph="", max_active=3000), decoding_method="greedy_search", max_active_paths=4, hotwords_file="", hotwords_score=1.5, blank_penalty=0, rule_fsts="", rule_fars="", hr=HomophoneReplacerConfig(dict_dir="", lexicon="", rule_fsts=""))
Creating recognizer ...
Started
/Users/fangjun/open-source/sherpa-onnx/sherpa-onnx/csrc/offline-stream.cc:AcceptWaveformImpl:164 Creating a resampler:
in_sample_rate: 24000
output_sample_rate: 16000
Done!
./sherpa-onnx-nemo-canary-180m-flash-en-es-de-fr-int8/test_wavs/en.wav
{"lang": "", "emotion": "", "event": "", "text": " Ask not what your country can do for you. Ask what you can do for your country.", "timestamps": [], "tokens":[" A", "s", "k", " not", " wh", "at", " y", "our", " co", "un", "tr", "y", " can", " do", " for", " you", ".", " A", "s", "k", " wh", "at", " you", " can", " do", " for", " y", "our", " co", "un", "tr", "y", "."], "words": []}
----
num threads: 2
decoding method: greedy_search
Elapsed seconds: 0.601 s
Real time factor (RTF): 0.601 / 3.845 = 0.156
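Since --canary-src-lang and --canary-tgt-lang both default to en, the same English-in, English-out behavior should also be obtained by omitting the two flags. A minimal sketch:

cd /path/to/sherpa-onnx
# relies on the default source/target language (en); otherwise identical to the command above
./build/bin/sherpa-onnx-offline \
  --canary-encoder=./sherpa-onnx-nemo-canary-180m-flash-en-es-de-fr-int8/encoder.int8.onnx \
  --canary-decoder=./sherpa-onnx-nemo-canary-180m-flash-en-es-de-fr-int8/decoder.int8.onnx \
  --tokens=./sherpa-onnx-nemo-canary-180m-flash-en-es-de-fr-int8/tokens.txt \
  ./sherpa-onnx-nemo-canary-180m-flash-en-es-de-fr-int8/test_wavs/en.wav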
Input English, output German
We use --canary-src-lang=en to indicate that the input audio contains English speech, and --canary-tgt-lang=de to indicate that the recognition result should be in German.
cd /path/to/sherpa-onnx
./build/bin/sherpa-onnx-offline \
--canary-encoder=./sherpa-onnx-nemo-canary-180m-flash-en-es-de-fr-int8/encoder.int8.onnx \
--canary-decoder=./sherpa-onnx-nemo-canary-180m-flash-en-es-de-fr-int8/decoder.int8.onnx \
--tokens=./sherpa-onnx-nemo-canary-180m-flash-en-es-de-fr-int8/tokens.txt \
--canary-src-lang=en \
--canary-tgt-lang=de \
./sherpa-onnx-nemo-canary-180m-flash-en-es-de-fr-int8/test_wavs/en.wav
Note
Please use ./build/bin/Release/sherpa-onnx-offline.exe
for Windows.
You should see the following output:
/Users/fangjun/open-source/sherpa-onnx/sherpa-onnx/csrc/parse-options.cc:Read:372 ./build/bin/sherpa-onnx-offline --canary-encoder=./sherpa-onnx-nemo-canary-180m-flash-en-es-de-fr-int8/encoder.int8.onnx --canary-decoder=./sherpa-onnx-nemo-canary-180m-flash-en-es-de-fr-int8/decoder.int8.onnx --tokens=./sherpa-onnx-nemo-canary-180m-flash-en-es-de-fr-int8/tokens.txt --canary-src-lang=en --canary-tgt-lang=de ./sherpa-onnx-nemo-canary-180m-flash-en-es-de-fr-int8/test_wavs/en.wav
OfflineRecognizerConfig(feat_config=FeatureExtractorConfig(sampling_rate=16000, feature_dim=80, low_freq=20, high_freq=-400, dither=0, normalize_samples=True, snip_edges=False), model_config=OfflineModelConfig(transducer=OfflineTransducerModelConfig(encoder_filename="", decoder_filename="", joiner_filename=""), paraformer=OfflineParaformerModelConfig(model=""), nemo_ctc=OfflineNemoEncDecCtcModelConfig(model=""), whisper=OfflineWhisperModelConfig(encoder="", decoder="", language="", task="transcribe", tail_paddings=-1), fire_red_asr=OfflineFireRedAsrModelConfig(encoder="", decoder=""), tdnn=OfflineTdnnModelConfig(model=""), zipformer_ctc=OfflineZipformerCtcModelConfig(model=""), wenet_ctc=OfflineWenetCtcModelConfig(model=""), sense_voice=OfflineSenseVoiceModelConfig(model="", language="auto", use_itn=False), moonshine=OfflineMoonshineModelConfig(preprocessor="", encoder="", uncached_decoder="", cached_decoder=""), dolphin=OfflineDolphinModelConfig(model=""), canary=OfflineCanaryModelConfig(encoder="./sherpa-onnx-nemo-canary-180m-flash-en-es-de-fr-int8/encoder.int8.onnx", decoder="./sherpa-onnx-nemo-canary-180m-flash-en-es-de-fr-int8/decoder.int8.onnx", src_lang="en", tgt_lang="de", use_pnc=True), telespeech_ctc="", tokens="./sherpa-onnx-nemo-canary-180m-flash-en-es-de-fr-int8/tokens.txt", num_threads=2, debug=False, provider="cpu", model_type="", modeling_unit="cjkchar", bpe_vocab=""), lm_config=OfflineLMConfig(model="", scale=0.5), ctc_fst_decoder_config=OfflineCtcFstDecoderConfig(graph="", max_active=3000), decoding_method="greedy_search", max_active_paths=4, hotwords_file="", hotwords_score=1.5, blank_penalty=0, rule_fsts="", rule_fars="", hr=HomophoneReplacerConfig(dict_dir="", lexicon="", rule_fsts=""))
Creating recognizer ...
Started
/Users/fangjun/open-source/sherpa-onnx/sherpa-onnx/csrc/offline-stream.cc:AcceptWaveformImpl:164 Creating a resampler:
in_sample_rate: 24000
output_sample_rate: 16000
Done!
./sherpa-onnx-nemo-canary-180m-flash-en-es-de-fr-int8/test_wavs/en.wav
{"lang": "", "emotion": "", "event": "", "text": " Fragen Sie nicht, was Ihr Land für Sie tun kann. Fragen Sie, was Sie für Ihr Land tun können.", "timestamps": [], "tokens":[" F", "ra", "gen", " Sie", " n", "icht", ",", " was", " I", "hr", " L", "and", " für", " Sie", " t", "un", " k", "ann", ".", " F", "ra", "gen", " Sie", ",", " was", " Sie", " für", " I", "hr", " L", "and", " t", "un", " kön", "nen", "."], "words": []}
----
num threads: 2
decoding method: greedy_search
Elapsed seconds: 0.659 s
Real time factor (RTF): 0.659 / 3.845 = 0.171
Input English, output Spanish
We use --canary-src-lang=en to indicate that the input audio contains English speech, and --canary-tgt-lang=es to indicate that the recognition result should be in Spanish.
cd /path/to/sherpa-onnx
./build/bin/sherpa-onnx-offline \
--canary-encoder=./sherpa-onnx-nemo-canary-180m-flash-en-es-de-fr-int8/encoder.int8.onnx \
--canary-decoder=./sherpa-onnx-nemo-canary-180m-flash-en-es-de-fr-int8/decoder.int8.onnx \
--tokens=./sherpa-onnx-nemo-canary-180m-flash-en-es-de-fr-int8/tokens.txt \
--canary-src-lang=en \
--canary-tgt-lang=es \
./sherpa-onnx-nemo-canary-180m-flash-en-es-de-fr-int8/test_wavs/en.wav
Note
Please use ./build/bin/Release/sherpa-onnx-offline.exe
for Windows.
You should see the following output:
/Users/fangjun/open-source/sherpa-onnx/sherpa-onnx/csrc/parse-options.cc:Read:372 ./build/bin/sherpa-onnx-offline --canary-encoder=./sherpa-onnx-nemo-canary-180m-flash-en-es-de-fr-int8/encoder.int8.onnx --canary-decoder=./sherpa-onnx-nemo-canary-180m-flash-en-es-de-fr-int8/decoder.int8.onnx --tokens=./sherpa-onnx-nemo-canary-180m-flash-en-es-de-fr-int8/tokens.txt --canary-src-lang=en --canary-tgt-lang=es ./sherpa-onnx-nemo-canary-180m-flash-en-es-de-fr-int8/test_wavs/en.wav
OfflineRecognizerConfig(feat_config=FeatureExtractorConfig(sampling_rate=16000, feature_dim=80, low_freq=20, high_freq=-400, dither=0, normalize_samples=True, snip_edges=False), model_config=OfflineModelConfig(transducer=OfflineTransducerModelConfig(encoder_filename="", decoder_filename="", joiner_filename=""), paraformer=OfflineParaformerModelConfig(model=""), nemo_ctc=OfflineNemoEncDecCtcModelConfig(model=""), whisper=OfflineWhisperModelConfig(encoder="", decoder="", language="", task="transcribe", tail_paddings=-1), fire_red_asr=OfflineFireRedAsrModelConfig(encoder="", decoder=""), tdnn=OfflineTdnnModelConfig(model=""), zipformer_ctc=OfflineZipformerCtcModelConfig(model=""), wenet_ctc=OfflineWenetCtcModelConfig(model=""), sense_voice=OfflineSenseVoiceModelConfig(model="", language="auto", use_itn=False), moonshine=OfflineMoonshineModelConfig(preprocessor="", encoder="", uncached_decoder="", cached_decoder=""), dolphin=OfflineDolphinModelConfig(model=""), canary=OfflineCanaryModelConfig(encoder="./sherpa-onnx-nemo-canary-180m-flash-en-es-de-fr-int8/encoder.int8.onnx", decoder="./sherpa-onnx-nemo-canary-180m-flash-en-es-de-fr-int8/decoder.int8.onnx", src_lang="en", tgt_lang="es", use_pnc=True), telespeech_ctc="", tokens="./sherpa-onnx-nemo-canary-180m-flash-en-es-de-fr-int8/tokens.txt", num_threads=2, debug=False, provider="cpu", model_type="", modeling_unit="cjkchar", bpe_vocab=""), lm_config=OfflineLMConfig(model="", scale=0.5), ctc_fst_decoder_config=OfflineCtcFstDecoderConfig(graph="", max_active=3000), decoding_method="greedy_search", max_active_paths=4, hotwords_file="", hotwords_score=1.5, blank_penalty=0, rule_fsts="", rule_fars="", hr=HomophoneReplacerConfig(dict_dir="", lexicon="", rule_fsts=""))
Creating recognizer ...
Started
/Users/fangjun/open-source/sherpa-onnx/sherpa-onnx/csrc/offline-stream.cc:AcceptWaveformImpl:164 Creating a resampler:
in_sample_rate: 24000
output_sample_rate: 16000
Done!
./sherpa-onnx-nemo-canary-180m-flash-en-es-de-fr-int8/test_wavs/en.wav
{"lang": "", "emotion": "", "event": "", "text": " No preguntes qué puede hacer tu país por ti. Pregúntale qué puedes hacer por tu país.", "timestamps": [], "tokens":[" No", " pre", "g", "unt", "es", " qué", " pue", "de", " ha", "cer", " tu", " pa", "ís", " por", " ti", ".", " P", "re", "g", "ún", "t", "al", "e", " qué", " pue", "d", "es", " ha", "cer", " por", " tu", " pa", "ís", "."], "words": []}
----
num threads: 2
decoding method: greedy_search
Elapsed seconds: 0.602 s
Real time factor (RTF): 0.602 / 3.845 = 0.157
Input English, output French
We use --canary-src-lang=en to indicate that the input audio contains English speech, and --canary-tgt-lang=fr to indicate that the recognition result should be in French.
cd /path/to/sherpa-onnx
./build/bin/sherpa-onnx-offline \
--canary-encoder=./sherpa-onnx-nemo-canary-180m-flash-en-es-de-fr-int8/encoder.int8.onnx \
--canary-decoder=./sherpa-onnx-nemo-canary-180m-flash-en-es-de-fr-int8/decoder.int8.onnx \
--tokens=./sherpa-onnx-nemo-canary-180m-flash-en-es-de-fr-int8/tokens.txt \
--canary-src-lang=en \
--canary-tgt-lang=fr \
./sherpa-onnx-nemo-canary-180m-flash-en-es-de-fr-int8/test_wavs/en.wav
Note
Please use ./build/bin/Release/sherpa-onnx-offline.exe
for Windows.
You should see the following output:
/Users/fangjun/open-source/sherpa-onnx/sherpa-onnx/csrc/parse-options.cc:Read:372 ./build/bin/sherpa-onnx-offline --canary-encoder=./sherpa-onnx-nemo-canary-180m-flash-en-es-de-fr-int8/encoder.int8.onnx --canary-decoder=./sherpa-onnx-nemo-canary-180m-flash-en-es-de-fr-int8/decoder.int8.onnx --tokens=./sherpa-onnx-nemo-canary-180m-flash-en-es-de-fr-int8/tokens.txt --canary-src-lang=en --canary-tgt-lang=fr ./sherpa-onnx-nemo-canary-180m-flash-en-es-de-fr-int8/test_wavs/en.wav
OfflineRecognizerConfig(feat_config=FeatureExtractorConfig(sampling_rate=16000, feature_dim=80, low_freq=20, high_freq=-400, dither=0, normalize_samples=True, snip_edges=False), model_config=OfflineModelConfig(transducer=OfflineTransducerModelConfig(encoder_filename="", decoder_filename="", joiner_filename=""), paraformer=OfflineParaformerModelConfig(model=""), nemo_ctc=OfflineNemoEncDecCtcModelConfig(model=""), whisper=OfflineWhisperModelConfig(encoder="", decoder="", language="", task="transcribe", tail_paddings=-1), fire_red_asr=OfflineFireRedAsrModelConfig(encoder="", decoder=""), tdnn=OfflineTdnnModelConfig(model=""), zipformer_ctc=OfflineZipformerCtcModelConfig(model=""), wenet_ctc=OfflineWenetCtcModelConfig(model=""), sense_voice=OfflineSenseVoiceModelConfig(model="", language="auto", use_itn=False), moonshine=OfflineMoonshineModelConfig(preprocessor="", encoder="", uncached_decoder="", cached_decoder=""), dolphin=OfflineDolphinModelConfig(model=""), canary=OfflineCanaryModelConfig(encoder="./sherpa-onnx-nemo-canary-180m-flash-en-es-de-fr-int8/encoder.int8.onnx", decoder="./sherpa-onnx-nemo-canary-180m-flash-en-es-de-fr-int8/decoder.int8.onnx", src_lang="en", tgt_lang="fr", use_pnc=True), telespeech_ctc="", tokens="./sherpa-onnx-nemo-canary-180m-flash-en-es-de-fr-int8/tokens.txt", num_threads=2, debug=False, provider="cpu", model_type="", modeling_unit="cjkchar", bpe_vocab=""), lm_config=OfflineLMConfig(model="", scale=0.5), ctc_fst_decoder_config=OfflineCtcFstDecoderConfig(graph="", max_active=3000), decoding_method="greedy_search", max_active_paths=4, hotwords_file="", hotwords_score=1.5, blank_penalty=0, rule_fsts="", rule_fars="", hr=HomophoneReplacerConfig(dict_dir="", lexicon="", rule_fsts=""))
Creating recognizer ...
Started
/Users/fangjun/open-source/sherpa-onnx/sherpa-onnx/csrc/offline-stream.cc:AcceptWaveformImpl:164 Creating a resampler:
in_sample_rate: 24000
output_sample_rate: 16000
Done!
./sherpa-onnx-nemo-canary-180m-flash-en-es-de-fr-int8/test_wavs/en.wav
{"lang": "", "emotion": "", "event": "", "text": " Demandez-vous ce que votre pays peut faire pour vous. Demandez ce que vous pouvez faire pour votre pays.", "timestamps": [], "tokens":[" D", "em", "and", "ez", "-", "v", "ous", " ce", " que", " vot", "re", " p", "ays", " pe", "ut", " f", "aire", " p", "our", " v", "ous", ".", " D", "em", "and", "ez", " ce", " que", " v", "ous", " p", "ouve", "z", " f", "aire", " p", "our", " vot", "re", " p", "ays", "."], "words": []}
----
num threads: 2
decoding method: greedy_search
Elapsed seconds: 0.692 s
Real time factor (RTF): 0.692 / 3.845 = 0.180
Input German, output English
We use --canary-src-lang=de to indicate that the input audio contains German speech, and --canary-tgt-lang=en to indicate that the recognition result should be in English.
cd /path/to/sherpa-onnx
./build/bin/sherpa-onnx-offline \
--canary-encoder=./sherpa-onnx-nemo-canary-180m-flash-en-es-de-fr-int8/encoder.int8.onnx \
--canary-decoder=./sherpa-onnx-nemo-canary-180m-flash-en-es-de-fr-int8/decoder.int8.onnx \
--tokens=./sherpa-onnx-nemo-canary-180m-flash-en-es-de-fr-int8/tokens.txt \
--canary-src-lang=de \
--canary-tgt-lang=en \
./sherpa-onnx-nemo-canary-180m-flash-en-es-de-fr-int8/test_wavs/de.wav
Note
Please use ./build/bin/Release/sherpa-onnx-offline.exe
for Windows.
You should see the following output:
/Users/fangjun/open-source/sherpa-onnx/sherpa-onnx/csrc/parse-options.cc:Read:372 ./build/bin/sherpa-onnx-offline --canary-encoder=./sherpa-onnx-nemo-canary-180m-flash-en-es-de-fr-int8/encoder.int8.onnx --canary-decoder=./sherpa-onnx-nemo-canary-180m-flash-en-es-de-fr-int8/decoder.int8.onnx --tokens=./sherpa-onnx-nemo-canary-180m-flash-en-es-de-fr-int8/tokens.txt --canary-src-lang=de --canary-tgt-lang=en ./sherpa-onnx-nemo-canary-180m-flash-en-es-de-fr-int8/test_wavs/de.wav
OfflineRecognizerConfig(feat_config=FeatureExtractorConfig(sampling_rate=16000, feature_dim=80, low_freq=20, high_freq=-400, dither=0, normalize_samples=True, snip_edges=False), model_config=OfflineModelConfig(transducer=OfflineTransducerModelConfig(encoder_filename="", decoder_filename="", joiner_filename=""), paraformer=OfflineParaformerModelConfig(model=""), nemo_ctc=OfflineNemoEncDecCtcModelConfig(model=""), whisper=OfflineWhisperModelConfig(encoder="", decoder="", language="", task="transcribe", tail_paddings=-1), fire_red_asr=OfflineFireRedAsrModelConfig(encoder="", decoder=""), tdnn=OfflineTdnnModelConfig(model=""), zipformer_ctc=OfflineZipformerCtcModelConfig(model=""), wenet_ctc=OfflineWenetCtcModelConfig(model=""), sense_voice=OfflineSenseVoiceModelConfig(model="", language="auto", use_itn=False), moonshine=OfflineMoonshineModelConfig(preprocessor="", encoder="", uncached_decoder="", cached_decoder=""), dolphin=OfflineDolphinModelConfig(model=""), canary=OfflineCanaryModelConfig(encoder="./sherpa-onnx-nemo-canary-180m-flash-en-es-de-fr-int8/encoder.int8.onnx", decoder="./sherpa-onnx-nemo-canary-180m-flash-en-es-de-fr-int8/decoder.int8.onnx", src_lang="de", tgt_lang="en", use_pnc=True), telespeech_ctc="", tokens="./sherpa-onnx-nemo-canary-180m-flash-en-es-de-fr-int8/tokens.txt", num_threads=2, debug=False, provider="cpu", model_type="", modeling_unit="cjkchar", bpe_vocab=""), lm_config=OfflineLMConfig(model="", scale=0.5), ctc_fst_decoder_config=OfflineCtcFstDecoderConfig(graph="", max_active=3000), decoding_method="greedy_search", max_active_paths=4, hotwords_file="", hotwords_score=1.5, blank_penalty=0, rule_fsts="", rule_fars="", hr=HomophoneReplacerConfig(dict_dir="", lexicon="", rule_fsts=""))
Creating recognizer ...
Started
/Users/fangjun/open-source/sherpa-onnx/sherpa-onnx/csrc/offline-stream.cc:AcceptWaveformImpl:164 Creating a resampler:
in_sample_rate: 22050
output_sample_rate: 16000
Done!
./sherpa-onnx-nemo-canary-180m-flash-en-es-de-fr-int8/test_wavs/de.wav
{"lang": "", "emotion": "", "event": "", "text": " Everything has an end. Only the sausage has two", "timestamps": [], "tokens":[" E", "ver", "y", "th", "ing", " has", " an", " en", "d", ".", " O", "n", "ly", " the", " sa", "us", "age", " has", " two"], "words": []}
----
num threads: 2
decoding method: greedy_search
Elapsed seconds: 0.351 s
Real time factor (RTF): 0.351 / 2.752 = 0.128
Input German, output German
We use --canary-src-lang=de to indicate that the input audio contains German speech, and --canary-tgt-lang=de to indicate that the recognition result should be in German.
cd /path/to/sherpa-onnx
./build/bin/sherpa-onnx-offline \
--canary-encoder=./sherpa-onnx-nemo-canary-180m-flash-en-es-de-fr-int8/encoder.int8.onnx \
--canary-decoder=./sherpa-onnx-nemo-canary-180m-flash-en-es-de-fr-int8/decoder.int8.onnx \
--tokens=./sherpa-onnx-nemo-canary-180m-flash-en-es-de-fr-int8/tokens.txt \
--canary-src-lang=de \
--canary-tgt-lang=de \
./sherpa-onnx-nemo-canary-180m-flash-en-es-de-fr-int8/test_wavs/de.wav
Note
Please use ./build/bin/Release/sherpa-onnx-offline.exe
for Windows.
You should see the following output:
/Users/fangjun/open-source/sherpa-onnx/sherpa-onnx/csrc/parse-options.cc:Read:372 ./build/bin/sherpa-onnx-offline --canary-encoder=./sherpa-onnx-nemo-canary-180m-flash-en-es-de-fr-int8/encoder.int8.onnx --canary-decoder=./sherpa-onnx-nemo-canary-180m-flash-en-es-de-fr-int8/decoder.int8.onnx --tokens=./sherpa-onnx-nemo-canary-180m-flash-en-es-de-fr-int8/tokens.txt --canary-src-lang=de --canary-tgt-lang=de ./sherpa-onnx-nemo-canary-180m-flash-en-es-de-fr-int8/test_wavs/de.wav
OfflineRecognizerConfig(feat_config=FeatureExtractorConfig(sampling_rate=16000, feature_dim=80, low_freq=20, high_freq=-400, dither=0, normalize_samples=True, snip_edges=False), model_config=OfflineModelConfig(transducer=OfflineTransducerModelConfig(encoder_filename="", decoder_filename="", joiner_filename=""), paraformer=OfflineParaformerModelConfig(model=""), nemo_ctc=OfflineNemoEncDecCtcModelConfig(model=""), whisper=OfflineWhisperModelConfig(encoder="", decoder="", language="", task="transcribe", tail_paddings=-1), fire_red_asr=OfflineFireRedAsrModelConfig(encoder="", decoder=""), tdnn=OfflineTdnnModelConfig(model=""), zipformer_ctc=OfflineZipformerCtcModelConfig(model=""), wenet_ctc=OfflineWenetCtcModelConfig(model=""), sense_voice=OfflineSenseVoiceModelConfig(model="", language="auto", use_itn=False), moonshine=OfflineMoonshineModelConfig(preprocessor="", encoder="", uncached_decoder="", cached_decoder=""), dolphin=OfflineDolphinModelConfig(model=""), canary=OfflineCanaryModelConfig(encoder="./sherpa-onnx-nemo-canary-180m-flash-en-es-de-fr-int8/encoder.int8.onnx", decoder="./sherpa-onnx-nemo-canary-180m-flash-en-es-de-fr-int8/decoder.int8.onnx", src_lang="de", tgt_lang="de", use_pnc=True), telespeech_ctc="", tokens="./sherpa-onnx-nemo-canary-180m-flash-en-es-de-fr-int8/tokens.txt", num_threads=2, debug=False, provider="cpu", model_type="", modeling_unit="cjkchar", bpe_vocab=""), lm_config=OfflineLMConfig(model="", scale=0.5), ctc_fst_decoder_config=OfflineCtcFstDecoderConfig(graph="", max_active=3000), decoding_method="greedy_search", max_active_paths=4, hotwords_file="", hotwords_score=1.5, blank_penalty=0, rule_fsts="", rule_fars="", hr=HomophoneReplacerConfig(dict_dir="", lexicon="", rule_fsts=""))
Creating recognizer ...
Started
/Users/fangjun/open-source/sherpa-onnx/sherpa-onnx/csrc/offline-stream.cc:AcceptWaveformImpl:164 Creating a resampler:
in_sample_rate: 22050
output_sample_rate: 16000
Done!
./sherpa-onnx-nemo-canary-180m-flash-en-es-de-fr-int8/test_wavs/de.wav
{"lang": "", "emotion": "", "event": "", "text": " Alles hat ein Ende, nur die Wurst hat zwei.", "timestamps": [], "tokens":[" All", "es", " hat", " ein", " E", "nd", "e", ",", " nur", " die", " W", "ur", "st", " hat", " zwe", "i", "."], "words": []}
----
num threads: 2
decoding method: greedy_search
Elapsed seconds: 0.338 s
Real time factor (RTF): 0.338 / 2.752 = 0.123
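To decode both bundled test files in one go, each transcribed in its own source language, you can wrap the command in a small loop. A minimal sketch, using only the file names and language codes shown above:

cd /path/to/sherpa-onnx
model_dir=./sherpa-onnx-nemo-canary-180m-flash-en-es-de-fr-int8
# en.wav contains English speech and de.wav contains German speech; transcribe each in its own language
for lang in en de; do
  ./build/bin/sherpa-onnx-offline \
    --canary-encoder=$model_dir/encoder.int8.onnx \
    --canary-decoder=$model_dir/decoder.int8.onnx \
    --tokens=$model_dir/tokens.txt \
    --canary-src-lang=$lang \
    --canary-tgt-lang=$lang \
    $model_dir/test_wavs/$lang.wav
done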