Non-streaming Canary models
sherpa-onnx-nemo-canary-180m-flash-en-es-de-fr-int8 (English + Spanish + German + French)
This model is converted from https://huggingface.co/nvidia/canary-180m-flash.
As described in its Hugging Face model repo:
It supports automatic speech-to-text recognition (ASR) in 4 languages
(English, German, French, Spanish) and translation from English to
German/French/Spanish and from German/French/Spanish to English with or
without punctuation and capitalization (PnC).
You can find the conversion script at
In the following, we describe how to download it and use it with sherpa-onnx.
Download the model
Please use the following commands to download it.
wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/sherpa-onnx-nemo-canary-180m-flash-en-es-de-fr-int8.tar.bz2
tar xvf sherpa-onnx-nemo-canary-180m-flash-en-es-de-fr-int8.tar.bz2
rm sherpa-onnx-nemo-canary-180m-flash-en-es-de-fr-int8.tar.bz2
Hint
If you want to try the non-quantized model, please use sherpa-onnx-nemo-canary-180m-flash-en-es-de-fr.tar.bz2
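The download steps for the non-quantized model are analogous. A minimal sketch, assuming the archive is published under the same asr-models release tag as the int8 variant:

# download, extract, and remove the non-quantized archive
wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/sherpa-onnx-nemo-canary-180m-flash-en-es-de-fr.tar.bz2
tar xvf sherpa-onnx-nemo-canary-180m-flash-en-es-de-fr.tar.bz2
rm sherpa-onnx-nemo-canary-180m-flash-en-es-de-fr.tar.bz2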
You should see something like the following after downloading:
ls -lh sherpa-onnx-nemo-canary-180m-flash-en-es-de-fr-int8/
total 428208
-rw-r--r-- 1 fangjun staff 71M Jul 7 16:03 decoder.int8.onnx
-rw-r--r-- 1 fangjun staff 127M Jul 7 16:03 encoder.int8.onnx
drwxr-xr-x 4 fangjun staff 128B Jul 7 16:03 test_wavs
-rw-r--r-- 1 fangjun staff 52K Jul 7 16:03 tokens.txt
Decode wave files
Hint
It supports decoding only single-channel wave files with 16-bit encoded samples; the sampling rate does not need to be 16 kHz.
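If your audio is not in this format, you can convert it first. A minimal sketch using ffmpeg (assuming ffmpeg is installed; input.mp3 and output.wav are hypothetical file names):

# convert to a single-channel, 16-bit PCM wave file; the sampling rate is left unchanged
ffmpeg -i input.mp3 -ac 1 -c:a pcm_s16le output.wav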
Input English, output English
We use --canary-src-lang=en to indicate that the input audio contains English speech, and --canary-tgt-lang=en to indicate that the recognition result should be in English.
Hint
If the input audio is English, the output can be English, French, German, or Spanish (see the sketch after this hint).
If the input audio is German, the output can be English or German.
If the input audio is French, the output can be English or French.
If the input audio is Spanish, the output can be English or Spanish.
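For example, since English input can be rendered into any of the four languages, you can loop over the target languages. A minimal shell sketch, using the int8 model and the bundled en.wav described below:

cd /path/to/sherpa-onnx
# decode the same English wave file into each supported target language
for tgt in en de es fr; do
  ./build/bin/sherpa-onnx-offline \
    --canary-encoder=./sherpa-onnx-nemo-canary-180m-flash-en-es-de-fr-int8/encoder.int8.onnx \
    --canary-decoder=./sherpa-onnx-nemo-canary-180m-flash-en-es-de-fr-int8/decoder.int8.onnx \
    --tokens=./sherpa-onnx-nemo-canary-180m-flash-en-es-de-fr-int8/tokens.txt \
    --canary-src-lang=en \
    --canary-tgt-lang=$tgt \
    ./sherpa-onnx-nemo-canary-180m-flash-en-es-de-fr-int8/test_wavs/en.wav
done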
Warning
--canary-src-lang and --canary-tgt-lang both default to en.
cd /path/to/sherpa-onnx
./build/bin/sherpa-onnx-offline \
--canary-encoder=./sherpa-onnx-nemo-canary-180m-flash-en-es-de-fr-int8/encoder.int8.onnx \
--canary-decoder=./sherpa-onnx-nemo-canary-180m-flash-en-es-de-fr-int8/decoder.int8.onnx \
--tokens=./sherpa-onnx-nemo-canary-180m-flash-en-es-de-fr-int8/tokens.txt \
--canary-src-lang=en \
--canary-tgt-lang=en \
./sherpa-onnx-nemo-canary-180m-flash-en-es-de-fr-int8/test_wavs/en.wav
Note
Please use ./build/bin/Release/sherpa-onnx-offline.exe
for Windows.
You should see the following output:
/Users/fangjun/open-source/sherpa-onnx/sherpa-onnx/csrc/parse-options.cc:Read:372 ./build/bin/sherpa-onnx-offline --canary-encoder=./sherpa-onnx-nemo-canary-180m-flash-en-es-de-fr-int8/encoder.int8.onnx --canary-decoder=./sherpa-onnx-nemo-canary-180m-flash-en-es-de-fr-int8/decoder.int8.onnx --tokens=./sherpa-onnx-nemo-canary-180m-flash-en-es-de-fr-int8/tokens.txt --canary-src-lang=en --canary-tgt-lang=en ./sherpa-onnx-nemo-canary-180m-flash-en-es-de-fr-int8/test_wavs/en.wav
OfflineRecognizerConfig(feat_config=FeatureExtractorConfig(sampling_rate=16000, feature_dim=80, low_freq=20, high_freq=-400, dither=0, normalize_samples=True, snip_edges=False), model_config=OfflineModelConfig(transducer=OfflineTransducerModelConfig(encoder_filename="", decoder_filename="", joiner_filename=""), paraformer=OfflineParaformerModelConfig(model=""), nemo_ctc=OfflineNemoEncDecCtcModelConfig(model=""), whisper=OfflineWhisperModelConfig(encoder="", decoder="", language="", task="transcribe", tail_paddings=-1), fire_red_asr=OfflineFireRedAsrModelConfig(encoder="", decoder=""), tdnn=OfflineTdnnModelConfig(model=""), zipformer_ctc=OfflineZipformerCtcModelConfig(model=""), wenet_ctc=OfflineWenetCtcModelConfig(model=""), sense_voice=OfflineSenseVoiceModelConfig(model="", language="auto", use_itn=False), moonshine=OfflineMoonshineModelConfig(preprocessor="", encoder="", uncached_decoder="", cached_decoder=""), dolphin=OfflineDolphinModelConfig(model=""), canary=OfflineCanaryModelConfig(encoder="./sherpa-onnx-nemo-canary-180m-flash-en-es-de-fr-int8/encoder.int8.onnx", decoder="./sherpa-onnx-nemo-canary-180m-flash-en-es-de-fr-int8/decoder.int8.onnx", src_lang="en", tgt_lang="en", use_pnc=True), telespeech_ctc="", tokens="./sherpa-onnx-nemo-canary-180m-flash-en-es-de-fr-int8/tokens.txt", num_threads=2, debug=False, provider="cpu", model_type="", modeling_unit="cjkchar", bpe_vocab=""), lm_config=OfflineLMConfig(model="", scale=0.5), ctc_fst_decoder_config=OfflineCtcFstDecoderConfig(graph="", max_active=3000), decoding_method="greedy_search", max_active_paths=4, hotwords_file="", hotwords_score=1.5, blank_penalty=0, rule_fsts="", rule_fars="", hr=HomophoneReplacerConfig(dict_dir="", lexicon="", rule_fsts=""))
Creating recognizer ...
Started
/Users/fangjun/open-source/sherpa-onnx/sherpa-onnx/csrc/offline-stream.cc:AcceptWaveformImpl:164 Creating a resampler:
in_sample_rate: 24000
output_sample_rate: 16000
Done!
./sherpa-onnx-nemo-canary-180m-flash-en-es-de-fr-int8/test_wavs/en.wav
{"lang": "", "emotion": "", "event": "", "text": " Ask not what your country can do for you. Ask what you can do for your country.", "timestamps": [], "tokens":[" A", "s", "k", " not", " wh", "at", " y", "our", " co", "un", "tr", "y", " can", " do", " for", " you", ".", " A", "s", "k", " wh", "at", " you", " can", " do", " for", " y", "our", " co", "un", "tr", "y", "."], "words": []}
----
num threads: 2
decoding method: greedy_search
Elapsed seconds: 0.601 s
Real time factor (RTF): 0.601 / 3.845 = 0.156
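Since --canary-src-lang and --canary-tgt-lang both default to en, the same English-in, English-out behavior should also be obtained by omitting the two flags. A minimal sketch:

cd /path/to/sherpa-onnx
# relies on the default source/target language (en); otherwise identical to the command above
./build/bin/sherpa-onnx-offline \
  --canary-encoder=./sherpa-onnx-nemo-canary-180m-flash-en-es-de-fr-int8/encoder.int8.onnx \
  --canary-decoder=./sherpa-onnx-nemo-canary-180m-flash-en-es-de-fr-int8/decoder.int8.onnx \
  --tokens=./sherpa-onnx-nemo-canary-180m-flash-en-es-de-fr-int8/tokens.txt \
  ./sherpa-onnx-nemo-canary-180m-flash-en-es-de-fr-int8/test_wavs/en.wav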
Input English, output German
We use --canary-src-lang=en to indicate that the input audio contains English speech, and --canary-tgt-lang=de to indicate that the recognition result should be in German.
cd /path/to/sherpa-onnx
./build/bin/sherpa-onnx-offline \
--canary-encoder=./sherpa-onnx-nemo-canary-180m-flash-en-es-de-fr-int8/encoder.int8.onnx \
--canary-decoder=./sherpa-onnx-nemo-canary-180m-flash-en-es-de-fr-int8/decoder.int8.onnx \
--tokens=./sherpa-onnx-nemo-canary-180m-flash-en-es-de-fr-int8/tokens.txt \
--canary-src-lang=en \
--canary-tgt-lang=de \
./sherpa-onnx-nemo-canary-180m-flash-en-es-de-fr-int8/test_wavs/en.wav
Note
Please use ./build/bin/Release/sherpa-onnx-offline.exe
for Windows.
You should see the following output:
/Users/fangjun/open-source/sherpa-onnx/sherpa-onnx/csrc/parse-options.cc:Read:372 ./build/bin/sherpa-onnx-offline --canary-encoder=./sherpa-onnx-nemo-canary-180m-flash-en-es-de-fr-int8/encoder.int8.onnx --canary-decoder=./sherpa-onnx-nemo-canary-180m-flash-en-es-de-fr-int8/decoder.int8.onnx --tokens=./sherpa-onnx-nemo-canary-180m-flash-en-es-de-fr-int8/tokens.txt --canary-src-lang=en --canary-tgt-lang=de ./sherpa-onnx-nemo-canary-180m-flash-en-es-de-fr-int8/test_wavs/en.wav
OfflineRecognizerConfig(feat_config=FeatureExtractorConfig(sampling_rate=16000, feature_dim=80, low_freq=20, high_freq=-400, dither=0, normalize_samples=True, snip_edges=False), model_config=OfflineModelConfig(transducer=OfflineTransducerModelConfig(encoder_filename="", decoder_filename="", joiner_filename=""), paraformer=OfflineParaformerModelConfig(model=""), nemo_ctc=OfflineNemoEncDecCtcModelConfig(model=""), whisper=OfflineWhisperModelConfig(encoder="", decoder="", language="", task="transcribe", tail_paddings=-1), fire_red_asr=OfflineFireRedAsrModelConfig(encoder="", decoder=""), tdnn=OfflineTdnnModelConfig(model=""), zipformer_ctc=OfflineZipformerCtcModelConfig(model=""), wenet_ctc=OfflineWenetCtcModelConfig(model=""), sense_voice=OfflineSenseVoiceModelConfig(model="", language="auto", use_itn=False), moonshine=OfflineMoonshineModelConfig(preprocessor="", encoder="", uncached_decoder="", cached_decoder=""), dolphin=OfflineDolphinModelConfig(model=""), canary=OfflineCanaryModelConfig(encoder="./sherpa-onnx-nemo-canary-180m-flash-en-es-de-fr-int8/encoder.int8.onnx", decoder="./sherpa-onnx-nemo-canary-180m-flash-en-es-de-fr-int8/decoder.int8.onnx", src_lang="en", tgt_lang="de", use_pnc=True), telespeech_ctc="", tokens="./sherpa-onnx-nemo-canary-180m-flash-en-es-de-fr-int8/tokens.txt", num_threads=2, debug=False, provider="cpu", model_type="", modeling_unit="cjkchar", bpe_vocab=""), lm_config=OfflineLMConfig(model="", scale=0.5), ctc_fst_decoder_config=OfflineCtcFstDecoderConfig(graph="", max_active=3000), decoding_method="greedy_search", max_active_paths=4, hotwords_file="", hotwords_score=1.5, blank_penalty=0, rule_fsts="", rule_fars="", hr=HomophoneReplacerConfig(dict_dir="", lexicon="", rule_fsts=""))
Creating recognizer ...
Started
/Users/fangjun/open-source/sherpa-onnx/sherpa-onnx/csrc/offline-stream.cc:AcceptWaveformImpl:164 Creating a resampler:
in_sample_rate: 24000
output_sample_rate: 16000
Done!
./sherpa-onnx-nemo-canary-180m-flash-en-es-de-fr-int8/test_wavs/en.wav
{"lang": "", "emotion": "", "event": "", "text": " Fragen Sie nicht, was Ihr Land für Sie tun kann. Fragen Sie, was Sie für Ihr Land tun können.", "timestamps": [], "tokens":[" F", "ra", "gen", " Sie", " n", "icht", ",", " was", " I", "hr", " L", "and", " für", " Sie", " t", "un", " k", "ann", ".", " F", "ra", "gen", " Sie", ",", " was", " Sie", " für", " I", "hr", " L", "and", " t", "un", " kön", "nen", "."], "words": []}
----
num threads: 2
decoding method: greedy_search
Elapsed seconds: 0.659 s
Real time factor (RTF): 0.659 / 3.845 = 0.171
Input English, output Spanish
We use --canary-src-lang=en to indicate that the input audio contains English speech, and --canary-tgt-lang=es to indicate that the recognition result should be in Spanish.
cd /path/to/sherpa-onnx
./build/bin/sherpa-onnx-offline \
--canary-encoder=./sherpa-onnx-nemo-canary-180m-flash-en-es-de-fr-int8/encoder.int8.onnx \
--canary-decoder=./sherpa-onnx-nemo-canary-180m-flash-en-es-de-fr-int8/decoder.int8.onnx \
--tokens=./sherpa-onnx-nemo-canary-180m-flash-en-es-de-fr-int8/tokens.txt \
--canary-src-lang=en \
--canary-tgt-lang=es \
./sherpa-onnx-nemo-canary-180m-flash-en-es-de-fr-int8/test_wavs/en.wav
Note
Please use ./build/bin/Release/sherpa-onnx-offline.exe
for Windows.
You should see the following output:
/Users/fangjun/open-source/sherpa-onnx/sherpa-onnx/csrc/parse-options.cc:Read:372 ./build/bin/sherpa-onnx-offline --canary-encoder=./sherpa-onnx-nemo-canary-180m-flash-en-es-de-fr-int8/encoder.int8.onnx --canary-decoder=./sherpa-onnx-nemo-canary-180m-flash-en-es-de-fr-int8/decoder.int8.onnx --tokens=./sherpa-onnx-nemo-canary-180m-flash-en-es-de-fr-int8/tokens.txt --canary-src-lang=en --canary-tgt-lang=es ./sherpa-onnx-nemo-canary-180m-flash-en-es-de-fr-int8/test_wavs/en.wav
OfflineRecognizerConfig(feat_config=FeatureExtractorConfig(sampling_rate=16000, feature_dim=80, low_freq=20, high_freq=-400, dither=0, normalize_samples=True, snip_edges=False), model_config=OfflineModelConfig(transducer=OfflineTransducerModelConfig(encoder_filename="", decoder_filename="", joiner_filename=""), paraformer=OfflineParaformerModelConfig(model=""), nemo_ctc=OfflineNemoEncDecCtcModelConfig(model=""), whisper=OfflineWhisperModelConfig(encoder="", decoder="", language="", task="transcribe", tail_paddings=-1), fire_red_asr=OfflineFireRedAsrModelConfig(encoder="", decoder=""), tdnn=OfflineTdnnModelConfig(model=""), zipformer_ctc=OfflineZipformerCtcModelConfig(model=""), wenet_ctc=OfflineWenetCtcModelConfig(model=""), sense_voice=OfflineSenseVoiceModelConfig(model="", language="auto", use_itn=False), moonshine=OfflineMoonshineModelConfig(preprocessor="", encoder="", uncached_decoder="", cached_decoder=""), dolphin=OfflineDolphinModelConfig(model=""), canary=OfflineCanaryModelConfig(encoder="./sherpa-onnx-nemo-canary-180m-flash-en-es-de-fr-int8/encoder.int8.onnx", decoder="./sherpa-onnx-nemo-canary-180m-flash-en-es-de-fr-int8/decoder.int8.onnx", src_lang="en", tgt_lang="es", use_pnc=True), telespeech_ctc="", tokens="./sherpa-onnx-nemo-canary-180m-flash-en-es-de-fr-int8/tokens.txt", num_threads=2, debug=False, provider="cpu", model_type="", modeling_unit="cjkchar", bpe_vocab=""), lm_config=OfflineLMConfig(model="", scale=0.5), ctc_fst_decoder_config=OfflineCtcFstDecoderConfig(graph="", max_active=3000), decoding_method="greedy_search", max_active_paths=4, hotwords_file="", hotwords_score=1.5, blank_penalty=0, rule_fsts="", rule_fars="", hr=HomophoneReplacerConfig(dict_dir="", lexicon="", rule_fsts=""))
Creating recognizer ...
Started
/Users/fangjun/open-source/sherpa-onnx/sherpa-onnx/csrc/offline-stream.cc:AcceptWaveformImpl:164 Creating a resampler:
in_sample_rate: 24000
output_sample_rate: 16000
Done!
./sherpa-onnx-nemo-canary-180m-flash-en-es-de-fr-int8/test_wavs/en.wav
{"lang": "", "emotion": "", "event": "", "text": " No preguntes qué puede hacer tu país por ti. Pregúntale qué puedes hacer por tu país.", "timestamps": [], "tokens":[" No", " pre", "g", "unt", "es", " qué", " pue", "de", " ha", "cer", " tu", " pa", "ís", " por", " ti", ".", " P", "re", "g", "ún", "t", "al", "e", " qué", " pue", "d", "es", " ha", "cer", " por", " tu", " pa", "ís", "."], "words": []}
----
num threads: 2
decoding method: greedy_search
Elapsed seconds: 0.602 s
Real time factor (RTF): 0.602 / 3.845 = 0.157
Input English, output French
We use --canary-src-lang=en to indicate that the input audio contains English speech, and --canary-tgt-lang=fr to indicate that the recognition result should be in French.
cd /path/to/sherpa-onnx
./build/bin/sherpa-onnx-offline \
--canary-encoder=./sherpa-onnx-nemo-canary-180m-flash-en-es-de-fr-int8/encoder.int8.onnx \
--canary-decoder=./sherpa-onnx-nemo-canary-180m-flash-en-es-de-fr-int8/decoder.int8.onnx \
--tokens=./sherpa-onnx-nemo-canary-180m-flash-en-es-de-fr-int8/tokens.txt \
--canary-src-lang=en \
--canary-tgt-lang=fr \
./sherpa-onnx-nemo-canary-180m-flash-en-es-de-fr-int8/test_wavs/en.wav
Note
Please use ./build/bin/Release/sherpa-onnx-offline.exe
for Windows.
You should see the following output:
/Users/fangjun/open-source/sherpa-onnx/sherpa-onnx/csrc/parse-options.cc:Read:372 ./build/bin/sherpa-onnx-offline --canary-encoder=./sherpa-onnx-nemo-canary-180m-flash-en-es-de-fr-int8/encoder.int8.onnx --canary-decoder=./sherpa-onnx-nemo-canary-180m-flash-en-es-de-fr-int8/decoder.int8.onnx --tokens=./sherpa-onnx-nemo-canary-180m-flash-en-es-de-fr-int8/tokens.txt --canary-src-lang=en --canary-tgt-lang=fr ./sherpa-onnx-nemo-canary-180m-flash-en-es-de-fr-int8/test_wavs/en.wav
OfflineRecognizerConfig(feat_config=FeatureExtractorConfig(sampling_rate=16000, feature_dim=80, low_freq=20, high_freq=-400, dither=0, normalize_samples=True, snip_edges=False), model_config=OfflineModelConfig(transducer=OfflineTransducerModelConfig(encoder_filename="", decoder_filename="", joiner_filename=""), paraformer=OfflineParaformerModelConfig(model=""), nemo_ctc=OfflineNemoEncDecCtcModelConfig(model=""), whisper=OfflineWhisperModelConfig(encoder="", decoder="", language="", task="transcribe", tail_paddings=-1), fire_red_asr=OfflineFireRedAsrModelConfig(encoder="", decoder=""), tdnn=OfflineTdnnModelConfig(model=""), zipformer_ctc=OfflineZipformerCtcModelConfig(model=""), wenet_ctc=OfflineWenetCtcModelConfig(model=""), sense_voice=OfflineSenseVoiceModelConfig(model="", language="auto", use_itn=False), moonshine=OfflineMoonshineModelConfig(preprocessor="", encoder="", uncached_decoder="", cached_decoder=""), dolphin=OfflineDolphinModelConfig(model=""), canary=OfflineCanaryModelConfig(encoder="./sherpa-onnx-nemo-canary-180m-flash-en-es-de-fr-int8/encoder.int8.onnx", decoder="./sherpa-onnx-nemo-canary-180m-flash-en-es-de-fr-int8/decoder.int8.onnx", src_lang="en", tgt_lang="fr", use_pnc=True), telespeech_ctc="", tokens="./sherpa-onnx-nemo-canary-180m-flash-en-es-de-fr-int8/tokens.txt", num_threads=2, debug=False, provider="cpu", model_type="", modeling_unit="cjkchar", bpe_vocab=""), lm_config=OfflineLMConfig(model="", scale=0.5), ctc_fst_decoder_config=OfflineCtcFstDecoderConfig(graph="", max_active=3000), decoding_method="greedy_search", max_active_paths=4, hotwords_file="", hotwords_score=1.5, blank_penalty=0, rule_fsts="", rule_fars="", hr=HomophoneReplacerConfig(dict_dir="", lexicon="", rule_fsts=""))
Creating recognizer ...
Started
/Users/fangjun/open-source/sherpa-onnx/sherpa-onnx/csrc/offline-stream.cc:AcceptWaveformImpl:164 Creating a resampler:
in_sample_rate: 24000
output_sample_rate: 16000
Done!
./sherpa-onnx-nemo-canary-180m-flash-en-es-de-fr-int8/test_wavs/en.wav
{"lang": "", "emotion": "", "event": "", "text": " Demandez-vous ce que votre pays peut faire pour vous. Demandez ce que vous pouvez faire pour votre pays.", "timestamps": [], "tokens":[" D", "em", "and", "ez", "-", "v", "ous", " ce", " que", " vot", "re", " p", "ays", " pe", "ut", " f", "aire", " p", "our", " v", "ous", ".", " D", "em", "and", "ez", " ce", " que", " v", "ous", " p", "ouve", "z", " f", "aire", " p", "our", " vot", "re", " p", "ays", "."], "words": []}
----
num threads: 2
decoding method: greedy_search
Elapsed seconds: 0.692 s
Real time factor (RTF): 0.692 / 3.845 = 0.180
Input German, output English
We use --canary-src-lang=de to indicate that the input audio contains German speech, and --canary-tgt-lang=en to indicate that the recognition result should be in English.
cd /path/to/sherpa-onnx
./build/bin/sherpa-onnx-offline \
--canary-encoder=./sherpa-onnx-nemo-canary-180m-flash-en-es-de-fr-int8/encoder.int8.onnx \
--canary-decoder=./sherpa-onnx-nemo-canary-180m-flash-en-es-de-fr-int8/decoder.int8.onnx \
--tokens=./sherpa-onnx-nemo-canary-180m-flash-en-es-de-fr-int8/tokens.txt \
--canary-src-lang=de \
--canary-tgt-lang=en \
./sherpa-onnx-nemo-canary-180m-flash-en-es-de-fr-int8/test_wavs/de.wav
Note
Please use ./build/bin/Release/sherpa-onnx-offline.exe
for Windows.
You should see the following output:
/Users/fangjun/open-source/sherpa-onnx/sherpa-onnx/csrc/parse-options.cc:Read:372 ./build/bin/sherpa-onnx-offline --canary-encoder=./sherpa-onnx-nemo-canary-180m-flash-en-es-de-fr-int8/encoder.int8.onnx --canary-decoder=./sherpa-onnx-nemo-canary-180m-flash-en-es-de-fr-int8/decoder.int8.onnx --tokens=./sherpa-onnx-nemo-canary-180m-flash-en-es-de-fr-int8/tokens.txt --canary-src-lang=de --canary-tgt-lang=en ./sherpa-onnx-nemo-canary-180m-flash-en-es-de-fr-int8/test_wavs/de.wav
OfflineRecognizerConfig(feat_config=FeatureExtractorConfig(sampling_rate=16000, feature_dim=80, low_freq=20, high_freq=-400, dither=0, normalize_samples=True, snip_edges=False), model_config=OfflineModelConfig(transducer=OfflineTransducerModelConfig(encoder_filename="", decoder_filename="", joiner_filename=""), paraformer=OfflineParaformerModelConfig(model=""), nemo_ctc=OfflineNemoEncDecCtcModelConfig(model=""), whisper=OfflineWhisperModelConfig(encoder="", decoder="", language="", task="transcribe", tail_paddings=-1), fire_red_asr=OfflineFireRedAsrModelConfig(encoder="", decoder=""), tdnn=OfflineTdnnModelConfig(model=""), zipformer_ctc=OfflineZipformerCtcModelConfig(model=""), wenet_ctc=OfflineWenetCtcModelConfig(model=""), sense_voice=OfflineSenseVoiceModelConfig(model="", language="auto", use_itn=False), moonshine=OfflineMoonshineModelConfig(preprocessor="", encoder="", uncached_decoder="", cached_decoder=""), dolphin=OfflineDolphinModelConfig(model=""), canary=OfflineCanaryModelConfig(encoder="./sherpa-onnx-nemo-canary-180m-flash-en-es-de-fr-int8/encoder.int8.onnx", decoder="./sherpa-onnx-nemo-canary-180m-flash-en-es-de-fr-int8/decoder.int8.onnx", src_lang="de", tgt_lang="en", use_pnc=True), telespeech_ctc="", tokens="./sherpa-onnx-nemo-canary-180m-flash-en-es-de-fr-int8/tokens.txt", num_threads=2, debug=False, provider="cpu", model_type="", modeling_unit="cjkchar", bpe_vocab=""), lm_config=OfflineLMConfig(model="", scale=0.5), ctc_fst_decoder_config=OfflineCtcFstDecoderConfig(graph="", max_active=3000), decoding_method="greedy_search", max_active_paths=4, hotwords_file="", hotwords_score=1.5, blank_penalty=0, rule_fsts="", rule_fars="", hr=HomophoneReplacerConfig(dict_dir="", lexicon="", rule_fsts=""))
Creating recognizer ...
Started
/Users/fangjun/open-source/sherpa-onnx/sherpa-onnx/csrc/offline-stream.cc:AcceptWaveformImpl:164 Creating a resampler:
in_sample_rate: 22050
output_sample_rate: 16000
Done!
./sherpa-onnx-nemo-canary-180m-flash-en-es-de-fr-int8/test_wavs/de.wav
{"lang": "", "emotion": "", "event": "", "text": " Everything has an end. Only the sausage has two", "timestamps": [], "tokens":[" E", "ver", "y", "th", "ing", " has", " an", " en", "d", ".", " O", "n", "ly", " the", " sa", "us", "age", " has", " two"], "words": []}
----
num threads: 2
decoding method: greedy_search
Elapsed seconds: 0.351 s
Real time factor (RTF): 0.351 / 2.752 = 0.128
Input German, output German
We use --canary-src-lang=de to indicate that the input audio contains German speech, and --canary-tgt-lang=de to indicate that the recognition result should be in German.
cd /path/to/sherpa-onnx
./build/bin/sherpa-onnx-offline \
--canary-encoder=./sherpa-onnx-nemo-canary-180m-flash-en-es-de-fr-int8/encoder.int8.onnx \
--canary-decoder=./sherpa-onnx-nemo-canary-180m-flash-en-es-de-fr-int8/decoder.int8.onnx \
--tokens=./sherpa-onnx-nemo-canary-180m-flash-en-es-de-fr-int8/tokens.txt \
--canary-src-lang=de \
--canary-tgt-lang=de \
./sherpa-onnx-nemo-canary-180m-flash-en-es-de-fr-int8/test_wavs/de.wav
Note
Please use ./build/bin/Release/sherpa-onnx-offline.exe
for Windows.
You should see the following output:
/Users/fangjun/open-source/sherpa-onnx/sherpa-onnx/csrc/parse-options.cc:Read:372 ./build/bin/sherpa-onnx-offline --canary-encoder=./sherpa-onnx-nemo-canary-180m-flash-en-es-de-fr-int8/encoder.int8.onnx --canary-decoder=./sherpa-onnx-nemo-canary-180m-flash-en-es-de-fr-int8/decoder.int8.onnx --tokens=./sherpa-onnx-nemo-canary-180m-flash-en-es-de-fr-int8/tokens.txt --canary-src-lang=de --canary-tgt-lang=de ./sherpa-onnx-nemo-canary-180m-flash-en-es-de-fr-int8/test_wavs/de.wav
OfflineRecognizerConfig(feat_config=FeatureExtractorConfig(sampling_rate=16000, feature_dim=80, low_freq=20, high_freq=-400, dither=0, normalize_samples=True, snip_edges=False), model_config=OfflineModelConfig(transducer=OfflineTransducerModelConfig(encoder_filename="", decoder_filename="", joiner_filename=""), paraformer=OfflineParaformerModelConfig(model=""), nemo_ctc=OfflineNemoEncDecCtcModelConfig(model=""), whisper=OfflineWhisperModelConfig(encoder="", decoder="", language="", task="transcribe", tail_paddings=-1), fire_red_asr=OfflineFireRedAsrModelConfig(encoder="", decoder=""), tdnn=OfflineTdnnModelConfig(model=""), zipformer_ctc=OfflineZipformerCtcModelConfig(model=""), wenet_ctc=OfflineWenetCtcModelConfig(model=""), sense_voice=OfflineSenseVoiceModelConfig(model="", language="auto", use_itn=False), moonshine=OfflineMoonshineModelConfig(preprocessor="", encoder="", uncached_decoder="", cached_decoder=""), dolphin=OfflineDolphinModelConfig(model=""), canary=OfflineCanaryModelConfig(encoder="./sherpa-onnx-nemo-canary-180m-flash-en-es-de-fr-int8/encoder.int8.onnx", decoder="./sherpa-onnx-nemo-canary-180m-flash-en-es-de-fr-int8/decoder.int8.onnx", src_lang="de", tgt_lang="de", use_pnc=True), telespeech_ctc="", tokens="./sherpa-onnx-nemo-canary-180m-flash-en-es-de-fr-int8/tokens.txt", num_threads=2, debug=False, provider="cpu", model_type="", modeling_unit="cjkchar", bpe_vocab=""), lm_config=OfflineLMConfig(model="", scale=0.5), ctc_fst_decoder_config=OfflineCtcFstDecoderConfig(graph="", max_active=3000), decoding_method="greedy_search", max_active_paths=4, hotwords_file="", hotwords_score=1.5, blank_penalty=0, rule_fsts="", rule_fars="", hr=HomophoneReplacerConfig(dict_dir="", lexicon="", rule_fsts=""))
Creating recognizer ...
Started
/Users/fangjun/open-source/sherpa-onnx/sherpa-onnx/csrc/offline-stream.cc:AcceptWaveformImpl:164 Creating a resampler:
in_sample_rate: 22050
output_sample_rate: 16000
Done!
./sherpa-onnx-nemo-canary-180m-flash-en-es-de-fr-int8/test_wavs/de.wav
{"lang": "", "emotion": "", "event": "", "text": " Alles hat ein Ende, nur die Wurst hat zwei.", "timestamps": [], "tokens":[" All", "es", " hat", " ein", " E", "nd", "e", ",", " nur", " die", " W", "ur", "st", " hat", " zwe", "i", "."], "words": []}
----
num threads: 2
decoding method: greedy_search
Elapsed seconds: 0.338 s
Real time factor (RTF): 0.338 / 2.752 = 0.123
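To decode both bundled test files in one go, each transcribed in its own source language, you can wrap the command in a small loop. A minimal sketch, using only the file names and language codes shown above:

cd /path/to/sherpa-onnx
model_dir=./sherpa-onnx-nemo-canary-180m-flash-en-es-de-fr-int8
# en.wav contains English speech and de.wav contains German speech; transcribe each in its own language
for lang in en de; do
  ./build/bin/sherpa-onnx-offline \
    --canary-encoder=$model_dir/encoder.int8.onnx \
    --canary-decoder=$model_dir/decoder.int8.onnx \
    --tokens=$model_dir/tokens.txt \
    --canary-src-lang=$lang \
    --canary-tgt-lang=$lang \
    $model_dir/test_wavs/$lang.wav
done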