Paraformer models

Hint

Please refer to Installation to install sherpa-onnx before you read this section.

sherpa-onnx-paraformer-zh-int8-2025-10-07 (Sichuanese 四川话、重庆话、川渝方言)

This model is converted from

It is fine-tuned from csukuangfj/sherpa-onnx-paraformer-zh-2023-03-28 (Chinese + English) using 10k hours of Sichuanese (川渝方言) data.

In the following, we describe how to use it.

Huggingface space

You can visit

to try this model in your browser.

Hint

You need to first select the language 四川话 and then select the model csukuangfj/sherpa-onnx-paraformer-zh-int8-2025-10-07.

Android APKs

Real-time speech recognition Android APKs can be found at

Please always download the latest version.

Hint

Please search for paraformer_四川话.apk on the above page, e.g., sherpa-onnx-1.12.15-arm64-v8a-simulated_streaming_asr-zh-paraformer_四川话.apk.

Download

Please use the following commands to download it:

cd /path/to/sherpa-onnx

wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/sherpa-onnx-paraformer-zh-int8-2025-10-07.tar.bz2
tar xvf sherpa-onnx-paraformer-zh-int8-2025-10-07.tar.bz2
rm sherpa-onnx-paraformer-zh-int8-2025-10-07.tar.bz2
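
If wget is not available on your machine, curl works just as well. This is only an optional alternative; the release URL is the same:

# same release asset, downloaded with curl instead of wget
curl -SL -O https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/sherpa-onnx-paraformer-zh-int8-2025-10-07.tar.bz2
tar xvf sherpa-onnx-paraformer-zh-int8-2025-10-07.tar.bz2
rm sherpa-onnx-paraformer-zh-int8-2025-10-07.tar.bz2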

After downloading, you should find the following files:

ls -lh sherpa-onnx-paraformer-zh-int8-2025-10-07

total 491872
-rw-r--r--@  1 fangjun  staff   227M  7 Oct 20:19 model.int8.onnx
-rw-r--r--@  1 fangjun  staff   337B  7 Oct 20:19 README.md
drwxr-xr-x@ 18 fangjun  staff   576B  7 Oct 20:19 test_wavs
-rw-r--r--@  1 fangjun  staff    74K  7 Oct 20:19 tokens.txt
ls sherpa-onnx-paraformer-zh-int8-2025-10-07/test_wavs/

1.wav  10.wav 11.wav 12.wav 13.wav 14.wav 15.wav 16.wav 2.wav  3.wav  4.wav  5.wav  6.wav  7.wav  8.wav  9.wav

In the following, we show how to decode the files sherpa-onnx-paraformer-zh-int8-2025-10-07/test_wavs/*.wav.

1.wav

Wave filename: 1.wav
Content (ground truth): 来,哥哥再给你唱首歌。好儿,哎呦,把伴奏给我放起来,放就放嘛,还要躲人家钩子。

./build/bin/sherpa-onnx-offline \
  --tokens=./sherpa-onnx-paraformer-zh-int8-2025-10-07/tokens.txt \
  --paraformer=./sherpa-onnx-paraformer-zh-int8-2025-10-07/model.int8.onnx \
  --num-threads=1 \
  sherpa-onnx-paraformer-zh-int8-2025-10-07/test_wavs/1.wav
/Users/fangjun/open-source/sherpa-onnx/sherpa-onnx/csrc/parse-options.cc:Read:372 ./build/bin/sherpa-onnx-offline --tokens=./sherpa-onnx-paraformer-zh-int8-2025-10-07/tokens.txt --paraformer=./sherpa-onnx-paraformer-zh-int8-2025-10-07/model.int8.onnx --num-threads=1 sherpa-onnx-paraformer-zh-int8-2025-10-07/test_wavs/1.wav 

OfflineRecognizerConfig(feat_config=FeatureExtractorConfig(sampling_rate=16000, feature_dim=80, low_freq=20, high_freq=-400, dither=0, normalize_samples=True, snip_edges=False), model_config=OfflineModelConfig(transducer=OfflineTransducerModelConfig(encoder_filename="", decoder_filename="", joiner_filename=""), paraformer=OfflineParaformerModelConfig(model="./sherpa-onnx-paraformer-zh-int8-2025-10-07/model.int8.onnx"), nemo_ctc=OfflineNemoEncDecCtcModelConfig(model=""), whisper=OfflineWhisperModelConfig(encoder="", decoder="", language="", task="transcribe", tail_paddings=-1), fire_red_asr=OfflineFireRedAsrModelConfig(encoder="", decoder=""), tdnn=OfflineTdnnModelConfig(model=""), zipformer_ctc=OfflineZipformerCtcModelConfig(model=""), wenet_ctc=OfflineWenetCtcModelConfig(model=""), sense_voice=OfflineSenseVoiceModelConfig(model="", language="auto", use_itn=False), moonshine=OfflineMoonshineModelConfig(preprocessor="", encoder="", uncached_decoder="", cached_decoder=""), dolphin=OfflineDolphinModelConfig(model=""), canary=OfflineCanaryModelConfig(encoder="", decoder="", src_lang="", tgt_lang="", use_pnc=True), telespeech_ctc="", tokens="./sherpa-onnx-paraformer-zh-int8-2025-10-07/tokens.txt", num_threads=1, debug=False, provider="cpu", model_type="", modeling_unit="cjkchar", bpe_vocab=""), lm_config=OfflineLMConfig(model="", scale=0.5, lodr_scale=0.01, lodr_fst="", lodr_backoff_id=-1), ctc_fst_decoder_config=OfflineCtcFstDecoderConfig(graph="", max_active=3000), decoding_method="greedy_search", max_active_paths=4, hotwords_file="", hotwords_score=1.5, blank_penalty=0, rule_fsts="", rule_fars="", hr=HomophoneReplacerConfig(lexicon="", rule_fsts=""))
Creating recognizer ...
Started
Done!

sherpa-onnx-paraformer-zh-int8-2025-10-07/test_wavs/1.wav
{"lang": "", "emotion": "", "event": "", "text": "来哥哥再给你唱首歌二哎呦把伴奏给我放起来放就放嘛还要多人家钩子", "timestamps": [], "durations": [], "tokens":["来", "哥", "哥", "再", "给", "你", "唱", "首", "歌", "二", "哎", "呦", "把", "伴", "奏", "给", "我", "放", "起", "来", "放", "就", "放", "嘛", "还", "要", "多", "人", "家", "钩", "子"], "words": []}
----
num threads: 1
decoding method: greedy_search
Elapsed seconds: 0.573 s
Real time factor (RTF): 0.573 / 7.808 = 0.073

2.wav

Wave filename: 2.wav
Content (ground truth): 对不起,只有二娃才能让我真正体会作为女人的快乐。

./build/bin/sherpa-onnx-offline \
  --tokens=./sherpa-onnx-paraformer-zh-int8-2025-10-07/tokens.txt \
  --paraformer=./sherpa-onnx-paraformer-zh-int8-2025-10-07/model.int8.onnx \
  --num-threads=1 \
  sherpa-onnx-paraformer-zh-int8-2025-10-07/test_wavs/2.wav
/Users/fangjun/open-source/sherpa-onnx/sherpa-onnx/csrc/parse-options.cc:Read:372 ./build/bin/sherpa-onnx-offline --tokens=./sherpa-onnx-paraformer-zh-int8-2025-10-07/tokens.txt --paraformer=./sherpa-onnx-paraformer-zh-int8-2025-10-07/model.int8.onnx --num-threads=1 sherpa-onnx-paraformer-zh-int8-2025-10-07/test_wavs/2.wav 

OfflineRecognizerConfig(feat_config=FeatureExtractorConfig(sampling_rate=16000, feature_dim=80, low_freq=20, high_freq=-400, dither=0, normalize_samples=True, snip_edges=False), model_config=OfflineModelConfig(transducer=OfflineTransducerModelConfig(encoder_filename="", decoder_filename="", joiner_filename=""), paraformer=OfflineParaformerModelConfig(model="./sherpa-onnx-paraformer-zh-int8-2025-10-07/model.int8.onnx"), nemo_ctc=OfflineNemoEncDecCtcModelConfig(model=""), whisper=OfflineWhisperModelConfig(encoder="", decoder="", language="", task="transcribe", tail_paddings=-1), fire_red_asr=OfflineFireRedAsrModelConfig(encoder="", decoder=""), tdnn=OfflineTdnnModelConfig(model=""), zipformer_ctc=OfflineZipformerCtcModelConfig(model=""), wenet_ctc=OfflineWenetCtcModelConfig(model=""), sense_voice=OfflineSenseVoiceModelConfig(model="", language="auto", use_itn=False), moonshine=OfflineMoonshineModelConfig(preprocessor="", encoder="", uncached_decoder="", cached_decoder=""), dolphin=OfflineDolphinModelConfig(model=""), canary=OfflineCanaryModelConfig(encoder="", decoder="", src_lang="", tgt_lang="", use_pnc=True), telespeech_ctc="", tokens="./sherpa-onnx-paraformer-zh-int8-2025-10-07/tokens.txt", num_threads=1, debug=False, provider="cpu", model_type="", modeling_unit="cjkchar", bpe_vocab=""), lm_config=OfflineLMConfig(model="", scale=0.5, lodr_scale=0.01, lodr_fst="", lodr_backoff_id=-1), ctc_fst_decoder_config=OfflineCtcFstDecoderConfig(graph="", max_active=3000), decoding_method="greedy_search", max_active_paths=4, hotwords_file="", hotwords_score=1.5, blank_penalty=0, rule_fsts="", rule_fars="", hr=HomophoneReplacerConfig(lexicon="", rule_fsts=""))
Creating recognizer ...
Started
Done!

sherpa-onnx-paraformer-zh-int8-2025-10-07/test_wavs/2.wav
{"lang": "", "emotion": "", "event": "", "text": "对不起只有二娃才能让我真正体会作为女人的快乐", "timestamps": [], "durations": [], "tokens":["对", "不", "起", "只", "有", "二", "娃", "才", "能", "让", "我", "真", "正", "体", "会", "作", "为", "女", "人", "的", "快", "乐"], "words": []}
----
num threads: 1
decoding method: greedy_search
Elapsed seconds: 0.325 s
Real time factor (RTF): 0.325 / 4.352 = 0.075

3.wav

Wave filename: 3.wav
Content (ground truth): 我想去逛街,欢迎进入直播间,晚上好,那我的名字是怎么说的呢?

./build/bin/sherpa-onnx-offline \
  --tokens=./sherpa-onnx-paraformer-zh-int8-2025-10-07/tokens.txt \
  --paraformer=./sherpa-onnx-paraformer-zh-int8-2025-10-07/model.int8.onnx \
  --num-threads=1 \
  sherpa-onnx-paraformer-zh-int8-2025-10-07/test_wavs/3.wav
/Users/fangjun/open-source/sherpa-onnx/sherpa-onnx/csrc/parse-options.cc:Read:372 ./build/bin/sherpa-onnx-offline --tokens=./sherpa-onnx-paraformer-zh-int8-2025-10-07/tokens.txt --paraformer=./sherpa-onnx-paraformer-zh-int8-2025-10-07/model.int8.onnx --num-threads=1 sherpa-onnx-paraformer-zh-int8-2025-10-07/test_wavs/3.wav 

OfflineRecognizerConfig(feat_config=FeatureExtractorConfig(sampling_rate=16000, feature_dim=80, low_freq=20, high_freq=-400, dither=0, normalize_samples=True, snip_edges=False), model_config=OfflineModelConfig(transducer=OfflineTransducerModelConfig(encoder_filename="", decoder_filename="", joiner_filename=""), paraformer=OfflineParaformerModelConfig(model="./sherpa-onnx-paraformer-zh-int8-2025-10-07/model.int8.onnx"), nemo_ctc=OfflineNemoEncDecCtcModelConfig(model=""), whisper=OfflineWhisperModelConfig(encoder="", decoder="", language="", task="transcribe", tail_paddings=-1), fire_red_asr=OfflineFireRedAsrModelConfig(encoder="", decoder=""), tdnn=OfflineTdnnModelConfig(model=""), zipformer_ctc=OfflineZipformerCtcModelConfig(model=""), wenet_ctc=OfflineWenetCtcModelConfig(model=""), sense_voice=OfflineSenseVoiceModelConfig(model="", language="auto", use_itn=False), moonshine=OfflineMoonshineModelConfig(preprocessor="", encoder="", uncached_decoder="", cached_decoder=""), dolphin=OfflineDolphinModelConfig(model=""), canary=OfflineCanaryModelConfig(encoder="", decoder="", src_lang="", tgt_lang="", use_pnc=True), telespeech_ctc="", tokens="./sherpa-onnx-paraformer-zh-int8-2025-10-07/tokens.txt", num_threads=1, debug=False, provider="cpu", model_type="", modeling_unit="cjkchar", bpe_vocab=""), lm_config=OfflineLMConfig(model="", scale=0.5, lodr_scale=0.01, lodr_fst="", lodr_backoff_id=-1), ctc_fst_decoder_config=OfflineCtcFstDecoderConfig(graph="", max_active=3000), decoding_method="greedy_search", max_active_paths=4, hotwords_file="", hotwords_score=1.5, blank_penalty=0, rule_fsts="", rule_fars="", hr=HomophoneReplacerConfig(lexicon="", rule_fsts=""))
Creating recognizer ...
Started
Done!

sherpa-onnx-paraformer-zh-int8-2025-10-07/test_wavs/3.wav
{"lang": "", "emotion": "", "event": "", "text": "我想去逛街欢迎进入直播间晚上好那我的名字是怎么说的嘞", "timestamps": [], "durations": [], "tokens":["我", "想", "去", "逛", "街", "欢", "迎", "进", "入", "直", "播", "间", "晚", "上", "好", "那", "我", "的", "名", "字", "是", "怎", "么", "说", "的", "嘞"], "words": []}
----
num threads: 1
decoding method: greedy_search
Elapsed seconds: 0.498 s
Real time factor (RTF): 0.498 / 6.784 = 0.073

4.wav

Wave filename: 4.wav
Content (ground truth): 梦见的就是你,不行啊,有四川话根本唱不起来,根本唱不起来呀!

./build/bin/sherpa-onnx-offline \
  --tokens=./sherpa-onnx-paraformer-zh-int8-2025-10-07/tokens.txt \
  --paraformer=./sherpa-onnx-paraformer-zh-int8-2025-10-07/model.int8.onnx \
  --num-threads=1 \
  sherpa-onnx-paraformer-zh-int8-2025-10-07/test_wavs/4.wav
/Users/fangjun/open-source/sherpa-onnx/sherpa-onnx/csrc/parse-options.cc:Read:372 ./build/bin/sherpa-onnx-offline --tokens=./sherpa-onnx-paraformer-zh-int8-2025-10-07/tokens.txt --paraformer=./sherpa-onnx-paraformer-zh-int8-2025-10-07/model.int8.onnx --num-threads=1 sherpa-onnx-paraformer-zh-int8-2025-10-07/test_wavs/4.wav 

OfflineRecognizerConfig(feat_config=FeatureExtractorConfig(sampling_rate=16000, feature_dim=80, low_freq=20, high_freq=-400, dither=0, normalize_samples=True, snip_edges=False), model_config=OfflineModelConfig(transducer=OfflineTransducerModelConfig(encoder_filename="", decoder_filename="", joiner_filename=""), paraformer=OfflineParaformerModelConfig(model="./sherpa-onnx-paraformer-zh-int8-2025-10-07/model.int8.onnx"), nemo_ctc=OfflineNemoEncDecCtcModelConfig(model=""), whisper=OfflineWhisperModelConfig(encoder="", decoder="", language="", task="transcribe", tail_paddings=-1), fire_red_asr=OfflineFireRedAsrModelConfig(encoder="", decoder=""), tdnn=OfflineTdnnModelConfig(model=""), zipformer_ctc=OfflineZipformerCtcModelConfig(model=""), wenet_ctc=OfflineWenetCtcModelConfig(model=""), sense_voice=OfflineSenseVoiceModelConfig(model="", language="auto", use_itn=False), moonshine=OfflineMoonshineModelConfig(preprocessor="", encoder="", uncached_decoder="", cached_decoder=""), dolphin=OfflineDolphinModelConfig(model=""), canary=OfflineCanaryModelConfig(encoder="", decoder="", src_lang="", tgt_lang="", use_pnc=True), telespeech_ctc="", tokens="./sherpa-onnx-paraformer-zh-int8-2025-10-07/tokens.txt", num_threads=1, debug=False, provider="cpu", model_type="", modeling_unit="cjkchar", bpe_vocab=""), lm_config=OfflineLMConfig(model="", scale=0.5, lodr_scale=0.01, lodr_fst="", lodr_backoff_id=-1), ctc_fst_decoder_config=OfflineCtcFstDecoderConfig(graph="", max_active=3000), decoding_method="greedy_search", max_active_paths=4, hotwords_file="", hotwords_score=1.5, blank_penalty=0, rule_fsts="", rule_fars="", hr=HomophoneReplacerConfig(lexicon="", rule_fsts=""))
Creating recognizer ...
Started
Done!

sherpa-onnx-paraformer-zh-int8-2025-10-07/test_wavs/4.wav
{"lang": "", "emotion": "", "event": "", "text": "梦见就是你不行啊有是这话根本藏不起啊根本藏不起来呀", "timestamps": [], "durations": [], "tokens":["梦", "见", "就", "是", "你", "不", "行", "啊", "有", "是", "这", "话", "根", "本", "藏", "不", "起", "啊", "根", "本", "藏", "不", "起", "来", "呀"], "words": []}
----
num threads: 1
decoding method: greedy_search
Elapsed seconds: 0.565 s
Real time factor (RTF): 0.565 / 7.552 = 0.075

5.wav

Wave filename: 5.wav
Content (ground truth): 就临走那天挑了个飘了一下嗨呀,弟弟灵魂儿就飞上九霄云,就飘着一下魂都飞了,对不对?

./build/bin/sherpa-onnx-offline \
  --tokens=./sherpa-onnx-paraformer-zh-int8-2025-10-07/tokens.txt \
  --paraformer=./sherpa-onnx-paraformer-zh-int8-2025-10-07/model.int8.onnx \
  --num-threads=1 \
  sherpa-onnx-paraformer-zh-int8-2025-10-07/test_wavs/5.wav
/Users/fangjun/open-source/sherpa-onnx/sherpa-onnx/csrc/parse-options.cc:Read:372 ./build/bin/sherpa-onnx-offline --tokens=./sherpa-onnx-paraformer-zh-int8-2025-10-07/tokens.txt --paraformer=./sherpa-onnx-paraformer-zh-int8-2025-10-07/model.int8.onnx --num-threads=1 sherpa-onnx-paraformer-zh-int8-2025-10-07/test_wavs/5.wav 

OfflineRecognizerConfig(feat_config=FeatureExtractorConfig(sampling_rate=16000, feature_dim=80, low_freq=20, high_freq=-400, dither=0, normalize_samples=True, snip_edges=False), model_config=OfflineModelConfig(transducer=OfflineTransducerModelConfig(encoder_filename="", decoder_filename="", joiner_filename=""), paraformer=OfflineParaformerModelConfig(model="./sherpa-onnx-paraformer-zh-int8-2025-10-07/model.int8.onnx"), nemo_ctc=OfflineNemoEncDecCtcModelConfig(model=""), whisper=OfflineWhisperModelConfig(encoder="", decoder="", language="", task="transcribe", tail_paddings=-1), fire_red_asr=OfflineFireRedAsrModelConfig(encoder="", decoder=""), tdnn=OfflineTdnnModelConfig(model=""), zipformer_ctc=OfflineZipformerCtcModelConfig(model=""), wenet_ctc=OfflineWenetCtcModelConfig(model=""), sense_voice=OfflineSenseVoiceModelConfig(model="", language="auto", use_itn=False), moonshine=OfflineMoonshineModelConfig(preprocessor="", encoder="", uncached_decoder="", cached_decoder=""), dolphin=OfflineDolphinModelConfig(model=""), canary=OfflineCanaryModelConfig(encoder="", decoder="", src_lang="", tgt_lang="", use_pnc=True), telespeech_ctc="", tokens="./sherpa-onnx-paraformer-zh-int8-2025-10-07/tokens.txt", num_threads=1, debug=False, provider="cpu", model_type="", modeling_unit="cjkchar", bpe_vocab=""), lm_config=OfflineLMConfig(model="", scale=0.5, lodr_scale=0.01, lodr_fst="", lodr_backoff_id=-1), ctc_fst_decoder_config=OfflineCtcFstDecoderConfig(graph="", max_active=3000), decoding_method="greedy_search", max_active_paths=4, hotwords_file="", hotwords_score=1.5, blank_penalty=0, rule_fsts="", rule_fars="", hr=HomophoneReplacerConfig(lexicon="", rule_fsts=""))
Creating recognizer ...
Started
Done!

sherpa-onnx-paraformer-zh-int8-2025-10-07/test_wavs/5.wav
{"lang": "", "emotion": "", "event": "", "text": "就临走那个挑了个飘了一眼嗨呀弟弟灵魂儿就飞上九霄云就飘这一下魂都飞了对不对", "timestamps": [], "durations": [], "tokens":["就", "临", "走", "那", "个", "挑", "了", "个", "飘", "了", "一", "眼", "嗨", "呀", "弟", "弟", "灵", "魂", "儿", "就", "飞", "上", "九", "霄", "云", "就", "飘", "这", "一", "下", "魂", "都", "飞", "了", "对", "不", "对"], "words": []}
----
num threads: 1
decoding method: greedy_search
Elapsed seconds: 0.696 s
Real time factor (RTF): 0.696 / 9.088 = 0.077

6.wav

Wave filename: 6.wav
Content (ground truth): 是不是给人感觉后头是青花亮色的然后说话是很平和的眼神是不慌乱的不散的。

./build/bin/sherpa-onnx-offline \
  --tokens=./sherpa-onnx-paraformer-zh-int8-2025-10-07/tokens.txt \
  --paraformer=./sherpa-onnx-paraformer-zh-int8-2025-10-07/model.int8.onnx \
  --num-threads=1 \
  sherpa-onnx-paraformer-zh-int8-2025-10-07/test_wavs/6.wav
/Users/fangjun/open-source/sherpa-onnx/sherpa-onnx/csrc/parse-options.cc:Read:372 ./build/bin/sherpa-onnx-offline --tokens=./sherpa-onnx-paraformer-zh-int8-2025-10-07/tokens.txt --paraformer=./sherpa-onnx-paraformer-zh-int8-2025-10-07/model.int8.onnx --num-threads=1 sherpa-onnx-paraformer-zh-int8-2025-10-07/test_wavs/6.wav 

OfflineRecognizerConfig(feat_config=FeatureExtractorConfig(sampling_rate=16000, feature_dim=80, low_freq=20, high_freq=-400, dither=0, normalize_samples=True, snip_edges=False), model_config=OfflineModelConfig(transducer=OfflineTransducerModelConfig(encoder_filename="", decoder_filename="", joiner_filename=""), paraformer=OfflineParaformerModelConfig(model="./sherpa-onnx-paraformer-zh-int8-2025-10-07/model.int8.onnx"), nemo_ctc=OfflineNemoEncDecCtcModelConfig(model=""), whisper=OfflineWhisperModelConfig(encoder="", decoder="", language="", task="transcribe", tail_paddings=-1), fire_red_asr=OfflineFireRedAsrModelConfig(encoder="", decoder=""), tdnn=OfflineTdnnModelConfig(model=""), zipformer_ctc=OfflineZipformerCtcModelConfig(model=""), wenet_ctc=OfflineWenetCtcModelConfig(model=""), sense_voice=OfflineSenseVoiceModelConfig(model="", language="auto", use_itn=False), moonshine=OfflineMoonshineModelConfig(preprocessor="", encoder="", uncached_decoder="", cached_decoder=""), dolphin=OfflineDolphinModelConfig(model=""), canary=OfflineCanaryModelConfig(encoder="", decoder="", src_lang="", tgt_lang="", use_pnc=True), telespeech_ctc="", tokens="./sherpa-onnx-paraformer-zh-int8-2025-10-07/tokens.txt", num_threads=1, debug=False, provider="cpu", model_type="", modeling_unit="cjkchar", bpe_vocab=""), lm_config=OfflineLMConfig(model="", scale=0.5, lodr_scale=0.01, lodr_fst="", lodr_backoff_id=-1), ctc_fst_decoder_config=OfflineCtcFstDecoderConfig(graph="", max_active=3000), decoding_method="greedy_search", max_active_paths=4, hotwords_file="", hotwords_score=1.5, blank_penalty=0, rule_fsts="", rule_fars="", hr=HomophoneReplacerConfig(lexicon="", rule_fsts=""))
Creating recognizer ...
Started
Done!

sherpa-onnx-paraformer-zh-int8-2025-10-07/test_wavs/6.wav
{"lang": "", "emotion": "", "event": "", "text": "是不是给人感觉好多是青花亮色的然后说话是很平和的眼神是不慌乱的不闪的", "timestamps": [], "durations": [], "tokens":["是", "不", "是", "给", "人", "感", "觉", "好", "多", "是", "青", "花", "亮", "色", "的", "然", "后", "说", "话", "是", "很", "平", "和", "的", "眼", "神", "是", "不", "慌", "乱", "的", "不", "闪", "的"], "words": []}
----
num threads: 1
decoding method: greedy_search
Elapsed seconds: 0.589 s
Real time factor (RTF): 0.589 / 7.808 = 0.075

7.wav

Wave filename: 7.wav
Content (ground truth): 他坐在椅子上,挺直起腰杆,脸上展现出灿烂的笑容。

./build/bin/sherpa-onnx-offline \
  --tokens=./sherpa-onnx-paraformer-zh-int8-2025-10-07/tokens.txt \
  --paraformer=./sherpa-onnx-paraformer-zh-int8-2025-10-07/model.int8.onnx \
  --num-threads=1 \
  sherpa-onnx-paraformer-zh-int8-2025-10-07/test_wavs/7.wav
/Users/fangjun/open-source/sherpa-onnx/sherpa-onnx/csrc/parse-options.cc:Read:372 ./build/bin/sherpa-onnx-offline --tokens=./sherpa-onnx-paraformer-zh-int8-2025-10-07/tokens.txt --paraformer=./sherpa-onnx-paraformer-zh-int8-2025-10-07/model.int8.onnx --num-threads=1 sherpa-onnx-paraformer-zh-int8-2025-10-07/test_wavs/7.wav 

OfflineRecognizerConfig(feat_config=FeatureExtractorConfig(sampling_rate=16000, feature_dim=80, low_freq=20, high_freq=-400, dither=0, normalize_samples=True, snip_edges=False), model_config=OfflineModelConfig(transducer=OfflineTransducerModelConfig(encoder_filename="", decoder_filename="", joiner_filename=""), paraformer=OfflineParaformerModelConfig(model="./sherpa-onnx-paraformer-zh-int8-2025-10-07/model.int8.onnx"), nemo_ctc=OfflineNemoEncDecCtcModelConfig(model=""), whisper=OfflineWhisperModelConfig(encoder="", decoder="", language="", task="transcribe", tail_paddings=-1), fire_red_asr=OfflineFireRedAsrModelConfig(encoder="", decoder=""), tdnn=OfflineTdnnModelConfig(model=""), zipformer_ctc=OfflineZipformerCtcModelConfig(model=""), wenet_ctc=OfflineWenetCtcModelConfig(model=""), sense_voice=OfflineSenseVoiceModelConfig(model="", language="auto", use_itn=False), moonshine=OfflineMoonshineModelConfig(preprocessor="", encoder="", uncached_decoder="", cached_decoder=""), dolphin=OfflineDolphinModelConfig(model=""), canary=OfflineCanaryModelConfig(encoder="", decoder="", src_lang="", tgt_lang="", use_pnc=True), telespeech_ctc="", tokens="./sherpa-onnx-paraformer-zh-int8-2025-10-07/tokens.txt", num_threads=1, debug=False, provider="cpu", model_type="", modeling_unit="cjkchar", bpe_vocab=""), lm_config=OfflineLMConfig(model="", scale=0.5, lodr_scale=0.01, lodr_fst="", lodr_backoff_id=-1), ctc_fst_decoder_config=OfflineCtcFstDecoderConfig(graph="", max_active=3000), decoding_method="greedy_search", max_active_paths=4, hotwords_file="", hotwords_score=1.5, blank_penalty=0, rule_fsts="", rule_fars="", hr=HomophoneReplacerConfig(lexicon="", rule_fsts=""))
Creating recognizer ...
Started
Done!

sherpa-onnx-paraformer-zh-int8-2025-10-07/test_wavs/7.wav
{"lang": "", "emotion": "", "event": "", "text": "他坐在椅子上挺直起腰杆脸上展现出灿烂的笑容", "timestamps": [], "durations": [], "tokens":["他", "坐", "在", "椅", "子", "上", "挺", "直", "起", "腰", "杆", "脸", "上", "展", "现", "出", "灿", "烂", "的", "笑", "容"], "words": []}
----
num threads: 1
decoding method: greedy_search
Elapsed seconds: 0.559 s
Real time factor (RTF): 0.559 / 7.600 = 0.074

8.wav

Wave filename: 8.wav
Content (ground truth): 唤起路由无限的感慨,使他,更加痛恨官场的欺诈污浊。

./build/bin/sherpa-onnx-offline \
  --tokens=./sherpa-onnx-paraformer-zh-int8-2025-10-07/tokens.txt \
  --paraformer=./sherpa-onnx-paraformer-zh-int8-2025-10-07/model.int8.onnx \
  --num-threads=1 \
  sherpa-onnx-paraformer-zh-int8-2025-10-07/test_wavs/8.wav
/Users/fangjun/open-source/sherpa-onnx/sherpa-onnx/csrc/parse-options.cc:Read:372 ./build/bin/sherpa-onnx-offline --tokens=./sherpa-onnx-paraformer-zh-int8-2025-10-07/tokens.txt --paraformer=./sherpa-onnx-paraformer-zh-int8-2025-10-07/model.int8.onnx --num-threads=1 sherpa-onnx-paraformer-zh-int8-2025-10-07/test_wavs/8.wav 

OfflineRecognizerConfig(feat_config=FeatureExtractorConfig(sampling_rate=16000, feature_dim=80, low_freq=20, high_freq=-400, dither=0, normalize_samples=True, snip_edges=False), model_config=OfflineModelConfig(transducer=OfflineTransducerModelConfig(encoder_filename="", decoder_filename="", joiner_filename=""), paraformer=OfflineParaformerModelConfig(model="./sherpa-onnx-paraformer-zh-int8-2025-10-07/model.int8.onnx"), nemo_ctc=OfflineNemoEncDecCtcModelConfig(model=""), whisper=OfflineWhisperModelConfig(encoder="", decoder="", language="", task="transcribe", tail_paddings=-1), fire_red_asr=OfflineFireRedAsrModelConfig(encoder="", decoder=""), tdnn=OfflineTdnnModelConfig(model=""), zipformer_ctc=OfflineZipformerCtcModelConfig(model=""), wenet_ctc=OfflineWenetCtcModelConfig(model=""), sense_voice=OfflineSenseVoiceModelConfig(model="", language="auto", use_itn=False), moonshine=OfflineMoonshineModelConfig(preprocessor="", encoder="", uncached_decoder="", cached_decoder=""), dolphin=OfflineDolphinModelConfig(model=""), canary=OfflineCanaryModelConfig(encoder="", decoder="", src_lang="", tgt_lang="", use_pnc=True), telespeech_ctc="", tokens="./sherpa-onnx-paraformer-zh-int8-2025-10-07/tokens.txt", num_threads=1, debug=False, provider="cpu", model_type="", modeling_unit="cjkchar", bpe_vocab=""), lm_config=OfflineLMConfig(model="", scale=0.5, lodr_scale=0.01, lodr_fst="", lodr_backoff_id=-1), ctc_fst_decoder_config=OfflineCtcFstDecoderConfig(graph="", max_active=3000), decoding_method="greedy_search", max_active_paths=4, hotwords_file="", hotwords_score=1.5, blank_penalty=0, rule_fsts="", rule_fars="", hr=HomophoneReplacerConfig(lexicon="", rule_fsts=""))
Creating recognizer ...
Started
Done!

sherpa-onnx-paraformer-zh-int8-2025-10-07/test_wavs/8.wav
{"lang": "", "emotion": "", "event": "", "text": "换起路游无限的感慨使他更加痛恨官场的倾炸污浊", "timestamps": [], "durations": [], "tokens":["换", "起", "路", "游", "无", "限", "的", "感", "慨", "使", "他", "更", "加", "痛", "恨", "官", "场", "的", "倾", "炸", "污", "浊"], "words": []}
----
num threads: 1
decoding method: greedy_search
Elapsed seconds: 0.541 s
Real time factor (RTF): 0.541 / 7.296 = 0.074

9.wav

Wave filename: 9.wav
Content (ground truth): 看面貌约五十左右却自称活了两百多岁,在清顺治时出家当过和尚,还有杜蝶为证。

./build/bin/sherpa-onnx-offline \
  --tokens=./sherpa-onnx-paraformer-zh-int8-2025-10-07/tokens.txt \
  --paraformer=./sherpa-onnx-paraformer-zh-int8-2025-10-07/model.int8.onnx \
  --num-threads=1 \
  sherpa-onnx-paraformer-zh-int8-2025-10-07/test_wavs/9.wav
/Users/fangjun/open-source/sherpa-onnx/sherpa-onnx/csrc/parse-options.cc:Read:372 ./build/bin/sherpa-onnx-offline --tokens=./sherpa-onnx-paraformer-zh-int8-2025-10-07/tokens.txt --paraformer=./sherpa-onnx-paraformer-zh-int8-2025-10-07/model.int8.onnx --num-threads=1 sherpa-onnx-paraformer-zh-int8-2025-10-07/test_wavs/9.wav 

OfflineRecognizerConfig(feat_config=FeatureExtractorConfig(sampling_rate=16000, feature_dim=80, low_freq=20, high_freq=-400, dither=0, normalize_samples=True, snip_edges=False), model_config=OfflineModelConfig(transducer=OfflineTransducerModelConfig(encoder_filename="", decoder_filename="", joiner_filename=""), paraformer=OfflineParaformerModelConfig(model="./sherpa-onnx-paraformer-zh-int8-2025-10-07/model.int8.onnx"), nemo_ctc=OfflineNemoEncDecCtcModelConfig(model=""), whisper=OfflineWhisperModelConfig(encoder="", decoder="", language="", task="transcribe", tail_paddings=-1), fire_red_asr=OfflineFireRedAsrModelConfig(encoder="", decoder=""), tdnn=OfflineTdnnModelConfig(model=""), zipformer_ctc=OfflineZipformerCtcModelConfig(model=""), wenet_ctc=OfflineWenetCtcModelConfig(model=""), sense_voice=OfflineSenseVoiceModelConfig(model="", language="auto", use_itn=False), moonshine=OfflineMoonshineModelConfig(preprocessor="", encoder="", uncached_decoder="", cached_decoder=""), dolphin=OfflineDolphinModelConfig(model=""), canary=OfflineCanaryModelConfig(encoder="", decoder="", src_lang="", tgt_lang="", use_pnc=True), telespeech_ctc="", tokens="./sherpa-onnx-paraformer-zh-int8-2025-10-07/tokens.txt", num_threads=1, debug=False, provider="cpu", model_type="", modeling_unit="cjkchar", bpe_vocab=""), lm_config=OfflineLMConfig(model="", scale=0.5, lodr_scale=0.01, lodr_fst="", lodr_backoff_id=-1), ctc_fst_decoder_config=OfflineCtcFstDecoderConfig(graph="", max_active=3000), decoding_method="greedy_search", max_active_paths=4, hotwords_file="", hotwords_score=1.5, blank_penalty=0, rule_fsts="", rule_fars="", hr=HomophoneReplacerConfig(lexicon="", rule_fsts=""))
Creating recognizer ...
Started
Done!

sherpa-onnx-paraformer-zh-int8-2025-10-07/test_wavs/9.wav
{"lang": "", "emotion": "", "event": "", "text": "看面貌约五十左右却自称活了两百多岁在清顺治时出家当过和尚还有杜蝶为政", "timestamps": [], "durations": [], "tokens":["看", "面", "貌", "约", "五", "十", "左", "右", "却", "自", "称", "活", "了", "两", "百", "多", "岁", "在", "清", "顺", "治", "时", "出", "家", "当", "过", "和", "尚", "还", "有", "杜", "蝶", "为", "政"], "words": []}
----
num threads: 1
decoding method: greedy_search
Elapsed seconds: 0.694 s
Real time factor (RTF): 0.694 / 9.088 = 0.076

10.wav

Wave filename: 10.wav
Content (ground truth): 其言曰,士大夫以其见闻之广反各有所偏,自有负担杀者有负良骑者。

./build/bin/sherpa-onnx-offline \
  --tokens=./sherpa-onnx-paraformer-zh-int8-2025-10-07/tokens.txt \
  --paraformer=./sherpa-onnx-paraformer-zh-int8-2025-10-07/model.int8.onnx \
  --num-threads=1 \
  sherpa-onnx-paraformer-zh-int8-2025-10-07/test_wavs/10.wav
/Users/fangjun/open-source/sherpa-onnx/sherpa-onnx/csrc/parse-options.cc:Read:372 ./build/bin/sherpa-onnx-offline --tokens=./sherpa-onnx-paraformer-zh-int8-2025-10-07/tokens.txt --paraformer=./sherpa-onnx-paraformer-zh-int8-2025-10-07/model.int8.onnx --num-threads=1 sherpa-onnx-paraformer-zh-int8-2025-10-07/test_wavs/10.wav 

OfflineRecognizerConfig(feat_config=FeatureExtractorConfig(sampling_rate=16000, feature_dim=80, low_freq=20, high_freq=-400, dither=0, normalize_samples=True, snip_edges=False), model_config=OfflineModelConfig(transducer=OfflineTransducerModelConfig(encoder_filename="", decoder_filename="", joiner_filename=""), paraformer=OfflineParaformerModelConfig(model="./sherpa-onnx-paraformer-zh-int8-2025-10-07/model.int8.onnx"), nemo_ctc=OfflineNemoEncDecCtcModelConfig(model=""), whisper=OfflineWhisperModelConfig(encoder="", decoder="", language="", task="transcribe", tail_paddings=-1), fire_red_asr=OfflineFireRedAsrModelConfig(encoder="", decoder=""), tdnn=OfflineTdnnModelConfig(model=""), zipformer_ctc=OfflineZipformerCtcModelConfig(model=""), wenet_ctc=OfflineWenetCtcModelConfig(model=""), sense_voice=OfflineSenseVoiceModelConfig(model="", language="auto", use_itn=False), moonshine=OfflineMoonshineModelConfig(preprocessor="", encoder="", uncached_decoder="", cached_decoder=""), dolphin=OfflineDolphinModelConfig(model=""), canary=OfflineCanaryModelConfig(encoder="", decoder="", src_lang="", tgt_lang="", use_pnc=True), telespeech_ctc="", tokens="./sherpa-onnx-paraformer-zh-int8-2025-10-07/tokens.txt", num_threads=1, debug=False, provider="cpu", model_type="", modeling_unit="cjkchar", bpe_vocab=""), lm_config=OfflineLMConfig(model="", scale=0.5, lodr_scale=0.01, lodr_fst="", lodr_backoff_id=-1), ctc_fst_decoder_config=OfflineCtcFstDecoderConfig(graph="", max_active=3000), decoding_method="greedy_search", max_active_paths=4, hotwords_file="", hotwords_score=1.5, blank_penalty=0, rule_fsts="", rule_fars="", hr=HomophoneReplacerConfig(lexicon="", rule_fsts=""))
Creating recognizer ...
Started
Done!

sherpa-onnx-paraformer-zh-int8-2025-10-07/test_wavs/10.wav
{"lang": "", "emotion": "", "event": "", "text": "其言曰士大夫一其见闻之广反各有所偏自有福单杀者有福良记者", "timestamps": [], "durations": [], "tokens":["其", "言", "曰", "士", "大", "夫", "一", "其", "见", "闻", "之", "广", "反", "各", "有", "所", "偏", "自", "有", "福", "单", "杀", "者", "有", "福", "良", "记", "者"], "words": []}
----
num threads: 1
decoding method: greedy_search
Elapsed seconds: 0.611 s
Real time factor (RTF): 0.611 / 8.192 = 0.075

11.wav

Wave filename: 11.wav
Content (ground truth): 据说有网友坐飞机的时候呢,广播全程播报。

./build/bin/sherpa-onnx-offline \
  --tokens=./sherpa-onnx-paraformer-zh-int8-2025-10-07/tokens.txt \
  --paraformer=./sherpa-onnx-paraformer-zh-int8-2025-10-07/model.int8.onnx \
  --num-threads=1 \
  sherpa-onnx-paraformer-zh-int8-2025-10-07/test_wavs/11.wav
/Users/fangjun/open-source/sherpa-onnx/sherpa-onnx/csrc/parse-options.cc:Read:372 ./build/bin/sherpa-onnx-offline --tokens=./sherpa-onnx-paraformer-zh-int8-2025-10-07/tokens.txt --paraformer=./sherpa-onnx-paraformer-zh-int8-2025-10-07/model.int8.onnx --num-threads=1 sherpa-onnx-paraformer-zh-int8-2025-10-07/test_wavs/11.wav 

OfflineRecognizerConfig(feat_config=FeatureExtractorConfig(sampling_rate=16000, feature_dim=80, low_freq=20, high_freq=-400, dither=0, normalize_samples=True, snip_edges=False), model_config=OfflineModelConfig(transducer=OfflineTransducerModelConfig(encoder_filename="", decoder_filename="", joiner_filename=""), paraformer=OfflineParaformerModelConfig(model="./sherpa-onnx-paraformer-zh-int8-2025-10-07/model.int8.onnx"), nemo_ctc=OfflineNemoEncDecCtcModelConfig(model=""), whisper=OfflineWhisperModelConfig(encoder="", decoder="", language="", task="transcribe", tail_paddings=-1), fire_red_asr=OfflineFireRedAsrModelConfig(encoder="", decoder=""), tdnn=OfflineTdnnModelConfig(model=""), zipformer_ctc=OfflineZipformerCtcModelConfig(model=""), wenet_ctc=OfflineWenetCtcModelConfig(model=""), sense_voice=OfflineSenseVoiceModelConfig(model="", language="auto", use_itn=False), moonshine=OfflineMoonshineModelConfig(preprocessor="", encoder="", uncached_decoder="", cached_decoder=""), dolphin=OfflineDolphinModelConfig(model=""), canary=OfflineCanaryModelConfig(encoder="", decoder="", src_lang="", tgt_lang="", use_pnc=True), telespeech_ctc="", tokens="./sherpa-onnx-paraformer-zh-int8-2025-10-07/tokens.txt", num_threads=1, debug=False, provider="cpu", model_type="", modeling_unit="cjkchar", bpe_vocab=""), lm_config=OfflineLMConfig(model="", scale=0.5, lodr_scale=0.01, lodr_fst="", lodr_backoff_id=-1), ctc_fst_decoder_config=OfflineCtcFstDecoderConfig(graph="", max_active=3000), decoding_method="greedy_search", max_active_paths=4, hotwords_file="", hotwords_score=1.5, blank_penalty=0, rule_fsts="", rule_fars="", hr=HomophoneReplacerConfig(lexicon="", rule_fsts=""))
Creating recognizer ...
Started
Done!

sherpa-onnx-paraformer-zh-int8-2025-10-07/test_wavs/11.wav
{"lang": "", "emotion": "", "event": "", "text": "据说有网友坐飞机的时候呢广播全程播报", "timestamps": [], "durations": [], "tokens":["据", "说", "有", "网", "友", "坐", "飞", "机", "的", "时", "候", "呢", "广", "播", "全", "程", "播", "报"], "words": []}
----
num threads: 1
decoding method: greedy_search
Elapsed seconds: 0.294 s
Real time factor (RTF): 0.294 / 3.712 = 0.079

12.wav

Wave filename: 12.wav
Content (ground truth): 将溃疡两周以上都应该及时就医,据了解啊小云平时呢都喜欢吃比较烫的饭菜,也喜欢吃麻辣烫火锅之类的高温食物。

./build/bin/sherpa-onnx-offline \
  --tokens=./sherpa-onnx-paraformer-zh-int8-2025-10-07/tokens.txt \
  --paraformer=./sherpa-onnx-paraformer-zh-int8-2025-10-07/model.int8.onnx \
  --num-threads=1 \
  sherpa-onnx-paraformer-zh-int8-2025-10-07/test_wavs/12.wav
/Users/fangjun/open-source/sherpa-onnx/sherpa-onnx/csrc/parse-options.cc:Read:372 ./build/bin/sherpa-onnx-offline --tokens=./sherpa-onnx-paraformer-zh-int8-2025-10-07/tokens.txt --paraformer=./sherpa-onnx-paraformer-zh-int8-2025-10-07/model.int8.onnx --num-threads=1 sherpa-onnx-paraformer-zh-int8-2025-10-07/test_wavs/12.wav 

OfflineRecognizerConfig(feat_config=FeatureExtractorConfig(sampling_rate=16000, feature_dim=80, low_freq=20, high_freq=-400, dither=0, normalize_samples=True, snip_edges=False), model_config=OfflineModelConfig(transducer=OfflineTransducerModelConfig(encoder_filename="", decoder_filename="", joiner_filename=""), paraformer=OfflineParaformerModelConfig(model="./sherpa-onnx-paraformer-zh-int8-2025-10-07/model.int8.onnx"), nemo_ctc=OfflineNemoEncDecCtcModelConfig(model=""), whisper=OfflineWhisperModelConfig(encoder="", decoder="", language="", task="transcribe", tail_paddings=-1), fire_red_asr=OfflineFireRedAsrModelConfig(encoder="", decoder=""), tdnn=OfflineTdnnModelConfig(model=""), zipformer_ctc=OfflineZipformerCtcModelConfig(model=""), wenet_ctc=OfflineWenetCtcModelConfig(model=""), sense_voice=OfflineSenseVoiceModelConfig(model="", language="auto", use_itn=False), moonshine=OfflineMoonshineModelConfig(preprocessor="", encoder="", uncached_decoder="", cached_decoder=""), dolphin=OfflineDolphinModelConfig(model=""), canary=OfflineCanaryModelConfig(encoder="", decoder="", src_lang="", tgt_lang="", use_pnc=True), telespeech_ctc="", tokens="./sherpa-onnx-paraformer-zh-int8-2025-10-07/tokens.txt", num_threads=1, debug=False, provider="cpu", model_type="", modeling_unit="cjkchar", bpe_vocab=""), lm_config=OfflineLMConfig(model="", scale=0.5, lodr_scale=0.01, lodr_fst="", lodr_backoff_id=-1), ctc_fst_decoder_config=OfflineCtcFstDecoderConfig(graph="", max_active=3000), decoding_method="greedy_search", max_active_paths=4, hotwords_file="", hotwords_score=1.5, blank_penalty=0, rule_fsts="", rule_fars="", hr=HomophoneReplacerConfig(lexicon="", rule_fsts=""))
Creating recognizer ...
Started
Done!

sherpa-onnx-paraformer-zh-int8-2025-10-07/test_wavs/12.wav
{"lang": "", "emotion": "", "event": "", "text": "腔溃疡两周以上都应该及时就医据了解啊小云平时呢都喜欢吃比较烫的饭菜也喜欢吃麻辣烫火锅之类的高温食物", "timestamps": [], "durations": [], "tokens":["腔", "溃", "疡", "两", "周", "以", "上", "都", "应", "该", "及", "时", "就", "医", "据", "了", "解", "啊", "小", "云", "平", "时", "呢", "都", "喜", "欢", "吃", "比", "较", "烫", "的", "饭", "菜", "也", "喜", "欢", "吃", "麻", "辣", "烫", "火", "锅", "之", "类", "的", "高", "温", "食", "物"], "words": []}
----
num threads: 1
decoding method: greedy_search
Elapsed seconds: 0.818 s
Real time factor (RTF): 0.818 / 10.880 = 0.075

13.wav

Wave filename: 13.wav
Content (ground truth): 绝佳好位置好像我被看到了,就问你敢不敢进来吧你,一套带走猪脚亮。

./build/bin/sherpa-onnx-offline \
  --tokens=./sherpa-onnx-paraformer-zh-int8-2025-10-07/tokens.txt \
  --paraformer=./sherpa-onnx-paraformer-zh-int8-2025-10-07/model.int8.onnx \
  --num-threads=1 \
  sherpa-onnx-paraformer-zh-int8-2025-10-07/test_wavs/13.wav
/Users/fangjun/open-source/sherpa-onnx/sherpa-onnx/csrc/parse-options.cc:Read:372 ./build/bin/sherpa-onnx-offline --tokens=./sherpa-onnx-paraformer-zh-int8-2025-10-07/tokens.txt --paraformer=./sherpa-onnx-paraformer-zh-int8-2025-10-07/model.int8.onnx --num-threads=1 sherpa-onnx-paraformer-zh-int8-2025-10-07/test_wavs/13.wav 

OfflineRecognizerConfig(feat_config=FeatureExtractorConfig(sampling_rate=16000, feature_dim=80, low_freq=20, high_freq=-400, dither=0, normalize_samples=True, snip_edges=False), model_config=OfflineModelConfig(transducer=OfflineTransducerModelConfig(encoder_filename="", decoder_filename="", joiner_filename=""), paraformer=OfflineParaformerModelConfig(model="./sherpa-onnx-paraformer-zh-int8-2025-10-07/model.int8.onnx"), nemo_ctc=OfflineNemoEncDecCtcModelConfig(model=""), whisper=OfflineWhisperModelConfig(encoder="", decoder="", language="", task="transcribe", tail_paddings=-1), fire_red_asr=OfflineFireRedAsrModelConfig(encoder="", decoder=""), tdnn=OfflineTdnnModelConfig(model=""), zipformer_ctc=OfflineZipformerCtcModelConfig(model=""), wenet_ctc=OfflineWenetCtcModelConfig(model=""), sense_voice=OfflineSenseVoiceModelConfig(model="", language="auto", use_itn=False), moonshine=OfflineMoonshineModelConfig(preprocessor="", encoder="", uncached_decoder="", cached_decoder=""), dolphin=OfflineDolphinModelConfig(model=""), canary=OfflineCanaryModelConfig(encoder="", decoder="", src_lang="", tgt_lang="", use_pnc=True), telespeech_ctc="", tokens="./sherpa-onnx-paraformer-zh-int8-2025-10-07/tokens.txt", num_threads=1, debug=False, provider="cpu", model_type="", modeling_unit="cjkchar", bpe_vocab=""), lm_config=OfflineLMConfig(model="", scale=0.5, lodr_scale=0.01, lodr_fst="", lodr_backoff_id=-1), ctc_fst_decoder_config=OfflineCtcFstDecoderConfig(graph="", max_active=3000), decoding_method="greedy_search", max_active_paths=4, hotwords_file="", hotwords_score=1.5, blank_penalty=0, rule_fsts="", rule_fars="", hr=HomophoneReplacerConfig(lexicon="", rule_fsts=""))
Creating recognizer ...
Started
Done!

sherpa-onnx-paraformer-zh-int8-2025-10-07/test_wavs/13.wav
{"lang": "", "emotion": "", "event": "", "text": "绝佳好位置好像我被看到了如问你敢不敢进来把你一套带走猪脚亮", "timestamps": [], "durations": [], "tokens":["绝", "佳", "好", "位", "置", "好", "像", "我", "被", "看", "到", "了", "如", "问", "你", "敢", "不", "敢", "进", "来", "把", "你", "一", "套", "带", "走", "猪", "脚", "亮"], "words": []}
----
num threads: 1
decoding method: greedy_search
Elapsed seconds: 0.444 s
Real time factor (RTF): 0.444 / 5.760 = 0.077

14.wav

Wave filename: 14.wav
Content (ground truth): 两岸猿声啼不住,有家难回车里住。

./build/bin/sherpa-onnx-offline \
  --tokens=./sherpa-onnx-paraformer-zh-int8-2025-10-07/tokens.txt \
  --paraformer=./sherpa-onnx-paraformer-zh-int8-2025-10-07/model.int8.onnx \
  --num-threads=1 \
  sherpa-onnx-paraformer-zh-int8-2025-10-07/test_wavs/14.wav
/Users/fangjun/open-source/sherpa-onnx/sherpa-onnx/csrc/parse-options.cc:Read:372 ./build/bin/sherpa-onnx-offline --tokens=./sherpa-onnx-paraformer-zh-int8-2025-10-07/tokens.txt --paraformer=./sherpa-onnx-paraformer-zh-int8-2025-10-07/model.int8.onnx --num-threads=1 sherpa-onnx-paraformer-zh-int8-2025-10-07/test_wavs/14.wav 

OfflineRecognizerConfig(feat_config=FeatureExtractorConfig(sampling_rate=16000, feature_dim=80, low_freq=20, high_freq=-400, dither=0, normalize_samples=True, snip_edges=False), model_config=OfflineModelConfig(transducer=OfflineTransducerModelConfig(encoder_filename="", decoder_filename="", joiner_filename=""), paraformer=OfflineParaformerModelConfig(model="./sherpa-onnx-paraformer-zh-int8-2025-10-07/model.int8.onnx"), nemo_ctc=OfflineNemoEncDecCtcModelConfig(model=""), whisper=OfflineWhisperModelConfig(encoder="", decoder="", language="", task="transcribe", tail_paddings=-1), fire_red_asr=OfflineFireRedAsrModelConfig(encoder="", decoder=""), tdnn=OfflineTdnnModelConfig(model=""), zipformer_ctc=OfflineZipformerCtcModelConfig(model=""), wenet_ctc=OfflineWenetCtcModelConfig(model=""), sense_voice=OfflineSenseVoiceModelConfig(model="", language="auto", use_itn=False), moonshine=OfflineMoonshineModelConfig(preprocessor="", encoder="", uncached_decoder="", cached_decoder=""), dolphin=OfflineDolphinModelConfig(model=""), canary=OfflineCanaryModelConfig(encoder="", decoder="", src_lang="", tgt_lang="", use_pnc=True), telespeech_ctc="", tokens="./sherpa-onnx-paraformer-zh-int8-2025-10-07/tokens.txt", num_threads=1, debug=False, provider="cpu", model_type="", modeling_unit="cjkchar", bpe_vocab=""), lm_config=OfflineLMConfig(model="", scale=0.5, lodr_scale=0.01, lodr_fst="", lodr_backoff_id=-1), ctc_fst_decoder_config=OfflineCtcFstDecoderConfig(graph="", max_active=3000), decoding_method="greedy_search", max_active_paths=4, hotwords_file="", hotwords_score=1.5, blank_penalty=0, rule_fsts="", rule_fars="", hr=HomophoneReplacerConfig(lexicon="", rule_fsts=""))
Creating recognizer ...
Started
Done!

sherpa-onnx-paraformer-zh-int8-2025-10-07/test_wavs/14.wav
{"lang": "", "emotion": "", "event": "", "text": "两岸猿声啼不住有家难回车里住", "timestamps": [], "durations": [], "tokens":["两", "岸", "猿", "声", "啼", "不", "住", "有", "家", "难", "回", "车", "里", "住"], "words": []}
----
num threads: 1
decoding method: greedy_search
Elapsed seconds: 0.418 s
Real time factor (RTF): 0.418 / 5.632 = 0.074

15.wav

Wave filename: 15.wav
Content (ground truth): 杨大人一律就退还会再要求,以关注货币,来补助这个差额,天宝年间杨胜坚转任。

./build/bin/sherpa-onnx-offline \
  --tokens=./sherpa-onnx-paraformer-zh-int8-2025-10-07/tokens.txt \
  --paraformer=./sherpa-onnx-paraformer-zh-int8-2025-10-07/model.int8.onnx \
  --num-threads=1 \
  sherpa-onnx-paraformer-zh-int8-2025-10-07/test_wavs/15.wav
/Users/fangjun/open-source/sherpa-onnx/sherpa-onnx/csrc/parse-options.cc:Read:372 ./build/bin/sherpa-onnx-offline --tokens=./sherpa-onnx-paraformer-zh-int8-2025-10-07/tokens.txt --paraformer=./sherpa-onnx-paraformer-zh-int8-2025-10-07/model.int8.onnx --num-threads=1 sherpa-onnx-paraformer-zh-int8-2025-10-07/test_wavs/15.wav 

OfflineRecognizerConfig(feat_config=FeatureExtractorConfig(sampling_rate=16000, feature_dim=80, low_freq=20, high_freq=-400, dither=0, normalize_samples=True, snip_edges=False), model_config=OfflineModelConfig(transducer=OfflineTransducerModelConfig(encoder_filename="", decoder_filename="", joiner_filename=""), paraformer=OfflineParaformerModelConfig(model="./sherpa-onnx-paraformer-zh-int8-2025-10-07/model.int8.onnx"), nemo_ctc=OfflineNemoEncDecCtcModelConfig(model=""), whisper=OfflineWhisperModelConfig(encoder="", decoder="", language="", task="transcribe", tail_paddings=-1), fire_red_asr=OfflineFireRedAsrModelConfig(encoder="", decoder=""), tdnn=OfflineTdnnModelConfig(model=""), zipformer_ctc=OfflineZipformerCtcModelConfig(model=""), wenet_ctc=OfflineWenetCtcModelConfig(model=""), sense_voice=OfflineSenseVoiceModelConfig(model="", language="auto", use_itn=False), moonshine=OfflineMoonshineModelConfig(preprocessor="", encoder="", uncached_decoder="", cached_decoder=""), dolphin=OfflineDolphinModelConfig(model=""), canary=OfflineCanaryModelConfig(encoder="", decoder="", src_lang="", tgt_lang="", use_pnc=True), telespeech_ctc="", tokens="./sherpa-onnx-paraformer-zh-int8-2025-10-07/tokens.txt", num_threads=1, debug=False, provider="cpu", model_type="", modeling_unit="cjkchar", bpe_vocab=""), lm_config=OfflineLMConfig(model="", scale=0.5, lodr_scale=0.01, lodr_fst="", lodr_backoff_id=-1), ctc_fst_decoder_config=OfflineCtcFstDecoderConfig(graph="", max_active=3000), decoding_method="greedy_search", max_active_paths=4, hotwords_file="", hotwords_score=1.5, blank_penalty=0, rule_fsts="", rule_fars="", hr=HomophoneReplacerConfig(lexicon="", rule_fsts=""))
Creating recognizer ...
Started
Done!

sherpa-onnx-paraformer-zh-int8-2025-10-07/test_wavs/15.wav
{"lang": "", "emotion": "", "event": "", "text": "杨大人一律就退还会者要求以关注和必来补助这个差额天宝年间杨胜坚转任", "timestamps": [], "durations": [], "tokens":["杨", "大", "人", "一", "律", "就", "退", "还", "会", "者", "要", "求", "以", "关", "注", "和", "必", "来", "补", "助", "这", "个", "差", "额", "天", "宝", "年", "间", "杨", "胜", "坚", "转", "任"], "words": []}
----
num threads: 1
decoding method: greedy_search
Elapsed seconds: 0.758 s
Real time factor (RTF): 0.758 / 10.112 = 0.075

16.wav

Wave filename: 16.wav
Content (ground truth): 做钱的速度还快,这真的是,一个经济爆发式增长的时代。

./build/bin/sherpa-onnx-offline \
  --tokens=./sherpa-onnx-paraformer-zh-int8-2025-10-07/tokens.txt \
  --paraformer=./sherpa-onnx-paraformer-zh-int8-2025-10-07/model.int8.onnx \
  --num-threads=1 \
  sherpa-onnx-paraformer-zh-int8-2025-10-07/test_wavs/16.wav
/Users/fangjun/open-source/sherpa-onnx/sherpa-onnx/csrc/parse-options.cc:Read:372 ./build/bin/sherpa-onnx-offline --tokens=./sherpa-onnx-paraformer-zh-int8-2025-10-07/tokens.txt --paraformer=./sherpa-onnx-paraformer-zh-int8-2025-10-07/model.int8.onnx --num-threads=1 sherpa-onnx-paraformer-zh-int8-2025-10-07/test_wavs/16.wav 

OfflineRecognizerConfig(feat_config=FeatureExtractorConfig(sampling_rate=16000, feature_dim=80, low_freq=20, high_freq=-400, dither=0, normalize_samples=True, snip_edges=False), model_config=OfflineModelConfig(transducer=OfflineTransducerModelConfig(encoder_filename="", decoder_filename="", joiner_filename=""), paraformer=OfflineParaformerModelConfig(model="./sherpa-onnx-paraformer-zh-int8-2025-10-07/model.int8.onnx"), nemo_ctc=OfflineNemoEncDecCtcModelConfig(model=""), whisper=OfflineWhisperModelConfig(encoder="", decoder="", language="", task="transcribe", tail_paddings=-1), fire_red_asr=OfflineFireRedAsrModelConfig(encoder="", decoder=""), tdnn=OfflineTdnnModelConfig(model=""), zipformer_ctc=OfflineZipformerCtcModelConfig(model=""), wenet_ctc=OfflineWenetCtcModelConfig(model=""), sense_voice=OfflineSenseVoiceModelConfig(model="", language="auto", use_itn=False), moonshine=OfflineMoonshineModelConfig(preprocessor="", encoder="", uncached_decoder="", cached_decoder=""), dolphin=OfflineDolphinModelConfig(model=""), canary=OfflineCanaryModelConfig(encoder="", decoder="", src_lang="", tgt_lang="", use_pnc=True), telespeech_ctc="", tokens="./sherpa-onnx-paraformer-zh-int8-2025-10-07/tokens.txt", num_threads=1, debug=False, provider="cpu", model_type="", modeling_unit="cjkchar", bpe_vocab=""), lm_config=OfflineLMConfig(model="", scale=0.5, lodr_scale=0.01, lodr_fst="", lodr_backoff_id=-1), ctc_fst_decoder_config=OfflineCtcFstDecoderConfig(graph="", max_active=3000), decoding_method="greedy_search", max_active_paths=4, hotwords_file="", hotwords_score=1.5, blank_penalty=0, rule_fsts="", rule_fars="", hr=HomophoneReplacerConfig(lexicon="", rule_fsts=""))
Creating recognizer ...
Started
Done!

sherpa-onnx-paraformer-zh-int8-2025-10-07/test_wavs/16.wav
{"lang": "", "emotion": "", "event": "", "text": "做钱的速度还快这真的是一个经济爆发式增长的时代", "timestamps": [], "durations": [], "tokens":["做", "钱", "的", "速", "度", "还", "快", "这", "真", "的", "是", "一", "个", "经", "济", "爆", "发", "式", "增", "长", "的", "时", "代"], "words": []}
----
num threads: 1
decoding method: greedy_search
Elapsed seconds: 0.428 s
Real time factor (RTF): 0.428 / 5.632 = 0.076
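
The files above were decoded one at a time. sherpa-onnx-offline also accepts several wave files in a single invocation (the trilingual example later on this page does exactly that), so you can decode the whole test set at once:

# decode all 16 test files in one run
./build/bin/sherpa-onnx-offline \
  --tokens=./sherpa-onnx-paraformer-zh-int8-2025-10-07/tokens.txt \
  --paraformer=./sherpa-onnx-paraformer-zh-int8-2025-10-07/model.int8.onnx \
  --num-threads=1 \
  ./sherpa-onnx-paraformer-zh-int8-2025-10-07/test_wavs/*.wav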

Speech recognition from a microphone

./build/bin/sherpa-onnx-microphone-offline \
  --tokens=./sherpa-onnx-paraformer-zh-int8-2025-10-07/tokens.txt \
  --paraformer=./sherpa-onnx-paraformer-zh-int8-2025-10-07/model.int8.onnx

Speech recognition from a microphone with VAD

wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/silero_vad.onnx

./build/bin/sherpa-onnx-vad-microphone-offline-asr \
  --silero-vad-model=./silero_vad.onnx \
  --tokens=./sherpa-onnx-paraformer-zh-int8-2025-10-07/tokens.txt \
  --paraformer=./sherpa-onnx-paraformer-zh-int8-2025-10-07/model.int8.onnx

Real-time speech recognition from a microphone with VAD

wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/silero_vad.onnx

./build/bin/sherpa-onnx-vad-microphone-simulated-streaming-asr \
  --silero-vad-model=./silero_vad.onnx \
  --tokens=./sherpa-onnx-paraformer-zh-int8-2025-10-07/tokens.txt \
  --paraformer=./sherpa-onnx-paraformer-zh-int8-2025-10-07/model.int8.onnx

csukuangfj/sherpa-onnx-paraformer-trilingual-zh-cantonese-en (Chinese + English + Cantonese 粤语)

Note

This model does not support timestamps. It is a trilingual model supporting Chinese, Cantonese, and English (it handles Mandarin, Cantonese, and dialects such as Henan, Tianjin, and Sichuan dialects).

This model is converted from

https://www.modelscope.cn/models/dengcunqin/speech_seaco_paraformer_large_asr_nat-zh-cantonese-en-16k-common-vocab11666-pytorch/summary

In the following, we describe how to download it and use it with sherpa-onnx.

Please use the following commands to download it.

cd /path/to/sherpa-onnx
wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/sherpa-onnx-paraformer-trilingual-zh-cantonese-en.tar.bz2

tar xvf sherpa-onnx-paraformer-trilingual-zh-cantonese-en.tar.bz2
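
As with the Sichuanese model above, the archive can optionally be removed after extraction:

# optional: free disk space once the files are extracted
rm sherpa-onnx-paraformer-trilingual-zh-cantonese-en.tar.bz2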

Please check that the file sizes of the pre-trained models are correct. The expected sizes of the *.onnx files are shown below.

sherpa-onnx-paraformer-trilingual-zh-cantonese-en$ ls -lh *.onnx

-rw-r--r-- 1 1001 127 234M Mar 10 02:12 model.int8.onnx
-rw-r--r-- 1 1001 127 831M Mar 10 02:12 model.onnx

Hint

It supports decoding only single-channel wave files with 16-bit encoded samples; the sampling rate, however, does not need to be 16 kHz.
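
If your recordings are not in that format, you can convert them first with a tool such as ffmpeg. This is only a sketch; input.mp3 and output.wav are placeholder names:

# convert to a single-channel, 16-bit PCM wave file
ffmpeg -i input.mp3 -ac 1 -c:a pcm_s16le output.wav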

fp32

The following code shows how to use fp32 models to decode wave files:

cd /path/to/sherpa-onnx

./build/bin/sherpa-onnx-offline \
  --tokens=./sherpa-onnx-paraformer-trilingual-zh-cantonese-en/tokens.txt \
  --paraformer=./sherpa-onnx-paraformer-trilingual-zh-cantonese-en/model.onnx \
  ./sherpa-onnx-paraformer-trilingual-zh-cantonese-en/test_wavs/1.wav \
  ./sherpa-onnx-paraformer-trilingual-zh-cantonese-en/test_wavs/2.wav \
  ./sherpa-onnx-paraformer-trilingual-zh-cantonese-en/test_wavs/3-sichuan.wav \
  ./sherpa-onnx-paraformer-trilingual-zh-cantonese-en/test_wavs/4-tianjin.wav \
  ./sherpa-onnx-paraformer-trilingual-zh-cantonese-en/test_wavs/5-henan.wav \
  ./sherpa-onnx-paraformer-trilingual-zh-cantonese-en/test_wavs/6-zh-en.wav

Note

Please use ./build/bin/Release/sherpa-onnx-offline.exe for Windows.

Caution

If you use Windows and get encoding issues, please run:

CHCP 65001

in your command line.

You should see the following output:

/project/sherpa-onnx/csrc/parse-options.cc:Read:361 sherpa-onnx-offline --tokens=./sherpa-onnx-paraformer-trilingual-zh-cantonese-en/tokens.txt --paraformer=./sherpa-onnx-paraformer-trilingual-zh-cantonese-en/model.onnx ./sherpa-onnx-paraformer-trilingual-zh-cantonese-en/test_wavs/1.wav ./sherpa-onnx-paraformer-trilingual-zh-cantonese-en/test_wavs/2.wav ./sherpa-onnx-paraformer-trilingual-zh-cantonese-en/test_wavs/3-sichuan.wav ./sherpa-onnx-paraformer-trilingual-zh-cantonese-en/test_wavs/4-tianjin.wav ./sherpa-onnx-paraformer-trilingual-zh-cantonese-en/test_wavs/5-henan.wav ./sherpa-onnx-paraformer-trilingual-zh-cantonese-en/test_wavs/6-zh-en.wav 

OfflineRecognizerConfig(feat_config=OfflineFeatureExtractorConfig(sampling_rate=16000, feature_dim=80), model_config=OfflineModelConfig(transducer=OfflineTransducerModelConfig(encoder_filename="", decoder_filename="", joiner_filename=""), paraformer=OfflineParaformerModelConfig(model="./sherpa-onnx-paraformer-trilingual-zh-cantonese-en/model.onnx"), nemo_ctc=OfflineNemoEncDecCtcModelConfig(model=""), whisper=OfflineWhisperModelConfig(encoder="", decoder="", language="", task="transcribe", tail_paddings=-1), tdnn=OfflineTdnnModelConfig(model=""), zipformer_ctc=OfflineZipformerCtcModelConfig(model=""), wenet_ctc=OfflineWenetCtcModelConfig(model=""), tokens="./sherpa-onnx-paraformer-trilingual-zh-cantonese-en/tokens.txt", num_threads=2, debug=False, provider="cpu", model_type=""), lm_config=OfflineLMConfig(model="", scale=0.5), ctc_fst_decoder_config=OfflineCtcFstDecoderConfig(graph="", max_active=3000), decoding_method="greedy_search", max_active_paths=4, hotwords_file="", hotwords_score=1.5, blank_penalty=0)
Creating recognizer ...
Started
/project/sherpa-onnx/csrc/offline-paraformer-greedy-search-decoder.cc:Decode:65 time stamp for batch: 0, 13 vs -1
/project/sherpa-onnx/csrc/offline-paraformer-greedy-search-decoder.cc:Decode:65 time stamp for batch: 1, 15 vs -1
/project/sherpa-onnx/csrc/offline-paraformer-greedy-search-decoder.cc:Decode:65 time stamp for batch: 2, 40 vs -1
/project/sherpa-onnx/csrc/offline-paraformer-greedy-search-decoder.cc:Decode:65 time stamp for batch: 3, 41 vs -1
/project/sherpa-onnx/csrc/offline-paraformer-greedy-search-decoder.cc:Decode:65 time stamp for batch: 4, 37 vs -1
/project/sherpa-onnx/csrc/offline-paraformer-greedy-search-decoder.cc:Decode:65 time stamp for batch: 5, 16 vs -1
Done!

./sherpa-onnx-paraformer-trilingual-zh-cantonese-en/test_wavs/1.wav
{"text": "有无人知道湾仔活道系点去㗎", "timestamps": [], "tokens":["有", "无", "人", "知", "道", "湾", "仔", "活", "道", "系", "点", "去", "㗎"]}
----
./sherpa-onnx-paraformer-trilingual-zh-cantonese-en/test_wavs/2.wav
{"text": "我喺黄大仙九龙塘联合道荡失路啊", "timestamps": [], "tokens":["我", "喺", "黄", "大", "仙", "九", "龙", "塘", "联", "合", "道", "荡", "失", "路", "啊"]}
----
./sherpa-onnx-paraformer-trilingual-zh-cantonese-en/test_wavs/3-sichuan.wav
{"text": "自己就是在那个在那个就是在情节里面就是感觉是演得特别好就是好像很真实一样你知道吧", "timestamps": [], "tokens":["自", "己", "就", "是", "在", "那", "个", "在", "那", "个", "就", "是", "在", "情", "节", "里", "面", "就", "是", "感", "觉", "是", "演", "得", "特", "别", "好", "就", "是", "好", "像", "很", "真", "实", "一", "样", "你", "知", "道", "吧"]}
----
./sherpa-onnx-paraformer-trilingual-zh-cantonese-en/test_wavs/4-tianjin.wav
{"text": "其实他就是怕每个人都可以守法就这意思法律意识太单薄了而且就是嗯也不顾及到别人的感受", "timestamps": [], "tokens":["其", "实", "他", "就", "是", "怕", "每", "个", "人", "都", "可", "以", "守", "法", "就", "这", "意", "思", "法", "律", "意", "识", "太", "单", "薄", "了", "而", "且", "就", "是", "嗯", "也", "不", "顾", "及", "到", "别", "人", "的", "感", "受"]}
----
./sherpa-onnx-paraformer-trilingual-zh-cantonese-en/test_wavs/5-henan.wav
{"text": "它这个管一下都通到有时候都通到七八层楼高然后它这管一下就可以浇到那那柱子上", "timestamps": [], "tokens":["它", "这", "个", "管", "一", "下", "都", "通", "到", "有", "时", "候", "都", "通", "到", "七", "八", "层", "楼", "高", "然", "后", "它", "这", "管", "一", "下", "就", "可", "以", "浇", "到", "那", "那", "柱", "子", "上"]}
----
./sherpa-onnx-paraformer-trilingual-zh-cantonese-en/test_wavs/6-zh-en.wav
{"text": " yesterday was 星期一 today is tuesday 明天是星期三", "timestamps": [], "tokens":["yesterday", "was", "星", "期", "一", "today", "is", "tu@@", "es@@", "day", "明", "天", "是", "星", "期", "三"]}
----
num threads: 2
decoding method: greedy_search
Elapsed seconds: 6.871 s
Real time factor (RTF): 6.871 / 42.054 = 0.163

int8

The following code shows how to use int8 models to decode wave files:

cd /path/to/sherpa-onnx

./build/bin/sherpa-onnx-offline \
  --tokens=./sherpa-onnx-paraformer-trilingual-zh-cantonese-en/tokens.txt \
  --paraformer=./sherpa-onnx-paraformer-trilingual-zh-cantonese-en/model.int8.onnx \
  ./sherpa-onnx-paraformer-trilingual-zh-cantonese-en/test_wavs/1.wav \
  ./sherpa-onnx-paraformer-trilingual-zh-cantonese-en/test_wavs/2.wav \
  ./sherpa-onnx-paraformer-trilingual-zh-cantonese-en/test_wavs/3-sichuan.wav \
  ./sherpa-onnx-paraformer-trilingual-zh-cantonese-en/test_wavs/4-tianjin.wav \
  ./sherpa-onnx-paraformer-trilingual-zh-cantonese-en/test_wavs/5-henan.wav \
  ./sherpa-onnx-paraformer-trilingual-zh-cantonese-en/test_wavs/6-zh-en.wav

Note

Please use ./build/bin/Release/sherpa-onnx-offline.exe for Windows.

Caution

If you use Windows and get encoding issues, please run:

CHCP 65001

in your command line.

You should see the following output:

/project/sherpa-onnx/csrc/parse-options.cc:Read:361 sherpa-onnx-offline --tokens=./sherpa-onnx-paraformer-trilingual-zh-cantonese-en/tokens.txt --paraformer=./sherpa-onnx-paraformer-trilingual-zh-cantonese-en/model.int8.onnx ./sherpa-onnx-paraformer-trilingual-zh-cantonese-en/test_wavs/1.wav ./sherpa-onnx-paraformer-trilingual-zh-cantonese-en/test_wavs/2.wav ./sherpa-onnx-paraformer-trilingual-zh-cantonese-en/test_wavs/3-sichuan.wav ./sherpa-onnx-paraformer-trilingual-zh-cantonese-en/test_wavs/4-tianjin.wav ./sherpa-onnx-paraformer-trilingual-zh-cantonese-en/test_wavs/5-henan.wav ./sherpa-onnx-paraformer-trilingual-zh-cantonese-en/test_wavs/6-zh-en.wav 

OfflineRecognizerConfig(feat_config=OfflineFeatureExtractorConfig(sampling_rate=16000, feature_dim=80), model_config=OfflineModelConfig(transducer=OfflineTransducerModelConfig(encoder_filename="", decoder_filename="", joiner_filename=""), paraformer=OfflineParaformerModelConfig(model="./sherpa-onnx-paraformer-trilingual-zh-cantonese-en/model.int8.onnx"), nemo_ctc=OfflineNemoEncDecCtcModelConfig(model=""), whisper=OfflineWhisperModelConfig(encoder="", decoder="", language="", task="transcribe", tail_paddings=-1), tdnn=OfflineTdnnModelConfig(model=""), zipformer_ctc=OfflineZipformerCtcModelConfig(model=""), wenet_ctc=OfflineWenetCtcModelConfig(model=""), tokens="./sherpa-onnx-paraformer-trilingual-zh-cantonese-en/tokens.txt", num_threads=2, debug=False, provider="cpu", model_type=""), lm_config=OfflineLMConfig(model="", scale=0.5), ctc_fst_decoder_config=OfflineCtcFstDecoderConfig(graph="", max_active=3000), decoding_method="greedy_search", max_active_paths=4, hotwords_file="", hotwords_score=1.5, blank_penalty=0)
Creating recognizer ...
Started
/project/sherpa-onnx/csrc/offline-paraformer-greedy-search-decoder.cc:Decode:65 time stamp for batch: 0, 13 vs -1
/project/sherpa-onnx/csrc/offline-paraformer-greedy-search-decoder.cc:Decode:65 time stamp for batch: 1, 15 vs -1
/project/sherpa-onnx/csrc/offline-paraformer-greedy-search-decoder.cc:Decode:65 time stamp for batch: 2, 40 vs -1
/project/sherpa-onnx/csrc/offline-paraformer-greedy-search-decoder.cc:Decode:65 time stamp for batch: 3, 41 vs -1
/project/sherpa-onnx/csrc/offline-paraformer-greedy-search-decoder.cc:Decode:65 time stamp for batch: 4, 37 vs -1
/project/sherpa-onnx/csrc/offline-paraformer-greedy-search-decoder.cc:Decode:65 time stamp for batch: 5, 16 vs -1
Done!

./sherpa-onnx-paraformer-trilingual-zh-cantonese-en/test_wavs/1.wav
{"text": "有无人知道湾仔活道系点去㗎", "timestamps": [], "tokens":["有", "无", "人", "知", "道", "湾", "仔", "活", "道", "系", "点", "去", "㗎"]}
----
./sherpa-onnx-paraformer-trilingual-zh-cantonese-en/test_wavs/2.wav
{"text": "我喺黄大仙九龙塘联合道荡失路啊", "timestamps": [], "tokens":["我", "喺", "黄", "大", "仙", "九", "龙", "塘", "联", "合", "道", "荡", "失", "路", "啊"]}
----
./sherpa-onnx-paraformer-trilingual-zh-cantonese-en/test_wavs/3-sichuan.wav
{"text": "自己就是在那个在那个就是在情节里面就是感觉是演得特别好就是好像很真实一样你知道吧", "timestamps": [], "tokens":["自", "己", "就", "是", "在", "那", "个", "在", "那", "个", "就", "是", "在", "情", "节", "里", "面", "就", "是", "感", "觉", "是", "演", "得", "特", "别", "好", "就", "是", "好", "像", "很", "真", "实", "一", "样", "你", "知", "道", "吧"]}
----
./sherpa-onnx-paraformer-trilingual-zh-cantonese-en/test_wavs/4-tianjin.wav
{"text": "其实他就是怕每个人都可以守法就这意思法律意识太单薄了而且就是嗯也不顾及到别人的感受", "timestamps": [], "tokens":["其", "实", "他", "就", "是", "怕", "每", "个", "人", "都", "可", "以", "守", "法", "就", "这", "意", "思", "法", "律", "意", "识", "太", "单", "薄", "了", "而", "且", "就", "是", "嗯", "也", "不", "顾", "及", "到", "别", "人", "的", "感", "受"]}
----
./sherpa-onnx-paraformer-trilingual-zh-cantonese-en/test_wavs/5-henan.wav
{"text": "它这个管一下都通到有时候都通到七八层楼高然后它这管一下就可以浇到那那柱子上", "timestamps": [], "tokens":["它", "这", "个", "管", "一", "下", "都", "通", "到", "有", "时", "候", "都", "通", "到", "七", "八", "层", "楼", "高", "然", "后", "它", "这", "管", "一", "下", "就", "可", "以", "浇", "到", "那", "那", "柱", "子", "上"]}
----
./sherpa-onnx-paraformer-trilingual-zh-cantonese-en/test_wavs/6-zh-en.wav
{"text": " yesterday was 星期一 today is tuesday 明天是星期三", "timestamps": [], "tokens":["yesterday", "was", "星", "期", "一", "today", "is", "tu@@", "es@@", "day", "明", "天", "是", "星", "期", "三"]}
----
num threads: 2
decoding method: greedy_search
Elapsed seconds: 6.290 s
Real time factor (RTF): 6.290 / 42.054 = 0.150
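If you prefer to call the model from Python instead of the pre-built executables, the following is a minimal sketch using the sherpa-onnx Python API (pip install sherpa-onnx). The paths are the ones used in the commands above; the read_wave helper is our own and assumes a single-channel, 16-bit wave file.

# A minimal sketch: decode one wave file with the trilingual paraformer model
# using the sherpa-onnx Python API.
import wave

import numpy as np
import sherpa_onnx


def read_wave(filename):
    # Read a single-channel, 16-bit wave file and return (samples, sample_rate),
    # with samples converted to float32 in [-1, 1] as expected by accept_waveform().
    with wave.open(filename, "rb") as f:
        assert f.getnchannels() == 1, "only single-channel files are supported"
        assert f.getsampwidth() == 2, "only 16-bit samples are supported"
        samples = np.frombuffer(f.readframes(f.getnframes()), dtype=np.int16)
        return samples.astype(np.float32) / 32768, f.getframerate()


recognizer = sherpa_onnx.OfflineRecognizer.from_paraformer(
    paraformer="./sherpa-onnx-paraformer-trilingual-zh-cantonese-en/model.int8.onnx",
    tokens="./sherpa-onnx-paraformer-trilingual-zh-cantonese-en/tokens.txt",
    num_threads=2,
)

samples, sample_rate = read_wave(
    "./sherpa-onnx-paraformer-trilingual-zh-cantonese-en/test_wavs/1.wav"
)
stream = recognizer.create_stream()
stream.accept_waveform(sample_rate, samples)
recognizer.decode_stream(stream)
print(stream.result.text)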
Speech recognition from a microphone

To run speech recognition from a microphone with this model, use:

cd /path/to/sherpa-onnx

./build/bin/sherpa-onnx-microphone-offline \
  --tokens=./sherpa-onnx-paraformer-trilingual-zh-cantonese-en/tokens.txt \
  --paraformer=./sherpa-onnx-paraformer-trilingual-zh-cantonese-en/model.int8.onnx
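If you would rather capture the microphone from Python, the sketch below records a fixed number of seconds with the third-party sounddevice package (our choice for this example, not something sherpa-onnx requires) and decodes the recording with the same recognizer as above.

# A minimal sketch: record 5 seconds from the default microphone and decode it.
# sounddevice (pip install sounddevice) is used only to obtain float32 samples.
import sounddevice as sd
import sherpa_onnx

sample_rate = 16000
seconds = 5

recognizer = sherpa_onnx.OfflineRecognizer.from_paraformer(
    paraformer="./sherpa-onnx-paraformer-trilingual-zh-cantonese-en/model.int8.onnx",
    tokens="./sherpa-onnx-paraformer-trilingual-zh-cantonese-en/tokens.txt",
    num_threads=2,
)

print("Recording ...")
samples = sd.rec(seconds * sample_rate, samplerate=sample_rate, channels=1, dtype="float32")
sd.wait()  # block until the recording has finished

stream = recognizer.create_stream()
stream.accept_waveform(sample_rate, samples.reshape(-1))
recognizer.decode_stream(stream)
print(stream.result.text)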

csukuangfj/sherpa-onnx-paraformer-en-2024-03-09 (English)

Note

This model does not support timestamps. It supports only English.

This model is converted from

https://www.modelscope.cn/models/iic/speech_paraformer_asr-en-16k-vocab4199-pytorch/summary

In the following, we describe how to download it and use it with sherpa-onnx.

Please use the following commands to download it.

cd /path/to/sherpa-onnx
wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/sherpa-onnx-paraformer-en-2024-03-09.tar.bz2

tar xvf sherpa-onnx-paraformer-en-2024-03-09.tar.bz2

Please check that the file sizes of the downloaded pre-trained models match the *.onnx file sizes shown below.

sherpa-onnx-paraformer-en-2024-03-09$ ls -lh *.onnx

-rw-r--r-- 1 1001 127 220M Mar 10 02:12 model.int8.onnx
-rw-r--r-- 1 1001 127 817M Mar 10 02:12 model.onnx

Hint

It supports decoding only single-channel wave files with 16-bit encoded samples. The sampling rate, however, does not need to be 16 kHz.
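If you are not sure whether a wave file meets these requirements, you can inspect its header with Python's standard-library wave module; the file name below is just an example.

# Check that a wave file is single-channel with 16-bit samples.
import wave

with wave.open("./sherpa-onnx-paraformer-en-2024-03-09/test_wavs/8k.wav", "rb") as f:
    print("channels       :", f.getnchannels())      # must be 1
    print("bits per sample:", 8 * f.getsampwidth())  # must be 16
    print("sample rate    :", f.getframerate())      # need not be 16000; resampled if necessary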

fp32

The following code shows how to use fp32 models to decode wave files:

cd /path/to/sherpa-onnx

./build/bin/sherpa-onnx-offline \
  --tokens=./sherpa-onnx-paraformer-en-2024-03-09/tokens.txt \
  --paraformer=./sherpa-onnx-paraformer-en-2024-03-09/model.onnx \
  ./sherpa-onnx-paraformer-en-2024-03-09/test_wavs/0.wav \
  ./sherpa-onnx-paraformer-en-2024-03-09/test_wavs/1.wav \
  ./sherpa-onnx-paraformer-en-2024-03-09/test_wavs/8k.wav

Note

Please use ./build/bin/Release/sherpa-onnx-offline.exe for Windows.

You should see the following output:

/project/sherpa-onnx/csrc/parse-options.cc:Read:361 sherpa-onnx-offline --tokens=./sherpa-onnx-paraformer-en-2024-03-09/tokens.txt --paraformer=./sherpa-onnx-paraformer-en-2024-03-09/model.onnx ./sherpa-onnx-paraformer-en-2024-03-09/test_wavs/0.wav ./sherpa-onnx-paraformer-en-2024-03-09/test_wavs/1.wav ./sherpa-onnx-paraformer-en-2024-03-09/test_wavs/8k.wav 

OfflineRecognizerConfig(feat_config=OfflineFeatureExtractorConfig(sampling_rate=16000, feature_dim=80), model_config=OfflineModelConfig(transducer=OfflineTransducerModelConfig(encoder_filename="", decoder_filename="", joiner_filename=""), paraformer=OfflineParaformerModelConfig(model="./sherpa-onnx-paraformer-en-2024-03-09/model.onnx"), nemo_ctc=OfflineNemoEncDecCtcModelConfig(model=""), whisper=OfflineWhisperModelConfig(encoder="", decoder="", language="", task="transcribe", tail_paddings=-1), tdnn=OfflineTdnnModelConfig(model=""), zipformer_ctc=OfflineZipformerCtcModelConfig(model=""), wenet_ctc=OfflineWenetCtcModelConfig(model=""), tokens="./sherpa-onnx-paraformer-en-2024-03-09/tokens.txt", num_threads=2, debug=False, provider="cpu", model_type=""), lm_config=OfflineLMConfig(model="", scale=0.5), ctc_fst_decoder_config=OfflineCtcFstDecoderConfig(graph="", max_active=3000), decoding_method="greedy_search", max_active_paths=4, hotwords_file="", hotwords_score=1.5, blank_penalty=0)
Creating recognizer ...
Started
/project/sherpa-onnx/csrc/offline-stream.cc:AcceptWaveformImpl:119 Creating a resampler:
   in_sample_rate: 8000
   output_sample_rate: 16000

Done!

./sherpa-onnx-paraformer-en-2024-03-09/test_wavs/0.wav
{"text": " after early nightfall the yellow lamps would light up here and there the squalid quarter of the brothels", "timestamps": [], "tokens":["after", "early", "ni@@", "ght@@", "fall", "the", "yel@@", "low", "la@@", "mp@@", "s", "would", "light", "up", "here", "and", "there", "the", "squ@@", "al@@", "id", "quarter", "of", "the", "bro@@", "the@@", "ls"]}
----
./sherpa-onnx-paraformer-en-2024-03-09/test_wavs/1.wav
{"text": " god as a direct consequence of the sin which man thus punished had given her a lovely child whose place was 'on' that same dishonoured bosom to connect her parent for ever with the race and descent of mortals and to be finally a blessed soul in heaven", "timestamps": [], "tokens":["god", "as", "a", "direct", "con@@", "sequence", "of", "the", "sin", "which", "man", "thus", "p@@", "uni@@", "shed", "had", "given", "her", "a", "lo@@", "vely", "child", "whose", "place", "was", "'on'", "that", "same", "di@@", "sh@@", "on@@", "ou@@", "red", "bo@@", "so@@", "m", "to", "connect", "her", "paren@@", "t", "for", "ever", "with", "the", "race", "and", "des@@", "cent", "of", "mor@@", "tal@@", "s", "and", "to", "be", "finally", "a", "bl@@", "essed", "soul", "in", "hea@@", "ven"]}
----
./sherpa-onnx-paraformer-en-2024-03-09/test_wavs/8k.wav
{"text": " yet these thoughts affected hester prynne less with hope than apprehension", "timestamps": [], "tokens":["yet", "these", "thoughts", "aff@@", "ected", "he@@", "ster", "pr@@", "y@@", "n@@", "ne", "less", "with", "hope", "than", "ap@@", "pre@@", "hen@@", "sion"]}
----
num threads: 2
decoding method: greedy_search
Elapsed seconds: 7.173 s
Real time factor (RTF): 7.173 / 28.165 = 0.255

int8

The following code shows how to use int8 models to decode wave files:

cd /path/to/sherpa-onnx

./build/bin/sherpa-onnx-offline \
  --tokens=./sherpa-onnx-paraformer-en-2024-03-09/tokens.txt \
  --paraformer=./sherpa-onnx-paraformer-en-2024-03-09/model.int8.onnx \
  ./sherpa-onnx-paraformer-en-2024-03-09/test_wavs/0.wav \
  ./sherpa-onnx-paraformer-en-2024-03-09/test_wavs/1.wav \
  ./sherpa-onnx-paraformer-en-2024-03-09/test_wavs/8k.wav

Note

Please use ./build/bin/Release/sherpa-onnx-offline.exe for Windows.

You should see the following output:

/project/sherpa-onnx/csrc/parse-options.cc:Read:361 sherpa-onnx-offline --tokens=./sherpa-onnx-paraformer-en-2024-03-09/tokens.txt --paraformer=./sherpa-onnx-paraformer-en-2024-03-09/model.int8.onnx ./sherpa-onnx-paraformer-en-2024-03-09/test_wavs/0.wav ./sherpa-onnx-paraformer-en-2024-03-09/test_wavs/1.wav ./sherpa-onnx-paraformer-en-2024-03-09/test_wavs/8k.wav 

OfflineRecognizerConfig(feat_config=OfflineFeatureExtractorConfig(sampling_rate=16000, feature_dim=80), model_config=OfflineModelConfig(transducer=OfflineTransducerModelConfig(encoder_filename="", decoder_filename="", joiner_filename=""), paraformer=OfflineParaformerModelConfig(model="./sherpa-onnx-paraformer-en-2024-03-09/model.int8.onnx"), nemo_ctc=OfflineNemoEncDecCtcModelConfig(model=""), whisper=OfflineWhisperModelConfig(encoder="", decoder="", language="", task="transcribe", tail_paddings=-1), tdnn=OfflineTdnnModelConfig(model=""), zipformer_ctc=OfflineZipformerCtcModelConfig(model=""), wenet_ctc=OfflineWenetCtcModelConfig(model=""), tokens="./sherpa-onnx-paraformer-en-2024-03-09/tokens.txt", num_threads=2, debug=False, provider="cpu", model_type=""), lm_config=OfflineLMConfig(model="", scale=0.5), ctc_fst_decoder_config=OfflineCtcFstDecoderConfig(graph="", max_active=3000), decoding_method="greedy_search", max_active_paths=4, hotwords_file="", hotwords_score=1.5, blank_penalty=0)
Creating recognizer ...
Started
/project/sherpa-onnx/csrc/offline-stream.cc:AcceptWaveformImpl:119 Creating a resampler:
   in_sample_rate: 8000
   output_sample_rate: 16000

Done!

./sherpa-onnx-paraformer-en-2024-03-09/test_wavs/0.wav
{"text": " after early nightfall the yellow lamps would light up here and there the squalid quarter of the brothels", "timestamps": [], "tokens":["after", "early", "ni@@", "ght@@", "fall", "the", "yel@@", "low", "la@@", "mp@@", "s", "would", "light", "up", "here", "and", "there", "the", "squ@@", "al@@", "id", "quarter", "of", "the", "bro@@", "the@@", "ls"]}
----
./sherpa-onnx-paraformer-en-2024-03-09/test_wavs/1.wav
{"text": " god as a direct consequence of the sin which man thus punished had given her a lovely child whose place was 'on' that same dishonoured bosom to connect her parent for ever with the race and descent of mortals and to be finally a blessed soul in heaven", "timestamps": [], "tokens":["god", "as", "a", "direct", "con@@", "sequence", "of", "the", "sin", "which", "man", "thus", "p@@", "uni@@", "shed", "had", "given", "her", "a", "lo@@", "vely", "child", "whose", "place", "was", "'on'", "that", "same", "di@@", "sh@@", "on@@", "ou@@", "red", "bo@@", "so@@", "m", "to", "connect", "her", "paren@@", "t", "for", "ever", "with", "the", "race", "and", "des@@", "cent", "of", "mor@@", "tal@@", "s", "and", "to", "be", "finally", "a", "bl@@", "essed", "soul", "in", "hea@@", "ven"]}
----
./sherpa-onnx-paraformer-en-2024-03-09/test_wavs/8k.wav
{"text": " yet these thoughts affected hester prynne less with hope than apprehension", "timestamps": [], "tokens":["yet", "these", "thoughts", "aff@@", "ected", "he@@", "ster", "pr@@", "y@@", "n@@", "ne", "less", "with", "hope", "than", "ap@@", "pre@@", "hen@@", "sion"]}
----
num threads: 2
decoding method: greedy_search
Elapsed seconds: 5.492 s
Real time factor (RTF): 5.492 / 28.165 = 0.195
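The commands above pass several files at once and they are decoded as a batch. With the Python API the same can be done via decode_streams(); the following is a minimal sketch with a compact read_wave helper of our own, using the paths of this English model.

# A minimal sketch: decode several wave files in one batch with decode_streams().
import wave

import numpy as np
import sherpa_onnx


def read_wave(filename):
    # Single-channel, 16-bit wave file -> (float32 samples in [-1, 1], sample rate).
    with wave.open(filename, "rb") as f:
        samples = np.frombuffer(f.readframes(f.getnframes()), dtype=np.int16)
        return samples.astype(np.float32) / 32768, f.getframerate()


recognizer = sherpa_onnx.OfflineRecognizer.from_paraformer(
    paraformer="./sherpa-onnx-paraformer-en-2024-03-09/model.int8.onnx",
    tokens="./sherpa-onnx-paraformer-en-2024-03-09/tokens.txt",
    num_threads=2,
)

files = [
    "./sherpa-onnx-paraformer-en-2024-03-09/test_wavs/0.wav",
    "./sherpa-onnx-paraformer-en-2024-03-09/test_wavs/1.wav",
    "./sherpa-onnx-paraformer-en-2024-03-09/test_wavs/8k.wav",
]

streams = []
for name in files:
    samples, sample_rate = read_wave(name)
    s = recognizer.create_stream()
    s.accept_waveform(sample_rate, samples)
    streams.append(s)

recognizer.decode_streams(streams)  # decode all streams in a single batch
for name, s in zip(files, streams):
    print(name, "->", s.result.text)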
Speech recognition from a microphone

To run speech recognition from a microphone with this model, use:

cd /path/to/sherpa-onnx

./build/bin/sherpa-onnx-microphone-offline \
  --tokens=./sherpa-onnx-paraformer-en-2024-03-09/tokens.txt \
  --paraformer=./sherpa-onnx-paraformer-en-2024-03-09/model.int8.onnx

csukuangfj/sherpa-onnx-paraformer-zh-small-2024-03-09 (Chinese + English)

Note

This model does not support timestamps. It is a bilingual model supporting both Chinese and English, including Mandarin and dialects such as the Henan, Tianjin, and Sichuan dialects.

This model is converted from

https://www.modelscope.cn/models/crazyant/speech_paraformer_asr_nat-zh-cn-16k-common-vocab8358-onnx/summary

In the following, we describe how to download it and use it with sherpa-onnx.

Please use the following commands to download it.

cd /path/to/sherpa-onnx
wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/sherpa-onnx-paraformer-zh-small-2024-03-09.tar.bz2

tar xvf sherpa-onnx-paraformer-zh-small-2024-03-09.tar.bz2

Please check that the file sizes of the downloaded pre-trained models match the *.onnx file sizes shown below.

sherpa-onnx-paraformer-zh-small-2024-03-09$ ls -lh *.onnx

-rw-r--r-- 1 1001 127 79M Mar 10 00:48 model.int8.onnx

Hint

It supports decoding only single-channel wave files with 16-bit encoded samples. The sampling rate, however, does not need to be 16 kHz.

int8

The following code shows how to use int8 models to decode wave files:

cd /path/to/sherpa-onnx

./build/bin/sherpa-onnx-offline \
  --tokens=./sherpa-onnx-paraformer-zh-small-2024-03-09/tokens.txt \
  --paraformer=./sherpa-onnx-paraformer-zh-small-2024-03-09/model.int8.onnx \
  ./sherpa-onnx-paraformer-zh-small-2024-03-09/test_wavs/0.wav \
  ./sherpa-onnx-paraformer-zh-small-2024-03-09/test_wavs/1.wav \
  ./sherpa-onnx-paraformer-zh-small-2024-03-09/test_wavs/8k.wav \
  ./sherpa-onnx-paraformer-zh-small-2024-03-09/test_wavs/2-zh-en.wav \
  ./sherpa-onnx-paraformer-zh-small-2024-03-09/test_wavs/3-sichuan.wav \
  ./sherpa-onnx-paraformer-zh-small-2024-03-09/test_wavs/4-tianjin.wav \
  ./sherpa-onnx-paraformer-zh-small-2024-03-09/test_wavs/5-henan.wav

Note

Please use ./build/bin/Release/sherpa-onnx-offline.exe for Windows.

Caution

If you use Windows and get encoding issues, please run:

CHCP 65001

in your command line.

You should see the following output:

/project/sherpa-onnx/csrc/parse-options.cc:Read:361 sherpa-onnx-offline --tokens=./sherpa-onnx-paraformer-zh-small-2024-03-09/tokens.txt --paraformer=./sherpa-onnx-paraformer-zh-small-2024-03-09/model.int8.onnx ./sherpa-onnx-paraformer-zh-small-2024-03-09/test_wavs/0.wav ./sherpa-onnx-paraformer-zh-small-2024-03-09/test_wavs/1.wav ./sherpa-onnx-paraformer-zh-small-2024-03-09/test_wavs/8k.wav ./sherpa-onnx-paraformer-zh-small-2024-03-09/test_wavs/2-zh-en.wav ./sherpa-onnx-paraformer-zh-small-2024-03-09/test_wavs/3-sichuan.wav ./sherpa-onnx-paraformer-zh-small-2024-03-09/test_wavs/4-tianjin.wav ./sherpa-onnx-paraformer-zh-small-2024-03-09/test_wavs/5-henan.wav 

OfflineRecognizerConfig(feat_config=OfflineFeatureExtractorConfig(sampling_rate=16000, feature_dim=80), model_config=OfflineModelConfig(transducer=OfflineTransducerModelConfig(encoder_filename="", decoder_filename="", joiner_filename=""), paraformer=OfflineParaformerModelConfig(model="./sherpa-onnx-paraformer-zh-small-2024-03-09/model.int8.onnx"), nemo_ctc=OfflineNemoEncDecCtcModelConfig(model=""), whisper=OfflineWhisperModelConfig(encoder="", decoder="", language="", task="transcribe", tail_paddings=-1), tdnn=OfflineTdnnModelConfig(model=""), zipformer_ctc=OfflineZipformerCtcModelConfig(model=""), wenet_ctc=OfflineWenetCtcModelConfig(model=""), tokens="./sherpa-onnx-paraformer-zh-small-2024-03-09/tokens.txt", num_threads=2, debug=False, provider="cpu", model_type=""), lm_config=OfflineLMConfig(model="", scale=0.5), ctc_fst_decoder_config=OfflineCtcFstDecoderConfig(graph="", max_active=3000), decoding_method="greedy_search", max_active_paths=4, hotwords_file="", hotwords_score=1.5, blank_penalty=0)
Creating recognizer ...
Started
/project/sherpa-onnx/csrc/offline-stream.cc:AcceptWaveformImpl:119 Creating a resampler:
   in_sample_rate: 8000
   output_sample_rate: 16000

Done!

./sherpa-onnx-paraformer-zh-small-2024-03-09/test_wavs/0.wav
{"text": "对我做了介绍啊那么我想说的是呢大家如果对我的研究感兴趣呢", "timestamps": [], "tokens":["对", "我", "做", "了", "介", "绍", "啊", "那", "么", "我", "想", "说", "的", "是", "呢", "大", "家", "如", "果", "对", "我", "的", "研", "究", "感", "兴", "趣", "呢"]}
----
./sherpa-onnx-paraformer-zh-small-2024-03-09/test_wavs/1.wav
{"text": "重点呢想谈三个问题首先呢就是这一轮全球金融动荡的表现", "timestamps": [], "tokens":["重", "点", "呢", "想", "谈", "三", "个", "问", "题", "首", "先", "呢", "就", "是", "这", "一", "轮", "全", "球", "金", "融", "动", "荡", "的", "表", "现"]}
----
./sherpa-onnx-paraformer-zh-small-2024-03-09/test_wavs/8k.wav
{"text": "深入的分析这一次全球金融动荡背后的根源", "timestamps": [], "tokens":["深", "入", "的", "分", "析", "这", "一", "次", "全", "球", "金", "融", "动", "荡", "背", "后", "的", "根", "源"]}
----
./sherpa-onnx-paraformer-zh-small-2024-03-09/test_wavs/2-zh-en.wav
{"text": " yesterday was 星期一 today is tuesday 明天是星期三", "timestamps": [], "tokens":["ye@@", "ster@@", "day", "was", "星", "期", "一", "today", "is", "tu@@", "es@@", "day", "明", "天", "是", "星", "期", "三"]}
----
./sherpa-onnx-paraformer-zh-small-2024-03-09/test_wavs/3-sichuan.wav
{"text": "自己就是在那个在那个就是在情节里面就是感觉是演的特别好就是好像很真实一样你知道吧", "timestamps": [], "tokens":["自", "己", "就", "是", "在", "那", "个", "在", "那", "个", "就", "是", "在", "情", "节", "里", "面", "就", "是", "感", "觉", "是", "演", "的", "特", "别", "好", "就", "是", "好", "像", "很", "真", "实", "一", "样", "你", "知", "道", "吧"]}
----
./sherpa-onnx-paraformer-zh-small-2024-03-09/test_wavs/4-tianjin.wav
{"text": "其实他就是怕每个人都可以守法就这意思法律意识太单薄了而且就是嗯也不顾及到别人的感受", "timestamps": [], "tokens":["其", "实", "他", "就", "是", "怕", "每", "个", "人", "都", "可", "以", "守", "法", "就", "这", "意", "思", "法", "律", "意", "识", "太", "单", "薄", "了", "而", "且", "就", "是", "嗯", "也", "不", "顾", "及", "到", "别", "人", "的", "感", "受"]}
----
./sherpa-onnx-paraformer-zh-small-2024-03-09/test_wavs/5-henan.wav
{"text": "他这个管一向都通到有时候都通到七八层楼缸然后他这管一向就可以浇到那个那柱子上", "timestamps": [], "tokens":["他", "这", "个", "管", "一", "向", "都", "通", "到", "有", "时", "候", "都", "通", "到", "七", "八", "层", "楼", "缸", "然", "后", "他", "这", "管", "一", "向", "就", "可", "以", "浇", "到", "那", "个", "那", "柱", "子", "上"]}
----
num threads: 2
decoding method: greedy_search
Elapsed seconds: 3.562 s
Real time factor (RTF): 3.562 / 47.023 = 0.076
Speech recognition from a microphone

To run speech recognition from a microphone with this model, use:

cd /path/to/sherpa-onnx

./build/bin/sherpa-onnx-microphone-offline \
  --tokens=./sherpa-onnx-paraformer-zh-small-2024-03-09/tokens.txt \
  --paraformer=./sherpa-onnx-paraformer-zh-small-2024-03-09/model.int8.onnx

csukuangfj/sherpa-onnx-paraformer-zh-2024-03-09 (Chinese + English)

Note

This model does not support timestamps. It is a bilingual model supporting both Chinese and English, including Mandarin and dialects such as the Henan, Tianjin, and Sichuan dialects.

This model is converted from

https://www.modelscope.cn/models/iic/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8358-tensorflow1/summary

In the following, we describe how to download it and use it with sherpa-onnx.

Please use the following commands to download it.

cd /path/to/sherpa-onnx
wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/sherpa-onnx-paraformer-zh-2024-03-09.tar.bz2

tar xvf sherpa-onnx-paraformer-zh-2024-03-09.tar.bz2

Please check that the file sizes of the downloaded pre-trained models match the *.onnx file sizes shown below.

sherpa-onnx-paraformer-zh-2024-03-09$ ls -lh *.onnx

-rw-r--r-- 1 1001 127 217M Mar 10 02:22 model.int8.onnx
-rw-r--r-- 1 1001 127 785M Mar 10 02:22 model.onnx

Hint

It supports decoding only single-channel wave files with 16-bit encoded samples. The sampling rate, however, does not need to be 16 kHz.

fp32

The following code shows how to use fp32 models to decode wave files:

cd /path/to/sherpa-onnx

./build/bin/sherpa-onnx-offline \
  --tokens=./sherpa-onnx-paraformer-zh-2024-03-09/tokens.txt \
  --paraformer=./sherpa-onnx-paraformer-zh-2024-03-09/model.onnx \
  ./sherpa-onnx-paraformer-zh-2024-03-09/test_wavs/0.wav \
  ./sherpa-onnx-paraformer-zh-2024-03-09/test_wavs/1.wav \
  ./sherpa-onnx-paraformer-zh-2024-03-09/test_wavs/8k.wav \
  ./sherpa-onnx-paraformer-zh-2024-03-09/test_wavs/2-zh-en.wav \
  ./sherpa-onnx-paraformer-zh-2024-03-09/test_wavs/3-sichuan.wav \
  ./sherpa-onnx-paraformer-zh-2024-03-09/test_wavs/4-tianjin.wav \
  ./sherpa-onnx-paraformer-zh-2024-03-09/test_wavs/5-henan.wav

Note

Please use ./build/bin/Release/sherpa-onnx-offline.exe for Windows.

Caution

If you use Windows and get encoding issues, please run:

CHCP 65001

in your command line.

You should see the following output:

/project/sherpa-onnx/csrc/parse-options.cc:Read:361 sherpa-onnx-offline --tokens=./sherpa-onnx-paraformer-zh-2024-03-09/tokens.txt --paraformer=./sherpa-onnx-paraformer-zh-2024-03-09/model.onnx ./sherpa-onnx-paraformer-zh-2024-03-09/test_wavs/0.wav ./sherpa-onnx-paraformer-zh-2024-03-09/test_wavs/1.wav ./sherpa-onnx-paraformer-zh-2024-03-09/test_wavs/8k.wav ./sherpa-onnx-paraformer-zh-2024-03-09/test_wavs/2-zh-en.wav ./sherpa-onnx-paraformer-zh-2024-03-09/test_wavs/3-sichuan.wav ./sherpa-onnx-paraformer-zh-2024-03-09/test_wavs/4-tianjin.wav ./sherpa-onnx-paraformer-zh-2024-03-09/test_wavs/5-henan.wav 

OfflineRecognizerConfig(feat_config=OfflineFeatureExtractorConfig(sampling_rate=16000, feature_dim=80), model_config=OfflineModelConfig(transducer=OfflineTransducerModelConfig(encoder_filename="", decoder_filename="", joiner_filename=""), paraformer=OfflineParaformerModelConfig(model="./sherpa-onnx-paraformer-zh-2024-03-09/model.onnx"), nemo_ctc=OfflineNemoEncDecCtcModelConfig(model=""), whisper=OfflineWhisperModelConfig(encoder="", decoder="", language="", task="transcribe", tail_paddings=-1), tdnn=OfflineTdnnModelConfig(model=""), zipformer_ctc=OfflineZipformerCtcModelConfig(model=""), wenet_ctc=OfflineWenetCtcModelConfig(model=""), tokens="./sherpa-onnx-paraformer-zh-2024-03-09/tokens.txt", num_threads=2, debug=False, provider="cpu", model_type=""), lm_config=OfflineLMConfig(model="", scale=0.5), ctc_fst_decoder_config=OfflineCtcFstDecoderConfig(graph="", max_active=3000), decoding_method="greedy_search", max_active_paths=4, hotwords_file="", hotwords_score=1.5, blank_penalty=0)
Creating recognizer ...
Started
/project/sherpa-onnx/csrc/offline-stream.cc:AcceptWaveformImpl:119 Creating a resampler:
   in_sample_rate: 8000
   output_sample_rate: 16000

Done!

./sherpa-onnx-paraformer-zh-2024-03-09/test_wavs/0.wav
{"text": "对我做了介绍啊那么我想说的是呢大家如果对我的研究感兴趣呢你", "timestamps": [], "tokens":["对", "我", "做", "了", "介", "绍", "啊", "那", "么", "我", "想", "说", "的", "是", "呢", "大", "家", "如", "果", "对", "我", "的", "研", "究", "感", "兴", "趣", "呢", "你"]}
----
./sherpa-onnx-paraformer-zh-2024-03-09/test_wavs/1.wav
{"text": "重点呢想谈三个问题首先呢就是这一轮全球金融动荡的表现", "timestamps": [], "tokens":["重", "点", "呢", "想", "谈", "三", "个", "问", "题", "首", "先", "呢", "就", "是", "这", "一", "轮", "全", "球", "金", "融", "动", "荡", "的", "表", "现"]}
----
./sherpa-onnx-paraformer-zh-2024-03-09/test_wavs/8k.wav
{"text": "深入的分析这一次全球金融动荡背后的根源", "timestamps": [], "tokens":["深", "入", "的", "分", "析", "这", "一", "次", "全", "球", "金", "融", "动", "荡", "背", "后", "的", "根", "源"]}
----
./sherpa-onnx-paraformer-zh-2024-03-09/test_wavs/2-zh-en.wav
{"text": " yesterday was 星期一 today is tuesday 明天是星期三", "timestamps": [], "tokens":["ye@@", "ster@@", "day", "was", "星", "期", "一", "today", "is", "tu@@", "es@@", "day", "明", "天", "是", "星", "期", "三"]}
----
./sherpa-onnx-paraformer-zh-2024-03-09/test_wavs/3-sichuan.wav
{"text": "自己就是在那个在那个就是在情节里面就是感觉是演的特别好就是好像很真实一样你知道吧", "timestamps": [], "tokens":["自", "己", "就", "是", "在", "那", "个", "在", "那", "个", "就", "是", "在", "情", "节", "里", "面", "就", "是", "感", "觉", "是", "演", "的", "特", "别", "好", "就", "是", "好", "像", "很", "真", "实", "一", "样", "你", "知", "道", "吧"]}
----
./sherpa-onnx-paraformer-zh-2024-03-09/test_wavs/4-tianjin.wav
{"text": "其实他就是怕每个人都可以守法就这意思法律意识太单薄了而且就是嗯也不顾及到别人的感受", "timestamps": [], "tokens":["其", "实", "他", "就", "是", "怕", "每", "个", "人", "都", "可", "以", "守", "法", "就", "这", "意", "思", "法", "律", "意", "识", "太", "单", "薄", "了", "而", "且", "就", "是", "嗯", "也", "不", "顾", "及", "到", "别", "人", "的", "感", "受"]}
----
./sherpa-onnx-paraformer-zh-2024-03-09/test_wavs/5-henan.wav
{"text": "他这个管一向都通到有时候都通到七八层楼高然后他的管一向就可以交到那个那柱子上", "timestamps": [], "tokens":["他", "这", "个", "管", "一", "向", "都", "通", "到", "有", "时", "候", "都", "通", "到", "七", "八", "层", "楼", "高", "然", "后", "他", "的", "管", "一", "向", "就", "可", "以", "交", "到", "那", "个", "那", "柱", "子", "上"]}
----
num threads: 2
decoding method: greedy_search
Elapsed seconds: 6.829 s
Real time factor (RTF): 6.829 / 47.023 = 0.145

int8

The following code shows how to use int8 models to decode wave files:

cd /path/to/sherpa-onnx

./build/bin/sherpa-onnx-offline \
  --tokens=./sherpa-onnx-paraformer-zh-2024-03-09/tokens.txt \
  --paraformer=./sherpa-onnx-paraformer-zh-2024-03-09/model.int8.onnx \
  ./sherpa-onnx-paraformer-zh-2024-03-09/test_wavs/0.wav \
  ./sherpa-onnx-paraformer-zh-2024-03-09/test_wavs/1.wav \
  ./sherpa-onnx-paraformer-zh-2024-03-09/test_wavs/8k.wav \
  ./sherpa-onnx-paraformer-zh-2024-03-09/test_wavs/2-zh-en.wav \
  ./sherpa-onnx-paraformer-zh-2024-03-09/test_wavs/3-sichuan.wav \
  ./sherpa-onnx-paraformer-zh-2024-03-09/test_wavs/4-tianjin.wav \
  ./sherpa-onnx-paraformer-zh-2024-03-09/test_wavs/5-henan.wav

Note

Please use ./build/bin/Release/sherpa-onnx-offline.exe for Windows.

Caution

If you use Windows and get encoding issues, please run:

CHCP 65001

in your command line.

You should see the following output:

/project/sherpa-onnx/csrc/parse-options.cc:Read:361 sherpa-onnx-offline --tokens=./sherpa-onnx-paraformer-zh-2024-03-09/tokens.txt --paraformer=./sherpa-onnx-paraformer-zh-2024-03-09/model.int8.onnx ./sherpa-onnx-paraformer-zh-2024-03-09/test_wavs/0.wav ./sherpa-onnx-paraformer-zh-2024-03-09/test_wavs/1.wav ./sherpa-onnx-paraformer-zh-2024-03-09/test_wavs/8k.wav ./sherpa-onnx-paraformer-zh-2024-03-09/test_wavs/2-zh-en.wav ./sherpa-onnx-paraformer-zh-2024-03-09/test_wavs/3-sichuan.wav ./sherpa-onnx-paraformer-zh-2024-03-09/test_wavs/4-tianjin.wav ./sherpa-onnx-paraformer-zh-2024-03-09/test_wavs/5-henan.wav 

OfflineRecognizerConfig(feat_config=OfflineFeatureExtractorConfig(sampling_rate=16000, feature_dim=80), model_config=OfflineModelConfig(transducer=OfflineTransducerModelConfig(encoder_filename="", decoder_filename="", joiner_filename=""), paraformer=OfflineParaformerModelConfig(model="./sherpa-onnx-paraformer-zh-2024-03-09/model.int8.onnx"), nemo_ctc=OfflineNemoEncDecCtcModelConfig(model=""), whisper=OfflineWhisperModelConfig(encoder="", decoder="", language="", task="transcribe", tail_paddings=-1), tdnn=OfflineTdnnModelConfig(model=""), zipformer_ctc=OfflineZipformerCtcModelConfig(model=""), wenet_ctc=OfflineWenetCtcModelConfig(model=""), tokens="./sherpa-onnx-paraformer-zh-2024-03-09/tokens.txt", num_threads=2, debug=False, provider="cpu", model_type=""), lm_config=OfflineLMConfig(model="", scale=0.5), ctc_fst_decoder_config=OfflineCtcFstDecoderConfig(graph="", max_active=3000), decoding_method="greedy_search", max_active_paths=4, hotwords_file="", hotwords_score=1.5, blank_penalty=0)
Creating recognizer ...
Started
/project/sherpa-onnx/csrc/offline-stream.cc:AcceptWaveformImpl:119 Creating a resampler:
   in_sample_rate: 8000
   output_sample_rate: 16000

Done!

./sherpa-onnx-paraformer-zh-2024-03-09/test_wavs/0.wav
{"text": "对我做了介绍啊那么我想说的是呢大家如果对我的研究感兴趣呢你", "timestamps": [], "tokens":["对", "我", "做", "了", "介", "绍", "啊", "那", "么", "我", "想", "说", "的", "是", "呢", "大", "家", "如", "果", "对", "我", "的", "研", "究", "感", "兴", "趣", "呢", "你"]}
----
./sherpa-onnx-paraformer-zh-2024-03-09/test_wavs/1.wav
{"text": "重点呢想谈三个问题首先呢就是这一轮全球金融动荡的表现", "timestamps": [], "tokens":["重", "点", "呢", "想", "谈", "三", "个", "问", "题", "首", "先", "呢", "就", "是", "这", "一", "轮", "全", "球", "金", "融", "动", "荡", "的", "表", "现"]}
----
./sherpa-onnx-paraformer-zh-2024-03-09/test_wavs/8k.wav
{"text": "深入的分析这一次全球金融动荡背后的根源", "timestamps": [], "tokens":["深", "入", "的", "分", "析", "这", "一", "次", "全", "球", "金", "融", "动", "荡", "背", "后", "的", "根", "源"]}
----
./sherpa-onnx-paraformer-zh-2024-03-09/test_wavs/2-zh-en.wav
{"text": " yesterday was 星期一 today is tuesday 明天是星期三", "timestamps": [], "tokens":["ye@@", "ster@@", "day", "was", "星", "期", "一", "today", "is", "tu@@", "es@@", "day", "明", "天", "是", "星", "期", "三"]}
----
./sherpa-onnx-paraformer-zh-2024-03-09/test_wavs/3-sichuan.wav
{"text": "自己就是在那个在那个就是在情节里面就是感觉是演的特别好就是好像很真实一样你知道吧", "timestamps": [], "tokens":["自", "己", "就", "是", "在", "那", "个", "在", "那", "个", "就", "是", "在", "情", "节", "里", "面", "就", "是", "感", "觉", "是", "演", "的", "特", "别", "好", "就", "是", "好", "像", "很", "真", "实", "一", "样", "你", "知", "道", "吧"]}
----
./sherpa-onnx-paraformer-zh-2024-03-09/test_wavs/4-tianjin.wav
{"text": "其实他就是怕每个人都可以守法就这意思法律意识太单薄了而且就是嗯也不顾及到别人的感受", "timestamps": [], "tokens":["其", "实", "他", "就", "是", "怕", "每", "个", "人", "都", "可", "以", "守", "法", "就", "这", "意", "思", "法", "律", "意", "识", "太", "单", "薄", "了", "而", "且", "就", "是", "嗯", "也", "不", "顾", "及", "到", "别", "人", "的", "感", "受"]}
----
./sherpa-onnx-paraformer-zh-2024-03-09/test_wavs/5-henan.wav
{"text": "他这个管一向都通到有时候都通到七八层楼高然后他的管一向就可以交到那个那柱子上", "timestamps": [], "tokens":["他", "这", "个", "管", "一", "向", "都", "通", "到", "有", "时", "候", "都", "通", "到", "七", "八", "层", "楼", "高", "然", "后", "他", "的", "管", "一", "向", "就", "可", "以", "交", "到", "那", "个", "那", "柱", "子", "上"]}
----
num threads: 2
decoding method: greedy_search
Elapsed seconds: 6.829 s
Real time factor (RTF): 6.829 / 47.023 = 0.145
Speech recognition from a microphone

To run speech recognition from a microphone with this model, use:

cd /path/to/sherpa-onnx

./build/bin/sherpa-onnx-microphone-offline \
  --tokens=./sherpa-onnx-paraformer-zh-2024-03-09/tokens.txt \
  --paraformer=./sherpa-onnx-paraformer-zh-2024-03-09/model.int8.onnx

csukuangfj/sherpa-onnx-paraformer-zh-2023-03-28 (Chinese + English)

Note

This model does not support timestamps. It is a bilingual model supporting both Chinese and English, including Mandarin and dialects such as the Henan, Tianjin, and Sichuan dialects.

This model is converted from

https://www.modelscope.cn/models/damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch

The code for converting can be found at

https://huggingface.co/csukuangfj/paraformer-onnxruntime-python-example/tree/main

In the following, we describe how to download it and use it with sherpa-onnx.

Please use the following commands to download it.

cd /path/to/sherpa-onnx
wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/sherpa-onnx-paraformer-zh-2023-03-28.tar.bz2

tar xvf sherpa-onnx-paraformer-zh-2023-03-28.tar.bz2

Please check that the file sizes of the downloaded pre-trained models match the *.onnx file sizes shown below.

sherpa-onnx-paraformer-zh-2023-03-28$ ls -lh *.onnx
-rw-r--r-- 1 kuangfangjun root 214M Apr  1 07:28 model.int8.onnx
-rw-r--r-- 1 kuangfangjun root 824M Apr  1 07:28 model.onnx

Hint

It supports decoding only single-channel wave files with 16-bit encoded samples. The sampling rate, however, does not need to be 16 kHz.

fp32

The following code shows how to use fp32 models to decode wave files:

cd /path/to/sherpa-onnx

./build/bin/sherpa-onnx-offline \
  --tokens=./sherpa-onnx-paraformer-zh-2023-03-28/tokens.txt \
  --paraformer=./sherpa-onnx-paraformer-zh-2023-03-28/model.onnx \
  ./sherpa-onnx-paraformer-zh-2023-03-28/test_wavs/0.wav \
  ./sherpa-onnx-paraformer-zh-2023-03-28/test_wavs/1.wav \
  ./sherpa-onnx-paraformer-zh-2023-03-28/test_wavs/2.wav \
  ./sherpa-onnx-paraformer-zh-2023-03-28/test_wavs/3-sichuan.wav \
  ./sherpa-onnx-paraformer-zh-2023-03-28/test_wavs/4-tianjin.wav \
  ./sherpa-onnx-paraformer-zh-2023-03-28/test_wavs/5-henan.wav \
  ./sherpa-onnx-paraformer-zh-2023-03-28/test_wavs/6-zh-en.wav \
  ./sherpa-onnx-paraformer-zh-2023-03-28/test_wavs/8k.wav

Note

Please use ./build/bin/Release/sherpa-onnx-offline.exe for Windows.

Caution

If you use Windows and get encoding issues, please run:

CHCP 65001

in your command line.

You should see the following output:

/project/sherpa-onnx/csrc/parse-options.cc:Read:361 sherpa-onnx-offline --tokens=./sherpa-onnx-paraformer-zh-2023-03-28/tokens.txt --paraformer=./sherpa-onnx-paraformer-zh-2023-03-28/model.onnx ./sherpa-onnx-paraformer-zh-2023-03-28/test_wavs/0.wav ./sherpa-onnx-paraformer-zh-2023-03-28/test_wavs/1.wav ./sherpa-onnx-paraformer-zh-2023-03-28/test_wavs/2.wav ./sherpa-onnx-paraformer-zh-2023-03-28/test_wavs/3-sichuan.wav ./sherpa-onnx-paraformer-zh-2023-03-28/test_wavs/4-tianjin.wav ./sherpa-onnx-paraformer-zh-2023-03-28/test_wavs/5-henan.wav ./sherpa-onnx-paraformer-zh-2023-03-28/test_wavs/6-zh-en.wav ./sherpa-onnx-paraformer-zh-2023-03-28/test_wavs/8k.wav 

OfflineRecognizerConfig(feat_config=OfflineFeatureExtractorConfig(sampling_rate=16000, feature_dim=80), model_config=OfflineModelConfig(transducer=OfflineTransducerModelConfig(encoder_filename="", decoder_filename="", joiner_filename=""), paraformer=OfflineParaformerModelConfig(model="./sherpa-onnx-paraformer-zh-2023-03-28/model.onnx"), nemo_ctc=OfflineNemoEncDecCtcModelConfig(model=""), whisper=OfflineWhisperModelConfig(encoder="", decoder="", language="", task="transcribe", tail_paddings=-1), tdnn=OfflineTdnnModelConfig(model=""), zipformer_ctc=OfflineZipformerCtcModelConfig(model=""), wenet_ctc=OfflineWenetCtcModelConfig(model=""), tokens="./sherpa-onnx-paraformer-zh-2023-03-28/tokens.txt", num_threads=2, debug=False, provider="cpu", model_type=""), lm_config=OfflineLMConfig(model="", scale=0.5), ctc_fst_decoder_config=OfflineCtcFstDecoderConfig(graph="", max_active=3000), decoding_method="greedy_search", max_active_paths=4, hotwords_file="", hotwords_score=1.5, blank_penalty=0)
Creating recognizer ...
Started
/project/sherpa-onnx/csrc/offline-stream.cc:AcceptWaveformImpl:119 Creating a resampler:
   in_sample_rate: 8000
   output_sample_rate: 16000

Done!

./sherpa-onnx-paraformer-zh-2023-03-28/test_wavs/0.wav
{"text": "对我做了介绍啊那么我想说的是呢大家如果对我的研究感兴趣呢你", "timestamps": [], "tokens":["对", "我", "做", "了", "介", "绍", "啊", "那", "么", "我", "想", "说", "的", "是", "呢", "大", "家", "如", "果", "对", "我", "的", "研", "究", "感", "兴", "趣", "呢", "你"]}
----
./sherpa-onnx-paraformer-zh-2023-03-28/test_wavs/1.wav
{"text": "重点呢想谈三个问题首先呢就是这一轮全球金融动荡的表现", "timestamps": [], "tokens":["重", "点", "呢", "想", "谈", "三", "个", "问", "题", "首", "先", "呢", "就", "是", "这", "一", "轮", "全", "球", "金", "融", "动", "荡", "的", "表", "现"]}
----
./sherpa-onnx-paraformer-zh-2023-03-28/test_wavs/2.wav
{"text": "深入的分析这一次全球金融动荡背后的根源", "timestamps": [], "tokens":["深", "入", "的", "分", "析", "这", "一", "次", "全", "球", "金", "融", "动", "荡", "背", "后", "的", "根", "源"]}
----
./sherpa-onnx-paraformer-zh-2023-03-28/test_wavs/3-sichuan.wav
{"text": "自己就是在那个在那个就是在情节里面就是感觉是演的特别好就是好像很真实一样你知道吧", "timestamps": [], "tokens":["自", "己", "就", "是", "在", "那", "个", "在", "那", "个", "就", "是", "在", "情", "节", "里", "面", "就", "是", "感", "觉", "是", "演", "的", "特", "别", "好", "就", "是", "好", "像", "很", "真", "实", "一", "样", "你", "知", "道", "吧"]}
----
./sherpa-onnx-paraformer-zh-2023-03-28/test_wavs/4-tianjin.wav
{"text": "其实他就是那每个人都可以守法就这意思法律意识太单薄了而且就是嗯也不顾及到别人的感受", "timestamps": [], "tokens":["其", "实", "他", "就", "是", "那", "每", "个", "人", "都", "可", "以", "守", "法", "就", "这", "意", "思", "法", "律", "意", "识", "太", "单", "薄", "了", "而", "且", "就", "是", "嗯", "也", "不", "顾", "及", "到", "别", "人", "的", "感", "受"]}
----
./sherpa-onnx-paraformer-zh-2023-03-28/test_wavs/5-henan.wav
{"text": "他这个管一向都通到有时候都通到七八层楼高然后他这管一向就可以浇到那个那柱子上", "timestamps": [], "tokens":["他", "这", "个", "管", "一", "向", "都", "通", "到", "有", "时", "候", "都", "通", "到", "七", "八", "层", "楼", "高", "然", "后", "他", "这", "管", "一", "向", "就", "可", "以", "浇", "到", "那", "个", "那", "柱", "子", "上"]}
----
./sherpa-onnx-paraformer-zh-2023-03-28/test_wavs/6-zh-en.wav
{"text": " yesterday was 星期一 today is tuesday 明天是星期三", "timestamps": [], "tokens":["ye@@", "ster@@", "day", "was", "星", "期", "一", "today", "is", "tu@@", "es@@", "day", "明", "天", "是", "星", "期", "三"]}
----
./sherpa-onnx-paraformer-zh-2023-03-28/test_wavs/8k.wav
{"text": "甚至出现交易几乎停滞的情况", "timestamps": [], "tokens":["甚", "至", "出", "现", "交", "易", "几", "乎", "停", "滞", "的", "情", "况"]}
----
num threads: 2
decoding method: greedy_search
Elapsed seconds: 8.547 s
Real time factor (RTF): 8.547 / 51.236 = 0.167

int8

The following code shows how to use int8 models to decode wave files:

cd /path/to/sherpa-onnx

./build/bin/sherpa-onnx-offline \
  --tokens=./sherpa-onnx-paraformer-zh-2023-03-28/tokens.txt \
  --paraformer=./sherpa-onnx-paraformer-zh-2023-03-28/model.int8.onnx \
  ./sherpa-onnx-paraformer-zh-2023-03-28/test_wavs/0.wav \
  ./sherpa-onnx-paraformer-zh-2023-03-28/test_wavs/1.wav \
  ./sherpa-onnx-paraformer-zh-2023-03-28/test_wavs/2.wav \
  ./sherpa-onnx-paraformer-zh-2023-03-28/test_wavs/3-sichuan.wav \
  ./sherpa-onnx-paraformer-zh-2023-03-28/test_wavs/4-tianjin.wav \
  ./sherpa-onnx-paraformer-zh-2023-03-28/test_wavs/5-henan.wav \
  ./sherpa-onnx-paraformer-zh-2023-03-28/test_wavs/6-zh-en.wav \
  ./sherpa-onnx-paraformer-zh-2023-03-28/test_wavs/8k.wav

Note

Please use ./build/bin/Release/sherpa-onnx-offline.exe for Windows.

Caution

If you use Windows and get encoding issues, please run:

CHCP 65001

in your command line.

You should see the following output:

/project/sherpa-onnx/csrc/parse-options.cc:Read:361 sherpa-onnx-offline --tokens=./sherpa-onnx-paraformer-zh-2023-03-28/tokens.txt --paraformer=./sherpa-onnx-paraformer-zh-2023-03-28/model.int8.onnx ./sherpa-onnx-paraformer-zh-2023-03-28/test_wavs/0.wav ./sherpa-onnx-paraformer-zh-2023-03-28/test_wavs/1.wav ./sherpa-onnx-paraformer-zh-2023-03-28/test_wavs/2.wav ./sherpa-onnx-paraformer-zh-2023-03-28/test_wavs/3-sichuan.wav ./sherpa-onnx-paraformer-zh-2023-03-28/test_wavs/4-tianjin.wav ./sherpa-onnx-paraformer-zh-2023-03-28/test_wavs/5-henan.wav ./sherpa-onnx-paraformer-zh-2023-03-28/test_wavs/6-zh-en.wav ./sherpa-onnx-paraformer-zh-2023-03-28/test_wavs/8k.wav 

OfflineRecognizerConfig(feat_config=OfflineFeatureExtractorConfig(sampling_rate=16000, feature_dim=80), model_config=OfflineModelConfig(transducer=OfflineTransducerModelConfig(encoder_filename="", decoder_filename="", joiner_filename=""), paraformer=OfflineParaformerModelConfig(model="./sherpa-onnx-paraformer-zh-2023-03-28/model.int8.onnx"), nemo_ctc=OfflineNemoEncDecCtcModelConfig(model=""), whisper=OfflineWhisperModelConfig(encoder="", decoder="", language="", task="transcribe", tail_paddings=-1), tdnn=OfflineTdnnModelConfig(model=""), zipformer_ctc=OfflineZipformerCtcModelConfig(model=""), wenet_ctc=OfflineWenetCtcModelConfig(model=""), tokens="./sherpa-onnx-paraformer-zh-2023-03-28/tokens.txt", num_threads=2, debug=False, provider="cpu", model_type=""), lm_config=OfflineLMConfig(model="", scale=0.5), ctc_fst_decoder_config=OfflineCtcFstDecoderConfig(graph="", max_active=3000), decoding_method="greedy_search", max_active_paths=4, hotwords_file="", hotwords_score=1.5, blank_penalty=0)
Creating recognizer ...
Started
/project/sherpa-onnx/csrc/offline-stream.cc:AcceptWaveformImpl:119 Creating a resampler:
   in_sample_rate: 8000
   output_sample_rate: 16000

Done!

./sherpa-onnx-paraformer-zh-2023-03-28/test_wavs/0.wav
{"text": "对我做了介绍啊那么我想说的是呢大家如果对我的研究感兴趣呢你", "timestamps": [], "tokens":["对", "我", "做", "了", "介", "绍", "啊", "那", "么", "我", "想", "说", "的", "是", "呢", "大", "家", "如", "果", "对", "我", "的", "研", "究", "感", "兴", "趣", "呢", "你"]}
----
./sherpa-onnx-paraformer-zh-2023-03-28/test_wavs/1.wav
{"text": "重点呢想谈三个问题首先呢就是这一轮全球金融动荡的表现", "timestamps": [], "tokens":["重", "点", "呢", "想", "谈", "三", "个", "问", "题", "首", "先", "呢", "就", "是", "这", "一", "轮", "全", "球", "金", "融", "动", "荡", "的", "表", "现"]}
----
./sherpa-onnx-paraformer-zh-2023-03-28/test_wavs/2.wav
{"text": "深入的分析这一次全球金融动荡背后的根源", "timestamps": [], "tokens":["深", "入", "的", "分", "析", "这", "一", "次", "全", "球", "金", "融", "动", "荡", "背", "后", "的", "根", "源"]}
----
./sherpa-onnx-paraformer-zh-2023-03-28/test_wavs/3-sichuan.wav
{"text": "自己就是在那个在那个就是在情节里面就是感觉是演的特别好就是好像很真实一样你知道吧", "timestamps": [], "tokens":["自", "己", "就", "是", "在", "那", "个", "在", "那", "个", "就", "是", "在", "情", "节", "里", "面", "就", "是", "感", "觉", "是", "演", "的", "特", "别", "好", "就", "是", "好", "像", "很", "真", "实", "一", "样", "你", "知", "道", "吧"]}
----
./sherpa-onnx-paraformer-zh-2023-03-28/test_wavs/4-tianjin.wav
{"text": "其实他就是那每个人都可以守法就这意思法律意识太单薄了而且就是嗯也不顾及到别人的感受", "timestamps": [], "tokens":["其", "实", "他", "就", "是", "那", "每", "个", "人", "都", "可", "以", "守", "法", "就", "这", "意", "思", "法", "律", "意", "识", "太", "单", "薄", "了", "而", "且", "就", "是", "嗯", "也", "不", "顾", "及", "到", "别", "人", "的", "感", "受"]}
----
./sherpa-onnx-paraformer-zh-2023-03-28/test_wavs/5-henan.wav
{"text": "他这个管一向都通到有时候都通到七八层楼高然后他这管一向就可以浇到那个那柱子上", "timestamps": [], "tokens":["他", "这", "个", "管", "一", "向", "都", "通", "到", "有", "时", "候", "都", "通", "到", "七", "八", "层", "楼", "高", "然", "后", "他", "这", "管", "一", "向", "就", "可", "以", "浇", "到", "那", "个", "那", "柱", "子", "上"]}
----
./sherpa-onnx-paraformer-zh-2023-03-28/test_wavs/6-zh-en.wav
{"text": " yesterday was 星期一 today is tuesday 明天是星期三", "timestamps": [], "tokens":["ye@@", "ster@@", "day", "was", "星", "期", "一", "today", "is", "tu@@", "es@@", "day", "明", "天", "是", "星", "期", "三"]}
----
./sherpa-onnx-paraformer-zh-2023-03-28/test_wavs/8k.wav
{"text": "甚至出现交易几乎停滞的情况", "timestamps": [], "tokens":["甚", "至", "出", "现", "交", "易", "几", "乎", "停", "滞", "的", "情", "况"]}
----
num threads: 2
decoding method: greedy_search
Elapsed seconds: 6.439 s
Real time factor (RTF): 6.439 / 51.236 = 0.126
Speech recognition from a microphone

To run speech recognition from a microphone with this model, use:

cd /path/to/sherpa-onnx

./build/bin/sherpa-onnx-microphone-offline \
  --tokens=./sherpa-onnx-paraformer-zh-2023-03-28/tokens.txt \
  --paraformer=./sherpa-onnx-paraformer-zh-2023-03-28/model.int8.onnx

csukuangfj/sherpa-onnx-paraformer-zh-2023-09-14 (Chinese + English)

Note

This model supports timestamps. It is a bilingual model supporting both Chinese and English, including Mandarin and dialects such as the Henan, Tianjin, and Sichuan dialects.

This model is converted from

https://www.modelscope.cn/models/iic/speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-onnx/summary

In the following, we describe how to download it and use it with sherpa-onnx.

Please use the following commands to download it.

cd /path/to/sherpa-onnx

wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/sherpa-onnx-paraformer-zh-2023-09-14.tar.bz2

tar xvf sherpa-onnx-paraformer-zh-2023-09-14.tar.bz2

Please check that the file sizes of the downloaded pre-trained models match the *.onnx file sizes shown below.

sherpa-onnx-paraformer-zh-2023-09-14$ ls -lh *.onnx
-rw-r--r--  1 fangjun  staff   232M Sep 14 13:46 model.int8.onnx

Hint

It supports decoding only single-channel wave files with 16-bit encoded samples. The sampling rate, however, does not need to be 16 kHz.

int8

The following code shows how to use int8 models to decode wave files:

cd /path/to/sherpa-onnx

./build/bin/sherpa-onnx-offline \
  --tokens=./sherpa-onnx-paraformer-zh-2023-09-14/tokens.txt \
  --paraformer=./sherpa-onnx-paraformer-zh-2023-09-14/model.int8.onnx \
  --model-type=paraformer \
  ./sherpa-onnx-paraformer-zh-2023-09-14/test_wavs/0.wav \
  ./sherpa-onnx-paraformer-zh-2023-09-14/test_wavs/1.wav \
  ./sherpa-onnx-paraformer-zh-2023-09-14/test_wavs/2.wav \
  ./sherpa-onnx-paraformer-zh-2023-09-14/test_wavs/3-sichuan.wav \
  ./sherpa-onnx-paraformer-zh-2023-09-14/test_wavs/4-tianjin.wav \
  ./sherpa-onnx-paraformer-zh-2023-09-14/test_wavs/5-henan.wav \
  ./sherpa-onnx-paraformer-zh-2023-09-14/test_wavs/6-zh-en.wav \
  ./sherpa-onnx-paraformer-zh-2023-09-14/test_wavs/8k.wav

Note

Please use ./build/bin/Release/sherpa-onnx-offline.exe for Windows.

Caution

If you use Windows and get encoding issues, please run:

CHCP 65001

in your command line.

You should see the following output:

/project/sherpa-onnx/csrc/parse-options.cc:Read:361 sherpa-onnx-offline --tokens=./sherpa-onnx-paraformer-zh-2023-09-14/tokens.txt --paraformer=./sherpa-onnx-paraformer-zh-2023-09-14/model.int8.onnx --model-type=paraformer ./sherpa-onnx-paraformer-zh-2023-09-14/test_wavs/0.wav ./sherpa-onnx-paraformer-zh-2023-09-14/test_wavs/1.wav ./sherpa-onnx-paraformer-zh-2023-09-14/test_wavs/2.wav ./sherpa-onnx-paraformer-zh-2023-09-14/test_wavs/3-sichuan.wav ./sherpa-onnx-paraformer-zh-2023-09-14/test_wavs/4-tianjin.wav ./sherpa-onnx-paraformer-zh-2023-09-14/test_wavs/5-henan.wav ./sherpa-onnx-paraformer-zh-2023-09-14/test_wavs/6-zh-en.wav ./sherpa-onnx-paraformer-zh-2023-09-14/test_wavs/8k.wav

OfflineRecognizerConfig(feat_config=OfflineFeatureExtractorConfig(sampling_rate=16000, feature_dim=80), model_config=OfflineModelConfig(transducer=OfflineTransducerModelConfig(encoder_filename="", decoder_filename="", joiner_filename=""), paraformer=OfflineParaformerModelConfig(model="./sherpa-onnx-paraformer-zh-2023-09-14/model.int8.onnx"), nemo_ctc=OfflineNemoEncDecCtcModelConfig(model=""), whisper=OfflineWhisperModelConfig(encoder="", decoder="", language="", task="transcribe", tail_paddings=-1), tdnn=OfflineTdnnModelConfig(model=""), zipformer_ctc=OfflineZipformerCtcModelConfig(model=""), wenet_ctc=OfflineWenetCtcModelConfig(model=""), tokens="./sherpa-onnx-paraformer-zh-2023-09-14/tokens.txt", num_threads=2, debug=False, provider="cpu", model_type="paraformer"), lm_config=OfflineLMConfig(model="", scale=0.5), ctc_fst_decoder_config=OfflineCtcFstDecoderConfig(graph="", max_active=3000), decoding_method="greedy_search", max_active_paths=4, hotwords_file="", hotwords_score=1.5, blank_penalty=0)
Creating recognizer ...
Started
/project/sherpa-onnx/csrc/offline-stream.cc:AcceptWaveformImpl:119 Creating a resampler:
   in_sample_rate: 8000
   output_sample_rate: 16000

Done!

./sherpa-onnx-paraformer-zh-2023-09-14/test_wavs/0.wav
{"text": "对我做了介绍啊那么我想说的是呢大家如果对我的研究感兴趣呢你", "timestamps": [0.36, 0.48, 0.62, 0.72, 0.86, 1.02, 1.32, 1.74, 1.90, 2.12, 2.20, 2.38, 2.50, 2.62, 2.74, 3.18, 3.32, 3.52, 3.62, 3.74, 3.82, 3.90, 3.98, 4.08, 4.20, 4.34, 4.56, 4.74, 5.10], "tokens":["对", "我", "做", "了", "介", "绍", "啊", "那", "么", "我", "想", "说", "的", "是", "呢", "大", "家", "如", "果", "对", "我", "的", "研", "究", "感", "兴", "趣", "呢", "你"]}
----
./sherpa-onnx-paraformer-zh-2023-09-14/test_wavs/1.wav
{"text": "重点呢想谈三个问题首先呢就是这一轮全球金融动荡的表现", "timestamps": [0.16, 0.30, 0.42, 0.56, 0.72, 0.96, 1.08, 1.20, 1.30, 2.08, 2.26, 2.44, 2.58, 2.72, 2.98, 3.14, 3.26, 3.46, 3.62, 3.80, 3.88, 4.02, 4.12, 4.20, 4.36, 4.56], "tokens":["重", "点", "呢", "想", "谈", "三", "个", "问", "题", "首", "先", "呢", "就", "是", "这", "一", "轮", "全", "球", "金", "融", "动", "荡", "的", "表", "现"]}
----
./sherpa-onnx-paraformer-zh-2023-09-14/test_wavs/2.wav
{"text": "深入的分析这一次全球金融动荡背后的根源", "timestamps": [0.34, 0.54, 0.66, 0.80, 1.08, 1.52, 1.72, 1.90, 2.40, 2.68, 2.86, 2.96, 3.16, 3.26, 3.46, 3.54, 3.66, 3.80, 3.90], "tokens":["深", "入", "的", "分", "析", "这", "一", "次", "全", "球", "金", "融", "动", "荡", "背", "后", "的", "根", "源"]}
----
./sherpa-onnx-paraformer-zh-2023-09-14/test_wavs/3-sichuan.wav
{"text": "自己就是在那个在那个就是在情节里面就是感觉是演的特别好就是好像很真实一样你知道吧", "timestamps": [0.16, 0.30, 0.56, 0.72, 0.92, 1.18, 1.32, 1.88, 2.24, 2.40, 3.16, 3.28, 3.40, 3.54, 3.76, 3.88, 4.06, 4.24, 4.36, 4.56, 4.66, 4.88, 5.14, 5.30, 5.44, 5.60, 5.72, 5.84, 5.96, 6.14, 6.24, 6.38, 6.56, 6.78, 6.98, 7.08, 7.22, 7.38, 7.50, 7.62], "tokens":["自", "己", "就", "是", "在", "那", "个", "在", "那", "个", "就", "是", "在", "情", "节", "里", "面", "就", "是", "感", "觉", "是", "演", "的", "特", "别", "好", "就", "是", "好", "像", "很", "真", "实", "一", "样", "你", "知", "道", "吧"]}
----
./sherpa-onnx-paraformer-zh-2023-09-14/test_wavs/4-tianjin.wav
{"text": "其实他就是那每个人都可以守法就这意思法律意识太单薄了而且就是嗯也不顾及到别人的感受", "timestamps": [0.08, 0.24, 0.36, 0.56, 0.66, 0.78, 1.04, 1.14, 1.26, 1.38, 1.50, 1.58, 1.70, 1.84, 2.28, 2.38, 2.64, 2.74, 3.08, 3.28, 3.66, 3.80, 3.94, 4.14, 4.34, 4.64, 4.84, 4.94, 5.12, 5.24, 5.84, 6.10, 6.24, 6.44, 6.54, 6.66, 6.86, 7.02, 7.14, 7.24, 7.44], "tokens":["其", "实", "他", "就", "是", "那", "每", "个", "人", "都", "可", "以", "守", "法", "就", "这", "意", "思", "法", "律", "意", "识", "太", "单", "薄", "了", "而", "且", "就", "是", "嗯", "也", "不", "顾", "及", "到", "别", "人", "的", "感", "受"]}
----
./sherpa-onnx-paraformer-zh-2023-09-14/test_wavs/5-henan.wav
{"text": "他这个管一向都通到有时候都通到七八层楼高然后他这管一向就可以浇到那个那柱子上", "timestamps": [0.08, 0.20, 0.30, 0.42, 0.94, 1.14, 1.26, 1.46, 1.66, 2.28, 2.50, 2.62, 2.70, 2.82, 2.98, 3.14, 3.28, 3.52, 3.70, 3.86, 4.94, 5.06, 5.18, 5.30, 5.42, 5.66, 5.76, 5.94, 6.08, 6.24, 6.38, 6.60, 6.78, 6.96, 7.10, 7.30, 7.50, 7.62], "tokens":["他", "这", "个", "管", "一", "向", "都", "通", "到", "有", "时", "候", "都", "通", "到", "七", "八", "层", "楼", "高", "然", "后", "他", "这", "管", "一", "向", "就", "可", "以", "浇", "到", "那", "个", "那", "柱", "子", "上"]}
----
./sherpa-onnx-paraformer-zh-2023-09-14/test_wavs/6-zh-en.wav
{"text": " yesterday was 星期一 today is tuesday 明天是星期三", "timestamps": [0.36, 0.60, 0.84, 1.22, 2.24, 2.44, 2.74, 3.52, 4.06, 4.68, 5.00, 5.12, 5.76, 5.96, 6.24, 6.82, 7.02, 7.26], "tokens":["ye@@", "ster@@", "day", "was", "星", "期", "一", "today", "is", "tu@@", "es@@", "day", "明", "天", "是", "星", "期", "三"]}
----
./sherpa-onnx-paraformer-zh-2023-09-14/test_wavs/8k.wav
{"text": "甚至出现交易几乎停滞的情况", "timestamps": [0.48, 0.78, 1.04, 1.18, 1.52, 1.78, 2.06, 2.18, 2.50, 2.66, 2.88, 3.10, 3.30], "tokens":["甚", "至", "出", "现", "交", "易", "几", "乎", "停", "滞", "的", "情", "况"]}
----
num threads: 2
decoding method: greedy_search
Elapsed seconds: 9.206 s
Real time factor (RTF): 9.206 / 51.236 = 0.180
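Since this model supports timestamps, the Python API also exposes them alongside the decoded tokens. The following is a minimal sketch (with a compact read_wave helper of our own) that prints each token together with its timestamp, matching the JSON output shown above.

# A minimal sketch: print each decoded token together with its timestamp in seconds.
import wave

import numpy as np
import sherpa_onnx


def read_wave(filename):
    # Single-channel, 16-bit wave file -> (float32 samples in [-1, 1], sample rate).
    with wave.open(filename, "rb") as f:
        samples = np.frombuffer(f.readframes(f.getnframes()), dtype=np.int16)
        return samples.astype(np.float32) / 32768, f.getframerate()


recognizer = sherpa_onnx.OfflineRecognizer.from_paraformer(
    paraformer="./sherpa-onnx-paraformer-zh-2023-09-14/model.int8.onnx",
    tokens="./sherpa-onnx-paraformer-zh-2023-09-14/tokens.txt",
    num_threads=2,
)

samples, sample_rate = read_wave("./sherpa-onnx-paraformer-zh-2023-09-14/test_wavs/0.wav")
stream = recognizer.create_stream()
stream.accept_waveform(sample_rate, samples)
recognizer.decode_stream(stream)

result = stream.result
print(result.text)
for token, t in zip(result.tokens, result.timestamps):
    print(f"{t:.2f}s\t{token}")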
Speech recognition from a microphone

To run speech recognition from a microphone with this model, use:

cd /path/to/sherpa-onnx

./build/bin/sherpa-onnx-microphone-offline \
  --tokens=./sherpa-onnx-paraformer-zh-2023-09-14/tokens.txt \
  --paraformer=./sherpa-onnx-paraformer-zh-2023-09-14/model.int8.onnx \
  --model-type=paraformer