Paraformer models
Hint
Please refer to Installation to install sherpa-onnx before you read this section.
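If you build sherpa-onnx from source, the binaries used below (e.g., ./build/bin/sherpa-onnx-offline) are produced roughly as follows. This is only a minimal sketch; please see the Installation section for the authoritative instructions and platform-specific options.

# Build the C++ binaries; afterwards ./build/bin/ contains sherpa-onnx-offline.
git clone https://github.com/k2-fsa/sherpa-onnx
cd sherpa-onnx
mkdir build
cd build
cmake -DCMAKE_BUILD_TYPE=Release ..
make -j6
cd ..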
sherpa-onnx-paraformer-zh-int8-2025-10-07 (四川话、重庆话、川渝方言)
This model is converted from a checkpoint fine-tuned from csukuangfj/sherpa-onnx-paraformer-zh-2023-03-28 (Chinese + English)
with 10k hours of Sichuanese (川渝方言) data.
In the following, we describe how to use it.
Huggingface space
You can visit the Huggingface space to try this model in your browser.
Hint
You need to first select the language 四川话
and then select the model csukuangfj/sherpa-onnx-paraformer-zh-int8-2025-10-07.
Android APKs
Real-time speech recognition Android APKs can be found on the APK download page.
Please always download the latest version.
Hint
Please search for paraformer_四川话.apk on the above page, e.g.,
sherpa-onnx-1.12.15-arm64-v8a-simulated_streaming_asr-zh-paraformer_四川话.apk.
Hint
For Chinese users, you can also visit https://k2-fsa.github.io/sherpa/onnx/android/apk-simulate-streaming-asr-cn.html
Download
Please use the following commands to download it:
cd /path/to/sherpa-onnx
wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/sherpa-onnx-paraformer-zh-int8-2025-10-07.tar.bz2
tar xvf sherpa-onnx-paraformer-zh-int8-2025-10-07.tar.bz2
rm sherpa-onnx-paraformer-zh-int8-2025-10-07.tar.bz2
After downloading, you should find the following files:
ls -lh sherpa-onnx-paraformer-zh-int8-2025-10-07
total 491872
-rw-r--r--@ 1 fangjun staff 227M 7 Oct 20:19 model.int8.onnx
-rw-r--r--@ 1 fangjun staff 337B 7 Oct 20:19 README.md
drwxr-xr-x@ 18 fangjun staff 576B 7 Oct 20:19 test_wavs
-rw-r--r--@ 1 fangjun staff 74K 7 Oct 20:19 tokens.txt
ls sherpa-onnx-paraformer-zh-int8-2025-10-07/test_wavs/
1.wav 10.wav 11.wav 12.wav 13.wav 14.wav 15.wav 16.wav 2.wav 3.wav 4.wav 5.wav 6.wav 7.wav 8.wav 9.wav
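If you want to inspect a test wave before decoding it (for example, to confirm that it is a single-channel, 16-bit file and to see the duration used in the RTF figures below), you can use soxi from the sox package. sox is not part of sherpa-onnx; this is only an optional convenience:

# Show sample rate, channel count, bit depth, and duration of a test wave.
soxi ./sherpa-onnx-paraformer-zh-int8-2025-10-07/test_wavs/1.wav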
In the following, we show how to decode the files sherpa-onnx-paraformer-zh-int8-2025-10-07/test_wavs/*.wav.
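Each file is decoded separately below. Since sherpa-onnx-offline accepts multiple wave files in a single invocation (as the trilingual example later in this section shows), you can also decode all of the test waves in one run by letting the shell expand the glob:

cd /path/to/sherpa-onnx

./build/bin/sherpa-onnx-offline \
  --tokens=./sherpa-onnx-paraformer-zh-int8-2025-10-07/tokens.txt \
  --paraformer=./sherpa-onnx-paraformer-zh-int8-2025-10-07/model.int8.onnx \
  --num-threads=1 \
  ./sherpa-onnx-paraformer-zh-int8-2025-10-07/test_wavs/*.wav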
1.wav
| Wave filename | Ground truth |
|---|---|
| 1.wav | 来,哥哥再给你唱首歌。好儿,哎呦,把伴奏给我放起来,放就放嘛,还要躲人家钩子。 |
./build/bin/sherpa-onnx-offline \
--tokens=./sherpa-onnx-paraformer-zh-int8-2025-10-07/tokens.txt \
--paraformer=./sherpa-onnx-paraformer-zh-int8-2025-10-07/model.int8.onnx \
--num-threads=1 \
sherpa-onnx-paraformer-zh-int8-2025-10-07/test_wavs/1.wav
/Users/fangjun/open-source/sherpa-onnx/sherpa-onnx/csrc/parse-options.cc:Read:372 ./build/bin/sherpa-onnx-offline --tokens=./sherpa-onnx-paraformer-zh-int8-2025-10-07/tokens.txt --paraformer=./sherpa-onnx-paraformer-zh-int8-2025-10-07/model.int8.onnx --num-threads=1 sherpa-onnx-paraformer-zh-int8-2025-10-07/test_wavs/1.wav
OfflineRecognizerConfig(feat_config=FeatureExtractorConfig(sampling_rate=16000, feature_dim=80, low_freq=20, high_freq=-400, dither=0, normalize_samples=True, snip_edges=False), model_config=OfflineModelConfig(transducer=OfflineTransducerModelConfig(encoder_filename="", decoder_filename="", joiner_filename=""), paraformer=OfflineParaformerModelConfig(model="./sherpa-onnx-paraformer-zh-int8-2025-10-07/model.int8.onnx"), nemo_ctc=OfflineNemoEncDecCtcModelConfig(model=""), whisper=OfflineWhisperModelConfig(encoder="", decoder="", language="", task="transcribe", tail_paddings=-1), fire_red_asr=OfflineFireRedAsrModelConfig(encoder="", decoder=""), tdnn=OfflineTdnnModelConfig(model=""), zipformer_ctc=OfflineZipformerCtcModelConfig(model=""), wenet_ctc=OfflineWenetCtcModelConfig(model=""), sense_voice=OfflineSenseVoiceModelConfig(model="", language="auto", use_itn=False), moonshine=OfflineMoonshineModelConfig(preprocessor="", encoder="", uncached_decoder="", cached_decoder=""), dolphin=OfflineDolphinModelConfig(model=""), canary=OfflineCanaryModelConfig(encoder="", decoder="", src_lang="", tgt_lang="", use_pnc=True), telespeech_ctc="", tokens="./sherpa-onnx-paraformer-zh-int8-2025-10-07/tokens.txt", num_threads=1, debug=False, provider="cpu", model_type="", modeling_unit="cjkchar", bpe_vocab=""), lm_config=OfflineLMConfig(model="", scale=0.5, lodr_scale=0.01, lodr_fst="", lodr_backoff_id=-1), ctc_fst_decoder_config=OfflineCtcFstDecoderConfig(graph="", max_active=3000), decoding_method="greedy_search", max_active_paths=4, hotwords_file="", hotwords_score=1.5, blank_penalty=0, rule_fsts="", rule_fars="", hr=HomophoneReplacerConfig(lexicon="", rule_fsts=""))
Creating recognizer ...
Started
Done!
sherpa-onnx-paraformer-zh-int8-2025-10-07/test_wavs/1.wav
{"lang": "", "emotion": "", "event": "", "text": "来哥哥再给你唱首歌二哎呦把伴奏给我放起来放就放嘛还要多人家钩子", "timestamps": [], "durations": [], "tokens":["来", "哥", "哥", "再", "给", "你", "唱", "首", "歌", "二", "哎", "呦", "把", "伴", "奏", "给", "我", "放", "起", "来", "放", "就", "放", "嘛", "还", "要", "多", "人", "家", "钩", "子"], "words": []}
----
num threads: 1
decoding method: greedy_search
Elapsed seconds: 0.573 s
Real time factor (RTF): 0.573 / 7.808 = 0.073
2.wav
| Wave filename | Ground truth |
|---|---|
| 2.wav | 对不起,只有二娃才能让我真正体会作为女人的快乐。 |
./build/bin/sherpa-onnx-offline \
--tokens=./sherpa-onnx-paraformer-zh-int8-2025-10-07/tokens.txt \
--paraformer=./sherpa-onnx-paraformer-zh-int8-2025-10-07/model.int8.onnx \
--num-threads=1 \
sherpa-onnx-paraformer-zh-int8-2025-10-07/test_wavs/2.wav
/Users/fangjun/open-source/sherpa-onnx/sherpa-onnx/csrc/parse-options.cc:Read:372 ./build/bin/sherpa-onnx-offline --tokens=./sherpa-onnx-paraformer-zh-int8-2025-10-07/tokens.txt --paraformer=./sherpa-onnx-paraformer-zh-int8-2025-10-07/model.int8.onnx --num-threads=1 sherpa-onnx-paraformer-zh-int8-2025-10-07/test_wavs/2.wav
OfflineRecognizerConfig(feat_config=FeatureExtractorConfig(sampling_rate=16000, feature_dim=80, low_freq=20, high_freq=-400, dither=0, normalize_samples=True, snip_edges=False), model_config=OfflineModelConfig(transducer=OfflineTransducerModelConfig(encoder_filename="", decoder_filename="", joiner_filename=""), paraformer=OfflineParaformerModelConfig(model="./sherpa-onnx-paraformer-zh-int8-2025-10-07/model.int8.onnx"), nemo_ctc=OfflineNemoEncDecCtcModelConfig(model=""), whisper=OfflineWhisperModelConfig(encoder="", decoder="", language="", task="transcribe", tail_paddings=-1), fire_red_asr=OfflineFireRedAsrModelConfig(encoder="", decoder=""), tdnn=OfflineTdnnModelConfig(model=""), zipformer_ctc=OfflineZipformerCtcModelConfig(model=""), wenet_ctc=OfflineWenetCtcModelConfig(model=""), sense_voice=OfflineSenseVoiceModelConfig(model="", language="auto", use_itn=False), moonshine=OfflineMoonshineModelConfig(preprocessor="", encoder="", uncached_decoder="", cached_decoder=""), dolphin=OfflineDolphinModelConfig(model=""), canary=OfflineCanaryModelConfig(encoder="", decoder="", src_lang="", tgt_lang="", use_pnc=True), telespeech_ctc="", tokens="./sherpa-onnx-paraformer-zh-int8-2025-10-07/tokens.txt", num_threads=1, debug=False, provider="cpu", model_type="", modeling_unit="cjkchar", bpe_vocab=""), lm_config=OfflineLMConfig(model="", scale=0.5, lodr_scale=0.01, lodr_fst="", lodr_backoff_id=-1), ctc_fst_decoder_config=OfflineCtcFstDecoderConfig(graph="", max_active=3000), decoding_method="greedy_search", max_active_paths=4, hotwords_file="", hotwords_score=1.5, blank_penalty=0, rule_fsts="", rule_fars="", hr=HomophoneReplacerConfig(lexicon="", rule_fsts=""))
Creating recognizer ...
Started
Done!
sherpa-onnx-paraformer-zh-int8-2025-10-07/test_wavs/2.wav
{"lang": "", "emotion": "", "event": "", "text": "对不起只有二娃才能让我真正体会作为女人的快乐", "timestamps": [], "durations": [], "tokens":["对", "不", "起", "只", "有", "二", "娃", "才", "能", "让", "我", "真", "正", "体", "会", "作", "为", "女", "人", "的", "快", "乐"], "words": []}
----
num threads: 1
decoding method: greedy_search
Elapsed seconds: 0.325 s
Real time factor (RTF): 0.325 / 4.352 = 0.075
3.wav
| Wave filename | Ground truth |
|---|---|
| 3.wav | 我想去逛街,欢迎进入直播间,晚上好,那我的名字是怎么说的呢? |
./build/bin/sherpa-onnx-offline \
--tokens=./sherpa-onnx-paraformer-zh-int8-2025-10-07/tokens.txt \
--paraformer=./sherpa-onnx-paraformer-zh-int8-2025-10-07/model.int8.onnx \
--num-threads=1 \
sherpa-onnx-paraformer-zh-int8-2025-10-07/test_wavs/3.wav
/Users/fangjun/open-source/sherpa-onnx/sherpa-onnx/csrc/parse-options.cc:Read:372 ./build/bin/sherpa-onnx-offline --tokens=./sherpa-onnx-paraformer-zh-int8-2025-10-07/tokens.txt --paraformer=./sherpa-onnx-paraformer-zh-int8-2025-10-07/model.int8.onnx --num-threads=1 sherpa-onnx-paraformer-zh-int8-2025-10-07/test_wavs/3.wav
OfflineRecognizerConfig(feat_config=FeatureExtractorConfig(sampling_rate=16000, feature_dim=80, low_freq=20, high_freq=-400, dither=0, normalize_samples=True, snip_edges=False), model_config=OfflineModelConfig(transducer=OfflineTransducerModelConfig(encoder_filename="", decoder_filename="", joiner_filename=""), paraformer=OfflineParaformerModelConfig(model="./sherpa-onnx-paraformer-zh-int8-2025-10-07/model.int8.onnx"), nemo_ctc=OfflineNemoEncDecCtcModelConfig(model=""), whisper=OfflineWhisperModelConfig(encoder="", decoder="", language="", task="transcribe", tail_paddings=-1), fire_red_asr=OfflineFireRedAsrModelConfig(encoder="", decoder=""), tdnn=OfflineTdnnModelConfig(model=""), zipformer_ctc=OfflineZipformerCtcModelConfig(model=""), wenet_ctc=OfflineWenetCtcModelConfig(model=""), sense_voice=OfflineSenseVoiceModelConfig(model="", language="auto", use_itn=False), moonshine=OfflineMoonshineModelConfig(preprocessor="", encoder="", uncached_decoder="", cached_decoder=""), dolphin=OfflineDolphinModelConfig(model=""), canary=OfflineCanaryModelConfig(encoder="", decoder="", src_lang="", tgt_lang="", use_pnc=True), telespeech_ctc="", tokens="./sherpa-onnx-paraformer-zh-int8-2025-10-07/tokens.txt", num_threads=1, debug=False, provider="cpu", model_type="", modeling_unit="cjkchar", bpe_vocab=""), lm_config=OfflineLMConfig(model="", scale=0.5, lodr_scale=0.01, lodr_fst="", lodr_backoff_id=-1), ctc_fst_decoder_config=OfflineCtcFstDecoderConfig(graph="", max_active=3000), decoding_method="greedy_search", max_active_paths=4, hotwords_file="", hotwords_score=1.5, blank_penalty=0, rule_fsts="", rule_fars="", hr=HomophoneReplacerConfig(lexicon="", rule_fsts=""))
Creating recognizer ...
Started
Done!
sherpa-onnx-paraformer-zh-int8-2025-10-07/test_wavs/3.wav
{"lang": "", "emotion": "", "event": "", "text": "我想去逛街欢迎进入直播间晚上好那我的名字是怎么说的嘞", "timestamps": [], "durations": [], "tokens":["我", "想", "去", "逛", "街", "欢", "迎", "进", "入", "直", "播", "间", "晚", "上", "好", "那", "我", "的", "名", "字", "是", "怎", "么", "说", "的", "嘞"], "words": []}
----
num threads: 1
decoding method: greedy_search
Elapsed seconds: 0.498 s
Real time factor (RTF): 0.498 / 6.784 = 0.073
4.wav
| Wave filename | Ground truth |
|---|---|
| 4.wav | 梦见的就是你,不行啊,有四川话根本唱不起来,根本唱不起来呀! |
./build/bin/sherpa-onnx-offline \
--tokens=./sherpa-onnx-paraformer-zh-int8-2025-10-07/tokens.txt \
--paraformer=./sherpa-onnx-paraformer-zh-int8-2025-10-07/model.int8.onnx \
--num-threads=1 \
sherpa-onnx-paraformer-zh-int8-2025-10-07/test_wavs/4.wav
/Users/fangjun/open-source/sherpa-onnx/sherpa-onnx/csrc/parse-options.cc:Read:372 ./build/bin/sherpa-onnx-offline --tokens=./sherpa-onnx-paraformer-zh-int8-2025-10-07/tokens.txt --paraformer=./sherpa-onnx-paraformer-zh-int8-2025-10-07/model.int8.onnx --num-threads=1 sherpa-onnx-paraformer-zh-int8-2025-10-07/test_wavs/4.wav
OfflineRecognizerConfig(feat_config=FeatureExtractorConfig(sampling_rate=16000, feature_dim=80, low_freq=20, high_freq=-400, dither=0, normalize_samples=True, snip_edges=False), model_config=OfflineModelConfig(transducer=OfflineTransducerModelConfig(encoder_filename="", decoder_filename="", joiner_filename=""), paraformer=OfflineParaformerModelConfig(model="./sherpa-onnx-paraformer-zh-int8-2025-10-07/model.int8.onnx"), nemo_ctc=OfflineNemoEncDecCtcModelConfig(model=""), whisper=OfflineWhisperModelConfig(encoder="", decoder="", language="", task="transcribe", tail_paddings=-1), fire_red_asr=OfflineFireRedAsrModelConfig(encoder="", decoder=""), tdnn=OfflineTdnnModelConfig(model=""), zipformer_ctc=OfflineZipformerCtcModelConfig(model=""), wenet_ctc=OfflineWenetCtcModelConfig(model=""), sense_voice=OfflineSenseVoiceModelConfig(model="", language="auto", use_itn=False), moonshine=OfflineMoonshineModelConfig(preprocessor="", encoder="", uncached_decoder="", cached_decoder=""), dolphin=OfflineDolphinModelConfig(model=""), canary=OfflineCanaryModelConfig(encoder="", decoder="", src_lang="", tgt_lang="", use_pnc=True), telespeech_ctc="", tokens="./sherpa-onnx-paraformer-zh-int8-2025-10-07/tokens.txt", num_threads=1, debug=False, provider="cpu", model_type="", modeling_unit="cjkchar", bpe_vocab=""), lm_config=OfflineLMConfig(model="", scale=0.5, lodr_scale=0.01, lodr_fst="", lodr_backoff_id=-1), ctc_fst_decoder_config=OfflineCtcFstDecoderConfig(graph="", max_active=3000), decoding_method="greedy_search", max_active_paths=4, hotwords_file="", hotwords_score=1.5, blank_penalty=0, rule_fsts="", rule_fars="", hr=HomophoneReplacerConfig(lexicon="", rule_fsts=""))
Creating recognizer ...
Started
Done!
sherpa-onnx-paraformer-zh-int8-2025-10-07/test_wavs/4.wav
{"lang": "", "emotion": "", "event": "", "text": "梦见就是你不行啊有是这话根本藏不起啊根本藏不起来呀", "timestamps": [], "durations": [], "tokens":["梦", "见", "就", "是", "你", "不", "行", "啊", "有", "是", "这", "话", "根", "本", "藏", "不", "起", "啊", "根", "本", "藏", "不", "起", "来", "呀"], "words": []}
----
num threads: 1
decoding method: greedy_search
Elapsed seconds: 0.565 s
Real time factor (RTF): 0.565 / 7.552 = 0.075
5.wav
| Wave filename | Ground truth |
|---|---|
| 5.wav | 就临走那天挑了个飘了一下嗨呀,弟弟灵魂儿就飞上九霄云,就飘着一下魂都飞了,对不对? |
./build/bin/sherpa-onnx-offline \
--tokens=./sherpa-onnx-paraformer-zh-int8-2025-10-07/tokens.txt \
--paraformer=./sherpa-onnx-paraformer-zh-int8-2025-10-07/model.int8.onnx \
--num-threads=1 \
sherpa-onnx-paraformer-zh-int8-2025-10-07/test_wavs/5.wav
/Users/fangjun/open-source/sherpa-onnx/sherpa-onnx/csrc/parse-options.cc:Read:372 ./build/bin/sherpa-onnx-offline --tokens=./sherpa-onnx-paraformer-zh-int8-2025-10-07/tokens.txt --paraformer=./sherpa-onnx-paraformer-zh-int8-2025-10-07/model.int8.onnx --num-threads=1 sherpa-onnx-paraformer-zh-int8-2025-10-07/test_wavs/5.wav
OfflineRecognizerConfig(feat_config=FeatureExtractorConfig(sampling_rate=16000, feature_dim=80, low_freq=20, high_freq=-400, dither=0, normalize_samples=True, snip_edges=False), model_config=OfflineModelConfig(transducer=OfflineTransducerModelConfig(encoder_filename="", decoder_filename="", joiner_filename=""), paraformer=OfflineParaformerModelConfig(model="./sherpa-onnx-paraformer-zh-int8-2025-10-07/model.int8.onnx"), nemo_ctc=OfflineNemoEncDecCtcModelConfig(model=""), whisper=OfflineWhisperModelConfig(encoder="", decoder="", language="", task="transcribe", tail_paddings=-1), fire_red_asr=OfflineFireRedAsrModelConfig(encoder="", decoder=""), tdnn=OfflineTdnnModelConfig(model=""), zipformer_ctc=OfflineZipformerCtcModelConfig(model=""), wenet_ctc=OfflineWenetCtcModelConfig(model=""), sense_voice=OfflineSenseVoiceModelConfig(model="", language="auto", use_itn=False), moonshine=OfflineMoonshineModelConfig(preprocessor="", encoder="", uncached_decoder="", cached_decoder=""), dolphin=OfflineDolphinModelConfig(model=""), canary=OfflineCanaryModelConfig(encoder="", decoder="", src_lang="", tgt_lang="", use_pnc=True), telespeech_ctc="", tokens="./sherpa-onnx-paraformer-zh-int8-2025-10-07/tokens.txt", num_threads=1, debug=False, provider="cpu", model_type="", modeling_unit="cjkchar", bpe_vocab=""), lm_config=OfflineLMConfig(model="", scale=0.5, lodr_scale=0.01, lodr_fst="", lodr_backoff_id=-1), ctc_fst_decoder_config=OfflineCtcFstDecoderConfig(graph="", max_active=3000), decoding_method="greedy_search", max_active_paths=4, hotwords_file="", hotwords_score=1.5, blank_penalty=0, rule_fsts="", rule_fars="", hr=HomophoneReplacerConfig(lexicon="", rule_fsts=""))
Creating recognizer ...
Started
Done!
sherpa-onnx-paraformer-zh-int8-2025-10-07/test_wavs/5.wav
{"lang": "", "emotion": "", "event": "", "text": "就临走那个挑了个飘了一眼嗨呀弟弟灵魂儿就飞上九霄云就飘这一下魂都飞了对不对", "timestamps": [], "durations": [], "tokens":["就", "临", "走", "那", "个", "挑", "了", "个", "飘", "了", "一", "眼", "嗨", "呀", "弟", "弟", "灵", "魂", "儿", "就", "飞", "上", "九", "霄", "云", "就", "飘", "这", "一", "下", "魂", "都", "飞", "了", "对", "不", "对"], "words": []}
----
num threads: 1
decoding method: greedy_search
Elapsed seconds: 0.696 s
Real time factor (RTF): 0.696 / 9.088 = 0.077
6.wav
| Wave filename | Ground truth |
|---|---|
| 6.wav | 是不是给人感觉后头是青花亮色的然后说话是很平和的眼神是不慌乱的不散的。 |
./build/bin/sherpa-onnx-offline \
--tokens=./sherpa-onnx-paraformer-zh-int8-2025-10-07/tokens.txt \
--paraformer=./sherpa-onnx-paraformer-zh-int8-2025-10-07/model.int8.onnx \
--num-threads=1 \
sherpa-onnx-paraformer-zh-int8-2025-10-07/test_wavs/6.wav
/Users/fangjun/open-source/sherpa-onnx/sherpa-onnx/csrc/parse-options.cc:Read:372 ./build/bin/sherpa-onnx-offline --tokens=./sherpa-onnx-paraformer-zh-int8-2025-10-07/tokens.txt --paraformer=./sherpa-onnx-paraformer-zh-int8-2025-10-07/model.int8.onnx --num-threads=1 sherpa-onnx-paraformer-zh-int8-2025-10-07/test_wavs/6.wav
OfflineRecognizerConfig(feat_config=FeatureExtractorConfig(sampling_rate=16000, feature_dim=80, low_freq=20, high_freq=-400, dither=0, normalize_samples=True, snip_edges=False), model_config=OfflineModelConfig(transducer=OfflineTransducerModelConfig(encoder_filename="", decoder_filename="", joiner_filename=""), paraformer=OfflineParaformerModelConfig(model="./sherpa-onnx-paraformer-zh-int8-2025-10-07/model.int8.onnx"), nemo_ctc=OfflineNemoEncDecCtcModelConfig(model=""), whisper=OfflineWhisperModelConfig(encoder="", decoder="", language="", task="transcribe", tail_paddings=-1), fire_red_asr=OfflineFireRedAsrModelConfig(encoder="", decoder=""), tdnn=OfflineTdnnModelConfig(model=""), zipformer_ctc=OfflineZipformerCtcModelConfig(model=""), wenet_ctc=OfflineWenetCtcModelConfig(model=""), sense_voice=OfflineSenseVoiceModelConfig(model="", language="auto", use_itn=False), moonshine=OfflineMoonshineModelConfig(preprocessor="", encoder="", uncached_decoder="", cached_decoder=""), dolphin=OfflineDolphinModelConfig(model=""), canary=OfflineCanaryModelConfig(encoder="", decoder="", src_lang="", tgt_lang="", use_pnc=True), telespeech_ctc="", tokens="./sherpa-onnx-paraformer-zh-int8-2025-10-07/tokens.txt", num_threads=1, debug=False, provider="cpu", model_type="", modeling_unit="cjkchar", bpe_vocab=""), lm_config=OfflineLMConfig(model="", scale=0.5, lodr_scale=0.01, lodr_fst="", lodr_backoff_id=-1), ctc_fst_decoder_config=OfflineCtcFstDecoderConfig(graph="", max_active=3000), decoding_method="greedy_search", max_active_paths=4, hotwords_file="", hotwords_score=1.5, blank_penalty=0, rule_fsts="", rule_fars="", hr=HomophoneReplacerConfig(lexicon="", rule_fsts=""))
Creating recognizer ...
Started
Done!
sherpa-onnx-paraformer-zh-int8-2025-10-07/test_wavs/6.wav
{"lang": "", "emotion": "", "event": "", "text": "是不是给人感觉好多是青花亮色的然后说话是很平和的眼神是不慌乱的不闪的", "timestamps": [], "durations": [], "tokens":["是", "不", "是", "给", "人", "感", "觉", "好", "多", "是", "青", "花", "亮", "色", "的", "然", "后", "说", "话", "是", "很", "平", "和", "的", "眼", "神", "是", "不", "慌", "乱", "的", "不", "闪", "的"], "words": []}
----
num threads: 1
decoding method: greedy_search
Elapsed seconds: 0.589 s
Real time factor (RTF): 0.589 / 7.808 = 0.075
7.wav
| Wave filename | Ground truth |
|---|---|
| 7.wav | 他坐在椅子上,挺直起腰杆,脸上展现出灿烂的笑容。 |
./build/bin/sherpa-onnx-offline \
--tokens=./sherpa-onnx-paraformer-zh-int8-2025-10-07/tokens.txt \
--paraformer=./sherpa-onnx-paraformer-zh-int8-2025-10-07/model.int8.onnx \
--num-threads=1 \
sherpa-onnx-paraformer-zh-int8-2025-10-07/test_wavs/7.wav
/Users/fangjun/open-source/sherpa-onnx/sherpa-onnx/csrc/parse-options.cc:Read:372 ./build/bin/sherpa-onnx-offline --tokens=./sherpa-onnx-paraformer-zh-int8-2025-10-07/tokens.txt --paraformer=./sherpa-onnx-paraformer-zh-int8-2025-10-07/model.int8.onnx --num-threads=1 sherpa-onnx-paraformer-zh-int8-2025-10-07/test_wavs/7.wav
OfflineRecognizerConfig(feat_config=FeatureExtractorConfig(sampling_rate=16000, feature_dim=80, low_freq=20, high_freq=-400, dither=0, normalize_samples=True, snip_edges=False), model_config=OfflineModelConfig(transducer=OfflineTransducerModelConfig(encoder_filename="", decoder_filename="", joiner_filename=""), paraformer=OfflineParaformerModelConfig(model="./sherpa-onnx-paraformer-zh-int8-2025-10-07/model.int8.onnx"), nemo_ctc=OfflineNemoEncDecCtcModelConfig(model=""), whisper=OfflineWhisperModelConfig(encoder="", decoder="", language="", task="transcribe", tail_paddings=-1), fire_red_asr=OfflineFireRedAsrModelConfig(encoder="", decoder=""), tdnn=OfflineTdnnModelConfig(model=""), zipformer_ctc=OfflineZipformerCtcModelConfig(model=""), wenet_ctc=OfflineWenetCtcModelConfig(model=""), sense_voice=OfflineSenseVoiceModelConfig(model="", language="auto", use_itn=False), moonshine=OfflineMoonshineModelConfig(preprocessor="", encoder="", uncached_decoder="", cached_decoder=""), dolphin=OfflineDolphinModelConfig(model=""), canary=OfflineCanaryModelConfig(encoder="", decoder="", src_lang="", tgt_lang="", use_pnc=True), telespeech_ctc="", tokens="./sherpa-onnx-paraformer-zh-int8-2025-10-07/tokens.txt", num_threads=1, debug=False, provider="cpu", model_type="", modeling_unit="cjkchar", bpe_vocab=""), lm_config=OfflineLMConfig(model="", scale=0.5, lodr_scale=0.01, lodr_fst="", lodr_backoff_id=-1), ctc_fst_decoder_config=OfflineCtcFstDecoderConfig(graph="", max_active=3000), decoding_method="greedy_search", max_active_paths=4, hotwords_file="", hotwords_score=1.5, blank_penalty=0, rule_fsts="", rule_fars="", hr=HomophoneReplacerConfig(lexicon="", rule_fsts=""))
Creating recognizer ...
Started
Done!
sherpa-onnx-paraformer-zh-int8-2025-10-07/test_wavs/7.wav
{"lang": "", "emotion": "", "event": "", "text": "他坐在椅子上挺直起腰杆脸上展现出灿烂的笑容", "timestamps": [], "durations": [], "tokens":["他", "坐", "在", "椅", "子", "上", "挺", "直", "起", "腰", "杆", "脸", "上", "展", "现", "出", "灿", "烂", "的", "笑", "容"], "words": []}
----
num threads: 1
decoding method: greedy_search
Elapsed seconds: 0.559 s
Real time factor (RTF): 0.559 / 7.600 = 0.074
8.wav
| Wave filename | Ground truth |
|---|---|
| 8.wav | 唤起路由无限的感慨,使他,更加痛恨官场的欺诈污浊。 |
./build/bin/sherpa-onnx-offline \
--tokens=./sherpa-onnx-paraformer-zh-int8-2025-10-07/tokens.txt \
--paraformer=./sherpa-onnx-paraformer-zh-int8-2025-10-07/model.int8.onnx \
--num-threads=1 \
sherpa-onnx-paraformer-zh-int8-2025-10-07/test_wavs/8.wav
/Users/fangjun/open-source/sherpa-onnx/sherpa-onnx/csrc/parse-options.cc:Read:372 ./build/bin/sherpa-onnx-offline --tokens=./sherpa-onnx-paraformer-zh-int8-2025-10-07/tokens.txt --paraformer=./sherpa-onnx-paraformer-zh-int8-2025-10-07/model.int8.onnx --num-threads=1 sherpa-onnx-paraformer-zh-int8-2025-10-07/test_wavs/8.wav
OfflineRecognizerConfig(feat_config=FeatureExtractorConfig(sampling_rate=16000, feature_dim=80, low_freq=20, high_freq=-400, dither=0, normalize_samples=True, snip_edges=False), model_config=OfflineModelConfig(transducer=OfflineTransducerModelConfig(encoder_filename="", decoder_filename="", joiner_filename=""), paraformer=OfflineParaformerModelConfig(model="./sherpa-onnx-paraformer-zh-int8-2025-10-07/model.int8.onnx"), nemo_ctc=OfflineNemoEncDecCtcModelConfig(model=""), whisper=OfflineWhisperModelConfig(encoder="", decoder="", language="", task="transcribe", tail_paddings=-1), fire_red_asr=OfflineFireRedAsrModelConfig(encoder="", decoder=""), tdnn=OfflineTdnnModelConfig(model=""), zipformer_ctc=OfflineZipformerCtcModelConfig(model=""), wenet_ctc=OfflineWenetCtcModelConfig(model=""), sense_voice=OfflineSenseVoiceModelConfig(model="", language="auto", use_itn=False), moonshine=OfflineMoonshineModelConfig(preprocessor="", encoder="", uncached_decoder="", cached_decoder=""), dolphin=OfflineDolphinModelConfig(model=""), canary=OfflineCanaryModelConfig(encoder="", decoder="", src_lang="", tgt_lang="", use_pnc=True), telespeech_ctc="", tokens="./sherpa-onnx-paraformer-zh-int8-2025-10-07/tokens.txt", num_threads=1, debug=False, provider="cpu", model_type="", modeling_unit="cjkchar", bpe_vocab=""), lm_config=OfflineLMConfig(model="", scale=0.5, lodr_scale=0.01, lodr_fst="", lodr_backoff_id=-1), ctc_fst_decoder_config=OfflineCtcFstDecoderConfig(graph="", max_active=3000), decoding_method="greedy_search", max_active_paths=4, hotwords_file="", hotwords_score=1.5, blank_penalty=0, rule_fsts="", rule_fars="", hr=HomophoneReplacerConfig(lexicon="", rule_fsts=""))
Creating recognizer ...
Started
Done!
sherpa-onnx-paraformer-zh-int8-2025-10-07/test_wavs/8.wav
{"lang": "", "emotion": "", "event": "", "text": "换起路游无限的感慨使他更加痛恨官场的倾炸污浊", "timestamps": [], "durations": [], "tokens":["换", "起", "路", "游", "无", "限", "的", "感", "慨", "使", "他", "更", "加", "痛", "恨", "官", "场", "的", "倾", "炸", "污", "浊"], "words": []}
----
num threads: 1
decoding method: greedy_search
Elapsed seconds: 0.541 s
Real time factor (RTF): 0.541 / 7.296 = 0.074
9.wav
| Wave filename | Ground truth |
|---|---|
| 9.wav | 看面貌约五十左右却自称活了两百多岁,在清顺治时出家当过和尚,还有杜蝶为证。 |
./build/bin/sherpa-onnx-offline \
--tokens=./sherpa-onnx-paraformer-zh-int8-2025-10-07/tokens.txt \
--paraformer=./sherpa-onnx-paraformer-zh-int8-2025-10-07/model.int8.onnx \
--num-threads=1 \
sherpa-onnx-paraformer-zh-int8-2025-10-07/test_wavs/9.wav
/Users/fangjun/open-source/sherpa-onnx/sherpa-onnx/csrc/parse-options.cc:Read:372 ./build/bin/sherpa-onnx-offline --tokens=./sherpa-onnx-paraformer-zh-int8-2025-10-07/tokens.txt --paraformer=./sherpa-onnx-paraformer-zh-int8-2025-10-07/model.int8.onnx --num-threads=1 sherpa-onnx-paraformer-zh-int8-2025-10-07/test_wavs/9.wav
OfflineRecognizerConfig(feat_config=FeatureExtractorConfig(sampling_rate=16000, feature_dim=80, low_freq=20, high_freq=-400, dither=0, normalize_samples=True, snip_edges=False), model_config=OfflineModelConfig(transducer=OfflineTransducerModelConfig(encoder_filename="", decoder_filename="", joiner_filename=""), paraformer=OfflineParaformerModelConfig(model="./sherpa-onnx-paraformer-zh-int8-2025-10-07/model.int8.onnx"), nemo_ctc=OfflineNemoEncDecCtcModelConfig(model=""), whisper=OfflineWhisperModelConfig(encoder="", decoder="", language="", task="transcribe", tail_paddings=-1), fire_red_asr=OfflineFireRedAsrModelConfig(encoder="", decoder=""), tdnn=OfflineTdnnModelConfig(model=""), zipformer_ctc=OfflineZipformerCtcModelConfig(model=""), wenet_ctc=OfflineWenetCtcModelConfig(model=""), sense_voice=OfflineSenseVoiceModelConfig(model="", language="auto", use_itn=False), moonshine=OfflineMoonshineModelConfig(preprocessor="", encoder="", uncached_decoder="", cached_decoder=""), dolphin=OfflineDolphinModelConfig(model=""), canary=OfflineCanaryModelConfig(encoder="", decoder="", src_lang="", tgt_lang="", use_pnc=True), telespeech_ctc="", tokens="./sherpa-onnx-paraformer-zh-int8-2025-10-07/tokens.txt", num_threads=1, debug=False, provider="cpu", model_type="", modeling_unit="cjkchar", bpe_vocab=""), lm_config=OfflineLMConfig(model="", scale=0.5, lodr_scale=0.01, lodr_fst="", lodr_backoff_id=-1), ctc_fst_decoder_config=OfflineCtcFstDecoderConfig(graph="", max_active=3000), decoding_method="greedy_search", max_active_paths=4, hotwords_file="", hotwords_score=1.5, blank_penalty=0, rule_fsts="", rule_fars="", hr=HomophoneReplacerConfig(lexicon="", rule_fsts=""))
Creating recognizer ...
Started
Done!
sherpa-onnx-paraformer-zh-int8-2025-10-07/test_wavs/9.wav
{"lang": "", "emotion": "", "event": "", "text": "看面貌约五十左右却自称活了两百多岁在清顺治时出家当过和尚还有杜蝶为政", "timestamps": [], "durations": [], "tokens":["看", "面", "貌", "约", "五", "十", "左", "右", "却", "自", "称", "活", "了", "两", "百", "多", "岁", "在", "清", "顺", "治", "时", "出", "家", "当", "过", "和", "尚", "还", "有", "杜", "蝶", "为", "政"], "words": []}
----
num threads: 1
decoding method: greedy_search
Elapsed seconds: 0.694 s
Real time factor (RTF): 0.694 / 9.088 = 0.076
10.wav
| Wave filename | Ground truth |
|---|---|
| 10.wav | 其言曰,士大夫以其见闻之广反各有所偏,自有负担杀者有负良骑者。 |
./build/bin/sherpa-onnx-offline \
--tokens=./sherpa-onnx-paraformer-zh-int8-2025-10-07/tokens.txt \
--paraformer=./sherpa-onnx-paraformer-zh-int8-2025-10-07/model.int8.onnx \
--num-threads=1 \
sherpa-onnx-paraformer-zh-int8-2025-10-07/test_wavs/10.wav
/Users/fangjun/open-source/sherpa-onnx/sherpa-onnx/csrc/parse-options.cc:Read:372 ./build/bin/sherpa-onnx-offline --tokens=./sherpa-onnx-paraformer-zh-int8-2025-10-07/tokens.txt --paraformer=./sherpa-onnx-paraformer-zh-int8-2025-10-07/model.int8.onnx --num-threads=1 sherpa-onnx-paraformer-zh-int8-2025-10-07/test_wavs/10.wav
OfflineRecognizerConfig(feat_config=FeatureExtractorConfig(sampling_rate=16000, feature_dim=80, low_freq=20, high_freq=-400, dither=0, normalize_samples=True, snip_edges=False), model_config=OfflineModelConfig(transducer=OfflineTransducerModelConfig(encoder_filename="", decoder_filename="", joiner_filename=""), paraformer=OfflineParaformerModelConfig(model="./sherpa-onnx-paraformer-zh-int8-2025-10-07/model.int8.onnx"), nemo_ctc=OfflineNemoEncDecCtcModelConfig(model=""), whisper=OfflineWhisperModelConfig(encoder="", decoder="", language="", task="transcribe", tail_paddings=-1), fire_red_asr=OfflineFireRedAsrModelConfig(encoder="", decoder=""), tdnn=OfflineTdnnModelConfig(model=""), zipformer_ctc=OfflineZipformerCtcModelConfig(model=""), wenet_ctc=OfflineWenetCtcModelConfig(model=""), sense_voice=OfflineSenseVoiceModelConfig(model="", language="auto", use_itn=False), moonshine=OfflineMoonshineModelConfig(preprocessor="", encoder="", uncached_decoder="", cached_decoder=""), dolphin=OfflineDolphinModelConfig(model=""), canary=OfflineCanaryModelConfig(encoder="", decoder="", src_lang="", tgt_lang="", use_pnc=True), telespeech_ctc="", tokens="./sherpa-onnx-paraformer-zh-int8-2025-10-07/tokens.txt", num_threads=1, debug=False, provider="cpu", model_type="", modeling_unit="cjkchar", bpe_vocab=""), lm_config=OfflineLMConfig(model="", scale=0.5, lodr_scale=0.01, lodr_fst="", lodr_backoff_id=-1), ctc_fst_decoder_config=OfflineCtcFstDecoderConfig(graph="", max_active=3000), decoding_method="greedy_search", max_active_paths=4, hotwords_file="", hotwords_score=1.5, blank_penalty=0, rule_fsts="", rule_fars="", hr=HomophoneReplacerConfig(lexicon="", rule_fsts=""))
Creating recognizer ...
Started
Done!
sherpa-onnx-paraformer-zh-int8-2025-10-07/test_wavs/10.wav
{"lang": "", "emotion": "", "event": "", "text": "其言曰士大夫一其见闻之广反各有所偏自有福单杀者有福良记者", "timestamps": [], "durations": [], "tokens":["其", "言", "曰", "士", "大", "夫", "一", "其", "见", "闻", "之", "广", "反", "各", "有", "所", "偏", "自", "有", "福", "单", "杀", "者", "有", "福", "良", "记", "者"], "words": []}
----
num threads: 1
decoding method: greedy_search
Elapsed seconds: 0.611 s
Real time factor (RTF): 0.611 / 8.192 = 0.075
11.wav
| Wave filename | Ground truth |
|---|---|
| 11.wav | 据说有网友坐飞机的时候呢,广播全程播报。 |
./build/bin/sherpa-onnx-offline \
--tokens=./sherpa-onnx-paraformer-zh-int8-2025-10-07/tokens.txt \
--paraformer=./sherpa-onnx-paraformer-zh-int8-2025-10-07/model.int8.onnx \
--num-threads=1 \
sherpa-onnx-paraformer-zh-int8-2025-10-07/test_wavs/11.wav
/Users/fangjun/open-source/sherpa-onnx/sherpa-onnx/csrc/parse-options.cc:Read:372 ./build/bin/sherpa-onnx-offline --tokens=./sherpa-onnx-paraformer-zh-int8-2025-10-07/tokens.txt --paraformer=./sherpa-onnx-paraformer-zh-int8-2025-10-07/model.int8.onnx --num-threads=1 sherpa-onnx-paraformer-zh-int8-2025-10-07/test_wavs/11.wav
OfflineRecognizerConfig(feat_config=FeatureExtractorConfig(sampling_rate=16000, feature_dim=80, low_freq=20, high_freq=-400, dither=0, normalize_samples=True, snip_edges=False), model_config=OfflineModelConfig(transducer=OfflineTransducerModelConfig(encoder_filename="", decoder_filename="", joiner_filename=""), paraformer=OfflineParaformerModelConfig(model="./sherpa-onnx-paraformer-zh-int8-2025-10-07/model.int8.onnx"), nemo_ctc=OfflineNemoEncDecCtcModelConfig(model=""), whisper=OfflineWhisperModelConfig(encoder="", decoder="", language="", task="transcribe", tail_paddings=-1), fire_red_asr=OfflineFireRedAsrModelConfig(encoder="", decoder=""), tdnn=OfflineTdnnModelConfig(model=""), zipformer_ctc=OfflineZipformerCtcModelConfig(model=""), wenet_ctc=OfflineWenetCtcModelConfig(model=""), sense_voice=OfflineSenseVoiceModelConfig(model="", language="auto", use_itn=False), moonshine=OfflineMoonshineModelConfig(preprocessor="", encoder="", uncached_decoder="", cached_decoder=""), dolphin=OfflineDolphinModelConfig(model=""), canary=OfflineCanaryModelConfig(encoder="", decoder="", src_lang="", tgt_lang="", use_pnc=True), telespeech_ctc="", tokens="./sherpa-onnx-paraformer-zh-int8-2025-10-07/tokens.txt", num_threads=1, debug=False, provider="cpu", model_type="", modeling_unit="cjkchar", bpe_vocab=""), lm_config=OfflineLMConfig(model="", scale=0.5, lodr_scale=0.01, lodr_fst="", lodr_backoff_id=-1), ctc_fst_decoder_config=OfflineCtcFstDecoderConfig(graph="", max_active=3000), decoding_method="greedy_search", max_active_paths=4, hotwords_file="", hotwords_score=1.5, blank_penalty=0, rule_fsts="", rule_fars="", hr=HomophoneReplacerConfig(lexicon="", rule_fsts=""))
Creating recognizer ...
Started
Done!
sherpa-onnx-paraformer-zh-int8-2025-10-07/test_wavs/11.wav
{"lang": "", "emotion": "", "event": "", "text": "据说有网友坐飞机的时候呢广播全程播报", "timestamps": [], "durations": [], "tokens":["据", "说", "有", "网", "友", "坐", "飞", "机", "的", "时", "候", "呢", "广", "播", "全", "程", "播", "报"], "words": []}
----
num threads: 1
decoding method: greedy_search
Elapsed seconds: 0.294 s
Real time factor (RTF): 0.294 / 3.712 = 0.079
12.wav
| Wave filename | Ground truth |
|---|---|
| 12.wav | 将溃疡两周以上都应该及时就医,据了解啊小云平时呢都喜欢吃比较烫的饭菜,也喜欢吃麻辣烫火锅之类的高温食物。 |
./build/bin/sherpa-onnx-offline \
--tokens=./sherpa-onnx-paraformer-zh-int8-2025-10-07/tokens.txt \
--paraformer=./sherpa-onnx-paraformer-zh-int8-2025-10-07/model.int8.onnx \
--num-threads=1 \
sherpa-onnx-paraformer-zh-int8-2025-10-07/test_wavs/12.wav
/Users/fangjun/open-source/sherpa-onnx/sherpa-onnx/csrc/parse-options.cc:Read:372 ./build/bin/sherpa-onnx-offline --tokens=./sherpa-onnx-paraformer-zh-int8-2025-10-07/tokens.txt --paraformer=./sherpa-onnx-paraformer-zh-int8-2025-10-07/model.int8.onnx --num-threads=1 sherpa-onnx-paraformer-zh-int8-2025-10-07/test_wavs/12.wav
OfflineRecognizerConfig(feat_config=FeatureExtractorConfig(sampling_rate=16000, feature_dim=80, low_freq=20, high_freq=-400, dither=0, normalize_samples=True, snip_edges=False), model_config=OfflineModelConfig(transducer=OfflineTransducerModelConfig(encoder_filename="", decoder_filename="", joiner_filename=""), paraformer=OfflineParaformerModelConfig(model="./sherpa-onnx-paraformer-zh-int8-2025-10-07/model.int8.onnx"), nemo_ctc=OfflineNemoEncDecCtcModelConfig(model=""), whisper=OfflineWhisperModelConfig(encoder="", decoder="", language="", task="transcribe", tail_paddings=-1), fire_red_asr=OfflineFireRedAsrModelConfig(encoder="", decoder=""), tdnn=OfflineTdnnModelConfig(model=""), zipformer_ctc=OfflineZipformerCtcModelConfig(model=""), wenet_ctc=OfflineWenetCtcModelConfig(model=""), sense_voice=OfflineSenseVoiceModelConfig(model="", language="auto", use_itn=False), moonshine=OfflineMoonshineModelConfig(preprocessor="", encoder="", uncached_decoder="", cached_decoder=""), dolphin=OfflineDolphinModelConfig(model=""), canary=OfflineCanaryModelConfig(encoder="", decoder="", src_lang="", tgt_lang="", use_pnc=True), telespeech_ctc="", tokens="./sherpa-onnx-paraformer-zh-int8-2025-10-07/tokens.txt", num_threads=1, debug=False, provider="cpu", model_type="", modeling_unit="cjkchar", bpe_vocab=""), lm_config=OfflineLMConfig(model="", scale=0.5, lodr_scale=0.01, lodr_fst="", lodr_backoff_id=-1), ctc_fst_decoder_config=OfflineCtcFstDecoderConfig(graph="", max_active=3000), decoding_method="greedy_search", max_active_paths=4, hotwords_file="", hotwords_score=1.5, blank_penalty=0, rule_fsts="", rule_fars="", hr=HomophoneReplacerConfig(lexicon="", rule_fsts=""))
Creating recognizer ...
Started
Done!
sherpa-onnx-paraformer-zh-int8-2025-10-07/test_wavs/12.wav
{"lang": "", "emotion": "", "event": "", "text": "腔溃疡两周以上都应该及时就医据了解啊小云平时呢都喜欢吃比较烫的饭菜也喜欢吃麻辣烫火锅之类的高温食物", "timestamps": [], "durations": [], "tokens":["腔", "溃", "疡", "两", "周", "以", "上", "都", "应", "该", "及", "时", "就", "医", "据", "了", "解", "啊", "小", "云", "平", "时", "呢", "都", "喜", "欢", "吃", "比", "较", "烫", "的", "饭", "菜", "也", "喜", "欢", "吃", "麻", "辣", "烫", "火", "锅", "之", "类", "的", "高", "温", "食", "物"], "words": []}
----
num threads: 1
decoding method: greedy_search
Elapsed seconds: 0.818 s
Real time factor (RTF): 0.818 / 10.880 = 0.075
13.wav
| Wave filename | Ground truth |
|---|---|
| 13.wav | 绝佳好位置好像我被看到了,就问你敢不敢进来吧你,一套带走猪脚亮。 |
./build/bin/sherpa-onnx-offline \
--tokens=./sherpa-onnx-paraformer-zh-int8-2025-10-07/tokens.txt \
--paraformer=./sherpa-onnx-paraformer-zh-int8-2025-10-07/model.int8.onnx \
--num-threads=1 \
sherpa-onnx-paraformer-zh-int8-2025-10-07/test_wavs/13.wav
/Users/fangjun/open-source/sherpa-onnx/sherpa-onnx/csrc/parse-options.cc:Read:372 ./build/bin/sherpa-onnx-offline --tokens=./sherpa-onnx-paraformer-zh-int8-2025-10-07/tokens.txt --paraformer=./sherpa-onnx-paraformer-zh-int8-2025-10-07/model.int8.onnx --num-threads=1 sherpa-onnx-paraformer-zh-int8-2025-10-07/test_wavs/13.wav
OfflineRecognizerConfig(feat_config=FeatureExtractorConfig(sampling_rate=16000, feature_dim=80, low_freq=20, high_freq=-400, dither=0, normalize_samples=True, snip_edges=False), model_config=OfflineModelConfig(transducer=OfflineTransducerModelConfig(encoder_filename="", decoder_filename="", joiner_filename=""), paraformer=OfflineParaformerModelConfig(model="./sherpa-onnx-paraformer-zh-int8-2025-10-07/model.int8.onnx"), nemo_ctc=OfflineNemoEncDecCtcModelConfig(model=""), whisper=OfflineWhisperModelConfig(encoder="", decoder="", language="", task="transcribe", tail_paddings=-1), fire_red_asr=OfflineFireRedAsrModelConfig(encoder="", decoder=""), tdnn=OfflineTdnnModelConfig(model=""), zipformer_ctc=OfflineZipformerCtcModelConfig(model=""), wenet_ctc=OfflineWenetCtcModelConfig(model=""), sense_voice=OfflineSenseVoiceModelConfig(model="", language="auto", use_itn=False), moonshine=OfflineMoonshineModelConfig(preprocessor="", encoder="", uncached_decoder="", cached_decoder=""), dolphin=OfflineDolphinModelConfig(model=""), canary=OfflineCanaryModelConfig(encoder="", decoder="", src_lang="", tgt_lang="", use_pnc=True), telespeech_ctc="", tokens="./sherpa-onnx-paraformer-zh-int8-2025-10-07/tokens.txt", num_threads=1, debug=False, provider="cpu", model_type="", modeling_unit="cjkchar", bpe_vocab=""), lm_config=OfflineLMConfig(model="", scale=0.5, lodr_scale=0.01, lodr_fst="", lodr_backoff_id=-1), ctc_fst_decoder_config=OfflineCtcFstDecoderConfig(graph="", max_active=3000), decoding_method="greedy_search", max_active_paths=4, hotwords_file="", hotwords_score=1.5, blank_penalty=0, rule_fsts="", rule_fars="", hr=HomophoneReplacerConfig(lexicon="", rule_fsts=""))
Creating recognizer ...
Started
Done!
sherpa-onnx-paraformer-zh-int8-2025-10-07/test_wavs/13.wav
{"lang": "", "emotion": "", "event": "", "text": "绝佳好位置好像我被看到了如问你敢不敢进来把你一套带走猪脚亮", "timestamps": [], "durations": [], "tokens":["绝", "佳", "好", "位", "置", "好", "像", "我", "被", "看", "到", "了", "如", "问", "你", "敢", "不", "敢", "进", "来", "把", "你", "一", "套", "带", "走", "猪", "脚", "亮"], "words": []}
----
num threads: 1
decoding method: greedy_search
Elapsed seconds: 0.444 s
Real time factor (RTF): 0.444 / 5.760 = 0.077
14.wav
| Wave filename | Ground truth |
|---|---|
| 14.wav | 两岸猿声啼不住,有家难回车里住。 |
./build/bin/sherpa-onnx-offline \
--tokens=./sherpa-onnx-paraformer-zh-int8-2025-10-07/tokens.txt \
--paraformer=./sherpa-onnx-paraformer-zh-int8-2025-10-07/model.int8.onnx \
--num-threads=1 \
sherpa-onnx-paraformer-zh-int8-2025-10-07/test_wavs/14.wav
/Users/fangjun/open-source/sherpa-onnx/sherpa-onnx/csrc/parse-options.cc:Read:372 ./build/bin/sherpa-onnx-offline --tokens=./sherpa-onnx-paraformer-zh-int8-2025-10-07/tokens.txt --paraformer=./sherpa-onnx-paraformer-zh-int8-2025-10-07/model.int8.onnx --num-threads=1 sherpa-onnx-paraformer-zh-int8-2025-10-07/test_wavs/14.wav
OfflineRecognizerConfig(feat_config=FeatureExtractorConfig(sampling_rate=16000, feature_dim=80, low_freq=20, high_freq=-400, dither=0, normalize_samples=True, snip_edges=False), model_config=OfflineModelConfig(transducer=OfflineTransducerModelConfig(encoder_filename="", decoder_filename="", joiner_filename=""), paraformer=OfflineParaformerModelConfig(model="./sherpa-onnx-paraformer-zh-int8-2025-10-07/model.int8.onnx"), nemo_ctc=OfflineNemoEncDecCtcModelConfig(model=""), whisper=OfflineWhisperModelConfig(encoder="", decoder="", language="", task="transcribe", tail_paddings=-1), fire_red_asr=OfflineFireRedAsrModelConfig(encoder="", decoder=""), tdnn=OfflineTdnnModelConfig(model=""), zipformer_ctc=OfflineZipformerCtcModelConfig(model=""), wenet_ctc=OfflineWenetCtcModelConfig(model=""), sense_voice=OfflineSenseVoiceModelConfig(model="", language="auto", use_itn=False), moonshine=OfflineMoonshineModelConfig(preprocessor="", encoder="", uncached_decoder="", cached_decoder=""), dolphin=OfflineDolphinModelConfig(model=""), canary=OfflineCanaryModelConfig(encoder="", decoder="", src_lang="", tgt_lang="", use_pnc=True), telespeech_ctc="", tokens="./sherpa-onnx-paraformer-zh-int8-2025-10-07/tokens.txt", num_threads=1, debug=False, provider="cpu", model_type="", modeling_unit="cjkchar", bpe_vocab=""), lm_config=OfflineLMConfig(model="", scale=0.5, lodr_scale=0.01, lodr_fst="", lodr_backoff_id=-1), ctc_fst_decoder_config=OfflineCtcFstDecoderConfig(graph="", max_active=3000), decoding_method="greedy_search", max_active_paths=4, hotwords_file="", hotwords_score=1.5, blank_penalty=0, rule_fsts="", rule_fars="", hr=HomophoneReplacerConfig(lexicon="", rule_fsts=""))
Creating recognizer ...
Started
Done!
sherpa-onnx-paraformer-zh-int8-2025-10-07/test_wavs/14.wav
{"lang": "", "emotion": "", "event": "", "text": "两岸猿声啼不住有家难回车里住", "timestamps": [], "durations": [], "tokens":["两", "岸", "猿", "声", "啼", "不", "住", "有", "家", "难", "回", "车", "里", "住"], "words": []}
----
num threads: 1
decoding method: greedy_search
Elapsed seconds: 0.418 s
Real time factor (RTF): 0.418 / 5.632 = 0.074
15.wav
| Wave filename | Ground truth |
|---|---|
| 15.wav | 杨大人一律就退还会再要求,以关注货币,来补助这个差额,天宝年间杨胜坚转任。 |
./build/bin/sherpa-onnx-offline \
--tokens=./sherpa-onnx-paraformer-zh-int8-2025-10-07/tokens.txt \
--paraformer=./sherpa-onnx-paraformer-zh-int8-2025-10-07/model.int8.onnx \
--num-threads=1 \
sherpa-onnx-paraformer-zh-int8-2025-10-07/test_wavs/15.wav
/Users/fangjun/open-source/sherpa-onnx/sherpa-onnx/csrc/parse-options.cc:Read:372 ./build/bin/sherpa-onnx-offline --tokens=./sherpa-onnx-paraformer-zh-int8-2025-10-07/tokens.txt --paraformer=./sherpa-onnx-paraformer-zh-int8-2025-10-07/model.int8.onnx --num-threads=1 sherpa-onnx-paraformer-zh-int8-2025-10-07/test_wavs/15.wav
OfflineRecognizerConfig(feat_config=FeatureExtractorConfig(sampling_rate=16000, feature_dim=80, low_freq=20, high_freq=-400, dither=0, normalize_samples=True, snip_edges=False), model_config=OfflineModelConfig(transducer=OfflineTransducerModelConfig(encoder_filename="", decoder_filename="", joiner_filename=""), paraformer=OfflineParaformerModelConfig(model="./sherpa-onnx-paraformer-zh-int8-2025-10-07/model.int8.onnx"), nemo_ctc=OfflineNemoEncDecCtcModelConfig(model=""), whisper=OfflineWhisperModelConfig(encoder="", decoder="", language="", task="transcribe", tail_paddings=-1), fire_red_asr=OfflineFireRedAsrModelConfig(encoder="", decoder=""), tdnn=OfflineTdnnModelConfig(model=""), zipformer_ctc=OfflineZipformerCtcModelConfig(model=""), wenet_ctc=OfflineWenetCtcModelConfig(model=""), sense_voice=OfflineSenseVoiceModelConfig(model="", language="auto", use_itn=False), moonshine=OfflineMoonshineModelConfig(preprocessor="", encoder="", uncached_decoder="", cached_decoder=""), dolphin=OfflineDolphinModelConfig(model=""), canary=OfflineCanaryModelConfig(encoder="", decoder="", src_lang="", tgt_lang="", use_pnc=True), telespeech_ctc="", tokens="./sherpa-onnx-paraformer-zh-int8-2025-10-07/tokens.txt", num_threads=1, debug=False, provider="cpu", model_type="", modeling_unit="cjkchar", bpe_vocab=""), lm_config=OfflineLMConfig(model="", scale=0.5, lodr_scale=0.01, lodr_fst="", lodr_backoff_id=-1), ctc_fst_decoder_config=OfflineCtcFstDecoderConfig(graph="", max_active=3000), decoding_method="greedy_search", max_active_paths=4, hotwords_file="", hotwords_score=1.5, blank_penalty=0, rule_fsts="", rule_fars="", hr=HomophoneReplacerConfig(lexicon="", rule_fsts=""))
Creating recognizer ...
Started
Done!
sherpa-onnx-paraformer-zh-int8-2025-10-07/test_wavs/15.wav
{"lang": "", "emotion": "", "event": "", "text": "杨大人一律就退还会者要求以关注和必来补助这个差额天宝年间杨胜坚转任", "timestamps": [], "durations": [], "tokens":["杨", "大", "人", "一", "律", "就", "退", "还", "会", "者", "要", "求", "以", "关", "注", "和", "必", "来", "补", "助", "这", "个", "差", "额", "天", "宝", "年", "间", "杨", "胜", "坚", "转", "任"], "words": []}
----
num threads: 1
decoding method: greedy_search
Elapsed seconds: 0.758 s
Real time factor (RTF): 0.758 / 10.112 = 0.075
16.wav
| Wave filename | Ground truth |
|---|---|
| 16.wav | 做钱的速度还快,这真的是,一个经济爆发式增长的时代。 |
./build/bin/sherpa-onnx-offline \
--tokens=./sherpa-onnx-paraformer-zh-int8-2025-10-07/tokens.txt \
--paraformer=./sherpa-onnx-paraformer-zh-int8-2025-10-07/model.int8.onnx \
--num-threads=1 \
sherpa-onnx-paraformer-zh-int8-2025-10-07/test_wavs/16.wav
/Users/fangjun/open-source/sherpa-onnx/sherpa-onnx/csrc/parse-options.cc:Read:372 ./build/bin/sherpa-onnx-offline --tokens=./sherpa-onnx-paraformer-zh-int8-2025-10-07/tokens.txt --paraformer=./sherpa-onnx-paraformer-zh-int8-2025-10-07/model.int8.onnx --num-threads=1 sherpa-onnx-paraformer-zh-int8-2025-10-07/test_wavs/16.wav
OfflineRecognizerConfig(feat_config=FeatureExtractorConfig(sampling_rate=16000, feature_dim=80, low_freq=20, high_freq=-400, dither=0, normalize_samples=True, snip_edges=False), model_config=OfflineModelConfig(transducer=OfflineTransducerModelConfig(encoder_filename="", decoder_filename="", joiner_filename=""), paraformer=OfflineParaformerModelConfig(model="./sherpa-onnx-paraformer-zh-int8-2025-10-07/model.int8.onnx"), nemo_ctc=OfflineNemoEncDecCtcModelConfig(model=""), whisper=OfflineWhisperModelConfig(encoder="", decoder="", language="", task="transcribe", tail_paddings=-1), fire_red_asr=OfflineFireRedAsrModelConfig(encoder="", decoder=""), tdnn=OfflineTdnnModelConfig(model=""), zipformer_ctc=OfflineZipformerCtcModelConfig(model=""), wenet_ctc=OfflineWenetCtcModelConfig(model=""), sense_voice=OfflineSenseVoiceModelConfig(model="", language="auto", use_itn=False), moonshine=OfflineMoonshineModelConfig(preprocessor="", encoder="", uncached_decoder="", cached_decoder=""), dolphin=OfflineDolphinModelConfig(model=""), canary=OfflineCanaryModelConfig(encoder="", decoder="", src_lang="", tgt_lang="", use_pnc=True), telespeech_ctc="", tokens="./sherpa-onnx-paraformer-zh-int8-2025-10-07/tokens.txt", num_threads=1, debug=False, provider="cpu", model_type="", modeling_unit="cjkchar", bpe_vocab=""), lm_config=OfflineLMConfig(model="", scale=0.5, lodr_scale=0.01, lodr_fst="", lodr_backoff_id=-1), ctc_fst_decoder_config=OfflineCtcFstDecoderConfig(graph="", max_active=3000), decoding_method="greedy_search", max_active_paths=4, hotwords_file="", hotwords_score=1.5, blank_penalty=0, rule_fsts="", rule_fars="", hr=HomophoneReplacerConfig(lexicon="", rule_fsts=""))
Creating recognizer ...
Started
Done!
sherpa-onnx-paraformer-zh-int8-2025-10-07/test_wavs/16.wav
{"lang": "", "emotion": "", "event": "", "text": "做钱的速度还快这真的是一个经济爆发式增长的时代", "timestamps": [], "durations": [], "tokens":["做", "钱", "的", "速", "度", "还", "快", "这", "真", "的", "是", "一", "个", "经", "济", "爆", "发", "式", "增", "长", "的", "时", "代"], "words": []}
----
num threads: 1
decoding method: greedy_search
Elapsed seconds: 0.428 s
Real time factor (RTF): 0.428 / 5.632 = 0.076
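To compare the recognized text with the ground-truth tables above without reading the full logs, you can use a small shell loop that keeps only the result line of each run. This is just a convenience sketch; 2>&1 is used so that the result is captured regardless of which stream it is printed to:

for w in ./sherpa-onnx-paraformer-zh-int8-2025-10-07/test_wavs/*.wav; do
  echo "=== $w ==="
  ./build/bin/sherpa-onnx-offline \
    --tokens=./sherpa-onnx-paraformer-zh-int8-2025-10-07/tokens.txt \
    --paraformer=./sherpa-onnx-paraformer-zh-int8-2025-10-07/model.int8.onnx \
    --num-threads=1 \
    "$w" 2>&1 | grep '"text"'
done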
Speech recognition from a microphone
./build/bin/sherpa-onnx-microphone-offline \
--tokens=./sherpa-onnx-paraformer-zh-int8-2025-10-07/tokens.txt \
--paraformer=./sherpa-onnx-paraformer-zh-int8-2025-10-07/model.int8.onnx
Speech recognition from a microphone with VAD
wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/silero_vad.onnx
./build/bin/sherpa-onnx-vad-microphone-offline-asr \
--silero-vad-model=./silero_vad.onnx \
--tokens=./sherpa-onnx-paraformer-zh-int8-2025-10-07/tokens.txt \
--paraformer=./sherpa-onnx-paraformer-zh-int8-2025-10-07/model.int8.onnx
Real-time speech recognition from a microphone with VAD
wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/silero_vad.onnx
./build/bin/sherpa-onnx-vad-microphone-simulated-streaming-asr \
--silero-vad-model=./silero_vad.onnx \
--tokens=./sherpa-onnx-paraformer-zh-int8-2025-10-07/tokens.txt \
--paraformer=./sherpa-onnx-paraformer-zh-int8-2025-10-07/model.int8.onnx
csukuangfj/sherpa-onnx-paraformer-trilingual-zh-cantonese-en (Chinese + English + Cantonese 粤语)
Note
This model does not support timestamps. It is a trilingual model, supporting
Chinese, Cantonese, and English (it handles Mandarin, Cantonese, and dialects such as Henan, Tianjin, and Sichuan dialects).
This model is converted from
In the following, we describe how to download it and use it with sherpa-onnx.
Please use the following commands to download it.
cd /path/to/sherpa-onnx
wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/sherpa-onnx-paraformer-trilingual-zh-cantonese-en.tar.bz2
tar xvf sherpa-onnx-paraformer-trilingual-zh-cantonese-en.tar.bz2
Please check that the file sizes of the pre-trained models are correct; the
expected sizes of the *.onnx files are listed below.
sherpa-onnx-paraformer-trilingual-zh-cantonese-en$ ls -lh *.onnx
-rw-r--r-- 1 1001 127 234M Mar 10 02:12 model.int8.onnx
-rw-r--r-- 1 1001 127 831M Mar 10 02:12 model.onnx
Hint
It supports decoding only single-channel wave files with 16-bit encoded samples. The sampling rate, however, does not need to be 16 kHz.
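If your own recordings are stereo or use a different bit depth, you can convert them before decoding, e.g., with sox (not part of sherpa-onnx; input.wav and output.wav are placeholder names):

# Convert to a single-channel, 16-bit PCM wave file. The sample rate is left
# unchanged since it does not need to be 16 kHz.
sox input.wav -c 1 -b 16 output.wav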
fp32
The following code shows how to use fp32 models to decode wave files:
cd /path/to/sherpa-onnx
./build/bin/sherpa-onnx-offline \
--tokens=./sherpa-onnx-paraformer-trilingual-zh-cantonese-en/tokens.txt \
--paraformer=./sherpa-onnx-paraformer-trilingual-zh-cantonese-en/model.onnx \
./sherpa-onnx-paraformer-trilingual-zh-cantonese-en/test_wavs/1.wav \
./sherpa-onnx-paraformer-trilingual-zh-cantonese-en/test_wavs/2.wav \
./sherpa-onnx-paraformer-trilingual-zh-cantonese-en/test_wavs/3-sichuan.wav \
./sherpa-onnx-paraformer-trilingual-zh-cantonese-en/test_wavs/4-tianjin.wav \
./sherpa-onnx-paraformer-trilingual-zh-cantonese-en/test_wavs/5-henan.wav \
./sherpa-onnx-paraformer-trilingual-zh-cantonese-en/test_wavs/6-zh-en.wav
Note
Please use ./build/bin/Release/sherpa-onnx-offline.exe for Windows.
Caution
If you use Windows and get encoding issues, please run:
CHCP 65001
in your command line.
You should see the following output:
/project/sherpa-onnx/csrc/parse-options.cc:Read:361 sherpa-onnx-offline --tokens=./sherpa-onnx-paraformer-trilingual-zh-cantonese-en/tokens.txt --paraformer=./sherpa-onnx-paraformer-trilingual-zh-cantonese-en/model.onnx ./sherpa-onnx-paraformer-trilingual-zh-cantonese-en/test_wavs/1.wav ./sherpa-onnx-paraformer-trilingual-zh-cantonese-en/test_wavs/2.wav ./sherpa-onnx-paraformer-trilingual-zh-cantonese-en/test_wavs/3-sichuan.wav ./sherpa-onnx-paraformer-trilingual-zh-cantonese-en/test_wavs/4-tianjin.wav ./sherpa-onnx-paraformer-trilingual-zh-cantonese-en/test_wavs/5-henan.wav ./sherpa-onnx-paraformer-trilingual-zh-cantonese-en/test_wavs/6-zh-en.wav
OfflineRecognizerConfig(feat_config=OfflineFeatureExtractorConfig(sampling_rate=16000, feature_dim=80), model_config=OfflineModelConfig(transducer=OfflineTransducerModelConfig(encoder_filename="", decoder_filename="", joiner_filename=""), paraformer=OfflineParaformerModelConfig(model="./sherpa-onnx-paraformer-trilingual-zh-cantonese-en/model.onnx"), nemo_ctc=OfflineNemoEncDecCtcModelConfig(model=""), whisper=OfflineWhisperModelConfig(encoder="", decoder="", language="", task="transcribe", tail_paddings=-1), tdnn=OfflineTdnnModelConfig(model=""), zipformer_ctc=OfflineZipformerCtcModelConfig(model=""), wenet_ctc=OfflineWenetCtcModelConfig(model=""), tokens="./sherpa-onnx-paraformer-trilingual-zh-cantonese-en/tokens.txt", num_threads=2, debug=False, provider="cpu", model_type=""), lm_config=OfflineLMConfig(model="", scale=0.5), ctc_fst_decoder_config=OfflineCtcFstDecoderConfig(graph="", max_active=3000), decoding_method="greedy_search", max_active_paths=4, hotwords_file="", hotwords_score=1.5, blank_penalty=0)
Creating recognizer ...
Started
/project/sherpa-onnx/csrc/offline-paraformer-greedy-search-decoder.cc:Decode:65 time stamp for batch: 0, 13 vs -1
/project/sherpa-onnx/csrc/offline-paraformer-greedy-search-decoder.cc:Decode:65 time stamp for batch: 1, 15 vs -1
/project/sherpa-onnx/csrc/offline-paraformer-greedy-search-decoder.cc:Decode:65 time stamp for batch: 2, 40 vs -1
/project/sherpa-onnx/csrc/offline-paraformer-greedy-search-decoder.cc:Decode:65 time stamp for batch: 3, 41 vs -1
/project/sherpa-onnx/csrc/offline-paraformer-greedy-search-decoder.cc:Decode:65 time stamp for batch: 4, 37 vs -1
/project/sherpa-onnx/csrc/offline-paraformer-greedy-search-decoder.cc:Decode:65 time stamp for batch: 5, 16 vs -1
Done!
./sherpa-onnx-paraformer-trilingual-zh-cantonese-en/test_wavs/1.wav
{"text": "有无人知道湾仔活道系点去㗎", "timestamps": [], "tokens":["有", "无", "人", "知", "道", "湾", "仔", "活", "道", "系", "点", "去", "㗎"]}
----
./sherpa-onnx-paraformer-trilingual-zh-cantonese-en/test_wavs/2.wav
{"text": "我喺黄大仙九龙塘联合道荡失路啊", "timestamps": [], "tokens":["我", "喺", "黄", "大", "仙", "九", "龙", "塘", "联", "合", "道", "荡", "失", "路", "啊"]}
----
./sherpa-onnx-paraformer-trilingual-zh-cantonese-en/test_wavs/3-sichuan.wav
{"text": "自己就是在那个在那个就是在情节里面就是感觉是演得特别好就是好像很真实一样你知道吧", "timestamps": [], "tokens":["自", "己", "就", "是", "在", "那", "个", "在", "那", "个", "就", "是", "在", "情", "节", "里", "面", "就", "是", "感", "觉", "是", "演", "得", "特", "别", "好", "就", "是", "好", "像", "很", "真", "实", "一", "样", "你", "知", "道", "吧"]}
----
./sherpa-onnx-paraformer-trilingual-zh-cantonese-en/test_wavs/4-tianjin.wav
{"text": "其实他就是怕每个人都可以守法就这意思法律意识太单薄了而且就是嗯也不顾及到别人的感受", "timestamps": [], "tokens":["其", "实", "他", "就", "是", "怕", "每", "个", "人", "都", "可", "以", "守", "法", "就", "这", "意", "思", "法", "律", "意", "识", "太", "单", "薄", "了", "而", "且", "就", "是", "嗯", "也", "不", "顾", "及", "到", "别", "人", "的", "感", "受"]}
----
./sherpa-onnx-paraformer-trilingual-zh-cantonese-en/test_wavs/5-henan.wav
{"text": "它这个管一下都通到有时候都通到七八层楼高然后它这管一下就可以浇到那那柱子上", "timestamps": [], "tokens":["它", "这", "个", "管", "一", "下", "都", "通", "到", "有", "时", "候", "都", "通", "到", "七", "八", "层", "楼", "高", "然", "后", "它", "这", "管", "一", "下", "就", "可", "以", "浇", "到", "那", "那", "柱", "子", "上"]}
----
./sherpa-onnx-paraformer-trilingual-zh-cantonese-en/test_wavs/6-zh-en.wav
{"text": " yesterday was 星期一 today is tuesday 明天是星期三", "timestamps": [], "tokens":["yesterday", "was", "星", "期", "一", "today", "is", "tu@@", "es@@", "day", "明", "天", "是", "星", "期", "三"]}
----
num threads: 2
decoding method: greedy_search
Elapsed seconds: 6.871 s
Real time factor (RTF): 6.871 / 42.054 = 0.163
int8
The following code shows how to use int8 models to decode wave files:
cd /path/to/sherpa-onnx
./build/bin/sherpa-onnx-offline \
--tokens=./sherpa-onnx-paraformer-trilingual-zh-cantonese-en/tokens.txt \
--paraformer=./sherpa-onnx-paraformer-trilingual-zh-cantonese-en/model.int8.onnx \
./sherpa-onnx-paraformer-trilingual-zh-cantonese-en/test_wavs/1.wav \
./sherpa-onnx-paraformer-trilingual-zh-cantonese-en/test_wavs/2.wav \
./sherpa-onnx-paraformer-trilingual-zh-cantonese-en/test_wavs/3-sichuan.wav \
./sherpa-onnx-paraformer-trilingual-zh-cantonese-en/test_wavs/4-tianjin.wav \
./sherpa-onnx-paraformer-trilingual-zh-cantonese-en/test_wavs/5-henan.wav \
./sherpa-onnx-paraformer-trilingual-zh-cantonese-en/test_wavs/6-zh-en.wav
Note
Please use ./build/bin/Release/sherpa-onnx-offline.exe for Windows.
Caution
If you use Windows and get encoding issues, please run:
CHCP 65001
in your command line.
You should see the following output:
/project/sherpa-onnx/csrc/parse-options.cc:Read:361 sherpa-onnx-offline --tokens=./sherpa-onnx-paraformer-trilingual-zh-cantonese-en/tokens.txt --paraformer=./sherpa-onnx-paraformer-trilingual-zh-cantonese-en/model.int8.onnx ./sherpa-onnx-paraformer-trilingual-zh-cantonese-en/test_wavs/1.wav ./sherpa-onnx-paraformer-trilingual-zh-cantonese-en/test_wavs/2.wav ./sherpa-onnx-paraformer-trilingual-zh-cantonese-en/test_wavs/3-sichuan.wav ./sherpa-onnx-paraformer-trilingual-zh-cantonese-en/test_wavs/4-tianjin.wav ./sherpa-onnx-paraformer-trilingual-zh-cantonese-en/test_wavs/5-henan.wav ./sherpa-onnx-paraformer-trilingual-zh-cantonese-en/test_wavs/6-zh-en.wav
OfflineRecognizerConfig(feat_config=OfflineFeatureExtractorConfig(sampling_rate=16000, feature_dim=80), model_config=OfflineModelConfig(transducer=OfflineTransducerModelConfig(encoder_filename="", decoder_filename="", joiner_filename=""), paraformer=OfflineParaformerModelConfig(model="./sherpa-onnx-paraformer-trilingual-zh-cantonese-en/model.int8.onnx"), nemo_ctc=OfflineNemoEncDecCtcModelConfig(model=""), whisper=OfflineWhisperModelConfig(encoder="", decoder="", language="", task="transcribe", tail_paddings=-1), tdnn=OfflineTdnnModelConfig(model=""), zipformer_ctc=OfflineZipformerCtcModelConfig(model=""), wenet_ctc=OfflineWenetCtcModelConfig(model=""), tokens="./sherpa-onnx-paraformer-trilingual-zh-cantonese-en/tokens.txt", num_threads=2, debug=False, provider="cpu", model_type=""), lm_config=OfflineLMConfig(model="", scale=0.5), ctc_fst_decoder_config=OfflineCtcFstDecoderConfig(graph="", max_active=3000), decoding_method="greedy_search", max_active_paths=4, hotwords_file="", hotwords_score=1.5, blank_penalty=0)
Creating recognizer ...
Started
/project/sherpa-onnx/csrc/offline-paraformer-greedy-search-decoder.cc:Decode:65 time stamp for batch: 0, 13 vs -1
/project/sherpa-onnx/csrc/offline-paraformer-greedy-search-decoder.cc:Decode:65 time stamp for batch: 1, 15 vs -1
/project/sherpa-onnx/csrc/offline-paraformer-greedy-search-decoder.cc:Decode:65 time stamp for batch: 2, 40 vs -1
/project/sherpa-onnx/csrc/offline-paraformer-greedy-search-decoder.cc:Decode:65 time stamp for batch: 3, 41 vs -1
/project/sherpa-onnx/csrc/offline-paraformer-greedy-search-decoder.cc:Decode:65 time stamp for batch: 4, 37 vs -1
/project/sherpa-onnx/csrc/offline-paraformer-greedy-search-decoder.cc:Decode:65 time stamp for batch: 5, 16 vs -1
Done!
./sherpa-onnx-paraformer-trilingual-zh-cantonese-en/test_wavs/1.wav
{"text": "有无人知道湾仔活道系点去㗎", "timestamps": [], "tokens":["有", "无", "人", "知", "道", "湾", "仔", "活", "道", "系", "点", "去", "㗎"]}
----
./sherpa-onnx-paraformer-trilingual-zh-cantonese-en/test_wavs/2.wav
{"text": "我喺黄大仙九龙塘联合道荡失路啊", "timestamps": [], "tokens":["我", "喺", "黄", "大", "仙", "九", "龙", "塘", "联", "合", "道", "荡", "失", "路", "啊"]}
----
./sherpa-onnx-paraformer-trilingual-zh-cantonese-en/test_wavs/3-sichuan.wav
{"text": "自己就是在那个在那个就是在情节里面就是感觉是演得特别好就是好像很真实一样你知道吧", "timestamps": [], "tokens":["自", "己", "就", "是", "在", "那", "个", "在", "那", "个", "就", "是", "在", "情", "节", "里", "面", "就", "是", "感", "觉", "是", "演", "得", "特", "别", "好", "就", "是", "好", "像", "很", "真", "实", "一", "样", "你", "知", "道", "吧"]}
----
./sherpa-onnx-paraformer-trilingual-zh-cantonese-en/test_wavs/4-tianjin.wav
{"text": "其实他就是怕每个人都可以守法就这意思法律意识太单薄了而且就是嗯也不顾及到别人的感受", "timestamps": [], "tokens":["其", "实", "他", "就", "是", "怕", "每", "个", "人", "都", "可", "以", "守", "法", "就", "这", "意", "思", "法", "律", "意", "识", "太", "单", "薄", "了", "而", "且", "就", "是", "嗯", "也", "不", "顾", "及", "到", "别", "人", "的", "感", "受"]}
----
./sherpa-onnx-paraformer-trilingual-zh-cantonese-en/test_wavs/5-henan.wav
{"text": "它这个管一下都通到有时候都通到七八层楼高然后它这管一下就可以浇到那那柱子上", "timestamps": [], "tokens":["它", "这", "个", "管", "一", "下", "都", "通", "到", "有", "时", "候", "都", "通", "到", "七", "八", "层", "楼", "高", "然", "后", "它", "这", "管", "一", "下", "就", "可", "以", "浇", "到", "那", "那", "柱", "子", "上"]}
----
./sherpa-onnx-paraformer-trilingual-zh-cantonese-en/test_wavs/6-zh-en.wav
{"text": " yesterday was 星期一 today is tuesday 明天是星期三", "timestamps": [], "tokens":["yesterday", "was", "星", "期", "一", "today", "is", "tu@@", "es@@", "day", "明", "天", "是", "星", "期", "三"]}
----
num threads: 2
decoding method: greedy_search
Elapsed seconds: 6.290 s
Real time factor (RTF): 6.290 / 42.054 = 0.150
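The real time factor (RTF) is the decoding time divided by the duration of the audio, so values below 1 mean faster-than-real-time decoding. Decoding speed also depends on --num-threads. The following is a minimal sketch, using only flags shown in this document, that decodes the same wave file with different thread counts so you can compare the printed RTF on your own machine (the actual speed-up depends on your hardware):
cd /path/to/sherpa-onnx

# Rerun the same decode with 1, 2, and 4 threads and compare the reported RTF.
for n in 1 2 4; do
  ./build/bin/sherpa-onnx-offline \
    --num-threads=$n \
    --tokens=./sherpa-onnx-paraformer-trilingual-zh-cantonese-en/tokens.txt \
    --paraformer=./sherpa-onnx-paraformer-trilingual-zh-cantonese-en/model.int8.onnx \
    ./sherpa-onnx-paraformer-trilingual-zh-cantonese-en/test_wavs/1.wav
done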
Speech recognition from a microphone
The following command shows how to recognize speech from a microphone with this model:
cd /path/to/sherpa-onnx
./build/bin/sherpa-onnx-microphone-offline \
--tokens=./sherpa-onnx-paraformer-trilingual-zh-cantonese-en/tokens.txt \
--paraformer=./sherpa-onnx-paraformer-trilingual-zh-cantonese-en/model.int8.onnx
csukuangfj/sherpa-onnx-paraformer-en-2024-03-09 (English)
Note
This model does not support timestamps. It supports only English.
This model is converted from
https://www.modelscope.cn/models/iic/speech_paraformer_asr-en-16k-vocab4199-pytorch/summary
In the following, we describe how to download it and use it with sherpa-onnx.
Please use the following commands to download it.
cd /path/to/sherpa-onnx
wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/sherpa-onnx-paraformer-en-2024-03-09.tar.bz2
tar xvf sherpa-onnx-paraformer-en-2024-03-09.tar.bz2
Please check that the file sizes of the downloaded pre-trained models are correct; the expected sizes of the *.onnx files are shown below.
sherpa-onnx-paraformer-en-2024-03-09$ ls -lh *.onnx
-rw-r--r-- 1 1001 127 220M Mar 10 02:12 model.int8.onnx
-rw-r--r-- 1 1001 127 817M Mar 10 02:12 model.onnx
Hint
It supports decoding only single-channel wave files with 16-bit encoded samples; the sampling rate does not need to be 16 kHz.
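If your own recordings are not in this format, you can convert them first. The command below is a minimal sketch that assumes ffmpeg is installed; input.mp3 and input-16k-mono.wav are placeholder file names:
# Convert to a single-channel, 16-bit PCM wave file at 16 kHz.
# sherpa-onnx can resample other rates on the fly, but 16 kHz avoids the extra step.
ffmpeg -i input.mp3 -ac 1 -ar 16000 -c:a pcm_s16le input-16k-mono.wav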
fp32
The following code shows how to use fp32 models to decode wave files:
cd /path/to/sherpa-onnx
./build/bin/sherpa-onnx-offline \
--tokens=./sherpa-onnx-paraformer-en-2024-03-09/tokens.txt \
--paraformer=./sherpa-onnx-paraformer-en-2024-03-09/model.onnx \
./sherpa-onnx-paraformer-en-2024-03-09/test_wavs/0.wav \
./sherpa-onnx-paraformer-en-2024-03-09/test_wavs/1.wav \
./sherpa-onnx-paraformer-en-2024-03-09/test_wavs/8k.wav
Note
Please use ./build/bin/Release/sherpa-onnx-offline.exe for Windows.
You should see the following output:
/project/sherpa-onnx/csrc/parse-options.cc:Read:361 sherpa-onnx-offline --tokens=./sherpa-onnx-paraformer-en-2024-03-09/tokens.txt --paraformer=./sherpa-onnx-paraformer-en-2024-03-09/model.onnx ./sherpa-onnx-paraformer-en-2024-03-09/test_wavs/0.wav ./sherpa-onnx-paraformer-en-2024-03-09/test_wavs/1.wav ./sherpa-onnx-paraformer-en-2024-03-09/test_wavs/8k.wav
OfflineRecognizerConfig(feat_config=OfflineFeatureExtractorConfig(sampling_rate=16000, feature_dim=80), model_config=OfflineModelConfig(transducer=OfflineTransducerModelConfig(encoder_filename="", decoder_filename="", joiner_filename=""), paraformer=OfflineParaformerModelConfig(model="./sherpa-onnx-paraformer-en-2024-03-09/model.onnx"), nemo_ctc=OfflineNemoEncDecCtcModelConfig(model=""), whisper=OfflineWhisperModelConfig(encoder="", decoder="", language="", task="transcribe", tail_paddings=-1), tdnn=OfflineTdnnModelConfig(model=""), zipformer_ctc=OfflineZipformerCtcModelConfig(model=""), wenet_ctc=OfflineWenetCtcModelConfig(model=""), tokens="./sherpa-onnx-paraformer-en-2024-03-09/tokens.txt", num_threads=2, debug=False, provider="cpu", model_type=""), lm_config=OfflineLMConfig(model="", scale=0.5), ctc_fst_decoder_config=OfflineCtcFstDecoderConfig(graph="", max_active=3000), decoding_method="greedy_search", max_active_paths=4, hotwords_file="", hotwords_score=1.5, blank_penalty=0)
Creating recognizer ...
Started
/project/sherpa-onnx/csrc/offline-stream.cc:AcceptWaveformImpl:119 Creating a resampler:
in_sample_rate: 8000
output_sample_rate: 16000
Done!
./sherpa-onnx-paraformer-en-2024-03-09/test_wavs/0.wav
{"text": " after early nightfall the yellow lamps would light up here and there the squalid quarter of the brothels", "timestamps": [], "tokens":["after", "early", "ni@@", "ght@@", "fall", "the", "yel@@", "low", "la@@", "mp@@", "s", "would", "light", "up", "here", "and", "there", "the", "squ@@", "al@@", "id", "quarter", "of", "the", "bro@@", "the@@", "ls"]}
----
./sherpa-onnx-paraformer-en-2024-03-09/test_wavs/1.wav
{"text": " god as a direct consequence of the sin which man thus punished had given her a lovely child whose place was 'on' that same dishonoured bosom to connect her parent for ever with the race and descent of mortals and to be finally a blessed soul in heaven", "timestamps": [], "tokens":["god", "as", "a", "direct", "con@@", "sequence", "of", "the", "sin", "which", "man", "thus", "p@@", "uni@@", "shed", "had", "given", "her", "a", "lo@@", "vely", "child", "whose", "place", "was", "'on'", "that", "same", "di@@", "sh@@", "on@@", "ou@@", "red", "bo@@", "so@@", "m", "to", "connect", "her", "paren@@", "t", "for", "ever", "with", "the", "race", "and", "des@@", "cent", "of", "mor@@", "tal@@", "s", "and", "to", "be", "finally", "a", "bl@@", "essed", "soul", "in", "hea@@", "ven"]}
----
./sherpa-onnx-paraformer-en-2024-03-09/test_wavs/8k.wav
{"text": " yet these thoughts affected hester prynne less with hope than apprehension", "timestamps": [], "tokens":["yet", "these", "thoughts", "aff@@", "ected", "he@@", "ster", "pr@@", "y@@", "n@@", "ne", "less", "with", "hope", "than", "ap@@", "pre@@", "hen@@", "sion"]}
----
num threads: 2
decoding method: greedy_search
Elapsed seconds: 7.173 s
Real time factor (RTF): 7.173 / 28.165 = 0.255
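Tokens ending in @@ are subword pieces: each piece is joined with the token that follows it, which is how the pieces above combine into whole words in the text field. A tiny sketch of the merge rule:
# Remove the "@@ " continuation markers to recover the words.
echo "ni@@ ght@@ fall the yel@@ low la@@ mp@@ s" | sed 's/@@ //g'
# prints: nightfall the yellow lamps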
int8
The following code shows how to use int8 models to decode wave files:
cd /path/to/sherpa-onnx
./build/bin/sherpa-onnx-offline \
--tokens=./sherpa-onnx-paraformer-en-2024-03-09/tokens.txt \
--paraformer=./sherpa-onnx-paraformer-en-2024-03-09/model.int8.onnx \
./sherpa-onnx-paraformer-en-2024-03-09/test_wavs/0.wav \
./sherpa-onnx-paraformer-en-2024-03-09/test_wavs/1.wav \
./sherpa-onnx-paraformer-en-2024-03-09/test_wavs/8k.wav
Note
Please use ./build/bin/Release/sherpa-onnx-offline.exe for Windows.
You should see the following output:
/project/sherpa-onnx/csrc/parse-options.cc:Read:361 sherpa-onnx-offline --tokens=./sherpa-onnx-paraformer-en-2024-03-09/tokens.txt --paraformer=./sherpa-onnx-paraformer-en-2024-03-09/model.int8.onnx ./sherpa-onnx-paraformer-en-2024-03-09/test_wavs/0.wav ./sherpa-onnx-paraformer-en-2024-03-09/test_wavs/1.wav ./sherpa-onnx-paraformer-en-2024-03-09/test_wavs/8k.wav
OfflineRecognizerConfig(feat_config=OfflineFeatureExtractorConfig(sampling_rate=16000, feature_dim=80), model_config=OfflineModelConfig(transducer=OfflineTransducerModelConfig(encoder_filename="", decoder_filename="", joiner_filename=""), paraformer=OfflineParaformerModelConfig(model="./sherpa-onnx-paraformer-en-2024-03-09/model.int8.onnx"), nemo_ctc=OfflineNemoEncDecCtcModelConfig(model=""), whisper=OfflineWhisperModelConfig(encoder="", decoder="", language="", task="transcribe", tail_paddings=-1), tdnn=OfflineTdnnModelConfig(model=""), zipformer_ctc=OfflineZipformerCtcModelConfig(model=""), wenet_ctc=OfflineWenetCtcModelConfig(model=""), tokens="./sherpa-onnx-paraformer-en-2024-03-09/tokens.txt", num_threads=2, debug=False, provider="cpu", model_type=""), lm_config=OfflineLMConfig(model="", scale=0.5), ctc_fst_decoder_config=OfflineCtcFstDecoderConfig(graph="", max_active=3000), decoding_method="greedy_search", max_active_paths=4, hotwords_file="", hotwords_score=1.5, blank_penalty=0)
Creating recognizer ...
Started
/project/sherpa-onnx/csrc/offline-stream.cc:AcceptWaveformImpl:119 Creating a resampler:
in_sample_rate: 8000
output_sample_rate: 16000
Done!
./sherpa-onnx-paraformer-en-2024-03-09/test_wavs/0.wav
{"text": " after early nightfall the yellow lamps would light up here and there the squalid quarter of the brothels", "timestamps": [], "tokens":["after", "early", "ni@@", "ght@@", "fall", "the", "yel@@", "low", "la@@", "mp@@", "s", "would", "light", "up", "here", "and", "there", "the", "squ@@", "al@@", "id", "quarter", "of", "the", "bro@@", "the@@", "ls"]}
----
./sherpa-onnx-paraformer-en-2024-03-09/test_wavs/1.wav
{"text": " god as a direct consequence of the sin which man thus punished had given her a lovely child whose place was 'on' that same dishonoured bosom to connect her parent for ever with the race and descent of mortals and to be finally a blessed soul in heaven", "timestamps": [], "tokens":["god", "as", "a", "direct", "con@@", "sequence", "of", "the", "sin", "which", "man", "thus", "p@@", "uni@@", "shed", "had", "given", "her", "a", "lo@@", "vely", "child", "whose", "place", "was", "'on'", "that", "same", "di@@", "sh@@", "on@@", "ou@@", "red", "bo@@", "so@@", "m", "to", "connect", "her", "paren@@", "t", "for", "ever", "with", "the", "race", "and", "des@@", "cent", "of", "mor@@", "tal@@", "s", "and", "to", "be", "finally", "a", "bl@@", "essed", "soul", "in", "hea@@", "ven"]}
----
./sherpa-onnx-paraformer-en-2024-03-09/test_wavs/8k.wav
{"text": " yet these thoughts affected hester prynne less with hope than apprehension", "timestamps": [], "tokens":["yet", "these", "thoughts", "aff@@", "ected", "he@@", "ster", "pr@@", "y@@", "n@@", "ne", "less", "with", "hope", "than", "ap@@", "pre@@", "hen@@", "sion"]}
----
num threads: 2
decoding method: greedy_search
Elapsed seconds: 5.492 s
Real time factor (RTF): 5.492 / 28.165 = 0.195
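For this model, the int8 file is about a quarter of the size of the fp32 file (220M vs 817M) and decodes faster here (RTF 0.195 vs 0.255), while producing the same transcripts on these test waves. The following sketch decodes the same file with both models so you can compare them on your own data:
cd /path/to/sherpa-onnx

# Decode the same wave with the fp32 and the int8 model and compare the output.
for m in model.onnx model.int8.onnx; do
  echo "=== $m ==="
  ./build/bin/sherpa-onnx-offline \
    --tokens=./sherpa-onnx-paraformer-en-2024-03-09/tokens.txt \
    --paraformer=./sherpa-onnx-paraformer-en-2024-03-09/$m \
    ./sherpa-onnx-paraformer-en-2024-03-09/test_wavs/0.wav
done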
Speech recognition from a microphone
The following command shows how to recognize speech from a microphone with this model:
cd /path/to/sherpa-onnx
./build/bin/sherpa-onnx-microphone-offline \
--tokens=./sherpa-onnx-paraformer-en-2024-03-09/tokens.txt \
--paraformer=./sherpa-onnx-paraformer-en-2024-03-09/model.int8.onnx
csukuangfj/sherpa-onnx-paraformer-zh-small-2024-03-09 (Chinese + English)
Note
This model does not support timestamps. It is a bilingual model, supporting both Chinese and English. (It supports Mandarin as well as dialects such as the Henan, Tianjin, and Sichuan dialects.)
This model is converted from
In the following, we describe how to download it and use it with sherpa-onnx.
Please use the following commands to download it.
cd /path/to/sherpa-onnx
wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/sherpa-onnx-paraformer-zh-small-2024-03-09.tar.bz2
tar xvf sherpa-onnx-paraformer-zh-small-2024-03-09.tar.bz2
Please check that the file sizes of the downloaded pre-trained models are correct; the expected sizes of the *.onnx files are shown below.
sherpa-onnx-paraformer-zh-small-2024-03-09$ ls -lh *.onnx
-rw-r--r-- 1 1001 127 79M Mar 10 00:48 model.int8.onnx
Hint
It supports decoding only single-channel wave files with 16-bit encoded samples; the sampling rate does not need to be 16 kHz.
int8
The following code shows how to use int8 models to decode wave files:
cd /path/to/sherpa-onnx
./build/bin/sherpa-onnx-offline \
--tokens=./sherpa-onnx-paraformer-zh-small-2024-03-09/tokens.txt \
--paraformer=./sherpa-onnx-paraformer-zh-small-2024-03-09/model.int8.onnx \
./sherpa-onnx-paraformer-zh-small-2024-03-09/test_wavs/0.wav \
./sherpa-onnx-paraformer-zh-small-2024-03-09/test_wavs/1.wav \
./sherpa-onnx-paraformer-zh-small-2024-03-09/test_wavs/8k.wav \
./sherpa-onnx-paraformer-zh-small-2024-03-09/test_wavs/2-zh-en.wav \
./sherpa-onnx-paraformer-zh-small-2024-03-09/test_wavs/3-sichuan.wav \
./sherpa-onnx-paraformer-zh-small-2024-03-09/test_wavs/4-tianjin.wav \
./sherpa-onnx-paraformer-zh-small-2024-03-09/test_wavs/5-henan.wav
Note
Please use ./build/bin/Release/sherpa-onnx-offline.exe for Windows.
Caution
If you use Windows and get encoding issues, please run:
CHCP 65001
in your commandline.
You should see the following output:
/project/sherpa-onnx/csrc/parse-options.cc:Read:361 sherpa-onnx-offline --tokens=./sherpa-onnx-paraformer-zh-small-2024-03-09/tokens.txt --paraformer=./sherpa-onnx-paraformer-zh-small-2024-03-09/model.int8.onnx ./sherpa-onnx-paraformer-zh-small-2024-03-09/test_wavs/0.wav ./sherpa-onnx-paraformer-zh-small-2024-03-09/test_wavs/1.wav ./sherpa-onnx-paraformer-zh-small-2024-03-09/test_wavs/8k.wav ./sherpa-onnx-paraformer-zh-small-2024-03-09/test_wavs/2-zh-en.wav ./sherpa-onnx-paraformer-zh-small-2024-03-09/test_wavs/3-sichuan.wav ./sherpa-onnx-paraformer-zh-small-2024-03-09/test_wavs/4-tianjin.wav ./sherpa-onnx-paraformer-zh-small-2024-03-09/test_wavs/5-henan.wav
OfflineRecognizerConfig(feat_config=OfflineFeatureExtractorConfig(sampling_rate=16000, feature_dim=80), model_config=OfflineModelConfig(transducer=OfflineTransducerModelConfig(encoder_filename="", decoder_filename="", joiner_filename=""), paraformer=OfflineParaformerModelConfig(model="./sherpa-onnx-paraformer-zh-small-2024-03-09/model.int8.onnx"), nemo_ctc=OfflineNemoEncDecCtcModelConfig(model=""), whisper=OfflineWhisperModelConfig(encoder="", decoder="", language="", task="transcribe", tail_paddings=-1), tdnn=OfflineTdnnModelConfig(model=""), zipformer_ctc=OfflineZipformerCtcModelConfig(model=""), wenet_ctc=OfflineWenetCtcModelConfig(model=""), tokens="./sherpa-onnx-paraformer-zh-small-2024-03-09/tokens.txt", num_threads=2, debug=False, provider="cpu", model_type=""), lm_config=OfflineLMConfig(model="", scale=0.5), ctc_fst_decoder_config=OfflineCtcFstDecoderConfig(graph="", max_active=3000), decoding_method="greedy_search", max_active_paths=4, hotwords_file="", hotwords_score=1.5, blank_penalty=0)
Creating recognizer ...
Started
/project/sherpa-onnx/csrc/offline-stream.cc:AcceptWaveformImpl:119 Creating a resampler:
in_sample_rate: 8000
output_sample_rate: 16000
Done!
./sherpa-onnx-paraformer-zh-small-2024-03-09/test_wavs/0.wav
{"text": "对我做了介绍啊那么我想说的是呢大家如果对我的研究感兴趣呢", "timestamps": [], "tokens":["对", "我", "做", "了", "介", "绍", "啊", "那", "么", "我", "想", "说", "的", "是", "呢", "大", "家", "如", "果", "对", "我", "的", "研", "究", "感", "兴", "趣", "呢"]}
----
./sherpa-onnx-paraformer-zh-small-2024-03-09/test_wavs/1.wav
{"text": "重点呢想谈三个问题首先呢就是这一轮全球金融动荡的表现", "timestamps": [], "tokens":["重", "点", "呢", "想", "谈", "三", "个", "问", "题", "首", "先", "呢", "就", "是", "这", "一", "轮", "全", "球", "金", "融", "动", "荡", "的", "表", "现"]}
----
./sherpa-onnx-paraformer-zh-small-2024-03-09/test_wavs/8k.wav
{"text": "深入的分析这一次全球金融动荡背后的根源", "timestamps": [], "tokens":["深", "入", "的", "分", "析", "这", "一", "次", "全", "球", "金", "融", "动", "荡", "背", "后", "的", "根", "源"]}
----
./sherpa-onnx-paraformer-zh-small-2024-03-09/test_wavs/2-zh-en.wav
{"text": " yesterday was 星期一 today is tuesday 明天是星期三", "timestamps": [], "tokens":["ye@@", "ster@@", "day", "was", "星", "期", "一", "today", "is", "tu@@", "es@@", "day", "明", "天", "是", "星", "期", "三"]}
----
./sherpa-onnx-paraformer-zh-small-2024-03-09/test_wavs/3-sichuan.wav
{"text": "自己就是在那个在那个就是在情节里面就是感觉是演的特别好就是好像很真实一样你知道吧", "timestamps": [], "tokens":["自", "己", "就", "是", "在", "那", "个", "在", "那", "个", "就", "是", "在", "情", "节", "里", "面", "就", "是", "感", "觉", "是", "演", "的", "特", "别", "好", "就", "是", "好", "像", "很", "真", "实", "一", "样", "你", "知", "道", "吧"]}
----
./sherpa-onnx-paraformer-zh-small-2024-03-09/test_wavs/4-tianjin.wav
{"text": "其实他就是怕每个人都可以守法就这意思法律意识太单薄了而且就是嗯也不顾及到别人的感受", "timestamps": [], "tokens":["其", "实", "他", "就", "是", "怕", "每", "个", "人", "都", "可", "以", "守", "法", "就", "这", "意", "思", "法", "律", "意", "识", "太", "单", "薄", "了", "而", "且", "就", "是", "嗯", "也", "不", "顾", "及", "到", "别", "人", "的", "感", "受"]}
----
./sherpa-onnx-paraformer-zh-small-2024-03-09/test_wavs/5-henan.wav
{"text": "他这个管一向都通到有时候都通到七八层楼缸然后他这管一向就可以浇到那个那柱子上", "timestamps": [], "tokens":["他", "这", "个", "管", "一", "向", "都", "通", "到", "有", "时", "候", "都", "通", "到", "七", "八", "层", "楼", "缸", "然", "后", "他", "这", "管", "一", "向", "就", "可", "以", "浇", "到", "那", "个", "那", "柱", "子", "上"]}
----
num threads: 2
decoding method: greedy_search
Elapsed seconds: 3.562 s
Real time factor (RTF): 3.562 / 47.023 = 0.076
Speech recognition from a microphone
The following command shows how to recognize speech from a microphone with this model:
cd /path/to/sherpa-onnx
./build/bin/sherpa-onnx-microphone-offline \
--tokens=./sherpa-onnx-paraformer-zh-small-2024-03-09/tokens.txt \
--paraformer=./sherpa-onnx-paraformer-zh-small-2024-03-09/model.int8.onnx
csukuangfj/sherpa-onnx-paraformer-zh-2024-03-09 (Chinese + English)
Note
This model does not support timestamps. It is a bilingual model, supporting both Chinese and English. (It supports Mandarin as well as dialects such as the Henan, Tianjin, and Sichuan dialects.)
This model is converted from
In the following, we describe how to download it and use it with sherpa-onnx.
Please use the following commands to download it.
cd /path/to/sherpa-onnx
wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/sherpa-onnx-paraformer-zh-2024-03-09.tar.bz2
tar xvf sherpa-onnx-paraformer-zh-2024-03-09.tar.bz2
Please check that the file sizes of the downloaded pre-trained models are correct; the expected sizes of the *.onnx files are shown below.
sherpa-onnx-paraformer-zh-2024-03-09$ ls -lh *.onnx
-rw-r--r-- 1 1001 127 217M Mar 10 02:22 model.int8.onnx
-rw-r--r-- 1 1001 127 785M Mar 10 02:22 model.onnx
Hint
It supports decoding only single-channel wave files with 16-bit encoded samples; the sampling rate does not need to be 16 kHz.
fp32
The following code shows how to use fp32 models to decode wave files:
cd /path/to/sherpa-onnx
./build/bin/sherpa-onnx-offline \
--tokens=./sherpa-onnx-paraformer-zh-2024-03-09/tokens.txt \
--paraformer=./sherpa-onnx-paraformer-zh-2024-03-09/model.onnx \
./sherpa-onnx-paraformer-zh-2024-03-09/test_wavs/0.wav \
./sherpa-onnx-paraformer-zh-2024-03-09/test_wavs/1.wav \
./sherpa-onnx-paraformer-zh-2024-03-09/test_wavs/8k.wav \
./sherpa-onnx-paraformer-zh-2024-03-09/test_wavs/2-zh-en.wav \
./sherpa-onnx-paraformer-zh-2024-03-09/test_wavs/3-sichuan.wav \
./sherpa-onnx-paraformer-zh-2024-03-09/test_wavs/4-tianjin.wav \
./sherpa-onnx-paraformer-zh-2024-03-09/test_wavs/5-henan.wav
Note
Please use ./build/bin/Release/sherpa-onnx-offline.exe for Windows.
Caution
If you use Windows and get encoding issues, please run:
CHCP 65001
in your commandline.
You should see the following output:
/project/sherpa-onnx/csrc/parse-options.cc:Read:361 sherpa-onnx-offline --tokens=./sherpa-onnx-paraformer-zh-2024-03-09/tokens.txt --paraformer=./sherpa-onnx-paraformer-zh-2024-03-09/model.onnx ./sherpa-onnx-paraformer-zh-2024-03-09/test_wavs/0.wav ./sherpa-onnx-paraformer-zh-2024-03-09/test_wavs/1.wav ./sherpa-onnx-paraformer-zh-2024-03-09/test_wavs/8k.wav ./sherpa-onnx-paraformer-zh-2024-03-09/test_wavs/2-zh-en.wav ./sherpa-onnx-paraformer-zh-2024-03-09/test_wavs/3-sichuan.wav ./sherpa-onnx-paraformer-zh-2024-03-09/test_wavs/4-tianjin.wav ./sherpa-onnx-paraformer-zh-2024-03-09/test_wavs/5-henan.wav
OfflineRecognizerConfig(feat_config=OfflineFeatureExtractorConfig(sampling_rate=16000, feature_dim=80), model_config=OfflineModelConfig(transducer=OfflineTransducerModelConfig(encoder_filename="", decoder_filename="", joiner_filename=""), paraformer=OfflineParaformerModelConfig(model="./sherpa-onnx-paraformer-zh-2024-03-09/model.onnx"), nemo_ctc=OfflineNemoEncDecCtcModelConfig(model=""), whisper=OfflineWhisperModelConfig(encoder="", decoder="", language="", task="transcribe", tail_paddings=-1), tdnn=OfflineTdnnModelConfig(model=""), zipformer_ctc=OfflineZipformerCtcModelConfig(model=""), wenet_ctc=OfflineWenetCtcModelConfig(model=""), tokens="./sherpa-onnx-paraformer-zh-2024-03-09/tokens.txt", num_threads=2, debug=False, provider="cpu", model_type=""), lm_config=OfflineLMConfig(model="", scale=0.5), ctc_fst_decoder_config=OfflineCtcFstDecoderConfig(graph="", max_active=3000), decoding_method="greedy_search", max_active_paths=4, hotwords_file="", hotwords_score=1.5, blank_penalty=0)
Creating recognizer ...
Started
/project/sherpa-onnx/csrc/offline-stream.cc:AcceptWaveformImpl:119 Creating a resampler:
in_sample_rate: 8000
output_sample_rate: 16000
Done!
./sherpa-onnx-paraformer-zh-2024-03-09/test_wavs/0.wav
{"text": "对我做了介绍啊那么我想说的是呢大家如果对我的研究感兴趣呢你", "timestamps": [], "tokens":["对", "我", "做", "了", "介", "绍", "啊", "那", "么", "我", "想", "说", "的", "是", "呢", "大", "家", "如", "果", "对", "我", "的", "研", "究", "感", "兴", "趣", "呢", "你"]}
----
./sherpa-onnx-paraformer-zh-2024-03-09/test_wavs/1.wav
{"text": "重点呢想谈三个问题首先呢就是这一轮全球金融动荡的表现", "timestamps": [], "tokens":["重", "点", "呢", "想", "谈", "三", "个", "问", "题", "首", "先", "呢", "就", "是", "这", "一", "轮", "全", "球", "金", "融", "动", "荡", "的", "表", "现"]}
----
./sherpa-onnx-paraformer-zh-2024-03-09/test_wavs/8k.wav
{"text": "深入的分析这一次全球金融动荡背后的根源", "timestamps": [], "tokens":["深", "入", "的", "分", "析", "这", "一", "次", "全", "球", "金", "融", "动", "荡", "背", "后", "的", "根", "源"]}
----
./sherpa-onnx-paraformer-zh-2024-03-09/test_wavs/2-zh-en.wav
{"text": " yesterday was 星期一 today is tuesday 明天是星期三", "timestamps": [], "tokens":["ye@@", "ster@@", "day", "was", "星", "期", "一", "today", "is", "tu@@", "es@@", "day", "明", "天", "是", "星", "期", "三"]}
----
./sherpa-onnx-paraformer-zh-2024-03-09/test_wavs/3-sichuan.wav
{"text": "自己就是在那个在那个就是在情节里面就是感觉是演的特别好就是好像很真实一样你知道吧", "timestamps": [], "tokens":["自", "己", "就", "是", "在", "那", "个", "在", "那", "个", "就", "是", "在", "情", "节", "里", "面", "就", "是", "感", "觉", "是", "演", "的", "特", "别", "好", "就", "是", "好", "像", "很", "真", "实", "一", "样", "你", "知", "道", "吧"]}
----
./sherpa-onnx-paraformer-zh-2024-03-09/test_wavs/4-tianjin.wav
{"text": "其实他就是怕每个人都可以守法就这意思法律意识太单薄了而且就是嗯也不顾及到别人的感受", "timestamps": [], "tokens":["其", "实", "他", "就", "是", "怕", "每", "个", "人", "都", "可", "以", "守", "法", "就", "这", "意", "思", "法", "律", "意", "识", "太", "单", "薄", "了", "而", "且", "就", "是", "嗯", "也", "不", "顾", "及", "到", "别", "人", "的", "感", "受"]}
----
./sherpa-onnx-paraformer-zh-2024-03-09/test_wavs/5-henan.wav
{"text": "他这个管一向都通到有时候都通到七八层楼高然后他的管一向就可以交到那个那柱子上", "timestamps": [], "tokens":["他", "这", "个", "管", "一", "向", "都", "通", "到", "有", "时", "候", "都", "通", "到", "七", "八", "层", "楼", "高", "然", "后", "他", "的", "管", "一", "向", "就", "可", "以", "交", "到", "那", "个", "那", "柱", "子", "上"]}
----
num threads: 2
decoding method: greedy_search
Elapsed seconds: 6.829 s
Real time factor (RTF): 6.829 / 47.023 = 0.145
int8
The following code shows how to use int8 models to decode wave files:
cd /path/to/sherpa-onnx
./build/bin/sherpa-onnx-offline \
--tokens=./sherpa-onnx-paraformer-zh-2024-03-09/tokens.txt \
--paraformer=./sherpa-onnx-paraformer-zh-2024-03-09/model.int8.onnx \
./sherpa-onnx-paraformer-zh-2024-03-09/test_wavs/0.wav \
./sherpa-onnx-paraformer-zh-2024-03-09/test_wavs/1.wav \
./sherpa-onnx-paraformer-zh-2024-03-09/test_wavs/8k.wav \
./sherpa-onnx-paraformer-zh-2024-03-09/test_wavs/2-zh-en.wav \
./sherpa-onnx-paraformer-zh-2024-03-09/test_wavs/3-sichuan.wav \
./sherpa-onnx-paraformer-zh-2024-03-09/test_wavs/4-tianjin.wav \
./sherpa-onnx-paraformer-zh-2024-03-09/test_wavs/5-henan.wav
Note
Please use ./build/bin/Release/sherpa-onnx-offline.exe for Windows.
Caution
If you use Windows and get encoding issues, please run:
CHCP 65001
in your commandline.
You should see the following output:
/project/sherpa-onnx/csrc/parse-options.cc:Read:361 sherpa-onnx-offline --tokens=./sherpa-onnx-paraformer-zh-2024-03-09/tokens.txt --paraformer=./sherpa-onnx-paraformer-zh-2024-03-09/model.int8.onnx ./sherpa-onnx-paraformer-zh-2024-03-09/test_wavs/0.wav ./sherpa-onnx-paraformer-zh-2024-03-09/test_wavs/1.wav ./sherpa-onnx-paraformer-zh-2024-03-09/test_wavs/8k.wav ./sherpa-onnx-paraformer-zh-2024-03-09/test_wavs/2-zh-en.wav ./sherpa-onnx-paraformer-zh-2024-03-09/test_wavs/3-sichuan.wav ./sherpa-onnx-paraformer-zh-2024-03-09/test_wavs/4-tianjin.wav ./sherpa-onnx-paraformer-zh-2024-03-09/test_wavs/5-henan.wav
OfflineRecognizerConfig(feat_config=OfflineFeatureExtractorConfig(sampling_rate=16000, feature_dim=80), model_config=OfflineModelConfig(transducer=OfflineTransducerModelConfig(encoder_filename="", decoder_filename="", joiner_filename=""), paraformer=OfflineParaformerModelConfig(model="./sherpa-onnx-paraformer-zh-2024-03-09/model.int8.onnx"), nemo_ctc=OfflineNemoEncDecCtcModelConfig(model=""), whisper=OfflineWhisperModelConfig(encoder="", decoder="", language="", task="transcribe", tail_paddings=-1), tdnn=OfflineTdnnModelConfig(model=""), zipformer_ctc=OfflineZipformerCtcModelConfig(model=""), wenet_ctc=OfflineWenetCtcModelConfig(model=""), tokens="./sherpa-onnx-paraformer-zh-2024-03-09/tokens.txt", num_threads=2, debug=False, provider="cpu", model_type=""), lm_config=OfflineLMConfig(model="", scale=0.5), ctc_fst_decoder_config=OfflineCtcFstDecoderConfig(graph="", max_active=3000), decoding_method="greedy_search", max_active_paths=4, hotwords_file="", hotwords_score=1.5, blank_penalty=0)
Creating recognizer ...
Started
/project/sherpa-onnx/csrc/offline-stream.cc:AcceptWaveformImpl:119 Creating a resampler:
in_sample_rate: 8000
output_sample_rate: 16000
Done!
./sherpa-onnx-paraformer-zh-2024-03-09/test_wavs/0.wav
{"text": "对我做了介绍啊那么我想说的是呢大家如果对我的研究感兴趣呢你", "timestamps": [], "tokens":["对", "我", "做", "了", "介", "绍", "啊", "那", "么", "我", "想", "说", "的", "是", "呢", "大", "家", "如", "果", "对", "我", "的", "研", "究", "感", "兴", "趣", "呢", "你"]}
----
./sherpa-onnx-paraformer-zh-2024-03-09/test_wavs/1.wav
{"text": "重点呢想谈三个问题首先呢就是这一轮全球金融动荡的表现", "timestamps": [], "tokens":["重", "点", "呢", "想", "谈", "三", "个", "问", "题", "首", "先", "呢", "就", "是", "这", "一", "轮", "全", "球", "金", "融", "动", "荡", "的", "表", "现"]}
----
./sherpa-onnx-paraformer-zh-2024-03-09/test_wavs/8k.wav
{"text": "深入的分析这一次全球金融动荡背后的根源", "timestamps": [], "tokens":["深", "入", "的", "分", "析", "这", "一", "次", "全", "球", "金", "融", "动", "荡", "背", "后", "的", "根", "源"]}
----
./sherpa-onnx-paraformer-zh-2024-03-09/test_wavs/2-zh-en.wav
{"text": " yesterday was 星期一 today is tuesday 明天是星期三", "timestamps": [], "tokens":["ye@@", "ster@@", "day", "was", "星", "期", "一", "today", "is", "tu@@", "es@@", "day", "明", "天", "是", "星", "期", "三"]}
----
./sherpa-onnx-paraformer-zh-2024-03-09/test_wavs/3-sichuan.wav
{"text": "自己就是在那个在那个就是在情节里面就是感觉是演的特别好就是好像很真实一样你知道吧", "timestamps": [], "tokens":["自", "己", "就", "是", "在", "那", "个", "在", "那", "个", "就", "是", "在", "情", "节", "里", "面", "就", "是", "感", "觉", "是", "演", "的", "特", "别", "好", "就", "是", "好", "像", "很", "真", "实", "一", "样", "你", "知", "道", "吧"]}
----
./sherpa-onnx-paraformer-zh-2024-03-09/test_wavs/4-tianjin.wav
{"text": "其实他就是怕每个人都可以守法就这意思法律意识太单薄了而且就是嗯也不顾及到别人的感受", "timestamps": [], "tokens":["其", "实", "他", "就", "是", "怕", "每", "个", "人", "都", "可", "以", "守", "法", "就", "这", "意", "思", "法", "律", "意", "识", "太", "单", "薄", "了", "而", "且", "就", "是", "嗯", "也", "不", "顾", "及", "到", "别", "人", "的", "感", "受"]}
----
./sherpa-onnx-paraformer-zh-2024-03-09/test_wavs/5-henan.wav
{"text": "他这个管一向都通到有时候都通到七八层楼高然后他的管一向就可以交到那个那柱子上", "timestamps": [], "tokens":["他", "这", "个", "管", "一", "向", "都", "通", "到", "有", "时", "候", "都", "通", "到", "七", "八", "层", "楼", "高", "然", "后", "他", "的", "管", "一", "向", "就", "可", "以", "交", "到", "那", "个", "那", "柱", "子", "上"]}
----
num threads: 2
decoding method: greedy_search
Elapsed seconds: 6.829 s
Real time factor (RTF): 6.829 / 47.023 = 0.145
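This full-size model ships with the same test waves as sherpa-onnx-paraformer-zh-small-2024-03-09 above, so you can compare the two directly: the small model is faster (RTF 0.076 vs 0.145 in the runs shown), while this one is larger; decoding the same dialect wave with both lets you judge the accuracy difference yourself. A sketch, assuming both directories have been downloaded as described above:
cd /path/to/sherpa-onnx

# Decode the same Henan-dialect wave with the small and the full model.
for d in sherpa-onnx-paraformer-zh-small-2024-03-09 sherpa-onnx-paraformer-zh-2024-03-09; do
  echo "=== $d ==="
  ./build/bin/sherpa-onnx-offline \
    --tokens=./$d/tokens.txt \
    --paraformer=./$d/model.int8.onnx \
    ./$d/test_wavs/5-henan.wav
done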
Speech recognition from a microphone
The following command shows how to recognize speech from a microphone with this model:
cd /path/to/sherpa-onnx
./build/bin/sherpa-onnx-microphone-offline \
--tokens=./sherpa-onnx-paraformer-zh-2024-03-09/tokens.txt \
--paraformer=./sherpa-onnx-paraformer-zh-2024-03-09/model.int8.onnx
csukuangfj/sherpa-onnx-paraformer-zh-2023-03-28 (Chinese + English)
Note
This model does not support timestamps. It is a bilingual model, supporting both Chinese and English. (It supports Mandarin as well as dialects such as the Henan, Tianjin, and Sichuan dialects.)
This model is converted from
The code for converting can be found at
https://huggingface.co/csukuangfj/paraformer-onnxruntime-python-example/tree/main
In the following, we describe how to download it and use it with sherpa-onnx.
Please use the following commands to download it.
cd /path/to/sherpa-onnx
wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/sherpa-onnx-paraformer-zh-2023-03-28.tar.bz2
tar xvf sherpa-onnx-paraformer-zh-2023-03-28.tar.bz2
Please check that the file sizes of the downloaded pre-trained models are correct; the expected sizes of the *.onnx files are shown below.
sherpa-onnx-paraformer-zh-2023-03-28$ ls -lh *.onnx
-rw-r--r-- 1 kuangfangjun root 214M Apr 1 07:28 model.int8.onnx
-rw-r--r-- 1 kuangfangjun root 824M Apr 1 07:28 model.onnx
Hint
It supports decoding only single-channel wave files with 16-bit encoded samples; the sampling rate does not need to be 16 kHz.
fp32
The following code shows how to use fp32 models to decode wave files:
cd /path/to/sherpa-onnx
./build/bin/sherpa-onnx-offline \
--tokens=./sherpa-onnx-paraformer-zh-2023-03-28/tokens.txt \
--paraformer=./sherpa-onnx-paraformer-zh-2023-03-28/model.onnx \
./sherpa-onnx-paraformer-zh-2023-03-28/test_wavs/0.wav \
./sherpa-onnx-paraformer-zh-2023-03-28/test_wavs/1.wav \
./sherpa-onnx-paraformer-zh-2023-03-28/test_wavs/2.wav \
./sherpa-onnx-paraformer-zh-2023-03-28/test_wavs/3-sichuan.wav \
./sherpa-onnx-paraformer-zh-2023-03-28/test_wavs/4-tianjin.wav \
./sherpa-onnx-paraformer-zh-2023-03-28/test_wavs/5-henan.wav \
./sherpa-onnx-paraformer-zh-2023-03-28/test_wavs/6-zh-en.wav \
./sherpa-onnx-paraformer-zh-2023-03-28/test_wavs/8k.wav
Note
Please use ./build/bin/Release/sherpa-onnx-offline.exe for Windows.
Caution
If you use Windows and get encoding issues, please run:
CHCP 65001
in your commandline.
You should see the following output:
/project/sherpa-onnx/csrc/parse-options.cc:Read:361 sherpa-onnx-offline --tokens=./sherpa-onnx-paraformer-zh-2023-03-28/tokens.txt --paraformer=./sherpa-onnx-paraformer-zh-2023-03-28/model.onnx ./sherpa-onnx-paraformer-zh-2023-03-28/test_wavs/0.wav ./sherpa-onnx-paraformer-zh-2023-03-28/test_wavs/1.wav ./sherpa-onnx-paraformer-zh-2023-03-28/test_wavs/2.wav ./sherpa-onnx-paraformer-zh-2023-03-28/test_wavs/3-sichuan.wav ./sherpa-onnx-paraformer-zh-2023-03-28/test_wavs/4-tianjin.wav ./sherpa-onnx-paraformer-zh-2023-03-28/test_wavs/5-henan.wav ./sherpa-onnx-paraformer-zh-2023-03-28/test_wavs/6-zh-en.wav ./sherpa-onnx-paraformer-zh-2023-03-28/test_wavs/8k.wav
OfflineRecognizerConfig(feat_config=OfflineFeatureExtractorConfig(sampling_rate=16000, feature_dim=80), model_config=OfflineModelConfig(transducer=OfflineTransducerModelConfig(encoder_filename="", decoder_filename="", joiner_filename=""), paraformer=OfflineParaformerModelConfig(model="./sherpa-onnx-paraformer-zh-2023-03-28/model.onnx"), nemo_ctc=OfflineNemoEncDecCtcModelConfig(model=""), whisper=OfflineWhisperModelConfig(encoder="", decoder="", language="", task="transcribe", tail_paddings=-1), tdnn=OfflineTdnnModelConfig(model=""), zipformer_ctc=OfflineZipformerCtcModelConfig(model=""), wenet_ctc=OfflineWenetCtcModelConfig(model=""), tokens="./sherpa-onnx-paraformer-zh-2023-03-28/tokens.txt", num_threads=2, debug=False, provider="cpu", model_type=""), lm_config=OfflineLMConfig(model="", scale=0.5), ctc_fst_decoder_config=OfflineCtcFstDecoderConfig(graph="", max_active=3000), decoding_method="greedy_search", max_active_paths=4, hotwords_file="", hotwords_score=1.5, blank_penalty=0)
Creating recognizer ...
Started
/project/sherpa-onnx/csrc/offline-stream.cc:AcceptWaveformImpl:119 Creating a resampler:
in_sample_rate: 8000
output_sample_rate: 16000
Done!
./sherpa-onnx-paraformer-zh-2023-03-28/test_wavs/0.wav
{"text": "对我做了介绍啊那么我想说的是呢大家如果对我的研究感兴趣呢你", "timestamps": [], "tokens":["对", "我", "做", "了", "介", "绍", "啊", "那", "么", "我", "想", "说", "的", "是", "呢", "大", "家", "如", "果", "对", "我", "的", "研", "究", "感", "兴", "趣", "呢", "你"]}
----
./sherpa-onnx-paraformer-zh-2023-03-28/test_wavs/1.wav
{"text": "重点呢想谈三个问题首先呢就是这一轮全球金融动荡的表现", "timestamps": [], "tokens":["重", "点", "呢", "想", "谈", "三", "个", "问", "题", "首", "先", "呢", "就", "是", "这", "一", "轮", "全", "球", "金", "融", "动", "荡", "的", "表", "现"]}
----
./sherpa-onnx-paraformer-zh-2023-03-28/test_wavs/2.wav
{"text": "深入的分析这一次全球金融动荡背后的根源", "timestamps": [], "tokens":["深", "入", "的", "分", "析", "这", "一", "次", "全", "球", "金", "融", "动", "荡", "背", "后", "的", "根", "源"]}
----
./sherpa-onnx-paraformer-zh-2023-03-28/test_wavs/3-sichuan.wav
{"text": "自己就是在那个在那个就是在情节里面就是感觉是演的特别好就是好像很真实一样你知道吧", "timestamps": [], "tokens":["自", "己", "就", "是", "在", "那", "个", "在", "那", "个", "就", "是", "在", "情", "节", "里", "面", "就", "是", "感", "觉", "是", "演", "的", "特", "别", "好", "就", "是", "好", "像", "很", "真", "实", "一", "样", "你", "知", "道", "吧"]}
----
./sherpa-onnx-paraformer-zh-2023-03-28/test_wavs/4-tianjin.wav
{"text": "其实他就是那每个人都可以守法就这意思法律意识太单薄了而且就是嗯也不顾及到别人的感受", "timestamps": [], "tokens":["其", "实", "他", "就", "是", "那", "每", "个", "人", "都", "可", "以", "守", "法", "就", "这", "意", "思", "法", "律", "意", "识", "太", "单", "薄", "了", "而", "且", "就", "是", "嗯", "也", "不", "顾", "及", "到", "别", "人", "的", "感", "受"]}
----
./sherpa-onnx-paraformer-zh-2023-03-28/test_wavs/5-henan.wav
{"text": "他这个管一向都通到有时候都通到七八层楼高然后他这管一向就可以浇到那个那柱子上", "timestamps": [], "tokens":["他", "这", "个", "管", "一", "向", "都", "通", "到", "有", "时", "候", "都", "通", "到", "七", "八", "层", "楼", "高", "然", "后", "他", "这", "管", "一", "向", "就", "可", "以", "浇", "到", "那", "个", "那", "柱", "子", "上"]}
----
./sherpa-onnx-paraformer-zh-2023-03-28/test_wavs/6-zh-en.wav
{"text": " yesterday was 星期一 today is tuesday 明天是星期三", "timestamps": [], "tokens":["ye@@", "ster@@", "day", "was", "星", "期", "一", "today", "is", "tu@@", "es@@", "day", "明", "天", "是", "星", "期", "三"]}
----
./sherpa-onnx-paraformer-zh-2023-03-28/test_wavs/8k.wav
{"text": "甚至出现交易几乎停滞的情况", "timestamps": [], "tokens":["甚", "至", "出", "现", "交", "易", "几", "乎", "停", "滞", "的", "情", "况"]}
----
num threads: 2
decoding method: greedy_search
Elapsed seconds: 8.547 s
Real time factor (RTF): 8.547 / 51.236 = 0.167
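The "Creating a resampler" lines in the log appear because 8k.wav is sampled at 8 kHz and is resampled to 16 kHz internally before feature extraction. If you want to inspect the sample rate and channel count of a wave file yourself, the following sketch assumes ffprobe (shipped with ffmpeg) is available:
# Print the sample rate and channel count of the 8 kHz test wave.
ffprobe -hide_banner -show_entries stream=sample_rate,channels \
  ./sherpa-onnx-paraformer-zh-2023-03-28/test_wavs/8k.wav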
int8
The following code shows how to use int8 models to decode wave files:
cd /path/to/sherpa-onnx
./build/bin/sherpa-onnx-offline \
--tokens=./sherpa-onnx-paraformer-zh-2023-03-28/tokens.txt \
--paraformer=./sherpa-onnx-paraformer-zh-2023-03-28/model.int8.onnx \
./sherpa-onnx-paraformer-zh-2023-03-28/test_wavs/0.wav \
./sherpa-onnx-paraformer-zh-2023-03-28/test_wavs/1.wav \
./sherpa-onnx-paraformer-zh-2023-03-28/test_wavs/2.wav \
./sherpa-onnx-paraformer-zh-2023-03-28/test_wavs/3-sichuan.wav \
./sherpa-onnx-paraformer-zh-2023-03-28/test_wavs/4-tianjin.wav \
./sherpa-onnx-paraformer-zh-2023-03-28/test_wavs/5-henan.wav \
./sherpa-onnx-paraformer-zh-2023-03-28/test_wavs/6-zh-en.wav \
./sherpa-onnx-paraformer-zh-2023-03-28/test_wavs/8k.wav
Note
Please use ./build/bin/Release/sherpa-onnx-offline.exe for Windows.
Caution
If you use Windows and get encoding issues, please run:
CHCP 65001
in your commandline.
You should see the following output:
/project/sherpa-onnx/csrc/parse-options.cc:Read:361 sherpa-onnx-offline --tokens=./sherpa-onnx-paraformer-zh-2023-03-28/tokens.txt --paraformer=./sherpa-onnx-paraformer-zh-2023-03-28/model.int8.onnx ./sherpa-onnx-paraformer-zh-2023-03-28/test_wavs/0.wav ./sherpa-onnx-paraformer-zh-2023-03-28/test_wavs/1.wav ./sherpa-onnx-paraformer-zh-2023-03-28/test_wavs/2.wav ./sherpa-onnx-paraformer-zh-2023-03-28/test_wavs/3-sichuan.wav ./sherpa-onnx-paraformer-zh-2023-03-28/test_wavs/4-tianjin.wav ./sherpa-onnx-paraformer-zh-2023-03-28/test_wavs/5-henan.wav ./sherpa-onnx-paraformer-zh-2023-03-28/test_wavs/6-zh-en.wav ./sherpa-onnx-paraformer-zh-2023-03-28/test_wavs/8k.wav
OfflineRecognizerConfig(feat_config=OfflineFeatureExtractorConfig(sampling_rate=16000, feature_dim=80), model_config=OfflineModelConfig(transducer=OfflineTransducerModelConfig(encoder_filename="", decoder_filename="", joiner_filename=""), paraformer=OfflineParaformerModelConfig(model="./sherpa-onnx-paraformer-zh-2023-03-28/model.int8.onnx"), nemo_ctc=OfflineNemoEncDecCtcModelConfig(model=""), whisper=OfflineWhisperModelConfig(encoder="", decoder="", language="", task="transcribe", tail_paddings=-1), tdnn=OfflineTdnnModelConfig(model=""), zipformer_ctc=OfflineZipformerCtcModelConfig(model=""), wenet_ctc=OfflineWenetCtcModelConfig(model=""), tokens="./sherpa-onnx-paraformer-zh-2023-03-28/tokens.txt", num_threads=2, debug=False, provider="cpu", model_type=""), lm_config=OfflineLMConfig(model="", scale=0.5), ctc_fst_decoder_config=OfflineCtcFstDecoderConfig(graph="", max_active=3000), decoding_method="greedy_search", max_active_paths=4, hotwords_file="", hotwords_score=1.5, blank_penalty=0)
Creating recognizer ...
Started
/project/sherpa-onnx/csrc/offline-stream.cc:AcceptWaveformImpl:119 Creating a resampler:
in_sample_rate: 8000
output_sample_rate: 16000
Done!
./sherpa-onnx-paraformer-zh-2023-03-28/test_wavs/0.wav
{"text": "对我做了介绍啊那么我想说的是呢大家如果对我的研究感兴趣呢你", "timestamps": [], "tokens":["对", "我", "做", "了", "介", "绍", "啊", "那", "么", "我", "想", "说", "的", "是", "呢", "大", "家", "如", "果", "对", "我", "的", "研", "究", "感", "兴", "趣", "呢", "你"]}
----
./sherpa-onnx-paraformer-zh-2023-03-28/test_wavs/1.wav
{"text": "重点呢想谈三个问题首先呢就是这一轮全球金融动荡的表现", "timestamps": [], "tokens":["重", "点", "呢", "想", "谈", "三", "个", "问", "题", "首", "先", "呢", "就", "是", "这", "一", "轮", "全", "球", "金", "融", "动", "荡", "的", "表", "现"]}
----
./sherpa-onnx-paraformer-zh-2023-03-28/test_wavs/2.wav
{"text": "深入的分析这一次全球金融动荡背后的根源", "timestamps": [], "tokens":["深", "入", "的", "分", "析", "这", "一", "次", "全", "球", "金", "融", "动", "荡", "背", "后", "的", "根", "源"]}
----
./sherpa-onnx-paraformer-zh-2023-03-28/test_wavs/3-sichuan.wav
{"text": "自己就是在那个在那个就是在情节里面就是感觉是演的特别好就是好像很真实一样你知道吧", "timestamps": [], "tokens":["自", "己", "就", "是", "在", "那", "个", "在", "那", "个", "就", "是", "在", "情", "节", "里", "面", "就", "是", "感", "觉", "是", "演", "的", "特", "别", "好", "就", "是", "好", "像", "很", "真", "实", "一", "样", "你", "知", "道", "吧"]}
----
./sherpa-onnx-paraformer-zh-2023-03-28/test_wavs/4-tianjin.wav
{"text": "其实他就是那每个人都可以守法就这意思法律意识太单薄了而且就是嗯也不顾及到别人的感受", "timestamps": [], "tokens":["其", "实", "他", "就", "是", "那", "每", "个", "人", "都", "可", "以", "守", "法", "就", "这", "意", "思", "法", "律", "意", "识", "太", "单", "薄", "了", "而", "且", "就", "是", "嗯", "也", "不", "顾", "及", "到", "别", "人", "的", "感", "受"]}
----
./sherpa-onnx-paraformer-zh-2023-03-28/test_wavs/5-henan.wav
{"text": "他这个管一向都通到有时候都通到七八层楼高然后他这管一向就可以浇到那个那柱子上", "timestamps": [], "tokens":["他", "这", "个", "管", "一", "向", "都", "通", "到", "有", "时", "候", "都", "通", "到", "七", "八", "层", "楼", "高", "然", "后", "他", "这", "管", "一", "向", "就", "可", "以", "浇", "到", "那", "个", "那", "柱", "子", "上"]}
----
./sherpa-onnx-paraformer-zh-2023-03-28/test_wavs/6-zh-en.wav
{"text": " yesterday was 星期一 today is tuesday 明天是星期三", "timestamps": [], "tokens":["ye@@", "ster@@", "day", "was", "星", "期", "一", "today", "is", "tu@@", "es@@", "day", "明", "天", "是", "星", "期", "三"]}
----
./sherpa-onnx-paraformer-zh-2023-03-28/test_wavs/8k.wav
{"text": "甚至出现交易几乎停滞的情况", "timestamps": [], "tokens":["甚", "至", "出", "现", "交", "易", "几", "乎", "停", "滞", "的", "情", "况"]}
----
num threads: 2
decoding method: greedy_search
Elapsed seconds: 6.439 s
Real time factor (RTF): 6.439 / 51.236 = 0.126
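If you want to keep the transcripts for later inspection, you can copy the console output into a log file while still seeing it on screen. A minimal sketch (decode-2023-03-28.log is a placeholder file name):
cd /path/to/sherpa-onnx

# Decode all test waves and save the console output to a log file.
./build/bin/sherpa-onnx-offline \
  --tokens=./sherpa-onnx-paraformer-zh-2023-03-28/tokens.txt \
  --paraformer=./sherpa-onnx-paraformer-zh-2023-03-28/model.int8.onnx \
  ./sherpa-onnx-paraformer-zh-2023-03-28/test_wavs/*.wav 2>&1 | tee decode-2023-03-28.log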
Speech recognition from a microphone
The following command shows how to recognize speech from a microphone with this model:
cd /path/to/sherpa-onnx
./build/bin/sherpa-onnx-microphone-offline \
--tokens=./sherpa-onnx-paraformer-zh-2023-03-28/tokens.txt \
--paraformer=./sherpa-onnx-paraformer-zh-2023-03-28/model.int8.onnx
csukuangfj/sherpa-onnx-paraformer-zh-2023-09-14 (Chinese + English)
Note
This model supports timestamps. It is a bilingual model, supporting both Chinese and English. (It supports Mandarin as well as dialects such as the Henan, Tianjin, and Sichuan dialects.)
This model is converted from
In the following, we describe how to download it and use it with sherpa-onnx.
Please use the following commands to download it.
cd /path/to/sherpa-onnx
wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/sherpa-onnx-paraformer-zh-2023-09-14.tar.bz2
tar xvf sherpa-onnx-paraformer-zh-2023-09-14.tar.bz2
Please check that the file sizes of the downloaded pre-trained models are correct; the expected sizes of the *.onnx files are shown below.
sherpa-onnx-paraformer-zh-2023-09-14$ ls -lh *.onnx
-rw-r--r-- 1 fangjun staff 232M Sep 14 13:46 model.int8.onnx
Hint
It supports decoding only single-channel wave files with 16-bit encoded samples; the sampling rate does not need to be 16 kHz.
int8
The following code shows how to use int8 models to decode wave files:
cd /path/to/sherpa-onnx
./build/bin/sherpa-onnx-offline \
--tokens=./sherpa-onnx-paraformer-zh-2023-09-14/tokens.txt \
--paraformer=./sherpa-onnx-paraformer-zh-2023-09-14/model.int8.onnx \
--model-type=paraformer \
./sherpa-onnx-paraformer-zh-2023-09-14/test_wavs/0.wav \
./sherpa-onnx-paraformer-zh-2023-09-14/test_wavs/1.wav \
./sherpa-onnx-paraformer-zh-2023-09-14/test_wavs/2.wav \
./sherpa-onnx-paraformer-zh-2023-09-14/test_wavs/3-sichuan.wav \
./sherpa-onnx-paraformer-zh-2023-09-14/test_wavs/4-tianjin.wav \
./sherpa-onnx-paraformer-zh-2023-09-14/test_wavs/5-henan.wav \
./sherpa-onnx-paraformer-zh-2023-09-14/test_wavs/6-zh-en.wav \
./sherpa-onnx-paraformer-zh-2023-09-14/test_wavs/8k.wav
Note
Please use ./build/bin/Release/sherpa-onnx-offline.exe for Windows.
Caution
If you use Windows and get encoding issues, please run:
CHCP 65001
in your commandline.
You should see the following output:
/project/sherpa-onnx/csrc/parse-options.cc:Read:361 sherpa-onnx-offline --tokens=./sherpa-onnx-paraformer-zh-2023-09-14/tokens.txt --paraformer=./sherpa-onnx-paraformer-zh-2023-09-14/model.int8.onnx --model-type=paraformer ./sherpa-onnx-paraformer-zh-2023-09-14/test_wavs/0.wav ./sherpa-onnx-paraformer-zh-2023-09-14/test_wavs/1.wav ./sherpa-onnx-paraformer-zh-2023-09-14/test_wavs/2.wav ./sherpa-onnx-paraformer-zh-2023-09-14/test_wavs/3-sichuan.wav ./sherpa-onnx-paraformer-zh-2023-09-14/test_wavs/4-tianjin.wav ./sherpa-onnx-paraformer-zh-2023-09-14/test_wavs/5-henan.wav ./sherpa-onnx-paraformer-zh-2023-09-14/test_wavs/6-zh-en.wav ./sherpa-onnx-paraformer-zh-2023-09-14/test_wavs/8k.wav
OfflineRecognizerConfig(feat_config=OfflineFeatureExtractorConfig(sampling_rate=16000, feature_dim=80), model_config=OfflineModelConfig(transducer=OfflineTransducerModelConfig(encoder_filename="", decoder_filename="", joiner_filename=""), paraformer=OfflineParaformerModelConfig(model="./sherpa-onnx-paraformer-zh-2023-09-14/model.int8.onnx"), nemo_ctc=OfflineNemoEncDecCtcModelConfig(model=""), whisper=OfflineWhisperModelConfig(encoder="", decoder="", language="", task="transcribe", tail_paddings=-1), tdnn=OfflineTdnnModelConfig(model=""), zipformer_ctc=OfflineZipformerCtcModelConfig(model=""), wenet_ctc=OfflineWenetCtcModelConfig(model=""), tokens="./sherpa-onnx-paraformer-zh-2023-09-14/tokens.txt", num_threads=2, debug=False, provider="cpu", model_type="paraformer"), lm_config=OfflineLMConfig(model="", scale=0.5), ctc_fst_decoder_config=OfflineCtcFstDecoderConfig(graph="", max_active=3000), decoding_method="greedy_search", max_active_paths=4, hotwords_file="", hotwords_score=1.5, blank_penalty=0)
Creating recognizer ...
Started
/project/sherpa-onnx/csrc/offline-stream.cc:AcceptWaveformImpl:119 Creating a resampler:
in_sample_rate: 8000
output_sample_rate: 16000
Done!
./sherpa-onnx-paraformer-zh-2023-09-14/test_wavs/0.wav
{"text": "对我做了介绍啊那么我想说的是呢大家如果对我的研究感兴趣呢你", "timestamps": [0.36, 0.48, 0.62, 0.72, 0.86, 1.02, 1.32, 1.74, 1.90, 2.12, 2.20, 2.38, 2.50, 2.62, 2.74, 3.18, 3.32, 3.52, 3.62, 3.74, 3.82, 3.90, 3.98, 4.08, 4.20, 4.34, 4.56, 4.74, 5.10], "tokens":["对", "我", "做", "了", "介", "绍", "啊", "那", "么", "我", "想", "说", "的", "是", "呢", "大", "家", "如", "果", "对", "我", "的", "研", "究", "感", "兴", "趣", "呢", "你"]}
----
./sherpa-onnx-paraformer-zh-2023-09-14/test_wavs/1.wav
{"text": "重点呢想谈三个问题首先呢就是这一轮全球金融动荡的表现", "timestamps": [0.16, 0.30, 0.42, 0.56, 0.72, 0.96, 1.08, 1.20, 1.30, 2.08, 2.26, 2.44, 2.58, 2.72, 2.98, 3.14, 3.26, 3.46, 3.62, 3.80, 3.88, 4.02, 4.12, 4.20, 4.36, 4.56], "tokens":["重", "点", "呢", "想", "谈", "三", "个", "问", "题", "首", "先", "呢", "就", "是", "这", "一", "轮", "全", "球", "金", "融", "动", "荡", "的", "表", "现"]}
----
./sherpa-onnx-paraformer-zh-2023-09-14/test_wavs/2.wav
{"text": "深入的分析这一次全球金融动荡背后的根源", "timestamps": [0.34, 0.54, 0.66, 0.80, 1.08, 1.52, 1.72, 1.90, 2.40, 2.68, 2.86, 2.96, 3.16, 3.26, 3.46, 3.54, 3.66, 3.80, 3.90], "tokens":["深", "入", "的", "分", "析", "这", "一", "次", "全", "球", "金", "融", "动", "荡", "背", "后", "的", "根", "源"]}
----
./sherpa-onnx-paraformer-zh-2023-09-14/test_wavs/3-sichuan.wav
{"text": "自己就是在那个在那个就是在情节里面就是感觉是演的特别好就是好像很真实一样你知道吧", "timestamps": [0.16, 0.30, 0.56, 0.72, 0.92, 1.18, 1.32, 1.88, 2.24, 2.40, 3.16, 3.28, 3.40, 3.54, 3.76, 3.88, 4.06, 4.24, 4.36, 4.56, 4.66, 4.88, 5.14, 5.30, 5.44, 5.60, 5.72, 5.84, 5.96, 6.14, 6.24, 6.38, 6.56, 6.78, 6.98, 7.08, 7.22, 7.38, 7.50, 7.62], "tokens":["自", "己", "就", "是", "在", "那", "个", "在", "那", "个", "就", "是", "在", "情", "节", "里", "面", "就", "是", "感", "觉", "是", "演", "的", "特", "别", "好", "就", "是", "好", "像", "很", "真", "实", "一", "样", "你", "知", "道", "吧"]}
----
./sherpa-onnx-paraformer-zh-2023-09-14/test_wavs/4-tianjin.wav
{"text": "其实他就是那每个人都可以守法就这意思法律意识太单薄了而且就是嗯也不顾及到别人的感受", "timestamps": [0.08, 0.24, 0.36, 0.56, 0.66, 0.78, 1.04, 1.14, 1.26, 1.38, 1.50, 1.58, 1.70, 1.84, 2.28, 2.38, 2.64, 2.74, 3.08, 3.28, 3.66, 3.80, 3.94, 4.14, 4.34, 4.64, 4.84, 4.94, 5.12, 5.24, 5.84, 6.10, 6.24, 6.44, 6.54, 6.66, 6.86, 7.02, 7.14, 7.24, 7.44], "tokens":["其", "实", "他", "就", "是", "那", "每", "个", "人", "都", "可", "以", "守", "法", "就", "这", "意", "思", "法", "律", "意", "识", "太", "单", "薄", "了", "而", "且", "就", "是", "嗯", "也", "不", "顾", "及", "到", "别", "人", "的", "感", "受"]}
----
./sherpa-onnx-paraformer-zh-2023-09-14/test_wavs/5-henan.wav
{"text": "他这个管一向都通到有时候都通到七八层楼高然后他这管一向就可以浇到那个那柱子上", "timestamps": [0.08, 0.20, 0.30, 0.42, 0.94, 1.14, 1.26, 1.46, 1.66, 2.28, 2.50, 2.62, 2.70, 2.82, 2.98, 3.14, 3.28, 3.52, 3.70, 3.86, 4.94, 5.06, 5.18, 5.30, 5.42, 5.66, 5.76, 5.94, 6.08, 6.24, 6.38, 6.60, 6.78, 6.96, 7.10, 7.30, 7.50, 7.62], "tokens":["他", "这", "个", "管", "一", "向", "都", "通", "到", "有", "时", "候", "都", "通", "到", "七", "八", "层", "楼", "高", "然", "后", "他", "这", "管", "一", "向", "就", "可", "以", "浇", "到", "那", "个", "那", "柱", "子", "上"]}
----
./sherpa-onnx-paraformer-zh-2023-09-14/test_wavs/6-zh-en.wav
{"text": " yesterday was 星期一 today is tuesday 明天是星期三", "timestamps": [0.36, 0.60, 0.84, 1.22, 2.24, 2.44, 2.74, 3.52, 4.06, 4.68, 5.00, 5.12, 5.76, 5.96, 6.24, 6.82, 7.02, 7.26], "tokens":["ye@@", "ster@@", "day", "was", "星", "期", "一", "today", "is", "tu@@", "es@@", "day", "明", "天", "是", "星", "期", "三"]}
----
./sherpa-onnx-paraformer-zh-2023-09-14/test_wavs/8k.wav
{"text": "甚至出现交易几乎停滞的情况", "timestamps": [0.48, 0.78, 1.04, 1.18, 1.52, 1.78, 2.06, 2.18, 2.50, 2.66, 2.88, 3.10, 3.30], "tokens":["甚", "至", "出", "现", "交", "易", "几", "乎", "停", "滞", "的", "情", "况"]}
----
num threads: 2
decoding method: greedy_search
Elapsed seconds: 9.206 s
Real time factor (RTF): 9.206 / 51.236 = 0.180
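Each entry in timestamps is the start time in seconds of the token at the same position in tokens; for example, in 0.wav above, 对 starts at 0.36 s and 我 at 0.48 s. The sketch below prints each token next to its start time; it assumes jq is installed and that the JSON result lines are the ones beginning with "{", as in the log above:
cd /path/to/sherpa-onnx

# Decode one wave, keep only the JSON result line, and pair each timestamp with its token.
./build/bin/sherpa-onnx-offline \
  --tokens=./sherpa-onnx-paraformer-zh-2023-09-14/tokens.txt \
  --paraformer=./sherpa-onnx-paraformer-zh-2023-09-14/model.int8.onnx \
  --model-type=paraformer \
  ./sherpa-onnx-paraformer-zh-2023-09-14/test_wavs/0.wav 2>&1 \
  | grep '^{' \
  | jq -r '[.timestamps, .tokens] | transpose[] | "\(.[0])\t\(.[1])"'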
Speech recognition from a microphone
The following command shows how to recognize speech from a microphone with this model:
cd /path/to/sherpa-onnx
./build/bin/sherpa-onnx-microphone-offline \
--tokens=./sherpa-onnx-paraformer-zh-2023-09-14/tokens.txt \
--paraformer=./sherpa-onnx-paraformer-zh-2023-09-14/model.int8.onnx \
--model-type=paraformer