Pre-trained models
You can download pre-trained models for Ascend NPU from https://github.com/k2-fsa/sherpa-onnx/releases/tag/asr-models-ascend.
We provide exported *.om models for the 910B, 910B2, and 310P3 with CANN 7.0, 8.0, and 8.2 on Linux aarch64.
If you need models for other types of NPU or for a different version of CANN, please see Export models to Ascend NPU.
sherpa-onnx-ascend-910B-cann-8.0-sense-voice-zh-en-ja-ko-yue-2024-07-17 (Chinese, English, Japanese, Korean, Cantonese, 中英日韩粤语)
This model is converted from sherpa-onnx-sense-voice-zh-en-ja-ko-yue-2024-07-17 (Chinese, English, Japanese, Korean, Cantonese, 中英日韩粤语) using code from the following URL:
Hint
You can find how to run the export code at
The original PyTorch checkpoint is available at
Hint
It supports dynamic input shapes, but the batch size is fixed to 1 at present.
Decode long files with a VAD
The following example demonstrates how to use the model to decode a long wave file.
wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/silero_vad.onnx
wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/lei-jun-test.wav
wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models-ascend/sherpa-onnx-ascend-910B-cann-8.0-sense-voice-zh-en-ja-ko-yue-2024-07-17.tar.bz2
tar xvf sherpa-onnx-ascend-910B-cann-8.0-sense-voice-zh-en-ja-ko-yue-2024-07-17.tar.bz2
rm sherpa-onnx-ascend-910B-cann-8.0-sense-voice-zh-en-ja-ko-yue-2024-07-17.tar.bz2
ls -lh sherpa-onnx-ascend-910B-cann-8.0-sense-voice-zh-en-ja-ko-yue-2024-07-17
You should see the following output:
ls -lh sherpa-onnx-ascend-910B-cann-8.0-sense-voice-zh-en-ja-ko-yue-2024-07-17/
total 999M
-rw-r--r-- 1 root root 204K Oct 23 21:43 features.bin
-rw-r--r-- 1 root root 71 Oct 23 13:52 LICENSE
-rw------- 1 root root 998M Oct 23 13:52 model.om
-rw-r--r-- 1 root root 104 Oct 23 13:52 README.md
-rwxr-xr-x 1 root root 3.6K Oct 23 21:43 test_om.py
drwxr-xr-x 2 root root 4.0K Oct 23 13:52 test_wavs
-rw-r--r-- 1 root root 309K Oct 23 13:52 tokens.txt
Hint
The above test_om.py uses the ais_bench Python API to run model.om without sherpa-onnx.
Then run:
cd /path/to/sherpa-onnx/build
./bin/sherpa-onnx-vad-with-offline-asr \
--provider=ascend \
--silero-vad-model=./silero_vad.onnx \
--silero-vad-threshold=0.4 \
--sense-voice-model=./sherpa-onnx-ascend-910B-cann-8.0-sense-voice-zh-en-ja-ko-yue-2024-07-17/model.om \
--tokens=./sherpa-onnx-ascend-910B-cann-8.0-sense-voice-zh-en-ja-ko-yue-2024-07-17/tokens.txt \
./lei-jun-test.wav
| Wave filename | Content |
|---|---|
| lei-jun-test.wav | (embedded audio player) |
The output is given below:
/root/open-source/sherpa-onnx/sherpa-onnx/csrc/parse-options.cc:Read:372 ./bin/sherpa-onnx-vad-with-offline-asr --provider=ascend --silero-vad-model=./silero_vad.onnx --silero-vad-threshold=0.4 --sense-voice-model=./sherpa-onnx-ascend-910B-cann-8.0-sense-voice-zh-en-ja-ko-yue-2024-07-17/model.om --tokens=./sherpa-onnx-ascend-910B-cann-8.0-sense-voice-zh-en-ja-ko-yue-2024-07-17/tokens.txt ./lei-jun-test.wav
VadModelConfig(silero_vad=SileroVadModelConfig(model="./silero_vad.onnx", threshold=0.4, min_silence_duration=0.5, min_speech_duration=0.25, max_speech_duration=20, window_size=512), ten_vad=TenVadModelConfig(model="", threshold=0.5, min_silence_duration=0.5, min_speech_duration=0.25, max_speech_duration=20, window_size=256), sample_rate=16000, num_threads=1, provider="cpu", debug=False)
OfflineRecognizerConfig(feat_config=FeatureExtractorConfig(sampling_rate=16000, feature_dim=80, low_freq=20, high_freq=-400, dither=0, normalize_samples=True, snip_edges=False), model_config=OfflineModelConfig(transducer=OfflineTransducerModelConfig(encoder_filename="", decoder_filename="", joiner_filename=""), paraformer=OfflineParaformerModelConfig(model=""), nemo_ctc=OfflineNemoEncDecCtcModelConfig(model=""), whisper=OfflineWhisperModelConfig(encoder="", decoder="", language="", task="transcribe", tail_paddings=-1), fire_red_asr=OfflineFireRedAsrModelConfig(encoder="", decoder=""), tdnn=OfflineTdnnModelConfig(model=""), zipformer_ctc=OfflineZipformerCtcModelConfig(model=""), wenet_ctc=OfflineWenetCtcModelConfig(model=""), sense_voice=OfflineSenseVoiceModelConfig(model="./sherpa-onnx-ascend-910B-cann-8.0-sense-voice-zh-en-ja-ko-yue-2024-07-17/model.om", language="auto", use_itn=False), moonshine=OfflineMoonshineModelConfig(preprocessor="", encoder="", uncached_decoder="", cached_decoder=""), dolphin=OfflineDolphinModelConfig(model=""), canary=OfflineCanaryModelConfig(encoder="", decoder="", src_lang="", tgt_lang="", use_pnc=True), telespeech_ctc="", tokens="./sherpa-onnx-ascend-910B-cann-8.0-sense-voice-zh-en-ja-ko-yue-2024-07-17/tokens.txt", num_threads=2, debug=False, provider="ascend", model_type="", modeling_unit="cjkchar", bpe_vocab=""), lm_config=OfflineLMConfig(model="", scale=0.5, lodr_scale=0.01, lodr_fst="", lodr_backoff_id=-1), ctc_fst_decoder_config=OfflineCtcFstDecoderConfig(graph="", max_active=3000), decoding_method="greedy_search", max_active_paths=4, hotwords_file="", hotwords_score=1.5, blank_penalty=0, rule_fsts="", rule_fars="", hr=HomophoneReplacerConfig(lexicon="", rule_fsts=""))
Creating recognizer ...
Recognizer created!
Started
Reading: ./lei-jun-test.wav
Started!
28.934 -- 36.140: 朋友们晚上好欢迎大家来参加今天晚上的活动谢谢大家
42.118 -- 57.676: 这是我第四次颁年度演讲前三次呢因为疫情的原因都在小米科技园内举办现场的人很少这是第四次
58.182 -- 67.020: 我们仔细想了想我们还是想办一个比较大的聚会然后呢让我们的新朋友老朋友一起聚一聚
67.718 -- 71.084: 今天的话呢我们就在北京的
71.654 -- 91.580: 国家会议中心呢举办了这么一个活动现场呢来了很多人大概有三千五百人还有很多很多的朋友呢通过观看直播的方式来参与再一次呢对大家的参加表示感谢谢谢大家
98.470 -- 104.780: 两个月前我参加了今年武汉大学的毕业典礼
105.894 -- 110.892: 今年呢是武汉大学建校一百三十周年
111.750 -- 117.388: 作为校友被母校邀请在毕业典礼上致辞
117.990 -- 122.892: 这对我来说是至高无上的荣誉
123.654 -- 134.380: 站在讲台的那一刻面对全校师生关于武大的所有的记忆一下子涌现在脑海里
134.950 -- 139.660: 今天呢我就先和大家聊聊五大往事
141.830 -- 144.012: 那还是三十六年前
145.926 -- 147.724: 一九八七年
148.678 -- 151.724: 我呢考上了武汉大学的计算机系
152.646 -- 156.908: 在武汉大学的图书馆里看了一本书
157.574 -- 158.796: 硅谷之火
159.302 -- 161.708: 建立了我一生的梦想
163.206 -- 164.492: 看完书以后
165.286 -- 166.508: 热血沸腾
167.590 -- 169.356: 激动的睡不着觉
170.406 -- 171.244: 我还记得
172.006 -- 174.764: 那天晚上星光很亮
175.398 -- 177.868: 我就在五大的操场上
178.342 -- 179.948: 就是屏幕上这个超场
180.774 -- 185.388: 走了一圈又一圈走了整整一个晚上
186.470 -- 187.788: 我心里有团火
188.934 -- 191.884: 我也想办一个伟大的公司
193.958 -- 194.860: 就是这样
197.574 -- 202.540: 梦想之火在我心里彻底点燃了
209.734 -- 212.716: 但是一个大一的新生
220.230 -- 222.828: 但是一个大一的新生
223.782 -- 227.244: 一个从县城里出来的年轻人
228.134 -- 230.764: 什么也不会什么也没有
231.526 -- 236.460: 就想创办一家伟大的公司这不就是天荒夜谭吗
237.574 -- 242.476: 这么离谱的一个梦想该如何实现呢
243.846 -- 246.988: 那天晚上我想了一整晚上
247.942 -- 249.068: 说实话
250.342 -- 253.900: 越想越糊涂完全理不清头绪
254.982 -- 261.516: 后来我在想哎干脆别想了把书练好是正事
262.150 -- 265.900: 所以呢我就下定决心认认真真读书
266.630 -- 267.340: 那么
268.486 -- 271.692: 我怎么能够把书读的不同凡响呢
num threads: 2
decoding method: greedy_search
Elapsed seconds: 6.264 s
Real time factor (RTF): 6.264 / 272.448 = 0.023
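The real time factor (RTF) printed above is simply the processing time divided by the audio duration; a value below 1 means decoding runs faster than real time. A minimal sketch, using the numbers from the run above:

```python
def real_time_factor(elapsed_seconds: float, audio_seconds: float) -> float:
    """RTF = processing time / audio duration; < 1 means faster than real time."""
    return elapsed_seconds / audio_seconds

# From the run above: 6.264 s to decode a 272.448 s file.
rtf = real_time_factor(6.264, 272.448)
print(round(rtf, 3))  # → 0.023
```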
Decode a short file
The following example demonstrates how to use the model to decode a short wave file.
cd /path/to/sherpa-onnx/build
./bin/sherpa-onnx-offline \
--provider=ascend \
--sense-voice-model=./sherpa-onnx-ascend-910B-cann-8.0-sense-voice-zh-en-ja-ko-yue-2024-07-17/model.om \
--tokens=./sherpa-onnx-ascend-910B-cann-8.0-sense-voice-zh-en-ja-ko-yue-2024-07-17/tokens.txt \
./sherpa-onnx-ascend-910B-cann-8.0-sense-voice-zh-en-ja-ko-yue-2024-07-17/test_wavs/zh.wav
The output is given below:
/root/open-source/sherpa-onnx/sherpa-onnx/csrc/parse-options.cc:Read:372 ./bin/sherpa-onnx-offline --provider=ascend --sense-voice-model=./sherpa-onnx-ascend-910B-cann-8.0-sense-voice-zh-en-ja-ko-yue-2024-07-17/model.om --tokens=./sherpa-onnx-ascend-910B-cann-8.0-sense-voice-zh-en-ja-ko-yue-2024-07-17/tokens.txt ./sherpa-onnx-ascend-910B-cann-8.0-sense-voice-zh-en-ja-ko-yue-2024-07-17/test_wavs/zh.wav
OfflineRecognizerConfig(feat_config=FeatureExtractorConfig(sampling_rate=16000, feature_dim=80, low_freq=20, high_freq=-400, dither=0, normalize_samples=True, snip_edges=False), model_config=OfflineModelConfig(transducer=OfflineTransducerModelConfig(encoder_filename="", decoder_filename="", joiner_filename=""), paraformer=OfflineParaformerModelConfig(model=""), nemo_ctc=OfflineNemoEncDecCtcModelConfig(model=""), whisper=OfflineWhisperModelConfig(encoder="", decoder="", language="", task="transcribe", tail_paddings=-1), fire_red_asr=OfflineFireRedAsrModelConfig(encoder="", decoder=""), tdnn=OfflineTdnnModelConfig(model=""), zipformer_ctc=OfflineZipformerCtcModelConfig(model=""), wenet_ctc=OfflineWenetCtcModelConfig(model=""), sense_voice=OfflineSenseVoiceModelConfig(model="./sherpa-onnx-ascend-910B-cann-8.0-sense-voice-zh-en-ja-ko-yue-2024-07-17/model.om", language="auto", use_itn=False), moonshine=OfflineMoonshineModelConfig(preprocessor="", encoder="", uncached_decoder="", cached_decoder=""), dolphin=OfflineDolphinModelConfig(model=""), canary=OfflineCanaryModelConfig(encoder="", decoder="", src_lang="", tgt_lang="", use_pnc=True), telespeech_ctc="", tokens="./sherpa-onnx-ascend-910B-cann-8.0-sense-voice-zh-en-ja-ko-yue-2024-07-17/tokens.txt", num_threads=2, debug=False, provider="ascend", model_type="", modeling_unit="cjkchar", bpe_vocab=""), lm_config=OfflineLMConfig(model="", scale=0.5, lodr_scale=0.01, lodr_fst="", lodr_backoff_id=-1), ctc_fst_decoder_config=OfflineCtcFstDecoderConfig(graph="", max_active=3000), decoding_method="greedy_search", max_active_paths=4, hotwords_file="", hotwords_score=1.5, blank_penalty=0, rule_fsts="", rule_fars="", hr=HomophoneReplacerConfig(lexicon="", rule_fsts=""))
Creating recognizer ...
Started
Done!
./sherpa-onnx-ascend-910B-cann-8.0-sense-voice-zh-en-ja-ko-yue-2024-07-17/test_wavs/zh.wav
{"lang": "<|zh|>", "emotion": "<|NEUTRAL|>", "event": "<|Speech|>", "text": "开放时间早上九点至下午五点", "timestamps": [0.72, 0.96, 1.26, 1.44, 1.92, 2.10, 2.58, 2.82, 3.30, 3.90, 4.20, 4.56, 4.74], "durations": [], "tokens":["开", "放", "时", "间", "早", "上", "九", "点", "至", "下", "午", "五", "点"], "words": []}
----
num threads: 2
decoding method: greedy_search
Elapsed seconds: 0.081 s
Real time factor (RTF): 0.081 / 5.592 = 0.014
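The recognition result above is printed as a JSON object, so it is straightforward to post-process. The sketch below, which uses only Python's standard library, parses an abridged copy of that result line and pairs each token with its timestamp:

```python
import json

# The result line printed by sherpa-onnx-offline, abridged to the fields used here.
line = (
    '{"lang": "<|zh|>", "emotion": "<|NEUTRAL|>", "event": "<|Speech|>", '
    '"text": "开放时间早上九点至下午五点", '
    '"timestamps": [0.72, 0.96, 1.26, 1.44, 1.92, 2.10, 2.58, 2.82, 3.30, 3.90, 4.20, 4.56, 4.74], '
    '"tokens": ["开", "放", "时", "间", "早", "上", "九", "点", "至", "下", "午", "五", "点"]}'
)

result = json.loads(line)

# Each token lines up with one timestamp (seconds from the start of the file).
aligned = list(zip(result["tokens"], result["timestamps"]))
print(result["text"])
print(aligned[0])  # → ('开', 0.72)
```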
sherpa-onnx-ascend-910B-cann-8.0-sense-voice-zh-en-ja-ko-yue-2025-09-09 (Chinese, English, Japanese, Korean, Cantonese, 中英日韩粤语)
This model is converted from sherpa-onnx-sense-voice-zh-en-ja-ko-yue-int8-2025-09-09 (Chinese, English, Japanese, Korean, Cantonese, 中英日韩粤语) using code from the following URL:
Hint
You can find how to run the export code at
The original PyTorch checkpoint is available at
Please refer to sherpa-onnx-ascend-910B-cann-8.0-sense-voice-zh-en-ja-ko-yue-2024-07-17 (Chinese, English, Japanese, Korean, Cantonese, 中英日韩粤语) for how to use this model.
sherpa-onnx-ascend-910B-cann-8.0-paraformer-zh-2023-03-28 (Chinese + English)
This model is converted from csukuangfj/sherpa-onnx-paraformer-zh-2023-03-28 (Chinese + English) using code from the following URL:
Hint
You can find how to run the export code at
Hint
It supports dynamic input shapes, but the batch size is fixed to 1 at present.
Decode long files with a VAD
The following example demonstrates how to use the model to decode a long wave file.
wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/silero_vad.onnx
wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/lei-jun-test.wav
wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models-ascend/sherpa-onnx-ascend-910B-cann-8.0-paraformer-zh-2023-03-28.tar.bz2
tar xvf sherpa-onnx-ascend-910B-cann-8.0-paraformer-zh-2023-03-28.tar.bz2
rm sherpa-onnx-ascend-910B-cann-8.0-paraformer-zh-2023-03-28.tar.bz2
ls -lh sherpa-onnx-ascend-910B-cann-8.0-paraformer-zh-2023-03-28
You should see the following output:
ls -lh sherpa-onnx-ascend-910B-cann-8.0-paraformer-zh-2023-03-28
total 1.1G
-rw------- 1 root root 291M Oct 17 23:39 decoder.om
-rw------- 1 root root 701M Oct 17 23:39 encoder.om
-rw------- 1 root root 52M Oct 17 23:39 predictor.om
-rw-r--r-- 1 root root 379 Oct 17 23:39 README.md
-rwxr-xr-x 1 root root 5.5K Nov 3 09:37 test_om.py
drwxr-xr-x 2 root root 4.0K Oct 17 23:39 test_wavs
-rw-r--r-- 1 root root 74K Oct 17 23:39 tokens.txt
Hint
The above test_om.py uses the ais_bench Python API to run the *.om models without sherpa-onnx.
Then run:
cd /path/to/sherpa-onnx/build
./bin/sherpa-onnx-vad-with-offline-asr \
--provider=ascend \
--silero-vad-model=./silero_vad.onnx \
--silero-vad-threshold=0.4 \
--paraformer="sherpa-onnx-ascend-910B-cann-8.0-paraformer-zh-2023-03-28/encoder.om,sherpa-onnx-ascend-910B-cann-8.0-paraformer-zh-2023-03-28/predictor.om,sherpa-onnx-ascend-910B-cann-8.0-paraformer-zh-2023-03-28/decoder.om" \
--tokens=sherpa-onnx-ascend-910B-cann-8.0-paraformer-zh-2023-03-28/tokens.txt \
./lei-jun-test.wav
| Wave filename | Content |
|---|---|
| lei-jun-test.wav | (embedded audio player) |
The output is given below:
/root/open-source/sherpa-onnx/sherpa-onnx/csrc/parse-options.cc:Read:372 ./bin/sherpa-onnx-vad-with-offline-asr --provider=ascend --silero-vad-model=./silero_vad.onnx --silero-vad-threshold=0.4 --paraformer=sherpa-onnx-ascend-910B-cann-8.0-paraformer-zh-2023-03-28/encoder.om,sherpa-onnx-ascend-910B-cann-8.0-paraformer-zh-2023-03-28/predictor.om,sherpa-onnx-ascend-910B-cann-8.0-paraformer-zh-2023-03-28/decoder.om --tokens=sherpa-onnx-ascend-910B-cann-8.0-paraformer-zh-2023-03-28/tokens.txt ./lei-jun-test.wav
VadModelConfig(silero_vad=SileroVadModelConfig(model="./silero_vad.onnx", threshold=0.4, min_silence_duration=0.5, min_speech_duration=0.25, max_speech_duration=20, window_size=512), ten_vad=TenVadModelConfig(model="", threshold=0.5, min_silence_duration=0.5, min_speech_duration=0.25, max_speech_duration=20, window_size=256), sample_rate=16000, num_threads=1, provider="cpu", debug=False)
OfflineRecognizerConfig(feat_config=FeatureExtractorConfig(sampling_rate=16000, feature_dim=80, low_freq=20, high_freq=-400, dither=0, normalize_samples=True, snip_edges=False), model_config=OfflineModelConfig(transducer=OfflineTransducerModelConfig(encoder_filename="", decoder_filename="", joiner_filename=""), paraformer=OfflineParaformerModelConfig(model="sherpa-onnx-ascend-910B-cann-8.0-paraformer-zh-2023-03-28/encoder.om,sherpa-onnx-ascend-910B-cann-8.0-paraformer-zh-2023-03-28/predictor.om,sherpa-onnx-ascend-910B-cann-8.0-paraformer-zh-2023-03-28/decoder.om"), nemo_ctc=OfflineNemoEncDecCtcModelConfig(model=""), whisper=OfflineWhisperModelConfig(encoder="", decoder="", language="", task="transcribe", tail_paddings=-1), fire_red_asr=OfflineFireRedAsrModelConfig(encoder="", decoder=""), tdnn=OfflineTdnnModelConfig(model=""), zipformer_ctc=OfflineZipformerCtcModelConfig(model=""), wenet_ctc=OfflineWenetCtcModelConfig(model=""), sense_voice=OfflineSenseVoiceModelConfig(model="", language="auto", use_itn=False), moonshine=OfflineMoonshineModelConfig(preprocessor="", encoder="", uncached_decoder="", cached_decoder=""), dolphin=OfflineDolphinModelConfig(model=""), canary=OfflineCanaryModelConfig(encoder="", decoder="", src_lang="", tgt_lang="", use_pnc=True), telespeech_ctc="", tokens="sherpa-onnx-ascend-910B-cann-8.0-paraformer-zh-2023-03-28/tokens.txt", num_threads=2, debug=False, provider="ascend", model_type="", modeling_unit="cjkchar", bpe_vocab=""), lm_config=OfflineLMConfig(model="", scale=0.5, lodr_scale=0.01, lodr_fst="", lodr_backoff_id=-1), ctc_fst_decoder_config=OfflineCtcFstDecoderConfig(graph="", max_active=3000), decoding_method="greedy_search", max_active_paths=4, hotwords_file="", hotwords_score=1.5, blank_penalty=0, rule_fsts="", rule_fars="", hr=HomophoneReplacerConfig(lexicon="", rule_fsts=""))
Creating recognizer ...
Recognizer created!
Started
Reading: ./lei-jun-test.wav
Started!
28.934 -- 36.140: 朋友们晚上好欢迎大家来参加今天晚上的活动谢谢大家
42.118 -- 57.676: 这是我第四次办年度演讲前三次呢因为疫情的原因都在小米科技园内举办现场的人很少这是四次
58.182 -- 67.020: 我们仔细想了想我们孩子想办一个比较大的聚会然后呢让我们的新朋友老朋友一起聚一聚
67.718 -- 71.084: 今天的话呢我们就在北京的
71.654 -- 91.580: 国家会议中心呢举办了这么一个活动现场呢来了很多人大概有三千五百人还有很多很多的朋友呢通过观看直播的方式来参与再一次呢对大家的参加表示感谢谢谢大家
98.470 -- 104.780: 两个月前我参加了今年武汉大学的毕业典礼
105.894 -- 110.892: 今年呢是武汉大学建校一百三十周年
111.750 -- 117.388: 作为校友被母校邀请在毕业典礼上致辞
117.990 -- 122.892: 这对我来说是至高无上的荣誉
123.654 -- 134.380: 站在讲台的那一刻面对全校师生关于武大的所有的记忆一下子涌现在脑海里
134.950 -- 139.660: 今天呢我就先和大家聊聊五大往事
141.830 -- 144.012: 那还是三十六年前
145.926 -- 147.724: 一九八七年
148.678 -- 151.724: 我呢考上了武汉大学的计算机系
152.646 -- 156.908: 在武汉大学的图书馆里看了一本书
157.574 -- 158.796: 硅谷之火
159.302 -- 161.708: 建立了我一生的梦想
163.206 -- 164.492: 看完书以后
165.286 -- 166.508: 热血沸腾
167.590 -- 169.356: 激动的睡不着觉
170.406 -- 171.244: 我还记得
172.006 -- 174.764: 那天晚上星光很亮
175.398 -- 177.868: 我就在武大的操场上
178.342 -- 179.948: 就是屏幕上这个操场
180.774 -- 185.388: 走了一圈又一圈走了整整一个晚上
186.470 -- 187.788: 我心里有火
188.934 -- 191.884: 我也想办一个伟大公司
193.958 -- 194.860: 就是这样
197.574 -- 202.540: 梦想之火在我心里彻底点燃了
209.734 -- 212.716: 但是一个大一的新生
220.230 -- 222.828: 但是一个大一的新生
223.782 -- 227.244: 一个从县城里出来的年轻人
228.134 -- 230.764: 什么也不会什么也没有
231.526 -- 236.460: 就想创办一家伟大的公司这不就是天方夜谭吗
237.574 -- 242.476: 这么离谱的一个梦想该如何实现呢
243.846 -- 246.988: 那天晚上我想了一整晚上
247.942 -- 249.068: 说实话
250.342 -- 253.900: 越想越糊涂完全理不清头绪
254.982 -- 261.516: 后来我在想哎干脆别想了把书练好是正事
262.150 -- 265.900: 所以呢我就下定决心认认真真读书
266.630 -- 267.340: 那么
268.486 -- 271.692: 我怎么能够把书读的不同反响
num threads: 2
decoding method: greedy_search
Elapsed seconds: 5.856 s
Real time factor (RTF): 5.856 / 272.448 = 0.021
Decode a short file
The following example demonstrates how to use the model to decode a short wave file.
cd /path/to/sherpa-onnx/build
./bin/sherpa-onnx-offline \
--provider=ascend \
--paraformer="sherpa-onnx-ascend-910B-cann-8.0-paraformer-zh-2023-03-28/encoder.om,sherpa-onnx-ascend-910B-cann-8.0-paraformer-zh-2023-03-28/predictor.om,sherpa-onnx-ascend-910B-cann-8.0-paraformer-zh-2023-03-28/decoder.om" \
--tokens=sherpa-onnx-ascend-910B-cann-8.0-paraformer-zh-2023-03-28/tokens.txt \
sherpa-onnx-ascend-910B-cann-8.0-paraformer-zh-2023-03-28/test_wavs/1.wav
The output is given below:
/root/open-source/sherpa-onnx/sherpa-onnx/csrc/parse-options.cc:Read:372 ./bin/sherpa-onnx-offline --provider=ascend --paraformer=sherpa-onnx-ascend-910B-cann-8.0-paraformer-zh-2023-03-28/encoder.om,sherpa-onnx-ascend-910B-cann-8.0-paraformer-zh-2023-03-28/predictor.om,sherpa-onnx-ascend-910B-cann-8.0-paraformer-zh-2023-03-28/decoder.om --tokens=sherpa-onnx-ascend-910B-cann-8.0-paraformer-zh-2023-03-28/tokens.txt sherpa-onnx-ascend-910B-cann-8.0-paraformer-zh-2023-03-28/test_wavs/1.wav
OfflineRecognizerConfig(feat_config=FeatureExtractorConfig(sampling_rate=16000, feature_dim=80, low_freq=20, high_freq=-400, dither=0, normalize_samples=True, snip_edges=False), model_config=OfflineModelConfig(transducer=OfflineTransducerModelConfig(encoder_filename="", decoder_filename="", joiner_filename=""), paraformer=OfflineParaformerModelConfig(model="sherpa-onnx-ascend-910B-cann-8.0-paraformer-zh-2023-03-28/encoder.om,sherpa-onnx-ascend-910B-cann-8.0-paraformer-zh-2023-03-28/predictor.om,sherpa-onnx-ascend-910B-cann-8.0-paraformer-zh-2023-03-28/decoder.om"), nemo_ctc=OfflineNemoEncDecCtcModelConfig(model=""), whisper=OfflineWhisperModelConfig(encoder="", decoder="", language="", task="transcribe", tail_paddings=-1), fire_red_asr=OfflineFireRedAsrModelConfig(encoder="", decoder=""), tdnn=OfflineTdnnModelConfig(model=""), zipformer_ctc=OfflineZipformerCtcModelConfig(model=""), wenet_ctc=OfflineWenetCtcModelConfig(model=""), sense_voice=OfflineSenseVoiceModelConfig(model="", language="auto", use_itn=False), moonshine=OfflineMoonshineModelConfig(preprocessor="", encoder="", uncached_decoder="", cached_decoder=""), dolphin=OfflineDolphinModelConfig(model=""), canary=OfflineCanaryModelConfig(encoder="", decoder="", src_lang="", tgt_lang="", use_pnc=True), telespeech_ctc="", tokens="sherpa-onnx-ascend-910B-cann-8.0-paraformer-zh-2023-03-28/tokens.txt", num_threads=2, debug=False, provider="ascend", model_type="", modeling_unit="cjkchar", bpe_vocab=""), lm_config=OfflineLMConfig(model="", scale=0.5, lodr_scale=0.01, lodr_fst="", lodr_backoff_id=-1), ctc_fst_decoder_config=OfflineCtcFstDecoderConfig(graph="", max_active=3000), decoding_method="greedy_search", max_active_paths=4, hotwords_file="", hotwords_score=1.5, blank_penalty=0, rule_fsts="", rule_fars="", hr=HomophoneReplacerConfig(lexicon="", rule_fsts=""))
Creating recognizer ...
Started
Done!
sherpa-onnx-ascend-910B-cann-8.0-paraformer-zh-2023-03-28/test_wavs/1.wav
{"lang": "", "emotion": "", "event": "", "text": "重点呢想谈三个问题首先呢就是这一轮全球金融动荡的表现", "timestamps": [], "durations": [], "tokens":["重", "点", "呢", "想", "谈", "三", "个", "问", "题", "首", "先", "呢", "就", "是", "这", "一", "轮", "全", "球", "金", "融", "动", "荡", "的", "表", "现"], "words": []}
----
num threads: 2
decoding method: greedy_search
Elapsed seconds: 0.109 s
Real time factor (RTF): 0.109 / 5.156 = 0.021
sherpa-onnx-ascend-910B2-cann-8.0-paraformer-zh-2025-10-07 (Sichuan and Chongqing dialects, 四川话、重庆话、川渝方言)
This model is converted from sherpa-onnx-paraformer-zh-int8-2025-10-07 (Sichuan and Chongqing dialects, 四川话、重庆话、川渝方言) using code from the following URL:
Hint
You can find how to run the export code at
The original PyTorch checkpoint is available at
Hint
It supports dynamic input shapes, but the batch size is fixed to 1 at present.
Please refer to sherpa-onnx-ascend-910B-cann-8.0-paraformer-zh-2023-03-28 (Chinese + English) for how to use this model.
sherpa-onnx-ascend-910B-cann-7.0-5-seconds-zipformer-ctc-zh-2025-07-03 (Chinese, 中文)
This model is converted from sherpa-onnx-zipformer-ctc-zh-int8-2025-07-03 (Chinese) and supports only Chinese.
This model accepts input audio of at most 5 seconds. Shorter audio is internally padded, and longer audio is truncated.
If you need to handle longer input, select a model with a larger limit. For instance,
sherpa-onnx-ascend-910B-cann-7.0-10-seconds-zipformer-ctc-zh-2025-07-03 accepts input audio up to 10 seconds.
We provide models accepting input from 5 to 30 seconds. See https://github.com/k2-fsa/sherpa-onnx/releases/tag/asr-models-ascend for more details.
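Since each of these models has a fixed maximum input length, you either pick a model whose limit covers your audio or accept padding/truncation. The hypothetical helper below sketches both steps; the 5-second bucket spacing is an assumption based on the release naming, and 16 kHz matches the sample rate used throughout this page:

```python
SAMPLE_RATE = 16000  # Hz; matches the feature configs shown on this page
BUCKETS_SECONDS = [5, 10, 15, 20, 25, 30]  # assumed 5-second spacing between released models

def pick_bucket(audio_seconds: float) -> int:
    """Return the smallest model limit that covers the audio; fall back to the largest."""
    for limit in BUCKETS_SECONDS:
        if audio_seconds <= limit:
            return limit
    return BUCKETS_SECONDS[-1]

def fit_to_limit(samples: list, limit_seconds: int) -> list:
    """Zero-pad short audio or truncate long audio to exactly the model's input length."""
    target = limit_seconds * SAMPLE_RATE
    if len(samples) < target:
        return samples + [0.0] * (target - len(samples))
    return samples[:target]

# The 5.611 s test file below does not fit the 5-second model, so it would be truncated;
# the 10-second model covers it without loss.
print(pick_bucket(5.611))  # → 10
```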
Hint
You can find how to run the export code at
Decode a short file
The following example demonstrates how to use the model to decode a short wave file.
cd /path/to/sherpa-onnx/build
wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models-ascend/sherpa-onnx-ascend-910B-cann-7.0-5-seconds-zipformer-ctc-zh-2025-07-03.tar.bz2
tar xvf sherpa-onnx-ascend-910B-cann-7.0-5-seconds-zipformer-ctc-zh-2025-07-03.tar.bz2
rm sherpa-onnx-ascend-910B-cann-7.0-5-seconds-zipformer-ctc-zh-2025-07-03.tar.bz2
./bin/sherpa-onnx-offline \
--provider=ascend \
--zipformer-ctc-model=./sherpa-onnx-ascend-910B-cann-7.0-5-seconds-zipformer-ctc-zh-2025-07-03/model.om \
--tokens=./sherpa-onnx-ascend-910B-cann-7.0-5-seconds-zipformer-ctc-zh-2025-07-03/tokens.txt \
./sherpa-onnx-ascend-910B-cann-7.0-5-seconds-zipformer-ctc-zh-2025-07-03/test_wavs/0.wav
The output is given below:
OfflineRecognizerConfig(feat_config=FeatureExtractorConfig(sampling_rate=16000, feature_dim=80, low_freq=20, high_freq=-400, dither=0, normalize_samples=True, snip_edges=False), model_config=OfflineModelConfig(transducer=OfflineTransducerModelConfig(encoder_filename="", decoder_filename="", joiner_filename=""), paraformer=OfflineParaformerModelConfig(model=""), nemo_ctc=OfflineNemoEncDecCtcModelConfig(model=""), whisper=OfflineWhisperModelConfig(encoder="", decoder="", language="", task="transcribe", tail_paddings=-1), fire_red_asr=OfflineFireRedAsrModelConfig(encoder="", decoder=""), tdnn=OfflineTdnnModelConfig(model=""), zipformer_ctc=OfflineZipformerCtcModelConfig(model="./sherpa-onnx-ascend-910B-cann-7.0-5-seconds-zipformer-ctc-zh-2025-07-03/model.om"), wenet_ctc=OfflineWenetCtcModelConfig(model=""), sense_voice=OfflineSenseVoiceModelConfig(model="", language="auto", use_itn=False), moonshine=OfflineMoonshineModelConfig(preprocessor="", encoder="", uncached_decoder="", cached_decoder=""), dolphin=OfflineDolphinModelConfig(model=""), canary=OfflineCanaryModelConfig(encoder="", decoder="", src_lang="", tgt_lang="", use_pnc=True), omnilingual=OfflineOmnilingualAsrCtcModelConfig(model=""), telespeech_ctc="", tokens="./sherpa-onnx-ascend-910B-cann-7.0-5-seconds-zipformer-ctc-zh-2025-07-03/tokens.txt", num_threads=2, debug=False, provider="ascend", model_type="", modeling_unit="cjkchar", bpe_vocab=""), lm_config=OfflineLMConfig(model="", scale=0.5, lodr_scale=0.01, lodr_fst="", lodr_backoff_id=-1), ctc_fst_decoder_config=OfflineCtcFstDecoderConfig(graph="", max_active=3000), decoding_method="greedy_search", max_active_paths=4, hotwords_file="", hotwords_score=1.5, blank_penalty=0, rule_fsts="", rule_fars="", hr=HomophoneReplacerConfig(lexicon="", rule_fsts=""))
Creating recognizer ...
recognizer created in 3.108 s
Started
/root/sherpa-onnx-master/sherpa-onnx/csrc/ascend/offline-zipformer-ctc-model-ascend.cc:Run:58 Number of input frames 561 is too large. Truncate it to 500 frames.
/root/sherpa-onnx-master/sherpa-onnx/csrc/ascend/offline-zipformer-ctc-model-ascend.cc:Run:62 Recognition result may be truncated/incomplete. Please select a model accepting longer audios.
Done!
./sherpa-onnx-ascend-910B-cann-7.0-5-seconds-zipformer-ctc-zh-2025-07-03/test_wavs/0.wav
{"lang": "", "emotion": "", "event": "", "text": "对我做了介绍那么我想说的是呢大家如果对我的研究感兴趣呢", "timestamps": [0.00, 0.32, 0.48, 0.64, 0.80, 0.96, 1.08, 1.16, 1.60, 1.76, 1.92, 2.08, 2.24, 2.40, 2.56, 2.72, 3.04, 3.20, 3.36, 3.44, 3.52, 3.68, 3.76, 3.84, 4.00, 4.16, 4.32, 4.48, 4.60, 4.68, 4.80], "durations": [], "tokens":["▁ƌŕş", "▁ƍĩĴ", "▁ƌĢĽ", "▁ƋŠħ", "▁ƋšĬ", "▁Ǝ", "š", "Į", "▁Ɛģň", "▁Ƌşĩ", "▁ƍĩĴ", "▁ƍĤř", "▁ƏŕŚ", "▁ƎĽĥ", "▁ƍĻŕ", "▁ƌĴŇ", "▁ƌŊō", "▁ƌŔŜ", "▁ƌŌģ", "▁ƍŃŁ", "▁ƌŕş", "▁ƍĩĴ", "▁ƎĽĥ", "▁ƎŅķ", "▁ƎŏŜ", "▁ƍĥń", "▁ƌĦŚ", "▁Ə", "Ŝ", "ň", "▁ƌĴŇ"], "words": []}
----
num threads: 2
decoding method: greedy_search
Elapsed seconds: 0.041 s
Real time factor (RTF): 0.041 / 5.611 = 0.007