Pre-trained Models
This page describes how to download pre-trained FireRedAsr models.
Note that we support models from the following two repositories:
v1 contains only one model, based on an attention encoder-decoder (AED); it is somewhat slow on CPU.
v2 adds a CTC model, which is very fast on CPU. The AED model in v2 is also much faster than its v1 counterpart.
sherpa-onnx-fire-red-asr2-ctc-zh_en-int8-2026-02-25 (v2, CTC, Chinese + English, more than 20 dialects and accents: Mandarin, Cantonese (Hong Kong and Guangdong), Sichuanese, Shanghainese, Wu, Min Nan, and the dialects of Anhui, Fujian, Gansu, Guizhou, Hebei, Henan, Hubei, Hunan, Jiangxi, Liaoning, Ningxia, Shaanxi, Shanxi, Shandong, Tianjin, Yunnan, etc.)
This model is converted from https://www.modelscope.cn/models/FireRedTeam/FireRedASR2-AED. Note that only the CTC branch is converted. The attention decoder branch is excluded.
It supports both Chinese and English, as well as more than 20 dialects and accents.
Note
Supported Chinese dialects/accents: Cantonese (Hong Kong and Guangdong), Sichuanese, Shanghainese, Wu, Min Nan, and the dialects of Anhui, Fujian, Gansu, Guizhou, Hebei, Henan, Hubei, Hunan, Jiangxi, Liaoning, Ningxia, Shaanxi, Shanxi, Shandong, Tianjin, Yunnan, among others.
The sections below show how to use it.
Real-time/streaming speech recognition on Android
Please visit https://k2-fsa.github.io/sherpa/onnx/android/apk.html and select the file
sherpa-onnx-<version>-arm64-v8a-simulated_streaming_asr-zh_en-fire_red_asr2_ctc_int8_2026_02_25.apk
Note
For instance, if you choose version 1.12.27, you should use sherpa-onnx-1.12.27-arm64-v8a-simulated_streaming_asr-zh_en-fire_red_asr2_ctc_int8_2026_02_25.apk
Users in China: please use sherpa-onnx-1.12.27-arm64-v8a-simulated_streaming_asr-zh_en-fire_red_asr2_ctc_int8_2026_02_25.apk
The source code for the APK can be found at
See Build sherpa-onnx for Android for how to build our Android demo.
Download
Please use the following commands to download it:
cd /path/to/sherpa-onnx
wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/sherpa-onnx-fire-red-asr2-ctc-zh_en-int8-2026-02-25.tar.bz2
tar xvf sherpa-onnx-fire-red-asr2-ctc-zh_en-int8-2026-02-25.tar.bz2
rm sherpa-onnx-fire-red-asr2-ctc-zh_en-int8-2026-02-25.tar.bz2
After downloading, you should find the following files:
ls -lh sherpa-onnx-fire-red-asr2-ctc-zh_en-int8-2026-02-25
total 1515528
-rw-r--r--@ 1 fangjun staff 740M 26 Feb 13:42 model.int8.onnx
-rw-r--r--@ 1 fangjun staff 190B 26 Feb 13:35 README.md
drwxr-xr-x@ 10 fangjun staff 320B 26 Feb 13:42 test_wavs
-rw-r--r--@ 1 fangjun staff 77K 26 Feb 13:42 tokens.txt
ls -lh sherpa-onnx-fire-red-asr2-ctc-zh_en-int8-2026-02-25/test_wavs/
total 3848
-rw-r--r--@ 1 fangjun staff 314K 26 Feb 13:42 0.wav
-rw-r--r--@ 1 fangjun staff 159K 26 Feb 13:42 1.wav
-rw-r--r--@ 1 fangjun staff 147K 26 Feb 13:42 2.wav
-rw-r--r--@ 1 fangjun staff 245K 26 Feb 13:42 3-sichuan.wav
-rw-r--r--@ 1 fangjun staff 276K 26 Feb 13:42 3.wav
-rw-r--r--@ 1 fangjun staff 244K 26 Feb 13:42 4-tianjin.wav
-rw-r--r--@ 1 fangjun staff 250K 26 Feb 13:42 5-henan.wav
-rw-r--r--@ 1 fangjun staff 276K 26 Feb 13:42 8k.wav
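Note that 8k.wav is sampled at 8 kHz, while the other test files are 16 kHz. If you are unsure about a file's sample rate before decoding, you can check it with Python's standard wave module (the path in the comment is illustrative):

```python
import wave

def sample_rate(path):
    """Return the sample rate (Hz) of a PCM WAV file."""
    with wave.open(path, "rb") as f:
        return f.getframerate()

# Example usage:
# print(sample_rate("sherpa-onnx-fire-red-asr2-ctc-zh_en-int8-2026-02-25/test_wavs/8k.wav"))
```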
Decode a file
Please use the following command to decode a wave file:
./build/bin/sherpa-onnx-offline \
--num-threads=1 \
--fire-red-asr-ctc=./sherpa-onnx-fire-red-asr2-ctc-zh_en-int8-2026-02-25/model.int8.onnx \
--tokens=./sherpa-onnx-fire-red-asr2-ctc-zh_en-int8-2026-02-25/tokens.txt \
./sherpa-onnx-fire-red-asr2-ctc-zh_en-int8-2026-02-25/test_wavs/1.wav
You should see the following output:
/Users/fangjun/open-source/sherpa-onnx/sherpa-onnx/csrc/parse-options.cc:Read:373 ./build/bin/sherpa-onnx-offline --num-threads=1 --fire-red-asr-ctc=./sherpa-onnx-fire-red-asr2-ctc-zh_en-int8-2026-02-25/model.int8.onnx --tokens=./sherpa-onnx-fire-red-asr2-ctc-zh_en-int8-2026-02-25/tokens.txt ./sherpa-onnx-fire-red-asr2-ctc-zh_en-int8-2026-02-25/test_wavs/1.wav
OfflineRecognizerConfig(feat_config=FeatureExtractorConfig(sampling_rate=16000, feature_dim=80, low_freq=20, high_freq=-400, dither=0, normalize_samples=True, snip_edges=False), model_config=OfflineModelConfig(transducer=OfflineTransducerModelConfig(encoder_filename="", decoder_filename="", joiner_filename=""), paraformer=OfflineParaformerModelConfig(model=""), nemo_ctc=OfflineNemoEncDecCtcModelConfig(model=""), whisper=OfflineWhisperModelConfig(encoder="", decoder="", language="", task="transcribe", tail_paddings=-1, enable_token_timestamps=False, enable_segment_timestamps=False), fire_red_asr=OfflineFireRedAsrModelConfig(encoder="", decoder=""), tdnn=OfflineTdnnModelConfig(model=""), zipformer_ctc=OfflineZipformerCtcModelConfig(model=""), wenet_ctc=OfflineWenetCtcModelConfig(model=""), sense_voice=OfflineSenseVoiceModelConfig(model="", language="auto", use_itn=False), moonshine=OfflineMoonshineModelConfig(preprocessor="", encoder="", uncached_decoder="", cached_decoder="", merged_decoder=""), dolphin=OfflineDolphinModelConfig(model=""), canary=OfflineCanaryModelConfig(encoder="", decoder="", src_lang="", tgt_lang="", use_pnc=True), omnilingual=OfflineOmnilingualAsrCtcModelConfig(model=""), funasr_nano=OfflineFunASRNanoModelConfig(encoder_adaptor="", llm="", embedding="", tokenizer="", system_prompt="You are a helpful assistant.", user_prompt="语音转写:", max_new_tokens=512, temperature=1e-06, top_p=0.8, seed=42, language="", itn=True, hotwords=""), medasr=OfflineMedAsrCtcModelConfig(model=""), fire_red_asr_ctc=OfflineFireRedAsrCtcModelConfig(model="./sherpa-onnx-fire-red-asr2-ctc-zh_en-int8-2026-02-25/model.int8.onnx"), telespeech_ctc="", tokens="./sherpa-onnx-fire-red-asr2-ctc-zh_en-int8-2026-02-25/tokens.txt", num_threads=1, debug=False, provider="cpu", model_type="", modeling_unit="cjkchar", bpe_vocab=""), lm_config=OfflineLMConfig(model="", scale=0.5, lodr_scale=0.01, lodr_fst="", lodr_backoff_id=-1), ctc_fst_decoder_config=OfflineCtcFstDecoderConfig(graph="", 
max_active=3000), decoding_method="greedy_search", max_active_paths=4, hotwords_file="", hotwords_score=1.5, blank_penalty=0, rule_fsts="", rule_fars="", hr=HomophoneReplacerConfig(lexicon="", rule_fsts=""))
Creating recognizer ...
recognizer created in 0.496 s
Started
Done!
./sherpa-onnx-fire-red-asr2-ctc-zh_en-int8-2026-02-25/test_wavs/1.wav
{"lang": "", "emotion": "", "event": "", "text": "这是第一种第二种叫呃与 ALWAYSISE ALWAYS什么意思啊", "timestamps": [0.76, 0.92, 1.04, 1.16, 1.28, 1.48, 1.60, 1.76, 2.04, 2.72, 3.16, 3.56, 3.80, 4.12, 4.52, 4.60, 4.72, 4.84, 4.92], "durations": [], "tokens":["这", "是", "第", "一", "种", "第", "二", "种", "叫", "呃", "与", " ALWAYS", "ISE", " ALWAYS", "什", "么", "意", "思", "啊"], "ys_log_probs": [], "words": []}
----
num threads: 1
decoding method: greedy_search
Elapsed seconds: 0.883 s
Real time factor (RTF): 0.883 / 5.100 = 0.173
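The last line printed for each wave file is a JSON object. If you want to post-process the result, the standard json module is enough; the sketch below uses an abridged copy of the output above to pair each token with its start time in seconds:

```python
import json

# An abridged version of the JSON result printed above.
result = json.loads(
    '{"text": "这是第一种", '
    '"timestamps": [0.76, 0.92, 1.04, 1.16, 1.28], '
    '"tokens": ["这", "是", "第", "一", "种"]}'
)

# Pair each token with the time (in seconds) at which it starts.
aligned = list(zip(result["tokens"], result["timestamps"]))
for token, t in aligned:
    print(f"{t:5.2f}s  {token}")
```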
Decode long files with a VAD
The following example demonstrates how to use the model to decode a long wave file.
wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/silero_vad.onnx
wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/lei-jun-test.wav
build/bin/sherpa-onnx-vad-with-offline-asr \
--num-threads=3 \
--silero-vad-model=./silero_vad.onnx \
--fire-red-asr-ctc=./sherpa-onnx-fire-red-asr2-ctc-zh_en-int8-2026-02-25/model.int8.onnx \
--tokens=./sherpa-onnx-fire-red-asr2-ctc-zh_en-int8-2026-02-25/tokens.txt \
./lei-jun-test.wav
You should see the following output:
/Users/fangjun/open-source/sherpa-onnx/sherpa-onnx/csrc/parse-options.cc:Read:373 build/bin/sherpa-onnx-vad-with-offline-asr --num-threads=3 --silero-vad-model=./silero_vad.onnx --fire-red-asr-ctc=./sherpa-onnx-fire-red-asr2-ctc-zh_en-int8-2026-02-25/model.int8.onnx --tokens=./sherpa-onnx-fire-red-asr2-ctc-zh_en-int8-2026-02-25/tokens.txt ./lei-jun-test.wav
VadModelConfig(silero_vad=SileroVadModelConfig(model="./silero_vad.onnx", threshold=0.5, min_silence_duration=0.5, min_speech_duration=0.25, max_speech_duration=20, window_size=512, neg_threshold=-1), ten_vad=TenVadModelConfig(model="", threshold=0.5, min_silence_duration=0.5, min_speech_duration=0.25, max_speech_duration=20, window_size=256), sample_rate=16000, num_threads=1, provider="cpu", debug=False)
OfflineRecognizerConfig(feat_config=FeatureExtractorConfig(sampling_rate=16000, feature_dim=80, low_freq=20, high_freq=-400, dither=0, normalize_samples=True, snip_edges=False), model_config=OfflineModelConfig(transducer=OfflineTransducerModelConfig(encoder_filename="", decoder_filename="", joiner_filename=""), paraformer=OfflineParaformerModelConfig(model=""), nemo_ctc=OfflineNemoEncDecCtcModelConfig(model=""), whisper=OfflineWhisperModelConfig(encoder="", decoder="", language="", task="transcribe", tail_paddings=-1, enable_token_timestamps=False, enable_segment_timestamps=False), fire_red_asr=OfflineFireRedAsrModelConfig(encoder="", decoder=""), tdnn=OfflineTdnnModelConfig(model=""), zipformer_ctc=OfflineZipformerCtcModelConfig(model=""), wenet_ctc=OfflineWenetCtcModelConfig(model=""), sense_voice=OfflineSenseVoiceModelConfig(model="", language="auto", use_itn=False), moonshine=OfflineMoonshineModelConfig(preprocessor="", encoder="", uncached_decoder="", cached_decoder="", merged_decoder=""), dolphin=OfflineDolphinModelConfig(model=""), canary=OfflineCanaryModelConfig(encoder="", decoder="", src_lang="", tgt_lang="", use_pnc=True), omnilingual=OfflineOmnilingualAsrCtcModelConfig(model=""), funasr_nano=OfflineFunASRNanoModelConfig(encoder_adaptor="", llm="", embedding="", tokenizer="", system_prompt="You are a helpful assistant.", user_prompt="语音转写:", max_new_tokens=512, temperature=1e-06, top_p=0.8, seed=42, language="", itn=True, hotwords=""), medasr=OfflineMedAsrCtcModelConfig(model=""), fire_red_asr_ctc=OfflineFireRedAsrCtcModelConfig(model="./sherpa-onnx-fire-red-asr2-ctc-zh_en-int8-2026-02-25/model.int8.onnx"), telespeech_ctc="", tokens="./sherpa-onnx-fire-red-asr2-ctc-zh_en-int8-2026-02-25/tokens.txt", num_threads=3, debug=False, provider="cpu", model_type="", modeling_unit="cjkchar", bpe_vocab=""), lm_config=OfflineLMConfig(model="", scale=0.5, lodr_scale=0.01, lodr_fst="", lodr_backoff_id=-1), ctc_fst_decoder_config=OfflineCtcFstDecoderConfig(graph="", 
max_active=3000), decoding_method="greedy_search", max_active_paths=4, hotwords_file="", hotwords_score=1.5, blank_penalty=0, rule_fsts="", rule_fars="", hr=HomophoneReplacerConfig(lexicon="", rule_fsts=""))
Creating recognizer ...
Recognizer created!
Started
Reading: ./lei-jun-test.wav
Started!
29.918 -- 34.028: 上好欢迎大家来参加今天晚上的活动
34.494 -- 35.660: 谢谢大家
42.174 -- 43.916: 这是我第四次
45.054 -- 45.932: 年度演讲
47.006 -- 49.772: 前三次呢因为疫情的原因
50.526 -- 55.212: 都在小米科技园内举办现场呢的人很少
56.158 -- 57.292: 这是第四次
58.206 -- 62.316: 我们仔细想了想我们还是想办一个比较大的聚会
62.846 -- 66.636: 然后呢让我们的新朋友老朋友一起聚一聚
67.742 -- 68.748: 今天的话呢
69.214 -- 70.668: 我们就在北京的
71.646 -- 74.668: 国家会议中心呢举办了这么一个活动
75.486 -- 79.276: 现场呢来了很多人大概有三千五百人
79.998 -- 81.932: 还有很多很多的朋友呢
82.686 -- 85.612: 通过观看直播的方式来参与
86.366 -- 90.284: 再一次呢对大家的参加表示感谢谢谢大家
98.494 -- 99.564: 两个月前
100.382 -- 104.236: 我参加了今年武汉大学的毕业典礼
105.950 -- 107.084: 今年呢是
107.902 -- 110.380: 武汉大学建校一百三十周年
111.742 -- 112.652: 作为校友
113.374 -- 114.636: 被母校邀请
115.230 -- 117.068: 在毕业典礼上致辞
118.046 -- 119.180: 这对我来说
119.934 -- 122.540: 是志高无上的荣誉
123.678 -- 125.516: 站在讲台的那一刻
126.238 -- 128.332: 面对全校师生
129.214 -- 134.092: 关于武大的所有的记忆一下子涌现在脑海里
134.974 -- 139.212: 今天呢我就先和大家聊聊五大往事
141.982 -- 143.788: 还是三十六年前
145.950 -- 147.436: 一九八七年
148.702 -- 151.372: 我呢考上了武汉大学的计算机系
152.702 -- 156.524: 在武汉大学的图书馆里看了一本书
157.566 -- 158.540: 硅谷之火
159.326 -- 161.612: 建立了我一生的梦想
163.294 -- 164.428: 看完书以后
165.278 -- 166.412: 热血沸腾
167.614 -- 169.292: 激动的睡不着觉
170.430 -- 171.212: 我还记得
171.998 -- 174.540: 那天晚上星光很亮
175.390 -- 177.612: 我就在五大的操场上
178.366 -- 179.692: 就是屏幕上这个操场
180.798 -- 182.444: 走了一圈又一圈
182.974 -- 185.100: 走了整整一个晚上
186.494 -- 187.692: 我心里有团火
188.926 -- 191.692: 我也想办一个伟大的公司
193.950 -- 194.764: 就是这样
197.662 -- 198.764: 梦想吃火
199.294 -- 202.220: 在我心里彻底点燃了
210.814 -- 212.300: 一个大一的新
220.990 -- 222.508: 一个大一的新生
223.998 -- 226.796: 个从县城里出来的年轻人
228.478 -- 230.476: 么也不会什么也没有
231.550 -- 236.012: 就想创办一家伟大的公司这不就是天荒夜谭吗
237.726 -- 239.596: 这么离谱的一个梦想
240.414 -- 242.156: 该如何实现呢
243.870 -- 244.716: 那天晚上
245.182 -- 246.732: 我想了一整晚上
247.998 -- 248.812: 说实话
250.334 -- 253.676: 越想越糊涂完全理不清头绪
255.006 -- 255.948: 后来我在想
256.830 -- 257.740: 干脆别想了
258.366 -- 259.692: 把书好
260.446 -- 261.196: 是正是
262.174 -- 262.892: 所以呢
263.390 -- 265.644: 我就下定决心认认正真读书
266.654 -- 267.084: 那么
268.478 -- 271.276: 我怎么能够把书读的不同反响呢
num threads: 3
decoding method: greedy_search
Elapsed seconds: 15.693 s
Real time factor (RTF): 15.693 / 272.448 = 0.058
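Each segment above has the form start -- end: text. If you want to turn this output into, say, subtitles, a small parser sketch (the line format is taken from the log above) is all you need:

```python
import re

# Matches lines like "29.918 -- 34.028: 谢谢大家"
SEGMENT = re.compile(r"^\s*(\d+\.\d+)\s*--\s*(\d+\.\d+):\s*(.*)$")

def parse_segments(lines):
    """Yield (start_seconds, end_seconds, text) for each matching line."""
    for line in lines:
        m = SEGMENT.match(line)
        if m:
            yield float(m.group(1)), float(m.group(2)), m.group(3)

demo = [
    "29.918 -- 34.028: 上好欢迎大家来参加今天晚上的活动",
    "34.494 -- 35.660: 谢谢大家",
]
for start, end, text in parse_segments(demo):
    print(f"[{start:.3f}, {end:.3f}] {text}")
```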
Real-time/streaming speech recognition from a microphone with VAD
wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/silero_vad.onnx
./build/bin/sherpa-onnx-vad-microphone-simulated-streaming-asr \
--silero-vad-model=./silero_vad.onnx \
--fire-red-asr-ctc=./sherpa-onnx-fire-red-asr2-ctc-zh_en-int8-2026-02-25/model.int8.onnx \
--tokens=./sherpa-onnx-fire-red-asr2-ctc-zh_en-int8-2026-02-25/tokens.txt
Speech recognition from a microphone
cd /path/to/sherpa-onnx
./build/bin/sherpa-onnx-microphone-offline \
--fire-red-asr-ctc=./sherpa-onnx-fire-red-asr2-ctc-zh_en-int8-2026-02-25/model.int8.onnx \
--tokens=./sherpa-onnx-fire-red-asr2-ctc-zh_en-int8-2026-02-25/tokens.txt
Speech recognition from a microphone with VAD
cd /path/to/sherpa-onnx
wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/silero_vad.onnx
./build/bin/sherpa-onnx-vad-microphone-offline-asr \
--silero-vad-model=./silero_vad.onnx \
--fire-red-asr-ctc=./sherpa-onnx-fire-red-asr2-ctc-zh_en-int8-2026-02-25/model.int8.onnx \
--tokens=./sherpa-onnx-fire-red-asr2-ctc-zh_en-int8-2026-02-25/tokens.txt
sherpa-onnx-fire-red-asr2-zh_en-int8-2026-02-26 (v2, AED, Chinese + English, more than 20 dialects and accents: Mandarin, Cantonese (Hong Kong and Guangdong), Sichuanese, Shanghainese, Wu, Min Nan, and the dialects of Anhui, Fujian, Gansu, Guizhou, Hebei, Henan, Hubei, Hunan, Jiangxi, Liaoning, Ningxia, Shaanxi, Shanxi, Shandong, Tianjin, Yunnan, etc.)
This model is converted from https://www.modelscope.cn/models/FireRedTeam/FireRedASR2-AED.
It supports both Chinese and English, as well as more than 20 dialects and accents.
Note
Supported Chinese dialects/accents: Cantonese (Hong Kong and Guangdong), Sichuanese, Shanghainese, Wu, Min Nan, and the dialects of Anhui, Fujian, Gansu, Guizhou, Hebei, Henan, Hubei, Hunan, Jiangxi, Liaoning, Ningxia, Shaanxi, Shanxi, Shandong, Tianjin, Yunnan, among others.
The sections below show how to use it.
Download
Please use the following commands to download it:
cd /path/to/sherpa-onnx
wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/sherpa-onnx-fire-red-asr2-zh_en-int8-2026-02-26.tar.bz2
tar xvf sherpa-onnx-fire-red-asr2-zh_en-int8-2026-02-26.tar.bz2
rm sherpa-onnx-fire-red-asr2-zh_en-int8-2026-02-26.tar.bz2
After downloading, you should find the following files:
ls -lh sherpa-onnx-fire-red-asr2-zh_en-int8-2026-02-26
total 2426960
-rw-r--r--@ 1 fangjun staff 398M 26 Feb 13:57 decoder.int8.onnx
-rw-r--r--@ 1 fangjun staff 779M 26 Feb 13:57 encoder.int8.onnx
-rw-r--r--@ 1 fangjun staff 107B 26 Feb 13:57 README.md
drwxr-xr-x@ 10 fangjun staff 320B 26 Feb 13:57 test_wavs
-rw-r--r--@ 1 fangjun staff 77K 26 Feb 13:57 tokens.txt
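The tokens.txt file maps each modeling unit to an integer id. The sketch below assumes the common sherpa-onnx layout of one token per line followed by its id, with BPE-style English units possibly carrying a leading space; inspect your copy of the file to confirm before relying on this:

```python
# Parse a tokens.txt where each line is "<token> <id>".
# NOTE: this layout is an assumption; verify it against your tokens.txt.

def load_tokens(lines):
    """Return a dict mapping integer id -> token string."""
    id2token = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        # The token may itself start with a space (BPE-style units),
        # so split only once, from the right, on the last whitespace run.
        token, idx = line.rsplit(maxsplit=1)
        id2token[int(idx)] = token
    return id2token

sample = ["<blk> 0", "昨 1", " MO 2"]
print(load_tokens(sample))
```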
Decode a file
Please use the following command to decode a wave file:
./build/bin/sherpa-onnx-offline \
--fire-red-asr-encoder=./sherpa-onnx-fire-red-asr2-zh_en-int8-2026-02-26/encoder.int8.onnx \
--fire-red-asr-decoder=./sherpa-onnx-fire-red-asr2-zh_en-int8-2026-02-26/decoder.int8.onnx \
--tokens=./sherpa-onnx-fire-red-asr2-zh_en-int8-2026-02-26/tokens.txt \
--num-threads=1 \
./sherpa-onnx-fire-red-asr2-zh_en-int8-2026-02-26/test_wavs/0.wav
You should see the following output:
/Users/fangjun/open-source/sherpa-onnx/sherpa-onnx/csrc/parse-options.cc:Read:373 ./build/bin/sherpa-onnx-offline --fire-red-asr-encoder=./sherpa-onnx-fire-red-asr2-zh_en-int8-2026-02-26/encoder.int8.onnx --fire-red-asr-decoder=./sherpa-onnx-fire-red-asr2-zh_en-int8-2026-02-26/decoder.int8.onnx --tokens=./sherpa-onnx-fire-red-asr2-zh_en-int8-2026-02-26/tokens.txt --num-threads=1 ./sherpa-onnx-fire-red-asr2-zh_en-int8-2026-02-26/test_wavs/0.wav
OfflineRecognizerConfig(feat_config=FeatureExtractorConfig(sampling_rate=16000, feature_dim=80, low_freq=20, high_freq=-400, dither=0, normalize_samples=True, snip_edges=False), model_config=OfflineModelConfig(transducer=OfflineTransducerModelConfig(encoder_filename="", decoder_filename="", joiner_filename=""), paraformer=OfflineParaformerModelConfig(model=""), nemo_ctc=OfflineNemoEncDecCtcModelConfig(model=""), whisper=OfflineWhisperModelConfig(encoder="", decoder="", language="", task="transcribe", tail_paddings=-1, enable_token_timestamps=False, enable_segment_timestamps=False), fire_red_asr=OfflineFireRedAsrModelConfig(encoder="./sherpa-onnx-fire-red-asr2-zh_en-int8-2026-02-26/encoder.int8.onnx", decoder="./sherpa-onnx-fire-red-asr2-zh_en-int8-2026-02-26/decoder.int8.onnx"), tdnn=OfflineTdnnModelConfig(model=""), zipformer_ctc=OfflineZipformerCtcModelConfig(model=""), wenet_ctc=OfflineWenetCtcModelConfig(model=""), sense_voice=OfflineSenseVoiceModelConfig(model="", language="auto", use_itn=False), moonshine=OfflineMoonshineModelConfig(preprocessor="", encoder="", uncached_decoder="", cached_decoder="", merged_decoder=""), dolphin=OfflineDolphinModelConfig(model=""), canary=OfflineCanaryModelConfig(encoder="", decoder="", src_lang="", tgt_lang="", use_pnc=True), omnilingual=OfflineOmnilingualAsrCtcModelConfig(model=""), funasr_nano=OfflineFunASRNanoModelConfig(encoder_adaptor="", llm="", embedding="", tokenizer="", system_prompt="You are a helpful assistant.", user_prompt="语音转写:", max_new_tokens=512, temperature=1e-06, top_p=0.8, seed=42, language="", itn=True, hotwords=""), medasr=OfflineMedAsrCtcModelConfig(model=""), fire_red_asr_ctc=OfflineFireRedAsrCtcModelConfig(model=""), telespeech_ctc="", tokens="./sherpa-onnx-fire-red-asr2-zh_en-int8-2026-02-26/tokens.txt", num_threads=1, debug=False, provider="cpu", model_type="", modeling_unit="cjkchar", bpe_vocab=""), lm_config=OfflineLMConfig(model="", scale=0.5, lodr_scale=0.01, lodr_fst="", lodr_backoff_id=-1), 
ctc_fst_decoder_config=OfflineCtcFstDecoderConfig(graph="", max_active=3000), decoding_method="greedy_search", max_active_paths=4, hotwords_file="", hotwords_score=1.5, blank_penalty=0, rule_fsts="", rule_fars="", hr=HomophoneReplacerConfig(lexicon="", rule_fsts=""))
Creating recognizer ...
recognizer created in 1.276 s
Started
Done!
./sherpa-onnx-fire-red-asr2-zh_en-int8-2026-02-26/test_wavs/0.wav
{"lang": "", "emotion": "", "event": "", "text": "昨天是 MONDAY TODAY IS礼拜二 THE DAY AFTER TOMORROW是星期三", "timestamps": [], "durations": [], "tokens":["昨", "天", "是", " MO", "ND", "AY", " TO", "D", "AY", " IS", "礼", "拜", "二", " THE", " DAY", " AFTER", " TO", "M", "OR", "ROW", "是", "星", "期", "三"], "ys_log_probs": [], "words": []}
----
num threads: 1
decoding method: greedy_search
Elapsed seconds: 3.343 s
Real time factor (RTF): 3.343 / 10.053 = 0.333
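The tokens field contains subword units; English subwords carry a leading space, so concatenating the tokens reproduces the text field verbatim. Using the tokens from the output above:

```python
# Tokens copied from the JSON result printed above.
tokens = ["昨", "天", "是", " MO", "ND", "AY", " TO", "D", "AY", " IS",
          "礼", "拜", "二", " THE", " DAY", " AFTER", " TO", "M", "OR", "ROW",
          "是", "星", "期", "三"]

# Plain concatenation recovers the final transcript.
text = "".join(tokens)
print(text)  # 昨天是 MONDAY TODAY IS礼拜二 THE DAY AFTER TOMORROW是星期三
```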
Decode long files with a VAD
The following example demonstrates how to use the model to decode a long wave file.
wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/silero_vad.onnx
wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/lei-jun-test.wav
build/bin/sherpa-onnx-vad-with-offline-asr \
--num-threads=3 \
--silero-vad-model=./silero_vad.onnx \
--fire-red-asr-encoder=./sherpa-onnx-fire-red-asr2-zh_en-int8-2026-02-26/encoder.int8.onnx \
--fire-red-asr-decoder=./sherpa-onnx-fire-red-asr2-zh_en-int8-2026-02-26/decoder.int8.onnx \
--tokens=./sherpa-onnx-fire-red-asr2-zh_en-int8-2026-02-26/tokens.txt \
./lei-jun-test.wav
You should see the following output:
/Users/fangjun/open-source/sherpa-onnx/sherpa-onnx/csrc/parse-options.cc:Read:373 build/bin/sherpa-onnx-vad-with-offline-asr --num-threads=3 --silero-vad-model=./silero_vad.onnx --fire-red-asr-encoder=./sherpa-onnx-fire-red-asr2-zh_en-int8-2026-02-26/encoder.int8.onnx --fire-red-asr-decoder=./sherpa-onnx-fire-red-asr2-zh_en-int8-2026-02-26/decoder.int8.onnx --tokens=./sherpa-onnx-fire-red-asr2-zh_en-int8-2026-02-26/tokens.txt ./lei-jun-test.wav
VadModelConfig(silero_vad=SileroVadModelConfig(model="./silero_vad.onnx", threshold=0.5, min_silence_duration=0.5, min_speech_duration=0.25, max_speech_duration=20, window_size=512, neg_threshold=-1), ten_vad=TenVadModelConfig(model="", threshold=0.5, min_silence_duration=0.5, min_speech_duration=0.25, max_speech_duration=20, window_size=256), sample_rate=16000, num_threads=1, provider="cpu", debug=False)
OfflineRecognizerConfig(feat_config=FeatureExtractorConfig(sampling_rate=16000, feature_dim=80, low_freq=20, high_freq=-400, dither=0, normalize_samples=True, snip_edges=False), model_config=OfflineModelConfig(transducer=OfflineTransducerModelConfig(encoder_filename="", decoder_filename="", joiner_filename=""), paraformer=OfflineParaformerModelConfig(model=""), nemo_ctc=OfflineNemoEncDecCtcModelConfig(model=""), whisper=OfflineWhisperModelConfig(encoder="", decoder="", language="", task="transcribe", tail_paddings=-1, enable_token_timestamps=False, enable_segment_timestamps=False), fire_red_asr=OfflineFireRedAsrModelConfig(encoder="./sherpa-onnx-fire-red-asr2-zh_en-int8-2026-02-26/encoder.int8.onnx", decoder="./sherpa-onnx-fire-red-asr2-zh_en-int8-2026-02-26/decoder.int8.onnx"), tdnn=OfflineTdnnModelConfig(model=""), zipformer_ctc=OfflineZipformerCtcModelConfig(model=""), wenet_ctc=OfflineWenetCtcModelConfig(model=""), sense_voice=OfflineSenseVoiceModelConfig(model="", language="auto", use_itn=False), moonshine=OfflineMoonshineModelConfig(preprocessor="", encoder="", uncached_decoder="", cached_decoder="", merged_decoder=""), dolphin=OfflineDolphinModelConfig(model=""), canary=OfflineCanaryModelConfig(encoder="", decoder="", src_lang="", tgt_lang="", use_pnc=True), omnilingual=OfflineOmnilingualAsrCtcModelConfig(model=""), funasr_nano=OfflineFunASRNanoModelConfig(encoder_adaptor="", llm="", embedding="", tokenizer="", system_prompt="You are a helpful assistant.", user_prompt="语音转写:", max_new_tokens=512, temperature=1e-06, top_p=0.8, seed=42, language="", itn=True, hotwords=""), medasr=OfflineMedAsrCtcModelConfig(model=""), fire_red_asr_ctc=OfflineFireRedAsrCtcModelConfig(model=""), telespeech_ctc="", tokens="./sherpa-onnx-fire-red-asr2-zh_en-int8-2026-02-26/tokens.txt", num_threads=3, debug=False, provider="cpu", model_type="", modeling_unit="cjkchar", bpe_vocab=""), lm_config=OfflineLMConfig(model="", scale=0.5, lodr_scale=0.01, lodr_fst="", lodr_backoff_id=-1), 
ctc_fst_decoder_config=OfflineCtcFstDecoderConfig(graph="", max_active=3000), decoding_method="greedy_search", max_active_paths=4, hotwords_file="", hotwords_score=1.5, blank_penalty=0, rule_fsts="", rule_fars="", hr=HomophoneReplacerConfig(lexicon="", rule_fsts=""))
Creating recognizer ...
Recognizer created!
Started
Reading: ./lei-jun-test.wav
Started!
29.918 -- 34.028: 晚上好欢迎大家来参加今天晚上的活动
34.494 -- 35.660: 谢谢大家
42.174 -- 43.916: 这是我第四次
45.054 -- 45.932: 年度演讲
47.006 -- 49.772: 前三次呢因为疫情的原因
50.526 -- 55.212: 都在小米科技园内举办现场的人很少
56.158 -- 57.292: 这是第四次
58.206 -- 62.316: 我们仔细想了想我们还是想办一个比较大的聚会
62.846 -- 66.636: 然后呢让我们的新朋友老朋友一起聚一聚
67.742 -- 68.748: 今天的话呢
69.214 -- 70.668: 我们就在北京的
71.646 -- 74.668: 国家会议中心呢举办了这么一个活动
75.486 -- 79.276: 现场呢来了很多人大概有三千五百人
79.998 -- 81.932: 还有很多很多的朋友呢
82.686 -- 85.612: 通过观看直播的方式来参与
86.366 -- 90.284: 再一次呢对大家的参加表示感谢谢谢大家
98.494 -- 99.564: 两个月前
100.382 -- 104.236: 我参加了今年武汉大学的毕业典礼
105.950 -- 107.084: 今年呢是
107.902 -- 110.380: 武汉大学建校一百三十周年
111.742 -- 112.652: 作为校友
113.374 -- 114.636: 被母校邀请
115.230 -- 117.068: 在毕业典礼上致辞
118.046 -- 119.180: 这对我来说
119.934 -- 122.540: 是至高无上的荣誉
123.678 -- 125.516: 站在讲台的那一刻
126.238 -- 128.332: 面对全校师生
129.214 -- 134.092: 关于武大的所有的记忆一下子涌现在脑海里
134.974 -- 139.212: 今天呢我就先和大家聊聊五大往事
141.982 -- 143.788: 还是三十六年前
145.950 -- 147.436: 一九八七年
148.702 -- 151.372: 我呢考上了武汉大学的计算机系
152.702 -- 156.524: 在武汉大学的图书馆里看了一本书
157.566 -- 158.540: 硅谷之火
159.326 -- 161.612: 建立了我一生的梦想
163.294 -- 164.428: 看完书以后
165.278 -- 166.412: 热血沸腾
167.614 -- 169.292: 激动的睡不着觉
170.430 -- 171.212: 我还记得
171.998 -- 174.540: 那天晚上星光很亮
175.390 -- 177.612: 我就在武大的操场上
178.366 -- 179.692: 就是屏幕上这个
180.798 -- 182.444: 走了一圈又一圈
182.974 -- 185.100: 走了整整一个晚上
186.494 -- 187.692: 我心里有团火
188.926 -- 191.692: 我也想办一个伟大的公司
193.950 -- 194.764: 就是这样
197.662 -- 198.764: 梦想之火
199.294 -- 202.220: 在我心里彻底点燃了
210.814 -- 212.300: 一个大一的新生
220.990 -- 222.508: 一个大一的新生
223.998 -- 226.796: 个从县城里出来的年轻人
228.478 -- 230.476: 什么也不会什么也没有
231.550 -- 236.012: 就想创办一家伟大的公司这不就是天方夜谭吗
237.726 -- 239.596: 这么离谱的一个梦想
240.414 -- 242.156: 该如何实现呢
243.870 -- 244.716: 那天晚上
245.182 -- 246.732: 我想了一整晚上
247.998 -- 248.812: 说实话
250.334 -- 253.676: 越想越糊涂完全理不清头绪
255.006 -- 255.948: 后来我在想
256.830 -- 257.740: 干脆别想了
258.366 -- 259.692: 把书念好
260.446 -- 261.196: 是正事
262.174 -- 262.892: 所以呢
263.390 -- 265.644: 我就下定决心认认真真读书
266.654 -- 267.084: 那么
268.478 -- 271.276: 我怎么能够把书读的不同凡响呢
num threads: 3
decoding method: greedy_search
Elapsed seconds: 49.470 s
Real time factor (RTF): 49.470 / 272.448 = 0.182
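The real time factor (RTF) is the elapsed decoding time divided by the audio duration; values below 1 mean faster than real time. Comparing the two v2 runs above on the same 272.448 s file:

```python
def rtf(elapsed_seconds, audio_seconds):
    """Real time factor: processing time per second of audio."""
    return elapsed_seconds / audio_seconds

audio = 272.448  # duration of lei-jun-test.wav in seconds

ctc_rtf = rtf(15.693, audio)  # CTC model, 3 threads
aed_rtf = rtf(49.470, audio)  # AED model, 3 threads

print(f"CTC RTF: {ctc_rtf:.3f}")  # 0.058
print(f"AED RTF: {aed_rtf:.3f}")  # 0.182
print(f"On this file, the CTC model is about {aed_rtf / ctc_rtf:.1f}x faster")
```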
Real-time/streaming speech recognition from a microphone with VAD
wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/silero_vad.onnx
./build/bin/sherpa-onnx-vad-microphone-simulated-streaming-asr \
--silero-vad-model=./silero_vad.onnx \
--fire-red-asr-encoder=./sherpa-onnx-fire-red-asr2-zh_en-int8-2026-02-26/encoder.int8.onnx \
--fire-red-asr-decoder=./sherpa-onnx-fire-red-asr2-zh_en-int8-2026-02-26/decoder.int8.onnx \
--tokens=./sherpa-onnx-fire-red-asr2-zh_en-int8-2026-02-26/tokens.txt
Speech recognition from a microphone
cd /path/to/sherpa-onnx
./build/bin/sherpa-onnx-microphone-offline \
--fire-red-asr-encoder=./sherpa-onnx-fire-red-asr2-zh_en-int8-2026-02-26/encoder.int8.onnx \
--fire-red-asr-decoder=./sherpa-onnx-fire-red-asr2-zh_en-int8-2026-02-26/decoder.int8.onnx \
--tokens=./sherpa-onnx-fire-red-asr2-zh_en-int8-2026-02-26/tokens.txt
Speech recognition from a microphone with VAD
cd /path/to/sherpa-onnx
wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/silero_vad.onnx
./build/bin/sherpa-onnx-vad-microphone-offline-asr \
--silero-vad-model=./silero_vad.onnx \
--fire-red-asr-encoder=./sherpa-onnx-fire-red-asr2-zh_en-int8-2026-02-26/encoder.int8.onnx \
--fire-red-asr-decoder=./sherpa-onnx-fire-red-asr2-zh_en-int8-2026-02-26/decoder.int8.onnx \
--tokens=./sherpa-onnx-fire-red-asr2-zh_en-int8-2026-02-26/tokens.txt
sherpa-onnx-fire-red-asr-large-zh_en-2025-02-16 (v1, Chinese + English, Mandarin and dialects such as Sichuanese and Henan)
This model is converted from https://huggingface.co/FireRedTeam/FireRedASR-AED-L
It supports the following two languages:
Chinese (Mandarin, as well as dialects such as Sichuanese, Tianjin, and Henan)
English
The sections below show how to download and use it.
Download
Please use the following commands to download it:
cd /path/to/sherpa-onnx
wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/sherpa-onnx-fire-red-asr-large-zh_en-2025-02-16.tar.bz2
tar xvf sherpa-onnx-fire-red-asr-large-zh_en-2025-02-16.tar.bz2
rm sherpa-onnx-fire-red-asr-large-zh_en-2025-02-16.tar.bz2
After downloading, you should find the following files:
ls -lh sherpa-onnx-fire-red-asr-large-zh_en-2025-02-16/
total 1.7G
-rw-r--r-- 1 kuangfangjun root 188 Feb 16 16:22 README.md
-rw-r--r-- 1 kuangfangjun root 425M Feb 16 16:21 decoder.int8.onnx
-rw-r--r-- 1 kuangfangjun root 1.3G Feb 16 16:21 encoder.int8.onnx
drwxr-xr-x 10 kuangfangjun root 0 Feb 16 16:26 test_wavs
-rw-r--r-- 1 kuangfangjun root 70K Feb 16 16:21 tokens.txt
ls -lh sherpa-onnx-fire-red-asr-large-zh_en-2025-02-16/test_wavs/
total 1.9M
-rw-r--r-- 1 kuangfangjun root 315K Feb 16 16:24 0.wav
-rw-r--r-- 1 kuangfangjun root 160K Feb 16 16:24 1.wav
-rw-r--r-- 1 kuangfangjun root 147K Feb 16 16:24 2.wav
-rw-r--r-- 1 kuangfangjun root 245K Feb 16 16:25 3-sichuan.wav
-rw-r--r-- 1 kuangfangjun root 276K Feb 16 16:24 3.wav
-rw-r--r-- 1 kuangfangjun root 245K Feb 16 16:25 4-tianjin.wav
-rw-r--r-- 1 kuangfangjun root 250K Feb 16 16:26 5-henan.wav
-rw-r--r-- 1 kuangfangjun root 276K Feb 16 16:24 8k.wav
Decode a file
Please use the following command to decode a wave file:
./build/bin/sherpa-onnx-offline \
--tokens=./sherpa-onnx-fire-red-asr-large-zh_en-2025-02-16/tokens.txt \
--fire-red-asr-encoder=./sherpa-onnx-fire-red-asr-large-zh_en-2025-02-16/encoder.int8.onnx \
--fire-red-asr-decoder=./sherpa-onnx-fire-red-asr-large-zh_en-2025-02-16/decoder.int8.onnx \
--num-threads=1 \
./sherpa-onnx-fire-red-asr-large-zh_en-2025-02-16/test_wavs/0.wav
You should see the following output:
/star-fj/fangjun/open-source/sherpa-onnx/sherpa-onnx/csrc/parse-options.cc:Read:375 ./build/bin/sherpa-onnx-offline --tokens=./sherpa-onnx-fire-red-asr-large-zh_en-2025-02-16/tokens.txt --fire-red-asr-encoder=./sherpa-onnx-fire-red-asr-large-zh_en-2025-02-16/encoder.int8.onnx --fire-red-asr-decoder=./sherpa-onnx-fire-red-asr-large-zh_en-2025-02-16/decoder.int8.onnx --num-threads=1 ./sherpa-onnx-fire-red-asr-large-zh_en-2025-02-16/test_wavs/0.wav
OfflineRecognizerConfig(feat_config=FeatureExtractorConfig(sampling_rate=16000, feature_dim=80, low_freq=20, high_freq=-400, dither=0), model_config=OfflineModelConfig(transducer=OfflineTransducerModelConfig(encoder_filename="", decoder_filename="", joiner_filename=""), paraformer=OfflineParaformerModelConfig(model=""), nemo_ctc=OfflineNemoEncDecCtcModelConfig(model=""), whisper=OfflineWhisperModelConfig(encoder="", decoder="", language="", task="transcribe", tail_paddings=-1), fire_red_asr=OfflineFireRedAsrModelConfig(encoder="./sherpa-onnx-fire-red-asr-large-zh_en-2025-02-16/encoder.int8.onnx", decoder="./sherpa-onnx-fire-red-asr-large-zh_en-2025-02-16/decoder.int8.onnx"), tdnn=OfflineTdnnModelConfig(model=""), zipformer_ctc=OfflineZipformerCtcModelConfig(model=""), wenet_ctc=OfflineWenetCtcModelConfig(model=""), sense_voice=OfflineSenseVoiceModelConfig(model="", language="auto", use_itn=False), moonshine=OfflineMoonshineModelConfig(preprocessor="", encoder="", uncached_decoder="", cached_decoder=""), telespeech_ctc="", tokens="./sherpa-onnx-fire-red-asr-large-zh_en-2025-02-16/tokens.txt", num_threads=1, debug=False, provider="cpu", model_type="", modeling_unit="cjkchar", bpe_vocab=""), lm_config=OfflineLMConfig(model="", scale=0.5), ctc_fst_decoder_config=OfflineCtcFstDecoderConfig(graph="", max_active=3000), decoding_method="greedy_search", max_active_paths=4, hotwords_file="", hotwords_score=1.5, blank_penalty=0, rule_fsts="", rule_fars="")
Creating recognizer ...
Started
Done!
./sherpa-onnx-fire-red-asr-large-zh_en-2025-02-16/test_wavs/0.wav
{"lang": "", "emotion": "", "event": "", "text": "昨天是 MONDAY TODAY IS礼拜二 THE DAY AFTER TOMORROW是星期三", "timestamps": [], "tokens":["昨", "天", "是", " MO", "ND", "AY", " TO", "D", "AY", " IS", "礼", "拜", "二", " THE", " DAY", " AFTER", " TO", "M", "OR", "ROW", "是", "星", "期", "三"], "words": []}
----
num threads: 1
decoding method: greedy_search
Elapsed seconds: 19.555 s
Real time factor (RTF): 19.555 / 10.053 = 1.945