Pre-trained Models

This page describes how to download pre-trained FireRedAsr models.

Note that we support models from the following two repositories:

v1 contains a single model based on an attention encoder-decoder (AED) architecture; it is somewhat slow on CPU.

v2 adds a CTC model, which is very fast on CPU. The AED model in v2 is also much faster than the v1 AED model.

sherpa-onnx-fire-red-asr2-ctc-zh_en-int8-2026-02-25 (v2, CTC, Chinese + English; Mandarin plus 20+ dialects, including Cantonese (Hong Kong and Guangdong), Sichuanese, Shanghainese, Wu, Min Nan, and the dialects of Anhui, Fujian, Gansu, Guizhou, Hebei, Henan, Hubei, Hunan, Jiangxi, Liaoning, Ningxia, Shaanxi, Shanxi, Shandong, Tianjin, and Yunnan)

This model is converted from https://www.modelscope.cn/models/FireRedTeam/FireRedASR2-AED. Note that only the CTC branch is converted. The attention decoder branch is excluded.

It supports both Chinese and English, as well as more than 20 dialects and accents.

Note

Supported Chinese dialects/accents: Cantonese (Hong Kong and Guangdong), Sichuanese, Shanghainese, Wu, Min Nan, and the dialects of Anhui, Fujian, Gansu, Guizhou, Hebei, Henan, Hubei, Hunan, Jiangxi, Liaoning, Ningxia, Shaanxi, Shanxi, Shandong, Tianjin, Yunnan, and more.

The sections below show how to use it.

Real-time/streaming speech recognition on Android

Please visit https://k2-fsa.github.io/sherpa/onnx/android/apk.html and select the file

sherpa-onnx-<version>-arm64-v8a-simulated_streaming_asr-zh_en-fire_red_asr2_ctc_int8_2026_02_25.apk

The source code for the APK is available in the sherpa-onnx repository. See Build sherpa-onnx for Android for how to build our Android demo.

Download

Please use the following commands to download it:

cd /path/to/sherpa-onnx

wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/sherpa-onnx-fire-red-asr2-ctc-zh_en-int8-2026-02-25.tar.bz2
tar xvf sherpa-onnx-fire-red-asr2-ctc-zh_en-int8-2026-02-25.tar.bz2
rm sherpa-onnx-fire-red-asr2-ctc-zh_en-int8-2026-02-25.tar.bz2

After downloading, you should find the following files:

ls -lh sherpa-onnx-fire-red-asr2-ctc-zh_en-int8-2026-02-25
total 1515528
-rw-r--r--@  1 fangjun  staff   740M 26 Feb 13:42 model.int8.onnx
-rw-r--r--@  1 fangjun  staff   190B 26 Feb 13:35 README.md
drwxr-xr-x@ 10 fangjun  staff   320B 26 Feb 13:42 test_wavs
-rw-r--r--@  1 fangjun  staff    77K 26 Feb 13:42 tokens.txt

ls -lh sherpa-onnx-fire-red-asr2-ctc-zh_en-int8-2026-02-25/test_wavs/
total 3848
-rw-r--r--@ 1 fangjun  staff   314K 26 Feb 13:42 0.wav
-rw-r--r--@ 1 fangjun  staff   159K 26 Feb 13:42 1.wav
-rw-r--r--@ 1 fangjun  staff   147K 26 Feb 13:42 2.wav
-rw-r--r--@ 1 fangjun  staff   245K 26 Feb 13:42 3-sichuan.wav
-rw-r--r--@ 1 fangjun  staff   276K 26 Feb 13:42 3.wav
-rw-r--r--@ 1 fangjun  staff   244K 26 Feb 13:42 4-tianjin.wav
-rw-r--r--@ 1 fangjun  staff   250K 26 Feb 13:42 5-henan.wav
-rw-r--r--@ 1 fangjun  staff   276K 26 Feb 13:42 8k.wav
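Note that 8k.wav in the listing above is an 8 kHz recording, while the model expects 16 kHz input; sherpa-onnx resamples input audio internally, so you normally do not need to convert it yourself. The sketch below (using only Python's standard library, with a small synthetic file standing in for 8k.wav) shows how to inspect a wave file's sample rate and duration before decoding:

```python
import math
import os
import struct
import tempfile
import wave

# Create a synthetic 0.5 s, 8 kHz, 16-bit mono wave file standing in for 8k.wav
path = os.path.join(tempfile.mkdtemp(), "8k.wav")
sample_rate = 8000
num_samples = sample_rate // 2  # 0.5 seconds of audio
with wave.open(path, "wb") as f:
    f.setnchannels(1)
    f.setsampwidth(2)  # 16-bit samples
    f.setframerate(sample_rate)
    samples = (
        int(10000 * math.sin(2 * math.pi * 440 * n / sample_rate))
        for n in range(num_samples)
    )
    f.writeframes(b"".join(struct.pack("<h", s) for s in samples))

# Inspect the header: sample rate and duration in seconds
with wave.open(path, "rb") as f:
    rate = f.getframerate()
    duration = f.getnframes() / rate

print(rate, duration)  # 8000 0.5
```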

Decode a file

Please use the following command to decode a wave file:

./build/bin/sherpa-onnx-offline \
  --num-threads=1 \
  --fire-red-asr-ctc=./sherpa-onnx-fire-red-asr2-ctc-zh_en-int8-2026-02-25/model.int8.onnx \
  --tokens=./sherpa-onnx-fire-red-asr2-ctc-zh_en-int8-2026-02-25/tokens.txt \
  ./sherpa-onnx-fire-red-asr2-ctc-zh_en-int8-2026-02-25/test_wavs/1.wav

You should see the following output:

/Users/fangjun/open-source/sherpa-onnx/sherpa-onnx/csrc/parse-options.cc:Read:373 ./build/bin/sherpa-onnx-offline --num-threads=1 --fire-red-asr-ctc=./sherpa-onnx-fire-red-asr2-ctc-zh_en-int8-2026-02-25/model.int8.onnx --tokens=./sherpa-onnx-fire-red-asr2-ctc-zh_en-int8-2026-02-25/tokens.txt ./sherpa-onnx-fire-red-asr2-ctc-zh_en-int8-2026-02-25/test_wavs/1.wav 

OfflineRecognizerConfig(feat_config=FeatureExtractorConfig(sampling_rate=16000, feature_dim=80, low_freq=20, high_freq=-400, dither=0, normalize_samples=True, snip_edges=False), model_config=OfflineModelConfig(transducer=OfflineTransducerModelConfig(encoder_filename="", decoder_filename="", joiner_filename=""), paraformer=OfflineParaformerModelConfig(model=""), nemo_ctc=OfflineNemoEncDecCtcModelConfig(model=""), whisper=OfflineWhisperModelConfig(encoder="", decoder="", language="", task="transcribe", tail_paddings=-1, enable_token_timestamps=False, enable_segment_timestamps=False), fire_red_asr=OfflineFireRedAsrModelConfig(encoder="", decoder=""), tdnn=OfflineTdnnModelConfig(model=""), zipformer_ctc=OfflineZipformerCtcModelConfig(model=""), wenet_ctc=OfflineWenetCtcModelConfig(model=""), sense_voice=OfflineSenseVoiceModelConfig(model="", language="auto", use_itn=False), moonshine=OfflineMoonshineModelConfig(preprocessor="", encoder="", uncached_decoder="", cached_decoder="", merged_decoder=""), dolphin=OfflineDolphinModelConfig(model=""), canary=OfflineCanaryModelConfig(encoder="", decoder="", src_lang="", tgt_lang="", use_pnc=True), omnilingual=OfflineOmnilingualAsrCtcModelConfig(model=""), funasr_nano=OfflineFunASRNanoModelConfig(encoder_adaptor="", llm="", embedding="", tokenizer="", system_prompt="You are a helpful assistant.", user_prompt="语音转写:", max_new_tokens=512, temperature=1e-06, top_p=0.8, seed=42, language="", itn=True, hotwords=""), medasr=OfflineMedAsrCtcModelConfig(model=""), fire_red_asr_ctc=OfflineFireRedAsrCtcModelConfig(model="./sherpa-onnx-fire-red-asr2-ctc-zh_en-int8-2026-02-25/model.int8.onnx"), telespeech_ctc="", tokens="./sherpa-onnx-fire-red-asr2-ctc-zh_en-int8-2026-02-25/tokens.txt", num_threads=1, debug=False, provider="cpu", model_type="", modeling_unit="cjkchar", bpe_vocab=""), lm_config=OfflineLMConfig(model="", scale=0.5, lodr_scale=0.01, lodr_fst="", lodr_backoff_id=-1), ctc_fst_decoder_config=OfflineCtcFstDecoderConfig(graph="", 
max_active=3000), decoding_method="greedy_search", max_active_paths=4, hotwords_file="", hotwords_score=1.5, blank_penalty=0, rule_fsts="", rule_fars="", hr=HomophoneReplacerConfig(lexicon="", rule_fsts=""))
Creating recognizer ...
recognizer created in 0.496 s
Started
Done!

./sherpa-onnx-fire-red-asr2-ctc-zh_en-int8-2026-02-25/test_wavs/1.wav
{"lang": "", "emotion": "", "event": "", "text": "这是第一种第二种叫呃与 ALWAYSISE ALWAYS什么意思啊", "timestamps": [0.76, 0.92, 1.04, 1.16, 1.28, 1.48, 1.60, 1.76, 2.04, 2.72, 3.16, 3.56, 3.80, 4.12, 4.52, 4.60, 4.72, 4.84, 4.92], "durations": [], "tokens":["这", "是", "第", "一", "种", "第", "二", "种", "叫", "呃", "与", " ALWAYS", "ISE", " ALWAYS", "什", "么", "意", "思", "啊"], "ys_log_probs": [], "words": []}
----
num threads: 1
decoding method: greedy_search
Elapsed seconds: 0.883 s
Real time factor (RTF): 0.883 / 5.100 = 0.173
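The recognizer prints the result as a single JSON object, where each entry in "tokens" is aligned with the corresponding start time in "timestamps". A minimal sketch of consuming that output, using a shortened copy of the JSON printed above:

```python
import json

# A shortened copy of the JSON result printed above
result = json.loads(
    '{"text": "这是第一种", '
    '"timestamps": [0.76, 0.92, 1.04, 1.16, 1.28], '
    '"tokens": ["这", "是", "第", "一", "种"]}'
)

# Pair each token with its start time in seconds
aligned = list(zip(result["tokens"], result["timestamps"]))
print(aligned[0])       # ('这', 0.76)
print(result["text"])   # 这是第一种
```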

Decode long files with a VAD

The following example demonstrates how to use the model to decode a long wave file.

wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/silero_vad.onnx

wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/lei-jun-test.wav

build/bin/sherpa-onnx-vad-with-offline-asr \
  --num-threads=3 \
  --silero-vad-model=./silero_vad.onnx \
  --fire-red-asr-ctc=./sherpa-onnx-fire-red-asr2-ctc-zh_en-int8-2026-02-25/model.int8.onnx \
  --tokens=./sherpa-onnx-fire-red-asr2-ctc-zh_en-int8-2026-02-25/tokens.txt \
  ./lei-jun-test.wav

You should see the following output:

Wave filename: lei-jun-test.wav
/Users/fangjun/open-source/sherpa-onnx/sherpa-onnx/csrc/parse-options.cc:Read:373 build/bin/sherpa-onnx-vad-with-offline-asr --num-threads=3 --silero-vad-model=./silero_vad.onnx --fire-red-asr-ctc=./sherpa-onnx-fire-red-asr2-ctc-zh_en-int8-2026-02-25/model.int8.onnx --tokens=./sherpa-onnx-fire-red-asr2-ctc-zh_en-int8-2026-02-25/tokens.txt ./lei-jun-test.wav 

VadModelConfig(silero_vad=SileroVadModelConfig(model="./silero_vad.onnx", threshold=0.5, min_silence_duration=0.5, min_speech_duration=0.25, max_speech_duration=20, window_size=512, neg_threshold=-1), ten_vad=TenVadModelConfig(model="", threshold=0.5, min_silence_duration=0.5, min_speech_duration=0.25, max_speech_duration=20, window_size=256), sample_rate=16000, num_threads=1, provider="cpu", debug=False)
OfflineRecognizerConfig(feat_config=FeatureExtractorConfig(sampling_rate=16000, feature_dim=80, low_freq=20, high_freq=-400, dither=0, normalize_samples=True, snip_edges=False), model_config=OfflineModelConfig(transducer=OfflineTransducerModelConfig(encoder_filename="", decoder_filename="", joiner_filename=""), paraformer=OfflineParaformerModelConfig(model=""), nemo_ctc=OfflineNemoEncDecCtcModelConfig(model=""), whisper=OfflineWhisperModelConfig(encoder="", decoder="", language="", task="transcribe", tail_paddings=-1, enable_token_timestamps=False, enable_segment_timestamps=False), fire_red_asr=OfflineFireRedAsrModelConfig(encoder="", decoder=""), tdnn=OfflineTdnnModelConfig(model=""), zipformer_ctc=OfflineZipformerCtcModelConfig(model=""), wenet_ctc=OfflineWenetCtcModelConfig(model=""), sense_voice=OfflineSenseVoiceModelConfig(model="", language="auto", use_itn=False), moonshine=OfflineMoonshineModelConfig(preprocessor="", encoder="", uncached_decoder="", cached_decoder="", merged_decoder=""), dolphin=OfflineDolphinModelConfig(model=""), canary=OfflineCanaryModelConfig(encoder="", decoder="", src_lang="", tgt_lang="", use_pnc=True), omnilingual=OfflineOmnilingualAsrCtcModelConfig(model=""), funasr_nano=OfflineFunASRNanoModelConfig(encoder_adaptor="", llm="", embedding="", tokenizer="", system_prompt="You are a helpful assistant.", user_prompt="语音转写:", max_new_tokens=512, temperature=1e-06, top_p=0.8, seed=42, language="", itn=True, hotwords=""), medasr=OfflineMedAsrCtcModelConfig(model=""), fire_red_asr_ctc=OfflineFireRedAsrCtcModelConfig(model="./sherpa-onnx-fire-red-asr2-ctc-zh_en-int8-2026-02-25/model.int8.onnx"), telespeech_ctc="", tokens="./sherpa-onnx-fire-red-asr2-ctc-zh_en-int8-2026-02-25/tokens.txt", num_threads=3, debug=False, provider="cpu", model_type="", modeling_unit="cjkchar", bpe_vocab=""), lm_config=OfflineLMConfig(model="", scale=0.5, lodr_scale=0.01, lodr_fst="", lodr_backoff_id=-1), ctc_fst_decoder_config=OfflineCtcFstDecoderConfig(graph="", 
max_active=3000), decoding_method="greedy_search", max_active_paths=4, hotwords_file="", hotwords_score=1.5, blank_penalty=0, rule_fsts="", rule_fars="", hr=HomophoneReplacerConfig(lexicon="", rule_fsts=""))
Creating recognizer ...
Recognizer created!
Started
Reading: ./lei-jun-test.wav
Started!
29.918 -- 34.028: 上好欢迎大家来参加今天晚上的活动
34.494 -- 35.660: 谢谢大家
42.174 -- 43.916: 这是我第四次
45.054 -- 45.932: 年度演讲
47.006 -- 49.772: 前三次呢因为疫情的原因
50.526 -- 55.212: 都在小米科技园内举办现场呢的人很少
56.158 -- 57.292: 这是第四次
58.206 -- 62.316: 我们仔细想了想我们还是想办一个比较大的聚会
62.846 -- 66.636: 然后呢让我们的新朋友老朋友一起聚一聚
67.742 -- 68.748: 今天的话呢
69.214 -- 70.668: 我们就在北京的
71.646 -- 74.668: 国家会议中心呢举办了这么一个活动
75.486 -- 79.276: 现场呢来了很多人大概有三千五百人
79.998 -- 81.932: 还有很多很多的朋友呢
82.686 -- 85.612: 通过观看直播的方式来参与
86.366 -- 90.284: 再一次呢对大家的参加表示感谢谢谢大家
98.494 -- 99.564: 两个月前
100.382 -- 104.236: 我参加了今年武汉大学的毕业典礼
105.950 -- 107.084: 今年呢是
107.902 -- 110.380: 武汉大学建校一百三十周年
111.742 -- 112.652: 作为校友
113.374 -- 114.636: 被母校邀请
115.230 -- 117.068: 在毕业典礼上致辞
118.046 -- 119.180: 这对我来说
119.934 -- 122.540: 是志高无上的荣誉
123.678 -- 125.516: 站在讲台的那一刻
126.238 -- 128.332: 面对全校师生
129.214 -- 134.092: 关于武大的所有的记忆一下子涌现在脑海里
134.974 -- 139.212: 今天呢我就先和大家聊聊五大往事
141.982 -- 143.788: 还是三十六年前
145.950 -- 147.436: 一九八七年
148.702 -- 151.372: 我呢考上了武汉大学的计算机系
152.702 -- 156.524: 在武汉大学的图书馆里看了一本书
157.566 -- 158.540: 硅谷之火
159.326 -- 161.612: 建立了我一生的梦想
163.294 -- 164.428: 看完书以后
165.278 -- 166.412: 热血沸腾
167.614 -- 169.292: 激动的睡不着觉
170.430 -- 171.212: 我还记得
171.998 -- 174.540: 那天晚上星光很亮
175.390 -- 177.612: 我就在五大的操场上
178.366 -- 179.692: 就是屏幕上这个操场
180.798 -- 182.444: 走了一圈又一圈
182.974 -- 185.100: 走了整整一个晚上
186.494 -- 187.692: 我心里有团火
188.926 -- 191.692: 我也想办一个伟大的公司
193.950 -- 194.764: 就是这样
197.662 -- 198.764: 梦想吃火
199.294 -- 202.220: 在我心里彻底点燃了
210.814 -- 212.300: 一个大一的新
220.990 -- 222.508: 一个大一的新生
223.998 -- 226.796: 个从县城里出来的年轻人
228.478 -- 230.476: 么也不会什么也没有
231.550 -- 236.012: 就想创办一家伟大的公司这不就是天荒夜谭吗
237.726 -- 239.596: 这么离谱的一个梦想
240.414 -- 242.156: 该如何实现呢
243.870 -- 244.716: 那天晚上
245.182 -- 246.732: 我想了一整晚上
247.998 -- 248.812: 说实话
250.334 -- 253.676: 越想越糊涂完全理不清头绪
255.006 -- 255.948: 后来我在想
256.830 -- 257.740: 干脆别想了
258.366 -- 259.692: 把书好
260.446 -- 261.196: 是正是
262.174 -- 262.892: 所以呢
263.390 -- 265.644: 我就下定决心认认正真读书
266.654 -- 267.084: 那么
268.478 -- 271.276: 我怎么能够把书读的不同反响呢
num threads: 3
decoding method: greedy_search
Elapsed seconds: 15.693 s
Real time factor (RTF): 15.693 / 272.448 = 0.058
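Each segment line above has the form `start -- end: text`, with times in seconds. If you want to post-process the output, the lines are easy to parse; a small sketch:

```python
# Parse "start -- end: text" lines as printed by sherpa-onnx-vad-with-offline-asr
def parse_segment(line: str):
    span, text = line.split(": ", 1)
    start, end = (float(t) for t in span.split(" -- "))
    return start, end, text

# First segment from the output above
seg = parse_segment("29.918 -- 34.028: 上好欢迎大家来参加今天晚上的活动")
print(seg)
duration = seg[1] - seg[0]  # length of the speech segment in seconds
```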

Real-time/streaming speech recognition from a microphone with VAD

wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/silero_vad.onnx

./build/bin/sherpa-onnx-vad-microphone-simulated-streaming-asr \
  --silero-vad-model=./silero_vad.onnx \
  --fire-red-asr-ctc=./sherpa-onnx-fire-red-asr2-ctc-zh_en-int8-2026-02-25/model.int8.onnx \
  --tokens=./sherpa-onnx-fire-red-asr2-ctc-zh_en-int8-2026-02-25/tokens.txt

Speech recognition from a microphone

cd /path/to/sherpa-onnx

./build/bin/sherpa-onnx-microphone-offline \
  --fire-red-asr-ctc=./sherpa-onnx-fire-red-asr2-ctc-zh_en-int8-2026-02-25/model.int8.onnx \
  --tokens=./sherpa-onnx-fire-red-asr2-ctc-zh_en-int8-2026-02-25/tokens.txt

Speech recognition from a microphone with VAD

cd /path/to/sherpa-onnx

wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/silero_vad.onnx

./build/bin/sherpa-onnx-vad-microphone-offline-asr \
  --silero-vad-model=./silero_vad.onnx \
  --fire-red-asr-ctc=./sherpa-onnx-fire-red-asr2-ctc-zh_en-int8-2026-02-25/model.int8.onnx \
  --tokens=./sherpa-onnx-fire-red-asr2-ctc-zh_en-int8-2026-02-25/tokens.txt

sherpa-onnx-fire-red-asr2-zh_en-int8-2026-02-26 (v2, AED, Chinese + English; Mandarin plus 20+ dialects, including Cantonese (Hong Kong and Guangdong), Sichuanese, Shanghainese, Wu, Min Nan, and the dialects of Anhui, Fujian, Gansu, Guizhou, Hebei, Henan, Hubei, Hunan, Jiangxi, Liaoning, Ningxia, Shaanxi, Shanxi, Shandong, Tianjin, and Yunnan)

This model is converted from https://www.modelscope.cn/models/FireRedTeam/FireRedASR2-AED.

It supports both Chinese and English, as well as more than 20 dialects and accents.

Note

Supported Chinese dialects/accents: Cantonese (Hong Kong and Guangdong), Sichuanese, Shanghainese, Wu, Min Nan, and the dialects of Anhui, Fujian, Gansu, Guizhou, Hebei, Henan, Hubei, Hunan, Jiangxi, Liaoning, Ningxia, Shaanxi, Shanxi, Shandong, Tianjin, Yunnan, and more.

The sections below show how to use it.

Download

Please use the following commands to download it:

cd /path/to/sherpa-onnx

wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/sherpa-onnx-fire-red-asr2-zh_en-int8-2026-02-26.tar.bz2
tar xvf sherpa-onnx-fire-red-asr2-zh_en-int8-2026-02-26.tar.bz2
rm sherpa-onnx-fire-red-asr2-zh_en-int8-2026-02-26.tar.bz2

After downloading, you should find the following files:

ls -lh sherpa-onnx-fire-red-asr2-zh_en-int8-2026-02-26
total 2426960
-rw-r--r--@  1 fangjun  staff   398M 26 Feb 13:57 decoder.int8.onnx
-rw-r--r--@  1 fangjun  staff   779M 26 Feb 13:57 encoder.int8.onnx
-rw-r--r--@  1 fangjun  staff   107B 26 Feb 13:57 README.md
drwxr-xr-x@ 10 fangjun  staff   320B 26 Feb 13:57 test_wavs
-rw-r--r--@  1 fangjun  staff    77K 26 Feb 13:57 tokens.txt
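tokens.txt maps the model's output units to integer IDs, one `symbol id` pair per line. If you need to inspect it, the format is easy to read; the sketch below uses a tiny synthetic snippet (the symbols shown are illustrative, not the real file contents):

```python
# A tiny synthetic snippet standing in for the real 77 KB tokens.txt
snippet = "<blk> 0\n<sos/eos> 1\n昨 2\n天 3\n"

id2token = {}
for line in snippet.splitlines():
    # rsplit from the right: some symbols may themselves contain spaces (e.g. " MO")
    sym, idx = line.rsplit(" ", 1)
    id2token[int(idx)] = sym

print(id2token[2], id2token[3])  # 昨 天
```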

Decode a file

Please use the following command to decode a wave file:

./build/bin/sherpa-onnx-offline \
  --fire-red-asr-encoder=./sherpa-onnx-fire-red-asr2-zh_en-int8-2026-02-26/encoder.int8.onnx \
  --fire-red-asr-decoder=./sherpa-onnx-fire-red-asr2-zh_en-int8-2026-02-26/decoder.int8.onnx \
  --tokens=./sherpa-onnx-fire-red-asr2-zh_en-int8-2026-02-26/tokens.txt \
  --num-threads=1 \
  ./sherpa-onnx-fire-red-asr2-zh_en-int8-2026-02-26/test_wavs/0.wav

You should see the following output:

/Users/fangjun/open-source/sherpa-onnx/sherpa-onnx/csrc/parse-options.cc:Read:373 ./build/bin/sherpa-onnx-offline --fire-red-asr-encoder=./sherpa-onnx-fire-red-asr2-zh_en-int8-2026-02-26/encoder.int8.onnx --fire-red-asr-decoder=./sherpa-onnx-fire-red-asr2-zh_en-int8-2026-02-26/decoder.int8.onnx --tokens=./sherpa-onnx-fire-red-asr2-zh_en-int8-2026-02-26/tokens.txt --num-threads=1 ./sherpa-onnx-fire-red-asr2-zh_en-int8-2026-02-26/test_wavs/0.wav 

OfflineRecognizerConfig(feat_config=FeatureExtractorConfig(sampling_rate=16000, feature_dim=80, low_freq=20, high_freq=-400, dither=0, normalize_samples=True, snip_edges=False), model_config=OfflineModelConfig(transducer=OfflineTransducerModelConfig(encoder_filename="", decoder_filename="", joiner_filename=""), paraformer=OfflineParaformerModelConfig(model=""), nemo_ctc=OfflineNemoEncDecCtcModelConfig(model=""), whisper=OfflineWhisperModelConfig(encoder="", decoder="", language="", task="transcribe", tail_paddings=-1, enable_token_timestamps=False, enable_segment_timestamps=False), fire_red_asr=OfflineFireRedAsrModelConfig(encoder="./sherpa-onnx-fire-red-asr2-zh_en-int8-2026-02-26/encoder.int8.onnx", decoder="./sherpa-onnx-fire-red-asr2-zh_en-int8-2026-02-26/decoder.int8.onnx"), tdnn=OfflineTdnnModelConfig(model=""), zipformer_ctc=OfflineZipformerCtcModelConfig(model=""), wenet_ctc=OfflineWenetCtcModelConfig(model=""), sense_voice=OfflineSenseVoiceModelConfig(model="", language="auto", use_itn=False), moonshine=OfflineMoonshineModelConfig(preprocessor="", encoder="", uncached_decoder="", cached_decoder="", merged_decoder=""), dolphin=OfflineDolphinModelConfig(model=""), canary=OfflineCanaryModelConfig(encoder="", decoder="", src_lang="", tgt_lang="", use_pnc=True), omnilingual=OfflineOmnilingualAsrCtcModelConfig(model=""), funasr_nano=OfflineFunASRNanoModelConfig(encoder_adaptor="", llm="", embedding="", tokenizer="", system_prompt="You are a helpful assistant.", user_prompt="语音转写:", max_new_tokens=512, temperature=1e-06, top_p=0.8, seed=42, language="", itn=True, hotwords=""), medasr=OfflineMedAsrCtcModelConfig(model=""), fire_red_asr_ctc=OfflineFireRedAsrCtcModelConfig(model=""), telespeech_ctc="", tokens="./sherpa-onnx-fire-red-asr2-zh_en-int8-2026-02-26/tokens.txt", num_threads=1, debug=False, provider="cpu", model_type="", modeling_unit="cjkchar", bpe_vocab=""), lm_config=OfflineLMConfig(model="", scale=0.5, lodr_scale=0.01, lodr_fst="", lodr_backoff_id=-1), 
ctc_fst_decoder_config=OfflineCtcFstDecoderConfig(graph="", max_active=3000), decoding_method="greedy_search", max_active_paths=4, hotwords_file="", hotwords_score=1.5, blank_penalty=0, rule_fsts="", rule_fars="", hr=HomophoneReplacerConfig(lexicon="", rule_fsts=""))
Creating recognizer ...
recognizer created in 1.276 s
Started
Done!

./sherpa-onnx-fire-red-asr2-zh_en-int8-2026-02-26/test_wavs/0.wav
{"lang": "", "emotion": "", "event": "", "text": "昨天是 MONDAY TODAY IS礼拜二 THE DAY AFTER TOMORROW是星期三", "timestamps": [], "durations": [], "tokens":["昨", "天", "是", " MO", "ND", "AY", " TO", "D", "AY", " IS", "礼", "拜", "二", " THE", " DAY", " AFTER", " TO", "M", "OR", "ROW", "是", "星", "期", "三"], "ys_log_probs": [], "words": []}
----
num threads: 1
decoding method: greedy_search
Elapsed seconds: 3.343 s
Real time factor (RTF): 3.343 / 10.053 = 0.333
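The real time factor (RTF) is the decoding time divided by the audio duration; an RTF below 1 means decoding is faster than real time. Using the numbers from the run above:

```python
# RTF = processing time / audio duration (values from the log above)
elapsed = 3.343   # seconds spent decoding
audio = 10.053    # duration of 0.wav in seconds
rtf = elapsed / audio
print(round(rtf, 3))  # 0.333 -- faster than real time
```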

Decode long files with a VAD

The following example demonstrates how to use the model to decode a long wave file.

wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/silero_vad.onnx

wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/lei-jun-test.wav

build/bin/sherpa-onnx-vad-with-offline-asr \
  --num-threads=3 \
  --silero-vad-model=./silero_vad.onnx \
  --fire-red-asr-encoder=./sherpa-onnx-fire-red-asr2-zh_en-int8-2026-02-26/encoder.int8.onnx \
  --fire-red-asr-decoder=./sherpa-onnx-fire-red-asr2-zh_en-int8-2026-02-26/decoder.int8.onnx \
  --tokens=./sherpa-onnx-fire-red-asr2-zh_en-int8-2026-02-26/tokens.txt \
  ./lei-jun-test.wav

You should see the following output:

Wave filename: lei-jun-test.wav
/Users/fangjun/open-source/sherpa-onnx/sherpa-onnx/csrc/parse-options.cc:Read:373 build/bin/sherpa-onnx-vad-with-offline-asr --num-threads=3 --silero-vad-model=./silero_vad.onnx --fire-red-asr-encoder=./sherpa-onnx-fire-red-asr2-zh_en-int8-2026-02-26/encoder.int8.onnx --fire-red-asr-decoder=./sherpa-onnx-fire-red-asr2-zh_en-int8-2026-02-26/decoder.int8.onnx --tokens=./sherpa-onnx-fire-red-asr2-zh_en-int8-2026-02-26/tokens.txt ./lei-jun-test.wav 

VadModelConfig(silero_vad=SileroVadModelConfig(model="./silero_vad.onnx", threshold=0.5, min_silence_duration=0.5, min_speech_duration=0.25, max_speech_duration=20, window_size=512, neg_threshold=-1), ten_vad=TenVadModelConfig(model="", threshold=0.5, min_silence_duration=0.5, min_speech_duration=0.25, max_speech_duration=20, window_size=256), sample_rate=16000, num_threads=1, provider="cpu", debug=False)
OfflineRecognizerConfig(feat_config=FeatureExtractorConfig(sampling_rate=16000, feature_dim=80, low_freq=20, high_freq=-400, dither=0, normalize_samples=True, snip_edges=False), model_config=OfflineModelConfig(transducer=OfflineTransducerModelConfig(encoder_filename="", decoder_filename="", joiner_filename=""), paraformer=OfflineParaformerModelConfig(model=""), nemo_ctc=OfflineNemoEncDecCtcModelConfig(model=""), whisper=OfflineWhisperModelConfig(encoder="", decoder="", language="", task="transcribe", tail_paddings=-1, enable_token_timestamps=False, enable_segment_timestamps=False), fire_red_asr=OfflineFireRedAsrModelConfig(encoder="./sherpa-onnx-fire-red-asr2-zh_en-int8-2026-02-26/encoder.int8.onnx", decoder="./sherpa-onnx-fire-red-asr2-zh_en-int8-2026-02-26/decoder.int8.onnx"), tdnn=OfflineTdnnModelConfig(model=""), zipformer_ctc=OfflineZipformerCtcModelConfig(model=""), wenet_ctc=OfflineWenetCtcModelConfig(model=""), sense_voice=OfflineSenseVoiceModelConfig(model="", language="auto", use_itn=False), moonshine=OfflineMoonshineModelConfig(preprocessor="", encoder="", uncached_decoder="", cached_decoder="", merged_decoder=""), dolphin=OfflineDolphinModelConfig(model=""), canary=OfflineCanaryModelConfig(encoder="", decoder="", src_lang="", tgt_lang="", use_pnc=True), omnilingual=OfflineOmnilingualAsrCtcModelConfig(model=""), funasr_nano=OfflineFunASRNanoModelConfig(encoder_adaptor="", llm="", embedding="", tokenizer="", system_prompt="You are a helpful assistant.", user_prompt="语音转写:", max_new_tokens=512, temperature=1e-06, top_p=0.8, seed=42, language="", itn=True, hotwords=""), medasr=OfflineMedAsrCtcModelConfig(model=""), fire_red_asr_ctc=OfflineFireRedAsrCtcModelConfig(model=""), telespeech_ctc="", tokens="./sherpa-onnx-fire-red-asr2-zh_en-int8-2026-02-26/tokens.txt", num_threads=3, debug=False, provider="cpu", model_type="", modeling_unit="cjkchar", bpe_vocab=""), lm_config=OfflineLMConfig(model="", scale=0.5, lodr_scale=0.01, lodr_fst="", lodr_backoff_id=-1), 
ctc_fst_decoder_config=OfflineCtcFstDecoderConfig(graph="", max_active=3000), decoding_method="greedy_search", max_active_paths=4, hotwords_file="", hotwords_score=1.5, blank_penalty=0, rule_fsts="", rule_fars="", hr=HomophoneReplacerConfig(lexicon="", rule_fsts=""))
Creating recognizer ...
Recognizer created!
Started
Reading: ./lei-jun-test.wav
Started!
29.918 -- 34.028: 晚上好欢迎大家来参加今天晚上的活动
34.494 -- 35.660: 谢谢大家
42.174 -- 43.916: 这是我第四次
45.054 -- 45.932: 年度演讲
47.006 -- 49.772: 前三次呢因为疫情的原因
50.526 -- 55.212: 都在小米科技园内举办现场的人很少
56.158 -- 57.292: 这是第四次
58.206 -- 62.316: 我们仔细想了想我们还是想办一个比较大的聚会
62.846 -- 66.636: 然后呢让我们的新朋友老朋友一起聚一聚
67.742 -- 68.748: 今天的话呢
69.214 -- 70.668: 我们就在北京的
71.646 -- 74.668: 国家会议中心呢举办了这么一个活动
75.486 -- 79.276: 现场呢来了很多人大概有三千五百人
79.998 -- 81.932: 还有很多很多的朋友呢
82.686 -- 85.612: 通过观看直播的方式来参与
86.366 -- 90.284: 再一次呢对大家的参加表示感谢谢谢大家
98.494 -- 99.564: 两个月前
100.382 -- 104.236: 我参加了今年武汉大学的毕业典礼
105.950 -- 107.084: 今年呢是
107.902 -- 110.380: 武汉大学建校一百三十周年
111.742 -- 112.652: 作为校友
113.374 -- 114.636: 被母校邀请
115.230 -- 117.068: 在毕业典礼上致辞
118.046 -- 119.180: 这对我来说
119.934 -- 122.540: 是至高无上的荣誉
123.678 -- 125.516: 站在讲台的那一刻
126.238 -- 128.332: 面对全校师生
129.214 -- 134.092: 关于武大的所有的记忆一下子涌现在脑海里
134.974 -- 139.212: 今天呢我就先和大家聊聊五大往事
141.982 -- 143.788: 还是三十六年前
145.950 -- 147.436: 一九八七年
148.702 -- 151.372: 我呢考上了武汉大学的计算机系
152.702 -- 156.524: 在武汉大学的图书馆里看了一本书
157.566 -- 158.540: 硅谷之火
159.326 -- 161.612: 建立了我一生的梦想
163.294 -- 164.428: 看完书以后
165.278 -- 166.412: 热血沸腾
167.614 -- 169.292: 激动的睡不着觉
170.430 -- 171.212: 我还记得
171.998 -- 174.540: 那天晚上星光很亮
175.390 -- 177.612: 我就在武大的操场上
178.366 -- 179.692: 就是屏幕上这个
180.798 -- 182.444: 走了一圈又一圈
182.974 -- 185.100: 走了整整一个晚上
186.494 -- 187.692: 我心里有团火
188.926 -- 191.692: 我也想办一个伟大的公司
193.950 -- 194.764: 就是这样
197.662 -- 198.764: 梦想之火
199.294 -- 202.220: 在我心里彻底点燃了
210.814 -- 212.300: 一个大一的新生
220.990 -- 222.508: 一个大一的新生
223.998 -- 226.796: 个从县城里出来的年轻人
228.478 -- 230.476: 什么也不会什么也没有
231.550 -- 236.012: 就想创办一家伟大的公司这不就是天方夜谭吗
237.726 -- 239.596: 这么离谱的一个梦想
240.414 -- 242.156: 该如何实现呢
243.870 -- 244.716: 那天晚上
245.182 -- 246.732: 我想了一整晚上
247.998 -- 248.812: 说实话
250.334 -- 253.676: 越想越糊涂完全理不清头绪
255.006 -- 255.948: 后来我在想
256.830 -- 257.740: 干脆别想了
258.366 -- 259.692: 把书念好
260.446 -- 261.196: 是正事
262.174 -- 262.892: 所以呢
263.390 -- 265.644: 我就下定决心认认真真读书
266.654 -- 267.084: 那么
268.478 -- 271.276: 我怎么能够把书读的不同凡响呢
num threads: 3
decoding method: greedy_search
Elapsed seconds: 49.470 s
Real time factor (RTF): 49.470 / 272.448 = 0.182
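On this long file, the AED model takes 49.470 s (RTF 0.182) while the CTC model above took 15.693 s (RTF 0.058) with the same three threads, so the CTC model is roughly 3x faster here. A quick arithmetic check using the numbers from the two logs:

```python
# Elapsed decoding times on the same 272.448 s file (from the logs above)
ctc_elapsed = 15.693   # v2 CTC model
aed_elapsed = 49.470   # v2 AED model
speedup = aed_elapsed / ctc_elapsed
print(round(speedup, 1))  # 3.2
```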

Real-time/streaming speech recognition from a microphone with VAD

wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/silero_vad.onnx

./build/bin/sherpa-onnx-vad-microphone-simulated-streaming-asr \
  --silero-vad-model=./silero_vad.onnx \
  --fire-red-asr-encoder=./sherpa-onnx-fire-red-asr2-zh_en-int8-2026-02-26/encoder.int8.onnx \
  --fire-red-asr-decoder=./sherpa-onnx-fire-red-asr2-zh_en-int8-2026-02-26/decoder.int8.onnx \
  --tokens=./sherpa-onnx-fire-red-asr2-zh_en-int8-2026-02-26/tokens.txt

Speech recognition from a microphone

cd /path/to/sherpa-onnx

./build/bin/sherpa-onnx-microphone-offline \
  --fire-red-asr-encoder=./sherpa-onnx-fire-red-asr2-zh_en-int8-2026-02-26/encoder.int8.onnx \
  --fire-red-asr-decoder=./sherpa-onnx-fire-red-asr2-zh_en-int8-2026-02-26/decoder.int8.onnx \
  --tokens=./sherpa-onnx-fire-red-asr2-zh_en-int8-2026-02-26/tokens.txt

Speech recognition from a microphone with VAD

cd /path/to/sherpa-onnx

wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/silero_vad.onnx

./build/bin/sherpa-onnx-vad-microphone-offline-asr \
  --silero-vad-model=./silero_vad.onnx \
  --fire-red-asr-encoder=./sherpa-onnx-fire-red-asr2-zh_en-int8-2026-02-26/encoder.int8.onnx \
  --fire-red-asr-decoder=./sherpa-onnx-fire-red-asr2-zh_en-int8-2026-02-26/decoder.int8.onnx \
  --tokens=./sherpa-onnx-fire-red-asr2-zh_en-int8-2026-02-26/tokens.txt

sherpa-onnx-fire-red-asr-large-zh_en-2025-02-16 (v1, Chinese + English; Mandarin, Sichuanese, Henan dialect, etc.)

This model is converted from https://huggingface.co/FireRedTeam/FireRedASR-AED-L

It supports the following two languages:

  • Chinese (Mandarin, plus dialects such as Sichuanese, Tianjin, and Henan)

  • English

In the following, we describe how to download and use it.

Download

Please use the following commands to download it:

cd /path/to/sherpa-onnx

wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/sherpa-onnx-fire-red-asr-large-zh_en-2025-02-16.tar.bz2
tar xvf sherpa-onnx-fire-red-asr-large-zh_en-2025-02-16.tar.bz2
rm sherpa-onnx-fire-red-asr-large-zh_en-2025-02-16.tar.bz2

After downloading, you should find the following files:

ls -lh sherpa-onnx-fire-red-asr-large-zh_en-2025-02-16/
total 1.7G
-rw-r--r--  1 kuangfangjun root  188 Feb 16 16:22 README.md
-rw-r--r--  1 kuangfangjun root 425M Feb 16 16:21 decoder.int8.onnx
-rw-r--r--  1 kuangfangjun root 1.3G Feb 16 16:21 encoder.int8.onnx
drwxr-xr-x 10 kuangfangjun root    0 Feb 16 16:26 test_wavs
-rw-r--r--  1 kuangfangjun root  70K Feb 16 16:21 tokens.txt

ls -lh sherpa-onnx-fire-red-asr-large-zh_en-2025-02-16/test_wavs/
total 1.9M
-rw-r--r-- 1 kuangfangjun root 315K Feb 16 16:24 0.wav
-rw-r--r-- 1 kuangfangjun root 160K Feb 16 16:24 1.wav
-rw-r--r-- 1 kuangfangjun root 147K Feb 16 16:24 2.wav
-rw-r--r-- 1 kuangfangjun root 245K Feb 16 16:25 3-sichuan.wav
-rw-r--r-- 1 kuangfangjun root 276K Feb 16 16:24 3.wav
-rw-r--r-- 1 kuangfangjun root 245K Feb 16 16:25 4-tianjin.wav
-rw-r--r-- 1 kuangfangjun root 250K Feb 16 16:26 5-henan.wav
-rw-r--r-- 1 kuangfangjun root 276K Feb 16 16:24 8k.wav

Decode a file

Please use the following command to decode a wave file:

./build/bin/sherpa-onnx-offline \
  --tokens=./sherpa-onnx-fire-red-asr-large-zh_en-2025-02-16/tokens.txt \
  --fire-red-asr-encoder=./sherpa-onnx-fire-red-asr-large-zh_en-2025-02-16/encoder.int8.onnx \
  --fire-red-asr-decoder=./sherpa-onnx-fire-red-asr-large-zh_en-2025-02-16/decoder.int8.onnx \
  --num-threads=1 \
  ./sherpa-onnx-fire-red-asr-large-zh_en-2025-02-16/test_wavs/0.wav

You should see the following output:

/star-fj/fangjun/open-source/sherpa-onnx/sherpa-onnx/csrc/parse-options.cc:Read:375 ./build/bin/sherpa-onnx-offline --tokens=./sherpa-onnx-fire-red-asr-large-zh_en-2025-02-16/tokens.txt --fire-red-asr-encoder=./sherpa-onnx-fire-red-asr-large-zh_en-2025-02-16/encoder.int8.onnx --fire-red-asr-decoder=./sherpa-onnx-fire-red-asr-large-zh_en-2025-02-16/decoder.int8.onnx --num-threads=1 ./sherpa-onnx-fire-red-asr-large-zh_en-2025-02-16/test_wavs/0.wav 

OfflineRecognizerConfig(feat_config=FeatureExtractorConfig(sampling_rate=16000, feature_dim=80, low_freq=20, high_freq=-400, dither=0), model_config=OfflineModelConfig(transducer=OfflineTransducerModelConfig(encoder_filename="", decoder_filename="", joiner_filename=""), paraformer=OfflineParaformerModelConfig(model=""), nemo_ctc=OfflineNemoEncDecCtcModelConfig(model=""), whisper=OfflineWhisperModelConfig(encoder="", decoder="", language="", task="transcribe", tail_paddings=-1), fire_red_asr=OfflineFireRedAsrModelConfig(encoder="./sherpa-onnx-fire-red-asr-large-zh_en-2025-02-16/encoder.int8.onnx", decoder="./sherpa-onnx-fire-red-asr-large-zh_en-2025-02-16/decoder.int8.onnx"), tdnn=OfflineTdnnModelConfig(model=""), zipformer_ctc=OfflineZipformerCtcModelConfig(model=""), wenet_ctc=OfflineWenetCtcModelConfig(model=""), sense_voice=OfflineSenseVoiceModelConfig(model="", language="auto", use_itn=False), moonshine=OfflineMoonshineModelConfig(preprocessor="", encoder="", uncached_decoder="", cached_decoder=""), telespeech_ctc="", tokens="./sherpa-onnx-fire-red-asr-large-zh_en-2025-02-16/tokens.txt", num_threads=1, debug=False, provider="cpu", model_type="", modeling_unit="cjkchar", bpe_vocab=""), lm_config=OfflineLMConfig(model="", scale=0.5), ctc_fst_decoder_config=OfflineCtcFstDecoderConfig(graph="", max_active=3000), decoding_method="greedy_search", max_active_paths=4, hotwords_file="", hotwords_score=1.5, blank_penalty=0, rule_fsts="", rule_fars="")
Creating recognizer ...
Started
Done!

./sherpa-onnx-fire-red-asr-large-zh_en-2025-02-16/test_wavs/0.wav
{"lang": "", "emotion": "", "event": "", "text": "昨天是 MONDAY TODAY IS礼拜二 THE DAY AFTER TOMORROW是星期三", "timestamps": [], "tokens":["昨", "天", "是", " MO", "ND", "AY", " TO", "D", "AY", " IS", "礼", "拜", "二", " THE", " DAY", " AFTER", " TO", "M", "OR", "ROW", "是", "星", "期", "三"], "words": []}
----
num threads: 1
decoding method: greedy_search
Elapsed seconds: 19.555 s
Real time factor (RTF): 19.555 / 10.053 = 1.945
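Note that the v1 AED model decodes this 10.053 s file in 19.555 s (RTF 1.945, i.e. slower than real time), whereas the v2 AED model decoded the same file in 3.343 s (RTF 0.333). The speedup claimed at the top of this page can be checked directly from the two logs:

```python
# Elapsed decoding times for the same 10.053 s test file (from the logs above)
v1_elapsed = 19.555   # v1 AED model, RTF 1.945
v2_elapsed = 3.343    # v2 AED model, RTF 0.333
speedup = v1_elapsed / v2_elapsed
print(round(speedup, 1))  # 5.8
```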