Pre-trained Models
This page describes how to download pre-trained Fun-ASR-Nano-2512 models.
sherpa-onnx-funasr-nano-int8-2025-12-30 (Chinese, English, Japanese)
This model is converted from Fun-ASR-Nano-2512 using scripts from https://github.com/Wasser1462/FunASR-nano-onnx.
It supports the following 3 languages:
Chinese
English
Japanese
Hint
Chinese support includes 7 dialects (Wu, Cantonese, Min, Hakka, Gan, Xiang, Jin) and 26 regional accents (Henan, Shanxi, Hubei, Sichuan, Chongqing, Yunnan, Guizhou, Guangdong, Guangxi, and more than 20 other regions).
English and Japanese cover a variety of regional accents.
In addition, the model supports lyrics recognition and rap speech recognition.
In the following, we describe how to use it.
Huggingface space
You can visit the Huggingface space to try this model in your browser.
Hint
You need to first select 31 languages (FunASR Nano) as the language and then select the model csukuangfj/sherpa-onnx-funasr-nano-int8-2025-12-30.
Android APKs
Real-time speech recognition Android APKs can be found at
Please always download the latest version.
Hint
Please search for multi-funasr_nano_int8_2025_12_30.apk on that page, e.g.,
sherpa-onnx-1.12.21-arm64-v8a-vad_asr-multi-funasr_nano_int8_2025_12_30.apk.
Hint
For Chinese users, you can also visit https://k2-fsa.github.io/sherpa/onnx/vad/apk-asr-cn.html
Download
Please use the following commands to download it:
cd /path/to/sherpa-onnx
wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/sherpa-onnx-funasr-nano-int8-2025-12-30.tar.bz2
# For Chinese users, you can also use
# wget https://modelscope.cn/models/csukuangfj/asr-models/resolve/master/sherpa-onnx-funasr-nano-int8-2025-12-30.tar.bz2
tar xvf sherpa-onnx-funasr-nano-int8-2025-12-30.tar.bz2
rm sherpa-onnx-funasr-nano-int8-2025-12-30.tar.bz2
After downloading, you should find the following files:
ls -lh sherpa-onnx-funasr-nano-int8-2025-12-30/
total 948M
drwxr-xr-x 5 kuangfangjun root 0 Jan 7 19:28 Qwen3-0.6B
-rw-r--r-- 1 kuangfangjun root 253 Jan 7 19:33 README.md
-rw-r--r-- 1 kuangfangjun root 149M Jan 7 19:33 embedding.int8.onnx
-rw-r--r-- 1 kuangfangjun root 227M Jan 7 19:34 encoder_adaptor.int8.onnx
-rw-r--r-- 1 kuangfangjun root 573M Jan 7 19:34 llm.int8.onnx
drwxr-xr-x 27 kuangfangjun root 0 Jan 7 19:28 test_wavs
ls -lh sherpa-onnx-funasr-nano-int8-2025-12-30/Qwen3-0.6B/
total 16M
-rw-r--r-- 1 kuangfangjun root 1.6M Jan 7 19:34 merges.txt
-rw-r--r-- 1 kuangfangjun root 11M Jan 7 19:34 tokenizer.json
-rw-r--r-- 1 kuangfangjun root 2.7M Jan 7 19:34 vocab.json
ls -lh sherpa-onnx-funasr-nano-int8-2025-12-30/test_wavs/
total 9.7M
-rw-r--r-- 1 kuangfangjun root 6.9K Jan 7 19:33 README.md
-rw-r--r-- 1 kuangfangjun root 220K Jan 7 19:33 dia_hunan.wav
-rw-r--r-- 1 kuangfangjun root 253K Jan 7 19:33 dia_minnan.wav
-rw-r--r-- 1 kuangfangjun root 229K Jan 7 19:33 dia_sh.wav
-rw-r--r-- 1 kuangfangjun root 297K Jan 7 19:33 dia_yue.wav
-rw-r--r-- 1 kuangfangjun root 215K Jan 7 19:33 far_2.wav
-rw-r--r-- 1 kuangfangjun root 682K Jan 7 19:33 far_3.wav
-rw-r--r-- 1 kuangfangjun root 284K Jan 7 19:33 far_4.wav
-rw-r--r-- 1 kuangfangjun root 279K Jan 7 19:33 far_5.wav
-rw-r--r-- 1 kuangfangjun root 254K Jan 7 19:33 ja.wav
-rw-r--r-- 1 kuangfangjun root 255K Jan 7 19:33 ja_en_codeswitch.wav
-rw-r--r-- 1 kuangfangjun root 259K Jan 7 19:33 lyrics.wav
-rw-r--r-- 1 kuangfangjun root 431K Jan 7 19:33 lyrics_2.wav
-rw-r--r-- 1 kuangfangjun root 546K Jan 7 19:33 lyrics_3.wav
-rw-r--r-- 1 kuangfangjun root 1.3M Jan 7 19:33 lyrics_en_1.wav
-rw-r--r-- 1 kuangfangjun root 679K Jan 7 19:33 lyrics_en_2.wav
-rw-r--r-- 1 kuangfangjun root 1.7M Jan 7 19:33 lyrics_en_3.wav
-rw-r--r-- 1 kuangfangjun root 331K Jan 7 19:33 noise_en.wav
-rw-r--r-- 1 kuangfangjun root 267K Jan 7 19:33 rag_biochemistry.wav
-rw-r--r-- 1 kuangfangjun root 214K Jan 7 19:33 rag_chemistry.wav
-rw-r--r-- 1 kuangfangjun root 248K Jan 7 19:33 rag_history.wav
-rw-r--r-- 1 kuangfangjun root 173K Jan 7 19:33 rag_math.wav
-rw-r--r-- 1 kuangfangjun root 192K Jan 7 19:33 rag_medical.wav
-rw-r--r-- 1 kuangfangjun root 379K Jan 7 19:33 rag_physics.wav
-rw-r--r-- 1 kuangfangjun root 224K Jan 7 19:33 vietnamese.wav
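The feature extractor shown in the logs below uses a 16 kHz sampling rate. If you want to check a test wave's duration yourself (for example, to reproduce the real-time-factor numbers reported later on this page), Python's standard wave module is enough. A minimal, self-contained sketch; the in-memory 1-second file is generated purely for illustration, and passing a path such as test_wavs/dia_hunan.wav instead is a hypothetical usage:

```python
import io
import wave


def wav_duration_seconds(path_or_file) -> float:
    # Duration = number of frames / sample rate.
    with wave.open(path_or_file, "rb") as w:
        return w.getnframes() / w.getframerate()


# Self-contained demo: write 16000 frames of 16-bit mono silence
# at 16 kHz, i.e. exactly 1.0 second of audio.
buf = io.BytesIO()
with wave.open(buf, "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)
    w.setframerate(16000)
    w.writeframes(b"\x00\x00" * 16000)
buf.seek(0)

print(wav_duration_seconds(buf))  # 1.0
```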
Hint
If you need the float32 model file, please use:
wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/sherpa-onnx-funasr-nano-2025-12-30.tar.bz2
# For Chinese users, you can also use
# wget https://modelscope.cn/models/csukuangfj/asr-models/resolve/master/sherpa-onnx-funasr-nano-2025-12-30.tar.bz2
tar xvf sherpa-onnx-funasr-nano-2025-12-30.tar.bz2
rm sherpa-onnx-funasr-nano-2025-12-30.tar.bz2
ls -lh sherpa-onnx-funasr-nano-2025-12-30
total 3.7G
drwxr-xr-x 5 kuangfangjun root 0 Jan 7 19:27 Qwen3-0.6B
-rw-r--r-- 1 kuangfangjun root 253 Jan 13 12:19 README.md
-rw-r--r-- 1 kuangfangjun root 594M Jan 13 12:20 embedding.onnx
-rw-r--r-- 1 kuangfangjun root 888M Jan 13 12:19 encoder_adaptor.onnx
-rw-r--r-- 1 kuangfangjun root 2.3G Jan 13 12:22 llm.fp32.data
-rw-r--r-- 1 kuangfangjun root 1011K Jan 13 12:20 llm.fp32.onnx
drwxr-xr-x 27 kuangfangjun root 0 Jan 7 19:27 test_wavs
If you need the float16 model file, please use:
wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/sherpa-onnx-funasr-nano-fp16-2025-12-30.tar.bz2
# For Chinese users, you can also use
# wget https://modelscope.cn/models/csukuangfj/asr-models/resolve/master/sherpa-onnx-funasr-nano-fp16-2025-12-30.tar.bz2
tar xvf sherpa-onnx-funasr-nano-fp16-2025-12-30.tar.bz2
rm sherpa-onnx-funasr-nano-fp16-2025-12-30.tar.bz2
ls -lh sherpa-onnx-funasr-nano-fp16-2025-12-30/
total 1.5G
drwxr-xr-x 5 kuangfangjun root 0 Jan 7 19:24 Qwen3-0.6B
-rw-r--r-- 1 kuangfangjun root 253 Jan 13 12:26 README.md
-rw-r--r-- 1 kuangfangjun root 149M Jan 13 12:26 embedding.int8.onnx
-rw-r--r-- 1 kuangfangjun root 227M Jan 13 12:27 encoder_adaptor.int8.onnx
-rw-r--r-- 1 kuangfangjun root 1.2G Jan 13 12:27 llm.fp16.onnx
drwxr-xr-x 27 kuangfangjun root 0 Jan 7 19:24 test_wavs
dia_hunan.wav (Hunan dialect)
To decode the test file ./sherpa-onnx-funasr-nano-int8-2025-12-30/test_wavs/dia_hunan.wav:
./build/bin/sherpa-onnx-offline \
--funasr-nano-encoder-adaptor=./sherpa-onnx-funasr-nano-int8-2025-12-30/encoder_adaptor.int8.onnx \
--funasr-nano-llm=./sherpa-onnx-funasr-nano-int8-2025-12-30/llm.int8.onnx \
--funasr-nano-tokenizer=./sherpa-onnx-funasr-nano-int8-2025-12-30/Qwen3-0.6B \
--funasr-nano-embedding=./sherpa-onnx-funasr-nano-int8-2025-12-30/embedding.int8.onnx \
./sherpa-onnx-funasr-nano-int8-2025-12-30/test_wavs/dia_hunan.wav
You should see the following output:
/Users/fangjun/open-source/sherpa-onnx/sherpa-onnx/csrc/parse-options.cc:Read:373 ./build/bin/sherpa-onnx-offline --funasr-nano-encoder-adaptor=./sherpa-onnx-funasr-nano-int8-2025-12-30/encoder_adaptor.int8.onnx --funasr-nano-llm=./sherpa-onnx-funasr-nano-int8-2025-12-30/llm.int8.onnx --funasr-nano-tokenizer=./sherpa-onnx-funasr-nano-int8-2025-12-30/Qwen3-0.6B --funasr-nano-embedding=./sherpa-onnx-funasr-nano-int8-2025-12-30/embedding.int8.onnx ./sherpa-onnx-funasr-nano-int8-2025-12-30/test_wavs/dia_hunan.wav
OfflineRecognizerConfig(feat_config=FeatureExtractorConfig(sampling_rate=16000, feature_dim=80, low_freq=20, high_freq=-400, dither=0, normalize_samples=True, snip_edges=False), model_config=OfflineModelConfig(transducer=OfflineTransducerModelConfig(encoder_filename="", decoder_filename="", joiner_filename=""), paraformer=OfflineParaformerModelConfig(model=""), nemo_ctc=OfflineNemoEncDecCtcModelConfig(model=""), whisper=OfflineWhisperModelConfig(encoder="", decoder="", language="", task="transcribe", tail_paddings=-1), fire_red_asr=OfflineFireRedAsrModelConfig(encoder="", decoder=""), tdnn=OfflineTdnnModelConfig(model=""), zipformer_ctc=OfflineZipformerCtcModelConfig(model=""), wenet_ctc=OfflineWenetCtcModelConfig(model=""), sense_voice=OfflineSenseVoiceModelConfig(model="", language="auto", use_itn=False), moonshine=OfflineMoonshineModelConfig(preprocessor="", encoder="", uncached_decoder="", cached_decoder=""), dolphin=OfflineDolphinModelConfig(model=""), canary=OfflineCanaryModelConfig(encoder="", decoder="", src_lang="", tgt_lang="", use_pnc=True), omnilingual=OfflineOmnilingualAsrCtcModelConfig(model=""), funasr_nano=OfflineFunASRNanoModelConfig(encoder_adaptor="./sherpa-onnx-funasr-nano-int8-2025-12-30/encoder_adaptor.int8.onnx", llm="./sherpa-onnx-funasr-nano-int8-2025-12-30/llm.int8.onnx", embedding="./sherpa-onnx-funasr-nano-int8-2025-12-30/embedding.int8.onnx", tokenizer="./sherpa-onnx-funasr-nano-int8-2025-12-30/Qwen3-0.6B", system_prompt="You are a helpful assistant.", user_prompt="语音转写:", max_new_tokens=512, temperature=1e-06, top_p=0.8, seed=42), medasr=OfflineMedAsrCtcModelConfig(model=""), telespeech_ctc="", tokens="", num_threads=2, debug=False, provider="cpu", model_type="", modeling_unit="cjkchar", bpe_vocab=""), lm_config=OfflineLMConfig(model="", scale=0.5, lodr_scale=0.01, lodr_fst="", lodr_backoff_id=-1), ctc_fst_decoder_config=OfflineCtcFstDecoderConfig(graph="", max_active=3000), decoding_method="greedy_search", max_active_paths=4, 
hotwords_file="", hotwords_score=1.5, blank_penalty=0, rule_fsts="", rule_fars="", hr=HomophoneReplacerConfig(lexicon="", rule_fsts=""))
Creating recognizer ...
recognizer created in 1.641 s
Started
Done!
./sherpa-onnx-funasr-nano-int8-2025-12-30/test_wavs/dia_hunan.wav
{"lang": "", "emotion": "", "event": "", "text": "他总的来讲,孙膑对本怀的理解、文样比庞涓略胜一筹。", "timestamps": [0.00, 0.33, 0.66, 1.00, 1.33, 1.66, 1.99, 2.32, 2.66, 2.99, 3.32, 3.65, 3.98, 4.31, 4.65, 4.98, 5.31, 5.64, 5.97, 6.31, 6.64], "durations": [], "tokens":["他", "总的", "来讲", ",", "孙", "膑", "对", "本", "怀", "的理解", "、", "文", "样", "比", "庞", "涓", "略", "胜", "一", "筹", "。"], "ys_log_probs": [], "words": []}
----
num threads: 2
decoding method: greedy_search
Elapsed seconds: 1.154 s
Real time factor (RTF): 1.154 / 7.012 = 0.165
| Wave filename | Ground truth |
|---|---|
| dia_hunan.wav | 但总来讲孙膑对兵法的理解运用比庞涓略胜一筹。 |
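Each decoded file produces a JSON result in which tokens[i] begins at timestamps[i] (in seconds). As an illustration of how such a result might be post-processed, here is a hedged sketch; the abridged JSON string below is made up for the example and real output contains additional fields:

```python
import json

# Abridged, made-up result for illustration; real sherpa-onnx output
# contains more fields (durations, ys_log_probs, words, ...).
result = json.loads(
    '{"text": "他总的来讲", '
    '"timestamps": [0.00, 0.33, 0.66], '
    '"tokens": ["他", "总的", "来讲"]}'
)

# Pair each token with its start time.
for token, start in zip(result["tokens"], result["timestamps"]):
    print(f"{start:5.2f}s  {token}")
```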
dia_minnan.wav (Min Nan)
To decode the test file ./sherpa-onnx-funasr-nano-int8-2025-12-30/test_wavs/dia_minnan.wav:
./build/bin/sherpa-onnx-offline \
--funasr-nano-encoder-adaptor=./sherpa-onnx-funasr-nano-int8-2025-12-30/encoder_adaptor.int8.onnx \
--funasr-nano-llm=./sherpa-onnx-funasr-nano-int8-2025-12-30/llm.int8.onnx \
--funasr-nano-tokenizer=./sherpa-onnx-funasr-nano-int8-2025-12-30/Qwen3-0.6B \
--funasr-nano-embedding=./sherpa-onnx-funasr-nano-int8-2025-12-30/embedding.int8.onnx \
./sherpa-onnx-funasr-nano-int8-2025-12-30/test_wavs/dia_minnan.wav
You should see the following output:
/Users/fangjun/open-source/sherpa-onnx/sherpa-onnx/csrc/parse-options.cc:Read:373 ./build/bin/sherpa-onnx-offline --funasr-nano-encoder-adaptor=./sherpa-onnx-funasr-nano-int8-2025-12-30/encoder_adaptor.int8.onnx --funasr-nano-llm=./sherpa-onnx-funasr-nano-int8-2025-12-30/llm.int8.onnx --funasr-nano-tokenizer=./sherpa-onnx-funasr-nano-int8-2025-12-30/Qwen3-0.6B --funasr-nano-embedding=./sherpa-onnx-funasr-nano-int8-2025-12-30/embedding.int8.onnx ./sherpa-onnx-funasr-nano-int8-2025-12-30/test_wavs/dia_minnan.wav
OfflineRecognizerConfig(feat_config=FeatureExtractorConfig(sampling_rate=16000, feature_dim=80, low_freq=20, high_freq=-400, dither=0, normalize_samples=True, snip_edges=False), model_config=OfflineModelConfig(transducer=OfflineTransducerModelConfig(encoder_filename="", decoder_filename="", joiner_filename=""), paraformer=OfflineParaformerModelConfig(model=""), nemo_ctc=OfflineNemoEncDecCtcModelConfig(model=""), whisper=OfflineWhisperModelConfig(encoder="", decoder="", language="", task="transcribe", tail_paddings=-1), fire_red_asr=OfflineFireRedAsrModelConfig(encoder="", decoder=""), tdnn=OfflineTdnnModelConfig(model=""), zipformer_ctc=OfflineZipformerCtcModelConfig(model=""), wenet_ctc=OfflineWenetCtcModelConfig(model=""), sense_voice=OfflineSenseVoiceModelConfig(model="", language="auto", use_itn=False), moonshine=OfflineMoonshineModelConfig(preprocessor="", encoder="", uncached_decoder="", cached_decoder=""), dolphin=OfflineDolphinModelConfig(model=""), canary=OfflineCanaryModelConfig(encoder="", decoder="", src_lang="", tgt_lang="", use_pnc=True), omnilingual=OfflineOmnilingualAsrCtcModelConfig(model=""), funasr_nano=OfflineFunASRNanoModelConfig(encoder_adaptor="./sherpa-onnx-funasr-nano-int8-2025-12-30/encoder_adaptor.int8.onnx", llm="./sherpa-onnx-funasr-nano-int8-2025-12-30/llm.int8.onnx", embedding="./sherpa-onnx-funasr-nano-int8-2025-12-30/embedding.int8.onnx", tokenizer="./sherpa-onnx-funasr-nano-int8-2025-12-30/Qwen3-0.6B", system_prompt="You are a helpful assistant.", user_prompt="语音转写:", max_new_tokens=512, temperature=1e-06, top_p=0.8, seed=42), medasr=OfflineMedAsrCtcModelConfig(model=""), telespeech_ctc="", tokens="", num_threads=2, debug=False, provider="cpu", model_type="", modeling_unit="cjkchar", bpe_vocab=""), lm_config=OfflineLMConfig(model="", scale=0.5, lodr_scale=0.01, lodr_fst="", lodr_backoff_id=-1), ctc_fst_decoder_config=OfflineCtcFstDecoderConfig(graph="", max_active=3000), decoding_method="greedy_search", max_active_paths=4, 
hotwords_file="", hotwords_score=1.5, blank_penalty=0, rule_fsts="", rule_fars="", hr=HomophoneReplacerConfig(lexicon="", rule_fsts=""))
Creating recognizer ...
recognizer created in 2.114 s
Started
Done!
./sherpa-onnx-funasr-nano-int8-2025-12-30/test_wavs/dia_minnan.wav
{"lang": "", "emotion": "", "event": "", "text": "嗯,一般有机会吧,因为这个哈开了哈,赚的再厉害,也都挺行啊。", "timestamps": [0.00, 0.37, 0.73, 1.10, 1.46, 1.83, 2.20, 2.56, 2.93, 3.29, 3.66, 4.03, 4.39, 4.76, 5.12, 5.49, 5.85, 6.22, 6.59, 6.95, 7.32, 7.68], "durations": [], "tokens":["嗯", ",", "一般", "有机会", "吧", ",", "因为", "这个", "哈", "开了", "哈", ",", "赚", "的", "再", "厉害", ",", "也都", "挺", "行", "啊", "。"], "ys_log_probs": [], "words": []}
----
num threads: 2
decoding method: greedy_search
Elapsed seconds: 1.329 s
Real time factor (RTF): 1.329 / 8.081 = 0.164
| Wave filename | Ground truth |
|---|---|
| dia_minnan.wav | 嗯,下摆若有机会吧,因为即久吼开了吼卷啊遮厉害,会倒贴钱啊。 |
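The real time factor (RTF) printed after each run is simply the decoding time divided by the audio duration; values below 1 mean decoding runs faster than real time. A one-function sketch using the numbers from the dia_minnan.wav run above:

```python
def real_time_factor(elapsed_s: float, audio_s: float) -> float:
    # RTF < 1 means decoding is faster than real time.
    return elapsed_s / audio_s


# Numbers taken from the dia_minnan.wav run above.
rtf = real_time_factor(1.329, 8.081)
print(f"{rtf:.3f}")  # 0.164
```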
dia_sh.wav (Shanghainese)
To decode the test file ./sherpa-onnx-funasr-nano-int8-2025-12-30/test_wavs/dia_sh.wav:
./build/bin/sherpa-onnx-offline \
--funasr-nano-encoder-adaptor=./sherpa-onnx-funasr-nano-int8-2025-12-30/encoder_adaptor.int8.onnx \
--funasr-nano-llm=./sherpa-onnx-funasr-nano-int8-2025-12-30/llm.int8.onnx \
--funasr-nano-tokenizer=./sherpa-onnx-funasr-nano-int8-2025-12-30/Qwen3-0.6B \
--funasr-nano-embedding=./sherpa-onnx-funasr-nano-int8-2025-12-30/embedding.int8.onnx \
./sherpa-onnx-funasr-nano-int8-2025-12-30/test_wavs/dia_sh.wav
You should see the following output:
/Users/fangjun/open-source/sherpa-onnx/sherpa-onnx/csrc/parse-options.cc:Read:373 ./build/bin/sherpa-onnx-offline --funasr-nano-encoder-adaptor=./sherpa-onnx-funasr-nano-int8-2025-12-30/encoder_adaptor.int8.onnx --funasr-nano-llm=./sherpa-onnx-funasr-nano-int8-2025-12-30/llm.int8.onnx --funasr-nano-tokenizer=./sherpa-onnx-funasr-nano-int8-2025-12-30/Qwen3-0.6B --funasr-nano-embedding=./sherpa-onnx-funasr-nano-int8-2025-12-30/embedding.int8.onnx ./sherpa-onnx-funasr-nano-int8-2025-12-30/test_wavs/dia_sh.wav
OfflineRecognizerConfig(feat_config=FeatureExtractorConfig(sampling_rate=16000, feature_dim=80, low_freq=20, high_freq=-400, dither=0, normalize_samples=True, snip_edges=False), model_config=OfflineModelConfig(transducer=OfflineTransducerModelConfig(encoder_filename="", decoder_filename="", joiner_filename=""), paraformer=OfflineParaformerModelConfig(model=""), nemo_ctc=OfflineNemoEncDecCtcModelConfig(model=""), whisper=OfflineWhisperModelConfig(encoder="", decoder="", language="", task="transcribe", tail_paddings=-1), fire_red_asr=OfflineFireRedAsrModelConfig(encoder="", decoder=""), tdnn=OfflineTdnnModelConfig(model=""), zipformer_ctc=OfflineZipformerCtcModelConfig(model=""), wenet_ctc=OfflineWenetCtcModelConfig(model=""), sense_voice=OfflineSenseVoiceModelConfig(model="", language="auto", use_itn=False), moonshine=OfflineMoonshineModelConfig(preprocessor="", encoder="", uncached_decoder="", cached_decoder=""), dolphin=OfflineDolphinModelConfig(model=""), canary=OfflineCanaryModelConfig(encoder="", decoder="", src_lang="", tgt_lang="", use_pnc=True), omnilingual=OfflineOmnilingualAsrCtcModelConfig(model=""), funasr_nano=OfflineFunASRNanoModelConfig(encoder_adaptor="./sherpa-onnx-funasr-nano-int8-2025-12-30/encoder_adaptor.int8.onnx", llm="./sherpa-onnx-funasr-nano-int8-2025-12-30/llm.int8.onnx", embedding="./sherpa-onnx-funasr-nano-int8-2025-12-30/embedding.int8.onnx", tokenizer="./sherpa-onnx-funasr-nano-int8-2025-12-30/Qwen3-0.6B", system_prompt="You are a helpful assistant.", user_prompt="语音转写:", max_new_tokens=512, temperature=1e-06, top_p=0.8, seed=42), medasr=OfflineMedAsrCtcModelConfig(model=""), telespeech_ctc="", tokens="", num_threads=2, debug=False, provider="cpu", model_type="", modeling_unit="cjkchar", bpe_vocab=""), lm_config=OfflineLMConfig(model="", scale=0.5, lodr_scale=0.01, lodr_fst="", lodr_backoff_id=-1), ctc_fst_decoder_config=OfflineCtcFstDecoderConfig(graph="", max_active=3000), decoding_method="greedy_search", max_active_paths=4, 
hotwords_file="", hotwords_score=1.5, blank_penalty=0, rule_fsts="", rule_fars="", hr=HomophoneReplacerConfig(lexicon="", rule_fsts=""))
Creating recognizer ...
recognizer created in 1.813 s
Started
Done!
./sherpa-onnx-funasr-nano-int8-2025-12-30/test_wavs/dia_sh.wav
{"lang": "", "emotion": "", "event": "", "text": "人跟狗包括人跟动物接触上了才有感情,那么随着社会的富裕。", "timestamps": [0.00, 0.40, 0.81, 1.21, 1.62, 2.02, 2.42, 2.83, 3.23, 3.63, 4.04, 4.44, 4.85, 5.25, 5.65, 6.06, 6.46, 6.87], "durations": [], "tokens":["人", "跟", "狗", "包括", "人", "跟", "动物", "接触", "上了", "才有", "感情", ",", "那么", "随着", "社会", "的", "富裕", "。"], "ys_log_probs": [], "words": []}
----
num threads: 2
decoding method: greedy_search
Elapsed seconds: 1.168 s
Real time factor (RTF): 1.168 / 7.314 = 0.160
| Wave filename | Ground truth |
|---|---|
| dia_sh.wav | 人跟狗,包括人跟动物接触长了,全有感情。葛末随了阿拉社会个富裕。 |
dia_yue.wav (Cantonese)
To decode the test file ./sherpa-onnx-funasr-nano-int8-2025-12-30/test_wavs/dia_yue.wav:
./build/bin/sherpa-onnx-offline \
--funasr-nano-encoder-adaptor=./sherpa-onnx-funasr-nano-int8-2025-12-30/encoder_adaptor.int8.onnx \
--funasr-nano-llm=./sherpa-onnx-funasr-nano-int8-2025-12-30/llm.int8.onnx \
--funasr-nano-tokenizer=./sherpa-onnx-funasr-nano-int8-2025-12-30/Qwen3-0.6B \
--funasr-nano-embedding=./sherpa-onnx-funasr-nano-int8-2025-12-30/embedding.int8.onnx \
./sherpa-onnx-funasr-nano-int8-2025-12-30/test_wavs/dia_yue.wav
You should see the following output:
OfflineRecognizerConfig(feat_config=FeatureExtractorConfig(sampling_rate=16000, feature_dim=80, low_freq=20, high_freq=-400, dither=0, normalize_samples=True, snip_edges=False), model_config=OfflineModelConfig(transducer=OfflineTransducerModelConfig(encoder_filename="", decoder_filename="", joiner_filename=""), paraformer=OfflineParaformerModelConfig(model=""), nemo_ctc=OfflineNemoEncDecCtcModelConfig(model=""), whisper=OfflineWhisperModelConfig(encoder="", decoder="", language="", task="transcribe", tail_paddings=-1), fire_red_asr=OfflineFireRedAsrModelConfig(encoder="", decoder=""), tdnn=OfflineTdnnModelConfig(model=""), zipformer_ctc=OfflineZipformerCtcModelConfig(model=""), wenet_ctc=OfflineWenetCtcModelConfig(model=""), sense_voice=OfflineSenseVoiceModelConfig(model="", language="auto", use_itn=False), moonshine=OfflineMoonshineModelConfig(preprocessor="", encoder="", uncached_decoder="", cached_decoder=""), dolphin=OfflineDolphinModelConfig(model=""), canary=OfflineCanaryModelConfig(encoder="", decoder="", src_lang="", tgt_lang="", use_pnc=True), omnilingual=OfflineOmnilingualAsrCtcModelConfig(model=""), funasr_nano=OfflineFunASRNanoModelConfig(encoder_adaptor="./sherpa-onnx-funasr-nano-int8-2025-12-30/encoder_adaptor.int8.onnx", llm="./sherpa-onnx-funasr-nano-int8-2025-12-30/llm.int8.onnx", embedding="./sherpa-onnx-funasr-nano-int8-2025-12-30/embedding.int8.onnx", tokenizer="./sherpa-onnx-funasr-nano-int8-2025-12-30/Qwen3-0.6B", system_prompt="You are a helpful assistant.", user_prompt="语音转写:", max_new_tokens=512, temperature=1e-06, top_p=0.8, seed=42), medasr=OfflineMedAsrCtcModelConfig(model=""), telespeech_ctc="", tokens="", num_threads=2, debug=False, provider="cpu", model_type="", modeling_unit="cjkchar", bpe_vocab=""), lm_config=OfflineLMConfig(model="", scale=0.5, lodr_scale=0.01, lodr_fst="", lodr_backoff_id=-1), ctc_fst_decoder_config=OfflineCtcFstDecoderConfig(graph="", max_active=3000), decoding_method="greedy_search", max_active_paths=4, 
hotwords_file="", hotwords_score=1.5, blank_penalty=0, rule_fsts="", rule_fars="", hr=HomophoneReplacerConfig(lexicon="", rule_fsts=""))
Creating recognizer ...
recognizer created in 1.607 s
Started
Done!
./sherpa-onnx-funasr-nano-int8-2025-12-30/test_wavs/dia_yue.wav
{"lang": "", "emotion": "", "event": "", "text": "啲身体好劲啊,跟住咧佢哋有一个人咧就突然可能就有高原反应啦,突然间就啊窒息咗,即系晕晕咗。", "timestamps": [0.00, 0.24, 0.47, 0.71, 0.94, 1.18, 1.41, 1.65, 1.89, 2.12, 2.36, 2.59, 2.83, 3.06, 3.30, 3.54, 3.77, 4.01, 4.24, 4.48, 4.72, 4.95, 5.19, 5.42, 5.66, 5.89, 6.13, 6.37, 6.60, 6.84, 7.07, 7.31, 7.54, 7.78, 8.02, 8.25, 8.49, 8.72, 8.96, 9.19], "durations": [], "tokens":["", "<0xB2>", "身体", "好", "劲", "啊", ",", "跟", "住", "咧", "", "<0xA2>", "", "<0x8B>", "有", "一个人", "咧", "就", "突然", "可能", "就有", "高原", "反应", "啦", ",", "突然", "间", "就", "啊", "窒息", "", "<0x97>", ",", "即", "系", "晕", "晕", "", "<0x97>", "。"], "ys_log_probs": [], "words": []}
----
num threads: 2
decoding method: greedy_search
Elapsed seconds: 1.749 s
Real time factor (RTF): 1.749 / 9.474 = 0.185
| Wave filename | Ground truth |
|---|---|
| dia_yue.wav | 啲身体好劲啊,跟住咧佢哋有一个人咧就突然可能就有高原反应啦,突然间就啊窒息咗,即系晕晕咗。 |
lyrics.wav (Chinese song 1)
To decode the test file ./sherpa-onnx-funasr-nano-int8-2025-12-30/test_wavs/lyrics.wav:
./build/bin/sherpa-onnx-offline \
--funasr-nano-encoder-adaptor=./sherpa-onnx-funasr-nano-int8-2025-12-30/encoder_adaptor.int8.onnx \
--funasr-nano-llm=./sherpa-onnx-funasr-nano-int8-2025-12-30/llm.int8.onnx \
--funasr-nano-tokenizer=./sherpa-onnx-funasr-nano-int8-2025-12-30/Qwen3-0.6B \
--funasr-nano-embedding=./sherpa-onnx-funasr-nano-int8-2025-12-30/embedding.int8.onnx \
./sherpa-onnx-funasr-nano-int8-2025-12-30/test_wavs/lyrics.wav
You should see the following output:
/Users/fangjun/open-source/sherpa-onnx/sherpa-onnx/csrc/parse-options.cc:Read:373 ./build/bin/sherpa-onnx-offline --funasr-nano-encoder-adaptor=./sherpa-onnx-funasr-nano-int8-2025-12-30/encoder_adaptor.int8.onnx --funasr-nano-llm=./sherpa-onnx-funasr-nano-int8-2025-12-30/llm.int8.onnx --funasr-nano-tokenizer=./sherpa-onnx-funasr-nano-int8-2025-12-30/Qwen3-0.6B --funasr-nano-embedding=./sherpa-onnx-funasr-nano-int8-2025-12-30/embedding.int8.onnx ./sherpa-onnx-funasr-nano-int8-2025-12-30/test_wavs/lyrics.wav
OfflineRecognizerConfig(feat_config=FeatureExtractorConfig(sampling_rate=16000, feature_dim=80, low_freq=20, high_freq=-400, dither=0, normalize_samples=True, snip_edges=False), model_config=OfflineModelConfig(transducer=OfflineTransducerModelConfig(encoder_filename="", decoder_filename="", joiner_filename=""), paraformer=OfflineParaformerModelConfig(model=""), nemo_ctc=OfflineNemoEncDecCtcModelConfig(model=""), whisper=OfflineWhisperModelConfig(encoder="", decoder="", language="", task="transcribe", tail_paddings=-1), fire_red_asr=OfflineFireRedAsrModelConfig(encoder="", decoder=""), tdnn=OfflineTdnnModelConfig(model=""), zipformer_ctc=OfflineZipformerCtcModelConfig(model=""), wenet_ctc=OfflineWenetCtcModelConfig(model=""), sense_voice=OfflineSenseVoiceModelConfig(model="", language="auto", use_itn=False), moonshine=OfflineMoonshineModelConfig(preprocessor="", encoder="", uncached_decoder="", cached_decoder=""), dolphin=OfflineDolphinModelConfig(model=""), canary=OfflineCanaryModelConfig(encoder="", decoder="", src_lang="", tgt_lang="", use_pnc=True), omnilingual=OfflineOmnilingualAsrCtcModelConfig(model=""), funasr_nano=OfflineFunASRNanoModelConfig(encoder_adaptor="./sherpa-onnx-funasr-nano-int8-2025-12-30/encoder_adaptor.int8.onnx", llm="./sherpa-onnx-funasr-nano-int8-2025-12-30/llm.int8.onnx", embedding="./sherpa-onnx-funasr-nano-int8-2025-12-30/embedding.int8.onnx", tokenizer="./sherpa-onnx-funasr-nano-int8-2025-12-30/Qwen3-0.6B", system_prompt="You are a helpful assistant.", user_prompt="语音转写:", max_new_tokens=512, temperature=1e-06, top_p=0.8, seed=42), medasr=OfflineMedAsrCtcModelConfig(model=""), telespeech_ctc="", tokens="", num_threads=2, debug=False, provider="cpu", model_type="", modeling_unit="cjkchar", bpe_vocab=""), lm_config=OfflineLMConfig(model="", scale=0.5, lodr_scale=0.01, lodr_fst="", lodr_backoff_id=-1), ctc_fst_decoder_config=OfflineCtcFstDecoderConfig(graph="", max_active=3000), decoding_method="greedy_search", max_active_paths=4, 
hotwords_file="", hotwords_score=1.5, blank_penalty=0, rule_fsts="", rule_fars="", hr=HomophoneReplacerConfig(lexicon="", rule_fsts=""))
Creating recognizer ...
recognizer created in 1.640 s
Started
Done!
./sherpa-onnx-funasr-nano-int8-2025-12-30/test_wavs/lyrics.wav
{"lang": "", "emotion": "", "event": "", "text": "我看到我的身后跟着我的人群,喜欢或恨不一样的神情。我知道这可能就是所谓的成名,我知道必须往前一步一步,不能停。", "timestamps": [0.00, 0.28, 0.57, 0.85, 1.14, 1.42, 1.70, 1.99, 2.27, 2.55, 2.84, 3.12, 3.41, 3.69, 3.97, 4.26, 4.54, 4.82, 5.11, 5.39, 5.68, 5.96, 6.24, 6.53, 6.81, 7.09, 7.38, 7.66, 7.95], "durations": [], "tokens":["我", "看到", "我的", "身后", "跟着", "我的", "人群", ",", "喜欢", "或", "恨", "不一样的", "神情", "。", "我知道", "这", "可能", "就是", "所谓的", "成名", ",", "我知道", "必须", "往前", "一步一步", ",", "不能", "停", "。"], "ys_log_probs": [], "words": []}
----
num threads: 2
decoding method: greedy_search
Elapsed seconds: 1.510 s
Real time factor (RTF): 1.510 / 8.266 = 0.183
| Wave filename | Ground truth |
|---|---|
| lyrics.wav | 我看到我的身后盯着我的人群,喜欢或恨不一样的神情,我知道这可能就是所谓的成名,我知道必须往前一步也不能停。 |
lyrics_2.wav (Chinese song 2)
To decode the test file ./sherpa-onnx-funasr-nano-int8-2025-12-30/test_wavs/lyrics_2.wav:
./build/bin/sherpa-onnx-offline \
--funasr-nano-encoder-adaptor=./sherpa-onnx-funasr-nano-int8-2025-12-30/encoder_adaptor.int8.onnx \
--funasr-nano-llm=./sherpa-onnx-funasr-nano-int8-2025-12-30/llm.int8.onnx \
--funasr-nano-tokenizer=./sherpa-onnx-funasr-nano-int8-2025-12-30/Qwen3-0.6B \
--funasr-nano-embedding=./sherpa-onnx-funasr-nano-int8-2025-12-30/embedding.int8.onnx \
./sherpa-onnx-funasr-nano-int8-2025-12-30/test_wavs/lyrics_2.wav
You should see the following output:
/Users/fangjun/open-source/sherpa-onnx/sherpa-onnx/csrc/parse-options.cc:Read:373 ./build/bin/sherpa-onnx-offline --funasr-nano-encoder-adaptor=./sherpa-onnx-funasr-nano-int8-2025-12-30/encoder_adaptor.int8.onnx --funasr-nano-llm=./sherpa-onnx-funasr-nano-int8-2025-12-30/llm.int8.onnx --funasr-nano-tokenizer=./sherpa-onnx-funasr-nano-int8-2025-12-30/Qwen3-0.6B --funasr-nano-embedding=./sherpa-onnx-funasr-nano-int8-2025-12-30/embedding.int8.onnx ./sherpa-onnx-funasr-nano-int8-2025-12-30/test_wavs/lyrics_2.wav
OfflineRecognizerConfig(feat_config=FeatureExtractorConfig(sampling_rate=16000, feature_dim=80, low_freq=20, high_freq=-400, dither=0, normalize_samples=True, snip_edges=False), model_config=OfflineModelConfig(transducer=OfflineTransducerModelConfig(encoder_filename="", decoder_filename="", joiner_filename=""), paraformer=OfflineParaformerModelConfig(model=""), nemo_ctc=OfflineNemoEncDecCtcModelConfig(model=""), whisper=OfflineWhisperModelConfig(encoder="", decoder="", language="", task="transcribe", tail_paddings=-1), fire_red_asr=OfflineFireRedAsrModelConfig(encoder="", decoder=""), tdnn=OfflineTdnnModelConfig(model=""), zipformer_ctc=OfflineZipformerCtcModelConfig(model=""), wenet_ctc=OfflineWenetCtcModelConfig(model=""), sense_voice=OfflineSenseVoiceModelConfig(model="", language="auto", use_itn=False), moonshine=OfflineMoonshineModelConfig(preprocessor="", encoder="", uncached_decoder="", cached_decoder=""), dolphin=OfflineDolphinModelConfig(model=""), canary=OfflineCanaryModelConfig(encoder="", decoder="", src_lang="", tgt_lang="", use_pnc=True), omnilingual=OfflineOmnilingualAsrCtcModelConfig(model=""), funasr_nano=OfflineFunASRNanoModelConfig(encoder_adaptor="./sherpa-onnx-funasr-nano-int8-2025-12-30/encoder_adaptor.int8.onnx", llm="./sherpa-onnx-funasr-nano-int8-2025-12-30/llm.int8.onnx", embedding="./sherpa-onnx-funasr-nano-int8-2025-12-30/embedding.int8.onnx", tokenizer="./sherpa-onnx-funasr-nano-int8-2025-12-30/Qwen3-0.6B", system_prompt="You are a helpful assistant.", user_prompt="语音转写:", max_new_tokens=512, temperature=1e-06, top_p=0.8, seed=42), medasr=OfflineMedAsrCtcModelConfig(model=""), telespeech_ctc="", tokens="", num_threads=2, debug=False, provider="cpu", model_type="", modeling_unit="cjkchar", bpe_vocab=""), lm_config=OfflineLMConfig(model="", scale=0.5, lodr_scale=0.01, lodr_fst="", lodr_backoff_id=-1), ctc_fst_decoder_config=OfflineCtcFstDecoderConfig(graph="", max_active=3000), decoding_method="greedy_search", max_active_paths=4, 
hotwords_file="", hotwords_score=1.5, blank_penalty=0, rule_fsts="", rule_fars="", hr=HomophoneReplacerConfig(lexicon="", rule_fsts=""))
Creating recognizer ...
recognizer created in 1.932 s
Started
Done!
./sherpa-onnx-funasr-nano-int8-2025-12-30/test_wavs/lyrics_2.wav
{"lang": "", "emotion": "", "event": "", "text": "明明那么远,为何却感觉离他那么近?闭上眼睛,深深的背出那所有押韵。虽然不听说唱了,但你已学会自信。我代表所有中文说唱歌手向你致敬,如今面对困难的你早已不再抱怨。", "timestamps": [0.00, 0.25, 0.49, 0.74, 0.98, 1.23, 1.47, 1.72, 1.96, 2.21, 2.46, 2.70, 2.95, 3.19, 3.44, 3.68, 3.93, 4.17, 4.42, 4.67, 4.91, 5.16, 5.40, 5.65, 5.89, 6.14, 6.38, 6.63, 6.88, 7.12, 7.37, 7.61, 7.86, 8.10, 8.35, 8.59, 8.84, 9.08, 9.33, 9.58, 9.82, 10.07, 10.31, 10.56, 10.80, 11.05, 11.29, 11.54, 11.79, 12.03, 12.28, 12.52, 12.77, 13.01, 13.26, 13.50], "durations": [], "tokens":["明明", "那么", "远", ",", "为何", "却", "感觉", "离", "他", "那么", "近", "?", "闭", "上", "眼睛", ",", "深深的", "背", "出", "那", "所有", "押", "韵", "。", "虽然", "不", "听说", "唱", "了", ",", "但", "你", "已", "学会", "自信", "。", "我", "代表", "所有", "中文", "说", "唱", "歌手", "向", "你", "致敬", ",", "如今", "面对", "困难", "的", "你", "早已", "不再", "抱怨", "。"], "ys_log_probs": [], "words": []}
----
num threads: 2
decoding method: greedy_search
Elapsed seconds: 2.627 s
Real time factor (RTF): 2.627 / 13.769 = 0.191
| Wave filename | Ground truth |
|---|---|
| lyrics_2.wav | 明明那么远,为何却感觉离他那么近?闭上眼,你甚至能背出他所有押韵。虽然不听说唱了,但你已学会自信。我代表所有中文说唱歌手向你致敬。如今面对困难的你,早已不再抱怨。 |
lyrics_3.wav (Chinese song 3)
To decode the test file ./sherpa-onnx-funasr-nano-int8-2025-12-30/test_wavs/lyrics_3.wav:
./build/bin/sherpa-onnx-offline \
--funasr-nano-encoder-adaptor=./sherpa-onnx-funasr-nano-int8-2025-12-30/encoder_adaptor.int8.onnx \
--funasr-nano-llm=./sherpa-onnx-funasr-nano-int8-2025-12-30/llm.int8.onnx \
--funasr-nano-tokenizer=./sherpa-onnx-funasr-nano-int8-2025-12-30/Qwen3-0.6B \
--funasr-nano-embedding=./sherpa-onnx-funasr-nano-int8-2025-12-30/embedding.int8.onnx \
./sherpa-onnx-funasr-nano-int8-2025-12-30/test_wavs/lyrics_3.wav
You should see the following output:
/Users/fangjun/open-source/sherpa-onnx/sherpa-onnx/csrc/parse-options.cc:Read:373 ./build/bin/sherpa-onnx-offline --funasr-nano-encoder-adaptor=./sherpa-onnx-funasr-nano-int8-2025-12-30/encoder_adaptor.int8.onnx --funasr-nano-llm=./sherpa-onnx-funasr-nano-int8-2025-12-30/llm.int8.onnx --funasr-nano-tokenizer=./sherpa-onnx-funasr-nano-int8-2025-12-30/Qwen3-0.6B --funasr-nano-embedding=./sherpa-onnx-funasr-nano-int8-2025-12-30/embedding.int8.onnx ./sherpa-onnx-funasr-nano-int8-2025-12-30/test_wavs/lyrics_3.wav
OfflineRecognizerConfig(feat_config=FeatureExtractorConfig(sampling_rate=16000, feature_dim=80, low_freq=20, high_freq=-400, dither=0, normalize_samples=True, snip_edges=False), model_config=OfflineModelConfig(transducer=OfflineTransducerModelConfig(encoder_filename="", decoder_filename="", joiner_filename=""), paraformer=OfflineParaformerModelConfig(model=""), nemo_ctc=OfflineNemoEncDecCtcModelConfig(model=""), whisper=OfflineWhisperModelConfig(encoder="", decoder="", language="", task="transcribe", tail_paddings=-1), fire_red_asr=OfflineFireRedAsrModelConfig(encoder="", decoder=""), tdnn=OfflineTdnnModelConfig(model=""), zipformer_ctc=OfflineZipformerCtcModelConfig(model=""), wenet_ctc=OfflineWenetCtcModelConfig(model=""), sense_voice=OfflineSenseVoiceModelConfig(model="", language="auto", use_itn=False), moonshine=OfflineMoonshineModelConfig(preprocessor="", encoder="", uncached_decoder="", cached_decoder=""), dolphin=OfflineDolphinModelConfig(model=""), canary=OfflineCanaryModelConfig(encoder="", decoder="", src_lang="", tgt_lang="", use_pnc=True), omnilingual=OfflineOmnilingualAsrCtcModelConfig(model=""), funasr_nano=OfflineFunASRNanoModelConfig(encoder_adaptor="./sherpa-onnx-funasr-nano-int8-2025-12-30/encoder_adaptor.int8.onnx", llm="./sherpa-onnx-funasr-nano-int8-2025-12-30/llm.int8.onnx", embedding="./sherpa-onnx-funasr-nano-int8-2025-12-30/embedding.int8.onnx", tokenizer="./sherpa-onnx-funasr-nano-int8-2025-12-30/Qwen3-0.6B", system_prompt="You are a helpful assistant.", user_prompt="语音转写:", max_new_tokens=512, temperature=1e-06, top_p=0.8, seed=42), medasr=OfflineMedAsrCtcModelConfig(model=""), telespeech_ctc="", tokens="", num_threads=2, debug=False, provider="cpu", model_type="", modeling_unit="cjkchar", bpe_vocab=""), lm_config=OfflineLMConfig(model="", scale=0.5, lodr_scale=0.01, lodr_fst="", lodr_backoff_id=-1), ctc_fst_decoder_config=OfflineCtcFstDecoderConfig(graph="", max_active=3000), decoding_method="greedy_search", max_active_paths=4, 
hotwords_file="", hotwords_score=1.5, blank_penalty=0, rule_fsts="", rule_fars="", hr=HomophoneReplacerConfig(lexicon="", rule_fsts=""))
Creating recognizer ...
recognizer created in 1.704 s
Started
Done!
./sherpa-onnx-funasr-nano-int8-2025-12-30/test_wavs/lyrics_3.wav
{"lang": "", "emotion": "", "event": "", "text": "你听啊,秋末的落叶,你听它叹息着离别,只剩我独自领略海与山风和月。你听啊。", "timestamps": [0.00, 0.54, 1.09, 1.63, 2.18, 2.72, 3.26, 3.81, 4.35, 4.90, 5.44, 5.98, 6.53, 7.07, 7.62, 8.16, 8.70, 9.25, 9.79, 10.34, 10.88, 11.43, 11.97, 12.51, 13.06, 13.60, 14.15, 14.69, 15.23, 15.78, 16.32, 16.87], "durations": [], "tokens":["你", "听", "啊", ",", "秋", "末", "的", "落叶", ",", "你", "听", "它", "叹息", "着", "离", "别", ",", "只剩", "我", "独自", "领略", "海", "与", "山", "风", "和", "月", "。", "你", "听", "啊", "。"], "ys_log_probs": [], "words": []}
----
num threads: 2
decoding method: greedy_search
Elapsed seconds: 2.744 s
Real time factor (RTF): 2.744 / 17.461 = 0.157
| Wave filename | Ground truth |
|---|---|
| lyrics_3.wav | 你听啊秋末的落叶,你听它叹息着离别,只剩我独自领略海与山风和月,你听啊。 |
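Each decoded wave prints a JSON result like the one above; the `text`, `tokens`, and `timestamps` fields are index-aligned and can be consumed directly. A minimal Python sketch (the JSON string below is abbreviated from the output above):

```python
import json

# Abbreviated JSON result as printed by sherpa-onnx-offline (lists shortened).
result_json = '''
{"lang": "", "emotion": "", "event": "",
 "text": "你听啊,秋末的落叶",
 "timestamps": [0.00, 0.54, 1.09, 1.63, 2.18, 2.72],
 "tokens": ["你", "听", "啊", ",", "秋", "末"]}
'''

result = json.loads(result_json)
print(result["text"])

# Pair each token with its start time (both lists have the same length).
for token, start in zip(result["tokens"], result["timestamps"]):
    print(f"{start:6.2f}s {token}")
```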
lyrics_en_1.wav (English song 1)
To decode the test file ./sherpa-onnx-funasr-nano-int8-2025-12-30/test_wavs/lyrics_en_1.wav:
./build/bin/sherpa-onnx-offline \
--funasr-nano-encoder-adaptor=./sherpa-onnx-funasr-nano-int8-2025-12-30/encoder_adaptor.int8.onnx \
--funasr-nano-llm=./sherpa-onnx-funasr-nano-int8-2025-12-30/llm.int8.onnx \
--funasr-nano-tokenizer=./sherpa-onnx-funasr-nano-int8-2025-12-30/Qwen3-0.6B \
--funasr-nano-embedding=./sherpa-onnx-funasr-nano-int8-2025-12-30/embedding.int8.onnx \
./sherpa-onnx-funasr-nano-int8-2025-12-30/test_wavs/lyrics_en_1.wav
You should see the following output:
/Users/fangjun/open-source/sherpa-onnx/sherpa-onnx/csrc/parse-options.cc:Read:373 ./build/bin/sherpa-onnx-offline --funasr-nano-encoder-adaptor=./sherpa-onnx-funasr-nano-int8-2025-12-30/encoder_adaptor.int8.onnx --funasr-nano-llm=./sherpa-onnx-funasr-nano-int8-2025-12-30/llm.int8.onnx --funasr-nano-tokenizer=./sherpa-onnx-funasr-nano-int8-2025-12-30/Qwen3-0.6B --funasr-nano-embedding=./sherpa-onnx-funasr-nano-int8-2025-12-30/embedding.int8.onnx ./sherpa-onnx-funasr-nano-int8-2025-12-30/test_wavs/lyrics_en_1.wav
OfflineRecognizerConfig(feat_config=FeatureExtractorConfig(sampling_rate=16000, feature_dim=80, low_freq=20, high_freq=-400, dither=0, normalize_samples=True, snip_edges=False), model_config=OfflineModelConfig(transducer=OfflineTransducerModelConfig(encoder_filename="", decoder_filename="", joiner_filename=""), paraformer=OfflineParaformerModelConfig(model=""), nemo_ctc=OfflineNemoEncDecCtcModelConfig(model=""), whisper=OfflineWhisperModelConfig(encoder="", decoder="", language="", task="transcribe", tail_paddings=-1), fire_red_asr=OfflineFireRedAsrModelConfig(encoder="", decoder=""), tdnn=OfflineTdnnModelConfig(model=""), zipformer_ctc=OfflineZipformerCtcModelConfig(model=""), wenet_ctc=OfflineWenetCtcModelConfig(model=""), sense_voice=OfflineSenseVoiceModelConfig(model="", language="auto", use_itn=False), moonshine=OfflineMoonshineModelConfig(preprocessor="", encoder="", uncached_decoder="", cached_decoder=""), dolphin=OfflineDolphinModelConfig(model=""), canary=OfflineCanaryModelConfig(encoder="", decoder="", src_lang="", tgt_lang="", use_pnc=True), omnilingual=OfflineOmnilingualAsrCtcModelConfig(model=""), funasr_nano=OfflineFunASRNanoModelConfig(encoder_adaptor="./sherpa-onnx-funasr-nano-int8-2025-12-30/encoder_adaptor.int8.onnx", llm="./sherpa-onnx-funasr-nano-int8-2025-12-30/llm.int8.onnx", embedding="./sherpa-onnx-funasr-nano-int8-2025-12-30/embedding.int8.onnx", tokenizer="./sherpa-onnx-funasr-nano-int8-2025-12-30/Qwen3-0.6B", system_prompt="You are a helpful assistant.", user_prompt="语音转写:", max_new_tokens=512, temperature=1e-06, top_p=0.8, seed=42), medasr=OfflineMedAsrCtcModelConfig(model=""), telespeech_ctc="", tokens="", num_threads=2, debug=False, provider="cpu", model_type="", modeling_unit="cjkchar", bpe_vocab=""), lm_config=OfflineLMConfig(model="", scale=0.5, lodr_scale=0.01, lodr_fst="", lodr_backoff_id=-1), ctc_fst_decoder_config=OfflineCtcFstDecoderConfig(graph="", max_active=3000), decoding_method="greedy_search", max_active_paths=4, 
hotwords_file="", hotwords_score=1.5, blank_penalty=0, rule_fsts="", rule_fars="", hr=HomophoneReplacerConfig(lexicon="", rule_fsts=""))
Creating recognizer ...
recognizer created in 1.989 s
Started
Done!
./sherpa-onnx-funasr-nano-int8-2025-12-30/test_wavs/lyrics_en_1.wav
{"lang": "", "emotion": "", "event": "", "text": "When I was young, I'd listen to the radio waiting for my favorite songs. When they played, I'd sing along. It made me smile.", "timestamps": [0.00, 0.75, 1.50, 2.25, 3.01, 3.76, 4.51, 5.26, 6.01, 6.76, 7.51, 8.26, 9.02, 9.77, 10.52, 11.27, 12.02, 12.77, 13.52, 14.27, 15.03, 15.78, 16.53, 17.28, 18.03, 18.78, 19.53, 20.28, 21.04, 21.79, 22.54], "durations": [], "tokens":["When", "I", "was", "young", ",", "I", "'d", "listen", "to", "the", "radio", "waiting", "for", "my", "favorite", "songs", ".", "When", "they", "played", ",", "I", "'d", "sing", "along", ".", "It", "made", "me", "smile", "."], "ys_log_probs": [], "words": []}
----
num threads: 2
decoding method: greedy_search
Elapsed seconds: 3.729 s
Real time factor (RTF): 3.729 / 23.341 = 0.160
| Wave filename | Ground truth |
|---|---|
| lyrics_en_1.wav | When I was young I'd listen to the radio. Waiting for my favorite songs. When they played I'd sing along. It made me smile. |
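The real time factor (RTF) reported after each run is the decoding time divided by the audio duration; values below 1 mean decoding runs faster than real time. A quick sketch using the numbers from the run above:

```python
def real_time_factor(elapsed_seconds: float, audio_seconds: float) -> float:
    """RTF = processing time / audio duration; < 1.0 means faster than real time."""
    return elapsed_seconds / audio_seconds

# Numbers from the lyrics_en_1.wav run above.
rtf = real_time_factor(3.729, 23.341)
print(f"RTF: {rtf:.3f}")  # prints RTF: 0.160, matching the log above
```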
lyrics_en_2.wav (English song 2)
To decode the test file ./sherpa-onnx-funasr-nano-int8-2025-12-30/test_wavs/lyrics_en_2.wav:
./build/bin/sherpa-onnx-offline \
--funasr-nano-encoder-adaptor=./sherpa-onnx-funasr-nano-int8-2025-12-30/encoder_adaptor.int8.onnx \
--funasr-nano-llm=./sherpa-onnx-funasr-nano-int8-2025-12-30/llm.int8.onnx \
--funasr-nano-tokenizer=./sherpa-onnx-funasr-nano-int8-2025-12-30/Qwen3-0.6B \
--funasr-nano-embedding=./sherpa-onnx-funasr-nano-int8-2025-12-30/embedding.int8.onnx \
./sherpa-onnx-funasr-nano-int8-2025-12-30/test_wavs/lyrics_en_2.wav
You should see the following output:
/Users/fangjun/open-source/sherpa-onnx/sherpa-onnx/csrc/parse-options.cc:Read:373 ./build/bin/sherpa-onnx-offline --funasr-nano-encoder-adaptor=./sherpa-onnx-funasr-nano-int8-2025-12-30/encoder_adaptor.int8.onnx --funasr-nano-llm=./sherpa-onnx-funasr-nano-int8-2025-12-30/llm.int8.onnx --funasr-nano-tokenizer=./sherpa-onnx-funasr-nano-int8-2025-12-30/Qwen3-0.6B --funasr-nano-embedding=./sherpa-onnx-funasr-nano-int8-2025-12-30/embedding.int8.onnx ./sherpa-onnx-funasr-nano-int8-2025-12-30/test_wavs/lyrics_en_2.wav
OfflineRecognizerConfig(feat_config=FeatureExtractorConfig(sampling_rate=16000, feature_dim=80, low_freq=20, high_freq=-400, dither=0, normalize_samples=True, snip_edges=False), model_config=OfflineModelConfig(transducer=OfflineTransducerModelConfig(encoder_filename="", decoder_filename="", joiner_filename=""), paraformer=OfflineParaformerModelConfig(model=""), nemo_ctc=OfflineNemoEncDecCtcModelConfig(model=""), whisper=OfflineWhisperModelConfig(encoder="", decoder="", language="", task="transcribe", tail_paddings=-1), fire_red_asr=OfflineFireRedAsrModelConfig(encoder="", decoder=""), tdnn=OfflineTdnnModelConfig(model=""), zipformer_ctc=OfflineZipformerCtcModelConfig(model=""), wenet_ctc=OfflineWenetCtcModelConfig(model=""), sense_voice=OfflineSenseVoiceModelConfig(model="", language="auto", use_itn=False), moonshine=OfflineMoonshineModelConfig(preprocessor="", encoder="", uncached_decoder="", cached_decoder=""), dolphin=OfflineDolphinModelConfig(model=""), canary=OfflineCanaryModelConfig(encoder="", decoder="", src_lang="", tgt_lang="", use_pnc=True), omnilingual=OfflineOmnilingualAsrCtcModelConfig(model=""), funasr_nano=OfflineFunASRNanoModelConfig(encoder_adaptor="./sherpa-onnx-funasr-nano-int8-2025-12-30/encoder_adaptor.int8.onnx", llm="./sherpa-onnx-funasr-nano-int8-2025-12-30/llm.int8.onnx", embedding="./sherpa-onnx-funasr-nano-int8-2025-12-30/embedding.int8.onnx", tokenizer="./sherpa-onnx-funasr-nano-int8-2025-12-30/Qwen3-0.6B", system_prompt="You are a helpful assistant.", user_prompt="语音转写:", max_new_tokens=512, temperature=1e-06, top_p=0.8, seed=42), medasr=OfflineMedAsrCtcModelConfig(model=""), telespeech_ctc="", tokens="", num_threads=2, debug=False, provider="cpu", model_type="", modeling_unit="cjkchar", bpe_vocab=""), lm_config=OfflineLMConfig(model="", scale=0.5, lodr_scale=0.01, lodr_fst="", lodr_backoff_id=-1), ctc_fst_decoder_config=OfflineCtcFstDecoderConfig(graph="", max_active=3000), decoding_method="greedy_search", max_active_paths=4, 
hotwords_file="", hotwords_score=1.5, blank_penalty=0, rule_fsts="", rule_fars="", hr=HomophoneReplacerConfig(lexicon="", rule_fsts=""))
Creating recognizer ...
recognizer created in 1.707 s
Started
Done!
./sherpa-onnx-funasr-nano-int8-2025-12-30/test_wavs/lyrics_en_2.wav
{"lang": "", "emotion": "", "event": "", "text": "I see your monsters, I see your pain, tell me your problems, I'll chase them away. I'll be your lighthouse, I'll make it okay. When I see your monsters, I'll stand there so brave and chase them all away.", "timestamps": [0.00, 0.42, 0.83, 1.25, 1.67, 2.08, 2.50, 2.92, 3.33, 3.75, 4.17, 4.58, 5.00, 5.42, 5.83, 6.25, 6.67, 7.08, 7.50, 7.92, 8.33, 8.75, 9.17, 9.58, 10.00, 10.42, 10.84, 11.25, 11.67, 12.09, 12.50, 12.92, 13.34, 13.75, 14.17, 14.59, 15.00, 15.42, 15.84, 16.25, 16.67, 17.09, 17.50, 17.92, 18.34, 18.75, 19.17, 19.59, 20.00, 20.42, 20.84, 21.25], "durations": [], "tokens":["I", "see", "your", "monsters", ",", "I", "see", "your", "pain", ",", "tell", "me", "your", "problems", ",", "I", "'ll", "chase", "them", "away", ".", "I", "'ll", "be", "your", "l", "ighthouse", ",", "I", "'ll", "make", "it", "okay", ".", "When", "I", "see", "your", "monsters", ",", "I", "'ll", "stand", "there", "so", "brave", "and", "chase", "them", "all", "away", "."], "ys_log_probs": [], "words": []}
----
num threads: 2
decoding method: greedy_search
Elapsed seconds: 3.741 s
Real time factor (RTF): 3.741 / 21.711 = 0.172
| Wave filename | Ground truth |
|---|---|
| lyrics_en_2.wav | I see your monsters. I see your pain. Tell me your problems; I'll chase them away. I'll be your lighthouse. I'll make it okay. When I see your monsters, I'll stand there so brave and chase them all away. |
lyrics_en_3.wav (English song 3)
To decode the test file ./sherpa-onnx-funasr-nano-int8-2025-12-30/test_wavs/lyrics_en_3.wav:
./build/bin/sherpa-onnx-offline \
--funasr-nano-encoder-adaptor=./sherpa-onnx-funasr-nano-int8-2025-12-30/encoder_adaptor.int8.onnx \
--funasr-nano-llm=./sherpa-onnx-funasr-nano-int8-2025-12-30/llm.int8.onnx \
--funasr-nano-tokenizer=./sherpa-onnx-funasr-nano-int8-2025-12-30/Qwen3-0.6B \
--funasr-nano-embedding=./sherpa-onnx-funasr-nano-int8-2025-12-30/embedding.int8.onnx \
./sherpa-onnx-funasr-nano-int8-2025-12-30/test_wavs/lyrics_en_3.wav
You should see the following output:
/Users/fangjun/open-source/sherpa-onnx/sherpa-onnx/csrc/parse-options.cc:Read:373 ./build/bin/sherpa-onnx-offline --funasr-nano-encoder-adaptor=./sherpa-onnx-funasr-nano-int8-2025-12-30/encoder_adaptor.int8.onnx --funasr-nano-llm=./sherpa-onnx-funasr-nano-int8-2025-12-30/llm.int8.onnx --funasr-nano-tokenizer=./sherpa-onnx-funasr-nano-int8-2025-12-30/Qwen3-0.6B --funasr-nano-embedding=./sherpa-onnx-funasr-nano-int8-2025-12-30/embedding.int8.onnx ./sherpa-onnx-funasr-nano-int8-2025-12-30/test_wavs/lyrics_en_3.wav
OfflineRecognizerConfig(feat_config=FeatureExtractorConfig(sampling_rate=16000, feature_dim=80, low_freq=20, high_freq=-400, dither=0, normalize_samples=True, snip_edges=False), model_config=OfflineModelConfig(transducer=OfflineTransducerModelConfig(encoder_filename="", decoder_filename="", joiner_filename=""), paraformer=OfflineParaformerModelConfig(model=""), nemo_ctc=OfflineNemoEncDecCtcModelConfig(model=""), whisper=OfflineWhisperModelConfig(encoder="", decoder="", language="", task="transcribe", tail_paddings=-1), fire_red_asr=OfflineFireRedAsrModelConfig(encoder="", decoder=""), tdnn=OfflineTdnnModelConfig(model=""), zipformer_ctc=OfflineZipformerCtcModelConfig(model=""), wenet_ctc=OfflineWenetCtcModelConfig(model=""), sense_voice=OfflineSenseVoiceModelConfig(model="", language="auto", use_itn=False), moonshine=OfflineMoonshineModelConfig(preprocessor="", encoder="", uncached_decoder="", cached_decoder=""), dolphin=OfflineDolphinModelConfig(model=""), canary=OfflineCanaryModelConfig(encoder="", decoder="", src_lang="", tgt_lang="", use_pnc=True), omnilingual=OfflineOmnilingualAsrCtcModelConfig(model=""), funasr_nano=OfflineFunASRNanoModelConfig(encoder_adaptor="./sherpa-onnx-funasr-nano-int8-2025-12-30/encoder_adaptor.int8.onnx", llm="./sherpa-onnx-funasr-nano-int8-2025-12-30/llm.int8.onnx", embedding="./sherpa-onnx-funasr-nano-int8-2025-12-30/embedding.int8.onnx", tokenizer="./sherpa-onnx-funasr-nano-int8-2025-12-30/Qwen3-0.6B", system_prompt="You are a helpful assistant.", user_prompt="语音转写:", max_new_tokens=512, temperature=1e-06, top_p=0.8, seed=42), medasr=OfflineMedAsrCtcModelConfig(model=""), telespeech_ctc="", tokens="", num_threads=2, debug=False, provider="cpu", model_type="", modeling_unit="cjkchar", bpe_vocab=""), lm_config=OfflineLMConfig(model="", scale=0.5, lodr_scale=0.01, lodr_fst="", lodr_backoff_id=-1), ctc_fst_decoder_config=OfflineCtcFstDecoderConfig(graph="", max_active=3000), decoding_method="greedy_search", max_active_paths=4, 
hotwords_file="", hotwords_score=1.5, blank_penalty=0, rule_fsts="", rule_fars="", hr=HomophoneReplacerConfig(lexicon="", rule_fsts=""))
Creating recognizer ...
recognizer created in 1.758 s
Started
Done!
./sherpa-onnx-funasr-nano-int8-2025-12-30/test_wavs/lyrics_en_3.wav
{"lang": "", "emotion": "", "event": "", "text": "An empty street, an empty house, a hole inside my heart. I'm all alone, the rooms are getting smaller. I wonder how, I wonder why, I wonder where they are. The days we had, the songs we sang together.", "timestamps": [0.00, 0.50, 0.99, 1.49, 1.99, 2.49, 2.98, 3.48, 3.98, 4.47, 4.97, 5.47, 5.96, 6.46, 6.96, 7.45, 7.95, 8.45, 8.95, 9.44, 9.94, 10.44, 10.93, 11.43, 11.93, 12.43, 12.92, 13.42, 13.92, 14.41, 14.91, 15.41, 15.90, 16.40, 16.90, 17.40, 17.89, 18.39, 18.89, 19.38, 19.88, 20.38, 20.87, 21.37, 21.87, 22.36, 22.86, 23.36, 23.86, 24.35], "durations": [], "tokens":["An", "empty", "street", ",", "an", "empty", "house", ",", "a", "hole", "inside", "my", "heart", ".", "I", "'m", "all", "alone", ",", "the", "rooms", "are", "getting", "smaller", ".", "I", "wonder", "how", ",", "I", "wonder", "why", ",", "I", "wonder", "where", "they", "are", ".", "The", "days", "we", "had", ",", "the", "songs", "we", "sang", "together", "."], "ys_log_probs": [], "words": []}
----
num threads: 2
decoding method: greedy_search
Elapsed seconds: 4.418 s
Real time factor (RTF): 4.418 / 24.882 = 0.178
| Wave filename | Ground truth |
|---|---|
| lyrics_en_3.wav | An empty street, an empty house, a hole inside my heart. I'm all alone and the rooms are getting smaller. I wonder how, I wonder why, I wonder where they are. The days we had, the songs we sang together. |
noise_en.wav (noisy English speech)
To decode the test file ./sherpa-onnx-funasr-nano-int8-2025-12-30/test_wavs/noise_en.wav:
./build/bin/sherpa-onnx-offline \
--funasr-nano-encoder-adaptor=./sherpa-onnx-funasr-nano-int8-2025-12-30/encoder_adaptor.int8.onnx \
--funasr-nano-llm=./sherpa-onnx-funasr-nano-int8-2025-12-30/llm.int8.onnx \
--funasr-nano-tokenizer=./sherpa-onnx-funasr-nano-int8-2025-12-30/Qwen3-0.6B \
--funasr-nano-embedding=./sherpa-onnx-funasr-nano-int8-2025-12-30/embedding.int8.onnx \
./sherpa-onnx-funasr-nano-int8-2025-12-30/test_wavs/noise_en.wav
You should see the following output:
/Users/fangjun/open-source/sherpa-onnx/sherpa-onnx/csrc/parse-options.cc:Read:373 ./build/bin/sherpa-onnx-offline --funasr-nano-encoder-adaptor=./sherpa-onnx-funasr-nano-int8-2025-12-30/encoder_adaptor.int8.onnx --funasr-nano-llm=./sherpa-onnx-funasr-nano-int8-2025-12-30/llm.int8.onnx --funasr-nano-tokenizer=./sherpa-onnx-funasr-nano-int8-2025-12-30/Qwen3-0.6B --funasr-nano-embedding=./sherpa-onnx-funasr-nano-int8-2025-12-30/embedding.int8.onnx ./sherpa-onnx-funasr-nano-int8-2025-12-30/test_wavs/noise_en.wav
OfflineRecognizerConfig(feat_config=FeatureExtractorConfig(sampling_rate=16000, feature_dim=80, low_freq=20, high_freq=-400, dither=0, normalize_samples=True, snip_edges=False), model_config=OfflineModelConfig(transducer=OfflineTransducerModelConfig(encoder_filename="", decoder_filename="", joiner_filename=""), paraformer=OfflineParaformerModelConfig(model=""), nemo_ctc=OfflineNemoEncDecCtcModelConfig(model=""), whisper=OfflineWhisperModelConfig(encoder="", decoder="", language="", task="transcribe", tail_paddings=-1), fire_red_asr=OfflineFireRedAsrModelConfig(encoder="", decoder=""), tdnn=OfflineTdnnModelConfig(model=""), zipformer_ctc=OfflineZipformerCtcModelConfig(model=""), wenet_ctc=OfflineWenetCtcModelConfig(model=""), sense_voice=OfflineSenseVoiceModelConfig(model="", language="auto", use_itn=False), moonshine=OfflineMoonshineModelConfig(preprocessor="", encoder="", uncached_decoder="", cached_decoder=""), dolphin=OfflineDolphinModelConfig(model=""), canary=OfflineCanaryModelConfig(encoder="", decoder="", src_lang="", tgt_lang="", use_pnc=True), omnilingual=OfflineOmnilingualAsrCtcModelConfig(model=""), funasr_nano=OfflineFunASRNanoModelConfig(encoder_adaptor="./sherpa-onnx-funasr-nano-int8-2025-12-30/encoder_adaptor.int8.onnx", llm="./sherpa-onnx-funasr-nano-int8-2025-12-30/llm.int8.onnx", embedding="./sherpa-onnx-funasr-nano-int8-2025-12-30/embedding.int8.onnx", tokenizer="./sherpa-onnx-funasr-nano-int8-2025-12-30/Qwen3-0.6B", system_prompt="You are a helpful assistant.", user_prompt="语音转写:", max_new_tokens=512, temperature=1e-06, top_p=0.8, seed=42), medasr=OfflineMedAsrCtcModelConfig(model=""), telespeech_ctc="", tokens="", num_threads=2, debug=False, provider="cpu", model_type="", modeling_unit="cjkchar", bpe_vocab=""), lm_config=OfflineLMConfig(model="", scale=0.5, lodr_scale=0.01, lodr_fst="", lodr_backoff_id=-1), ctc_fst_decoder_config=OfflineCtcFstDecoderConfig(graph="", max_active=3000), decoding_method="greedy_search", max_active_paths=4, 
hotwords_file="", hotwords_score=1.5, blank_penalty=0, rule_fsts="", rule_fars="", hr=HomophoneReplacerConfig(lexicon="", rule_fsts=""))
Creating recognizer ...
recognizer created in 1.661 s
Started
Done!
./sherpa-onnx-funasr-nano-int8-2025-12-30/test_wavs/noise_en.wav
{"lang": "", "emotion": "", "event": "", "text": "So what's interesting here is, I feel that you know brands knowing this when people sort of speak to the voice assistants at home, and if you want to be the brand.", "timestamps": [0.00, 0.29, 0.59, 0.88, 1.17, 1.47, 1.76, 2.06, 2.35, 2.64, 2.94, 3.23, 3.52, 3.82, 4.11, 4.40, 4.70, 4.99, 5.28, 5.58, 5.87, 6.17, 6.46, 6.75, 7.05, 7.34, 7.63, 7.93, 8.22, 8.51, 8.81, 9.10, 9.40, 9.69, 9.98, 10.28], "durations": [], "tokens":["So", "what", "'s", "interesting", "here", "is", ",", "I", "feel", "that", "you", "know", "brands", "knowing", "this", "when", "people", "sort", "of", "speak", "to", "the", "voice", "assistants", "at", "home", ",", "and", "if", "you", "want", "to", "be", "the", "brand", "."], "ys_log_probs": [], "words": []}
----
num threads: 2
decoding method: greedy_search
Elapsed seconds: 1.968 s
Real time factor (RTF): 1.968 / 10.568 = 0.186
| Wave filename | Ground truth |
|---|---|
| noise_en.wav | So what's interesting here is I feel that you know brands knowing this when people sort of speak to the voice assistants at home and if you want to be the brand. |
far_2.wav
To decode the test file ./sherpa-onnx-funasr-nano-int8-2025-12-30/test_wavs/far_2.wav:
./build/bin/sherpa-onnx-offline \
--funasr-nano-encoder-adaptor=./sherpa-onnx-funasr-nano-int8-2025-12-30/encoder_adaptor.int8.onnx \
--funasr-nano-llm=./sherpa-onnx-funasr-nano-int8-2025-12-30/llm.int8.onnx \
--funasr-nano-tokenizer=./sherpa-onnx-funasr-nano-int8-2025-12-30/Qwen3-0.6B \
--funasr-nano-embedding=./sherpa-onnx-funasr-nano-int8-2025-12-30/embedding.int8.onnx \
./sherpa-onnx-funasr-nano-int8-2025-12-30/test_wavs/far_2.wav
You should see the following output:
/Users/fangjun/open-source/sherpa-onnx/sherpa-onnx/csrc/parse-options.cc:Read:373 ./build/bin/sherpa-onnx-offline --funasr-nano-encoder-adaptor=./sherpa-onnx-funasr-nano-int8-2025-12-30/encoder_adaptor.int8.onnx --funasr-nano-llm=./sherpa-onnx-funasr-nano-int8-2025-12-30/llm.int8.onnx --funasr-nano-tokenizer=./sherpa-onnx-funasr-nano-int8-2025-12-30/Qwen3-0.6B --funasr-nano-embedding=./sherpa-onnx-funasr-nano-int8-2025-12-30/embedding.int8.onnx ./sherpa-onnx-funasr-nano-int8-2025-12-30/test_wavs/far_2.wav
OfflineRecognizerConfig(feat_config=FeatureExtractorConfig(sampling_rate=16000, feature_dim=80, low_freq=20, high_freq=-400, dither=0, normalize_samples=True, snip_edges=False), model_config=OfflineModelConfig(transducer=OfflineTransducerModelConfig(encoder_filename="", decoder_filename="", joiner_filename=""), paraformer=OfflineParaformerModelConfig(model=""), nemo_ctc=OfflineNemoEncDecCtcModelConfig(model=""), whisper=OfflineWhisperModelConfig(encoder="", decoder="", language="", task="transcribe", tail_paddings=-1), fire_red_asr=OfflineFireRedAsrModelConfig(encoder="", decoder=""), tdnn=OfflineTdnnModelConfig(model=""), zipformer_ctc=OfflineZipformerCtcModelConfig(model=""), wenet_ctc=OfflineWenetCtcModelConfig(model=""), sense_voice=OfflineSenseVoiceModelConfig(model="", language="auto", use_itn=False), moonshine=OfflineMoonshineModelConfig(preprocessor="", encoder="", uncached_decoder="", cached_decoder=""), dolphin=OfflineDolphinModelConfig(model=""), canary=OfflineCanaryModelConfig(encoder="", decoder="", src_lang="", tgt_lang="", use_pnc=True), omnilingual=OfflineOmnilingualAsrCtcModelConfig(model=""), funasr_nano=OfflineFunASRNanoModelConfig(encoder_adaptor="./sherpa-onnx-funasr-nano-int8-2025-12-30/encoder_adaptor.int8.onnx", llm="./sherpa-onnx-funasr-nano-int8-2025-12-30/llm.int8.onnx", embedding="./sherpa-onnx-funasr-nano-int8-2025-12-30/embedding.int8.onnx", tokenizer="./sherpa-onnx-funasr-nano-int8-2025-12-30/Qwen3-0.6B", system_prompt="You are a helpful assistant.", user_prompt="语音转写:", max_new_tokens=512, temperature=1e-06, top_p=0.8, seed=42), medasr=OfflineMedAsrCtcModelConfig(model=""), telespeech_ctc="", tokens="", num_threads=2, debug=False, provider="cpu", model_type="", modeling_unit="cjkchar", bpe_vocab=""), lm_config=OfflineLMConfig(model="", scale=0.5, lodr_scale=0.01, lodr_fst="", lodr_backoff_id=-1), ctc_fst_decoder_config=OfflineCtcFstDecoderConfig(graph="", max_active=3000), decoding_method="greedy_search", max_active_paths=4, 
hotwords_file="", hotwords_score=1.5, blank_penalty=0, rule_fsts="", rule_fars="", hr=HomophoneReplacerConfig(lexicon="", rule_fsts=""))
Creating recognizer ...
recognizer created in 1.844 s
Started
Done!
./sherpa-onnx-funasr-nano-int8-2025-12-30/test_wavs/far_2.wav
{"lang": "", "emotion": "", "event": "", "text": "然后被灌顶了渣南县的城防,跑了半截的。那么前方即将到达省公路站,左边是八号线。", "timestamps": [0.00, 0.23, 0.46, 0.68, 0.91, 1.14, 1.37, 1.60, 1.83, 2.05, 2.28, 2.51, 2.74, 2.97, 3.20, 3.42, 3.65, 3.88, 4.11, 4.34, 4.57, 4.79, 5.02, 5.25, 5.48, 5.71, 5.94, 6.16, 6.39, 6.62], "durations": [], "tokens":["然后", "被", "灌", "顶", "了", "渣", "南", "县", "的", "城", "防", ",", "跑了", "半", "截", "的", "。", "那么", "前方", "即将", "到达", "省", "公路", "站", ",", "左边", "是", "八", "号线", "。"], "ys_log_probs": [], "words": []}
----
num threads: 2
decoding method: greedy_search
Elapsed seconds: 1.296 s
Real time factor (RTF): 1.296 / 6.850 = 0.189
| Wave filename | Ground truth |
|---|---|
| far_2.wav | 然后被冠以了渣男线的称号,好了,不管这个,那么前方即将到达沈杜公路站,左边是8号线。 |
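The far-field recordings (far_2.wav in particular) show noticeable differences between the recognized text and the ground truth. Character error rate (CER), the usual metric for Chinese ASR, is the Levenshtein distance between the hypothesis and the reference divided by the reference length. A minimal sketch, not part of sherpa-onnx, shown only to illustrate the metric:

```python
def levenshtein(ref: str, hyp: str) -> int:
    """Edit distance between two strings via dynamic programming."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (r != h)))    # substitution
        prev = cur
    return prev[-1]

def cer(ref: str, hyp: str) -> float:
    """Character error rate = edit distance / reference length."""
    return levenshtein(ref, hyp) / len(ref)

# Fragments from the far_2.wav example above (reference vs. hypothesis).
ref = "前方即将到达沈杜公路站"
hyp = "前方即将到达省公路站"
print(f"CER: {cer(ref, hyp):.2f}")  # prints CER: 0.18
```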
far_3.wav
To decode the test file ./sherpa-onnx-funasr-nano-int8-2025-12-30/test_wavs/far_3.wav:
./build/bin/sherpa-onnx-offline \
--funasr-nano-encoder-adaptor=./sherpa-onnx-funasr-nano-int8-2025-12-30/encoder_adaptor.int8.onnx \
--funasr-nano-llm=./sherpa-onnx-funasr-nano-int8-2025-12-30/llm.int8.onnx \
--funasr-nano-tokenizer=./sherpa-onnx-funasr-nano-int8-2025-12-30/Qwen3-0.6B \
--funasr-nano-embedding=./sherpa-onnx-funasr-nano-int8-2025-12-30/embedding.int8.onnx \
./sherpa-onnx-funasr-nano-int8-2025-12-30/test_wavs/far_3.wav
You should see the following output:
/Users/fangjun/open-source/sherpa-onnx/sherpa-onnx/csrc/parse-options.cc:Read:373 ./build/bin/sherpa-onnx-offline --funasr-nano-encoder-adaptor=./sherpa-onnx-funasr-nano-int8-2025-12-30/encoder_adaptor.int8.onnx --funasr-nano-llm=./sherpa-onnx-funasr-nano-int8-2025-12-30/llm.int8.onnx --funasr-nano-tokenizer=./sherpa-onnx-funasr-nano-int8-2025-12-30/Qwen3-0.6B --funasr-nano-embedding=./sherpa-onnx-funasr-nano-int8-2025-12-30/embedding.int8.onnx ./sherpa-onnx-funasr-nano-int8-2025-12-30/test_wavs/far_3.wav
OfflineRecognizerConfig(feat_config=FeatureExtractorConfig(sampling_rate=16000, feature_dim=80, low_freq=20, high_freq=-400, dither=0, normalize_samples=True, snip_edges=False), model_config=OfflineModelConfig(transducer=OfflineTransducerModelConfig(encoder_filename="", decoder_filename="", joiner_filename=""), paraformer=OfflineParaformerModelConfig(model=""), nemo_ctc=OfflineNemoEncDecCtcModelConfig(model=""), whisper=OfflineWhisperModelConfig(encoder="", decoder="", language="", task="transcribe", tail_paddings=-1), fire_red_asr=OfflineFireRedAsrModelConfig(encoder="", decoder=""), tdnn=OfflineTdnnModelConfig(model=""), zipformer_ctc=OfflineZipformerCtcModelConfig(model=""), wenet_ctc=OfflineWenetCtcModelConfig(model=""), sense_voice=OfflineSenseVoiceModelConfig(model="", language="auto", use_itn=False), moonshine=OfflineMoonshineModelConfig(preprocessor="", encoder="", uncached_decoder="", cached_decoder=""), dolphin=OfflineDolphinModelConfig(model=""), canary=OfflineCanaryModelConfig(encoder="", decoder="", src_lang="", tgt_lang="", use_pnc=True), omnilingual=OfflineOmnilingualAsrCtcModelConfig(model=""), funasr_nano=OfflineFunASRNanoModelConfig(encoder_adaptor="./sherpa-onnx-funasr-nano-int8-2025-12-30/encoder_adaptor.int8.onnx", llm="./sherpa-onnx-funasr-nano-int8-2025-12-30/llm.int8.onnx", embedding="./sherpa-onnx-funasr-nano-int8-2025-12-30/embedding.int8.onnx", tokenizer="./sherpa-onnx-funasr-nano-int8-2025-12-30/Qwen3-0.6B", system_prompt="You are a helpful assistant.", user_prompt="语音转写:", max_new_tokens=512, temperature=1e-06, top_p=0.8, seed=42), medasr=OfflineMedAsrCtcModelConfig(model=""), telespeech_ctc="", tokens="", num_threads=2, debug=False, provider="cpu", model_type="", modeling_unit="cjkchar", bpe_vocab=""), lm_config=OfflineLMConfig(model="", scale=0.5, lodr_scale=0.01, lodr_fst="", lodr_backoff_id=-1), ctc_fst_decoder_config=OfflineCtcFstDecoderConfig(graph="", max_active=3000), decoding_method="greedy_search", max_active_paths=4, 
hotwords_file="", hotwords_score=1.5, blank_penalty=0, rule_fsts="", rule_fars="", hr=HomophoneReplacerConfig(lexicon="", rule_fsts=""))
Creating recognizer ...
recognizer created in 1.673 s
Started
Done!
./sherpa-onnx-funasr-nano-int8-2025-12-30/test_wavs/far_3.wav
{"lang": "", "emotion": "", "event": "", "text": "周末要不要去露营?最近天气超舒服,露营我怕虫子咬,而且晚上睡帐篷会不会很冷啊?放心,我借了专业装备,还有暖宝宝,再带点火锅食材,边吃边看星星超惬意。", "timestamps": [0.00, 0.40, 0.81, 1.21, 1.61, 2.02, 2.42, 2.82, 3.23, 3.63, 4.04, 4.44, 4.84, 5.25, 5.65, 6.05, 6.46, 6.86, 7.26, 7.67, 8.07, 8.47, 8.88, 9.28, 9.68, 10.09, 10.49, 10.90, 11.30, 11.70, 12.11, 12.51, 12.91, 13.32, 13.72, 14.12, 14.53, 14.93, 15.33, 15.74, 16.14, 16.54, 16.95, 17.35, 17.75, 18.16, 18.56, 18.97, 19.37, 19.77, 20.18, 20.58, 20.98, 21.39], "durations": [], "tokens":["周末", "要不要", "去", "露", "营", "?", "最近", "天气", "超", "舒服", ",", "露", "营", "我", "怕", "虫", "子", "咬", ",", "而且", "晚上", "睡", "帐篷", "会不会", "很", "冷", "啊", "?", "放心", ",", "我", "借", "了", "专业", "装备", ",", "还有", "暖", "宝宝", ",", "再", "带", "点", "火锅", "食材", ",", "边", "吃", "边", "看", "星星", "超", "惬意", "。"], "ys_log_probs": [], "words": []}
----
num threads: 2
decoding method: greedy_search
Elapsed seconds: 3.809 s
Real time factor (RTF): 3.809 / 21.804 = 0.175
| Wave filename | Ground truth |
|---|---|
| far_3.wav | 周末要不要去露营?最近天气超舒服。露营?我怕虫子咬,而且晚上睡帐篷会不会很冷啊?放心,我借了专业装备还有暖宝宝,再带点火锅食材,边吃边看星星超惬意。 |
far_4.wav
To decode the test file ./sherpa-onnx-funasr-nano-int8-2025-12-30/test_wavs/far_4.wav:
./build/bin/sherpa-onnx-offline \
--funasr-nano-encoder-adaptor=./sherpa-onnx-funasr-nano-int8-2025-12-30/encoder_adaptor.int8.onnx \
--funasr-nano-llm=./sherpa-onnx-funasr-nano-int8-2025-12-30/llm.int8.onnx \
--funasr-nano-tokenizer=./sherpa-onnx-funasr-nano-int8-2025-12-30/Qwen3-0.6B \
--funasr-nano-embedding=./sherpa-onnx-funasr-nano-int8-2025-12-30/embedding.int8.onnx \
./sherpa-onnx-funasr-nano-int8-2025-12-30/test_wavs/far_4.wav
You should see the following output:
/Users/fangjun/open-source/sherpa-onnx/sherpa-onnx/csrc/parse-options.cc:Read:373 ./build/bin/sherpa-onnx-offline --funasr-nano-encoder-adaptor=./sherpa-onnx-funasr-nano-int8-2025-12-30/encoder_adaptor.int8.onnx --funasr-nano-llm=./sherpa-onnx-funasr-nano-int8-2025-12-30/llm.int8.onnx --funasr-nano-tokenizer=./sherpa-onnx-funasr-nano-int8-2025-12-30/Qwen3-0.6B --funasr-nano-embedding=./sherpa-onnx-funasr-nano-int8-2025-12-30/embedding.int8.onnx ./sherpa-onnx-funasr-nano-int8-2025-12-30/test_wavs/far_4.wav
OfflineRecognizerConfig(feat_config=FeatureExtractorConfig(sampling_rate=16000, feature_dim=80, low_freq=20, high_freq=-400, dither=0, normalize_samples=True, snip_edges=False), model_config=OfflineModelConfig(transducer=OfflineTransducerModelConfig(encoder_filename="", decoder_filename="", joiner_filename=""), paraformer=OfflineParaformerModelConfig(model=""), nemo_ctc=OfflineNemoEncDecCtcModelConfig(model=""), whisper=OfflineWhisperModelConfig(encoder="", decoder="", language="", task="transcribe", tail_paddings=-1), fire_red_asr=OfflineFireRedAsrModelConfig(encoder="", decoder=""), tdnn=OfflineTdnnModelConfig(model=""), zipformer_ctc=OfflineZipformerCtcModelConfig(model=""), wenet_ctc=OfflineWenetCtcModelConfig(model=""), sense_voice=OfflineSenseVoiceModelConfig(model="", language="auto", use_itn=False), moonshine=OfflineMoonshineModelConfig(preprocessor="", encoder="", uncached_decoder="", cached_decoder=""), dolphin=OfflineDolphinModelConfig(model=""), canary=OfflineCanaryModelConfig(encoder="", decoder="", src_lang="", tgt_lang="", use_pnc=True), omnilingual=OfflineOmnilingualAsrCtcModelConfig(model=""), funasr_nano=OfflineFunASRNanoModelConfig(encoder_adaptor="./sherpa-onnx-funasr-nano-int8-2025-12-30/encoder_adaptor.int8.onnx", llm="./sherpa-onnx-funasr-nano-int8-2025-12-30/llm.int8.onnx", embedding="./sherpa-onnx-funasr-nano-int8-2025-12-30/embedding.int8.onnx", tokenizer="./sherpa-onnx-funasr-nano-int8-2025-12-30/Qwen3-0.6B", system_prompt="You are a helpful assistant.", user_prompt="语音转写:", max_new_tokens=512, temperature=1e-06, top_p=0.8, seed=42), medasr=OfflineMedAsrCtcModelConfig(model=""), telespeech_ctc="", tokens="", num_threads=2, debug=False, provider="cpu", model_type="", modeling_unit="cjkchar", bpe_vocab=""), lm_config=OfflineLMConfig(model="", scale=0.5, lodr_scale=0.01, lodr_fst="", lodr_backoff_id=-1), ctc_fst_decoder_config=OfflineCtcFstDecoderConfig(graph="", max_active=3000), decoding_method="greedy_search", max_active_paths=4, 
hotwords_file="", hotwords_score=1.5, blank_penalty=0, rule_fsts="", rule_fars="", hr=HomophoneReplacerConfig(lexicon="", rule_fsts=""))
Creating recognizer ...
recognizer created in 1.859 s
Started
Done!
./sherpa-onnx-funasr-nano-int8-2025-12-30/test_wavs/far_4.wav
{"lang": "", "emotion": "", "event": "", "text": "唯一的遗憾就是他那个八宝鸭还有烤鸭子没吃上,估计得提前预定吧,只能怪我自己没有做好功课。", "timestamps": [0.00, 0.31, 0.63, 0.94, 1.25, 1.56, 1.88, 2.19, 2.50, 2.81, 3.13, 3.44, 3.75, 4.07, 4.38, 4.69, 5.00, 5.32, 5.63, 5.94, 6.26, 6.57, 6.88, 7.19, 7.51, 7.82, 8.13, 8.44, 8.76], "durations": [], "tokens":["唯一的", "遗憾", "就是", "他", "那个", "八", "宝", "鸭", "还有", "烤", "鸭", "子", "没", "吃", "上", ",", "估计", "得", "提前", "预定", "吧", ",", "只能", "怪", "我自己", "没有", "做好", "功课", "。"], "ys_log_probs": [], "words": []}
----
num threads: 2
decoding method: greedy_search
Elapsed seconds: 1.570 s
Real time factor (RTF): 1.570 / 9.079 = 0.173
| Wave filename | Content | Ground truth |
|---|---|---|
| far_4.wav | 唯一的遗憾就是他那个八宝鸭还有烤鸭子没吃上,估计得提前预定吧,只能怪我自己没有做好功课。 | 唯一的遗憾就是他那个八宝鸭还有烤鸭都没吃上, 估计得提前预定吧, 只能怪我自己没有做好功课. |
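The real time factor (RTF) reported in the log is simply the decoding time divided by the audio duration; values below 1 mean faster-than-real-time decoding. A minimal sketch reproducing the numbers above (the helper name `rtf` is ours, not part of sherpa-onnx):

```python
def rtf(elapsed_seconds: float, wave_duration_seconds: float) -> float:
    """Real time factor: decoding time divided by audio duration."""
    return elapsed_seconds / wave_duration_seconds

# Numbers taken from the far_4.wav log above:
# "Elapsed seconds: 1.570 s" over a 9.079 s wave.
print(round(rtf(1.570, 9.079), 3))  # 0.173, matching the reported RTF
```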
far_5.wav
To decode the test file ./sherpa-onnx-funasr-nano-int8-2025-12-30/test_wavs/far_5.wav:
./build/bin/sherpa-onnx-offline \
--funasr-nano-encoder-adaptor=./sherpa-onnx-funasr-nano-int8-2025-12-30/encoder_adaptor.int8.onnx \
--funasr-nano-llm=./sherpa-onnx-funasr-nano-int8-2025-12-30/llm.int8.onnx \
--funasr-nano-tokenizer=./sherpa-onnx-funasr-nano-int8-2025-12-30/Qwen3-0.6B \
--funasr-nano-embedding=./sherpa-onnx-funasr-nano-int8-2025-12-30/embedding.int8.onnx \
./sherpa-onnx-funasr-nano-int8-2025-12-30/test_wavs/far_5.wav
You should see the following output:
/Users/fangjun/open-source/sherpa-onnx/sherpa-onnx/csrc/parse-options.cc:Read:373 ./build/bin/sherpa-onnx-offline --funasr-nano-encoder-adaptor=./sherpa-onnx-funasr-nano-int8-2025-12-30/encoder_adaptor.int8.onnx --funasr-nano-llm=./sherpa-onnx-funasr-nano-int8-2025-12-30/llm.int8.onnx --funasr-nano-tokenizer=./sherpa-onnx-funasr-nano-int8-2025-12-30/Qwen3-0.6B --funasr-nano-embedding=./sherpa-onnx-funasr-nano-int8-2025-12-30/embedding.int8.onnx ./sherpa-onnx-funasr-nano-int8-2025-12-30/test_wavs/far_5.wav
OfflineRecognizerConfig(feat_config=FeatureExtractorConfig(sampling_rate=16000, feature_dim=80, low_freq=20, high_freq=-400, dither=0, normalize_samples=True, snip_edges=False), model_config=OfflineModelConfig(transducer=OfflineTransducerModelConfig(encoder_filename="", decoder_filename="", joiner_filename=""), paraformer=OfflineParaformerModelConfig(model=""), nemo_ctc=OfflineNemoEncDecCtcModelConfig(model=""), whisper=OfflineWhisperModelConfig(encoder="", decoder="", language="", task="transcribe", tail_paddings=-1), fire_red_asr=OfflineFireRedAsrModelConfig(encoder="", decoder=""), tdnn=OfflineTdnnModelConfig(model=""), zipformer_ctc=OfflineZipformerCtcModelConfig(model=""), wenet_ctc=OfflineWenetCtcModelConfig(model=""), sense_voice=OfflineSenseVoiceModelConfig(model="", language="auto", use_itn=False), moonshine=OfflineMoonshineModelConfig(preprocessor="", encoder="", uncached_decoder="", cached_decoder=""), dolphin=OfflineDolphinModelConfig(model=""), canary=OfflineCanaryModelConfig(encoder="", decoder="", src_lang="", tgt_lang="", use_pnc=True), omnilingual=OfflineOmnilingualAsrCtcModelConfig(model=""), funasr_nano=OfflineFunASRNanoModelConfig(encoder_adaptor="./sherpa-onnx-funasr-nano-int8-2025-12-30/encoder_adaptor.int8.onnx", llm="./sherpa-onnx-funasr-nano-int8-2025-12-30/llm.int8.onnx", embedding="./sherpa-onnx-funasr-nano-int8-2025-12-30/embedding.int8.onnx", tokenizer="./sherpa-onnx-funasr-nano-int8-2025-12-30/Qwen3-0.6B", system_prompt="You are a helpful assistant.", user_prompt="语音转写:", max_new_tokens=512, temperature=1e-06, top_p=0.8, seed=42), medasr=OfflineMedAsrCtcModelConfig(model=""), telespeech_ctc="", tokens="", num_threads=2, debug=False, provider="cpu", model_type="", modeling_unit="cjkchar", bpe_vocab=""), lm_config=OfflineLMConfig(model="", scale=0.5, lodr_scale=0.01, lodr_fst="", lodr_backoff_id=-1), ctc_fst_decoder_config=OfflineCtcFstDecoderConfig(graph="", max_active=3000), decoding_method="greedy_search", max_active_paths=4, 
hotwords_file="", hotwords_score=1.5, blank_penalty=0, rule_fsts="", rule_fars="", hr=HomophoneReplacerConfig(lexicon="", rule_fsts=""))
Creating recognizer ...
recognizer created in 1.714 s
Started
Done!
./sherpa-onnx-funasr-nano-int8-2025-12-30/test_wavs/far_5.wav
{"lang": "", "emotion": "", "event": "", "text": "别紧张,我只是我是在这边逛街,然后看到你们在这边拍照,想跟你交个朋友,认识一下。", "timestamps": [0.00, 0.36, 0.71, 1.07, 1.42, 1.78, 2.13, 2.49, 2.84, 3.20, 3.56, 3.91, 4.27, 4.62, 4.98, 5.33, 5.69, 6.05, 6.40, 6.76, 7.11, 7.47, 7.82, 8.18, 8.53], "durations": [], "tokens":["别", "紧张", ",", "我只是", "我", "是在", "这边", "逛街", ",", "然后", "看到", "你们", "在这", "边", "拍照", ",", "想", "跟你", "交", "个", "朋友", ",", "认识", "一下", "。"], "ys_log_probs": [], "words": []}
----
num threads: 2
decoding method: greedy_search
Elapsed seconds: 1.503 s
Real time factor (RTF): 1.503 / 8.917 = 0.169
| Wave filename | Content | Ground truth |
|---|---|---|
| far_5.wav | 别紧张,我只是我是在这边逛街,然后看到你们在这边拍照,想跟你交个朋友,认识一下。 | 别紧张, 我只是我是在这边逛街, 然后看到你们在这边拍照, 想跟你交个朋友, 认识一下. |
rag_chemistry.wav
To decode the test file ./sherpa-onnx-funasr-nano-int8-2025-12-30/test_wavs/rag_chemistry.wav:
./build/bin/sherpa-onnx-offline \
--funasr-nano-encoder-adaptor=./sherpa-onnx-funasr-nano-int8-2025-12-30/encoder_adaptor.int8.onnx \
--funasr-nano-llm=./sherpa-onnx-funasr-nano-int8-2025-12-30/llm.int8.onnx \
--funasr-nano-tokenizer=./sherpa-onnx-funasr-nano-int8-2025-12-30/Qwen3-0.6B \
--funasr-nano-embedding=./sherpa-onnx-funasr-nano-int8-2025-12-30/embedding.int8.onnx \
./sherpa-onnx-funasr-nano-int8-2025-12-30/test_wavs/rag_chemistry.wav
You should see the following output:
/Users/fangjun/open-source/sherpa-onnx/sherpa-onnx/csrc/parse-options.cc:Read:373 ./build/bin/sherpa-onnx-offline --funasr-nano-encoder-adaptor=./sherpa-onnx-funasr-nano-int8-2025-12-30/encoder_adaptor.int8.onnx --funasr-nano-llm=./sherpa-onnx-funasr-nano-int8-2025-12-30/llm.int8.onnx --funasr-nano-tokenizer=./sherpa-onnx-funasr-nano-int8-2025-12-30/Qwen3-0.6B --funasr-nano-embedding=./sherpa-onnx-funasr-nano-int8-2025-12-30/embedding.int8.onnx ./sherpa-onnx-funasr-nano-int8-2025-12-30/test_wavs/rag_chemistry.wav
OfflineRecognizerConfig(feat_config=FeatureExtractorConfig(sampling_rate=16000, feature_dim=80, low_freq=20, high_freq=-400, dither=0, normalize_samples=True, snip_edges=False), model_config=OfflineModelConfig(transducer=OfflineTransducerModelConfig(encoder_filename="", decoder_filename="", joiner_filename=""), paraformer=OfflineParaformerModelConfig(model=""), nemo_ctc=OfflineNemoEncDecCtcModelConfig(model=""), whisper=OfflineWhisperModelConfig(encoder="", decoder="", language="", task="transcribe", tail_paddings=-1), fire_red_asr=OfflineFireRedAsrModelConfig(encoder="", decoder=""), tdnn=OfflineTdnnModelConfig(model=""), zipformer_ctc=OfflineZipformerCtcModelConfig(model=""), wenet_ctc=OfflineWenetCtcModelConfig(model=""), sense_voice=OfflineSenseVoiceModelConfig(model="", language="auto", use_itn=False), moonshine=OfflineMoonshineModelConfig(preprocessor="", encoder="", uncached_decoder="", cached_decoder=""), dolphin=OfflineDolphinModelConfig(model=""), canary=OfflineCanaryModelConfig(encoder="", decoder="", src_lang="", tgt_lang="", use_pnc=True), omnilingual=OfflineOmnilingualAsrCtcModelConfig(model=""), funasr_nano=OfflineFunASRNanoModelConfig(encoder_adaptor="./sherpa-onnx-funasr-nano-int8-2025-12-30/encoder_adaptor.int8.onnx", llm="./sherpa-onnx-funasr-nano-int8-2025-12-30/llm.int8.onnx", embedding="./sherpa-onnx-funasr-nano-int8-2025-12-30/embedding.int8.onnx", tokenizer="./sherpa-onnx-funasr-nano-int8-2025-12-30/Qwen3-0.6B", system_prompt="You are a helpful assistant.", user_prompt="语音转写:", max_new_tokens=512, temperature=1e-06, top_p=0.8, seed=42), medasr=OfflineMedAsrCtcModelConfig(model=""), telespeech_ctc="", tokens="", num_threads=2, debug=False, provider="cpu", model_type="", modeling_unit="cjkchar", bpe_vocab=""), lm_config=OfflineLMConfig(model="", scale=0.5, lodr_scale=0.01, lodr_fst="", lodr_backoff_id=-1), ctc_fst_decoder_config=OfflineCtcFstDecoderConfig(graph="", max_active=3000), decoding_method="greedy_search", max_active_paths=4, 
hotwords_file="", hotwords_score=1.5, blank_penalty=0, rule_fsts="", rule_fars="", hr=HomophoneReplacerConfig(lexicon="", rule_fsts=""))
Creating recognizer ...
recognizer created in 1.899 s
Started
Done!
./sherpa-onnx-funasr-nano-int8-2025-12-30/test_wavs/rag_chemistry.wav
{"lang": "", "emotion": "", "event": "", "text": "比如说只在当时被认为是一种含氧酸盐。", "timestamps": [0.00, 0.62, 1.23, 1.85, 2.47, 3.09, 3.70, 4.32, 4.94, 5.56, 6.17], "durations": [], "tokens":["比如说", "只", "在", "当时", "被认为", "是一种", "含", "氧", "酸", "盐", "。"], "ys_log_probs": [], "words": []}
----
num threads: 2
decoding method: greedy_search
Elapsed seconds: 1.081 s
Real time factor (RTF): 1.081 / 6.840 = 0.158
| Wave filename | Content | Ground truth |
|---|---|---|
| rag_chemistry.wav | 比如说只在当时被认为是一种含氧酸盐。 | 比如说酯在当时被认为是一种含氧酸盐 |
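Each decoded file produces one JSON result line like the one above, in which the `tokens` and `timestamps` arrays are parallel, so zipping them yields a per-token time alignment. A small sketch using only the standard library (the trimmed JSON literal below is copied from the rag_chemistry.wav output above):

```python
import json

# A trimmed copy of the result line printed for rag_chemistry.wav above.
result_line = (
    '{"text": "比如说只在当时被认为是一种含氧酸盐。", '
    '"timestamps": [0.00, 0.62, 1.23, 1.85, 2.47, 3.09, 3.70, 4.32, 4.94, 5.56, 6.17], '
    '"tokens": ["比如说", "只", "在", "当时", "被认为", "是一种", "含", "氧", "酸", "盐", "。"]}'
)

result = json.loads(result_line)
# The two arrays are parallel: one start time per token.
alignment = list(zip(result["timestamps"], result["tokens"]))
for start, token in alignment:
    print(f"{start:5.2f}s  {token}")
```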
rag_history.wav
To decode the test file ./sherpa-onnx-funasr-nano-int8-2025-12-30/test_wavs/rag_history.wav:
./build/bin/sherpa-onnx-offline \
--funasr-nano-encoder-adaptor=./sherpa-onnx-funasr-nano-int8-2025-12-30/encoder_adaptor.int8.onnx \
--funasr-nano-llm=./sherpa-onnx-funasr-nano-int8-2025-12-30/llm.int8.onnx \
--funasr-nano-tokenizer=./sherpa-onnx-funasr-nano-int8-2025-12-30/Qwen3-0.6B \
--funasr-nano-embedding=./sherpa-onnx-funasr-nano-int8-2025-12-30/embedding.int8.onnx \
./sherpa-onnx-funasr-nano-int8-2025-12-30/test_wavs/rag_history.wav
You should see the following output:
/Users/fangjun/open-source/sherpa-onnx/sherpa-onnx/csrc/parse-options.cc:Read:373 ./build/bin/sherpa-onnx-offline --funasr-nano-encoder-adaptor=./sherpa-onnx-funasr-nano-int8-2025-12-30/encoder_adaptor.int8.onnx --funasr-nano-llm=./sherpa-onnx-funasr-nano-int8-2025-12-30/llm.int8.onnx --funasr-nano-tokenizer=./sherpa-onnx-funasr-nano-int8-2025-12-30/Qwen3-0.6B --funasr-nano-embedding=./sherpa-onnx-funasr-nano-int8-2025-12-30/embedding.int8.onnx ./sherpa-onnx-funasr-nano-int8-2025-12-30/test_wavs/rag_history.wav
OfflineRecognizerConfig(feat_config=FeatureExtractorConfig(sampling_rate=16000, feature_dim=80, low_freq=20, high_freq=-400, dither=0, normalize_samples=True, snip_edges=False), model_config=OfflineModelConfig(transducer=OfflineTransducerModelConfig(encoder_filename="", decoder_filename="", joiner_filename=""), paraformer=OfflineParaformerModelConfig(model=""), nemo_ctc=OfflineNemoEncDecCtcModelConfig(model=""), whisper=OfflineWhisperModelConfig(encoder="", decoder="", language="", task="transcribe", tail_paddings=-1), fire_red_asr=OfflineFireRedAsrModelConfig(encoder="", decoder=""), tdnn=OfflineTdnnModelConfig(model=""), zipformer_ctc=OfflineZipformerCtcModelConfig(model=""), wenet_ctc=OfflineWenetCtcModelConfig(model=""), sense_voice=OfflineSenseVoiceModelConfig(model="", language="auto", use_itn=False), moonshine=OfflineMoonshineModelConfig(preprocessor="", encoder="", uncached_decoder="", cached_decoder=""), dolphin=OfflineDolphinModelConfig(model=""), canary=OfflineCanaryModelConfig(encoder="", decoder="", src_lang="", tgt_lang="", use_pnc=True), omnilingual=OfflineOmnilingualAsrCtcModelConfig(model=""), funasr_nano=OfflineFunASRNanoModelConfig(encoder_adaptor="./sherpa-onnx-funasr-nano-int8-2025-12-30/encoder_adaptor.int8.onnx", llm="./sherpa-onnx-funasr-nano-int8-2025-12-30/llm.int8.onnx", embedding="./sherpa-onnx-funasr-nano-int8-2025-12-30/embedding.int8.onnx", tokenizer="./sherpa-onnx-funasr-nano-int8-2025-12-30/Qwen3-0.6B", system_prompt="You are a helpful assistant.", user_prompt="语音转写:", max_new_tokens=512, temperature=1e-06, top_p=0.8, seed=42), medasr=OfflineMedAsrCtcModelConfig(model=""), telespeech_ctc="", tokens="", num_threads=2, debug=False, provider="cpu", model_type="", modeling_unit="cjkchar", bpe_vocab=""), lm_config=OfflineLMConfig(model="", scale=0.5, lodr_scale=0.01, lodr_fst="", lodr_backoff_id=-1), ctc_fst_decoder_config=OfflineCtcFstDecoderConfig(graph="", max_active=3000), decoding_method="greedy_search", max_active_paths=4, 
hotwords_file="", hotwords_score=1.5, blank_penalty=0, rule_fsts="", rule_fars="", hr=HomophoneReplacerConfig(lexicon="", rule_fsts=""))
Creating recognizer ...
recognizer created in 1.776 s
Started
Done!
./sherpa-onnx-funasr-nano-int8-2025-12-30/test_wavs/rag_history.wav
{"lang": "", "emotion": "", "event": "", "text": "由罗马皇帝钦点的犹地亚王大希律王统治期间", "timestamps": [0.00, 0.49, 0.98, 1.48, 1.97, 2.46, 2.95, 3.44, 3.93, 4.43, 4.92, 5.41, 5.90, 6.39, 6.89, 7.38], "durations": [], "tokens":["由", "罗马", "皇帝", "钦", "点", "的", "犹", "地", "亚", "王", "大", "希", "律", "王", "统治", "期间"], "ys_log_probs": [], "words": []}
----
num threads: 2
decoding method: greedy_search
Elapsed seconds: 1.261 s
Real time factor (RTF): 1.261 / 7.920 = 0.159
| Wave filename | Content | Ground truth |
|---|---|---|
| rag_history.wav | 由罗马皇帝钦点的犹地亚王大希律王统治期间 | 由罗马皇帝钦点的犹地亚王大希律王统治期间 |
rag_math.wav
To decode the test file ./sherpa-onnx-funasr-nano-int8-2025-12-30/test_wavs/rag_math.wav:
./build/bin/sherpa-onnx-offline \
--funasr-nano-encoder-adaptor=./sherpa-onnx-funasr-nano-int8-2025-12-30/encoder_adaptor.int8.onnx \
--funasr-nano-llm=./sherpa-onnx-funasr-nano-int8-2025-12-30/llm.int8.onnx \
--funasr-nano-tokenizer=./sherpa-onnx-funasr-nano-int8-2025-12-30/Qwen3-0.6B \
--funasr-nano-embedding=./sherpa-onnx-funasr-nano-int8-2025-12-30/embedding.int8.onnx \
./sherpa-onnx-funasr-nano-int8-2025-12-30/test_wavs/rag_math.wav
You should see the following output:
/Users/fangjun/open-source/sherpa-onnx/sherpa-onnx/csrc/parse-options.cc:Read:373 ./build/bin/sherpa-onnx-offline --funasr-nano-encoder-adaptor=./sherpa-onnx-funasr-nano-int8-2025-12-30/encoder_adaptor.int8.onnx --funasr-nano-llm=./sherpa-onnx-funasr-nano-int8-2025-12-30/llm.int8.onnx --funasr-nano-tokenizer=./sherpa-onnx-funasr-nano-int8-2025-12-30/Qwen3-0.6B --funasr-nano-embedding=./sherpa-onnx-funasr-nano-int8-2025-12-30/embedding.int8.onnx ./sherpa-onnx-funasr-nano-int8-2025-12-30/test_wavs/rag_math.wav
OfflineRecognizerConfig(feat_config=FeatureExtractorConfig(sampling_rate=16000, feature_dim=80, low_freq=20, high_freq=-400, dither=0, normalize_samples=True, snip_edges=False), model_config=OfflineModelConfig(transducer=OfflineTransducerModelConfig(encoder_filename="", decoder_filename="", joiner_filename=""), paraformer=OfflineParaformerModelConfig(model=""), nemo_ctc=OfflineNemoEncDecCtcModelConfig(model=""), whisper=OfflineWhisperModelConfig(encoder="", decoder="", language="", task="transcribe", tail_paddings=-1), fire_red_asr=OfflineFireRedAsrModelConfig(encoder="", decoder=""), tdnn=OfflineTdnnModelConfig(model=""), zipformer_ctc=OfflineZipformerCtcModelConfig(model=""), wenet_ctc=OfflineWenetCtcModelConfig(model=""), sense_voice=OfflineSenseVoiceModelConfig(model="", language="auto", use_itn=False), moonshine=OfflineMoonshineModelConfig(preprocessor="", encoder="", uncached_decoder="", cached_decoder=""), dolphin=OfflineDolphinModelConfig(model=""), canary=OfflineCanaryModelConfig(encoder="", decoder="", src_lang="", tgt_lang="", use_pnc=True), omnilingual=OfflineOmnilingualAsrCtcModelConfig(model=""), funasr_nano=OfflineFunASRNanoModelConfig(encoder_adaptor="./sherpa-onnx-funasr-nano-int8-2025-12-30/encoder_adaptor.int8.onnx", llm="./sherpa-onnx-funasr-nano-int8-2025-12-30/llm.int8.onnx", embedding="./sherpa-onnx-funasr-nano-int8-2025-12-30/embedding.int8.onnx", tokenizer="./sherpa-onnx-funasr-nano-int8-2025-12-30/Qwen3-0.6B", system_prompt="You are a helpful assistant.", user_prompt="语音转写:", max_new_tokens=512, temperature=1e-06, top_p=0.8, seed=42), medasr=OfflineMedAsrCtcModelConfig(model=""), telespeech_ctc="", tokens="", num_threads=2, debug=False, provider="cpu", model_type="", modeling_unit="cjkchar", bpe_vocab=""), lm_config=OfflineLMConfig(model="", scale=0.5, lodr_scale=0.01, lodr_fst="", lodr_backoff_id=-1), ctc_fst_decoder_config=OfflineCtcFstDecoderConfig(graph="", max_active=3000), decoding_method="greedy_search", max_active_paths=4, 
hotwords_file="", hotwords_score=1.5, blank_penalty=0, rule_fsts="", rule_fars="", hr=HomophoneReplacerConfig(lexicon="", rule_fsts=""))
Creating recognizer ...
recognizer created in 1.943 s
Started
Done!
./sherpa-onnx-funasr-nano-int8-2025-12-30/test_wavs/rag_math.wav
{"lang": "", "emotion": "", "event": "", "text": "对微分形式的积分是微分几何中的基本概念。", "timestamps": [0.00, 0.39, 0.78, 1.17, 1.56, 1.95, 2.34, 2.73, 3.13, 3.52, 3.91, 4.30, 4.69, 5.08], "durations": [], "tokens":["对", "微", "分", "形式", "的", "积分", "是", "微", "分", "几何", "中的", "基本", "概念", "。"], "ys_log_probs": [], "words": []}
----
num threads: 2
decoding method: greedy_search
Elapsed seconds: 0.946 s
Real time factor (RTF): 0.946 / 5.520 = 0.171
| Wave filename | Content | Ground truth |
|---|---|---|
| rag_math.wav | 对微分形式的积分是微分几何中的基本概念。 | 对微分形式的积分是微分几何中的基本概念 |
rag_medical.wav
To decode the test file ./sherpa-onnx-funasr-nano-int8-2025-12-30/test_wavs/rag_medical.wav:
./build/bin/sherpa-onnx-offline \
--funasr-nano-encoder-adaptor=./sherpa-onnx-funasr-nano-int8-2025-12-30/encoder_adaptor.int8.onnx \
--funasr-nano-llm=./sherpa-onnx-funasr-nano-int8-2025-12-30/llm.int8.onnx \
--funasr-nano-tokenizer=./sherpa-onnx-funasr-nano-int8-2025-12-30/Qwen3-0.6B \
--funasr-nano-embedding=./sherpa-onnx-funasr-nano-int8-2025-12-30/embedding.int8.onnx \
./sherpa-onnx-funasr-nano-int8-2025-12-30/test_wavs/rag_medical.wav
You should see the following output:
/Users/fangjun/open-source/sherpa-onnx/sherpa-onnx/csrc/parse-options.cc:Read:373 ./build/bin/sherpa-onnx-offline --funasr-nano-encoder-adaptor=./sherpa-onnx-funasr-nano-int8-2025-12-30/encoder_adaptor.int8.onnx --funasr-nano-llm=./sherpa-onnx-funasr-nano-int8-2025-12-30/llm.int8.onnx --funasr-nano-tokenizer=./sherpa-onnx-funasr-nano-int8-2025-12-30/Qwen3-0.6B --funasr-nano-embedding=./sherpa-onnx-funasr-nano-int8-2025-12-30/embedding.int8.onnx ./sherpa-onnx-funasr-nano-int8-2025-12-30/test_wavs/rag_medical.wav
OfflineRecognizerConfig(feat_config=FeatureExtractorConfig(sampling_rate=16000, feature_dim=80, low_freq=20, high_freq=-400, dither=0, normalize_samples=True, snip_edges=False), model_config=OfflineModelConfig(transducer=OfflineTransducerModelConfig(encoder_filename="", decoder_filename="", joiner_filename=""), paraformer=OfflineParaformerModelConfig(model=""), nemo_ctc=OfflineNemoEncDecCtcModelConfig(model=""), whisper=OfflineWhisperModelConfig(encoder="", decoder="", language="", task="transcribe", tail_paddings=-1), fire_red_asr=OfflineFireRedAsrModelConfig(encoder="", decoder=""), tdnn=OfflineTdnnModelConfig(model=""), zipformer_ctc=OfflineZipformerCtcModelConfig(model=""), wenet_ctc=OfflineWenetCtcModelConfig(model=""), sense_voice=OfflineSenseVoiceModelConfig(model="", language="auto", use_itn=False), moonshine=OfflineMoonshineModelConfig(preprocessor="", encoder="", uncached_decoder="", cached_decoder=""), dolphin=OfflineDolphinModelConfig(model=""), canary=OfflineCanaryModelConfig(encoder="", decoder="", src_lang="", tgt_lang="", use_pnc=True), omnilingual=OfflineOmnilingualAsrCtcModelConfig(model=""), funasr_nano=OfflineFunASRNanoModelConfig(encoder_adaptor="./sherpa-onnx-funasr-nano-int8-2025-12-30/encoder_adaptor.int8.onnx", llm="./sherpa-onnx-funasr-nano-int8-2025-12-30/llm.int8.onnx", embedding="./sherpa-onnx-funasr-nano-int8-2025-12-30/embedding.int8.onnx", tokenizer="./sherpa-onnx-funasr-nano-int8-2025-12-30/Qwen3-0.6B", system_prompt="You are a helpful assistant.", user_prompt="语音转写:", max_new_tokens=512, temperature=1e-06, top_p=0.8, seed=42), medasr=OfflineMedAsrCtcModelConfig(model=""), telespeech_ctc="", tokens="", num_threads=2, debug=False, provider="cpu", model_type="", modeling_unit="cjkchar", bpe_vocab=""), lm_config=OfflineLMConfig(model="", scale=0.5, lodr_scale=0.01, lodr_fst="", lodr_backoff_id=-1), ctc_fst_decoder_config=OfflineCtcFstDecoderConfig(graph="", max_active=3000), decoding_method="greedy_search", max_active_paths=4, 
hotwords_file="", hotwords_score=1.5, blank_penalty=0, rule_fsts="", rule_fars="", hr=HomophoneReplacerConfig(lexicon="", rule_fsts=""))
Creating recognizer ...
recognizer created in 1.744 s
Started
Done!
./sherpa-onnx-funasr-nano-int8-2025-12-30/test_wavs/rag_medical.wav
{"lang": "", "emotion": "", "event": "", "text": "肾脏中肾小球囊上的细胞膜孔隙很小。", "timestamps": [0.00, 0.47, 0.93, 1.40, 1.87, 2.33, 2.80, 3.27, 3.74, 4.20, 4.67, 5.14, 5.60], "durations": [], "tokens":["肾脏", "中", "肾", "小", "球", "囊", "上的", "细胞", "膜", "孔", "隙", "很小", "。"], "ys_log_probs": [], "words": []}
----
num threads: 2
decoding method: greedy_search
Elapsed seconds: 1.006 s
Real time factor (RTF): 1.006 / 6.120 = 0.164
| Wave filename | Content | Ground truth |
|---|---|---|
| rag_medical.wav | 肾脏中肾小球囊上的细胞膜孔隙很小。 | 肾脏中肾小球囊上的细胞膜孔隙很小 |
rag_physics.wav
To decode the test file ./sherpa-onnx-funasr-nano-int8-2025-12-30/test_wavs/rag_physics.wav:
./build/bin/sherpa-onnx-offline \
--funasr-nano-encoder-adaptor=./sherpa-onnx-funasr-nano-int8-2025-12-30/encoder_adaptor.int8.onnx \
--funasr-nano-llm=./sherpa-onnx-funasr-nano-int8-2025-12-30/llm.int8.onnx \
--funasr-nano-tokenizer=./sherpa-onnx-funasr-nano-int8-2025-12-30/Qwen3-0.6B \
--funasr-nano-embedding=./sherpa-onnx-funasr-nano-int8-2025-12-30/embedding.int8.onnx \
./sherpa-onnx-funasr-nano-int8-2025-12-30/test_wavs/rag_physics.wav
You should see the following output:
/Users/fangjun/open-source/sherpa-onnx/sherpa-onnx/csrc/parse-options.cc:Read:373 ./build/bin/sherpa-onnx-offline --funasr-nano-encoder-adaptor=./sherpa-onnx-funasr-nano-int8-2025-12-30/encoder_adaptor.int8.onnx --funasr-nano-llm=./sherpa-onnx-funasr-nano-int8-2025-12-30/llm.int8.onnx --funasr-nano-tokenizer=./sherpa-onnx-funasr-nano-int8-2025-12-30/Qwen3-0.6B --funasr-nano-embedding=./sherpa-onnx-funasr-nano-int8-2025-12-30/embedding.int8.onnx ./sherpa-onnx-funasr-nano-int8-2025-12-30/test_wavs/rag_physics.wav
OfflineRecognizerConfig(feat_config=FeatureExtractorConfig(sampling_rate=16000, feature_dim=80, low_freq=20, high_freq=-400, dither=0, normalize_samples=True, snip_edges=False), model_config=OfflineModelConfig(transducer=OfflineTransducerModelConfig(encoder_filename="", decoder_filename="", joiner_filename=""), paraformer=OfflineParaformerModelConfig(model=""), nemo_ctc=OfflineNemoEncDecCtcModelConfig(model=""), whisper=OfflineWhisperModelConfig(encoder="", decoder="", language="", task="transcribe", tail_paddings=-1), fire_red_asr=OfflineFireRedAsrModelConfig(encoder="", decoder=""), tdnn=OfflineTdnnModelConfig(model=""), zipformer_ctc=OfflineZipformerCtcModelConfig(model=""), wenet_ctc=OfflineWenetCtcModelConfig(model=""), sense_voice=OfflineSenseVoiceModelConfig(model="", language="auto", use_itn=False), moonshine=OfflineMoonshineModelConfig(preprocessor="", encoder="", uncached_decoder="", cached_decoder=""), dolphin=OfflineDolphinModelConfig(model=""), canary=OfflineCanaryModelConfig(encoder="", decoder="", src_lang="", tgt_lang="", use_pnc=True), omnilingual=OfflineOmnilingualAsrCtcModelConfig(model=""), funasr_nano=OfflineFunASRNanoModelConfig(encoder_adaptor="./sherpa-onnx-funasr-nano-int8-2025-12-30/encoder_adaptor.int8.onnx", llm="./sherpa-onnx-funasr-nano-int8-2025-12-30/llm.int8.onnx", embedding="./sherpa-onnx-funasr-nano-int8-2025-12-30/embedding.int8.onnx", tokenizer="./sherpa-onnx-funasr-nano-int8-2025-12-30/Qwen3-0.6B", system_prompt="You are a helpful assistant.", user_prompt="语音转写:", max_new_tokens=512, temperature=1e-06, top_p=0.8, seed=42), medasr=OfflineMedAsrCtcModelConfig(model=""), telespeech_ctc="", tokens="", num_threads=2, debug=False, provider="cpu", model_type="", modeling_unit="cjkchar", bpe_vocab=""), lm_config=OfflineLMConfig(model="", scale=0.5, lodr_scale=0.01, lodr_fst="", lodr_backoff_id=-1), ctc_fst_decoder_config=OfflineCtcFstDecoderConfig(graph="", max_active=3000), decoding_method="greedy_search", max_active_paths=4, 
hotwords_file="", hotwords_score=1.5, blank_penalty=0, rule_fsts="", rule_fars="", hr=HomophoneReplacerConfig(lexicon="", rule_fsts=""))
Creating recognizer ...
recognizer created in 1.794 s
Started
Done!
./sherpa-onnx-funasr-nano-int8-2025-12-30/test_wavs/rag_physics.wav
{"lang": "", "emotion": "", "event": "", "text": "根据碰撞理论,月面样本缺少挥发性物质。", "timestamps": [0.00, 1.01, 2.01, 3.02, 4.02, 5.03, 6.03, 7.04, 8.05, 9.05, 10.06, 11.06], "durations": [], "tokens":["根据", "碰撞", "理论", ",", "月", "面", "样本", "缺少", "挥发", "性", "物质", "。"], "ys_log_probs": [], "words": []}
----
num threads: 2
decoding method: greedy_search
Elapsed seconds: 1.743 s
Real time factor (RTF): 1.743 / 12.120 = 0.144
| Wave filename | Content | Ground truth |
|---|---|---|
| rag_physics.wav | 根据碰撞理论,月面样本缺少挥发性物质。 | 根据碰撞理论月面样本缺少挥发性物质 |
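To quantify the gap between recognized text and the reference transcript in examples like rag_chemistry.wav above (只 vs. 酯), a common metric is the character error rate (CER): the edit distance between the two strings divided by the reference length. A minimal Levenshtein sketch, not part of sherpa-onnx, with punctuation stripped first since the reference transcripts here are unpunctuated:

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via row-by-row dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution
        prev = cur
    return prev[-1]

# rag_chemistry.wav: recognition and reference differ in one character (只 vs. 酯).
hyp = "比如说只在当时被认为是一种含氧酸盐"  # recognized text, punctuation removed
ref = "比如说酯在当时被认为是一种含氧酸盐"  # reference transcript
cer = edit_distance(hyp, ref) / len(ref)
print(f"CER = {cer:.3f}")  # one error over 17 characters
```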