Pre-trained Models
This page describes how to download and use pre-trained SenseVoice models.
sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2024-07-17 (Chinese, English, Japanese, Korean, Cantonese, 中英日韩粤语)
This model is converted from https://www.modelscope.cn/models/iic/SenseVoiceSmall using the script export-ncnn.py.
It supports the following 5 languages:
Chinese (Mandarin, 普通话)
Cantonese (粤语, 广东话)
English
Japanese
Korean
In the following, we describe how to use it.
Hint
For RKNN users, please refer to sherpa-onnx-rk3588-20-seconds-sense-voice-zh-en-ja-ko-yue-2024-07-17 (Chinese, English, Japanese, Korean, Cantonese, 中英日韩粤语).
For onnxruntime users, please refer to sherpa-onnx-sense-voice-zh-en-ja-ko-yue-2024-07-17 (Chinese, English, Japanese, Korean, Cantonese, 中英日韩粤语).
Download
Please use the following commands to download it:
cd /path/to/sherpa-ncnn
wget https://github.com/k2-fsa/sherpa-ncnn/releases/download/asr-models/sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2024-07-17.tar.bz2
tar xvf sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2024-07-17.tar.bz2
rm sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2024-07-17.tar.bz2
ls -lh sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2024-07-17/
total 907400
-rw-r--r-- 1 fangjun staff 71B Sep 13 19:17 LICENSE
-rw-r--r-- 1 fangjun staff 104B Sep 13 19:17 README.md
-rw-r--r-- 1 fangjun staff 443M Sep 13 19:17 model.ncnn.bin
-rw-r--r-- 1 fangjun staff 162K Sep 13 19:17 model.ncnn.param
drwxr-xr-x 7 fangjun staff 224B Sep 13 19:17 test_wavs
-rw-r--r-- 1 fangjun staff 308K Sep 13 19:17 tokens.txt
Hint
If you want to use the int8 quantized model, please run:
wget https://github.com/k2-fsa/sherpa-ncnn/releases/download/asr-models/sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-int8-2024-07-17.tar.bz2
tar xvf sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-int8-2024-07-17.tar.bz2
rm sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-int8-2024-07-17.tar.bz2
ls -lh sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-int8-2024-07-17/
total 460696
-rw-r--r-- 1 fangjun staff 71B Sep 17 14:26 LICENSE
-rw-r--r-- 1 fangjun staff 104B Sep 17 14:26 README.md
-rw-r--r-- 1 fangjun staff 222M Sep 17 14:28 model.ncnn.bin
-rw-r--r-- 1 fangjun staff 158K Sep 17 14:28 model.ncnn.param
drwxr-xr-x 7 fangjun staff 224B Sep 17 14:26 test_wavs
-rw-r--r-- 1 fangjun staff 308K Sep 17 14:26 tokens.txt
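Before decoding, you may want to confirm that the archive was extracted completely. The following is a minimal sketch and is not part of sherpa-ncnn. The helper name `check_sense_voice_dir` is hypothetical. The list of required files comes from the `ls -lh` listings above (`tokens.txt`, `model.ncnn.param`, `model.ncnn.bin`), which are identical for the float and int8 archives.

```bash
# Hypothetical helper: verify that a SenseVoice model directory contains
# the non-empty files shown in the ls -lh listings above.
# Returns 0 if all of them are present, 1 otherwise.
check_sense_voice_dir() {
  local dir="$1"
  local f missing=0
  for f in tokens.txt model.ncnn.param model.ncnn.bin; do
    if [ ! -s "$dir/$f" ]; then
      echo "missing or empty: $dir/$f" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example:
# check_sense_voice_dir ./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2024-07-17 && echo OK
```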
Decode a file
Without inverse text normalization
To decode a file without inverse text normalization, please use:
./build/bin/sherpa-ncnn-offline \
--tokens=./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2024-07-17/tokens.txt \
--sense-voice-model-dir=./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2024-07-17 \
--num-threads=1 \
./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2024-07-17/test_wavs/zh.wav
You should see the following output:
/Users/fangjun/open-source/sherpa-ncnn/sherpa-ncnn/csrc/parse-options.cc:Read:381 ./build/bin/sherpa-ncnn-offline --tokens=./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2024-07-17/tokens.txt --sense-voice-model-dir=./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2024-07-17 --num-threads=1 ./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2024-07-17/test_wavs/zh.wav
OfflineRecognizerConfig(feat_config=FeatureExtractorConfig(sampling_rate=16000, feature_dim=80), model_config=OfflineModelConfig(sense_voice=OfflineSenseVoiceModelConfig(model_dir="./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2024-07-17", language="auto", use_itn=False), tokens="./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2024-07-17/tokens.txt", num_threads=1, debug=False), decoding_method="greedy_search", blank_penalty=0)
Creating recognizer ...
Started
Done!
./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2024-07-17/test_wavs/zh.wav
{"lang": "<|zh|>", "emotion": "<|NEUTRAL|>", "event": "<|Speech|>", "text": "开放时间早上九点至下午五点", "timestamps": [0.72, 0.96, 1.26, 1.44, 1.92, 2.10, 2.58, 2.82, 3.30, 3.90, 4.20, 4.56, 4.74], "tokens":["开", "放", "时", "间", "早", "上", "九", "点", "至", "下", "午", "五", "点"]}
----
num threads: 1
decoding method: greedy_search
Elapsed seconds: 0.661 s
Real time factor (RTF): 0.661 / 5.592 = 0.118
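The real time factor printed above is the elapsed decoding time divided by the duration of the audio. A value below 1 means decoding runs faster than real time. The small sketch below reproduces that arithmetic with `awk`. The function name `rtf` is ours and is used only for illustration.

```bash
# RTF = elapsed decoding time (seconds) / audio duration (seconds).
# Values below 1 mean faster than real time.
rtf() {
  awk -v e="$1" -v d="$2" 'BEGIN { printf "%.3f\n", e / d }'
}

rtf 0.661 5.592   # prints 0.118, matching the output above
```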
With inverse text normalization
To decode a file with inverse text normalization, please use:
./build/bin/sherpa-ncnn-offline \
--tokens=./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2024-07-17/tokens.txt \
--sense-voice-model-dir=./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2024-07-17 \
--sense-voice-use-itn=1 \
--num-threads=1 \
./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2024-07-17/test_wavs/zh.wav
You should see the following output:
/Users/fangjun/open-source/sherpa-ncnn/sherpa-ncnn/csrc/parse-options.cc:Read:381 ./build/bin/sherpa-ncnn-offline --tokens=./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2024-07-17/tokens.txt --sense-voice-model-dir=./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2024-07-17 --sense-voice-use-itn=1 --num-threads=1 ./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2024-07-17/test_wavs/zh.wav
OfflineRecognizerConfig(feat_config=FeatureExtractorConfig(sampling_rate=16000, feature_dim=80), model_config=OfflineModelConfig(sense_voice=OfflineSenseVoiceModelConfig(model_dir="./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2024-07-17", language="auto", use_itn=True), tokens="./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2024-07-17/tokens.txt", num_threads=1, debug=False), decoding_method="greedy_search", blank_penalty=0)
Creating recognizer ...
Started
Done!
./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2024-07-17/test_wavs/zh.wav
{"lang": "<|zh|>", "emotion": "<|NEUTRAL|>", "event": "<|Speech|>", "text": "开放时间早上9点至下午五点。", "timestamps": [0.72, 0.96, 1.26, 1.44, 1.92, 2.10, 2.58, 2.82, 3.30, 3.90, 4.20, 4.56, 4.74, 5.46], "tokens":["开", "放", "时", "间", "早", "上", "9", "点", "至", "下", "午", "五", "点", "。"]}
----
num threads: 1
decoding method: greedy_search
Elapsed seconds: 0.613 s
Real time factor (RTF): 0.613 / 5.592 = 0.110
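To compare the results with and without inverse text normalization in a script, you can extract the `text` field from the JSON result line. This is a minimal sketch. It assumes the result is printed on one line, as shown above, and that the text contains no double quotes. A real JSON parser such as `jq` is more robust. The function name is hypothetical.

```bash
# Print the value of the "text" field from a sherpa-ncnn-offline result line.
# Assumes the value contains no (escaped) double quotes.
extract_text() {
  sed -n 's/.*"text": "\([^"]*\)".*/\1/p'
}

# Example (2>&1 in case the result is written to stderr):
# ./build/bin/sherpa-ncnn-offline ... --sense-voice-use-itn=1 ... 2>&1 | extract_text
```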
Hint
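When inverse text normalization is enabled, the result contains punctuation, and some numbers are written as digits. In the output above, for example, 九 becomes 9 and a trailing 。 is added.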
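The two decoding commands above differ only in `--sense-voice-use-itn=1`. If you decode many files, a small wrapper keeps the flags in one place. This is a sketch under our own assumptions. The function name and the `SHERPA_NCNN_OFFLINE` override variable are hypothetical. Only the command-line flags are taken from the commands above.

```bash
# Hypothetical wrapper around sherpa-ncnn-offline.
# Usage: sense_voice_decode MODEL_DIR USE_ITN WAV
#   USE_ITN=1 adds --sense-voice-use-itn=1; any other value leaves ITN off.
sense_voice_decode() {
  local model_dir="$1" use_itn="$2"
  shift 2
  # Set SHERPA_NCNN_OFFLINE to point at a different binary if needed.
  local bin="${SHERPA_NCNN_OFFLINE:-./build/bin/sherpa-ncnn-offline}"
  local itn_flag=""
  if [ "$use_itn" = 1 ]; then
    itn_flag="--sense-voice-use-itn=1"
  fi
  # ${itn_flag:+"$itn_flag"} expands to nothing when ITN is off.
  "$bin" \
    --tokens="$model_dir/tokens.txt" \
    --sense-voice-model-dir="$model_dir" \
    ${itn_flag:+"$itn_flag"} \
    --num-threads=1 \
    "$@"
}

# Example:
# sense_voice_decode ./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2024-07-17 1 \
#   ./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2024-07-17/test_wavs/zh.wav
```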
Real-time speech recognition from a microphone
First, download a VAD model:
cd /path/to/sherpa-ncnn
wget https://github.com/k2-fsa/sherpa-ncnn/releases/download/models/sherpa-ncnn-silero-vad.tar.bz2
tar xvf sherpa-ncnn-silero-vad.tar.bz2
rm sherpa-ncnn-silero-vad.tar.bz2
Now, run it:
./build/bin/sherpa-ncnn-vad-microphone-simulated-streaming-asr \
--tokens=./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2024-07-17/tokens.txt \
--sense-voice-model-dir=./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2024-07-17 \
--silero-vad-model-dir=./sherpa-ncnn-silero-vad \
--num-threads=1
Hint
You can run ./build/bin/sherpa-ncnn-pa-devs to list all microphone devices. An example of its output is given below:
PortAudio version: 0x00130700
Version text: 'PortAudio V19.7.0-devel, revision 147dd722548358763a8b649b3e4b41dfffbcfbb6'
Number of devices = 5
--------------------------------------- device #0
Name = Background Music
Host API = Core Audio
Max inputs = 2, Max outputs = 2
Default low input latency = 0.0100
Default low output latency = 0.0015
Default high input latency = 0.1000
Default high output latency = 0.0116
Default sample rate = 44100.00
Supported standard sample rates
for half-duplex 16 bit 2 channel input =
8000.00, 9600.00, 11025.00, 12000.00,
16000.00, 22050.00, 24000.00, 32000.00,
44100.00, 48000.00, 88200.00, 96000.00,
192000.00
Supported standard sample rates
for half-duplex 16 bit 2 channel output =
8000.00, 9600.00, 11025.00, 12000.00,
16000.00, 22050.00, 24000.00, 32000.00,
44100.00, 48000.00, 88200.00, 96000.00,
192000.00
Supported standard sample rates
for full-duplex 16 bit 2 channel input, 2 channel output =
8000.00, 9600.00, 11025.00, 12000.00,
16000.00, 22050.00, 24000.00, 32000.00,
44100.00, 48000.00, 88200.00, 96000.00,
192000.00
--------------------------------------- device #1
Name = Background Music (UI Sounds)
Host API = Core Audio
Max inputs = 2, Max outputs = 2
Default low input latency = 0.0100
Default low output latency = 0.0015
Default high input latency = 0.1000
Default high output latency = 0.0116
Default sample rate = 44100.00
Supported standard sample rates
for half-duplex 16 bit 2 channel input =
8000.00, 9600.00, 11025.00, 12000.00,
16000.00, 22050.00, 24000.00, 32000.00,
44100.00, 48000.00, 88200.00, 96000.00,
192000.00
Supported standard sample rates
for half-duplex 16 bit 2 channel output =
8000.00, 9600.00, 11025.00, 12000.00,
16000.00, 22050.00, 24000.00, 32000.00,
44100.00, 48000.00, 88200.00, 96000.00,
192000.00
Supported standard sample rates
for full-duplex 16 bit 2 channel input, 2 channel output =
8000.00, 9600.00, 11025.00, 12000.00,
16000.00, 22050.00, 24000.00, 32000.00,
44100.00, 48000.00, 88200.00, 96000.00,
192000.00
--------------------------------------- device #2
[ Default Input ]
Name = MacBook Pro Microphone
Host API = Core Audio
Max inputs = 1, Max outputs = 0
Default low input latency = 0.0345
Default low output latency = 0.0100
Default high input latency = 0.0439
Default high output latency = 0.1000
Default sample rate = 48000.00
Supported standard sample rates
for half-duplex 16 bit 1 channel input =
8000.00, 9600.00, 11025.00, 12000.00,
16000.00, 22050.00, 24000.00, 32000.00,
44100.00, 48000.00, 88200.00, 96000.00,
192000.00
--------------------------------------- device #3
[ Default Output ]
Name = MacBook Pro Speakers
Host API = Core Audio
Max inputs = 0, Max outputs = 2
Default low input latency = 0.0100
Default low output latency = 0.0120
Default high input latency = 0.1000
Default high output latency = 0.0214
Default sample rate = 48000.00
Supported standard sample rates
for half-duplex 16 bit 2 channel output =
8000.00, 9600.00, 11025.00, 12000.00,
16000.00, 22050.00, 24000.00, 32000.00,
44100.00, 48000.00, 88200.00, 96000.00,
192000.00
--------------------------------------- device #4
Name = WeMeet Audio Device
Host API = Core Audio
Max inputs = 2, Max outputs = 2
Default low input latency = 0.0100
Default low output latency = 0.0013
Default high input latency = 0.1000
Default high output latency = 0.0107
Default sample rate = 48000.00
Supported standard sample rates
for half-duplex 16 bit 2 channel input =
8000.00, 9600.00, 11025.00, 12000.00,
16000.00, 22050.00, 24000.00, 32000.00,
44100.00, 48000.00, 88200.00, 96000.00,
192000.00
Supported standard sample rates
for half-duplex 16 bit 2 channel output =
8000.00, 9600.00, 11025.00, 12000.00,
16000.00, 22050.00, 24000.00, 32000.00,
44100.00, 48000.00, 88200.00, 96000.00,
192000.00
Supported standard sample rates
for full-duplex 16 bit 2 channel input, 2 channel output =
8000.00, 9600.00, 11025.00, 12000.00,
16000.00, 22050.00, 24000.00, 32000.00,
44100.00, 48000.00, 88200.00, 96000.00,
192000.00
----------------------------------------------
Hint
If you want to use device #2 with sample rate 48000, please run:
./build/bin/sherpa-ncnn-vad-microphone-simulated-streaming-asr \
--mic-device-index=2 \
--mic-sample-rate=48000 \
--tokens=./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2024-07-17/tokens.txt \
--sense-voice-model-dir=./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2024-07-17 \
--silero-vad-model-dir=./sherpa-ncnn-silero-vad \
--num-threads=1
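You can read the device list by eye. Alternatively, `awk` can extract the index of the default input device and its default sample rate. This is a rough sketch that relies only on the output format shown above: the `device #N` separator lines, the `[ Default Input ]` marker, and the `Default sample rate = ...` line. The function name is hypothetical.

```bash
# Read sherpa-ncnn-pa-devs output on stdin and print
# "<index> <sample rate>" for the device marked [ Default Input ].
default_input_device() {
  awk '
    /device #[0-9]+/ {
      idx = $0; sub(/.*device #/, "", idx)   # keep just the device number
      is_default = 0
    }
    /\[ Default Input \]/ { is_default = 1 }
    is_default && /Default sample rate/ {
      rate = $NF; sub(/\.0+$/, "", rate)     # 48000.00 becomes 48000
      print idx, rate
      exit
    }
  '
}

# Example:
# ./build/bin/sherpa-ncnn-pa-devs 2>&1 | default_input_device   # e.g. "2 48000"
```

Pass the two printed values to `--mic-device-index` and `--mic-sample-rate`. Because we are not sure whether the tool writes to stdout or stderr, the example merges both streams with `2>&1`.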
Speed test on RK3588 CPU
RTF of SenseVoice in sherpa-ncnn | 1 thread | 2 threads | 3 threads | 4 threads |
---|---|---|---|---|
Cortex A55 (fp16 quantization) | 0.584 | 0.320 | 0.231 | 0.188 |
Cortex A55 (int8 quantization) | 0.346 | 0.202 | 0.152 | 0.126 |
Cortex A76 (fp16 quantization) | 0.142 | 0.079 | 0.063 | 0.049 |
Cortex A76 (int8 quantization) | 0.097 | 0.062 | 0.045 | 0.035 |
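One way to read the table is through speedup: the speedup with n threads is RTF(1 thread) / RTF(n threads). The snippet below only performs that arithmetic on numbers taken from the table. The function name is ours.

```bash
# Parallel speedup from two RTF values: RTF(1 thread) / RTF(n threads).
speedup() {
  awk -v r1="$1" -v rn="$2" 'BEGIN { printf "%.2f\n", r1 / rn }'
}

speedup 0.584 0.188   # Cortex A55, fp16, 4 threads: prints 3.11
speedup 0.142 0.049   # Cortex A76, fp16, 4 threads: prints 2.90
```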
See also Speed test on RK3588 CPU for sherpa-onnx.
Cortex A55
# 1 cortex A55 CPU
taskset 0x01 ./build/bin/sherpa-ncnn-offline \
--num-threads=1 \
--tokens=./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2024-07-17/tokens.txt \
--sense-voice-model-dir=./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2024-07-17 \
./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2024-07-17/test_wavs/zh.wav
# 2 cortex A55 CPUs
taskset 0x03 ./build/bin/sherpa-ncnn-offline \
--num-threads=2 \
--tokens=./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2024-07-17/tokens.txt \
--sense-voice-model-dir=./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2024-07-17 \
./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2024-07-17/test_wavs/zh.wav
# 3 cortex A55 CPUs
taskset 0x07 ./build/bin/sherpa-ncnn-offline \
--num-threads=3 \
--tokens=./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2024-07-17/tokens.txt \
--sense-voice-model-dir=./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2024-07-17 \
./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2024-07-17/test_wavs/zh.wav
# 4 cortex A55 CPUs
taskset 0x0f ./build/bin/sherpa-ncnn-offline \
--num-threads=4 \
--tokens=./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2024-07-17/tokens.txt \
--sense-voice-model-dir=./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2024-07-17 \
./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2024-07-17/test_wavs/zh.wav
# For int8 models, please use
taskset 0x01 ./build/bin/sherpa-ncnn-offline \
--num-threads=1 \
--tokens=./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-int8-2024-07-17/tokens.txt \
--sense-voice-model-dir=./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-int8-2024-07-17 \
./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-int8-2024-07-17/test_wavs/zh.wav
Cortex A76
# 1 cortex A76 CPU
taskset 0x10 ./build/bin/sherpa-ncnn-offline \
--num-threads=1 \
--tokens=./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2024-07-17/tokens.txt \
--sense-voice-model-dir=./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2024-07-17 \
./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2024-07-17/test_wavs/zh.wav
# 2 cortex A76 CPUs
taskset 0x30 ./build/bin/sherpa-ncnn-offline \
--num-threads=2 \
--tokens=./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2024-07-17/tokens.txt \
--sense-voice-model-dir=./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2024-07-17 \
./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2024-07-17/test_wavs/zh.wav
# 3 cortex A76 CPUs
taskset 0x70 ./build/bin/sherpa-ncnn-offline \
--num-threads=3 \
--tokens=./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2024-07-17/tokens.txt \
--sense-voice-model-dir=./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2024-07-17 \
./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2024-07-17/test_wavs/zh.wav
# 4 cortex A76 CPUs
taskset 0xf0 ./build/bin/sherpa-ncnn-offline \
--num-threads=4 \
--tokens=./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2024-07-17/tokens.txt \
--sense-voice-model-dir=./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2024-07-17 \
./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2024-07-17/test_wavs/zh.wav
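The `taskset` masks above follow a simple pattern. As the masks imply, CPUs 0-3 are the Cortex A55 cores and CPUs 4-7 are the Cortex A76 cores, and using n cores sets n consecutive bits starting at the first core of the cluster. The sketch below computes those masks and loops over 1 to 4 threads. The function names are hypothetical. The loop assumes `taskset` is installed and uses the same relative paths as the commands above.

```bash
# Build a taskset CPU mask selecting N consecutive CPUs starting at FIRST.
# cpu_mask 0 3 -> 0x07 (CPUs 0-2), cpu_mask 4 2 -> 0x30 (CPUs 4-5)
cpu_mask() {
  local first="$1" n="$2"
  printf '0x%02x\n' $(( ((1 << n) - 1) << first ))
}

# Decode zh.wav with 1..4 threads pinned to one cluster and print the RTF.
# FIRST=0 selects the Cortex A55 cores, FIRST=4 the Cortex A76 cores.
bench_cluster() {
  local first="$1" model_dir="$2" n
  for n in 1 2 3 4; do
    echo "== $n thread(s), mask $(cpu_mask "$first" "$n")"
    taskset "$(cpu_mask "$first" "$n")" ./build/bin/sherpa-ncnn-offline \
      --num-threads="$n" \
      --tokens="$model_dir/tokens.txt" \
      --sense-voice-model-dir="$model_dir" \
      "$model_dir/test_wavs/zh.wav" 2>&1 | grep 'Real time factor'
  done
}

# Example (pass the int8 directory to benchmark the int8 model instead):
# bench_cluster 4 ./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2024-07-17
```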
sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09 (Chinese, English, Japanese, Korean, Cantonese, 中英日韩粤语)
This model is converted from
It is fine-tuned from sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2024-07-17 (Chinese, English, Japanese, Korean, Cantonese, 中英日韩粤语) using 21.8k hours of Cantonese data.
It supports the following 5 languages:
Chinese (Mandarin, 普通话)
Cantonese (粤语, 广东话)
English
Japanese
Korean
Hint
If you want a Cantonese ASR model, please choose this model.
Hint
For RKNN users, please refer to sherpa-onnx-rk3588-20-seconds-sense-voice-zh-en-ja-ko-yue-2025-09-09 (Chinese, English, Japanese, Korean, Cantonese, 中英日韩粤语).
For onnxruntime users, please refer to sherpa-onnx-sense-voice-zh-en-ja-ko-yue-int8-2025-09-09 (Chinese, English, Japanese, Korean, Cantonese, 中英日韩粤语).
In the following, we describe how to use it.
Download
Please use the following commands to download it:
wget https://github.com/k2-fsa/sherpa-ncnn/releases/download/asr-models/sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09.tar.bz2
tar xvf sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09.tar.bz2
rm sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09.tar.bz2
After downloading, you should find the following files:
ls -lh sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09/
total 918672
-rw-r--r-- 1 fangjun staff 131B Sep 13 19:17 README.md
-rw-r--r-- 1 fangjun staff 443M Sep 13 19:17 model.ncnn.bin
-rw-r--r-- 1 fangjun staff 162K Sep 13 19:17 model.ncnn.param
drwxr-xr-x 23 fangjun staff 736B Sep 13 19:17 test_wavs
-rw-r--r-- 1 fangjun staff 308K Sep 13 19:17 tokens.txt
ls sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09/test_wavs/
en.wav yue-1.wav yue-11.wav yue-13.wav yue-15.wav yue-17.wav yue-3.wav yue-5.wav yue-7.wav yue-9.wav zh.wav
yue-0.wav yue-10.wav yue-12.wav yue-14.wav yue-16.wav yue-2.wav yue-4.wav yue-6.wav yue-8.wav yue.wav
Hint
If you want to use the int8 quantized model, please run:
wget https://github.com/k2-fsa/sherpa-ncnn/releases/download/asr-models/sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-int8-2025-09-09.tar.bz2
tar xvf sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-int8-2025-09-09.tar.bz2
rm sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-int8-2025-09-09.tar.bz2
ls -lh sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-int8-2025-09-09
total 461200
-rw-r--r-- 1 fangjun staff 131B Sep 17 14:25 README.md
-rw-r--r-- 1 fangjun staff 222M Sep 17 14:26 model.ncnn.bin
-rw-r--r-- 1 fangjun staff 158K Sep 17 14:26 model.ncnn.param
drwxr-xr-x 23 fangjun staff 736B Sep 17 14:25 test_wavs
-rw-r--r-- 1 fangjun staff 308K Sep 17 14:25 tokens.txt
In the following, we show how to decode the files sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09/test_wavs/yue-*.wav.
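The sections below decode each file individually. To run all of them at once, a simple loop is enough. This is a sketch. The function name and the `SHERPA_NCNN_OFFLINE` override are hypothetical, and the indices 0 to 17 come from the `ls` listing of `test_wavs/` above.

```bash
# Decode test_wavs/yue-0.wav ... yue-17.wav with the given model directory.
# Files that do not exist are skipped.
decode_yue_wavs() {
  local model_dir="$1"
  local bin="${SHERPA_NCNN_OFFLINE:-./build/bin/sherpa-ncnn-offline}"
  local i wav
  for i in $(seq 0 17); do
    wav="$model_dir/test_wavs/yue-$i.wav"
    [ -f "$wav" ] || continue
    "$bin" \
      --tokens="$model_dir/tokens.txt" \
      --sense-voice-model-dir="$model_dir" \
      --num-threads=1 \
      "$wav"
  done
}

# Example:
# decode_yue_wavs ./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09
```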
yue-0.wav
Wave filename | Ground truth |
---|---|
yue-0.wav | 两只小企鹅都有嘢食 |
./build/bin/sherpa-ncnn-offline \
--tokens=./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09/tokens.txt \
--sense-voice-model-dir=./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09 \
--num-threads=1 \
sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09/test_wavs/yue-0.wav
/Users/fangjun/open-source/sherpa-ncnn/sherpa-ncnn/csrc/parse-options.cc:Read:381 ./build/bin/sherpa-ncnn-offline --tokens=./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09/tokens.txt --sense-voice-model-dir=./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09 --num-threads=1 sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09/test_wavs/yue-0.wav
OfflineRecognizerConfig(feat_config=FeatureExtractorConfig(sampling_rate=16000, feature_dim=80), model_config=OfflineModelConfig(sense_voice=OfflineSenseVoiceModelConfig(model_dir="./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09", language="auto", use_itn=False), tokens="./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09/tokens.txt", num_threads=1, debug=False), decoding_method="greedy_search", blank_penalty=0)
Creating recognizer ...
Started
Done!
sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09/test_wavs/yue-0.wav
{"lang": "<|yue|>", "emotion": "<|NEUTRAL|>", "event": "<|Speech|>", "text": "两只小企鹅都有嘢食", "timestamps": [0.36, 0.60, 0.90, 1.08, 1.32, 1.74, 1.98, 2.16, 2.40], "tokens":["两", "只", "小", "企", "鹅", "都", "有", "嘢", "食"]}
----
num threads: 1
decoding method: greedy_search
Elapsed seconds: 0.416 s
Real time factor (RTF): 0.416 / 3.072 = 0.135
yue-1.wav
Wave filename | Ground truth |
---|---|
yue-1.wav | 叫做诶诶直入式你个脑部里边咧记得呢一个嘅以前香港有一个广告好出名嘅佢乜嘢都冇噶净系影住喺弥敦道佢哋间铺头嘅啫但系就不停有人嗌啦平平吧平吧 |
./build/bin/sherpa-ncnn-offline \
--tokens=./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09/tokens.txt \
--sense-voice-model-dir=./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09 \
--num-threads=1 \
sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09/test_wavs/yue-1.wav
/Users/fangjun/open-source/sherpa-ncnn/sherpa-ncnn/csrc/parse-options.cc:Read:381 ./build/bin/sherpa-ncnn-offline --tokens=./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09/tokens.txt --sense-voice-model-dir=./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09 --num-threads=1 sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09/test_wavs/yue-1.wav
OfflineRecognizerConfig(feat_config=FeatureExtractorConfig(sampling_rate=16000, feature_dim=80), model_config=OfflineModelConfig(sense_voice=OfflineSenseVoiceModelConfig(model_dir="./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09", language="auto", use_itn=False), tokens="./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09/tokens.txt", num_threads=1, debug=False), decoding_method="greedy_search", blank_penalty=0)
Creating recognizer ...
Started
Done!
sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09/test_wavs/yue-1.wav
{"lang": "<|yue|>", "emotion": "<|NEUTRAL|>", "event": "<|Speech|>", "text": "叫做诶诶直入式你个脑部里边咧记得呢一个嘅以前香港有一个广告好出名嘅佢乜嘢都冇噶净系影住喺弥敦道佢哋间铺头嘅啫但系就不停有人嗌啦平平吧平吧", "timestamps": [0.06, 0.18, 0.36, 0.72, 1.08, 1.38, 1.56, 1.86, 1.98, 2.16, 2.52, 2.76, 2.88, 3.00, 3.24, 3.36, 3.60, 3.72, 3.84, 3.96, 4.20, 4.32, 4.44, 4.62, 4.74, 4.86, 4.92, 5.04, 5.16, 5.34, 5.46, 5.58, 5.88, 6.30, 6.60, 6.78, 6.90, 7.02, 7.20, 7.50, 7.68, 7.86, 7.98, 8.16, 8.28, 8.46, 8.64, 8.88, 8.94, 9.18, 9.30, 9.48, 9.66, 9.78, 10.08, 10.14, 10.26, 10.50, 10.62, 10.80, 10.92, 11.04, 11.22, 12.00, 12.72, 13.02, 13.92, 14.16], "tokens":["叫", "做", "诶", "诶", "直", "入", "式", "你", "个", "脑", "部", "里", "边", "咧", "记", "得", "呢", "一", "个", "嘅", "以", "前", "香", "港", "有", "一", "个", "广", "告", "好", "出", "名", "嘅", "佢", "乜", "嘢", "都", "冇", "噶", "净", "系", "影", "住", "喺", "弥", "敦", "道", "佢", "哋", "间", "铺", "头", "嘅", "啫", "但", "系", "就", "不", "停", "有", "人", "嗌", "啦", "平", "平", "吧", "平", "吧"]}
----
num threads: 1
decoding method: greedy_search
Elapsed seconds: 1.987 s
Real time factor (RTF): 1.987 / 15.104 = 0.132
yue-2.wav
Wave filename | Ground truth |
---|---|
yue-2.wav | 忽然从光线死角嘅阴影度窜出一只大猫 |
./build/bin/sherpa-ncnn-offline \
--tokens=./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09/tokens.txt \
--sense-voice-model-dir=./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09 \
--num-threads=1 \
sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09/test_wavs/yue-2.wav
/Users/fangjun/open-source/sherpa-ncnn/sherpa-ncnn/csrc/parse-options.cc:Read:381 ./build/bin/sherpa-ncnn-offline --tokens=./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09/tokens.txt --sense-voice-model-dir=./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09 --num-threads=1 sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09/test_wavs/yue-2.wav
OfflineRecognizerConfig(feat_config=FeatureExtractorConfig(sampling_rate=16000, feature_dim=80), model_config=OfflineModelConfig(sense_voice=OfflineSenseVoiceModelConfig(model_dir="./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09", language="auto", use_itn=False), tokens="./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09/tokens.txt", num_threads=1, debug=False), decoding_method="greedy_search", blank_penalty=0)
Creating recognizer ...
Started
Done!
sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09/test_wavs/yue-2.wav
{"lang": "<|yue|>", "emotion": "<|NEUTRAL|>", "event": "<|Speech|>", "text": "忽然从光线死角嘅阴影度窜出一只大猫", "timestamps": [0.36, 0.54, 0.96, 1.26, 1.50, 1.80, 2.04, 2.22, 2.34, 2.52, 2.76, 3.12, 3.30, 3.48, 3.60, 3.78, 3.90], "tokens":["忽", "然", "从", "光", "线", "死", "角", "嘅", "阴", "影", "度", "窜", "出", "一", "只", "大", "猫"]}
----
num threads: 1
decoding method: greedy_search
Elapsed seconds: 0.533 s
Real time factor (RTF): 0.533 / 4.608 = 0.116
yue-3.wav
Wave filename | Ground truth |
---|---|
yue-3.wav | 今日我带大家去见识一位九零后嘅靓仔咧 |
./build/bin/sherpa-ncnn-offline \
--tokens=./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09/tokens.txt \
--sense-voice-model-dir=./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09 \
--num-threads=1 \
sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09/test_wavs/yue-3.wav
/Users/fangjun/open-source/sherpa-ncnn/sherpa-ncnn/csrc/parse-options.cc:Read:381 ./build/bin/sherpa-ncnn-offline --tokens=./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09/tokens.txt --sense-voice-model-dir=./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09 --num-threads=1 sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09/test_wavs/yue-3.wav
OfflineRecognizerConfig(feat_config=FeatureExtractorConfig(sampling_rate=16000, feature_dim=80), model_config=OfflineModelConfig(sense_voice=OfflineSenseVoiceModelConfig(model_dir="./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09", language="auto", use_itn=False), tokens="./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09/tokens.txt", num_threads=1, debug=False), decoding_method="greedy_search", blank_penalty=0)
Creating recognizer ...
Started
Done!
sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09/test_wavs/yue-3.wav
{"lang": "<|yue|>", "emotion": "<|NEUTRAL|>", "event": "<|Speech|>", "text": "今日我带大家去见识一位九零后嘅靓仔咧", "timestamps": [0.24, 0.36, 0.60, 0.72, 1.02, 1.14, 1.44, 1.74, 1.92, 2.10, 2.22, 2.52, 2.76, 2.94, 3.18, 3.30, 3.48, 3.78], "tokens":["今", "日", "我", "带", "大", "家", "去", "见", "识", "一", "位", "九", "零", "后", "嘅", "靓", "仔", "咧"]}
----
num threads: 1
decoding method: greedy_search
Elapsed seconds: 0.590 s
Real time factor (RTF): 0.590 / 4.352 = 0.136
yue-4.wav
Wave filename | Ground truth |
---|---|
yue-4.wav | 香港嘅消费市场从此不一样 |
./build/bin/sherpa-ncnn-offline \
--tokens=./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09/tokens.txt \
--sense-voice-model-dir=./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09 \
--num-threads=1 \
sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09/test_wavs/yue-4.wav
/Users/fangjun/open-source/sherpa-ncnn/sherpa-ncnn/csrc/parse-options.cc:Read:381 ./build/bin/sherpa-ncnn-offline --tokens=./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09/tokens.txt --sense-voice-model-dir=./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09 --num-threads=1 sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09/test_wavs/yue-4.wav
OfflineRecognizerConfig(feat_config=FeatureExtractorConfig(sampling_rate=16000, feature_dim=80), model_config=OfflineModelConfig(sense_voice=OfflineSenseVoiceModelConfig(model_dir="./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09", language="auto", use_itn=False), tokens="./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09/tokens.txt", num_threads=1, debug=False), decoding_method="greedy_search", blank_penalty=0)
Creating recognizer ...
Started
Done!
sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09/test_wavs/yue-4.wav
{"lang": "<|yue|>", "emotion": "<|NEUTRAL|>", "event": "<|Speech|>", "text": "香港嘅消费市场从此不一样", "timestamps": [0.36, 0.54, 0.72, 0.90, 1.08, 1.38, 1.56, 1.92, 2.10, 2.40, 2.58, 2.76], "tokens":["香", "港", "嘅", "消", "费", "市", "场", "从", "此", "不", "一", "样"]}
----
num threads: 1
decoding method: greedy_search
Elapsed seconds: 0.398 s
Real time factor (RTF): 0.398 / 3.200 = 0.124
yue-5.wav
Wave filename | Ground truth |
---|---|
yue-5.wav | 景天谂唔到呢个守门嘅弟子竟然咁无礼霎时间面色都变埋 |
./build/bin/sherpa-ncnn-offline \
--tokens=./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09/tokens.txt \
--sense-voice-model-dir=./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09 \
--num-threads=1 \
sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09/test_wavs/yue-5.wav
/Users/fangjun/open-source/sherpa-ncnn/sherpa-ncnn/csrc/parse-options.cc:Read:381 ./build/bin/sherpa-ncnn-offline --tokens=./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09/tokens.txt --sense-voice-model-dir=./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09 --num-threads=1 sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09/test_wavs/yue-5.wav
OfflineRecognizerConfig(feat_config=FeatureExtractorConfig(sampling_rate=16000, feature_dim=80), model_config=OfflineModelConfig(sense_voice=OfflineSenseVoiceModelConfig(model_dir="./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09", language="auto", use_itn=False), tokens="./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09/tokens.txt", num_threads=1, debug=False), decoding_method="greedy_search", blank_penalty=0)
Creating recognizer ...
Started
Done!
sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09/test_wavs/yue-5.wav
{"lang": "<|yue|>", "emotion": "<|NEUTRAL|>", "event": "<|Speech|>", "text": "景天谂唔到呢个守门嘅弟子竟然咁无礼霎时间面色都变埋", "timestamps": [0.42, 0.60, 0.96, 1.14, 1.20, 1.38, 1.50, 1.62, 1.86, 2.04, 2.22, 2.34, 3.00, 3.24, 3.42, 3.84, 4.08, 4.80, 5.16, 5.34, 5.58, 5.82, 6.06, 6.24, 6.42], "tokens":["景", "天", "谂", "唔", "到", "呢", "个", "守", "门", "嘅", "弟", "子", "竟", "然", "咁", "无", "礼", "霎", "时", "间", "面", "色", "都", "变", "埋"]}
----
num threads: 1
decoding method: greedy_search
Elapsed seconds: 0.824 s
Real time factor (RTF): 0.824 / 7.168 = 0.115
yue-6.wav
Wave filename | Ground truth |
---|---|
yue-6.wav | 六个星期嘅课程包括六堂课同两个测验你唔掌握到基本嘅十九个声母五十六个韵母同九个声调我哋仲针对咗广东话学习者会遇到嘅大樽颈啊以国语为母语人士最难掌握嘅五大韵母教课书唔会教你嘅七种变音同十种变调说话生硬唔自然嘅根本性问题提供全新嘅学习方向等你突破难关 |
./build/bin/sherpa-ncnn-offline \
--tokens=./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09/tokens.txt \
--sense-voice-model-dir=./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09 \
--num-threads=1 \
sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09/test_wavs/yue-6.wav
/Users/fangjun/open-source/sherpa-ncnn/sherpa-ncnn/csrc/parse-options.cc:Read:381 ./build/bin/sherpa-ncnn-offline --tokens=./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09/tokens.txt --sense-voice-model-dir=./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09 --num-threads=1 sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09/test_wavs/yue-6.wav
OfflineRecognizerConfig(feat_config=FeatureExtractorConfig(sampling_rate=16000, feature_dim=80), model_config=OfflineModelConfig(sense_voice=OfflineSenseVoiceModelConfig(model_dir="./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09", language="auto", use_itn=False), tokens="./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09/tokens.txt", num_threads=1, debug=False), decoding_method="greedy_search", blank_penalty=0)
Creating recognizer ...
Started
Done!
sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09/test_wavs/yue-6.wav
{"lang": "<|yue|>", "emotion": "<|NEUTRAL|>", "event": "<|Speech|>", "text": "六个星期嘅课程包括六堂课同两个测验你只掌握到基本嘅十九个声母五十六个韵母同九个声调我哋仲针对咗广东话学习者会遇到嘅大樽颈啊以国语为母语人士最难掌握嘅五大韵母教课书唔会教你嘅七种变音同十种变调说话生硬唔自然嘅根本性问题提供全新嘅学习方向等你突破难关", "timestamps": [0.36, 0.66, 0.84, 1.08, 1.26, 1.44, 1.68, 2.16, 2.34, 2.58, 2.76, 2.94, 3.36, 3.60, 3.78, 4.02, 4.26, 4.80, 5.16, 5.40, 5.52, 5.70, 5.94, 6.06, 6.30, 6.54, 6.78, 6.96, 7.08, 7.32, 7.68, 7.80, 7.98, 8.10, 8.28, 8.52, 8.88, 9.12, 9.36, 9.54, 9.72, 10.14, 10.26, 10.44, 10.56, 10.74, 10.92, 11.22, 11.34, 11.52, 11.70, 11.82, 12.00, 12.42, 12.66, 12.84, 13.02, 13.44, 13.74, 13.98, 14.22, 14.52, 14.82, 15.00, 15.24, 15.42, 15.60, 15.84, 15.90, 16.32, 16.62, 16.86, 17.10, 17.28, 17.64, 17.82, 18.06, 18.30, 18.78, 19.02, 19.20, 19.50, 19.62, 19.80, 19.98, 20.16, 20.34, 20.58, 20.82, 21.00, 21.30, 21.54, 21.78, 22.02, 22.20, 22.98, 23.28, 23.52, 23.70, 24.18, 24.36, 24.60, 24.78, 25.14, 25.38, 25.68, 25.92, 26.04, 26.52, 26.70, 27.00, 27.18, 27.42, 27.60, 27.72, 27.90, 28.08, 28.50, 28.74, 29.28, 29.46, 29.76, 29.94], "tokens":["六", "个", "星", "期", "嘅", "课", "程", "包", "括", "六", "堂", "课", "同", "两", "个", "测", "验", "你", "只", "掌", "握", "到", "基", "本", "嘅", "十", "九", "个", "声", "母", "五", "十", "六", "个", "韵", "母", "同", "九", "个", "声", "调", "我", "哋", "仲", "针", "对", "咗", "广", "东", "话", "学", "习", "者", "会", "遇", "到", "嘅", "大", "樽", "颈", "啊", "以", "国", "语", "为", "母", "语", "人", "士", "最", "难", "掌", "握", "嘅", "五", "大", "韵", "母", "教", "课", "书", "唔", "会", "教", "你", "嘅", "七", "种", "变", "音", "同", "十", "种", "变", "调", "说", "话", "生", "硬", "唔", "自", "然", "嘅", "根", "本", "性", "问", "题", "提", "供", "全", "新", "嘅", "学", "习", "方", "向", "等", "你", "突", "破", "难", "关"]}
----
num threads: 1
decoding method: greedy_search
Elapsed seconds: 3.758 s
Real time factor (RTF): 3.758 / 30.592 = 0.123
yue-7.wav
Wave filename | Ground truth |
---|---|
yue-7.wav | 同意嘅累积唔系阴同阳嘅累积可以讲三既融合咗一同意融合咗阴同阳 |
./build/bin/sherpa-ncnn-offline \
--tokens=./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09/tokens.txt \
--sense-voice-model-dir=./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09 \
--num-threads=1 \
sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09/test_wavs/yue-7.wav
/Users/fangjun/open-source/sherpa-ncnn/sherpa-ncnn/csrc/parse-options.cc:Read:381 ./build/bin/sherpa-ncnn-offline --tokens=./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09/tokens.txt --sense-voice-model-dir=./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09 --num-threads=1 sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09/test_wavs/yue-7.wav
OfflineRecognizerConfig(feat_config=FeatureExtractorConfig(sampling_rate=16000, feature_dim=80), model_config=OfflineModelConfig(sense_voice=OfflineSenseVoiceModelConfig(model_dir="./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09", language="auto", use_itn=False), tokens="./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09/tokens.txt", num_threads=1, debug=False), decoding_method="greedy_search", blank_penalty=0)
Creating recognizer ...
Started
Done!
sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09/test_wavs/yue-7.wav
{"lang": "<|yue|>", "emotion": "<|NEUTRAL|>", "event": "<|Speech|>", "text": "同二嘅累积唔系阴同阳嘅累积可以讲三既融合咗一同二融合咗阴同阳", "timestamps": [0.48, 0.84, 1.20, 1.38, 1.56, 2.52, 2.70, 3.00, 3.42, 3.66, 3.96, 4.20, 4.38, 5.40, 5.76, 6.00, 6.78, 7.86, 8.28, 8.46, 8.70, 9.24, 9.72, 10.08, 11.28, 11.46, 11.70, 12.12, 12.54, 12.78], "tokens":["同", "二", "嘅", "累", "积", "唔", "系", "阴", "同", "阳", "嘅", "累", "积", "可", "以", "讲", "三", "既", "融", "合", "咗", "一", "同", "二", "融", "合", "咗", "阴", "同", "阳"]}
----
num threads: 1
decoding method: greedy_search
Elapsed seconds: 1.642 s
Real time factor (RTF): 1.642 / 13.900 = 0.118
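Every test wave on this page is decoded with the same command, and only the file name changes. The sketch below loops over all `yue-*.wav` files using exactly the flags shown above, assuming the directory layout used on this page. `decode_all`, `MODEL_DIR` and `BIN` are hypothetical names introduced for illustration.

```shell
# Decode every Cantonese test wave with the options used on this page.
# MODEL_DIR, BIN and decode_all are hypothetical names; override the two
# variables if your build or model directory lives elsewhere.
MODEL_DIR=${MODEL_DIR:-./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09}
BIN=${BIN:-./build/bin/sherpa-ncnn-offline}

decode_all() {
  for wav in "$MODEL_DIR"/test_wavs/yue-*.wav; do
    [ -e "$wav" ] || continue   # skip the literal pattern when nothing matches
    "$BIN" \
      --tokens="$MODEL_DIR/tokens.txt" \
      --sense-voice-model-dir="$MODEL_DIR" \
      --num-threads=1 \
      "$wav"
  done
}

decode_all
```

To try a different thread count, change `--num-threads=1` in the loop.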
yue-8.wav
Wave filename | Ground truth |
---|---|
yue-8.wav | 而较早前已经复航嘅氹仔北安码头星期五开始增设夜间航班不过两个码头暂时都冇凌晨班次有旅客希望尽快恢复可以留喺澳门长啲时间 |
./build/bin/sherpa-ncnn-offline \
--tokens=./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09/tokens.txt \
--sense-voice-model-dir=./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09 \
--num-threads=1 \
sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09/test_wavs/yue-8.wav
/Users/fangjun/open-source/sherpa-ncnn/sherpa-ncnn/csrc/parse-options.cc:Read:381 ./build/bin/sherpa-ncnn-offline --tokens=./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09/tokens.txt --sense-voice-model-dir=./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09 --num-threads=1 sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09/test_wavs/yue-8.wav
OfflineRecognizerConfig(feat_config=FeatureExtractorConfig(sampling_rate=16000, feature_dim=80), model_config=OfflineModelConfig(sense_voice=OfflineSenseVoiceModelConfig(model_dir="./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09", language="auto", use_itn=False), tokens="./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09/tokens.txt", num_threads=1, debug=False), decoding_method="greedy_search", blank_penalty=0)
Creating recognizer ...
Started
Done!
sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09/test_wavs/yue-8.wav
{"lang": "<|yue|>", "emotion": "<|NEUTRAL|>", "event": "<|Speech|>", "text": "而较早前已经复航嘅氹仔北安码头星期五开始增设夜间航班不过两个码头暂时都冇凌晨班次有旅客希望尽快恢复可以留喺澳门长啲时间", "timestamps": [0.30, 0.54, 0.72, 0.90, 1.14, 1.26, 1.50, 1.68, 1.86, 2.04, 2.28, 2.58, 2.70, 3.00, 3.12, 3.42, 3.60, 3.78, 4.02, 4.14, 4.44, 4.62, 4.92, 5.04, 5.28, 5.40, 6.12, 6.36, 6.60, 6.78, 6.96, 7.14, 7.44, 7.62, 7.80, 7.98, 8.16, 8.34, 8.58, 8.76, 9.54, 9.72, 9.90, 10.14, 10.26, 10.50, 10.62, 10.92, 11.10, 11.58, 11.70, 11.94, 12.06, 12.30, 12.48, 12.78, 12.96, 13.20, 13.44], "tokens":["而", "较", "早", "前", "已", "经", "复", "航", "嘅", "氹", "仔", "北", "安", "码", "头", "星", "期", "五", "开", "始", "增", "设", "夜", "间", "航", "班", "不", "过", "两", "个", "码", "头", "暂", "时", "都", "冇", "凌", "晨", "班", "次", "有", "旅", "客", "希", "望", "尽", "快", "恢", "复", "可", "以", "留", "喺", "澳", "门", "长", "啲", "时", "间"]}
----
num threads: 1
decoding method: greedy_search
Elapsed seconds: 1.655 s
Real time factor (RTF): 1.655 / 14.080 = 0.118
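Averaging per-file RTF values weights short and long files equally. A common alternative divides the total elapsed time by the total audio duration. The sketch below assumes the output of several runs has been saved to a log and that each run prints a `Real time factor (RTF): ELAPSED / DURATION = RTF` line as above. The helper `total_rtf` and the file name `decode.log` are hypothetical.

```shell
# total_rtf (hypothetical helper): reads decoder output on stdin, sums the
# ELAPSED and DURATION fields of every line of the form
#   Real time factor (RTF): ELAPSED / DURATION = RTF
# and prints the overall RTF = total elapsed / total duration.
total_rtf() {
  awk '$1 == "Real" && $4 == "(RTF):" { elapsed += $5; duration += $7; n++ }
       END {
         if (n > 0 && duration > 0)
           printf "%d runs, %.3f / %.3f = %.3f\n", n, elapsed, duration, elapsed / duration
       }'
}

# Example: append the output of several runs to a log, then summarize it.
#   ./build/bin/sherpa-ncnn-offline <options as above> some.wav >> decode.log 2>&1
#   total_rtf < decode.log
```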
yue-9.wav
Wave filename | Ground truth |
---|---|
yue-9.wav | 刘备仲马鞭一指蜀兵一齐掩杀过去打到吴兵大败唉刘备八路兵马以雷霆万钧之势啊杀到吴兵啊尸横遍野血流成河 |
./build/bin/sherpa-ncnn-offline \
--tokens=./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09/tokens.txt \
--sense-voice-model-dir=./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09 \
--num-threads=1 \
sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09/test_wavs/yue-9.wav
/Users/fangjun/open-source/sherpa-ncnn/sherpa-ncnn/csrc/parse-options.cc:Read:381 ./build/bin/sherpa-ncnn-offline --tokens=./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09/tokens.txt --sense-voice-model-dir=./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09 --num-threads=1 sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09/test_wavs/yue-9.wav
OfflineRecognizerConfig(feat_config=FeatureExtractorConfig(sampling_rate=16000, feature_dim=80), model_config=OfflineModelConfig(sense_voice=OfflineSenseVoiceModelConfig(model_dir="./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09", language="auto", use_itn=False), tokens="./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09/tokens.txt", num_threads=1, debug=False), decoding_method="greedy_search", blank_penalty=0)
Creating recognizer ...
Started
Done!
sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09/test_wavs/yue-9.wav
{"lang": "<|yue|>", "emotion": "<|NEUTRAL|>", "event": "<|Speech|>", "text": "刘备仲马鞭得指蜀兵一齐掩杀过去打到吴兵大败唉刘备八路兵马以雷霆万军之势啊杀到吴兵啊尸横遍野血流成河", "timestamps": [0.30, 0.48, 0.72, 0.90, 1.14, 1.32, 1.44, 2.22, 2.58, 2.88, 3.06, 3.42, 3.60, 3.90, 3.96, 4.32, 4.50, 4.68, 4.92, 5.28, 5.46, 6.06, 6.60, 6.84, 7.26, 7.56, 7.74, 7.98, 8.58, 8.88, 9.12, 9.36, 9.60, 9.84, 10.08, 10.26, 10.38, 10.56, 10.80, 10.98, 11.22, 11.52, 12.12, 12.36, 12.66, 12.90, 13.14, 13.32, 13.50], "tokens":["刘", "备", "仲", "马", "鞭", "得", "指", "蜀", "兵", "一", "齐", "掩", "杀", "过", "去", "打", "到", "吴", "兵", "大", "败", "唉", "刘", "备", "八", "路", "兵", "马", "以", "雷", "霆", "万", "军", "之", "势", "啊", "杀", "到", "吴", "兵", "啊", "尸", "横", "遍", "野", "血", "流", "成", "河"]}
----
num threads: 1
decoding method: greedy_search
Elapsed seconds: 1.663 s
Real time factor (RTF): 1.663 / 14.336 = 0.116
yue-10.wav
Wave filename | Ground truth |
---|---|
yue-10.wav | 原来王力宏咧系佢家中里面咧成就最低个吓哇 |
./build/bin/sherpa-ncnn-offline \
--tokens=./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09/tokens.txt \
--sense-voice-model-dir=./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09 \
--num-threads=1 \
sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09/test_wavs/yue-10.wav
/Users/fangjun/open-source/sherpa-ncnn/sherpa-ncnn/csrc/parse-options.cc:Read:381 ./build/bin/sherpa-ncnn-offline --tokens=./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09/tokens.txt --sense-voice-model-dir=./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09 --num-threads=1 sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09/test_wavs/yue-10.wav
OfflineRecognizerConfig(feat_config=FeatureExtractorConfig(sampling_rate=16000, feature_dim=80), model_config=OfflineModelConfig(sense_voice=OfflineSenseVoiceModelConfig(model_dir="./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09", language="auto", use_itn=False), tokens="./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09/tokens.txt", num_threads=1, debug=False), decoding_method="greedy_search", blank_penalty=0)
Creating recognizer ...
Started
Done!
sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09/test_wavs/yue-10.wav
{"lang": "<|yue|>", "emotion": "<|NEUTRAL|>", "event": "<|Speech|>", "text": "原来王力宏咧系佢家中里边咧成就最低个吓哇", "timestamps": [0.42, 0.54, 0.90, 1.14, 1.44, 1.62, 1.80, 1.92, 2.16, 2.34, 2.58, 2.70, 2.82, 3.06, 3.24, 3.54, 3.78, 4.26, 4.92, 5.76], "tokens":["原", "来", "王", "力", "宏", "咧", "系", "佢", "家", "中", "里", "边", "咧", "成", "就", "最", "低", "个", "吓", "哇"]}
----
num threads: 1
decoding method: greedy_search
Elapsed seconds: 0.749 s
Real time factor (RTF): 0.749 / 6.656 = 0.113
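The `timestamps` and `tokens` arrays in each result line up index by index. They can therefore be printed side by side to see when each character was recognized. The sketch below uses `sed`, `tr` and `paste` and assumes the field layout printed above. It treats each timestamp as the start time (in seconds) of its token. `token_times` is a hypothetical helper.

```shell
# token_times (hypothetical helper): read one result line on stdin and print
# "timestamp<TAB>token" pairs, one per line.
token_times() {
  json=$(cat)
  times_file=$(mktemp)
  tokens_file=$(mktemp)
  # Extract the timestamps array and put one value per line.
  printf '%s\n' "$json" | sed -n 's/.*"timestamps": *\[\([^]]*\)].*/\1/p' |
    tr -d ' ' | tr ',' '\n' > "$times_file"
  # Extract the tokens array, drop spaces and quotes, one token per line.
  printf '%s\n' "$json" | sed -n 's/.*"tokens": *\[\([^]]*\)].*/\1/p' |
    tr -d ' "' | tr ',' '\n' > "$tokens_file"
  paste "$times_file" "$tokens_file"
  rm -f "$times_file" "$tokens_file"
}

# Result line for yue-10.wav, copied from the output above.
line='{"lang": "<|yue|>", "emotion": "<|NEUTRAL|>", "event": "<|Speech|>", "text": "原来王力宏咧系佢家中里边咧成就最低个吓哇", "timestamps": [0.42, 0.54, 0.90, 1.14, 1.44, 1.62, 1.80, 1.92, 2.16, 2.34, 2.58, 2.70, 2.82, 3.06, 3.24, 3.54, 3.78, 4.26, 4.92, 5.76], "tokens":["原", "来", "王", "力", "宏", "咧", "系", "佢", "家", "中", "里", "边", "咧", "成", "就", "最", "低", "个", "吓", "哇"]}'
printf '%s\n' "$line" | token_times   # first line: 0.42<TAB>原
```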
yue-11.wav
Wave filename | Ground truth |
---|---|
yue-11.wav | 无论你提出任何嘅要求 |
./build/bin/sherpa-ncnn-offline \
--tokens=./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09/tokens.txt \
--sense-voice-model-dir=./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09 \
--num-threads=1 \
sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09/test_wavs/yue-11.wav
/Users/fangjun/open-source/sherpa-ncnn/sherpa-ncnn/csrc/parse-options.cc:Read:381 ./build/bin/sherpa-ncnn-offline --tokens=./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09/tokens.txt --sense-voice-model-dir=./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09 --num-threads=1 sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09/test_wavs/yue-11.wav
OfflineRecognizerConfig(feat_config=FeatureExtractorConfig(sampling_rate=16000, feature_dim=80), model_config=OfflineModelConfig(sense_voice=OfflineSenseVoiceModelConfig(model_dir="./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09", language="auto", use_itn=False), tokens="./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09/tokens.txt", num_threads=1, debug=False), decoding_method="greedy_search", blank_penalty=0)
Creating recognizer ...
Started
Done!
sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09/test_wavs/yue-11.wav
{"lang": "<|yue|>", "emotion": "<|NEUTRAL|>", "event": "<|Speech|>", "text": "无论你提出任何嘅要求", "timestamps": [0.48, 0.60, 0.78, 1.02, 1.14, 1.32, 1.50, 1.68, 1.86, 2.10], "tokens":["无", "论", "你", "提", "出", "任", "何", "嘅", "要", "求"]}
----
num threads: 1
decoding method: greedy_search
Elapsed seconds: 0.354 s
Real time factor (RTF): 0.354 / 2.688 = 0.132
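Each result is printed as a single JSON line. To extract only the recognized text in a script, the `sed` sketch below works for the lines on this page. It assumes the text contains no double quotes. The helper name `result_text` is hypothetical, and a JSON tool such as `jq` is the more robust choice.

```shell
# result_text (hypothetical helper): print the "text" field of one result
# line read from stdin. Assumes the text itself contains no double quotes.
result_text() {
  sed -n 's/.*"text": "\([^"]*\)".*/\1/p'
}

# Result line for yue-11.wav, copied from the output above.
line='{"lang": "<|yue|>", "emotion": "<|NEUTRAL|>", "event": "<|Speech|>", "text": "无论你提出任何嘅要求", "timestamps": [0.48, 0.60, 0.78, 1.02, 1.14, 1.32, 1.50, 1.68, 1.86, 2.10], "tokens":["无", "论", "你", "提", "出", "任", "何", "嘅", "要", "求"]}'
printf '%s\n' "$line" | result_text   # prints 无论你提出任何嘅要求
```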
yue-12.wav
Wave filename | Ground truth |
---|---|
yue-12.wav | 咁咁多样材料咁我哋首先第一步处理咗一件 |
./build/bin/sherpa-ncnn-offline \
--tokens=./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09/tokens.txt \
--sense-voice-model-dir=./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09 \
--num-threads=1 \
sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09/test_wavs/yue-12.wav
/Users/fangjun/open-source/sherpa-ncnn/sherpa-ncnn/csrc/parse-options.cc:Read:381 ./build/bin/sherpa-ncnn-offline --tokens=./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09/tokens.txt --sense-voice-model-dir=./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09 --num-threads=1 sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09/test_wavs/yue-12.wav
OfflineRecognizerConfig(feat_config=FeatureExtractorConfig(sampling_rate=16000, feature_dim=80), model_config=OfflineModelConfig(sense_voice=OfflineSenseVoiceModelConfig(model_dir="./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09", language="auto", use_itn=False), tokens="./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09/tokens.txt", num_threads=1, debug=False), decoding_method="greedy_search", blank_penalty=0)
Creating recognizer ...
Started
Done!
sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09/test_wavs/yue-12.wav
{"lang": "<|yue|>", "emotion": "<|NEUTRAL|>", "event": "<|Speech|>", "text": "咁咁多样材料咁我哋首先第一步处理咗一件", "timestamps": [0.30, 0.72, 0.90, 1.14, 1.38, 1.56, 1.92, 2.10, 2.22, 2.34, 2.58, 2.88, 3.00, 3.18, 3.60, 3.84, 4.02, 4.14, 4.26], "tokens":["咁", "咁", "多", "样", "材", "料", "咁", "我", "哋", "首", "先", "第", "一", "步", "处", "理", "咗", "一", "件"]}
----
num threads: 1
decoding method: greedy_search
Elapsed seconds: 0.592 s
Real time factor (RTF): 0.592 / 4.864 = 0.122
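In the results shown here, the `timestamps` array has one entry per element of `tokens`. The shell check below tests that property. It assumes the field layout printed above and that no token is itself an ASCII comma. `count_field` is a hypothetical helper.

```shell
# count_field (hypothetical helper): count the entries of one array field
# ("timestamps" or "tokens") in a result line read from stdin.
count_field() {
  sed -n "s/.*\"$1\": *\[\([^]]*\)].*/\1/p" | tr ',' '\n' | grep -c .
}

# Result line for yue-12.wav, copied from the output above.
line='{"lang": "<|yue|>", "emotion": "<|NEUTRAL|>", "event": "<|Speech|>", "text": "咁咁多样材料咁我哋首先第一步处理咗一件", "timestamps": [0.30, 0.72, 0.90, 1.14, 1.38, 1.56, 1.92, 2.10, 2.22, 2.34, 2.58, 2.88, 3.00, 3.18, 3.60, 3.84, 4.02, 4.14, 4.26], "tokens":["咁", "咁", "多", "样", "材", "料", "咁", "我", "哋", "首", "先", "第", "一", "步", "处", "理", "咗", "一", "件"]}'
printf '%s\n' "$line" | count_field timestamps   # prints 19
printf '%s\n' "$line" | count_field tokens       # prints 19
```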
yue-13.wav
Wave filename | Ground truth |
---|---|
yue-13.wav | 啲点样对于佢哋嘅服务态度啊不透过呢一年左右嘅时间啦其实大家都静一静啦咁你就会见到香港嘅经济其实 |
./build/bin/sherpa-ncnn-offline \
--tokens=./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09/tokens.txt \
--sense-voice-model-dir=./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09 \
--num-threads=1 \
sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09/test_wavs/yue-13.wav
/Users/fangjun/open-source/sherpa-ncnn/sherpa-ncnn/csrc/parse-options.cc:Read:381 ./build/bin/sherpa-ncnn-offline --tokens=./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09/tokens.txt --sense-voice-model-dir=./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09 --num-threads=1 sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09/test_wavs/yue-13.wav
OfflineRecognizerConfig(feat_config=FeatureExtractorConfig(sampling_rate=16000, feature_dim=80), model_config=OfflineModelConfig(sense_voice=OfflineSenseVoiceModelConfig(model_dir="./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09", language="auto", use_itn=False), tokens="./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09/tokens.txt", num_threads=1, debug=False), decoding_method="greedy_search", blank_penalty=0)
Creating recognizer ...
Started
Done!
sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09/test_wavs/yue-13.wav
{"lang": "<|yue|>", "emotion": "<|NEUTRAL|>", "event": "<|Speech|>", "text": "啲点样对于佢哋嘅服务态度啊希透过呢一年左右嘅时间啦其实大家都静一静啦咁你就会见到香港嘅经济其实", "timestamps": [0.00, 0.24, 0.42, 0.72, 0.84, 1.08, 1.20, 1.68, 2.16, 2.34, 2.58, 2.76, 2.94, 3.24, 3.54, 3.72, 4.02, 4.32, 4.50, 4.80, 4.98, 5.16, 5.34, 5.52, 5.70, 6.06, 6.24, 6.48, 6.60, 6.78, 7.02, 7.20, 7.38, 7.56, 7.92, 8.16, 8.34, 8.52, 8.70, 8.82, 9.00, 9.18, 9.36, 9.48, 9.66, 9.96, 10.14], "tokens":["啲", "点", "样", "对", "于", "佢", "哋", "嘅", "服", "务", "态", "度", "啊", "希", "透", "过", "呢", "一", "年", "左", "右", "嘅", "时", "间", "啦", "其", "实", "大", "家", "都", "静", "一", "静", "啦", "咁", "你", "就", "会", "见", "到", "香", "港", "嘅", "经", "济", "其", "实"]}
----
num threads: 1
decoding method: greedy_search
Elapsed seconds: 1.270 s
Real time factor (RTF): 1.270 / 10.624 = 0.120
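Besides the text, each result carries three SenseVoice tags: the detected language (`<|yue|>` for every file here), the emotion and the audio event. The sketch below prints them without the `<|` and `|>` delimiters. It assumes the keys appear in the order shown above. `result_tags` is a hypothetical helper.

```shell
# result_tags (hypothetical helper): print "LANG EMOTION EVENT" for one
# result line read from stdin, stripping the <| |> delimiters.
result_tags() {
  sed -n 's/.*"lang": "<|\([^|]*\)|>", "emotion": "<|\([^|]*\)|>", "event": "<|\([^|]*\)|>".*/\1 \2 \3/p'
}

# Result line for yue-11.wav, copied from the output above.
line='{"lang": "<|yue|>", "emotion": "<|NEUTRAL|>", "event": "<|Speech|>", "text": "无论你提出任何嘅要求", "timestamps": [0.48, 0.60, 0.78, 1.02, 1.14, 1.32, 1.50, 1.68, 1.86, 2.10], "tokens":["无", "论", "你", "提", "出", "任", "何", "嘅", "要", "求"]}'
printf '%s\n' "$line" | result_tags   # prints: yue NEUTRAL Speech
```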
yue-14.wav
Wave filename | Ground truth |
---|---|
yue-14.wav | 就即刻会同贵正两位八代长老带埋五名七代弟子前啲灵蛇岛想话生擒谢信抢咗屠龙宝刀翻嚟献俾帮主嘅 |
./build/bin/sherpa-ncnn-offline \
--tokens=./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09/tokens.txt \
--sense-voice-model-dir=./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09 \
--num-threads=1 \
sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09/test_wavs/yue-14.wav
/Users/fangjun/open-source/sherpa-ncnn/sherpa-ncnn/csrc/parse-options.cc:Read:381 ./build/bin/sherpa-ncnn-offline --tokens=./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09/tokens.txt --sense-voice-model-dir=./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09 --num-threads=1 sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09/test_wavs/yue-14.wav
OfflineRecognizerConfig(feat_config=FeatureExtractorConfig(sampling_rate=16000, feature_dim=80), model_config=OfflineModelConfig(sense_voice=OfflineSenseVoiceModelConfig(model_dir="./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09", language="auto", use_itn=False), tokens="./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09/tokens.txt", num_threads=1, debug=False), decoding_method="greedy_search", blank_penalty=0)
Creating recognizer ...
Started
Done!
sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09/test_wavs/yue-14.wav
{"lang": "<|yue|>", "emotion": "<|NEUTRAL|>", "event": "<|Speech|>", "text": "就即刻会同贵正两位八代长老带埋五名七代弟子前啲灵蛇岛想话生擒谢信抢咗屠龙宝刀翻嚟献俾帮主嘅", "timestamps": [0.18, 0.36, 0.48, 0.72, 0.84, 1.20, 1.44, 1.74, 1.92, 2.10, 2.28, 2.52, 2.76, 3.60, 3.84, 4.14, 4.32, 4.56, 4.80, 5.04, 5.22, 5.88, 6.12, 6.24, 6.42, 6.78, 7.68, 7.92, 8.16, 8.52, 8.82, 9.18, 9.96, 10.26, 10.38, 10.62, 10.86, 11.10, 11.22, 11.40, 11.64, 11.88, 12.18, 12.30, 12.66], "tokens":["就", "即", "刻", "会", "同", "贵", "正", "两", "位", "八", "代", "长", "老", "带", "埋", "五", "名", "七", "代", "弟", "子", "前", "啲", "灵", "蛇", "岛", "想", "话", "生", "擒", "谢", "信", "抢", "咗", "屠", "龙", "宝", "刀", "翻", "嚟", "献", "俾", "帮", "主", "嘅"]}
----
num threads: 1
decoding method: greedy_search
Elapsed seconds: 1.577 s
Real time factor (RTF): 1.577 / 13.056 = 0.121
yue-15.wav
Wave filename | Ground truth |
---|---|
yue-15.wav | 我知道我的观众大部分都是对广东话有兴趣想学广东话的人 |
./build/bin/sherpa-ncnn-offline \
--tokens=./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09/tokens.txt \
--sense-voice-model-dir=./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09 \
--num-threads=1 \
sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09/test_wavs/yue-15.wav
/Users/fangjun/open-source/sherpa-ncnn/sherpa-ncnn/csrc/parse-options.cc:Read:381 ./build/bin/sherpa-ncnn-offline --tokens=./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09/tokens.txt --sense-voice-model-dir=./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09 --num-threads=1 sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09/test_wavs/yue-15.wav
OfflineRecognizerConfig(feat_config=FeatureExtractorConfig(sampling_rate=16000, feature_dim=80), model_config=OfflineModelConfig(sense_voice=OfflineSenseVoiceModelConfig(model_dir="./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09", language="auto", use_itn=False), tokens="./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09/tokens.txt", num_threads=1, debug=False), decoding_method="greedy_search", blank_penalty=0)
Creating recognizer ...
Started
Done!
sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09/test_wavs/yue-15.wav
{"lang": "<|yue|>", "emotion": "<|NEUTRAL|>", "event": "<|Speech|>", "text": "我知道我嘅观众大部分都系对广东话有兴趣想学广东话嘅人", "timestamps": [0.42, 0.54, 0.66, 0.84, 1.02, 1.20, 1.38, 1.98, 2.22, 2.40, 2.64, 2.76, 2.88, 3.12, 3.24, 3.42, 3.60, 3.78, 4.02, 4.62, 4.92, 5.16, 5.34, 5.52, 5.70, 5.94], "tokens":["我", "知", "道", "我", "嘅", "观", "众", "大", "部", "分", "都", "系", "对", "广", "东", "话", "有", "兴", "趣", "想", "学", "广", "东", "话", "嘅", "人"]}
----
num threads: 1
decoding method: greedy_search
Elapsed seconds: 0.753 s
Real time factor (RTF): 0.753 / 6.400 = 0.118
yue-16.wav
Wave filename | Ground truth |
---|---|
yue-16.wav | 诶原来啊我哋中国人呢讲究物极必反 |
./build/bin/sherpa-ncnn-offline \
--tokens=./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09/tokens.txt \
--sense-voice-model-dir=./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09 \
--num-threads=1 \
sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09/test_wavs/yue-16.wav
/Users/fangjun/open-source/sherpa-ncnn/sherpa-ncnn/csrc/parse-options.cc:Read:381 ./build/bin/sherpa-ncnn-offline --tokens=./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09/tokens.txt --sense-voice-model-dir=./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09 --num-threads=1 sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09/test_wavs/yue-16.wav
OfflineRecognizerConfig(feat_config=FeatureExtractorConfig(sampling_rate=16000, feature_dim=80), model_config=OfflineModelConfig(sense_voice=OfflineSenseVoiceModelConfig(model_dir="./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09", language="auto", use_itn=False), tokens="./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09/tokens.txt", num_threads=1, debug=False), decoding_method="greedy_search", blank_penalty=0)
Creating recognizer ...
Started
Done!
sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09/test_wavs/yue-16.wav
{"lang": "<|yue|>", "emotion": "<|NEUTRAL|>", "event": "<|Speech|>", "text": "原来啊我哋中国人呢讲究密极必反", "timestamps": [1.92, 2.04, 2.22, 2.64, 2.76, 2.94, 3.12, 3.36, 3.48, 3.72, 3.84, 4.02, 4.20, 4.44, 4.62], "tokens":["原", "来", "啊", "我", "哋", "中", "国", "人", "呢", "讲", "究", "密", "极", "必", "反"]}
----
num threads: 1
decoding method: greedy_search
Elapsed seconds: 0.650 s
Real time factor (RTF): 0.650 / 5.700 = 0.114
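For yue-16.wav, the first token starts at 1.92 s. Separately, the leading 诶 in the ground truth does not appear in the output. Large jumps between consecutive timestamps like this one can be listed with a small `awk` filter. The sketch below treats each timestamp as a token start time and measures the first gap from 0 s. It uses a hypothetical helper `timestamp_gaps` with an arbitrary default threshold of 0.3 s. A large gap may reflect a pause or simply a long syllable.

```shell
# timestamp_gaps (hypothetical helper): read one result line on stdin and
# print every gap between consecutive timestamps larger than THRESHOLD
# seconds (default 0.3). The first gap is measured from 0 s.
timestamp_gaps() {
  threshold=${1:-0.3}
  sed -n 's/.*"timestamps": *\[\([^]]*\)].*/\1/p' |
    tr ',' '\n' |
    awk -v t="$threshold" 'NF {
      cur = $1 + 0
      if (cur - prev > t) printf "%.2f -> %.2f (%.2f s)\n", prev, cur, cur - prev
      prev = cur
    }'
}

# Result line for yue-16.wav, copied from the output above.
line='{"lang": "<|yue|>", "emotion": "<|NEUTRAL|>", "event": "<|Speech|>", "text": "原来啊我哋中国人呢讲究密极必反", "timestamps": [1.92, 2.04, 2.22, 2.64, 2.76, 2.94, 3.12, 3.36, 3.48, 3.72, 3.84, 4.02, 4.20, 4.44, 4.62], "tokens":["原", "来", "啊", "我", "哋", "中", "国", "人", "呢", "讲", "究", "密", "极", "必", "反"]}'
printf '%s\n' "$line" | timestamp_gaps 0.3
# prints:
# 0.00 -> 1.92 (1.92 s)
# 2.22 -> 2.64 (0.42 s)
```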
yue-17.wav
Wave filename | Ground truth |
---|---|
yue-17.wav | 如果东边道建成咁丹东呢就会成为最近嘅出海港同埋经过哈大线出海相比绥分河则会减少运渠三百五十六公里 |
./build/bin/sherpa-ncnn-offline \
--tokens=./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09/tokens.txt \
--sense-voice-model-dir=./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09 \
--num-threads=1 \
sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09/test_wavs/yue-17.wav
/Users/fangjun/open-source/sherpa-ncnn/sherpa-ncnn/csrc/parse-options.cc:Read:381 ./build/bin/sherpa-ncnn-offline --tokens=./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09/tokens.txt --sense-voice-model-dir=./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09 --num-threads=1 sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09/test_wavs/yue-17.wav
OfflineRecognizerConfig(feat_config=FeatureExtractorConfig(sampling_rate=16000, feature_dim=80), model_config=OfflineModelConfig(sense_voice=OfflineSenseVoiceModelConfig(model_dir="./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09", language="auto", use_itn=False), tokens="./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09/tokens.txt", num_threads=1, debug=False), decoding_method="greedy_search", blank_penalty=0)
Creating recognizer ...
Started
Done!
sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09/test_wavs/yue-17.wav
{"lang": "<|yue|>", "emotion": "<|NEUTRAL|>", "event": "<|Speech|>", "text": "如果东边道建成咁丹东呢就会成为最近嘅出海港同埋经过哈大线出海相比绥分河将会减少运渠三百五十六公里", "timestamps": [0.48, 0.60, 0.84, 0.96, 1.20, 1.50, 1.74, 2.58, 3.00, 3.18, 3.36, 3.78, 4.02, 4.20, 4.32, 4.56, 4.74, 4.86, 5.04, 5.22, 5.46, 6.36, 6.54, 6.78, 6.90, 7.08, 7.32, 7.50, 7.80, 7.92, 8.16, 8.34, 9.24, 9.54, 9.84, 10.26, 10.50, 10.74, 10.86, 11.22, 11.40, 11.82, 12.12, 12.30, 12.48, 12.60, 12.84, 13.02], "tokens":["如", "果", "东", "边", "道", "建", "成", "咁", "丹", "东", "呢", "就", "会", "成", "为", "最", "近", "嘅", "出", "海", "港", "同", "埋", "经", "过", "哈", "大", "线", "出", "海", "相", "比", "绥", "分", "河", "将", "会", "减", "少", "运", "渠", "三", "百", "五", "十", "六", "公", "里"]}
----
num threads: 1
decoding method: greedy_search
Elapsed seconds: 1.824 s
Real time factor (RTF): 1.824 / 13.800 = 0.132