Pre-trained Models

This page describes how to download pre-trained SenseVoice models.

sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2024-07-17 (Chinese, English, Japanese, Korean, Cantonese, 中英日韩粤语)

This model is converted from https://www.modelscope.cn/models/iic/SenseVoiceSmall using the script export-ncnn.py.

It supports the following 5 languages:

  • Chinese (Mandarin, 普通话)

  • Cantonese (粤语, 广东话)

  • English

  • Japanese

  • Korean

In the following, we describe how to use it.

Download

Please use the following commands to download it:

cd /path/to/sherpa-ncnn

wget https://github.com/k2-fsa/sherpa-ncnn/releases/download/asr-models/sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2024-07-17.tar.bz2
tar xvf sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2024-07-17.tar.bz2
rm sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2024-07-17.tar.bz2
ls -lh  sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2024-07-17/

total 907400
-rw-r--r--  1 fangjun  staff    71B Sep 13 19:17 LICENSE
-rw-r--r--  1 fangjun  staff   104B Sep 13 19:17 README.md
-rw-r--r--  1 fangjun  staff   443M Sep 13 19:17 model.ncnn.bin
-rw-r--r--  1 fangjun  staff   162K Sep 13 19:17 model.ncnn.param
drwxr-xr-x  7 fangjun  staff   224B Sep 13 19:17 test_wavs
-rw-r--r--  1 fangjun  staff   308K Sep 13 19:17 tokens.txt
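
Before decoding, it can help to confirm that the archive extracted completely. The following is a minimal sketch, not part of sherpa-ncnn: `check_model_dir` is a hypothetical helper, and the list of required files is an assumption based on the directory listing above.

```bash
# check_model_dir DIR
# Hypothetical helper: verifies that DIR contains the files shown in the
# listing above. The set of required files is an assumption.
check_model_dir() {
  for f in model.ncnn.bin model.ncnn.param tokens.txt; do
    if [ ! -f "$1/$f" ]; then
      echo "missing: $1/$f" >&2
      return 1
    fi
  done
  echo "ok: $1"
}
```

For example, `check_model_dir ./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2024-07-17` prints `ok: ...` when all three files are present and reports the first missing file otherwise.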

Hint

If you want to use the int8 quantized model, please run:

wget https://github.com/k2-fsa/sherpa-ncnn/releases/download/asr-models/sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-int8-2024-07-17.tar.bz2
tar xvf sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-int8-2024-07-17.tar.bz2
rm sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-int8-2024-07-17.tar.bz2
ls -lh sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-int8-2024-07-17/

total 460696
-rw-r--r--  1 fangjun  staff    71B Sep 17 14:26 LICENSE
-rw-r--r--  1 fangjun  staff   104B Sep 17 14:26 README.md
-rw-r--r--  1 fangjun  staff   222M Sep 17 14:28 model.ncnn.bin
-rw-r--r--  1 fangjun  staff   158K Sep 17 14:28 model.ncnn.param
drwxr-xr-x  7 fangjun  staff   224B Sep 17 14:26 test_wavs
-rw-r--r--  1 fangjun  staff   308K Sep 17 14:26 tokens.txt

Decode a file

Without inverse text normalization

To decode a file without inverse text normalization, please use:

./build/bin/sherpa-ncnn-offline \
  --tokens=./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2024-07-17/tokens.txt \
  --sense-voice-model-dir=./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2024-07-17 \
  --num-threads=1 \
  ./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2024-07-17/test_wavs/zh.wav

You should see the following output:

/Users/fangjun/open-source/sherpa-ncnn/sherpa-ncnn/csrc/parse-options.cc:Read:381 ./build/bin/sherpa-ncnn-offline --tokens=./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2024-07-17/tokens.txt --sense-voice-model-dir=./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2024-07-17 --num-threads=1 ./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2024-07-17/test_wavs/zh.wav

OfflineRecognizerConfig(feat_config=FeatureExtractorConfig(sampling_rate=16000, feature_dim=80), model_config=OfflineModelConfig(sense_voice=OfflineSenseVoiceModelConfig(model_dir="./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2024-07-17", language="auto", use_itn=False), tokens="./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2024-07-17/tokens.txt", num_threads=1, debug=False), decoding_method="greedy_search", blank_penalty=0)
Creating recognizer ...
Started
Done!

./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2024-07-17/test_wavs/zh.wav
{"lang": "<|zh|>", "emotion": "<|NEUTRAL|>", "event": "<|Speech|>", "text": "开放时间早上九点至下午五点", "timestamps": [0.72, 0.96, 1.26, 1.44, 1.92, 2.10, 2.58, 2.82, 3.30, 3.90, 4.20, 4.56, 4.74], "tokens":["开", "放", "时", "间", "早", "上", "九", "点", "至", "下", "午", "五", "点"]}
----
num threads: 1
decoding method: greedy_search
Elapsed seconds: 0.661 s
Real time factor (RTF): 0.661 / 5.592 = 0.118
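
The real time factor in the last line of the log is the elapsed decoding time divided by the audio duration, so values below 1 mean decoding is faster than real time. As a small sketch of that arithmetic (`rtf` is a hypothetical helper, not a sherpa-ncnn tool):

```bash
# rtf ELAPSED_SECONDS AUDIO_SECONDS
# Hypothetical helper: prints elapsed / duration rounded to three decimals,
# the same quantity reported in the log above.
rtf() {
  awk -v elapsed="$1" -v duration="$2" 'BEGIN { printf "%.3f\n", elapsed / duration }'
}

rtf 0.661 5.592   # values taken from the log above; prints 0.118
```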

With inverse text normalization

To decode a file with inverse text normalization, please use:

./build/bin/sherpa-ncnn-offline \
  --tokens=./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2024-07-17/tokens.txt \
  --sense-voice-model-dir=./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2024-07-17 \
  --sense-voice-use-itn=1 \
  --num-threads=1 \
  ./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2024-07-17/test_wavs/zh.wav

You should see the following output:

/Users/fangjun/open-source/sherpa-ncnn/sherpa-ncnn/csrc/parse-options.cc:Read:381 ./build/bin/sherpa-ncnn-offline --tokens=./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2024-07-17/tokens.txt --sense-voice-model-dir=./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2024-07-17 --sense-voice-use-itn=1 --num-threads=1 ./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2024-07-17/test_wavs/zh.wav

OfflineRecognizerConfig(feat_config=FeatureExtractorConfig(sampling_rate=16000, feature_dim=80), model_config=OfflineModelConfig(sense_voice=OfflineSenseVoiceModelConfig(model_dir="./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2024-07-17", language="auto", use_itn=True), tokens="./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2024-07-17/tokens.txt", num_threads=1, debug=False), decoding_method="greedy_search", blank_penalty=0)
Creating recognizer ...
Started
Done!

./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2024-07-17/test_wavs/zh.wav
{"lang": "<|zh|>", "emotion": "<|NEUTRAL|>", "event": "<|Speech|>", "text": "开放时间早上9点至下午五点。", "timestamps": [0.72, 0.96, 1.26, 1.44, 1.92, 2.10, 2.58, 2.82, 3.30, 3.90, 4.20, 4.56, 4.74, 5.46], "tokens":["开", "放", "时", "间", "早", "上", "9", "点", "至", "下", "午", "五", "点", "。"]}
----
num threads: 1
decoding method: greedy_search
Elapsed seconds: 0.613 s
Real time factor (RTF): 0.613 / 5.592 = 0.110
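
Each result is printed as a single-line JSON object. If only the transcript is needed, the "text" field can be pulled out with standard tools. The sketch below uses sed and assumes that the transcript contains no escaped double quotes; `extract_text` is a hypothetical helper, and a real JSON parser would be more robust.

```bash
# extract_text: reads decoder output on stdin and prints the value of the
# "text" field from every single-line JSON result it finds.
# Assumption: the transcript contains no escaped double quotes.
extract_text() {
  sed -n 's/.*"text": "\([^"]*\)".*/\1/p'
}

# Example with an abbreviated copy of the result line shown above.
printf '%s\n' '{"lang": "<|zh|>", "text": "开放时间早上9点至下午五点。", "timestamps": [0.72]}' | extract_text
```

To use it with the decoder, pipe the output of the command above into `extract_text`. Depending on whether the binary writes its result to stdout or stderr, `2>&1` may be needed before the pipe.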

Hint

When inverse text normalization is enabled, the results contain punctuation.

Real-time speech recognition from a microphone

First, download a VAD model:

cd /path/to/sherpa-ncnn

wget https://github.com/k2-fsa/sherpa-ncnn/releases/download/models/sherpa-ncnn-silero-vad.tar.bz2
tar xvf sherpa-ncnn-silero-vad.tar.bz2
rm sherpa-ncnn-silero-vad.tar.bz2

Now, run it:

./build/bin/sherpa-ncnn-vad-microphone-simulated-streaming-asr \
  --tokens=./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2024-07-17/tokens.txt \
  --sense-voice-model-dir=./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2024-07-17 \
  --silero-vad-model-dir=./sherpa-ncnn-silero-vad \
  --num-threads=1

Hint

You can use ./build/bin/sherpa-ncnn-pa-devs to list all microphone devices.

The output of the command:

./build/bin/sherpa-ncnn-pa-devs

is given below:

PortAudio version: 0x00130700
Version text: 'PortAudio V19.7.0-devel, revision 147dd722548358763a8b649b3e4b41dfffbcfbb6'
Number of devices = 5
--------------------------------------- device #0
Name                        = Background Music
Host API                    = Core Audio
Max inputs = 2, Max outputs = 2
Default low input latency   =   0.0100
Default low output latency  =   0.0015
Default high input latency  =   0.1000
Default high output latency =   0.0116
Default sample rate         = 44100.00
Supported standard sample rates
 for half-duplex 16 bit 2 channel input = 
	 8000.00,  9600.00, 11025.00, 12000.00,
	16000.00, 22050.00, 24000.00, 32000.00,
	44100.00, 48000.00, 88200.00, 96000.00,
	192000.00
Supported standard sample rates
 for half-duplex 16 bit 2 channel output = 
	 8000.00,  9600.00, 11025.00, 12000.00,
	16000.00, 22050.00, 24000.00, 32000.00,
	44100.00, 48000.00, 88200.00, 96000.00,
	192000.00
Supported standard sample rates
 for full-duplex 16 bit 2 channel input, 2 channel output = 
	 8000.00,  9600.00, 11025.00, 12000.00,
	16000.00, 22050.00, 24000.00, 32000.00,
	44100.00, 48000.00, 88200.00, 96000.00,
	192000.00
--------------------------------------- device #1
Name                        = Background Music (UI Sounds)
Host API                    = Core Audio
Max inputs = 2, Max outputs = 2
Default low input latency   =   0.0100
Default low output latency  =   0.0015
Default high input latency  =   0.1000
Default high output latency =   0.0116
Default sample rate         = 44100.00
Supported standard sample rates
 for half-duplex 16 bit 2 channel input = 
	 8000.00,  9600.00, 11025.00, 12000.00,
	16000.00, 22050.00, 24000.00, 32000.00,
	44100.00, 48000.00, 88200.00, 96000.00,
	192000.00
Supported standard sample rates
 for half-duplex 16 bit 2 channel output = 
	 8000.00,  9600.00, 11025.00, 12000.00,
	16000.00, 22050.00, 24000.00, 32000.00,
	44100.00, 48000.00, 88200.00, 96000.00,
	192000.00
Supported standard sample rates
 for full-duplex 16 bit 2 channel input, 2 channel output = 
	 8000.00,  9600.00, 11025.00, 12000.00,
	16000.00, 22050.00, 24000.00, 32000.00,
	44100.00, 48000.00, 88200.00, 96000.00,
	192000.00
--------------------------------------- device #2
[ Default Input ]
Name                        = MacBook Pro Microphone
Host API                    = Core Audio
Max inputs = 1, Max outputs = 0
Default low input latency   =   0.0345
Default low output latency  =   0.0100
Default high input latency  =   0.0439
Default high output latency =   0.1000
Default sample rate         = 48000.00
Supported standard sample rates
 for half-duplex 16 bit 1 channel input = 
	 8000.00,  9600.00, 11025.00, 12000.00,
	16000.00, 22050.00, 24000.00, 32000.00,
	44100.00, 48000.00, 88200.00, 96000.00,
	192000.00
--------------------------------------- device #3
[ Default Output ]
Name                        = MacBook Pro Speakers
Host API                    = Core Audio
Max inputs = 0, Max outputs = 2
Default low input latency   =   0.0100
Default low output latency  =   0.0120
Default high input latency  =   0.1000
Default high output latency =   0.0214
Default sample rate         = 48000.00
Supported standard sample rates
 for half-duplex 16 bit 2 channel output = 
	 8000.00,  9600.00, 11025.00, 12000.00,
	16000.00, 22050.00, 24000.00, 32000.00,
	44100.00, 48000.00, 88200.00, 96000.00,
	192000.00
--------------------------------------- device #4
Name                        = WeMeet Audio Device
Host API                    = Core Audio
Max inputs = 2, Max outputs = 2
Default low input latency   =   0.0100
Default low output latency  =   0.0013
Default high input latency  =   0.1000
Default high output latency =   0.0107
Default sample rate         = 48000.00
Supported standard sample rates
 for half-duplex 16 bit 2 channel input = 
	 8000.00,  9600.00, 11025.00, 12000.00,
	16000.00, 22050.00, 24000.00, 32000.00,
	44100.00, 48000.00, 88200.00, 96000.00,
	192000.00
Supported standard sample rates
 for half-duplex 16 bit 2 channel output = 
	 8000.00,  9600.00, 11025.00, 12000.00,
	16000.00, 22050.00, 24000.00, 32000.00,
	44100.00, 48000.00, 88200.00, 96000.00,
	192000.00
Supported standard sample rates
 for full-duplex 16 bit 2 channel input, 2 channel output = 
	 8000.00,  9600.00, 11025.00, 12000.00,
	16000.00, 22050.00, 24000.00, 32000.00,
	44100.00, 48000.00, 88200.00, 96000.00,
	192000.00
----------------------------------------------

Hint

If you want to use device #2 with sample rate 48000, please run:

./build/bin/sherpa-ncnn-vad-microphone-simulated-streaming-asr \
  --mic-device-index=2 \
  --mic-sample-rate=48000 \
  --tokens=./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2024-07-17/tokens.txt \
  --sense-voice-model-dir=./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2024-07-17 \
  --silero-vad-model-dir=./sherpa-ncnn-silero-vad \
  --num-threads=1
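
The device index can also be picked out of the listing automatically. The following sketch assumes the output format shown above, where each device starts with a `device #N` separator line and the default microphone is marked `[ Default Input ]`. `default_input_device` is a hypothetical helper, not part of sherpa-ncnn.

```bash
# default_input_device: reads sherpa-ncnn-pa-devs output on stdin and prints
# the index of the device marked "[ Default Input ]".
default_input_device() {
  awk '
    /device #[0-9]+/      { idx = $NF; sub(/^#/, "", idx) }  # remember the latest device index
    /\[ Default Input \]/ { print idx }                      # marker follows the separator line
  '
}

# Example with a shortened copy of the listing above.
printf '%s\n' \
  '--------------------------------------- device #1' \
  'Name                        = Background Music (UI Sounds)' \
  '--------------------------------------- device #2' \
  '[ Default Input ]' \
  'Name                        = MacBook Pro Microphone' | default_input_device
```

The printed index can then be passed to `--mic-device-index`, for example by piping `./build/bin/sherpa-ncnn-pa-devs 2>&1` into the function.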

Speed test on RK3588 CPU

RTF of SenseVoice in sherpa-ncnn   1 thread   2 threads   3 threads   4 threads
Cortex A55 (fp16 quantization)     0.584      0.320       0.231       0.188
Cortex A55 (int8 quantization)     0.346      0.202       0.152       0.126
Cortex A76 (fp16 quantization)     0.142      0.079       0.063       0.049
Cortex A76 (int8 quantization)     0.097      0.062       0.045       0.035

See also Speed test on RK3588 CPU for sherpa-onnx.
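
To read the RTF numbers as speedups, divide one RTF by another. The sketch below is only arithmetic on the figures reported above; `speedup` is a hypothetical helper.

```bash
# speedup BASELINE_RTF NEW_RTF
# Hypothetical helper: prints how many times faster NEW is than BASELINE.
speedup() {
  awk -v base="$1" -v new="$2" 'BEGIN { printf "%.2f\n", base / new }'
}

speedup 0.584 0.188   # Cortex A55, fp16: 1 thread vs 4 threads; prints 3.11
speedup 0.584 0.346   # Cortex A55, 1 thread: fp16 vs int8; prints 1.69
```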

Cortex A55

# 1 cortex A55 CPU
taskset 0x01 ./build/bin/sherpa-ncnn-offline \
  --num-threads=1 \
  --tokens=./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2024-07-17/tokens.txt \
  --sense-voice-model-dir=./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2024-07-17 \
  ./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2024-07-17/test_wavs/zh.wav

# 2 cortex A55 CPUs
taskset 0x03 ./build/bin/sherpa-ncnn-offline \
  --num-threads=2 \
  --tokens=./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2024-07-17/tokens.txt \
  --sense-voice-model-dir=./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2024-07-17 \
  ./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2024-07-17/test_wavs/zh.wav

# 3 cortex A55 CPUs
taskset 0x07 ./build/bin/sherpa-ncnn-offline \
  --num-threads=3 \
  --tokens=./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2024-07-17/tokens.txt \
  --sense-voice-model-dir=./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2024-07-17 \
  ./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2024-07-17/test_wavs/zh.wav

# 4 cortex A55 CPUs
taskset 0x0f ./build/bin/sherpa-ncnn-offline \
  --num-threads=4 \
  --tokens=./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2024-07-17/tokens.txt \
  --sense-voice-model-dir=./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2024-07-17 \
  ./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2024-07-17/test_wavs/zh.wav

# For int8 models, please use
taskset 0x01 ./build/bin/sherpa-ncnn-offline \
  --num-threads=1 \
  --tokens=./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-int8-2024-07-17/tokens.txt \
  --sense-voice-model-dir=./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-int8-2024-07-17 \
  ./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-int8-2024-07-17/test_wavs/zh.wav

Cortex A76

# 1 cortex A76 CPU
taskset 0x10 ./build/bin/sherpa-ncnn-offline \
  --num-threads=1 \
  --tokens=./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2024-07-17/tokens.txt \
  --sense-voice-model-dir=./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2024-07-17 \
  ./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2024-07-17/test_wavs/zh.wav

# 2 cortex A76 CPUs
taskset 0x30 ./build/bin/sherpa-ncnn-offline \
  --num-threads=2 \
  --tokens=./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2024-07-17/tokens.txt \
  --sense-voice-model-dir=./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2024-07-17 \
  ./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2024-07-17/test_wavs/zh.wav

# 3 cortex A76 CPUs
taskset 0x70 ./build/bin/sherpa-ncnn-offline \
  --num-threads=3 \
  --tokens=./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2024-07-17/tokens.txt \
  --sense-voice-model-dir=./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2024-07-17 \
  ./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2024-07-17/test_wavs/zh.wav

# 4 cortex A76 CPUs
taskset 0xf0 ./build/bin/sherpa-ncnn-offline \
  --num-threads=4 \
  --tokens=./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2024-07-17/tokens.txt \
  --sense-voice-model-dir=./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2024-07-17 \
  ./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2024-07-17/test_wavs/zh.wav
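
The first argument to taskset in the commands above is a hexadecimal CPU affinity mask in which bit i selects CPU i. The masks used here imply that CPUs 0 to 3 are the Cortex A55 cores and CPUs 4 to 7 are the Cortex A76 cores. That mapping is an assumption inferred from the masks, so please check it on your board. A small sketch of how such masks can be computed (`cpu_mask` is a hypothetical helper):

```bash
# cpu_mask FIRST_CPU COUNT
# Hypothetical helper: prints the hexadecimal affinity mask that selects COUNT
# consecutive CPUs starting at FIRST_CPU (bit i of the mask selects CPU i).
cpu_mask() {
  printf '0x%02x\n' $(( ((1 << $2) - 1) << $1 ))
}

cpu_mask 0 1   # 0x01: one Cortex A55 core
cpu_mask 0 4   # 0x0f: four Cortex A55 cores
cpu_mask 4 3   # 0x70: three Cortex A76 cores
```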

sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09 (Chinese, English, Japanese, Korean, Cantonese, 中英日韩粤语)

This model is converted from

It is fine-tuned from sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2024-07-17 on 21.8k hours of Cantonese data.

It supports the following 5 languages:

  • Chinese (Mandarin, 普通话)

  • Cantonese (粤语, 广东话)

  • English

  • Japanese

  • Korean

Hint

If you want a Cantonese ASR model, please choose this model.

In the following, we describe how to use it.

Download

Please use the following commands to download it:

wget https://github.com/k2-fsa/sherpa-ncnn/releases/download/asr-models/sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09.tar.bz2
tar xvf sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09.tar.bz2
rm sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09.tar.bz2

After downloading, you should find the following files:

ls -lh sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09/

total 918672
-rw-r--r--   1 fangjun  staff   131B Sep 13 19:17 README.md
-rw-r--r--   1 fangjun  staff   443M Sep 13 19:17 model.ncnn.bin
-rw-r--r--   1 fangjun  staff   162K Sep 13 19:17 model.ncnn.param
drwxr-xr-x  23 fangjun  staff   736B Sep 13 19:17 test_wavs
-rw-r--r--   1 fangjun  staff   308K Sep 13 19:17 tokens.txt

ls sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09/test_wavs/

en.wav     yue-1.wav  yue-11.wav yue-13.wav yue-15.wav yue-17.wav yue-3.wav  yue-5.wav  yue-7.wav  yue-9.wav  zh.wav
yue-0.wav  yue-10.wav yue-12.wav yue-14.wav yue-16.wav yue-2.wav  yue-4.wav  yue-6.wav  yue-8.wav  yue.wav

Hint

If you want to use the int8 quantized model, please run:

wget https://github.com/k2-fsa/sherpa-ncnn/releases/download/asr-models/sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-int8-2025-09-09.tar.bz2
tar xvf sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-int8-2025-09-09.tar.bz2
rm sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-int8-2025-09-09.tar.bz2
ls -lh sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-int8-2025-09-09

total 461200
-rw-r--r--   1 fangjun  staff   131B Sep 17 14:25 README.md
-rw-r--r--   1 fangjun  staff   222M Sep 17 14:26 model.ncnn.bin
-rw-r--r--   1 fangjun  staff   158K Sep 17 14:26 model.ncnn.param
drwxr-xr-x  23 fangjun  staff   736B Sep 17 14:25 test_wavs
-rw-r--r--   1 fangjun  staff   308K Sep 17 14:25 tokens.txt

In the following, we show how to decode the files sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09/test_wavs/yue-*.wav.
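
Instead of running the decoder once per file by hand, the files can be processed in a loop. This is a minimal sketch that reuses the flags shown on this page; `decode_all` is a hypothetical helper, and the binary path is passed in as an argument.

```bash
# decode_all BIN MODEL_DIR
# Hypothetical helper: runs the offline decoder (BIN) once for every
# yue-*.wav file in MODEL_DIR/test_wavs, using the flags shown on this page.
decode_all() {
  bin=$1
  model_dir=$2
  for wav in "$model_dir"/test_wavs/yue-*.wav; do
    [ -e "$wav" ] || continue   # skip if the glob matched nothing
    "$bin" \
      --tokens="$model_dir/tokens.txt" \
      --sense-voice-model-dir="$model_dir" \
      --num-threads=1 \
      "$wav"
  done
}
```

For example: `decode_all ./build/bin/sherpa-ncnn-offline ./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09`. Files are visited in the shell's glob order, so yue-10.wav comes before yue-2.wav.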

yue-0.wav

Wave filename   Ground truth
yue-0.wav       两只小企鹅都有嘢食

./build/bin/sherpa-ncnn-offline \
  --tokens=./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09/tokens.txt \
  --sense-voice-model-dir=./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09 \
  --num-threads=1 \
  sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09/test_wavs/yue-0.wav

/Users/fangjun/open-source/sherpa-ncnn/sherpa-ncnn/csrc/parse-options.cc:Read:381 ./build/bin/sherpa-ncnn-offline --tokens=./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09/tokens.txt --sense-voice-model-dir=./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09 --num-threads=1 sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09/test_wavs/yue-0.wav 

OfflineRecognizerConfig(feat_config=FeatureExtractorConfig(sampling_rate=16000, feature_dim=80), model_config=OfflineModelConfig(sense_voice=OfflineSenseVoiceModelConfig(model_dir="./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09", language="auto", use_itn=False), tokens="./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09/tokens.txt", num_threads=1, debug=False), decoding_method="greedy_search", blank_penalty=0)
Creating recognizer ...
Started
Done!

sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09/test_wavs/yue-0.wav
{"lang": "<|yue|>", "emotion": "<|NEUTRAL|>", "event": "<|Speech|>", "text": "两只小企鹅都有嘢食", "timestamps": [0.36, 0.60, 0.90, 1.08, 1.32, 1.74, 1.98, 2.16, 2.40], "tokens":["两", "只", "小", "企", "鹅", "都", "有", "嘢", "食"]}
----
num threads: 1
decoding method: greedy_search
Elapsed seconds: 0.416 s
Real time factor (RTF): 0.416 / 3.072 = 0.135

yue-1.wav

Wave filename   Ground truth
yue-1.wav       叫做诶诶直入式你个脑部里边咧记得呢一个嘅以前香港有一个广告好出名嘅佢乜嘢都冇噶净系影住喺弥敦道佢哋间铺头嘅啫但系就不停有人嗌啦平平吧平吧

./build/bin/sherpa-ncnn-offline \
  --tokens=./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09/tokens.txt \
  --sense-voice-model-dir=./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09 \
  --num-threads=1 \
  sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09/test_wavs/yue-1.wav

/Users/fangjun/open-source/sherpa-ncnn/sherpa-ncnn/csrc/parse-options.cc:Read:381 ./build/bin/sherpa-ncnn-offline --tokens=./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09/tokens.txt --sense-voice-model-dir=./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09 --num-threads=1 sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09/test_wavs/yue-1.wav 

OfflineRecognizerConfig(feat_config=FeatureExtractorConfig(sampling_rate=16000, feature_dim=80), model_config=OfflineModelConfig(sense_voice=OfflineSenseVoiceModelConfig(model_dir="./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09", language="auto", use_itn=False), tokens="./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09/tokens.txt", num_threads=1, debug=False), decoding_method="greedy_search", blank_penalty=0)
Creating recognizer ...
Started
Done!

sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09/test_wavs/yue-1.wav
{"lang": "<|yue|>", "emotion": "<|NEUTRAL|>", "event": "<|Speech|>", "text": "叫做诶诶直入式你个脑部里边咧记得呢一个嘅以前香港有一个广告好出名嘅佢乜嘢都冇噶净系影住喺弥敦道佢哋间铺头嘅啫但系就不停有人嗌啦平平吧平吧", "timestamps": [0.06, 0.18, 0.36, 0.72, 1.08, 1.38, 1.56, 1.86, 1.98, 2.16, 2.52, 2.76, 2.88, 3.00, 3.24, 3.36, 3.60, 3.72, 3.84, 3.96, 4.20, 4.32, 4.44, 4.62, 4.74, 4.86, 4.92, 5.04, 5.16, 5.34, 5.46, 5.58, 5.88, 6.30, 6.60, 6.78, 6.90, 7.02, 7.20, 7.50, 7.68, 7.86, 7.98, 8.16, 8.28, 8.46, 8.64, 8.88, 8.94, 9.18, 9.30, 9.48, 9.66, 9.78, 10.08, 10.14, 10.26, 10.50, 10.62, 10.80, 10.92, 11.04, 11.22, 12.00, 12.72, 13.02, 13.92, 14.16], "tokens":["叫", "做", "诶", "诶", "直", "入", "式", "你", "个", "脑", "部", "里", "边", "咧", "记", "得", "呢", "一", "个", "嘅", "以", "前", "香", "港", "有", "一", "个", "广", "告", "好", "出", "名", "嘅", "佢", "乜", "嘢", "都", "冇", "噶", "净", "系", "影", "住", "喺", "弥", "敦", "道", "佢", "哋", "间", "铺", "头", "嘅", "啫", "但", "系", "就", "不", "停", "有", "人", "嗌", "啦", "平", "平", "吧", "平", "吧"]}
----
num threads: 1
decoding method: greedy_search
Elapsed seconds: 1.987 s
Real time factor (RTF): 1.987 / 15.104 = 0.132

yue-2.wav

Wave filename   Ground truth
yue-2.wav       忽然从光线死角嘅阴影度窜出一只大猫

./build/bin/sherpa-ncnn-offline \
  --tokens=./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09/tokens.txt \
  --sense-voice-model-dir=./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09 \
  --num-threads=1 \
  sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09/test_wavs/yue-2.wav

/Users/fangjun/open-source/sherpa-ncnn/sherpa-ncnn/csrc/parse-options.cc:Read:381 ./build/bin/sherpa-ncnn-offline --tokens=./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09/tokens.txt --sense-voice-model-dir=./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09 --num-threads=1 sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09/test_wavs/yue-2.wav 

OfflineRecognizerConfig(feat_config=FeatureExtractorConfig(sampling_rate=16000, feature_dim=80), model_config=OfflineModelConfig(sense_voice=OfflineSenseVoiceModelConfig(model_dir="./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09", language="auto", use_itn=False), tokens="./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09/tokens.txt", num_threads=1, debug=False), decoding_method="greedy_search", blank_penalty=0)
Creating recognizer ...
Started
Done!

sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09/test_wavs/yue-2.wav
{"lang": "<|yue|>", "emotion": "<|NEUTRAL|>", "event": "<|Speech|>", "text": "忽然从光线死角嘅阴影度窜出一只大猫", "timestamps": [0.36, 0.54, 0.96, 1.26, 1.50, 1.80, 2.04, 2.22, 2.34, 2.52, 2.76, 3.12, 3.30, 3.48, 3.60, 3.78, 3.90], "tokens":["忽", "然", "从", "光", "线", "死", "角", "嘅", "阴", "影", "度", "窜", "出", "一", "只", "大", "猫"]}
----
num threads: 1
decoding method: greedy_search
Elapsed seconds: 0.533 s
Real time factor (RTF): 0.533 / 4.608 = 0.116

yue-3.wav

Wave filename   Ground truth
yue-3.wav       今日我带大家去见识一位九零后嘅靓仔咧

./build/bin/sherpa-ncnn-offline \
  --tokens=./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09/tokens.txt \
  --sense-voice-model-dir=./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09 \
  --num-threads=1 \
  sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09/test_wavs/yue-3.wav

/Users/fangjun/open-source/sherpa-ncnn/sherpa-ncnn/csrc/parse-options.cc:Read:381 ./build/bin/sherpa-ncnn-offline --tokens=./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09/tokens.txt --sense-voice-model-dir=./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09 --num-threads=1 sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09/test_wavs/yue-3.wav 

OfflineRecognizerConfig(feat_config=FeatureExtractorConfig(sampling_rate=16000, feature_dim=80), model_config=OfflineModelConfig(sense_voice=OfflineSenseVoiceModelConfig(model_dir="./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09", language="auto", use_itn=False), tokens="./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09/tokens.txt", num_threads=1, debug=False), decoding_method="greedy_search", blank_penalty=0)
Creating recognizer ...
Started
Done!

sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09/test_wavs/yue-3.wav
{"lang": "<|yue|>", "emotion": "<|NEUTRAL|>", "event": "<|Speech|>", "text": "今日我带大家去见识一位九零后嘅靓仔咧", "timestamps": [0.24, 0.36, 0.60, 0.72, 1.02, 1.14, 1.44, 1.74, 1.92, 2.10, 2.22, 2.52, 2.76, 2.94, 3.18, 3.30, 3.48, 3.78], "tokens":["今", "日", "我", "带", "大", "家", "去", "见", "识", "一", "位", "九", "零", "后", "嘅", "靓", "仔", "咧"]}
----
num threads: 1
decoding method: greedy_search
Elapsed seconds: 0.590 s
Real time factor (RTF): 0.590 / 4.352 = 0.136

yue-4.wav

Wave filename   Ground truth
yue-4.wav       香港嘅消费市场从此不一样

./build/bin/sherpa-ncnn-offline \
  --tokens=./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09/tokens.txt \
  --sense-voice-model-dir=./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09 \
  --num-threads=1 \
  sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09/test_wavs/yue-4.wav

/Users/fangjun/open-source/sherpa-ncnn/sherpa-ncnn/csrc/parse-options.cc:Read:381 ./build/bin/sherpa-ncnn-offline --tokens=./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09/tokens.txt --sense-voice-model-dir=./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09 --num-threads=1 sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09/test_wavs/yue-4.wav 

OfflineRecognizerConfig(feat_config=FeatureExtractorConfig(sampling_rate=16000, feature_dim=80), model_config=OfflineModelConfig(sense_voice=OfflineSenseVoiceModelConfig(model_dir="./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09", language="auto", use_itn=False), tokens="./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09/tokens.txt", num_threads=1, debug=False), decoding_method="greedy_search", blank_penalty=0)
Creating recognizer ...
Started
Done!

sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09/test_wavs/yue-4.wav
{"lang": "<|yue|>", "emotion": "<|NEUTRAL|>", "event": "<|Speech|>", "text": "香港嘅消费市场从此不一样", "timestamps": [0.36, 0.54, 0.72, 0.90, 1.08, 1.38, 1.56, 1.92, 2.10, 2.40, 2.58, 2.76], "tokens":["香", "港", "嘅", "消", "费", "市", "场", "从", "此", "不", "一", "样"]}
----
num threads: 1
decoding method: greedy_search
Elapsed seconds: 0.398 s
Real time factor (RTF): 0.398 / 3.200 = 0.124

yue-5.wav

Wave filename   Ground truth
yue-5.wav       景天谂唔到呢个守门嘅弟子竟然咁无礼霎时间面色都变埋

./build/bin/sherpa-ncnn-offline \
  --tokens=./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09/tokens.txt \
  --sense-voice-model-dir=./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09 \
  --num-threads=1 \
  sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09/test_wavs/yue-5.wav

/Users/fangjun/open-source/sherpa-ncnn/sherpa-ncnn/csrc/parse-options.cc:Read:381 ./build/bin/sherpa-ncnn-offline --tokens=./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09/tokens.txt --sense-voice-model-dir=./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09 --num-threads=1 sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09/test_wavs/yue-5.wav 

OfflineRecognizerConfig(feat_config=FeatureExtractorConfig(sampling_rate=16000, feature_dim=80), model_config=OfflineModelConfig(sense_voice=OfflineSenseVoiceModelConfig(model_dir="./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09", language="auto", use_itn=False), tokens="./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09/tokens.txt", num_threads=1, debug=False), decoding_method="greedy_search", blank_penalty=0)
Creating recognizer ...
Started
Done!

sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09/test_wavs/yue-5.wav
{"lang": "<|yue|>", "emotion": "<|NEUTRAL|>", "event": "<|Speech|>", "text": "景天谂唔到呢个守门嘅弟子竟然咁无礼霎时间面色都变埋", "timestamps": [0.42, 0.60, 0.96, 1.14, 1.20, 1.38, 1.50, 1.62, 1.86, 2.04, 2.22, 2.34, 3.00, 3.24, 3.42, 3.84, 4.08, 4.80, 5.16, 5.34, 5.58, 5.82, 6.06, 6.24, 6.42], "tokens":["景", "天", "谂", "唔", "到", "呢", "个", "守", "门", "嘅", "弟", "子", "竟", "然", "咁", "无", "礼", "霎", "时", "间", "面", "色", "都", "变", "埋"]}
----
num threads: 1
decoding method: greedy_search
Elapsed seconds: 0.824 s
Real time factor (RTF): 0.824 / 7.168 = 0.115

yue-6.wav

Wave filename   Ground truth
yue-6.wav       六个星期嘅课程包括六堂课同两个测验你唔掌握到基本嘅十九个声母五十六个韵母同九个声调我哋仲针对咗广东话学习者会遇到嘅大樽颈啊以国语为母语人士最难掌握嘅五大韵母教课书唔会教你嘅七种变音同十种变调说话生硬唔自然嘅根本性问题提供全新嘅学习方向等你突破难关

./build/bin/sherpa-ncnn-offline \
  --tokens=./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09/tokens.txt \
  --sense-voice-model-dir=./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09 \
  --num-threads=1 \
  sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09/test_wavs/yue-6.wav

/Users/fangjun/open-source/sherpa-ncnn/sherpa-ncnn/csrc/parse-options.cc:Read:381 ./build/bin/sherpa-ncnn-offline --tokens=./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09/tokens.txt --sense-voice-model-dir=./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09 --num-threads=1 sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09/test_wavs/yue-6.wav 

OfflineRecognizerConfig(feat_config=FeatureExtractorConfig(sampling_rate=16000, feature_dim=80), model_config=OfflineModelConfig(sense_voice=OfflineSenseVoiceModelConfig(model_dir="./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09", language="auto", use_itn=False), tokens="./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09/tokens.txt", num_threads=1, debug=False), decoding_method="greedy_search", blank_penalty=0)
Creating recognizer ...
Started
Done!

sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09/test_wavs/yue-6.wav
{"lang": "<|yue|>", "emotion": "<|NEUTRAL|>", "event": "<|Speech|>", "text": "六个星期嘅课程包括六堂课同两个测验你只掌握到基本嘅十九个声母五十六个韵母同九个声调我哋仲针对咗广东话学习者会遇到嘅大樽颈啊以国语为母语人士最难掌握嘅五大韵母教课书唔会教你嘅七种变音同十种变调说话生硬唔自然嘅根本性问题提供全新嘅学习方向等你突破难关", "timestamps": [0.36, 0.66, 0.84, 1.08, 1.26, 1.44, 1.68, 2.16, 2.34, 2.58, 2.76, 2.94, 3.36, 3.60, 3.78, 4.02, 4.26, 4.80, 5.16, 5.40, 5.52, 5.70, 5.94, 6.06, 6.30, 6.54, 6.78, 6.96, 7.08, 7.32, 7.68, 7.80, 7.98, 8.10, 8.28, 8.52, 8.88, 9.12, 9.36, 9.54, 9.72, 10.14, 10.26, 10.44, 10.56, 10.74, 10.92, 11.22, 11.34, 11.52, 11.70, 11.82, 12.00, 12.42, 12.66, 12.84, 13.02, 13.44, 13.74, 13.98, 14.22, 14.52, 14.82, 15.00, 15.24, 15.42, 15.60, 15.84, 15.90, 16.32, 16.62, 16.86, 17.10, 17.28, 17.64, 17.82, 18.06, 18.30, 18.78, 19.02, 19.20, 19.50, 19.62, 19.80, 19.98, 20.16, 20.34, 20.58, 20.82, 21.00, 21.30, 21.54, 21.78, 22.02, 22.20, 22.98, 23.28, 23.52, 23.70, 24.18, 24.36, 24.60, 24.78, 25.14, 25.38, 25.68, 25.92, 26.04, 26.52, 26.70, 27.00, 27.18, 27.42, 27.60, 27.72, 27.90, 28.08, 28.50, 28.74, 29.28, 29.46, 29.76, 29.94], "tokens":["六", "个", "星", "期", "嘅", "课", "程", "包", "括", "六", "堂", "课", "同", "两", "个", "测", "验", "你", "只", "掌", "握", "到", "基", "本", "嘅", "十", "九", "个", "声", "母", "五", "十", "六", "个", "韵", "母", "同", "九", "个", "声", "调", "我", "哋", "仲", "针", "对", "咗", "广", "东", "话", "学", "习", "者", "会", "遇", "到", "嘅", "大", "樽", "颈", "啊", "以", "国", "语", "为", "母", "语", "人", "士", "最", "难", "掌", "握", "嘅", "五", "大", "韵", "母", "教", "课", "书", "唔", "会", "教", "你", "嘅", "七", "种", "变", "音", "同", "十", "种", "变", "调", "说", "话", "生", "硬", "唔", "自", "然", "嘅", "根", "本", "性", "问", "题", "提", "供", "全", "新", "嘅", "学", "习", "方", "向", "等", "你", "突", "破", "难", "关"]}
----
num threads: 1
decoding method: greedy_search
Elapsed seconds: 3.758 s
Real time factor (RTF): 3.758 / 30.592 = 0.123

yue-7.wav

Wave filename   Ground truth
yue-7.wav       同意嘅累积唔系阴同阳嘅累积可以讲三既融合咗一同意融合咗阴同阳

./build/bin/sherpa-ncnn-offline \
  --tokens=./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09/tokens.txt \
  --sense-voice-model-dir=./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09 \
  --num-threads=1 \
  sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09/test_wavs/yue-7.wav

/Users/fangjun/open-source/sherpa-ncnn/sherpa-ncnn/csrc/parse-options.cc:Read:381 ./build/bin/sherpa-ncnn-offline --tokens=./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09/tokens.txt --sense-voice-model-dir=./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09 --num-threads=1 sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09/test_wavs/yue-7.wav 

OfflineRecognizerConfig(feat_config=FeatureExtractorConfig(sampling_rate=16000, feature_dim=80), model_config=OfflineModelConfig(sense_voice=OfflineSenseVoiceModelConfig(model_dir="./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09", language="auto", use_itn=False), tokens="./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09/tokens.txt", num_threads=1, debug=False), decoding_method="greedy_search", blank_penalty=0)
Creating recognizer ...
Started
Done!

sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09/test_wavs/yue-7.wav
{"lang": "<|yue|>", "emotion": "<|NEUTRAL|>", "event": "<|Speech|>", "text": "同二嘅累积唔系阴同阳嘅累积可以讲三既融合咗一同二融合咗阴同阳", "timestamps": [0.48, 0.84, 1.20, 1.38, 1.56, 2.52, 2.70, 3.00, 3.42, 3.66, 3.96, 4.20, 4.38, 5.40, 5.76, 6.00, 6.78, 7.86, 8.28, 8.46, 8.70, 9.24, 9.72, 10.08, 11.28, 11.46, 11.70, 12.12, 12.54, 12.78], "tokens":["同", "二", "嘅", "累", "积", "唔", "系", "阴", "同", "阳", "嘅", "累", "积", "可", "以", "讲", "三", "既", "融", "合", "咗", "一", "同", "二", "融", "合", "咗", "阴", "同", "阳"]}
----
num threads: 1
decoding method: greedy_search
Elapsed seconds: 1.642 s
Real time factor (RTF): 1.642 / 13.900 = 0.118
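
The last line of each log reports the real time factor as two numbers and their quotient. The sketch below, in Python, pulls those numbers out of the log text and recomputes the quotient. It assumes the second number is the audio duration in seconds, which the log does not state explicitly; the name `RTF_PATTERN` is illustrative and not part of sherpa-ncnn.

```python
import re

# Tail of the yue-7.wav log, copied from the output above.
log_tail = """\
num threads: 1
decoding method: greedy_search
Elapsed seconds: 1.642 s
Real time factor (RTF): 1.642 / 13.900 = 0.118
"""

# Capture the elapsed seconds, the (assumed) audio duration, and the reported RTF.
RTF_PATTERN = re.compile(
    r"Real time factor \(RTF\): ([\d.]+) / ([\d.]+) = ([\d.]+)"
)

match = RTF_PATTERN.search(log_tail)
if match is None:
    raise ValueError("no RTF line found in the log")

elapsed, duration, reported = (float(g) for g in match.groups())
recomputed = elapsed / duration

print(f"elapsed={elapsed:.3f}s duration={duration:.3f}s RTF={recomputed:.3f}")
```

An RTF below 1 means decoding finished faster than the audio would take to play.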

yue-8.wav

Wave filename: yue-8.wav
Ground truth: 而较早前已经复航嘅氹仔北安码头星期五开始增设夜间航班不过两个码头暂时都冇凌晨班次有旅客希望尽快恢复可以留喺澳门长啲时间
./build/bin/sherpa-ncnn-offline \
  --tokens=./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09/tokens.txt \
  --sense-voice-model-dir=./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09 \
  --num-threads=1 \
  sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09/test_wavs/yue-8.wav
/Users/fangjun/open-source/sherpa-ncnn/sherpa-ncnn/csrc/parse-options.cc:Read:381 ./build/bin/sherpa-ncnn-offline --tokens=./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09/tokens.txt --sense-voice-model-dir=./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09 --num-threads=1 sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09/test_wavs/yue-8.wav 

OfflineRecognizerConfig(feat_config=FeatureExtractorConfig(sampling_rate=16000, feature_dim=80), model_config=OfflineModelConfig(sense_voice=OfflineSenseVoiceModelConfig(model_dir="./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09", language="auto", use_itn=False), tokens="./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09/tokens.txt", num_threads=1, debug=False), decoding_method="greedy_search", blank_penalty=0)
Creating recognizer ...
Started
Done!

sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09/test_wavs/yue-8.wav
{"lang": "<|yue|>", "emotion": "<|NEUTRAL|>", "event": "<|Speech|>", "text": "而较早前已经复航嘅氹仔北安码头星期五开始增设夜间航班不过两个码头暂时都冇凌晨班次有旅客希望尽快恢复可以留喺澳门长啲时间", "timestamps": [0.30, 0.54, 0.72, 0.90, 1.14, 1.26, 1.50, 1.68, 1.86, 2.04, 2.28, 2.58, 2.70, 3.00, 3.12, 3.42, 3.60, 3.78, 4.02, 4.14, 4.44, 4.62, 4.92, 5.04, 5.28, 5.40, 6.12, 6.36, 6.60, 6.78, 6.96, 7.14, 7.44, 7.62, 7.80, 7.98, 8.16, 8.34, 8.58, 8.76, 9.54, 9.72, 9.90, 10.14, 10.26, 10.50, 10.62, 10.92, 11.10, 11.58, 11.70, 11.94, 12.06, 12.30, 12.48, 12.78, 12.96, 13.20, 13.44], "tokens":["而", "较", "早", "前", "已", "经", "复", "航", "嘅", "氹", "仔", "北", "安", "码", "头", "星", "期", "五", "开", "始", "增", "设", "夜", "间", "航", "班", "不", "过", "两", "个", "码", "头", "暂", "时", "都", "冇", "凌", "晨", "班", "次", "有", "旅", "客", "希", "望", "尽", "快", "恢", "复", "可", "以", "留", "喺", "澳", "门", "长", "啲", "时", "间"]}
----
num threads: 1
decoding method: greedy_search
Elapsed seconds: 1.655 s
Real time factor (RTF): 1.655 / 14.080 = 0.118

yue-9.wav

Wave filename: yue-9.wav
Ground truth: 刘备仲马鞭一指蜀兵一齐掩杀过去打到吴兵大败唉刘备八路兵马以雷霆万钧之势啊杀到吴兵啊尸横遍野血流成河
./build/bin/sherpa-ncnn-offline \
  --tokens=./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09/tokens.txt \
  --sense-voice-model-dir=./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09 \
  --num-threads=1 \
  sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09/test_wavs/yue-9.wav
/Users/fangjun/open-source/sherpa-ncnn/sherpa-ncnn/csrc/parse-options.cc:Read:381 ./build/bin/sherpa-ncnn-offline --tokens=./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09/tokens.txt --sense-voice-model-dir=./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09 --num-threads=1 sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09/test_wavs/yue-9.wav 

OfflineRecognizerConfig(feat_config=FeatureExtractorConfig(sampling_rate=16000, feature_dim=80), model_config=OfflineModelConfig(sense_voice=OfflineSenseVoiceModelConfig(model_dir="./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09", language="auto", use_itn=False), tokens="./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09/tokens.txt", num_threads=1, debug=False), decoding_method="greedy_search", blank_penalty=0)
Creating recognizer ...
Started
Done!

sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09/test_wavs/yue-9.wav
{"lang": "<|yue|>", "emotion": "<|NEUTRAL|>", "event": "<|Speech|>", "text": "刘备仲马鞭得指蜀兵一齐掩杀过去打到吴兵大败唉刘备八路兵马以雷霆万军之势啊杀到吴兵啊尸横遍野血流成河", "timestamps": [0.30, 0.48, 0.72, 0.90, 1.14, 1.32, 1.44, 2.22, 2.58, 2.88, 3.06, 3.42, 3.60, 3.90, 3.96, 4.32, 4.50, 4.68, 4.92, 5.28, 5.46, 6.06, 6.60, 6.84, 7.26, 7.56, 7.74, 7.98, 8.58, 8.88, 9.12, 9.36, 9.60, 9.84, 10.08, 10.26, 10.38, 10.56, 10.80, 10.98, 11.22, 11.52, 12.12, 12.36, 12.66, 12.90, 13.14, 13.32, 13.50], "tokens":["刘", "备", "仲", "马", "鞭", "得", "指", "蜀", "兵", "一", "齐", "掩", "杀", "过", "去", "打", "到", "吴", "兵", "大", "败", "唉", "刘", "备", "八", "路", "兵", "马", "以", "雷", "霆", "万", "军", "之", "势", "啊", "杀", "到", "吴", "兵", "啊", "尸", "横", "遍", "野", "血", "流", "成", "河"]}
----
num threads: 1
decoding method: greedy_search
Elapsed seconds: 1.663 s
Real time factor (RTF): 1.663 / 14.336 = 0.116

yue-10.wav

Wave filename: yue-10.wav
Ground truth: 原来王力宏咧系佢家中里面咧成就最低个吓哇
./build/bin/sherpa-ncnn-offline \
  --tokens=./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09/tokens.txt \
  --sense-voice-model-dir=./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09 \
  --num-threads=1 \
  sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09/test_wavs/yue-10.wav
/Users/fangjun/open-source/sherpa-ncnn/sherpa-ncnn/csrc/parse-options.cc:Read:381 ./build/bin/sherpa-ncnn-offline --tokens=./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09/tokens.txt --sense-voice-model-dir=./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09 --num-threads=1 sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09/test_wavs/yue-10.wav 

OfflineRecognizerConfig(feat_config=FeatureExtractorConfig(sampling_rate=16000, feature_dim=80), model_config=OfflineModelConfig(sense_voice=OfflineSenseVoiceModelConfig(model_dir="./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09", language="auto", use_itn=False), tokens="./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09/tokens.txt", num_threads=1, debug=False), decoding_method="greedy_search", blank_penalty=0)
Creating recognizer ...
Started
Done!

sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09/test_wavs/yue-10.wav
{"lang": "<|yue|>", "emotion": "<|NEUTRAL|>", "event": "<|Speech|>", "text": "原来王力宏咧系佢家中里边咧成就最低个吓哇", "timestamps": [0.42, 0.54, 0.90, 1.14, 1.44, 1.62, 1.80, 1.92, 2.16, 2.34, 2.58, 2.70, 2.82, 3.06, 3.24, 3.54, 3.78, 4.26, 4.92, 5.76], "tokens":["原", "来", "王", "力", "宏", "咧", "系", "佢", "家", "中", "里", "边", "咧", "成", "就", "最", "低", "个", "吓", "哇"]}
----
num threads: 1
decoding method: greedy_search
Elapsed seconds: 0.749 s
Real time factor (RTF): 0.749 / 6.656 = 0.113

yue-11.wav

Wave filename: yue-11.wav
Ground truth: 无论你提出任何嘅要求
./build/bin/sherpa-ncnn-offline \
  --tokens=./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09/tokens.txt \
  --sense-voice-model-dir=./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09 \
  --num-threads=1 \
  sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09/test_wavs/yue-11.wav
/Users/fangjun/open-source/sherpa-ncnn/sherpa-ncnn/csrc/parse-options.cc:Read:381 ./build/bin/sherpa-ncnn-offline --tokens=./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09/tokens.txt --sense-voice-model-dir=./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09 --num-threads=1 sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09/test_wavs/yue-11.wav 

OfflineRecognizerConfig(feat_config=FeatureExtractorConfig(sampling_rate=16000, feature_dim=80), model_config=OfflineModelConfig(sense_voice=OfflineSenseVoiceModelConfig(model_dir="./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09", language="auto", use_itn=False), tokens="./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09/tokens.txt", num_threads=1, debug=False), decoding_method="greedy_search", blank_penalty=0)
Creating recognizer ...
Started
Done!

sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09/test_wavs/yue-11.wav
{"lang": "<|yue|>", "emotion": "<|NEUTRAL|>", "event": "<|Speech|>", "text": "无论你提出任何嘅要求", "timestamps": [0.48, 0.60, 0.78, 1.02, 1.14, 1.32, 1.50, 1.68, 1.86, 2.10], "tokens":["无", "论", "你", "提", "出", "任", "何", "嘅", "要", "求"]}
----
num threads: 1
decoding method: greedy_search
Elapsed seconds: 0.354 s
Real time factor (RTF): 0.354 / 2.688 = 0.132
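
Each recognition result is printed as a single JSON line, so it can be post-processed with any JSON parser. The sketch below, in Python, loads the yue-11.wav result and pairs every token with its timestamp. It assumes the timestamps are per-token times in seconds, which the equal list lengths suggest but the log does not state; `strip_tag` is a hypothetical helper, not a sherpa-ncnn function.

```python
import json

# Result line copied verbatim from the yue-11.wav output above.
result_line = (
    '{"lang": "<|yue|>", "emotion": "<|NEUTRAL|>", "event": "<|Speech|>", '
    '"text": "无论你提出任何嘅要求", '
    '"timestamps": [0.48, 0.60, 0.78, 1.02, 1.14, 1.32, 1.50, 1.68, 1.86, 2.10], '
    '"tokens":["无", "论", "你", "提", "出", "任", "何", "嘅", "要", "求"]}'
)


def strip_tag(tag: str) -> str:
    """Turn a tag such as '<|yue|>' into 'yue' (hypothetical helper)."""
    return tag.strip("<|>")


result = json.loads(result_line)
tokens = result["tokens"]
timestamps = result["timestamps"]
if len(tokens) != len(timestamps):
    raise ValueError("expected one timestamp per token")

# Pair each token with its timestamp, in order.
pairs = list(zip(tokens, timestamps))

print(strip_tag(result["lang"]), strip_tag(result["emotion"]), strip_tag(result["event"]))
for token, seconds in pairs:
    print(f"{seconds:5.2f}s  {token}")
```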

yue-12.wav

Wave filename: yue-12.wav
Ground truth: 咁咁多样材料咁我哋首先第一步处理咗一件
./build/bin/sherpa-ncnn-offline \
  --tokens=./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09/tokens.txt \
  --sense-voice-model-dir=./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09 \
  --num-threads=1 \
  sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09/test_wavs/yue-12.wav
/Users/fangjun/open-source/sherpa-ncnn/sherpa-ncnn/csrc/parse-options.cc:Read:381 ./build/bin/sherpa-ncnn-offline --tokens=./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09/tokens.txt --sense-voice-model-dir=./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09 --num-threads=1 sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09/test_wavs/yue-12.wav 

OfflineRecognizerConfig(feat_config=FeatureExtractorConfig(sampling_rate=16000, feature_dim=80), model_config=OfflineModelConfig(sense_voice=OfflineSenseVoiceModelConfig(model_dir="./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09", language="auto", use_itn=False), tokens="./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09/tokens.txt", num_threads=1, debug=False), decoding_method="greedy_search", blank_penalty=0)
Creating recognizer ...
Started
Done!

sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09/test_wavs/yue-12.wav
{"lang": "<|yue|>", "emotion": "<|NEUTRAL|>", "event": "<|Speech|>", "text": "咁咁多样材料咁我哋首先第一步处理咗一件", "timestamps": [0.30, 0.72, 0.90, 1.14, 1.38, 1.56, 1.92, 2.10, 2.22, 2.34, 2.58, 2.88, 3.00, 3.18, 3.60, 3.84, 4.02, 4.14, 4.26], "tokens":["咁", "咁", "多", "样", "材", "料", "咁", "我", "哋", "首", "先", "第", "一", "步", "处", "理", "咗", "一", "件"]}
----
num threads: 1
decoding method: greedy_search
Elapsed seconds: 0.592 s
Real time factor (RTF): 0.592 / 4.864 = 0.122

yue-13.wav

Wave filename: yue-13.wav
Ground truth: 啲点样对于佢哋嘅服务态度啊不透过呢一年左右嘅时间啦其实大家都静一静啦咁你就会见到香港嘅经济其实
./build/bin/sherpa-ncnn-offline \
  --tokens=./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09/tokens.txt \
  --sense-voice-model-dir=./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09 \
  --num-threads=1 \
  sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09/test_wavs/yue-13.wav
/Users/fangjun/open-source/sherpa-ncnn/sherpa-ncnn/csrc/parse-options.cc:Read:381 ./build/bin/sherpa-ncnn-offline --tokens=./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09/tokens.txt --sense-voice-model-dir=./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09 --num-threads=1 sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09/test_wavs/yue-13.wav 

OfflineRecognizerConfig(feat_config=FeatureExtractorConfig(sampling_rate=16000, feature_dim=80), model_config=OfflineModelConfig(sense_voice=OfflineSenseVoiceModelConfig(model_dir="./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09", language="auto", use_itn=False), tokens="./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09/tokens.txt", num_threads=1, debug=False), decoding_method="greedy_search", blank_penalty=0)
Creating recognizer ...
Started
Done!

sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09/test_wavs/yue-13.wav
{"lang": "<|yue|>", "emotion": "<|NEUTRAL|>", "event": "<|Speech|>", "text": "啲点样对于佢哋嘅服务态度啊希透过呢一年左右嘅时间啦其实大家都静一静啦咁你就会见到香港嘅经济其实", "timestamps": [0.00, 0.24, 0.42, 0.72, 0.84, 1.08, 1.20, 1.68, 2.16, 2.34, 2.58, 2.76, 2.94, 3.24, 3.54, 3.72, 4.02, 4.32, 4.50, 4.80, 4.98, 5.16, 5.34, 5.52, 5.70, 6.06, 6.24, 6.48, 6.60, 6.78, 7.02, 7.20, 7.38, 7.56, 7.92, 8.16, 8.34, 8.52, 8.70, 8.82, 9.00, 9.18, 9.36, 9.48, 9.66, 9.96, 10.14], "tokens":["啲", "点", "样", "对", "于", "佢", "哋", "嘅", "服", "务", "态", "度", "啊", "希", "透", "过", "呢", "一", "年", "左", "右", "嘅", "时", "间", "啦", "其", "实", "大", "家", "都", "静", "一", "静", "啦", "咁", "你", "就", "会", "见", "到", "香", "港", "嘅", "经", "济", "其", "实"]}
----
num threads: 1
decoding method: greedy_search
Elapsed seconds: 1.270 s
Real time factor (RTF): 1.270 / 10.624 = 0.120

yue-14.wav

Wave filename: yue-14.wav
Ground truth: 就即刻会同贵正两位八代长老带埋五名七代弟子前啲灵蛇岛想话生擒谢信抢咗屠龙宝刀翻嚟献俾帮主嘅
./build/bin/sherpa-ncnn-offline \
  --tokens=./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09/tokens.txt \
  --sense-voice-model-dir=./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09 \
  --num-threads=1 \
  sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09/test_wavs/yue-14.wav
/Users/fangjun/open-source/sherpa-ncnn/sherpa-ncnn/csrc/parse-options.cc:Read:381 ./build/bin/sherpa-ncnn-offline --tokens=./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09/tokens.txt --sense-voice-model-dir=./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09 --num-threads=1 sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09/test_wavs/yue-14.wav 

OfflineRecognizerConfig(feat_config=FeatureExtractorConfig(sampling_rate=16000, feature_dim=80), model_config=OfflineModelConfig(sense_voice=OfflineSenseVoiceModelConfig(model_dir="./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09", language="auto", use_itn=False), tokens="./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09/tokens.txt", num_threads=1, debug=False), decoding_method="greedy_search", blank_penalty=0)
Creating recognizer ...
Started
Done!

sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09/test_wavs/yue-14.wav
{"lang": "<|yue|>", "emotion": "<|NEUTRAL|>", "event": "<|Speech|>", "text": "就即刻会同贵正两位八代长老带埋五名七代弟子前啲灵蛇岛想话生擒谢信抢咗屠龙宝刀翻嚟献俾帮主嘅", "timestamps": [0.18, 0.36, 0.48, 0.72, 0.84, 1.20, 1.44, 1.74, 1.92, 2.10, 2.28, 2.52, 2.76, 3.60, 3.84, 4.14, 4.32, 4.56, 4.80, 5.04, 5.22, 5.88, 6.12, 6.24, 6.42, 6.78, 7.68, 7.92, 8.16, 8.52, 8.82, 9.18, 9.96, 10.26, 10.38, 10.62, 10.86, 11.10, 11.22, 11.40, 11.64, 11.88, 12.18, 12.30, 12.66], "tokens":["就", "即", "刻", "会", "同", "贵", "正", "两", "位", "八", "代", "长", "老", "带", "埋", "五", "名", "七", "代", "弟", "子", "前", "啲", "灵", "蛇", "岛", "想", "话", "生", "擒", "谢", "信", "抢", "咗", "屠", "龙", "宝", "刀", "翻", "嚟", "献", "俾", "帮", "主", "嘅"]}
----
num threads: 1
decoding method: greedy_search
Elapsed seconds: 1.577 s
Real time factor (RTF): 1.577 / 13.056 = 0.121

yue-15.wav

Wave filename: yue-15.wav
Ground truth: 我知道我的观众大部分都是对广东话有兴趣想学广东话的人
./build/bin/sherpa-ncnn-offline \
  --tokens=./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09/tokens.txt \
  --sense-voice-model-dir=./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09 \
  --num-threads=1 \
  sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09/test_wavs/yue-15.wav
/Users/fangjun/open-source/sherpa-ncnn/sherpa-ncnn/csrc/parse-options.cc:Read:381 ./build/bin/sherpa-ncnn-offline --tokens=./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09/tokens.txt --sense-voice-model-dir=./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09 --num-threads=1 sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09/test_wavs/yue-15.wav 

OfflineRecognizerConfig(feat_config=FeatureExtractorConfig(sampling_rate=16000, feature_dim=80), model_config=OfflineModelConfig(sense_voice=OfflineSenseVoiceModelConfig(model_dir="./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09", language="auto", use_itn=False), tokens="./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09/tokens.txt", num_threads=1, debug=False), decoding_method="greedy_search", blank_penalty=0)
Creating recognizer ...
Started
Done!

sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09/test_wavs/yue-15.wav
{"lang": "<|yue|>", "emotion": "<|NEUTRAL|>", "event": "<|Speech|>", "text": "我知道我嘅观众大部分都系对广东话有兴趣想学广东话嘅人", "timestamps": [0.42, 0.54, 0.66, 0.84, 1.02, 1.20, 1.38, 1.98, 2.22, 2.40, 2.64, 2.76, 2.88, 3.12, 3.24, 3.42, 3.60, 3.78, 4.02, 4.62, 4.92, 5.16, 5.34, 5.52, 5.70, 5.94], "tokens":["我", "知", "道", "我", "嘅", "观", "众", "大", "部", "分", "都", "系", "对", "广", "东", "话", "有", "兴", "趣", "想", "学", "广", "东", "话", "嘅", "人"]}
----
num threads: 1
decoding method: greedy_search
Elapsed seconds: 0.753 s
Real time factor (RTF): 0.753 / 6.400 = 0.118

yue-16.wav

Wave filename: yue-16.wav
Ground truth: 诶原来啊我哋中国人呢讲究物极必反
./build/bin/sherpa-ncnn-offline \
  --tokens=./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09/tokens.txt \
  --sense-voice-model-dir=./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09 \
  --num-threads=1 \
  sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09/test_wavs/yue-16.wav
/Users/fangjun/open-source/sherpa-ncnn/sherpa-ncnn/csrc/parse-options.cc:Read:381 ./build/bin/sherpa-ncnn-offline --tokens=./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09/tokens.txt --sense-voice-model-dir=./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09 --num-threads=1 sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09/test_wavs/yue-16.wav 

OfflineRecognizerConfig(feat_config=FeatureExtractorConfig(sampling_rate=16000, feature_dim=80), model_config=OfflineModelConfig(sense_voice=OfflineSenseVoiceModelConfig(model_dir="./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09", language="auto", use_itn=False), tokens="./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09/tokens.txt", num_threads=1, debug=False), decoding_method="greedy_search", blank_penalty=0)
Creating recognizer ...
Started
Done!

sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09/test_wavs/yue-16.wav
{"lang": "<|yue|>", "emotion": "<|NEUTRAL|>", "event": "<|Speech|>", "text": "原来啊我哋中国人呢讲究密极必反", "timestamps": [1.92, 2.04, 2.22, 2.64, 2.76, 2.94, 3.12, 3.36, 3.48, 3.72, 3.84, 4.02, 4.20, 4.44, 4.62], "tokens":["原", "来", "啊", "我", "哋", "中", "国", "人", "呢", "讲", "究", "密", "极", "必", "反"]}
----
num threads: 1
decoding method: greedy_search
Elapsed seconds: 0.650 s
Real time factor (RTF): 0.650 / 5.700 = 0.114
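
The recognized text for yue-16.wav differs slightly from its ground truth. One common way to quantify such a difference is the character error rate (CER): the Levenshtein edit distance between the two strings divided by the length of the ground truth. The sketch below, in Python, is an illustration of that metric only. It is not how sherpa-ncnn evaluates models, and `edit_distance` and `character_error_rate` are hypothetical names.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings, computed row by row."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(
                min(
                    prev[j] + 1,         # delete a character from ref
                    curr[j - 1] + 1,     # insert a character into ref
                    prev[j - 1] + cost,  # substitute, or match at no cost
                )
            )
        prev = curr
    return prev[-1]


def character_error_rate(ref: str, hyp: str) -> float:
    """Edit distance normalized by the length of the reference."""
    if not ref:
        raise ValueError("reference text must not be empty")
    return edit_distance(ref, hyp) / len(ref)


# Ground truth and recognized text for yue-16.wav, copied from above.
ground_truth = "诶原来啊我哋中国人呢讲究物极必反"
recognized = "原来啊我哋中国人呢讲究密极必反"

print(f"edits = {edit_distance(ground_truth, recognized)}")
print(f"CER   = {character_error_rate(ground_truth, recognized):.3f}")
```

Here the two edits are one dropped character (诶) and one substituted character (物 recognized as 密), out of 16 reference characters.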

yue-17.wav

Wave filename: yue-17.wav
Ground truth: 如果东边道建成咁丹东呢就会成为最近嘅出海港同埋经过哈大线出海相比绥分河则会减少运渠三百五十六公里
./build/bin/sherpa-ncnn-offline \
  --tokens=./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09/tokens.txt \
  --sense-voice-model-dir=./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09 \
  --num-threads=1 \
  sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09/test_wavs/yue-17.wav
/Users/fangjun/open-source/sherpa-ncnn/sherpa-ncnn/csrc/parse-options.cc:Read:381 ./build/bin/sherpa-ncnn-offline --tokens=./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09/tokens.txt --sense-voice-model-dir=./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09 --num-threads=1 sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09/test_wavs/yue-17.wav 

OfflineRecognizerConfig(feat_config=FeatureExtractorConfig(sampling_rate=16000, feature_dim=80), model_config=OfflineModelConfig(sense_voice=OfflineSenseVoiceModelConfig(model_dir="./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09", language="auto", use_itn=False), tokens="./sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09/tokens.txt", num_threads=1, debug=False), decoding_method="greedy_search", blank_penalty=0)
Creating recognizer ...
Started
Done!

sherpa-ncnn-sense-voice-zh-en-ja-ko-yue-2025-09-09/test_wavs/yue-17.wav
{"lang": "<|yue|>", "emotion": "<|NEUTRAL|>", "event": "<|Speech|>", "text": "如果东边道建成咁丹东呢就会成为最近嘅出海港同埋经过哈大线出海相比绥分河将会减少运渠三百五十六公里", "timestamps": [0.48, 0.60, 0.84, 0.96, 1.20, 1.50, 1.74, 2.58, 3.00, 3.18, 3.36, 3.78, 4.02, 4.20, 4.32, 4.56, 4.74, 4.86, 5.04, 5.22, 5.46, 6.36, 6.54, 6.78, 6.90, 7.08, 7.32, 7.50, 7.80, 7.92, 8.16, 8.34, 9.24, 9.54, 9.84, 10.26, 10.50, 10.74, 10.86, 11.22, 11.40, 11.82, 12.12, 12.30, 12.48, 12.60, 12.84, 13.02], "tokens":["如", "果", "东", "边", "道", "建", "成", "咁", "丹", "东", "呢", "就", "会", "成", "为", "最", "近", "嘅", "出", "海", "港", "同", "埋", "经", "过", "哈", "大", "线", "出", "海", "相", "比", "绥", "分", "河", "将", "会", "减", "少", "运", "渠", "三", "百", "五", "十", "六", "公", "里"]}
----
num threads: 1
decoding method: greedy_search
Elapsed seconds: 1.824 s
Real time factor (RTF): 1.824 / 13.800 = 0.132
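
To summarize the runs for yue-7.wav through yue-17.wav in one number, the elapsed times and the durations can be summed before dividing. This weights each file by its length instead of averaging the per-file RTF values. The sketch below, in Python, uses the numbers copied from the logs above and assumes, as the log format suggests, that the second number on each RTF line is the audio duration in seconds.

```python
# (wave file, elapsed seconds, audio seconds), copied from the RTF lines above.
runs = [
    ("yue-7.wav", 1.642, 13.900),
    ("yue-8.wav", 1.655, 14.080),
    ("yue-9.wav", 1.663, 14.336),
    ("yue-10.wav", 0.749, 6.656),
    ("yue-11.wav", 0.354, 2.688),
    ("yue-12.wav", 0.592, 4.864),
    ("yue-13.wav", 1.270, 10.624),
    ("yue-14.wav", 1.577, 13.056),
    ("yue-15.wav", 0.753, 6.400),
    ("yue-16.wav", 0.650, 5.700),
    ("yue-17.wav", 1.824, 13.800),
]

total_elapsed = sum(elapsed for _, elapsed, _ in runs)
total_audio = sum(audio for _, _, audio in runs)

# Duration-weighted RTF over all listed files.
overall_rtf = total_elapsed / total_audio

print(f"files: {len(runs)}")
print(f"total elapsed: {total_elapsed:.3f} s")
print(f"total audio:   {total_audio:.3f} s")
print(f"overall RTF:   {overall_rtf:.3f}")
```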