sherpa-onnx-wenetspeech-yue-u2pp-conformer-ctc-zh-en-cantonese-int8-2025-09-10 (Cantonese, 粤语)
This model is converted from
It was trained on 21.8k hours of data.
Hint
If you need a Cantonese ASR model, please choose either this model
or sherpa-onnx-sense-voice-zh-en-ja-ko-yue-int8-2025-09-09 (Chinese, English, Japanese, Korean, Cantonese, 中英日韩粤语).
Hugging Face space
You can visit
to try this model in your browser.
Hint
You need to first select the language Cantonese
and then select the model csukuangfj/sherpa-onnx-wenetspeech-yue-u2pp-conformer-ctc-zh-en-cantonese-int8-2025-09-10.
Android APKs
Real-time speech recognition Android APKs can be found at
Hint
Please always download the latest version.
On that page, search for wenetspeech_yue_u2pconformer_ctc_2025_09_10 to find the APKs for this model.
Download
Use the following commands to download and extract the model:
cd /path/to/sherpa-onnx
wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/sherpa-onnx-wenetspeech-yue-u2pp-conformer-ctc-zh-en-cantonese-int8-2025-09-10.tar.bz2
tar xf sherpa-onnx-wenetspeech-yue-u2pp-conformer-ctc-zh-en-cantonese-int8-2025-09-10.tar.bz2
rm sherpa-onnx-wenetspeech-yue-u2pp-conformer-ctc-zh-en-cantonese-int8-2025-09-10.tar.bz2
After downloading, you should find the following files:
ls -lh sherpa-onnx-wenetspeech-yue-u2pp-conformer-ctc-zh-en-cantonese-int8-2025-09-10/
total 263264
-rw-r--r-- 1 fangjun staff 129B Sep 10 14:18 README.md
-rw-r--r-- 1 fangjun staff 128M Sep 10 14:18 model.int8.onnx
drwxr-xr-x 22 fangjun staff 704B Sep 10 14:18 test_wavs
-rw-r--r-- 1 fangjun staff 83K Sep 10 14:18 tokens.txt
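A quick way to confirm that the archive was extracted correctly is to check for the files listed above. The following is a minimal sketch, assuming bash; the function name check_model_dir is hypothetical and not part of sherpa-onnx. It only checks that the files exist, not that they are valid.

```shell
# Hypothetical helper (not part of sherpa-onnx): check that an extracted
# model directory contains the files used by the decoding commands.
# Returns 0 if everything is present, 1 otherwise.
check_model_dir() {
  local dir="$1"
  local missing=0
  local f

  # Regular files passed via --wenet-ctc-model and --tokens.
  for f in model.int8.onnx tokens.txt; do
    if [ ! -f "$dir/$f" ]; then
      echo "missing file: $dir/$f" >&2
      missing=1
    fi
  done

  # Directory holding the sample wave files.
  if [ ! -d "$dir/test_wavs" ]; then
    echo "missing directory: $dir/test_wavs" >&2
    missing=1
  fi

  return "$missing"
}

# Usage:
#   check_model_dir ./sherpa-onnx-wenetspeech-yue-u2pp-conformer-ctc-zh-en-cantonese-int8-2025-09-10 && echo OK
```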
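Real-time/streaming speech recognition from a microphone with VAD
The following commands download the Silero VAD model (silero_vad.onnx) and then run speech recognition on microphone input, using the VAD to split the audio into speech segments that are decoded with this model.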
wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/silero_vad.onnx
./build/bin/sherpa-onnx-vad-microphone-simulated-streaming-asr \
--silero-vad-model=./silero_vad.onnx \
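--tokens=./sherpa-onnx-wenetspeech-yue-u2pp-conformer-ctc-zh-en-cantonese-int8-2025-09-10/tokens.txt \
--wenet-ctc-model=./sherpa-onnx-wenetspeech-yue-u2pp-conformer-ctc-zh-en-cantonese-int8-2025-09-10/model.int8.onnx \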
--num-threads=1
Decode wave files
{% for wav in wav_files %}
{{ wav.filename }}
{{ '"' * wav.filename|length }}
| Wave filename | Ground truth |
|---|---|
| {{ wav.filename }} | {{ wav.ground_truth }} |
./build/bin/sherpa-onnx-offline \
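--tokens=./sherpa-onnx-wenetspeech-yue-u2pp-conformer-ctc-zh-en-cantonese-int8-2025-09-10/tokens.txt \
--wenet-ctc-model=./sherpa-onnx-wenetspeech-yue-u2pp-conformer-ctc-zh-en-cantonese-int8-2025-09-10/model.int8.onnx \
--num-threads=1 \
./sherpa-onnx-wenetspeech-yue-u2pp-conformer-ctc-zh-en-cantonese-int8-2025-09-10/test_wavs/{{ wav.filename }}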
{% endfor %}
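To decode all sample files in one go instead of typing one command per file, a small shell loop is enough. This is a sketch under stated assumptions: it assumes bash, the binary at ./build/bin/sherpa-onnx-offline, and the model directory name from the download step. MODEL_DIR, SHERPA_ONNX_OFFLINE, and decode_all_wavs are hypothetical names introduced here, not part of sherpa-onnx; the flags are the same ones used in the commands above.

```shell
# Hypothetical variables (override them if your paths differ).
MODEL_DIR="${MODEL_DIR:-./sherpa-onnx-wenetspeech-yue-u2pp-conformer-ctc-zh-en-cantonese-int8-2025-09-10}"
SHERPA_ONNX_OFFLINE="${SHERPA_ONNX_OFFLINE:-./build/bin/sherpa-onnx-offline}"

# Hypothetical helper: run sherpa-onnx-offline once per *.wav file in test_wavs/.
decode_all_wavs() {
  local wav
  for wav in "$MODEL_DIR"/test_wavs/*.wav; do
    # If the glob matches nothing, the shell leaves the pattern unexpanded; skip it.
    [ -e "$wav" ] || continue
    "$SHERPA_ONNX_OFFLINE" \
      --tokens="$MODEL_DIR/tokens.txt" \
      --wenet-ctc-model="$MODEL_DIR/model.int8.onnx" \
      --num-threads=1 \
      "$wav"
  done
}

# Usage:
#   decode_all_wavs
```

Running one process per file keeps each result clearly attached to its input file, at the cost of loading the model once per file.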