Zipformer CTC models

This page lists non-streaming Zipformer CTC models from icefall.

sherpa-onnx-zipformer-ctc-zh-int8-2025-07-03 (Chinese)

This model supports only Chinese. Its word error rate (WER) on aishell and wenetspeech is given below:

                      aishell test   wenetspeech test_net   wenetspeech test_meeting
Word error rate (%)   1.74           5.92                   7.75

In the following, we describe how to download it and use it with sherpa-onnx.

Pre-built Android APK

APP                       Download (arm64-v8a)   Mirror in China   Source code
Simulated-streaming ASR   URL                    URL               URL
VAD + ASR                 URL                    URL               URL

Hint

Please always download the latest version. We use v1.12.3 in the above table as an example.

See Pre-built APKs for more Android pre-built APKs.

WebAssembly example (ASR from a microphone)

                    URL
Huggingface space   https://huggingface.co/spaces/k2-fsa/web-assembly-vad-asr-sherpa-onnx-zh-zipformer-ctc
Mirror in China     https://hf.qhduan.com/spaces/k2-fsa/web-assembly-vad-asr-sherpa-onnx-zh-zipformer-ctc
ModelScope space    https://modelscope.cn/studios/csukuangfj/web-assembly-vad-asr-sherpa-onnx-zh-zipformer-ctc/summary


Hint

Source code for the WebAssembly example can be found at

Huggingface space (Decode a file)

                    URL
Huggingface space   https://huggingface.co/spaces/k2-fsa/automatic-speech-recognition
Mirror in China     https://hf.qhduan.com/spaces/k2-fsa/automatic-speech-recognition


Download the model

Please use the following commands to download it.

cd /path/to/sherpa-onnx

wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/sherpa-onnx-zipformer-ctc-zh-int8-2025-07-03.tar.bz2

tar xvf sherpa-onnx-zipformer-ctc-zh-int8-2025-07-03.tar.bz2
rm sherpa-onnx-zipformer-ctc-zh-int8-2025-07-03.tar.bz2

ls -lh sherpa-onnx-zipformer-ctc-zh-int8-2025-07-03

Please check that the file sizes of the pre-trained model files are correct. The expected size of each file is listed below.

total 722384
-rw-r--r--  1 fangjun  staff   176B Jul  3 14:36 README.md
-rw-r--r--  1 fangjun  staff   249K Jul  3 14:36 bbpe.model
-rw-r--r--  1 fangjun  staff   350M Jul  3 14:36 model.int8.onnx
drwxr-xr-x  5 fangjun  staff   160B Jul  3 14:36 test_wavs
-rw-r--r--  1 fangjun  staff    13K Jul  3 14:36 tokens.txt
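
The check above can also be scripted. The following is a minimal sketch, not part of sherpa-onnx: the function name check_model_dir is hypothetical, and it only verifies that the entries shown in the listing exist and that the files are non-empty. It does not compare exact sizes.

```shell
# Hypothetical helper: report any entry from the listing above that is
# missing (or, for regular files, empty). Returns non-zero on any problem.
check_model_dir() {
  dir="$1"
  rc=0
  for f in README.md bbpe.model model.int8.onnx tokens.txt; do
    if [ ! -s "$dir/$f" ]; then
      echo "missing or empty: $dir/$f" >&2
      rc=1
    fi
  done
  if [ ! -d "$dir/test_wavs" ]; then
    echo "missing directory: $dir/test_wavs" >&2
    rc=1
  fi
  return "$rc"
}
```

For example, running check_model_dir ./sherpa-onnx-zipformer-ctc-zh-int8-2025-07-03 from /path/to/sherpa-onnx prints nothing and returns 0 when all entries are present.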

Note

There are also fp16 and float32 models.

          Download address   Comment
fp16      URL                float16 quantized. Suitable for GPU.
float32   URL                Not quantized.

Decode wave files

Hint

Only single-channel wave files with 16-bit encoded samples are supported. The sampling rate does not need to be 16 kHz.
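
The requirement in the hint above can be checked before decoding. The following is a minimal sketch, not part of sherpa-onnx: the helper names read_u16le and is_mono_16bit_wav are hypothetical, and it assumes a canonical 44-byte RIFF/WAVE header in which the fmt chunk comes first, so that the channel count sits at byte offset 22 and the bits per sample at byte offset 34. Files that place other chunks before fmt would need a real parser.

```shell
# Hypothetical helper: print the 16-bit little-endian integer stored at a
# given byte offset of a file.
read_u16le() {
  file="$1"
  offset="$2"
  # od prints the two bytes as unsigned decimals; word splitting turns them
  # into $1 (low byte) and $2 (high byte).
  set -- $(od -An -tu1 -j"$offset" -N2 "$file")
  echo $(( $1 + $2 * 256 ))
}

# Hypothetical helper: succeed only for a single-channel, 16-bit wave file.
# Assumes the canonical header layout described above.
is_mono_16bit_wav() {
  channels=$(read_u16le "$1" 22)
  bits=$(read_u16le "$1" 34)
  [ "$channels" -eq 1 ] && [ "$bits" -eq 16 ]
}
```

For example, is_mono_16bit_wav some.wav || echo "convert some.wav first" flags a file that needs converting. The sampling rate is deliberately not checked, since it does not need to be 16 kHz.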

The following code shows how to use the int8 model to decode wave files.

cd /path/to/sherpa-onnx

./build/bin/sherpa-onnx-offline \
  --zipformer-ctc-model=./sherpa-onnx-zipformer-ctc-zh-int8-2025-07-03/model.int8.onnx \
  --tokens=./sherpa-onnx-zipformer-ctc-zh-int8-2025-07-03/tokens.txt \
  sherpa-onnx-zipformer-ctc-zh-int8-2025-07-03/test_wavs/0.wav
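
To decode several files, the same command can be wrapped in a loop. This is a sketch under stated assumptions: decode_dir, SHERPA_BIN, and MODEL_DIR are hypothetical names introduced here for illustration, and only the flags shown in the command above are used.

```shell
# Assumed defaults: the binary and model directory from the command above.
# Both can be overridden from the environment.
SHERPA_BIN="${SHERPA_BIN:-./build/bin/sherpa-onnx-offline}"
MODEL_DIR="${MODEL_DIR:-./sherpa-onnx-zipformer-ctc-zh-int8-2025-07-03}"

# Hypothetical helper: decode every .wav file in a directory,
# one invocation per file.
decode_dir() {
  for wav in "$1"/*.wav; do
    # Skip the unexpanded pattern when the directory has no .wav files.
    [ -e "$wav" ] || continue
    "$SHERPA_BIN" \
      --zipformer-ctc-model="$MODEL_DIR/model.int8.onnx" \
      --tokens="$MODEL_DIR/tokens.txt" \
      "$wav"
  done
}
```

For example, decode_dir ./sherpa-onnx-zipformer-ctc-zh-int8-2025-07-03/test_wavs decodes each bundled test file in turn. Running one process per file reloads the model every time, which keeps the sketch simple at the cost of speed.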

Note

Please use ./build/bin/Release/sherpa-onnx-offline.exe for Windows.

Caution

If you use Windows and the recognized text is displayed with encoding issues, switch the console to the UTF-8 code page by running:

CHCP 65001

in your command line.

You should see the following output:

/Users/fangjun/open-source/sherpa-onnx/sherpa-onnx/csrc/parse-options.cc:Read:372 ./build/bin/sherpa-onnx-offline --zipformer-ctc-model=./sherpa-onnx-zipformer-ctc-zh-int8-2025-07-03/model.int8.onnx --tokens=./sherpa-onnx-zipformer-ctc-zh-int8-2025-07-03/tokens.txt sherpa-onnx-zipformer-ctc-zh-int8-2025-07-03/test_wavs/0.wav 

OfflineRecognizerConfig(feat_config=FeatureExtractorConfig(sampling_rate=16000, feature_dim=80, low_freq=20, high_freq=-400, dither=0, normalize_samples=True, snip_edges=False), model_config=OfflineModelConfig(transducer=OfflineTransducerModelConfig(encoder_filename="", decoder_filename="", joiner_filename=""), paraformer=OfflineParaformerModelConfig(model=""), nemo_ctc=OfflineNemoEncDecCtcModelConfig(model=""), whisper=OfflineWhisperModelConfig(encoder="", decoder="", language="", task="transcribe", tail_paddings=-1), fire_red_asr=OfflineFireRedAsrModelConfig(encoder="", decoder=""), tdnn=OfflineTdnnModelConfig(model=""), zipformer_ctc=OfflineZipformerCtcModelConfig(model="./sherpa-onnx-zipformer-ctc-zh-int8-2025-07-03/model.int8.onnx"), wenet_ctc=OfflineWenetCtcModelConfig(model=""), sense_voice=OfflineSenseVoiceModelConfig(model="", language="auto", use_itn=False), moonshine=OfflineMoonshineModelConfig(preprocessor="", encoder="", uncached_decoder="", cached_decoder=""), dolphin=OfflineDolphinModelConfig(model=""), telespeech_ctc="", tokens="./sherpa-onnx-zipformer-ctc-zh-int8-2025-07-03/tokens.txt", num_threads=2, debug=False, provider="cpu", model_type="", modeling_unit="cjkchar", bpe_vocab=""), lm_config=OfflineLMConfig(model="", scale=0.5), ctc_fst_decoder_config=OfflineCtcFstDecoderConfig(graph="", max_active=3000), decoding_method="greedy_search", max_active_paths=4, hotwords_file="", hotwords_score=1.5, blank_penalty=0, rule_fsts="", rule_fars="", hr=HomophoneReplacerConfig(dict_dir="", lexicon="", rule_fsts=""))
Creating recognizer ...
Started
Done!

sherpa-onnx-zipformer-ctc-zh-int8-2025-07-03/test_wavs/0.wav
{"lang": "", "emotion": "", "event": "", "text": "对我做了介绍那么我想说的是呢大家如果对我的研究感兴趣呢", "timestamps": [0.16, 0.32, 0.48, 0.64, 0.80, 0.96, 1.00, 1.08, 1.60, 1.76, 1.92, 2.08, 2.24, 2.40, 2.56, 2.72, 3.04, 3.20, 3.36, 3.52, 3.68, 3.76, 3.84, 3.92, 4.00, 4.16, 4.32, 4.48, 4.60, 4.68, 4.80], "tokens":["▁ƌŕş", "▁ƍĩĴ", "▁ƌĢĽ", "▁ƋŠħ", "▁ƋšĬ", "▁Ǝ", "š", "Į", "▁Ɛģň", "▁Ƌşĩ", "▁ƍĩĴ", "▁ƍĤř", "▁ƏŕŚ", "▁ƎĽĥ", "▁ƍĻŕ", "▁ƌĴŇ", "▁ƌŊō", "▁ƌŔŜ", "▁ƌŌģ", "▁ƍŃŁ", "▁ƌŕş", "▁ƍĩĴ", "▁ƎĽĥ", "▁ƎŅķ", "▁ƎŏŜ", "▁ƍĥń", "▁ƌĦŚ", "▁Ə", "Ŝ", "ň", "▁ƌĴŇ"], "words": []}
----
num threads: 2
decoding method: greedy_search
Elapsed seconds: 0.349 s
Real time factor (RTF): 0.349 / 5.611 = 0.062
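
The real time factor in the last line of the output is the elapsed decoding time divided by the duration of the audio, so a value below 1 means decoding ran faster than real time. The arithmetic can be reproduced with a small sketch; the function name rtf is hypothetical and not part of sherpa-onnx.

```shell
# Hypothetical helper: RTF = elapsed seconds / audio duration in seconds,
# printed with three decimal places.
rtf() {
  awk -v elapsed="$1" -v duration="$2" \
    'BEGIN { printf "%.3f\n", elapsed / duration }'
}

rtf 0.349 5.611   # prints 0.062, matching the log above
```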

Speech recognition from a microphone

cd /path/to/sherpa-onnx

./build/bin/sherpa-onnx-microphone-offline \
  --zipformer-ctc-model=./sherpa-onnx-zipformer-ctc-zh-int8-2025-07-03/model.int8.onnx \
  --tokens=./sherpa-onnx-zipformer-ctc-zh-int8-2025-07-03/tokens.txt

Speech recognition from a microphone with VAD

cd /path/to/sherpa-onnx

wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/silero_vad.onnx

./build/bin/sherpa-onnx-vad-microphone-offline-asr \
  --silero-vad-model=./silero_vad.onnx \
  --zipformer-ctc-model=./sherpa-onnx-zipformer-ctc-zh-int8-2025-07-03/model.int8.onnx \
  --tokens=./sherpa-onnx-zipformer-ctc-zh-int8-2025-07-03/tokens.txt