
sherpa-onnx-telespeech-ctc-int8-zh-2024-06-04 (supports a large number of dialects)



Please use the following commands to download it.

cd /path/to/sherpa-onnx


# For Chinese users, please use the following mirror
# wget

tar xvf sherpa-onnx-telespeech-ctc-int8-zh-2024-06-04.tar.bz2
rm sherpa-onnx-telespeech-ctc-int8-zh-2024-06-04.tar.bz2

Please check that the file sizes of the pre-trained model files are correct. The expected sizes of the *.onnx files are listed below.

$ ls -lh *.onnx
-rw-r--r--  1 fangjun  staff   325M Jun  4 11:56 model.int8.onnx
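If you prefer to script this check, the following is a minimal Python sketch. It assumes the model was extracted into ./sherpa-onnx-telespeech-ctc-int8-zh-2024-06-04 and takes the expected size (325M) from the listing above; the helper names `onnx_file_sizes_mib` and `find_size_mismatches` and the 1 MiB tolerance are hypothetical choices for illustration, not part of sherpa-onnx.

```python
from pathlib import Path

# Expected size taken from the `ls -lh` listing above (325M).
EXPECTED_MIB = {"model.int8.onnx": 325}


def onnx_file_sizes_mib(model_dir):
    """Return {file name: size in MiB} for every *.onnx file in model_dir."""
    sizes = {}
    for path in sorted(Path(model_dir).glob("*.onnx")):
        # st_size is in bytes; `ls -lh` also uses 1024-based units for "M".
        sizes[path.name] = path.stat().st_size / (1024 * 1024)
    return sizes


def find_size_mismatches(sizes, expected, tolerance_mib=1.0):
    """Return names of expected files that are missing or whose size differs
    from the expected value by more than tolerance_mib (ls -lh rounds its output)."""
    bad = []
    for name, want in expected.items():
        got = sizes.get(name)
        if got is None or abs(got - want) > tolerance_mib:
            bad.append(name)
    return bad


if __name__ == "__main__":
    model_dir = Path("./sherpa-onnx-telespeech-ctc-int8-zh-2024-06-04")
    if model_dir.is_dir():
        mismatches = find_size_mismatches(onnx_file_sizes_mib(model_dir), EXPECTED_MIB)
        print("OK" if not mismatches else f"Missing or unexpected size: {mismatches}")
    else:
        print(f"{model_dir} not found; download and extract the model first")
```

A size that is far off usually points to an interrupted download, in which case re-downloading the archive is the simplest fix.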

Decode wave files


It supports decoding only single-channel wave files with 16-bit encoded samples. The sampling rate does not need to be 16 kHz.
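To check whether your own recordings meet these requirements before decoding, a minimal sketch using Python's standard `wave` module is shown below. The helper names `describe_wave` and `is_supported` are hypothetical; the checks simply mirror the requirements stated above (one channel, 16-bit samples, any sampling rate).

```python
import wave


def describe_wave(filename):
    """Return (num_channels, sample_width_in_bytes, sample_rate) of a WAV file."""
    with wave.open(filename, "rb") as f:
        return f.getnchannels(), f.getsampwidth(), f.getframerate()


def is_supported(filename):
    """True if the file has a single channel and 16-bit (2-byte) samples.

    The sampling rate is intentionally not checked, since it does not
    need to be 16 kHz.
    """
    channels, sample_width, _ = describe_wave(filename)
    return channels == 1 and sample_width == 2
```

For example, `is_supported("./sherpa-onnx-telespeech-ctc-int8-zh-2024-06-04/test_wavs/3-sichuan.wav")` inspects one of the bundled test files. Note that the `wave` module only reads PCM WAV files and raises `wave.Error` for other encodings, such as floating-point samples.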

cd /path/to/sherpa-onnx

./build/bin/sherpa-onnx-offline \
  --telespeech-ctc=./sherpa-onnx-telespeech-ctc-int8-zh-2024-06-04/model.int8.onnx \
  --tokens=./sherpa-onnx-telespeech-ctc-int8-zh-2024-06-04/tokens.txt \
  --model-type=telespeech_ctc \
  --num-threads=1 \
  ./sherpa-onnx-telespeech-ctc-int8-zh-2024-06-04/test_wavs/3-sichuan.wav \
  ./sherpa-onnx-telespeech-ctc-int8-zh-2024-06-04/test_wavs/4-tianjin.wav \
  ./sherpa-onnx-telespeech-ctc-int8-zh-2024-06-04/test_wavs/5-henan.wav


Please use ./build/bin/Release/sherpa-onnx-offline.exe for Windows.


If you use Windows and get encoding issues, please run:

CHCP 65001

in your command line.

You should see the following output:

/Users/fangjun/open-source/sherpa-onnx/sherpa-onnx/csrc/ ./build/bin/sherpa-onnx-offline --telespeech-ctc=./sherpa-onnx-telespeech-ctc-int8-zh-2024-06-04/model.int8.onnx --tokens=./sherpa-onnx-telespeech-ctc-int8-zh-2024-06-04/tokens.txt --model-type=telespeech_ctc --num-threads=1 ./sherpa-onnx-telespeech-ctc-int8-zh-2024-06-04/test_wavs/3-sichuan.wav ./sherpa-onnx-telespeech-ctc-int8-zh-2024-06-04/test_wavs/4-tianjin.wav ./sherpa-onnx-telespeech-ctc-int8-zh-2024-06-04/test_wavs/5-henan.wav 

OfflineRecognizerConfig(feat_config=FeatureExtractorConfig(sampling_rate=16000, feature_dim=80, low_freq=20, high_freq=-400, dither=0), model_config=OfflineModelConfig(transducer=OfflineTransducerModelConfig(encoder_filename="", decoder_filename="", joiner_filename=""), paraformer=OfflineParaformerModelConfig(model=""), nemo_ctc=OfflineNemoEncDecCtcModelConfig(model=""), whisper=OfflineWhisperModelConfig(encoder="", decoder="", language="", task="transcribe", tail_paddings=-1), tdnn=OfflineTdnnModelConfig(model=""), zipformer_ctc=OfflineZipformerCtcModelConfig(model=""), wenet_ctc=OfflineWenetCtcModelConfig(model=""), telespeech_ctc="./sherpa-onnx-telespeech-ctc-int8-zh-2024-06-04/model.int8.onnx", tokens="./sherpa-onnx-telespeech-ctc-int8-zh-2024-06-04/tokens.txt", num_threads=1, debug=False, provider="cpu", model_type="telespeech_ctc", modeling_unit="cjkchar", bpe_vocab=""), lm_config=OfflineLMConfig(model="", scale=0.5), ctc_fst_decoder_config=OfflineCtcFstDecoderConfig(graph="", max_active=3000), decoding_method="greedy_search", max_active_paths=4, hotwords_file="", hotwords_score=1.5, blank_penalty=0)
Creating recognizer ...

{"text": "自己就是在那个在那个就是在情界里面就是感觉演得特别好就是好像很真实样知道吧", "timestamps": [0.08, 0.36, 0.52, 0.72, 0.92, 1.16, 1.36, 1.88, 2.20, 2.36, 3.16, 3.28, 3.40, 3.60, 3.80, 3.92, 4.08, 4.24, 4.40, 4.56, 4.76, 5.16, 5.32, 5.44, 5.64, 5.76, 5.88, 6.04, 6.16, 6.28, 6.40, 6.60, 6.88, 7.12, 7.40, 7.52, 7.64], "tokens":["自", "己", "就", "是", "在", "那", "个", "在", "那", "个", "就", "是", "在", "情", "界", "里", "面", "就", "是", "感", "觉", "演", "得", "特", "别", "好", "就", "是", "好", "像", "很", "真", "实", "样", "知", "道", "吧"]}
{"text": "他就每个人手法就这意思法律意识太单薄了而且就是也不顾及到别人的感受", "timestamps": [0.36, 0.56, 1.04, 1.16, 1.24, 1.64, 1.88, 2.24, 2.40, 2.60, 2.80, 3.12, 3.32, 3.64, 3.80, 3.96, 4.16, 4.44, 4.68, 4.80, 5.00, 5.16, 5.28, 6.12, 6.28, 6.44, 6.60, 6.72, 6.88, 7.04, 7.12, 7.32, 7.52], "tokens":["他", "就", "每", "个", "人", "手", "法", "就", "这", "意", "思", "法", "律", "意", "识", "太", "单", "薄", "了", "而", "且", "就", "是", "也", "不", "顾", "及", "到", "别", "人", "的", "感", "受"]}
{"text": "他这个管一都通到有时候都通到七八层楼高然它这个管箱就可交到那个那珠子上", "timestamps": [0.04, 0.12, 0.24, 0.40, 1.00, 1.24, 1.44, 1.68, 2.32, 2.48, 2.60, 2.64, 2.80, 3.00, 3.16, 3.32, 3.52, 3.68, 3.92, 5.00, 5.16, 5.28, 5.32, 5.44, 5.84, 6.00, 6.12, 6.48, 6.68, 6.84, 7.00, 7.16, 7.32, 7.56, 7.68], "tokens":["他", "这", "个", "管", "一", "都", "通", "到", "有", "时", "候", "都", "通", "到", "七", "八", "层", "楼", "高", "然", "它", "这", "个", "管", "箱", "就", "可", "交", "到", "那", "个", "那", "珠", "子", "上"]}
num threads: 1
decoding method: greedy_search
Elapsed seconds: 3.406 s
Real time factor (RTF): 3.406 / 23.634 = 0.144
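The real time factor (RTF) in the last line is the processing time divided by the duration of the audio; a value below 1 means decoding is faster than real time. The Python sketch below reproduces the arithmetic. It assumes the denominator (23.634 s) is the total duration of all input files, which `wave_duration_seconds` (a hypothetical helper) can compute from each file's frame count and sampling rate.

```python
import wave


def wave_duration_seconds(filename):
    """Duration of a WAV file: number of frames divided by the sampling rate."""
    with wave.open(filename, "rb") as f:
        return f.getnframes() / f.getframerate()


def real_time_factor(elapsed_seconds, audio_seconds):
    """RTF = processing time / audio duration; below 1 is faster than real time."""
    if audio_seconds <= 0:
        raise ValueError("audio duration must be positive")
    return elapsed_seconds / audio_seconds


# Numbers copied from the log above.
rtf = real_time_factor(3.406, 23.634)
print(f"RTF: {rtf:.3f}")  # prints "RTF: 0.144"
```

For your own files, the denominator would be `sum(wave_duration_seconds(p) for p in wav_files)`, where `wav_files` is the list of paths passed on the command line.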


Note that feature_dim=80 in the log above is incorrect; the actual value is 40.
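Each recognition result in the output above is printed as one JSON object per line, with "text", "timestamps", and "tokens" fields. The sketch below parses such a line in Python and pairs every token with the timestamp at the same index. The helper name `parse_result` is hypothetical, and reading each timestamp as the time in seconds at which its token appears is an assumption based on the log, not something stated here.

```python
import json


def parse_result(line):
    """Parse one JSON result line into (text, [(token, timestamp), ...])."""
    result = json.loads(line)
    tokens = result["tokens"]
    timestamps = result["timestamps"]
    # The log shows one timestamp per token, so pair them by index.
    if len(tokens) != len(timestamps):
        raise ValueError("tokens and timestamps have different lengths")
    return result["text"], list(zip(tokens, timestamps))
```

Pairing by index relies on the two lists having equal length, which holds for all three results shown above; the explicit length check makes a mismatch fail loudly instead of silently truncating.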