T-one CTC-based Models

Hint

Please refer to Installation to install sherpa-onnx before you read this section.

sherpa-onnx-streaming-t-one-russian-2025-09-08 (Russian)

This model is converted from https://github.com/voicekit-team/T-one using scripts from https://github.com/k2-fsa/sherpa-onnx/tree/master/scripts/t-one

It expects audio with a sample rate of 8000 Hz.

Download the model

cd /path/to/sherpa-onnx

wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/sherpa-onnx-streaming-t-one-russian-2025-09-08.tar.bz2
tar xvf sherpa-onnx-streaming-t-one-russian-2025-09-08.tar.bz2
rm sherpa-onnx-streaming-t-one-russian-2025-09-08.tar.bz2

ls -lh sherpa-onnx-streaming-t-one-russian-2025-09-08

The output is given below:

-rw-r--r--  1 fangjun  staff    99K Sep  8 17:12 0.wav
-rw-r--r--  1 fangjun  staff   553B Sep  8 17:12 LICENSE
-rw-r--r--  1 fangjun  staff   126B Sep  8 17:12 README.md
-rw-r--r--  1 fangjun  staff   138M Sep  8 17:12 model.onnx
-rw-r--r--  1 fangjun  staff   202B Sep  8 17:12 tokens.txt
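Before decoding, it can help to confirm that the archive was unpacked completely. The following is a minimal sketch and not part of sherpa-onnx; the function name check_model_dir is hypothetical. It only checks for the two files that the commands in this section pass to the binaries, model.onnx and tokens.txt.

```shell
# Hypothetical helper (not part of sherpa-onnx): verify that a model
# directory contains the two files used by the commands in this section.
check_model_dir() {
  dir="$1"
  for f in model.onnx tokens.txt; do
    # -s is true only if the file exists and is not empty
    if [ ! -s "$dir/$f" ]; then
      echo "missing or empty: $dir/$f" >&2
      return 1
    fi
  done
  echo "ok: $dir"
}

# Example call:
#   check_model_dir ./sherpa-onnx-streaming-t-one-russian-2025-09-08
```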

Decode a single wave file

Hint

Only single-channel wave files with 16-bit encoded samples are supported. The sampling rate of the file does not need to be 8 kHz.
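To check whether a file meets the channel and bit-depth requirements before decoding, the wave header can be read directly. The sketch below is illustrative only: the helper names wav_info and is_supported_wav are hypothetical, and the code assumes a canonical 44-byte RIFF header (fmt chunk first) and a little-endian host, because od reads multi-byte integers in host byte order. Files with extra header chunks need a real audio tool instead.

```shell
# Hypothetical helper: print "<channels> <sample_rate> <bits_per_sample>".
# Assumes a canonical 44-byte RIFF/WAVE header and a little-endian host.
wav_info() {
  f="$1"
  channels=$(od -An -tu2 -j22 -N2 "$f" | tr -d ' \n')  # bytes 22-23
  rate=$(od -An -tu4 -j24 -N4 "$f" | tr -d ' \n')      # bytes 24-27
  bits=$(od -An -tu2 -j34 -N2 "$f" | tr -d ' \n')      # bytes 34-35
  echo "$channels $rate $bits"
}

# Hypothetical helper: succeed only for single-channel, 16-bit files.
# The sampling rate is deliberately not checked.
is_supported_wav() {
  set -- $(wav_info "$1")
  [ "$1" = "1" ] && [ "$3" = "16" ]
}

# Example call:
#   wav_info ./sherpa-onnx-streaming-t-one-russian-2025-09-08/0.wav
```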

The following code shows how to use the model to decode a wave file:

cd /path/to/sherpa-onnx

./build/bin/sherpa-onnx \
  --t-one-ctc-model=./sherpa-onnx-streaming-t-one-russian-2025-09-08/model.onnx \
  --tokens=./sherpa-onnx-streaming-t-one-russian-2025-09-08/tokens.txt \
  ./sherpa-onnx-streaming-t-one-russian-2025-09-08/0.wav

Note

Please use ./build/bin/Release/sherpa-onnx.exe for Windows.

Caution

If you use Windows and get encoding issues, please run:

CHCP 65001

in your command line.

You should see the following output:

/Users/fangjun/open-source/sherpa-onnx/sherpa-onnx/csrc/parse-options.cc:Read:372 ./build/bin/sherpa-onnx --t-one-ctc-model=./sherpa-onnx-streaming-t-one-russian-2025-09-08/model.onnx --tokens=./sherpa-onnx-streaming-t-one-russian-2025-09-08/tokens.txt ./sherpa-onnx-streaming-t-one-russian-2025-09-08/0.wav 

OnlineRecognizerConfig(feat_config=FeatureExtractorConfig(sampling_rate=16000, feature_dim=80, low_freq=20, high_freq=-400, dither=0, normalize_samples=True, snip_edges=False), model_config=OnlineModelConfig(transducer=OnlineTransducerModelConfig(encoder="", decoder="", joiner=""), paraformer=OnlineParaformerModelConfig(encoder="", decoder=""), wenet_ctc=OnlineWenetCtcModelConfig(model="", chunk_size=16, num_left_chunks=4), zipformer2_ctc=OnlineZipformer2CtcModelConfig(model=""), nemo_ctc=OnlineNeMoCtcModelConfig(model=""), t_one_ctc=OnlineToneCtcModelConfig(model="./sherpa-onnx-streaming-t-one-russian-2025-09-08/model.onnx"), provider_config=ProviderConfig(device=0, provider="cpu", cuda_config=CudaConfig(cudnn_conv_algo_search=1), trt_config=TensorrtConfig(trt_max_workspace_size=2147483647, trt_max_partition_iterations=10, trt_min_subgraph_size=5, trt_fp16_enable="True", trt_detailed_build_log="False", trt_engine_cache_enable="True", trt_engine_cache_path=".", trt_timing_cache_enable="True", trt_timing_cache_path=".",trt_dump_subgraphs="False" )), tokens="./sherpa-onnx-streaming-t-one-russian-2025-09-08/tokens.txt", num_threads=1, warm_up=0, debug=False, model_type="", modeling_unit="cjkchar", bpe_vocab=""), lm_config=OnlineLMConfig(model="", scale=0.5, lodr_scale=0.01, lodr_fst="", lodr_backoff_id=-1, shallow_fusion=True), endpoint_config=EndpointConfig(rule1=EndpointRule(must_contain_nonsilence=False, min_trailing_silence=2.4, min_utterance_length=0), rule2=EndpointRule(must_contain_nonsilence=True, min_trailing_silence=1.2, min_utterance_length=0), rule3=EndpointRule(must_contain_nonsilence=False, min_trailing_silence=0, min_utterance_length=20)), ctc_fst_decoder_config=OnlineCtcFstDecoderConfig(graph="", max_active=3000), enable_endpoint=True, max_active_paths=4, hotwords_score=1.5, hotwords_file="", decoding_method="greedy_search", blank_penalty=0, temperature_scale=2, rule_fsts="", rule_fars="", reset_encoder=False, hr=HomophoneReplacerConfig(dict_dir="", 
lexicon="", rule_fsts=""))
./sherpa-onnx-streaming-t-one-russian-2025-09-08/0.wav
Number of threads: 1, Elapsed seconds: 2, Audio duration (s): 6.4, Real time factor (RTF) = 2/6.4 = 0.32
ну сейчас к тебе приедет бригада давай давай я жду
{ "text": "ну сейчас к тебе приедет бригада давай давай я жду", "tokens": ["н", "у", " ", "с", "е", "й", "ч", "а", "с", " ", "к", " ", "т", "е", "б", "е", " ", "п", "р", "и", "е", "д", "е", "т", " ", "б", "р", "и", "г", "а", "д", "а", " ", "д", "а", "в", "а", "й", " ", "д", "а", "в", "а", "й", " ", "я", " ", "ж", "д", "у"], "timestamps": [0.75, 0.78, 0.84, 0.96, 0.99, 1.02, 1.05, 1.08, 1.14, 1.23, 1.50, 1.56, 1.62, 1.65, 1.68, 1.74, 1.98, 2.16, 2.19, 2.25, 2.28, 2.34, 2.37, 2.43, 2.55, 2.70, 2.76, 2.79, 2.85, 2.88, 2.97, 3.03, 3.24, 3.33, 3.39, 3.45, 3.48, 3.57, 6.42, 6.48, 6.54, 6.60, 6.63, 6.66, 6.69, 6.75, 6.78, 6.81, 6.84, 6.90], "ys_probs": [], "lm_probs": [], "context_scores": [], "segment": 0, "words": [], "start_time": 0.00, "is_final": false, "is_eof": false}
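The log above reports a real time factor (RTF), which is the elapsed processing time divided by the audio duration, and ends with a JSON line that holds the recognized text. The following sketch shows both steps in shell. The helper names rtf and extract_text are hypothetical and not part of sherpa-onnx. The sed pattern assumes the text contains no escaped double quotes, so a real JSON parser is the more robust choice.

```shell
# Hypothetical helper: real-time factor = elapsed seconds / audio duration.
rtf() {
  awk -v e="$1" -v d="$2" 'BEGIN { printf "%.4f\n", e / d }'
}

# Hypothetical helper: read result lines on stdin and print the value of
# the "text" field. Lines without a "text" field are ignored.
extract_text() {
  sed -n 's/.*"text": "\([^"]*\)".*/\1/p'
}

# Example calls:
#   rtf 2 6.4
#   extract_text < saved-output.txt   # saved-output.txt is a placeholder
```

With the rounded values shown in the log, rtf 2 6.4 prints 0.3125. The log itself reports 0.32, presumably because the elapsed time it prints is rounded for display.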

Real-time speech recognition from a microphone

cd /path/to/sherpa-onnx

./build/bin/sherpa-onnx-microphone \
  --t-one-ctc-model=./sherpa-onnx-streaming-t-one-russian-2025-09-08/model.onnx \
  --tokens=./sherpa-onnx-streaming-t-one-russian-2025-09-08/tokens.txt

Hint

If your system is Linux (including embedded Linux), you can also use sherpa-onnx-alsa to do real-time speech recognition with your microphone if sherpa-onnx-microphone does not work for you.

Hugging Face space

You can try this model by visiting

https://huggingface.co/spaces/k2-fsa/automatic-speech-recognition