Pre-trained models

Two kinds of end-to-end (E2E) models are supported by k2-fsa/sherpa:

  • CTC

  • Transducer

Hint

For transducer-based models, we only support stateless transducers. To the best of our knowledge, only icefall implements stateless transducers, so only transducer models from icefall are currently supported.

For CTC-based models, we support any model trained with CTC loss, as long as it can be exported via torchscript. Models from the following frameworks are currently supported: icefall, WeNet, and torchaudio (Wav2Vec 2.0). If you have a CTC model and would like it to be supported in k2-fsa/sherpa, please create an issue at https://github.com/k2-fsa/sherpa/issues.
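The torchscript requirement above can be illustrated with a minimal sketch. The toy model below is a placeholder, not a real acoustic model, and the file name is arbitrary; it only shows the `torch.jit.script` / `save` pattern used to produce a `.pt` file that a torchscript-based runtime can load.

```python
# Sketch: exporting a toy CTC-style model via TorchScript.
# Assumes PyTorch is installed; ToyCtcModel is a made-up placeholder.
import torch
import torch.nn as nn


class ToyCtcModel(nn.Module):
    def __init__(self, feat_dim: int = 80, vocab_size: int = 500):
        super().__init__()
        self.encoder = nn.Linear(feat_dim, 256)
        self.output = nn.Linear(256, vocab_size)

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        # features: (batch, time, feat_dim) -> log-probabilities over the vocab
        return self.output(torch.relu(self.encoder(features))).log_softmax(dim=-1)


model = ToyCtcModel()
model.eval()

scripted = torch.jit.script(model)  # compile the model to TorchScript
scripted.save("toy_ctc.pt")         # the exported .pt file is what gets loaded later

# Sanity check: the scripted model matches the eager model
x = torch.randn(1, 100, 80)
assert torch.allclose(model(x), scripted(x))
```

A real export follows the same shape: script (or trace) the trained model, save it, and point the runtime at the resulting file.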

Hint

You can try the pre-trained models in your browser without installing anything. See https://huggingface.co/spaces/k2-fsa/automatic-speech-recognition.

This page lists all available pre-trained models that you can download.

Hint

We provide pre-trained models for the following languages:

  • Arabic

  • Chinese

  • English

  • German

  • Tibetan

Hint

We provide a colab notebook, Sherpa offline recognition python api colab notebook, that walks you through offline recognition step by step.

It shows how to install sherpa and use it as an offline recognizer, which supports models from icefall, the WeNet framework, and torchaudio.
