Introduction
sherpa is the deployment framework of the Next-gen Kaldi project.
sherpa supports deploying speech-related pre-trained models on various platforms, with bindings for various programming languages.
If you are interested in training your own model or fine-tuning a pre-trained model, please refer to icefall.
At present, sherpa has the following sub-projects:
The following table compares the sub-projects:
| | | | |
|---|---|---|---|
| Installation difficulty | hard | | |
| NN lib | | | |
| CPU Support | x86, x86_64 | x86, x86_64, arm32, arm64, **RISC-V** | x86, x86_64, arm32, arm64, **RISC-V** |
| GPU Support | Yes (with CUDA for NVIDIA GPUs) | Yes | Yes (with Vulkan for ARM GPUs) |
| OS Support | Linux, Windows, macOS | Linux, Windows, macOS, iOS, Android | Linux, Windows, macOS, iOS, Android |
| Support batch_size > 1 | Yes | Yes | |
| Support RKNN | No | | |
| Provided APIs | C++, Python | C, C++, Python, C#, Java, Kotlin, Swift, Go, JavaScript, Dart, Pascal, Rust | C, C++, Python, C#, Kotlin, Swift, Go |
| Supported functions | streaming ASR, non-streaming ASR | streaming ASR, non-streaming ASR, text-to-speech, speaker diarization, speaker identification, speaker verification, spoken language identification, speech denoising, speech enhancement, audio tagging, VAD, keyword spotting | streaming ASR, non-streaming ASR, text-to-speech, VAD |