KittenTTS
This page explains how to use sherpa-onnx with KittenTTS.
KittenTTS is a compact English text-to-speech model. It does not require a
reference audio prompt: you select a speaker with --sid and synthesize
audio directly.
Download a pre-trained model
The quickest way is to download one of the pre-built model archives from https://github.com/k2-fsa/sherpa-onnx/releases/tag/tts-models.
For example:
wget https://github.com/k2-fsa/sherpa-onnx/releases/download/tts-models/kitten-nano-en-v0_1-fp16.tar.bz2
tar xf kitten-nano-en-v0_1-fp16.tar.bz2
rm kitten-nano-en-v0_1-fp16.tar.bz2
Other released KittenTTS models are listed in kitten-nano-en-v0_1-fp16.
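Before running any example, it can help to confirm that the archive extracted completely. The following is a minimal sketch, not part of sherpa-onnx: the function name check_kitten_model_dir is hypothetical, and the expected file names are assumed from the paths passed to the command-line example on this page. Other model archives may use different file names.

```shell
# Hypothetical helper (not shipped with sherpa-onnx): verify that an
# extracted model directory contains the files the examples expect.
# The file names are assumptions taken from the kitten-nano-en-v0_1-fp16 paths.
check_kitten_model_dir() {
  dir="$1"
  missing=0
  # Regular files passed via --kitten-model, --kitten-voices, --kitten-tokens
  for f in model.fp16.onnx voices.bin tokens.txt; do
    if [ ! -f "$dir/$f" ]; then
      echo "missing file: $dir/$f" >&2
      missing=1
    fi
  done
  # Directory passed via --kitten-data-dir
  if [ ! -d "$dir/espeak-ng-data" ]; then
    echo "missing directory: $dir/espeak-ng-data" >&2
    missing=1
  fi
  return "$missing"
}

# Only run the check if the archive has been extracted here.
if [ -d ./kitten-nano-en-v0_1-fp16 ]; then
  check_kitten_model_dir ./kitten-nano-en-v0_1-fp16 \
    && echo "model files look complete"
fi
```

The function returns a non-zero status and prints each missing path to stderr, so it can guard a longer script with `check_kitten_model_dir DIR || exit 1`.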
Run a command-line example
The following command uses the same model files as rust-api-examples/examples/kitten_tts_en.rs:
./build/bin/sherpa-onnx-offline-tts \
--kitten-model=./kitten-nano-en-v0_1-fp16/model.fp16.onnx \
--kitten-voices=./kitten-nano-en-v0_1-fp16/voices.bin \
--kitten-tokens=./kitten-nano-en-v0_1-fp16/tokens.txt \
--kitten-data-dir=./kitten-nano-en-v0_1-fp16/espeak-ng-data \
--sid=0 \
--output-filename=./kitten-en.wav \
"Today as always, men fall into two groups: slaves and free men."
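To compare voices, the same command can be repeated with different --sid values. The sketch below wraps the command above in a hypothetical shell function, synthesize_speakers, that writes one WAV file per speaker ID. The function name, the TTS_BIN and MODEL_DIR variables, and the output file naming are illustrative assumptions; which speaker IDs are valid depends on the voices.bin of the model you downloaded.

```shell
# Hypothetical wrapper (not part of sherpa-onnx): synthesize one sentence
# once per speaker ID, using only the flags shown in the command above.
# TTS_BIN and MODEL_DIR are assumed variables; they default to the paths
# used on this page.
synthesize_speakers() {
  text="$1"
  shift
  bin="${TTS_BIN:-./build/bin/sherpa-onnx-offline-tts}"
  dir="${MODEL_DIR:-./kitten-nano-en-v0_1-fp16}"
  for sid in "$@"; do
    "$bin" \
      --kitten-model="$dir/model.fp16.onnx" \
      --kitten-voices="$dir/voices.bin" \
      --kitten-tokens="$dir/tokens.txt" \
      --kitten-data-dir="$dir/espeak-ng-data" \
      --sid="$sid" \
      --output-filename="./kitten-en-sid-$sid.wav" \
      "$text" || return 1   # stop at the first failing speaker ID
  done
}

# Example call (speaker IDs 0 and 1 are assumed to exist in voices.bin):
# synthesize_speakers "Today as always, men fall into two groups: slaves and free men." 0 1
```

Each run produces a separate file such as ./kitten-en-sid-0.wav, so earlier output is not overwritten when you try another speaker.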
You can also use this tracked helper script:
API examples
Additional example code is available in k2-fsa/sherpa-onnx:
Rust
C++ and C
Python
Go
Java and Kotlin
Dart and Swift
.NET
JavaScript
Pascal