KittenTTS

This page explains how to use sherpa-onnx with KittenTTS.

KittenTTS is a compact English text-to-speech model. It does not require a reference audio prompt. Instead, you select a speaker with the --sid option and synthesize audio directly from the input text.

Download a pre-trained model

The quickest way is to download one of the pre-built model archives from https://github.com/k2-fsa/sherpa-onnx/releases/tag/tts-models.

For example:

wget https://github.com/k2-fsa/sherpa-onnx/releases/download/tts-models/kitten-nano-en-v0_1-fp16.tar.bz2
tar xf kitten-nano-en-v0_1-fp16.tar.bz2
rm kitten-nano-en-v0_1-fp16.tar.bz2
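
Before running the model, it can help to confirm that the archive extracted completely. The following is a minimal sketch, not part of sherpa-onnx: check_kitten_model_dir is a hypothetical helper, and the file names it looks for are simply the ones passed to the command-line example on this page (other archives may name the model file differently).

```shell
# Hypothetical helper (not shipped with sherpa-onnx): verify that a
# KittenTTS model directory contains the files used on this page.
# Assumption: the model file is named model.fp16.onnx, as in the
# kitten-nano-en-v0_1-fp16 archive.
check_kitten_model_dir() {
  dir="$1"
  status=0
  # Regular files expected inside the extracted archive.
  for f in model.fp16.onnx voices.bin tokens.txt; do
    if [ ! -f "$dir/$f" ]; then
      echo "missing file: $dir/$f" >&2
      status=1
    fi
  done
  # The espeak-ng data is a directory, not a single file.
  if [ ! -d "$dir/espeak-ng-data" ]; then
    echo "missing directory: $dir/espeak-ng-data" >&2
    status=1
  fi
  return "$status"
}
```

For example, check_kitten_model_dir ./kitten-nano-en-v0_1-fp16 returns 0 when all four entries are present and prints each missing entry to stderr otherwise.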

Other released KittenTTS models are listed in kitten-nano-en-v0_1-fp16.

Run a command-line example

The following command uses the same model files as rust-api-examples/examples/kitten_tts_en.rs:

./build/bin/sherpa-onnx-offline-tts \
  --kitten-model=./kitten-nano-en-v0_1-fp16/model.fp16.onnx \
  --kitten-voices=./kitten-nano-en-v0_1-fp16/voices.bin \
  --kitten-tokens=./kitten-nano-en-v0_1-fp16/tokens.txt \
  --kitten-data-dir=./kitten-nano-en-v0_1-fp16/espeak-ng-data \
  --sid=0 \
  --output-filename=./kitten-en.wav \
  "Today as always, men fall into two groups: slaves and free men."
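
To compare voices, the same command can be repeated with different --sid values. The sketch below wraps the command above in a loop. It assumes the binary and model paths shown on this page, that every speaker ID you pass exists in voices.bin, and that a per-speaker output name such as kitten-en-sid-0.wav is acceptable. The function name synthesize_speakers and the MODEL_DIR and TTS_BIN variables are illustrative, not part of sherpa-onnx.

```shell
# Sketch: synthesize one sentence with several speaker IDs.
# Assumptions: paths match the example above; the IDs passed in are
# valid for the chosen model.
MODEL_DIR="${MODEL_DIR:-./kitten-nano-en-v0_1-fp16}"
TTS_BIN="${TTS_BIN:-./build/bin/sherpa-onnx-offline-tts}"

# Usage: synthesize_speakers "text" SID [SID ...]
synthesize_speakers() {
  text="$1"
  shift
  for sid in "$@"; do
    # Same flags as the single-speaker command; only --sid and the
    # output file name change per iteration.
    "$TTS_BIN" \
      --kitten-model="$MODEL_DIR/model.fp16.onnx" \
      --kitten-voices="$MODEL_DIR/voices.bin" \
      --kitten-tokens="$MODEL_DIR/tokens.txt" \
      --kitten-data-dir="$MODEL_DIR/espeak-ng-data" \
      --sid="$sid" \
      --output-filename="./kitten-en-sid-$sid.wav" \
      "$text" || return 1
  done
}
```

Calling synthesize_speakers "Today as always, men fall into two groups: slaves and free men." 0 1 would write one WAV file per speaker ID and stop at the first failing run.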

You can also use this tracked helper script:

API examples

Additional example code is available in k2-fsa/sherpa-onnx:

See also