PocketTTS
This page explains how to use sherpa-onnx with PocketTTS.
PocketTTS is an offline zero-shot text-to-speech model. It uses a short reference audio clip to clone the target voice.
Unlike ZipVoice, PocketTTS does not require a reference transcript. You only need to pass --reference-audio.
Download a pre-trained model
Download the released PocketTTS archive from https://github.com/k2-fsa/sherpa-onnx/releases/tag/tts-models:
wget https://github.com/k2-fsa/sherpa-onnx/releases/download/tts-models/sherpa-onnx-pocket-tts-int8-2026-01-26.tar.bz2
tar xf sherpa-onnx-pocket-tts-int8-2026-01-26.tar.bz2
rm sherpa-onnx-pocket-tts-int8-2026-01-26.tar.bz2
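Before running the examples, it can help to confirm that the archive extracted the files the command-line example on this page expects. The following is a minimal sketch, not part of sherpa-onnx: the function name check_pocket_tts_files is hypothetical, and the file list is copied from the flags used in that example, so it may not cover everything in the archive.

```shell
# Hypothetical helper (not shipped with sherpa-onnx): reports which of the
# files referenced by the command-line example are missing from a directory.
check_pocket_tts_files() {
  dir="$1"
  missing=0
  for f in \
    lm_flow.int8.onnx \
    lm_main.int8.onnx \
    encoder.onnx \
    decoder.int8.onnx \
    text_conditioner.onnx \
    vocab.json \
    token_scores.json \
    test_wavs/bria.wav
  do
    if [ ! -f "$dir/$f" ]; then
      echo "missing: $dir/$f"
      missing=1
    fi
  done
  # Returns 0 when every file is present, 1 otherwise.
  return "$missing"
}

if check_pocket_tts_files ./sherpa-onnx-pocket-tts-int8-2026-01-26; then
  echo "all PocketTTS files found"
fi
```

The function returns a non-zero status when any file is missing, so it can also guard a longer script.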
Run a command-line example
The following command uses the same model files as rust-api-examples/examples/pocket_tts.rs:
./build/bin/sherpa-onnx-offline-tts \
--pocket-lm-flow=./sherpa-onnx-pocket-tts-int8-2026-01-26/lm_flow.int8.onnx \
--pocket-lm-main=./sherpa-onnx-pocket-tts-int8-2026-01-26/lm_main.int8.onnx \
--pocket-encoder=./sherpa-onnx-pocket-tts-int8-2026-01-26/encoder.onnx \
--pocket-decoder=./sherpa-onnx-pocket-tts-int8-2026-01-26/decoder.int8.onnx \
--pocket-text-conditioner=./sherpa-onnx-pocket-tts-int8-2026-01-26/text_conditioner.onnx \
--pocket-vocab-json=./sherpa-onnx-pocket-tts-int8-2026-01-26/vocab.json \
--pocket-token-scores-json=./sherpa-onnx-pocket-tts-int8-2026-01-26/token_scores.json \
--reference-audio=./sherpa-onnx-pocket-tts-int8-2026-01-26/test_wavs/bria.wav \
--num-steps=2 \
--output-filename=./pocket.wav \
"Today as always, men fall into two groups: slaves and free men."
You can also use this tracked helper script:
API examples
Additional example code is available here:
Rust
C++ and C
Python
Go
Java and Kotlin
Dart and Swift
.NET
JavaScript
Pascal
Notes
PocketTTS needs a reference audio clip.
PocketTTS does not require reference text. This is different from ZipVoice.
The reference audio should contain the voice that you want to clone.
--num-steps controls the trade-off between generation quality and speed.
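One way to explore these notes is to synthesize the same sentence at several --num-steps values and compare the results by ear. The sketch below only reuses the flags from the command-line example on this page. The function name sweep_num_steps, the MODEL_DIR and TTS_BIN environment variables, and the output file naming are assumptions made for illustration. This page does not state which step counts are valid, so any values you pass are your own choice.

```shell
# Hypothetical helper (not shipped with sherpa-onnx).
# Usage: sweep_num_steps REFERENCE_WAV TEXT STEPS...
# MODEL_DIR and TTS_BIN are assumed overrides; the defaults are the paths
# used in the command-line example on this page.
sweep_num_steps() {
  ref="$1"
  text="$2"
  shift 2
  model_dir="${MODEL_DIR:-./sherpa-onnx-pocket-tts-int8-2026-01-26}"
  tts_bin="${TTS_BIN:-./build/bin/sherpa-onnx-offline-tts}"
  for steps in "$@"; do
    # One output file per step count, e.g. ./pocket-steps-2.wav
    "$tts_bin" \
      --pocket-lm-flow="$model_dir/lm_flow.int8.onnx" \
      --pocket-lm-main="$model_dir/lm_main.int8.onnx" \
      --pocket-encoder="$model_dir/encoder.onnx" \
      --pocket-decoder="$model_dir/decoder.int8.onnx" \
      --pocket-text-conditioner="$model_dir/text_conditioner.onnx" \
      --pocket-vocab-json="$model_dir/vocab.json" \
      --pocket-token-scores-json="$model_dir/token_scores.json" \
      --reference-audio="$ref" \
      --num-steps="$steps" \
      --output-filename="./pocket-steps-$steps.wav" \
      "$text" || return 1
  done
}
```

For example, sweep_num_steps ./sherpa-onnx-pocket-tts-int8-2026-01-26/test_wavs/bria.wav "Hello from PocketTTS." 2 4 would write one file per step count. Passing a different reference clip as the first argument clones a different voice with the same text.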