sherpa-onnx C API 1.0
Public C API and C++ wrapper for sherpa-onnx
Text-to-Speech (TTS) Models

sherpa-onnx supports multiple offline TTS model families through the SherpaOnnxCreateOfflineTts() API. Configure exactly one model family by filling in the corresponding sub-struct of SherpaOnnxOfflineTtsModelConfig.

See also
SherpaOnnxCreateOfflineTts, SherpaOnnxOfflineTtsConfig, SherpaOnnxOfflineTtsModelConfig
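
The "exactly one model family" rule can be checked before the engine is created. The sketch below is illustrative only: DemoFamilyConfig, DemoTtsModelConfig and demo_count_configured_families are hypothetical stand-ins, not types or functions from c-api.h, and it assumes a family counts as configured when its model path is non-empty.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in types for illustration only. The real sub-structs are declared
 * in c-api.h and carry more fields than a single model path. */
typedef struct {
  const char *model;
} DemoFamilyConfig;

typedef struct {
  DemoFamilyConfig vits;
  DemoFamilyConfig kokoro;
  DemoFamilyConfig kitten;
} DemoTtsModelConfig;

/* A path counts as set when it is neither NULL nor the empty string. */
static int demo_is_set(const char *s) { return s != NULL && s[0] != '\0'; }

/* Returns the number of families whose model path is set.
 * A usable configuration should yield exactly 1. */
int demo_count_configured_families(const DemoTtsModelConfig *c) {
  assert(c != NULL);
  return demo_is_set(c->vits.model) + demo_is_set(c->kokoro.model) +
         demo_is_set(c->kitten.model);
}
```

A caller might refuse to continue unless demo_count_configured_families(&cfg) == 1, which catches both a zeroed config and one where two families were filled in by mistake.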

Kokoro

Kokoro is a high-quality TTS model that supports multiple voices. It needs a model file and a voices binary file; the example also sets a tokens file and an espeak-ng data directory.

memset(&config, 0, sizeof(config));
config.model.kokoro.model = "./kokoro-en-v0_19/model.onnx";
config.model.kokoro.voices = "./kokoro-en-v0_19/voices.bin";
config.model.kokoro.tokens = "./kokoro-en-v0_19/tokens.txt";
config.model.kokoro.data_dir = "./kokoro-en-v0_19/espeak-ng-data";
config.model.num_threads = 2;

Model packages: kokoro-en-v0_19 (English), kokoro-multi-lang-v1_0 (multilingual)

Example source: kokoro-tts-en-c-api.c
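
All four Kokoro paths live under one package directory, so they can be built from a single base path instead of being repeated as literals. This is a minimal sketch under that assumption; demo_join_path is a hypothetical helper, not part of sherpa-onnx.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not part of sherpa-onnx): writes "dir/name" into out.
 * No extra separator is added when dir is empty or already ends with '/'.
 * Returns 0 on success, -1 if the result does not fit in cap bytes. */
int demo_join_path(char *out, size_t cap, const char *dir, const char *name) {
  assert(out != NULL && dir != NULL && name != NULL);
  size_t len = strlen(dir);
  const char *sep = (len == 0 || dir[len - 1] == '/') ? "" : "/";
  int n = snprintf(out, cap, "%s%s%s", dir, sep, name);
  return (n >= 0 && (size_t)n < cap) ? 0 : -1;
}
```

Because the config fields shown above are assigned plain C strings, each buffer filled this way should stay alive at least until the engine has been created; use one buffer per field rather than reusing a single one.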

VITS (Piper)

VITS models from the Piper project use espeak-ng for phonemization.

memset(&config, 0, sizeof(config));
config.model.vits.model =
"./vits-piper-en_US-lessac-medium/en_US-lessac-medium.onnx";
config.model.vits.tokens =
"./vits-piper-en_US-lessac-medium/tokens.txt";
config.model.vits.data_dir =
"./vits-piper-en_US-lessac-medium/espeak-ng-data";
config.model.vits.noise_scale = 0.667f;
config.model.vits.noise_scale_w = 0.8f;
config.model.vits.length_scale = 1.0f;
config.model.num_threads = 2;

Model package: vits-piper-en_US-lessac-medium

Example source: offline-tts-c-api.c

Matcha

Matcha is a flow-matching TTS model that requires a separate vocoder (e.g., Vocos).

memset(&config, 0, sizeof(config));
config.model.matcha.acoustic_model =
"./matcha-icefall-en_US-ljspeech/model-steps-3.onnx";
config.model.matcha.vocoder = "./vocos-22khz-univ.onnx";
config.model.matcha.tokens =
"./matcha-icefall-en_US-ljspeech/tokens.txt";
config.model.matcha.data_dir =
"./matcha-icefall-en_US-ljspeech/espeak-ng-data";
config.model.num_threads = 2;

Model packages: matcha-icefall-en_US-ljspeech (English), matcha-icefall-zh-baker (Chinese)

Example source: matcha-tts-en-c-api.c
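
Since Matcha needs two ONNX files, the acoustic model and the vocoder, forgetting one of them is an easy mistake. The check below is a sketch only: DemoMatchaConfig is a hypothetical stand-in whose field names are assumptions for illustration, and demo_matcha_has_both_models is not a sherpa-onnx function.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the Matcha sub-struct. The field names are assumptions made
 * for this illustration and are not copied from c-api.h. */
typedef struct {
  const char *acoustic_model;
  const char *vocoder;
} DemoMatchaConfig;

/* Returns 1 when both the acoustic model path and the vocoder path are
 * non-empty, and 0 otherwise. */
int demo_matcha_has_both_models(const DemoMatchaConfig *m) {
  assert(m != NULL);
  return m->acoustic_model != NULL && m->acoustic_model[0] != '\0' &&
         m->vocoder != NULL && m->vocoder[0] != '\0';
}
```

The same kind of check would apply to any family that pairs a model with a separate vocoder.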

Kitten

Kitten is a compact TTS model with voice support.

memset(&config, 0, sizeof(config));
config.model.kitten.model = "./kitten-nano-en-v0_1-fp16/model.fp16.onnx";
config.model.kitten.voices = "./kitten-nano-en-v0_1-fp16/voices.bin";
config.model.kitten.tokens = "./kitten-nano-en-v0_1-fp16/tokens.txt";
config.model.kitten.data_dir =
"./kitten-nano-en-v0_1-fp16/espeak-ng-data";
config.model.num_threads = 2;

Model package: kitten-nano-en-v0_1-fp16

Example source: kitten-tts-en-c-api.c

ZipVoice

ZipVoice is a flow-matching TTS model with a separate vocoder. It supports Chinese and English.

memset(&config, 0, sizeof(config));
"./sherpa-onnx-zipvoice-distill-int8-zh-en-emilia/encoder.int8.onnx";
"./sherpa-onnx-zipvoice-distill-int8-zh-en-emilia/decoder.int8.onnx";
config.model.zipvoice.vocoder = "./vocos_24khz.onnx";
"./sherpa-onnx-zipvoice-distill-int8-zh-en-emilia/tokens.txt";
"./sherpa-onnx-zipvoice-distill-int8-zh-en-emilia/lexicon.txt";
"./sherpa-onnx-zipvoice-distill-int8-zh-en-emilia/espeak-ng-data";
config.model.num_threads = 2;

Model package: sherpa-onnx-zipvoice-distill-int8-zh-en-emilia

Example source: zipvoice-tts-zh-en-c-api.c

Pocket

Pocket TTS uses a language model flow architecture with multiple ONNX files.

memset(&config, 0, sizeof(config));
"./sherpa-onnx-pocket-tts-int8-2026-01-26/lm_flow.int8.onnx";
"./sherpa-onnx-pocket-tts-int8-2026-01-26/lm_main.int8.onnx";
"./sherpa-onnx-pocket-tts-int8-2026-01-26/encoder.onnx";
"./sherpa-onnx-pocket-tts-int8-2026-01-26/decoder.int8.onnx";
"./sherpa-onnx-pocket-tts-int8-2026-01-26/text_conditioner.onnx";
"./sherpa-onnx-pocket-tts-int8-2026-01-26/vocab.json";
"./sherpa-onnx-pocket-tts-int8-2026-01-26/token_scores.json";
config.model.num_threads = 2;

Model package: sherpa-onnx-pocket-tts-int8-2026-01-26

Example source: pocket-tts-en-c-api.c

Supertonic

Supertonic is a non-autoregressive TTS model using duration prediction and vector estimation.

memset(&config, 0, sizeof(config));
"./sherpa-onnx-supertonic-3-tts-int8-2026-05-11/duration_predictor.int8.onnx";
"./sherpa-onnx-supertonic-3-tts-int8-2026-05-11/text_encoder.int8.onnx";
"./sherpa-onnx-supertonic-3-tts-int8-2026-05-11/vector_estimator.int8.onnx";
"./sherpa-onnx-supertonic-3-tts-int8-2026-05-11/vocoder.int8.onnx";
"./sherpa-onnx-supertonic-3-tts-int8-2026-05-11/tts.json";
"./sherpa-onnx-supertonic-3-tts-int8-2026-05-11/unicode_indexer.bin";
"./sherpa-onnx-supertonic-3-tts-int8-2026-05-11/voice.bin";
config.model.num_threads = 2;

Model package: sherpa-onnx-supertonic-3-tts-int8-2026-05-11

Example source: supertonic-tts-en-c-api.c