This documentation covers the public native APIs shipped in c-api.h and cxx-api.h.
These headers expose the main sherpa-onnx inference features for native applications and for language bindings that need a stable ABI.
What is documented here
The generated docs include the public APIs for:
- streaming ASR
- non-streaming ASR
- keyword spotting
- voice activity detection
- offline text-to-speech
- spoken language identification
- speaker embedding extraction and speaker management
- audio tagging
- offline and online punctuation
- linear resampling
- offline speaker diarization
- offline and online speech enhancement
Model-specific documentation
Each model family has its own documentation page with config examples:
- Non-Streaming (Offline) ASR Models: Zipformer Transducer, Zipformer CTC, Whisper, SenseVoice, Paraformer, Moonshine, FireRedAsr, Dolphin, Canary, Cohere, WeNet, Omnilingual, FunASR Nano, Qwen3, MedASR, TeleSpeech, GigaAM v2, Parakeet TDT, NeMo CTC
- Streaming (Online) ASR Models: Transducer (Zipformer), Nemotron, Paraformer, Zipformer2 CTC, T-One CTC
- Text-to-Speech (TTS) Models: Kokoro, VITS (Piper), Matcha, Kitten, ZipVoice, Pocket, Supertonic
- Voice Activity Detection (VAD): Silero VAD, Ten VAD
- Audio Tagging: Zipformer, CED
- Punctuation Restoration: Offline (CT-Transformer), Online (CNN-BiLSTM)
- Speech Enhancement / Denoising: GTCRN, DPDFNet (offline and online)
- Source Separation: Spleeter, UVR
- Offline Speaker Diarization: Pyannote segmentation + embedding clustering
- Speaker Embedding Extraction and Management: extraction, enrollment, search, verification
- Spoken Language Identification: Whisper-based
- Keyword Spotting: Zipformer KWS
- Linear Resampler
The C API also includes HarmonyOS-specific constructor variants where applicable.
Which header should I use?
Use c-api.h if you are:
- writing C code
- building FFI bindings for other languages
- integrating through a plain C ABI
Use cxx-api.h if you:
- write C++ code directly
- prefer RAII wrappers over manual destroy/free calls
- prefer std::string, std::vector, and move-only wrapper classes
Common ownership rules
For the C API:
- objects created by SherpaOnnxCreate*() are usually destroyed with a matching SherpaOnnxDestroy*()
- result snapshots, returned strings, and returned arrays must be released with the specific matching free/destroy function documented on each API
- some helpers return pointers to statically owned strings; those must not be freed
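The three C ownership cases above can be illustrated with a small stand-alone sketch. Every Demo* name below is hypothetical and exists only for this illustration; none of them is declared in c-api.h. The real create, destroy, and free functions are the ones documented on each API.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>

// Hypothetical stand-ins for a C-style API. These are NOT sherpa-onnx names.
struct DemoEngine {
  int live_results;  // number of returned strings not yet released
};

// Case 1: a created object is paired with a matching destroy call.
DemoEngine *DemoCreateEngine() {
  DemoEngine *e = static_cast<DemoEngine *>(std::malloc(sizeof(DemoEngine)));
  if (e) e->live_results = 0;
  return e;
}
void DemoDestroyEngine(DemoEngine *e) { std::free(e); }

// Case 2: a returned string is owned by the caller and must be released
// with its specific free function, here DemoFreeText().
const char *DemoGetText(DemoEngine *e) {
  const char src[] = "hello";
  char *copy = static_cast<char *>(std::malloc(sizeof(src)));
  if (!copy) return nullptr;
  std::memcpy(copy, src, sizeof(src));
  ++e->live_results;
  return copy;
}
void DemoFreeText(DemoEngine *e, const char *text) {
  if (!text) return;
  std::free(const_cast<char *>(text));
  --e->live_results;
}

// Case 3: a statically owned string. The caller must not free it.
const char *DemoGetVersion() { return "demo-1.0"; }

// Runs one create/use/release cycle. Returns the number of strings still
// outstanding (0 when everything was released), or -1 on failure.
int RunOwnershipDemo() {
  DemoEngine *engine = DemoCreateEngine();
  if (!engine) return -1;

  const char *text = DemoGetText(engine);  // caller-owned
  const char *version = DemoGetVersion();  // statically owned
  bool ok = text && std::strcmp(text, "hello") == 0 &&
            std::strcmp(version, "demo-1.0") == 0;

  DemoFreeText(engine, text);  // release the caller-owned string
  int outstanding = engine->live_results;
  DemoDestroyEngine(engine);   // matching destroy for the create call
  return ok ? outstanding : -1;
}
```

Calling RunOwnershipDemo() exercises all three cases in order. The point is the pairing: each allocation has exactly one release function, and the static string has none.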
For the C++ API:
- wrapper classes are move-only and use RAII
- copied result objects are returned as standard C++ value types
- callers normally do not need to manage the underlying C pointers directly
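As a rough illustration of what "move-only and RAII" means in practice, the sketch below wraps a hypothetical C-style create/destroy pair in a small class. DemoHandle, DemoCreateHandle, DemoDestroyHandle, and DemoWrapper are invented for this example and are not part of cxx-api.h. The actual wrapper classes may be structured differently.

```cpp
#include <cassert>
#include <utility>

// Hypothetical C-style handle and its create/destroy pair (not sherpa-onnx).
struct DemoHandle {
  int id;
};
static int g_live_handles = 0;  // counts handles that have not been destroyed

DemoHandle *DemoCreateHandle(int id) {
  ++g_live_handles;
  return new DemoHandle{id};
}
void DemoDestroyHandle(const DemoHandle *h) {
  if (h) {
    --g_live_handles;
    delete h;
  }
}

// Move-only RAII wrapper: the destructor releases the handle, copying is
// disabled, and moving transfers ownership and leaves the source empty.
class DemoWrapper {
 public:
  explicit DemoWrapper(int id) : p_(DemoCreateHandle(id)) {}
  ~DemoWrapper() { DemoDestroyHandle(p_); }

  DemoWrapper(const DemoWrapper &) = delete;
  DemoWrapper &operator=(const DemoWrapper &) = delete;

  DemoWrapper(DemoWrapper &&other) noexcept : p_(other.p_) {
    other.p_ = nullptr;
  }
  DemoWrapper &operator=(DemoWrapper &&other) noexcept {
    if (this != &other) {
      DemoDestroyHandle(p_);  // release what this object currently owns
      p_ = other.p_;
      other.p_ = nullptr;
    }
    return *this;
  }

  const DemoHandle *Get() const { return p_; }

 private:
  DemoHandle *p_ = nullptr;
};

// Creates a wrapper, moves it, and lets both objects go out of scope.
// Returns the number of handles still alive afterwards.
int LiveHandlesAfterScope() {
  {
    DemoWrapper a(1);
    DemoWrapper b(std::move(a));  // ownership moves from a to b
    assert(a.Get() == nullptr);
    assert(b.Get()->id == 1);
    assert(g_live_handles == 1);
  }  // b's destructor destroys the handle; a owns nothing
  return g_live_handles;
}
```

Because the copy operations are deleted, two wrappers can never own the same handle, so the destroy function runs exactly once per created handle.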
Typical workflow
For both APIs, the usual flow is:
- create and fill a config object
- create the engine or recognizer
- create a stream if the feature is stream-based
- feed audio or text
- run decode/compute/generate
- read back results
- destroy resources, or let the C++ wrappers clean them up automatically
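The flow above can be sketched end to end with a toy engine. ToyConfig, ToyStream, and ToyEngine are hypothetical placeholders that only mirror the order of the steps. They do not correspond to real sherpa-onnx classes, and the "decode" step here just counts samples above a threshold.

```cpp
#include <cassert>
#include <cmath>
#include <string>
#include <utility>
#include <vector>

// Hypothetical types that mirror the workflow steps (not sherpa-onnx APIs).
struct ToyConfig {
  float threshold = 0.5f;  // amplitude above which a sample counts as loud
};

class ToyStream {
 public:
  // Appends n samples to the stream's internal buffer.
  void AcceptWaveform(const float *samples, int n) {
    samples_.insert(samples_.end(), samples, samples + n);
  }
  const std::vector<float> &Samples() const { return samples_; }
  void SetResult(std::string text) { result_ = std::move(text); }
  const std::string &Result() const { return result_; }

 private:
  std::vector<float> samples_;
  std::string result_;
};

class ToyEngine {
 public:
  explicit ToyEngine(const ToyConfig &config) : config_(config) {}

  ToyStream CreateStream() const { return ToyStream(); }

  // Stand-in for decode/compute/generate: counts the loud samples.
  void Decode(ToyStream *stream) const {
    int loud = 0;
    for (float s : stream->Samples()) {
      if (std::fabs(s) > config_.threshold) ++loud;
    }
    stream->SetResult("loud=" + std::to_string(loud) + "/" +
                      std::to_string(stream->Samples().size()));
  }

  // The result is returned by value, as a plain std::string copy.
  std::string GetResult(const ToyStream &stream) const {
    return stream.Result();
  }

 private:
  ToyConfig config_;
};

std::string RunToyWorkflow() {
  ToyConfig config;                          // create and fill a config object
  config.threshold = 0.5f;
  ToyEngine engine(config);                  // create the engine
  ToyStream stream = engine.CreateStream();  // create a stream
  const float samples[] = {0.1f, -0.9f, 0.7f, 0.0f};
  stream.AcceptWaveform(samples, 4);         // feed audio
  engine.Decode(&stream);                    // run decode
  return engine.GetResult(stream);           // read back results
}  // engine and stream are cleaned up automatically here
```

With the four samples shown, two exceed the 0.5 threshold, so RunToyWorkflow() returns "loud=2/4". In the C API the last step would instead be explicit destroy and free calls for each created object and returned result.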
Recommended entry points
Start with:
Representative example programs live in:
Useful examples include:
Offline ASR (C API):
Streaming ASR (C API):
TTS (C API):
Other features (C API):
C++ API examples:
Generating the documentation
From sherpa-onnx/c-api/, run:
HTML output is written to: