kitten-nano-en-v0_1-fp16
Info about this model
This model is kitten-tts-nano-0.1 from https://huggingface.co/KittenML/kitten-tts-nano-0.1
It supports only English.
Number of speakers | Sample rate (Hz) |
---|---|
8 | 24000 |
Model download address
https://github.com/k2-fsa/sherpa-onnx/releases/download/tts-models/kitten-nano-en-v0_1-fp16.tar.bz2
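If you prefer to fetch the model from a script, the following is a minimal sketch using only the Python standard library. The helper names download and extract are made up for this example. It also assumes that the archive unpacks into a directory named kitten-nano-en-v0_1-fp16, which matches the paths used in the code examples on this page.

```python
import tarfile
import urllib.request
from pathlib import Path

# URL copied from the "Model download address" section.
MODEL_URL = (
    "https://github.com/k2-fsa/sherpa-onnx/releases/download/"
    "tts-models/kitten-nano-en-v0_1-fp16.tar.bz2"
)


def download(url: str, dest_dir: str = ".") -> Path:
    """Download url into dest_dir and return the path of the saved file."""
    dest = Path(dest_dir) / url.rsplit("/", 1)[-1]
    if not dest.exists():  # skip the download if the archive is already there
        urllib.request.urlretrieve(url, str(dest))
    return dest


def extract(archive: Path, dest_dir: str = ".") -> None:
    """Unpack a .tar.bz2 archive into dest_dir."""
    with tarfile.open(archive, "r:bz2") as tar:
        tar.extractall(dest_dir)
```

Calling extract(download(MODEL_URL)) should leave the model files under ./kitten-nano-en-v0_1-fp16 in the current directory.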
Android APK
The following table shows the Android TTS Engine APK with this model for sherpa-onnx v1.12.10
If you don't know what an ABI is, you probably need to select arm64-v8a.
The source code for the APK can be found at
https://github.com/k2-fsa/sherpa-onnx/tree/master/android/SherpaOnnxTtsEngine
Please refer to the documentation for how to build the APK from source code.
More Android APKs can be found at
https://k2-fsa.github.io/sherpa/onnx/tts/apk-engine.html
Meaning of speaker suffix
Suffix | Meaning |
---|---|
f | Female |
m | Male |
speaker ID to speaker name (sid -> name)
The mapping from speaker ID (sid) to speaker name is given below:
0 - 1 | 0 -> expr-voice-2-m | 1 -> expr-voice-2-f |
2 - 3 | 2 -> expr-voice-3-m | 3 -> expr-voice-3-f |
4 - 5 | 4 -> expr-voice-4-m | 5 -> expr-voice-4-f |
6 - 7 | 6 -> expr-voice-5-m | 7 -> expr-voice-5-f |
speaker name to speaker ID (name -> sid)
The mapping from speaker name to speaker ID (sid) is given below:
0 - 1 | expr-voice-2-m -> 0 | expr-voice-2-f -> 1 |
2 - 3 | expr-voice-3-m -> 2 | expr-voice-3-f -> 3 |
4 - 5 | expr-voice-4-m -> 4 | expr-voice-4-f -> 5 |
6 - 7 | expr-voice-5-m -> 6 | expr-voice-5-f -> 7 |
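If you want to select a speaker by name in your own code, the two tables above can be written as a pair of dictionaries. This is a small sketch, not part of sherpa-onnx. The names SPEAKER_NAMES, SID_TO_NAME, NAME_TO_SID, and speaker_gender are made up for this example.

```python
# Speaker names in sid order, copied from the tables above.
SPEAKER_NAMES = [
    "expr-voice-2-m", "expr-voice-2-f",
    "expr-voice-3-m", "expr-voice-3-f",
    "expr-voice-4-m", "expr-voice-4-f",
    "expr-voice-5-m", "expr-voice-5-f",
]

# sid -> name and name -> sid lookups.
SID_TO_NAME = dict(enumerate(SPEAKER_NAMES))
NAME_TO_SID = {name: sid for sid, name in SID_TO_NAME.items()}


def speaker_gender(name: str) -> str:
    """Return "Female" or "Male" based on the -f / -m suffix."""
    return {"f": "Female", "m": "Male"}[name.rsplit("-", 1)[-1]]
```

For example, NAME_TO_SID["expr-voice-4-f"] gives the value to pass as sid for that speaker.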
C API
You can use the following code to play with kitten-nano-en-v0_1-fp16 with the C API.
#include <stdio.h>
#include <string.h>

#include "sherpa-onnx/c-api/c-api.h"

int main() {
  SherpaOnnxOfflineTtsConfig config;
  memset(&config, 0, sizeof(config));
  config.model.kitten.model = "./kitten-nano-en-v0_1-fp16/model.fp16.onnx";
  config.model.kitten.voices = "./kitten-nano-en-v0_1-fp16/voices.bin";
  config.model.kitten.tokens = "./kitten-nano-en-v0_1-fp16/tokens.txt";
  config.model.kitten.data_dir = "./kitten-nano-en-v0_1-fp16/espeak-ng-data";
  config.model.num_threads = 1;

  const SherpaOnnxOfflineTts *tts = SherpaOnnxCreateOfflineTts(&config);
  if (!tts) {
    // Defensive check, e.g., for wrong model paths
    fprintf(stderr, "Failed to create the TTS. Please check your config\n");
    return 1;
  }

  int sid = 0;  // speaker id, 0 to 7 for this model
  const char *text = "Friends fell out often because life was changing so fast. The easiest thing in the world was to lose touch with someone.";

  // The last argument is the speed; 1.0 means normal speed
  const SherpaOnnxGeneratedAudio *audio =
      SherpaOnnxOfflineTtsGenerate(tts, text, sid, 1.0);

  SherpaOnnxWriteWave(audio->samples, audio->n, audio->sample_rate,
                      "./test.wav");

  // You need to free the pointers to avoid memory leak in your app
  SherpaOnnxDestroyOfflineTtsGeneratedAudio(audio);
  SherpaOnnxDestroyOfflineTts(tts);

  printf("Saved to ./test.wav\n");
  return 0;
}
In the following, we describe how to compile and run the above C example.
Use shared library (dynamic link)
cd /tmp
git clone https://github.com/k2-fsa/sherpa-onnx
cd sherpa-onnx
mkdir build-shared
cd build-shared
cmake -DSHERPA_ONNX_ENABLE_C_API=ON -DCMAKE_BUILD_TYPE=Release -DBUILD_SHARED_LIBS=ON -DCMAKE_INSTALL_PREFIX=/tmp/sherpa-onnx/shared ..
make
make install
You can find the required header file and library files inside /tmp/sherpa-onnx/shared.
Assume you have saved the above example file as /tmp/test-kitten.c.
Then you can compile it with the following command:
gcc -I /tmp/sherpa-onnx/shared/include -L /tmp/sherpa-onnx/shared/lib -o /tmp/test-kitten /tmp/test-kitten.c -lsherpa-onnx-c-api -lonnxruntime
Now you can run
cd /tmp
# Assume you have downloaded the model and extracted it to /tmp
./test-kitten
You probably need to run
# For Linux
export LD_LIBRARY_PATH=/tmp/sherpa-onnx/shared/lib:$LD_LIBRARY_PATH
# For macOS
export DYLD_LIBRARY_PATH=/tmp/sherpa-onnx/shared/lib:$DYLD_LIBRARY_PATH
before you run /tmp/test-kitten.
Use static library (static link)
Please see the documentation at
https://k2-fsa.github.io/sherpa/onnx/c-api/index.html
Python API
Assume you have installed sherpa-onnx and soundfile via
pip install sherpa-onnx soundfile
and you have downloaded the model from the address listed under "Model download address" above.
You can use the following code to play with ./kitten-nano-en-v0_1-fp16
import sherpa_onnx
import soundfile as sf

config = sherpa_onnx.OfflineTtsConfig(
    model=sherpa_onnx.OfflineTtsModelConfig(
        kitten=sherpa_onnx.OfflineTtsKittenModelConfig(
            model="./kitten-nano-en-v0_1-fp16/model.fp16.onnx",
            voices="./kitten-nano-en-v0_1-fp16/voices.bin",
            tokens="./kitten-nano-en-v0_1-fp16/tokens.txt",
            data_dir="./kitten-nano-en-v0_1-fp16/espeak-ng-data",
        ),
        num_threads=2,
    ),
)

if not config.validate():
    raise ValueError("Please check your config")

tts = sherpa_onnx.OfflineTts(config)

# sid is the speaker ID (0 to 7 for this model); speed=1.0 means normal speed
audio = tts.generate(text="Friends fell out often because life was changing so fast. The easiest thing in the world was to lose touch with someone.", sid=0, speed=1.0)

# Writing MP3 needs libsndfile 1.1.0 or newer; use "test.wav" if this fails
sf.write("test.mp3", audio.samples, samplerate=audio.sample_rate)
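To compare the speakers yourself, you can loop over all speaker IDs and write one file per speaker. The following is a sketch that reuses only the calls shown in the Python example above. The names SPEAKER_NAMES, output_name, and generate_all_speakers, as well as the output file naming scheme, are made up for this example. The model directory is assumed to be ./kitten-nano-en-v0_1-fp16.

```python
# Speaker names in sid order, copied from the sid -> name table.
SPEAKER_NAMES = [
    "expr-voice-2-m", "expr-voice-2-f",
    "expr-voice-3-m", "expr-voice-3-f",
    "expr-voice-4-m", "expr-voice-4-f",
    "expr-voice-5-m", "expr-voice-5-f",
]


def output_name(sid: int) -> str:
    """Hypothetical naming scheme, e.g. sid 3 -> sid-3-expr-voice-3-f.wav"""
    return f"sid-{sid}-{SPEAKER_NAMES[sid]}.wav"


def generate_all_speakers(
    text: str, model_dir: str = "./kitten-nano-en-v0_1-fp16"
) -> None:
    """Synthesize text once per speaker and save each result as a wave file."""
    # Imported here so that output_name() works without these packages.
    import sherpa_onnx
    import soundfile as sf

    config = sherpa_onnx.OfflineTtsConfig(
        model=sherpa_onnx.OfflineTtsModelConfig(
            kitten=sherpa_onnx.OfflineTtsKittenModelConfig(
                model=f"{model_dir}/model.fp16.onnx",
                voices=f"{model_dir}/voices.bin",
                tokens=f"{model_dir}/tokens.txt",
                data_dir=f"{model_dir}/espeak-ng-data",
            ),
            num_threads=2,
        ),
    )
    if not config.validate():
        raise ValueError("Please check your config")

    tts = sherpa_onnx.OfflineTts(config)
    for sid in range(len(SPEAKER_NAMES)):
        audio = tts.generate(text=text, sid=sid, speed=1.0)
        sf.write(output_name(sid), audio.samples, samplerate=audio.sample_rate)
        # Duration in seconds = number of samples / sample rate
        seconds = len(audio.samples) / audio.sample_rate
        print(f"Saved {output_name(sid)} ({seconds:.2f} s)")
```

Calling generate_all_speakers("Friends fell out often because life was changing so fast.") from the directory that contains the model should write eight wave files to the current directory.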
Samples
For the following text:
Friends fell out often because life was changing so fast.
The easiest thing in the world was to lose touch with someone.
sample audio for the different speakers is listed below: