matcha-icefall-zh-baker
Info about this model
This model is trained using the code from https://github.com/k2-fsa/icefall/tree/master/egs/baker_zh/TTS/matcha
It supports only Chinese.
| Number of speakers | Sample rate |
|---|---|
| 1 | 22050 |
Samples
For the following text:
某某银行的副行长和一些行政领导表示,他们去过长江和长白山; 经济不断增长。2024年12月31号,拨打110或者18920240511。123456块钱。当夜幕降临,星光点点,伴随着微风拂面,我在静谧中感受着时光的流转,思念如涟漪荡漾,梦境如画卷展开,我与自然融为一体,沉静在这片宁静的美丽之中,感受着生命的奇迹与温柔.
the generated audio samples are listed below:
Speaker 0
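The sample text above deliberately contains a date, phone numbers, and an amount of money. The model ships rule FSTs (date.fst, number.fst, phone.fst) that rewrite such digit strings into Chinese words before synthesis; the examples below pass them via rule_fsts. As a rough illustration of the idea only (this is not the actual FST logic, and reading "1" as 幺 rather than 一 is an assumption based on the usual phone-number convention), a phone number is spelled out digit by digit:

```python
# Chinese reading for each digit; phone numbers are read digit by digit.
# "1" is rendered as 幺 (yao1), the common convention for phone numbers.
DIGITS = {
    "0": "零", "1": "幺", "2": "二", "3": "三", "4": "四",
    "5": "五", "6": "六", "7": "七", "8": "八", "9": "九",
}

def read_phone_number(number: str) -> str:
    """Spell a digit string out character by character."""
    return "".join(DIGITS[d] for d in number)

print(read_phone_number("110"))  # 幺幺零
```

At run time no such code is needed: sherpa-onnx applies the bundled FSTs internally when you list them in rule_fsts.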
Android APK
The following table shows the Android TTS Engine APKs built with this model for each sherpa-onnx release.
If you don't know what ABI is, you probably need to select arm64-v8a.
The source code for the APK can be found at
https://github.com/k2-fsa/sherpa-onnx/tree/master/android/SherpaOnnxTtsEngine
Please refer to the documentation for how to build the APK from source code.
More Android APKs can be found at
https://k2-fsa.github.io/sherpa/onnx/tts/apk-engine.html
Huggingface space
You can try this model by visiting https://huggingface.co/spaces/k2-fsa/text-to-speech
Huggingface space (WebAssembly, wasm)
You can try this model by visiting
https://huggingface.co/spaces/k2-fsa/web-assembly-zh-tts-matcha
The source code is available at https://github.com/k2-fsa/sherpa-onnx/tree/master/wasm/tts
Download the model
You need to download the acoustic model and the vocoder model.
Download the acoustic model
Please use the following commands to download the model:
wget https://github.com/k2-fsa/sherpa-onnx/releases/download/tts-models/matcha-icefall-zh-baker.tar.bz2
tar xvf matcha-icefall-zh-baker.tar.bz2
rm matcha-icefall-zh-baker.tar.bz2
You should see the following output:
ls -lh matcha-icefall-zh-baker/
total 150848
-rw-r--r--@ 1 fangjun staff 58K 6 Oct 08:39 date.fst
drwxr-xr-x@ 10 fangjun staff 320B 18 Feb 2025 dict
-rw-r--r--@ 1 fangjun staff 1.3M 6 Oct 08:39 lexicon.txt
-rw-r--r--@ 1 fangjun staff 72M 6 Oct 08:39 model-steps-3.onnx
-rw-r--r--@ 1 fangjun staff 63K 6 Oct 08:39 number.fst
-rw-r--r--@ 1 fangjun staff 87K 6 Oct 08:39 phone.fst
-rw-r--r--@ 1 fangjun staff 370B 6 Oct 08:39 README.md
-rw-r--r--@ 1 fangjun staff 19K 6 Oct 08:39 tokens.txt
Note: The dict directory is no longer needed for this model.
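Before pointing the APIs below at these paths, it can help to check that extraction actually left all the required files in place. A small self-contained Python check (the file list mirrors the ls output above; none of this is a sherpa-onnx API):

```python
from pathlib import Path

# Files the examples in this document reference, per the ls output above.
REQUIRED = [
    "model-steps-3.onnx",  # acoustic model
    "lexicon.txt",
    "tokens.txt",
    "phone.fst",
    "date.fst",
    "number.fst",
]

def missing_files(model_dir: str) -> list:
    """Return the required files that are absent from model_dir."""
    root = Path(model_dir)
    return [name for name in REQUIRED if not (root / name).is_file()]

# Report anything missing from the extracted directory.
missing = missing_files("matcha-icefall-zh-baker")
if missing:
    print("missing:", ", ".join(missing))
```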
Download the vocoder model
wget https://github.com/k2-fsa/sherpa-onnx/releases/download/vocoder-models/vocos-22khz-univ.onnx
You should see the following output:
ls -lh vocos-22khz-univ.onnx
-rw-r--r--@ 1 fangjun staff 51M 17 Mar 2025 vocos-22khz-univ.onnx
Python API
The following code shows how to use the Python API of sherpa-onnx with this model.
Example code
import sherpa_onnx
import soundfile as sf

config = sherpa_onnx.OfflineTtsConfig(
    model=sherpa_onnx.OfflineTtsModelConfig(
        matcha=sherpa_onnx.OfflineTtsMatchaModelConfig(
            acoustic_model="matcha-icefall-zh-baker/model-steps-3.onnx",
            vocoder="vocos-22khz-univ.onnx",
            lexicon="matcha-icefall-zh-baker/lexicon.txt",
            tokens="matcha-icefall-zh-baker/tokens.txt",
        ),
        num_threads=2,
        debug=True,  # set it to False to disable debug output
    ),
    max_num_sentences=1,
    rule_fsts="matcha-icefall-zh-baker/phone.fst,matcha-icefall-zh-baker/date.fst,matcha-icefall-zh-baker/number.fst",
)

if not config.validate():
    raise ValueError("Please check your config")

tts = sherpa_onnx.OfflineTts(config)

text = "某某银行的副行长和一些行政领导表示,他们去过长江和长白山; 经济不断增长。2024年12月31号,拨打110或者18920240511。123456块钱。当夜幕降临,星光点点,伴随着微风拂面,我在静谧中感受着时光的流转,思念如涟漪荡漾,梦境如画卷展开,我与自然融为一体,沉静在这片宁静的美丽之中,感受着生命的奇迹与温柔."

audio = tts.generate(text, sid=0, speed=1.0)

sf.write(
    "./test.mp3",
    audio.samples,
    samplerate=audio.sample_rate,
)
You can save it as test-zh.py and then run:
pip install sherpa-onnx soundfile
python3 ./test-zh.py
You will get a file test.mp3 in the end.
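audio.samples holds floating-point samples in the range [-1, 1]; soundfile converts them to the target format when writing. If you need raw 16-bit PCM yourself (for example, to hand the samples to another audio API), the conversion is just a clamp and a scale. A minimal sketch (not a sherpa-onnx function):

```python
def float_to_pcm16(samples):
    """Clamp floats to [-1, 1] and scale to signed 16-bit integers."""
    out = []
    for s in samples:
        s = max(-1.0, min(1.0, s))  # guard against the occasional overshoot
        out.append(int(s * 32767))
    return out

print(float_to_pcm16([0.0, 0.5, 1.0, -1.0]))  # [0, 16383, 32767, -32767]
```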
C API
You can use the following code to play with matcha-icefall-zh-baker using the C API.
Example code
#include <stdio.h>
#include <string.h>

#include "sherpa-onnx/c-api/c-api.h"

int main() {
  SherpaOnnxOfflineTtsConfig config;
  memset(&config, 0, sizeof(config));

  config.model.matcha.acoustic_model = "matcha-icefall-zh-baker/model-steps-3.onnx";
  config.model.matcha.vocoder = "vocos-22khz-univ.onnx";
  config.model.matcha.lexicon = "matcha-icefall-zh-baker/lexicon.txt";
  config.model.matcha.tokens = "matcha-icefall-zh-baker/tokens.txt";
  config.model.num_threads = 1;

  // If you want to see debug messages, please set it to 1
  config.model.debug = 0;

  config.rule_fsts = "matcha-icefall-zh-baker/phone.fst,matcha-icefall-zh-baker/date.fst,matcha-icefall-zh-baker/number.fst";

  const SherpaOnnxOfflineTts *tts = SherpaOnnxCreateOfflineTts(&config);
  int sid = 0;  // speaker id
  const char *text = "某某银行的副行长和一些行政领导表示,他们去过长江和长白山; 经济不断增长。2024年12月31号,拨打110或者18920240511。123456块钱。当夜幕降临,星光点点,伴随着微风拂面,我在静谧中感受着时光的流转,思念如涟漪荡漾,梦境如画卷展开,我与自然融为一体,沉静在这片宁静的美丽之中,感受着生命的奇迹与温柔.";

  const SherpaOnnxGeneratedAudio *audio =
      SherpaOnnxOfflineTtsGenerate(tts, text, sid, 1.0);

  SherpaOnnxWriteWave(audio->samples, audio->n, audio->sample_rate,
                      "./test.wav");

  // You need to free the pointers to avoid memory leaks in your app
  SherpaOnnxDestroyOfflineTtsGeneratedAudio(audio);
  SherpaOnnxDestroyOfflineTts(tts);

  printf("Saved to ./test.wav\n");

  return 0;
}
In the following, we describe how to compile and run the above C example.
Use shared library (dynamic link)
cd /tmp
git clone https://github.com/k2-fsa/sherpa-onnx
cd sherpa-onnx
mkdir build-shared
cd build-shared
cmake \
-DSHERPA_ONNX_ENABLE_C_API=ON \
-DCMAKE_BUILD_TYPE=Release \
-DBUILD_SHARED_LIBS=ON \
-DCMAKE_INSTALL_PREFIX=/tmp/sherpa-onnx/shared \
..
make
make install
You can find the required header and library files inside /tmp/sherpa-onnx/shared.
Assume you have saved the above example file as /tmp/test-zh.c.
Then you can compile it with the following command:
gcc \
  -I /tmp/sherpa-onnx/shared/include \
  -L /tmp/sherpa-onnx/shared/lib \
  -o /tmp/test-zh \
  /tmp/test-zh.c \
  -lsherpa-onnx-c-api \
  -lonnxruntime
Now you can run
cd /tmp
# Assume you have downloaded the acoustic model as well as the vocoder model and put them in /tmp
./test-zh
You probably need to run

# For Linux
export LD_LIBRARY_PATH=/tmp/sherpa-onnx/shared/lib:$LD_LIBRARY_PATH

# For macOS
export DYLD_LIBRARY_PATH=/tmp/sherpa-onnx/shared/lib:$DYLD_LIBRARY_PATH

before you run /tmp/test-zh.
Use static library (static link)
Please see the documentation at
https://k2-fsa.github.io/sherpa/onnx/c-api/index.html
C++ API
You can use the following code to play with matcha-icefall-zh-baker using the C++ API.
Example code
#include <cstdint>
#include <cstdio>
#include <string>

#include "sherpa-onnx/c-api/cxx-api.h"

static int32_t ProgressCallback(const float *samples, int32_t num_samples,
                                float progress, void *arg) {
  fprintf(stderr, "Progress: %.3f%%\n", progress * 100);
  // return 1 to continue generating
  // return 0 to stop generating
  return 1;
}

int main(int argc, char *argv[]) {
  using namespace sherpa_onnx::cxx;  // NOLINT

  OfflineTtsConfig config;
  config.model.matcha.acoustic_model = "matcha-icefall-zh-baker/model-steps-3.onnx";
  config.model.matcha.vocoder = "vocos-22khz-univ.onnx";
  config.model.matcha.lexicon = "matcha-icefall-zh-baker/lexicon.txt";
  config.model.matcha.tokens = "matcha-icefall-zh-baker/tokens.txt";
  config.model.num_threads = 1;

  // If you want to see debug messages, please set it to 1
  config.model.debug = 0;

  config.rule_fsts = "matcha-icefall-zh-baker/phone.fst,matcha-icefall-zh-baker/date.fst,matcha-icefall-zh-baker/number.fst";

  std::string filename = "./test.wav";
  std::string text = "某某银行的副行长和一些行政领导表示,他们去过长江和长白山; 经济不断增长。2024年12月31号,拨打110或者18920240511。123456块钱。当夜幕降临,星光点点,伴随着微风拂面,我在静谧中感受着时光的流转,思念如涟漪荡漾,梦境如画卷展开,我与自然融为一体,沉静在这片宁静的美丽之中,感受着生命的奇迹与温柔.";

  auto tts = OfflineTts::Create(config);
  int32_t sid = 0;
  float speed = 1.0;  // larger -> faster speech

#if 0
  // If you don't want to use a callback, then please enable this branch
  GeneratedAudio audio = tts.Generate(text, sid, speed);
#else
  GeneratedAudio audio = tts.Generate(text, sid, speed, ProgressCallback);
#endif

  WriteWave(filename, {audio.samples, audio.sample_rate});

  fprintf(stderr, "Input text is: %s\n", text.c_str());
  fprintf(stderr, "Speaker ID is: %d\n", sid);
  fprintf(stderr, "Saved to: %s\n", filename.c_str());

  return 0;
}
In the following, we describe how to compile and run the above C++ example.
Use shared library (dynamic link)
cd /tmp
git clone https://github.com/k2-fsa/sherpa-onnx
cd sherpa-onnx
mkdir build-shared
cd build-shared
cmake \
-DSHERPA_ONNX_ENABLE_C_API=ON \
-DCMAKE_BUILD_TYPE=Release \
-DBUILD_SHARED_LIBS=ON \
-DCMAKE_INSTALL_PREFIX=/tmp/sherpa-onnx/shared \
..
make
make install
You can find the required header and library files inside /tmp/sherpa-onnx/shared.
Assume you have saved the above example file as /tmp/test-zh.cc.
Then you can compile it with the following command:
g++ \
  -std=c++17 \
  -I /tmp/sherpa-onnx/shared/include \
  -L /tmp/sherpa-onnx/shared/lib \
  -o /tmp/test-zh \
  /tmp/test-zh.cc \
  -lsherpa-onnx-cxx-api \
  -lsherpa-onnx-c-api \
  -lonnxruntime
Now you can run
cd /tmp
# Assume you have downloaded the acoustic model as well as the vocoder model and put them in /tmp
./test-zh
You probably need to run

# For Linux
export LD_LIBRARY_PATH=/tmp/sherpa-onnx/shared/lib:$LD_LIBRARY_PATH

# For macOS
export DYLD_LIBRARY_PATH=/tmp/sherpa-onnx/shared/lib:$DYLD_LIBRARY_PATH

before you run /tmp/test-zh.
Use static library (static link)
Please see the documentation at