matcha-icefall-zh-baker
Info about this model
This model is trained using the code from https://github.com/k2-fsa/icefall/tree/master/egs/baker_zh/TTS/matcha
It supports only Chinese.
| Number of speakers | Sample rate |
|---|---|
| 1 | 22050 |
Samples
For the following text:
某某银行的副行长和一些行政领导表示,他们去过长江和长白山; 经济不断增长。2024年12月31号,拨打110或者18920240511。123456块钱。当夜幕降临,星光点点,伴随着微风拂面,我在静谧中感受着时光的流转,思念如涟漪荡漾,梦境如画卷展开,我与自然融为一体,沉静在这片宁静的美丽之中,感受着生命的奇迹与温柔.
the generated audio samples are listed below:
Speaker 0
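The sample text above deliberately contains a date, phone numbers, and an amount of money. The model ships rule FSTs (date.fst, number.fst, phone.fst) that rewrite such digit strings into Chinese words before synthesis; the examples below pass them via rule_fsts. As a rough illustration of the idea only (this is not the actual FST logic, and reading "1" as 幺 rather than 一 is an assumption based on the usual phone-number convention), a phone number is spelled out digit by digit:

```python
# Chinese reading for each digit; phone numbers are read digit by digit.
# "1" is rendered as 幺 (yao1), the common convention for phone numbers.
DIGITS = {
    "0": "零", "1": "幺", "2": "二", "3": "三", "4": "四",
    "5": "五", "6": "六", "7": "七", "8": "八", "9": "九",
}

def read_phone_number(number: str) -> str:
    """Spell a digit string out character by character."""
    return "".join(DIGITS[d] for d in number)

print(read_phone_number("110"))  # 幺幺零
```

At run time no such code is needed: sherpa-onnx applies the bundled FSTs internally when you list them in rule_fsts.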
Android APK
The following table shows the Android TTS Engine APKs built with this model for each sherpa-onnx release.
If you don't know what ABI is, you probably need to select arm64-v8a.
The source code for the APK can be found at
https://github.com/k2-fsa/sherpa-onnx/tree/master/android/SherpaOnnxTtsEngine
Please refer to the documentation for how to build the APK from source code.
More Android APKs can be found at
https://k2-fsa.github.io/sherpa/onnx/tts/apk-engine.html
Huggingface space
You can try this model by visiting https://huggingface.co/spaces/k2-fsa/text-to-speech
Huggingface space (WebAssembly, wasm)
You can try this model by visiting
https://huggingface.co/spaces/k2-fsa/web-assembly-zh-tts-matcha
The source code is available at https://github.com/k2-fsa/sherpa-onnx/tree/master/wasm/tts
Download the model
You need to download the acoustic model and the vocoder model.
Download the acoustic model
Please use the following commands to download the model:
wget https://github.com/k2-fsa/sherpa-onnx/releases/download/tts-models/matcha-icefall-zh-baker.tar.bz2
tar xvf matcha-icefall-zh-baker.tar.bz2
rm matcha-icefall-zh-baker.tar.bz2
You should see the following output:
ls -lh matcha-icefall-zh-baker/
total 150848
-rw-r--r--@ 1 fangjun staff 58K 6 Oct 08:39 date.fst
drwxr-xr-x@ 10 fangjun staff 320B 18 Feb 2025 dict
-rw-r--r--@ 1 fangjun staff 1.3M 6 Oct 08:39 lexicon.txt
-rw-r--r--@ 1 fangjun staff 72M 6 Oct 08:39 model-steps-3.onnx
-rw-r--r--@ 1 fangjun staff 63K 6 Oct 08:39 number.fst
-rw-r--r--@ 1 fangjun staff 87K 6 Oct 08:39 phone.fst
-rw-r--r--@ 1 fangjun staff 370B 6 Oct 08:39 README.md
-rw-r--r--@ 1 fangjun staff 19K 6 Oct 08:39 tokens.txt
Note: The dict directory is no longer needed for this model.
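Before pointing the APIs below at these paths, it can help to check that extraction actually left all the required files in place. A small self-contained Python check (the file list mirrors the ls output above; none of this is a sherpa-onnx API):

```python
from pathlib import Path

# Files the examples in this document reference, per the ls output above.
REQUIRED = [
    "model-steps-3.onnx",  # acoustic model
    "lexicon.txt",
    "tokens.txt",
    "phone.fst",
    "date.fst",
    "number.fst",
]

def missing_files(model_dir: str) -> list:
    """Return the required files that are absent from model_dir."""
    root = Path(model_dir)
    return [name for name in REQUIRED if not (root / name).is_file()]

# Report anything missing from the extracted directory.
missing = missing_files("matcha-icefall-zh-baker")
if missing:
    print("missing:", ", ".join(missing))
```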
Download the vocoder model
wget https://github.com/k2-fsa/sherpa-onnx/releases/download/vocoder-models/vocos-22khz-univ.onnx
You should see the following output:
ls -lh vocos-22khz-univ.onnx
-rw-r--r--@ 1 fangjun staff 51M 17 Mar 2025 vocos-22khz-univ.onnx
Python API
The following code shows how to use the Python API of sherpa-onnx with this model.
Example code
import sherpa_onnx
import soundfile as sf

config = sherpa_onnx.OfflineTtsConfig(
    model=sherpa_onnx.OfflineTtsModelConfig(
        matcha=sherpa_onnx.OfflineTtsMatchaModelConfig(
            acoustic_model="matcha-icefall-zh-baker/model-steps-3.onnx",
            vocoder="vocos-22khz-univ.onnx",
            lexicon="matcha-icefall-zh-baker/lexicon.txt",
            tokens="matcha-icefall-zh-baker/tokens.txt",
        ),
        num_threads=2,
        debug=True,  # set it to False to disable debug output
    ),
    max_num_sentences=1,
    rule_fsts="matcha-icefall-zh-baker/phone.fst,matcha-icefall-zh-baker/date.fst,matcha-icefall-zh-baker/number.fst",
)

if not config.validate():
    raise ValueError("Please check your config")

tts = sherpa_onnx.OfflineTts(config)

text = "某某银行的副行长和一些行政领导表示,他们去过长江和长白山; 经济不断增长。2024年12月31号,拨打110或者18920240511。123456块钱。当夜幕降临,星光点点,伴随着微风拂面,我在静谧中感受着时光的流转,思念如涟漪荡漾,梦境如画卷展开,我与自然融为一体,沉静在这片宁静的美丽之中,感受着生命的奇迹与温柔."

audio = tts.generate(text, sid=0, speed=1.0)

sf.write(
    "./test.mp3",
    audio.samples,
    samplerate=audio.sample_rate,
)
You can save it as test-zh.py and then run:
pip install sherpa-onnx soundfile
python3 ./test-zh.py
You will get a file test.mp3 in the end.
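audio.samples holds floating-point samples in the range [-1, 1]; soundfile converts them to the target format when writing. If you need raw 16-bit PCM yourself (for example, to hand the samples to another audio API), the conversion is just a clamp and a scale. A minimal sketch (not a sherpa-onnx function):

```python
def float_to_pcm16(samples):
    """Clamp floats to [-1, 1] and scale to signed 16-bit integers."""
    out = []
    for s in samples:
        s = max(-1.0, min(1.0, s))  # guard against the occasional overshoot
        out.append(int(s * 32767))
    return out

print(float_to_pcm16([0.0, 0.5, 1.0, -1.0]))  # [0, 16383, 32767, -32767]
```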
C API
You can use the following code to play with matcha-icefall-zh-baker using the C API.
Example code
#include <stdio.h>
#include <string.h>

#include "sherpa-onnx/c-api/c-api.h"

int main() {
  SherpaOnnxOfflineTtsConfig config;
  memset(&config, 0, sizeof(config));

  config.model.matcha.acoustic_model = "matcha-icefall-zh-baker/model-steps-3.onnx";
  config.model.matcha.vocoder = "vocos-22khz-univ.onnx";
  config.model.matcha.lexicon = "matcha-icefall-zh-baker/lexicon.txt";
  config.model.matcha.tokens = "matcha-icefall-zh-baker/tokens.txt";
  config.model.num_threads = 1;

  // If you want to see debug messages, please set it to 1
  config.model.debug = 0;

  config.rule_fsts = "matcha-icefall-zh-baker/phone.fst,matcha-icefall-zh-baker/date.fst,matcha-icefall-zh-baker/number.fst";

  const SherpaOnnxOfflineTts *tts = SherpaOnnxCreateOfflineTts(&config);
  int sid = 0;  // speaker id
  const char *text = "某某银行的副行长和一些行政领导表示,他们去过长江和长白山; 经济不断增长。2024年12月31号,拨打110或者18920240511。123456块钱。当夜幕降临,星光点点,伴随着微风拂面,我在静谧中感受着时光的流转,思念如涟漪荡漾,梦境如画卷展开,我与自然融为一体,沉静在这片宁静的美丽之中,感受着生命的奇迹与温柔.";

  const SherpaOnnxGeneratedAudio *audio =
      SherpaOnnxOfflineTtsGenerate(tts, text, sid, 1.0);

  SherpaOnnxWriteWave(audio->samples, audio->n, audio->sample_rate,
                      "./test.wav");

  // You need to free the pointers to avoid memory leaks in your app
  SherpaOnnxDestroyOfflineTtsGeneratedAudio(audio);
  SherpaOnnxDestroyOfflineTts(tts);

  printf("Saved to ./test.wav\n");

  return 0;
}
In the following, we describe how to compile and run the above C example.
Use shared library (dynamic link)
cd /tmp
git clone https://github.com/k2-fsa/sherpa-onnx
cd sherpa-onnx
mkdir build-shared
cd build-shared
cmake \
-DSHERPA_ONNX_ENABLE_C_API=ON \
-DCMAKE_BUILD_TYPE=Release \
-DBUILD_SHARED_LIBS=ON \
-DCMAKE_INSTALL_PREFIX=/tmp/sherpa-onnx/shared \
..
make
make install
You can find the required header and library files inside /tmp/sherpa-onnx/shared.
Assume you have saved the above example file as /tmp/test-zh.c.
Then you can compile it with the following command:
gcc \
  -I /tmp/sherpa-onnx/shared/include \
  -L /tmp/sherpa-onnx/shared/lib \
  -o /tmp/test-zh \
  /tmp/test-zh.c \
  -lsherpa-onnx-c-api \
  -lonnxruntime
Now you can run
cd /tmp
# Assume you have downloaded the acoustic model as well as the vocoder model and put them in /tmp
./test-zh
You probably need to run

# For Linux
export LD_LIBRARY_PATH=/tmp/sherpa-onnx/shared/lib:$LD_LIBRARY_PATH

# For macOS
export DYLD_LIBRARY_PATH=/tmp/sherpa-onnx/shared/lib:$DYLD_LIBRARY_PATH

before you run /tmp/test-zh.
Use static library (static link)
Please see the documentation at
https://k2-fsa.github.io/sherpa/onnx/c-api/index.html
C++ API
You can use the following code to play with matcha-icefall-zh-baker using the C++ API.
Example code
#include <cstdint>
#include <cstdio>
#include <string>

#include "sherpa-onnx/c-api/cxx-api.h"

static int32_t ProgressCallback(const float *samples, int32_t num_samples,
                                float progress, void *arg) {
  fprintf(stderr, "Progress: %.3f%%\n", progress * 100);
  // return 1 to continue generating
  // return 0 to stop generating
  return 1;
}

int main(int argc, char *argv[]) {
  using namespace sherpa_onnx::cxx;  // NOLINT

  OfflineTtsConfig config;
  config.model.matcha.acoustic_model = "matcha-icefall-zh-baker/model-steps-3.onnx";
  config.model.matcha.vocoder = "vocos-22khz-univ.onnx";
  config.model.matcha.lexicon = "matcha-icefall-zh-baker/lexicon.txt";
  config.model.matcha.tokens = "matcha-icefall-zh-baker/tokens.txt";
  config.model.num_threads = 1;

  // If you want to see debug messages, please set it to 1
  config.model.debug = 0;

  config.rule_fsts = "matcha-icefall-zh-baker/phone.fst,matcha-icefall-zh-baker/date.fst,matcha-icefall-zh-baker/number.fst";

  std::string filename = "./test.wav";
  std::string text = "某某银行的副行长和一些行政领导表示,他们去过长江和长白山; 经济不断增长。2024年12月31号,拨打110或者18920240511。123456块钱。当夜幕降临,星光点点,伴随着微风拂面,我在静谧中感受着时光的流转,思念如涟漪荡漾,梦境如画卷展开,我与自然融为一体,沉静在这片宁静的美丽之中,感受着生命的奇迹与温柔.";

  auto tts = OfflineTts::Create(config);
  int32_t sid = 0;
  float speed = 1.0;  // larger -> faster speech

#if 0
  // If you don't want to use a callback, then please enable this branch
  GeneratedAudio audio = tts.Generate(text, sid, speed);
#else
  GeneratedAudio audio = tts.Generate(text, sid, speed, ProgressCallback);
#endif

  WriteWave(filename, {audio.samples, audio.sample_rate});

  fprintf(stderr, "Input text is: %s\n", text.c_str());
  fprintf(stderr, "Speaker ID is: %d\n", sid);
  fprintf(stderr, "Saved to: %s\n", filename.c_str());

  return 0;
}
In the following, we describe how to compile and run the above C++ example.
Use shared library (dynamic link)
cd /tmp
git clone https://github.com/k2-fsa/sherpa-onnx
cd sherpa-onnx
mkdir build-shared
cd build-shared
cmake \
-DSHERPA_ONNX_ENABLE_C_API=ON \
-DCMAKE_BUILD_TYPE=Release \
-DBUILD_SHARED_LIBS=ON \
-DCMAKE_INSTALL_PREFIX=/tmp/sherpa-onnx/shared \
..
make
make install
You can find the required header and library files inside /tmp/sherpa-onnx/shared.
Assume you have saved the above example file as /tmp/test-zh.cc.
Then you can compile it with the following command:
g++ \
  -std=c++17 \
  -I /tmp/sherpa-onnx/shared/include \
  -L /tmp/sherpa-onnx/shared/lib \
  -o /tmp/test-zh \
  /tmp/test-zh.cc \
  -lsherpa-onnx-cxx-api \
  -lsherpa-onnx-c-api \
  -lonnxruntime
Now you can run
cd /tmp
# Assume you have downloaded the acoustic model as well as the vocoder model and put them in /tmp
./test-zh
You probably need to run

# For Linux
export LD_LIBRARY_PATH=/tmp/sherpa-onnx/shared/lib:$LD_LIBRARY_PATH

# For macOS
export DYLD_LIBRARY_PATH=/tmp/sherpa-onnx/shared/lib:$DYLD_LIBRARY_PATH

before you run /tmp/test-zh.
Use static library (static link)
Please see the documentation at