matcha-icefall-zh-en

Info about this model

This model was trained with a modified version of the code at https://github.com/k2-fsa/icefall/tree/master/egs/baker_zh/TTS/matcha

It is from https://modelscope.cn/models/dengcunqin/matcha_tts_zh_en_20251010

It supports Chinese and English.

Number of speakers    Sample rate (Hz)
1                     16000

Download the model


You need to download the acoustic model and the vocoder model.

Download the acoustic model

Please use the following commands to download and extract the model:

wget https://github.com/k2-fsa/sherpa-onnx/releases/download/tts-models/matcha-icefall-zh-en.tar.bz2

tar xvf matcha-icefall-zh-en.tar.bz2
rm matcha-icefall-zh-en.tar.bz2

You should see the following output:

ls -lh matcha-icefall-zh-en/
total 168432
-rw-r--r--@   1 fangjun  staff    58K  4 Dec 14:29 date-zh.fst
drwxr-xr-x@ 122 fangjun  staff   3.8K 28 Nov  2023 espeak-ng-data
-rw-r--r--@   1 fangjun  staff   1.3M  4 Dec 14:29 lexicon.txt
-rw-r--r--@   1 fangjun  staff    72M  4 Dec 14:29 model-steps-3.onnx
-rw-r--r--@   1 fangjun  staff    63K  4 Dec 14:29 number-zh.fst
-rw-r--r--@   1 fangjun  staff    87K  4 Dec 14:29 phone-zh.fst
-rw-r--r--@   1 fangjun  staff   2.0K  4 Dec 14:29 README.md
-rw-r--r--@   1 fangjun  staff    21K  4 Dec 14:29 tokens.txt
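Before running any of the examples below, it can save time to confirm that the extracted directory contains everything the configs reference. The following is a minimal Python sketch; the helper name `missing_model_files` is hypothetical, and the list of required entries simply mirrors the paths used by the examples on this page rather than an official manifest.

```python
from __future__ import annotations

from pathlib import Path

# Entries (relative to the model directory) that the examples on this page
# pass to sherpa-onnx. Mirrors those examples; not an official manifest.
REQUIRED = [
    "model-steps-3.onnx",
    "lexicon.txt",
    "tokens.txt",
    "espeak-ng-data",
    "phone-zh.fst",
    "date-zh.fst",
    "number-zh.fst",
]


def missing_model_files(model_dir: str) -> list[str]:
    """Return the required entries that do not exist under model_dir."""
    root = Path(model_dir)
    return [name for name in REQUIRED if not (root / name).exists()]


# Usage:
#   print(missing_model_files("matcha-icefall-zh-en"))  # [] when complete
```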

Download the vocoder model

wget https://github.com/k2-fsa/sherpa-onnx/releases/download/vocoder-models/vocos-16khz-univ.onnx

You should see the following output:

ls -lh vocos-16khz-univ.onnx

-rw-r--r--@ 1 fangjun  staff    51M  4 Dec 14:54 vocos-16khz-univ.onnx
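If `wget` or `tar` is not available (for example on Windows), the same download and extraction steps can be done with the Python standard library. This is a sketch under that assumption; the helper names `download` and `extract_tar_bz2` are hypothetical, and the URLs are the ones given above.

```python
import tarfile
import urllib.request
from pathlib import Path

BASE = "https://github.com/k2-fsa/sherpa-onnx/releases/download"
ACOUSTIC_URL = f"{BASE}/tts-models/matcha-icefall-zh-en.tar.bz2"
VOCODER_URL = f"{BASE}/vocoder-models/vocos-16khz-univ.onnx"


def download(url: str, dest: str) -> Path:
    """Download url to dest (like `wget`) and return the local path."""
    path = Path(dest)
    urllib.request.urlretrieve(url, path)
    return path


def extract_tar_bz2(archive: str, dest_dir: str = ".") -> None:
    """Extract a .tar.bz2 archive (like `tar xvf`), then delete it (like `rm`)."""
    with tarfile.open(archive, "r:bz2") as tar:
        # The "data" filter rejects unsafe members; only newer Pythons have it.
        if hasattr(tarfile, "data_filter"):
            tar.extractall(dest_dir, filter="data")
        else:
            tar.extractall(dest_dir)
    Path(archive).unlink()


# Usage:
#   extract_tar_bz2(str(download(ACOUSTIC_URL, "matcha-icefall-zh-en.tar.bz2")))
#   download(VOCODER_URL, "vocos-16khz-univ.onnx")
```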

Huggingface space

You can try this model by visiting https://huggingface.co/spaces/k2-fsa/text-to-speech

Huggingface space (WebAssembly, wasm)

You can try this model by visiting

https://huggingface.co/spaces/k2-fsa/web-assembly-zh-en-tts-matcha

The source code is available at https://github.com/k2-fsa/sherpa-onnx/tree/master/wasm/tts

Android APK


The following table shows the Android TTS Engine APK with this model for sherpa-onnx v

ABI          URL       Mirror in China
arm64-v8a    Download  Download
armeabi-v7a  Download  Download
x86_64       Download  Download
x86          Download  Download

If you are not sure which ABI your device uses, select arm64-v8a; most modern Android phones use it.
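If you have a shell on the device (for example through `adb shell`), the output of `uname -m` hints at the right ABI, and `adb shell getprop ro.product.cpu.abi` reports the primary ABI directly. The sketch below shows a typical mapping from `uname -m` values to the ABI names in the table; `android_abi_for` is a hypothetical helper, the table covers only common values, and unknown values fall back to arm64-v8a as recommended above.

```python
# Common `uname -m` values on Android and the matching APK ABI.
MACHINE_TO_ABI = {
    "aarch64": "arm64-v8a",
    "arm64": "arm64-v8a",
    "armv7l": "armeabi-v7a",
    "armv8l": "armeabi-v7a",  # 32-bit userspace on a 64-bit CPU
    "x86_64": "x86_64",
    "i686": "x86",
    "i386": "x86",
}


def android_abi_for(machine: str) -> str:
    """Map `uname -m` output to an ABI name; default to arm64-v8a."""
    return MACHINE_TO_ABI.get(machine.strip().lower(), "arm64-v8a")
```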

The source code for the APK can be found at

https://github.com/k2-fsa/sherpa-onnx/tree/master/android/SherpaOnnxTtsEngine

Please refer to the documentation for how to build the APK from source code.

More Android APKs can be found at

https://k2-fsa.github.io/sherpa/onnx/tts/apk-engine.html

Python API


The following code shows how to use the Python API of sherpa-onnx with this model.

import sherpa_onnx
import soundfile as sf

config = sherpa_onnx.OfflineTtsConfig(
    model=sherpa_onnx.OfflineTtsModelConfig(
        matcha=sherpa_onnx.OfflineTtsMatchaModelConfig(
            acoustic_model="matcha-icefall-zh-en/model-steps-3.onnx",
            vocoder="vocos-16khz-univ.onnx",
            lexicon="matcha-icefall-zh-en/lexicon.txt",
            tokens="matcha-icefall-zh-en/tokens.txt",
            data_dir="matcha-icefall-zh-en/espeak-ng-data",
        ),
        num_threads=2,
        debug=True, # set it to False to disable debug output
    ),
    max_num_sentences=1,
    rule_fsts="matcha-icefall-zh-en/phone-zh.fst,matcha-icefall-zh-en/date-zh.fst,matcha-icefall-zh-en/number-zh.fst",
)

if not config.validate():
    raise ValueError("Please check your config")

tts = sherpa_onnx.OfflineTts(config)
text = "我最近在学习machine learning,希望能够在未来的artificial intelligence领域有所建树。在这次vocation中,我们计划去Paris欣赏埃菲尔铁塔和卢浮宫的美景。某某银行的副行长和一些行政领导表示,他们去过长江和长白山; 经济不断增长。开始数字测试。2025年12月4号,拨打110或者189202512043。123456块钱。在这个快速发展的时代,人工智能技术正在改变我们的生活方式。语音合成作为人工智能的重要应用之一,让机器能够用自然流畅的语音与人类进行交流。"


audio = tts.generate(text, sid=0, speed=1.0)

sf.write(
    "./test.mp3",
    audio.samples,
    samplerate=audio.sample_rate,
)

You can save it as test_zh_en.py and then run:

pip install sherpa-onnx soundfile

python3 ./test_zh_en.py

When the script finishes, the generated audio is saved to test.mp3.
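If you prefer a WAV file, or do not want to depend on soundfile, the float samples returned by `tts.generate` can be written with the standard `wave` module. This is a minimal sketch assuming mono samples in [-1, 1]; `write_wav_pcm16` is a hypothetical helper, and 16-bit PCM is a choice made here, not something the model requires.

```python
import array
import sys
import wave


def write_wav_pcm16(filename: str, samples, sample_rate: int) -> None:
    """Write mono float samples in [-1, 1] as a 16-bit PCM WAV file."""
    pcm = array.array("h")
    for x in samples:
        x = max(-1.0, min(1.0, float(x)))  # clip to avoid integer overflow
        pcm.append(int(round(x * 32767)))
    if sys.byteorder == "big":
        pcm.byteswap()  # WAV stores samples little-endian
    with wave.open(filename, "wb") as f:
        f.setnchannels(1)
        f.setsampwidth(2)  # 2 bytes = 16 bits per sample
        f.setframerate(sample_rate)
        f.writeframes(pcm.tobytes())


# Usage with the example above:
#   write_wav_pcm16("./test.wav", audio.samples, audio.sample_rate)
```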

C API


You can use the following code to play with matcha-icefall-zh-en using the C API.

#include <stdio.h>
#include <string.h>

#include "sherpa-onnx/c-api/c-api.h"

int main() {
  SherpaOnnxOfflineTtsConfig config;
  memset(&config, 0, sizeof(config));

  config.model.matcha.acoustic_model = "matcha-icefall-zh-en/model-steps-3.onnx";
  config.model.matcha.vocoder = "vocos-16khz-univ.onnx";
  config.model.matcha.lexicon = "matcha-icefall-zh-en/lexicon.txt";
  config.model.matcha.tokens = "matcha-icefall-zh-en/tokens.txt";
  config.model.matcha.data_dir = "matcha-icefall-zh-en/espeak-ng-data";
  config.model.num_threads = 1;

  // If you want to see debug messages, please set it to 1
  config.model.debug = 0;
  config.rule_fsts = "matcha-icefall-zh-en/phone-zh.fst,matcha-icefall-zh-en/date-zh.fst,matcha-icefall-zh-en/number-zh.fst";

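  const SherpaOnnxOfflineTts *tts = SherpaOnnxCreateOfflineTts(&config);
  if (!tts) {
    // Creation fails, e.g., when a model path in the config is wrong
    fprintf(stderr, "Failed to create the TTS engine. Please check your config.\n");
    return -1;
  }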

  const char *text = "我最近在学习machine learning,希望能够在未来的artificial intelligence领域有所建树。在这次vocation中,我们计划去Paris欣赏埃菲尔铁塔和卢浮宫的美景。某某银行的副行长和一些行政领导表示,他们去过长江和长白山; 经济不断增长。开始数字测试。2025年12月4号,拨打110或者189202512043。123456块钱。在这个快速发展的时代,人工智能技术正在改变我们的生活方式。语音合成作为人工智能的重要应用之一,让机器能够用自然流畅的语音与人类进行交流。";

  SherpaOnnxGenerationConfig gen_cfg;
  memset(&gen_cfg, 0, sizeof(gen_cfg));
  gen_cfg.sid = 0;
  gen_cfg.speed = 1.0;

  const SherpaOnnxGeneratedAudio *audio =
      SherpaOnnxOfflineTtsGenerateWithConfig(tts, text, &gen_cfg, NULL, NULL);

  SherpaOnnxWriteWave(audio->samples, audio->n, audio->sample_rate,
                      "./test.wav");

  // You need to free the pointers to avoid memory leaks in your app
  SherpaOnnxDestroyOfflineTtsGeneratedAudio(audio);
  SherpaOnnxDestroyOfflineTts(tts);

  printf("Saved to ./test.wav\n");

  return 0;
}

In the following, we describe how to compile and run the above C example.

cd /tmp
git clone https://github.com/k2-fsa/sherpa-onnx
cd sherpa-onnx
mkdir build-shared
cd build-shared

cmake \
 -DSHERPA_ONNX_ENABLE_C_API=ON \
 -DCMAKE_BUILD_TYPE=Release \
 -DBUILD_SHARED_LIBS=ON \
 -DCMAKE_INSTALL_PREFIX=/tmp/sherpa-onnx/shared \
 ..

make
make install

You can find the required header files and library files inside /tmp/sherpa-onnx/shared.

Assuming you have saved the above example as /tmp/test-zh-en.c, you can compile it with the following command:

gcc \
  -I /tmp/sherpa-onnx/shared/include \
  -o /tmp/test-zh-en \
  /tmp/test-zh-en.c \
  -L /tmp/sherpa-onnx/shared/lib \
  -lsherpa-onnx-c-api \
  -lonnxruntime

Now you can run

cd /tmp

# Assume you have downloaded the acoustic model and the vocoder model into /tmp
./test-zh-en

You probably need to run

# For Linux
export LD_LIBRARY_PATH=/tmp/sherpa-onnx/shared/lib:$LD_LIBRARY_PATH

# For macOS
export DYLD_LIBRARY_PATH=/tmp/sherpa-onnx/shared/lib:$DYLD_LIBRARY_PATH

before you run /tmp/test-zh-en.
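If you launch the example from a Python script instead of a shell, the library path can be set only for the child process. The sketch below picks LD_LIBRARY_PATH on Linux and DYLD_LIBRARY_PATH on macOS, matching the two commands above; `env_with_library_path` is a hypothetical helper. Unlike the `export` lines, it does not leave a trailing separator when the variable was previously unset.

```python
import os
import sys


def env_with_library_path(lib_dir, platform=sys.platform, base_env=None):
    """Return a copy of the environment with lib_dir prepended to the loader path."""
    if platform.startswith("linux"):
        var = "LD_LIBRARY_PATH"
    elif platform == "darwin":
        var = "DYLD_LIBRARY_PATH"
    else:
        raise ValueError(f"Unsupported platform: {platform}")
    env = dict(os.environ if base_env is None else base_env)
    old = env.get(var)
    env[var] = lib_dir if not old else lib_dir + os.pathsep + old
    return env


# Usage:
#   import subprocess
#   subprocess.run(["./test-zh-en"], cwd="/tmp", check=True,
#                  env=env_with_library_path("/tmp/sherpa-onnx/shared/lib"))
```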

Please see the documentation at

https://k2-fsa.github.io/sherpa/onnx/c-api/index.html
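Every example on this page, including the C code above, passes the three rule FSTs as one comma-separated string in the order phone, date, number. If you assemble that string in your own code, a small helper keeps the order explicit and reports a missing file before the engine is created. This is a sketch; `build_rule_fsts` is a hypothetical name, and the only assumption about sherpa-onnx is the comma-separated format the examples already use.

```python
import os

# Same order as the examples on this page: phone, date, number.
DEFAULT_FSTS = ("phone-zh.fst", "date-zh.fst", "number-zh.fst")


def build_rule_fsts(model_dir, names=DEFAULT_FSTS, check=True):
    """Join FST paths under model_dir into a comma-separated rule_fsts string."""
    paths = [f"{model_dir}/{name}" for name in names]
    # Entries are separated by commas, so a path must not contain one.
    bad = [p for p in paths if "," in p]
    if bad:
        raise ValueError(f"Paths must not contain commas: {bad}")
    if check:
        missing = [p for p in paths if not os.path.isfile(p)]
        if missing:
            raise FileNotFoundError(f"Missing rule FSTs: {missing}")
    return ",".join(paths)


# Usage in the Python example:
#   rule_fsts=build_rule_fsts("matcha-icefall-zh-en")
```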

C++ API


You can use the following code to play with matcha-icefall-zh-en using the C++ API.

#include <cstdint>
#include <cstdio>
#include <string>

#include "sherpa-onnx/c-api/cxx-api.h"

static int32_t ProgressCallback(const float *samples, int32_t num_samples,
                                float progress, void *arg) {
  fprintf(stderr, "Progress: %.3f%%\n", progress * 100);
  // return 1 to continue generating
  // return 0 to stop generating
  return 1;
}

int32_t main(int32_t argc, char *argv[]) {
  using namespace sherpa_onnx::cxx; // NOLINT
  OfflineTtsConfig config;
  config.model.matcha.acoustic_model = "matcha-icefall-zh-en/model-steps-3.onnx";
  config.model.matcha.vocoder = "vocos-16khz-univ.onnx";
  config.model.matcha.lexicon = "matcha-icefall-zh-en/lexicon.txt";
  config.model.matcha.tokens = "matcha-icefall-zh-en/tokens.txt";
  config.model.matcha.data_dir = "matcha-icefall-zh-en/espeak-ng-data";
  config.model.num_threads = 1;

  // If you want to see debug messages, please set it to 1
  config.model.debug = 0;
  config.rule_fsts = "matcha-icefall-zh-en/phone-zh.fst,matcha-icefall-zh-en/date-zh.fst,matcha-icefall-zh-en/number-zh.fst";

  std::string filename = "./test.wav";
  std::string text = "我最近在学习machine learning,希望能够在未来的artificial intelligence领域有所建树。在这次vocation中,我们计划去Paris欣赏埃菲尔铁塔和卢浮宫的美景。某某银行的副行长和一些行政领导表示,他们去过长江和长白山; 经济不断增长。开始数字测试。2025年12月4号,拨打110或者189202512043。123456块钱。在这个快速发展的时代,人工智能技术正在改变我们的生活方式。语音合成作为人工智能的重要应用之一,让机器能够用自然流畅的语音与人类进行交流。";

  auto tts = OfflineTts::Create(config);

  GenerationConfig gen_cfg;
  gen_cfg.sid = 0;
  gen_cfg.speed = 1.0; // larger -> faster in speech speed

#if 0
  // If you don't want to use a callback, then please enable this branch
  GeneratedAudio audio = tts.Generate(text, gen_cfg);
#else
  GeneratedAudio audio = tts.Generate(text, gen_cfg, ProgressCallback);
#endif

  WriteWave(filename, {audio.samples, audio.sample_rate});

  fprintf(stderr, "Input text is: %s\n", text.c_str());
  fprintf(stderr, "Speaker ID is: %d\n", gen_cfg.sid);
  fprintf(stderr, "Saved to: %s\n", filename.c_str());

  return 0;
}

In the following, we describe how to compile and run the above C++ example.

cd /tmp
git clone https://github.com/k2-fsa/sherpa-onnx
cd sherpa-onnx
mkdir build-shared
cd build-shared

cmake \
 -DSHERPA_ONNX_ENABLE_C_API=ON \
 -DCMAKE_BUILD_TYPE=Release \
 -DBUILD_SHARED_LIBS=ON \
 -DCMAKE_INSTALL_PREFIX=/tmp/sherpa-onnx/shared \
 ..

make
make install

You can find the required header files and library files inside /tmp/sherpa-onnx/shared.

Assuming you have saved the above example as /tmp/test-zh-en.cc, you can compile it with the following command:

g++ \
  -std=c++17 \
  -I /tmp/sherpa-onnx/shared/include \
  -o /tmp/test-zh-en \
  /tmp/test-zh-en.cc \
  -L /tmp/sherpa-onnx/shared/lib \
  -lsherpa-onnx-cxx-api \
  -lsherpa-onnx-c-api \
  -lonnxruntime

Now you can run

cd /tmp

# Assume you have downloaded the acoustic model and the vocoder model into /tmp
./test-zh-en

You probably need to run

# For Linux
export LD_LIBRARY_PATH=/tmp/sherpa-onnx/shared/lib:$LD_LIBRARY_PATH

# For macOS
export DYLD_LIBRARY_PATH=/tmp/sherpa-onnx/shared/lib:$DYLD_LIBRARY_PATH

before you run /tmp/test-zh-en.

Please see the documentation at

https://k2-fsa.github.io/sherpa/onnx/c-api/index.html
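The `ProgressCallback` in the C++ example returns 1 to continue and 0 to stop, which can be used to cap how much audio is generated. The Python sketch below follows that contract. It assumes the Python `tts.generate` accepts a `callback` that is called as `callback(samples, progress)` and returns 1 or 0 like the C++ version; please verify this against the Python API documentation for your sherpa-onnx version. `make_capped_callback` is a hypothetical helper.

```python
def make_capped_callback(sample_rate: int, max_seconds: float):
    """Return (callback, chunks); the callback stops after max_seconds of audio."""
    chunks = []
    limit = int(sample_rate * max_seconds)

    def callback(samples, progress: float) -> int:
        chunks.append(list(samples))  # keep each generated chunk
        total = sum(len(c) for c in chunks)
        print(f"Progress: {progress * 100:.1f}%")
        # Same convention as ProgressCallback in C++: 1 = continue, 0 = stop.
        return 0 if total >= limit else 1

    return callback, chunks


# Usage (assumed signature, see above):
#   cb, chunks = make_capped_callback(16000, max_seconds=5.0)
#   audio = tts.generate(text, sid=0, speed=1.0, callback=cb)
```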

Rust API


You can use the following code to play with matcha-icefall-zh-en with the Rust API.

use sherpa_onnx::{
    GenerationConfig, OfflineTts, OfflineTtsConfig, OfflineTtsMatchaModelConfig,
};

fn main() {
    let config = OfflineTtsConfig {
        model: sherpa_onnx::OfflineTtsModelConfig {
            matcha: OfflineTtsMatchaModelConfig {
                acoustic_model: Some("matcha-icefall-zh-en/model-steps-3.onnx".into()),
                vocoder: Some("vocos-16khz-univ.onnx".into()),
                tokens: Some("matcha-icefall-zh-en/tokens.txt".into()),
                data_dir: Some("matcha-icefall-zh-en/espeak-ng-data".into()),
                lexicon: Some("matcha-icefall-zh-en/lexicon.txt".into()),
                ..Default::default()
            },
            num_threads: 2,
            debug: false,
            ..Default::default()
        },
        rule_fsts: Some("matcha-icefall-zh-en/phone-zh.fst,matcha-icefall-zh-en/date-zh.fst,matcha-icefall-zh-en/number-zh.fst".into()),
        ..Default::default()
    };

    let tts = OfflineTts::create(&config).expect("Failed to create OfflineTts");

    println!("Sample rate: {}", tts.sample_rate());
    println!("Num speakers: {}", tts.num_speakers());

    let text = "我最近在学习machine learning,希望能够在未来的artificial intelligence领域有所建树。在这次vocation中,我们计划去Paris欣赏埃菲尔铁塔和卢浮宫的美景。某某银行的副行长和一些行政领导表示,他们去过长江和长白山; 经济不断增长。开始数字测试。2025年12月4号,拨打110或者189202512043。123456块钱。在这个快速发展的时代,人工智能技术正在改变我们的生活方式。语音合成作为人工智能的重要应用之一,让机器能够用自然流畅的语音与人类进行交流。";

    let gen_config = GenerationConfig {
        sid: 0,
        speed: 1.0,
        ..Default::default()
    };

    let audio = tts
        .generate_with_config(
            text,
            &gen_config,
            Some(|_samples: &[f32], progress: f32| -> bool {
                println!("Progress: {:.1}%", progress * 100.0);
                true
            }),
        )
        .expect("Generation failed");

    let filename = "./test.wav";
    if audio.save(filename) {
        println!("Saved to: {}", filename);
    } else {
        eprintln!("Failed to save {}", filename);
    }
}

Please refer to the Rust API documentation for how to build and run the above Rust example.

Node.js (addon) API


You need to install the sherpa-onnx-node npm package first:

npm install sherpa-onnx-node

You can use the following code to play with matcha-icefall-zh-en with the Node.js addon API.

const sherpa_onnx = require('sherpa-onnx-node');

function createOfflineTts() {
  const config = {
    model: {
      matcha: {
        acousticModel: 'matcha-icefall-zh-en/model-steps-3.onnx',
        vocoder: 'vocos-16khz-univ.onnx',
        tokens: 'matcha-icefall-zh-en/tokens.txt',
        dataDir: 'matcha-icefall-zh-en/espeak-ng-data',
        lexicon: 'matcha-icefall-zh-en/lexicon.txt',
      },
      debug: true,
      numThreads: 1,
      provider: 'cpu',
    },
    maxNumSentences: 1,
    ruleFsts: 'matcha-icefall-zh-en/phone-zh.fst,matcha-icefall-zh-en/date-zh.fst,matcha-icefall-zh-en/number-zh.fst',
  };
  return new sherpa_onnx.OfflineTts(config);
}

const tts = createOfflineTts();

const text = '我最近在学习machine learning,希望能够在未来的artificial intelligence领域有所建树。在这次vocation中,我们计划去Paris欣赏埃菲尔铁塔和卢浮宫的美景。某某银行的副行长和一些行政领导表示,他们去过长江和长白山; 经济不断增长。开始数字测试。2025年12月4号,拨打110或者189202512043。123456块钱。在这个快速发展的时代,人工智能技术正在改变我们的生活方式。语音合成作为人工智能的重要应用之一,让机器能够用自然流畅的语音与人类进行交流。';

const generationConfig = new sherpa_onnx.GenerationConfig({
  sid: 0,
  speed: 1.0,
  silenceScale: 0.2,
});

const start = Date.now();
const audio = tts.generate({text, generationConfig});
const stop = Date.now();
const elapsed_seconds = (stop - start) / 1000;
const duration = audio.samples.length / audio.sampleRate;
const real_time_factor = elapsed_seconds / duration;
console.log('Wave duration', duration.toFixed(3), 'seconds');
console.log('Elapsed', elapsed_seconds.toFixed(3), 'seconds');
console.log(
    `RTF = ${elapsed_seconds.toFixed(3)}/${duration.toFixed(3)} =`,
    real_time_factor.toFixed(3));

const filename = 'test.wav';
sherpa_onnx.writeWave(
    filename, {samples: audio.samples, sampleRate: audio.sampleRate});

console.log(`Saved to ${filename}`);
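The Node.js example reports the real-time factor (RTF): the processing time divided by the duration of the generated audio, where duration = number of samples / sample rate. An RTF below 1 means synthesis runs faster than real time; for example, spending 2 s to produce 64000 samples at 16 kHz (4 s of audio) gives an RTF of 0.5. The same arithmetic in Python, with a hypothetical helper name:

```python
def real_time_factor(elapsed_seconds: float, num_samples: int, sample_rate: int) -> float:
    """RTF = elapsed / duration, where duration = num_samples / sample_rate."""
    if num_samples <= 0 or sample_rate <= 0:
        raise ValueError("need non-empty audio and a positive sample rate")
    duration = num_samples / sample_rate
    return elapsed_seconds / duration


# Usage with the Python example (requires `import time`):
#   start = time.perf_counter()
#   audio = tts.generate(text, sid=0, speed=1.0)
#   elapsed = time.perf_counter() - start
#   print(f"RTF = {real_time_factor(elapsed, len(audio.samples), audio.sample_rate):.3f}")
```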

Please refer to the Node.js addon API documentation for more details.

Dart API


You can use the following code to play with matcha-icefall-zh-en with the Dart API.

import 'package:sherpa_onnx/sherpa_onnx.dart' as sherpa_onnx;

void main() {
  final matcha = sherpa_onnx.OfflineTtsMatchaModelConfig(
    acousticModel: 'matcha-icefall-zh-en/model-steps-3.onnx',
    vocoder: 'vocos-16khz-univ.onnx',
    tokens: 'matcha-icefall-zh-en/tokens.txt',
    dataDir: 'matcha-icefall-zh-en/espeak-ng-data',
    lexicon: 'matcha-icefall-zh-en/lexicon.txt',
  );

  final modelConfig = sherpa_onnx.OfflineTtsModelConfig(
    matcha: matcha,
    numThreads: 1,
    debug: true,
  );
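  final config = sherpa_onnx.OfflineTtsConfig(
    model: modelConfig,
    ruleFsts: 'matcha-icefall-zh-en/phone-zh.fst,matcha-icefall-zh-en/date-zh.fst,matcha-icefall-zh-en/number-zh.fst',
    maxNumSenetences: 1,
  );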

  final tts = sherpa_onnx.OfflineTts(config);
  final genConfig = sherpa_onnx.OfflineTtsGenerationConfig(
    sid: 0,
    speed: 1.0,
    silenceScale: 0.2,
  );
  final audio = tts.generateWithConfig(text: '我最近在学习machine learning,希望能够在未来的artificial intelligence领域有所建树。在这次vocation中,我们计划去Paris欣赏埃菲尔铁塔和卢浮宫的美景。某某银行的副行长和一些行政领导表示,他们去过长江和长白山; 经济不断增长。开始数字测试。2025年12月4号,拨打110或者189202512043。123456块钱。在这个快速发展的时代,人工智能技术正在改变我们的生活方式。语音合成作为人工智能的重要应用之一,让机器能够用自然流畅的语音与人类进行交流。', config: genConfig);
  tts.free();

  sherpa_onnx.writeWave(
    filename: 'test.wav',
    samples: audio.samples,
    sampleRate: audio.sampleRate,
  );
  print('Saved to test.wav');
}

Please refer to the Dart API documentation for more details.

Swift API


You can use the following code to play with matcha-icefall-zh-en with the Swift API.

func run() {
  let matcha = sherpaOnnxOfflineTtsMatchaModelConfig(
    acousticModel: "matcha-icefall-zh-en/model-steps-3.onnx",
    vocoder: "vocos-16khz-univ.onnx",
    tokens: "matcha-icefall-zh-en/tokens.txt",
    dataDir: "matcha-icefall-zh-en/espeak-ng-data",
    lexicon: "matcha-icefall-zh-en/lexicon.txt"
  )
  let modelConfig = sherpaOnnxOfflineTtsModelConfig(matcha: matcha)
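  var ttsConfig = sherpaOnnxOfflineTtsConfig(
    model: modelConfig,
    ruleFsts: "matcha-icefall-zh-en/phone-zh.fst,matcha-icefall-zh-en/date-zh.fst,matcha-icefall-zh-en/number-zh.fst"
  )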

  let tts = SherpaOnnxOfflineTtsWrapper(config: &ttsConfig)

  let text = "我最近在学习machine learning,希望能够在未来的artificial intelligence领域有所建树。在这次vocation中,我们计划去Paris欣赏埃菲尔铁塔和卢浮宫的美景。某某银行的副行长和一些行政领导表示,他们去过长江和长白山; 经济不断增长。开始数字测试。2025年12月4号,拨打110或者189202512043。123456块钱。在这个快速发展的时代,人工智能技术正在改变我们的生活方式。语音合成作为人工智能的重要应用之一,让机器能够用自然流畅的语音与人类进行交流。"
  var genConfig = SherpaOnnxGenerationConfigSwift()
  genConfig.sid = 0
  genConfig.speed = 1.0
  genConfig.silenceScale = 0.2

  let audio = tts.generateWithConfig(text: text, config: genConfig, callback: nil, arg: nil)
  let filename = "test.wav"
  let ok = audio.save(filename: filename)
  if ok == 1 {
    print("Saved to \(filename)")
  } else {
    print("Failed to save \(filename)")
  }
}

@main
struct App {
  static func main() {
    run()
  }
}

Please refer to the Swift API documentation for more details.

C# API


You can use the following code to play with matcha-icefall-zh-en with the C# API.

using SherpaOnnx;

var config = new OfflineTtsConfig();
config.Model.Matcha.AcousticModel = "matcha-icefall-zh-en/model-steps-3.onnx";
config.Model.Matcha.Vocoder = "vocos-16khz-univ.onnx";
config.Model.Matcha.Tokens = "matcha-icefall-zh-en/tokens.txt";
config.Model.Matcha.DataDir = "matcha-icefall-zh-en/espeak-ng-data";
config.Model.Matcha.Lexicon = "matcha-icefall-zh-en/lexicon.txt";
config.Model.NumThreads = 1;
config.Model.Debug = 1;
config.Model.Provider = "cpu";
config.RuleFsts = "matcha-icefall-zh-en/phone-zh.fst,matcha-icefall-zh-en/date-zh.fst,matcha-icefall-zh-en/number-zh.fst";
config.MaxNumSentences = 1;

var tts = new OfflineTts(config);
var text = "我最近在学习machine learning,希望能够在未来的artificial intelligence领域有所建树。在这次vocation中,我们计划去Paris欣赏埃菲尔铁塔和卢浮宫的美景。某某银行的副行长和一些行政领导表示,他们去过长江和长白山; 经济不断增长。开始数字测试。2025年12月4号,拨打110或者189202512043。123456块钱。在这个快速发展的时代,人工智能技术正在改变我们的生活方式。语音合成作为人工智能的重要应用之一,让机器能够用自然流畅的语音与人类进行交流。";

OfflineTtsGenerationConfig genConfig = new OfflineTtsGenerationConfig();
genConfig.Sid = 0;
genConfig.Speed = 1.0f;
genConfig.SilenceScale = 0.2f;

var audio = tts.GenerateWithConfig(text, genConfig, null);
var ok = audio.SaveToWaveFile("./test.wav");

if (ok)
{
  Console.WriteLine("Saved to ./test.wav");
}
else
{
  Console.WriteLine("Failed to save ./test.wav");
}

Please refer to the C# API documentation for more details.

Kotlin API


You can use the following code to play with matcha-icefall-zh-en with the Kotlin API.

package com.k2fsa.sherpa.onnx

fun main() {
  val config = OfflineTtsConfig(
    model = OfflineTtsModelConfig(
      matcha = OfflineTtsMatchaModelConfig(
        acousticModel = "matcha-icefall-zh-en/model-steps-3.onnx",
        vocoder = "vocos-16khz-univ.onnx",
        tokens = "matcha-icefall-zh-en/tokens.txt",
        dataDir = "matcha-icefall-zh-en/espeak-ng-data",
        lexicon = "matcha-icefall-zh-en/lexicon.txt",
      ),
      numThreads = 1,
      debug = true,
    ),
    ruleFsts = "matcha-icefall-zh-en/phone-zh.fst,matcha-icefall-zh-en/date-zh.fst,matcha-icefall-zh-en/number-zh.fst",
  )
  val tts = OfflineTts(config = config)
  val genConfig = GenerationConfig(
    sid = 0,
    speed = 1.0f,
    silenceScale = 0.2f,
  )
  val audio = tts.generateWithConfigAndCallback(
    text = "我最近在学习machine learning,希望能够在未来的artificial intelligence领域有所建树。在这次vocation中,我们计划去Paris欣赏埃菲尔铁塔和卢浮宫的美景。某某银行的副行长和一些行政领导表示,他们去过长江和长白山; 经济不断增长。开始数字测试。2025年12月4号,拨打110或者189202512043。123456块钱。在这个快速发展的时代,人工智能技术正在改变我们的生活方式。语音合成作为人工智能的重要应用之一,让机器能够用自然流畅的语音与人类进行交流。",
    config = genConfig,
    callback = ::callback,
  )
  audio.save(filename = "test.wav")
  tts.release()
  println("Saved to test.wav")
}

fun callback(samples: FloatArray): Int {
  // 1 means to continue
  // 0 means to stop
  return 1
}

Please refer to the Kotlin API documentation for more details.

Java API


You can use the following code to play with matcha-icefall-zh-en with the Java API.

import com.k2fsa.sherpa.onnx.*;

public class TtsDemo {
  public static void main(String[] args) {
    var matcha = new OfflineTtsMatchaModelConfig();
    matcha.setAcousticModel("matcha-icefall-zh-en/model-steps-3.onnx");
    matcha.setVocoder("vocos-16khz-univ.onnx");
    matcha.setTokens("matcha-icefall-zh-en/tokens.txt");
    matcha.setDataDir("matcha-icefall-zh-en/espeak-ng-data");
    matcha.setLexicon("matcha-icefall-zh-en/lexicon.txt");

    var modelConfig = new OfflineTtsModelConfig();
    modelConfig.setMatcha(matcha);
    modelConfig.setNumThreads(1);
    modelConfig.setDebug(true);

    var config = new OfflineTtsConfig();
    config.setModel(modelConfig);
    config.setRuleFsts("matcha-icefall-zh-en/phone-zh.fst,matcha-icefall-zh-en/date-zh.fst,matcha-icefall-zh-en/number-zh.fst");
    config.setMaxNumSentences(1);

    var tts = new OfflineTts(config);
    var text = "我最近在学习machine learning,希望能够在未来的artificial intelligence领域有所建树。在这次vocation中,我们计划去Paris欣赏埃菲尔铁塔和卢浮宫的美景。某某银行的副行长和一些行政领导表示,他们去过长江和长白山; 经济不断增长。开始数字测试。2025年12月4号,拨打110或者189202512043。123456块钱。在这个快速发展的时代,人工智能技术正在改变我们的生活方式。语音合成作为人工智能的重要应用之一,让机器能够用自然流畅的语音与人类进行交流。";

    var genConfig = new GenerationConfig();
    genConfig.setSid(0);
    genConfig.setSpeed(1.0f);
    genConfig.setSilenceScale(0.2f);

    var audio = tts.generateWithConfigAndCallback(text, genConfig, (samples) -> {
      // 1 means to continue, 0 means to stop
      return 1;
    });

    audio.save("test.wav");
    tts.release();
    System.out.println("Saved to test.wav");
  }
}

Please refer to the Java API documentation for more details.

Pascal API


You can use the following code to play with matcha-icefall-zh-en with the Pascal API.

program test_matcha;

{$mode objfpc}

uses
  SysUtils,
  sherpa_onnx;

var
  Config: TSherpaOnnxOfflineTtsConfig;
  Tts: TSherpaOnnxOfflineTts;
  Audio: TSherpaOnnxGeneratedAudio;
  GenConfig: TSherpaOnnxGenerationConfig;

begin
  FillChar(Config, SizeOf(Config), 0);

  Config.Model.Matcha.AcousticModel := 'matcha-icefall-zh-en/model-steps-3.onnx';
  Config.Model.Matcha.Vocoder := 'vocos-16khz-univ.onnx';
  Config.Model.Matcha.Tokens := 'matcha-icefall-zh-en/tokens.txt';
  Config.Model.Matcha.DataDir := 'matcha-icefall-zh-en/espeak-ng-data';
  Config.Model.Matcha.Lexicon := 'matcha-icefall-zh-en/lexicon.txt';
  Config.Model.NumThreads := 1;
  Config.Model.Debug := True;
  Config.RuleFsts := 'matcha-icefall-zh-en/phone-zh.fst,matcha-icefall-zh-en/date-zh.fst,matcha-icefall-zh-en/number-zh.fst';
  Config.MaxNumSentences := 1;

  Tts := TSherpaOnnxOfflineTts.Create(@Config);

  GenConfig.Sid := 0;
  GenConfig.Speed := 1.0;
  GenConfig.SilenceScale := 0.2;

  Audio := Tts.GenerateWithConfig('我最近在学习machine learning,希望能够在未来的artificial intelligence领域有所建树。在这次vocation中,我们计划去Paris欣赏埃菲尔铁塔和卢浮宫的美景。某某银行的副行长和一些行政领导表示,他们去过长江和长白山; 经济不断增长。开始数字测试。2025年12月4号,拨打110或者189202512043。123456块钱。在这个快速发展的时代,人工智能技术正在改变我们的生活方式。语音合成作为人工智能的重要应用之一,让机器能够用自然流畅的语音与人类进行交流。', @GenConfig, nil);

  WriteWave('./test.wav', Audio.Samples, Audio.N, Audio.SampleRate);

  WriteLn('Saved to ./test.wav');

  Audio.Free;
  Tts.Free;
end.

Please refer to the Pascal API documentation for more details.

Go API


You can use the following code to play with matcha-icefall-zh-en with the Go API.

package main

import (
	"fmt"
	sherpa "github.com/k2-fsa/sherpa-onnx-go/sherpa_onnx"
)

func main() {
	config := sherpa.OfflineTtsConfig{
		Model: sherpa.OfflineTtsModelConfig{
			Matcha: sherpa.OfflineTtsMatchaModelConfig{
				AcousticModel: "matcha-icefall-zh-en/model-steps-3.onnx",
				Vocoder:       "vocos-16khz-univ.onnx",
				Tokens:        "matcha-icefall-zh-en/tokens.txt",
				DataDir:       "matcha-icefall-zh-en/espeak-ng-data",
				Lexicon:       "matcha-icefall-zh-en/lexicon.txt",
			},
			NumThreads: 1,
			Debug:      1, // 1 to print debug messages, 0 to disable
		},
		RuleFsts:        "matcha-icefall-zh-en/phone-zh.fst,matcha-icefall-zh-en/date-zh.fst,matcha-icefall-zh-en/number-zh.fst",
		MaxNumSentences: 1,
	}

	tts := sherpa.NewOfflineTts(&config)
	defer sherpa.DeleteOfflineTts(tts)

	text := "我最近在学习machine learning,希望能够在未来的artificial intelligence领域有所建树。在这次vocation中,我们计划去Paris欣赏埃菲尔铁塔和卢浮宫的美景。某某银行的副行长和一些行政领导表示,他们去过长江和长白山; 经济不断增长。开始数字测试。2025年12月4号,拨打110或者189202512043。123456块钱。在这个快速发展的时代,人工智能技术正在改变我们的生活方式。语音合成作为人工智能的重要应用之一,让机器能够用自然流畅的语音与人类进行交流。"

	genConfig := sherpa.GenerationConfig{
		Sid:          0,
		Speed:        1.0,
		SilenceScale: 0.2,
	}

	audio := tts.GenerateWithConfig(text, &genConfig, nil)

	filename := "./test.wav"
	sherpa.WriteWave(filename, audio.Samples, audio.SampleRate)

	fmt.Printf("Saved to %s\n", filename)
}

Please refer to the Go API documentation for more details.

Samples

For the following text:

我最近在学习machine learning,希望能够在未来的artificial intelligence领域有所建树。在这次vocation中,我们计划去Paris欣赏埃菲尔铁塔和卢浮宫的美景。某某银行的副行长和一些行政领导表示,他们去过长江和长白山; 经济不断增长。开始数字测试。2025年12月4号,拨打110或者189202512043。123456块钱。在这个快速发展的时代,人工智能技术正在改变我们的生活方式。语音合成作为人工智能的重要应用之一,让机器能够用自然流畅的语音与人类进行交流。

Sample audio for each speaker is listed below:

Speaker 0