Matcha
This page lists pre-trained models using Matcha-TTS.
Caution
Models are from icefall.
We don’t support models from https://github.com/shivammehta25/Matcha-TTS.
matcha-icefall-en_US-ljspeech (American English, 1 female speaker)
This model is trained using
The dataset used to train the model is from
In the following, we describe how to download it and use it with sherpa-onnx.
Download the model
Please use the following commands to download it.
cd /path/to/sherpa-onnx
wget https://github.com/k2-fsa/sherpa-onnx/releases/download/tts-models/matcha-icefall-en_US-ljspeech.tar.bz2
tar xvf matcha-icefall-en_US-ljspeech.tar.bz2
rm matcha-icefall-en_US-ljspeech.tar.bz2
wget https://github.com/k2-fsa/sherpa-onnx/releases/download/vocoder-models/hifigan_v2.onnx
Caution
Remember to also download the vocoder model. We use hifigan_v2 in the example. You can also select hifigan_v1 or hifigan_v3.
Please check that the file sizes of the pre-trained models are correct. See
the file sizes of *.onnx
files below.
ls -lh matcha-icefall-en_US-ljspeech/
total 144856
-rw-r--r-- 1 fangjun staff 251B Jan 2 11:05 README.md
drwxr-xr-x 122 fangjun staff 3.8K Nov 28 2023 espeak-ng-data
-rw-r--r--@ 1 fangjun staff 71M Jan 2 04:04 model-steps-3.onnx
-rw-r--r-- 1 fangjun staff 954B Jan 2 11:05 tokens.txt
ls -lh hifigan_v2.onnx
-rw-r--r-- 1 fangjun staff 3.6M Dec 30 17:10 hifigan_v2.onnx
Generate speech with executables compiled from C++
cd /path/to/sherpa-onnx
./build/bin/sherpa-onnx-offline-tts \
--matcha-acoustic-model=./matcha-icefall-en_US-ljspeech/model-steps-3.onnx \
--matcha-vocoder=./hifigan_v2.onnx \
--matcha-tokens=./matcha-icefall-en_US-ljspeech/tokens.txt \
--matcha-data-dir=./matcha-icefall-en_US-ljspeech/espeak-ng-data \
--num-threads=2 \
--output-filename=./matcha-ljspeech-0.wav \
--debug=1 \
"Today as always, men fall into two groups: slaves and free men. Whoever does not have two-thirds of his day for himself, is a slave, whatever he may be: a statesman, a businessman, an official, or a scholar."
After running, it will generate a file matcha-ljspeech-0.wav
in the
current directory.
soxi ./matcha-ljspeech-0.wav
Input File : './matcha-ljspeech-0.wav'
Channels : 1
Sample Rate : 22050
Precision : 16-bit
Duration : 00:00:15.06 = 332032 samples ~ 1129.36 CDDA sectors
File Size : 664k
Bit Rate : 353k
Sample Encoding: 16-bit Signed Integer PCM
Wave filename | Content | Text |
---|---|---|
matcha-ljspeech-0.wav | "Today as always, men fall into two groups: slaves and free men. Whoever does not have two-thirds of his day for himself, is a slave, whatever he may be: a statesman, a businessman, an official, or a scholar." |
Generate speech with Python script
cd /path/to/sherpa-onnx
python3 ./python-api-examples/offline-tts.py \
--matcha-acoustic-model=./matcha-icefall-en_US-ljspeech/model-steps-3.onnx \
--matcha-vocoder=./hifigan_v2.onnx \
--matcha-tokens=./matcha-icefall-en_US-ljspeech/tokens.txt \
--matcha-data-dir=./matcha-icefall-en_US-ljspeech/espeak-ng-data \
--num-threads=2 \
--output-filename=./matcha-ljspeech-1.wav \
--debug=1 \
"Friends fell out often because life was changing so fast. The easiest thing in the world was to lose touch with someone."
soxi ./matcha-ljspeech-1.wav
Input File : './matcha-ljspeech-1.wav'
Channels : 1
Sample Rate : 22050
Precision : 16-bit
Duration : 00:00:07.92 = 174592 samples ~ 593.85 CDDA sectors
File Size : 349k
Bit Rate : 353k
Sample Encoding: 16-bit Signed Integer PCM
Wave filename | Content | Text |
---|---|---|
matcha-ljspeech-1.wav | "Friends fell out often because life was changing so fast. The easiest thing in the world was to lose touch with someone." |
matcha-icefall-zh-baker (Chinese, 1 female speaker)
This model is trained using
The dataset used to train the model is from
Caution
The dataset is for non-commercial
use only.
In the following, we describe how to download it and use it with sherpa-onnx.
Download the model
Please use the following commands to download it.
cd /path/to/sherpa-onnx
wget https://github.com/k2-fsa/sherpa-onnx/releases/download/tts-models/matcha-icefall-zh-baker.tar.bz2
tar xvf matcha-icefall-zh-baker.tar.bz2
rm matcha-icefall-zh-baker.tar.bz2
wget https://github.com/k2-fsa/sherpa-onnx/releases/download/vocoder-models/hifigan_v2.onnx
Caution
Remember to also download the vocoder model. We use hifigan_v2 in the example. You can also select hifigan_v1 or hifigan_v3.
Please check that the file sizes of the pre-trained models are correct. See
the file sizes of *.onnx
files below.
ls -lh matcha-icefall-zh-baker/
total 167344
-rw-r--r-- 1 fangjun staff 370B Dec 31 14:51 README.md
-rw-r--r-- 1 fangjun staff 58K Dec 31 14:51 date.fst
drwxr-xr-x 9 fangjun staff 288B Apr 19 2024 dict
-rw-r--r-- 1 fangjun staff 1.3M Dec 31 14:51 lexicon.txt
-rw-r--r-- 1 fangjun staff 72M Dec 31 14:51 model-steps-3.onnx
-rw-r--r-- 1 fangjun staff 63K Dec 31 14:51 number.fst
-rw-r--r-- 1 fangjun staff 87K Dec 31 14:51 phone.fst
-rw-r--r-- 1 fangjun staff 19K Dec 31 14:51 tokens.txt
ls -lh hifigan_v2.onnx
-rw-r--r-- 1 fangjun staff 3.6M Dec 30 17:10 hifigan_v2.onnx
Generate speech with executables compiled from C++
cd /path/to/sherpa-onnx
./build/bin/sherpa-onnx-offline-tts \
--matcha-acoustic-model=./matcha-icefall-zh-baker/model-steps-3.onnx \
--matcha-vocoder=./hifigan_v2.onnx \
--matcha-lexicon=./matcha-icefall-zh-baker/lexicon.txt \
--matcha-tokens=./matcha-icefall-zh-baker/tokens.txt \
--matcha-dict-dir=./matcha-icefall-zh-baker/dict \
--num-threads=2 \
--output-filename=./matcha-baker-0.wav \
--debug=1 \
"当夜幕降临,星光点点,伴随着微风拂面,我在静谧中感受着时光的流转,思念如涟漪荡漾,梦境如画卷展开,我与自然融为一体,沉静在这片宁静的美丽之中,感受着生命的奇迹与温柔."
./build/bin/sherpa-onnx-offline-tts \
--matcha-acoustic-model=./matcha-icefall-zh-baker/model-steps-3.onnx \
--matcha-vocoder=./hifigan_v2.onnx \
--matcha-lexicon=./matcha-icefall-zh-baker/lexicon.txt \
--matcha-tokens=./matcha-icefall-zh-baker/tokens.txt \
--tts-rule-fsts=./matcha-icefall-zh-baker/phone.fst,./matcha-icefall-zh-baker/date.fst,./matcha-icefall-zh-baker/number.fst \
--matcha-dict-dir=./matcha-icefall-zh-baker/dict \
--output-filename=./matcha-baker-1.wav \
"某某银行的副行长和一些行政领导表示,他们去过长江和长白山; 经济不断增长。2024年12月31号,拨打110或者18920240511。123456块钱。"
After running, it will generate two files, matcha-baker-0.wav
and
matcha-baker-1.wav
, in the current directory.
soxi matcha-baker-*.wav
Input File : 'matcha-baker-0.wav'
Channels : 1
Sample Rate : 22050
Precision : 16-bit
Duration : 00:00:22.65 = 499456 samples ~ 1698.83 CDDA sectors
File Size : 999k
Bit Rate : 353k
Sample Encoding: 16-bit Signed Integer PCM
Input File : 'matcha-baker-1.wav'
Channels : 1
Sample Rate : 22050
Precision : 16-bit
Duration : 00:00:22.65 = 499456 samples ~ 1698.83 CDDA sectors
File Size : 999k
Bit Rate : 353k
Sample Encoding: 16-bit Signed Integer PCM
Total Duration of 2 files: 00:00:45.30
Wave filename | Content | Text |
---|---|---|
matcha-baker-0.wav | "当夜幕降临,星光点点,伴随着微风拂面,我在静谧中感受着时光的流转,思念如涟漪荡漾,梦境如画卷展开,我与自然融为一体,沉静在这片宁静的美丽之中,感受着生命的奇迹与温柔." | |
matcha-baker-1.wav | "某某银行的副行长和一些行政领导表示,他们去过长江和长白山; 经济不断增长。2024年12月31号,拨打110或者18920240511。123456块钱。" |
Generate speech with Python script
cd /path/to/sherpa-onnx
python3 ./python-api-examples/offline-tts.py \
--matcha-acoustic-model=./matcha-icefall-zh-baker/model-steps-3.onnx \
--matcha-vocoder=./hifigan_v2.onnx \
--matcha-lexicon=./matcha-icefall-zh-baker/lexicon.txt \
--matcha-tokens=./matcha-icefall-zh-baker/tokens.txt \
--tts-rule-fsts=./matcha-icefall-zh-baker/phone.fst,./matcha-icefall-zh-baker/date.fst,./matcha-icefall-zh-baker/number.fst \
--matcha-dict-dir=./matcha-icefall-zh-baker/dict \
--output-filename=./matcha-baker-2.wav \
--debug=1 \
"三百六十行,行行出状元。你行的!明天就是 2025年1月1号啦!银行卡被卡住了,你帮个忙,行不行?"
After running, it will generate a file matcha-baker-zh-2.wav
in the current directory.
soxi matcha-baker-2.wav
Input File : 'matcha-baker-2.wav'
Channels : 1
Sample Rate : 22050
Precision : 16-bit
Duration : 00:00:12.71 = 280320 samples ~ 953.469 CDDA sectors
File Size : 561k
Bit Rate : 353k
Sample Encoding: 16-bit Signed Integer PCM
Wave filename | Content | Text |
---|---|---|
matcha-baker-2.wav | "三百六十行,行行出状元。你行的!明天就是 2025年1月1号啦!银行卡被卡住了,你帮个忙,行不行?" |
RTF on Raspberry Pi 4 Model B Rev 1.5
We use the following command to test the RTF of this model on Raspberry Pi 4 Model B Rev 1.5:
for t in 1 2 3 4; do
build/bin/sherpa-onnx-offline-tts \
--num-threads=$t \
--matcha-acoustic-model=./matcha-icefall-zh-baker/model-steps-3.onnx \
--matcha-vocoder=./hifigan_v2.onnx \
--matcha-lexicon=./matcha-icefall-zh-baker/lexicon.txt \
--matcha-tokens=./matcha-icefall-zh-baker/tokens.txt \
--matcha-dict-dir=./matcha-icefall-zh-baker/dict \
--output-filename=./matcha-baker-0.wav \
"当夜幕降临,星光点点,伴随着微风拂面,我在静谧中感受着时光的流转,思念如涟漪荡漾,梦境如画卷展开,我与自然融为一体,沉静在这片宁静的美丽之中,感受 着生命的奇迹与温柔."
done
The results are given below:
num_threads
1
2
3
4
RTF
0.892
0.536
0.432
0.391