This tutorial shows you how to train a VITS model with the VCTK dataset.


TTS-related recipes require the packages listed in requirements-tts.txt.

Data preparation

$ cd egs/vctk/TTS
$ ./

To run stage 1 to stage 6, use

$ ./ --stage 1 --stop_stage 6
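
Recipe scripts of this kind commonly run a numbered step only when its index lies between --stage and --stop_stage, inclusive. The sketch below mimics that gating under this assumption; stages_to_run is a hypothetical helper, the upper bound 8 is arbitrary, and the actual script may behave differently.

```shell
# Hypothetical helper: print the stage indices that would run, assuming
# the common "stage <= i <= stop_stage" gating (not taken from the recipe).
stages_to_run() {
  stage="$1"        # value passed as --stage
  stop_stage="$2"   # value passed as --stop_stage
  total="$3"        # highest stage index to consider (arbitrary here)
  i=0
  selected=""
  while [ "$i" -le "$total" ]; do
    if [ "$stage" -le "$i" ] && [ "$stop_stage" -ge "$i" ]; then
      # Append the index, separated by a space after the first entry.
      selected="${selected}${selected:+ }$i"
    fi
    i=$((i + 1))
  done
  echo "$selected"
}

stages_to_run 1 6 8
```

With --stage 1 --stop_stage 6, stage 0 and any stage above 6 would be skipped under this convention.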


Training

$ export CUDA_VISIBLE_DEVICES="0,1,2,3"
$ ./vits/ \
    --world-size 4 \
    --num-epochs 1000 \
    --start-epoch 1 \
    --use-fp16 1 \
    --exp-dir vits/exp \
    --tokens data/tokens.txt \
    --max-duration 350
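
In the command above, --world-size 4 lines up with the four GPU ids listed in CUDA_VISIBLE_DEVICES. As a minimal sketch, assuming one training process per visible GPU, the value can be derived instead of hard-coded; count_visible_gpus is a hypothetical helper and not part of the recipe.

```shell
# Hypothetical helper: count the comma-separated ids in a device list.
# Falls back to $CUDA_VISIBLE_DEVICES when no argument is given.
count_visible_gpus() {
  devices="${1:-${CUDA_VISIBLE_DEVICES:-}}"
  if [ -z "$devices" ]; then
    echo 0
    return
  fi
  # Keep only the commas; the number of ids is the number of commas plus one.
  commas=$(printf '%s' "$devices" | tr -cd ',')
  echo $(( ${#commas} + 1 ))
}

world_size=$(count_visible_gpus "0,1,2,3")
echo "world size: $world_size"
```

The result could then be passed as --world-size "$world_size" so the two settings cannot drift apart.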


You can adjust the hyper-parameters to control the size of the VITS model and the training configurations. For more details, please run ./vits/ --help.


The training can take a long time (usually a couple of days).

Training logs, checkpoints and tensorboard logs are saved in vits/exp.

Inference


The inference part uses checkpoints saved by the training part, so you have to run the training part first. Inference saves the ground-truth and generated wavs to the directory vits/exp/infer/epoch-*/wav, e.g., vits/exp/infer/epoch-1000/wav.

$ ./vits/ \
    --epoch 1000 \
    --exp-dir vits/exp \
    --tokens data/tokens.txt \
    --max-duration 500


For more details, please run ./vits/ --help.
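
To check what inference produced, the output directory can be built from the experiment directory and the epoch, following the vits/exp/infer/epoch-*/wav layout described above. This is only a sketch; wav_dir_for_epoch is a hypothetical helper, and the .wav file extension is an assumption.

```shell
# Hypothetical helper: build the inference output directory for an epoch,
# using the layout <exp-dir>/infer/epoch-<N>/wav described in the text.
wav_dir_for_epoch() {
  exp_dir="$1"
  epoch="$2"
  echo "${exp_dir}/infer/epoch-${epoch}/wav"
}

wav_dir=$(wav_dir_for_epoch vits/exp 1000)
if [ -d "$wav_dir" ]; then
  # Count the wav files found there (assumes a .wav extension).
  find "$wav_dir" -name '*.wav' | wc -l
else
  echo "no inference output at $wav_dir yet"
fi
```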

Export models

Currently, only ONNX export is supported. Exporting generates two files in the given exp-dir: vits-epoch-*.onnx and vits-epoch-*.int8.onnx.

$ ./vits/ \
    --epoch 1000 \
    --exp-dir vits/exp \
    --tokens data/tokens.txt
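
After the export finishes, a quick check can confirm that both files exist. The sketch below assumes the file names follow the vits-epoch-*.onnx and vits-epoch-*.int8.onnx patterns with the epoch number substituted; check_onnx_export is a hypothetical helper, not part of the recipe.

```shell
# Hypothetical helper: report whether both exported files exist for an epoch.
# Returns 0 if both are present, 1 otherwise.
check_onnx_export() {
  exp_dir="$1"
  epoch="$2"
  status=0
  for f in "${exp_dir}/vits-epoch-${epoch}.onnx" \
           "${exp_dir}/vits-epoch-${epoch}.int8.onnx"; do
    if [ -f "$f" ]; then
      echo "found:   $f"
    else
      echo "missing: $f"
      status=1
    fi
  done
  return "$status"
}

check_onnx_export vits/exp 1000 || echo "export incomplete"
```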

You can test the exported ONNX model with:

$ ./vits/ \
    --model-filename vits/exp/vits-epoch-1000.onnx \
    --tokens data/tokens.txt

Download pretrained models

If you don’t want to train from scratch, you can download the pretrained models by visiting the following link: