Piper

In this section, we describe how to convert piper pre-trained models from https://huggingface.co/rhasspy/piper-voices.

Hint

You can find all of the converted models from piper at the following address:

If you want to convert your own pre-trained piper models or if you want to learn how the conversion works, please read on.

Otherwise, you only need to download the converted models from the above link.

Note that piper provides pre-trained models for over 30 languages. All models share the same conversion procedure, so we use an American English model as an example in this section.

Install dependencies

pip install onnx onnxruntime

Hint

We suggest that you always use the latest version of onnxruntime.

Find the pre-trained model from piper

All American English models from piper can be found at https://huggingface.co/rhasspy/piper-voices/tree/main/en/en_US.

We use https://huggingface.co/rhasspy/piper-voices/tree/main/en/en_US/amy/low as an example in this section.

Download the pre-trained model

We need to download two files for each model:

wget https://huggingface.co/rhasspy/piper-voices/resolve/main/en/en_US/amy/low/en_US-amy-low.onnx
wget https://huggingface.co/rhasspy/piper-voices/resolve/main/en/en_US/amy/low/en_US-amy-low.onnx.json
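The downloaded .onnx.json file is a plain JSON config; the conversion script in the next step reads a handful of its fields. Below is a trimmed, hypothetical illustration of the relevant structure (the real file contains many more entries, and the values here are made up):

```python
import json

# A trimmed, made-up example of the fields the conversion script uses.
config_text = """
{
  "audio": {"sample_rate": 16000},
  "espeak": {"voice": "en-us"},
  "language": {"name_english": "English"},
  "num_speakers": 1,
  "phoneme_id_map": {"_": [0], "^": [1], "$": [2]}
}
"""
config = json.loads(config_text)

print(config["audio"]["sample_rate"])  # sample rate written to the model metadata
print(config["espeak"]["voice"])       # espeak voice name, e.g., en-us
```

Note that each entry in phoneme_id_map maps a symbol to a list of ids; the script below uses only the first id of each list.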

Add metadata to the onnx model

Use the following script to add metadata to the downloaded onnx model.

#!/usr/bin/env python3

import json
import os
from typing import Any, Dict

import onnx


def add_meta_data(filename: str, meta_data: Dict[str, Any]):
    """Add meta data to an ONNX model. It is changed in-place.

    Args:
      filename:
        Filename of the ONNX model to be changed.
      meta_data:
        Key-value pairs.
    """
    model = onnx.load(filename)
    for key, value in meta_data.items():
        meta = model.metadata_props.add()
        meta.key = key
        meta.value = str(value)

    onnx.save(model, filename)


def load_config(model):
    with open(f"{model}.json", "r") as file:
        config = json.load(file)
    return config


def generate_tokens(config):
    id_map = config["phoneme_id_map"]
    with open("tokens.txt", "w", encoding="utf-8") as f:
        for s, i in id_map.items():
            f.write(f"{s} {i[0]}\n")
    print("Generated tokens.txt")


def main():
    # Caution: Please change the filename
    filename = "en_US-amy-low.onnx"

    # The rest of the file should not be changed.
    # You only need to change the above filename = "xxx.onnx" in this file

    config = load_config(filename)

    print("generate tokens")
    generate_tokens(config)

    print("add model metadata")
    meta_data = {
        "model_type": "vits",
        "comment": "piper",  # must be piper for models from piper
        "language": config["language"]["name_english"],
        "voice": config["espeak"]["voice"],  # e.g., en-us
        "has_espeak": 1,
        "n_speakers": config["num_speakers"],
        "sample_rate": config["audio"]["sample_rate"],
    }
    print(meta_data)
    add_meta_data(filename, meta_data)


if __name__ == "__main__":
    main()

After running the above script, your en_US-amy-low.onnx is updated with metadata, and a new file tokens.txt is generated.

From now on, you no longer need the config file en_US-amy-low.onnx.json.
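The format of tokens.txt is one "symbol id" pair per line. Here is the same logic as generate_tokens applied to a toy phoneme_id_map (the symbols and ids are illustrative, not from a real model):

```python
# Toy phoneme_id_map in the same shape as the one in the piper config:
# each symbol maps to a list of ids, and only the first id is used.
id_map = {"_": [0], "^": [1], "$": [2], "a": [3]}

lines = [f"{s} {i[0]}" for s, i in id_map.items()]
print("\n".join(lines))
```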

Download espeak-ng-data

wget https://github.com/k2-fsa/sherpa-onnx/releases/download/tts-models/espeak-ng-data.tar.bz2
tar xf espeak-ng-data.tar.bz2

Note that espeak-ng-data.tar.bz2 is shared by all models from piper, no matter which language you are using for your model.

Test your converted model

For a quick test of your converted model, install sherpa-onnx via

pip install sherpa-onnx

and then run the following commands:

# The command "pip install sherpa-onnx" will install several binaries,
# including the following one

which sherpa-onnx-offline-tts

sherpa-onnx-offline-tts \
  --vits-model=./en_US-amy-low.onnx \
  --vits-tokens=./tokens.txt \
  --vits-data-dir=./espeak-ng-data \
  --output-filename=./test.wav \
  "How are you doing? This is a text-to-speech application using next generation Kaldi."

The above command should generate a wave file test.wav.

The generated test.wav should contain the spoken text: "How are you doing? This is a text-to-speech application using next generation Kaldi."
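You can sanity-check the generated wave file with Python's built-in wave module. The sketch below writes a short dummy tone to an in-memory buffer so it runs without the TTS output; point wave.open at ./test.wav instead to inspect the real file (the 16000 Hz value is illustrative; compare against the sample_rate in your model's metadata):

```python
import io
import math
import struct
import wave

# Write a short 440 Hz tone to an in-memory wav as a stand-in for test.wav.
sample_rate = 16000
buf = io.BytesIO()
with wave.open(buf, "wb") as w:
    w.setnchannels(1)        # mono
    w.setsampwidth(2)        # 16-bit samples
    w.setframerate(sample_rate)
    n = sample_rate // 10    # 0.1 seconds of audio
    samples = [
        int(3276 * math.sin(2 * math.pi * 440 * t / sample_rate))
        for t in range(n)
    ]
    w.writeframes(struct.pack(f"<{n}h", *samples))

# Read the header back; do the same with wave.open("./test.wav", "rb").
buf.seek(0)
with wave.open(buf, "rb") as w:
    channels = w.getnchannels()
    rate = w.getframerate()
    duration = w.getnframes() / rate

print(channels, rate, duration)
```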

Congratulations! You have successfully converted a model from piper and run it with sherpa-onnx.