On-device text-to-speech (TTS)

This page describes how to build SherpaOnnxTts for on-device text-to-speech that runs on HarmonyOS.

Open the project with DevEco Studio

First, download the code:

# Assume we place it inside /Users/fangjun/open-source
# You can place it anywhere you like.

cd /Users/fangjun/open-source/

git clone https://github.com/k2-fsa/sherpa-onnx
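
Before opening DevEco Studio, it may be worth confirming that the checkout really contains the project folder selected in Step 2. The following is a minimal sketch, assuming a POSIX shell; `check_project` is a hypothetical helper written for this page, not something shipped with sherpa-onnx.

```shell
# Hypothetical helper (not part of sherpa-onnx): confirm that a checkout
# contains the HarmonyOS TTS project that DevEco Studio will open.
check_project() {
  repo_root="$1"
  project_dir="$repo_root/harmony-os/SherpaOnnxTts"
  if [ -d "$project_dir" ]; then
    echo "found: $project_dir"
  else
    echo "missing: $project_dir" >&2
    return 1
  fi
}

# Example call, using the location assumed on this page:
# check_project /Users/fangjun/open-source/sherpa-onnx
```

A non-zero return status means the folder is not there, for example because the clone was interrupted or placed somewhere else.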

Then start DevEco Studio and follow the screenshots below:

Screenshot of starting DevEco

Fig. 57 Step 1: Click Open

Screenshot of selecting SherpaOnnxTts to open

Fig. 58 Step 2: Select SherpaOnnxTts inside the harmony-os folder and click Open

Screenshot of checking the version

Fig. 59 Step 3: Check that it is using the latest version. You can visit sherpa_onnx to check available versions.

Select a text-to-speech model

The code supports hundreds of text-to-speech models from the tts-models release of sherpa-onnx (the download URLs used below all point to it), and we have to modify the code to use the model that we choose.

Hint

You can try all of the above models at the following huggingface space:

We give two examples below, showing how to use the following two models: vits-melo-tts-zh_en and vits-piper-en_US-libritts_r-medium.

Use vits-melo-tts-zh_en

First, we download and unzip the model.

Caution: The model MUST be placed inside the directory rawfile.

cd /Users/fangjun/open-source/sherpa-onnx/harmony-os/SherpaOnnxTts/entry/src/main/resources/rawfile
wget https://github.com/k2-fsa/sherpa-onnx/releases/download/tts-models/vits-melo-tts-zh_en.tar.bz2
tar xvf vits-melo-tts-zh_en.tar.bz2
rm vits-melo-tts-zh_en.tar.bz2

# Now remove extra files to save space
rm vits-melo-tts-zh_en/model.int8.onnx
rm vits-melo-tts-zh_en/new_heteronym.fst

Please check that your directory contains the same files as the following listing (timestamps and owner will differ):

(py38) fangjuns-MacBook-Pro:rawfile fangjun$ pwd
/Users/fangjun/open-source/sherpa-onnx/harmony-os/SherpaOnnxTts/entry/src/main/resources/rawfile
(py38) fangjuns-MacBook-Pro:rawfile fangjun$ ls
vits-melo-tts-zh_en
(py38) fangjuns-MacBook-Pro:rawfile fangjun$ ls -lh vits-melo-tts-zh_en/
total 346848
-rw-r--r--  1 fangjun  staff   1.0K Aug  3 11:11 LICENSE
-rw-r--r--  1 fangjun  staff   156B Aug  3 11:11 README.md
-rw-r--r--  1 fangjun  staff    58K Aug  3 11:11 date.fst
drwxr-xr-x  9 fangjun  staff   288B Apr 19  2024 dict
-rw-r--r--  1 fangjun  staff   6.5M Sep 27 14:19 lexicon.txt
-rw-r--r--  1 fangjun  staff   163M Aug  3 11:11 model.onnx
-rw-r--r--  1 fangjun  staff    63K Aug  3 11:11 number.fst
-rw-r--r--  1 fangjun  staff    87K Aug  3 11:11 phone.fst
-rw-r--r--  1 fangjun  staff   655B Aug  3 11:11 tokens.txt
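
The same check can be scripted instead of comparing the listing by eye. The following is a minimal sketch, assuming a POSIX shell; `check_model_dir` is a hypothetical helper, and the list of entries passed to it is simply copied from the listing above.

```shell
# Hypothetical helper (not part of sherpa-onnx): report which of the expected
# entries are missing from a model directory.
# Usage: check_model_dir <model-dir> <entry>...
check_model_dir() {
  model_dir="$1"
  shift
  missing=0
  for entry in "$@"; do
    # -e accepts both regular files (model.onnx) and directories (dict)
    if [ ! -e "$model_dir/$entry" ]; then
      echo "missing: $model_dir/$entry" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Run from inside the rawfile directory, with the entries shown above:
# check_model_dir vits-melo-tts-zh_en \
#   model.onnx lexicon.txt tokens.txt dict date.fst number.fst phone.fst
```

The function prints one line per missing entry and returns a non-zero status if anything is absent, so it can also be used in a script before building.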

Now you should see the following inside DevEco Studio:

Screenshot of vits-melo-tts-zh_en inside rawfile

Fig. 60 Step 4: Check the model directory inside the rawfile directory.

Now it is time to modify the code to use our model.

We need to change NonStreamingTtsWorker.ets.

Screenshot of changing code for vits-melo-tts-zh_en

Fig. 61 Step 5: Change the code to use our selected model

Finally, we can build the project. See the screenshot below:

Screenshot of building the project

Fig. 62 Step 6: Build the project

If you have an emulator, you can now start it.

Screenshot of selecting device manager

Fig. 63 Step 7: Select the device manager

Screenshot of starting the emulator

Fig. 64 Step 8: Start the emulator

After the emulator is started, follow the screenshot below to run the app on the emulator:

Screenshot of starting the app on the emulator

Fig. 65 Step 9: Start the app on the emulator

You should see something like the following:

Screenshot of app running on the emulator

Fig. 66 Step 10: The app is running on the emulator

Congratulations!

You have successfully run an on-device text-to-speech app on HarmonyOS!

Use vits-piper-en_US-libritts_r-medium

First, we download and unzip the model.

Caution: The model MUST be placed inside the directory rawfile.

cd /Users/fangjun/open-source/sherpa-onnx/harmony-os/SherpaOnnxTts/entry/src/main/resources/rawfile
wget https://github.com/k2-fsa/sherpa-onnx/releases/download/tts-models/vits-piper-en_US-libritts_r-medium.tar.bz2
tar xvf vits-piper-en_US-libritts_r-medium.tar.bz2
rm vits-piper-en_US-libritts_r-medium.tar.bz2

Please check that your directory contains the same files as the following listing (timestamps and owner will differ):

(py38) fangjuns-MacBook-Pro:rawfile fangjun$ pwd
/Users/fangjun/open-source/sherpa-onnx/harmony-os/SherpaOnnxTts/entry/src/main/resources/rawfile
(py38) fangjuns-MacBook-Pro:rawfile fangjun$ ls
vits-piper-en_US-libritts_r-medium
(py38) fangjuns-MacBook-Pro:rawfile fangjun$ ls -lh vits-piper-en_US-libritts_r-medium/
total 153552
-rw-r--r--    1 fangjun  staff   279B Nov 29  2023 MODEL_CARD
-rw-r--r--    1 fangjun  staff    75M Nov 29  2023 en_US-libritts_r-medium.onnx
-rw-r--r--    1 fangjun  staff    20K Nov 29  2023 en_US-libritts_r-medium.onnx.json
drwxr-xr-x  122 fangjun  staff   3.8K Nov 28  2023 espeak-ng-data
-rw-r--r--    1 fangjun  staff   954B Nov 29  2023 tokens.txt
-rwxr-xr-x    1 fangjun  staff   1.8K Nov 29  2023 vits-piper-en_US.py
-rwxr-xr-x    1 fangjun  staff   730B Nov 29  2023 vits-piper-en_US.sh
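
Because the code change in the next step has to point at files in this directory, it can help to print the exact model file name rather than retype it. The following is a minimal sketch, assuming a POSIX shell; `list_onnx_files` is a hypothetical helper, not part of sherpa-onnx.

```shell
# Hypothetical helper: print the base name of every *.onnx file directly
# inside a model directory (the .onnx.json config file is not matched).
list_onnx_files() {
  model_dir="$1"
  for path in "$model_dir"/*.onnx; do
    # If nothing matches, the unexpanded pattern is not a file and is skipped
    if [ -f "$path" ]; then
      basename "$path"
    fi
  done
}

# Run from inside the rawfile directory:
# list_onnx_files vits-piper-en_US-libritts_r-medium
```

For the directory listed above, this prints en_US-libritts_r-medium.onnx, the only file there that ends in .onnx.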

Now you should see the following inside DevEco Studio:

Screenshot of vits-piper-en_US-libritts_r-medium inside rawfile

Fig. 67 Step 4: Check the model directory inside the rawfile directory.

Now it is time to modify the code to use our model.

We need to change NonStreamingTtsWorker.ets.

Screenshot of changing code for vits-piper-en_US-libritts_r-medium

Fig. 68 Step 5: Change the code to use our selected model

Finally, we can build the project. See the screenshot below:

Screenshot of building the project

Fig. 69 Step 6: Build the project

If you have an emulator, you can now start it.

Screenshot of selecting device manager

Fig. 70 Step 7: Select the device manager

Screenshot of starting the emulator

Fig. 71 Step 8: Start the emulator

After the emulator is started, follow the screenshot below to run the app on the emulator:

Screenshot of starting the app on the emulator

Fig. 72 Step 9: Start the app on the emulator

You should see something like the following:

Screenshot of app running on the emulator

Fig. 73 Step 10: The app is running on the emulator

Congratulations!

You have successfully run an on-device text-to-speech app on HarmonyOS!