Examples

This page collects starter examples for running sherpa-onnx with the SpacemiT execution provider.

Note

The first inference run typically includes one-time initialization, so its timing is not representative. In real usage, you can run a warm-up inference on empty data first and discard its timing result.
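The warm-up advice above can be sketched as a small timing helper. This is a minimal illustration, not part of sherpa-onnx: `time_after_warmup` and `fake_inference` are hypothetical names, and the one-second buffer of zeros at an assumed 16 kHz sample rate stands in for the "empty data".

```python
import time


def time_after_warmup(run_inference, warmup_input, real_input, warmup_runs=1):
    """Run warm-up calls whose timing is discarded, then time one real call."""
    for _ in range(warmup_runs):
        run_inference(warmup_input)  # result and timing are intentionally ignored

    start = time.perf_counter()
    result = run_inference(real_input)
    elapsed = time.perf_counter() - start
    return result, elapsed


def fake_inference(samples):
    """Hypothetical stand-in for a real model call; replace with your own."""
    return len(samples)


if __name__ == "__main__":
    silence = [0.0] * 16000  # assumed: 1 s of zeros at 16 kHz as "empty data"
    audio = [0.1] * 32000    # assumed: placeholder for real input samples
    result, elapsed = time_after_warmup(fake_inference, silence, audio)
    print(f"result={result}, elapsed={elapsed:.6f} s")
```

Passing the inference call in as a function keeps the helper independent of any particular API, so the same pattern should apply to TTS and ASR alike.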

Prepare models

You can use xslim to quantize your ONNX models to dynamic int8 or static int8 format, both of which are optimized for SpacemiT. For example, you can quantize a VITS TTS model with:

# You can install xslim with pip, or build it from source. For building from source, please refer to
# https://github.com/spacemit-com/xslim
# pip install xslim

python3 -m xslim -i ./model.onnx -o ./model.dynq.onnx --dynq
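If you have several models to quantize, the command above can be wrapped in a short script. The sketch below uses only the flags shown above (`-i`, `-o`, `--dynq`). The helper names and the `.dynq.onnx` naming rule are assumptions for illustration, and it invokes xslim through the current Python interpreter rather than `python3`.

```python
import subprocess
import sys
from pathlib import Path


def dynq_output_path(model_path):
    """Map foo.onnx to foo.dynq.onnx (assumed naming convention)."""
    p = Path(model_path)
    return p.with_name(p.stem + ".dynq" + p.suffix)


def build_xslim_command(model_path):
    """Build the same xslim invocation as the shell command shown above."""
    out = dynq_output_path(model_path)
    return [sys.executable, "-m", "xslim",
            "-i", str(model_path), "-o", str(out), "--dynq"]


def quantize_all(model_dir):
    """Quantize every *.onnx file in model_dir that is not already quantized."""
    for model in sorted(Path(model_dir).glob("*.onnx")):
        if model.name.endswith(".dynq.onnx"):
            continue  # skip outputs of a previous run
        subprocess.run(build_xslim_command(model), check=True)


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch has no side effects.
    print(" ".join(build_xslim_command("model.onnx")))
```

Call `quantize_all("./models")` (a hypothetical directory) once xslim is installed; `check=True` makes the script stop on the first failed conversion.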

Offline TTS

${SHERPA_ONNX_INSTALL_DIR}/bin/sherpa-onnx-offline-tts \
   --provider=spacemit \
   --vits-model=./en_US-lessac-medium.dynq.onnx \
   --vits-data-dir=./espeak-ng-data \
   --vits-tokens=./tokens.txt \
   --output-filename=./liliana-piper-en_US-lessac-medium.wav \
   'liliana, the most beautiful and lovely assistant of our team!'

Offline ASR

The local SpacemiT test directory also contains an offline ASR example based on SenseVoice:

${SHERPA_ONNX_INSTALL_DIR}/bin/sherpa-onnx-offline \
  --provider=spacemit \
  --tokens=./tokens.txt \
  --sense-voice-model=./model.dynq.onnx \
  --num-threads=4 \
  ./test_wavs/zh.wav \
  ./test_wavs/en.wav \
  ./test_wavs/ja.wav \
  ./test_wavs/ko.wav

You should see output similar to the following (timings vary by device):

Creating recognizer ...
........./sherpa-onnx/sherpa-onnx/csrc/session.cc:SpiltProviderAndConfig:63 Provider string: spacemit
........./sherpa-onnx/sherpa-onnx/csrc/session.cc:GetSessionOptionsImpl:337 Use SpacemiT Execution Provider
........./sherpa-onnx/sherpa-onnx/csrc/session.cc:GetSessionOptionsImpl:347 Set IntraOpNumThreads to 1
........./sherpa-onnx/sherpa-onnx/csrc/session.cc:GetSessionOptionsImpl:349 Set InterOpNumThreads to 1
........./sherpa-onnx/sherpa-onnx/csrc/session.cc:GetSessionOptionsImpl:354 Set SPACEMIT_EP_INTRA_THREAD_NUM to 4
recognizer created in 4.014 s
Started
Done!

./test_wavs/zh.wav
{"lang": "<|zh|>", "emotion": "<|NEUTRAL|>", "event": "<|Speech|>", "text": "开饭时间早上九点至下午五点", "timestamps": [0.72, 0.96, 1.26, 1.44, 1.92, 2.10, 2.58, 2.82, 3.30, 3.90, 4.20, 4.56, 4.74], "durations": [], "tokens":["开", "饭", "时", "间", "早", "上", "九", "点", "至", "下", "午", "五", "点"], "ys_log_probs": [], "words": []}
----
./test_wavs/en.wav
{"lang": "<|en|>", "emotion": "<|NEUTRAL|>", "event": "<|Speech|>", "text": "the tribal chieftain called for the boy and presented him with fifty pieces of gold", "timestamps": [0.90, 1.26, 1.56, 1.80, 2.16, 2.46, 2.76, 2.94, 3.12, 3.60, 3.96, 4.50, 4.74, 5.10, 5.46, 5.88, 6.18], "durations": [], "tokens":["the", " tri", "bal", " chief", "tain", " called", " for", " the", " boy", " and", " presented", " him", " with", " fifty", " pieces", " of", " gold"], "ys_log_probs": [], "words": []}
----
./test_wavs/ja.wav
{"lang": "<|ja|>", "emotion": "<|NEUTRAL|>", "event": "<|Speech|>", "text": "うちの中学は弁当制で持っていけない場合は50円の学校販売のパンを買う", "timestamps": [0.42, 0.60, 0.72, 0.90, 1.08, 1.26, 1.44, 1.62, 1.80, 2.04, 2.46, 2.52, 2.64, 2.76, 2.88, 3.00, 3.12, 3.24, 3.36, 3.48, 3.78, 3.96, 4.20, 4.38, 4.56, 4.68, 4.92, 5.10, 5.28, 5.40, 5.52, 5.70, 5.82, 6.00], "durations": [], "tokens":["う", "ち", "の", "中", "学", "は", "弁", "当", "制", "で", "持", "っ", "て", "い", "け", "な", "い", "場", "合", "は", "5", "0", "円", "の", "学", "校", "販", "売", "の", "パ", "ン", "を", "買", "う"], "ys_log_probs": [], "words": []}
----
./test_wavs/ko.wav
{"lang": "<|ko|>", "emotion": "<|NEUTRAL|>", "event": "<|Speech|>", "text": "조 금만 생각 을 하 면서 살 면 훨씬 편할 거야", "timestamps": [0.78, 0.96, 1.14, 1.32, 1.56, 1.62, 1.80, 1.86, 1.98, 2.22, 2.40, 2.76, 3.06, 3.30, 3.42, 3.54], "durations": [], "tokens":["조", " 금", "만", " 생각", " ", "을", " 하", " ", "면서", " 살", " 면", " 훨씬", " 편", "할", " 거", "야"], "ys_log_probs": [], "words": []}
----
num threads: 4
decoding method: greedy_search
Elapsed seconds: 6.831 s
Real time factor (RTF): 6.831 / 24.552 = 0.278
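The real time factor in the last line is the elapsed decoding time divided by the audio duration; here 24.552 s is presumably the combined length of the four input files. A value below 1 means decoding runs faster than real time. The sketch below reproduces that arithmetic and also pairs each token with its timestamp from one JSON result line. The function names are hypothetical, and the sample line is trimmed from the English result above.

```python
import json


def real_time_factor(elapsed_seconds, audio_seconds):
    """RTF = processing time / audio duration."""
    if audio_seconds <= 0:
        raise ValueError("audio duration must be positive")
    return elapsed_seconds / audio_seconds


def token_times(result_line):
    """Pair each token with its timestamp from one JSON result line."""
    result = json.loads(result_line)
    return list(zip(result["tokens"], result["timestamps"]))


if __name__ == "__main__":
    # Numbers taken from the log above.
    print(f"RTF = {real_time_factor(6.831, 24.552):.3f}")  # RTF = 0.278

    # Trimmed excerpt of the English result; real lines carry more fields.
    line = '{"text": "the boy", "timestamps": [2.94, 3.12], "tokens": ["the", " boy"]}'
    for token, start in token_times(line):
        print(f"{start:5.2f} s  {token!r}")
```

Because the first run includes initialization, an RTF measured on a cold start will likely be higher than the steady-state value.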